Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: Psychother Res. 2020 Jun 2;30(7):885–899. doi: 10.1080/10503307.2020.1769875

Do therapist effects really impact estimates of within-patient mechanisms of change? A Monte Carlo simulation study

Fredrik Falkenström 1, Nili Solomonov 2, Julian A Rubel 3
PMCID: PMC7526345  NIHMSID: NIHMS1617688  PMID: 32482144

Abstract

Objective:

Existing evidence highlights the importance of modeling differential therapist effectiveness when studying psychotherapy outcome. However, no study to date has examined whether this assertion applies to the study of within-patient effects in mechanisms of change. The present study investigated whether therapist effects should be modeled when studying mechanisms of change at the within-patient level.

Methods:

We conducted a Monte Carlo simulation study, varying patient- and therapist-level sample sizes, degree of therapist-level nesting (intra-class correlation), balanced vs. unbalanced assignment of patients to therapists, and fixed vs. random within-patient coefficients. All models were estimated using longitudinal multilevel and structural equation models that either ignored therapist effects (2-level models) or modeled them (3-level models).

Results:

Across all conditions, 2-level models performed as well as, or better than, 3-level models. Within-patient coefficients were unbiased in both 2- and 3-level models. In 3-level models, standard errors were biased when the number of therapists was small, and this bias increased in unbalanced designs. Ignoring random slopes led to biased standard errors when slope variance was large, but 2-level models still outperformed 3-level models.

Conclusions:

In contrast to treatment outcome research, when studying mechanisms of change on a within-patient level, modeling therapist effects may even reduce model performance and increase bias.

Keywords: Mechanisms of change, Therapist effects, Cross-Lagged Panel Model, Multilevel Modeling, Structural Equation Modeling


In recent years, there has been an upsurge of interest in studying mechanisms of change in psychotherapy research. A mechanism of change is a theoretically postulated underlying target for therapeutic interventions that, if changed, theoretically leads to change in outcome (Kazdin, 2007). Therapeutic approaches differ in the hypothesized mechanisms presumed to lead to clinical improvement. For example, cognitive therapy for depression targets patients’ distorted cognitions about the self, the world, and the future (Beck, Rush, Shaw, & Emery, 1979). Psychodynamic therapy facilitates patients’ insight into problematic relationship patterns, awareness of avoided emotions, and/or a corrective emotional experience with the therapist (Summers & Barber, 2010). Some mechanisms are trans-theoretical and assumed to facilitate therapeutic change across treatment models (e.g. the working alliance).

This increased focus on change mechanisms has been accompanied by an emphasis on collecting longitudinal data with repeated measurements of both candidate mechanisms and outcome throughout treatment, allowing researchers to study nuanced and complex associations over time (Falkenström, Finkel, Sandell, Rubel, & Holmqvist, 2017). Recent decades have also been marked by developments in the statistical modeling of longitudinal data and increased awareness of their complexity (e.g. Allison, Williams, & Moral-Benito, 2017; Curran & Bauer, 2011; Curran, Howard, Bainter, Lane, & McGinley, 2014; Hamaker, Kuiper, & Grasman, 2015; Wang & Maxwell, 2015; Zyphur et al., 2019). One such complexity is the issue of separating within- from between-patient effects. Traditionally, mechanisms of change research has focused on between-patient effects – examining whether average differences in a mechanism across patients predict change in outcome across patients. Within-patient effects are relationships between fluctuations over time in two or more variables for each given patient (Hamaker, 2012; Lundh & Falkenström, 2019; Molenaar & Campbell, 2009). That is, within-patient effects focus on associations between variables over time (across time-points) for each given patient. In contrast to between-patient effects, within-patient effects are not influenced by any factor that does not change during the time of the study. This characteristic of within-patient effects allows the exclusion of a number of alternative explanations connected to potentially confounding stable variables, which are relegated to the between-patient level, and thus provides stronger evidence for causality. Conceptually, within-patient effects are also better aligned with theories of therapeutic change that focus on change processes at the individual-patient level, as well as with the recent emphasis on personalization of therapeutic processes (Barber & Solomonov, 2019; Rubel, Zilcha-Mano, Giesemann, Prinz, & Lutz, 2019). Our focus on mechanisms of change should not be confused with mediation analysis, which is a method commonly used for studying mechanisms of change by estimating indirect effects of treatment on outcome via a candidate mechanism. Our focus is instead on within-patient fluctuations in a candidate mechanism predicting within-patient changes in outcome, which we would argue is a method well-suited for studying mechanisms of change (Falkenström, Solomonov, & Rubel, 2020).

Therapist effects in mechanisms of change research

Therapists vary in their ability to help their patients get better (e.g. Baldwin & Imel, 2013). While therapists’ efficacy in reducing symptomatic distress has been widely studied, little is known about the effects of therapists’ variability in facilitating improvement in candidate mechanisms of change. Inclusion of therapist effects when modeling change over time requires attention to ‘nesting’ – most therapists treat more than one patient, and thus observations collected from their patients are dependent. This nested structure violates the assumption of independence of observations required for most statistical tests (Adelson & Owen, 2012).

Early on, Crits-Christoph and Mintz (1991) noted the risk for increased Type-I error rate, and overestimation of population treatment effects, if nesting of patients within therapists is ignored. Since then, a number of simulation studies have confirmed this premise, suggesting that therapist effects should be modeled when examining outcome over time to avoid bias in the estimation of treatment effects and increased risk for Type-I error (de Jong, Moerbeek, & van der Leeden, 2010; Magnusson, Andersson, & Carlbring, 2018; Wampold & Serlin, 2000). The most common method to adjust for therapist effects is allowing for variation in outcome among therapists by inclusion of random effects in multilevel models (e.g. Snijders & Bosker, 2012). While therapist effects in outcome studies have been relatively widely studied, much less is known regarding the impact of modeling therapist effects in the study of mechanisms of change, especially when studying within-patient effects.

Therapist effects in within-patient studies

Consider first a simple within-patient regression model, also called a fixed effects or mean-centered model, in which patient means in both the independent and dependent variables are eliminated by person-mean centering, i.e.:

$Y_{i,t} - \bar{Y}_i = \beta_0 + \beta_1\,(X_{i,t-1} - \bar{X}_i) + \varepsilon_{i,t}$   (Fixed effects/mean centered model)

In this model, Yi,t is outcome for individual i at time t, β0 is the intercept term, β1 is the effect of X for individual i at time t-1, and εi,t is the model error term1. All between-patient variance is eliminated by person-mean centering, i.e. subtracting the patient means from each observation of Y and X. What is left after centering is time-specific deviations from each patient’s mean, which constitute the within-patient scores. Regression of within-patient scores in Y on within-patient scores in X yields the within-patient effect. In this model, therapist mean differences in Y and X are both eliminated through mean centering, because when there are no mean differences among patients, there are no therapist differences either.
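To make the person-mean centering step concrete, a minimal R sketch follows (illustrative only, not the code used in the study); the simulated data frame d and all column names (patient, therapist, session, x, y) are assumptions introduced here.

```r
# Minimal sketch of the fixed effects / person-mean centered model on toy data.
set.seed(1)
d <- data.frame(
  patient   = rep(1:50, each = 5),   # 50 patients, 5 sessions each
  therapist = rep(1:10, each = 25),  # 10 therapists, 5 patients each
  session   = rep(1:5, times = 50),
  x         = rnorm(250),            # candidate mechanism
  y         = rnorm(250)             # outcome
)

library(dplyr)
d <- d %>%
  group_by(patient) %>%
  arrange(session, .by_group = TRUE) %>%
  mutate(
    x_lag    = lag(x),                              # X at session t-1
    x_within = x_lag - mean(x_lag, na.rm = TRUE),   # person-mean centered lagged X
    y_within = y - mean(y)                          # person-mean centered Y
  ) %>%
  ungroup()

# beta_1: regression of within-patient Y on within-patient lagged X
fit_fe <- lm(y_within ~ x_within, data = d)
coef(fit_fe)
```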

Multilevel modeling (MLM).

In MLM, person-level means in Y are not eliminated by mean centering, but rather modeled using a random intercept term, which captures the variance in means across individuals. Similar to mean centering, what remains is within-patient deviation scores in Y. However, in standard MLM it is not possible to estimate a random intercept for X. In order to obtain a true within-patient regression model, X still needs to be person-mean centered manually. The person-mean centered model (Firebaugh, Warner, & Massoglia, 2013; Hoffman & Stawski, 2009) combines these two methods by using mean centering (a fixed effect) for X but a random effect for Y. It is one of the most commonly used within-patient effects models in psychotherapy research:

Level 1: $Y_{i,t} = \beta_{0i} + \beta_1\,(X_{i,t-1} - \bar{X}_i) + \varepsilon_{i,t}$
Level 2: $\beta_{0i} = \gamma_{00} + u_{0i}$
(2-level Person-mean centered model)

In this model, Level-1 refers to the within-patient level, i.e. repeated measurements across time, and Level-2 refers to the between-patient level. Between-person variation in Y is captured by the patient-level random intercept u0i, while X is still person-mean centered. The estimate for the coefficient β1 should, however, be the same as in the fixed effects model. The advantage of the hybrid random effects model, compared to the fixed effects model, is that between-patient variance in Y is not eliminated but can be modeled and potentially explained using between-patient predictors. For instance, the Level-2 intercept u0i can, in principle, be modeled further by adding a random intercept at the therapist level, thus creating a three-level model with repeated measures nested within patients, which in turn are nested within therapists:

Level 1: $Y_{T,i,t} = \beta_{0T,i} + \beta_1\,(X_{T,i,t-1} - \bar{X}_{T,i}) + \varepsilon_{T,i,t}$
Level 2: $\beta_{0T,i} = \delta_{00T} + u_{0T,i}$
Level 3: $\delta_{00T} = \gamma_{000} + v_{0T}$
(3-level Person-mean centered model)

In this model, we have added Level-3, which is the between-therapist level. YT,i,t represents the outcome for therapist T with patient i at time t. X is centered at the patient level, as in the previous model. The only difference from the previous model is that the additional random effect v0T models the between-therapist variance among patient-level means; i.e., the u0i term from the previous model is further divided into a between-therapist component (v0T) and a within-therapist deviation component (u0T,i).
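As an illustration, the 2-level and 3-level person-mean centered models can be sketched in R with lme4, reusing the toy data frame d from the earlier sketch; lme4 is not the software used in the study, and it reports conventional rather than robust standard errors.

```r
# Sketch of the 2-level (patient) and 3-level (therapist/patient) person-mean
# centered models on the toy data frame d.
library(lme4)

# 2-level: random intercept for patients, manually centered lagged X
fit_2lvl <- lmer(y ~ x_within + (1 | patient), data = d, REML = FALSE)

# 3-level: adds a therapist-level random intercept (patients nested in therapists)
fit_3lvl <- lmer(y ~ x_within + (1 | therapist/patient), data = d, REML = FALSE)

fixef(fit_2lvl)["x_within"]   # within-patient effect (beta_1)
VarCorr(fit_3lvl)             # therapist-level (v_0T) and patient-level (u_0T,i) variances
```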

The consensus in psychotherapy research suggests that ignoring a level of nesting in the model (in our case, therapists) can lead to bias in standard errors, resulting in too liberal testing (e.g. Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000). Although usually not explicitly stated, this refers to when the ignored level is the one immediately ‘above’ the one on which the effect of interest is located, e.g. ignoring therapist effects (Level-3) when focusing on differential outcome among patients (a Level-2 effect). However, in the case of Level-1 effects, Moerbeek (2004) argued that in balanced designs these are unaffected by the omission of Level-3 nesting, because variance components at Level-3 will simply be redistributed to Level-2, leaving Level-1 unaffected. Thus, leaving out Level-3 will affect standard errors at Level-2, but not at Level-1. Given that psychotherapy studies are almost invariably unbalanced, i.e. in our case therapists treating unequal numbers of patients, this finding may not apply. Thus, there is a need to evaluate the impact of ignoring therapist effects in within-person mechanisms of change studies.

So far, we have only considered therapist effects in the form of average differences in predictor and/or outcome variable among therapists. However, there is also the possibility that the within-patient coefficient itself (i.e. β1) may vary among therapists. For instance, in alliance-outcome research, it is possible that the lagged effect of alliance on symptomatic distress in the next session will be stronger for some therapists than for others, perhaps because some therapists are more able to utilize the alliance in an effective way. Alternatively, the effect of challenging cognitions on subsequent depressive symptoms may be more pronounced in therapists working within a cognitive model than in therapists working within a psychodynamic model of depression. The following equations show the hybrid random effects model with a therapist-level random slope:

Level 1: $Y_{T,i,t} = \beta_{0T,i} + \beta_{1T,i}\,(X_{T,i,t-1} - \bar{X}_{T,i}) + \varepsilon_{T,i,t}$
Level 2: $\beta_{0T,i} = \delta_{00T} + u_{0T,i}$; $\beta_{1T,i} = \delta_{01T} + u_{1T,i}$
Level 3: $\delta_{00T} = \gamma_{000} + v_{0T}$; $\delta_{01T} = \gamma_{001} + v_{1T}$
(3-level Person-mean centered model with random slopes)

In this model we have allowed the slope of the within-patient coefficient β1 to vary across therapists and patients. The variation in slopes across patients, within therapists, is captured by the term u1T,i, and the variation in slopes across therapists is captured by the term v1T. If the above is the true population model, and estimation is carried out using a fixed coefficient model, results may be biased (Baird & Maxwell, 2016). To our knowledge, there are no studies testing whether ignoring therapist-level random slopes affects estimates of within-person effects.
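In lme4 notation, such a model could be sketched as follows (again only an illustration on the toy data; with pure noise data, singular-fit warnings are likely).

```r
# Sketch: random slopes for the within-patient coefficient at the patient and
# therapist levels, corresponding to u_1T,i and v_1T in the equations above.
fit_rs <- lmer(
  y ~ x_within +
    (1 + x_within | therapist) +           # therapist-level intercept and slope (v_0T, v_1T)
    (1 + x_within | therapist:patient),    # patient-level intercept and slope (u_0T,i, u_1T,i)
  data = d, REML = FALSE
)
summary(fit_rs)
```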

Limitations of MLM estimation of lagged within-person effects.

So far, we have considered only a fairly simple lagged effect model within an MLM framework. This model ignores several complexities that may be present in time-series or panel data frameworks, some of which can be addressed within an SEM framework. First, there is the issue of autoregression, i.e. the influence of the outcome variable at time t on itself at time t+1. This effect needs to be adjusted for; otherwise, we need to assume that there are no pre-existing differences in Y at time t. Adjusting for the prior value of Y means that lagged X predicts residualized change in Y, thus ensuring correct temporality – i.e. that the prediction of Y at time t+1 from X at time t is not purely a consequence of pre-existing differences in Y at time t.

Within the standard MLM framework, it is complicated to estimate autoregression while simultaneously separating within- from between-patient variance. If a lagged dependent variable is added as a covariate, the estimated model coefficients will be biased. This is variously called ‘dynamic panel bias’ or ‘Nickell’s bias’ (Nickell, 1981). It is possible to allow an autoregressive structure for the residuals, but this model still does not fully take into account the dynamic implications of the time-series data structure (Asparouhov & Muthén, 2019).

Another limitation of the standard MLM framework is that it assumes unidirectionality and thus does not include reverse effects, i.e. ‘feedback effects’, from the dependent variable back to the predictor (i.e. from Y to X). In studies of working alliance and psychological distress, for instance, it is now fairly well-established that these variables mutually influence each other over time, so that higher alliance scores in one session predict lower distress in the next session, which in turn predicts higher alliance quality, and so on (e.g. Falkenström, Granström, & Holmqvist, 2013; Xu & Tracey, 2015; Zilcha-Mano & Errázuriz, 2015). In the SEM framework, this issue can be addressed through the modeling of bidirectional relationships in cross-lagged panel models (e.g. Hamaker et al., 2015).

Structural Equation Modeling.

In SEM, several equations are estimated simultaneously, with data in wide format (i.e. one row per patient in the data frame), where each variable at each wave of data has its own equation, e.g.:

$Y_{i,t_2} = \beta_0 + \beta_1 X_{i,t_1} + \beta_2 Y_{i,t_1} + \varepsilon_1$
$Y_{i,t_3} = \beta_0 + \beta_1 X_{i,t_2} + \beta_2 Y_{i,t_2} + \varepsilon_2$
(Unidirectional Cross-lagged panel model)

In these equations, β0 is the intercept, while β1 is the coefficient for the cross-lagged effect of X on Y (see Footnote 2). In addition, we have here incorporated the autoregressive effect, which is captured in β2. The above equations only incorporate three waves, but additional waves can easily be added.

To model the reverse effects, i.e. the feedback from Y to X, we add the following equations:

$X_{i,t_2} = \gamma_0 + \gamma_1 Y_{i,t_1} + \gamma_2 X_{i,t_1} + \varepsilon_3$
$X_{i,t_3} = \gamma_0 + \gamma_1 Y_{i,t_2} + \gamma_2 X_{i,t_2} + \varepsilon_4$
(Bidirectional Cross-lagged panel model)

Here, γ0 is the intercept, γ1 is the coefficient for the cross-lagged effect of Y on X, and γ2 is the autoregression coefficient for X. This model, with all of the above equations estimated simultaneously, is called the cross-lagged panel model. Usually, the error terms at the same time-point are allowed to covary (i.e. $\varepsilon_1$ and $\varepsilon_3$, $\varepsilon_2$ and $\varepsilon_4$, and so on) to capture contemporaneous relationships between variables.
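For illustration, a bidirectional cross-lagged panel model with three waves and time-invariant coefficients can be written in lavaan roughly as follows (a sketch only; the wide-format data frame d_wide is built from the toy data above, and lavaan is not the software used in the study).

```r
# Sketch of a three-wave bidirectional cross-lagged panel model in lavaan.
library(lavaan)

# reshape the toy long data to wide format (columns x1..x5, y1..y5)
d_wide <- reshape(as.data.frame(d[, c("patient", "session", "x", "y")]),
                  idvar = "patient", timevar = "session",
                  direction = "wide", sep = "")

clpm <- '
  # autoregressive and cross-lagged paths, constrained equal across waves
  y2 ~ b1*x1 + b2*y1
  y3 ~ b1*x2 + b2*y2
  x2 ~ g1*y1 + g2*x1
  x3 ~ g1*y2 + g2*x2
  # contemporaneous (residual) covariances within each wave
  y2 ~~ x2
  y3 ~~ x3
'
fit_clpm <- sem(clpm, data = d_wide, estimator = "MLR")
summary(fit_clpm)
```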

In the classical cross-lagged panel model, within- and between-patient effects are conflated. The separation of within- and between-patient variances in the dependent variable is accomplished in both MLM and SEM through the inclusion of random intercept terms. This turns SEM into Multilevel SEM (ML-SEM; e.g. Mehta & Neale, 2005), which will be represented by the addition of the person-specific random intercept u0i. This is shown in the following equation (now using general rather than time-specific notation as above):

$Y_{i,t} = \beta_0 + \beta_1 X_{i,t-1} + \beta_2 Y_{i,t-1} + u_{0i} + \varepsilon_{0i,t}$

The difference from standard MLM is that in ML-SEM it is possible to add a random intercept term for X:

$X_{i,t} = \gamma_0 + \gamma_1 Y_{i,t-1} + \gamma_2 X_{i,t-1} + u_{1i} + \varepsilon_{1i,t}$

Similar to person-mean centering in MLM, the inclusion of a random intercept term for X separates the person-specific means, which are captured in the random intercept, from the time-specific deviations from the person’s mean that are used for the cross-lagged model. The above model is called the Random Intercept Cross-Lagged Panel Model (RI-CLPM; Hamaker et al., 2015). The random intercepts u0i and u1i are allowed to covary, since it is likely that if there is a relationship between X and Y at the within-person level, there will also be a relationship between X and Y at the between-person level. Because the between and within levels are separated by means of random effects in both X and Y, the problem with modeling autoregression that is present in MLM does not occur in SEM (Allison et al., 2017). Again, however, adding a third-level random intercept will simply model the between-therapist variance in patient-level means; i.e. u0i and u1i are further separated into between- and within-therapist components:

$Y_{T,i,t} = \beta_0 + \beta_1 X_{T,i,t-1} + \beta_2 Y_{T,i,t-1} + u_{0T,i} + v_{00T} + \varepsilon_{1T,i,t}$
$X_{T,i,t} = \gamma_0 + \gamma_1 Y_{T,i,t-1} + \gamma_2 X_{T,i,t-1} + u_{1T,i} + v_{01T} + \varepsilon_{2T,i,t}$
(RI-CLPM with random intercepts for therapists)

Theoretically, the within-patient estimates for β1 and γ1 should be unaffected, since v00T and v01T only model the between-therapist variance in the patient-level random intercepts u0T,i and u1T,i, respectively.
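A schematic lavaan specification of the 2-level RI-CLPM (ignoring therapists) is sketched below for three waves, following the general Hamaker et al. (2015) setup; with the arbitrary toy data the estimates are meaningless and estimation warnings may occur, so the sketch is meant only to show the model structure, not to reproduce the study's models.

```r
# Sketch of the Random Intercept Cross-Lagged Panel Model (RI-CLPM) in lavaan,
# using the toy wide-format data frame d_wide from the previous sketch.
riclpm <- '
  # between-patient part: random intercepts with unit loadings
  RIy =~ 1*y1 + 1*y2 + 1*y3
  RIx =~ 1*x1 + 1*x2 + 1*x3
  RIy ~~ RIx

  # within-patient components at each wave
  wy1 =~ 1*y1; wy2 =~ 1*y2; wy3 =~ 1*y3
  wx1 =~ 1*x1; wx2 =~ 1*x2; wx3 =~ 1*x3

  # lagged paths among within-components, constrained equal across waves
  wy2 ~ b1*wx1 + b2*wy1
  wy3 ~ b1*wx2 + b2*wy2
  wx2 ~ g1*wy1 + g2*wx1
  wx3 ~ g1*wy2 + g2*wx2

  # within-wave (residual) covariances
  wy1 ~~ wx1; wy2 ~~ wx2; wy3 ~~ wx3

  # random intercepts uncorrelated with the wave-1 within-components
  RIy ~~ 0*wy1; RIy ~~ 0*wx1; RIx ~~ 0*wy1; RIx ~~ 0*wx1

  # all observed variance is routed to the latent parts
  y1 ~~ 0*y1; y2 ~~ 0*y2; y3 ~~ 0*y3
  x1 ~~ 0*x1; x2 ~~ 0*x2; x3 ~~ 0*x3
'
fit_riclpm <- sem(riclpm, data = d_wide, estimator = "MLR")
```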

Limitations of SEM estimation of lagged within-person effects.

As in MLM, there is the possibility that β1 (or any other within-level coefficient) varies among patients and/or therapists. Estimating random slope models in ML-SEM is possible, just as in standard MLM, although this requires numerical integration which is computationally burdensome and time-consuming, and may become intractable when the number of integration dimensions becomes large (Asparouhov & Muthén, 2007).

The Simulation Study

The aim of this Monte Carlo simulation study was to test the impact of ignoring the nesting of patients within therapists in within-patient analyses, under various conditions reasonably representative of psychotherapy research. The focus of this analysis was β1 in the previous equations, i.e. the within-patient cross-lagged effect of X on Y. A recent study of therapist effects in outcome research suggested bias when ignoring therapist effects in samples with few therapists, many patients per therapist, a large proportion of variance at the therapist level (relative to the within-patient residual variance), and unbalanced allocation of patients to therapists (Magnusson et al., 2018). Thus, we decided to experimentally manipulate the following: a) degree of therapist-level nesting; b) number of therapists; c) number of patients per therapist; d) MLM and SEM models; and e) balanced and unbalanced allocation of patients to therapists. In addition, we tested the impact of ignoring therapist-level random slopes for the within-patient coefficient under 1) different total slope variances and 2) different proportions of slope variance at the therapist level, as well as conditions a)–d) above.

Models and software

We created the datasets and ran all simulations in Mplus v.8.1.7 (Muthén & Muthén, 1998–2017) via the MplusAutomation package in R (Hallquist & Wiley, 2018). The MLM simulations used the Person-mean centered model and the SEM simulations used the RI-CLPM (for equations, see above). All models were estimated using Maximum Likelihood with robust standard errors.
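For readers unfamiliar with this workflow, the general R-side pattern with MplusAutomation looks roughly like the sketch below; the directory name and the pre-written Mplus input files are hypothetical, and the authors' actual simulation scripts are not reproduced here.

```r
# Sketch of driving Mplus from R with MplusAutomation (hypothetical directory
# of .inp files; not the authors' actual scripts).
library(MplusAutomation)

runModels("mplus_simulations", recursive = TRUE)              # run all .inp files found
results <- readModels("mplus_simulations", recursive = TRUE)  # read back the .out files

# e.g. inspect the parameter estimates of the first fitted model
str(results[[1]]$parameters$unstandardized)
```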

Fixed conditions

We set the effect size (standardized regression coefficient) for the cross-lagged effect to β = .30. In addition, we also tested β = 0 to see whether the models correctly indicate non-significance when there is no effect in the population, and to calculate the empirical alpha level. For the RI-CLPM, we set the autoregressive effects β2 and γ2 to .50 and the contemporaneous error correlations to .20.

Experimental conditions for data generation

Degree of nesting.

We included four conditions for the Intraclass Correlation Coefficient (ICC) at the therapist level: a) ICC = 0 (i.e. no nesting at the therapist level); b) ICC = .05, the average ICC reported in meta-analyses of therapist effects in psychotherapy research (e.g. Baldwin & Imel, 2013); c) ICC = .20; and d) ICC = .40. Although an ICC of .40 may seem large, this ICC size may be found when therapists differ in their self-report style on change in mechanisms (Hatcher, Lindqvist, & Falkenström, 2019). In all simulations, we assigned 50% of the variance to Level-1, the within-patient (repeated measures) level, while splitting the remaining 50% between Level-2 and Level-3 depending on the therapist-level ICC. For example, when the therapist-level ICC was set to .05, Level-3 was assigned 5% and Level-2 45% of the total variance, while when the ICC was set to .40, Level-3 was assigned 40% and Level-2 10%.
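Expressed as a small helper (an illustration of the partition just described, not code from the study), the variance assignment looks like this:

```r
# Sketch: partition of total variance across levels, with 50% at Level 1 and
# the remaining 50% split between Levels 2 and 3 according to the therapist ICC.
partition_variance <- function(icc_therapist, total_var = 1) {
  c(L1_within    = 0.50 * total_var,
    L2_patient   = (0.50 - icc_therapist) * total_var,
    L3_therapist = icc_therapist * total_var)
}
partition_variance(0.05)  # L1 = .50, L2 = .45, L3 = .05
partition_variance(0.40)  # L1 = .50, L2 = .10, L3 = .40
```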

Sample size.

We created datasets with five repeated measurements, nested within 50, 100 or 200 patients, to reflect fairly realistic scenarios in psychotherapy studies (e.g. Lambert, 2013).

Number of therapists and number of patients per therapist.

Each sample size was created with a different number of therapists, and a different number of patients per therapist. Thus, N = 50 was either 5 therapists with 10 patients each, or 10 therapists with 5 patients each, N = 100 was either 5 therapists with 20 patients each or 20 therapists with 5 patients each, and N = 200 was 5 therapists with 40 patients each, 20 therapists with 10 patients each, or 40 therapists with 5 patients each.

Unbalanced allocation.

We created six conditions with unbalanced allocation of patients to therapists, as is common in psychotherapy studies. Within each of the three sample sizes we created two unbalanced conditions – one with relatively many therapists and one with relatively few therapists. We varied the number of patients treated by each therapist as much as possible, within reasonable limits. The conditions were the following (an illustrative sketch of the corresponding allocation vectors follows the list):

N = 50. a) Two therapists treating 15 patients each and ten therapists treating two patients each (total of 12 therapists); b) One therapist treating 30 patients, two therapists treating eight patients each, and two therapists treating two patients each (total of five therapists).

N = 100. a) Two therapists treating 20 patients, four therapists treating ten patients each, and ten therapists treating two patients each (total of 16 therapists); b) Two therapists treating 25 patients, two therapists treating 15 patients each, and two therapists treating 10 patients each (total of six therapists).

N = 200. a) Three therapists treating 25 patients each, five therapists treating 15 patients each, and ten therapists treating five patients each (total of 18 therapists); b) Two therapists treating 50 patients each, three therapists treating 30 patients each, and two therapists treating five patients each (total of seven therapists).
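As an illustration (the therapist IDs and the rep() encoding are arbitrary), the N = 50 conditions above correspond to allocation vectors such as:

```r
# Sketch: therapist assignment vectors for the two unbalanced N = 50 conditions.
alloc_50a <- rep(1:12, times = c(15, 15, rep(2, 10)))  # 2 x 15 + 10 x 2 = 50 patients, 12 therapists
alloc_50b <- rep(1:5,  times = c(30, 8, 8, 2, 2))      # 1 x 30 + 2 x 8 + 2 x 2 = 50 patients, 5 therapists
length(alloc_50a); length(alloc_50b)                   # both 50
```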

Random slopes for the cross-lagged effect.

We chose large values for the slope variances to create conditions where potential problems could emerge while remaining within reasonably realistic conditions. Since the fixed effect was set to a standardized regression coefficient of .30, a therapist-level standard deviation of 0.15 would mean that approximately 95% (corresponding to about 2 standard deviation units) of the therapists had effects that were larger than zero. With larger variation than that, a substantial proportion of patients would have zero or even negative effects of the mechanism on outcome, which seems unlikely. Thus, we chose SD = 0.15 as our upper boundary for the total between-therapist variation in slopes. We also examined a condition of SD = 0.10, to test a somewhat smaller variation in slopes. In the next step, we included the Slope Intraclass Correlation, or Slope ICC (Magnusson et al., 2018), which represents the therapists’ contribution to the total slope variance. The Slope ICC is calculated as the therapist-level variance in slopes divided by the total variance in slopes (i.e. therapist + patient level variances). Since meta-analytic evidence has shown therapist effects on outcome to be between 5–10% (Baldwin & Imel, 2013), we set the Slope ICC to either .05, .10 or .20.
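Translating these settings into level-specific slope variances can be sketched as follows (assuming, per the Slope ICC definition above, that the stated SD refers to the total slope variation that is then split between the therapist and patient levels).

```r
# Sketch: therapist- and patient-level slope variances implied by a total slope
# SD and a Slope ICC (the therapist share of the total slope variance).
slope_variances <- function(total_sd, slope_icc) {
  total_var <- total_sd^2
  c(therapist = slope_icc * total_var,
    patient   = (1 - slope_icc) * total_var)
}
slope_variances(0.15, 0.20)  # therapist = 0.0045, patient = 0.0180
```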

Estimation and assessment of estimator performance

In each condition, we created 2000 samples and subsequently analyzed the data using the Person-mean centered MLM or the RI-CLPM, in one of two formats: 1) a 2-level model, which ignored therapist-level nesting, or 2) a 3-level model, which took therapist-level nesting into account. The difference between these two models is thus whether the v variables were included or not, i.e. v0T/v1T in the Person-mean centered MLM and v00T/v01T in the RI-CLPM. We evaluated each model’s estimates on the following criteria (a computational sketch of these criteria follows the list):

  1. Proportional coefficient bias. Calculated as the difference between the average estimate of β and the true population value for β, divided by the true population value for β, i.e. $(\bar{\hat{\beta}} - \beta)/\beta$. This represents the percentage average bias from the true coefficient value. A rule of thumb for unbiased models is that this should not exceed 5% (Muthén & Muthén, 2002). For the models with true population value β = 0, coefficient bias cannot be calculated as a proportion since that would involve division by zero. Instead, we report the average deviation score (which will be the average estimate of β).

  2. Coverage of the 95% confidence intervals. This is calculated as the proportion of times the estimated 95% confidence interval of β contains the true effect. Coverage is considered a robust indicator of standard error estimation (T. Asparouhov, personal communication, January 2nd, 2020). Coverage should be close to 95%, and a rule of thumb is that it should be between 91 and 98% (Muthén & Muthén, 2002).

  3. Statistical power. This is calculated as the proportion of times the model yielded a statistically significant effect when the population effect was non-zero. As is common in the behavioral sciences, we used 80% power as the criterion for large-enough statistical power (Cohen, 1992).

  4. Empirical alpha level. In models in which the population effect of β was set to zero, the proportion of times the model yielded a false-positive statistically significant effect can be interpreted as the empirical alpha level. This should be close to 5%.
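The four criteria can be computed from a set of replication results roughly as follows (a sketch with hypothetical input vectors, not the code used in the study).

```r
# Sketch: performance criteria given vectors of estimates, standard errors, and
# 95% CI limits across replications, plus the true population coefficient.
assess <- function(est, se, ci_low, ci_high, true_beta) {
  list(
    prop_bias      = if (true_beta != 0) (mean(est) - true_beta) / true_beta else NA,
    mean_deviation = mean(est),                       # reported instead when true beta = 0
    coverage       = mean(ci_low <= true_beta & true_beta <= ci_high),
    power_or_alpha = mean(abs(est / se) > 1.96)       # power if beta != 0, empirical alpha if beta = 0
  )
}

# toy example: 2000 replications around a true beta of .30
est <- rnorm(2000, 0.30, 0.05); se <- rep(0.05, 2000)
assess(est, se, est - 1.96 * se, est + 1.96 * se, true_beta = 0.30)
```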

Results

Overall, results of the MLM and SEM models were similar across conditions (instances of differences are described below). Model convergence rates were good, with 100% of MLM models and 97.8–100% of SEM models converging, with the exception of the 3-level SEM models with random slopes, for which none of the estimations converged. The complete results are summarized in Tables 1–5 and in the online supplement (Tables S1–S4).

Table 1.

Simulation results for Person-mean centered model with balanced assignment of patients to therapists at ICC = 0, .05, .20, and .40a.

2-level estimates 3-level estimates
β = .30 β SE β SE
NL2 NL3 Est.b Biasc Powerd SDe Est.f Coverg Est.b Biasc Powerd SDe Est.f Coverg
50 5 0.298 −0.7% 95.6% 0.081 0.076 93.3% 0.298 −0.7% 96.6% 0.081 0.066 83.0%
50 10 0.298 −0.6% 96.2% 0.078 0.076 93.7% 0.298 −0.6% 96.2% 0.078 0.071 89.9%
100 5 0.297 −1.0% 100.0% 0.055 0.055 95.0% 0.297 −1.0% 99.9% 0.055 0.046 84.0%
100 20 0.297 −0.9% 99.9% 0.056 0.055 94.2% 0.297 −0.9% 100.0% 0.056 0.053 92.1%
200 5 0.299 −0.2% 100.0% 0.039 0.039 95.1% 0.299 −0.2% 100.0% 0.039 0.033 84.8%
200 20 0.300 −0.1% 100.0% 0.039 0.039 95.0% 0.300 −0.1% 100.0% 0.039 0.038 93.4%
200 40 0.299 −0.3% 100.0% 0.040 0.039 93.8% 0.299 −0.3% 100.0% 0.040 0.038 92.4%
β = 0 Est.b Biasc Alphah SDe Est.f Coverg Est.b Biasc Alphah SDe Est.f Coverg
50 5 −0.001 6.9% 0.084 0.080 93.0% −0.001 16.0% 0.084 0.068 84.0%
50 10 −0.001 6.4% 0.083 0.080 93.7% −0.001 11.2% 0.083 0.074 88.8%
100 5 −0.003 5.8% 0.058 0.057 94.2% −0.003 15.6% 0.058 0.049 84.4%
100 20 −0.003 6.0% 0.060 0.057 94.1% −0.003 8.0% 0.060 0.055 92.0%
200 5 −0.001 4.6% 0.040 0.041 95.4% −0.001 15.0% 0.040 0.035 85.0%
200 20 −0.001 4.5% 0.041 0.041 95.5% −0.001 6.4% 0.041 0.039 93.7%
200 40 −0.001 6.3% 0.042 0.041 93.7% −0.001 7.3% 0.042 0.040 92.7%

Note.
a Since estimates were exactly identical for all ICC levels, only one set of results is presented in the table.
b Average estimate of β over 2000 replications.
c Calculated as $(\bar{\hat{\beta}} - \beta)/\beta$.
d Proportion of 2000 replications yielding a significant estimate when β = .30.
e Standard deviation of β over 2000 replications (used as estimate of the population standard error).
f Average of estimated standard errors over 2000 replications.
g Proportion of 2000 replications in which the 95% confidence interval contained the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

Table 5.

Two-level simulation results for Random Intercept Cross-Lagged Panel Model, ignoring therapist-level random slopes.

Slope SD = 0.10 Slope SD = 0.15
ICCslope = .05 β SE β SE
NL2 NL3 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.284 −5.3% 83.1% 0.100 0.093 92.7% 0.283 −5.8% 81.2% 0.103 0.094 92.2%
50 10 0.286 −4.8% 82.8% 0.102 0.093 92.4% 0.285 −5.1% 82.2% 0.105 0.094 91.6%
100 5 0.296 −1.3% 98.7% 0.068 0.065 93.8% 0.296 −1.5% 98.3% 0.071 0.066 92.5%
100 20 0.295 −1.6% 98.5% 0.068 0.065 94.0% 0.294 −1.9% 98.1% 0.070 0.066 93.8%
200 5 0.297 −0.9% 100.0% 0.048 0.046 94.2% 0.296 −1.2% 100.0% 0.050 0.046 92.6%
200 20 0.297 −1.0% 100.0% 0.047 0.046 94.7% 0.296 −1.3% 100.0% 0.049 0.046 93.8%
200 40 0.297 −1.0% 100.0% 0.047 0.046 94.7% 0.296 −1.3% 100.0% 0.049 0.046 93.7%
ICCslope = .10 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.284 −5.3% 82.7% 0.100 0.093 92.4% 0.283 −5.8% 80.8% 0.105 0.094 91.9%
50 10 0.286 −4.8% 82.7% 0.102 0.093 92.2% 0.284 −5.2% 82.5% 0.105 0.094 91.5%
100 5 0.296 −1.3% 98.7% 0.069 0.065 93.3% 0.296 −1.4% 98.1% 0.073 0.066 91.9%
100 20 0.295 −1.6% 98.5% 0.068 0.065 94.2% 0.294 −1.9% 98.2% 0.070 0.066 93.4%
200 5 0.297 −0.9% 100.0% 0.049 0.046 93.3% 0.296 −1.2% 100.0% 0.053 0.046 91.4%
200 20 0.297 −1.0% 100.0% 0.047 0.046 94.4% 0.296 −1.3% 100.0% 0.049 0.046 93.8%
200 40 0.297 −1.0% 100.0% 0.047 0.046 94.3% 0.296 −1.3% 100.0% 0.049 0.046 93.8%
ICCslope = .20 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.284 −5.4% 82.2% 0.102 0.093 92.0% 0.283 −5.7% 80.5% 0.107 0.094 91.4%
50 10 0.285 −4.9% 83.0% 0.102 0.093 92.2% 0.284 −5.3% 81.7% 0.106 0.094 91.1%
100 5 0.296 −1.2% 98.3% 0.070 0.065 92.7% 0.296 −1.3% 97.8% 0.0754 0.0658 91.2%
100 20 0.295 −1.7% 98.5% 0.068 0.065 94.2% 0.294 −1.9% 98.0% 0.0705 0.066 93.3%
200 5 0.297 −0.9% 100.0% 0.051 0.046 91.8% 0.296 −1.2% 100.0% 0.0567 0.0461 89.3%
200 20 0.297 −1.0% 100.0% 0.047 0.046 94.6% 0.296 −1.2% 100.0% 0.0502 0.0463 93.4%
200 40 0.297 −1.0% 100.0% 0.047 0.046 94.3% 0.296 −1.3% 100.0% 0.0493 0.0462 93.3%

Note.
a Average estimate of β over 2000 replications.
b Calculated as $(\bar{\hat{\beta}} - \beta)/\beta$.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as $(\overline{\widehat{SE}} - SD)/SD$.
g Proportion of 2000 replications in which the 95% confidence interval contained the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

Balanced allocation of patients to therapists (Table 1 and Tables S2–S3)

Degree of therapist-level nesting (ICC).

With balanced assignment of patients to therapists, ignoring therapist-level nesting did not impact estimated within-patient coefficients. In MLM, with manually patient-mean centered predictor variables, the estimates were identical for conditions of ICC = .00, .05, .20, or .40 (Table 1). In SEM, where centering is accomplished by latent variable modeling, estimates were highly similar but not identical (Tables S2–S3).

Number of therapists.

The 3-level models showed poor performance in the estimation of standard errors when the number of therapists was small. For the 3-level model, at least 10 therapists were needed to achieve accurate estimation of standard errors, as shown by coverage rates falling below 91% and alpha levels increasing to ~ 15% in the conditions with only 5 therapists (Table 1, Table S2 and Table S3). Thus, estimating a 3-level model with too few therapists increases the risk of Type-I error. In contrast, the 2-level model, which ignores therapist-level nesting, showed no problems regardless of the number of therapists.

Number of patients per therapist.

Overall, the number of patients per therapist did not impact the performance of either 2- or 3-level models for MLM or SEM (see Tables 1, S2 and S3).

Unbalanced allocation of patients to therapists (Tables 2, 4 and S4)

Table 2.

Simulation results for Person-mean centered model with unbalanced assignment of patients to therapists at ICC = 0, .05, .20, and .40.

2-level estimates 3-level estimates
β = .30 β SE β SE
NL2 NL3 Est.b Biasc Powerd SDe Est.f Coverg Est.b Biasc Powerd SDe Est.f Coverg
50 5 0.297 −0.9% 96.1% 0.080 0.076 93.7% 0.297 −0.9% 97.5% 0.080 0.055 74.7%
50 10 0.298 −0.6% 96.7% 0.078 0.076 93.9% 0.298 −0.6% 96.9% 0.078 0.067 87.7%
100 5 0.298 −0.8% 100.0% 0.055 0.055 94.4% 0.298 −0.8% 99.9% 0.055 0.047 84.7%
100 16 0.297 −1.1% 100.0% 0.056 0.055 94.0% 0.297 −1.1% 99.9% 0.056 0.050 89.0%
200 7 0.299 −0.4% 100.0% 0.039 0.039 94.9% 0.299 −0.4% 100.0% 0.039 0.033 85.8%
200 18 0.300 −0.1% 100.0% 0.039 0.039 94.7% 0.300 −0.1% 100.0% 0.039 0.036 90.8%
β = 0 Est.b Biasc Alphah SDe Est.f Coverg Est.b Biasc Alphah SDe Est.f Coverg
50 5 −0.003 6.2% 0.082 0.080 93.8% −0.0025 24.0% 0.082 0.057 76.0%
50 10 −0.002 6.4% 0.082 0.080 93.6% −0.0021 12.9% 0.082 0.070 87.1%
100 5 −0.002 5.4% 0.058 0.057 94.7% −0.0018 14.7% 0.058 0.049 85.3%
100 16 −0.003 6.4% 0.059 0.057 93.6% −0.0032 11.3% 0.059 0.052 88.7%
200 7 −0.001 6.0% 0.041 0.041 94.0% −0.0014 15.8% 0.041 0.035 84.2%
200 18 0.000 5.4% 0.041 0.041 94.6% −0.0001 8.6% 0.041 0.038 91.4%

Note.
a Since estimates were exactly identical for all ICC levels, only one set of results is presented in the table.
b Average estimate of β over 2000 replications.
c Calculated as $(\bar{\hat{\beta}} - \beta)/\beta$.
d Proportion of 2000 replications yielding a significant estimate when β = .30.
e Standard deviation of β over 2000 replications (used as estimate of the population standard error).
f Average of estimated standard errors over 2000 replications.
g Proportion of 2000 replications in which the 95% confidence interval contained the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

Table 4.

Simulation results for Random Intercept Cross-Lagged Panel Model with unbalanced assignment of patients to therapists at ICC = 0, .05, .20, and .40 and β = .30.

2-level estimates 3-level estimates
ICC = 0 β SE β SE
NL2 NL3 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.287 −4.3% 84.4% 0.097 0.093 93.3% 0.289 −3.8% 88.1% 0.095 20.887 78.0%
50 10 0.286 −4.6% 84.4% 0.095 0.093 93.8% 0.288 −4.1% 83.3% 0.094 11.554 87.8%
100 5 0.295 −1.7% 99.1% 0.064 0.065 95.1% 0.295 −1.6% 98.5% 0.064 0.057 87.0%
100 16 0.296 −1.5% 98.4% 0.066 0.065 94.3% 0.296 −1.4% 97.9% 0.066 0.061 90.2%
200 7 0.298 −0.7% 100.0% 0.044 0.045 95.6% 0.298 −0.6% 100.0% 0.044 0.039 86.4%
200 18 0.298 −0.8% 100.0% 0.044 0.045 96.1% 0.298 −0.8% 100.0% 0.044 0.043 92.7%
ICC = .05 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.287 −4.2% 84.4% 0.097 0.093 93.2% 0.289 −3.8% 87.3% 0.096 0.370 78.0%
50 10 0.287 −4.5% 84.3% 0.094 0.093 94.0% 0.288 −4.0% 81.8% 0.095 >100 88.0%
100 5 0.295 −1.6% 99.0% 0.065 0.065 95.0% 0.295 −1.5% 97.9% 0.064 0.061 87.4%
100 16 0.296 −1.5% 98.6% 0.066 0.065 94.3% 0.296 −1.5% 97.2% 0.066 0.077 90.8%
200 7 0.298 −0.7% 100.0% 0.045 0.045 95.2% 0.298 −0.6% 100.0% 0.044 0.039 86.3%
200 18 0.298 −0.8% 100.0% 0.045 0.045 96.1% 0.298 −0.8% 99.8% 0.044 0.045 92.9%
ICC = .20 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.288 −3.9% 85.2% 0.096 0.092 92.9% 0.290 −3.3% 88.5% 0.094 4.853 78.0%
50 10 0.288 −4.1% 84.9% 0.094 0.093 94.1% 0.289 −3.5% 84.1% 0.094 >100 87.7%
100 5 0.296 −1.5% 98.8% 0.065 0.065 94.8% 0.296 −1.2% 98.5% 0.063 >100 87.6%
100 16 0.296 −1.5% 98.4% 0.067 0.065 94.3% 0.296 −1.2% 97.6% 0.065 0.061 90.5%
200 7 0.298 −0.7% 100.0% 0.045 0.045 95.2% 0.299 −0.5% 99.9% 0.044 0.039 86.2%
200 18 0.298 −0.8% 100.0% 0.045 0.045 95.6% 0.298 −0.6% 100.0% 0.044 0.042 92.9%
ICC = .40 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.290 −3.3% 86.6% 0.095 0.091 93.3% 0.294 −2.2% 90.6% 0.090 20.178 77.2%
50 10 0.289 −3.6% 85.4% 0.093 0.092 94.0% 0.293 −2.2% 87.2% 0.091 >100 87.6%
100 5 0.296 −1.2% 99.1% 0.065 0.064 94.1% 0.298 −0.6% 99.0% 0.061 0.054 86.6%
100 16 0.296 −1.4% 98.6% 0.067 0.065 94.2% 0.298 −0.5% 98.9% 0.064 0.059 89.7%
200 7 0.298 −0.6% 100.0% 0.045 0.045 95.5% 0.300 −0.2% 99.9% 0.043 0.037 85.9%
200 18 0.298 −0.7% 100.0% 0.045 0.045 95.4% 0.299 −0.3% 100.0% 0.043 0.041 92.6%

Note.
a Average estimate of β over 2000 replications.
b Calculated as $(\bar{\hat{\beta}} - \beta)/\beta$.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as $(\overline{\widehat{SE}} - SD)/SD$.
g Proportion of 2000 replications in which the 95% confidence interval contained the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

For both MLM (Table 2) and SEM (Table 4 and S4), the two-level model was unaffected by unbalanced allocation, while the 3-level model showed poorer performance in terms of bias in standard errors compared to balanced conditions. Coverage rates for the three-level models were below the 91% threshold in almost all conditions, and alpha levels ranged between 9 and 24%.

Degree of therapist-level nesting (ICC).

Results were comparable to the balanced conditions, such that estimates did not depend on the degree of nesting (for MLM exactly identical results across all ICC levels, for SEM similar but not identical results).

Number of therapists.

In the 3-level models, results were similar to the balanced conditions such that the largest bias was seen in conditions with few therapists. In contrast, the 2-level models were unaffected by unbalanced assignment, regardless of the number of therapists.

Number of patients per therapist.

Since the number of patients per therapist varied within each condition (by definition to create unbalanced data), it is not possible to make inferences with certainty regarding results under this condition.

Varying coefficients among therapists (random slope models; Tables 3, 5 and S1)

Table 3.

Simulation results for Person-mean centered model with therapist-level random slopes (slope SD = 0.15).

2-level estimates 3-level estimates
ICCslope = .05 β SE β SE
NL2 NL3 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.299 −0.2% 94.8% 0.083 0.081 93.3% 0.300 −0.1% 91.4% 0.083 0.087 85.9%
50 10 0.298 −0.8% 95.1% 0.082 0.081 94.1% 0.298 −0.8% 89.6% 0.083 0.111 91.9%
100 5 0.299 −0.4% 99.8% 0.061 0.058 92.7% 0.299 −0.3% 98.9% 0.061 0.055 84.0%
100 20 0.299 −0.2% 100.0% 0.060 0.058 94.1% 0.299 −0.3% 98.9% 0.059 0.063 93.2%
200 5 0.299 −0.2% 100.0% 0.045 0.041 92.8% 0.299 −0.2% 99.8% 0.045 0.038 84.2%
200 20 0.300 0.1% 100.0% 0.042 0.041 94.4% 0.300 0.1% 99.8% 0.042 0.041 93.2%
200 40 0.301 0.4% 100.0% 0.042 0.041 93.9% 0.301 0.4% 99.9% 0.042 0.042 93.8%
ICCslope = .10 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.299 −0.2% 94.5% 0.084 0.081 92.9% 0.300 −0.1% 91.0% 0.084 0.119 85.9%
50 10 0.298 −0.7% 94.9% 0.083 0.081 93.9% 0.298 −0.7% 89.6% 0.083 0.164 92.0%
100 5 0.299 −0.4% 99.8% 0.063 0.058 92.4% 0.299 −0.3% 98.5% 0.063 0.058 84.8%
100 20 0.299 −0.2% 100.0% 0.060 0.058 94.2% 0.299 −0.2% 99.1% 0.060 0.061 93.1%
200 5 0.299 −0.3% 100.0% 0.048 0.041 91.0% 0.299 −0.3% 99.9% 0.047 0.039 85.2%
200 20 0.300 0.1% 100.0% 0.043 0.041 94.2% 0.300 0.1% 99.8% 0.043 0.044 93.2%
200 40 0.301 0.4% 100.0% 0.042 0.041 93.6% 0.301 0.4% 100.0% 0.043 0.042 93.8%
ICCslope = .20 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg
50 5 0.300 −0.2% 94.2% 0.087 0.081 92.4% 0.300 −0.1% 89.2% 0.087 0.100 85.8%
50 10 0.298 −0.5% 94.7% 0.084 0.081 93.8% 0.298 −0.5% 89.3% 0.084 0.105 92.2%
100 5 0.299 −0.3% 99.7% 0.066 0.058 91.0% 0.299 −0.3% 98.5% 0.066 0.063 84.7%
100 20 0.300 −0.2% 100.0% 0.061 0.058 93.8% 0.299 −0.2% 98.7% 0.060 0.066 93.3%
200 5 0.299 −0.4% 100.0% 0.052 0.041 87.7% 0.299 −0.4% 99.8% 0.052 0.044 85.3%
200 20 0.300 0.1% 100.0% 0.044 0.041 93.5% 0.300 0.1% 99.8% 0.044 0.043 93.0%
200 40 0.301 0.4% 100.0% 0.043 0.041 93.3% 0.301 0.4% 99.9% 0.043 0.042 93.7%

Note.
a Average estimate of β over 2000 replications.
b Calculated as $(\bar{\hat{\beta}} - \beta)/\beta$.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as $(\overline{\widehat{SE}} - SD)/SD$.
g Proportion of 2000 replications in which the 95% confidence interval contained the population value of β.

We created data for which the slopes for the Level-1 coefficient varied at both Level-2 (patients) and Level-3 (therapists). The 2-level estimation models were the same as previously, i.e. assuming a fixed Level-1 coefficient – thus ignoring random slopes altogether4 – while the 3-level model estimated random slopes at both Level-2 and Level-3. Overall, results were mostly unbiased in the fixed coefficient (2-level) model, in both MLM (Tables 3 and S1) and SEM (Table 5). Of note, none of the 3-level SEM models converged. In MLM, 3-level models converged, but there were problems in the estimation of standard errors in some of the conditions.

Total slope variance.

In conditions with larger total slope variance (SD = 0.15, compared to SD = 0.10) the 2-level model showed slightly worse coverage rates (Tables 3, S1 and 5). For the 3-level model, the amount of total slope variance did not impact estimation performance. However, in both conditions the 2-level model performed better than the 3-level model.

Proportion of slope variance at the therapist level (ICCslope).

For the 2-level model, a larger proportion of slope variance at the therapist level reduced coverage rates slightly. The 3-level model was almost unaffected by the increased slope ICC.

Number of therapists.

In this condition, increasing the number of therapists improved estimates in both the 3-level and 2-level models. However, this effect was smaller for the 2-level model than the 3-level model.

Number of patients per therapist.

In the 2-level model, a larger number of patients per therapist was related to worse estimation of standard errors. However, this effect was relatively small when evaluated by the coverage rate of the 95% confidence intervals. For the 3-level model, the number of patients per therapist did not affect estimation performance.

In sum, the 2-level model was superior to the 3-level model in all random slope conditions. The 2-level model performance was reduced in the following conditions: 1) the variance in slopes at the therapist level was very large; 2) the number of therapists was small (< 10); 3) the number of patients per therapist was very large (40 or more). Nevertheless, even in its worst condition (slope SD = 0.15, slope ICC = .20, N = 200 and 5 therapists), the 2-level model still outperformed the 3-level model.

Discussion

Our results suggest that therapist effects do not have a significant impact on within-patient estimates in mechanisms of change studies. Consistent with previous work (Moerbeek, 2004), therapist effects did not affect estimates in balanced design conditions (when all therapists treat the same number of patients), and with within-patient coefficients fixed to be the same across all therapists. This is the first study to expand this finding to various study conditions, including unbalanced designs (unequal assignment of patients to therapists), and variation in within-patient coefficients across therapists. We found that if anything, inclusion of therapist effects in 3-level models in these designs reduced model performance compared to when therapist effects were ignored (i.e. 2-level models).

This is the first study to show that therapist effects can be ignored when within-patient effects (i.e. Level-1 effects) are the primary focus in either MLM or SEM frameworks across various conditions. Taking therapist effects into account did not improve model performance in any of the simulation conditions. When the number of therapists was small, 3-level models resulted in increased risk of Type-I error. Although it is possible to improve estimation of MLM with small samples on the highest level, e.g. by using Restricted Maximum Likelihood with adjusted degrees of freedom (McNeish, 2017), this is seldom done in psychotherapy studies.

Previous simulation studies that focused on the study of treatment efficacy have shown that therapist effects should be modeled when investigating differential treatment effects, estimated at between-patient levels (Magnusson et al., 2018). Our results suggest that this conclusion does not apply to the study of within-patient effects (i.e. Level-1 effects in MLM terminology). This result challenges a common tendency to include therapist effects as a ‘better safe than sorry’ strategy in all psychotherapy studies.

Our findings also suggest that ignoring therapist-level random slopes does not, generally, affect coefficient estimates (which represent a weighted average across therapists). An exception was SEM models with very small samples (N = 50), in which the coefficients were slightly downward biased (just above the 5% criterion). Standard errors were biased only if the slope variance at the therapist level was large, the number of therapists was small, and/or the number of patients per therapist was large. Still, 2-level models outperformed 3-level models even in the most extreme condition, likely uncommon in psychotherapy studies, with a slope SD = 0.15 (in the context of standardized regression coefficients), slope ICC = .20, and only five therapists who each treated 40 patients. Notably, the simulation study by Magnusson et al. (2018) found substantial bias when ignoring random slopes of similar magnitudes5 when studying the effects of ignoring therapist effects in outcome research (i.e. the effect of ignoring Level-3 on Level-2 coefficients).

Our simulations showed that the statistical power to detect significant effects was above 90% in all MLM conditions tested, and above 80% for all SEM conditions, even when the sample size was as small as N = 50. Although the effect size used in the simulations was fairly large (β = .30), this finding is still notable. The high statistical power of these models is most likely attributable to the additional information provided by repeated observations beyond sample size; i.e. with N = 50 and T = 5, there are 50 × 4 = 200 observations6 used to calculate the within-patient effect.

MLM and SEM showed similar findings, although with slightly higher bias and lower power for the SEM models. Considering the greater complexity of the SEM models used, the reduction in power was smaller than expected, suggesting good performance of these models. Notably, an important disadvantage of the MLM models compared to the SEM models is that they do not allow adjusting the cross-lagged coefficient estimate for the dependent variable at the prior session. Ignoring the autoregressive effect in panel data results in mis-specified models and, in turn, an increased risk of biased estimates of the cross-lagged effects of interest.

Our findings should be considered in the context of several limitations. First, we were unable to estimate therapist-level random slope models in any of the SEM conditions tested. If a researcher is interested in the average effect of a putative mechanism (i.e. the within-patient effect), results from our 2-level models show that random slopes can be ignored as long as they are not very large, the number of therapists is not very small, and the number of patients per therapist is not very large. In practice, the size of random slopes is of course unknown, but if the other conditions (i.e. small number of therapists and large number of patients per therapist) are fulfilled, a researcher should consider whether large slope variance at the therapist level is expected. For our simulation study, we chose the maximum slope variance so that 95% of the slopes would not cross zero, because it is unlikely that any theoretically proposed mechanism in psychotherapy research would have opposite effects among therapists. However, if in a particular research context this may be a possibility, our simulation results may not hold.

If therapist-level random slopes are the focus of research interest, SEM might not be the best modeling framework to use. As mentioned, estimating random slopes in SEM is complex and requires numerical integration which is highly computationally demanding and time-consuming (Asparouhov & Muthén, 2007). Other estimators, such as Bayesian estimation, might be preferable but are beyond the scope of the current study. Additionally, it is likely that the performance of the 3-level MLM models with small numbers of therapists would have been improved with Restricted Maximum Likelihood, perhaps with some adjustment to degrees of freedom (McNeish, 2017). However, since the focus of our study was to investigate whether therapist effects impact within-patient estimates in 2-level models rather than testing the performance of 3-level models in small samples, we did not pursue this further.

Second, for feasibility purposes and given the number of conditions tested, we chose to focus on an effect size of β = .30, which represents a medium-sized effect common in mechanisms of change research. Of note, the β = 0 models (Tables 1–2) showed very similar findings to β = .30, and we also tested some of our models using smaller (.10 and .20) and larger (.40 and .50) effects, with comparable results. Future studies can expand our work by testing these models using a range of effect sizes.

Conclusions, recommendations, and suggestions for future research

Due to the increased interest in mechanisms of change research in general, and within-patient effects in particular, it is vital that researchers use the most robust and accurate methods available. Since adjusting for therapist effects has proven to be important in other areas of psychotherapy research, it is understandable that many researchers believe that this is the case for within-patient mechanism studies. However, our simulation study shows that when interest is in within-patient associations, and models are used that separate these from between-patient differences, the safest strategy is to ignore therapist effects. Thus, we recommend using 2-level models (i.e. repeated measures nested within patients) when estimating these effects. If researchers are specifically interested in therapist effects, these should be modeled, with an effort to recruit a sufficient number of therapists. Assuming unbalanced assignment of patients to therapists, this likely means at least 15–20 therapists. Alternatively, with fewer therapists, differences among therapists can be modeled using a fixed effects approach (Snijders & Bosker, 2012).

Further research is needed on optimal estimation of random slopes in SEM models. The newly developed Dynamic Structural Equation Model (DSEM; Asparouhov, Hamaker, & Muthén, 2018), which can be used for similar research questions (i.e. estimating within-patient cross-lagged coefficients), may be optimal, given that it combines the advantages of both MLM and SEM approaches. However, at present, it requires at least around 10 repeated measurements for stable effects and good model performance (Schultzberg & Muthén, 2018).

The growing interest in personalization of psychotherapy processes and the study of individual differences across treatment requires use of complex modeling techniques. Our study demonstrates that such modeling should be used with caution and awareness of potential pitfalls and challenges that may affect inferences from estimation results. When studying within-person effects, we encourage researchers to expand our line of inquiry by cautiously considering the impact of their study conditions on modeling, selecting the most optimal and appropriate statistical approach, and investigating whether results are replicated under a range of conditions. As with treatment packages, therapeutic processes, and mechanisms of change, when it comes to modeling – one size does not fit all.

Supplementary Material


Clinical or Methodological Significance of this Article:

Our findings suggest that therapist effects can be ignored in longitudinal studies focusing on within-patient effects. Including therapist effects in statistical models focusing on within-patient effects when the number of therapists is small and/or therapists treat unequal numbers of patients may even increase the risk of biased inference.

Footnotes

1

Errors are assumed to be independent in all of the models, unless otherwise specified.

2

It is also possible in SEM to assign different coefficients to each time-point, thus allowing for different effects between, say, time 1 to time 2 than from time 2 to time 3, and so on. However, we focus on comparable effects in SEM and MLM and thus only discuss models in which (cross-) associations are constrained to be equal over time, i.e. there is only one coefficient for the within-person effect of X on Y.

3

More tables are included in the online supplement, see Tables S1–S4.

4

A model that estimated random slopes at Level-2, i.e. still ignoring the therapist level, turned out to yield very similar estimates to the model with completely fixed slopes.

5

Our largest slope variance represents a variance ratio (slope variance/within-patient variance) of 0.022, which is in-between the two largest slope variance conditions (0.02 and 0.03) used in the study by Magnusson et al. (2018). In that study, however, the largest slope ICC tested was .10 while we also included slope ICC = .20.

6

Due to lagging, there are only T-1 repeated observations used in the estimate of β, which is why we multiply by 4 instead of 5.

Data transparency statement: The data for this study has never been used before.

Contributor Information

Fredrik Falkenström, Department of Behavioral Sciences and Learning, Linköping University, Sweden.

Nili Solomonov, Weill Cornell Institute of Geriatric Psychiatry, Weill Cornell Medical College, USA.

Julian A. Rubel, Department of Psychology, Justus Liebig University Giessen, Giessen, Germany

References

1. Adelson JL, & Owen J (2012). Bringing the psychotherapist back: Basic concepts for reading articles examining therapist effects using multilevel modeling. Psychotherapy, 49, 152–162. doi: 10.1037/a0023990
2. Allison PD, Williams R, & Moral-Benito E (2017). Maximum likelihood for cross-lagged panel models with fixed effects. Socius, 3, 2378023117710578. doi: 10.1177/2378023117710578
3. Asparouhov T, Hamaker EL, & Muthén B (2018). Dynamic structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 25(3), 359–388. doi: 10.1080/10705511.2017.1406803
4. Asparouhov T, & Muthén B (2007). Computationally efficient estimation of multilevel high-dimensional latent variable models. Paper presented at the 2007 JSM meeting, Salt Lake City, Utah: Section on Statistics in Epidemiology.
5. Asparouhov T, & Muthén B (2019). Comparison of models for the analysis of intensive longitudinal data. Structural Equation Modeling: A Multidisciplinary Journal, 1–23. doi: 10.1080/10705511.2019.1626733
6. Baird R, & Maxwell SE (2016). Performance of time-varying predictors in multilevel models under an assumption of fixed or random effects. Psychological Methods, 21(2), 175–188. doi: 10.1037/met0000070
7. Baldwin SA, & Imel ZE (2013). Therapist effects: Findings and methods. In Lambert MJ (Ed.), Bergin and Garfield's Handbook of Psychotherapy and Behavior Change (6th ed., pp. 258–297). New York: Wiley.
8. Barber JP, & Solomonov N (2019). Toward a personalized approach to psychotherapy outcome and the study of therapeutic change. World Psychiatry, 18(3), 291–292. doi: 10.1002/wps.20666
9. Beck AT, Rush A, Shaw B, & Emery G (1979). Cognitive therapy of depression. New York: Guilford Press.
10. Cohen J (1992). A power primer. Psychological Bulletin, 112, 155–159.
11. Crits-Christoph P, & Mintz J (1991). Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology, 59, 20–26.
12. Curran PJ, & Bauer DJ (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. doi: 10.1146/annurev.psych.093008.100356
13. Curran PJ, Howard AL, Bainter SA, Lane ST, & McGinley JS (2014). The separation of between-person and within-person components of individual change over time: A latent curve model with structured residuals. Journal of Consulting and Clinical Psychology, 82(5), 879–894. doi: 10.1037/a0035297
14. de Jong K, Moerbeek M, & van der Leeden R (2010). A priori power analysis in longitudinal three-level multilevel models: An example with therapist effects. Psychotherapy Research, 20, 273–284. doi: 10.1080/10503300903376320
15. Falkenström F, Finkel S, Sandell R, Rubel JA, & Holmqvist R (2017). Dynamic models of individual change in psychotherapy process research. Journal of Consulting and Clinical Psychology, 85(6), 537–549. doi: 10.1037/ccp0000203
16. Falkenström F, Granström F, & Holmqvist R (2013). Therapeutic alliance predicts symptomatic improvement session by session. Journal of Counseling Psychology, 60, 317–328. doi: 10.1037/a0032258
17. Falkenström F, Solomonov N, & Rubel J (2020). Using time-lagged panel data analysis to study mechanisms of change in psychotherapy research: Methodological recommendations. Counselling and Psychotherapy Research. Advance online publication. doi: 10.1002/capr.12293
18. Firebaugh G, Warner C, & Massoglia M (2013). Fixed effects, random effects, and hybrid models for causal analysis. In Morgan SL (Ed.), Handbook of Causal Analysis for Social Research (pp. 113–132). Dordrecht: Springer Netherlands.
19. Hallquist MN, & Wiley JF (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621–638. doi: 10.1080/10705511.2017.1402334
20. Hamaker EL (2012). Why researchers should think 'within-person': A paradigmatic rationale. In Mehl MR, & Conner TS (Eds.), Handbook of research methods for studying daily life (pp. 43–61). New York: Guilford Press.
21. Hamaker EL, Kuiper RM, & Grasman RPPP (2015). A critique of the cross-lagged panel model. Psychological Methods, 20, 102–116. doi: 10.1037/a0038889
22. Hatcher RL, Lindqvist K, & Falkenström F (2019). Psychometric evaluation of the Working Alliance Inventory—Therapist version: Current and new short forms. Psychotherapy Research, 1–12. doi: 10.1080/10503307.2019.1677964
23. Hoffman L, & Stawski RS (2009). Persons as contexts: Evaluating between-person and within-person effects in longitudinal analysis. Research in Human Development, 6, 97–120. doi: 10.1080/15427600902911189
24. Kazdin AE (2007). Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology, 3, 1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432
25. Lambert MJ (Ed.) (2013). Bergin and Garfield's Handbook of Psychotherapy and Behavior Change (6th ed.). New York: Wiley.
26. Lundh LG, & Falkenström F (2019). Towards a person-oriented approach to psychotherapy research. Journal for Person-Oriented Research, 5(2), 65–79.
27. Magnusson K, Andersson G, & Carlbring P (2018). The consequences of ignoring therapist effects in trials with longitudinal data: A simulation study. Journal of Consulting and Clinical Psychology, 86(9), 711–725. doi: 10.1037/ccp0000333
28. McNeish D (2017). Small sample methods for multilevel modeling: A colloquial elucidation of REML and the Kenward-Roger correction. Multivariate Behavioral Research, 52(5), 661–670. doi: 10.1080/00273171.2017.1344538
29. Mehta PD, & Neale MC (2005). People are variables too: Multilevel structural equations modeling. Psychological Methods, 10, 259–284. doi: 10.1037/1082-989X.10.3.259
30. Moerbeek M (2004). The consequence of ignoring a level of nesting in multilevel analysis. Multivariate Behavioral Research, 39(1), 129–149. doi: 10.1207/s15327906mbr3901_5
31. Molenaar PCM, & Campbell CG (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18(2), 112–117. doi: 10.1111/j.1467-8721.2009.01619.x
32. Muthén LK, & Muthén BO (1998–2017). Mplus user's guide. Los Angeles, CA: Muthén & Muthén.
33. Muthén LK, & Muthén BO (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599–620.
34. Nickell S (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6), 1417–1426. doi: 10.2307/1911408
35. Rubel JA, Zilcha-Mano S, Giesemann J, Prinz J, & Lutz W (2019). Predicting personalized process-outcome associations in psychotherapy using machine learning approaches—A demonstration. Psychotherapy Research, 1–10. doi: 10.1080/10503307.2019.1597994
36. Schultzberg M, & Muthén B (2018). Number of subjects and time points needed for multilevel time-series analysis: A simulation study of Dynamic Structural Equation Modeling. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 495–515. doi: 10.1080/10705511.2017.1392862
37. Snijders TAB, & Bosker RJ (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage.
38. Summers RF, & Barber JP (2010). Psychodynamic therapy: A guide to evidence-based practice. New York, NY: Guilford Press.
39. Wampold BE, & Serlin RC (2000). The consequence of ignoring a nested factor on measures of effect size in analysis of variance. Psychological Methods, 5(4), 425–433. doi: 10.1037/1082-989X.5.4.425
40. Wang LP, & Maxwell SE (2015). On disaggregating between-person and within-person effects with longitudinal data using multilevel models. Psychological Methods, 20, 63–83. doi: 10.1037/met0000030
41. Xu H, & Tracey TJG (2015). Reciprocal influence model of working alliance and therapeutic outcome over individual therapy course. Journal of Counseling Psychology, 62, 351–359.
42. Zilcha-Mano S, & Errázuriz P (2015). One size does not fit all: Examining heterogeneity and identifying moderators of the alliance–outcome association. Journal of Counseling Psychology, 62(4), 579–591. doi: 10.1037/cou0000103
43. Zyphur MJ, Allison PD, Tay L, Voelkle M, Preacher KJ, Zhang Z, … Diener E (2019). From data to causes I: Building a general cross-lagged panel model (GCLM). Organizational Research Methods. doi: 10.1177/1094428119847278
