Abstract
Researchers and prevention scientists often develop interventions to target intermediate variables (known as mediators) that are thought to be related to an outcome. When researchers target a mediating construct measured by self-report, the meaning of the self-report measure could change from pretest to posttest for the individuals who received the intervention – a phenomenon referred to as response shift. As a result, any observed changes on the mediator measure across groups or across time might reflect a combination of true change on the construct and response shift. Although previous studies have focused on identifying the source and type of response shift in measures after an intervention, there has been limited research on how using sum scores in the presence of response shift affects the estimation of mediated effects via statistical mediation analysis, which is critical for explaining how the intervention worked. In this paper, we focus on recalibration response shift: a change in internal standards of measurement that affects how respondents interpret the response scale. We provide background on the theory of response shift and the methodology used to detect response shift (i.e., tests of measurement invariance). Additionally, we use simulated datasets to illustrate how recalibration in the mediator can bias estimates of the mediated effect and distort type I error rates and power.
Keywords: response shift, statistical mediation, measurement invariance, randomized intervention, pretest-posttest design
A key aspect of intervention research is to determine how the intervention works. Statistical mediation analysis is an analytic technique that introduces intermediate variables, known as mediators [M], to explain how an intervention [X] transmits its effect to an outcome [Y] (Baron & Kenny, 1986; MacKinnon, 2008; VanderWeele, 2015). Statistical mediation plays a critical role in the advancement of intervention research by identifying the program components that are beneficial, iatrogenic, or that need to be reinforced (MacKinnon & Dwyer, 1993). Beyond testing whether an intervention changed the mediator (e.g., a manipulation check), if a particular mediator is identified in one intervention, that knowledge could be extended to other types of treatment. For example, if we found that a program component reduced cravings in a sample of smokers, which then led to reduced smoking, we could use that program component in other preventive interventions for addictive behaviors.
Mediators are often assessed by self-report measures. An inherent assumption made when using self-report measures is that all respondents interpret and respond to the measure in the same way, such that a particular response has the same meaning for all individuals. Consider a hypothetical example that we will refer to throughout the paper featuring a randomized intervention to reduce alcohol and drug cravings (inspired by Hsiao et al., 2019). Participants in the treatment group received a mindfulness-based relapse prevention intervention, and the controls received a twelve-step abstinence-based program. The intervention targets self-awareness, a facet of mindfulness, which is thought to mediate the relation between the mindfulness intervention and reduced cravings. Suppose that as a result of undergoing the intervention, respondents in the treatment group develop a different interpretation of the mediator, self-awareness, than what they had at baseline. When this occurs, observed changes in the self-awareness measure could reflect both true change in the construct of self-awareness as well as differences in interpretation across respondents. When the meaning of the responses on the self-report measure changes as a result of the treatment, this is called a response shift.
While the term response shift and similar concepts were initially discussed in educational training interventions (Howard, 1980) and organizational research (Golembiewski et al., 1976), response shift has primarily been discussed with respect to measures of health-related quality of life (QoL; Oort, Visser & Sprangers, 2009). Response shift provided an explanation for counterintuitive findings in which individuals with severe life-threatening illnesses reported QoL that was equal or superior to what was reported before their diagnosis, or relative to healthy individuals (Sprangers & Schwartz, 1999). Although response shift has been widely studied in QoL research, there has not yet been an investigation of how response shift in mediators may impact our understanding of how an intervention works. Therefore, the goal of this paper is to illustrate how response shift might be manifested in a mediator and affect the estimation of the mediated effect. The structure of the paper is the following. First, we provide background on statistical mediation and response-shift theory. Then, we discuss how potential response shift can be detected using latent variable models. Next, we use simulated datasets to illustrate how recalibration, a type of response shift, in the mediator affects the estimation of the mediated effect when sum scores are used. Finally, we summarize the implications of our illustration and discuss limitations and future directions.
Statistical Mediation Analysis
In pretest-posttest intervention studies, an appropriate model to test for mediation is the two-wave mediation model (Cole & Maxwell, 2003; MacKinnon, 2008; Valente & MacKinnon, 2017). Examples of studies that used the two-wave mediation model have recently appeared in various fields such as in mental health and clinical psychology (e.g., Behrendet et al., 2020), developmental psychology (e.g., Luengo Kanacri et al., 2019), physical health or sports science (e.g., Plow et al., 2020), and sociology (e.g., Bruneau, Kteily & Urbiola, 2020). The two-wave mediation model is represented by the following equations (also see Figure 1),
M2 = i1 + a·X + s1·M1 + b2·Y1 + e1 (1)

Y2 = i2 + c′·X + s2·Y1 + b1·M1 + b3·M2 + e2 (2)
where X is our binary treatment or control group indicator, M is the mediator (in our case, self-awareness) measured at pretest and posttest (M1 and M2), and Y is the outcome (in our case, craving) also measured at pretest and posttest (Y1 and Y2) – all variables are observed. The coefficients of Equations 1 and 2 are explained below Figure 1. In the self-awareness example, we posit that the intervention changed the self-awareness mediator and that self-awareness then changed the cravings outcome. The mediated effect is defined by the product of the a and b3 paths, and the significance of ab3 provides evidence supportive of mediation.
Figure 1.
Two-wave mediation model with observed mediator

Note: Boxes refer to observed variables; X is a binary variable indicating treatment group; a is the effect of X on M2; s1 is the stability of M; s2 is the stability of Y; b1 is the cross-lag path between M1 and Y2; b2 is the cross-lag path between Y1 and M2; b3 is the effect of M2 on Y2; c′ is the direct effect of X on Y2. Additionally, from Equations 1 and 2, i1 and i2 are regression intercepts and e1 and e2 are regression residuals (not shown in diagram).
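To make Equations 1 and 2 concrete, the sketch below fits the two-wave mediation model in lavaan. It is a minimal illustration rather than the authors' code: the data frame `dat` and its variables X (0/1 treatment indicator), M1, M2, Y1, and Y2 are assumed names, and the delta-method confidence interval that lavaan reports is only one way to test ab3 (the distribution-of-the-product approach used in this paper is shown later).

```r
library(lavaan)

# Two-wave mediation model (Equations 1 and 2); `dat` is an assumed data
# frame with a 0/1 treatment indicator X and observed scores M1, M2, Y1, Y2.
model <- '
  M2 ~ a*X  + s1*M1 + b2*Y1            # Equation 1
  Y2 ~ cp*X + s2*Y1 + b1*M1 + b3*M2    # Equation 2 (cp is c-prime)
  ab3 := a*b3                          # mediated effect
'
fit <- sem(model, data = dat)
summary(fit, ci = TRUE)                # ab3 with a delta-method CI
```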
Several assumptions need to be met so that the mediated effect is given a causal interpretation (MacKinnon, 2008). We assume that the functional form and temporal precedence between the variables have been correctly specified, that there is no unmeasured confounding of the X-M2 and X-Y2 relations conditional on pretest measures M1 and Y1, no unmeasured confounding of the M2-Y2 relation conditional on X and the pretest measures, and no post-treatment confounders of the M2-Y2 relation affected by X conditional on the pretest measures (Mayer et al., 2014; Pearl, 2014; Valente et al., 2019; Valeri & VanderWeele, 2013). In this paper, we focus on the assumption that the mediator has been accurately assessed (Gonzalez & MacKinnon, in press), specifically that the mediator was measured consistently across respondents and time (i.e., no response shift). Below, we provide more detail on response shift theory and how to detect response shift.
Response Shift Theory
Sprangers and Schwartz (1999) defined response shift as a change in the meaning of an individual's self-evaluation (i.e., responses to the self-report measure) and described three ways in which it occurs, which we refer to as types of response shift: (1) recalibration, which is a change in internal standards of measurement; (2) reprioritization, which is a change in "values", or a reevaluation of the importance of various domains that are relevant to the target construct; or (3) reconceptualization, which is a redefinition of the target construct (see Table 1 and Oort, 2005 for examples). Oort (2005) specifies recalibration as a change in the meaning of the values on the item response scale, reprioritization as a change in the importance of the item to the measurement of the target construct, and reconceptualization as a change in the meaning of the item content. Further, Sprangers and Schwartz (1999) define catalysts as changes in health status (e.g., an intervention, elapsed time, a diagnosis, or medical procedure), and mechanisms as behavioral, cognitive, and affective processes that accommodate the catalyst (e.g., coping or social comparison) but are unrelated to true change in the construct. Their theoretical model of response shift proposes that a catalyst triggers a mechanism, which then changes the meaning of responses to the measure via recalibration, reprioritization, or reconceptualization. In general, response shift is a concern because it could mean that observed changes on a self-report measure do not reflect true change in the target construct (Oort et al., 2009).1 Based on this theoretical model, response shift could potentially occur in the context of an intervention whenever self-report measures are used.
Table 1.
Summary of Levels of Invariance, Response Shift Terminology, and Examples from the Literature.
| Invariance Model | Response Shift Term | Parameters Tested/Hypothesis | Consequences of Noninvariance | Example |
|---|---|---|---|---|
| Scalar invariance | Recalibration | Intercepts consistent across groups or timepoints: τg = τ or τt = τ | Moderate; common and assumes that the same construct has been measured | In a study investigating various treatments for depression, Fokkema et al. (2013) identified recalibration response shift in eight items on the Beck Depression Inventory such that the intercepts increased over time (indicating greater levels of depression). These appeared to be stronger for those receiving psychotherapy than medication. If ignored, the authors point out, the observed item scores would overestimate the level of depression. |
| Metric invariance | Reprioritization | Factor loadings consistent across groups or timepoints: λg = λ or λt = λ | Moderate-severe; may not be measuring the same construct | Carlier et al. (2019) found response shift in a sample of individuals receiving outpatient treatment in two items – one cognitive ("I could not concentrate well") and one somatic ("I was shaking or trembling") – such that the factor loadings were higher after treatment than before. The authors concluded that the patients placed more value on these problems posttreatment. |
| Configural invariance | Reconceptualization | Pattern of fixed/free factor loadings consistent across groups (i.e., same structure, number of factors) | Severe; not measuring the same construct | Carlier et al. (2019) found in a sample of individuals receiving outpatient psychiatric treatment (multiple diagnoses) that items assessing suicidal ideation and hopelessness broke apart from the mood subscale to form a distinct factor, which was different from the structure of the items at pretreatment. The authors concluded that these concepts became more distinct after treatment whereas the other mood-related items did not. |
In a review of response shift in QoL measures, Sajobi et al. (2018) reported that recalibration response shift occurred in 85% of the studies reviewed, making it the most common type. For this reason, we focus primarily on recalibration in the main text and discuss examples of reconceptualization and reprioritization in Supplement 5. To illustrate recalibration, suppose that our self-awareness mediator is measured by the eight-item acting with awareness subscale of the Five Facet Mindfulness Questionnaire (FFMQ; Baer, Smith, Hopkins, Krietemeyer & Toney, 2006). At pretest, a respondent in the treatment group interprets the item "I am easily distracted" as referring to becoming distracted by checking smartphone notifications and endorses the response of 5-"Very often or always true," which they interpret as meaning that they are distracted by their notifications approximately once per day. Suppose that during the intervention, the individual learns that for some people, distractions can actually occur so frequently that they affect the ability to complete tasks. At posttest, the respondent has not increased on self-awareness (the target construct) and is still distracted by smartphone notifications daily. However, because of what they learned during the intervention, they engage in social comparison and now interpret the response 5-"Very often or always true" as becoming distracted by notifications approximately once per hour. Therefore, they now endorse a 2-"Rarely true" since they consider a daily distraction to be a lower frequency given this new information.
Relating this example to Sprangers and Schwartz’s (1999) theoretical model, as a result of the intervention (i.e., the catalyst), they engaged in social comparison (i.e., the mechanism) which led to a shift in their internal standards (i.e., recalibration), and as a result, the meaning of their responses at posttest has changed – the response options now refer to different levels of being distracted than they did before. In contrast, an individual in the control group did not experience response shift as they have not learned new information from the intervention. If this shift was consistent for all individuals in the treatment group, the raw scores would show improvement in self-awareness from pretest to posttest, but this improvement is occurring because of changes in internal standards of self-awareness, not a true increase in self-awareness. The next section describes the detection of response shift using latent variable models.
Detecting Response Shift Using Tests of Measurement Invariance
As outlined in Oort et al. (2009), response shift can be understood from either a conceptual perspective or a measurement perspective depending on whether one views response shift as leading to true change in the observed variable of interest or measurement bias (i.e., a systematic difference in how the variable is measured). Either perspective has implications for the methodology used to identify response shift. Throughout this paper, we have embraced the measurement perspective and posit that response shift results in measurement bias – meaning that observed changes in the outcome variable do not necessarily reflect true change.2 Moreover, under a so-called "broad view" of response shift (Oort et al., 2009; Oort, 2005), there is less focus on identifying the precise mechanism causing response shift, whereas a "narrow definition" would argue that measurement bias must be caused by a particular mechanism (e.g., adaptation, coping, social comparison) to qualify as response shift. In this paper, we adopt a broad view of response shift, and therefore do not comment further on specific mechanisms and whether they lead to response shift, as we believe this would be dependent on the application. Under this perspective and view, response shift can be detected by using tests for measurement invariance (Meredith, 1993) with Confirmatory Factor Analysis (CFA; Oort, 2005). Measurement invariance (i.e., a lack of measurement bias) is a very technical subject, and interested readers are referred to Millsap (2011) for a full discussion of the topic (for longitudinal invariance, see Millsap and Cham, 2011, and Chapter 14 of Grimm, Ram and Estabrook, 2017). Here, we offer an overview of measurement invariance insofar as it relates to response shift.
First, we define a CFA model. Drawing from our example, suppose that researchers want to measure the participants’ level of self-awareness using a self-report measure consisting of eight items. We assume that the responses to the items are imperfect representations of their true or latent level of self-awareness, and that differences in item responses are due to differences in the latent level of self-awareness. We use a one-factor CFA model to map the theoretical relation between the eight item responses we observe for individual i (vector xi) and their latent level (i.e., their true level) of the self-awareness construct (ξi) as follows:
xi = τ + λ·ξi + ϵi (3)
where λ is a vector of eight factor loadings, which represent the strength of the relation between each item in the measure and the self-awareness construct; τ is a vector of eight item intercepts, which are the expected values of x when ξ is zero; and ϵi is a vector of unique scores, which captures any influences other than ξi that determine a participant's responses, thus containing both random measurement error as well as variability specific to the item content.
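For readers who prefer code, a one-factor CFA like Equation 3 can be specified in lavaan as below. This is a sketch with hypothetical item names (aw1 through aw8) and an assumed data frame `dat`, not the authors' scripts.

```r
library(lavaan)

# One-factor CFA for a hypothetical eight-item awareness measure (Equation 3).
cfa_model <- 'awareness =~ aw1 + aw2 + aw3 + aw4 + aw5 + aw6 + aw7 + aw8'

# meanstructure = TRUE estimates the item intercepts (tau) along with the
# loadings (lambda) and unique variances.
fit <- cfa(cfa_model, data = dat, meanstructure = TRUE)
summary(fit, standardized = TRUE)
```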
Conceptually, measurement invariance means that the measurement parameters (i.e., τ, λ, and VAR(ϵ)) relating the observed responses to the latent variables are equivalent across groups or across time. In other words, the relation between the latent variable and item responses is the same across groups and time. Measurement noninvariance (i.e., measurement bias), on the other hand, refers to situations in which these parameters are not equivalent. When sum scores are used, measurement invariance must hold for τ, λ, and VAR(ϵ) in order to make valid group comparisons.3 A problem arises when invariance is violated and sum scores are used because any observed differences across time or groups may not reflect true differences on the construct of interest. To explicate why measurement noninvariance is an issue for group mean comparisons, consider the expression for the mean of the observed variable x1g:
E(x1g) = τ1g + λ1g·κg (4)
where E(x1g) is the mean of the item x1 for group g, and κg is the group mean of the latent factor for group g (i.e., κg = E[ξg]). Assuming that the groups have the same factor loading (i.e., λ1g = λ1) for this item, this equation shows that two groups might have different means on a particular item, x1, because (1) the groups have a different mean on the latent factor ξ (i.e., true difference), (2) the groups have equal means for ξ but different intercepts, or (3) a combination of (1) and (2). At the level of the sum scores, if two groups do not differ on the latent variable but the intercepts vary, the intercept differences will masquerade as true differences on the construct.
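A toy calculation makes the conflation explicit. The numbers below are assumptions chosen to echo the simulation reported later (a loading of .55 and intercepts of .50 versus .72), not estimates from real data.

```r
# Equation 4 with assumed values: equal latent means, noninvariant intercepts.
lambda1 <- 0.55                                # loading, invariant across groups
kappa   <- c(control = 0, treatment = 0)       # equal latent means (no true change)
tau1    <- c(control = 0.50, treatment = 0.72) # recalibrated treatment intercept

tau1 + lambda1 * kappa   # observed item means: 0.50 (control) vs 0.72 (treatment)
# The .22 difference is pure recalibration; nothing changed on the latent factor.
```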
To detect measurement noninvariance, Equation 3 would be expanded to allow the intercepts (τ), loadings (λ), and residual variances (VAR(ϵ)) to vary by group in a multiple-group CFA model.4 Then, we use significance tests to evaluate whether the parameters are equivalent across respondents from different groups (i.e., treatment and control group) and across time (i.e., pretest and posttest). In order to attribute observed differences in sum scores to true differences in the latent variables (Millsap & Olivera-Aguilar, 2012), researchers would typically compare the fit of three latent variable models: 1) the configural invariance model, to assess if the same factor structure holds for each group (i.e., the same number of factors, same cross loadings, and correlated residuals); 2) the metric invariance model, to assess if the factor loadings are equivalent across groups (e.g., λg = λ); and 3) the scalar invariance model, to assess if the item intercepts are equivalent across groups (e.g., τg = τ). Box 1 provides details about these tests. If the measure is invariant, this means that the relation between ξi and xi is consistent across groups and, critically, that the observed sum scores reflect true differences in the latent factor rather than measurement bias.
Box 1. Overview of Steps of Measurement Invariance Testing.
When testing for measurement invariance, a series of nested structural equation models is fit using software such as Mplus or the R package lavaan. Each subsequent model adds constraints and is then compared to the previous model.
The steps are as follows:
Step 1. The configural invariance model has the same structure, or pattern of fixed and free factor loadings for each group or timepoint.
Identification: Typically, one item is chosen as the reference indicator and the loading is set to one and the intercept to zero for each group/timepoint.
Evaluation: The appropriateness of the configural invariance model is determined using fit indices common in SEM (e.g., model χ2, CFI, TLI, RMSEA).
Interpretation: When configural invariance is not supported, the observed measures are understood to represent different constructs within each group. Reconceptualization response shift corresponds to a failure to find configural invariance after an intervention has occurred.
Step 2. If configural invariance holds, the metric invariance model, also referred to as the weak invariance model, is tested by constraining factor loadings to be equal across groups or timepoints.
Identification: The factor means are set to zero in both groups and the variances are free to vary.
Evaluation: A likelihood ratio test compares the model χ2 of this model to the configural model.
Interpretation: If the p-value from the likelihood ratio test is non-significant, then the hypothesis of equal factor loadings across groups or timepoints can be retained. Reprioritization response shift would result in different factor loadings across groups, and a failure to find metric invariance indicates that the relationship of the items to the latent variables differs across groups.
Step 3. If metric invariance holds, the scalar invariance model (or the strong invariance model) is tested by constraining all intercepts to be equal across groups or timepoints.
Identification. The factor mean is set to 0 in one group and estimated freely in the other group.
Evaluation: Compare the fit of this model to the metric invariance model using a likelihood ratio test.
Interpretation: A non-significant p-value indicates that the hypothesis of equal intercepts could be retained. Recalibration response shift results in different intercepts across groups and a failure to find scalar invariance.
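As a sketch of how the three steps in Box 1 look in practice, the lavaan code below fits the configural, metric, and scalar models across treatment groups and compares them with likelihood ratio tests. The item names (m2_1 through m2_6), the grouping variable X, and the data frame `dat` are assumptions, and lavaan's default identification constraints differ slightly from the reference-indicator approach described in Step 1.

```r
library(lavaan)

# One-factor model for six posttest mediator items (hypothetical names).
cfa_model <- 'M2f =~ m2_1 + m2_2 + m2_3 + m2_4 + m2_5 + m2_6'

# Step 1: configural invariance, same structure with parameters free per group.
fit_configural <- cfa(cfa_model, data = dat, group = "X")

# Step 2: metric invariance, equal factor loadings across groups.
fit_metric <- cfa(cfa_model, data = dat, group = "X",
                  group.equal = "loadings")

# Step 3: scalar invariance, equal loadings and item intercepts.
fit_scalar <- cfa(cfa_model, data = dat, group = "X",
                  group.equal = c("loadings", "intercepts"))

# Likelihood ratio tests between adjacent models (Steps 2 and 3 of Box 1).
anova(fit_configural, fit_metric, fit_scalar)
```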
Recall that in this paper we focus on recalibration response shift, which would appear as a violation of scalar invariance (i.e., nonequivalent intercepts). Returning to the example provided for recalibration with the item "I am easily distracted", where a response of a 2 at posttest had the same meaning as a 5 at pretest, assume that the true level of self-awareness is constant for all individuals in the treatment group and all responses are shifted down by three points.5 Recalibration would shift the observed means downward for this item and result in a violation of scalar invariance since this change occurred despite self-awareness remaining constant. The connection between violations of invariance and the other types of response shift is described in Supplement 5.
Up to this point, we have discussed invariance with respect to groups, but in the two-wave model, we would need to test for invariance across groups and across time. Noninvariance in the mediator could arise in four main patterns in the two-wave model. First, the mediator could be noninvariant across groups at pretest (and hold at posttest), but this is unlikely because random assignment should ensure the groups are approximately equal prior to treatment. Second, ignoring group assignment, noninvariance could occur across time (i.e., from pretest to posttest), suggesting that some other influence, such as development, resulted in a change in the intercepts that was consistent across groups. This is commonly referred to as maturation.6 Third, the intercepts could be noninvariant across time for the control group and invariant across time for the treatment group, but this would be a surprising result and the explanation would require specific knowledge about the intervention and research design. Finally, the intercepts could be invariant across time for the control group, but noninvariant for the treatment group, which would suggest that recalibration response shift due to the intervention has occurred since the treatment group had received the intervention and the control group had not. Figure 2 is a flowchart for making modeling decisions based on measurement invariance tests, and Supplement 2 is a tutorial on how to test these models.
Figure 2.
Flow Chart for Testing for Invariance and Response Shift
Note: This flowchart focuses on scalar invariance (i.e., invariant intercepts), but could be used similarly for metric invariance (i.e., invariant loadings).
In sum, the relationship between response shift and measurement invariance is reciprocal. Measurement invariance provides a statistical definition for response shift as well as a methodological tool for assessing whether response shift may have occurred in an intervention (summarized in Box 1). On the other hand, the theory of response shift provides an explanation for why measurement noninvariance could occur specifically in an intervention context. Most of the literature on measurement noninvariance focuses on research scenarios in which groups will be compared, or on studying growth across development. However, there is a lack of theoretical work on causes of measurement noninvariance in psychology, making the theory of response shift an important consideration and an impetus for incorporating measurement invariance tests into intervention work. Moreover, it is unclear what the specific consequences of recalibration response shift would be in the two-wave mediation model. Below, we demonstrate the consequences of recalibration response shift in the two-wave model using a simulated illustration.
Illustration
When researchers design an intervention to target a mediator and there is random assignment, response shift could occur due to either the intervention (i.e., response shift due to treatment) or time (i.e., maturation). Previous research on cross-sectional mediation models suggests that when there is measurement noninvariance in the mediator, but this is not accounted for in the model, the mediated effect could be biased and type I error rates could be higher than .05 (Guenole & Brown, 2014; Olivera-Aguilar, Rikoon, Gonzalez, Kisbu-Sakarya & MacKinnon, 2017; Williams et al., 2010). However, these prior studies are limited because they did not use a longitudinal model and because they represented the mediator with latent variables rather than sum scores, even though sum scores are the most common way to represent mediators in pretest-posttest studies (MacKinnon, 2008; Valente & MacKinnon, 2017).
In our illustration, we expand previous methodological work by examining how response shift would affect the estimation of the two-wave mediation model (Gonzalez, Valente, & MacKinnon, 2017) when sum scores are used. These examples feature recalibration only, as it appears to be the most common type of response shift in interventions (Sajobi et al., 2018). In the illustration, we show how the power, type I error rates, and bias related to the mediated effect are affected when recalibration (due to the intervention or maturation) in the mediator is ignored.
Data Generation
The illustrations below are inspired by the mindfulness intervention example that we have discussed throughout, where a randomized, treatment-control group intervention targets self-awareness to reduce alcohol and drug cravings. Datasets were simulated in the R statistical environment using the R package lavaan (Rosseel, 2012). The conditions for the simulated examples were chosen to reflect realistic data scenarios. In particular, we chose small effect sizes among variables and a sample size of N = 650 (n = 325 in each of the treatment and control groups) to maintain power of .79 to detect the mediated effect. A conceptual representation of the data-generating model is presented in Figure 3.
Figure 3.
Path diagram of the conceptual model used for data-generation.
Note. Data were generated from a two-group model that represented the groups indexed by X. The paths emanating from X were derived by specifying different intercepts per group for variables M2 and Y2. The intercept values for M2 and Y2 were the true values for a and c’, respectively. Circles are latent variables and squares are observed variables. The different indicator intercepts across groups on M2 and the residual correlations across time points for the indicators are not included in the figure to reduce clutter.
The binary variable X represents treatment group, Y1 and Y2 were continuous, normally-distributed variables, and M1 and M2 were latent variables, each defined by six continuous items. For the items, the loadings were invariant and were specified to be λ = (1.0, .65, .55, .60, .50, .80) (standardized = .66, .49, .52, .58, .47, .67) at each time point. The residual variances were 1.3, 1.3, .8, .7, .9, .8 (standardized = .58, .71, .69, .65, .73, .57). The composite reliability (omega) was .82. The residual item covariances were .20 (residual item correlations = .15, .15, .25, .29, .22, .25) for all items across time points. To introduce a medium effect size of recalibration due to treatment (as estimated by Olivera-Aguilar et al., 2017), we specified two out of the six indicator intercepts for M2 to differ between the treatment and control group (see Table 2; the item intercepts for the control group were the same as in the invariant condition). Similarly, recalibration due to maturation was introduced by specifying the item intercepts for M1 and M2 to differ between pretest and posttest. However, the estimation of the mediated effect was not adversely affected by maturation (code for conditions with maturation response shift is provided in Supplement 3). The intuitive explanation for this finding is that maturation results in the same increase/decrease in the average sum scores for the mediator in both groups. In other words, if the average sum scores increased by 2 points in the treatment group due to noninvariance, they would also increase by 2 points for the control group. The a-path, which represents the group difference (adjusting for M1), would therefore only reflect true differences across the groups, so maturation does not appear to impact the mediated effect estimates.

Table 2 also presents the standardized7 and unstandardized true values for a, b3, and c′. For all simulated datasets, the stability was .70 between M1 and M2 and between Y1 and Y2, the cross-lags from Y1 to M2 and from M1 to Y2 were set to zero, and the correlation between M1 and Y1 was set to .50. Overall, there were three invariant conditions and five conditions with recalibration due to treatment, with 1,000 replications per condition. Five of the conditions had a nonzero mediated effect in the population, and the other three had a zero mediated effect in the population. Supplement 1 has R code to reproduce the simulated examples.
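To show the mechanics, here is a condensed sketch of the mediator side of the data-generating model for Model 2 (Supplement 1 has the authors' full code). The syntax, item names, and the omission of residual (co)variances are our simplifications; c(,) vectors give the control- and treatment-group population values, and only the a-path and the noninvariant intercepts are shown.

```r
library(lavaan)

# Mediator side of a Model 2-style population model (condensed sketch).
pop_model <- '
  M1f =~ 1.0*m1_1 + 0.65*m1_2 + 0.55*m1_3 + 0.60*m1_4 + 0.50*m1_5 + 0.80*m1_6
  M2f =~ 1.0*m2_1 + 0.65*m2_2 + 0.55*m2_3 + 0.60*m2_4 + 0.50*m2_5 + 0.80*m2_6
  M2f ~ 0.70*M1f            # stability of the mediator
  M2f ~ c(0, 0.285)*1       # group difference on the factor (the a-path)
  m2_3 ~ c(0.50, 0.72)*1    # recalibration: noninvariant intercepts for the
  m2_4 ~ c(0.50, 0.74)*1    # treatment group only (items 3 and 4)
'
dat <- simulateData(pop_model, sample.nobs = c(325, 325))
dat$X <- dat$group - 1      # recode lavaan's group column to a 0/1 indicator
```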
Table 2.
Data-generating Item Intercepts and True Paths
| Models | Intercepts at T2 for Treatment Group | Unstandardized a, b3, and c′ Paths | Standardized a, b3, and c′ Paths |
|---|---|---|---|
| Model 1 (invariant) | .90, .40, .50, .50, .70, .40 | .285, .145, 0 | .202, .149, 0 |
| Model 2 | .90, .40, **.72**, **.74**, .70, .40 | .285, .145, 0 | .202, .149, 0 |
| Model 3 | .90, .40, **.28**, **.26**, .70, .40 | .285, .145, 0 | .202, .149, 0 |
| Model 4 (invariant) | .90, .40, .50, .50, .70, .40 | 0, .145, 0 | 0, .148, 0 |
| Model 5 | .90, .40, **.72**, **.74**, .70, .40 | 0, .145, 0 | 0, .148, 0 |
| Model 6 | .90, .40, **.28**, **.26**, .70, .40 | 0, .145, 0 | 0, .148, 0 |
| Model 7 (invariant) | .90, .40, .50, .50, .70, .40 | .285, .145, .780 | .202, .141, .543 |
| Model 8 | .90, .40, **.72**, **.74**, .70, .40 | .285, .145, .780 | .202, .141, .543 |

Note. Intercepts for both groups at T1 are the same as those in Models 1, 4, and 7. Intercepts in bold are noninvariant. The a path has either a zero or a small effect size (Cohen's f2 = .02), the b3 path had a small effect size (f2 = .02), and the c′ path had a zero or a medium effect size (f2 = .15). Standardized paths are given with respect to the endogenous variables; the b3 path is the relation between M2 and Y2, which are both endogenous, so this path is fully standardized. See Supplement 4 for the correspondence between our true values on the paths and Cohen's f2 effect size and for how the mediation paths were obtained.
Data Analysis
The two-wave mediation model in Figure 1 was used to analyze the generated datasets; this model treats all of the variables as observed by using sum scores (thus not accounting for response shift). Analyzing datasets in which we know that there is response shift provides insight into how the mediated effect is affected when response shift is ignored and sum scores are used. Observed scores for M1 and M2 were computed by summing the indicators for M1 and M2. Datasets from Models 1, 4, and 7, which feature no response shift in the mediator, are provided as a baseline for power and type I error rates. True parameter values for the model with observed variables were verified using the population generating covariance matrix. Relative bias for the parameter estimates with a nonzero true value was calculated by taking the difference between each sample's parameter estimates and the true values and then dividing by the true values. These estimates were then averaged over all the samples. Relative bias estimates below 0.05 were deemed acceptable. For conditions with a zero true value, standardized bias was estimated by dividing the difference between the parameter estimate and the true value by the empirical standard deviation of the estimate. Across all datasets, the significance of the mediated effect ab3 was examined with the distribution of the product method (e.g., MacKinnon et al., 2002) using the RMediation R package (Tofighi & MacKinnon, 2011). Type I error rate and power were computed by taking the proportion of datasets in which the mediated effect was statistically significant in conditions with a true zero mediated effect and a true nonzero mediated effect, respectively. The results are summarized in Table 3 and discussed in more detail below.
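The analysis of a single replication might look like the sketch below: form sum scores, estimate Equations 1 and 2 by OLS, and test ab3 with the distribution-of-the-product method via RMediation. It assumes the simulated `dat` from the previous sketch, augmented with outcome scores Y1 and Y2; the item names are the same hypothetical ones used above.

```r
library(RMediation)

# Sum scores for the mediator at each wave (items m1_1..m1_6, m2_1..m2_6).
dat$M1 <- rowSums(dat[paste0("m1_", 1:6)])
dat$M2 <- rowSums(dat[paste0("m2_", 1:6)])

eq1 <- lm(M2 ~ X + M1 + Y1, data = dat)       # Equation 1
eq2 <- lm(Y2 ~ X + Y1 + M1 + M2, data = dat)  # Equation 2

a   <- coef(eq1)["X"];  se_a  <- sqrt(vcov(eq1)["X", "X"])
b3  <- coef(eq2)["M2"]; se_b3 <- sqrt(vcov(eq2)["M2", "M2"])

# 95% CI for ab3 via the distribution of the product; a CI excluding zero
# counts as a significant mediated effect for the power/type I error tallies.
medci(mu.x = a, mu.y = b3, se.x = se_a, se.y = se_b3, type = "dop")

# Relative bias for one replication would be (a - a_true) / a_true, where
# a_true is the model-implied true value verified from the population
# covariance matrix, as described in the text.
```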
Table 3.
Simulated Models and Results
| Models | Size of Mediated Effect ab3 | Response Shift | Power | Type I Error | Relative Bias: a | Relative Bias: b3 | Relative Bias: c′ |
|---|---|---|---|---|---|---|---|
| Model 1 | Small ES | None | .787 | – | <.001 | | |
| Model 2 | Small ES | Positive direction | .938 | – | 0.394 | 0.006 | <.001 |
| Model 3 | Small ES | Negative direction | .407 | – | −0.393 | 0.002 | <.001 |
| Model 4 | Zero | None | – | .037 | <.001 | | |
| Model 5 | Zero | Positive direction | – | .190 | 1.190 | 0.006 | <.001 |
| Model 6 | Zero | Negative direction | – | .206 | −1.190 | 0.006 | <.001 |
| Model 7 | Small ES; Med. ES for c′ | None | .787 | – | <.001 | | |
| Model 8 | Small ES; Med. ES for c′ | Positive direction | .938 | – | 0.394 | <.001 | <.001 |

Note. ES = effect size. Relative bias below .05 is considered acceptable. When response shift was present, it had a medium effect size for 2 of the 6 items. For paths with a true value of zero (e.g., the a-path in Models 4–6), standardized bias is reported (see Data Analysis).
Models with a Nonzero Mediated Effect
Model 1 represents a situation where respondents in the treatment group increased on the self-awareness construct, and the mediator is free of response shift. The power to detect the mediated effect in data simulated from Model 1 was .787. Similar power estimates were found in Model 7, which includes a nonzero c’-path (i.e., partial mediation).
Model 2 represents a situation where respondents in the treatment group increased on the self-awareness construct (i.e., the a-path was nonzero), but they also showed response shift in the same direction (i.e., the intercepts for the treatment group were higher at posttest). In this case, we would expect positively biased estimates for both the a-path and the mediated effect, which in turn would inaccurately yield high power. In our results, the power to detect the mediated effect was .938, which is higher than the power to detect the mediated effect when the mediator is free of response shift (.787, as in Model 1). The relative bias for the a-path was 0.394 and for the b3-path was below 0.05. Thus, the bias in the a-path resulted in greater power to detect a mediated effect. This is a concern for planning future studies: an over-estimated mediated effect leads researchers to overstate the effect that the intervention had on the mediator. Similar bias in the a-path and power estimates were found in Model 8, which differs from Model 2 by the inclusion of a nonzero c′-path.8
Finally, Model 3 represents a situation where respondents in the treatment group increased on the self-awareness construct (i.e., the a-path was nonzero), but they showed response shift in the opposite direction (i.e., the intercepts for the treatment group were lower at posttest). Therefore, we would expect negatively biased estimates for both the a-path and the mediated effect. In our results, the power to detect the mediated effect was .407, which is nearly a 50% reduction in power to detect the mediated effect compared to when the mediator is free of response shift (.787, as in Model 1). The relative bias in the a-path was −0.393 and for the b3-path was below 0.05. This example underscores that recalibration in the opposite direction of the a-path can result in enough bias to lead to type II errors (i.e., a failure to detect a true effect). This is a concern because incorrect conclusions that the intervention did not work through the mediator could lead future intervention studies to no longer consider that mediator, or, conversely, to enhance program components to produce a larger effect (i.e., increasing the number of hours and/or duration of the intervention), potentially wasting valuable resources.
Models with No Mediated Effect
Model 4 represents a situation where the intervention did not increase self-awareness (i.e., the a-path is zero, so there is no mediated effect), and there is no response shift in the mediator. In this case, the type I error rate was .037, which provides a comparison for subsequent models. Model 5 and Model 6 represent situations where respondents in the treatment group did not change on the self-awareness construct (the a-path is zero), but there is response shift in a positive (Model 5) or in a negative (Model 6) direction for the treatment group. Consequently, the magnitude of the standardized bias for the a-path estimates was |1.19| in Models 5 and 6, but it was positive for Model 5 and negative for Model 6. Both have a similar type I error rate of around .20, which is four times larger than the type I error rate for the invariant model (Model 4). An inflated type I error rate is a concern because incorrect conclusions that the intervention affected the mediator could motivate similar future studies, rather than being taken as evidence that this particular mediator is not affected.
Summary
The power and type I error rates for the mediated effect were affected when there was response shift due to recalibration in the mediator. When there was a nonzero mediated effect and the response shift was in the same direction as the a-path, the mediated effect estimate was larger than it should be. On the other hand, response shift in the opposite direction of the a-path resulted in a mediated effect estimate that was smaller than it should be. Finally, when there was no effect of the intervention on self-awareness (the a-path was zero, and thus the mediated effect was zero), but there was recalibration response shift in the positive or negative direction for the treatment group, type I error rates for the mediated effect were higher than 0.05. These results extend to situations with either full or partial mediation.
General Discussion
When researchers target a mediator assessed via self-report, there is a possibility that the responses at posttest have a different meaning than they did at pretest due to changes experienced as a result of the intervention, a phenomenon referred to as response shift. The goals of this paper were to provide background on response shift as it could occur in an intervention study and to demonstrate how ignoring response shift in the mediator could affect the detection of the mediated effect. Our simulated examples demonstrate that ignoring response shift can lead to drastically different conclusions about statistical mediation. These conclusions are important because the most common model to analyze intervention data uses sum scores, which do not allow for tests of measurement invariance. Therefore, we encourage researchers to test for response shift using measurement invariance tests, and to understand the nature of the response shift by identifying its source (due to the treatment, maturation, or both) and its type (reconceptualization, reprioritization, or recalibration). If response shift in the mediator is assessed and detected, it could be accommodated by using a latent variable for the mediator and allowing some of the factor loadings and item intercepts to vary across groups or across time (see Supplement 2 for a tutorial, and the sketch below). A more general recommendation is that researchers testing intervention-based mediation models use latent variable models. Latent variable models not only allow for tests of measurement invariance but can also address violations of measurement invariance in ways not possible with sum scores.
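For concreteness, a partial-invariance accommodation might look like the following lavaan sketch, continuing the hypothetical item names used earlier (the authors' full tutorial is in Supplement 2). Loadings and intercepts are held equal across groups except the intercepts of the two items flagged as noninvariant, which group.partial leaves free.

```r
library(lavaan)

# Partial invariance: free only the intercepts of the noninvariant items.
cfa_model <- 'M2f =~ m2_1 + m2_2 + m2_3 + m2_4 + m2_5 + m2_6'
fit_partial <- cfa(cfa_model, data = dat, group = "X",
                   group.equal   = c("loadings", "intercepts"),
                   group.partial = c("m2_3 ~ 1", "m2_4 ~ 1"))
summary(fit_partial)
```

The latent mediator from such a model can then replace the sum score in the mediation equations.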
While we have focused on randomized interventions, the conclusions from the simulation regarding bias, type I error, and power could potentially apply to non-randomized interventions, longitudinal studies, or other models with mediators that violate measurement invariance (not necessarily due to response shift). In a randomized study, we expect the mediator measure to be invariant across groups at pretest, but we do not expect invariance at pretest in nonrandomized studies, nor can we expect this to hold for all measurement occasions in a longitudinal study. Therefore, we urge researchers to use measurement invariance tests to assess whether they are assessing the same construct at all measurement occasions, and if not, to understand the nature of the noninvariance. Additional technical and theoretical work is needed to determine how violations of invariance at pretest affect the estimation of the mediated effect. In addition, while response shift is defined specifically for self-report measures (e.g., Howard, 1980), similar effects could occur in other instruments, such as in parent-report measures of child behavior gathered before and after a parenting intervention. Finally, additional evidence, such as qualitative data, additional measures, or extensive subject-matter expertise, would be required to determine the specific mechanism(s) responsible for response shift in a given study.
Limitations and Future Directions
While we have provided definitions and explanations for response shift and the theoretical model that is in line with Sprangers and Schwartz (1999), the concept of response shift is challenging to capture and the theory continues to be refined within QoL research. We consider this paper to offer a light introduction to response shift and think that adopting a measurement perspective allowed for greater clarity in describing response shift. However, a limitation of this paper is that we have not provided a full discussion of the nuances of response shift, nor presented alternative perspectives from the literature. For example, Ubel, Peeters and Smith (2010) proposed abandoning the term “response shift” altogether, arguing that this term created conceptual confusion by conflating measurement bias and true change, while Donaldson (2005) critiqued the use of measurement invariance methodology for investigating response shift.9 Future work should focus on fully translating response shift into a psychological context.
One shortcoming that affects any study of measurement invariance is that certain constraints must be put on the model for identification and scaling, and that these constraints also make assumptions about invariance. For example, invariance is assumed when using a scaling indicator by constraining the loadings and intercepts for the first item across groups or across time points. If recalibration affected all items, effects of noninvariance and true change could not be differentiated. Therefore, it is important to have at least one item that is invariant across groups and timepoints and to correctly identify it in the examined model.
We have assumed throughout the paper that the intervention is changing the mediating construct, which would mean that observed indicators are affected through changes in the latent construct, not directly by the intervention. A question for further research is to determine the best way to accommodate a situation in which the intervention causes change in specific behaviors, while not affecting others (Gonzalez & MacKinnon, in press). In this situation, the theory describing the impact of the intervention on the mediator is incorrect, and our assumed model is incorrect. Therefore, we may detect response shift when the actual problem is that our model is incorrect (i.e., mis-specified).
Finally, the framework of measurement invariance assumes reflective indicators, where the correct model is one in which the latent variable causes the observed indicators (i.e., as the latent variable increases, the scores on the indicators increase). An alternate conceptualization of this relationship is one in which the latent variables are caused by the indicators. In this case, the indicators could be causal indicators that assess the latent variable. If a causal indicator is incorrectly modeled as a reflective indicator, the model would be mis-specified and the results would not be meaningful. While a full discussion of the implications of this type of mis-specification is beyond the scope of this paper, we recommend Bollen and Bauldry (2011) and Rhemtulla, van Bork, and Borsboom (2019) for a thorough discussion of these issues.
Overall, we encourage researchers to probe for response shift when they are testing for mediation in an intervention setting. Response shift could affect the likelihood of finding statistically significant mediated effects, which in turn could affect our conclusions about how the intervention worked. We hope that researchers incorporate the methodology presented here into their toolbox to draw the most accurate conclusions about mediation.
Supplementary Material
Funding
This research was supported in part by the National Institute on Drug Abuse under Grant No. R37-DA009757.
Footnotes
Conflicts of Interest
The author(s) declare that there were no conflicts of interest with respect to the authorship or the publication of this article.
1. This only holds if one views response shift as measurement bias; see the next section and footnote.

2. The conceptual perspective defines response shift as occurring when certain variables (i.e., "mechanisms" such as coping or social comparison) confound the relationship between the explanatory variable (i.e., "catalysts" such as the intervention) and the outcome variable (i.e., the mediator in Equation 1). The conceptual perspective therefore views response shift as a special case of explanation bias, because observed change in the outcome variable is not fully explained by the explanatory variable (i.e., treatment), as the outcome variable is also affected by other variables (i.e., mechanisms). Critically, observed changes in the outcome variable are considered to be true change under this perspective; thus CFA need not be used to test for explanation bias. See Oort, Visser and Sprangers (2009) for further details.

4. The multiple-group CFA model is used for groups while the longitudinal CFA would be used to test for longitudinal invariance, but the principles are largely the same. We focus on groups in this explanation for clarity, but in the two-wave model we are interested in invariance across groups and time. See Millsap and Cham (2011) for further details on the longitudinal CFA model.

5. While this would not technically be possible with an ordinal scale, which is bounded, we assume this for the purpose of the illustration.

6. Note that by taking a broad view of response shift, maturation could be considered a type of response shift. We distinguish the types as response shift due to the intervention and response shift due to maturation.

7. When all other parameters are fixed, an unstandardized path coefficient represents the change in the outcome for a one-unit change in the focal predictor. A standardized path coefficient represents how many standard deviation units an outcome changes per one-standard-deviation change in the predictor. In this paper, we present standardized path coefficients with respect to the outcomes or endogenous variables only because we have a binary predictor (e.g., the treatment indicator), and a one-standard-deviation change in a binary predictor is not meaningful (Hayes, 2009).

8. Example 2a in Supplement 1 shows the same premise as Model 2, but a scenario in which the items were parallel (e.g., same factor loadings, error variances, and intercepts). Parallel items overcome some of the other limitations imposed by sum scores unrelated to invariance (see McNeish and Wolf, 2019) and our findings were similar.

9. See Reeve (2010) and Sprangers & Schwartz (2010) for responses to Ubel, Peeters and Smith (2010); see Oort (2005) and Ahmed and Mayo (2005) for responses to Donaldson (2005).
References
- Ahmed S, & Mayo N (2005). Response to Donaldson's Commentary. Quality of Life Research, 14(10), 2357–2358. http://www.jstor.org/stable/4039972
- Baer RA, Smith GT, Hopkins J, Krietemeyer J, & Toney L (2006). Using self-report assessment methods to explore facets of mindfulness. Assessment, 13, 27–45. 10.1177/1073191105283504
- Baron RM, & Kenny DA (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
- Bollen KA, & Bauldry S (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16(3), 265. 10.1037/a0024448
- Bruneau EG, Kteily NS, & Urbiola A (2020). A collective blame hypocrisy intervention enduringly reduces hostility towards Muslims. Nature Human Behaviour, 4(1), 45–54. 10.1038/s41562-019-0747-7
- Carlier IV, van Eeden WA, de Jong K, Giltay EJ, van Noorden MS, van der Feltz‐Cornelis C, … & van Hemert AM (2019). Testing for response shift in treatment evaluation of change in self‐reported psychopathology amongst secondary psychiatric care outpatients. International Journal of Methods in Psychiatric Research, 28(3), e1785. 10.1002/mpr.1785
- Cole DA, & Maxwell SE (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112(4), 558–577. 10.1037/0021-843X.112.4.558
- Donaldson GW (2005). Structural equation models for quality of life response shifts: Promises and pitfalls. Quality of Life Research, 14(10), 2345–2351. https://www.jstor.org/stable/4039970
- Fokkema M, Smits N, Kelderman H, & Cuijpers P (2013). Response shifts in mental health interventions: An illustration of longitudinal measurement invariance. Psychological Assessment, 25(2), 520. 10.1037/a0031669
- Golembiewski RT, Billingsley K, & Yeager S (1976). Measuring change and persistence in human affairs: Types of change generated by OD designs. The Journal of Applied Behavioral Science, 12, 133–157. 10.1177/002188637601200201
- Gonzalez O, & MacKinnon DP (in press). The measurement of the mediator and its influence on statistical mediation conclusions. Psychological Methods. 10.1037/met0000263
- Gonzalez O, Valente MJ, & MacKinnon DP (2017, May). Violations of longitudinal measurement invariance in the two-wave mediation model. Paper presented at the 25th annual meeting of the Society for Prevention Research, Washington, DC.
- Grimm KJ, Ram N, & Estabrook R (2017). Growth modeling: Structural equation and multilevel modeling approaches. New York, NY: The Guilford Press.
- Guenole N, & Brown A (2014). The consequences of ignoring measurement invariance for path coefficients in structural equation models. Frontiers in Psychology, 5, Article 980. 10.3389/fpsyg.2014.00980
- Hancock GR, Stapleton LM, & Arnold-Berkovits I (2009). The tenuousness of invariance tests within multisample covariance and mean structure models. In Teo T & Khine MS (Eds.), Structural equation modeling: Concepts and applications in educational research (pp. 137–174). Rotterdam, The Netherlands: Sense Publishers.
- Hayes AF (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76(4), 408–420. 10.1080/03637750903310360
- Howard GS (1980). Response-shift bias: A problem in evaluating interventions with pre/post self-reports. Evaluation Review, 4, 93–106. 10.1177/0193841X8000400105
- Hsiao YY, Tofighi D, Kruger ES, Van Horn ML, MacKinnon DP, & Witkiewitz K (2019). The (lack of) replication of self-reported mindfulness as a mechanism of change in mindfulness-based relapse prevention for substance use disorders. Mindfulness, 10, 724–736. 10.1007/s12671-018-1023-z
- Luengo Kanacri BP, Zuffiano A, Pastorelli C, Jiménez‐Moya G, Tirado LU, Thartori E, Gerbino M, Cumsille P, & Martinez ML (2019). Cross‐national evidences of a school‐based universal programme for promoting prosocial behaviours in peer interactions: Main theoretical communalities and local unicity. International Journal of Psychology, 55. 10.1002/ijop.12579
- MacKinnon DP (2008). Introduction to statistical mediation analysis. New York, NY: Taylor & Francis/Erlbaum.
- MacKinnon DP, & Dwyer JH (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17(2), 144–158. 10.1177/0193841X9301700202
- MacKinnon DP, Lockwood CM, Hoffman JM, West SG, & Sheets V (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83. 10.1037/1082-989X.7.1.83
- Mayer A, Thoemmes F, Rose N, Steyer R, & West SG (2014). Theory and analysis of total, direct, and indirect causal effects. Multivariate Behavioral Research, 49, 425–442. 10.1080/00273171.2014.931797
- Meredith W (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. 10.1007/BF02294825
- Millsap RE (2011). Statistical approaches to measurement invariance. Routledge.
- Millsap RE, & Olivera-Aguilar M (2012). Investigating measurement invariance using confirmatory factor analysis. In Hoyle RH (Ed.), Handbook of structural equation modeling (pp. 380–392). New York, NY: Guilford Press.
- Oort FJ (2005). Using structural equation modeling to detect response shifts and true change. Quality of Life Research, 14, 587–598. 10.1007/s11136-004-0830-y
- Oort FJ (2005). Towards a formal definition of response shift (in reply to GW Donaldson). Quality of Life Research, 14(10), 2353–2355. http://www.jstor.org/stable/4039971
- Oort FJ, Visser MR, & Sprangers MA (2005). An application of structural equation modeling to detect response shifts and true change in quality of life data from cancer patients undergoing invasive surgery. Quality of Life Research, 14, 599–609. 10.1007/s11136-004-0831-x
- Oort FJ, Visser MR, & Sprangers MA (2009). Formal definitions of measurement bias and explanation bias clarify measurement and conceptual perspectives on response shift. Journal of Clinical Epidemiology, 62(11), 1126–1137.
- Pearl J (2001, August). Direct and indirect effects. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence (pp. 411–420). Morgan Kaufmann Publishers Inc. arXiv:1301.2300
- Plow M, Motl RW, Finlayson M, & Bethoux F (2020). Intervention mediators in a randomized controlled trial to increase physical activity and fatigue self-management behaviors among adults with multiple sclerosis. Annals of Behavioral Medicine, 54(3), 213–221. 10.1093/abm/kaz033
- Quigley L, Dozois DJA, Bagby RM, Lobo DSS, Ravindran L, & Quilty LC (2019). Cognitive change in cognitive-behavioural therapy v. pharmacotherapy for adult depression: A longitudinal mediation analysis. Psychological Medicine, 49, 2626–2634. 10.1017/S0033291718003653
- Rapkin BD, & Schwartz CE (2019). Advancing quality-of-life research by deepening our understanding of response shift: A unifying theory of appraisal. Quality of Life Research, 1–8. 10.1007/s11136-019-02248-z
- Reeve BB (2010). An opportunity to refine our understanding of "response shift" and to educate researchers on designing quality research studies: Response to Ubel, Peeters, and Smith. Quality of Life Research, 19(4), 473–475. 10.1007/s11136-010-9612-x
- Rhemtulla M, van Bork R, & Borsboom D (2019). Worse than measurement error: Consequences of inappropriate latent variable measurement models. Psychological Methods. 10.1037/met0000220
- Robins JM, & Greenland S (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155. www.jstor.org/stable/3702894
- Rosseel Y (2012). lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA). Journal of Statistical Software, 48, 1–36.
- Sajobi TT, Brahmbatt R, Lix LM, Zumbo BD, & Sawatzky R (2018). Scoping review of response shift methods: Current reporting practices and recommendations. Quality of Life Research, 27, 1133–1146.
- Sibthorp J, Paisley K, Gookin J, & Ward P (2007). Addressing response-shift bias: Retrospective pretests in recreation research and evaluation. Journal of Leisure Research, 39, 295–315.
- Sprangers MA, & Schwartz CE (1999). Integrating response shift into health-related quality of life research: A theoretical model. Social Science & Medicine, 48, 1507–1515.
- Sprangers MA, & Schwartz CE (2010). Do not throw out the baby with the bath water: Build on current approaches to realize conceptual clarity. Response to Ubel, Peeters, and Smith. Quality of Life Research, 19(4), 477–479. 10.1007/s11136-010-9611-y
- Tofighi D, & MacKinnon DP (2011). RMediation: An R package for mediation analysis confidence intervals. Behavior Research Methods, 43, 692–700. 10.3758/s13428-011-0076-x
- Ubel PA, Peeters Y, & Smith D (2010). Abandoning the language of "response shift": A plea for conceptual clarity in distinguishing scale recalibration from true changes in quality of life. Quality of Life Research, 19(4), 465–471. 10.1007/s11136-010-9592-x
- Valente MJ, & MacKinnon DP (2017). Comparing models of change to estimate the mediated effect in the pretest–posttest control group design. Structural Equation Modeling: A Multidisciplinary Journal, 24, 428–450. 10.1080/10705511.2016.1274657
- Valente MJ, MacKinnon DP, & Mazza GL (2019). A viable alternative when propensity scores fail: Evaluation of inverse propensity weighting and sequential G-estimation in a two-wave mediation model. Multivariate Behavioral Research, 1–23. 10.1080/00273171.2019.1614429
- VanderWeele T (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.
- Vergauwe J, Wille B, Hofmans J, Kaiser RB, & De Fruyt F (2018). The double-edged sword of leader charisma: Understanding the curvilinear relationship between charismatic personality and leader effectiveness. Journal of Personality and Social Psychology, 114, 110–130. 10.1037/pspp0000147
- Weissman DG, Bitran D, Miller AB, Schaefer JD, Sheridan MA, & McLaughlin KA (2019). Difficulties with emotion regulation as a transdiagnostic mechanism linking child maltreatment with the emergence of psychopathology. Development and Psychopathology, 31, 1–17. 10.1017/S0954579419000348
- Williams J, Jones SB, Pemberton MR, Bray RM, Brown JM, & Vandermaas-Peeler R (2010). Measurement invariance of alcohol use motivations in junior military personnel at risk of depression or anxiety. Addictive Behaviors, 35, 444–451. 10.1016/j.addbeh.2009.12.012
- Woods CM, & Grimm KJ (2011). Testing for nonuniform differential item functioning with multiple indicator multiple cause models. Applied Psychological Measurement, 35, 339–361. 10.1177/0146621611405984