Author manuscript; available in PMC: 2013 Jul 1.
Published in final edited form as: Epidemiology. 2012 Jul;23(4):561–564. doi: 10.1097/EDE.0b013e318258f5e4

The role of measurement error and misclassification in mediation analysis

Tyler J VanderWeele 1, Linda Valeri 2, Elizabeth L Ogburn 3
PMCID: PMC3367328  NIHMSID: NIHMS375505  PMID: 22659547

Methods to estimate direct and indirect effects have been rapidly expanding.1–14 It is now well documented that such mediation analyses are subject to strong no-confounding assumptions and that an unmeasured confounder of the mediator-outcome relationship can lead to substantial bias in direct and indirect effect estimates.1,2,6,7,14,15 Much less attention has been given to the question of how measurement error may bias estimates of direct and indirect effects. le Cessie and colleagues16 have done a service to investigators interested in direct effects by providing simple correction formulas for direct effect estimates in a variety of mediator measurement-error scenarios. Here we consider the implications of these and other results as they relate to making inferences, not just about direct effects, but also about mediation and indirect effects.

Mediation and Non-differential Measurement Error

Suppose that a mediator is subject to non-differential measurement error or misclassification (that is to say, the error does not depend on the exposure or outcome conditional on the true mediator and covariates). Intuitively we might expect that such measurement error will weaken the association between the mediator and the outcome and will therefore perhaps bias estimates of mediated effects towards the null and bias estimates of direct effects away from the null. An important question is under what conditions this intuition holds.

le Cessie et al.16 consider a logistic regression model of the form:

$$\operatorname{logit}[P(Y=1 \mid X, M, C)] = \beta_0 + \beta_1 X + \beta_2 M + \beta_c^{T} C \qquad (1)$$

where Y is the outcome, X the exposure, M the mediator, and C the covariates. They suppose that the investigator has access to a mismeasured mediator M* = M + U, where U is normally distributed with mean 0 and independent of M. They let λ denote the proportion of the variance in M* explained by M, conditional on X and C. An investigator might then fit the logistic model with the mismeasured mediator:

$$\operatorname{logit}[P(Y=1 \mid X, M^{*}, C)] = \beta_0^{*} + \beta_1^{*} X + \beta_2^{*} M^{*} + \beta_c^{*T} C \qquad (2)$$
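
As a reading aid, the quantity λ introduced above can be written out explicitly. The following is a sketch under the added assumptions (not stated in this form in the text) that U is also independent of X and C and that the conditional variance of M given X and C is constant; the symbols \sigma^2_{M \mid X, C} and \sigma^2_U are notation introduced here for illustration only:

```latex
\lambda = \frac{\operatorname{Var}(M \mid X, C)}{\operatorname{Var}(M^{*} \mid X, C)}
        = \frac{\sigma^2_{M \mid X, C}}{\sigma^2_{M \mid X, C} + \sigma^2_U},
\qquad 0 < \lambda \le 1 .
```

With no measurement error, \sigma^2_U = 0 and λ = 1; a larger error variance pushes λ towards 0.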

le Cessie et al. also consider a linear regression for the mediator:

$$E[M \mid X, C] = \alpha_0 + \alpha_1 X + \alpha_2^{T} C \qquad (3)$$

If the linear regression is fit with the mismeasured mediator this would be:

$$E[M^{*} \mid X, C] = \alpha_0^{*} + \alpha_1^{*} X + \alpha_2^{*T} C \qquad (4)$$

Under these assumptions, the coefficients in models (3) and (4) will be the same. However, for the logistic regression, the coefficients in models (1) and (2) will differ. Under their assumptions about measurement error they note that the relationships between the coefficients in models (1) and (2) are given by:

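$$\beta_1 = \beta_1^{*} - \beta_2^{*}\left(\frac{1}{\lambda} - 1\right)\alpha_1^{*} \qquad (5)$$
$$\beta_2 = \frac{\beta_2^{*}}{\lambda} \qquad (6)$$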

Provided that the covariates C control for confounding of the exposure-outcome and mediator-outcome relationships, exp(β1) is equal to the controlled direct effect odds ratio.5 le Cessie et al.16 thus note that even when the mediator is subject to such measurement error, we could obtain a corrected direct effect estimate as follows. We could use model (2) with the mismeasured mediator to obtain estimates of β1* and β2*. We could then use model (4) to estimate α1* = α1. If we specify λ (the proportion of the variance of M* explained by M, conditional on X and C), then we could use λ, α1* = α1, β1* and β2* in equation (5) to obtain a corrected estimate of β1. The controlled direct effect odds ratio is then simply given by exp(β1).

Similar logic can in fact also be used for mediated effects. Suppose that C controls for confounding of the (i) exposure-outcome, (ii) mediator-outcome, and (iii) exposure-mediator relationships and that (iv) there is no mediator-outcome confounder that is affected by exposure.2,4,5 If, in addition, the outcome is rare and the models are correctly specified, then the so-called natural direct and indirect effect odds ratios are given by exp(β1) and exp(α1β2), respectively. In this case, we could fit models (2) and (4), specify λ, and use the expressions in (5) and (6) to obtain estimates of β1 and β2 that are corrected for measurement error. We could then use exp(β1) and exp(α1β2) as estimates of natural direct and indirect effects. Corrected confidence intervals must either take into account that α1 is estimated (because α1 appears in the correction formula for β1; this can be done with the delta method) or be obtained by bootstrapping.
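
The correction steps just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `corrected_effects` and the numerical inputs are hypothetical, and the sketch assumes that the naive coefficients from models (2) and (4) have already been estimated and that a value of λ has been specified by the analyst.

```python
import math


def corrected_effects(beta1_star, beta2_star, alpha1_star, lam):
    """Apply equations (5) and (6) to naive estimates from models (2) and (4).

    beta1_star, beta2_star: exposure and mediator coefficients from model (2).
    alpha1_star: exposure coefficient from model (4); equals alpha1 under
        classical non-differential error.
    lam: assumed proportion of Var(M* | X, C) explained by M.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    beta2 = beta2_star / lam                                           # equation (6)
    beta1 = beta1_star - beta2_star * (1.0 / lam - 1.0) * alpha1_star  # equation (5)
    return {
        "beta1": beta1,
        "beta2": beta2,
        # Odds-ratio interpretation requires assumptions (i)-(iv), a rare
        # outcome, and correctly specified models.
        "direct_or": math.exp(beta1),
        "indirect_or": math.exp(alpha1_star * beta2),
    }


# Hypothetical naive estimates, invented purely for illustration.
result = corrected_effects(beta1_star=0.50, beta2_star=0.30,
                           alpha1_star=0.80, lam=0.60)
print(result)
```

Because λ is rarely known, one natural use of such a function is as a sensitivity analysis: re-run it over a range of plausible λ values and see whether the qualitative conclusions change. Confidence intervals would still need the delta method or bootstrapping, which this sketch does not attempt.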

The results also have interesting implications for the direction of bias of these effects. If we ignore measurement error and use exp(α1*β2*) as the estimate of the indirect effect, then this would be biased towards the null odds ratio of 1 because β2* = λβ2 and λ < 1. Similarly it follows from equation (5) that if the direct and indirect effects are in the same direction, and if we ignore measurement error and use exp(β1*) as a measure of the direct effect, then this will be biased away from the null odds ratio of 1 (i.e. if the true direct effect odds ratio is greater than 1, then exp(β1*) will be even larger; if the true direct effect odds ratio is less than 1, then exp(β1*) will be even smaller). Thus under classical non-differential measurement error with a normally distributed mediator, the bias of the mediated effect under models (1) and (3) is always towards the null. If the direct and indirect effects are in the same direction, then the bias of the direct effect is away from the null.
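
To make the direction-of-bias argument explicit, the following short derivation may help. It is a sketch that simply substitutes equation (6) into equation (5), using the equality of the exposure coefficients in models (3) and (4), and expresses the naive (starred) quantities in terms of the true coefficients:

```latex
\begin{align*}
\beta_2^{*} &= \lambda \beta_2, \\
\beta_1^{*} &= \beta_1 + \beta_2^{*}\left(\tfrac{1}{\lambda} - 1\right)\alpha_1
             = \beta_1 + (1-\lambda)\,\alpha_1 \beta_2, \\
\alpha_1^{*}\beta_2^{*} &= \lambda\,\alpha_1 \beta_2 .
\end{align*}
```

For 0 < λ < 1, the naive indirect log odds ratio is the true one shrunk by the factor λ, while the naive direct log odds ratio is shifted by (1 − λ)α1β2, a term with the same sign as the indirect effect. When β1 and α1β2 share a sign, that shift moves the naive direct effect away from 0. The two naive quantities still sum to β1 + α1β2.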

Will non-differential measurement error result in a similar pattern of biases in other settings? In related work, we have shown that if a binary mediator is subject to non-differential misclassification then, once again, the bias of the mediated effect is towards the null and the bias of the direct effect is away from the null.17 Unfortunately, however, non-differential misclassification of a polytomous mediator will not always result in biases that follow these patterns. It is possible to construct examples of a non-differentially misclassified mediator with three levels such that the bias of the mediated effect is away from the null and the bias of the direct effect is towards the null.17 It is even possible to construct examples in which the direct and mediated effect estimates, when ignoring measurement error, lie on the wrong side of the null. An important task will be to better characterize those settings in which non-differential measurement error or misclassification of a mediator leads to bias patterns of the type we would intuitively expect.

One final point is of interest before moving on. Using the definitions of natural direct and indirect effects given in the causal inference literature, the total effect will always decompose into the sum of the natural direct and indirect effects on a difference scale, while a total effect on the odds ratio scale will always decompose into the product of the natural direct and indirect effects odds ratios. We saw above that in the presence of measurement error for the mediator, the standard estimators for the direct and indirect effects will be biased. Perhaps surprisingly, however, if we use these biased direct and indirect measures and take their product on the odds-ratio scale (or sum on the difference scale) we will still get an unbiased estimate of the total effect. In some ways this is intuitive. Even if we have measurement error of the mediator, we should still be able to obtain valid estimates of total effects by simply ignoring the mediator. What may be surprising is that even if we use the mismeasured mediator to estimate biased direct and indirect effects, their combination is still unbiased for the total effect. In fact, this property holds not simply for non-differential measurement error of the mediator, but, as shown in the Appendix, for any form of measurement error of the mediator.
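
The decomposition property on the difference scale can be checked numerically. The sketch below is an illustration only: it fills hypothetical tables for E[Y | X=x, M*=m, c] and P(M*=m | X=x, c) with arbitrary random numbers, which is enough because the identity is purely algebraic and does not depend on how M* relates to the true mediator. The function names are invented for this example.

```python
import random


def decomposition(ey, pm):
    """Natural direct/indirect effect expressions evaluated on a discrete M*.

    ey[x][m]: E[Y | X=x, M*=m, c];  pm[x][m]: P(M*=m | X=x, c).
    Returns (direct, indirect, total) on the difference scale.
    """
    levels = range(len(pm[0]))
    direct = sum((ey[1][m] - ey[0][m]) * pm[0][m] for m in levels)
    indirect = sum(ey[1][m] * (pm[1][m] - pm[0][m]) for m in levels)
    # Iterated expectations: E[Y | X=1, c] - E[Y | X=0, c].
    total = (sum(ey[1][m] * pm[1][m] for m in levels)
             - sum(ey[0][m] * pm[0][m] for m in levels))
    return direct, indirect, total


def random_distribution(rng, k):
    """Return k positive weights normalised to sum to 1."""
    weights = [rng.random() + 0.01 for _ in range(k)]
    total_weight = sum(weights)
    return [w / total_weight for w in weights]


rng = random.Random(0)
k = 3  # number of levels of the (mismeasured) mediator; arbitrary choice
ey = [[rng.random() for _ in range(k)] for _ in range(2)]
pm = [random_distribution(rng, k) for _ in range(2)]

direct, indirect, total = decomposition(ey, pm)
gap = abs((direct + indirect) - total)
print(direct, indirect, total, gap)
```

Whatever tables are supplied, the direct and indirect pieces add up to the total up to floating-point rounding, which mirrors the argument in the Appendix.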

Other Forms of Measurement Error and More Complex Models

This discussion of effect decomposition in fact also suggests another way to harness the results of le Cessie et al.16 to reason not only about direct effects but about mediated effects. In the previous section, under suitable assumptions, we used exp(α1β2) as a measure of the mediated effect. The use of this and similar expressions is sometimes referred to as the “product method” because in essence it takes the product of the exposure-coefficient in the mediator model with the mediator-coefficient in the outcome model.11,18 An alternative way to go about estimating mediated effects – at least under models (1) and (3) – is sometimes referred to as the “difference method.” The difference method first estimates a total effect (e.g. by not using data on the mediator), then estimates a direct effect, as in model (1), and takes as the estimate of the mediated effect the “difference” between the total effect and the direct effect. For odds ratios, this “difference” is done on the log-odds scale. Equivalently, then, we could take the total effect odds ratio divided by the direct effects odds ratio to get an indirect effect odds ratio. Provided there are no exposure-mediator interactions or other interactions between variables, the product method and difference method will coincide for continuous outcomes and coincide approximately for binary outcomes if the outcome is rare.5,18

This difference method supplies an approach to obtain corrected estimates of the mediated effect in the various other mediator measurement-error scenarios described by le Cessie et al.16 (e.g. differential measurement error with the exposure or outcome affecting the mediator measurement, differential or non-differential intra-individual variation over time, or trigger mechanisms). If model (2) is correctly specified so that there are no exposure-mediator interactions, and if the no-confounding assumptions (i)–(iv) hold, and the outcome is rare, we could get corrected estimates of the direct effect odds ratio exp(β1) using the formulas in le Cessie et al. for any of the measurement error scenarios that they consider. We could then estimate the total effect odds ratio by simply ignoring data on the mediator. Measurement error of the mediator will then not affect estimates of the total effect. Finally we could take the ratio of our total-effect odds ratio and the measurement-error-corrected direct-effect odds ratio to obtain a measurement-error-corrected indirect-effect odds ratio. Standard errors could be obtained by bootstrapping. This approach will work for any of the forms of mediator measurement error described by le Cessie et al., and it will work for any other form of mediator measurement error for which we are able to obtain measurement-error-corrected estimates of the direct effect.
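
The difference-method calculation just described amounts to a single division on the odds-ratio scale. The following Python sketch shows it under stated assumptions: the function name `indirect_or_by_difference` is hypothetical, the inputs are log odds ratios that the analyst has already obtained (the total effect from a model omitting the mediator, the direct effect from whichever correction formula suits the measurement-error scenario), and the numbers are invented for illustration.

```python
import math


def indirect_or_by_difference(total_log_or, corrected_direct_log_or):
    """Indirect-effect odds ratio via the difference method.

    total_log_or: exposure log odds ratio from a model that ignores the
        mediator (unaffected by mediator measurement error).
    corrected_direct_log_or: measurement-error-corrected direct-effect
        log odds ratio, obtained by any applicable correction formula.
    """
    total_or = math.exp(total_log_or)
    direct_or = math.exp(corrected_direct_log_or)
    # Subtraction on the log-odds scale is division on the odds-ratio scale.
    return total_or / direct_or


# Hypothetical inputs, not taken from any study.
indirect_or = indirect_or_by_difference(total_log_or=0.74,
                                        corrected_direct_log_or=0.34)
print(indirect_or)
```

Keeping the correction step outside the function is a deliberate choice in this sketch: the same ratio applies regardless of which measurement-error scenario produced the corrected direct effect. Standard errors are not computed here; a bootstrap would re-run both model fits and this ratio within each resample.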

Of course all of our discussion here has presupposed that the models in equations (1)–(4) were correctly specified. One advantage of the approach to mediation that has developed within the causal inference literature is that it has allowed for the definition and estimation of direct and indirect effects even in the presence of exposure-mediator interactions.1–6,11 We could thus also extend models (1) and (2) to allow for such interaction. Analytic expressions for direct and indirect effects in the presence of interactions when there is no measurement error are given elsewhere.4,5,11 As noted by le Cessie and colleagues,16 in such cases, the simple formulas that they derived are no longer applicable, though one could use SIMEX methods to attempt to correct for measurement error. In fact, in related work, we have shown that even in the presence of exposure-mediator interactions, analytic expressions can be derived for measurement-error-corrected direct and indirect effects, at least for classical non-differential measurement error of a continuous mediator.11 Even these expressions can be quite complicated, and comparison with other approaches, such as SIMEX, becomes important.19

Discussion

le Cessie et al.16 have provided a number of helpful results in correcting direct-effect estimates for mediator measurement error. Here we have discussed how correction methods can similarly be applied to indirect or mediated effects, and we have discussed simple rules to know a priori, at least in certain cases, the direction of the bias of direct and indirect effects. Other work in the social sciences addresses mediator measurement error by utilizing data on multiple measurements or on variables related to the mediator.20,21 Future work could consider settings in which both the exposure and the mediator, or both the mediator and the outcome, are subject to either differential or non-differential measurement error.

Concerns about measurement error in mediation analysis are not simply hypothetical. Such issues arose in a study on the extent to which certain genetic variants affected lung cancer through nicotine dependence and associated smoking behavior (measured in terms of cigarettes per day) versus other pathways.22 The number of cigarettes per day was assessed by self-report and was thus subject to measurement error; moreover, cigarettes per day, even correctly reported, would be a crude measure of nicotine dependence. Analyses ignoring measurement error suggested that most of the effect of the variants was direct; using methods of the type described here, it was possible to show that, although the indirect effect may have been underestimated in the initial analysis, allowing for even very substantial measurement error would not change the qualitative conclusion that the vast majority of the effect was direct. Similar questions are likely to arise in other settings in which mediation is of interest. Measurement error could potentially prove to be as significant a threat to mediation analyses as unmeasured mediator-outcome confounding. Correction methods such as the type described by le Cessie et al.16 will be helpful in addressing this potential source of bias.

Acknowledgments

Tyler J. VanderWeele was supported by National Institutes of Health grant HD060696.

Appendix

Proof that under mediator measurement error, the sum of the biased direct and indirect effect estimators still gives an unbiased total effect:

We will consider a difference scale; the proof for the odds ratio scale is similar. When the relevant no-confounding assumptions hold, the natural direct effect is given by Pearl2 as follows:

$$\sum_{m} \{E[Y \mid X=1, m, c] - E[Y \mid X=0, m, c]\}\, P(m \mid X=0, c)$$

and the natural indirect effect is given by:

$$\sum_{m} E[Y \mid X=1, m, c]\, \{P(m \mid X=1, c) - P(m \mid X=0, c)\}$$

If the mis-measured mediator M* were used instead of M, these two expressions would be respectively:

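$$\sum_{m^{*}} \{E[Y \mid X=1, m^{*}, c] - E[Y \mid X=0, m^{*}, c]\}\, P(m^{*} \mid X=0, c)$$

$$\sum_{m^{*}} E[Y \mid X=1, m^{*}, c]\, \{P(m^{*} \mid X=1, c) - P(m^{*} \mid X=0, c)\}$$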

The sum of these is equal to:

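$$\begin{aligned}
&\sum_{m^{*}} E[Y \mid X=1, m^{*}, c]\, \{P(m^{*} \mid X=1, c) - P(m^{*} \mid X=0, c)\} + \sum_{m^{*}} \{E[Y \mid X=1, m^{*}, c] - E[Y \mid X=0, m^{*}, c]\}\, P(m^{*} \mid X=0, c) \\
&\quad = \sum_{m^{*}} E[Y \mid X=1, m^{*}, c]\, P(m^{*} \mid X=1, c) - \sum_{m^{*}} E[Y \mid X=0, m^{*}, c]\, P(m^{*} \mid X=0, c) \\
&\quad = E[Y \mid X=1, c] - E[Y \mid X=0, c]
\end{aligned}$$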

where the final equality follows by iterated expectations and the final quantity is equal to the total effect, conditional on C=c, of the exposure on the outcome provided that C suffices to control for confounding of the effect of X on Y. Thus the sum of the biased natural direct and indirect effect estimators using data on the mis-measured mediator M* rather than M will still be an unbiased estimator of the total effect. Note that the argument did not make any assumptions about confounding of the effect of M nor about the form of the measurement error for M.

Invited Commentary (on EDE 11–304)

Contributor Information

Tyler J. VanderWeele, Departments of Epidemiology and Biostatistics, Harvard School of Public Health

Linda Valeri, Department of Biostatistics, Harvard School of Public Health.

Elizabeth L. Ogburn, Department of Epidemiology, Harvard School of Public Health

References

1. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
2. Pearl J. Proceedings of the Seventeenth Conference on Uncertainty and Artificial Intelligence. San Francisco: Morgan Kaufmann; 2001. Direct and indirect effects; pp. 411–420.
3. van der Laan MJ, Petersen ML. Direct effect models. International Journal of Biostatistics. 2008:Article 23. doi: 10.2202/1557-4679.1064.
4. VanderWeele TJ, Vansteelandt S. Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface - Special Issue on Mental Health and Social Behavioral Science. 2009;2:457–468.
5. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis with a dichotomous outcome. American Journal of Epidemiology. 2010;172:1339–1348. doi: 10.1093/aje/kwq332.
6. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychological Methods. 2010;15:309–334. doi: 10.1037/a0020761.
7. VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology. 2010;21:540–551. doi: 10.1097/EDE.0b013e3181df191c.
8. Lange T, Hansen JV. Direct and indirect effects in a survival context. Epidemiology. 2011;22:575–581. doi: 10.1097/EDE.0b013e31821c680c.
9. VanderWeele TJ. Causal mediation analysis with survival data. Epidemiology. 2011;22:582–585. doi: 10.1097/EDE.0b013e31821db37e.
10. Tchetgen EJ. On causal mediation analysis with a survival outcome. International Journal of Biostatistics. 2011;7:Article 33, 1–38. doi: 10.2202/1557-4679.1351.
11. Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods. doi: 10.1037/a0031034. in press.
12. Ten Have TR, Joffe MM. A review of causal estimation of effects in mediation analyses. Stat Methods Med Res. 2012;21:77–107. doi: 10.1177/0962280210391076.
13. Tchetgen EJ, Shpitser I. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics. doi: 10.1214/12-AOS990. in press.
14. Judd CM, Kenny DA. Process analysis: estimating mediation in treatment evaluations. Evaluation Review. 1981;5:602–619.
15. Cole SR, Hernan MA. Fallibility in estimating direct effects. International Journal of Epidemiology. 2002;31:163–165. doi: 10.1093/ije/31.1.163.
16. le Cessie S, Debeij J, Rosendaal FR, Cannegieter SC, Vandenbroucke J. Quantification of bias in direct effects estimates due to different types of measurement error in the mediator. Epidemiology. 2012;23:xxx–xxx. doi: 10.1097/EDE.0b013e318254f5de.
17. Ogburn EL, VanderWeele TJ. Analytic results on the bias due to nondifferential misclassification of a binary mediator. American Journal of Epidemiology. doi: 10.1093/aje/kws131. in press.
18. MacKinnon DP, Warsi G, Dwyer JH. A simulation study of mediated effect measures. Multivariate Behavioral Research. 1995;30:41–62. doi: 10.1207/s15327906mbr3001_3.
19. Valeri L, Lin X, VanderWeele TJ. Mediation analysis when the mediator is measured with error and the outcome follows a generalized linear model. Technical Report. doi: 10.1002/sim.6295.
20. Bollen KA. Structural Equations with Latent Variables. Wiley; New York, NY: 1989.
21. MacKinnon DP. An Introduction to Statistical Mediation Analysis. Lawrence Erlbaum Associates; New York: 2008.
22. VanderWeele TJ, Asomaning K, Tchetgen EJ, Han Y, Spitz MR, Shete S, Wu X, Gaborieau V, Wang Y, McLaughlin J, Hung RJ, Brennan P, Amos CI, Christiani DC, Lin X. Genetic variants on 15q25.1, smoking and lung cancer: an assessment of mediation and interaction. American Journal of Epidemiology. doi: 10.1093/aje/kwr467. in press.
