Confounding and reverse causality. A confounder is a third variable (C) that influences both the exposure (X) and the outcome (Y), causing a spurious association between them. Traditionally, a confounder was defined by three criteria: it should be (i) associated with X; (ii) associated with Y, conditional on X; and (iii) not on the causal pathway between X and Y. For example, Fig. 1A shows the association between smoking (X) and educational attainment (Y), which is partly confounded by behavioural problems (C). Reverse causality is a specific case of confounding in which pre-existing symptoms of the outcome can cause the exposure and so produce the observed association between the exposure and outcome. Reverse causality is often addressed by adjusting for a baseline measure of the outcome (Y1) when examining the association between the exposure (X) and the outcome at follow-up (Y2). However, because X and Y1 are assessed simultaneously, it is possible that Y1 is on the causal pathway between X and Y2 (Fig. 1B), in which case adjusting for Y1 results in overcontrol bias. A second example of inappropriate adjustment for confounding follows directly from the traditional definition of a confounder. Figure 1C shows a third variable (L) that is associated with the exposure (X) through an unmeasured confounder (U2), associated with the outcome (Y) through an unmeasured confounder (U1), and not on the causal pathway between X and Y. According to the traditional definition, L should be adjusted for in the analyses. However, as shown in Fig. 1D, conditioning on L (represented by a square drawn around L) induces an association between U1 and U2 (represented by a dashed line), which introduces unmeasured confounding of the association between X and Y. This is an example of collider bias, which is discussed in more detail below.
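The structure in Fig. 1C and D can be illustrated with a small simulation. The sketch below is not taken from the source: the linear data-generating model, the unit coefficients, the true effect of 0.5, and the helper names (`slope`, `residuals`, `adjusted_slope`) are all assumptions chosen for illustration. It compares the unadjusted X-Y slope with the slope after adjusting for L.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    n = len(x)
    b = slope(x, y)
    mx, my = sum(x) / n, sum(y) / n
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_slope(x, y, z):
    """Slope of y on x adjusted for z (Frisch-Waugh-Lovell residualisation)."""
    return slope(residuals(z, x), residuals(z, y))

rng = random.Random(0)
n = 50_000
true_effect = 0.5  # hypothetical causal effect of X on Y

# Assumed structure of Fig. 1C: U2 -> X, U2 -> L, U1 -> L, U1 -> Y, X -> Y.
u1 = [rng.gauss(0, 1) for _ in range(n)]
u2 = [rng.gauss(0, 1) for _ in range(n)]
l = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]
x = [b + rng.gauss(0, 1) for b in u2]
y = [true_effect * xi + a + rng.gauss(0, 1) for xi, a in zip(x, u1)]

crude = slope(x, y)                # L left alone: no open backdoor path
adjusted = adjusted_slope(x, y, l) # conditioning on the collider L

print(f"unadjusted slope: {crude:.3f}")
print(f"slope adjusted for L: {adjusted:.3f}")
```

Under these assumed coefficients the unadjusted slope should sit close to the true 0.5, whereas the L-adjusted slope is pulled towards roughly 0.3, because conditioning on L links U1 and U2 and so opens a path from X to Y that did not previously transmit association.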
A more recent definition that avoids this potential bias treats a confounder as a variable that can be used to block a backdoor path between the exposure and outcome (Hernan & Robins, 2020).
Selection bias. Selection bias is an overarching term for many different biases, including differential loss to follow-up, non-response bias, volunteer bias, healthy worker bias, and inappropriate selection of controls in case-control studies (Hernan, 2004). It is present when the process used to select subjects into the study or analysis results in the association between the exposure and outcome in those selected subjects differing from the association in the whole population (Hernan, Hernandez-Diaz, & Robins, 2004). This bias is (usually) a consequence of conditioning (i.e. stratifying, adjusting or selecting) on a common effect of an exposure and an outcome (or a common effect of a cause of the exposure and a cause of the outcome), known as collider bias (Elwert & Winship, 2014; Hernan et al., 2004). Figures 1E and F show how bias can result from selective non-response or attrition in longitudinal studies. Figure 1E represents a longitudinal study examining the association between maternal smoking in pregnancy (X) and child autism (Y). Children whose mothers smoked in pregnancy (X) and male children (U) are each less likely to participate in the follow-up (R). If a male participant provides follow-up data, then it is less likely that the alternative cause of drop-out (maternal smoking in pregnancy) is present. This results in a negative association between X (maternal smoking) and U (male gender) among those with complete outcome data. Because male gender (U) is positively associated with child autism (Y), restricting the analysis to those with complete outcome data will result in the positive association between X (maternal smoking in pregnancy) and Y (child autism) being underestimated; see Hernan et al. (2004) for an alternative example.
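The attrition mechanism in Fig. 1E can be sketched as a simulation. Every number below is hypothetical and chosen only to make the pattern visible: the prevalences of X and U, the outcome risks, and the additive response probabilities are assumptions, not estimates from any study, and the helper names `risk_difference` and `male_share` are introduced here for illustration.

```python
import random

def risk_difference(rows):
    """P(Y=1 | X=1) - P(Y=1 | X=0) among the given (x, u, y) rows."""
    exposed = [y for x, _, y in rows if x == 1]
    unexposed = [y for x, _, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

def male_share(rows, x_value):
    """Proportion with U=1 among rows whose exposure equals x_value."""
    males = [u for x, u, _ in rows if x == x_value]
    return sum(males) / len(males)

rng = random.Random(1)
n = 200_000

everyone, responders = [], []
for _ in range(n):
    x = 1 if rng.random() < 0.3 else 0   # maternal smoking (assumed prevalence)
    u = 1 if rng.random() < 0.5 else 0   # male, independent of X by construction
    # Hypothetical outcome risk: both X and U raise the risk of Y.
    y = 1 if rng.random() < 0.05 + 0.05 * x + 0.20 * u else 0
    # Hypothetical response model: X and U each lower the chance of follow-up.
    r = rng.random() < 0.9 - 0.4 * x - 0.4 * u
    everyone.append((x, u, y))
    if r:
        responders.append((x, u, y))

rd_full = risk_difference(everyone)        # association in the whole cohort
rd_complete = risk_difference(responders)  # association after conditioning on R

print(f"risk difference, full cohort:    {rd_full:.3f}")
print(f"risk difference, complete cases: {rd_complete:.3f}")
print(f"share male among responders, X=1: {male_share(responders, 1):.3f}")
print(f"share male among responders, X=0: {male_share(responders, 0):.3f}")
```

In this set-up X and U are independent in the full cohort, but among responders males are under-represented in the exposed group, so the complete-case risk difference falls below the full-cohort value. The size of the distortion depends on the assumed response model; if the two causes of drop-out combined multiplicatively on the response probability, restricting to responders would not induce an association between X and U.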
Non-response or attrition results in bias when conditioning on response introduces a spurious path between the exposure and outcome (Elwert & Winship, 2014). Further examples of selection bias, including attrition, are described in detail elsewhere (Daniel, Kenward, Cousens, & De Stavola, 2012; Elwert & Winship, 2014; Hernan et al., 2004).
Measurement bias. Measurement bias results from errors in the assessment of the variables in the analysis due to imprecise data collection methods (for example, socially undesirable behaviours such as smoking are often underreported in self-report measures). Measurement error can be either differential (e.g. measurement error in the exposure is related to the outcome, or vice versa) or non-differential. With a few exceptions (e.g. non-differential measurement error in a continuous outcome), both non-differential and differential measurement error will result in bias (Hernan & Cole, 2009; Jiang & VanderWeele, 2015; VanderWeele, 2016). Figure 1G shows an example of non-differential measurement error in a mediator. M refers to the true mediator, M* refers to the measured mediator, and UM refers to the measurement error for M (Hernan & Cole, 2009). Reducing measurement error is especially important in the context of a mediation model, because measurement error in the mediator often leads to an underestimated indirect effect and an overestimated direct effect (Blakely, McKenzie, & Carter, 2013; VanderWeele, 2016). Figure 1H shows an example of differential measurement error. Measurement error in the exposure X (parent smoking in pregnancy, assessed retrospectively) is influenced by the outcome Y (child behavioural problems), resulting in bias in the exposure-outcome association. When there is measurement error in both the exposure and the outcome, the errors can be dependent (associated with each other, for example because both variables are measured with a common instrument) or independent.
Both differential measurement error and dependent measurement error can open a backdoor pathway between the exposure and outcome (Hernan & Cole, 2009). |
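The effect of non-differential measurement error in a mediator (Fig. 1G) can also be sketched numerically. The example below assumes a simple linear mediation model with hypothetical path coefficients (`a`, `b`, `c`) and classical, independent error in the measured mediator; the helper names are introduced here for illustration. It estimates the direct effect as the coefficient of X adjusted for the mediator, and the indirect effect as the total effect minus the direct effect.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    n = len(x)
    coef = slope(x, y)
    mx, my = sum(x) / n, sum(y) / n
    return [yi - my - coef * (xi - mx) for xi, yi in zip(x, y)]

def direct_effect(x, m, y):
    """Coefficient of x in the regression of y on x and m."""
    return slope(residuals(m, x), residuals(m, y))

rng = random.Random(2)
n = 50_000
a, b, c = 0.5, 0.6, 0.2  # hypothetical paths: X->M, M->Y, direct X->Y

x = [rng.gauss(0, 1) for _ in range(n)]
m = [a * xi + rng.gauss(0, 1) for xi in x]                       # true mediator M
y = [c * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
m_star = [mi + rng.gauss(0, 1) for mi in m]                      # M* = M + UM

total = slope(x, y)
direct_true = direct_effect(x, m, y)        # adjusting for the true mediator
direct_noisy = direct_effect(x, m_star, y)  # adjusting for the mismeasured one
indirect_true = total - direct_true
indirect_noisy = total - direct_noisy

print(f"total effect:              {total:.3f}")
print(f"direct / indirect with M:  {direct_true:.3f} / {indirect_true:.3f}")
print(f"direct / indirect with M*: {direct_noisy:.3f} / {indirect_noisy:.3f}")
```

With these assumed values the true direct and indirect effects are 0.2 and 0.3. Adjusting for M* instead of M removes only part of the mediated pathway, so the direct effect is overestimated and the indirect effect underestimated, in line with the pattern described above.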