Table 1.
Statistical approaches | Description | Assumptions | Limitations | Example |
---|---|---|---|---|
Confounding | ||||
Multivariable regression | Potential confounders are included as covariates in the regression model for the effect of the exposure on the outcome | No residual confounding (all confounders are accurately measured and correctly included in the statistical model); for multivariable regression, the outcome is modelled correctly given the exposure and confounders; for propensity score methods, the exposure is modelled correctly given the confounders | Assumptions are difficult to meet with full confidence, so bias from residual confounding may remain; although propensity scores carry some advantages over multivariable regression (e.g. statistical efficiency and flexibility), the different ways of incorporating a propensity score into the analysis model (e.g. stratifying, matching, adjusting, weighting) each have their own limitations – see Haukoos and Lewis (Haukoos & Lewis, 2015) for an overview | Harrison and colleagues (Harrison et al., 2020) used multivariable logistic regression to examine the association of smoking behaviours with suicidal ideation and attempts, adjusting for potential confounders including age, sex and socio-economic position |
Propensity scores | The propensity score (the probability that an individual is exposed, given the values of their observed baseline confounders) is estimated and then used to control for time-invariant confounding; the approach can be extended to address time-varying confounding via marginal structural models | As for multivariable regression (see above) | As for multivariable regression (see above) | Bray and colleagues (Bray et al., 2019) used a propensity score to adjust for confounding when examining the association between latent class membership based on reasons for alcohol use during the year after high school and problem alcohol use at age 35 years |
Fixed-effects regression | This approach uses repeated measures of an exposure and an outcome to account for a possible association between the exposure and the unexplained variability in the outcome (representing unmeasured confounding); it can adjust for all time-invariant confounders, including unobserved ones, and can incorporate observed time-varying confounders | Potential time-varying confounders are measured accurately and correctly included in the statistical model | Requires repeated assessments of exposure and outcome; the model cannot control for unobserved fixed confounding factors whose effects vary with age or that interact with the exposure to influence the outcome, or for unobserved time-varying confounders | Fergusson and Horwood (Fergusson & Horwood, 2000) used fixed-effects regression to assess the influence of deviant peer affiliations on substance use and crime across adolescence and young adulthood, taking into account unobserved fixed confounding factors and observed time-varying factors |
Selection bias | ||||
Complete case analysis with covariate adjustment | Analyses are restricted to participants with complete data on all variables, and covariates associated with missingness are included in the model | Data are MAR or MCAR; results can be unbiased when data are MNAR as long as the chance of being a complete case does not depend on the outcome after adjusting for covariates | Cannot address the loss of power due to missing data; results are biased when the outcome is MNAR; predictors of missingness must be known and measured; cannot use information from variables that are associated with missingness but not included in the main analysis | Hughes and colleagues (Hughes et al., 2019) used a hypothetical example of the relationship of cannabis use at age 15 years with depression symptoms and self-harm at age 21 years to describe missing data mechanisms using causal diagrams, and to show situations in which complete case analysis and multiple imputation will or will not result in bias |
Approaches based on the MAR assumption, e.g. multiple imputation | Multiple imputation is a two-stage process: first, multiple imputed data sets are created, with each missing value replaced by a value imputed from models fitted to the observed data; second, each imputed data set is analysed and the results are combined appropriately; it can address both loss of power and bias (extensions exist that allow for MNAR mechanisms using sensitivity parameters) | Data are MAR or MCAR; the imputation model is compatible with the analysis model; imputation is performed multiple times and performed ‘properly’; the final analysis combines results appropriately over the multiple data sets (e.g. using Rubin's rules); for a more in-depth discussion of potential pitfalls in multiple imputation see the review by Sterne and colleagues (Sterne et al., 2009) | If the exposure is MNAR, multiple imputation can cause more bias than complete case analysis; requires information to be collected on auxiliary variables that are closely associated with the variables to be imputed; all aspects of the analysis model must be included in the imputation model, so if changes are made at a later date (e.g. testing an interaction), the imputation must be redone; computationally intensive, which can lead to computational problems (particularly with small sample sizes) | See Hughes and colleagues (Hughes et al., 2019) above |
Approaches based on the MNAR assumption, e.g. using linkage to external routinely collected health records | Routinely collected health data can be used to examine biases from selective non-response by providing data on those who did and did not respond to assessments within population cohorts or surveys; they can also serve as a proxy for the missing study outcome in multiple imputation, or be used to derive weights, to adjust for potential bias and make the MAR assumption more plausible | High correlation between the study outcome and the linked proxy; if the outcome is not MNAR but missingness depends on the proxy, including the proxy in a multiple imputation model would increase bias; see Cornish and colleagues (Cornish et al., 2017) for an example | Requires access to closely related routinely collected data; not all participants may consent to linkage, which could introduce bias if non-consenters differ from non-responders; linkage to external datasets can be costly and complicated; use of a proxy in multiple imputation can increase bias depending on the missing data mechanism | Gorman and colleagues (Gorman et al., 2017) found that the use of routinely collected health data on alcohol-related harm in a multiple imputation model resulted in higher alcohol consumption estimates among Scottish men |
Measurement bias | ||||
Latent variables using multiple sources of data | A latent variable is a source of variance that is not directly measured but is estimated from the covariation among a set of strongly related observed variables; if these observed variables are assessed using multiple methods, each with different sources of bias, variability due to bias shared across items can be removed from the latent variable | The latent variable indicators all measure the same underlying construct, and responses on the indicators result from an individual's position on the latent variable; latent variable variance is independent of measurement residual variance; indicators assessed using different methods have different sources of bias; for a description of all assumptions in latent variable modelling see Kline (Kline, 2015) | Requires at least four strongly correlated measures assessed using different methods, each with different sources of bias; the items included must make theoretical sense given the underlying construct; the meaning of the latent variable needs careful thought | Palmer and colleagues (Palmer et al., 2002) describe a method using two self-report and two biochemical measures of smoking (carbon monoxide and cotinine) to remove variability due to self-report bias (e.g. recall or social desirability bias) and biological bias (e.g. second-hand smoke) and create a latent variable representing cigarette smoking |
Mechanisms | ||||
Counterfactual mediation | Mediation approach based on conceptualizing, for each individual, the ‘potential outcomes’ [Y(x)] that would have been observed had particular conditions been met (i.e. had the exposure X been set to the value x through some intervention), regardless of the conditions that were in fact met for that individual; allows testing for an interaction between the exposure and mediator, inclusion of binary mediators and outcomes, and sensitivity analyses to examine the potential impact of unmeasured confounding and measurement bias on conclusions | Main assumptions include conditional exchangeability, no interference and consistency; see De Stavola and colleagues (De Stavola, Daniel, Ploubidis, & Micali, 2015) for an accessible description of these assumptions and a comparison with the assumptions made when estimating mediation within an SEM framework | Still subject to the same threats to causal inference as traditional approaches to mediation analysis (including poorly measured or unmeasured confounding and measurement error); challenging to extend to individual paths via multiple mediators; each specific counterfactual mediation method is subject to its own limitations – see VanderWeele (VanderWeele, 2015) | Using a sequential counterfactual mediation approach, Aitken and colleagues (Aitken, Simpson, Gurrin, Bentley, & Kavanagh, 2018) showed that behavioural factors (including smoking and alcohol consumption) explained a further 5% of the association between disability acquisition and poor mental health in adults, after accounting for material and psychosocial factors; the authors also performed a bias analysis, which showed that the indirect effects were unlikely to be explained by unmeasured mediator-outcome confounding |
Design-based approaches | ||||
RCTs | In an RCT, participants are randomly assigned to a treatment or control group, and the outcome is compared across groups; when performed well, RCTs can account for both known and unknown confounders and are therefore considered the gold standard for estimating causal effects | Assignment to treatment and control groups is random, so groups are similar except with respect to the intervention | Prone to bias from sources such as lack of concealment of the random allocation, failure to maintain randomization, lack of blinding to group allocation, non-adherence, and differential loss to follow-up between groups; often recruit highly selected samples that are not representative of the population of interest, threatening the generalizability of results; can be expensive and time-consuming, and are not always feasible or ethical, particularly in mental health research | Ford and colleagues (Ford et al., 2019) performed a cluster RCT to examine the effectiveness and cost-effectiveness of the Incredible Years Teacher Classroom Management programme as a universal intervention in primary school children; the intervention reduced the total difficulties score on the Strengths and Difficulties Questionnaire at 9 months compared with teaching as usual, but this effect did not persist at 18 or 30 months |
Natural experiments | Populations are compared before and after (or with and without exposure to) a ‘natural’ exposure at a specific time point, on the assumption that potential biases (such as confounding) are similar between them; the exposure may occur naturally (e.g. famine) or be quasi-random (e.g. introduction of policies) | Populations compared are comparable (e.g. with respect to the underlying confounding structure) except for the naturally occurring (or quasi-randomized) exposure | Potential sources of bias include differences in characteristics that may confound any observed association, or misclassification of the outcome that relates to the naturally occurring exposure; relies on the occurrence of appropriate natural experiments that manipulate the exposure of interest; selection bias can be present because the exposure is not manipulated by the researcher | Davies and colleagues (Davies et al., 2018a) used the raising of the school leaving age from 15 to 16 years as a natural experiment to test whether remaining in school at age 15 years affected later health outcomes (including depression diagnosis, alcohol use and smoking) |
Instrumental variables | An instrumental variable is a variable that is robustly associated with the exposure of interest, is not associated with confounders of the exposure and outcome, and is associated with the outcome only through the exposure; MR is an extension of this approach in which genetic variants are used as proxies (instruments) for the exposure | The instrument is associated with the exposure (relevance assumption); the instrument is not associated with confounders of the exposure-outcome association (exchangeability assumption); the instrument is not associated with the outcome other than via its association with the exposure (exclusion restriction assumption) | Weak instrument bias can result from a weak association between the instrument and the exposure; another source of bias is violation of the exclusion restriction, which is the main source of bias in MR (due to horizontal pleiotropy), so a number of extensions that are robust to horizontal pleiotropy have been developed; population stratification is also a source of bias in MR, which may require restricting analyses to an ethnically homogeneous population or adjusting for genetic principal components that reflect different population sub-groups | Taylor and colleagues (Taylor et al., 2020) used physicians' tendency to prefer prescribing one medication over another as an instrumental variable to test the association of varenicline (v. nicotine replacement therapy) with smoking cessation and mental health |
Different confounding structures | Multiple samples with different confounding structures are used, for example, multiple control groups within a case-control design, or multiple populations with different confounding structures | The bias introduced by confounding is different across samples, so that congruent results are more likely to reflect causal effects; different results across samples are due to different confounding structures and not true differences in causal effect; no other sources of bias could explain results being the same or different across samples | Assessment and quality of measures must be similar across samples; misclassification of exposure or outcome (or other unknown sources of bias) can produce misleading results; strong a priori hypotheses are required about confounding structures across samples | Sellers and colleagues (Sellers et al., 2020) compared the association between maternal smoking in pregnancy and offspring birth weight, cognition and hyperactivity in two national UK cohorts born in 1958 and 2000/2001 with different confounding structures |
Positive and negative controls | This approach uses control exposures or outcomes to test whether an analysis behaves as expected: a positive control is known to be causally related to the outcome (or exposure), so an association is expected, whereas a negative control is not plausibly causally related to the outcome (or exposure), so an association with it points to bias | The real exposure (or outcome) and negative control exposure (or outcome) have the same sources of bias; the negative control exposure is not causally related to the outcome (and vice versa for negative control outcome); the positive control exposure is causally related to the outcome (and vice versa for positive control outcome) | It is important to consider assortative mating in the prenatal negative control design, and to mutually adjust for maternal and paternal exposures [see Madley-Dowd and colleagues (Madley-Dowd et al., 2020b)]; appropriate negative control variables can be difficult to identify (e.g. where an exposure may have diverse effects on a range of outcomes) | Caramaschi and colleagues (Caramaschi et al., 2018) used paternal smoking during pregnancy as a negative control exposure to investigate whether the association between maternal smoking during pregnancy and offspring autism is likely to be causal, on the assumption that any biological effect of paternal smoking on offspring autism will be negligible, but that confounding structures will be similar to maternal smoking |
Discordant siblings | Family-based study designs can provide a degree of control over family-level confounding by comparing outcomes for siblings who are discordant for an exposure; for example, two siblings born to a mother who smoked during one pregnancy but not the other provide information on the intrauterine effects of tobacco exposure, while controlling for observed and unobserved genetic and shared environmental familial confounding | Any misclassification of the exposure or outcome is similar across siblings, and there is little or no individual-level confounding (i.e. no potential confounder to which one sibling was exposed but the other was not) | The assumption of no individual-level confounding is unlikely to be met (for example, in the plausible scenario where a mother is both older and less likely to smoke during the second pregnancy); the method depends on the availability of suitable samples, so sample size can be limited (particularly when identical twins are used within a discordant-sibling design); bias due to individual-level confounding or misclassification of exposure/outcome will be larger than in studies of unrelated individuals – see Frisell and colleagues (Frisell, Oberg, Kuja-Halkola, & Sjolander, 2012) | Madley-Dowd and colleagues (Madley-Dowd et al., 2020a) used a Danish cohort of parents and siblings to examine the association between maternal smoking in pregnancy and offspring intellectual disability; the absence of a within-family association suggested that any association was due to genetic or environmental confounders shared between the siblings; a positive control outcome (birthweight), for which a causal relation with the exposure (maternal smoking in pregnancy) is well established, was used to validate the method |
MAR, missing at random; MCAR, missing completely at random; MNAR, missing not at random; SEM, structural equation modelling; RCT, randomized controlled trial; MR, Mendelian randomization.
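To make the propensity score row of Table 1 concrete, the sketch below simulates hypothetical data in which a single binary confounder C drives both exposure X and outcome Y, with a true causal effect of 2.0. It estimates the propensity score as the exposed proportion within each level of C and applies inverse probability weighting, which is one of the weighting options the table lists. The simulation parameters, the function names (`simulate`, `crude_difference`, `ipw_difference`) and the nonparametric propensity estimate are illustrative assumptions, not the approach of any study cited in the table. In practice, the score would usually come from a model such as logistic regression over many confounders.

```python
import random


def simulate(n, seed=1):
    """Hypothetical data: a binary confounder C raises both the chance of
    exposure X and the outcome Y. The true causal effect of X on Y is 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < (0.8 if c else 0.2) else 0
        y = 2.0 * x + 3.0 * c + rng.gauss(0, 1)
        rows.append((c, x, y))
    return rows


def crude_difference(rows):
    """Unadjusted difference in mean outcome, exposed minus unexposed."""
    exposed = [y for _, x, y in rows if x == 1]
    unexposed = [y for _, x, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_difference(rows):
    """Inverse-probability-weighted difference in mean outcome."""
    # Propensity score e(c) = P(X = 1 | C = c), estimated here as the exposed
    # proportion within each level of the single binary confounder.
    ps = {}
    for level in (0, 1):
        xs = [x for c, x, _ in rows if c == level]
        ps[level] = sum(xs) / len(xs)
    # Weight exposed people by 1 / e(c) and unexposed people by 1 / (1 - e(c)),
    # then take normalised weighted means within each exposure group.
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # group -> [sum of w*y, sum of w]
    for c, x, y in rows:
        w = 1.0 / ps[c] if x == 1 else 1.0 / (1.0 - ps[c])
        sums[x][0] += w * y
        sums[x][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


rows = simulate(20000)
print(f"crude: {crude_difference(rows):.2f}, IPW: {ipw_difference(rows):.2f}")
```

Under these assumed parameters, C = 1 in 80% of exposed and 20% of unexposed people, so the crude difference should land near 2.0 + 3.0 × 0.6 = 3.8. The weighted difference should land near the true 2.0.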
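The fixed-effects regression row of Table 1 rests on the idea that subtracting each person's own mean removes every time-invariant confounder, including unobserved ones. The minimal sketch below assumes a hypothetical panel in which an unmeasured stable trait `a` raises both exposure and outcome, with a true within-person effect `beta` of 0.5. It compares a pooled regression with the within (demeaned) estimator. All names and parameter values are illustrative assumptions, and only point estimates are computed (no standard errors).

```python
import random


def simulate_panel(n_people, n_waves, beta=0.5, seed=2):
    """Hypothetical panel data: an unobserved, time-invariant trait a raises
    both exposure and outcome. The true within-person effect is beta."""
    rng = random.Random(seed)
    data = []  # (person_id, exposure, outcome)
    for i in range(n_people):
        a = rng.gauss(0, 1)  # unobserved fixed confounder
        for _ in range(n_waves):
            x = a + rng.gauss(0, 1)
            y = beta * x + 2.0 * a + rng.gauss(0, 1)
            data.append((i, x, y))
    return data


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pooled_slope(data):
    """Ignores the panel structure, so the fixed trait confounds the estimate."""
    return ols_slope([x for _, x, _ in data], [y for _, _, y in data])


def fixed_effects_slope(data):
    """Within estimator: regress person-demeaned y on person-demeaned x."""
    totals = {}  # person_id -> [sum of x, sum of y, number of waves]
    for i, x, y in data:
        t = totals.setdefault(i, [0.0, 0.0, 0])
        t[0] += x
        t[1] += y
        t[2] += 1
    xs, ys = [], []
    for i, x, y in data:
        sx, sy, k = totals[i]
        # Subtracting each person's own means removes the fixed trait entirely.
        xs.append(x - sx / k)
        ys.append(y - sy / k)
    return ols_slope(xs, ys)


data = simulate_panel(n_people=2000, n_waves=4)
print(f"pooled: {pooled_slope(data):.2f}, fixed effects: {fixed_effects_slope(data):.2f}")
```

With these assumed parameters, the pooled slope should be close to 0.5 + 2.0 × 0.5 = 1.5, because the fixed trait is correlated with exposure. The fixed-effects slope should be close to the true 0.5. A time-varying confounder would not be removed by demeaning, which is why the table lists it as a remaining limitation.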
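The multiple imputation row of Table 1 mentions combining results over imputed data sets using Rubin's rules. The sketch below implements the standard pooling step. The pooled estimate is the mean of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the five estimates in the usage line are hypothetical. The sketch also omits the degrees-of-freedom adjustment used for confidence intervals.

```python
import math
import statistics


def pool_rubins_rules(estimates, variances):
    """Combine m per-imputation estimates and their variances (squared
    standard errors) using Rubin's rules; returns (estimate, standard error)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b           # total variance
    return q_bar, math.sqrt(total)


# Hypothetical results from analysing five imputed data sets.
estimate, se = pool_rubins_rules([0.50, 0.54, 0.46, 0.52, 0.48], [0.01] * 5)
print(f"pooled estimate {estimate:.3f}, SE {se:.4f}")
```

For these illustrative inputs, the within-imputation variance is 0.01 and the between-imputation variance is 0.001. The total variance is therefore 0.01 + 1.2 × 0.001 = 0.0112, giving a pooled estimate of 0.50 with a standard error of about 0.106. The standard error is larger than any single data set would report, because it accounts for uncertainty due to the missing values.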
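The instrumental variables row of Table 1 lists three assumptions: relevance, exchangeability and exclusion restriction. The sketch below builds hypothetical data that satisfy all three by construction. A binary instrument Z shifts exposure X, is independent of an unmeasured confounder U, and reaches outcome Y only through X; the true effect is 0.3. It contrasts a naive regression with the Wald ratio, which is the instrument-outcome covariance divided by the instrument-exposure covariance. The simulation and function names are assumptions made for illustration. With one instrument and one exposure, this ratio coincides with the two-stage least squares estimate.

```python
import random


def simulate_iv(n, beta=0.3, seed=3):
    """Hypothetical data: U confounds exposure X and outcome Y and is not
    measured; binary instrument Z shifts X, is independent of U, and has no
    path to Y except through X. The true effect of X on Y is beta."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0, 1)
        x = 0.8 * z + u + rng.gauss(0, 1)
        y = beta * x + u + rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Naive regression of Y on X, biased by the unmeasured U."""
    return cov(x, y) / cov(x, x)


def wald_ratio(z, x, y):
    """IV estimate: instrument-outcome association divided by the
    instrument-exposure association."""
    return cov(z, y) / cov(z, x)


z, x, y = simulate_iv(50000)
print(f"OLS: {ols_slope(x, y):.2f}, IV (Wald ratio): {wald_ratio(z, x, y):.2f}")
```

Under these assumed parameters, the naive slope should sit near 0.76 because U inflates it, while the Wald ratio should sit near the true 0.3. If the 0.8 coefficient linking Z to X were close to zero, the denominator would be close to zero and the estimate would become unstable. This is the weak instrument problem noted in the table.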
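The logic of the negative control exposure design described in the positive and negative controls row of Table 1 can be sketched with hypothetical family data. A shared family-level confounder U raises the chance that each parent smokes and also raises the offspring outcome, while paternal smoking has no causal effect by construction. The sketch compares unadjusted maternal and paternal associations under a true maternal effect of 0 and of 0.5. The simulation, its parameters and the function names are illustrative assumptions. The sketch does not reproduce any cited study, and it omits the mutual adjustment for maternal and paternal exposures that the table recommends.

```python
import random


def simulate_families(n, maternal_effect, seed=4):
    """Hypothetical families: a shared family-level confounder U raises the
    chance that each parent smokes and also raises the offspring outcome Y.
    Paternal smoking has no causal effect on Y by construction."""
    rng = random.Random(seed)
    rows = []  # (maternal smoking, paternal smoking, outcome)
    for _ in range(n):
        u = rng.gauss(0, 1)
        p_smoke = 0.5 if u > 0 else 0.2
        mother = 1 if rng.random() < p_smoke else 0
        father = 1 if rng.random() < p_smoke else 0
        y = maternal_effect * mother + 0.5 * u + rng.gauss(0, 1)
        rows.append((mother, father, y))
    return rows


def mean_difference(rows, parent):
    """Mean outcome difference, exposed minus unexposed, for the exposure in
    column `parent` (0 = mother, 1 = father)."""
    exposed = [r[2] for r in rows if r[parent] == 1]
    unexposed = [r[2] for r in rows if r[parent] == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


for effect in (0.0, 0.5):
    rows = simulate_families(40000, maternal_effect=effect)
    print(f"true maternal effect {effect}: "
          f"maternal {mean_difference(rows, 0):.2f}, paternal {mean_difference(rows, 1):.2f}")
```

When the true maternal effect is zero, both associations should be similar and non-zero, because they reflect the same confounding. When the maternal effect is 0.5, the maternal association should clearly exceed the paternal one. The paternal estimate is still slightly inflated, because the shared confounder correlates the two parents' smoking. That inflation is the reason mutual adjustment is advised.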