Addiction researchers need to be more aware of collider bias as a possible explanation for their findings. Collider bias could be one explanation for an apparently protective effect of smoking against COVID‐19.
Addiction researchers tend to have a keen eye for confounder bias. They are often quick to spot how variables that cause both an exposure and an outcome can produce spurious associations. A researcher studying possible gateway effects (for example, the likelihood of cocaine use among people with prior exposure to cannabis) will take care to control for confounding variables, such as the personality trait ‘novelty‐seeking’. In this simulated example, novelty‐seeking causes both cannabis use and cocaine use [1]. Unless they are controlled for, confounders may suggest a causal association between cannabis and cocaine use where none exists. However, another source of bias is often neglected: collider bias. Collider bias can be seen as the flip side of confounder bias, but it is much less intuitive. Whereas confounders cause both exposures and outcomes, colliders are caused by both exposures and outcomes (see the directed acyclic graphs in Fig. 1). Whereas controlling for a confounder removes bias, controlling for a collider can introduce it. So, when do confounder and collider bias occur?
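The two causal structures described above can be written down as small directed acyclic graphs. The Python sketch below is a hypothetical helper and not a method from this article. It stores each graph as a mapping from a variable to its direct causes and labels a third variable as a confounder or a collider by inspecting direct arrows only. A real analysis would need a full d‐separation check.

```python
# Hypothetical helper (not from the article): each DAG maps a variable to
# the set of its direct causes. Only direct arrows are inspected.

def classify(var, exposure, outcome, parents):
    """Label var as 'confounder', 'collider' or 'neither' for an
    exposure -> outcome question, using direct arrows only."""
    causes_exposure = var in parents.get(exposure, set())
    causes_outcome = var in parents.get(outcome, set())
    caused_by_exposure = exposure in parents.get(var, set())
    caused_by_outcome = outcome in parents.get(var, set())
    if causes_exposure and causes_outcome:
        return "confounder"  # arrows point out of var: adjust for it
    if caused_by_exposure and caused_by_outcome:
        return "collider"    # arrows point into var: do not condition on it
    return "neither"


# The two examples from the text, written as parent sets.
gateway_dag = {
    "cannabis": {"novelty_seeking"},
    "cocaine": {"novelty_seeking"},
}
opioid_dag = {
    "opioid_use": {"depression", "impulsivity"},
}

print(classify("novelty_seeking", "cannabis", "cocaine", gateway_dag))   # confounder
print(classify("opioid_use", "depression", "impulsivity", opioid_dag))  # collider
```

The arrows point in opposite directions in the two cases. That reversal is why adjusting for a confounder removes bias, while conditioning on a collider can create it.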
Confounder bias occurs when an analysis fails to adequately control for a variable, a ‘confounder’, that causes both the exposure and the outcome. This distorts the association between exposure and outcome. In the example above, novelty‐seeking (the confounder) causes both cannabis use (the exposure) and cocaine use (the outcome). As a result, people who use cannabis have, on average, higher levels of novelty‐seeking than those who do not. Because high novelty‐seeking also causes people to use cocaine, there will be a positive association between use of the two drugs even if cannabis use does not itself cause subsequent cocaine use. To determine whether cannabis use causes cocaine use, the researcher must control for novelty‐seeking, for example by adding it as a covariable in the analysis. This shows whether people who use cannabis are more likely to subsequently use cocaine than those who do not when both groups have the same level of novelty‐seeking. Controlling for the confounder removes this source of bias.
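A minimal Python simulation makes the confounding argument concrete. It rests on hypothetical assumptions:

- Novelty‐seeking, cannabis use and cocaine use are continuous standard‐normal scores.
- The effect size of 0.6 is invented.
- Cannabis has no effect on cocaine, by construction.

Adjustment uses a partial correlation, which corresponds to adding novelty‐seeking as a linear covariable. This is an illustration only, not the analysis in the study cited as [1].

```python
import random


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(1)  # fixed seed so the run is reproducible
novelty, cannabis, cocaine = [], [], []
for _ in range(20_000):
    ns = rng.gauss(0, 1)                         # confounder
    novelty.append(ns)
    cannabis.append(0.6 * ns + rng.gauss(0, 1))  # novelty-seeking -> cannabis
    cocaine.append(0.6 * ns + rng.gauss(0, 1))   # novelty-seeking -> cocaine
    # No cannabis -> cocaine effect is simulated.

crude = corr(cannabis, cocaine)
adjusted = partial_corr(cannabis, cocaine, novelty)
print(f"crude r = {crude:.2f}; r adjusted for novelty-seeking = {adjusted:.2f}")
```

With these settings the crude correlation should be roughly 0.26 (0.36/1.36 in expectation). The adjusted correlation should sit close to zero.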
By contrast, collider bias occurs when an analysis controls for, stratifies on, or selects its sample based on a variable, a ‘collider’, that is caused by both the exposure and the outcome [2, 3]. This, too, distorts the association between exposure and outcome. For example, suppose a researcher wants to test whether depression (the exposure) is associated with impulsivity (the outcome). Let us assume that, in this simulated example, there is no association between depression and impulsivity in the general population. However, depression and impulsivity both increase the likelihood of a person using opioids, so opioid use is a collider. Controlling for opioid use in the analysis would introduce a negative association between depressive symptoms and impulsivity. It would make it appear that depression causes people to become less impulsive, or that impulsivity causes people to become less depressed.
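The same tools can show collider bias arising from adjustment. In this hypothetical simulation, depression and impulsivity are independent standard‐normal scores. A continuous opioid‐use score is generated as their sum plus noise, so it is caused by both. The coefficients are assumptions chosen for illustration.

```python
import random


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(2)
depression, impulsivity, opioid_use = [], [], []
for _ in range(20_000):
    d = rng.gauss(0, 1)  # exposure: depressive symptoms
    i = rng.gauss(0, 1)  # outcome: impulsivity, independent of d by construction
    depression.append(d)
    impulsivity.append(i)
    opioid_use.append(d + i + rng.gauss(0, 1))  # collider: caused by both

crude = corr(depression, impulsivity)
adjusted = partial_corr(depression, impulsivity, opioid_use)
print(f"crude r = {crude:.2f}; r adjusted for opioid use = {adjusted:.2f}")
```

In expectation the adjusted correlation here is −0.5. Each variable correlates 1/√3 with the opioid score, so the partial‐correlation formula gives (0 − 1/3)/(1 − 1/3). This appears even though the crude correlation is essentially zero.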
Collider bias occurs not only when a collider is added as a covariable, but also when a sample is selected (or stratified) based on a collider. This is often called ‘selection bias’, and this kind of selection happens frequently in addiction research. Using our previous example, selecting a sample of only people who use opioids would produce a negative association between depressive symptoms and impulsivity where none exists in the general population (see Fig. 2). Another way to think about this is as follows. If we know that someone uses opioids but has no depressive symptoms, something else must have caused them to start. Therefore, they may be more likely to be impulsive. Conversely, people who are depressed may use opioids even if they are not very impulsive. Together, these produce the negative relationship shown in Fig. 2.
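Selecting on the collider has the same effect as adjusting for it. The sketch below reuses the hypothetical data‐generating model from the opioid example. It dichotomizes opioid use at an arbitrary threshold (1.5, an assumption) and keeps only the simulated people above it, mimicking a sample recruited from people who use opioids.

```python
import random


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
everyone, opioid_users = [], []
for _ in range(20_000):
    d = rng.gauss(0, 1)  # depressive symptoms
    i = rng.gauss(0, 1)  # impulsivity, independent of d by construction
    person = (d, i)
    everyone.append(person)
    # Hypothetical rule: opioid use when d + i + noise exceeds 1.5.
    if d + i + rng.gauss(0, 1) > 1.5:
        opioid_users.append(person)


def corr_of_pairs(pairs):
    """Correlation between the first and second elements of each pair."""
    xs, ys = zip(*pairs)
    return corr(xs, ys)


r_population = corr_of_pairs(everyone)
r_users = corr_of_pairs(opioid_users)
print(f"{len(opioid_users)} of {len(everyone)} simulated people use opioids")
print(f"r in population = {r_population:.2f}; r among opioid users = {r_users:.2f}")
```

Roughly a fifth of the simulated population is selected. Within that subgroup, depressive symptoms and impulsivity become clearly negatively correlated, echoing the pattern described for Fig. 2.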
Recently, it has been suggested that collider bias could be a particular problem in research investigating whether smoking protects against contracting COVID‐19 [4]. In many countries, people who develop a cough are advised to get tested for COVID‐19. However, both smoking and COVID‐19 can cause coughing. Smokers who develop a smoking‐related cough may seek a test even when they do not have COVID‐19. This would lead smokers to be over‐represented among those who test negative, inducing a negative association between smoking and COVID‐19 where none really exists. This happens because only those who are tested are included in the sample, and being tested is a consequence of the collider ‘having a cough’. Thus, restricting the sample to those who have been tested for COVID‐19 is equivalent to conditioning on this collider. Results from a recent systematic review support this interpretation: compared with people who have never smoked, smokers were more likely to be tested for COVID‐19, but less likely to test positive [5]. It should be noted that some evidence from sources not subject to this form of collider bias, such as seroprevalence studies [6, 7], suggests that smokers have a lower risk of contracting COVID‐19. Nonetheless, smokers may have worse outcomes when hospitalized [5], so the overall effect of smoking on COVID‐19 could be harmful even if it protects against infection.
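The testing mechanism can be simulated in the same way. In this hypothetical Python sketch:

- Smoking and infection are independent by construction.
- A cough arises through a ‘noisy‐OR’ model, in which a background cause, smoking and infection can each produce it independently.
- Everyone with a cough is tested.

All prevalences and probabilities are invented for illustration. They are not estimates from [4] or [5].

```python
import random


def odds_ratio(pairs):
    """Odds ratio from (exposed, outcome) boolean pairs."""
    a = sum(1 for x, y in pairs if x and y)          # exposed with outcome
    b = sum(1 for x, y in pairs if x and not y)      # exposed without outcome
    c = sum(1 for x, y in pairs if not x and y)      # unexposed with outcome
    d = sum(1 for x, y in pairs if not x and not y)  # unexposed without outcome
    return (a * d) / (b * c)


rng = random.Random(4)
population, tested = [], []
for _ in range(100_000):
    smoker = rng.random() < 0.20    # hypothetical smoking prevalence
    infected = rng.random() < 0.05  # independent of smoking by construction
    # Noisy-OR cough model: a background cause, smoking and infection can
    # each produce a cough on their own.
    p_no_cough = 0.95 * (0.6 if smoker else 1.0) * (0.4 if infected else 1.0)
    person = (smoker, infected)
    population.append(person)
    if rng.random() > p_no_cough:  # has a cough, so gets tested
        tested.append(person)

n_smokers = sum(1 for s, _ in population if s)
n_smokers_tested = sum(1 for s, _ in tested if s)
rate_smokers = n_smokers_tested / n_smokers
rate_non_smokers = (len(tested) - n_smokers_tested) / (len(population) - n_smokers)
or_population = odds_ratio(population)
or_tested = odds_ratio(tested)

print(f"tested: {rate_smokers:.1%} of smokers vs {rate_non_smokers:.1%} of non-smokers")
print(f"OR for infection, whole population = {or_population:.2f}")
print(f"OR for testing positive, tested sample = {or_tested:.2f}")
```

Smokers end up tested far more often than non‐smokers. Among those tested, the odds ratio for testing positive falls well below 1 (roughly 0.15 in expectation with these numbers), even though the population odds ratio is close to 1.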
Collider bias has been implicated in seemingly protective effects of smoking before. For example, a recent paper explored the impact of collider bias in studies examining how smoking affects birth defects [8]. The authors found that, in a sample that included only live births, children of smokers were less likely than children of non‐smokers to have the birth defect anencephaly. Smoking (the exposure) and anencephaly (the outcome) both reduce the likelihood that a pregnancy will end in a live birth (the collider). Therefore, including only live births in a sample introduces collider bias, which inflates the negative association between smoking and anencephaly. Someone unaware of collider bias could erroneously conclude that smoking protects against anencephaly. When the authors also included pregnancies that did not result in live births, the association weakened towards the null. This lends support to the hypothesis that the association is due to collider bias rather than a causal effect.
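A comparable sketch applies to the live‐birth example. Smoking and anencephaly are independent by construction, and each adds to the probability that a pregnancy does not end in a live birth. The frequencies and effect sizes are hypothetical and deliberately exaggerated so that the simulation needs fewer iterations; in reality, anencephaly is far rarer than 5%. None of these values are taken from [8].

```python
import random


def odds_ratio(pairs):
    """Odds ratio from (exposed, outcome) boolean pairs."""
    a = sum(1 for x, y in pairs if x and y)          # exposed with outcome
    b = sum(1 for x, y in pairs if x and not y)      # exposed without outcome
    c = sum(1 for x, y in pairs if not x and y)      # unexposed with outcome
    d = sum(1 for x, y in pairs if not x and not y)  # unexposed without outcome
    return (a * d) / (b * c)


rng = random.Random(5)
all_pregnancies, live_births = [], []
for _ in range(200_000):
    smoker = rng.random() < 0.20
    anencephaly = rng.random() < 0.05  # deliberately exaggerated frequency
    # Additive model: each factor adds to the probability of no live birth
    # (booleans count as 0 or 1 in arithmetic).
    p_no_live_birth = 0.10 + 0.15 * smoker + 0.70 * anencephaly
    pregnancy = (smoker, anencephaly)
    all_pregnancies.append(pregnancy)
    if rng.random() >= p_no_live_birth:  # live birth: the collider
        live_births.append(pregnancy)

or_live = odds_ratio(live_births)
or_all = odds_ratio(all_pregnancies)
print(f"OR among live births only = {or_live:.2f}")
print(f"OR among all pregnancies = {or_all:.2f}")
```

The choice of an additive model for pregnancy loss matters. If the probability of a live birth were a simple product of separate smoking and anencephaly factors, the odds ratio among live births would equal the population odds ratio in this setup. Under the additive assumption used here, the expected odds ratio among live births is (0.05 × 0.90)/(0.75 × 0.20) = 0.3. The odds ratio across all pregnancies stays near 1 because the true effect is null by construction.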
What should addiction researchers do to address collider bias? First, we must think carefully about the causal relationships between the variables in our studies. This can be done by drawing simple causal graphs, such as those in Fig. 1. Even if we do not present these graphs in our manuscripts, a clearer picture of the causal relationships between variables may help us to spot confounders and colliders that we would otherwise have missed. Secondly, we can design and analyse studies in ways that mitigate the impact of collider bias. One approach is to weight participants so that the sample is more representative of the underlying population, with the aim of removing or minimizing the impact of biases in sample selection. Another is to use cross‐contextual studies, in which the same associations are explored in different settings with different underlying sample selection criteria. All these methods have limitations, but crucially they have different limitations. By triangulating across study designs, we can form a clearer picture of which causal associations are likely [9].
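Weighting can be illustrated on the hypothetical testing simulation. This sketch assumes, unrealistically, that the probability of being tested is known exactly for every combination of smoking and infection status. In practice it would have to be estimated, for example from external population data. Each tested person is weighted by the inverse of that probability (inverse probability weighting). In this idealized case, that restores the population 2×2 table in expectation.

```python
import random


def p_tested(smoker, infected):
    """Assumed-known probability of being tested (same noisy-OR cough model)."""
    return 1 - 0.95 * (0.6 if smoker else 1.0) * (0.4 if infected else 1.0)


def weighted_odds_ratio(pairs, weights):
    """Odds ratio from (exposed, outcome) pairs, each counted with a weight."""
    cells = {}
    for pair, w in zip(pairs, weights):
        cells[pair] = cells.get(pair, 0.0) + w
    a, b = cells[(True, True)], cells[(True, False)]
    c, d = cells[(False, True)], cells[(False, False)]
    return (a * d) / (b * c)


rng = random.Random(6)
tested = []
for _ in range(200_000):
    smoker = rng.random() < 0.20
    infected = rng.random() < 0.05  # independent of smoking by construction
    if rng.random() < p_tested(smoker, infected):
        tested.append((smoker, infected))

naive_or = weighted_odds_ratio(tested, [1.0] * len(tested))
ipw_or = weighted_odds_ratio(tested, [1 / p_tested(s, i) for s, i in tested])
print(f"unweighted OR among tested = {naive_or:.2f}")
print(f"inverse-probability-weighted OR = {ipw_or:.2f}")
```

The unweighted odds ratio is strongly biased below 1. The weighted one should land close to the true value of 1. The correction is only as good as the selection model: if testing also depended on unmeasured factors, these weights would not remove the bias.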
Thirdly, once a study has been conducted, we can look for indicators of collider bias. We can examine the demographics of the sample being analysed to identify whether particular groups are over‐ or under‐represented. For example, the over‐representation of smokers among those tested for COVID‐19, combined with their lower rate of testing positive relative to never smokers, is indicative of collider bias. We can also use ‘negative controls’: variables that we have good reason to believe should not be associated. If an association is found between them, the risk of collider bias may be high [10].
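Both indicators can be checked in a simulation. This hypothetical sketch adds a made‐up negative‐control variable, ‘hay fever’, which causes coughing but is unrelated to smoking by construction. It compares the share of smokers in the tested sample with their share of the population. It also computes the odds ratio between smoking and hay fever among those tested.

```python
import random


def odds_ratio(pairs):
    """Odds ratio from (exposed, outcome) boolean pairs."""
    a = sum(1 for x, y in pairs if x and y)          # exposed with outcome
    b = sum(1 for x, y in pairs if x and not y)      # exposed without outcome
    c = sum(1 for x, y in pairs if not x and y)      # unexposed with outcome
    d = sum(1 for x, y in pairs if not x and not y)  # unexposed without outcome
    return (a * d) / (b * c)


rng = random.Random(7)
population, tested = [], []
for _ in range(100_000):
    smoker = rng.random() < 0.20
    hay_fever = rng.random() < 0.15  # hypothetical negative control
    infected = rng.random() < 0.05
    # Noisy-OR cough model; hay fever is unrelated to smoking by construction.
    p_no_cough = (0.95 * (0.6 if smoker else 1.0)
                  * (0.7 if hay_fever else 1.0)
                  * (0.4 if infected else 1.0))
    person = (smoker, hay_fever)
    population.append(person)
    if rng.random() > p_no_cough:  # cough -> tested
        tested.append(person)

share_population = sum(1 for s, _ in population if s) / len(population)
share_tested = sum(1 for s, _ in tested if s) / len(tested)
nc_or_population = odds_ratio(population)
nc_or_tested = odds_ratio(tested)

print(f"smokers: {share_population:.0%} of population, {share_tested:.0%} of tested")
print(f"smoking vs hay fever OR: population {nc_or_population:.2f}, tested {nc_or_tested:.2f}")
```

Smokers make up around half of the simulated tested sample but only a fifth of the population. The negative‐control odds ratio among those tested falls well below 1, even though the two variables are independent overall. In this setup, both are warning signs of collider bias.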
Much as with confounding, it is often not possible to be sure that collider bias has been avoided or eliminated. However, with a clearer understanding of what causes it, and checks and balances to explore whether it could be present, we will be better placed to interpret surprising and implausible findings with appropriate caution and caveats.
Declaration of interests
None.
Author contributions
Harry Tattan‐Birch: Formal analysis; writing‐review & editing. John Marsden: Conceptualization; writing‐review & editing. Robert West: Conceptualization; writing‐review & editing. Suzanne Gage: Conceptualization; writing‐original draft; writing‐review & editing.
Acknowledgements
H.T.B. holds a studentship that is funded by Public Health England (558 585/180737).
Tattan‐Birch, H., Marsden, J., West, R., and Gage, S. H. (2021) Assessing and addressing collider bias in addiction research: the curious case of smoking and COVID‐19. Addiction, 116: 982–984. 10.1111/add.15348.
References
- 1. Fergusson D. M., Boden J. M., Horwood L. J. Cannabis use and other illicit drug use: testing the cannabis gateway hypothesis. Addiction 2006; 101: 556–569.
- 2. Greenland S. Quantifying biases in causal models: classical confounding vs collider‐stratification bias. Epidemiology 2003; 14: 300–306.
- 3. Greenland S., Pearl J., Robins J. M. Causal diagrams for epidemiologic research. Epidemiology 1999; 10: 37–48.
- 4. Griffith G. J., Morris T. T., Tudball M. J., Herbert A., Mancano G., Pike L., et al. Collider bias undermines our understanding of COVID‐19 disease risk and severity. Nat Commun 2020; 11: 5749. 10.1038/s41467-020-19478-2
- 5. Simons D., Shahab L., Brown J., Perski O. The association of smoking status with SARS‐CoV‐2 infection, hospitalisation and mortality from COVID‐19: a living rapid evidence review with Bayesian meta‐analyses (version 7). Addiction [internet] 2020; add.15276 [cited 2020 Oct 29]. Available at: https://onlinelibrary.wiley.com/doi/10.1111/add.15276
- 6. Ward H., Atchison C. J., Whitaker M., Ainslie K. E. C., Elliott J., Okell L. C., et al. Antibody prevalence for SARS‐CoV‐2 in England following first peak of the pandemic: REACT2 study in 100,000 adults. medRxiv 2020; 10.1101/2020.08.12.20173690
- 7. Carrat F., le Lamballerie X., Rahib D., Blanché H., Lapidus N., Artaud F., et al. Seroprevalence of SARS‐CoV‐2 among adults in three regions of France following the lockdown and associated risk factors: a multicohort study. medRxiv 2020; 10.1101/2020.09.16.20195693
- 8. Heinke D., Rich‐Edwards J. W., Williams P. L., Hernandez‐Diaz S., Anderka M., Fisher S. C., et al. Quantification of selection bias in studies of risk factors for birth defects among livebirths. Paediatr Perinat Epidemiol [internet] 2020; 34: 655 [cited 2020 Oct 29]. Available at: https://onlinelibrary.wiley.com/doi/10.1111/ppe.12650
- 9. Lawlor D. A., Tilling K., Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol [internet] 2016; 45: 1866–1886 [cited 2020 Oct 29]. Available at: https://academic.oup.com/ije/article/45/6/1866/2930550
- 10. Gage S. H., Munafò M. R., Davey Smith G. Causal inference in developmental origins of health and disease (DOHaD) research. Annu Rev Psychol [internet] 2016; 67: 567–585 [cited 2020 Oct 29]. Available at: http://www.annualreviews.org/doi/10.1146/annurev-psych-122414-033352