Lewer and colleagues explain why risk factors for a health outcome should not be studied by putting several candidate independent variables (exposures) into a multivariable regression model and identifying which are statistically significant after mutual adjustment.
Key messages.
A “factors associated with” study design can be defined as an observational research study that does not have a prespecified primary independent variable (exposure) and instead uses multivariable regression to test whether any of several exposures affects a health outcome
Variables that have a statistically significant association with the outcome after mutual adjustment are identified as factors associated with the outcome
This study design is common in medical and epidemiological research, and thousands of these studies are published each year
Although not always labelled as such, these studies aim to identify causal relations but do not have the rigour necessary to make causal claims
This study design has important methodological flaws, such as lack of rationale for mutual adjustment of the variables, multiple statistical testing, and post hoc interpretation of significant results, which can produce unreliable and sometimes implausible results
Researchers should not use the “factors associated with” study design and scientific journals should not publish these analyses
Introduction
Many medical and epidemiological studies use multivariable regression to test whether several independent variables (exposures) are causal determinants of a health outcome. Where mutually adjusted regression coefficients are significant, the exposures are labelled as risk factors for the outcome. We call this study design “factors associated with.” In this article, we argue that this method is flawed due to a lack of reasoning about which variables are treated as confounders, multiple statistical testing, and post hoc interpretation of the results. In some cases, researchers use algorithmic or stepwise approaches to select exposure variables, which further exacerbates these problems. Although the results of factors associated with studies often seem reasonable, these problems mean that the method can also produce implausible results, such as dementia reducing the risk of death in patients admitted to hospital for trauma,1 diabetes reducing the risk of venous thromboembolism in the general population,2 and lack of food reducing the risk of post-traumatic stress disorder in refugees.3 Many of these studies are published every year, including in well respected journals. We argue that these studies are misleading and contribute to research waste, and the “factors associated with” method should be abandoned.
The “factors associated with” study design
What factors are associated with death and disease? What are the most important determinants of health? Can we identify risk factors that might be modified? These questions are often asked in observational health research. For example, researchers have investigated risk factors associated with heart attacks,4 strokes,5 serious covid-19,6 7 8 9 complications of cystic fibrosis,10 suicide among veterans,11 and pain after breast cancer surgery,12 with the aim of informing preventive interventions or identifying high risk groups. These studies are not testing a theory or hypothesis about the role of a specific risk factor, but instead simultaneously screen multiple potential risk factors for significant associations with the outcome (box 1).
Box 1. How to identify a “factors associated with” study: some common features.
Exploratory aims (eg, to explore, identify, understand, or characterise factors associated with an outcome).
No primary independent variable (exposure) or hypothesis.
Tables reporting regression results (such as odds ratios) and highlighting statistically significant results.
Classification of multiple exposures as either associated or not associated with the outcome.
In a typical “factors associated with” study, the researcher uses a regression model where the dependent variable is the outcome of interest and the independent variables are exposures such as personal characteristics, health behaviours, clinical factors, living and working conditions, and other socioeconomic and environmental factors. Where the mutually adjusted regression coefficients are statistically significant, the exposures are identified as factors associated with the outcome.
Although the results often look reasonable, we argue that the findings provide little scientific value and can be misleading. “Factors associated with” studies combine multiple poor research practices, including the table 2 fallacy, P-hacking, fishing expeditions, data dredging, HARKing (hypothesising after the results are known), and the Texas sharpshooter fallacy.13 14
The “factors associated with” design was mainstream by the 1970s15 16 and has become increasingly common. In 2024, more than 4000 articles with the phrase “factors associated with” in the title were added to the PubMed database, a 10-fold increase from 2004. Reasons for the increased use of this method might include the widespread availability of multivariable regression in desktop statistical software, ease of doing these studies when data are already collected, lack of requirement for theoretical development, and apparent efficiency of studying several risk factors simultaneously. Although many epidemiologists might be aware of the limitations of this study design, increasing numbers of studies continue to be published, including in the highest ranked journals.
In this article, we do not consider other forms of exploratory causal research, such as genome-wide association studies,17 which typically adjust the association between each candidate genetic variant and the outcome for a limited set of confounding variables, rather than adjusting all of the genetic variables for each other.
Problems with “factors associated with” studies
Consider these six unexpected findings published in high quality journals, each of which was based on analysis of multiple candidate exposures using a regression model.
A study of primary care patients in England found that tobacco smoking reduced the risk of death due to covid-19.7 The authors suggested that inclusion of chronic respiratory disease in the regression model might explain this finding, because smoking causes chronic respiratory disease, which in turn increases vulnerability to covid-19. Therefore, adjusting for chronic respiratory disease could mask the effect of smoking on death due to covid-19. This problem is known as adjusting for a mediator.
A study of people testing for covid-19 found that tobacco smoking reduced the risk of a positive test result.18 This finding may be explained by collider bias, which can occur when an analysis adjusts, stratifies, or selects participants based on a variable (in this case, covid-19 testing) that is affected by both the exposure (tobacco smoking) and the outcome (covid-19).19 Two reasons why people might access a covid-19 test are that they have symptoms of covid-19 or that they have a cough related to smoking. Within the sample of people tested, those with a smoking related cough are therefore less likely to have a positive result. Collider bias can mean that the observed association between the exposure and the outcome is very different from the causal effect.
A study of refugees from Guatemala found that lacking sufficient food almost eliminated the risk of post-traumatic stress disorder.3 The authors suggested that sharing of food and collective cooking might have created a sense of wellbeing. Again, collider bias might be an alternative explanation. Here, refugee status is the collider, which can arise from famine or food scarcity (the exposure) and war (which is a cause of post-traumatic stress disorder, the outcome). Among refugees, those displaced by famine may be less likely to have experienced war, producing a misleading negative association between food scarcity and post-traumatic stress disorder.
A study of patients admitted to hospital due to trauma found that dementia reduced the risk of death.1 This finding might be explained by selection bias, where patients with dementia have a lower threshold for hospital admission or activation of a trauma team response.
A study of US enlisted marines found that relationship counselling increased the risk of suicide, whereas post-traumatic stress disorder, self-harm, and adverse childhood experiences did not.20 The authors suggested that relationship problems might be underlying both counselling and suicide, a scenario sometimes known as confounding by a common cause.
A study of participants in the UK Biobank found that diabetes reduced the risk of venous thromboembolism.2 The authors could not explain the apparent protective effect of diabetes and acknowledged that further evidence would be needed to support this finding. More plausible findings included increased risk of venous thromboembolism associated with older age, male sex, tobacco smoking, and higher body mass index.
Although we have offered some possible explanations for these unexpected findings, we can only speculate, and many unknown mechanisms and unmeasured variables likely contributed to the observed associations (ie, a kind of spaghetti of causation). Apparently plausible results generated by the same method should be treated with the same caution as these strange findings. For example, in the study of venous thromboembolism,2 tobacco smoking was associated with an increased risk, which is likely true, but the effect size may be biased by other elements of the study design.
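The collider mechanism invoked in the covid-19 testing example can be made concrete with a minimal simulation sketch. All numbers below are hypothetical, chosen only for illustration, and are not estimates from any of the studies cited above: smoking and covid-19 infection are generated independently, but restricting the analysis to people who sought a test creates an apparently protective association.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population in which smoking and covid-19 are INDEPENDENT
smokes = rng.random(n) < 0.3
covid = rng.random(n) < 0.1

# People seek a test if they have covid-19 symptoms OR a smoking related cough
# (probabilities are invented for illustration)
p_test = 0.05 + 0.60 * covid + 0.30 * smokes
tested = rng.random(n) < p_test

# Whole population: risk ratio close to 1 (no association, by construction)
rr_pop = covid[smokes].mean() / covid[~smokes].mean()

# Within the tested sample: a spurious "protective" association appears,
# because testing is a collider affected by both smoking and covid-19
rr_tested = covid[tested & smokes].mean() / covid[tested & ~smokes].mean()

print(round(rr_pop, 2), round(rr_tested, 2))
```

No causal effect of smoking exists in the simulated population, yet within the tested subsample smokers appear substantially less likely to test positive, mirroring the published finding.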
Key problems of “factors associated with” studies include: no prespecified primary exposure and no rationale for which variables are used in statistical adjustment; use of multiple statistical tests; and post hoc hypothesising.
No strategy for statistical adjustment: the table 2 fallacy
In a focused observational study of cause and effect, one variable is examined as a potential risk factor (exposure) for a health outcome, and the other variables are included in the model to adjust for confounding. In a “factors associated with” study, all variables are treated simultaneously as exposures and as confounders for each other. The lack of rationale for statistical adjustment means that the quantities estimated by the regression model (eg, odds ratios) do not represent the effect of each variable on the outcome.
Westreich and Greenland called this problem the table 2 fallacy,13 because many studies include a table 1 describing the distribution of exposure variables in a study sample and a table 2 showing multivariable associations between the exposures and an outcome. The table 2 fallacy occurs when all values in table 2 are interpreted as causal effects on the outcome, rather than only the value estimated for the primary exposure.
Imagine having survey data from the general population, including whether participants recently had an injury or fall as a pedestrian (the outcome), their age (an exposure), and whether participants had balance problems (a second exposure). Table 2 might have two rows showing the mutually adjusted results for age and balance problems, based on a multivariable regression model. To estimate the effect of balance problems on pedestrian injuries and falls, an adjustment for age might be made, because age affects both balance and the risk of injuries, and hence is a confounder. Therefore, the value in table 2 might be a useful estimate. To estimate the effect of age on injuries, however, the estimate should probably not be adjusted for balance problems, because balance is best understood as a mechanism, or mediator, of the effect of age on injuries.
Exposures can affect each other in many ways, and some are easier to understand than others. Some common relations between relevant variables and an exposure of interest include: a confounder, which affects the probability of both the primary exposure and the outcome, and may therefore mean that the exposure is correlated with the outcome even if it does not cause the outcome; a mediator, which lies on the causal pathway between the exposure and outcome, and can be considered a mechanism for a causal effect; and a collider, which is an event that is caused by both the exposure and the outcome, and can introduce bias if controlled in the analysis. These relations can be represented diagrammatically with directed acyclic graphs, which can help in the design of statistical adjustment strategies to estimate the causal effect of an exposure.21 When estimating causal effects, in most cases, confounders but not other variables should be controlled. The effect of controlling other variables is unpredictable, but in some cases, controlling for a mediator can create the false impression of no relation between a real exposure and an outcome (a false negative result) whereas controlling for a collider can create the false appearance of a relation between the exposure and outcome (a false positive result). Figure 1 shows an example of a confounder, mediator, and collider in a hypothetical study of risk factors for an injury or fall as a pedestrian.
Figure 1. Example of a confounder, mediator, and collider in a hypothetical study of risk factors for an injury or fall as a pedestrian, with simplified directed acyclic graphs.
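The mediator scenario from the pedestrian example can be sketched numerically. In this hypothetical simulation (all effect sizes are invented for illustration), age affects injuries only through balance: the unadjusted model recovers the total effect of age, while the mutually adjusted "table 2" model wrongly suggests that age is irrelevant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical effect sizes: age worsens balance, and balance (not age
# directly) causes injuries, so balance is a MEDIATOR of the age effect
age = rng.normal(size=n)
balance = 0.8 * age + rng.normal(size=n)
injury = 0.5 * balance + rng.normal(size=n)  # total effect of age = 0.8 * 0.5 = 0.4

def ols(y, *xs):
    """Least squares coefficients (excluding the intercept)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

# Unadjusted model recovers the total causal effect of age (about 0.4)
total = ols(injury, age)[0]

# Mutually adjusted "table 2" model: the age coefficient collapses towards
# zero, falsely suggesting that age does not matter
adj_age, adj_balance = ols(injury, age, balance)

print(round(total, 2), round(adj_age, 2), round(adj_balance, 2))
```

The adjusted age coefficient is close to zero even though, in this simulated world, age genuinely causes injuries; only the unadjusted estimate answers the question about the effect of age.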
These causal pathways are difficult to understand in a study that was not designed to quantify them. Using a different set of adjustment variables for each exposure is unlikely to resolve these problems, because unmeasured variables and the choice of study sample may also affect observed associations. For example, collider bias can result from the choice of study sample or from selection bias during recruitment.22 In the examples above, where studies were conducted in samples of people testing for covid-19 or in Guatemalan refugees, collider bias could result from the fact that people not testing for covid-19 and non-refugees were excluded. Collider variables could affect each exposure in a “factors associated with” study differently, and may not appear in the list of exposure variables or in the description of the sampling frame.
Given these complexities, it is generally not useful to try to unpick the reasons for an apparent multivariable association related to an exposure that is not the primary exposure. Unfortunately, in “factors associated with” studies, plausible findings are often taken to be true, whereas implausible findings are ignored or dismissed, even though these findings were generated with the same method.
Multiple statistical tests: fishing expeditions, data dredging, and P-hacking
By design, “factors associated with” studies involve multiple statistical tests. Each independent variable has its own P value: the probability of observing an effect at least as large as the one seen if there were in fact no real effect. When none of the variables under investigation are true risk factors and all are unrelated to each other and to the outcome, on average one in 20 will yield a P value below 0.05, a false positive finding (type I error). The probability of obtaining a false positive finding increases with the number of statistical tests performed and depends on whether any true associations exist, which, in practice, is never known. For instance, when testing 10 independent candidate risk factors, there is roughly a 40% probability of obtaining at least one false positive finding, irrespective of sample size (figure 2, top). If the study includes one true risk factor and a sample size of 250 participants, the probability that an observed statistically significant association reflects a real effect is only about 50% (figure 2, bottom), far from the 95% probability that researchers might assume.
Figure 2. How multiple testing affects the interpretation of statistical significance. Top panel: calculated from the formula P = 1 − 0.95^n, where n is the number of variables studied and P is the probability of at least one false positive result. Bottom panel: power calculations assuming continuous independent and dependent variables, with one independent variable affecting the dependent variable with a standardised mean difference of 0.1. With a large sample or a large effect size, or both, the positive predictive value approaches 1/(1 + (n − 1) × 0.05), where n is the number of variables: each variable other than the true risk factor has a 0.05 chance of being a false positive result, whereas asymptotically the true risk factor is always detected.
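Both formulas in the figure legend can be checked directly. This short sketch implements them as written, assuming the conventional significance threshold of 0.05:

```python
# Formulas from the figure legend, implemented as plain functions
def p_any_false_positive(n, alpha=0.05):
    """Probability of at least one false positive among n null exposures."""
    return 1 - (1 - alpha) ** n

def asymptotic_ppv(n, alpha=0.05):
    """Large-sample positive predictive value when exactly one of the n
    exposures is a true risk factor: the true factor is always detected,
    and each of the other n - 1 is a false positive with probability alpha."""
    return 1 / (1 + (n - 1) * alpha)

print(round(p_any_false_positive(10), 2))  # about 0.40 for 10 null exposures
print(round(asymptotic_ppv(10), 2))        # about 0.69, even with unlimited data
```

Note that even in the asymptotic best case, with one true risk factor among 10 candidates, roughly three in 10 "significant" findings would be false positives; the 50% figure quoted in the text reflects the additional loss from limited statistical power at a sample size of 250.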

Researchers might enter all of their candidate risk factors into one model, or use algorithmic or stepwise methods to reduce the number of variables. Stepwise methods involve iteratively adding and removing independent variables based on the significance of their associations with the outcome, or on the impact on the model residuals (the model fit).
Stepwise variable selection exacerbates the problems of multiple statistical testing. By comparing combinations of risk factors and selecting those with significant associations, this procedure maximises false positive results,23 24 and can be viewed as a type of automated P-hacking. P-hacking has been defined as “trying out several statistical analyses and/or data eligibility specifications and then selectively reporting those that produce significant results.”25 Researchers might use P-hacking to deliberately maximise their scientific publications at the cost of misleading results. More often, well intentioned researchers are unaware that the stepwise algorithms are a type of P-hacking. These problems are also known as data dredging26 or fishing expeditions because the researcher looks for associations in a dataset rather than testing a hypothesis or theory based on previous knowledge.
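To illustrate, here is a minimal simulation sketch of forward stepwise selection applied to pure noise. The parameters are illustrative and a normal approximation is used for the P values. Even though no candidate exposure is related to the outcome, the procedure declares at least one "factor associated with" the outcome in roughly half of the simulated datasets, far above the nominal 5%:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, k, sims = 200, 15, 100  # participants, candidate exposures, simulations

def forward_select(X, y, alpha=0.05):
    """Greedy forward selection: repeatedly add the candidate with the
    smallest p value, stopping when no candidate reaches p < alpha."""
    selected = []
    while True:
        best_j, best_p = None, alpha
        for j in range(X.shape[1]):
            if j in selected:
                continue
            Xd = np.column_stack([np.ones(len(y)), X[:, selected], X[:, [j]]])
            beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ beta
            s2 = resid @ resid / (len(y) - Xd.shape[1])
            se = sqrt(s2 * np.linalg.inv(Xd.T @ Xd)[-1, -1])
            # two sided p value, normal approximation to the t distribution
            p = 2 * (1 - 0.5 * (1 + erf(abs(beta[-1] / se) / sqrt(2))))
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None:
            return selected
        selected.append(best_j)

# Pure noise: none of the 15 candidate exposures is related to the outcome,
# yet stepwise selection frequently "finds" at least one significant factor
frac = sum(
    len(forward_select(rng.normal(size=(n, k)), rng.normal(size=n))) > 0
    for _ in range(sims)
) / sims
print(round(frac, 2))
```

Because the algorithm picks the smallest of 15 P values at each step, the chance of at least one spurious selection per dataset is close to 1 − 0.95^15 ≈ 0.54, which is the automated P-hacking described above.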
Post hoc hypotheses: Texas sharpshooter fallacy and HARKing
The Texas sharpshooter fallacy is an analogy to describe hypothesising about associations after these associations have been observed. Imagine that someone is doing target practice when no one is watching. They fire a gun at a blank wall and then draw a target around the tightest cluster of bullet holes. They then invite people to observe the accuracy of their aim. This practice is also known as Hypothesising After the Results are Known (HARKing).27
“Factors associated with” studies are a special case of the Texas sharpshooter fallacy and HARKing. Classically, these practices involve pretending that the hypothesis (or target) was already present before the research was started, and therefore have an element of dishonesty. Many “factors associated with” studies do not claim they had prior hypotheses. The conclusions, however, often suggest mechanisms for the observed risk factors as if these proposed mechanisms were being tested. In fact, these mechanisms are being suggested post hoc.
In summary, these three problems mean that the results of “factors associated with” studies do not provide useful insights into causal relations between exposures and outcomes. Methods that estimate causal effects in observational research must be guided by counterfactual theories about what might have happened to people in unobserved exposure statuses. Introductions to these problems and tools such as directed acyclic graphs are available elsewhere,28 29 and recommendations for avoiding some common pitfalls are provided in box 2.
Box 2. When studying the causes of diseases, how to avoid the problems of “factors associated with” studies.
Use unambiguous causal language in your research question, such as “does reducing salt in the diet prevent strokes?”
Prespecify an adjustment strategy based on existing evidence, knowledge, and assumptions; directed acyclic graphs can help to communicate your strategy.
Include all potential confounding variables in the analysis, and do not adjust for mediator or collider variables without good reason.
Use a preregistered research protocol; if legitimate deviations from the protocol are necessary, the protocol will help to explain the deviations transparently.
Focus on estimating an effect, including the uncertainty; if P>0.05, the effect size and confidence interval should still be reported (rather than reported as not significant).
Avoid speculation about results from variables that were included in the analysis to adjust for confounding.
Defences of “factors associated with” studies
Results are exploratory or hypothesis generating
Researchers might accept that limitations exist in a “factors associated with” analysis, but argue that the results give a useful initial indication of potential risk factors. Researchers might say that the results are hypothesis generating and can inform confirmatory studies with a more focused design. But how often has an important hypothesis been generated for the first time by a “factors associated with” study, and then confirmed to be true? We have yet to identify an example. If readers can identify examples, we would like to hear about them, though we question whether such examples would justify the waste from the large number of “factors associated with” studies done each year.
In some cases, “factors associated with” studies are not only wasteful, but potentially harmful. Two of the examples above7 18 suggested that tobacco smoking might prevent serious covid-19, which could undermine public health efforts to reduce the prevalence of smoking. In another example, researchers examined the risk factors associated with recent high intensity physical activity in patients with hypertrophic cardiomyopathy who died during physical activity of any intensity.30 The results suggested that younger age was associated with “high intensity physical activity related sudden cardiac death”. The researchers concluded that younger patients (but not older patients) should be advised against high intensity physical activity, which could undermine clinical guidance and evidence from randomised trials.31 32
A “factors associated with” study might identify an important risk factor by chance, but at the cost of misleading findings and substantial research resources.
Results show associations rather than causal effects
Some researchers argue that an analysis of observational data is not attempting to quantify the effect of a risk factor, and therefore the rigour required to measure a causal effect is not needed. Instead, researchers might claim that the risk factor is “independently associated” with the outcome, and that estimating the size of an association is different from estimating a causal effect.
We have argued that the findings of a “factors associated with” study can reflect the arbitrary design of a regression model rather than processes of substantive importance in the real world. Moreover, if the findings are interpreted in ways that imply the exposure should be modified or might contribute to the risk of a disease, then the inference is inherently causal. The term “independently associated,” if it does not imply causation, does not mean anything. This language can obscure methodological challenges and downplay the need for a clear research question and careful causal reasoning. Even if the discussion section of the article highlights the limitations of the method, the results may still be picked up and promoted by journalists or policy makers.
Results can help prioritise groups for more support
Based on the results of “factors associated with” studies, researchers might conclude that subgroups should be prioritised for effective interventions. In the example of US enlisted marines,20 one might argue that relationship counselling probably did not cause suicide, but nonetheless marines who receive relationship counselling are at higher risk. Therefore, those receiving counselling could benefit from interventions to prevent suicide. This is a question about prediction rather than causation. If the question is whether marines should be prioritised for interventions to prevent suicide according to whether they received relationship counselling, then the relevant value is the univariable association between relationship counselling and suicide, rather than this association after adjustment for other variables, such as deployment to war zones and traumatic brain injury. If the multivariable model is used to prioritise interventions to prevent suicide, then the relevant value would be the predicted risk for each marine based on the full model, rather than coefficients for individual exposures. Neither approach would involve inferring a causal relation based on the significant association between receiving counselling and suicide, after adjustment for the other available variables.
Multivariable regression models can estimate causal effects (what is the effect of an exposure on a disease?) or predict outcomes (who is most at risk?) but cannot do both. “Factors associated with” studies often confuse these different aims, and use language and methods related to both paradigms in the same research study.
Conclusions
We recommend that researchers should not use the “factors associated with” method and scientific journals should not publish these analyses. Many examples of “factors associated with” studies have results that seem reasonable. We are concerned, however, that these studies add little beyond common sense, and may be misleading and harmful. To our knowledge, no “factors associated with” studies have led to important scientific progress, despite the publication of many studies each year. Hence we believe that the “factors associated with” study design should be abandoned.
Footnotes
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.
Provenance and peer review: Commissioned; externally peer reviewed.
References
- 1.Yadav K, Lampron J, Nadj R, et al. Predictors of mortality among older major trauma patients. Can J Emerg Med. 2023;25:865–72. doi: 10.1007/s43678-023-00597-w.
- 2.Gregson J, Kaptoge S, Bolton T, et al. Cardiovascular risk factors associated with venous thromboembolism. JAMA Cardiol. 2019;4:163–73. doi: 10.1001/jamacardio.2018.4537.
- 3.Sabin M, Lopes Cardozo B, Nackerud L, et al. Factors associated with poor mental health among Guatemalan refugees living in Mexico 20 years after civil conflict. JAMA. 2003;290:635–42. doi: 10.1001/jama.290.5.635.
- 4.Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet. 2004;364:937–52. doi: 10.1016/S0140-6736(04)17018-9.
- 5.O’Donnell MJ, Chin SL, Rangarajan S, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet. 2016;388:761–75. doi: 10.1016/S0140-6736(16)30506-2.
- 6.Swann OV, Holden KA, Turtle L, et al. Clinical characteristics of children and young people admitted to hospital with covid-19 in United Kingdom: prospective multicentre observational cohort study. BMJ. 2020;370:m3249. doi: 10.1136/bmj.m3249.
- 7.Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584:430–6. doi: 10.1038/s41586-020-2521-4.
- 8.Petrilli CM, Jones SA, Yang J, et al. Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study. BMJ. 2020;369:m1966. doi: 10.1136/bmj.m1966.
- 9.Grasselli G, Greco M, Zanella A, et al. Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern Med. 2020;180:1345–55. doi: 10.1001/jamainternmed.2020.3539.
- 10.Sly PD, Gangell CL, Chen L, et al. Risk factors for bronchiectasis in children with cystic fibrosis. N Engl J Med. 2013;368:1963–70. doi: 10.1056/NEJMoa1301725.
- 11.LeardMann CA, Powell TM, Smith TC, et al. Risk factors associated with suicide in current and former US military personnel. JAMA. 2013;310:496–506. doi: 10.1001/jama.2013.65164.
- 12.Gärtner R, Jensen M-B, Nielsen J, et al. Prevalence of and factors associated with persistent pain following breast cancer surgery. JAMA. 2009;302:1985–92. doi: 10.1001/jama.2009.1568.
- 13.Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol. 2013;177:292–8. doi: 10.1093/aje/kws412.
- 14.Andrade C. HARKing, cherry-picking, P-hacking, fishing expeditions, and data dredging and mining as questionable research practices. J Clin Psychiatry. 2021;82:20f13804. doi: 10.4088/JCP.20f13804.
- 15.Stamler J, Rhomberg P, Schoenberger JA, et al. Multivariate analysis of the relationship of seven variables to blood pressure: findings of the Chicago Heart Association Detection Project in Industry, 1967-1972. J Chronic Dis. 1975;28:527–48. doi: 10.1016/0021-9681(75)90060-0.
- 16.Eisenberg M, Bergner L, Hallstrom A. Paramedic programs and out-of-hospital cardiac arrest: I. Factors associated with successful resuscitation. Am J Public Health. 1979;69:30–8. doi: 10.2105/ajph.69.1.30.
- 17.Uffelmann E, Huang QQ, Munung NS, et al. Genome-wide association studies. Nat Rev Methods Primers. 2021;1. doi: 10.1038/s43586-021-00056-9.
- 18.de Lusignan S, Dorward J, Correa A, et al. Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study. Lancet Infect Dis. 2020;20:1034–42. doi: 10.1016/S1473-3099(20)30371-6.
- 19.Griffith GJ, Morris TT, Tudball MJ, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun. 2020;11:5749. doi: 10.1038/s41467-020-19478-2.
- 20.Phillips CJ, LeardMann CA, Vyas KJ, et al. Risk factors associated with suicide completions among US enlisted marines. Am J Epidemiol. 2017;186:668–78. doi: 10.1093/aje/kwx117.
- 21.Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
- 22.Hernán MA, Monge S. Selection bias due to conditioning on a collider. BMJ. 2023;381:1135. doi: 10.1136/bmj.p1135.
- 23.Smith G. Step away from stepwise. J Big Data. 2018;5:32. doi: 10.1186/s40537-018-0143-6.
- 24.Heinze G, Dunkler D. Five myths about variable selection. Transpl Int. 2017;30:6–10. doi: 10.1111/tri.12895.
- 25.Head ML, Holman L, Lanfear R, et al. The extent and consequences of p-hacking in science. PLoS Biol. 2015;13:e1002106. doi: 10.1371/journal.pbio.1002106.
- 26.Davey Smith G. Data dredging, bias, or confounding. BMJ. 2002;325:1437–8. doi: 10.1136/bmj.325.7378.1437.
- 27.Kerr NL. HARKing: hypothesizing after the results are known. Pers Soc Psychol Rev. 1998;2:196–217. doi: 10.1207/s15327957pspr0203_4.
- 28.Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. https://miguelhernan.org/whatifbook
- 29.Tennant PWG, Murray EJ, Arnold KF, et al. Use of directed acyclic graphs (DAGs) to identify confounders in applied health research: review and recommendations. Int J Epidemiol. 2021;50:620–32. doi: 10.1093/ije/dyaa213.
- 30.Lee H-J, Gwak S-Y, Kim K, et al. Factors associated with high-intensity physical activity and sudden cardiac death in hypertrophic cardiomyopathy. Heart. 2025;111:253–61. doi: 10.1136/heartjnl-2024-324928.
- 31.Basu J, Nikoletou D, Miles C, et al. High intensity exercise programme in patients with hypertrophic cardiomyopathy: a randomized trial. Eur Heart J. 2025;46:1803–15. doi: 10.1093/eurheartj/ehae919.
- 32.Ommen SR, Ho CY, Asif IM, et al. 2024 AHA/ACC/AMSSM/HRS/PACES/SCMR guideline for the management of hypertrophic cardiomyopathy: a report of the American Heart Association/American College of Cardiology Joint Committee on Clinical Practice Guidelines. Circulation. 2024;149. doi: 10.1161/CIR.0000000000001250.

