Seminars in Hearing
Semin Hear. 2021 Apr 15;42(1):3–9. doi: 10.1055/s-0041-1725996

Interpreting Results from Epidemiologic Studies

Jennifer A Deal 1,2,3, Joshua Betz 3,4, Frank R Lin 1,2,3, Nicholas S Reed 1,2,3
PMCID: PMC8050417  PMID: 33883787

Abstract

Epidemiology is the science of public health. The focus of this discussion is to present a brief overview of how epidemiology approaches questions of disease causation, including why it sometimes gets things wrong, and so to provide a framework for how we consume and use this type of research, particularly when it comes to patient care.

Keywords: epidemiology, causal inference

Discussion

Epidemiology is “the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems” 1 and has an indispensable role in promoting public health. Through epidemiologic methods, we can assess the burden of disease in a population to guide resource allocation, study the natural history and prognosis of disease, and evaluate interventions to prevent or treat disease. Epidemiologic research also provides an important evidence base to guide clinical decision-making and policy. As the science of public health, one of the primary goals of epidemiology is to identify factors that are associated with increased risk of a disease in a population. This allows for the identification of subgroups of the population who may be at higher risk for the disease and who may therefore benefit from intervention. The identification of risk factors in a population also may provide clues as to causes of the disease, with the ultimate goal of intervening on those factors to prevent or delay the development of the disease.

Despite these important applications, epidemiologic research is sometimes disparaged as having little value. However, concerns about epidemiologic research stem primarily from the interpretation and communication of study findings and are not intrinsic to epidemiologic methods themselves. In many instances, epidemiology has been the major driver in improving public health—tobacco control may arguably be one of its greatest successes. But sometimes the conclusions of epidemiologic research, particularly observational research, are at best conflicting and at worst wrong. Take the example of menopausal hormone therapy (MHT) in postmenopausal women. Despite a preponderance of observational research suggesting health benefits of MHT for chronic disease prevention, the Women's Health Initiative (WHI), a large, placebo-controlled randomized trial, was stopped early due to trends suggesting an increase in rates of cardiovascular events and incidence of breast cancer in one of the groups treated with MHT. The effect was immediate—new prescriptions for MHT fell sharply and many women who had been taking MHT stopped. 2 Yet more recent research suggests that the story may not be so simple; although the WHI's findings may hold for older women, especially perhaps for women who experienced menopause many years prior to MHT initiation, for younger women early in menopause, the benefits of MHT may outweigh the harms. 3 Given these conflicting findings, how is a doctor going to advise her patient about the use of MHT? How is a woman going to decide whether and when to initiate MHT? The focus of this discussion is to present a brief overview of how epidemiology approaches questions of disease causation, including why it sometimes gets things wrong, and so to provide a framework for how we consume and use this type of research, particularly when it comes to patient care.

Two fundamental tenets of epidemiology are that (1) epidemiologic research is science that occurs at the population, not the individual, level and (2) health and disease in a population are not random. Factors causing disease can be intrinsic, a genetic mutation for example, or may be related to factors that we are exposed to, such as air pollution, or behaviors that we choose to engage (or not engage) in, such as physical activity. Although the latter tenet is understood by a variety of scientific disciplines, the fact that inference occurs at the population level is one of the defining characteristics of epidemiology, and what sets it apart from some other approaches to health research.

What is the value of looking at populations? Simply put, at an individual level, understanding how a constellation of exposures and behaviors relates to health or disease can be difficult to discern. However, when looking across individuals in a population, patterns that are unclear at the individual level may begin to emerge. This may have been recognized for the first time in 1662, when the English businessman John Graunt (1620–1674) published "Natural and Political Observations Made upon the Bills of Mortality." 4 Using 50 years of weekly "bills," or death records, Graunt was, for the first time, able to determine that there was a cyclic regularity to mortality due to certain causes in London over time. Although it may seem obvious now, at the time it was not understood that, although the cause of death for any given individual is not predictable, causes of death in a population are predictable, and with a remarkable degree of accuracy. This powerful realization allowed for the understanding of the number of deaths that would be expected for a given cause in a given year. Importantly, once the expected number of deaths for a given cause could be anticipated, it became possible to detect outbreaks, that is, an excess of deaths due to that cause—a true revelation. 5
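Graunt's insight can be sketched numerically: if the expected number of deaths from a cause is stable over time, an excess can be flagged when an observed count falls improbably far above it. The weekly counts and the three-standard-deviation rule below are illustrative assumptions, not data from the Bills of Mortality:

```python
import math

# Hypothetical weekly death counts for one cause in past years (assumed data).
historical_weekly_deaths = [42, 38, 45, 40, 41, 39, 44, 43, 37, 41]

# Expected count: the historical average, reflecting Graunt's observed regularity.
expected = sum(historical_weekly_deaths) / len(historical_weekly_deaths)

# Treating counts as roughly Poisson-distributed, flag a week as a possible
# outbreak if it exceeds the expected count by more than ~3 standard deviations.
threshold = expected + 3 * math.sqrt(expected)

def is_excess(observed_deaths: float) -> bool:
    """Return True if the observed count suggests excess mortality."""
    return observed_deaths > threshold

print(round(expected, 1))  # 41.0
print(is_excess(45))       # False: within ordinary sampling variation
print(is_excess(70))       # True: well above what regularity predicts
```

The specific rule (mean plus three standard deviations) is a simple convention for illustration; modern outbreak surveillance uses more refined models built on the same principle of comparing observed to expected counts.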

We have said that one of epidemiology's primary goals is to identify causal risk factors for disease. To understand epidemiology's role in identifying causes of disease, it may be helpful to first discuss other scientific approaches to determining etiology (or cause). One approach is to study the effect of an agent on animals in a controlled laboratory experiment. However, while at the end of the experiment we may be confident about our results in that animal, we must extrapolate those results to humans. Alternatively, we may use human cell culture, but again, at the end of the experiment, must extrapolate results from individual cells to an entire organism.

An alternative approach to studying etiology is to conduct experiments in human populations. Many disciplines take this approach, including psychology and auditory neuroscience. What sets these disciplines apart from epidemiology is the use of experimental conditions in controlled samples. For example, take the case of hearing loss and cognitive function. The epidemiologic approach consists of conducting assessments in large, heterogeneous populations and characterizing associations between a standardized outcome (e.g., cognition measured by a standardized test) and hearing loss (measured by a clinical gold standard). By contrast, at the auditory neuroscience level, the approach may be to recruit a smaller homogeneous population and utilize a series of tasks with varying clinical utility and difficulty to characterize the performance of two matched groups. Each approach may lead to the same conclusion; however, they also may diverge given the vastly different designs.

In epidemiology, the experimental study design is the randomized trial. Participants are randomly assigned, most often using a computer-generated allocation scheme, to a treatment group or to a control group. The control group may be the standard of care, or in some instances, a placebo. Because treatment is randomly assigned, it is not linked to any patient characteristics (e.g., age, race/ethnicity), which is important because these characteristics also may be related to increased risk of the outcome. In other words, with successful randomization, there is no bias in the assignment of treatment. Consequently, the treatment and control groups will be, on average, balanced with respect to factors related to the outcome. Any observed difference in the outcome comparing the treatment to the control group may therefore be ascribed solely to the treatment. For this reason, randomized trials are often considered the "gold standard" epidemiologic design for determining the effect of an exposure on an outcome.
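A small simulation can illustrate why randomization balances groups on average. Here age stands in for any patient characteristic that may also be related to the outcome; the cohort size and its age distribution are hypothetical:

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical cohort of 10,000 participants, each with an age
# (a characteristic that may also be related to the outcome).
ages = [random.gauss(70, 8) for _ in range(10_000)]

# Randomly assign each participant to treatment (True) or control (False);
# assignment is made without reference to any participant characteristic.
assignments = [random.random() < 0.5 for _ in ages]

treated_ages = [a for a, t in zip(ages, assignments) if t]
control_ages = [a for a, t in zip(ages, assignments) if not t]

# With successful randomization, the groups are balanced on average:
# the difference in mean age between arms is close to zero.
diff = statistics.mean(treated_ages) - statistics.mean(control_ages)
print(round(diff, 2))  # small, near zero
```

The same balancing applies to characteristics that were never measured, which is the key advantage randomization holds over statistical adjustment in observational studies.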

Randomized trials have their limitations, however. They are expensive to carry out and, for that reason, are often limited to the short-term effects of a treatment. Consider again the WHI, which was able to look at only the very short-term effects of MHT on health outcomes. Additionally, participants in randomized trials are often healthier than the general population, and certain groups—for example, pregnant women 6 and older adults 7 —may be excluded from participating out of concern of excess risk of harm. Consequently, it may be unclear at the end of the study whether the treatment will work the same in the general population or in the subgroups that were excluded from participating. Additionally, individuals cannot ethically be randomized to receive a treatment which may be harmful. Therefore, nonrandomized, observational studies (e.g., cross-sectional, case–control study, cohort study) must be used to study the possible effect of such an exposure in human populations.

Observational study designs do not have the same protection from bias as randomized studies. Any observed association may be due to several possible explanations. It may be that an observed association is due solely to chance. Alternatively, it could be that the association is not real, but instead is due to bias. Finally, it could be that the association represents a true causal relationship between an exposure and an outcome. Let us take each of these possibilities in turn, beginning with chance.

Consider the association between hearing loss and risk of dementia in older adults. A recent systematic review and meta-analysis estimated that hearing loss is associated with a 94% increased risk of developing dementia; the hazard ratio was 1.94 with a 95% confidence interval of 1.38 to 2.73. 8 The role of the confidence interval is to give an idea of the uncertainty about the sample estimate of 1.94. Why is there a need to estimate uncertainty? Inferential statistics is about using a statistical model to draw conclusions about an entire population using only a sample from that population. A sample of the population is needed because it is not feasible—practically, ethically, or fiscally—to study everyone. Because we are sampling from a population of interest, the characteristics of any sample we take will deviate to some degree from those in the larger population. The degree of these deviations can result in an over- or underestimation of the association we are interested in—in this case, the hearing loss–dementia association—compared with what we would have observed in the entire population. This concept is known as "sampling variability" and when we say that an association may be due to chance, what we really mean is that the association may be due to sampling variability.
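As a rough illustration of how such an interval relates to the estimate, a 95% confidence interval for a hazard ratio is conventionally symmetric on the log scale (estimate ± 1.96 standard errors), so the standard error can be approximately recovered from the published interval. This is a sketch under that normal-approximation assumption, not the meta-analysis authors' actual computation:

```python
import math

# Reported hazard ratio and 95% confidence interval (Livingston et al).
hr, lower, upper = 1.94, 1.38, 2.73

# On the log scale the interval is log(hr) ± 1.96 * SE, so the standard
# error can be backed out from the interval width (approximation only).
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Recompute the interval from the estimate and SE to check consistency.
log_hr = math.log(hr)
recomputed = (math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se))

print(round(se, 3))                                  # ~0.174
print(tuple(round(x, 2) for x in recomputed))        # ~(1.38, 2.73)
```

Recovering the standard error this way is a common trick when reading published results, for example when pooling estimates in a meta-analysis.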

How do we evaluate the potential role of sampling variability in our study? Let us assume that there is no association between hearing loss and dementia risk in the population (this assumption of "no relationship" is what is referred to as the "null hypothesis"). In a long series of studies, we would expect to see varying degrees of association simply due to sampling variability. For example, some studies may show positive associations (greater hearing loss associated with greater dementia risk), while others show negative associations (greater hearing loss associated with lower dementia risk), and some studies will show no relationship. However, on average, across all studies, we would see that hearing loss is not associated with dementia. Of course, in practice, we typically have only one sample—the sample in which we conducted our study. Therefore, when interpreting the results of our study, we must rely on the statistical model to guide our inference about the association in the larger population of interest. Two common metrics to guide our inference and quantify uncertainty due to sampling variability include confidence intervals and p-values.
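This thought experiment of a long series of studies under the null hypothesis is easy to simulate. Below, exposure and outcome stand-ins are drawn independently, so the true association is exactly zero, and the spread of estimates across many hypothetical studies is tabulated; all numbers are illustrative:

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

def simulated_study(n: int) -> float:
    """One hypothetical study under the null: 'exposed' and 'unexposed'
    outcomes are drawn from the same distribution, so the true difference
    is zero; any nonzero estimate reflects sampling variability alone."""
    exposed = [random.gauss(0, 1) for _ in range(n)]
    unexposed = [random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(exposed) - statistics.mean(unexposed)

# A long series of small studies (n = 50 per group in each).
estimates = [simulated_study(50) for _ in range(2000)]

# Individual studies scatter in both directions...
n_positive = sum(e > 0 for e in estimates)
n_negative = sum(e < 0 for e in estimates)
# ...but on average, across all studies, there is no association.
average = statistics.mean(estimates)

print(n_positive, n_negative)  # roughly half positive, half negative
print(round(average, 3))       # near 0
```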

Confidence intervals allow us to infer which values of the association in the population are most consistent with the model and data at hand. If we were to apply 95% confidence intervals appropriately in a very large number of studies, then 95% of these intervals (19 out of 20) would contain the true population value of the association. In the example of Livingston et al, 8 given the data and the model, the best estimate of the hazard ratio of hearing loss and dementia in the population was 1.94, but because of sampling variability, the data and model are also consistent with population values as low as 1.38 and as high as 2.73. Can we say that the probability that the population hazard ratio is between 1.38 and 2.73 is 95%? No—once we have calculated the confidence interval, it either contains the true population value or it does not. While we do not know whether the confidence interval from our sample contains the true population value of the association, we have "confidence" in it because in the long run of experience, properly computed confidence intervals often do contain the true population value.
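The long-run interpretation of a confidence interval can be checked by simulation: draw many samples from a population with a known mean, compute a 95% interval from each sample, and count how often the interval contains the truth. The population values below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

TRUE_MEAN = 5.0  # the population value, known here only because we simulate

def interval_contains_truth(n: int = 100) -> bool:
    """Draw one sample, compute an approximate 95% CI for the mean,
    and report whether it contains the true population value."""
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return m - 1.96 * se <= TRUE_MEAN <= m + 1.96 * se

# In the long run of experience, about 19 in 20 intervals cover the truth.
coverage = sum(interval_contains_truth() for _ in range(2000)) / 2000
print(round(coverage, 2))  # close to 0.95
```

Note that each individual interval either contains 5.0 or it does not; the 95% describes the procedure's long-run behavior, not any single interval.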

Another way we quantify uncertainty related to sampling variability is with the often-misunderstood p-value. If we assume that our model and all its assumptions (including the null hypothesis) are true, the p-value tells us how often, in a large number of studies drawn from a population where no such association exists, we would expect to see a sample association at least as large in magnitude as the one we actually observed. A small p-value suggests that the association we observed is not likely to be explained by sampling variability alone. However, what a p-value does not tell us is just as important as what it does. The p-value does not tell us the magnitude of the association or the clinical importance of the relationship between exposure and outcome. Additionally, because sampling variability is linked to sample size, it is possible to observe statistically significant associations that lack any clinical or scientific relevance in large samples, or to see no statistically significant association with known risk factors in smaller samples. Although we often state that a relationship is significant if the p-value is less than 0.05, use of this cut point is due to convention, and best practice recommendations are to view the p-value as a continuous quantity rather than simply classifying it as above or below a given threshold. For all of these reasons, the p-value, although useful, should not be the sole basis for scientific inference. 9
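The link between sample size and statistical significance can be made concrete with an idealized two-group z-test. The effect size, standard deviation, and sample sizes below are hypothetical, chosen so that the same tiny difference is "significant" only in the large study:

```python
import math
from statistics import NormalDist

def p_value_two_sided(observed_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value from an idealized z-test comparing two group
    means, assuming the standard deviation is known. Illustrative only."""
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    z = observed_diff / se                 # standardized test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same tiny difference (0.05 standard deviations) in both studies:
p_small = p_value_two_sided(0.05, 1.0, 100)     # small study
p_large = p_value_two_sided(0.05, 1.0, 10_000)  # very large study

print(p_small > 0.05)  # True: not "significant" in the small study
print(p_large < 0.05)  # True: "significant" despite trivial magnitude
```

The identical effect yields very different p-values purely because the larger sample shrinks the standard error, which is why a p-value alone says nothing about clinical importance.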

It is important to note that the validity of our inference from statistical models, including p-values and confidence intervals, relies on the validity of our model assumptions. Therefore, it is important to report the assessment of model assumptions and discuss how deviations from these assumptions could impact our inference. In addition, all statistical analyses performed should be transparently reported, and common misinterpretations of statistical inference avoided. 10 11 For interested readers, an excellent treatment of concepts in statistics and probability can be found in Hacking. 12

We have said that chance (i.e., sampling variability) is one possible reason that an exposure may be associated with an outcome in an epidemiologic study. A second possible reason is because of bias. Researchers consider three main sources of possible bias in any epidemiologic study: (1) bias due to the way in which participants are selected to participate in the study or to be included in the analysis (i.e., selection bias); (2) bias due to error in the measurement of the exposure, outcome, or other factors in the study (i.e., information bias); and (3) bias due to failure to account for other factors related to the exposure that also cause the outcome (i.e., confounding). Selection bias is different from selection of study participants, which, as we have previously discussed, must occur as it is not possible to study everyone. Readers interested in learning more about bias are referred to the excellent introductory textbook, Gordis Epidemiology . 13

It is the role of the researcher to evaluate the possible role of each bias in any given study, and to ensure that the inference from the study appropriately acknowledges that possible bias. It is important to note that it is not a question of whether bias exists in an observational study—it does! The question is the magnitude of the bias, and how it may influence our estimated association between an exposure and an outcome. Unfortunately, this important step of communicating the limitations of a study is often not effectively accomplished by the researcher or by the media, which can lead to confusion and a general sense that epidemiologic studies are not valuable.

Finally, if it is believed that an observed relationship is not due to chance or bias, but is a true relationship, how should it be determined whether the relationship is causal? Because of the possibility for bias in observational studies, "causal significance of an association is a matter of judgment." 14 The determination of cause must therefore be beyond the scope of any individual observational study and must incorporate evidence from other scientific studies in both humans and animals. Except in the rare case of a few definitive randomized trials, cause can never be determined from one study.

In 1965, Sir Austin Bradford Hill, 15 epidemiologist and statistician, gave the inaugural presidential address at the meeting of the Section of Occupational Medicine of the Royal Society of Medicine in London, England. In this address, later published in the Proceedings of the Royal Society of Medicine (Hill 1965), Hill presented nine possible guidelines for judging whether an observed association is causal ( Table 1 ). Of the nine guidelines, only temporality —that exposure to a factor must precede the development of the outcome—is required for a factor to be a cause of a disease. In Hill's own words,

Table 1. Sir Bradford Hill's guidelines for evaluating a possible causal association between a factor and a disease (from Hill 1965) 15 .

Analogy: Is there a similar, analogous association that is known to be causal? For example, if a drug is known to cause birth defects and a similar drug is associated with birth defects, that association may also be causal.
Biological gradient: Does risk of disease increase as exposure to the factor increases? For example, does risk of lung cancer increase with increasing number of cigarettes smoked per day? Also referred to as "dose–response."
Coherence: Is a causal association consistent with what is known from scientific investigations in other disciplines? An association should not conflict with what is known about the natural history and biology of the disease.
Consistency: Has this association been observed in epidemiologic studies in different places, populations, and times? If the association is causal, it would be expected to be replicated in other populations.
Experiment: In an unplanned or "natural" experiment, if preventive actions regarding an exposure are taken, is the disease, in fact, prevented?
Plausibility: Is current biological knowledge consistent with a causal association? Was an association between the exposure and the disease anticipated a priori based on current biological knowledge?
Specificity: Is the exposure associated with only one disease, and is the disease associated with only one exposure? Specificity holds for infectious disease, but is considered one of the weaker causal guidelines when evaluating cause for a chronic disease.
Strength of the association: How strong is the relationship between a factor and an outcome? Compared with weak associations, strong associations may be less likely to be explained by bias alone.
Temporality: Does the exposure precede the outcome? To be a cause, the exposure must be present prior to the development of the disease. Temporality is the one causal guideline that must be met for a factor to cause a disease.

What I do not believe…is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect…What they [the nine guidelines] can do, with greater or less strength, is to help us to make up our minds on the fundamental question - is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect? (Hill 1965) 15

Modern causal thinking moves beyond this general framework, emphasizing, as Hill did, that there is no checklist for causation. 10 16 However, Hill's framework is still influential and can provide a good starting point for the evaluation of cause. Recent advances in causal methods in epidemiology and biostatistics have the goal of making observational data more like a randomized trial to strengthen causal inference. Yet causal inference in epidemiology remains a matter of synthesis and judgment.

At the beginning of this discussion, we described how epidemiology is fundamentally a science of populations. It is easy to see, therefore, how epidemiology may make important contributions to public health, as public health is "the health of a whole society." 2 Although the population approach has the advantage of seeing patterns that are difficult to discern in individuals, it also has the drawback of applying a population-based inference to an individual patient. Importantly, this is a concern even if the epidemiologic study results are unbiased and 100% accurate. Recall that the estimated increase in risk of dementia associated with hearing loss is 94%. 8 Does that mean that an individual with hearing loss has a 94% increased risk of developing dementia? The answer is no. An individual's chance of developing dementia is 0 or 1. In other words, the individual will develop dementia or they won't. The correct interpretation is that, on average, in a population, hearing loss is associated with a 94% increased risk, but that does not mean that everyone with hearing loss will develop dementia or that a person with normal hearing won't. So how are the results of an epidemiologic study to be interpreted for an individual patient? Would that patient, had they participated in the study, have had that average response? Precision medicine approaches attempt to address this concern with the goal of tailoring healthcare to an individual based on that individual's characteristics, 17 but to date they have met with limited success.

Disease processes are complex. As we have discussed, observational epidemiologic evidence on its own does not and cannot definitively determine cause. Despite this limitation, it does make a valuable contribution to our understanding of causes of disease in a population. Results from well-designed, well-conducted, well-analyzed epidemiologic studies are more than correlation studies, but they do have their limitations. The possible role of bias must be evaluated for each study. How strong is the bias likely to be? Is it likely to have resulted in an under- or overestimate of the association? When analyzing the data, every statistical model involves assumptions about how the data being analyzed were obtained. Just like models in physics, chemistry, and engineering, models in statistics involve simplifying assumptions: even if these assumptions are not precisely met in practice, models can still be useful and informative when appropriately applied and interpreted. It is important to understand the assumptions that are inherent in the statistical models used, how they are met in the data at hand, how the possible violation of these assumptions should temper their interpretation, and the larger context of the analyses performed. As consumers of epidemiologic data, we must learn to be comfortable with uncertainty and recognize that the results of any one study should never be interpreted in isolation. Replication of findings in other studies and in other diverse populations is key. What must ultimately be balanced is the totality of the strength of the evidence for cause and the possible harms of avoiding public health action. As eloquently expressed by Sir Bradford Hill in his 1965 address 15 :

All scientific work is incomplete - whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time. (Hill 1965) 15

With the right perspective, we can correctly interpret and evaluate the results of epidemiologic studies and place those findings into context of the scientific work that has come before. It is this process of evaluation and synthesizing the evidence from multiple studies and multiple disciplines that allows us to conclude cause, and to take the necessary clinical and policy steps to protect health and improve lives.

Funding Statement

Funding J.A.D. was supported by NIH/NIA grant K01AG054693. N.S.R. was supported by NIH/NIA grant K23AG065443.

Footnotes

Conflict of Interest None declared.

References

1. Porta M, ed. A Dictionary of Epidemiology. 6th ed. New York: Oxford University Press; 2014.
2. Hersh AL, Stefanick ML, Stafford RS. National use of postmenopausal hormone therapy: annual trends and response to recent evidence. JAMA. 2004;291(1):47–53. doi: 10.1001/jama.291.1.47.
3. Chester RC, Kling JM, Manson JE. What the Women's Health Initiative has taught us about menopausal hormone therapy. Clin Cardiol. 2018;41(2):247–252. doi: 10.1002/clc.22891.
4. Graunt J, Willcox WF. Natural and Political Observations Made Upon the Bills of Mortality. Baltimore, MD: The Johns Hopkins Press; 1939.
5. Morabia A. Epidemiology's 350th anniversary: 1662–2012. Epidemiology. 2013;24(2):179–183. doi: 10.1097/EDE.0b013e31827b5359.
6. Blehar MC, Spong C, Grady C, Goldkind SF, Sahin L, Clayton JA. Enrolling pregnant women: issues in clinical research. Womens Health Issues. 2013;23(1):e39–e45. doi: 10.1016/j.whi.2012.10.003.
7. Bernard MA, Clayton JA, Lauer MS. Inclusion across the lifespan: NIH policy for clinical research. JAMA. 2018;320(15):1535–1536. doi: 10.1001/jama.2018.12368.
8. Livingston G, Sommerlad A, Orgeta V, et al. Dementia prevention, intervention, and care. Lancet. 2017;390(10113):2673–2734.
9. Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129–133.
10. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia, PA: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2008.
11. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
12. Hacking I. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press; 2001.
13. Celentano DD, Szklo M. Gordis Epidemiology. 6th ed. Philadelphia, PA: Elsevier; 2019.
14. US Department of Health, Education, and Welfare. Smoking and Health: Report of the Advisory Committee to the Surgeon General. Washington, DC: Public Health Service; 1964.
15. Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300. doi: 10.1177/003591576505800503.
16. Rothman KJ, Greenland S. Causation and causal inference in epidemiology. Am J Public Health. 2005;95(Suppl 1):S144–S150. doi: 10.2105/AJPH.2004.059204.
17. National Research Council (US) Committee on a Framework for Developing a New Taxonomy of Disease. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. Washington, DC: National Academies Press; 2011.

Articles from Seminars in Hearing are provided here courtesy of Thieme Medical Publishers
