Author manuscript; available in PMC: 2020 Jul 1.
Published in final edited form as: Epidemiology. 2019 Jul;30(4):609–614. doi: 10.1097/EDE.0000000000001025

Countering the Curse of Dimensionality: Exploring data-generating mechanisms through participant observation and mechanistic modeling

Alan Hubbard 1, James Trostle 2, Ivan Cangemi 3, Joseph N S Eisenberg 3
PMCID: PMC6548691  NIHMSID: NIHMS1525870  PMID: 30985531

Causal assumptions […] cannot be verified even in principle, unless one resorts to experimental control. […] Statisticians can no longer ignore the mental representation in which scientists store experiential knowledge, since it is this representation, and the language used to access it that determine the reliability of the judgments upon which the analysis so crucially depends.

– Judea Pearl1

Public-health researchers face ethical and practical barriers that often preclude attaining a sufficient degree of experimental control to formally infer causality. Ethical concerns rule out many experiments on human health, and the diverse socio-ecologic mechanisms shaping disease outcomes challenge experimental design (Figure part A). Researchers instead rely on large observational datasets to attain causal understanding, but this requires balancing populations across all potentially relevant variables for an outcome of interest. As the number and interdependence of such variables increase, the size of the required dataset rapidly exceeds plausible levels (the “Curse of Dimensionality”; Figure part B).2
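The growth in required sample size can be illustrated with a minimal sketch. It assumes, purely for illustration, that balancing means filling every combination of covariate levels with a fixed number of participants in each exposure group; the function names and the figure of 10 participants per cell are hypothetical choices, not values from this commentary.

```python
def strata_count(levels_per_variable):
    """Number of distinct covariate combinations (strata)."""
    count = 1
    for levels in levels_per_variable:
        count *= levels
    return count


def minimum_sample_size(levels_per_variable, per_cell=10, exposure_groups=2):
    """Participants needed to place `per_cell` people from each exposure
    group in every stratum (an illustrative balancing criterion)."""
    return strata_count(levels_per_variable) * per_cell * exposure_groups


# Each additional binary covariate doubles the number of strata.
for k in (5, 10, 20, 30):
    binary_covariates = [2] * k
    print(k, "binary covariates ->", minimum_sample_size(binary_covariates), "participants")
```

Under these assumptions the requirement doubles with every added binary variable, which is why prior knowledge that removes even a few variables matters so much.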

Figure. Facilitating causal inference through systematic mechanistic exploration.

(A) Outcomes of interest to public health are shaped by complex networks of mechanisms. For example, population circulation can affect and be driven by pathogen transmission, and pathogen transmission can alter the structure and dynamics of social networks, which in turn can influence the capacity of communities to adopt health-enhancing behavioral norms.44 (B) Disentangling these mechanisms to ascertain causes presents serious analytical challenges. Faced with the curse of dimensionality, for instance, researchers must rely on assumptions from the outside concerning data-generating mechanisms to select key variables and thereby reduce the dimension of the problem. (C) By placing researchers inside system dynamics, participant observation promotes a process of continuous, responsive counterfactual reasoning, without predetermined variables and rigid study designs. Instead, researchers accumulate perceptions and experiences of potentially relevant mechanisms from diverse perspectives. (D) Combined with mechanistic modeling, participant observation can facilitate causal inference by guiding the exploration of candidate data-generating mechanisms across different contexts of observation (‘transportability’).45

Over the last thirty years, it has become evident that the curse of dimensionality extends across global challenges from climate change and threats to biodiversity, to increasing disparities and insularity among social groups. Improving causal understanding requires lowering the dimension of these complex processes. This in turn requires experiential knowledge across many disciplines and thus expanded definitions of relevant research foci and methods.3 Public health has partially addressed this through a renewed commitment to investigating health outcomes as parts of socio-ecologic systems, a return to the holistic roots of the field in the nineteenth and early twentieth centuries.4,5

Delving into the health impacts of the systems involved in climate change6 or social disparities,7 public health researchers are trying to confront causal tangles not easily solved using conventional analytical approaches such as simple linear or logistic regression models. Identifying major risk factors for disease can lead to misleading results when studying large numbers of variables interacting at multiple social, spatial, and temporal scales.8–11 In the case of epidemiology, in particular, the risk factor approach has been critically evaluated in discussions of “black box” epidemiology.12 There have been subsequent calls for multi-level analysis,13 more sophisticated analysis of causal webs,14 and the need to move away from proximate (individual) and toward ultimate (socio-ecologic) causes of disease.15

Since the late 1990s, therefore, public-health methodologists have been evaluating how researchers move between population samples, joint distributions of variables, and the “data-generating mechanisms”1 that underlie them. Arguably their most important insight is that, beyond the confines of strict experimental control, causal inference must proceed from prior assumptions or beliefs about data-generating mechanisms.16,17 These causal assumptions, unlike associational assumptions, are fundamentally untestable; their validity hinges on the researcher’s ability to identify and accurately map the true data-generating mechanism underlying a distribution of interest.1

Much of the early causal inference literature, however, created its evidence only within an experimental paradigm. Some recent discussions of causal inference have called for broader types of evidence to describe data-generating mechanisms: Schwartz et al.18 (see also Kaufman19), for example, criticized the excessive attention to intervention designs as the only legitimate form of causal exploration, and Krieger and Davey Smith called for “different strands of evidence produced by myriad methods.”20

We propose that field research iterating cycles of observation and mechanistic modeling represents a useful approach to explore and evaluate candidate data-generating mechanisms. Participant observation is especially appropriate when modeling human behavioral influences on disease. Other kinds of observation related to physical or natural processes (e.g., rainfall patterns, soil composition, and animal behavior) along with mechanistic modeling can reveal the relevant causal dynamics of non-human or non-behavioral influences on disease. This is precisely the experiential knowledge that Pearl promotes as essential for causal inference. For the purposes of this commentary, we focus on participant observation of social processes to illustrate our ideas about how to link observed associations to theories about causality.

Participant observation methods are especially important for epidemiologists collecting data in communities (“shoe-leather” studies), but we think that epidemiologists doing secondary analysis of existing data would also benefit from referring to these kinds of data. As we next point out, fundamental elements of this approach were instrumental to the early development of public health, especially in studies of infectious disease.

The power of participant observation

By participant observation, I mean a technique […] of getting data, it seems to me, by subjecting yourself, your own body and your own personality, and your own social situation, to the set of contingencies that play upon a set of individuals […]. So that you are close to them while they are responding to what life does to them. […] To me, that’s the core of observation.

– Erving Goffman21

Modern public health relies heavily on observational evidence gathered in the field. Researchers collecting community level data spend substantial amounts of time in the communities they study in order to identify and interact with key informants, conduct focus groups, administer surveys to carefully selected samples of individuals, and measure predefined variables. They may also collect relevant information in an informal way as they are repeatedly exposed to and share in the details of daily community life. Although unconstrained experiential observation of this sort is seldom included as an explicit aim of modern public-health fieldwork, when carried out intensively it can enrich the static snapshots of system states produced through standard observational techniques. It can stimulate and guide intuition and creativity—whose role in epidemiology was recently highlighted in this journal22—and ultimately facilitate causal inference.

Intensive engagement with a system over time stimulates researchers to consider target phenomena from diverse perspectives, promoting continuous, responsive counterfactual reasoning. When accumulating perceptions and experiences of daily life, researchers can abstract observed patterns into categories—social roles, types of objects and places, etc.—and the relations that link them in recurring types of interactions. Researchers can then systematically evaluate the coherence and operation of these categories and relations across data sets, time, and situations.23

Anthropologists and sociologists use the term “participant observation” to refer to fieldwork designed to elicit this sort of immersive engagement with a system (Figure part C), and have written extensively about its methodologic underpinnings.24 Well before the formal advent of participant observation within anthropology and sociology, however, close precursors were pioneered in nineteenth-century public health.25,26 In the case of measles, for instance, over a century before the isolation of the virus, Peter Panum used detailed studies of diets, habits, and living conditions following an outbreak in the Faroe Islands to show its respiratory transmission, incubation period, and the lifelong immunity it confers. Over 8 months, he visited 52 villages, talking with those who had contracted the disease, verifying the timeline of its progression, and showing the relationship between people’s gathering together in group fish hunts and their subsequent infection.

Similarly, important aspects of participant observation can be attributed to another foundational contribution of modern epidemiology, John Snow’s work on the transmission of cholera.27 Snow isolated the causal features underlying cholera outbreaks in large part by accumulating perceptions and experiences of socio-ecologic dynamics within different contexts of transmission. His sources included informal observations in the coalmines of Newcastle, systematic conversations with residents during the Soho outbreak of 1854, and the detailed local knowledge of a clergyman who was intimately involved with the community. Unlike most of his contemporaries, Snow was able to focus “on the dynamics of transmission, on movement rather than stasis, on the verbs of actions, rather than the nouns of characteristics and things.”28

Turning to contemporary public health, aspects of participant observation have been productively used to study illicit drug use and related interventions,29,30 and to evaluate population-level records of infant mortality in Brazil in light of the perceptions and experiences of “popular death reporters” such as coffin makers, midwives, and priests.31 These studies illustrate the utility of immersive fieldwork for probing unexpected results from quantitative analyses and improving the validity of estimates. Although participant observation can thus be effective on its own, we believe that fully realizing its promise for systematic mechanistic exploration will require integrating it into formal mechanistic modeling techniques. We next argue that the strengths of mechanistic modeling complement those of participant observation.

Toward systematic mechanistic exploration

Where other forms of observation, such as surveys and interviews, capture static snapshots of predetermined facets of a system, participant observation injects the researcher directly into the thick of the dynamics. Participant observers often struggle to connect rigorously the details of daily lives they accumulate during fieldwork with their evolving insights regarding the macro-level socio-ecological forces that help shape them. Sociologist Mitchell Duneier argues that researchers should “try to grasp the connections between individual lives and macroforces at every turn, while acknowledging [their] uncertainty when [they] cannot be sure how those forces come to bear on individual lives”.32 Agent- and equation-based mechanistic modeling can help participant observers probe mechanistic connections, communicate assumptions and uncertainties unambiguously, and ensure the compatibility of the resulting insights with appropriate statistical techniques (Figure part D).

Mechanistic models can easily be extended from simple toy models—little more than dynamic conceptual schemas (e.g., compartmental models)—to virtual counterfactual experiments incorporating large sets of carefully estimated parameters for specific cases. This spectrum of mechanistic abstraction is ideally suited to complement participant observation iteratively. Particularly in the early stages of participatory fieldwork, models can help clarify and organize mechanistic insights as they emerge from the observer’s accumulating perceptions and experiences. Even divorced from quantitative data, models at this stage can shed light on the plausibility of potential data-generating mechanisms.

Exploratory mechanistic modeling also has a long history within public health. The gradual extension of models based on simple mathematical abstractions in the late nineteenth and early twentieth centuries, for instance, helped fuel the development of infectious-disease epidemiology. Examples include the fundamental debate concerning whether regularities observed across epidemic curves for various diseases should be attributed to mechanisms of variable pathogen infectivity or host population dynamics. This debate was resolved through the gradual development and analysis of SIR (Susceptible, Infected, and Recovered) models,33 which helped establish the concept of herd immunity and demonstrate the value of mosquito control in combating malaria.34,35
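As a minimal sketch of the kind of toy compartmental model discussed here, the following code steps a textbook SIR model forward with a simple Euler scheme and computes the classic herd-immunity threshold 1 − 1/R0. The parameter names (`beta` for the transmission rate, `gamma` for the recovery rate), the step size, and the example values are illustrative assumptions, not quantities taken from the cited studies.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, days=160, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Returns a list of (time, S, I, R) tuples."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this closed model
    steps = int(round(days / dt))
    trajectory = [(0.0, s, i, r)]
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


def herd_immunity_threshold(beta, gamma):
    """Immune fraction above which infections decline from the start."""
    basic_reproduction_number = beta / gamma
    return max(0.0, 1.0 - 1.0 / basic_reproduction_number)


# Illustrative values only: R0 = beta / gamma = 2.
outbreak = simulate_sir(beta=0.5, gamma=0.25, s0=999, i0=1)
peak_time, _, peak_infected, _ = max(outbreak, key=lambda row: row[2])
print("peak infected:", round(peak_infected, 1), "at day", round(peak_time, 1))
print("herd immunity threshold:", herd_immunity_threshold(0.5, 0.25))
```

Starting the same simulation with an immune fraction above the threshold (for example `s0=400, i0=1, r0=599`) yields no epidemic growth, which is the qualitative behavior behind the herd-immunity concept.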

The early 20th-century epidemiologist Ronald Ross was especially clear about the importance of mechanistic modeling for the development of public health. At a time when epidemiologic research was predominantly rooted in “a posteriori” statistical analyses, Ross advocated an “a priori pathometry,” conceptualizing epidemiologic problems in terms of abstract “happenings” and identifying mechanisms that could account for observed disease patterns across multiple contexts.35 He presented a strong argument for the value of simple mechanistic models and their iterative extension: “These studies require to be developed much further; but they will already be useful if they help to suggest a more precise and quantitative consideration of the numerous factors concerned in epidemics. At present medical ideas regarding these factors are generally so nebulous that almost any statements about them pass muster […]”.36

Many of the model notations and structures developed from the foundational debates mentioned above have anchored a century of sophisticated forays into epidemiologic dynamics.37 For example, the progressive extension of SIR and related models was instrumental in establishing the role of superspreaders as potential drivers of outbreaks. Analyses of specific chains of transmission for diseases such as gonorrhea and severe acute respiratory syndrome (SARS) suggested that interactions within small groups of high-risk individuals might sustain transmission in larger populations.38,39 These observations promoted models allowing heterogeneous network dynamics, which have since confirmed the importance of considering superspreaders in outbreak control strategies, and the types and quality of data necessary to document superspreading events.38,40
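One simple way to see why a small high-contact group can sustain transmission is to compare a homogeneous-mixing estimate of the basic reproduction number with one that accounts for variation in contact rates. The sketch below assumes proportionate mixing, under which the reproduction number scales with the mean squared contact rate divided by the mean contact rate. This is a deliberately simplified stand-in for the heterogeneous network models cited above, and all names and numbers are hypothetical.

```python
def basic_reproduction_numbers(group_fractions, contact_rates,
                               transmission_per_contact, infectious_duration):
    """Return (homogeneous, heterogeneous) estimates of R0.

    homogeneous:   everyone is assigned the population-average contact rate.
    heterogeneous: proportionate mixing, R0 proportional to E[c^2] / E[c].
    """
    if abs(sum(group_fractions) - 1.0) > 1e-9:
        raise ValueError("group fractions must sum to 1")
    mean_rate = sum(f * c for f, c in zip(group_fractions, contact_rates))
    mean_square = sum(f * c * c for f, c in zip(group_fractions, contact_rates))
    scale = transmission_per_contact * infectious_duration
    return scale * mean_rate, scale * mean_square / mean_rate


# Hypothetical population: 2% of individuals have 20 times the contacts of the rest.
homogeneous, heterogeneous = basic_reproduction_numbers(
    group_fractions=[0.98, 0.02],
    contact_rates=[1.0, 20.0],
    transmission_per_contact=0.3,
    infectious_duration=1.0,
)
print("homogeneous R0:", round(homogeneous, 3))
print("heterogeneous R0:", round(heterogeneous, 3))
```

With these illustrative numbers the averaged estimate falls below 1 while the heterogeneous estimate exceeds 1, so a model that ignores the core group would wrongly predict that transmission dies out.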

Simple models can lend structure to the early stages of mechanistic exploration without determining or constraining the perceptions and experiences of the observer. In turn, the observer’s evolving mechanistic insights can guide the extension of models and targeted data collection to inform them. Ultimately, we envision this iterative approach as yielding a constrained set of well-specified models representing candidate data-generating mechanisms, primed to guide inference in combination with appropriate statistical techniques.2,41–43

Obstacles in academic and public health practice create disincentives to bringing observational analysis from the social sciences together with modeling analysis from the natural and physical sciences. These obstacles are curricular, financial, temporal, and cultural. They can be addressed by training epidemiologists in qualitative methods and by creating more grant opportunities that require multidisciplinary research teams. As epidemiologists acquire these skills and experience, the impediments associated with disciplinary conventions about study duration and legitimate methods will grow smaller. We suggest that iteratively using participant observation, basic science, and mechanistic modeling will provide a smaller and more efficient causal representation of a system under study, eliminating unnecessary variables and ensuring that the important variables are included. Thus, theory can reduce the dimension of the problem because fewer variables are considered relevant, helping to operationalize Judea Pearl’s call for statisticians to consider “the mental representation in which scientists store experiential knowledge.”1

Acknowledgments

Source of funding: This work was supported by grant U01GM110712 from the Models of Infectious Disease Agent Study (MIDAS) program within the National Institute of General Medical Sciences of the National Institutes of Health and grant 1360330 from the National Science Foundation Water Sustainability and Climate program.

Biography

About the Authors:

Alan Hubbard is Professor of Biostatistics at University of California, Berkeley. Dr. Hubbard’s research focuses on the estimation of causal effects using machine learning with applications in epidemiology and biomedicine.

James A Trostle is Professor of Anthropology at Trinity College and Visiting Professor of Public Health at the University of Chile. He combines anthropology and epidemiology to study the transmission of infectious diseases within families and across regional landscapes. During three decades of global health work he has served on task forces for WHO Programs on Tropical Disease Research, Diarrheal Disease Research, and Human Reproduction.

Ivan Cangemi is a Postdoctoral Scholar in the Department of Epidemiology at the University of Michigan. Dr. Cangemi’s research focuses on agent-based modeling of population processes with a focus on the dynamics of social structure and interaction.

Joseph N.S. Eisenberg is Professor and Chair of the Department of Epidemiology at the University of Michigan. Dr. Eisenberg has a particular interest in environmental determinants of infectious diseases, largely focusing on water- and vectorborne diseases, integrating field epidemiology with dynamic transmission model analysis.

Footnotes

The authors are not aware of any conflicts of interest.

References

  • 1. Pearl J. An introduction to causal inference. Int. J. Biostat 2010;6:7.
  • 2. Bengtsson T, Bickel P, Li B. Curse-of-dimensionality revisited: collapse of the particle filter in very large scale systems. In: Nolan D, Speed T, eds. Probability and statistics: essays in honor of David A. Freedman. Beachwood: Institute of Mathematical Statistics; 2008:316–334.
  • 3. Editorial Board. Mind Meld. Nature 2015;525:289–290.
  • 4. Schwartz S, Susser E, Susser M. A future for epidemiology? Annu. Rev. Public Health 1999;20:15–33.
  • 5. Eisenberg JNS, Desai MA, Levy K, et al. Environmental determinants of infectious disease: a framework for tracking causal links and guiding public health research. Environ. Health Perspect 2007;115:1216–1223.
  • 6. Herlihy N, Bar-Hen A, Verner G, et al. Climate change and human health: what are the research trends? A scoping review protocol. BMJ Open 2016;6:e012022.
  • 7. Diez Roux AV. Conceptual approaches to the study of health disparities. Annu. Rev. Public Health 2012;33:41–58.
  • 8. Taubes G. Epidemiology faces its limits. Science 1995;269:164–169.
  • 9. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:696–701.
  • 10. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008;19:640–648.
  • 11. Burton PR, Hansell AL, Fortier I, et al. Size matters: just how big is BIG? Quantifying realistic sample size requirements for human genome epidemiology. Int. J. Epidemiol 2009;38:263–273.
  • 12. Susser E. Eco-epidemiology: thinking outside the black box [commentary]. Epidemiology 2004;15:519–520.
  • 13. Diez-Roux AV. Multilevel analysis in public health research. Annu. Rev. Public Health 2000;21:171–192.
  • 14. Krieger N. Epidemiology and the web of causation: has anyone seen the spider? Soc. Sci. Med 1994;39:887–903.
  • 15. McMichael AJ. Prisoners of the proximate: loosening the constraints on epidemiology in an age of change. Am. J. Epidemiol 1999;149:887–897.
  • 16. Greenland S. Modeling and variable selection in epidemiologic analysis. Am. J. Public Health 1989;79:340–349.
  • 17. Petersen ML, van der Laan MJ. Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology 2014;25:418–426.
  • 18. Schwartz S, Gatto NM, Campbell UB. Causal identification: a charge of epidemiology in danger of marginalization. Ann. Epidemiol 2016;26:669–673.
  • 19. Kaufman JS. There is no virtue in vagueness: Comment on: Causal identification: a charge of epidemiology in danger of marginalization by Sharon Schwartz, Nicolle M. Gatto, and Ulka B. Campbell. Ann. Epidemiol 2016;26:683–684.
  • 20. Krieger N, Davey Smith G. The tale wagged by the DAG: broadening the scope of causal inference and explanation for epidemiology. Int. J. Epidemiol 2016;45:1787–1808.
  • 21. Goffman E. On fieldwork. J. Contemp. Ethnogr 1989;18:123–132.
  • 22. Wilcox AJ, Cortese M, Baravelli CM, Skjaerven R. When intuition invites the analytical mind to dance—the essential role of creativity in science. Epidemiology 2018;29:753–755.
  • 23. Tavory I, Timmermans S. A pragmatist approach to causality in ethnography. Am. J. Sociol 2013;119:682–714.
  • 24. Kawulich BB. Participant observation as a data collection method. Forum Qual. Soc. Res 2005;6:43.
  • 25. Fleck AC, Ianni FAJ. Epidemiology and anthropology: some suggested affinities in theory and method. Hum. Organ 1958;16:38–40.
  • 26. Trostle J. Epidemiology and culture. New York: Cambridge University Press; 2005.
  • 27. Snow J. On the mode of communication of cholera. London: John Churchill; 1855.
  • 28. Paneth N, Fine P. The singular science of John Snow. Lancet 2013;381:1267–1268.
  • 29. Power R. Participant observation and its place in the study of illicit drug abuse. Br. J. Addict 1989;84:43–52.
  • 30. Bourgois P, Bruneau J. Needle exchange, HIV infection, and the politics of science: confronting Canada’s cocaine injection epidemic with participant observation. Med. Anthropol 2000;18:325–350.
  • 31. Nations MK, Amaral ML. Flesh, blood, souls, and households: cultural validity in mortality inquiry. Med. Anthropol. Q 1991;5:204–220.
  • 32. Duneier M, Carter O, Hasan H. Sidewalk. New York: Farrar, Straus, and Giroux; 1999.
  • 33. Fine PEM. John Brownlee and the measurement of infectiousness: an historical study in epidemic theory. J. R. Stat. Soc. Ser. A Stat. Soc 1979;142:347–362.
  • 34. Fine PEM. Herd immunity: history, theory, practice. Epidemiol. Rev 1993;15:265–302.
  • 35. Fine PEM. Ross’s a priori pathometry—a perspective. Proc. R. Soc. Med 1975;68:1–5.
  • 36. Ross R. Some quantitative studies in epidemiology. Nature 1911;87:466–467.
  • 37. Hethcote HW. The mathematics of infectious diseases. SIAM Rev 2000;42:599–653.
  • 38. Hethcote HW, Yorke JA. Gonorrhea transmission dynamics and control. Berlin: Springer-Verlag; 1984.
  • 39. Stein RA. Super-spreaders in infectious diseases. Int. J. Infect. Dis 2011;15:e510–e513.
  • 40. Kemper JT. On the identification of superspreaders for infectious disease. Math. Biosci 1980;48:111–127.
  • 41. Ionides EL, Bhadra A, Atchadé Y, et al. Iterated filtering. Ann. Stat 2011;39:1776–1802.
  • 42. Lavine JS, Rohani P. Resolving pertussis immunity and vaccine effectiveness using incidence time series. Expert Rev. of Vaccines 2012;11:1319–1329.
  • 43. Roy M, Bouma MJ, Ionides EL, et al. The potential elimination of Plasmodium vivax malaria by relapse treatment: insights from a transmission model and surveillance data from NW India. PLoS Negl. Trop. Dis 2013;7:e1979.
  • 44. Zelner JL, Trostle J, Goldstick JE, et al. Social connectedness and disease transmission: social organization, cohesion, village context, and infection risk in rural Ecuador. Am. J. Public Health 2012;102:2233–2239.
  • 45. Pearl J, Bareinboim E. Transportability of causal and statistical relations: a formal approach. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 2011:247–254.
