Abstract
Population health improvements are the most relevant yardstick against which to evaluate the success of social epidemiology. In coming years, social epidemiology must increasingly emphasize research that facilitates translation into health improvements, with continued focus on macro-level social determinants of health. Given the evidence that the effects of social interventions often differ across population subgroups, systematic and transparent exploration of the heterogeneity of health determinants across populations will help inform effective interventions. This research should consider both biological and social risk factors and effect modifiers. We also recommend that social epidemiologists take advantage of recent revolutionary improvements in data availability and computing power to examine new hypotheses and expand our repertoire of study designs. Better data and computing power should facilitate underused analytic approaches, such as instrumental variables, simulation studies and models of complex systems, and sensitivity analyses of model biases. Many data-driven machine-learning approaches are also now computationally feasible and likely to improve both prediction models and causal inference in social epidemiology. Finally, we emphasize the importance of specifying exposures corresponding with realistic interventions and policy options. Effect estimates for directly modifiable, clearly defined health determinants are most relevant for building translational social epidemiology to reduce disparities and improve population health.
Keywords: causal inference, effect modification, exploratory analyses, machine learning, population health, social epidemiology, translational research
The final metric for evaluating social epidemiology must be its contribution to improvements in population health. If we fail to translate research findings from academic journals to human health, the field is irrelevant. We agree with Galea and Link (1) that social epidemiology is increasingly popular. Our discipline sits at the intersection of science and social justice—an exciting intersection—but scientific popularity is no guarantee of scientific merit. Basic descriptive and theoretical work has laid the groundwork for social epidemiology, drawing heavily on theories from sociology, to define and evaluate the extent to which social and economic factors outside the individual are important contributors to health. It is now imperative that results-driven translational social epidemiology assume greater importance within the field. Identifying opportunities for effective intervention may require that we go off the beaten path of risk-factor epidemiology. We recommend 3 complementary courses for the next generation of social epidemiology: 1) systematically and transparently exploring the heterogeneity of health determinants across populations, considering both biological and social risk factors; 2) leveraging recent revolutionary improvements in data availability and computing power to improve knowledge generation and causal inference; and 3) focusing research on exposures that correspond with realistic interventions and policy options to facilitate translation from research into action.
EXPLORING HETEROGENEITY: WHAT'S GOOD FOR THE GOOSE MIGHT NOT BE GOOD FOR THE GANDER
Social epidemiologists face a tension between recognizing the contingency of effects of social determinants and providing summary effect estimates that can be easily communicated. The empirical evidence now suggests that there is substantial heterogeneity in many important determinants of health. Interventions that seemed promising prima facie have turned out, upon rigorous evaluation, to have null or even harmful effects for some subgroups (2–7). Findings from major randomized controlled trials and quasi-experimental studies that suggest qualitative effect modification (i.e., opposite treatment effects for different subgroups) should force us to be even more ruthless in analyses of observational data. Often, there was insufficient observational exploration before initiation of the randomized controlled trials to fully anticipate the heterogeneity of treatment effects; we did not recognize the contingency of effects because theoretical understanding was insufficient and we were too cautious in exploring effect-modification hypotheses in observational analyses (8). Even for interventions with beneficial effects on average, there is an ethical obligation to understand whether and why some people are harmed by the intervention. Understanding subgroup differences is especially important when the subgroup potentially harmed has little information or opportunity to avoid the treatment, as would often be the case with policy interventions (4–7).
In addition to considering the possibility of heterogeneous effects of social determinants based on sociodemographic characteristics, social epidemiologists should investigate whether genetic and biologically embedded health risk factors are modified by social context (9–12). We assume that few health disparities are strictly attributable to differences in genetic allele frequencies. However, social inequalities become biologically embedded, and the physiological differences induced by such inequalities may modify other major pathways to disease. In Rothman's causal pie heuristic for multiple causation of disease (13), the influence of each component cause on disease depends on the prevalence of the other components in the population. Identical genetic risk factors may have entirely different consequences for individuals in high-risk environments compared with those in low-risk environments. For example, major genetic determinants of obesity operate via appetite. The effects of increased appetite may well be larger for individuals who live in highly obesogenic environments with easy access to unhealthy food (14). We hypothesize that many social determinants have different effects on people with high versus low background genetic risks. If so, we must incorporate effect modification by genetic background to accurately describe the health effects of social context.
Given the focus of social epidemiology on processes of social stratification and inequality, our discipline may be uniquely positioned to explore effect-modification hypotheses, including hypotheses about how social and environmental contexts silence, exacerbate, or even reverse genetic effects. The likely importance of environmental modifiers in the expression of genetic determinants of health is an emerging lesson from genome-wide association studies (15–18). Social epidemiologists should have a strong voice in gene–environment studies: articulating meaningful measures of social risk factors, designing studies that are inclusive and unbiased, and helping interpret and frame results. For example, researchers must evaluate the extent to which genetic background influences social risk factors, which may in turn impact health (19). Social epidemiologists may hesitate to engage with genetic research because it is assumed that genetic factors are neither major contributors to health inequalities nor easily modifiable (and therefore of little public health interest) (20). The history of genetic determinism and eugenics certainly justifies a critical approach, but it also indicates the importance of involving thoughtful social scientists in genetic research.
How should we explore the heterogeneous effects of social determinants of health, whether based upon biology or social group? Ethnographic approaches provide rich information for the generation of hypotheses regarding heterogeneous effects. Some quantitative methods also naturally lend themselves to identification of heterogeneity; for example, quantile regression methods can reveal whether an exposure has similar effects on people at high versus low background risks of the outcome (21), even when specific modifiers have not been identified. A priori hypotheses about heterogeneous effects can arise from rich social theory (8). Although a priori or post-hoc theoretical explanations for heterogeneous effect estimates can often be found, it is sometimes difficult for even very theoretically sophisticated researchers to reliably anticipate the nature of such heterogeneity. Social theory is important for many reasons (1), but theoretical frameworks may (ironically) make it more difficult to discover new, unexpected phenomena, to accept the veracity of unexpected results, or to imagine plausible alternatives precisely because theories shape our expectations. For these reasons, we believe social epidemiology is likely to be advanced by a systematic exploration of heterogeneity that is accompanied by transparent reporting of findings.
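The quantile-based idea can be illustrated with a minimal sketch; it is not the analysis used in reference 21. For a single binary exposure with no covariates, the quantile regression coefficient at a given quantile equals the difference between the exposed and unexposed groups' outcome quantiles, so the sketch compares deciles directly. The simulated data, the variable names, and the assumed data-generating process (exposure widens the outcome distribution, so the upper tail shifts more than the lower tail) are all hypothetical.

```python
import random
import statistics

def quantile_differences(unexposed, exposed, n_cuts=10):
    """Difference in outcome quantiles (exposed minus unexposed) at each cut point.

    With a single binary exposure and no covariates, these differences equal
    the coefficients a quantile regression would estimate at each quantile.
    """
    q0 = statistics.quantiles(unexposed, n=n_cuts)
    q1 = statistics.quantiles(exposed, n=n_cuts)
    return [b - a for a, b in zip(q0, q1)]

rng = random.Random(0)
n = 5000
# Hypothetical outcome (e.g., a risk score). Exposure shifts the mean by 1 unit
# and multiplies the spread by 1.5, so people at high background risk are
# affected more than people at low background risk.
unexposed = [rng.gauss(25.0, 4.0) for _ in range(n)]
exposed = [25.0 + 1.0 + 1.5 * rng.gauss(0.0, 4.0) for _ in range(n)]

diffs = quantile_differences(unexposed, exposed)
for decile, d in zip(range(10, 100, 10), diffs):
    print(f"{decile}th percentile: difference = {d:+.2f}")
```

A single mean difference would summarize these data as an average shift of about 1 unit and would hide the pattern across deciles, which is the kind of heterogeneity the paragraph above describes.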
EVALUATING HETEROGENEITY IN THE CONTEXT OF UNCERTAINTY: DO NOT FEAR THE FISH
The goal of exploring heterogeneity is generally thought to be in conflict with an idealized process of scientific research and knowledge generation, which specifies that we develop a hypothesis based on prior theory and evidence, conduct a statistical test, and report the parameter estimate. Full stop. Do not reformulate and test post-hoc hypotheses; do not report multiple associations; and do not report exploratory results that fail to reach statistical significance. However, if we lack good explanations for the observed population health patterns, as we often do, this approach makes little sense. In the context of uncertainty, when we are engaging in research as a tool for discovery, “fishing” or data mining may offer a way forward.
Fishing—testing numerous exploratory hypotheses without strong prior evidence—inevitably generates many statistically significant associations due to chance (i.e., type I errors). These chance associations do not correspond with causal structures and therefore cannot inform effective public health interventions. Publication of such spurious findings need not slow scientific progress, however. Results must be honestly presented as exploratory and efforts to reproduce the associations in other studies must be published (even when the findings do not support the original association). The problem with fishing is not so much type I errors, but rather the selective publication of only “statistically significant” findings. Reporting of nonsignificant associations is essential if multiple testing approaches are to be useful. Publication bias against null findings is a recognized problem (22), but this problem could be solved or substantially remediated if scientists were not incentivized to hide null findings (23). For example, journals could commit to the online publication of exploratory analyses in conjunction with papers published in the print journal. Journals could also commit to online publication of new studies in which investigators attempt to replicate previous analyses, including exploratory analyses. Data mining, conducted and presented responsibly, could improve our very incomplete understanding of social determinants of health.
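A small simulation may make the arithmetic of fishing concrete. This is a sketch under stated assumptions, not an analysis from the cited studies: every tested association is assumed to be truly null, so each valid p-value is uniformly distributed between 0 and 1. The number of tests and the significance threshold are hypothetical choices.

```python
import random

def simulate_null_pvalues(n_tests, rng):
    """Under a true null hypothesis, a valid p-value is uniform on (0, 1)."""
    return [rng.random() for _ in range(n_tests)]

rng = random.Random(42)
n_tests = 1000   # hypothetical number of exploratory associations tested
alpha = 0.05

p_values = simulate_null_pvalues(n_tests, rng)
significant = [p for p in p_values if p < alpha]

# The expected number of chance findings is n_tests * alpha (here, 50).
print(f"tests run: {n_tests}, 'significant' by chance: {len(significant)}")

# Selective publication: a reader who sees only the significant results
# cannot tell that every one of the tested associations is null.
visible_share = len(significant) / n_tests
print(f"share of results visible under selective reporting: {visible_share:.1%}")

# A Bonferroni threshold can only be computed when the total number of tests
# is reported, which is one reason full reporting matters.
bonferroni_hits = [p for p in p_values if p < alpha / n_tests]
print(f"significant after Bonferroni correction: {len(bonferroni_hits)}")
```

The correction in the last step depends on knowing `n_tests`, which is exactly the information lost when only "statistically significant" findings are published.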
Social epidemiologists can learn from genetic epidemiologists and the lessons of genome-wide association studies. Casting the net wide for possible new understandings of social and environmental determinants of health can be successful if combined with transparency of approach, full presentation of findings, and syntheses of evidence from multiple studies. While genome-wide association studies continue to advance, related approaches incorporating other domains of exposure, termed environment-wide association studies, are emerging (24, 25). As with genome-wide association studies, environment-wide association studies will undoubtedly generate many false leads, but transparent reporting and systematic meta-analyses will culminate in robust findings.
HARNESSING 21ST CENTURY COMPUTING POWER: THIS IS NOT YOUR MOTHER'S MACINTOSH
Revolutionary increases in computing power put a host of powerful methods within reach of social epidemiologists, and some of these tools should dramatically change how we conduct research. Improved computing power has facilitated access to an avalanche of new data for epidemiologic research, including genetics, neuroimaging, geospatial, social interaction, dietary, and medical care information. Not only will data availability grow into the future, but new data sources will also provide opportunities for new study designs to test social epidemiologic hypotheses. For example, instrumental variables can often be derived from genetics (26), geography (27, 28), characteristics of health care service delivery (29–31), or other variables that would not in the past have been routinely available. Thus, new data sources expand the repertoire of study designs to improve causal inference.
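To illustrate why an instrumental variable can help, the following sketch simulates an unmeasured confounder and compares a naive regression slope with the Wald (ratio) instrumental variable estimator. It is an illustration only, not the method of any cited study. The data-generating process, effect sizes, and variable names are hypothetical, and the instrument is constructed so that it affects the outcome only through the exposure and is independent of the confounder, assumptions that cannot be guaranteed in real data.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)
n = 20000
true_effect = 0.5  # assumed causal effect of exposure on outcome

z, x, y = [], [], []
for _ in range(n):
    instrument = 1.0 if rng.random() < 0.5 else 0.0  # e.g., a policy in one's state
    confounder = rng.gauss(0.0, 1.0)                 # unmeasured common cause
    exposure = 1.0 * instrument + confounder + rng.gauss(0.0, 1.0)
    outcome = true_effect * exposure + confounder + rng.gauss(0.0, 1.0)
    z.append(instrument)
    x.append(exposure)
    y.append(outcome)

# Naive regression slope of outcome on exposure: biased by the confounder.
naive_estimate = covariance(x, y) / covariance(x, x)

# Wald (ratio) IV estimator: association of instrument with outcome divided by
# association of instrument with exposure.
iv_estimate = covariance(z, y) / covariance(z, x)

print(f"true effect:    {true_effect:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")
print(f"IV estimate:    {iv_estimate:.2f}")
```

In this constructed example the naive slope overstates the effect because the confounder raises both exposure and outcome, whereas the ratio estimator uses only the variation in exposure induced by the instrument.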
More computational power will also facilitate simulation studies and agent-based models for complex systems. As Galea and Link note, these studies are likely to be valuable to help evaluate new research methods, clarify plausible mechanistic models, and project the public health impacts of proposed interventions. Computationally intensive tools such as bootstrapping are already popular (32). Improved computational power should also foster sophisticated sensitivity analyses (33, 34). We envision a day when our discussion sections no longer include claims such as “any bias is most likely conservative” and instead state “the distribution of plausible effect estimates across a range of bias parameters is shown in the appendix.”
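A sensitivity analysis of the kind envisioned above could look something like the following sketch. It is not the procedure of references 33 or 34. It applies the standard external-adjustment formula for a binary unmeasured confounder, drawing the bias parameters repeatedly from ranges the analyst considers plausible. The observed risk ratio and all parameter ranges are hypothetical, and the sketch ignores random error and assumes no effect modification by the confounder.

```python
import random

def adjust_for_unmeasured_confounder(rr_observed, rr_confounder_outcome,
                                     prev_exposed, prev_unexposed):
    """Externally adjust a risk ratio for a binary unmeasured confounder.

    Divides the observed risk ratio by the bias factor
    (p1 * (RR_cd - 1) + 1) / (p0 * (RR_cd - 1) + 1).
    """
    bias = ((prev_exposed * (rr_confounder_outcome - 1.0) + 1.0) /
            (prev_unexposed * (rr_confounder_outcome - 1.0) + 1.0))
    return rr_observed / bias

rng = random.Random(7)
rr_observed = 1.50  # hypothetical observed risk ratio

adjusted = []
for _ in range(10000):
    # Hypothetical plausible ranges for the three bias parameters.
    rr_cd = rng.uniform(1.0, 3.0)  # confounder-outcome risk ratio
    p1 = rng.uniform(0.3, 0.6)     # confounder prevalence among the exposed
    p0 = rng.uniform(0.1, 0.4)     # confounder prevalence among the unexposed
    adjusted.append(adjust_for_unmeasured_confounder(rr_observed, rr_cd, p1, p0))

adjusted.sort()
low, median, high = (adjusted[int(q * len(adjusted))] for q in (0.025, 0.5, 0.975))
print(f"observed RR: {rr_observed:.2f}")
print(f"bias-adjusted RR, median (2.5th-97.5th percentile): "
      f"{median:.2f} ({low:.2f}-{high:.2f})")
```

Reporting the resulting interval, together with the assumed parameter ranges, replaces a verbal claim about the direction of bias with a distribution that readers can inspect and challenge.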
New data and increased computational power should also facilitate exploration of heterogeneity and generation of novel substantive hypotheses. Machine learning approaches automate search across large numbers of variables. These search tools go beyond testing previously hypothesized effect modifiers to use data-driven, supervised learning procedures to help elaborate complex causal structures (35). Prediction applications using individual algorithms, such as classification trees, least absolute shrinkage and selection operator, or Bayes nets (36–39) are likely to be supplanted by ensemble methods, which use weighted averages of multiple alternative algorithms to derive more stable estimates (36, 40–45). The power of ensemble approaches received widespread recognition when the movie rental company Netflix sponsored an international contest for the best model to predict customers' film preferences. To win the Netflix contest, competing teams repeatedly merged to access one another's algorithms, improving accuracy by combining predictions from an ever larger set of predictive algorithms (44, 46).
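The core idea of an ensemble, a weighted average of predictions from different algorithms, can be sketched in a few lines. This toy example is neither the Super Learner of reference 41 nor the approach used in the Netflix contest. It blends two deliberately simple learners (a straight line and a one-split tree) and picks the blending weight on held-out data. The data-generating process, the learners, and the weight grid are all hypothetical choices made for illustration.

```python
import random

def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
             sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_stump(xs, ys):
    """One-split regression tree: mean outcome on each side of the median of x."""
    cut = sorted(xs)[len(xs) // 2]
    left = [y for x, y in zip(xs, ys) if x < cut]
    right = [y for x, y in zip(xs, ys) if x >= cut]
    mean_left, mean_right = sum(left) / len(left), sum(right) / len(right)
    return lambda x: mean_left if x < cut else mean_right

rng = random.Random(3)
# Hypothetical data: the outcome rises with x and also jumps at x = 0.
xs = [rng.uniform(-1.0, 1.0) for _ in range(600)]
ys = [x + (1.0 if x > 0 else 0.0) + rng.gauss(0.0, 0.3) for x in xs]
train_x, train_y = xs[:400], ys[:400]
valid_x, valid_y = xs[400:], ys[400:]

line, stump = fit_line(train_x, train_y), fit_stump(train_x, train_y)
line_pred = [line(x) for x in valid_x]
stump_pred = [stump(x) for x in valid_x]

def blend(w):
    """Weighted average of the two learners' held-out predictions."""
    return [w * a + (1.0 - w) * b for a, b in zip(line_pred, stump_pred)]

# Choose the ensemble weight that minimizes held-out error over a coarse grid.
best_w = min((i / 20 for i in range(21)), key=lambda w: mse(blend(w), valid_y))
print(f"line MSE:     {mse(line_pred, valid_y):.3f}")
print(f"stump MSE:    {mse(stump_pred, valid_y):.3f}")
print(f"ensemble MSE: {mse(blend(best_w), valid_y):.3f} (weight on line = {best_w:.2f})")
```

Because the weight is chosen and evaluated on the same held-out data, the ensemble error reported here is optimistic. Approaches such as cross-validation, with a final evaluation on data not used for weighting, would give a fairer comparison.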
Alternative machine-learning procedures emphasize causal inference, identifying the (often large) set of causal structures consistent with observed data (47–49). Various estimation approaches can then be applied to quantify or parameterize the causal pathways in the proposed structures. Although these tools have rarely been applied in social epidemiology, recent applications are promising (50).
As evidenced by the Netflix challenge, there is currently intensive competition to improve on machine-learning approaches. We assume this competition will continue for years to come, to the benefit of the field. Social epidemiologists would do well to adopt a hybrid approach linking search and data mining procedures with our knowledge of the social phenomena, thereby drawing on social theory. However, search and data mining procedures may be most valuable for topics about which we understand the least.
In sum, social epidemiology will be advanced by adopting data-driven exploration and search methods and taking advantage of the recent expansion in secondary data availability. Methodological innovation is not merely about applying novel methods to improve our estimation in the third decimal point. New data and new computing power should allow us to approach problems differently. Inevitably, alternative approaches will have different strengths and weaknesses. We hope that as novel methods are introduced, researchers will articulate both the advantages and disadvantages of each innovation to foster judicious adoption of the new tools.
WHAT DOES TRANSLATION MEAN FOR SOCIAL EPIDEMIOLOGY? HONING EXPOSURE OPERATIONALIZATION TO FOSTER TRANSLATION
The primary challenge we face in social epidemiology—to improve population health—is reflected in the question, “So what can we do about it?” The “so what” question is frequently posed not only by other epidemiologists and practitioners but also by the public and policymakers. For example, given research demonstrating that socioeconomic position strongly predicts health, a common response is that it is not feasible to eliminate socioeconomic inequalities in health because it is not feasible to eliminate socioeconomic inequalities.
Using specific, modifiable exposure measures instead of generic socioeconomic position indicators will naturally address the “so what” question by estimating contrasts between realistic counterfactuals, supporting a “practicable social epidemiology” (51). Syme et al. argued that “it doesn't take a revolution” (52, p. 114) to leverage our understanding of social determinants to improve population health. Changes in social and economic exposures, as well as corresponding improvements in population health, occur routinely even in the absence of vast resource redistribution. To that end, a critical step for improving translation in 21st century social epidemiology is to define and test exposures that correspond with directly modifiable risk factors. Clearly defined counterfactuals lay the groundwork for causal inference, and causal inference lays the groundwork for translation into population health improvements (53, 54).
For example, the majority of research on the health effects of financial conditions uses “income” or “wealth” without considering the source of the money. Yet, the health effects of income seem to differ depending on the mechanism by which the income is delivered (55–57). Research on the health effects of income would have much greater potential for translation if income were operationalized to correspond to specific mechanisms for delivering additional income (e.g., increases in minimum wages, tax credits or reductions in sales taxes, or housing subsidies). Policies for income supplementation could then be compared with alternative subsidies, for example, food or housing assistance. This framework implies a greater focus on effects of changes in exposure. For example, although depression is an established predictor of stroke, little research has been conducted on successful treatment of depression and subsequent stroke risk (58). Evidence from life-course research implies that meaningful specification of exposure must include the age at which the risk factor is encountered (59). For example, poverty at 4 years of age may have different consequences than poverty at 64 years of age. Research evaluating social determinants of health with such precise specifications of the hypothesized social determinant is currently rare.
Heretofore, social epidemiology has been dominated by descriptive studies. Descriptive work is necessary and has been valuable to, for example, establish key dimensions of health inequality, demonstrate that inequalities are mutable, and challenge the view that inequalities are the “fault” of the disadvantaged (60–62). Descriptive efforts clearly merit ongoing attention. However, too little research has focused on what to do about these well-documented inequalities. To now move forward we must ask how we can improve the health of disadvantaged groups. Our discipline focuses on the effects of social context on health, so translation must focus primarily on modifying social context. To support translation, we must understand how distal factors, such as economic conditions, public policy, physical environment, social networks, early childhood family and educational context, or working conditions, influence health. How can we most effectively interrupt the cascading, intersecting processes that lead to disease?
We also emphasize what a translational social epidemiology is not: Population health relevance does not imply narrowing the scope of the field to only small scale interventions, interventions localized to a single community, individual behavioral processes, or proximal risk factors that are shaped by the larger social context. Translational social epidemiology does not imply focusing exclusively on clinical services, even if effects of clinical interventions may be observable within a much shorter time frame and randomization is often easier. Rather, the purview of translational social epidemiology should include estimating health effects of alternative social and economic policies (unemployment compensation, earned income tax credits, social security eligibility age); of specific infrastructure features of neighborhoods or communities (grocery stores, sidewalks, policing strategies); or of discriminatory interactions.
Galea and Link lament that social epidemiologists are rarely at the policy table. Enhanced training in social policy and law, along with more interdisciplinary collaborations with policy researchers and policymakers, would help address this. However, even with strong training and collaborations, rigorously evaluating the health effects of macro-level factors such as social policies is difficult. Conventional randomization may be nearly impossible; the number of macro units (e.g., countries, states) is usually small and finite; modifying the macro-level cause might be extremely difficult or politically infeasible (52); and effects of any isolated macro-level cause, once conveyed through a long causal chain, are typically small and may emerge at an unknown time in the future (63). Appropriate data are often unavailable; high-quality health measures were not included in many past social policy evaluations, and available assessments sometimes focused exclusively on health care delivery or service usage (64). These challenges imply that it is not always realistic to exclusively study naturalistic exposures such as existing social policies or specific interventions, and we must conduct research on effects all along the causal pathway. Establishing the effects of downstream mediators is essential for translational social epidemiology, to make a compelling case for causality, to identify opportunities for downstream interventions, and to inform theory and thereby predict generalizability of findings from one context to another.
CONCLUSION
The last decade of research has made us both more humble about our current understanding of social determinants of health and more optimistic about the potential role of social epidemiology research to improve population health. Why are we humbled? Several interventions based on observational epidemiologic research failed to deliver health improvements (59, 65). Why are we optimistic? Analytic methods based on observational data have been vindicated time and again (66, 67), and the disappointments of observational epidemiology have led to critical re-appraisal and strengthening of conceptual and empirical approaches (59, 68). Research continues to support the overwhelming role of social causes in morbidity and mortality (69). We are confident that social epidemiologists, armed with conceptual clarity, methodological innovation, and scientific rigor, can identify new and more effective approaches for reducing inequalities and improving population health. Some of our recommendations, such as data-driven exploratory analyses and a greater tolerance for potentially false positive associations, veer off the well-trodden path of hypothesis-driven regression models. Going off-road entails risk of mistakes, but it also offers the best chance of new discoveries.
ACKNOWLEDGMENTS
Author affiliations: Department of Social and Behavioral Sciences, Harvard School of Public Health, Boston, Massachusetts (M. Maria Glymour); Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, California (M. Maria Glymour); Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota (Theresa L. Osypuk); and General Medical Disciplines, School of Medicine, Stanford University, Stanford, California (David H. Rehkopf).
Conflict of interest: none declared.
REFERENCES
1. Galea S, Link B. Six paths for the future of social epidemiology. Am J Epidemiol. 2013;178(6):843–849. doi: 10.1093/aje/kwt148.
2. Schmeiser MD. Expanding wallets and waistlines: the impact of family income on the BMI of women and men eligible for the Earned Income Tax Credit. Health Econ. 2009;18(11):1277–1294. doi: 10.1002/hec.1430.
3. Ertel KA, Glymour MM, Glass TA, et al. Frailty modifies effectiveness of psychosocial intervention in recovery from stroke. Clin Rehabil. 2007;21(6):511–522. doi: 10.1177/0269215507078312.
4. Osypuk TL, Schmidt NM, Bates LM, et al. Gender and crime victimization modify neighborhood effects on adolescent mental health. Pediatrics. 2012;130(3):472–481. doi: 10.1542/peds.2011-2535.
5. Osypuk TL, Tchetgen EJ, Acevedo-Garcia D, et al. Differential mental health effects of neighborhood relocation among youth in vulnerable families: results from a randomized trial. Arch Gen Psychiatry. 2012;69(12):1284–1294. doi: 10.1001/archgenpsychiatry.2012.449.
6. Kling JR, Liebman JB, Katz LF. Experimental analysis of neighborhood effects. Econometrica. 2007;75(1):83–119.
7. Jacob BA, Ludwig J, Miller DL. The effects of housing and neighborhood conditions on child mortality. J Health Econ. 2013;32(1):195–206. doi: 10.1016/j.jhealeco.2012.10.008.
8. Osypuk TL. Invited commentary: integrating a life course perspective and social theory to advance research on residential segregation and health. Am J Epidemiol. 2013;177(4):310–315. doi: 10.1093/aje/kws371.
9. Johnson W, Kyvik KO, Mortensen EL, et al. Education reduces the effects of genetic susceptibilities to poor physical health. Int J Epidemiol. 2010;39(2):406–414. doi: 10.1093/ije/dyp314.
10. Rehkopf DH, Adler N. Commentary: it's not all means and genes—socio-economic position, variation and genetic confounding. Int J Epidemiol. 2010;39(2):415–416. doi: 10.1093/ije/dyp396.
11. Boardman JD. State-level moderation of genetic tendencies to smoke. Am J Public Health. 2009;99(3):480–486. doi: 10.2105/AJPH.2008.134932.
12. Boardman JD, Roettger ME, Domingue BW, et al. Gene-environment interactions related to body mass: school policies and social context as environmental moderators. J Theor Polit. 2012;24(3):370–388. doi: 10.1177/0951629812437751.
13. Rothman K, Greenland S, Lash T. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.
14. Wardle J, Carnell S, Haworth CM, et al. Evidence for a strong genetic influence on childhood adiposity despite the force of the obesogenic environment. Am J Clin Nutr. 2008;87(2):398–404. doi: 10.1093/ajcn/87.2.398.
15. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–753. doi: 10.1038/nature08494.
16. Zuk O, Hechter E, Sunyaev SR, et al. The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012;109(4):1193–1198. doi: 10.1073/pnas.1119675109.
17. Ober C, Vercelli D. Gene-environment interactions in human disease: nuisance or opportunity? Trends Genet. 2011;27(3):107–115. doi: 10.1016/j.tig.2010.12.004.
18. Smith GD. Epidemiology, epigenetics and the ‘Gloomy Prospect’: embracing randomness in population health research and practice. Int J Epidemiol. 2011;40(3):537–562. doi: 10.1093/ije/dyr117.
19. Jencks C. Heredity, environment, and public policy reconsidered. Am Sociol Rev. 1980;45(5):723–736.
20. Merikangas KR, Risch N. Genomic priorities and public health. Science. 2003;302(5645):599–601. doi: 10.1126/science.1091468.
21. Liu SY, Kawachi I, Glymour MM. Education and inequalities in risk scores for coronary heart disease and body mass index: evidence for a population strategy. Epidemiology. 2012;23(5):657–664. doi: 10.1097/EDE.0b013e318261c7cc.
22. Dwan K, Altman DG, Arnaiz JA, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One. 2008;3(8):e3081. doi: 10.1371/journal.pone.0003081.
23. Glymour M, Kawachi I. Here's a proposal for editors that may help reduce publication bias. BMJ. 2005;331:638. doi: 10.1136/bmj.331.7517.638-a.
24. Patel CJ, Bhattacharya J, Butte AJ. An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS One. 2010;5(5):e10746. doi: 10.1371/journal.pone.0010746.
25. Patel CJ, Cullen MR, Ioannidis JP, et al. Systematic evaluation of environmental factors: persistent pollutants and nutrients correlated with serum lipid levels. Int J Epidemiol. 2012;41(3):828–843. doi: 10.1093/ije/dys003.
26. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22. doi: 10.1093/ije/dyg070.
27. McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA. 1994;272(11):859–866.
28. Glymour MM, Kawachi I, Jencks CS, et al. Does childhood schooling affect old age memory or mental status? Using state schooling laws as natural experiments. J Epidemiol Community Health. 2008;62(6):532–537. doi: 10.1136/jech.2006.059469.
29. Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19(6):537–554. doi: 10.1002/pds.1908.
30. Brookhart MA, Wang PS, Solomon DH, et al. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology. 2006;17(3):268–275. doi: 10.1097/01.ede.0000193606.58671.c5.
31. Malkin JD, Broder MS, Keeler E. Do longer postpartum stays reduce newborn readmissions? Analysis using instrumental variables. Health Serv Res. 2000;35(5 Pt 2):1071–1091.
32. Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med. 2000;19(9):1141–1164. doi: 10.1002/(sici)1097-0258(20000515)19:9<1141::aid-sim479>3.0.co;2-f.
33. Fox M, Lash T, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol. 2005;34(6):1370–1376. doi: 10.1093/ije/dyi184.
- 34.Lash T, Fink A. Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology. 2003;14(4):451–458. doi: 10.1097/01.EDE.0000071419.41011.cf. [DOI] [PubMed] [Google Scholar]
- 35.Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer; 2009. [Google Scholar]
- 36.Zhang C, Ma Y. Ensemble Machine Learning: Methods and Applications. New York, NY: Springer; 2012. [Google Scholar]
- 37.Chang YJ, Chen LJ, Chung KP, et al. Risk groups defined by Recursive Partitioning Analysis of patients with colorectal adenocarcinoma treated with colorectal resection. BMC Med Res Methodol. 2012;12:2. doi: 10.1186/1471-2288-12-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Strobl C, Malley J, Tutz G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 2009;14(4):323–348. doi: 10.1037/a0016973. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hothorn T, Hornik K, Zeileis A. A laboratory for recursive part(y)itioning. R package version 09-11. Vienna, Austria: R Foundation for Statistical Computing; 2006.
- 40.Rose S. Mortality risk score prediction in an elderly population using machine learning. Am J Epidemiol. 2013;177(5):443–452. doi: 10.1093/aje/kws241.
- 41.van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:Article25. doi: 10.2202/1544-6115.1309.
- 42.Tan AC, Gilbert D. Ensemble machine learning on gene expression data for cancer classification. Appl Bioinformatics. 2003;2(3 suppl):S75–S83.
- 43.Goldstein BA, Hubbard AE, Cutler A, et al. An application of random forests to a genome-wide association dataset: methodological considerations & new findings. BMC Genet. 2010;11:49. doi: 10.1186/1471-2156-11-49.
- 44.Jahrer M, Töscher A, Legenstein R. Combining predictions for accurate recommender systems. Presented at the 16th Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining International Conference; July 25–28, 2010; Washington, DC.
- 45.Foreman KJ, Lozano R, Lopez AD, et al. Modeling causes of death: an integrated approach using CODEm. Popul Health Metr. 2012;10:1. doi: 10.1186/1478-7954-10-1.
- 46.Abu-Mostafa YS. Machines that think for themselves. Sci Am. 2012;307(1):78–81.
- 47.Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. 2nd ed. Boston: MIT Press; 2000.
- 48.Maathuis MH, Colombo D, Kalisch M, et al. Predicting causal effects in large-scale systems from observational data. Nat Methods. 2010;7(4):247–248. doi: 10.1038/nmeth0410-247.
- 49.Robins J, Richardson T, Spirtes P. On identification and inference for direct effects. Seattle, WA: University of Washington; 2009. http://www.stat.washington.edu/research/reports/2009/tr563.pdf
- 50.Rettenmaier A, Wang Z. What determines health: a causal analysis using county level data [published online ahead of print September 9, 2012]. Eur J Health Econ. doi: 10.1007/s10198-012-0429-0.
- 51.Oakes JM. The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Soc Sci Med. 2004;58(10):1929–1952. doi: 10.1016/j.socscimed.2003.08.004.
- 52.Syme SL, Lefkowitz B, Krimgold BK. Incorporating socioeconomic factors into U.S. health policy: addressing the barriers. Health Aff. 2002;21(2):113–118. doi: 10.1377/hlthaff.21.2.113.
- 53.Hernán MA, Taubman SL. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. Int J Obes. 2008;32:S8–S14. doi: 10.1038/ijo.2008.82.
- 54.Hernán MA. Invited commentary: hypothetical interventions to define causal effects—Afterthought or prerequisite? Am J Epidemiol. 2005;162(7):618–620. doi: 10.1093/aje/kwi255.
- 55.Strully KW, Rehkopf DH, Xuan Z. Effects of prenatal poverty on infant health. Am Sociol Rev. 2010;75(4):534–562. doi: 10.1177/0003122410374086.
- 56.Snyder SE, Evans WN. The effect of income on mortality: evidence from the Social Security Notch. Rev Econ Stat. 2006;88(3):482–495.
- 57.Bruckner TA, Brown RA, Margerison-Zilko C. Positive income shocks and accidental deaths among Cherokee Indians: a natural experiment. Int J Epidemiol. 2011;40(4):1083–1090. doi: 10.1093/ije/dyr073.
- 58.Pan A, Sun Q, Okereke OI, et al. Depression and risk of stroke morbidity and mortality. JAMA. 2011;306(11):1241–1249. doi: 10.1001/jama.2011.1282.
- 59.Berkman LF. Social epidemiology: social determinants of health in the United States: are we losing ground? Annu Rev Public Health. 2009;30(1):27–41. doi: 10.1146/annurev.publhealth.031308.100310.
- 60.Marmot MG, Kogevinas M, Elston MA. Social/economic status and disease. Annu Rev Public Health. 1987;8(1):111–135. doi: 10.1146/annurev.pu.08.050187.000551.
- 61.Adler NE, Rehkopf DH. U.S. disparities in health: descriptions, causes, and mechanisms. Annu Rev Public Health. 2008;29:235–252. doi: 10.1146/annurev.publhealth.29.020907.090852.
- 62.Williams DR. Socioeconomic differentials in health: a review and redirection. Soc Psychol Q. 1990;53(2):81–99.
- 63.Schwartz S, Diez Roux AV. Commentary: causes of incidence and causes of cases—a Durkheimian perspective on Rose. Int J Epidemiol. 2001;30(3):435–439. doi: 10.1093/ije/30.3.435.
- 64.Osypuk TL, Joshi P, Geronimo K, et al. Do social policies influence the health of women and their children? Implications for designing future policies using a social determinants of health lens. In: Goldman MB, Troisi R, Rexrode KM, editors. Women and Health. 2nd ed. London, UK: Elsevier Publishing; 2013. pp. 735–751.
- 65.Manson JE, Hsia J, Johnson KC, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med. 2003;349(6):523–534. doi: 10.1056/NEJMoa030808.
- 66.Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878–1886. doi: 10.1056/NEJM200006223422506.
- 67.Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–1892. doi: 10.1056/NEJM200006223422507.
- 68.Hernán MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19(6):766–779. doi: 10.1097/EDE.0b013e3181875e61.
- 69.Galea S, Tracy M, Hoggatt KJ, et al. Estimated deaths attributable to social factors in the United States. Am J Public Health. 2011;101(8):1456–1465. doi: 10.2105/AJPH.2010.300086.