Journal of the American Medical Informatics Association : JAMIA
Letter. 2018 Jun 5;25(8):1109–1110. doi: 10.1093/jamia/ocy061

The potential value of social determinants of health in predicting health outcomes

Jessica S Ancker, Min-Hyung Kim, Yiye Zhang, Yongkang Zhang, Jyotishman Pathak
PMCID: PMC7646877  PMID: 29878223

Dear Dr Ohno-Machado,

As Kasthurirathne and colleagues [1] point out in their recent JAMIA paper, it is well established that population health is affected by socioeconomic status and other social determinants of health (SDH). It is therefore reasonable to hypothesize that SDH data should have predictive power for clinical and healthcare utilization outcomes for individual patients, yet the authors found that adding SDH data to clinical data produced no significant improvement in the performance of algorithms predicting the need for social service referrals [1].

It is important to be reminded that not all available data have utility for all purposes, and that many plausible hypotheses do not survive rigorous analysis. However, we would like to make sure that the informatics community does not interpret these findings more broadly to indicate that SDH data have no value. We would like to propose several possible explanations for these interesting and surprising findings, each of which might suggest future avenues of exploration.

  1. Correlations between predictors: It is known that SDH are correlated with the risk of many clinical conditions, and it seems possible that in this particular study, the SDH variables were strongly correlated with the clinical diagnoses. For example, social determinants (eg, socioeconomic status, race, and social support) are well established as strong predictors of cardiovascular disease [2]. If, in the current study, the information contained in the SDH was already present in the clinical variables, then adding SDH would not improve model performance. Future work in different domains might show SDH to have more predictive power. For example, with certain congenital conditions, SDH might have little influence on the risk of disease but instead could be related to access to care or quality of life for people with the condition, and thus would contribute additional information to a model containing clinical diagnoses.

    In addition, it is likely that many of the SDH are correlated with each other. Community-level research, which frequently deals with collinear predictors, often addresses this problem by collapsing correlated measures into scales (such as the Centers for Disease Control’s Social Vulnerability Index [https://svi.cdc.gov]), which are then used as single metrics.

  2. Choice of outcome variables: A closely related potential explanation is that the specific social service referrals chosen in the current study were not well predicted by SDH variables because they were already too well predicted by the clinical ones. For example, referrals to dietitians (one of the study’s outcome variables) might be so strongly predicted by diagnosis of diabetes, hypertension, or congestive heart failure that additional data are not helpful. Similarly, referrals to mental health services might be sufficiently strongly predicted by the presence of psychiatric diagnoses. It is possible that SDH data may have predictive power for other sorts of healthcare utilization and health outcomes, just not for these particular social services.

  3. Variability in predictors: As the authors suggest in their discussion, it is possible that the patients of the safety net health system had insufficient variability in SDH. For example, if household income did not range much above or below the poverty line in the entire population of interest, this variable would provide little discriminatory power. It is possible that models built from a more socioeconomically diverse population data set might provide different results.

  4. Comprehensiveness of the clinical data: The authors had access to unusually comprehensive clinical data through the Indiana Network for Patient Care, a well-established health information exchange organization, which allowed the researchers to leverage data such as emergency and hospital admissions from other organizations. Healthcare organizations with access to only their own local clinical data may find that SDH variables contribute more to predictive models, precisely because the SDH data might serve as proxies for some of the missing clinical data.

  5. Ecological inferences: In the absence of individual-level SDH data, the authors used community-level (ZIP code and census tract level) measures as proxies. It is possible that for certain SDH variables, community estimates are either imprecise or biased. For example, if the within-community variance in education is extremely high, then individual educational attainment will not be predicted well by the neighborhood average, creating a lack of precision. Alternatively, if the patients who seek care at a safety net hospital tend to be less well educated than their close neighbors, then individual educational attainment will be systematically overestimated by the neighborhood average, creating bias [3]. Future work might explore whether community-level data have more utility when they describe neighborhood characteristics (such as, in this study, data about local availability of well-lit walkways or grocery stores) rather than being used to infer individual-level characteristics (such as education). A useful family of methodological approaches to account for these ecological relationships is hierarchical or multilevel models, which explicitly account for the nested structure of the data (individuals within neighborhoods, in this case).

  6. Choice of predictor variables: In addition to the rich set of social and environmental factors used by Kasthurirathne and colleagues, it is possible that others not available to the researchers might have predictive power, such as social support and social capital, or detailed employment type [4]. Inspecting variable importance measures [5] in the models might suggest new hypotheses about which types of SDH have the most predictive utility, and whether these point to other SDH data to collect or obtain from other sources.
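
The ecological-inference concern in point 5 can be illustrated with a small simulation. The following is a minimal sketch in Python under stated assumptions, not a reanalysis of the study data: the function name `simulate_proxy_error`, the tract counts, the schooling distributions, and the selection rule are all hypothetical. It uses a tract-average value as a stand-in for each patient's own years of schooling and reports the resulting bias and root-mean-square error.

```python
import random
import statistics


def simulate_proxy_error(n_tracts=200, residents_per_tract=100,
                         within_sd=3.0, selection_strength=0.0, seed=0):
    """Use a tract-average education as a proxy for each patient's own education.

    All parameter values are hypothetical. Returns (bias, rmse) of
    (tract average - individual value) among residents who become patients.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_tracts):
        # Hypothetical tract-level mean years of schooling.
        tract_mean = rng.gauss(12.0, 1.5)
        residents = [rng.gauss(tract_mean, within_sd)
                     for _ in range(residents_per_tract)]
        tract_avg = statistics.fmean(residents)
        for years in residents:
            # With selection_strength > 0, residents with less schooling than
            # the tract average are more likely to become patients.
            p_patient = 0.5 - selection_strength * (years - tract_avg) / (4 * within_sd)
            p_patient = min(1.0, max(0.0, p_patient))
            if rng.random() < p_patient:
                errors.append(tract_avg - years)
    bias = statistics.fmean(errors)                          # systematic error
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5  # overall error
    return bias, rmse


if __name__ == "__main__":
    for label, kwargs in [
        ("low within-tract variance", {"within_sd": 1.0}),
        ("high within-tract variance", {"within_sd": 3.0}),
        ("selective care-seeking", {"within_sd": 3.0, "selection_strength": 1.0}),
    ]:
        bias, rmse = simulate_proxy_error(**kwargs)
        print(f"{label}: bias={bias:.2f}, rmse={rmse:.2f}")
```

Under these assumptions, raising `within_sd` increases the error of the proxy without shifting its mean (imprecision), whereas raising `selection_strength` makes the tract average systematically higher than the patients' own values (bias). With real data, a multilevel model of the kind mentioned in point 5 would be one way to handle the nested structure.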

Given the extensive public health literature on the relationship between social determinants and health, it is exciting to see the health informatics community begin to explore mergers of clinical and SDH data sets. We appreciate the contribution of Kasthurirathne et al. to this emerging literature and welcome additional exploration of the potential utility of social determinants of health in clinical care and predictive analytics.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Kasthurirathne SN, Vest JR, Menachemi N, Halverson PK, Grannis SJ. Assessing the capacity of social determinants of health data to augment predictive models identifying patients in need of wraparound social services. J Am Med Inform Assoc 2018; 25 (1): 47–53.
  • 2. Havranek E, Mujahid M, Barr D, et al. Social determinants of risk and outcomes for cardiovascular disease. Circulation 2015; 132 (9): 873–98.
  • 3. Goodman L. Ecological regressions and the behavior of individuals. Am Sociol Rev 1953; 18 (6): 663–4.
  • 4. Commission on Social Determinants of Health. Closing the Gap in a Generation: Health Equity through Action on the Social Determinants of Health. Geneva, Switzerland: World Health Organization; 2008.
  • 5. Strobl C, Boulesteix A, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 2007; 8 (1): 25.

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
