Author manuscript; available in PMC: 2024 Jul 1.
Published in final edited form as: Contemp Clin Trials. 2023 May 22;130:107238. doi: 10.1016/j.cct.2023.107238

Equity and Bias in Electronic Health Records Data

Andrew D Boyd 1, Rosa Gonzalez-Guarda 2, Katharine Lawrence 3, Crystal L Patil 4, Miriam O Ezenwa 5, Emily C O’Brien 6, Hyung Paek 7, Jordan M Braciszewski 8, Oluwaseun Adeyemi 9, Allison M Cuthel 10, Juanita E Darby 11, Christina K Zigler 12, P Michael Ho 13, Keturah R Faurot 14, Karen Staman 15, Jonathan W Leigh 16, Dana L Dailey 17, Andrea Cheville 18, Guilherme Del Fiol 19, Mitchell R Knisely 20, Keith Marsolo 21, Rachel L Richesson 22, Judith M Schlaeger 23
PMCID: PMC10330606  NIHMSID: NIHMS1905043  PMID: 37225122

Abstract

Embedded pragmatic clinical trials (ePCTs) are conducted during routine clinical care and have the potential to increase knowledge about the effectiveness of interventions under real-world conditions. However, many pragmatic trials rely on data from the electronic health record (EHR), which are subject to bias from incomplete data, poor data quality, lack of representation from people who are medically underserved, and implicit bias in EHR design. This commentary examines how the use of EHR data might exacerbate bias and potentially increase health inequities. We offer recommendations for how to increase generalizability of ePCT results and begin to mitigate bias to promote health equity.

Keywords: Health equity, patient-reported outcomes, social determinants of health, community engagement, health literacy

Introduction

By using data collected during clinical care, embedded pragmatic clinical trials (ePCTs) increase knowledge about the effectiveness of clinical interventions under real world conditions. However, the electronic health record (EHR) data upon which many ePCTs are designed are subject to implicit bias in EHR design; bias from incomplete data and poor data quality; and overrepresentation of data from people with structural privilege.1 These biases can limit the relevance and generalizability of results, and subsequently increase health inequities.

This commentary draws on our collective experience to examine how the use of EHR data might exacerbate bias and potentially increase health inequities. We offer recommendations for how to increase generalizability of ePCT results to begin to mitigate bias and promote health equity.

Strategies to address bias in health research using the EHR

Research leveraging EHRs must be deliberately designed to identify and address bias to promote health equity. The Health Equity Lens framework, initially developed for public health professionals, outlines five health equity concepts for framing health disparities.2 We use these concepts to explore sources of bias and provide recommendations.

1. Systemic, Social and Health Inequity Bias

Problem:

The use of EHR-derived data requires careful attention to mitigate the unintended consequences of using data that mirror US social and structural inequities. Moreover, insufficient attention has been paid to collecting data about the social determinants of health (SDoH).3 Critically, and more difficult to resolve, the available EHR data reflect only those who access healthcare. The absence of other people from EHR datasets is a direct consequence of historical and ongoing forms of oppression, which cause ubiquitous health inequities that limit who can access care. Further, EHR data completeness and accuracy may reflect additional biases resulting from institutional policy, training practices, and implicit provider bias.4 When patient-reported outcomes (PROs) are collected using patient-facing EHR modalities alone (i.e., patient portals), people who do not use portals for reasons such as literacy or technology barriers will also be excluded.

Recommendation:

Data sources such as PROs and Z-codes (included in the International Classification of Diseases-10) can be used to collect the demographic and SDoH variables needed to understand outcomes and, ultimately, improve clinical practice. While there is no consensus about best practices for equity-based data collection or about which SDoH measures should be included at a minimum, we suggest that ePCT teams strive to collect and report standardized SDoH measures. The HL7 Gravity Project5,6 is one initiative aiming to identify and harmonize SDoH data so that they are interoperable for electronic health information exchange. The increased national and global attention to health equity is driving not only standards but also incentives and tools to support SDoH data collection. To reduce bias in patient-reported data, which are often collected through patient portals linked to EHR systems, health systems and researchers will need to invest in the design of portals and engagement features such as text messaging. They will also need to conduct specific research to better understand how effectively these optimized EHR features improve patients’ use of EHRs and engagement in their health and health care.
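As a rough illustration of how Z-codes might be used to check SDoH documentation, the sketch below flags ICD-10-CM codes in the Z55 to Z65 range (the block covering socioeconomic and psychosocial circumstances) and reports how many patients have at least one such code. The function names, the input layout (a dictionary of patient identifiers to code lists), and the abbreviated category labels are assumptions made for this example; they are not part of any EHR vendor's interface or of the Gravity Project standards.

```python
import re
from collections import Counter

# ICD-10-CM categories Z55-Z65 describe socioeconomic and psychosocial
# circumstances. Codes may arrive with or without the decimal point.
SDOH_PATTERN = re.compile(r"^Z(5[5-9]|6[0-5])")

# Abbreviated labels for a few categories (illustrative, not exhaustive).
LABELS = {
    "Z55": "education and literacy",
    "Z56": "employment",
    "Z59": "housing and economic circumstances",
    "Z60": "social environment",
}


def sdoh_categories(codes):
    """Return the three-character SDoH categories present in a list of codes."""
    found = set()
    for code in codes:
        code = code.strip().upper()
        if SDOH_PATTERN.match(code):
            found.add(code[:3])
    return found


def sdoh_documentation_rate(patients):
    """Summarize SDoH Z-code documentation.

    patients: dict mapping a (hypothetical) patient id to a list of ICD-10 codes.
    Returns (fraction of patients with at least one SDoH code,
             Counter of patients per SDoH category).
    """
    per_category = Counter()
    with_any = 0
    for codes in patients.values():
        categories = sdoh_categories(codes)
        if categories:
            with_any += 1
        per_category.update(categories)
    rate = with_any / len(patients) if patients else 0.0
    return rate, per_category


if __name__ == "__main__":
    example = {
        "p1": ["Z59.0", "E11.9"],
        "p2": ["I10"],
        "p3": ["z55.0", "Z59.1"],
        "p4": [],
    }
    rate, per_category = sdoh_documentation_rate(example)
    print(f"Patients with any SDoH code: {rate:.0%}")
    for category, n in sorted(per_category.items()):
        print(category, LABELS.get(category, "other SDoH category"), n)
```

A low documentation rate in such a summary does not mean that social needs are absent; it may simply reflect that Z-codes are rarely entered, which is itself a data-quality finding worth reporting.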

2. Representation and Diversity

Problem:

Much of the current medical evidence was generated from clinical trials with predominantly white participants, which does not ensure that the conclusions drawn are safe and effective for all populations.7 From these trials and this knowledge, algorithms are built into EHR clinical decision support (CDS) tools to suggest risk factors, diagnoses, treatments, and supportive services, potentially with the same omissions in areas of study. Because the full range of patient populations is not proportionally represented, the underlying logic of these algorithms and CDS tools limits their applicability.8

Recommendation:

Given the identified limitations of EHR data sets, greater transparency is needed regarding the sources, inputs, missing data, and modelling choices underlying clinical decision support tools.9 When planning ePCTs, sponsors and investigators should actively seek out and engage with a variety of settings serving diverse populations; efforts that support participatory research design should be prioritized. To address data collection barriers among people who have been historically marginalized and underrepresented, some investigators have enabled interventions using bidirectional text messaging that collect PRO measures and facilitate engagement with underrepresented populations who have high rates of cell phone ownership. To reduce bias that may arise from translated PRO or patient-facing measures that are used without cross-cultural validation, we recommend investing in the testing and psychometric validation of instruments among different populations prior to use.10,11
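One simple way to make missing data more transparent is to report, for each demographic group, the share of records in which a given field is empty. The following is a minimal sketch under stated assumptions: records are plain dictionaries, the field names are hypothetical, and a value of None or an empty string counts as missing. Real EHR extracts will need site-specific rules for what counts as missing (for example, "unknown" or "declined" codes).

```python
from collections import defaultdict


def missingness_by_group(records, group_field, fields):
    """Return {group: {field: fraction missing}} for the listed fields.

    records: iterable of dicts (one per patient or encounter).
    group_field: name of the demographic field used for grouping.
    fields: names of the fields whose completeness is being checked.
    """
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for record in records:
        # Records lacking the grouping variable are reported as their own group.
        group = record.get(group_field) or "not reported"
        totals[group] += 1
        for field in fields:
            if record.get(field) in (None, ""):
                missing[group][field] += 1
    return {
        group: {field: missing[group][field] / totals[group] for field in fields}
        for group in totals
    }


if __name__ == "__main__":
    example = [
        {"language": "English", "pain_score": 4, "housing_status": "stable"},
        {"language": "English", "pain_score": 6, "housing_status": ""},
        {"language": "Spanish", "pain_score": None, "housing_status": None},
        {"language": "Spanish", "pain_score": 3, "housing_status": None},
    ]
    report = missingness_by_group(example, "language", ["pain_score", "housing_status"])
    for group, rates in report.items():
        print(group, rates)
```

Large differences in missingness between groups suggest that a model or CDS rule built on these fields may perform unevenly, and they are worth disclosing alongside the tool.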

3. Community Engagement

Problem:

Community engaged approaches to EHR research are underutilized.

Recommendation:

More than 25 years of evidence supports following the principles of community-based participatory health research.12–14 In ePCTs, patients and communities ultimately affected by the health condition of concern should inform the research questions, variables and instrument selection, implementation, and interpretation of clinical research to ensure the research is relevant. Human-centered design15,16 is one strategy that incorporates diverse stakeholders in the design and development of health technology interventions. Increasingly, these approaches focus on understanding and engaging with patients,17 and incorporate equity-centered or emancipatory lenses that place equity more centrally in the process.18–20

4. Intersectionality

Problem:

EHR-based research rarely captures variables that allow for intersectionality analyses.

Recommendation:

Intersectionality21 conceptualizes how political and economic power and oppression are linked and create systems of discrimination or disadvantage that are experienced by individuals based on identities (e.g., race, gender, sexual orientation, disability, immigration status, housing, education and income). Intersectionality is a lens that can be used to understand the differential effects of interventions tested through ePCTs. However, more refined data collection is needed to capture the identities and SDoH variables that are not typically documented in the EHR. For example, offering write-in options for identities can allow the social categories that are important to patients’ experiences to emerge. Such nuances will enable analyses of individual and combined (additive or multiplicative) associations. This level of clarity and granularity can help prevent inappropriate data aggregation and increase transparency regarding decisions on how variables are produced and used in analyses. Although studies may not be powered to control for every variable, allowing for more refined social categories will help ensure that more people benefit from the interventions being tested, which promotes health equity.
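The distinction between additive and multiplicative combined associations can be made concrete with a small sketch. The code below cross-classifies records by several identity fields, flags strata that are too small to report reliably, and computes two standard interaction measures from four stratum-level risks: the interaction contrast on the risk-difference scale (additive) and the ratio of risk ratios (multiplicative). The field names, the binary outcome coding, and the `min_cell` threshold are assumptions for illustration, not recommendations from this commentary.

```python
from collections import defaultdict


def stratum_summary(records, identity_fields, outcome_field, min_cell=5):
    """Summarize a binary (0/1) outcome within each cross-classified stratum.

    Returns {stratum tuple: {"n": count, "rate": outcome proportion,
                             "small_cell": True if n < min_cell}}.
    """
    strata = defaultdict(list)
    for record in records:
        # Keep missing identities visible instead of silently dropping them.
        key = tuple(record.get(f) or "not reported" for f in identity_fields)
        strata[key].append(record[outcome_field])
    return {
        key: {
            "n": len(outcomes),
            "rate": sum(outcomes) / len(outcomes),
            "small_cell": len(outcomes) < min_cell,
        }
        for key, outcomes in strata.items()
    }


def interaction_contrasts(p00, p10, p01, p11):
    """Return (additive, multiplicative) interaction measures.

    p00: risk with neither characteristic, p10 and p01: risk with one,
    p11: risk with both.
    additive       = p11 - p10 - p01 + p00   (0 means no additive interaction)
    multiplicative = (p11 * p00) / (p10 * p01)   (1 means none)
    """
    additive = p11 - p10 - p01 + p00
    multiplicative = (p11 * p00) / (p10 * p01)
    return additive, multiplicative


if __name__ == "__main__":
    additive, multiplicative = interaction_contrasts(0.1, 0.2, 0.3, 0.6)
    print(f"additive contrast: {additive:.2f}")
    print(f"ratio of risk ratios: {multiplicative:.2f}")
```

With risks of 0.1, 0.2, 0.3, and 0.6, the additive contrast is 0.2 while the ratio of risk ratios is 1, so the same data show a combined association on one scale and none on the other. This is one reason to state the scale explicitly when reporting intersectional analyses.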

5. Literacy and Health Literacy

Problem:

PRO instruments are often written above the NIH-recommended 5th-grade reading level or are developed without the input of patient end-users. When patients misunderstand PRO items because of reading grade level or a lack of community knowledge, they may report incorrect or invalid data, which can in turn affect clinical decisions.

Recommendation:

The reading level of PROs should be formally evaluated, with potential cognitive testing to ensure suitability for the population of interest. As mentioned above, community partners should be involved in the review and validation of PRO content. Because literacy and health literacy have a material impact on how patients interpret and respond to PRO tools, efforts should be made to appropriately capture the SDoH of respondents; this includes the “digital” domains of literacy (e.g., digital health literacy, digital competence, digital agency) that may influence PRO data collection and interpretation.
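A first-pass reading-level check can be automated before cognitive testing. The sketch below estimates the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, for each PRO item and flags items above a chosen grade. The syllable counter is a rough vowel-group heuristic written for this example, and the function names and the default `max_grade` of 5 are assumptions; a validated readability tool and review with community partners should take precedence over this estimate.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, discount a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Estimate the Flesch-Kincaid grade level of a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


def flag_items(items, max_grade=5.0):
    """Return (item, estimated grade) pairs for items above max_grade."""
    flagged = []
    for item in items:
        grade = flesch_kincaid_grade(item)
        if grade > max_grade:
            flagged.append((item, round(grade, 1)))
    return flagged


if __name__ == "__main__":
    pro_items = [
        "How bad is your pain today?",
        "Characterize the intensity of your musculoskeletal discomfort.",
    ]
    for item, grade in flag_items(pro_items):
        print(f"Above target reading level (estimated grade {grade}): {item}")
```

Short items with long clinical terms tend to score high on this formula, which matches the concern here: a single unfamiliar word can make an otherwise brief question hard to answer accurately.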

Conclusion

EHR-based data collection within PCTs is increasing, leaving research vulnerable to biases in the design, collection, and use of electronic health data, and potentially propagating inequities in health and the healthcare system. Complex multilevel (national, state, and local) strategies and support from stakeholders are needed to address bias stemming from the use of EHR data for research and healthcare delivery. The embedded, ubiquitous, and often unknown biases in EHR data (due to variations in care delivery, experience, data capture or data quality, and lack of diverse representation) can limit the relevance and generalizability of results from pragmatic trials, and subsequently increase health inequities.

Disclosures

EO: reports grants to her institution from Pfizer, BMS, and Novartis. KM: reports grants and contracts to his institution from Novartis, Amgen, Seqirus, Genentech, BMS, and Boehringer Ingelheim. ADB: reports grants from Alike Health, travel from Microsoft. All other authors have nothing to disclose.

This work was supported within the National Center for Complementary and Integrative Health (NCCIH), the National Institute of Allergy and Infectious Diseases (NIAID), the National Cancer Institute (NCI), the National Institute on Aging (NIA), the National Heart, Lung, and Blood Institute (NHLBI), the National Institute of Nursing Research (NINR), the National Institute of Minority Health and Health Disparities (NIMHD), the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), the NIH Office of Behavioral and Social Sciences Research (OBSSR), and the NIH Office of Disease Prevention (ODP). This work was also supported by the NIH through the NIH HEAL Initiative under award number U24AT010961. Demonstration Projects within the NIH Pragmatic Trials Collaboratory were supported by the following cooperative agreements with NIH Institutes, Centers, and Offices: EMBED (UG3DA047003, UH3DA047003), GGC4H (UG3AT009838, UH3AT009838), Nudge (UG3HL144163, UH3HL144163), PRIM-ER (UG3AT009844, UH3AT009844). Demonstration Projects within the NIH HEAL Initiative were supported by the following cooperative agreements with NIH Institutes, Centers, and Offices: Back In Action (UG3AT010739, UH3AT010739), BeatPain Utah (UG3NR019943), FM-TIPS (UG3AR076387, UH3AR076387), GRACE (UG3AT011265, UH3AT011265), NOHARM (UG3AG067593, UH3AG067593), OPTIMUM (UG3AT010621, UH3AT010621). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NCCIH, NIAID, NCI, NIA, NHLBI, NINR, NIMHD, NIAMS, OBSSR, or ODP, or the NIH or its HEAL Initiative.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Andrew D. Boyd, Department of Biomedical and Health Information Sciences, University of Illinois Chicago, Chicago, IL.

Rosa Gonzalez-Guarda, Duke University School of Nursing, Durham NC.

Katharine Lawrence, Department of Population Health, New York University Grossman School of Medicine, New York NY.

Crystal L. Patil, University of Illinois Chicago, College of Nursing, Chicago, IL.

Miriam O. Ezenwa, University of Florida College of Nursing, Gainesville, Florida.

Emily C. O’Brien, Department of Population Health Sciences, Duke University School of Medicine, Durham, NC.

Hyung Paek, Yale University, New Haven, CT.

Jordan M. Braciszewski, Henry Ford Health, Detroit, MI.

Oluwaseun Adeyemi, New York University Grossman School of Medicine, Ronald O. Perelman Department of Emergency Medicine, New York, NY.

Allison M Cuthel, New York University Grossman School of Medicine, Ronald O. Perelman Department of Emergency Medicine, New York, NY.

Juanita E. Darby, University of Illinois Chicago College of Nursing, Chicago, IL.

Christina K. Zigler, Duke University School of Medicine, Durham, NC.

P. Michael Ho, Division of Cardiology, University of Colorado School of Medicine, Aurora, CO.

Keturah R. Faurot, Department of Physical Medicine and Rehabilitation, University of North Carolina School of Medicine, Chapel Hill, NC.

Karen Staman, Duke University School of Medicine, Durham NC.

Jonathan W. Leigh, University of Illinois Chicago, College of Nursing, Chicago, IL.

Dana L. Dailey, St. Ambrose University, Davenport, IA and University of Iowa, Iowa City, IA.

Andrea Cheville, Mayo Clinic Comprehensive Cancer Center, Rochester, MN.

Guilherme Del Fiol, Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT.

Mitchell R. Knisely, Duke University School of Nursing.

Keith Marsolo, Department of Population Health Sciences, Duke University School of Medicine, Durham, NC.

Rachel L. Richesson, Department of Learning Health Sciences, University of Michigan Medical School.

Judith M. Schlaeger, University of Illinois Chicago, College of Nursing, Chicago, IL.

References
