Journal of the American Medical Informatics Association (JAMIA). 2024 Mar 20;31(9):2147–2150. doi: 10.1093/jamia/ocae055

Leveraging large language models to foster equity in healthcare

Jorge A Rodriguez, Emily Alsentzer, David W Bates
PMCID: PMC11339521  PMID: 38511501

Abstract

Objectives

Large language models (LLMs) are poised to change care delivery, but their impact on health equity is unclear. While marginalized populations have been historically excluded from early technology developments, LLMs present an opportunity to change our approach to developing, evaluating, and implementing new technologies. In this perspective, we describe the role of LLMs in supporting health equity.

Materials and Methods

We apply the National Institute on Minority Health and Health Disparities (NIMHD) research framework to explore the use of LLMs for health equity.

Results

We present opportunities for how LLMs can improve health equity across individual, family and organizational, community, and population health. We describe emerging concerns including biased data, limited technology diffusion, and privacy. Finally, we highlight recommendations focused on prompt engineering, retrieval augmentation, digital inclusion, transparency, and bias mitigation.

Conclusion

The potential of LLMs to support health equity depends on making health equity a focus from the start.

Keywords: health equity, health disparities, digital inclusion, artificial intelligence

Introduction

Emerging technologies are transforming healthcare, but will they help close health disparities or exacerbate them? Two key currents in healthcare are a growing focus on health equity and the rise of large language models (LLMs). These currents interact and may be either synergistic or in opposition. Healthcare organizations are acknowledging the drivers of disparities, like social and structural determinants of health. At the same time, LLMs, which are artificial intelligence (AI) systems capable of processing and generating human-like text, are being lauded and integrated into care.1 However, their impact on health equity is not yet clear. Historically, marginalized populations have been left behind by new technologies and have experienced bias propagated by AI.2 If LLMs are to have a net positive impact on health equity, we must prioritize the use of LLMs for health equity efforts and consider how LLMs can be deployed equitably. In this perspective, we use the National Institute on Minority Health and Health Disparities (NIMHD) research framework, which highlights the factors driving health disparities, to discuss opportunities, concerns, and recommendations for the application of LLMs to support health equity.3

Opportunities

Individual health

Behavior change and disease knowledge are key components of health equity initiatives for chronic disease management and may be augmented with LLMs. One example for addressing diabetes disparities is the Diabetes Self-Management Education and Support program (DSME/S), which provides support for behavior change.4 Despite the evidence for DSME/S, low participation and limited personalization have hampered its effectiveness, especially among marginalized communities. Promising work using text messaging lays the foundation for LLMs to impact chronic disease disparities.5 LLMs could offer patients personalized and interactive versions of these programs by adapting health information to a patient’s literacy level and preferred language. For example, a patient admitted to the hospital with newly diagnosed diabetes could use an LLM-powered chatbot to learn about their disease and be better prepared to speak with a diabetes educator. The chatbot would use recommended health literacy techniques for education, like the teach-back method, to confirm patient understanding.6 Further, once the patient leaves the hospital, the chatbot would provide personalized lifestyle recommendations and educational content adapted to the patient’s culture and preferences.
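
To make this concrete, the following minimal sketch shows how a teach-back instruction could be encoded in a system prompt. It is illustrative only: the call_llm helper is a placeholder for any hosted LLM API, and the prompt wording, reading level, and language parameter are assumptions rather than a validated clinical protocol.

```python
# Minimal sketch of a teach-back education prompt (illustrative, not validated).

SYSTEM_PROMPT = """You are a diabetes education assistant.
- Explain concepts at a 6th-grade reading level, in {language}.
- After each explanation, use the teach-back method: ask the patient to
  restate the key point in their own words.
- If the restatement is incomplete or incorrect, re-explain with different words."""

def call_llm(system: str, user: str) -> str:
    # Placeholder: wire this to your LLM provider's chat API.
    return "(model response)"

def educate(question: str, language: str = "Spanish") -> str:
    return call_llm(SYSTEM_PROMPT.format(language=language), question)

print(educate("Why do I need to check my blood sugar?"))
```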

Patients from marginalized communities, especially those with lower health and digital literacy, find it hard to access their data. The 21st Century Cures Act, which gives patients increased access to their health information, underscores the need to support patients in engaging with their data. LLMs can summarize complex medical text in patient-friendly language. For example, a patient being discharged from the hospital could use an LLM-powered tool to understand their discharge notes. These tools could also serve as digital assistants that help patients navigate the healthcare system. Rather than navigating a complex user interface that requires high digital literacy, patients could use LLMs to ask: “What medications were changed?,” “What are my follow up appointments?,” or “Can you help me reschedule my appointment?” This changes patients’ interaction with current information, like after-visit summaries, which often contain complex language.
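
As an illustration, such a tool could ground its answers in the patient’s own after-visit summary. The sketch below assumes a placeholder call_llm function; constraining the model to the supplied document is the key design choice, intended to reduce hallucinated answers.

```python
# Illustrative sketch: answer a patient's question only from their visit summary.

def call_llm(prompt: str) -> str:
    return "(model response)"  # placeholder for a real LLM API call

def answer_from_summary(summary: str, question: str) -> str:
    prompt = (
        "Answer the patient's question using ONLY the visit summary below, "
        "in plain, 6th-grade-level language. If the answer is not in the "
        f"summary, say you do not know.\n\nSUMMARY:\n{summary}\n\nQUESTION: {question}"
    )
    return call_llm(prompt)

summary = "Metoprolol increased to 50 mg daily. Follow up with cardiology in 2 weeks."
print(answer_from_summary(summary, "What medications were changed?"))
```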

Family and organizational health

LLMs could also strengthen the patient-clinician relationship and empower patients and care partners from marginalized communities by providing support before, during, and after visits. Before a visit, LLM-powered tools can help patients prepare by generating relevant questions about their care at the appropriate literacy level.7 During the visit, these tools could help patients remember the generated questions, note clinician answers for later review, or understand consent forms. For example, Mirza et al8 used LLMs to improve the readability of surgical consent forms from a reading level of 12.6 to 6.7. After the visit, LLMs could answer patient questions and help capture patient-reported outcomes and preferences, informing medical decision-making by integrating the patient voice into care.9,10

Community health

LLMs have the potential to help healthcare systems connect patients with community resources to address social needs. This requires screening patients, referring them to resources, and ensuring that patients can access those resources, a process that can be onerous for patients and clinical teams alike. LLMs can augment the work of community health workers and existing programs by not only screening patients but also facilitating the application process for resources. For example, an LLM-powered tool could screen patients for food insecurity, then help them find their nearest food pantry or navigate the application for food benefits. Additionally, LLMs have already been used to extract social determinants of health (SDOH) information from free-text notes, which could facilitate the identification of patients in need of resources.11
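
A simplified version of such an extraction step is sketched below: the model is asked to return structured JSON that downstream screening workflows can consume. This is a hypothetical zero-shot prompt for illustration; Guevara et al11 fine-tuned models for this task rather than relying on prompting alone, and the field names here are assumptions.

```python
# Hypothetical sketch of SDOH extraction from a free-text note into JSON.
import json

def call_llm(prompt: str) -> str:
    # Placeholder model output; a real call would go to your LLM provider.
    return '{"food_insecurity": true, "housing_instability": false, "transportation_barrier": false}'

def extract_sdoh(note: str) -> dict:
    prompt = (
        "Read the clinical note and return only JSON with boolean fields "
        "food_insecurity, housing_instability, and transportation_barrier.\n\n"
        f"NOTE: {note}"
    )
    return json.loads(call_llm(prompt))

note = "Patient reports skipping meals at the end of the month to afford insulin."
print(extract_sdoh(note))
```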

Patients from marginalized communities experience worse healthcare access, which may be mitigated with LLM-based applications. Patients can use LLMs to draft their secure messages, and clinical teams can use them to respond to messages in multiple languages and at multiple literacy levels. This is already being deployed and could be expanded to support non-English languages.12 This application would not only contribute to linguistically adapted care but also address the challenge of message volume.

Population health

LLMs can facilitate care across the healthcare system and increase operational efficiency to help bridge quality gaps, especially in healthcare settings with limited resources. Patients from marginalized populations are more likely to have multiple sites of care, requiring clinicians to review information from multiple sources. LLMs can provide clinicians with summarized clinical courses, which can minimize repetitive testing and advance patient care.

For healthcare leaders, LLMs can enhance data analysis and help identify quality gaps affecting marginalized populations. For example, leaders could use natural language to query data (eg, electronic health record data, social determinants of health data) to identify disparities in chronic disease quality metrics, as sketched below.11 This can augment existing data analytics infrastructure.
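
One way such a query interface could work is sketched below: the LLM drafts SQL against a known schema and an analyst reviews the draft before it touches real data. The schema, metric, and call_llm helper are all illustrative assumptions.

```python
# Illustrative natural-language-to-SQL sketch for an equity dashboard.

SCHEMA = """patients(id, race_ethnicity, preferred_language, zip)
a1c_results(patient_id, value, measured_on)"""

def call_llm(prompt: str) -> str:
    # Placeholder model output, shown inline for illustration.
    return ("SELECT p.race_ethnicity, AVG(a.value) AS mean_a1c "
            "FROM patients p JOIN a1c_results a ON a.patient_id = p.id "
            "GROUP BY p.race_ethnicity;")

def draft_query(question: str) -> str:
    return call_llm(f"Schema:\n{SCHEMA}\n\nWrite one SQL query to answer: {question}")

sql = draft_query("How does mean hemoglobin A1c differ by race and ethnicity?")
print(sql)  # an analyst should review before executing against real data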

At the bedside, clinical decision support powered by LLMs is poised to provide clinicians with easier access to evidence-based care, especially in areas lacking access to specialists. Further, administrative hurdles to prescribing have already been described as targets for LLMs.13 This can help address challenges in access to medications and help achieve “pharmacoequity.”14

Concerns

As with other AI tools, the potential benefits of LLMs come with the risk of exacerbating health disparities. First, LLMs are trained on data that contain historical and selection biases, which could propagate discrimination.10 There are known biases in the internet data used to train LLMs, and clinical notes contain stigmatizing language.15–17 LLMs trained on data influenced by commercial entities could preferentially recommend specific treatments. Additionally, languages that are underrepresented in the training data may be poorly supported for machine translation; such translations can cause communication errors and create safety concerns.18 Second, organizations serving marginalized communities have lagged in their uptake of technologies, like decision support in electronic health records, and may lack the technical infrastructure to leverage LLMs. Consequently, these organizations will be less likely to contribute patient data to further train LLMs, worsening performance for patients receiving care at underserved organizations. Third, patients may be hesitant, or may prefer to maintain personal connections with their care teams rather than rely on LLM-powered tools. These changes in the patient-clinician relationship could be particularly harmful for patients living in areas with low healthcare access, where health systems may be more likely to implement LLMs. Fourth, mistrust in the healthcare system could be worsened by a lack of privacy and transparency in the development and use of LLMs. Fifth, affordability poses a barrier to patients and healthcare systems: while some consumer-facing LLMs are freely available, paid options could put access out of reach for marginalized populations. Sixth, as LLM use expands, marginalized populations may bear the brunt of negative effects if models underperform for their use cases or produce hallucinations that patients with low health and digital literacy may not be able to identify.

Recommendations

We highlight recommendations grounded in the NIMHD levels of influence (noted in parentheses) to ensure that LLMs are developed and implemented with a commitment to health equity. Central to all future LLM work is the participation of patients, especially from marginalized communities, in development, implementation, and evaluation. These recommendations add to prior work in AI and digital equity.19–21

Engineer prompts that support health equity (individual, community, societal)

The inputs, or prompts, provided to LLMs should be refined to ensure the output is accurate, unbiased, and language- and literacy-appropriate. Engineering appropriate prompts will require collaboration between researchers, developers, clinical teams, and patients. For example, few-shot prompting informed by health equity principles (eg, culturally tailored care) may improve LLM output: dietary recommendations could be tailored to food preferences rooted in a patient’s cultural background, as sketched below. These examples should be derived iteratively through collaboration with the communities where the LLMs will be deployed.
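
A minimal sketch of this idea follows. The two exemplars are invented for illustration; as noted above, real exemplars should be derived with the communities being served.

```python
# Few-shot prompt sketch: exemplars steer the model toward culturally
# tailored dietary advice (exemplars are illustrative placeholders).

FEW_SHOT = [
    ("Suggest a lower-sodium dinner for a patient who cooks Caribbean food.",
     "Try stewed chicken seasoned with sofrito, lime, and fresh herbs instead "
     "of salted seasoning blends, served with rice and pigeon peas."),
    ("Suggest a diabetes-friendly breakfast for a patient who eats South Asian food.",
     "Try a vegetable-filled moong dal chilla (savory lentil pancake) with "
     "mint chutney; it is high in protein and fiber."),
]

def build_prompt(question: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return f"{shots}\n\nQ: {question}\nA:"

print(build_prompt("Suggest a heart-healthy lunch for a patient who eats West African food."))
```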

Leverage retrieval augmentation and fine tuning to adapt to patient needs (individual, interpersonal, community)

LLM tools can provide personalized information to patients by leveraging data that include culturally and linguistically diverse health information. Two methods provide the foundation for this application: (1) retrieval-augmented generation, where relevant external sources are retrieved and integrated into the prompt to improve the reliability of LLM output, and (2) fine-tuning, where pretrained LLMs are further trained on data for a specific task.22,23 Cultural adaptation of health information is a nuanced process that requires multidisciplinary input to tailor information to patients’ “norms, beliefs, values, and literacy skills.”24,25 To culturally adapt LLM output, developers and researchers can use previously developed health materials (eg, patient information sheets, recipes, text message campaigns) as source material for retrieval-augmented LLM tools focused on specific disease management, as in the sketch below.26 For example, an LLM-based tool can retrieve and incorporate relevant culturally adapted diabetes self-management materials to increase the accuracy and relevance of the LLM output. Healthcare organizations can collaborate with researchers to create trustworthy sources for training LLMs. This could allow the creation of retrieval-augmented chatbots for specific diseases that would enable the patient to look back at trusted source material. Worse LLM performance on less common languages may be mitigated by the creation of smaller, task-specific LLMs trained on linguistically diverse text.27 Fine-tuning LLMs can support further tailoring, though it requires additional training data and technical expertise, which may limit its scalability.
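
The sketch below illustrates the retrieval step with a small TF-IDF index over vetted, culturally adapted materials; a production system would use a proper embedding index and a real LLM call, both of which are stubbed here as assumptions.

```python
# Retrieval-augmented generation sketch over vetted education materials.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DOCS = [  # stand-ins for culturally adapted, clinician-vetted materials
    "Plate method for diabetes: fill half the plate with non-starchy vegetables...",
    "Lower-sodium adaptations of traditional Caribbean dishes...",
    "Carbohydrate counting guidance for rice- and tortilla-based meals...",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    vec = TfidfVectorizer().fit(docs + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]
    return [docs[i] for i in sims.argsort()[::-1][:k]]

def call_llm(prompt: str) -> str:
    return "(model response)"  # placeholder for a real LLM API call

def answer(query: str) -> str:
    context = "\n".join(retrieve(query, DOCS))
    return call_llm(f"Answer using only these vetted materials:\n{context}\n\nQuestion: {query}")

print(answer("How can I plan meals to manage my blood sugar?"))
```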

Prioritize digital inclusion (individual, community, societal)

Digital inclusion, or the ability of all patients and communities to have a fair and just opportunity to benefit from digital tools, should remain a focus of healthcare systems as well as local, state, and national governments. Technology access disparities may become increasingly relevant as LLM-based tools for health information-seeking and patient communication are released. Healthcare and community organizations should continue efforts to increase access to digital tools (eg, broadband internet and devices), support digital literacy, and address misinformation. As LLMs become integrated into search engines, digital inclusion efforts should focus on helping patients understand the limitations of the information presented and how to adjust their search terms or prompts to retrieve accurate information. Additionally, without an appropriate focus on equitable diffusion, clinics serving patients from marginalized communities may lag in LLM uptake, limiting the potential impact of these tools. Healthcare systems serving marginalized communities may require additional funding and support for the technical infrastructure required for successful implementation and maintenance of LLM-based technologies.

Develop transparent notification and consent procedures (interpersonal, societal)

Balancing the use of LLMs in healthcare with the potential impact on patient trust requires transparency.28 Organizations should develop easily understandable consent forms that allow patients to understand how their data will be used (ie, whether it will be used as training data) and offer an opportunity to opt out. After deployment, organizations should notify patients when LLMs will be used in their care. The most pressing example is the use of LLMs to respond to patient messages. As a first step, organizations may include a notification in the message letting patients know that an LLM was used to respond to their question. However, for some patients the concept of LLMs or AI may be unfamiliar, so healthcare organizations should craft this language with input from patients and communities; it should note the use of LLMs while also promoting trust in the response. Such transparent notifications can mitigate mistrust in technology and the healthcare system.

Continuous monitoring of LLM-based technologies for biases (community, societal)

As LLM-based technologies are deployed in clinical care, it will be critical to evaluate them appropriately for biases. For example, evaluations can determine whether LLMs respond differently to clinical questions depending on patient demographics; a simple counterfactual probe of this kind is sketched below.29 Institutions could also use “red teaming” or “jailbreaking,” which test models for vulnerabilities that might lead to unwanted outputs, to surface biases.30 These efforts could be informed by health equity work that has identified structural and systemic barriers to care, and the evaluations could be included in policies regulating the use of LLMs for care.31,32
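
As a starting point, a counterfactual probe can hold a clinical question fixed while varying only a demographic descriptor and then compare the responses. In the sketch below, the descriptors, question, and string-similarity metric are illustrative assumptions; real evaluations would score many prompts and include clinician review.

```python
# Counterfactual bias probe sketch: same question, varied demographics.
from difflib import SequenceMatcher
from itertools import combinations

TEMPLATE = ("A {descriptor} patient reports chest pain rated 8/10. "
            "What workup do you recommend?")
DESCRIPTORS = ["Black woman", "white man", "Hispanic man", "Asian woman"]

def call_llm(prompt: str) -> str:
    return "(model response)"  # placeholder for a real LLM API call

responses = {d: call_llm(TEMPLATE.format(descriptor=d)) for d in DESCRIPTORS}
for (d1, r1), (d2, r2) in combinations(responses.items(), 2):
    sim = SequenceMatcher(None, r1, r2).ratio()
    print(f"{d1} vs {d2}: similarity {sim:.2f}")  # flag large divergences for review
```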

Conclusions

Healthcare systems and vendors alike face the challenge of implementing new technology while emphasizing equity. This will require funding and collaboration among multiple stakeholders to match the drivers of health disparities with the unique abilities of LLMs. While prior technologies have left marginalized patients behind, LLMs have great potential to improve care, but only if health equity is a focus from the start.

Contributor Information

Jorge A Rodriguez, Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States; Harvard Medical School, Boston, MA 02115, United States.

Emily Alsentzer, Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States; Harvard Medical School, Boston, MA 02115, United States.

David W Bates, Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA 02115, United States; Harvard Medical School, Boston, MA 02115, United States.

Author contributions

All authors contributed to the conception, drafting, revising, and final approval of the manuscript.

Funding

J.A.R. receives funding for this research from the National Institute on Minority Health and Health Disparities, grant number K23MD016439. The views expressed in this article are those of the authors and do not necessarily reflect the views and policy of the National Institutes of Health.

Conflicts of interest

D.W.B. reports grants and personal fees from EarlySense, personal fees from CDI Negev, equity from ValeraHealth, equity from Clew, equity from MDClone, personal fees and equity from AESOP, personal fees and equity from FeelBetter, personal fees and equity from Guided Clinical Solutions, and grants from IBM Watson Health, outside the submitted work. Additional authors do not have competing interests to declare.

Data availability

No new data were generated or analyzed in support of this research.

References

1. Omiye JA, Gui H, Rezaei SJ, Zou J, Daneshjou R. Large language models in medicine: the potentials and pitfalls. Ann Intern Med. 2024;177(2):210-220. doi:10.7326/M23-2772
2. Veinot TC, Mitchell H, Ancker JS. Good intentions are not enough: how informatics interventions can worsen inequality. J Am Med Inform Assoc. 2018;25(8):1080-1088. doi:10.1093/jamia/ocy052
3. National Institute on Minority Health and Health Disparities. NIMHD Research Framework. 2017. Accessed September 27, 2023. https://nimhd.nih.gov/researchFramework
4. Centers for Disease Control and Prevention. Diabetes Self-Management Education and Support (DSMES) Toolkit. 2022. Accessed May 23, 2023. https://www.cdc.gov/diabetes/dsmes-toolkit/index.html
5. ElSayed NA, Aleppo G, Aroda VR, et al. 5. Facilitating positive health behaviors and well-being to improve health outcomes: standards of care in diabetes—2023. Diabetes Care. 2022;46(Suppl 1):S68-S96. doi:10.2337/dc23-S005
6. Agency for Healthcare Research and Quality. Use the Teach-Back Method: Tool 5. Rockville, MD. Content last reviewed February 2024. Accessed September 27, 2023. https://www.ahrq.gov/health-literacy/improve/precautions/tool5.html
7. Agency for Healthcare Research and Quality. QuestionBuilder App. Rockville, MD. Content last reviewed June 2022. Accessed September 27, 2023. https://www.ahrq.gov/questions/question-builder/index.html
8. Mirza FN, Tang OY, Connolly ID, et al. Using ChatGPT to facilitate truly informed medical consent. NEJM AI. 2024;1(2):AIcs2300145. doi:10.1056/AIcs2300145
9. Mika AP, Martin JR, Engstrom SM, Polkowski GG, Wilson JM. Assessing ChatGPT responses to common patient questions regarding total hip arthroplasty. J Bone Joint Surg Am. 2023;105(19):1519-1526. doi:10.2106/JBJS.23.00209
10. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA. 2023;329(10):842-844. doi:10.1001/jama.2023.1044
11. Guevara M, Chen S, Thomas S, et al. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med. 2024;7(1):6-14. doi:10.1038/s41746-023-00970-0
12. Epic. Epic and Microsoft bring GPT-4 to EHRs. May 5, 2023. Accessed October 18, 2023. https://www.epic.com/epic/post/epic-and-microsoft-bring-gpt-4-to-ehrs
13. Doximity. Docs GPT. Accessed May 23, 2023. https://www.doximity.com/docs-gpt
14. Essien UR. Pharmacoequity: a new goal for ending disparities in U.S. health care. STAT. July 28, 2021. Accessed May 23, 2023. https://www.statnews.com/2021/07/28/pharmacoequity-new-goal-ending-disparities-us-health-care/
15. Zack T, Lehman E, Suzgun M, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit Health. 2024;6(1):e12-e22. doi:10.1016/S2589-7500(23)00225-X
16. Park J, Saha S, Chee B, Taylor J, Beach MC. Physician use of stigmatizing language in patient medical records. JAMA Netw Open. 2021;4(7):e2117052. doi:10.1001/jamanetworkopen.2021.17052
17. Navigli R, Conia S, Ross B. Biases in large language models: origins, inventory, and discussion. J Data Inform Qual. 2023;15(2):1-10. doi:10.1145/3597307
18. Khoong EC, Steinbrook E, Brown C, Fernandez A. Assessing the use of Google Translate for Spanish and Chinese translations of emergency department discharge instructions. JAMA Intern Med. 2019;179(4):580-582. doi:10.1001/jamainternmed.2018.7653
19. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in healthcare. Annu Rev Biomed Data Sci. 2021;4(1):123-144. doi:10.1146/annurev-biodatasci-092820-114757
20. Rodriguez JA, Clark CR, Bates DW. Digital health equity as a necessity in the 21st Century Cures Act era. JAMA. 2020;323(23):2381-2382. doi:10.1001/jama.2020.7858
21. Richardson S, Lawrence K, Schoenthaler AM, Mann D. A framework for digital health equity. NPJ Digit Med. 2022;5(1):119. doi:10.1038/s41746-022-00663-0
22. Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20). Curran Associates Inc.; 2020:9459-9474.
23. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T, eds. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019:4171-4186. doi:10.18653/v1/N19-1423
24. University of Wisconsin Population Health Institute. Culturally Adapted Health Care. County Health Rankings & Roadmaps. 2020. Accessed September 27, 2023. https://www.countyhealthrankings.org/take-action-to-improve-health/what-works-for-health/strategies/culturally-adapted-health-care
25. Attridge M, Creamer J, Ramsden M, Cannings JR, Hawthorne K. Culturally appropriate health education for people in ethnic minority groups with type 2 diabetes mellitus. Cochrane Database Syst Rev. 2014;2014(9):CD006424. doi:10.1002/14651858.CD006424.pub3
26. Centers for Medicare & Medicaid Services. Diabetes Prevention Programs: Equity Tailored Resources. January 2023. Accessed February 6, 2024. https://www.cms.gov/files/document/culturally-and-linguistically-tailored-type-2-diabetes-prevention-resource.pdf
27. Lai VD, Ngo NT, Veyseh APB, et al. ChatGPT beyond English: towards a comprehensive evaluation of large language models in multilingual learning. arXiv. April 12, 2023. doi:10.48550/arXiv.2304.05613
28. Benda NC, Novak LL, Reale C, Ancker JS. Trust in AI: why we should be designing for APPROPRIATE reliance. J Am Med Inform Assoc. 2022;29(1):207-212. doi:10.1093/jamia/ocab238
29. Omiye JA, Lester JC, Spichak S, Rotemberg V, Daneshjou R. Large language models propagate race-based medicine. NPJ Digit Med. 2023;6(1):195. doi:10.1038/s41746-023-00939-z
30. Rajani N, Lambert N, Tunstall L. Red-Teaming Large Language Models. Hugging Face. February 24, 2023. Accessed October 18, 2023. https://huggingface.co/blog/red-teaming
31. Centers for Medicare & Medicaid Services. CMS Framework for Health Equity. April 2022. Accessed May 23, 2023. https://www.cms.gov/about-cms/agency-information/omh/health-equity-programs/cms-framework-for-health-equity
32. Office of Science and Technology Policy. Blueprint for an AI Bill of Rights. October 2022. Accessed May 23, 2023. https://www.whitehouse.gov/ostp/ai-bill-of-rights/
