Abstract
The growing presence of large language models (LLMs) in health care applications holds significant promise for innovative advancements in patient care. However, various stakeholders have raised concerns about ethical implications and potential biases. Here, we evaluate the ethics of LLMs in medicine along 2 key axes: empathy and equity. We outline the importance of these factors in novel models of care and develop frameworks for addressing them alongside LLM deployment.
Keywords: ChatGPT, AI, artificial intelligence, large language models, LLMs, ethics, empathy, equity, bias, language model, health care application, patient care, care, development, framework, model, ethical implication
Introduction
The rapid proliferation of applications that leverage the ability of large language models (LLMs) to extract relevant patterns from large amounts of complex information and apply them to novel use cases promises great innovation in health care and many other sectors. Many health care applications, such as clinical decision support, patient education, electronic health records (EHRs), and workflow optimization, have been proposed [1]. Despite the immense potential advantages of this technology, various key stakeholders have raised concerns regarding its ethical implications and its potential to perpetuate existing biases and structural barriers [2-6]. Furthermore, its growing use in the health care setting raises concerns about transparency and disclosure regarding its role in patient management. Ethically incorporating LLMs into health care delivery requires honest dialogue about the principles we aim to uphold in patient care and a comprehensive analysis of the various ways in which LLMs could bolster or impair these principles.
Studies have demonstrated the utility of LLMs as a clinical decision support tool in various settings, including triage, diagnostics, and treatment [7-11]. While LLMs show great promise in improving the efficiency of clinical workflows, they lack one key facet of physician-patient encounters: empathy. Though LLMs can be trained to use empathetic language [12] and have done so in patient interactions [13], such artificial empathy is easily distinguishable from real empathy from a patient’s perspective, and real empathy matters to patients [14]. The concept of artificial empathy, which aims to imbue artificial intelligence (AI) with human-like empathy, ought not to be considered interchangeable with human empathy. Efforts to design artificial empathy, while commendable, should complement human empathy rather than erode the therapeutic alliance between patients and physicians, which would further isolate patients in their time of need [15]. Loneliness is one of the key public health crises of our time, and conflating technology with human-to-human interaction will only exacerbate it [16]. Empathic care for patients should be one of the core mandates of the health care sector, and true empathy requires human connection. Therefore, while LLMs show great promise in clinical workflows, they should augment, rather than replace, physician-led care (Table 1).
Table 1.

| Approach | Primary motivation | Impact on empathy and health equity |
| --- | --- | --- |
| LLM-led clinical care or patient-facing LLMs | Advancement-driven: incorporation of new and sophisticated technologies mainly aimed at improving efficiency | |
| Physician-led LLM incorporation in clinical care | Holistic, equitable, and empathetic health care delivery | |
In addition to empathy, equity is crucial in novel models of care. The most popular current LLMs, including ChatGPT, Bard, Med-PaLM, and others, are trained on vast sources of data, including wide swaths of the internet. These sources are rife with inherent biases, and the models lack transparency regarding the contents of their training data sets. They also lack specific evaluations of model bias, an omission that may become a harbinger of ethical dilemmas as LLMs are rapidly incorporated into clinical spaces. While there is little consensus regarding the degree of bias in current LLMs, in most embedding models, which have a similar underlying architecture, there is evidence of racial, gender, and age bias [17]. LLMs have been demonstrated to associate negative terms with given names that are popular among African American communities, as well as with the masculine poles of most gender axes [17]; a minimal sketch of such a bias probe appears after Table 2. Until LLMs are systematically evaluated in clinical use cases to understand and mitigate biases against vulnerable demographic groups, relevant governing bodies should require careful risk-benefit calculations and implement a regulatory framework before LLMs are permitted in clinical care. This framework must ensure that these models improve health care delivery and outcomes for all. Importantly, the US Food and Drug Administration lacks a robust authorization pathway for software as a medical device; this gap is challenging in itself and, given the rapid development of LLMs, would benefit from expeditious guidelines [18] (see Table 2 for proactive measures to ensure the equitable incorporation of LLMs into health care). Following a previously published ethical framework for integrating innovative domains into medicine, we suggest an LLM framework, guided by Blythe et al [19], that is grounded in principled primary motivations, as detailed in Tables 1 and 2.
Table 2.

| Stakeholder | Examples of proactive measures |
| --- | --- |
| Regulatory bodies | |
| Professional societies | |
| Journals | |
| Software developers and industry | |
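As a concrete illustration of the kind of name–attribute association audit described above, the sketch below computes a WEAT-style effect size between two groups of given names and two attribute word lists. This is a hypothetical, minimal example rather than the method of the cited study (which uses the StereoSet benchmark): the `embed` function is a stand-in for the embedding model under review, and the word and name lists are toy examples, not validated lexicons.

```python
import numpy as np

def embed(word: str, dim: int = 64) -> np.ndarray:
    # Placeholder embedding: a pseudo-random unit vector per word (stand-in only).
    # A real audit would replace this with the embedding model under review.
    rng = np.random.default_rng(abs(hash(word)) % (2**32))
    vec = rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

def association(word: str, attrs_a: list[str], attrs_b: list[str]) -> float:
    # Mean cosine similarity to attribute set A minus mean similarity to set B.
    w = embed(word)
    sim_a = np.mean([w @ embed(a) for a in attrs_a])
    sim_b = np.mean([w @ embed(b) for b in attrs_b])
    return float(sim_a - sim_b)

def effect_size(names_x, names_y, attrs_a, attrs_b) -> float:
    # WEAT-style effect size: how differently two groups of names associate
    # with the two attribute sets; larger magnitude suggests stronger bias.
    sx = [association(n, attrs_a, attrs_b) for n in names_x]
    sy = [association(n, attrs_a, attrs_b) for n in names_y]
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1))

# Toy word and name lists for illustration only; a clinical audit would use
# validated lexicons and much larger, carefully constructed name sets.
pleasant = ["trustworthy", "caring", "reliable"]
unpleasant = ["hostile", "noncompliant", "unreliable"]
names_group_1 = ["Emily", "Greg", "Anne"]
names_group_2 = ["Lakisha", "Jamal", "Darnell"]

print(f"Effect size: {effect_size(names_group_1, names_group_2, pleasant, unpleasant):.3f}")
```

A real audit would typically use validated lexicons, assess significance with permutation tests, and be repeated whenever the model or its training data changes.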
Despite these ethical risks, the potential benefits of incorporating LLMs into health care are numerous. LLMs are adept at quickly synthesizing large amounts of complex data, which can form the basis for numerous applications in the health care sector, including the management and interpretation of EHRs and clinical notes, adjuncts for patient visits (eg, encounter transcription and patient translation), billing for medical services, patient education, and more [20,21]. Thus, the key ethical question at hand is as follows: do the benefits outweigh the risks?
From a utilitarian perspective, we must consider this question not only to enhance decision-making but also to take advantage of opportunities to mitigate potential harms. Proposals to incorporate a systematized, frequently reevaluated method of bias evaluation into clinical applications of LLMs [3], to add human verification steps at both the input and output stages of LLM-guided generation of clinical texts [22], and to implement self-questioning, a novel prompting strategy that encourages prompt iteration to improve accuracy in a medical context, are all steps in the right direction. Comprehensive frameworks that call for diverse training data sources and continuous evaluation of bias, such as those proposed by the World Economic Forum and the Coalition for Health AI, can provide useful guardrails as new approaches to ethical validation are proposed and tested [23,24]. Furthermore, in keeping with a physician-led approach, it is essential that physicians are actively involved in the development and evaluation of LLMs for health care. Strategies such as these are key in navigating the ethics of empathy and equity in the development of novel clinical technologies.
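To picture how these safeguards might compose in software, the sketch below combines a simplified self-questioning loop with clinician verification at both the input and output stages. It is an illustration under stated assumptions, not an implementation of the cited proposals: `generate`, `review_input`, and `review_output` are hypothetical stand-ins for an LLM call and for clinician review workflows.

```python
from typing import Callable, Optional

def self_questioning(generate: Callable[[str], str], prompt: str, rounds: int = 2) -> str:
    # Simplified self-questioning: ask the model to critique its own draft,
    # then revise it, for a fixed number of iterations.
    draft = generate(prompt)
    for _ in range(rounds):
        critique = generate(
            "List possible errors, omissions, or unsupported claims in this "
            f"clinical text:\n{draft}"
        )
        draft = generate(
            f"Revise the text to address the critique.\nText:\n{draft}\n"
            f"Critique:\n{critique}"
        )
    return draft

def clinician_verified_generation(
    generate: Callable[[str], str],
    prompt: str,
    review_input: Callable[[str], bool],
    review_output: Callable[[str], bool],
) -> Optional[str]:
    # Human verification at both ends: a clinician approves the prompt before
    # generation and signs off on the draft before it reaches the record.
    if not review_input(prompt):       # input-stage verification
        return None
    draft = self_questioning(generate, prompt)
    if not review_output(draft):       # output-stage verification
        return None
    return draft

# Illustrative wiring with stubs; a deployment would supply a real model call
# and real clinician review steps.
if __name__ == "__main__":
    fake_llm = lambda p: f"[model output for: {p[:40]}...]"
    approve_all = lambda text: True
    result = clinician_verified_generation(
        fake_llm, "Draft a discharge summary for ...", approve_all, approve_all
    )
    print(result)
```

In practice, the review hooks would likely be asynchronous clinician workflows rather than synchronous function calls, and rejected drafts could be logged to support the continuous bias and quality evaluation discussed above.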
It is essential to approach the ethical conundrums of LLM adoption in clinical care with a balanced perspective. LLMs built on data with inherent systemic biases must be incorporated into health care strategically, through a justice-oriented innovation lens, to advance health equity. To keep pace with the accelerated adoption of LLMs in the clinic, ethical evaluations should be conducted together with evaluations of use case efficacy to ensure health care that is both efficient and ethical. A complete assessment of the risks and benefits associated with this technology, an admittedly challenging task, may remain elusive unless the technology is tested in real-world settings. Clinical use cases of LLMs are already being tested; delaying collaboration among all stakeholders, including health care professionals, ethicists, AI researchers, and (crucially) patients, will only delay the discovery of potential harms. Real-world pilots, therefore, should be deployed alongside regular monitoring, oversight, and feedback from all parties. As we collectively seek to make full use of this exciting new technology, we must keep empathy and equity at the forefront of our minds.
Acknowledgments
This project was supported in part by an award from the National Institute of General Medical Sciences (T32GM144273). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.
Abbreviations
- AI: artificial intelligence
- EHR: electronic health record
- LLM: large language model
Footnotes
Conflicts of Interest: EF is co-chair of the Radiological Society of North America (RSNA) Health Equity Committee; is an associate editor and editorial board member of the Journal of the American College of Radiology (JACR); has received speaker honoraria for academic grand rounds, from WebMD, and from the GO2 for Lung Cancer foundation; has received travel support from the GO2 Foundation; and has received grant funding from the NCI (K08 1K08CA270430-01A1). ML is a consultant for GE Healthcare, Takeda, Roche, and SeaGen Pharma. AL is a consultant for the Abbott Medical Device Cybersecurity Council.
References
- 1. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023 May 4;6:1169595. doi: 10.3389/frai.2023.1169595.
- 2. Rozado D. Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types. PLoS One. 2020 Apr 21;15(4):e0231189. doi: 10.1371/journal.pone.0231189.
- 3. Garrido-Muñoz I, Montejo-Ráez A, Martínez-Santiago F, Ureña-López LA. A survey on bias in deep NLP. Appl Sci. 2021 Apr 02;11(7):3184. doi: 10.3390/app11073184.
- 4. Liu R, Jia C, Wei J, Xu G, Vosoughi S. Quantifying and alleviating political bias in language models. Artificial Intelligence. 2022 Mar;304:103654. doi: 10.1016/j.artint.2021.103654.
- 5. Li H, Moon JT, Purkayastha S, Celi LA, Trivedi H, Gichoya JW. Ethics of large language models in medicine and medical research. Lancet Digit Health. 2023 Jun;5(6):e333–e335. doi: 10.1016/s2589-7500(23)00083-3.
- 6. Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. 2023 Jul 06;6(1):120. doi: 10.1038/s41746-023-00873-0.
- 7. Rao A, Kim J, Kamineni M, Pang M, Lie W, Dreyer KJ, Succi MD. Evaluating GPT as an adjunct for radiologic decision making: GPT-4 versus GPT-3.5 in a breast imaging pilot. J Am Coll Radiol. 2023 Oct;20(10):990–997. doi: 10.1016/j.jacr.2023.05.003.
- 8. Rao A, Kim J, Kamineni M, Pang M, Lie W, Succi M. Evaluating ChatGPT as an adjunct for radiologic decision-making. medRxiv. Preprint posted online February 7, 2023. doi: 10.1101/2023.02.02.23285399.
- 9. Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad A, Landman A, Dreyer KJ, Succi MD. Assessing the utility of ChatGPT throughout the entire clinical workflow. medRxiv. Preprint posted online February 26, 2023. doi: 10.1101/2023.02.21.23285886.
- 10. Varney ET, Lee CI. The potential for using ChatGPT to improve imaging appropriateness. J Am Coll Radiol. 2023 Oct;20(10):988–989. doi: 10.1016/j.jacr.2023.06.005.
- 11. Chonde DB, Pourvaziri A, Williams J, McGowan J, Moskos M, Alvarez C, Narayan AK, Daye D, Flores EJ, Succi MD. RadTranslate: an artificial intelligence-powered intervention for urgent imaging to enhance care equity for patients with limited English proficiency during the COVID-19 pandemic. J Am Coll Radiol. 2021 Jul;18(7):1000–1008. doi: 10.1016/j.jacr.2021.01.013.
- 12. Sharma A, Lin IW, Miner AS, Atkins DC, Althoff T. Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nat Mach Intell. 2023 Jan 23;5(1):46–57. doi: 10.1038/s42256-022-00593-2.
- 13. Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023 Jun 01;183(6):589–596. doi: 10.1001/jamainternmed.2023.1838.
- 14. Guidi C, Traversa C. Empathy in patient care: from 'Clinical Empathy' to 'Empathic Concern'. Med Health Care Philos. 2021 Dec 01;24(4):573–585. doi: 10.1007/s11019-021-10033-4.
- 15. Smoktunowicz E, Barak A, Andersson G, Banos RM, Berger T, Botella C, Dear BF, Donker T, Ebert DD, Hadjistavropoulos H, Hodgins DC, Kaldo V, Mohr DC, Nordgreen T, Powers MB, Riper H, Ritterband LM, Rozental A, Schueller SM, Titov N, Weise C, Carlbring P. Consensus statement on the problem of terminology in psychological interventions using the internet or digital components. Internet Interv. 2020 Sep;21:100331. doi: 10.1016/j.invent.2020.100331.
- 16. Jaffe S. US Surgeon General: loneliness is a public health crisis. Lancet. 2023 May;401(10388):1560. doi: 10.1016/s0140-6736(23)00957-1.
- 17. Nadeem M, Bethke A, Reddy S. StereoSet: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021; Online. pp 5356–5371.
- 18. Dortche K, McCarthy G, Banbury S, Yannatos I. Promoting health equity through improved regulation of artificial intelligence medical devices. JSPG. 2023 Jan 23;21(03). doi: 10.38126/JSPG210302.
- 19. Blythe JA, Flores EJ, Succi MD. Justice and innovation in radiology. J Am Coll Radiol. 2023 Jul;20(7):667–670. doi: 10.1016/j.jacr.2023.05.005.
- 20. Jiang LY, Liu XC, Nejatian NP, Nasir-Moin M, Wang D, Abidin A, Eaton K, Riina HA, Laufer I, Punjabi P, Miceli M, Kim NC, Orillac C, Schnurman Z, Livia C, Weiss H, Kurland D, Neifert S, Dastagirzada Y, Kondziolka D, Cheung ATM, Yang G, Cao M, Flores M, Costa AB, Aphinyanaphongs Y, Cho K, Oermann EK. Health system-scale language models are all-purpose prediction engines. Nature. 2023 Jul 07;619(7969):357–362. doi: 10.1038/s41586-023-06160-y.
- 21. Ali SR, Dobbs TD, Hutchings HA, Whitaker IS. Using ChatGPT to write patient clinic letters. Lancet Digit Health. 2023 Apr;5(4):e179–e181. doi: 10.1016/s2589-7500(23)00048-1.
- 22. Singh S, Djalilian A, Ali MJ. ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol. 2023 Jul 03;38(5):503–507. doi: 10.1080/08820538.2023.2209166.
- 23. A Blueprint for Equity and Inclusion in Artificial Intelligence. World Economic Forum; 2022. Accessed 2023-11-14. https://www.weforum.org/whitepapers/a-blueprint-for-equity-and-inclusion-in-artificial-intelligence/
- 24. Blueprint for Trustworthy AI Implementation Guidance and Assurance for Healthcare. Coalition for Health AI; 2023. Accessed 2023-11-14. https://www.coalitionforhealthai.org/papers/blueprint-for-trustworthy-ai_V1.0.pdf