INTRODUCTION
The emergence of large language models (LLMs) with user-friendly chatbot interfaces has revolutionized access to advanced artificial intelligence (AI). Tools previously gated behind innumerable access barriers are now widely available. However, as more members of the public explored the potential applications of these models, the simplicity of the interfaces belied the many nuances of the underlying technology, leading to several high-profile mishaps with LLMs.1 This increase in public awareness was accompanied by wider scrutiny of the potentially transformative impact of AI on society, both good and bad, and led to an executive order on the responsible use of AI.2
Given that the consequences of errors in healthcare are potentially life-altering, a thoughtful, measured approach is imperative for the introduction of new technologies. While the focus is frequently on improving clinician performance, the potential for novel use cases in quality improvement and research also deserves rigorous exploration. As a key invested party within these areas, the challenge for hospital medicine lies in realizing the promise of these AI-enabled or assisted innovations while mitigating and overcoming their limitations (Table 1).
TABLE 1.
Thematic summary.
| | Quality improvement | Research |
|---|---|---|
| Promises | | |
| Limitations | | |
Abbreviation: EHR, electronic health record.
PROMISES OF AI IN QUALITY IMPROVEMENT AND PATIENT SAFETY
The incorporation of AI into clinical medicine is frequently envisioned as the creation of advanced clinical decision support tools that aid healthcare providers at the point of care. These tools can be categorized into two distinct types, each with its own unique set of benefits. The first type focuses on the standardization of processes that might otherwise vary widely, such as the initial workup sent by primary care physicians when placing referrals.3 The second type enhances the personalization of care, such as the use of AI to direct the overall diagnostic approach,4 perform diagnostic stewardship,5 and guide surgical decision-making and perioperative care.6 These two “flavors” of AI tools enable more precise, efficient, and patient-centered approaches to healthcare, marrying the rigors of standardized treatment with the nuanced understanding of individual patient needs.
The prowess of AI extends to automated real-time patient monitoring tools. Many act through clinicians as intermediaries by flagging potentially actionable events for further review, such as inappropriate antimicrobial use,7 arrhythmias,8 iatrogenic hypoglycemia,9 sepsis,10 impending ICU escalations,11 readmission risk,12 the accidental inclusion of confidential information in shared notes,13 and the need for advance care planning.14 Alternatives to this model include fully automated titration of medications15 as well as the prediction of nonclinical outcomes such as emergency department volumes.16 The result is an increased awareness of patients' status and how it changes throughout their clinical journeys, enabling previously impossible patient safety and quality improvement interventions.
LIMITATIONS OF AI IN QUALITY IMPROVEMENT AND PATIENT SAFETY
Despite its promise, the application of AI to quality improvement and patient safety remains limited. At the level of the models alone, the promise of improved standardization frequently falls apart because algorithms generalize poorly across practice settings (e.g., differences between patients admitted to general medicine versus oncology inpatient services), model accuracy erodes as practice patterns change and data drift over time, and models can perpetuate pre-existing disparities.17,18 Applying machine learning algorithms to deliver real-time clinical decision support at the point of care is also challenging, and currently requires ad hoc solutions to overcome technical barriers associated with immature infrastructure and standards for data exchange.19
More importantly, the success of AI technologies rests on the interaction between the models themselves, the humans they interact with,20 and the health systems within which they are embedded. For instance, a model with equal predictive performance across all patients during training and validation may ultimately exacerbate inequalities in patient outcomes once deployed if not assessed and monitored following implementation.21 The attitude and knowledge of healthcare providers towards AI, the interpretability of AI models, and the potential for automation bias can all hinder the effective deployment of AI. Just-in-time education in the form of effective user interfaces can mitigate some of these issues but cannot substitute for increasing baseline data literacy among clinicians. Beyond the optimization of the clinician-machine interface, the broader implementation framework must account for other health systems considerations such as patient attitudes, competing priorities for clinicians, financial sustainability,22 and potential legal ramifications.23
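The kind of post-deployment monitoring described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name `sensitivity_by_group`, the record layout, and the toy data are assumptions made for illustration and are not taken from the cited studies. The sketch computes sensitivity (the true-positive rate) separately for each patient subgroup, so that a gap between groups becomes visible even when overall performance looks acceptable.

```python
from collections import defaultdict

def sensitivity_by_group(records):
    """Return {group: sensitivity} for binary predictions.

    Each record is a (group, y_true, y_pred) tuple with 0/1 labels.
    Groups with no positive cases are omitted, because sensitivity
    is undefined for them.
    """
    true_pos = defaultdict(int)   # predicted 1 and actually 1
    positives = defaultdict(int)  # actually 1
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}

# Hypothetical post-deployment sample: (subgroup, observed outcome, model flag)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]
rates = sensitivity_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, round(gap, 2))
```

In this made-up sample the model detects two of three true events in group A but only one of three in group B. Sensitivity is only one possible metric; an actual audit would likely also examine calibration and downstream outcomes for each group.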
PROMISES OF AI IN RESEARCH
Scholarly productivity remains an area of concern within academic hospital medicine, with a minority of faculty achieving promotion to associate or full professor and a median publication count of zero.24 Clinical burden is commonly cited as a reason for this lack of productivity, suggesting that tools that make researchers more efficient may be helpful. A recent survey of postdocs by Nature showed that about one-third are using generative AI chatbots for research.25 As a “thought partner,” AI may contribute to study design, identifying relevant variables and potential correlations that would otherwise go unnoticed. When functioning as a “coding assistant,” it can rapidly generate boilerplate code, allowing researchers to focus on higher-level analytical tasks. As a “research assistant,” AI can assist with literature searches and figure generation. As an “editor” in the writing process, AI can correct grammar, distill complex thoughts into digestible frameworks, and assist with code switching. AI's role as an informal “reviewer” can be particularly beneficial, offering feedback and prompting authors to consider aspects of their work that might require further elucidation or support. This is especially true given the relative paucity of experienced research and scholarship mentors in many academic hospital medicine groups, which can hinder junior faculty career progression.
One of the most time-consuming tasks in clinical research is manual chart review. Automated chart reviews that identify relevant predictive variables and labels can significantly reduce the time and resources required for clinician review.26 Natural language processing (NLP) tools can parse unstructured data (particularly important when capturing social determinants of health, which are often scattered throughout multiple documents), while computer vision algorithms can supplement the analysis of telemetry and medical imaging27 beyond what is available in written reports. Further, the development of novel natural language interfaces can help minimize analyst time and resources for cohort identification and data extraction.28 Ultimately, automated data processing pipelines can result in the development of highly capable predictive clinical models.29
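As a rough illustration of how unstructured notes might be screened for social determinants of health, the sketch below uses plain keyword matching. It is a deliberately simple stand-in, not the approach of any cited study: the `SDOH_PATTERNS` lexicon, the `flag_sdoh` function, and the example notes are all hypothetical, and a real project would need a validated vocabulary and handling of negation (for example, "denies food insecurity").

```python
import re

# Hypothetical keyword patterns; not a validated clinical lexicon.
SDOH_PATTERNS = {
    "housing": re.compile(r"\b(homeless|unhoused|shelter)\b", re.IGNORECASE),
    "food": re.compile(r"\bfood (insecurity|pantry)\b", re.IGNORECASE),
    "transport": re.compile(r"\b(no|lacks) transportation\b", re.IGNORECASE),
}

def flag_sdoh(notes):
    """Return the sorted list of SDOH categories mentioned in any note."""
    found = set()
    for text in notes:
        for category, pattern in SDOH_PATTERNS.items():
            if pattern.search(text):
                found.add(category)
    return sorted(found)

# Made-up notes from different document types for one patient
notes = [
    "Social work: patient currently staying at a shelter.",
    "Discharge planning: lacks transportation to follow-up clinic.",
    "Nursing: tolerating diet, no acute events overnight.",
]
print(flag_sdoh(notes))  # ['housing', 'transport']
```

Pooling matches across all notes reflects the point made above that this information is often scattered across multiple documents rather than recorded in one structured field.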
The potential for AI to optimize clinical trials deserves special mention given the enormous associated cost. AI has the potential to revolutionize participant screening and selection by identifying eligible subjects from the electronic medical record (EMR) or other large databases.30,31 LLMs can potentially assist with the creation of easily understandable content at various literacy levels in different languages, enabling recruitment of a more diverse and inclusive participant pool. Finally, just as AI can automate documentation and improve standardization in clinical care, it can similarly improve the efficiency of clinical trials through decreased protocol deviations and improved trial reporting.
LIMITATIONS OF AI IN RESEARCH
A fundamental challenge in training machine learning models for clinical applications is the absence of universal gold standards for clinically relevant outcomes or processes. Clinical practice varies widely, and interclinician reliability is poor when it comes to how individual clinicians diagnose and treat the same conditions. This is especially true for common diagnoses in hospital medicine, such as congestive heart failure, pneumonia, and COPD exacerbation, which are defined clinically rather than through clear objective criteria. These discrepancies can pollute the training data, resulting in unusable models. Addressing this challenge requires a concerted effort to foster consensus among healthcare professionals and to establish evidence-based benchmarks, accounting for the full spectrum of severity and acuity seen even within the same diagnoses in hospital medicine.
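One common way to quantify interclinician reliability before using clinician labels as training data is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only: the function `cohens_kappa` is written from the standard definition, and the two label lists are made-up data, not results from any study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical primary diagnoses assigned by two clinicians to the same 8 charts
clinician_1 = ["CHF", "CHF", "pneumonia", "COPD", "pneumonia", "CHF", "COPD", "pneumonia"]
clinician_2 = ["CHF", "pneumonia", "pneumonia", "COPD", "CHF", "CHF", "pneumonia", "pneumonia"]
kappa = cohens_kappa(clinician_1, clinician_2)
print(round(kappa, 3))
```

For these made-up labels the clinicians agree on 5 of 8 charts, yet kappa is only about 0.41 once chance agreement is removed. A low value on a labeling pilot would suggest that diagnostic criteria need to be tightened before the labels are used for model training.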
Finally, transparency is paramount when utilizing AI in research. The extent and nature of AI involvement must be explicitly stated in research methodologies to ensure the integrity of the study and to maintain the trust of the scientific community and the public. In particular, generative AI tools introduce risks such as the spread of misinformation or fabrication of citations, necessitating critical evaluation by subject matter experts. Most journals have adopted guidelines for the disclosure of AI assistance that researchers must adhere to, thereby maintaining transparency.32 Specifically with respect to authorship, the Committee on Publication Ethics (COPE) position statement bars AI tools from being credited as a coauthor as they are unable to take responsibility for the integrity of submitted work.33 The investigators and authors remain ultimately responsible and accountable for the work presented, regardless of the tools used to do the work.
CONCLUSION
AI offers promising advancements in quality improvement and research within hospital medicine, providing tools that may enhance decision-making and clinician efficiency, improve patient outcomes, lower healthcare costs, and increase the pace of scientific discovery. However, the adoption of these tools comes with significant challenges, particularly in terms of integration into clinical practice, demonstration of impact, and ethical considerations such as the potential to exacerbate pre-existing biases. As the rapid pace of innovation in AI continues, its relationship with clinicians and researchers must be carefully managed to ensure the best outcomes for patients and the integrity of scientific discovery.
ACKNOWLEDGMENTS
This study was funded by the NIH/National Institute of Allergy and Infectious Diseases (1R01AI17812101); the NIH/National Institute on Drug Abuse Clinical Trials Network (UG1DA015815-CTN-0136); the Gordon and Betty Moore Foundation (Grant #12409); a Stanford Artificial Intelligence in Medicine and Imaging–Human-Centered Artificial Intelligence (AIMI-HAI) Partnership Grant; the Doris Duke Charitable Foundation–COVID-19 Fund to Retain Clinical Scientists (20211260); a Google, Inc. research collaboration (Co-I) to leverage electronic health record data to predict a range of clinical outcomes; the American Heart Association–Strategically Focused Research Network–Diversity in Clinical Trials; and an NIH-NCATS-CTSA grant (UL1TR003142) for common research resources.
Funding information
Stanford Center for Artificial Intelligence in Medicine and Imaging; Gordon and Betty Moore Foundation, Grant/Award Number: 12409; Google; National Institute on Drug Abuse, Grant/Award Number: UG1DA015815 - CTN-0136; Doris Duke Charitable Foundation, Grant/Award Number: 20211260; National Institute of Allergy and Infectious Diseases, Grant/Award Number: 1R01AI17812101; National Center for Advancing Translational Sciences, Grant/Award Number: UL1TR003142; American Heart Association
Footnotes
CONFLICT OF INTEREST STATEMENT
Jonathan H. Chen is a cofounder of Reaction Explorer LLC, which develops and licenses organic chemistry education software. He has received paid consulting fees from Sutton Pierce, Younker Hyde MacFarlane, and Sykes McAllister as a medical expert witness. The other authors declare no conflicts of interest.
REFERENCES
- 1. Weiser B. Here's what happens when your lawyer uses ChatGPT. NY Times (Print). May 27, 2023.
- 2. The White House. Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. 2023. Accessed November 6, 2023. https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
- 3. Fouladvand S, Gomez FR, Nilforoshan H, et al. Graph-based clinical recommender: predicting specialists procedure orders using graph representation learning. J Biomed Inf. 2023;143:104407.
- 4. Adler-Milstein J, Chen JH, Dhaliwal G. Next-generation artificial intelligence for diagnosis: from predicting diagnostic labels to “wayfinding”. JAMA. 2021;326:2467–2468.
- 5. Rabbani N, Ma SP, Li RC, et al. Targeting repetitive laboratory testing with electronic health records-embedded predictive decision support: a pre-implementation study. Clin Biochem. 2023;113:70–77.
- 6. Mahajan A, Esper S, Oo TH, et al. Development and validation of a machine learning model to identify patients before surgery at high risk for postoperative adverse events. JAMA Netw Open. 2023;6:e2322285.
- 7. Corbin CK, Sung L, Chattopadhyay A, et al. Personalized antibiograms for machine learning driven antibiotic selection. Commun Med. 2022;2:38.
- 8. Perez MV, Mahaffey KW, Hedlin H, et al. Large-scale assessment of a smartwatch to identify atrial fibrillation. N Engl J Med. 2019;381:1909–1917.
- 9. Mathioudakis NN, Abusamaan MS, Shakarchi AF, et al. Development and validation of a machine learning model to predict near-term risk of iatrogenic hypoglycemia in hospitalized patients. JAMA Netw Open. 2021;4:e2030913.
- 10. Adams R, Henry KE, Sridharan A, et al. Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis. Nature Med. 2022;28:1455–1460.
- 11. Singh K, Valley TS, Tang S, et al. Evaluating a widely implemented proprietary deterioration index model among hospitalized patients with COVID-19. Ann Am Thorac Soc. 2021;18:1129–1137.
- 12. Brown Z, Bergman D, Holt L, et al. Augmenting a transitional care model with artificial intelligence decreased readmissions. J Am Med Dir Assoc. 2023;24:958–963.
- 13. Rabbani N, Bedgood M, Brown C, et al. A natural language processing model to identify confidential content in adolescent clinical notes. Appl Clin Inform. 2023;14:400–407.
- 14. Jung K, Kashyap S, Avati A, et al. A framework for making predictive models useful in practice. J Am Med Inform Assoc. 2021;28:1149–1158.
- 15. Nimri R, Battelino T, Laffel LM, et al. Insulin dose optimization using an automated artificial intelligence-based decision support system in youths with type 1 diabetes. Nature Med. 2020;26:1380–1384.
- 16. Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. J Am Med Inform Assoc. 2019;26:1574–1583.
- 17. Lu J, Sattler A, Wang S, et al. Considerations in the reliability and fairness audits of predictive models for advance care planning. Front Digital Health. 2022;4:943768.
- 18. Kim J, Cai ZR, Chen ML, Simard JF, Linos E. Assessing biases in medical decisions via clinician and AI chatbot responses to patient vignettes. JAMA Netw Open. 2023;6:e2338050.
- 19. Corbin CK, et al. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. J Am Med Inform Assoc. 2023;30:1532–1542. doi: 10.1093/jamia/ocad114
- 20. Seneviratne MG, Li RC, Schreier M, et al. User-centred design for machine learning in health care: a case study from care management. BMJ Health Care Inform Online. 2022;29:e100656.
- 21. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169:866.
- 22. Shah NH, Entwistle D, Pfeffer MA. Creation and adoption of large language models in Medicine. JAMA. 2023;330:866–869.
- 23. Mello MM, Guha N. ChatGPT and physicians' malpractice risk. JAMA Health Forum. 2023;4:e231938.
- 24. Sumarsono A, et al. Scholarly productivity and rank in academic hospital medicine. J Hosp Med. 2021;16(9):545–548. doi: 10.12788/jhm.3631
- 25. Nordling L. How ChatGPT is transforming the postdoc experience. Nature. 2023;622:655–657.
- 26. Mahdavi M, Choubdar H, Rostami Z, et al. Hybrid feature engineering of medical data via variational autoencoders with triplet loss: a COVID-19 prognosis study. Sci Rep. 2023;13:2827.
- 27. Huang J, Neill L, Wittbrodt M, et al. Generative artificial intelligence for chest radiograph interpretation in the emergency department. JAMA Netw Open. 2023;6:e2336100.
- 28. Yuan C, Ryan PB, Ta C, et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc. 2019;26:294–305.
- 29. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. npj Dig Med. 2018;1:18.
- 30. Askin S, Burkhalter D, Calado G, El Dakrouni S. Artificial intelligence applied to clinical trials: opportunities and challenges. Health Technol. 2023;13:203–213.
- 31. Oikonomou EK, Thangaraj PM, Bhatt DL, et al. An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized clinical trials. npj Dig Med. 2023;6:217.
- 32. Leung TI, de Azevedo Cardoso T, Mavragani A, Eysenbach G. Best practices for using AI tools as an author, peer reviewer, or editor. J Med Internet Res. 2023;25:e51584.
- 33. COPE. Authorship and AI Tools. Committee on Publication Ethics. 2023. https://publicationethics.org/cope-position-statements/ai-author
