Abstract
Large language models such as ChatGPT are a type of machine learning model that may offer a positive paradigm shift in case-based/problem-based learning (CBL/PBL). ChatGPT could augment the existing paradigm by working in conjunction with the clinical teacher in PBL/CBL case generation. It can develop realistic patient cases that clinical teachers then revise to ensure accuracy and relevance. Further, it can be directed to include specific case content in order to facilitate the constructive alignment of the case with the broader learning objectives of the curriculum. There is also the possibility of improving engagement by ‘gamifying’ CBL/PBL.
Supplementary Information
The online version contains supplementary material available at 10.1007/s40670-023-01934-5.
Keywords: Medical education, Artificial intelligence, Case based learning, Medical curriculum
Large language models (LLMs) are a type of artificial intelligence that can offer a positive paradigm shift in case-based/problem-based learning (CBL/PBL) curriculum development. An LLM is a machine learning model that has been trained on a massive amount of text data (from books, websites and publications) in order to model natural human language. LLMs are capable of mimicking human language processing abilities to generate human-like responses to a wide range of prompts and questions. These models typically use deep neural networks to process and analyse complex patterns and comprehend connections within the data they were trained upon (not unlike the connections required to generate differential diagnoses from available information). The most widely recognised LLM is OpenAI’s ChatGPT; based on the Generative Pre-trained Transformer architecture, the current free-to-use iteration was trained on over 45 terabytes of raw text. This is not a trivial amount: assuming roughly six characters per word, it is equivalent to on the order of 150 million 100-page documents with 500 words per page.
Before artificial intelligence can be utilised in medical education, it is imperative that its generation of medical information be interrogated. Recent experimentation has demonstrated that ChatGPT’s large dataset and ability to comprehend complex connections has afforded it an impressive degree of medical proficiency. Evidence of ChatGPT’s medical proficiency has been demonstrated across a diverse range of implementations including its ability to pass the United States Medical Licensing Exams, accurately provide a list of differentials for cases written by internal medicine physicians and even contribute to writing published medical case reports [1–3]. Limitations of this technology, including the potential for coherent inaccurate content generation, have been described previously.
Medical educators must be vigilant and proactive in exploring the integration of new technologies into existing teaching methods. It is evident that ChatGPT has the potential for implementation in medical education. Specifically, its characteristics lend themselves well to integration with inquiry-based learning strategies, such as CBL/PBL. These curriculum elements are commonly used in medical education, where students are presented with realistic patient cases to work through, generally in groups. These methods seek to promote the development of skills in problem-solving, critical thinking and clinical reasoning, whilst also facilitating the integration of knowledge across multiple domains (e.g. anatomy, pathophysiology, clinical science, professionalism, public health). Good cases require a high level of clinical realism and should be set in a context representative of students’ future practice [4]. Generally, the clinical teacher is the cornerstone of case generation, drawing on their clinical experience as inspiration and thereby incorporating realistic case details such as the availability of investigations and therapeutic options. However, this process represents a significant limitation; the generation of effective cases is time-consuming, resource-intensive and tends to be ‘one size fits all’ for a class of heterogeneous students. Additionally, medical educators are limited by their own experiences and by conscious or unconscious personal biases, which may influence the gender, ethnicity, genetic backgrounds and past medical histories of generated cases.
A generative AI like ChatGPT may be able to augment the existing paradigm to work in conjunction with the clinical teacher in generating cases for PBL/CBL. ChatGPT can be used to develop realistic patient cases that could be revised by clinical teachers to ensure accuracy and relevance. The possible benefits of ChatGPT integration extend beyond case volume and may include ensuring appropriate diversity in generated cases, with coverage of ethnicity, sexual orientation, gender and genetic background. This can ensure that the cases implemented into CBL/PBL are diversified and appropriately reflect the diversity of healthcare students across the globe, and of the patients they will serve in their future careers. Additionally, effective cases should be targeted to the requirements of the learners at various stages of their training and should align with pre-determined learning objectives [4]. ChatGPT can be utilised to generate cases with varying levels of complexity and can be directed to include specific content in order to facilitate the constructive alignment of the case with the broader learning objectives of the curriculum (Supplement, Fig. 1). Medical educators can benefit from ChatGPT’s contextual memory, where, much like traditional case generation, the user input, response generation and feedback mechanisms are not linear; rather, they are cyclical and capable of modulation at any stage. At present, this ‘memory’ is limited by prompt/input size, currently ~ 4000 tokens (one token is roughly four characters), with aims to increase this contextual memory to 32,000 tokens in subsequent iterations. Users are therefore able to modify generated cases almost ad infinitum; a very simple example is provided in Fig. 1, where the addition of serial troponins was requested in a generated ‘chest pain’ case. This case required ~ 30 s of user input and feedback time.
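The prompt-size constraint described above can be checked before an educator submits a long, iteratively built case. The sketch below is illustrative only: it uses the crude heuristic of roughly four characters per token (exact counts require the model's real tokenizer), and the limit constant and function names are assumptions, not part of any real API.

```python
# Rough token-budget check for an iteratively built case conversation.
# Assumes the common ~4 characters-per-token heuristic; a real tokenizer
# should be used for exact counts. All names here are illustrative.

CONTEXT_LIMIT_TOKENS = 4000  # the prompt/input limit described in the text

def estimate_tokens(text: str) -> int:
    """Crude estimate: one token per ~4 characters."""
    return max(1, len(text) // 4)

def fits_in_context(conversation: list[str], new_prompt: str,
                    limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Check whether the running case 'conversation' plus one more
    educator prompt stays within the model's contextual memory."""
    used = sum(estimate_tokens(turn) for turn in conversation)
    return used + estimate_tokens(new_prompt) <= limit

history = ["Generate a chest pain case for third-year students.",
           "A 58-year-old presents with crushing central chest pain..."]
print(fits_in_context(history, "Now add serial troponin results."))
```

A check like this makes the cyclical feedback loop predictable: once the history approaches the limit, earlier turns must be summarised or dropped, which is exactly why the promised larger context windows matter for long case refinement.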
This simplistic case may be suitable for students in their pre-medical training, encouraging self-directed learning and a diverse approach to the potential differentials. However, more complex cases will invariably be required as students progress through their training, which, in turn, will require somewhat more time to produce (with anecdotal estimates of ~ 5–10 min for more complex cases necessitating the inclusion of genetics and clinical chemistry). As there is no ‘best’ output, only an optimal output for the required student population, no single best narrative or input prescription can be provided. Rather, medical educators could use the contextual memory to increase (or decrease) clinical case complexity and to alter the available investigations or even the outcome of the case. More in-depth, detailed inputs generally produce more detailed responses and reduce the need for cyclical feedback and alteration. Additionally, the large database offers basic scientists the ability to modify cases so that topics previously considered arcane and not clinically germane by students can be given clinical context, improving retention [5, 6].
With further research, ChatGPT could also operate as a virtual tutor for students, generating cases specific to their requirements. ChatGPT is capable of challenging incorrect premises generated by the user. This feature could theoretically allow ChatGPT to generate specific cases targeted at aspects of medicine in which students require further education (such as the appropriate use of CT scans to avoid unnecessary radiation exposure, or practising the translation of disease information into layperson terms for patient explanations). Additionally, the speed at which ChatGPT generates cases, in conjunction with its ability to remember and engage in protracted ‘conversations’, confers an ability to ‘gamify’ CBL/PBL. Gamification involves the application of elements of game-playing and has been shown to increase knowledge compared with standard educational practice [7]. ChatGPT could produce a ‘choose your own adventure’ style case: a chat-based adventure in which students are privy only to details that they specifically elicit via history, physical examination or investigations, much like navigating a real clinical case but with instantaneous feedback (Supplemental Information Fig. 2).
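Mechanically, the ‘choose your own adventure’ pattern amounts to maintaining a growing conversation history and revealing case details only when the student asks for them. A minimal sketch follows; the LLM call is stubbed out with a simple keyword lookup (a real implementation would send the full history to a chat model), and the case details, roles and function names are hypothetical assumptions for illustration.

```python
# Sketch of a gamified, chat-style case: the student only sees findings
# they explicitly request, and every turn is appended to the history so
# context is retained. The model call is a placeholder stub, not a real API.

CASE_FINDINGS = {  # hypothetical hidden case details
    "history": "3 hours of central chest pain radiating to the left arm.",
    "examination": "Diaphoretic, HR 102, BP 150/90, chest clear.",
    "ecg": "ST elevation in leads II, III and aVF.",
}

def generate_reply(conversation: list[dict]) -> str:
    """Placeholder for a real LLM chat call. Here it simply reveals a
    finding if the student's last message mentions it."""
    question = conversation[-1]["content"].lower()
    for topic, finding in CASE_FINDINGS.items():
        if topic in question:
            return finding
    return "Please specify what you would like to ask or examine."

def ask(conversation: list[dict], student_input: str) -> str:
    """One game turn: record the question, generate and record the reply."""
    conversation.append({"role": "student", "content": student_input})
    reply = generate_reply(conversation)  # full history passed each turn
    conversation.append({"role": "tutor", "content": reply})
    return reply

session: list[dict] = []
print(ask(session, "Can I see the ECG?"))
```

Because the whole session list is passed on every turn, the stub can be swapped for a genuine chat-completion call without changing the game loop, which is what makes the instantaneous-feedback format practical.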
There are, however, several limitations and biases that medical educators must be cognisant of when integrating generative AI [8]. As alluded to previously, the quality of the input by the medical educator generally affects the quality of the output. The accuracy and validity of the output must therefore be meticulously assessed by the medical educator to ensure it is accurate, relevant and free of bias. An example of the need for careful expert review is included in Fig. 1, where the troponin values given by the generative AI are one to two orders of magnitude higher, at the time frames indicated, than the levels reported in the literature and guidelines that form the baseline expectations for the normal rise in troponin following a myocardial infarction [9]. Generative AI like ChatGPT may struggle to identify important information and to differentiate between reliable and unreliable sources [10]. This may limit its insight; so whilst provisional analyses suggest that ChatGPT is relatively invariant to ethnic or racial profiles, care is required to ensure that no discriminatory behaviour or inappropriate stereotypes are reinforced [11]. Whilst ChatGPT adheres to the European Union’s ethical guidelines for AI, there are still ethical and legal considerations that must be addressed, particularly in reference to its generative ability in medical education. It is not implausible that a generative AI incidentally generates copyright-protected content or confidential information without the necessary permissions. Additionally, generated responses must be vetted to ensure a case does not favour a particular medical investigation or treatment algorithm that is not otherwise advocated in guidelines and current practice. Lastly, whilst this article has focused on ChatGPT as an example of generative AI due to its mainstream popularity, there is no shortage of alternatives, each with its own benefits and shortcomings.
ChatSonic is one such alternative; it claims to be powered by Google Search and able to generate real-time content. This function theoretically allows it to produce content about trending topics and current events/practice updates in real time. Whereas ChatGPT has only been trained on datasets up to 2021, in a sector where the ‘doubling time’ of medical knowledge has decreased from 50 years in 1950 to 73 days in 2020, this could keep medical education at the forefront of medical advancement and reduce the time lag in translational research [12, 13]. There is no shortage of alternatives; among the more relevant for medical educators may be Learnt.ai, which uses a GPT language generation model to help with tasks such as creating lesson plans and assessment questions, and Character.ai, which uses neural language models to generate characters with ‘personalities’ and parameters that may be defined by the educator and disseminated amongst the class, enabling students to engage in conversations with these ‘characters’ to practise history-taking skills.
In conclusion, the use of generative AI, such as ChatGPT, has the potential to transform medical education, particularly CBL/PBL. Whilst there are potential limitations and challenges associated with this technology, with appropriate human oversight and quality control, it has the potential to enhance the quality and efficiency of medical education. Further research is needed to evaluate the effectiveness of ChatGPT-generated CBL/PBL cases and to identify best practices for their implementation.
Declarations
Conflict of Interest
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digital Health. 2023;2(2):e0000198. doi: 10.1371/journal.pdig.0000198.
- 2. Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int J Environ Res Public Health. 2023;20(4):3378. doi: 10.3390/ijerph20043378.
- 3. Nachshon A, Batzofin B, Beil M, Van Heerden PV. When palliative care may be the only option in the management of severe burns: a case report written with the help of ChatGPT. Cureus. 2023.
- 4. Azer SA, Peterson R, Guerrero APS, Edgren G. Twelve tips for constructing problem-based learning cases. Med Teach. 2012;34(5):361–367. doi: 10.3109/0142159X.2011.613500.
- 5. Ikah DSK, Finn GM, Swamy M, White PM, McLachlan JC. Clinical vignettes improve performance in anatomy practical assessment. Anat Sci Educ. 2015;8(3):221–229. doi: 10.1002/ase.1471.
- 6. Karabacak M, Ozkara BB, Margetis K, Wintermark M, Bisdas S. The advent of generative language models in medical education. JMIR Medical Education. 2023;9:e48163. doi: 10.2196/48163.
- 7. Abdulmajed H, Park YS, Tekian A. Assessment of educational games for health professions: a systematic review of trends and outcomes. Med Teach. 2015;37(Suppl 1):S27–32. doi: 10.3109/0142159X.2015.1006609.
- 8. Busch F, Adams LC, Bressem KK. Biomedical ethical aspects towards the implementation of artificial intelligence in medical education. Medical Science Educator. 2023;33(4):1007–1012. doi: 10.1007/s40670-023-01815-x.
- 9. Sandoval Y, Apple FS, Mahler SA, Body R, Collinson PO, Jaffe AS. High-sensitivity cardiac troponin and the 2021 AHA/ACC/ASE/CHEST/SAEM/SCCT/SCMR guidelines for the evaluation and diagnosis of acute chest pain. Circulation. 2022;146(7):569–581. doi: 10.1161/CIRCULATIONAHA.122.059678.
- 10. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595. doi: 10.3389/frai.2023.1169595.
- 11. Hanna JJ, Wakene AD, Lehmann CU, Medford RJ. Assessing racial and ethnic bias in text generation for healthcare-related tasks by ChatGPT (1). medRxiv. 2023.
- 12. Densen P. Challenges and opportunities facing medical education. Trans Am Clin Climatol Assoc. 2011;122:48–58.
- 13. Kundu S. How will artificial intelligence change medical training? Commun Med. 2021;1(1).
