Table 2.
Major themes identified, associated subthemes, and representative quotations.
| Themes and subthemes | Representative quotations |
| --- | --- |
| Theme 1: Test performance and preparation | |
| Licensing examination performance | “...we evaluated the performance of ChatGPT, a language-based AI [artificial intelligence], on the United States Medical Licensing Exam (USMLE). The USMLE is a set of three standardized tests of expert-level knowledge, which are required for medical licensure in the United States. We found that ChatGPT performed at or near the passing threshold of 60% accuracy.” [8] |
| Specialty exam performance | “We challenged it to answer questions from a more demanding, post-graduate exam—the European Exam in Core Cardiology (EECC), the final exam for the completion of specialty training in Cardiology in many countries. Our results demonstrate that ChatGPT succeeds in the EECC.” [9] |
| Undergraduate exam performance | “It can be concluded that ChatGPT helps in seeking answers for higher-order reasoning questions in medical biochemistry.” [10] |
| Improving understanding | “Moreover, active surgeons who completed their training over a decade ago may find LLMs [large language models] helpful for continuous medical education (CME)...By utilizing an up-to-date LLM as a supplementary resource in their decision-making process, surgeons may have additional means to stay informed and strive for evidence-based care in their patient management.” [11] |
| Self-directed learning | “Self-directed learning with ChatGPT can be phenomenal since it incorporates multiple domains and learns from the conversation it has with the student.” [12] |
| Exam preparation/practice | “However, ChatGPT performed acceptably in negative phrase questions, mutually exclusive questions, and case scenario questions, and it can be a helpful tool for learning and exam preparation.” [13] |
| Theme 2: Novel learning strategies | |
| Development of personalized learning plans | “The creation of personalized quizzes for students is an illustration of the use of generative AI in medical education evaluations. By analyzing each student's strengths and weaknesses, generative AI can generate unique formative and summative assessments for each student.” [14] |
| Creation of learning materials | “Language models can analyze the performance of individual students and generate personalized learning materials that address their specific areas of weakness. For example, if a student struggles with a particular medical concept, the language model can generate additional resources or exercises to help them better understand it.” [1] |
| Providing feedback | “By serving as a virtual teaching assistant, ChatGPT could be leveraged to provide students with real-time and personalized feedback.” [15] |
| Communication skills training | “Although in its infancy, AI chatbot use has the potential to disrupt how we teach medical students and graduate medical residents communication skills in outpatient and hospital settings.” [16] |
| Clinical image generation for learning | “...text-to-picture AI system is a developing and promising tool for medical education…With the use of ‘non existing people’ we can, with a good conscience, provide image material whose dissemination on the internet or social media does not violate patients’ privacy.” [17] |
| Medical humanities exercises | “In a small-group educational setting, students will have the ability to create art that may tell a patient’s story, help in debriefing, and share an experience with others.” [18] |
| Theme 3: Writing and research assistance | |
| Assisting non-native speakers | “In this context, LLMs could be used to translate and correct manuscripts in ways that could reduce language barriers, thereby allowing scholarly work from non-native English-speaking countries to be considered on a more equal footing.” [19] |
| Translations | “ChatGPT’s ability to translate language effectively can be utilized by medical professionals and educators to help communicate with patients from different linguistic backgrounds, in order to provide the best medical care.” [20] |
| Literature review/summarization | “...medical researchers can use GLMs [generative language models] to scan and analyze vast amounts of medical literature quickly, identifying relevant studies and summarizing their findings. This can significantly reduce the time spent on literature reviews, allowing researchers to focus more on their primary research work.” [14] |
| Fabricated references/hallucinations | “Simply put: ChatGPT generates fake citations and references.” [21] |
| Theme 4: Academic integrity concerns | |
| Cheating on examinations | “The ability of LLMs to respond to short-answer and multiple-choice exam questions can be exploited for cheating purposes.” [22] |
| Reduced effectiveness of learning exercises | “Student dependency on the language model may also propagate academic dishonesty or ‘cheating.’ For example, a student might use ChatGPT to complete an essay or other written assignment without fully understanding the material or putting in the required effort.” [15] |
| Technological plagiarism | “Some educators are changing their course, examination, and grading structure and updating their definition of plagiarism to include, ‘using text written by a generation system as one’s own (eg, entering a prompt into an AI tool and using the output in a paper).’” [23] |
| Need for policy development | “Consensus-based guidelines at the institutional and/or national level should be implemented to govern the appropriate use of [generative artificial intelligence].” [24] |
| Guidance for disclosure and transparency | “Emerging issues have been raised with technology-generated academic papers, including how to define the extent of using AI assisted editing, the way of disclosure, privacy and confidentiality, and boundary of integrity.” [25] |
| Theme 5: Accuracy and dependability | |
| Reliance on training data | “Although ChatGPT is trained on large amounts of data, there is always the possibility of errors or oversights in its training process, and the training data itself may contain inaccurate information.” [15] |
| Lack of up-to-date information | “...the data set that ChatGPT was trained on was last updated in 2021. As a result, it is possible that the system is not able to provide users with the most up-to-date information, decreasing its reliability.” [26] |
| Hallucination | “ChatGPT repeats its answers with much confidence and clear explanations, even in case of a totally wrong answer. This is technically called hallucination.” [27] |
| Confidence expressed by models | “ChatGPT, with apparent confidence, provided an essay on liver involvement which, in reality, has not been reported yet.” [28] |
| Misinformation propagation | “Further, AI-generated content can potentially produce misinformation or biased information...” [14] |
| Limited accuracy in specific areas | “Consequently, the current level of accuracy is not yet sufficient for immediate clinical application in patient care.” [11] |
| Need for further training in limitations | “AI is still underrepresented in the medical curriculum, and students lack the opportunity to engage more intensively with the topic of AI and develop the required expertise.” [29] |
| Theme 6: Potential detriments to learning | |
| Overdependence | “Lastly, there is a need to delve deeper into the possible consequences of overdependence on LLMs in medical education.” [22] |
| Challenges with assessment | “The performance of AI on certification tests says as much about the nature of those assessments as it does about the remarkable capacity of AI to pass them. We need to think carefully about the kind of performance we want our assessments to elicit.” [30] |
| Propagating inaccurate information | “...students may find it challenging to differentiate between genuine knowledge and unverified information. As a result, they may not scrutinize the validity of information and end up believing inaccurate or deceptive information.” [22] |
| Inequities in access | “Generative AI tools and LLMs may increase the inequity among students and educators, given that these tools are not equally accessible to all of them.” [22] |