Abstract
BACKGROUND
The purpose of this 2024 study was to determine whether there is an association between medical students' use of artificial intelligence (AI) as a study tool and their exam scores during the preclinical phase of medical school.
METHODS
We created and distributed a survey via an unbiased third party to students in the class of 2027 at the Kirk Kerkorian School of Medicine at UNLV to evaluate students' AI use when studying for their preclinical system-based exams. Students were categorized into two groups: those who use AI to study and those who do not. Two-sample t-tests were run to compare the mean exam scores of both groups on six different organ system exams as well as the cumulative final exam. Students who did use AI were further asked which AI tools they use and how exactly they use these tools to study for preclinical examinations.
RESULTS
The results of the study showed no statistically significant difference in exam scores between students who used AI for study purposes and students who did not. Most AI users studied with ChatGPT, and the most common use was asking AI to simplify and clarify topics they did not understand.
CONCLUSIONS
Based on the results of this study, we concluded that students' use of AI programs to study for medical examinations had neither a positive nor a negative effect on their organ system-based exam scores.
Keywords: artificial intelligence, medical students, exam scores, NBME, phase I
Introduction
Medical education resources have evolved over time from written textbooks to digital articles and now include innovative study tools that enhance students' ability to retain vast amounts of information within limited timeframes. Some of these study resources include UpToDate, PubMed, and Google. 1 Several studies have shown that medical students prefer these digital learning modalities to traditional textbooks. In 2002, a survey at the University of Iowa Roy J. and Lucille A. Carver College of Medicine found that 85% of the medical students who responded selected electronic sources as their preferred study material. 2 Another survey found that 76% of medical students surveyed during their clinical pediatric rotation used digital resources and 30% did not use their textbook for studying at all. 3
In recent years, artificial intelligence (AI) has become a valuable resource for medical students and healthcare professionals to access information and develop a deeper understanding of subject matter. AI was first implemented in general education in the 1920s as a form of self-tutoring, but modern advancements in technology have made it into what we know and see today, 100 years later. 4 AI models were introduced into medical education in the 1980s with the development of CIRCSIM, an AI program designed to teach medical students about cardiovascular physiology.4,5 Since then, AI has become much more readily accessible via programs like ChatGPT, and several studies have been conducted regarding its role in medical education. One study discussed situations where using ChatGPT was effective for medical school studies, including brainstorming differential diagnoses, interactive medical practice cases, and multiple choice question review. The same study also showed that AI is limited in its ability to simulate human cognition or intuition: it was ineffective at tasks requiring the humanistic elements of medical practice learned in both the preclinical and clinical settings, such as interpreting sensory and nonverbal cues and building rapport and interpersonal relationships. 6 Another study evaluated the performance of ChatGPT on questions used in preparation for the United States Medical Licensing Examination (USMLE) Steps 1 and 2. The study found that ChatGPT is capable of answering enough questions correctly to meet the benchmark passing standards for both Step licensing examinations, indicating its ability to synthesize medical information at the level expected of a third-year medical student. 7 This study suggests that AI models could potentially be used as tutors because of their performance and ability to provide explanations that were "coherent in terms of logical reasoning, use of internal information, and use of external information." 7 As AI develops and new resources are created, physicians who utilize these tools can expect improvements in productivity, limiting their time spent placing orders, writing notes, or precharting, and increasing their face-time with patients. As these AI tools for medicine continue to develop, physician-patient relationships may improve alongside them.
While this area of interest has great potential, it is not without its limitations and concerns. In a survey of 3018 medical students, over 85% of respondents agreed that AI resources would improve their access to healthcare-related knowledge as doctors, and just over 70% of respondents felt AI would lead to a reduction in medical errors. 8 Other surveys of medical students demonstrate concern about AI usage given a lack of formal training in the domain. 9 Lack of exposure to this field of development could lead to inappropriate usage of these tools or potentially dangerous decisions by clinicians due to inaccurate or incomplete information. These rising concerns have prompted scholars to advocate for formal training and the creation of curricula that emphasize ethical considerations and other themes regarding AI usage in healthcare. Specifically, a review of articles from 2000 to 2020 identified five recurring themes that should be addressed when including AI in medical education. 10 These themes included formal training in AI proficiency and ethical considerations as well as an emphasis on basic interpersonal and professional skills. 11 Similarly, another literature review emphasized the call for a "humanistic approach to patient safety" and "early exposure to patient-oriented integration." 12 It is clear that there is great concern that AI carries the risk of dehumanizing physicians and medical students and that, if AI is integrated, it should be done in a manner that considers the aforementioned curricular themes.
AI-focused curriculum development appears to be imminent. One article cited that 94% of the 486 respondents from 17 different medical schools in Canada recognized that AI use in medical education was inevitable. 10 This impending matter has led scholars to question how AI should be introduced in medical training and what the ethical considerations of doing so are. Some researchers propose that the "early stages of a preclerkship curriculum represent an ideal opportunity" to use AI programs to teach early medical students basic science topics and apply them to real-world care. 12 It is proposed that early exposure to AI tools, with an emphasis on ethical usage in the medical setting, will help students develop the skills needed to utilize AI safely and efficiently as physicians. Harvard Medical School has already taken steps to provide a curriculum for its students that includes a 1-month introductory course on AI in healthcare and expects that the most "successful physicians and researchers will be the ones who can harness genAI for innovation and strategic planning." 13
As scholars advocate for early and formal education on AI resources used in the medical field, it would be valuable to know whether there is a quantifiable benefit to those receiving this education. As mentioned previously, medical students are already using AI resources in their studies because of perceived benefits, eg, several resources are free and easy to use. What is not well known is whether students who use these AI resources perform better on their preclinical exams than those who do not. The purpose of our study is to identify whether there is a correlation between AI usage and preclinical exam performance in first-phase medical students. We also want to identify the specific AI tools the subjects use to study and how they use those tools.
Methods
Study design, setting, and participants
We, the authors, distributed an online survey to the Kirk Kerkorian School of Medicine at UNLV (KKSOM) class of 2027 via the survey program Qualtrics, administered by an unbiased third-party official. This survey assessed the usage of AI in studying for preclinical medical school exams. Participant inclusion criteria consisted of all students in the class of 2027 at KKSOM who voluntarily responded to the survey, regardless of AI usage. Previous classes were excluded due to their advancement to clinical courses and the risk of recall bias if their previous exam scores were included. The reporting of this study conforms to the STROBE statement for cross-sectional studies (STROBE cross-sectional checklist). 14
Ethical considerations
This study was approved by the UNLV Biomedical Institutional Review Board (UNLV-2023-694) on February 12, 2024.
Sample recruitment
Participants were recruited via email through an institutional list or by word of mouth within the institution of study. Prior to the survey, participants were required to provide informed consent. Upon agreement, participants proceeded to the questionnaire.
Instruments
We created the survey and distributed it to medical students via the survey program Qualtrics. Please see Appendix A for a complete account of the questions included in the survey.
Procedure
The survey was distributed prior to the first organ block exam for the KKSOM Class of 2027. The survey was an original creation consisting of nine items that asked whether subjects used AI to study for medical school exams. Based on their responses, subjects were separated into two groups: AI users (yAI) and non-AI users (nAI). Subjects in the yAI group were asked further questions about which AI programs they use and how they use these programs to study; their responses on usage were grouped into one of the following categories: clarification and simplification of new or confusing material, summarization and content breakdown, practice questions and information application, clinical context and differentiation of confusing material, and study guide creation. Subjects' responses were recorded and linked to an anonymous, unique identification number. The survey questions are shown in Appendix A. Over the course of 4 months, subjects then completed their Pulmonary, Cardiovascular, Renal, Gastrointestinal, Endocrine, Reproductive, and Comprehensive Final exams. Exams consisted of 60 retired NBME questions chosen by the organ block course director from a data bank. Raw, uncurved scores were recorded and anonymously matched to the subjects' surveys by an unbiased third-party official to ensure students' survey responses and grades were not shared with the student-authors.
Data analysis
We first examined the data for normality with the Shapiro-Wilk test. Two-sample t-tests were then conducted to compare mean scores on each exam between the yAI and nAI groups (α = 0.01). RStudio version 2024.04.2+764 was used for all statistical analyses.
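To make this pipeline concrete, the sketch below shows how such an analysis could be run in R. It is a minimal illustration under stated assumptions, not the authors' script: the data frame `scores`, its `group` factor, and the per-exam column names are hypothetical, and `t.test()` is shown with its default Welch (unequal-variance) form.

```r
# Minimal sketch of the analysis, assuming a hypothetical data frame `scores`
# with one row per student, a factor column `group` ("yAI"/"nAI"),
# and one numeric column of raw percentages per exam.
exams <- c("Pulmonary", "Cardiovascular", "Renal", "Gastrointestinal",
           "Endocrine", "Reproductive", "Final")

results <- lapply(exams, function(exam) {
  yai <- scores[scores$group == "yAI", exam]
  nai <- scores[scores$group == "nAI", exam]

  # Shapiro-Wilk normality check for each group's scores
  sw_yai <- shapiro.test(yai)$p.value
  sw_nai <- shapiro.test(nai)$p.value

  # Two-sample t-test comparing group means
  # (t.test() defaults to the Welch unequal-variance form)
  tt <- t.test(yai, nai)

  data.frame(exam = exam,
             shapiro_yAI = sw_yai, shapiro_nAI = sw_nai,
             mean_yAI = mean(yai), mean_nAI = mean(nai),
             p_value = tt$p.value,
             significant = tt$p.value < 0.01)  # alpha = 0.01
})
do.call(rbind, results)
```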
Results
We distributed the survey to the KKSOM class of 2027. Of the 68 students in the cohort, 38 responded (56% response rate), forming the study population. Of these 38 students, 23 (60.5%) used AI to study and 15 (39.5%) did not.
The average results on examinations for each study group are reported in Table 1. The yAI group averaged 82.4% across the seven exams while the nAI group averaged 82.7%. Each individual exam followed a similar trend, with average scores ranging from 80.2% to 85.0%.
Table 1.
Average exam results and standard deviations of the yAI and nAI groups for each organ block NBME exam with associated P-values and the average of all seven exams.
EXAM | yAI RAW SCORE (%) (n = 23) | nAI RAW SCORE (%) (n = 15) | P-VALUE | yAI RANGE (%) | nAI RANGE (%)
---|---|---|---|---|---
Pulmonary | 80.7 ± 10.3 | 80.2 ± 8.9 | 0.88 | 48–95 | 67–92
Cardiovascular | 83.4 ± 8.2 | 85.0 ± 5.2 | 0.48 | 58–95 | 77–97
Renal | 82.4 ± 6.4 | 82.1 ± 6.5 | 0.91 | 70–95 | 73–92
Gastrointestinal | 81.8 ± 6.1 | 81.8 ± 7.4 | 0.99 | 67–95 | 62–93
Endocrine | 82.0 ± 9.2 | 81.1 ± 4.4 | 0.66 | 55–93 | 72–90
Reproductive | 82.9 ± 5.1 | 82.4 ± 7.1 | 0.81 | 73–92 | 70–93
Comprehensive final | 83.4 ± 5.9 | 82.7 ± 2.8 | 0.60 | 73–94 | 77–87
7-Exam group average | 82.4 | 82.7 | | |
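As a rough check on Table 1, the reported P-values can be approximately reproduced from the summary statistics alone. The paper does not state which t-test variant was used; the sketch below assumes the Welch (unequal-variance) form, R's default, applied to the Pulmonary row.

```r
# Welch two-sample t-test from the Pulmonary summary statistics in Table 1
m1 <- 80.7; s1 <- 10.3; n1 <- 23   # yAI mean, SD, n
m2 <- 80.2; s2 <- 8.9;  n2 <- 15   # nAI mean, SD, n

se <- sqrt(s1^2 / n1 + s2^2 / n2)             # standard error of the difference
t_stat <- (m1 - m2) / se                      # t ~ 0.16
df <- se^4 / ((s1^2 / n1)^2 / (n1 - 1) +      # Welch-Satterthwaite df ~ 33
              (s2^2 / n2)^2 / (n2 - 1))
p <- 2 * pt(-abs(t_stat), df)                 # two-sided p ~ 0.87

p  # close to the reported 0.88; the gap reflects rounding of the summary stats
```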
Participants who stated that they use AI to study were asked which AI tools they use. At the time of the survey, 22 of the 23 subjects (95.7%) used ChatGPT 3.5 or 4 as a study tool. Of the subjects in the yAI group, 6 (26.1%) used more than one AI tool to study. The full results are tabulated in Table 2.
Table 2.
Total number of users for each AI tool including ChatGPT, THI.AI, and other from the yAI group (n = 23).
AI RESOURCE | ChatGPT | THI.AI | OTHER | COMBINATION OF TOOLS
---|---|---|---|---
Number of users | 22 (95.7%) | 6 (26.1%) | 1 (4.3%) | 6 (26.1%)
Those in the yAI group were also asked how they use AI to study. Responses were grouped into the following general categories: clarification and simplification of new or confusing material, summarization and content breakdown, practice questions and information application, clinical context and differentiation of confusing material, and study guide creation. Many participants used AI in more than one of these ways. Table 3 shows the breakdown of the ways participants use AI to study. All responses are shown in Appendix B.
Table 3.
Total number of yAI participant responses to how they use AI grouped into the following categories: clarification and simplification of new or confusing material, summarization and content breakdown, practice questions and information application, clinical context and differentiation of confusing material, and study guide creation.
HOW TO USE AI | CLARIFICATION AND SIMPLIFICATION | SUMMARIZING AND CONTENT BREAKDOWN | PRACTICE AND APPLICATION | CLINICAL CONTEXT AND DIFFERENTIATION | STUDY GUIDE CREATION
---|---|---|---|---|---
Number of users (n = 23) | 15 | 8 | 3 | 3 | 2
Discussion
The purpose of our work was to evaluate the extent to which medical students at a single institution use AI programs to study and whether AI usage correlated with higher exam performance in their preclinical years. Our results showed no statistically significant difference in exam performance between the two groups on any of the seven exams or in the overall seven-exam average. As previously mentioned, there is very little literature regarding AI usage and exam performance in medical school; the results of our study are therefore a unique addition to this field of research.
Much of the literature regarding AI and educational applications discusses theoretical benefits as well as more tangible benefits such as saving student time or tailoring education to meet individual needs.15–17 Currently, there is little information on whether using AI actually correlates with higher exam performance. A 2023 literature review of 92 articles regarding AI usage in education reported that approximately 27 of those articles commented on how AI improves students' academic success in one form or another. 18 One study, for example, found that AI chatbot programs improved students' speaking skills compared to a control group taught through human interaction. 19 Interestingly, another study found that students who used generative AI programs to write essays scored, on average, over 6% lower than those who did not use these programs. 20 This result highlights a possible negative correlation between AI usage and academic performance. Though important in their implications, these articles did not address a direct correlation between students' AI usage and exam performance. Previous studies have shown that the introduction of new technologies for studying may initially leave school performance unchanged or even worsen it, but as students become more familiar with these novel study methods, performance can improve.21,22 The introduction of AI into studying may follow a similar pattern, requiring more longitudinal studies that follow students' use of AI and their performance in academic settings.
The limited research and conflicting findings have created a gap in knowledge that our study hopes to address. The results of this study ultimately indicate that students who use AI tools to study achieve exam results similar to those who do not. These findings are important because medical students spend significant amounts of money on third-party resources, while several of the AI programs students reported using, such as those found in Table 2, are free.
There were a number of limitations to our research. First, this research was confined to a 68-student cohort at a single institution and relied on survey responses to determine the number of participants, leading to a limited sample size that may not represent the broader population and decreasing the study's external validity. Additionally, data collection with a survey introduces its own limitations. The self-selection of participants introduces selection bias, meaning respondents may have had different characteristics from those who chose not to participate. Furthermore, a number of confounding variables may have affected the average scores within the two groups: baseline academic performance, preferred study habits, and total time spent studying were not accounted for in this study. In addition, medical education resources are abundant, and students typically rely on more than one resource, so AI users may have acquired much of their knowledge through the same channels as non-users. Finally, the brief duration of the study restricts the ability to assess the long-term impacts of AI usage on learning, knowledge retention, and academic performance.
Conclusion
Medical education is a rapidly evolving and often expensive field for medical students to navigate. Advancements in AI have improved students' access to information without the higher costs associated with other third-party resources. Our study shows that students who utilized AI tools to study for subject exams during the first phase of medical school performed just as well as those who did not use AI. Future research should continue to investigate the tangible benefits of AI tools for medical students throughout their educational experience, including the basic medical sciences and clerkship rotations.
Supplemental Material
Supplemental material, sj-pdf-1-mde-10.1177_23821205251320150 for Evaluating the Use of Artificial Intelligence as a Study Tool for Preclinical Medical School Exams by Peyton G. Sakelaris, Kaitlyn V. Novotny, Miriam S. Borvick, Gemma G. Lagasca and Edward G. Simanton in Journal of Medical Education and Curricular Development
Acknowledgments
We would like to thank the Kirk Kerkorian School of Medicine at UNLV for their assistance with this project.
Appendix
A: AI Use Survey
1. What Block are you currently in?
2. Do you use Artificial Intelligence (AI)?
   - Yes
   - No
   (If "No" is selected as the answer to Question 2, end survey.)
3. Do you use AI to study for Medical School?
   - Yes
   - No
   (If "No" is selected as the answer to Question 3, end survey.)
4. Which AI tool(s) do you use to study for Medical School? (choose all that apply)
   - Chat GPT 3.5
   - Chat GPT 4.0
   - THI.AI
   - Other
5. How often do you use AI to Study?
   - 1 – Rarely (once or twice for this exam)
   - 2
   - 3 – Sometimes (A couple times per week)
   - 4
   - 5 – All the time (Multiple times per day)
6. How do you use AI to study? (Examples: I ask questions about material I don't understand; I have it summarize book chapters; I use it to create a study plan; I use it to create practice questions)
7. What pass of material do you use AI for? (choose all that apply)
   - First pass
   - Second pass
   - Third pass
   - Fourth+ pass
8. How prepared do you feel for this exam?
   - 1 – Completely unprepared
   - 2
   - 3 – Indifferent
   - 4
   - 5 – I Feel Good about this Exam
9. How much of your preparation do you attribute to your AI use?
   - 1 – AI did not help me for this exam
   - 2
   - 3 – Indifferent
   - 4
   - 5 – I know things primarily due to AI
Appendix B
How do you use AI to study?
(Examples: I ask questions about material I don't understand; I have it summarize book chapters; I use it to create a study plan; I use it to create practice questions)
Footnotes
Authors Contributions: PS was a major contributor in the writing and editing of the manuscript. KN was a major contributor in the writing and editing of the manuscript. MB was an editor of the manuscript. PS analyzed the statistics that contributed to the results of the manuscript. GL was an editor of the manuscript. ES was an editor of the manuscript and distributed the survey used in the study. All authors read and approved the final manuscript.
Availability of data and materials: Raw data can be made available upon reasonable request to the corresponding author.
Data/Research Materials Availability Statement: The data and research materials used in this study can be accessed through the corresponding author upon reasonable request.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval and consent to participate: Experimental protocols were approved by the University of Nevada, Las Vegas Biomedical Institutional Review Board. The protocol was approved on February 12, 2024, under protocol ID number UNLV-2023-694 with the protocol title "AI Study Tool and Its Impact on Academic Performance in Undergraduate Medical Education."
FUNDING: The authors received no financial support for the research, authorship, and/or publication of this article.
ORCID iDs: Peyton G. Sakelaris https://orcid.org/0009-0003-7717-7349
Kaitlyn V. Novotny https://orcid.org/0009-0008-3140-6691
Previous Presentations: Material has previously been presented as a poster at the Western Group on Educational Affairs conference on 5/5/2024 in Irvine, CA as well as the International Association of Medical Science Educators Conference on 6/16/2024 in Minneapolis, MN.
Supplemental material: Supplemental material for this article is available online.
References
- 1. Ryan L, Sheehan K, Marion MI, Harbison J. Online resources used by medical students, a literature review. MedEdPublish. 2020;9:136. doi:10.15694/mep.2020.000136.1
- 2. Peterson MW, Rowat J, Kreiter C, Mandel J. Medical students' use of information resources: is the digital age dawning? Acad Med. 2004;79(1):89–95. doi:10.1097/00001888-200401000-00019
- 3. Scott K, Morris A, Marais B. Medical student use of digital learning resources. Clin Teach. 2018;15(1):29–33. doi:10.1111/tct.12630
- 4. Randhawa GK, Jackson M. The role of artificial intelligence in learning and professional development for healthcare professionals. Healthc Manage Forum. 2020;33(1):19–24. doi:10.1177/0840470419869032
- 5. Evens MW, Chang RC, Lee YH, et al. CIRCSIM-Tutor: an intelligent tutoring system using natural language dialogue. In: Proceedings of the Fifth Conference on Applied Natural Language Processing: Descriptions of System Demonstrations and Videos (ANLC '97). Association for Computational Linguistics; 1997:13–14. doi:10.3115/974281.974289
- 6. Safranek CW, Sidamon-Eristoff AE, Gilson A, Chartash D. The role of large language models in medical education: applications and implications. JMIR Med Educ. 2023;9:e50945. doi:10.2196/50945
- 7. Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States Medical Licensing Examination (USMLE)? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi:10.2196/45312
- 8. Civaner MM, Uncu Y, Bulut F, Chalil EG, Tatli A. Artificial intelligence in medical education: a cross-sectional needs assessment. BMC Med Educ. 2022;22(1):772.
- 9. Pucchio A, Rathagirishnan R, Caton N, et al. Exploration of exposure to artificial intelligence in undergraduate medical education: a Canadian cross-sectional mixed-methods study. BMC Med Educ. 2022;22(1):815.
- 10. Lee J, Wu AS, Li D, Kulasegaram KM. Artificial intelligence in undergraduate medical education: a scoping review. Acad Med. 2021;96(11S):S62–S70. doi:10.1097/ACM.0000000000004291
- 11. Han ER, Yeo S, Kim MJ, Lee YH, Park KH, Roh H. Medical education trends for future physicians in the era of advanced technology and artificial intelligence: an integrative review. BMC Med Educ. 2019;19(1):460.
- 12. Seth P, Hueppchen N, Miller SD, et al. Data science as a core competency in undergraduate medical education in the age of artificial intelligence in health care. JMIR Med Educ. 2023;9:e46344. doi:10.2196/46344
- 13. Gehrman E. How generative AI is transforming medical education. Harvard Med Mag. Published October 10, 2023. https://magazine.hms.harvard.edu/articles/how-generative-ai-transforming-medical-education
- 14. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344–349. doi:10.1016/j.jclinepi.2007.11.008
- 15. Pisica AI, Edu T, Zaharia RM, Zaharia R. Implementing artificial intelligence in higher education: pros and cons from the perspectives of academics. Societies. 2023;13(5):118. doi:10.3390/soc13050118
- 16. Paranjape K, Schinkel M, Nannan Panday R, Car J, Nanayakkara P. Introducing artificial intelligence training in medical education. JMIR Med Educ. 2019;5(2):e16048. doi:10.2196/16048
- 17. Lee H. The rise of ChatGPT: exploring its potential in medical education [published correction appears in Anat Sci Educ. 2024; doi:10.1002/ase.2496]. Anat Sci Educ. 2024;17(5):926–931. doi:10.1002/ase.2270
- 18. Chiu TKF, Xia Q, Zhou X, Chai CS, Cheng M. Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education. Comput Educ Artif Intell. 2023;4:100118. doi:10.1016/j.caeai.2022.100118
- 19. Kim HS, Kim N, Cha Y. Is it beneficial to use AI chatbots to improve learners' speaking performance? J AsiaTEFL. 2021;18:161–178. doi:10.18823/asiatefl.2021.18.1.10.161
- 20. Wecks JO, Voshaar J, Plate B, Zimmermann J. Generative AI usage and academic performance. SSRN Electron J. 2024. doi:10.2139/ssrn.4812513
- 21. Do DH, Lakhal S, Bernier M, Bisson J, Bergeron L, St-Onge C. Drivers of iPad use by undergraduate medical students: the technology acceptance model perspective. BMC Med Educ. 2022;22(1):87.
- 22. Miller T. Developing numeracy skills using interactive technology in a play-based learning environment. Int J STEM Educ. 2018;5(1):39. doi:10.1186/s40594-018-0135-2