Medical Science Educator
Editorial. 2024 Sep 24;35(1):509–512. doi: 10.1007/s40670-024-02178-7

Artificial Intelligence in Medical Education Assessments: Navigating the Challenges to Academic Integrity

John C Lin 1, Cameron A Sabet 2, Christopher Chang 3, Ingrid U Scott 4
PMCID: PMC11933566  PMID: 40144121

Abstract

Artificial intelligence (AI) chatbots are threatening academic integrity in medical education. Potential misuse of AI chatbots to cheat has led to internet restrictions by academic institutions. We propose five strategies to mitigate this issue: (1) in-person proctored exams, (2) online proctored exams, (3) clear expectations and consequences, (4) institutional culture of integrity, and (5) addressing exam pressure. Given the limitations of these strategies, it may be necessary to consider regulations, collaboration of medical educators with AI developers, and a broader reimagining of medical education.

Keywords: ChatGPT, Artificial intelligence, Academic integrity, Academic honesty, Medical education


Artificial intelligence (AI) chatbots like ChatGPT have recently demonstrated the ability to achieve passing scores on professional medical examinations like the United States Medical Licensing Examination (USMLE) Step 1 [1]. One study from Strong and colleagues revealed that ChatGPT surpassed first- and second-year Stanford medical students by an average of 4.2 points on a clinical reasoning exam [2]. These advancements bear significant implications for medical education, especially for medical schools with untimed, unproctored, remote preclinical exams. With the de-emphasis of in-person learning in favor of external study materials and online assessments [3], the temptation to seek AI assistance for text-based or multiple-choice answers on online and self-administered exams, and possibly even on virtual situational judgment tests and interviews, cannot be overlooked. This new reality poses significant challenges to academic integrity, the efficacy of medical training, and, in the long run, patient safety. We assess five strategies to maintain academic integrity with the advent of AI in skills assessment, alongside considerations for how medical educators might navigate current uses of AI in academic evaluations, which differ across contexts.

Proposals for Curricular Reform

  1. In-person proctored exams — In-person proctoring can deter students from using AI assistance during exams, given the immediate possibility of being caught, and allows proctors to identify suspicious behavior such as unusual eye movements or keystroke patterns. Proctors can also maintain strict control, banning electronics and scanning for covert items.

    However, in-person exams are logistically demanding and strain resources, requiring ample secure testing facilities and trained proctors who may enforce rules inconsistently. Moreover, in-person exams may not fully prevent sophisticated methods of discreet AI-assisted cheating, such as micro-earpieces tucked under hair or smartwatches beneath long sleeves. As the pandemic bolstered remote learning’s appeal, traditional in-person proctoring has become less popular and increasingly unfeasible. According to research from Dyer and colleagues, students reported being significantly less likely to engage in cheating behaviors during in-person proctored exams than during unproctored exams, with cheating frequency scores of 1.19 for proctored tests and 1.32 for unproctored tests on a 5-point scale [4]. When considering only students who admitted to some degree of cheating, the effect was even larger, and students generally felt that it was more acceptable to cheat on unproctored assessments [4]. Research from Hill and LoPalo also supports the value of in-person proctored testing, revealing that unproctored online examinations boosted performance by 9.1–9.8% relative to tests taken in a supervised classroom environment, roughly a two-thirds standard deviation increase, or an entire letter grade [5].

  2. Online proctored exams — In contrast to medical schools with untimed, unproctored, remote preclinical exams, many medical programs offer proctored remote exams exclusively. Given the explosive popularity of online assessments during the COVID-19 pandemic, virtual proctoring tools have been developed to monitor a student’s screen and actions during the exam, record suspicious activities, and flag potential misconduct for further review. A study by Dendir and Maxwell found that using webcam software to proctor online exams noticeably decreased the likelihood of cheating in two web-based courses, with average exam scores declining by 18.6% and 13.5%, respectively, after the intervention [6]. Furthermore, regression analysis highlighted a more pronounced correlation between students’ grade point average (GPA) and exam performance in a proctored environment, where each additional GPA point translated to an 8% increase in scores, versus a 5% increase without proctoring tools [6]. Though actual cheating rates in both settings could only be estimated by proxy, this research suggests that proctoring can be effective in maintaining academic integrity.

    Though proctoring has been shown to reduce instances of academic dishonesty [7], privacy concerns arise with such monitoring systems, as they often require access to the student’s computer screen and webcam. False positives can also occur when innocent behaviors are flagged as suspicious by human or AI proctors, causing unnecessary distress for students. Worse, some students might lack access to a reliable Internet connection, a quiet environment, or the hardware necessary to participate in an online proctored exam. Unlike with in-person proctoring, security measures can be bypassed using separate devices outside the camera’s field of vision, screen-sharing with experts, impersonation, and malware attacks on a proctor’s computer [8].

  3. Clear expectations and consequences — Establishing clear, well-defined academic integrity policies is valuable for discouraging AI-assisted cheating. These policies should provide unambiguous guidelines on what constitutes acceptable use of AI tools in medical education. For instance, policies may specify that the use of AI language models, such as ChatGPT, is prohibited for graded assignments and exams unless explicitly permitted by the instructor. Policies should also detail the consequences of violating academic integrity standards, such as course failure, disciplinary probation, or even dismissal from the medical program. To be effective, these expectations must be widely disseminated and punishments consistently enforced. Faculty should discuss these policies with students at the beginning of each course and include them in syllabi and assignment instructions. However, the effectiveness of this approach depends on strong AI detection capabilities, which may be lacking.

    As AI tools become more sophisticated, it can be challenging to distinguish between human-written and AI-generated content. Therefore, institutions should invest in advanced AI detection software that analyzes writing style, syntax, and originality. An evaluation by the AI detection resource Scribbr found that Originality.AI (https://originality.ai/) and QuillBot (https://quillbot.com/plagiarism-checker) have among the highest ratings for precision and accuracy of the available AI detection tools [9]. Using these tools, faculty members should report potential AI-powered academic dishonesty. Training on how to set expectations can come from school-specific training sessions such as the Yale Poorvu Center’s in-person sessions titled “Navigating AI Literacy” and “Teaching in the Age of Generative AI” [10, 11]. These sessions have given educators hands-on experience with AI and strategies for designing assignments that reduce cheating. Faculty can also participate in initiatives like Harvard’s metaLAB “AI Pedagogy Project” (https://aipedagogy.org/), which focuses on AI understanding, interactive chatbot sessions, and the development of cheat-resistant evaluation methods. Finally, the “Practical AI for Instructors and Students” video series by Mollick and Mollick at the University of Pennsylvania is another valuable resource for educators seeking to learn more about the technology and how to effectively regulate AI use in the classroom [12].

  4. Institutional culture of integrity — A culture of integrity is important for deterring academic dishonesty. Students should receive guidance on responsibly using AI tools in their learning, with policies emphasizing honesty as a core value of the medical profession and providing opportunities for growth. By fostering a strong culture of academic integrity, institutions can effectively discourage dishonest behaviors, as students internalize values of trust, responsibility, and respect for academic standards when treated as responsible individuals. Indeed, one study by McCabe and Trevino revealed a correlation between the presence of honor codes in colleges and lower instances of academic dishonesty, suggesting that simply promoting integrity and trusting students can help maintain high ethical standards [13]. However, it can be difficult to secure commitment from all stakeholders, and the lack of direct oversight can leave room for dishonest actions to go unnoticed or unpunished [13]. Moreover, the subjective nature of what constitutes integrity can lead to inconsistencies and potential unfairness in handling violations.

    To build a culture of integrity, institutions must start by clearly communicating their academic integrity values and expectations to all stakeholders, including students, faculty, and staff. This can be done through orientation sessions, workshops, and ongoing educational campaigns that emphasize the importance of academic honesty and the consequences of cheating. Organizations could model such workshops after San Francisco State University’s Center for Equity and Excellence in Teaching and Learning (CEETL), which offers workshops educating students on AI use and academic integrity principles [14]. Institutions should also let students help lead the development and promotion of academic integrity policies, as this can increase student ownership of the process. For instance, honor councils can establish AI sub-committees specifically responsible for educating their peers about academic integrity with these technologies and investigating related violations. Faculty also play a valuable role in reinforcing academic integrity values: they should consistently demonstrate ethical behavior in their own academic work and in their interactions mentoring students, and they could focus on creating classroom environments that emphasize individual growth rather than competition through collaborative learning, open discussion of ethical dilemmas, and constructive feedback on assignments.

  5. Addressing exam pressure — When formidable exams such as the USMLE Step 1 cast a long shadow over preclinical education, students may feel besieged by stress, inching toward the temptation of academic dishonesty, including leveraging AI tools for unfair advantages. For medical schools still using numerical grading systems, pass/fail grading schemes could be instrumental in minimizing competition and anxiety over grades, nudging students towards valuing the essence of learning above high grades at all costs. Georgetown’s initiative promoting preclinical team-based learning activities, which evaluate collective performance, stands as a promising model for nurturing teamwork among medical students and mitigating the individual drive to engage in dishonest practices. Medical schools could also replace large examinations separated by months with weekly, low-stakes formative assessments like those given at Howard University School of Medicine, and emphasize group learning dynamics [15]. Regular formative assessments offer students ongoing feedback, easing the pressure associated with major exams; they help students recognize areas for improvement promptly, before the block continues, and tailor their study methods more effectively than concentrated end-of-block exams allow. However, the frequent creation of assessments demands significant effort from faculty, who already juggle teaching, research, and clinical responsibilities. To tackle this problem proactively, medical schools could allocate dedicated time for developing assessments, conduct faculty development programs on crafting effective assessments, and recognize faculty excellence in teaching and evaluation. It is crucial to recognize that students might still be drawn to using AI tools for tasks perceived as menial or of lesser importance. To counter this, educators must ensure that all assessments, formative ones included, are aligned with the learning objectives of the course and provide meaningful feedback. Moreover, medical schools should extend comprehensive support services, including academic advising, tutoring, and mental health counseling, to help students develop efficient study strategies, manage stress and anxiety, and maintain overall well-being. By tackling the underlying issues of academic dishonesty and creating a supportive educational environment, medical schools can alleviate the compulsion to cheat and cultivate a culture of integrity and collective effort among students.

    Given the limitations of the proposed strategies, regulation of AI usage in education may be necessary. Establishing a central database of AI users and their student status could help deter unethical use of AI. Student data would need to be deidentified to protect it from unauthorized third-party use, and would be re-identified only after an account was flagged for suspicious activity or for inputting school-specific copyrighted test questions. Medical school faculty could also design clear, non-negotiable paths for escalating concerns related to unauthorized AI use, bringing accountability to those who do not comply with exam AI usage policies. Furthermore, collaboration among medical educators, academic leadership, AI developers, and regulators could help ensure that AI plays a supportive rather than detrimental role in advancing medical education. For instance, the Liaison Committee on Medical Education (LCME) could collaborate with the United States Department of Education to mandate AI literacy and ethics in medical school curricula, ensuring physicians are aware of this tool poised to shape the profession. The Association of American Medical Colleges (AAMC) and medical schools could also collaborate with AI developers to build educational tools that allow students to reap the full benefits of this technology without straying into unregulated territory.

    The rise of AI reveals the need for a broader reimagining of medical education assessments. Given that AI is well-suited to storing and recalling vast amounts of information rather than to human-like thoughtful reasoning, its success on professional medical exams may reflect an overemphasis on rote memorization in those exams. To ensure that medical professionals are adequately prepared for real-world practice, assessments could be restructured to focus more on critical thinking, interpersonal skills, and clinical decision-making, areas in which AI currently cannot compete with human ability. For instance, problem-based learning, which integrates scenario-based assessments, can train students in factual knowledge, empathy, and practical wisdom. Likewise, students should be educated about the capabilities, limitations, and ethics of AI and of human-AI collaboration. While it is difficult to predict whether these strategies will prevent breaches of academic integrity, such transformations in medical education are increasingly important in our AI-driven future.

Declarations

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References


