Case: 3 Questions in 1 Week
By Friday afternoon, an attending realizes she’s had the same conversation 3 times in different settings, with the same unease about artificial intelligence (AI) use. On Monday after rounds, a third-year resident lingered at the workstation to ask whether he must disclose that an AI tool helped draft his notes, even though he verified the facts and added his own reasoning. On Wednesday, an inbox message arrived from a colleague, preparing a presentation, who wondered how to evaluate a resident’s project that used AI for data summaries and first-pass drafting. What, exactly, counts as the trainee’s own work? On Friday at the Clinical Competency Committee (CCC), several portfolio reflections read remarkably smoothly, which prompted the chair to ask: does AI-assisted prose represent authentic self-assessment? None of these conversations are hypothetical; they reflect the current collision of AI with supervision, assessment, and the clinical learning environment. Though there are many different questions, the common thread is uncertainty. The Journal of Graduate Medical Education’s new AI Teaching Rounds series will begin here, with this premise: programs don’t need perfect answers to begin using AI; they need a shared language, clear guardrails, and small, testable steps they can revise as they learn.
Why Now
AI in health care is arriving from 2 directions. At the institutional level, health systems are rolling out ambient AI scribing and documentation assistants. Early studies suggest these interventions reduce note-writing time and improve clinicians’ experience, with effects that vary by role and specialty.1 At the individual level, AI adoption is no longer a fringe phenomenon; national survey data show a sharp rise in physicians’ use of health care AI over the last year.2 Residents encounter AI tools during rotations and bring their own versions for learning, documentation, and decision support. Whether or not your trainee handbook has caught up with the trend, AI is already in the clinical learning environment.
The question, then, is not whether residency programs should engage with AI, but rather how. Tools designed to enhance patient care and minimize errors may undermine the development of key clinical skills. For example, endoscopists who used AI for polyp detection showed reduced detection ability when later performing colonoscopies without AI assistance: their performance fell below the baseline they had before any AI exposure.3 Eager integration without careful consideration of the potential impacts risks undermining training outcomes. We must determine in which settings these tools can and should be used, and we cannot make these determinations without active engagement and testing.
Some lines on AI use are worth drawing immediately. Do not place protected health information in consumer (commercial) tools; instead, use institution-approved systems covered by business agreements regarding data sharing. Consider equity from the outset: not every trainee has access to, or can afford, premium tools. Language differences also matter in AI use. Programs may need to provide trainees with access to preferred AI tools and set guardrails that avoid advantaging certain groups.
Principles
Three principles guide this series. First, normalize disclosure of AI use in graduate medical education (GME) without stigma. Transparency mirrors emerging scholarly publishing norms that require acknowledging AI assistance, just as we acknowledge other collaborative tools. Humans remain responsible for the final output.4 Second, prioritize observable reasoning over polished products. When work appears unusually smooth, ask trainees to walk through their thinking aloud or explain their decision-making process. Third, start small, evaluate fast, and revise—then scale up what works.
What This Series Will Cover
AI in the Clinical Environment
Programs must establish appropriate limits on AI tool use during training and determine whether existing competency frameworks suffice or new ones are needed. We will explore how to teach and assess AI use through the same approach used for other clinical procedures: clear expectations, direct observation, and feedback. We will translate current frameworks into supervision language that programs can use now.5
AI for Teaching
AI can generate personalized teaching materials, provide rapid formative feedback, and enable simulations that previously required extensive infrastructure. However, these applications must be grounded in pedagogical principles and informed by faculty educational expertise. We will explore how to balance efficiency gains with the essential requirement for careful human review.
AI Analytics
Programs generate mountains of narrative comments, simulation notes, and procedure logs. AI may help surface patterns for program improvement and trainee assessment, provided we watch for bias and use the results to generate interpretable, meaningful insights. A recent scoping review underscores both promise and caution for AI program analytics.6
AI in Education Research
We will explore how AI can facilitate research designs for fast-moving educational contexts and support meaningful reproducibility practices. The series will also suggest a roadmap for publishing rigorous education scholarship on AI interventions in GME.7
Burning Issues
Authenticity Versus Assistance
A resident submits a beautifully written reflection drafted with AI help. How should the CCC weigh this reflection? The challenge intensifies when AI helped organize the resident’s thoughts but the insights appear genuinely their own. Training level adds another layer: what serves as appropriate scaffolding for interns may signal concerning dependence for senior residents. In studies of applicant materials and research articles, humans struggle to reliably detect AI-generated text.8,9 AI detection tools are currently unreliable at distinguishing human-written from AI-generated content and should not be used for high-stakes judgments.10 Building a culture of transparent AI disclosure will be more reliable than attempting detection.
Efficiency Versus Skill Development
AI tools promise to reduce documentation burden and accelerate diagnostic processes, but continuous exposure may undermine the very skills trainees need to develop. The endoscopy example illuminates this risk: AI assistance created dependency rather than augmenting capability. If trainees rely on AI for differential diagnosis generation, clinical reasoning synthesis, or procedure guidance, they may not develop the cognitive frameworks necessary to function independently. Programs must distinguish between tools that support learning and those that bypass it, yet these boundaries remain unclear and likely vary by specialty, task, and training stage.
Accountability and Supervision
When a trainee uses AI assistance and an error occurs, who bears responsibility? The trainee who accepted the AI output? The attending who supervised without knowing AI was involved? The institution that provided or permitted the tool? Current supervision frameworks assume human-generated work products. AI introduces an additional layer: faculty may be supervising a collaboration between trainee and algorithm rather than assessing the trainee’s independent work. Disclosure helps, but it doesn’t resolve the fundamental question of where accountability lies in AI-assisted clinical work and learning.
Try This: Your First AI Conversation (5 Minutes)
At your next team interaction—rounds, a committee meeting, or a teaching session—pose 3 questions: Where is AI already showing up in our work? What uses should always be disclosed? What tasks must remain purely human? Document what you learn and share it at your next CCC or Program Evaluation Committee meeting. This simple exercise normalizes AI conversations, reveals current practices, and may surface an area ready for a pilot intervention.
What’s Next in This Series
Each article in this series will tackle one domain with practical guidance you can use immediately: competent AI use in clinical settings, teaching applications with verification workflows, analytics that support program improvement, and research methods for rigorous scholarship. Throughout, we will hold to the principles outlined above—transparency through disclosure, reasoning over polish, and small testable steps—while protecting what matters most: human judgment, professional integrity, and learning that serves patient care.
We want to hear from you. Consider submitting a letter, perspective, innovation, or original research to add to this conversation.
Box 1. What We Mean by “AI” in This Series
Artificial intelligence (AI): Umbrella term for computational systems performing tasks associated with human cognition. Here we focus on AI assistants for education, documentation, and faculty work.
Large language model (LLM): A generative model trained on large text corpora (some are licensed and proprietary). LLMs draft, summarize, or rephrase text.
Multimodal LLM: Accepts and produces modalities beyond text (eg, images, audio). Relevant GME applications include analysis of screen-recorded simulation debriefs or ultrasound images.
Generative vs predictive: Generative models create text or images; predictive models produce labels or scores (eg, risk stratification or triage) and often underpin analytics.
Prompt/prompting: The task, context, and constraints supplied to an AI model in a request. Specificity improves output and speeds human review.
Hallucination: Fluent but inaccurate AI output; hence the need for human verification before educational, clinical, or research use.
Disclosure: A brief note describing where and how AI assisted the creation of a GME work product. This professional norm enables supervision and fair assessment.
Human-in-the-loop: Educators or clinicians review, correct, or adjudicate AI model outputs for high-stakes clinical or assessment tasks.
Abbreviation: GME, graduate medical education.
Box 2. Starting Small: 3 Pilot Ideas for Your Program
Disclosure on one service. For 4 weeks on a single rotation, ask residents to mark “AI-assisted” on any work product that used generative AI tools, with a brief note on how it was used (eg, “outline only,” “grammar,” or “first-draft structure”). At the end of the pilot, gather feedback from residents and faculty: was disclosure clear and feasible? Did it change supervision practices? Did any patterns emerge about when and how AI was used? If the approach proves workable, refine the language and expand to other settings.
Supervised AI use protocol. On one rotation, create explicit guidelines for when and how residents can use AI for clinical reasoning or documentation. Require disclosure with each use. Have attendings document how their supervision changed and when they knew AI was involved. Evaluate: What additional questions did attendings ask? Did they check reasoning more carefully? What new supervision strategies emerged? Use these findings to develop broader supervision frameworks for AI-assisted clinical work.
Feedback scaffolding. Record brief bullet-point observations, then use AI to structure them into narrative feedback. Review the AI-generated narrative to ensure that it accurately reflects your observations and add concrete examples as needed. Ask residents whether this feedback was more actionable than previous sessions. Ask faculty about time investment versus quality. If the approach saves time without sacrificing specificity, consider scaling to other assessment contexts.
Abbreviation: AI, artificial intelligence.
Box 3. Useful Resources
Association of American Medical Colleges Webinar Series: AI Skill Building for Medical Educators
Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ. 2023;9:e48785. doi:10.2196/48785
Benjamin J, Masters K, Agrawal A, MacNeill H, Mehta N. Twelve tips on applying AI tools in HPE scholarship using Boyer’s model. Med Teach. 2025;47(6):949-954. doi:10.1080/0142159X.2024.2445058
Mollick E. Co-Intelligence: Living and Working With AI. Portfolio/Penguin; 2024. (Accessible primer on how to think with AI in professional contexts.)
Abbreviation: AI, artificial intelligence.
References
1. Stults CD, Deng S, Martinez MC, et al. Evaluation of an ambient artificial intelligence documentation platform for clinicians. JAMA Netw Open. 2025;8(5):e258614. doi:10.1001/jamanetworkopen.2025.8614
2. Henry TA. American Medical Association. 2 in 3 physicians are using health AI—up 78% from 2023. Published February 26, 2025. Accessed October 21, 2025. https://www.ama-assn.org/practice-management/digital-health/2-3-physicians-are-using-health-ai-78-2023
3. Budzyń K, Romańczyk M, Kitala D, et al. Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study. Lancet Gastroenterol Hepatol. 2025;10(10):896–903. doi:10.1016/S2468-1253(25)00133-5
4. International Committee of Medical Journal Editors. Recommendations. Accessed October 21, 2025. https://www.icmje.org/recommendations/
5. Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: potential impact and opportunity. Acad Med. 2024;99(1):22–27. doi:10.1097/ACM.0000000000005439
6. Verghese BG, Iyer C, Borse T, Cooper S, White J, Sheehy R. Modern artificial intelligence and large language models in graduate medical education: a scoping review of attitudes, applications, and practice. BMC Med Educ. 2025;25(1):730. doi:10.1186/s12909-025-07321-5
7. Tolsgaard MG, Pusic MV, Sebok-Syer SS, et al. The fundamentals of artificial intelligence in medical education research: AMEE guide no. 156. Med Teach. 2023;45(6):565–573. doi:10.1080/0142159X.2023.2180340
8. Preiksaitis C, Nash C, Gottlieb M, Chan TM, Alvarez A, Landry A. Brain versus bot: distinguishing letters of recommendation authored by humans compared with artificial intelligence. AEM Educ Train. 2023;7(6). doi:10.1002/aet2.10924
9. Mangold S, Ream M. Artificial intelligence in graduate medical education applications. J Grad Med Educ. 2024;16(2):115–118. doi:10.4300/JGME-D-23-00510.1
10. Walters WH. The effectiveness of software designed to detect AI-generated writing: a comparison of 16 AI text detectors. Open Inf Sci. 2023;7(1). doi:10.1515/opis-2022-0158
