The Case of the AI Scribe and the Missing Struggle
On a busy overnight shift in the emergency department, a second-year resident evaluates a patient with hypotension while an ambient AI scribe runs quietly in the background. By the time the resident returns to the workstation, the note is ready: structured, comprehensive, and better formatted than most attending documentation.
During the presentation, the facts are all correct. The medication list is spotless, the history reads smoothly, and the physical examination is detailed. But when the attending asks, “So what do you think is going on?” the resident hesitates. “The note has everything,” they say. “I’m still…putting it together.”
The attending wonders whether the scribe helped the resident learn or silently took over the key clinical reasoning steps. Is the solution to turn the AI off, ignore it, or implement something more deliberate?
In our inaugural article, we proposed 3 principles for programs experimenting with AI: normalize disclosure of AI use, prioritize observable reasoning over polished products, and start with small, testable steps.1 In this installment of AI Teaching Rounds, we apply those principles to the most immediate challenge facing attending physicians: supervising residents who are using AI tools during clinical work.
Why This Matters Now
AI is likely to be routine practice in high-income countries by the time today’s residents graduate. Large language models, ambient scribes, and decision-support tools are moving from pilot projects to products that health systems expect clinicians to use.2,3 AI literacy and safe use have become an educational obligation, not an optional enrichment activity.
At the same time, health system goals do not always align with educational goals. Many clinical AI tools are optimized for efficiency: shorter notes, fewer clicks, faster throughput.4 These may be reasonable aims for busy attendings, but resident learning depends on productive struggle. Trainees need repeated opportunities to gather messy histories, construct and prioritize differential diagnoses, make supervised mistakes, and repair them with feedback.5-7 If AI’s invisible hand removes too much of that work, we may inadvertently trade short-term efficiency for long-term underdevelopment.
AI also exposes what we really value in training. For decades, activities like detailed note writing were perceived as inherently educational.6,8 When a tool offers to take note writing off residents’ plates, we must ask a hard question: which parts of this task truly support learning and which parts were simply how the work was always done? Supervision in an AI-enabled environment begins with being explicit about which tasks we want to protect, which ones we are comfortable augmenting, and which ones we can safely offload.1,9
What Should Remain Human-Only (for Now)
Before discussing how to supervise permitted AI use, we need to clarify what should be off-limits. Even in environments where AI tools are widely available, a few tasks are worth guarding as “human-only.” Constructing a first-pass differential diagnosis outline is one of them. So is articulating an initial problem representation, assessment, and plan. These exercises are core to the development of clinical reasoning and expertise.6-8 If residents routinely outsource them, they may never get enough practice opportunities to build robust mental models.5,6
Programs can set clear expectations that trainees should do this work themselves first, before consulting any tools.3,10 The specifics will vary by specialty and setting, but the principle holds: identify the cognitive exercises that are central to your discipline and protect those as required human tasks (Table).
Table. Examples of “Human-First” Cognitive Tasks Across Residency Programs
| Residency | Example: “Human-First” Cognitive Task | How AI Might Be Used After Resident Attempts |
|---|---|---|
| Anesthesiology | Create initial anesthetic plan and risk stratification for a high-risk surgical patient | Use AI to review guideline-based risk scores or alternative strategies |
| Emergency medicine | Form rapid initial synthesis (“sick vs not sick”), working diagnosis, and disposition plan based on limited data | Use AI to uncover additional diagnoses or red flags after resident commits to plan |
| Family medicine | Prioritize problems and set a visit agenda for a complex primary care encounter | Use AI to identify preventive-care gaps or evidence-based opportunities |
| General surgery | Decide how to adjust intraoperative plan in response to unexpected findings | Use AI postoperatively to review similar cases or guidelines during the debrief |
| Internal medicine | Construct initial problem representation and prioritized differential for a complex presentation | Use AI to check for missed diagnoses or relevant guidelines |
| Obstetrics and gynecology | Form initial labor plan with competing maternal-fetal risks | Use AI to compare plan with institutional protocols or evidence summaries |
| Orthopedic surgery | Classify fracture and outline initial operative vs nonoperative plan for patient with comorbidities | Use AI to explore alternative classification schemes or management pathways |
| Pediatrics | Integrate developmental history, examination, and family situation into an assessment for behavioral or learning concerns | Use AI to suggest key questions or alternative diagnoses the resident may have missed |
| Psychiatry | Develop diagnostic formulation and risk assessment after a new patient interview and collateral information | Use AI to review structured risk tools and alternative diagnoses |
Supervising AI Use in Clinical Work: The DEFT-AI Framework
Some AI uses will be prohibited. Many will not. Programs can head off many real-time supervision challenges by preparing faculty and residents for safe AI use in advance. Proactive steps include setting clear expectations for which tasks trainees must complete independently, writing policies that describe when and how AI tools may be used, and offering brief evidence-based discussions or case-based reviews that illustrate both productive and problematic uses of AI. Establishing these shared norms gives trainees and supervisors a common starting point and makes moment-to-moment supervision more consistent. Conversations about appropriate AI use can be folded into existing conferences, journal clubs, and simulations as a routine part of case discussion.
For the uses we allow in clinical work, supervisors also need a practical way to respond in real time. Abdulnour and colleagues recently proposed an educational framework called DEFT-AI to guide clinical supervision when AI is involved (Box 1).2 We suggest using this framework when directly supervising residents who use AI in clinical care.
Box 1 DEFT-AI in Under 2 Minutes for Clinical Supervisors
D: Diagnose the AI moment
“Did you use any tools, like an AI assistant or scribe, to help with this case?”
E: Explore the evidence and inputs
“What did you tell the AI?”
“What does the AI know about this patient, and what does it not know?”
F: Feedback on reasoning
“Tell me your plan as if the AI weren’t here. What did you think before you looked?”
“What did the AI add, and what did you decide to accept or discard?”
T: Teach verification
“How could we quickly check this recommendation?”
“If this were wrong, how would it mislead us?”
AI: Advice on future use
“For this task, I would like you to form your own plan first, then use AI as a double-check.”
Note: Adapted from Abdulnour et al.2
The first step is to identify when AI may be shaping a resident’s work. Unusually polished documentation, differentials that suddenly become more encyclopedic, or a casual “I ran it through the AI” can all signal that AI was involved in the work.2,4 Additionally, a routine question such as “Did you use any tools, like an AI assistant or scribe, to help with this case?” normalizes disclosure and indicates to residents that sharing AI use is safe. These questions reinforce that the goal is not to police or punish tool use, but to understand how the resident approached the case and learn what resources they found helpful. These discussions also give supervisors a clearer picture of how often AI is being used in day-to-day clinical work, which can inform future teaching practices and program policies.1,3 Inquiring about specific AI tools also allows the supervisor to ensure trainee compliance with program and institutional requirements, as permitted AI platforms vary widely across institutions and clinical settings.
Once AI use is identified, the supervisor can explore what the resident provided to the model as a prompt. What does the AI know? What is missing? This highlights for the resident a critical limitation: these systems work only with the inputs they receive.2,9 Important elements of clinical understanding, such as subtle examination findings, social context, or patient goals, often never make it into the prompt.
Feedback should focus on the resident’s reasoning, not primarily on the correctness of the generated answer. One approach is to separate 3 strands: the resident’s own thinking, AI suggestions, and the real clinical picture, including examination findings, patient context, and updated data that neither the AI nor the prompt may have captured.2,8 For example, asking the resident what they would have planned without the AI keeps the resident, not the model, as the primary clinical decision maker.
DEFT-AI also encourages routine but essential habits of verification and skepticism. A brief discussion of how to check AI recommendations against a guideline or with new data, and what harms would result if the AI conclusions were wrong, can shift the resident from passive acceptance to active appraisal.2,3,9 These moments should be incorporated naturally into the flow of clinical teaching.
Task Dissection: What Exactly Is AI Offloading, and Do We Want That?
So far, we have focused on what supervisors can do in real time with individual residents. Real-time clinical supervision is important but not sufficient. Programs also need to step back and examine how AI is changing the learning value of specific tasks.1,9,11 For each resident activity, consider: What is the task teaching? What does the AI offload? And how does that trade-off differ by learner stage?5,7 Supervisors and programs can use the structured questions in Box 2 as a guide for determining whether a task should be protected, augmented, or offloaded.
Box 2 Task Dissection Framework: 5 Questions
What is this task teaching?
What exactly would the AI offload?
For this trainee at this stage, is that offloading helpful or harmful?
What new learning opportunity might AI create?
How will supervisors still see the trainee’s thinking?
Three common tasks illustrate how this lens can guide supervisory decisions.
History and Physical Examination With an Ambient Scribe
An AI scribe offloads verbatim recall and transcription. Offloading those tasks is acceptable when the goal is rapid, legible, and concise documentation.6,8 However, the order of questions and answers, nuances of examination findings, patient context, and patient emotions or reactions remain areas in which residents may need to build specialty-specific documentation expertise, particularly for patients with multiple complex problems. Even when a scribe drafts the note, the resident should remain responsible for summarizing the encounter and articulating an assessment before viewing the AI draft; that is the nonnegotiable starting point.
Differential Diagnosis
We should be unwilling to outsource this high-value cognitive work.6-8 If AI supplies the first-pass list, the resident may never learn to build and prioritize a differential of their own. This “never-skilling” risk is noted in emerging discussions of AI’s impact on clinical reasoning.2,4 AI is best consulted after residents have generated their own differential diagnosis.
Note Writing
Documentation helps learners structure their thinking and develop clear written communication, but much note writing is burdensome. Letting AI draft a note may reduce that implicit learning but frees time for supervisor-guided reflection and discussion. The AI-generated note can still function as a teaching tool if residents are required to edit, explain, or justify parts of the note that the model got wrong or communicated poorly. This approach preserves insight into reasoning and communication skills while still reducing the low-value clerical work of documentation.
For example, a resident might notice that the AI-generated history of present illness attributes symptom onset to “several weeks” when the patient clearly stated the symptoms began yesterday. Correcting that discrepancy requires the resident to articulate why onset matters for diagnosis and how it changes the clinical story. Or the resident might revise an overly definitive assessment such as “pneumonia” to something more appropriate for early reasoning, like “suspected community-acquired pneumonia, but consider pulmonary embolism given tachycardia and risk factors,” which reveals their thought process.
Supervisors who want to experiment with small, low-stakes strategies for managing AI use can try the approaches listed in Box 3. Readers interested in exploring more can look at the materials in Box 4.
Box 3 Try This: Small Experiments for Supervising AI Use
Add one question during case presentations:
“Did you use any tools, including AI, to help with this case?”
Protect the differential:
Require a resident-generated differential before tools are consulted.2,7,8
Use the AI-generated note as a teaching artifact:
Ask the resident to identify one place where the draft misrepresented their thinking or findings.4,9
Box 4 Useful Resources
Abdulnour REE, Gin B, Boscardin CK. Educational strategies for clinical supervision of artificial intelligence use. N Engl J Med. 2025;393(8):786-797. doi:10.1056/NEJMra2503232
A concise and practical introduction to the DEFT-AI framework and the risks of over-reliance, including “deskilling” and “never-skilling.” Useful for clinicians learning to supervise AI use in real time.
Gin BC, O’Sullivan PS, Hauer KE, et al. Entrustment and EPAs for artificial intelligence (AI): a framework to safeguard the use of AI in health professions education. Acad Med. 2025;100(3):264-272. doi:10.1097/ACM.0000000000005930
Extends EPA and entrustment principles to AI tools themselves. Helps programs define when certain uses should be prohibited, supervised, or allowed with increasing independence.
Association of American Medical Colleges. Principles for the responsible use of artificial intelligence in and for medical education. https://www.aamc.org/about-us/mission-areas/medical-education/principles-ai-use
Offers clear institutional guidance on transparency, access, equity, and safety. Useful for GME leaders creating local policies around AI use.
Janumpally R, Nanua S, Ngo A, Youens K. Generative artificial intelligence in graduate medical education. Front Med (Lausanne). 2024;11:1525604. doi:10.3389/fmed.2024.1525604
A short, accessible overview of emerging use cases in GME, with attention to cognitive offloading and responsible adoption.
Abbreviations: EPA, entrustable professional activity; GME, graduate medical education.
Conclusions
Teaching while AI tools are running does not require us to abandon what we know about effective supervision. It does require us to be explicit about which parts of clinical work truly enhance learning, to invite residents to show us their thinking even when AI is involved, and to treat each AI-generated suggestion as an opportunity to ask why we believe it is accurate. If we are deliberate, AI can help strip away low-yield busywork and refocus training on the most human parts of medicine: working through uncertainty, caring for patients, and reflecting on our own thinking.
References
1. Preiksaitis C. AI teaching rounds: orienting graduate medical education without the hype. J Grad Med Educ. 2025;17(6):685-688. doi:10.4300/JGME-D-25-01014.1
2. Abdulnour REE, Gin B, Boscardin CK. Educational strategies for clinical supervision of artificial intelligence use. N Engl J Med. 2025;393(8):786-797. doi:10.1056/NEJMra2503232
3. Association of American Medical Colleges. Principles for the responsible use of artificial intelligence in and for medical education. Accessed November 14, 2025. https://www.aamc.org/about-us/mission-areas/medical-education/principles-ai-use
4. Rose C, Preiksaitis C. AI passed the test, but can it make the rounds? AEM Educ Train. 2024;8(6):e11044. doi:10.1002/aet2.11044
5. Bjork EL, Bjork RA. Making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning. In: Psychology and the Real World: Essays Illustrating Fundamental Contributions to Society. Worth Publishers; 2011:56-64.
6. Norman G. Building on experience—the development of clinical reasoning. N Engl J Med. 2006;355(21):2251-2252. doi:10.1056/NEJMe068134
7. Schmidt HG, Rikers RMJP. How expertise develops in medicine: knowledge encapsulation and illness script formation. Med Educ. 2007;41(12):1133-1139. doi:10.1111/j.1365-2923.2007.02915.x
8. Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355(21):2217-2225. doi:10.1056/NEJMra054782
9. Tolsgaard MG, Pusic MV, Sebok-Syer SS, et al. The fundamentals of artificial intelligence in medical education research: AMEE guide no. 156. Med Teach. 2023;45(6):565-573. doi:10.1080/0142159X.2023.2180340
10. Gin BC, O’Sullivan PS, Hauer KE, et al. Entrustment and EPAs for artificial intelligence (AI): a framework to safeguard the use of AI in health professions education. Acad Med. 2025;100(3):264-272. doi:10.1097/ACM.0000000000005930
11. Janumpally R, Nanua S, Ngo A, Youens K. Generative artificial intelligence in graduate medical education. Front Med (Lausanne). 2024;11:1525604. doi:10.3389/fmed.2024.1525604
