Advances in Simulation. 2026 Jan 29;11:14. doi: 10.1186/s41077-026-00407-0

Generative AI in simulation debriefings: an exploratory study using the Team-FIRST framework and qualitative feedback from simulation experts and learners

David W Tscholl 1,#, Max Ebensperger 2,#, Arend Rahrisch 1, Helius Wang 3, Hubert Heckel 1, Max Thomasius 1, Alexander Kaserer 1, Bastian Grande 1, Julia C Seelandt 4, Michaela Kolbe 4,5
PMCID: PMC12924402  PMID: 41612508

Abstract

Background

Effective debriefings in simulation-based education require accurate observation of team interactions, yet facilitators face challenges due to cognitive load, observer bias, and the complexity of team dynamics. Generative artificial intelligence (AI) tools offer a potential means to support this process by analyzing verbal communication and providing structured feedback. This study explored how AI tools can contribute to teamwork observation and debriefing in immersive medical simulations.

Methods

We conducted a qualitative, exploratory study using thematic analysis of simulation participants’ and debriefers’ experiences with AI-generated teamwork reports. Forty-one participants (anesthesia nurses, residents, and attendings) took part in immersive scenarios at the University Hospital Zurich simulation center. Verbal interactions were transcribed with AI-assisted speech recognition and analyzed using two large language model–based systems (Isaac and ChatGPT-4o) guided by a prompt based on the Team-FIRST framework. Structured reports were generated for each scenario and reviewed by four simulation experts. Semi-structured interviews captured learners’ perspectives on being observed by AI tools.

Results

A total of 26 AI-generated reports and 27 learner interviews were analyzed. Experts valued the detailed transcripts and illustrative quotes, which supported structured feedback and captured observations that might otherwise be missed. Limitations included inaccuracies in categorization, misattribution of speakers, overly generalized interpretations, and the absence of contextual or nonverbal information. Learners expressed openness and optimism about AI’s potential benefits (efficiency, objectivity, and enhanced perception) while also raising concerns about transparency, data protection, interpretation errors, and risks of overreliance. Both groups emphasized the necessity of human oversight.

Conclusion

Generative AI tools can complement simulation debriefings by structuring communication data and highlighting teamwork patterns, supporting reflective practice. Current limitations highlight the need for multimodal approaches, refined prompting strategies, and integration with expert facilitation to ensure AI functions as a support tool rather than a replacement in simulation-based education.

Trial registration

BASEC ID: Req-2024-01642.

Supplementary Information

The online version contains supplementary material available at 10.1186/s41077-026-00407-0.

Keywords: Generative artificial intelligence, Healthcare, Simulation, Debriefing, Large language models, Teamwork, Qualitative thematic analysis, Clinical education technology, Automated assessments

Introduction

Effective team interaction is crucial for patient safety and quality of care [1], and much recent research has focused on defining what constitutes good teamwork in healthcare [2–4], particularly regarding psychological safety, speaking up, team learning, and reflective practice [5–8]. Simulation training has become a key method for enhancing these interactions by providing a safe environment in which to practice and refine skills [9–12]. Observing team interactions during simulation training allows facilitators to identify strengths and weaknesses [13, 14]. This information is critical for providing feedback and facilitating reflective discussion [15–20], which are considered the core learning elements of simulation training [18, 19, 21–25]. By observing simulated interactions, it is possible to identify both helpful and unhelpful teamwork patterns [26, 27], and targeted observation allows facilitators to guide debriefings that reinforce learning objectives [28]. However, observing team interactions presents significant challenges. First, team dynamics are neither linear nor straightforward [29–32], and accurately perceiving them can be mentally demanding, especially in immersive simulations [33–35]. Second, observers bring their own biases and preconceptions to the observation process, which affect their interpretations [36]. Third, facilitators experience significant cognitive load [37–40], which can lead to incomplete or inaccurate observations [41]. Technology currently supports this process mainly through integrated sound and video recording and replay systems; however, no theory-based, automated solution exists for assessing teamwork. Integrating artificial intelligence (AI) technology offers a potential way to address these challenges in simulation. Generative AI is already used to support documentation and clinical decision-making [42, 43].
A subcategory of generative AI, large language models (LLMs), can understand context and generate coherent, human-like language outputs [44–47]. Researchers have begun investigating AI tools in simulation-based education, highlighting both opportunities and limitations [48–52].

Objectives

This exploratory study aimed to investigate how generative AI tools can be applied to enhance the observation of team interactions during simulation training. The AI analyzed transcripts of simulated cases and generated automated reports designed to support facilitators during post-scenario debriefing. We were interested in how experienced simulation experts assessed the quality and relevance of AI-generated teamwork reports, as well as how learners experienced being analyzed by AI tools.

Methods

Study design and setting

In this qualitative, exploratory study, we used thematic analysis of (a) structured feedback and semi-structured interviews to evaluate simulation experts’ perceptions of AI-generated simulation reports and (b) participants’ experiences with being observed by AI tools during simulated clinical scenarios. See Fig. 1 for the study’s flowchart. The study was conducted during the annual three-week simulation training program of the University Hospital Zurich’s Institute of Anesthesiology and Perioperative Medicine (January and February 2025) at the hospital’s simulation center, a state-of-the-art facility for simulation training (Fig. 2).

Fig. 1.


Flowchart of Study Design and Overview of the 10 Categories of the Team FIRST Framework (based on Greilich et al., 2023).

Fig. 2.


Photographs from the simulation. (A) Simulation environment at the University Hospital Zürich. (B) View from the control room, with three experts supervising a simulation scenario.

Participants

Participants (total = 41), including nurses (n = 16), resident physicians (n = 17), and attending physicians (n = 8), were employed at the Institute of Anesthesiology and Perioperative Medicine. No specific limit or threshold on participation was set. Learners received an introduction to the study and an explanation of our research objectives, along with an anonymized data usage agreement. No identifiable information (including names, timestamps, or participant IDs) was collected, and all data were fully anonymized. Learners did not have access to the AI-generated reports at any point; all data and reports were handled exclusively by the study authors. See Table 1 for further demographic details.

Table 1.

Demographics and professional background of study participants (n = 41)

Characteristic (total n = 41) | Value
Gender, female | 59%
Age, years | 34 (30–43)
Experience in anesthesia, years | 7 (3–19)
Role, n | Nurse (non-anesthesia): 5; Anesthesia nurse: 11; Anesthesia resident: 17; Anesthesia attending: 8
Experience at USZ, years | 2 (1–7)

Simulation-based training

Simulation-based training took place during work hours, and participants received in-house Continuing Medical Education (CME) credits. Participation was voluntary. Five to seven participants trained together for a full day. Qualified clinical simulation educators led the simulation-based training. They welcomed participants and spent approximately 1–1.5 h establishing a supportive learning environment, clarifying the learning objectives, and providing orientation to the simulation equipment [53]. Learners then alternated between participating in and observing two simulated cases using SimMan3G (Laerdal, Stavanger, Norway) [54]. The scenarios were developed using a structured methodology [55] and focused on sudden airway collapse after sedation (Scenario 1) and difficult airway management (Scenario 2). See Supplementary Table 1 for the complete simulation scripts. Scenarios were audio- and video-recorded, and the recordings were used during the subsequent debriefings. Two facilitators conducted the debriefings immediately after the simulated cases. Debriefings focused on Team FIRST competencies and followed the Debriefing with Good Judgment and TeamGAINS approaches [17, 20].

RTF prompt design

Based on the Team-FIRST framework [3], we developed a Role-Task-Format (RTF) prompt to automatically generate AI debriefing reports. Structured prompt engineering has been shown to improve task specificity, clarity, and reproducibility when interacting with large language models (LLMs): it steers the model’s behavior toward defined tasks and reduces ambiguity in responses, which is critical for the reproducibility of LLM-based analyses [55]. Structured task inputs have also been found to improve output quality for academic uses [56]. The prompt followed a clear input-to-output logic. The input was the transcript of a scenario’s verbal team communication, and the prompt guided the AI tools to process this material by mapping observed behaviors and communication patterns onto the ten categories defined by the Team-FIRST framework. The expected output was a report in which each category was presented as a separate section containing a summary and examples extracted from the transcript. Figure 1 illustrates the ten categories of the Team-FIRST framework. See Supplementary Fig. 1 for the complete RTF prompt.
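To make the input-to-output logic of an RTF prompt more concrete, the Python sketch below shows one way such a prompt could be assembled and how a returned report could be checked for missing category sections. It is not the study’s prompt (which is given in Supplementary Fig. 1). The role and task wording, the `## ` section-heading convention, and the function names are hypothetical; a real prompt would list all ten Team-FIRST categories shown in Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTFPrompt:
    """A Role-Task-Format prompt: who the model acts as, what to do, how to answer."""
    role: str
    task: str
    output_format: str

    def render(self, transcript: str) -> str:
        # The scenario transcript is appended as the input the prompt operates on.
        return (
            f"ROLE:\n{self.role}\n\n"
            f"TASK:\n{self.task}\n\n"
            f"FORMAT:\n{self.output_format}\n\n"
            f"TRANSCRIPT:\n{transcript}"
        )


def build_team_first_prompt(categories: list[str]) -> RTFPrompt:
    """Hypothetical RTF prompt mapping a transcript onto Team-FIRST categories."""
    # One output section per category, each with a summary and transcript examples.
    section_spec = "\n".join(
        f"## {name}\nSummary: <short summary>\nExamples: <verbatim quotes from the transcript>"
        for name in categories
    )
    return RTFPrompt(
        role="You are an experienced simulation debriefer observing an anesthesia team.",
        task=(
            "Map the observed behaviors and communication patterns in the transcript "
            "onto each of these Team-FIRST categories: " + ", ".join(categories) + ". "
            "Only quote statements that appear verbatim in the transcript."
        ),
        output_format="Return one section per category, in this order:\n" + section_spec,
    )


def missing_sections(report: str, categories: list[str]) -> list[str]:
    """List the categories that have no '## <category>' heading in a returned report."""
    headings = {line[3:].strip() for line in report.splitlines() if line.startswith("## ")}
    return [name for name in categories if name not in headings]


if __name__ == "__main__":
    # Hypothetical subset of category labels taken from the experts' comments.
    cats = ["Structured Communication", "Mutual Trust", "Psychologically Safe Environment"]
    print(build_team_first_prompt(cats).render("<scenario transcript>"))
```

Keeping role, task, and format fixed while only the transcript varies is one way to support the reproducibility argument above; checking for missing headings flags reports that skip a category before a debriefer reads them.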

Integration of AI tools for team verbal communication transcription and teamwork analysis

We asked participants to wear RØDE lapel microphones during the simulated cases, with audio signals transmitted directly to an iPad running the Isaac software (Saipient AI, Switzerland). This system generated real-time automatic transcripts of verbal team communication. Immediately after each scenario, we uploaded the transcript into two separate AI-based tools built on the same underlying large language model: (a) Isaac by Saipient AI and (b) ChatGPT-4o by OpenAI. Each tool then compiled a separate annotated report (Supplementary Material). The AI analysis took less than two minutes. Learners did not receive copies of the AI-generated reports. According to the simulation experts and the study’s authors, the transcripts were generally accurate and clear, allowing reliable analysis. Minor errors occurred in speaker identification, particularly for statements delivered by the overseeing facilitators when giving external cues and commands over the voice-over loudspeaker (Audio to Room).

The debriefers’ perspective: evaluation of AI-generated teamwork reports

Four members of our study group (JCS, MK, HH, MT), all experienced simulation experts, evaluated the AI-generated reports. Author DWT assigned each of them two reports per scenario, one from each AI tool, and the debriefers were blinded to which tool had produced each report. The debriefers were asked to comment on the selection, structure, interpretation, and relevance of the content, and to share any emotional reactions, in open-ended written commentary (instruction: “My thoughts on the report (report begins below the line): Please enter your thoughts and feelings here.”).

The learners’ perspective: evaluation of learners’ reactions

To understand how learners experienced being analyzed by AI tools, authors DWT, AR, and HW interviewed them after they completed their scenarios at the end of the training day. The guiding question was:

What are your thoughts and feelings about being observed by AI during the simulation? Please share anything that comes to mind.

Data analysis

We analyzed the learners’ written interview responses and the debriefers’ comments using a thematic analysis approach [57]. DWT and AR reviewed all raw responses and developed initial codes. Using Microsoft Word, they assigned a code to each response and iteratively structured and consolidated the data into overarching thematic clusters. The resulting coding structure was reviewed, discussed, and validated with co-authors JCS and MK. The resulting categories and sample quotes from debriefers’ evaluations of the AI-generated teamwork reports are presented in Table 2. The categories and sample quotes from learners’ reactions to being analyzed by AI tools are shown in Table 3.

Table 2.

Categories and example quotes from debriefers. The table presents key themes, subthemes, and illustrative comments from simulation debriefers on the quality and usefulness of AI-generated reports

Theme | Subtheme | Example quote
Positive Feedback | Broader Observation Capture | “Examples [quotes] are appropriate. (However, I don’t remember them, which doesn’t necessarily mean anything, as there was a lot of noise in the control room and unrest in the scenario.)”
Positive Feedback | Quote Selection Supports the Theme Well | “I share the above assessment; from my perspective, it’s perfect.”
Positive Feedback | Support for Debriefing | “I would benefit from this transcript in a debriefing.”
Categorization Accuracy Issues | Unclear, Inaccurate, or Overlapping Categorization | “The psychologically safe environment was present, but I didn’t see it reflected in these quotes.”
Categorization Accuracy Issues | Overgeneralized, Superficial, or Ambiguous Categorization | “These two points from the ‘mutual trust’ category seem strongly out of context to me.”
Categorization Accuracy Issues | More Representative Quotes Could Have Been Chosen | “For me, structured communication would mean something like an ABCDE or a 10-for-10; the above statements are, in my opinion, not examples of that. I heard and coded an ABCDE during the scenario, and its content was different.”
Categorization Accuracy Issues | Desire for More Context or Stronger Exemplification | “I think the statements are part of structured communication; what was said before or after? Additional information would help me form an opinion.”
Identification of Problems | Misidentification of Speaker or Role | “I believe this information also came from outside the team and perhaps shouldn’t be included here. We would need a way to mark speakers and exclude certain roles from interpretation, for example, instructors or embedded simulated persons (ESPs).”
Identification of Problems | Misinterpretation of Technical Terms or Abbreviations | “There are issues with the handling of technical vocabulary, although the key aspects were still recognizable.”
Identification of Problems | Overconfident or Unwarranted Wording by AI | “Strange conclusion: how did the AI come up with that?”
Identification of Problems | Systematic Biases in AI Interpretation | “It would also be interesting if central aspects that were missing could be suggested, for example, lack of coordination, absence of closed-loop communication, or the use of destructive language.”
Suggestions for Improvement | Link Quotes to Timestamps and Identify Speakers | “What I’m missing here are the names of the speakers. I find the timestamps extremely helpful for understanding the context and potentially showing the corresponding scene in the video.”
Suggestions for Improvement | Expand Beyond Text-Based Analysis | “I could also imagine that the AI might be able to distinguish the participants’ speaking volume over time, for example, when someone speaks loudly or quietly, and what impact that has on the team.”
Suggestions for Improvement | Improve Result Presentation and Clarity | “The quotes could be shown in a list and color-coded by theme, according to whether the AI classified them as relating to challenges, communication, or coordination skills.”

Table 3.

Categories and example quotes from participants. The table shows selected categories and corresponding illustrative statements reflecting participant perspectives on AI-based observation and perceived benefits

Theme | Subtheme | Example quote
Influence of Perceived AI Observation | - | “The fact that AI is listening and creating a transcript doesn’t change how I feel.”
Perceived Benefits | General Optimism About the Potential of AI Technologies | “AI is the future, definitely. You have to play along, or you’ll miss the trend. It’s a race: if you miss the jump, you’re left behind.”
Perceived Benefits | Trust in AI | “Not really a problem with how it’s being used.”
Perceived Benefits | Enhanced Perception Through AI | “An additional aid to humans, because it notices more.”
Perceived Benefits | Support for Generating Ideas and Structuring Feedback | “It can provide interesting input and be helpful when things are being compared.”
Perceived Benefits | Increased Efficiency Through Automation | “It can help us summarize information.”
Perceived Benefits | Objectivity and Impartiality | “Computers take emotions out of the equation; that might make them fairer in terms of evaluation.”
Perceived Benefits | Independence from Human Factors | “A human can’t be completely objective; AI is objective. AI looks only at what is given and puts it down on paper. That can be an advantage and a disadvantage.”
Perceived Risks | Lack of Transparency | “You’re left in the dark about the background; it gives you an uneasy feeling.”
Perceived Risks | Data Protection and Security Concerns | “Conflicted about data protection: Where does the data go? How is it processed?”
Perceived Risks | Interpretation Errors | “There may be a lack of understanding. AI certainly can’t empathize with people as well; humans are better at that.”
Perceived Risks | Loss of Cognitive, Communicative, and Social Abilities | “Fear of neglecting one’s own intuition.”
Perceived Risks | Lack of Trust in AI | “The lack of emotion is also a disadvantage, because certain aspects go unnoticed.”
Perceived Risks | Various Concerns About AI Technology | “Can AI understand all dialects, all languages?”
Suggested Key Features for AI | Support for Human Situation Awareness | “I want feedback to know that the AI understood what it was for.”
Suggested Key Features for AI | Desire for Human Involvement in Evaluation | “The interpretation and debriefing should be led by a human.”

Reflexivity

As authors of this manuscript, we come from medicine, nursing, and psychology and represent a range of career stages, with shared experience in designing, delivering, and studying simulation-based education. Our backgrounds and professional roles shaped both our perspectives and our analytic focus. DWT, BG, and MK had prior experience with the use of AI in developing novel visual patient monitoring systems [58, 59], whereas AK, MT, MK, JCS, and BG had substantial expertise in team interaction analysis [36, 6063] and interprofessional simulation training [15, 20, 26, 64]. Throughout the study, ongoing interdisciplinary discussions led us to refine our assumptions, particularly by integrating psychological and nursing perspectives into the interpretation of observed effects.

Results

A total of 26 AI-generated reports from nine different simulation scenarios, conducted between January 21 and January 28, 2025, were reviewed and commented on by four simulation experts. Additionally, 27 participants were interviewed about their experiences with AI-based observation and analysis during the simulations; see Table 1 for the demographics of the simulation participants.

The simulation experts’ perspective on AI

We identified four themes in simulation experts’ perspectives on AI-generated teamwork reports: (1) positive feedback, (2) categorization accuracy issues, (3) identification of problems, and (4) suggestions for improvement. Table 2 provides selected quotes from the simulation experts’ comments. The coding trees derived from these debriefers’ responses are presented in Supplementary Fig. 2. Supplementary Table 2 provides the full feedback from the simulation experts. While we did not specifically ask whether the AI tools’ output would influence the simulation experts’ personal practice, several commented that the report could serve as a helpful adjunct during debriefings.

Simulation experts’ theme 1: positive feedback

The subtheme “Broader Observation Capture” emphasized that AI-generated transcripts often captured details that simulation experts missed during live observation. “Quote Selection Supports the Theme Well” captured views that the generative AI tools selected quotes that effectively illustrated specific and relevant aspects of teamwork: “A participant stated that the (AI tools’) remarks helped to build trust, spontaneously and without the topic having been discussed (further by the debriefer).” Lastly, “Support for Debriefing” reflected debriefers’ perception that the concept offered valuable assistance for debriefing practice.

Simulation experts’ theme 2: categorization accuracy issues

This theme relates to the difficulties AI tools faced in accurately assigning transcribed quotes to the Team FIRST categories. “Unclear, Inaccurate, or Overlapping Categorization” reflected that quotes did not consistently align with the competencies, while “Overgeneralized, Superficial, or Ambiguous Categorization” captured that classifications lacked depth, specificity, or sufficient contextual justification. Finally, “Desire for More Context or Stronger Exemplification” summarized requests for longer or more complete dialogue excerpts.

Simulation experts’ theme 3: identification of problems

The first subtheme, “Misidentification of Speaker or Role,” captured concerns about inaccuracies in attributing statements to the correct team members. “Misinterpretation of Technical Terms or Abbreviations” reflected debriefers’ observations that technical language was sometimes inaccurately captured or translated. “Overconfident or Unwarranted Wording by AI” captured concerns that some statements lacked critical nuance. One debriefer stated: “The wording is very assertive and explicit.” The final subtheme, “Systematic Biases in AI Interpretation,” captured concerns that the AI tools may have made implicit assumptions in interpreting team dynamics. One debriefer asked, “I wonder what knowledge the statement is based on: that this happened between a doctor and a nurse. Could a bias be at play, assuming certain roles automatically belong to certain professions?”

Simulation experts’ theme 4: suggestions for improvement

The first subtheme, “Link Quotes to Timestamps and Identify Speakers,” reflected debriefers’ wish to see speaker names and their view that timestamps help them understand the context and locate the corresponding scene in the video. The second subtheme, “Expand Beyond Text-Based Analysis,” highlighted concerns that important nonverbal cues, such as tone of voice, body orientation, and overall atmosphere, were not captured. One debriefer remarked, “The calm and controlled tone that contributed to psychological safety is missing. This is something we pay close attention to as instructors.” The third subtheme was “Improve Result Presentation and Clarity.” One suggestion was to embed timestamped quotes directly into existing tools.

The learners’ perspective on AI

Based on the interviews with learners, we have identified four different themes in their perspectives on being observed and analyzed by an AI tool during simulation-based training: (1) influence of perceived AI observation, (2) perceived benefits, (3) perceived risks, and (4) suggested key features for AI. Table 3 presents corresponding examples from the participant interviews. Supplementary Fig. 3 and Supplementary Table 3 provide the full feedback from learners.

Learners’ theme 1: Influence of Perceived AI Observation

This theme encompassed participants’ varying reactions to being aware of being observed by AI tools. For some learners, the presence of AI tools faded into the background and did not influence their behavior. Others described a general sense of inhibition, regardless of who or what was watching:

“You don’t feel entirely free to act. Just the fact of being watched causes tension, whether it’s AI or a human.”

Learners’ theme 2: perceived benefits

The first subtheme, “General Optimism About the Potential of AI Technologies,” captured participants’ hopeful outlook on the role of AI tools in clinical settings. The second subtheme, “Trust in AI,” included participants’ statements expressing high confidence in AI tools and little concern about potential risks. “Enhanced Perception Through AI” included learners’ reflections on AI extending what humans can notice; as one remarked, “As humans, we can only perceive so much. […] AI might help us notice other things.” The fourth subtheme, “Support for Generating Ideas and Structuring Feedback,” included participants’ views of AI as a valuable partner in creative and analytical processes. “Increased Efficiency Through Automation” captured expectations that AI could take over tasks such as summarizing information, and “Objectivity and Impartiality” captured the notion that “Unlike human analysis, it prevents bias based on past mistakes.” The final subtheme, “Independence from Human Factors,” included participants’ appreciation of AI’s ability to operate without being directly influenced by human biases.

Learners’ theme 3: perceived risks

The first subtheme, “Lack of Transparency,” included participants’ concerns such as: “AI can present facts in a way that makes them seem true. With humans, you notice uncertainty or emotion. A computer appears more confident. But is the information correct?” The second subtheme, “Data Protection and Security Concerns,” included participants’ unease about how their data are stored, processed, and shared. “Interpretation Errors” included participants’ concerns about the risk of AI misinterpreting input. “Loss of Cognitive, Communicative, and Social Abilities” and “Lack of Trust in AI” included participants’ concerns about the long-term impact of AI tools on essential human skills. One remarked, “You don’t need communication skills anymore,” referring to AI tools increasingly taking over tasks such as writing or speaking. “Various Concerns About AI Technology” included a broad range of practical and ethical questions raised by participants.

Learners’ theme 4: suggested key features for AI

“Support for Human Situation Awareness” included participants’ reflections on how the output of AI tools should be accessible and applicable. They emphasized that “being easy to perceive will be the key. It must be output that people notice.” “Desire for Human Involvement in Evaluation” included participants’ insistence on expert oversight when using AI tools. One noted that “AI is only as good as the input you give it,” and that its output must be “evaluated with expert knowledge.”

Discussion

In this exploratory study, we examined the current capabilities of two AI tools to generate teamwork reports from transcripts of simulated cases, with the aim of supporting post-scenario debriefing rather than conducting a comparative evaluation. Experienced simulation experts evaluated the quality and relevance of the AI-generated reports, while learners reflected on their experiences of being observed and analyzed by AI tools. Simulation experts viewed the reports as valuable adjuncts for debriefing because they captured overlooked interactional details and provided structured illustrative quotes, while also identifying limitations related to categorization accuracy, contextual understanding, and potential bias. Learners expressed general optimism about the AI tools’ potential benefits, including efficiency and perceived objectivity, alongside concerns about transparency, data protection, and the impact on communication skills.

Simulation experts’ experiences with AI-generated teamwork reports

A central strength of AI-generated reports was what experts described as “broader observation capture.” Both AI tools surfaced relevant interactional details that would likely have escaped the experts’ notice during the scenario. This observation is consistent with the constraints of real-time observation in immersive simulation [38, 64]. Scenario coordination, alongside attentive analysis of team dynamics, does not occur in a linear or clearly demarcated sequence; instead, it unfolds through rapid, overlapping verbal and nonverbal exchanges. Even experienced facilitators, operating under significant cognitive load, may miss subtle but important moments in interactional dynamics [31, 65–67]. In this context, AI-generated transcripts and selected quotes can serve as a post hoc reconstruction, enabling facilitators to revisit sequences with greater granularity. The AI tools were able to capture illustrative quotes that aligned with teamwork dynamics, with one simulation expert noting that “the [AI tools’] remarks helped to build trust, spontaneously and without the topic having been discussed [further by the debriefer].” This aligns with emerging evidence that AI tools can produce clinically relevant summaries with reasonable completeness when guided by structured prompts, although contextual fidelity remains a challenge [68, 69]. The same challenge surfaced in experts’ concerns about unclear, inaccurate, or overlapping assignment of quotes to categories and insufficient contextual justification. Categorizing teamwork behaviors is not a purely lexical task; it requires sensitivity to intent, timing, role expectations, and sequential dependencies. These concerns echo critiques of AI tools in healthcare, where hallucinations and context-insensitive outputs pose risks [70]. In many cases, simulation experts did not reject the AI tools’ categorizations but emphasized the need for richer contextual embedding, such as longer dialogue excerpts.
A major concern raised by simulation experts was the misidentification of speakers and what they perceived as overconfident or unwarranted wording. Experts also questioned whether the AI implicitly inferred professional roles from speech patterns or content, for example, by assuming that a directive statement originated from a physician rather than a nurse. One debriefer explicitly asked whether “a bias [could] be at play, assuming certain roles automatically belong to certain professions?” This observation aligns with the broader literature on bias in AI tools, which reflect the sociocultural patterns embedded in their training data and may perpetuate hierarchical assumptions unless explicitly addressed [71]. Human observers also bring expectations, schemas, and role stereotypes to their evaluations. However, whereas human biases are often reflected upon and distributed across multiple facilitators, AI biases may be systematic and replicated at scale [72, 73]. Simulation experts suggested practical mitigation strategies, including manual speaker identification as input to the AI tool, linking quotes directly to timestamps, and pairing AI outputs with written context prior to analysis. Human validation of AI-generated reports could create a bias-checking loop, ensuring that no single interpretive source, human or AI, dominates the debriefing narrative. Without workflow optimization, AI tools risk shifting rather than reducing cognitive burden, forcing facilitators to expend additional effort reconstructing who said what, when, and in response to which event. This concern aligns with prior work indicating that introducing AI into debriefing can increase facilitators’ mental demand if systems are not streamlined and well integrated [50]. From a systems perspective, these observations reinforce the importance of human-in-the-loop (HITL) design.
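Several of the mitigation strategies the experts proposed (manual speaker identification, excluding instructors and embedded simulated persons from interpretation, and linking quotes to timestamps) amount to preprocessing the transcript before it reaches the AI tool. The Python sketch below illustrates one way this could look. It is not part of the study’s workflow: the `Utterance` fields, the role labels, and the MM:SS timestamp format are hypothetical choices, and the sketch assumes speakers and roles have already been assigned manually.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start_s: float  # seconds since scenario start
    speaker: str    # manually assigned label, e.g. "Nurse 1" (hypothetical)
    role: str       # hypothetical role tags: "team", "instructor", "ESP"
    text: str


# Roles whose statements should not be interpreted as team behavior (assumed labels).
EXCLUDED_ROLES = frozenset({"instructor", "ESP"})


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS so a quote can be located in the video recording."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def prepare_transcript(
    utterances: Iterable[Utterance], excluded_roles: frozenset[str] = EXCLUDED_ROLES
) -> str:
    """Drop excluded roles, sort by time, and prefix each line with timestamp and speaker."""
    kept = sorted(
        (u for u in utterances if u.role not in excluded_roles),
        key=lambda u: u.start_s,
    )
    return "\n".join(f"[{format_timestamp(u.start_s)}] {u.speaker}: {u.text}" for u in kept)
```

Dropping excluded roles is the simplest option; an alternative would be to keep facilitator cues but mark them as external context, so the model can see what triggered a team response without attributing the cue to the team.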
AI tools are increasingly capable of autonomously generating structured reports, but the literature consistently emphasizes that they do not replace the need for human judgment, contextual understanding, and accountability [74]. Effective HITL systems actively integrate human expertise into the AI decision-making cycle rather than relegating it to post hoc review [75, 76]. Evidence from clinical contexts suggests that expert oversight of AI-generated text enhances quality and reduces risk compared with unattended automation, albeit with trade-offs related to efficiency and cognitive load [77]. In simulation debriefing, where interpretive nuance and pedagogical intent are central, such trade-offs may be both inevitable and acceptable. A further limitation identified by simulation experts was the absence of nonverbal information in AI-generated reports. While text-based transcripts capture what was said, they fail to represent how it was said and how it was embodied in its situational, simulated context. Experts noted that critical teamwork constructs are often conveyed through tone of voice, prosody, body orientation, gaze, and overall atmosphere. One simulation expert remarked that “the calm and controlled tone that contributed to psychological safety is missing.” This limitation is well known in teamwork research, where nonverbal communication is recognized as a crucial component of coordination and trust [7, 64]. From a technical standpoint, integrating audio and video into live AI analysis is feasible and offers richer data for interpretation; combining audio, text, and video can enhance analytical depth [78]. However, practical constraints (including the need for high-resolution recordings, data upload, and governance challenges) and the ethical implications of streaming sensitive audiovisual data currently make it difficult for simulation centers to upload video to AI tools live.
Text-based systems may be enhanced through improved contextualization, speaker identification, and integration with scenario timelines [79]. Multimodal systems incorporating selected audio or video segments may offer more faithful representations of team dynamics, provided that governance frameworks and human oversight remain central to their operation [80].
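To illustrate how manual speaker identification and timestamp linkage could support the human validation step described above, the following Python sketch checks whether a quote cited in an AI-generated report appears in a human-labeled, timestamped transcript and whether the report's speaker attribution matches the human label. This is not part of the study's workflow: the `Segment` schema, the `check_quote` function, and the toy transcript are hypothetical, and matching by exact substring after case and whitespace normalization is an assumption. Paraphrased quotes, or quotes spanning two segments, would not match and are left for human review rather than resolved automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcript segment whose speaker label was assigned by a human (hypothetical schema)."""
    start_s: float  # seconds from scenario start
    end_s: float
    speaker: str    # role label, e.g. "Nurse 1" or "Resident"
    text: str


def _normalize(s: str) -> str:
    # Case-fold and collapse whitespace so trivial formatting differences do not block a match.
    return " ".join(s.casefold().split())


def check_quote(quote: str, attributed_speaker: str, segments: list[Segment]) -> dict:
    """Locate an AI-cited quote in the transcript and compare its speaker attribution.

    Returns the first segment that contains the normalized quote. If no segment
    contains it, the quote may be paraphrased or invented and is flagged for a human.
    """
    needle = _normalize(quote)
    for seg in segments:
        if needle in _normalize(seg.text):
            return {
                "quote": quote,
                "found": True,
                "timestamp_s": seg.start_s,
                "transcript_speaker": seg.speaker,
                "speaker_matches": seg.speaker == attributed_speaker,
            }
    # Not found: attribution cannot be checked, so speaker_matches is left undecided.
    return {
        "quote": quote,
        "found": False,
        "timestamp_s": None,
        "transcript_speaker": None,
        "speaker_matches": None,
    }


if __name__ == "__main__":
    transcript = [
        Segment(12.0, 15.5, "Resident", "Can you give me the blood pressure?"),
        Segment(15.5, 18.0, "Nurse 1", "Pressure is 80 over 40, I am starting a second line."),
    ]
    # Suppose the AI report attributes this statement to the resident.
    print(check_quote("I am starting a second line", "Resident", transcript))
```

On the toy transcript, the quote is located at 15.5 s, but the human label is "Nurse 1" rather than the attributed "Resident", so the mismatch is surfaced to the facilitator instead of being silently corrected. Keeping the final attribution decision with a human reviewer reflects the HITL emphasis discussed above.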

Learners’ experiences of being observed and analyzed by AI

Learners’ perspectives on AI-based observation revealed optimism tempered by caution. Some learners reported that the presence of AI tools faded into the background and did not influence their behavior, while others described a sense of inhibition associated with being observed, regardless of whether the observer was human or artificial. This suggests that the mere awareness of being observed, rather than the AI tools per se, may shape learner behavior, a phenomenon well described in educational and organizational psychology [81]. Many learners expressed enthusiasm about the potential benefits of AI. They envisioned AI as a tool that could enhance perception, structure feedback, and increase efficiency by automating aspects of analysis. Several emphasized objectivity and impartiality, contrasting AI with human observers, who may carry biases or be influenced by prior experiences. This perceived independence from human factors aligns with arguments in favor of technical performance assessment in complex team settings, where subjective interpretation can compromise reliability [64]. At the same time, a “lack of transparency” emerged as a central concern. Participants questioned how AI-generated outputs were produced and whether they could be trusted, particularly given the confident tone AI systems often adopt. One learner noted that “AI can present facts in a way that makes them seem true,” highlighting the risk that errors may go unnoticed precisely because AI outputs lack the emotional cues and expressions of uncertainty that characterize human feedback. These concerns mirror those raised by simulation experts and align with broader debates about explainability and trust in AI systems used in healthcare [82]. Learners also expressed unease about how their data were stored, processed, and potentially shared, and their concerns about misinterpretation extended to linguistic issues, including local dialects and abbreviations.
Together, these observations highlight the importance of transparent communication regarding data handling, technical limitations, and error-mitigation strategies. Learners also reflected on potential long-term risks of increasing reliance on AI tools. Some worried about the erosion of cognitive, social, and communicative skills if AI tools were to assume roles traditionally fulfilled by human interaction [83]. Learners strongly emphasized the importance of human involvement in interpretation and decision-making, which reflects foundational principles of human-centered AI design [84]: systems should enhance, not replace, expert judgment; AI-generated output should be transparent, interpretable, and actionable; and systems should respect the social and ethical dimensions of educational practice.

Limitations and future research

This study was conducted at a single simulation center with a limited number of scenarios and simulation experts. The results provide qualitative depth but are not generalizable. Moreover, the analysis relied exclusively on one RTF prompt and on verbatim transcripts that included local dialects, potentially introducing transcription-related errors and limiting the AI’s interpretive accuracy. The AI tools were not trained or fine-tuned on domain-specific simulation data. Supervised HITL approaches may improve output reliability; future research could explore a live-feed multimedia approach that supplies video, audio, and facilitator-annotated context directly to AI tools. The authors’ prior experience inevitably influenced study design and interpretation. All authors have professional expertise in simulation-based education and qualitative methods, including thematic analysis. We recognize that our perspectives may have shaped data interpretation, coding decisions, and the framing of themes. To mitigate potential bias, multiple researchers independently reviewed transcripts and AI-generated outputs, and discrepancies were resolved through discussion. Participants did not have access to the AI-generated debriefings; they did not perceive AI as threatening but did express concerns regarding privacy, data storage, and AI-derived analyses. Consistent with a recent interview study, participants appear receptive to AI-generated feedback when it provides concrete, relevant suggestions, is embedded within a guided feedback process, and is not used as a standalone evaluative tool [85]. AI-generated reports may serve as a standardized baseline for identifying recurring themes, communication dynamics, or participants’ technical skills, and could help open the facilitator-led discussion.
Future research should focus on integrating AI tools with live, multimodal inputs, including video, audio, and HITL context, to support debriefing in complex simulation environments.

Conclusion

AI tools such as ChatGPT and Isaac can generate structured reports in simulation-based education by organizing team communication and highlighting relevant interactions. These reports may help facilitators and learners rapidly identify key events, recurrent themes, and potential learning objectives, thereby improving efficiency and focus at the outset of the debriefing. Current limitations in contextual understanding and nuanced interpretation underscore the need for these systems to complement, rather than replace, human-led debriefing, which depends on interactive discussion.

Supplementary Information

Supplementary Material 1 (512.4KB, docx)
Supplementary Material 2 (77.6KB, pdf)

Acknowledgements

The authors are very grateful to all study participants. They thank Andrina Nef, Barbara Fratangeli, Violetta Beier and Claudia Lang-Schnyder for their technical and operational support.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used OpenAI (2024) ChatGPT 3.5/4.0 and DeepL (DeepL SE, Cologne, Germany) to improve the linguistic quality of the manuscript. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.

DWT is an anesthesia attending physician and the head of the Visualization Technology Research Group at the Institute of Anesthesiology and Perioperative Medicine of the University Hospital Zurich. ME is a medical student with experience as an anesthesia simulation assistant. AR is an anesthesia resident physician. HW is a medical student. HH is an anesthesia nurse and simulation educator. MT is an anesthesia attending physician and simulation educator. AK is an anesthesia senior attending physician. BG is an anesthesia senior attending physician, simulation educator, and medical director of the simulation center. JCS is a psychologist and head of training and faculty development of the simulation center, and MK is a psychologist and director of the simulation center.

Authors’ contributions

DWT: Supervision, Conceptualization, Data Collection, Project Administration, Investigation, Writing – Original Draft; ME: Data Curation, Formal Analysis, Methodology, Visualization, Writing – Original Draft; AR: Data Collection, Data Curation; HW: Data Collection; HH: Data Collection; MT: Data Collection; AK: Data Collection; BG: Data Collection; JCS: Conceptualization, Data Collection, Writing – Original Draft; MK: Supervision, Conceptualization, Data Collection, Project Administration, Investigation, Writing – Original Draft; All authors: Writing – Review & Editing.

Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich. This research received no specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability

Most data generated or analysed during this study are included in this published article (see Supplementary Tables 1 and 2).

Declarations

Ethical approval and consent to participate

The study was reviewed by the Cantonal Ethics Committee of Zurich and classified as not falling under the scope of the Human Research Act (BASEC ID: Req-2024-01642). All participants provided written informed consent before participation.

Consent for publication

None.

Competing interests

DWT is an inventor of Visual Patient, Visual Patient Predictive, Visual Blood, Visual Clot, Visual Hemofilter, and Visual Patient Heart with intellectual property held by Philips and the University of Zurich. DWT receives research funding, honoraria, royalties, and travel support through joint development and licensing agreements. Instrumentation Laboratory–Werfen, the Swiss Foundation for Anaesthesia Research, and the International Symposium on Intensive Care and Emergency Medicine have provided additional honoraria and travel support. DWT serves on the Philips Patient Safety Advisory Board. ME is a registered inventor of Visual Patient Heart. AK received lecture honoraria from Bayer AG (Switzerland), CSL Behring GmbH (Switzerland) and advisory honoraria from AstraZeneca AG (Switzerland) and Pharmacosmos (Switzerland). All other authors declare no conflict of interest.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

David W. Tscholl and Max Ebensperger contributed equally to this work.

References

1. World Health Organization. Global Patient Safety Action Plan 2021–2030: Towards Eliminating Avoidable Harm in Health Care. 1st ed. Geneva: World Health Organization; 2021.
2. Tannenbaum SI, Greilich PE. The debrief imperative: building teaming competencies and team effectiveness. BMJ Qual Saf. 2023;32:125–8. 10.1136/bmjqs-2022-015259.
3. Greilich PE, Kilcullen M, Paquette S, Lazzara EH, Scielzo S, Hernandez J, et al. Team FIRST framework: identifying core teamwork competencies critical to interprofessional healthcare curricula. J Clin Transl Sci. 2023;7:e106. 10.1017/cts.2023.27.
4. Tannenbaum SI, Traylor AM, Thomas EJ, Salas E. Managing teamwork in the face of pandemic: evidence-based tips. BMJ Qual Saf. 2021;30:59–63. 10.1136/bmjqs-2020-011447.
5. Tschan F, Keller S, Semmer NK, Timm-Holzer E, Zimmermann J, Huber SA, et al. Effects of structured intraoperative briefings on patient outcomes: multicentre before-and-after study. Br J Surg. 2021;109:136–44. 10.1093/bjs/znab384.
6. Kolbe M, Schmutz S, Seelandt JC, Eppich WJ, Schmutz JB. Team debriefings in healthcare: aligning intention and impact. BMJ. 2021;374:n2042. 10.1136/bmj.n2042.
7. Rudolph JW, Pian-Smith MCM, Minehart RD. Setting the stage for speaking up: psychological safety and directing care in acute care collaboration. Br J Anaesth. 2022;128:3–7. 10.1016/j.bja.2021.09.014.
8. Schmutz JB, Lei Z, Eppich WJ, Manser T. Reflection in the heat of the moment: the role of in-action team reflexivity in health care emergency teams. J Organ Behav. 2018;39:749–65. 10.1002/job.2299.
9. Stefanidis D, Cook D, Kalantar-Motamedi S-M, Muret-Wagstaff S, Calhoun AW, Lauridsen KG, et al. Society for simulation in healthcare guidelines for simulation training. Simul Healthc. 2024;19:S4–22. 10.1097/SIH.0000000000000776.
10. Salas E, Rosen MA. Building high reliability teams: progress and some reflections on teamwork training. BMJ Qual Saf. 2013;22:369–73. 10.1136/bmjqs-2013-002015.
11. Salas E, Paige JT, Rosen MA. Creating new realities in healthcare: the status of simulation-based training as a patient safety improvement strategy. BMJ Qual Saf. 2013;22:449–52. 10.1136/bmjqs-2013-002112.
12. Diaz-Navarro C, Armstrong R, Charnetski M, Freeman KJ, Koh S, Reedy G, et al. Global consensus statement on simulation-based practice in healthcare. Adv Simul. 2024;9:19. 10.1186/s41077-024-00288-1.
13. Cheng A, Eppich W, Epps C, Kolbe M, Meguerdichian M, Grant V. Embracing informed learner self-assessment during debriefing: the art of plus-delta. Adv Simul. 2021;6:22. 10.1186/s41077-021-00173-1.
14. Kolbe M, Grande B, Spahn DR. Briefing and debriefing during simulation-based training and beyond: content, structure, attitude and setting. Best Pract Res Clin Anaesthesiol. 2015;29:87–96. 10.1016/j.bpa.2015.01.002.
15. Kolbe M, Grande B, Lehmann-Willenbrock N, Seelandt JC. Helping healthcare teams to debrief effectively: associations of debriefers’ actions and participants’ reflections during team debriefings. BMJ Qual Saf. 2023;32:160–72. 10.1136/bmjqs-2021-014393.
16. Duff JP, Morse KJ, Seelandt J, Gross IT, Lydston M, Sargeant J, et al. Debriefing methods for simulation in healthcare: a systematic review. Simul Healthc. 2024;19:S112–21. 10.1097/SIH.0000000000000765.
17. Rudolph JW, Simon R, Rivard P, Dufresne RL, Raemer DB. Debriefing with good judgment: combining rigorous feedback with genuine inquiry. Anesthesiol Clin. 2007;25:361–76. 10.1016/j.anclin.2007.03.007.
18. Rudolph JW, Simon R, Raemer DB, Eppich WJ. Debriefing as formative assessment: closing performance gaps in medical education. Acad Emerg Med. 2008;15:1010–6. 10.1111/j.1553-2712.2008.00248.x.
19. Fey MK, Roussin CJ, Rudolph JW, Morse KJ, Palaganas JC, Szyld D. Teaching, coaching, or debriefing with good judgment: a roadmap for implementing “With Good Judgment” across the SimZones. Adv Simul. 2022;7:39. 10.1186/s41077-022-00235-y.
20. Kolbe M, Weiss M, Grote G, Knauth A, Dambach M, Spahn DR, Grande B. TeamGAINS: a tool for structured debriefings for simulation-based team trainings. BMJ Qual Saf. 2013;22:541–53. 10.1136/bmjqs-2012-000917.
21. Meguerdichian MJ, Trottier DG, Campbell-Taylor K, Bentley S, Bryant K, Kolbe M, et al. When common cognitive biases impact debriefing conversations. Adv Simul. 2024;9:48. 10.1186/s41077-024-00324-0.
22. Tavares W, Eppich W, Cheng A, Miller S, Teunissen PW, Watling CJ, Sargeant J. Learning conversations: an analysis of the theoretical roots and their manifestations of feedback and debriefing in medical education. Acad Med. 2020;95:1020–5. 10.1097/ACM.0000000000002932.
23. Rudolph JW, Foldy EG, Robinson T, Kendall S, Taylor SS, Simon R. Helping without harming: the instructor’s feedback dilemma in debriefing–a case study. Simul Healthc. 2013;8:304–16. 10.1097/SIH.0b013e318294854e.
24. Cheng A, Palaganas J, Eppich W, Rudolph J, Robinson T, Grant V. Co-debriefing for simulation-based education: a primer for facilitators. Simul Healthc. 2015;10:69–75. 10.1097/SIH.0000000000000077.
25. Roze des Ordons AL, Eppich W, Lockyer J, Wilkie RD, Grant V, Cheng A. Guiding, intermediating, facilitating, and teaching (GIFT): a conceptual framework for simulation educator roles in healthcare debriefing. Simul Healthc. 2022;17:283–92. 10.1097/SIH.0000000000000619.
26. Kolbe M, Marty A, Seelandt J, Grande B. How to debrief teamwork interactions: using circular questions to explore and change team interaction patterns. Adv Simul. 2016;1:29. 10.1186/s41077-016-0029-7.
27. Salas E, Klein C, King H, Salisbury M, Augenstein JS, Birnbach DJ, et al. Debriefing medical teams: 12 evidence-based best practices and tips. Jt Comm J Qual Patient Saf. 2008;34:518–27. 10.1016/s1553-7250(08)34066-5.
28. Kolb DA. Experiential learning: experience as the source of learning and development. Englewood Cliffs, NJ: Prentice-Hall; 1984.
29. Kolbe M, Grote G, Waller MJ, Wacker J, Grande B, Burtscher MJ, Spahn DR. Monitoring and talking to the room: autochthonous coordination patterns in team interaction and performance. J Appl Psychol. 2014;99:1254–67. 10.1037/a0037877.
30. Lei Z, Waller MJ, Hagen J, Kaplan S. Team adaptiveness in dynamic contexts: contextualizing the roles of interaction patterns and in-process planning. Group Organ Manag. 2016;41:491–525. 10.1177/1059601115615246.
31. Zijlstra FRH, Waller MJ, Phillips SI. Setting the tone: early interaction patterns in swift-starting teams as a predictor of effectiveness. Eur J Work Organ Psychol. 2012;21:749–77. 10.1080/1359432X.2012.690399.
32. Grote G, Kolbe M, Waller MJ. The dual nature of adaptive coordination in teams. Organ Psychol Rev. 2018;8:125–48. 10.1177/2041386618790112.
33. Brauner E, Boos M, Kolbe M, editors. The Cambridge handbook of group interaction analysis. Cambridge: Cambridge University Press; 2018.
34. Kolbe M, Boos M. Laborious but elaborate: the benefits of really studying team dynamics. Front Psychol. 2019;10:1478. 10.3389/fpsyg.2019.01478.
35. Weingart LR. How did they do that? The ways and means of studying group process. Res Organ Behav. 1997;19:189–239.
36. Seelandt JC, Tschan F, Keller S, Beldi G, Jenni N, Kurmann A, et al. Assessing distractors and teamwork during surgery: developing an event-based method for direct observation. BMJ Qual Saf. 2014;23:918–29. 10.1136/bmjqs-2014-002860.
37. Meguerdichian M, Walker K, Bajaj K. Working memory is limited: improving knowledge transfer by optimising simulation through cognitive load theory. BMJ Simul Technol Enhanc Learn. 2016;2:131–8. 10.1136/bmjstel-2015-000098.
38. Fraser KL, Meguerdichian MJ, Haws JT, Grant VJ, Bajaj K, Cheng A. Cognitive load theory for debriefing simulations: implications for faculty development. Adv Simul. 2018;3:28. 10.1186/s41077-018-0086-1.
39. Fraser K, McLaughlin K. Temporal pattern of emotions and cognitive load during simulation training and debriefing. Med Teach. 2019;41:184–9. 10.1080/0142159X.2018.1459531.
40. Gardner AK, Rodgers DL, Steinert Y, Davis R, Condron C, Peterson DT, et al. Mapping the terrain of faculty development for simulation: a scoping review. Simul Healthc. 2024;19:S75–89. 10.1097/SIH.0000000000000758.
41. Mommers L, Verstegen D, Dolmans D, van Mook WNKA. Observation of behavioural skills by medical simulation facilitators: a cross-sectional analysis of self-reported importance, difficulties, observation strategies and expertise development. Adv Simul. 2023;8:28. 10.1186/s41077-023-00268-x.
42. Takita H, Kabata D, Walston SL, Tatekawa H, Saito K, Tsujimoto Y, et al. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. NPJ Digit Med. 2025;8:175. 10.1038/s41746-025-01543-z.
43. Adams R, Henry KE, Sridharan A, Soleimani H, Zhan A, Rawat N, et al. Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis. Nat Med. 2022;28:1455–60. 10.1038/s41591-022-01894-0.
44. Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton E, et al. Applications and concerns of ChatGPT and other conversational large language models in health care: systematic review. J Med Internet Res. 2024;26:e22769. 10.2196/22769.
45. Bednarczyk L, Reichenpfader D, Gaudet-Blavignac C, Ette AK, Zaghir J, Zheng Y, et al. Scientific evidence for clinical text summarization using large language models: scoping review. J Med Internet Res. 2025;27:e68998. 10.2196/68998.
46. Shashikumar SP, Mohammadi S, Krishnamoorthy R, Patel A, Wardi G, Ahn JC, et al. Development and prospective implementation of a large language model based system for early sepsis prediction. NPJ Digit Med. 2025;8:290. 10.1038/s41746-025-01689-w.
47. Zhang K, Meng X, Yan X, Ji J, Liu J, Xu H, et al. Revolutionizing health care: the transformative impact of large language models in medicine. J Med Internet Res. 2025;27:e59069. 10.2196/59069.
48. Barra FL, Rodella G, Costa A, Scalogna A, Carenzo L, Monzani A, et al. From prompt to platform: an agentic AI workflow for healthcare simulation scenario design. Adv Simul. 2025;10:29. 10.1186/s41077-025-00357-z.
49. Berger-Estilita J, Gisselbaek M, Devos A, Chan A, Ingrassia PL, Meco BC, et al. AI and inclusion in simulation education and leadership: a global cross-sectional evaluation of diversity. Adv Simul. 2025;10:26. 10.1186/s41077-025-00355-1.
50. Hong E, Kazmir S, Dylik B, Auerbach M, Rosati M, Athanasopoulou S, et al. Exploring the use of a large language model in simulation debriefing: an observational simulation-based pilot study. Simul Healthc. 2025. 10.1097/SIH.0000000000000861.
51. Rodgers DL, Needler M, Robinson A, Barnes R, Brosche T, Hernandez J, et al. Artificial intelligence and the simulationists. Simul Healthc. 2023;18:395–9. 10.1097/SIH.0000000000000747.
52. Maaz S, Palaganas JC, Palaganas G, Bajwa M. A guide to prompt design: foundations and applications for healthcare simulationists. Front Med (Lausanne). 2024;11:1504532. 10.3389/fmed.2024.1504532.
53. Rudolph JW, Raemer DB, Simon R. Establishing a safe container for learning in simulation: the role of the presimulation briefing. Simul Healthc. 2014;9:339–49. 10.1097/SIH.0000000000000047.
54. Laerdal Inc. SimMan 3G PLUS. https://laerdal.com/ch/products/simulation-training/emergency-care-trauma/simman-3g/. Accessed 23 Aug 2025.
55. Chen B, Zhang Z, Langrené N, Zhu S. Unleashing the potential of prompt engineering for large language models. Patterns. 2025;6:101260. 10.1016/j.patter.2025.101260.
56. Giray L. Prompt engineering with ChatGPT: a guide for academic writers. Ann Biomed Eng. 2023;51:2629–33. 10.1007/s10439-023-03272-4.
57. Braun V, Clarke V, Hayfield N, Terry G. Thematic analysis. In: Liamputtong P, editor. Handbook of research methods in health social sciences. Singapore: Springer Singapore; 2019. pp. 843–60. 10.1007/978-981-10-5251-4_103.
58. Roche TR, Said S, Braun J, Maas EJC, Machado C, Grande B, et al. Avatar-based patient monitoring in critical anaesthesia events: a randomised high-fidelity simulation study. Br J Anaesth. 2021;126:1046–54. 10.1016/j.bja.2021.01.015.
59. Gasciauskaite G, Castellucci C, Malorgio A, Budowski AD, Schweiger G, Kolbe M, et al. User perceptions of visual clot in a high-fidelity simulation study: mixed qualitative-quantitative study. JMIR Hum Factors. 2024;11:e47991. 10.2196/47991.
60. Seelandt JC, Grande B, Kriech S, Kolbe M. DE-CODE: a coding scheme for assessing debriefing interactions. BMJ Simul Technol Enhanc Learn. 2018;4:51–8. 10.1136/bmjstel-2017-000233.
61. Lemke R, Burtscher MJ, Seelandt JC, Grande B, Kolbe M. Associations of form and function of speaking up in anaesthesia: a prospective observational study. Br J Anaesth. 2021;127:971–80. 10.1016/j.bja.2021.08.014.
62. Kolbe M, Goldhahn J, Useini M, Grande B. “Asking for help is a strength”: how to promote undergraduate medical students’ teamwork through simulation training and interprofessional faculty. Front Psychol. 2023;14:1214091. 10.3389/fpsyg.2023.1214091.
63. Seelandt JC, Boos M, Kolbe M, Kämmer JE. How to enrich team research in healthcare by considering five theoretical perspectives. Front Psychol. 2023;14:1232331. 10.3389/fpsyg.2023.1232331.
64. Kolbe M, Eppich W, Rudolph J, Meguerdichian M, Catena H, Cripps A, et al. Managing psychological safety in debriefings: a dynamic balancing act. BMJ Simul Technol Enhanc Learn. 2020;6:164–71. 10.1136/bmjstel-2019-000470.
65. Kolbe M, Burtscher MJ, Manser T. Co-ACT–a framework for observing coordination behaviour in acute care teams. BMJ Qual Saf. 2013;22:596–605. 10.1136/bmjqs-2012-001319.
66. Ishak AW, Ballard DI. Time to re-group. Small Group Res. 2012;43:3–29. 10.1177/1046496411425250.
67. Sundstrom E, De Meuse KP, Futrell D. Work teams: applications and effectiveness. Am Psychol. 1990;45:120–33. 10.1037/0003-066X.45.2.120.
68. Haniff Q, Meng Z, Pongkemmanun T, Sia ZC, Newport H, Ooi Y, et al. Use of artificial intelligence to transcribe and summarise general practice consultations. J Med Artif Intell. 2025;8:43. 10.21037/jmai-24-257.
69. Ng JJW, Wang E, Zhou X, Zhou KX, Le Goh CX, Sim GZN, et al. Evaluating the performance of artificial intelligence-based speech recognition for clinical documentation: a systematic review. BMC Med Inform Decis Mak. 2025;25:236. 10.1186/s12911-025-03061-0.
70. Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to create structured medical notes from audio recordings of physician-patient encounters: comparative study. J Med Internet Res. 2024;26:e54419. 10.2196/54419.
71. Schmidgall S, Harris C, Essien I, Olshvang D, Rahman T, Kim JW, et al. Evaluation and mitigation of cognitive biases in medical language models. NPJ Digit Med. 2024;7:295. 10.1038/s41746-024-01283-6.
72. Bender EM, Gebru T, McMillan-Major A, Shmitchell S. On the dangers of stochastic parrots. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21); 3–10 March 2021; virtual event, Canada. New York, NY, USA: ACM; 2021. pp. 610–23. 10.1145/3442188.3445922.
73. Zack T, Lehman E, Suzgun M, Rodriguez JA, Celi LA, Gichoya J, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit Health. 2024;6:e12–22. 10.1016/S2589-7500(23)00225-X.
74. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56. 10.1038/s41591-018-0300-7.
75. Bakken S. AI in health: keeping the human in the loop. J Am Med Inform Assoc. 2023;30:1225–6. 10.1093/jamia/ocad091.
76. Maadi M, Akbarzadeh Khorshidi H, Aickelin U. A review on human-AI interaction in machine learning and insights for medical applications. Int J Environ Res Public Health. 2021. 10.3390/ijerph18042121.
77. Brewster RC, Tse G, Fan AL, Elborki M, Newell M, Gonzalez P, et al. Evaluating human-in-the-loop strategies for artificial intelligence-enabled translation of patient discharge instructions: a multidisciplinary analysis. NPJ Digit Med. 2025;8:629. 10.1038/s41746-025-02055-6.
78. Chen J, Lv Z, Wu S, Lin KQ, Song C, Gao D, et al. VideoLLM-online: online video large language model for streaming video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024. pp. 18407–18.
79. Brutschi R, Wang R, Kolbe M, Weiss K, Lohmeyer Q, Meboldt M. Speech recognition technology for assessing team debriefing communication and interaction patterns: an algorithmic toolkit for healthcare simulation educators. Adv Simul. 2024;9:42. 10.1186/s41077-024-00315-1.
80. Mosqueira-Rey E, Hernández-Pereira E, Alonso-Ríos D, Bobes-Bascarán J, Fernández-Leal Á. Human-in-the-loop machine learning: a state of the art. Artif Intell Rev. 2023;56:3005–54. 10.1007/s10462-022-10246-w.
81. Paradis E, Sutkin G. Beyond a good story: from Hawthorne effect to reactivity in health professions education research. Med Educ. 2017;51:31–9. 10.1111/medu.13122.
82. Henckert D, Malorgio A, Schweiger G, Raimann FJ, Piekarski F, Zacharowski K, et al. Attitudes of anesthesiologists toward artificial intelligence in anesthesia: a multicenter, mixed qualitative-quantitative study. J Clin Med. 2023. 10.3390/jcm12062096.
83. Choudhury A, Chaudhry Z. Large language models and user trust: consequence of self-referential learning loop and the deskilling of health care professionals. J Med Internet Res. 2024;26:e56764. 10.2196/56764.
84. Le Dinh T, Le TD, Uwizeyemungu S, Pelletier C. Human-centered artificial intelligence in higher education: a framework for systematic literature reviews. Information. 2025;16:240. 10.3390/info16030240.
85. Brodowski H, Dammermann A, Kinyara MM, Obst MA, Samek F, Peifer C, et al. Student and lecturer perceptions of the use of an AI to improve communication skills in healthcare: an interview study. J Med Educ Curric Dev. 2025;12:23821205251358089. 10.1177/23821205251358089.



Articles from Advances in Simulation are provided here courtesy of BMC
