Global Surgical Education. 2025 Sep 13;4(1):79. doi: 10.1007/s44186-025-00390-6

Development of simulation scenarios for surgeons’ non-technical skills evaluation

Nicholas E Anton 1, Brittany Anderson-Montoya 2, Amy Holmstrom 1, Marian Obuseh 3, Yichuan Yan 1, Payton M Miller 1, Wendy S Li 1, Qais AbuHasan 1, Denny Yu 3, Dimitrios Stefanidis 1
PMCID: PMC12433441  PMID: 40955203

Abstract

Purpose

The purpose of this study was to identify specific simulated scenarios and events for surgeons that discretely and effectively measure non-technical skills (NTS) constructs and appropriately measure NTS using specific behavioral anchors.

Methods

Over the course of two rounds, experts in NTS accumulated a comprehensive list of simulated scenarios used at our institution to evaluate NTS. Utilizing a survey, experts were able to review scenarios, events, and behavioral anchors to evaluate their agreement with the appropriateness of the events and effectiveness of the anchors. Coefficient of variation (CV) was used to evaluate agreement between raters. The highest rated events were identified.

Results

The CV between raters was moderate, and further inspection revealed that 86% of event ratings had discordance within ± 1 point. The highest rated events in terms of appropriateness and behavioral anchor effectiveness evaluated surgeons’ leadership and communication.

Conclusions

In this study, experts identified three simulated events that isolate and discretely evaluate specific NTS among surgeons. In the future, these scenarios can be used to effectively evaluate surgeons’ NTS.

Supplementary Information

The online version contains supplementary material available at 10.1007/s44186-025-00390-6.

Keywords: Simulation, Non-Technical Skills, Interpersonal Skills, Scenarios, Assessment, Behavioral Markers

Introduction

Among healthcare workers, high-level non-technical skills (NTS) are imperative for the safe and effective provision of care to patients. NTS domains include interpersonal (e.g., communication and leadership) and cognitive skills (e.g., decision making and situation awareness) that are needed for all aspects of surgical performance [1]. Among surgeons, situation awareness reflects the maintenance of an up-to-date understanding of the operating room environment throughout a procedure, decision-making refers to the cognitive flexibility to consider and execute appropriate actions, communication refers to how clinical information is conveyed to other members of the team, and leadership reflects the surgeon's maintenance of standards and control of all activities in the operating room, particularly during stressful situations [2]. Literature suggests that nearly 50% of errors contributing to medical malpractice claims result from poor communication (i.e., 53% of poor communication events were attributed to provider-patient communication breakdowns and 47% to communication issues between providers) [3]. In surgery specifically, NTS errors have been linked to technical errors and reduced patient safety [4]. Indeed, 43% of surgical errors during patient care are attributable to communication errors among healthcare personnel [5]. In a study of 45 attending surgeons' NTS in the clinical environment over a period of 7 months, researchers found that higher surgeon NTS (i.e., evaluated using observer-based ratings) were associated with reduced risk of patient mortality, post-operative complications, and return to the operating room [6]. Given the importance of NTS for clinical performance, it is necessary to consider how surgeons' NTS are currently evaluated.

Current gold standards to measure NTS in surgery involve observer-based assessments in the clinical environment [7]. Observer-based measures of NTS rely on behavioral anchors to categorize the dynamic behaviors performed by providers in the clinical environment into numeric values [8]. However, due to the complex and subtle nature of surgeon NTS in the clinical environment, extensive rater training may be needed to accurately and reliably utilize common NTS measures [9]. Furthermore, by nature, NTS are highly interrelated. For example, research indicates that the communication of pertinent patient or case information underlies all interpersonal and cognitive NTS [10]. Isolating the evaluation of specific NTS in the clinical environment may be challenging. It is necessary, then, to consider assessment methods that allow for consistent and specific evaluation of surgeons’ NTS.

In healthcare education, simulation has emerged as an effective modality to provide learners with reproducible and immersive experiences [11]. Simulation, consisting of both procedural skills training and scenario-based learning opportunities, affords learners the opportunity to practice skills on inanimate models without risking harm to patients [11–15]. Simulation has also shown promise as an effective method for high-stakes examination of trainees' clinical skills. For example, the Advanced Cardiac Life Support program consists of a series of scenario-based simulations that evaluate incoming resident physicians' ability to manage emergent cardiac events and is used as a certification tool for residents [16]. Despite the evidence suggesting simulated scenarios can aid in trainees' acquisition of NTS, there have been few attempts, if any, to develop scenarios with embedded measures of NTS [17–20]. Indeed, even in simulated scenarios, NTS are evaluated with the same global NTS assessments utilized in the clinical environment [8]. Given the controlled and reproducible environment afforded by simulated scenarios, it may be possible to develop events that evaluate specific NTS constructs. However, current simulation-based approaches to evaluating NTS neither isolate specific NTS for evaluation nor provide event-specific behavioral anchors for NTS evaluation.

The purpose of the current study was to identify simulated surgical patient care scenarios and events that discretely and effectively measure NTS constructs and appropriately measure NTS using specific behavioral anchors. We aimed to identify these simulated events by establishing consensus among researchers with expertise in evaluating surgeons’ NTS.

Methods

Our team, which has expertise in simulation scenario design, has developed a large library of scenarios with specific behavioral anchors to measure NTS [19–21]. Accordingly, the research team first compiled a comprehensive list of all scenarios designed by our team to measure or train NTS among practicing surgeons, surgery residents, or medical students. These scenarios and behavioral anchors were informed by our team's collective expertise in evaluating surgeons' NTS in the clinical intraoperative environment and by the Non-Technical Skills for Surgeons (NOTSS) framework [22–25]. All scenarios were designed around a presenting pre-operative, intraoperative, or post-operative clinical problem that participants had to manage. All cases focused on a single participant portraying the primary resident or surgeon in charge of the scenario; all other members of the team were embedded to ensure consistency and replicability. Furthermore, the patient in every case was represented by the same high-fidelity patient manikin (SimMan 3G, Laerdal Medical). Throughout these scenarios, unexpected challenges to surgeons' NTS were introduced by embedded research team members and evaluated by trained raters after the scenario using discrete behavioral anchors for each event. The behavioral anchors were developed specifically for each event by members of the research team. The goal of these anchors was to measure NTS constructs on 3- or 4-point scales ranging from poor to exemplary for that skill. Each point on the scales included exemplar behaviors representing that level of NTS performance to guide raters in their evaluation of trainees (Fig. 1).

Fig. 1.

Fig. 1

Example NTS simulation events with embedded anchors

In round one, a team of six individuals (i.e., human factors PhD students and surgical education research fellows and researchers with significant experience assessing NTS in the clinical and simulation environments) performed a preliminary evaluation of scenarios and events on their suitability for further evaluation, based on their appropriateness to challenge an NTS domain (i.e., defined in our scenarios as Situation Awareness, Decision Making, Communication, or Leadership) and the effectiveness of the behavioral anchors to distinguish good from poor NTS. Suitable scenarios and events were chosen for further evaluation in round two. In round two, two additional raters with expertise in surgeon NTS (i.e., a human factors psychologist and a surgeon), who were not involved in developing the simulated events, were recruited to evaluate the suitable events identified in round one. These experts completed a survey evaluating the appropriateness and effectiveness of the identified events and then met virtually to discuss their ratings and identify the most effective simulated events for assessing surgeons' NTS.

Round one

First, a framework was provided to reviewers that offered a comprehensive explanation of the NTS domains, explanations of sub-categories within each domain, and examples of exemplary and poor surgeon behaviors for each sub-category. At this time, the scenarios and NTS events were also provided to reviewers. This included a total of six scenarios with 30 discrete events. The following week, a virtual meeting was scheduled to review the framework, address any outstanding questions, and review the NTS scenarios and events in light of this framework. Each scenario stem was read to reviewers before reviewing each component event and the behavioral anchors to measure the NTS construct. Each reviewer was encouraged to provide feedback on the events. When a consensus was reached on whether an event should be included in the second round of evaluation, the team moved on to the next event.

Round two

In round two, a survey was developed using REDCap electronic data capture tools that provided respondents with the simulation case stem (i.e., including the case location, supplies available, and initial script provided to participants) [26, 27]. The survey has been provided in Appendix A. Then, each event was provided with the specific NTS construct being challenged, the behavioral anchors and their accompanying scores for the event, and a brief video clip showing the event occurring in the simulation environment. Respondents were then asked to rate the appropriateness of the event to measure the defined NTS construct on a five-point scale: 1—“Disagree Completely”, 2—“Disagree”, 3—“Neutral”, 4—“Agree”, 5—“Agree Completely”, and the effectiveness of the behavioral anchors to measure the defined NTS construct for this event on the same scale. Respondents were also provided optional free-text response options for both measures for each event.

The two raters were provided the NTS framework developed for round one via electronic mail and instructed to ask the facilitator any questions before proceeding to event rating. The experts then rated each of the events using the provided scale. Following event ratings, a virtual meeting was held to discuss any discrepancies in ratings and come to a consensus on which three events were optimal to study NTS in simulation scenarios.

Statistical analyses

The Statistical Package for the Social Sciences (SPSS, version 29; IBM Corp., Armonk, NY) was used for statistical analysis. Interrater agreement in round two evaluations was measured using the coefficient of variation (CV) for each event and overall. The CV is the ratio of the standard deviation to the mean: CV = σ/μ. The CV has been identified as an appropriate measure of interrater agreement for quantitative data [28]. A CV closer to 0 represents better interrater agreement, and results can be interpreted as: 0–0.07 = excellent agreement, 0.07–0.13 = moderate agreement, and > 0.13 = low agreement. Furthermore, since our objective was to establish consensus among raters on simulated events with high appropriateness and effectiveness ratings, and raters completed evaluations independently, this approach represents a pseudo-Delphi study design. Based on a Delphi study with more than 100 respondents per round across three rounds, the CV was identified as the best statistical procedure to establish agreement in this type of research [29]. The current study differs appreciably from that previous work (i.e., two raters, a single rating round), but because our objective was to measure consensus between raters, we elected to use the CV.
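The CV calculation is straightforward to reproduce. Below is a minimal sketch, assuming the population standard deviation is used in the numerator (this assumption is consistent with the reported event-level values, e.g., ratings of 4 and 5 yield a CV of 0.11); the rating pairs shown are illustrative, not the study's data:

```python
from statistics import mean, pstdev

def cv(ratings):
    """Coefficient of variation: population standard deviation over the mean.
    Values closer to 0 indicate better interrater agreement."""
    return pstdev(ratings) / mean(ratings)

# Two raters scoring an event 4 and 5 on the five-point scale:
print(round(cv([4, 5]), 2))  # 0.11, within the moderate-agreement range
# Identical ratings give perfect agreement:
print(cv([3, 3]))            # 0.0
# More discordant ratings push the CV into the low-agreement range:
print(round(cv([2, 4]), 2))  # 0.33
```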

Results

The six raters in round one identified 14 events over five scenarios for further evaluation in round two (Appendix B). In round two, the two raters' average appropriateness ratings for events were 3.9 and 3.6, respectively. Regarding the effectiveness of NTS behavioral anchors, their average event ratings were 3.8 and 3.2, respectively. Overall agreement between raters was moderate (CV = 0.12). Event-level CVs ranged from 0 to 0.33. The CV for appropriateness of events to measure NTS was 0.11, and the CV for effectiveness of NTS behavioral anchors was 0.14. The CV for each event is presented in Table 1. Further inspection revealed that 86% of event ratings (i.e., for both appropriateness of events and effectiveness of anchors) had discordance within ± 1 point.
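The 86% figure is simply the proportion of rating pairs differing by at most one point. The sketch below illustrates the arithmetic with hypothetical rating pairs (not the study's actual data); 24 of 28 pairs within one point reproduces the reported proportion:

```python
def within_one(pairs):
    """Fraction of two-rater pairs whose absolute difference is at most 1 point."""
    return sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

# Hypothetical data: 28 pairs (14 events x 2 measures), 24 of which are
# within one point of each other.
pairs = [(4, 5)] * 20 + [(3, 3)] * 4 + [(2, 4)] * 4
print(round(within_one(pairs), 2))  # 0.86
```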

Table 1.

Coefficient of variation for appropriateness and effectiveness ratings for events

Event Appropriateness CV Effectiveness CV
1 0.2 0.2
2 0 0.2
3 0 0.14
4 0.33 0.33
5 0.14 0.14
6 0 0.33
7 0 0
8 0 0.14
9 0.14 0
10 0.33 0.14
11 0.11 0.11
12 0.11 0
13 0.11 0.14
14 0 0

Appropriateness of events

Out of ten possible points (i.e., the combined score of the two raters), three events received the highest score of 9 (Fig. 2). These were event 11 (i.e., guidance of the scrub tech to secure the liver retractor for a hiatal hernia repair), event 12 (i.e., guidance of the scrub tech to maintain adequate retraction and hold the laparoscope appropriately), and event 13 (i.e., guidance of a trainee to correct an injury the trainee caused).

Fig. 2.

Fig. 2

Ratings for appropriateness of events to measure NTS construct

Effectiveness of behavioral anchors

Regarding the effectiveness of behavioral anchor ratings, the raters identified five events with effective ratings (i.e., events with combined scores of 8 or above) (Fig. 3). Specifically, these included events 7 (i.e., communication with anesthesia to troubleshoot causes of decreased SpO2), 9 (i.e., during the timeout, a novice circulator reads off the incorrect procedure), 11, 12, and 14 (i.e., the surgeon asks distracting personnel to leave the operating room during a crisis). Furthermore, the anchors for event 11 received the highest combined score of 9.

Fig. 3.

Fig. 3

Ratings for effectiveness of behavioral anchors to assess NTS construct

Discussion

The purpose of this study was to systematically identify simulated scenarios that appropriately evaluate specific NTS constructs and have discrete behavioral anchors that effectively measure surgeons’ NTS. Accordingly, our team initially reviewed all simulations evaluating NTS in our surgical education research program and identified those specific events (i.e., that were designed to challenge NTS) suitable for further evaluation. Our team then invited experts in NTS from surgery and human factors to review and score the identified simulated events. Overall agreement between raters was moderate (CV = 0.12). The literature suggests that CV between 0.07 and 0.13 reflects moderate agreement between raters [28]. We also found that six of the fourteen events had absolute agreement between raters regarding their appropriateness to measure particular NTS constructs. This approach to NTS event identification is the first of its kind, as surgical education researchers may develop simulation-based curricula to study and train surgeons’ NTS without establishing expert consensus of their appropriateness and effectiveness before deployment with participants [30]. The current study consulted expert opinion to identify which NTS simulation events and behavioral anchors could serve as effective simulation-based assessments of surgeons’ NTS.

Of the 14 events that were rated, 9 were rated highly, but 3 stood out as having both highly rated appropriateness and highly rated behavioral anchor effectiveness (i.e., events 11, 12, and 14). These events focused on interpersonal NTS constructs such as leadership and communication. These findings align with the literature on NTS evaluation, as interpersonal behaviors are more easily and reliably captured through explicit behaviors than cognitive NTS such as decision-making and situation awareness (i.e., evidenced by higher reliability of the interpersonal skill ratings in the Non-Technical Skills for Surgeons tool compared to more variable ratings of cognitive skills) [31].

Reviewing our findings in more depth, it appears that raters rated highly those events that clearly isolated a single NTS construct, whereas events that measured multiple NTS constructs concurrently were rated poorly. For example, event 11 was rated 9 out of 10 for both its appropriateness to measure NTS and the effectiveness of its behavioral anchors, and one of the raters commented, "I like this example for leadership". The event involved the need to place a liver retractor for a procedure while the first assist/trainee, lacking experience affixing and constructing the liver retractor, was unable to do so independently. This event is realistic, as static liver retractors are commonly placed in minimally invasive surgical procedures but are not commonplace in all specialties [32]. Furthermore, due to operating room staffing shortages in hospitals across the United States, first assists may be required to assist in specialties outside of their primary specialty [33, 34]. For this event, ideal leadership was defined as the surgeon educating the first assist on how to place the retractor, whereas suboptimal leadership was defined as the surgeon becoming frustrated and exhibiting signs of stress. The NOTSS framework details exemplary surgical leadership as setting and maintaining standards, coping with pressure, and supporting others [25]. This event clearly isolates leadership of the OR team through participants' support of the staff and ability to cope with pressure.

As opposed to the clear isolation and evaluation of surgical leadership in event 11, some events were more convoluted and were accordingly rated poorly. Event 1, for instance, was the lowest-rated event regarding its appropriateness to evaluate an NTS construct. One rater commented that this event did not isolate the assessment of surgeons' leadership and instead "addresses several of the other NTS". The interrelation of NTS constructs has been noted by researchers in the past. Indeed, in the aviation industry, the developers of the non-technical skills scale (NOTECHS) for evaluating crew members' NTS opted not to include communication in their assessment because they believed it was inherent to all other aspects of NTS and difficult to isolate [10, 35]. Our developed simulations faced similar challenges in isolating NTS for evaluation, which most affected the poorly rated events.

Three events (i.e., 4, 6, and 10) had high CVs compared to the others, reflecting poor rater agreement on their appropriateness and effectiveness. The high variability in ratings has several possible explanations. First, event 4, which had high CVs for both appropriateness and effectiveness, requires participants to manage an urgent page about a deteriorating patient in a different room while caring for the primary deteriorating patient, forcing surgeons to allocate resources between the two patients and decide which patient is more acute. It is possible that this scenario is more appropriate for measuring decision-making than leadership. Furthermore, the anchors may not effectively capture good and acceptable decisions about whether to attend to the urgent page or the primary patient, which may have led to the discordance in ratings. Event 6, which had a high CV only for the effectiveness of its anchors, was designed to measure surgeons' situation awareness and communication regarding the intraoperative deterioration of the patient's vital signs. However, the anchors we used focused more on surgeon communication in response to this event than on situation awareness. The lack of specificity of these anchors for measuring situation awareness may have contributed to the high variability in responses. Finally, event 10 had a high CV for the appropriateness of the simulation to measure situation awareness and leadership. The event was designed to simulate a patient falling off the operating room table after not being secured properly. However, since the surgeon was expected to join the case after the patient was draped and did not have the opportunity to verify that the patient was adequately secured before repositioning them, these circumstances may have artificially decreased participants' situation awareness. This could have contributed to the discordance in appropriateness evaluations between raters.

There were limitations to the current study. Rater agreement between experts was only moderate. In studies that report interrater agreement for NTS evaluations, such as the NOTSS, rater agreement is often higher (e.g., intraclass correlation coefficients among trained NOTSS raters have been reported at 0.72 for the leadership domain) [36]. However, unlike studies involving actual NTS evaluations, there was no calibration training for our raters prior to establishing consensus. We relied entirely on their perceptions of the appropriateness and effectiveness of our simulated events to capture and measure surgeons' NTS. Similar to a Delphi survey, which solicits expert opinions asynchronously to prevent potential bias from impacting expert judgments, we wanted experts to review the simulated events without any interference from the study team [37]. However, unlike a Delphi approach, which conducts multiple rounds of ratings to obtain a consensus, we relied on a single virtual meeting to verify findings and reach consensus. Despite this limitation, 86% of ratings were within one point of each other, which suggests that the expert raters were generally aligned in their evaluation of NTS events.

Another potential limitation of our study was the use of only two expert raters in round two. To limit bias, our team aimed to recruit NTS experts from outside the study team, ensuring they had not participated in developing the scenarios. Due to these logistical constraints, we were limited in the number of raters we could recruit for round two. While the limited number of raters may have reduced the overall evaluation variability that would otherwise be seen with additional raters, the two selected raters represented expertise from the fields of surgery and human factors. Thus, the raters had diverse backgrounds, which may have contributed to their moderate absolute agreement. We can be confident, then, that the events rated highly by both raters represent appropriate and effective events for the measurement of surgeons' NTS. A further limitation was our measurement of the appropriateness of the scenarios to assess particular NTS constructs and the effectiveness of the behavioral anchors to delineate between levels of NTS: we utilized five-point Likert scales for these judgments, but there is no established literature supporting this approach. That said, our overall study approach aligned with established literature on the design of simulation scenarios to support performance assessment in healthcare [38]. Specifically, our team developed scenarios and mapped them to specific NTS constructs for surgeons in accordance with the literature; we selected a validation team with expertise in simulation and NTS measurement to review the content and identify scenarios for further review; and we established consensus on the appropriateness and effectiveness of simulated events among experts using independent evaluation to avoid introducing bias.

While this initial work to identify simulated scenarios that can effectively and appropriately measure NTS constructs is valuable, rigorous measurement science approaches will be needed to establish the utility of these simulated events for measuring surgeons' NTS. Specifically, establishing the interrater reliability of these NTS events will be paramount to determine whether the detailed behavioral anchors contribute to consistent ratings. To establish the interrater reliability of these events and anchors, our team plans to conduct an initial study with at least three raters to compute kappa across ratings. Unlike the proportion of absolute agreement between raters, which is crude because it ignores chance agreement, kappa enables researchers to study rater agreement while accounting for agreement expected by chance [39]. This rigorous approach to studying the interrater reliability of the proposed NTS assessment approach is needed.
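For two raters, the chance-corrected agreement described above is Cohen's kappa; for the three or more raters the authors plan to recruit, Fleiss' kappa generalizes the same idea. A minimal two-rater sketch, with illustrative rating vectors rather than study data:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters scoring the same items.
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    n = len(r1)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected agreement by chance, from each rater's marginal category frequencies
    p1, p2 = Counter(r1), Counter(r2)
    expected = sum((p1[c] / n) * (p2[c] / n) for c in p1)
    return (observed - expected) / (1 - expected)

# Perfectly aligned raters:
print(cohens_kappa([1, 1, 2, 3], [1, 1, 2, 3]))  # 1.0
# Agreement no better than chance:
print(cohens_kappa([1, 2, 1, 2], [1, 1, 2, 2]))  # 0.0
```

Note that kappa is undefined when expected agreement is 1 (both raters use a single category); a production implementation would guard against that division by zero.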

Conclusions

In this novel study, our team systematically developed simulation scenarios and specific events that enable the evaluation of surgeons’ NTS. The expert raters involved in our study identified three events that were appropriate for the evaluation of surgeons’ leadership and communication. In the future, our team may be able to leverage these scenarios to evaluate the benefits of NTS interventions or identify objective measures of surgeons’ NTS.

Supporting data

The author confirms that all quantitative data generated or analyzed during this study are included in this published article. Supporting qualitative data will be made available upon reasonable request.


Acknowledgements

This study was funded by the Agency for Healthcare Research and Quality, Grant No. 11001301.

Declarations

Conflict of interest

The authors declare no conflicts of interest related to this work.

References

  • 1. Yule S, Flin R, Paterson-Brown S, Maran N. Non-technical skills for surgeons in the operating room: a review of the literature. Surgery. 2006;139(2):140–9.
  • 2. Carthey J, de Leval MR, Wright DJ, Farewell VT, Reason JT. Behavioural markers of surgical excellence. Saf Sci. 2003;41(5):409–25.
  • 3. Humphrey KE, Sundberg M, Milliren CE, Graham DA, Landrigan CP. Frequency and nature of communication and handoff failures in medical malpractice claims. J Patient Saf. 2022;18(2):130–7.
  • 4. Hull L, Arora S, Aggarwal R, Darzi A, Vincent C, Sevdalis N. The impact of nontechnical skills on technical performance in surgery: a systematic review. J Am Coll Surg. 2012;214(2):214–30.
  • 5. Gawande AA, Zinner MJ, Studdert DM, Brennan TA. Analysis of errors reported by surgeons at three teaching hospitals. Surgery. 2003;133(6):614–21.
  • 6. Abahuje E, Cong L, Iroz CB, Barsuk JH, Stey A, Likosky DS, et al. A prospective study to assess the relationship between nontechnical skills for surgeons (NOTSS) and patient outcomes. J Surg Educ. 2024;81(11):1568–76.
  • 7. Allard M-A, Blanié A, Brouquet A, Benhamou D. Learning non-technical skills in surgery. J Visc Surg. 2020;157(3):S131–6.
  • 8. Higham H, Greig P, Crabtree N, Hadjipavlou G, Young D, Vincent C. A study of validity and usability evidence for non-technical skills assessment tools in simulated adult resuscitation scenarios. BMC Med Educ. 2023;23(1):153.
  • 9. Sharma B, Mishra A, Aggarwal R, Grantcharov TP. Non-technical skills assessment in surgery. Surg Oncol. 2011;20(3):169–77.
  • 10. Cha JS, Yu D. Objective measures of surgeon non-technical skills in surgery: a scoping review. Hum Factors. 2022;64(1):42–73.
  • 11. Ounounou E, Aydin A, Brunckhorst O, Khan MS, Dasgupta P, Ahmed K. Nontechnical skills in surgery: a systematic review of current training modalities. J Surg Educ. 2019;76(1):14–24.
  • 12. Saleh GM, Wawrzynski JR, Saha K, Smith P, Flanagan D, Hingorani M, et al. Feasibility of human factors immersive simulation training in ophthalmology: the London pilot. JAMA Ophthalmol. 2016;134(8):905–11.
  • 13. Lee JY, Mucksavage P, Canales C, McDougall EM, Lin S. High fidelity simulation based team training in urology: a preliminary interdisciplinary study of technical and nontechnical skills in laparoscopic complications management. J Urol. 2012;187(4):1385–91.
  • 14. Brunckhorst O, Shahid S, Aydin A, McIlhenny C, Khan S, Raza SJ, et al. Simulation-based ureteroscopy skills training curriculum with integration of technical and non-technical skills: a randomised controlled trial. Surg Endosc. 2015;29:2728–35.
  • 15. Brewin J, Tang J, Dasgupta P, Khan MS, Ahmed K, Bello F, et al. Full immersion simulation: validation of a distributed simulation environment for technical and non-technical skills training in urology. BJU Int. 2015;116(1):156–62.
  • 16. Honarmand K, Mepham C, Ainsworth C, Khalid Z. Adherence to advanced cardiovascular life support (ACLS) guidelines during in-hospital cardiac arrest is associated with improved outcomes. Resuscitation. 2018;129:76–81.
  • 17. Huffman EM, Anton NE, Athanasiadis DI, Ahmed R, Cooper D, Stefanidis D, et al. Multidisciplinary simulation-based trauma team training with an emphasis on crisis resource management improves residents’ non-technical skills. Surgery. 2021;170(4):1083–6.
  • 18. Gjeraa K, Møller TP, Østergaard D. Efficacy of simulation-based trauma team training of non-technical skills. A systematic review. Acta Anaesthesiol Scand. 2014;58(7):775–87.
  • 19. Anton NE, Cha JS, Hernandez E, Athanasiadis DI, Yang J, Zhou G, et al. Utilizing eye tracking to assess medical student non-technical performance during scenario-based simulation: results of a pilot study. Glob Surg Educ J Assoc Surg Educ. 2023;2(1):49.
  • 20. Ball M, Goodwin C, Cha JS, Anton N, Athanasiadis DI, Hernandez E, et al. Evaluating nontechnical skills and leadership skills during simulated critical care scenarios. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Los Angeles, CA: SAGE Publications; 2023.
  • 21. Narasimha S, Obuseh M, Anton NE, Chen H, Chakrabarty R, Stefanidis D, et al. Eye tracking and audio sensors to evaluate surgeon’s non-technical skills: an empirical study. Appl Ergon. 2024;119:104320.
  • 22. Obuseh M, Anton NE, Gardiner R, Chen M, Narasimha S, Stefanidis D, et al. Development and application of a non-technical skills coaching intervention framework for surgeons: a pilot quality improvement initiative. PLoS ONE. 2024;19(11):e0312125.
  • 23. Cha JS, Athanasiadis DI, Peng Y, Wu D, Anton NE, Stefanidis D, et al. Objective nontechnical skills measurement using sensor-based behavior metrics in surgical teams. Hum Factors. 2024;66(3):729–43.
  • 24. Cha JS, Athanasiadis D, Anton NE, Stefanidis D, Yu D. Measurement of nontechnical skills during robotic-assisted surgery using sensor-based communication and proximity metrics. JAMA Netw Open. 2021;4(11):e2132209.
  • 25. Yule S, Gupta A, Blair PG, Sachdeva AK, Smink DS. Gathering validity evidence to adapt the non-technical skills for surgeons (NOTSS) assessment tool to the United States context. J Surg Educ. 2021;78(3):955–66.
  • 26. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81.
  • 27. Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L, et al. The REDCap consortium: building an international community of software platform partners. J Biomed Inform. 2019;95:103208.
  • 28. Marella D, Bove G. Measures of interrater agreement for quantitative data. AStA Adv Stat Anal. 2023. 10.1007/s10182-023-00483-x.
  • 29. Shah HA, Kalaian SA. Which is the best parametric statistical method for analyzing Delphi data? J Mod Appl Stat Methods. 2009;8(1):20.
  • 30. Wood TC, Raison N, Haldar S, Brunckhorst O, McIlhenny C, Dasgupta P, et al. Training tools for nontechnical skills for surgeons—a systematic review. J Surg Educ. 2017;74(4):548–78.
  • 31. Yule S, Paterson-Brown S. Surgeons’ non-technical skills. Surg Clin North Am. 2012;92(1):37–50.
  • 32. Midya S, Ramus J, Hakim A, Jones G, Sampson M. Comparison of two types of liver retractors in laparoscopic Roux-en-Y gastric bypass for morbid obesity. Obes Surg. 2020;30:233–7.
  • 33. Xie A, Duff J, Munday J. Perioperative nursing shortages: an integrative review of their impact, causal factors, and mitigation strategies. J Nurs Manag. 2024;2024(1):2983251.
  • 34. Witmer HD, Keçeli Ç, Morris-Levenson JA, Dhiman A, Kratochvil A, Matthews JB, et al. Operative team familiarity and specialization at an academic medical center. Ann Surg. 2023;277(5):e1006–17.
  • 35. Flin R, Martin L, Goeters K-M, Hörmann H-J, Amalberti R, Valot C, et al. Development of the NOTECHS (non-technical skills) system for assessing pilots’ CRM skills. In: Human factors and aerospace safety. London: Routledge; 2018. p. 97–119.
  • 36. Yule S, Flin R, Maran N, Rowley D, Youngson G, Paterson-Brown S. Surgeons’ non-technical skills in the operating room: reliability testing of the NOTSS behavior rating system. World J Surg. 2008;32:548–56.
  • 37. Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs. 2000;32(4):1008–15.
  • 38. O’Brien JE, Hagler D, Thompson MS. Designing simulation scenarios to support performance assessment validity. J Contin Educ Nurs. 2015;46(11):492–8.
  • 39. Yudkowsky R, Park YS, Downing SM. Assessment in health professions education. New York: Routledge; 2019.

Articles from Global Surgical Education are provided here courtesy of Springer
