Abstract
Introduction
Our study explores the extent to which teams are accurate assessors of their own performance and teamwork, and how simulation can help this critical skill develop over time.
Methods
Surgery residents in teams of three completed five daily simulations. After each scenario, each team reviewed its performance video and jointly completed a scenario-specific team effectiveness evaluation and the 8-item Mayo High-Performance Teamwork Scale. Faculty rated the same videos to obtain discrepancy values. Paired-samples t-tests and mean comparisons were used to examine changes in team self-assessment accuracy and differences between high-performing and low-performing teams.
Results
Resident teams (n=30 residents in 10 teams) rated team performance higher than faculty across the first 3 days (p<0.01) but provided similar ratings thereafter. Agreement between resident and faculty ratings of team performance improved significantly from day 1 to day 5 (p<0.001). Teams rated their teamwork higher than faculty across all days (p<0.01). Top-performing teams provided more accurate self-assessments for both teamwork (average discrepancy 8% vs 39%) and team performance (average discrepancy 12% vs 23%).
Conclusion
Teams that continue to work together over time may become more accurate judges of their own performance, but do not become more accurate assessors of teamwork competencies.
Keywords: teamwork training, self-assessment
Healthcare teams must be able to accurately assess their skills for diagnostic and quality improvement purposes. Teams that review their own performance can gain additional insight into how their technical and non-technical skills measure up against best practices, objective standards and organisational expectations. Importantly, self-assessment can also identify teams that may be overconfident or unable to recognise the limits of their competency.1 Team self-assessments are thus an important method of quality assurance and can improve patient safety and reduce team-based errors.
For team self-assessments to have these benefits, though, they must be accurate. Unfortunately, work within surgical education and elsewhere has demonstrated that self-perceptions of skills rarely match actual abilities. Within surgery, studies show that individual self-assessments are poorly correlated with expert assessment scores and objective skill metrics.2–6 We know little, however, about the ability of teams to form accurate self-assessments of their performance. Team self-assessments may be susceptible to the same phenomena impacting individual self-assessment, such as ego preservation or ‘unconscious incompetence’,7 thereby reducing accuracy. In contrast, the team may be ‘greater than the sum of its parts’ and able to provide more precise evaluations of its performance.
Our study explores the extent to which teams are accurate assessors of their own performance, as measured by both teamwork and team effectiveness metrics. Additionally, we explore how these skills develop over time as teams continue to perform with one another in a variety of patient care scenarios.
Methods
First-year general surgery residents were randomly assigned to one of 10 three-person teams after completing Advanced Trauma Life Support certification. All residents also held certification in Advanced Cardiac Life Support. Team membership remained unchanged over the 5-day period, and each team participated in one simulation per day. Scenario descriptions and expected resident responses are shown in table 1.
Table 1.
Description of the team-based simulation scenarios and expected responses
| Scenario | Scenario description | Expected responses |
| --- | --- | --- |
| 1 | Patient presents with both blunt and penetrating injuries. Findings indicate a pneumothorax and an abdominal penetrating injury in a haemodynamically unstable patient | Residents were expected to follow the ATLS algorithms to identify and manage all injuries and make the decision to take the patient to the operating room |
| 2 | A postoperative patient complains of chest pain and subsequently goes into cardiac arrest | Residents were expected to follow the ACLS algorithms for a variety of arrhythmias, including pulseless electrical activity, ventricular fibrillation/tachycardia and supraventricular tachycardia |
| 3 | While completing a laparoscopic task in the operating room, a fire emerges | Residents had to work through a series of steps to minimise harm to the patient, including putting out the fire, shutting off oxygen in the room and communicating with the anaesthesia provider to move the patient to a safe location |
| 4 | A postoperative patient suffers a PEA arrest. While treating the arrest, the resident team is notified of a DNR order. The next-of-kin family member enters the scene and vehemently disagrees with the DNR | Residents were expected to stabilise the patient, explain the circumstances to the family member and come to an agreement about further care |
| 5 | A patient in the Surgical Intensive Care Unit with a known intracranial injury becomes bradycardic and hypertensive | Residents were expected to identify intracranial hypertension and to proceed with an escalation of care to decrease intracranial pressures |
ACLS, Advanced Cardiac Life Support; ATLS, Advanced Trauma Life Support; DNR, Do Not Resuscitate; PEA, pulseless electrical activity.
All scenarios were programmed into a Human Patient Simulator with an initial standardised script that became dynamic as participants took action with the patient. Scenarios took place in a simulation lab fully equipped with all necessary instruments and materials, reflecting the appearance of a postanaesthesia care unit and operating room. This room was equipped with cameras and overhead microphones for video and audio recordings.
All residents participated in a 15 min orientation to the simulation space and equipment prior to the training sessions. The scenarios were designed to be approximately 20 min in duration. Prior to beginning each scenario, teams were instructed to delegate roles (team leader, airway management, assistant and so on) according to their own preference. On entering the room, teams were provided with a situation, background, assessment, recommendation (SBAR) report by the nurse confederate. The nurse was instructed to aid in the resuscitations only as directed by the residents.
After each scenario, each team reviewed their performance video and jointly completed a scenario-specific team effectiveness evaluation with both time and quality metrics used elsewhere within simulation education.8–10 Teams also completed an 8-item Mayo High-Performance Teamwork Scale11 for each performance episode. Teams were not provided with any performance feedback prior to completing the self-assessments.
Videos were rated using the same team effectiveness and teamwork tools by a surgeon (team effectiveness) and a PhD (teamwork). Both faculty members were involved in the creation of the team effectiveness tools and had prior experience using the tools for both formative and summative assessment. The PhD had expertise in team science and had undergone training to use the teamwork tool for prior projects. Both teamwork and team effectiveness scores are reported at the team level and represent the percentage of total checklist items achieved (0%–100%). Faculty-versus-resident discrepancy values were computed, with negative values indicating that teams rated themselves lower than faculty and positive values indicating that teams rated themselves higher than faculty.
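To make this scoring scheme concrete, the following is a minimal sketch of the discrepancy calculation in Python; the study's analysis was performed in SPSS, and the data layout, column names and values here are hypothetical placeholders rather than study data:

```python
# Minimal sketch of the discrepancy calculation described above.
# Data layout, column names and values are illustrative, not study data.
import pandas as pd

# One row per team per scenario; scores are the percentage (0-100) of
# checklist items achieved, rated by the team itself and by faculty.
ratings = pd.DataFrame({
    "team": [1, 1, 2, 2],
    "day": [1, 5, 1, 5],
    "self_score": [67.0, 81.0, 70.0, 84.0],
    "faculty_score": [39.0, 79.0, 44.0, 80.0],
})

# Positive values: team rated itself higher than faculty;
# negative values: team rated itself lower than faculty.
ratings["discrepancy"] = ratings["self_score"] - ratings["faculty_score"]
print(ratings)
```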
Data were analysed using SPSS V.21 with a significance level of p<0.05. Paired-samples t-tests and mean comparisons were used to examine changes in team self-assessment accuracy and differences between high-performing and low-performing teams. Continuous data are reported as mean±SD. This project was deemed exempt by the Institutional Review Board.
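As a sketch of that analysis, a paired-samples t-test on day-1 versus day-5 discrepancy values (one pair per team) could look as follows; scipy stands in for SPSS here, and the ten values per day are placeholders rather than the study data:

```python
# Paired-samples t-test comparing each team's day-1 and day-5
# discrepancy values (placeholder numbers; the study used SPSS V.21).
from scipy import stats

day1 = [28.0, 25.0, 31.0, 22.0, 30.0, 27.0, 29.0, 26.0, 33.0, 24.0]
day5 = [1.0, 3.0, -2.0, 0.0, 4.0, 2.0, -1.0, 1.0, 5.0, -3.0]

t_stat, p_value = stats.ttest_rel(day1, day5)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # compare against p < 0.05
```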
Results
Thirty first-year general surgery residents participated in the training sessions. Participants had an average age of 27±2 years, and 77% were men.
Self-assessments of team effectiveness were 67%±7% for day 1, 76%±10% for day 2, 42%±21% for day 3, 62%±13% for day 4 and 81%±20% for day 5. Self-assessments of teamwork were 65%±12% for day 1, 85%±7% for day 2, 58%±29% for day 3, 91%±8% for day 4 and 97%±6% for day 5.
Faculty assessments of team effectiveness were 39%±20% for day 1, 52%±12% for day 2, 22%±17% for day 3, 63%±9% for day 4 and 79%±22% for day 5. Faculty assessments of teamwork were 49%±15% for day 1, 49%±21% for day 2, 10%±9% for day 3, 61%±11% for day 4 and 65%±22% for day 5.
Resident teams rated team effectiveness higher than faculty across the first 3 days (day 1: 67% vs 39%, day 2: 76% vs 52%, day 3: 42% vs 22%, all p<0.01). However, resident teams and faculty provided similar ratings for the fourth and fifth simulations (day 4: 62% vs 63%, day 5: 81% vs 79%). Additionally, team effectiveness discrepancy values decreased significantly from day 1 to day 5 (28% → 1%, p<0.001), indicating that resident teams substantially improved their self-assessment accuracy by the end of the week (figure 1).
Figure 1.
Team effectiveness across scenarios.
Teams rated their teamwork higher than faculty across all days (day 1: 65% vs 49%, day 2: 85% vs 49%, day 3: 58% vs 10%, day 4: 91% vs 61%, day 5: 97% vs 65%, all p<0.01), and the change in teamwork discrepancy values from day 1 to day 5 did not reach significance (16% → 32%, p=0.07) (figure 2).
Figure 2.
Teamwork across scenarios.
Finally, top-performing teams (ie, those in the highest quartile) provided more accurate self-assessments for both teamwork (average discrepancy 8% vs 39%) and team performance (average discrepancy 12% vs 23%). These discrepancies are displayed in figures 3 and 4.
Figure 3.
Team effectiveness discrepancy between top 25% and other teams.
Figure 4.
Teamwork discrepancy between top 25% and other teams.
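The quartile split behind figures 3 and 4 can be sketched as follows; the function and column names are assumptions for illustration, and the use of mean absolute discrepancy as the accuracy summary is likewise an assumption:

```python
# Sketch of the top-quartile comparison; column names are assumed.
import pandas as pd

def quartile_discrepancies(df: pd.DataFrame) -> pd.Series:
    """df: one row per team, with 'performance' (faculty score, 0-100)
    and 'discrepancy' (team self-rating minus faculty rating)."""
    cutoff = df["performance"].quantile(0.75)
    top = df["performance"] >= cutoff  # True = top-quartile team
    # Mean absolute discrepancy for top-quartile teams vs all others.
    return df.groupby(top)["discrepancy"].apply(lambda s: s.abs().mean())
```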
Discussion
Our results suggest that teams in the early stages of formation overestimate their effectiveness. However, after working together on different patient cases over time, their self-assessment ratings can become just as accurate as faculty ratings. Self-assessments of teamwork, on the other hand, were significantly inflated compared with faculty ratings across all performance episodes. These differences in self-assessment accuracy between teamwork and team effectiveness were surprising. Even though teams rated the construct of teamwork using the same tool and anchors in each scenario, their ratings did not become more accurate over time. The inflated ratings of teamwork abilities, relative to clinical performance, may stem from prior experiences. For example, all trainees have prior experience working on teams (sports, projects and so on), which could lead them to overestimate those skills. However, trainees may have had few, if any, experiences managing the clinical scenarios and could thus rate themselves lower without any concern about whether they ‘should’ already have those skills.
Our data also showed that low-performing teams provided the most inaccurate ratings for both team effectiveness and teamwork. Top-performing teams, on the other hand, were more accurate estimators of their own abilities. These findings align with the well-known Dunning-Kruger effect,7 which describes how the least competent are ‘unskilled and unaware’, relying on mistaken impressions that they are doing just fine. In this way, low-performing teams suffer a dual burden: not only do they lack the knowledge and skills needed to produce high performance, but their incompetence also keeps them from realising it.
These findings have a number of implications for simulation educators. First, our study adds to the large body of research demonstrating the effectiveness of simulation for enhancing team effectiveness competencies, as teams exhibited significantly better team performance from the first to the last case. Additionally, these data suggest that simulation can be a venue through which teams have the opportunity to review and critique their own effectiveness, and that this process enhances the accuracy of those team effectiveness assessments. This work also highlights that simply having teams work together and review their performance may not be enough for important teamwork competencies to emerge. Dedicated curricula or debriefing sessions focused on teamwork may be necessary to foster team leadership, coordination, communication and collaboration skills. Alternatively, if these skills do develop naturally with accumulated experience, more than five performance episodes working together may be necessary.
Although we perceive that we have done a competent job in designing this study and drawing out relevant implications, our own thesis suggests that this may be an overestimation of our skills. Thus, these findings should be considered in light of the study’s limitations. These data represent simulation-based performance and evaluations from one cohort of general surgery interns at a single institution, limiting the generalisability of these findings. Additionally, it is plausible that the assessments provided by the faculty members were inaccurate. We specifically chose individuals with substantial experience and expertise with each of the instruments, and many of the tools included objective indicators (ie, intubated within the first 2 min), but it is possible that their ratings did not reflect reality.
Conclusion
Teams that continue to work together over time may become more accurate judges of their own performance, but do not become more accurate assessors of teamwork competencies. Low-performing teams provide the most inaccurate ratings of team-related skills. Future work should explore what additional tools or interventions can be used to facilitate accurate team self-assessments.
Footnotes
Contributors: AKG and KA contributed to the design and implementation of the study and to the preparation of the manuscript.
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
Presented at: Presented at the Annual ACS Surgical Simulation Meeting, Chicago, IL (16–17 March 2018).
References
- 1. Evans AW, Leeson RM, Petrie A. Reliability of peer and self-assessment scores compared with trainers' scores following third molar surgery. Med Educ 2007;41:866–72. doi:10.1111/j.1365-2923.2007.02819.x
- 2. Pandey VA, Wolfe JH, Black SA, et al. Self-assessment of technical skill in surgery: the need for expert feedback. Ann R Coll Surg Engl 2008;90:286–90. doi:10.1308/003588408X286008
- 3. Brewster LP, Risucci DA, Joehl RJ, et al. Comparison of resident self-assessments with trained faculty and standardized patient assessments of clinical and technical skills in a structured educational module. Am J Surg 2008;195:1–4. doi:10.1016/j.amjsurg.2007.08.048
- 4. Arora S, Miskovic D, Hull L, et al. Self vs expert assessment of technical and non-technical skills in high fidelity simulation. Am J Surg 2011;202:500–6. doi:10.1016/j.amjsurg.2011.01.024
- 5. Vyasa P, Willis RE, Dunkin BJ, et al. Self-assessment of endoscopic skills by general surgery residents: does video observation help? J Surg Educ 2017;74:23–9.
- 6. Ward M, MacRae H, Schlachta C, et al. Resident self-assessment of operative performance. Am J Surg 2003;185:521–4. doi:10.1016/S0002-9610(03)00069-2
- 7. Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Pers Soc Psychol 1999;77:1121–34. doi:10.1037/0022-3514.77.6.1121
- 8. Gardner AK, Ahmed RA. Transforming trauma teams through transactive memory: how simulation can enhance performance. Simul Gaming 2014;45:356–70.
- 9. Joshi K, Hernandez J, Martinez J, et al. Should they stay or should they go now? Exploring the impact of team familiarity on interprofessional team training outcomes. Am J Surg 2018;215. doi:10.1016/j.amjsurg.2017.08.048
- 10. Gardner AK, Scott DJ, AbdelFattah KR. Do great teams think alike? An examination of team mental models and their impact on team performance. Surgery 2017;161:1203–8. doi:10.1016/j.surg.2016.11.010
- 11. Malec JF, Torsher LC, Dunn WF, et al. The Mayo High Performance Teamwork Scale: reliability and validity for evaluating key crew resource management skills. Simul Healthc 2007;2:4–10. doi:10.1097/SIH.0b013e31802b68ee