Abstract
Background:
Surgeons must select cases whose complexity aligns with their skill set.
Objectives:
To determine how accurately trainees report involvement in procedures, judge case complexity, and assess their own skills.
Methods:
We recruited attendings and trainees from two otolaryngology departments. After performing septoplasty, they completed identical surveys regarding case complexity, achievement of goals, who performed which steps, and trainee skill using the septoplasty global assessment tool (SGAT) and visual analog scale (VAS). Agreement regarding which steps were performed by the trainee was assessed with Cohen's kappa coefficients (κ). Correlations between trainee and attending responses were measured with Spearman's correlation coefficients (rho).
Results:
Seven attendings and 42 trainees completed 181 paired surveys. Trainees and attendings sometimes disagreed about which steps were performed by trainees (range of κ = 0.743–0.846). Correlation between attending and trainee responses was low for VAS skill ratings (range of rho = 0.12–0.34), SGAT questions (range of rho = 0.03–0.53), and evaluation of case complexity (range of rho = 0.24–0.48).
Conclusion:
Trainees sometimes disagree with attendings about which septoplasty steps they perform and are limited in their ability to judge complexity, goals, and their skill.
Introduction
Residencies and fellowships aim to train physicians who can practice independently upon completion of training. Currently, preparation for independent practice is judged by the Accreditation Council for Graduate Medical Education (ACGME) using self-reported case logs. Residents must log a prespecified number of “key indicator” cases in which they performed the majority of the critical steps. Little is known, however, about how accurately trainees report their level of involvement in cases. Furthermore, involvement does not necessarily imply competence.
KEY POINTS
Question: Do surgical trainees agree with attending surgeons on what happens in the operating room when performing nasal septoplasty?
Findings: In this study, surgical trainees sometimes disagreed with attending surgeons about their level of involvement in procedures and differed significantly from attending surgeons in evaluating case complexity and their own skill when performing septoplasty.
Meaning: Changes to resident and fellow training focused on addressing these differences are warranted.
Therefore, objective measures of skills and knowledge have become popular in otolaryngology and other specialties.1–5 Although technical skills and knowledge are prerequisites for practicing competently, equally important are awareness of one's skills, knowledge, and limitations and the ability to relate these to the complexity of patient cases.6 Illustrating the importance of trainee self-awareness, surgical residents' self-perceived competence in procedures affects their practice patterns after graduation, whereas case volumes do not.7
In otolaryngology, trainees feel they have gained subjective competence in common procedures later than is expected by residency program directors, suggesting misalignment between trainee self-perception and faculty assessment of trainees.2 These findings highlight the importance of teaching residents to self-assess. However, little is known about the ability of trainees to assess their own competence. Furthermore, no studies have assessed the ability of trainees to judge case complexity or understand the achievement of surgical goals. Evaluation of case complexity is critical in case selection, whereas understanding surgical goals allows a surgeon to terminate an operation when they have been achieved.
Utilizing septoplasty as a model operation, we assessed the accuracy with which trainees report their level of involvement in procedures, as well as how well they evaluate case complexity, surgical goals, and their own skills. To do this, we compared identical surveys and assessments completed by trainees and attendings immediately after performing septoplasties.
Materials and Methods
Participants
We recruited attendings, residents, and facial plastic surgery fellows at Johns Hopkins University School of Medicine and Washington DC Veterans Affairs Medical Center. The institutional review boards at Johns Hopkins and Washington DC Veterans Affairs Medical Center approved this study. Data were de-identified before analysis.
Survey instruments
Participants completed demographic surveys including age, gender, ethnicity, and level of training. For this study, septoplasty was considered to consist of five steps: (1) opening the septum or making the incision, (2) elevating septal flaps, (3) removing deviated bone and cartilage, (4) reconstructing the septum, including replacing scored or straightened cartilage, and (5) closing the incision. At the conclusion of the case, the attending and trainee each completed a survey. They both noted which steps were performed by the trainee and rated the trainee's skill level for each step performed using a visual analog scale (VAS) from 0 (novice) to 100 (master).
Both also rated the trainee's skill using the septoplasty global assessment tool (SGAT), a 7-item tool rating various aspects of skill and knowledge in performing septoplasty.8 They were asked to rate the technical complexity of the surgery, the severity of anatomical deflection of the septum, the severity of the pathological process causing the patient's symptoms, and how well the goals of the surgery were achieved. Each of these questions was rated from 1 to 10. Trainees and attendings were blinded to each other's responses.
Statistical analysis
Agreement between trainees and attendings regarding who performed which steps was analyzed by computing Cohen's kappa coefficient and percentage agreement for each step. Agreement between trainee and attending responses to VAS, SGAT, and case complexity questions was analyzed using Spearman's rank correlation coefficients (rho). The mean differences between trainee and attending responses for each question, along with 95% limits of agreement, were plotted based on the method of Bland and Altman.9,10 To control for possible variation in ratings between different attendings, repeated measures correlation coefficients were computed.11
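As an illustration of the agreement analysis, the following Python sketch (the study's analyses were performed in R; this only mirrors the calculation) computes Cohen's kappa and percentage agreement from a 2×2 trainee-versus-attending table, using the counts reported in Table 2 for the opening step:

```python
# Cohen's kappa and percentage agreement for one operative step,
# from a 2x2 table of trainee vs. attending "did the trainee perform
# this step?" responses.

def kappa_and_agreement(yes_yes, yes_no, no_yes, no_no):
    """Rows = trainee (yes/no), columns = attending (yes/no)."""
    n = yes_yes + yes_no + no_yes + no_no
    # Observed agreement: both raters answered the same way
    p_observed = (yes_yes + no_no) / n
    # Expected agreement if the two raters answered independently
    p_trainee_yes = (yes_yes + yes_no) / n
    p_attending_yes = (yes_yes + no_yes) / n
    p_expected = (p_trainee_yes * p_attending_yes
                  + (1 - p_trainee_yes) * (1 - p_attending_yes))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_observed

# Counts for "opening the septum" (Table 2): 128 yes/yes, 9 yes/no,
# 7 no/yes, 37 no/no across the 181 paired surveys.
k, agree = kappa_and_agreement(128, 9, 7, 37)
print(round(k, 3), round(agree * 100, 1))  # 0.763 91.2
```

Reassuringly, this reproduces the kappa of 0.763 and the 91.2% agreement reported for that step.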
The presented correlation coefficients measure the strength of association from −1 (perfect negative correlation, where high attending scores correspond to low trainee scores) through 0 (no correlation) to 1 (perfect positive correlation, where high attending scores correspond to high trainee scores). Associated p-values describe the statistical significance of each correlation under the null hypothesis of zero correlation. A statistically significant p-value, however, should not be interpreted as evidence of a strong relationship between the two variables, especially when the correlation coefficient is low (close to 0).
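For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation of the ranks; with no ties it reduces to one minus six times the sum of squared rank differences over n(n² − 1). A minimal Python sketch with hypothetical complexity ratings (the study's analyses were performed in R):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via the classic formula
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical 1-10 complexity ratings for five cases (illustrative
# numbers, not study data):
attending = [3, 7, 5, 9, 2]
trainee = [4, 6, 8, 7, 3]
print(spearman_rho(attending, trainee))  # prints 0.7
```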
To measure the effect of training level on agreement between attendings and trainees, trainees were categorized into two categories: senior trainees (postgraduate year [PGY]-5 to fellow level) and junior trainees (PGY-1 to PGY-4 level). These cutoffs were chosen because PGY-5 residents and fellows are completing training and preparing to enter independent practice, whereas otolaryngology residents at MedStar Georgetown and Johns Hopkins have their first facial plastic surgery rotations as PGY-3 and PGY-4 residents, respectively.
To assess whether agreement between attendings and trainees was higher among senior or junior trainees, a multivariate comparison of the Spearman's correlation coefficients computed for each question within each group was performed. To perform this test, the sampling distribution of each correlation coefficient was transformed using the Fisher-Z transformation to obtain a normal sampling distribution. These derived normal distributions were then compared using a multivariate analysis of variance.12 Figure 1 summarizes the overall study design. Analyses were performed using R version 4.0.2.
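A simplified, univariate version of this comparison can be sketched as follows. The study compared the full vector of coefficients with a multivariate analysis of variance; this Python sketch tests a single pair of independent correlations via the Fisher-Z transformation, and the sample sizes and coefficients below are illustrative, not the study's:

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test that two independent correlation coefficients
    are equal. Each r is mapped to z = atanh(r), whose sampling
    distribution is approximately normal with variance 1/(n - 3)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical example: junior-group rho = 0.25 on 110 paired surveys
# vs. senior-group rho = 0.30 on 71 paired surveys.
z, p = compare_correlations(0.25, 110, 0.30, 71)
print(round(z, 2), round(p, 2))
```

A large p-value here, as in the study's multivariate test, would indicate no detectable difference in agreement between junior and senior trainees.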
Fig. 1.
Flow diagram summarizing methods of data collection and data analysis.
Results
We analyzed 181 paired surveys from 151 septoplasty procedures; there were more paired surveys than cases because more than one trainee participated in a subset of procedures. Seven attendings and 42 trainees participated. There were four male and three female attendings. Trainee demographics are summarized in Table 1. All training levels (PGY-1 to fellow) were included.
Table 1.
Trainee demographics
| Characteristic | n (%) |
|---|---|
| Age | |
| 25–30 | 21 (50) |
| 31–35 | 19 (45) |
| 36+ | 2 (5) |
| Gender | |
| Female | 20 (48) |
| Male | 22 (52) |
| Training level | |
| PGY-1 to 4 | 26 (62) |
| PGY-5 to fellow | 16 (38) |
| Ethnicity | |
| Hispanic | 2 (5) |
| Not Hispanic | 39 (93) |
| Not specified | 1 (2) |
| Race | |
| White | 28 (67) |
| Asian | 13 (31) |
| Black | 1 (2) |
PGY, postgraduate year.
Attendings and trainees sometimes disagreed about which steps were performed by the trainee. Agreement between attendings and trainees regarding which steps were performed by the trainee and Cohen's kappa coefficients for each step are summarized in Table 2. Correlations between attending and trainee ratings of SGAT, VAS, and case complexity questions were all relatively low (Table 3). Correlations were notably low regarding how well the trainee understood the objectives of the surgery (rho = 0.03, p = 0.72), how complex the case was technically (rho = 0.28, p < 0.001), how well the goals of the surgery were achieved (rho = 0.25, p < 0.001), and how skilled the trainee was in removing deviated cartilage and bone (rho = 0.17, p = 0.08) and in reconstructing the septum (rho = 0.12, p = 0.33).
Table 2.
Agreement tables for each operative step and Cohen's kappa coefficients
| Operative step | Trainee yes / Attending yes | Trainee yes / Attending no | Trainee no / Attending yes | Trainee no / Attending no | Cohen's kappa | Percentage agreement |
|---|---|---|---|---|---|---|
| Opening the septum | 128 | 9 | 7 | 37 | 0.763 | 91.2% |
| Elevating septal flaps | 135 | 7 | 5 | 34 | 0.807 | 93.4% |
| Removing septum | 107 | 8 | 5 | 61 | 0.846 | 92.8% |
| Septal reconstruction | 69 | 14 | 9 | 89 | 0.743 | 87.3% |
| Closing | 115 | 14 | 3 | 49 | 0.784 | 90.6% |
Table 3.
Spearman's correlation coefficients (rho) and repeated measures correlation coefficients (r) for association between trainee and attending responses
Figure 2 shows the mean differences between trainee and attending responses along with 95% limits of agreement, demonstrating wide variation in the differences for most questions. Although trainees' average responses were higher or lower than attendings' for certain questions, the wide limits of agreement show that individual trainees deviated in both directions rather than consistently differing from attendings in one direction.
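The limits of agreement in Figure 2 follow the standard Bland-Altman construction: the mean paired difference plus or minus 1.96 standard deviations of the differences. A minimal Python sketch with hypothetical VAS ratings (the study's analyses were performed in R, and these numbers are illustrative):

```python
import statistics

def limits_of_agreement(trainee_scores, attending_scores):
    """Mean trainee-minus-attending difference and 95% Bland-Altman
    limits of agreement (mean +/- 1.96 SD of the differences)."""
    diffs = [t - a for t, a in zip(trainee_scores, attending_scores)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# Hypothetical paired VAS skill ratings (0-100) for six cases:
trainee = [60, 72, 55, 80, 65, 70]
attending = [50, 75, 40, 85, 50, 60]
mean_d, lower, upper = limits_of_agreement(trainee, attending)
```

A mean difference near zero with wide limits, as seen for most questions in Figure 2, means trainees were not systematically biased in one direction but individually diverged substantially from attendings.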
Fig. 2.

Mean differences between trainee and attending responses and 95% limits of agreement for (A) visual analog scale ratings, (B) septoplasty global assessment tool, and (C) case complexity questions.
Repeated measures correlation coefficients, which were computed by removing variation in the data attributable to differences between attending surgeon rating styles, were at most modestly higher than the Spearman's coefficients, indicating that the low correlations were not related to differences in the way attendings responded (Table 3). Multivariate testing showed no significant difference between the Spearman coefficients computed within the junior trainee group and senior trainee group (p = 0.45).
Discussion
Surgical training programs endeavor to produce surgeons competent to practice independently. Part of assuring the graduation of competent surgeons involves accurately tracking trainees' involvement in surgical cases. In our study, trainees sometimes disagreed with attendings regarding who performed which steps of the operation. This is problematic in the setting of an “honor system” based case log in which trainees report whether they served as “assistant surgeon,” “resident surgeon,” or “resident supervisor,” depending on their level of involvement.
For key indicator cases in which residents participate less frequently and for which only a small number is required for graduation (e.g., residents must perform eight rhinoplasties as resident or supervising surgeon), inaccurate self-reporting may result in the graduation of surgeons with significantly varied surgical experience. Our data indicate that further study may be required if self-reported ACGME case logs are to be used as a significant factor in determining competency or resident experience.
Furthermore, trainees struggled to evaluate case complexity and demonstrated poor understanding of the achievement of surgical goals. Although these variables may seem intuitive to attendings, it is possible that trainees receive little instruction in evaluating case complexity and in setting and assessing achievement of surgical goals. This suggests that surgeons who have recently completed training may unknowingly select cases that are more complex than they are equipped to handle. Furthermore, they might stop a case prematurely or perform unnecessary steps if they do not understand how surgical goals define surgical “stopping rules.” These findings suggest that formal instruction in evaluating case complexity and assessing achievement of surgical goals could be beneficial.
In addition to accurately recording trainee participation in cases, assuring graduation of competent trainees may include objectively measuring technical skills and knowledge.1–5 Toward this end, objective assessment tools in otolaryngology have been developed for tonsillectomy, endoscopic sinus surgery, direct laryngoscopy, mastoidectomy, and septoplasty.8,13–16 Objective assessment tools can determine whether trainees have obtained the necessary skills and knowledge to independently perform a procedure.
Ideally, in addition to gaining skills and knowledge, trainees should develop the ability to assess their own skills, knowledge, and limitations and relate these to their understandings of patient cases. This is especially important in practice building and case selection.7 Highlighting the importance of self-awareness, the American Board of Medical Specialties lists self-assessment and lifelong learning as one of four elements in its Maintenance of Certification Program.17 Our study suggests that surgical trainees are poor at assessing their own skills and knowledge.
The wide variation in the differences in responses between trainees and attendings in our study demonstrates that some trainees may overestimate, whereas others underestimate, their skills and knowledge. An overestimation of abilities could lead surgeons to take on cases that are beyond their abilities, potentially placing patients at risk of complications. An underestimation of abilities could lead to a lack of confidence that limits a trainee's growth and restricts his or her scope of practice.
The low correlations observed were present even among PGY-5 residents and fellows. In fact, there was no significant difference in the correlations calculated within this group of senior trainees and the group of more junior trainees. Low correlations were observed even when controlling for differences in attending rating styles, suggesting our findings were related primarily to variance within trainee responses.
Our findings agree with other studies that suggest that physicians are poor at self-assessment with weak or no association between self-assessments and external assessments of performance across multiple specialties.18–24 Although these findings are concerning, perhaps most worrisome is the finding that the physicians who are the worst at self-assessment are also the least competent as measured by external assessments, consistently overestimating their own abilities.18–21,25 These findings suggest that it is critical that trainees be given the tools to self-assess accurately.
Specifically in otolaryngology, Kim et al. conducted a study of residents performing thyroidectomy and found that trainees had high self-awareness of their performances, which contrasts with our findings.26 Notably, this single-institution study included only 11 residents and 1 attending who was not blinded to trainee self-assessments, potentially introducing bias into the results. It is also possible that incompetence in thyroidectomy is more readily recognized by trainees due to immediately noticeable complications (e.g., recurrent laryngeal nerve injury).
Although our findings are concerning, they suggest avenues for improving training. For example, future studies might use tool-tracking data or other methods to objectively measure participation in cases. Studies have shown that feedback including comparison of self-assessments with external assessments can improve self-rating abilities.27,28
In teaching septoplasty, perhaps frequent feedback sessions that include comparison of self-assessments with attending assessments, such as those from our study, could aid residents to understand their skill level and deficiencies. Pre- and postoperative discussions of case complexity could improve trainees' ability to evaluate and select appropriate cases. Teaching surgical goals related to patients' symptoms and pathologies and postoperative debriefings regarding whether goals were met could help trainees understand surgical stopping rules.
This study has limitations. Although we included two academic institutions, it is plausible that our findings might not generalize to all training programs. Although we measured agreement between trainees and attendings, our study did not evaluate all factors that might affect this agreement. Notably, training level did not influence agreement. Future studies might seek to identify factors that predict which trainees are most likely to disagree with attendings, as these trainees could require additional attention and mentoring. The use of multiple attendings in our study introduces the possibility of variance in rating style.
However, our findings remained consistent even when controlling for variance in attending rating styles. Even so, third-party external observers could serve to validate attending responses: although attendings and trainees sometimes disagreed about which steps were performed by the trainee, we did not analyze objective measures of who was correct. Finally, we did not correlate agreement with surgical outcomes; future studies should correlate self-assessment skills with patient-reported outcome measures.
Conclusion
Trainees sometimes disagree with attendings regarding who performs which steps of septoplasties. Furthermore, there is low correlation between trainee and attending assessments of case complexity, achievement of surgical goals, and trainee skill level.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was supported by grant R01-DE025265 (M.I.) from the National Institutes of Health.
References
- 1. Brown DJ, Thompson RE, Bhatti NI. Assessment of operative competency in otolaryngology residency: survey of US Program Directors. Laryngoscope. 2008;118(10):1761–1764. [DOI] [PubMed] [Google Scholar]
- 2. Chadwick KA, Dodson KM, Wan W, Reiter ER. Attainment of surgical competence in otolaryngology training. Laryngoscope. 2015;125(2):331–336. [DOI] [PubMed] [Google Scholar]
- 3. Cooney CM, Cooney DS, Bello RJ, Bojovic B, Redett RJ, Lifchez SD. Comprehensive observations of resident evolution: a novel method for assessing procedure-based residency training. Plast Reconstr Surg. 2016;137(2):673–678. [DOI] [PubMed] [Google Scholar]
- 4. Joyner BD. An historical review of graduate medical education and a protocol of Accreditation Council for Graduate Medical Education compliance. J Urol. 2004;172(1):34–39. [DOI] [PubMed] [Google Scholar]
- 5. Marple BF. Competency-based resident education. Otolaryngol Clin North Am. 2007;40(6):1215–1225, vi-vii. [DOI] [PubMed] [Google Scholar]
- 6. Duffy FD, Holmboe ES. Self-assessment in lifelong learning and improving performance in practice: physician know thyself. JAMA. 2006;296(9):1137–1139. [DOI] [PubMed] [Google Scholar]
- 7. Fronza JS, Prystowsky JP, DaRosa D, Fryer JP. Surgical residents' perception of competence and relevance of the clinical curriculum to future practice. J Surg Educ. 2012;69(6):792–797. [DOI] [PubMed] [Google Scholar]
- 8. Obeid AA, Al-Qahtani KH, Ashraf M, Alghamdi FR, Marglani O, Alherabi A. Development and testing for an operative competency assessment tool for nasal septoplasty surgery. Am J Rhinol Allergy. 2014;28(4):e163–e167. [DOI] [PubMed] [Google Scholar]
- 9. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–310. [PubMed] [Google Scholar]
- 10. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135–160. [DOI] [PubMed] [Google Scholar]
- 11. Bland JM, Altman DG. Statistics notes: calculating correlation coefficients with repeated observations: part 1—correlation within subjects. BMJ. 1995;310(6977):446. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Rao CR, ed. Essential Statistical Methods for Medical Statistics: A Derivative of Handbook of Statistics: Epidemiology and Medical Statistics, Vol. 27. 1 ed. Amsterdam, Netherlands: Elsevier; 2011. [Google Scholar]
- 13. Ahmed A, Ishman SL, Laeeq K, Bhatti NI. Assessment of improvement of trainee surgical skills in the operating room for tonsillectomy. Laryngoscope. 2013;123(7):1639–1644. [DOI] [PubMed] [Google Scholar]
- 14. Laeeq K, Lin SY, Varela DADV, Lane AP, Reh D, Bhatti NI. Achievement of competency in endoscopic sinus surgery of otolaryngology residents. Laryngoscope. 2013;123(12):2932–2934. [DOI] [PubMed] [Google Scholar]
- 15. Ishman SL, Benke JR, Johnson KE, et al. Blinded evaluation of interrater reliability of an operative competency assessment tool for direct laryngoscopy and rigid bronchoscopy. Arch Otolaryngol Head Neck Surg. 2012;138(10):916–922. [DOI] [PubMed] [Google Scholar]
- 16. Malik MU, Varela DADV, Park E, et al. Determinants of resident competence in mastoidectomy: role of interest and deliberate practice. Laryngoscope. 2013;123(12):3162–3167. [DOI] [PubMed] [Google Scholar]
- 17. Standards for the ABMS Program for Maintenance of Certification (MOC). https://www.abms.org/wp-content/uploads/2020/11/standards-for-the-abms-program-for-moc-final.pdf (Accessed November 15, 2021).
- 18. Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: a systematic review. JAMA. 2006;296(9):1094–1102. [DOI] [PubMed] [Google Scholar]
- 19. Fox RA, Ingham Clark CL, Scotland AD, Dacre JE. A study of pre-registration house officers' clinical skills. Med Educ. 2000;34(12):1007–1012. [DOI] [PubMed] [Google Scholar]
- 20. Leopold SS, Morgan HD, Kadel NJ, Gardner GC, Schaad DC, Wolf FM. Impact of educational intervention on confidence and competence in the performance of a simple surgical task. J Bone Joint Surg Am. 2005;87(5):1031–1037. [DOI] [PubMed] [Google Scholar]
- 21. Parker RW, Alford C, Passmore C. Can family medicine residents predict their performance on the in-training examination? Fam Med. 2004;36(10):705–709. [PubMed] [Google Scholar]
- 22. Claridge JA, Calland JF, Chandrasekhara V, Young JS, Sanfey H, Schirmer BD. Comparing resident measurements to attending surgeon self-perceptions of surgical educators. Am J Surg. 2003;185(4):323–327. [DOI] [PubMed] [Google Scholar]
- 23. Ireton HR, Sherman M. Self-ratings of graduating family practice residents' psychological medicine abilities. Fam Pract Res J. 1988;7(4):236–244. [PubMed] [Google Scholar]
- 24. Johnson D, Cujec B. Comparison of self, nurse, and physician assessment of residents rotating through an intensive care unit. Crit Care Med. 1998;26(11):1811–1816. [DOI] [PubMed] [Google Scholar]
- 25. Hodges B, Regehr G, Martin D. Difficulties in recognizing one's own incompetence: novice physicians who are unskilled and unaware of it. Acad Med. 2001;76(10 Suppl):S87–S89. [DOI] [PubMed] [Google Scholar]
- 26. Kim AH, Vaughn CA, King DL, Maizels M, Meade P, Stack BC. Assessment of operative competency for thyroidectomy: comparison of resident self-assessment vs attending surgeon assessment. Head Neck. 2020;42(12):3551–3557. [DOI] [PubMed] [Google Scholar]
- 27. Gordon MJ. A review of the validity and accuracy of self-assessments in health professions training. Acad Med. 1991;66(12):762–769. [DOI] [PubMed] [Google Scholar]
- 28. Moret L, Tequi B, Lombrail P. Should self-assessment methods be used to measure compliance with handwashing recommendations? A study carried out in a French university hospital. Am J Infect Control. 2004;32(7):384–390. [DOI] [PubMed] [Google Scholar]

