Key Points
Question
Can intraoperative recordings of mastoidectomy be used to evaluate surgeon experience and surgical technique?
Findings
In this observational study of 24 intraoperative recordings of mastoidectomies performed by 12 surgeons of different experience levels, attending surgeons performed substantially more strokes per unit time using the drill and achieved higher ratings on drilling efficiency, stroke pattern, use of suction, and use of irrigation compared with junior residents. There was fair to excellent intraclass correlation among the 3 observers who evaluated the videos.
Meaning
Observation of intraoperative mastoidectomy recordings is a feasible method of evaluating surgeon experience and technique.
Abstract
Importance
Otolaryngology residency programs currently lack rigorous methods for assessing surgical skill and often rely on biased tools of evaluation.
Objectives
To evaluate which techniques used in mastoidectomy can serve as indicators of surgeon level (defined as the level of training) and whether these determinations of technique can be made based solely on the movement of the drill head or suction.
Design, Setting, and Participants
In this prospective, observational study conducted from January 1, 2015, to December 31, 2019, at a single tertiary care institution, 3 independent observers made blinded evaluations on 24 intraoperative recordings of surgeons (6 junior residents, 4 senior residents, and 2 attending surgeons) performing mastoidectomies.
Main Outcomes and Measures
Observers assessed drill stroke count, drilling efficiency, stroke pattern, use of suction and irrigation, and estimated surgeon level. Assessments were made on both original videos and animated videos that show only the path of the burr head or suction as dots against a white background.
Results
Among the 24 recorded mastoidectomies performed by the 12 study surgeons, intraclass correlation was excellent for original video assessment of drill stroke count (0.98 [95% CI, 0.97-1.00]), use of suction (0.75 [95% CI, 0.52-0.89]), use of irrigation (0.83 [95% CI, 0.66-0.92]), and estimated surgeon level (0.82 [95% CI, 0.64-0.92]) and fair for drilling efficiency (0.54 [95% CI, 0.09-0.79]) and stroke pattern (0.49 [95% CI, −0.02 to 0.76]). Intraclass correlation was excellent for animated video assessment of drill stroke count per unit time (0.98 [95% CI, 0.96-0.99]) and drilling efficiency (0.80 [95% CI, 0.60-0.91]), good for stroke pattern (0.68 [95% CI, 0.38-0.85]) and estimated surgeon level (based on path of drill) (0.69 [95% CI, 0.38-0.85]), and fair for use of suction (0.58 [95% CI, 0.16-0.80]) and estimated surgeon level (based on path of suction) (0.58 [95% CI, 0.17-0.80]). On evaluation of original videos, junior residents had lower drill stroke count compared with senior residents and attending surgeons (6.0 [interquartile range (IQR), 3.0-8.0] vs 9.5 [IQR, 5.0-13.0] vs 10.5 [IQR, 5.0-17.8]; η² = 0.14 [95% CI, 0.01-0.28]). On evaluation of animated videos, junior residents also had lower drill stroke count compared with senior residents and attending surgeons (6.0 [IQR, 4.0-9.0] vs 10.5 [IQR, 10.0-13.8] vs 10.5 [IQR, 4.3-21.0]; η² = 0.19 [95% CI, 0.04-0.33]).
Compared with junior and senior residents, attending surgeons had higher median ratings of drilling efficiency (original videos: junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 3.0-4.8]; attending surgeons, 5.0 [IQR, 4.3-5.0]; η² = 0.23 [95% CI, 0.06-0.37]; animated videos: junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 3.0 [IQR, 2.0-4.0]; attending surgeons, 5.0 [IQR, 4.0-5.0]; η² = 0.25 [95% CI, 0.08-0.39]) and stroke pattern (original videos: junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 3.0-4.8]; attending surgeons, 5.0 [IQR, 5.0-5.0]; η² = 0.17 [95% CI, 0.03-0.31]; animated videos: junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 2.0-4.0]; attending surgeons, 5.0 [IQR, 5.0-5.0]; η² = 0.15 [95% CI, 0.02-0.29]).
Conclusions and Relevance
This study suggests that observation of intraoperative mastoidectomy recordings is a feasible method of evaluating surgeon level. Reasonable indicators of surgeon level include the drill stroke count, drilling efficiency, stroke pattern, and use of the suction irrigator. Observing the path of the drill alone is sufficient to appreciate differences in drilling technique but not sufficient to accurately determine surgeon level. Intraoperative recordings can serve as a useful addition to resident education and evaluation.
This quality improvement study examines which techniques used in mastoidectomy can serve as indicators of surgeon level and whether these determinations of technique can be made based solely on the movement of the drill head or suction.
Introduction
Otolaryngology residency programs rely on various assessment tools to evaluate the surgical performance of trainees. These tools include surgical case logs, oral and written examinations, and/or subjective evaluations of residents under direct observation in the operating room.1 Some training programs have developed tools to assess surgical skills in laboratories or simulation settings.2,3 However, these tools are subject to observer bias by the evaluator, some of the tools are timed measures, and they do not necessarily assess the skill level of otolaryngology trainees in real operative situations. In addition, these tools sometimes rely on imprecise narrative descriptions by the evaluator.
Surgical recordings are a useful complement to resident training,4,5 especially as many institutions now readily have video recording capabilities. Evaluation of intraoperative videos provides an opportunity to assess surgical performance while reducing observer bias, especially if carried out in a blinded manner. It allows evaluators to differentiate techniques used by otolaryngologists of various skill levels.6 In the present study, we aim to evaluate (1) whether observation of short intraoperative mastoidectomy recordings is a feasible method of evaluating surgeon level (defined as the level of training), (2) which techniques used in mastoidectomy can serve as indicators of surgeon level, and (3) whether these determinations of technique can be made based solely on the movement of the drill head or suction. To our knowledge, the present study is the first of its kind and builds on a previous study by the same institution.7
Methods
Intraoperative Video Selection
In this observational study, intraoperative videos were retrospectively selected from mastoidectomies performed at a single tertiary care institution, Medical University of South Carolina hospital, from January 1, 2015, to December 31, 2019. Intraoperative recordings are routinely conducted for the purpose of resident education at our institution. Written informed consent for recording is included in the consent for the surgical procedure. All surgical procedures were performed on adult patients with no history of otologic disease who were receiving cochlear implantation and resulted in successful implantation. Recordings from 12 surgeons (6 junior residents, 4 senior residents, and 2 neurotology attending surgeons) were sampled at 2 different segments during the second minute of drilling, resulting in twenty-four 5-second clips. Five-second clips were chosen for analysis because we wanted to assess if differences in techniques and surgeon level could be feasibly determined in a short period of time. The second minute of drilling was chosen to minimize confounding influences that complex temporal bone anatomy might impose on less experienced trainees at later points in the case. In addition, the second minute of drilling would capture a task that surgeons of all levels would be able to complete. The camera view was limited to the surgeon’s hands and instruments over the surgical field. The videos contained no identifying information pertaining to the patient or surgeon (Figure, A). This study was approved by the Medical University of South Carolina Institutional Review Board (Pro00068069).
Figure. Screenshot of Intraoperative Recording of Mastoidectomy.
A, Intraoperative recording of mastoidectomy. B, Animated video depicting the 2-dimensional path of the drill head captured using video tracking software.
Animated Videos
Video tracking software (Kinovea, version 0.8.27; Kinovea) was used to track the movement of the surgical instruments and output the data into a 2-dimensional plane with respect to time. Using MATLAB (MathWorks), animated renditions of the original videos were created wherein the path of the drill or suction was represented by a single dot moving against a blank background in a 2-dimensional plane (Figure, B).
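The step from tracked coordinates to a dot animation can be illustrated with a short sketch. This is Python rather than the MATLAB used in the study, and it is not the authors' code: the `(time, x, y)` sample layout, the function name `normalize_path`, and the choice to scale the path into a unit square are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed export format: (time in seconds, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


def normalize_path(samples: List[Sample]) -> List[Sample]:
    """Order samples by time and scale x, y into [0, 1].

    A single scale factor is used for both axes so the shape of the
    path is preserved (no stretching along one axis).
    """
    if not samples:
        return []
    ordered = sorted(samples, key=lambda s: s[0])
    xs = [s[1] for s in ordered]
    ys = [s[2] for s in ordered]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 when the instrument never moved (zero span).
    span = max(max(xs) - x_min, max(ys) - y_min) or 1.0
    return [(t, (x - x_min) / span, (y - y_min) / span) for t, x, y in ordered]


# Hypothetical tracked points, deliberately out of time order.
demo = normalize_path([(0.2, 110.0, 50.0), (0.0, 100.0, 50.0), (0.1, 120.0, 60.0)])
```

Each normalized sample would then be drawn as one dot per frame on a blank canvas, which removes the anatomical background and leaves only the motion.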
Data Collection
Two otolaryngology residents and a neurotology fellow served as independent observers of the videos. The observers viewed each video once and were blinded to the identity of the surgeon being evaluated. The observers were asked to assess surgeon level and number of strokes completed by the drill during the 5-second clips. In addition, 5-point Likert scales were used to assess drilling efficiency, stroke pattern, and proficiency with the suction and irrigator. Table 1 provides details of the observers’ evaluations and how the 5-point Likert scale reflected the evaluations. The observers subsequently evaluated the same measures when viewing the animated videos, with the exception of the use of irrigation, because this procedure cannot be determined from the animation. All evaluations were collected using Research Electronic Data Capture (Vanderbilt University).
Table 1. Components of Evaluation.
Outcome | Question | Scale |
---|---|---|
Drill stroke count | How many strokes did the surgeon complete? | Count (ie, No. of strokes) |
Drill efficiency | Please evaluate the efficiency of the surgeon | 1 = Mostly unnecessary without clear purpose; 2 = majority of time spent making unnecessary moves; 3 = noticeable, but limited, amount of time spent making unnecessary moves; 4 = few unnecessary moves; 5 = no unnecessary moves |
Stroke pattern | Please evaluate the drill stroke pattern | 1 = Random placement of cuts; 2 = mostly random placement of cuts with few deliberate cuts; 3 = more deliberate placement of cuts but still makes some random cuts; 4 = deliberate, stepwise strokes but shows some hesitation throughout procedure; 5 = deliberate cuts performed in stepwise manner |
Use of suction | Please evaluate the use of the suction | 1 = Does not use suction; 2 = random use of suction; 3 = some deliberate use of suction; 4 = deliberate use of suction that does not interfere with progression of surgery; 5 = deliberate use of suction that aids in progression of surgery |
Use of irrigator | Please evaluate the use of the irrigator | 1 = Does not use irrigator; 2 = random use of irrigator; 3 = some deliberate use of irrigator; 4 = deliberate use of irrigator that does not interfere with progression of surgery; 5 = deliberate use of irrigator that aids in progression of surgery |
Estimated surgeon level | Please estimate the level of the surgeon | 1 = Junior resident; 2 = senior resident; 3 = attending |
Statistical Analysis
Statistical analyses were performed using SPSS, version 25.0 (IBM Corp) and MedCalc, version 19.1.7 (MedCalc Software). Continuous variables and ordinal variables (5-point ratings) were summarized with median and interquartile range (IQR) owing to a nonparametric distribution as confirmed by the Shapiro-Wilk test of normality. Categorical variables were summarized by frequency and percentage. Effect sizes for comparisons of continuous variables (eg, drill stroke count) and ordinal variables (eg, drill efficiency, stroke pattern, use of suction, and use of irrigation) among the 3 surgeon levels (attending surgeon, senior resident, and junior resident) were calculated using η² along with 95% CIs. The η² values were interpreted as follows: small, 0.01 to 0.05; medium, 0.06 to 0.13; and large, 0.14 or more. Post hoc measures of effect size for between-group comparisons (eg, attending surgeon vs junior resident) were calculated with Cohen d, where values are interpreted as follows: small, 0.2 to 0.4; medium, 0.5 to 0.7; and large, 0.8 or more. Effect sizes for comparisons of continuous and ordinal variables between original and animated videos were calculated with η². Interpretation of values is described by Cohen.8
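As an illustration of the two effect size measures, the following Python sketch computes η² and Cohen d and applies the interpretation bands quoted above. It is not the SPSS or MedCalc procedure used in the study. The text does not state which η² variant was used for these nonparametric data, so the classical definition (between-group sum of squares divided by total sum of squares) is an assumption, as are the pooled standard deviation for Cohen d, the function names, the "below small" label, and the toy data.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def eta_squared(groups: Sequence[Sequence[float]]) -> float:
    """Classical one-way eta squared: SS_between / SS_total (assumed variant)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    if ss_total == 0:
        raise ValueError("all values are identical; eta squared is undefined")
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


def cohen_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen d with a pooled SD; positive when group b has the larger mean."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(b) - mean(a)) / sqrt(pooled_var)


def label_eta_squared(value: float) -> str:
    """Bands from the text; values are rounded to 2 decimals first."""
    v = round(value, 2)
    if v >= 0.14:
        return "large"
    if v >= 0.06:
        return "medium"
    if v >= 0.01:
        return "small"
    return "below small"  # label not in the source; added for completeness


def label_cohen_d(value: float) -> str:
    """Bands from the text; the magnitude is rounded to 1 decimal first."""
    v = round(abs(value), 1)
    if v >= 0.8:
        return "large"
    if v >= 0.5:
        return "medium"
    if v >= 0.2:
        return "small"
    return "below small"  # label not in the source; added for completeness


# Toy stroke counts for three hypothetical groups (not study data).
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_eta = eta_squared(toy_groups)
toy_d = cohen_d(toy_groups[0], toy_groups[2])
```

Rounding before labeling avoids leaving values such as 0.055 unclassified, because the published bands are stated to 2 decimals (η²) and 1 decimal (Cohen d).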
In addition, an intraclass correlation coefficient (ICC) with 95% CI was used to determine the reliability of measurements or ratings among different surgeon levels averaged together. For this study, the “consistency” model was preferable to the “absolute agreement” model because all participants were evaluated by the same group of surgeons and because systematic differences between surgeons were irrelevant. The ICC values correlate with the strength of agreements and are rated as follows: poor, less than 0.40; fair, 0.40 to 0.59; good, 0.60 to 0.74; and excellent, 0.75 to 1.00.9 Based on the sample size of 24 video clips, 3 raters, 5 categories, and α level of .05 with 0.80 power, a sample size of 27 is needed for an ICC value of 0.40, 19 for an ICC value of 0.60, and 11 for an ICC value of 0.75.
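To make the distinction between consistency and absolute agreement concrete, the sketch below computes consistency-type ICCs from a clips-by-raters matrix using the standard two-way ANOVA mean squares. It is a minimal Python illustration, not the SPSS routine used in the study; the function names and the toy ratings are assumptions, and confidence intervals are omitted.

```python
from typing import List, Tuple


def icc_consistency(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (single-measure, average-measure) consistency ICCs.

    `ratings` has one row per rated clip and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between clips
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("no variation between clips; ICC is undefined")

    # Rater (column) variance is excluded from the denominator, which is
    # what makes these consistency rather than absolute-agreement ICCs.
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


def label_icc(value: float) -> str:
    """Bands quoted in the text (Cicchetti); rounded to 2 decimals first."""
    v = round(value, 2)
    if v >= 0.75:
        return "excellent"
    if v >= 0.60:
        return "good"
    if v >= 0.40:
        return "fair"
    return "poor"


# Toy data: rater 2 always scores 1 higher than rater 1, and rater 3 scores
# 2 higher. The offsets are systematic, so consistency is perfect.
toy_single, toy_average = icc_consistency([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

In the toy matrix no two raters ever give the same score, yet both ICCs equal 1 because the raters rank and space the clips identically; an absolute agreement model would penalize those offsets.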
Finally, Cohen κ was used to measure agreement between observers’ estimation of surgeon level compared with the actual surgeon level. The κ value can be interpreted as follows: poor, 0.20 or less; fair, 0.21 to 0.40; moderate, 0.41 to 0.60; good, 0.61 to 0.80; and very good, 0.81 to 1.00.10
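A minimal Python sketch of this comparison follows. It computes raw accuracy and unweighted Cohen κ between actual and estimated surgeon levels; whether the study used a weighted or unweighted κ is not stated, so the unweighted form is an assumption, as are the function name and the toy labels (1 = junior resident, 2 = senior resident, 3 = attending, following Table 1).

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy_and_kappa(
    actual: Sequence[Hashable], estimated: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (observed agreement, unweighted Cohen kappa)."""
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: share of estimates that match the true level.
    p_observed = sum(a == e for a, e in zip(actual, estimated)) / n
    # Chance agreement from the marginal frequencies of each level.
    actual_counts = Counter(actual)
    estimated_counts = Counter(estimated)
    p_chance = sum(actual_counts[c] * estimated_counts[c] for c in actual_counts) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Toy example (not study data): one senior resident is rated as an attending.
toy_accuracy, toy_kappa = accuracy_and_kappa([1, 1, 2, 2, 3, 3], [1, 1, 2, 3, 3, 3])
```

Here 5 of 6 estimates are correct and chance agreement is 1/3, so κ = (5/6 − 1/3) / (1 − 1/3) = 0.75, which falls in the "good" band above.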
Results
Interrater Reliability
When viewing the original videos, interrater reliability was excellent for drill stroke count (ICC, 0.98 [95% CI, 0.97-1.00]), use of suction (ICC, 0.75 [95% CI, 0.52-0.89]), use of irrigation (ICC, 0.83 [95% CI, 0.66-0.92]), and estimated surgeon level (ICC, 0.82 [95% CI, 0.64-0.92]) (Table 2).9 Ratings of drill efficiency (ICC, 0.54 [95% CI, 0.09-0.79]) and stroke pattern (ICC, 0.49 [95% CI, −0.02 to 0.76]) had fair interrater reliability. When viewing animated videos, interrater reliability was excellent for drill stroke count (ICC, 0.98 [95% CI, 0.96-0.99]) and drilling efficiency (ICC, 0.80 [95% CI, 0.60-0.91]), good for stroke pattern (ICC, 0.68 [95% CI, 0.38-0.85]) and estimated surgeon level (based on the path of the drill) (ICC, 0.69 [95% CI, 0.38-0.85]), and fair for use of suction (ICC, 0.58 [95% CI, 0.16-0.80]) and estimated surgeon level (based on the path of the suction) (ICC, 0.58 [95% CI, 0.17-0.80]).
Table 2. Intraclass Correlation Coefficients for Measured Outcomes.
Component | ICC (95% CI) | Interpretationᵃ |
---|---|---|
Original videos | ||
Drill stroke count | 0.98 (0.97 to 1.00) | Excellent |
Drilling efficiency | 0.54 (0.09 to 0.79) | Fair |
Stroke pattern | 0.49 (−0.02 to 0.76) | Fair |
Use of suction | 0.75 (0.52 to 0.89) | Excellent |
Use of irrigator | 0.83 (0.66 to 0.92) | Excellent |
Estimated surgeon level | 0.82 (0.64 to 0.92) | Excellent |
Animated videos | ||
Drill stroke count | 0.98 (0.96 to 0.99) | Excellent |
Drilling efficiency | 0.80 (0.60 to 0.91) | Excellent |
Stroke pattern | 0.68 (0.38 to 0.85) | Good |
Estimated surgeon level (based on path of drill) | 0.69 (0.38 to 0.85) | Good |
Use of suction | 0.58 (0.16 to 0.80) | Fair |
Estimated surgeon level (based on path of suction) | 0.58 (0.17 to 0.80) | Fair |
Abbreviation: ICC, intraclass correlation coefficient.
ᵃ ICC interpretation defined by Cicchetti9 as follows: poor, less than 0.40; fair, 0.40 to 0.59; good, 0.60 to 0.74; and excellent, 0.75 to 1.00.
Evaluation of Original Videos
The median number of strokes completed by the drill over 5 seconds increased with the level of the surgeon from junior residents (6.0 [IQR, 3.0-8.0]), to senior residents (9.5 [IQR, 5.0-13.0]), to attending surgeons (10.5 [IQR, 5.0-17.8]) (η² = 0.14 [95% CI, 0.01-0.28]) (Table 3).8 The difference in stroke count, however, had a medium to large effect size only between attending surgeons and junior residents and between senior and junior residents on post hoc analysis (both d ≥ 0.5). Five-point ratings of drilling efficiency differed among surgeon levels (junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 3.0-4.8]; attending surgeons, 5.0 [IQR, 4.3-5.0]; η² = 0.23 [95% CI, 0.06-0.37]), as did ratings of stroke pattern (junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 3.0-4.8]; attending surgeons, 5.0 [IQR, 5.0-5.0]; η² = 0.17 [95% CI, 0.03-0.31]). Ratings of use of suction (η² = 0.31 [95% CI, 0.12-0.44]) and use of irrigation (η² = 0.24 [95% CI, 0.07-0.38]) also differed among surgeon levels. On post hoc analysis, attending surgeons had higher ratings compared with junior and senior residents across all 4 rated measures (all d ≥ 0.5). Senior residents had higher ratings compared with junior residents in regard to use of suction and irrigation (both d ≥ 0.5). Table 3 summarizes the results of this evaluation.8 The accuracy of the surgeon level estimations was 68.1%, with a corresponding Cohen κ interpreted as moderate (Table 4).10
Table 3. Evaluation Results.
Component evaluated | Junior resident, median (IQR) | Senior resident, median (IQR) | Attending surgeon, median (IQR) | Effect size, η² (95% CI)ᵃ |
---|---|---|---|---|
Original videos | ||||
Drill stroke countᵇ,ᶜ | 6.0 (3.0-8.0) | 9.5 (5.0-13.0) | 10.5 (5.0-17.8) | 0.14 (0.01-0.28) |
Drilling efficiencyᵇ,ᵈ | 4.0 (3.0-4.0) | 4.0 (3.0-4.8) | 5.0 (4.3-5.0) | 0.23 (0.06-0.37) |
Stroke patternᵇ,ᵈ | 4.0 (3.0-4.0) | 4.0 (3.0-4.8) | 5.0 (5.0-5.0) | 0.17 (0.03-0.31) |
Use of suctionᵇ,ᶜ,ᵈ | 3.0 (2.0-4.0) | 4.0 (3.0-5.0) | 5.0 (5.0-5.0) | 0.31 (0.12-0.44) |
Use of irrigationᵇ,ᶜ,ᵈ | 3.0 (2.0-4.0) | 4.0 (3.0-5.0) | 5.0 (5.0-5.0) | 0.24 (0.07-0.38) |
Animated videos | ||||
Drill stroke countᵇ,ᶜ | 6.0 (4.0-9.0) | 10.5 (10.0-13.8) | 10.5 (4.3-21.0) | 0.19 (0.04-0.33) |
Drilling efficiencyᵇ,ᶜ,ᵈ | 4.0 (3.0-4.0) | 3.0 (2.0-4.0) | 5.0 (4.0-5.0) | 0.25 (0.08-0.39) |
Stroke patternᵇ,ᵈ | 4.0 (3.0-4.0) | 4.0 (2.0-4.0) | 5.0 (5.0-5.0) | 0.15 (0.02-0.29) |
Use of suction | 3.0 (3.0-4.0) | 3.5 (3.0-5.0) | 3.5 (2.0-5.0)ᵉ | 0.00 (0.00-0.05) |
Abbreviation: IQR, interquartile range.
ᵃ Effect size (η²) defined by Cohen8 as follows: small, 0.01 to 0.05; medium, 0.06 to 0.13; and large, 0.14 or more.
ᵇ Difference between attending surgeon vs junior resident has medium to large effect size (Cohen d).
ᶜ Difference between senior vs junior resident has medium to large effect size (Cohen d).
ᵈ Difference between attending surgeon vs senior resident has medium to large effect size (Cohen d).
ᵉ Difference between animated vs original videos has medium to large effect size (η²).
Table 4. Accuracy of Surgeon Level Estimations Compared With Actual Surgeon Level.
Video type | Accuracy, % | κ (95% CI) | Interpretationᵃ |
---|---|---|---|
Original videos | 68.1 | 0.49 (0.31 to 0.70) | Moderate |
Animated drill | 48.6 | 0.20 (0.01 to 0.38) | Poor |
Animated suction | 40.3 | 0.05 (−0.13 to 0.22) | Poor |
ᵃ Defined by Altman10 as follows: poor, 0.20 or less; fair, 0.21 to 0.40; moderate, 0.41 to 0.60; good, 0.61 to 0.80; and very good, 0.81 to 1.00.
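The percentages in Table 4 can be read as counts under one assumption that the source does not state explicitly: each of the 3 observers estimated the level for all 24 clips, and the 72 resulting estimates were pooled. The Python sketch below finds the number of correct estimates that would round to each reported accuracy and applies the Altman bands to the reported κ values; the function names are hypothetical.

```python
from typing import List

OBSERVERS = 3
CLIPS = 24
TOTAL_ESTIMATES = OBSERVERS * CLIPS  # 72, assuming pooled estimates


def counts_matching(accuracy_pct: float, total: int = TOTAL_ESTIMATES) -> List[int]:
    """Counts of correct estimates whose percentage rounds to accuracy_pct."""
    return [c for c in range(total + 1) if round(100 * c / total, 1) == accuracy_pct]


def label_kappa(kappa: float) -> str:
    """Altman bands from the table footnote; rounded to 2 decimals first."""
    v = round(kappa, 2)
    if v >= 0.81:
        return "very good"
    if v >= 0.61:
        return "good"
    if v >= 0.41:
        return "moderate"
    if v >= 0.21:
        return "fair"
    return "poor"


original_correct = counts_matching(68.1)
drill_correct = counts_matching(48.6)
suction_correct = counts_matching(40.3)
```

Under that pooling assumption, the reported accuracies correspond to 49, 35, and 29 correct estimates out of 72, and the three κ values map to moderate, poor, and poor, matching the table.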
Evaluation of Animated Videos
The median number of strokes completed by the drill over 5 seconds increased from junior residents (6.0 [IQR, 4.0-9.0]), to senior residents (10.5 [IQR, 10.0-13.8]), to attending surgeons (10.5 [IQR, 4.3-21.0]) (η² = 0.19 [95% CI, 0.04-0.33]) (Table 3).8 Post hoc analysis revealed that junior residents had lower stroke counts compared with attending surgeons and senior residents (both d ≥ 0.5). Five-point ratings of drilling efficiency differed among surgeon levels (junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 3.0 [IQR, 2.0-4.0]; attending surgeons, 5.0 [IQR, 4.0-5.0]; η² = 0.25 [95% CI, 0.08-0.39]), as did ratings of stroke pattern (junior residents, 4.0 [IQR, 3.0-4.0]; senior residents, 4.0 [IQR, 2.0-4.0]; attending surgeons, 5.0 [IQR, 5.0-5.0]; η² = 0.15 [95% CI, 0.02-0.29]). Attending surgeons achieved higher ratings compared with junior residents and senior residents on post hoc analysis (all d ≥ 0.5). In addition, senior residents received higher ratings vs junior residents on drill efficiency (d ≥ 0.5). The use of suction did not show large differences across surgeon levels (junior residents, 3.0 [IQR, 3.0-4.0]; senior residents, 3.5 [IQR, 3.0-5.0]; attending surgeons, 3.5 [IQR, 2.0-5.0]; η² = 0.00 [95% CI, 0.00-0.05]). Comparison of values between animated and original videos revealed a difference of medium to large effect size only for the use of suction among attending surgeons. The accuracy of the surgeon level estimations was 48.6% for the path of the drill and 40.3% for the path of the suction (Table 4).10 Corresponding values for Cohen κ were both interpreted as poor.
Discussion
The use of surgical videos is an increasingly common educational tool that has been shown to provide significant gains in knowledge compared with traditional teaching methods alone.11 The present study used evaluations of mastoidectomy video recordings to determine whether certain operative techniques can be measured and serve as indicators of surgeon level. In particular, we examined various aspects of the use of the drill and the use of the suction irrigator when performing a mastoidectomy. In the present study, we focused on short videos during the early portion of the case.
Drilling Proficiency
In regard to drilling technique, we found that, compared with junior residents, attending surgeons performed a greater number of strokes per unit time and were more efficient and deliberate with their strokes. Compared with senior residents, attending surgeons were similarly more efficient and deliberate with their strokes. However, there were only small differences in the number of strokes between attending surgeons and senior residents. Perhaps this finding indicates that, while residents can quickly improve their rate of drilling, qualities such as efficiency and deliberate stroke pattern are more complex and nuanced and require more time to develop. In the literature, complex tasks have been found to better distinguish surgical skill level.12 The differences we found by surgeon level largely remained consistent when observers were asked to evaluate the animated videos of the drill head. These findings suggest that the movement and path of the drill head conveys a substantial amount of information about the surgeon, even in a 5-second video clip early in a mastoidectomy. Although it is reasonable that the determination of drill efficiency can be made based on drill movement and position alone, it is interesting that the deliberate placement of cuts (stroke pattern) can also be observed, even in the absence of the anatomical and functional context. In our experience, deliberateness is an intuitive indicator of a surgeon’s level, but it is often a quality that is less clearly defined. Our results might therefore describe movement as a specific component of deliberate technique, a finding that could be relevant to those currently developing methods of mastoidectomy evaluation.
In the present study, there was excellent agreement in the number of strokes performed in both the original and animated videos. The agreement between observers appeared stronger when evaluating the drill efficiency and stroke pattern on the animated videos (excellent and good) compared with the original videos (fair). One interpretation of this finding is that the background on which the surgeon operates can distract the observer during assessment of the drilling technique of the primary surgeon. Although drilling technique encompasses using the side of the drill vs the tip of the drill, force of drilling, and respect for surrounding structures, our results indicate that drill movement alone is an important individual component of mastoidectomy. Despite this finding, our results also showed that viewing the animated path of the drill might not be sufficient to accurately estimate surgeon level, as observers were generally more accurate in their estimations when viewing the original videos. The coordination of the drill with the suction irrigator might provide additional information to the observer. Because the animated videos separated the 2 instruments, determining surgeon level might have been more difficult for the observer.
Suction Irrigator Proficiency
Observers in our study also evaluated the proficiency with which the surgeon used the suction irrigator. For both the use of suction and irrigation, our study found that each surgeon level received higher ratings than the level below. It is possible that both attending surgeons’ experience and a steep learning curve for using the suction irrigator with the nondominant hand were associated with the differences observed.
Unlike the use of the drill, the movement and path of the suction might be less important than the functional use of the instrument in assessing proficiency. For instance, we only found small differences in ratings across surgeon levels when viewing the animated videos of the suction in contrast to our findings from original videos. There was also a difference of medium effect size in ratings between the original and animated videos for attending surgeons’ use of suction as a result. In addition, observers in our study had excellent agreement in their ratings of the suction irrigator when viewing the original video but had only fair agreement when watching the animated rendition of the suction. Finally, observers watching the movement of the suction on animation were unable to accurately estimate the level of the surgeon. Altogether, our results suggest that movement of the suction irrigator alone is not a reliable indicator of proficiency in mastoidectomy. Again, the coordination of the suction irrigator with the drill would likely improve the accuracy of estimations of surgeon level, as would assessments about visual field clarity. The association of handedness with the use of the suction irrigator would also be important to consider in future studies. Handedness has been investigated in a number of surgical disciplines,13,14,15 with a study in ophthalmology showing a steeper learning curve in the dexterity of the nondominant hand relative to the dominant hand.16
Relevance to Training in Otolaryngology
At our institution, intraoperative recordings are routinely collected as part of formative resident feedback. We believe that our method of evaluating surgical performance can be easily incorporated at other residency programs, as most operating rooms are now equipped with video recording capability. Advantages of using intraoperative recordings for evaluation include monitoring the yearly progress of individual trainees, with attention to specific techniques described in our study. Assessment of trainees can be time consuming, but our results indicate that even short video segments can provide valuable information. Deidentified recordings also serve to minimize bias from the observer, promoting fairness and trust in the evaluation process. Finally, recordings can potentially supplement a resident’s application for future training or employment, allowing viewers to measure not only surgical performance but also individual growth in specific techniques and receptiveness to feedback. These advantages ultimately shift away from traditional measures of case volume and questionnaires posed to supervisors and move even more toward a competency-based curriculum.
Limitations and Strengths
This study has some limitations, including the use of a small sample of short intraoperative recordings within a single institution. Therefore, our study does not necessarily capture the wider range of stylistic differences among surgeons or differences in technique at other points during a mastoidectomy. Although 5-second videos can potentially omit important information relative to longer segments, our study sought to determine whether differences in techniques and surgeon level could be feasibly determined in short segments. As our results indicated, this assessment was possible and suggests that there are qualities that become almost immediately apparent to the observer. From a study design perspective, this protocol helps us focus on some of the most salient techniques associated with different surgeon levels. Longer video segments could also introduce more variability in surgical tasks performed. Other limitations include the inability to standardize the operative field, although we restricted the sample to adults without a significant history of otologic problems who were receiving cochlear implants and to the second minute of drilling. The individual progress of surgeons during the study time frame likely also introduced differences in the recordings. Although intraoperative recordings were deidentified to minimize observation bias, the evaluation method remained subjective.
Next, while the grading scale described in our study is not formally validated, we believe it is appropriate for our study objective of evaluating both original and animated videos. The Objective Structured Assessment of Technical Skills is a validated assessment tool commonly used in surgical training,1 but it fails to capture more nuanced components of performing mastoidectomy. As a result, many assessment tools for evaluating mastoidectomy have been devised, and the validation of such a tool is part of a large ongoing effort.17
Our study also has some strengths. First, we maintained the same group of observers and found relatively consistent observations among them. Our recordings also captured real-life operative conditions wherein the potential for surgical complications or consequences of mistakes undoubtedly influences the surgeon under observation. The outcome of these risks cannot be replicated in simulated or laboratory educational settings. In addition, we found large observable differences between surgeon levels in the measures evaluated. Our study therefore provides support for surgical metrics that can be educationally relevant. Finally, our findings set a foundation for future investigation seeking to use video tracking technology or other objective methods to evaluate surgical performance.
Conclusions
Observation of short 5-second intraoperative mastoidectomy recordings early in the procedure is a feasible method of evaluating surgeons’ experience based on time spent in training. Reasonable indicators of surgeon level include the number of strokes using the drill, efficiency of drilling, pattern of drilling, and use of the suction irrigator. Observing the path of the drill alone is sufficient to appreciate differences in drilling technique but not sufficient to accurately determine surgeon level. Neither proficiency with the suction irrigator nor determination of surgeon level can be evaluated based on the path of the suction alone.
References
1. Niitsu H, Hirabayashi N, Yoshimitsu M, et al. Using the Objective Structured Assessment of Technical Skills (OSATS) global rating scale to evaluate the skills of surgical trainees in the operating room. Surg Today. 2013;43(3):271-275. doi: 10.1007/s00595-012-0313-7
2. Mowry SE, Woodson E, Gubbels S, Carfrae M, Hansen MR. A simple assessment tool for evaluation of cadaveric temporal bone dissection. Laryngoscope. 2018;128(2):451-455. doi: 10.1002/lary.26578
3. Zirkle M, Taplin MA, Anthony R, Dubrowski A. Objective assessment of temporal bone drilling skills. Ann Otol Rhinol Laryngol. 2007;116(11):793-798. doi: 10.1177/000348940711601101
4. Gambadauro P, Magos A. Digital video recordings for training, assessment, and revalidation of surgical skills. Surg Technol Int. 2010;20:36-39.
5. Poon C, Stevens SM, Golub JS, Pensak ML, Samy RN. Pilot study evaluating the impact of otology surgery videos on otolaryngology resident education. Otol Neurotol. 2017;38(3):423-428. doi: 10.1097/MAO.0000000000001303
6. Bowles PFD, Harries M, Young P, Das P, Saunders N, Fleming JC. A validation study on the use of intra-operative video recording as an objective assessment tool for core ENT surgery. Clin Otolaryngol. 2014;39(2):102-107. doi: 10.1111/coa.12240
7. Close MF, Mehta CH, Liu Y, et al. Subjective vs computerized assessment of surgeon skill level during mastoidectomy. Otolaryngol Head Neck Surg. Published online June 30, 2020. doi: 10.1177/0194599820933882
8. Cohen J. Statistical Power Analysis for the Behavioral Sciences. L. Erlbaum Associates; 1988.
9. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284-290. doi: 10.1037/1040-3590.6.4.284
10. Altman DG. Practical Statistics for Medical Research. 1st ed. Chapman and Hall; 1991.
11. Ahmet A, Gamze K, Rustem M, Sezen KA. Is video-based education an effective method in surgical education? a systematic review. J Surg Educ. 2018;75(5):1150-1158. doi: 10.1016/j.jsurg.2018.01.014
12. Fard MJ, Ameri S, Darin Ellis R, Chinnam RB, Pandya AK, Klein MD. Automated robot-assisted surgical skill evaluation: predictive analytics approach. Int J Med Robot. 2018;14(1). doi: 10.1002/rcs.1850
13. Choussein S, Srouji SS, Farland LV, et al. Robotic assistance confers ambidexterity to laparoscopic surgeons. J Minim Invasive Gynecol. 2018;25(1):76-83. doi: 10.1016/j.jmig.2017.07.010
14. Molinas CR, Binda MM, Campo R. Dominant hand, non-dominant hand, or both? the effect of pre-training in hand-eye coordination upon the learning curve of laparoscopic intra-corporeal knot tying. Gynecol Surg. 2017;14(1):12. doi: 10.1186/s10397-017-1015-3
15. Naunheim MR, Le A, Dedmon MM, Franco RA, Anderson J, Song PC. The effect of handedness and laterality in a microlaryngeal surgery simulator. Am J Otolaryngol. 2017;38(4):472-474. doi: 10.1016/j.amjoto.2017.04.009
16. Gonzalez-Gonzalez LA, Payal AR, Gonzalez-Monroy JE, Daly MK. Ophthalmic surgical simulation in training dexterity in dominant and nondominant hands: results from a pilot study. J Surg Educ. 2016;73(4):699-708. doi: 10.1016/j.jsurg.2016.01.014
17. Sethia R, Kerwin TF, Wiet GJ. Performance assessment for mastoidectomy. Otolaryngol Head Neck Surg. 2017;156(1):61-69. doi: 10.1177/0194599816670886