European Urology Open Science. 2024 Mar 5;62:81–90. doi: 10.1016/j.euros.2024.02.014

Implementation and Validation of an Automated, Longitudinal Robotic Surgical Evaluation and Feedback Program at a High-volume Center and Impact on Training

Rand N Wilcox Vanden Berg a, Emily A Vertosick b, Daniel D Sjoberg b, Eugene K Cha a, Jonathan A Coleman a, Timothy F Donahue a, James A Eastham a, Behfar Ehdaie a, Vincent P Laudone a, Eugene J Pietzak a, Robert C Smith a, Alvin C Goh a
PMCID: PMC10926308  PMID: 38468865

Take Home Message

An automated, standardized, longitudinal surgical skill assessment and feedback system can be implemented successfully and can inform the mentored training experience. We identified predictors of trainee proficiency in individual surgical steps to enhance individualized surgical education.

Keywords: Education, Learning curve, Robotics

Abstract

Background

Surgical education lacks a standardized, proficiency-based approach to evaluation and feedback.

Objective

To assess the implementation and reception (ie, feasibility) of an automated, standardized, longitudinal surgical skill assessment and feedback system, and identify baseline trainee (resident and fellow) characteristics associated with achieving proficiency in robotic surgery while learning robotic-assisted laparoscopic prostatectomy.

Design, setting, and participants

A quality improvement study assessing a pilot of a surgical experience tracking program was conducted over 1 yr. Participants were six fellows, eight residents, and nine attending surgeons at a tertiary cancer center.

Intervention

Trainees underwent baseline self-assessment. After each surgery, an evaluation was completed independently by the trainee and attending surgeons. Performance was rated on a five-point anchored Likert scale (trainees were considered “proficient” when attending surgeons’ rating was ≥4). Technical skills were assessed using the Global Evaluative Assessment of Robotic Skills (GEARS) and Prostatectomy Assessment and Competency Evaluation (PACE).

Outcome measurements and statistical analysis

Program success and utility were assessed using completion rates, evaluation completion times, concordance rates between attending and trainee surgeons, and exit surveys. Baseline characteristics were assessed to determine associations with achieving proficiency.

Results and limitations

Completion rates for trainees and attending surgeons were 72% and 77%, respectively. Fellows performed more steps per case than residents (median [interquartile range]: 5 [3–7] vs 3 [2–4]; p < 0.01). Prior completion of robotic or laparoscopic skill courses and measures of surgical experience were associated with achieving proficiency in multiple surgical steps and GEARS domains. Intraclass correlation coefficients for individual GEARS domains were 0.27–0.47.

Conclusions

An automated surgical experience tracker with structured, longitudinal evaluation and feedback can be implemented with good participation and minimal participant time commitment. It can guide curricular development in a proficiency-based education program by identifying modifiable factors associated with proficiency, individualizing education, and highlighting areas for improvement within the program.

Patient summary

An automated, standardized, longitudinal surgical skill assessment and feedback system can be implemented successfully in surgical education settings and used to inform education plans and predict trainee proficiency.

1. Introduction

Surgical education in the USA follows an apprenticeship model in which trainees graduate after completing a set number of index procedures and fulfilling individual training program requirements. In 2009, the Accreditation Council for Graduate Medical Education created a set of milestones as a guide to evaluate trainees and usher in proficiency-based education; however, proficiency was defined by the individual training programs [1], [2]. Achieving proficiency has become more difficult with continued medical advances, an increasing number of new surgical procedures, and the work-hour restrictions introduced in 2001 [3].

Recent surveys regarding surgical training programs have suggested areas for improvement. Graduating residents are not universally deemed competent for index procedures [4], [5]. Of graduating urology residents, 61% reported perceived nonproficiency in robotic-assisted laparoscopic prostatectomy (RALP), 59% in radical nephrectomy, and 79% in radical cystectomy [6]. Thus, better methods are needed to evaluate trainees, provide feedback, and produce confident and competent surgeons.

The inherently subjective and heterogeneous nature of traditional, nonstructured performance evaluations has prompted the development of more structured, standardized tools for skill assessment. Recently, structured evaluation of robotic surgical skills has been validated by assessment tools such as the Global Evaluative Assessment of Robotic Skills (GEARS), Prostatectomy Assessment and Competency Evaluation (PACE), and Robotic Anastomosis Competency Evaluation [7], [8], [9]. Automated performance metrics have been proposed as additional objective measures of surgical skills and correlated with perioperative outcomes [10]. While not widespread, some training programs have assessed surgical abilities with structured evaluations and feedback through mobile applications such as SIMPL and myTIPreport [11], [12]. However, comprehensive longitudinal implementation of standardized evaluations to track robotic skill acquisition with assessment of baseline factors affecting proficiency has not been reported.

Herein, we evaluate the implementation and utility of MSK SURGE (SURGical Experience), a novel automated system, to track trainee skill acquisition and progress in RALP. We also aim to identify baseline trainee characteristics associated with skill acquisition and proficiency.

2. Materials and methods

2.1. Program design

Following institutional review board approval, urology trainees and attending surgeons were enrolled in MSK SURGE. Trainees then performed a baseline self-assessment. MSK SURGE is a novel, standardized, automated program that tracks trainee progress using the assessment tools GEARS and PACE [7], [8], and does not rely on trainees or attending surgeons to initiate evaluations. Instead, once training surgeries appear in the scheduling system, evaluations are automatically generated and sent by the system (results of prior evaluations before surgery and blank evaluation forms after surgery). All participants received an overview of the evaluation system to review before participation and to reference as needed. All RALPs performed with a trainee between July 2020 and June 2021 were eligible for capture. After each captured procedure, the trainee and attending surgeons independently completed a two-component evaluation (Supplementary Tables 1 and 2). Robotic technical skills were assessed using GEARS. Procedure step-specific performance was assessed on a five-point anchored Likert scale (1, “novice” to 5, “expert”) across 12 steps based on PACE. Trainees indicated which steps they completed ≥50% of and performed a self-assessment. Attending surgeons then verified the completed steps and performed a blinded evaluation of the trainees’ performance. After 4 mo, before performing the next scheduled training surgery, trainees and attending surgeons began receiving performance reports summarizing the mean performance score over the previous five training events and the cumulative number of times a step had been completed (Supplementary Fig. 1).
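The schedule-driven trigger described above can be summarized as: find today's training cases, skip those already processed, and dispatch the evaluation materials. The following is a minimal sketch in R under stated assumptions: the schedule and sent_log data frames, their columns, and the send_evaluation_form helper are all hypothetical, since the production MSK SURGE implementation is not publicly available.

```r
# Minimal sketch of a schedule-driven evaluation trigger (all names are
# hypothetical; MSK SURGE's production code is not published).
library(dplyr)

dispatch_evaluations <- function(schedule, sent_log) {
  schedule |>
    # today's RALP cases that involve a trainee
    filter(procedure == "RALP", trainee_present, surgery_date == Sys.Date()) |>
    # skip cases whose evaluation materials were already sent
    anti_join(sent_log, by = "case_id") |>
    rowwise() |>
    # hypothetical helper that e-mails prior-evaluation summaries before
    # surgery and blank evaluation forms after surgery
    mutate(dispatched = send_evaluation_form(case_id, trainee_id, attending_id)) |>
    ungroup()
}
```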

Program utility was assessed with an exit survey evaluating the subjective impact on the trainee’s education and the effect of evaluation summaries on discussions prior to the training surgery and on the attending surgeon’s education plan during the training procedure.

2.2. Statistical analysis

Program feasibility and implementation were assessed by the capture rate (RALPs captured vs all performed at the institution), the evaluation completion rates of trainee and attending surgeons, and the time to complete the evaluations. The concordance between trainee and attending surgeon performance evaluations was described with concordance rates and the intraclass correlation coefficient (ICC; ranging from 0, no correlation, to 1, perfect correlation). Generalized estimating equation (GEE) models assessed the impact of baseline trainee characteristics (resident vs fellow, years of training, prior laparoscopic or robotic courses, number of previously performed RALPs, baseline trainee familiarity with RALP, baseline trainee self-assessment for each step/domain, and time between the first and current cases) on achieving an attending rating of ≥4 (proficient) for each RALP step and GEARS domain. A GEE model was created for each baseline trainee characteristic, including the characteristic of interest and the time from the first RALP training event, with clustering by trainee. GEE models were also created to assess whether prior performance of a step affected whether trainees performed it during subsequent training events. A GEE model was created for each step, with the outcome being completion of that step and the predictor being the average attending rating across the previous five times the trainee had performed that step, with clustering by trainee. Correction for multiple testing was done using the false discovery rate. Exit survey responses were summarized with descriptive statistics.

Analyses were conducted using R version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria).
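For concreteness, the sketch below fits one such GEE model with the geepack package in R. The data frame evals and its column names are hypothetical stand-ins for the study's unpublished evaluation records, and the exchangeable working correlation structure is our assumption (the paper does not state which structure was used).

```r
# One GEE model per baseline characteristic: the characteristic of interest
# plus time since the first RALP training event, clustered by trainee.
# "evals" has one row per attending evaluation of one step, with columns
# (all hypothetical): proficient (1 if attending rating >= 4, else 0),
# robotic_course (1 if a prior robotic course was taken),
# months_since_first, and trainee_id (the cluster identifier).
library(geepack)

fit <- geeglm(proficient ~ robotic_course + months_since_first,
              id     = trainee_id,
              data   = evals,
              family = binomial(link = "logit"),
              corstr = "exchangeable")  # assumed working correlation

exp(coef(fit))  # coefficients expressed as odds ratios

# False discovery rate correction across the per-characteristic models
p_values <- c(0.003, 0.046, 0.050)  # illustrative values only
p.adjust(p_values, method = "fdr")
```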

3. Results

3.1. Feasibility/overview of program

Fourteen urology trainees (six fellows and eight chief residents) and nine attending surgeons participated in this program. Information on previous training among trainees is presented in Table 1. Trainee baseline self-assessments on RALP steps and GEARS questions are presented in Supplementary Tables 3 and 4. During the study period, 519 RALPs were performed, of which 449 (87%) were entered into MSK SURGE. Of the captured surgeries, trainees completed self-evaluations for 323 (72%); of these 323, attending surgeons completed evaluations for 250 (77%). Attrition over the study period was low except for the last 2 mo of the program, corresponding to the end of the academic year (Supplementary Fig. 2). Of the surgeries with completed self-evaluations, 117 were completed before performance summary reporting began and 206 after. The median time to complete the evaluation was 1 min for both trainees (interquartile range [IQR], 1–2) and attending surgeons (IQR, 0–1). Evaluations were completed on the day they were received 31% of the time by trainees and 68% of the time by attending surgeons (Supplementary Table 5). Trainees and attending surgeons agreed on the steps performed during 94% (235/250) of surgeries, with only 25 total discrepancies.

Table 1.

Previous training and experience from enrollment survey

| Characteristic | Fellow (N = 6) | Resident (N = 8) |
| --- | --- | --- |
| Training, no. (%) | | |
| PGY 4 | NA | 2 (25) |
| PGY 5 | NA | 3 (38) |
| PGY 6 (not a fellow) | NA | 3 (38) |
| 1 yr out of residency | 3 (50) | NA |
| 2 yr out of residency | 3 (50) | NA |
| Robotic course taken, no. (%) | | |
| No | 3 (50) | 7 (88) |
| Yes (≤2 yr ago) | 1 (17) | 1 (13) |
| Yes (>2 yr ago) | 2 (33) | 0 (0) |
| Laparoscopic course taken, no. (%) | 2 (33) | 0 (0) |
| Robotic modules taken, no. (%) | 6 (100) | 5 (63) |
| No. of previous laparoscopic surgeries led as surgeon, median (IQR) | 10 (7, 32) | 2 (0, 7) |
| No. of previous laparoscopic surgeries assisted, median (IQR) | 20 (12, 35) | 15 (9, 42) |
| Ever performed robotic surgery, no. (%) | 6 (100) | 8 (100) |
| No. of previous robotic surgeries led as surgeon, median (IQR) | 65 (35, 132) | 8 (0, 20) |
| No. of previous robotic surgeries assisted, median (IQR) | 88 (64, 100) | 50 (46, 50) |

IQR = interquartile range; NA = not applicable; no. = number; PGY = postgraduate year.

Residents and fellows completed different steps. The steps least commonly performed by residents were neurovascular bundle dissection (N = 0), apical and dorsal venous complex dissection/urethral division (N = 2), endopelvic fascia (N = 5), lateral pedicles (N = 7), and posterior dissection (N = 9). The steps least commonly performed by fellows were apical and dorsal venous complex dissection/urethral division (N = 13) and endopelvic fascia (N = 16). Fellows performed a higher cumulative number of steps than residents; no residents and two (33%) fellows performed all 12 steps (Supplementary Table 6). While the median number of steps that attending surgeons allowed a trainee to perform varied across attending surgeons, from two (IQR, 2–2) to seven (IQR, 5–7) steps, fellows consistently performed more steps per surgery than residents, with little variation throughout the study (Supplementary Fig. 3).

Inter-rater agreement for the evaluation tools (Supplementary Table 7), as measured by the ICC, was highest among the GEARS domains for bimanual dexterity (ICC, 0.47; 95% confidence interval [CI], 0.37–0.56) and lowest for force sensitivity (ICC, 0.27; 95% CI, 0.14–0.38). Agreement among the RALP steps was highest for seminal vesicle dissection (ICC, 0.57; 95% CI, 0.43–0.68) and lowest for endopelvic fascia dissection (ICC, 0.05; 95% CI, –0.48 to 0.56).
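For reference, paired-rater agreement of this kind can be computed in R with the irr package. A minimal sketch follows, using made-up scores; the specific ICC variant (two-way model, absolute agreement, single rating) is our assumption, as the paper does not report which form was used.

```r
# ICC between trainee self-ratings and attending ratings on the same events
# (scores below are illustrative, not study data)
library(irr)

ratings <- data.frame(trainee   = c(3, 4, 4, 2, 5, 3),
                      attending = c(4, 4, 3, 3, 5, 4))

icc(ratings, model = "twoway", type = "agreement", unit = "single")
```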

3.2. Predictors of proficiency and performance

There were enough cases and events to assess the association between baseline trainee characteristics and scoring of ≥4 for the following steps: anterior bladder neck, posterior bladder neck, seminal vesicle dissection, posterior dissection, lateral pedicles, anterior anastomosis, and pelvic lymph node dissection (Table 2).

Table 2.

Association of achieving proficiency (attending evaluation score ≥4) on individual surgical steps with baseline training and self-assessment covariates a

| Characteristic | N | Event N | OR (95% CI) | p value |
| --- | --- | --- | --- | --- |
| Anterior bladder neck | | | | |
| Trainee is a fellow | 117 | 80 | 3.07 (1.21, 7.79) | 0.050 |
| Years of training | 110 | 75 | 1.64 (1.11, 2.43) | 0.046 |
| Most familiarity with RALP | 117 | 80 | 2.09 (0.90, 4.89) | 0.15 |
| Any prior laparoscopic course taken | 117 | 80 | | |
| Any prior robotic course taken | 117 | 80 | 2.19 (1.13, 4.24) | 0.051 |
| No. of laparoscopic procedures (per 10 procedures) | 117 | 80 | 1.12 (1.07, 1.18) | <0.001 |
| No. of robotic procedures (per 10 procedures) | 117 | 80 | 1.15 (1.09, 1.21) | <0.001 |
| Baseline rating: never performed | 6 | 1 | – | |
| Baseline rating: score 1–3 | 48 | 31 | 4.80 (3.50, 6.60) | <0.001 |
| Baseline rating: score 4–5 | 63 | 48 | 12.1 (7.17, 20.5) | <0.001 |
| Months between first and current cases | 117 | 80 | 1.15 (1.00, 1.32) | 0.085 |
| Posterior bladder neck | | | | |
| Trainee is a fellow | 76 | 36 | 3.53 (0.44, 28.4) | 0.3 |
| Years of training | 72 | 36 | 0.91 (0.48, 1.69) | 0.8 |
| Most familiarity with RALP | 76 | 36 | 1.22 (0.56, 2.63) | 0.6 |
| Any prior laparoscopic course taken | 76 | 36 | 3.91 (1.49, 10.2) | 0.023 |
| Any prior robotic course taken | 76 | 36 | 3.45 (1.15, 10.3) | 0.063 |
| No. of laparoscopic procedures (per 10 procedures) | 76 | 36 | 1.16 (1.02, 1.32) | 0.062 |
| No. of robotic procedures (per 10 procedures) | 76 | 36 | 1.02 (0.95, 1.10) | 0.6 |
| Baseline rating: never performed | | | | |
| Baseline rating: score 1–3 | 58 | 24 | – | |
| Baseline rating: score 4–5 | 18 | 12 | 3.03 (0.74, 12.4) | 0.2 |
| Months between first and current cases | 76 | 36 | 1.10 (0.96, 1.25) | 0.2 |
| Seminal vesicle dissection | | | | |
| Trainee is a fellow | 123 | 78 | 3.88 (1.65, 9.12) | 0.010 |
| Years of training | 112 | 73 | 1.50 (0.99, 2.27) | 0.10 |
| Most familiarity with RALP | 123 | 78 | 2.44 (1.16, 5.13) | 0.050 |
| Any prior laparoscopic course taken | 123 | 78 | 2.75 (0.85, 8.91) | 0.15 |
| Any prior robotic course taken | 123 | 78 | 1.81 (0.57, 5.73) | 0.4 |
| No. of laparoscopic procedures (per 10 procedures) | 123 | 78 | 1.16 (1.02, 1.32) | 0.050 |
| No. of robotic procedures (per 10 procedures) | 123 | 78 | 1.05 (0.94, 1.17) | 0.5 |
| Baseline rating: never performed | | | | |
| Baseline rating: score 1–3 | 64 | 36 | – | |
| Baseline rating: score 4–5 | 59 | 42 | 1.41 (0.49, 4.03) | 0.6 |
| Months between first and current cases | 123 | 78 | 1.28 (1.16, 1.40) | <0.001 |
| Posterior dissection | | | | |
| Trainee is a fellow | 64 | 39 | 2.80 (0.23, 34.9) | 0.5 |
| Years of training | 60 | 39 | 0.76 (0.39, 1.51) | 0.5 |
| Most familiarity with RALP | 64 | 39 | 3.88 (1.84, 8.20) | 0.003 |
| Any prior laparoscopic course taken | 64 | 39 | 2.12 (0.85, 5.32) | 0.2 |
| Any prior robotic course taken | 64 | 39 | 1.46 (0.56, 3.78) | 0.5 |
| No. of laparoscopic procedures (per 10 procedures) | 64 | 39 | 1.16 (1.04, 1.29) | 0.036 |
| No. of robotic procedures (per 10 procedures) | 64 | 39 | 1.02 (0.92, 1.13) | 0.8 |
| Baseline rating: never performed | 2 | 0 | | |
| Baseline rating: score 1–3 | 43 | 25 | | |
| Baseline rating: score 4–5 | 19 | 14 | | |
| Months between first and current cases | 64 | 39 | 1.51 (1.09, 2.09) | 0.046 |
| Lateral pedicles | | | | |
| Trainee is a fellow | 55 | 21 | | |
| Years of training | 53 | 21 | 2.06 (1.09, 3.90) | 0.063 |
| Most familiarity with RALP | 55 | 21 | 0.26 (0.05, 1.34) | 0.2 |
| Any prior laparoscopic course taken | 55 | 21 | 2.27 (0.60, 8.66) | 0.3 |
| Any prior robotic course taken | 55 | 21 | 3.81 (1.00, 14.5) | 0.092 |
| No. of laparoscopic procedures (per 10 procedures) | 55 | 21 | 1.14 (0.95, 1.37) | 0.2 |
| No. of robotic procedures (per 10 procedures) | 55 | 21 | 1.10 (0.98, 1.24) | 0.2 |
| Baseline rating: never performed | 4 | 0 | | |
| Baseline rating: score 1–3 | 35 | 16 | | |
| Baseline rating: score 4–5 | 16 | 5 | | |
| Months between first and current cases | 55 | 21 | 1.08 (0.80, 1.47) | 0.6 |
| Anterior anastomosis | | | | |
| Trainee is a fellow | 217 | 193 | 7.00 (1.22, 40.3) | 0.064 |
| Years of training | 202 | 180 | 3.16 (1.41, 7.06) | 0.022 |
| Most familiarity with RALP | 217 | 193 | 10.2 (1.88, 55.9) | 0.028 |
| Any prior laparoscopic course taken | 217 | 193 | | |
| Any prior robotic course taken | 217 | 193 | 5.64 (0.75, 42.5) | 0.2 |
| No. of laparoscopic procedures (per 10 procedures) | 217 | 193 | 1.22 (0.97, 1.53) | 0.2 |
| No. of robotic procedures (per 10 procedures) | 217 | 193 | 1.18 (0.90, 1.53) | 0.3 |
| Baseline rating: never performed | 33 | 21 | – | |
| Baseline rating: score 1–3 | 107 | 98 | 5.71 (0.63, 52.0) | 0.2 |
| Baseline rating: score 4–5 | 77 | 74 | 11.9 (1.07, 131) | 0.085 |
| Months between first and current cases | 217 | 193 | 1.20 (0.97, 1.47) | 0.15 |
| Pelvic lymph node dissection | | | | |
| Trainee is a fellow | 132 | 95 | 2.91 (0.81, 10.5) | 0.2 |
| Years of training | 119 | 91 | 1.27 (1.01, 1.58) | 0.082 |
| Most familiarity with RALP | 132 | 95 | 0.28 (0.07, 1.17) | 0.14 |
| Any prior laparoscopic course taken | 132 | 95 | 3.33 (0.55, 20.2) | 0.3 |
| Any prior robotic course taken | 132 | 95 | 3.27 (0.91, 11.8) | 0.12 |
| No. of laparoscopic procedures (per 10 procedures) | 132 | 95 | 1.15 (0.97, 1.36) | 0.2 |
| No. of robotic procedures (per 10 procedures) | 132 | 95 | 1.07 (0.99, 1.16) | 0.2 |
| Baseline rating: never performed | 21 | 11 | | |
| Baseline rating: score 1–3 | 103 | 76 | | |
| Baseline rating: score 4–5 | 8 | 8 | | |
| Months between first and current cases | 132 | 95 | 1.24 (1.04, 1.48) | 0.046 |

CI = confidence interval; No. = number; OR = odds ratio; RALP = robotic-assisted laparoscopic prostatectomy.

a Estimates for bladder mobilization, endopelvic fascia, neurovascular bundle dissection, apical and dorsal venous complex dissection/urethral division, and posterior anastomosis are missing due to insufficient events to create a generalized estimating equation model. Empty cells indicate missing estimates due to insufficient events to create a generalized estimating equation model; cells with “–” indicate the reference group.

Being a fellow, having more years of training, and having taken a prior laparoscopic or robotic course increased the odds of receiving a score of ≥4 on the studied steps. While not all associations were statistically significant, effect sizes were relatively consistent across steps, and outlying odds ratios (ORs) had wide CIs. Being a fellow, a 1-yr increase in years of training, or having taken a laparoscopic or robotic course was associated with a two- to fourfold increase in the odds of scoring ≥4. Similarly, the number of prior laparoscopic or robotic procedures performed was significantly associated with proficiency in many of the steps. While not all associations were significant, ORs for the number of prior procedures were consistent, with most between 1.1 and 1.2 (per ten prior procedures) across these steps. Effect sizes and statistical significance for baseline rating and months between first and current cases were less consistent across steps.
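Because these ORs are scaled per ten prior procedures, their effect compounds multiplicatively over larger differences in experience; a quick worked example (our arithmetic, not an analysis from the paper):

```r
# An OR of 1.15 per 10 prior robotic procedures compounds over 50 additional
# procedures to roughly double the odds of an attending rating of >= 4
1.15^5
#> [1] 2.011357
```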

Results were similar for the association between baseline characteristics and scores of ≥4 for the GEARS domains (Table 3). There were significant associations between most baseline characteristics and most GEARS domains. While no individual characteristic was significantly associated with all GEARS domains, effect sizes were largely consistent across domains for the same baseline characteristic. There was a trend toward a higher median total GEARS score for fellows than for residents, although the difference appeared to disappear with increasing case numbers (Supplementary Fig. 4).

Table 3.

Association of achieving proficiency (attending evaluation score of ≥4) on individual Global Evaluative Assessment of Robotic Skills domains with baseline training and self-assessment covariates a

Each domain cell shows event N; OR (95% CI); p value.

| Characteristic | N | Depth perception | Bimanual dexterity | Efficiency |
| --- | --- | --- | --- | --- |
| Trainee is a fellow | 250 | 225; 8.21 (1.25, 54.1); p = 0.064 | 205; 4.92 (1.34, 18.0); p = 0.048 | 166; 9.51 (3.79, 23.8); p < 0.001 |
| Years of training | 233 | 211; 4.20 (2.05, 8.61); p < 0.001 | 194; 2.08 (1.09, 3.96); p = 0.062 | 160; 2.82 (1.59, 5.02); p = 0.003 |
| Most familiarity with RALP | 250 | 225; 6.98 (1.43, 34.0); p = 0.048 | 205; 4.11 (0.96, 17.6); p = 0.10 | 166; 1.48 (0.25, 8.64); p = 0.7 |
| Any prior laparoscopic course taken | 250 | 225 | 205; 14.6 (5.76, 37.0); p < 0.001 | 166; 7.98 (3.84, 16.6); p < 0.001 |
| Any prior robotic course taken | 250 | 225 | 205; 6.15 (1.60, 23.6); p = 0.031 | 166; 8.10 (4.47, 14.7); p < 0.001 |
| No. of laparoscopic procedures (per 10 procedures) | 250 | 225; 1.40 (1.03, 1.91); p = 0.067 | 205; 1.20 (1.04, 1.39); p = 0.046 | 166; 1.17 (1.01, 1.35); p = 0.082 |
| No. of robotic procedures (per 10 procedures) | 250 | 225; 1.15 (0.82, 1.62); p = 0.5 | 205; 1.23 (1.00, 1.51); p = 0.089 | 166; 1.24 (1.10, 1.41); p = 0.003 |
| Baseline rating: score 1–3 | 162 | 141; – | 154; – | 99; – |
| Baseline rating: score 4–5 | 88 | 84; 2.71 (0.46, 15.8); p = 0.3 | 51; 1.46 (0.40, 5.31); p = 0.6 | 67; 12.7 (4.70, 34.2); p < 0.001 |
| Months between first and current cases | 250 | 225; 1.38 (1.12, 1.70); p = 0.012 | 205; 1.57 (1.22, 2.03); p = 0.003 | 166; 1.13 (1.00, 1.28); p = 0.085 |

| Characteristic | N | Force sensitivity | Autonomy | Robotic control |
| --- | --- | --- | --- | --- |
| Trainee is a fellow | 250 | 206; 4.62 (2.13, 10.0); p < 0.001 | 176; 5.11 (1.58, 16.6); p = 0.026 | 221; 9.27 (2.12, 40.5); p = 0.014 |
| Years of training | 233 | 195; 1.58 (1.02, 2.44); p = 0.082 | 170; 1.89 (1.13, 3.16); p = 0.048 | 206; 3.28 (1.80, 5.99); p < 0.001 |
| Most familiarity with RALP | 250 | 206; 0.25 (0.09, 0.73); p = 0.040 | 176; 1.87 (0.39, 9.01); p = 0.5 | 221; 7.21 (1.39, 37.3); p = 0.050 |
| Any prior laparoscopic course taken | 250 | 206; 14.7 (2.57, 84.1); p = 0.012 | 176; 4.18 (0.58, 30.2); p = 0.2 | 221; 6.30 (0.59, 67.8); p = 0.2 |
| Any prior robotic course taken | 250 | 206; 8.14 (2.25, 29.4); p = 0.008 | 176; 6.29 (1.32, 29.9); p = 0.053 | 221; 4.25 (0.82, 22.0); p = 0.14 |
| No. of laparoscopic procedures (per 10 procedures) | 250 | 206; 1.24 (1.02, 1.50); p = 0.063 | 176; 1.12 (0.91, 1.38); p = 0.4 | 221; 1.25 (1.01, 1.54); p = 0.085 |
| No. of robotic procedures (per 10 procedures) | 250 | 206; 1.16 (1.06, 1.27); p = 0.006 | 176; 1.05 (0.94, 1.17); p = 0.5 | 221; 1.34 (0.90, 1.98); p = 0.2 |
| Baseline rating: score 1–3 | 209 | 166; – | 114; – | 138; – |
| Baseline rating: score 4–5 | 41 | 40; 13.6 (2.68, 69.5); p = 0.009 | 62; 2.18 (0.46, 10.4); p = 0.4 | 83; 7.09 (1.42, 35.4); p = 0.049 |
| Months between first and current cases | 250 | 206; 1.06 (0.84, 1.33); p = 0.6 | 176; 1.33 (1.19, 1.49); p < 0.001 | 221; 1.28 (1.08, 1.50); p = 0.016 |

CI = confidence interval; No. = number; OR = odds ratio; RALP = robotic-assisted laparoscopic prostatectomy.

a Empty cells indicate missing estimates due to insufficient events to create a generalized estimating equation model; cells with “–” indicate the reference group.

3.3. Impact on training

Summative feedback reports did not affect the number of steps performed (Supplementary Fig. 3). Previously performing the posterior anastomosis with a cumulative score of ≥4 increased the likelihood of performing it again (OR, 6.08; 95% CI, 2.47–15.0), indicating some influence of prior performance on future training (Table 4).

Table 4.

Prior performance on steps captured by performance summaries and their association with subsequently performing the same steps a

| Step and rating on previous 5 performances b | N | Event N | OR (95% CI) | p value |
| --- | --- | --- | --- | --- |
| Step 1: Bladder mobilization | | | | |
| Never performed | 35 | 11 | – | 0.8 |
| Avg attending rating <4 | 62 | 17 | 0.91 (0.20, 4.15) | |
| Avg attending rating ≥4 | 142 | 47 | 0.78 (0.39, 1.57) | |
| Step 2: Endopelvic fascia | | | | |
| Never performed | 151 | 5 | | |
| Avg attending rating <4 | 14 | 3 | | |
| Avg attending rating ≥4 | 73 | 5 | | |
| Step 3: Anterior bladder neck dissection | | | | |
| Never performed | 34 | 11 | – | 0.2 |
| Avg attending rating <4 | 120 | 53 | 1.66 (0.60, 4.53) | |
| Avg attending rating ≥4 | 82 | 48 | 2.92 (0.94, 9.05) | |
| Step 4: Posterior bladder neck dissection | | | | |
| Never performed | 51 | 9 | – | 0.5 |
| Avg attending rating <4 | 159 | 50 | 1.16 (0.65, 2.07) | |
| Avg attending rating ≥4 | 27 | 16 | 1.80 (0.70, 4.62) | |
| Step 5: Seminal vesicle dissection | | | | |
| Never performed | 21 | 13 | – | 0.11 |
| Avg attending rating <4 | 155 | 63 | 0.38 (0.11, 1.34) | |
| Avg attending rating ≥4 | 63 | 40 | 0.67 (0.20, 2.26) | |
| Step 6: Posterior dissection | | | | |
| Never performed | 73 | 10 | – | 0.9 |
| Avg attending rating <4 | 131 | 35 | 0.85 (0.36, 2.02) | |
| Avg attending rating ≥4 | 33 | 16 | 0.71 (0.14, 3.46) | |
| Step 7: Lateral pedicles | | | | |
| Never performed | 91 | 10 | – | 0.6 |
| Avg attending rating <4 | 118 | 35 | 1.14 (0.60, 2.14) | |
| Avg attending rating ≥4 | 31 | 7 | 0.67 (0.32, 1.43) | |
| Step 8: Neurovascular bundle dissection | | | | |
| Never performed | 132 | 6 | – | 0.5 |
| Avg attending rating <4 | 97 | 21 | 0.37 (0.05, 3.02) | |
| Avg attending rating ≥4 | 10 | 3 | 0.62 (0.02, 15.7) | |
| Step 9: Apical and dorsal venous complex dissection/urethral division | | | | |
| Never performed | 162 | 7 | | |
| Avg attending rating <4 | 42 | 2 | | |
| Avg attending rating ≥4 | 31 | 3 | | |
| Step 10: Posterior anastomosis | | | | |
| Never performed | 46 | 9 | – | 0.003 |
| Avg attending rating <4 | 37 | 22 | 4.71 (0.76, 29.1) | |
| Avg attending rating ≥4 | 156 | 101 | 6.08 (2.47, 15.0) | |
| Step 11: Anterior anastomosis | | | | |
| Never performed | 15 | 12 | – | 0.6 |
| Avg attending rating <4 | 51 | 43 | 1.41 (0.31, 6.42) | |
| Avg attending rating ≥4 | 172 | 155 | 2.14 (0.57, 8.03) | |
| Step 12: Pelvic lymph node dissection | | | | |
| Never performed | 34 | 9 | – | 0.5 |
| Avg attending rating <4 | 122 | 64 | 1.77 (0.68, 4.63) | |
| Avg attending rating ≥4 | 83 | 51 | 2.05 (0.68, 6.20) | |

Avg = average; CI = confidence interval; OR = odds ratio.

a Empty cells indicate missing estimates due to insufficient events to create a generalized estimating equation model; cells with “–” indicate the reference group.

b Indicates the average attending rating over the last five times the step was performed.

3.4. Utility assessment

Exit surveys measuring program utility were completed by 89% (8/9) of attending surgeons and 50% (7/14) of trainees (five of six fellows and two of eight residents). Five of the eight (63%) responding attending surgeons agreed or strongly agreed that summary reporting influenced which steps a trainee performed. Before summary reporting, 50% (4/8) of responding attending surgeons agreed or strongly agreed that discussing which steps a trainee had performed led the trainee to perform that step; this increased to 88% (7/8) with summary reporting. For responding trainees, the corresponding proportions were 33% (2/6) before and 50% (3/6) after summary reporting (Table 5).

Table 5.

Exit survey responses regarding the impact of the program on performance and the utility of the program as a training tool for trainees and for other index operations

| Evaluations of impact on performance, no. (%) | Before: Attending (N = 8) | Before: Trainee (N = 7) | After: Attending (N = 8) | After: Trainee (N = 7) |
| --- | --- | --- | --- | --- |
| How often was trainee's completion of steps in previous surgeries discussed? | | | | |
| Never | 1 (13) | 1 (17) | 1 (13) | 1 (17) |
| Sometimes | 3 (38) | 4 (67) | 1 (13) | 2 (33) |
| About half the time | 3 (38) | 0 (0) | 1 (13) | 1 (17) |
| Most of the time | 1 (13) | 1 (17) | 5 (63) | 2 (33) |
| Always | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Unknown | 0 | 1 | 0 | 1 |
| How often was trainee's completion of steps in upcoming surgery discussed? | | | | |
| Never | 0 (0) | 1 (17) | 0 (0) | 1 (17) |
| Sometimes | 3 (38) | 3 (50) | 1 (13) | 2 (33) |
| About half the time | 5 (63) | 1 (17) | 2 (25) | 0 (0) |
| Most of the time | 0 (0) | 1 (17) | 5 (63) | 3 (50) |
| Always | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Unknown | 0 | 1 | 0 | 1 |
| If a step was discussed before the surgery, how often did the trainee complete that step? | | | | |
| Never | 0 (0) | 1 (17) | 0 (0) | 1 (17) |
| Sometimes | 2 (25) | 2 (33) | 1 (13) | 1 (17) |
| About half the time | 2 (25) | 1 (17) | 0 (0) | 1 (17) |
| Most of the time | 4 (50) | 2 (33) | 6 (75) | 3 (50) |
| Always | 0 (0) | 0 (0) | 1 (13) | 0 (0) |
| Unknown | 0 | 1 | 0 | 1 |
| How often did you read the performance summary? | | | | |
| Never | NA | NA | 0 (0) | 1 (17) |
| Sometimes | NA | NA | 2 (25) | 1 (17) |
| About half the time | NA | NA | 0 (0) | 1 (17) |
| Most of the time | NA | NA | 3 (38) | 3 (50) |
| Always | NA | NA | 3 (38) | 0 (0) |
| Unknown | NA | NA | 0 | 1 |
| How often did you discuss the performance summary? | | | | |
| Never | NA | NA | 0 (0) | 1 (17) |
| Sometimes | NA | NA | 4 (50) | 3 (50) |
| About half the time | NA | NA | 0 (0) | 1 (17) |
| Most of the time | NA | NA | 4 (50) | 1 (17) |
| Always | NA | NA | 0 (0) | 0 (0) |
| Unknown | NA | NA | 0 | 1 |
| Discussing the performance summary led to changes in which steps the trainee performed during surgery | | | | |
| Strongly disagree | NA | NA | 0 (0) | 0 (0) |
| Somewhat disagree | NA | NA | 0 (0) | 1 (17) |
| Neither agree nor disagree | NA | NA | 3 (38) | 3 (50) |
| Somewhat agree | NA | NA | 4 (50) | 2 (33) |
| Strongly agree | NA | NA | 1 (13) | 0 (0) |
| Unknown | NA | NA | 0 | 1 |

| Utility evaluations, no. (%) | Attending (N = 8) | Trainee (N = 7) |
| --- | --- | --- |
| The assessment and feedback system was a tool that improved my training (yes) a | NA | 6 (86) |
| Such an experience tracking and reporting system would be beneficial for other index operations | | |
| Strongly disagree | 0 (0) | 0 (0) |
| Somewhat disagree | 0 (0) | 1 (14) |
| Neither agree nor disagree | 0 (0) | 0 (0) |
| Somewhat agree | 5 (63) | 5 (71) |
| Strongly agree | 3 (38) | 1 (14) |
| Regarding the daily summary e-mails prior to RALP, seeing a representation of the fellow/resident's performance informed which steps I had the fellow/resident perform b | | |
| Strongly disagree | 0 (0) | NA |
| Somewhat disagree | 1 (13) | NA |
| Neither agree nor disagree | 1 (13) | NA |
| Somewhat agree | 4 (50) | NA |
| Strongly agree | 2 (25) | NA |
| Regarding the daily summary e-mails prior to RALP, seeing a representation of the fellow/resident's performance informed which robotic skills I focused on with the fellow/resident b | | |
| Strongly disagree | 1 (13) | NA |
| Somewhat disagree | 0 (0) | NA |
| Neither agree nor disagree | 0 (0) | NA |
| Somewhat agree | 5 (63) | NA |
| Strongly agree | 2 (25) | NA |
| Regarding the daily summary e-mails prior to RALP, having a performance summary prior to surgery increased my likelihood of discussing prior performance with the resident/fellow compared with not having a performance summary b | | |
| Strongly disagree | 0 (0) | NA |
| Somewhat disagree | 1 (13) | NA |
| Neither agree nor disagree | 0 (0) | NA |
| Somewhat agree | 4 (50) | NA |
| Strongly agree | 3 (38) | NA |

NA = not applicable; no. = number; RALP = robotic-assisted laparoscopic prostatectomy.

a Only asked of trainees; response was yes/no.

b Only asked of attending surgeons.

Overall, 88% (7/8) of responding attending surgeons agreed or strongly agreed that seeing a representation of prior performance increased the chances of discussing prior performance with the trainee and informed which robotic skills to focus on subsequently. Most responding trainees (86%, 6/7) stated that the program improved their training and that it would be beneficial for other index surgeries (Table 5).

4. Discussion

To the authors’ knowledge, this study represents the first automated longitudinal assessment of robotic surgical skill acquisition and performance in which evaluations were initiated for all captured training events rather than only those selected by the trainee or attending surgeon [11], [12]. Certain baseline characteristics were also associated with achieving proficiency. Implementation of such a system was feasible, with a high capture rate of RALP training events and a minimal burden on participants. The performance summary system informed attending surgeons of prior performance and helped shape how they approached teaching trainees in subsequent cases, presenting a path toward proficiency-based education.

High capture and compliance rates with completing the evaluations were observed throughout the study period. The high capture rate of 87% may be due to MSK SURGE being the first system to continuously track every training event. MSK SURGE achieved an overall paired completion rate of 77%, which is higher than the paired completion rates reported for SIMPL (38%) and myTIPreport (43%; overall capture rate not reported) [11], [12]. The evaluation completion rate was consistent, except in the last 2 mo of the program, in contrast to other electronic evaluation programs that have reported decreased participation and participants occasionally forgetting to complete evaluations [11], [12]. The observed evaluation completion and capture rates could be attributed to the minimal time commitment per evaluation and the automated nature of the system.

Baseline characteristics, including the amount of training, prior number of laparoscopic or robotic cases, and prior laparoscopic or robotic course participation, were associated with achieving proficiency on RALP steps and GEARS domains. The study’s finding that formal robotic skill training is associated with improved surgical skill performance is consistent with the findings of previous studies [13], [14]. Therefore, we identified actionable areas, such as focusing on simulation and skill training outside the operating room, to subsequently enhance skill acquisition in the operating room.

We hypothesized that knowledge of prior experience would inform which steps a trainee performed during subsequent cases. Attending exit survey responses also suggested that performance reports influenced education plans and increased discussion of the trainee’s prior performance before a training event. Although the models identified less influence than expected, there was a positive effect on performing the posterior anastomosis after achieving proficiency. Reasons for this discrepancy might include unmeasured factors affecting training participation, too few cases to measure a change, and/or the possibility that attending surgeons allow trainees to perform the prescribed steps with little variation. If the last two were drivers, an experience tracking system such as the one described here would help identify such weaknesses in the curriculum and act as a quality improvement initiative for education. It could inform further simulated or in vivo training and support a structured approach to RALP instruction, as advocated previously [15], [16].

There was a wide range of concordance between attending surgeon evaluations and trainee self-evaluations, and discordance was often due to trainees rating themselves lower than the attending surgeons did. Trainee under-rating on self-assessments has been described previously [17], and the known difficulty surgeons have with self-assessment has been overcome with coaching sessions [18], [19]. GEARS was previously validated on individual surgical tasks (eg, seminal vesicle dissection or bowel suturing); here, it was applied to RALP as a whole, which could have introduced variability into the GEARS assessments [7], [20]. This limitation could be addressed with further standardization of evaluations, an external evaluator, and increased training in performing evaluations.

Despite the high capture and evaluation completion rates, neither was 100%, and steps performed infrequently by a trainee may not have been captured, contributing to the difficulty of analyzing longitudinal performance metrics for these steps. We are currently exploring methods to increase the automation of case capture and improve assessment completion. Administering exit surveys at the end of the academic year may also have lowered participation, as most rotating residents had returned to their home institutions. Finally, the evaluations, although structured, were subjective; we did not capture automated robotic performance metrics, which could add an objective component.

Despite the limitations inherent to a newly implemented system, MSK SURGE was quick to use and helped guide the education process for both trainees and attending surgeons. The importance of capturing every procedure that a trainee performs and of assuring technical proficiency, rather than relying on global assessments, has been discussed previously [21]. MSK SURGE applied this principle and captured technical competence during RALP, whereas prior feedback programs within urology offer only an overall assessment and are designed to capture cases only when initiated by those involved [11], [12], [22]. While systems such as SIMPL evaluate a trainee’s perceived readiness to operate independently, they do not assess technical skill on individual steps or verify that the trainee has performed all steps before completing training. MSK SURGE tracks actual trainee involvement and progress during surgery, and can be scaled to include all index surgeries, giving attending surgeons a snapshot of a trainee’s prior experience and performance before the next training event. Automated evaluations and feedback allow for further curricular development that addresses trainees’ proficiencies in an individualized manner. We also found that certain steps were rarely performed by trainees, which can help inform programs during curricular development and support the shift toward proficiency-directed training.

Modifiable baseline characteristics such as previous formalized course participation in robotic surgery, or passively modifiable ones such as years of training, were associated with higher performance evaluations. These baseline assessments can advise training programs on potential ways of increasing trainee performance outside the operating room. Most participants found the system helpful and would like to expand it to other index surgeries. Future efforts include removing barriers for participation by increasing automation, expanding to other case types, refining curricular development, and predicting proficiency outcomes from training assessments.

5. Conclusions

This is the first successful implementation of an automated, structured surgical experience and feedback system. Evaluations were quick to complete, and trainees’ longitudinal proficiency was tracked and presented before training events. The present data highlight the potential role of formalized robotics courses in surgical education to catalyze skill acquisition, as well as possible areas for curricular development. Such a program has the potential to personalize an individual’s training and to identify areas of opportunity for surgical education programs. Further studies are needed to assess the use of such a program on a larger scale (eg, other index surgeries, at the residency level) to better describe the learning curve over the complete education continuum and to inform quality improvement projects addressing curricular development toward proficiency-based education and evaluation.

This study was previously presented in poster format at the American Urological Association Annual Meeting on May 14, 2022, in New Orleans, LA, USA.



Author contributions: Alvin C. Goh had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.



Study concept and design: Goh, Wilcox Vanden Berg.

Acquisition of data: Goh, Cha, Coleman, Donahue, Eastham, Ehdaie, Laudone, Pietzak, Smith.

Analysis and interpretation of data: Goh, Vertosick, Sjoberg, Wilcox Vanden Berg.

Drafting of the manuscript: Wilcox Vanden Berg, Goh.

Critical revision of the manuscript for important intellectual content: Goh, Wilcox Vanden Berg.

Statistical analysis: Vertosick, Sjoberg.

Obtaining funding: Goh.

Administrative, technical, or material support: Goh.

Supervision: Goh.

Other: None.



Financial disclosures: Alvin C. Goh certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.



Funding/Support and role of the sponsor: This work was supported in part by the National Institutes of Health/National Cancer Institute (NIH/NCI) Cancer Center Support Grant to Memorial Sloan Kettering Cancer Center (P30 CA008748). The sponsor did not play a role in the design and conduct of the study; data collection, management, analysis, and interpretation; or preparation, review, and approval of the manuscript.

Associate Editor: Roderick van den Bergh

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.euros.2024.02.014.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Supplementary data 1
mmc1.docx (496.1KB, docx)

References

1. Swing S.R., Beeson M.S., Carraccio C., et al. Educational milestone development in the first 7 specialties to enter the next accreditation system. J Grad Med Educ. 2013;5:98–106. doi: 10.4300/JGME-05-01-33.
2. Potts J.R., 3rd. Assessment of competence: the accreditation council for graduate medical education/residency review committee perspective. Surg Clin North Am. 2016;96:15–24. doi: 10.1016/j.suc.2015.08.008.
3. Carlin A.M., Gasevic E., Shepard A.D. Effect of the 80-hour work week on resident operative experience in general surgery. Am J Surg. 2007;193:326–330. doi: 10.1016/j.amjsurg.2006.09.014.
4. Mattar S.G., Alseidi A.A., Jones D.B., et al. General surgery residency inadequately prepares trainees for fellowship: results of a survey of fellowship program directors. Ann Surg. 2013;258:440–449. doi: 10.1097/SLA.0b013e3182a191ca.
5. George B.C., Bohnen J.D., Williams R.G., et al. Readiness of US general surgery residents for independent practice. Ann Surg. 2017;266:582–594. doi: 10.1097/SLA.0000000000002414.
6. Okhunov Z., Safiullah S., Patel R., et al. Evaluation of urology residency training and perceived resident abilities in the United States. J Surg Educ. 2019;76:936–948. doi: 10.1016/j.jsurg.2019.02.002.
7. Aghazadeh M.A., Jayaratna I.S., Hung A.J., et al. External validation of global evaluative assessment of robotic skills (GEARS). Surg Endosc. 2015;29:3261–3266. doi: 10.1007/s00464-015-4070-8.
8. Hussein A.A., Ghani K.R., Peabody J., et al. Development and validation of an objective scoring tool for robot-assisted radical prostatectomy: prostatectomy assessment and competency evaluation. J Urol. 2017;197:1237–1244. doi: 10.1016/j.juro.2016.11.100.
9. Khan H., Kozlowski J.D., Hussein A.A., et al. Use of robotic anastomosis competency evaluation (RACE) for assessment of surgical competency during urethrovesical anastomosis. Can Urol Assoc J. 2019;13:E10–E16. doi: 10.5489/cuaj.5348.
10. Hung A.J., Chen J., Ghodoussipour S., et al. A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int. 2019;124:487–495. doi: 10.1111/bju.14735.
11. Wang R.S., Daignault-Newton S., Ambani S.N., Hafez K., George B.C., Kraft K.H. SIMPLifying urology residency operative assessments: a pilot study in urology training. J Urol. 2021;206:1009–1019. doi: 10.1097/JU.0000000000001874.
12. Nethala D., Martin C., Griffiths L., et al. Feasibility and utility of mobile applications for the evaluation of urology residents' surgical competence. Urology. 2021;158:11–17. doi: 10.1016/j.urology.2021.05.112.
13. Satava R.M., Stefanidis D., Levy J.S., et al. Proving the effectiveness of the fundamentals of robotic surgery (FRS) skills curriculum: a single-blinded, multispecialty, multi-institutional randomized control trial. Ann Surg. 2020;272:384–392. doi: 10.1097/SLA.0000000000003220.
14. Martin J.R., Stefanidis D., Dorin R.P., Goh A.C., Satava R.M., Levy J.S. Demonstrating the effectiveness of the fundamentals of robotic surgery (FRS) curriculum on the RobotiX mentor virtual reality simulation platform. J Robot Surg. 2021;15:187–193. doi: 10.1007/s11701-020-01085-4.
15. Hung A.J., Bottyan T., Clifford T.G., et al. Structured learning for robotic surgery utilizing a proficiency score: a pilot study. World J Urol. 2017;35:27–34. doi: 10.1007/s00345-016-1833-3.
16. Lovegrove C., Ahmed K., Novara G., et al. Modular training for robot-assisted radical prostatectomy: where to begin? J Surg Educ. 2017;74:486–494. doi: 10.1016/j.jsurg.2016.11.002.
17. Alameddine M.B., Claflin J., Scally C.P., et al. Resident surgeons underrate their laparoscopic skills and comfort level when compared with the rating by attending surgeons. J Surg Educ. 2015;72:1240–1246. doi: 10.1016/j.jsurg.2015.07.002.
18. Davis D.A., Mazmanian P.E., Fordis M., Van Harrison R., Thorpe K.E., Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: a systematic review. JAMA. 2006;296:1094–1102. doi: 10.1001/jama.296.9.1094.
19. Bull N.B., Silverman C.D., Bonrath E.M. Targeted surgical coaching can improve operative self-assessment ability: a single-blinded nonrandomized trial. Surgery. 2020;167:308–313. doi: 10.1016/j.surg.2019.08.002.
20. Goh A.C., Goldfarb D.W., Sander J.C., Miles B.J., Dunkin B.J. Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills. J Urol. 2012;187:247–252. doi: 10.1016/j.juro.2011.09.032.
21. Williams R.G., George B.C., Bohnen J.D., et al. A proposed blueprint for operative performance training, assessment, and certification. Ann Surg. 2021;273:701–708. doi: 10.1097/SLA.0000000000004467.
22. Harriman D., Singla R., Nguan C. The resident report card: a tool for operative feedback and evaluation of technical skills. J Surg Res. 2019;239:261–268. doi: 10.1016/j.jss.2019.02.006.
