Abstract
Objective:
Surgical skill assessment tools such as the End-to-End Assessment of Suturing Expertise (EASE) can differentiate surgeons by experience level. In this simulation-based study, we define a competency benchmark for intraoperative robotic suturing using EASE as a validated measure of performance.
Design:
Participants conducted a dry-lab vesicourethral anastomosis (VUA) exercise. Videos were each independently scored by 2 trained, blinded reviewers using EASE. Inter-rater reliability was measured with prevalence-adjusted bias-adjusted Kappa (PABAK) using two example videos. All videos were reviewed by an expert surgeon, who determined if the suturing skills exhibited were at a competency level expected for residency graduation (pass or fail). The Contrasting Group (CG) method was then used to set a pass/fail score at the intercept of the pass and fail cohorts’ EASE score distributions.
Setting:
Keck School of Medicine, University of Southern California
Participants:
26 participants: 8 medical students, 8 junior residents (PGY 1–2), 7 senior residents (PGY 3–5) and 3 attending urologists
Results:
After one round of consensus-building, average PABAK across EASE sub-skills was 0.90 (range 0.67–1.0). The CG method produced a competency benchmark EASE score of >35/39, with a pass rate of 10/26 (38%); 27% were deemed competent by expert evaluation. False positives and negatives were defined as medical students who passed and attendings who failed the assessment, respectively. This pass/fail score produced no false positives or negatives, and fewer junior residents than senior residents were considered competent by both the expert evaluation and the CG benchmark.
Conclusions:
Using an absolute standard setting method, competency scores were set to identify trainees who could competently execute a standardized dry-lab robotic suturing exercise. This standard can be used for high stakes decisions regarding a trainee’s technical readiness for independent practice. Future work includes validation of this standard in the clinical environment through correlation with clinical outcomes.
Keywords: Clinical Competence, Benchmarking, Education, Robotic Surgical Procedures
Introduction
Surgical training requires the gradual mastery of a wide array of technical skills, including careful tissue handling, dissection, and suturing. Effective suturing technique is critical across all surgical specialties and platforms, including robotic-assisted surgery (RAS). Robotic suturing performance has been shown to impact clinical outcomes, for example predicting time to continence recovery following robotic-assisted radical prostatectomy (RARP)1,2.
Previously, our group developed the End-to-End Assessment of Suturing Expertise (EASE) to describe intraoperative robotic suturing in a granular fashion and to quantitatively assess a surgeon’s robotic suturing performance. EASE scores are determined by examining 7 domains and 18 sub-skills considered necessary to suture efficiently in robotic-assisted surgery, derived via a rigorous Cognitive Task Analysis and Delphi methodology. EASE was previously validated as capable of discerning between expert surgeons with variable levels of experience using video taken during live surgery3. The constraints of evaluating surgeons in the operating room limit the ability to assess RAS novices, given the ethical and safety concerns4. This impacts the generalizability of the EASE score and leaves its use as an assessment of competency in training unsubstantiated. The current study seeks to address this validity gap by scoring the full spectrum of trainee suturing skill, from pure novices, such as medical students, to expert attending urologists with mastery of RAS.
Beyond providing scores of trainee and surgeon suturing performance, it is equally important to define standards that contextualize these measures for the purposes of promotion along the training continuum5. Surgical education has transformed from a time-based model to a competency-based one that demands trainees meet a minimum standard of knowledge and ability prior to independent practice. Competency standard-setting has previously been used to categorize trainee performance in other aspects of surgical training6, but has not yet been defined for suturing or its composite skill sets. Without such regimented benchmarking and a way to interpret results, these evaluation tools cannot be taken into the field.
Herein, we aim to contextualize EASE for use in the urologic and surgical education community by outlining a method for competency standard-setting for technical intraoperative robotic suturing skills.
Methods
This study evaluated 26 participants, ranging from medical students to attending urologists, on their performance of a dry-lab model of a robotic vesicourethral anastomosis using EASE. Videos of participant performance were also blindly rated by an expert surgeon evaluator on whether they were at the competency level expected of a graduating senior resident. These data were combined in the Contrasting Group method to establish an absolute standard, expressed as an EASE score, for competent completion of the dry-lab exercise.
Step 1: Inter-rater Reliability (IRR) and Dry Lab Assessment
Four blinded, trained raters (TH, AH, JK, JY) assessed 2 example cases of a dry-lab vesicourethral anastomosis (VUA) exercise using the End-to-End Assessment of Suturing Expertise (EASE) (IRB Protocol ID: STUDY00002850). The raters scored the 13 sub-skills that make up the overall EASE score (Table 1); the full EASE criteria can be reviewed in Appendix 1. For sub-skills that occur multiple times in a case, such as Needle Repositions, each instance was assigned a whole-number score (1, 2, or 3); to rate the entire case, the instance scores were averaged, yielding a continuous variable (see the sketch following Table 1).
Table 1.
EASE Sub-skills evaluated in study
| EASE Sub-Skills |
|---|
| Domain: Needle Handling |
| Needle Repositions |
| Hold Ratio |
| Hold Angle |
| Depth of Needle Hold |
| Domain: Needle Entry |
| Entry Angle |
| Domain: Needle Driving |
| Driving Smoothness |
| Driving Wrist Rotation |
| Depth of Suture |
| Domain: Needle Withdrawal |
| Withdrawal Wrist Rotation |
| Domain: Suture Placement/Management |
| Suture Spacing |
| Suture Awareness |
| Cinching |
| Tissue Approximation |
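To make the scoring arithmetic concrete, the sketch below (Python, with hypothetical ratings rather than study data) averages the repeated instances of each sub-skill and sums the 13 sub-skill averages into an overall EASE score, which has a maximum of 39 when every sub-skill is scored 1–3.

```python
from statistics import mean

def case_ease_score(case_ratings):
    """Overall EASE score for one case.

    case_ratings maps each sub-skill name to the list of whole-number
    instance scores (1-3) given in that case; each sub-skill contributes
    the mean of its instance scores to the case total.
    """
    return sum(mean(instances) for instances in case_ratings.values())

# Hypothetical ratings for two of the 13 sub-skills (illustration only)
ratings = {
    "Needle Repositions": [3, 2, 3, 3, 2, 3, 3, 3],  # one score per stitch
    "Hold Ratio": [3, 3, 2, 3, 3, 3, 2, 3],
    # ... remaining 11 sub-skills
}
print(round(case_ease_score(ratings), 1))
```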
After the initial scoring of the example cases, the four raters participated in a 1-hour consensus meeting to review discrepancies and improve scoring alignment across the sub-skills7. Inter-rater reliability was then measured with the prevalence-adjusted bias-adjusted kappa (PABAK) based on these scores.
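The manuscript does not specify the agreement formula beyond naming PABAK; a common formulation for k rating categories fixes chance agreement at 1/k, giving PABAK = (k·Po − 1)/(k − 1), where Po is the observed proportion of exact agreement. A minimal sketch under that assumption, with hypothetical paired scores:

```python
def pabak(rater_a, rater_b, k=3):
    """Prevalence- and bias-adjusted kappa for two raters (assumed formulation).

    rater_a, rater_b: paired instance scores from the two raters.
    k: number of rating categories (EASE sub-skills are scored 1-3 here).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must score the same instances")
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (k * p_o - 1) / (k - 1)

# Hypothetical paired scores for one sub-skill across the example cases
print(round(pabak([3, 2, 3, 3, 2, 3], [3, 2, 3, 2, 2, 3]), 2))
```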
Once aligned, the raters assessed the VUA dry-lab exercises of 26 participants with varying degrees of surgical experience: 8 medical students, 8 junior residents (PGY 1–2), 7 senior residents (PGY 3–5), and 3 attending urologists. The raters assessed 8 stitches per case, and each participant’s case was independently assessed by 2 blinded raters.
Step 2: Expert Review, Contrasting Group Method, and Analysis
All 26 participant cases were reviewed by an attending urologic surgeon (HD) who is an expert in robotic surgery and is responsible for training residents in robotic surgery at our institution. The expert evaluated each video and determined whether the robotic suturing skills presented were at a level acceptable for residency graduation (pass, “competent”/fail, “non-competent”). The expert was blinded to EASE scores and did not consider EASE in the evaluation.
The expert reviewer’s pass/fail determinations were used along with the averaged raters’ EASE scores to establish a threshold of competency using the Contrasting Group (CG) method. CG is a common technique in medical education studies for categorizing performance4,6,8–11. Previous studies have demonstrated that CG identifies cutoff points at levels similar to those identified using other methods (e.g., borderline groups) and have provided evidence of consistency across methods2.
The CG method identifies a pass/fail cutoff score for competency at the intersection of the EASE score distributions of the expert evaluation-derived competent and non-competent groups, allowing for the identification of observed false positives and negatives2. Observed false positives and negatives were defined as medical students (a priori non-competent) who passed and attending urologists (a priori competent) who failed the assessment by the CG cutoff, respectively. Residents were considered a heterogeneous group for whom a priori conclusions could not be made. Theoretical false positives and false negatives were characterized with the cumulative distribution function2 by comparing the medical student and attending urologist cohorts.
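The exact statistical implementation is not reported beyond the cited framework4; the sketch below is one common way to realize it, assuming normal approximations of the group score distributions and using placeholder data. It locates the cutoff at the crossing of the two fitted densities and evaluates theoretical false positive and negative rates with the cumulative distribution function.

```python
import numpy as np
from scipy.stats import norm

def fit_normal(scores):
    """Normal approximation (mean, SD) of a group's overall EASE scores."""
    return np.mean(scores), np.std(scores, ddof=1)

def cg_cutoff(fail_scores, pass_scores):
    """Cutoff score at the intersection of the fail and pass distributions."""
    mu_f, sd_f = fit_normal(fail_scores)
    mu_p, sd_p = fit_normal(pass_scores)
    grid = np.linspace(min(mu_f, mu_p), max(mu_f, mu_p), 10_000)
    gap = np.abs(norm.pdf(grid, mu_f, sd_f) - norm.pdf(grid, mu_p, sd_p))
    return grid[np.argmin(gap)]

def upper_tail(scores, cutoff):
    """Probability mass of a group's fitted distribution above the cutoff."""
    mu, sd = fit_normal(scores)
    return 1 - norm.cdf(cutoff, mu, sd)

# Hypothetical averaged EASE totals (placeholders, not study data)
expert_fail = [29.8, 30.5, 31.2, 32.7, 33.5, 34.0]
expert_pass = [35.9, 36.3, 37.2, 37.9, 38.3]
med_students = [29.5, 30.1, 31.0, 32.2]
attendings = [37.5, 38.0, 38.3]

cutoff = cg_cutoff(expert_fail, expert_pass)
tfp = upper_tail(med_students, cutoff)    # a priori non-competent scoring above the cutoff
tfn = 1 - upper_tail(attendings, cutoff)  # a priori competent scoring below the cutoff
print(f"cutoff = {cutoff:.1f}, TFP = {tfp:.1%}, TFN = {tfn:.1%}")
```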
We also characterized groups that passed one evaluation method and failed the other (e.g., CG pass and expert fail, or vice versa), referred to as swap groups. In addition to observed swaps, we used the cumulative distribution function4 to calculate theoretical “swap-up” (CG pass/expert fail) and “swap-down” (CG fail/expert pass) rates.
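Under the same assumptions, the theoretical swap rates are the corresponding tail probabilities of the expert-derived groups relative to the CG cutoff; the short continuation below uses placeholder parameters, not study estimates.

```python
from scipy.stats import norm

# Hypothetical fitted parameters for the expert-fail and expert-pass groups,
# plus a CG cutoff such as one obtained from the sketch above (placeholders).
mu_fail, sd_fail = 32.0, 1.8
mu_pass, sd_pass = 37.0, 1.2
cutoff = 35.0

swap_up = 1 - norm.cdf(cutoff, mu_fail, sd_fail)  # expert fail, CG pass
swap_down = norm.cdf(cutoff, mu_pass, sd_pass)    # expert pass, CG fail
print(f"theoretical swap-up = {swap_up:.1%}, swap-down = {swap_down:.1%}")
```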
Additionally, EASE scores were compared between two experience-level subgroups: 1) medical students and junior residents, and 2) senior residents and attending urologists. A Mann-Whitney U test was used to compare the overall, domain, and sub-skill EASE scores of the two groups, with the average of the two raters’ EASE scores for each of the 26 cases used as input. EASE scores were also compared by expert evaluation pass/fail and by the CG score cutoff.
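For illustration only (hypothetical score vectors, not study data), this comparison maps onto scipy’s Mann-Whitney U test applied to the averaged overall EASE scores; the same call applies to domain and sub-skill scores.

```python
from scipy.stats import mannwhitneyu

# Hypothetical averaged overall EASE scores (placeholders, not study data)
students_and_jr = [30.2, 31.5, 32.7, 33.1, 29.9, 34.0]
sr_and_attendings = [35.8, 36.3, 37.2, 37.9, 38.1]

u_stat, p_value = mannwhitneyu(students_and_jr, sr_and_attendings,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```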
Results
Participants were categorized by their level of training. The resident cohort included trainees from General Surgery (2), Obstetrics and Gynecology (1), and Urology (13). All video raters scored the two training videos. After one round of consensus-building, average PABAK across EASE sub-skills improved from 0.48 (range 0.19–0.85) to 0.90 (range 0.67–1.0), with all agreement scores characterized as substantial or better (PABAK ≥0.61).
The CG method, based on EASE evaluation by trained non-expert raters, produced a competency benchmark score of >35/39 (Fig. 1), with a pass rate of 10/26 (38%); 7/26 (27%) were deemed competent by expert evaluation. All medical students were deemed non-competent and all attending urologists were deemed competent by both the CG cutoff and expert evaluation (Fig. 2). Furthermore, using the previously documented distribution function4, theoretical false positive (TFP) and false negative (TFN) rates were 2.4% and 1.3%, respectively. This would correspond to a sensitivity and specificity of 97.6% and 98.7%, respectively.
Fig. 1: EASE Standard setting using the Contrasting Groups method (color)
Fig. 2: Competency Rates based on Expert and CG Standard Setting Evaluation
Total agreement between expert evaluation and the CG cutoff was 80.77% (Table 2). Compared with expert evaluation, the CG cutoff had a sensitivity of 85.6% (6/7) and a specificity of 79.0% (15/19). The CG fail group (n=16) consisted of 8 medical students, 6 junior residents, and 2 senior residents; the expert evaluation fail group (n=19) consisted of 8 medical students, 7 junior residents, and 4 senior residents. Fewer junior residents than senior residents were considered competent by both cutoff methods.
Table 2.
Expert vs. EASE Cutoff Agreement, Swap Ups and Swap Downs
Swap Up: Competent by CG Cutoff and non-competent by Expert Evaluation
Swap Down: Non-competent by CG Cutoff and competent by Expert Evaluation
Four participants (15.3%) were considered swap-ups: competent by the CG cutoff (EASE) and non-competent by expert surgeon evaluation. Swap-ups consisted of 1 junior resident and 3 senior residents, with overall EASE scores between 35.5 and 37.5. Swap-downs consisted of 1 senior resident (3.8%) with an overall EASE score of 33.8 (Table 2).
When comparing experience levels, 1) medical students and junior residents versus 2) senior residents and attendings, significant differences were found in 6/13 sub-skills as well as the overall EASE score (Table 3). One additional sub-skill (Cinching) was significantly different when comparing CG cutoff groups, whereas no additional sub-skills were significantly different when comparing expert evaluation groups. All EASE domains except Needle Entry differed significantly in the comparisons by experience level, CG cutoff, and expert evaluation, and all three comparison methods found significant differences in overall EASE score.
Table 3.
Median, IQR and Mann Whitney U Tests on EASE Sub-Skills by Multiple Comparison Methods (color)
| EASE Domain | EASE Sub-Skill | Novices and JR (N=16) Median (IQR) | SR and Attendings (N=10) Median (IQR) | p-value | CG Fail (N=16) Median (IQR) | CG Pass (N=10) Median (IQR) | p-value | Expert Fail (N=19) Median (IQR) | Expert Pass (N=7) Median (IQR) | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Pre-Planning | Field Optimization | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Needle Handling | Needle Repositions | 2.8 (2.6–2.9) | 2.8 (2.8–2.9) | 0.544 | 2.4 (1.9–2.6) | 2.4 (2.1–2.6) | 0.579 | 2.3 (2.1–2.6) | 2.6 (1.9–2.9) | 0.247 |
| | Hold Ratio | 2.4 (1.8–2.8) | 2.8 (2.6–2.9) | 0.020 | 2.4 (1.8–2.6) | 2.8 (2.8–3.0) | <0.001 | 2.4 (2.0–2.8) | 2.8 (2.8–2.9) | 0.025 |
| | Hold Angle | 2.5 (2.2–2.7) | 2.8 (2.8–3.0) | 0.006 | 2.5 (2.2–2.7) | 2.9 (2.8–3.0) | 0.001 | 2.6 (2.3–2.8) | 2.8 (2.8–3.0) | 0.034 |
| | Depth of Needle Hold | 2.8 (2.6–2.9) | 2.8 (2.8–2.9) | 0.672 | 2.8 (2.6–2.9) | 2.9 (2.8–2.9) | 0.302 | 2.8 (2.7–2.9) | 2.8 (2.8–2.9) | 0.749 |
| Needle Entry | Entry Angle | 3.0 (2.8–3.0) | 3.0 (3.0–3.0) | 0.056 | 3.0 (2.8–3.0) | 3.0 (3.0–3.0) | 0.056 | 3.0 (2.9–3.0) | 3.0 (3.0–3.0) | 0.142 |
| Needle Driving | Driving Smoothness | 2.6 (2.5–2.8) | 3.0 (2.9–3.0) | 0.002 | 2.6 (2.5–2.8) | 3.0 (2.9–3.0) | 0.017 | 2.6 (2.5–2.9) | 3.0 (2.9–3.0) | 0.008 |
| | Driving Wrist Rotation | 2.7 (2.3–2.8) | 3.0 (2.9–3.0) | 0.003 | 2.7 (2.3–2.9) | 2.9 (2.8–3.0) | 0.027 | 2.7 (2.4–2.9) | 2.9 (2.9–3.0) | 0.013 |
| | Depth of Suture | 3.0 (2.8–3.0) | 3.0 (2.9–3.0) | 0.349 | 3.0 (2.8–3.0) | 3.0 (3.0–3.0) | 0.103 | 3.0 (2.9–3.0) | 3.0 (3.0–3.0) | 0.119 |
| Needle Withdrawal | Withdrawal Wrist Rotation | 2.5 (2.2–2.7) | 2.9 (2.7–2.9) | 0.003 | 2.4 (2.2–2.7) | 2.8 (2.8–2.9) | 0.001 | 2.6 (2.2–2.7) | 2.8 (2.8–2.9) | 0.019 |
| Suture Placement and Management | Suture Spacing | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.154 | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.154 | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.274 |
| | Suture Awareness | 2.0 (1.0–3.0) | 3.0 (2.0–3.0) | 0.034 | 2.0 (1.0–3.0) | 3.0 (3.0–3.0) | 0.002 | 2.0 (1.0–3.0) | 3.0 (2.0–3.0) | 0.044 |
| | Cinching | 2.0 (2.0–3.0) | 3.0 (2.0–3.0) | 0.088 | 2.0 (2.0–3.0) | 3.0 (3.0–3.0) | 0.016 | 2.0 (2.0–3.0) | 3.0 (2.0–3.0) | 0.165 |
| | Tissue Approximation | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.812 | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.154 | 3.0 (3.0–3.0) | 3.0 (3.0–3.0) | 0.274 |
| Knot Tying | Tie Length | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | Prep | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | Knot Tension | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | Secure/Air Knot | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Total Score | -- | 32.7 (30.9–34.6) | 37.2 (35.5–37.9) | <0.001 | 32.7 (30.9–33.9) | 37.2 (36.3–37.9) | <0.001 | 33.5 (31.2–35.0) | 37.9 (36.3–38.3) | 0.002 |

Legend: Significant differences between groups are indicated in color.
Discussion
As surgical training continues to evolve from a pure apprenticeship model to one incorporating objective assessment, the surgical education community must meet the demand and provide usable, standardized, and scalable solutions that will help train future surgical talent. Using an absolute standard-setting method, competency scores were set with a validated assessment tool to identify trainees who could competently execute a standardized dry-lab robotic suturing exercise. Critically, the competency level identified was associated with no observed false positives or negatives, strengthening the consequential validity of EASE for use in robotic suturing training.
Inter-rater Reliability
EASE has previously demonstrated validity evidence related to IRR when trained raters are used. Using previously described consensus-building techniques for scoring, our group showed meaningful improvement in IRR, with all agreement scores considered substantial or excellent12.
Contrasting Groups Method
Use of the CG method requires two groups defined by clear differences in expertise. In many previous studies, these groups were delineated by experience level, such as novices who are not expected to pass the test and experienced individuals who a priori should all pass2. While our study included these two groups (medical students and attending urologists, respectively), they were not the groups used to set the CG cutoff. Instead, the groups were delineated by whether an expert evaluator felt the participant was at the level of competency of a graduating resident. In this manner, the absolute standard was not based on a surrogate of proficiency, such as experience level, but rather on what could be considered the status quo of current surgical training: surgical educators who make a priori decisions of competency regarding their trainees. Screening the pass and fail groups for the unexpected presence of medical students and attending urologists could serve as a check on the standard-setting results, given the evaluator was blinded to experience level.
Importantly, there were no observed false positives or false negatives in either the expert evaluation or the CG cutoff (EASE-based methodology). Additionally, the reported TFP and TFN values are well in line with previously reported values2. As the authors of this method note, it cannot overcome the shortcomings of small sample sizes; however, these values, in concert with the observed false positives and negatives, help better characterize the results of the cutoff.
Although no false positives or negatives were identified, the expert evaluation method had a lower pass rate than the CG cutoff. The swap groups consisted exclusively of junior and senior residents, which is expected given that skill level can vary both throughout residency and at the time competency is determined. While individual EASE scores can provide directional information, variability may exist in a trainee’s measured technical skills13. Future studies with longitudinal trends in EASE scores throughout training will provide invaluable information on variability and skill progression; current studies have not established a consistent learning curve for standard robotic procedures14.
In both observed and theoretical calculations, swap-ups were more prevalent than swap-downs, indicating that the CG cutoff could “upskill” participants relative to an expert evaluator. A specificity of 79% (comparing CG evaluation to expert evaluation) may not be acceptable in a scenario as high stakes as surgical training. However, these results are promising for an initial study, and larger cohorts from multiple sites or the use of additional expert raters may alter these trends.
Application of Contrasting Groups Cutoff
This standard can be used for high-stakes decisions regarding a trainee’s technical readiness for independent practice, given it was derived via an “absolute” standard-setting method15,16. The cutoff, set at the competency expected at residency graduation, seems best situated as a directional tool for trainees to gauge whether they are on a trajectory to become competent in robotic suturing. Part of this use case will be driven by the risk tolerance of surgical educators in using the established cutoff to guide trainees when expert evaluators are not present; this, however, must be balanced against the extremely limited time and number of expert evaluator surgeons available to trainees. Again, the fact that no medical students or attending urologists were incorrectly categorized by the CG cutoff should help build confidence, along with the minimal theoretical false positive and false negative rates. Additionally, the expected finding that a higher proportion of senior residents than junior residents was considered competent by both expert evaluation and the CG cutoff further supports the validity of the set standard.
Comparison of Contrasting Groups Cutoff to Experience Level and Expert Evaluation
To better understand the differences between evaluation methods, we compared the EASE scores of the groups detailed in Table 3. All comparison methods showed similar differences in EASE sub-skill scores; one additional sub-skill was significantly different when comparing groups determined by the CG cutoff. Additionally, all domains except Needle Entry were significant in all comparisons. This suggests that the three comparisons, which are proxies for how current trainees can be evaluated, capture similar, though not identical, pictures of EASE performance in trainees. It is notable that the CG cutoff captured differences in the greatest number of sub-skills.
Compared with the significant sub-skills identified in the live surgical setting3, more EASE sub-skills were statistically significantly different in this more diverse cohort of trainees. Driving Smoothness and Driving Wrist Rotation were significant in both studies. Interestingly, Needle Repositions was no longer significant in any of the comparison groups. This could be due to differences in testing medium, setting, or group composition, and warrants evaluation in further studies.
Limitations
There are several limitations of this study that should be considered. First, all medical students, residents, and attending urologists came from a single institution, and a single expert reviewer evaluated all participants for the pass/fail determination. Future directions include broadening the evaluation to residents and expert evaluators from multiple institutions. The sample of attending urologists was small, owing partly to time constraints and the overall limited number of experts2. Consensus-building was undertaken only for the training videos, not for all participant videos; however, EASE was previously shown to be reliable and reproducible with rater training, and all EASE raters were trained under a standardized protocol and underwent consensus-building for the training videos, achieving PABAK values of substantial or higher levels. Use of the overall EASE score to set competency may obscure more nuanced trends in specific sub-skills and does not account for whether a minimum level of competence is achieved in each sub-skill. Finally, the use of RAS is far more prevalent in urology, potentially limiting the generalizability of this study to other specialties; however, the study did include 2 general surgery residents and 1 obstetrics/gynecology resident.
Future Steps
In addition to the future steps mentioned above, future work includes understanding the consequences of this standard in the clinical environment through correlation with clinical outcomes. It will also be important to investigate standard-setting across different levels of residency training, to help identify individuals in need of remediation earlier in training.
Finally, we envision the creation of an interactive online tool that can help trainees anywhere better understand the tenets of robotic suturing and incorporate EASE into their practice routines via automated evaluation. Automated evaluation could be a boon to training surgical residents. Some urology programs provide access to robotic consoles, either in a dry lab or with a da Vinci Surgical Skills Simulator (dVSSS); however, allowing residents to train in these settings without proper guidance could foster sub-optimal technique or slow the learning curve. Automated evaluation can help contextualize performance for trainees without straining the limited bandwidth of surgical experts. At the same time, we have noted that concordance between expert and CG evaluation is not perfect, so while automated evaluation can serve as a helpful adjunct for training surgeons, it will not, in its current state, fully replace evaluation by expert surgeons. As these technologies progress and become more precise, they could potentially support decisions related to robotic surgery certification of competency.
Studies have shown translatability between virtual reality and live surgical outcomes17,18. Opportunities to leverage emerging technologies for surgical training and combine them with established techniques such as “gamification” can drive participation and create a positive feedback loop for both trainees and the data pipeline19. As more studies examine the translatability of VR training and other technologies into real-world scenarios, further insight into how best to replicate those scenarios will emerge. For example, work by Ghazi et al.20 highlights the promise of 3D-printed anatomic replicas as comprehensive, interactive simulation platforms that can support critical surgical decision-making and serve as effective teaching tools.
Conclusion
Using an absolute standard setting method, scores were set using a validated assessment tool to identify trainees who could competently execute a standardized dry-lab robotic suturing exercise. This standard can be used for high stakes decisions regarding a trainee’s technical readiness for independent practice and can inform trainees of their general level of competency.
Overall, this study serves as the initial evidence for the use of EASE as a tool to guide robotic surgical evaluation. Surgical assessment tools can be leveraged for a variety of applications, including real-time, automated residency evaluation with artificial intelligence and credentialing in robotic surgery1,21,22.
Supplementary Material
Highlights.
Robotic suturing standards were established with an absolute standard setting method
Participants included medical students, surgical residents and attending urologists
Competency grounded in expert evaluation instead of proxies e.g., experience level
This standard can be used for high stakes decisions regarding a trainee’s readiness
Future applications may include automated evaluation and competency credentialing
Funding Source:
Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA251579. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Abbreviations:
- EASE
End-to-End Assessment of Suturing Expertise
- VUA
Vesicourethral anastomosis
- PGY
Postgraduate year
- PABAK
Prevalence-adjusted bias-adjusted Kappa
- CG
Contrasting Group
- RARP
Robotic-assisted radical prostatectomy
- TFP
Theoretical false positives
- TFN
Theoretical false negatives
References
- 1. Kiyasseh D, Ma R, Haque TF, et al. A vision transformer for decoding surgeon activity from surgical videos. Nature Biomedical Engineering. 2023;7(6):780–796. doi:10.1038/s41551-023-01010-8
- 2. Trinh L, Mingo S, Vanstrum EB, et al. Survival Analysis Using Surgeon Skill Metrics and Patient Factors to Predict Urinary Continence Recovery After Robot-assisted Radical Prostatectomy. Eur Urol Focus. 2022;8(2):623–630. doi:10.1016/j.euf.2021.04.001
- 3. Haque TF, Hui A, You J, et al. An Assessment Tool to Provide Targeted Feedback to Robotic Surgical Trainees: Development and Validation of the End-To-End Assessment of Suturing Expertise (EASE). Urol Pract. 2022;9(6):532–539. doi:10.1097/upj.0000000000000344
- 4. Jørgensen M, Konge L, Subhi Y. Contrasting groups’ standard setting for consequences analysis in validity studies: reporting considerations. Adv Simul (Lond). 2018;3:5. doi:10.1186/s41077-018-0064-7
- 5. Mozafarpour S, Kavoussi LR. Editorial Commentary. Urology Practice. 2022;9(6):540–541. doi:10.1097/UPJ.0000000000000344.02
- 6. Goldenberg MG, Garbens A, Szasz P, Hauer T, Grantcharov TP. Systematic review to establish absolute standards for technical performance in surgery. Br J Surg. 2017;104(1):13–21. doi:10.1002/bjs.10313
- 7. Chu TN, Sanford DI, Wong EY, et al. PD01-11 Automated surgical skills assessment: consensus building towards human-derived ground truth scores for machine learning algorithms. Journal of Urology. 2023;209(Supplement 4):e67. doi:10.1097/JU.0000000000003218.11
- 8. Burrows PJ, Bingham L, Brailovsky CA. A Modified Contrasting Groups Method Used for Setting the Passmark in a Small Scale Standardised Patient Examination. Advances in Health Sciences Education. 1999;4(2):145–154. doi:10.1023/A:1009826701445
- 9. Clauser BE, Nungester RJ. Setting Standards on Performance Assessments of Physicians’ Clinical Skills Using Contrasting Groups and Receiver Operating Characteristic Curves. Evaluation & the Health Professions. 1997;20(2):215–238. doi:10.1177/016327879702000207
- 10. Jacobsen N, Nolsøe CP, Konge L, et al. Development of and Gathering Validity Evidence for a Theoretical Test in Contrast-Enhanced Ultrasound. Ultrasound in Medicine & Biology. 2022;48(2):248–256. doi:10.1016/j.ultrasmedbio.2021.10.016
- 11. Jaud C, Salleron J, Cisse C, Angioi-Duprez K, Berrod JP, Conart JB. EyeSi Surgical Simulator: validation of a proficiency-based test for assessment of vitreoretinal surgical skills. Acta Ophthalmol. 2021;99(4):390–396. doi:10.1111/aos.14628
- 12. Slade SC, Finnegan S, Dionne CE, Underwood M, Buchbinder R. The Consensus on Exercise Reporting Template (CERT) applied to exercise interventions in musculoskeletal trials demonstrated good rater agreement and incomplete reporting. Journal of Clinical Epidemiology. 2018;103:120–130. doi:10.1016/j.jclinepi.2018.07.009
- 13. Stulberg JJ, Huang R, Kreutzer L, et al. Association Between Surgeon Technical Skills and Patient Outcomes [published corrections appear in JAMA Surg. 2020;155(10):1002 and JAMA Surg. 2021;156(7):694]. JAMA Surg. 2020;155(10):960–968. doi:10.1001/jamasurg.2020.3007
- 14. Chahal B, Aydin A, Amin MSA, et al. The learning curves of major laparoscopic and robotic procedures in urology: a systematic review. Int J Surg. 2023;109(7):2037–2057. doi:10.1097/JS9.0000000000000345
- 15. Kane MT, Crooks TJ, Cohen AS. Designing and Evaluating Standard-Setting Procedures for Licensure and Certification Tests. Adv Health Sci Educ Theory Pract. 1999;4(3):195–207. doi:10.1023/a:1009849528247
- 16. Norcini JJ. Setting standards on educational tests. Med Educ. 2003;37(5):464–469. doi:10.1046/j.1365-2923.2003.01495.x
- 17. Laca JA, Kiyasseh D, Kocielnik R, et al. PD30-05 AI-based video feedback to improve novice performance on a robotic suturing task. Journal of Urology. 2023;209(Supplement 4):e832. doi:10.1097/JU.0000000000003316.05
- 18. Chu TN, Wong EY, Ma R, et al. A Multi-institution Study on the Association of Virtual Reality Skills with Continence Recovery after Robot-assisted Radical Prostatectomy. Eur Urol Focus. 2023;S2405-4569(23)00122-0. doi:10.1016/j.euf.2023.05.011
- 19. Mokadam NA, Lee R, Vaporciyan AA, et al. Gamification in thoracic surgical education: Using competition to fuel performance. The Journal of Thoracic and Cardiovascular Surgery. 2015;150(5):1052–1058. doi:10.1016/j.jtcvs.2015.07.064
- 20. Ghazi AE, Teplitz BA. Role of 3D printing in surgical education for robotic urology procedures. Transl Androl Urol. 2020;9(2):931–941. doi:10.21037/tau.2020.01.03
- 21. Kiyasseh D, Laca J, Haque TF, et al. A multi-institutional study using artificial intelligence to provide reliable and fair feedback to surgeons. Communications Medicine. 2023;3(1):42. doi:10.1038/s43856-023-00263-3
- 22. Kiyasseh D, Laca J, Haque TF, et al. Human visual explanations mitigate bias in AI-based assessment of surgeon skills. npj Digital Medicine. 2023;6(1):54. doi:10.1038/s41746-023-00766-2