Author manuscript; available in PMC: 2023 May 1.
Published in final edited form as: Urol Pract. 2021 Sep;8(5):596–604. doi: 10.1097/upj.0000000000000246

Development and validation of an objective scoring tool to evaluate surgical dissection: Dissection Assessment for Robotic Technique (DART)

Erik B Vanstrum 1, Runzhuo Ma 1, Jacqueline Maya-Silva 1, Daniel Sanford 1, Jessica H Nguyen 1, Xiaomeng Lei 2, Michael Chevinksy 1, Alireza Ghoreifi 1, Jullet Han 1, Charles F Polotti 1, Ryan Powers 1, Wesley Yip 1, Michael Zhang 1, Monish Aron 1, Justin Collins 3, Siamak Daneshmand 1, John W Davis 4, Mihir M Desai 1, Roger Gerjy 5, Alvin C Goh 6, Rainer Kimmig 7, Thomas S Lendvay 8, James Porter 9, Rene Sotelo 1, Chandru P Sundaram 10, Steven Cen 2, Inderbir S Gill 1, Andrew J Hung 1
PMCID: PMC10150863  NIHMSID: NIHMS1875028  PMID: 37131998

Abstract

Purpose

Evaluation of surgical competency has important implications for training new surgeons, accreditation, and improving patient outcomes. A method to specifically evaluate dissection performance does not yet exist. This project aimed to design a tool to assess surgical dissection quality.

Methods

The Delphi method was used to validate the structure and content of the dissection evaluation. A multi-institutional and multi-disciplinary panel of 14 expert surgeons systematically evaluated each element of the dissection tool. Ten blinded reviewers evaluated 46 de-identified videos of pelvic lymph node and seminal vesicle dissections performed during robot-assisted radical prostatectomy. Inter-rater variability was calculated using the prevalence-adjusted and bias-adjusted kappa. The area under the receiver operating characteristic curve (AUC) was used to assess the ability of overall DART scores, as well as individual domains, to discriminate trainees (≤100 robotic cases) from experts (>100).

Results

Four rounds of the Delphi method achieved language and content validity for 27 of 28 elements. Use of a 3- or 5-point scale remained contested; thus, both scales were evaluated during validation. The 3-point scale showed improved kappa for each domain. Experts demonstrated significantly greater total scores on both scales (3-point, p <0.001; 5-point, p <0.001). The ability to distinguish experience was equivalent for total score on both scales (3-point AUC = 0.92, CI 0.82-1.00; 5-point AUC = 0.92, CI 0.83-1.00).

Conclusions

We present the development and validation of Dissection Assessment for Robotic Technique (DART), an objective and reproducible 3-point surgical assessment to evaluate tissue dissection. DART can effectively differentiate levels of surgeon experience and can be used in multiple surgical steps.

Keywords: Tissue dissection, surgical education, assessment tool

Introduction

Surgical education has evolved from curricula based on sheer repetition toward those rooted in assessment of surgical skills.1 With the increasing utilization of robotics, there remain significant challenges regarding the safe and effective transfer of expertise to novice surgeons.2 Additionally, evaluation of surgical competency has important implications for improving patient outcomes and credentialing for robotic procedures.2,3 To minimize the burden that training places on the operating room, there is a need for validated evaluation tools to assess and promote technical competency in foundational surgical skills.2

The surgical community has designed numerous assessment tools across specialties, with urologists among the most active participants on this front.4 These tools have progressed toward increasing granularity of focus. Early assessments emphasized the evaluation of global skills (e.g., GEARS), such as surgical independence and general ability to operate robotic controls.5 More recent assessment tools focus on specific procedures (e.g., PACE) and even steps within a procedure (e.g., RACE).6,7

This progression addresses deficiencies in early, global assessment tools (i.e., GEARS) by providing feedback at a more detailed and comprehensive level (i.e., PACE, RACE). However, the emphasis on procedure-specific evaluation may limit the utility of these tools. There can be significant variation among institutions in how procedures are performed, complicating the applicability of stepwise procedural assessment. Additionally, feedback from procedure-specific evaluations may not translate to other procedures. Because manually scored, time-consuming evaluations are resource intensive to implement, skills assessments that focus on increasingly narrow aspects of surgical expertise may be restricted in their practical use.

We propose that a detailed assessment of fundamental surgical skills common to many procedures both recognizes the need for rigorous evaluation and applies broadly to a variety of surgical specialties and situations. For example, tissue dissection is a foundational technical skill set that comprises a large portion of surgery across disciplines and procedures.8 While dissection performance is addressed to some degree in both global and procedure-specific assessment tools, this skill set has discrete and identifiable components that can be evaluated at a more nuanced level, i.e., to the degree that mastery is required. Similarly, improvement in dissection ability could improve proficiency across a diversity of surgical procedures. For these reasons, we present the development and validation of the Dissection Assessment for Robotic Technique (DART), an objective and reproducible surgical assessment to evaluate tissue dissection.

Methods

Development

A team guided by the senior author (AJH) deconstructed dissection into its fundamental elements. The initial product consisted of six domains addressing components at the gesture and procedural level. Each domain was assigned three anchor descriptions designed to assess tiered progression of skill.

Delphi Method

The Delphi method was used to validate the structure and content of DART. A multi-institutional and multi-disciplinary (n = 12 Urology, n = 1 OBGYN, n = 1 General Surgery) panel was composed of 14 experts who are clinically and scientifically productive robotic surgeons (robotic procedures, median 1600 (IQR 775-3000); H-index, 31 (22–46)). Panelists systematically, synchronously, and anonymously evaluated each element of the dissection tool, including domain presence, level descriptors, and potential 3- versus 5-point scoring scales. Additionally, panelists were given the opportunity to provide anonymous free-response feedback on the content of DART. To clarify language and address panelist concerns, this feedback was incorporated into the question forms for subsequent rounds with the goal of driving consensus. The criterion for graduating an element to inclusion in the final product was a content validity index (CVI) of 0.8 or greater; the CVI is the proportion of expert panelists who agreed with the inclusion of a given element. Elements that did not achieve consensus were modified based on panelist feedback for the following round. Delphi rounds continued until consensus was achieved. DART was constructed from the final round of consensus.
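
To make the CVI criterion concrete, the sketch below (Python, with hypothetical panelist votes that are not drawn from the actual Delphi rounds) computes the CVI for a few elements and applies the 0.8 graduation threshold described above.

```python
# Minimal sketch of the content validity index (CVI) computation described above.
# Panel responses are hypothetical; an element "graduates" when CVI >= 0.80.

def content_validity_index(votes):
    """Proportion of panelists who agreed with including an element."""
    return sum(votes) / len(votes)

# Hypothetical round of 14 panelist votes (1 = include, 0 = revise) per element
round_votes = {
    "Gesture Selection and Efficacy": [1] * 14,            # unanimous
    "Lower anchor frequency wording": [1] * 10 + [0] * 4,  # 10/14 = 0.71, revise
}

CVI_THRESHOLD = 0.80
for element, votes in round_votes.items():
    cvi = content_validity_index(votes)
    status = "graduates" if cvi >= CVI_THRESHOLD else "revise for next round"
    print(f"{element}: CVI = {cvi:.2f} -> {status}")
```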

Inter-rater Variability and Validation

Subsequently, 10 blinded reviewers (n = 3 non-surgically trained (medical students), n = 7 surgically trained (chief residents, fellows)) evaluated tissue dissection performance in 46 de-identified videos over the course of 3 scoring rounds, for a total of 460 assessments. Of these videos, 23 were right-side standard-template pelvic lymph node dissections (PLND) and 23 were seminal vesicle (SV) dissections recorded during robot-assisted radical prostatectomies. During the first, second, and third scoring rounds, 5, 8, and 10 videos from each procedure type were evaluated, respectively. A 1-hour consensus-building session was held prior to each round for raters to discuss the dissection scoring tool with one another and to review previously scored videos from the prior round, with an emphasis on discrepant scoring. Scores were finalized on submission and not changed after discussion. During the first session, the videos reviewed were independent of those used for validation. The prevalence-adjusted and bias-adjusted kappa statistic was calculated for each round to measure inter-rater variability (IRV), or the degree to which DART scores agreed between raters.9 Kappa values of 0.21-0.40, 0.41-0.60, 0.61-0.80, and 0.81-0.99 indicate fair, moderate, substantial, and almost perfect agreement, respectively.10 Finalized scoring data were reported after a plateau in interobserver variability had been reached, with exclusion of the first 10 scored videos from the “training” round 1.
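
For readers who wish to reproduce the agreement statistic, the sketch below computes a prevalence-adjusted and bias-adjusted kappa for one pair of raters on hypothetical 3-point scores; it assumes the k-category generalization of the Byrt et al. formula, since the paper does not specify its exact multi-rater implementation.

```python
import numpy as np

def pabak(scores_a, scores_b, k):
    """Prevalence- and bias-adjusted kappa for two raters over k ordinal categories.

    Assumes the k-category generalization of Byrt et al.'s formula:
    PABAK = (k * p_o - 1) / (k - 1), where p_o is the observed proportion
    of exact agreement between the two raters.
    """
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    p_o = np.mean(scores_a == scores_b)   # observed exact agreement
    return (k * p_o - 1) / (k - 1)

# Hypothetical 3-point DART scores for one domain across 10 videos
rater_1 = [3, 2, 3, 3, 2, 1, 3, 2, 3, 3]
rater_2 = [3, 2, 3, 2, 2, 1, 3, 2, 3, 3]
print(f"PABAK (3-point scale): {pabak(rater_1, rater_2, k=3):.2f}")  # 0.85 here
```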

Validation of DART evaluated its ability to differentiate levels of surgeon experience. Expert and trainee surgeons were defined by a prior robotic caseload of >100 and ≤100 cases, respectively (Table 1).11 We took the average across all raters as the overall rating score for each of the 12 trainee and 24 expert video cases. The area under the curve (AUC) from the receiver operating characteristic (ROC) curve was used to assess the ability of overall DART scores, as well as individual domains, to discriminate trainees from experts. A Z-test was used to determine whether ΔAUC (AUC 5-point − AUC 3-point) = 0. If we could not reject the null hypothesis of ΔAUC = 0, the 95% confidence interval (CI) of ΔAUC was reported as justification for equivalence. Statistical significance was defined as p <0.05. SAS v9.4 (SAS Institute Inc., Cary, NC) was used for statistical tests.
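
As an illustration of this validation analysis, the sketch below treats rater-averaged total DART scores as a classifier of expert versus trainee videos and estimates the AUC for each scale; the scores are simulated, and the bootstrap interval for ΔAUC is a simple stand-in for the SAS-based Z-test used in the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical rater-averaged total DART scores per video (1 = expert, 0 = trainee)
labels    = np.array([1] * 24 + [0] * 12)
score_3pt = np.concatenate([rng.normal(17.1, 0.4, 24), rng.normal(15.6, 0.8, 12)])
score_5pt = np.concatenate([rng.normal(28.4, 0.7, 24), rng.normal(25.5, 1.3, 12)])

auc_3 = roc_auc_score(labels, score_3pt)
auc_5 = roc_auc_score(labels, score_5pt)
print(f"AUC 3-point: {auc_3:.2f}, AUC 5-point: {auc_5:.2f}")

# Bootstrap the paired difference in AUC (a simple stand-in for the Z-test on dAUC)
deltas = []
for _ in range(2000):
    idx = rng.choice(len(labels), len(labels), replace=True)
    if len(set(labels[idx])) < 2:  # need both classes in the resample
        continue
    deltas.append(roc_auc_score(labels[idx], score_5pt[idx]) -
                  roc_auc_score(labels[idx], score_3pt[idx]))
lo, hi = np.percentile(deltas, [2.5, 97.5])
print(f"dAUC (5-point minus 3-point) 95% CI: [{lo:.3f}, {hi:.3f}]")
```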

Table 1:

Surgeon data for the evaluated surgical videos

                                  Experts (N=9)      Trainees (N=6)
                                  Median (IQR)       Median (IQR)
Age, years                        39 (36-48)         33 (30-39)
Years of robotic surgery          7 (4-11)           2 (1-2)
Prior robotic surgery caseload    300 (200-1900)     45 (20-78)

Results

Delphi Method

Four rounds of the Delphi method achieved consensus on 27 of 28 elements. In round one, 18/28 (64%) elements reached consensus, including each of the domains. In rounds two and three, surgeons reached consensus on the frequency of events describing the lower and middle scoring anchors, achieving cumulative consensus on 27/28 (96%) elements. The element that did not reach consensus after round 3 was the use of a 3-point versus 5-point scale. All expert surgeons agreed that an evaluation tool should be easily scored, and 13/14 (93%) agreed that it is important to minimize variability between raters. Nine of 14 (64%) preferred a 3-point scale over a 5-point scale. As consensus could not be reached after an additional round 4, the decision was made to assess inter-rater variability and validation of both scales.

Inter-rater Variability

IRV improved substantially after the first round of scoring (Figure 1).

Figure 1: Progression of inter-rater variability from video scoring rounds 1-3 broken down by domain and scale.

G = Gesture Selection and Efficacy, IV = Instrument Visualization and Awareness, TP = Respect of Tissue Plane, TH = Tissue Handling, TR = Tissue Retraction, E = Efficiency.

Agreement increased from the second to the third round in some domains (e.g., Tissue Retraction), while it decreased in others (e.g., Efficiency).

For overall IRV, both scales achieved largely moderate or better agreement (kappa ≥0.41) for each domain, with the exception of one domain on the 5-point scale (Tissue Handling, kappa 0.38; Table 2). The 3-point scale showed improved IRV compared with the 5-point scale, with greater kappa values in 6/6 domains.

Table 2:

Overall Inter-rater Variability

Domain                                    3-point scale             5-point scale
                                          Weighted Kappa (95% CI)   Weighted Kappa (95% CI)
Gesture Selection and Efficacy            0.61 (0.40-0.81)          0.56 (0.37-0.74)
Instrument Visualization and Awareness    0.94 (0.85-1.00)          0.93 (0.84-1.00)
Respect of Tissue Plane                   0.66 (0.46-0.86)          0.57 (0.38-0.76)
Tissue Handling                           0.53 (0.31-0.76)          0.38 (0.18-0.58)
Tissue Retraction                         0.55 (0.33-0.77)          0.47 (0.27-0.67)
Efficiency                                0.52 (0.29-0.74)          0.44 (0.24-0.64)

IRV was consistent when analyzed separately for the two dissection steps (PLND and SV dissection), with kappa differing by no more than 0.08 for each domain (Supplemental Table 1). Compared with surgically trained raters, non-surgically trained raters consistently showed equivalent or better agreement across domains (Supplemental Table 2).

Evaluative Tool Validation

Overall DART scores significantly differentiated expert and trainee surgeons (3-point scale, median 17.2 (16.9-17.4) vs. 15.7 (14.8-16.3), p <0.001; 5-point scale, median 28.4 (27.9-28.9) vs. 25.7 (23.8-26.8), p <0.001). When broken down by domain, the 3-point scale significantly differentiated surgical experience in 3/6 domains: Gesture Selection, Tissue Retraction, and Efficiency (Table 3). The 5-point scale significantly differentiated surgical experience in those domains as well as Respect of Tissue Planes, for a total of 4/6 domains. In the Tissue Handling and Instrument Visualization and Awareness domains, experts received higher scores than trainees, but the differences were not statistically significant on either scale.

Table 3:

DART score broken down by domain

3-point scale
Domain                                    Expert (24 cases)   Trainee (12 cases)   p value
                                          Median (IQR)        Median (IQR)
Gesture Selection and Efficacy            2.9 (2.9-3.0)       2.7 (2.5-2.8)        <0.001
Instrument Visualization and Awareness    3.0 (3.0-3.0)       3.0 (3.0-3.0)        1.00
Respect of Tissue Planes                  2.9 (2.8-3.0)       2.8 (2.4-3.0)        0.078
Tissue Handling                           2.7 (2.4-2.9)       2.5 (2.3-2.9)        0.728
Tissue Retraction                         2.9 (2.8-3.0)       2.5 (2.3-2.7)        <0.001
Efficiency                                2.9 (2.7-3.0)       2.3 (2.1-2.5)        <0.001

5-point scale
Domain                                    Expert (24 cases)   Trainee (12 cases)   p value
                                          Median (IQR)        Median (IQR)
Gesture Selection and Efficacy            4.9 (4.8-5.0)       4.4 (4.1-4.6)        <0.001
Instrument Visualization and Awareness    5.0 (5.0-5.0)       5.0 (5.0-5.0)        0.679
Respect of Tissue Planes                  4.8 (4.6-5.0)       4.6 (4.1-4.8)        0.010
Tissue Handling                           4.4 (3.8-4.9)       4.0 (3.8-4.8)        0.779
Tissue Retraction                         4.8 (4.5-5.0)       4.2 (3.6-4.4)        <0.001
Efficiency                                4.8 (4.6-4.9)       3.5 (3.2-3.9)        <0.001

The power to discriminate experts from trainees was similar between the 3-point (AUC = 0.92, CI 0.82-1.00) and 5-point (AUC = 0.92, CI 0.83-1.00) scales. The difference between these two AUCs was 0.003 (CI −0.018-0.025), which falls within a boundary of ±0.03 (Supplemental Table 3). The two scales were therefore considered equivalent in discriminating experts from trainees. Most of the subdomains showed a difference in AUC between the 3-point and 5-point scales within a boundary of ±0.1, except Respect of Tissue Planes. On the 5-point scale, this domain (AUC = 0.76, CI 0.60-0.93) had better discrimination power than on the 3-point scale (AUC = 0.69, CI 0.49-0.88), with a difference of 0.08 (CI 0.01-0.15, p = 0.03).

Discussion

Tissue dissection is a fundamental surgical skill set that is employed consistently in the operating room across procedures and specialties. In this study, we developed a broadly applicable objective assessment tool to evaluate tissue dissection proficiency in robotic surgery. We show that this tool can be employed during multiple surgical procedures, and effectively differentiates surgeons based on experience with acceptable levels of IRV.

Delphi Method

Through 4 rounds, the panelists could not settle on a 3- or 5-point scale for DART (Figure 2 and Supplemental Figure 1, respectively). Notably, previous surgical assessment tools (i.e., GEARS, PACE, RACE) incorporated a 5-point Likert structure into their original designs; it is unclear whether a 3-point scale was ever entertained. In our study, experts in favor of a 3-point scale argued that it delineated substantial differences in dissection proficiency and, as a result, would be more reproducible and standardized in practical use. Experts in favor of a 5-point scale argued that the increased granularity provided a greater ability to differentiate levels of skill. As consensus could not be reached, we elected to continue the IRV and validation portions of the study with both scales.

Figure 2. Dissection Assessment for Robotic Technique – 3-point scale.

Evaluation of Inter-rater Variability

While reports on previous assessment tools have noted that rater training is required, there is little published data on the extent of this process. Here, we show that a group of 10 raters required a “training round” of 10 videos and 2 consensus-building discussions to reach a relative plateau in IRV, demonstrated by minimal change in kappa values from round 2 to 3. We did see a substantial increase in agreement on Tissue Retraction scores after round 2. We attribute this increase to a consensus-building conversation about differing opinions on the role of the third robotic arm in retraction. After round 2, the group agreed that a surgeon should not be penalized for not using the third arm as long as tissue retraction is satisfactory. With this clarification, agreement improved in the subsequent round. This example highlights the utility of consensus-building sessions in training raters and suggests that there may be a learning curve associated with use of the assessment tool.

Agreement on Tissue Handling scores did not improve from round 1 to 2 to the degree that the other domains did, and it decreased slightly from round 2 to 3. One potential reason for this lack of progression is variable interpretation of trauma and/or bleeding. For example, some graders may have given a lower score for triggering an obturator reflex during PLND, whereas others may not have. Additionally, it was difficult to reach consensus on how bleeding should be rated. As a result, we added the qualifiers “unnecessary” and “avoidable bleeding” to the Tissue Handling anchors to clarify how bleeding should be scored.

Overall, DART was shown to be reliable, with largely moderate or better agreement (Table 2). This degree of agreement is consistent with previously published assessment tools.6 Unsurprisingly, the 3-point scale, which presents fewer options to raters, showed improved IRV compared with the 5-point scale.

We show that DART can be reliably used during different procedural steps, with minimal variation in IRV. Our data also suggest that users of this assessment tool do not necessarily require prior surgical training. In fact, raters with no prior experience in the operating room showed equivalent, if not better, scoring agreement compared with those with prior surgical training. Raters with prior surgical training may have brought biases ingrained from training into their decision-making process, compared with raters approaching the task with a “blank slate.” As a means to circumvent the resource-intensive process of manual surgical assessment by expert surgeons, evaluation tools have been tested with crowdsourced evaluation.12,13 This strategy has been validated against expert feedback and could be used to deliver timely surgical feedback, a key factor in promoting improvement of any skill. Alternatively, we have also developed this assessment tool with an eye toward a future in which deep learning algorithms utilizing computer vision may provide automated tissue dissection assessment.14

Validation

Total DART scores can reliably differentiate expert and trainee surgeon performance. In our surgeon cohort, the Instrument Visualization and Awareness domain had almost perfect agreement among raters and thus carried no significant ability to differentiate experience. Perhaps the relative experience level of our evaluated surgeons did not reveal major deficiencies in this domain.

3- vs 5-point Scale

We present IRV and validation data on both 3- and 5-point DART scales after our expert panel could not reach consensus. With improved IRV and indistinguishable ability to differentiate experience, our results suggest that the 3-point scale is superior. Therefore, we recommend use of the 3-point scale for future study (Figure 2).

Study Limitations

This study has a number of limitations. We did not assess intra-rater variability, although our raters showed good agreement after the training round, suggesting consistency in scoring over time. While our Delphi panelists were a multi-institutional and multi-specialty group, our validation of DART used only surgical videos from urologists at a single institution. Additionally, we divided the evaluated surgeons into only two levels of experience. The applicability of this tool should be studied in the context of a wider range of skill levels, for example among surgeons with little-to-no robotic experience, and should be externally validated.

Future Steps

Future work will seek to address the educational value of DART by studying score changes over time and perceived influence on learning and skill acquisition. Application of this tool in the setting of crowd-sourced or automated assessment could provide avenues for efficient evaluation and timely delivery of educational feedback.15 These avenues for scalable evaluation with DART may offer a reliable means of assessing procedural skill, unlike automated performance metrics and kinematic data, which largely measure surgical efficiency. Future inquiry will describe the relationship between DART and clinical outcomes, such as operative time, tumor margin status, or cancer recurrence. Finally, this multi-specialty tool could be incorporated into core-skills robotic training curricula.

Conclusion

We developed DART, an objective assessment tool to evaluate tissue dissection. This tool can be applied to multiple procedural steps and can effectively differentiate levels of surgeon experience with acceptable IRV.

Supplementary Material

Supplementary Material Table 1
Supplementary Material Table 2
Supplementary Material Table 3
Supplementary Material Figure 1

Funding:

Research reported in this publication was supported in part by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number K23EB026493.

Standard Abbreviations

DART: Dissection Assessment for Robotic Technique
GEARS: Global Evaluative Assessment of Robotic Skills
PACE: Prostatectomy Assessment and Competency Evaluation
RACE: Robotic Anastomosis Competency Evaluation

Footnotes

Conflicts of interest/Competing interests: M Aron has financial disclosures with Intuitive. JW Davis is a consultant for Intuitive Surgical and Janssen. MM Desai is a consultant for Auris Health, PROCEPT BioRobotics. AC Goh is a consultant for Medtronic. R Kimmig has financial disclosures with Intuitive Surgical Inc., Medtronic, Avatera, CMR and Medicaroid. J Porter is a consultant for Medtronic, Ceevra, Proximie, Intuitive. I Gill is an unpaid advisor for Steba Biotech. AJ Hung is a consultant for Mimic Technologies, Quantagene, and Johnson & Johnson. The study was not funded by any of these companies. Other authors have no conflict of interest.

Bibliography

1. Hurreiz H. The evolution of surgical training in the UK. Adv Med Educ Pract. 2019;10:163–168. doi: 10.2147/AMEP.S189298
2. Lee JY, Mucksavage P, Sundaram CP, et al. Best Practices for Robotic Surgery Training and Credentialing. J Urol. 2011;185(4):1191–1197. doi: 10.1016/j.juro.2010.11.067
3. Hung AJ, Chen J, Ghodoussipour S, et al. A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int. 2019;124(3):487–495. doi: 10.1111/bju.14735
4. Vaidya A, Aydin A, Ridgley J, et al. Current Status of Technical Skills Assessment Tools in Surgery: A Systematic Review. J Surg Res. 2020;246:342–378. doi: 10.1016/j.jss.2019.09.006
5. Goh AC, Goldfarb DW, Sander JC, et al. Global Evaluative Assessment of Robotic Skills: Validation of a Clinical Assessment Tool to Measure Robotic Surgical Skills. J Urol. 2012;187(1):247–252. doi: 10.1016/j.juro.2011.09.032
6. Hussein AA, Ghani KR, Peabody J, et al. Development and Validation of an Objective Scoring Tool for Robot-Assisted Radical Prostatectomy: Prostatectomy Assessment and Competency Evaluation. J Urol. 2017;197(5):1237–1244. doi: 10.1016/j.juro.2016.11.100
7. Raza SJ, Field E, Jay C, et al. Surgical Competency for Urethrovesical Anastomosis During Robot-assisted Radical Prostatectomy: Development and Validation of the Robotic Anastomosis Competency Evaluation. Urology. 2015;85(1):27–32. doi: 10.1016/j.urology.2014.09.017
8. Ma R, Vanstrum EB, Nguyen JH, et al. A Novel Dissection Gesture Classification to Characterize Robotic Dissection Technique for Renal Hilar Dissection. J Urol. Published online August 18, 2020. doi: 10.1097/JU.0000000000001328
9. Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. J Clin Epidemiol. 1993;46(5):423–429. doi: 10.1016/0895-4356(93)90018-v
10. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–363.
11. Abboudi H, Khan MS, Guru KA, et al. Learning curves for urological procedures: a systematic review. BJU Int. 2014;114(4):617–629. doi: 10.1111/bju.12315
12. Holst D, Kowalewski TM, White LW, et al. Crowd-Sourced Assessment of Technical Skills: Differentiating Animate Surgical Skill Through the Wisdom of Crowds. J Endourol. 2015;29(10):1183–1188. doi: 10.1089/end.2015.0104
13. Holst D, Kowalewski TM, White LW, et al. Crowd-Sourced Assessment of Technical Skills: An Adjunct to Urology Resident Surgical Simulation Training. J Endourol. 2014;29(5):604–609. doi: 10.1089/end.2014.0616
14. Luongo F, Hakim R, Nguyen JH, et al. Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery. Surgery. Published online September 26, 2020. doi: 10.1016/j.surg.2020.08.016
15. Ghani KR, Miller DC, Linsell S, et al. Measuring to Improve: Peer and Crowd-sourced Assessments of Technical Skill with Robot-assisted Radical Prostatectomy. Eur Urol. 2016;69(4):547–550. doi: 10.1016/j.eururo.2015.11.028
