Abstract
Purpose
In this study, we investigate the ability of automated performance metrics (APMs) and task-evoked pupillary response (TEPR), as objective measures of surgeon performance, to distinguish varying levels of surgeon expertise during generic robotic surgical tasks. Additionally, we evaluate the association between APMs and TEPR.
Methods
Participants completed ten tasks on a da Vinci Xi Surgical System (Intuitive Surgical, Inc.), each representing a surgical skill type: EndoWrist® manipulation, needle targeting, suturing/knot tying, and excision/dissection. APMs (instrument motion tracking, EndoWrist® articulation, and system events data) and TEPR were recorded by a systems data recorder (Intuitive Surgical, Inc.) and Tobii Pro Glasses 2 (Tobii Technology, Inc.), respectively. The Kruskal-Wallis test was used to identify significant differences between groups of varying expertise. Spearman’s rank correlation coefficient was used to measure associations between APMs and TEPR.
Results
Twenty-six participants were stratified by robotic surgical experience: novice (no prior experience; n=9), intermediate (<100 cases; n=9), and experts (≥ 100 cases; n=8). Several APMs differentiated surgeon experience including task duration (p<0.01), time active of instruments (p<0.03), linear velocity of instruments (p<0.04), and angular velocity of dominant instrument (p<0.04). TEPR distinguished surgeon expertise for 3 out of 4 task types (p<0.04). Correlation trends between APMs and TEPR revealed that expert surgeons move more slowly with high cognitive workload (ρ<−0.60, p<0.05), while novices move faster under the same cognitive experiences (ρ>0.66, p<0.05).
Conclusions
APMs and TEPR can distinguish surgeon expertise levels during robotic surgical tasks. Furthermore, under high cognitive workload, there can be a divergence in robotic movement profiles between expertise levels.
Keywords: Robotic surgical training, Surgeon assessment, Automated performance metrics, Task-evoked pupillary response
Introduction
Robotic surgery is a common approach for multiple surgical disciplines including urology. In 2018, over one million robotic-assisted surgical procedures were performed using the da Vinci Surgical System (Intuitive Surgical, Inc.) [1]. Training surgeons to robotic technical proficiency is essential to ensure competency and prevent surgical errors, since technical skill impacts patient outcomes [2–4].
Consequently, objective assessment of robotic surgical skill is critical for surgeon training [5]. The current gold standard of evaluating surgical skill includes subsequent manual review by experts [6]. Unfortunately, this approach is time-consuming, subjective, and susceptible to inter-observer variability [5, 7–8].
By nature, robotic procedures lend themselves to alternative assessment methodologies utilizing computer-generated metrics, which are truly objective and largely immune to human bias [7]. In previous validation studies by our group, automated performance metrics (APMs), derived from kinematic data and robot systems events data, have distinguished surgeon experience during select steps of a robot-assisted radical prostatectomy [9]. Applying machine learning or deep learning methods to APMs has further illustrated a link to clinical outcomes [10].
Surgeon assessment can also be conducted by considering cognitive workload, or mental strain in the working memory – which has limited processing capacity [5]. Studies show that cognitive workload differs in trainees and experts executing identical tasks and affects patient outcomes [11,12]. Task-evoked pupillary response (TEPR), based on eye movements and changes in pupil size, has been shown to correlate with cognitive processing and is an advantageous tool for reporting cognitive workload in real-time [3, 11, 13].
In this study, we present an evaluation of APMs and TEPR while surgeons of varying experience performed specialty-neutral tasks in a controlled laboratory environment. First, we sought to determine the ability of APMs and TEPR to distinguish surgical expertise. Second, we examined the association between APMs and TEPR during the tasks to investigate surgeon performance under different cognitive states.
Methods
Study Design
After obtaining institutional review board approval, participants of varying surgical experience at our institution performed four sets of surgical tasks on the da Vinci Xi Surgical System. Each task type was chosen to represent one of four broad robotic skill categories: EndoWrist® manipulation (Fig. 1a), needle targeting (Fig. 1b), suturing/knot tying (Fig. 1c), and excision/dissection (Fig. 1d).
Participants
Participants in the present study were faculty surgeons, robotic fellows, urology residents, and medical students from our institution, enrolled to represent a spectrum of surgical experience. The participants were stratified a priori into three surgical experience levels: novices (no prior robotic surgical experience), intermediates (fewer than 100 robotic console cases), and experts (100 or more robotic console cases). The cut-off defining an expert surgeon was based on a meta-analysis and our previous study [7, 14]. We utilized a standardized orientation to ensure a baseline understanding of the use of the da Vinci surgeon console.
Data Collection
Surgical skill assessment was quantified utilizing two data streams: APMs and TEPR.
During all tasks, robotic metrics were collected at a sampling rate of 50 Hz using a systems events data recorder provided by Intuitive Surgical, Inc. (Fig. 2a). The APMs derived from the recorded metrics included kinematic data (instrument travel time, path length, velocity, EndoWrist® movements) and system events data (camera movements, clutch use). In this study, a total of 22 previously validated APMs were examined [7,9].
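To make the kinematic APMs more concrete, the following is a minimal sketch of how path length, moving time, and mean linear velocity could be derived from a stream of instrument-tip positions sampled at 50 Hz. It is an illustration only, not the recorder's actual processing: the function names, the `min_step` noise threshold, the definition of velocity as path length over total elapsed time, and the toy trace are all assumptions introduced here.

```python
import math

SAMPLE_RATE_HZ = 50  # sampling rate reported in the text


def path_length(positions):
    """Total distance travelled (cm): sum of distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def moving_time(positions, rate_hz=SAMPLE_RATE_HZ, min_step=1e-3):
    """Seconds during which the tip moved more than min_step (cm) per sample.

    min_step is a hypothetical noise threshold, not a value from the study.
    """
    moving = sum(1 for a, b in zip(positions, positions[1:])
                 if math.dist(a, b) > min_step)
    return moving / rate_hz


def mean_linear_velocity(positions, rate_hz=SAMPLE_RATE_HZ):
    """Path length divided by total elapsed time (cm/s); an assumed definition."""
    elapsed = (len(positions) - 1) / rate_hz
    return path_length(positions) / elapsed if elapsed > 0 else 0.0


# Toy trace (hypothetical data): 1 cm along x over 1 s, then 1 s stationary.
trace = [(i / 50, 0.0, 0.0) for i in range(51)] + [(1.0, 0.0, 0.0)] * 50

print(path_length(trace), moving_time(trace), mean_linear_velocity(trace))
```

In this toy trace the instrument covers about 1 cm, is moving for 1 s of the 2 s recording, and therefore averages about 0.5 cm/s.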
Cognitive workload, assessed through TEPR, was recorded by an eye-tracking device measuring eye movements, gaze patterns, and pupil dilation at a sampling rate of 100 Hz. Participants wore the Tobii Pro Glasses 2 (Tobii Technology, Inc.), a wearable eye-tracking system that did not obstruct the normal field of view (Fig. 2b). These eye-tracking recordings were anonymized and sent to EyeTracking, Inc. for processing with their EyeWorks™ software. The software’s algorithms produced the Index of Cognitive Activity (ICA), a metric scaled from 0 to 1 that reflects TEPR and real-time cognitive workload, with greater values indicating higher cognitive workload.
Statistical Analysis
The Kruskal-Wallis test was used to compare performance, as measured by APMs and ICA, between the three participant groups. For post-hoc pairwise comparisons, Dunn’s test with Bonferroni correction was used to identify which pairs of groups differed significantly.
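As an illustration of this two-stage comparison, the sketch below computes a tie-corrected Kruskal-Wallis H statistic for three groups, followed by Dunn's pairwise z-tests with Bonferroni-adjusted p-values. The study itself used SPSS; this standard-library Python version is only an assumed equivalent, the function names are hypothetical, and the closed-form p-value `exp(-H/2)` is the chi-square approximation for exactly three groups (two degrees of freedom).

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of (t^3 - t) over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis_3(groups):
    """Tie-corrected H and chi-square p-value for exactly three groups."""
    assert len(groups) == 3, "closed-form p-value below assumes df = 2"
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    h /= 1 - tie_term(pooled) / (n ** 3 - n)
    return h, math.exp(-h / 2)  # chi-square survival function, df = 2


def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on pooled ranks; returns Bonferroni-adjusted p-values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    variance = n * (n + 1) / 12 - tie_term(pooled) / (12 * (n - 1))
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for i, j in pairs:
        se = math.sqrt(variance * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(i, j)] = min(1.0, p * len(pairs))
    return adjusted


# Hypothetical toy data for three groups (not study data).
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_3(toy))   # H ≈ 7.2, p ≈ 0.027
print(dunn_bonferroni(toy))    # keys are (group index, group index) pairs
```

With this toy input only the first and third groups differ after Bonferroni adjustment, which mirrors how a post-hoc test localizes an overall Kruskal-Wallis difference to specific group pairs.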
Spearman’s rank correlation coefficient (ρ), with two-tailed tests, was used to quantify bivariate correlations between APMs and ICA for each task type.
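For readers less familiar with the statistic, Spearman's ρ is the Pearson correlation computed on the ranks of the two variables, so it captures any monotonic relationship rather than only a linear one. The sketch below is a minimal standard-library illustration with hypothetical function names and made-up ICA and velocity values; it computes ρ only and leaves out the two-tailed significance test.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical values: ICA rising while instrument velocity (cm/s) falls.
ica = [0.30, 0.35, 0.42, 0.50, 0.55]
velocity = [5.1, 4.4, 3.9, 2.0, 1.2]
print(spearman_rho(ica, velocity))  # approximately -1: perfectly inverse ranks
```

A negative ρ of this kind corresponds to slower movement at higher ICA, while a positive ρ corresponds to faster movement at higher ICA.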
All statistical analysis in this study was conducted using IBM SPSS® v24, with p<0.05 taken to signify statistical significance.
Results
Participant Demographics
Twenty-six students, surgical trainees, and surgeons participated in this study: nine were novices (no prior robotic experience), nine were intermediates (median 50.0 (IQR 22.5–80.0) prior console cases), and eight were experts (median 525.0 (IQR 175.0–2500.0) cases) (Table 1).
Table 1.
Characteristic | Novice (N): no prior robotic experience | Intermediate (I): < 100 robotic cases | Expert (E): ≥ 100 robotic cases |
---|---|---|---|
Number of participants | 9 | 9 | 8 |
Age, median (IQR) | 26.0 (23.5 – 31.5) | 35.0 (32.5 – 40.5) | 42.5 (33.25 – 47.5) |
Years of practice, median (IQR) | 0.0 (0.0 – 5.5) | 6.0 (3.0 – 7.5) | 14.0 (6.25 – 24.0) |
Number of robotic cases, median (IQR) | 0.0 (0.0 – 0.0) | 50.0 (22.5 – 80.0) | 525.0 (175.0 – 2500.0) |
Comparison of surgeon performance by experience groups across task types
Automated performance metrics (APMs)
In the present series of dry lab tasks, several APMs distinguished surgeons based on experience level (Online Resource 1). APMs that differentiated experience across the majority of task types (>2 of 4) included: total task duration (p < 0.01), time active of dominant and non-dominant instruments (p <0.03), linear velocity of dominant instruments, non-dominant instruments, and camera (p<0.04), and lastly angular velocity of dominant instrument (p<0.04).
Task-evoked pupillary response (TEPR)
TEPR, as a measure of cognitive workload quantified by the Index of Cognitive Activity (ICA), was also able to distinguish surgical experience. ICA generally decreased as surgeon experience increased (Table 2). This trend was significant for three of the four dry lab exercise types: EndoWrist® manipulation (p<0.005), suturing/knot tying (p=0.037), and excision/dissection (p=0.043).
Table 2.
Task Type | ICA, Novice (N), median (IQR) | ICA, Intermediate (I), median (IQR) | ICA, Expert (E), median (IQR) | p-Value |
---|---|---|---|---|
EndoWrist® manipulation | 0.517 (0.433 – 0.574) | 0.442 (0.363 – 0.475) | 0.423 (0.298 – 0.506) | < 0.005 †⋄ |
Needle targeting | 0.430 (0.383 – 0.547) | 0.365 (0.298 – 0.425) | 0.328 (0.192 – 0.452) | 0.051 |
Suturing/knot tying | 0.534 (0.388 – 0.592) | 0.372 (0.278 – 0.485) | 0.372 (0.248 – 0.485) | 0.037 ⋄ |
Excision/dissection | 0.471 (0.416 – 0.534) | 0.418 (0.389 – 0.468) | 0.346 (0.265 – 0.473) | 0.043 ⋄ |
TEPR, as a measure of cognitive workload, is quantified by the Index of Cognitive Activity (ICA) on a scale of 0 to 1, with higher values indicating higher cognitive workload.
denotes significant difference (p<0.05) between groups E & N (experts and novices)
denotes significant difference (p<0.05) between groups I & N (intermediates and novices)
denotes significant difference (p<0.05) between groups E & I (experts and intermediates)
Correlations between APMs and TEPR
Table 3 summarizes the correlation between APMs and TEPR by experience group and task type. The most significant correlations were observed during EndoWrist® manipulation, needle targeting, and suturing/knot tying tasks.
Table 3.
APM | EndoWrist® manipulation: N | EndoWrist® manipulation: I | EndoWrist® manipulation: E | Needle targeting: N | Needle targeting: I | Needle targeting: E | Suturing/knot tying: N | Suturing/knot tying: I | Suturing/knot tying: E | Excision/dissection: N | Excision/dissection: I | Excision/dissection: E |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Time-related metrics (seconds) | |||||||||||||
Time to complete task | - | - | 0.43 | 0.69 | - | - | - | - | - | - | - | - | |
Moving time of dominant instrument | - | - | 0.42 | 0.85 | - | - | - | - | - | - | - | - | |
Moving time of non-dominant instrument | - | - | 0.4 | 0.79 | - | - | - | - | - | - | - | - | |
Moving time of camera | - | - | 0.64 | - | - | - | - | - | - | - | - | - | |
System events metrics | |||||||||||||
Master clutch usage during task | - | - | - | - | −0.77 | - | - | - | - | - | - | - | |
Instrument kinematic metrics | |||||||||||||
Path length of dominant instrument (cm) | - | - | - | 0.73 | - | - | - | - | - | - | - | - | |
Path length of non-dominant instrument (cm) | - | - | - | 0.77 | - | - | - | 0.90 | - | - | - | - | |
Linear velocity of dominant instrument (cm/s) | - | 0.76 | - | - | - | - | 0.62 | - | −0.69 | - | 0.71 | - | |
Linear velocity of non-dominant instrument (cm/s) | - | 0.66 | −0.76 | - | - | −0.64 | - | 0.88 | −0.85 | - | - | - | |
Camera kinematic metrics | |||||||||||||
Linear velocity of camera (cm/s) | - | 0.64 | - | - | - | - | - | 0.83 | −0.60 | - | 0.76 | ||
Number of camera adjustments during the task | - | - | - | - | - | - | - | - | - | - | - | - | |
Path length of camera (cm) | 0.48 | - | 0.47 | - | - | - | - | - | - | - | - | - | |
EndoWrist® articulation metrics | |||||||||||||
Shaft rotation of dominant instrument (rad) | - | - | - | - | - | - | - | - | - | - | - | - | |
Wrist translation along axis 1 of dominant instrument (rad) | - | - | - | - | - | 0.61 | - | - | - | - | - | - | |
Wrist translation along axis 2 of dominant instrument (rad) | - | - | - | - | - | - | - | - | - | - | - | - | |
Shaft rotation of non-dominant instrument (rad) | - | - | - | 0.72 | - | - | - | - | - | - | - | - | |
Wrist translation along axis 1 of non-dominant instrument (rad) | - | - | - | - | - | - | - | - | - | - | - | - | |
Wrist translation along axis 2 of non-dominant instrument (rad) | - | - | - | - | - | - | - | - | - | - | - | - | |
Dominant instrument articulation (rad) | - | - | - | - | - | 0.61 | - | - | - | - | - | - | |
Non-dominant instrument articulation (rad) | - | - | - | - | - | - | - | - | - | - | - | - | |
Angular velocity of dominant instrument articulation (rad/s) | - | - | 0.53 | - | 0.71 | - | - | 0.79 | - | - | - | - | |
Angular velocity of non-dominant instrument articulation (rad/s) | −0.61 | 0.51 | - | - | - | - | - | - | - | - | - | - |
TEPR, as a measure of cognitive workload, is quantified by the Index of Cognitive Activity (ICA) on a scale of 0 to 1, with higher values indicating higher cognitive workload.
All values presented are significant correlations (p<0.05) between ICA and APMs
Stronger associations (|ρ|>0.6) were seen for specific APMs involving instrument and camera velocities. Our findings displayed opposing relationships between APMs and TEPR when comparing expert surgeons to less experienced surgeons. In two task types, the linear velocity of experts’ non-dominant instrument showed an inverse relationship with ICA (ρ<−0.64, p<0.05), indicating that experts moved their instruments more slowly when cognitive workload was higher. In contrast, intermediates exhibited a direct relationship, moving their non-dominant instruments faster with greater cognitive demand (ρ>0.66, p<0.05). These relationships were also apparent for linear velocity of the dominant instrument for experts (ρ=−0.69, p=0.01) and intermediates (ρ>0.71, p<0.05), as well as for linear velocity of the camera [experts (ρ=−0.60, p=0.04); intermediates (ρ>0.64, p<0.05)] during select task types.
Wrist angular velocity of non-dominant instrument also followed the above trend. In one task type, experts rotated their wrists more slowly with increased cognitive workload (ρ=−0.53, p=0.01). Conversely, in two task types, intermediate surgeons rotated their instrument wrists faster with increased cognitive workload (ρ>0.71, p<0.05).
Discussion
The present study demonstrated the ability of APMs and TEPR to distinguish robotic surgical experience in a controlled training environment. Furthermore, during states of high cognitive workload, there can be a divergence in robotic movement profiles between surgeons of varying experience. Our findings showed that expert surgeons generally slowed their movements with increasing cognitive workload. This was contrary to surgeons with less experience, who commonly sped up with increasing cognitive workload. Perhaps experience has informed experts that slower movements maintain safety when the task at hand is more difficult.
This study also revealed that although some tasks may not display significant differences in objective performance metrics, they can still exhibit differences in cognitive workload between experience levels. For the excision/dissection tasks, experts experienced a lower cognitive workload while achieving the same level of surgical performance, suggesting a greater capacity for decision making should unexpected complications arise.
Our findings are relevant because this is the first time our group has linked an automated method of surgical assessment to cognitive workload. In our previous validation studies of APMs, surgeon performance was measured in isolation, with no attempt made to survey the surgeon’s state of mind or the surgical environment. Correlations between APMs and TEPR illuminate how surgeons adapt while performing more difficult tasks. Understanding a surgeon’s cognitive workload provides insight into their decision-making process, which can be applied to how surgery is taught. Surgical training could encompass not only how to master a technical skill, but perhaps also how to cope with technical challenges. The next steps would require further understanding of how enhancing a surgeon’s “coping” skills may improve performance.
Our study has some limitations. The small sample size from a single institution may limit the generalizability of our results, and external validation of these findings should be performed. Our study evaluated associations between surgeon performance and cognitive workload; these associations do not establish cause and effect between the two datasets. Other unmeasured factors may contribute to variation in APMs and TEPR.
Conclusion
APMs and TEPR can distinguish surgeons of varying experience. Furthermore, expert surgeons generally slow down their movements when more cognitive workload is required. In contrast, surgeons with more limited experience tend to speed up with greater cognitive demand.
Acknowledgements
Research reported in this publication was supported in part by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number K23EB026493 and an Intuitive Surgical Clinical Research Grant. Anthony Jarc and Liheng Guo (Intuitive Surgical, Inc.) assisted with automated performance metric processing.
Disclosure of potential conflicts of interest: The study was supported in part by an Intuitive Surgical, Inc. clinical grant. Intuitive Surgical, Inc. provided the systems events data recorder. AJ Hung has a financial disclosure with Ethicon, Inc. (consultant).
Footnotes
Compliance with Ethical Standards
Research involving Human Participants and/or Animals: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed Consent: Informed consent was obtained from all individual participants included in the study.
Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.
References
- 1. Intuitive Surgical, Inc.: Annual Report 2018. Available at http://www.annualreports.com/Company/intuitive-surgical-inc.
- 2. Lerner MA, Ayalew M, Peine WJ, Sundaram CP (2010) Does training on a virtual reality robotic simulator improve performance on the da Vinci surgical system? J Endourol 24 (3):467–472. doi: 10.1089/end.2009.0190
- 3. Richstone L, Schwartz MJ, Seideman C, Cadeddu J, Marshall S, Kavoussi LR (2010) Eye metrics as an objective assessment of surgical skill. Ann Surg 252 (1):177–182. doi: 10.1097/SLA.0b013e3181e464fb
- 4. Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR, Dimick J, Banerjee M, Birkmeyer NJ, Michigan Bariatric Surgery C (2013) Surgical skill and complication rates after bariatric surgery. N Engl J Med 369 (15):1434–1442. doi: 10.1056/NEJMsa1300625
- 5. Chen J, Cheng N, Cacciamani G, Oh P, Lin-Brande M, Remulla D, Gill IS, Hung AJ (2019) Objective Assessment of Robotic Surgical Technical Skill: A Systematic Review. J Urol 201 (3):461–469. doi: 10.1016/j.juro.2018.06.078
- 6. Goh AC, Goldfarb DW, Sander JC, Miles BJ, Dunkin BJ (2012) Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills. J Urol 187 (1):247–252. doi: 10.1016/j.juro.2011.09.032
- 7. Hung AJ, Chen J, Jarc A, Hatcher D, Djaladat H, Gill IS (2018) Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study. J Urol 199 (1):296–304. doi: 10.1016/j.juro.2017.07.081
- 8. Ghani KR, Miller DC, Linsell S, Brachulis A, Lane B, Sarle R, Dalela D, Menon M, Comstock B, Lendvay TS, Montie J, Peabody JO, Michigan Urological Surgery Improvement C (2016) Measuring to Improve: Peer and Crowd-sourced Assessments of Technical Skill with Robot-assisted Radical Prostatectomy. Eur Urol 69 (4):547–550. doi: 10.1016/j.eururo.2015.11.028
- 9. Hung AJ, Oh PJ, Chen J, Ghodoussipour S, Lane C, Jarc A, Gill IS (2019) Experts vs super-experts: differences in automated performance metrics and clinical outcomes for robot-assisted radical prostatectomy. BJU Int 123 (5):861–868. doi: 10.1111/bju.14599
- 10. Hung AJ, Chen J, Ghodoussipour S, Oh PJ, Liu Z, Nguyen J, Purushotham S, Gill IS, Liu Y (2019) A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int. doi: 10.1111/bju.14735
- 11. Dias RD, Ngo-Howard MC, Boskovski MT, Zenati MA, Yule SJ (2018) Systematic review of measurement tools to assess surgeons’ intraoperative cognitive workload. Br J Surg 105 (5):491–501. doi: 10.1002/bjs.10795
- 12. Ruiz-Rabelo JF, Navarro-Rodriguez E, Di-Stasi LL, Diaz-Jimenez N, Cabrera-Bermon J, Diaz-Iglesias C, Gomez-Alvarez M, Briceno-Delgado J (2015) Validation of the NASA-TLX Score in Ongoing Assessment of Mental Workload During a Laparoscopic Learning Curve in Bariatric Surgery. Obes Surg 25 (12):2451–2456. doi: 10.1007/s11695-015-1922-1
- 13. Szulewski A, Roth N, Howes D (2015) The Use of Task-Evoked Pupillary Response as an Objective Measure of Cognitive Load in Novices and Trained Physicians: A New Tool for the Assessment of Expertise. Acad Med 90 (7):981–987. doi: 10.1097/ACM.0000000000000677
- 14. Abboudi H, Khan MS, Guru KA, Froghi S, de Win G, Van Poppel H, Dasgupta P, Ahmed K (2014) Learning curves for urological procedures: a systematic review. BJU Int 114 (4):617–629. doi: 10.1111/bju.12315