Abstract
Background
Annotated data are foundational to applications of supervised machine learning. However, there seems to be a lack of a common language in the field of surgical data science.
The aim of this study is to review the process of annotation and the semantics used in the creation of surgical process models (SPM) for minimally invasive surgery videos.
Methods
For this systematic review, we reviewed articles indexed in the MEDLINE database from January 2000 until March 2022. We selected articles using surgical video annotations to describe a surgical process model in the field of minimally invasive surgery. We excluded studies focusing only on instrument detection or recognition of anatomical areas. The risk of bias was evaluated with the Newcastle–Ottawa quality assessment tool. Data from the studies were presented in tables using the SPIDER tool.
Results
Of the 2806 articles identified, 34 were selected for review. Twenty-two were in the field of digestive surgery, six in ophthalmologic surgery only, one in neurosurgery, three in gynecologic surgery, and two in mixed fields. Thirty-one studies (88.2%) were dedicated to phase, step, or action recognition and mainly relied on a very simple formalization (29, 85.3%). Clinical information was lacking in the datasets of studies using publicly available datasets. The annotation process for building the surgical process model was poorly described, and descriptions of the surgical procedures were highly variable between studies.
Conclusion
Surgical video annotation lacks a rigorous and reproducible framework. This makes sharing videos between institutions and hospitals difficult because of the different languages used. There is a need to develop and use a common ontology to improve libraries of annotated surgical videos.
Keywords: Surgical data science, Ontology, Surgical process model, Annotation, Surgical video, Minimally invasive surgery
With the rise of minimally invasive surgery—including endoscopic, laparoscopic, and robotically assisted procedures—the new domain of surgical data science is emerging to improve the consistency and, hopefully, the quality of patient care [1, 2]. Surgical data science consists of the scientific characterization of digital surgical information to improve patient outcomes. Among other areas of interest, two important goals of this field are to analyze surgical workflows and develop context-aware systems, both of which are significant features of the operating room of the future [3]. In the case of minimally invasive surgeries using endoscopy, both goals require surgical videos to be manually labeled in a spatial and/or temporal way following a surgical process model (SPM) [4], which is the cornerstone of surgical data science. Surgical process modeling consists of an analytical reduction of the surgical procedure into a formal or semi-formal representation defining phases, steps, and actions. It has already been developed for open surgery [5, 6]. Computer vision using machine learning has recently been used successfully for phase/step recognition or, in some cases, to estimate how long the surgical procedure will last [7, 8]. The vast majority of these approaches require labeled surgical videos.
However, the availability of a large volume of labeled data represents a major bottleneck for machine learning applied to surgical video analysis [8], and partly explains why the application of machine learning in surgery is limited compared to medical imaging. It is therefore important to share data between different institutions to increase the data pool and accelerate research. An important limitation to this sharing is the lack of a standard vocabulary for video annotation.
Although the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) has recently provided guidelines for video annotation, a review of the processes and formalizations previously used for surgical video annotation is required to ensure that surgical data are machine-readable and clinically meaningful [9]. Recent literature reviews of surgical data science focus mostly on artificial intelligence and machine learning techniques [8]. To date, no review has summarized the choice of process and the language used for surgical video annotation.
Therefore, the aim of this study is to review the process of annotation and semantics used in the creation of SPM for minimally invasive surgery videos.
Materials and methods
Search strategy
We conducted a systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [10].
Two investigators (KNT and MZ) performed an English-language literature search on MEDLINE (PubMed) from January 2000 to March 2022.
We selected articles in English that used labeled surgical videos to describe an SPM for minimally invasive surgeries.
The following keywords were used: minimally invasive surgery OR surgery AND machine learning OR deep learning OR computer vision AND surgical workflow OR surgical process model OR video annotation.
After the exclusion of duplicate articles, all the articles were screened by the two investigators. Titles and abstracts were initially assessed for eligibility before conducting a second selection based on the full text to exclude inappropriate articles. Any discrepancies were resolved by consensus.
Inclusion and exclusion criteria
Studies were included if they used surgical video labeling in minimally invasive surgery.
All fields of surgical specialty were considered. Studies focusing on instrument detection or recognition of anatomical areas only were excluded. We excluded commentaries, editorials, expert consensus, reviews, abstracts, and pure bioengineering research.
Definitions
An SPM is defined as “a simplified pattern of a surgical process that reflects a predefined subset of interest of the surgical process in a formal or semi-formal representation” [4].
The granularity level for the temporal description of the procedure is defined as the level of abstraction at which the surgical procedure is described. The highest level is the procedure itself. The procedure is composed of a list of phases. They must occur sequentially in the following order: Access, Execution of Surgical Objectives, Closure. Each phase is composed of several steps. As described by Lalys et al. [4] and in the recommendations of SAGES [9], steps represent a sequence of activities used to achieve a clinically meaningful surgical objective. They are procedure-specific. An activity is defined as a physical task including an action verb, instrument, and anatomy with an origin and a destination. It has starting and ending times, as well as the body part(s) performing the action. Each activity is composed of a list of motions.
A granularity level for the spatial description of the anatomy is defined with a basic hierarchical organization of anatomic spatial features from high level to low level: (1) Anatomic region (e.g., upper or lower abdomen, pelvis, retroperitoneum), (2) General anatomy (e.g., veins, arteries, muscle), and (3) Specific anatomy (e.g., liver, gallbladder, stomach, cystic artery, common bile duct) [9].
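To make the nested granularity levels above concrete, here is a minimal, illustrative Python sketch of the temporal decomposition of an SPM (procedure → phase → step → activity → motion). The class names, field names, and example values are our own simplifications for illustration, not a published schema:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative encoding of the temporal granularity levels of an SPM
# (procedure > phase > step > activity > motion), following the definitions
# of Lalys et al. [4] and the SAGES recommendations [9]. All names are hypothetical.

@dataclass
class Activity:
    """A physical task: action verb + instrument + anatomy, with timing."""
    action: str            # e.g., "dissect"
    instrument: str        # e.g., "grasper"
    anatomy: str           # e.g., "cystic duct" (specific-anatomy level)
    start_s: float         # starting time in seconds
    end_s: float           # ending time in seconds
    body_part: str = "right hand"
    motions: List[str] = field(default_factory=list)

@dataclass
class Step:
    """A sequence of activities achieving a clinically meaningful objective."""
    name: str
    activities: List[Activity] = field(default_factory=list)

@dataclass
class Phase:
    """Access, Execution of Surgical Objectives, or Closure."""
    name: str
    steps: List[Step] = field(default_factory=list)

@dataclass
class Procedure:
    """Highest granularity level: the procedure itself."""
    name: str
    phases: List[Phase] = field(default_factory=list)

# A fragment of a hypothetical cholecystectomy model:
proc = Procedure(
    name="laparoscopic cholecystectomy",
    phases=[Phase(
        name="Execution of Surgical Objectives",
        steps=[Step(
            name="dissection of the hepatocystic triangle",
            activities=[Activity("dissect", "grasper", "cystic duct", 410.0, 655.5)],
        )],
    )],
)
```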
A surgical procedure is described through the annotation of videos. This description is expressed at a certain level of formalization: a heavyweight ontology is the representation with the highest formalization level, based on a hierarchy of concepts and relations; lightweight ontologies are represented by Unified Modeling Language (UML) class diagrams and/or eXtensible Markup Language (XML) schemas. At the lowest level, hierarchical decompositions and sequential or non-sequential lists are also used, i.e., a list of words representing one or more levels of the surgery’s granularity [4, 9].
Data extraction (Fig. 1)
Fig. 1.
Framework of analysis of articles
The principal data items extracted and analyzed from the articles are as follows:
– Application

Surgical specialties: in this review, we categorized surgical specialties according to the organ system targeted: digestive surgery, ophthalmologic surgery, neurosurgery, gynecologic surgery, and ear-nose-throat surgery.

Clinical applications: phase, step, or action recognition; surgery time prediction; surgical quality; context-aware systems; robotic assistance; and automatic report generation.

Quality criteria of the dataset: to assess the quality of the dataset and avoid a high risk of bias, as described by Anteby et al. [8], we reported the following items: description of the population, clinical information, ethics committee approval, clinical selection criteria of patients eligible for surgery, selection criteria of videos, inclusion timeline, and consecutive case inclusion.

– Modeling

Modeling describes and explains the work domain, which is identified by the granularity level at which the procedure is studied, the operators involved, and the formalization [4]. We extracted:

- The granularity levels (defined above).
- The semantic strength of each phase, i.e., the median number of words used to define the phase (a minimal computation sketch is given after this list).
- Annotation creation: the methodology employed to generate the SPM, i.e., generation based on local expert consensus, international consensus, literature, or an upper ontology.
- The formalization level (defined above).

– Acquisition

The collection of data on which the models are built, and the first step toward creating an SPM. We extracted information about:

- The videos: number.
- The annotation software: name and availability.
- The surgeons: number and level of expertise.
- The annotators: number, specific training, quality of annotators, and study of inter-/intra-annotator variability.
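As a minimal illustration of how the semantic strength metric defined above can be computed, the following Python sketch takes a set of phase definitions and returns the median word count. The phase definitions shown are invented for illustration and do not come from any reviewed study:

```python
from statistics import median

# Hypothetical phase definitions from an annotation protocol;
# semantic strength = median number of words used to define a phase.
phase_definitions = {
    "preparation": "trocar placement and exposure of the operative field",
    "dissection": "dissection of the hepatocystic triangle",
    "clipping": "clipping and cutting of the cystic duct and artery",
}

word_counts = [len(text.split()) for text in phase_definitions.values()]
semantic_strength = median(word_counts)
print(f"Semantic strength (median words per phase): {semantic_strength}")
```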
Quality assessment
To assess the risk of bias in compliance with PRISMA criteria, the Newcastle–Ottawa quality assessment scale was used, even though not all of its domains could be applied. Results are presented in the Appendix.
Results
Study selection and characteristics of the studies
The initial search yielded a total of 2806 articles, of which 374 were screened and 75 were eligible. Finally, 34 were selected for the review, including a total of 2588 annotated videos (Fig. 2). All the studies were published between 2011 and 2022 (Table 1).
Fig. 2.
Flowchart
Table 1.
Articles included in the review
Columns follow the SPIDER framework (Sample, Phenomenon of interest, Design, Evaluation, Research type); the Design column is subdivided into application, platform, algorithms, and input features.

| Study | Sample | Phenomenon of interest | Application | Platform | Algorithms | Input features | Evaluation | Research type |
|---|---|---|---|---|---|---|---|---|
| Blum et al. [32] | 10 surgical videos; 2 surgeons and assistants | Laparoscopic cholecystectomies | Automatic generation and visualization of the workflow of surgery | Laparoscopy | HMM model merging | Video frames (instrument signals) | Generation of a graphical user interface; average duration; probability of reaching a node; probability of an instrument being used | Qualitative and quantitative |
| Bodenstedt et al. [19] | 80 surgical videos (CHOLEC80) | Laparoscopic cholecystectomies | Surgical workflow analysis | Laparoscopy | Deep Bayesian networks | Video frames | Variance, variation ratio, entropy, mutual information, weighted F1 score, accuracy | Quantitative |
| Bodenstedt et al. [20] | 80 surgical videos (CHOLEC80) | Laparoscopic cholecystectomies | Direct prediction and refinement of the duration of laparoscopic interventions | Laparoscopy | Recurrent CNNs | Surgical device data, endoscopic image data, or both | Mean absolute error; mean relative error | Quantitative |
| Cheng et al. [13] | 163 surgical videos; 4 medical centers; 7 surgeons of different levels | Laparoscopic cholecystectomies | Automatic surgical phase recognition | Laparoscopy | CNN, LSTM | Video frames | Precision, recall, F1 score, overall accuracy | Quantitative |
| Derathé et al. [16] | 29 surgical videos; 2 confirmed surgeons | Laparoscopic sleeve gastrectomy | Automatic surgical phase recognition (to predict the exposure of the surgical scene) | Laparoscopy | Algorithm pipeline | Video frames | Accuracy, sensitivity, specificity | Quantitative |
| Dergachyova et al. [21] | 7 surgical videos | Laparoscopic cholecystectomies | Automatic phase recognition | Laparoscopy | AdaBoost classifier; hidden semi-Markov model | Video frames; instrument usage signals | Accuracy, precision, recall | Quantitative |
| Guedon et al. [31] | 33 surgical videos (5 surgeons); 35 surgical videos (3 surgeons) | Laparoscopic cholecystectomies; laparoscopic hysterectomies | Automatic phase recognition | Laparoscopy | InceptionV3 network with ResNet50 | Video frames | Accuracy, precision, recall | Quantitative |
| Hashimoto et al. [14] | 88 surgical videos | Laparoscopic sleeve gastrectomies | Automatic step recognition | Laparoscopy | SleeveNet (combined ResNet and LSTM) | Video frames | Accuracy | Quantitative |
| Huaulmé et al. [18] | 11 surgical videos; 1 expert surgeon | Laparoscopic rectopexies | Automatic detection of surgical process deviations | Laparoscopy | Multi-dimensional non-linear temporal scaling with a hidden semi-Markov model | Video frames | Accuracy, recall, precision | Quantitative |
| Jalal et al. [56] | 80 surgical videos (CHOLEC80) | Laparoscopic cholecystectomies | Automatic phase recognition | Laparoscopy | CNN and NARX | Video frames | Accuracy, precision | Quantitative |
| Jin et al. [22] | 80 surgical videos (CHOLEC80) | Laparoscopic cholecystectomies | Automatic phase recognition and surgical tool detection | Laparoscopy | MTRCNet-CL | Video frames | Precision, recall, accuracy, mean average precision | Quantitative |
| Katić et al. [37] | 16 surgical videos; several surgeons | Multiple procedures (11 pancreatic resections and 5 adrenalectomies) | Automatic phase recognition | Laparoscopy | Random forest; cultural optimization | Video frames | Recognition rate; variance of the recognition rate; average time to recognize a phase | Qualitative and quantitative |
| Khan et al. [11] | 50 surgical videos; 1 attending surgeon and 1 subspecialty fellow | Endoscopic pituitary surgery | Automatic phase recognition | Endoscopy | CNN and recurrent CNN | Video frames | Precision, recall, accuracy, F1 score | Quantitative |
| Kitaguchi et al. [12] | 300 surgical videos; 19 high-volume endoscopic centers; numerous surgeons | Laparoscopic sigmoidectomies | Automatic phase recognition | Laparoscopy | CNN | Video frames | Overall accuracy; intersection over union | Quantitative |
| Kitaguchi et al. [29] | 71 surgical videos; 19 surgeons | Laparoscopic sigmoidectomies | Automatic phase recognition | Laparoscopy | Inception-ResNet-v2; LightGBM | Video frames | Precision, recall, F1 score, overall accuracy | Quantitative |
| Pangal et al. [55] | 10 surgical videos | Laparoscopic cholecystectomies | Evaluation of the reliability of real-time workflow recognition | Laparoscopy | Intelligent workflow analysis and prediction software | Continuous sensor-based data acquisition | Correlation coefficient; percentage of over- or underestimation | Quantitative |
| Lalys et al. [39] | 20 surgical videos; 2 expert surgeons | Cataract | Automatic detection of low-level surgical tasks, i.e., the sequence of activities in a surgical procedure | Microscopy | SVM; DTW | Video frames | Percentage of frames correctly recognized in one video (frame-by-frame analysis); global accuracy; recognition rates per activity pair | Quantitative |
| Lalys et al. [40] | 20 surgical videos; 3 expert surgeons | Cataract | Automatic recognition of high-level surgical tasks | Microscopy | DTW; HMM | Video frames | Frequency recognition rate; accuracy | Quantitative |
| Lecuyer et al. [23] | 80 surgical videos; 50 surgical videos | Laparoscopic cholecystectomy; cataract | Automatic phase recognition | Laparoscopy | VGG19, Inception V3, ResNet50 | Video frames | Accuracy; duration of annotation with vs. without the assistance system | Quantitative |
| Malpani et al. [30] | 24 surgical videos; 6 faculty and more than 20 residents | Robotic hysterectomies | Automatic phase recognition | Robot-assisted laparoscopy | Semi-Markov conditional random field | Video frames; tool motion data; system (console) events recorded by the da Vinci | Precision, recall, accuracy, normalized Levenshtein distance | Quantitative |
| Mascagni et al. [38] | 155 surgical videos; 12 surgeons | Laparoscopic cholecystectomies | Automatic location of critical events and provision of short video clips documenting the critical view of safety | Laparoscopy | EndoDigest | Video frames | Mean absolute error; percentage of video clips in which the CVS was assessable by surgeons (relevance) | Quantitative |
| Mascagni et al. [41] | 144 surgical videos; 4 Italian centers | Laparoscopic cholecystectomies | Automatic phase recognition (external validation) | Laparoscopy | EndoDigest | Video frames | Mean error; percentage of automatically extracted short video clips effectively documenting the CVS | Quantitative |
| Meeuwsen et al. [28] | 40 surgical videos | Laparoscopic hysterectomies | Automatic phase recognition | Laparoscopy | Random forest | Video frames | Accuracy; mean absolute error of end-time prediction | Quantitative |
| Nespolo et al. [17] | 10 surgical videos; attending physicians and surgical trainees | Cataract | Automatic phase recognition | Microscopy | Faster R-CNN | Video frames | Area under the ROC curve; mean processing speed | Quantitative |
| Guerin et al. [56] | 16 surgical videos; 4 surgeons of various skill levels | Laparoscopic cholecystectomies | Automatic phase recognition | Laparoscopy | DTW; hidden Markov model | Videos from the endoscopic view and two external views | Accuracy, average recall, average precision | Quantitative |
| Quellec et al. [33] | 186 surgical videos; 10 surgeons | Cataract | Automatic phase recognition | Microscopy | Conditional random fields; unary potentials | Video frames | Accuracy; area under the ROC curve | Quantitative |
| Ramesh et al. [24] | 40 surgical videos; 7 expert surgeons | Laparoscopic gastric bypass (Bypass40) | Automatic phase and step recognition | Laparoscopy | MTMS-TCN; CNN | Video frames | Accuracy, precision, recall | Quantitative |
| Shi et al. [25] | 80 surgical videos; 41 surgical videos | Laparoscopic cholecystectomies | Automatic phase recognition | Laparoscopy | Two-stage semi-supervised learning | Video frames | Accuracy, precision, recall, Jaccard index | Quantitative |
| Twinanda et al. [26] | 80 surgical videos; 7 surgical videos | Laparoscopic cholecystectomies | Automatic phase recognition and tool detection | Laparoscopy | EndoNet | Video frames | Average precision, average recall, accuracy | Quantitative |
| Twinanda et al. [27] | 120 surgical videos; 170 surgical videos | Laparoscopic cholecystectomies; laparoscopic bypass | Automatic estimation of the remaining surgery duration | Laparoscopy | RSDNet | Video frames | Mean absolute error; overestimation; underestimation | Quantitative |
| Yeh et al. [15] | 298 surgical videos; 12 resident surgeons | Cataract | Automatic phase/step recognition | Microscopy | VGG16; VGG16 followed by CNN and RCNN | Video frames | Accuracy; micro-averaged area under ROC curves | Quantitative |
| Yu et al. [42] | 100 surgical videos; 1 faculty and 1 trainee surgeon | Cataract | Automatic phase recognition | Microscopy | (1) SVM; (2) RNN; (3) CNN; (4) CNN-RNN with a time series of images; (5) CNN-RNN with a time series of images and instrument labels | Video frames | Accuracy, area under the ROC curve, sensitivity, specificity, precision | Quantitative |
| Zhang et al. [35] | 14 surgical videos; a group of surgeons; Cholec80 | Laparoscopic sacrocolpopexies | Automatic phase recognition | Laparoscopy | Seq2seq LSTM and transformers | Video frames | Accuracy, F1 score, Ward metric | Quantitative |
| Zhang et al. [34] | 461 surgical videos | Robotic laparoscopic sleeve gastrectomies | Automatic phase recognition | Robotic and laparoscopy | Inflated 3D ConvNet (I3D) | Video frames | Accuracy, precision, recall, weighted Jaccard score | Quantitative |
Figure 1 represents the framework of analysis of the articles.
Application
Surgical field
Of the selected studies, 22 were in the field of digestive surgery, six in ophthalmologic surgery only, one in neurosurgery, three in gynecologic surgery, and two in mixed fields (one combining ophthalmologic and digestive surgery, and one combining digestive and gynecologic surgery) (Fig. 3). Only two of the 34 studies examined robot-assisted surgery.
Fig. 3.
Distribution of the articles
Clinical application and results
Most of the studies (31, 88.2%) were dedicated to phase, step, or action recognition. Three (8.8%) studies focused on surgical quality, and two (5.9%) on procedure duration. Only one (2.9%) study provided a clinical correlation with surgical procedure annotation.
Thirty-three (97.1%) studies used machine learning techniques.
Quality criteria of the dataset
The population involved in the dataset was fully described in only two (5.9%) studies. In the studies by Kitaguchi et al. and Khan et al. [11, 12], information such as colorectal tumor score and the histologic nature of the pituitary gland tumors was provided. Clinical information was lacking in studies using publicly available datasets. Clinical information was compared with annotation analysis results only by Cheng et al. (2.9%) [13], who correlated the severity of cholecystitis with the duration of the surgical procedure; however, no other clinical information was available in their study. Clinical selection criteria of patients for surgery were detailed in only one (2.9%) study [11]. Selection criteria of videos for inclusion in the dataset were detailed in four (11.8%) studies [11, 13–15].
Only thirteen (41.2%) studies mentioned ethics committee approval [11–18].
Only Huaulmé et al. indicated that cases were consecutively included in the dataset during the inclusion time [18], decreasing the bias associated with non-consecutive case inclusion.
The inclusion timeline was often extensive (median duration of 35 (5–125) months), implying a possible shift in surgical guidelines during the study period.
Eleven (32.4%) studies [12, 17, 19–27] used one or several publicly available datasets (Table 2), and the remaining 23 (67.6%) used private datasets.
Table 2.
Available public datasets and articles related to this current review
| Dataset | Field | Interventions | Number of videos | Studies | Application |
|---|---|---|---|---|---|
| CHOLEC80 (http://camma.u-strasbg.fr/datasets) | Digestive surgery | Laparoscopic cholecystectomy | 80 | Shi et al. [25] | Surgical workflow recognition |
| | | | | Jin et al. [22] | Tool presence detection and phase recognition |
| | | | | Twinanda et al. [26] | Tool presence detection and phase recognition |
| | | | | Bodenstedt et al. [19] | Surgical workflow recognition |
| | | | | Bodenstedt et al. [20] | Procedure duration prediction |
| | | | | Jalal et al. [56] | Surgical workflow recognition |
| | | | | Lecuyer et al. [23] | Surgical workflow recognition |
| CHOLEC120 = CHOLEC80 + 40 cholecystectomies (https://github.com/CAMMA-public/Surgical-Phase-Recognition) | Digestive surgery | Laparoscopic cholecystectomy | 120 | Twinanda et al. [27] | Prediction of remaining surgery duration |
| CATARACT 101 (http://ftp.itec.aau.at/datasets/ovid/cat-101/) | Ophthalmology surgery | Cataract | 101 | Nespolo et al. [17] | Surgical workflow recognition |
| CATARACT (https://arxiv.org/abs/1906.11586) | Ophthalmology surgery | Cataract | 50 | Lecuyer et al. [23] | Surgical workflow recognition |
Modeling
Formalization
Most of the studies were based on a very simple formalization: a 2D graph with a sequential list in 24 (70.6%) studies and a non-sequential list in five (14.7%). A more complex formalization was rarely used: a hierarchical decomposition in three (8.8%) studies and a diagram in one (2.9%).
Work domain
Surgical procedures
The studied surgical procedures were cholecystectomy (17, 50%), cataract surgery (7, 20.6%), bypass (2, 5.9%), sleeve gastrectomy (3, 8.8%), hysterectomy (2, 5.9%), pituitary gland removal (1, 2.9%), rectopexy (1, 2.9%), and sacrocolpopexy (1, 2.9%). At this highest granularity level, only 53% (18) of the studies clearly described the specific surgical procedure.
Annotation creation was described in 23 (68%) studies and was based on a single surgeon (9, 26.5%), literature exclusively (4, 11.8%), local consensus by several surgeons exclusively (4, 11.8%), both local consensus and literature (4, 11.8%), cognitive task analysis with engineer and surgeon experts (1, 2.9%), or an upper ontology (1, 2.9%).
The lower granularity levels annotated included phases (29, 87.9%), steps (5, 14.7%), actions (7, 20.6%), and instruments (23, 68%).
Phases
Only two (2/29, 6.9%) studies used the term phase consistently with the previously cited consensus definition [13, 28]. We observed that the terms step and phase were used interchangeably in many articles.
The median semantic strength (i.e., the number of words used to describe a phase) was 2 but was highly variable (0–33). Four (13.8%) studies provided additional information, such as the start and end of a phase [12, 13, 29, 30].
Eleven (38%) studies provided pictures to illustrate phases.
For the same surgical procedure, we observed high heterogeneity in the number of phases described from one study to another (Fig. 4): from 6 to 14 for cholecystectomies; from 3 to 12 for cataract procedures; 7 and 10 for the two hysterectomy studies; and 7 and 8 for the sleeve gastrectomy procedures.
Fig. 4.
Heterogeneity of phases for cataract surgical procedure
Three studies focusing on the whole procedure did not detail the number of phases [15, 18, 26].
In the study by Guedon et al., excessive bleeding was described as a possible additional phase that could occur at any time [31].
Nine studies annotated idle times [15, 17, 23, 24, 30, 32–35], providing additional data on phase annotation.
Steps
Five (14.7%) studies described steps [11, 14, 23, 24, 28] and used the term consistently with the previously cited consensus definitions [1, 4]. Semantic strength varied considerably, with a median of 3.5 (2–36). None of the studies provided additional information (such as the start and end of a step) or pictures to illustrate the steps.
Activities: actions, instruments, anatomy
Activities were characterized in nine (26.5%) studies [4, 12, 16, 29, 30, 36–38], with one (11.1%) providing pictures to illustrate the activities [39].
The study by Derathé et al. [16], analyzing laparoscopic sleeve gastrectomy, focused on a single phase: the exposure of the surgical scene with a view to assessing its quality. Therefore, they annotated activities during this phase: actions, instruments, and anatomy.
Instruments were characterized in 23 (68%) studies. In 19 (82.6%) studies, specific instruments were described with a median of 12 (2–21) instruments specified per study. Four (17.4%) studies [12, 19, 20, 23] provided pictures to illustrate the instruments.
Anatomy was described in eight (23%) studies [16, 17, 36–40], and specific anatomy described in three [38–40]. Anatomical characteristics (normal or pathologic) were never reported.
Other useful information
Some authors, such as Derathé et al. [16], focused on the surgical quality within a phase with a quality-oriented annotation.
Mascagni et al. focused on the identification of the critical view of safety [38, 41].
Other studies added events and classified them as “normal or abnormal” such as Hashimoto et al. [14] during sleeve gastrectomy, and Huaulmé et al. [18] during rectopexy.
Finally, Malpani et al. [30] reported additional data provided by the da Vinci Surgical System, such as tool identity, tool changes, endoscopic movements, repositioning of the manipulator, and a head-in indicator identifying whether the surgeon was working at the console.
Acquisition
Surgeons
The number of surgeons performing or involved in the surgeries was listed in 24 (70.6%) studies. The median number of surgeons involved was 6 (1–28). Twenty (58.8%) studies reported the expertise level of surgeons, although the definition of expertise was poorly detailed and heterogeneous: a trainee in one study, an expert in 10, and a mix of trainees and surgeons in nine. Two studies reported that the surgeries could be performed by either of two operators: an expert and a fellow [30, 32]. Twenty (58.8%) of the studies mentioned that the surgery took place in an affiliated institution, and five studies involved multiple institutions.
Videos
The median number of videos used per study was 45 (7–461).
Videos came from robotic surgery in two studies [28, 35]. One study mixed videos from robotic and laparoscopic surgeries [35].
The annotation software was reported in 12 (34.3%) studies (Table 3).
Table 3.
Annotation software used in the articles
| Annotation software | Academic or corporate | Studies |
|---|---|---|
| Nous (COSMONiO)® | Private | Guédon et al. [31] |
| Annotate® | Private | Derathé et al. [16]; Huaulmé et al. [18]; Lecuyer et al. [23] |
| Anvil research tool annotation software® | Public | Cheng et al. [13]; Hashimoto et al. [14] |
| Swansuite® | Private | Katić et al. [37] |
| Touchsurgery® | Private | Khan et al. [11] |
| Via Software® | Private | Yeh et al. [15] |
Annotators
Information about annotators was available in 23 (65.7%) studies. The median number of annotators was 2 (1–4). The annotators had been specifically trained in three studies [14, 15, 31]. Thirteen studies indicated the expertise level of annotators: surgeons in nine studies [11–14, 17, 23, 33, 35, 40], non-clinical researchers in two [15, 18], mixed physicians and trained annotators in one [42], and mixed scientists and expert surgeons in one [16]. Only one study examined inter-annotator variability [13] (Table 4). None of the studies described the learning curve of the annotators.
Table 4.
Experience of annotators
| Study | Level of experience of annotators |
|---|---|
| Blum et al. [32] | Not mentioned |
| Bodenstedt et al. [19] | Not mentioned |
| Bodenstedt et al. [20] | Not mentioned |
| Cheng et al. [13] | Surgeons |
| Derathé et al. [16] | Surgeons and scientists |
| Dergachyova et al. [21] | Not mentioned |
| Guedon et al. [31] | Instructed students and author |
| Hashimoto et al. [14] | Surgeons |
| Huaulmé et al. [18] | Scientist |
| Jalal et al. [56] | Not mentioned |
| Jin et al. [22] | Not mentioned |
| Katić et al. [37] | Not mentioned |
| Khan et al. [11] | Surgeons |
| Kitaguchi et al. [12] | Surgeons |
| Kitaguchi et al. [29] | Not mentioned |
| Pangal et al. [55] | Not mentioned |
| Lalys et al. [39] | Not mentioned |
| Lalys et al. [40] | Surgeon |
| Lecuyer et al. [23] | Surgeon |
| Malpani et al. [30] | Not mentioned |
| Mascagni et al. [38] | Not mentioned |
| Mascagni et al. [41] | Not mentioned |
| Meeuwsen et al. [28] | Not mentioned |
| Nespolo et al. [17] | Surgeons |
| Guerin et al. [56] | Not mentioned |
| Quellec et al. [33] | Surgeons |
| Ramesh et al. [24] | Not mentioned |
| Shi et al. [25] | Not mentioned |
| Twinanda et al. [26] | Not mentioned |
| Twinanda et al. [27] | Not mentioned |
| Yeh et al. [15] | Non-clinical researcher |
| Yu et al. [42] | Physicians and trained annotators |
| Zhang et al. [35] | Surgeons |
| Zhang et al. [34] | Not mentioned |
Annotation was corrected by an expert surgeon annotator in four studies [14, 15, 18, 23]. In one study, the annotator was changed for the same dataset [15].
Quality assessment and risk of bias
Manuscripts were ascribed a high risk of bias because of failure to report ethics committee approval and to describe the study population. The Newcastle–Ottawa quality assessment scale was used (Appendix).
Evolution of quality of annotation over the years
The quality of annotation seems to have slightly improved over the 14-year time span.
Nine studies used the literature and cross-referenced previous studies when creating their annotations. These studies were all recent (from 2014 onward).
Before 2020 (i.e., between 2008 and 2020), among 19 articles, only 6 (31.6%) studies had ethics committee approval. Only one (5.3%) study had clinical data within the dataset. Information about annotators was reported in 9 (47.4%) studies (surgeons in 3 cases, physicians and trained annotators in 1 case, and scientists and surgeons in 1 case). The median semantic strength of phases was 4 (2.2–6.6), and of steps was 4.5 (3–6.5).
After 2020 (i.e., between 2020 and 2022), among 15 articles, 9 (60%) studies had ethics committee approval. Three (20%) studies had clinical data within the dataset. Information about annotators was reported in 7 (47%) studies (surgeons in 6 cases, data engineers in 1 case, and scientists in 1 case). The median semantic strength of phases was 6.2 (3.2–10), and of steps was 2.8 (1–3).
Discussion
The present review highlights that the process of surgical video annotation for minimally invasive surgeries is highly variable between studies. Surgical procedure description through the SPM lacks robust and consistent formalization illustrating relationships between concepts. The methodology employed to choose semantics and vocabulary is rarely standardized and not reproducible, leading to heterogeneity in the generation of SPM among studies. These results may explain the current lack of success stories in the field of surgical data science.
Video labeling is a matter of high interest in the surgical data science area. The semantics used are the fundamental basis for generating SPMs [43], i.e., detailed descriptions of surgical procedures [4]. The SPM is an original approach to establish a solid basis for analysis of various aspects of surgical procedures [44]. Its usage could improve surgical workflow management in the modern operating room and help optimize and improve procedures [3]. Additionally, semantic coherence facilitates data sharing and collaboration between institutions, which would help gather enough surgical cases to ensure the representativeness of pathologies and procedures. Data are the foundation of machine learning, and the lack of annotated data is a limiting factor for improving deep learning performance [8]. Machine learning mainly uses supervised learning; thus, raw data are of little utility without annotation [45]. Finally, a robust representation of the SPM, or formalization, is needed to represent surgical knowledge, make it explicit, and consequently make it shareable. According to the recent SAGES recommendations, formalization must be universal, scalable, machine-readable, clinically applicable, and flexible [14]. Ontologies are key to creating standardized SPMs [37, 43]. Therefore, choosing appropriate vocabulary to annotate surgical videos is crucial to advancing artificial intelligence in surgical data science.
In this review, we found that laparoscopic cholecystectomy was the most studied surgical procedure, as in the review by Garrow et al. [45]. This is probably because there are many publicly available datasets that focus on this simple and often minimally invasive procedure.
In the current review, one-third of the studies did not describe the annotation creation methodology employed to generate the SPM. When described, only five studies were based on both local consensus and literature or on an upper ontology. Huaulmé et al. used a cognitive task analysis between engineer and surgeon experts [18]. The Delphi methodology for SPM generation may be a good option, as many expert surgeons from different institutions are involved in the process, making the results more broadly accepted [46].
We also observed high variability and heterogeneity in SPMs across the studies. There was considerable variability in the number of phases or steps for the same surgical procedures. For example, depending on the study, the cataract procedure consisted of 4 to 14 phases. Also, the terms "steps" and "phases" were often used incorrectly, making study comparison difficult, although clear definitions of phases and steps exist [4, 14].
We also investigated the quality of the definitions of the elements of the SPM by measuring the median number of words used (called semantic strength) and found it to be highly variable for phases and steps. We noted that four studies added information on the start and end of a phase, increasing the accuracy of the definitions. Also, some studies provided additional pictures to illustrate phases, steps, or instruments.
Most of the studies in the present review used a very simple formalization, such as a sequential or non-sequential list. Unlike a heavyweight formalization, this lightweight formalization does not allow visualization of the relationships between elements. While international healthcare terminology standards for biomedical data science are well established (such as the Foundational Model of Anatomy (FMA) [47], the Gene Ontology (GO) [48], and others), ontologies describing activities and other aspects of interventional care processes are rare [1, 2]. However, in the surgical field, recent insights have provided clues for creating a robust formalization. OntoSPM [43] and LapOntoSPM [37] are the first specific ontologies focusing on the modeling of the entities of surgical process models. OntoSPM [43] is now organized as a collaborative action associating a dozen European research institutions, gathering the basic vocabulary to describe surgical actions, instruments, actors, and their roles. This project is promising, as initiatives like the OBO Foundry [49] (a project focusing on biology and biomedicine) have provided evidence that building and sharing interoperable ontologies stimulates data sharing within a domain [1]. A widespread, broad ontology for surgical applications is thus fundamental and must be built upon close collaboration between surgeons and engineers [50]. This improves the clinical relevance of the terms used and promotes the use of vocabulary familiar to surgeons. The formalization of the SPM has already been used to describe open surgical procedures, allowing interesting analyses such as distinguishing expert from junior performance in discectomy [6].
However, ontologies have some disadvantages, including a lack of flexibility, a considerable initial effort [44], and their so-called complexity. These factors explain why ontology is not widely used in machine learning despite its potential. Representing relationships between elements can be time-consuming and challenging, even with software like Protégé, a free open-source ontology editor [51].
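As an illustration of what such a machine-readable formalization can look like in practice, the following sketch uses owlready2, a Python library for OWL ontologies compatible with Protégé. The concept names, property names, and IRI are hypothetical and deliberately simplified; they do not reproduce OntoSPM:

```python
from owlready2 import get_ontology, Thing, ObjectProperty

# Hypothetical IRI for a toy SPM ontology (illustration only).
onto = get_ontology("http://example.org/spm-demo.owl")

with onto:
    class SurgicalPhase(Thing): pass
    class SurgicalStep(Thing): pass
    class SurgicalInstrument(Thing): pass

    # Relations between concepts: this is what distinguishes an ontology
    # from a flat (non-)sequential list of terms.
    class has_step(ObjectProperty):
        domain = [SurgicalPhase]
        range = [SurgicalStep]

    class uses_instrument(ObjectProperty):
        domain = [SurgicalStep]
        range = [SurgicalInstrument]

# Individuals: one phase linked to one step and the instrument it uses.
dissection = onto.SurgicalPhase("dissection_phase")
triangle = onto.SurgicalStep("hepatocystic_triangle_dissection")
grasper = onto.SurgicalInstrument("grasper")
dissection.has_step = [triangle]
triangle.uses_instrument = [grasper]

# Save in a standard, machine-readable format that Protégé can open.
onto.save(file="spm-demo.owl", format="rdfxml")
```

Explicit relations of this kind (phase has_step step, step uses_instrument instrument) are precisely what the sequential lists used by most reviewed studies cannot express.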
All the studies included in this review used endoscopic or microscopic video for data acquisition. These techniques give a good view of the surgical site, suitable for low-level data acquisition, but provide no access to operating room ergonomics or insight into team interactions [44].
Beyond the question of ontology, video annotation is associated with several challenges. Practical considerations need to be taken into account when sharing surgical videos between countries and between hospitals. The use of a high-quality dataset is fundamental. Before gathering videos into a dataset, ethics committee approval is required, and patient consent must be obtained. In this review, several publications failed to provide information about ethics committee approval, the surgeons involved, the number of institutions, and clinical data. These results are consistent with those of Anteby et al. [8]. Furthermore, public datasets provide less clinical information. Pseudo-anonymization should be performed so that videos can be used retrospectively. When integrating a shared database, the copyright of the video can also be an issue [31]. Furthermore, a major bottleneck for data annotation is the lack of access to expert knowledge. However, dedicated software can help structure expert knowledge through the SPM [52]. It is therefore crucial to use specifically trained annotators. In this review, information about annotators was often lacking, especially regarding inter-annotator reliability. Moreover, labeling videos is time-consuming, as demonstrated by Huaulmé et al. [36], and resource-intensive. One solution may be crowdsourcing, where the annotation task is outsourced to an anonymous untrained crowd [2]. Ultimately, data should be collected as a matter of best practice in a consistent, longitudinal manner using tools that are smoothly integrated into the clinical workflow. Workers in the field need to identify allies and clear short-term "win scenarios" that will build interest and trust in the area so that hospitals, insurers, and practitioners all see the value of creating these resources, which will ultimately advance the profession [2]. Surgeons' time should be reserved for high-value annotation tasks such as identifying anatomical features and assessing the quality of a dissection [50].
In this review, only two studies reported cases using robotic surgery, and one also focused on data about optics. Robot-assisted surgery offers the possibility to extract additional data from robot kinematics and event data. Hung's team has introduced automated performance metrics (APMs) combined with machine learning to assess a surgeon's performance, recognize the surgical activity, and even anticipate surgical outcomes [53–55]. Additionally, in a recent review, our team demonstrated that APMs could be considered objective tools for technical skills assessment, even though associations between APMs and clinical outcomes remain to be confirmed by further studies, particularly outside the field of urology [56].
Limitations of the current review include the fact that we focused exclusively on minimally invasive surgery based on video. Studies focusing on element recognition, such as anatomy or instruments, were excluded in order to focus on the whole surgical procedure.
We put forward several propositions to support the generation of consistent and shareable surgical video annotations. First, each project should be supported by a dedicated team with a real partnership between surgeons and engineers, with a protocol and process of annotation decided ahead of time. Second, the objective of the project should be clearly determined, with a specific question and a specific outcome. This would help define the granularity level of the SPM required in both temporal and spatial dimensions during the annotation. Third, the creation of a high-quality database is crucial, including both ethics committee approval and patient consent, and the collection of anonymized patient clinical data. Fourth, a clear methodology for annotation with the SPM should be established, based on worldwide expert surgeon consensus and the literature, with a robust formalization applied to capture the relationships between the different elements. Finally, trained annotators must be carefully chosen for a specific task according to their abilities to minimize inter-annotator variability. An annotation review by experts can be added when needed.
We conclude that most of the studies in this review failed to adhere to a rigorous, reproducible, and understandable framework for surgical video annotation. This results in the use of different languages and hinders the sharing of videos between institutions and hospitals, impeding the widespread dissemination of surgical data science. There is an urgent need to follow rigorous and formal methodologies in surgical video annotation, including a common ontology.
Acknowledgements
The authors thank Felicity Neilson, a native English speaker specialized in scientific writing, for English editing.
Appendix: Risk of bias and the Newcastle–Ottawa quality assessment scale
| Study | Selection: ascertainment of exposure (secure record, e.g., surgical records) | Outcome: assessment of outcome (independent blind assessment, record linkage) |
|---|---|---|
| Blum et al. [32] | * | * |
| Bodenstedt et al. [19] | * | * |
| Bodenstedt et al. [20] | * | * |
| Cheng et al. [13] | * | * |
| Derathé et al. [16] | * | * |
| Dergachyova et al. [21] | * | * |
| Guedon et al. [31] | * | * |
| Hashimoto et al. [14] | * | * |
| Huaulmé et al. [18] | * | * |
| Jalal et al. [56] | * | * |
| Jin et al. [22] | * | * |
| Katić et al. [37] | * | * |
| Khan et al. [11] | * | * |
| Kitaguchi et al. [12] | * | * |
| Kitaguchi et al. [29] | * | * |
| Pangal et al. [55] | * | * |
| Lalys et al. [39] | * | * |
| Lalys et al. [40] | * | * |
| Lecuyer et al. [23] | * | * |
| Malpani et al. [30] | * | * |
| Mascagni et al. [38] | * | * |
| Mascagni et al. [41] | * | * |
| Meeuwsen et al. [28] | * | * |
| Nespolo et al. [17] | * | * |
| Guerin et al. [56] | * | * |
| Quellec et al. [33] | * | * |
| Ramesh et al. [24] | * | * |
| Shi et al. [25] | * | * |
| Twinanda et al. [26] | * | * |
| Twinanda et al. [27] | * | * |
| Yeh et al. [15] | * | * |
| Yu et al. [42] | * | * |
| Zhang et al. [35] | * | * |
| Zhang et al. [34] | * | * |
Footnotes
Disclosure Dr. Krystel NYANGOH TIMOH, Dr. Arnaud HUAULMÉ, Dr. Kevin CLEARY, Mrs. Myrah A ZAHEER, Dr. Dan DONOHO, and Dr. Pierre JANNIN have no conflicts of interest or financial ties to disclose. Prof. Vincent LAVOUÉ has a contract with Intuitive® for proctoring.
References
1. Maier-Hein L, Eisenmann M, Sarikaya D, März K, Collins T, Malpani A, Fallert J, Feussner H, Giannarou S, Mascagni P, Nakawala H, Park A, Pugh C, Stoyanov D, Vedula SS, Cleary K, Fichtinger G, Forestier G, Gibaud B, Grantcharov T, Hashizume M, Heckmann-Nötzel D, Kenngott HG, Kikinis R, Mündermann L, Navab N, Onogur S, Roß T, Sznitman R, Taylor RH, Tizabi MD, Wagner M, Hager GD, Neumuth T, Padoy N, Collins J, Gockel I, Goedeke J, Hashimoto DA, Joyeux L, Lam K, Leff DR, Madani A, Marcus HJ, Meireles O, Seitel A, Teber D, Ückert F, Müller-Stich BP, Jannin P, Speidel S (2022) Surgical data science - from concepts toward clinical translation. Med Image Anal 76:102306
2. Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S, Hashizume M, Katic D, Kenngott H, Kranzfelder M, Malpani A, März K, Neumuth T, Padoy N, Pugh C, Schoch N, Stoyanov D, Taylor R, Wagner M, Hager GD, Jannin P (2017) Surgical data science for next-generation interventions. Nat Biomed Eng 1:691–696
3. Cleary K, Kinsella A (2005) OR 2020: the operating room of the future. J Laparoendosc Adv Surg Tech A 15(495):497–573
4. Lalys F, Jannin P (2014) Surgical process modelling: a review. Int J Comput Assist Radiol Surg 9:495–511
5. Jannin P, Raimbault M, Morandi X, Riffaud L, Gibaud B (2003) Model of surgical procedures for multimodal image-guided neurosurgery. Comput Aided Surg 8:98–106
6. Riffaud L, Neumuth T, Morandi X, Trantakis C, Meixensberger J, Burgert O, Trelhu B, Jannin P (2010) Recording of surgical processes: a study comparing senior and junior neurosurgeons during lumbar disc herniation surgery. Neurosurgery 67:325–332
7. Moglia A, Georgiou K, Georgiou E, Satava RM, Cuschieri A (2021) A systematic review on artificial intelligence in robot-assisted surgery. Int J Surg 95:106151
8. Anteby R, Horesh N, Soffer S, Zager Y, Barash Y, Amiel I, Rosin D, Gutman M, Klang E (2021) Deep learning visual analysis in laparoscopic surgery: a systematic review and diagnostic test accuracy meta-analysis. Surg Endosc 35:1521–1533
9. Meireles OR, Rosman G, Altieri MS, Carin L, Hager G, Madani A, Padoy N, Pugh CM, Sylla P, Ward TM, Hashimoto DA (2021) SAGES consensus recommendations on an annotation framework for surgical video. Surg Endosc 35:4918–4929
10. Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, McKenzie JE (2021) PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 372:n160
11. Khan DZ, Luengo I, Barbarisi S, Addis C, Culshaw L, Dorward NL, Haikka P, Jain A, Kerr K, Koh CH, Layard Horsfall H, Muirhead W, Palmisciano P, Vasey B, Stoyanov D, Marcus HJ (2021) Automated operative workflow analysis of endoscopic pituitary surgery using machine learning: development and preclinical evaluation (IDEAL stage 0). J Neurosurg. 10.1016/j.bas.2021.100580
12. Kitaguchi D, Takeshita N, Matsuzaki H, Oda T, Watanabe M, Mori K, Kobayashi E, Ito M (2020) Automated laparoscopic colorectal surgery workflow recognition using artificial intelligence: experimental research. Int J Surg 79:88–94
13. Cheng K, You J, Wu S, Chen Z, Zhou Z, Guan J, Peng B, Wang X (2022) Artificial intelligence-based automated laparoscopic cholecystectomy surgical phase recognition and analysis. Surg Endosc 36:3160–3168
14. Hashimoto DA, Rosman G, Witkowski ER, Stafford C, NavaretteWelton AJ, Rattner DW, Lillemoe KD, Rus DL, Meireles OR (2019) Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg 270:414–421
15. Yeh HH, Jain AM, Fox O, Wang SY (2021) PhacoTrainer: a multicenter study of deep learning for activity recognition in cataract surgical videos. Transl Vis Sci Technol 10:23
16. Derathé A, Reche F, Moreau-Gaudry A, Jannin P, Gibaud B, Voros S (2020) Predicting the quality of surgical exposure using spatial and procedural features from laparoscopic videos. Int J Comput Assist Radiol Surg 15:59–67
17. Garcia Nespolo R, Yi D, Cole E, Valikodath N, Luciano C, Leiderman YI (2022) Evaluation of artificial intelligence-based intraoperative guidance tools for phacoemulsification cataract surgery. JAMA Ophthalmol 140:170–177
18. Huaulmé A, Jannin P, Reche F, Faucheron JL, Moreau-Gaudry A, Voros S (2020) Offline identification of surgical deviations in laparoscopic rectopexy. Artif Intell Med 104:101837
19. Bodenstedt S, Rivoir D, Jenke A, Wagner M, Breucha M, Müller-Stich B, Mees ST, Weitz J, Speidel S (2019) Active learning using deep Bayesian networks for surgical workflow analysis. Int J Comput Assist Radiol Surg 14:1079–1087
20. Bodenstedt S, Wagner M, Mündermann L, Kenngott H, Müller-Stich B, Breucha M, Mees ST, Weitz J, Speidel S (2019) Prediction of laparoscopic procedure duration using unlabeled, multimodal sensor data. Int J Comput Assist Radiol Surg 14:1089–1095
21. Dergachyova O, Bouget D, Huaulmé A, Morandi X, Jannin P (2016) Automatic data-driven real-time segmentation and recognition of surgical workflow. Int J Comput Assist Radiol Surg 11:1081–1089
22. Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA (2020) Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal 59:101572
23. Lecuyer G, Ragot M, Martin N, Launay L, Jannin P (2020) Assisted phase and step annotation for surgical videos. Int J Comput Assist Radiol Surg 15:673–680
24. Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N (2021) Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures. Int J Comput Assist Radiol Surg 16:1111–1119
25. Shi X, Jin Y, Dou Q, Heng PA (2021) Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition. Med Image Anal 73:102158
26. Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N (2017) EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging 36:86–97
27. Twinanda AP, Yengera G, Mutter D, Marescaux J, Padoy N (2019) RSDNet: learning to predict remaining surgery duration from laparoscopic videos without manual annotations. IEEE Trans Med Imaging 38:1069–1078
28. Meeuwsen FC, van Luyn F, Blikkendaal MD, Jansen FW, van den Dobbelsteen JJ (2019) Surgical phase modelling in minimal invasive surgery. Surg Endosc 33:1426–1432
29. Kitaguchi D, Takeshita N, Matsuzaki H, Takano H, Owada Y, Enomoto T, Oda T, Miura H, Yamanashi T, Watanabe M, Sato D, Sugomori Y, Hara S, Ito M (2020) Real-time automatic surgical phase recognition in laparoscopic sigmoidectomy using the convolutional neural network-based deep learning approach. Surg Endosc 34:4924–4931
30. Malpani A, Lea C, Chen CC, Hager GD (2016) System events: readily accessible features for surgical phase detection. Int J Comput Assist Radiol Surg 11:1201–1209
31. Guédon ACP, Meij SEP, Osman K, Kloosterman HA, van Stralen KJ, Grimbergen MCM, Eijsbouts QAJ, van den Dobbelsteen JJ, Twinanda AP (2021) Deep learning for surgical phase recognition using endoscopic videos. Surg Endosc 35:6150–6157
32. Blum T, Padoy N, Feußner H, Navab N (2008) Workflow mining for visualization and analysis of surgeries. Int J Comput Assist Radiol Surg 3:379–386
33. Quellec G, Lamard M, Cochener B, Cazuguel G (2014) Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans Med Imaging 33:2352–2360
34. Zhang B, Ghanem A, Simes A, Choi H, Yoo A (2021) Surgical workflow recognition with 3DCNN for sleeve gastrectomy. Int J Comput Assist Radiol Surg 16:2029–2036
35. Zhang Y, Bano S, Page AS, Deprest J, Stoyanov D, Vasconcelos F (2022) Large-scale surgical workflow segmentation for laparoscopic sacrocolpopexy. Int J Comput Assist Radiol Surg 17:467–477
36. Huaulmé A, Despinoy F, Perez SAH, Harada K, Mitsuishi M, Jannin P (2019) Automatic annotation of surgical activities using virtual reality environments. Int J Comput Assist Radiol Surg 14:1663–1671
37. Katić D, Schuck J, Wekerle AL, Kenngott H, Müller-Stich BP, Dillmann R, Speidel S (2016) Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy. Int J Comput Assist Radiol Surg 11:881–888
38. Mascagni P, Alapatt D, Urade T, Vardazaryan A, Mutter D, Marescaux J, Costamagna G, Dallemagne B, Padoy N (2021) A computer vision platform to automatically locate critical events in surgical videos: documenting safety in laparoscopic cholecystectomy. Ann Surg 274:e93–e95
39. Lalys F, Bouget D, Riffaud L, Jannin P (2013) Automatic knowledge-based recognition of low-level tasks in ophthalmological procedures. Int J Comput Assist Radiol Surg 8:39–49
40. Lalys F, Riffaud L, Bouget D, Jannin P (2012) A framework for the recognition of high-level surgical tasks from video images for cataract surgeries. IEEE Trans Biomed Eng 59:966–976
41. Mascagni P, Alapatt D, Laracca GG, Guerriero L, Spota A, Fiorillo C, Vardazaryan A, Quero G, Alfieri S, Baldari L, Cassinotti E, Boni L, Cuccurullo D, Costamagna G, Dallemagne B, Padoy N (2022) Multicentric validation of EndoDigest: a computer vision platform for video documentation of the critical view of safety in laparoscopic cholecystectomy. Surg Endosc. 10.1007/s00464-022-09112-1
42. Yu F, Silva Croso G, Kim TS, Song Z, Parker F, Hager GD, Reiter A, Vedula SS, Ali H, Sikder S (2019) Assessment of automated identification of phases in videos of cataract surgery using machine learning and deep learning techniques. JAMA Netw Open 2:e191860
43. Gibaud B, Forestier G, Feldmann C, Ferrigno G, Gonçalves P, Haidegger T, Julliard C, Katić D, Kenngott H, Maier-Hein L, März K, de Momi E, Nagy D, Nakawala H, Neumann J, Neumuth T, Rojas Balderrama J, Speidel S, Wagner M, Jannin P (2018) Toward a standard ontology of surgical process models. Int J Comput Assist Radiol Surg 13:1397–1408
44. Gholinejad M, Loeve AJ, Dankelman J (2019) Surgical process modelling strategies: which method to choose for determining workflow? Minim Invasive Ther Allied Technol 28:91–104
45. Garrow CR, Kowalewski KF, Li L, Wagner M, Schmidt MW, Engelhardt S, Hashimoto DA, Kenngott HG, Bodenstedt S, Speidel S, Müller-Stich BP, Nickel F (2021) Machine learning for surgical phase recognition: a systematic review. Ann Surg 273:684–693
46. Marcus HJ, Khan DZ, Borg A, Buchfelder M, Cetas JS, Collins JW, Dorward NL, Fleseriu M, Gurnell M, Javadpour M, Jones PS, Koh CH, Layard Horsfall H, Mamelak AN, Mortini P, Muirhead W, Oyesiku NM, Schwartz TH, Sinha S, Stoyanov D, Syro LV, Tsermoulas G, Williams A, Winder MJ, Zada G, Laws ER (2021) Pituitary society expert Delphi consensus: operative workflow in endoscopic transsphenoidal pituitary adenoma resection. Pituitary 24:839–853
47. Rosse C, Mejino JL Jr (2003) A reference ontology for biomedical informatics: the foundational model of anatomy. J Biomed Inform 36:478–500
48. Lomax J, McCray AT (2004) Mapping the gene ontology into the unified medical language system. Comp Funct Genomics 5:354–361
49. Grenon P, Smith B, Goldberg L (2004) Biodynamic ontology: applying BFO in the biomedical domain. Stud Health Technol Inform 102:20–38
50. Moglia A, Georgiou K, Morelli L, Toutouzas K, Satava RM, Cuschieri A (2022) Breaking down the silos of artificial intelligence in surgery: glossary of terms. Surg Endosc 36:7986–7997
51. Protégé. https://protege.stanford.edu/
52. Huaulmé A, Dardenne G, Labbe B, Gelin M, Chesneau C, Diverrez JM, Riffaud L, Jannin P (2022) Surgical declarative knowledge learning: concept and acceptability study. Comput Assist Surg (Abingdon) 27:74–83
53. Hung AJ, Ma R, Cen S, Nguyen JH, Lei X, Wagner C (2021) Surgeon automated performance metrics as predictors of early urinary continence recovery after robotic radical prostatectomy-a prospective bi-institutional study. Eur Urol Open Sci 27:65–72
54. Ma R, Lee RS, Nguyen JH, Cowan A, Haque TF, You J, Robert SI, Cen S, Jarc A, Gill IS, Hung AJ (2022) Tailored feedback based on clinically relevant performance metrics expedites the acquisition of robotic suturing skills-an unblinded pilot randomized controlled trial. J Urol. 10.1097/JU.0000000000002691
55. Pangal DJ, Kugener G, Cardinal T, Lechtholz-Zey E, Collet C, Lasky S, Sundaram S, Zhu Y, Roshannai A, Chan J, Sinha A, Hung AJ, Anandkumar A, Zada G, Donoho DA (2021) Use of surgical video-based automated performance metrics to predict blood loss and success of simulated vascular injury control in neurosurgery: a pilot study. J Neurosurg 137(3):840–849. 10.3171/2021.10.JNS211064
56. Guerin S, Huaulmé A, Lavoue V, Jannin P, Timoh KN (2022) Review of automated performance metrics to assess surgical technical skills in robot-assisted laparoscopy. Surg Endosc 36:853–870