Background:
The rate of bile duct injury in laparoscopic cholecystectomy (LC) remains high, owing to low achievement of the critical view of safety (CVS) and the absence of an effective quality control system. An intelligent system would enable automatic quality control of LC surgery and, eventually, mitigation of bile duct injury. This study aims to develop an intelligent surgical quality control system for LC and to use the system to evaluate LC videos and investigate factors associated with CVS achievement.
Materials and methods:
SurgSmart, an intelligent system capable of automatically recognizing surgical phases, disease severity, critical division action, and CVS, was developed using the training datasets. SurgSmart was then applied to a separate multicenter dataset to validate its applicability and to investigate factors associated with CVS achievement.
Results:
SurgSmart performed well across all models: the critical division action model achieved the highest overall accuracy (98.49%), followed by the disease severity model (95.45%) and the surgical phase model (88.61%). CVSI, CVSII, and CVSIII had accuracies of 80.64, 97.62, and 78.87%, respectively. CVS was achieved in 4.33% of cases in the system-applying dataset. In addition, the analysis indicated that surgeons at higher-level hospitals had a higher CVS achievement rate; however, there was still considerable variation in CVS achievement among surgeons in the same hospital.
Conclusions:
SurgSmart, the surgical quality control system, performed admirably in our study. In addition, the system’s initial application demonstrated its broad potential for use in surgical quality control.
Keywords: artificial intelligence, cholecystectomy, critical view of safety, quality control, surgical safety

Introduction
Highlights
An artificial intelligence (AI) system for quality control in laparoscopic cholecystectomy (LC) was established.
We initially applied our system to analyze multicenter LC videos.
Considerable variation in LC surgical quality was found among surgeons.
Our AI system has potential for use in surgical quality control.
Since Mouret’s first case in 1987, LC has become the gold standard for symptomatic gallbladder diseases and is widely used worldwide1. Currently, over 750 000 patients in the USA undergo LC annually2. Despite advances in minimally invasive techniques, the incidence of major bile duct injury (BDI) requiring reconstruction remains high, ranging from 0.15 to 0.36%3,4. As a result, several international surgical associations collaborated to develop practice guidelines to reduce the incidence of BDI4,5. These safe cholecystectomy guidelines recommend achieving a critical view of safety (CVS) as one of the most important approaches to avoiding BDI. It is worth noting that attempting CVS exposure is straightforward and can be accomplished in more than 90% of LC procedures6,7. According to some large-volume studies, the frequency of BDI could be reduced to 2/1 000 000 when CVS is performed routinely, demonstrating its effectiveness8,9. However, actual CVS achievement is low, and there is still wide variation in its implementation rate across institutions. The completion rate of CVS in a prospective study in Strasbourg, France, was only 15.9%10.
Furthermore, in a previous multi-institution survey in China, CVS achievement ranged from 1.92 to 11.98% across hospitals11. Low CVS achievement is therefore one of the key factors contributing to the elevated BDI rate in LC. The advancement of laparoscopic equipment has made it easier to review high-definition surgical videos12,13. Reviewing and analyzing critical information in LC videos could likewise help surgeons improve CVS achievement and thus decrease the BDI rate. However, manual analysis of surgical videos is a time-consuming and laborious task, which limits its widespread application. A growing number of studies have recently used AI to analyze and evaluate surgical videos, with the majority focusing on surgical phase recognition and achieving good results14. A few institutions have applied AI techniques to LC videos15,16, demonstrating the promise of AI techniques, particularly in surgery. A comprehensive intelligent surgical quality control system is urgently needed to improve CVS achievement and reduce BDI rates.
As a result, the goals of this study were as follows: (1) to develop a multidimensional, intelligent surgical quality control system, SurgSmart, that automatically recognizes surgical phases, disease severity, critical division action, and CVS status in LC; and (2) to use this system to analyze a multicenter, prospective dataset and investigate factors associated with CVS achievement.
Materials and methods
Datasets
From October 2019 to September 2021, 1107 videos of LC cases were routinely recorded and prospectively collected from 17 hospitals, most of which were located in China’s Southwest regions. The procedures were performed by 87 surgeons of various levels (resident, attending, vice-chief, and chief physicians). The video inclusion criteria were: (1) LC with the integral surgical phases and no conversion to open surgery; (2) no other concurrent surgical procedure; and (3) standard laparoscopic videos with a tubular zoomed view, at least 720×560 resolution, and a frame rate of 25 frames per second. The exclusion criteria were: (1) single-port laparoscopic videos and (2) other unusual procedure videos, including laparoscopic suturing, fluorescence techniques, and massive bleeding. Patients’ consent to video recording was obtained before surgery. All LC videos and surgeons’ information (including surgeons’ age, level, sex, working years, and surgical experience) were prospectively recorded and uploaded to our online website datasets (https://lc10000.withai.com/). The Committee of the Ethical Board reviewed and approved this study. To develop the surgical quality control system and explore characteristics of surgical quality in LC videos using this system, the dataset was further divided by the videos’ upload date into a system-developing dataset (676 videos, ∼60%, before 17 December 2021) and a system-applying dataset (431 videos, ∼40%, after 18 December 2021). Detailed dataset information is given in Table 1. Figure 1 shows the process of SurgSmart development and application.
Table 1.
Baseline data and test results of each functional segment of SurgSmart models
| Model name | Dataset | Labeling | Labeling frames | Total frames | Videos | Annotating κ | Precision (%) | Recall | mAP | Accuracy | F1-score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Surgical phase and steps | Training | IT | 27 131 | 215 170 | 178 | 0.90 | |||||
| EA | 10 877 | ||||||||||
| AL | 13 511 | ||||||||||
| MHT | 78 128 | ||||||||||
| DGB | 34 868 | ||||||||||
| EG | 20 312 | ||||||||||
| COR | 30 343 | ||||||||||
| Validating | IT | 2281 | 20 906 | 22 | |||||||
| EA | 825 | ||||||||||
| AL | 776 | ||||||||||
| MHT | 9066 | ||||||||||
| DGB | 3531 | ||||||||||
| EG | 1855 | ||||||||||
| COR | 2572 | ||||||||||
| Testing | IT | 4110 | 25 598 | 22 | 91.31 | 84.38 | NA | 88.61 | 78.61 | ||
| EA | 1841 | 78.53 | 82.08 | ||||||||
| AL | 1047 | 98.97 | 56.62 | ||||||||
| MHT | 9865 | 90.36 | 97.89 | ||||||||
| DGB | 3598 | 85.98 | 91.28 | ||||||||
| EG | 2705 | 92.00 | 83.25 | ||||||||
| COR | 2432 | 80.48 | 91.84 | ||||||||
| Parkland grading | Training | Mild (grade 1–2) | 3862 | 6525 | 172 | 0.97 | |||||
| Severe (grade 3–5) | 2663 | ||||||||||
| Validating | Mild (grade 1–2) | 468 | 850 | 22 | |||||||
| Severe (grade 3–5) | 382 | ||||||||||
| Testing | Mild (grade 1–2) | 468 | 858 | 22 | 100.00 | 92.31 | NA | 95.45 | 95.37 | ||
| Severe (grade 3–5) | 390 | 90.00 | 100.00 | ||||||||
| Critical division action | Training | Artery/cystic duct division | 4550 | 5120 | 142 | 0.92 | |||||
| Validating | 196 | 22 | |||||||||
| Testing | 374 | 23 | 75.08 | 89.31 | NA | 98.49 | 76.94 | ||||
| CVSI | Training | −1 | 5906 | 21 863 | 339 | 0.89 | |||||
| 0 | 3894 | ||||||||||
| 1 | 11 293 | ||||||||||
| 2 | 770 | ||||||||||
| Validating | −1 | 658 | 2126 | 76 | |||||||
| 0 | 466 | ||||||||||
| 1 | 924 | ||||||||||
| 2 | 78 | ||||||||||
| Testing | −1 | 19 | 5305 | 22 | 61.29 | 100.00 | 76.86 | 80.64 | 70.66 | ||
| 0 | 675 | 39.02 | 82.67 | ||||||||
| 1 | 4579 | 96.84 | 80.24 | ||||||||
| 2 | 32 | 54.00 | 84.38 | ||||||||
| CVSII | Training | −1 | 6144 | 22 298 | 342 | 0.88 | |||||
| 0 | 15 560 | ||||||||||
| 2 | 594 | ||||||||||
| Validating | −1 | 738 | 2273 | 78 | |||||||
| 0 | 1429 | ||||||||||
| 2 | 106 | ||||||||||
| Testing | −1 | 29 | 6060 | 22 | 24.14 | 96.55 | 71.04 | 97.62 | 59.42 | ||
| 0 | 5994 | 99.68 | 97.91 | ||||||||
| 2 | 37 | 33.93 | 51.35 | ||||||||
| CVSIII | Training | −1 | 5950 | 20 337 | 316 | 0.89 | |||||
| 0 | 8690 | ||||||||||
| 1 | 3985 | ||||||||||
| 2 | 1712 | ||||||||||
| Validating | −1 | 671 | 2006 | 74 | |||||||
| 0 | 830 | ||||||||||
| 1 | 323 | ||||||||||
| 2 | 182 | ||||||||||
| Testing | −1 | 19 | 4530 | 22 | 37.25 | 100.00 | 66.49 | 78.87 | 56.82 | ||
| 0 | 3541 | 92.48 | 84.69 | ||||||||
| 1 | 896 | 51.17 | 56.36 | ||||||||
| 2 | 74 | 20.08 | 67.57 |
The labeling of CVS=−1 means no area for CVS evaluation was detected.
AL, adhesion lysis; COR, clear the operative region; CVSI, hepatocystic triangle criterion; CVSII, cystic plate criterion; CVSIII, two structures criterion; CVS, the critical view of safety; DGB, dissecting gallbladder from liver bed; EA, establishing access; EG, extracting the gallbladder; IT, idle time; mAP, mean average precision; MHT, mobilizing the hepatocystic triangle; NA, not applicable.
Figure 1.

The flow diagram of SurgSmart development and application. CVS, critical view of safety; FPS, frames per second; LC, laparoscopic cholecystectomy; TRN, temporal residual network; TSM, temporal shift module; YOLO, you only look once.
Definition used for labeling
LC was classified into seven surgical phases based on a modified definition from our previous study17: establishing access, adhesion lysis, mobilizing the hepatocystic triangle (MHT), dissecting the gallbladder from the liver bed, extracting the gallbladder, clearing the operative region, and idle time, when the camera was outside the abdomen or no action was visible in the view. The Parkland Grading System (PGS), a 1–5 scale based on the surrounding tissues and appearance of the gallbladder, was used to identify disease severity18. Because surgical complexity increases significantly from Parkland grade 2 to 3, disease severity was classified into two categories: Parkland 1/2 was considered no/mild inflammation, whereas Parkland 3/4/5 was considered severe inflammation. Critical division action was defined as the division of the bile duct or artery, beginning with the clip applier’s entrance and ending with the duct/artery division. The CVS definition and detailed scoring are shown in Table A.1 (Supplemental Digital Content 1, http://links.lww.com/JS9/A288). Every CVS criterion was scored: CVS=−1 indicates that the current frame contains no assessable view; CVS=0 indicates that the criterion (CVSI/CVSII/CVSIII) is not achieved; CVS=1 indicates that the criterion (CVSI/CVSIII) is partially achieved; and CVS=2 indicates that the criterion (CVSI/CVSII/CVSIII) is completely achieved.
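As an illustration of the scoring scheme above, the total CVS score is the sum of the three criterion scores. A minimal sketch follows; the helper names are hypothetical, and the ≥5-point achievement threshold follows the Results section, where CVS was counted as achieved at a score of 5 or 6 of 6 points.

```python
def cvs_total(cvs1: int, cvs2: int, cvs3: int) -> int:
    """Sum the per-criterion CVS scores into a 0-6 total."""
    # A score of -1 means no assessable view was detected; it carries
    # no points toward the total, so clamp it to 0 here.
    return sum(max(score, 0) for score in (cvs1, cvs2, cvs3))

def cvs_achieved(total: int, threshold: int = 5) -> bool:
    """CVS counted as achieved at 5 or 6 of 6 points (see Results)."""
    return total >= threshold
```

For example, a case scored 2/2/2 totals 6 points and counts as achieved, while a case with one unassessable criterion and one partial criterion (−1/0/1) totals 1 point.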
Details of annotation
Two well-trained surgeons (each involved in >300 LC cases) from our hospital independently annotated frames extracted from the LC videos. Disagreements were mediated by a senior hepato–biliary–pancreatic surgeon (>2000 LC cases). The data was split randomly on a case-by-case basis so that frames from the same video never appeared in different subsets. To develop the system, the system-developing dataset was randomly divided into training, validating, and testing sets (ratio 8:1:1). Frames were extracted at a rate of one frame per second using FFmpeg (https://www.ffmpeg.org) and screened for further annotation using our custom annotation platform (https://www.LC10000.withai.com). Specifically, frames to assess disease severity were extracted 20 s before and 20 s after the transition time point of establishing access/MHT or adhesion lysis/MHT, and CVS frames were extracted 60 s before the annotated critical division action. The two annotators separately annotated the surgical phases (the starting and ending frames of each phase, with our annotation platform), disease severity (mild/severe inflammation, with our annotation platform), critical division action [the starting and ending frames, using our annotation platform; the clip applier, using Labelme (v4.2.10)], and CVS scoring [CVSI, CVSII, and CVSIII, using Colabeler (http://www.jinglingbiaozhu.com)]. Inter-rater agreement was determined using Cohen’s κ coefficient. Three-fold cross-validation was conducted to avoid sampling bias. The models were developed in Python 3.8 using Anaconda 2021.05 (Anaconda Inc.). Two NVIDIA graphics processing units (GPUs) were used for model training. Each model was pretrained on the ImageNet dataset. The following sections detail the model development process.
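The inter-rater agreement statistic used above, Cohen's κ, compares the observed agreement between two annotators with the agreement expected by chance from each annotator's label frequencies. A minimal sketch (the function name and toy labels are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa between two annotators' label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

With κ values of 0.88–0.97 (Table 1), the two annotators agreed far more often than chance would predict.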
Surgical phases model development
All frames extracted from videos that met the eligibility criteria were annotated. An annotation example can be seen in Figure A.1 in Appendix A (Supplemental Digital Content 1, http://links.lww.com/JS9/A288). The model was developed using the Temporal Shift Module (TSM)19, which incorporates temporal information efficiently and accurately (Fig. A.2a in Appendix A, Supplemental Digital Content 1, http://links.lww.com/JS9/A288).
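The core idea of TSM19 is to shift a fraction of feature channels along the time axis so that each frame's features mix information from neighboring frames at zero extra computation. A NumPy sketch of that shift operation; the (frames, channels) layout and the 1/8 shift fraction (the TSM paper's default) are assumptions, not details from this study:

```python
import numpy as np

def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Shift a fraction of channels along the time axis (TSM-style).

    x has shape (T, C): T frames, C feature channels. One 1/fold_div slice
    of channels is shifted toward the past, another toward the future, and
    the remaining channels are left untouched.
    """
    t, c = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # shift left: future -> present
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # shift right: past -> present
    out[:, 2 * fold:] = x[:, 2 * fold:]              # unshifted channels
    return out
```

In the full model this shift is inserted inside 2D-CNN residual blocks, letting a per-frame backbone reason over the surgical phase sequence.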
Disease severity model development
To apply the grading system efficiently in clinical practice and provide surgeons with information on surgical difficulty, we simplified the Parkland grading. Because surgical complexity increases significantly from Parkland grade 2 to 3, disease severity was classified into two categories20: Parkland 1/2 was considered no/mild inflammation, whereas Parkland 3/4/5 was considered severe inflammation. The Temporal Residual Network21 was used to develop the model (Fig. A.2b in Appendix A, Supplemental Digital Content 1, http://links.lww.com/JS9/A288).
Critical division action model development
Considering that the duct clipping procedure is irreversible and that improper clipping can result in significant BDI, this step was designated the critical division action in LC, defined as the division of the bile duct/artery beginning with the clip applier’s entrance and ending with the duct/artery division. Because the duration of the critical division action is short, the surgical phase model and training strategy were applied first, but the resulting accuracy was unsatisfactory (below 85%). The clip applier was therefore regarded as the landmark of the critical division action and was used to enhance the algorithm’s performance. The model was developed using YOLO v4, a fast and accurate object detection algorithm22. In addition, postprocessing techniques such as sliding windows were used to enhance the model’s performance (Fig. A.2c in Appendix A, Supplemental Digital Content 1, http://links.lww.com/JS9/A288).
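The sliding-window postprocessing mentioned above can be sketched as majority voting over per-frame clip-applier detections: isolated false detections are suppressed and short dropouts inside a true clipping interval are filled. This is an illustrative reconstruction (the window size and function name are assumptions, not the study's exact parameters):

```python
def smooth_detections(flags: list, window: int = 5) -> list:
    """Majority-vote smoothing of per-frame boolean detections.

    A frame is kept as positive only if more than half of the frames in
    its surrounding window are positive.
    """
    half = window // 2
    smoothed = []
    for i in range(len(flags)):
        neighbourhood = flags[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbourhood) > len(neighbourhood) / 2)
    return smoothed
```

A spurious single-frame detection is removed, while a one-frame gap inside a sustained detection run is bridged.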
Critical view of safety status model development
Strasberg et al.23 defined the CVS; Table A.1 (Supplemental Digital Content 1, http://links.lww.com/JS9/A288) contains the definition and the detailed classifications (CVSI–CVSIII) and scores (−1 to 2 points) used in our study. Annotation examples are shown in Figure A.3 in Appendix A (Supplemental Digital Content 1, http://links.lww.com/JS9/A288). Given the complexity of the task, and to optimize performance, the algorithm used to analyze the CVS has two stages (Fig. A.2d in Appendix A, Supplemental Digital Content 1, http://links.lww.com/JS9/A288). The first stage is an image feature extraction network, in practice YOLO v4, which takes an original image frame as input and outputs the region of interest for CVS evaluation. The second stage is a temporal network, in practice TSM, which combines the region-of-interest information from several consecutive frames to determine the score of each CVS criterion. To enable further high-precision research and analysis of surgical quality, expert annotators reviewed all CVS results and corrected any incorrect score outputs.
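The two-stage structure above can be sketched as a small inference loop. Here `detect_roi` stands in for the YOLO v4 detector (frame → ROI crop or `None`) and `score_roi_sequence` for the TSM scorer (consecutive ROIs → criterion score); both names, and the buffer length, are illustrative placeholders, not the study's API:

```python
from collections import deque

def evaluate_cvs_stream(frames, detect_roi, score_roi_sequence, window: int = 8):
    """Two-stage CVS evaluation over a frame stream (sketch).

    Stage 1 finds the region of interest per frame; stage 2 scores a
    buffer of consecutive ROIs. Frames with no assessable ROI score -1,
    matching the labeling scheme.
    """
    buffer = deque(maxlen=window)  # consecutive ROI crops for the temporal stage
    scores = []
    for frame in frames:
        roi = detect_roi(frame)
        if roi is None:
            buffer.clear()         # temporal context breaks when the view is lost
            scores.append(-1)      # no assessable view in this frame
            continue
        buffer.append(roi)
        scores.append(score_roi_sequence(list(buffer)))
    return scores
```

Clearing the buffer when the ROI disappears ensures the temporal stage never mixes crops from before and after the view was lost.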
Evaluation and application of quality control system
The performance of the models was evaluated by precision [TP/(TP + FP)], recall [TP/(TP + FN)], accuracy [(TP + TN)/total frames], F1-score [2 × precision × recall/(precision + recall)], and mean average precision (mAP), where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively. After training, validating, and testing all models on the system-developing dataset, SurgSmart analyzed the videos in the system-applying dataset. The surgical phases, disease severity, critical division action, and CVS status were all exported to the SurgSmart report and review system (Fig. 2).
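The four count-based metrics above can be computed directly from a per-class confusion matrix; a minimal sketch (the function name is illustrative):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Precision, recall, accuracy, and F1 from confusion counts."""
    precision = tp / (tp + fp)                      # TP / (TP + FP)
    recall = tp / (tp + fn)                         # TP / (TP + FN)
    accuracy = (tp + tn) / (tp + fp + tn + fn)      # (TP + TN) / total frames
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```

Note that with heavily imbalanced classes, accuracy can stay high while precision and recall for a rare class remain low, which is the pattern seen for the high-score CVS frames in Table 1.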
Figure 2.

The schematic of the SurgSmart workflow. The lanes show the estimated phases output by SurgSmart, and each color bar represents a corresponding surgical phase (including idle time, IT). The round icons represent the disease severity of no/mild (M) or severe (S) inflammation. The pentagram icon represents the time point of the critical division action (CDA). The red detecting box indicates that 20/60 frames were extracted for Parkland/critical view of safety (CVS) evaluation, respectively. The blue detecting box represents the minimal area for CVS evaluation, detected by YOLO v4. In addition, the saliency map indicates the most prominent regions in an image when SurgSmart evaluates CVS status. The upper lane (video ID: 0009151480) illustrates an easy LC procedure with a regular phase progression. Disease severity (mild inflammation) was recognized around the start time of mobilizing the hepatocystic triangle (MHT). The surgeon decided to divide the duct at about 10 min, and the 60 frames (1 frame per second) before that time were used for CVS evaluation. SurgSmart shows a total CVS score of five. The lower lane (video ID: 635090) illustrates a complex laparoscopic cholecystectomy procedure with frequent transitions between surgical phases. Disease severity (severe inflammation) was recognized around the start time of adhesion lysis. The surgeon decided to divide the duct at about 23 min. SurgSmart shows a total CVS score of zero. AL, adhesion lysis; COR, clear the operative region; DGB, dissecting gallbladder from liver bed; EA, establishing access; EG, extracting the gallbladder.
Statistical methods
SPSS Statistics v23.0 (IBM Corp.) was used to analyze the application dataset. Continuous variables were reported as the mean±SD if normally distributed; otherwise, data were shown as the median (25th–75th percentiles). Categorical values were presented as numbers and percentages. Variables deemed clinically significant were included in the ordinal logistic regression model for CVS achievement status. Results were considered statistically significant at a two-sided P-value less than 0.05. In addition, we present a bubble plot of significant factors associated with the various CVS achievement statuses.
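The ordinal (proportional-odds) logistic model reported in Table 3 relates covariates to cumulative probabilities of the ordered CVS score categories, P(score ≤ k) = logistic(cutpoint_k − η), and each reported odds ratio is exp(coefficient). A minimal sketch with illustrative cutpoints (the fitted cutpoints were not reported); only the odds ratio 3.717 is taken from Table 3:

```python
import math

def cumulative_probs(eta: float, cutpoints: list) -> list:
    """Category probabilities under a proportional-odds model.

    eta is the linear predictor (sum of coefficient * covariate).
    P(score <= k) = logistic(cutpoint_k - eta); category probabilities
    are differences of consecutive cumulative probabilities.
    """
    logistic = lambda z: 1.0 / (1.0 + math.exp(-z))
    cum = [logistic(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# An odds ratio of 3.717 (ministerial vs. county-level hospitals, Table 3)
# corresponds to a coefficient of ln(3.717) on the linear predictor.
coef = math.log(3.717)
```

A positive coefficient (odds ratio > 1) shifts probability mass toward the higher CVS score categories, which is how the hospital-level effect in Table 3 should be read.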
Results
Dataset and performance of all models
Table A.2 in the Supplement (Supplemental Digital Content 1, http://links.lww.com/JS9/A288) contains the baseline data for patients, surgeons, and hospitals in the model-developing dataset. As shown in Table 1, video datasets ranging from 187 to 443 cases were assigned to develop the models. Approximately 200 videos were assigned to develop the simpler models (surgical phases, disease severity, and critical division action). In contrast, over 400 videos were assigned to develop the CVS evaluation model because of its imbalanced dataset, which lacked high-grade samples. The κ value, which ranged from 0.88 to 0.97, was used to assess the inter-rater reliability of the annotators for all models. Cross-validation results are shown in Table A.3 (Supplemental Digital Content 1, http://links.lww.com/JS9/A288). In terms of performance metrics, the critical division action model had the highest overall accuracy (98.49%), followed by the disease severity model (95.45%) and the surgical phase model (88.61%). For CVSI, CVSII, and CVSIII, the accuracy/mAP were 80.64%/76.86%, 97.62%/71.04%, and 78.87%/66.49%, respectively. These findings indicate that the CVS model’s precision and recall varied significantly, with lower precision and recall in frames with high scores.
Models application results
Table A.2 in the Supplement (Supplemental Digital Content 1, http://links.lww.com/JS9/A288) displays the baseline data of patients, surgeons, and hospitals in the model application dataset. Table 2 displays the outcomes of the model application. The median surgical time was 35.0 min, with MHT taking the most time (11.8 min), followed by idle time (6.5 min). There were 165 severe inflammation cases in the application dataset, accounting for 38.11%. In addition, a hepatocystic triangle-first technique, in which CVS is detectable, was used in 91.68% of cases.
Table 2.
Surgical factors analyzed by SurgSmart in application dataset
| Surgical factors | Application dataset [n (%)] | |
|---|---|---|
| Phase time [median (range)] (min) | ||
| IT | 6.5 (4.1–9.5) | |
| EA | 1.5 (1.0–2.1) | |
| AL | 0 (0–1.6) | |
| MHT | 11.8 (8.4–17.3) | |
| DGB | 4.0 (2.5–6.1) | |
| EG | 2.5 (1.4–4.6) | |
| COR | 4.1 (2.5–6.8) | |
| Overall | 35.0 (25.0–47.6) | |
| Time from MHT to clippinga | 6.0 (4.0–9.6) | |
| Operation methods | ||
| Hepatocystic triangle first | 397 (91.68) | |
| Convert to fundus-first after the dissection of hepatocystic triangle | 18 (4.16) | |
| Undergo fundus-first directly | 18 (4.16) | |
| Parkland grading | ||
| <3 | 268 (61.89) | |
| ≥3 | 165 (38.11) | |
| Dissection of hepatocystic triangle (CVSI)b | ||
| 0/2 | 1 (0.24) | |
| 1/2 | 380 (91.57) | |
| 2/2 | 34 (8.19) | |
| Lower one third of cystic plate (CVSII)b | ||
| 0/2 | 390 (93.98) | |
| 2/2 | 25 (6.02) | |
| Exposure of two ductal structure (CVSIII)b | ||
| 0/2 | 85 (20.48) | |
| 1/2 | 231 (55.66) | |
| 2/2 | 99 (23.86) | |
| Sum score of CVSb | ||
| 0/6 | 1 (0.24) | |
| 1/6 | 83 (20.00) | |
| 2/6 | 222 (53.01) | |
| 3/6 | 67 (16.14) | |
| 4/6 | 24 (5.78) | |
| 5/6 | 5 (1.20) | |
| 6/6 | 13 (3.13) | |
AL, adhesion lysis; COR, clear the operative region; CVS, critical view of safety; DGB, dissecting gallbladder from liver bed; EA, establishing access; EG, extracting the gallbladder; IT, idle time; MHT, mobilizing the hepatocystic triangle.
aThe duration from the starting point of MHT to irreversible duct clipping (critical division action).
bVideos containing the method of hepatocystic triangle first or converted to fundus-first after the dissection of the hepatocystic triangle (N=415) were CVS scorable.
The CVS evaluation results showed that clearing the hepatocystic triangle (CVSI) and dissecting the lower one-third of the cystic plate (CVSII) had completion rates of only 8.19 and 6.02%, respectively. Furthermore, CVS was achieved in only 4.33% of cases (score 5 or 6). There was also significant variation in the rate of high-quality CVS achievement across hospital levels, including county-level hospitals, provincial and prefectural hospitals, and ministerial hospitals (Table 2). As illustrated in Figure 3, the CVS achievement rate varied significantly among surgeons. The ordinal logistic regression analysis revealed that a higher hospital level is a significant predictor of CVS achievement: compared with county-level hospitals, ministerial hospitals had an odds ratio of 3.717 and provincial and prefectural hospitals an odds ratio of 2.339 (Table 3).
Figure 3.

The critical view of safety achievement rate of surgeons. The diameter of each dot indicates the number of surgeries in the dataset, and the color represents the hospital where the surgeon works. Abbreviations: deleted for blinding.
Table 3.
Ordinal logistic regression of factors relevant to the scoring of critical view of safety
| Characteristics | Odds ratio | 95% CI for odds ratio | SE | P |
|---|---|---|---|---|
| Work age | 1.003 | 0.961–1.046 | 0.021 | 0.903 |
| Sex of patients | ||||
| Female | 0.987 | 0.668–1.457 | 0.199 | 0.947 |
| Male | Reference | |||
| Diagnosis | ||||
| Gallstone/gallbladder polyps | 0.917 | 0.579–1.452 | 0.234 | 0.712 |
| Chronic cholecystitis | 1.266 | 0.688–2.330 | 0.311 | 0.448 |
| Acute cholecystitis | Reference | |||
| Emergency or not | ||||
| Nonemergency | 0.713 | 0.105–4.836 | 0.977 | 0.729 |
| Emergency | Reference | |||
| Gender of surgeons | ||||
| Male | 0.729 | 0.109–4.881 | 0.970 | 0.744 |
| Female | Reference | |||
| Title of surgeons | ||||
| Resident | 2.399 | 0.496–11.614 | 0.805 | 0.277 |
| Attending | 1.077 | 0.287–4.036 | 0.674 | 0.912 |
| Vice-chief | 2.148 | 0.657–7.020 | 0.604 | 0.206 |
| Chief | Reference | |||
| Hospital level | ||||
| Ministerial level | 3.717 | 1.847–7.477 | 0.357 | <0.001 |
| Prefectural and provincial | 2.339 | 1.198–4.566 | 0.341 | 0.013 |
| County level | Reference | |||
CI, confidence interval; SE, standard error.
Significant P-values are in bold.
Discussion
Numerous studies have found that the quality of surgery is a significant predictor of both perioperative and long-term outcomes24,25. Although surgical videos contain a wealth of critical information regarding safety, efficacy, disease difficulty, and so forth, it is exceedingly difficult to evaluate surgical quality with high accuracy, objectivity, and efficiency using traditional techniques26,27. In contrast, with the rapid advancement of AI technology, intelligent surgical quality control is now possible28,29. Thus, to improve the quality of LC and decrease the rate of BDI, an intelligent quality control system, SurgSmart, was developed in the current study to automatically recognize surgical phases, disease severity, critical division action, and CVS status. Following a series of training sessions, SurgSmart demonstrated satisfactory performance in all models, particularly in identifying disease severity and critical division action, with accuracies exceeding 95%. Then, 431 videos from the system-applying dataset were analyzed using SurgSmart. The results indicated that CVS (5/6 points) was achieved in only 4.33% of cases, along with basic surgical quality data. In less than 10% of cases was the hepatocystic triangle cleared (CVSI) or the lower part of the cystic plate dissected (CVSII). Further analysis revealed that while higher hospital levels were associated with improved CVS achievement, there was still considerable variation in CVS achievement among surgeons working in the same hospital. All of these results demonstrate SurgSmart’s satisfactory performance in automatically recognizing critical information and its significant application potential in surgical quality control.
Several institutes have begun investigating the application of AI in surgery in recent years. One of the well-developed applications is the surgical phase model: LC, sleeve gastrectomy, colorectal surgery, and laparoscopic hysterectomy have been analyzed successfully by current models14,30–32. Our algorithm achieved an overall accuracy of 88.61%, which is comparable to the general results of published studies analyzing LC (90%)14. Notably, MHT, a particularly important phase in LC, achieved an accuracy rate of over 90% (Table 1). The surgical phases serve as a general indicator of surgical workflow efficiency and rationality, and the electronic report generated by the SurgSmart system enables easy identification of surgeries with varying degrees of efficiency and rationality (Fig. 2). Meanwhile, the system can determine which procedures employ or convert to the fundus-first technique based on the surgical phase results. In addition, building on previous work17, ‘idle time’ was added to the surgical phases for the first time. Intriguingly, in the system-applying dataset, idle time, defined as the camera being outside the abdomen or no visible action, was the second most time-consuming phase after MHT during LC. Our system can rapidly locate and review idle time phases in each surgery, which could help improve surgical efficiency further.
The severity of the disease plays a significant role in determining the difficulty of an operation33. Appropriate disease severity assessment may help carry out surgical quality control and decision-making objectively. Currently, the PGS and the Tokyo Guideline 18 (TG18) are the two most widely used systems for assessing disease severity in LC18,34. The PGS, which depends on intraoperative findings, lends itself to computer-vision classification of disease severity and correlates positively with TG18 for acute cholecystitis. Korndorffer et al.35 used a Temporal-ConvNet and a Temporal-Sequence model to rapidly review approximately 1000 LC videos with AI assistance and found that a well-trained AI model performed well at classifying disease severity. They also demonstrated that intraoperative complications (drain insertion, gallbladder spillage, and so on) occurred more frequently in high-severity cases than in low-severity cases. Our study found that 20 frames extracted around the surgical phase transition could accurately determine disease severity, allowing better use of our system’s computational power. Also, to better suit clinical application, we classified the PGS into two categories to assess surgical complexity. An AI model developed by Ward and colleagues achieved a Krippendorff’s α coefficient of 0.71 and showed relatively higher accuracy for low grades (1–2) and high grades (4–5)20. Considering the significant increase in gallbladder inflammation and surgical complexity from Parkland grade 2 to 3, we regarded Parkland 1/2 as no/mild inflammation and Parkland 3/4/5 as severe inflammation. SurgSmart’s accuracy exceeded 95% after a series of training on our multicenter dataset, highlighting the clinical applicability of its disease severity classification.
BDI always occurs when the surgeon divides the duct, and once a division action is initiated, it cannot be reversed. As a result, duct division should be regarded as a surgical safety checkpoint, and it provides critical information for CVS evaluation in all LC procedures33. Pietro et al.36 created EndoDigest, a computer vision platform that can automatically locate critical duct division events in surgical videos with a mean absolute error of 63 s and provide video clips containing the CVS status in 91% of cases. We first applied the same training strategy as the surgical phase model, but the accuracy was low because the duration of the critical division action is very short. Considering the inevitable use of the clip applier in the clipping action and its near absence during the other parts of a procedure, we regarded the detection of the clip applier as the beginning of the critical division action and annotated it to enhance the system’s performance. Our study developed a model that can automatically locate the critical division action with 98.49% accuracy. Meanwhile, the applying dataset demonstrated that most critical division actions occurred in the middle of the MHT phase, reflecting the insufficient achievement of CVS from another perspective.
Almost all safety guidelines strongly recommend CVS as an important intraoperative anatomy exposure technique to reduce the BDI rate of LC5,34,37. However, the actual CVS achievement rate worldwide is far lower than anticipated1. Pietro et al.10 demonstrated in a 2-year study that using a 5-s time-out rule before clipping improves the CVS rate. In our study, which included 431 cases from a multicenter dataset in China’s southwest, the CVS achievement rate was as low as 4.33%. Thus, improving CVS achievement is the premise for ensuring LC safety. Nonetheless, cognitive bias and executive ability continue to limit manual monitoring and feedback of CVS over time. A few institutions have begun to investigate the possibility of automatic CVS recognition. Korndorffer et al.35 used a two-model AI system comprising a Temporal-ConvNet model and a Long Short-Term Memory network to achieve ∼80% accuracy on each CVS criterion. Pietro et al.16, on the other hand, used a two-stage model called ‘DeepCVS’ to segment the hepatocystic anatomy and predict CVS achievement, which achieved roughly 50/74/92% average accuracy on the CVSI/CVSII/CVSIII criteria. To optimize the performance of our model, YOLO v4 was first used to output the region of interest for CVS evaluation, including the hepatocystic triangle and the lower part of the cystic plate. Considering its faster inference, fewer parameters, and lower overfitting, TSM was then used to calculate the CVS score (Table A.3 in Appendix, Supplemental Digital Content 1, http://links.lww.com/JS9/A288). In our study, the accuracy of each CVS criterion ranged from 78 to 97%, and model performance decreased as the CVS score increased (Table 1). The mAP was 77, 71, and 66% for CVSI, CVSII, and CVSIII, respectively, which was comparable to DeepCVS on the hepatocystic triangle and cystic plate criteria but inferior on the two-structures criterion.
We believe that the lack of high-grade CVS data caused these unsatisfactory results, given the low accuracy on two-point CVSIII (20%) and CVSII (51%) (Table 1). We then used the system to evaluate CVS in the system application dataset. Ordinal logistic regression showed that the CVS score is related to hospital level, with higher-level hospitals having higher CVS scores. Surprisingly, even within the same hospital there was large variation in the CVS achievement rate among surgeons (Fig. 3), and this variation had no significant correlation with surgeon experience. These findings demonstrate that low CVS achievement rates among surgeons are not only a technical issue but also a problem of cognition and execution. Routine surgical quality control should therefore be carried out.
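Ordinal (proportional-odds) logistic regression, as used here to relate CVS score to hospital level, models the cumulative probability of each score category. The sketch below shows only the probability structure of such a model; the cutpoints and coefficient are made-up values for illustration, not the study's estimates.

```python
import math

def ordinal_probs(x: float, beta: float, cutpoints: list) -> list:
    """Category probabilities under a proportional-odds model:
    P(Y <= k) = sigmoid(theta_k - beta * x), with increasing cutpoints
    theta_k. A positive beta shifts probability mass toward higher
    categories as x (e.g. hospital level) increases."""
    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + math.exp(-z))
    # Cumulative probabilities for each cutpoint, plus 1.0 for the top category.
    cum = [sigmoid(t - beta * x) for t in cutpoints] + [1.0]
    # Differences of adjacent cumulative probabilities give category probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical example: 4 CVS score categories, hospital level coded 1-3.
low = ordinal_probs(1.0, 0.8, [-1.0, 0.5, 2.0])
high = ordinal_probs(3.0, 0.8, [-1.0, 0.5, 2.0])
```

In a fitted model of this form, a positive coefficient on hospital level corresponds exactly to the reported finding: higher-level hospitals place more probability on higher CVS scores.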
Overall, we have developed an intelligent surgical quality control system that can automatically evaluate critical information across multiple dimensions. Through rapid review and analysis of the electronic reports generated by SurgSmart, hospital administrators can manage surgical quality and safety. Our quality control system, however, still has some limitations. First, the CVS status model’s accuracy was significantly lower than that of the other models and needs further improvement; efforts to improve its performance are ongoing through algorithm optimization and the collection of additional qualified data. Second, although the dataset for model training was collected from various tertiary hospitals, mostly in China’s southwest regions, external adaptability still needs to be validated nationally. Finally, it is unknown whether this intelligent system can improve surgical safety and quality in general, making a prospective multicenter randomized controlled trial urgently needed.
Conclusions
SurgSmart, a surgical quality control system that integrates automated evaluation and analysis of surgical phases, disease severity, critical division action, and CVS status, achieved good overall accuracy in our study. In addition, the initial application of this system revealed a significant potential surgical hazard associated with extremely low CVS achievement across hospitals and surgeons, demonstrating the critical need for, and the promising future of, this comprehensive and intelligent quality control system.
Ethical approval
The study was approved by the Ethics Committee on Biomedical Research, West China Hospital of Sichuan University (registration no. 2020-503).
Sources of funding
This work was supported by the High-quality Development Fund of Guang’an People’s Hospital (21FZ001), the Key Project of the Science & Technology Department of Sichuan Province (22ZDYF1920), and the Union Project of the Science & Technology Department of Chongqing (cstc2021jscx-msxmX0011).
Author contributions
W.X., P.B., and W.S. conceived and designed the study. C.Y., W.A., and L.Q. performed the research. A.L., W.Y., and Y.Z. developed and validated the deep-learning system under the supervision of L.J. and J.J. L.R. and A.J. did the statistical analysis. W.S., C.Z., and L.R. wrote the first draft of the article, which was critically revised and approved by all authors.
Conflicts of interest disclosure
The authors declare that they have no financial conflict of interest with regard to the content of this report.
Research registration unique identifying number (UIN)
Not applicable, because the research involved only surgical videos and did not involve human subjects.
Guarantor
Xin Wang.
Supplementary Material
Acknowledgements
None.
Footnotes
S.W., Z.C., and R.L. are co-first authors.
B.P. and X.W. are co-corresponding authors.
Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.
Supplemental Digital Content is available for this article. Direct URL citations are provided in the HTML and PDF versions of this article on the journal's website, www.journal-surgery.net.
Published online 12 April 2023
Contributor Information
Shangdi Wu, Email: husadi893508964@163.com.
Zixin Chen, Email: doctorchenzx@outlook.com.
Runwen Liu, Email: liurunwen2021@outlook.com.
Ang Li, Email: angli@scu.edu.cn.
Yu Cao, Email: 1101298559@qq.com.
Ailin Wei, Email: ailinwei_yzmy@163.com.
Qingyu Liu, Email: liuqingyu525@163.com.
Jie Liu, Email: liujie@withai.com.
Yuxian Wang, Email: wangyx@withai.com.
Jingwen Jiang, Email: jiangjingwen@wchscu.cn.
Zhiye Ying, Email: yingzhiye@wchscu.cn.
Jingjing An, Email: sss-ajj@163.com.
Bing Peng, Email: pengbing84@hotmail.com.
Xin Wang, Email: hxwangxin2012@hotmail.com.
References
- 1. Pucher P, Brunt L, Davies N, et al. Outcome trends and safety measures after 30 years of laparoscopic cholecystectomy: a systematic review and pooled data analysis. Surg Endosc 2018;32:2175–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Vollmer CM, Jr, Callery MP. Biliary injury following laparoscopic cholecystectomy: why still a problem? Gastroenterology 2007;133:1039–1041. [DOI] [PubMed] [Google Scholar]
- 3. Törnqvist B, Strömberg C, Persson G, et al. Effect of intended intraoperative cholangiography and early detection of bile duct injury on survival after cholecystectomy: population based cohort study. BMJ 2012;345:e6457. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Brunt LM, Deziel DJ, Telem DA, et al. Safe Cholecystectomy Multi-society Practice Guideline and State of the Art Consensus Conference on Prevention of Bile Duct Injury During Cholecystectomy. Ann Surg 2020;272:3–23. [DOI] [PubMed] [Google Scholar]
- 5. Pucher P, Brunt L, Fanelli R, et al. SAGES expert Delphi consensus: critical factors for safe surgical practice in laparoscopic cholecystectomy. Surg Endosc 2015;29:3074–3085. [DOI] [PubMed] [Google Scholar]
- 6. Avgerinos C, Kelgiorgi D, Touloumis Z, et al. One thousand laparoscopic cholecystectomies in a single surgical unit using the “critical view of safety” technique. J Gastrointest Surg 2009;13:498–503. [DOI] [PubMed] [Google Scholar]
- 7. Sanjay P, Fulke J, Exon D. ‘Critical view of safety’ as an alternative to routine intraoperative cholangiography during laparoscopic cholecystectomy for acute biliary pathology. J Gastrointest Surg 2010;14:1280–1284. [DOI] [PubMed] [Google Scholar]
- 8. Yegiyants S, Collins JC. Operative strategy can reduce the incidence of major bile duct injury in laparoscopic cholecystectomy. Am Surg 2008;74:985–987. [PubMed] [Google Scholar]
- 9. Tsalis K, Antoniou N, Koukouritaki Z, et al. Open-access technique and “critical view of safety” as the safest way to perform laparoscopic cholecystectomy. Surg Laparosc Endosc Percutaneous Tech 2015;25:119–124. [DOI] [PubMed] [Google Scholar]
- 10. Mascagni P, Rodriguez-Luna MR, Urade T, et al. Intraoperative time-out to promote the implementation of the critical view of safety in laparoscopic cholecystectomy: a video-based assessment of 343 procedures. J Am Coll Surg 2021;233:497–505. [DOI] [PubMed] [Google Scholar]
- 11. Liu R, Liu Z, Wang X. Intraoperative time-out to promote the implementation of the critical view of safety in laparoscopic cholecystectomy: on the comprehension of criteria, disease severity, and hawthorne effects. J Am Coll Surg 2022;234:1261–1262. [DOI] [PubMed] [Google Scholar]
- 12. Hashimoto DA, Rosman G, Rus D, et al. Artificial intelligence in surgery: promises and perils. Ann Surg 2018;268:70–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Bonrath EM, Gordon LE, Grantcharov TP. Characterising ‘near miss’ events in complex laparoscopic surgery through video analysis. BMJ Qual Saf 2015;24:516–521. [DOI] [PubMed] [Google Scholar]
- 14. Garrow C, Kowalewski K, Li L, et al. Machine learning for surgical phase recognition: a systematic review. Ann Surg 2021;273:684–93. [DOI] [PubMed] [Google Scholar]
- 15. Madani A, Namazi B, Altieri MS, et al. Artificial intelligence for intraoperative guidance: using semantic segmentation to identify surgical anatomy during laparoscopic cholecystectomy. Ann Surg 2022;276:363–369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Mascagni P, Vardazaryan A, Alapatt D, et al. Artificial intelligence for surgical safety: automatic assessment of the critical view of safety in laparoscopic cholecystectomy using deep learning. Ann Surg 2022;275:955–61. [DOI] [PubMed] [Google Scholar]
- 17. Cheng K, You J, Wu S, et al. Artificial intelligence-based automated laparoscopic cholecystectomy surgical phase recognition and analysis. Surg Endosc 2022;36:3160–3168. [DOI] [PubMed] [Google Scholar]
- 18. Madni TD, Leshikar DE, Minshall CT, et al. The Parkland grading scale for cholecystitis. Am J Surg 2018;215:625–30. [DOI] [PubMed] [Google Scholar]
- 19. Lin J, Gan C, Han S. TSM: temporal shift module for efficient video understanding. Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019.
- 20. Ward TM, Hashimoto DA, Ban Y, et al. Artificial intelligence prediction of cholecystectomy operative course from automated identification of gallbladder inflammation. Surg Endosc 2022;36:6832–6840. [DOI] [PubMed] [Google Scholar]
- 21. Zhou B, Andonian A, Oliva A, et al. Temporal relational reasoning in videos. Proceedings of the European conference on computer vision (ECCV); 2018.
- 22. Bochkovskiy A, Wang C-Y, Liao H-YM. YOLOv4: optimal speed and accuracy of object detection. arXiv preprint; 2020.
- 23. Strasberg SM, Hertl M, Soper NJ. An analysis of the problem of biliary injury during laparoscopic cholecystectomy. J Am Coll Surg 1995;180:101–125. [PubMed] [Google Scholar]
- 24. Birkmeyer JD, Finks JF, O’Reilly A, et al. Surgical skill and complication rates after bariatric surgery. N Engl J Med 2013;369:1434–1442. [DOI] [PubMed] [Google Scholar]
- 25. Hannan EL, Kilburn H, O’Donnell JF, et al. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA 1990;264:2768–2774. [PubMed] [Google Scholar]
- 26. Curtis NJ, Foster JD, Miskovic D, et al. Association of surgical skill assessment with clinical outcomes in cancer surgery. JAMA Surg 2020;155:590–598. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Pugh CM, Hashimoto DA, Korndorffer JR. The what? How? And Who? Of video based assessment. Am J Surg 2021;221:13–18. [DOI] [PubMed] [Google Scholar]
- 28. Loftus TJ, Tighe PJ, Filiberto AC, et al. Artificial intelligence and surgical decision-making. JAMA Surg 2020;155:148–158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Gordon L, Grantcharov T, Rudzicz F. Explainable artificial intelligence for safe intraoperative decision support. JAMA Surg 2019;154:1064–1065. [DOI] [PubMed] [Google Scholar]
- 30. Kitaguchi D, Takeshita N, Matsuzaki H, et al. Real-time automatic surgical phase recognition in laparoscopic sigmoidectomy using the convolutional neural network-based deep learning approach. Surg Endosc 2020;34:4924–4931. [DOI] [PubMed] [Google Scholar]
- 31. Guédon ACP, Meij SEP, Osman KNMMH, et al. Deep learning for surgical phase recognition using endoscopic videos. Surg Endosc 2021;35:6150–6157. [DOI] [PubMed] [Google Scholar]
- 32. Hashimoto DA, Rosman G, Witkowski ER, et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg 2019;270:414–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Hussain A. Difficult laparoscopic cholecystectomy: current evidence and strategies of management. Surg Laparosc Endosc Percutaneous Tech 2011;21:211–217. [DOI] [PubMed] [Google Scholar]
- 34. Wakabayashi G, Iwashita Y, Hibi T, et al. Tokyo Guidelines 2018: surgical management of acute cholecystitis: safe steps in laparoscopic cholecystectomy for acute cholecystitis (with videos). J Hepatobiliary Pancreat Sci 2018;25:73–86. [DOI] [PubMed] [Google Scholar]
- 35. Korndorffer J, Hawn M, Spain D, et al. Situating artificial intelligence in surgery: a focus on disease severity. Ann Surg 2020;272:523–528. [DOI] [PubMed] [Google Scholar]
- 36. Mascagni P, Alapatt D, Urade T, et al. A computer vision platform to automatically locate critical events in surgical videos: documenting safety in laparoscopic cholecystectomy. Ann Surg 2021;274:e93–e95. [DOI] [PubMed] [Google Scholar]
- 37. Conrad C, Wakabayashi G, Asbun HJ, et al. IRCAD recommendation on safe laparoscopic cholecystectomy. J Hepatobiliary Pancreat Sci 2017;24:603–615. [DOI] [PubMed] [Google Scholar]
