Table 1.
Author, year | Country | Study design | Type of procedure | Main outcomes | Objective | Final Cohort | Type of AI | Prediction Performance | External validation |
---|---|---|---|---|---|---|---|---|---|
Bartek MA. J Am Coll Surg. 2019 Oct. | USA | Monocentric, retrospective, observational study | All surgeries | Surgical time prediction | Development of statistical models to improve estimation of case-time duration. | 14 345 cases | Random forest and XGBoost | The ability to predict cases within 10% improved from 32% using the institutional standard to 39% with the ML surgeon-specific model. Models with accuracy greater than or equal to that of schedulers (i.e. >75%) constituted 45% of all models. These models were notably superior to the surgeon schedulers, with within-10% prediction as high as 50% compared to 32%. | No |
Martinez O. Comput Methods Programs Biomed. 2021 Sep. | Colombia | Monocentric, observational study | Single-procedure surgeries | Surgical time prediction | Optimization of OR efficiency by improving the surgery scheduling task, which requires the estimation of surgical time duration. | 81 248 cases | Linear Regression, Regression Trees, Support Vector Regression and Bagging Regression Trees | The best overall performance was obtained using Bagged Trees (26 min RMSE, 3.16 min training time, 0.49 min testing time) when using a subset of the database with the nine specialties containing 80% of the surgeries. Bagged Trees also outperformed the experience-based method with a lower RMSE. | No |
Jiao Y. Br J Anaesth. 2022 May. | USA | Multicentric, retrospective, observational study | All surgeries | Methods to predict procedure duration | Development of a machine learning approach that continuously incorporates preoperative and intraoperative information for forecasting surgical duration. | 70 826 cases | Modular artificial neural network | The modular artificial neural network had the lowest time error (CRPS mean = 13.8; standard deviation = 35.4 min), which was significantly better (mean difference = 6.4 min [95% confidence interval: 6.3–6.5]; P < 0.001) than the Bayesian approach. The modular artificial neural network also had the highest accuracy in identifying operating theatres that would overrun 15:00 (accuracy at 1 h prior = 89%) compared with the Bayesian approach (80%) and a naive approach using the scheduled duration (78%). | Yes |
Hassanzadeh H. BMC Med Inform Decis Mak. 2022 Jun. | Australia | Monocentric, observational study | Elective and emergency surgeries | Prediction of daily surgery demand, overall and by medical specialty | Utilization of operating theatre data to provide decision support for improved theatre management. | 99 732 surgeries on 63 697 unique patients | Rolling window, Regression (Linear), Regression (Poisson), Regression (Negative binomial), Decision tree, Random forest, SVM (Linear, RBF, Sigmoid, Poly), Bagging regressor, Gradient boosting regressor, XGBoost regressor, Ensemble regressor | Predicting operating theatre demand is a viable component in theatre management, enabling hospitals to provide services as efficiently and effectively as possible to obtain the best health outcomes. Operating theatre demand could be predicted with 90% accuracy. | No |
Abbou B. Big Data Cogn Comput. 2022 Jul. | Israel | Two-centre, observational, retrospective study | All surgeries | Expected length of stay | Improvement of the productivity and utility of operating rooms. | 102 301 cases in hospital 1 and 149 308 cases in hospital 2 | Naïve model based on the median length of similar surgeries and XGBoost model | Using different measures of performance evaluation, the XGBoost models performed better than the naïve models: the MAE was 21.5 compared to 25.4 in hospital 1 and 25.3 compared to 28.7 in hospital 2; RMSE, 36.6 vs. 49.0 (hospital 1), 40.3 vs. 55.0 (hospital 2); PVE, 66.7 vs. 44.0 (hospital 1), 70. vs. 46.8 (hospital 2); and ML2R, 0.46 vs. 0.53 (hospital 1) and 0.46 vs. 0.49 (hospital 2). In the case of MAPE, differences between the naïve and the ML-based model were minor: 35.15 vs. 35.37 in hospital 1 and 35.09 vs. 32.48 in hospital 2 according to hospital performance evaluations. | No |
Lam SSW. Healthcare (Basel). 2022 Jun. | Singapore and North Carolina | Two-centre, observational, retrospective study | Colorectal surgeries | Estimation of surgery durations | Determination of the performance of current surgery case duration estimations and the use of machine learning models to predict surgery duration across two large tertiary healthcare institutions. | 7 585 cases (Center-1) and 3 597 cases (Center-2) | CatBoost | The simple MA-based predictions outperform the scheduled duration provided by the OR schedulers across RMSE, MAE, MAPE and proportion of cases within 80–120% of the actual duration. In Center-1, Model 5 shows the best performance with RMSE 45.18, MAE 23.986 and MAPE 34.40%. In Center-2, Model 5 has the best performance, with 56.11% of its predictions falling within ±20% of the actual duration. Model 5 prediction accuracy (within ±20%) is 7.78% higher than that of the MA. Model 5 also has the lowest RMSE, MAE and MAPE at 38.48%, 23.61% and 23.36%, respectively. | No |
Gabriel RA. Anesth Analg. 2022 Jul. * | California, USA | Observational, retrospective, single-centre study | Orthopedic and ear, nose, and throat surgeries | Surgery end time and discharge from recovery room | Development of machine learning models that predicted the following composite outcome: surgery finished by end of operating room block time and patient was discharged by end of recovery room nursing shift. | 13 447 surgical procedures | Logistic regression, random forest classifier, support vector classifier, simple feedforward neural network, balanced random forest classifier, and balanced bagging classifier. SMOTE | A model was created for each start time (1 pm, 2 pm, 3 pm, or 4 pm), and the ensemble learning approaches had the highest AUC scores. The balanced bagging classifier performed best, with F1 scores of 0.78, 0.80, 0.82, and 0.82 when predicting the outcome if cases were to start at 1 pm, 2 pm, 3 pm, or 4 pm, respectively. | No |
Huang L. J Healthc Eng. 2022 Apr. | China | Monocentric, observational study | All surgeries | Surgical time and anesthesia emergence duration prediction | Creation of a surgery and anesthesia emergence duration-prediction system. | 15 754 samples | Perceptron | Combining the surgery duration prediction system with the anesthesia emergence duration prediction system yielded a prediction accuracy > 0.95. | No |
Chu J. Healthcare (Basel). 2022 Aug. | Taiwan | Retrospective, monocentric, observational study | All surgeries | Surgical time prediction | Construction of prediction models to accurately predict OR usage time and comparison of the performance of different models. | 124 528 room usage entries | XGBoost, Random Forest, Artificial Neural Network, and 1-dimensional convolutional neural network | The best-performing department-specific XGBoost model achieved an RMSE of 31.6 min, an MAE of 18.71 min, an R2 of 0.71, a MAPE of 28%, and 27% of estimates within 10% variation. The authors also presented each department-specific result as the share of estimates within 5- and 10-min deviation, which they considered more informative to users in real applications. | No |
Gabriel RA. JMIR Perioper Med. 2023 Jan. | USA | Single-academic-center, retrospective study | Spine surgery | Prediction of case duration | Utilization of an ensemble learning approach that may improve the accuracy of scheduled case duration for spine surgery. | 3 189 patients | Multivariable linear regression, Random Forest regressors, bagging regressors, and XGBoost regressors | The XGBoost regressor performed the best with an explained variance score of 0.778, an R2 of 0.770, an RMSE of 92.95 min, and an MAE of 44.31 min. Based on SHAP analysis of the XGBoost regression, body mass index, spinal fusions, surgical procedure, and number of spine levels involved were the features with the most impact on the model. | No |
Eshghali M. Ann Oper Res. 2023 Jan. | Iran | Observational, single-centre study | All surgeries | Prediction of surgical duration | Development of an approach for scheduling and rescheduling for both elective and emergency patients in OTs. | All cases in the first 20 weeks of 2020 | Random Forest, Genetic Algorithm, Particle Swarm Optimization, traffic congestion index, CPLEX. | The results show that by applying the proposed model, the performance of OT can improve by approximately 10.5% on average. | No |
Miller LE. Otolaryngol Head Neck Surg. 2023 Feb. | USA | Monocentric, observational study | Otolaryngology surgical cases | Prediction of surgery duration | Use of ML methods to improve projected case lengths over existing non-ML techniques for otolaryngology–head and neck surgery cases. | 50 888 cases | CatBoost and XGBoost | The CatBoost model demonstrated better predictive ability (RMSE = 38.2, MAE = 23.2) than the XGBoost model (RMSE = 39.3, MAE = 24.3) (P = 0.041). However, both performed better than the baseline model (RMSE = 46.3, MAE = 32.8) (P < 0.001), reducing operative time MAE by 9.6 min and 8.5 min compared to current methods, respectively. | No |
Zhong W. J Clin Monit Comput. 2023 Sep. | USA | Retrospective, monocentric, observational study | Open reduction internal fixation of radius fractures | Prediction of surgery duration | Proof-of-concept study for predicting case duration by applying natural language processing (NLP) and machine learning to interpret radiology reports for patients undergoing radius fracture repair. | 201 cases | Baseline Model, Linear regression, Random Forest regressor, Multilayer perceptron neural network, Performance Metrics, K-Folds Cross-Validation | The average root mean squared error was lowest using feedforward neural networks with outputs from ClinicalBERT (25.6 min, 95% CI: 21.5–29.7), which was significantly (P < 0.001) lower than the baseline model (39.3 min, 95% CI: 30.9–47.7). Using the feedforward neural network and ClinicalBERT on the test set, the percentage of accurately predicted cases, defined as an actual surgical duration within 15% of the predicted surgical duration, increased from 26.8 to 58.9% (P < 0.001). | No |
Adams T. Comput Methods Programs Biomed. 2023 Jun. | New Zealand | Retrospective, monocentric, observational study | Surgical operations | Prediction of procedure durations | Two methods for incorporating the medical information about a surgical procedure into the prediction of the duration of the procedure. | 35 000 surgical operations | Linear regression | The ontological information improves the continuous ranked probability score of the prediction of procedure durations from 18.4 min to 17.1 min, and from 25.3 min to 21.3 min for types of procedures that are not performed very often. | No |
Yeo I. Arch Orthop Trauma Surg. 2023 Jun. | USA | Retrospective, monocentric, observational study | Total knee arthroplasty | Prediction of surgery duration | Development of an accurate predictive model for surgical operative time for patients undergoing primary total knee arthroplasty. | 10 021 patients | Artificial Neural Networks, Random Forest and K-Nearest Neighbor | Younger age (< 45 years), tranexamic acid non-usage, and a high BMI (> 40 kg/m2) were the strongest predictors associated with surgical operative time. Accurate estimation (AUC = 0.82) is important in enhancing OR efficiency and identifying patients at risk for prolonged surgical operative time. | No |
Strömblad CT. JAMA Surg. 2021. | USA | Single-center, 2-campus, randomized clinical trial, prospective study | Colorectal and gynecology surgery | Prediction of the duration of each scheduled surgery, measured by (arithmetic) mean (SD) error and mean absolute error. | Assessment of accuracy and real-world outcome from implementation of a machine learning model that predicts surgical case duration. | 683 patients | Random Forest | The implementation of a machine learning model significantly improved accuracy in predicting case duration and led to reduced patient wait time, no difference in time between cases (i.e., turnover time or surgeon wait time), and reduced presurgical length of stay compared to the control group. The SD for colorectal service in the intervention arm would have been reduced from 87 to 70 for the mean absolute error SD and 103 to 86 for the mean error SD. | No |
Rozario N. Can J Surg. 2020. | Canada | Observational, retrospective, single-centre study | All surgeries | Optimization of surgery times | Creation of customized models to optimize the efficiency of operating room booking times. | 10 553 cases | Python programming language combined with the open source OR-Tools software suite from Google AI | The optimized schedule had 113 min of PACU holds [95% CI: 110, 115 min], a 76% reduction; in addition, 26 min of delays occurred [95% CI: 25, 27 min], corresponding to an 80% reduction in PACU admission delay time [95% CI: 79%, 81%]. | No |
ML: machine learning. RMSE: root mean square error. CRPS: continuous ranked probability score. SVM: support vector machine. MAE: mean absolute error. MAPE: mean absolute percentage error. R2: coefficient of determination. OT: operating theatre. AUC: area under the curve. PACU: post-anesthesia care unit.
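
Several of the error metrics reported in Table 1 (RMSE, MAE, MAPE, and the proportion of cases predicted within a percentage tolerance of the actual duration) can be illustrated with a minimal Python sketch. The function name `duration_metrics`, its `tolerance` parameter, and the sample durations are hypothetical and are not taken from any of the reviewed studies, which may define these metrics slightly differently.

```python
import math


def duration_metrics(actual, predicted, tolerance=0.10):
    """Compute common error metrics for surgical duration predictions.

    `actual` and `predicted` are equal-length sequences of durations in
    minutes. `tolerance` is the relative band (0.10 = within 10% of the
    actual duration). All names here are illustrative assumptions.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual durations must be positive for MAPE")

    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    # Root mean square error: penalizes large misses more heavily.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Mean absolute error: average miss in minutes.
    mae = sum(abs(e) for e in errors) / n
    # Mean absolute percentage error, relative to the actual duration.
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    # Share of cases whose prediction falls within the tolerance band.
    within = 100.0 * sum(
        1 for e, a in zip(errors, actual) if abs(e) <= tolerance * a
    ) / n

    return {"rmse": rmse, "mae": mae, "mape": mape, "within_pct": within}


# Hypothetical example: four cases, durations in minutes.
actual = [100.0, 200.0, 50.0, 120.0]
predicted = [105.0, 170.0, 65.0, 120.0]
metrics = duration_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because RMSE squares each error before averaging, a few badly mispredicted long cases raise it more than they raise MAE, which is one reason the studies in the table often report both.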