. 2024 Feb 14;48(1):19. doi: 10.1007/s10916-024-02038-2

Table 1.

Main studies about prediction of surgical time

Author, year	Country	Study design	Type of procedure	Main outcomes	Objective	Final Cohort	Type of AI	Prediction Performance	External validation
Bartek MA. J Am Coll Surg. 2019 Oct.	USA	Monocentric, retrospective, observational study	All surgeries	Surgical time prediction	Development of statistical models to improve estimation of case-time duration.	14 345 cases	Random forest and XGBoost	The ability to predict cases within 10% improved from 32% using our institutional standard to 39% with the ML surgeon-specific model. Models with accuracy greater than or equal to that of schedulers (i.e. >75%) constituted 45% of all models. These models were notably superior to the surgeon schedulers with within 10% prediction as high as 50% compared to 32%.	No
Martinez O. Comput Methods Programs Biomed. 2021 Sep.	Colombia	Monocentric, observational study	Single-procedure surgeries	Surgical time prediction	Optimization of the OR efficiency by improving the surgery scheduling task, which requires the estimation of surgical time duration.	81 248 cases	Linear Regression, Regression Trees, Support Vector Regression and Bagging Regression Trees	The best overall performance was obtained using Bagged Trees (26 min RMSE, 3.16 min training time, 0.49 min testing time) when using a subset of the DB with the nine specialties containing 80% of the surgeries. Bagged Trees also outperformed the experience-based method with a lower RMSE.	No
Jiao Y. Br J Anaesth. 2022 May.	USA	Multicentric, retrospective, observational study	All surgeries	Methods to predict procedure duration	Development of a machine learning approach that continuously incorporates preoperative and intraoperative information for forecasting surgical duration.	70 826 cases	Modular artificial neural network	The modular artificial neural network had the lowest time error (CRPS mean = 13.8; standard deviation = 35.4 min), which was significantly better (mean difference = 6.4 min [95% confidence interval: 6.3–6.5]; P < 0.001) than the Bayesian approach. The modular artificial neural network also had the highest accuracy in identifying operating theatres that would overrun 15:00 (accuracy at 1 h prior = 89%) compared with the Bayesian approach (80%) and a naive approach using the scheduled duration (78%).	Yes
Hassanzadeh H. BMC Med Inform Decis Mak. 2022 Jun.	Australia	Monocentric, observational study	Elective and emergency surgeries	Predicting daily surgery demand and by medical specialty	Utilization of operating theatre data to provide decision support for improved theatre management.	99 732 surgeries on 63 697 unique patients.	Rolling window, Regression (Linear), Regression (Poisson), Regression (Negative binomial), Decision tree, Random forest, SVM (Linear, RBF, Sigmoid, Poly), Bagging regressor, Gradient boosting regressor, XGBoost regressor, Ensemble regressor	Predicting operating theatre demand is a viable component in theatre management, enabling hospitals to provide services as efficiently and effectively as possible to obtain the best health outcomes. Demand could be predicted with 90% accuracy.	No
Abbou B. Big Data Cogn Comput. 2022 Jul.	Israel	Two-centre, observational, retrospective study	All surgeries	Expected length of stay	Improvement of the productivity and utility of operating rooms.	102 301 cases in hospital 1 and 149 308 cases in hospital 2	Naïve model based on the median length of similar surgeries and XGBoost model	Using different measures of performance evaluations, the XGBoost models performed better than the naïve models: the MAE was 21.5 compared to 25.4 in hospital 1 and 25.3 compared to 28.7 in hospital 2; RMSE, 36.6 vs. 49.0 (hospital 1), 40.3 vs. 55.0 (hospital 2); PVE, 66.7 vs. 44.0 (hospital 1), 70. vs. 46.8 (hospital 2); and ML2R, 0.46 vs. 0.53 (hospital 1) and 0.46 vs. 0.49 (hospital 2). In the case of MAPE, differences between the naïve and the ML-based model were minor: 35.15 vs. 35.37 in hospital 1 and 35.09 vs. 32.48 in hospital 2 according to hospital performance evaluations.	No
Lam SSW. Healthcare (Basel). 2022 Jun.	Singapore and North Carolina	Two-centre, observational, retrospective study	Colorectal surgeries	Estimation of Surgery Durations	Determination of the performance of current surgery case duration estimations and the use of machine learning models to predict surgery duration across two large tertiary healthcare institutions.	7 585 cases (Center-1) and 3 597 cases (Center-2)	CatBoost	The simple MA-based predictions outperform the scheduled duration provided by the OR schedulers across RMSE, MAE, MAPE and proportion of cases within 80–120% of the scheduled actual duration. In center-1, the Model 5 shows the best performance with RMSE 45.18, MAE 23.986 and MAPE 34.40%. In center-2, Model 5 has the best performance, with 56.11% of its predictions falling within +/−20% of the actual duration. Model 5 prediction accuracy (within +/−20%) is 7.78% higher than that of the MA. Model 5 also has the lowest RMSE, MAE and MAPE at 38.48%, 23.61% and 23.36%, respectively.	No
Gabriel RA. Anesth Analg. 2022 Jul. *	California, USA	Observational, retrospective, single-centre study	Orthopedic and ear, nose, and throat surgeries.	Surgery end time and discharge from recovery room	Development of machine learning models that predicted the following composite outcome: surgery finished by end of operating room block time and patient was discharged by end of recovery room nursing shift.	13 447 surgical procedures	Logistic regression, random forest classifier, support vector classifier, simple feedforward neural network, balanced random forest classifier, and balanced bagging classifier. SMOTE	A model was created for each start time (1 pm, 2 pm, 3 pm, or 4 pm), and the ensemble learning approaches had the highest AUC scores. The balanced bagging classifier performed best, with F1 scores of 0.78, 0.80, 0.82, and 0.82 when predicting the outcome for cases starting at 1 pm, 2 pm, 3 pm, or 4 pm, respectively.	No
Huang L. J Healthc Eng. 2022 Apr.	China	Monocentric, observational study	All surgeries	Surgical time and anesthesia emergence duration prediction	Creation of a surgery and anesthesia emergence duration-prediction system.	15 754 samples	Perceptron	Combining the surgery duration prediction system with the anesthesia emergence duration prediction system yielded a prediction accuracy > 0.95.	No
Chu J. Healthcare (Basel) 2022 Aug.	Taiwan	Retrospective, monocentric, observational study	All surgeries	Surgical time prediction	Construction of prediction models to accurately predict the OR usage time and compare the performance of different models.	124 528 entries of room	XGBoost, Random Forest, Artificial Neural Network, and 1-dimensional convolutional neural network.	The best-performing department-specific XGBoost model achieved an RMSE of 31.6 min, an MAE of 18.71 min, an R2 of 0.71, a MAPE of 28%, and 27% of estimates within 10% variation. The authors also present department-specific results for estimates within 5- and 10-min deviation, which they consider more informative to users in real applications.	No
Gabriel RA. JMIR Perioper Med. 2023 Jan	USA	Single-academic-center, retrospective study	Spine surgery	Prediction of case duration	Utilization of an ensemble learning approach that may improve the accuracy of scheduled case duration for spine surgery.	3 189 patients	Multivariable linear regression, Random Forest regressors, bagging regressors, and XGBoost regressors.	The XGBoost regressor performed the best with an explained variance score of 0.778, an R2 of 0.770, an RMSE of 92.95 min, and an MAE of 44.31 min. Based on SHAP analysis of the XGBoost regression, body mass index, spinal fusions, surgical procedure, and number of spine levels involved were the features with the most impact on the model.	No
Eshghali M. Ann Oper Res. 2023 Jan.	Iran	Observational, single-centre study	All surgeries	Prediction of surgical duration	Development of an approach for scheduling and rescheduling for both elective and emergency patients in OTs.	All cases in the first 20 weeks of 2020	Random Forest, Genetic Algorithm, Particle Swarm Optimization, traffic congestion index, CPLEX.	The results show that by applying the proposed model, the performance of OT can improve by approximately 10.5% on average.	No
Miller LE. Otolaryngol Head Neck Surg. 2023 Feb.	USA	Monocentric, observational study	Otolaryngology surgical cases	Prediction of surgeries duration	Improvement of ML methods by projecting case lengths over existing non-ML techniques for otolaryngology–head and neck surgery cases.	50 888 cases	CatBoost and XGBoost	The CatBoost model demonstrated better predictive ability (RMSE = 38.2, MAE = 23.2) than the XGBoost model (RMSE = 39.3, MAE = 24.3) (P = 0.041). However, both performed better than the baseline model (RMSE = 46.3, MAE = 32.8) (P < 0.001) reducing operative time MAE by 9.6 min and 8.5 min compared to current methods, respectively.	No
Zhong W. J Clin Monit Comput. 2023 Sep.	USA	Retrospective, monocentric, observational study	Open reduction internal fixation of radius fractures	Prediction of surgeries duration	Demonstration of a proof-of-concept study for predicting case duration by applying natural language processing (NLP) and machine learning that interpret radiology reports for patients undergoing radius fracture repair.	201 cases	Baseline Model, Linear regression, Random Forest regressor, Multilayer perceptron neural network, Performance Metrics, K-Folds Cross-Validation	The average root mean squared error was lowest using feedforward neural networks using outputs from ClinicalBERT (25.6 min, 95% CI: 21.5–29.7), which was significantly (P < 0.001) lower than the baseline model (39.3 min, 95% CI: 30.9–47.7). Using the feedforward neural network and ClinicalBERT on the test set, the percentage of accurately predicted cases, which was defined by the actual surgical duration within 15% of the predicted surgical duration, increased from 26.8 to 58.9% (P < 0.001).	No
Adams T. Comput Methods Programs Biomed. 2023 Jun.	New Zealand	Retrospective, monocentric, observational study	Surgical operations	Prediction of procedure durations	Two methods for incorporating the medical information about a surgical procedure into the prediction of the duration of the procedure.	35 000 surgical operations	Linear regression	The ontological information provides an improvement in the continuous ranked probability scores of the prediction of procedure durations from 18.4 min to 17.1 min, and from 25.3 to 21.3 min for types of procedures that are not performed very often.	No
Yeo I. Arch Orthop Trauma Surg. 2023 Jun.	USA	Retrospective, monocentric, observational study	Total knee arthroplasty	Prediction of surgeries duration	Development of an accurate predictive model for surgical operative time for patients undergoing primary total knee arthroplasty.	10 021 patients	Artificial Neural Networks, Random Forest and K-Nearest Neighbor.	Younger age (< 45 years), tranexamic acid non-usage, and a high BMI (> 40 kg/m2) were the strongest predictors associated with surgical operative time. The accurate estimation (AUC = 0.82) is important in enhancing OR efficiency and identifying patients at risk for prolonged surgical operative time.	No
Strömblad CT. JAMA Surg. 2021.	USA	Single-center, 2-campus, randomized clinical trial, prospective study	Colorectal and gynecology surgery	Prediction of the duration of each scheduled surgery, measured by (arithmetic) mean (SD) error and mean absolute error.	Assessment of accuracy and real-world outcome from implementation of a machine learning model that predicts surgical case duration.	683 patients	Random Forest	The implementation of a machine learning model significantly improved accuracy in predicting case duration and led to reduced patient wait time, no difference in time between cases (i.e., turnover time or surgeon wait time), and reduced presurgical length of stay compared to the control group. The SD for colorectal service in the intervention arm would have been reduced from 87 to 70 for the mean absolute error SD and 103 to 86 for the mean error SD.	No
Rozario N. Can J Surg. 2020	Canada	Observational, retrospective, single-centre study	All surgeries	Optimization of surgeries time	Creation of customized models to optimize the efficiency of operating room booking times.	10 553 cases	Python programming language combined with the open source OR-Tools software suite from Google AI	The optimized schedule had 113 min of PACU holds [95% CI: 110, 115 min], a 76% reduction; in addition, 26 min of delays occurred [95% CI: 25, 27 min], corresponding to an 80% reduction in PACU admission delay time [95% CI: 79%, 81%].	No

ML: Machine Learning. RMSE: Root Mean Square Error. CRPS: Continuous Ranked Probability Score. SVM: Support Vector Machine. MAE: Mean Absolute Error. MAPE: Mean Absolute Percentage Error. R2: coefficient of determination. OT: Operating Theatre. AUC: Area Under the Curve. PACU: Post Anesthesia Care Unit.
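
For readers comparing the error metrics reported in the table, the following is a minimal sketch of how RMSE, MAE, MAPE and a "within 10%" proportion are commonly computed from actual and predicted case durations. It is not taken from any of the cited studies: the function name `duration_metrics`, the example durations, and the choice of the actual duration as the denominator are illustrative assumptions, and individual studies may define these quantities differently (for example, measuring the tolerance band relative to the predicted rather than the actual duration).

```python
import math


def duration_metrics(actual, predicted, tolerance=0.10):
    """Common error metrics for surgical duration predictions (minutes).

    Hypothetical helper for illustration; percentages are computed
    relative to the ACTUAL duration, which is an assumption.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual durations must be positive")

    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    # RMSE penalises large misses more heavily than MAE.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE expresses the absolute error as a percentage of the actual duration.
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    # Share of cases whose prediction falls within +/- tolerance of the actual.
    within = sum(abs(e) <= tolerance * a for e, a in zip(errors, actual)) / n

    return {"rmse": rmse, "mae": mae, "mape": mape, "within": within}


# Made-up durations in minutes, for illustration only.
actual_min = [60, 120, 90, 200]
predicted_min = [65, 110, 90, 230]
metrics = duration_metrics(actual_min, predicted_min)
print(metrics)
```

Because RMSE squares each error, a single long overrun (the 30-minute miss in the example) raises it well above the MAE, which is one reason the studies above often report both.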