Abstract
Objective:
To implement a machine learning model using only the restricted data available at case creation time to predict surgical case length for multiple services at different locations.
Background:
The operating room is one of the most expensive resources in a health system, estimated to cost $22 to $133 per minute and generate about 40% of hospital revenue. Accurate prediction of surgical case length is necessary for efficient scheduling and cost-effective utilization of the operating room and other resources.
Methods:
We introduced a similarity cascade to capture the complexity of cases and surgeon influence on the case length and incorporated that into a gradient-boosting machine learning model. The model loss function was customized to improve the balance between over- and under-prediction of the case length. A production pipeline was created to seamlessly deploy and implement the model across our institution.
Results:
The prospective analysis showed that the model output was gradually adopted by the schedulers and outperformed the scheduler-predicted case length from August to December 2022. Across 33,815 surgical cases on outpatient and inpatient platforms, the implemented model produced 11.2% fewer underpredicted cases and 5.9% more cases within 20% of the actual case length than the schedulers, with only 5.3% more overpredicted cases. With model assistance, schedulers predicted 3.4% more cases within 20% of the actual case length and 4.3% fewer underpredicted cases.
Conclusions:
We created a unique framework that is being leveraged every day to predict surgical case length more accurately at case posting time and could be potentially utilized to deploy future machine learning models.
Keywords: implementation, machine learning, prospective analysis, surgical case duration, surgical case length
The operating room (OR) is one of the most expensive resources in hospitals and comprises a significant portion of surgical costs, which account for 30% to 40% of all health care expenditures in the United States.1,2 The OR costs $22 to $133 per minute to run and generates about 40% of hospital revenue.1–3 It is estimated that about 60% of all admitted patients require surgery during their hospital stay.4,5 Therefore, it is imperative to improve OR efficiency to enhance patient flow, increase hospital revenue, and reduce operating costs. One of the first steps toward improvement in OR efficiency is a more accurate surgical case length estimate for scheduling. Accurate prediction of surgical case length could increase the utilization and efficiency of the OR, reduce patient and surgeon wait time, and release presurgical beds for the post-anesthesia care unit.4,6 However, a high degree of variability in patient, procedure, surgeon, and operational factors makes case length prediction very challenging.
Several performance indicators, such as over- and undertime case length predictions, have been proposed to evaluate OR scheduling.7 Over- or underpredicting a case length means that the case finished earlier or later than the expected time, respectively. Overprediction decreases OR efficiency and increases OR idle time. In contrast, underprediction leads to the cancellation or rescheduling of cases and increases patient wait time as well as the cost of surgery due to personnel overtime costs. About 55% to 59% of the total OR cost comprises direct expenses, of which wages and benefits accounted for two-thirds in a California hospital study.8 Although the financial effect of improved OR utilization is complicated, a basic study showed that saving 9.6 minutes per case in an institution with 10,904 annual cases could save $3.7 million annually, which could be realized as more scheduled cases and better staff utilization.9 In another study, a 21% reduction in underprediction was shown to save $469,000 in overtime costs over 3 years.10 Strömblad et al6 showed that reducing underprediction and absolute error decreased patient wait time with no increase in surgeon wait time.
Current Procedural Terminology (CPT) codes have been used for over 25 years to post or create surgical cases.11 Most hospitals use surgeon-estimated and/or historical median case length to schedule surgical cases, both of which are uncertain and inaccurate.1,4–6,10,12,13 Some hospitals also use their electronic health record (EHR) system-generated time, a summary (median or average) of past similar case lengths based on the surgeon, platform, and combination of CPT codes, which has been shown to be unreliable due to preoperative data variations.5,6,13 Many groups have incorporated patient, procedural, and operational factors to create machine learning models and predict surgical case length.1,4 It has been shown that surgeons are the most significant contributor to case length variability as well as prediction.13 Ito et al2 reviewed several machine learning approaches in the literature that have been used to predict surgical case length, among which boosted decision tree models showed better performance. To the best of our knowledge, there is only one report in the literature that describes the implementation of a surgical case length predictive model in a clinical trial for multiple surgical services using preoperative data.6 However, that study incorporated more than 300 variables from structured and unstructured data up to one day before surgery, which may not necessarily be available at case posting time. To seamlessly deploy a case length predictive model and provide schedulers with the predicted value at scheduling time, it is crucial to incorporate only the data that are available at case posting time, which could be months before the surgery.
Here, we report the development, deployment, and implementation of a gradient-boosted decision tree model to predict surgical case length for multiple services and locations within our institution using the very limited data available at the time of surgical case posting. We also propose an improvement upon the EHR-generated median case length for calculating the historical median of past similar case lengths and show that it improved model performance and significantly reduced sparsity and training time. Finally, we show how to adjust the model's loss function to balance over- and undertime errors based on our institution's priority.
METHODS
Source of Data
This study was found exempt by the Duke University Institutional Review Board (protocol number: Pro00104275). Elective surgical case posting data of 20 different services across 12 inpatient and ambulatory locations at Duke University Health System (DUHS) from July 2013 to April 2022 were initially collected and used to evaluate the best training period (results not shown), and the data from January 2021 to April 2022 were selected for the study. Records with missing timestamps or negative case lengths were excluded from the cohort, which resulted in 107,898 cases. We selected the 80,595 cases performed in 2021 as the training set and the 27,303 cases performed in 2022 as the testing set.
Outcome
The patient-in to patient-out or wheels-in to wheels-out time was defined as the surgical case length because of its wide perioperative and scheduling use.1,13 The surgical case length was log-transformed to address the skewness of the original case length distribution (Supplemental Digital Content Fig. 1, http://links.lww.com/SLA/E646).
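As a minimal sketch of the transform step, the right-skewed case length can be moved to the log scale for training and back-transformed to minutes for scheduling. The specific `log1p`/`expm1` pair is an assumption for illustration; the paper states only that the target was log-transformed.

```python
import numpy as np

# Illustrative case lengths in minutes (right-skewed toward long cases)
minutes = np.array([25.0, 40.0, 60.0, 75.0, 240.0])

log_target = np.log1p(minutes)   # target used for model training
recovered = np.expm1(log_target) # back-transform predictions to minutes
```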
Predictors
The numerical variables include age, number of panels, number of posted CPTs, EHR-generated median case length, and surgeon-estimated case length. The categorical variables comprise sex, patient class, service, primary physician, primary CPT, primary anesthesia type, location, and laterality. We reported that all the CPTs could be converted to relative value units (RVU) in a case length prediction model.14 The RVU consists of 3 categories: physician work, practice expense, and professional liability. The physician work or simply the work RVU accounts for about half of the total RVU and generally depends on the required time to perform a procedure.15 Here we implemented the same method and converted all posted CPTs to a single work RVU as a continuous variable.
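The CPT-to-RVU conversion can be sketched as a lookup-and-aggregate step. The codes and work-RVU values below are illustrative only (real values come from the CMS physician fee schedule), and collapsing posted CPTs by summation is an assumed aggregation rule for this sketch.

```python
# Hypothetical work-RVU lookup table; values are illustrative, not official
WORK_RVU = {"44950": 9.45, "47562": 11.08, "49505": 7.96}

def total_work_rvu(posted_cpts):
    """Collapse all posted CPTs into a single continuous work-RVU feature.
    Unknown codes contribute 0 so the feature stays defined for new CPTs."""
    return sum(WORK_RVU.get(cpt, 0.0) for cpt in posted_cpts)
```

Treating an unknown code as 0 rather than failing is one way the RVU feature stays robust to CPTs that were not seen at training time.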
We defined a similarity cascade to calculate the historical median and SD of case length (Fig. 1). Cases in the training set with selected similar features at each stage were used to calculate the median and SD of case length, which were stored as 4 separate reference tables. Then, for each case, we searched the first reference table for a matching feature combination and assigned its historical median and SD of case length. If no similar case was found in the first table, the subsequent tables, with progressively fewer matching features, were used to assign the historical median and SD. If no similar case was found in the last reference table, a null value was assigned as the median and SD.
FIGURE 1.
The similarity cascade to find similar surgical cases at each stage for calculation of the historical median and SD of case length.
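The fallback lookup above can be sketched as follows. The feature combinations, table contents, and names here are hypothetical stand-ins for the actual stages in Figure 1; only the cascade mechanism (try the most specific table first, fall back to less specific ones, return null on total miss) reflects the paper.

```python
# Hypothetical reference tables, ordered from most to least specific.
# Keys are feature tuples; values are (median, SD) of historical case length.
REFERENCE_TABLES = [
    {("SURGERY", "smith", "44950", "Main OR"): (62.0, 14.0)},  # stage 1: most features
    {("SURGERY", "smith", "44950"): (65.0, 16.0)},             # stage 2
    {("SURGERY", "44950"): (70.0, 21.0)},                      # stage 3
    {("44950",): (74.0, 25.0)},                                # stage 4: fewest features
]

# Hypothetical key builders, one per cascade stage
key_funcs = [
    lambda c: (c["service"], c["surgeon"], c["cpt"], c["location"]),
    lambda c: (c["service"], c["surgeon"], c["cpt"]),
    lambda c: (c["service"], c["cpt"]),
    lambda c: (c["cpt"],),
]

def cascade_lookup(case, key_funcs):
    """Walk the cascade; return (median, SD) from the first stage whose
    feature combination matches, or (None, None) if no stage matches."""
    for table, key_fn in zip(REFERENCE_TABLES, key_funcs):
        key = key_fn(case)
        if key in table:
            return table[key]
    return (None, None)  # null median/SD, as in the paper

# A case with an unseen surgeon falls through stages 1-2 to the service+CPT stage
case = {"service": "SURGERY", "surgeon": "jones", "cpt": "44950", "location": "Main OR"}
```

Because the final stages key on fewer features, a case with a new surgeon still receives a sensible median and SD, which is the robustness property discussed later in the paper.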
Model Development and Evaluation
Three gradient-boosted decision tree regression models were created using XGBoost16 (version 1.2.1) in Python (version 3.7.6) to predict the case length. The mean squared logarithmic error (MSLE) was selected as the loss function. All hyperparameter optimization was performed by 5-fold cross-validation on the training set. In the first model, all the mentioned variables except the historical median and SD were incorporated, with the primary physicians one-hot encoded. In the second model, all the mentioned features except the one-hot encoded physicians were used, and the historical median and SD derived from the similarity cascade reference tables were added as 2 additional features. In the third model, we used the same feature set as the second model but modified the loss function. Specifically, we introduced 2 regularization parameters, C1 and C2, ranging from 0.9 to 1.1, into the MSLE loss function for short (≤30 minutes) and long (>30 minutes) case durations, respectively, to penalize cases in each group differently. We then used Optuna17 to find the C1 and C2 values that minimized MSLE as well as the difference between over- and underprediction errors. Each model's performance was benchmarked against the scheduler-predicted and EHR-generated case lengths using MSLE, mean absolute error (MAE), and root mean squared error as the performance metrics. Bootstrapping and resampling of the test set were used to calculate the 95% CIs of the metrics.
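A minimal sketch of the group-weighted loss follows. Because the target is already log-transformed, squared error on it corresponds to MSLE; the gradient/Hessian pair below is in the shape XGBoost expects from a custom objective (wrap it to pull labels via `dtrain.get_label()`). The specific weights and the rule of grouping by actual duration are assumptions; the paper tunes C1 and C2 in [0.9, 1.1] with Optuna.

```python
import numpy as np

C1, C2 = 1.05, 0.95  # hypothetical weights within the paper's tuning range

def weighted_sq_log_obj(preds, labels):
    """Gradient and Hessian of a group-weighted squared error on the
    log-transformed case length. Cases of <=30 actual minutes are weighted
    by C1 and longer cases by C2, so the two duration groups can be
    penalized differently."""
    w = np.where(np.expm1(labels) <= 30.0, C1, C2)  # back to minutes to pick group
    grad = 2.0 * w * (preds - labels)
    hess = 2.0 * w * np.ones_like(preds)
    return grad, hess

# One short (20 min) and one long (60 min) case, both overpredicted by 0.1 in log space
labels = np.log1p(np.array([20.0, 60.0]))
preds = labels + 0.1
grad, hess = weighted_sq_log_obj(preds, labels)
```

With C1 > C2 as above, the same log-scale error on a short case yields a larger gradient, pushing the booster to fit short cases more tightly.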
Model Deployment and Implementation
To seamlessly implement and deploy our case length prediction model, we collaborated with multiple groups within DUHS to create a production pipeline. The Analytics Center of Excellence created an application programming interface that supplies surgical case data. The Duke Institute for Health Innovation and Duke Health Technology Solutions supplied the framework to run the model on the Duke Kubernetes system. The EHR system developers embedded a new field in Epic OpTime to show the model-predicted case length, where schedulers could easily incorporate the predicted value and schedule cases one day after the surgeon posted the case. The production pipeline starts one day after cases are posted. Data extraction, transformation, and loading are completed in the backend by 4am, and the application programming interface is then updated with the latest posted case data by 5am. The model is run at 6am, and the predicted case length is written to the embedded field in Epic for schedulers' use by 7am. Each step is separated by 1 hour to provide a reasonable buffer against an unforeseen delay in the earlier steps. We registered our model and pipeline with the Duke Algorithm-Based Clinical Decision Support18 committee, an internal oversight board within our health system. After several phases of silent evaluation from May to July 2022, the model was implemented, and the schedulers were directed to use the model-predicted case length beginning on August 1, 2022. To measure and compare OR utilization during the silent evaluation and implementation phases, we defined and calculated the following 3 metrics for each phase: (1) case volume as the number of cases per day; (2) percentage room utilization as the sum of all patient-in to patient-out time divided by the sum of all available time; and (3) nurse availability as the number of nurses per case.
All 3 metrics were calculated considering cases that started and ended between 7:30am and 7pm in inpatient ORs and between 7:30am and 4:30pm in ambulatory ORs as regular operating hours.
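The 3 utilization metrics can be sketched for a single day as below. The case log, room count, and nurse counts are hypothetical, and computing nurse availability as total nurse assignments divided by case count is an assumed reading of "nurses per case."

```python
from datetime import datetime

# Hypothetical one-day case log for two inpatient rooms
cases = [
    {"room": "OR1", "in": datetime(2022, 8, 1, 7, 45), "out": datetime(2022, 8, 1, 9, 30), "nurses": 2},
    {"room": "OR1", "in": datetime(2022, 8, 1, 10, 0), "out": datetime(2022, 8, 1, 12, 0), "nurses": 2},
    {"room": "OR2", "in": datetime(2022, 8, 1, 8, 0),  "out": datetime(2022, 8, 1, 11, 0), "nurses": 3},
]

# Metric 1: case volume (cases per day)
case_volume = len(cases)

# Metric 2: percentage room utilization = in-room minutes / available minutes.
# Inpatient regular hours are 7:30am-7pm, i.e., 11.5 hours per room per day.
rooms = {c["room"] for c in cases}
available_minutes = 11.5 * 60 * len(rooms)
in_room_minutes = sum((c["out"] - c["in"]).total_seconds() / 60 for c in cases)
room_utilization_pct = 100 * in_room_minutes / available_minutes

# Metric 3: nurse availability (nurses per case; aggregation rule assumed)
nurses_per_case = sum(c["nurses"] for c in cases) / case_volume
```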
RESULTS
Schedulers' Performance
We evaluated the performance of the schedulers in the whole cohort (January 2021–April 2022). Over- and underprediction errors were defined as a predicted case length more than 20% above or more than 20% below the actual case length, respectively. As shown in Figure 2, schedulers underpredicted cases 2.6 times more often than they overpredicted them, indicating significant overbooking of the ORs, increased patient and surgeon wait time, and staff overtime payments. Only 44% of cases were predicted within 20% of the actual case length.
FIGURE 2.
Schedulers’ performance from January 2021 to April 2022. (A) Distribution of the actual and the scheduler-predicted case length in minutes and (B) percentage of cases predicted within, over, or under 20% of the actual case length. Predicted case lengths within, under, and over 20% of the actual case length are depicted in white, red, and blue, respectively.
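The 20% error definition above can be sketched as a simple classifier; the function name is ours, and treating exactly 20% deviation as "within" is an assumption the paper does not specify.

```python
def classify_prediction(predicted, actual, margin=0.20):
    """Classify a booked case length against the actual length using the
    paper's 20% margin: 'over' if the case finished more than 20% earlier
    than booked, 'under' if it ran more than 20% past the booked time,
    else 'within'."""
    if predicted > actual * (1 + margin):
        return "over"
    if predicted < actual * (1 - margin):
        return "under"
    return "within"
```

For example, a case booked for 90 minutes that actually took 120 minutes (booked below 0.8 × 120 = 96) counts as underpredicted.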
Model Retrospective Performance
We created 3 gradient-boosted decision tree models using the 2021 and 2022 data as training and testing sets, respectively, and compared the schedulers, the EHR system-generated median, and model performance. The first model was created with encoded primary physicians and comprised 736 features, of which 643 were the one-hot encoded physicians alone. It predicted 57% of cases within 20% of the actual case length, 16.4% under, and 26.6% over the error margin (Fig. 3). To demonstrate the significance of the similarity cascade, the 643 encoded physicians were replaced in the second model with only 2 features: the historical median and SD of case length calculated using the similarity cascade. Interestingly, it showed almost identical performance to the first model, predicting 57.8% of cases within the 20% margin while underpredicting 17.6% and overpredicting 24.6% of cases, an overall imbalanced error (Fig. 3). Our health system's requirement before implementation was an outcome with balanced over- and underprediction errors. For example, the second model overpredicted short cases (≤30 min) 2.6 times more often than it underpredicted them, indicating a more imbalanced error within that case length range (Fig. 4A). Such short cases are generally high-volume ambulatory cases, and overpredicting them may negatively affect the number of performed cases and hospital revenue. Therefore, we created the third model using the same feature set as in the second model and adjusted the loss function to improve the overall performance as well as the balance among the prediction errors. This model predicted 58.7% of cases within 20% of the actual case length with an overall balanced error of 20.7% overprediction and 20.6% underprediction (Fig. 3). Compared with the schedulers, model 3 resulted in 485 and 53 fewer overtime hours for cases that ended past regular working hours in inpatient and ambulatory ORs, respectively.
Model 3 also predicted inpatient and ambulatory cases with 37.4 and 19.6 minutes less median total error time per room per day, respectively. Importantly, all 3 models outperformed the schedulers as well as the EHR system-generated historical median, with the third model outperforming the other two (Fig. 3). As shown in Figure 4B, the third model provides a more balanced prediction error for short cases, a 2.9% improvement in cases within the 20% margin, and the lowest MSLE value (Supplemental Digital Content Table 1, http://links.lww.com/SLA/E647). Therefore, the third model was selected for implementation.
FIGURE 3.
Performance comparison of three models with scheduler and EHR predicted case length. Models 1 to 3 include encoded physicians, historical median from similarity cascade, and historical median from similarity cascade with adjusted loss function, respectively. Predicted case lengths within, under, and over 20% of the actual case length are depicted in white, red, and blue, respectively.
FIGURE 4.
Performance comparison at different predicted case length periods. (A) Model 2 and (B) model 3. Predicted case lengths within, under, and over 20% of the actual case length are depicted in white, red, and blue, respectively.
Model Prospective Performance
We collected 33,815 surgical cases performed at selected DUHS locations from August to December 2022 and evaluated the accuracy of the model predictions as well as how much the model output was used by schedulers. Although the model generates a predicted case length for all cases, schedulers were free to use and adjust the predicted value. Ideally, there would be no difference between the scheduler- and model-predicted case lengths if the schedulers were exclusively using the model output for all cases. As shown in Figure 5A, the median difference between the scheduler- and model-predicted case length had been shrinking since the start of the silent evaluation in May 2022, when some schedulers began adopting the model output. The gap narrowed further after the model implementation in August 2022, decreasing by 7 minutes by the end of 2022, indicating that more schedulers gradually adopted the model output from August to December 2022. Schedulers' performance improved from August to December 2022 by using the model compared with January to April 2022: they predicted 3.4% more cases within 20% of the actual case length and 4.3% fewer cases with undertime error, with only 1% more overprediction (Figs. 3 and 5B). The model outperformed the schedulers on all performance metrics (MSLE, MAE, and root mean squared error; Supplemental Digital Content Table 1, http://links.lww.com/SLA/E647), produced more balanced errors, and predicted 11.2% fewer underpredicted cases and 5.9% more cases within 20% of the actual case length than the schedulers, with only 5.3% more overprediction (Fig. 5B); the same performance trend can be seen at different predicted case length periods (Figs. 5C, D). Specifically, for short cases (ie, ≤30 min), the model produced 34.4% fewer underpredicted cases, 18.8% more cases within the 20% error margin, and only 15.6% more overpredicted cases than the schedulers.
The implemented model also resulted in 5 and 26 fewer overtime hours for cases that ended past normal working hours, as well as 18 and 20 minutes less median total error time per room per day, in inpatient and ambulatory venues, respectively. We compared utilization metrics during the silent evaluation and implementation phases, that is, May to July 2022 and August to December 2022, respectively (Supplemental Digital Content Table 2, http://links.lww.com/SLA/E648). The case volume decreased by 1.24 and increased by 2.14 cases per day in ambulatory and inpatient ORs, respectively, after the implementation. Although there were 0.05 and 0.13 fewer nurses available per case in ambulatory and inpatient ORs, respectively, after the implementation due to our institution's nursing shortage, utilization only decreased by 0.36% and 0.18% in ambulatory and inpatient venues, respectively.
FIGURE 5.
Prospective evaluation of the case length model from August to December 2022. (A) Median difference in predicted case length per day by the scheduler and model in minutes, (B) overall performance comparison between scheduler and model, and (C and D) scheduler and model performance at different predicted case periods, respectively. The solid black lines are the regression lines from May to July 2022 and August to December 2022. The dashed red and black lines indicate the implementation of the model from August 2022 and no difference between the median of the scheduler and model predicted case length per day, respectively. Predicted case lengths within, under, and over 20% of the actual case length are depicted in white, red, and blue, respectively.
DISCUSSION
The similarity cascade addresses two main issues raised by using a large number of encoded physicians in a model. First, a large number of encoded features creates a large sparse matrix, which negatively affects computational cost. Second, the model is not robust to, and fails on, cases with new, unseen primary physicians. Replacing the encoded physicians in the second model with the historical median and SD of case length calculated using the similarity cascade not only captured surgeon- and procedure-related case length variation across services and platforms but also resulted in significantly fewer features, improved robustness to new physicians, and ~5 times faster training than the first model.
It should be noted that even a small reduction in MAE could significantly improve clinical workflow and save costs over time. For example, the reduction in overtime hours for cases that ended past regular working hours in inpatient and ambulatory ORs could potentially have reduced overtime labor expenses by about $79,000 from January to April 2022, assuming 2 staff per case at an overtime rate of $73 per hour. Although model 3 adds some overprediction, or idle time, that time could be used to add more cases. For example, more than one thousand inpatient and ambulatory cases shorter than 20 and 40 minutes, respectively, were performed from January to April 2022, and similar short cases could be scheduled and performed using the time saved per room per day in inpatient and ambulatory ORs.
Prospective analysis of the implemented model showed gradual adoption of the model by schedulers. Specifically, the reduced difference in overtime error for cases that ended past normal working hours indicates an overall consistency between the model and scheduler for such cases. However, the model still outperformed the scheduler by less median total error time per room per day for both ambulatory and inpatient venues, which implies that the unused OR time, especially the ambulatory rooms, could be better utilized by exclusively using the model for scheduling. Our institution experienced a historical nursing shortage in 2022, including surgical staff. Duke Surgery leadership directed the schedulers to use the model’s predicted case length to avoid further disruption in every day OR workflow and maintain consistent scheduling. Although OR utilization is a complex topic that depends on several other factors, we managed to maintain room utilization consistency in part by using the model during this national nursing shortage.
Our model has several limitations. First, new surgeons, CPTs, instruments, and techniques, as well as increased efficiency in performing procedures, could all affect case length prediction. Because of the similarity cascade and the use of RVUs, our model is robust to new physicians and CPTs and does not require new physicians to be encoded and explicitly added to the model. However, the model requires regular retraining and evaluation with the latest data to implicitly incorporate the above changes through the similarity cascade. Second, inconsistency in posting the CPTs could also negatively affect case length prediction. Therefore, part of the implementation plan involved prompting the surgeons and their assistants to be as accurate as possible about CPTs when posting planned surgical cases. We also compared posted CPTs versus billed CPTs and created lists of CPT combinations for each surgeon based on their posting-versus-billed CPT history to facilitate case creation and improve case posting consistency, so some of the improvement described above could simply have come from improved CPT accuracy in case posting. Third, it has been shown that surgeon-estimated case length for cases with limited historical data could contribute to case length prediction.2,11 Although surgeon-estimated case length is included in the implemented model, not all surgeons input their estimate at case posting time. Therefore, we requested that surgeons provide their case length estimates, especially for new types of surgeries and/or complicated cases, so that future updates could potentially improve the model predictions. Fourth, CPT codes are updated annually by the American Medical Association and go into effect on January 1 of each year (https://www.ama-assn.org/about/cpt-editorial-panel/cpt-code-process). This could create inconsistency for some cases and negatively affect case length prediction. Fifth, our model was developed using data from a single health system.
Although we hypothesize that using a similar feature set and the similarity cascade in other health systems should perform comparably,19 our model needs to be validated at other institutions. Sixth, the feature set in the current model is drawn from case posting data. More studies could explore the extraction and addition of reliable features from unstructured data using natural language processing and image analysis, as well as other structured EHR data captured before case creation, such as comorbidities and past surgical encounters. Seventh, our utilization metrics showed consistent room utilization before and after the model implementation. However, a deeper study is required to evaluate the model's impact on OR utilization and efficiency as well as hospital revenue.
CONCLUSIONS
We created the similarity cascade to better calculate the historical median of surgical case length and capture case complexity and primary surgeon influence on the case length. We showed that replacing hundreds of encoded primary surgeon names with a single median calculated using the similarity cascade not only reduced the model training time and sparsity and enhanced robustness but also delivered the same, if not better, performance. The similarity cascade could be customized and explored for other outcomes, such as hospital length of stay, that depend on a variety of variables. We adjusted the model based on our institution's priority and created a production pipeline to deploy and implement the model for the everyday use of the schedulers. In our prospective evaluation, the model predicted 11.2% fewer underpredicted cases and 5.9% more cases within 20% of the actual case length compared with the schedulers, with only 5.3% more overprediction. We maintained room utilization while experiencing a nursing shortage by using the model. The developed pipeline could be leveraged to deploy future models across our and other health systems.
Supplementary Material
Acknowledgments
We would like to acknowledge Suresh Balu, Marshall Nichols, and Matt Gardner from DIHI, Ryan Craig, Alejandro Trillo, Deepthi Krisnamaneni, Sanjay Ghosh, and Omar Sodeq from ACE, and Andrii Kuraska and Tim Crittenden from Duke clinical applications for their support to create the production pipeline.
Footnotes
This study was supported by Operational funding from the Department of Surgery, Duke University Medical Center.
The authors report no conflicts of interest.
Supplemental Digital Content is available for this article. Direct URL citations are provided in the HTML and PDF versions of this article on the journal's website, www.annalsofsurgery.com.
Contributor Information
Hamed Zaribafzadeh, Email: hz116@duke.edu.
Wendy L. Webster, Email: wendy.webster@duke.edu.
Christopher J. Vail, Email: christopher.vail@duke.edu.
Thomas Daigle, Email: thomas.daigle@duke.edu.
Allan D. Kirk, Email: allan.kirk@duke.edu.
Peter J. Allen, Email: peter.allen@duke.edu.
Ricardo Henao, Email: ricardo.henao@duke.edu.
Daniel M. Buckland, Email: dan.buckland@duke.edu.
REFERENCES
- 1. Jiao Y, Sharma A, Ben Abdallah A, et al. Probabilistic forecasting of surgical case duration using machine learning: model development and validation. J Am Med Inform Assoc. 2020;27:1885–1893.
- 2. Ito M, Hoshino K, Takashima R, et al. Does case-mix classification affect predictions? A machine learning algorithm for surgical duration estimation. Healthcare Analytics. 2022;2:100119.
- 3. Bellini V, Guzzon M, Bigliardi B, et al. Artificial intelligence: a new tool in operating room management. Role of machine learning models in operating room optimization. J Med Syst. 2019;44:20.
- 4. Zhao B, Waterman RS, Urman RD, et al. A machine learning approach to predicting case duration for robot-assisted surgery. J Med Syst. 2019;43:32.
- 5. Tuwatananurak JP, Zadeh S, Xu X, et al. Machine learning can improve estimation of surgical case duration: a pilot study. J Med Syst. 2019;43:44.
- 6. Strömblad CT, Baxter-King RG, Meisami A, et al. Effect of a predictive model on planned surgical duration accuracy, patient wait time, and use of presurgical resources. JAMA Surg. 2021;156:315.
- 7. Rahimi I, Gandomi AH. A comprehensive review and analysis of operating room and surgery scheduling. Arch Comput Methods Eng. 2020;28:1667–1688.
- 8. Childers CP, Maggard-Gibbons M. Understanding costs of care in the operating room. JAMA Surg. 2018;153:e176233.
- 9. Miller LE, Goedicke W, Crowson MG, et al. Using machine learning to predict operating room case duration: a case study in otolaryngology. Otolaryngol Head Neck Surg. 2022;168:241–247.
- 10. Rozario N, Rozario D. Can machine learning optimize the efficiency of the operating room in the era of COVID-19? Can J Surg. 2020;63:E527–E529.
- 11. Dexter F, Epstein RH, Marian AA. Case duration prediction and estimating time remaining in ongoing cases. Br J Anaesth. 2022;128:751–755.
- 12. Robertson A, Kla K, Yaghmour E. Efficiency in the operating room: optimizing patient throughput. Int Anesthesiol Clin. 2021;59:47–52.
- 13. Bartek MA, Saxena RC, Solomon S, et al. Improving operating room efficiency: machine learning approach to predict case-time duration. J Am Coll Surg. 2019;229:346–354.e3.
- 14. Garside N, Zaribafzadeh H, Henao R, et al. CPT to RVU conversion improves model performance in the prediction of surgical case length. Sci Rep. 2021;11:14169.
- 15. Nurok M, Gewertz B. Relative value units and the measurement of physician performance. JAMA. 2019;322:1139–1140.
- 16. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: Association for Computing Machinery; 2016:785–794.
- 17. Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY: Association for Computing Machinery; 2019:2623–2631.
- 18. Bedoya AD, Economou-Zavlanos NJ, Goldstein BA, et al. A framework for the oversight and local deployment of safe and high-quality prediction models. J Am Med Inform Assoc. 2022;29:1631–1636.
- 19. Lam SSW, Zaribafzadeh H, Ang BY, et al. Estimation of surgery durations using machine learning methods-a cross-country multi-site collaborative study. Healthcare (Basel). 2022;10:1191.