Abstract
Patient-reported outcomes (PROs) enable providers to identify differences in treatment effectiveness, postoperative recovery, quality of life, and patient satisfaction. By allowing a shift from disease-specific factors to the patient perspective, PROs provide a tailored patient-centric approach to shared decision-making. Artificial intelligence (AI) and machine learning (ML) techniques can facilitate such shared decision-making and improve patient outcomes by accurate prediction of PROs. This article aims to provide a comprehensive review of the use of AI and ML models in predicting PROs following surgery through an overview of common predictive algorithms and modeling techniques, as well as current applications and limitations in the surgical field.
Keywords: artificial intelligence, machine learning, deep learning, surgery, patient-reported outcomes
Introduction
Surgical quality has traditionally been evaluated using objective measures such as mortality, hospital readmission, and postoperative complication rates.1,2 However, these measures are often not the most appropriate indicators of quality for specific procedures and are usually not sensitive enough to detect differences in patient outcomes.2 Patient-reported outcomes (PROs), in which patients self-report their satisfaction and quality of life after a procedure, enable providers to identify differences in treatment effectiveness.3 PROs are also useful for assessing differences in patient outcomes when comparing surgical interventions with similar objective data, such as survival or complication rates.4 Because they represent a shift from disease-specific factors to the patient perspective, PROs provide a tailored, patient-centric approach to shared decision-making.
The ability of providers to predict PROs, notably the effectiveness of an intervention, satisfaction, and postoperative recovery, is the foundation of informed shared decision-making. Predictive modeling using artificial intelligence (AI) and machine learning (ML) techniques can facilitate such shared decision-making and improve patient satisfaction through accurate prediction of perioperative outcomes.5–8 It is critical for surgeons to understand the available approaches for predictive modeling of PROs in order to appraise and demonstrate the effectiveness of a chosen procedure. To this end, this article provides a comprehensive review of the use of AI and ML models in predicting PROs following surgery through an overview of common predictive algorithms and modeling techniques, as well as current applications and limitations in the surgical field.
Modeling Techniques
Model Development
Previous studies have used several ML algorithms to predict PROs, including Cox LASSO, random forest, neural networks, k-means clustering, generalized additive models (GAMs), and gradient boosted models (GBMs). Table 1 briefly describes these algorithms.9–13 A detailed overview of the algorithms, model development, performance metrics, and interpretation has been published previously by our research group.14
Table 1.
Description of Machine Learning Algorithms Used for Predicting Patient-Reported Outcomes.
| Algorithm | Description |
|---|---|
| Neural networks | The algorithm comprises three basic types of layers: input, hidden, and output. The input layer has one node (neuron) per original predictor. Data pass from the input layer through one or more hidden layers, each with any number of nodes; every connection between nodes carries a learned weight, and each hidden node applies a (typically nonlinear) transformation to the weighted sum of its inputs. The transformed values then pass to the output layer, which produces the prediction. |
| Random forest | The algorithm comprises many de-correlated decision trees that jointly determine the final prediction. Each tree is built on a randomly bootstrapped copy of the training data, considering a random subset of predictors at each split, and the aggregated predictions of all trees (majority vote for classification, average for regression) form the final output. |
| Extreme gradient boosting | The algorithm combines the predictive power of many models. It starts with a weak model and sequentially adds models (typically shallow decision trees), each trained to correct the remaining errors of the ensemble built so far (more precisely, fit to the gradient of the loss function). |
| LASSO regression | Regularized regression adds a penalty on coefficient size to the model’s loss function; common penalties are the least absolute shrinkage and selection operator (LASSO), ridge regression, and elastic net (a linear combination of the ridge and LASSO penalties). The LASSO penalty can shrink some coefficients exactly to zero, effectively selecting predictors; Cox LASSO applies this penalty to the Cox proportional hazards model. Besides reducing overfitting and increasing the likelihood of generalizing to new data sets, regularized regression yields coefficients that can be interpreted for feature importance. |
| Generalized additive model | The algorithm is a regression model in which the outcome is modeled as a sum of smooth functions of the predictors, allowing non-linear relationships between predictors and the outcome. GAMs can produce reliable results and accurate variance estimates, particularly in large databases. |
| K-means clustering | An unsupervised learning algorithm that partitions a data set into a prespecified number (k) of clusters, assigning each observation to the cluster with the nearest centroid (mean) so that observations within a cluster are as similar as possible and clusters are as dissimilar as possible. |
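To make the LASSO entry in Table 1 concrete, the sketch below fits an L1-penalized linear regression by cyclic coordinate descent and shows a small coefficient being shrunk exactly to zero. It is a minimal illustration under stated assumptions: plain least-squares LASSO (not the Cox LASSO used in some cited studies), centered predictors and outcome with no intercept, and made-up toy data. The names `soft_threshold` and `lasso_coordinate_descent` are hypothetical helpers, not a library API.

```python
# Minimal LASSO sketch: L1-penalized least squares fit by cyclic coordinate
# descent. Assumes centered predictors and outcome (no intercept); the toy
# data below are hypothetical and chosen only for illustration.

def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * sum((y - Xb)^2) + lam * sum(|b|), one coefficient at a time."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: outcome minus the fit from every predictor except j.
            r_j = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                   for i in range(n)]
            rho = sum(X[i][j] * r_j[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Predictor 1 has a strong effect (2.0); predictor 2 has a tiny one (0.05).
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-2.95, -1.05, 0.95, 3.05]
print(lasso_coordinate_descent(X, y, lam=0.1))   # second coefficient is exactly 0.0
print(lasso_coordinate_descent(X, y, lam=0.01))  # both coefficients nonzero
```

With the larger penalty the weak predictor is dropped entirely, which is the variable-selection behavior that makes LASSO coefficients readable as a feature-importance summary; with the smaller penalty both predictors are retained but shrunk.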
Minimal Clinically Important Difference
The minimal clinically important difference (MCID) is a patient-centered measure that indicates the smallest change in a PRO score that patients perceive as meaningful.15,16 In most studies, the primary outcome was whether a patient would achieve an improvement in PROs at least as large as the MCID at final follow-up. Using this threshold, studies often defined two outcome classes: improved PROs (change at least equal to the MCID) and stable PROs (no clinically meaningful improvement).
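The dichotomization described above can be sketched in a few lines. This is a minimal illustration, assuming a pre/post score pair per patient and a known MCID for the instrument; the scores, the 10-point MCID, and the helper names `achieved_mcid` and `label_cohort` are hypothetical. The `higher_is_better` flag covers instruments such as pain or disability scores, where a decrease is the improvement.

```python
# Sketch: label each patient as achieving (1) or not achieving (0) the MCID.
# All scores and MCID values below are hypothetical.

def achieved_mcid(pre: float, post: float, mcid: float,
                  higher_is_better: bool = True) -> int:
    """Return 1 if the improvement from pre to post is at least the MCID."""
    change = post - pre if higher_is_better else pre - post
    return int(change >= mcid)


def label_cohort(pairs, mcid, higher_is_better=True):
    """Apply the MCID rule to a list of (pre, post) score pairs."""
    return [achieved_mcid(pre, post, mcid, higher_is_better) for pre, post in pairs]


# Hypothetical 0-100 satisfaction scores (pre, post) with an MCID of 10 points.
scores = [(50, 65), (60, 62), (70, 55), (40, 50)]
print(label_cohort(scores, mcid=10))  # [1, 0, 0, 1]

# For a disability index where lower is better, a 20-point drop counts as improvement.
print(label_cohort([(60, 40)], mcid=15, higher_is_better=False))  # [1]
```

The resulting binary labels are what a classifier is then trained to predict from preoperative data.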
Applications in Surgery
Surgical Oncology
In oncologic surgery, Rossi et al17 used ML to predict post-discharge complications after lung and gastrointestinal cancer surgery via telemonitoring of PROs and patient-generated health data (PGHD). Post-discharge complications are known to impose a significant cost and psychological burden on patients and providers.17 The researchers identified PROs as a useful tool for predicting post-surgical complications and aimed to use ML algorithms with PRO inputs to identify patients at risk of poor outcomes. ML algorithms were trained on data from 52 patients, including clinical variables, PROs (MD Anderson Symptom Inventory [MDASI]), and PGHD (VivoFit). Using PROs and PGHD, their ML model predicted postoperative complications with an area under the receiver operating characteristic curve (AUC) of .74.17 The models also showed that both a decrease and a large increase in physical activity were associated with a higher risk of complications after surgery. These findings may help clinicians monitor patients and stratify their risk of developing complications after oncologic surgery based on PROs.
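Because AUC is the main performance metric throughout this review, the sketch below shows what the number means: the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. An AUC of .74 therefore means the model ranks such a pair correctly about 74% of the time. The labels and scores are hypothetical, and this pairwise (Mann-Whitney) formulation is quadratic in sample size, so it is meant for illustration rather than large datasets.

```python
# Sketch: AUC computed as the Mann-Whitney pairwise ranking probability.
# The labels and predicted risks below are hypothetical.

def auc_mann_whitney(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = complication, 0 = no complication; scores are predicted risks.
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(auc_mann_whitney(y, s), 3))  # 0.778 (7 of 9 pairs ranked correctly)
```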
Breast Reconstruction
PROs following breast reconstruction, such as satisfaction with breasts, are a vital part of measuring a patient’s quality of life and psychological well-being. Decision-making regarding breast reconstruction currently relies on population-based outcomes and provider preferences. In recent work,18 our group trained three ML algorithms using data from 1921 women in North America who underwent mastectomy and breast reconstruction to predict PROs at 1 year postoperatively. Changes in satisfaction with breasts were measured using the validated BREAST-Q instrument. All three ML algorithms predicted increased or decreased satisfaction with breasts with high accuracy in both the training (.81 [range = .73–.83]) and validation (.83 [range = .81–.84]) datasets, and achieved high AUCs of .86 (range = .83–.89) and .84 (range = .78–.85) in the training and validation datasets, respectively.18 At 1-year follow-up, 30% of patients in the study reported dissatisfaction with their reconstructive outcome. These patients could potentially have been identified by the ML algorithms before treatment and individually counseled on alternative reconstructive options.18 These findings may provide physicians with a tool to better inform patient-centered decision-making and offer individualized care for patients undergoing breast reconstruction.
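The study above reports both accuracy and AUC, which answer different questions. Accuracy depends on the probability cutoff used to turn predictions into classes, whereas AUC depends only on how patients are ranked. The sketch below, using hypothetical outcomes and predicted probabilities, shows the same predictions yielding different accuracies at two cutoffs; `accuracy_at_threshold` is a hypothetical helper, and the 0.5 default cutoff is an assumption, not a value taken from the cited study.

```python
# Sketch: accuracy changes with the classification threshold.
# Outcomes (1 = increased satisfaction) and probabilities are hypothetical.

def accuracy_at_threshold(y_true, probs, threshold=0.5):
    """Share of patients whose thresholded prediction matches the observed class."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    preds = [int(p >= threshold) for p in probs]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


y_true = [1, 0, 1, 0, 1]
probs = [0.8, 0.3, 0.45, 0.6, 0.9]
print(accuracy_at_threshold(y_true, probs, 0.5))  # 0.6
print(accuracy_at_threshold(y_true, probs, 0.4))  # 0.8
```

Moving the cutoff changes which patients are labeled positive, but not the ordering of their predicted probabilities, so the AUC of these predictions stays the same at either cutoff.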
In a subsequent study, our group evaluated the models in a cohort of 1553 women after mastectomy and breast reconstruction with 2 years of follow-up.19 Of the study participants, 45.2% reported increased satisfaction with the appearance of their breasts, while 27.2% reported dissatisfaction.19 The three ML models were trained, tested, and validated on these additional data and achieved similarly high performance. The models predicted both increased and decreased satisfaction with surgical outcomes with high accuracy (AUCs = .86–.87 and .84–.85, respectively).19 These findings demonstrate that ML can accurately predict long-term PROs after breast reconstruction.
Orthopedic Surgery
Huber et al20 sought to support shared decision-making by predicting PROs after hip and knee arthroplasty. The group compared eight ML models trained on National Health Service (NHS) data to predict PROs following hip or knee replacement. Symptomatic improvement was recorded with the EQ-5D-3L visual analogue scale (VAS) and the Oxford Hip and Knee Scores (Q scores). The most important predictors of PROs in hip and knee arthroplasty were preoperative VAS, preoperative Q score, and specific instrument dimensions. Extreme gradient boosting performed best, achieving AUCs of .87 (VAS) and .78 (Q score) for hip replacement and .86 (VAS) and .70 (Q score) for knee arthroplasty.20 The authors also compared their ML models with the hip prediction tool used by the NHS, which is a linear regression model, and found that supervised ML outperformed this linear model.20 In a clinical setting, such models could encourage shared decision-making by providing patients with accurate, individualized predictions of outcomes that affect their quality of life, reducing providers’ reliance on population-based outcomes.
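Table 1 describes extreme gradient boosting as sequentially training models that correct the errors of those before them. The sketch below illustrates that core idea with plain gradient boosting of one-split regression trees (stumps) under squared-error loss, where the negative gradient is simply the residual, so each new stump is fit to the ensemble’s current errors. It is a simplification under stated assumptions: XGBoost itself adds regularization, second-order gradient information, and deeper trees, and the single-feature toy data are hypothetical, not NHS data. The helper names are illustrative.

```python
# Sketch: gradient boosting with regression stumps and squared-error loss.
# A simplified relative of XGBoost; toy data are hypothetical.

def fit_stump(x, r):
    """Find the split on 1-D feature x that best fits residuals r (least squares)."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean (a weak model), then add shrunken stumps fit to residuals."""
    base = sum(y) / len(y)
    stumps = []

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    for _ in range(n_rounds):
        # For squared-error loss the negative gradient equals the residual.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


# A step-shaped relationship that a single straight line cannot fit exactly.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
print(round(model(2), 2), round(model(5), 2))  # 1.01 4.99
```

Each round removes a fixed fraction (the learning rate) of the remaining error, so predictions approach the true step values geometrically; threshold effects like this are one reason tree ensembles can outperform a linear model.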
In another application, Bini et al21 conducted a prospective trial that used wearable sensor data to predict 6-week PROs after joint replacement. The group used ML to identify patients at higher risk of unsuccessful recovery after arthroplasty.21 Data were collected from 15 patients using three different trackers from 4 weeks before to 6 weeks after surgery, along with PROs before and after surgery. Three million data points were collected, and the three sets of features with the highest importance scores relative to PROs were identified. The k-means clustering algorithm helped accurately predict 6-week PRO data as early as 11 days after arthroplasty.21
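The k-means algorithm used in this study can be sketched with Lloyd’s classic procedure, which alternates between assigning each observation to its nearest centroid and moving each centroid to the mean of its members. This is a minimal illustration, assuming two hypothetical standardized wearable-derived features per patient and a naive initialization; it does not reproduce the features, preprocessing, or cluster count used by Bini et al.

```python
# Sketch: k-means clustering via Lloyd's algorithm.
# Points are hypothetical (feature_1, feature_2) pairs per patient.
import math


def kmeans(points, k, n_iter=100):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    # Naive initialization with the first k points; practical implementations
    # use k-means++ seeding and several random restarts instead.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.1), (4.9, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because k-means is unsupervised, the clusters themselves carry no outcome label; linking a cluster to good or poor recovery requires comparing it against observed PROs.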
While models have been developed to predict PROs based on objective data, ML algorithms have also been developed to predict outcomes based on PRO data. Martin et al22 used ML to predict subjective failure of anterior cruciate ligament (ACL) reconstruction and to identify PRO-based risk factors associated with poor outcomes after ACL reconstruction.22 The models were trained on the Norwegian Knee Ligament Register (NKLR) with the aim of developing a clinically useful tool for identifying patients at risk of subjective failure of ACL reconstruction. The 2-year follow-up questionnaire was completed by 11,630 patients, of whom 22% reported subjective failure. Four ML models (Cox LASSO, survival random forest, GAM, and GBM) were trained on these data and produced AUCs of .67–.68. The GAM achieved the highest AUC (.68; 95% CI: .64–.71) and was used to create an in-clinic, patient-specific calculator to predict subjective failure of ACL reconstruction.22 The product of this study was an ML model that adequately predicted PROs after ACL reconstruction, helping to inform patients about their individual expectations before surgery.
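For readers unfamiliar with GAMs, the textbook form of a GAM for a binary outcome with a logit link is shown below. Whether Martin et al’s model used exactly this link and structure is an assumption, and its specific predictors and fitted functions are not reproduced here. Each $f_j$ is a smooth function of predictor $x_j$ estimated from the data, which is how a GAM captures non-linear relationships while remaining additive; a patient-specific calculator evaluates the linear predictor $\eta$ for an individual and converts it to a predicted probability $\hat{p}$.

```latex
\eta(\mathbf{x}) = \beta_0 + \sum_{j=1}^{m} f_j(x_j),
\qquad
\hat{p}(\mathbf{x}) = \frac{1}{1 + e^{-\eta(\mathbf{x})}}
```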
Spine Surgery
In spine surgery, ML models have been developed to predict PROs after lumbar discectomy. Although discectomy is known to relieve pain from lumbar disc herniation, 20–30% of patients still experience poor outcomes after surgery.23 Staartjes et al23 used ML to identify patients at risk of poor outcomes. The study included 422 patients whose leg pain severity, back pain severity, and Oswestry Disability Index (ODI) scores were recorded at 1 year postoperatively. Neural network models predicted whether patients would achieve the MCID in leg pain, back pain, and ODI with accuracies of 85%, 87%, and 75% and AUCs of .87, .90, and .84, respectively.23 Regression models were also tested and performed worse than the ML models. The ML models developed in this study may help providers counsel patients in the perioperative setting about the probability of poor outcomes after lumbar discectomy. This type of discussion fosters personalized patient care.
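The neural network description in Table 1 can be made concrete with a forward pass through a network with one hidden layer that outputs a probability of achieving the MCID. This is a minimal sketch under stated assumptions: the three standardized input predictors, all weights, and all biases are hypothetical and chosen by hand, whereas real models (including the deeper networks in the cited study) learn their weights from training data by backpropagation, which is not shown.

```python
# Sketch: forward pass of a one-hidden-layer neural network for a binary
# outcome (achieving the MCID). Inputs, weights, and biases are hypothetical.
import math


def relu(v: float) -> float:
    """Nonlinear activation: pass positive values, zero out negative ones."""
    return max(0.0, v)


def sigmoid(v: float) -> float:
    """Map the output node's weighted sum to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W1, b1, w2, b2):
    """Input layer -> weighted sums + ReLU (hidden layer) -> weighted sum + sigmoid."""
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


x = [0.5, -1.0, 0.2]                       # three standardized predictors
W1 = [[0.4, -0.6, 0.1], [-0.3, 0.8, 0.5]]  # input-to-hidden weights (2 hidden nodes)
b1 = [0.1, -0.2]                           # hidden-node biases
w2 = [1.2, -0.7]                           # hidden-to-output weights
b2 = 0.05                                  # output bias
p = forward(x, W1, b1, w2, b2)
print(round(p, 2))  # 0.76
```

Here the second hidden node’s weighted sum is negative, so ReLU silences it and only the first node contributes to the output; this per-node nonlinearity is what lets stacked layers represent relationships that a single regression equation cannot.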
Limitations
There are several potential limitations to consider in studies that use ML to predict PROs. First, most studies relied on self-reported survey data, which can introduce recall bias and imprecision. Second, low response rates in previous studies may indicate selection bias: patients who are likely to report positive PROs, such as satisfaction after surgery, may be more likely to respond to the questionnaire. Non-responders may differ systematically from respondents, limiting the models’ validity. Third, the studies were non-randomized; the procedure of interest was chosen based on patient and surgeon preference, which may limit generalizability and predictive accuracy. Finally, most studies included patient populations with limited racial and ethnic diversity, which may introduce bias into the models and limit their predictive performance in racially and ethnically diverse patients.
Conclusion
There has recently been a rapid increase in the use of AI and ML models in predicting PROs after surgery. Studies evaluating AI-driven predictive modeling techniques demonstrated high accuracy in predicting both short- and long-term PROs. These models have the potential to improve surgical outcomes by informing patients about their individual expectations and encouraging patient-centered decision-making.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Butler is a consultant for Allergan Inc.
References
1. McCormick JD, Werner BC, Shimer AL. Patient-reported outcome measures in spine surgery. J Am Acad Orthop Surg. 2013;21(2):99–107.
2. Waljee J, McGlinn EP, Sears ED, Chung KC. Patient expectations and patient-reported outcomes in surgery: A systematic review. Surgery. 2014;155(5):799–808.
3. Black N. Patient reported outcome measures could help transform healthcare. BMJ. 2013;346:f167.
4. Pusic AL, Lemaine V, Klassen AF, Scott AM, Cano SJ. Patient-reported outcome measures in plastic surgery: Use and interpretation in evidence-based medicine. Plast Reconstr Surg. 2011;127(3):1361–1367.
5. Hassan A, Lu S, Asaad M, et al. Novel machine learning approach for prediction of hernia recurrence, surgical complications, and 30-day readmission following abdominal wall reconstruction. J Am Coll Surg. 2022;234(5):918–927.
6. Hassan A, Tamirisa N, Singh P, Offodile AC, Butler CE. A novel support vector machine to predict sentinel lymph node status in elderly patients with breast cancer. J Clin Oncol. 2022;40:1560.
7. Hassan AM, Biaggi AP, Asaad M, et al. Development and assessment of machine learning models for individualized risk assessment of mastectomy skin flap necrosis. Ann Surg. 2022. doi:10.1097/SLA.0000000000005386.
8. Hassan AM, Biaggi AP, Asaad M, et al. Artificial intelligence modeling to predict periprosthetic infection and explantation following implant-based reconstruction. Plast Reconstr Surg. 2022.
9. Sohn I, Kim J, Jung SH, Park C. Gradient lasso for Cox proportional hazards model. Bioinformatics. 2009;25(14):1775–1781.
10. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.
11. Mullin S, Zola J, Lee R, et al. Longitudinal K-means approaches to clustering and analyzing EHR opioid use trajectories for clinical subtypes. J Biomed Inform. 2021;122:103889.
12. Clements MS, Armstrong BK, Moolgavkar SH. Lung cancer rate predictions using generalized additive models. Biostatistics. 2005;6(4):576–589.
13. Ji GW, Jiao CY, Xu ZG, Li XC, Wang K, Wang XH. Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma. BMC Cancer. 2022;22(1):258.
14. Hassan A, Rajesh A, Asaad M, et al. A surgeon’s guide to artificial intelligence-driven predictive models. Am Surg. 2022;19:31348221103648.
15. McGlothlin AE, Lewis RJ. Minimal clinically important difference: defining what really matters to patients. JAMA. 2014;312(13):1342–1343.
16. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30. doi:10.48550/arXiv.1705.07874.
17. Rossi LA, Melstrom LG, Fong Y, Sun V. Predicting post-discharge cancer surgery complications via telemonitoring of patient-reported outcomes and patient-generated health data. J Surg Oncol. 2021;123(5):1345–1352.
18. Pfob A, Mehrara BJ, Nelson JA, Wilkins EG, Pusic AL, Sidey-Gibbons C. Towards patient-centered decision-making in breast cancer surgery: machine learning to predict individual patient-reported outcomes at 1-year follow-up. Ann Surg. 2021. doi:10.1097/SLA.0000000000004862.
19. Pfob A, Mehrara BJ, Nelson JA, Wilkins EG, Pusic AL, Sidey-Gibbons C. Machine learning to predict individual patient-reported outcomes at 2-year follow-up for women undergoing cancer-related mastectomy and breast re-construction (INSPiRED-001). Breast. 2021;60:111–122.
20. Huber M, Kurz C, Leidl R. Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning. BMC Med Inform Decis Mak. 2019;19(1):3.
21. Bini SA, Shah RF, Bendich I, Patterson JT, Hwang KM, Zaid MB. Machine learning algorithms can use wearable sensor data to accurately predict six-week patient-reported outcome scores following joint replacement in a prospective trial. J Arthroplasty. 2019;34(10):2242–2247.
22. Martin KWS, Pareek A, Persson A, et al. Predicting subjective failure of ACL reconstruction: A machine learning analysis of the Norwegian Knee Ligament Register and patient reported outcomes. J ISAKOS. 2021;7:1–9.
23. Staartjes VE, de Wispelaere MP, Vandertop WP, Schröder ML. Deep learning-based preoperative predictive analytics for patient-reported outcomes following lumbar discectomy: feasibility of center-specific modeling. Spine J. 2019;19(5):853–861.
