Abstract
Background:
As the prevalence of hip osteoarthritis increases, the number of total hip arthroplasty (THA) procedures performed is also projected to increase. Accurately risk-stratifying patients who undergo THA would be of great utility, given the significant cost and morbidity associated with developing perioperative complications. We aim to develop a novel machine learning (ML)-based ensemble algorithm for the prediction of major complications after THA, as well as compare its performance against standard benchmark ML methods.
Methods:
This is a retrospective cohort study of 89,986 adults who underwent primary THA at any California-licensed hospital between 2015 and 2017. The primary outcome was major complications (eg infection, venous thromboembolism, cardiac complication, pulmonary complication). We developed a model predicting complication risk using AutoPrognosis, an automated ML framework that configures the optimally performing ensemble of ML-based prognostic models. We compared our model with logistic regression and standard benchmark ML models, assessing discrimination and calibration.
Results:
There were 545 patients who had major complications (0.61%). Our novel algorithm was well-calibrated and improved risk prediction compared to logistic regression, as well as outperformed the other four standard benchmark ML algorithms. The variables most important for AutoPrognosis (eg malnutrition, dementia, cancer) differ from those that are most important for logistic regression (eg chronic atherosclerosis, renal failure, chronic obstructive pulmonary disease).
Conclusion:
We report a novel ensemble ML algorithm for the prediction of major complications after THA. It demonstrates superior risk prediction compared to logistic regression and other standard ML benchmark algorithms. By providing accurate prognostic information, this algorithm may facilitate more informed preoperative shared decision-making.
Keywords: machine learning, artificial intelligence, complications, total hip arthroplasty, outcomes
Total hip arthroplasty (THA) is one of the most commonly performed procedures in the United States, with over 500,000 cases performed each year. By 2030, this number is projected to grow to over 600,000 THAs annually due to a growing elderly population, as well as expanding indications for THA that include younger, active patients [1–3]. While most patients do well after THA, a small subset have perioperative complications. Complications with subsequent unplanned readmissions or reoperations are a major driver of increased morbidity and cost. As more THAs are performed on the elderly and those with medical comorbidities, the likelihood of complications will increase as well [4]. Preoperative identification of which patients are at elevated risk of complications would be of great utility, allowing for more extensive counseling and optimization, as well as preventative measures postoperatively.
Risk calculators that use inputs of patient demographic characteristics and comorbidities to predict the risk of complications and mortality after THA have been reported. Unfortunately, many of these tools have poor discrimination and calibration, remain unvalidated, and lack the generalizability necessary to accurately risk-stratify patients preoperatively [2,5,6]. These tools have largely been developed using multivariable regression techniques. Machine learning (ML) methods have grown in popularity in recent years due to their ability to capture complex, nonlinear relationships in large datasets [7]. ML algorithms have been shown to accurately diagnose disease and predict clinical outcomes across multiple medical disciplines, outperforming logistic regression in many cases [8–10]. Yet ML has remained relatively underutilized in orthopedic surgery, and arthroplasty in particular.
AutoPrognosis is a novel ML framework developed by our group that has been specifically tailored for prognostic clinical research. It employs an ensemble of ML methods, as well as traditional statistics, optimizing these into the best-performing pipeline for any given dataset [11]. AutoPrognosis uses a Bayesian optimization algorithm to efficiently configure, for a given dataset, a single well-calibrated ensemble algorithm. This automation allows AutoPrognosis to be applied across diverse datasets and removes the need for clinicians to choose a specific algorithm or tune hyperparameters. AutoPrognosis has been successfully implemented for the prediction of cardiovascular risk, outperforming the Framingham score in patients with diabetes mellitus [12]. Furthermore, AutoPrognosis has been shown to predict the short-term survival of cystic fibrosis patients more accurately than existing risk scores [13].
We primarily aim to build a novel ML-based algorithm with AutoPrognosis for the prediction of major complications after THA. Secondarily, we aim to compare the performance of this novel algorithm against traditional logistic regression and other standard ML models. We hypothesize that AutoPrognosis will demonstrate superior predictive performance to these models, as well as identify novel risk factors for complications after THA.
Materials and Methods
Study Design and Subjects
This study was a retrospective review of the California Office of Statewide Health Planning and Development (OSHPD) database, a mandatory statewide discharge database containing data on all nonfederal hospital admissions in California. We reviewed admissions for primary THA between 2015 and 2017. Patients in this database are assigned a unique record linkage number that allows them to be tracked longitudinally for complications, regardless of whether future admissions are at a different hospital from the one where the index procedure was performed. We included patients 18 years or older who underwent primary THA, identified using the International Classification of Diseases, Tenth Revision (ICD-10) procedure codes from the primary THA coding specification of the performance measure developed by the Centers for Medicare and Medicaid Services (CMS) [14]. These coding specifications exclude patients who underwent a revision or resurfacing procedure, concurrent partial hip arthroplasty, concomitant pelvic or femur fracture, pelvic or lower extremity malignancy, or mechanical complication of hardware.
Outcome and Other Variables
The primary outcome measure was any major complication after the index THA. Complications were also identified using the CMS performance measure ICD-10 coding specifications. These include acute myocardial infarction, pneumonia, sepsis, pulmonary embolism, surgical site bleeding, mechanical complications, and periprosthetic joint infection or wound infection [14]. Myocardial infarction, pneumonia, and sepsis must occur during the index admission or within 7 days of the start of the index admission. Pulmonary embolism must occur during the index admission or within 30 days of admission. Mechanical complications, surgical site bleeding, and periprosthetic/wound infection must occur during the index admission or within 90 days.
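The follow-up windows described above can be sketched as a small lookup. This is a minimal illustration, not the CMS coding specification itself: the dictionary keys and the function name `counts_as_major_complication` are hypothetical, and counting every window in days from the start of the index admission is an assumption made for simplicity.

```python
# Hypothetical lookup of follow-up windows, in days from the start of the
# index admission, taken from the outcome definition in the text.
WINDOW_DAYS = {
    "acute_myocardial_infarction": 7,
    "pneumonia": 7,
    "sepsis": 7,
    "pulmonary_embolism": 30,
    "mechanical_complication": 90,
    "surgical_site_bleeding": 90,
    "periprosthetic_or_wound_infection": 90,
}


def counts_as_major_complication(kind, days_from_admission, during_index_admission=False):
    """Return True if an event falls inside the window for its complication type."""
    if kind not in WINDOW_DAYS:
        raise ValueError(f"unknown complication type: {kind}")
    # Events during the index admission always count, whatever the day.
    if during_index_admission:
        return True
    return 0 <= days_from_admission <= WINDOW_DAYS[kind]
```

For example, a pneumonia diagnosed on day 10 after discharge would not count under this sketch, whereas a pulmonary embolism on day 10 would.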
Explanatory features collected for the cohort include patient demographic characteristics (eg age, sex, body mass index, insurance type), patient medical comorbidities using the CMS Condition Categories as defined by the CMS Hierarchical Condition Category (HCC) risk adjustment model (eg malignancy, coronary atherosclerosis, chronic obstructive pulmonary disease, renal failure, protein-calorie malnutrition, diabetes, dementia), and hospital THA volume. The ICD-9 and ICD-10 codes included for select Condition Categories are shown in Supplementary Table 1.
Statistical Analysis
An ML-based risk prediction model for major complications after THA was developed using AutoPrognosis. AutoPrognosis is an algorithmic framework for automating the design of ML-based clinical prognostic models, freeing investigators without an in-depth knowledge of ML from having to choose one particular prognostic model [11]. A schematic illustration of AutoPrognosis is provided in Figure 1.
Fig. 1.
Schematic representation of AutoPrognosis workflow.
Given the input variables and complication outcomes, AutoPrognosis employs an advanced Bayesian optimization algorithm to design a prognostic model as a weighted ensemble of machine learning pipelines. Each pipeline comprises a choice among 16 underlying classification methods (Table 1) and the corresponding hyperparameters. To train the model, we conducted 100 iterations of the Bayesian optimization. During each iteration, the algorithm explored a new ML pipeline of classification methods and the corresponding hyperparameters. Five-fold stratified cross-validation was performed within the training set (80% of the study population) to evaluate the performance of the ensemble pipeline under consideration. The final model was constructed as a weighted ensemble of 10 ML pipelines (Table 2).
Table 1.
List of Classification Methods Included in AutoPrognosis.
Classification Methods | | |
---|---|---|
Logistic Regression | Random Forest | Gradient Boosting |
XGBoost | AdaBoost | Bagging |
Bernoulli NB | Gaussian NB | Multinomial NB |
Perceptron | Decision Trees | SVM |
LDA | QDA | kNN |
Neural Networks | | |
Table 2.
List of the 10 Pipelines Fitted to the THA Cohort.
Pipeline # | Methods | Hyper-Parameters | Weight |
---|---|---|---|
1 | Logistic Regression | (l2-penalty, 0.292) | 0.165 |
2 | Random Forest | (max_depth = 5, n_estimators = 180) | 0.160 |
3 | Logistic Regression | (l2-penalty, 0.290) | 0.134 |
4 | Logistic Regression | (l2-penalty, 0.129) | 0.110 |
5 | Logistic Regression | (l2-penalty, 0.305) | 0.103 |
6 | Logistic Regression | (l2-penalty, 0.092) | 0.103 |
7 | Gradient Boosting | (max_depth = 5, n_estimators = 150) | 0.096 |
8 | XGBoost | (max_depth = 3, n_estimators = 100) | 0.073 |
9 | Random Forest | (max_depth = 5, n_estimators = 300) | 0.049 |
10 | Logistic Regression | (l2-penalty, 0.104) | 0.008 |
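To make the idea of a weighted ensemble concrete, the sketch below combines per-pipeline predicted probabilities using the weights listed in Table 2. It assumes that the ensemble output is a weighted average of the pipelines' predicted probabilities, which may differ from the exact combination rule used inside AutoPrognosis; the function name `ensemble_risk` is hypothetical.

```python
# Weights of the 10 pipelines as listed in Table 2.
WEIGHTS = [0.165, 0.160, 0.134, 0.110, 0.103, 0.103, 0.096, 0.073, 0.049, 0.008]


def ensemble_risk(pipeline_probs, weights=WEIGHTS):
    """Weighted average of per-pipeline predicted probabilities for one patient."""
    if len(pipeline_probs) != len(weights):
        raise ValueError("need one probability per pipeline")
    # Normalize by the weight total, since the rounded weights sum to about 1.001.
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, pipeline_probs)) / total
```

Because the weights are normalized, the ensemble risk always lies between the lowest and highest pipeline prediction, and the six logistic regression pipelines together account for roughly 62% of the total weight.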
We built five standard ML benchmark models that span different classes of ML modeling approaches: logistic regression (a linear classifier), random forest (a tree-based ensemble classifier), and AdaBoost, gradient boosting machines, and XGBoost (boosting ensemble classifiers) [15–18]. We implemented logistic regression, random forest, AdaBoost, and gradient boosting machines using the scikit-learn Python library [19]. XGBoost was built using the xgboost Python library [18]. The hyperparameters of each model were selected via grid search. For logistic regression, the coefficient for L2 regularization was chosen from a set of values on a logarithmic scale between 1.0 × 10⁻³ and 1.0 × 10³. For the random forest, AdaBoost, and gradient boosting models, the number of trees was chosen from the set {50, 100, 200, 300}. For XGBoost, the number of trees and the maximum depth of each tree were selected from the sets {50, 100, 200, 300} and {2, 3, 4, 5}, respectively. These steps were repeated as post hoc analyses of two cohort subgroups: patients with morbid obesity and patients with diabetes mellitus. These groups were chosen because of their documented association with postoperative complications [20,21]. Furthermore, these subgroups are segments of the THA patient population that are expected to grow significantly [4].
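The hyperparameter grids described above can be enumerated as follows. This is a sketch using only the Python standard library rather than the scikit-learn or xgboost tooling the study used. Reading the stated regularization bounds as 10⁻³ and 10³, using 7 grid points, and the names `log_grid`, `XGB_GRID`, and `grid_search` are all assumptions made for illustration.

```python
import itertools


def log_grid(low_exp=-3, high_exp=3, points=7):
    """Values evenly spaced on a log10 scale between 10**low_exp and 10**high_exp."""
    step = (high_exp - low_exp) / (points - 1)
    return [10 ** (low_exp + i * step) for i in range(points)]


N_TREES = [50, 100, 200, 300]
MAX_DEPTH = [2, 3, 4, 5]

# Every (number of trees, maximum depth) pair a grid search over XGBoost would try.
XGB_GRID = [
    {"n_estimators": n, "max_depth": d}
    for n, d in itertools.product(N_TREES, MAX_DEPTH)
]


def grid_search(candidates, score_fn):
    """Return the candidate with the highest score; score_fn is supplied by the caller."""
    return max(candidates, key=score_fn)
```

In practice `score_fn` would be a cross-validated metric such as AUROC for a model fitted with the candidate settings. The XGBoost grid here has 16 combinations, so an exhaustive search remains inexpensive.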
Performance Metrics
In order to avoid overfitting, we evaluated the discrimination and calibration performances of the prognostic models using five-fold stratified cross-validation. In every cross-validation fold, the training cohort (80% of the study population) was used to derive the AutoPrognosis model and the ML benchmark models. A holdout testing cohort (20% of the study population) was used for performance evaluation. We report the mean and 95% confidence intervals for all models.
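With an outcome as rare as the one studied here, stratification keeps the complication rate similar in every fold. The following is a minimal standard-library sketch of five-fold stratified splitting, not the code used in the study; the function names and the fixed random seed are assumptions.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping the outcome rate similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for outcome in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin so each fold gets an equal share.
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, test
```

Each split trains on about 80% of the samples and tests on the remaining 20%, matching the proportions described above.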
Discrimination determines how well a model distinguishes patients who developed postoperative complications from those who did not. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Equivalent to the concordance statistic or C-statistic, AUROC represents the probability that a randomly selected patient who experienced an outcome was assigned a higher risk by the model than a patient who did not experience the outcome. An AUROC of 0.5 indicates random prediction, while an AUROC of 1 indicates perfect prediction [4,22]. A value greater than 0.9 is considered to have high accuracy, 0.7–0.9 indicates moderate accuracy, and 0.5–0.7 indicates low accuracy [23].
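The probabilistic definition of AUROC given above translates directly into code. The sketch below compares every patient who had the outcome with every patient who did not, counting ties as one half. It is meant as an illustration of the definition; its quadratic running time makes it unsuitable for a cohort of this size, and the function name `auroc` is an assumption.

```python
def auroc(y_true, y_score):
    """Probability that a random positive case is scored above a random negative case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A model that assigns every patient the same risk scores 0.5 under this definition, and a model that ranks every patient with a complication above every patient without one scores 1.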
Calibration signifies the agreement between the model’s predictions and the observed outcomes in the study population. This is assessed graphically using calibration plots, which are plots of observed versus predicted outcomes over equally sized deciles of risk [22]. The Brier score–the mean squared error between the observed values and the predicted probabilities–is a measure of discrimination and calibration. Brier scores closer to zero indicate a low deviation of the model’s predicted values from the observed probability, and therefore, a more accurate model [4].
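Both calibration measures described above are simple to compute. The sketch below gives the Brier score and the points of a decile calibration plot; it is an illustration under the assumption that risk groups are formed by sorting patients by predicted risk and cutting them into near-equal groups, and the function names are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between binary outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true, y_prob, n_bins=10):
    """Sort by predicted risk, cut into n_bins near-equal groups, and return
    (mean predicted risk, observed event rate) for each non-empty group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            bins.append((mean_pred, observed))
    return bins
```

Plotting the second element of each pair against the first gives the calibration plot; points on the diagonal indicate agreement between predicted and observed risk. Because the Brier score depends on how common the outcome is, scores are best compared between models evaluated on the same cohort.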
Feature Importance
In order to help interpret predictions issued by AutoPrognosis, we utilize the partial dependence function described by Friedman to measure the importance of an individual feature by assessing the average effect on predicted risks when its value is altered [17]. More specifically, let $x_s$ be a chosen target feature in the set of input features $x$, let $x_c$ be its complement, i.e., $x_s \cup x_c = x$, and let $f(x)$ be the risk predicted by our trained model. Then, we define the feature importance score for an individual feature $x_s$ by averaging $f(x_s = 1, x_c) - f(x_s = 0, x_c)$ for binary features and $f(x_s = x_s^{\max}, x_c) - f(x_s = x_s^{\min}, x_c)$ for continuous features, where $x_s^{\max}$ and $x_s^{\min}$ are the maximum and minimum of feature $x_s$.
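The feature importance score can be illustrated with a short sketch. It assumes that the score is the average difference in predicted risk when a binary feature is set to 1 versus 0, or when a continuous feature is set to its maximum versus its minimum, with all other features left at their observed values. The `predict` callable, the dictionary-based patient rows, and the function names are hypothetical stand-ins for a trained model and its inputs.

```python
def binary_feature_importance(predict, rows, feature):
    """Average change in predicted risk when a binary feature is switched from 0 to 1."""
    diffs = []
    for row in rows:
        on = dict(row, **{feature: 1})
        off = dict(row, **{feature: 0})
        diffs.append(predict(on) - predict(off))
    return sum(diffs) / len(diffs)


def continuous_feature_importance(predict, rows, feature):
    """Average change in predicted risk when a continuous feature moves from its minimum to its maximum."""
    lo = min(row[feature] for row in rows)
    hi = max(row[feature] for row in rows)
    diffs = [
        predict(dict(row, **{feature: hi})) - predict(dict(row, **{feature: lo}))
        for row in rows
    ]
    return sum(diffs) / len(diffs)
```

A positive score means the feature raises predicted risk on average, and a negative score means it lowers predicted risk, which is how the signed values in the feature importance table can be read.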
Results
Baseline Characteristics
A total of 89,986 patients met the inclusion criteria for this study. The median age of the cohort was 67 years, with 39,780 males (44%). The majority of patients (54%) were insured through Medicare. The most common medical comorbidity present in the cohort was diabetes mellitus (13%). A complete description of the cohort demographics is provided in Table 3. There were 545 patients who suffered at least one major complication (0.61%). The most common complications were pneumonia, sepsis, acute myocardial infarction, and pulmonary embolism (Table 4).
Table 3.
Baseline Cohort Statistics.
Variable | All Patients (n = 89,986) |
---|---|
Demographics | |
Age (years) | Median (IQR) 67 (59–74) |
Male | Number (%) 39,780 (44.21) |
Race | |
White | 75,099 (83.46) |
Black | 4979 (5.53) |
Asian/Pacific Islander | 3148 (3.50) |
Native American | 233 (0.26) |
Other | 5591 (6.21) |
Unknown | 936 (1.04) |
Ethnicity | |
Non-Hispanic | 80,724 (89.71) |
Hispanic | 8082 (8.98) |
Unknown | 1180 (1.31) |
Insurance | |
Medicare | 48,803 (54.23) |
Private | 33,196 (36.89) |
Medi-Cal | 5530 (6.15) |
Workers’ compensation | 1049 (1.17) |
Other | 1408 (1.56) |
Medical comorbidities | |
Diabetes mellitus | 11,449 (12.72) |
Metastatic cancer or acute leukemia | 192 (0.21) |
Other major cancer | 1284 (1.43) |
Respiratory/heart/digestive/urinary/other neoplasms | 654 (0.73) |
Protein-calorie malnutrition | 472 (0.52) |
Morbid obesity | 5526 (6.14) |
Rheumatoid arthritis or inflammatory connective tissue disease | 3635 (4.04) |
Osteoarthritis | 4751 (5.28) |
Osteoporosis or other bone/cartilage disorders | 8075 (8.97) |
Dementia | 803 (0.89) |
Major psychiatric disorder | 3691 (4.10) |
Hemiplegia, paraplegia, paralysis, or functional disability | 161 (0.18) |
Chronic atherosclerosis/angina | 7380 (8.20) |
COPD | 4856 (5.40) |
Renal failure | 6109 (6.79) |
Decubitus ulcer | 68 (0.08) |
Vertebral fracture | 30 (0.03) |
Skeletal disorders | 1878 (2.09) |
Post-traumatic osteoarthritis | 56 (0.06) |
Hospital volumeᵃ | Median (IQR) 1493 (990–2393) |
IQR, interquartile range; COPD, chronic obstructive pulmonary disease.
ᵃ Cases of total hip and knee arthroplasties performed between 2015 and 2017.
Table 4.
Postoperative Complications.
Complications | All Patients (n = 89,986) Number (%) |
---|---|
At least one complication | 545 (0.61) |
Pneumonia | 242 (0.27) |
Sepsis | 159 (0.18) |
Acute myocardial infarction | 85 (0.09) |
Pulmonary embolism | 56 (0.06) |
Mechanical complications | 55 (0.06) |
Periprosthetic or wound infection | 13 (0.01) |
Surgical site bleeding | 6 (0.01) |
IQR, interquartile range.
Model Performances
Algorithms predicting the risk of major complications after THA were built with AutoPrognosis, logistic regression, and four standard ML models (XGBoost, Gradient Boosting, AdaBoost, Random Forest). In the overall cohort, the AutoPrognosis model demonstrates higher discrimination (AUROC: 0.732 ± 0.01) compared to logistic regression (AUROC: 0.644 ± 0.02). It also outperforms the four other standard benchmark ML models (Table 5). The AutoPrognosis model is well-calibrated with a Brier score of 0.007 ± 0.002. The logistic regression and standard ML models are similarly well-calibrated (Table 6). A calibration plot of AutoPrognosis and logistic regression is depicted in Figure 2. AutoPrognosis has the best performance of all tested models in the overall cohort, as well as in the morbid obesity and diabetes cohorts. The performance gain of the AutoPrognosis model over the best-performing standard model in the overall cohort was comparable to the gains in the morbid obesity and diabetes subgroups (Table 7).
Table 5.
Discrimination of AutoPrognosis and Standard Models.
Model | AUROC (95% Confidence Interval) |
---|---|
AutoPrognosis | 0.732 ± 0.01 (0.720–0.745) |
Logistic Regression | 0.644 ± 0.02 (0.628–0.659) |
XGBoost | 0.635 ± 0.03 (0.612–0.658) |
Gradient Boosting | 0.715 ± 0.02 (0.699–0.732) |
AdaBoost | 0.710 ± 0.02 (0.695–0.726) |
Random Forest | 0.574 ± 0.02 (0.557–0.591) |
Table 6.
Brier Scores for Model Calibration.
Model | Brier Score (95% Confidence Interval) |
---|---|
AutoPrognosis | 0.0070 ± 0.0021 (0.0052–0.0088) |
Logistic Regression | 0.0060 ± 0.0001 (0.0059–0.0061) |
XGBoost | 0.0063 ± 0.0001 (0.0062–0.0063) |
Gradient Boosting | 0.0084 ± 0.0034 (0.0054–0.0114) |
AdaBoost | 0.0068 ± 0.0001 (0.0067–0.0069) |
Random Forest | 0.0070 ± 0.0002 (0.0068–0.0072) |
Fig. 2.
Calibration plot of AutoPrognosis and logistic regression models.
Table 7.
Discrimination of AutoPrognosis and Standard Models in Morbid Obesity and Diabetes Subgroups.
Model | Morbid Obesity (n = 5526) | Diabetes (n = 11,449) |
---|---|---|
AutoPrognosis | 0.661 ± 0.07 | 0.722 ± 0.069 |
Logistic Regression | 0.572 ± 0.02 | 0.622 ± 0.030 |
XGBoost | 0.614 ± 0.02 | 0.608 ± 0.027 |
Gradient Boosting | 0.650 ± 0.03 | 0.705 ± 0.041 |
AdaBoost | 0.658 ± 0.05 | 0.721 ± 0.062 |
Random Forest | 0.549 ± 0.03 | 0.570 ± 0.017 |
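The gain of AutoPrognosis over the best-performing standard model in each cohort can be recomputed from the mean AUROC values reported in Tables 5 and 7. The sketch below does only that arithmetic. It ignores the reported uncertainty intervals, and the dictionary layout and function name are assumptions.

```python
# Mean AUROC values copied from Table 5 (overall) and Table 7 (subgroups).
AUROC = {
    "overall": {
        "AutoPrognosis": 0.732, "Logistic Regression": 0.644, "XGBoost": 0.635,
        "Gradient Boosting": 0.715, "AdaBoost": 0.710, "Random Forest": 0.574,
    },
    "morbid_obesity": {
        "AutoPrognosis": 0.661, "Logistic Regression": 0.572, "XGBoost": 0.614,
        "Gradient Boosting": 0.650, "AdaBoost": 0.658, "Random Forest": 0.549,
    },
    "diabetes": {
        "AutoPrognosis": 0.722, "Logistic Regression": 0.622, "XGBoost": 0.608,
        "Gradient Boosting": 0.705, "AdaBoost": 0.721, "Random Forest": 0.570,
    },
}


def gain_over_best_standard(scores, reference="AutoPrognosis"):
    """Return the best non-reference model and the reference model's AUROC gain over it."""
    others = [(name, s) for name, s in scores.items() if name != reference]
    best_name, best_score = max(others, key=lambda pair: pair[1])
    return best_name, round(scores[reference] - best_score, 3)
```

Calling `gain_over_best_standard(AUROC[cohort])` for each cohort returns the closest competitor and the difference in mean AUROC.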
Relative Feature Importance
The relative importance of each variable to model performance for AutoPrognosis and logistic regression is displayed in Table 8 and Figure 3. The five features most important for risk prediction in AutoPrognosis are malnutrition, dementia, malignancy, chronic obstructive pulmonary disease (COPD), and Medicare coverage. The variables that are most important for AutoPrognosis differ from those that are most important for logistic regression.
Table 8.
Relative Feature Importance for Complications After THA.
Feature | Rank in AutoPrognosis (Rank in Logistic Regression) | Change to Risk Prediction |
---|---|---|
Binary features | | |
Malnutrition | 1 (5) | 6.01 × 10⁻² |
Dementia | 2 (10) | 1.51 × 10⁻² |
Cancer | 3 (13) | 8.08 × 10⁻³ |
COPD | 4 (3) | 6.12 × 10⁻³ |
Medicare | 5 (8) | −5.32 × 10⁻³ |
Chronic atherosclerosis | 6 (1) | 5.22 × 10⁻³ |
Renal failure | 7 (2) | 5.03 × 10⁻³ |
Other insurance | 8 (18) | −3.21 × 10⁻³ |
Osteoarthritis | 9 (11) | 2.82 × 10⁻³ |
Workers’ compensation | 10 (20) | −2.75 × 10⁻³ |
Skeletal disorders | 11 (15) | 2.65 × 10⁻³ |
Medi-Cal | 12 (7) | −2.39 × 10⁻³ |
Rheumatoid arthritis | 13 (23) | −2.15 × 10⁻³ |
Diabetes mellitus | 14 (6) | 1.97 × 10⁻³ |
Morbid obesity | 15 (12) | 1.94 × 10⁻³ |
Continuous features | | |
Hospital volume | 1 (2) | −4.70 × 10⁻³ |
Age | 2 (1) | 2.63 × 10⁻³ |
COPD, chronic obstructive pulmonary disease.
Fig. 3.
A. Relative binary feature importance toward AutoPrognosis (AP) and logistic regression (LR) model performances for the prediction of complications after THA. B. Relative continuous feature importance toward AutoPrognosis (AP) and logistic regression (LR) model performances for the prediction of complications after THA.
Discussion
Lower extremity total joint arthroplasty comprises the largest procedural expenditure for Medicare [4,24]. The prevalence of THA is projected to increase substantially in the coming decades. Given the significant cost and morbidity caused by postoperative complications and unplanned readmissions, accurate prediction of a patient’s risk of developing complications is valuable. Accurate prediction tools can be used to inform preoperative counseling and management decisions. This prognostic information also presents an opportunity to address potentially modifiable risk factors.
Using a dataset of over 1.4 million cases and 2805 different procedure codes, the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) developed a universal surgical risk calculator spanning multiple surgical specialties [25]. CMS specifically recommends this calculator for preoperative counseling [26]. However, only 12% of the patients used to develop this calculator underwent orthopedic procedures, raising questions about its applicability to orthopedic surgery. Indeed, the ACS NSQIP calculator performs poorly at predicting complications after arthroplasty [2]. Specific to lower extremity arthroplasty, the American Joint Replacement Registry Total Joint Risk Calculator returns a 90-day mortality risk [6]. Yet this calculator is not internally valid and demonstrates poor discrimination and calibration for mortality [22]. Wuerz and colleagues report a nomogram with fair prediction of major in-hospital complications following lower extremity arthroplasty [5]. Similarly, Mesko and colleagues report the prediction of unplanned readmission within 30 days [27]. While these two studies demonstrate fair prediction of outcomes, their applicability to preoperative risk stratification is limited by the inclusion of intraoperative and admission-specific variables such as estimated blood loss and length of hospitalization.
Most risk prediction tools in arthroplasty have been developed with logistic regression. A subset of artificial intelligence, ML methods iteratively learn and make predictions from underlying patterns in large datasets [28,29]. ML may allow for the detection of indirect nonlinear relationships and multivariate effects that traditional regression techniques are less sensitive at detecting [28]. Within orthopedic surgery, ML algorithms have been used to recognize osteoarthritis from a radiograph, as well as through gait analysis [30]. ML methods have also been employed to predict outcomes for spinal infections, spinal metastasis, and orthopedic oncology [31–33]. With respect to arthroplasty, ML has been used for predicting inpatient length of stay and costs with subsequent development of patient-specific pricing due to the elective nature, high patient volume, and shifting payment models of these procedures [30,34–36]. When predicting complications after total joint arthroplasty, however, recent algorithms have shown suboptimal performance. Harris and colleagues reported an algorithm predicting any major complication after elective total hip or knee arthroplasty with an AUROC of 0.60 [37]. More recently, Harris and colleagues report an ML algorithm predicting any complication after total hip or knee arthroplasty with an AUROC of 0.638 [38].
We report the use of an algorithmic framework that leverages Bayesian optimization techniques for automating the generation of an ML-based clinical prognostic model. This novel ensemble algorithm demonstrates accurate prediction of major perioperative complications after THA. With an AUROC of 0.732, this model represents moderate statistical accuracy with superior risk prediction compared to currently available risk prediction algorithms. Built with data from approximately 90,000 patients, this model is well-calibrated and demonstrates superior discrimination compared to logistic regression and four other standard benchmark ML algorithms. While some standard ML models improved prediction compared to logistic regression, others did not. Thus, the appropriate selection of an ML model with careful tuning of hyperparameters is crucial to realize the full benefits of ML modeling [12]. AutoPrognosis automates these steps, facilitating the use of highly optimized ML pipelines in mainstream clinical research by investigators who may not possess an in-depth knowledge of ML methods.
We also report the relative importance of each feature to the performance of the AutoPrognosis model. While these features are not necessarily causal, their inclusion in the model increases predictive accuracy. Of the binary variables, the most important feature we identified is malnutrition. Preoperative hypoalbuminemia has been shown to be a robust predictor of postoperative sepsis, wound complications, respiratory complications, extended length of stay, unplanned readmission, and death [39–41]. We identified dementia as the next most important predictor. Multiple studies have shown preoperative dementia to be associated with mortality and the need for revision surgery [6,42,43]. We also show that comorbidities such as cancer, diabetes mellitus, COPD, renal failure, and chronic atherosclerosis are important contributors to the predictive accuracy of our novel ML algorithm. These markers of low physiologic reserve have been shown to be predictive of complications after THA and have been incorporated in prior risk calculators [6,20,21,44].
The two most important continuous variables to model performance were hospital volume and age. As with total knee arthroplasty, shoulder arthroplasty, and spinal surgery, there is evidence of a volume-outcome relationship following THA. THAs performed at low-volume centers have been shown to be associated with a higher risk of surgical site infections, complications, as well as short-term and long-term mortality [45]. Advanced age is an established risk factor for mortality and complications following THA [6,44,46]. Our findings lend further support to the volume-outcome relationship in THA and the association between advanced age and outcomes.
While there is overlap between the features found to be most important for AutoPrognosis and logistic regression, the ranking of relative feature importance differs substantially between these two approaches. For example, renal failure has consistently been shown to be an important risk factor for developing complications after THA [47–50]. This is the case in our analysis as well; renal failure is the second most important predictor in our logistic regression model and the seventh most important in AutoPrognosis. What is notable, however, is the relative difference in the importance of this variable between the two models. That renal failure is ranked lower in importance for AutoPrognosis than for logistic regression does not mean that renal failure is less correlated with complications than the literature suggests; in fact, AutoPrognosis does not support conclusions regarding statistical inference. Rather, it indicates that the role of renal failure as a variable is different in AutoPrognosis than in logistic regression. Taken together, the differences in relative feature importance suggest that the superior discriminative capability of AutoPrognosis stems not just from its ability to select among different ML models but also from its ability to capture complex nonlinear relationships between variables that logistic regression is unable to capture.
This study has limitations, the first of which is its retrospective design. Second, the use of a deidentified state-wide database limits the granularity of the variables and outcomes that can be collected. The reliance on ICD-9/10 diagnosis and procedure codes to search the database is a less comprehensive approach compared to chart review and may underestimate complication rates. It is possible that a complication may be documented in clinical notes but not necessarily assigned a diagnosis code for the hospitalization. Similarly, medical comorbidities may not always be coded accurately. For example, standard preoperative laboratory tests may show hypoalbuminemia, but a diagnosis code for malnutrition may not be entered. Specific laboratory values (eg albumin, hemoglobin A1c) are not available for manual look-up, requiring reliance on CMS-HCC Condition Categories comprised of groups of ICD-9/10 codes to determine comorbidities such as malnutrition and diabetes. Furthermore, with this database, we are unable to assess mortality, patient-reported functional outcomes, and patient satisfaction. It should also be noted that this cohort is comprised of patients for whom it was determined that the perceived benefit of surgery outweighed the risk of surgery; this represents a source of selection bias. Additionally, with any predictive model, there exists a concern for overfitting. In overfitting, the algorithm has good performance on the development set, but the generalizability of the algorithm to new cohorts is diminished since the algorithm is fit to the specific idiosyncrasies of the development cohort. While we aim to protect against overfitting with our model development and validation strategy, future studies in which this algorithm is validated on external cohorts are thus critical. Finally, it should be noted that any predictive algorithm is only as reliable as the data it is built on. 
Systematic biases in clinical decisions and data collection are amplified by ML, potentially adversely affecting historically underrepresented groups such as patients of lower socioeconomic status, ethnic minorities, and women [28]. Future studies should thus validate or refute the models in this analysis by using multi-institutional data or prospective study designs.
We show that a novel ensemble ML algorithm predicts major complications after THA more accurately than any risk calculator currently available. This algorithm is well-calibrated and shows superior discriminatory capability compared with logistic regression and other standard benchmark ML models. By providing accurate prognostic information, this algorithm may facilitate preoperative shared decision-making. A physician must be able to provide patients with comprehensive and accurate information regarding risks and benefits in order for the patient to provide true informed consent. The determination of an acceptable risk-to-benefit ratio remains solely with the patient and their surgeon; we simply aim to provide an accurate estimation of perioperative risk. Furthermore, this tool can be used to identify and address potentially modifiable risk factors for complications. Preoperatively addressing malnutrition, diabetes, and obesity may reduce overall healthcare costs by decreasing the likelihood of complications.
Supplementary Material
Acknowledgments
The research reported in this publication was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health under the Ruth L. Kirschstein National Research Service Award Number T32AR059033. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
One or more of the authors of this paper have disclosed potential or pertinent conflicts of interest, which may include receipt of payment, either direct or indirect, institutional support, or association with an entity in the biomedical field which may be perceived to have potential conflict of interest with this work. For full disclosure statements refer to https://doi.org/10.1016/j.arth.2020.12.040.
References
- [1] Kurtz S, Ong K, Lau E, Mowat F, Halpern M. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Joint Surg Am 2007;89:780–5. 10.2106/JBJS.F.00222.
- [2] Edelstein AI, Kwasny MJ, Suleiman LI, Khakhkhar RH, Moore MA, Beal MD, et al. Can the American College of Surgeons risk calculator predict 30-day complications after knee and hip arthroplasty? J Arthroplasty 2015;30:5–10. 10.1016/j.arth.2015.01.057.
- [3] Sloan M, Premkumar A, Sheth NP. Projected volume of primary total joint arthroplasty in the U.S., 2014 to 2030. J Bone Joint Surg Am 2018;100:1455–60. 10.2106/JBJS.17.01617.
- [4] Manning DW, Edelstein AI, Alvi HM. Risk prediction tools for hip and knee arthroplasty. J Am Acad Orthop Surg 2016;24:19–27. 10.5435/JAAOS-D-15-00072.
- [5] Wuerz TH, Kent DM, Malchau H, Rubash HE. A nomogram to predict major complications after hip and knee arthroplasty. J Arthroplasty 2014;29:1457–62. 10.1016/j.arth.2013.09.007.
- [6] Bozic KJ, Lau E, Kurtz S, Ong K, Rubash H, Vail TP, et al. Patient-related risk factors for periprosthetic joint infection and postoperative mortality following total hip arthroplasty in Medicare patients. J Bone Joint Surg Am 2012;94-A:794–800.
- [7] Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: a literature review. Front Bioeng Biotechnol 2018;6:1–20. 10.3389/fbioe.2018.00075.
- [8] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542:686.
- [9] Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J Am Med Assoc 2016;316:2402–10.
- [10] Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 2013;8:e61318.
- [11] Alaa A, van der Schaar M. AutoPrognosis: automated clinical prognostic modeling via Bayesian optimization with structured kernel learning. Proc 35th Int Conf Mach Learn PMLR; 2018;80:139–48.
- [12] Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS One 2019;14:1–17. 10.1371/journal.pone.0213653.
- [13] Alaa AM, van der Schaar M. Prognostication and risk factors for cystic fibrosis via automated machine learning. Sci Rep 2018;8:1–19. 10.1038/s41598-018-29523-2.
- [14] Yale New Haven Health Services Corporation/Center for Outcomes Research & Evaluation. Procedure-specific measure updates and specifications report hospital-level risk-standardized complication measure: elective primary total hip arthroplasty (THA) and/or total knee arthroplasty (TKA) - version 6.0. 2017. Centers for Medicare & Medicaid Services; 2017. https://www.hhs.gov/guidance/sites/default/files/hhs-guidance-documents/elective%20primary%20total%20hip%20arthroplasty%20%28tha%29%20and-or%20total%20knee%20arthroplasty%20hospital-level%20risk-standardized%20complication%20measure%202017%20measure%20updates%20and%20specifications%20report_3.pdf.
- [15] Breiman L. Random forests. Mach Learn 2001;45:5–32.
- [16] Ratsch G, Onoda T, Muller K. Soft margins for AdaBoost. Mach Learn 2001;42:287–320.
- [17] Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat 2001;29:1189–232. 10.1214/aos/1013203451.
- [18] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. KDD ’16 Proc 22nd ACM SIGKDD Int Conf Knowl Discov Data Min 2016:785–94.
- [19] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
- [20] Jain NB, Guller U, Pietrobon R, Bond TK, Higgins LD. Comorbidities increase complication rates in patients having arthroplasty. Clin Orthop Relat Res 2005;435:232–8. 10.1097/01.blo.0000156479.97488.a2.
- [21] Marchant MH, Viens NA, Cook C, Vail TP, Bolognesi MP. The impact of glycemic control and diabetes mellitus on perioperative outcomes after total joint arthroplasty. J Bone Joint Surg Am 2009;91:1621–9. 10.2106/JBJS.H.00116.
- [22] Harris AHS, Kuo AC, Bozic KJ, Lau E, Bowe T, Gupta S, et al. American Joint Replacement Registry risk calculator does not predict 90-day mortality in veterans undergoing total joint replacement. Clin Orthop Relat Res 2018;476:1869–75. 10.1097/CORR.0000000000000377.
- [23] Fischer J, Bachmann L, Jaeschke R. A readers’ guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med 2003;29:1043–51.
- [24] Bozic KJ, Rubash HE, Sculco T, Berry DJ. An analysis of Medicare payment policy for total joint arthroplasty. J Arthroplasty 2008;23:133–8.
- [25] Bilimoria KY, Liu Y, Paruch JL, Zhou L, Kmiecik TE, Ko CY, et al. Development and evaluation of the universal ACS NSQIP surgical risk calculator: a decision aid and informed consent tool for patients and surgeons. J Am Coll Surg 2013;217:833–42.e1–3.
- [26] Centers for Medicare & Medicaid Services (CMS), HHS. Medicare program; revisions to payment policies under the physician fee schedule, clinical laboratory fee schedule, access to identifiable data for the Center for Medicare and Medicaid Innovation models & other revisions. Fed Regist 2014;79:67547–8010.
- [27] Mesko NW, Bachmann KR, Kovacevic D, LoGrasso ME, O’Rourke C, Froimson MI. Thirty-day readmission following total hip and knee arthroplasty - a preliminary single institution predictive model. J Arthroplasty 2014;29:1532–8. 10.1016/j.arth.2014.02.030.
- [28] Hashimoto D, Rosman G, Rus D, Meireles O. Artificial intelligence in surgery: promises and perils. Ann Surg 2018;268:70–6.
- [29] Chen J, Asch S. Machine learning and prediction in medicine - beyond the peak of inflated expectations. N Engl J Med 2017;376:2507–9.
- [30] Haeberle HS, Helm JM, Navarro SM, Karnuta JM, Schaffer JL, Callaghan JJ, et al. Artificial intelligence and machine learning in lower extremity arthroplasty: a review. J Arthroplasty 2019;34:2201–3. 10.1016/j.arth.2019.05.055.
- [31] Shah A, Karhade A, Bono C, Harris M, Nelson S, Schwab J. Development of a machine learning algorithm for prediction of failure of nonoperative management in spinal epidural abscess. Spine J 2019;19:1657–65.
- [32] Karhade AV, Thio QCBS, Ogink PT, Shah AA, Bono CM, Oh KS, et al. Development of machine learning algorithms for prediction of 30-day mortality after surgery for spinal metastasis. Neurosurgery 2019;85:E83–91. 10.1016/j.wneu.2018.07.276. Epub ahead of print.
- [33] Thio QCBS, Karhade AV, Ogink PT, Raskin KA, De Amorim Bernstein K, Lozano Calderon SA, et al. Can machine-learning techniques be used for 5-year survival prediction of patients with chondrosarcoma? Clin Orthop Relat Res 2018;476:2040–8. 10.1097/CORR.0000000000000433.
- [34] Karnuta JM, Navarro SM, Haeberle HS, Helm JM, Kamath AF, Schaffer JL, et al. Predicting inpatient payments prior to lower extremity arthroplasty using deep learning: which model architecture is best? J Arthroplasty 2019;34:2235–41. 10.1016/j.arth.2019.05.048.
- [35] Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Scuderi GR, Mont MA. Deep learning preoperatively predicts value metrics for primary total knee arthroplasty: development and validation of an artificial neural network model. J Arthroplasty 2019;34:2220–7. 10.1016/j.arth.2019.05.034.
- [36] Ramkumar PN, Haeberle HS, Bloomfield MR, Schaffer JL, Kamath AF, Patterson BM, et al. Artificial intelligence and arthroplasty at a single institution: real-world applications of machine learning to big data, value-based care, mobile health, and remote patient monitoring. J Arthroplasty 2019;34:2204–9. 10.1016/j.arth.2019.06.018.
- [37] Harris AHS, Kuo AC, Bowe T, Gupta S, Nordin D, Giori NJ. Prediction models for 30-day mortality and complications after total knee and hip arthroplasties for Veteran Health Administration patients with osteoarthritis. J Arthroplasty 2018;33:1539–45. 10.1016/j.arth.2017.12.003.
- [38] Harris AHS, Kuo AC, Weng Y, Trickey AW, Bowe T, Giori NJ. Can machine learning methods produce accurate and easy-to-use prediction models of 30-day complications and mortality after knee or hip arthroplasty? Clin Orthop Relat Res 2019;477:452–60. 10.1097/CORR.0000000000000601.
- [39] Fu MC, D’Ambrosia C, McLawhorn AS, Schairer WW, Padgett DE, Cross MB. Malnutrition increases with obesity and is a stronger independent risk factor for postoperative complications: a propensity-adjusted analysis of total hip arthroplasty patients. J Arthroplasty 2016;31:2415–21. 10.1016/j.arth.2016.04.032.
- [40] Gu A, Malahias M, Strigelli V, Nocon A, Sculco T, Sculco P. Preoperative malnutrition negatively correlates with postoperative wound complications and infection after total joint arthroplasty: a systematic review and meta-analysis. J Arthroplasty 2019;34:1013–24.
- [41] Black CS, Goltz DE, Ryan SP, Fletcher AN, Wellman SS, Bolognesi MP, et al. The role of malnutrition in ninety-day outcomes after total joint arthroplasty. J Arthroplasty 2019;34:2594–600.
- [42] Ali AM, Loeffler MD, Aylin P, Bottle A. Factors associated with 30-day readmission after primary total hip arthroplasty: analysis of 514,455 procedures in the UK National Health Service. JAMA Surg 2017;152:1–6. 10.1001/jamasurg.2017.3949.
- [43] Hernandez N, Cunningham D, Jiranek W, Bolognesi MP, Seyler TM. Total hip arthroplasty in patients with dementia. J Arthroplasty 2020;35:1667–70.
- [44] Bozic KJ, Ong K, Lau E, Berry DJ, Vail TP, Kurtz SM, et al. Estimating risk in Medicare patients with THA: an electronic risk calculator for periprosthetic joint infection and mortality. Clin Orthop Relat Res 2013;471:574–83. 10.1007/s11999-012-2605-z.
- [45] Mufarrih SH, Ghani MOA, Martins RS, Qureshi NQ, Mufarrih SA, Malik AT, et al. Effect of hospital volume on outcomes of total hip arthroplasty: a systematic review and meta-analysis. J Orthop Surg Res 2019;14:1–13. 10.1186/s13018-019-1531-0.
- [46] Santaguida PL, Hawker GA, Hudak PL, Glazier R, Mahomed NN, Kreder HJ, et al. Patient characteristics affecting the prognosis of total hip and knee joint arthroplasty: a systematic review. Can J Surg 2008;51:428–36.
- [47] Erkocak O, Yoo J, Restrepo C, Maltenfort M, Parvizi J. Incidence of infection and inhospital mortality in patients with chronic renal failure after total joint arthroplasty. J Arthroplasty 2016;31:2437–41.
- [48] Cavanaugh P, Chen A, Rasouli M, Post Z, Orozco F, Ong A. Complications and mortality in chronic renal failure patients undergoing total joint arthroplasty: a comparison between dialysis and renal transplant patients. J Arthroplasty 2016;31:465–72.
- [49] Inoue D, Yazdi H, Goswami K, Lan T, Parvizi J. Comparison of postoperative complications and survivorship of total hip and knee arthroplasty in dialysis and renal transplantation patients. J Arthroplasty 2020;35:971–5.
- [50] Malkani JA, Heimroth JC, Ong KL, Wilson H, Price M, Piuzzi NS, et al. Complications and readmission incidence following total hip arthroplasty in patients who have end-stage renal failure. J Arthroplasty 2020;35:794–800.