Abstract
Objective:
To develop an electronic health record-based risk model for perioperative medicine (POM) triage and compare this model with legacy triage practices that were based on clinician assessment.
Summary of Background Data:
POM clinicians seek to address the increasingly complex medical needs of patients prior to scheduled surgery. Identifying which patients might derive the most benefit from evaluation is challenging.
Methods:
Elective surgical cases performed within a health system from 2014 to 2019 (N = 470,727) were used to develop a predictive score, called the Comorbidity Assessment for Surgical Triage (CAST) score, using split validation. CAST incorporates patient and surgical case characteristics to predict the risk of 30-day post-operative morbidity, defined as a composite of mortality and major NSQIP complications. Thresholds of CAST were then selected to define risk groups, which correspond with triage to POM appointments of different durations and modalities. The predictive discrimination of the CAST score was compared with the surgeon’s assessments of patient complexity and the American Society of Anesthesiologists class.
Results:
The CAST score demonstrated significantly higher discrimination for predicting post-operative morbidity (area under the receiver operating characteristic curve 0.75) than the surgeon’s complexity designation (0.63; P < 0.001) or the American Society of Anesthesiologists class (0.65; P < 0.001) (Fig. 1). Incorporating the complexity designation in the CAST model did not significantly alter the discrimination (0.75; P = 0.098). Compared with the complexity designation, classification based on CAST score groups resulted in a net reclassification improvement index of 10.4% (P < 0.001) (Table 1).
Conclusion:
A parsimonious electronic health record-based predictive model identifies at-risk pre-surgical patients better than the assessments previously used for POM triage.
Keywords: biomedical informatics, machine learning, perioperative medicine, quality and safety, surgical efficiency
Perioperative medicine (POM) is an area of medicine designed to optimize patients’ health status from the time a potential surgery is contemplated in the outpatient setting until the operation occurs. Because complications after elective surgery are a substantial source of morbidity and cost,1–3 POM practitioners work collaboratively with surgeons and anesthesiologists to mitigate preventable harm by optimizing pre-surgical care. Outpatient POM evaluations for elective surgery patients are associated with improved identification of new medical conditions that impact surgical outcomes,4–6 increased rates of postponed surgery for medical optimization,6,7 decreased day of surgery cancellations,8–10 reduced hospital length of stay,11,12 and decreased risks of wound infections.13
Given the increasing number of frail and medically complex patients undergoing surgery,14 optimal triage of pre-surgical patients into POM clinic encounters of appropriate length and modality is essential.15,16 Ideally, a triage system would consistently distinguish between low-risk and high-risk surgical patients, including those expected to derive the greatest benefit from pre-surgical optimization. Prior studies have identified some of the relevant predictors of POM clinic referral, which include patient age and comorbidity, operative complexity, and residential proximity to the clinic.17,18 However, there is substantial variation in the request for outpatient POM consultation,19 even among patients with high comorbidity burden undergoing major elective surgeries,20,21 leading to the potential for both under- and over-utilization of POM resources. Multiple triage approaches have been described, including those based on patient questionnaires,22 surgeon referrals,18 screening by nurses or anesthesiologists,23,24 and procedure characteristics.25,26 However, few studies have compared the performance of POM triage algorithms in distinguishing low-risk and high-risk surgical patients.
In this report, we describe the development and validation of a real-time electronic health record-based model that uses patient characteristics, comorbidities, laboratory values, and surgical case characteristics to predict risk of poor surgical outcomes. This risk score, called the Comorbidity Assessment for Surgical Triage (CAST) score, was developed using retrospective data from an integrated healthcare delivery system, Kaiser Permanente Northern California (KPNC). The CAST score is used to segment pre-operative patients into risk groups, which then inform which patients might be expedited through POM and which patients require more intensive POM evaluations. This model for POM triage has been developed as part of a regional effort in KPNC to improve the efficiency and quality of surgical care. We hypothesized that the CAST model would demonstrate better discrimination than the following clinician-based methods of POM triage and referral that were previously used: 1) the surgeon-assessed designation of patient complexity and 2) the American Society of Anesthesiologists (ASA) physical status classification system.
METHODS
This study was approved by the Kaiser Permanente Institutional Review Board, which provided a waiver of informed consent.
Identification of Elective Surgical Cases
Within KPNC, all hospitals and clinics employ the same information systems and share a single EHR system. Our study population consisted of elective surgical cases for adult patients (≥ 18 years of age), excluding labor and delivery, treated at KPNC facilities between January 1, 2014 and December 31, 2019. We identified elective cases based on EHR data entered by the surgeon into the surgical ‘case request,’ which is a prerequisite for scheduling surgical procedures. We excluded cases if the surgeon indicated that the case was an ‘add on’ case or needed to be performed within 24 hours or within 48 hours of the case request submission. We also excluded non-surgical procedures performed in operating rooms as well as ophthalmological surgery cases, based on guidance from clinical leadership that POM evaluations are not used for the majority of those cases.
Formation of Primary and Secondary Cohorts
Because we sought to develop a real-time model to assist with POM triage that would predict the risk of major complications, our primary analytic cohort was limited to the subset of the elective surgeries performed between January 1, 2014 and December 31, 2019 that were sampled for the American College of Surgeons National Surgical Quality Improvement Program (NSQIP), in which KPNC participated throughout the study period (n = 162,202 or 34.5% of all elective surgeries). The NSQIP sampling and data collection strategy, which has been described elsewhere,27 requires a dedicated surgical clinical nurse reviewer to abstract data on 30-day post-operative complications for patients undergoing a broad range of operations across all surgical subspecialties, with the exception of transplant and trauma. The primary cohort allowed us to capture a variety of reviewed and validated post-operative complications. Our secondary cohort consisted of the elective surgical cases performed between January 1, 2017 and December 31, 2019 that were not sampled for reporting to NSQIP.
Surgical Adverse Outcomes
The CAST model predicts the primary outcome, 30-day postoperative “morbidity,” which was defined as a composite of 30-day mortality and 30-day major post-operative complications. Mortality was identified using KPNC health system records, and 30-day major post-operative complications were identified based on data collected using the NSQIP framework and included cardiac arrest, myocardial infarction, pulmonary embolism, sepsis, septic shock, surgical site infections, unplanned intubation, deep vein thrombosis, progressive renal insufficiency or renal failure, or cerebrovascular accident.28 Because complications data were available only for the primary cohort and not the secondary cohort, we collected data on the following secondary outcomes, which were available for all patients: 30-day mortality, inpatient mortality, 30-day readmission, 7-day return to the emergency department, and 30-day return to care.
Potential Predictors
We considered a parsimonious set of potential predictors that would be available in real-time within the EHR when a request is placed by the surgeon to schedule a case. The potential predictors included age, body mass index, scalar measures of comorbid disease burden and illness severity, and case-specific information entered by the surgeon. We quantified comorbid disease burden with a previously validated risk score, the Comorbidity Points Score, Version 2 (COPS2), which is based on patients’ medical diagnoses within the 12 months preceding the date of the case request submission.29 Elective surgeries within KPNC are only performed on KPNC members, and case requests typically would be submitted after evaluations by the referring clinician and the surgeon, which ensures availability of diagnosis data. COPS2 was entered as a restricted cubic spline with 3 knots. We quantified severity of illness at the time of the case request with the abbreviated Laboratory-based Acute Physiology Score (abLAPS), which is based on the most physiologically deranged value of 14 laboratory tests over the month before the date that surgery was requested. This score is an outpatient modification of a previously reported hospital-based severity of illness score30 and is unavailable if the patient did not have any of the 14 laboratory tests resulted in the month preceding the case request, which occurred in the majority of cases. The abLAPS score was categorized into 4 groups: unavailable, low (0 to 4), medium (5 to 10), and high (> 10). Finally, we also considered predictors that were captured from required fields in the case request submission form, including patient class (the intended post-operative disposition, i.e., inpatient or outpatient) and case class (cancer, elective, or other).
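The predictor encoding described above can be illustrated with a short Python sketch. It categorizes abLAPS into the four reported groups and builds the single nonlinear basis term of a 3-knot restricted cubic spline for COPS2 using the standard truncated-power formulation. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, and the knot locations are placeholders because the text does not report them.

```python
from typing import Optional, Tuple


def ablaps_category(score: Optional[float]) -> str:
    """Map an abLAPS value to the four groups reported in the text."""
    if score is None:   # no laboratory tests resulted in the look-back window
        return "unavailable"
    if score <= 4:      # low: 0 to 4
        return "low"
    if score <= 10:     # medium: 5 to 10
        return "medium"
    return "high"       # high: > 10


def _cubed_plus(u: float) -> float:
    """Truncated cubic term: max(u, 0) ** 3."""
    return max(u, 0.0) ** 3


def rcs3_basis(x: float, knots: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return (linear term, nonlinear term) of a 3-knot restricted cubic spline.

    The nonlinear term is zero below the first knot and linear above the
    last knot, which is what "restricted" refers to.
    """
    t1, t2, t3 = knots
    nonlinear = (
        _cubed_plus(x - t1)
        - _cubed_plus(x - t2) * (t3 - t1) / (t3 - t2)
        + _cubed_plus(x - t3) * (t2 - t1) / (t3 - t2)
    ) / (t3 - t1) ** 2  # conventional scaling; does not change the fit
    return x, nonlinear


# Hypothetical knot locations for COPS2 (assumption; not reported in the text).
HYPOTHETICAL_KNOTS = (0.0, 10.0, 50.0)
```

In a logistic model, both returned terms would enter as covariates, so a 3-knot spline costs two coefficients for COPS2 rather than one.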
Model Development
We used a split validation approach for model development, with 60% of the cohort randomly assigned to the training set, 20% to the validation set, and 20% to the test set.31 We developed our model using the training and validation data sets, and our final model performance was reported using the test set. We evaluated model discrimination based on the area under the receiver operating characteristic curve (AUROC). Because this measure may be misleading in imbalanced data sets with rare outcomes, we also examined the area under the precision-recall curve (AUPRC).32 We visually assessed calibration through calibration plots.33 Because our focus was on real-time implementation within the Epic EHR software used in KP (KP Health Connect, KPHC), we preferred simpler predictive modeling techniques (logistic regression) to more sophisticated ones (random forests), the latter of which could not be implemented in real-time in the EHR. We chose the most parsimonious model that was easily reproducible in KPHC.
We evaluated the discrimination of the final model, called the Comorbidity Assessment for Surgical Triage (CAST) score, on the test subset, using the primary outcome of morbidity. We further assessed model performance within each surgical specialty and using secondary outcomes.
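To make the evaluation approach concrete, the following Python sketch shows a 60/20/20 random split and pure-Python versions of the two discrimination metrics. It is illustrative only: the function names and the seed are hypothetical, AUROC is computed as the Mann-Whitney probability that a case with the outcome is scored above a case without it, and AUPRC is approximated by average precision, which may differ from the estimator the authors used.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly assign n cases to training (60%), validation (20%), and test (20%)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(0.6 * n)
    n_valid = round(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random event outranks a random non-event (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each event (assumes no tied scores)."""
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank
    return total / true_positives
```

A useful reference point when reading AUPRC values is that a model with no discrimination has an expected AUPRC close to the outcome prevalence, whereas its AUROC is 0.5 regardless of prevalence.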
Threshold Selection for CAST Model Implementation
Based on guidance from KPNC POM regional leadership, we identified CAST score thresholds to stratify surgical cases into Low CAST, Medium CAST, or High CAST levels. The thresholds of CAST score were selected based on resource considerations, related to the allocation of provider time, and to achieve a greater than 70% specificity and a greater than 70% sensitivity. The CAST model is embedded into the EHR and is calculated when the surgeon submits a request to book a surgical case. Surgical schedulers use the CAST levels to direct POM scheduling to (1) a 20-minute POM telephone visit, (2) a 40-minute POM in-person visit, or (3) a 60-minute POM in-person visit. These thresholds and the interventions associated with each CAST level were selected by operational leadership based on the overall availability of POM resources, particularly provider time. The 60-minute POM visit was developed concurrently with CAST and is reserved for the small proportion of very high-risk patients. However, the implementation of the model was intended to not substantially alter the overall utilization of POM resources, and the model allows for modification of the thresholds in the future based on availability of resources. In the EHR, the CAST level and associated POM appointment type are displayed to the surgeon while the case request is being completed, and the surgeon is able to alter the POM appointment for that patient based on clinical judgement.
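The triage logic described above can be sketched as a simple mapping from predicted risk to CAST level and default appointment type. The cutoffs of 1.8% and 10% are those reported in the Results (Table 4). The function and variable names are hypothetical, the pairing of levels with visit types follows the order in which they are listed in the text, and the override argument is only a stand-in for the surgeon's ability to change the default in the EHR.

```python
from typing import Optional

# Cutoffs on predicted 30-day morbidity risk, as reported in Table 4.
LOW_CUTOFF = 0.018   # below this: Low CAST
HIGH_CUTOFF = 0.10   # above this: High CAST

# Default appointment for each level (pairing assumed from the order in the text).
DEFAULT_APPOINTMENT = {
    "Low": "20-minute POM telephone visit",
    "Medium": "40-minute POM in-person visit",
    "High": "60-minute POM in-person visit",
}


def cast_level(predicted_risk: float) -> str:
    """Classify a predicted risk (as a proportion) into a CAST level."""
    if predicted_risk < LOW_CUTOFF:
        return "Low"
    if predicted_risk <= HIGH_CUTOFF:
        return "Medium"
    return "High"


def pom_appointment(predicted_risk: float, surgeon_override: Optional[str] = None) -> str:
    """Return the default appointment unless the surgeon selects another one."""
    if surgeon_override is not None:
        return surgeon_override
    return DEFAULT_APPOINTMENT[cast_level(predicted_risk)]
```

Keeping the cutoffs as named constants mirrors the stated design goal that thresholds can be modified later as POM resources change, without altering the underlying model.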
Comparison with Previous KPNC Triage Procedures
Prior to the implementation of the CAST model, KPNC POM triage was based solely on data entered within the surgical case request form in the EHR. During case request submission, the surgeon had to designate the case as falling within one of three ‘complexity’ categories: “No POM appointment required,” “Relatively Healthy,” or “Relatively Complex.” Surgical schedulers used these complexity designations to direct POM scheduling to (1) no POM visit, (2) a 20-minute POM telephone visit, or (3) a 40-minute POM in-person visit, respectively.
Among all elective surgical cases, we compared the distribution of complexity designations across facilities within KPNC. The discrimination of the patient complexity designation, the CAST model, the American Society of Anesthesiologists physical status score (ASA), and a model using patient complexity and the predictors in the CAST model was assessed using AUROC and AUPRC. Of note, practice guidelines have suggested ASA score as a method for triage to POM,34 though in our system it is not used for triage. ASA score is part of the standard preoperative assessment by the anesthesiologist on the day of surgery. Finally, we quantified the appropriateness of classification by CAST levels compared with the patient complexity designation by calculating the categorical event and non-event net reclassification indices for 30-day morbidity.35
All statistics were 2-tailed, and statistical significance was accepted at the p < 0.05 level. Statistical analyses were performed using SAS version 9.4 (SAS Institute), R version 4.0.2 (The R Foundation), and Stata 16 (StataCorp).
RESULTS
A total of 470,727 elective surgical cases were identified, 162,202 of which were in the primary cohort (Table 1). Compared with the secondary cohort, the primary cohort was older (57.8 years vs. 55.0 years, p < 0.001) and had similar, albeit statistically significantly lower, comorbid disease burden (COPS2: 20.5 vs 21.4, p < 0.001). A significantly higher proportion of the primary cohort was scheduled to have an inpatient post-operative disposition (36.4% vs 14.8%, p < 0.001). A higher proportion of the primary cohort was designated by the surgeon as complex (37.5% vs 26.2%, p < 0.001) compared with the secondary cohort. Morbidity, defined as the unadjusted composite outcome of 30-day mortality and major complications, was 2.2% in the primary cohort (Table 2). Thirty-day mortality, inpatient mortality, 30-day readmission, 7-day return to the emergency department, and 30-day any returns to care were all significantly higher in the primary cohort than in the secondary cohort. Within the primary cohort, the training, validation, and test sets had similar characteristics (see Table, Supplemental Digital Content 1, http://links.lww.com/SLA/D513).
TABLE 1.
Characteristics of Elective Surgical Cases 2014–2019
Characteristic | Primary Cohort* (2014–2019), n = 162,202 | Secondary Cohort† (2017–2019), n = 308,525 | P-value |
---|---|---|---|
Female, No. (%) | 93,516 (57.7%) | 174,056 (56.4%) | < 0.001 |
Age, mean (SD), yr | 57.4 (16.2) | 54.9 (16.8) | < 0.001 |
Body Mass Index, mean (SD), kg/m2 | 29.2 (6.5) | 29.1 (6.5) | < 0.001 |
Abbreviated Laboratory-based Acute Physiology Score, No. (%) | |||
Low | 18,617 (11.5%) | 33,928 (11.0%) | < 0.001 |
Medium | 12,068 (7.4%) | 20,250 (6.6%) | |
High | 10,557 (6.5%) | 20,154 (6.5%) | |
Unavailable | 120,960 (74.6%) | 234,193 (75.9%) | |
Comorbidity Points Score, Version 2, mean (SD) | 20.5 (24.0) | 21.4 (26.9) | < 0.001 |
Patient Class, No. (%) | |||
Inpatient | 59,066 (36.4%) | 45,703 (14.8%) | < 0.001 |
Outpatient | 103,136 (63.6%) | 262,822 (85.2%) | |
Case Class, No. (%) | |||
Cancer | 24,862 (15.3%) | 30,451 (9.9%) | < 0.001 |
Elective | 125,702 (77.5%) | 250,023 (81.0%) | |
Other | 11,638 (7.2%) | 28,051 (9.1%) | |
Service Category, No. (%) | |||
Bariatric | 4,565 (2.8%) | 2,154 (0.7%) | < 0.001 |
General Surgery | 44,075 (27.2%) | 70,550 (22.9%) | |
Gynecology | 18,075 (11.1%) | 32,017 (10.4%) | |
Head and Neck | 8,210 (5.1%) | 32,389 (10.5%) | |
Neurosurgery | 5,332 (3.3%) | 6,157 (2.0%) | |
Orthopedics | 55,262 (34.1%) | 96,355 (31.2%) | |
Plastics | 5,610 (3.5%) | 15,190 (4.9%) | |
Spine | 3,769 (2.3%) | 6,172 (2.0%) | |
Thoracic | 1,663 (1.0%) | 2,042 (0.7%) | |
Urology | 11,956 (7.4%) | 35,500 (11.5%) | |
Vascular | 3,685 (2.3%) | 9,999 (3.2%) | |
Patient Complexity, No. (%) | |||
No perioperative medicine appointment required | 3,735 (2.3%) | 6,805 (2.2%) | < 0.001 |
Relatively Healthy | 97,658 (60.2%) | 220,923 (71.6%) | |
Relatively Complex | 60,809 (37.5%) | 80,797 (26.2%) | |
American Society of Anesthesiologists Physical Status Class, No. (%) | |||
Class 1 | 15,362 (9.5%) | 35,615 (11.5%) | < 0.001 |
Class 2 | 90,763 (56.0%) | 176,334 (57.2%) | |
Class 3 | 53,751 (33.1%) | 90,803 (29.4%) | |
Class 4 | 2,295 (1.4%) | 5,747 (1.9%) | |
Class 5 | 3 (0.0%) | 5 (0.0%) | |
*The primary cohort was limited to elective surgical cases reported to the American College of Surgeons National Surgical Quality Improvement Program (NSQIP).
†The secondary cohort was limited to elective surgical cases that were not sampled for reporting to NSQIP.
TABLE 2.
Rates of Surgical Adverse Outcomes
Outcome | Primary Cohort* (2014–2019), n = 162,202 | Secondary Cohort† (2017–2019), n = 308,525 | P-value |
---|---|---|---|
Morbidity (30 d)‡ | 2.2% | – | < 0.001 |
Mortality (30 d) | 0.2% | 0.1% | < 0.001 |
Mortality (inpatient) | 0.1% | 0.0% | < 0.001 |
Readmission (30 d) | 2.8% | 2.7% | < 0.001 |
Return to Emergency Department (7 d) | 7.4% | 5.7% | < 0.001 |
Any Returns to Care (30 d) | 12.8% | 11.5% | < 0.001 |
*The primary cohort was limited to elective surgical cases reported to the American College of Surgeons National Surgical Quality Improvement Program (NSQIP).
†The secondary cohort was limited to elective surgical cases that were not sampled for reporting to NSQIP.
‡Morbidity is defined as the composite outcome of 30-day mortality and 30-day NSQIP major complications, which include cardiac arrest, myocardial infarction, pulmonary embolism, sepsis, septic shock, surgical site infections, unplanned intubation, deep vein thrombosis, progressive renal insufficiency or renal failure, and cerebrovascular accident.
The various logistic models we considered demonstrated better performance for predicting morbidity than random forest models (see Table, Supplemental Digital Content 2, http://links.lww.com/SLA/D513), and we ultimately favored logistic regression over random forest because of its overall performance. The final set of predictors and their associated transformations for the model we selected, called the CAST model, allowed for easy implementation within the EHR (see Table, Supplemental Digital Content 3, http://links.lww.com/SLA/D513). The CAST model had an AUROC of 0.75 (95% CI 0.73 to 0.77) and an AUPRC of 0.08 (95% CI 0.06 to 0.09) within the primary cohort (Table 3). The predictors selected for the CAST model were similarly predictive of secondary outcomes in the primary and secondary cohorts, such as 30-day mortality (AUROC 0.87 in the primary cohort vs. 0.90 in the secondary cohort) and 30-day readmission (AUROC 0.72 in the primary cohort vs. 0.76 in the secondary cohort). Across service categories, the AUROC ranged from 0.66 to 0.82 (see Table, Supplemental Digital Content 4, http://links.lww.com/SLA/D513), with the exception of bariatric surgery (AUROC 0.55). AUROC was lower for 7-day return to the emergency department (0.62 in the primary cohort) and 30-day returns to care (0.64 in the primary cohort). The model appeared well calibrated throughout its range (see Figure, Supplemental Digital Content 5, http://links.lww.com/SLA/D513).
TABLE 3.
Performance of Predictors Selected for Comorbidity Assessment for Surgical Triage (CAST) Model on Adverse Outcomes
Outcomes | Primary Cohort AUROC | Primary Cohort AUPRC | Secondary Cohort AUROC | Secondary Cohort AUPRC |
---|---|---|---|---|
Morbidity* | 0.75 (0.73, 0.77) | 0.08 (0.06, 0.09) | – | – |
Mortality (30 day) | 0.87 (0.83, 0.91) | 0.02 (0.00, 0.05) | 0.90 (0.88, 0.91) | 0.01 (0.01, 0.02) |
Mortality (inpatient) | 0.91 (0.86, 0.95) | 0.01 (0.00, 0.02) | 0.87 (0.83, 0.90) | 0.01 (0.00, 0.01) |
Readmission (30 day) | 0.72 (0.70, 0.74) | 0.08 (0.07, 0.09) | 0.76 (0.75, 0.76) | 0.10 (0.09, 0.10) |
Return to Emergency Department (7 d) | 0.62 (0.60, 0.63) | 0.11 (0.11, 0.12) | 0.61 (0.61, 0.62) | 0.09 (0.09, 0.09) |
Any Returns to Care (30 d) | 0.64 (0.63, 0.64) | 0.21 (0.20, 0.22) | 0.64 (0.63, 0.64) | 0.19 (0.19, 0.20) |
Logistic models constructed using predictors selected for the CAST model. Performance metrics calculated using test subset of the primary cohort and the secondary cohort.
*Morbidity is defined as the composite outcome of 30-day mortality and 30-day NSQIP major complications, which include cardiac arrest, myocardial infarction, pulmonary embolism, sepsis, septic shock, surgical site infections, unplanned intubation, deep vein thrombosis, progressive renal insufficiency or renal failure, and cerebrovascular accident.
AUPRC indicates area under precision-recall curve; AUROC, area under receiver-operating curve.
Across facilities, the proportion of surgical patients designated as “relatively complex” ranged from 13.9% to 54.7% (Fig. 1).
FIGURE 1.
Interfacility variation in patient complexity designation.
Interfacility variation in the ‘relatively complex’ designation was also seen within specific operations, including total knee replacement (proportion designated as “relatively complex” ranged from 22.6% to 100.0%), inguinal hernia repair (3.0% to 38.5%), transurethral prostatectomy (5.0% to 83.5%), and carotid endarterectomy (70.8% to 100.0%).
The CAST model demonstrated significantly higher discrimination for post-operative morbidity than the patient complexity designation by the surgeon (AUROC 0.75 vs 0.63, p < 0.001) or ASA score (AUROC 0.75 vs 0.65, p < 0.001) (Fig. 2). The CAST model demonstrated higher discrimination for post-operative morbidity than the patient complexity designation by the surgeon in every service category (see Table, Supplemental Digital Content 4, http://links.lww.com/SLA/D513), with the exception of bariatrics, within which the discrimination of the patient complexity designation was similarly poor (AUROC 0.55). Incorporation of the patient complexity designation into the CAST model yielded similar discrimination to the CAST model alone (AUROC 0.75 vs. 0.75, p = 0.098). Applying CAST score cutoffs of 1.8% and 10%, 69.9% of cases were estimated to fall into Low CAST, 23.6% into Medium CAST, and 6.4% into High CAST (Table 4). Thirty-day mortality increased from 0.0% to 0.4% to 0.8% across Low, Medium, and High CAST, and the other secondary outcomes increased similarly. Classification based on CAST score groups compared with the surgeon’s complexity assessment was associated with 10.9% net correctly reclassified into a higher risk group among cases with morbidity, −0.5% net correctly reclassified into a lower risk group among cases without morbidity, and an overall net reclassification improvement index of 10.4% (p < 0.001) (Table 5). The overall net reclassification improvement index was positive within every service category except thoracic surgery (see Table, Supplemental Digital Content 6, http://links.lww.com/SLA/D513), though the net correctly reclassified with morbidity was positive within thoracic surgery.
FIGURE 2.
Receiver operating characteristic curves for 30-day post-operative morbidity. AUROC, Area Under Receiver-Operator Curve; CAST, Comorbidity Assessment for Surgical Triage.
TABLE 4.
Rates of Adverse Outcomes, by Level of Comorbidity Assessment for Surgical Triage (CAST) Score
CAST Level | % of Cases | Morbidity (30 d) | Mortality (30 d) | Mortality (inpatient) | Readmission (30 d) | Return to Emergency Department (7 d) | Any Returns to Care (30 d) |
---|---|---|---|---|---|---|---|
Low CAST Score (<1.8%) | 69.9% | 0.9% | 0.0% | 0.0% | 1.4% | 5.0% | 9.3% |
Medium CAST Score (1.8%–10%) | 23.6% | 3.6% | 0.4% | 0.1% | 5.8% | 8.9% | 18.0% |
High CAST Score (>10%) | 6.4% | 6.0% | 0.8% | 0.3% | 6.9% | 10.5% | 19.6% |
Morbidity rate based on the primary cohort. All other outcomes based on all elective surgical cases.
TABLE 5.
Change in Risk Classification for Post-Operative Morbidity Using Comorbidity Assessment for Surgical Triage (CAST) Model Compared With Using Surgeon’s Designation of Patient Complexity
Risk Classification Using Surgeon’s Designation | CAST Model: Low CAST | CAST Model: Medium CAST | CAST Model: High CAST | Reclassified as Higher Risk | Reclassified as Lower Risk | Net Correctly Reclassified (%) | Net Reclassification Improvement (%) |
---|---|---|---|---|---|---|---|
Patient With Morbidity Within 30 Days | | | | 687 | 306 | 10.9% | |
No POM Appointment Required | 25 | 25 | 7 | | | | |
Relatively Healthy | 582 | 563 | 92 | | | | |
Relatively Complex | 306 | 1,484 | 401 | | | | |
Patient Without Morbidity Within 30 Days | | | | 21,721 | 20,944 | −0.5% | |
No POM Appointment Required | 2,599 | 973 | 106 | | | | |
Relatively Healthy | 75,779 | 18,150 | 2,492 | | | | |
Relatively Complex | 20,944 | 34,797 | 2,877 | | | | |
Overall | | | | | | | 10.4% |
Cell shading in the original table distinguished improved classification, no classification change, and worse classification.
POM indicates perioperative medicine.
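As a worked check of the net reclassification figures, the following Python sketch recomputes them from the cell counts in Table 5. The grouping used here is an inference from the counts rather than a stated method: the table's "higher risk" and "lower risk" totals (687 and 306; 21,721 and 20,944) are reproduced when "No POM appointment required" and "Relatively Healthy" are treated as the lower tier and compared against Low CAST, and "Relatively Complex" is compared against Medium or High CAST. Variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# Cell counts from Table 5; columns are (Low CAST, Medium CAST, High CAST).
Counts = Dict[str, Tuple[int, int, int]]

WITH_MORBIDITY: Counts = {
    "No POM": (25, 25, 7),
    "Healthy": (582, 563, 92),
    "Complex": (306, 1484, 401),
}
WITHOUT_MORBIDITY: Counts = {
    "No POM": (2599, 973, 106),
    "Healthy": (75779, 18150, 2492),
    "Complex": (20944, 34797, 2877),
}


def reclassified(counts: Counts) -> Tuple[int, int, int]:
    """Return (moved to higher risk, moved to lower risk, total cases)."""
    # Lower-tier surgeon designation but Medium or High CAST: moved up.
    up = sum(counts["No POM"][1:]) + sum(counts["Healthy"][1:])
    # "Relatively Complex" by the surgeon but Low CAST: moved down.
    down = counts["Complex"][0]
    total = sum(sum(row) for row in counts.values())
    return up, down, total


up_e, down_e, n_events = reclassified(WITH_MORBIDITY)
up_n, down_n, n_nonevents = reclassified(WITHOUT_MORBIDITY)

# Events are correctly reclassified when moved up; non-events when moved down.
event_nri = (up_e - down_e) / n_events
nonevent_nri = (down_n - up_n) / n_nonevents
overall_nri = event_nri + nonevent_nri

print(f"event NRI: {100 * event_nri:.1f}%")
print(f"non-event NRI: {100 * nonevent_nri:.1f}%")
print(f"overall NRI: {100 * overall_nri:.1f}%")
```

The counts sum to 3,485 cases with morbidity and 158,717 without, which together equal the 162,202 cases of the primary cohort.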
DISCUSSION
Optimization of triage into POM clinics represents a unique challenge within healthcare systems, as POM clinics must evaluate patients with the full spectrum of medical conditions before they undergo nearly any type of surgical procedure. To our knowledge, the CAST score is the first scoring system based on the risk of postoperative complications to be developed and embedded into the EHR in the context of POM triage. The score represents a simple, deployable model with good discrimination that uses data elements that are readily available in most comprehensive EHRs. The CAST score appears to exhibit better discrimination for post-operative complications than clinical judgement alone. Incorporating the case complexity designation within CAST did not meaningfully augment the predictive discrimination of the model, which suggests that the objective and subjective factors that influence how surgeons determine complexity designation may be captured in the CAST model.
Our data further suggest that CAST-based POM triage would have correctly reclassified patients who experienced post-operative adverse events into higher levels of POM evaluation, while not substantially altering the risk classification of those who did not have significant complications.
Multiple different POM triage practices are described in the literature, including clinical judgement alone,18,25 patient questionnaires,23,36,37 or scoring systems based on patient and surgical procedure characteristics.24–26,34,38 Investigators have previously described developing and implementing scoring systems for POM triage that are based on the risk of the patient being ASA score III or IV.24,26,39–43 However, given the inherent subjectivity and poor inter-rater reliability in the assignment of ASA score by providers,40,44,45 we questioned whether training a model to predict ASA would yield the most consistent approach to POM triage. Our analysis demonstrates that compared with ASA, CAST exhibited greater discrimination for poor surgical outcomes.
We found that the clinician-assessed case complexity was variable, with as much as a four-fold interfacility difference in the percentage of patients designated as complex by the surgeon. It is unclear what proportion of the observed interfacility variation in patient complexity designation can be attributed to true differences in case complexity or risk versus practice variation. However, the extensive degree of observed variation suggests potential over- or under-triage to POM that may be improved through standardization. This variability is perhaps unsurprising, given prior studies that suggest surgeons have varying attitudes towards the indications for POM evaluation.46 Another potential mechanism by which the implementation of the CAST score could improve safety is by standardizing triage practices across departments and facilities.47
While CAST score is designed to select default triage pathways to POM clinics, our implementation requires surgeon oversight and allows the surgeon to re-triage a given case. Our view is that clinician oversight is an essential feature for successfully embedding predictive models into clinical workflows. There may be several reasons why a surgeon would choose to override the default CAST-based triage selection. First, patients may have specific perioperative medical or social needs, such as life care planning, that warrant longer appointments. The CAST model standardizes triage based on predicted risk, but a low risk patient may still require a complex evaluation. Second, surgeons might request the specific consultation of POM practitioners in the course of weighing the choice between surgical and non-surgical management. Third, the CAST model does not directly include prior care, and the adequacy of prior care may alter the need for intensive POM evaluation. For example, a patient may have recently been seen by a pulmonologist or cardiologist and therefore the surgeon may deem that a less intensive POM evaluation may be appropriate. Finally, there may be settings with established pre-operative pathways that may obviate the need for CAST-based triage or POM altogether. For example, bariatric surgery patients are routinely subject to intense medical and psychosocial screening before considering weight loss surgery, and they also have relatively lower baseline comorbidity, which likely explains why the CAST model performs poorly in bariatric surgery patients.
Furthermore, the CAST score is not intended to guide specific decisions about whether an individual patient or surgeon should pursue surgery. Various registry-based online risk assessment tools, including the American College of Surgeons NSQIP risk calculator, have been validated in larger populations and can quantify the risks of outcomes that are specific to the type of operation that is being considered. Within our system, the use of the NSQIP risk calculator is encouraged to support discussions between patients and providers. The CAST score is designed for automated use to improve on potentially heterogeneous pre-surgical processes. While other pre-operative risk stratification tools could be used to support risk-based POM triage, the calculation of the CAST score is automatable within most comprehensive EHRs, whereas most risk stratification tools require manual input.
Our study has several limitations. First, while our cohort of surgical cases spanned 21 hospitals that vary in size, local practices, and culture, our analysis was performed at a single highly integrated health care delivery system, which may limit the generalizability of our results. KPNC members receive the vast majority of their care within our system, which improves the quality and availability of data. External validation of our model and evaluation of its impact on POM designation are essential. Second, while other studies have demonstrated the potential for reduced post-operative complications and improved surgical efficiency due to POM visits, our study did not assess the relationship between methods of POM triage and patient outcomes. While predictive models can show strong performance in risk stratifying patient populations, that may not translate into improvements in surgical outcomes. Further evaluation will be useful to assess the impact of the program on surgical outcomes and to compare POM triage strategies based on measures that capture risk, modifiability of risk, and complexity of evaluations. Finally, our model showed a low area under the precision-recall curve, which can translate into a higher ‘number needed to evaluate’ metric.
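The last limitation can be illustrated with a small calculation. This sketch assumes that the number needed to evaluate is taken as the reciprocal of precision (positive predictive value), and it uses the observed 30-day morbidity rates by CAST level from Table 4 as the precision of each level. The function name is hypothetical, and the source does not report these derived values.

```python
def number_needed_to_evaluate(precision: float) -> float:
    """Cases evaluated per case that goes on to have the outcome (1 / precision)."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return 1.0 / precision


# Observed 30-day morbidity by CAST level, from Table 4.
MORBIDITY_BY_LEVEL = {"Low": 0.009, "Medium": 0.036, "High": 0.060}

for level, rate in MORBIDITY_BY_LEVEL.items():
    nne = number_needed_to_evaluate(rate)
    print(f"{level} CAST: about {nne:.0f} cases evaluated per case with morbidity")
```

Because the outcome is rare (2.2% overall), even the High CAST group contains many patients who do not go on to have a complication, which is why precision-based metrics look modest despite an AUROC of 0.75.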
Acknowledgments
This study was supported by The Permanente Medical Group (TPMG) and the TPMG Delivery Science Fellowship program. VXL was also supported by R35GM128672.
Footnotes
The authors report no conflicts of interest.
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website, www.annalsofsurgery.com.