Author manuscript; available in PMC 2023 Jul 1.
Published in final edited form as: Anesthesiology. 2022 Jul 1;137(1):55–66. doi: 10.1097/ALN.0000000000004139

Personalized surgical transfusion risk prediction using machine learning to guide preoperative type and screen orders

Sunny S Lou 1, Hanyang Liu 2, Chenyang Lu 2, Troy S Wildes 1, Bruce L Hall 3, Thomas Kannampallil 1,4
PMCID: PMC9177553  NIHMSID: NIHMS1770229  PMID: 35147666

Abstract

Background

Accurate estimation of surgical transfusion risk is essential for efficient allocation of blood bank resources and for other aspects of anesthetic planning. We hypothesized that a machine learning model incorporating both surgery- and patient-specific variables would outperform the traditional approach that uses only procedure-specific information, allowing for more efficient allocation of preoperative type and screen orders.

Methods

The American College of Surgeons National Surgical Quality Improvement Program Participant Use File was used to train four machine learning models to predict likelihood of red cell transfusion using surgery-specific and patient-specific variables. A baseline model using only procedure-specific information was created for comparison. Models were trained on surgical encounters that occurred at 722 hospitals in 2016–2018. Models were internally validated on surgical cases that occurred at 719 hospitals in 2019. Generalizability of the best-performing model was assessed by external validation on surgical cases occurring at a single institution in 2020.

Results

Transfusion prevalence was 2.4% (73,313/3,049,617), 2.2% (23,205/1,076,441), and 6.7% (1,104/16,053) across the training, internal validation, and external validation cohorts, respectively. The gradient boosting machine was the best-performing model and outperformed the baseline model. At a fixed 96% sensitivity, this model had a positive predictive value of 0.06 and 0.21 and recommended type and screens for 36% and 30% of the patients in internal and external validation, respectively. By comparison, the baseline model at the same sensitivity had a positive predictive value of 0.04 and 0.14 and recommended type and screens for 57% and 45% of the patients in internal and external validation, respectively. The most important predictor variables were the overall procedure-specific transfusion rate and preoperative hematocrit.

Conclusions

A personalized transfusion risk prediction model was created using both surgery- and patient-specific variables to guide preoperative type and screen orders and showed better performance compared to the traditional procedure-centric approach.

Introduction

Blood transfusion can be a lifesaving therapy in the perioperative setting. Ideally, patients with nontrivial risk of transfusion should receive blood typing and antibody screening pre-procedurally to ensure the availability of compatible blood products1. Conversely, patients with low risk of transfusion should be spared the discomfort and cost of an unnecessary lab test1. Information on transfusion risk is also useful for anesthetic planning and decision making, including aiding in decisions on the need for additional intravenous access or invasive monitoring. Therefore, accurate estimation of a patient’s likelihood of transfusion has implications for both patient safety and cost.

A common approach for estimating a patient's likelihood of transfusion is based on surgical characteristics such as the historical percentage of patients undergoing that procedure who require transfusion2–5, referred to as the procedure-specific transfusion rate. For example, previous studies proposed that patients undergoing surgery with a transfusion rate less than 5% and average blood loss less than 50 milliliters could omit a type and screen4,5. However, in previous guidelines, patient-specific factors were not considered, even though preoperative anemia, renal dysfunction, patient age, size, and sex have all been associated with an increased likelihood of surgical transfusion6–8. We hypothesized that a machine learning model incorporating both patient- and surgery-specific variables would provide better discrimination of transfusion risk compared to current guidelines; such models at the point of care could allow for more efficient decision making regarding preoperative type and screen orders.

Machine learning techniques rely on computer-based algorithms to identify patterns in large datasets and make predictions by learning from examples9. Machine learning has shown promise for clinical prediction in many domains of medicine10, including transfusion risk prediction7,8,11–16, but previous work has typically focused on modeling a limited subset of procedures and did not incorporate practical considerations necessary to guide preoperative blood ordering practice, such as the asymmetric harms of false positive and false negative predictions. In this study, we had the following research objectives: (1) develop, evaluate, and validate the performance of multiple machine learning models that estimate surgical red cell transfusion risk using both surgery- and patient-specific variables; (2) compare machine learning model performance to a baseline model that uses only the procedure-specific transfusion rate, the current standard of care; and (3) evaluate the performance of transfusion risk prediction models when tailored specifically for the decision to order a type and screen.

Materials and methods

The protocol for this retrospective observational study was approved by the institutional review board of Washington University (IRB #202102003, St. Louis, Missouri) with a waiver of informed consent. This study is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines17. An overview of the pre-specified experimental design is shown in Figure 1. The analysis plan was written after the data were accessed.

Figure 1 – Diagram of experimental design.

Models were trained exclusively on the National Surgical Quality Improvement Program Participant Use File surgical case cohort from 2016–2018, which was split 80%/20%: 80% for model training and hyperparameter tuning and 20% for model evaluation, early stopping, and selection of the final model. Once the final model in each model category was chosen based on the training data, all model parameters were fixed, and models were evaluated on the internal validation data, which contained cases performed in 2019 from the same national database, and on the external validation data, which contained surgical cases performed at a single academic institution in 2020.

Data sources

This study included all surgical cases submitted to the American College of Surgeons National Surgical Quality Improvement Program18 for the time period spanning 1/1/2016–12/31/2019. No formal power calculation was performed to determine data set size. The National Surgical Quality Improvement Program Participant Use File contains information on surgical procedures performed on adult patients sampled from academic and community hospitals across the United States. In this database, only the occurrence of red cell transfusion is captured; the details regarding the quantity of red cell transfusion or the transfusion of other blood products such as fresh frozen plasma or platelets are not available.

We also extracted information on surgical procedures performed between 1/1/2020 and 12/31/2020 on adult patients at Barnes-Jewish Hospital, a large academic medical center, from the electronic health record (Epic Systems, Verona, WI). In addition, transfusion data were collected for cases performed at the same institution between 1/1/2019 and 12/31/2019 to estimate the procedure-specific transfusion rate needed to evaluate model performance for the 2020 dataset, as described below. We were unable to use transfusion data from before 2019 to estimate procedure-specific transfusion rates for the external validation dataset due to a transition in the electronic health record system implemented in 2018.

Data sources were split into subsets that are referred to as training (for National Surgical Quality Improvement Program data collected between 1/1/2016–12/31/2018), internal validation (for National Surgical Quality Improvement Program data collected between 1/1/2019–12/31/2019), and external validation (for Barnes-Jewish Hospital data collected between 1/1/2020–12/31/2020).

Our goal was to create a model with generalizable performance across a diverse set of surgical procedures. Towards that end, all procedures occurring in an operating room were included. However, two groups of surgical procedures – ophthalmologic surgery and obstetric surgery – have special considerations related to blood ordering and transfusion19 for which a general surgical model may not apply. These two groups are not present in the National Surgical Quality Improvement Program database and therefore were excluded from the Barnes-Jewish Hospital cohort to match.

Variable selection and extraction

The outcome variable for our models was the presence of red cell transfusion on the day of surgery as a binary outcome. For the training and internal validation data, this was transfusion in either the intraoperative or postoperative period on the day of surgery. For the external validation dataset, this included transfusion during the intraoperative period only, due to the lack of postoperative transfusion data.

Predictor variables were selected based on data availability, prior literature, and ease of retrieval from an electronic health record. Our goal was to create a model that can be implemented directly within an electronic health record.

The following patient-specific variables were included: patient demographics (age, height, weight, sex), patient comorbidities (history of hypertension, congestive heart failure, smoking, chronic obstructive lung disease, dialysis, diabetes), and patient preoperative lab values (hematocrit, platelet count, international normalized ratio, partial thromboplastin time, creatinine, sodium, albumin, and bilirubin). American Society of Anesthesiologists (ASA) physical status was not included due to its history of poor inter-rater reliability20,21 and its potential lack of availability in the preoperative setting.

The surgery-specific variables included in the model were elective surgery status (i.e., whether the patient arrived for surgery from home) and the historical procedure-specific transfusion rate, the latter pre-computed as described below. Whether each variable was considered binary, ordinal, or continuous is indicated in Table 1. Binary variables were coded 0 or 1. Ordinal variables with N levels were coded as consecutive integers starting at 0.
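
As a concrete illustration of this coding scheme, the following minimal sketch shows how such variables might be encoded with pandas. The column names and level labels are hypothetical, not drawn from the study's data dictionary:

    import pandas as pd

    # Hypothetical predictor columns; names and labels are illustrative only.
    df = pd.DataFrame({
        "sex": ["M", "F", "M"],
        "diabetes": ["none", "meds", "insulin"],  # ordinal, three levels
        "elective": [True, False, True],          # binary
    })

    df["sex"] = (df["sex"] == "M").astype(int)    # binary coded 0/1
    df["elective"] = df["elective"].astype(int)   # binary coded 0/1
    # Ordinal levels mapped to consecutive integers starting at 0.
    df["diabetes"] = df["diabetes"].map({"none": 0, "meds": 1, "insulin": 2})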

Table 1 –

Demographic characteristics of the training, internal validation, and external validation datasets.

Variable Data Type Training: (n = 3,049,617) Internal Validation: (n = 1,076,441) External Validation: (n = 16,053)
Transfused – Yes, n (%) Binary 73,313 (2.4%) 23,205 (2.2%)* 1,104 (6.7%)*
Age, median (IQR) Continuous 58 (45–69) 59 (44–70)* 59 (46–69)
Height (in), median (IQR) Continuous 66 (63–69) 66 (63–69)* 67 (64–70)*
Weight (lbs), median (IQR) Continuous 181 (152–216) 180 (152–214)* 185 (155–220)*
Sex – Male, n (%) Binary 1,316,142 (43%) 454,517 (42%)* 7,592 (47%)*
ASA status – I, n (%) Ordinal 253,948 (8%) 85,144 (7.9%)* 658 (4%)*
 II, n (%) 1,356,498 (45%) 486,469 (45%)* 5,962 (37%)*
 III, n (%) 1,247,668 (41%) 442,515 (41%)* 7,504 (47%)*
 IV, n (%) 178,613 (6%) 57,950 (5%)* 1,842 (11%)*
 V, n (%) 5,413 (0.2%) 1,905 (0.2%)* 69 (0.4%)*
Hypertension – Yes, n (%) Binary 1,354,643 (44%) 466,520 (43%)* 7,227 (45%)
Congestive heart failure – Yes, n (%) Binary 25,930 (1%) 9,051 (1%) 1,192 (7%)*
Smoking – Yes, n (%) Binary 516,646 (17%) 165,551 (15%)* 1,760 (11%)*
COPD – Yes, n (%) Binary 129,528 (4%) 43,290 (4%)* 1,337 (8%)*
Dialysis – Yes, n (%) Binary 39,343 (1%) 11,459 (1%)* 524 (3%)*
Diabetes – No, n (%) Ordinal 2,574,255 (84%) 913,163 (85%)* 13,192 (82%)*
 Meds, n (%) 300,959 (10%) 106,689 (10%)* 1,881 (12%)*
 Insulin, n (%) 174,402 (6%) 56,589 (5%)* 979 (6%)*
Hematocrit, %, median (IQR) Continuous 40.1 (36.9–43.1) 40.4 (37.0–43.4)* 37.7 (32.0–41.8)*
Platelet, 1000/uL, median (IQR) Continuous 242 (199–291) 246 (202–296)* 243 (191–306)*
INR, median (IQR) Continuous 1.0 (1.0–1.1) 1.0 (1.0–1.1)* 1.2 (1.1–1.3)*
PTT, s, median (IQR) Continuous 29 (27–32) 29 (27–33)* 31 (28–34)*
Creatinine, mg/dL, median (IQR) Continuous 0.85 (0.70–1.02) 0.85 (0.70–1.02)* 0.93 (0.75–1.20)*
Sodium, mEq/L, median (IQR) Continuous 139 (138–141) 139 (138–141)* 139 (137–141)*
Albumin, g/dL, median (IQR) Continuous 4.0 (3.6–4.3) 4.1 (3.7–4.4)* 4.0 (3.3–4.4)*
Bilirubin, mg/dL, median (IQR) Continuous 0.5 (0.4–0.7) 0.5 (0.4–0.7)* 0.4 (0.3–0.6)*
Elective surgery – Yes Binary 2,429,994 (80%) 847,691 (79%)* 10,686 (67%)*
Procedure-specific transfusion rate (%), median (IQR) Continuous 0.29 (0.07–1.69) 0.30 (0.07–1.66)* 1.08 (0.00–7.50)*

Machine learning models were trained and internally validated using National Surgical Quality Improvement Program Participant Use File data from 2016–18 and 2019, respectively. External validation was performed using a separate cohort of surgical cases occurring at a single academic medical center in 2020. Demographic characteristics for each of these datasets are shown. Median (interquartile range) is shown for continuous variables; count (%) is shown for categorical variables. The outcome variable is "Transfused – Yes/No," representing whether the patient received red cell transfusion on the day of surgery for the training and internal validation data, and intraoperative transfusion for the external validation data. Diabetes is an ordinal variable with three levels: no diabetes, diabetes requiring medications, and diabetes requiring insulin therapy. Elective surgery indicates whether the patient arrived for surgery from home rather than from an inpatient or emergency department bed. COPD – history of chronic obstructive lung disease. INR – international normalized ratio. PTT – partial thromboplastin time.

* Indicates Bonferroni-corrected p < 0.05 when compared to the training cohort.

For the training and internal validation data, each selected variable corresponded to a variable available in the National Surgical Quality Improvement Program Participant Use File. Patient comorbidities were abstracted from medical records by trained data experts at each participating hospital according to detailed criteria22. Laboratory values represented the most recent value drawn within 90 days of the surgery date.

For the external validation data, patient comorbidities were extracted from structured preoperative assessment notes (Figure S1). Patient demographics and preoperative lab values were extracted from the electronic health record using the same criteria as used for the training and internal validation data.

Computing historical procedure-specific transfusion rates

The computation of the procedure-specific transfusion rate, i.e., the historical frequency of transfusion for each surgery, differed between the training, internal validation, and external validation datasets. For the training dataset, the prevalence of transfusion for each unique primary Current Procedural Terminology code was computed across the entire training dataset, and the resulting surgery-specific transfusion frequency table was mapped to each case based on its primary Current Procedural Terminology code. The Current Procedural Terminology codes were otherwise not used in the models. For example, 80 of 127,315 (0.06%) laparoscopic appendectomies (Current Procedural Terminology code 44970) in the training data required red cell transfusion on the day of surgery. Therefore, all laparoscopic appendectomies in the training and internal validation data were annotated with a procedure-specific transfusion rate of 0.06%.

To simulate prospective implementation of the models developed on the training data for the internal validation data, the same surgery-specific transfusion frequency table from the 2016–2018 training data was used to annotate surgeries performed in the 2019 internal validation dataset; i.e., the actual transfusion prevalence for each surgery in 2019 was not used to avoid label leakage. New primary Current Procedural Terminology codes that only occurred in 2019 were assigned missing values for surgery-specific transfusion frequency. There were 2,796 procedure types included in the internal validation analysis (Supplemental Appendix 1).
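
A minimal sketch of this annotation step, assuming toy data frames and hypothetical column names (the study's actual pipeline is in the linked repository), might look like:

    import pandas as pd

    # Toy cohorts; column names are hypothetical.
    train = pd.DataFrame({"cpt_code": ["44970", "44970", "33533"],
                          "transfused": [0, 1, 1]})
    valid = pd.DataFrame({"cpt_code": ["44970", "99999"]})  # "99999" unseen in training

    # Historical transfusion rate per primary CPT code, computed on training data only.
    rate_table = train.groupby("cpt_code")["transfused"].mean()

    # Annotate both cohorts with the 2016-2018 rates; codes first seen in 2019
    # map to NaN and are handled as missing values downstream.
    train["proc_rate"] = train["cpt_code"].map(rate_table)
    valid["proc_rate"] = valid["cpt_code"].map(rate_table)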

A similar process was performed for the external validation dataset; however, the historical transfusion rate was not drawn from the national database but was computed specifically for surgeries occurring at Barnes-Jewish Hospital in 2019, grouped by the pre-procedural primary procedure name used for the case booking (this pre-procedural text was not available for the training and internal validation data). In other words, for the external validation analysis, a set of procedure-specific transfusion rates was computed that was specific to Barnes-Jewish Hospital. We chose not to use Current Procedural Terminology codes to group procedures for the external validation dataset, as these codes are often not available in the preoperative setting and would thus limit the practical translation and implementation of our models.

To ensure reliability of the historical procedure-specific transfusion rates used for external validation, only primary procedures that occurred at least 50 times in 2019 were included in the external validation dataset. This cutoff was chosen such that the estimated 95% confidence intervals for transfusion frequency would be no worse than approximately ±5%, calculated using exact Clopper-Pearson confidence intervals for a binomial proportion. Of the 30,114 surgical procedures performed at Barnes-Jewish Hospital in 2020, 16,053 (53.3%) were booked with a primary procedure that occurred at least 50 times in 2019 and were thus included in the analysis. There were 171 primary procedures included in the external validation analysis (Supplemental Appendix 2).
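
For illustration, an exact Clopper-Pearson interval can be computed from the beta distribution; the sketch below (not the authors' code) shows the interval width attainable at the n = 50 cutoff:

    from scipy.stats import beta

    def clopper_pearson(k, n, alpha=0.05):
        """Exact two-sided (1 - alpha) CI for a binomial proportion k/n."""
        lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lower, upper

    # e.g., 0 transfusions observed in 50 cases: the 95% CI upper bound is
    # roughly 0.07, illustrating the precision available at the 50-case cutoff.
    print(clopper_pearson(0, 50))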

Data preprocessing

Missing values were present in the training, internal validation, and external validation datasets with frequencies indicated in Table S1. Missing values were replaced by median imputation using the data distribution of the training data for all three datasets. Median imputation is likely a reasonable approach here because missingness for lab values, which were the most commonly missing variables, is influenced by low clinician suspicion of abnormality23. The per-variable median values used for imputation are shown in Table S2. No further pre-processing, such as non-linear scaling of continuous variables, was performed.

To speed model training, all data were normalized by mean subtraction and scaling to unit variance, using only the data distribution of the training data for all three datasets. The per-variable mean and variance used for normalization are shown in Table S2.
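
A sketch of this preprocessing with scikit-learn, using toy arrays, is shown below; the key point is that the imputer and scaler are fit on the training split only and then applied unchanged to the validation cohorts:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[40.1, 1.0], [36.9, np.nan], [43.1, 1.1]])  # toy data
    X_valid = np.array([[np.nan, 1.2]])

    preproc = make_pipeline(
        SimpleImputer(strategy="median"),  # training-set medians fill missing values
        StandardScaler(),                  # training-set mean/variance for normalization
    )
    X_train_p = preproc.fit_transform(X_train)  # statistics learned from training data
    X_valid_p = preproc.transform(X_valid)      # same statistics reused, no refitting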

Model training

The training data was split 80% for model training and hyperparameter tuning and 20% for model testing and early stopping (Figure 1). Four supervised machine learning models were trained using the selected predictor variables to predict the binary transfusion outcome.

The following models were constructed: penalized logistic regression24 with tuning of lasso and ridge parameters, decision tree25 with tuning of tree depth, random forest26 with tuning of tree depth and number of features considered each split, and gradient boosting machines, implemented using XGBoost27 1.2.0, with tuning of tree depth, node purity precluding a split, and feature sub-sampling. Early stopping was used to determine the number of boosting rounds for XGBoost using average precision achieved on the 20% test split of the training data. All models were implemented in scikit-learn28 0.22.1 and Python 3.7.6 with hyperparameter tuning determined using five-fold cross-validation on the 80% training split to optimize average precision.
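
A simplified sketch of the gradient boosting setup described above is given below, using synthetic data in place of the registry variables. It uses the xgboost >= 1.6 constructor API; the study used XGBoost 1.2.0, where early_stopping_rounds was instead passed to fit(). Hyperparameter values shown are placeholders, not the tuned values:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Synthetic, class-imbalanced stand-in for the registry data (~3% positives).
    X, y = make_classification(n_samples=2000, weights=[0.97], random_state=0)

    # 80/20 split of the training data: 80% for fitting and tuning,
    # 20% for early stopping and final model selection.
    X_fit, X_stop, y_fit, y_stop = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    model = XGBClassifier(
        max_depth=6,               # tuned via five-fold cross-validation in the study
        min_child_weight=1,        # node purity required to allow a split
        colsample_bytree=0.8,      # feature sub-sampling
        n_estimators=1000,
        eval_metric="aucpr",       # area under the precision recall curve
        early_stopping_rounds=20,  # stop when held-out average precision plateaus
    )
    model.fit(X_fit, y_fit, eval_set=[(X_stop, y_stop)], verbose=False)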

To facilitate comparison between our models and previous methods for determining the need for preoperative type and screen orders4, we also created a baseline model using a single variable: the procedure-specific transfusion rate4. We were unable to include estimated blood loss because it was not reported in the National Surgical Quality Improvement Program database and was poorly documented in the anesthetic records for the external validation dataset.

Model evaluation

Following model training on the training data, all model parameters were fixed, and models were evaluated on the internal validation data. The best-performing model was then evaluated on the external validation dataset.

Overall model discrimination was evaluated with area under the receiver operating characteristic curve, i.e., c-statistic, and area under the precision recall curve, i.e., average precision. Average precision was chosen because it measures model discrimination specifically for the positive class, which is the more relevant class given the relative rarity of surgical transfusion. Calibration of model predicted transfusion risk is also an important measure of model performance29, especially if model predictions are to be useful to guide other aspects of anesthetic care. Calibration of model predicted probabilities was assessed using a calibration curve. Net benefit analysis was also performed to assess the relative value of the models across a range of prediction thresholds30 using the Decision Curve Analysis R package31.
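
In scikit-learn terms, these discrimination and calibration metrics can be sketched as follows; y_true and y_prob are placeholders for a cohort's observed outcomes and predicted probabilities, not study data:

    from sklearn.calibration import calibration_curve
    from sklearn.metrics import average_precision_score, roc_auc_score

    y_true = [0, 0, 1, 0, 1, 0, 0, 0]                          # placeholder outcomes
    y_prob = [0.02, 0.10, 0.40, 0.05, 0.90, 0.01, 0.30, 0.02]  # placeholder predictions

    c_statistic = roc_auc_score(y_true, y_prob)              # area under the ROC curve
    avg_precision = average_precision_score(y_true, y_prob)  # area under the PR curve
    # Calibration curve: observed event rate vs. mean predicted risk per bin.
    frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=2)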

For the specific use case of developing a transfusion risk prediction model to guide preoperative type and screen orders, a model decision threshold should balance the relative harms of false negatives (model predicts no transfusion, no type and screen is ordered, but patient subsequently requires transfusion) and false positives (model predicts transfusion, type and screen ordered, but no transfusion is required). Given that the potential patient safety harms of false negatives are far greater than the mostly monetary harms of false positives, we decided to set all model thresholds to achieve 96% sensitivity on the training data. Thresholds chosen on the training data were carried over to the internal validation data. Thresholds were readjusted on the external validation dataset due to the higher prevalence of transfusion in that dataset.
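
One simple way to derive such a threshold from the training data is to take a quantile of the predicted scores among transfused patients; the sketch below assumes scores at or above the threshold trigger a type and screen recommendation (variable names are hypothetical):

    import numpy as np

    def threshold_for_sensitivity(y_true, y_prob, sensitivity=0.96):
        """Threshold such that ~`sensitivity` of true positives score at or
        above it (computed on training data, then held fixed)."""
        positive_scores = np.asarray(y_prob)[np.asarray(y_true) == 1]
        # The (1 - sensitivity) quantile of positive-class scores leaves
        # roughly 4% of transfused patients below the threshold.
        return np.quantile(positive_scores, 1 - sensitivity)

    # Usage: thr = threshold_for_sensitivity(y_train, p_train)
    #        recommend_type_and_screen = p_new >= thr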

Model performance was then evaluated in terms of the positive predictive value and the overall frequency of positive predictions, both of which are metrics of excess type and screen orders, i.e., type and screens recommended for patients who did not actually require transfusion. The potential cost savings for each model in reducing excess type and screen orders was evaluated in comparison to the baseline model using the 2020 Medicare Clinical Laboratory Fee Schedule32. To estimate cost savings, the difference in excess type and screen orders between models was multiplied by the Medicare reimbursement rate for a type and screen ($15.75).

Model explanation

Machine learning predictions are typically more trusted when explanations for why the model makes a particular prediction are available33. We used Shapley values34, a coalitional game theory approach to interpretation of machine learning predictions, to measure overall variable importance and explain individual patient predictions for the best-performing gradient boosting model, implemented with SHAP34 0.37.0. For this model, Shapley values for each variable are represented in logit space, similar to the coefficients for a logistic regression model34. For a particular value of a variable, a high magnitude (i.e., absolute value) Shapley value indicates that the variable caused a large change in the model’s predicted risk; a negative Shapley value implies that the value for that variable decreased risk, and a positive value implies increased risk.

Shapley values were computed for individual patients to explain individual model predictions. To measure overall variable importance for a cohort, Shapley values were computed for all of the patients in the cohort and illustrated using a beeswarm plot. They were also summarized across the cohort using the mean absolute value of all of the Shapley values for each variable, indicating the overall extent to which the variable contributed to the model’s predictions.
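
The explanation workflow, sketched below under stated assumptions, takes a fitted XGBoost model `model` and predictor matrix `X` (e.g., from the training sketch above) and uses the SHAP tree explainer as in the study:

    import shap  # the study used SHAP 0.37.0

    explainer = shap.TreeExplainer(model)   # `model` and `X` assumed from above
    shap_values = explainer.shap_values(X)  # per-patient, per-variable, logit space

    # Explanation of a single patient's prediction (e.g., the first case).
    shap.force_plot(explainer.expected_value, shap_values[0], X[0])

    # Cohort-level variable importance: beeswarm plot of all Shapley values.
    shap.summary_plot(shap_values, X)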

All computer code necessary for model training, evaluation, and explanation is available at https://github.com/sslou/publications/tree/main/2021_blood_product. Example code is also provided for generating model predictions and explanations on new data.

Statistical analysis

Differences in variable distribution between the training, internal validation, and external validation populations were explored with descriptive statistics, the Mann-Whitney U test for continuous variables, and the chi-squared test for categorical variables. Two-tailed tests were employed throughout. Confidence intervals for model performance metrics were generated by bootstrap resampling of each dataset. Pairwise comparisons of model performance were assessed using McNemar's test statistic35, a chi-squared test statistic evaluating the comparative accuracy of two models. A Bonferroni-corrected p-value of 0.05 was used to determine statistical significance. Statistical analysis was conducted using Python 3.7.6.
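
As an illustration, McNemar's chi-squared comparison of two models' case-level accuracy can be computed with statsmodels; the arrays below are placeholders (1 = correct prediction, 0 = incorrect), not study data:

    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    correct_a = np.array([1, 1, 0, 1, 0, 1, 1, 0])  # model A correct per case
    correct_b = np.array([1, 0, 0, 1, 1, 1, 0, 0])  # model B correct per case

    # 2x2 agreement table; the off-diagonal discordant counts drive the test.
    table = [[np.sum((correct_a == 1) & (correct_b == 1)),
              np.sum((correct_a == 1) & (correct_b == 0))],
             [np.sum((correct_a == 0) & (correct_b == 1)),
              np.sum((correct_a == 0) & (correct_b == 0))]]
    print(mcnemar(table, exact=False, correction=True))  # chi-squared statistic, p-value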

Results

Cohort characteristics

Models were trained on a cohort of 3,049,617 surgical encounters that occurred at 722 hospitals across the United States during 2016–2018, internally validated on a cohort of 1,076,441 surgical encounters occurring at 719 hospitals in 2019, and externally validated on a cohort of 16,053 surgical encounters that occurred at a single institution in 2020.

Demographic characteristics and the distribution of variables used in the models for the three cohorts are shown in Table 1. Overall, the data distribution was similar between the training and internal validation cohorts, but the external validation cohort was less healthy (Table 1). Transfusion was required in 2.4% of surgical encounters in the training data, 2.2% in the internal validation data, and 6.7% in the external validation data.

Performance of machine learning models

For comparison with widely accepted guidelines for preoperative blood typing and antibody screening4, we constructed a baseline model that reports transfusion probability simply as the historical procedure-specific transfusion rate for each procedure (Table 2). This baseline model achieved a c-statistic of 0.888 (95% confidence interval [CI] 0.881–0.894) and an average precision of 0.215 (95% CI 0.197–0.235).

Table 2 –

Model performance on the internal validation data.

Model c-statistic Average Precision Sensitivity Specificity Positive Predictive Value % Positive
Baseline 0.888 (0.881–0.894) 0.215 (0.197–0.235) 0.970 (0.963–0.977) 0.439 (0.436–0.442) 0.037 (0.035–0.038) 57.0% (56.7–57.3)
Logistic Regression 0.907 (0.900–0.913) 0.280 (0.260–0.300) 0.962 (0.954–0.970) 0.512 (0.509–0.515) 0.042 (0.040–0.043) 49.8% (49.5–50.1)
Decision Tree 0.916 (0.910–0.922) 0.298 (0.278–0.318) 0.963 (0.955–0.971) 0.618 (0.614–0.621) 0.053 (0.050–0.055) 39.5% (39.2–39.8)
Random Forest 0.913 (0.906–0.918) 0.247 (0.230–0.265) 0.962 (0.954–0.970) 0.593 (0.590–0.597) 0.050 (0.047–0.052) 41.9% (41.5–42.2)
Gradient Boosting Machine 0.924 (0.919–0.929) 0.292 (0.273–0.314) 0.963 (0.956–0.971) 0.651 (0.648–0.654) 0.058 (0.055–0.060) 36.2% (35.9–36.5)

Penalized logistic regression24, decision tree25, random forest26, and gradient boosting machine27 models were trained on the training data and evaluated on the internal validation data to predict transfusion on the day of surgery, using the procedure-specific transfusion rates observed in 2016–18. For comparison, a baseline model is also presented that used only the procedure-specific transfusion rate. c-statistic – area under the receiver operating characteristic curve. Average precision – area under the precision recall curve, indicative of model discrimination for the positive class. % Positive indicates the percent of cases in the cohort for whom the model made a positive prediction, i.e., recommended a type and screen. All model decision thresholds were fixed to achieve 96% sensitivity on the training data. 95% confidence intervals are shown in parentheses.

We constructed four machine learning models: penalized logistic regression, decision tree, random forest, and gradient boosting machine. Model discrimination metrics for the internal validation data are shown in Table 2. The gradient boosting machine outperformed the other models (pairwise McNemar's test p-values < 0.001), achieving a c-statistic of 0.924 (95% CI 0.919–0.929) and an average precision of 0.292 (95% CI 0.273–0.314). Calibration plots for all the described models are shown in Figure S2.

For the specific use case of guiding preoperative type and screen orders, we set model discrimination thresholds to achieve 96% sensitivity, given the asymmetric harms of false positive (i.e., patient has type and screen but does not require transfusion) and false negative (i.e., patient requires transfusion but has no type and screen) predictions. Sensitivity can also be characterized as the percentage of patients requiring transfusion who had a preoperative type and screen recommended by each model. For reference, the 5% procedure-specific transfusion rate threshold previously described to guide type and screen decisions4,5 only achieved a sensitivity of 83.7% (95% CI 82.1–85.2%). At 96% sensitivity, the best-performing gradient boosting model made a positive prediction, i.e., recommended a type and screen, for 36.2% (95% CI 35.9–36.5) of the cases in the internal validation cohort with a positive predictive value of 0.058 (0.055–0.060). In contrast, the baseline model recommended type and screens for 57.0% (95% CI 56.7–57.3) with a positive predictive value of 0.037 (95% CI 0.035–0.038). In other words, for the same sensitivity or false negative rate, the gradient boosting model required one-third fewer type and screen orders as compared with the baseline or penalized logistic regression models.

Model generalizability to external validation

The best-performing gradient boosting model was also evaluated on an independent hold-out external validation cohort in comparison to the baseline model (Table 3). With the threshold set to achieve 96% sensitivity, the gradient boosting model recommended type and screens for 31.0% (95% CI 30.4–31.6) of cases with a PPV of 0.213 (95% CI 0.203–0.223). The baseline model was less efficient, recommending type and screens for 45.7% (95% CI 45.0–46.4) of cases with a PPV of 0.144 (95% CI 0.137–0.151). The gradient boosting model failed to recommend a type and screen for 45 (95% CI 34–57) patients who subsequently required transfusion in this cohort (0.28%), whereas the baseline model failed to recommend a type and screen for 47 (95% CI 35–59) patients who subsequently required transfusion in this cohort (0.29%). Given that the gradient boosting model ordered 2,360 fewer type and screens than the baseline model for this cohort of 16,053 patients, the estimated cost savings for implementing the personalized model was $37,167 ($2.32 per patient) for this cohort, as calculated using the Medicare reimbursement price of $15.75 for a type and screen. The gradient boosting model also had higher net benefit (Figure S3).

Table 3 –

Discrimination of the best-performing gradient boosting model on the external validation data.

Model c-statistic Average Precision Sensitivity Specificity Positive Predictive Value % Positive
Baseline 0.908 (0.899–0.916) 0.511 (0.481–0.539) 0.957 (0.946–0.969) 0.580 (0.573–0.587) 0.144 (0.137–0.151) 45.7% (45.0–46.4)
Gradient Boosting Machine 0.939 (0.933–0.944) 0.583 (0.554–0.610) 0.959 (0.949–0.969) 0.738 (0.732–0.744) 0.213 (0.203–0.223) 31.0% (30.4–31.6)

The best-performing gradient boosting model was evaluated on its ability to predict intraoperative transfusion for the external validation data using procedure-specific transfusion rates observed at this institution in 2019. Only procedures that occurred at least 50 times at this institution in 2019 are included, as transfusion rates are too uncertain below this threshold (Figure S3). c-statistic – area under the receiver operating characteristic curve. Average precision – area under the precision recall curve, indicative of model discrimination for the positive class. % Positive indicates the percent of cases in the cohort for whom the model made a positive prediction, i.e., recommended a type and screen. All model decision thresholds were set to achieve 96% sensitivity. 95% confidence intervals are shown in parentheses.

Interpretation of model predictions

Explanations for how the gradient boosting model arrived at a prediction for an individual patient can be computed and illustrated (Figure 2). For this representative patient having a robotic-assisted partial nephrectomy, their predicted transfusion risk was decreased from the population average by the low historical frequency of transfusion for this procedure in addition to their high starting hematocrit and above-average weight. However, their transfusion risk was also revised upwards due to decreased renal function, mildly abnormal INR, and age.

Figure 2 – Explanation of model prediction for an individual patient.

Explanation of the gradient boosting model’s transfusion risk prediction for an example patient in the external validation cohort. This patient was undergoing a laparoscopic robotic-assisted partial nephrectomy, a surgery that had an overall 1.3% rate of transfusion. After adjusting for patient factors, the model predicted a 0.7% risk of transfusion, now below the model threshold for recommending a type and screen. This patient did not require transfusion.

Such individual patient explanations were summarized over a cohort to explain model predictions in aggregate (Figure 3). Across the cohort, the procedure-specific transfusion rate and preoperative hematocrit had the largest impact on model predictions; patients having procedures with high procedure-specific rates of transfusion had higher transfusion risk, and patients with low preoperative hematocrit had higher transfusion risk. Although INR, platelet count, creatinine, weight, and albumin levels had low average impact on model predictions, these variables had larger impact when they were very abnormal.

Figure 3 – Relative variable importance for model predictions.

Beeswarm plots demonstrating the relative importance of the top 20 variables to all model predictions for the internal validation cohort. Each observed value of each variable (i.e., each patient) in the cohort is shown as a single dot, colored by value, with position on the x-axis indicating the impact that value of the variable had on the model's prediction for that patient in logit space (i.e., Shapley value). Variables with wide spread have a large effect on model predictions. Color indicates whether low or high values of each variable impact risk and in which direction. For example, pink colors to the right of midline (i.e., impact on model output > 0) suggest that high values of the variable increase the model's predicted risk of transfusion. For categorical variables such as patient comorbidities, pink indicates the presence of that variable and blue its absence. Variables with blue or pink colors on both sides of midline can either increase or decrease transfusion risk, depending on interactions with other variables. For example, patients with low platelet count are shown with blue dots both to the left and right of midline, indicating that low platelet count can either increase or decrease predicted transfusion risk depending on each patient's other characteristics. Average impact on model prediction is shown on the right and indicates overall variable importance; this is computed as the mean absolute value of all Shapley values observed for that variable in the cohort.

Discussion

In this study, we developed a machine learning model trained on a large national cohort of surgical cases to predict transfusion risk using both surgery- and patient-specific variables. Compared with a baseline model that used only the historical procedure-specific transfusion rate, as recommended by current guidelines1,3,4, the gradient boosting machine model demonstrated the best discriminative performance (Tables 2 and 3), with the highest c-statistic and average precision. When tailored specifically to guide preoperative type and screen decision making, the gradient boosting model required one-third fewer type and screen orders, while maintaining 96% sensitivity to detect transfusion, compared to the baseline model in both the internal and external validation datasets. These findings highlight the considerable potential of these models to estimate transfusion risk and guide preoperative type and screen ordering decisions.

Current guidelines for preoperative type and screen decision making1,3,4, i.e., the maximum surgical blood ordering schedule2, have focused on surgical characteristics such as the transfusion rate for each surgery. Consistent with this prior work, our baseline model with knowledge only of historical procedure-specific transfusion rates performed reasonably well (Table 2), and this procedure-specific transfusion rate variable was by far the most important variable in our best-performing multivariable gradient boosting model (Figure 3). However, the 5% procedure-specific transfusion risk threshold previously described to guide type and screen orders3,4 achieved a sensitivity of only 0.837 on the internal validation cohort, suggesting that it would miss over 16% of patients who actually required transfusion. In practice, most institutions incorporate clinician opinion or other procedure-specific variables into their maximum surgical blood ordering schedule such that type and screen ordering practice is much more conservative4. Nonetheless, we were able to substantially improve predictive performance over the baseline model by incorporating patient-specific variables (Table 2), demonstrating the importance of personalized patient-specific risk prediction. In contrast to previous transfusion risk-prediction models7,8,11–16,36, our models incorporated both surgery- and patient-specific variables, used only variables available in the preoperative setting and readily extractable from a patient's health record, made predictions across a diverse range of surgical procedures, and explicitly considered a decision threshold appropriate for the preoperative type and screen decision-making use case.

One key innovation of our approach is the use of the surgery-specific transfusion rate as a form of transfer learning, allowing our model to be generalized across hospital systems. By training our model on a large multi-institutional database with diverse transfusion practices, the model learned how patient comorbidities and preoperative laboratory tests typically transform the procedure-specific transfusion risk to a personalized transfusion risk. For application in new settings, any hospital can then specify their historical transfusion rates for each procedure, and the model can apply its knowledge of how patient-specific variables modify that baseline risk. We demonstrated the feasibility of this transfer learning approach by generalizing model performance to an independent cohort of surgical patients at a single academic medical center as a proof of principle (Table 3). Importantly, we demonstrated generalizability of model performance despite this external validation dataset containing information retrieved from an electronic health record and therefore not being as well-curated as the training data derived from a national quality registry.

A second innovation of our approach is the display of individualized explanations of model predictions (Figure 2). Interpretable machine learning is critical for building clinician trust in model outputs33. Visualization of model explanations also provides a margin of safety to protect against model failures because the reason for nonsensical predictions can be easily identified and addressed. The most important variables for our best-performing model (Figure 3) match with clinical intuition and previous literature: the procedure-specific transfusion rate, preoperative hematocrit, age, and laboratory indicators of coagulopathy.

We propose that our model predictions and visualizations can be implemented as point-of-care clinical decision support within an electronic health record to guide preoperative type and screen ordering practice for any hospital. To implement our model at a new hospital, only the historical procedure-specific transfusion rates are required; this is the same data that is used to create a conventional maximum surgical blood ordering schedule. Then, for each new patient, our model can predict their transfusion risk and recommend whether to order a type and screen given their planned surgery and patient-specific characteristics. We intentionally chose patient-specific variables that would be easy to abstract from an electronic health record to aid in the ease of model implementation.

However, our model likely requires further validation before it should be widely implemented to guide preoperative type and screen ordering practice. Although we trained our models on a large multi-hospital cohort of surgical cases and demonstrated model generalizability to a single hospital using a transfer learning approach, model performance may vary at other hospitals with different transfusion practices or patient populations. In addition, our external validation analysis was limited by the lack of postoperative transfusion data. Prospective validation of the model and the type and screen decision threshold are needed prior to implementation. We do not claim that our model accounts for all possible variables that contribute to surgical transfusion risk or the decision to order a type and screen; for example, preoperative anticoagulation medications were not included in our models.

Another limitation is that our model adjusts predicted risk substantially based on the planned procedure and available laboratory values (Figure 3), which could change between the preoperative visit (where the model might be implemented) and the day of surgery. For example, a common preoperative workflow is to order the preoperative laboratory tests at the same time as the type and screen; thus, the newly ordered labs may not have resulted by the time the decision to obtain a type and screen needs to be made. Our model is tolerant of absent lab values, as is commonly the case (Table S1) when unnecessary labs are not ordered for relatively healthy patients. Several practical solutions to the problem of changing information are possible; for example, the model could be updated and new recommendations made as new information becomes available. Additional research is needed to explore the pragmatic integration of our model into clinical workflow, situated within local practice settings.

Three years of historical transfusion data were used to estimate procedure-specific transfusion rates for the internal validation data. Because only one year of historical data was available for the external validation dataset, procedure-specific transfusion rates could be estimated with reasonable precision for only 53.3% of the surgical cases that occurred at the external validation institution in 2020, and thus model predictions were only made for this subset (Table 3). We expect other institutions with longer durations of historical transfusion data to be able to apply our model to a larger fraction of their surgical cases. Model predictions were inaccurate if low-precision procedure-specific transfusion rates – such as those estimated for uncommon procedures – were provided as model input (Supplemental Table S3). This limitation regarding uncommon procedures, which in aggregate may comprise a meaningful fraction of surgical case volume, is also present for existing maximum surgical blood ordering schedule strategies.

All models can experience decay of prediction performance over time37 due to changes in transfusion practice, case mix, or surgical technique. In our case, this can potentially be ameliorated by continuously updating the historical procedure-specific transfusion rate using our transfer learning technique, although this has yet to be demonstrated to be effective. Conversely, inaccuracies in procedure-specific transfusion rates or changes in procedure naming that disrupt matching of cases to their historical transfusion rates can magnify prediction error (Supplemental Table S3).

Finally, although we chose to fix our models at 96% sensitivity, some may disagree regarding the optimal sensitivity threshold at which the cost of false positive and the risk of false negative predictions are balanced. Our threshold can be considered aggressive, although administration of unmatched emergency release blood in the setting of unexpected transfusion has generally been demonstrated to be safe38,39. Improved identification of the true patient safety and systems costs of unnecessary type and screen orders could allow selection of a more optimal decision threshold using utility analysis40. Although we modeled cost savings using the Medicare reimbursement rate for type and screens, this may not be representative of all costs; for example, the organizational costs of unnecessary sample acquisition, storage, and processing, and the potential patient harm induced by unnecessary needle sticks and phlebotomy were not included.

In summary, we developed a gradient boosting machine model to predict surgical transfusion risk using both patient and surgery-specific variables and tailored it to guide decision making regarding preoperative type and screen orders. Our model outperformed a baseline model that used surgical information alone, as is recommended by current blood ordering guidelines. Although our model requires further prospective validation and implementation, it is an important first step towards personalized surgical blood orders and has the potential to improve patient safety and reduce healthcare costs.

Supplementary Material

Supplemental Materials

Acknowledgements:

We would like to thank Derek Harford, Alex Kronzer, and Kevin Heard for their assistance in obtaining the data used for this study.

Funding source:

SSL was supported by NIH 5T32GM108539-07 (Bethesda, MD).

Footnotes

Conflicts of interest: The authors declare no competing interests. BLH is consulting director of the ACS-NSQIP for the American College of Surgeons. TK has consulting relationships with Pfizer Inc. and Elsevier that are unrelated to this work.

References

1. American Society of Anesthesiologists Task Force on Perioperative Blood Management: Practice guidelines for perioperative blood management: an updated report by the American Society of Anesthesiologists Task Force on Perioperative Blood Management. Anesthesiology 2015; 122:241–75
2. Friedman BA: An analysis of surgical blood use in United States hospitals with application to the maximum surgical blood order schedule. Transfusion (Paris) 1979; 19:268–78
3. Dexter F, Ledolter J, Davis E, Witkowski TA, Herman JH, Epstein RH: Systematic criteria for type and screen based on procedure's probability of erythrocyte transfusion. Anesthesiology 2012; 116:768–78
4. Frank SM, Rothschild JA, Masear CG, Rivers RJ, Merritt WT, Savage WJ, Ness PM: Optimizing preoperative blood ordering with data acquired from an anesthesia information management system. Anesthesiology 2013; 118:1286–97
5. Woodrum CL, Wisniewski M, Triulzi DJ, Waters JH, Alarcon LH, Yazer MH: The effects of a data driven maximum surgical blood ordering schedule on preoperative blood ordering practices. Hematology 2017; 22:571–7
6. Geißler RG, Franz D, Buddendick H, Krakowitzky P, Bunzemeier H, Roeder N, Van Aken H, Kessler T, Berdel W, Sibrowski W, Schlenke P: Retrospective analysis of the blood component utilization in a university hospital of maximum medical care. Transfus Med Hemotherapy 2012; 39:129–38
7. Frisch NB, Wessell NM, Charters MA, Yu S, Jeffries JJ, Silverton CD: Predictors and complications of blood transfusion in total hip and knee arthroplasty. J Arthroplasty 2014; 29:189–92
8. Hayn D, Kreiner K, Ebner H, Kastner P, Breznik N, Rzepka A, Hofmann A, Gombotz H, Schreier G: Development of multivariable models to predict and benchmark transfusion in elective surgery supporting patient blood management. Appl Clin Inform 2017; 8:617–31
9. Mathis MR, Kheterpal S, Najarian K: Artificial intelligence for anesthesia: What the practicing clinician needs to know: More than black magic for the art of the dark. Anesthesiology 2018; 129:619–22
10. Jalilian L, Cannesson M: Precision medicine in anesthesiology. Int Anesthesiol Clin 2020; 58:17–22
11. Nuttall GA, Santrach PJ, Oliver WC, Ereth MH, Horlocker TT, Cabanela ME, Trousdale RT, Bryant S, Currie TW: A prospective randomized trial of the surgical blood order equation for ordering red cells for total hip arthroplasty patients. Transfusion (Paris) 1998; 38:828–33
12. van Klei WA, Moons KG, Leyssius AT, Knape JT, Rutten CL, Grobbee DE: A reduction in type and screen: preoperative prediction of RBC transfusions in surgery procedures with intermediate transfusion risks. Br J Anaesth 2001; 87:250–7
13. Palmer T, Wahr JA, O'Reilly M, Greenfield MLVH: Reducing unnecessary cross-matching: a patient-specific blood ordering system is more accurate in predicting who will receive a blood transfusion than the maximum blood ordering system. Anesth Analg 2003; 96:369–75
14. Mitterecker A, Hofmann A, Trentino KM, Lloyd A, Leahy MF, Schwarzbauer K, Tschoellitsch T, Böck C, Hochreiter S, Meier J: Machine learning-based prediction of transfusion. Transfusion (Paris) 2020; 60:1977–86
15. Walczak S, Velanovich V: Prediction of perioperative transfusions using an artificial neural network. PLOS ONE 2020; 15:e0229450
16. Jalali A, Lonsdale H, Zamora LV, Ahumada L, Nguyen ATH, Rehman M, Fackler J, Stricker PA, Fernandez AM, Pediatric Craniofacial Collaborative Group: Machine learning applied to registry data: Development of a patient-specific prediction model for blood transfusion requirements during craniofacial surgery using the Pediatric Craniofacial Perioperative Registry dataset. Anesth Analg 2021; 132:160–71
17. Collins GS, Reitsma JB, Altman DG, Moons KGM: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD statement. Ann Intern Med 2015; 162:55–63
18. Shiloach M, Frencher SK, Steeger JE, Rowell KS, Bartzokis K, Tomeh MG, Richards KE, Ko CY, Hall BL: Toward robust information: Data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program. J Am Coll Surg 2010; 210:6–16
19. Frank SM, Oleyar MJ, Ness PM, Tobian AAR: Reducing unnecessary preoperative blood orders and costs by implementing an updated institution-specific maximum surgical blood order schedule and a remote electronic blood release system. Anesthesiology 2014; 121:501–9
20. Mak PHK, Campbell RCH, Irwin MG: The ASA Physical Status Classification: inter-observer consistency. Anaesth Intensive Care 2002; 30:633–40
21. Sankar A, Johnson SR, Beattie WS, Tait G, Wijeysundera DN, Myles PS: Reliability of the American Society of Anesthesiologists physical status scale in clinical practice. Br J Anaesth 2014; 113:424–32
22. Hall B, Hamilton B, Richards K, Bilimoria K, Cohen M, Ko C: Does surgical quality improve in the American College of Surgeons National Surgical Quality Improvement Program: An evaluation of all participating hospitals. Ann Surg 2009; 250:363–76
23. Hamilton BH, Ko CY, Richards K, Hall BL: Missing data in the American College of Surgeons National Surgical Quality Improvement Program are not missing at random: Implications and potential impact on quality assessments. J Am Coll Surg 2010; 210:125–139.e2
24. Zou H, Hastie T: Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 2005; 67:301–20
25. Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. Monterey, CA, Wadsworth & Brooks/Cole Advanced Books & Software, 1984
26. Breiman L: Random forests. Mach Learn 2001; 45:5–32
27. Chen T, Guestrin C: XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, Association for Computing Machinery, 2016, pp 785–94. doi: 10.1145/2939672.2939785
28. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É: Scikit-learn: Machine learning in Python. J Mach Learn Res 2011; 12:2825–30
29. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW, on behalf of Topic Group 'Evaluating diagnostic tests and prediction models' of the STRATOS initiative: Calibration: the Achilles heel of predictive analytics. BMC Med 2019; 17:230
30. Vickers AJ, Van Calster B, Steyerberg EW: Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016; 352:i6
31. Vickers AJ: Decision Curve Analysis. 2015. Available at <www.decisioncurveanalysis.org>
32. Centers for Medicare and Medicaid Services: Clinical Laboratory Fee Schedule. 2020. Available at <https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/ClinicalLabFeeSched>
33. Diprose WK, Buist N, Hua N, Thurier Q, Shand G, Robinson R: Physician understanding, explainability, and trust in a hypothetical machine learning risk calculator. J Am Med Inform Assoc 2020; 27:592–600
34. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I: From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020; 2:56–67
35. Dietterich TG: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998; 10:1895–923
36. Pempe C, Werdehausen R, Pieroh P, Federbusch M, Petros S, Henschler R, Roth A, Pfrepper C: Predictors for blood loss and transfusion frequency to guide blood saving programs in primary knee- and hip-arthroplasty. Sci Rep 2021; 11:4386
37. Nestor B, McDermott MBA, Boag W, Berner G, Naumann T, Hughes MC, Goldenberg A, Ghassemi M: Feature robustness in non-stationary health records: Caveats to deployable model performance in common clinical machine learning tasks. Machine Learning for Healthcare Conference. PMLR, 2019, pp 381–405. Available at <http://proceedings.mlr.press/v106/nestor19a.html>
38. Dutton RP, Shih D, Edelman BB, Hess J, Scalea TM: Safety of uncrossmatched type-O red cells for resuscitation from hemorrhagic shock. J Trauma 2005; 59:1445–9
39. Napolitano LM, Kurek S, Luchette FA, Anderson GL, Bard MR, Bromberg W, Chiu WC, Cipolle MD, Clancy KD, Diebel L, Hoff WS, Hughes KM, Munshi I, Nayduch D, Sandhu R, Yelon JA, Corwin HL, Barie PS, Tisherman SA, Hebert PC, EAST Practice Management Workgroup, American College of Critical Care Medicine (ACCM) Taskforce of the Society of Critical Care Medicine (SCCM): Clinical practice guideline: red blood cell transfusion in adult trauma and critical care. J Trauma 2009; 67:1439–42
40. Vickers AJ, Elkin EB: Decision curve analysis: A novel method for evaluating prediction models. Med Decis Making 2006; 26:565–74
