Digital Health. 2024 Jan 17;10:20552076231224245. doi: 10.1177/20552076231224245

Development of blood demand prediction model using artificial intelligence based on national public big data

Hi Jeong Kwon 1,*, Sholhui Park 2,*, Young Hoon Park 3, Seung Min Baik 4,, Dong Jin Park 5,
PMCID: PMC10798124  PMID: 38250146

Abstract

Objective

Modern healthcare systems face challenges in maintaining a stable and sufficient blood supply because of shortages. This study aimed to predict monthly blood transfusion requirements in medical institutions using an artificial intelligence model based on national open big data related to transfusion.

Methods

Data on blood types and components in Korea from January 2010 to December 2021 were obtained from the Health Insurance Review and Assessment Service and Statistics Korea, and transfusion data were collected from a single medical institution. Using these data, we developed predictive models based on eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and category boosting (CatBoost), and combined the three models into an ensemble model.

Results

The XGBoost, LGBM, and CatBoost models achieved mean absolute errors ranging from 14.6657 for AB+ red blood cells (RBCs) to 84.0433 for A+ platelet concentrate (PC) and root mean squared errors ranging from 18.5374 for AB+ RBCs to 118.6245 for B+ PC. Ensemble models further narrowed the error range. In the ensemble models, the department requesting blood was the most influential parameter for transfusion prediction across blood products and types. Apart from the department, the features affecting prediction performance varied by product and blood type and included the numbers of RBC antibody screens, crossmatches, nationwide blood donations, and surgeries.

Conclusion

The blood-demand prediction algorithm developed from blood-related open big data can help medical facilities secure an appropriate volume of blood ahead of time.

Keywords: Transfusion, big data, prediction model, artificial intelligence, boosting model

Introduction

Ensuring a stable and sufficient blood supply is a critical concern for healthcare systems. However, this is challenging because the supply relies on blood donations, and donated blood has a limited shelf life ranging from a few days to a few weeks. Furthermore, as the population ages and chronic diseases become more prevalent, the demand for blood continues to rise while blood donations decrease. 1 The coronavirus disease 2019 (COVID-19) pandemic has further strained the blood supply. 2 Statistics from the American Red Cross show that overall blood donations dropped by 10% after March 2020, when the COVID-19 pandemic was declared, compared with 2019.3,4 Similarly, in Korea, blood donations in 2020 fell by 6.4% compared with 2019, leading to a 5.9% reduction in blood available for transfusion. 5 Notably, the decline in blood supply in Korea predates the pandemic: in 2019, total blood donations dropped by 3.2% compared with 2018, and the total blood supply decreased by 0.1%. 6 Even in the most recent data, for 2021, which show that supply increased by 2.6% and donations declined by 0.3% compared with 2020, a blood shortage remained. 6

Because each allogeneic transfusion can cause adverse effects, a transfusion should be performed only after weighing its benefits and risks. 7 In many countries, patient blood management guidelines direct transfusion practice according to evidence-based medical and surgical concepts, rather than simply reducing the number of transfusions.8,9 These measures are intended to cope with the decreasing blood supply while improving therapeutic outcomes for patients. An insufficient blood supply can be addressed by promoting blood donation, but only approximately 3% of the population donates blood. 4 Therefore, more accurate prediction of the amount of blood used by medical institutions through data analysis can contribute to more effective blood supply management, complementing existing demand-forecasting procedures.

In recent years, the use of artificial intelligence (AI) in medicine has grown remarkably, and medical AI research is active. Some AI studies have addressed blood transfusion prediction, but most focus on specific diseases or patient populations.10–15 Although numerous recent studies have estimated the blood requirements of medical facilities, predictive accuracy remains challenging in some regions and institutions owing to various factors, which underscores the need for continued research. Globally, blood shortages have been exacerbated by a significant drop in donations. While addressing this decline is crucial, our study focused on using AI to predict blood transfusion demand and help institutions allocate resources efficiently.

The National Health Insurance System manages all medical activities in South Korea, and information on medical services, except non-covered services, is stored in the Health Insurance Review and Assessment Service (HIRA) database. 16 In addition, the blood supply is managed nationally through voluntary donation. Consequently, detailed information on blood supply and usage is accessible.

The research objectives of this study are as follows: first, the main goal was to create predictive models capable of forecasting the demand for different blood components (red blood cells (RBCs), fresh frozen plasma (FFP), and platelet concentrate (PC)) based on specific blood types. We intend for these models to provide healthcare institutions with accurate estimates of their future blood supply requirements. Second, this study used open public data, such as national blood supply data and COVID-19 statistics, in combination with data from a single medical institution. The objective was to harness these diverse data sources to enhance prediction accuracy. Third, by achieving accurate blood demand predictions, this study aims to contribute to better blood supply management in healthcare systems. This objective is significant in addressing blood shortages and ensuring a stable and sufficient blood supply.

Methods

The study period spanned 12 years, from January 2010 to December 2021. This study primarily focuses on data from Korea. The data for blood transfusions were collected from a single healthcare institution, specifically the Yeouido St. Mary's Hospital.

Data collection and processing

We developed datasets by gathering open public data from the Healthcare Big Data Hub of HIRA 17 and Statistics Korea 18, together with data from a single medical institution, for January 2010 to December 2021. The monthly data obtained from the HIRA Healthcare Big Data Hub and Statistics Korea included blood inventory by blood component (RBCs, FFP, and PC), transfusion demand, blood supply, national blood donations, national blood donations by blood type (A+, B+, O+, and AB+), and the number of COVID-19 cases from February 2020 to December 2021. Six PC units were treated as equivalent to one apheresis platelet unit. RhD-negative blood types are uncommon in South Korea, with a prevalence of 0.1% for each blood type, so this study focused on the commonly encountered RhD-positive types, for which the data were available and comprehensive; RhD-negative types were not included in the dataset. This limitation may reduce the direct applicability of our prediction models in regions or countries where RhD-negative blood types have greater clinical significance, and future adaptations of the model should consider incorporating RhD-negative data.
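The 6:1 platelet unit convention can be made concrete with a small Python sketch. It assumes, hypothetically, that apheresis units were converted into PC-unit equivalents before counting; the source states only the equivalence, and the name `to_pc_equivalents` is illustrative.

```python
# Hypothetical helper: the source states only that six PC units equal one
# apheresis platelet unit; converting apheresis units into PC equivalents
# (rather than the reverse) is an assumption made for illustration.
PC_UNITS_PER_APHERESIS_UNIT = 6


def to_pc_equivalents(pc_units: int, apheresis_units: int) -> int:
    """Express platelet usage as a total count of PC-unit equivalents."""
    if pc_units < 0 or apheresis_units < 0:
        raise ValueError("unit counts must be non-negative")
    return pc_units + PC_UNITS_PER_APHERESIS_UNIT * apheresis_units


# Toy example (not study data): 40 PC units plus 5 apheresis units.
print(to_pc_equivalents(40, 5))  # 70
```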

For actual blood usage, the number of transfusions of each product (by blood type and blood component) was obtained from 247,537 transfusion-related electronic medical records of a single medical institution.

To develop a blood-demand prediction model for a single institution, the following monthly information was obtained from the relevant medical institution: total number of beds, number of surgeries, surgery type (major, moderate, or minor), number of RBC antibody screens, ABO/Rh tests, crossmatch tests, number of transfusions by department, blood component, and blood type.

The data for analysis were sourced from open public repositories and from the institution's records. Before inclusion, each dataset was screened for inconsistencies, missing values, and outliers. To make the results reliable and generalizable, we combined training and validation through stratified k-fold cross-validation based on the distribution of the target values. The datasets also spanned a wide range of transfusion-related medical information representing different scenarios, which was intended to improve the general applicability of the model. Finally, data cleaning involved handling missing values, correcting outliers, standardizing scales, and removing duplicates, so that the data were consistent and ready for model training and analysis.
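The four cleaning steps named above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' procedure: the rules (keep the first record per month, median imputation, capping at the 1.5 × IQR fences, z-score standardization) and the step order (duplicates removed first so they do not bias the later statistics) are illustrative choices.

```python
import statistics


def clean_monthly_records(records, value="count", key=("year", "month")):
    """Hypothetical cleaning pass over monthly records (a list of dicts).

    Assumed rules, not the authors' documented choices: keep the first
    record per month, impute missing values with the median, cap outliers
    at the 1.5 * IQR fences, and add a z-score column.
    """
    # Remove duplicate months first so repeated rows do not bias later steps.
    seen, rows = set(), []
    for rec in records:
        k = tuple(rec[f] for f in key)
        if k not in seen:
            seen.add(k)
            rows.append(dict(rec))

    # Impute missing values with the median of the observed values.
    median = statistics.median([r[value] for r in rows if r[value] is not None])
    for r in rows:
        if r[value] is None:
            r[value] = median

    # Cap outliers at Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    q1, _, q3 = statistics.quantiles([r[value] for r in rows], n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    for r in rows:
        r[value] = min(max(r[value], low), high)

    # Standardize the scale: z = (x - mean) / population standard deviation.
    vals = [r[value] for r in rows]
    mu, sd = statistics.mean(vals), statistics.pstdev(vals)
    for r in rows:
        r["z"] = (r[value] - mu) / sd if sd > 0 else 0.0
    return rows


toy = [
    {"year": 2010, "month": 1, "count": 10},
    {"year": 2010, "month": 1, "count": 10},    # duplicate month
    {"year": 2010, "month": 2, "count": None},  # missing value
    {"year": 2010, "month": 3, "count": 12},
    {"year": 2010, "month": 4, "count": 11},
    {"year": 2010, "month": 5, "count": 1000},  # outlier
]
cleaned = clean_monthly_records(toy)
print([r["count"] for r in cleaned])  # [10, 11.5, 12, 11, 13.5]
```

The toy values are invented for illustration; with real monthly series, the outlier rule in particular would need clinical review, since genuine demand spikes should not be capped.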

Three datasets were integrated for the analysis. The foundational dataset contained transfusion-related records from a single medical institution from January 2010 to December 2021. A second dataset, whose exact period was not specified, was incorporated based on overlapping identifiers and provided information on transfusion demand, blood supply, and national donation trends for the three primary blood components: RBCs, FFP, and PC. Data on COVID-19 cases from February 2020 to December 2021 were overlaid to examine potential effects of the pandemic on transfusion practices. The datasets were merged using shared variables such as patient number, date, and blood type to produce a cohesive and comprehensive dataset for analysis.

Development of prediction model

We created a blood-demand prediction model for each combination of four blood types and three blood components (12 models in total). Because the 12 output variables (blood requirements) are continuous numerical variables, we selected three boosting models as regression models and evaluated their performance: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Category Boosting (CatBoost). Boosting is a machine-learning ensemble technique that improves prediction and classification performance by sequentially combining several weak learners, each fitted to correct the errors of the previous ones. It is a supervised learning approach that can be used for both classification and regression. We chose boosting models because our data were structured hospital data, on which boosting models perform very well. XGBoost is a gradient boosting (GB) model that addresses the slow training and overfitting risk of conventional GB. LGBM grows trees leaf-wise, repeatedly extending the branch that most reduces the cost function, with tree size controlled by the num_leaves parameter. This allows LGBM to learn more deeply at the most promising nodes; however, a drawback is that many data points may be discarded. CatBoost is advantageous when a single column contains very complex information, and, like GB, it grows balanced (level-wise) trees. XGBoost and LGBM require careful hyperparameter tuning, whereas CatBoost can obtain good results with its internal defaults, without complex parameter tuning. However, CatBoost relies on modeling interactions between columns, which is difficult when the data are simple; therefore, it is generally not appropriate for simple data.
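To illustrate what sequentially combining weak learners means, the following is a minimal, hypothetical sketch of gradient boosting for regression with squared loss, using one-split decision stumps on a single numeric feature. It is not XGBoost, LGBM, or CatBoost, which add regularization, histogram-based split finding, leaf-wise or symmetric tree growth, and categorical handling; the study's hyperparameters are not reported, so `n_rounds` and `learning_rate` here are arbitrary.

```python
from typing import Callable, List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares split on one feature: predict the mean residual on each side."""
    best = None
    for t in sorted(set(x))[:-1]:  # every split that leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # constant feature: fall back to the overall mean residual
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    return best[1:]


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 50,
                   learning_rate: float = 0.1) -> Callable[[List[float]], List[float]]:
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]

    def predict(xs: List[float]) -> List[float]:
        return [base + sum(learning_rate * (lv if xi <= t else rv) for t, lv, rv in stumps)
                for xi in xs]

    return predict


# Toy data: demand jumps from 10 to 20 units after month 3.
predict = gradient_boost([1, 2, 3, 4, 5, 6], [10, 10, 10, 20, 20, 20])
print([round(p, 2) for p in predict([1, 6])])
```

In practice, library regressors such as `xgboost.XGBRegressor`, `lightgbm.LGBMRegressor`, and `catboost.CatBoostRegressor` would normally be used; the paper does not state which interfaces or settings were used.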

In the data wrangling phase, we first extracted the variables in the raw units provided by each data source; for example, the numbers of surgeries, RBC antibody tests, and blood transfusions were recorded as raw counts. To harmonize the data, we merged the hospital transfusion-related data with the open public data on a common year-month timeline. We then calculated and partitioned the blood product counts by time interval.
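A year-month merge of this kind can be sketched with plain dictionaries keyed by `(year, month)`. The column names, the inner-join choice, and filling months before February 2020 with zero COVID-19 cases are all assumptions for illustration, not details reported by the authors.

```python
def merge_monthly(hospital, public, covid):
    """Join monthly tables on a (year, month) key.

    hospital, public: dict mapping (year, month) -> dict of features.
    covid: dict mapping (year, month) -> confirmed COVID-19 cases. Cases
    exist only from February 2020, so earlier months are filled with 0
    (an assumption). Only months present in both the hospital and the
    public data are kept (an inner join, also an assumption).
    """
    merged = {}
    for ym in sorted(hospital.keys() & public.keys()):
        row = {**hospital[ym], **public[ym]}  # on a name clash, public values win
        row["covid_cases"] = covid.get(ym, 0)
        merged[ym] = row
    return merged


# Toy values with hypothetical column names, not study data.
hospital = {(2020, 1): {"surgeries": 300}, (2020, 2): {"surgeries": 280}}
public = {(2020, 1): {"national_donations": 900},
          (2020, 2): {"national_donations": 850},
          (2020, 3): {"national_donations": 800}}
covid = {(2020, 2): 3000}
table = merge_monthly(hospital, public, covid)
print(table[(2020, 1)])  # {'surgeries': 300, 'national_donations': 900, 'covid_cases': 0}
```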

Prediction performance was evaluated using R-squared (R2), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
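The four metrics have standard definitions, sketched below in plain Python; the function name `regression_metrics` is illustrative, and the study's own implementation is not described.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2, MAE, MSE, and RMSE for paired lists of actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # R2 = 1 - SS_res / SS_tot; undefined when the actual values are constant.
    r2 = 1 - (n * mse) / ss_tot if ss_tot > 0 else float("nan")
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Toy values: errors are 1, 0, and -2.
print(regression_metrics([3, 5, 7], [2, 5, 9]))
```

Because RMSE is the square root of MSE, the two columns of each results table can be cross-checked; for example, the square root of 750.6225 is approximately 27.3975, matching the XGBoost A+ RBC row of Table 1.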

Development of ensemble model

Ensemble models that combine several AI models have been reported to improve prediction performance.19,20 We therefore combined the three boosting models into 12 ensemble models, one for each combination of blood component and blood type. The ensembles used soft voting in which the predictions of XGBoost, LGBM, and CatBoost were weighted equally, that is, averaged.
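For regression, equal-weight soft voting reduces to averaging the models' predictions month by month. The sketch below assumes each model's predictions are available as a list aligned by month; the optional `weights` argument is a hypothetical generalization, not something the study used.

```python
def weighted_ensemble(predictions, weights=None):
    """Combine per-model prediction lists by a weighted average.

    With weights=None every model gets weight 1/k, i.e., the equal
    weighting described in the text.
    """
    k = len(predictions)
    if k == 0:
        raise ValueError("need predictions from at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of months")
    if weights is None:
        weights = [1.0 / k] * k
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total for i in range(n)]


# Toy monthly predictions (units) from three hypothetical fitted models.
xgb, lgbm, cat = [100, 120], [110, 118], [105, 125]
print(weighted_ensemble([xgb, lgbm, cat]))
```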

Stratified k-fold cross-validation

In this study, the output value is monthly usage. Because there are relatively few monthly observations, there are insufficient data to set aside a specific period as the test and validation sets, and it is difficult to ensure that any selected test and validation sets are representative. Therefore, we split the 12 years of monthly data 9:1 using K-fold cross-validation (n_split = 10). Each 10% segment of the data, from the first to the tenth, served once for validation while the remaining 90% was used for training, so the model was trained and validated on the entire dataset, enhancing its overall robustness. In essence, we used an out-of-fold technique that made full use of the dataset without any loss.
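The out-of-fold idea can be sketched independently of any particular model: every sample receives exactly one prediction, made by a model that did not see it during training. The `fit` callable and the toy mean predictor below are hypothetical stand-ins for the boosting models.

```python
def out_of_fold_predictions(X, y, fold_ids, fit):
    """Predict every sample with a model trained only on the other folds.

    fold_ids[i] is the fold (0..k-1) assigned to sample i, and
    fit(X_train, y_train) must return a predict(rows) function;
    both are hypothetical stand-ins for the study's pipeline.
    """
    oof = [None] * len(y)
    for fold in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        valid = [i for i, f in enumerate(fold_ids) if f == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(valid, predict([X[i] for i in valid])):
            oof[i] = p
    return oof


def fit_mean(X_train, y_train):
    """Toy baseline model: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda rows: [m] * len(rows)


print(out_of_fold_predictions([[0]] * 4, [1, 2, 3, 4], [0, 1, 0, 1], fit_mean))  # [3.0, 2.0, 3.0, 2.0]
```

Scoring the concatenated out-of-fold predictions against the actual values then yields a single performance estimate that covers all months.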

Analysis of parameters contributing to prediction performance by “feature importance” technique

The feature importance (FI) technique was employed to determine the most important parameters in the blood-demand prediction model. FI refers to a method that assigns a score to each input feature based on its usefulness in predicting a target variable. 21 FI results are presented as rankings that allow a visual assessment of each parameter's impact on the prediction.
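The paper does not state which FI method was used (for example, split-gain importance, permutation importance, or SHAP values). As one illustration of how such a ranking can be produced, the sketch below uses permutation importance, a swapped-in technique: a feature's score is the mean increase in error after its column is shuffled. All names are hypothetical.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mae, n_repeats=5, seed=0):
    """Mean increase in error after shuffling each feature column.

    predict(rows) -> list of predictions; X is a list of feature lists.
    Larger scores mean the model relies more on that feature.
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = metric(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Toy model that uses only feature 0; feature 1 is ignored.
X = [[i, 9 - i] for i in range(10)]
y = [2 * i for i in range(10)]
scores = permutation_importance(lambda rows: [2 * r[0] for r in rows], X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1]
```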

Results

Comparison between actual demand and predicted demand of four models (XGBoost, LGBM, CatBoost, and Ensemble model)

Supplemental Materials 1 and 2 show the monthly and annual average blood consumption of a single institution from January 2010 to December 2021. Figures 1–3 display the actual monthly blood demand over the data collection period (2010 to 2021) together with the demand predicted by the four models. In Figures 1–3, the predicted values of the ensemble model are the averages of the XGBoost, LGBM, and CatBoost predictions. For the ensemble model, the predicted curves for O+ RBC and O+ PC closely tracked actual demand. For FFP, predicted and actual requirements differed slightly for all four blood types.

Figure 1.

Comparison between actual demand and predicted demand for RBC of XGBoost, LGBM, CatBoost, and ensemble model.

RBC: red blood cell; XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; CatBoost: category boosting.

Figure 2.

Comparison between actual demand and predicted demand for fresh frozen plasma of XGBoost, LGBM, CatBoost, and ensemble model.

XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; CatBoost: category boosting.

Figure 3.

Comparison between actual demand and predicted demand for platelet concentrate of XGBoost, LGBM, CatBoost, and ensemble model.

XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; CatBoost: category boosting.

Performance of prediction models

In this study, predictive models were developed using three algorithms: XGBoost, LGBM, and CatBoost. These models were applied to predict blood transfusion requirements for each combination of blood component (RBC, FFP, and PC) and blood type (A+, B+, O+, and AB+). Table 1 summarizes the results and marks the best-performing model for each combination. For RBC transfusions, XGBoost performed best for blood type A+ (R2 = 0.5360, MAE = 23.0618, MSE = 750.6225, and RMSE = 27.3975), whereas CatBoost performed best for blood type O+ (R2 = 0.6364, MAE = 19.5797, MSE = 556.2468, and RMSE = 23.5849). For FFP transfusion predictions, XGBoost performed best for blood type A+ (R2 = 0.4592, MAE = 34.7690, MSE = 2109.9710, and RMSE = 45.9344), whereas LGBM performed best for blood type B+ (R2 = 0.3656, MAE = 27.9301, MSE = 1341.7582, and RMSE = 36.6300). CatBoost was the best choice for AB+ FFP (R2 = 0.4917, MAE = 34.2320, MSE = 1983.0699, and RMSE = 44.5317). For PC transfusion predictions, LGBM performed best for blood type A+ (R2 = 0.6981, MAE = 82.6260, MSE = 12,076.3967, and RMSE = 109.8927), whereas for blood type B+, XGBoost and CatBoost tied (R2 = 0.7350) and both outperformed LGBM (XGBoost: MAE = 81.8227, MSE = 13,933.6621, and RMSE = 118.0409). LGBM achieved the highest accuracy for blood type O+ (R2 = 0.8399, MAE = 73.1341, MSE = 10,162.0240, and RMSE = 100.8069) and for blood type AB+ (R2 = 0.5426, MAE = 53.0461, MSE = 4739.3300, and RMSE = 68.8428). These findings indicate that the most effective algorithm differs by blood component and blood type.

Table 1.

Prediction performance for each model by blood components and types.

Variables R2 MAE MSE RMSE
XGBoost
RBC
 A+ 0.5360* 23.0618 750.6225 27.3975
 B+ 0.4288 22.8674 826.9041 28.7559
 O+ 0.6002 19.7778 611.6880 24.7323
 AB+ 0.3307 14.8101 345.8632 18.5974
FFP
 A+ 0.4592* 34.7690 2109.9710 45.9344
 B+ 0.3610 27.4514 1351.4052 36.7615
 O+ 0.4451* 25.8957 956.5291 30.9278
 AB+ 0.4591 34.7690 2109.9710 45.9344
PC
 A+ 0.6879 84.0433 12,482.7581 111.7263
 B+ 0.7350* 81.8227 13,933.6621 118.0409
 O+ 0.8374 69.4440 10,321.3135 101.5939
 AB+ 0.4758 54.0728 5431.1866 73.6966
LGBM
RBC
 A+ 0.5155 23.8749 790.5339 28.1164
 B+ 0.4441* 22.8849 804.7080 28.3674
 O+ 0.5779 20.5805 645.8559 25.4137
 AB+ 0.3350* 14.6657 343.6337 18.5374
FFP
 A+ 0.4360 36.4550 2200.4620 45.9344
 B+ 0.3656* 27.9301 1341.7582 36.6300
 O+ 0.4442 24.5970 957.9558 30.9509
 AB+ 0.4359 37.0228 2201.0255 46.9151
PC
 A+ 0.6981* 82.6260 12,076.3967 109.8927
 B+ 0.7324 81.4982 14,071.7654 118.6245
 O+ 0.8399* 73.1341 10,162.0240 100.8069
 AB+ 0.5426* 53.0461 4739.3300 68.8428
CatBoost
RBC
 A+ 0.5172 23.9773 787.7774 28.0674
 B+ 0.4114 23.5824 852.0790 29.1904
 O+ 0.6364* 19.5797 556.2468 23.5849
 AB+ 0.3171 14.9000 352.9159 18.7861
FFP
 A+ 0.4482 35.1327 2152.7697 46.3979
 B+ 0.3353 27.7660 1405.9616 37.4962
 O+ 0.4451* 24.0549 956.5148 30.9276
 AB+ 0.4917* 34.2320 1983.0699 44.5317
PC
 A+ 0.6957 81.9228 12,171.2840 110.3235
 B+ 0.7350* 83.5574 13,933.1966 118.0390
 O+ 0.8362 71.4978 10,401.4455 101.9875
 AB+ 0.5118 53.4796 5058.6207 71.1240

R2: R-squared; MAE: mean absolute error; MSE: mean squared error; RMSE: root mean squared error; XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; CatBoost: category boosting; RBC: red blood cell; FFP: fresh frozen plasma; PC: platelet concentrate.

*Best performance among three models.

Ensemble model performance

An ensemble model was created from the three prediction models (XGBoost, LGBM, and CatBoost) (Table 2). The resulting R2 values for RBC were 0.5418 for A+, 0.4615 for B+, 0.6288 for O+, and 0.3564 for AB+. The R2 values for FFP were 0.4683, 0.3786, 0.4676, and 0.4876 for A+, B+, O+, and AB+, respectively, and those for PC were 0.7098, 0.7498, 0.8497, and 0.5291 for A+, B+, O+, and AB+, respectively. Compared with the best single model in Table 1, the ensemble improved R2 for all combinations except O+ RBC, AB+ FFP, and AB+ PC.

Table 2.

Prediction performance for ensemble model using blood components and types.

Variables R2 MAE MSE RMSE
Ensemble model
RBC
 A+ 0.5418 23.4566 747.5972 27.3422
 B+ 0.4615 22.4865 779.4861 27.9193
 O+ 0.6288 19.5733 567.8626 23.8299
 AB+ 0.3564 14.4631 332.5969 18.2372
FFP
 A+ 0.4683 34.7742 2074.6106 45.5479
 B+ 0.3786 27.3093 1314.3403 36.2538
 O+ 0.4676 23.9352 917.7138 30.2938
 AB+ 0.4876 34.4725 1999.1331 44.7117
PC
 A+ 0.7098 80.6534 11,608.2959 107.7418
 B+ 0.7498 79.3645 13,159.3359 114.7141
 O+ 0.8497 66.9101 9544.2003 97.6944
 AB+ 0.5291 52.5927 4878.5954 69.8469

R2: R-squared; MAE: mean absolute error; MSE: mean squared error; RMSE: root mean squared error; RBC: red blood cell; FFP: fresh frozen plasma; PC: platelet concentrate.
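Where the ensemble improved on the single models can be checked directly against the R2 columns of Tables 1 and 2. The sketch below transcribes those values and compares each ensemble R2 with the best of the three single models; it uses R2 only, so a comparison on MAE or RMSE could classify some combinations differently.

```python
# R2 values transcribed from Tables 1 and 2, ordered (XGBoost, LGBM, CatBoost, ensemble).
R2 = {
    ("RBC", "A+"): (0.5360, 0.5155, 0.5172, 0.5418),
    ("RBC", "B+"): (0.4288, 0.4441, 0.4114, 0.4615),
    ("RBC", "O+"): (0.6002, 0.5779, 0.6364, 0.6288),
    ("RBC", "AB+"): (0.3307, 0.3350, 0.3171, 0.3564),
    ("FFP", "A+"): (0.4592, 0.4360, 0.4482, 0.4683),
    ("FFP", "B+"): (0.3610, 0.3656, 0.3353, 0.3786),
    ("FFP", "O+"): (0.4451, 0.4442, 0.4451, 0.4676),
    ("FFP", "AB+"): (0.4591, 0.4359, 0.4917, 0.4876),
    ("PC", "A+"): (0.6879, 0.6981, 0.6957, 0.7098),
    ("PC", "B+"): (0.7350, 0.7324, 0.7350, 0.7498),
    ("PC", "O+"): (0.8374, 0.8399, 0.8362, 0.8497),
    ("PC", "AB+"): (0.4758, 0.5426, 0.5118, 0.5291),
}


def ensemble_below_best(table):
    """Combinations whose ensemble R2 is lower than the best single-model R2."""
    return [key for key, values in table.items() if values[3] < max(values[:3])]


print(ensemble_below_best(R2))  # [('RBC', 'O+'), ('FFP', 'AB+'), ('PC', 'AB+')]
```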

Parameters contributing to prediction performance by “feature importance” technique

For the ensemble model, we used FI to examine the parameters affecting prediction performance for each blood component and blood type (Figures 4–6). For A+ RBCs, the features ranked by impact were department, number of crossmatch tests, type A national blood donations, number of RBC antibody screening tests, and nationwide actual RBC demand. For B+ RBCs, the order was department, number of crossmatch tests, number of RBC antibody screening tests, number of hospital beds, and total number of surgeries. For O+ RBCs, the order was number of crossmatch tests, department, number of RBC antibody screening tests, RBC reserves, and number of hospital beds.

Figure 4.

Feature impact on red blood cell (RBC) demand forecasting performance in ensemble models. The small boxes of each result are the department's feature impact.

Figure 5.

Feature impact on fresh frozen plasma demand forecasting performance in ensemble models. The small boxes of each result are the department's feature impact.

Figure 6.

Feature impact on platelet concentrate demand forecasting performance in ensemble models. The small boxes of each result are the department's feature impact.

For A+ FFP, the order was department, year, number of moderate surgeries, total number of surgeries, and number of major surgeries. For B+ FFP, the order was department, number of major surgeries, number of RBC antibody screening tests, year, and number of crossmatch tests. For O+ FFP, the order was department, number of major surgeries, year, O+ blood donations in the capital city, and number of RBC antibody screening tests. For AB+ FFP, the order was department, year, FFP reserves, total number of surgeries, and number of major surgeries (Figures 4–6).

For A+ PC, the order was department, year, number of RBC antibody screening tests, A+ national blood donations, and number of moderate surgeries. For B+ PC, the order was department, number of RBC antibody screening tests, number of hospital beds, year, and number of confirmed COVID-19 cases. For O+ PC, the order was department, number of RBC antibody screening tests, year, number of confirmed COVID-19 cases, and O+ national blood donations. For AB+ PC, the order was department, number of RBC antibody screening tests, number of ABO/Rh tests, and AB+ blood donations in the capital city (Figures 4–6).

In the FI results, the department was highly important overall, so we examined the feature impact of individual departments in detail (Figures 4–6). For RBCs, the leading departments were neurosurgery, hematology–oncology, pediatrics, rheumatology, and pulmonology for A+; neurosurgery, infectious medicine, nephrology, gastroenterology, and surgery for B+; neurology, neurosurgery, pediatrics, rheumatology, and plastic surgery (PS) for O+; and hematology–oncology, emergency department, gastroenterology, surgery, and pulmonology for AB+. For FFP, the order was gastroenterology, thoracic surgery, emergency department, hematology–oncology, and surgery for A+; neurosurgery, neurology, PS, infectious medicine, and hematology–oncology for B+; surgery, emergency department, gastroenterology, pulmonology, and orthopedics for O+; and emergency department, surgery, neurology, orthopedics, and cardiology for AB+. For PC, the order was hematology–oncology, urology, pulmonology, cardiology, and orthopedics for A+; hematology–oncology, pulmonology, orthopedics, urology, and otorhinolaryngology for B+; hematology–oncology, otorhinolaryngology, rheumatology, nephrology, and infectious medicine for O+; and hematology–oncology, nephrology, pulmonology, gastroenterology, and gynecology for AB+.

Discussion

We constructed an AI model using open public data on the national blood supply and transfusion-related information from a medical institution, without requiring patient clinical information. This study is novel in that the blood-demand prediction model was subdivided by blood component and blood type. Models that predict blood usage by blood type and component can help hospitals plan blood requests and receipts more systematically, which can lead to more efficient care at the hospital level. As shown in Supplemental Materials 1 and 2, monthly and yearly usage varied considerably, with wide standard deviations. Against this variability, the MAE and RMSE values of our boosting models were within a reasonable range, indicating that the models addressed the regression problem adequately.

While many previous studies have discussed the impact of reduced blood donations and insufficient blood supply,1–4 our study addresses this problem from a demand-forecasting perspective. Because efficient demand forecasting can alleviate supply problems, we used AI to develop an optimal model for this purpose. There have been limited attempts to use AI to predict blood demand. For example, Lin et al. 22 used a linear regression model to predict blood needs, and Shokouhifar and Ranjbarimesan 23 employed traditional time-series analysis. In contrast, our study used machine learning techniques that offer a more robust and flexible approach to the complex nature of blood demand. By accurately predicting the amount of blood needed, medical institutions can develop more effective strategies, optimize blood donation campaigns, and allocate resources. Our findings complement the existing literature and may inform new approaches to blood supply management, bridging the gap between observed reductions in blood donations and the practical steps that can be taken to manage and mitigate the resulting shortages.

In this study, K-fold cross-validation was used to avoid losing data, and stratified K-fold cross-validation was applied when training the model to predict monthly blood volume. Stratification is normally applied only to nominal (categorical) labels, whereas the value we wanted to predict was numeric. To obtain optimal learning, we therefore divided the range of target values into intervals so that each fold included data from all intervals, including the most dissimilar ones. Using the q_10 option, the target values were divided into 10 equal parts, giving each fold a similar distribution of target values. Logistic regression is frequently employed for classification problems, whereas boosting models can be used for both classification and regression problems. 24 Our data include continuous numerical variables as well as categorical variables such as department. Consequently, we developed the blood-demand prediction models using tree-based boosting models rather than a general linear model.
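The binning-then-stratifying procedure can be sketched in plain Python. One plausible reading of the q_10 option is decile binning, as with `pandas.qcut(values, q=10, labels=False)`, followed by stratified splitting, as with scikit-learn's `StratifiedKFold(n_splits=10)` applied to the bin labels; that reading is an assumption, since the paper does not name the libraries. The functions below are hypothetical stdlib equivalents.

```python
import random


def quantile_bins(values, q=10):
    """Rank-based equal-frequency binning into q bins labeled 0..q-1.

    Ties are split by position rather than kept together, unlike some
    library implementations such as pandas.qcut.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * q // len(values)
    return bins


def stratified_folds(bins, k=10, seed=0):
    """Deal each bin's samples round-robin across k folds.

    Every fold therefore receives samples from the whole range of target
    values, which is the goal of stratifying a continuous target.
    """
    rng = random.Random(seed)
    folds = [0] * len(bins)
    offset = 0  # carry the position across bins so fold sizes stay balanced
    for b in sorted(set(bins)):
        members = [i for i, bi in enumerate(bins) if bi == b]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[i] = (offset + j) % k
        offset += len(members)
    return folds


monthly_usage = list(range(100))  # toy target values, not study data
bins = quantile_bins(monthly_usage, q=10)
folds = stratified_folds(bins, k=10)
# Each fold holds exactly one sample from every decile bin.
print(sorted(bins[i] for i in range(100) if folds[i] == 0))
```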

By combining the three previously developed boosting models, we created an ensemble model that increased prediction accuracy for most blood components and types (Table 2). An ensemble model improves generalization performance (i.e., reduces variance) by combining models trained individually. It also tends to reduce the overfitting of individual models, which enhances overall performance (R2, MAE, and RMSE).

Examining the features that affected predictive performance showed that parameters from open public data were as crucial as transfusion-related hospital data. In the FI analysis, the department had the highest feature impact in all cases except O+ RBC. Neurosurgery had a significant impact on the prediction of RBC transfusions. Although many guidelines recommend a restrictive RBC transfusion threshold of 7–8 g/dL, 25 one report suggests that, with respect to brain function, setting a slightly higher target hemoglobin level can improve clinical outcomes. 26

For the prediction of FFP transfusion, Emergency Medicine and Surgery had a high impact. FFP is used to achieve hemostasis when active bleeding occurs because of clotting factor deficiency. It is also administered before planned surgery or invasive procedures, for warfarin reversal, including when vitamin K is inadequate to reverse the effects of warfarin, and for thrombotic thrombocytopenic purpura. 27 Nonetheless, FFP is frequently transfused empirically for a prolonged international normalized ratio (INR), a practice that should be avoided because prolonged INR can occur in a variety of circumstances. 28 One study reported that FFP was transfused inappropriately in 60% of patients in the emergency department, 26 and another found that 90% of patients without active bleeding or receiving preventive transfusions received excessive FFP. 29

For PC, hematology–oncology had a high influence for all blood types. This may be because the proportion of patients with thrombocytopenia is high in hematology–oncology, and thrombocytopenia requires prompt platelet administration because of the high bleeding risk. Moreover, because PC can be stored for only 5–7 days, 30 platelets must be requested from the blood inventory as soon as a transfusion is needed. Therefore, monitoring the hematology–oncology caseload of the medical institution together with our model is expected to support effective preparation and supply of PC.

Apart from the department, the important features in transfusion prediction differed by blood component and blood type. For RBCs, the number of crossmatch tests had a high feature impact, and for PC, the number of RBC antibody screening tests was also influential. Both tests are required before transfusion, so their association with demand is expected. For FFP, however, the number of surgeries (particularly major operations) had a significant impact. Recent guidelines emphasize restricting FFP use.31,32 The use of FFP as a volume expander is discouraged, and it is recommended that the cause of a prolonged INR be corrected rather than the INR value itself. 33 Nevertheless, our results showed that FFP use was strongly associated with the number of operations. Although FFP transfusion is unavoidable in some cases, it is important to assess whether current transfusion guidelines are being followed. Necessary transfusions must be administered promptly, but excessive or inappropriate transfusions should be prevented because of potential adverse effects such as transfusion-related acute lung injury and circulatory overload. 34

The hospital blood-demand prediction model we developed appears beneficial. However, because the global decline in blood donation remains the root problem and urgently requires direct intervention, our study addresses only part of the broader solution. In South Korea, hospitals address transfusion shortages in various ways, such as targeted donations, in which donated blood is provided to a designated patient. This approach is used particularly for inpatients requiring surgery and is effective at the individual hospital level. Moreover, although artificial blood is not yet practical, its development could help address future blood shortages.

In terms of practical application, integrating our AI model holds promise for both clinics and blood centers. Clinics can use the model to refine blood order estimates, minimize waste, and meet patient needs. Blood centers, which must meet the blood demands of multiple health facilities, can use the predictions to optimize their distribution strategies. Together, these uses of the prediction model can promote efficient resource allocation and support patient care.

Although this study highlights the potential of AI for predicting blood demand, it has several limitations. First, we relied on hospital transfusion data and domestic open data; open repositories may introduce biases or errors despite meticulous data cleaning, and variability in data periods could lead to inconsistency in capturing long-term trends. Second, although we used three boosting machine learning algorithms because of their expected efficacy, other unexplored models might provide further insights. Third, because the model was built on 12 years of data from a single hospital and domestic open data, external validation at other institutions is needed to establish robust generalizability. Future research using multicenter transfusion data could develop more generalized and accurate models, because models fine-tuned to specific datasets may not retain their precision when extrapolated to regions or countries with different medical practices. Finally, because it relies predominantly on historical data, our model may only partially capture the complexity of blood demand, especially during unforeseen health emergencies or sociopolitical shifts. These limitations indicate areas for improvement and the broad scope for future research in this field.
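Whether a model retains its precision at another site can be checked by computing the same error metrics the study reports (mean absolute error and root mean squared error) on data from the development site and on data from an external institution. The Python sketch below is an illustration under stated assumptions, not the authors' code: it assumes plain lists of monthly unit counts, and the function names (`mae`, `rmse`, `validation_report`) and all numbers are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Series = Tuple[Sequence[float], Sequence[float]]  # hypothetical (actual, predicted) pair


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Each prediction must pair with exactly one observed month.
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: mean of |y - y_hat|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean of (y - y_hat)^2."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def validation_report(internal: Series, external: Series) -> Dict[str, float]:
    """Compare errors on development-site data with errors on external-site data.

    Ratios above 1 mean the model is less precise at the external site.
    A perfect internal fit (zero error) yields an infinite ratio.
    """
    int_mae, int_rmse = mae(*internal), rmse(*internal)
    ext_mae, ext_rmse = mae(*external), rmse(*external)
    return {
        "internal_mae": int_mae,
        "internal_rmse": int_rmse,
        "external_mae": ext_mae,
        "external_rmse": ext_rmse,
        "mae_ratio": ext_mae / int_mae if int_mae else math.inf,
        "rmse_ratio": ext_rmse / int_rmse if int_rmse else math.inf,
    }


if __name__ == "__main__":
    # Hypothetical monthly RBC units (observed, predicted); not study data.
    internal = ([100, 120, 110, 130], [95, 125, 108, 135])
    external = ([100, 120, 110, 130], [90, 135, 100, 145])
    for name, value in validation_report(internal, external).items():
        print(f"{name}: {value:.3f}")
```

Reporting the external-to-internal ratio, rather than only the raw errors, makes the loss of precision comparable across blood products whose monthly volumes differ widely.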

Conclusion

Blood supply has persistently fallen short of demand worldwide, and an adequate supply is difficult to achieve for several reasons. Our model, built on national open public data related to blood supply, can enable medical institutions to predict the required amount of blood in advance. The department requesting blood was the most influential parameter. Future research could investigate other parameters that might influence transfusion prediction performance. This study used data from January 2010 to December 2021; a longitudinal study conducted over a longer period could provide further insight into trends and patterns in blood transfusion use.

Contributorship

HJK, SMB, and DJP conceived and designed the study; SMB and DJP developed the methodology and acquired the data; HJK, SP, YHP, SMB, and DJP analyzed and interpreted the data and wrote, reviewed, and revised the manuscript.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076231224245 - Supplemental material for Development of blood demand prediction model using artificial intelligence based on national public big data

Supplemental material, sj-docx-1-dhj-10.1177_20552076231224245 for Development of blood demand prediction model using artificial intelligence based on national public big data by Hi Jeong Kwon, Sholhui Park, Young Hoon Park, Seung Min Baik and Dong Jin Park in DIGITAL HEALTH

sj-docx-2-dhj-10.1177_20552076231224245 - Supplemental material for Development of blood demand prediction model using artificial intelligence based on national public big data

Supplemental material, sj-docx-2-dhj-10.1177_20552076231224245 for Development of blood demand prediction model using artificial intelligence based on national public big data by Hi Jeong Kwon, Sholhui Park, Young Hoon Park, Seung Min Baik and Dong Jin Park in DIGITAL HEALTH

Footnotes

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval: This study was approved by the Institutional Review Board of the Catholic University of Korea at Yeouido St. Mary's Hospital (approval number: SC22RISI0001). This research was conducted ethically, and all study procedures were performed in accordance with the requirements of the Declaration of Helsinki of the World Medical Association. The requirement for informed consent was waived due to the retrospective nature of this study.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

Guarantor: SMB and DJP.

Supplemental material: Supplemental material for this article is available online.

References

  • 1. Mowla SJ, Sapiano MRP, Jones JM, et al. Supplemental findings of the 2019 National Blood Collection and Utilization Survey. Transfusion 2021; 61: S11–S35.
  • 2. Veseli B, Sandner S, Studte S, et al. The impact of COVID-19 on blood donations. PLoS One 2022; 17: e0265171.
  • 3. Community Blood Center, https://givingblood.org/donate-blood/why-give-blood.aspx (2022, accessed 15 January 2022).
  • 4. American Red Cross, https://www.redcross.org/about-us/news-and-events/press-release/2022/blood-donors-needed-now-as-omicron-intensifies.html (2022, accessed 15 January 2022).
  • 5. Korean Statistical Information Service, https://kosis.kr/index/index.do (2020, accessed 15 January 2022).
  • 6. Korean Red Cross, https://www.redcross.or.kr/main/main.do (2019, accessed 15 January 2022).
  • 7. Sahu S, Hemlata, Verma A. Adverse events related to blood transfusion. Indian J Anaesth 2014; 58: 543–551.
  • 8. Ellingson KD, Sapiano MRP, Haass KA, et al. Continued decline in blood collection and transfusion in the United States-2015. Transfusion 2017; 57: 1588–1598.
  • 9. Shander A, Isbister J, Gombotz H. Patient blood management: The global view. Transfusion 2016; 56: S94–S102.
  • 10. Wang M, Cheng J, Li X, et al. Development and validation of a machine learning algorithm for prediction of platelet transfusion efficiency in patients with hematological diseases. Blood 2019; 134: 2454.
  • 11. Yao Y, Cifuentes J, Zheng B, et al. Computer algorithm can match physicians’ decisions about blood transfusions. J Transl Med 2019; 17: 340.
  • 12. Durand WM, DePasse JM, Daniels AH. Predictive modeling for blood transfusion after adult spinal deformity surgery: A tree-based machine learning approach. Spine (Phila Pa 1976) 2018; 43: 1058–1066.
  • 13. Doshi KA, Shastry S, Pai VB. Transfusion requirement prediction score for patients undergoing cardiac surgery: An experience from a tertiary care set-up from south India. Transfus Med 2021; 31: 243–249.
  • 14. Akaraborworn O, Chaiwat O, Chatmongkolchart S, et al. Prediction of massive transfusion in trauma patients in the surgical intensive care units (THAI-SICU study). Chin J Traumatol 2019; 22: 219–222.
  • 15. Cantle PM, Cotton BA. Prediction of massive transfusion in trauma. Crit Care Clin 2017; 33: 71–84.
  • 16. Kim JA, Yoon S, Kim LY, et al. Towards actualizing the value potential of Korea health insurance review and assessment (HIRA) data as a resource for health research: Strengths, limitations, applications, and strategies for optimal use of HIRA data. J Korean Med Sci 2017; 32: 718–728.
  • 17. Healthcare Bigdata Hub of Health Insurance Review and Assessment Service, https://opendata.hira.or.kr/home.do (2022, accessed 15 January 2022).
  • 18. Statistics Korea, https://kostat.go.kr/portal/korea/index.action (2022, accessed 15 January 2022).
  • 19. Baik SM, Lee M, Hong KS, et al. Development of machine-learning model to predict COVID-19 mortality: Application of ensemble model and regarding feature impacts. Diagnostics (Basel) 2022; 12: 1464.
  • 20. Park DJ, Park MW, Lee H, et al. Development of machine learning model for diagnostic disease prediction based on laboratory tests. Sci Rep 2021; 11: 7567.
  • 21. Vimalachandran A, Jayachandran T, Tkachenko AY, et al. Gas turbine design analysis and optimization with novel hybrid model using classical physics and machine learning. In: 2021 international scientific and technical engine conference (EC), Samara, Russian Federation, 23–25 June 2021, pp.1–9. IEEE.
  • 22. Lin F, He X, Zhang H, et al. Forecasting blood supply in Chinese major cities by fractional grey prediction model and linear regression model. medRxiv 2023. doi: 10.1101/2023.04.25.23287469.
  • 23. Shokouhifar M, Ranjbarimesan M. Multivariate time-series blood donation/demand forecasting for resilient supply chain management during COVID-19 pandemic. Clean Logist Supply Chain 2022; 5: 100078.
  • 24. Emmert-Streib F, Yli-Harja O, Dehmer M. Artificial intelligence: A clarification of misconceptions, myths and desired status. Front Artif Intell 2020; 3: 524339.
  • 25. Carson JL, Guyatt G, Heddle NM, et al. Clinical practice guidelines from the AABB: Red blood cell transfusion thresholds and storage. JAMA 2016; 316: 2025–2035.
  • 26. Griesdale DE, Sekhon MS, Menon DK, et al. Hemoglobin area and time index above 90 g/L are associated with improved 6-month functional outcomes in patients with severe traumatic brain injury. Neurocrit Care 2015; 23: 78–84.
  • 27. Khawar H, Kelley W, Stevens JB, et al. Fresh frozen plasma (FFP). Treasure Island (FL): StatPearls Publishing, 2022.
  • 28. Dellinger RP, Carlet JM, Masur H, et al. Surviving sepsis campaign guidelines for management of severe sepsis and septic shock. Crit Care Med 2004; 32: 858–873.
  • 29. Emektar E, Dagar S, Corbacioglu SK, et al. The evaluation of the audit of fresh-frozen plasma (FFP) usage in emergency department. Turk J Emerg Med 2016; 16: 137–140.
  • 30. Aubron C, Flint AWJ, Ozier Y, et al. Platelet storage duration and its clinical and transfusion outcomes: A systematic review. Crit Care 2018; 22: 185.
  • 31. Green L, Cardigan R, Beattie C, et al. Addendum to the British Committee for Standards in Haematology (BCSH): Guidelines for the use of fresh-frozen plasma, cryoprecipitate and cryosupernatant, 2004 (Br J Haematol 2004, 126, 11–28). Br J Haematol 2017; 178: 646–647.
  • 32. Stanworth SJ, Brunskill SJ, Hyde CJ, et al. Is fresh frozen plasma clinically effective? A systematic review of randomized controlled trials. Br J Haematol 2004; 126: 139–152.
  • 33. Biu E, Beraj S, Vyshka G, et al. Transfusion of fresh frozen plasma in critically ill patients: Effective or useless? Open Access Maced J Med Sci 2018; 6: 820–823.
  • 34. Semple JW, Rebetz J, Kapur R. Transfusion-associated circulatory overload and transfusion-related acute lung injury. Blood 2019; 133: 1840–1853.

Articles from Digital Health are provided here courtesy of SAGE Publications
