Abstract
Purpose
The uncertainty in supply and the short shelf life of blood products have led to a substantial outdating of the collected donor blood. On the other hand, hospitals and blood centers experience severe blood shortage due to the very limited donor population. Therefore, the necessity to forecast the blood supply to minimize outdating as well as shortage is obvious. This study aims to efficiently forecast the supply of blood components at blood centers.
Methods
Two different types of forecasting techniques, time series and machine learning algorithms, are developed and the best performing method for the given case study is determined. Under the time series, we consider the Autoregressive (AUTOREG), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA, Seasonal Exponential Smoothing Method (ESM), and Holt-Winters models. Artificial neural network (ANN) and multiple regression are considered under the machine learning algorithms.
Results
We leverage five years' worth of historical blood supply data from the Taiwan Blood Services Foundation (TBSF) to conduct our study. On comparing the different techniques, we found that time series forecasting methods yield better results than machine learning algorithms. More specifically, the lowest values of the error measures are observed for the seasonal ESM and ARIMA models.
Conclusions
The models developed can act as a decision support system to administrators and pathologists at blood banks, blood donation centers, and hospitals to determine their inventory policy based on the estimated future blood supply. The forecasting models developed in this study can help healthcare managers to manage blood inventory control more efficiently, thus reducing blood shortage and blood wastage.
1. Introduction
Blood performs several important functions in the human body, such as transporting oxygen, carrying supplements to our cells, and disposing of ammonia, carbon dioxide, and other waste items. Four of its most critical elements are the red blood cells (RBC), white blood cells (WBC), plasma, and platelets [1]. The American Red Cross reported that over 35,000 RBC units, 10,000 plasma units, and 7,000 platelet units are required daily within the US [2]. Due to the short shelf life of blood components, hospitals and blood centers are faced with the challenge of maintaining appropriate inventory levels to avoid outdating and shortage.
Managing blood supply and demand is a core part of the healthcare supply chain system, as blood plays a crucial role in saving human lives. Blood supply forecasting is essential for making supply chain decisions, such as donor drive scheduling, vehicle routing policies, and inventory management, at blood centers and hospitals. Accurate forecasts of the timing and amount of future blood requests are considered key inputs to donor recruitment decision making and inventory control. It is important to gather data for several years to forecast monthly demand and to recognize seasonality in demand [3–6]. Lestari et al. [7] indicated that forecasting can capture the observed data trend and predict future demand for blood components.
2. Literature Review
Several studies have leveraged time series forecasting techniques for predicting the blood demand at hospitals and blood centers. For instance, Pereira [8] investigated and evaluated the autoregressive integrated moving average (ARIMA) model and the Holt-Winters exponential smoothing model to predict monthly demand for red blood cell transfusions at a tertiary care hospital. While that study focused on time series forecasting, Bosnes et al. [9] used a statistical regression technique to forecast blood donor arrivals at the blood bank of Oslo and found that the most important factors among 18 explanatory variables were donor age, time from making an appointment to arriving at the drive, contact methods used, number of prior donations, and donor no-show rate. Fortsch and Khapalova [10] introduced numerous practical methods to predict future demand for blood. Several forecasting models, including the naïve, exponential smoothing, moving average, and time series decomposition models, were tested using daily demand data from a blood center for January 2006 to December 2012. They also compared the performance of these methods with an autoregressive moving average (ARMA) model. The results revealed that the ARMA forecasting model performed better for eight out of nine time series model settings. Similarly, Khaldi et al. [11] explored the capabilities of machine learning algorithms such as the artificial neural network (ANN) model to predict future demand for blood.
3. Materials and Methods
As discussed earlier, the study aims to develop effective forecasting methods to predict the supply of RBCs using two different techniques: time series forecasting methods and machine learning algorithms.
3.1. Time Series Forecasting
This section discusses the seven time series forecasting methods used in this study.
3.1.1. Autoregressive (AUTOREG) Model [12, 13]
The AUTOREG procedure estimates and forecasts linear regression models for time series data when the errors are autocorrelated. The autoregressive model regresses the value of the series at time t (Yt) on the values during the time periods t − 1, t − 2,…, t − p. The mathematical formula is expressed as follows:
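$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t \tag{1}$$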
where α0, α1, α2,…, αp are the linear regression coefficients, Yt is the forecasted value at time t, and εt is the random error variable and is generally assumed to have a normal distribution with mean 0 and variance σ2 (i.e., normal (0, σ2)).
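As a minimal illustration of the autoregressive idea (not the SAS AUTOREG procedure itself), the sketch below estimates an AR(1) model by ordinary least squares and produces a one-step forecast from a coefficient list. The function names `fit_ar1` and `ar_forecast` and the sample series are hypothetical and chosen only for this example.

```python
def fit_ar1(series):
    """Least-squares estimates (alpha0, alpha1) for Y_t = alpha0 + alpha1 * Y_{t-1} + e_t."""
    x = series[:-1]          # lagged values Y_{t-1}
    y = series[1:]           # current values Y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    alpha1 = sxy / sxx
    alpha0 = mean_y - alpha1 * mean_x
    return alpha0, alpha1


def ar_forecast(history, coeffs):
    """One-step forecast; coeffs = [alpha0, alpha1, ..., alphap]."""
    alpha0, lag_coeffs = coeffs[0], coeffs[1:]
    # history[-i] is the observation i periods before the forecast period
    return alpha0 + sum(a * history[-i] for i, a in enumerate(lag_coeffs, start=1))


# Hypothetical noise-free series generated by Y_t = 10 + 0.5 * Y_{t-1}
series = [0.0, 10.0, 15.0, 17.5, 18.75, 19.375]
alpha0, alpha1 = fit_ar1(series)
next_value = ar_forecast(series, [alpha0, alpha1])
```

With noisy data the estimates would only approximate the true coefficients, and higher orders p would require solving a multiple regression on p lagged columns.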
3.1.2. Autoregressive Moving Average (ARMA) Models [12–14]
The ARMA model is one of the basic tools in time series modeling. Suppose the time series Y1, Y2,…, Yt is a stationary stochastic process; the expression ARMA (p, q) then represents the model with autoregressive order p and moving-average order q. This model is a combination of the AR (p) and MA (q) models, where AR (p) is written as Yt=a+∅1Yt−1+∅2Yt−2+⋯+∅pYt−p+εt and MA (q) is written as Yt=b − θ1εt−1 − θ2εt−2 − ⋯−θqεt−q+εt.
As in the AUTOREG model, Yt is the observation value at time t. The ARMA (p, q) process is generally written as follows:
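$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \tag{2}$$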
where a, b, and c are constants, εt is the random error variable and is generally assumed to have a normal distribution with mean 0 and variance σ2; ∅1, ∅2,…, ∅p are the autoregressive coefficients to be estimated, and θ1, θ2,…, θq are the moving average coefficients to be estimated.
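The following sketch shows how an ARMA (p, q) model with known coefficients can produce a one-step-ahead forecast, using the sign convention of the AR and MA expressions above (moving-average terms are subtracted). It assumes that presample observations and errors are zero, which is a simplification; the function name `arma_forecast` and the numbers are hypothetical.

```python
def arma_forecast(y, c, phi, theta):
    """One-step-ahead ARMA(p, q) forecast with presample terms treated as zero.

    Model: Y_t = c + sum(phi_i * Y_{t-i}) + e_t - sum(theta_j * e_{t-j}).
    """
    eps = []  # one-step errors e_t, filled in as the series is scanned

    def predict(t):
        ar = sum(a * y[t - i] for i, a in enumerate(phi, start=1) if t - i >= 0)
        ma = sum(b * eps[t - j] for j, b in enumerate(theta, start=1) if t - j >= 0)
        return c + ar - ma

    for t in range(len(y)):
        eps.append(y[t] - predict(t))   # error of the one-step prediction at time t
    return predict(len(y))              # forecast for the first unobserved period


# Hypothetical ARMA(1, 1) example
forecast = arma_forecast([2.0, 3.0], c=1.0, phi=[0.5], theta=[0.5])
```

In practice the coefficients are estimated from data (for example by maximum likelihood) rather than supplied by hand.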
3.1.3. Autoregressive Integrated Moving Average (ARIMA) Model [12–14]
The ARIMA (autoregressive integrated moving average) approach was popularized by Box and Jenkins [11]. The ARIMA procedure predicts a future response value in a time series as a linear combination of its own past values, past errors, and current and past values of other time series (predictor time series).
When the time series exhibits nonstationary behavior, the above ARMA (p, q) model can be extended using differencing, which is defined as Yt − Yt−1=(1 − B)Yt= ∇Yt, where t is the index of time, Yt is the time series {Yt : 1 ≤ t ≤ n} at time t, and B is the backward shift operator, which shifts the data back one period (i.e., BYt=Yt−1).
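Differencing can be sketched in a few lines. The helper `difference` below is a hypothetical name; it applies (1 − B) when `lag=1` and a seasonal difference when `lag` equals the length of the seasonal cycle (a weekly cycle of 7 is assumed here purely for illustration).

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): subtract the value observed `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = [3, 5, 7, 9, 11]                         # series with a linear trend
first_diff = difference(trend)                   # (1 - B) Y_t removes the trend
second_diff = difference(first_diff)             # (1 - B)^2 Y_t
weekly = difference(list(range(1, 10)), lag=7)   # seasonal difference (1 - B^7) Y_t
```

Each application shortens the series by `lag` observations, which is why the order of differencing is usually kept small.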
3.1.4. Seasonal ARIMA Model [12, 13, 15, 16]
Seasonal ARIMA model is written with the general expression ARIMA (p, d, q)(P, D, Q)s. The symbol p is the order of the nonseasonal autoregressive component, d is the order of the differencing, q is the order of the nonseasonal moving-average process, P is the order of the seasonal autoregressive part, D is the order of the seasonal differencing, Q is the order of the seasonal moving-average process, and s is the duration of the seasonal cycle.
Let Yt be a dependent time series {Yt : 1 ≤ t ≤ n} at time t, then the mathematical formula for the seasonal ARIMA model is expressed as follows:
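$$(1 - B)^{d}\,(1 - B^{s})^{D}\,Y_t = \mu + \frac{\theta(B)\,\theta_s(B^{s})}{\phi(B)\,\phi_s(B^{s})}\,\varepsilon_t \tag{3}$$
where μ is the constant mean, B is the backward shift operator, $B^{s}$ is the seasonal backward shift operator, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}$ is the nonseasonal autoregressive component, $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^{q}$ is the nonseasonal moving-average component, $\phi_s(B^{s}) = 1 - \phi_{s,1} B^{s} - \cdots - \phi_{s,P} B^{sP}$ is the seasonal autoregressive component, and $\theta_s(B^{s}) = 1 - \theta_{s,1} B^{s} - \cdots - \theta_{s,Q} B^{sQ}$ is the seasonal moving-average component.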
3.1.5. Seasonal Exponential Smoothing Model [12, 13, 15, 16]
In the seasonal exponential smoothing method (ESM), the equation of forecast value at time t+k (Yt+k) is given by
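$$Y_{t+k} = L_t + S_{t+k-p} \tag{4}$$
The smoothing equations are as follows:
$$L_t = \alpha\,(X_t - S_{t-p}) + (1 - \alpha)\,L_{t-1} \tag{5}$$
$$S_t = \gamma\,(X_t - L_t) + (1 - \gamma)\,S_{t-p} \tag{6}$$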
where Xt is given observation at time t, and α and γ are the level and seasonal smoothing parameters, respectively, Lt is the estimated level component at time t, St is the estimated seasonal component at time t, and p is the periods after which the seasonal cycle repeats itself.
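A minimal sketch of seasonal exponential smoothing follows. It assumes the additive seasonal form, with level update L_t = α(X_t − S_{t−p}) + (1 − α)L_{t−1}, seasonal update S_t = γ(X_t − L_t) + (1 − γ)S_{t−p}, and forecast L_t + S_{t+k−p}. The initialization from the first cycle, the function name `seasonal_esm`, and the sample data are all assumptions made for illustration.

```python
def seasonal_esm(x, p, alpha, gamma, horizon):
    """Additive seasonal exponential smoothing (no trend); returns `horizon` forecasts."""
    # Assumed initialization: level = mean of first cycle, seasonals = deviations from it
    level = sum(x[:p]) / p
    season = [x[i] - level for i in range(p)]   # season[i] holds the latest estimate for position i
    for t in range(p, len(x)):
        s_prev = season[t % p]                  # S_{t-p}
        new_level = alpha * (x[t] - s_prev) + (1 - alpha) * level
        season[t % p] = gamma * (x[t] - new_level) + (1 - gamma) * s_prev
        level = new_level
    n = len(x)
    # Forecast for period n - 1 + k reuses the latest seasonal estimate for that position
    return [level + season[(n - 1 + k) % p] for k in range(1, horizon + 1)]


# Hypothetical series with a repeating cycle of length 3
forecasts = seasonal_esm([10, 20, 30] * 4, p=3, alpha=0.3, gamma=0.2, horizon=3)
```

For daily blood supply data with a weekly pattern, `p` would be 7; the smoothing parameters would normally be chosen by minimizing a forecast error measure.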
3.1.6. Multiplicative Holt-Winters Model [12, 13, 15, 16]
The Holt-Winters model, also known as triple exponential smoothing, applies exponential smoothing to three components of the time series: level, trend, and seasonality. The Holt-Winters model can be either additive or multiplicative. In this section, we present the multiplicative Holt-Winters model, whereas Section 3.1.7 presents the additive model.
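For a time series with a trend and a seasonal component, the Holt-Winters multiplicative technique gives the forecast at time t+k (Yt+k) by the following equation:
$$Y_{t+k} = (L_t + k\,T_t)\,SI_{t+k-p} \tag{7}$$
The smoothing equations are given as follows:
$$L_t = \alpha\,\frac{X_t}{SI_{t-p}} + (1 - \alpha)\,(L_{t-1} + T_{t-1}) \tag{8}$$
$$T_t = \beta\,(L_t - L_{t-1}) + (1 - \beta)\,T_{t-1} \tag{9}$$
$$SI_t = \gamma\,\frac{X_t}{L_t} + (1 - \gamma)\,SI_{t-p} \tag{10}$$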
where Xt is given observation at time t, α, β, and γ are the level, trend, and seasonal corresponding constants, respectively, Lt is the estimated level at time t, Tt is the estimated trend at time t, SIt is the seasonality index at time t, and p is the periods after which the seasonal cycle repeats itself.
3.1.7. Additive Holt-Winters Model [12, 13, 15, 16]
In this section, we present the additive Holt-Winters Model.
For the additive model, the forecasted supply estimate for time t+k is given by the following equation:
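$$Y_{t+k} = L_t + k\,T_t + SI_{t+k-p} \tag{11}$$
The estimates of the level, trend, and seasonal factors for the additive model are given by the following equations:
$$L_t = \alpha\,(X_t - SI_{t-p}) + (1 - \alpha)\,(L_{t-1} + T_{t-1}) \tag{12}$$
$$T_t = \beta\,(L_t - L_{t-1}) + (1 - \beta)\,T_{t-1} \tag{13}$$
$$SI_t = \gamma\,(X_t - L_t) + (1 - \gamma)\,SI_{t-p} \tag{14}$$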
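The two Holt-Winters variants differ only in whether the seasonal index is divided out (multiplicative) or subtracted (additive). The sketch below implements both under the standard textbook recursions for level, trend, and seasonal index; the initialization from the first two cycles, the function name `holt_winters`, and the sample data are assumptions for illustration only.

```python
def holt_winters(x, p, alpha, beta, gamma, horizon, multiplicative=False):
    """Holt-Winters forecasts for periods t+1 .. t+horizon (additive by default)."""
    if len(x) < 2 * p:
        raise ValueError("need at least two full seasonal cycles")
    # Assumed initialization from the first two cycles
    first = sum(x[:p]) / p
    second = sum(x[p:2 * p]) / p
    level, trend = first, (second - first) / p
    if multiplicative:
        season = [x[i] / first for i in range(p)]   # requires a nonzero first-cycle mean
    else:
        season = [x[i] - first for i in range(p)]
    for t in range(p, len(x)):
        s_prev, prev_level = season[t % p], level
        if multiplicative:
            level = alpha * x[t] / s_prev + (1 - alpha) * (prev_level + trend)
            new_season = gamma * x[t] / level + (1 - gamma) * s_prev
        else:
            level = alpha * (x[t] - s_prev) + (1 - alpha) * (prev_level + trend)
            new_season = gamma * (x[t] - level) + (1 - gamma) * s_prev
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % p] = new_season
    n = len(x)
    result = []
    for k in range(1, horizon + 1):
        s = season[(n - 1 + k) % p]                 # latest index for that seasonal position
        base = level + k * trend
        result.append(base * s if multiplicative else base + s)
    return result


# Hypothetical data: a repeating cycle of length 3 with no trend
data = [10, 20, 30] * 4
additive = holt_winters(data, p=3, alpha=0.5, beta=0.5, gamma=0.5, horizon=3)
multiplicative = holt_winters(data, p=3, alpha=0.5, beta=0.5, gamma=0.5, horizon=3,
                              multiplicative=True)
```

The multiplicative form suits series whose seasonal swings grow with the level, while the additive form suits swings of roughly constant size.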
3.2. Machine Learning Algorithms
Machine learning explores algorithms that analyze a set of data, learn from the insights gathered, and make predictions on new data [17]. For blood supply forecasting, we leverage two widely used machine learning techniques: artificial neural networks and regression.
3.2.1. Artificial Neural Networks (ANN)
ANN is a supervised learning method adapted from biological neural networks. The network consists of several nodes distributed across multiple layers, and each layer is connected to its previous and subsequent layers within the network [17]. These interconnected nodes process the information they receive from the nodes of the previous layer and transfer it to the next layer through the sigmoid function. ANNs are particularly useful for modeling complex relationships in high-dimensional data or where the relationship between the input and output variables is not easy to understand [17].
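To make the layer-to-layer flow concrete, the sketch below runs a forward pass through a network with one hidden layer of sigmoid nodes. The linear output node, the weights, and the function names are hypothetical; they are not the architecture or parameters used in this study, and training (weight estimation) is omitted.

```python
import math


def sigmoid(z):
    """Logistic activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> sigmoid hidden layer -> linear output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights: zero hidden weights make every hidden node output 0.5
prediction = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 4.0],
    output_bias=1.0,
)
```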
3.2.2. Multiple Regression
Multiple regression addresses another class of machine learning problem: predicting a continuous value of a variable rather than a class, as in a classification problem [17]. Linear regression with ordinary least squares is one of the classic machine learning algorithms in this domain. The mathematical formula for the regression model is represented as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{15}$$
where Y is the response variable, Xi (i = 1, 2,…, n) is the ith independent variable, β0 is the intercept, βi is the slope coefficient of Xi (both β0 and βi are unknown coefficients to be estimated by the model), and ε is the error variable.
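Ordinary least squares can be sketched by solving the normal equations directly. The example below is a small pure-Python illustration, not the software used in this study; the function name `ols_fit` and the data are hypothetical, and the predictors actually used for the blood supply model are not specified here.

```python
def ols_fit(X, y):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]          # prepend a column of ones for the intercept
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (fails with ZeroDivisionError if the predictors are perfectly collinear)
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical noise-free data generated by Y = 2 + 3*X1 - X2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
coefficients = ols_fit(X, y)
```

For real data, a numerically more robust solver (for example a QR-based least-squares routine) would usually be preferred over the normal equations.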
3.3. Evaluation of the Different Methods
We use four different measures of forecast errors for evaluating the model performance and the accuracy of the methods; they are MAE, MSE, BIAS, and MAPE [12, 15, 18].
Assume X1, X2,…, Xn are actual data and F1, F2,…, Fn are forecasted data, and then the n values of forecast errors, e1, e2,…, en, are given by e1=F1 − X1, e2=F2 − X2,…, en=Fn − Xn.
(a) Mean absolute error (MAE): it measures the average magnitude of the forecast errors, where all individual errors have equal weights:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert \tag{16}$$
(b) Mean squared error (MSE): it also measures the magnitude of the forecast errors, and larger errors get penalized more due to squaring:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^{2} \tag{17}$$
(c) BIAS: this is an indication of whether the forecast is overestimating or underestimating the actual supply over the forecast horizon:
$$\text{BIAS} = \sum_{i=1}^{n} e_i \tag{18}$$
(d) Mean absolute percentage error (MAPE): it measures the relative magnitude of the forecasting errors in percentage terms:
$$\text{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left\lvert \frac{e_i}{X_i} \right\rvert \tag{19}$$
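The four measures can be computed as in the sketch below, which assumes e_i = F_i − X_i as defined above, BIAS as the sum of the errors, and MAPE as the mean absolute percentage error relative to the actual values. The sample values are the actual supply and the Seasonalized ESM forecasts for the first week of January 2018 reported in Table 4; the function name `forecast_errors` is hypothetical.

```python
def forecast_errors(actual, forecast):
    """Return MAE, MSE, BIAS, and MAPE (in percent) with e_i = F_i - X_i."""
    errors = [f - x for x, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    bias = sum(errors)                       # positive means overestimation overall
    # MAPE is undefined if any actual value is zero
    mape = 100 * sum(abs(e) / x for e, x in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "BIAS": bias, "MAPE": mape}


actual = [979, 1223, 972, 1354, 721, 263, 203]
seasonal_esm = [1480, 901, 883, 1314, 1232, 200, 210]
measures = forecast_errors(actual, seasonal_esm)
```

These one-week figures are only an illustration of the arithmetic; they are not the values in Table 2, which are computed over the full evaluation period.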
4. Results
4.1. Data Collection
The historical supply data for five years from 2013 to 2017 are first gathered from the health records. The summary statistics are given in Table 1.
Table 1.
Year | Day | Average | Min. | Max. | Standard deviation | Coefficient of supply variation (%) |
---|---|---|---|---|---|---|
2013 | Sunday | 188 | 32 | 461 | 84 | 44.68 |
2013 | Monday | 1,523 | 173 | 1,928 | 287 | 18.84 |
2013 | Tuesday | 820 | 154 | 1,558 | 200 | 24.39 |
2013 | Wednesday | 961 | 327 | 1,606 | 254 | 26.43 |
2013 | Thursday | 1,127 | 299 | 1,596 | 282 | 25.02 |
2013 | Friday | 1,039 | 458 | 1,956 | 263 | 25.31 |
2013 | Saturday | 135 | 43 | 462 | 68 | 50.37 |
2014 | Sunday | 174 | 31 | 456 | 82 | 47.13 |
2014 | Monday | 1,525 | 688 | 2,324 | 351 | 23.02 |
2014 | Tuesday | 858 | 327 | 1,935 | 253 | 29.49 |
2014 | Wednesday | 857 | 168 | 1,474 | 210 | 24.50 |
2014 | Thursday | 1,238 | 80 | 2,048 | 304 | 24.56 |
2014 | Friday | 1,013 | 84 | 2,027 | 314 | 31.00 |
2014 | Saturday | 138 | 31 | 587 | 103 | 74.64 |
2015 | Sunday | 200 | 39 | 531 | 126 | 63.00 |
2015 | Monday | 1,504 | 850 | 2,636 | 303 | 20.15 |
2015 | Tuesday | 850 | 495 | 1,421 | 200 | 23.53 |
2015 | Wednesday | 855 | 1 | 1,461 | 252 | 29.47 |
2015 | Thursday | 1,381 | 139 | 1,923 | 309 | 22.38 |
2015 | Friday | 1,025 | 197 | 1,450 | 253 | 24.68 |
2015 | Saturday | 164 | 31 | 660 | 122 | 74.39 |
2016 | Sunday | 204 | 31 | 542 | 99 | 48.53 |
2016 | Monday | 1,497 | 162 | 2,073 | 331 | 22.11 |
2016 | Tuesday | 855 | 372 | 1,572 | 239 | 27.95 |
2016 | Wednesday | 862 | 146 | 1,264 | 199 | 23.09 |
2016 | Thursday | 1,439 | 547 | 2,643 | 319 | 22.17 |
2016 | Friday | 1,060 | 81 | 2,058 | 301 | 28.40 |
2016 | Saturday | 146 | 55 | 490 | 69 | 47.26 |
2017 | Sunday | 201 | 50 | 522 | 116 | 57.71 |
2017 | Monday | 1,445 | 212 | 1,964 | 324 | 22.42 |
2017 | Tuesday | 888 | 355 | 1,508 | 238 | 26.80 |
2017 | Wednesday | 888 | 272 | 1,656 | 224 | 25.23 |
2017 | Thursday | 1,383 | 502 | 1,846 | 273 | 19.74 |
2017 | Friday | 1,159 | 57 | 2,061 | 312 | 26.92 |
2017 | Saturday | 192 | 41 | 679 | 100 | 52.08 |
From Table 1, it is observed that the average supply for each day of the week remains steady from year to year. Also, we can see that Monday supply is very high, Thursday and Friday supplies are quite high, Tuesday and Wednesday supplies are moderate, and Saturday and Sunday supplies are significantly lower.
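The coefficient of supply variation in Table 1 is consistent with the ratio of the standard deviation to the average, expressed as a percentage. The short sketch below reproduces three of the 2013 entries; the function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


# (standard deviation, average) pairs taken from the 2013 rows of Table 1
sunday_2013 = round(coefficient_of_variation(84, 188), 2)
monday_2013 = round(coefficient_of_variation(287, 1523), 2)
saturday_2013 = round(coefficient_of_variation(68, 135), 2)
```

The much larger values for Saturday and Sunday indicate that weekend supply is not only lower but also relatively more variable.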
4.2. Time Series Forecasting Results
After running the seven time series models discussed in Section 3.1 and obtaining the forecasts, we evaluate them using the error measures given in Section 3.3; the results are presented in Table 2. It is clear that the Seasonal ARIMA, Seasonal Exponential Smoothing, and Multiplicative Holt-Winters models yield the lowest error measures overall. Hence, we conclude that, among the time series methods, these three models are the best at forecasting the blood supply for the case study data under consideration.
Table 2.
Error measure | AUTOREG | ARMA | Basic ARIMA | Seasonalized ARIMA | Seasonalized ESM | Multiplicative Holt-Winters | Additive Holt-Winters |
---|---|---|---|---|---|---|---|
MAE | 215 | 449 | 600 | 160 | 158 | 159 | 159 |
MSE | 88,031 | 288,002 | 577,197 | 57,235 | 57,111 | 57,111 | 57,189 |
BIAS | −383 | −20,578 | 754 | −5,575 | −7,338 | −8,507 | −15,056 |
MAPE | 94.50 | 227 | 224 | 80 | 81 | 81 | 80 |
4.3. Machine Learning Algorithm Results
The performance of the machine learning algorithms is compared in Table 3. For this particular dataset, the results show that regression is the better predictor of the blood supply; nevertheless, its explanatory power is quite low (R2 = 63.71%).
Table 3.
Statistics of fit | Artificial neural network | Regression |
---|---|---|
R-square | 58.59% | 63.71% |
Therefore, regression is used to predict the supply for the first week of January 2018 as shown in Table 4. A summary of the results obtained under the time series method and regression is given in Table 4.
Table 4.
Method | 1/1/2018 | 1/2/2018 | 1/3/2018 | 1/4/2018 | 1/5/2018 | 1/6/2018 | 1/7/2018 |
---|---|---|---|---|---|---|---|
Seasonalized ARIMA | 1491 | 899 | 882 | 1301 | 1242 | 200 | 208 |
Seasonalized ESM | 1480 | 901 | 883 | 1314 | 1232 | 200 | 210 |
Multiplicative Holt-Winters | 1490 | 906 | 887 | 1308 | 1251 | 202 | 210 |
Regression | 1458 | 1269 | 1088 | 951 | 779 | 589 | 410 |
Actual supply | 979 | 1223 | 972 | 1354 | 721 | 263 | 203 |
The results show that no single method predicts the supply accurately on every day; hence, we recommend using the average value of the forecasts obtained under these four methods for estimating the future supply [15, 19–21].
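The recommended combination is a simple equal-weight average of the four forecasts for each day. The sketch below applies it to the Table 4 predictions; equal weighting is the only scheme assumed here, and the variable names are illustrative.

```python
# Forecasts for 1/1/2018 through 1/7/2018, copied from Table 4
forecasts = {
    "Seasonalized ARIMA": [1491, 899, 882, 1301, 1242, 200, 208],
    "Seasonalized ESM": [1480, 901, 883, 1314, 1232, 200, 210],
    "Multiplicative Holt-Winters": [1490, 906, 887, 1308, 1251, 202, 210],
    "Regression": [1458, 1269, 1088, 951, 779, 589, 410],
}

# Equal-weight average across methods, one value per day
combined = [sum(day) / len(day) for day in zip(*forecasts.values())]
```

The combined forecast for 1/1/2018, for example, is 1479.75 units; averaging tempers the influence of any single method's large miss on a given day.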
5. Discussion
This study focusses on predicting the supply of red blood cells for the Taiwan Blood Services Foundation (TBSF) [22], a nongovernmental and nonprofit organization. So far, more than seven million citizens have donated blood in Taiwan through this foundation (which accounts for over 25% of the total population of Taiwan) [23]. Currently, blood centers at TBSF do not have a proper blood forecasting system, and some blood centers face blood shortage problems as a result of the lack of accurate forecasting of blood supply. This paper focusses on developing a blood supply forecasting decision support tool for TBSF using time series and machine learning algorithms. Accurate forecasting models will enable TBSF to make good blood supply chain management planning decisions, such as when to collect blood from donors, how many units to collect, how to assign the workforce for collecting blood in donor drives, and how to plan the blood component testing process. Upon accurately forecasting the future supply using the methods discussed in this study, inventory models can then be developed to make decisions on the number of units to order and the time between orders.
Forecasting methods have some limitations. Forecast accuracy can be affected by various factors. If unknown variables cause some of the fluctuations in the data, forecasting becomes more difficult unless known explanatory variables account for the variations. Blood supply forecasts are vital for blood supply chain decisions, and they have to be updated as more reliable information becomes available. Hence, after appropriate forecasting methods are selected, it is important to continuously monitor the forecast accuracy.
Acknowledgments
We are grateful to Kuan-Tsou (Johnny) Lin, Director of Department of Operation, and Ming Chang Lin, Director of Hsin Chu Blood Center at the TBSF, for providing us with five years of daily blood supply data. We would also like to show our gratitude to Sabrina Lei Li, Director of Department of Public Relations, who provided important insight and expertise that greatly assisted the research. The first author is grateful to the US Department of Education for funding his PhD study through the Graduate Assistance in Areas of National Need (GAANN) fellowship.
Data Availability
The data used to support the findings of this study have not been made available because they are confidential to the case study blood center and hospitals.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
- 1. American Society of Hematology. Blood Basics. Washington, DC, USA: American Society of Hematology; 2018. http://www.hematology.org/Patients/Basics.
- 2. American Red Cross. Blood Types. Washington, DC, USA: American Red Cross; 2018. https://www.redcrossblood.org/learn-about-blood/blood-types.html.
- 3. Pierskalla W. P. Operations Research and Health Care: A Handbook of Methods and Applications. New York, NY, USA: Kluwer Academic Publishers; 2014. Supply chain management of blood banks; pp. 103–145.
- 4. Rajendran S., Ravindran A. R. Platelet ordering policies at hospitals using stochastic integer programming model and heuristic approaches to reduce wastage. Computers & Industrial Engineering. 2017;110:151–164. doi: 10.1016/j.cie.2017.05.021.
- 5. Rajendran S., Ravindran A. R. Inventory management of platelets along blood supply chain to minimize wastage and shortage. Computers & Industrial Engineering. 2019;130:714–730. doi: 10.1016/j.cie.2019.03.010.
- 6. Srinivas S., Ravindran A. R. Systematic review of opportunities to improve outpatient appointment systems. Proceedings of the IIE Annual Conference; 2017; Pittsburgh, PA, USA: Institute of Industrial and Systems Engineers (IISE); pp. 1697–1702.
- 7. Lestari F., Anwar U., Nugraha N., Azwar B. Forecasting demand in blood supply chain (case study on blood transfusion unit). Proceedings of the World Congress on Engineering; July 2017; London, UK.
- 8. Pereira A. Performance of time-series methods in forecasting the demand for red blood cells transfusion. Transfusion. 2004;44(5):739–746. doi: 10.1111/j.1537-2995.2004.03363.x.
- 9. Bosnes V., Aldrin M., Heier H. E. Predicting blood donor arrival. Transfusion. 2005;45(2):162–170. doi: 10.1111/j.1537-2995.2004.04167.x.
- 10. Fortsch S. M., Khapalova E. A. Reducing uncertainty in demand for blood. Operations Research for Health Care. 2016;9:16–28. doi: 10.1016/j.orhc.2016.02.002.
- 11. Khaldi R., Afia A. E., Chiheb R., Faizi R. Artificial Neural Network Based Approach for Blood Demand Forecasting: Fez Transfusion Blood Center Case. Rabat, Morocco: Mohammed V University; 2017.
- 12. Nahmias S. Production and Operations Analysis. 6th. Irwin, CA, USA: McGraw-Hill; 2008.
- 13. SAS. “Forecasting Process Details.” Retrieved from SAS/ETS 14.3 User’s Guide. Cary, NC, USA: SAS Institute Inc.; 2017. pp. 4150–4177.
- 14. Pankratz A. Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. New York, NY, USA: John Wiley & Sons; 1983.
- 15. Ravindran A., Warsing D. P. Supply Chain Engineering: Models and Applications. Boca Raton, FL, USA: CRC Press; 2013.
- 16. Hyndman R. J., Athanasopoulos G. Forecasting: Principles and Practice. 2nd. Australia: OTexts; May 2018.
- 17. Srinivas S., Rajendran S. Big Data Analytics Using Multiple Criteria Decision-Making Models. Boca Raton, FL, USA: CRC Press; 2017. A data-driven approach for multiobjective loan portfolio optimization using machine-learning algorithms and mathematical programming; pp. 175–210.
- 18. Chopra S., Meindl P. Supply Chain Management: Strategy, Planning, and Operation. 6th. Upper Saddle River, NJ, USA: Pearson-Prentice Hall; 2015.
- 19. Franses P. H. Averaging model forecasts and expert forecasts: why does it work. Interfaces. 2011;41(2):177–181. doi: 10.1287/inte.1100.0554.
- 20. Gahirwal M., Vijayalakshmi M. Inter Time Series Sales Forecasting. Chembur, India: Society’s Institute of Technology; 2013. https://arxiv.org/abs/1303.0117.
- 21. Rajendran S. Finite and infinite time horizon inventory models to minimize platelet wastage at hospitals. International Journal of Operations and Quantitative Management. 2016;22(2):119–140.
- 22. Taiwan Blood Services Foundation. Annual report. 2018. http://intra.blood.org.tw/upload/cf15c0af-f84d-4628-9f8f-9661b6cf34b8.pdf.
- 23. Blood Donation Services in Taiwan, 2019, http://intra.blood.org.tw/upload/ce805c8a-9f64-45e3-ab07-20673a31164c.pdf.