Journal of Healthcare Engineering. 2019 Sep 17;2019:6123745. doi: 10.1155/2019/6123745

Comparison of Time Series Methods and Machine Learning Algorithms for Forecasting Taiwan Blood Services Foundation's Blood Supply

Han Shih 1, Suchithra Rajendran 1,2
PMCID: PMC6766103  PMID: 31636879

Abstract

Purpose

The uncertainty in supply and the short shelf life of blood products have led to a substantial outdating of the collected donor blood. On the other hand, hospitals and blood centers experience severe blood shortage due to the very limited donor population. Therefore, the necessity to forecast the blood supply to minimize outdating as well as shortage is obvious. This study aims to efficiently forecast the supply of blood components at blood centers.

Methods

Two different types of forecasting techniques, time series and machine learning algorithms, are developed and the best performing method for the given case study is determined. Under the time series, we consider the Autoregressive (AUTOREG), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA, Seasonal Exponential Smoothing Method (ESM), and Holt-Winters models. Artificial neural network (ANN) and multiple regression are considered under the machine learning algorithms.

Results

We leverage five years' worth of historical blood supply data from the Taiwan Blood Services Foundation (TBSF) to conduct our study. On comparing the different techniques, we found that time series forecasting methods yield better results than machine learning algorithms. More specifically, the lowest values of the error measures are observed for the seasonal ESM and ARIMA models.

Conclusions

The models developed can act as a decision support system to administrators and pathologists at blood banks, blood donation centers, and hospitals to determine their inventory policy based on the estimated future blood supply. The forecasting models developed in this study can help healthcare managers to manage blood inventory control more efficiently, thus reducing blood shortage and blood wastage.

1. Introduction

Blood performs several important functions in the human body, such as transporting oxygen, carrying supplements to our cells, and disposing of ammonia, carbon dioxide, and other waste items. Four of the most critical elements are the red blood cells (RBC), white blood cells (WBC), plasma, and platelets [1]. The American Red Cross reported that over 35,000 RBC units, 10,000 plasma units, and 7,000 platelet units are required every day within the US [2]. Due to the short shelf life of blood components, hospitals and blood centers are faced with the challenge of maintaining appropriate inventory levels to avoid outdating and shortage.

Managing blood supply and demand is a core part of the healthcare supply chain system, as blood plays a crucial role in saving human lives. Blood supply forecasting is essential for making supply chain decisions, such as donor drive scheduling, vehicle routing policies, and inventory management, at blood centers and hospitals. Accurate forecasts of the timing and amount of future blood requests are considered key inputs to donor recruitment decision making and inventory control. It is important to gather data for several years to forecast monthly demand and to recognize seasonality in demand [3–6]. Lestari et al. [7] indicated that forecasting can capture the trend observed in the data and predict future demand for blood components.

2. Literature Review

Several studies have leveraged time series forecasting techniques for predicting the blood demand at hospitals and blood centers. For instance, Pereira [8] investigated and evaluated the autoregressive integrated moving average (ARIMA) model and Holt-Winters exponential smoothing model to predict monthly demand for red blood cell transfusions at a tertiary care hospital. While these methods relied on time series forecasting, Bosnes et al. [9] used a statistical regression technique to forecast blood donor arrivals at the blood bank of Oslo and found that the most important factors among 18 explanatory variables were donor age, time from making an appointment to arriving at the drive, contact methods used, number of prior donations, and donor no-show rate. Fortsch and Khapalova [10] introduced numerous practical methods to predict future demand for blood. Several forecasting models, including the naïve, exponential smoothing, moving average, and time series decomposition models, were tested using daily demand data from a blood center covering January 2006 to December 2012. They also compared the performance of these methods with an autoregressive moving average (ARMA) model. The results revealed that the ARMA forecasting model performed better for eight out of nine time series model settings. Similarly, Khaldi et al. [11] explored the capabilities of employing machine learning algorithms such as the artificial neural network (ANN) model to predict future demand for blood.

3. Materials and Methods

As discussed earlier, the study aims to develop effective forecasting methods to predict the supply of RBCs using two different techniques: time series forecasting methods and machine learning algorithms.

3.1. Time Series Forecasting

This section discusses the seven time series forecasting methods used in this study.

3.1.1. Autoregressive (AUTOREG) Model [12, 13]

The AUTOREG procedure estimates and forecasts linear regression models for time series data when the errors are autocorrelated. The autoregressive model regresses the value of the series at time t ($Y_t$) on the values during the time periods $t-1, t-2, \ldots, t-p$. The mathematical formula is expressed as follows:

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t, \tag{1}$$

where $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_p$ are the linear regression coefficients, $Y_t$ is the forecasted value at time t, and $\varepsilon_t$ is the random error variable and is generally assumed to have a normal distribution with mean 0 and variance $\sigma^2$ (i.e., normal $(0, \sigma^2)$).
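As a minimal sketch of equation (1) for the special case p = 1, the coefficients $\alpha_0$ and $\alpha_1$ can be estimated by ordinary least squares on pairs of consecutive observations. The function names and the noise-free toy series below are illustrative assumptions; this is not the AUTOREG procedure itself, and the numbers are not TBSF data.

```python
def fit_ar1(series):
    """Estimate alpha0, alpha1 in Y_t = alpha0 + alpha1 * Y_{t-1} + e_t by least squares."""
    x = series[:-1]   # lagged values Y_{t-1}
    y = series[1:]    # current values Y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    alpha1 = sxy / sxx
    alpha0 = mean_y - alpha1 * mean_x
    return alpha0, alpha1


def forecast_ar1(alpha0, alpha1, last_value, steps):
    """Iterate the fitted recursion forward, setting future errors to their mean of 0."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = alpha0 + alpha1 * value
        forecasts.append(value)
    return forecasts


# Hypothetical toy series generated exactly by Y_t = 10 + 0.5 * Y_{t-1} (no noise).
toy = [100.0]
for _ in range(9):
    toy.append(10 + 0.5 * toy[-1])

alpha0, alpha1 = fit_ar1(toy)
next_two = forecast_ar1(alpha0, alpha1, toy[-1], steps=2)
```

Because the toy series has no error term, the fit recovers the generating coefficients; with real data the estimates would only approximate them, and a higher order p would require a multivariate least-squares fit.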

3.1.2. Autoregressive Moving Average (ARMA) Models [12–14]

The ARMA model is one of the basic tools in time series modeling. Suppose the time series $Y_1, Y_2, \ldots, Y_t$ is a stationary stochastic process. The expression ARMA (p, q) then represents the model with autoregressive order p and moving-average order q. This model is a combination of the AR (p) and MA (q) models, where AR (p) is written as $Y_t = a + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$ and MA (q) is written as $Y_t = b - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} + \varepsilon_t$.

As in the AUTOREG model, $Y_t$ is the observation value at time t. The ARMA (p, q) process is generally written as follows:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} - \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t, \tag{2}$$

where a, b, and c are constants, $\varepsilon_t$ is the random error variable and is generally assumed to have a normal distribution with mean 0 and variance $\sigma^2$; $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive coefficients to be estimated, and $\theta_1, \theta_2, \ldots, \theta_q$ are the moving average coefficients to be estimated.
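The recursion in equation (2) can be illustrated with a short sketch that generates an ARMA path from a given sequence of error terms. The function name, the coefficient values, and the convention that error terms before the first step equal 0 are all assumptions made for illustration; estimating the coefficients from data is a separate step not shown here.

```python
def arma_path(c, phi, theta, shocks, history):
    """Generate Y_t = c + sum(phi_i * Y_{t-i}) - sum(theta_i * eps_{t-i}) + eps_t.

    phi and theta are coefficient lists of length p and q. history holds at
    least the p most recent observed values (oldest first). Error terms that
    precede the first generated step are assumed to be 0.
    """
    values = list(history)
    past_shocks = [0.0] * len(theta)
    path = []
    for eps in shocks:
        ar_part = sum(phi[i] * values[-1 - i] for i in range(len(phi)))
        ma_part = sum(theta[i] * past_shocks[-1 - i] for i in range(len(theta)))
        y = c + ar_part - ma_part + eps
        values.append(y)
        past_shocks.append(eps)
        path.append(y)
    return path


# Hypothetical ARMA(1, 1) example with hand-picked error terms.
path = arma_path(c=2.0, phi=[0.5], theta=[0.4], shocks=[1.0, 0.0, -1.0], history=[4.0])
```

Note the minus sign on the moving-average sum, which follows the sign convention of the MA (q) expression in this section.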

3.1.3. Autoregressive Integrated Moving Average (ARIMA) Model [12–14]

The ARIMA (autoregressive integrated moving average) approach was popularized by the Box–Jenkins models [11]. The ARIMA procedure predicts a future response value in a time series as a linear combination of its own past values, past errors, and current and past values of other time series (predictor time series).

When the time series is nonstationary, the above ARMA (p, q) model can be extended by differencing, which is defined as $Y_t - Y_{t-1} = (1 - B)Y_t = \nabla Y_t$, where t is the time index, $Y_t$ is the value of the time series $\{Y_t : 1 \le t \le n\}$ at time t, and B is the backward shift operator, which shifts the data back one period (i.e., $BY_t = Y_{t-1}$).
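A small sketch may help make the differencing operator concrete: applying $(1 - B)$ removes a linear trend, and cumulatively summing the differences from a known starting level undoes the operation, which is how forecasts of a differenced series are mapped back to the original scale. The helper names and the toy series are hypothetical.

```python
def difference(series):
    """Apply (1 - B): return Y_t - Y_{t-1} for t = 2, ..., n."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def undifference(first_value, diffs):
    """Invert the differencing by cumulatively summing from a known starting level."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


trend = [100, 104, 108, 112, 116]   # toy nonstationary series with a linear trend
diffs = difference(trend)            # constant differences, so the trend is removed
restored = undifference(trend[0], diffs)
```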

3.1.4. Seasonal ARIMA Model [12, 13, 15, 16]

Seasonal ARIMA model is written with the general expression ARIMA (p, d, q)(P, D, Q)s. The symbol p is the order of the nonseasonal autoregressive component, d  is the order of the differencing, q is the order of the nonseasonal moving-average process, P is the order of the seasonal autoregressive part, D  is the order of the seasonal differencing, Q  is the order of the seasonal moving-average process, and s is the duration of the seasonal cycle.

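Let $Y_t$ be the value of a dependent time series $\{Y_t : 1 \le t \le n\}$ at time t; then the mathematical formula for the seasonal ARIMA model is expressed as follows:

$$(1 - B)^d (1 - B^s)^D Y_t = \mu + \frac{\theta(B)\,\theta_s(B^s)}{\phi(B)\,\phi_s(B^s)}\,\varepsilon_t, \tag{3}$$

where $\mu$ is the constant mean, $B^s$ is the seasonal backward shift operator, $\phi(B)$ and $\theta(B)$ are the nonseasonal autoregressive and moving-average components, $\phi_s(B^s) = 1 - \phi_{s,1}B^s - \cdots - \phi_{s,P}B^{sP}$ is the seasonal autoregressive component, and $\theta_s(B^s) = 1 - \theta_{s,1}B^s - \cdots - \theta_{s,Q}B^{sQ}$ is the seasonal moving-average component.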
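The left-hand side of equation (3) can be sketched on its own: D seasonal differences at lag s followed by d ordinary differences. The example below uses s = 7 to mimic a weekly cycle in daily data; the function names, the day-of-week pattern, and the trend slope are hypothetical values chosen for illustration, not TBSF figures.

```python
def lag_difference(series, lag):
    """Apply (1 - B^lag): return Y_t - Y_{t-lag} wherever both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def seasonal_arima_differencing(series, d, D, s):
    """Apply (1 - B)^d (1 - B^s)^D to the series."""
    out = list(series)
    for _ in range(D):
        out = lag_difference(out, s)   # seasonal differencing at lag s
    for _ in range(d):
        out = lag_difference(out, 1)   # ordinary (nonseasonal) differencing
    return out


# Toy daily series: a linear trend plus a repeating 7-day pattern.
weekly_pattern = [5, 40, 20, 25, 30, 28, 4]
toy = [2 * t + weekly_pattern[t % 7] for t in range(28)]

after_seasonal = seasonal_arima_differencing(toy, d=0, D=1, s=7)
after_both = seasonal_arima_differencing(toy, d=1, D=1, s=7)
```

In this toy case the seasonal difference removes the weekly pattern and leaves a constant, and the additional ordinary difference removes that constant as well.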

3.1.5. Seasonal Exponential Smoothing Model [12, 13, 15, 16]

In the seasonal exponential smoothing method (ESM), the forecast value at time t + k ($Y_{t+k}$) is given by

$$Y_{t+k} = L_t + S_{t-p+k}. \tag{4}$$

The smoothing equations are as follows:

$$L_t = \alpha (X_t - S_{t-p}) + (1 - \alpha) L_{t-1}, \tag{5}$$
$$S_t = \gamma (X_t - L_t) + (1 - \gamma) S_{t-p}, \tag{6}$$

where $X_t$ is the given observation at time t, $\alpha$ and $\gamma$ are the level and seasonal smoothing parameters, respectively, $L_t$ is the estimated level component at time t, $S_t$ is the estimated seasonal component at time t, and p is the number of periods after which the seasonal cycle repeats itself.
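Equations (4) to (6) can be sketched as a short loop. The initialization used here (level equal to the mean of the first cycle, seasonal components equal to the deviations from that mean) is an assumption for illustration, since the text does not specify one; the function name and the toy data are likewise hypothetical.

```python
def seasonal_esm(observations, period, alpha, gamma, horizon):
    """Run the smoothing equations (5)-(6) and forecast with equation (4)."""
    first_cycle = observations[:period]
    # Assumed initialization (not from the paper).
    level = sum(first_cycle) / period
    seasonals = [x - level for x in first_cycle]

    for t in range(period, len(observations)):
        x = observations[t]
        s_prev = seasonals[t - period]                                    # S_{t-p}
        new_level = alpha * (x - s_prev) + (1 - alpha) * level            # eq. (5)
        seasonals.append(gamma * (x - new_level) + (1 - gamma) * s_prev)  # eq. (6)
        level = new_level

    n = len(observations)
    # eq. (4): the forecast for t + k reuses the latest seasonal estimate for
    # that position in the cycle; the modulo extends this beyond one cycle.
    return [level + seasonals[n - period + (k - 1) % period]
            for k in range(1, horizon + 1)]


# Toy series with a repeating 3-period pattern and no trend.
toy = [10.0, 50.0, 30.0] * 4
forecast = seasonal_esm(toy, period=3, alpha=0.3, gamma=0.2, horizon=3)
```

On a perfectly periodic series such as this one, the forecast reproduces the repeating pattern, which is a convenient sanity check before applying the method to noisy data.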

3.1.6. Multiplicative Holt-Winters Model [12, 13, 15, 16]

The Holt-Winters model, also known as the triple exponential smoothing, applies three types of exponential smoothing to the time series—value, trend, and seasonality. The model equation for the Holt-Winters method can be either additive or multiplicative model. In this section, we present the multiplicative Holt-Winters model, whereas Section 3.1.7 presents the additive model.

For a time series with a trend and a seasonal component, the multiplicative Holt-Winters forecast at time t + k ($Y_{t+k}$) is given by the following equation:

$$Y_{t+k} = (L_t + k T_t)\, SI_{t+k-p}. \tag{7}$$

The smoothing equations are given as follows:

$$L_t = \alpha \frac{X_t}{SI_{t-p}} + (1 - \alpha)(L_{t-1} + T_{t-1}), \tag{8}$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}, \tag{9}$$
$$SI_t = \gamma \frac{X_t}{L_t} + (1 - \gamma)\, SI_{t-p}, \tag{10}$$

where $X_t$ is the given observation at time t, $\alpha$, $\beta$, and $\gamma$ are the level, trend, and seasonal smoothing constants, respectively, $L_t$ is the estimated level at time t, $T_t$ is the estimated trend at time t, $SI_t$ is the seasonality index at time t, and p is the number of periods after which the seasonal cycle repeats itself.
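The multiplicative updates in equations (7) to (10) can be sketched as follows. The initialization (level from the first-cycle mean, trend from the change between the first two cycle means, seasonality indices from the first cycle) is a common heuristic assumed here for illustration and is not taken from the paper; the function name and toy data are hypothetical.

```python
def holt_winters_multiplicative(observations, period, alpha, beta, gamma, horizon):
    """Run equations (8)-(10) over the data and forecast with equation (7)."""
    if len(observations) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    first = observations[:period]
    second = observations[period:2 * period]
    # Assumed initialization (not from the paper).
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2   # per-period change in cycle mean
    indices = [x / level for x in first]

    for t in range(period, len(observations)):
        x = observations[t]
        si_prev = indices[t - period]                                        # SI_{t-p}
        new_level = alpha * (x / si_prev) + (1 - alpha) * (level + trend)    # eq. (8)
        trend = beta * (new_level - level) + (1 - beta) * trend              # eq. (9)
        indices.append(gamma * (x / new_level) + (1 - gamma) * si_prev)      # eq. (10)
        level = new_level

    n = len(observations)
    return [(level + k * trend) * indices[n - period + (k - 1) % period]     # eq. (7)
            for k in range(1, horizon + 1)]


# Toy series: constant level of 100 scaled by indices 0.5, 1.5, 1.0.
toy = [50.0, 150.0, 100.0] * 4
forecast = holt_winters_multiplicative(toy, period=3, alpha=0.3, beta=0.1,
                                       gamma=0.2, horizon=3)
```

The seasonal effect enters through division and multiplication rather than subtraction and addition, which is the defining difference from the additive form.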

3.1.7. Additive Holt-Winters Model [12, 13, 15, 16]

In this section, we present the additive Holt-Winters Model.

For the additive model, the forecasted supply estimate for time t+k is given by the following equation:

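$$Y_{t+k} = L_t + k T_t + S_{t-p+k}. \tag{11}$$

The estimates of level, trend, and seasonal factors for additive model equations are given using the following equations:

$$L_t = \alpha (Y_t - S_{t-p}) + (1 - \alpha)(L_{t-1} + T_{t-1}), \tag{12}$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}, \tag{13}$$
$$S_t = \gamma (Y_t - L_t) + (1 - \gamma) S_{t-p}. \tag{14}$$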
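A single update step of the additive model, written out with hand-checkable numbers, may clarify how equations (11) to (14) interact. All input values below (observation, previous level, trend, seasonal factors, and smoothing constants) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def additive_step(y, level_prev, trend_prev, seasonal_prev, alpha, beta, gamma):
    """Apply equations (12)-(14) once and return the new (level, trend, seasonal)."""
    level = alpha * (y - seasonal_prev) + (1 - alpha) * (level_prev + trend_prev)  # eq. (12)
    trend = beta * (level - level_prev) + (1 - beta) * trend_prev                  # eq. (13)
    seasonal = gamma * (y - level) + (1 - gamma) * seasonal_prev                   # eq. (14)
    return level, trend, seasonal


def additive_forecast(level, trend, seasonal_for_target, k):
    """Equation (11): seasonal_for_target plays the role of S_{t-p+k}."""
    return level + k * trend + seasonal_for_target


# Hypothetical state: previous level 100, trend 2, seasonal factor 10 one cycle ago.
level, trend, seasonal = additive_step(
    y=120.0, level_prev=100.0, trend_prev=2.0, seasonal_prev=10.0,
    alpha=0.5, beta=0.5, gamma=0.5,
)
one_step = additive_forecast(level, trend, seasonal_for_target=8.0, k=1)
```

With these inputs the level moves to 106, the trend to 4, and the seasonal factor to 12, so the one-step forecast with an assumed seasonal factor of 8 is 118.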

3.2. Machine Learning Algorithms

Machine learning is a field that explores algorithms that analyze a set of data, learn from the insights gathered, and make predictions on new data [17]. For blood supply forecasting, we use two widely used machine learning techniques: artificial neural networks and regression.

3.2.1. Artificial Neural Networks (ANN)

ANN is a supervised learning method that is an adaptation of the biological neural network. The network consists of several nodes that are distributed across numerous layers, and each layer is connected to its previous and subsequent layers within the network [17]. These interconnected nodes process the information that they receive from the nodes of the previous layer and pass the result to the next layer through the sigmoid function. ANNs are particularly useful for modeling complex relationships in high-dimensional data or where the relationship between the input and output variables is not easy to understand [17].
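The layer-to-layer flow described above can be sketched as a forward pass through one hidden layer with sigmoid activations and a linear output node. The architecture, weights, and inputs are hypothetical, since the text does not report the network configuration used in the study, and training the weights is not shown.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Each node takes a weighted sum of the previous layer's outputs plus a bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """One hidden sigmoid layer followed by a single linear output node."""
    hidden = dense_layer(inputs, hidden_w, hidden_b, sigmoid)
    return dense_layer(hidden, out_w, out_b, lambda z: z)[0]


# Hypothetical network: 3 inputs, 2 hidden nodes, 1 output.
prediction = forward(
    inputs=[1.0, 0.0, 0.0],
    hidden_w=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],   # zero weights give sigmoid(0) = 0.5
    hidden_b=[0.0, 0.0],
    out_w=[[2.0, 4.0]],
    out_b=[1.0],
)
```

A linear output node is assumed here because the target (blood supply in units) is a continuous quantity rather than a probability.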

3.2.2. Multiple Regression

Multiple regression addresses a different class of machine learning problem from classification: it predicts a continuous value of a variable rather than a class [17]. Linear regression with ordinary least squares is one of the classic machine learning algorithms in this domain. The mathematical formula for the regression model is represented as follows:

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon, \tag{15}$$

where $Y$ is the response variable, $X_i$ is the $i$th independent variable, $\beta_0$ is the intercept, $\beta_i$ is the slope coefficient of $X_i$ (both $\beta_0$ and $\beta_i$ are unknown coefficients to be estimated by the model), and $\varepsilon$ is the error variable.
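As a rough sketch of how the coefficients in equation (15) can be estimated, the code below solves the ordinary least squares normal equations with plain Gaussian elimination and reports the R-square statistic. The helper names and the noise-free toy data are assumptions for illustration; the predictors actually used in the study are not stated in the text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Return [beta0, beta1, ..., betan] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]   # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(actual, fitted):
    """Share of the variance in the response explained by the fitted values."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


# Toy data generated exactly by Y = 3 + 2 * X1 - 1 * X2 (no error term).
rows = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (3.0, 2.0), (4.0, 0.0)]
y = [5.0, 2.0, 6.0, 7.0, 11.0]

beta = fit_ols(rows, y)
fitted = [sum(b * v for b, v in zip(beta, [1.0] + list(r))) for r in rows]
r2 = r_squared(y, fitted)
```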

3.3. Evaluation of the Different Methods

We use four different measures of forecast errors for evaluating the model performance and the accuracy of the methods; they are MAE, MSE, BIAS, and MAPE [12, 15, 18].

Assume $X_1, X_2, \ldots, X_n$ are the actual data and $F_1, F_2, \ldots, F_n$ are the forecasted data; then the n values of forecast errors, $e_1, e_2, \ldots, e_n$, are given by $e_1 = F_1 - X_1$, $e_2 = F_2 - X_2$, …, $e_n = F_n - X_n$.

  (a) Mean absolute error (MAE): it measures the average magnitude of the forecast errors, where all individual errors have equal weights:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i|. \tag{16}$$

  (b) Mean squared error (MSE): it also measures the magnitude of the forecast errors, and larger errors get penalized more due to squaring:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} e_i^2. \tag{17}$$

  (c) BIAS: this is an indication of whether the forecast is overestimating or underestimating the actual supply over the forecast horizon:

$$\mathrm{BIAS} = \sum_{i=1}^{n} e_i. \tag{18}$$

  (d) Mean absolute percentage error (MAPE): it measures the relative magnitude of forecasting errors in percentage terms:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{e_i}{X_i} \right| \times 100. \tag{19}$$
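The four measures in equations (16) to (19) translate directly into code. As an illustration, the sketch below applies them to the seven Seasonalized ARIMA forecasts and the actual supply reported in Table 4 for the first week of January 2018; the function names are hypothetical.

```python
def forecast_errors(forecast, actual):
    """e_i = F_i - X_i, so positive errors mean the forecast was too high."""
    return [f - x for f, x in zip(forecast, actual)]


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)            # eq. (16)


def mse(errors):
    return sum(e ** 2 for e in errors) / len(errors)            # eq. (17)


def bias(errors):
    return sum(errors)                                          # eq. (18)


def mape(errors, actual):
    return sum(abs(e / x) for e, x in zip(errors, actual)) / len(errors) * 100  # eq. (19)


# Values copied from Table 4 (1/1/2018 to 1/7/2018).
seasonal_arima = [1491, 899, 882, 1301, 1242, 200, 208]
actual_supply = [979, 1223, 972, 1354, 721, 263, 203]

errors = forecast_errors(seasonal_arima, actual_supply)
week_mae = mae(errors)
week_mse = mse(errors)
week_bias = bias(errors)
week_mape = mape(errors, actual_supply)
```

For this single week the sketch gives MAE = 224, MSE = 93,352, and BIAS = 508. These values describe only the seven days in Table 4 and are not comparable to the figures in Table 2, which summarize the authors' own evaluation.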

4. Results

4.1. Data Collection

The historical supply data for five years from 2013 to 2017 are first gathered from the health records. The summary statistics are given in Table 1.

Table 1.

2013–2017 TBSF weekly supply summary statistics.

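| Year | Day | Average | Min. | Max. | Standard deviation | Coefficient of supply variation (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 2013 | Sunday | 188 | 32 | 461 | 84 | 44.68 |
| 2013 | Monday | 1,523 | 173 | 1,928 | 287 | 18.84 |
| 2013 | Tuesday | 820 | 154 | 1,558 | 200 | 24.39 |
| 2013 | Wednesday | 961 | 327 | 1,606 | 254 | 26.43 |
| 2013 | Thursday | 1,127 | 299 | 1,596 | 282 | 25.02 |
| 2013 | Friday | 1,039 | 458 | 1,956 | 263 | 25.31 |
| 2013 | Saturday | 135 | 43 | 462 | 68 | 50.37 |
| 2014 | Sunday | 174 | 31 | 456 | 82 | 47.13 |
| 2014 | Monday | 1,525 | 688 | 2,324 | 351 | 23.02 |
| 2014 | Tuesday | 858 | 327 | 1,935 | 253 | 29.49 |
| 2014 | Wednesday | 857 | 168 | 1,474 | 210 | 24.50 |
| 2014 | Thursday | 1,238 | 80 | 2,048 | 304 | 24.56 |
| 2014 | Friday | 1,013 | 84 | 2,027 | 314 | 31.00 |
| 2014 | Saturday | 138 | 31 | 587 | 103 | 74.64 |
| 2015 | Sunday | 200 | 39 | 531 | 126 | 63.00 |
| 2015 | Monday | 1,504 | 850 | 2,636 | 303 | 20.15 |
| 2015 | Tuesday | 850 | 495 | 1,421 | 200 | 23.53 |
| 2015 | Wednesday | 855 | 1 | 1,461 | 252 | 29.47 |
| 2015 | Thursday | 1,381 | 139 | 1,923 | 309 | 22.38 |
| 2015 | Friday | 1,025 | 197 | 1,450 | 253 | 24.68 |
| 2015 | Saturday | 164 | 31 | 660 | 122 | 74.39 |
| 2016 | Sunday | 204 | 31 | 542 | 99 | 48.53 |
| 2016 | Monday | 1,497 | 162 | 2,073 | 331 | 22.11 |
| 2016 | Tuesday | 855 | 372 | 1,572 | 239 | 27.95 |
| 2016 | Wednesday | 862 | 146 | 1,264 | 199 | 23.09 |
| 2016 | Thursday | 1,439 | 547 | 2,643 | 319 | 22.17 |
| 2016 | Friday | 1,060 | 81 | 2,058 | 301 | 28.40 |
| 2016 | Saturday | 146 | 55 | 490 | 69 | 47.26 |
| 2017 | Sunday | 201 | 50 | 522 | 116 | 57.71 |
| 2017 | Monday | 1,445 | 212 | 1,964 | 324 | 22.42 |
| 2017 | Tuesday | 888 | 355 | 1,508 | 238 | 26.80 |
| 2017 | Wednesday | 888 | 272 | 1,656 | 224 | 25.23 |
| 2017 | Thursday | 1,383 | 502 | 1,846 | 273 | 19.74 |
| 2017 | Friday | 1,159 | 57 | 2,061 | 312 | 26.92 |
| 2017 | Saturday | 192 | 41 | 679 | 100 | 52.08 |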

From Table 1, it is observed that the average blood supply for each day of the week is steady from year to year. Also, we can see that the Monday supply is very high, the Thursday and Friday supplies are quite high, the Tuesday and Wednesday supplies are moderate, and the Saturday and Sunday supplies are significantly lower.
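The last column of Table 1 appears consistent with the usual definition of the coefficient of variation, that is, the standard deviation divided by the average and expressed as a percentage. The short sketch below checks this for three rows of the table; the function name is hypothetical.

```python
def coefficient_of_variation(average, standard_deviation):
    """Standard deviation as a percentage of the average, rounded to two decimals."""
    return round(standard_deviation / average * 100, 2)


# (average, standard deviation) pairs copied from Table 1.
cv_2013_sunday = coefficient_of_variation(188, 84)
cv_2013_monday = coefficient_of_variation(1523, 287)
cv_2014_saturday = coefficient_of_variation(138, 103)
```

These reproduce the tabulated values of 44.68, 18.84, and 74.64, which shows that the weekend supply, although much smaller on average, is far more variable in relative terms than the weekday supply.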

4.2. Time Series Forecasting Results

After running the seven different time series models discussed in Section 3.1 and obtaining the forecasts, we evaluate them using the error measures given in Section 3.3, and the results are presented in Table 2. The Seasonal ARIMA Model, Seasonal Exponential Smoothing Method, and Multiplicative Holt-Winters Model yield minimal error measures. Hence, we conclude that, among the time series methods, these three models are the best at forecasting the blood supply for the case study data under consideration.

Table 2.

Error measures obtained under the seven time series models.

| Error measure | AUTOREG | ARMA | Basic ARIMA | Seasonalized ARIMA | Seasonalized ESM | Multiplicative Holt-Winters | Additive Holt-Winters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MAE | 215 | 449 | 600 | 160 | 158 | 159 | 159 |
| MSE | 88,031 | 288,002 | 577,197 | 57,235 | 57,111 | 57,111 | 57,189 |
| BIAS | −383 | −20,578 | 754 | −5,575 | −7,338 | −8,507 | −15,056 |
| MAPE | 94.50 | 227 | 224 | 80 | 81 | 81 | 80 |

4.3. Machine Learning Algorithm Results

The performance of the machine learning algorithms is compared in Table 3. For this particular dataset, the results show that regression is a better predictor of the blood supply than the ANN. Nevertheless, the explanatory power of the regression model is quite low ($R^2$ = 63.71%).

Table 3.

Performance of machine learning algorithms.

| Statistics of fit | Artificial neural network | Regression |
| --- | --- | --- |
| R-square | 58.59% | 63.71% |

Therefore, regression is used to predict the supply for the first week of January 2018. Table 4 summarizes these predictions alongside those of the best performing time series methods.

Table 4.

Blood supply predictions using the best performing time series and machine learning methods.

| Method | 1/1/2018 | 1/2/2018 | 1/3/2018 | 1/4/2018 | 1/5/2018 | 1/6/2018 | 1/7/2018 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Seasonalized ARIMA | 1491 | 899 | 882 | 1301 | 1242 | 200 | 208 |
| Seasonalized ESM | 1480 | 901 | 883 | 1314 | 1232 | 200 | 210 |
| Multiplicative Holt-Winters | 1490 | 906 | 887 | 1308 | 1251 | 202 | 210 |
| Regression | 1458 | 1269 | 1088 | 951 | 779 | 589 | 410 |
| Actual supply | 979 | 1223 | 972 | 1354 | 721 | 263 | 203 |

From the results, we can infer that no single method predicts the supply accurately, and hence we recommend using the average value of the forecasts obtained under these four methods for estimating the future supply [15, 19–21].
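The recommended combination can be sketched as a simple equal-weight average of the four forecasts for each day. The forecast values are copied from Table 4; the variable names and the choice of equal weights are assumptions, since the text does not specify a weighting scheme.

```python
# Forecasts for 1/1/2018 to 1/7/2018, copied from Table 4.
forecasts = {
    "Seasonalized ARIMA": [1491, 899, 882, 1301, 1242, 200, 208],
    "Seasonalized ESM": [1480, 901, 883, 1314, 1232, 200, 210],
    "Multiplicative Holt-Winters": [1490, 906, 887, 1308, 1251, 202, 210],
    "Regression": [1458, 1269, 1088, 951, 779, 589, 410],
}

# Equal-weight average across methods for each day (assumed weighting).
combined = [sum(day) / len(day) for day in zip(*forecasts.values())]
```

For example, the combined forecast for 1/1/2018 is (1491 + 1480 + 1490 + 1458) / 4 = 1479.75 units.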

5. Discussion

This study focusses on predicting the supply of red blood cells for the Taiwan Blood Services Foundation (TBSF) [22], a nongovernmental and nonprofit organization. So far, more than seven million citizens have donated blood in Taiwan through this foundation (which accounts for over 25% of the total population of Taiwan) [23]. Currently, blood centers at TBSF do not have a proper blood forecasting system, and some blood centers face blood shortage problems as a result of the lack of accurate forecasting of blood supply. This paper focusses on developing a blood supply forecasting decision support tool for TBSF using time series and machine learning algorithms. Accurate forecasting models will enable TBSF to make good blood supply chain management planning decisions, such as when to collect blood from donors, how many units to collect, how to assign the workforce for collecting blood in donor drives, and how to plan the blood component testing process. Once the future supply has been accurately forecast using the methods discussed in this study, inventory models can be developed to make decisions on the number of units to order and the time between orders.

There are some limitations on forecasting methods. Accuracy of forecasting could be affected by various factors. If there are some unknown variable(s) that could cause some of the fluctuations in the data, then it will be more difficult to forecast unless there are known explanatory variable(s) accounting for the variations. Blood supply forecasting is vital for blood supply chain decisions, and they have to be updated as more reliable information becomes available. Hence, after appropriate forecasting methods are selected, it is important to continuously monitor the forecast accuracy.

Acknowledgments

We are grateful to Kuan-Tsou (Johnny) Lin, Director of Department of Operation, and Ming Chang Lin, Director of Hsin Chu Blood Center at the TBSF, for providing us with five years of daily blood supply data. We would also like to show our gratitude to Sabrina Lei Li, Director of Department of Public Relations, who provided important insight and expertise that greatly assisted the research. The first author is grateful to the US Department of Education for funding his PhD study through the Graduate Assistance in Areas of National Need (GAANN) fellowship.

Data Availability

The data used to support the findings of this study have not been made available because they are confidential to the case study blood center and hospitals.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

  1. American Society of Hematology. Blood Basics. Washington, DC, USA: American Society of Hematology; 2018. http://www.hematology.org/Patients/Basics.
  2. American Red Cross. Blood Types. Washington, DC, USA: American Red Cross; 2018. https://www.redcrossblood.org/learn-about-blood/blood-types.html.
  3. Pierskalla W. P. Operations Research and Health Care: A Handbook of Methods and Applications. New York, NY, USA: Kluwer Academic Publishers; 2014. Supply chain management of blood banks; pp. 103–145.
  4. Rajendran S., Ravindran A. R. Platelet ordering policies at hospitals using stochastic integer programming model and heuristic approaches to reduce wastage. Computers & Industrial Engineering. 2017;110:151–164. doi: 10.1016/j.cie.2017.05.021.
  5. Rajendran S., Ravindran A. R. Inventory management of platelets along blood supply chain to minimize wastage and shortage. Computers & Industrial Engineering. 2019;130:714–730. doi: 10.1016/j.cie.2019.03.010.
  6. Srinivas S., Ravindran A. R. Systematic review of opportunities to improve outpatient appointment systems. Proceedings of the IIE Annual Conference; 2017; Pittsburgh, PA, USA: Institute of Industrial and Systems Engineers (IISE); pp. 1697–1702.
  7. Lestari F., Anwar U., Nugraha N., Azwar B. Forecasting demand in blood supply chain (case study on blood transfusion unit). Proceedings of the World Congress on Engineering; July 2017; London, UK.
  8. Pereira A. Performance of time-series methods in forecasting the demand for red blood cells transfusion. Transfusion. 2004;44(5):739–746. doi: 10.1111/j.1537-2995.2004.03363.x.
  9. Bosnes V., Aldrin M., Heier H. E. Predicting blood donor arrival. Transfusion. 2005;45(2):162–170. doi: 10.1111/j.1537-2995.2004.04167.x.
  10. Fortsch S. M., Khapalova E. A. Reducing uncertainty in demand for blood. Operations Research for Health Care. 2016;9:16–28. doi: 10.1016/j.orhc.2016.02.002.
  11. Khaldi R., Afia A. E., Chiheb R., Faizi R. Artificial Neural Network Based Approach for Blood Demand Forecasting: Fez Transfusion Blood Center Case. Rabat, Morocco: Mohammed V University; 2017.
  12. Nahmias S. Production and Operations Analysis. 6th. Irwin, CA, USA: McGraw-Hill; 2008.
  13. SAS. "Forecasting Process Details." Retrieved from SAS/ETS 14.3 User's Guide. Cary, NC, USA: SAS Institute Inc.; 2017. pp. 4150–4177.
  14. Pankratz A. Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. New York, NY, USA: John Wiley & Sons; 1983.
  15. Ravindran A., Warsing D. P. Supply Chain Engineering: Models and Applications. Boca Raton, FL, USA: CRC Press; 2013.
  16. Hyndman R. J., Athanasopoulos G. Forecasting: Principles and Practice. 2nd. Australia: OTexts; May 2018.
  17. Srinivas S., Rajendran S. Big Data Analytics Using Multiple Criteria Decision-Making Models. Boca Raton, FL, USA: CRC Press; 2017. A data-driven approach for multiobjective loan portfolio optimization using machine-learning algorithms and mathematical programming; pp. 175–210.
  18. Chopra S., Meindl P. Supply Chain Management: Strategy, Planning, and Operation. 6th. Upper Saddle River, NJ, USA: Pearson-Prentice Hall; 2015.
  19. Frances P. H. Averaging model forecasts and expert forecasts: why does it work. Interface. 2011;41(2):177–181. doi: 10.1287/inte.1100.0554.
  20. Gahirwal M., Vijayalakshmi M. Inter Time Series Sales Forecasting. Chembur, India: Society’s Institute of Technology; 2013. https://arxiv.org/abs/1303.0117.
  21. Rajendran S. Finite and infinite time horizon inventory models to minimize platelet wastage at hospitals. International Journal of Operations and Quantitative Management. 2016;22(2):119–140.
  22. Taiwan Blood Services Foundation. Annual report. 2018. http://intra.blood.org.tw/upload/cf15c0af-f84d-4628-9f8f-9661b6cf34b8.pdf.
  23. Blood Donation Services in Taiwan. 2019. http://intra.blood.org.tw/upload/ce805c8a-9f64-45e3-ab07-20673a31164c.pdf.


Articles from Journal of Healthcare Engineering are provided here courtesy of Wiley
