Abstract
Background
Emergency departments (ED) are struggling with an increasing influx of patients. One method to help departments prepare for surges in arrivals is time series forecasting (TSF). The aim of this study was to create an overview of the current literature to help guide future research. First, we aimed to identify the external variables used. Second, we aimed to identify the TSF methods used and their performance.
Methods
We included model development or validation studies that forecast patient arrivals to the ED and used external variables. We included studies on any forecast horizon and any forecasting methodology. The literature search was conducted in the PubMed, Scopus, Web of Science, CINAHL and Embase databases. We extracted data on the methods and variables used. The study is reported according to the TRIPOD-SRMA guidelines. Risk of bias was assessed using PROBAST and the authors’ own domains.
Results
We included 30 studies. Our analysis identified 10 groups of external variables used in models; weather and calendar variables were the most common. We found 3 families of TSF methods. However, none of the studies followed reporting guidelines, and model code was seldom published.
Conclusions
Our results highlight the need for better reporting of model development and validation, so that the role of the external variables used in the models can be understood, as well as for more uniform reporting of results between research groups and for external validation of the developed models. Based on our findings, we also suggest a future research agenda for this field.
Clinical trial number
Not applicable.
Supplementary information
The online version contains supplementary material available at 10.1186/s12873-025-01250-8.
Graphical Abstract
Keywords: Systematic review, Forecasting, Emergency service, Artificial intelligence, Statistical models, External data, TRIPOD, PROBAST
Background
Emergency departments (ED) are the backbone of acute patient care (and are becoming increasingly prominent in chronic care) and, alongside primary care and its referrals, one of the main entry points into the healthcare system [1, 2]. EDs around the globe are experiencing overcrowding and staff shortages. ED staff in the Western world have voiced concerns that system resources are mismatched to patient influx during peak hours [3]. This negatively affects patient care and safety as well as staff working conditions [3].
A time series (TS) is sequential data collected over time while observing a phenomenon [4]. Time series forecasting (TSF) is the attempt to predict the future behaviour of the observed phenomenon, in the near to far future, based on its history and other variables [4]. TSF was popularized during the COVID-19 pandemic, when it was used to predict the pandemic trajectory [5]. Beyond medicine, TSF is also used in retail, energy, logistics and many other industries that face the similar problem of balancing supply and demand [6]. With TSF in the medical field, we can try to anticipate the number of patients who will become infected, show symptoms or seek medical care within a set horizon in the future [7].
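To make the concept concrete, the sketch below fits a classical statistical model to a univariate daily arrival series and forecasts the coming week. It is illustrative only: the synthetic counts, the model order and the 7-day horizon are our own assumptions, not taken from any cited study.

```python
# Minimal illustrative TSF sketch: forecast daily ED arrival counts 7 days
# ahead with a classical ARIMA model (synthetic data, assumed parameters).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=365, freq="D")
# Synthetic arrivals: a baseline, weekly seasonality and noise.
arrivals = 120 + 15 * np.sin(2 * np.pi * days.dayofweek / 7) + rng.normal(0, 8, 365)
series = pd.Series(arrivals, index=days)

model = ARIMA(series, order=(1, 0, 1)).fit()
forecast = model.forecast(steps=7)  # predict the next 7 days
print(forecast.round(1))
```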
With new advances coming from other fields of TSF, the research field is growing and new technologies are emerging [2, 8, 9]. EDs routinely collect data that can be used for TSF, making them a good starting point before proceeding to other departments. A major problem is the lack of uniform terminology (Box 1), which blocks rapid development and hinders the ability of systematic reviews to capture all of the research pertaining to the research question [2, 8, 9].
Box 1.
Important term definitions
| Number of visits: number of patients arriving at the emergency department (this includes all patients, regardless of whether they are attended to, leave without being seen, or are redirected elsewhere). |
| Number of arrivals: number of patients who have visited the ED and stayed there, sometimes also called “census”. |
| Attendance: number of patients who were actually attended to in the emergency department (can be lower than the number of visits). |
| Disposition: the decision whether a patient will be admitted to a ward or discharged. |
One of the earliest articles in the field of ED TSF was published in 1994 and described using raw observations, moving averages and autoregressive integrated moving averages (ARIMA) [10]. Since then, teams from around the world have been trying to improve their predictions using the different models available [2, 8, 9, 11]. Recently, with advances in hardware and lowered entry requirements for advanced methods such as machine learning [12], the field has been developing rapidly.
To date, we have identified 4 systematic reviews on this topic [2, 8, 9, 11]. Each raises the field of ED TSF to a higher level and views the topic through a slightly different prism. Although these reviews are exhaustive, they still leave a novice in the field wondering where and how to start: it remains uncertain which variables perform better and which models should be used. Additionally, none of the reviews address bias in a systematic way using externally validated tools.
With this in mind, the aim of this systematic review is to help authors entering the field of ED arrivals TSF. The primary goal was to identify which external variables make prediction models more accurate. The secondary goal was to determine which TSF methods are used and which of them perform better.
Methods
Our systematic review is reported according to the TRIPOD-SRMA guidelines [13] (see Supplement 1), which use elements of the PRISMA checklist with added thematic items for AI and modelling [13]. The protocol of the study (see Supplement 2) was not registered in PROSPERO, as the review does not address randomized controlled studies and is therefore out of scope for that database.
Information sources and search strategy
Our systematic review search strategy was informed by previous systematic reviews in this field (Table 1) [2, 8, 9, 11].
Table 1.
Previous systematic reviews on ED TSF
| Author | Databases searched | Search strategy | Search strategy hits | Number of included studies | Search start | Search end |
|---|---|---|---|---|---|---|
| Wargon et al. [2] | PubMed (Medline), Science Citation Index | ‘Emergency Service [mh] AND (forecasting [all] or scheduling) AND (simulation OR models, theoretical [mh]) | 201 | 9 | NA | 2007 |
| Muhammet et al. [11] | PubMed (Medline), Scopus, Web of Science, Google Scholar, Wiley, Elsevier, Springer, IEEE, BMJ Journals, Hindawi, Wolters Kluwer Health, Mary Ann Liebert, Oxford Journals, and Taylor & Francis | Emergency department/emergency room AND forecasting/predicting/estimating AND patient demand/arrival/visit/volume/presentation, LOS, admission, waiting time, ambulance diversion, etc. | 2964 | 41 | 2002 | 2018 |
| Jiang et al. [9] | PubMed (Medline), ScienceDirect and Scopus | ‘forecast OR predict’ AND ‘daily attendances OR patient volumes OR patient visits’ AND ‘emergency room OR emergency department’ AND ‘time series OR regression’ | 1677 | 35 | 2004 | 2021 |
| Silva et al. [8] | PubMed (Medline), Web of Science and Scopus | forecasting AND ED AND (visits OR arrivals)) AND (method OR model) AND (better OR improved) AND (performance OR accuracy) | 233 | 7 | 2015 | 2021 |
Our systematic search was conducted in 5 databases (PubMed [14], Scopus [15], Web of Science [16], CINAHL [17] and Embase [18]) on the 13th of February 2025. We used the following search query, based on the search queries of previous systematic reviews (Table 1):
(“emergency department*” OR “emergency room*” OR “emergency service*” OR “ED” OR “ER”) AND (attendance* OR visit* OR “patient volume” OR “patient flow”) AND (modelling OR modeling) AND TITLE-ABS-KEY (forecast* OR prediction OR “time series forecasting” OR “time series model” OR “patient visits forecast” OR overcrowding) OR TITLE-ABS-KEY (“machine learning”).
We restricted our search to exclude certain publication types: conference abstracts, books and book chapters did not appear in our search results. Only results in English were included. No restrictions regarding study design were applied, but only studies reporting the development and validation of models were included. The search covered all eligible articles published until February 2025 and builds on previously conducted work (Table 1).
Study selection process
We included model development and/or validation studies that forecast patient arrivals to the ED and used external variables to better inform their model. Studies on any age, sex or disease were assessed as relevant to this review. We included studies on any forecast horizon and any forecasting methodology. Our primary outcome was the final list of external variables used in the models. The secondary outcome was an analysis of which model performed best (if more than one model was used).
Before screening, search results were deduplicated using the SR-Accelerator Deduplicator software [19]. Search results were independently screened for eligibility by 2 authors (LP, KGR). The same two authors retrieved the full texts, screened them for inclusion, and performed the citation search. Discrepancies were resolved by a second look and agreement between the authors screening the full texts (LP, KGR).
The selection process was recorded in sufficient detail to complete a PRISMA flow diagram (Fig. 1), which was created with an online tool [20], and a list of deduplicated studies (Supplement 3).
Fig. 1.
PRISMA flowchart representing the study selection process
Data items
An author-created form (initially piloted on one included study) was used to extract data on study characteristics, outcomes and risk of bias. One study author (LP) extracted the following data from the included studies:
year of publication,
country,
dataset size,
methods used,
list of all tested external variables,
the population the model tried to capture,
list of external variables used in the final model,
forecasting horizon,
external variable preprocessing used,
missing data handling,
main conclusions and
future research directions.
Risk of bias and applicability assessment
Risk of bias was assessed using the PROBAST tool [21]. Two authors (LP, KGR) assessed the risk of bias for each study on 73 PROBAST points. A TSF expert (JŽ) also suggested additional items, answerable with yes or no, to better assess the risk of bias and the quality of the manuscripts. These items were:
Is the model made public (e.g. Hugging Face or GitHub)?
Is the development dataset size specified (number of records)?
Was the development data made available (e.g. Zenodo or Figshare)?
Were any reporting guidelines followed (e.g. TRIPOD)?
Did the study use internal or external validation?
What kind of data validation was used and what type is it according to TRIPOD 2015 [22]?
Did the authors report predictions with confidence intervals?
The PROBAST tool includes 4 steps. In the first step, the systematic review question is specified (see the aim of the study); this step is not repeated later. In steps two to four, each study is assessed for the type of prediction model evaluation, risk of bias and applicability, and finally an overall judgement is passed [21]. If only low risk is found in steps 2 and 3, the article has a low risk of bias and low concern over applicability. If at least one dimension in steps 2 or 3 is marked as high, the article receives a high risk of bias or a high concern over applicability [21].
The unit of analysis was “population seeking care in EDs”. We did not contact investigators or study sponsors to provide missing data. We did not assess publication bias/small-study effects. We did not perform subgroup or sensitivity analyses because they do not apply to these kinds of studies.
Results
Search results
Our systematic search identified 1247 studies: 10 from EBSCO, 208 from Embase, 319 from PubMed, 519 from Scopus and 191 from Web of Science. After removing duplicates, 668 studies were screened, and 30 studies were included in the systematic review [10, 23–51]. Figure 1 shows the PRISMA flow diagram of the study [52].
Study and model characteristics
The summary data of the 30 studies included in our systematic review (SR) can be seen in Table 2. Most of the included articles tried to predict all patient arrivals, while 4 focused on asthma or respiratory illnesses [27, 30, 33, 42], of which 2 specifically on respiratory illness-related ED arrivals [33, 42]. The average study period was 4 years, ranging from 1 year up to 13 years in one study (see Supplement 4) [53]. The vast majority of the research comes from the United States of America and Asia; most European studies are based in the UK or Spain. The forecast horizon ranged from 1 day to 1 year. Sample sizes ranged from 3000 to 901,000. The most common conclusions are that using weather and calendar variables improves prediction accuracy, that machine learning models generally perform better than other methods of TSF, and that internet search data (i.e., Google Trends) can meaningfully improve models but needs further research. The future research suggested by the authors is similar between articles: prospective studies with external validation are proposed. Additionally, authors want to incorporate more granularity into their predictions, i.e., predicting the triage categories of arriving patients and the number of different chief complaints. Finally, authors propose creating new hybrid models (also known as ensembles or fusions of models) to improve on currently available ones. Some of the studies also used variable pre-processing (Table 2). None of the studies used any methods to fill in missing data.
Table 2.
Summary data of 30 included studies in the field of prognostic models for emergency department arrivals forecasting.
| Year | Author | Country | Number of patient records | Forecast horizons | Population | Main conclusions | Future research | External variables preprocessing |
|---|---|---|---|---|---|---|---|---|
| 2024 | Peláez et al. [23] | Spain | 3322 | 7 days | GP |
I. Dividing the database according to the day of the week allows obtaining the best results in both databases II. Meteorological variables exhibit notable significance. Particularly, maximum and minimum temperatures III. The most effective threshold is identified at 40 visits when splitting the dataset based on the count of ED visits from the sixth previous day IV. Variables related to temperature hold significant relevance in both databases when performing the training data segmentation into various subsets V. Variables referring to the number of visitors on preceding days also play a pivotal role in the process |
I. The development of hybrid models II. Refining the selection of optimal thresholds and variables to improve forecasting precision III. The distinction between various patient categories and an extension of the prediction-time horizons of the forecasting problem IV. The adaptation of the prediction system to extreme events |
I. Correlation matrix for variable selection, feature standardization using mean and standard deviation |
| 2023 | Gafni-Pappas et al. [24] | USA | NA | 1 day | GP | I. Machine learning models perform better at predicting daily patient volumes as compared to simple univariate time series models |
I. Prospective study using the identified variables II. Additional feature engineering III. Assessing differences in model accuracy on other outcomes IV. One could also consider reviewing common emergency department complaints to see if the models can predict these V. Other variables to aid in prediction may exist
None |
| 2022 | Tuominen, et al. [25] | Finland | 900700 | 1 day | GP |
I. Predictive accuracy of next day arrivals can be increased using high-dimensional feature selection approach when compared to both univariate and nonfiltered high-dimensional approach II. Performance over multiple horizons was similar with a gradual decline for longer horizons |
I. Enhancing the feature selection mechanism II. Investigating its applicability to other domains, III. In identifying other potentially relevant explanatory variables |
None |
| 2022 | Fan et al. [26] | Hong Kong | 139910 | 7 days | GP |
I. The internet search index significantly improves forecasting as a reliable predictor monitoring continuous behavior trend and sudden changes during the epidemic II. Nonlinear model is better |
I. Explore ED-related data from social media platforms II. Predict the ED patient flow with different severity levels III. The relationship between levels of patient arrivals and internet information |
I. Minimum-maximum normalization technique before modeling |
| 2020 | Peng et al. [27] | China | 13218 | prediction of peak ED arrival | Chronic respiratory disease patients | I. Random forests reached the best classification prediction performance | None | I. Removed records with less than 10 people on weekends to eliminate weekend effects, bringing the total number of samples collected to 55 |
| 2019 | Tideman et al. [28] | USA | NA | Unknown | GP | I. Search volume data can meaningfully improve forecasting of ED visits |
I. Validating the methodology at additional diverse sites II. Incorporating additional novel sources of data III. Prospectively testing the models in real time |
I. Pairwise correlation and PCA |
| 2018 | Jiang et al. [29] | Hong Kong | 245377 | 28 days | GP | I. Predictive values are significant indicators of patients' demand and can be used by ED managers to make resource planning and allocation | None | I. A genetic algorithm (GA)-based feature selection algorithm is improved and implemented as a pre-processing step |
| 2015 | Ram et al. [30] | USA | NA | 3 days | Asthma patients | I. Model can predict the number of asthma ED visits based on near-real-time environmental and social media data with approximately 70% precision |
I. A temporal prediction model that analyzes the trends in tweets and AQI changes and estimates the time lag between these changes and the number of asthma ED visits. II. Examine the effects of seasonal variations III. The effect of relevant data from other types of social media IV. How combining real-time or near-real-time social media and environmental data with more traditional data might affect the performance and timing of the current individual-level prediction models V. Diseases with geographical and temporal variability |
I. Z-score normalization; Twitter data expressed as the ratio of asthma-related tweets to all tweets |
| 2011 | Perry et al. [31] | Canada | 140000 | 1 day | Weather induced heat-related illness | I. Estimates of HRI visits from regression models using both weather variables and visit counts captured by syndromic surveillance as predictors were slightly more highly correlated with NACRS HRI ED | None | None |
| 2023 | Hugo Álvarez et al. [32] | Spain | 361698 | 1, 3 and 7 days | GP | I. Attention model with calendar data achieves the best results | None | None |
| 2025 | Morbey et al. [33] | United Kingdom | Unknown | 1 to 28 days | Respiratory ED visit |
I. Machine learning pipelines can be used to create daily forecasts that predict the peak in demand for RSV activity over the following 28 days II. Correctly predict the peak in ED acute bronchiolitis attendances in children under 5 years III. Whilst ‘ensemble forecast methods’ inevitably fit the data better, they lack transparency, and it is difficult to explain the theoretical justification for individual forecasts
I. Better forecasts during atypical seasons by weighting the training data to give more emphasis to the rare events II. Synthetic data could be incorporated in training data |
I. Data were smoothed by weighting activity to remove day of the week effects caused by weekends and public holidays so that weekends and any public holidays within the week always contributed two sevenths of the weekly activity. II. Data were normalised, so that the smoothed daily activity were in the range zero to one |
| 2024 | Lim et al. [34] | Singapore | unknown | Unknown | GP |
I. The EWS-ED framework is designed to detect drifts and predict the extent of these drifts, thereby allowing pre-emptive signals to be detected in order to consider model updating or retraining II. Importance of evaluating concept drift from the conflicting problem arising due to the stability versus the plasticity of predictive models over time. |
I. External validation of the modeling framework II. Including data from multiple hospitals across different regions |
None |
| 2024 | Porto et al. [35] | Australia, the USA, the Netherlands | 46495 | 7 and 45 days | GP |
I. XGBoost consistently outperformed other models across all forecast horizons, with FE significantly enhancing the predictive capabilities of all ML algorithms. II. Temperatures and days of the week are important predictors III. Calendar variables show more importance than meteorological ones |
I. Apply FE to other variables, such as meteorological data | None |
| 2023 | Susnjak et al. [36] | New Zealand | unknown | 90 days | GP | I. Best performing models are Voting, Averaging, Stacking, and Prophet | None | None |
| 2022 | Petsis et al. [37] | Greece | 893046 | 1 day | GP |
I. Day of the week, the mean number of visits of the four previous on duty days, and the daily maximum temperature of the city of Ioannina are the variables that most affect the number of patient ED visits on the next on duty day II. Monday is the busiest day for the ED III. On days with a maximum daily temperature of over 24 Celsius degrees, larger numbers of visits are expected IV. A mean above 284 number of visits on the four previous on duty days means that probably there would be many visits on the next on duty day |
None | None |
| 2021 | Sharafat et al. [38] | USA | unknown | 30 days | GP |
I. PatientFlowNet has a slightly better prediction accuracy for exogenous variables II. It produces substantially more accurate predictions for the endogenous flow variables such as treatment and discharge rates |
None | None |
| 2021 | Vollmer et al. [39] | UK | unknown | 1, 3 and 7 days | GP | I. Framework that is quick and easy to implement | I. Focus on granular data sets including individual level patient data with diagnoses and testing information | None |
| 2021 | Duarte et al. [40] | UK | unknown | 7 days | GP | I. Prediction of Emergency Department (ED) indicators can be improved by using machine learning models in comparison to traditional, particularly when sudden changes are observed in the data | I. Simulations with different KPIs and data sample periods | None |
| 2020 | Yousefi et al. [41] | Brazil | unknown | 1 to 7 days | GP |
I. Short-term forecasting horizons (1 and 2 days) the ARIMA-LR performs better than other II. Forecasting horizons of 3–7 days the MLR shows better results |
I. Future study can integrate simulation with forecasting to show the impact of patient visits on the performance of an ED II. Same model in a number of EDs in different countries to evaluate the performance of the proposed model |
None |
| 2019 | Martin et al. [42] | Canada | unknown | 3 days | Influenza like illness | I. The most effective modelling approach was based on a GLMM with random intercept and incorporated spatio-temporal characteristics as well as an indicator variable similar to a hinge function |
I. Model predictions could be updated after the 3-day window of advanced warning- update estimates post-prediction. II. Predictions at a smaller geographic area or at the hospital-level, and by age category III. Other methods to model the spatial distribution of the visits across the city IV. Model performance could be evaluated based on other influenza season characteristics V. Model should be tested in practice |
None |
| 2019 | Ho et al. [43] | Singapore | unknown | 1 day and 7 days | GP |
I. Informed resource and manpower allocation in hospitals II. Can serve as a key mitigation measure against worsening ED congestion problems |
None | None |
| 2018 | Ekström et al. [44] | Sweden | unknown | 3 days | GP | I. Gradient boosting may be favorable over neural networks for predicting the coming 72 hours | I. Longer prediction horizons | None |
| 2016 | Calegari et al. [45] | Brazil | 57128 | 1, 7, 14, 21, and 30 days | GP |
I. Seasonal and weekly patterns II. Time-series models can be used to provide accurate forecasts, especially short-range forecasts horizons III. Some climatic factors displayed significant correlation with demand series from different classes of patients but do not increase the accuracy of prediction when incorporated into the model |
I. Study should be extended to regions with different climatic characteristics | None |
| 2015 | Aroua et al. [46] | Canada | unknown | 7 days | GP |
I. SARIMAX provides better information than simple SARIMA models II. The level of snow, the maximum and minimum weekly average temperature and weeks of the year are often factors that are significant |
None | None |
| 2015 | Zlotnik et al. [47] | Spain | unknown | 2, 4, 8, 12, 16, 20, 24 weeks | GP | I. The M5P models are simpler and computationally faster than SVM models, may be a good alternative for larger data sets | I. Additional adjustments of model for different locales | None |
| 2015 | Aboagye-Sarfo et al. [48] | Australia | unknown | 12 months | GP | I. VARIMA models are more reliable for predicting ED demand by patient subgroup and provide a more accurate forecast of ED demand than the standard univariate ARMA and Winters’ methods |
I. Scenario modelling to consider potential changes in population projections and in health policy and clinical practice II. Identify the factors/drivers that contribute to the increasing trend of ED presentation III. Health services strategic planning, and guide workforce modelling, infrastructure and financial modelling |
None |
| 2010 | Perry et al. [49] | Canada | unknown | 4 and 7 days | Respiratory illness |
I. Telehealth can be used to estimate future visits at the health unit level II. Estimates are better for health units with larger populations III. Non-linear modeling methods produced more accurate estimates IV. Because no knowledge of the actual number of past visits is used in any model over the validation data, the statistically significant decrease in prediction accuracy with increasing forecasting lead time provides evidence that there is information about the future number of visits in the call time series |
None | None |
| 2009 | Jones et al. [50] | USA | unknown | 35 days | GP |
I. Multivariate VAR models provided more accurate forecasts of ED arrivals for extended forecast horizons when compared to standard univariate time series methods II. We are not confident that these forecasts could be used reliably in the clinical setting |
I. Focus on the development and use of robust analytical methods such as queuing theory, optimization, and simulation modeling that can be used to provide decision support for clinical staffing, real-time monitoring, and forecasting | None |
| 1994 | Tandberg et al. [10] | USA | 87354 | unknown | GP |
I. Models based on prior observations allow for predictions for future staffing and other needs. II. Simple models performed better than more complex III. Time series forecasts of length of stay and patient acuity are not likely to contribute additional useful information for staffing and resource allocation decisions |
None | None |
| 2019 | Asheim et al. [51] | Norway | 191939 | 1, 2 and 3 hours | GP | I. This method shows continuously updated forecasts with better accuracy than methods based on time variables alone, particularly for periods of large inflow |
I. A more sophisticated machine learning algorithm giving more accurate forecasts II. Interpretability of the proposed model may be valuable |
I. Rounding to 10-minute intervals before modeling |
Legend: (GP: general population, ED: emergency department, AQI: air quality index, NACRS: National Ambulatory Care Reporting System, HRI: heat-related illness, EWS-ED: Early Warning System for ED attendance predictions, FE: feature engineering, M5P: decision tree with regression on nodes, SVM: support vector machines, VAR: vector autoregression, RSV: respiratory syncytial virus, ML: machine learning, ARIMA: autoregressive integrated moving average, KPI: key performance indicator, ARIMA-LR: ARIMA with linear regression, MLR: multiple linear regression, SARIMA: seasonal ARIMA, SARIMAX: seasonal ARIMA with exogenous variables, VARIMA: vector autoregressive integrated moving average, ARMA: autoregressive moving-average model)
Primary outcome
The number of external variables (Table 3) varied between studies from one [28] to over thirty [32]. The external variables can be split into thematic groups: (I) weather, (II) air quality, (III) social networks (e.g. X, formerly Twitter), (IV) calendar, (V) Google search trends, (VI) hospital website visits, (VII) hospital information system, (VIII) prehospital dispatch, (IX) experienced local staff and (X) national health phone lines. Weather and calendar variables were most commonly used in the final models. Although studies tried a wide array of external variables, the final model usually used only a select few, those found to have the highest predictive value. These variables are presented in the table below; a sketch of how such variables are typically encoded follows the table.
Table 3.
External variables that were used in final models for forecasting ED arrivals
| Author | Variables Used | Variables Retained in the Best Model |
| Peláez et al. [23] |
1. Weather a. temperature, b. pressure, c. rainfall for the three previous days, 2. Calendar a. day of week, b. moon phase, c. the presence of festivities on both the current and preceding days, 3. HIS a. daily ED visits of the previous week. |
1. Weather a. minimum temperature b. maximum temperature c. wind speed 2. HIS a. daily ED visits of the previous week |
| Gafni-Pappas et al. [24] |
1. Weather a. max temperature, b. precipitation, c. average cloud coverage, d. relative humidity, e. average SFC pressure, f. daily change in SFC pressure, g. average wet bulb, h. average solar radiation, 2. Air quality a. overall air quality score, b. max ozone concentration, c. percent flu visits 3. Calendar a. day of the week |
1. Calendar a. day of the week b. maximum temperature 2. Weather a. SFC pressure |
| Tuominen, et al. [25] |
1. HIS a. number of available hospital beds, 2. Calendar a. weekday, b. month, c. specific holiday, d. lagged holiday, e. working day, f. N of minor public events, g. N of major public events, h. N of all public events. 3. Weather a. cloud count, b. air pressure, c. relative humidity, d. rain intensity, e. snow depth, f. air temperature, g. dew point h. temperature, i. visibility, j. air temperature min, k. air temperature max, 4. Website traffic a. website Visitstays.fi, b. website Visitstays.fi/acuta c. Ekström’s visitstays.fi, d. Ekström’s ratiotays.fi, 5. Search index/Google trends a. “Acuta” |
1. Calendar a. Shrove Sunday b. Easter day c. Midsummer d. Christmas eve e. All Saint’s day f. Independence day eve g. Ascension day h. Holiday t−1 i. working day j. March k. February l. December m. 29 individual public events* n. N of major public events o. Weekdays i. Thursday ii. Saturday iii. Friday iv. Wednesday v. Tuesday vi. Sunday vii. Monday 2. Available hospital beds 3. Weather a. cloud count 4. Hospital website visits |
| Fan et al. [26] |
1. Calendar a. months, b. school holiday c. public holidays 2. Weather a. the daily highest temperature b. daily lowest temperature 3. Search index/Google trends |
1. Calendar a. months, b. school holiday c. public holidays 2. Weather a. the daily highest temperature b. daily lowest temperature 3. Search index/Google trends a. ginger, swine flu symptoms, infect, dept. of health, fever, swine flu clinic, dentist, enterovirus, cough |
| Peng et al. [27] |
1. Weather a. wind speed b. outside temperature c. atmospheric pressure d. relative humidity 2. Air quality a. particulate matter less than 2.5 μm in diameter b. sulphur dioxide. c. carbon monoxide d. nitrogen dioxide e. 8-hour average ozone slip in a day |
Same |
| Tideman et al. [28] |
1. Search index/Google trends a. allergies, blizzard, bronchitis, fever, flu, flu shot, forecast, humidifier, pneumonia, and snow 2. Calendar a. holidays b. special days (eg. Monday after a long weekend) 3. Weather a. min, mean, max, temp b. barometer c. visibility d. mean, max wind speed e. daily snowfall f. weather warnings |
1. Weather a. maximum temperature b. mean visibility 2. Search index/Google trends a. snow, allergies, blizzard, flu shot, fever, forecast, pneumonia 3. Calendar |
| Jiang et al. [29] |
1. Calendar a. hour of day, b. day of week, c. month of year, d. public holiday, e. before public holiday, f. after public holiday, 2. Weather a. absolute daily max air temp, b. mean air temp, c. absolute daily min air temp, d. mean dew point, e. mean relative humidity, f. mean amount of cloud, g. total rainfall, h. mean wind speed, i. change of max air, j. mean air, k. absolute min air, l. mean dew, m. mean relative humidity, n. mean amount of cloud, o. total rainfall, p. mean wind speed to one day before |
same |
| Ram et al. [30] |
1. Twitter data, 2. Search index/Google trends 3. Air quality index (AQI) 4. Air quality measures a. particulate matter b. ground-level ozone c. carbon monoxide d. sulfur oxides e. nitrogen oxides f. lead |
1. Air quality a. CO b. NOx c. PM 2. Twitter data a. asthma related tweets 3. Search index/Google trends |
| Perry et al. [31] |
1. Weather a. heat index b. Humindex c. wet bulb globe temperature d. temperature maximum e. temperature minimum f. average daily values of the temperature g. dew point h. wind speed |
1. Weather a. temperature b. humidity c. wind speed |
| Hugo Álvarez et al. [32] |
1. Calendar a. weekday b. month of year c. holidays d. academic holidays e. weekends 2. Weather a. temperature (avg, min, max, variation, hour max, hour min) b. max wind direction c. wind speed avg d. hours of daylight e. atmospheric pressure (min, max) 3. Air quality a. BEN, CH4, CO, EBE, NMHC, NO, NO2, NOX, O3, PM10, PM2.5, SO2, TCH, TOL, OXY, MXY, PXY 4. Allergens a. alnus, alternaria, artemisia, betula, carex, castanea, cupressaceae, fraxinus, gramineae, mercurialis, morus, olea, arecaceae, pinus, plantago, platanus, populus, amaranthaceae, quercus, rumex, ulmus, urticaceae 5. Search index/Google trends a. allergy, analgesic, antihistamines, appendicitis, asthma, aspirin, bronchiolitis, fall, conjunctivitis, damage, depression, fainting, diabetes, pain, emergency, endometriosis, illness, coping, epilepsy, COPD, fever, gastroenteritis, flu, hospital, ibuprofen, incident, heart attack, discomfort, medication, migraine, omeprazole, otitis, paracetamol, prostate, pseudoephedrine, pneumonia, allergic reaction, common cold, brawl, hospital route, bleeding, sinister, high blood pressure, thyroid, urgency, chickenpox, gallbladder |
1. Calendar a. weekday b. month of year c. holidays d. academic holidays e. weekends 2. Weather a. temperature (avg, min, max, variation, hour max,hour min) b. max wind direction c. wind speed avg d. hours of daylight e. atmospheric pressure (min, max) 3. Air quality a. BEN, CH4, CO, EBE, NMHC, NO, NO2, NOX, O3, PM10, PM2.5, SO2, TCH, TOL, OXY, MXY, PXY 4. Search index/Google trends a. appendicitis, bronchiolitis, conjunctivitis b. influenza, hospital, otitis, common cold |
| Morbey et al. [33] |
1. Calendar a. day of month 2. UKHSA real-time syndromic surveillance system* |
Same |
| Lim et al. [34] |
1. calendar a. Holiday i. public holidays ii. post-holiday (the working day following a public holiday) iii. pre-holiday (the day preceding a public holiday) b. working day (indicator for weekdays and non-public holidays) c. day of the week (indicator for Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday) |
Same |
| Porto et al. [35] |
1. Calendar data a. day of the week in which the patient arrived at the ED – 7 dummy variables b. month of the year in which the patient arrived at the ED - 12 dummy variables 2. Weather data a. minimum daily temperature in Celsius of the arrival day, mean daily temperature in Celsius of the arrival day, maximum temperature in Celsius of the arrival day), 3. Feature Engineered (FE) variables |
Same |
| Susnjak et al. [36] |
1. Calendar a. public_holiday (Public holiday indicator (0/1)) b. week (Week number ranging from 1 to 53) 2. Covid a. COVID-19 alert level ranging from one to four |
1. Covid-19 Alert level |
| Petsis et al. [37] |
1. Calendar a. the day of year, the month, the day of the month, the day of the week, and the week of the year b. holidays and event based i. weekend, public holiday, before public holiday, after public holiday, event, before event, after event, school holiday 2. Weather a. min temp b. max temp c. mean temp d. rain e. average wind speed |
Same, with most important being: 1. Calendar a. the day of the week 2. Weather a. maximum temperature |
| Sharafat et al. [38] |
1. HIS a. arrival b. treatment c. discharge |
1. HIS a. arrival b. treatment c. discharge |
| Vollmer et al. [39] |
1. Calendar a. school and bank holidays 2. Weather 3. Search index/Google trends a. flu 4. Experienced staff-known local events that cause surge |
1. Calendar a. Christmas b. Monday c. Sunday d. school holiday e. Saturday f. Wednesday g. Thursday h. July i. time j. March k. Tuesday l. Nottingham Carnival m. August n. December o. June p. September q. bank holiday r. November s. February t. May u. October v. April 2. Weather a. precipitation yesterday b. min air temperature yesterday c. max air temperature yesterday 3. Search index/Google trends a. flu |
| Duarte et al. [40] |
1. HIS a. N of attendances b. unallocated patients with decision to admit c. medically fit to discharge |
1. HIS a. N of attendances b. unallocated patients with decision to admit c. medically fit to discharge |
| Yousefi et al. [41] |
1. Calendar a. soccer match days b. weekends c. holidays d. day before holiday e. day after holiday |
Same |
| Martin et al. [42] |
1. HIS a. date of birth b. forward sortation area (FSA) of the postal code and c. chief complaint 2. Calendar a. season |
1. HIS a. visitor age b. area of residence and 2. Calendar a. season |
| Ho et al. [43] | 1. Search index/Google trends |
1. Search index/Google trends a. “SGH map,” “Singapore General Hospital address,” “SGH appointment,” and “SGH A&E.” |
| Ekström et al. [44] |
1. Calendar a. day of week b. month c. hour of the day d. holidays |
1. Calendar a. day of week b. month c. hour of the day d. holidays |
| Calegari et al. [45] |
1. calendar a. month b. day of the week 2. weather a. average temperature b. min temperature c. max temperature d. temperature gap e. rain f. air velocity g. relative humidity h. hours of insolation |
1. Weather a. minimal temperature b. maximal temperature c. rain d. air velocity e. relative humidity f. temperature gap |
| Aroua et al. [46] |
1. Calendar a. weeks of the year 2. Weather a. weekly average temperature b. maximum temperature c. minimum temperature d. weekly temperature differences e. rain f. snow |
Same |
| Zlotnik et al. [47] |
1. Calendar a. day of week b. holiday c. month of the year d. week number |
Same |
| Aboagye-Sarfo et al. [48] | 1. Weather |
1. Calendar a. month of the year 2. HIS a. age group b. place of treatment c. triage category d. disposition |
| Perry et al. [49] |
1. National phone service a. Daily aggregate number Telehealth Ontario Calls** for respiratory illness 2. Calendar a. indicator variable for upcoming holidays and weekends |
Same |
| Jones et al. [50] |
1. HIS a. ED arrivals b. ED census c. ED laboratory orders d. ED radiography orders e. ED computed tomography (CT) orders f. inpatient census g. inpatient laboratory h. inpatient radiography orders i. inpatient CT orders |
1. HIS a. laboratory orders, b. radiography orders, c. CT orders, d. inpatient census e. inpatient diagnostic orders |
| Tandberg et al. [10] | None | None |
| Asheim et al. [51] |
1. Calendar a. time of day b. day of the week c. month d. holidays 2. Dispatch a. prehospital notifications |
Same |
Legend: SFC: surface pressure chart, PM: particulate matter, AQI: air quality index, N: number, HIS: Hospital Information System, FSA: Forward Sortation Area, *: monitors anonymised health service contacts from across the National Health Service (NHS) in England, **: a free 24-hour health advice telephone service
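To illustrate how the two most common groups of external variables, calendar and weather, are typically turned into model inputs, the sketch below builds dummy-encoded and lagged features for a daily series. The DataFrame layout and column names ("arrivals", "temp_max") are hypothetical and not taken from any included study.

```python
# Sketch of encoding calendar and weather variables as model features.
# Assumes a daily DataFrame indexed by date with hypothetical columns
# "arrivals" (daily ED arrival count) and "temp_max" (daily max temperature).
import pandas as pd

def add_external_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Calendar variables: day of week and month as dummy indicators.
    out["dow"] = out.index.dayofweek
    out["month"] = out.index.month
    out = pd.get_dummies(out, columns=["dow", "month"], drop_first=True)
    # Lagged arrivals: counts from the previous day and the previous week.
    out["arrivals_lag1"] = out["arrivals"].shift(1)
    out["arrivals_lag7"] = out["arrivals"].shift(7)
    # Weather lagged by one day, so forecasts only use information
    # available at prediction time.
    out["temp_max_lag1"] = out["temp_max"].shift(1)
    return out.dropna()
```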
Secondary outcome
We identified 53 different methods from 3 method families across our systematic review (Table 4). ARIMA, support vector machines and random forest were the most commonly used methods.
Table 4.
Methods used across the literature
| Method Family | Method | Titles of studies with this methodology |
|---|---|---|
| Statistical Methods | ARIMA | Gafni-Pappas et al. [24], Tuominen et al. [25], Jiang et al. [29], Vollmer et al. [39], Duarte et al. [40], Yousefi et al. [41], Ekström et al. [44], Calegari et al. [45], Aboagye-Sarfo et al. [48], Tandberg et al. [10], Susnjak et al. [36], Fan et al. [26] |
| ARIMAX | Tuominen et al. [25], Jiang et al. [29], Fan et al. [26] | |
| Seasonal-ARIMA | Jiang et al. [29], Calegari et al. [45], Lim et al. [34], Sharafat et al. [38] | |
| ARIMA with linear regression | Yousefi et al. [41] | |
| Exponential Smoothing | Gafni-Pappas et al. [24], Vollmer et al. [39], Calegari et al. [45] | |
| EWMA (Exponentially Weighted Moving Average) | Perry et al. [49] | |
| Linear Regression | Morbey et al. [33] | |
| Multiple Linear Regression | Tideman et al. [28], Yousefi et al. [41], Ho et al. [43], Perry et al. [31], Aroua et al. [46], Fan et al. [26] | |
| Poisson Regression | Perry et al. [31], Martin et al. [42], Asheim et al. [51] | |
| Relaxed LASSO | Jiang et al. [29] | |
| Generalised linear model with elastic net regularisation | Morbey et al. [33] | |
| LASSO and generalised linear model | Porto et al. [35] | |
| General Linear Method | Yousefi et al. [41] | |
| Simulated annealing and Floating search | Tuominen et al. [25] | |
| Facebook Prophet [54] | Gafni-Pappas et al. [24], Susnjak et al. [36], Duarte et al. [40] | |
| Ridge regression | Susnjak et al. [36] | |
| Generalized estimating equations | Yousefi et al. [41] | |
| Holt-Winters | Calegari et al. [45], Aboagye-Sarfo et al. [48] | |
| Stratified average | Zlotnik et al. [47] | |
| Non-linear difference equation model | Perry et al. [49] | |
| Moving averages | Tandberg et al. [10] | |
| Machine Learning | Decision Tree | Ram et al. [30], Zlotnik et al. [47] |
| Random Forest | Gafni-Pappas et al. [24], Tuominen et al. [25], Peng et al. [27], Jiang et al. [29], Porto et al. [35], Susnjak et al. [36], Sharafat et al. [38], Morbey et al. [33], Aroua et al. [46], Fan et al. [26] | |
| Support Vector Machine (SVM) | Jiang et al. [29], Ram et al. [30], Morbey et al. [33], Yousefi et al. [41], Aroua et al. [46], Zlotnik et al. [47], Jones et al. [50], Fan et al. [26], Porto et al. [35] | |
| Naive Bayes | Ram et al. [30], Susnjak et al. [36] | |
| Artificial Neural Network | Ram et al. [30] | |
| Deep Neural Network | Jiang et al. [29], Ekström et al. [44] | |
| Long Short-Term Memory (LSTM) | Duarte et al. [40], Yousefi et al. [41], Sharafat et al. [38], Vollmer et al. [39], Fan et al. [26] | |
| Attention-Based Neural Network | Alvarez et al. [32] | |
| Deep Neural Network Integrated Genetic Algorithm | Jiang et al. [29] | |
| K-nearest neighbour | Morbey et al. [33], Susnjak et al. [36] | |
| Neural network regression | Porto et al. [35] | |
| Feed forward neural network | Sharafat et al. [38] | |
| General regression neural network | Duarte et al. [40] | |
| Recurrent neural network | Duarte et al. [40] | |
| Radial basis function network | Duarte et al. [40] | |
| Parallel cascade model | Perry et al. [49] | |
| State space model | Perry et al. [49] | |
| Extreme learning machine | Fan et al. [26] | |
| Convolutional neural network | Sharafat et al. [38] | |
| Ensemble Methods | Cluster-based Ensemble Learning | Pelaez et al. [23] |
| Gradient Boosted Machines | Gafni-Pappas et al. [24], Ekström et al. [44], Morbey et al. [33], Porto et al. [35], Petsis et al. [37], Aroua et al. [46] | |
| CatBoost | Susnjak et al. [36] | |
| Voting regressor | Susnjak et al. [36] | |
| Averaging model | Susnjak et al. [36] | |
| Stacking model | Susnjak et al. [36], Vollmer et al. [39] | |
| Threshold based machine learning | Pelaez et al. [23] |
Legend: ARIMA: autoregressive integrated moving average; ARIMAX: autoregressive integrated moving average with exogenous inputs; LASSO: least absolute shrinkage and selection operator; SVM: support vector machines; RBF: radial basis function; Facebook Prophet: decomposable time series model with trend, seasonality and holiday components
To rate the performance of the models, different studies used different measures, which are explained in Table 5 (a short computational sketch follows the table). Because none of the measures in Table 5 were reported across all studies, the secondary aim of comparing the models’ ability to correctly predict ED arrivals was not achieved. A meta-analysis was not performed due to the high heterogeneity of the included studies: the study designs, measured outcomes and study populations varied across studies, and this variability would introduce substantial bias, compromising the reliability of a meta-analysis. We do, however, present the best-performing model from each study and its horizon in Table 6.
Table 5.
Measures used to describe the models’ ability to correctly predict emergency department arrivals
| Acronym | Measure |
| RMSE | Root mean square error |
| MSE | Mean Square Error |
| MAPE | Mean Absolute Percentage Error |
| DM | Diebold-Mariano test (The null hypothesis is that the reference model is more accurate than the test model [26]) |
| Precision* | Number of true positives divided by the total number of positive predictions |
| Pearson correlation | Coefficient that measures the linear association |
| AUC | Area under curve |
| MAE | Mean absolute error |
| R2 | Coefficient of determination |
| Peak error | The peak error measure is zero if the peak forecast correctly identifies both the date and intensity of the peak |
Legend: *Precision was used in a study that additionally tried to predict whether arrivals would constitute a peak event
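As a reference for the point-forecast measures above, the short sketch below computes RMSE, MAE and MAPE; the observed and forecast values are hypothetical and for illustration only.

```python
# Sketch of the most commonly reported point-forecast error measures
# (Table 5). MAPE assumes strictly positive observed counts.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

y_obs = [130, 118, 142, 125]   # hypothetical daily arrivals
y_hat = [128, 121, 138, 131]   # hypothetical forecasts
print(rmse(y_obs, y_hat), mae(y_obs, y_hat), mape(y_obs, y_hat))
```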
Table 6.
Comparison of model performance based on reported metrics and horizon of the best performing model
| Authors | Horizon* | RMSE | MSE | MAPE | Precision | Pearson Correlation | MAE | AUC | Peak error ** | R^2 |
| Lim et al. [34] | Not clearly marked | 5.3 | ||||||||
| Petsis et al. [37] | 22.96 | 6.5 | 18.37 | |||||||
| Sharafat et al. [38] | 38.6 | 55.6 | 25.1 | 87 | ||||||
| Gafni-Pappas et al. [24] | 18.94 | |||||||||
| Fan et al. [26] | 14.55 | 3 | ||||||||
| Peng et al. [27] | 0.809 | |||||||||
| Tideman et al. [28] | 7.58 | |||||||||
| Ram et al. [30] | 3 days | 70% |
| Perry et al. [31] | 0.29 | 0.73 | ||||||||
| Porto et al. [35] | 7 days | 5.48 | ||||||||
| Duarte et al. [40] | 7.1002 | |||||||||
| Aroua et al. [46] | 6 | |||||||||
| Perry et al. [49] | 4 days | 36 | ||||||||
| Martin et al. [42] | 5.65 | |||||||||
| Tandberg et al. [10] | 28 days | 42 | ||||||||
| Morbey et al. [33] | 0.195 | |||||||||
| Zlotnik et al. [47] | 2 weeks | 10.717 | ||||||||
| Ekström et al. [44] | 2 days | 17.5 | ||||||||
| Susnjak et al. [36] | 13 weeks | 15.9 | 8.9 | 12.8 | ||||||
| Aboagye-Sarfo et al. [48] | 12 months | 323.45 | 1.81 | 424.28 |
| Jones et al. [50] | 1 hour | 2.5 | ||||||||
| Jiang et al. [29] | 2.7 | |||||||||
| Pelaez-Rodriguez et al. [23] | 1 day | 8.36 | ||||||||
| Tuominen et al. [25] | 6.6 |
| Álvarez-Chaves et al. [32] | 5.98 | |||||||||
| Vollmer et al. [39] | 6.7 | 14.27 | ||||||||
| Yousefi et al. [41] | 4.89 | 99.6 |
| Ho et al. [43] | 0.4744 | |||||||||
| Calegari et al. [45] | 2.91 | |||||||||
| Asheim et al. [51] | 3 hours | 31 |
Legend: * Horizon of best performing model, **The peak error measure is zero if the peak forecast correctly identifies both the date and intensity of the peak, RMSE: Root mean square error, MSE: Mean Square Error, MAPE: Mean Absolute Percentage Error, MAE: Mean Absolute Error, AUC: Area Under the Curve, R^2: Coefficient of determination
We compared models based on the performance measures reported and the horizon of each manuscript’s best-performing model. The most common time horizon was 1 day (7/30 studies) and the most commonly reported metric was MAPE (16/30 studies) (Table 6).
Risk of bias and applicability
Risk of bias was assessed first on the authors’ own domains and then using the PROBAST tool.
Own dimensions
Assessment on the authors’ own risk of bias domains found that, in general, studies have not made their development data available (Table 7). Only one of the manuscripts published the full code [51]. For validation, a train/test split was most common. Additionally, Morbey et al. [33] used a Generalized Additive Model for Location, Scale, and Shape to create confidence intervals, Porto et al. [35] used Time-Series Split Cross-Validation (TSCV), Martin et al. [42] used external validation, and 5 of the studies reported their predictions with confidence intervals. According to TRIPOD, the studies had analysis type 1b (development and validation using resampling) and type 2b (nonrandom split-sample development and validation); some studies were both type 1b and 2b, and one study also used type 3 [42] (Table 7).
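For readers unfamiliar with these validation patterns, the sketch below contrasts a nonrandom (chronological) train/test split, corresponding to TRIPOD type 2b, with time-series cross-validation of the kind used by Porto et al. [35]. The series length and split sizes are placeholders.

```python
# Sketch of the two most commonly reported validation patterns.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100)  # placeholder daily arrival series

# Chronological 80/20 split: the test set lies strictly after the
# training set, never randomly interleaved with it.
cut = int(len(y) * 0.8)
y_train, y_test = y[:cut], y[cut:]

# Time-series cross-validation: each fold trains on the past only and
# tests on the block that follows it.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    print(f"train up to {train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```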
Table 7.
Risk of bias assessment using authors’ own domains
| Author | Model Made Public | Dataset Size Specified | Training Data Made Public | Train: Test Division Made | Guidelines (Ex. TRIPOD) Followed | Confidence Intervals Used? | Validation Methods | TRIPOD 2015 Type* |
|---|---|---|---|---|---|---|---|---|
| Pelaez-Rodriguez et al. [23] | No | Yes | Yes | Yes | No | No | Cross Validation K = 5, Train/Test Split | 1b & 2b |
| Gafni-Pappas et al. [24] | No | Yes | No | Yes | No | No | Cross Validation K = 5, Train/Test Split | 1b & 2b |
| Tuominen et al. [25] | No | Yes | Partially | Yes | No | No | Train/Test Split and Cross Validation (K Unknown) | 1b & 2b |
| Fan et al. [26] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Peng et al. [27] | No | Yes | No | No | No | No | Train/Test Split | 2b |
| Tideman et al. [28] | No | No | No | No | No | No | Train/Test Split, Cross Validation (K = 10) | 1b & 2b |
| Jiang et al. [29] | No | Yes | No | Yes | No | No | Train/Test Split, Cross Validation (K = 5) | 1b & 2b |
| Ram et al. [30] | No | No | No | No | No | No | Cross Validation (K = 10) | 1b |
| Perry et al. [31] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Álvarez-Chaves et al. [32] | No | Yes | No | Yes | No | No | Train/Test Split, Cross Validation | 1b & 2b |
| Morbey et al. [33] | No | No | No | Yes | No | Yes | Generalized Additive Model For Location, Scale, And Shape (GAMLSS) To Create Uncertainty Intervals | 1b |
| Lim et al. [34] | No | Yes | No | Yes | No | Yes | Incremental Learning Validation Process | 1b |
| Porto et al. [35] | No | Yes | Yes | Yes | No | No | Cross Validation (K = 5), Time-Series Split Cross-Validation (TSCV) | 1b, 2b |
| Susnjak et al. [36] | No | Yes | No | Yes | No | No | Cross Validation (K = 5), Train/Test Split | 1b, 2b |
| Petsis et al. [37] | No | Yes | No | Yes | No | No | Cross Validation, Train/Test Split | 1b, 2b |
| Sharafat et al. [38] | No | No | No | Yes | No | No | Train/Validation/Test Split | 1b, 2b |
| Vollmer et al. [39] | No | No | No | Yes | No | No | Train/Test Split, Temporal One Day Cross Validation For Each Day | 2b |
| Duarte et al. [40] | No | No | No | No | No | Yes | Train/Test Split | 2b |
| Yousefi et al. [41] | No | No | No | Yes | No | No | Test/Train Split | 2b |
| Martin et al. [42] | No | No | No | Yes | No | No | Train/Test Split, Cross Validation, External Validation | 1b, 2b, 3 |
| Ho et al. [43] | No | No | No | No | No | No | None | None |
| Ekström et al. [44] | No | Yes | No | Yes | No | No | Training, Validation and Train/Test Split | 2b |
| Calegari et al. [45] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Aroua et al. [46] | No | Yes | No | Yes | No | No | None | None |
| Zlotnik et al. [47] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Aboagye-Sarfo et al. [48] | No | No | No | Yes | No | Yes | Train/Test Split | 2b |
| Perry et al. [49] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Jones et al. [50] | No | Yes | No | Yes | No | No | Train/Test Split | 2b |
| Tandberg et al. [10] | No | Yes | No | Yes | No | Yes | Train/Test Split | 2b |
| Asheim et al. [51] | Yes | Yes | No | Yes | No | No | Train/Test Split | 2b |
Legend: * see TRIPOD 2015 (22), 1b: development and validation using resampling, 2b: non-random split-sample development and validation
PROBAST
Using the PROBAST tool (see Supplement 4) (Table 8), we found that 10 out of 30 studies have a high overall risk of bias [10, 27, 30, 31, 43, 46–49, 51]. Only 2 out of 30 studies have a high concern over applicability to the review question [30, 33].
Table 8.
Risk of Bias summary using PROBAST tool
| Author | Overall judgement of risk of bias | Concern over overall judgement of applicability |
|---|---|---|
| Pelaez-Rodriguez et al. [23] | low | low |
| Gafni-Pappas et al. [24] | low | low |
| Tuominen et al. [25] | low | low |
| Fan et al. [26] | low | low |
| Peng et al. [27] | high | low |
| Tideman et al. [28] | low | low |
| Jiang et al. [29] | low | low |
| Ram et al. [30] | high | high |
| Perry et al. [31] | high | low |
| Álvarez-Chaves et al. [32] | low | low |
| Morbey et al. [33] | low | high |
| Lim et al. [34] | low | low |
| Porto et al. [35] | low | low |
| Susnjak et al. [36] | low | low |
| Petsis et al. [37] | low | low |
| Sharafat et al. [38] | low | low |
| Vollmer et al. [39] | low | low |
| Duarte et al. [40] | low | low |
| Yousefi et al. [41] | low | low |
| Martin et al. [42] | low | low |
| Ho et al. [43] | high | low |
| Ekström et al. [44] | low | low |
| Calegari et al. [45] | low | low |
| Aroua et al. [46] | high | low |
| Zlotnik et al. [47] | high | low |
| Aboagye-Sarfo et al. [48] | high | low |
| Perry et al. [49] | high | low |
| Jones et al. [50] | low | low |
| Tandberg et al. [10] | high | low |
| Asheim et al. [51] | high | low |
Discussion
To date, this is the first systematic review known to the authors that carried out a validated risk of bias assessment on the included studies [2, 8, 9, 11], which is an important factor when trying to build a foundation for further advancement of the field. We found 10 thematic groups of external variables that studies used to improve their basic TSF models and 53 different methods of TSF from 3 method families. Additionally, we found that publishing of training data is rare and that most of the studies did not publish their final models for use by other researchers.
Methodological approaches
TSF can optimize the logistics of EDs and has recently been gaining momentum, with over 870 results for TSF and ED on PubMed [55]. It shows great promise in accurately predicting ED arrivals [2, 8, 9, 11].
We found that the included studies used simple and transparent methods of data preprocessing, and no methods for filling in missing values were described or used. It is possible that, because the data are metadata routinely collected during patient care, there is no missing data, as might be the case in other types of modeling studies [56]. We recommend that authors describe the completeness of the data used in future work to achieve better transparency.
TSF has developed multiple methodological approaches to the problem, especially during the COVID-19 pandemic [57, 58]. The current state of the art has been identified in a comprehensive review by Kolambe et al. [59]. Recently, this field has also seen many new programming libraries and commercial solutions entering the market, for example Nixtla, Kumo AI, Sulie, Chronos and Neural Prophet [12, 60–63]. In our systematic review, we identified 3 studies that used an off-the-shelf library, Facebook Prophet [24, 36, 40], which was already out of date at the time of publication, having been updated with Neural Prophet [63].
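For orientation, a minimal sketch of the off-the-shelf Prophet workflow follows. The synthetic data and the 7-day horizon are assumptions for illustration and do not reproduce any included study's setup.

```python
# Sketch of the Prophet workflow: the input DataFrame must have columns
# "ds" (date) and "y" (value); here the values are synthetic placeholders.
import numpy as np
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=365, freq="D"),
    "y": 120 + np.random.default_rng(0).normal(0, 10, 365),
})

m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
m.fit(df)
future = m.make_future_dataframe(periods=7)  # extend 7 days ahead
forecast = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
print(forecast.tail(7))  # point forecasts with uncertainty bounds
```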
In our analysis, we identified multiple methods in use, from the most classic ones like ARIMA to advanced machine learning and hybrid models. A systematic review from 2009 found that ARIMA, exponential smoothing, regression, moving average and linear regression were the most used [2]. Silva et al. suggested using ARIMA for ED TSF when patterns show strong seasonality and machine learning approaches when there is more randomness [8]. More recent reviews identified a wide range of methods, from statistical time series methods (i.e., ARIMA, SARIMA, MSARIMA) and regression methods (multiple linear regression, Poisson regression, quadratic models) to more novel ones using artificial intelligence (i.e., artificial neural networks) and hybrid models (i.e., ARIMA-LR, ARIMA-ANN) [2, 8, 9, 11]. Due to differences in the reporting of accuracy (already pointed out by Silva et al. and Wargon et al. [2, 8]), we cannot make a recommendation on which methods should be tested first and which hold the most promise for future research. Silva et al. have already recommended starting with ARIMA [8]. We, however, suggest starting with one model from each of the forecasting families identified: from the statistical methods we recommend ARIMA, from the machine learning methods we suggest neural networks (artificial neural network, ANN, or deep neural network, DNN) and from the hybrid methods gradient-boosted machines, as sketched below. From these starting points, future studies should have a good baseline against which they can compare methods and choose the best-performing ones.
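A minimal sketch of this suggested baseline follows: one model per family (ARIMA, a small neural network, and gradient boosting), trained on the same lag features and scored on a chronological hold-out set. The synthetic series, the lag construction and the hyperparameters are our own assumptions, not drawn from any included study.

```python
# Sketch: one baseline model from each identified method family, compared
# by MAE on a chronological hold-out set (synthetic data, assumed settings).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
y = 120 + 15 * np.sin(2 * np.pi * np.arange(730) / 7) + rng.normal(0, 8, 730)

# Lag features (previous 7 days) for the ML models.
lags = 7
X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
target = y[lags:]
cut = int(len(target) * 0.8)  # chronological 80/20 split

preds = {}
# Statistical family: ARIMA fit on the training portion only.
preds["ARIMA"] = ARIMA(y[:cut + lags], order=(1, 0, 1)).fit().forecast(len(target) - cut)
# Machine learning family (small neural net) and hybrid/ensemble family (GBM).
for name, model in [
    ("ANN", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
    ("GBM", GradientBoostingRegressor(random_state=0)),
]:
    preds[name] = model.fit(X[:cut], target[:cut]).predict(X[cut:])

for name, p in preds.items():
    print(name, "MAE:", round(float(np.mean(np.abs(target[cut:] - p))), 2))
```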
Reporting guidelines have been created to ensure, among other things, minimization of the risk of bias, comparability and reproducibility [64]. This is the second review to address bias in analyzed studies of ED TSF; one earlier review approached bias, but did so in an unvalidated way [9]. One of the most important hubs for research reporting guidelines is the EQUATOR Network [65]. In modeling, the most frequently cited guideline is the TRIPOD statement [66], and a version of it has also been used for reporting this systematic review [13]. In the selected studies, none of the authors adhered to any of the existing reporting guidelines, obstructing reproducibility and dampening the usefulness of their research. To assess bias we adhered to the TRIPOD-SRMA guidelines [13]. We identified 11 manuscripts as having a high risk of bias. With no TRIPOD statement in the included manuscripts, it is also difficult to extract the information required for a systematic review, as the wording is not uniform across different research groups.
We found that almost none of the reported results included confidence or prediction intervals next to predicted values. This is an important missing feature, as it leaves forecast end-users without a reference for how accurate the forecast is and what edge conditions could occur. One method that could be used to quantify such uncertainty is conformal prediction [67]. This would also build trust with users, as they would be able to better interpret the forecasts [67].
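A minimal sketch of split conformal prediction, assuming a held-out calibration window of point forecasts and observations (hypothetical inputs) and exchangeable residuals, could look as follows:

```python
# Hedged sketch: split conformal prediction intervals around point forecasts.
import numpy as np

def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Return (lower, upper) bounds covering ~(1 - alpha) of future observations."""
    residuals = np.abs(cal_true - cal_pred)       # absolute calibration errors
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration errors.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return test_pred - q, test_pred + q
```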
Different platforms have emerged to compare models, drive community collaboration and speed up development; examples are Kaggle and Hugging Face [68, 69]. These platforms host competitions with a standard dataset and a baseline forecasting model, on which different users and research teams build and try to improve prediction scores [70]. Since 1982, a worldwide time series competition, the "M competition", has been organized, comparing different TSF methods on real-world data [71]. During the COVID-19 pandemic, two websites specifically dedicated to the challenge of COVID-19 TSF were created, separately for the European Union and the United States of America [57, 58]. Our research found that the majority of the included studies neither made their model open for use by others nor performed external validation. Furthermore, only two studies made their development data easily available [23, 25] and one partially available [25]. We could not achieve the secondary aim of creating a scoreboard and finding the best-performing model, due to the varied ways of reporting model performance. We found that RMSE, MSE, MAPE, DM, precision and Pearson correlation were used, but no single measure was used throughout the body of research, making comparisons impossible, although RMSE and MAPE were used most frequently. We suggest adding the Weighted Interval Score (WIS), which was also used when comparing COVID-19 forecasting methods. WIS suits ensemble comparisons well; in the COVID-19 hubs, model performance was compared against a naive baseline whose prediction equals the last known value [58]. We additionally suggest using validated open-source libraries to carry out evaluation, such as scores, published by the Australian Bureau of Meteorology [72], instead of writing new validation code.
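For concreteness, a hedged numpy sketch of the metrics discussed (RMSE, MAPE, and the interval score that WIS aggregates across several central intervals together with the absolute error of the median) might look like this:

```python
# Hedged sketch: common point-forecast metrics plus the interval score used by WIS.
import numpy as np

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100  # assumes no zero observations

def interval_score(y, lower, upper, alpha):
    """Width plus (2/alpha)-weighted penalties for misses outside the (1-alpha) interval;
    WIS averages this over several alpha levels together with |y - median|."""
    return ((upper - lower)
            + (2 / alpha) * np.maximum(lower - y, 0)
            + (2 / alpha) * np.maximum(y - upper, 0))
```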
In our analysis, most of the studies were both type 1b and 2b according to TRIPOD 2015, some were only type 1b or 2b, and one was also type 3 [22]. It is important to note that none of the studies were type 4 (external validation on a previously published dataset). Because of this, model results can be overly optimistic, which reduces the reproducibility and generalizability of the models [22, 73]. Our findings are in line with the general trend in medicine, where the vast majority of developed models are never externally validated [73]. External validation is a crucial step for models to enter clinical practice and meaningfully affect patient care [73]. We recommend that a standard dataset be published in such a way that all ED TSF researchers can easily externally validate their models.
External variables
External variables have been shown to improve TSF models in more recent systematic reviews of ED TSF [2, 8, 9, 11]. Previous reviews identified calendar dates, weather, air quality, website traffic, influenza levels, socioeconomic data, triage-level data, patient information and the Monday effect as external variables used [2, 8, 9, 11]. We found 10 different thematic areas of external variables. To the existing weather, calendar, air quality, hospital website and other potential variables (i.e., socioeconomic indicators, triage level, influenza level) we add social networks (e.g., X, formerly Twitter), Google search trends, HIS data, calls to the non-acute medical care phone number, prehospital teams' announcements of arrivals, and experienced local staff knowledge of busy holidays. Use of call center data was unique to Asheim et al. and Perry et al. [49, 51]. Currently, the usefulness of this variable in other countries is questionable, as it requires a high level of digitalization and health policy supporting the underlying systems, especially the non-acute phone number. A good candidate for incorporating call center data would be Finland with its 116117 helpline [74]. An external variable never used before is experienced local staff knowledge, used to help identify the holidays that drive the most traffic to the ED [39]. A good mix of external variables can probably be established from all of the gathered examples and is probably highly dependent on the locale of the model. We also found that too many variables can hinder model accuracy, so testing which combinations work best is an important step in every model development. Authors should check the availability and effect of the above-mentioned variables before conducting research.
We found that weather-related and calendar variables were most commonly used in the final models, which was also established by previous reviews [2, 9, 11]. The novelty of our study is a much wider array of external variables than in earlier reviews [2, 8, 9, 11], with our review finding 10 thematic areas. None of the studies in our sample incorporated socioeconomic indicators, which Jiang et al. suggested may hold value but are under-researched [9]. An interesting addition could also be satellite-based data such as the Normalized Difference Vegetation Index, which was used in conjunction with temperature to predict severe allergic reactions in Greece [75].
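As a hedged illustration of combining these two most common variable groups, the sketch below (hypothetical file and column names) adds calendar and weather regressors to a seasonal statistical model via statsmodels' SARIMAX:

```python
# Hedged sketch: calendar and weather exogenous regressors in a SARIMAX model.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("ed_daily.csv", parse_dates=["date"]).set_index("date")  # hypothetical file
exog = pd.DataFrame({
    "weekend": (df.index.dayofweek >= 5).astype(int),  # calendar: weekend flag
    "month": df.index.month,                           # calendar: coarse seasonality proxy
    "temp_max": df["temp_max"],                        # weather: daily maximum temperature
    "precip_mm": df["precip_mm"],                      # weather: daily precipitation
})
model = SARIMAX(df["arrivals"], exog=exog,
                order=(1, 1, 1), seasonal_order=(1, 0, 1, 7)).fit(disp=False)
```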
It has also been suggested that a meta-analysis of external variables should be carried out to identify the most prudent variables to gather and use [8]. With the current state of the literature in this field, this is still not achievable, as authors mostly do not publish their model weights, do not quantify the predictive value of the external variables used in their models, nor do they make their model code open source to allow external validation and reproducibility of their analyses. We did, however, create a table reporting the forecast horizon and the best-performing model metric for all included studies.
Limitations and strengths
Our literature review is limited to a small sample of articles fitting our screening criteria. Compared to other reviews, we had the third-biggest sample of articles after Jiang et al. [9] and Gul et al. [11], while other reviews had smaller samples [2, 8]. Additionally, our article does not compare different models' forecasting ability, due to the heterogeneity of the dimensions used to report these results. We included research from different continents, which may hinder applicability, as different social, geographical, environmental and cultural determinants might lead to different external variables predicting ED arrivals; there is probably no single set of true answers applicable in all situations. This systematic review should serve as a guide to which variables are worth the most testing effort.
The main strength of this systematic review is adherence to the TRIPOD-SRMA reporting guideline [13], which is intended specifically for systematic reviews of prediction models, together with evaluation of the risk of bias of the included research. We also curated a list of external variables on which TSF for EDs could be based and of the methods most common for these problems. Additionally, we identified directions on which future research could be built. Finally, we found some major shortcomings of current research that should be easy for authors to address in future studies.
Implications for research and future directions
In our review, we identified 18 out of 30 articles that provide future directions for research.
These directions are (I) feature engineering, (II) predicting incoming patient categories, (III) use of social media and internet data, (IV) prospective study design, (V) simulation of policy changes, (VI) studying geographical variation, (VII) using multiple study sites, (VIII) discovering new variables, (IX) creation of hybrid models, (X) external validation of models, (XI) longer prediction horizons and (XII) synthetic data use. It has also been suggested that data pre-processing should be incorporated into the model preparation workflow; although this was not mentioned in the articles within our scope, it is a valid and good suggestion. Additionally, Jiang et al. suggest creating hybrid models, which is in line with our results [9].
Explainable AI (XAI) is an expanding field that emphasizes the importance of understanding why models give one result and not another, rather than only trusting their performance metrics [36]. This was suggested as a future research direction by Asheim et al. [51] (with some of the research already carried out by Susnjak et al. [36]). NeuralProphet, the successor to Facebook Prophet and one of the most widely used forecasting libraries in the field of TSF, is one step in this direction, promising explainable predictions [63]. One such methodology is incorporating SHapley Additive exPlanations (SHAP) into result-reporting workflows [76]. In the field of ED triage it has already been used extensively and has had a positive impact [77]. This type of research is better suited to addressing legislative concerns and gaining the trust of the key stakeholders who are expected to act on the forecasts [78].
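A minimal sketch of this SHAP-based reporting step, assuming a fitted tree-based forecaster such as the gbm and lagged feature matrix X_te from the earlier baseline sketch:

```python
# Hedged sketch: attach SHAP explanations to a tree-based forecaster.
import shap  # pip install shap

explainer = shap.TreeExplainer(gbm)        # fast, exact for tree ensembles
shap_values = explainer.shap_values(X_te)  # per-feature contribution to each forecast
shap.summary_plot(shap_values, X_te,
                  feature_names=[f"lag_{i}" for i in range(7, 0, -1)])
```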
Feature engineering is the practice of transforming input variables to make better use of the machine learning methods applied [79]. Examples of such transformations are Principal Component Analysis (PCA), logarithmic transformations and the Box-Cox transformation [79]. We found that authors used PCA, z-scoring, weighting and normalization. Further research is warranted to find which kind of feature engineering works best for a given TSF methodology. Peláez-Rodríguez et al. [23] mentioned this as a possible direction of future research as well.
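The transformations named above can be sketched in a few lines of Python; the data here are toy values, purely illustrative:

```python
# Hedged sketch: Box-Cox, z-scoring and PCA on a hypothetical exogenous matrix X.
import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.abs(np.random.default_rng(1).normal(10, 3, size=(365, 12))) + 1e-6  # positive toy data

counts_bc, lam = boxcox(X[:, 0])         # Box-Cox: stabilize variance of one positive feature
X_z = StandardScaler().fit_transform(X)  # z-scoring: zero mean, unit variance per feature
X_pca = PCA(n_components=3).fit_transform(X_z)  # PCA: compress correlated exogenous inputs
```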
An important concern about feasibility was raised by Wargon et al. [2]. None of the studies touched on the clinical importance of their models, nor did any model make it through the translational phase into practice. It is not clear whether EDs can actually benefit from ED arrivals prediction. We believe an interview-based study with ED directors from different economic areas (USA, EU, etc.) would be prudent to define which time horizons should be used, what certainty should be assured and which predictions hold actual clinical and managerial value.
We hypothesize that long-term trends in ED arrivals might be more important than exact patient counts over short-term horizons.
Hybrid models should be tested, as they combine the strengths of different methods to overcome the limitations of any single model. It is also important to state that a meta-analysis of the effect of external variables is not feasible at the time of writing, because authors do not make their models open and do not report specific results for each of the external variables tested, which would make it possible to assess which variables across the literature carry the most promise. Finally, we recommend that authors report their accuracy results in metrics that allow easy comparison; in our review we identified RMSE and MAPE as the most used. Additionally, we recommend the use of WIS, which has proven useful in scoring COVID-19 models.
This review serves as a good starting point for creating a TSF model for ED arrivals. Given the gaps identified, especially in the comparability of studies, the authors will create a TSF challenge on one of the aforementioned platforms and prepare a dataset that can be used by all authors for external validation of their models, as no such TSF challenge or dataset currently exists. Silva et al. have already recommended that at least 3 years of data be used for ED TSF research [8], but not all research groups have access to such datasets. The creation of this challenge will democratize the research field and allow other researchers to externally validate their models, making it possible to advance the field and actually translate findings and models into real-world use. We also recommend further exploration of external variables, especially socioeconomic ones, which are under-researched and could improve prediction models.
Conclusions
TSF is feasible for use in EDs. Our results emphasize the need for better-reported model development and validation results, to better understand the role of the external variables used in created models, as well as for more uniform reporting of results between research groups and external validation of created models. We believe that open science initiatives and stricter journal requirements for reporting will allow such studies to be conducted in the near future. We recommend that authors make their models and results as open as possible in order to advance the field of TSF in EDs.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Acknowledgements
This systematic review is part of the author’s (LP) PhD candidacy. Mentors of the PhD candidate are TA and JŽ. The mentor for review writing is AK.
Abbreviations
- AI
Artificial Intelligence
- ANN
Artificial Neural Network
- AQI
Air Quality Index
- ARMA
Autoregressive–Moving-Average
- ARIMA
Autoregressive integrated moving average
- ARIMA-LR
Autoregressive integrated moving average – linear regression
- ARIMAX
AutoRegressive Integrated Moving Average + exogenous variables
- AUC
Area Under Curve
- SARIMA
Seasonal Autoregressive Integrated Moving Average
- SARIMAX
Seasonal Autoregressive Integrated Moving Average + exogenous variables
- BMJ
British Medical Journal
- CINAHL
Cumulative Index to Nursing and Allied Health Literature
- COPD
Chronic Obstructive Pulmonary Disease
- COVID-19
Coronavirus disease 2019
- DM
Diebold-Mariano test
- DNN
Deep Neural Network
- EBSCO
EBSCO Information Services
- ED
Emergency department
- EMBASE
Excerpta Medica Database
- EWS ED
Early Warning System for ED attendance predictions
- EWMA
Exponentially Weighted Moving Average
- FE
Feature engineering
- FSA
Forward Sortation Area
- GA
Genetic Algorithm
- GP
General Population
- HIS
Hospital Information System
- HRI NACRS ED
National Ambulatory Care Reporting System
- IEEE
Institute of Electrical and Electronics Engineers
- KPI
Key Performance Indicator
- LASSO
Least absolute shrinkage and selection operator
- LR
Linear regression
- LSTM
Long short-term memory networks
- MAE
Mean Absolute Error
- MAPE
Mean Absolute Percentage Error
- ML
Machine Learning
- MLR
Multiple Linear Regression
- MSE
Mean Square Error
- NA
Not Applicable
- PCA
Principal Component Analysis
- PM
Particulate Matter
- PRISMA
Preferred Reporting Items for Systematic Reviews and Meta-Analyses
- PROBAST
Prediction model Risk Of Bias ASsessment Tool
- RBF
Radial basis function
- RMSE
Root mean square error
- RSV
Respiratory syncytial virus
- SFC
Surface Pressure Chart
- SHAP
SHapley Additive exPlanations
- SR
Systematic review
- SRMA
Systematic review and meta-analysis
- SVM
Support Vector Machines
- TRIPOD
Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis
- TSF
Time series forecasting
- USA
United States of America
- UKHSA
United Kingdom Health Security Agency
- VARIMA
Vector Autoregressive Integrating Moving Average
- WIS
Weighted Interval Score
Author contribution
LP: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Software, Visualization, Writing – Original Draft, Writing – Review & Editing. KG: Article Screening, Data Curation, Investigation, Writing – Review & Editing. JŽ: Conceptualization, Methodology, Writing – Review & Editing. TA: Writing – Review & Editing. AK: Writing – Original Draft, Writing – Review & Editing
Funding
This work was financed in part by P3-0429 (SLORApro).
Data availability
All data generated or analysed during this study are included in this published article and its supplementary information files.
Declarations
Ethical approval
IRB approval was not necessary for this type of study.
Use of AI
Generative AI was not used in the creation of this manuscript or any part of the analysis.
Patient and public involvement
No patients or public were involved in this study.
Consent for publication
Not applicable
Competing interest
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Garattini L, Badinella Martini M, Nobili A. Urgent and emergency care in Europe: time for a rational network? European Journal of Internal Medicine. 2024;120:1–2.
- 2. Wargon M, Guidet B, Hoang TD, Hejblum G. A systematic review of models for forecasting the number of emergency department visits. Emergency Medicine Journal. 2009;26(6):395–99.
- 3. Petrino R, Tuunainen E, Bruzzone G, Garcia-Castrillo L. Patient safety in emergency departments: a problem for health care systems? An international survey. European Journal of Emergency Medicine. 2023;30(4):280–86.
- 4. He X. A survey on time series forecasting. In: Patnaik S, Kountchev R, Tai Y, Kountcheva R, editors. 3D Imaging—Multidimensional Signal Processing and Deep Learning. Singapore: Springer Nature Singapore; 2023. p. 13–23. (Smart Innovation, Systems and Technologies; vol. 348). Available from: https://link.springer.com/10.1007/978-981-99-1145-5_2
- 5. Rahimi I, Chen F, Gandomi AH. A review on COVID-19 forecasting models. Neural Comput & Applic. 2023;35(33):23671–81.
- 6. Torres JF, Hadjout D, Sebaa A, Martínez-Álvarez F, Troncoso A. Deep learning for time series forecasting: a survey. Big Data. 2021;9(1):3–21.
- 7. Gao S, Koker T, Queen O, Hartvigsen T, Tsiligkaridis T, Zitnik M. UniTS: building a unified time series model. arXiv preprint arXiv:2403.00131. 2024.
- 8. Silva E, Pereira MF, Vieira JT, Ferreira-Coimbra J, Henriques M, Rodrigues NF. Predicting hospital emergency department visits accurately: a systematic review. Int J Health Plann Manage. 2023;38(4):904–17.
- 9. Jiang S, Liu Q, Ding B. A systematic review of the modelling of patient arrivals in emergency departments. Quant Imaging Med Surg. 2023;13(3):1957–71.
- 10. Tandberg D, Qualls C. Time series forecasts of emergency department patient volume, length of stay, and acuity. Ann Emerg Med. 1994;23(2):299–306.
- 11. Gul M, Celik E. An exhaustive review and analysis on applications of statistical forecasting in hospital emergency departments. Health Systems. 2020;9(4):263–84.
- 12. Nixtla [Internet]. [cited 2024 Nov 28]. Available from: https://www.nixtla.io/about
- 13. Snell KIE, Levis B, Damen JAA, Dhiman P, Debray TPA, Hooft L, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ. 2023;e073538.
- 14. White J. PubMed 2.0. Medical Reference Services Quarterly. 2020;39(4):382–87.
- 15. Scopus preview – Welcome to Scopus [Internet]. [cited 2024 Dec 8]. Available from: https://www.scopus.com/standard/marketing.uri
- 16. Clarivate. Web of Science [Internet]. 2025. Available from: https://mjl.clarivate.com/search-results
- 17. EBSCO. CINAHL [Internet]. Available from: https://www.ebsco.com/products/research-databases/cinahl-database
- 18. Embase [Internet]. Available from: https://www.embase.com/landing?status=grey
- 19. Forbes C, Greenwood H, Carter M, Clark J. Automation of duplicate record detection for systematic reviews: Deduplicator. Syst Rev. 2024;13(1):206.
- 20. Haddaway NR, Page MJ, Pritchard CC, McGuinness LA. PRISMA2020: an R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and Open Synthesis. Campbell Systematic Reviews. 2022;18(2):e1230.
- 21. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51.
- 22. Collins GS, Reitsma JB, Altman DG, Moons K. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. 2015;13(1):1.
- 23. Peláez-Rodríguez C, Torres-López R, Pérez-Aracil J, López-Laguna N, Sánchez-Rodríguez S, Salcedo-Sanz S. An explainable machine learning approach for hospital emergency department visits forecasting using continuous training and multi-model regression. Comput Methods Programs Biomed. 2024;245:108033.
- 24. Gafni-Pappas G, Khan M. Predicting daily emergency department visits using machine learning could increase accuracy. Am J Emerg Med. 2023;65:5–11.
- 25. Tuominen J, Lomio F, Oksala N, Palomäki A, Peltonen J, Huttunen H, et al. Forecasting daily emergency department arrivals using high-dimensional multivariate data: a feature selection approach. BMC Med Inform Decis Mak. 2022;22(1):134.
- 26. Fan B, Peng J, Guo H, Gu H, Xu K, Wu T. Accurate forecasting of emergency department arrivals with internet search index and machine learning models: model development and performance evaluation. JMIR Med Inform. 2022;10(7):e34504.
- 27. Peng J, Chen C, Zhou M, Xie X, Zhou Y, Luo CH. Peak outpatient and emergency department visit forecasting for patients with chronic respiratory diseases using machine learning methods: retrospective cohort study. JMIR Med Inform. 2020;8(3):e13075.
- 28. Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. Journal of the American Medical Informatics Association. 2019;26(12):1574–83.
- 29. Jiang S, Chin KS, Tsui KL. A universal deep learning approach for modeling the flow of patients under different severities. Computer Methods and Programs in Biomedicine. 2018;154:191–203.
- 30. Ram S, Zhang W, Williams M, Pengetnze Y. Predicting asthma-related emergency department visits using big data. IEEE Journal of Biomedical and Health Informatics. 2015;19(4):1216–23.
- 31. Perry AG, Korenberg MJ, Hall GG, Moore KM. Modeling and syndromic surveillance for estimating weather-induced heat-related illness. Journal of Environmental and Public Health. 2011;2011. Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-79959259852&doi=10.1155/2011/750236&partnerID=40
- 32. Álvarez-Chaves H, Maseda-Zurdo I, Muñoz P, R-Moreno MD. Evaluating the impact of exogenous variables for patients forecasting in an emergency department using attention neural networks. Expert Systems with Applications. 2024;240:122496.
- 33. Morbey RA, Todkill D, Moura P, Tollinton L, Charlett A, Watson C, et al. Using machine learning to forecast peak health care service demand in real-time during the 2022–23 winter season: a pilot in England, UK. PLoS ONE. 2025;20(1). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216322193&doi=10.1371/journal.pone.0292829&partnerID=40&md5=bde55b03f0bb565e7b03cb6436b2b155
- 34. Lim KH, Nguyen FNHL, Cheong RWL, Tan XGY, Pasupathy Y, Toh SC, et al. Enhancing emergency department management: a data-driven approach to detect and predict surge persistence. Healthcare (Basel). 2024;12(17). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203694230&doi=10.3390/healthcare12171751&partnerID=40&md5=e1ed5738a97ed385aece30f0358b8626
- 35. Porto BM, Fogliatto FS. Enhanced forecasting of emergency department patient arrivals using feature engineering approach and machine learning. BMC Med Informatics Decis Mak. 2024;24(1). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85212442074&doi=10.1186/s12911-024-02788-6&partnerID=40&md5=0c5aeba7d430214e82d5511771f039f2
- 36. Susnjak T, Maddigan P. Forecasting patient demand at urgent care clinics using explainable machine learning. CAAI Trans Intell Technol. 2023;8(3):712–33.
- 37. Petsis S, Karamanou A, Kalampokis E, Tarabanis K. Forecasting and explaining emergency department visits in a public hospital. J Intell Inform Syst. 2022;59(2):479–500.
- 38. Sharafat AR, Bayati M. PatientFlowNet: a deep learning approach to patient flow prediction in emergency departments. IEEE Access. 2021;9:45552–61.
- 39. Vollmer MAC, Glampson B, Mellan T, Mishra S, Mercuri L, Costello C, et al. A unified machine learning approach to time series forecasting applied to demand at emergency departments. BMC Emerg Med. 2021;21(1). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85100128286&doi=10.1186/s12873-020-00395-y&partnerID=40&md5=49f7b87f13acd0c88d35ad999dc24bec
- 40. Duarte D, Walshaw C, Ramesh N. A comparison of time-series predictions for healthcare emergency department indicators and the impact of COVID-19. Appl Sci. 2021;11(8). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85104712585&doi=10.3390/app11083561&partnerID=40&md5=5bd2028d09307a1fcf47dcf1da4ec7c4
- 41. Yousefi M, Yousefi M, Fathi M, Fogliatto FS. Patient visit forecasting in an emergency department using a deep neural network approach. Kybernetes. 2020;49(9):2335–48.
- 42. Martin LJ, Dong H, Liu Q, Talbot J, Qiu W, Yasui Y. Predicting influenza-like illness-related emergency department visits by modelling spatio-temporal syndromic surveillance data. Epidemiol Infect. 2019;147:e312.
- 43. Ho AFW, Tan BZYS, Koh JM, Cheong KH. Forecasting hospital emergency department patient volume using internet search data. IEEE Access. 2019;7:93387–95.
- 44. Ekström A, Nordberg M, Eriksson O. Shorter waiting time, better emergency healthcare: forecasting Stockholm's emergency department visits. Model Assisted Stat Appl. 2018;13(4):377–85.
- 45. Calegari R, Fogliatto FS, Lucini FR, Neyeloff J, Kuchenbecker RS, Schaan BD. Forecasting daily volume and acuity of patients in the emergency department. Comp Math Methods Med. 2016;2016. Available from: https://www.embase.com/search/results?subaction=viewrecord&id=L612579331&from=export
- 46. Aroua A, Abdul-Nour G. Forecast emergency room visits – a major diagnostic categories based approach. Int J Metrol Qual Eng. 2015;6(2). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84940374052&doi=10.1051/ijmqe/2015011&partnerID=40&md5=7214a3ba8284ee9d66d1a3416f014f69
- 47. Zlotnik A, Gallardo-Antolín A, Alfaro MC, Pérez MCP, Martínez JMM. Emergency department visit forecasting and dynamic nursing staff allocation using machine learning techniques with readily available open-source software. CIN Comput Informatics Nurs. 2015;33(8):368–77.
- 48. Aboagye-Sarfo P, Mai Q, Sanfilippo F, Preen D, Stewart L, Fatovich D. A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in Western Australia. Journal of Biomedical Informatics. 2015;57:62–73.
- 49. Perry AG, Moore KM, Levesque LE, Pickett CWL, Korenberg MJ. A comparison of methods for forecasting emergency department visits for respiratory illness using Telehealth Ontario calls. Canadian Journal of Public Health. 2010;101(6):464–69.
- 50. Jones S, Evans R, Allen T, Thomas A, Haug P, Welch S, et al. A multivariate time series approach to modeling and forecasting demand in the emergency department. Journal of Biomedical Informatics. 2009;42(1):123–39.
- 51. Asheim A, Bjørnsen LP, Næss-Pleym LE, Uleberg O, Dale J, Nilsen SM. Real-time forecasting of emergency department arrivals using prehospital data. BMC Emergency Medicine. 2019;19(1). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85070364349&doi=10.1186/s12873-019-0256-z&partnerID=40&md5=e25ecc7773b4435b45b4e1a6270dc53b
- 52. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;n71.
- 53. Ho AFW, Tan BZYS, Koh JM, Cheong KH. Forecasting hospital emergency department patient volume using internet search data. IEEE Access. 2019;7. Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85073890301&doi=10.1109/ACCESS.2019.2928122&partnerID=40&md5=3c85ee49ab38c1035d66dabbb46523d9
- 54. Taylor SJ, Letham B. Forecasting at scale [Internet]. 2017 [cited 2024 Dec 7]. Available from: https://peerj.com/preprints/3190v2
- 55. Time series forecasting and emergency department – Search Results – PubMed [Internet]. [cited 2024 Nov 27]. Available from: https://pubmed.ncbi.nlm.nih.gov/?term=time%20series%20forecasting%20and%20emergency%20department&filter=pubt.clinicaltrial&filter=pubt.controlledclinicaltrial&filter=pubt.guideline&filter=pubt.meta-analysis&filter=pubt.multicenterstudy&filter=pubt.observationalstudy&filter=pubt.practiceguideline&filter=pubt.preprint&filter=pubt.randomizedcontrolledtrial&filter=pubt.review&filter=pubt.systematicreview&filter=pubt.validationstudy&timeline=expanded
- 56. Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. Journal of Biomedical Informatics. 2013;46(5):830–36.
- 57. Cramer EY, Huang Y, Wang Y, Ray EL, Cornell M, Bracher J, et al. The United States COVID-19 Forecast Hub dataset [Internet]. 2021 [cited 2024 Nov 28]. Available from: http://medrxiv.org/lookup/doi/10.1101/2021.11.04.21265886
- 58. Sherratt K, Gruson H, Grah R, Johnson H, Niehus R, Prasse B, et al. Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations. eLife. 2023;12:e81916.
- 59. Kolambe M, Arora S. Forecasting the future: a comprehensive review of time series prediction techniques. Journal of Electrical Systems. 2024;20(2s):575–86.
- 60. Forecasting and Anomaly Detection for Businesses – Kumo [Internet]. [cited 2024 Nov 28]. Available from: https://kumo.ai/solutions/forecasting-and-anomaly-detection/
- 61. Sulie [Internet]. [cited 2024 Nov 28]. Available from: https://sulie.co/
- 62. Ansari AF, Stella L, Turkmen C, Zhang X, Mercado P, Shen H, et al. Chronos: learning the language of time series [Internet]. arXiv; 2024 [cited 2024 Nov 29]. Available from: https://arxiv.org/abs/2403.07815
- 63. Triebe O, Hewamalage H, Pilyugina P, Laptev N, Bergmeir C, Rajagopal R. NeuralProphet: explainable forecasting at scale [Internet]. arXiv; 2021 [cited 2024 Nov 28]. Available from: https://arxiv.org/abs/2111.15397
- 64. Nicholls SG, Langan SM, Benchimol EI, Moher D. Reporting transparency: making the ethical mandate explicit. BMC Med. 2016;14(1):44.
- 65. Richards D. The EQUATOR network and website. Evid Based Dent. 2007;8(4):117.
- 66. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;e078378.
- 67. Manokhin V, Sudjianto A. Practical guide to applied conformal prediction in Python: learn and apply the best uncertainty frameworks to your industry applications. Packt Publishing; 2023.
- 68. Hugging Face – the AI community building the future [Internet]. 2024 [cited 2024 Nov 28]. Available from: https://huggingface.co/
- 69. Kaggle: your machine learning and data science community [Internet]. [cited 2024 Nov 28]. Available from: https://www.kaggle.com/
- 70. Bojer CS, Meldgaard JP. Kaggle forecasting competitions: an overlooked learning opportunity. International Journal of Forecasting. 2021;37(2):587–603.
- 71. Makridakis S, Spiliotis E, Assimakopoulos V. M5 accuracy competition: results, findings, and conclusions. International Journal of Forecasting. 2022;38(4):1346–64.
- 72. Leeuwenburg T, Loveday N, Ebert EE, Cook H, Khanarmuei M, Taggart RJ, et al. scores: a Python package for verifying and evaluating models and predictions with xarray. JOSS. 2024;9(99):6889.
- 73. Ramspek CL, Jager KJ, Dekker FW, Zoccali C, Van Diepen M. External validation of prognostic models: what, why, how, when and where? Clinical Kidney Journal. 2021;14(1):49–58.
- 74. Medical Helpline 116117 [Internet]. [cited 2025 Mar 13]. Available from: https://116117.fi/en
- 75. Notas G, Bariotakis M, Kalogrias V, Andrianaki M, Azariadis K, Kampouri E, et al. Accurate prediction of severe allergic reactions by a small set of environmental parameters (NDVI, temperature). PLoS ONE. 2015;10(3):e0121475.
- 76. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2017. Available from: https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- 77. Porto BM. Improving triage performance in emergency departments using machine learning and natural language processing: a systematic review. BMC Emerg Med. 2024;24(1):219.
- 78. Shin D. The effects of explainability and causability on perception, trust, and acceptance: implications for explainable AI. International Journal of Human-Computer Studies. 2021;146:102551.
- 79. Verdonck T, Baesens B, Óskarsdóttir M, Vanden Broucke S. Special issue on feature engineering editorial. Mach Learn. 2024;113(7):3917–28.