A novel machine learning approach for spatiotemporal prediction of EMS events: A case study from Barranquilla, Colombia

Dionicio Neira-Rodado; Juan Camilo Paz-Roa; John Willmer Escobar; Miguel Ángel Ortiz-Barrios

doi:10.1016/j.heliyon.2025.e41904

Heliyon. 2025 Jan 13;11(2):e41904. doi: 10.1016/j.heliyon.2025.e41904


PMCID: PMC11786830 PMID: 39897906

Abstract

Anticipating the timing and location of future emergency calls is crucial for making informed decisions about vehicle location and relocation, ultimately reducing response times and enhancing service quality. A predictive model for EMS (Emergency Medical Services) events is proposed to address this need. The proposed spatiotemporal approach integrates machine learning, signal analysis, and statistical features, capturing geographical, temporal, and event-specific factors. The model identifies patterns associated with the occurrence or absence of emergency calls, using clustering techniques for demand spatial splitting and then training an XGBoost model on the multivariate time series. The model uses signal analysis to extract valuable insights from time-series data, enhancing understanding of temporal patterns, while statistical features enhance predictive capabilities. Principal Component Analysis (PCA) enhances convergence and integrates diverse time series features. As a result, this novel integrated approach improves the estimation of spatiotemporal probabilities of emergency events, effectively addressing data sparsity challenges. This framework adapts effectively, predicting EMS zones and guiding system configuration. The model outperforms a Random Forest trained solely on time-series data, boosting accuracy by up to 26.9 % in Barranquilla's case study zones, with a mean improvement of 16.4 %. Accuracy improvement makes the model helpful in assisting city authorities in ambulance location/relocation and dispatching decisions.

Keywords: PCA (principal component analysis), Clustering, XGBoost, Signal processing, Statistical features, Spatiotemporal classification, EMS, Demand forecast

Highlights

• Calls prediction using machine learning, signal analysis, and statistical features.
• Demand spatial division with clustering techniques.
• Time series PCA integrating diverse features.
• Data sparsity handling.
• Mean 16.4 % increase in performance in specific zones.

1. Introduction

The effective coordination of EMS is critical for ensuring timely responses to various emergencies, from diseases to traffic accidents and cardiorespiratory arrests. This emergency variety implies that EMS systems must usually coordinate first responders, crisis management at regulatory centers, pre-hospital and emergency services, transport modalities, hospital care, educational programs, and surveillance processes.

The economic implications of inefficient EMS systems are substantial, with cardiovascular diseases incurring costs exceeding 900 billion euros (945 billion USD) annually worldwide [1]. At the same time, population pyramids are inverting, raising concerns about the financial sustainability of these systems and potentially leading to tax increases. Many medical emergencies, particularly out-of-hospital cardiac arrests and cardiovascular events, are highly time-sensitive [2], and response time is critical because of the narrow 4-min window for preventing irreversible neurological damage [1,3]. EMS systems must also deal with road accidents, whose casualties impose a substantial economic burden on most nations, equivalent to 3 % of their GDP [1,4]. Moreover, media reports highlight prevalent issues within current EMS systems, underscoring significant deficiencies in emergency response. This situation contributes to an elevated mortality rate, deteriorates individuals' quality of life, and escalates healthcare system costs [5].

EMS management involves multiple stages: forecasting demand, deciding the number of vehicles, locating and relocating emergency vehicles, and choosing an ambulance dispatching policy. Forecasting spatiotemporal emergency calls is the foundational step that sets the stage for subsequent planning and operational strategies. Configuring these systems presents substantial challenges due to the significant spatiotemporal variability inherent in emergency incidents. Addressing this demand variability requires developing and applying suitable, tailored forecasting techniques.

Accurately predicting the timing, location, and severity of emergency calls can improve EMS system performance by up to 90 % [6]. Given the relevance of accurate forecasting to EMS outcomes, we address this topic by proposing a novel integration of clustering analysis, weather features, signal processing features, and multivariate time series. The dimension of this multivariate time series was reduced with a time-series PCA that retains the main contribution of the different variables while reducing convergence time. The integration uses the XGBoost algorithm to obtain fast convergence and more accurate predictions. The prediction was treated as a binary one (presence or absence of emergency calls in a time window in a city zone). In this sense, our research aims to contribute to two specific issues: (1) EMS predictive capacity and (2) the interpretability of factors affecting emergency call occurrences.

Understanding the factors influencing emergency call occurrences is essential for dynamic system configuration. However, achieving this understanding poses a challenge. To overcome it, we adopted an interpretable approach that sheds light on the relationships between call patterns and contextual variables such as time of day, weather conditions, and geographical features. By improving interpretability, EMS administrators gain the flexibility to adjust system settings based on evolving conditions, ensuring optimal responsiveness. In addition, conventional zoning methods often rely on predefined grids, which may limit pattern identification. In response, we propose a zoning strategy based on clustering techniques that enhances predictive performance and effectively guides resource allocation. Finally, EMS demand patterns evolve over time, requiring frequent model updates to maintain accuracy. Given its short convergence times, we relied on XGBoost to address this challenge [7,8]. This agility ensures timely adaptation to changing demand trends, enabling the system to maintain accurate predictions aligned with real-world dynamics.

To summarize, our proposed spatiotemporal forecasting framework estimates the probability of occurrence of emergency events in the different zones in the upcoming time slots, considering current and past system events. It integrates statistical, time series, geographical, weather, and signal analysis features, reducing feature complexity by 90 % using PCA. By grouping demand into zones and integrating various analytical techniques, the model achieves a mean 16.4 % improvement in spatiotemporal prediction accuracy. Additionally, it addresses the challenge of zero values in spatiotemporal divisions, outperforming traditional forecasting approaches like Artificial Neural Networks (ANN) [9,10]. We chose XGBoost because it outperforms ANN in computing time and accuracy [7,8] and because, unlike ARIMA, it can handle multivariate time series [11]. The results are helpful for practitioners and researchers, since the integrated model offers a new pathway for forecasting spatiotemporal EMS demand and improving location decisions.

This paper is organized as follows: Section 1 introduces the topic, gives a general description of the model, and highlights its main contributions. Section 2 presents the literature review, focusing on different forecasting approaches for EMS. Section 3 describes the proposed approach, explaining each of the steps followed to build it. The results are analyzed in Section 4, and Section 5 contains the study's conclusions.

2. Literature review

EMS agencies face the ongoing challenge of efficiently allocating ambulances and medical personnel to ensure optimal geographic coverage while minimizing response times. EMS managers and dispatchers meticulously analyze the distribution of incoming call requests (demand) to meet these objectives [12,13]. They establish resource deployment plans that outline the necessary number of ambulances and emergency response personnel for future periods. These deployment plans dictate personnel's daily and hourly work-shift schedules based on historical demand data and forecasts. This strategic approach is essential for enhancing emergency response efficiency and meeting the dynamic demands of the healthcare system [12,13,14]. Historically, the majority of studies have focused on the time of day as the sole dimension and have generated forecasts for broad geographic areas such as entire counties or cities. Nevertheless, demand may fluctuate temporally and spatially based on the time and day of the week. This requires dispatchers to redistribute (redeploy) their ambulances to different locations throughout the day to compensate for spatiotemporal demand fluctuations [9,12,13,15]. Industry and academic researchers have conducted numerous studies on novel deployment strategies and associated staffing plans that reduce response time or maximize service coverage, among other performance measures. These redeployment models depend on call volume forecasts distributed by space and time as inputs [9,12,13,15].

This literature review focuses on spatiotemporal EMS demand prediction to determine gaps in addressing this problem. In this sense, it is essential to highlight that the earliest works related to emergency medical services demand forecasting appeared in the 1970s. Kamenetzky et al. [16] found that 4 of 27 variables explained 92 % of the variations in demand for the regions. On the other hand, many early studies focused on ambulance demand modeling and analyzed how socioeconomic attributes of a geographic region correlated with ambulance demand using linear models. While these methods are of interest in understanding the causes of ambulance demand, they are not directly relevant to short-term demand prediction. Additionally, although contextual data have demonstrated their efficacy in reducing errors in demand forecasts, they are usually not available.

EMS demand forecasting has often relied on linear models at the strategic level. While regression and time series models overlook the occurrence's location, they complement each other well. Regression models stand out in explaining demand behavior using diverse contextual data, while time series models are proficient in revealing temporal variations in demand. In contrast, spatiotemporal models simultaneously consider time and location in their predictions.

In this sense, a spatiotemporal weighted kernel density estimator was used to predict hourly EMS demand in Melbourne [10]. The model successfully represents a map's hourly spatial demand density while having fast computing times. Nicoletta et al. [17] explored a Bayesian model considering a 2-h time window to predict EMS call volumes. Although the approach yielded a global MAE of 0.145, no information was provided on its computational complexity. On the other hand, Zhou and Matteson [10] showed that spatiotemporal forecasts must often cope with data sparsity, which makes it difficult to train a model. In their experiments, sparsity was a challenge even for strong learners such as ANN, which were outperformed by a naive predictor that always outputs zero.
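To make the sparsity issue concrete, the following minimal Python sketch computes the accuracy of a naive classifier that always predicts "no call". The bin values are hypothetical and are not taken from any of the cited studies; the function name is an illustrative assumption.

```python
def always_zero_accuracy(labels):
    """Accuracy of a naive classifier that predicts 'no call' (0) for every bin."""
    if not labels:
        raise ValueError("labels must not be empty")
    zeros = sum(1 for y in labels if y == 0)
    return zeros / len(labels)


# Hypothetical sparse series: 1 = at least one call in the bin, 0 = none.
bins = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(f"always-zero accuracy: {always_zero_accuracy(bins):.2f}")
```

When most bins are empty, this baseline is already high, so any learned model has to beat it to be useful, which is why sparse spatiotemporal bins are hard to learn from.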

Other studies focused on forecasting EMS demand using time series models. For example, a version of the Holt–Winters exponential smoothing method and a parameter optimization model were developed to generate daily forecasts of ambulance demand [18]. On the other hand, Brown et al. [19] determined that the System Status Management (SSM) framework yielded a 19 % MAPE in forecasting EMS calls. Similarly, Channouf et al. [20] applied a standard regression model, a regression model configured with correlated residuals, and a double-seasonal ARIMA model to produce daily call volume forecasts. The authors found that each model generated relatively accurate daily and hourly forecasts.

Setzler et al. [21] found that ANN moderately outperformed the moving average forecasts at certain spatial divisions and time slots. The authors used 1-h and 3-h time slots and 2 × 2-mile and 4 × 4-mile grids. Chen et al. [22] further explored the importance of incorporating the spatial component of demand. When comparing the performance of ANN, moving average, linear regression, and support vector regression to predict daily and sub-daily EMS call volume, they obtained MAPE values ranging between 23.01 % and 60.56 %.

Vile et al. [23] introduced Singular Spectrum Analysis (SSA) for ambulance demand predictions and found that SSA could produce short-term forecasts (7–14 days) with accuracies equivalent to ARIMA and Holt-Winters models. Additionally, SSA demonstrated better accuracy over extended periods (21–24 days).

Additionally, Grekousis and Liu [24] developed a predictive model for ambulance demand using artificial intelligence, incorporating spatial machine learning algorithms to identify high-probability areas for emergency events in Athens, Greece. Their study highlights the effectiveness of AI techniques in forecasting precise event locations, emphasizing the importance of integrating spatiotemporal data for enhanced EMS planning. The model utilized a time window of months, relying solely on time-series data for predictions. The methodology analyzed the distances between accidents in each time window to achieve this, generating spatial patterns that informed the predictions. This approach demonstrated robust support for neural networks in EMS modeling.

Similarly, Grekousis et al. [25] analyzed high-risk emergency areas in Athens, Greece, utilizing GIS and neural networks to identify patterns of incident occurrence. This study also employed time windows spanning weeks to months, focusing on spatial and temporal patterns of emergency events. By analyzing spatial distances between incidents over time, the study leveraged neural networks to provide valuable insights into spatial risk factors, showcasing the utility of geospatial analysis in emergency management.

Photis et al. [26] proposed a comprehensive methodology for emergency response planning in Athens, Greece, combining GIS with fuzzy logic and neural networks. Their approach predicted spatial demand patterns and optimally allocated emergency resources by examining incident distances over successive time windows to infer spatial relationships. Like the other studies, it relied on time-series data and monthly time windows, further supporting the application of neural networks in emergency management scenarios.

Together, these studies underscore the potential of integrating AI, GIS, and spatiotemporal modeling to enhance emergency response systems by forecasting demand and optimizing resource allocation. They also illustrate the importance of using distances between incidents as a critical variable in predictive modeling, highlighting its role in improving the accuracy of spatial and temporal forecasts.

The models described above were explicitly employed in EMS demand forecasting across strategic, tactical, and operational levels. Many of them gave spatially aggregated daily forecasts. Others considered demand spatial distribution, and the last group also incorporated hourly or time-window demand to make spatiotemporal predictions. Furthermore, the contextual variables utilized by various authors cannot be universally applied to all EMS systems, as each country possesses unique characteristics.

Although EMS forecasting has received considerable attention in the literature, there has been limited exploration of spatiotemporal analysis and even less consideration of simultaneous spatiotemporal and multivariate analyses. Consequently, we reviewed studies on spatiotemporal taxi demand prediction to identify potential approaches applicable to EMS demand forecasting. This review was motivated by the shared objective of addressing spatiotemporal transportation demand and optimizing vehicle location and relocation strategies to respond to demand effectively. Like ambulances, taxi requests also face stochastic demand behavior, requiring swift forecast updates. In this sense, it is relevant to highlight the work of Veloso et al. [27], who used a naive Bayesian conditional probabilistic model to predict the next pick-up area. Their model used time of day, day of the week, weather conditions, last drop-off location, and point of interest as the given factors (independent variables) for this model. They concluded that the location of the previous drop-off is the most influential factor, and the weather condition is the least significant factor in their prediction model.

Machine learning methods have also gained significant relevance for demand forecasting in the context of taxi demand. This is attributed to advancements in data storage, computational speed, and the capability to gather data from diverse sources swiftly. Regarding these machine-learning approaches, Shalev-Shwartz and Ben-David [28] discussed that these techniques are preferred over traditional methods when the problem is too complex or needs dynamic adaptivity. In this sense, several studies have developed decision tree approaches to forecast taxi demand [29,30,31,32,33,34].

On the other hand, different clustering approaches have been used to detect areas with a higher probability of taxi demand [35,36]. Davis et al. [37] and Wei et al. [38] used clustering and time series models to predict taxi demand. This hybrid approach allowed them to achieve an improvement of 20 % in MAPE.

Nevertheless, each method may perform well only for some specific time or area [38,39]. This situation encouraged researchers to tackle the problem with hybrid models combining different ML methods to improve performance. Random forest is one of the most used techniques to ensemble other approaches [29,31,32,40]. Nevertheless, these approaches face data scarcity, which makes them more difficult to train adequately. When data is available, processing such a vast amount of it is time-consuming, and, depending on the technique used, the factors considered in the model are difficult to interpret.

Finally, it is relevant to note that EMS call demand prediction is challenging since most spatiotemporal bins contain zeroes. This creates a poor signal-to-noise ratio, which causes difficulties during model training [9,10]. A summary of the main contributions found in the different studies can be found in Table 1.

Table 1.

Main contributions in the fields of EMS and taxi forecasting.

Study	Context	Classification	Regression: time series	Regression: causal forecast	Features: socio-demographic	Features: weather	Dimensionality reduction	Time horizon	Spatial partition	City
Martin et al. [12]	EMS calls			Multi-Layer Perceptron		X	Boruta Technique (Wrapper)	Hourly	K-Means	Mecklenburg County, North Carolina, USA
Abreu et al. [13]	EMS calls	Call type - Multinomial regression		Multi-Layer Perceptron	X	X	Grid Search	Hourly	Political divisions	Portugal
Payares-Garcia et al. [15]	Criminal emergency calls		Log-Gaussian Cox process					Weekly	Customized partition	Valencia, Spain
Nabarro et al. [9]	EMS calls		Gaussian Process Regression					Hourly	Customized partition	Western Cape, South Africa
Kamenetzky et al. [16]	EMS calls			Second order regression	X				Political divisions	Southwestern Pennsylvania, USA
Photis and Grekousis [26]	EMS events		Multi-Layer Perceptron					Monthly	Exact location	Athens
Grekousis and Liu [24]	EMS events		Multi-Layer Perceptron					Weekly	Exact location	Athens
Grekousis and Photis [25]	EMS events		Multi-Layer Perceptron					Monthly	Exact location	Athens
Zhou and Matteson [10]	EMS calls		The spatiotemporal kernel warping method					Hourly	Customized partition	Melbourne, Australia
Nicoletta et al. [17]	EMS calls			Bayesian model	X			2-h	Customized partition	Montreal-Quebec, Canada
Baker and Fitzpatrick [18]	EMS calls		Winter's exponential smoothing					Daily	Political divisions	North Carolina, USA
Tandberg et al. [41]	EMS calls		ARIMA - Exponential smoothing					Hourly		Albuquerque, USA
Brown et al. [19]	EMS calls		Exponential smoothing					Hourly		different USA cities
Channouf et al. [20]	EMS calls		ARIMA - Multinomial distribution					Hourly		Calgary, Canada
Setzler et al. [21]	EMS calls		Artificial Neural Network					Hourly Two-hour Three-hour	4 x 4 mile 2 x 2 mile divisions	Mecklenburg County, North Carolina, USA
Chen et al. [22]	EMS calls		Artificial Neural Network and Support Vector Regressor					Three-hour	3 × 3 km divisions	Taipei, Taiwan
Vile et al. [23]	EMS calls		Singular Spectrum Analysis					Shift (8-h)		Wales, UK
Saadi et al. [29]	Taxi requests		Artificial Neural Network, different CART tools					Ten-minute	Political divisions	China
Moreira-Matias et al. [30]	Taxi requests		incremental ARIMA					Real-time		Porto, Odivelas and Loures. Portugal.
Rahaman et al. [32]	Taxi passenger queue			SVM, Random Forest, kNN, Decision Tree	X	X		Hourly	JFK airport	New York, USA
Xu [33]	Taxi requests		SVM, Random Forest, kNN, Decision Tree					Hourly	Selected NYC locations	New York, USA
Zhang et al. [34]	The gap between taxi requests and taxi offer			Double Ensemble Gradient Boosting Decision Tree				Ten-minute	Peking University	Beijing, China
Chang et al. [36]	Taxi requests			k-means		X		Hourly	K-Means	Taipei, Taiwan
Laviolette et al. [35]	Taxi requests			k-means	X	X		Hourly	K-Means	Montreal, Canada
Davis et al. [37]	Taxi travels		Multi-level clustering, Holt-Winters					Hourly	5 × 5 km divisions	Bengaluru, India
Wei et al. [38]	Taxi demand	Spatial predictor	Ensemble predictor, combining the results of time predictor and spatial predictor.					Hourly	Customized partitions	China
Laha and Putatunda [39]	Taxi demand		Ensembled Spherical-Spherical Regression and Multivariate Multiple Regression					Hourly	Customized partitions	Different benchmark datasets
Ke et al. [31]	Taxi requests			Fusion convolutional long short-term memory network		X		Hourly	5 × 5 km divisions	Hangzhou, China


It is relevant to note that taxi demand forecasting and EMS event prediction are grounded in different theoretical frameworks. However, both fields leverage shared methodological approaches, such as spatiotemporal clustering, time-series modeling, and machine learning techniques, to predict event distribution and frequency. While taxi demand forecasting focuses on discretionary travel behavior, often influenced by socioeconomic and environmental factors, EMS event prediction targets critical, life-saving interventions that require accounting for stochastic and usually life-threatening events. Although the methodologies overlap, EMS prediction requires additional considerations such as response time optimization and equitable resource allocation.

Considering the reviewed research on spatiotemporal forecasting of taxi and EMS requests, this paper proposes an EMS forecasting approach that builds on some of the contributions summarized in Table 1 and adds novel elements to help decision-makers locate and relocate ambulances dynamically. The proposed integrated model is intended to help improve response times, coverage, and quality of service. In this sense, the main gaps identified in this review are:

• Zoning criteria are often subjective grid divisions rather than analytic ones, which can impair pattern detection and, consequently, the accuracy of the spatiotemporal forecast.
• Lack of multivariate time series approaches.
• Need to improve model accuracy or reduce forecasting errors.
• Need for models with short computing times, since real-time applications require frequent retraining to keep the model updated.
• Lack of models that estimate the spatiotemporal probability of the next incident occurrence instead of the expected number of calls.

To fill these gaps, we propose a machine learning approach to estimate the likelihood of emergency calls within a 3-h timeframe, offering practical and actionable insights across space and time. Our proposed approach considers current and past system status across different variables in the time series to make its predictions. This approach can help decision-makers adapt system configurations over time. We introduce new features rooted in statistical analysis and signal processing techniques, detailed in the methods section, while employing PCA to streamline training.
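As an illustration of the binary target described above (presence or absence of at least one call per zone and 3-h window), the following Python sketch bins hypothetical event timestamps into eight daily slots. It is a sketch under stated assumptions, not the authors' implementation, which was written in R. The function names are hypothetical, and the simple `hour // 3` binning is an assumption that differs slightly at slot boundaries from the 00:01–03:00 convention listed in Table 2.

```python
from datetime import datetime

SLOT_HOURS = 3  # window length stated in the text


def slot_key(ts):
    """Map a timestamp to (date, slot index 1..8) using hour // 3 binning (assumption)."""
    return ts.date(), ts.hour // SLOT_HOURS + 1


def presence_by_slot(events):
    """events: iterable of (zone, datetime). Marks every (zone, date, slot) with >= 1 call."""
    presence = {}
    for zone, ts in events:
        day, slot = slot_key(ts)
        presence[(zone, day, slot)] = 1
    return presence


def label_for(presence, zone, day, slot):
    """Binary target: 1 if any call fell in the bin, 0 otherwise."""
    return presence.get((zone, day, slot), 0)


# Hypothetical events: two calls in zone 1 early in the morning, one in zone 2 late at night.
events = [
    (1, datetime(2021, 1, 1, 2, 30)),
    (1, datetime(2021, 1, 1, 2, 45)),
    (2, datetime(2021, 1, 1, 23, 59)),
]
presence = presence_by_slot(events)
day = datetime(2021, 1, 1).date()
print(label_for(presence, 1, day, 1), label_for(presence, 1, day, 2))
```

Several calls in the same bin collapse to a single positive label, which is what turns the count series into a classification target.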

In summary, the main contributions of our model to these gaps are:

Enhanced Predictive Accuracy: Our methodology uses features based on statistical, signal, geographical, weather, and time series analyses to better capture variability. PCA also served this purpose, retrieving the main contribution of the original variables to obtain the transformed variables on which an XGBoost algorithm is trained. This helps to reduce noise in the model. Our approach demonstrates its potential to provide more accurate predictions in practical applications, thereby enhancing EMS responsiveness. XGBoost was used considering its short computing times and better accuracy [7,8].

Interpretability through Principal Component Analysis (PCA): Contrary to approaches found in the literature, our research employs PCA to improve the interpretability of time series data. To the best of our knowledge, PCA has not yet been applied to multivariate time series in the reviewed literature. By adopting PCA, we reduce feature complexity by up to 90 % without compromising the model's predictive performance, offering a more accessible and efficient approach to understanding EMS data.

Construction of a Multivariate Time Series: Using PCA and including lagged variables as independent variables allowed us to generate a multivariate time series to capture variability better, helping to obtain better accuracy measures.

Addressing Spatiotemporal Data Sparsity: We tackle the challenge of data sparsity inherent in spatiotemporal analyses through a comprehensive model that surpasses traditional forecasting methods. By incorporating a diverse set of features (statistical, geographical, meteorological, and signal processing elements), our model achieves a mean 16.4 % improvement in the accuracy of spatiotemporal predictions. For this purpose, it was also essential to use XGBoost, considering the poor performance reported by Nabarro et al. [9] and Zhou and Matteson [10] when using ARIMA and ANN to deal with data sparsity.

Additionally, our framework predicts the spatial likelihood of emergency incidents within forthcoming time slots, leveraging a combination of historical and real-time data across various dimensions. This is a step towards real-time prediction: the goal is to obtain the probability of receiving the next call in each zone so that ambulances can be repositioned, increasing system preparedness.
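The combination of lagged variables and PCA mentioned among the contributions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the call series is hypothetical, the function names are illustrative, only the first principal component is extracted, and power iteration stands in for whatever eigen-solver the original code used.

```python
import math


def make_lagged_rows(series, n_lags):
    """Build rows holding the n_lags previous values (most recent first) and the value to predict."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append([float(series[t - k]) for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return features, targets


def covariance(rows):
    """Sample covariance matrix of the feature columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=1000):
    """Leading eigenvalue and eigenvector of a covariance matrix via power iteration."""
    d = len(cov)
    # Non-uniform start vector; the iteration stalls only if the start is
    # exactly orthogonal to the leading direction.
    v = [float(i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


# Hypothetical presence/absence series for one zone (1 = call in the 3-h slot).
calls = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
X, y = make_lagged_rows(calls, n_lags=3)
cov = covariance(X)
eigenvalue, direction = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"share of variance captured by the first component: {explained:.2f}")
```

In practice, several components would be kept until a chosen share of variance is retained, and the projected features, rather than the raw lags, would be passed to the classifier.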

This study contributes to the EMS call occurrence prediction field and offers practical insights for optimizing EMS operations and enhancing preparedness and resource utilization in healthcare settings. Our comprehensive analysis and innovative approach contribute to knowledge in EMS forecasting, paving the way for improved emergency medical responses and patient care.

3. Methodology

Data on traffic accidents in the city of Barranquilla are used as a case study to build the proposed approach. The approach is explained in the context of making a spatiotemporal prediction of EMS calls due to traffic accidents. Forecasting Barranquilla's EMS calls follows the methodology shown in Fig. 1.

• The EMS traffic accident calls dataset was obtained from Barranquilla's mayor's office and consists of 40,818 events, each with corresponding addresses and timestamps. Data was gathered automatically through an information system connected to hospitals and ambulances, implemented in 2018. However, early adoption challenges led some data to be manually keyed by authorities, potentially affecting timestamps and locations. These challenges were resolved by 2020, but due to the exceptional nature of the pandemic in that year, only data from January 1, 2021, onwards was considered for training and testing the model.

To ensure data quality, authorities downloaded records directly from their information system. The research team verified the dataset for completeness, focusing on timestamps and locations. Missing location data was supplemented using Bing Maps® and GPS Visualizer® to convert addresses into latitude and longitude. After preprocessing, no missing or inconsistent data were found in the final dataset used for analysis.

• Three-hour time slots were created to enhance temporal and spatial analysis, reflecting the total events within each slot. This approach, guided by data sparsity, literature findings, and input from EMS authorities, was vital in developing the spatiotemporal forecasting model.
• Spatiotemporal descriptive analysis was carried out to explore the dataset.
• The Colombian Institute of Hydrology, Meteorology, and Environmental Studies (IDEAM) supplied hourly rainfall data, in millimeters, for the city of Barranquilla for the period. This information comes from a single meteorological station, located at the city airport, south of Barranquilla. No missing data was found in the time series.
• The final dataset was obtained by integrating the accidents and rainfall datasets in 3-h time slots. This task and the following steps were implemented in R®.
• Determining city zones by performing k-means clustering, which groups points by geographic location so that each zone contains events with similar behavior. This zoning approach allows a more nuanced understanding of spatiotemporal dynamics and contributes to better predictions.

K-means clustering is used to make a better zoning decision and to allow a more granular analysis when building the time series for prediction. Zoning was not performed with honeycomb or rectangular grids, because clustering has performed better in zoning decisions for spatial time series demand forecasting [37]. According to Davis et al. [37], incorporating clustering reduces noise and improves model accuracy by grouping similar zones, which leads to better generalization and lower prediction errors. The number of clusters (k) was chosen with the elbow method, ensuring a balance between within-cluster variance and interpretability.

This process contributes to a more accurate prediction by training the algorithm for each zone's distinct patterns, enabling the spatiotemporal forecasting model to provide more precise predictions for emergency call hotspots [35,36,38].
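The zoning step, k-means followed by the elbow method, can be sketched in Python as follows. This is an illustrative sketch, not the authors' R code: the coordinates are hypothetical, they are treated as planar points (an assumption that ignores the curvature of latitude/longitude), and the function names are illustrative.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct observations
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


def elbow_curve(points, max_k):
    """Within-cluster sum of squares for k = 1..max_k; the elbow is read off this curve."""
    return {k: kmeans(points, k)[2] for k in range(1, max_k + 1)}


# Hypothetical event coordinates forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for k, wcss in elbow_curve(points, 4).items():
    print(k, round(wcss, 3))
```

The value of k is chosen where the curve stops dropping sharply; each resulting cluster then defines one zone with its own time series.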

•
The dataset containing EMS calls and rainfall in the 3-h time window will be divided into the number of clusters obtained in the previous step.

The following steps will be followed for all the individual datasets obtained in the previous step.

•
Restructuring datasets. Because some events belong to another cluster, the corresponding time slots are missing from the remaining cluster datasets. All datasets obtained in the previous step were therefore reprocessed to fill these missing slots with 0 and guarantee time series continuity.
•
Calculate new features from the original ones. All calculated features are described in Table 2.
•
Performing training and testing of selected classifiers. The classifiers used in this case were Random Forest and XGBoost. For the training set, the data used was for the full year 2021 and January to April 2022. For testing purposes, the month of May 2022 was used. Dimensionality reduction techniques were used to identify the optimal model dimensionality. In particular, a wrapper and Principal Component Analysis (PCA) were used. Given the high variability in the data, particularly in spatiotemporal slots, it is crucial that the model can be rapidly trained for quick updates and enhanced sensitivity to emerging trends or influences in the spatiotemporal series. This approach aims to increase the model's capability to provide more accurate forecasts with short computing times.
•
Evaluation and analysis of the results of the different tests, considering runtimes and accuracy of predictions, using different statistical tests.
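
The slot aggregation, per-zone splitting, and zero-filling steps listed above can be sketched as follows. This is a simplified Python illustration (the study itself used R). The function names, the tuple layout, and the slot boundaries (00:00, 03:00, ... rather than the 00:01–03:00 convention of Table 2) are assumptions made for brevity.

```python
from collections import Counter
from datetime import datetime, timedelta

def slot_start(ts):
    """Floor a timestamp to the start of its 3-hour slot (assumed boundaries)."""
    return ts.replace(hour=ts.hour - ts.hour % 3, minute=0, second=0, microsecond=0)

def zone_series(calls, zone, start, end):
    """Build a continuous 3-h series for one zone.

    calls: list of (timestamp, zone_id) pairs.
    Returns a list of (slot_start, n_calls, call_flag); slots in which the
    zone received no call are filled with 0 so the series has no gaps.
    """
    counts = Counter(slot_start(ts) for ts, z in calls if z == zone)
    series = []
    t = slot_start(start)
    while t < end:
        n = counts.get(t, 0)
        series.append((t, n, int(n > 0)))
        t += timedelta(hours=3)
    return series

# Hypothetical calls: two in zone 1 and one in zone 2 on the same day.
calls = [
    (datetime(2022, 5, 1, 1, 10), 1),
    (datetime(2022, 5, 1, 1, 50), 1),
    (datetime(2022, 5, 1, 7, 0), 2),
]
start, end = datetime(2022, 5, 1), datetime(2022, 5, 2)
zone1 = zone_series(calls, 1, start, end)
zone2 = zone_series(calls, 2, start, end)
print(len(zone1), zone1[0][1:], zone2[2][1:])
```

Each zone ends up with the same number of slots, which is what allows the lagged and co-occurrence features of Table 2 to be aligned across zones.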

Table 2.

Variables calculated in each cluster dataset.

Variable Name	Description	Variable type	Variable Family
Call	Response variable.	Categorical	Variable Family
Zone_X_Hour	This variable indicates the time slot.	Categorical	Time series
	Time slot 1 - (00:01–03:00)
	Time slot 2 - (03:01–06:00)
	Time slot 3 - (06:01–09:00)
	Time slot 4 - (09:01–12:00)
	Time slot 5 - (12:01–15:00)
	Time slot 6 - (15:01–18:00)
	Time slot 7 - (18:01–21:00)
	Time slot 8 - (21:01–24:00)
Zone_X_Weekday	This variable indicates the day of the week. (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)	Categorical	Time series
Zone_X_Monthday	This variable indicates the day of the month. It takes values from 1 to 31; depending on the month, it can go up to 28 or 30.	Categorical	Time series
Zone_X_Month	This variable indicates the month within the year. (January, February, March, April, May, June, July, August, September, October, November, and December).	Categorical	Time series
Zone_X_Year	Year. In this case, it starts in 2015 and finishes in 2022. In the end, only the years 2021 and 2022 were used.	Categorical	Time series
Zone_X_NumCallslag_i	Number of calls in lag i in Zone X. The number of lags considered was 40 for all cases. This means that there are 40 variables of this type. Zone X is the Zone (Cluster) in which the prediction will be made. (The lags refer to the previous time slots).	Numeric	Time series
Zone_X_Callslag_i	This categorical (Dichotomic) variable takes the value one if there were one or more calls in lag i in Zone X. For all cases, the number of lags considered was 40. This means that there are 40 variables of this type. Zone X is the Zone (Cluster) in which the prediction will be made. (The lags refer to the previous time slots).	Categorical	Time series
Zone_X_Rainfalllag_i	Amount of rainfall in millimeters in lag i in Zone X. The number of lags considered was 40 for all cases. This means that there are 40 variables of this type. Zone X is the Zone (Cluster) in which the prediction will be made. (The lags refer to the previous time slots).	Numeric	Time series – weather
Zone_X_Rainlag_i	This categorical (Dichotomic) variable takes the value one if there were one or more millimeters of rain in lag i in Zone X. For all cases, the number of lags considered was 40. This means that there are 40 variables of this type. Zone X is the Zone (Cluster) in which the prediction will be made. (The lags refer to the previous time slots).	Categorical	Time series – weather
Zone_Y_NumCallslag_i	Number of calls in lag i in Zone Y. For all cases, the number of lags considered was 40. This means that there are 40 variables of this type. Zone Y is adjacent to Zone X, the Zone (Cluster) in which the prediction will be made. These 40 variables will repeat for each adjacent Zone. (The lags refer to the previous time slots).	Numeric	Time series – Cooccurrence
Zone_Y_Callslag_i	This categorical (Dichotomic) variable takes the value one if there were one or more calls in lag i in Zone Y. For all cases, the number of lags considered was 40. This means that there are 40 variables of this type. Zone Y is adjacent to Zone X, the Zone (Cluster) in which the prediction will be made. These 40 variables will repeat for each adjacent Zone. (The lags refer to the previous time slots).	Categorical	Time series – Cooccurrence
Zone X ARIMA Forecast NumCall	This corresponds to ARIMA's forecast for the number of calls for period t in Zone X. ARIMA was implemented using the package forecast from RStudio ®.	Numeric	Forecast variable
Zone X ARIMA Forecast Rainfall	This corresponds to ARIMA's rainfall forecast for period t in Zone X. ARIMA was implemented using the package forecast from RStudio ®.	Numeric	Forecast variable
Zone X n-lag moving Average Rainfall	The rainfall moving average of previous lags was calculated using n lags in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Statistical
Zone X n-lag moving Average Number of Calls	The number of calls moving average of previous lags was calculated using n lags in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Statistical
Zone X n-lag moving median Rainfall	Rainfall moving median of previous lags was calculated using n lags in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Statistical
Zone X n-lag moving median Number of Calls	Number of calls moving median of previous lags calculated using n lags in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Statistical
Zone X Number of time slots without Calls	This variable indicates the time slots required to return to find a Call. This is calculated for each time slot.	Numeric	Statistical
Zone X n-lag Rainfall skewness	Skewness of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The calculation was made using the following formulation: $sk_{n} = \frac{1}{n\sigma^{3}} \sum_{i=2}^{n} \left(\mathrm{Rainfall}_{i} - \mu_{n_{lag}\,\mathrm{Rainfall}}\right)^{3}$	Numeric	Statistical
Zone X n-lag Rainfall kurtosis	Kurtosis of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The calculation was made using the following formulation: $kurt_{n} = \frac{1}{n\sigma^{4}} \sum_{i=2}^{n} \left(\mathrm{Rainfall}_{i} - \mu_{n_{lag}\,\mathrm{Rainfall}}\right)^{4}$	Numeric	Statistical
Zone X n-lag Number of Calls skewness	Skewness of the n-lag data series for Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The calculation was made using the following formulation: $sk_{n} = \frac{1}{n\sigma^{3}} \sum_{i=2}^{n} \left(\mathrm{NumCalls}_{i} - \mu_{n_{lag}\,\mathrm{NumCalls}}\right)^{3}$	Numeric	Statistical
Zone X n-lag Number of Calls kurtosis	Kurtosis of the n-lag data series for Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The calculation was made using the following formulation: $kurt_{n} = \frac{1}{n\sigma^{4}} \sum_{i=2}^{n} \left(\mathrm{NumCalls}_{i} - \mu_{n_{lag}\,\mathrm{NumCalls}}\right)^{4}$	Numeric	Statistical
Zone X n-lag Rainfall entropy	Entropy of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). This was implemented using the package ForeCA in RStudio ®.	Numeric	Signal processing
Zone X n-lag Rainfall Power Spectral Density	Power Spectral Density of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The calculation is based on evaluating the Fast Fourier Transform (FFT) in the n-lag slot. It was implemented using the FFT and autocorrelation function (ACF) routines from the package stats in RStudio ®.	Numeric	Signal processing
Zone X n-lag Rainfall RMS	RMS of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). This calculation is performed using the following formulation: $\mathrm{RMS} = \sqrt{\frac{\sum_{i=2}^{n_{lag}} \mathrm{Rainfalllag}_{i}^{2}}{\text{Number of lags}}}$	Numeric	Signal processing
Zone X n-lag Rainfall Trapezoidal Function	The trapezoidal rule of the n-lag data series for Rainfall in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Signal processing
Zone X n-lag Number of Calls entropy	Entropy of the n-lag data series for Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). This was implemented using the package ForeCA in RStudio ®.	Numeric	Signal processing
Zone X n-lag Number of Calls Power Spectral Density	Power Spectral Density of the n-lag data series for Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). The estimate is based on evaluating the Fast Fourier Transform (FFT) in the n-lag slot. It was implemented using the FFT and autocorrelation function (ACF) routines from the package stats in RStudio ®.	Numeric	Signal processing
Zone X n-lag Number of Calls RMS	RMS of the n-lag data series for Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots). This calculation is performed using the following formulation: $\mathrm{RMS} = \sqrt{\frac{\sum_{i=2}^{n_{lag}} \mathrm{NumCallslag}_{i}^{2}}{\text{Number of lags}}}$	Numeric	Signal processing
Zone X n-lag Number of Calls Trapezoidal Function	The trapezoidal rule of the n-lag data series for the Number of Calls in Zone X. This calculation was done until n = 40. (The lags refer to the previous time slots).	Numeric	Signal processing
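
Several of the statistical and signal-processing features in Table 2 can be computed from a window of lagged values with a few lines of code. The following Python sketch is illustrative only (the authors used R). It assumes a population standard deviation and sums over every value in the window, whereas the formulas in the table start their sums at i = 2; the function names and the example series are hypothetical.

```python
import math
from statistics import mean, median

def lag_window(series, t, n_lags=40):
    """Values of the n_lags slots preceding index t (oldest first)."""
    return series[max(0, t - n_lags):t]

def window_features(x):
    """Moving mean/median, skewness, kurtosis, RMS, and trapezoidal area of x."""
    n = len(x)
    mu = mean(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # population std (assumed)
    if sigma > 0:
        skew = sum((v - mu) ** 3 for v in x) / (n * sigma ** 3)
        kurt = sum((v - mu) ** 4 for v in x) / (n * sigma ** 4)
    else:
        skew = kurt = 0.0   # constant window: moments are undefined, use 0
    rms = math.sqrt(sum(v * v for v in x) / n)
    # Trapezoidal rule with unit spacing between consecutive slots.
    trapezoid = sum((a + b) / 2 for a, b in zip(x, x[1:]))
    return {"mean": mu, "median": median(x), "skewness": skew,
            "kurtosis": kurt, "rms": rms, "trapezoid": trapezoid}

# Hypothetical number-of-calls series for one zone (one value per 3-h slot).
calls = [0, 1, 0, 2, 1, 0, 0, 3]
features = window_features(lag_window(calls, t=5, n_lags=5))
print(features)
```

Repeating the call for n = 2, ..., 40 and for both the rainfall and number-of-calls series would give one feature per window length, which is how the table describes the n-lag families.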


XGBoost (eXtreme Gradient Boosting) was selected to perform the prediction because it is an implementation of gradient-boosted decision trees designed for speed and performance. It is versatile and scalable and has become popular for a wide range of predictive modeling tasks [8,42]. The method works by gradient boosting: it adds new models sequentially, each correcting the errors made by its predecessors [8,42]. Notably, XGBoost includes regularization terms in its objective function, which curb overfitting and make it more robust to noise [8,42].

On the other hand, Random Forest is an ensemble learning method used for classification and regression tasks, which operates by constructing multiple decision trees during the training phase and outputting the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random Forest combines the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy [43]. Additionally, the algorithm performs implicit feature selection and provides indicators of the importance of each feature, making it easier to identify significant variables.

In addition, PCA was used to reduce dimensionality and convert the time series into a multivariate time series. PCA is a dimensionality reduction technique that involves constructing new variables (components) that are linear combinations of the original variables. Importantly, these components are uncorrelated with each other, ensuring that they capture unique information even when the original variables are highly correlated. Each principal component is designed to capture as much variance (information) as possible, with the first component explaining the most variance, followed by the second, and so on. This ordering by importance makes it possible to reduce the dataset's complexity by keeping only the first few components, which explain the most variation, thus retaining most of the original information [44].

4. Results

The city faces some inefficiencies in EMS operations, such as locating ambulances without any analytical criteria, which leads to high response times. Additionally, the number of available ambulances is reduced during weekends, which increases response times because each ambulance needs to cover a wider area. Implementing the methodology described in the previous section yielded the following results.

4.1. Descriptive analysis

This exploration was necessary for the initial understanding of the process. Fig. 2 shows the heatmap corresponding to the EMS traffic accident calls during the full time series in Barranquilla. It is important to point out that this study examines the city's accident-prone areas using a database of accident reports spanning 2015 to May 2022. Barranquilla lies in the Atlántico region's northeastern corner, on the Magdalena River's western bank, 7.5 km from its mouth at the Caribbean Sea in Colombia. It has an area of 154 km². The city's population is 1,274,250, making it the fourth most populated city in the country [3].

The heatmap illustrates the spatial distribution of emergency calls, with red areas indicating hotspots of high call frequency. This visualization highlights the city's northern sector as a key area with denser emergency activity. The heatmap, created using OpenStreetMap and the leaflet package from RStudio®, serves as an initial exploration to demonstrate the presence of spatial patterns in the data.

Understanding these spatial patterns is essential for improving emergency call prediction and informing ambulance relocation policies. Additionally, capturing the temporal dynamics of incidents is critical for developing effective predictive models. To this end, an analysis focused on time and days of the week was performed, using a 3-h time window. This time frame was chosen based on its alignment with similar studies [10,17], the behavior of the data, and input from EMS authorities.

While the heatmap provides an overview of spatial distribution, zoning the calls using clustering techniques offers finer spatial pattern recognition than standard geometric zoning [37,38]. Clustering adapts to the underlying data, grouping areas with similar behaviors and improving the ability to capture spatial patterns. This data-driven zoning approach enhances the predictive accuracy [37,38] of the proposed model by effectively integrating spatial and temporal dimensions, ultimately supporting better decision-making for ambulance deployment.

The initial exploration of the data also revealed some patterns in the time series. Fig. 3 shows that the number of calls varies between workdays and weekends and across time slots. These patterns are essential because they can help decision-makers identify how many ambulances the system needs to guarantee a good quality of service. The number of calls is higher during daytime hours on workdays. On weekends, the number of calls increases during the night-time slots. These behaviors are related to workday rush hours and weekend entertainment hours. This analysis was essential to determine that time slots and days of the week should be within the set of independent variables for the forecasting model.

A cluster analysis was also conducted to better predict demand considering time series and spatial variability. From the elbow method (Fig. 4), the recommended number of clusters is between 4 and 6. Therefore, the model was built and evaluated using these three options (4, 5, and 6 demand clusters). The distribution of demand clusters can be observed in Fig. 5, Fig. 6, Fig. 7.

Once the three cluster divisions were identified, the features in Table 2 were calculated for each cluster dataset. This yielded 15 datasets (4 + 5 + 6), since the cluster divisions considered were four, five, and six. Because the process aims to predict a dichotomous variable, ARIMA and Poisson regression were discarded, and tree-based machine learning algorithms were considered. Random Forest was the first option. A wrapper was implemented to identify the most relevant features for subsequent training. However, the wrapper took almost 2 h and 25 min per dataset to obtain the most relevant features, and the accuracy obtained with this approach was very low. The results obtained in each case can be observed in Table 3.

Table 3.

Random forest performance.

Random Forest	Balanced Accuracy
Zone_1_of_4	0.567
Zone_1_of_5	0.5615
Zone_1_of_6	0.5659
Zone_2_of_4	0.5019
Zone_2_of_5	0.4979
Zone_2_of_6	0.5345
Zone_3_of_4	0.6249
Zone_3_of_5	0.551
Zone_3_of_6	0.501
Zone_4_of_4	0.5437
Zone_4_of_5	0.5287
Zone_4_of_6	0.5207
Zone_5_of_5	0.5303
Zone_5_of_6	0.5636
Zone_6_of_6	0.5449
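
Table 3 reports balanced accuracy, which averages the recall of each class and is therefore informative when slots without calls greatly outnumber slots with calls. The short Python sketch below illustrates the standard definition; the example labels are invented and do not come from the study data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Invented example: 8 slots without a call, 2 slots with a call.
y_true = [0] * 8 + [1] * 2
always_no_call = [0] * 10            # a classifier that never predicts a call
plain_accuracy = sum(t == p for t, p in zip(y_true, always_no_call)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, always_no_call))
```

A classifier that never predicts a call reaches a plain accuracy of 0.8 here but a balanced accuracy of only 0.5, which is why values close to 0.5 in Table 3 indicate performance near chance level.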


Considering the poor performance and long computing times obtained with Random Forest, with balanced accuracy close to 0.5 in most cases, XGBoost was considered as an alternative to reduce computing times and improve accuracy. Balanced accuracy results obtained with XGBoost can be found in Table 4. Although computing times were reduced to 50 min per model, accuracies were not statistically significantly different: comparing mean accuracy between XGBoost and Random Forest gave a p-value of 0.9718.
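
The comparison of mean balanced accuracy can be retraced from the values in Tables 3 and 4. The text does not state which test was used, so the Python sketch below assumes a two-sample Welch t statistic. It computes only the means and the statistic, since the standard library has no t-distribution from which to obtain the p-value.

```python
import math
from statistics import mean, variance

# Balanced accuracy per zone, in the row order of Tables 3 and 4.
random_forest = [0.567, 0.5615, 0.5659, 0.5019, 0.4979, 0.5345, 0.6249, 0.551,
                 0.501, 0.5437, 0.5287, 0.5207, 0.5303, 0.5636, 0.5449]
xgboost = [0.5875, 0.5474, 0.5702, 0.5312, 0.522, 0.5196, 0.5503, 0.5033,
           0.5124, 0.5193, 0.5248, 0.5381, 0.5666, 0.5832, 0.5558]

def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

mean_rf, mean_xgb = mean(random_forest), mean(xgboost)
t_stat = welch_t(random_forest, xgboost)
print(round(mean_rf, 4), round(mean_xgb, 4), round(t_stat, 3))
```

The two means differ by well under one hundredth and the statistic is close to zero, which is consistent with the reported lack of a significant difference between the two classifiers at this stage.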

Table 4.

XGBoost performance.

XGBoost	Balanced Accuracy
Zone_1_of_4	0.5875
Zone_1_of_5	0.5474
Zone_1_of_6	0.5702
Zone_2_of_4	0.5312
Zone_2_of_5	0.522
Zone_2_of_6	0.5196
Zone_3_of_4	0.5503
Zone_3_of_5	0.5033
Zone_3_of_6	0.5124
Zone_4_of_4	0.5193
Zone_4_of_5	0.5248
Zone_4_of_6	0.5381
Zone_5_of_5	0.5666
Zone_5_of_6	0.5832
Zone_6_of_6	0.5558


In all these previous cases and the rest of the tests carried out in the study, the observations from January 2021 to April 2022 were used as the training set. The test set was made of the observations corresponding to May 2022. This allowed us to have 3860 time slots in the training set and 228 time slots in the test set. Since accuracy was still very low and computing times were high, it was necessary to explore new features to increase accuracy and new ways to reduce computing times. Considering that lower computing times were obtained with the implementation of XGBoost, it was selected as the starting point to improve the model's accuracy and computing times performance.

PCA was considered a suitable way to capture the main contribution of the different variables to the variability of the response variable while keeping computing times short. Implementing PCA yielded n-1 components in all cases. The number of variables depends on the number of clusters considered in each approach: models with 4 clusters used 924 variables, models with 5 clusters used 1004, and models with 6 clusters used 1084. These differences arise because a larger number of clusters means more surrounding clusters and therefore more variables associated with them.

In all experiments conducted in this study, observations from January 2021 to April 2022 were used as the training set, while observations from May 2022 constituted the test set. This division provided 3860 time slots in the training set and 228 time slots in the test set. Despite these configurations, initial results showed low accuracy and high computational costs, necessitating the exploration of new features to enhance accuracy and techniques to reduce computation time. Given that XGBoost yielded lower computational times in initial tests, it was selected as the baseline model for further refinement to improve accuracy and computational performance.

Before implementing the updated configuration incorporating Principal Component Analysis (PCA), a comparison was conducted between ARIMA forecasting and XGBoost using all variables except ARIMA forecast and signal processing features. This step aimed to verify the advantage of the XGBoost/PCA approach without incorporating signal processing features. For this analysis, the response variable was transformed into a numeric variable (number of calls expected in the next time window), excluding statistical and signal processing features. Both approaches were trained using data from January 2021 to April 2022, tested on May 2022 observations, and updated iteratively by including the actual value of each time slot before forecasting the next. This process continued until all May 2022 time slots were evaluated.
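
The iterative evaluation described above, in which each true value is revealed before the next slot is forecast, is often called walk-forward validation. The Python sketch below shows the loop under stated assumptions: the forecaster is a hypothetical placeholder (the mean of the last eight slots), not ARIMA or XGBoost, and the series is invented.

```python
import math

def walk_forward_rmse(train, test, forecast_fn):
    """One-step-ahead RMSE, updating the history with each actual value."""
    history = list(train)
    squared_error = 0.0
    for actual in test:
        prediction = forecast_fn(history)     # forecast the next slot
        squared_error += (prediction - actual) ** 2
        history.append(actual)                # reveal the true value afterwards
    return math.sqrt(squared_error / len(test))

def mean_of_last_8(history):
    """Placeholder forecaster (hypothetical): mean of the previous 8 slots."""
    window = history[-8:]
    return sum(window) / len(window)

# Invented number-of-calls series: 8 training slots and 2 test slots.
train = [0, 1, 0, 0, 1, 0, 0, 0]
test = [1, 0]
rmse = walk_forward_rmse(train, test, mean_of_last_8)
print(round(rmse, 4))
```

Any model that maps a history to a one-step forecast could be passed as `forecast_fn`, so the same loop serves for comparing two approaches on identical test slots.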

The results in Table 5 demonstrate that the proposed XGBoost/PCA approach generally outperformed ARIMA, achieving lower or equivalent RMSE (Root Mean Square Error) values across scenarios, except for Zone 5 in the 5-cluster configuration. The mean RMSE reduction across all scenarios was 7.24 %, highlighting the statistically significant improvement of the XGBoost/PCA model over ARIMA, with a p-value <0.001. These findings confirm that integrating PCA with XGBoost creates a superior multivariate time-series model compared to ARIMA, which was solely trained on univariate time-series data. This validation step confirmed the proposed strategy's efficacy.

Table 5.

ARIMA and XGBoost/PCA performance comparison.

Zones	RMSE_Arima	RMSE_XGBoost	PCA Explained Variability	Percentage reduction/increase of RMSE Using Xgboost's proposed approach
Zone_1_of_4	0.39	0.352	97.2 %	−10 %
Zone_1_of_5	0.358	0.308	97.1 %	−14 %
Zone_1_of_6	0.52	0.47	97.0 %	−10 %
Zone_2_of_4	0.416	0.407	97.1 %	−2 %
Zone_2_of_5	0.407	0.361	97.0 %	−11 %
Zone_2_of_6	0.247	0.246	97.2 %	0 %
Zone_3_of_4	0.593	0.525	97.1 %	−11 %
Zone_3_of_5	0.32	0.304	97.1 %	−5 %
Zone_3_of_6	0.33	0.326	97.0 %	−1 %
Zone_4_of_4	0.675	0.627	97.1 %	−7 %
Zone_4_of_5	0.537	0.478	97.0 %	−11 %
Zone_4_of_6	0.266	0.228	97.0 %	−14 %
Zone_5_of_5	0.39	0.404	97.1 %	4 %
Zone_5_of_6	0.316	0.269	97.0 %	−15 %
Zone_6_of_6	0.352	0.352	97.0 %	0 %
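
The percentage column of Table 5 and the reported mean RMSE reduction can be recomputed from the two RMSE columns. The Python sketch below does this arithmetic; it assumes the percentage is the relative change (XGBoost/PCA minus ARIMA, divided by ARIMA) and that the mean is a simple average over the 15 scenarios.

```python
# (RMSE_Arima, RMSE_XGBoost) per row of Table 5, in table order.
rmse_pairs = {
    "Zone_1_of_4": (0.39, 0.352),  "Zone_1_of_5": (0.358, 0.308),
    "Zone_1_of_6": (0.52, 0.47),   "Zone_2_of_4": (0.416, 0.407),
    "Zone_2_of_5": (0.407, 0.361), "Zone_2_of_6": (0.247, 0.246),
    "Zone_3_of_4": (0.593, 0.525), "Zone_3_of_5": (0.32, 0.304),
    "Zone_3_of_6": (0.33, 0.326),  "Zone_4_of_4": (0.675, 0.627),
    "Zone_4_of_5": (0.537, 0.478), "Zone_4_of_6": (0.266, 0.228),
    "Zone_5_of_5": (0.39, 0.404),  "Zone_5_of_6": (0.316, 0.269),
    "Zone_6_of_6": (0.352, 0.352),
}

# Relative change in percent; negative values mean XGBoost/PCA has lower RMSE.
pct_change = {zone: 100 * (xgb - arima) / arima
              for zone, (arima, xgb) in rmse_pairs.items()}
mean_change = sum(pct_change.values()) / len(pct_change)

for zone, change in pct_change.items():
    print(zone, f"{change:+.1f} %")
print("mean change:", round(mean_change, 2), "%")
```

Under these assumptions the average comes out at roughly a 7.2 % reduction, in line with the figure quoted in the text, and Zone_5_of_5 is the only scenario with a positive change.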


Following this validation, ten replicates for each zone were conducted, incorporating all features. PCA was employed to capture the contribution of each variable and transform them into components for use as input variables in the classifiers. Two classifiers, Random Forest and XGBoost, were tested. From the total components generated by PCA, the first 100 were retained, explaining approximately 97 % of the variability. This dimensionality reduction aimed to enhance the model by retaining the most relevant information while minimizing noise. This approach significantly reduced model size; from 924 original variables, the model size was reduced by 90 %, using only 100 PCA components. PCA/Random Forest and PCA/XGBoost accuracies are compared in Table 6.

Table 6.

XGBoost/PCA and Random Forest/PCA accuracy comparison (PCA including ARIMA, statistical, weather, signal processing, and geographical features) 10 replicates results.

Model	Mean Accuracy XGBoost/PCA	Standard Deviation Accuracy XGBoost/PCA	Mean Accuracy Random Forest/PCA	Standard Deviation Accuracy Random Forest/PCA	Accuracy Difference XGBoost/PCA – Random Forest/PCA	t-test (p-value)
Zone_1_of_4	0.698	0.001	0.523	0.009	17.5 %	<0.0001
Zone_1_of_5	0.599	0.001	0.511	0.007	8.8 %	<0.0001
Zone_1_of_6	0.766	0.001	0.598	0.013	16.8 %	<0.0001
Zone_2_of_4	0.659	0.001	0.507	0.007	15.2 %	<0.0001
Zone_2_of_5	0.741	0.001	0.500	0.005	24.1 %	<0.0001
Zone_2_of_6	0.605	0.001	0.495	0.000	11.0 %	<0.0001
Zone_3_of_4	0.716	0.001	0.659	0.010	5.7 %	<0.0001
Zone_3_of_5	0.752	0.001	0.509	0.005	24.3 %	<0.0001
Zone_3_of_6	0.708	0.001	0.501	0.004	20.7 %	<0.0001
Zone_4_of_4	0.719	0.001	0.624	0.012	9.5 %	<0.0001
Zone_4_of_5	0.752	0.001	0.587	0.012	16.5 %	<0.0001
Zone_4_of_6	0.673	0.001	0.527	0.006	14.6 %	<0.0001
Zone_5_of_5	0.696	0.001	0.518	0.007	17.8 %	<0.0001
Zone_5_of_6	0.766	0.001	0.497	0.005	26.9 %	<0.0001
Zone_6_of_6	0.675	0.001	0.508	0.006	16.7 %	<0.0001


After demonstrating the effectiveness of Principal Component Analysis (PCA) in capturing the contribution of various variables to explain the variability of the response variable, PCA was integrated with both Random Forest and XGBoost models. This integration included all variables, incorporating the proposed signal processing features to enhance the model's predictive capabilities.

It is important to note that, due to data sparsity, the primary objective is to predict the absence or presence of emergency calls in the subsequent time slot. Consequently, the response variable is dichotomous, and the model aims to characterize its behavior based on the patterns and relationships of the remaining variables across the time series. This approach enables the development of a robust predictive framework for time-sensitive emergency scenarios. In this sense, Table 6 shows that accuracy outcomes with PCA/Random Forest are similar to the initial values obtained using the actual variables. The difference in performance in this case and the wrapper applied to the Random Forest was computing times. In this case, the mean PCA runtime was 25.65 s, considering all the 15 scenarios. On the other hand, the run time for Random Forest, considering 1000 trees, was 28.18 s. The number of trees used to finish the algorithm was 1000, considering that with a higher number of trees, no improvement in accuracy was observed. With fewer trees, accuracy was slightly reduced in some cases. Then, the total time taken for the model to converge, including Random Forest implementation and PCA, was 53.83 s. This is a huge difference compared to the initial time taken with a wrapper of more than 2 h to converge. Nevertheless, prediction results are too poor to consider this model as applicable.

Table 6 also shows the results obtained using the same dataset with PCA and training an XGBoost model. In this case, PCA convergence times were similar to those obtained with Random Forest because this step is conducted in exactly the same way in both cases. Nevertheless, the accuracy results show the benefits of integrating XGBoost and PCA with statistical, signal processing, geographical, time series, and weather features. The table also displays the p-values of the t-tests for the difference in accuracy in each of the 15 scenarios. In all cases, the p-value was <0.001, indicating that the proposed approach outperforms PCA with Random Forest in accuracy while keeping computing times short. Additionally, the accuracy differences between Random Forest and XGBoost, both integrated with PCA, are presented for each scenario in Table 6. These differences range from 5.7 to 26.9 percentage points, with a mean difference of 16.4 and a median difference of 16.7.
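
The summary statistics of the accuracy differences can be reproduced from the mean accuracies listed in Table 6. The following Python sketch does so, assuming the difference is the plain subtraction of the two means expressed in percentage points and rounded to one decimal as in the table.

```python
from statistics import mean, median

# (mean accuracy XGBoost/PCA, mean accuracy Random Forest/PCA), Table 6 order.
accuracies = [
    (0.698, 0.523), (0.599, 0.511), (0.766, 0.598), (0.659, 0.507),
    (0.741, 0.500), (0.605, 0.495), (0.716, 0.659), (0.752, 0.509),
    (0.708, 0.501), (0.719, 0.624), (0.752, 0.587), (0.673, 0.527),
    (0.696, 0.518), (0.766, 0.497), (0.675, 0.508),
]

# Difference in percentage points, rounded to one decimal as in the table.
differences = [round(100 * (xgb - rf), 1) for xgb, rf in accuracies]

summary = {
    "min": min(differences),
    "max": max(differences),
    "mean": round(mean(differences), 1),
    "median": median(differences),
}
print(summary)
```

The recomputed minimum, maximum, mean, and median agree with the values quoted in the text (5.7, 26.9, 16.4, and 16.7).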

After validating that the integration of XGBoost and PCA achieves statistically significant improvements in accuracy compared to Random Forest, the next step was to determine the importance of each original variable. This was accomplished by analyzing the contribution of each variable to the 100 principal components. PCA transforms the original variables to capture the maximum variance in the data, with each principal component associated with an eigenvalue that represents the amount of total variability it explains. The proportion of variability explained by each component is calculated as the ratio of its eigenvalue to the sum of all eigenvalues, highlighting its relative importance in describing the dataset's overall structure.

Each principal component is a combination of the original variables, where the coefficients, or loadings, reflect how strongly each variable contributes to that component. By squaring these loadings, we measure the proportion of variability in each principal component attributable to a specific variable. To assess the total contribution of each variable, these squared loadings are weighted by the explained variability of the respective components and summed across all components.
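
The weighting scheme described above can be written compactly. In the Python sketch below, the eigenvalues, loadings, and variable names are hypothetical values chosen for illustration (three variables and three components); they are not taken from the study. The contribution of each variable is the sum, over components, of its squared loading multiplied by the proportion of variance that component explains.

```python
# Hypothetical PCA output for three variables (illustrative values only).
variables = ["rainfall_lag1", "calls_kurtosis", "hour_slot"]
eigenvalues = [3.0, 1.5, 0.5]
# loadings[k][j]: loading of variable j on component k; each row has unit length.
loadings = [
    [0.8, 0.6, 0.0],
    [-0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]

def variable_contributions(eigenvalues, loadings):
    """Sum over components of (explained variance ratio) * (squared loading)."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    n_vars = len(loadings[0])
    return [sum(r * comp[j] ** 2 for r, comp in zip(ratios, loadings))
            for j in range(n_vars)]

contributions = variable_contributions(eigenvalues, loadings)
for name, c in zip(variables, contributions):
    print(name, f"{100 * c:.1f} %")
```

Because each component's squared loadings sum to one, the contributions add up to the total proportion of variance explained by the components included. Summing the contributions of variables that belong to the same family gives group-level figures of the kind reported per variable group.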

This process highlights which original variables significantly influence the predictive model, even after dimensionality reduction with PCA. It ensures that key information is retained while reducing noise and redundancy, enhancing the efficiency and interpretability of the model. The analysis of each variable family can be seen in Table 7. This table groups variables of the same family to facilitate the study of their contribution to the model. Therefore, in most cases, one row corresponds to the contribution of a variable group across, for example, its different lags, even though not all lags contribute equally to explaining the variability in the response variable. The table shows how much the features considered in this approach contribute to the explained variability and, in turn, to the accuracy of the model. Table 7 shows that the most relevant feature is the amount of rainfall in the different lagged periods. This variable group accounts for 39.23 % of total variability. The second and third most important groups are statistical and signal analysis features: kurtosis of the lagged rainfall and number of calls, accounting for 33.24 % of the variability (21.79 % + 11.45 %), and Power Spectral Density and trapezoidal function of the lagged number of calls, accounting for 12.88 % (8.16 % + 4.72 %).

Table 7.

Variable importance for Zone 1 of 6 clusters when working with XGBoost/PCA considering the weather, statistical, signal processing, time series, and geographical features.

Variable group	Explained variability	Type of variable
Zone_1 40-lag Rainfall	39.23 %	Weather
Rainfall 40-lag Kurtosis	21.79 %	Statistical
Number of calls 40-lag Kurtosis	11.45 %	Statistical
Number of Calls 40-lag Power Spectral Density (PSD)	8.16 %	Signal processing
Number of Calls 40-lag Trapezoidal Function	4.72 %	Signal processing
Rainfall 40-lag Mean	3.03 %	Statistical
Zone_1_Number of Calls 40-lag	2.11 %	Time series
Zone_1_Day of Month	1.00 %	Time series
Number of periods of previous event	0.99 %	Time series
Zone_1_Month	0.98 %	Time series
Zone_1_Hour of day	0.98 %	Time series
Zone_1_Day of the Week	0.97 %	Time series
Rainfall 40-lag Skewness	0.88 %	Statistical
Number of Calls 40-lag Skewness	0.84 %	Statistical
Surroundings 40-lag Number of Calls	0.57 %	Geographical-Time series
Prediction of presence or absence of call in the next time window	0.49 %	Prediction
Prediction of the presence or absence of rain in the next time window	0.49 %	Prediction
Zone_1_Presence or absence of Calls 40-lag	0.36 %	Time series categorical
Surroundings Presence or absence of Calls 40-lag	0.32 %	Geographical-Time series
Rainfall 40-lag Median	0.32 %	Statistical
Number of Calls 40-lag RMS	0.12 %	Signal processing
Number of Calls 40-lag Mean	0.10 %	Statistical
Number of Calls 40-lag Median	0.05 %	Statistical
Presence or absence of rain 40-lag	0.03 %	Weather categorical
Zone_1_Year	0.01 %	Time series
Number of Calls 40-lag Entropy	0.00 %	Signal processing
ARIMA forecast Number of calls for next time window	0.00 %	Forecast
ARIMA forecast Rainfall for next time window	0.00 %	Forecast
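As a quick check of how the rows of Table 7 roll up into variable families, the following sketch sums the explained variability by type of variable. The percentages are copied from Table 7; the function and variable names are illustrative only and are not taken from the study's code.

```python
from collections import defaultdict

# (variable group, explained variability in %, type of variable), copied from Table 7
TABLE_7 = [
    ("Zone_1 40-lag Rainfall", 39.23, "Weather"),
    ("Rainfall 40-lag Kurtosis", 21.79, "Statistical"),
    ("Number of calls 40-lag Kurtosis", 11.45, "Statistical"),
    ("Number of Calls 40-lag PSD", 8.16, "Signal processing"),
    ("Number of Calls 40-lag Trapezoidal Function", 4.72, "Signal processing"),
    ("Rainfall 40-lag Mean", 3.03, "Statistical"),
    ("Zone_1_Number of Calls 40-lag", 2.11, "Time series"),
    ("Zone_1_Day of Month", 1.00, "Time series"),
    ("Number of periods of previous event", 0.99, "Time series"),
    ("Zone_1_Month", 0.98, "Time series"),
    ("Zone_1_Hour of day", 0.98, "Time series"),
    ("Zone_1_Day of the Week", 0.97, "Time series"),
    ("Rainfall 40-lag Skewness", 0.88, "Statistical"),
    ("Number of Calls 40-lag Skewness", 0.84, "Statistical"),
    ("Surroundings 40-lag Number of Calls", 0.57, "Geographical-Time series"),
    ("Prediction of call in next window", 0.49, "Prediction"),
    ("Prediction of rain in next window", 0.49, "Prediction"),
    ("Zone_1_Presence or absence of Calls 40-lag", 0.36, "Time series categorical"),
    ("Surroundings Presence or absence of Calls 40-lag", 0.32, "Geographical-Time series"),
    ("Rainfall 40-lag Median", 0.32, "Statistical"),
    ("Number of Calls 40-lag RMS", 0.12, "Signal processing"),
    ("Number of Calls 40-lag Mean", 0.10, "Statistical"),
    ("Number of Calls 40-lag Median", 0.05, "Statistical"),
    ("Presence or absence of rain 40-lag", 0.03, "Weather categorical"),
    ("Zone_1_Year", 0.01, "Time series"),
    ("Number of Calls 40-lag Entropy", 0.00, "Signal processing"),
    ("ARIMA forecast Number of calls", 0.00, "Forecast"),
    ("ARIMA forecast Rainfall", 0.00, "Forecast"),
]


def totals_by_type(rows):
    """Sum the explained variability (in %) of all rows sharing a variable type."""
    totals = defaultdict(float)
    for _name, share, kind in rows:
        totals[kind] += share
    # Round to two decimals to hide floating-point noise from the summation.
    return {kind: round(total, 2) for kind, total in totals.items()}


family_totals = totals_by_type(TABLE_7)
for kind, total in sorted(family_totals.items(), key=lambda item: -item[1]):
    print(f"{kind}: {total:.2f} %")
```

Summing the rounded rows gives 13.00 % for signal processing, 38.46 % for statistical features, and 7.04 % for time series features. The concluding remarks quote 38.45 % and 7.03 % for the latter two; a 0.01 difference of this kind is presumably due to rounding of the individual rows. The 39.26 % quoted for weather features corresponds to the rainfall row (39.23 %) plus the categorical rain indicator (0.03 %).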


5. Concluding Remarks

Because location and relocation decisions are central to achieving short EMS response times, spatiotemporal prediction of incident likelihood in the next time window is crucial to this decision-making process. Improving prediction techniques has therefore become one of the significant challenges in EMS deployment at all levels (strategic, tactical, and operational). At the first two levels, forecasting the number of emergencies helps determine the resources the system requires and plan future expansions driven by, among other factors, population growth and aging. At finer spatial and temporal granularities, however, it is more useful for decision-making to determine the most probable location of the next emergency call. This spatiotemporal probability estimate would allow ambulances to be relocated before the call arrives, instead of dispatching the closest ambulance and then reconfiguring the system to maximize coverage while one or more ambulances are busy responding to previous calls. Therefore, this study aims to estimate the probability of receiving an emergency call within a given spatiotemporal scope.

Building a robust model of this kind posed several difficulties, such as data sparsity and the availability of only one weather station in the city for gathering rainfall data. Including time series features in the model was also challenging, considering that the benchmark prediction model focuses on forecasting the number of calls. The proposed approach had to be compared with ARIMA because ARIMA is one of the most popular approaches for this kind of forecast. However, ARIMA uses only time series variables, whereas a multivariable prediction proved more effective. In this sense, statistical, weather, and signal analysis features improved model performance. Additionally, PCA helped integrate the different types of variables and extract their contribution to explaining the total variability of the response variable. It also helped reduce noise, improving the accuracy of the model. The proposed approach proved helpful for tactical decisions because of its short computing times, which can be further optimized in future studies.

The algorithm used to make the prediction was also a significant factor in obtaining good results. In this study, XGBoost with a log-likelihood optimization objective and 3000 trees proved more effective than a random forest classifier. Future studies should examine the selection and configuration of the classifier and its hyperparameters in greater depth.

One of the relevant findings of this study is that statistical and signal processing features can increase model performance and complement the variability explained by time series variables. In this case, time series variables explained 7.03 % of the response variable's variability, which confirms their relevance to the prediction but makes them less important than weather features (39.26 %), statistical features (38.45 %), and signal processing features (13 %). This result indicates that it is necessary to continue exploring new features that better predict the spatiotemporal probability of receiving an emergency call. It is also important to gather further demographic data and calendar features such as Mother's Day, Independence Day, and soccer final matches. In Colombia, Mother's Day is one of the most violent days of the year because of high levels of alcohol consumption, so including this variable could improve the model's accuracy. More data could also help reach finer spatiotemporal granularity, helping decision-makers make real-time predictions. Future studies should focus on integrating forecasting models with redeployment models for emergency vehicles, optimizing processing time to achieve real-time integration and, in doing so, real-time decision-making assistance. Finally, identifying demand clusters can help obtain better prediction outcomes and should be considered one of the criteria for EMS grid design instead of working solely with rectangular grids. These clusters can be built using Euclidean or Manhattan distances and driving times.

The proposed approach enhanced accuracy by up to 26.9 % by integrating XGBoost and PCA and incorporating new features based on statistical, signal, geographical, weather, and time series analyses. PCA was also useful for this purpose, accounting for 97 % of the variability with 100 components (transformed variables) and reducing the noise induced in the model. Our approach demonstrates its potential to provide more accurate predictions in practical applications, thereby enhancing EMS responsiveness. Additionally, PCA helped identify how much of the response variable's variability is explained by each of the original variable families. For example, weather features accounted for 39.26 %, statistical features for 38.45 %, and signal processing features for 13 %.
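The conclusions do not spell out how importances measured on principal components are traced back to the original variables. One common way to do this, shown here as an assumption rather than as the authors' procedure, is to spread each component's importance over the original features in proportion to their squared loadings. The loadings and importances below are toy values chosen for illustration.

```python
def original_feature_shares(loadings, component_importance):
    """Spread each component's importance over the original features.

    loadings[k][j] is the weight of original feature j in principal component k.
    component_importance[k] is the importance the classifier assigns to component k.
    Both inputs are hypothetical; in practice they would come from the fitted
    PCA and the fitted classifier.
    """
    n_features = len(loadings[0])
    shares = [0.0] * n_features
    for row, importance in zip(loadings, component_importance):
        norm_sq = sum(w * w for w in row)  # equals 1 for true PCA loadings
        for j, w in enumerate(row):
            # A feature receives the fraction of the component it makes up.
            shares[j] += importance * (w * w) / norm_sq
    total = sum(shares)
    return [s / total for s in shares]  # normalize so the shares sum to 1


# Toy example: three original features, two retained components.
loadings = [
    [0.8, 0.6, 0.0],  # component 1 mixes features 0 and 1
    [0.0, 0.0, 1.0],  # component 2 is feature 2 alone
]
component_importance = [0.75, 0.25]
shares = original_feature_shares(loadings, component_importance)
```

In this toy case the first component's importance of 0.75 is split 64/36 between features 0 and 1, so the shares are 0.48, 0.27, and 0.25. Shares computed this way for individual lags could then be summed per variable family, which is the level at which Table 7 reports them.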

The model successfully integrated K-means clustering to determine the zoning pattern, ensuring its ability to adjust dynamically to spatiotemporal variations in emergency calls. Considering that the technique groups similar observations in a zone, the classifier can better detect patterns, helping it increase its accuracy by up to 26.9 %.
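The zoning idea can be illustrated with a minimal K-means sketch (Lloyd's algorithm) in plain Python. The coordinates are made-up points in the general latitude and longitude range of Barranquilla, the two starting centroids are chosen by hand, and Euclidean distance on raw degrees is used only for simplicity; none of these choices is taken from the study, which reports six clusters for the configuration shown in Table 7.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm with Euclidean distance; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each call location joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical call locations as (latitude, longitude) pairs.
calls = [
    (11.01, -74.82), (11.02, -74.83), (11.00, -74.81),  # northern group
    (10.93, -74.79), (10.92, -74.80), (10.94, -74.78),  # southern group
]
zone_centroids, zone_labels = kmeans(calls, centroids=[calls[0], calls[3]])
```

Each resulting label plays the role of a zone identifier, so a separate time series of calls can be built per zone. Replacing `math.dist` with a Manhattan distance or a driving-time lookup would change the assignment step, but the centroid mean is then no longer the natural update, so a medoid-based variant may be more appropriate in that case.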

The proposed approach addressed spatiotemporal data sparsity, surpassing traditional forecasting methods. By incorporating a diverse set of features, including statistical, geographical, meteorological, and signal processing elements, our model achieves a mean improvement of 16.4 % in the accuracy of spatiotemporal predictions.

Although this study demonstrates the validity of the proposed approach and achieves higher accuracy values, certain limitations should be acknowledged to provide context for the findings. First, the dataset spans from January 2021 to May 2022, as data from earlier years were excluded due to quality concerns such as incomplete timestamps and locations. While this decision ensured the reliability of the data used, it may limit the model's predictive capability, since a longer time series would allow more patterns to be identified and could improve model performance. Second, while the model was validated on unseen data from May 2022, longer validation periods across diverse geographic and temporal conditions would further strengthen its generalizability. Future research could expand the dataset to include more extensive temporal ranges and test the model in other urban contexts to assess its adaptability.

A further limitation is that the study used default hyperparameters to evaluate the proposed model's baseline performance. Although the results show a 16.4 % mean accuracy improvement, hyperparameter tuning could further optimize predictive accuracy and computational efficiency, and exploring it in future work represents a promising path for refining the proposed framework. Despite these limitations, the findings confirm the efficacy of integrating PCA with XGBoost for emergency call prediction, demonstrating significant performance improvements. These results provide a foundation for further exploration and highlight the potential of advanced predictive modeling to enhance emergency service response systems. Future research could also explore advanced spatial machine learning techniques to enhance clustering and prediction accuracy. For instance, methods such as Local Fuzzy Geographically Weighted Clustering [45] could improve geodemographic segmentation, while predictive techniques such as Geographical Random Forests [46] or Multiscale Geographically Weighted Regression [47] could provide deeper insights into spatial variations and relationships.

CRediT authorship contribution statement

Dionicio Neira-Rodado: Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Juan Camilo Paz-Roa: Writing – review & editing, Writing – original draft, Validation, Supervision, Methodology, Investigation, Conceptualization. John Willmer Escobar: Writing – review & editing, Supervision, Project administration, Methodology, Conceptualization. Miguel Ángel Ortiz-Barrios: Writing – review & editing.

Data availability statement

Not applicable.

Funding

This work was partially supported by the REMIND Project from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 734355.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Dionicio Neira-Rodado reports travel was provided by Horizon 2020 European Innovation Council Fast Track to Innovation. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Dionicio Neira-Rodado, Email: dionicio.neira@correounivalle.edu.co, dneira1@cuc.edu.co.

Juan Camilo Paz-Roa, Email: juan.paz@javerianacali.edu.co.

John Willmer Escobar, Email: john.wilmer.escobar@correounivalle.edu.co.

Miguel Ángel Ortiz-Barrios, Email: mortiz1@cuc.edu.co.

References

1. Neira-Rodado D., Escobar-Velasquez J.W., McClean S. Ambulances deployment problems: categorization, evolution and dynamic problems review. ISPRS Int. J. Geo-Inf. Feb. 2022;11(2):109. doi: 10.3390/ijgi11020109.
2. Han M.X., et al. MDPI; Oct. 01, 2021. Cardiac Arrest Occurring in High-Rise Buildings: A Scoping.
3. Laurina B.-B., Gabriel M.-O., Dionicio N.-R., Ana T.-E. Improvement of Barranquilla's EMS response time with the use of GIS. Procedia Comput. Sci. 2022;198:219–224. doi: 10.1016/j.procs.2021.12.231.
4. Gonzalez R.P., Cummings G.R., Phelan H.A., Mulekar M.S., Rodning C.B. Does increased emergency medical services prehospital time affect patient mortality in rural motor vehicle crashes? A statewide analysis. Am. J. Surg. Jan. 2009;197(1):30–34. doi: 10.1016/J.AMJSURG.2007.11.018.
5. Rodríguez Q A.K., Osorno O G.M., Maya D P.A. Relocation of vehicles emergency medical services: a review. Ing Cienc. 2016;12(23):163–202. doi: 10.17230/ingciencia.12.23.9.
6. Aringhieri R., Bruni M.E., Khodaparasti S., van Essen J.T. Emergency medical services and beyond: addressing new challenges through a wide literature review. Comput. Oper. Res. 2017;78:349–368. doi: 10.1016/j.cor.2016.09.016.
7. Wu J., Li Y., Ma Y. 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer, ICFTIC 2021. 2021. Comparison of XGBoost and the Neural Network model on the class-balanced datasets; pp. 457–461.
8. Giannakas F., Troussas C., Krouska A., Sgouropoulou C., Voyiatzis I. XGBoost and deep neural network comparison: the case of teams' performance. Lect. Notes Comput. Sci. 2021;12677(LNCS):343–349. doi: 10.1007/978-3-030-80421-3_37/COVER.
9. Nabarro S., Fletcher T., Shawe-Taylor J. Spatiotemporal prediction of ambulance demand using Gaussian process regression. 2018. [Online]. Available: http://arxiv.org/abs/1806.10873
10. Zhou Z., Matteson D.S. Predicting Melbourne Ambulance demand using kernel warping. Ann. Appl. Stat. Jul. 2015;10(4):1977–1996. doi: 10.1214/16-AOAS961.
11. Paliari I., Karanikola A., Kotsiantis S. IISA 2021 - 12th International Conference on Information, Intelligence, Systems and Applications. Jul. 2021. A comparison of the optimized LSTM, XGBOOST and ARIMA in Time Series forecasting.
12. Martin R.J., Mousavi R., Saydam C. Predicting emergency medical service call demand: a modern spatiotemporal machine learning approach. Oper Res Health Care. 2021;28(Mar) doi: 10.1016/j.orhc.2021.100285.
13. Abreu P., Santos D., Barbosa-Povoa A. Data-driven forecasting for operational planning of emergency medical services. Socioecon Plann Sci. 2023;86(Apr) doi: 10.1016/j.seps.2022.101492.
14. Reuter-Oppermann M., van den Berg P.L., Vile J.L. Logistics for emergency medical service systems. Health Systems. 2017;6(3):187–208. doi: 10.1057/s41306-017-0023-x.
15. Payares-Garcia D., Platero J., Mateu J. A dynamic spatio-temporal stochastic modeling approach of emergency calls in an urban context. Mathematics. Feb. 2023;11(4) doi: 10.3390/math11041052.
16. Kamenetzky R.D., Shuman L.J., Wolfe H. Estimating need and demand for prehospital care. Oper. Res. 1982;30(6):1148–1167. doi: 10.1287/OPRE.30.6.1148.
17. Nicoletta V., Lanzarone E., Guglielmi A., Bélanger V., Ruiz A. A bayesian model for describing and predicting the stochastic demand of emergency calls. Springer Proceedings in Mathematics and Statistics. 2017;194:203–212. doi: 10.1007/978-3-319-54084-9_19.
18. Baker J.R., Fitzpatrick K.E. Determination of an optimal forecast model for ambulance demand using goal programming. J. Oper. Res. Soc. Nov. 1986;37(11):1047–1059. doi: 10.1057/JORS.1986.182/METRICS.
19. Brown L.H., Lerner E.B., Larmon B., LeGassick T., Taigman M. Are EMS call volume predictions based on demand pattern analysis accurate? Prehospital emergency care. Apr. 2007;11(2):199–203. doi: 10.1080/10903120701204797.
20. Channouf N., L'Ecuyer P., Ingolfsson A., Avramidis A.N. The application of forecasting techniques to modeling emergency medical system calls in Calgary, Alberta. Health Care Manag. Sci. Feb. 2007;10(1):25–45. doi: 10.1007/S10729-006-9006-3.
21. Setzler H., Saydam C., Park S. EMS call volume predictions. Comput. Oper. Res. Jun. 2009;36(6):1843–1851. doi: 10.1016/J.COR.2008.05.010.
22. Chen A.Y., Lu T.Y., Ma M.H.M., Sun W.Z. Demand forecast using data analytics for the preallocation of ambulances. IEEE J Biomed Health Inform. Jul. 2016;20(4):1178–1187. doi: 10.1109/JBHI.2015.2443799.
23. Vile J.L., Gillard J.W., Harper P.R., Knight V.A. Predicting ambulance demand using singular spectrum analysis. J. Oper. Res. Soc. 2012;63(11):1556–1565. doi: 10.1057/JORS.2011.160.
24. Grekousis G., Liu Y. Where will the next emergency event occur? Predicting ambulance demand in emergency medical services using artificial intelligence. Comput. Environ. Urban Syst. 2019;76(April):110–122. doi: 10.1016/j.compenvurbsys.2019.04.006.
25. Grekousis G., Photis Y.N. Analyzing high-risk emergency areas with GIS and neural networks: the case of Athens, Greece. Prof. Geogr. Jan. 2014;66(1):124–137. doi: 10.1080/00330124.2013.765300.
26. Photis Y.N., Grekousis G. Locational planning for emergency management and response: an artificial intelligence approach. Int. J. Sustain. Dev. Plann. 2012;7(3):372–384. doi: 10.2495/SDP-V7-N3-372-384.
27. António M., Veloso M. 2016. “Analysis of Taxi Data for Understanding Urban Dynamics,”.
28. Shalev-Schwartz S., Ben-David S. Understanding machine learning: from theory to algorithms. 2014. [Online]. Available: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning
29. Saadi I., Wong M., Farooq B., Teller J., Cools M. An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service. arXiv.org. 2017
30. L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “On Predicting the Taxi-Passenger Demand: A Real-Time Approach”.
31. Ke J., Zheng H., Yang H., Michael Chen X. Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach. Transp Res Part C Emerg Technol. Dec. 2017;85:591–608. doi: 10.1016/J.TRC.2017.10.016.
32. Rahaman M.S., Hamilton M., Salim F. Jan. 2017. “Predicting Imbalanced Taxi and Passenger Queue Contexts in Airport,”. [Online]. Available:/articles/conference_contribution/Predicting_imbalanced_taxi_and_passenger_queue_contexts_in_airport/24791523/1.
33. Xu R.S. M.M. I. of T. Machine learning for real-time demand forecasting. 2015. [Online]. Available: https://dspace.mit.edu/handle/1721.1/99565
34. Zhang X., Wang X., Chen W., Tao J., Huang W., Wang T. 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS) Jul. 2017. A taxi gap prediction method via double ensemble gradient boosting decision tree; pp. 255–260.
35. Laviolette J., Morency C., Saunier N., Lacombe A., Douakha K. 2017. “Temporal & Spatial Analysis of Taxi Demand in Montréal, Using a Clustering Approach,”.
36. wen Chang H., chin Tai Y., Hsu Y. jen J. Context-aware taxi demand hotspots prediction. Int. J. Bus. Intell. Data Min. 2010;5(1):3–18. doi: 10.1504/IJBIDM.2010.030296.
37. Davis N., Raina G., Jagannathan K. A multi-level clustering approach for forecasting taxi travel demand. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC. Dec. 2016:223–228. doi: 10.1109/ITSC.2016.7795558.
38. Wei H., Wang Y., Wo T., Liu Y., Xu J. ZEST: a hybrid model on predicting passenger demand for chauffeured car service. International Conference on Information and Knowledge Management, Proceedings. Oct. 2016;24–28:2203–2208. doi: 10.1145/2983323.2983667. October-2016.
39. Laha A.K., Putatunda S. Real time location prediction with taxi-GPS data streams. Transp Res Part C Emerg Technol. Jul. 2018;92:298–322. doi: 10.1016/J.TRC.2018.05.005.
40. Liu J., Sun L., Li Q., Ming J., Liu Y., Xiong H. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Part F129685; Aug. 2017. Functional zone based hierarchical demand prediction for bike system expansion; pp. 957–966.
41. Tandberg D., Tibbetts J., Sklar D.P. Time series forecasts of ambulance run volume. Am. J. Emerg. Med. May 1998;16(3):232–237. doi: 10.1016/S0735-6757(98)90090-0.
42. Niazkar M., et al. Applications of XGBoost in water resources engineering: a systematic literature review (Dec 2018–May 2023) Environ. Model. Software. Mar. 2024;174 doi: 10.1016/J.ENVSOFT.2024.105971.
43. Belgiu M., Drăgu L. Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogrammetry Remote Sens. Apr. 2016;114:24–31. doi: 10.1016/J.ISPRSJPRS.2016.01.011.
44. Festa D., et al. Unsupervised detection of InSAR time series patterns based on PCA and K-means clustering. Int. J. Appl. Earth Obs. Geoinf. Apr. 2023;118 doi: 10.1016/J.JAG.2023.103276.
45. Grekousis G. Local fuzzy geographically weighted clustering: a new method for geodemographic segmentation. Int. J. Geogr. Inf. Sci. Jan. 2021;35(1):152–174. doi: 10.1080/13658816.2020.1808221.
46. Lotfata A., Grekousis G., Wang R. “Using geographical random forest models to explore spatial patterns in the neighborhood determinants of hypertension prevalence across chicago, illinois, USA,”. 2023;50(9):2376–2393. doi: 10.1177/23998083231153401.
47. Wang Z., Gong X., Zhang Y., Liu S., Chen N. Multi-scale geographically weighted elasticity regression model to explore the elastic effects of the built environment on ride-hailing ridership. Sustainability. Mar. 2023;15(6):4966. doi: 10.3390/SU15064966.
