Abstract
Background
Serial correlation in time series datasets should be accounted for to prevent biased coefficient estimates. However, most current models cannot explicitly handle autocorrelation and seasonality, as they focus mainly on the discrete nature of the data, even though crash time series follow a normal distribution at the macro scale. Moreover, influential exogenous variables have been overlooked in Iran, where univariate models have been employed. There are also contradictory results in the literature regarding the effect of average speed on crash frequency.
Objective
This study aims to evaluate the distinct impacts of mean speed on total and fatal accident time series at the national level. In addition, the SARIMAX modeling framework is introduced as a robust multivariate method for short-term crash frequency prediction.
Method
To this end, monthly total and fatal crash counts were aggregated for all rural highways in Iran. The time series of traffic exposure and average speed, both recorded by loop detectors, were aggregated at the same level and used as covariates. The Box-Jenkins methodology was employed for time series analysis.
Results
The results illustrated that the seasonal autoregressive integrated moving average with explanatory variable (SARIMAX) model outperformed the univariate ARIMA and SARIMA models. Also, SARIMA was more appropriate than the simple ARIMA when seasonality existed in the time series. Besides, the average speed had a negative linear association with the total crashes. In contrast, it revealed an increasing effect on fatal crashes.
Conclusion
Average speed has a dissimilar effect on the different traffic crash severities. Besides, the seasonal nature of data and the dynamic effects of the influential underlying factors should be considered to prevent underfitting issues and to predict future time trends accurately.
Applications
The developed instruments could be employed by policymakers to evaluate the effectiveness of interventions and to forecast the future time trends of accidents in Iran.
Keywords: Time series, SARIMAX, Traffic safety, Multivariate, Crash prediction, Speed
Highlights
• Average speed has a dissimilar effect on the different traffic crash severities.
• The SARIMAX model is a valuable instrument to identify time-varying factors underlying accidents time series.
• The accident frequency follows closely the traffic flow time trend, revealing a high seasonality.
1. Introduction
Motor vehicle crashes (MVCs) have become a critical public health issue worldwide, causing nearly 1.35 million fatalities annually [1]. The reported data indicate that MVCs were the eighth-leading cause of death in 2016; however, if no significant measures are taken, they are expected to rise to the seventh position by 2030 [2]. Recognizing the severity of this problem, the United Nations included the prevention of road traffic injuries as part of the sustainable development goals in 2015 [3]. In Iran, the annual death rate from road traffic crashes is 25.5 per 100,000 people, which is significantly higher than the global average of 18.2 per 100,000 people [1]. Therefore, in order to develop effective interventions, it is crucial to understand how traffic accidents will evolve over time, while taking into consideration the concurrent effects of various time-varying influential factors.
Time series regression methods are utilized to identify the relationship between crash counts for a geographic region (e.g., county, country) and their underlying time-varying regressors, such as macroeconomic conditions, operational traffic characteristics, and socio-demographic specifications. These methods serve three purposes: to forecast future accident trends, given the future values of explanatory variables; to compare the relative importance of various time-varying contributing factors; and to conduct interrupted time series analysis aimed at evaluating the effectiveness of a particular policy or countermeasure. Nonetheless, the validity and performance of such analyses depend largely on selecting an appropriate econometric model. Choosing an adequate model specification requires recognizing the characteristics of count variables, as MVCs are nonnegative, discrete, and sporadic event counts [4].
Previous studies have primarily utilized Negative Binomial (NB) and Poisson regressions for crash prediction, as crash counts are nonnegative integers. However, such models have a restrictive implicit assumption of independently and identically distributed error terms, which can be inappropriate for analyzing time series data. In the case of the crash time series, regular and seasonal serial correlations are intrinsic features that should not be overlooked, as disregarding them may result in inefficient parameter estimates and erroneous inferences [4]. To address this issue, scholars have introduced autoregressive (AR) terms to discrete-valued regressions, including the Poisson autoregressive (PAR) [5] and integer-valued autoregressive (INAR) Poisson models [4]. However, these methods only consider the AR terms and overlook the moving average (MA) components. Additionally, they neglect the overdispersion feature of crash time series. To tackle these problems, the NBINGARCH [6] and GLARMA [7] methods have been developed to handle autocorrelated, integer-valued, and over-dispersed time series. Nevertheless, these methods still fail to account for seasonal autocorrelations in crash time series, which is a significant limitation. To address the limitations of these models, the seasonal autoregressive integrated moving average (SARIMA) model has been proposed as a linear dependence modeling framework that can effectively analyze time series while taking seasonality into account [8]. Previous studies have utilized the ARIMA method to model monthly crash counts aggregated at the intersection level and to evaluate photo enforcement efficiency [9]. More recently, the multivariate ARIMA (ARIMAX) model has been shown to outperform its univariate counterpart by considering vehicle, human, and roadway predictors of crash time series [10].
Furthermore, some scholars have suggested the use of the multivariate SARIMA (SARIMAX) model to account for both the dynamic effects of influential predictors and the seasonality of crash time series [11]. One of the main disadvantages of ARIMA-based models is that they assume a Gaussian (normal) distribution for the error term, which does not take into consideration the discrete nature of crash counts. Nevertheless, this restriction may not be particularly relevant in highly aggregated datasets with large spatial and temporal dimensions, where the mean of counts is relatively high, and the assumption of a Gaussian distribution for the error term is appropriate [4].
The statistical method is widely recognized for its comprehensibility and strong theoretical underpinnings. However, the inflexible assumptions regarding the data distribution and the restricted paradigmatic relationships imposed on the link function between the dependent variable and predictors are of significant concern. This is due to the potential bias in parameter estimates and misinterpreted findings resulting from the violation of implicit assumptions inherent in the statistical approach. In recent times, the nonlinear associations observed in actual safety data have prompted a shift in focus from traditional frequentist approaches to machine learning (ML) algorithms. While machine learning models have been criticized for their lack of transparency, this is not a significant impediment when the primary aim of the study is to make predictions rather than interpret the results. Consequently, prior studies have utilized feedforward, multilayer feedforward, and recurrent neural networks to accurately represent the nonlinear behavior of accident time series [8]. Nonetheless, deep learning models tend to perform poorly when dealing with small and less complex datasets, reducing the chances of effective training. There is empirical evidence that supports this claim. A comprehensive comparative study that evaluated various time series prediction models for the COVID-19 outbreak revealed that traditional statistical models outperformed advanced deep learning models at the onset of the outbreak due to the low sample size available [12]. That is especially the case for crash prediction models that rely on monthly data for a short period, containing up to 80 data points. More importantly, in crash analysis, the applicability of machine learning methods is limited by their black-box nature, particularly when the primary objective of the study is to prioritize the evaluation of the effects of various exogenous variables on the target variable rather than just prediction. 
Moreover, the theoretical basis of neural network methods for time series forecasting is vague, as it is not clear whether seasonal and trend dependencies should be removed prior to model estimation. Additionally, the best method of splitting the data for testing and training is not obvious in the methodology [13]. On the other hand, the classical SARIMA model has a well-established theoretical basis, and the availability of statistical packages to employ such models has increased the instrument's applicability for policymakers and local jurisdiction safety experts.
It is evident that speed is a major factor in the number of fatalities on the roads in high-income countries, with an estimated 30% of deaths attributed to it. Meanwhile, in some low-income and middle-income countries, speed is estimated to be the primary contributory factor in about half of all road crashes [14]. Reviewing the literature, it is evident that if speed increases while other conditions remain unchanged, accidents will tend to be more severe [15,16]. However, there are contradictory results regarding speed's effects on crash frequency.
Previous research has yielded contradictory findings on the relationship between mean speed and crash occurrence, despite the established understanding that speed variation is detrimental to traffic safety [17,18,19]. Some studies have found no significant association or even an inverse association between crash frequency and mean speed. However, these studies have been criticized for not accounting for the “frequency-severity indeterminacy”, which affects the distribution of accident severities and types, leading to inconclusive results [16].
1.1. Research motivation and objectives
Compared to simple ARIMA models, the SARIMA model is advantageous in modeling linear serial correlations in seasonal series [20] and is amongst the most efficient linear methods for such series [21]. However, the multivariate SARIMAX model, which accounts for both the seasonal effects and the underlying exogenous variables, has seldom been applied in the macro crash forecast [4]. Meanwhile, the underfitting issue in univariate models might result in a lack of generalizability and biased out-of-sample predictions. Accordingly, this research makes a noteworthy contribution by developing the multivariate SARIMA (SARIMAX) model and comparing its results to those of the univariate model, thereby filling a gap in the literature. Besides, previous studies have not employed multivariate methods to reveal the quantitative effects of traffic exposure and average speed on the traffic accidents time trend in Iran. Nonetheless, illustrating the partial contribution of these important contributing factors is crucial for adopting proper countermeasures. Considering the dynamic effects of the time-varying influential factors would improve the forecast accuracy and generalizability of the proposed instruments. Accordingly, this research contributes to the literature by employing the multivariate SARIMAX model to predict the national total and fatal traffic crashes on rural Iran highways. Estimating future time trends of crash frequency plays a key role in designing interventions to improve traffic safety and assists policymakers in evaluating the effectiveness of the implemented countermeasures.
To avoid misleading conclusions, such as an apparent inverse association between crash likelihood and mean speed, the impacts of average speed on distinct crash types must be studied separately. Otherwise, the most common accident type in the study area could overshadow the different impacts of speed on other crash types. Unfortunately, previous studies have not investigated the different crash mechanisms for the different accident severities/types, which would explain the inconsistent findings in the literature regarding the speed-safety association. Hence, this paper investigates the distinct impacts of mean speed on total and fatal traffic crashes, considering the contradictory results in the literature.
The remainder of the paper is structured as follows: Section 2 presents a comprehensive literature review, highlighting state-of-the-art studies and research gaps. Section 3 provides details of the dataset, SARIMAX, and SARIMA models. Section 4 presents and discusses the forecasting results obtained after fitting the models to the data. Finally, the paper is concluded, and future research recommendations are provided in Section 5.
2. Literature review
2.1. Methodologies in crash count time series analysis
2.1.1. Integer-valued statistical methods
In the last two decades, the negative binomial and Poisson models have widely been used to predict cross-sectional, panel, and time series crash count data [22,23]. The results of these studies illustrated that these models are appropriate for the nonnegative integer nature of accident count data. Nonetheless, these regression models have the implicit assumption that observations should be independent of each other. Consequently, they cannot incorporate the autocorrelation and seasonal fluctuations of time series data in the model development. Accordingly, researchers have proposed different solutions to address this shortcoming. For example, some studies adopted a time trend variable in the model as an explanatory variable to account for the serial correlation [24,25]. Others have utilized the state-space [26] or the Prais-Winsten autoregressive (AR) models on crash count time series [27]. Despite the potential benefits of implementing the aforementioned methods, there is no guarantee that they will take into account the effect of serial correlation, particularly in the case of long count time series [4].
To explicitly account for the effect of autocorrelation, scholars developed the integer-valued autoregressive (INAR) Poisson model, dealing with both the serial correlation and the discrete nonnegative nature of accident time series [4,28]. However, the INAR(1) Poisson model still had significant limitations, as it constrained the analysis to stationary time series processes. Besides, the dynamic properties of the influential factors were not adequately described in the INAR model. Moreover, the model's structure was not flexible enough to handle the serial correlation appropriately, as only the first-order autoregressive term (AR(1)) was allowed, and there was no moving average (MA) term. Also, it could not handle over-dispersed crash count data. Recently, studies employed the linear Poisson autoregressive (PAR(p)) model, which can account for higher-order AR(p) processes and also deals well with the discrete nature of crash counts [5]. Nonetheless, both the PAR and INAR models fail to deal well with the effect of overdispersion. To address the issue, Ye et al. [6] introduced an extension of an NB logit regression model, termed the NBINGARCH model. The model contained explanatory variables and simultaneously accounted for the serial correlation and overdispersion characteristics of the time series count data. However, none of the mentioned models explicitly accounts for seasonal effects in long-term accident time series, which is a notable limitation.
2.1.2. Real-valued statistical methods vs. machine learning algorithms
The autoregressive integrated moving average (ARIMA) model [29] has been widely used to incorporate the serial correlation feature of time series. The seasonal ARIMA model (SARIMA) also considers the seasonality of time trends and has indicated superiority over well-known predictive methods such as artificial neural networks, exponential smoothing, and moving average in the literature [30,31]. Recently, Blázquez-García et al. [32] compared the predictive capability of the SARIMA model with the artificial neural networks (ANN) and the generalized additive model (GAM). The results indicated that the SARIMA approach outperforms the competing models. Qian et al. [8] analyzed the monthly road traffic fatalities in China from 2000 to 2017. The accuracy of SARIMA and Elman recurrent neural network (ERNN) models was compared, utilizing the mean absolute percentage error (MAPE) criterion. The ERNN model revealed slightly better performance. Indeed, nonlinear models, such as the ANN model, outperform the traditional linear ARIMA models only when nonlinear behavior is observed in the time series [8].
A study recently investigated the yearly traffic fatality counts in Malaysia. The research compared the performance of ARIMA(0,1,1) with the Poisson GLM and the negative binomial GLM models. The results demonstrated that the ARIMA model performed better based on the MAPE criterion [33]. Quddus [4] reported that when the normality assumption of errors in the ARIMA model is violated, the INAR Poisson model leads to better performance. Nonetheless, the ARIMA outperforms the INAR model in the case of aggregated time series because, in this case, the mean of the counts is high, and the normal error distribution assumption is satisfactory. In conclusion, although complicated machine learning and integer-valued time series regression models are other options for predicting crash counts, the classical ARIMA technique and its extensions still lead to predictions with high accuracy.
A recent study analyzed the time series of occupational traffic fatality rates in the southern states of Brazil. The study applied the βARIMA and KARIMA extensions of the ARIMA model to model the random effects of the time series data explicitly. Results indicated that the simple ARIMA model generally outperforms the extensions in the aggregated time series [34]. Disaggregated time series crash counts generally follow a Poisson-type distribution. Accordingly, Quddus [7] applied the Poisson-based generalized linear autoregressive and moving average (GLARMA) model to a time series dataset of aircraft safety data, analyzing an intervention's effectiveness within UK airspace. The study revealed that accounting for serial correlation in airprox counts would be more important than considering the over-dispersion while dealing with time series crash data. Quddus [7] suggested the use of the real-valued ARIMA model and its extensions for datasets with a high mean count (greater than 50) and a large spatial and temporal resolution, such as national-level fatal accidents. It is important to note, however, that the NBINGARCH or GLARMA models may be more suitable for highly disaggregated time series datasets. Indeed, if the aggregate data follows a normal distribution, the Poisson distribution assumption of these models would result in biased estimates and misleading interpretations. Besides, these models cannot explicitly control for inherent seasonal variations in the time series process.
2.1.3. Multivariate analysis of crash time series
Recent studies have indicated that incorporating human, vehicle, and environmental factors in time series analysis of crash datasets raises the prediction capability of the univariate forecasting methods [10]. Consequently, the ARIMA with explanatory variable (ARIMAX) modeling technique could outperform the simple ARIMA in terms of accuracy and comprehensiveness if such data is available. Nonetheless, the vast majority of studies in the literature analyzed only the aggregated crash counts without considering the simultaneous effects of influential underlying factors [8,34,35,36]. Although a limited number of studies have adopted exogenous variables as covariates, most of them only adopted dummy variables, evaluating an intervention's effectiveness [37,38,39,40]. The studies which conducted the multivariate dynamic time series analysis have adopted operational traffic characteristics (e.g., traffic flow), macroeconomic conditions (e.g., GDP, unemployment rate), socio-demographic specifications (e.g., total population, automobile numbers), weather conditions, and traffic violations [4,5,10,37,41,42,43,44]. However, no study analyzed the dynamic impacts of mean speed on different crash severity levels, controlling for traffic exposure. Besides, the SARIMAX modeling technique has not been well-established for analyzing accident time series in the literature.
2.2. Speed-crash association
On the one hand, early research suggested that the speed-crash relationship could be represented by a “U-shaped” curve [45,46]. This implies that only high and low speed ranges contribute to accidents. Nonetheless, subsequent studies revealed exponential, power, or linear model postulations for the positive speed-crash association [47,48,49,50,51]. On the other hand, some studies found a negative [17,18,52,53] or an insignificant association [18,54,55,56]. Solomon [45] put forward the idea that it is the deviation from the modal speed that matters. Subsequent studies have revealed that speed variation is a major factor in causing accidents and that the simultaneous effects of mean speed are comparatively negligible. This has led to the conclusion that “Variance Kills, not speed” [18,56]. In contrast, some findings suggest that speed and speed variations are both significant predictors of crash frequency [57,58].
Since the emergence of the COVID-19 pandemic, intercity traffic volume has decreased and mean speed has increased significantly in some countries, and the resulting effects on traffic safety offer valuable insights. Traffic volume dropped sharply during the pandemic, by up to 50% in more than 12 countries [59]. With such a decline in exposure to crash risk, traffic accidents were expected to decrease significantly. Meanwhile, despite a consistent decline in the number of total crash counts, the number of road traffic deaths has increased in some countries. Studies attributed the surge in severe accidents to mobility on empty lanes, reduced crowding, and increased speeding [59]. The empirical findings of recent studies also attributed the increase in severe accidents to the increase in aggressiveness (e.g., speeding, drunk driving, improper passing) and inattentiveness (e.g., unbelted driving, distracted driving, failing to signal) of drivers on empty roads [60]. These findings suggest that the effect of mean speed on accidents is not necessarily the same for different severity levels.
Recent studies have demonstrated that speed and its variations are not the only factors responsible for higher crash frequencies but that the combination of specific traffic conditions also plays an important role in crash occurrences [61]. Factors such as traffic flow (traffic exposure), vehicle occupancy, and road geometric design have been reported as influential factors. Traffic flow (traffic exposure) has always been considered the main predictor of crash risk [22,62,63]. Besides, the time trends of this crucial variable explain the general trends of crash counts. Consequently, scholars would not correctly extract the partial effects of other variables without controlling for traffic exposure.
Previous studies that have shown an inverse relationship between speed and crash occurrence have attributed their findings to unobserved heterogeneity, endogeneity, and confounding factors. However, they have not investigated influential mechanisms, such as the proportion of distinct accident types, which could help interpret these findings.
3. Materials and methods
3.1. Data source and data distribution
The National Traffic Police (NAJA) provided the surveillance data for this study, which consisted of all traffic accidents on rural Iran highways recorded on standard COM114 forms. The dataset included information about the passengers/occupants involved in accidents, accident severity, and accident type. Since 2016, the Ministry of Roads and Urban Development (MRUD) has collected daily data from both police reports and loop detectors. The loop detectors measure the Time Mean Speed (TMS) of vehicles on the roads, which is “the average speed of all vehicles passing a point on a highway or lane over a specified time period” [64]. The speed of an individual vehicle can be calculated by dividing a specified distance (the length of the detector plus the average length of a vehicle) by the measured travel time (the time span during which the vehicle is sensed by the loop detector). The TMS is simply the mean of the individual vehicle speeds estimated by the point detector. The average monthly TMSs recorded by 2604 loop detectors across the country were used in this study as a measure of monthly operating speed on rural highways.
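As a minimal sketch of the spot-speed calculation described above, the following Python snippet derives individual vehicle speeds from detector occupancy times and averages them into a TMS. The detector length, the average vehicle length, the occupancy times, and all function names are illustrative assumptions, not values or procedures taken from the NAJA/MRUD data.

```python
from statistics import mean


def vehicle_speed_kph(detector_length_m, vehicle_length_m, occupancy_time_s):
    """Speed of one vehicle: effective distance divided by time sensed by the loop."""
    if occupancy_time_s <= 0:
        raise ValueError("occupancy time must be positive")
    distance_m = detector_length_m + vehicle_length_m
    return distance_m / occupancy_time_s * 3.6  # convert m/s to km/h


def time_mean_speed_kph(occupancy_times_s, detector_length_m=2.0, vehicle_length_m=4.5):
    """Time mean speed: arithmetic mean of the individual spot speeds.

    The default lengths are assumed values for illustration only.
    """
    return mean(
        vehicle_speed_kph(detector_length_m, vehicle_length_m, t)
        for t in occupancy_times_s
    )


# Hypothetical occupancy times (seconds) for four vehicles
example_times = [0.26, 0.30, 0.325, 0.39]
example_tms = time_mean_speed_kph(example_times)
```

Averaging such spot speeds over all vehicles in a month, and then over detectors, would give a monthly figure of the kind used as the speed covariate.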
As the traffic exposure variable plays a determinative role in crash occurrence, inductive loop detectors were used to measure daily traffic volume on rural highways. To effectively capture the seasonal fluctuations in Iranian travel patterns, the data was aggregated on a monthly basis according to the Persian calendar. Finally, the time series of total and fatal traffic crashes, average monthly traffic flow, and average monthly TMS from Farvardin 1, 1395 (March 20, 2016) to Mordad 31, 1400 (August 22, 2021) were utilized for analysis (i.e., 65 data points). Table 1 presents the descriptive statistics of the target variables and the exogenous variables. The proposed crash prediction model will evaluate the impact of each contributing factor on the monthly number of crashes.
Table 1.
Descriptive statistics of data.
| Variable | Number of months | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|---|
| Response variables | | | | | |
| Number of total accidents | 65 | 6929 | 16138 | 11203.06 | 2340.02 |
| Number of fatal accidents | 65 | 235 | 560 | 415.26 | 68.81 |
| Exogenous variables | | | | | |
| Mean speed (kph) | 65 | 76.18 | 80.94 | 78.61 | 0.91 |
| Traffic flow () | 65 | 261.42 | 698.76 | 515.73 | 79.53 |
Eviews version 10 was used to estimate the models employing the maximum likelihood procedure and to calculate the ADF and BDS tests. Additionally, Minitab version 19 was used to conduct the Kolmogorov-Smirnov (KS) and Ljung-Box (LB) tests on residuals and to represent the ACF/PACF and time series plots.
Fig. 1 indicates the normal probability plots, histograms, and KS statistics of monthly total and fatal accidents time series data. These statistics imply that the normal distribution assumption for the error term in the Box-Jenkins method is appropriate for the macro-level data, and suggest that the discrete-valued models with the Poisson distribution assumption would not be suitable for this case.
Fig. 1.
Normal probability plots, histograms, and KS statistics illustrate the normal distribution of target variables. Panels (a) and (b) represent the normal probability plots of the fatal and total accidents, respectively, while Panels (c) and (d) indicate the histograms of the fatal and total accidents, respectively.
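A normality check of this kind can be sketched in a few lines of Python. The function below computes the KS distance between the empirical distribution of a sample and a normal distribution fitted to it. It is only an illustration under stated assumptions: the study itself used Minitab, the function names are hypothetical, and the asymptotic 5% bound of 1.36/√n strictly applies to a fully specified reference distribution (when the mean and standard deviation are estimated from the same sample, a stricter Lilliefors-type bound is appropriate).

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(sample):
    """KS distance between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def ks_critical_5pct(n):
    """Asymptotic 5% critical value for a fully specified distribution."""
    return 1.36 / math.sqrt(n)


# Toy sample, not the crash data
toy_d = ks_statistic_normal([1, 2, 3, 4, 5])
bound_65 = ks_critical_5pct(65)  # bound for a series of 65 monthly observations
```

A KS distance below the chosen bound means the hypothesis of normality is not rejected at that level.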
3.2. Methodology
The ARIMA model, developed by Box and Jenkins [29], is a combination of the moving average (MA) and autoregressive (AR) models, which explicitly incorporates differencing in its formulation. The AR model characterizes a time series based on the relationship between current observations and their lagged values, whereas the MA model represents a time series as a combination of present and prior random errors [10]. To account for seasonality, the seasonal ARIMA (SARIMA) adds three seasonal components to the model. The model's structure is denoted as $\mathrm{SARIMA}(p,d,q)(P,D,Q)_s$, where $d$ is the number of non-seasonal differencing operations, $q$ is the moving average order, $p$ is the autoregressive order, and $s$ represents the number of periods in each season. The seasonal components are represented by $D$, $Q$, and $P$. The SARIMA model can be expressed as a series of lag polynomials, as represented in Eq. (1) [65]:

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d\,\nabla_s^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \qquad (1)$$

Where $\nabla_s = 1 - B^s$ denotes the first-order seasonal difference, eliminating non-stationarity from the time series, and $\varepsilon_t$ is the Gaussian white noise error at time $t$.
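The regular and seasonal differencing operations in the SARIMA structure can be illustrated with a short Python sketch. The helper names and the toy series (a linear trend plus a repeating 12-month pattern) are assumptions made for illustration; they are not the crash data.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x_t - x_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, s=12):
    """Apply D seasonal differences (lag s) followed by d regular differences (lag 1)."""
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=s)
    for _ in range(d):
        out = difference(out, lag=1)
    return out


# Toy monthly series: linear trend plus a fixed 12-month seasonal pattern
pattern = [5, 3, 0, -2, 4, 8, 1, -1, -3, -4, -2, 6]
toy = [100 + 2 * t + pattern[t % 12] for t in range(36)]

# One seasonal difference removes the repeating pattern and leaves the yearly trend step
seasonally_differenced = sarima_difference(toy, d=0, D=1, s=12)

# Adding one regular difference removes the remaining constant as well
fully_differenced = sarima_difference(toy, d=1, D=1, s=12)
```

Each seasonal difference shortens the series by s observations, which matters for short monthly series such as the 65-point dataset analyzed here.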
The SARIMAX model, as introduced in Ref. [66], extends the SARIMA model by including exogenous variables to enhance its explanatory power. Referred to as “dynamic regression” in the literature [67], the SARIMAX model captures the dynamic effects of external factors on the time series of interest. According to its definition in Ref. [68], the SARIMAX model can be expressed as indicated in Eqs. (2), (3):

$$Y_t = I_t + \beta X_t + N_t \qquad (2)$$

Where;

$$\Phi_P(B^s)\,\phi_p(B)\,N_t = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t \qquad (3)$$

In which $I_t$ is the intervention term; $Y_t$ is the appropriate transformation of the target variable, $X_t$ is a vector of predictors with coefficient vector $\beta$, and $N_t$ denotes the error component, represented by Eq. (3). Besides, $\Phi_P(B^s)$ and $\phi_p(B)$ are the seasonal AR (SAR) and regular AR operators. $\Theta_Q(B^s)$ and $\theta_q(B)$ are the seasonal MA (SMA) and regular MA operators, and $B^s$ and $B$ are the seasonal and regular backshift operators. $\varepsilon_t$ is a random error component.
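To make the idea of dynamic regression concrete, the sketch below works through the simplest special case: a regression on exogenous predictors whose error component follows a first-order autoregressive process. It deliberately omits the seasonal, moving average, and intervention terms, and every coefficient and data value in it is hypothetical rather than an estimate from this study.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def one_step_forecast(y_last, x_last, x_next, beta, phi):
    """One-step-ahead forecast for y_t = beta . x_t + n_t with n_t = phi * n_{t-1} + e_t.

    The forecast adds the regression prediction for the next period to the
    part of the last regression error that is expected to carry over.
    """
    n_last = y_last - dot(beta, x_last)      # last observed regression error
    return dot(beta, x_next) + phi * n_last  # regression part + carried-over error


# Hypothetical coefficients for (traffic flow, mean speed) and an AR(1) parameter
beta = [0.5, -2.0]
phi = 0.5

forecast = one_step_forecast(
    y_last=100.0,
    x_last=[500.0, 78.0],
    x_next=[520.0, 79.0],
    beta=beta,
    phi=phi,
)
```

The example shows why future values of the explanatory variables are needed for forecasting: the regression part of the prediction depends on the next-period predictors.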
The Box-Jenkins method is a three-step iterative modeling process that includes model identification, parameter estimation, and diagnostic checking [29]. The stationarity assumption is a core implicit assumption of the Box-Jenkins models, which requires that the statistical features of the time series, such as mean, variance, and serial correlation, remain constant over time [10].
To begin with, the stationarity and linearity of the series should be verified using the ADF (Augmented Dickey-Fuller) [69] and BDS (Brock, Dechert, and Scheinkman) [70] tests, respectively. The ADF test is a statistical tool used to determine the presence of a unit root in a given series, which implies that the series is non-stationary. The null hypothesis of the ADF test is that the time series contains a unit root. The BDS test has the null hypothesis that time series data is independent and identically distributed (iid). This test has a unique capability to detect nonlinearities while not being affected by linear dependencies in the data. It is necessary to set two free parameters, the epsilon value (ε) and the embedding dimension (m), when conducting the BDS test. Generally, the embedding dimension should range from 2 to 8, although the maximum embedding dimension should be lower if there are fewer observations available [71]. Simulation studies suggest that the BDS test should be computed using an epsilon value in the range of 1.5–2.0 times the standard deviation of the time series [72]. This study computed the BDS statistics with m set to 5 and ε set to 1.5 times the standard deviation. Additionally, stationarity can be evaluated by visually inspecting the time series and analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Model identification is the subsequent step, which entails determining the orders of the Box-Jenkins model through examination of the PACF and ACF plots. Subsequently, the parameters of the formulated model are estimated using the maximum likelihood method. Ultimately, the alternative models are compared, and the model with the lowest MAPE and BIC is preferred.
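The ACF inspection used in the identification step can be sketched as follows. The function computes the sample autocorrelation coefficients and the approximate 95% significance bound of ±1.96/√n commonly drawn on correlograms; the names and the toy seasonal series are assumptions for illustration, and the study itself produced these plots in Minitab.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


# Toy monthly series with a pure 12-month cycle (five years of data)
toy = [math.cos(2 * math.pi * t / 12) for t in range(60)]
acf = sample_acf(toy, max_lag=24)
bound = significance_bound(len(toy))

# Lags whose autocorrelation exceeds the bound in absolute value
significant_lags = [k for k in range(1, 25) if abs(acf[k]) > bound]
```

For a seasonal monthly series, spikes at lags 12 and 24 that exceed the bound are the typical signal that seasonal terms should be considered.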
After developing the model, the next task is to perform diagnostic checking to evaluate its accuracy. Visually assessing the white noise properties of the residuals, such as the constancy of variance, independence, and zero mean, can be done by inspecting the PACF and ACF residual plots. Additionally, the normality and independence of residuals can be evaluated using the Kolmogorov-Smirnov (KS) and Ljung and Box (LB) tests, respectively. The Ljung-Box test [73] is a measure of how well the autocorrelation of the residuals matches the autocorrelation of white noise. If the Ljung-Box test statistic is not statistically significant, it means that the null hypothesis of the white noise characteristics of the residuals cannot be rejected. Finally, the efficiency of the model's forecasts can be assessed by comparing the observed and predicted monthly crash frequencies. All statistical tests in this study are conducted at the 5% significance level.
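The Ljung-Box statistic mentioned above has the form Q = n(n+2) Σ r_k²/(n−k), summed over lags k = 1..h, where r_k is the lag-k autocorrelation of the residuals. The sketch below computes Q in plain Python. It is a hedged illustration: the function name and the toy residuals are hypothetical, and the quoted critical value (18.307, the 95th percentile of a chi-square distribution with 10 degrees of freedom) ignores the reduction in degrees of freedom that applies when ARMA parameters have been estimated.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [x - m for x in residuals]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# 95th percentile of the chi-square distribution with 10 degrees of freedom
CHI2_95_DF10 = 18.307

# Toy residuals that alternate in sign, i.e. strongly autocorrelated
alternating = [1.0, -1.0] * 20
q_alternating = ljung_box_q(alternating, max_lag=10)
rejects_white_noise = q_alternating > CHI2_95_DF10
```

A Q value below the critical value means the white noise hypothesis for the residuals cannot be rejected, which is the outcome sought in diagnostic checking.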
4. Results and discussion
The purpose of this research was to propose an accurate time series regression model for crash macro forecast in Iran. To this end, ARIMA, ARIMAX, SARIMA, and SARIMAX econometric models were developed and compared for total and fatal crash count time series datasets. The datasets were divided into two sections, with the first 55 observations from Farvardin 1, 1395 (March 20, 2016) to Mehr 30, 1399 (October 21, 2020) utilized as the training set and the last ten months employed as the test set. The developed models were evaluated using a higher R², a lower MAPE, and a lower Bayesian information criterion (BIC) as accuracy measures.
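The evaluation design above (55 training months, 10 held-out months, MAPE as the forecast accuracy measure) can be sketched as follows. This is a minimal illustration; the helper names and the placeholder series are hypothetical.

```python
def split_series(series, n_train=55):
    """Chronological split: first n_train points for fitting, the rest for testing."""
    return series[:n_train], series[n_train:]

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Placeholder for 65 monthly observations (55 training + 10 test months).
months = list(range(1, 66))
train, test = split_series(months)

# Toy check: two forecasts that are each off by 10% give a MAPE of 10%.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
print(len(train), len(test), round(example_mape, 2))
```

The split is chronological rather than random because shuffling would break the serial dependence that the time series models are meant to capture.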
4.1. Accident macro forecast for the fatal crash count dataset
Fig. 2 illustrates the monthly time trends of fatal traffic accidents in Iran. It is evident that monthly fatal accidents follow the time trends of the traffic flow (traffic exposure) closely. They both pick up in the first two months of the year, during the New Year (Nowruz) holidays. Then they decline till the fourth month when the summer school holidays start. The trends pick up again more intensively till the sixth month, when summer vacations end. They then drop off during the autumn and the first two months of the winter, and they start the increasing pattern in the last month again. Consequently, the seasonal nature of the crash time series is evident, confirming the necessity of implementing the advanced SARIMA models.
Fig. 2.
Monthly total (*10)/fatal crash counts and monthly traffic flow (*10^5) in Iran.
The time series plot of the monthly fatal crashes indicates that the series is stationary in terms of variance and mean, as there are no significant trends or fluctuations in variance over time. The partial autocorrelation function (PACF) and autocorrelation function (ACF) plots for the first 24 lags display a damped sinusoid behavior, with the significance of the first lags’ autocorrelation coefficients, which suggests that the series is stationary (see Fig. 3). The ADF test further confirmed the stationarity of the series rejecting the null hypothesis of non-stationarity at a 99% confidence level (see Table 2).
Fig. 3.
The ACF and PACF plots of the original fatal crash count time series, represented in panels (a) and (b), respectively.
Table 2.
P-values of statistical tests for linearity and stationarity.
| Time series | BDS test (m = 2) | BDS test (m = 3) | BDS test (m = 4) | BDS test (m = 5) | ADF test |
|---|---|---|---|---|---|
| Monthly fatal crashes | 0.320 | 0.938 | 0.993 | 0.999 | 0.000 |
| Ln (monthly total crashes) | 0.000 | 0.000 | 0.000 | 0.000 | 0.050 |
| Monthly total crashes | 0.000 | 0.000 | 0.000 | 0.000 | 0.066 |
Furthermore, the BDS test revealed that the monthly fatal accidents follow a linear time series, since the test statistic did not reject the null hypothesis at a 5% significance level (see Table 2). Therefore, linear time series regression methods would be suitable for modeling the process.
In order to identify the most appropriate model, the ACF and PACF plots were examined. The ACF plot showed a damped sinusoidal pattern, with only the first lag coefficients having high values. Moreover, the PACF demonstrated a spike at the first lag, followed by a tailing off, indicating the presence of autoregressive terms in the series. As spikes were also observed in subsequent lags in both plots, it was concluded that the time series was a SARIMA process. To determine the best model, multiple SARIMA models were developed, ranked by the lower BIC criterion. The statistical significance of the top 20 models’ parameters was evaluated at a 95% confidence level, as shown in Fig. 4. Finally, two univariate models were selected for further analysis, and their features are described in Table 3.
Fig. 4.
BIC values of seasonal [panel (a)] and non-seasonal [panel (b)] univariate models. The values in parentheses show the order of model parameters as (p,q) (P,Q).
Table 3.
Accident prediction models for monthly fatal accidents in Iran.
| | ARIMA | | ARIMAX | | SARIMA | | SARIMAX | |
|---|---|---|---|---|---|---|---|---|
| | Coef. | T-stat | Coef. | T-stat | Coef. | T-stat | Coef. | T-stat |
| Explanatory variables | | | | | | | | |
| Non-Seasonal AR (1) | 0.677 | 7.19 | 0.608 | 5.180 | 0.534 | 4.09 | 0.442 | 3.229 |
| Seasonal AR (1) | – | – | – | – | 0.636 | 4.46 | 0.493 | 3.651 |
| Constant | 420.839 | 20.34 | −44.930 | −0.056 | 409.677 | 15.51 | – | – |
| Monthly traffic flow (* ) | – | – | 0.615 | 7.723 | – | – | 0.553 | 5.760 |
| Mean speed (km/h) | – | – | 1.835 | 1.857 | – | – | 1.679 | 2.513 |
| Descriptive statistics | | | | | | | | |
| Length of series | 55 | | 55 | | 55 | | 55 | |
| Log-likelihood | −293.30 | | −267.67 | | −283.64 | | −263.57 | |
| Accuracy (within sample) | | | | | | | | |
| Bayesian information criterion (BIC) | 10.884 | | 10.098 | | 10.606 | | 9.949 | |
| Mean absolute % error (MAPE) | 10.205 | | 6.092 | | 8.209 | | 5.761 | |
| Mean absolute deviation (MAD) | 41.528 | | 25.160 | | 33.162 | | 24.027 | |
| Root mean square error (RMSE) | 49.805 | | 31.301 | | 39.589 | | 28.242 | |
| R² | 0.472 | | 0.791 | | 0.666 | | 0.830 | |
| Diagnosis Check of Residuals | | | | | | | | |
| Ljung and Box (LB) (K = 14) | 47.91 | 0.73 | 25.24 | 0.11 | 10.53 | 0.57 | 10.95 | 0.76 |
| Kolmogorov-Smirnov (KS) | 0.086 | <1.46 | 0.054 | <1.46 | 0.100 | <1.46 | 0.082 | <1.46 |
| Forecast Accuracy | | | | | | | | |
| Out of sample MAPE (%) | 17.67 | | 8.38 | | 11.29 | | 5.61 | |
The best-fitting univariate seasonal and non-seasonal models were a SARIMA model with non-seasonal and seasonal AR (1) terms and an ARIMA (1,0,0) model, respectively. As presented in Table 3, the estimated ARIMA model had an R² value of 0.472, indicating that roughly half of the variation in current monthly fatal crashes could be explained by past fatal crashes. The inclusion of seasonal dependencies in the SARIMA model increased the R² by a factor of about 1.4, resulting in a value of 0.666.
The multivariate SARIMAX model demonstrated superior performance to other models, with a coefficient of determination of 0.830. Monthly mean speed and traffic exposure had a significant positive impact on fatal accidents. Diagnostic checks confirmed that the developed models were appropriate for forecasting time series data of fatal accidents in Iran. The ACF and PACF plots of the residuals displayed no large coefficients at any lag, indicating that there was no serial correlation among the residuals and that the models accurately captured the underlying time series process, as illustrated in Fig. 5.
Fig. 5.
The residual ACF and PACF of the ARIMA and SARIMA models (fatal crashes). Panels (a) and (b) represent the ACF plots of the ARIMA and SARIMA residuals, respectively, while panels (c) and (d) show the PACF plots of the ARIMA and SARIMA residuals, respectively.
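As a worked check on the model comparison, the in-sample BIC values in Table 3 appear consistent with a per-observation Schwarz criterion, BIC = (−2 ln L + k ln n)/n, if k counts the estimated coefficients plus the innovation variance. This is an inference from the reported numbers and not a formula stated in the study; the parameter counts in the sketch below are therefore assumptions.

```python
import math

def bic_per_observation(log_lik, n_params, n_obs):
    """Schwarz criterion divided by the sample size."""
    return (-2.0 * log_lik + n_params * math.log(n_obs)) / n_obs

N_OBS = 55  # length of the training series (Table 3)

# model: (log-likelihood from Table 3, ASSUMED parameter count, reported BIC)
table3 = {
    "ARIMA":   (-293.30, 3, 10.884),  # AR(1), constant, variance
    "ARIMAX":  (-267.67, 5, 10.098),  # AR(1), constant, flow, speed, variance
    "SARIMA":  (-283.64, 4, 10.606),  # AR(1), seasonal AR(1), constant, variance
    "SARIMAX": (-263.57, 5, 9.949),   # AR(1), seasonal AR(1), flow, speed, variance
}

recomputed = {
    name: bic_per_observation(ll, k, N_OBS) for name, (ll, k, _) in table3.items()
}
for name, value in recomputed.items():
    print(name, round(value, 3), "reported:", table3[name][2])
```

Under these assumptions the recomputed values agree with the reported ones to three decimals, and the ranking is unchanged: SARIMAX has the lowest BIC, followed by ARIMAX, SARIMA, and ARIMA.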
The overall accuracy of the models was also evaluated using the LB test, and the test statistics were found to be statistically significant at a 95% confidence level (see Table 3). The estimated models were employed to forecast the first ten months of the out-of-sample period (see Fig. 6). The out-of-sample forecast accuracy of the models was reported as the mean absolute percent error (MAPE) in Table 3. Reviewing the SARIMAX model in Table 3, the mean speed has a linear positive association with fatal accidents, controlling for traffic exposure.
Fig. 6.
Observed vs. forecasted values of monthly fatal accidents in Iran.
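To make the forecasting step concrete, a model with one non-seasonal and one seasonal AR term implies the one-step recursion y_t − μ = φ(y_{t−1} − μ) + Φ(y_{t−s} − μ) − φΦ(y_{t−s−1} − μ). The sketch below applies it with the univariate SARIMA coefficients from Table 3. Two assumptions are made and are not confirmed by the text: the seasonal period s is 12 months, and the reported constant is the series mean μ. The function name and the flat toy history are hypothetical.

```python
def sarima_ar_one_step(history, mean, phi, seasonal_phi, period=12):
    """One-step forecast for a multiplicative AR(1) x seasonal AR(1) model."""
    if len(history) < period + 1:
        raise ValueError("need at least period + 1 past observations")

    def deviation(lag):
        # Deviation from the mean, `lag` steps before the forecast target.
        return history[-lag] - mean

    return (
        mean
        + phi * deviation(1)
        + seasonal_phi * deviation(period)
        - phi * seasonal_phi * deviation(period + 1)
    )

PHI = 0.534           # non-seasonal AR(1), SARIMA column of Table 3
SEASONAL_PHI = 0.636  # seasonal AR(1), SARIMA column of Table 3
MEAN = 409.677        # "Constant" in Table 3, ASSUMED to be the series mean

# Hypothetical history: 13 months all sitting 100 crashes above the mean.
flat_history = [MEAN + 100.0] * 13
next_month = sarima_ar_one_step(flat_history, MEAN, PHI, SEASONAL_PHI)
print(round(next_month, 3))
```

With a flat history the forecast retains φ + Φ − φΦ ≈ 0.83 of the deviation from the mean, which shows how the seasonal term adds persistence beyond the non-seasonal AR(1) coefficient alone.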
4.2. Accident macro forecast for the total crash count dataset
The time series in Fig. 2 showed an increasing trend in monthly accidents, indicating non-stationarity in the mean. The stationarity was further examined using the ADF test, which did not reject the non-stationarity null hypothesis at a 95% confidence level, as indicated in Table 2. Additionally, non-stationarity in variance was observed after the 42nd data point, possibly due to the COVID-19 pandemic (see Fig. 2). Applying a Box-Cox transformation with λ = 0 (i.e., the natural logarithm) stabilized the mean and variance, resulting in a stationary series. The ADF test confirmed the stationarity of the transformed series, rejecting the null hypothesis at a 5% significance level, as shown in Table 2. Additionally, the ACF plot of the transformed time series indicated stationarity, with only the first lag coefficients being statistically significant (see Fig. 7).
Fig. 7.
The ACF and PACF plots of the transformed total crash count time series, represented in panels (a) and (b), respectively.
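The Box-Cox transformation with λ = 0 used for the total crash series reduces to the natural logarithm, which is why Table 2 labels the transformed series as Ln (monthly total crashes). A minimal sketch of the transform and its inverse is given below; the function names and the example count are illustrative.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value; lam = 0 gives the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical monthly count, for illustration only.
count = 10000.0
z = box_cox(count, 0)
print(round(z, 4), round(inverse_box_cox(z, 0), 1))
```

Because forecasts are produced on the log scale, they have to be passed through the inverse transform before being compared with observed crash counts.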
The transformed series ACF plot in Fig. 7 demonstrated a damped pattern, while the PACF plot showed a single significant spike at the first lag, indicating a pure AR (1) process. Despite developing various SARIMA models, none of the estimated models had statistically significant parameters. Finally, the ARIMA (1,0,0) and ARIMA (1,0,0) with exogenous variables (ARIMAX) models were chosen as the best models for forecasting monthly total accidents, and their estimated parameters are provided in Table 4.
Table 4.
Accident prediction models for log-transformed monthly total accidents in Iran.
| | ARIMA | | ARIMAX | | SARIMA | |
|---|---|---|---|---|---|---|
| | Coef. | t-stat | Coef. | t-stat | Coef. | t-stat |
| Explanatory variables | | | | | | |
| Non-Seasonal AR (1) | 0.815 | 10.86 | 0.710 | 7.130 | 0.798 | 11.286 |
| Seasonal AR (1) | – | – | – | – | 0.260 | 1.503 |
| Constant | 9.289 | 112.44 | 14.0495 | 9.82 | 9.273 | 93.776 |
| Monthly traffic flow (* ) | – | – | 0.002 | 10.30 | – | – |
| Mean speed (km/h) | – | – | −0.071 | −3.88 | – | – |
| Descriptive statistics | | | | | | |
| Length of series | 55 | | 55 | | 55 | |
| Log-likelihood | 36.97 | | 70.77 | | 38.15 | |
| Accuracy (within sample) | | | | | | |
| Bayesian information criterion (BIC) | −1.126 | | −2.209 | | −1.096 | |
| Mean absolute % error (MAPE) | 1.062 | | 0.560 | | 2.370 | |
| Mean absolute deviation (MAD) | 0.099 | | 0.052 | | 0.095 | |
| Root mean square error (RMSE) | 0.122 | | 0.066 | | 0.119 | |
| R² | 0.664 | | 0.901 | | 0.683 | |
| Diagnosis Check of Residuals | | | | | | |
| Ljung and Box (LB) (K = 14) | 9.94 | 1.15 | 6.29 | 0.76 | 9.52 | 1.08 |
| Kolmogorov-Smirnov (KS) | 0.073 | <1.46 | 0.097 | <1.46 | 0.116 | 1.85 |
| Forecast Accuracy | | | | | | |
| Out of sample MAPE (%) | 2.26 | | 1.97 | | 2.87 | |
Upon reviewing Table 4, it is observed that all the regressors, except for the seasonal AR term of the SARIMA model, are statistically significant at a 95% confidence level. The R² value of the ARIMA model is 0.664, indicating that about two-thirds of the variation in the current (log-transformed) crash counts can be explained by the autocorrelation in the series. The multivariate ARIMAX model performs better than the univariate model and explains 90% of the variation in the crash time series. The in-sample MAPE values suggest that the ARIMA and ARIMAX models reproduce the observations with an average absolute error of approximately 1% or less. Furthermore, the ARIMAX model suggests that the monthly crash frequency is negatively associated with the mean speed while controlling for the simultaneous impacts of traffic exposure.
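Because the total crash model is fitted to log-transformed counts, a regression coefficient β can be read as an approximate relative change of exp(β) − 1 per unit of the regressor. The sketch below applies this to the mean speed coefficient in Table 4, assuming the coefficient is expressed per 1 km/h and treating the interpretation as a standard log-linear reading rather than a result stated in the study.

```python
import math

def percent_change_per_unit(beta, delta=1.0):
    """Relative change (%) in the original-scale response for a `delta` change
    in a regressor, when the response was modeled on the log scale."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

SPEED_COEF = -0.071  # mean speed coefficient, ARIMAX column of Table 4

speed_effect = percent_change_per_unit(SPEED_COEF)
print(round(speed_effect, 2))  # roughly a 6.9% decrease per unit of mean speed
```

Read this way, a one-unit rise in monthly mean speed is associated with roughly 7% fewer total crashes, holding traffic exposure fixed, which is the inverse association described above.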
After conducting tests on the residuals, including checking for non-autocorrelation, variance stationarity, and zero mean, the partial autocorrelation function (PACF) and autocorrelation function (ACF) plots were analyzed and are presented in Fig. 8. The results indicate that the ACF/PACF values are insignificant, suggesting that the residuals are uncorrelated. The Ljung and Box test evaluated the fit of the developed instruments, where the null hypothesis of the test statistic was not rejected, as shown in Table 4. Moreover, the Kolmogorov-Smirnov (KS) test confirmed the hypothesis of the Gaussian white noise properties for the error term, as the normality null hypothesis was not rejected based on the test's P-values, all of which were above 0.15 (e.g., t-value<1.46).
Fig. 8.
The residual ACF and PACF of the ARIMA and ARIMAX models (total crashes). Panels (a) and (b) represent the ACF plots of the ARIMA and ARIMAX residuals, respectively, while panels (c) and (d) show the PACF plots of the ARIMA and ARIMAX residuals, respectively.
The developed macro forecast crash prediction models were then applied for out-of-sample crash prediction (see Fig. 9). The prediction accuracy of the models was evaluated based on the lower MAPE criterion, with the results reported in the last row of Table 4.
Fig. 9.
Observed vs. forecasted values of monthly total accidents in Iran.
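The out-of-sample step for an AR(1) model can be sketched with the recursion ŷ(T+h) = μ + φ^h (y_T − μ), followed by exponentiation to return to crash counts. The example uses the ARIMA coefficients from Table 4 and assumes that the reported constant is the mean of the log series; the last observed value is hypothetical.

```python
import math

def ar1_forecast(last_value, mean, phi, horizon):
    """Multi-step AR(1) forecasts: deviations from the mean decay by phi each step."""
    return [mean + (phi ** h) * (last_value - mean) for h in range(1, horizon + 1)]

PHI = 0.815       # non-seasonal AR(1), ARIMA column of Table 4
MEAN_LOG = 9.289  # "Constant" in Table 4, ASSUMED to be the mean of ln(crashes)

last_log_count = 9.5  # hypothetical last training value of ln(total crashes)
log_path = ar1_forecast(last_log_count, MEAN_LOG, PHI, horizon=10)

# Back-transform the log-scale forecasts to the count scale.
count_path = [math.exp(z) for z in log_path]
print([round(c) for c in count_path])
```

The forecasts revert geometrically toward the long-run mean, so a univariate AR(1) model cannot anticipate changes driven by traffic flow or speed; this is one way to see why the ARIMAX model attains a lower out-of-sample MAPE in Table 4. Exponentiating a log-scale forecast gives a median-type prediction on the original scale.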
In summary, the results suggest that accident frequency increases when the mean speed declines, particularly in congested traffic, where elevated interactions among vehicles restrict the freedom of maneuver and relative distance. Therefore, the time series regression models indicate an inverse relationship between the crash frequency and mean speed. Recent real-time crash prediction research also supports this inverse association between crash occurrence and mean speed [74,75,76,77]. On the other hand, crash severity declines in congested traffic because lower speeds lead to fewer severe accidents [78,79]. The results demonstrate a positive association between mean speed and fatal crashes, which is in accordance with previous findings [16,80]. Additionally, the SARIMA model outperforms the ARIMA method when seasonality exists in the time series. This study questions the usefulness of the integer-valued extensions of the ARIMA model, known as GLARMA models, as they are not explicitly designed to model highly seasonal crash counts. Finally, the study aggregated daily traffic and crash data by Persian calendar month, as the travel pattern of Iranians follows the ceremonies and holidays of their specific culture. Aggregating data on another basis would partially hide the seasonal nature of the accident time series, which previous studies on time trends of road traffic fatalities in Iran have overlooked [36,39,81,82].
5. Conclusions
5.1. Conclusions
This study aimed to employ the classical Box-Jenkins methodology to predict the time trends of monthly total and fatal crash counts of rural Iran highways, incorporating the dynamic effects of time-varying traffic characteristics. To this end, a range of univariate and multivariate econometric models was developed, namely ARIMA, ARIMAX, SARIMA, and SARIMAX models. Besides, the dissimilar impacts of mean speed on fatal and total accidents were explored using the aforementioned models while controlling for traffic exposure. The estimated models' in-sample accuracy was compared employing various performance metrics, and the proposed instruments' forecast performance was evaluated based on the mean absolute percent error (MAPE) criterion.
The following conclusions were drawn in this research. In the monthly fatal crash dataset, the most accurate crash macro forecast tools were the SARIMAX, SARIMA, and ARIMA models, in that order. In contrast, the ARIMA and ARIMAX models provided the best fits for predicting total accidents, as the seasonal effects were weaker in this case. Besides, the mean speed indicated a linear positive association with fatal accidents, controlling for traffic exposure. On the other hand, it was inversely related to monthly total crash frequency. Overall, the results illustrate that the multivariate ARIMA and SARIMA models clearly outperform their univariate counterparts. Thus, incorporating the dynamic effects of time-varying exogenous predictors of accidents, especially traffic conditions, improves the forecast accuracy of the univariate time series regressions. Furthermore, the results suggest that it is essential to take seasonal dependencies into account when predicting crash time series data that exhibit strong seasonal patterns. Consequently, the SARIMA and SARIMAX models outperform their non-seasonal counterparts in the seasonal datasets. Moreover, the study illustrates the dissimilar impacts of average speed on total and severe crashes. Considering the contradictory results in the literature, subsequent studies should focus on explaining the reasons behind these discrepancies.
5.2. Implications
The models developed in this study could aid policymakers in predicting future accident trends and evaluating the effectiveness of newly implemented interventions. The multivariate models, which incorporate speed and traffic exposure, can effectively identify the efficiency of policies even in the presence of external factors that disrupt common trends, such as pandemics. The study found that increasing the monthly mean speed by 1 km/h leads to a monthly increase of about two fatal accidents on rural highways in Iran.
In low-income and middle-income countries (LMICs), enforcing regulations and penalties for speeding offenses may be less effective than in high-income countries due to the negligible costs of such offenses. In many high-income countries, heavy vehicles are equipped with mechanisms to prevent violation of speed limits; however, these devices are often met with resistance in LMICs for economic reasons or disabled by operators if installed [83]. Drivers in LMICs are often under pressure to speed due to timetables and incentives based on ticket receipts. Studies suggest that offering an insurance incentive for installing speed enforcement devices in vehicles could be an effective policy for reducing speed-related accidents in Iran [84]. Additionally, highways in high-income countries are designed with a forgiving design concept, which can diminish the severity and probability of traffic crashes even if individuals exceed speed limits or make other mistakes while driving. Therefore, the results of this study raise concerns about speed-related accidents in Iran and highlight the need for policymakers to increase the cost of speed violations, implement modern enforcement techniques, and upgrade road design standards.
The study also recommends the use of the SARIMAX method as a useful tool for quantifying the explicit impacts of distinct influential factors on crashes while accounting for underlying serial correlation, seasonality, and dynamic effects of other contributing factors. Despite its potential benefits, this robust econometric tool has been largely overlooked in traffic safety studies that use aggregate crash count time series analysis. Finally, the study highlights that accident frequency in Iran closely follows the traffic flow time trend and the seasonal pattern of recreational travel. Therefore, future interventions aimed at reducing accidents in Iran will need to include measures to control unnecessary intercity travel, considering the low price of fuel in the country.
5.3. Limitations and recommendations
Including environmental and behavioral predictors of accidents as exogenous variables in a SARIMAX framework is recommended for future studies, although such information may not be available at the national level in developing countries. Comparing the forecast performance of the developed models with advanced modeling techniques, such as recurrent neural networks and integer-valued time series regression methods, would also be informative, although the superiority of neural networks seems questionable for time trends with linear dependence.
Author contribution statement
Habibollah Nassiri: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper.
Seyed Iman Mohammadpour: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Mohammad Dahaghin: Contributed reagents, materials, analysis tools or data.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Data availability statement
Data will be made available on request.
Declaration of interest's statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Contributor Information
Habibollah Nassiri, Email: nassiri@sharif.edu.
Seyed Iman Mohammadpour, Email: s.mohammadpour@student.sharif.edu.
Mohammad Dahaghin, Email: Mohammad.dahaghin@alum.sharif.edu.
References
- 1. World Health Organization (WHO) World Health Organization; Geneva: 2018. Global Status Report on Road Safety 2018. Available from: https://www.who.int/publications/i/item/9789241565684
- 2. United Nations Department for Safety and Security (UNDSS) Road safety strategy booklet. 2020 Sep [cited 27 Jan 2023]. In: The United Nations (UN) website [Internet]. Available from: https://www.un.org/sites/un2.un.org/files/2020/09/road_safety_strategy_booklet.pdf
- 3. Anila P. United Nations publication issued by the United Nations Conference on Trade and Development; Geneva: 2017. Road Safety-Considerations in Support of the 2030 Agenda for Sustainable Development. Available from: https://unctad.org/system/files/official-document/dtltlb2017d4_en.pdf
- 4. Quddus M.A. Time series count data models: an empirical application to traffic accidents. Accid. Anal. Prev. 2008 Sep 1;40(5):1732–1741. doi: 10.1016/j.aap.2008.06.011.
- 5. Zhang Y., Zou Y., Wu L., Tang J., Muneeb Abid M. Exploring the application of the linear Poisson autoregressive model for analyzing the dynamic impact of traffic laws on fatal traffic accident frequency. J. Adv. Transport. 2020:1–9. 2020 Oct 9.
- 6. Ye F., Garcia T.P., Pourahmadi M., Lord D. Extension of negative binomial GARCH model: analyzing effects of gasoline price and miles traveled on fatal crashes involving intoxicated drivers in Texas. Transport. Res. Rec. 2012 Jan;2279(1):31–39.
- 7. Quddus M. Time-series Regression Models for Analysing Transport Safety Data. In: Safe Mobility: Challenges, Methodology and Solutions 2018 Apr 18. Emerald Publishing Limited. p. 279–296.
- 8. Qian Y., Zhang X., Fei G., Sun Q., Li X., Stallones L., Xiang H. Forecasting deaths of road traffic injuries in China using an artificial neural network. Traffic Inj. Prev. 2020 Aug 17;21(6):407–412. doi: 10.1080/15389588.2020.1770238.
- 9. Vanlaar W., Robertson R., Marcoux K. An evaluation of Winnipeg's photo enforcement safety program: results of time series analyses and an intersection camera experiment. Accid. Anal. Prev. 2014 Jan 1;62:238–247. doi: 10.1016/j.aap.2013.09.023.
- 10. Ihueze C.C., Onwurah U.O. Road traffic accidents prediction modelling: an analysis of Anambra State, Nigeria. Accid. Anal. Prev. 2018 Mar 1;112:21–29. doi: 10.1016/j.aap.2017.12.016.
- 11. Chen Y., Tjandra S. Daily collision prediction with SARIMAX and generalized linear models on the basis of temporal and weather variables. Transport. Res. Rec. 2014 Jan;2432(1):26–36.
- 12. Gupta R., Pandey G., Pal S.K. Comparative analysis of epidemiological models for COVID-19 pandemic predictions. Biostatistics & Epidemiology. 2021 Jan 2;5(1):69–91.
- 13. Sánchez-Sánchez PA, García-González JR, Coronell LH. IntechOpen; 2020. Encountered Problems of Time Series with Neural Networks: Models and Architectures, Recent Trends in Artificial Neural Networks - from Training to Prediction 2020 Mar 4; pp. 21–38.
- 14. Chan M., Bloomberg M.R. Reducing speed to save lives. 2017 May 9 [cited 27 January 2023]. In: World Health Organization Newsroom, Commentaries [Internet]. Available from: https://www.who.int/news-room/commentaries/detail/reducing-speed-to-save-lives
- 15. Aarts L., Van Schagen I. Driving speed and the risk of road crashes: a review. Accid. Anal. Prev. 2006 Mar 1;38(2):215–224. doi: 10.1016/j.aap.2005.07.004.
- 16. Hauer E. Speed and safety. Transport. Res. Rec. 2009;2103(1):10–17.
- 17. Baruya A. 9th International Conference on Road Safety in Europe. Bergisch Gladbach; Germany, September: 1998. Speed-accident relationships on European roads.
- 18. Garber N.J., Gadiraju R. Factors affecting speed variance and its influence on accidents. Transport. Res. Rec. 1989;1213:64–71.
- 19. Mehrabani B.B., Mirbaha B. Evaluating the relationship between operating speed and collision frequency of rural multilane highways based on geometric and roadside features. Civil Engineering Journal. 2018 Mar;4(3):609.
- 20. Zhang X., Hongyan Y.A., Guoqing H.U., Mengjing C.U., Yue G.U., Xiang H. Basic characteristics of road traffic deaths in China. Iran. J. Public Health. 2013;42(1):7.
- 21. Zhang X., Pang Y., Cui M., Stallones L., Xiang H. Forecasting mortality of road traffic injuries in China using seasonal autoregressive integrated moving average model. Ann. Epidemiol. 2015 Feb 1;25(2):101–106. doi: 10.1016/j.annepidem.2014.10.015.
- 22. Abdel-Aty M.A., Radwan A.E. Modeling traffic accident occurrence and involvement. Accid. Anal. Prev. 2000 Sep 1;32(5):633–642. doi: 10.1016/s0001-4575(99)00094-9.
- 23. Lord D., Washington S.P., Ivan J.N. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accid. Anal. Prev. 2005 Jan 1;37(1):35–46. doi: 10.1016/j.aap.2004.02.004.
- 24. Noland R.B., Quddus M.A. A spatially disaggregate analysis of road casualties in England. Accid. Anal. Prev. 2004 Nov 1;36(6):973–984. doi: 10.1016/j.aap.2003.11.001.
- 25. Noland R., Quddus M., Ochieng W. Transportation Research Board 85th Annual Meeting. Washington, DC, USA, January; 2006. The effect of the congestion charge on traffic casualties in London: an intervention analysis.
- 26. Antoniou C., Yannis G. State-space based analysis and forecasting of macroscopic road safety trends in Greece. Accid. Anal. Prev. 2013 Nov 1;60:268–276. doi: 10.1016/j.aap.2013.02.039.
- 27. Chi G., Cosby A.G., Quddus M.A., Gilbert P.A., Levinson D. Gasoline prices and traffic safety in Mississippi. J. Saf. Res. 2010 Dec 1;41(6):493–500. doi: 10.1016/j.jsr.2010.10.003.
- 28. Brijs T., Karlis D., Wets G. Studying the effect of weather conditions on daily crash counts using a discrete time-series model. Accid. Anal. Prev. 2008 May 1;40(3):1180–1190. doi: 10.1016/j.aap.2008.01.001.
- 29. Box G.E., Jenkins G.M. Holden-Day; San Francisco: 1970. Time Series Analysis: Forecasting and Control.
- 30. Linthicum K.J., Anyamba A., Tucker C.J., Kelley P.W., Myers M.F., Peters C.J. Climate and satellite indicators to forecast Rift Valley fever epidemics in Kenya. Science. 1999 Jul 16;285(5426):397–400. doi: 10.1126/science.285.5426.397.
- 31. Box G.E., Jenkins G.M., Reinsel G.C., Ljung G.M. John Wiley & Sons; 2015 May 29. Time Series Analysis: Forecasting and Control.
- 32. Blázquez-García A., Conde A., Milo A., Sánchez R., Barrio I. Short-term office building elevator energy consumption forecast using SARIMA. Journal of Building Performance Simulation. 2020 Jan 2;13(1):69–78.
- 33. Wai A.H.C., Seng S.Y., Fei J.L.W. Proceedings of the 2019 2nd International Conference on Mathematics and Statistics. Prague, Czech Republic; July. 2019. Fatality Involving Road Accidents in Malaysia: a comparison between three statistical models.
- 34. Melchior C., Zanini R.R., Guerra R.R., Rockenbach D.A. Forecasting Brazilian mortality rates due to occupational accidents using autoregressive moving average approaches. Int. J. Forecast. 2021 Apr 1;37(2):825–837.
- 35. Bahadorimonfared A., Soori H., Mehrabi Y., Delpisheh A., Esmaili A., Salehi M., Bakhtiyari M. Trends of fatal road traffic injuries in Iran (2004–2011). PLoS One. 2013 May 28;8(5) doi: 10.1371/journal.pone.0065198.
- 36. Delavary Foroutaghe M., Mohammadzadeh Moghaddam A., Fakoor V. Time trends in gender-specific incidence rates of road traffic injuries in Iran. PLoS One. 2019 May 9;14(5) doi: 10.1371/journal.pone.0216462.
- 37. Bergel-Hayat R., Debbarh M., Antoniou C., Yannis G. Explaining the road accident risk: weather effects. Accid. Anal. Prev. 2013 Nov 1;60:456–465. doi: 10.1016/j.aap.2013.03.006.
- 38. Wu Q., Chen T., Byrne P.A., Larsen J., Elzohairy Y. General deterrence of drinking and driving: an evaluation of the effectiveness of three Ontario countermeasures. Int. J. Eng. Manag. Econ. 2015;5(3–4):209–223.
- 39. Delavary Foroutaghe M., Mohammadzadeh Moghaddam A., Fakoor V. Impact of law enforcement and increased traffic fines policy on road traffic fatality, injuries and offenses in Iran: interrupted time series analysis. PLoS One. 2020 Apr 17;15(4) doi: 10.1371/journal.pone.0231182.
- 40. Nazif-Munoz J.I., Oulhote Y., Ouimet M.C. The association between legalization of cannabis use and traffic deaths in Uruguay. Addiction. 2020 Sep;115(9):1697–1706. doi: 10.1111/add.14994.
- 41. Law T.H., Umar R.R., Wong S.V. The Malaysian government's road accident death reduction target for year 2010. IATSS Res. 2005 Jan 1;29(1):42–49.
- 42. Li C., Chen J. International Conference on Measuring Technology and Mechatronics Automation. vol. 3. IEEE; 2009. Traffic accident macro forecast based on ARIMAX model; pp. 633–636. 2009 Apr 11.
- 43. Antoniou C., Yannis G., Papadimitriou E., Lassarre S. Relating traffic fatalities to GDP in Europe on the long term. Accid. Anal. Prev. 2016 Jul 1;92:89–96. doi: 10.1016/j.aap.2016.03.025.
- 44. Zhang Z., Yang W., Wushour S. Traffic accident prediction based on LSTM-GBRT model. J. Control Sci. Eng. 2020:1. 2020 Mar 5. 0.
- 45. Solomon D.H. US Department of Transportation, Federal Highway Administration; 1964. Accidents on Main Rural Highways Related to Speed, Driver, and Vehicle.
- 46. Cirillo J.A. Interstate system accident research study II, interim report II. Public Roads. 1968;35(3):71–75.
- 47. Finch D.J., Kompfner P., Lockwood C.R., Maycock G. vol. 58. Transport Research Laboratory TRL, Project Report PR; 1994. (Speed, Speed Limits and Crashes. Crowthorne, Berkshire).
- 48. Kloeden C.N., McLean A.J., Moore V.M., Ponte G. NHMRC Road Accident Research Unit, The University of Adelaide; Adelaide: 1997. Travelling Speed and the Risk of Crash Involvement Volume 2-case and Reconstruction Details.
- 49. Kloeden C.N., McLean A.J., Glonek G. Department of Transport and Regional Services, Australian Transport Safety Bureau; 2002 Apr. (Road Accident Research Unit, the University of Adelaide, Adelaide, SA). Reanalysis of Travelling Speed and the Risk of Crash Involvement in Adelaide South Australia. Canberra (ACT) Report No.: CR 207.
- 50. Taylor M.C., Baruya A., Kennedy J.V. Road Safety Division, Department of the Environment; Transport and the Regions: 2002 Mar. The Relationship between Speed and Accidents on Rural Single Carriageway Roads. Crowthorne: Transport Research Laboratory. Report No.: TRL511.
- 51. Nilsson G. Lund Institute of Technology, Department of Technology and Society, Traffic Engineering; 2004. Traffic Safety Dimensions and the Power Model to Describe the Effect of Speed on Safety. Ph.D. Thesis. Available from: https://www.motor-talk.de/forum/aktion/Attachment.html?attachmentId=689000
- 52. Taylor M.C., Lynam D., Baruya A. Crowthorne: Transport Research Laboratory; 2000 Mar. The Effect of Drivers' Speed on the Frequency of Accidents. Report No.: TRL421. Sponsored by Road Safety Division, Department of the Environment, Transport and the Regions.
- 53. Stuster J. US Department of Transportation, National Highway Traffic Safety Administration; 2004. Aggressive Driving Enforcement: Evaluations of Two Demonstration Programs.
- 54. Lave C.A. Speeding, coordination, and the 55 mph limit. Am. Econ. Rev. 1985 Dec 1;75(5):1159–1164.
- 55. Kockelman K.M., Ma J. Freeway speeds and speed variations preceding crashes, within and across lanes. J. Transport. Res. Forum. 2007;46(1):43–61.
- 56. Quddus M. Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS. J. Transport. Saf. Secur. 2013 Mar 1;5(1):27–45.
- 57. Levy D.T., Asch P. Speeding, coordination, and the 55-mph limit: comment. Am. Econ. Rev. 1989 Sep 1;79(4):913–915.
- 58. Tanishita M., Van Wee B. Impact of vehicle speeds and changes in mean speeds on per vehicle-kilometer traffic accident rates in Japan. IATSS Res. 2017 Oct 1;41(3):107–112.
- 59. Yasin Y.J., Grivna M., Abu-Zidan F.M. Global impact of COVID-19 pandemic on road traffic collisions. World J. Emerg. Surg. 2021 Dec;16:1–4. doi: 10.1186/s13017-021-00395-8.
- 60. Dong X., Xie K., Yang H. How did COVID-19 impact driving behaviors and crash Severity? A multigroup structural equation modeling. Accid. Anal. Prev. 2022 Jul 1;172 doi: 10.1016/j.aap.2022.106687.
- 61. Choudhary P., Imprialou M., Velaga N.R., Choudhary A. Impacts of speed variations on freeway crashes by severity and vehicle type. Accid. Anal. Prev. 2018 Dec 1;121:213–222. doi: 10.1016/j.aap.2018.09.015.
- 62. Chang L.Y. Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Saf. Sci. 2005 Oct 1;43(8):541–557.
- 63. Golob T.F., Recker W., Pavlis Y. Probabilistic models of freeway safety performance using traffic flow data as predictors. Saf. Sci. 2008 Nov 1;46(9):1306–1333.
- 64.Roess R.P., Prassas E.S., McShane W.R. Pearson/Prentice Hall; 2004. Traffic Engineering. [Google Scholar]
- 65.Adhikari R., Agrawal R.K. LAP LAMBERT Academic Publishing; 2013. An Introductory Study on Time Series Modeling and Forecasting. [Google Scholar]
- 66.Box G.E., Tiao G.C. Intervention analysis with applications to economic and environmental problems. J. Am. Stat. Assoc. 1975 Mar 1;70(349):70–79. [Google Scholar]
- 67.Pankratz A. John Wiley & Sons; 2012 Jan 20. Forecasting with Dynamic Regression Models. [Google Scholar]
- 68.Hipel K.W., McLeod A.I. Elsevier; 1994 Apr 7. Time Series Modelling of Water Resources and Environmental Systems. [Google Scholar]
- 69.Dickey D.A., Fuller W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979 Jun 1;74(366a):427–431. [Google Scholar]
- 70.Broock W.A., Scheinkman J.A., Dechert W.D., LeBaron B. A test for independence based on the correlation dimension. Econom. Rev. 1996 Jan 1;15(3):197–235. [Google Scholar]
- 71.Belaire-Franch J., Contreras D. How to compute the BDS test: a software comparison. J. Appl. Econom. 2002;17:691–699. [Google Scholar]
- 72.Kanzler L. Very fast and correctly sized estimation of the BDS statistic [Internet]. Social Science Research Network (SSRN) website; 1999 Feb [cited 2023 Mar 11]. Available from: 10.2139/ssrn.151669. [DOI]
- 73.Ljung G.M., Box G.E. On a measure of lack of fit in time series models. Biometrika. 1978;65(2):297–303. [Google Scholar]
- 74.Abdel-Aty M., Hassan H.M., Ahmed M. Transportation Research Board 91st Annual Meeting. Washington, DC, USA, January; 2012. Real-time analysis of visibility related crashes: can loop detector and AVI data predict them equally? [Google Scholar]
- 75.Yu R., Abdel-Aty M. Using hierarchical Bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data. Accid. Anal. Prev. 2014 Jan 1;62:161–167. doi: 10.1016/j.aap.2013.08.009. [DOI] [PubMed] [Google Scholar]
- 76.Wang L., Abdel-Aty M., Shi Q., Park J. Real-time crash prediction for expressway weaving segments. Transport. Res. C Emerg. Technol. 2015;61:1. Dec 1. 0. [Google Scholar]
- 77.Xu C., Liu P., Wang W. Evaluation of the predictability of real-time crash risk models. Accid. Anal. Prev. 2016 Sep 1;94:207–215. doi: 10.1016/j.aap.2016.06.004. [DOI] [PubMed] [Google Scholar]
- 78.Shefer D., Rietveld P. Congestion and safety on highways: towards an analytical model. Urban Stud. 1997 Apr;34(4):679–692. [Google Scholar]
- 79.Xu C., Tarko A.P., Wang W., Liu P. Predicting crash likelihood and severity on freeways with real-time loop detector data. Accid. Anal. Prev. 2013 Aug 1;57:30–39. doi: 10.1016/j.aap.2013.03.035. [DOI] [PubMed] [Google Scholar]
- 80.Theofilatos A., Yannis G. A review of the effect of traffic and weather characteristics on road safety. Accid. Anal. Prev. 2014 Nov 1;72:244–256. doi: 10.1016/j.aap.2014.06.017. [DOI] [PubMed] [Google Scholar]
- 81.Yousefzadeh-Chabok S., Ranjbar-Taklimie F., Malekpouri R., Razzaghi A. A time series model for assessing the trend and forecasting the road traffic accident mortality. Archives of Trauma Research. 2016 Sep;5(3) doi: 10.5812/atr.36570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Parvareh M., Karimi A., Rezaei S., Woldemichael A., Nili S., Nouri B., Nasab N.E. Assessment and prediction of road accident injuries trend using time-series models in Kurdistan. Burns & Trauma. 2018 Dec 1:6. doi: 10.1186/s41038-018-0111-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Wegman F., Elsenaar P. SWOV Institute for Road Safety Research; Leidschendam: 1997. Sustainable Solutions to Improve Road Safety in The Netherlands. [Google Scholar]
- 84.Sahebi S., Nassiri H., Van Wee B., Araghi Y. Incorporating car owner preferences for the introduction of economic incentives for speed limit enforcement. Transport. Res. F Traffic Psychol. Behav. 2019 Jul 1;64:509–521. [Google Scholar]
Associated Data
Data Availability Statement
Data will be made available on request.









