PLOS One. 2025 Jul 8;20(7):e0326483. doi: 10.1371/journal.pone.0326483

Predicting errors in accident hotspots and investigating spatiotemporal, weather, and behavioral factors using interpretable machine learning: An analysis of telematics big data

Ali Golestani 1, Nazila Rezaei 1, Mohammad-Reza Malekpour 1, Naser Ahmadi 1,2, Seyed Mohammad-Navid Ataei 1, Sepehr Khosravi 1, Ayyoob Jafari 3, Saeid Shahraz 4, Farshad Farzadfar 1,2,*
Editor: Habtamu Setegn Ngusie 5
PMCID: PMC12237018  PMID: 40627630

Abstract

Background

Road traffic accidents (RTAs) are a major public health concern with significant health and economic burdens. Identifying high-risk areas and key contributing factors is essential for developing targeted interventions. While machine learning (ML) has been increasingly used to predict RTAs, the lack of interpretability limits its applicability in policymaking. This study aimed to utilize interpretable ML models to predict the occurrence of errors in road accident hotspots using telematics data in Iran and interpret the most influential predictors.

Methods

We utilized data collected via telematics from 1673 intercity buses throughout the year 2020, spanning cities across all provinces of Iran. Merging this data with a weather-related dataset resulted in a comprehensive dataset containing location, time, weather, and error type variables. After preprocessing, 619,988 records without any missing values were used to train and compare the performance of six machine learning models including logistic regression, K-nearest neighbors, random forest, Extreme Gradient Boosting (XGBoost), Naïve Bayes, and support vector machine. The best model was selected for interpretation using SHAP (SHapley Additive exPlanation). Due to the high imbalance in the outcome, an ensemble approach was applied to train all models.

Results

XGBoost demonstrated the best performance with an area under the curve (AUC) of 91.70% (95% uncertainty interval: 91.33% − 92.09%). SHAP values highlighted spatial-related variables, particularly the province of error and road type, as the most critical features for predicting errors in accident hotspots in Iran. Fatigue, as a behavioral error, was associated with a higher predicted likelihood of errors occurring in accident hotspots, and certain weather-related variables, including dew point and relative humidity, also exhibited importance. However, temporal variables did not contribute significantly to the prediction.

Conclusion

By integrating spatiotemporal, behavioral, and weather-related variables, our study highlighted the dominance of spatial factors in predicting errors in accident hotspots. These findings underscore the need for targeted road infrastructure improvements and data-driven policymaking to mitigate RTA risks.

Introduction

Road Traffic Accidents (RTAs) represent a major global public health challenge, leading to significant mortality, morbidity, and economic burdens. The World Health Organization (WHO) estimated approximately 1.19 million annual fatalities due to road traffic injuries [1]. About 93% of road traffic deaths occurred in low- and middle-income countries (LMICs), despite these countries having only 60% of vehicles [1]. In Iran, a lower-middle income country, RTAs rank as the second leading cause of Disability-Adjusted Life Years (DALYs) [2] and absorb an estimated 2.19% of Iran’s Gross Domestic Product (GDP) [3]. These statistics highlight the urgent need to investigate the determinants of RTAs to formulate effective policies and mitigate their impact.

Various internal and external factors can affect driving, potentially leading to aggressive driving, errors, and accidents [4]. According to Haddon’s model, human, vehicle, and road-related factors (before, during, and after an accident) determine injury severity, with human factors contributing to 90%, vehicle-related factors to 30%, and road conditions to 10% of traffic accidents [5]. Notably, spatial variations in road traffic injuries demonstrate non-random cluster formations, indicating that specific locations are more accident-prone, such as urban areas and locations with higher traffic interactions [6]. Furthermore, different road types influence driving styles and behavior [7]. Temporal factors, including seasons, weather conditions, hours, and days, all play crucial roles in the occurrence of RTAs [8]. An analysis of road traffic injuries in Iran revealed that most accidents occur during the late afternoon/early evening and the hours before noon [9]. Additionally, the highest incidence of accidents occurs during spring, summer, and early fall, with March marking the peak [9].

To better understand and predict driving errors leading to RTAs, researchers have employed various techniques incorporating factors such as vehicle speed, acceleration patterns, braking intensity, steering movements, road conditions, and environmental factors like weather and traffic density [10,11]. Machine learning (ML) techniques, known for strong predictive performance, have gained widespread use in RTA research [12]. Previous studies have shown good performance of ML models in identifying accident hotspots, assessing accident severity, and classifying key contributing factors [13–15]. However, many ML models function as “black box” systems, lacking interpretability, which limits their application in policymaking [16]. Recent advancements in interpretable ML techniques address this challenge by maintaining predictive power while enhancing interpretability [17].

A promising source of data for ML-based RTA analysis is telematics, a technology that enables real-time vehicle data collection through telecommunications systems [18–20]. Telematics data include details such as location, speed, sharp turning, harsh acceleration/braking, the number of driving episodes, and distance traveled [21]. Previous studies have leveraged telematics in insurance and driver behavior analysis [22,23], demonstrating its value for risk assessment and prevention [24]. It has been used to classify drivers based on risky behaviors, allowing insurers to adjust premiums and implement risk mitigation strategies [24]. Additionally, telematics-based feedback has been shown to improve driver behavior and road safety [21,25]. While telematics data are widely utilized in the insurance industry, their potential application for road safety policymaking, particularly in LMICs, remains largely unexplored.

This study aimed to bridge this gap by integrating telematics data with accident hotspot data and weather-related variables to predict the occurrence of errors in accident hotspots in Iran. We employed various ML models, ranging from simple and conventional to more advanced techniques, to develop the best-performing predictive model. Furthermore, we utilized interpretability techniques to identify the most influential factors contributing to the occurrence of errors in accident hotspots. The findings of this study provide valuable insights for policymakers, facilitating data-driven interventions and optimized resource allocation for accident prevention, particularly in resource-limited settings like Iran.

Materials and methods

Study design

In this retrospective study, we leveraged data collected from intercity buses that were initially enrolled in a trial study [26]. The primary focus of the original trial was to investigate the impact of non-punitive peer-comparison feedback on driving behavior, utilizing telematics devices [21,26]. The participants were all male bus drivers aged over 20. They were recruited between February 1, 2017, and August 31, 2017. A telematics device was installed on their vehicles, which included a total of 1,673 intercity buses traversing cities across all provinces of Iran. These devices were designed to remain operational post-trial and integrated into fleet management systems, ensuring the future use of telematics data for research. In this study, we used the dataset that included information collected over the full year of 2020, from January 1 to December 31. Fig 1 illustrates the flowchart of the steps taken in this study.

Fig 1. Flowchart of the steps in this study.

*Models were trained in an ensemble approach to overcome outcome imbalance.

Telematics device and data collection

Telematics is a system that integrates an onboard computer and a GPS to monitor vehicle behavior. The data collected from embedded modules (including GPS location, GPS speed, 3-axis acceleration, and date-time information) and from the on-board diagnostics (OBD) port of the vehicle (covering diagnostic trouble codes, fuel consumption, and engine speed) are processed by a microcontroller. Subsequently, this information is transmitted every 10 seconds via a Global System for Mobile Communications (GSM) module to a GSM network, where it is recorded in a centralized data center [26]. After the latitude and longitude of each vehicle were recorded at 10-second intervals in the centralized data center, reverse geocoding through the Nominatim OpenStreetMap API was used to obtain the road type for each data point [26,27]. The considered road types included ‘trunk’, ‘motorway’, ‘primary’, ‘secondary’, ‘tertiary’, ‘residential’, and ‘minor roads’, with detailed definitions and distinctions provided elsewhere [28]. The specific errors were defined in accordance with Azmin et al.’s [26] study, as presented in Table 1. Instances in the centralized data center that were identified as errors based on these predefined definitions were stored in a primary dataset. This primary dataset, encompassing nearly 1.59 million records, was made available to the investigators of this study and contained the type of each error, its geographical location (longitude and latitude), its precise timing (date, hour, minute, and second), and the road type. The distribution of the errors recorded by telematics across the provinces of Iran is presented in Fig 2.

Table 1. Definition of error types using telematics devices in this study.

| Error type | Definition |
|---|---|
| Harsh braking (deceleration) | x-axis deceleration less than −0.4 g (km·h⁻¹·s⁻¹) [26] |
| Harsh acceleration | x-axis acceleration more than 0.22 g (7.8 km·h⁻¹·s⁻¹) [26] |
| Harsh turning | y-axis acceleration more than 0.7 g (24.7 km·h⁻¹·s⁻¹) [26] |
| Over speed | Violation occurs when the vehicle speed exceeds the speed limit by more than 20%, determined by comparing the vehicle speed sent by telematics with the speed limit of the vehicle location. Speed limit information is obtained from the Nominatim OpenStreetMap API, considering road geography, type, and speed limits [26,27]. |
| Fatigue | Continuous driving for more than 240 minutes without a 20-minute rest [26] |
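The threshold rules in Table 1 can be illustrated with a short Python sketch. It is not the device firmware or the authors' code: the function names, the sign convention (negative x-axis values for braking), and the use of the absolute y-axis value for turning are assumptions made for illustration, and fatigue is omitted because it requires trip history rather than a single sample. The conversion helper reproduces the bracketed values in Table 1 (0.22 g is about 7.8 km·h⁻¹·s⁻¹ and 0.7 g about 24.7 km·h⁻¹·s⁻¹); applied to 0.4 g, it gives roughly 14.1 km·h⁻¹·s⁻¹.

```python
G = 9.80665  # standard gravity in m/s^2


def g_to_kmh_per_s(a_g):
    """Convert an acceleration expressed in g to km/h per second."""
    return a_g * G * 3.6


# Thresholds taken from Table 1 (in g); the sign convention is an assumption.
HARSH_BRAKING_G = -0.4
HARSH_ACCEL_G = 0.22
HARSH_TURN_G = 0.7
OVERSPEED_FACTOR = 1.2  # more than 20% above the speed limit


def classify_sample(ax_g, ay_g, speed_kmh, limit_kmh):
    """Return the Table 1 error labels triggered by one telematics sample."""
    errors = []
    if ax_g < HARSH_BRAKING_G:
        errors.append("harsh braking")
    if ax_g > HARSH_ACCEL_G:
        errors.append("harsh acceleration")
    if abs(ay_g) > HARSH_TURN_G:  # assumption: either turning direction counts
        errors.append("harsh turning")
    if speed_kmh > OVERSPEED_FACTOR * limit_kmh:
        errors.append("over speed")
    return errors


print(round(g_to_kmh_per_s(0.22), 1), round(g_to_kmh_per_s(0.7), 1))
print(classify_sample(ax_g=-0.5, ay_g=0.1, speed_kmh=100, limit_kmh=80))
```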

Fig 2. The distribution of errors recorded by telematics and accident hotspots in provinces of Iran.

A) Errors recorded by telematics. B) Accident hotspots.

Preprocessing and data cleaning

The primary dataset initially contained information solely about location, time, road type, and the type of error. To identify errors in road accident hotspots, we merged our dataset with one from the Iran Road Management Center (IRMC) that included 1492 high-density accident locations identified by latitude and longitude [29]. The distribution of the hotspots in the provinces of Iran is presented in Fig 2 and S1 Table. These locations were determined using an index proposed by the Ministry of Roads and Urban Development of Iran [30]. Over the three-year study period (2018–2020), accident points within a 300-meter distance on a road with at least two fatal accidents, three accidents resulting in injury, or a combination of one fatal accident and two accidents resulting in injury were considered a location with potentially high-density accident occurrence. For each location, the accident index was calculated as follows:

Accident index = (1000 × l) / (365 × A × T)

where T is the number of study years (three in this study), A is a coefficient based on road type (higher for main roads), and l is related to the severity of accidents, calculated using the formula:

l = x + 3y + 9z

where x is the number of accidents causing vehicle damage without injury or death, y is the number of accidents resulting in injury, and z is the number of fatal accidents. Locations with an index value above the mean were considered high-density accident locations, or accident hotspots. To determine whether an error from our telematics dataset occurred in an accident hotspot, we employed the Haversine formula [31] to calculate the distance between each error location and every location in the hotspot dataset. If any of these distances was less than 150 meters, we considered the error to have occurred in an accident hotspot. We used the recorded locations to identify the province in which each error occurred. Subsequently, we used the time of the errors to create columns for season, month, day, and hour. Additionally, we categorized each day as a workday, holiday, long holiday (more than one consecutive holiday), or pre-long-holiday day. The exact time of the error was also used to determine the ambient light situation (day, night, or twilight), based on the times of sunrise and sunset on that specific date.
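As a minimal sketch of the two calculations described above, the following Python code computes the accident index and flags an error as occurring in a hotspot when its Haversine distance to any hotspot is below 150 meters. The function names, the Earth radius constant, the road coefficient value, and the coordinates are assumptions introduced for illustration and do not come from the study data. The brute-force loop over all hotspots is shown for clarity only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def accident_index(x, y, z, road_coeff, years=3):
    """Accident index = (1000 * l) / (365 * A * T), with l = x + 3y + 9z.

    road_coeff plays the role of A and years the role of T.
    """
    severity = x + 3 * y + 9 * z
    return (1000 * severity) / (365 * road_coeff * years)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_hotspot(error_lat, error_lon, hotspots, radius_m=150):
    """True if the error lies within radius_m of any hotspot (lat, lon) pair."""
    return any(
        haversine_m(error_lat, error_lon, lat, lon) < radius_m
        for lat, lon in hotspots
    )


hotspots = [(35.7000, 51.4000)]  # hypothetical hotspot coordinates
print(in_hotspot(35.7010, 51.4000, hotspots))  # error about 111 m to the north
print(in_hotspot(35.7020, 51.4000, hotspots))  # error about 222 m to the north
print(accident_index(x=0, y=0, z=2, road_coeff=1.0))  # hypothetical coefficient
```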

To incorporate weather-related variables, we used the Scrapy package [33] to send requests containing the exact location and time of each error to the Meteostat website API [32]. However, a significant portion of the requested weather data was unavailable. Only a subset of the dataset, comprising approximately 690 thousand records, contained at least one non-missing weather-related variable. These variables included temperature (in degrees Celsius (°C)), dew point (in °C), relative humidity (in percent), wind direction (in degrees), average wind speed (in kilometers per hour), sea-level air pressure (in hectopascals (hPa)), and weather condition (classified as clear, cloudy, foggy, rainy, snowy, or stormy). Non-missing values of the weather condition variable covered only about 300 thousand rows, roughly half as many as the other weather-related variables. To maximize data utilization, we used the latitude, longitude, month, ambient light situation, hour, and other weather-related variables of each error to impute missing values of weather condition. For this purpose, 70% of the records with a non-missing weather condition were used to train a histogram-based gradient boosting decision tree (HistGBDT) model, chosen for its ability to handle missing values in features [34–36]. Its performance was optimized by grid search and evaluated on the remaining 30% of those records, yielding an accuracy of 93% and a weighted average F1-score of 92%. This model was then used to impute the weather condition for all instances with missing values for this variable. Because of the large size of the dataset and the study’s objective of assessing all possible variables, we restricted our ML analysis to records without any missing values, resulting in a final dataset comprising 619,988 rows.

To mitigate the impact of differing units among quantitative variables on the models in the next steps, all quantitative features were standardized using the ‘StandardScaler’ from the Scikit-Learn package [35]. Additionally, where required, all categorical variables were one-hot encoded before modelling.
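The two transformations can be sketched with the Python standard library alone. This is an illustration of the arithmetic, not a call to Scikit-Learn: standardization subtracts the mean and divides by the population standard deviation (the convention that ‘StandardScaler’ uses), and one-hot encoding creates one 0/1 column per category level. The helper names and the example values are hypothetical.

```python
import statistics


def standardize(values):
    """Z-score a list of numbers using the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def one_hot(categories):
    """Return the sorted category levels and one 0/1 row per input value."""
    levels = sorted(set(categories))
    rows = [[int(c == level) for level in levels] for c in categories]
    return levels, rows


dew_point = [2.0, 4.0, 6.0]                # hypothetical values in °C
road_type = ["trunk", "primary", "trunk"]  # hypothetical categories

z = standardize(dew_point)
levels, rows = one_hot(road_type)
print(z)
print(levels, rows)
```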

Outcome imbalance

During the initial exploratory analysis of our dataset, we identified a significant imbalance in the outcome variable, which represented errors in road accident hotspots. Specifically, the prevalence of errors in road accident hotspots was only about 1.9% in the dataset.

This imbalance could result in falsely high accuracy, as the model might simply assign all outcomes to the majority class, failing to effectively capture the minority class. To mitigate this issue, various techniques could be employed. Instead of opting for conventional methods such as over-sampling (e.g., the synthetic minority oversampling technique (SMOTE)) or under-sampling (e.g., K-nearest neighbors (KNN)-based methods), we chose to leverage ensemble models [37]. In ensemble models, the minority class remains fixed, and in each iteration, a random sample is drawn from the majority class for training. This process is repeated over several iterations, allowing the model to train on different subsets of the main dataset. The final decision of the model for a test dataset is determined by the aggregated vote of all trained models, similar to other ensemble models. Throughout the steps of feature selection, grid search, and comparing the best-performing models, we utilized ensemble models with varying numbers of estimators based on the model type and computational efficiency. Specifically, we employed the BalancedRandomForestClassifier [38], EasyEnsembleClassifier [39], and BalancedBaggingClassifier [40] for random forest (RF), Extreme Gradient Boosting (XGBoost), and other models, respectively, from the ‘imblearn’ library [37].
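The resampling scheme described above can be sketched in plain Python. This is not the ‘imblearn’ implementation: the base learner is a deliberately tiny one-feature nearest-centroid rule standing in for the real models, and the function names, the number of estimators, and the synthetic data (about 2% positives) are assumptions for illustration. The sketch keeps every minority row, redraws an equally sized majority sample for each estimator, and aggregates predictions by majority vote.

```python
import random
import statistics


def fit_centroid(X, y):
    """Stand-in base learner: per-class mean of a single feature."""
    mu0 = statistics.fmean(x for x, label in zip(X, y) if label == 0)
    mu1 = statistics.fmean(x for x, label in zip(X, y) if label == 1)
    return mu0, mu1


def predict_centroid(model, x):
    """Predict 1 when x is closer to the positive-class mean."""
    mu0, mu1 = model
    return int(abs(x - mu1) < abs(x - mu0))


def fit_balanced_ensemble(X, y, n_estimators=25, seed=0):
    """Keep all minority rows; redraw an equal-sized majority sample each time."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    models = []
    for _ in range(n_estimators):
        sampled = rng.sample(majority, k=len(minority))
        X_bal = minority + sampled
        y_bal = [1] * len(minority) + [0] * len(sampled)
        models.append(fit_centroid(X_bal, y_bal))
    return models


def predict_ensemble(models, x):
    """Aggregate the estimators by majority vote."""
    votes = sum(predict_centroid(m, x) for m in models)
    return int(votes * 2 > len(models))


# Synthetic one-feature data with 2% positives (not the study data).
data_rng = random.Random(42)
X = [data_rng.gauss(0.0, 1.0) for _ in range(980)]
X += [data_rng.gauss(4.0, 1.0) for _ in range(20)]
y = [0] * 980 + [1] * 20

models = fit_balanced_ensemble(X, y)
print(predict_ensemble(models, 5.0), predict_ensemble(models, -1.0))
```

Each estimator sees a balanced training set, so no single model can reach high accuracy by always predicting the majority class.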

Train/test splitting of dataset

We split the dataset into 70% training and 30% testing sets. We kept the test set separate from the training set at all stages, including feature selection and model development, and used it only for evaluating the performance of each model. This approach ensures the training set remains uncontaminated by any information from the test set, thereby ensuring the validity of model evaluations.

Feature selection

To identify the most influential predictors for our models, ensuring optimal performance and reduced model complexity, we employed feature selection techniques involving balanced random forest and the examination of variable correlations. For feature selection, a balanced random forest classifier with 1000 estimators, with replacement, and a maximum depth of 100, was trained on the training set, using the occurrence of errors in accident hotspots as the outcome. The subsequent model was utilized for permutation importance analysis on the test dataset, aiming to discern the significance of each predictor. Permutation feature importance quantifies the reduction in a model’s score when the values of a single predictor are randomly shuffled [41]. In this study, permutation analysis was performed on 1000 random samples, each with the same size as the minority outcome, obtained through bootstrapping from the majority outcome data. This process was iterated 10 times. This method was chosen for its robustness, particularly in handling the high cardinality of categorical variables, such as province and hour in our dataset [41]. Additionally, we calculated various correlation metrics—Pearson correlation, Cramér’s V, Point Biserial Correlation, and Correlation Ratio—to assess relationships among quantitative-quantitative, categorical-categorical, quantitative-binary categorical, and quantitative-non-binary categorical variables, respectively. Predictors with no negative permutation importance scores were selected for further analysis. In instances of high correlation among selected predictors, we prioritized the predictor with a higher permutation importance score.
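Permutation importance, as defined above, can be sketched in a few lines of plain Python. The sketch is a simplification under stated assumptions: it scores with accuracy (the scoring metric used in the study is not specified here), it does not reproduce the balanced bootstrap sampling of the majority class, and the model and data are hypothetical. It shuffles one column at a time and reports the mean drop in score.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    """Hypothetical classifier that only looks at feature 0."""
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]

importances = permutation_importance(model, X, y)
print(importances)
```

In this toy setting, shuffling feature 0 destroys the signal and produces a large drop, while shuffling feature 1 changes nothing because the model never reads it.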

Machine learning models and evaluation

To assess the predictive performance on our dataset, we compared the outcomes of six distinct models: (1) logistic regression [16], (2) K-nearest neighbors (KNN) [42], (3) random forest (RF) [42], (4) Extreme Gradient Boosting (XGBoost) [43], (5) Naïve Bayes [42], and (6) support vector machine (SVM) [42]. Notably, a linear SVM was chosen to accommodate the large size of our dataset, which makes other SVM variants computationally demanding [44]. Models were trained on the training dataset. While various approaches such as grid search, random search, and Bayesian optimization exist for model optimization, grid search has demonstrated comparable performance [45]. Therefore, it was selected for hyperparameter tuning in this study. Grid search was conducted using stratified 5-fold cross-validation with 3 repeats. Performance metrics were calculated during hyperparameter tuning, and the best-performing configuration was selected using the area under the receiver operating characteristic (ROC) curve (AUC). After identifying the best-performing model for each of the six machine learning algorithms, performance was compared across models using the 95% uncertainty interval (UI) of AUC, obtained by bootstrapping with 1000 iterations on the test dataset. Among the available performance metrics, AUC was chosen for its robustness in the presence of outcome imbalance. The models were implemented using the Scikit-Learn and XGBoost packages [36,46].
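The bootstrap uncertainty interval for AUC can be illustrated with a standard-library sketch. It assumes a percentile interval, which is one common choice and may differ from the exact procedure used in the study; the function names and the synthetic scores are hypothetical. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
import random


def auc(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval of AUC over bootstrap resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # both classes are needed to compute AUC
            stats.append(auc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic test-set scores (hypothetical, not the study data).
data_rng = random.Random(7)
y_true = [1] * 30 + [0] * 90
scores = [data_rng.gauss(1.5, 1.0) for _ in range(30)]
scores += [data_rng.gauss(0.0, 1.0) for _ in range(90)]

point = auc(y_true, scores)
lower, upper = bootstrap_auc_interval(y_true, scores, n_boot=200)
print(f"AUC = {point:.3f}, 95% UI = [{lower:.3f}, {upper:.3f}]")
```

The pairwise AUC used here is quadratic in the sample size, so a rank-based implementation would be preferable for a test set of the size used in this study.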

Model interpretation

After selecting the best-performing model, we employed SHapley Additive exPlanations (SHAP) to interpret the model [47]. SHAP is a novel game-theory-based approach that calculates the contribution of each feature to the model’s predictions [17]. It assigns an importance value (SHAP value) to each feature, explaining the role of each feature in predicting the outcome for each observation. These importance values can be summarized using visualizations such as bee swarm plots or the mean of the absolute SHAP values for each feature. Consider a model where a set N with n features is utilized to predict an output v(N). In SHAP, each feature’s contribution to the model output v(N), with ϕi representing the contribution of feature i, is calculated based on its marginal impact [48]. Using axioms that guarantee a fair distribution of each feature’s contribution, SHAP values are determined as follows:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]$$

An additive feature attribution method defines a linear function g of binary features as follows:

$$g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i$$

where z ∈ {0, 1}^M takes the value 1 if a feature is present and 0 if it is not, M denotes the total number of input features, and ϕ0 is the base value of the output when no feature is present [49]. Given the additive properties of SHAP values, we initially computed the SHAP values for each predictor for each of the 619,988 instances in the dataset using our ensemble model. Subsequently, for each predictor and each instance, we averaged the SHAP values obtained in the different iterations and used this mean as the final SHAP value for that observation. Because our outcome variable was categorical, SHAP values were read together with the feature values: when higher feature values (or presence, in the case of categorical features) aligned with higher SHAP values, the feature was associated with a higher prediction of errors occurring in hotspots, and vice versa. Conversely, when lower feature values (or absence, in the case of categorical features) aligned with higher SHAP values, the feature was inversely associated with the prediction of errors occurring in accident hotspots. The ‘SHAP’ package was utilized for computing SHAP values and visualizing the results [50]. We also considered performing subgroup analysis if certain variables showed significant patterns.
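The Shapley formula can be checked on a toy example by enumerating every subset directly. The sketch below is an illustration only: the value function, its numbers, and the feature names are hypothetical, and exact enumeration grows exponentially with the number of features, which is why dedicated packages rely on faster algorithms. The result satisfies the additivity property: the contributions sum to v(N) − v(∅).

```python
from itertools import combinations
from math import factorial


def shapley_values(features, v):
    """Exact Shapley values for a value function v defined on frozensets."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def v(s):
    """Hypothetical value function: additive effects plus one interaction."""
    value = 0.0
    if "province" in s:
        value += 0.6
    if "road_type" in s:
        value += 0.3
    if "province" in s and "fatigue" in s:
        value += 0.2
    return value


features = ["province", "road_type", "fatigue"]
phi = shapley_values(features, v)
print(phi)
print(sum(phi.values()), v(frozenset(features)) - v(frozenset()))
```

The interaction term of 0.2 is split equally between the two features that produce it, which is the fair-distribution behavior the axioms guarantee.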

Statistical analyses

Quantitative variables were described as the mean and standard deviation (SD), while qualitative variables were presented as frequencies and percentages. A 95% confidence interval (95% CI) was used to indicate the uncertainty of an estimate and to evaluate statistical significance. All processes related to data cleaning, modeling, and visualization were executed using Python version 3.10.13 (https://www.python.org/). A small sample of the data and the code used for analysis are provided in S1 and S2 Files, respectively.

Ethical consideration

This study was performed in accordance with the Declaration of Helsinki. Before the installation of the device, all participating drivers provided written informed consent, which included a primary explanation of the project. They were informed that their real-time driving data would be gathered and used in subsequent projects. Participants were assured that their data would remain anonymous and confidential. This project received ethical approval from the National Institute for Medical Research Development (NIMAD), Tehran, Iran (ethics code: IR.NIMAD.REC.1394.016).

Results

Overview

After the initial cleaning of the primary telematics dataset, a total of 1,583,811 errors were available for analysis. Among these records, only 29,618 (1.87% [95% CI: 1.85–1.89]) occurred in identified accident hotspots. Examining error types revealed that the majority of errors were harsh turning (57.02% [56.95–57.10]) and fatigue (29.51% [29.44–29.58]). Regarding road types, trunks (51.99% [51.91–52.07]) and primary roads (21.01% [20.95–21.07]) accounted for the majority of errors. There were notable temporal patterns, with most errors occurring during hours 2–7 and 18–20 of the day. Surprisingly, the occurrence of errors was significantly lower in spring (19.08% [19.02–19.14]) than in other seasons. January stood out as the month with the highest number of errors (11.21% [11.16–11.26]). Errors were most often recorded during daylight hours (47.61% [47.53–47.69]) and were more frequent on workdays (67.14% [67.07–67.22]), with Sunday, a workday in Iran, accounting for the largest share of errors at 14.94% [14.88–14.99]. Across provinces, Isfahan (13.09% [13.04–13.15]), Tehran (12.97% [12.92–13.02]), and Bushehr (12.14% [12.09–12.19]) accounted for the largest shares of errors. Interestingly, the proportion of errors in hotspots was notably higher in Golestan (10.52% [10.00–11.05]), Hamadan (6.61% [6.35–6.86]), and Kurdistan (6.46% [6.24–6.68]) than in other provinces. Additional details on the descriptive summary of variables in the dataset are provided in S2 Table. A descriptive summary of weather-related variables and the corresponding number of records with non-missing values are presented in Table 2.
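As a worked check of the headline proportion, the sketch below computes a normal-approximation (Wald) interval for 29,618 hotspot errors out of 1,583,811. The choice of the Wald method is an assumption, since the interval method is not stated in the text, and the function name is hypothetical. With these counts, the calculation agrees with the reported 1.87% [1.85–1.89].

```python
import math


def wald_ci(successes, n, z=1.959964):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, lower, upper = wald_ci(29_618, 1_583_811)
print(f"{100 * p:.2f}% [{100 * lower:.2f}-{100 * upper:.2f}]")
```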

Table 2. Descriptive summary of weather-related variables in the primary dataset.

Categorical variables. Cells show number (percentage) (95% confidence interval).

| Weather condition | Error occurrence in accident hotspots: No | Error occurrence in accident hotspots: Yes | Total |
|---|---|---|---|
| Clear | 188000 (98.14%) (98.08-98.2) | 3570 (1.86%) (1.8-1.92) | 191570 (68.05%) (67.87-68.22) |
| Cloudy | 28453 (98.37%) (98.23-98.52) | 470 (1.63%) (1.48-1.77) | 28923 (10.27%) (10.16-10.39) |
| Foggy | 41904 (99.16%) (99.07-99.24) | 357 (0.84%) (0.76-0.93) | 42261 (15.01%) (14.88-15.14) |
| Rainy | 13466 (98.37%) (98.16-98.58) | 223 (1.63%) (1.42-1.84) | 13689 (4.86%) (4.78-4.94) |
| Snowy | 2377 (97.42%) (96.79-98.05) | 63 (2.58%) (1.95-3.21) | 2440 (0.87%) (0.83-0.9) |
| Storm | 2605 (98.45%) (97.98-98.92) | 41 (1.55%) (1.08-2.02) | 2646 (0.94%) (0.9-0.98) |
| Total | 276805 (98.32%) (98.27-98.37) | 4724 (1.68%) (1.63-1.73) | 281529 (100.0%) (100.0-100.0) |
| Missing values | 1277388 | 24894 | 1302282 |

Quantitative variables. The No and Yes columns show mean ± standard deviation (proportion in non-missing); the Total column shows mean ± standard deviation (number of non-missing).

| Variable (unit) | Error occurrence in accident hotspots: No | Error occurrence in accident hotspots: Yes | Total |
|---|---|---|---|
| Temperature (°C) | 19.38 ± 11.79 (98.42%) | 17.38 ± 12.13 (1.58%) | 19.35 ± 11.80 (671654) |
| Dew point (°C) | 4.54 ± 9.57 (98.42%) | 3.99 ± 8.09 (1.58%) | 4.53 ± 9.55 (671523) |
| Relative humidity (%) | 46.11 ± 25.72 (98.42%) | 49.47 ± 25.90 (1.58%) | 46.16 ± 25.73 (671251) |
| Wind direction (degrees) | 162.60 ± 116.92 (98.44%) | 157.51 ± 113.60 (1.56%) | 162.52 ± 116.87 (663861) |
| Average wind speed (kilometers per hour) | 10.09 ± 7.55 (98.38%) | 9.20 ± 7.47 (1.62%) | 10.08 ± 7.55 (637718) |
| Sea-level air pressure (hPa) | 1013.95 ± 8.57 (98.43%) | 1014.19 ± 7.98 (1.57%) | 1013.96 ± 8.55 (61598) |

°C: Celsius, hPa: hectopascal

Feature selection

A descriptive summary of records without missing values after imputing the weather condition variable is provided in S3 Table. Evaluation of the trained balanced random forest classifier for feature selection on the test set demonstrated a weighted average F1-score of 89.00% and an AUC of 90.90%. The results of permutation importance, depicted in Fig 3, revealed that province, road type, and error type exhibited notably higher importance than other features. Dew point, weather condition, hour, average wind speed, and relative humidity also showed positive permutation importance values. Furthermore, we assessed the correlation among different variables, presenting the heatmaps in S1 Fig. The analysis indicated that the variables selected by permutation importance were not strongly correlated with each other. Consequently, all eight selected variables were considered for model training in the subsequent steps.

Fig 3. Feature selection using permutation importance on test dataset of the balanced random forest model.

Model evaluation

The hyperparameters tuned during grid search, along with those of the best-performing model, are detailed in S4 and S5 Tables. Among the best-performing models of each type, XGBoost performed best, achieving an AUC of 91.70% (95% UI: 91.33% − 92.09%). Notably, RF (AUC: 91.14% [95% UI: 90.72% − 91.55%]) and KNN (90.09% [89.60% − 90.58%]) demonstrated competitive performance. Conversely, SVM exhibited the least favorable performance among the trained models, with an AUC of 82.03% [81.37% − 82.66%], inferior to both Naïve Bayes (84.77% [84.09% − 85.40%]) and logistic regression (85.73% [85.15% − 86.28%]) (Fig 4). Consequently, the XGBoost model, with its superior AUC and a balanced accuracy of 84.70%, was selected for interpretation. Additional performance evaluation metrics of the models are detailed in S6 Table.

Fig 4. Receiver operating characteristic (ROC) curve for prediction of error occurrence in accident hotspots for all evaluated ML models with their corresponding AUC.

ML: machine learning, AUC: area under the curve.

Model interpretation

Fig 5, which illustrates the mean absolute SHAP values for features exceeding 0.05, underscores the pivotal impact of location-related, road-type, and error-type features on the model’s predictive decisions. Tehran province significantly influenced model decisions (mean absolute SHAP = 0.74), and trunk roads (0.36), fatigue (0.33), and Hamadan province (0.32) also played substantial roles. Among weather-related variables, dew point emerged as a prominent contributor, boasting the highest mean absolute SHAP value (0.14). S2 Fig provides a comprehensive view of the mean absolute SHAP values for all features.

Fig 5. The mean absolute SHAP values for predictors with values exceeding 0.05.

Fig 6 delineates the impact of features with a mean absolute SHAP value greater than 0.05 on the model output, while SHAP values for all features are presented in S3 Fig. Errors occurring in Tehran were associated with negative SHAP values, pushing predictions toward non-hotspot locations. Conversely, errors in Hamadan, Bushehr, and Kurdistan showed higher SHAP values, signifying an association with accident hotspots. For road types, errors on trunk roads were primarily linked to non-hotspot locations, while errors on motorways exhibited positive SHAP values, indicating occurrence in accident hotspots. Fatigue exhibited a marked association with higher SHAP values, signifying a strong link with errors in hotspots. Among other error types, harsh turning did not contribute positively to the prediction of errors occurring in accident hotspots. Weather-related variables showed diverse SHAP values depending on their values. Higher dew points and lower relative humidity were especially related to lower SHAP values, indicating errors in non-hotspots. Foggy weather was associated with lower SHAP values, while cloudy weather was predominantly associated with higher values. Other weather conditions showed modest SHAP values. Hour 19 showed the highest SHAP values, indicating a significant association with occurrences in hotspots.

Fig 6. SHAP beeswarm summary plot of the most important predictors for error occurrence in accident hotspots.

Due to the significance of the provinces in the analyses, we conducted subgroup analyses. In these subgroup analyses, we repeated the analysis on data from three important provinces: Tehran, Hamadan, and Bushehr. The results are shown in S4 Fig. Different variables showed importance in each province. While fatigue was associated with errors in accident hotspots in Tehran, harsh turning and primary roads were the most important variables in Hamadan and Bushehr.

Discussion

Using a telematics system to aggregate data on predefined errors in intercity buses across all provinces of Iran, and integrating this information with weather data, our study explored various machine learning models to predict the occurrence of errors in accident hotspots. The best predictive performance was achieved by an XGBoost model with an AUC of 91.70%, which is considered outstanding based on AUC interpretation [51]. Furthermore, our findings underscored the location of errors (province of occurrence) and road type as the most important predictors of errors in accident hotspots. Additionally, behavioral factors, specifically driver fatigue, emerged as a crucial predictor, while variables associated with weather and time contributed the least to predicting errors in hotspots. Thus, this study indicated that location-related factors were more important than other factors in predicting the occurrence of errors in accident hotspots.
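
For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen hotspot record receives a higher score than a randomly chosen non-hotspot record. The sketch below computes it that way and attaches a percentile bootstrap interval. The bootstrap is an assumption made for illustration; this section does not state how the reported uncertainty interval was derived, and the labels and scores are made up.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see text)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Made-up example: 1 = error in a hotspot, 0 = error elsewhere.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(example_labels, example_scores))
```

The pairwise comparison is quadratic in the number of records, so for a dataset of several hundred thousand records a rank-based implementation from an established library would be the practical choice.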

Our study showed that decision tree-based models, including XGBoost and Random Forest (RF), had the best performance. Similarly, other studies have demonstrated strong performance of decision tree-based models [52,53]. A comprehensive examination of a 6-year dataset from Michigan Traffic Agencies (MTA), comprising nearly 270 thousand crashes, revealed that RF models could predict injury severity with an accuracy of 75.5%. That study utilized variables related to driver demographics, environmental factors, and behavior [13]. Another study in Portugal identified RF as the best-performing model for predicting accident hotspots, achieving an AUC of 0.68 and an accuracy of 73% [14]. In an analysis of Nebraska crash data, KNN exhibited the most effective performance in predicting accident severity, followed by RF [54]. Another study, exploring various machine learning algorithms for road accident prediction, identified a model achieving a 61% prediction rate with a false alarm rate of 38% [55]. In another study, RF outperformed other models in predicting injury severity, achieving higher accuracy in individual classes as well as in overall prediction performance [56]. A further study, aimed at detecting accident occurrences using real-world data from Chicago metropolitan expressways, employed the XGBoost model, achieving an AUC of 89% and near-perfect accuracy [48]. Notably, that dataset also suffered from imbalance, which the authors addressed using SMOTE. While prior research has shown XGBoost’s superiority over neural networks on balanced structured datasets, and its effectiveness in handling complex data distributions within the feature space [57], both our study and the work by Parsa et al. [48] demonstrate that, with an appropriate approach to resolving data imbalance and careful feature selection, XGBoost, a boosted decision tree method introduced in 2016 [43], can be well suited to predicting events. 
Its computational efficiency and ability to handle missing values have contributed to its growing popularity in recent years [58]. While XGBoost is often claimed to possess superior predictive performance, concerns have been raised regarding its applicability to tasks requiring interpretability [59]. However, the use of SHAP, a tool that overcomes the limitations associated with similar tools, has become prevalent [58]. This combination of XGBoost and SHAP has demonstrated effectiveness in achieving both predictive accuracy and interpretability in similar studies [60–63]. This makes it a strong option for future research seeking both high accuracy and explainability.

In our study, feature selection based on permutation importance and model interpretation highlighted the significance of some previously evaluated predictors in forecasting errors in accident hotspots. The analysis revealed variables related to the time of the error had lesser importance. This contrasts with other studies on accident occurrence in Iran, which have demonstrated a notable association between time and accidents [9]. A study reported that approximately 28% of all accidents and about one-third of fatal accidents occurred in summer, particularly in August [64]. Regarding the occurrence of accidents on different days, Friday had the fewest accidents, and most road traffic injuries occurred on the day preceding long holidays, possibly due to heavier traffic on these days [64]. Data collected from the Iranian Legal Medicine Organization indicated a fluctuating injury rate throughout the year, following a periodic pattern with peaks in spring and summer, and declines in autumn and winter [65], consistent with injury death patterns [66]. However, it is essential to note that these differences could stem from methodological variances and the specific focus of each study. While time-related variables are potentially associated with the occurrence of accidents, they did not hold the same level of importance in predicting errors in hotspots when considered alongside other predictors in our model.
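
The idea behind permutation importance, which underlies the feature selection mentioned above, can be sketched in a few lines: shuffle one column at a time and measure how much the model's score drops. The toy model, the data, and the use of accuracy as the score are assumptions made for illustration; this section does not specify which metric or settings were used in the study.

```python
import random


def accuracy(predict, rows, labels):
    """Share of records for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each record with only this column's value replaced.
            permuted = [r[:col] + [v] + r[col + 1:]
                        for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy setting: the hypothetical model looks only at column 0 (say, road type)
# and ignores column 1 (say, a time-of-day flag).
rows = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
labels = [r[0] for r in rows]


def toy_model(row):
    return row[0]


scores = permutation_importance(toy_model, rows, labels)
print(scores)
```

Here the ignored column gets an importance of exactly zero because shuffling it never changes a prediction. This is the sense in which a variable can be associated with accidents in general yet add little to a particular model once stronger predictors are present.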

Our study highlighted the significance of spatial-related variables, particularly the province of error and road type, as the most important features for predicting errors in accident hotspots in Iran. Our findings revealed that the occurrence of accidents in Tehran, the capital province of Iran, is associated with a lower prediction of hotspot accidents, while provinces like Hamadan, Bushehr, and Kurdistan show a higher risk of error occurrence in hotspots. This aligns with previous studies that demonstrate certain clusters are more prone to accidents. Clustering methodologies have been employed in studies identifying accident hotspot zones and their spatial distribution in London, leading to the conclusion that location-specific policies should be adapted [67]. Another study utilized clustering methods to identify accident hotspots and then evaluated the impact of environmental road structure on drivers’ behavior [68]. While approximately 75% of accidents happen in city streets, fatal accidents are more common in rural areas, possibly due to a higher likelihood of dangerous behaviors such as overspeeding on outer city roads [64]. Factors related to road types, such as lane width, median existence, median type, shoulder width, horizontal curves, and vertical curves, can influence driving style and drivers’ behavior [69]. Additionally, regulations such as speed limits, the presence of cameras, traffic signs, and features like congestion levels can vary by road type [70]. While it is known that location-related variables are important, our study showed they were more significant than behavioral and weather variables. We identified specific provinces and road types associated with higher or lower error risks, guiding policymakers on resource allocation. Our findings suggest that improving location-related conditions in Iran can significantly reduce errors in hotspots. 
The subgroup analysis of the three most important provinces revealed that different variables are important in each province, highlighting the need for province-specific strategies and for evaluating each province’s situation rather than generalizing risk variables. Our study identified fatigue as a behavioral error associated with a higher predicted risk of errors in accident hotspots, and certain weather-related variables also demonstrated importance in hotspot prediction. The relationship between a fatigued or sleepy state and a decline in performance is well established, leading to increased accident risks and errors [71]. Thus, detecting driver fatigue with simple strategies, such as assessing the continuity of driving time, could reduce accident risks. These strategies could operate at both individual and governance levels, such as sending alerts to drivers who have been driving for extended periods or establishing penalties for prolonged driving. Additionally, drivers’ behavior is influenced by weather conditions through their impact on speed, road surfaces, and visibility [69,70]. Therefore, utilizing weather data would be beneficial for predicting potential driver behaviors in each location.
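
One simple way to assess the continuity of driving time is sketched below. It scans a driver's telematics timestamps, treats a long gap between consecutive records as a rest stop, and raises one alert when a stretch of driving exceeds a limit. The 4-hour limit, the 15-minute break, and the function name are hypothetical values chosen for illustration; they are not taken from the study or from any regulation.

```python
from datetime import datetime, timedelta


def continuous_driving_alerts(timestamps,
                              max_drive=timedelta(hours=4),
                              min_break=timedelta(minutes=15)):
    """Return the times at which a stretch of driving reaches max_drive.

    A gap of at least min_break between consecutive records is treated as a
    rest stop and starts a new stretch. Both thresholds are hypothetical.
    """
    alerts = []
    if not timestamps:
        return alerts
    ordered = sorted(timestamps)
    stretch_start = ordered[0]
    alerted = False
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_break:
            # Long gap between records: assume the driver rested.
            stretch_start = cur
            alerted = False
        elif not alerted and cur - stretch_start >= max_drive:
            # At most one alert per uninterrupted stretch.
            alerts.append(cur)
            alerted = True
    return alerts


# Made-up trip: one record every 5 minutes for 5 hours, starting at 08:00.
start = datetime(2020, 1, 1, 8, 0)
trip = [start + timedelta(minutes=5 * k) for k in range(61)]
print(continuous_driving_alerts(trip))
```

A production system would also need to handle signal loss, which this sketch would misread as a rest stop.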

While the use of telematics in Iran and similar countries is not yet common, exploring its potential applications could lead to its wider integration in vehicles. Machine learning techniques have been shown to be effective in providing alert information to travelers in unfamiliar locations, thereby reducing accidents [72]. Prior telematics studies have demonstrated its cost-effectiveness in monitoring and changing driving performance in Iran [21,26,73]. While informational interventions alone may not be sufficient, combining them with immediate consequences, such as prompt punishments, has proven effective in altering driver behavior [74]. Considering these insights, telematics emerges as a cost-effective option for vehicle and insurance companies. By aggregating real-time vehicle location data, integrating it with other relevant data sources, including weather and environmental variables, and identifying predefined errors such as drivers’ fatigue or risky driving behaviors, telematics systems can send alert messages to drivers at high risk of making errors in accident hotspots.
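
A minimal sketch of such an alert rule follows. It measures the great-circle distance from the vehicle to each known hotspot with the Haversine formula (the formula cited in the reference list [31]) and raises an alert only when a predefined error has been detected near a hotspot. The 1 km radius, the coordinates, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def should_alert(position, hotspots, error_detected, radius_km=1.0):
    """Alert if an error was detected within radius_km of any hotspot.

    radius_km is a hypothetical threshold, not a value from the study.
    """
    return error_detected and any(
        haversine_km(position[0], position[1], h[0], h[1]) <= radius_km
        for h in hotspots
    )


# Made-up coordinates (latitude, longitude) for illustration only.
hotspots = [(35.000, 51.000), (34.500, 50.200)]
print(should_alert((35.002, 51.001), hotspots, error_detected=True))
```

Keeping the error check and the distance check separate makes it easy to swap in a model-based risk score for the fixed radius later on.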

Strengths and limitations

Our study had several limitations that should be considered. First, our analysis focused on the behavioral errors of bus drivers—a group with specialized driving skills. Extrapolating these findings to the general population requires caution, as professional drivers have unique characteristics that may not fully represent all road users. Second, bus drivers typically operate on fixed and familiar routes, which may influence the nature of their errors compared to drivers navigating unfamiliar roads. Additionally, the findings of this study may not be directly applicable to other regions or transport systems due to differences in infrastructure, regulations, and driving behaviors. Another limitation is the study’s reliance on data from a single year. Without longitudinal data spanning multiple years, it is difficult to assess trends, seasonality, or long-term changes in predictor variables. Future research could incorporate multi-year datasets to predict the frequency of error occurrences in accident hotspots, providing policymakers with more reliable insights for road safety interventions [75]. Moreover, to streamline the model, we used a limited set of variables related to behavioral errors. While these factors are crucial, other variables—such as personality traits, age, and education level—may also influence driving behaviors. Research suggests that dangerous driving behaviors are less prevalent among men, individuals with intermediate education levels, and the elderly [76]. Additionally, factors such as road maintenance, law enforcement measures, and socio-economic conditions could further impact error risk at accident hotspots. Future studies should incorporate a broader range of driver-, vehicle-, and environment-related variables to develop more comprehensive predictive models. Although we implemented thorough data preprocessing, potential errors or biases in telematics and weather data collection remain. 
The study assumes that all variables were accurately measured and recorded. Future research should utilize datasets with a wider range of validated variables to improve analytical depth and accuracy.

Despite these limitations, our study has several strengths. First, data collection via telematics devices reduces information bias compared to self-reported driver data. The integration of large-scale data from multiple sources enhances the robustness and interpretability of our findings. By incorporating spatial, temporal, behavioral (fatigue), and weather-related variables, our study provides a multifaceted perspective on road safety and highlights the interactions between key predictive factors. Furthermore, we employed six different machine learning models, ranging from simple to advanced techniques, ensuring a comprehensive evaluation of predictive performance rather than relying on a single algorithm. The use of SHAP values enhances the interpretability of our models by identifying the most influential predictors, facilitating actionable policy recommendations. Finally, we addressed the significant imbalance in the outcome variable by applying an ensemble approach, which improves predictive reliability and reduces bias against the minority class.
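
To make the idea of an ensemble approach to class imbalance concrete, the sketch below trains several small learners, each on all minority-class records plus an equally sized random sample of the majority class, and averages their votes. This balanced-bagging scheme, the one-feature threshold learner, and the assumption that hotspot errors (label 1) are the minority class are all illustrative choices; this section does not describe the exact ensemble configuration used in the study.

```python
import random


def fit_stump(xs, ys):
    """Tiny base learner: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    direction = 1 if pos_mean >= neg_mean else -1
    return lambda x: int(direction * (x - threshold) > 0)


def fit_balanced_ensemble(xs, ys, n_members=10, seed=0):
    """Average of stumps, each fit on a class-balanced undersample.

    Assumes label 1 is the minority class and label 0 the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(ys) if y == 1]
    majority = [i for i, y in enumerate(ys) if y == 0]
    members = []
    for _ in range(n_members):
        # Undersample the majority class to the size of the minority class.
        idx = minority + rng.sample(majority, len(minority))
        members.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict_proba(x):
        # Share of ensemble members voting for the minority class.
        return sum(m(x) for m in members) / len(members)

    return predict_proba


# Made-up one-feature data: 95 majority records and 5 minority records.
xs = [i * 0.05 for i in range(95)] + [9.0, 9.5, 10.0, 10.5, 11.0]
ys = [0] * 95 + [1] * 5
model = fit_balanced_ensemble(xs, ys)
print(model(10.0), model(1.0))
```

Because every member sees a balanced sample, no single learner is dominated by the majority class, while averaging across members recovers information that any one undersample discards.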

Conclusion

In conclusion, this study employed machine learning techniques to develop a predictive model for identifying occurrences of errors in road accident hotspots, utilizing a telematics dataset integrated with weather information in Iran, a lower-middle-income country. The XGBoost model, achieving an AUC of 91.70%, demonstrated outstanding predictive performance. While location-related variables were already known to be important, SHAP values showed that province and road type were the most important predictors among the spatiotemporal, weather, and behavioral factors examined. This finding suggests that policymakers should prioritize these two factors and develop province-specific strategies. For future research, employing larger datasets with additional variables can contribute to the development of more accurate and reliable predictive models.

Supporting information

S1 Fig. Evaluating correlation among variables using different correlation techniques. A) Pearson’s correlation for quantitative variables, B) Cramér’s V for categorical variables, C) Correlation of categorical vs quantitative variables. (PNG; pone.0326483.s001.png, 983.4KB)

S2 Fig. The mean absolute SHAP values for all predictors. (PNG; pone.0326483.s002.png, 934.2KB)

S3 Fig. SHAP beeswarm summary plot of all predictors. (PNG; pone.0326483.s003.png, 958.5KB)

S1 Table. Distribution of accident hotspots in Iran. (DOCX; pone.0326483.s004.docx, 14.3KB)

S2 Table. Descriptive summary of variables excluding weather-related variables in the primary dataset. (DOCX; pone.0326483.s005.docx, 23.6KB)

S3 Table. Descriptive summary of variables in the dataset after prediction of weather condition and without any missing values. (DOCX; pone.0326483.s006.docx, 31.1KB)

S4 Table. Models hyperparameters tuning. (DOCX; pone.0326483.s007.docx, 13.8KB)

S5 Table. Performance of different models during hyperparameters tuning by grid search. (XLSX; pone.0326483.s008.xlsx, 19.6KB)

S6 Table. Evaluation of machine learning models for the prediction of error occurrence in accident hotspots. (DOCX; pone.0326483.s009.docx, 13.9KB)

S1 File. A small sample of data used for analysis (in pickle format). (PKL; pone.0326483.s010.pkl, 44.6KB)

S2 File. Codes used during analysis process. (IPYNB; pone.0326483.s011.ipynb, 89.4KB)

Acknowledgments

The authors gratefully acknowledge the National Institute for Medical Research Development (NIMAD), Tehran, Iran for its support. We also thank Kavi Bhalla, PhD, for his contribution to the revision of the manuscript.

Data Availability

The datasets generated and/or analysed during the current study are not publicly available due to the restrictions set by the funder of the study, National Institute for Medical Research Development (NIMAD). However, researchers with written permission can request to obtain the anonymized data. Requests to access the datasets should be directed to the NIMAD website (https://nimad.ac.ir/). A small sample of data is provided in the ‘supporting information’ section.

Funding Statement

This work was supported by the National Institute for Medical Research Development (NIMAD), Tehran, Iran [grant number: 940567]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. WHO. Road traffic injuries; 2023. [cited 2025 Jan 31]. Available from: https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries
  • 2. (IHME) IoHMaE. GBD compare. [cited 2024 May 31]. Available from: https://vizhub.healthdata.org/gbd-compare/
  • 3. Rezaei S, Arab M, Karami Matin B, Akbari Sari A. Extent, consequences and economic burden of road traffic crashes in Iran. J Inj Violence Res. 2014;6(2):57–63. doi: 10.5249/jivr.v6i2.191
  • 4. Ghandour R, Potams AJ, Boulkaibet I, Neji B, Al Barakeh Z. Driver behavior classification system analysis using machine learning methods. Appl Sci. 2021;11(22):10562. doi: 10.3390/app112210562
  • 5. Goniewicz K, Goniewicz M, Pawłowski W, Fiedor P. Road accident rates: strategies and programmes for improving road traffic safety. Eur J Trauma Emerg Surg. 2016;42(4):433–8. doi: 10.1007/s00068-015-0544-6
  • 6. Shafabakhsh GA, Famili A, Bahadori MS. GIS-based spatial analysis of urban traffic accidents: case study in Mashhad, Iran. J Traffic Transp Eng (Engl Ed). 2017;4(3):290–9. doi: 10.1016/j.jtte.2017.05.005
  • 7. Eboli L, Mazzulla G, Pungillo G, Pungillo R. Analysing car users’ driving behaviour: safety domains for different types of roads. Adv Transp Stud. 2018;46.
  • 8. Brijs T, Karlis D, Wets G. Studying the effect of weather conditions on daily crash counts using a discrete time-series model. Accid Anal Prev. 2008;40(3):1180–90. doi: 10.1016/j.aap.2008.01.001
  • 9. Zangeneh A, Najafi F, Karimi S, Saeidi S, Izadi N. Spatial-temporal cluster analysis of mortality from road traffic injuries using geographic information systems in West of Iran during 2009-2014. J Forensic Leg Med. 2018;55:15–22. doi: 10.1016/j.jflm.2018.02.009
  • 10. Khan MQ, Lee S. A Comprehensive survey of driving monitoring and assistance systems. Sensors (Basel). 2019;19(11):2574. doi: 10.3390/s19112574
  • 11. Siami M, Naderpour M, Lu J. A mobile telematics pattern recognition framework for driving behavior extraction. IEEE Trans Intell Transport Syst. 2021;22(3):1459–72. doi: 10.1109/tits.2020.2971214
  • 12. Bokaba T, Doorsamy W, Paul BS. Comparative study of machine learning classifiers for modelling road traffic accidents. Applied Sciences. 2022;12(2):828. doi: 10.3390/app12020828
  • 13. AlMamlook RE, Kwayu KM, Alkasisbeh MR, Frefer AA, editors. Comparison of machine learning algorithms for predicting traffic accident severity. 2019 IEEE Jordan international joint conference on electrical engineering and information technology (JEEIT). IEEE; 2019.
  • 14. Santos D, Saias J, Quaresma P, Nogueira VB. Machine learning approaches to traffic accident analysis and hotspot prediction. Computers. 2021;10(12):157. doi: 10.3390/computers10120157
  • 15. Kumeda B, Zhang F, Zhou F, Hussain S, Almasri A, Assefa M, editors. Classification of road traffic accident data using machine learning algorithms. 2019 IEEE 11th international conference on communication software and networks (ICCSN). IEEE; 2019.
  • 16. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. Springer; 2013.
  • 17. Molnar C. A guide for making black box models explainable. 2018;2(3). Available from: https://christophm.github.io/interpretable-ml-book
  • 18. Amarasinghe M, Kottegoda S, Arachchi AL, Muramudalige S, Bandara HD, Azeez A, editors. Cloud-based driver monitoring and vehicle diagnostic with OBD2 telematics. 2015 fifteenth international conference on advances in ICT for emerging regions (ICTer). IEEE; 2015.
  • 19. Cho K, Bae C, Chu Y, Suh M. Overview of telematics: a system architecture approach. Int J Automot Technol. 2006;7(4):509–17.
  • 20. Malekpour M-R, Azadnajafabad S, Rezazadeh-Khadem S, Bhalla K, Ghasemi E, Heydari ST, et al. The effectiveness of fixed speed cameras on Iranian taxi drivers: an evaluation of the influential factors. Front Public Health. 2022;10:964214. doi: 10.3389/fpubh.2022.964214
  • 21. Ghamari A, Rezaei N, Malekpour M-R, Azadnajafabad S, Jafari A, Ahmadi N, et al. The effect of non-punitive peer comparison and performance feedback on drivers’ behavior using the telematics: the first randomized trial in Iran. J Safety Res. 2022;82:430–7. doi: 10.1016/j.jsr.2022.07.010
  • 22. Boylan J, Meyer D, Chen WS. A systematic review of the use of in-vehicle telematics in monitoring driving behaviours. Accid Anal Prev. 2024;199:107519. doi: 10.1016/j.aap.2024.107519
  • 23. Safavi-Naini SAA, Sobhani S, Malekpour M-R, Bhalla K, Shahraz S, Haghshenas R, et al. Drivers’ behavior confronting fixed and point-to-point speed enforcement camera: agent-based simulation and translation to crash relative risk change. Sci Rep. 2024;14(1):1863. doi: 10.1038/s41598-024-52265-3
  • 24. Winlaw M, Steiner SH, MacKay RJ, Hilal AR. Using telematics data to find risky driver behaviour. Accid Anal Prev. 2019;131:131–6. doi: 10.1016/j.aap.2019.06.003
  • 25. Peer S, Muermann A, Sallinger K. App-based feedback on safety to novice drivers: learning and monetary incentives. Transp Res Part F: Traffic Psychol Behav. 2020;71:198–219.
  • 26. Azmin M, Jafari A, Rezaei N, Bhalla K, Bose D, Shahraz S, et al. An approach towards reducing road traffic injuries and improving public health through big data telematics: a randomised controlled trial protocol. Arch Iran Med. 2018;21(11):495–501.
  • 27. Nominatim. Nominatim; 2023. [cited 2023 Oct 1]. Available from: https://nominatim.openstreetmap.org/ui/reverse.html
  • 28. OpenStreetMap. OpenStreetMap road types; 2023. [cited 2023 Oct 1]. Available from: https://wiki.openstreetmap.org/wiki/Key:highway#Roads
  • 29. (IRMC) IRMC; 2023. [cited 2023 Oct 1]. Available from: https://141.ir/
  • 30. Iran MoRaUDo; 2023. [cited 2023 Oct 1]. Available from: https://www.mrud.ir/
  • 31. Chopde NR, Nichat M. Landmark based shortest path detection by using A* and Haversine formula. IJIRCCE. 2013;1(2):298–302.
  • 32. Developers M; 2023. [cited 2023 Oct 1]. Available from: https://dev.meteostat.net/
  • 33. Scrapy; 2023. [cited 2023 Oct 15]. Available from: https://scrapy.org/
  • 34. Kashifi M, Ahmad I. Efficient histogram-based gradient boosting approach for accident severity prediction with multisource data. Transp Res Rec. 2022;2676(6):236–58.
  • 35. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
  • 36. Scikit-learn. Histogram-based gradient boosting classification tree; 2023. [cited 2023 Oct 15]. Available from: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html
  • 37. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18(17):1–5.
  • 38. Imbalanced-learn. A balanced random forest classifier; 2023. [cited 2023 Oct 15]. Available from: https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedRandomForestClassifier.html
  • 39. Imbalanced-learn. EasyEnsemble; 2023. [cited 2023 Oct 15]. Available from: https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.EasyEnsembleClassifier.html
  • 40. Imbalanced-learn. Bagging classifier; 2023. [cited 2023 Oct 15]. Available from: https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedBaggingClassifier.html
  • 41. Altmann A, Toloşi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics. 2010;26(10):1340–7. doi: 10.1093/bioinformatics/btq134
  • 42. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. Springer; 2009.
  • 43. Chen T, Guestrin C, editors. Xgboost: a scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; 2016.
  • 44. Chauhan VK, Dahiya K, Sharma A. Problem formulations and solvers in linear SVM: a review. Artif Intell Rev. 2018;52(2):803–55. doi: 10.1007/s10462-018-9614-6
  • 45. Ngusie HS, Mengiste SA, Zemariam AB, Molla B, Tesfa GA, Seboka BT, et al. Predicting adverse birth outcome among childbearing women in Sub-Saharan Africa: employing innovative machine learning techniques. BMC Public Health. 2024;24(1):2029. doi: 10.1186/s12889-024-19566-8
  • 46. XGBoost documentation; 2023. [cited 2023 Oct 1]. Available from: https://xgboost.readthedocs.io/en/stable/
  • 47. Masís S. Interpretable machine learning with python: learn to build interpretable high-performance models with hands-on real-world examples. Packt Publishing Ltd; 2021.
  • 48. Parsa AB, Movahedi A, Taghipour H, Derrible S, Mohammadian AK. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid Anal Prev. 2020;136:105405. doi: 10.1016/j.aap.2019.105405
  • 49. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
  • 50. SHAP documentation; 2023. [cited 2023 Nov 1]. Available from: https://shap.readthedocs.io/en/latest/index.html
  • 51. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315–6. doi: 10.1097/JTO.0b013e3181ec173d
  • 52. Silva PB, Andrade M, Ferreira S. Machine learning applied to road safety modeling: a systematic literature review. J Traffic Transp Eng (Engl Ed). 2020;7(6):775–90.
  • 53. Atumo EA, Fang T, Jiang X. Spatial statistics and random forest approaches for traffic crash hot spot identification and prediction. Int J Inj Contr Saf Promot. 2022;29(2):207–16. doi: 10.1080/17457300.2021.1983844
  • 54. Iranitalab A, Khattak A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid Anal Prev. 2017;108:27–36. doi: 10.1016/j.aap.2017.08.008
  • 55. Lin L, Wang Q, Sadek AW. A novel variable selection method based on frequent pattern tree for real-time traffic accident risk prediction. Transp Res Part C: Emerg Technol. 2015;55:444–59. doi: 10.1016/j.trc.2015.03.015
  • 56. Islam MK, Reza I, Gazder U, Akter R, Arifuzzaman M, Rahman MM. Predicting road crash severity using classifier models and crash hotspots. Appl Sci. 2022;12(22):11354.
  • 57. Wu J, Li Y, Ma Y, editors. Comparison of XGBoost and the neural network model on the class-balanced datasets. 2021 IEEE 3rd international conference on frontiers technology of information and computer (ICFTIC). IEEE; 2021.
  • 58. Ali Y, Hussain F, Haque MM. Advances, challenges, and future research needs in machine learning-based crash prediction models: a systematic review. Accid Anal Prev. 2024;194:107378. doi: 10.1016/j.aap.2023.107378
  • 59. Sagi O, Rokach L. Approximating XGBoost with an interpretable decision tree. Inf Sci. 2021;572:522–42. doi: 10.1016/j.ins.2021.05.055
  • 60. Ma Y, Zhang J, Lu J, Chen S, Xing G, Feng R. Prediction and analysis of likelihood of freeway crash occurrence considering risky driving behavior. Accid Anal Prev. 2023;192:107244. doi: 10.1016/j.aap.2023.107244
  • 61. Chang I, Park H, Hong E, Lee J, Kwon N. Predicting effects of built environment on fatal pedestrian accidents at location-specific level: application of XGBoost and SHAP. Accid Anal Prev. 2022;166:106545. doi: 10.1016/j.aap.2021.106545
  • 62. Yang C, Chen M, Yuan Q. The application of XGBoost and SHAP to examining the factors in freight truck-related crashes: an exploratory analysis. Accid Anal Prev. 2021;158:106153. doi: 10.1016/j.aap.2021.106153
  • 63. Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One. 2023;18(2):e0281922. doi: 10.1371/journal.pone.0281922
  • 64. Khorshidi A, Ainy E, Hashemi Nazari SS, Soori H. Temporal patterns of road traffic injuries in Iran. Arch Trauma Res. 2016;5(2):e27894. doi: 10.5812/atr.27894
  • 65. Delavary Foroutaghe M, Mohammadzadeh Moghaddam A, Fakoor V. Time trends in gender-specific incidence rates of road traffic injuries in Iran. PLoS One. 2019;14(5):e0216462. doi: 10.1371/journal.pone.0216462
  • 66. Bahadorimonfared A, Soori H, Mehrabi Y, Delpisheh A, Esmaili A, Salehi M, et al. Trends of fatal road traffic injuries in Iran (2004–2011). PLoS One. 2013;8(5):e65198. doi: 10.1371/journal.pone.0065198
  • 67. Anderson TK. Kernel density estimation and K-means clustering to profile road accident hotspots. Accid Anal Prev. 2009;41(3):359–64. doi: 10.1016/j.aap.2008.12.014
  • 68. Kaygisiz Ö, Düzgün Ş, Yildiz A, Senbil M. Spatio-temporal accident analysis for accident prevention in relation to behavioral factors in driving: the case of South Anatolian Motorway. Transp Res Part F: Traffic Psychol Behav. 2015;33:128–40. doi: 10.1016/j.trf.2015.07.002
  • 69. Hamdar SH, Qin L, Talebpour A. Weather and road geometry impact on longitudinal driving behavior: exploratory analysis using an empirically supported acceleration modeling framework. Transp Res Part C: Emerg Technol. 2016;67:193–213. doi: 10.1016/j.trc.2016.01.017
  • 70. Munigety CR, Mathew TV. Towards behavioral modeling of drivers in mixed traffic conditions. Transp Dev Econ. 2016;2(1):6. doi: 10.1007/s40890-016-0012-y
  • 71. Lal SK, Craig A. A critical review of the psychophysiology of driver fatigue. Biol Psychol. 2001;55(3):173–94. doi: 10.1016/s0301-0511(00)00085-5
  • 72. Alagarsamy S, Malathi M, Manonmani M, Sanathani T, Kumar AS, editors. Prediction of road accidents using machine learning technique. 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE; 2021.
  • 73. Tavolinejad H, Malekpour M-R, Rezaei N, Jafari A, Ahmadi N, Nematollahi A, et al. Evaluation of the effect of fixed speed cameras on speeding behavior among Iranian taxi drivers through telematics monitoring. Traffic Inj Prev. 2021;22(7):559–63. doi: 10.1080/15389588.2021.1957100
  • 74. Malekpour M-R, Ghamari S-H, Ghasemi E, Hejaziyeganeh S, Abbasi-Kangevari M, Bhalla K, et al. The effect of Real-Time feedback and incentives on speeding behaviors using Telematics: a randomized controlled trial. Accid Anal Prev. 2023;191:107216. doi: 10.1016/j.aap.2023.107216
  • 75. Fawcett L, Thorpe N, Matthews J, Kremer K. A novel Bayesian hierarchical model for road safety hotspot prediction. Accid Anal Prev. 2017;99(Pt A):262–71. doi: 10.1016/j.aap.2016.11.021
  • 76. Moghaddam AM, Ayati E. Introducing a risk estimation index for drivers: a case of Iran. Saf Sci. 2014;62:90–7.

Decision Letter 0

Habtamu Setegn Ngusie

Dear Dr. Farzadfar,

As the editor, I would like to offer the following general comments:

Introduction: Please ensure that your introduction is scientifically sound. Start with a general overview and then narrow down to specific details. Incorporate existing solutions to the problem, highlight the research gap, and reference previous studies.

Language Editing: I recommend enhancing the overall quality of the English language in your manuscript.

Abstract: Please revise your abstract to make it more compelling and ensure it meets the journal's standards. It should serve as a concise summary of the key sections of your manuscript.

Please submit your revised manuscript by Mar 02 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Habtamu Setegn Ngusie

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

“This work was supported by the National Institute for Medical Research Development (NIMAD), Tehran, Iran [grant number: 940567]”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“The authors gratefully acknowledge that this research benefited from funding from the National Institute for Medical Research Development (NIMAD), Tehran, Iran by grant number 940567. We also thank Kavi Bhalla, PhD, for his contribution to the revision of the manuscript.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was supported by the National Institute for Medical Research Development (NIMAD), Tehran, Iran [grant number: 940567]”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In the online submission form, you indicated that [Insert text from online submission form here].

All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

Reviewers' comments:

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: Strengths:

Comprehensive Dataset: The study leverages a large and rich dataset from 1673 intercity buses integrated with weather and location data, enabling a more holistic view of the factors influencing errors in road accident hotspots.

Multiple ML Models Compared: By examining six different machine learning models (including XGBoost, random forest, and SVM), the study robustly identifies the best-performing method rather than relying on a single algorithm.

Interpretability with SHAP: The use of SHapley Additive exPlanation (SHAP) values offers insight into the most influential predictors, enhancing the understandability of the model’s results and supporting actionable policy decisions.

Attention to Class Imbalance: The study acknowledges the severe imbalance in the outcome variable and utilizes an ensemble approach, increasing the reliability of its predictive performance and reducing bias against the minority class.

Inclusion of Diverse Predictors: Incorporating spatial, temporal, behavioral (fatigue), and weather-related variables provides a multifaceted perspective on road safety issues and highlights the potential interplay between different predictive factors.

Weaknesses:

Lack of Longitudinal Data: The analysis is limited to a single year (2020). Without longitudinal data spanning multiple years, it’s harder to assess trends, seasonality, or long-term changes in predictors.

Domain-Specific Constraints: The generalizability to other regions or transport systems may be limited, as the findings are based on Iranian intercity buses and roads, which may have unique infrastructural, regulatory, or behavioral characteristics.

Data Quality and Completeness: Although preprocessing steps were taken, potential errors or biases in telematics and weather data collection remain. The study relies on the assumption that all variables were accurately measured and recorded.

Limited Exploration of Temporal Impact: While the study includes temporal variables, it notes a limited impact on predictions. Additional, more nuanced temporal analyses (e.g., time-of-day patterns, seasonal variations) might reveal subtler relationships not captured in the current approach.

Potential Unaccounted Predictors: The study may not incorporate all relevant factors, such as detailed driver profiles, road maintenance levels, enforcement measures, or socio-economic attributes of different regions, which could also influence error risk at hotspots.

Use some or all of these papers to help with weakness

Huang, A. A., & Huang, S. Y. (2023). Increasing transparency in machine learning through bootstrap simulation and Shapley additive explanations. PLoS One, 18(2), e0281922.

Fawcett, L., Thorpe, N., Matthews, J., & Kremer, K. (2017). A novel Bayesian hierarchical model for road safety hotspot prediction. Accident Analysis & Prevention, Elsevier.

Islam, M. K., Reza, I., Gazder, U., Akter, R., & Arifuzzaman, M. (2022). Predicting road crash severity using classifier models and crash hotspots. Applied Sciences, MDPI.

Atumo, E. A., Fang, T., & Jiang, X. (2022). Spatial statistics and random forest approaches for traffic crash hot spot identification and prediction. International Journal of Injury Control and Safety, [Volume/Issue pending], Taylor & Francis.

Reviewer #2: Thank you for inviting me to review this interesting scientific paper entitled "Predicting Errors in Accident Hotspots and Investigating Spatiotemporal, Weather, and Behavioral Factors Using Interpretable Machine Learning: an Analysis of Telematics Big Data".

The paper describes an investigation of big telematics data to predict errors in accident hotspots.

The study used modern and sound prediction methods, namely machine learning models. I found the paper sound, and it would be stronger if the authors considered the following suggestions.

Abstract:

Introduction: The abstract for this manuscript is not clear. Please briefly introduce the outcome and the main gaps you identified.

Methods: The authors used six machine learning models to predict RTAs. What criteria were used to select these models, given that many other models could also be used to predict RTAs? Why were the six models in the manuscript chosen?

Keywords: add the study area, Iran.

Introduction: I found the introduction was sound.

Lines 81-82 (Notably, spatial variations in road traffic injuries demonstrate non-random cluster formations, indicating specific locations as more accident-prone)... Can you mention the areas where road traffic accidents are more clustered? This would help the reader understand which areas are more affected by RTAs, and decision makers could identify them for better interventions globally or locally.

-Lines 86-87: mention the different variables used to predict the behavior of drivers (such as....)

Methods:

- please provide the geographical map of the study area and locate the spatial distribution of the RTA that has common occurrence using GPS points. That can affirm the telematics data has value for identifying the cluster locations for your outcome variable.

-line 202--remove variable occurring second time.

-Lines 203-204: your statement indicates "...we restricted our analysis to data without any missing values, resulting in a final dataset comprising 619,988 rows...". How did you manage missing data? Please mention the methods you employed for handling missing data.

-Train /test splitting for the data

-What was your reason for selecting a 70% training and 30% testing split, given that 80%/20% splits are also used? Please provide a valid reference for this choice.

Result:

-It is not clear for which variables the authors handled missing values. What are the methodological implications of handling missing values? Table 2 reports the errors as missing values.

-Discussions:

In the discussion section, the authors compared their findings with previous studies that reported different best-performing models. Is it sound to compare different models that report different performances? XGBoost may be the best model in one study and RF in another. Comparing these two models is not sound, as the methods used to derive the results also differ.

Minor comments

I recommend the authors to point out the strength and limitation of the study, especially in selecting the machine learning models.

-

References

-consider reference numbers 29,30, 42, 43,46,47

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jul 8;20(7):e0326483. doi: 10.1371/journal.pone.0326483.r003

Author response to Decision Letter 1


11 Mar 2025

Response to Editor:

Editor:

As the editor, I would like to offer the following general comments:

Introduction: Please ensure that your introduction is scientifically sound. Start with a general overview and then narrow down to specific details. Incorporate existing solutions to the problem, highlight the research gap, and reference previous studies.

Language Editing: I recommend enhancing the overall quality of the English language in your manuscript.

Abstract: Please revise your abstract to make it more compelling and ensure it meets the journal's standards. It should serve as a concise summary of the key sections of your manuscript.

Authors:

The authors would like to sincerely thank you for your time and thoughtful consideration of our manuscript. We have carefully revised the manuscript to address your and the reviewers’ comments to the best of our ability. We have modified the Introduction section to improve its structure and flow. We first highlighted the importance of RTAs, then discussed key contributing factors, followed by the application of machine learning techniques, the potential of telematics data, and, finally, the aim of our study. Regarding language editing, we have thoroughly reviewed the manuscript and improved its clarity and readability. Additionally, we have revised the Abstract to emphasize the key findings of our study and ensure it aligns with the journal’s standards.

We have also addressed the journal requirements.

Response to Reviewer 1:

Reviewer comment:

Strengths:

Comprehensive Dataset: The study leverages a large and rich dataset from 1673 intercity buses integrated with weather and location data, enabling a more holistic view of the factors influencing errors in road accident hotspots.

Multiple ML Models Compared: By examining six different machine learning models (including XGBoost, random forest and SVM), the study robustly identifies the best-performing method rather than relying on a single algorithm.

Interpretability with SHAP: The use of SHapley Additive exPlanation (SHAP) values offers insight into the most influential predictors, enhancing the understandability of the model’s results and supporting actionable policy decisions.

Attention to Class Imbalance: The study acknowledges the severe imbalance in the outcome variable and utilizes an ensemble approach, increasing the reliability of its predictive performance and reducing bias against the minority class.

Inclusion of Diverse Predictors: Incorporating spatial, temporal, behavioral (fatigue), and weather-related variables provides a multifaceted perspective on road safety issues and highlights the potential interplay between different predictive factors.

Weaknesses:

Lack of Longitudinal Data: The analysis is limited to a single year (2020). Without longitudinal data spanning multiple years, it’s harder to assess trends, seasonality, or long-term changes in predictors.

Domain-Specific Constraints: The generalizability to other regions or transport systems may be limited, as the findings are based on Iranian intercity buses and roads, which may have unique infrastructural, regulatory, or behavioral characteristics.

Data Quality and Completeness: Although preprocessing steps were taken, potential errors or biases in telematics and weather data collection remain. The study relies on the assumption that all variables were accurately measured and recorded.

Limited Exploration of Temporal Impact: While the study includes temporal variables, it notes a limited impact on predictions. Additional, more nuanced temporal analyses (e.g., time-of-day patterns, seasonal variations) might reveal subtler relationships not captured in the current approach.

Potential Unaccounted Predictors: The study may not incorporate all relevant factors, such as detailed driver profiles, road maintenance levels, enforcement measures, or socio-economic attributes of different regions, which could also influence error risk at hotspots.

Use some or all of these papers to help with weakness

Huang, A. A., & Huang, S. Y. (2023). Increasing transparency in machine learning through bootstrap simulation and Shapley additive explanations. PLoS One, 18(2), e0281922.

Fawcett, L., Thorpe, N., Matthews, J., & Kremer, K. (2017). A novel Bayesian hierarchical model for road safety hotspot prediction. Accident Analysis & Prevention, Elsevier.

Islam, M. K., Reza, I., Gazder, U., Akter, R., & Arifuzzaman, M. (2022). Predicting road crash severity using classifier models and crash hotspots. Applied Sciences, MDPI.

Atumo, E. A., Fang, T., & Jiang, X. (2022). Spatial statistics and random forest approaches for traffic crash hot spot identification and prediction. International Journal of Injury Control and Safety, [Volume/Issue pending], Taylor & Francis.

Authors:

The authors sincerely appreciate the time and thoughtful consideration you have dedicated to reviewing our manuscript. We have modified the "Limitations and Strengths" section of our discussion based on the points you raised. Additionally, we have incorporated the references you suggested to better address the limitations of our work and to deepen the discussion. Regarding the "Limited Exploration of Temporal Impact," we would like to clarify that we did explore the temporal effects on our outcome. However, based on our feature importance analysis, temporal variables were not found to be significant, which is why they were not included in the subsequent ML analysis. Once again, we sincerely thank you for your meticulous comments, which we believe have significantly improved the quality of our work.

Response to Reviewer 2:

Reviewer comment:

Thank you for inviting me to review this interesting scientific paper entitled "Predicting Errors in Accident Hotspots and Investigating Spatiotemporal, Weather, and Behavioral Factors Using Interpretable Machine Learning: an Analysis of Telematics Big Data". The paper describes an investigation of big telematics data to predict errors in accident hotspots. The study used modern and sound prediction methods, namely machine learning models. I found the paper sound, and it would be stronger if the authors considered the following suggestions.

Authors:

The authors would like to express their most sincere words of appreciation for the time and kind consideration of the reviewer. Thank you for your thoughtful comments, which we believe have significantly improved the quality of our work.

Abstract:

Introduction: The abstract for this manuscript is not clear. Please briefly introduce the outcome and the main gaps you identified.

Thank you for your meticulous comment. We have revised the abstract to better highlight the background, significance, and outcome of our study. Additionally, we have provided more detailed results in the abstract section.

Methods: The authors used six machine learning models to predict RTAs. What criteria were used to select these models, given that many other models could also be used to predict RTAs? Why were the six models in the manuscript chosen?

Thank you for your thoughtful comment. In many ML studies across different fields, multiple models are tested to identify the best-performing one before conducting further analysis (1-3). There are no strict or mandatory criteria for selecting specific models. For this study, we chose machine learning models commonly used in similar research, ranging from simple and conventional models like logistic regression to more advanced models like XGBoost. We have also highlighted this as one of the strengths of our study (as noted by Reviewer 1) in the discussion section.

Keywords: add the study area, Iran.

Thank you for your thoughtful comment. We have added Iran as one of the keywords.

Introduction: I found the introduction was sound.

Lines 81-82 (Notably, spatial variations in road traffic injuries demonstrate non-random cluster formations, indicating specific locations as more accident-prone)... Can you mention the areas where road traffic accidents are more clustered? This would help the reader understand which areas are more affected by RTAs, and decision makers could identify them for better interventions globally or locally.

Thank you for your meticulous comment. We drew on the reference cited for this sentence and clarified it: Notably, spatial variations in road traffic injuries demonstrate non-random cluster formations, indicating specific locations as more accident-prone, such as locations with higher traffic interactions and urban areas.

-Lines 86-87: mention the different variables used to predict the behavior of drivers (such as....)

Thank you for your meticulous comment. We drew on the reference cited for this sentence and clarified it: To better understand and predict driving errors leading to RTAs, researchers have employed various techniques incorporating factors such as vehicle speed, acceleration patterns, braking intensity, steering movements, road conditions, and environmental factors like weather and traffic density.

Methods:

- please provide the geographical map of the study area and locate the spatial distribution of the RTA that has common occurrence using GPS points. That can affirm the telematics data has value for identifying the cluster locations for your outcome variable.

Thank you for your constructive comment. We have added a map of Iran (as Figure 2) showing the distribution of errors recorded by telematics and the distribution of hotspots. We believe this visualization will help readers better understand the importance of telematics data in identifying error occurrences in accident hotspots.

-line 202--remove variable occurring second time.

Thank you for your attention to detail. This has been addressed.

-Lines 203-204: your statement indicates "...we restricted our analysis to data without any missing values, resulting in a final dataset comprising 619,988 rows...". How did you manage missing data? Please mention the methods you employed for handling missing data.

Thank you for your meticulous comment. Please note that we used three datasets in this study, as explained in the methodology. The first dataset was collected via telematics and included variables such as the type of errors, the geographical location of errors (longitude and latitude), the precise timing of errors (including date, hour, minute, and second), and road type, comprising approximately 1.59 million records. The second dataset contained information on 1,492 accident hotspot locations. We calculated the distance between errors recorded by telematics and all hotspot locations. If an error occurred within a 150-meter radius of an accident hotspot, it was considered to have happened in an accident hotspot. The third dataset consisted of weather-related variables obtained from Meteostat using the locations and exact timing of errors. However, complete weather data was not available, and only 690,000 errors had at least one non-missing weather-related variable. Since the highest number of missing values was observed in the weather conditions variable (classified as clear, cloudy, foggy, rainy, snowy, or stormy), we used other available data to impute missing values using the HistGBDT method. The final dataset, which integrated all these variables and had no missing values, consisted of approximately 620,000 records—an adequate sample size for performing machine learning (ML) analyses.
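
The hotspot-labelling step described above (an error counts as "in a hotspot" if it lies within 150 meters of any hotspot location) can be sketched as follows. This is a minimal illustration, not the authors' code: the response does not state which distance formula was used, so the haversine great-circle distance is an assumption, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)
HOTSPOT_RADIUS_M = 150.0      # radius stated in the response above


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_hotspot(error_point, hotspots, radius_m=HOTSPOT_RADIUS_M):
    """True if the error location is within radius_m of at least one hotspot."""
    lat, lon = error_point
    return any(
        haversine_m(lat, lon, h_lat, h_lon) <= radius_m
        for h_lat, h_lon in hotspots
    )


# Made-up coordinates for illustration only (not from the study data).
hotspots = [(35.7000, 51.4000)]
errors = [
    (35.7010, 51.4000),  # roughly 111 m north of the hotspot
    (35.7100, 51.4000),  # roughly 1.1 km north of the hotspot
]
labels = [in_hotspot(e, hotspots) for e in errors]
```

With about 1.59 million errors and 1,492 hotspots, a brute-force comparison like this is feasible but slow; a spatial index would be the usual optimization, though the response does not say whether one was used.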

-Train /test splitting for the data

-What was your reason for selecting a 70% training and 30% testing split, given that 80%/20% splits are also used? Please provide a valid reference for this choice.

Thank you for your thoughtful comment. Different textbooks and courses suggest various data splitting ratios, typically ranging between 80:20 and 70:30. There is no strict rule for selecting a specific proportion, and as is evident from the cited article (1), different studies have used different splitting ratios. In this study, we opted for a 70:30 split due to the high imbalance in the outcome variable. This approach allowed us to retain as many outcome instances as possible for both the training and testing processes, ensuring better model performance and generalization.
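
To illustrate why the split matters for a rare outcome, the sketch below performs a 70:30 split separately within each class, so that both partitions keep the same share of positive cases. Whether the authors stratified their split is not stated in the response; stratification, the function name, and the toy labels are assumptions made for illustration.

```python
import random


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split indices into train/test, preserving class shares in each part."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced outcome: 20 positives among 1,000 records (hypothetical).
labels = [1] * 20 + [0] * 980
train_idx, test_idx = stratified_split(labels)
n_pos_train = sum(labels[i] for i in train_idx)
n_pos_test = sum(labels[i] for i in test_idx)
```

In this toy case the test set holds 6 positives with a 30% share, whereas a 20% share would leave only 4, which is the kind of trade-off the response alludes to.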

Result:

-It is not clear for which variables the authors handled missing values. What are the methodological implications of handling missing values? Table 2 reports the errors as missing values.

Thank you for your meticulous comment. The descriptive part of the results section (Overview) is based on the available data. The percentages related to spatiotemporal distribution and locations were reported for 1,583,811 errors, as mentioned in the results section: 'After the initial cleaning of the primary telematics dataset, a total of 1,583,811 errors were available for analysis.' In Table 2, we provided information on weather-related variables before performing HistGBDT to impute missing values for the 'weather condition' variable. The descriptive results were reported to provide an overview of our dataset. The outcome variable in our study (errors occurring in accident hotspots) was complete; among the records with missing values for the 'weather condition' variable, 1,277,388 were classified as 'no' and 24,894 as 'yes'. Thus, in Table 2, we did not report errors as missing values.
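
As a quick arithmetic check, the two counts quoted in the response above can be turned into a class share and a ratio. Note that these counts describe the records discussed in that response, not necessarily the final modeling dataset of roughly 620,000 records; the variable names are illustrative.

```python
# Counts quoted in the authors' response (outcome: error in an accident hotspot).
n_no = 1_277_388
n_yes = 24_894

total = n_no + n_yes
share_yes = n_yes / total      # fraction of records labelled 'yes'
ratio_no_to_yes = n_no / n_yes  # how many 'no' records per 'yes' record

print(f"{total:,} records; {share_yes:.2%} 'yes'; about {ratio_no_to_yes:.0f} 'no' per 'yes'")
```

A positive share of roughly 2% is consistent with the "high imbalance in the outcome" that the authors cite as the reason for their ensemble training approach.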

-Discussions:

In the discussion section, the authors compared their findings with previous studies that reported different best-performing models. Is it sound to compare different models that report different performances? XGBoost may be the best model in one study and RF in another. Comparing these two models is not sound, as the methods used to derive the results also differ.

Thank you for your thoughtful comment. We have made some changes in the discussion section to prevent any misunderstanding. Based on our study, both XGBoost and Random Forest, as decision tree-based models, demonstrated the best performance. In the discussion section, we also highlighted that decision tree-based models have performed well in other studies. Additionally, integrating them with SHAP can help address their interpretability challenges. Therefore, we suggest that future studies consider using similar methods to achieve both high accuracy and interpretability.
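
The idea behind pairing a model with SHAP can be illustrated with exact Shapley values on a toy function. This is a brute-force sketch over all feature subsets with a single baseline point; it is not the SHAP library and not the authors' XGBoost model, and the toy risk function, feature names, and values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to one baseline point."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in the subset take their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk(z):
    """Hypothetical risk score with one interaction term (illustration only)."""
    province_risk, fatigue, humidity = z
    return (0.5 * province_risk + 0.3 * fatigue + 0.1 * humidity
            + 0.2 * province_risk * fatigue)


x = [1.0, 1.0, 0.8]         # record being explained
baseline = [0.0, 0.0, 0.5]  # reference record
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is shared equally between the two features involved. Tree-specific algorithms compute the same kind of quantity far more efficiently than this exponential enumeration.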

Minor comments

I recommend the authors to point out the strength and limitation of the study, especially in selecting the machine learning models.

Thank you for your thoughtful comment. Based on your feedback and that of Reviewer 1, we have revised the strengths and limitations section of our discussion to enhance clarity and comprehensiveness.

- References

-consider reference numbers 29,30, 42, 43,46,47

Thank you for your attention to detail. References 29 and 30 correspond to the websites from which we obtained data and information regarding accident hotspots. Accessing these sources requires an Iran-based IP. References 42 and 46 are both well-known textbooks on machine learning. Reference 43 is the original article introducing XGBoost with detailed explanations. Reference 47 was the first article discussing the SHAP method in game theory. However, we have removed it and replaced it with a more recent study that follows a similar methodology to our study.

Once again, we sincerely appreciate your meticulous comments, which we believe have significantly improved the quality of our work. We hope that the revisions we have made, along with our responses, address your concerns and make the manuscript suitable for publication in your view.

References:

1. Silva PB, Andrade M, Ferreira S. Machine learning applied to road safety modeling: A systematic literature review. Journal of traffic and transportation engineering (English edition). 2020;7(6):775-90.

2. Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One. 2023;18(2):e0281922.

3. Iranitalab A, Khattak A. Comparison of four statistical and machine learning methods for crash severity prediction. Accident Analysis & Prevention. 2017;108:27-36.

Attachment

Submitted filename: Response to Reviewers.pdf

pone.0326483.s012.pdf (136.8KB, pdf)

Decision Letter 1

Habtamu Setegn Ngusie

Please submit your revised manuscript by Jun 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Habtamu Setegn Ngusie

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

The overall quality of the manuscript is good, but it is important to address a few comments before publication, as this journal maintains a high standard of quality. This is also vital for the reputation of all experts involved in this manuscript, including the reviewers, the editor, and the esteemed authors.

Please recheck all English copyediting and grammatical issues. For example, in the third sentence of your introduction, change:

"93% of road traffic deaths occurred in low- and middle-income countries (LMIC), despite they have only 60% of vehicles [1]"

to

"About 93% of road traffic deaths occurred in low- and middle-income countries (LMIC), despite having only 60% of vehicles [1]."

Do not start sentences with numerical figures; add something like "about."

In the following sentence of your introduction in paragraph 2, there is no space between "and" and "accident." Review this quoted sentence:

"Various internal and external factors can affect driving, potentially leading to aggressive driving, errors, and accidents."

Please check all grammatical issues from the introduction to the end, as this is vital for your and the journal's reputation.

In the quoted sentence towards the end of your introduction, please remove the phrase "compared to conventional statistical techniques like regressions." The revised sentence should read:

"However, many ML models function as 'black box' systems, lacking interpretability, which limits their application in policymaking [16]."

When referencing your own tables and figures, please bold "Figure 1," "Table 1," and others. For example, when you mention it is further presented in Figure 1, it should be bolded as "Figure 1."

In your Statistical Analyses section, the first sentence states:

"Quantitative variables were described as the mean and standard deviation (SD), while qualitative variables were presented in the form of frequencies and percentages."

My question here is: how can qualitative variables be presented with percentages if they are not quantitative?

I would be very glad if you could incorporate "hyperparameter optimization techniques" and show the ROC curve/AUC before and after tuning or optimization. You can employ only one technique, for example, grid search tuning or Bayesian optimization.

I recommend adding a strengths and limitations of the study section at the end of the discussion and before the conclusion.

Please also highlight headings and subheadings clearly, ensuring the first letter of each heading and subheading is capitalized.

Captions for each figure are essential.

Lastly, the author may benefit from citing the following article for some of their methodological arguments, as it clearly articulates the aspects we should follow in machine learning, especially in predictive modeling; article link: https://link.springer.com/article/10.1186/s12889-024-19566-8

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

Reviewer #1: All edits are made satisfactorily. This manuscript has met the criteria to be accepted by this journal. No further edits needed.

**********

If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.

PLoS One. 2025 Jul 8;20(7):e0326483. doi: 10.1371/journal.pone.0326483.r005

Author response to Decision Letter 2


17 May 2025

Response to Editor:

Editor:

The overall quality of the manuscript is good, but it is important to address a few comments before publication, as this journal maintains a high standard of quality. This is also vital for the reputation of all experts involved in this manuscript, including the reviewers, the editor, and the esteemed authors.

Please recheck all English copyediting and grammatical issues. For example, in the third sentence of your introduction, change:

"93% of road traffic deaths occurred in low- and middle-income countries (LMIC), despite they have only 60% of vehicles [1]"

to

"About 93% of road traffic deaths occurred in low- and middle-income countries (LMIC), despite having only 60% of vehicles [1]."

Do not start sentences with numerical figures; add something like "about."

In the following sentence of your introduction in paragraph 2, there is no space between "and" and "accident." Review this quoted sentence:

"Various internal and external factors can affect driving, potentially leading to aggressive driving, errors, and accidents."

Please check all grammatical issues from the introduction to the end, as this is vital for your and the journal's reputation.

In the quoted sentence towards the end of your introduction, please remove the phrase "compared to conventional statistical techniques like regressions." The revised sentence should read:

"However, many ML models function as 'black box' systems, lacking interpretability, which limits their application in policymaking [16]."

Authors:

The authors sincerely thank you for your time, thoughtful review of our manuscript, and constructive feedback. We have thoroughly revised the manuscript, with particular attention to the grammatical issues you highlighted. We believe that the revised version now meets the standards of scientific writing and is suitable for publication. All changes have been clearly marked using track changes.

Editor:

When referencing your own tables and figures, please bold "Figure 1," "Table 1," and others. For example, when you mention it is further presented in Figure 1, it should be bolded as "Figure 1."

Authors:

Thank you for your feedback. We have bolded the names of figures and tables throughout the manuscript where they are referenced. We have ensured full alignment with the PLOS ONE submission guidelines (as outlined in https://journals.plos.org/plosone/s/file?id=9cba/PLOS%20Manuscript%20Body%20Formatting%20Guidelines.pdf). Specifically, we have referred to tables and figures as Table 1, Fig 1, S1 Table, S1 Fig, etc., in accordance with the PLOS ONE guidelines.

Editor:

In your Statistical Analyses section, the first sentence states:

"Quantitative variables were described as the mean and standard deviation (SD), while qualitative variables were presented in the form of frequencies and percentages."

My question here is: how can qualitative variables be presented with percentages if they are not quantitative?

Authors:

Thank you for your attention to detail. The percentages for qualitative (categorical) variables were calculated by dividing the number of cases in a specific category by the total number of cases for that variable. For example, in Table 2, the percentage of error occurrences in accident hotspots under clear weather conditions is 1.86%. This was calculated by dividing the number of errors in accident hotspots under clear weather (3,570) by the total number of errors under clear weather conditions (191,570).
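The calculation described above can be reproduced in a few lines. The following is a minimal Python sketch that uses only the two counts quoted in the response; the function name `category_percentage` is hypothetical and is not taken from the authors' code.

```python
def category_percentage(cases_in_category: int, total_cases: int) -> float:
    """Return the share of `cases_in_category` in `total_cases`, as a percentage."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * cases_in_category / total_cases


# Counts quoted in the response for clear weather conditions (Table 2).
errors_in_hotspots_clear = 3_570
all_errors_clear = 191_570

pct = category_percentage(errors_in_hotspots_clear, all_errors_clear)
print(f"{pct:.2f}%")  # prints 1.86%
```

The denominator here is the number of errors within one weather category, so the figure is a within-category proportion rather than a share of the whole dataset.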

Editor:

I would be very glad if you could incorporate "hyperparameter optimization techniques" and show the ROC curve/AUC before and after tuning or optimization. You can employ only one technique, for example, grid search tuning or Bayesian optimization.

Authors:

Thank you for your insightful suggestion. We performed a grid search for hyperparameter tuning in the previous version of the submission. For this version, we cited https://link.springer.com/article/10.1186/s12889-024-19566-8 in the “Machine learning models and evaluation” subsection of the Methods to explain why we chose this approach:

“While various approaches such as grid search, random search, and Bayesian optimization exist for model optimization, grid search has demonstrated comparable performance. Therefore, it was selected for hyperparameter tuning in this study. Grid search was conducted using stratified 5-fold cross-validation with 3 repeats.”

Although we had previously presented the results of hyperparameter tuning in S4 Table, we have now added the complete results for hyperparameter tuning of all six models in a new supplementary file (S5 Table).
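As an illustration of the tuning procedure quoted above, the sketch below runs a grid search scored by stratified 5-fold cross-validation with 3 repeats. It is a toy that uses only the Python standard library, not the authors' code: the one-dimensional k-nearest-neighbors classifier, the synthetic data, the candidate grid, the accuracy metric, and all function names are assumptions made for illustration.

```python
import itertools
import random
from statistics import mean


def repeated_stratified_folds(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)  # a fresh shuffle for every repeat
            # Deal each class round-robin so folds keep the class ratio.
            for pos, i in enumerate(members):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in nearest[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def grid_search(xs, ys, param_grid):
    """Return (best_params, mean CV accuracy per parameter combination)."""
    names = sorted(param_grid)
    scores = {}
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        fold_scores = []
        for train, test in repeated_stratified_folds(ys):
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            hits = sum(knn_predict(tx, ty, xs[i], **params) == ys[i] for i in test)
            fold_scores.append(hits / len(test))
        scores[combo] = mean(fold_scores)
    best = max(scores, key=scores.get)
    return dict(zip(names, best)), scores


# Synthetic, imbalanced toy data: 20 positives and 80 negatives.
rng = random.Random(42)
ys = [1] * 20 + [0] * 80
xs = [rng.gauss(2.0 if y else 0.0, 1.0) for y in ys]

best_params, cv_scores = grid_search(xs, ys, {"n_neighbors": [1, 5, 9]})
print(best_params, cv_scores)
```

Each candidate is scored on 15 held-out folds (5 splits times 3 repeats), and stratification keeps the minority-class share the same in every fold, which matters for an outcome as imbalanced as the one described in the manuscript. In practice a library routine would normally replace these helpers; scikit-learn, for example, provides `GridSearchCV` and `RepeatedStratifiedKFold`, although the response does not state which implementation was used.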

Editor:

I recommend adding a strengths and limitations of the study section at the end of the discussion and before the conclusion.

Authors:

Thank you for your recommendation. We have added this section to the manuscript.

Editor:

Please also highlight headings and subheadings clearly, ensuring the first letter of each heading and subheading is capitalized.

Captions for each figure are essential.

Authors:

Thank you for your recommendation. We reviewed all headings and subheadings and have applied the appropriate formatting in accordance with the PLOS ONE guidelines. Additionally, figure captions have been provided in the manuscript following the journal's instructions, which recommend including figure legends in the text after their first mention.

Editor:

Lastly, the author may benefit from citing the following article for some of their methodological arguments, as it clearly articulates the aspects we should follow in machine learning, especially in predictive modeling; article link: https://link.springer.com/article/10.1186/s12889-024-19566-8

Authors:

Thank you for your recommendation. The referenced study is indeed valuable, particularly due to its comprehensive and detailed methodology. We have cited this reference as number 45 in our manuscript, in the section where we discuss the rationale for selecting grid search for model optimization.

Attachment

Submitted filename: Response to Editor.pdf

pone.0326483.s013.pdf (127.8KB, pdf)

Decision Letter 2

Habtamu Setegn Ngusie

Predicting Errors in Accident Hotspots and Investigating Spatiotemporal, Weather, and Behavioral Factors Using Interpretable Machine Learning: an Analysis of Telematics Big Data

PONE-D-24-49118R2

Dear Dr. Farshad Farzadfar,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Habtamu Setegn Ngusie

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Accepted, but the academic editor is flexible if the author makes any editorial changes during the proof stage. The author is also advised to review the proof carefully.

Reviewers' comments:

Acceptance letter

Habtamu Setegn Ngusie

PONE-D-24-49118R2

PLOS ONE

Dear Dr. Farzadfar,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Habtamu Setegn Ngusie

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Evaluating correlation among variables using different correlation techniques.

    A) Pearson’s correlation for quantitative variables, B) Cramér’s V for categorical variables, C) Correlation of categorical vs quantitative variables.

    (PNG)

    pone.0326483.s001.png (983.4KB, png)
    S2 Fig. The mean absolute SHAP values for all predictors.

    (PNG)

    pone.0326483.s002.png (934.2KB, png)
    S3 Fig. SHAP beeswarm summary plot of all predictors.

    (PNG)

    pone.0326483.s003.png (958.5KB, png)
    S1 Table. Distribution of accident hotspots in Iran.

    (DOCX)

    pone.0326483.s004.docx (14.3KB, docx)
    S2 Table. Descriptive summary of variables excluding weather-related variables in the primary dataset.

    (DOCX)

    pone.0326483.s005.docx (23.6KB, docx)
    S3 Table. Descriptive summary of variables in the dataset after prediction of weather condition and without any missing values.

    (DOCX)

    pone.0326483.s006.docx (31.1KB, docx)
    S4 Table. Models hyperparameters tuning.

    (DOCX)

    pone.0326483.s007.docx (13.8KB, docx)
    S5 Table. Performance of different models during hyperparameters tuning by grid search.

    (XLSX)

    pone.0326483.s008.xlsx (19.6KB, xlsx)
    S6 Table. Evaluation of machine learning models for the prediction of error occurrence in accident hotspots.

    (DOCX)

    pone.0326483.s009.docx (13.9KB, docx)
    S1 File. A small sample of data used for analysis (in pickle format).

    (PKL)

    pone.0326483.s010.pkl (44.6KB, pkl)
    S2 File. Codes used during analysis process.

    (IPYNB)

    pone.0326483.s011.ipynb (89.4KB, ipynb)
    Attachment

    Submitted filename: Response to Reviewers.pdf

    pone.0326483.s012.pdf (136.8KB, pdf)
    Attachment

    Submitted filename: Response to Editor.pdf

    pone.0326483.s013.pdf (127.8KB, pdf)

    Data Availability Statement

    The datasets generated and/or analysed during the current study are not publicly available due to the restrictions set by the funder of the study, National Institute for Medical Research Development (NIMAD). However, researchers with written permission can request to obtain the anonymized data. Requests to access the datasets should be directed to the NIMAD website (https://nimad.ac.ir/). A small sample of data is provided in the ‘supporting information’ section.


    Articles from PLOS One are provided here courtesy of PLOS
