Scientific Reports. 2025 Sep 2;15:32317. doi: 10.1038/s41598-025-16477-5

Investigating factors influencing injury severity in crashes involving vulnerable road users in Pakistan

Muhammad Junaid 1, Chaozhe Jiang 1, Saleh Alotaibi 2, Tong Wang 3, Yahya Almarhab 4
PMCID: PMC12405482  PMID: 40897769

Abstract

Road traffic crashes claim around 1.19 million lives annually worldwide, with over half of the fatalities involving vulnerable road users (VRUs). While several studies have explored the risk factors associated with specific categories of VRUs in Pakistan, research focusing on VRUs collectively, considering all categories and their unique safety challenges, remains limited. This study aims to examine the influence of various risk factors on the severity of injuries resulting from crashes involving VRUs, using a three-year dataset (2021–2023). The study evaluated the effectiveness of six boosting-based ensemble machine learning classifiers across multiple evaluation metrics. The findings indicated that boosting with decision stumps outperformed extreme gradient boosting, light gradient boosting, histogram-based gradient boosting, categorical boosting, and adaptive boosting in terms of recall, F1-score, and accuracy. The partial dependence plots demonstrated that VRUs aged 55 years or older, collisions with other VRU groups, involvement of vans and heavy vehicles, rainy weather, the COVID-19 period, and the existence of painted medians increase the likelihood of severe injury in crashes involving VRUs. The pairwise SHAP interaction plot also supported these findings by illustrating that the interaction between different vehicle types (vans and heavy vehicles), adverse weather conditions, and VRU crashes during the COVID-19 lockdown period elevates the risk of severe crashes. Based on the study findings, several policy recommendations were proposed, including implementing education and awareness programs, developing strategies to manage mixed traffic, and improving road infrastructure to enhance safety for all VRU groups.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-16477-5.

Keywords: Vulnerable road users, Injury severity, Boosting-based ensemble learning, Partial dependence plots, SHAP interaction strength plot

Subject terms: Engineering, Civil engineering

Introduction

Road users without outer protective cells are referred to as vulnerable road users (VRUs). This group includes motorcyclists, rickshaw users, bicyclists, and pedestrians. Research focused on road traffic safety indicates that VRUs are at a greater risk of sustaining severe injuries or fatalities in road traffic crashes1. According to statistics, more than half of the 1.19 million annual road traffic fatalities in the world occur among VRUs2. In addition, it has been reported that pedestrians account for 23% of these fatalities, two- and three-wheelers for 21%, and cyclists for 6%. Notably, 92% of the fatalities occur in low- and middle-income nations. Furthermore, the likelihood of fatalities in low-income nations is three times greater than in high-income nations, despite these nations possessing less than 1% of the total motor vehicles2.

Significant advancements have been made in global traffic safety in recent years; however, it is evident that these improvements are not uniformly distributed across all nations and among all road users. Most of the initiatives aimed at enhancing road safety have taken place in high-income countries, primarily benefiting car occupants3. Modern vehicles are equipped with numerous safety features designed to safeguard occupants during collisions, including seatbelts, airbags, and crumple zones, which increase the likelihood of survival in the event of a crash. Conversely, the safety of VRUs has not been improved in a comparable manner in low- and middle-income nations, where traffic regulations are frequently inadequately enforced. This disparity is reflected in the statistics of crashes and the resulting injuries and fatalities among VRUs in these regions. For example, in São Paulo, Brazil, VRUs accounted for about 80% of road traffic fatalities in 20214. In addition, VRUs are responsible for about 54% and 50% of severe or fatal injuries in road crashes in India and Bangladesh, respectively5,6. This situation arises from several critical factors, including the dynamics of mixed traffic, risky driving behaviors, non-adherence to safety regulations, and diverse road conditions, such as inadequate pedestrian infrastructure, absence of designated lanes for two-wheeled vehicles, and interactions with roadside vendors, all of which create a unique and challenging safety environment7,8.

Similar to other low- and middle-income nations, Pakistan has one of the highest incidences of road traffic crashes globally, resulting in the loss of thousands of lives and severe injuries each year9. For instance, in 2021, the country recorded 10,379 crashes, which resulted in around 5,608 fatalities10. According to a press report, every five minutes, someone is either killed or seriously injured in a road traffic crash in Pakistan11. In addition, Pakistan’s traffic fatality rate ranks among the highest globally, with a figure of 14.2 deaths per 10,000 registered vehicles12. Furthermore, it is reported that two- and three-wheeled vehicles contribute to over 60% of crashes in the country13. These alarming trends underscore the urgent need for improved infrastructure and safety measures for these vehicles, which are essential for enhancing road safety.

The increased risk of severe injuries among VRUs in road traffic crashes has prompted several researchers in Pakistan to explore the severity and risk factors associated with specific categories of VRUs. For instance, refs. 14–17 focused on the risk factors affecting motorcyclists. Similarly, refs. 18,19 analyzed the safety challenges associated with motorized rickshaw drivers. In addition, refs. 20,21 assessed the risk factors influencing the safety of pedestrians. In these studies, the road functional class, speed limit, rider attributes, temporal factors, weather attributes, seasonal variations, violations of traffic regulations, and the use of unsafe helmets or the absence of helmets altogether were identified as the key risk factors affecting the safety of VRU groups. However, these studies have focused on specific VRU groups, highlighting a significant gap in the existing literature as no comprehensive study has been conducted in the country to investigate the safety of all VRU groups collectively. This shortcoming in the existing literature has led to an insufficient understanding of how various risk factors affect the safety of VRU groups collectively.

Aside from the aforementioned risk factors, existing literature reveals that road geometric features significantly impact the severity of injuries sustained in crashes involving VRUs. For example, the existence of intersections substantially increases the risk of severe injuries among motorcycle riders22. In addition, prior research23 has found that motorcyclists face a lower risk of sustaining severe injuries on straight segments of the roadway compared to curved segments, primarily due to improved visibility and enhanced reaction times. Furthermore, both the “AASHTO Highway Safety Manual” and “A Policy on Geometric Design of Highways and Streets” emphasize designing roadway alignments in line with drivers’ expectations to meet safety criteria. Moreover, several researchers have concluded that road characteristics are often more manageable and controllable compared to factors associated with drivers, vehicles, and environmental conditions. For instance, improving safety in motorcycle-related crashes can be more effectively achieved through the provision of quality road infrastructure, whereas regulating aspects such as rider demographics, vehicle models, and weather conditions remains considerably challenging24–26. The current body of literature includes various studies from other regions of the world that have explored the influence of road geometric characteristics on road traffic crashes involving all categories of VRUs27–29. However, prior research conducted in Pakistan has largely overlooked the effect of geometric risk factors on the injury severity of specific VRU categories. This oversight highlights a significant gap in understanding how several essential elements pertaining to roadway geometry (such as lane width, median type, shoulder width, shoulder type, alignment type, access control, U-turns, intersections, and the presence of sidewalks) influence the severity of crashes involving VRUs.

Regarding modeling techniques, previous studies in the country have mostly focused on statistical models when investigating road traffic crashes involving specific categories of VRUs, such as motorcycles14–16, rickshaws19, and pedestrians21. These models have clear functional forms and are easy to explain while quantifying the marginal effects. However, it is widely recognized that these models are based on several fundamental assumptions, and any violation of these assumptions can result in low prediction accuracy and misleading conclusions30,31. Moreover, these methods often rely on linear functions to model relationships between dependent and explanatory variables, even though road traffic crashes are affected by a set of heterogeneous variables32. This highlights a pressing need for an effective crash injury severity model capable of uncovering latent information from extensive and intricate datasets.

In recent years, machine learning approaches have been extensively utilized across various fields, including road traffic safety, to improve prediction accuracy and reliability. These techniques do not necessitate any preliminary assumptions regarding the relationships among variables and are better suited for handling nonlinear relationships between input and output data33. In addition, these techniques are highly proficient with extensive datasets and demonstrate robust performance in high-dimensional data environments by employing techniques like dimensionality reduction, which simplifies such datasets by transforming them into a lower-dimensional space while preserving key information34. Despite these advantages, understanding the underlying phenomena behind machine learning techniques remains a significant challenge. However, various methods have been developed that provide explanations for machine learning techniques35,36. Notably, these methods include partial dependence (PD) plots, individual conditional expectation plots, local interpretable model-agnostic explanations, and SHAP analysis, which provide explanations for the model’s decisions37–39.

Consequently, a wide community of researchers has employed various types of machine learning techniques to predict road crashes and associated risk factors. For instance, several researchers17,18,40–43 have used a combination of non-ensemble and ensemble machine learning techniques, while others44 have focused exclusively on the utilization of ensemble machine learning techniques for predicting road traffic crashes and the associated risk factors. These models, particularly ensemble techniques, exploit the collective power of multiple models to enhance prediction accuracy. Among ensemble machine learning techniques, there has been growing interest in boosting-based algorithms, as these algorithms iteratively refine their predictions by learning from previous errors. Consequently, some researchers have concentrated solely on these techniques when investigating the severity of injuries in road traffic crashes23,42,45,46. In addition to traditional machine learning techniques, the applications of deep learning in the realm of road traffic safety have also been gaining attention from researchers47–49. Table 1 presents a summary of the existing literature, which primarily focuses on road traffic crashes involving various VRU categories, detailing the study locations, number of observations, study contexts, machine learning techniques, and the risk factors associated with these crashes.

Table 1.

Summary of VRU crashes using machine learning techniques.

Country/dataset | Study context | Algorithm used | Explanatory variables (↑ indicates increase; ↓ indicates decrease) | Refs.
Pakistan/2,743 | Assessing the severity of injuries and risk factors associated with crashes involving motorized rickshaw drivers | Random forest, decision jungle, decision tree | Young drivers (↑), speeding (↑), weekdays (↑), off-peak hours (↑), clear weather conditions (↑) | 18
China/1,656 | Predicting VRU crashes and identifying risk factors contributing to the resulting severity of injuries | Random forest, random parameters logit model | Older VRUs (↑), trucks (↑), turning (↑), braking (↑), head-on collisions (↑), autumn (↑), winter (↑), rural regions (↑), primary roads (↑), secondary roads (↑) | 1
Israel/47,432 | Examining the severity of injuries sustained by pedestrians in road traffic crashes | Logistic regression, support vector machine, naïve Bayes, decision tree, random forest, K-nearest neighbors | Females (↑), weekdays (↑), road width (↑), speeding (↑), non-intersection locations (↑) | 50
Pakistan/9,465 | Investigating factors affecting the severity of injuries among motorcyclists | Gradient boosted decision tree, naïve Bayes, random forest, multinomial logit model | Young riders (↑), females (↑), pedestrians (↑), distraction (↑), trucks (↑), summer (↑) | 43
France/68,781 | Investigating risk factors contributing to the injury severity of two-wheeler crashes | Extreme gradient boosting | Older riders (↑), riders not wearing helmets (↑), rural regions (↑), run-off-road crashes (↑), crossing roads (↑) | 23
China/9,090 | Safety investigation of vehicle-pedestrian crashes | Bayesian neural network, conventional neural network, random forest, K-nearest neighbors | Presence of traffic signals (↑), lack of control at junctions (↑), light rain (↑), daylight (↓), streetlights (↓) | 51
Great Britain/754,636 | Predicting the severity of injuries in crashes involving cyclists | Categorical gradient boosting | Younger and older cyclists (↑), speeding (↑), rain or snow (↑), night (↑), multi-vehicle crashes (↑), highways and poorly maintained roads (↑) | 52
Pakistan/5,144 | Exploring risk factors contributing to motorcycle crashes | Extreme gradient boosting, categorical gradient boosting, multinomial logistic regression, random forest | Gender (↑), age (↑), number of lanes (↑), speeding (↑), month of year (↑) | 17
China/2,037 | Safety investigation of VRUs in collisions with heavy vehicles | Stacking, voting, random forest, extreme gradient boosting | VRUs age (↑), signalized crossings (↑), national and provincial roads (↑), urban regions (↑) | 44
Portugal/37,728 | Analyzing the severity of injuries resulting from crashes involving motorcyclists | K-nearest neighbors, decision tree, support vector machine, extreme gradient boosting, logistic regression, random forest | Male (↑), alcohol consumption (↑), primary roads (↑), speeding (↑), rural regions (↑), road markings (↑), clean and dry roads (↑), intersections or crossroads (↑), night (↑), weekends (↑), driving without license (↑), riders not wearing helmets (↑) | 22

The preceding discussion emphasizes the need for a comprehensive investigation of VRU crashes in Pakistan, considering all categories (pedestrians, cyclists, motorcyclists, and rickshaw users) collectively, and their unique safety challenges. This study employs six boosting-based ensemble machine learning techniques, namely extreme gradient boosting, light gradient boosting machine, histogram-based gradient boosting, categorical boosting, boosting with decision stumps, and adaptive boosting, to predict injury severity in crashes involving VRUs, utilizing a dataset of 17,456 crash records obtained from a nationally representative emergency department. In addition, apart from user demographics, crash dynamics, and environmental conditions, the study also incorporates roadway geometric features as key predictors of injury severity, which is an often overlooked aspect in prior research. Furthermore, the impact of COVID-19, which has significantly influenced the crash likelihood and severity in other countries, has been considered for examination. The PD plots for the ten most important features, identified via the SHAP technique, further enhance the insights of the study by visually illustrating the influence of these significant features on the predicted probability of severe injury. Furthermore, the SHAP interaction strength plot was constructed to assess which feature pairs interact strongly, revealing complex relationships within the data that might not be apparent when examining features individually. The insights gained from this study are anticipated to be crucial in developing effective policies and safety measures aimed at improving the safety of VRUs.

Methodology

Data description

The study was carried out in Rawalpindi and Islamabad, commonly known as the “twin cities” of Pakistan due to their significant interdependence and the areas that link them. Islamabad is primarily inhabited by bureaucrats and the business elite who are more accustomed to a Western lifestyle, whereas Rawalpindi resembles any other metropolitan area in a developing nation, characterized by a dense population, traffic congestion, and a populace that adheres to traditional culture. Despite their contrasting attributes, Rawalpindi and Islamabad are both essential to the nation, serving as the centers for civil and military leadership and providing a foundation for international relations53. A comprehensive illustration of the study area, generated using ArcGIS 10.8, is provided in Fig. 1.

Fig. 1.

Map of the study area.

The data concerning VRU crashes from 2021 to 2023, utilized in this study, were sourced from the Rawalpindi Rescue 1122 office, which is responsible for maintaining and updating records of road traffic crashes across all districts in Punjab, Pakistan. This dataset includes detailed crash reports with information on VRU demographics, crash causes, crash locations, vehicle types involved, and injuries sustained by VRUs. Missing geometric details of the roadway segments were collected by a research team through on-site visits to the crash locations. In addition, weather condition data were acquired from the Pakistan Meteorological Department in Islamabad.

The dataset, obtained from Rescue 1122, was formatted in Excel and subsequently refined by removing rows that contained missing or insufficient data, resulting in a total of 17,456 observations for the analysis. The severity of injuries sustained in crashes involving VRUs was categorized into two categories: non-severe injury and severe injury. The “non-severe” injury category included cases with minor injuries such as cuts, bruises, or abrasions, while the “severe” injury category encompassed cases of fatalities, major fractures (particularly of the neck, head, or spine), or substantial bleeding. Among the recorded cases, 13,468 (77.15%) were classified as non-severe, while 3,988 (22.85%) were categorized as severe. A comprehensive summary of the descriptive statistics for each explanatory variable is provided in Table 2.

Table 2.

Summary of descriptive statistics.

Variable | Type | Category | Count | Proportion (%)
Injury severity | Binary | Non-severe | 13,468 | 77.15
 |  | Severe | 3,988 | 22.85
VRU categories | Categorical | Pedestrians | 1,056 | 6.05
 |  | Cyclists | 78 | 0.45
 |  | Motorcyclists | 14,904 | 85.38
 |  | Rickshaw users | 1,418 | 8.12
Gender | Binary | Male | 15,076 | 86.37
 |  | Female | 2,380 | 13.63
Age group | Categorical | ≤ 24 | 6,467 | 37.05
 |  | > 24 and ≤ 54 | 9,032 | 51.74
 |  | ≥ 55 | 1,957 | 11.21
Education level | Categorical | No education | 4,211 | 24.12
 |  | Primary | 4,406 | 25.24
 |  | Middle | 45 | 0.26
 |  | Matric | 6,016 | 34.46
 |  | Intermediate | 2,048 | 11.73
 |  | Graduation and above | 730 | 4.18
Weather | Categorical | Shiny | 11,009 | 63.07
 |  | Cloudy | 5,104 | 29.24
 |  | Rainy | 1,343 | 7.69
COVID-19 effect | Binary | During COVID-19 | 7,038 | 40.32
 |  | Post COVID-19 | 10,418 | 59.68
Other vehicle type | Categorical | Single vehicle | 5,650 | 32.37
 |  | Vulnerable road users | 5,713 | 32.73
 |  | Passenger car | 4,230 | 24.23
 |  | Heavy vehicle | 843 | 4.83
 |  | Van | 813 | 4.66
 |  | Other | 207 | 1.18
Crash cause | Categorical | Speeding | 11,495 | 65.85
 |  | Distraction | 4,885 | 27.98
 |  | U-turn/wrong way | 682 | 3.91
 |  | Cloth stuck in bike tyre | 223 | 1.28
 |  | Other | 171 | 0.98
Road functional class | Categorical | Primary | 15,726 | 90.09
 |  | Secondary | 1,304 | 7.47
 |  | Tertiary | 426 | 2.44
Number of lanes | Categorical | Single | 1,544 | 8.84
 |  | Two | 4,531 | 25.96
 |  | Three and above | 11,381 | 65.20
Lane width | Binary | ≥ 3.0 | 10,179 | 58.31
 |  | < 3.0 | 7,277 | 41.69
Median type | Categorical | Grassy | 12,084 | 69.22
 |  | New Jersey barrier | 2,890 | 16.56
 |  | Painted | 2,482 | 14.22
Outer shoulder | Binary | Yes | 16,997 | 97.37
 |  | No | 459 | 2.63
Outer shoulder width | Binary | ≥ 1.5 | 4,541 | 26.01
 |  | < 1.5 | 12,915 | 73.99
Alignment type | Categorical | Straight | 11,983 | 68.65
 |  | Horizontal | 3,845 | 22.03
 |  | Vertical | 1,628 | 9.33
U-turn (within 50 m) | Binary | Yes | 10,699 | 61.29
 |  | No | 6,757 | 38.71
Intersection (within 50 m) | Binary | Yes | 6,899 | 39.52
 |  | No | 10,557 | 60.48
Access point (within 50 m) | Binary | Yes | 14,142 | 81.02
 |  | No | 3,314 | 18.98
Sidewalk | Binary | Yes | 9,289 | 53.21
 |  | No | 8,167 | 46.79

The descriptive statistics provided in Table 2 reveal that the majority of individuals involved in VRU crashes were male (86.37%), predominantly of middle age (51.74%), and had a matric or lower level of education (84.08%). The higher involvement of males in these crashes can be attributed to prevailing social and cultural dynamics in the country. In addition, the heightened exposure of middle-aged VRUs to crash risks could be linked to their participation in outdoor activities, as they frequently serve as the primary earners for their households. Furthermore, the increased risk of crashes among those with lower educational attainment may stem from a limited understanding of traffic laws and safety protocols. The majority of these crashes occurred in clear weather, representing 63.07% of the total incidents. It is worth mentioning that the study duration encompasses the COVID-19 lockdown phase, as the first confirmed COVID-19 case in the nation was documented on February 26, 2020. As a result, the country entered a strict nationwide lockdown in the third week of March 2020, which was followed by various phases of “smart” lockdowns. In mid-March 2022, the lockdown restrictions were lifted permanently; however, they had considerable effects not only on everyday life but also on traffic volumes and mobility trends54,55. Thus, the study duration from January 2021 to mid-March 2022 was classified as the COVID-19 lockdown period, while the timeframe from mid-March 2022 to December 2023 was designated as the post-COVID-19 period. Throughout the study period, 40.32% of cases were recorded during the COVID-19 period, whereas 59.68% were noted in the post-COVID-19 period. A significant portion of these crashes involved collisions with VRUs (32.73%), closely followed by single-vehicle crashes (32.37%). The leading causes identified for crashes involving VRUs were speeding (65.85%) and distractions (27.98%). 
Moreover, it was observed that most crashes occurred on primary roads (90.09%), along straight segments (68.65%), in areas featuring grassy medians (69.22%), and on roads with three or more lanes (65.20%) in each direction. The impact of various other factors, such as the presence of U-turns (61.29%), access points (81.02%), and non-intersection locations (60.48%), was also found to be significant in relation to VRU crashes.

Boosting-based ensemble techniques

In the realm of road safety, boosting-based ensemble techniques are of prime importance for improving prediction accuracy and identifying the underlying causes of road traffic crashes. These techniques utilize the principle of ensemble supervised learning, which enables each learner to correct the errors of its predecessor. This iterative process results in enhanced predictive performance by effectively minimizing both the bias and variance in predictions56,57. The subsequent sections provide a brief overview of six distinct boosting-based ensemble techniques, and their relevant mathematical expressions are provided in Supplementary Table S1.

Extreme gradient boosting

Extreme gradient boosting (XGBoost) is an advanced ensemble machine learning technique based on boosting principles, utilizing various algorithms within the gradient boosting framework. It is characterized by its ability to perform parallel computations at the node level, resulting in speeds more than ten times faster than traditional gradient boosting machines58. It incorporates several regularization techniques to mitigate overfitting and enhance overall model performance. It is notable for its fast execution and high efficiency. Although it supports parallel processing and offers an extensive array of hyperparameters for optimization, it is particularly sensitive to their configurations, requiring considerable time and computational resources to optimize. In addition, the intricate nature of XGBoost models can pose challenges in terms of interpretability.

Light gradient boosting machine

Light gradient boosting machine (LightGBM) is an advanced ensemble learning technique based on boosting, implementing gradient-boosted decision trees that utilize a leaf-wise growth approach and gradient-based optimization59. It speeds up the training process by utilizing a histogram-based approach to determine optimal splits. To address the long training times of traditional gradient-boosted trees, the authors of ref. 56 introduced gradient-based one-side sampling and exclusive feature bundling, which resulted in a 20% reduction in training and inference times, along with reduced memory use. However, LightGBM may struggle with smaller datasets because it is more prone to overfitting and requires meticulous regularization. Furthermore, LightGBM models might be difficult to interpret due to their inherent complexity.

Histogram-based gradient boosting

The histogram-based gradient boosting (HGB) technique, drawing inspiration from LightGBM, often outperforms the gradient boosting classifier on large datasets. During the training phase, the tree growth process evaluates potential gains at each split to determine the best path for samples that contain missing values, subsequently directing them to the appropriate child nodes. In the prediction phase, samples with missing values are routed to the child node learned during training. In cases where a feature did not exhibit any missing values during training, samples with missing values are assigned to the child node with the highest sample count60.

Categorical boosting

Categorical boosting, often known as CatBoost, is a gradient boosting algorithm that uses decision trees to effectively handle ordered and categorical features61. It employs random data shuffling to compute the average for each object based solely on historical data. In contrast to traditional gradient boosting methods, it combines ordered boosting with a permutation-based strategy to reduce variance in the final model predictions62. CatBoost is known for its excellent accuracy and robustness against noisy data, as well as its built-in mechanisms for managing missing values. However, it may take more computational resources and can be slower than other boosting algorithms, particularly on large datasets, making hyperparameter optimization challenging.
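The idea of computing a category's average "based solely on historical data" can be illustrated with a short sketch. It is a simplified, hypothetical version of an ordered target statistic and not CatBoost's actual implementation: the function name, the prior of 0.5, and the toy vehicle/severity values are assumptions made for illustration only.

```python
import random


def ordered_target_statistics(categories, targets, prior=0.5, order=None, seed=0):
    """Encode each row's category using only rows that precede it in a permutation.

    Simplified illustration: one permutation and a single pseudo-observation
    carrying the prior (both are assumptions of this sketch).
    """
    n = len(categories)
    if order is None:
        order = list(range(n))
        random.Random(seed).shuffle(order)  # random data shuffling
    sums, counts = {}, {}
    encoded = [0.0] * n
    for i in order:
        cat = categories[i]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Average of the targets seen so far for this category, smoothed by the prior.
        encoded[i] = (s + prior) / (c + 1)
        # Only now does the row's own target become part of the "history".
        sums[cat] = s + targets[i]
        counts[cat] = c + 1
    return encoded


# Hypothetical toy data: other-vehicle type and a binary severe-injury flag.
vehicle = ["van", "van", "car", "van"]
severe = [1, 0, 0, 1]
encoded = ordered_target_statistics(vehicle, severe, order=[0, 1, 2, 3])
print(encoded)
```

Because a row's own label never enters its encoding, this scheme avoids the target leakage that a plain per-category mean would introduce.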

Boosting with decision stumps

Boosting with decision stumps (BDS) uses single-level decision trees as weak learners to enhance model performance. This method focuses on misclassified instances, guiding the algorithm toward more difficult scenarios. The final predictions are obtained by combining the outputs of all decision stumps through weighted sums63. The fundamental concept involves initiating with a basic decision stump and progressively refining it in each iteration to improve the model’s overall efficacy. Although decision stumps are inherently simple, their combination through the boosting framework results in a strong and robust classifier. The primary aim is to reduce classification errors; however, like other boosting algorithms, this technique may be susceptible to overfitting, particularly in noisy environments.

Adaptive boosting

The adaptive boosting (AdaBoost) algorithm, developed by Freund and Schapire, is a pioneering boosting-based ensemble technique. The process commences with training a classifier on the initial dataset, followed by iteratively retraining the classifier. During this process, the algorithm increases the weights of misclassified instances while reducing the weights of those that are correctly classified. This procedure continues until a stopping criterion, such as a preset number of iterations, is reached. AdaBoost is recognized for its proficiency in managing complex datasets with high accuracy, as it resists overfitting and performs effectively with weak classifiers. Nonetheless, it may exhibit sensitivity to noisy data and outliers, which might have an adverse effect on its performance. Furthermore, it may encounter difficulties with datasets that have imbalanced class distributions64,65.
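The reweighting scheme described above, with single-level trees as weak learners, can be sketched in a few lines. This is a minimal illustration of discrete AdaBoost under stated assumptions (labels coded as -1/+1, exhaustive threshold search, a hypothetical one-feature toy dataset); it is not the implementation used in the study.

```python
import math


def fit_stump(X, y, w):
    """Return the stump (feature, threshold, polarity, weighted_error) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[3]:
                    best = (j, thr, polarity, err)
    return best


def stump_predict(stump, row):
    j, thr, polarity, _ = stump
    return polarity if row[j] <= thr else -polarity


def adaboost(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote of this stump
        ensemble.append((alpha, stump))
        # Raise the weights of misclassified instances, lower the others, renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    # Final prediction is the sign of the weighted sum of stump outputs.
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, n_rounds=3)
predictions = [predict(ensemble, row) for row in X]
print(predictions)
```

On this toy set the best single stump misclassifies two of the six points, whereas the weighted vote of three stumps classifies all six correctly, which illustrates how simple learners combine into a stronger classifier.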

Evaluation metrics

A comprehensive performance evaluation necessitates the consideration of several metrics because accuracy alone may not provide a clear representation of classification outcomes. Therefore, this study considered four of the most prominent evaluation metrics (precision, recall, F1-score, and accuracy) to evaluate the predictive capabilities of the selected classification models. Precision measures the proportion of true positives among all instances the model has classified as positive, which includes both true and false positives. Recall measures the proportion of actual positive instances (true positives and false negatives) that the model correctly identifies. The F1-score is the harmonic mean of precision and recall, combining them into a single metric that is especially useful for evaluating imbalanced datasets. Accuracy measures the ratio of correctly identified instances to the total number of instances within the dataset. All these metrics range from 0 to 1, with values closer to 1 indicating superior classification performance. These metrics have been employed in various studies aimed at predicting the severity of road traffic crashes40,44,46. The mathematical formulas pertinent to these evaluation metrics are provided in Supplementary Table S2.
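The four metrics follow directly from the confusion-matrix counts. The sketch below uses the standard binary definitions and a hypothetical set of ten labels; it is meant as an illustration and may differ from the averaging scheme behind the reported values, which are defined in Supplementary Table S2.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = severe injury, 0 = non-severe injury.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 while precision and recall for the severe class are both 2/3, which shows why accuracy alone can overstate performance on the minority class.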

SHAP value analysis

The Shapley value (hereinafter “SHAP value”) is a model-agnostic method utilized in the realm of machine learning to enhance interpretability. The SHAP value analysis works on the principle of cooperative game theory, which aims to evaluate the contribution of each player to the overall gains achieved in the game66. The SHAP value for a given variable is calculated as the mean of the marginal contributions across all possible combinations. The SHAP value for feature k is determined as:

$$\phi_k = \sum_{J \subseteq N \setminus \{k\}} \frac{|J|!\,(|N| - |J| - 1)!}{|N|!} \left[ f_{J \cup \{k\}}\left(n_{J \cup \{k\}}\right) - f_J\left(n_J\right) \right] \qquad (1)$$

where $\phi_k$ denotes the SHAP value associated with the kth feature, N is the set of all relevant risk features, J is a subset of N that excludes feature k, and $n_J$ signifies the values corresponding to the features within J. To analyze the impact of feature k, two models are developed: one that includes feature k, $f_{J \cup \{k\}}(n_{J \cup \{k\}})$, and another that excludes it, $f_J(n_J)$. The outcomes from these two models are compared through the difference $f_{J \cup \{k\}}(n_{J \cup \{k\}}) - f_J(n_J)$. The differences are calculated for each potential subset J because the effect of the variable of interest depends on the other variables in the model38.
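For a handful of features, the subset-weighted average of marginal contributions can be evaluated exactly by enumeration. The sketch below does this for a hypothetical value function standing in for the model output on each feature subset; the feature names and numeric contributions are invented for illustration, and practical SHAP tools rely on faster approximations or tree-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset J that excludes feature k."""
    n = len(features)
    phi = {}
    for k in features:
        others = [f for f in features if f != k]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |J|! (|N| - |J| - 1)! / |N|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                J = frozenset(subset)
                total += weight * (value(J | {k}) - value(J))
        phi[k] = total
    return phi


def toy_value(J):
    """Hypothetical model output for a feature subset (assumed numbers)."""
    score = 0.0
    if "rain" in J:
        score += 0.10
    if "heavy_vehicle" in J:
        score += 0.20
    if "rain" in J and "heavy_vehicle" in J:
        score += 0.06  # interaction term shared between the two features
    return score


phi = shapley_values(["rain", "heavy_vehicle", "age_55_plus"], toy_value)
print(phi)
```

The interaction term is split equally between the two interacting features, and the values sum to the difference between the output with all features and the output with none.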

Findings and discussion

Models’ performance

The descriptive statistics summarized in Table 2 reveal a larger number of observations in one outcome category (non-severe injury) in comparison to a limited number of observations in the other category (severe injury). In classification tasks, this disparity in observations across outcome categories can result in a class imbalance, a prevalent problem in real-world high-dimensional datasets67. While many machine learning algorithms excel with balanced datasets, as they prioritize optimizing overall classification accuracy or similar metrics, class imbalance presents several learning challenges, such as uneven class distribution, scarcity of data, and increased conceptual complexity68. A widely adopted strategy to address class imbalance is the implementation of sampling techniques, which modify the prior distributions of both majority and minority classes within the training dataset to ensure a more equitable representation of instances in each class68. Among these strategies, the synthetic minority over-sampling technique (SMOTE) stands out, as it generates a more balanced dataset by creating new samples from the minority class, thereby enhancing the model’s ability to learn patterns related to the minority class69,70. Consequently, this study employed the SMOTE technique to train and assess the models’ performance using a balanced dataset.
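The core step of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration that assumes numerically encoded features and uses hypothetical points and an assumed neighbour count k; it is not the implementation used in the study, and categorical predictors would require a suitable encoding or a variant designed for them.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Hypothetical minority-class points in a two-feature space.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Each synthetic point lies on a line segment between two existing minority points, so the new samples stay within the region already occupied by the minority class.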

The dataset was split into 60% for training, 20% for validation, and 20% for testing, and the models were fine-tuned using a grid-search optimization technique. This approach was intended to calibrate the models for improved predictive accuracy while preserving their ability to generalize to unseen data. Table 3 presents a summary of the optimized parameters along with the range of values used during the optimization process.
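The 60/20/20 partition and the enumeration of a hyperparameter grid can be sketched as follows. This is an illustrative outline only: the helper names `split_60_20_20` and `grid`, the random seed, and the specific grid points are assumptions (the points are simply sample values inside the ranges reported in Table 3), and the model fitting and scoring steps are omitted.

```python
import random
from itertools import product


def split_60_20_20(rows, seed=42):
    """Shuffle the rows and split them 60/20/20 into train, validation, test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def grid(param_ranges):
    """Yield every combination of the candidate hyperparameter values."""
    names = list(param_ranges)
    for values in product(*(param_ranges[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical stand-in for the crash records
train, val, test = split_60_20_20(list(range(100)))

# Assumed grid points within the ranges listed in Table 3
candidates = list(grid({
    "learning_rate": [0.01, 0.1, 0.2, 1.0],
    "max_depth": [1, 3, 5, 7],
    "n_estimators": [10, 100, 200],
}))
```

In a grid search, each candidate configuration would be fitted on the training split and scored on the validation split, with the test split held back for the final evaluation.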

Table 3. Summary of the models' hyperparameter tuning.

| Algorithm | Hyperparameter | Range | Best value |
|---|---|---|---|
| XGBoost | objective | binary:logistic | binary:logistic |
| | tree_method | hist | hist |
| | learning_rate | 0.01–1.0 | 0.2 |
| | max_depth | 1–7 | 3 |
| | n_estimator | 10–200 | 200 |
| LightGBM | objective | binary | binary |
| | boosting_type | gbdt | gbdt |
| | learning_rate | 0.01–1.0 | 0.1 |
| | max_depth | 1–7 | 7 |
| | n_estimator | 10–200 | 100 |
| HGB | learning_rate | 0.01–1.0 | 0.2 |
| | max_iter | 50–200 | 150 |
| | max_leaf_nodes | 10–50 | 10 |
| CatBoost | learning_rate | 0.01–1.0 | 0.1 |
| | max_iter | 10–200 | 200 |
| | max_depth | 1–7 | 7 |
| BDS | learning_rate | 0.01–1.0 | 0.2 |
| | max_depth | 1–7 | 3 |
| | n_estimator | 10–200 | 200 |
| AdaBoost | Algorithm | SAMME, SAMME.R | SAMME.R |
| | learning_rate | 0.01–1.0 | 1.0 |
| | n_estimator | 10–200 | 200 |

The modeling results, as provided in Table 4, indicate that the selected models performed satisfactorily across both the training and testing datasets. On the training dataset, all models demonstrated closely matched performance, with precision values ranging from 0.818 to 0.881, recall values from 0.800 to 0.855, F1-scores from 0.797 to 0.852, and accuracy values from 0.800 to 0.855. This consistency suggests that the models are robust in identifying patterns within the training dataset. A slight decline in performance was noted on the testing dataset, with precision values ranging from 0.820 to 0.863, recall values from 0.799 to 0.830, F1-scores from 0.796 to 0.827, and accuracy values from 0.799 to 0.830. Despite this minor decrease, the models maintained satisfactory performance, reflecting their capability to generalize effectively to unseen data.

Table 4. Summary of models’ performance.

| Classification model | Training precision | Training recall | Training F1-score | Training accuracy | Testing precision | Testing recall | Testing F1-score | Testing accuracy |
|---|---|---|---|---|---|---|---|---|
| XGBoost | 0.869 | 0.837 | 0.833 | 0.837 | 0.863 | 0.829 | 0.825 | 0.829 |
| LightGBM | 0.873 | 0.847 | 0.844 | 0.847 | 0.854 | 0.828 | 0.825 | 0.828 |
| HGB | 0.871 | 0.843 | 0.840 | 0.843 | 0.852 | 0.823 | 0.820 | 0.823 |
| CatBoost | 0.881 | 0.855 | 0.852 | 0.855 | 0.856 | 0.828 | 0.825 | 0.828 |
| BDS | 0.870 | 0.839 | 0.835 | 0.839 | 0.862 | 0.830 | 0.827 | 0.830 |
| AdaBoost | 0.818 | 0.800 | 0.797 | 0.800 | 0.820 | 0.799 | 0.796 | 0.799 |
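In Table 4, recall and accuracy coincide in every row. This is the pattern that support-weighted averaging of per-class metrics produces, because weighted recall is algebraically equal to accuracy. Whether the reported figures were computed this way is an assumption; the sketch below only demonstrates the identity on hypothetical labels, using a hypothetical helper named `weighted_metrics`.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus plain accuracy."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        support = tp + fn  # number of true instances of class c
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / support
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in pairs if t == p) / n
    return precision, recall, f1, accuracy


# Hypothetical labels: 0 = non-severe injury, 1 = severe injury
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
precision, recall, f1, accuracy = weighted_metrics(y_true, y_pred)
```

On these toy labels, weighted recall and accuracy are both 0.8, while weighted precision is higher and weighted F1 is slightly lower, the same ordering seen in each row of Table 4.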

The receiver operating characteristic (ROC) curves illustrated in Fig. 2 further demonstrate the validity of the classification models, as the ROC curves of all the models are located above the diagonal line22,29. The area under the curve (AUC) values for LightGBM and CatBoost are 0.88, followed by XGBoost, HGB, and BDS with AUC values of 0.87, while AdaBoost achieves an AUC of 0.85. This indicates that the models possess the ability to distinguish between the negative and positive classes, showcasing strong discriminatory power. Although the models’ overall performance is satisfactory, minor limitations such as noise and overlapping feature distributions may still influence their ability to attain even higher levels of predictive performance in VRU-related crash prediction.
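The AUC can be read as the probability that a randomly chosen positive case (severe injury) receives a higher predicted score than a randomly chosen negative case, which is why values above 0.5 place the ROC curve above the diagonal. A minimal sketch of that pairwise definition follows; the function name `auc` and the labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = severe injury) and predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = auc(y_true, scores)

# A model that cannot separate the classes scores 0.5
auc_uninformative = auc(y_true, [0.5, 0.5, 0.5, 0.5])
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75.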

Fig. 2. Receiver operating characteristic curves for classification models.

Furthermore, the models’ performance has been compared on the testing dataset across several performance metrics, including recall, F1-score, and accuracy. It can be observed that the BDS model marginally surpasses other models, achieving the highest recall (0.830), F1-score (0.827), and accuracy (0.830). This suggests that the BDS model is a more reliable and well-balanced model for predicting the severity of injuries resulting from crashes involving VRUs. Nevertheless, it is worth acknowledging that existing literature underscores that the performance of a model can vary across datasets, and no single model is consistently superior to others across different datasets71.

Feature importance and injury severity

The SHAP technique is widely used for identifying the features within a dataset that contribute most to the model output. In the current study, the SHAP technique was employed to identify the ten most critical features contributing to the predicted severity of VRU crashes, as illustrated in Fig. 3. Each bar in the plot represents the SHAP value for the corresponding feature, with features organized in descending order of importance. The influence of each feature on the model’s output is given by the value displayed at the end of each bar. From the figure, it can be observed that the ten most significant features are age (55 years or older), intermediate and graduation and above levels of education, cloudy and rainy weather, the COVID-19 period, VRU groups, heavy vehicles, vans, and painted medians. Previous safety studies also identified that most of these features increase the risk of road traffic crashes1,17,18,23,41,44,72,73.

Fig. 3. Feature importance analysis plot.

To illustrate the impact of these significant features on the severity of injuries sustained by VRUs, PD plots were created. These plots are based on the premise that the explanatory features of interest are independent of other variables74. The PD plots presented in Figs. 4, 5, 6 and 7 demonstrate how various factors, such as VRU attributes, vehicle types involved in collisions, external conditions, and roadway geometric features, affect the predicted likelihood of severe injury. In the provided plots, the X-axis represents various values of the risk factors, the blue lines show the predicted probability of severe injury at each level of the explanatory variables, while the red lines indicate the average predicted probability of severe injury, which serves as a reference for evaluating the relative impact of each risk factor on the model’s predictions.
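The partial dependence calculation described above can be sketched as follows: the feature of interest is fixed at a given value for every observation, the model is evaluated, and the predictions are averaged. This is a minimal illustration rather than the authors' code. The function `partial_dependence`, the stand-in `toy_model`, its coefficients, and the four observations are all hypothetical.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical predicted probability of severe injury."""
    return 0.2 + 0.3 * row["heavy_vehicle"] + 0.1 * row["rain"]


# Hypothetical observations with binary indicators
rows = [
    {"heavy_vehicle": 0, "rain": 0},
    {"heavy_vehicle": 1, "rain": 0},
    {"heavy_vehicle": 0, "rain": 1},
    {"heavy_vehicle": 1, "rain": 1},
]

# Partial dependence curve, the analogue of the blue line in the PD plots
pd_heavy = partial_dependence(toy_model, rows, "heavy_vehicle", [0, 1])

# Average prediction over the data, the analogue of the red reference line
average_prediction = sum(toy_model(r) for r in rows) / len(rows)
```

In this toy case the curve rises from 0.25 to 0.55 and crosses the average prediction of 0.40, which is how an upward-sloping PD line is read as an increase in the predicted probability of severe injury. Because every other feature is left at its observed value while one feature is overwritten, the method implicitly treats that feature as independent of the rest.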

Fig. 4. Influence of VRU attributes on the predicted probability of severe injury.

Fig. 5. Influence of vehicle types on the predicted probability of severe injury.

Fig. 6. Influence of weather conditions on the predicted probability of severe injury.

Fig. 7. Influence of roadway geometric feature on the predicted probability of severe injury.

Vulnerable road user attributes

The PD plots in Fig. 4 illustrate the relationship between VRU attributes (age and educational level) and the probability of severe injury in crashes involving VRUs. The PD plot for VRUs aged 55 years or older shows an increased risk of severe injuries, as indicated by the sharp positive slope of the blue line. This may be because VRUs in this age group have slower reaction times, more fragile physical health, and a reduced ability to recover from injuries75. The increased likelihood of severe injuries in crashes involving older VRUs is intuitive and has been documented in earlier studies1,29. In contrast, the PD plots for the intermediate and graduation and above levels of education show a negative trend, as evidenced by the downward slopes of the predicted probability lines, suggesting that VRUs in these education categories are less likely to be involved in severe injury crashes. Notably, the decline is steeper for VRUs with a graduation or above level of education than for those with an intermediate level, as evidenced by the sharper downward slope of the blue line in the corresponding PD plot. This decrease in severe injury crashes for educated VRUs, particularly those with a graduation and above level of education, could be associated with a better understanding and awareness of traffic rules and safety practices, such as helmet use, vehicle maintenance, and the avoidance of aggressive driving and of distractions such as mobile phone use. Prior safety studies also reported that higher levels of education are negatively associated with the likelihood of severe injury crashes76,77. In Pakistan, earlier research from the same study setting has drawn similar conclusions, indicating that older VRUs have a higher chance of being involved in severe and fatal injury crashes14,19. In addition, it has been reported that VRU groups with higher literacy rates are less likely to be involved in such crashes16.

Vehicle types involved in collisions

The PD plots in Fig. 5 illustrate the relationship between various vehicle types involved in collisions with VRUs and the predicted probability of severe injury outcome. The plots show a positive relationship between all three vehicle types and severe injury, indicating that as the involvement of these vehicle types increases, so does the predicted probability of severe injury. Collisions between VRUs and other VRUs, as well as with vans, show a moderate increase in the predicted probability of severe injury, as indicated by the gentle upward slopes of the blue lines in their plots. In contrast, the steep upward slope of the blue line in the PD plot for heavy vehicles suggests that collisions involving heavy vehicles substantially increase the likelihood of severe injury outcomes for VRUs. The primary reason is that heavy vehicles possess greater size and mass, longer stopping distances, larger blind spots, and higher momentum, all of which contribute to more severe impacts when VRUs are involved in such collisions. An earlier study has drawn similar conclusions, stating that VRUs sustain 53.4% and 55.1% fewer fatal injuries in collisions with motorcyclists and cars, respectively, compared to heavy vehicles27. In the context of Pakistan, prior studies have indicated that collisions between VRU groups such as motorcycles and motorized rickshaws16,19, as well as motorized rickshaws and pedestrians19, increase the probability of severe injuries. In addition, prior research has identified that the interactions between VRUs and cars result in a heightened risk of severe injury outcomes16. Furthermore, previous studies have concluded that collisions between heavy vehicles and two-wheelers as well as three-wheelers substantially elevate the likelihood of severe and fatal injury crashes14,19,78. These phenomena can be attributed to mixed traffic scenarios, risky driving behaviors, and the ineffective implementation of traffic regulations in the country79,80.

External conditions

The PD plots in Fig. 6 illustrate the influence of external conditions (cloudy weather, rainy weather, and COVID-19 duration) on the likelihood of severe injury in road traffic crashes involving VRUs. The plot for cloudy weather conditions indicates that cloudy weather has the least effect on the likelihood of severe injury, as the predicted probability line remains relatively stable around the average predicted value. This suggests that cloudy weather conditions may not be a strong contributor to severe injury crashes. This can be attributed to the cautious behavior of riders and drivers in cloudy weather conditions, which helps mitigate the risk of severe crashes81. A prior investigation82 has also indicated that VRU groups may perceive cloudy weather as a precursor to impending rain, prompting them to refrain from riding altogether, which ultimately diminishes both the frequency and severity of crashes. In contrast, rainy weather indicates a significant increase in the predicted probability of severe injury, as evidenced by the sharp upward slope of the blue line in the corresponding PD plot. This suggests that rainy conditions heighten the risk of severe injury in road traffic crashes involving VRUs. This increase may be attributed to slippery road surfaces, limited visibility, and longer stopping distances, which impair riders and drivers’ ability to decelerate effectively, thereby increasing the collision risks83. In addition, the wet and slippery road surfaces significantly reduce the friction between tires and the road, aggravating the risk of skidding or losing control, both of which are associated with severe crash outcomes84. This situation is particularly problematic for VRU groups, such as motorcyclists and motorized rickshaw drivers, who are more vulnerable to losing balance or control in such conditions. A prior study also indicated that rainy weather conditions significantly elevate the likelihood of fatal crashes27. 
It has also been noted that severe injury crashes were more prevalent during the COVID-19 period. This observation is consistent with global trends in road traffic safety and can be linked to several factors, such as reduced adherence to traffic regulations and safety protocols, slower emergency response times, curtailment of the public transportation system, and the psychological effects of COVID-19, all of which contributed to a complex risk environment conducive to more severe outcomes in the event of a crash85–87. Prior research has similarly concluded that road traffic crashes during COVID-19 were more likely to result in severe injuries88,89. With regard to weather, earlier studies in Pakistan14,43 have drawn similar conclusions, indicating that VRU groups are less prone to severe injuries in crashes during cloudy weather conditions, as cloudy weather does not impair the ability of VRUs to respond effectively to driving situations. The heightened risk of severe injury crashes during rainy weather conditions, by contrast, has been attributed to inadequate road infrastructure, various types of road distress, and the lack of proper drainage, which lead to more challenging and risky driving conditions78,90.

Roadway geometric features

The PD plot in Fig. 7 demonstrates that VRU crashes occurring at locations with painted medians are associated with severe injury outcomes, as indicated by the upward slope of the blue line in the plot. The increased likelihood of severe crashes at locations with painted medians can be attributed to the fact that these types of medians allow vehicles to maneuver at any location along the road segments, thereby heightening the chances of head-on and sideswipe collisions91. This finding is consistent with an earlier investigation, which indicated that roads with painted medians had the worst safety records92. In addition, it has been reported that road markings can help control and guide traffic flow, but their inadequacy can lead to crash occurrences93. In the context of Pakistan, painted medians are mostly provided on two-lane, two-way roads, which are associated with improper overtaking and speeding, resulting in an increased risk of severe injury crashes94.

Feature interaction analysis

The SHAP interaction strength plot presented in Fig. 8 explores the pairwise relationships among the ten most contributive features identified via SHAP value analysis. Each bar in the figure shows the interaction strength between two features, quantified by their combined contribution to the model’s output. Features with stronger interactions are shown higher on the plot, indicating that their relationship has a more substantial impact on shaping the model’s predictions. In addition, the blue bars indicate positive interactions, demonstrating that the combined effect of these features pushes the model’s output toward the higher end of the prediction scale (severe injury). Conversely, the red bars indicate negative interactions, indicating that the combined effect of these features pushes the model’s output toward the lower end of the prediction scale (non-severe injury).
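A pairwise interaction value of this kind can be computed exactly for a small feature set. The sketch below assumes the Shapley interaction index as it is commonly defined in the TreeSHAP literature, with weight |S|!(M − |S| − 2)!/(2(M − 1)!) applied to the joint effect of features i and j beyond their separate effects. It is not taken from the study: the function `shapley_interaction`, the toy value function, and its numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(features, value, i, j):
    """Shapley interaction index for two distinct features i and j."""
    m = len(features)
    others = [f for f in features if f not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for subset in combinations(others, size):
            S = frozenset(subset)
            # Effect of adding both features minus the effects of adding each alone
            joint = value(S | {i, j}) - value(S | {i}) - value(S | {j}) + value(S)
            total += weight * joint
    return total


def toy_value(J):
    """Hypothetical model output when only the features in J are known."""
    score = 0.0
    if "rain" in J:
        score += 0.10
    if "heavy_vehicle" in J:
        score += 0.20
    if "rain" in J and "heavy_vehicle" in J:
        score += 0.06  # extra risk when both conditions occur together
    return score


features = ["rain", "heavy_vehicle", "cloudy"]
inter_rain_heavy = shapley_interaction(features, toy_value, "rain", "heavy_vehicle")
inter_rain_cloudy = shapley_interaction(features, toy_value, "rain", "cloudy")
```

The joint effect of 0.06 is split between the two symmetric entries of the pair, so the interaction value is 0.03 in each direction, while the pair with no joint effect scores zero. A positive value corresponds to a pair that pushes the output upward together, and a negative joint term would yield a negative interaction.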

Fig. 8. SHAP interaction strength plot for feature pairs.

Figure 8 shows that the most significant interaction occurs between VRUs with an intermediate level of education and those aged 55 years or older. While VRUs with an intermediate level of education are usually associated with a fundamental understanding of road safety practices, their interaction with older VRUs can increase the likelihood of crashes. This increased risk of severe injury crashes can be attributed to the fragile health and slower reaction times of older VRUs, which hinder their capacity to respond promptly and effectively to intricate traffic scenarios, such as those at intersections, on multilane roads, during interactions with heavy vehicles, and while performing complex maneuvers75,95.

The interaction strength plot also indicates that the interaction between painted medians and individuals with graduation and above level of education tends to reduce the severity of VRU-related crashes. While painted medians may not provide the full protective benefits of raised or physical medians, when combined with higher education levels, they still appear to offer safety benefits. This can be attributed to the fact that highly educated individuals are generally more likely to understand and adhere to traffic rules and exhibit a greater awareness of traffic hazards16. Consequently, the careful driving or crossing practices of individuals with higher education levels at locations with painted medians may result in a reduced risk of severe crashes.

The plot further illustrates the danger associated with mixed traffic scenarios, where various types of vehicles, including vans, heavy vehicles, and VRUs, share the same roadway, thereby increasing the likelihood of severe crashes. Vans, being smaller and more agile, contrast sharply with heavy vehicles, which are larger and exhibit significantly slower reaction times. This disparity in speed, agility, and driver perspective complicates the traffic environment, potentially leading to severe crashes, particularly in congested areas. Previous research has concluded that head-on and sideswipe collisions involving trucks and lighter vehicles, such as cars or vans, elevate the likelihood of severe crashes96.

It can also be observed that VRUs were more frequently involved in severe crashes during the COVID-19 lockdown period. This trend can be linked to a change in travel habits, as the halt of public transport forced more people to depend on walking, cycling, and motorcycling, which are modes that naturally involve a greater risk of severe crashes. In addition, the reduction in traffic volume during the lockdown period may have led to a false sense of safety, prompting higher speeds and riskier behaviors among VRUs, thus increasing the severity of crashes87. Furthermore, the psychological stress brought on by the pandemic, which included increased anxiety and distraction, may have further diminished the situational awareness of road users, especially VRUs, thereby leading to their heightened susceptibility on the roads88.

The notable interaction between cloudy and rainy weather conditions underscores the risks that adverse weather poses to road traffic safety. This could be attributed to the fact that such weather conditions diminish visibility, which often results in misjudgments and hence increases the risk of collisions for VRUs83,97. The elevated risk of road traffic crashes and injuries during adverse weather is intuitive, as documented in a previous investigation98. In the context of Pakistan, earlier research has drawn similar conclusions, indicating that VRU groups sustain severe injuries in crashes during rainy weather conditions19. This can be attributed to deteriorated road infrastructure, which frequently suffers from inadequate drainage and surface damage and thus creates more dangerous driving conditions, thereby increasing the likelihood of severe injury crashes.

The remaining feature pairs do not exhibit substantial interaction, as evidenced by the lower magnitude of the SHAP interaction values. This suggests that these feature pairs do not have a significant influence on the model’s output. However, they may still be part of the top ten feature pairs due to the ranking process, even if their impact on the severity of VRU crashes is not immediately obvious.

Policy implications

In Pakistan, road traffic crashes result in severe injury outcomes, leading to the loss of thousands of lives each year. This highlights a serious concern regarding the safety of all road users, underscoring the need for a comprehensive understanding of the subject matter through rigorous research and the provision of evidence-based guidance to policymakers for the development of effective preventive measures. This study recommends the implementation of safety measures related to VRU demographics, vehicle types, external conditions, and road geometric features to mitigate the severity of crashes involving VRUs. In this regard, tailored educational programs delivered in regional languages, particularly for less educated VRUs, that focus on safe driving techniques, can significantly minimize the severity of crashes involving VRUs. Furthermore, incorporating road safety education into the curriculum is essential for promoting responsible driving habits from a young age.

For the safety enhancement of older VRUs, it is crucial to provide and maintain sidewalks and crosswalks in good condition. In addition, implementing signalized crossings that offer longer crossing times is vital to accommodate the slower walking speeds of older individuals. Furthermore, improving visibility at critical locations where VRUs and vehicles interact, such as intersections and pedestrian crossings, can significantly improve safety outcomes for this demographic. It is also recommended that dedicated lanes be established to physically separate VRUs from vehicular traffic, particularly from both light and heavy commercial vehicles, which are prevalent in the country. In addition, strict penalties should be enforced for improper vehicle behavior around VRUs to ensure safer interactions. Policies should also focus on encouraging the integration of technologies that enhance the safety of vehicles in diverse and mixed driving conditions.

The findings also indicate that adverse weather conditions are critical risk factors that elevate the likelihood of crashes involving VRUs. Policies in this regard should focus on ensuring the implementation of effective drainage systems to prevent water accumulation on the road surface, which is associated with an increased risk of crashes. In addition, infrastructure improvements, such as replacing painted medians with proper road dividers, can minimize the incidence of head-on collisions, thereby improving road safety for all road users. Furthermore, the provision of protected U-turns and dedicated turning lanes can further enhance safety by minimizing conflict points between vehicles and VRUs. The authors contend that these policy measures could greatly improve not only the safety of VRUs, but also the overall efficacy of the transportation system in Pakistan.

Conclusions and recommendations

The safety investigation of VRUs is essential due to their increased vulnerability to crash risks. This study aims to predict the severity of injuries sustained by VRUs in Pakistan’s twin cities, Rawalpindi and Islamabad, by analyzing crash data from 2021 to 2023. The study evaluated the effectiveness of six boosting-based machine learning classifiers using various evaluation metrics. The findings indicated that boosting with decision stumps surpassed extreme gradient boosting, light gradient boosting, histogram-based gradient boosting, categorical boosting, and adaptive boosting in terms of recall, F1-score, and accuracy. The partial dependence plots indicated that older VRUs, collisions with vans and heavy vehicles, rainy weather, the COVID-19 lockdown period, and the presence of painted medians were associated with severe injury in VRU-related crashes. Furthermore, the interaction between vans and heavy vehicles, as well as between cloudy and rainy weather, was found to increase the likelihood of severe crashes. The interaction between VRUs and the COVID-19 lockdown period was also found to be positive, indicating an elevated risk of severe outcomes in such incidents.

While the findings of the study provide valuable insights, future research should focus on a more in-depth analysis of VRU-related crashes and the associated risk factors. This should include analyzing the crash dataset over extended timeframes and incorporating a broader range of explanatory variables, such as driving experience, vehicle age, driving hours, curve sharpness, and other relevant factors. The study strongly recommends the use of deep learning and other advanced artificial intelligence techniques to improve the understanding of safety concerns related to VRU crashes in Pakistan. Furthermore, in many low- and middle-income countries, crash datasets frequently face significant issues, such as underreporting, measurement inaccuracies, incomplete records, and inconsistencies due to varying levels of training among reporting personnel. Therefore, future studies should adopt advanced data collection methodologies, including real-time traffic monitoring systems, integration of multiple data sources, and enhanced training programs for reporting personnel, to enhance the reliability of crash data.

Supplementary Information

Below is the link to the electronic supplementary material.

Acknowledgements

The authors express their gratitude to Tangshan Institute, Southwest Jiaotong University (Tangshan Caofeidian Research Funding. Grant No.: R113623H01099) and International Postgraduate Education and Teaching Reform Grant (Project No. GYJG[2023]Y13) for supporting this study.

Author contributions

Muhammad Junaid: Conceptualization, Methodology, Formal Analysis, Writing - original draft, Writing - review & editing. Chaozhe Jiang: Supervision, Resources, Writing - original draft, Writing - review & editing. Saleh Alotaibi: Writing - review & editing. Tong Wang: Methodology, Writing - original draft, Writing - review & editing. Yahya Almarhab: Writing - review & editing.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declarations

Declaration of generative AI and AI-assisted technologies

In the course of preparing this manuscript, the author(s) utilized ChatGPT 3.5 to enhance the language and clarity of the paper. Following this assistance, the author(s) thoroughly reviewed and revised the content to ensure it met their standards. The author(s) assume full responsibility for the final content of the publication.

Ethical approval and informed consent statement

This research did not include the direct involvement of human participants or any experimental procedures. It solely concentrated on the analysis of crash data, which is devoid of any personally identifiable information.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Sun, Z. et al. A hybrid approach of random forest and random parameters logit model of injury severity modeling of vulnerable road users involved crashes. Accid. Anal. Prev.192, 107235 (2023). [DOI] [PubMed] [Google Scholar]
  • 2.World Health Organization (WHO). Global Status Report on Road Safety. https://iris.who.int/bitstream/handle/10665/375016/9789240086517-eng.pdf?sequence=1 (2023).
  • 3.Shinar, D. Safety and mobility of vulnerable road users: pedestrians, bicyclists, and motorcyclists. Accid. Anal. Prev.44 1–2 (2012). [DOI] [PubMed]
  • 4.Status Summary. Road Traffic Fatalities and Safety Risk Factors, São Paulo City, Brazil. https://publichealth.jhu.edu/sites/default/files/2024-02/20231003bigrssaopaulostate07pages.pdf (2023).
  • 5.World Bank. Delivering Road Safety in India: Leadership Priorities and Initiatives to 2030. (2020).
  • 6.Hoque, M. M., Pervaz, S. & Ashek, A. A. N. Overview of the highway crashes in Bangladesh. In Proceedings of the 5th International Conference on Civil Engineering for Sustainable Development (ICCESD 2020), KUET, Khulna, Bangladesh 7–9 (2020).
  • 7.Saha, B., Fatmi, M. R. & Rahman, M. M. Traffic crashes in Dhaka, Bangladesh: analysing crashes involving unconventional modes, pedestrians and public transit. Int. J. Injury Control Saf. Promotion. 28, 347–359 (2021). [DOI] [PubMed] [Google Scholar]
  • 8.Toroyan, T. Global status report on road safety. Inj. Prev.15, 286–286 (2009). [DOI] [PubMed] [Google Scholar]
  • 9.DAWN. Deadly accidents. DAWN.COM. https://www.dawn.com/news/1705346 (2022).
  • 10.THE EXPRESS TRIBUNE. Fatal road accidents. The Express Tribune. https://tribune.com.pk/story/2398660/fatal-road-accidents (2023).
  • 11.DAWN. Government launches road safety strategy. DAWN COM. https://www.dawn.com/news/1445885 (2018).
  • 12.Pervez, A., Lee, J. & Huang, H. Exploring factors affecting the injury severity of freeway tunnel crashes: A random parameters approach with heterogeneity in means and variances. Accid. Anal. Prev.178, 106835 (2022). [DOI] [PubMed] [Google Scholar]
  • 13.LeDuc, T. World health rankings live longer live better. USA: LeDuc Media Recuperado de: 81, http://www.leducmedia.com (2018).
  • 14.Ijaz, M., Lan, L., Usman, S. M., Zahid, M. & Jamal, A. Investigation of factors influencing motorcyclist injury severity using random parameters logit model with heterogeneity in means and variances. Int. J. Crashworthiness. 27, 1412–1422 (2022). [Google Scholar]
  • 15.Pervez, A., Lee, J. & Huang, H. Identifying factors contributing to the motorcycle crash severity in Pakistan. J. Adv. Transp.2021, 1–10 (2021). [Google Scholar]
  • 16.Waseem, M., Ahmed, A. & Saeed, T. U. Factors affecting motorcyclists’ injury severities: an empirical assessment using random parameters logit model with heterogeneity in means and variances. Accid. Anal. Prev.123, 12–19 (2019). [DOI] [PubMed] [Google Scholar]
  • 17.Zahid, M. et al. Factors affecting injury severity in motorcycle crashes: different age groups analysis using catboost and SHAP techniques. Traffic Inj. Prev.25, 472–481 (2024). [DOI] [PubMed] [Google Scholar]
  • 18.Ijaz, M., Zahid, M. & Jamal, A. A comparative study of machine learning classifiers for injury severity prediction of crashes involving three-wheeled motorized rickshaw. Accid. Anal. Prev.154, 106094 (2021). [DOI] [PubMed] [Google Scholar]
  • 19.Pervez, A., Jamal, A. & Khan, S. H. Analyzing injury severity of three-wheeler motorized rickshaws: a correlated random parameters approach with heterogeneity in means. Accid. Anal. Prev.204, 107651 (2024). [DOI] [PubMed] [Google Scholar]
  • 20.Talpur, M. et al. (ed A., G.) Forensic evaluation of the patterns of fatal injuries among pedestrians in road traffic accidents in Hyderabad, Pakistan. JMMC11 62–67 (2020). [Google Scholar]
  • 21.Wang, C. et al. Temporal assessment of injury severities of two types of pedestrian-vehicle crashes using unobserved-heterogeneity models. J. Transp. Saf. Secur.16, 820–869 (2024). [Google Scholar]
  • 22.Santos, K., Firme, B., Dias, J. P. & Amado, C. Analysis of motorcycle accident injury severity and performance comparison of machine learning algorithms. Transp. Res. Record: J. Transp. Res. Board.2678, 736–748 (2024). [Google Scholar]
  • 23.Kashifi, M. T. Investigating two-wheelers risk factors for severe crashes using an interpretable machine learning approach and SHAP analysis. IATSS Res.47, 357–371 (2023). [Google Scholar]
  • 24.Tamakloe, R., Das, S., Aidoo, E. N. & Park, D. Factors affecting motorcycle crash casualty severity at signalized and non-signalized intersections in ghana: insights from a data mining and binary logit regression approach. Accid. Anal. Prev.165, 106517 (2022). [DOI] [PubMed] [Google Scholar]
  • 25.Wankie, C. et al. Prevalence of crashes and associated factors among commercial motorcycle riders in Bamenda, Cameroon. J. Transp. Health. 20, 100993 (2021). [Google Scholar]
  • 26.Xin, C., Wang, Z., Lee, C. & Lin, P. S. Modeling safety effects of horizontal curve design on injury severity of single-motorcycle crashes with mixed-effects logistic model. Transp. Res. Record: J. Transp. Res. Board.2637, 38–46 (2017). [Google Scholar]
  • 27.Gandupalli, S. R., Kokkeragadda, P. & Dangeti, M. R. Analysis and modelling of crash severity of vulnerable road users through discrete methods: a case study approach. Innov. Infrastruct. Solut.8, 298 (2023). [Google Scholar]
  • 28.Law, T. H., Ghanbari, M., Hamid, H., Abdul-Halin, A. & Ng, C. P. Role of sensory and cognitive conspicuity in the prevention of collisions between motorcycles and trucks at T-intersections. Accid. Anal. Prev.96, 64–70 (2016). [DOI] [PubMed] [Google Scholar]
  • 29.Sun, Z. et al. Exploring injury severity of vulnerable road user involved crashes across seasons: A hybrid method integrating random parameter logit model and Bayesian network. Saf. Sci.150, 105682 (2022). [Google Scholar]
  • 30.Gel, Y., Miao, W. & Gastwirth, J. L. The importance of checking the assumptions underlying statistical analysis: graphical methods for assessing normality. Jurimetrics46, 3 (2005). [Google Scholar]
  • 31.Troncoso Skidmore, S. & Thompson, B. Bias and precision of some classical ANOVA effect sizes when assumptions are violated. Behav. Res.45, 536–546 (2013). [DOI] [PubMed] [Google Scholar]
  • 32.Kumar, S. & Toshniwal, D. A data mining framework to analyze road accident data. J. Big Data. 2, 26 (2015). [Google Scholar]
  • 33.Tang, J., Liang, J., Han, C., Li, Z. & Huang, H. Crash injury severity analysis using a two-layer stacking framework. Accid. Anal. Prev.122, 226–238 (2019). [DOI] [PubMed] [Google Scholar]
  • 34.Laoudai, O. Machine learning models vs. statistical models. Infomineo https://infomineo.com/data-analytics/machine-learning-models-vs-statistical-models/ (2024).
  • 35.Kim, B., Khanna, R. & Koyejo, O. O. Examples are not enough, learn to criticize! criticism for interpretability. Adv. Neural Inf. Process. Syst.29, (2016).
  • 36.Plumb, G., Molitor, D. & Talwalkar, A. S. Model agnostic supervised local explanations. Adv. Neural Inf. Process. Syst.31, (2018).
  • 37.Goldstein, A., Kapelner, A., Bleich, J. & Pitkin, E. Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graphical Stat.24, 44–65 (2015). [Google Scholar]
  • 38.Lundberg, S. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874 (2017).
  • 39.Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 10.1145/2939672.2939778 (ACM, 2016).
  • 40.Ahmed, S., Hossain, M. A., Ray, S. K., Bhuiyan, M. M. I. & Sabuj, S. R. A study on road accident prediction and contributing factors using explainable machine learning models: analysis and performance. Transp. Res. Interdisciplinary Perspect.19, 100814 (2023). [Google Scholar]
  • 41.Ji, A. & Levinson, D. Injury severity prediction from two-vehicle crash mechanisms with machine learning and ensemble models. IEEE Open. J. Intell. Transp. Syst.1, 217–226 (2020). [Google Scholar]
  • 42.Ma, Z., Mei, G. & Cuomo, S. An analytic framework using deep learning for prediction of traffic accident injury severity based on contributing factors. Accid. Anal. Prev.160, 106322 (2021). [DOI] [PubMed] [Google Scholar]
  • 43.Mansoor, U., Jamal, A., Su, J., Sze, N. N. & Chen, A. Investigating the risk factors of motorcycle crash injury severity in Pakistan: insights and policy recommendations. Transp. Policy. 139, 21–38 (2023). [Google Scholar]
  • 44.Wei, F., Xu, P., Guo, Y. & Wang, Z. Exploring the injury severity of vulnerable road users to truck crashes by ensemble learning. J. Transp. Saf. Secur.16, 1259–1282 (2024). [Google Scholar]
  • 45.Almahdi, A. et al. Boosting ensemble learning for freeway crash classification under varying traffic conditions: A hyperparameter optimization approach. Sustainability15, 15896 (2023). [Google Scholar]
  • 46.Dong, S., Khattak, A., Ullah, I., Zhou, J. & Hussain, A. Predicting and analyzing road traffic injury severity using boosting-based ensemble learning models with SHAPley additive explanations. Int. J. Environ. Res. Public Health. 19, 2925 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Jamal, A. & Umer, W. Exploring the injury severity risk factors in fatal crashes with neural network. Int. J. Environ. Res. Public Health. 17, 7466 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Singh, R. et al. Highway 4.0: digitalization of highways for vulnerable road safety development with intelligent IoT sensors and machine learning. Saf. Sci.143, 105407 (2021). [Google Scholar]
  • 49.Tao, W. et al. An advanced machine learning approach to predicting pedestrian fatality caused by road crashes: A step toward sustainable pedestrian safety. Sustainability14, 2436 (2022). [Google Scholar]
  • 50.Elalouf, A., Birfir, S. & Rosenbloom, T. Developing machine-learning-based models to diminish the severity of injuries sustained by pedestrians in road traffic incidents. Heliyon9, (2023). [DOI] [PMC free article] [PubMed]
  • 51.Arifeen, S. U., Ali, M. & Macioszek, E. Analysis of vehicle pedestrian crash severity using advanced machine learning techniques. Archives Transp.68, 91–116 (2023). [Google Scholar]
  • 52.Petraki, V., Roussou, S., Ziakopoulos, A. & Yannis, G. Enhancing cyclist safety: predictive analysis of injury severity and advocacy for evidence-based interventions. (2024).
  • 53.Murtaza, A. The Twin Cities of Pakistan: Islamabad and Rawalpindi. lighthouseindopak https://editorlighthousepr.wixsite.com/lighthouseindopak/single-post/2015/11/07/the-twin-cities-of-pakistan-islamabad-and-rawalpindi (2015).
  • 54.Ahmad, T. et al. COVID-19 in Pakistan: A national analysis of five pandemic waves. PLoS One. 18, e0281326 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Islamabad Scene. Pakistan lifts all coronavirus restrictions. https://www.islamabadscene.com/pakistan-lifts-all-coronavirus-restrictions/ (2022).
  • 56.Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev.54, 1937–1967 (2021). [Google Scholar]
  • 57.Giraud-Carrier, C. Combining base-learners into ensembles. In Metalearning 169–188 10.1007/978-3-030-67024-5_9 (Springer International Publishing, 2022).
  • 58.Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 10.1145/2939672.2939785 (ACM, 2016).
  • 59.Saied, M., Guirguis, S. & Madbouly, M. A. Comparative study of using boosting-based machine learning algorithms for IoT network intrusion detection. Int. J. Comput. Intell. Syst.16, 177 (2023). [Google Scholar]
  • 60.Guryanov, A. Histogram-based algorithm for building gradient boosting ensembles of piecewise linear decision trees. In Analysis of Images, Social Networks and Texts (eds Van Der Aalst, W. M. P. et al.) vol. 11832, 39–50 (Springer International Publishing, 2019). [Google Scholar]
  • 61.Ibrahim, A. A., Ridwan, R. L., Muhammed, M. M., Abdulaziz, R. O. & Saheed, G. A. Comparison of the catboost classifier with other machine learning methods. Int. J. Adv. Comput. Sci. Appl.11, (2020).
  • 62.Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst.31, (2018).
  • 63.Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232 (2001).
  • 64.Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci.55, 119–139 (1997). [Google Scholar]
  • 65.Freund, Y., Schapire, R. & Abe, N. A short introduction to boosting. J.-Jpn. Soc. Artif. Intell.14, 1612 (1999). [Google Scholar]
  • 66.Shapley, L. S. Notes on the n-person game—ii: The value of an n-person game. (1951).
  • 67.Arafat, M. Y., Hoque, S., Xu, S. & Farid, D. M. Machine learning for mining imbalanced data. (2019).
  • 68.Nguyen, G. H., Bouzerdoum, A. & Phung, S. L. Learning pattern classification tasks with imbalanced data sets. Pattern Recogn.10, 1322–1328 (2009). [Google Scholar]
  • 69.Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.16, 321–357 (2002). [Google Scholar]
  • 70.Douzas, G., Bacao, F. & Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf. Sci.465, 1–20 (2018). [Google Scholar]
  • 71.Shalev-Shwartz, S. & Ben-David, S. Understanding Machine Learning: from Theory to Algorithms (Cambridge University Press, 2014).
  • 72.Ijaz, M. et al. Temporal instability of factors affecting injury severity in helmet-wearing and non-helmet-wearing motorcycle crashes: a random parameter approach with heterogeneity in means and variances. Int. J. Environ. Res. Public Health. 19, 10526 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Jamal, A. et al. Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study. Int. J. Injury Control Saf. Promotion. 28, 408–427 (2021). [DOI] [PubMed] [Google Scholar]
  • 74.Friedman, J. The elements of statistical learning: data mining, inference, and prediction. (2009).
  • 75.Yasmin, S., Eluru, N., Bhat, C. R. & Tay, R. A latent segmentation based generalized ordered logit model to examine factors influencing driver injury severity. Analytic Methods Accid. Res.1, 23–38 (2014). [Google Scholar]
  • 76.Naghdi, K. et al. The association between the outcomes of trauma, education and some socio-economic indicators. Archives Trauma. Res.12, 84–89 (2023). [Google Scholar]
  • 77.Saeednejad, M. et al. Association of social determinants of health and road traffic deaths: a systematic review. Bull. Emerg. Trauma.8, 211 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Junaid, M., Pervez, A., Lee, J. J., Jiang, C. & Khan, W. A. Investigating contributing factors to severe injuries in crashes involving motorized rickshaws in Pakistan. J. Saf. Res.94, 157–166 (2025). [Google Scholar]
  • 79.Batool, Z. & Carsten, O. Self-reported dimensions of aberrant behaviours among drivers in Pakistan. Transp. Res. Part. F: Traffic Psychol. Behav.47, 176–186 (2017). [Google Scholar]
  • 80.Pervez, A. et al. Risky riding behaviors among motorcyclists and self-reported safety events in Pakistan. Transp. Res. Part. F: Traffic Psychol. Behav.105, 350–367 (2024). [Google Scholar]
  • 81.Jones, S., Gurupackiam, S. & Walsh, J. Factors influencing the severity of crashes caused by motorcyclists: analysis of data from Alabama. J. Transp. Eng.139, 949–956 (2013). [Google Scholar]
  • 82.Cheng, W., Gill, G. S., Sakrani, T., Dasu, M. & Zhou, J. Predicting motorcycle crash injury severity using weather data and alternative Bayesian multivariate crash frequency models. Accid. Anal. Prev.108, 172–180 (2017). [DOI] [PubMed] [Google Scholar]
  • 83.Druta, C., Kassing, A., Gibbons, R. & Alden, V. A. Assessing driver behavior using shrp2 adverse weather data. J. Saf. Res.73, 283–295 (2020). [DOI] [PubMed] [Google Scholar]
  • 84.Ullah, H., Farooq, A. & Shah, A. A. An empirical assessment of factors influencing injury severities of motor vehicle crashes on national highways of Pakistan. J. Adv. Transp.2021, 1–11 (2021). [Google Scholar]
  • 85.Ageta, K. et al. Delay in emergency medical service transportation responsiveness during the COVID-19 pandemic in a minimally affected region. Acta Med. Okayama. 74, 513–520 (2020). [DOI] [PubMed] [Google Scholar]
  • 86.Gong, Y., Lu, P. & Yang, X. T. Impact of COVID-19 on traffic safety from the lockdown to the new normal: A case study of Utah. Accid. Anal. Prev.184, 106995 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Stiles, J., Kar, A., Lee, J. & Miller, H. J. Lower volumes, higher speeds: changes to crash type, timing, and severity on urban roads from COVID-19 Stay-at-Home policies. Transp. Res. Record: J. Transp. Res. Board.2677, 15–27 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Dong, X., Xie, K. & Yang, H. How did COVID-19 impact driving behaviors and crash severity? A multigroup structural equation modeling. Accid. Anal. Prev.172, 106687 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Taheri, A. et al. The influences of strict and post-strict lockdowns due to the Covid-19 pandemic on crash severity on rural roads: A case study of Khorasan Razavi, Iran. Transp. Res. Part. F: Traffic Psychol. Behav.97, 231–245 (2023). [Google Scholar]
  • 90.Pervez, A. & Oad, A. Navigating the road to safer travel in Pakistan: A multi-perspective analysis of road safety challenges and solutions. Sir Syed Univ. Res. J. Eng. Technol.13, 104–110 (2023). [Google Scholar]
  • 91.Michel, M. B., Francois, W. J., Elambo, N. G., Delore, W. T. & Works, P. O. Impact of road geometric design elements on road traffic accidents in the city of Yaounde Cameroon. Methodology10 (2020).
  • 92.Gattis, J. L., Balakumar, R. & Duncan, L. K. Effects of rural highway median treatments and access. Transp. Res. Rec. J. Transp. Res. Board1931, 99–107 (2005).
  • 93.Perumal, V. & V S, V. Analysis of crash severity in rear-end and angle collisions on an urban roundabout in heterogeneous non-lane-based traffic conditions. Transp. Lett. 1–11. 10.1080/19427867.2024.2338325 (2024).
  • 94.Kashani, A. T., Shariat-Mohaymany, A. & Ranjbari, A. Analysis of factors associated with traffic injury severity on rural roads in Iran. J. Injury Violence Res.4, 36 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Oxley, J., Corben, B., Fildes, B., O’Hare, M. & Rothengatter, T. Older vulnerable road users- measures to reduce crash and injury risk. Monash Univ. Accid. Res. Centre Rep.218, 162 (2004). [Google Scholar]
  • 96.Zhu, X. & Srinivasan, S. A comprehensive analysis of factors influencing the injury severity of large-truck crashes. Accid. Anal. Prev.43, 49–57 (2011). [DOI] [PubMed] [Google Scholar]
  • 97.Li, Z. et al. Investigation of driver injury severities in rural single-vehicle crashes under rain conditions using mixed logit and latent class models. Accid. Anal. Prev.124, 219–229 (2019). [DOI] [PubMed] [Google Scholar]
  • 98.Sivasankaran, S. K., Rangam, H. & Balasubramanian, V. Investigation of factors contributing to injury severity in single vehicle motorcycle crashes in India. Int. J. Injury Control Saf. Promot.28, 243–254 (2021). [DOI] [PubMed] [Google Scholar]

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group