MethodsX. 2025 Sep 8;15:103612. doi: 10.1016/j.mex.2025.103612

A combined modelling approach to predicting injury severity in rear-end collisions

Shufeng Wang, Shixuan Jiang, Zhengli Wang, Lingyi Meng
PMCID: PMC12495341  PMID: 41048319

Abstract

Rear-end collisions constitute the most prevalent category of urban road traffic accidents, resulting in severe traffic congestion, casualties, and substantial economic losses. To mitigate the impact of such accidents effectively, this study proposes a severity prediction model that integrates Convolutional Neural Networks (CNN) and Extreme Gradient Boosting (XGBoost). The model employs the U.S. Department of Transportation's Fatality Analysis Reporting System (FARS) accident dataset, which undergoes preliminary preprocessing. Subsequently, Principal Component Analysis (PCA) is applied to reduce the dimensionality of the influencing factors prior to their input into the combined model for classification. CNN is utilized to extract features, while XGBoost is responsible for classification. Experimental results demonstrate that the combined model achieves a classification accuracy of 96.2 %, with superior AUC and F1 scores compared to traditional models, indicating excellent predictive performance.

  • This paper proposes a hybrid CNN-XGBoost algorithm that combines the superior feature extraction capability of CNN with the powerful structured data processing and precise prediction ability of XGBoost, resulting in a significant performance improvement over traditional algorithms.

Keywords: Rear-end collision, Crash analysis, Severity prediction model, CNN, XGBoost, Combined model

Graphical abstract



Specifications table

Subject area Computer Science
More specific subject area Machine Learning
Name of your method CNN-XGBoost combined model
Name and reference of original method N/A
Resource availability Datasets available: https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars.

Background

With the rapid development of transportation systems, road traffic accidents have become a critical issue globally. Rear-end collisions, in particular, represent one of the most frequent and severe types of accidents, accounting for approximately 30 % of all incidents, with 89 % attributed to driver errors [[1], [2], [3]]. Existing studies typically employ three types of models to predict accident severity: statistical models, machine learning models, and hybrid models [[4], [5], [6], [7], [8], [9]].

Statistical models such as Logit, Probit, and Logistic Regression (LR) offer high interpretability and have been widely used in severity prediction [10,11]. However, they rely heavily on assumptions of linearity and independence, which limits their ability to capture complex interactions among variables, often leading to poor generalization and biased estimations when assumptions are violated [12]. In contrast, machine learning models—including Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost)—demonstrate superior nonlinear modeling capabilities and have shown significantly better predictive performance [[13], [14], [15]]. Nonetheless, these models often operate as “black boxes,” making it difficult to interpret prediction outcomes and identify key contributing factors [16].

Deep learning methods such as Convolutional Neural Networks (CNN) further enhance modeling capabilities, particularly in processing high-dimensional and complex data. Studies have shown that CNNs outperform traditional models when additional contextual data such as time and weather are included [3,17]. However, deep learning models typically require large datasets and suffer from limited interpretability, raising concerns in practical applications.

Hybrid models attempt to combine the strengths of statistical and machine learning approaches to improve accuracy and interpretability [18,19]. For example, a mixed Logit model was integrated with association rule learning to predict pedestrian crash severity, achieving improved explanatory power [20]. Similarly, hybrid models combining neural networks with traditional methods were developed to enhance prediction performance [21,22].

Given these insights, this study proposes a novel hybrid approach that integrates CNN with XGBoost. CNN is highly effective at learning spatial and hierarchical patterns from complex input features, while XGBoost excels in classification tasks, handling missing data, and reducing overfitting. This combination pairs CNN's feature extraction strengths with XGBoost's robust predictive capabilities. The proposed model is benchmarked against LR, BN, DT, and standalone XGBoost through controlled experiments to support traffic safety decision-making. The overall framework of the paper is shown in Fig. 1.

Fig. 1. Total framework of the paper.

Method details

This study introduces an innovative prediction methodology that amalgamates CNN with XGBoost, thereby capitalizing on their distinctive and complementary attributes. CNN demonstrates remarkable proficiency in processing high-dimensional and intricate datasets, adeptly discerning the interrelationships among diverse factors implicated in rear-end collisions. XGBoost, on the other hand, is renowned for its robustness and superior classification capabilities, effectively managing missing values and accommodating a wide array of feature types while mitigating the risk of overfitting and ensuring precise predictions. Through meticulously designed controlled experiments, the proposed hybrid model is benchmarked against established algorithms such as XGBoost, Logistic Regression (LR), Bayesian Networks (BN), and Decision Trees (DT).

Data acquisition

The Fatality Analysis Reporting System (FARS) is a comprehensive database maintained by the U.S. National Highway Traffic Safety Administration, which meticulously records detailed information on U.S. traffic accidents, including accident time, location, and casualty information. For the purpose of this study, a total of 39,731 accident records related to rear-end collisions on urban roads were extracted from the FARS database.

Variable processing

In accordance with the principles of relevance, reliability, and operability, records of accidents caused by drivers' own factors (such as drunk driving and drugged driving) and information unrelated to the severity of accidental injuries (such as license plate number and vehicle manufacturer) were excluded from the dataset. Ultimately, eighteen influencing factors were selected for analysis, with the meanings of each variable detailed in Table 1.
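As a rough illustration of this screening step, the sketch below filters a merged FARS extract with pandas; the file name, the impairment flags used for exclusion, and the severity column name are hypothetical placeholders rather than the exact fields used in this study.

```python
# Hypothetical sketch of the record-screening step described above.
import pandas as pd

df = pd.read_csv("fars_rear_end_urban.csv")  # placeholder file name for the merged FARS extract

# Exclude records attributed to the driver's own factors (e.g., drunk or drugged driving);
# the flag columns below are illustrative stand-ins for the corresponding FARS fields.
df = df[(df["DRINKING"] == 0) & (df["DRUGS"] == 0)]

# Keep the 18 candidate predictors of Table 1 plus the injury-severity label.
features = ["AGE", "SPEED", "FIRE", "AIR_BAG", "REST_USE", "WKDY", "VALIGN",
            "SPEEDREL", "VPROFILE", "BDYTYP", "PCRASH", "VTRAFWAY", "VTRAFCON",
            "GVWR", "WEATHR", "LGTCON", "VSURCOND", "HOUR"]
df = df[features + ["SEVERITY"]].dropna()    # "SEVERITY" is a placeholder label column
print(df.shape)
```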

Table 1.

Description of independent variables.

Name Description Type
AGE Age of the injured Continuous
SPEED Relative vehicle speed at the time of the collision Continuous
FIRE Crash fire (0 no fire, 1 fire) Discrete
AIR_BAG Airbags (0 not used, 1 used) Discrete
REST_USE Seat belts (0 used, 1 not used) Discrete
WKDY Date (0 working day, 1 rest day) Discrete
VALIGN Lane curvature (0 straight, 1 curved) Continuous
SPEEDREL Whether speeding (0 not speeding, 1 speeding) Discrete
VPROFILE Roadway gradient (0 horizontal, 1 sloped, 2 hilly) Continuous
BDYTYP Vehicle type (0 regular sedan, 1 mid-size sedan, 2 SUV/truck) Discrete
PCRASH Obstacle avoidance maneuvers in case of collision (0 deceleration, 1 steering, 2 deceleration + steering) Discrete
VTRAFWAY Lanes (0 single lane, 1 two-lane without separation, 2 two-lane and above) Discrete
VTRAFCON Control equipment (0 no signals, 1 with traffic signals, 2 regulatory warning signs) Discrete
GVWR Vehicle weight (0 for <3001 lbs, 1 for 3001–4501 lbs, 2 for >4501 lbs) Continuous
WEATHR Weather (0 sunny, 1 foggy, 2 rainy, 3 snowy) Discrete
LGTCON Light (0 daytime, 1 dawn or dusk, 2 nighttime with lighting, 3 nighttime without lighting) Continuous
VSURCOND Surface conditions (0 dry, 1 wet, 2 waterlogged, 3 snow-covered, 4 frozen) Discrete
HOUR Time (0 is 6:00–9:59 AM, 1 is 10:00 AM–2:59 PM, 2 is 3:00–5:59 PM, 3 is 6:00–8:59 PM, 4 is 9:00 PM–5:59 AM) Continuous

The qualitative classification of the severity of road traffic accidents does not adhere to a uniform international standard. Different countries may establish specific divisions based on their unique characteristics. However, the primary considerations generally include the extent of property damage and the severity of casualties. Consequently, in this study, the dependent variable, accident injury severity, is categorized into four levels, as detailed in Table 2.

Table 2.

Description of dependent variables.

Classification of severity Value Explanation of casualties
undamaged 0 No injuries, property damage only
minor injuries 1 Minor injuries other than fatal and disabling injuries
seriously hurt 2 Non-fatal injuries with varying degrees of disability
dead 3 Death on the spot or within 30 days

Normalization

Normalization is an important data preprocessing step aimed at eliminating scale differences between features. It applies a linear transformation to the original data, mapping the results to the range [0,1]. This makes the influence of each feature on the model more balanced, preventing features with larger value ranges from dominating. Normalization also improves the stability and accuracy of the model and accelerates its convergence. The normalization formula is as follows:

$x^{*} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (1)

where $x^{*}$ is the normalized value, $x$ is the original value, $x_{\min}$ is the minimum value in the sample data, and $x_{\max}$ is the maximum value in the sample data.
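As a minimal sketch of Eq. (1), the helper below applies column-wise min-max normalization with NumPy; it is equivalent in spirit to scikit-learn's MinMaxScaler.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature (column) of X to the [0, 1] range, as in Eq. (1)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Toy example: two features with very different scales.
X = np.array([[20.0, 1500.0],
              [35.0, 4200.0],
              [60.0, 3000.0]])
print(min_max_normalize(X))
```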

PCA dimensionality reduction

PCA is a widely used dimensionality reduction algorithm. In data analysis, too many influencing features can lead to an overly complex model and the curse of dimensionality, which degrades the validity of the prediction results. For the 18 variables studied, samples of each variable were screened to form a total matrix X, and PCA is performed to obtain the transformed matrix Y of X. First, the sample covariance matrix is required; for each feature, its mean is calculated:

$u = \dfrac{1}{n}\sum_{i=1}^{n} x_{ij}$ (2)

The covariance matrix $S$ can then be obtained from the feature means:

$S = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_{i}-u)(x_{i}-u)^{T}$ (3)

The eigenvalues and eigenvectors of the covariance matrix S can be found by using Eq. (4):

$S v_{i} = \lambda_{i} v_{i}, \quad i = 1, 2, \ldots, n$ (4)

where $\lambda_{i}$ is an eigenvalue of the covariance matrix and denotes the variance captured by the $i$-th component, and $v_{i}$ is the corresponding eigenvector, representing the direction of that component.

For each component, the size of its variance as a proportion of the total variance indicates how much information it carries, and from this, the contribution of each component can be derived.

$P_{i} = \dfrac{\lambda_{i}}{\sum_{i=1}^{m}\lambda_{i}}$ (5)

Usually, a cumulative contribution rate of at least 85 % is required for the principal components to adequately reflect the information of the original influencing factors. Principal component analysis of the influencing factors of injury severity yields the eigenvalue, contribution rate, and cumulative contribution rate of each principal component, as shown in Table 3. The cumulative contribution rate of the first 14 principal components is 86.905 %. Selecting the first 14 principal components to characterize the accident data therefore ensures the mutual independence of the variables in the accident data samples and achieves dimensionality reduction; the corresponding scatter plot is shown in Fig. 2.

Table 3.

Interpretation of total variance of principal components.

Component | Eigenvalues: Total, % of variance, Cumulative % | Extraction sums of squared loadings: Total, % of variance, Cumulative %
x1 1.998 11.097 11.097 1.998 11.097 11.097
x2 1.416 7.865 18.962 1.416 7.865 18.962
x3 1.304 7.245 26.207 1.304 7.245 26.207
x4 1.195 6.640 32.847 1.195 6.640 32.847
x5 1.086 6.036 38.883 1.086 6.036 38.883
x6 1.065 5.918 44.801 1.065 5.918 44.801
x7 1.028 5.711 50.511 1.028 5.711 50.511
x8 0.997 5.537 56.049 0.997 5.537 56.049
x9 0.974 5.411 61.460 0.974 5.411 61.460
x10 0.959 5.330 66.790 0.959 5.330 66.790
x11 0.938 5.213 72.003 0.938 5.213 72.003
x12 0.920 5.109 77.112 0.920 5.109 77.112
x13 0.901 5.006 82.118 0.901 5.006 82.118
x14 0.862 4.787 86.905 0.862 4.787 86.905
x15 0.777 4.317 91.222
x16 0.725 4.029 95.251
x17 0.662 3.680 98.930
x18 0.193 1.070 100.000

Fig. 2. Scatter plot.

Since the principal components cannot visually represent the influencing factors of accident severity, it is necessary to convert the dimensionality reduction results into representative original features to perform subjective analysis and validation of the influencing factors. Based on the eigenvalues and cumulative contributions, the principal component composite scores were calculated, and the explanatory variables SPEED, PCRASH, GVWR, BDYTYP, FIRE, REST_USE, AIR_BAG, HOUR, WKDY, LGTCON, WEATHR, VTRAFWAY, VALIGN, and AGE were determined to be representative of the accident data samples. These 14 variables were utilized for further prediction studies of injury severity. The definitions of the above variables are shown in Table 1.
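The following sketch reproduces the PCA step with scikit-learn, under the assumption that the normalized 18-feature matrix is available as a NumPy array; the random data merely stands in for the real accident matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

X_norm = np.random.rand(1000, 18)   # placeholder for the normalized 18-feature accident matrix

pca = PCA(n_components=18).fit(X_norm)
cumulative = np.cumsum(pca.explained_variance_ratio_)   # cumulative contribution rates, cf. Eq. (5)

# Keep the smallest number of components whose cumulative contribution reaches 85 %
# (14 components in Table 3 for the real data).
n_keep = int(np.searchsorted(cumulative, 0.85) + 1)
X_reduced = PCA(n_components=n_keep).fit_transform(X_norm)
print(n_keep, X_reduced.shape)
```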

CNN

CNN has unique advantages in data processing. One of its main features is local awareness. Through the operation of the convolutional layer, CNN can effectively focus on and extract features from local regions in the data. This approach helps to recognize patterns and associations in the data. Additionally, the parameter-sharing property of CNN enables the network to use the structural information of the data more effectively in the learning process, reducing the number of parameters to be trained and improving the efficiency and generalization ability of the model.

The CNN structure is depicted in Fig. 3. The Input layer receives the input, while the Convolution layer (Conv) extracts features from the data. Batch Normalization (BN) is used to enhance the speed and stability of neural network training. The Pooling layer reduces the dimensionality of the data while retaining key information. The Activation function increases the expressive ability of the model. Lastly, the Fully Connected layer maps the extracted features to the final output. These layers work together to construct a deep learning model that efficiently processes and learns from complex data.

Fig. 3. CNN network structure.

The CNN reduces high-dimensional data through its convolution and pooling layers. The height and width of the data after passing through a convolution or pooling layer change as given in Eq. (6):

$N = \dfrac{W - F + 2P}{S} + 1$ (6)

where N is the output size, W is the original input size, P is the amount of zero padding, S is the stride of the convolution kernel, and F is the size of the convolution kernel.
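Eq. (6) can be checked with a one-line helper; as a hypothetical example, a 14-wide input convolved with a 3-wide kernel, no padding, and stride 1 yields an output of size 12.

```python
def conv_output_size(W, F, P=0, S=1):
    """Eq. (6): W = input size, F = kernel size, P = zero padding, S = stride."""
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=14, F=3, P=0, S=1))   # -> 12
```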

Feature extraction

A CNN contains a convolutional layer, pooling layer, fully connected layer, and output layer; neurons in adjacent layers are connected, while there are no connections within the same layer. The whole can be divided into three parts: input, feature extraction, and output. The feature extraction stage mainly comprises two parts, convolution and pooling, and the entire network can automatically extract features from the input matrix without expert experience.

Convolutional operations leverage the local connectivity property to extract local features of the input matrix. During iterative learning, the shared weights in the convolution kernel are adjusted by gradient descent to maximize the extraction of data features. The convolution is calculated as shown in Eq. (7).

$X_{ij} = f\left(\sum_{q=1}^{r}\sum_{p=1}^{r} D_{(i+p)(j+q)} C_{pq} + b_{c}\right)$ (7)

where $X_{ij}$ is the output feature at position $(i,j)$, $D_{(i+p)(j+q)}$ denotes the input element, $C_{pq}$ is the convolution kernel weight, $b_{c}$ is the bias term, and $f(\cdot)$ is the activation function applied to the weighted sum.

In this paper, the ReLU activation function ($f(x)=\max(0,x)$), which converges quickly, is selected. The input matrix $D$ has dimension $n \times m$, and a sliding computation with step size 1 is performed using an $r \times r$ convolution kernel, together with the bias variable $b$, to obtain a feature map of dimension $(n-r+1) \times (m-r+1)$.

After the features are extracted from the convolutional layer, the pooling layer is employed for feature aggregation to decrease the computational load and reduce the size of the convolutional features. Pooling is calculated as shown in Eq. (8).

$\hat{X}_{ij} = \alpha \times \mathrm{Down}(X_{ij}) + b_{p}$ (8)

where $\mathrm{Down}(\cdot)$ is the pooling operation, $\alpha$ is the pooling weight, and $b_{p}$ is the pooling-layer bias. The commonly used pooling operations are mean pooling and max pooling: at a pooling scale of $l \times l$, a window of size $l \times l$ is selected in the convolutional feature matrix, and mean pooling replaces the window with its mean, while max pooling replaces it with its maximum.
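A plain-NumPy sketch of the two feature-extraction operations is given below: a valid convolution with ReLU as in Eq. (7), followed by non-overlapping max pooling as the Down(·) operation of Eq. (8) (with α = 1 and b_p = 0). It is illustrative only, not the authors' exact implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv2d_valid(D, C, b_c=0.0):
    """Valid convolution of input D (n x m) with kernel C (r x r) plus bias, Eq. (7)."""
    n, m = D.shape
    r = C.shape[0]
    out = np.zeros((n - r + 1, m - r + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = relu(np.sum(D[i:i + r, j:j + r] * C) + b_c)
    return out

def max_pool(X, l=2):
    """Non-overlapping l x l max pooling, the Down(.) operation of Eq. (8)."""
    h, w = X.shape[0] // l, X.shape[1] // l
    return X[:h * l, :w * l].reshape(h, l, w, l).max(axis=(1, 3))

D = np.random.rand(6, 6)    # toy input matrix
C = np.random.rand(3, 3)    # 3 x 3 kernel, as in group 1 of Table 4
features = max_pool(conv2d_valid(D, C, b_c=0.1), l=2)
print(features.shape)       # (2, 2)
```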

To enhance the effectiveness of the features extracted by CNN, this paper conducts feature extraction using different convolutional kernels and hyperparameters. It selects the appropriate structure and optimal hyperparameters based on the model's fitting ability. Table 4 displays the Mean Absolute Percentage Error (MAPE) of the training set under different convolutional kernels.

Table 4.

MAPE with different convolutional kernels.

Group C1 S1 C2 S2 MAPE (%)
1 3×3 2×2 3×3 2×2 11.3
2 3×3 2×2 5×5 2×2 14.7
3 5×5 2×2 3×3 2×2 15.1
4 5×5 2×2 5×5 2×2 19.6

As evident from Table 4, the MAPE value is the lowest under combination 1. Therefore, in this paper, group 1 is chosen for convolutional network feature extraction, and the corresponding optimal hyperparameter settings are displayed in Table 5.

Table 5.

CNN optimal hyperparameters.

Name Optimizer Dropout Rate Learning Rate Batch Size Dense_units Padding
Value Adam 0.3 0.01 64 256 Valid

XGBoost

XGBoost is a parallel regression tree model that incorporates the boosting concept and improves on the gradient boosted decision tree (GBDT) [23]. Compared with GBDT, XGBoost overcomes limitations in computational speed and accuracy, and its regularization term prevents overfitting. Unlike traditional GBDT, which uses a first-order Taylor expansion of the loss function, XGBoost uses a second-order Taylor expansion to ensure model accuracy. Additionally, each feature is chunked (blocked) and sorted, enabling parallel computation when searching for the optimal splitting point, which greatly accelerates computation. Fig. 4 illustrates the principle of the XGBoost algorithm.

Fig. 4. Example of the principle of the XGBoost algorithm.

The prediction expression is:

$\hat{y}_{i}^{(t)} = \sum_{k=1}^{t} f_{k}(x_{i}) = \hat{y}_{i}^{(t-1)} + f_{t}(x_{i})$ (9)

where $\hat{y}_{i}^{(t)}$ is the prediction for sample $i$ after the $t$-th iteration, $\hat{y}_{i}^{(t-1)}$ is the prediction after the first $t-1$ trees, and $f_{t}(x_{i})$ is the prediction of the $t$-th tree.

The objective function of the model consists of a loss function and a regularization term.

$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(t)}\right) + \sum_{j=1}^{t} \Omega(f_{j})$ (10)

where $\mathrm{Obj}^{(t)}$ is the objective function of the model, $\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(t)}\right)$ is the corresponding loss function, and $\sum_{j=1}^{t}\Omega(f_{j})$ is the regularization term.

To fully exploit the performance of XGBoost, its key parameters need to be tuned appropriately. These parameters affect not only the model's fitting ability but also the accuracy of the prediction results. Therefore, this paper uses grid search combined with 5-fold cross-validation to tune the key parameters of the XGBoost model and enhance its performance and generalization ability. The parameters optimized in this paper are n_estimators, learning_rate, max_depth, colsample_bytree, gamma, and subsample. The optimization flow is depicted in Fig. 5.

Fig. 5. XGBoost hyperparameter optimization flow.
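A hedged sketch of this tuning step is shown below, using scikit-learn's GridSearchCV with 5-fold cross-validation around an XGBClassifier; the value ranges in the grid are illustrative assumptions, not the authors' exact search space, and the random arrays stand in for the training data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X_train = np.random.rand(500, 14)             # placeholder: 14 retained features
y_train = np.random.randint(0, 4, size=500)   # four severity classes

param_grid = {                                 # illustrative ranges only
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "colsample_bytree": [0.4, 0.7],
    "gamma": [0, 0.06],
    "subsample": [0.7, 1.0],
}

search = GridSearchCV(
    estimator=XGBClassifier(objective="multi:softprob"),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,                                      # 5-fold cross-validation
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```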

CNN-XGBoost combination

By combining CNN and XGBoost in a cascade, the complementary strengths of both methods are utilized. This approach leverages CNN's deep learning for automatic feature extraction and XGBoost's ability to combine weak classifiers and rank feature importance. This not only improves prediction performance but also enhances the interpretability of each factor's impact on the results.

The CNN and XGBoost combination model leverages the strengths of both deep learning and GBT algorithms. In this combination, the CNN serves as a feature extractor. A data matrix is constructed and fed into the CNN to extract high-level features. The CNN architecture is then optimized, and the best features are input into the XGBoost model for further prediction and optimization. This approach takes advantage of the CNN's feature extraction capabilities and XGBoost's strength in handling structured data and making accurate predictions. The implementation flow of the combined model is shown in Fig. 6 and Table 6.

Fig. 6. Flow of combinatorial model realization.

Table 6.

CNN-XGBoost Algorithm Steps.

CNN-XGBoost Algorithm Steps
Input: X_train (N_train, 14), y_train (N_train, 1), X_test (N_test, 14), y_test (N_test, 1), Conv_Kernels = [2, 3, 5], XGBoost_Params (search space), K (number of folds for cross-validation), step (grid search step size).
Output: Prediction results and performance evaluation metrics.
Process:
1. Initialize the optimal CNN model as null and set the minimum MAPE to positive infinity.
2. For each kernel size in Conv_Kernels, perform the following steps:
 3. Construct a CNN model:
  - Design the CNN architecture using the current kernel size.
  - Configure convolutional layers, pooling layers, and fully connected layers.
  - Compile the CNN model.
 4. Perform K-fold cross-validation:
  - Split the training data into K folds for training and validation.
  - For each fold, train the CNN model and compute MAPE.
  - Calculate the average MAPE across all folds.
 5. if avg_MAPE < min_MAPE:
  - Update the optimal kernel size and the minimum MAPE.
6. Construct the optimal CNN model using the selected kernel size, and train it on the entire training set.
7. Apply grid search to optimize the XGBoost hyperparameters defined in XGBoost_Params:
 8. For each combination of hyperparameters:
  - Build an XGBoost model with the current parameter set.
  - Evaluate the model and record the performance metrics.
 9. Identify the best combination of XGBoost hyperparameters.
10. Feed the optimal features extracted from the CNN into the XGBoost model.
11. Evaluate the performance of the hybrid model on the test set:
  - Predict the severity of traffic accidents.
  - Assess and record performance metrics such as accuracy, AUC, and F1-score.
12. Output the optimal performance metrics and final prediction results.
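The cascade of Table 6 can be sketched roughly as follows: a small Keras CNN is trained on the 14 retained components, its penultimate fully connected layer is reused as a feature extractor, and the extracted features are passed to XGBoost. For simplicity the sketch uses a 1-D convolution over the feature vector and illustrative layer sizes and training settings, so it should be read as an assumption-laden outline rather than the exact architecture of the paper.

```python
import numpy as np
import tensorflow as tf
from xgboost import XGBClassifier

X_train = np.random.rand(800, 14, 1)           # placeholder: (samples, 14 features, 1 channel)
y_train = np.random.randint(0, 4, size=800)    # four severity classes

# Stage 1: CNN trained end-to-end, then truncated at the dense "features" layer.
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(14, 1)),
    tf.keras.layers.Conv1D(32, kernel_size=3, padding="valid"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu", name="features"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
cnn.fit(X_train, y_train, epochs=5, batch_size=64, verbose=0)

extractor = tf.keras.Model(cnn.input, cnn.get_layer("features").output)
train_feats = extractor.predict(X_train, verbose=0)

# Stage 2: XGBoost classifies the CNN features (hyperparameters from Table 7 shown as an example).
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                    colsample_bytree=0.4, gamma=0.06, subsample=0.7, reg_alpha=0.2)
xgb.fit(train_feats, y_train)
print(xgb.score(train_feats, y_train))
```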

Performance measure

  • (1)
    In classification problems, accuracy is one of the most common evaluation criteria, with higher accuracy indicating better model predictions, as shown in Eq. (11).
    $\text{accuracy} = \dfrac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$ (11)
  • (2)

    The Area Under the Curve (AUC) is another metric used to evaluate the performance of classification models. An AUC above 0.5 indicates better-than-random discrimination, and the closer the value is to 1, the better the model performance.

  • (3)
    The F1 score is an evaluation index that integrates the Precision and Recall of the model. Its value ranges over [0,1]; the higher the value, the better the model performs in classifying positive and negative categories. A short computation sketch of these metrics follows this list.
    $F_{1} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (12)
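As a brief computation sketch, the three metrics can be obtained with scikit-learn as follows; the label and probability arrays are placeholders for the model's test-set outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([0, 3, 1, 2, 0, 3, 2, 1])                  # placeholder true severity classes
y_score = np.random.dirichlet(np.ones(4), size=len(y_true))  # placeholder predicted probabilities
y_pred = y_score.argmax(axis=1)

print("accuracy:", accuracy_score(y_true, y_pred))                       # Eq. (11)
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))          # Eq. (12), averaged over classes
print("AUC (macro, one-vs-rest):", roc_auc_score(y_true, y_score, multi_class="ovr"))
```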

Method validation

Model results and interpretation

The grid search method is a technique for adjusting the hyperparameters of a machine learning model. By combining a hyperparameter grid search with 5-fold cross-validation, the important parameters can be adjusted systematically. This allows various parameter combinations to be explored to identify values that optimize the model's performance, thereby enhancing its generalization ability. Consequently, the model can more accurately capture data features, improve prediction performance, and effectively prevent overfitting. Fig. 7 illustrates the training-set accuracy corresponding to different hyperparameter combinations under the grid search method. The horizontal axis represents the indices of the hyperparameter combinations, while the vertical axis indicates the corresponding training accuracy. Each blue dot represents the training result of a single hyperparameter setting, and the connecting line illustrates how accuracy fluctuates as the hyperparameters change. As observed in the figure, training accuracy differs markedly across hyperparameter combinations, with values ranging from approximately 0.87 to 0.95, indicating that the hyperparameters have a considerable impact on model performance. Some combinations achieve relatively high accuracy (close to 0.95), suggesting that these parameter settings are better suited to the current dataset and enable the model to fit the training data more effectively.

Fig. 7. Accuracy with different hyperparameters.

Table 7 displays the values of the optimal hyperparameter combination identified through the grid search method after expanding the range of the hyperparameter grid.

Table 7.

XGBoost optimal hyperparameter combinations.

Name n_estimators learning_rate max_depth colsample_bytree gamma subsample reg_alpha
Value 200 0.1 3 0.4 0.06 0.7 0.2

The optimal model was obtained after completing parameter tuning. At this stage, the model was trained on the training set, and its overall generalization ability and performance were evaluated on the validation set. The model's predictive ability for different injury categories was assessed using the area under the ROC curve (AUC), as depicted in Fig. 8, Fig. 9.

Fig. 8. Accuracy and loss variation.

Fig. 9. ROC curve.

As depicted in Fig. 8, the combined model exhibits a promising prediction performance on the validation set following training with the training set. The average accuracy of the validation set reaches 0.962, indicating excellent generalization ability and rapid convergence. Additionally, the AUC values for the four classifications are 0.99, 0.91, 0.90, and 0.97, respectively. This suggests that the model performs exceptionally well in predicting category 0 (undamaged) and category 3 (dead) incidents, while experiencing a slight decrease in performance for category 1 (minor injuries) and category 2 (serious injuries) predictions due to variations in sample frequency. Upon comprehensive examination of Fig. 8, Fig. 9, the combined model demonstrates an average accuracy of 0.962 and an average AUC value of 0.94, indicating high prediction performance and stability. Thus, the model effectively addresses the rear-end collision prediction task in this study.

Model comparison

To thoroughly evaluate the effectiveness of the combined model, this study conducts a comparative analysis against several widely used algorithms, including XGBoost, RF, and MLP. The performance of each model is assessed across multiple evaluation metrics, such as ROC curves, accuracy, F1-score, and loss variation, to comprehensively measure their effectiveness in rear-end collision risk prediction tasks. Additionally, the AUC is calculated to evaluate the classification performance and generalization ability of each model. Fig. 10 illustrates the comparison results of model accuracy.

Fig. 10. Comparison of the accuracy rates of various models.

As shown in Fig. 10, in the accuracy comparison among the combined model, XGBoost, RF, MLP, DT, and LSTM, the combined model demonstrates a significantly higher initial accuracy than the other models. Moreover, as the data volume gradually increases, its accuracy improves at a notably faster rate, indicating superior training efficiency and speed of performance improvement. In contrast, the accuracy growth of the other models is relatively moderate under the same data conditions, especially for the Multi-Layer Perceptron and Long Short-Term Memory models, which exhibit lower initial accuracy and slower improvement rates.

The variation in loss reflects how well the models fit the validation dataset during training. Lower loss values typically indicate that the model effectively captures patterns in the data, leading to more accurate predictions. The loss variations of each model are illustrated in Fig. 11.

Fig. 11. Comparison of the losses of various models.

As illustrated in Fig. 11, the convergence speed of the loss function during training varies significantly across the models. The combined model shows a rapid decrease in loss during the early stages and converges to a lower final value, indicating better training efficiency and stronger fitting capability. In contrast, although RF and DT exhibit relatively stable convergence in the later stages, their loss reduction is slower, and the final values remain comparatively high.

Furthermore, the combined model demonstrates a relatively smooth loss decline throughout training. This is due to the CNN's ability to effectively extract key features during the feature extraction phase, enabling XGBoost in the classification stage to converge quickly with fewer training iterations, thus reducing training time and avoiding overfitting. Compared to using XGBoost, MLP, or DT alone, this integrated approach significantly enhances model efficiency in handling complex data, resulting in lower loss values and improved prediction accuracy.

The micro-averaged ROC curve aggregates TP, FP, TN, and FN across all classes in multi-class classification tasks, providing a comprehensive assessment of overall performance across all categories rather than focusing on a single class. The micro-averaged ROC curves of each model are shown in Fig. 12.

Fig. 12. Comparison diagram of the ROC curves of various models.
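The micro-averaged curve described above can be computed, assuming per-class probabilities are available, by binarizing the labels and pooling all class/score pairs before calling roc_curve, as sketched below with placeholder arrays.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

y_true = np.random.randint(0, 4, size=200)              # placeholder test labels
y_score = np.random.dirichlet(np.ones(4), size=200)     # placeholder predicted probabilities

y_bin = label_binarize(y_true, classes=[0, 1, 2, 3])    # one indicator column per severity class
fpr, tpr, _ = roc_curve(y_bin.ravel(), y_score.ravel()) # pool all class/score pairs (micro-averaging)
print("micro-averaged AUC:", auc(fpr, tpr))
```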

As shown in Fig. 12, the combined model outperforms the other models in terms of the micro-averaged ROC curve, particularly with respect to the AUC. The AUC value of the combined model is significantly higher than those of XGBoost, RF, MLP, DT, and LSTM, indicating a stronger ability to distinguish between positive and negative classes; that is, it can more accurately identify the severity level of rear-end collisions.

In addition, the ROC curve of the combined model encloses the largest area, suggesting better stability across classification tasks. It consistently maintains high sensitivity and specificity, whereas the other models exhibit fluctuations, indicating relatively poor prediction accuracy in certain categories. This demonstrates that the combined model can handle complex multi-class data more effectively, particularly in high-dimensional tasks such as rear-end collision prediction, by leveraging the feature extraction strength of CNN and the classification power of XGBoost. A comprehensive evaluation of the model's overall performance is presented in Fig. 13 and Table 8.

Fig. 13. Comprehensive performance comparison.

As observed from the model comparison results in Fig. 13 and Table 8, the training and validation set accuracies, average AUC value, and F1 score of the traffic accident severity prediction model built based on the CNN-XGBoost algorithm are the highest among all models. Compared to the single XGBoost model, integrating CNN with XGBoost yields a significant improvement in overall performance.

Table 8.

Comparison of performance indicators.

Categorization Training accuracy Validation accuracy AUC F1 scores
CNN-XGBoost 0.969 0.962 0.94 0.95
XGBoost 0.811 0.797 0.89 0.70
RF 0.793 0.721 0.84 0.68
MLP 0.762 0.627 0.81 0.91
DT 0.779 0.764 0.86 0.79
Bayes 0.781 0.737 0.91 0.81
LSTM 0.686 0.635 0.85 0.87

Compared with the nonlinear machine learning models XGBoost, RF, MLP, DT, Bayes, and LSTM, the average validation-set accuracy of the combined model is higher by 0.165, 0.241, 0.335, 0.198, 0.225, and 0.327, respectively. Additionally, the average AUC values are higher by 0.05, 0.1, 0.13, 0.08, 0.03, and 0.09, indicating that the combined model demonstrates stronger classification ability and better robustness when distinguishing between different categories. Moreover, the F1 scores are higher by 0.25, 0.27, 0.04, 0.16, 0.14, and 0.08, respectively, indicating that the combined model achieves a better balance between precision and recall. Compared to a single model, its overall performance is superior on the same prediction task, suggesting that the constructed combined model possesses excellent adaptability and can deliver good prediction performance across various prediction tasks.

Limitations

In this study, although the rear-end collision severity prediction model based on the CNN-XGBoost algorithm achieved an average accuracy of 0.962 in the experiments, there remains a gap between experimental performance and real-world application. Practical deployment involves a broader range of factors, such as road conditions, weather variations, and driver emotions, many of which were not fully captured by the input variables used in this model. Therefore, future research could improve real-world applicability by tuning additional hyperparameters and incorporating larger, more diverse datasets to better reflect the complexities of actual driving environments.

Ethics statements

This work did not involve human subjects, animal experiments, or data collected from social media platforms.

CRediT authorship contribution statement

Shufeng Wang: Conceptualization, Methodology, Data curation, Writing – review & editing. Shixuan Jiang: Software, Visualization, Writing – original draft. Zhengli Wang: Validation. Lingyi Meng: Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the State Key Lab of Intelligent Transportation System under Project No. 2025-B008 and the Postgraduate Quality Professional Degree Teaching Case Base of Shandong Province under Grant No. SDYAL2024030.

Footnotes

Related research article

None

For a published article: None

Data availability

The data sources used in this study have been provided in the manuscript.

References

1. World Health Organization. Global Status Report on Road Safety 2023: Summary. Geneva (CH): WHO; 2023.
2. Owais M., Alshehri A., Gyani J., et al. Prioritizing rear-end crash explanatory factors for injury severity level using deep learning and global sensitivity analysis. Expert Syst. Appl. 2024;245.
3. Ahmed S.K., Mohammed M.G., Abdulqadir S.O., et al. Road traffic accidental injuries and deaths: a neglected global health issue. Health Sci. Rep. 2023;6(5):e1240. doi: 10.1002/hsr2.1240.
4. Ma Z., Mei G., Cuomo S. An analytic framework using deep learning for prediction of traffic accident injury severity based on contributing factors. Accid. Anal. Prev. 2021;160. doi: 10.1016/j.aap.2021.106322.
5. Liu Z., Chen H., Li Y., et al. Taxi demand prediction based on a combination forecasting model in hotspots. J. Adv. Transp. 2020.
6. Miaomiao Y., Yindong S. Traffic accident severity prediction based on random forest. Sustainability. 2022;14(3):1729.
7. Madushani J.P.S.S., Sandamal R.M.K., Meddage D.P.P.P., et al. Evaluating expressway traffic crash severity using logistic regression and explainable machine learning. Transp. Eng. 2023;13.
8. Sattar K.A., Ishak I., Suriani L., et al. Road crash injury severity prediction using a graph neural network framework. IEEE Access. 2024;12:37540–37556.
9. Cao Q. Analysis of Factors Influencing the Severity of Collision Between Electric Two-Wheeled Vehicles and Motor Vehicles [dissertation]. Shandong University of Technology; 2023.
10. Gao X., Tang H., Shen J., et al. An XGBoost-based method for highway accident type and severity prediction. Traffic Inf. Saf. 2023;41(4):55–63.
11. Zubaidi H.A., Obaid I.A., Alnedawi A., et al. Motor vehicle driver injury severity analysis utilizing a random parameter binary probit model considering different types of driving licenses in 4-legs roundabouts in South Australia. Saf. Sci. 2021;134.
12. Fountas G., Anastasopoulos P.C., Abdel-Aty M. Analysis of accident injury-severities using a correlated random parameters ordered probit approach with time variant covariates. Anal. Methods Accid. Res. 2018;18:57–68.
13. Assi K., Rahman S.M., Mansoor U., Ratrout N. Predicting crash injury severity with machine learning algorithm synergized with clustering technique. Int. J. Environ. Res. Public Health. 2020;17(15):5497. doi: 10.3390/ijerph17155497.
14. Mehraab N., Ubaid I., Gajjar J., et al. Traffic crash severity: comparing the predictive performance. J. Inst. Eng. India Ser. A. 2023;104(2):435–446.
15. Vadhwani D., Thakor D. Predictive analysis of injury severity of person across angle crashes using machine learning models. Int. J. Crashworthiness. 2022;28(4):523–536.
16. Mannering F., Bhat C.R., Shankar V., et al. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Anal. Methods Accid. Res. 2020;25.
17. Wang S., Zhang J., Li J., et al. Traffic accident risk prediction via multi-view multi-task spatio-temporal networks. IEEE Trans. Knowl. Data Eng. 2023:12323–12336.
18. Katanalp B.Y., Eren E. The novel approaches to classify cyclist accident injury-severity: hybrid fuzzy decision mechanisms. Accid. Anal. Prev. 2020;144. doi: 10.1016/j.aap.2020.105590.
19. Xie X., Kong L. Traffic accident prediction based on combined ARIMA and XGBoost model. J. Saf. Environ. 2021;21(1):277–284.
20. Riccardi M.R., Filomena M., Santoro A., et al. Analysis of contributory factors of fatal pedestrian crashes by mixed logit model and association rules. Int. J. Inj. Contr. Saf. Promot. 2023;30(2):195–209. doi: 10.1080/17457300.2022.2116647.
21. Astarita V., Haghshenas S.S., Guido G., et al. Developing new hybrid grey wolf optimization-based artificial neural network for predicting road crash severity. Transportat. Eng. 2023;12.
22. Kaur R., Roul R.K., Batra S. An efficient approach for accident severity classification in smart transportation system. Arabian J. Sci. Eng. 2023;48(8):9645–9659.
23. Arif Ali A.Z., Abduljabbar H., Tahir A., Sallow A.B., Almufti S.M. eXtreme gradient boosting algorithm with machine learning: a review. Acad. J. Nawroz. Univ. 2023;12(2):320–334.


