Scientific Reports. 2025 May 17;15:17203. doi: 10.1038/s41598-025-01469-2

Crash injury severity for ride-hailing drivers: a questionnaire study in China

Yan Zhang 1, Shiyuan Liu 2,
PMCID: PMC12085703  PMID: 40382426

Abstract

The development of ride-hailing services has been accompanied by an increase in the frequency of traffic accidents. The present study investigated the causes and severity of ride-hailing accidents attributed to driver-related factors in the context of China. From an online, self-reported survey of 1356 ride-hailing drivers across the country, we collected data about the drivers’ demographic characteristics, working conditions, fatigue, risky driving behaviors and accident records between 2021 and 2023. We analyzed the data with a two-step approach: a Random Forest (RF) model identified the most significant factors influencing accident severity, and a Bayesian Network (BN) model then analyzed the relationships between the identified factors and accident severity. With the 16 top factors selected by the RF model, results from the BN model showed that the main risk factors differ between levels of accident severity. Among all the factors, nine proved to be directly related to accident severity, mostly involving drowsiness, using smartphones in inappropriate situations and risky driving behaviors; the drivers’ demographic characteristics and working conditions influence accident severity indirectly. The findings from this study are useful for proposing more targeted policies to mitigate accident severity among ride-hailing drivers.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-01469-2.

Keywords: Ride-hailing drivers, Accident severity, Working conditions, Risky driving behavior, Mobile phone use

Subject terms: Health occupations, Engineering

Introduction

With the development of web services and mobile technology over the last decade, smartphone-based ride-hailing service emerges as a supplement to public transport modes. The ride-hailing service first appeared in 2011, since when the market has grown rapidly in a short period, however, without proper regulations at the early stages of its development1. Still, it features greater flexibility in daily travel routines with reduced costs and has become increasingly popular worldwide, especially in fast-developing countries. In China, for example, by the end of 2022, there had been more than 210 ride-hailing firms nationwide, serving a total of nearly 340 million registrants, over 6% of whom were daily active users2. As the travel demand has been constantly rising, the industry is currently employing over 3.08 million drivers and managing approximately 1.16 million vehicles; and over 300 new drivers are joining every day3. In the meantime, the surge of the travel demands translates into a high driving exposure on roads, accompanied by a high reporting rate of traffic accidents4. In Shenzhen, one of the first-tier cities in China, 176,948 traffic violations were recorded in 2017, where 3,879 accidents were found to be related to ride hailing. The accident rate of ride-hailing cars was as high as 7.15%, while that of private cars was only 0.28% in the same period5. There is therefore an urgent need for a better understanding of the underlying causes of the ride-hailing accidents.

Research on the effect of ride-hailing cars on the traffic accident rate has so far proved inconclusive. Some studies found a decline in certain types of accidents following the participation of ride-hailing in road traffic6–8. In reality, many ride-hailing drivers work only on a part-time basis and previously drove for commuting only, thus contributing to a lower level of vehicle miles traveled. Some other studies, however, showed evidence that ride-hailing taxis may not always reduce the rate of road traffic accidents9,10. As pointed out by Barrios et al.9, ride-hailing taxis increase the rate of traffic accidents with the increasing number of vehicles and miles traveled on the roads. Besides, ride-hailing services may attract travelers from other modes of public transportation, such as buses and subways, resulting in more traffic congestion on urban road networks, especially those in fast-developing countries. It is also noted that most of the existing studies were conducted in developed countries; few investigated the risk factors related to ride-hailing drivers in the wider social, demographic and economic context of developing countries. Since ride-hailing cars usually operate in cities, the risk of causing traffic accidents may be relatively higher in developing countries, given their more complex and mixed road traffic conditions.

Compared with traditional professional drivers, ride-hailing drivers have a few characteristics that may be more likely to pose risks to road safety. For example, many ride-hailing drivers have a full-time job and offer ride-hailing services during their off-duty times. In that case, the “part-time” drivers would be driving after extended periods of wakefulness or at night. Previous studies showed that long, continuous driving hours are more likely to lead to an accident11,12. Accordingly, traffic regulators stipulate maximum continuous driving hours and maximum daily working hours for professional drivers. These regulations, however, are not sufficiently effective for ride-hailing drivers because it is difficult to evaluate the impact of fatigue from their primary job on their part-time driving or to ensure adequate rest before they drive ride-hailing cars. Moreover, Mao et al.13 found that nearly 33% of ride-hailing drivers tend to work long-shift bookings and that their daily driving distance may be as high as 162.7 km; this heavy workload is an important reason for the high accident rate of ride-hailing drivers. Because of this special working schedule, Berneking et al.1 pointed out that fatigue and sleepiness are inherent safety risks in the ride-hailing industry, both of which proved to be key factors in increasing the risk of drowsy driving accidents14.

Risky driving behaviors have also been considered major contributors to accidents in the authors’ recent studies11,15. For ride-hailing drivers, there are several reasons for a higher accident risk caused by risky driving behaviors. For instance, ride-hailing drivers typically have not received the rigorous training and screening that is required for taxi or truck fleets13. Therefore, these drivers may fall short of professional drivers in terms of driving experience and skills, emergency handling, and emotional control. Besides, most ride-hailing drivers do not have enough experience in the service industry. As a result, they are more susceptible to passenger influence while driving, leading to more risky driving behavior. As highlighted by Acheampong16, reckless driving and risk-taking behavior by drivers appear to be frequent in ride-hailing services and have seriously affected road safety. It was found that ride-hailing drivers tended to be more pressed to save time and earn profits, and thus more likely to commit risky driving behaviors, such as neglecting to use turn signals, encroaching on car lanes, speeding, and running red lights17. In addition, since the smartphone is an important intermediary between ride-hailing drivers and their customers, risky behaviors related to smartphones are quite common among these drivers. According to the Provisions on Penalties for Road Traffic Safety Violations, using a phone while driving is classified as a traffic violation. Drivers who violate this provision face penalties, including demerit points and fines, as this behavior can easily lead to serious traffic accidents. A survey conducted by Truong and Nguyen18 compared the frequency of smartphone use when driving between traditional professional drivers and their ride-hailing counterparts. Their findings showed that the latter used smartphones more frequently and caused more phone-related traffic accidents. Ansar et al.19 examined the effect of driving distraction caused by passenger interference, smartphone use and entertainment elements on ride-hailing drivers. They concluded that phone-related behaviors might make drivers more susceptible to emotional changes while driving, thus increasing the risk of accidents.

Given the risk factors noted above, safety issues in ride-hailing services have attracted significant attention in academia, although studies on ride-hailing drivers in developing countries are limited. In this paper, we attempt to fill this gap and take China as an example to investigate the relationship between the severity of accidents caused by ride-hailing drivers and a range of influencing factors, including demographics, working conditions, driver fatigue, and risky driving behaviors. To achieve this goal, we used a Random Forest (RF) to identify the most significant factors that influence accident severity, and a Bayesian Network (BN) model to examine the causal nature of associations between the identified risk factors and accident severity, considering the interactions among all variables.

Materials & methods

Data collection

In this study, participants completed an electronic questionnaire through an online survey platform (www.wjx.cn). Ethics approval of the research was granted by Chang’an University Human Ethics Committee, and we confirm that all experiments were performed in accordance with the guidelines and regulations. The research team entered into a partnership with a professional survey company in China, which was responsible for sending questionnaires to ride-hailing drivers. First, drivers received the link to the survey shared in WeChat groups, and the questionnaire was filled out voluntarily by the drivers. Participants were then informed of their rights and the protection of their personal information in an electronic consent form, which also emphasized the purposes of this study and promised that the survey data would be used for academic research only. Importantly, the questionnaires were completely anonymous. The minimum response time for each questionnaire was 10 min, and drivers who completed the questionnaire were compensated with 12 CNY. Experienced ride-hailing drivers, each with a minimum of 2 years of experience, were invited. The survey was conducted between October 10 and November 13 in 2023. The questionnaires were screened to ensure that respondents answered all key questions related to demographic information, working conditions, fatigue status, risky driving behavior, and accident history. Surveys lacking responses to any of these critical sections were classified as incomplete and deemed invalid. Out of a total of 1,534 drivers surveyed, 1,356 (about 88.4% of the respondents) returned valid questionnaires, for which the geographical distribution is shown in Fig. 1.

Fig. 1. Geographical distribution of the sample.

The questionnaire consists of four parts, as shown in the appendix (Table A1). The first part includes questions about demographic information, such as gender, age and education level, for examining the relationship between socio-demographic variables and traffic accidents4,15. Considering that a large number of ride-hailing drivers have multiple jobs and work under adverse conditions, which may increase their accident risk12, the second part deals with the drivers’ working conditions, including their employment status, daily working hours, daily income, dissatisfaction with income, and the commission rate drawn on each order by the ride-hailing company. Questions in the third part are about driver fatigue, involving continuous driving profile, fatigue perceptions, and sleep problems. These were drawn from reports on professional drivers who are vulnerable to fatigue or sleepiness20. In the fourth part, the drivers were asked to report the frequency of risky behaviors while driving on a five-point Likert scale from “Never” to “Always”. Questions on risky behaviors were drawn from Wang et al.11 and Nguyen-Phuoc et al.17. An example item is: “As a ride-hailing driver, how often do you engage in the following risky behaviors: Using apps on your phone? Using the phone for navigating? Crossing red light? etc.” Note that three questions related to phone use while driving were designed because of the dependence of ride-hailing drivers on smartphones18. Finally, accident records for each participant were collected over the past two years, consistent with the time window selected in previous studies to reduce potential recall bias regarding accidents21,22.

Research design

The BN model has been gaining popularity in recent transportation safety research23–27. It visualizes the complex relationships and interactions among independent and dependent variables for constructing probabilistic inference and diagnosis. Thus, this method is suitable for modeling inter-correlated independent variables to better explain the heterogeneous influence of ride-hailing driver attributes on accident injury severity. However, the BN model has difficulty identifying significant factors affecting the target variable, and each factor input to the model exponentially increases the computational complexity28. The number of possible BN structures with $n$ variables increases super-exponentially as $n$ increases; for $n = 10$, for example, there are already about $4.2 \times 10^{18}$ possible directed acyclic graphs29. Therefore, to achieve feasible and efficient estimation of the BN model, it is essential to apply a variable selection procedure to exclude variables that do not influence model performance. In this study, an RF model is used to select significant variables and screen out unnecessary and redundant attributes. The details of the RF model and BN model are provided in the following sections.
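The super-exponential growth of the structure space can be checked numerically. The sketch below counts labeled directed acyclic graphs with Robinson's recurrence; it is only an illustration of why variable selection is needed, not part of the study's workflow, and the function name `count_dags` is chosen here for convenience. The node counts 17 and 24 correspond to the target variable plus 16 or 23 explanatory variables.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Order of magnitude (number of decimal digits minus one) for a few network sizes.
for n in (10, 17, 24):
    print(n, "nodes -> about 10 **", len(str(count_dags(n))) - 1, "structures")
```

Dropping seven of 23 candidate variables therefore shrinks the search space by many orders of magnitude, which is the practical motivation for the RF screening step.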

Random forest based variable selection

RF is an ensemble learning algorithm, which consists of a set of tree-structured classifiers with independent identically distributed random vectors30. When dealing with classification problems, the model outputs the class that receives the most votes across all trees. Since the number of trees can be customized, RF can give stable results. Moreover, compared with traditional variable selection methods (e.g. classification trees, stepwise regression, etc.), RF can handle multi-collinearity among candidate variables and is robust against overfitting. Therefore, RF has been frequently used to rank the importance of variables and to select significant variables in traffic safety studies31–33. For the RF algorithm, readers are referred to Breiman34.

In this study, RF was used to identify the significant factors that had a great impact on the accident severity of ride-hailing drivers. To this end, the importance of each variable to accident severity was calculated, as measured by the Gini index:

$$G(t) = 1 - \sum_{k=1}^{m} p^{2}(k \mid t) \qquad (1)$$

where $G(t)$ is the Gini index for a node t, m is the number of classes for this node, and $p(k \mid t)$ denotes the estimated class probabilities. For a splitting variable $X_j$, the importance of $X_j$ represents the mean decrease in the Gini index (MDG) over all nodes split on $X_j$ in all trees of the forest.
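As a small illustration of the Gini index and of the decrease produced by a single split, the following sketch uses made-up class labels; the helper names `gini` and `gini_decrease` are introduced here for illustration and do not come from the study. In an RF, such decreases would be accumulated per variable over all splits and trees to obtain the MDG.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Decrease in Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node with two severity classes, split by some candidate variable.
parent = ["none"] * 4 + ["damage"] * 4
left = ["none"] * 3 + ["damage"]
right = ["none"] + ["damage"] * 3

print(gini(parent))                        # impurity before the split
print(gini_decrease(parent, left, right))  # impurity removed by the split
```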

After the importance ranking of all variables is determined, a stepwise procedure was followed to select the significant variables that should be included in BN model. The variables were dropped one by one by checking the out-of-bag (OOB) error rates of RF models, which is calculated as the proportion of times that the voted class is not equal to the true class and averaged over all cases in the OOB data.
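The OOB error described above can be sketched in a few lines. The example assumes that, for each tree, we know which cases were in its bootstrap sample and what it predicts for every case; the function `oob_error` and the toy data are hypothetical and only mirror the definition given in the text. In a stepwise procedure of the kind described, this quantity would be recomputed for forests fitted on successively smaller (or larger) sets of top-ranked variables.

```python
from collections import Counter


def oob_error(tree_preds, in_bag, y_true):
    """Share of cases whose out-of-bag majority vote differs from the true class.

    tree_preds[t][i] is the class predicted by tree t for case i;
    in_bag[t] is the set of case indices in the bootstrap sample of tree t.
    """
    wrong = 0
    counted = 0
    for i, truth in enumerate(y_true):
        # Only trees that did NOT see case i during training may vote on it.
        votes = Counter(
            preds[i] for preds, bag in zip(tree_preds, in_bag) if i not in bag
        )
        if not votes:
            continue  # case i was in-bag for every tree, so it has no OOB vote
        counted += 1
        voted_class = votes.most_common(1)[0][0]
        wrong += voted_class != truth
    return wrong / counted if counted else float("nan")


# Toy example: three trees, four cases (all values are made up).
y_true = ["none", "damage", "injury", "none"]
in_bag = [{0, 1}, {1, 2}, {2, 3}]
tree_preds = [
    ["none", "damage", "injury", "none"],
    ["none", "damage", "injury", "none"],
    ["none", "none", "injury", "none"],
]
print(oob_error(tree_preds, in_bag, y_true))
```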

Bayesian network

BN model is a directed acyclic graph, consisting of a set of nodes, arcs, and probabilities. The nodes and arcs represent, respectively, variables and causal relationships between them. The directed arcs imply local conditional dependencies between parent and child nodes. The probabilities quantitatively describe the causal relationships represented by the arcs from one node to another.

The joint probability distribution of a set of random variables $V = \{V_1, V_2, \ldots, V_n\}$, based on conditional independence and the chain rule35, is represented in the network as:

$$P(V) = \prod_{i=1}^{n} P\big(V_i \mid Pa(V_i)\big) \qquad (2)$$

where $P(V)$ is the joint probability of V and $Pa(V_i)$ represents the parent nodes of $V_i$.

According to Bayes’ theorem, the BN model updates the prior probability of an event with new information, denoted E, to yield the posterior probability. This can be represented by the following equation:

$$P(V \mid E) = \frac{P(E \mid V)\,P(V)}{P(E)} \qquad (3)$$

where $P(E \mid V)$ is the probability of event E conditional on the occurrence of event V; $P(V)$ and $P(E)$ are the marginal probabilities of events V and E, respectively.
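The chain-rule factorization and the Bayesian update can be illustrated with a three-node toy network, long working hours → sleep problems → accident. All probabilities below are invented for illustration and are not estimates from the survey; the variable and function names are likewise hypothetical.

```python
from itertools import product

# Hypothetical conditional probability tables (not taken from the study).
p_long = {True: 0.3, False: 0.7}               # P(L)
p_sleep_given_long = {True: 0.6, False: 0.2}   # P(S = True | L)
p_acc_given_sleep = {True: 0.5, False: 0.2}    # P(A = True | S)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(l, s, a):
    """Chain rule over the DAG: P(L, S, A) = P(L) * P(S | L) * P(A | S)."""
    return (
        p_long[l]
        * bernoulli(p_sleep_given_long[l], s)
        * bernoulli(p_acc_given_sleep[s], a)
    )


def prob(**fixed):
    """Marginal probability of the fixed assignments, by full enumeration."""
    total = 0.0
    for l, s, a in product([True, False], repeat=3):
        world = {"l": l, "s": s, "a": a}
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(l, s, a)
    return total


prior_sleep = prob(s=True)
# Bayes' theorem: P(S | A) = P(S, A) / P(A)
posterior_sleep = prob(s=True, a=True) / prob(a=True)
print(prior_sleep, posterior_sleep)
```

Observing an accident raises the probability of sleep problems above its prior value, which is the kind of diagnostic reasoning a BN supports.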

Modeling process and sensitivity analysis

The modeling process of the BN model includes two steps: structure learning and parameter learning. Generally, there are three methods to build the BN structure: (1) relying on the knowledge of domain experts; (2) using a data-driven algorithm; (3) combining (1) and (2)36. Because expert knowledge is subjective and limited, and the structure learned by an algorithm can be unrealistic, the third method was selected in this study. First, the PC algorithm, a popular constraint-based method, was used to learn the original BN structure. This method infers the relationship between paired variables by applying conditional independence tests, and uses a backward stepwise testing procedure to determine absent edges in the resulting graph37. Then, the edges that were not theoretically plausible were deleted according to expert knowledge and previous literature. With the structure established, parameter learning estimated the conditional probability tables (CPTs) from the dataset. This process was carried out in GeNIe Software 3.0 (BayesFusion, LLC; http://www.bayesfusion.com/), which obtains maximum likelihood estimates using the Expectation-Maximization algorithm38.

Once the modeling process is completed, the prior probability of the target variable can be obtained. Sensitivity analysis calculates how the probabilities change as new evidence or cases become available. The probability of one state of a variable was set to 100% each time, while the other variables remained unchanged, and the resulting change in the probability of the target variable was calculated:

$$\Delta P(Y = y) = P(Y = y \mid X_i = x_{ij}) - P(Y = y) \qquad (4)$$

where $Y$ is accident severity, $y$ is one of its levels, and $x_{ij}$ is the state of variable $X_i$ that is set to 100%.

Results

Descriptive statistics

Table 1 presents sample statistics on the demographic characteristics, working conditions and fatigue of the respondents. Of the 1,356 valid responses, over half reported accident records: 43.9% of drivers were involved in property damage accidents and 8.5% in injury accidents. The proportion of male drivers was nearly twice that of female drivers. The imbalance among the “No accidents”, “Property damage” and “Injury” categories may affect the performance of our model; however, it reflects real-world conditions to some extent, as serious injuries are relatively rare. Maintaining this realism is likely to enhance the transferability of our solutions. Over 90% of the surveyed drivers were aged 40 or under, and the vast majority of all respondents (86.7%) had received higher education. The data suggest that the ride-hailing drivers were mostly young, male and well educated, yet they accounted for a high rate of accidents. One possible explanation is that younger drivers have less driving experience and poorer judgment of road conditions.

Table 1.

Characteristics of ride-hailing taxi drivers (n = 1356).

Categorical Variable n % n % n %
Demographics
Age (years)
Young (< 30) 602 44.3 Middle (30 to 40) 638 47.1 Older (> 40) 116 8.6
Education level
Middle school 33 2.4 High school 147 10.8 Higher education 1176 86.7
Gender
Female 475 35.0 Male 881 65.0
Working conditions
Employment status
Part-time 856 63.1 Full-time 500 36.9
Daily working (hours)
< 4 770 56.8 5 to 8 212 15.6 9 to 10 337 24.9
Over 10 37 2.7
Daily income (yuan)
Low (< 200) 629 46.4 Middle (200 to 400) 472 34.8 High (> 400) 255 18.8
Commission ratio (%)
Low (< 15) 238 17.6 Middle (15 to 30) 718 52.9 High (> 30) 400 29.5
Dissatisfaction with income
Dissatisfied 479 35.3 Neutral 719 53.0 Satisfied 158 11.7
Fatigue
Consecutive driving (hours)
Short (< 2) 1036 76.4 Normal (2 to 4) 232 17.1 Long (> 4) 88 6.5
Fatigue feeling
Never 143 10.5 Seldom 625 46.1 Sometimes 445 32.8
Often 131 9.7 Always 12 0.9
Sleep problems
Never 220 16.2 Seldom 682 50.3 Sometimes 352 26.0
Often 90 6.6 Always 12 0.9
Accident history
No accident 645 47.6 Property damage 595 43.9 Injury 116 8.5

The results also paint a gloomy picture of the working conditions of ride-hailing drivers. Of the respondents, 63.1% were employed as ride-hailing drivers on a part-time basis. Although 93.5% of the drivers claimed that their continuous driving times were less than four hours, 43.4% reported at least sometimes feeling fatigued while driving and 33.5% at least sometimes had sleep problems. Ride-hailing drivers do not earn much, as 46.4% of them reported that their daily income was less than 200 yuan. Despite the low income of many drivers, ride-hailing companies still charged high fees: for 82.4% of the drivers, the commission rate on each order was 15% or more. Relatedly, only 11.7% of the drivers were satisfied with their daily income.

Table 2 shows the frequency of risky driving behaviors among ride-hailing drivers. As expected, the frequency of phone-related behaviors is quite high: using a phone to navigate while driving (M = 3.91 ± 1.04) was the most frequent behavior, followed by using a phone to talk (M = 2.79 ± 1.21) and using apps on the phone (M = 2.03 ± 0.91). Among the other nine risky behaviors, the three most frequent were honking at other drivers, driving with one hand and failing to use the indicator lamp.

Table 2.

Distribution of risky driving behaviors while driving (%).

Variable Frequency Mean (M) SD
1 2 3 4 5
Phone-related behaviors
Use phone to navigate 3.5 8.3 13.6 43.2 31.3 3.91 1.04
Use phone to talk 16.7 28.5 20.9 26.8 7.2 2.79 1.21
Use apps on your phone 29.6 46.5 16.4 6.1 1.3 2.03 0.91
Other behaviors
Honk at other drivers 28.0 42.9 19.8 7.5 1.8 2.12 0.96
Driving with one hand 35.2 38.7 17.5 7.2 1.4 2.01 0.97
Failure to use lamp indicator 53.1 33.6 9.3 2.2 1.8 1.66 0.87
Dangerous overtaking 56.5 31.9 10.1 1.5 0.1 1.57 0.74
Speeding 59.2 30.9 8.1 1.7 0.1 1.53 0.72
Parking in forbidden areas 66.2 22.0 7.8 2.9 1.1 1.51 0.85
Suddenly accelerate or brake 64.1 28.6 6.0 1.0 0.3 1.45 0.68
Crossing a red light 74.3 18.2 6.1 1.3 0.1 1.35 0.66
Reckless merging 82.7 13.3 3.0 0.6 0.4 1.22 0.56

1 = never, 2 = seldom, 3 = sometimes, 4 = often, 5 = always.

SD standard deviation.
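The means and standard deviations in Table 2 can be approximately recovered from the reported frequency distributions. The sketch below does this for "Use apps on your phone", assuming that scores are coded 1 to 5 and that the population form of the standard deviation is acceptable (with 1,356 responses the sample form is practically identical); the function name is chosen here for illustration, and small deviations can arise because the published percentages are rounded.

```python
from math import sqrt


def likert_mean_sd(percentages):
    """Mean and standard deviation of a 1..k Likert item from its percentage distribution."""
    total = sum(percentages)  # may differ slightly from 100 because of rounding
    weights = [p / total for p in percentages]
    mean = sum(w * score for score, w in enumerate(weights, start=1))
    variance = sum(w * (score - mean) ** 2 for score, w in enumerate(weights, start=1))
    return mean, sqrt(variance)


# Percentages for "Use apps on your phone" as listed in Table 2 (never ... always).
mean, sd = likert_mean_sd([29.6, 46.5, 16.4, 6.1, 1.3])
print(round(mean, 2), round(sd, 2))
```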

RF results and important variable selection

Initially, the 23 explanatory variables shown in Fig. 2a were input to the RF model for accident severity. To obtain stable results, different numbers of trees were tried; in this study, 500 trees were sufficient. The number of variables randomly sampled as candidates at each split was set, as recommended in the literature, to the square root of the number of explanatory variables ($\sqrt{23} \approx 4.8$)39.

Fig. 2. Selection of factors predicting accident severity by RF. (a) Variables ranked by MDG. (b) Variable selection by OOB error.

For the RF model with all explanatory variables, the OOB error rate was 28.6%, and the importance ranking of the variables based on their MDG values is shown in Fig. 2a. The driver’s daily working hours, with an MDG value of 69.6, ranked at the top and turned out to have the most significant impact on the classification of accident severity.

Then, the RF model was refitted successively with the number of input variables increased from 1 to 23 according to the importance ranking, and the resulting OOB error rates are shown in Fig. 2b. The OOB error rate reached its minimum value of 0.27 (27%) when the number of variables was set to 16, indicating that the seven least important variables can be removed from the model. Therefore, the top 16 variables ranked by the RF model were used to construct the BN model.

Bayesian network modeling results

Model performance

To validate the model, 80% of the 1356 records were randomly selected as a training set and 20% a test set. The original five-level categorization resulted in small sample sizes (< 5%) for some combinations, which adversely affected the stability of the conditional probability tables. Therefore, the five-point Likert scale measurements were reduced to three40. We merged “never” and “seldom” into “seldom”, “often” and “always” into “often”, the intermediate value “sometimes” remained unaltered.
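The recoding from five to three levels can be written as a simple lookup. The sketch below assumes the responses are coded 1 (never) to 5 (always), as in the footnote to Table 2; the mapping name `COLLAPSE` and the helper functions are illustrative only. The usage example applies the mapping to the percentage distribution of "Use phone to talk" from Table 2.

```python
# 1 = never, 2 = seldom, 3 = sometimes, 4 = often, 5 = always
COLLAPSE = {1: "seldom", 2: "seldom", 3: "sometimes", 4: "often", 5: "often"}


def collapse_responses(responses):
    """Recode individual 1-5 Likert responses into the three merged levels."""
    return [COLLAPSE[r] for r in responses]


def collapse_distribution(percentages):
    """Merge a five-level percentage distribution into three levels."""
    merged = {"seldom": 0.0, "sometimes": 0.0, "often": 0.0}
    for score, pct in enumerate(percentages, start=1):
        merged[COLLAPSE[score]] += pct
    return {level: round(pct, 1) for level, pct in merged.items()}


print(collapse_responses([1, 2, 3, 4, 5]))
print(collapse_distribution([16.7, 28.5, 20.9, 26.8, 7.2]))
```

Merging adjacent levels increases the number of cases behind each cell of a conditional probability table, at the cost of a coarser description of behavior frequency.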

Table 3 presents the confusion matrices of the BN model predictions for the training and testing datasets. As shown in Table 3, the prediction accuracy for “no accidents”, “property damage” and “injury” was 90.3%, 60.0%, and 60.7%, respectively, for the testing dataset. The overall accuracy of the trained BN model was 74.8% for the testing dataset, indicating that the model has good prediction performance.
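The per-class and overall accuracies are simple ratios of confusion-matrix counts. As an illustration, the sketch below recomputes them from the training counts listed in Table 3; the dictionary layout and function names are choices made here, not part of the original analysis.

```python
# Training-set confusion matrix from Table 3: observed class -> predicted class -> count.
train = {
    "No accidents":    {"No accidents": 469, "Property damage": 44,  "Injury": 1},
    "Property damage": {"No accidents": 187, "Property damage": 291, "Injury": 2},
    "Injury":          {"No accidents": 19,  "Property damage": 13,  "Injury": 56},
}


def per_class_accuracy(cm):
    """Percentage of each observed class that was predicted correctly."""
    return {cls: 100.0 * row[cls] / sum(row.values()) for cls, row in cm.items()}


def overall_accuracy(cm):
    """Percentage of all cases on the diagonal of the confusion matrix."""
    correct = sum(row[cls] for cls, row in cm.items())
    total = sum(sum(row.values()) for row in cm.values())
    return 100.0 * correct / total


for cls, acc in per_class_accuracy(train).items():
    print(f"{cls}: {acc:.1f}%")
print(f"Overall: {overall_accuracy(train):.1f}%")
```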

Table 3.

BN model classification confusion matrix for the training and testing dataset.

Training dataset (80%, n = 1082) Predicted instances
Total No accidents Property damage Injury
Observed instances No accidents 514 469 44 1
Property damage 480 187 291 2
Injury 88 19 13 56
Accuracy (%) 75.4 91.2 60.6 63.6
Testing dataset (20%, n = 274) Predicted instances
Total No accidents Property damage Injury
Observed instances No accidents 131 119 12 0
Property damage 115 46 69 0
Injury 28 9 2 17
Accuracy (%) 74.8 90.3 60.0 60.7
Table 4.

Sensitivity analysis of different variable state for three accident severity (%).

No accidents Property damage Injury
Variables State (100%) Sensitivity Variables State (100%) Sensitivity Variable State (100%) Sensitivity
Fatigue feeling often -15.9 Fatigue feeling Often 4.3 Speeding Often 17.1
Sleep problems often -14.7 Fatigue feeling Sometimes 3.6 Parking in forbidden areas Often 15.4
Speeding often -14.4 Use a phone to talk Often 3.3 Parking in forbidden areas Sometimes 15.2
Parking in forbidden areas often -14.1 Sleep problems Often 3.1 Speeding Sometimes 13.9
Parking in forbidden areas sometimes -14.0 Daily working hours > 10 h 3.1 Use apps on your phone Often 13.4
Speeding sometimes -13.8 Honk at other drivers Sometimes 2.5 Driving with one hand Often 12.9
Use apps on your phone often -11.9 Sleep problems Sometimes 2.5 Fatigue feeling Often 11.6
Use apps on your phone sometimes -11.0 Use a phone to navigate Often 2.2 Sleep problems Often 11.5
Honk at other drivers often -9.7 Daily working hours 5 to 8 h 1.7 Honk at other drivers Often 10.5
Driving with one hand often -9.5 Use a phone to talk Sometimes 1.2 Use apps on your phone Sometimes 10.3
Daily working hours > 10 h -8.8 Consecutive driving Normal 1.1 Driving with one hand Sometimes 7.7
Driving with one hand sometimes -8.6 Driving with one hand Sometimes 1.0 Honk at other drivers Sometimes 5.8
Honk at other drivers sometimes -8.4 Use apps on your phone Sometimes 0.7 Daily working hours > 10 h 5.7
Sleep problems sometimes -7.0 Daily income > 400 0.5 Sleep problems Sometimes 4.5
Fatigue feeling sometimes -5.9 Speeding Sometimes 0.5 Failure to use indicator lamp Often 4.0
Use a phone to talk often -5.3 Age < 30 years 0.5 Failure to use indicator lamp Sometimes 3.9
Failure to use indicator lamp often -3.8 Daily income 200 to 400 0.5 Use a phone to navigate Seldom 3.6
Use a phone to talk sometimes -3.8 Daily working hours 9 to 10 h 0.3 Use a phone to talk Sometimes 2.6
Failure to use indicator lamp sometimes -3.7 Parking in forbidden areas Seldom 0.2 Fatigue feeling Sometimes 2.3
Daily working hours 5 to 8 h -3.6 Commission ratio > 30% 0.2 Use a phone to navigate Sometimes 2.1

The Receiver Operating Characteristic (ROC) curve was also used to evaluate the overall performance of the BN model. The area under the ROC curve (AUC) is a quantitative index of the overall performance of model classification. Typically, an AUC value above 0.7 indicates good prediction performance31. Figure 3 shows the AUCs achieved by the BN model. The AUCs for the “no accidents”, “property damage” and “injury” classes of the testing dataset were 0.85, 0.86, and 0.84, respectively, which also confirmed the validity of the model.
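For a three-class outcome, one common way to obtain class-specific AUCs is one-vs-rest: each class in turn is treated as "positive" and the model's predicted probability for that class is used as the score. The sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). Whether the study used exactly this scheme is an assumption, and the scores are invented.

```python
def auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of "injury" for five drivers,
# with True marking the drivers who actually reported an injury accident.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
observed_injury = [True, True, False, True, False]
print(auc(scores, observed_injury))
```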

Fig. 3. The ROC curve for three levels of accident severity (testing data). (a) No accidents (AUC = 0.85). (b) Property damage (AUC = 0.86). (c) Injury (AUC = 0.84).

Relationship identification

Using the 16 explanatory variables identified by RF and the target variable (i.e. accident severity), a BN model with 17 nodes was obtained by combining the PC algorithm and expert knowledge, as shown in Fig. 4. The input variables are all related to accident severity, directly or indirectly. Among the explanatory variables, nine were directly related to accident severity: fatigue feeling, sleep problems, using apps on the phone, using a phone to navigate, using a phone to talk, honking at other drivers, driving with one hand, parking in forbidden areas and speeding. These variables were also correlated with each other. The remaining seven explanatory variables turned out to be indirectly related to accident severity. For instance, long daily working hours were one of the causes of sleep problems, whose frequency in turn directly affected accident severity.

Fig. 4. The BN model learned by the PC algorithm.

Sensitivity analysis

With the BN model, we could make inferences by updating the probabilities of different states in each node. The compiled BN model is shown in Fig. 5. In the first run, the initial probabilities of “no accidents”, “property damage” and “injury” were 46.6%, 37.2% and 16.2%, respectively. Since the 16 explanatory variables have a total of 49 categories, 49 states require sensitivity analysis. For instance, assuming that the variable “sleep problems” is set to 100% “often”, the probabilities of “no accidents”, “property damage” and “injury” were predicted to change to 31.9%, 40.3% and 27.8%, respectively. In this case, the probability of “no accidents” fell by 14.7 percentage points, while those of “property damage” and “injury” rose by 3.1 and 11.6 percentage points, respectively; these changes are the sensitivity values for the three levels of accident severity.
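The arithmetic behind this example is a difference between the updated and the initial probabilities. The short sketch below reproduces it with the figures quoted in the paragraph above; the function name `sensitivity` is illustrative, and the sign indicates whether the probability rises or falls.

```python
# Probabilities (%) quoted in the text for the "sleep problems = often" scenario.
prior = {"no accidents": 46.6, "property damage": 37.2, "injury": 16.2}
posterior = {"no accidents": 31.9, "property damage": 40.3, "injury": 27.8}


def sensitivity(prior, posterior):
    """Change in probability (percentage points) for each severity level."""
    return {level: round(posterior[level] - prior[level], 1) for level in prior}


changes = sensitivity(prior, posterior)
for level, delta in changes.items():
    print(f"{level}: {delta:+.1f} percentage points")
```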

Fig. 5. The compiled BN model for ride-hailing taxi accident severity.

Table 4 presents the sensitivities for each state of the variables that affect accident severity. The main causal factors differ between the three levels of accident severity. For the “no accidents” outcome, often feeling fatigued while driving (15.9%), often experiencing sleep problems (14.7%), often speeding (14.4%), and often or sometimes parking in forbidden areas (14.1%; 14.0%) were the top five states that reduce the probability of not being involved in an accident. For property damage accidents, often or sometimes feeling fatigued while driving (4.3%; 3.6%), often using a phone to talk (3.3%), often experiencing sleep problems (3.1%), and working over 10 h daily (3.1%) were the top five states that increase the probability of being involved in such accidents. For injury accidents, often or sometimes speeding (17.1%; 13.9%), often or sometimes parking in forbidden areas (15.4%; 15.2%), and often using apps on the phone (13.4%) were the top five states that increase the probability of causing injuries in an accident.

Discussion

While many previous studies have explored the risk factors of traffic accidents for different types of professional drivers, this study set its sights on an emerging occupation, namely ride-hailing drivers, to investigate the influence of risk factors on different levels of accident severity. Overall, the results reveal that both part-time and full-time ride-hailing drivers are subject to heavy workloads and low income, which are potential triggers for them to engage in more risky driving behaviors and thus increase the risk of accidents. The information about the workload portrays a picture even more worrisome than that of similar professions such as taxi drivers12, because most of the ride-hailing drivers have multiple jobs and work extremely long hours1.

The results also suggest that the main causes of the three levels of accident severity are different, especially between property damage accidents and injury accidents. Factors related to driver fatigue were strongly associated with property damage accidents, while risky driving behaviors such as speeding, parking in prohibited areas, and driving with one hand had a greater impact on injury accidents; mobile phone use had a great impact on both types of accidents. Similar discrepancies were also found in previous studies on accident severity29,41, although the study population and explanatory variables were different from the present study. Also, the results showed that reducing the frequency of risky behaviors, managing driver fatigue and improving working conditions could increase the probability of no accidents to different degrees. These findings clarify which factors should be prioritized at different levels of accident severity and shed light on how to efficiently improve the safety performance of ride-hailing drivers.

Besides, although previous studies have demonstrated the relationship between risky driving behaviors and accident involvement, the present study adds to the literature by finding that phone-related behaviors, including using apps on the phone, using the phone for navigating, and talking on the phone, are directly related to accident severity. Moreover, the more frequently drivers use their phones, the higher the probability of causing a serious accident. Indeed, using a phone while driving can seriously distract the driver’s attention and weaken their ability to perceive and react to changes in the complex traffic environment. Under such circumstances, drivers have less control over their vehicles and tend to miss important road safety information, making them more prone to accidents, especially in the case of an unexpected situation or emergency. Most importantly, this study found that ride-hailing drivers have become somewhat dependent on the use of mobile phones. In addition to the necessity of processing the orders on ride-hailing apps, almost all ride-hailing drivers rely on navigation aids while driving. One possible explanation is that the group of drivers includes a large number of part-time workers. These people may not be professional drivers, and some are even novice drivers who are not familiar with most road conditions in the city. Another possible reason is that using navigation facilitates advance route planning, which helps save travel time and thus allows drivers to take more orders.

This investigation also provides an understanding of the correlations between phone-related behaviors, driver fatigue, and other risky driving behaviors. The results indicate that drivers who frequently use apps on the phone or use the phone for talking are more likely to drive with one hand or honk at other drivers. Generally, using functions on the phone requires handheld or one-handed operation, so phone-related behaviors while driving can trigger more frequent one-hand driving. Besides, when drivers receive orders via their phone, they need to get to the passenger’s location as quickly as possible. At this time, the drivers may become more aggressive and more likely to honk at slower drivers, especially when experiencing traffic jams. Meanwhile, the frequency of using phones while driving reflects the busy working situation of ride-hailing drivers. As noted by Peng et al.12, the desire to make more money drives drivers to take on a heavier workload, which further exposes them to more severe physical and mental stress. Therefore, it is not surprising to find that phone use is also closely linked to driver fatigue and sleep problems, not to mention the fact that mobile phone use while driving can complicate driving tasks and cause drivers to expend energy on dealing with other non-work-related matters.

Additionally, the results of this study support the view that driver fatigue and sleep problems influence accident severity both directly and indirectly. On the one hand, driving while fatigued leads to impaired driving skills, lack of concentration, and slower response times, which have been shown to be highly correlated with higher accident severity14. Meanwhile, serious sleep problems not only exacerbate driver fatigue but also lead to a decline in memory and alertness, causing drivers to make more mistakes while driving42. On the other hand, this study also confirmed that driver fatigue and sleep problems induce more risky driving behaviors, such as speeding, parking in forbidden areas, and failure to use an indicator lamp, all of which have been considered important precursors of traffic accidents in previous studies11,15,17.

Regarding demographic factors, only age was found to have an indirect effect on accident severity, by influencing drivers’ use of mobile navigation. As mentioned above, ride-hailing drivers use mobile navigation with high frequency. This finding further establishes that younger drivers rely more on navigation to guide their driving. Since ride-hailing driving is a profession dominated by young drivers, and the industry has inevitably introduced some drivers with less driving experience and poorer judgment of road conditions, interventions should fully consider the particularities of this group of drivers.

As for working conditions, related factors can affect accident severity in many indirect ways. For example, drivers with long daily working hours are prone to long continuous driving hours and are more likely to feel fatigued or fall asleep while driving. Drivers who are dissatisfied with their income or are subject to a high commission ratio use their phones more frequently while driving because they do not want to miss any orders. Consistent with previous studies13,18, this study also confirmed that driver fatigue, sleep problems, and phone-related behaviors are important precursors of traffic accidents. Hence, it can be concluded that working under adverse conditions makes ride-hailing drivers more vulnerable to accidents. Moreover, the findings showed that drivers who worked for 5 to 8 h or over 10 h daily have a higher probability of causing serious accidents. Considering that the income of ride-hailing drivers depends largely on their working hours, it can be inferred that both part-time and full-time drivers are suffering heavy workloads as they tend to work longer hours to increase their income. It is worth emphasizing that full-time drivers are more likely to cause accidents than part-time drivers when working similar hours. One reason is that ride-hailing drivers are often hired as “independent contractors,” so they are not screened for medical problems that might reduce alertness, such as obstructive sleep apnea1. Another possible explanation lies in the fact that full-time drivers are more likely to be exposed to accident risk factors due to long-term exposure to the road43.

Conclusions

This study identified the risk factors of accidents related to ride-hailing drivers and investigated the relationships between demographics, working conditions, driver fatigue, risky driving behaviors, and accident severity among Chinese ride-hailing drivers. An online self-reported survey was conducted from October 10 to November 13 in 2020, and a total of 1,356 valid questionnaires were collected. A two-step method was applied in this study: first, a random forest (RF) technique was used to rank the importance of factors predicting accident severity, and then a Bayesian Network (BN) model was employed to establish the relationship between accident severity and the significant factors. The results of the RF showed that it is appropriate to eliminate the seven least important variables according to the OOB error rate. A total of 16 significant factors were then used to construct a BN model, and the results indicated that nine variables, namely feeling fatigued, sleep problems, using apps on the phone, using the phone to navigate, using the phone to talk, honking at other drivers, driving with one hand, parking in forbidden areas and speeding, were directly related to accident severity. The remaining variables on demographics and working conditions can influence accident severity indirectly. Finally, sensitivity analysis further explored the main causes of different levels of accident severity.

Findings from the present study have many practical implications. To reduce serious accidents, the management of ride-hailing drivers’ risky driving behavior should be prioritized. For example, the management and assessment of new drivers could be increased, the monitoring and punishment of risky driving behaviors appropriately strengthened, and incentives created to reward drivers who exhibit good driving behavior over a designated period. Since mobile phone use is an important means of taking online orders, the related risky behaviors require more precise countermeasures: graded penalties ranging from no penalty to license revocation could be applied depending on the possible consequences of phone use, and more voice-operated functions could be integrated into ride-hailing applications. Besides, measures should be in place to reduce driver fatigue. It is the responsibility of ride-hailing companies to assess the number of part-time workers, the shifts that drivers typically work, and the frequency of drivers’ fatigue and sleep problems, and to advocate for policies on hours-of-service regulations and appropriate work-rest schedules. Meanwhile, a more rational order allocation scheme should be adopted to ensure that drivers can schedule shifts and rest more effectively. Importantly, the findings indicated that improving income could reduce drivers’ working hours and reliance on phone use, which would eventually alleviate driver fatigue. Achieving this goal requires financial support from the government for this industry, allowing ride-hailing companies to reduce the commission ratio of each order and then improve service quality to attract more customers.

A few limitations remain. First, while the study acknowledges the geographic distribution of respondents, it does not explicitly address subtle regional differences in driving conditions, regulations, and accident rates across various cities in China. Future comparative analyses will examine how these factors influence accidents among ride-hailing drivers in different regions. Second, categorizing crashes solely by their highest severity is a limitation, as it overlooks the cumulative effects of multiple minor crashes and their frequency. This approach may obscure important behavioral differences among drivers, particularly between those with several minor crashes and those with no crashes or a single severe incident. Future research should investigate the interplay between accident frequency and severity to enhance our understanding of accident risk.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (75.2KB, pdf)

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions

Y.Z. and S.L. designed the experiments, performed the analyses, carried out the statistical analysis and wrote the manuscript. S.L. edited the final version of the manuscript.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Informed consent statement

We hereby confirm that informed consent has been obtained from all participants involved in the study titled “Crash injury severity for ride-hailing drivers: A questionnaire study in China”. Each participant was provided with detailed information regarding the purpose, procedures, risks, and benefits of the study. Participants were given the opportunity to ask questions and were assured of their right to withdraw from the study at any time without any consequences. All participants voluntarily agreed to participate in the study and have signed the consent forms.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Berneking, M. et al. The risk of fatigue and sleepiness in the ridesharing industry: an American Academy of Sleep Medicine position statement. J. Clin. Sleep Med. 14, 683–685 (2018).
  • 2. China Industrial Research Institute. China Ride-hailing Market In-depth Analysis and Development Trend Research Consultation Report (2020–2025), No. 1742381. https://www.chinairn.com/report/20191225/093147828.html?id=1742381&name=guomeng (2020).
  • 3. National Bureau of Statistics. Statistical Bulletin of the People’s Republic of China on National Economic and Social Development in 2019. (2020).
  • 4. La, Q. N., Lee, A. H., Meuleners, L. B. & Van, D. D. Prevalence and factors associated with road traffic crash among taxi drivers in Hanoi, Vietnam. Accid. Anal. Prev. 50, 451–455 (2013).
  • 5. Legal Daily. The traffic accident rate of ride-hailing taxis is higher than that of traditional taxis in Shenzhen. http://www.xinhuanet.com/auto/2017-06/02/c_1121073559.htm (2017).
  • 6. Brazil, N. & Kirk, D. S. Uber and metropolitan traffic fatalities in the United States. Am. J. Epidemiol. 184, 192–198 (2016).
  • 7. Greenwood, B. N. & Wattal, S. Show me the way to go home: an empirical investigation of ride-sharing and alcohol related motor vehicle fatalities. MIS Q. 41, 163–187 (2017).
  • 8. Morrison, C. N., Jacoby, S. F., Dong, B., Delgado, M. K. & Wiebe, D. J. Ridesharing and motor vehicle crashes in 4 US cities: an interrupted time-series analysis. Am. J. Epidemiol. 187 (2), 224–232 (2018).
  • 9. Barrios, J. M., Hochberg, Y. V. & Yi, H. The cost of convenience: ridesharing and traffic fatalities. J. Oper. Manag. 27 (2020).
  • 10. Kirk, D. S., Cavalli, N. & Brazil, N. The implications of ridehailing for risky driving and road accident injuries and fatalities. Soc. Sci. Med. 250, 112793 (2020).
  • 11. Wang, Y., Li, L. & Prato, C. G. The relation between working conditions, aberrant driving behaviour and crash propensity among taxi drivers in China. Accid. Anal. Prev. 126, 17–24 (2019).
  • 12. Peng, Z., Wang, Y. & Luo, X. How does financial burden influence the crash rate among taxi drivers? A self-reported questionnaire study in China. Traffic Inj. Prev. 21 (5), 324–329 (2020).
  • 13. Mao, H. et al. Driving safety assessment for ride-hailing drivers. Accid. Anal. Prev. 149, 105574 (2020).
  • 14. Smolensky, M. H., Milia, L. D., Ohayon, M. M. & Philip, P. Sleep disorders, medical conditions, and road accident risk. Accid. Anal. Prev. 43 (2), 533–548 (2011).
  • 15. Peng, Z., Zhang, H. & Wang, Y. Work-related factors, fatigue, risky behaviours and traffic accidents among taxi drivers: a comparative analysis among age groups. Int. J. Inj Control Sa. 28 (1), 58–67 (2021).
  • 16. Acheampong, R. A. Societal impacts of smart, digital platform mobility services-an empirical study and policy implications of passenger safety and security in ride-hailing. Case Stud. Transp. Policy 9 (1), 302–314 (2021).
  • 17. Nguyen-Phuoc, D. Q., Gruyter, C. D., Nguyen, H. A., Nguyen, T. & Su, D. N. Risky behaviours associated with traffic crashes among app-based motorcycle taxi drivers in Vietnam. Transp. Res. Part F Traffic Psychol. Behav. 70, 249–259 (2020).
  • 18. Truong, L. T. & Nguyen, H. T. T. Mobile phone related crashes among motorcycle taxi drivers. Accid. Anal. Prev. 132, 105288 (2019).
  • 19. Ansar, M. S., Ma, Y. F., Chen, S. Y., Tang, K. & Zhang, Z. Y. Investigating the trip configured causal effect of distracted driving on aggressive driving behavior for e-hailing taxi drivers. J. Traffic Transp. Eng-Engl Ed. 8 (5), 725–734 (2021).
  • 20. Kwon, S., Kim, H., Kim, G. S. & Cho, E. Fatigue and poor sleep are associated with driving risk among Korean occupational drivers. J. Transp. Health 14, 100572 (2019).
  • 21. Peng, Z. P., Pan, H. Y., Yuan, R. T. & Wang, Y. G. A comparative analysis of risk factors influencing crash severity between full-time and part-time riding-hailing drivers in China. 1–16 (2024).
  • 22. Cao, Y., Xie, X. W., Peng, Z. P. & Wang, Y. G. Investigating crash mechanisms: occupational factors among traditional and ride-hailing taxi drivers. Traffic Inj. Prev. 1–11 (2025).
  • 23. Zhao, L., Wang, X. & Qian, Y. Analysis of factors that influence hazardous material transportation accidents based on Bayesian networks: a case study in China. Saf. Sci. 50 (4), 1049–1055 (2012).
  • 24. Oña, J. D., López, G., Mujalli, R. & Calvo, F. J. Analysis of traffic accidents on rural highways using latent class clustering and Bayesian networks. Accid. Anal. Prev. 51, 1–10 (2013).
  • 25. Baksh, A. A., Khan, F., Gadag, V. & Ferdous, R. Network based approach for predictive accident modelling. Saf. Sci. 80, 274–287 (2015).
  • 26. Mbakwe, A. C., Saka, A. A., Choi, K. & Lee, Y. J. Alternative method of highway traffic safety analysis for developing countries using Delphi technique and Bayesian network. Accid. Anal. Prev. 93, 135–146 (2016).
  • 27. Das, S., Vierkant, V., Gonzalez, J. C., Kutela, B. & Sheykhfard, A. Bayesian network for motorcycle crash severity analysis. Transp. Res. Rec. 2677 (11), 51–63 (2023).
  • 28. Naghavi, K. Z., Mortazavi, S. B., Asilian, M. H. & Hajizadeh, E. Ranking the occupational incident contributory factors: A Bayesian network model for the petroleum industry. Process. Saf. Environ. 137, 352–357 (2020).
  • 29. Chen, C. et al. A multinomial logit model-Bayesian network hybrid approach for driver injury severity analyses in rear-end crashes. Accid. Anal. Prev. 80, 76–88 (2015).
  • 30. Kutela, B., Dzinyela, R., Haule, H., Sheykhfard, A. & Msechu, K. Leveraging autonomous vehicles crash narratives to understand the patterns of parking-related crashes. Traffic Saf. Res. 4 (2023).
  • 31. Yu, R. & Abdel-Aty, M. Analyzing crash injury severity for a mountainous freeway incorporating real-time traffic and weather data. Saf. Sci. 63, 50–56 (2014).
  • 32. Iranitalab, A. & Khattak, A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 108, 27–36 (2017).
  • 33. Bao, J., Liu, P., Qin, X. & Zhou, H. G. Understanding the effects of trip patterns on spatially aggregated crashes with large-scale taxi GPS data. Accid. Anal. Prev. 120, 281–294 (2018).
  • 34. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  • 35. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann Publishers, Inc, 1988).
  • 36. Kjaerulff, U. B. & Madsen, A. L. Bayesian Networks and Influence Diagrams: A Guide To Construction and Analysis 144 (Springer, 2008).
  • 37. Spirtes, P. & Glymour, C. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 9 (1), 62–72 (1991).
  • 38. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol). 1–38 (1977).
  • 39. Svetnik, V. et al. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 (2003).
  • 40. Zhou, Q., Fang, D. P. & Wang, X. M. A method to identify strategies for the improvement of human safety behavior by considering safety climate and personal experience. Saf. Sci. 46, 1406–1419 (2008).
  • 41. Wang, Y. & Prato, C. G. Determinants of injury severity for truck crashes on mountain expressways in China: A case-study with a partial proportional odds model. Saf. Sci. 117, 100–107 (2019).
  • 42. Zhang, W. H. et al. The fitness-to-drive of shift-work taxi drivers with obstructive sleep apnea: an investigation of self-reported driver behaviour and skill. Transp. Res. Part F Traffic Psychol. Behav. 59, 545–554 (2018).
  • 43. Kingsley, M., Joey, J., Ma, L. & Zhang, W. Modeling and mitigating fatigue-related accident risk of taxi drivers. Accid. Anal. Prev. 123, 79–87 (2019).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group