Abstract
Background
Expenditure on driver-related behavioral interventions and road use policy is often justified by their impact on the frequency of fatal and serious injury crashes. Given the rarity of fatal and serious injury crashes, the offense history and crash history of drivers are sometimes used as an alternative measure of the impact of interventions and changes to policy. The primary purpose of this systematic review was to assess the rigor of statistical modeling used to predict fatal and serious crashes from offense history and crash history using a purpose-made quality assessment tool. A secondary purpose was to explore study outcomes.
Methods
Only studies that used observational data and presented a statistical model of crash prediction from offense history or crash history were included. A quality assessment tool was developed for the systematic evaluation of statistical quality indicators across studies. The search was conducted in June 2019.
Results
One thousand one hundred and five unique records were identified and 252 full texts were screened for inclusion, resulting in 20 studies being included in the review. The results indicate substantial and important limitations in the modeling methods used. Most studies demonstrated poor statistical rigor, ranging from low to middle quality. Confidence in the published findings was limited by poor variable selection, poor adherence to statistical assumptions relating to multicollinearity, and a lack of validation using new data.
Conclusions
It was concluded that future research should consider machine learning to overcome correlations in the data, use rigorous vetting procedures to identify predictor variables, and validate statistical models using new data to improve utility and generalizability of models.
Systematic review registration
PROSPERO CRD42019137081
Keywords: Systematic review, Quality assessment tool, Crash, Traffic, Offense, Statistics, Statistical modeling, Driver offenses, Crash history
Background
Expenditure on driver-related behavioral interventions and road use policy is often justified by their impact on the frequency of fatal and serious injury (FSI) crashes. Fatal and serious injury includes death and injuries that endanger human life, including fetuses. Injury can be acute, cumulative, and protracted [1]. Because FSI crashes have become rarer, owing to factors such as improved vehicle design and road infrastructure, reliably evaluating the short- and medium-term impact of interventions on FSI crashes is challenging. On the other hand, traffic offenses (e.g., speeding and disobeying traffic lights) are much more frequent than FSI crashes [2, 3]. This has led to organizations using offending patterns as a proxy measure to evaluate the effectiveness of new interventions and policies targeting the reduction of FSI crashes.
Predictive models of FSI crashes can be complex and include variables from multiple domains. Environmental factors, road conditions, legal factors, licensing factors, and driver characteristics have all been found to contribute to FSI crash involvement [4–12]. Offense history (i.e., the number of traffic infringements a driver has incurred) and crash history (i.e., the number of crashes a driver has been involved in) have also frequently been found to be useful predictors of future FSI crashes [13–15]. Offense histories that include repeated violations over time, such as exceeding the speed limit and failure to obey road signs, have been found to increase crash risk. The increase in crash risk is particularly high when repeated violations lead to license suspension or revocation. Involvement in multiple crashes over time is a stronger predictor of future crash involvement than traffic violations [13, 14, 16]. Indeed, even particularly risky offenses do not appear to always increase the risk of FSI crashes. For example, Leal and Watson [17] reported that for those who engage in illegal street racing only 3.7% of offenses result in crashes, none of them being fatal.
A handful of reviews have investigated the modeling of FSI crashes [18–22]. A common critique in these reviews concerns the suitability of the traditional statistical techniques that have been applied to these data [22]. Considering the wide-ranging implications of developing and employing a statistical model to help inform decisions around policy and funding, it is vital that models are developed using rigorous and suitable methods, producing models that can be understood by non-technical audiences. Our review emphasizes the statistical approaches and methodologies applied to modeling FSI crash data. However, we were unable to find an existing quality assessment tool that met this need and have therefore developed our own.
Further, while statistical heterogeneity between primary studies is regularly noted as a limitation in systematic reviews and meta-analyses, detail such as the rigor and suitability of the modeling methods, as well as the ease of model interpretation is rarely discussed [9, 23–27]. In this review, we focus on an area of importance to policy makers. We focus exclusively on how offense history and crash history predict future FSI crashes and how such associations are best modeled. No previous reviews have had this focus and a new quality assessment tool has been developed specifically for this review because nothing suitable could be found.
Aims
The overall research question was to assess the state of evidence for the prediction of FSI crashes from offense history and/or crash history by completing a systematic review of published literature and grey literature. Based on the failure to identify a systematic review focused on our research question and failure to identify a statistical quality assessment tool, two specific aims were formed. The first aim was to determine the type and quality of statistical analyses applied to the prediction of FSI crashes from driver offense and crash history. The quality of the papers included in the review was assessed based on the reporting of statistical assumptions, the reporting of statistical results, and the reporting of considerations specific to the statistical methods used. The second aim was to summarize the evidence and outcomes of studies that include offense history and crash history as predictors of FSI crashes.
Methods
Protocol and registration
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) were followed for the review [28]. The protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO), registration number: CRD42019137081.
Inclusion and exclusion criteria
Individuals and outcome of interest
There were no restrictions on the type of individuals (i.e., drivers) included in the review. Efficacy and outcomes of interventions were not the subject of this review. Rather, we were interested in how the prediction of FSI road crashes has been modeled when described by driver offense and crash history. FSI crashes are the outcome of interest. A fatality is defined as a death that results from a crash, while serious injury resulting from a crash is defined as long-term impairment or loss of body function, permanent serious disfigurement, severe long-term mental or behavioral disturbance or disorder, or loss of a fetus [1]. Studies that examined crashes that did not result in fatalities or serious injury were not included in this review.
Data
Only models based on observational data were included. Data derived from laboratory tests, simulated data, self-report measures, and driving simulations were excluded. Studies that presented models built on qualitative data or on mixed qualitative and quantitative data were also excluded. Instantaneous traffic data used in real-time traffic crash prediction were excluded because these models typically apply only to a small section of the road.
Model specifications
To be included in the review, studies must have presented a quantitative model predicting FSI crashes for individual drivers. That is, FSI crashes must be the outcome variable. A model was excluded from the review if the dependent variable included only minor crashes, such as when those involved in the accident received cuts and bruises and the vehicle had minor/repairable damage. When the dependent variable included a combination of serious and minor crashes, the model was excluded. Models had to describe crash risk based on driver offense and/or crash history. Models using longitudinal data were included. Models that considered only summarized country or state level offense and crash data, rather than individual driver data as the unit of interest, were excluded.
Study design
A study or report was included if it was a primary study that presented a quantitative model predicting FSI crashes. Study designs that were excluded included those which reported only qualitative data, reviews, meta-analyses, case studies, and any study that did not present a model for crash prediction.
Publication date
Studies and reports published prior to 1984 were excluded because data in a digital format were rare prior to this date, making statistical analysis difficult.
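The inclusion and exclusion criteria above can be read as a checklist applied to each record. The following is a minimal sketch of that reading, not the procedure the authors used (screening was done manually). The `Record` fields and the `exclusion_reasons` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical summary of one screened study (all field names are assumptions)."""
    year: int
    primary_study: bool                   # not a review, meta-analysis, or case study
    presents_quantitative_model: bool     # a statistical model of crash prediction
    observational_data: bool              # not simulated, laboratory, or self-report data
    outcome_fsi_only: bool                # outcome is FSI crashes, with no minor crashes mixed in
    driver_level: bool                    # individual drivers, not country or state summaries
    uses_offense_or_crash_history: bool   # offense and/or crash history as predictor


def exclusion_reasons(record: Record) -> List[str]:
    """Return every criterion the record fails; an empty list means 'include'."""
    reasons = []
    if record.year < 1984:
        reasons.append("published before 1984")
    if not record.primary_study:
        reasons.append("not a primary study")
    if not record.presents_quantitative_model:
        reasons.append("no quantitative crash prediction model")
    if not record.observational_data:
        reasons.append("data not observational")
    if not record.outcome_fsi_only:
        reasons.append("outcome is not FSI crashes only")
    if not record.driver_level:
        reasons.append("unit of analysis is not the individual driver")
    if not record.uses_offense_or_crash_history:
        reasons.append("no offense or crash history predictor")
    return reasons


# Hypothetical records, for illustration only.
eligible = Record(2010, True, True, True, True, True, True)
too_old = Record(1980, True, True, True, True, True, True)
```

Returning the list of failed criteria, rather than a single yes/no, mirrors the audit trail in which reasons for exclusion were recorded.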
Search strategy
Four academic databases were systematically searched: Australian Transport Index: ATRI database (via Informit); Transportation Research International Documentation (ITRD) database; Scopus; and Web of Science. Search terms were chosen to identify papers that included statistical models, serious crashes, and variables related to the driver (Appendix S1). For example, for Scopus we used the following search string: (predict* OR model) AND (“serious crash*” OR “serious accident” OR “serious collision” OR “fatal accident” OR “fatal crash” OR “fatal collision” OR “road deaths” OR “road fatal*” OR “traffic fatal*” OR “collision fatal*” OR “accident fatal*”) AND (driver).
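The Scopus search string has a simple structure: three concept groups (model, serious crash, driver), with terms joined by OR inside a group and the groups joined by AND. As a minimal sketch of that structure, the string can be assembled from term lists. The helper names `or_group` and `build_query` are hypothetical, and straight quotes are used in place of the typographic quotes shown above.

```python
def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Join the concept groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


# Term lists copied from the Scopus example in the text.
model_terms = ["predict*", "model"]
crash_terms = [
    "serious crash*", "serious accident", "serious collision",
    "fatal accident", "fatal crash", "fatal collision",
    "road deaths", "road fatal*", "traffic fatal*",
    "collision fatal*", "accident fatal*",
]
driver_terms = ["driver"]

query = build_query(model_terms, crash_terms, driver_terms)
print(query)
```

Keeping the terms in lists makes it easier to adapt the same concepts to the syntax of the other databases, although each database's own field codes and wildcard rules would still need to be checked.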
Grey literature was also searched for government reports of crash prediction. The grey literature search was focused on, but not limited to, high-income countries that have low road traffic fatality rates, as indicated by the World Health Organization [29] (i.e., < 19.9 deaths per 100,000 population per annum). This was to increase the chance of identifying relevant reports in relation to the time spent searching. The search for grey literature utilized Google and Google Scholar to identify relevant international organizations, their websites, and information repositories. A total of 58 official government and statistics websites were searched from 38 countries (Appendix S2).
A forward and backward search was conducted on the articles that met the inclusion criteria. Forward searching involved identifying articles that cited the included study. Forward searching was conducted in Scopus. The backward search consisted of screening the references of the included studies (Appendix S3).
Selection of studies
Initial search records were recorded in a Word document. Every article title and the first author from each article were systematically entered into the “find” function to identify and remove duplicates. The first author conducted all searches, eliminated duplicates, screened titles and abstracts against the inclusion/exclusion criteria, and produced a list for full-text screening. If the full text was not available, the record was excluded from the review at the title/abstract screen stage. An audit trail was created using an inclusion/exclusion checklist for the full-text screen in Excel, in which reasons for exclusion were recorded. Two authors screened studies at the full-text stage in Excel; discrepancies were resolved by a third author.
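Duplicates were identified here by hand, by searching for each title and first author. A programmatic analogue is sketched below under stated assumptions: records are dictionaries with hypothetical `title` and `first_author` keys, and two records count as duplicates when both fields match after case and punctuation are normalized. This illustrates the idea only and is not the authors' procedure.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def deduplicate(records):
    """Keep the first occurrence of each (title, first author) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record["title"]), normalize(record["first_author"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, for illustration only.
records = [
    {"title": "Driver history and fatal crashes", "first_author": "Smith"},
    {"title": "Driver History and Fatal Crashes.", "first_author": "smith"},
    {"title": "Offense records as crash predictors", "first_author": "Jones"},
]
unique_records = deduplicate(records)
```

Exact matching on normalized fields will miss duplicates with differently worded titles, so a manual check of near matches would still be needed.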
Quality assessment tool
Statistical models were assessed in three areas using a purpose-made statistical quality assessment tool. The first area was the degree to which assumptions were met and whether the method used in the study was appropriate. The second was the validation of the statistical model presented, including whether the model had been validated using fresh data, allowing an evaluation of generalizability. The third was how adequately the authors reported the analysis procedures used and the results.
The quality assessment tool was based on the Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines [30]. Items from the guidelines were rewritten as questions and additional items were added when required (Table 1). The tool was divided into three sub-sections. The “Reporting of statistical methods” sub-section included assessments of data quality (5 items), preliminary analyses (1 item), primary analyses (11 items), and supplemental analyses (4 items). The “Reporting of results” sub-section was made up of items assessing the reporting of numbers, descriptive statistics and averages (8 items), reporting risk ratios (5 items), and validation (2 items). The third section assessed “Method specific quality indicators” for regression analysis (10 items), survival analysis (13 items), and structural equation modeling (5 items). Studies were not assessed on items irrelevant to the methods used.
Table 1.
Quality assessment tool
| Sub-section and area of interest | Items |
|---|---|
| Section 1—Reporting of statistical analysis | |
| Data quality | Does the study report information about the research population? Does the study report what type of data has been used for the analysis, whether it is primary data (collected initially for the study) or secondary data (from a different source)? If the study used secondary data, does the study name the databases which have been used for the analysis? If the study has used different databases, does the study describe any linkage between the databases? Does the study report whether the data set is a representative data set? |
| Preliminary analyses | Does the study report any statistical procedures used to modify raw data before the analysis? |
| Primary analyses | Does the study describe the purpose of the analysis? Does the study identify the variables used in the analysis and summarize each with descriptive statistics? Does the study describe the main methods fully, for analyzing the primary objectives of the study? Is the study clear on which method is used for each analysis, rather than just listing all statistical methods used in one place? If the method includes any assumptions, does the study verify that the data conformed to the assumptions of the test used to analyze them? Does the study indicate whether and how any allowance or adjustments were made for multiple comparisons (performing multiple hypothesis tests on the same data)? Does the study report how it deals with missing data? If relevant, does the study report how any outlying data were treated in the analysis? Does the study report the alpha level (e.g., 0.05) that defines statistical significance? Does the study report the name of the statistical package or program used in the analysis? If the study needs to control any variables for its objective, does the study report it properly? |
| Supplementary analyses | Does the study describe sensitivity analyses if applicable? Does the study test for the underlying assumptions of the methods used in the analysis? Does the study identify post hoc analyses, including unplanned subgroup analyses, as exploratory? If an imbalance exists in the outcome variable of the data set, does the study report how the training phase overcame this issue? |
| Section 2—Reporting of results | |
| Reporting numbers, descriptive statistics, and averages | Does the study report numbers, especially measurements, with an appropriate degree of precision, rounded to a reasonable extent for ease of comprehension and simplicity? Does the study report total sample and group sizes for each analysis? Does the study report numerators and denominators for all percentages? Does the study summarize data that are approximately normally distributed with means and standard deviations (SD), using the form mean (SD), not mean ± SD? Does the study summarize data that are not normally distributed with medians and interpercentile ranges, ranges, or both (reporting the upper and lower boundaries of interpercentile ranges and the minimum and maximum values of ranges, not just the size of the range)? Does the study report the variability of the data set using either standard deviations, interpercentile ranges, or ranges (the SE is an inferential statistic, about a 68% confidence interval, not a descriptive statistic)? Does the study display summarized or exact data in tables? Does the study display data in figures (tables present exact values, and figures provide an overall assessment of the data)? |
| Reporting risk and ratios | Does the study describe the type of rate (e.g., incidence rates; survival rates), ratio (e.g., odds ratios; hazard ratios), or risk (e.g., absolute risks; relative risk differences) being reported? Does the study describe the quantities represented in the numerator and denominator? Does the study report the time period over which each rate applies? Does the study report any unit of population (that is, the unit multiplier: e.g., × 100; × 10,000) associated with the rate? Does the study consider reporting a measure of precision (a confidence interval) for estimated risks, rates, and ratios? |
| Validation | Does the study describe methods of validation used in the training phase (e.g., cross validation, use of test/hold-out sample)? Does the study describe the attempts to generalize the model beyond the immediate context? |
| Section 3—Method specific quality indicators | |
| Regression analysis | Does the study describe the purpose of the analysis? Does the study confirm that the assumptions of the analysis were met? For example, in linear regression indicate whether an analysis of residuals confirmed the assumptions of linearity. Does the study report the regression equation for either simple or multiple (multivariable) regression analyses? For primary comparisons analyzed with simple linear regression analysis, does the study consider reporting the results graphically, in a scatter plot showing the regression line and its confidence bounds? Does the study report the alpha level used in the univariate analysis? Does the study report whether the variables were assessed for collinearity? Does the study report whether variables were assessed for interactions? Does the study describe the variable selection process by which the final model was developed (e.g., forward stepwise; best subset)? Does the study report the regression coefficients (beta weights) of each explanatory variable and the associated confidence intervals and P values, preferably in a table? Does the study provide a measure of the model’s “goodness-of-fit” to the data (the coefficient of determination, r², for simple regression and the coefficient of multiple determination, R², for multiple regression)? |
| Survival analysis | Does the study describe the purpose of the analysis? Does the study describe the dates or events that mark the beginning and the end of the time period analyzed? Does the study specify the circumstances under which data were censored? Does the study specify the statistical methods used to estimate the survival rate? Does the study confirm that the assumptions of survival analysis were met? For each group, give the estimated survival probability at appropriate follow-up times, with confidence intervals, and the number of participants at risk for death at each time. It is often more helpful to plot the cumulative probability of not surviving. Reporting median survival times, with confidence intervals, is often useful to allow the results to be compared with those of other studies? Does the study present the full results in a graph (e.g., a Kaplan-Meier plot) or table? Does the study specify the statistical methods used to compare two or more survival curves? Does the study report the P value, when comparing two or more survival curves with hypothesis tests? Does the study report the regression model used to assess the associations between the explanatory variables and survival or time-to-event? Does the study report a measure of risk (e.g., a hazard ratio) for each explanatory variable, with a confidence interval? |
| SEM models | Does the study report all the parameters and their standard errors? Does the study report the reason for the choice of a clear and complete form of path model structure? Does the study report the global indices of fit? Does the study provide reasons as justification for omitted directed and non-directed arcs? Does the study report alternative and equivalent models? |
Quality assessment procedure
Included studies were quality assessed by two independent authors (SM and SS) using the purpose-made quality assessment tool. The quality assessment tool was completed as a Google doc and automatically exported into an Excel file for analysis. Disagreements were resolved by a third author (DM). Items were allocated a score of 1 for yes, 0 for no, or NA for not applicable, with higher quality papers receiving higher quality scores. Sub-section quality scores were calculated by averaging responses to give a quality score ranging from 0, when no quality indicators were met, to 1, when all quality indicators were met. When an item was not applicable, that item was excluded from the mean score of that study’s rating. Each reviewer’s sub-section quality scores were then averaged to give that reviewer’s final quality score for each study. Finally, the two reviewers’ independently derived quality scores were averaged to give an overall quality measure for each study. Studies were then categorized as scoring low (0 to 0.333), medium (0.334 to 0.666), or high (0.667 to 1.00); that is, the score range from 0 to 1 was divided into three equal bands (1/3 = 0.333).
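The scoring procedure can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not the authors' spreadsheet: item responses are coded 1, 0, or `None` (standing in for NA), each reviewer's ratings are a dictionary keyed by hypothetical sub-section names, a sub-section with no applicable items is skipped, and scores are rounded to three decimals before banding so that the published cut points (0.333 and 0.666) apply directly.

```python
from statistics import mean


def subsection_score(items):
    """Mean of the applicable items (1 = yes, 0 = no); None marks NA and is excluded."""
    answered = [x for x in items if x is not None]
    return mean(answered) if answered else None


def reviewer_score(subsections):
    """Average one reviewer's sub-section scores, skipping wholly inapplicable ones."""
    scores = [subsection_score(items) for items in subsections.values()]
    return mean(s for s in scores if s is not None)


def overall_score(reviewer_a, reviewer_b):
    """Average the two reviewers' independently derived scores for one study."""
    return mean([reviewer_score(reviewer_a), reviewer_score(reviewer_b)])


def categorize(score):
    """Band a 0 to 1 score as low, medium, or high using the published cut points."""
    rounded = round(score, 3)
    if rounded <= 0.333:
        return "low"
    if rounded <= 0.666:
        return "medium"
    return "high"


# Hypothetical ratings for one study, for illustration only.
reviewer_a = {
    "methods": [1, 0, 1, None],
    "results": [1, 1, 0, 0],
    "method_specific": [0, 0, 1, None],
}
reviewer_b = {
    "methods": [1, 1, 1, 1],
    "results": [1, 1, 0, 0],
    "method_specific": [None, None, None, None],
}
study_score = overall_score(reviewer_a, reviewer_b)
study_band = categorize(study_score)
```

Because NA items are dropped before averaging, a study is not penalized for quality indicators that do not apply to its statistical method.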
Results
The flow of studies from identification, through screening for eligibility, to the final synthesis is presented in the PRISMA flowchart (Fig. 1). One thousand one hundred and five records were identified. Of the 252 studies included in the full-text screen (Appendix S4), 243 were identified in the database search and nine were identified in the forward and backward search. Agreement between the two reviewers (RS and SM) of the full texts for inclusion was low (k = 0.469, p < 0.00). These disagreements were resolved by a third author (SM), resulting in twenty studies being included in the quality assessment, comprising data from a total of 2,379,862 individuals. The statistical techniques, findings, and characteristics of the included studies are presented in Table 2.
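Assuming the agreement statistic k reported above is Cohen's kappa, it compares the observed proportion of matching decisions with the proportion expected by chance from each reviewer's marginal rates. The sketch below shows that calculation on hypothetical include/exclude decisions; it does not use the review's data, and the function name `cohens_kappa` is introduced here for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical full-text decisions (1 = include, 0 = exclude), for illustration only.
reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reviewer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this made-up example the reviewers agree on 7 of 10 items, but chance alone would produce agreement on 54% of them, which is why kappa is much lower than the raw agreement rate.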
Fig. 1.
PRISMA flowchart
Table 2.
Details of study characteristics
| Study, country, vehicle type, quality | Design, data type, number of drivers/crashes | Independent predictor | Confounders controlled | Dependent variable | Statistical technique used, measure of risk | Main finding of interest |
|---|---|---|---|---|---|---|
| Lui and Marchbanks [31], USA, cars, high | Longitudinal, FARS (1984‑1986), population, drivers that have had a speeding conviction (n = 46,266) | Date of prior crash, suspension or conviction | None | Date of fatal crash | Survival analysis | Involvement in a fatal crash will occur by 5 years post the prior crash, suspension, or conviction. |
| Perneger and Smith [32], USA, cars and light pick ups, mid | Matched-pairs design, FARS (1986), population, two-car crashes (n = 6506) | Invalid license, prior DWI, prior suspension, prior speeding, prior crash within 12 months, 13‑24 and 25‑36 months ago | Environment, exposure to traffic, differences in case fatality | Culpable or non-culpable drivers | Logistic regression, odds ratio | Invalid driving license, prior DWI, and prior license suspension increases the likelihood of initiating a fatal crash. |
| Rajalin [33], Finland, truck, bus, and drivers over 65 years not included, low | Case-control design, (Study 1) VALT and traffic offense register, sample, drivers involved in fatal crash (n = 615), random selection of license holders (n = 776) | Offense rate | Distance driven | Drivers in fatal crashes or control | Logistic regression, odds ratio | Drivers involved in a fatal crash have an offense rate 1.51 times higher than control drivers. |
| Cooper [34], Canada, all vehicles, mid | Case-control design, police crash data and driver records in British Columbia (1991‑1994), population, with speeding conviction (n = 877,758), with other conviction (n = 154,103) | Thirteen types of prior traffic convictions (e.g., speeding, not obeying signal, failure to yield) | None | Rate of crashes per driver | Logistic regression, estimates | All conviction types contribute to increasing the chance of being in a subsequent FSI crash, convictions associated with speeding and DWI were the most important predictors. |
| Wundersitz et al. [35], Australia, all vehicles, low | Matched-pairs design (drivers paired from same fatal crash group), TARS (1999 to 2002), South Australian population, culpable drivers (n = 182), non-culpable drivers (n = 206) | Four offense types and grouped offenses over 5 years prior to the fatal crash, crashes and culpable crashes over 5 years prior to fatal crash | None | Culpable vs non-culpable involvement in fatal crash | Not reported, odds ratio | Not significant |
| Kim et al. [36], USA, all vehicles, mid | Case-control design, NPTS and FARS (1995‑1997) population, n not reported | Previous traffic offense yes/no | None | Probability of survival, probability of crash | Logistic regression, odds ratio | If a previous traffic offense is present it decreases the likelihood of surviving a one or two-vehicle crash but increases the likelihood of surviving a multi-vehicle crash. A previous traffic offense increases the likelihood of having a crash for one, two, and multi-vehicle crashes. |
| Blower and Green [37], USA, buses, mid | Case-control design, BIFA from FARS (1999‑2005), population, drivers (n = 2102) | Previous violation or not in previous 3 years, previous crashes or not in previous 3 years | Type of bus service: school, intercity, charter/tour, other | Driver error in fatal crash or not | Logistic regression, odds ratio | Previous violations and crashes each increase the chance of the driver making an error by 1.3 times. Drivers with a violation on their record are 27% more likely to commit an error in a fatal crash than those without a violation. |
| Malchose and Vachal [38], USA, all vehicles, mid | Case-control design (involved in FSI crash or not), NDDL, (2006 to 2009), North Dakota population, crashes (n = 317), teen drivers aged 14 to 17 years (n = 20,392) | No-risk convictions (e.g., parking), risk convictions (e.g., speeding), previous property damage only crash, in the first year or prior to first fatal or injury crash, previous at-fault property damage only crash, in the first year or prior to first fatal | Gender, age risk convictions, rural/urban | Involvement in a fatal or injury crash in the first year of licensure or not | Logistic regression, log odds | Drivers with previous convictions are 0.5 times as likely, and drivers with previous property damage only crash histories are 25.5 times as likely, to be involved in a fatal or injury crash. |
| Lueck and Murry [39], USA, truck involved crashes, low | Case-control design, MCMIS and CDLIS, population, crashes (n = 30,090), drivers in a crash or roadside inspection (n = 587,772) | The ability of those with and without a violation/conviction and crash history (2008) | None | Crashes (2009) | Not reported, reported percentages | Twenty-three of the 34 independent variables significantly increased the chance of being in a crash, with increases ranging from 18% to 96%. Any previous conviction increased the chance of being in a crash by 56%. A prior crash increased the chance of being in a future crash by 88%. |
| Gates et al. [40], USA, heavy trucks, > 26,000 lbs, mid | Case-control design, FARS (1993‑2008), population, driver > 20 years old involved in a fatal crash (n = 65,867) | Stimulants present or absent | Past 3 years, prior crashes, DWI, speeding infractions, other infractions, suspensions | Responsibility for crash measured by UDA preceding the crash | Logistic regression, reported percentages | Odds of a UDA increased by: 30% (one or more prior crashes); 24% (other moving conviction); 26% (one prior suspension); 14% (previous speeding conviction) |
| Factor [41], Israel, all vehicles, mid | Case-control design, ICBS (2002‑2008), 20% representative sample, drivers (n = 409,051) | Traffic tickets per year (over 13 years) | Daily distance traveled, gender, age, religion, education, social class, vehicle type | Driver involved in FSI crash or not | Logistic regression, odds ratio | As traffic tickets increase so too do the odds of being in an FSI crash. |
| Reguly et al. [42], USA, heavy trucks, > 26,000 lbs, mid | Case-control design, FARS (1993‑2008), population, negative drug test (n = 8325), positive drug test (n = 102) | Opioid analgesic (drivers positive or negative drug test) | Collisions, DWI convictions, other convictions, speeding, license suspensions | Responsibility for crash measured by UDA preceding the crash | Logistic regression, estimate result, odds ratio | Odds of an unsafe driver action increased if the driver had a previous crash, a driving infraction or speeding infraction in the past 3 years. |
| Dubois et al. [43], USA; passenger vehicles, sport-utility vehicles, and light trucks; high | Case-control design, FARS (1991‑2008), population, driver > 20 years old involved in a fatal crash (n = 150,010) | BAC, cannabis, BAC and cannabis (drivers positive or negative drug test) | History of one, two, or three or more of crashes, DWI, speeding, suspensions | Responsibility for crash measured by a UDA preceding the crash | Logistic regression, odds ratio | Odds of a UDA increased by: 13% (one prior crash); 39% (three or more prior crashes); 26% (one prior suspension); 33% (three or more prior suspensions) |
| Kumfer et al. [44], USA, all vehicles, high | Case-control design, FARS (2010‑2012) California, Michigan, New York, North Carolina, Texas, and Washington, representative sample of USA, driver in a single-vehicle crash (n = 5110), driver in a multivehicle crash (n = 9986) (passengers not included) | History of suspensions or revocations (none, one, other), year of last crash or license suspension (no record, two since 2005, other) | None | Multivehicle or single vehicle | Logistic regression, odds ratio | Not significant |
| Feng et al. [45], USA, bus involved crashes, mid | Case-control design, BIFA from FARS (2006 to 2010) population, drivers (n = 1380) | Cluster 1: “middle age drivers with history of driving violations”; Cluster 2: “young and elderly drivers with history of driving violations”; Cluster 3: “drivers without history of driving violations”; valid license or not | None | Level 1: crashes ≤ 2 fatalities; Level 2: crashes > 2, < 3 fatalities; Level 3: crashes ≥ 3 fatalities | Logistic regression, estimate result and odds ratio | Cluster 1, low chance of being involved in level 2 and 3 crashes; cluster 2, high chance of being in level 1 and 2 crashes; cluster 3, ns |
| Li et al. [46], USA, all vehicles except heavy trucks, mid | Matched-pairs design (drivers paired from same fatal two-vehicle crash), FARS (1993 to 2014), population, culpable driver (n = 14,742), non-culpable driver (n = 14,742) | Concurrent alcohol and marijuana use | Previous 3-year crash history, previous 3-year DWI conviction, previous 3-year speeding conviction | Culpability or non-culpability in a fatal crash | Logistic regression, odds ratio | Having a crash history, a DWI conviction, a speeding conviction, and license suspension in the previous 3 years increases the likelihood of culpability in a fatal crash. |
| Hamzeie et al. [47], USA, all vehicles, mid | Case control design, FARS fatal crashes (2010 to 2014), population, drivers with known injury severity (n = 74,632) | Cannabis use, five levels of injury severity* | Number of previous license suspensions, number of previous speeding violations | Cannabis use yes/no | Logistic regression, odds ratio | Those with speeding violations and those who have had their license suspended are associated with higher levels of injury in fatal crashes. |
| Stringer [48], USA, all vehicles, high | Longitudinal, FARS and GSS (1993 to 2015), population, drivers (n = 2326) | Normative behavior, values and beliefs, local attitudes | Non-DUI fatal crashes, repeat DUI offender crashes | Total frequency of DUI fatal crashes in each county | Poisson multi-level growth curve | Non-DUI fatal crashes and repeat DUI offender crashes significantly predict future DUI fatal crashes |
| Mashhadi et al. [49], USA, trucks, mid | Case-control design, CARE and WCRVD (2011‑2014), Wyoming data from 3 interstate highways, single truck crash at fault (n = 1654), multi-vehicle crash truck not at fault (n = 696), multi-vehicle crash truck at fault (n = 847) | Violation record | None | Fatality/injury level, being in a single truck crash | Logistic regression | Not significant |
| Yuan et al. [50], USA, truck involved crashes, mid | Within group (involved in FSI crashes), TIFA and FARS (2010), population, crashes (n = 1555), drivers/occupants n not reported | Driver factors (latent factor) made up of five observed measures; belt use, driving experience, history of conviction, history of crash, gender, valid license or not | Other latent variables | Truck occupant injury factors (latent), accident size (latent) | Structural equation modeling, standardized regression weights | Crash history (measured) has a large effect on driver factors (latent) which decreases accident size (latent). Prior suspension, speeding, and convictions (measured) make a fatal accident more severe. |
Country: the country from which the data were collected; TIFA: Trucks Involved in Fatal Accidents database; FARS: Fatality Analysis Reporting System; BIFA: Buses Involved in Fatal Accidents database; *: the KABCO scale; ns: not significant; MCMIS: Motor Carrier Management Information System database; CDLIS: Commercial Driver’s License Information System; NDDL: North Dakota Drivers’ License data; TARS: Traffic Accident Reporting System; NPTS: Nationwide Personal Transportation Survey; VALT: Traffic Safety Committee of Insurance Companies; GSS: General Social Survey from National Opinion Research Center; ICBS: Israel Central Bureau of Statistics; CARE: Critical Analysis Reporting Environment; WCRVD: Wyoming Court Reported Violation Database; DUI: driving under the influence; DWI: driving while intoxicated; FSI: fatal or serious injury crash; UDA: unsafe driving action (used as a proxy measure)
Vehicle type and population
Nine out of the 20 reviewed studies included all vehicle types (i.e., cars, motorcycles, and trucks). The remaining 11 studies explored the prediction of FSI crashes from offense history and crash history using data related to distinct vehicle types: buses (1 study), heavy trucks (5 studies), and cars and light trucks (4 studies). Seventeen studies used data representing the population: three countries were represented (USA, Finland, and Canada), as were two states (North Dakota in the USA and South Australia in Australia). Of the studies using sample data, one used a 20% representative sample based on Israeli census data, one used data from six states in the USA, and a third selected various highways within the USA state of Wyoming (Appendix S5).
Quality assessment
Overall, there was high agreement between the two reviewers of statistical quality (inter-class correlation coefficient 0.919; Appendix S6). Seventeen studies used logistic regression and most of these studies received quality scores in the middle range (low = 3, middle = 12, high = 2). The only study to use structural equation modeling also received a quality score in the middle range. One study used a survival analysis and one used a Poisson multi-level growth curve model; both received high quality scores. Quality scores tended to be higher for more recently published studies (Table 3). The improvement in quality seems to be led predominantly by improved “Reporting of statistical methods,” while the method specific quality indicators provide the poorest results (Fig. 2).
Table 3.
Quality tool assessment results
| Studies grouped by statistical technique | Reporting of statistical methods | Reporting of statistical results | Method specific quality indicators | Overall quality score |
|---|---|---|---|---|
| Logistic regression | ||||
| Perneger and Smith 1991 | 0.704 | 0.552 | 0.299 | 0.463 |
| Rajalin 1994 | 0.193 | 0.391 | 0.000 | 0.146 |
| Cooper 1997 | 0.372 | 0.188 | 0.444 | 0.362 |
| Wundersitz et al. 2004 | 0.435 | 0.525 | 0.167 | 0.323 |
| Kim et al. 2006 | 0.470 | 0.432 | 0.278 | 0.365 |
| Blower and Green 2010 | 0.675 | 0.590 | 0.556 | 0.594 |
| Malchose and Vachal 2011 | 0.646 | 0.449 | 0.500 | 0.524 |
| Lueck and Murry 2011 | 0.333 | 0.382 | 0.111 | 0.234 |
| Gates et al. 2013 | 0.741 | 0.617 | 0.611 | 0.645 |
| Factor 2014 | 0.880 | 0.576 | 0.389 | 0.558 |
| Reguly et al. 2014 | 0.907 | 0.636 | 0.556 | 0.664 |
| Dubois et al. 2015 | 0.869 | 0.750 | 0.611 | 0.710 |
| Kumfer et al. 2015 | 0.938 | 0.651 | 0.611 | 0.703 |
| Feng et al. 2016 | 0.750 | 0.617 | 0.500 | 0.592 |
| Li et al. 2017 | 0.697 | 0.464 | 0.997 | 0.755 |
| Hamzeie et al. 2017 | 0.741 | 0.571 | 0.500 | 0.578 |
| Mashhadi et al. 2018 | 0.696 | 0.569 | 0.500 | 0.566 |
| Structural equation modeling | ||||
| Yuan et al. 2019 | 1.000 | 0.387 | 0.600 | 0.647 |
| Survival analysis | ||||
| Lui and Marchbanks 1990 | 0.481 | 0.625 | 0.808 | 0.680 |
| Poisson multi-level growth curve | ||||
| Stringer 2018 | 0.958 | 0.504 | 0.778 | 0.755 |
Fig. 2.
Quality assessment sub-section scores for included studies
Qualitative synthesis of study outcomes
Using binary logistic regression analyses, seven studies reported that offense history predicts FSI crash involvement. Perneger and Smith [32] found that having an invalid driving license, a license suspension, or a prior DWI (driving while intoxicated) conviction increases the likelihood of initiating a fatal crash. Cooper [34] found that fourteen conviction types increase the chance of being in a subsequent FSI crash, with speeding and DWI convictions being the most important. These findings are supported by Factor [41], who used traffic infringement tickets as a predictor variable, and Rajalin [33], who compared drivers who had had an FSI crash to those who had not. Kim et al. [36] add further nuance to the role of offense history: offense history increased the likelihood of having a crash regardless of the number of vehicles involved. Additionally, those with previous traffic offenses were less likely to survive a one- or two-vehicle crash, but more likely to survive a crash involving three or more vehicles. Speeding violations and license suspensions, rather than any other type of offense, were associated with increased injury severity [47].
Six studies using binary logistic regression reported that both crash history and offense history increase the likelihood of FSI crash involvement. The odds of a driver making an error directly leading to an FSI crash increased by approximately 27% if they had been responsible for a previous crash, a driving infraction/violation, or a speeding infraction [37, 38, 42]. Specifically, the odds of driver error leading to an FSI crash increased by 13% if the driver had one prior crash [43], 30% if the driver had one or more prior crashes [40], and 39% if the driver had three or more prior crashes [43]. Prior license suspension was a strong predictor of driver error leading to an FSI crash: one suspension increased the odds by 26% [40, 43] and three or more prior suspensions increased the odds by 33% [43]. Not only are offense history and crash history associated with driver errors causing an FSI crash, but they are also associated with the likelihood of the driver being legally culpable. Predictors of being legally culpable include a crash history, a DWI conviction, a speeding conviction, and a license suspension in the 3 years prior to the crash [46]. Property-damage-only crashes also predicted FSI crash involvement [38]. Feng et al. [45] took a unique approach by grouping drivers on several characteristics. Middle-aged drivers with a history of convictions had a low chance of being involved in a crash involving more than two fatalities. Young and elderly drivers with a history of violations had a high chance of being involved in a crash involving fewer than three fatalities [45].
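The percentage increases quoted above are a restatement of odds ratios. As a minimal sketch of that conversion, assuming the percentages were derived as (odds ratio − 1) × 100 (the reviewed papers are not quoted here on this point), and using illustrative odds ratios chosen only to reproduce two of the percentages in the text:

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_from_coefficient(beta: float) -> float:
    """A logistic regression coefficient is a log odds ratio."""
    return math.exp(beta)


# Illustrative values only (assumed, not taken from the reviewed papers).
examples = [("one prior crash", 1.13), ("three or more prior crashes", 1.39)]
for label, odds_ratio in examples:
    change = percent_change_in_odds(odds_ratio)
    print(f"{label}: odds ratio {odds_ratio} = {change:.0f}% higher odds")
```

Note that a change in odds is not the same as a change in probability; the two are close only when the outcome is rare.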
Analyses other than logistic regression produced comparable findings. Based on longitudinal data, Stringer [48] found that crash history and offense history significantly predict future fatal crashes, with crash history being the more important predictor. Yuan et al. [50] found that prior suspension and speeding convictions are associated with fatal accidents. Finally, Lui and Marchbanks [31] found that should a fatal crash occur after a license suspension, a conviction, or a prior crash, it is likely to occur within 5 years. While the predictors of FSI crashes have been identified, these findings cannot be verified given that most of these studies were assessed as having only low to middle quality in our quality assessment.
Critical discussion
Strengths of logistic regression in this context
There are several reasons for the popularity of logistic regression. Logistic regression has many different forms: conditional logistic regression for matched-pairs data [32, 45]; ordinal and nominal logistic regression when the dependent variable has several categories [45, 47]; direct logistic regression when no predictor variables are considered more or less important than the others [37]; sequential logistic regression when confounding variables need to be controlled [41, 43]; stepwise logistic regression when an exploratory approach is needed [49]; and censored regression when data for the dependent variable are incomplete [36]. Further, the assumptions for logistic regression are lenient. Logistic regression can also perform several functions: it can predict group membership, identify important predictors, identify interactions among predictors, and provide odds ratios for quantifying the effects of predictors. Moreover, the accuracy of logistic regression models is easily assessed using a variety of measures, including pseudo R-squared values, which approximate the proportion of variance in the dependent variable explained by the predictors.
Weaknesses of the use of logistic regression in this review
Most of the papers in this review that applied a logistic regression model were rated as “middle” quality. There were striking statistical deficiencies in these studies. Firstly, the selection of predictor variables and the number of predictors included often appeared to be made post hoc, i.e., inclusion was justified only after the model was created. Given the scope of variables available from large population databases recording fatal crashes, a surprising observation was that only five studies [32, 37, 40, 43, 49] described the variable selection process used. While the practice of post hoc justification of predictors is common, it is damaging to the integrity of findings and their real-world implications. Logistic regression is often used to inform life-and-death decisions; therefore, inadequate or poor a priori variable selection may lead to unwanted consequences.
Secondly, adherence to underlying statistical assumptions is of vital importance to the quality of analyses and subsequent confidence in findings. Only two studies using logistic regression confirmed that the assumptions were met [46, 47]. Particularly problematic within the studies using logistic regression was the failure to check for correlations between predictor variables (multicollinearity) and for interactions. Of the studies using logistic regression, only two reported checking multicollinearity [44, 46], and only five checked for interactions [40, 42–44, 46]. The presence of multicollinearity and interactions within the predictor variables artificially increases their ability to predict FSI crashes and makes interpretation of odds ratios problematic. Not meeting these assumptions raises questions about the validity of the results.
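One common way to check for multicollinearity before fitting a logistic regression is the variance inflation factor (VIF). The following is a minimal, standard-library sketch rather than the procedure used by any reviewed study: it relies on the identity that the VIF of predictor j is the j-th diagonal entry of the inverse of the predictor correlation matrix. The driver records and variable names are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length predictor columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        standardized.append([(x - mu) / sd for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of predictor j = j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical records for ten drivers (illustration only).
prior_crashes = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]
prior_speeding = [1, 0, 2, 3, 1, 0, 4, 2, 1, 3]
prior_suspensions = [0, 0, 1, 2, 0, 0, 3, 1, 0, 2]

names = ["prior_crashes", "prior_speeding", "prior_suspensions"]
vifs = variance_inflation_factors([prior_crashes, prior_speeding, prior_suspensions])
for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

A VIF of 1 indicates a predictor that is uncorrelated with the others; commonly cited rules of thumb treat values above roughly 5 or 10 as a warning sign, although any threshold is a convention rather than a test.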
A critical observation repeatedly made by reviewers of statistical models is the inclusion of unwanted correlations in data. Examples include serial correlation, i.e., correlation over time [18], spatial and temporal correlation between predictor variables, and correlation between predictor variables and error [20]. The inclusion of these correlations in data results in incorrect inferences being made from the results. To illustrate the problem, helmet use may appear to reduce crash fatalities, yet when a biological or psychological predisposition to risk predicts both helmet use and crash involvement, helmet use may no longer be a significant predictor of fatalities. This argument is supported by a clear relationship between sensation seeking and risky driving [51]. The problem of unwanted correlation in the data is relevant in this review when contemporaneous offense and crash data are considered. It is important to note that there is likely to be a two-way relationship between offense history and crash history in this context. Not only is someone with an extensive offense history more likely to have a crash, but a crash caused by a driver is also likely to lead to the driver being charged with an offense [49].
The third weakness identified in the reviewed studies using logistic regression is the lack of validation. For example, none of the studies reported the results of a Hosmer-Lemeshow goodness-of-fit test or the area under the receiver operating characteristic curve. No study in the reviewed literature validated its final model using new data or used validation during model creation. Given the pervasive international interest and enormous volume of publications on FSI crashes, the lack of validated models that predict FSI crashes from offense history, crash history, and licensing variables is an important limitation. However, one of the papers in this review [46] did conduct two sensitivity analyses for its models. The first tested the model with data from the USA states that accounted for more than 80% of FSI crashes in the USA, and the second tested the model on data for two different time periods.
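To make the validation step concrete, the sketch below computes the area under the receiver operating characteristic curve (AUC) on held-out data, using the rank-based definition: the probability that a randomly chosen driver with an FSI crash receives a higher predicted risk than a randomly chosen driver without one. The held-out labels and predicted probabilities are hypothetical; in practice they would come from a model fitted on separate training data.

```python
def roc_auc(labels, scores):
    """AUC: chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out drivers: 1 = involved in an FSI crash, 0 = not involved.
holdout_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities from a model fitted on other data.
holdout_scores = [0.05, 0.10, 0.40, 0.20, 0.35, 0.15, 0.30, 0.25, 0.02, 0.08]

print(f"Held-out AUC: {roc_auc(holdout_labels, holdout_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. Reporting the value on data not used for fitting is what distinguishes validation from an in-sample fit statistic.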
The fourth major limitation is the lack of effective calibration of the models that takes account of any imbalance in the data in order to provide accurate estimates of classification accuracy. Indeed, no study in this review presented a confusion/classification matrix to evaluate how well the model classified participants into the correct categories. Both validation and calibration must be conducted after a model has been created to confirm its classification accuracy [52].
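The following sketch illustrates why a confusion matrix matters when the outcome is rare. It uses a hypothetical data set of 1000 drivers, 10 of whom had an FSI crash, and a deliberately useless "model" that predicts no crash for everyone; the numbers are assumptions for illustration, not results from any reviewed study.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = FSI crash)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def classification_summary(actual, predicted):
    """Overall accuracy plus class-specific rates that expose imbalance problems."""
    c = confusion_counts(actual, predicted)
    positives = c["tp"] + c["fn"]
    negatives = c["tn"] + c["fp"]
    sensitivity = c["tp"] / positives if positives else float("nan")
    specificity = c["tn"] / negatives if negatives else float("nan")
    return {
        "accuracy": (c["tp"] + c["tn"]) / len(actual),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical imbalanced data: 10 drivers with an FSI crash, 990 without.
actual = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # a "model" that never predicts a crash

summary = classification_summary(actual, always_negative)
for metric, value in summary.items():
    print(f"{metric}: {value:.2f}")
```

In this example overall accuracy is high even though no at-risk driver is identified, which is why sensitivity, specificity, or balanced accuracy should accompany any reported accuracy figure.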
Taken together, these four deficits within the literature using logistic regression models drastically lower confidence in the overall finding that offense history or crash history can accurately predict future FSI crashes.
Critical summary of other modeling techniques utilized in this review
Three studies used a modeling technique other than logistic regression: survival analysis [31], Poisson multi-level growth curve modeling [48], and structural equation modeling [50]. The use of survival analysis was largely responsible for Lui and Marchbanks' article [31] receiving a high quality score, because many statistical considerations were not applicable. However, the study lacked a description of how raw data were treated, whether preliminary analyses were conducted, and the reporting of descriptive statistics. These limitations reduce the generalizability and utility of the model.
A Poisson multi-level growth curve model was used by Stringer [48] to address the question, “What is the probability that in a given period an FSI crash will occur?” Stringer [48] received the highest quality score among the included studies, suggesting that this paper was the most rigorous. The weaknesses identified in Stringer [48] did not seriously undermine the validity of the results. The most important result, based on longitudinal data, was that crash history and offense history significantly predict future fatal crashes, with crash history being the more important predictor. Weaknesses were the failure to report the statistical packages used and descriptive statistics for the included variables. In addition, there was no attempt at model validation, so the extent to which the model has general application is unknown. Notable strengths of the study were the use of a Poisson distribution to avoid overestimation of zero values, the use of longitudinal data, and rigorous checks of the model assumptions.
Yuan et al. [50] used structural equation modeling to make sense of a broad range of variables involved in FSI crashes. Structural equation modeling allows latent constructs to be modeled using measurement models, and allows path models to be used to assess relationships between variables that are measured (observed), latent (not observed), or both. An example of a latent variable constructed in this paper is “Driver Factors,” made up of the following measured variables: seatbelt use, driving expertise, offense history, crash history, and gender (measurement model). “Driver Factors” was used to predict two other latent variables, “Truck Occupant” and “Accident Size,” using a path model. This technique was appropriately used by Yuan et al. [50], as they had a large data set, appropriate measures for the latent variables which met the underlying model assumptions, and appropriate values for the goodness-of-fit indices.
Recommendations for future research of FSI crashes using offense history and crash history as predictor variables
There are multiple recommendations for future statistical modeling of FSI crashes from offense and crash history based on the reviewed literature. Firstly, when using population data, and data containing a large number of variables, the use of statistical significance indicators such as p-values is inappropriate. For example, tests of predictor coefficients will inevitably result in multiple type 1 errors. Instead, researchers need to follow and report rigorous methods of variable selection prior to commencing the main analysis. Secondly, future research needs to test statistical assumptions and report the results, and models need to be validated using fresh data. In addition, classification accuracy needs to be reported considering any imbalance in the crash data. The underreporting of crashes has previously been argued to create a non-random data set, violating traditional statistical assumptions [20], but this is unlikely for FSI crashes.
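The point about multiple type 1 errors can be quantified with a short calculation. The sketch below makes two simplifying assumptions that are not taken from the reviewed studies: the tests are independent, and none of the tested predictors has a true effect. Under those assumptions the chance of at least one false positive among m tests at significance level α is 1 − (1 − α)^m, and the expected number of false positives is m × α.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return alpha * n_tests


# Hypothetical numbers of candidate predictors tested at the 5% level.
for n_predictors in (1, 5, 20, 50):
    rate = familywise_error_rate(0.05, n_predictors)
    expected = expected_false_positives(0.05, n_predictors)
    print(f"{n_predictors:>2} tests: P(at least one false positive) = {rate:.2f}, "
          f"expected false positives = {expected:.2f}")
```

With correlated predictors the exact figures differ, but the direction is the same: the more coefficients that are tested, the more likely it is that some appear significant by chance.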
Thirdly, machine learning should be utilized to overcome the problem of unwanted correlations in the data when sample sizes are very large. These methods do not rely on p-values for model selection. For example, random forests can accommodate interactions between predictor variables and can rank the predictor variables according to their importance [21]. Importance ranking is crucial when there are many predictor variables and organizations need to identify the most important variables for policy changes. Similar to random forest models, gradient boosting algorithms may be useful in that they can handle large sets of predictor variables and interactions. An advantage of these models is their ability to classify minority classes more accurately. However, careful tuning of the parameters during the training phase, often involving cross-validation, is needed in order to avoid overfitting. Unfortunately, random forests and gradient boosting models are difficult to interpret. This makes the results harder to explain to non-technical audiences than the results of traditional statistical models such as logistic regression models, which provide odds ratios with 95% confidence intervals.
Another prominent issue faced by researchers analyzing FSI crash data is the prevalence of data sets that are highly imbalanced with respect to the class of the dependent variable. Most drivers have no history of being involved in an FSI crash. This imbalance between the number of drivers who have and who have not been involved in an FSI crash is problematic for statistical models attempting to classify drivers as being at risk of FSI crash involvement, because models developed using imbalanced data tend to struggle to correctly classify those in the minority class (i.e., drivers most at risk of FSI crash involvement). To overcome this problem, researchers should consider using techniques such as the synthetic minority oversampling technique (SMOTE), which allows oversampling of the minority class in a representative manner. Another consideration for researchers is the use of hidden Markov models or recurrent neural networks to identify drivers more at risk of an FSI crash. These methods could use the past sequence of offenses to predict the probability of a future FSI crash.
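The core idea of SMOTE can be sketched in a few lines: each synthetic minority case is placed at a random point on the line segment between an existing minority case and one of its nearest minority-class neighbors. The code below is a simplified illustration of that idea using only the standard library, not a full implementation of the published algorithm; the function name, parameters, and the two-feature driver records are all hypothetical.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Simplified SMOTE: interpolate between minority points and their near neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class drivers: (prior offenses, prior crashes).
minority = [(3.0, 1.0), (4.0, 2.0), (5.0, 1.0), (6.0, 3.0)]
new_points = smote(minority, n_synthetic=5)
for point in new_points:
    print(tuple(round(v, 2) for v in point))
```

Two caveats apply to this sketch. Interpolation produces non-integer values, which may need special handling for count or categorical predictors. Oversampling is also normally applied to the training data only, so that validation data keep the real class balance.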
Finally, there is a need for the use of statistical quality assessment tools in future reviews of the crash literature. It is hoped that future reviews of the FSI and broader crash and transportation literature will refine and use our tool to assess the quality of statistical analyses. This is important because a failure to identify poor statistical analysis in reviews may lead to incorrect conclusions and misinformed policy. The use of a standardized assessment tool improves the objectivity of a review’s findings. Such tools are easy and quick to use, also allowing comparisons between reviewer scores and ensuring that all studies are assessed on the same characteristics.
Limitations of this review
Language and geographical bias limited this review. Similar to other reviews of related topics [20, 53], the bulk of the research was conducted in the USA. We considered only articles written in English, and the grey literature review was confined to higher-income countries with lower road death rates. The review was limited to a qualitative synthesis of included studies as a meta-analysis was deemed inappropriate. This was because most models reviewed included many predictor variables in addition to our predictors of interest, namely, offense history and crash history. Further limitations were the time and scope of the search protocol. Despite searching four prominent databases, it is possible that further applicable publications were missed. The lack of search terms for specific offenses (e.g., “speeding,” “drink driving”) may have limited the identification of additional studies present in the databases that were searched. Additionally, one author conducted the screening of identified records and abstracts, potentially introducing bias and missing studies. The applicability of our findings is limited to high-income developed countries, as FSI crashes are not rare and offense history may not be reliable in countries with low and middle incomes.
Conclusions
This review contributed to the literature in multiple ways. The study developed a statistical quality assessment tool and demonstrated how it can be utilized when presenting evidence of FSI crash prediction. The review identified that multicollinearity, model validation, and appropriate methods for the selection of predictor variables remain problematic in studies predicting FSI crashes from offense and crash history. However, the most recent studies reported more rigorous modeling practices. Future studies modeling FSI crash risk using offense and/or crash history should consider employing machine learning methods to overcome some of the key limitations of the traditionally used statistical techniques identified in this review. Seven out of the 17 studies using logistic regression reported an association between offense history and FSI crashes. License suspensions and crash history were also commonly reported as having an association with FSI crashes.
Supplementary information
Additional file 1. Database search strings 7/6/2019. Appendix S1-S6
Acknowledgements
There are no additional contributors to acknowledge.
Abbreviations
- FSI: Fatal and serious injury
- PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
- PROSPERO: International prospective register of systematic reviews
- ATRI: Australian Transport Index
- ITRD: Transportation Research International Documentation database
- SAMPL: Statistical Analysis and Methods in the Published Literature
- SM: Samuel Muir
- SS: S. S. M. Silva
- DM: Denny Meyer
- TIFA database: Trucks Involved in Fatal Accidents database
- FARS: Fatality Analysis Reporting System
- BIFA: Buses Involved in Fatal Accidents database
- KABCO scale: Fatal, incapacitating injury, non-incapacitating injury, possible injury, property damage only scale
- MCMIS: Motor Carrier Management Information System database
- CDLIS: Commercial Driver’s License Information System
- NDDL: North Dakota Drivers’ License data
- TARS: Traffic Accident Reporting System
- NPTS: Nationwide Personal Transportation Survey
- VALT: Traffic Safety Committee of Insurance Companies
- GSS: General Social Survey from National Opinion Research Center
- ICBS: Israel Central Bureau of Statistics
- CARE: Critical Analysis Reporting Environment
- WCRVD: Wyoming Court Reported Violation Database
- DUI: Driving under the influence
- DWI: Driving while intoxicated
- UDA: Unsafe driving action
- USA: United States of America
Authors’ contributions
All authors contributed to writing the manuscript, screening studies, conducting analysis, and design. All authors read and approved the final manuscript.
Funding
This work was supported by VicRoads, part of the Department of Transport, Victoria. VicRoads outlined the central research question, but all other aspects of the review were conducted independently.
Availability of data and materials
All data analyzed during this study are included in this published article and its supplementary contents.
Ethics approval and consent to participate
All data originated from published articles.
Consent for publication
No individual person’s data was presented.
Competing interests
No competing interests are reported by the authors.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Reneta Slikboer, Email: rslikboer@swin.edu.au
Samuel D. Muir, Email: sdmuir@swin.edu.au
S. S. M. Silva, Email: sssilva@swin.edu.au
Denny Meyer, Email: dmeyer@swin.edu.au
Supplementary information
Supplementary information accompanies this paper at 10.1186/s13643-020-01475-7.
References
- 1.AustLII. Transport Accident Act 1986 - Schedule [Internet]. Victorian current acts. 2019 [cited 2019 Oct 7]. Available from: http://www.austlii.edu.au/au/legis/vic/consol_act/taa1986204/sch1.html.
- 2.Bureau of Infrastructure Transport and Regional Economics . Canberra ACT. 2017. Road trauma Australia 2016 statistical summary. [Google Scholar]
- 3.Australian Bureau of Statistics. Criminal Courts, Australia, 2017-18 [Internet]. 4513.0 Criminal Courts, Australia, 2017-18. 2019 [cited 2019 Oct 7]. Available from: https://www.abs.gov.au/ausstats/abs@.nsf/mf/4513.0.
- 4.Kraus J, Peek C, McAethur D. The effect of the 1992 California motocycle helment use law on motorcycle crash fatalities and injuries. JAMA. 1994;272(19):1506–1511. doi: 10.1001/jama.1994.03520190052034. [DOI] [PubMed] [Google Scholar]
- 5.Wali B, Ahmed A, Iqbal S, Hussain A. Effectiveness of enforcement levels of speed limit and drink driving laws and associated factors - exploratory empirical analysis using a bivariate ordered probit model. J Traffic Transp Eng (English Ed [Internet]. 2017;4(3):272–9. Available from: 10.1016/j.jtte.2017.04.001.
- 6.Yang B, Liu P, Chan C, Xu C, Guo Y. Identifying the crash characteristics on freeway segments based on different ramp influence areas. Traffic Inj Prev [Internet]. 2019;20(4):386–391. Available from: 10.1080/15389588.2019.1588965. [DOI] [PubMed]
- 7.Chen Y, Liu G, Zhang Z, Hou S. Integrated design technique for materials and structures of vehicle body under crash safety considerations. Struct Multidiscip Optim. 2017;56(2):455–472. doi: 10.1007/s00158-017-1674-8. [DOI] [Google Scholar]
- 8.Salmon PM, Read GJM, Beanland V, Thompson J, Filtness AJ, Hulme A, et al. Bad behaviour or societal failure? Perceptions of the factors contributing to drivers’ engagement in the fatal five driving behaviours. Appl Ergon [Internet]. 2019;74(April 2018):162–71. Available from: 10.1016/j.apergo.2018.08.008. [DOI] [PubMed]
- 9.Asbridge M, Hayden JA. Acute cannabis consumption and motor vehicle collision risk: systematic review of observational studies and meta-analysis. BMJ. 2012;536(February):1–9. doi: 10.1136/bmj.e536. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Asgarian F, Namdari M, Soori H. Worldwide prevalence of alcohol in fatally injured motorcyclists: a meta-analysis. Traffic Inj Prev. 2019. [DOI] [PubMed]
- 11.Bingham CR, Ehsani JP. The relative odds of involvement in seven crash configurations by driver age and sex. J Adolesc Health [Internet. 2012;51(5):484–90 Available from: 10.1016/j.jadohealth.2012.02.012. [DOI] [PubMed]
- 12.Moradi A, Saeed S, Nazari H, Rahmani K. Sleepiness and the risk of road traffic accidents: a systematic review and meta-analysis of previous studies. Transp Res Part F Psychol Behav [Internet]. 2018; Available from: 10.1016/j.trf.2018.09.013.
- 13.DeYoung DJ, Gebers MA. An examination of the characteristics and traffic risks of drivers suspended/revoked for different reasons. J Saf Res. 2004;35:287–295. doi: 10.1016/j.jsr.2004.01.002. [DOI] [PubMed] [Google Scholar]
- 14.Elliott MR, Waller PF, Raghunathan TE, Shope JT. Predicting offenses and crashes from young drivers’ offense and crash histories. Traffic Inj Prev. 2007;6586.
- 15.Rezapour M, Wulff S, Ksaibati K. Predicting truck at fault crashes using crash and traffic offence data. Open Transp J. 2018;12:128–138. doi: 10.2174/18744478018120100128. [DOI] [Google Scholar]
- 16.Sagberg F, Economics T, Engström J, Group V A review of research on driving styles and road safety. Hum Factors. 2015;57(7):1248–1275. doi: 10.1177/0018720815591313. [DOI] [PubMed] [Google Scholar]
- 17.Leal NL, Watson BC. The road safety implications of illegal street racing and associated risky driving behaviours: an analysis of offences and offenders. Accid Anal Prev [Internet]. 2011;43(4):1547–1554. Available from: 10.1016/j.aap.2011.03.010. [DOI] [PubMed]
- 18.Noland RB, Karlaftis MG. Sensitivity of crash models to alternative specifications. Transp Res Rec Part E Logist Transp Rev. 2005;41(5):439–458. doi: 10.1016/j.tre.2005.03.002. [DOI] [Google Scholar]
- 19.Venkataraman N, Shankar V, Blum J, Hariharan B, Hong J. Transferability analysis of heterogeneous overdispersion parameter negative binomial crash models. Transp Res Rec. 2016;2583(1):99–109. doi: 10.3141/2583-13.
- 20.Savolainen PT, Mannering FL, Lord D, Quddus MA. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accid Anal Prev. 2011;43(5):1666–1676. doi: 10.1016/j.aap.2011.03.025.
- 21.Iranitalab A, Khattak A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid Anal Prev. 2017;108(August):27–36. doi: 10.1016/j.aap.2017.08.008.
- 22.Lord D, Mannering F. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp Res Part A. 2010;44(5):291–305. doi: 10.1016/j.tra.2010.02.001.
- 23.Cassarino M, Murphy G. Reducing young drivers’ crash risk: are we there yet? An ecological systems-based review of the last decade of research. Transp Res Part F Traffic Psychol Behav. 2018;56:54–73. doi: 10.1016/j.trf.2018.04.003.
- 24.Connor J, Whitlock G, Norton R, Jackson R. The role of driver sleepiness in car crashes: a systematic review of epidemiological studies. Accid Anal Prev. 2001;33:31–41. doi: 10.1016/S0001-4575(00)00013-0.
- 25.Asbridge M, Desapriya E, Ogilvie R, Cartwright J, Mehrnoush V, Ishikawa T, et al. The impact of restricted driver’s licenses on crash risk for older drivers: a systematic review. Transp Res Part A Policy Pract. 2017;97:137–145. doi: 10.1016/j.tra.2017.01.006.
- 26.Elvik R. Risk of road accident associated with the use of drugs: a systematic review and meta-analysis of evidence from epidemiological studies. Accid Anal Prev. 2013;60:254–267. doi: 10.1016/j.aap.2012.06.017.
- 27.Oviedo-Trespalacios O, Truelove V, Watson B, Hinton JA. The impact of road advertising signs on driver behaviour and implications for road safety: a critical systematic review. Transp Res Part A. 2019;122(November 2018):85–98. doi: 10.1016/j.tra.2019.01.012.
- 28.Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7).
- 29.World Health Organization. Road traffic deaths, data by country [Internet]. Global Health Observatory data repository. 2019 [cited 2019 Oct 7]. Available from: https://apps.who.int/gho/data/node.main.A997
- 30.Altman DG, Schulz KF. Statistical analyses and methods in the published literature: the SAMPL guidelines. In: Moher D, Altman DG, Schulz KF, Simera I, Wager E, editors. Guidelines for reporting health research: a user’s manual. Oxford: John Wiley & Sons, Ltd.; 2014. pp. 265–274.
- 31.Lui K, Marchbanks P. A study of the time between previous traffic infractions and fatal automobile crashes, 1984-1986. J Saf Res. 1990;21:45–51. doi: 10.1016/0022-4375(90)90001-R.
- 32.Perneger T, Smith GS. The driver’s role in fatal two-car crashes: a paired “case-control” study. Am J Epidemiol. 1991;134(December):1138–1145.
- 33.Rajalin S. The connection between risky driving and involvement in fatal accidents. Accid Anal Prev. 1994;26(5):555–562. doi: 10.1016/0001-4575(94)90017-5.
- 34.Cooper PJ. The relationship between speeding behaviour (as measured by violation convictions) and crash involvement. J Saf Res. 1997;28(2):83–95. doi: 10.1016/S0022-4375(96)00040-0.
- 35.Wundersitz L, Burns N. Relationships between prior driving record, driver culpability, and fatal crash involvement. 2004.
- 36.Kim HS, Kim HJ, Son B. Factors associated with automobile accidents and survival. Accid Anal Prev. 2006;38(April):981–987.
- 37.Blower D, Green PE. Type of motor carrier and driver history in fatal bus crashes. Transp Res Rec J Transp Res Board. 2010:37–43.
- 38.Malchose D, Vachal K. Identifying factors that predict teen driver crashes [Internet]. 2011.
- 39.Lueck MD, Murray DC. Predicting truck crash involvement: linking driver behaviors to crash probability. J Transp Law, Logist Policy. 2011:109–28.
- 40.Gates J, Dubois S, Mullen N, Weaver B, Bedard M. The influence of stimulants on truck driver crash responsibility in fatal crashes. Forensic Sci Int. 2013;228:15–20. doi: 10.1016/j.forsciint.2013.02.001.
- 41.Factor R. The effect of traffic tickets on road traffic crashes. Accid Anal Prev. 2014;64:86–91. doi: 10.1016/j.aap.2013.11.010.
- 42.Reguly P, Dubois S, Bedard M. Examining the impact of opioid analgesics on crash responsibility in truck drivers involved in fatal crashes. Forensic Sci Int. 2014;234:154–161. doi: 10.1016/j.forsciint.2013.11.005.
- 43.Dubois S, Mullen N, Weaver B, Bedard M. The combined effects of alcohol and cannabis on driving: impact on crash risk. Forensic Sci Int. 2015;248:94–100. doi: 10.1016/j.forsciint.2014.12.018.
- 44.Kumfer W, Wei D, Liu H. Effects of demographic and driver factors on single-vehicle and multivehicle fatal crashes investigation with multinomial logistic regression. Transp Res Rec J Transp Res Board. 2015;2518:37–45. doi: 10.3141/2518-05.
- 45.Feng S, Li Z, Ci Y, Zhang G. Risk factors affecting fatal bus accident severity: their impact on different types of bus drivers. Accid Anal Prev. 2016;86:29–39. doi: 10.1016/j.aap.2015.09.025.
- 46.Li G, Chihuri S, Brady J. Role of alcohol and marijuana use in the initiation of fatal two-vehicle crashes. Ann Epidemiol. 2017;27(5):342–347. doi: 10.1016/j.annepidem.2017.05.003.
- 47.Hamzeie R, Thompson I, Roy S, Savolainen PT. State-level comparison of traffic fatality data in consideration of marijuana laws. Transp Res Rec J Transp Res Board. 2017;2660:78–85. doi: 10.3141/2660-11.
- 48.Stringer RJ. Exploring traffic safety culture and drunk driving: an examination of the community and DUI related fatal crashes in. Transp Res Part F Traffic Psychol Behav. 2018;56:371–80. doi: 10.1016/j.trf.2018.05.014.
- 49.Mashhadi M, Wulff S, Ksaibati K. A comprehensive study of single and multiple truck crashes using violation and crash data. Open Transp J. 2018;12:43–56. doi: 10.2174/1874447801812010043.
- 50.Yuan Y, Yang M, Gan Z, Wu J, Xu C, Lei D. Analysis of the risk factors affecting the size of fatal accidents involving trucks based on the structural equation model. Transp Res Rec. 2019;1–3.
- 51.Zhang X, Qu X, Tao D, Xue H. The association between sensation seeking and driving outcomes: a systematic review and meta-analysis. Accid Anal Prev. 2019;123(127):222–34. doi: 10.1016/j.aap.2018.11.023.
- 52.Tabachnick B, Fidell L. Using multivariate statistics. 5th ed. Pearson Education, Inc.; 2007.
- 53.Yannis G, Richter T, Ruhl S, Dragomanovits A, Graham D, Laiou A, et al. Road traffic accident prediction modelling: a literature review. Transport. 2017;170(TR5):245–254.
Supplementary Materials
Additional file 1. Database search strings 7/6/2019. Appendix S1-S6
Data Availability Statement
All data analyzed during this study are included in this published article and its supplementary contents.


