Abstract
Background:
Alcohol- and cannabis-impaired driving remain major public health concerns, particularly among young adults. Although prior studies have identified numerous risk factors, most have focused on limited subsets of predictors, restricting a broader understanding of impaired driving. This study applied machine learning to identify salient predictors of alcohol- and cannabis-impaired driving from a wide range of candidate variables.
Methods:
Data came from annual cross-sectional surveys of 18- to 25-year-olds participating in the Washington Young Adult Health Survey (2015–2022). Analyses were limited to two overlapping subsets of participants: those who reported past-month alcohol use for analyses predicting alcohol-impaired driving (N=9,852) and those who reported past-month cannabis use for analyses predicting cannabis-impaired driving (N=4,891). Regularized regression and random forests were used to identify the most salient predictors of each type of impaired driving from a large set of approximately 80 candidate variables. These methods were selected for their complementary strengths and their shared capacity for robust performance when handling high-dimensional data with potentially collinear predictors.
Results:
For likelihood of alcohol-impaired driving, top predictors included alcohol use frequency, participants’ age, peak drinking quantity, age of alcohol initiation, full-time employment, and cannabis use frequency. For likelihood of cannabis-impaired driving, top predictors included cannabis use frequency, cannabis-related memory problems, simultaneous alcohol and cannabis use frequency, increased cannabis tolerance, and age of cannabis initiation.
Conclusions:
Two complementary machine learning methods yielded convergent findings on the most salient predictors of impaired driving, increasing confidence in their validity. These methods provide a flexible alternative to traditional models for analyzing high-dimensional data and highlight recent use patterns, substance use disorder symptoms, and age of initiation as key priorities for prevention.
Keywords: Impaired driving, Driving under the influence, Machine learning, Regularization, Random forest
Graphical Abstract
Machine learning applied to statewide survey data revealed robust predictors of alcohol- and cannabis-impaired driving among young adults. Across two complementary algorithms, frequent use, early initiation, polysubstance involvement, and cannabis-related cognitive symptoms consistently emerged as key risk indicators, informing impaired-driving prevention strategies.
Introduction
Impaired driving involving alcohol and/or cannabis remains a major public health concern due to its significant contribution to traffic-related injuries and fatalities. These behaviors are particularly prevalent among young adults and pose significant risks not only to drivers but also to passengers, pedestrians, and other road users. Despite public health campaigns and legal deterrents, impaired driving continues to be a leading cause of preventable harm, contributing to thousands of deaths and injuries annually (CDC, 2024; NHTSA, 2025). The high societal costs, both in human and economic terms, underscore the need for more effective prevention strategies. Although alcohol-impaired driving has long been recognized as a critical risk behavior, cannabis-impaired driving has emerged as an increasingly important concern amid cannabis legalization, shifting social norms, and rising use (Hultgren et al., 2023; Pearlson et al., 2021; Windle et al., 2021).
Both forms of impaired driving are influenced by a broad range of behavioral, demographic, and contextual risk factors. However, most prior research has examined these factors in isolation or in limited combinations, often focusing on relatively small subsets of variables. This piecemeal approach limits the ability to determine which factors are most influential when considered alongside a comprehensive set of potential predictors. Moreover, traditional analytic methods are less well-suited to account for potentially nonlinear relationships between predictors and impaired driving, and/or non-additive interactions among predictors, particularly when the form of these relationships is unknown and difficult to specify in advance. To inform focused, data-driven prevention efforts, it is essential to identify the most salient predictors of each behavior using methods capable of handling a large and potentially overlapping set of risk factors.
Predictors of Alcohol- and Cannabis-Impaired Driving
Past research has identified a wide range of individual, contextual, and demographic factors that predict alcohol-impaired driving. At the individual level, alcohol consumption patterns, particularly binge drinking and frequent use, are consistently among the strongest predictors (Birdsall et al., 2012; Fairlie et al., 2010; LaBrie et al., 2011; Naimi et al., 2009). Demographic characteristics such as male gender and low socioeconomic status are also robustly linked to higher rates of alcohol-impaired driving (Birdsall et al., 2012; Fan et al., 2019; LaBrie et al., 2011; Naimi et al., 2009; Romano et al., 2012). Although some studies have found associations with age, race, and ethnicity (e.g., Birdsall et al., 2012; Fairlie et al., 2010; Fan et al., 2019; Naimi et al., 2009), the direction and statistical significance of these findings vary across populations and study designs. Prior traffic violations and symptoms of alcohol use disorder further elevate risk (Gebers & Peck, 2003; SAMHSA, 2022). Social and contextual influences, including peer presence when driving, late-night driving, and rural or high-speed roads, can compound these risks by increasing opportunity and reducing perceived deterrence (Romano et al., 2012; Simons-Morton et al., 2011; Yadav & Velaga, 2020). Although many predictors have been identified, less is known about which ones remain most influential when considered alongside a broad set of potential risk factors. This gap obscures priorities for prevention efforts and highlights the importance of data-driven approaches that can assess the relative contribution of multiple predictors simultaneously.
A growing body of research has examined predictors of cannabis-impaired driving, highlighting the importance of both substance-use patterns and broader behavioral and demographic factors. Frequent cannabis use is consistently associated with greater risk, with individuals who engage in near-daily cannabis use being more likely to report cannabis-impaired driving than those who only use cannabis occasionally (Berg et al., 2018; Sterzer et al., 2022; Wickens et al., 2022). Early initiation of cannabis use and symptoms of cannabis use disorder (CUD) have also been linked to increased risk, likely due to heightened exposure and reduced risk perception (Brown et al., 2022; Salas-Wright et al., 2021; Sterzer et al., 2022). Higher levels of alcohol use are similarly associated with cannabis-impaired driving, suggesting that frequent or heavy drinking may reflect broader risk-taking tendencies or patterns of polysubstance use that further elevate the risk of driving under the influence of cannabis (Lloyd et al., 2020; Sterzer et al., 2022). Sociodemographic and psychological characteristics including younger age, male sex, and sensation-seeking tendencies have also been implicated (Berg et al., 2018; Lloyd et al., 2022; Sterzer et al., 2022; Wickens et al., 2022), as have permissive attitudes toward cannabis and beliefs that driving under its influence is not particularly dangerous (Arterberry et al., 2013; Berg et al., 2018). Despite this evidence, uncertainty remains about which predictors are most salient when accounting for a broad array of potential factors. As cannabis legalization has expanded access and normalized use (Farrelly et al., 2023), it is increasingly important to identify the strongest risk factors for driving under the influence of cannabis.
How Can Machine Learning Help Identify the Most Salient Predictors of Impaired Driving?
Impaired driving is a complex behavior shaped by a wide array of factors, including substance use patterns, demographic characteristics, social roles, and norms. To develop more effective prevention strategies, it is essential to identify which of these factors most strongly predict impaired driving. However, traditional statistical methods such as standard regression often struggle when faced with many candidate predictors, especially when predictors are correlated or when their relationships with the outcome are non-linear or involve non-additive interactions (Jacobucci et al., 2023; James et al., 2021). For example, although some studies emphasize substance use frequency, others highlight factors like age of initiation or symptoms of substance use disorders, leading to inconsistent conclusions about which factors are most important and under what conditions. These inconsistencies may reflect the limitations of traditional approaches in modeling complex data structures.
Machine learning offers a more flexible and comprehensive framework, enabling the simultaneous evaluation of many predictors without strong assumptions about their functional forms (e.g., linearity). It can more effectively model interactions, nonlinearities, and multicollinearity (Jacobucci et al., 2023; James et al., 2021). Regularized regression and random forests are two complementary machine learning methods well-suited for identifying the most salient predictors of impaired driving. Regularized regression extends traditional linear regression by adding a penalty term that shrinks the influence of less informative predictors, thereby preventing overfitting and enabling automatic variable selection (James et al., 2021; Kuhn & Johnson, 2013). This approach retains the structure and interpretability of traditional regression while improving performance in high-dimensional settings. In contrast, random forests are non-parametric methods that build an ensemble of decision trees to flexibly model complex, non-linear relationships and interactions without requiring those forms to be pre-specified (James et al., 2021; Kuhn & Johnson, 2013). Both methods accommodate large predictor sets, handle multicollinearity, rely on fewer assumptions than traditional models, and have demonstrated strong performance across a wide range of applications (Jacobucci et al., 2023; James et al., 2021; Kuhn & Johnson, 2013). They also yield interpretable outputs. Regularized regression provides standardized coefficients indicating the direction and relative strength of associations, whereas random forests generate variable importance scores based on each predictor’s contribution to model performance (Kuhn & Johnson, 2013).
No single machine learning algorithm performs best across all problems, a principle known as the “no free lunch” theorem (Wolpert, 1996). Relying on a single method may overlook important patterns or introduce bias (Jacobucci et al., 2023; James et al., 2021). By applying both regularized regression and random forests, researchers can leverage the strengths of each method while mitigating their individual limitations (Kuhn & Johnson, 2013; Kuhn & Silge, 2022). When two fundamentally different approaches converge on similar sets of top predictors, this enhances confidence in the robustness and generalizability of the findings. In this way, machine learning not only improves predictive accuracy but also advances a more nuanced and empirically grounded understanding of the behaviors most relevant to impaired driving prevention and intervention.
Current Study
The present study applied two machine learning approaches, regularized regression and random forests, to identify the most salient predictors of alcohol- and cannabis-impaired driving in a statewide sample of young adults in Washington State. Using data collected from 2015 to 2022, we examined a wide range of predictors across demographic, social, and behavioral domains. By comparing results across both machine learning approaches, we aimed to capitalize on their respective strengths, such as coefficient shrinkage in regularized regression and the ability to account for non-linear relationships and interactions in random forests.
Materials and Methods
Participants and Procedure
Participants were drawn from the Washington Young Adult Health Survey (Kilmer et al., 2022), which employed an accelerated longitudinal cohort-sequential design. Beginning in 2014, approximately 2,000 young adults aged 18 to 25 were recruited annually across Washington State via (a) direct mail to a random sample of addresses from the Department of Licensing and (b) social media advertisements (e.g., Facebook, Instagram). Eligibility criteria included residency in Washington and age between 18 and 25. Enrolled participants completed a baseline survey assessing substance use, health behaviors, and risk factors. Although nonmedical cannabis was legalized in Washington in 2012, retail outlets did not open until July 2014. Accordingly, data from the 2014 cohort were excluded from analyses because most of these data were collected before cannabis retail outlets had opened. This study used baseline data from cohorts recruited between 2015 and 2022. Because only one survey per participant was used, the data reflect a repeated cross-sectional design. All procedures were approved by the University of Washington Institutional Review Board.
Separate analytic samples were constructed for each impaired driving outcome. Analyses of alcohol-impaired driving were restricted to participants who reported past-month alcohol use (N = 9,852), whereas analyses of cannabis-impaired driving included only those who reported past-month cannabis use (N = 4,891). The analytic samples overlapped because participants who reported both alcohol use and cannabis use in the past month were included in both. Most participants in both analytic samples identified their biological sex as female (~69%) and their race/ethnicity as White non-Hispanic (~65%). More specific information about demographic characteristics is provided in Supplemental Tables 2 and 3 for the alcohol-impaired driving subsample and in Supplemental Tables 4 and 5 for the cannabis-impaired driving subsample.
Measures
Driving under the influence of alcohol (DUIA).
Participants were asked, “During the past 30 days, how many times did you drive a car or other vehicle after consuming alcohol?” Response options were “0 times,” “1 time,” “2-3 times,” “4-5 times,” and “6 or more times.” For analysis, responses were dichotomized with “0 times” coded as 0 and any response indicating one or more instances (i.e., 1+ times) coded as 1.
Driving under the influence of cannabis (DUIC).
Participants were asked, “During the past 30 days, how many times did you drive a car or other vehicle within three hours after using cannabis (e.g., marijuana, hashish, edibles)?” The same response options were used as for DUIA. Responses were likewise dichotomized, with “0 times” coded as 0 and any other response coded as 1.
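The dichotomization applied to both outcomes can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name and the handling of unexpected responses are assumptions.

```python
# Response options as listed in the survey items for DUIA and DUIC.
RESPONSE_OPTIONS = ["0 times", "1 time", "2-3 times", "4-5 times", "6 or more times"]

def dichotomize(response: str) -> int:
    """Code '0 times' as 0 and any response of one or more times as 1."""
    if response not in RESPONSE_OPTIONS:
        # Assumption: unexpected values are flagged rather than silently coded.
        raise ValueError(f"Unexpected response option: {response!r}")
    return 0 if response == "0 times" else 1
```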
Predictor variables.
Given that both regularized regression and random forest models can accommodate large numbers of predictors, a broad set of variables from the survey was considered for inclusion. Variables were excluded based on several criteria. First, items were removed if they were not administered across all cohorts or if item wording or response options changed substantially over time. Second, variables were excluded if they exhibited near-zero variance (≥99% of responses identical) or were highly redundant with included constructs (e.g., measures of past-month versus past-year frequencies, which were correlated at r ≥ 0.70). In such cases, the variable with the stronger bivariate association with the outcome was retained to reduce redundancy and improve interpretability. Although regularized regression and random forests are more robust to multicollinearity than traditional regression models, high collinearity can still distort variable importance scores and reduce model interpretability. Third, three cannabis-related variables were excluded from the DUIA analytic sample because they were only asked of participants who reported recent cannabis use, and there was no logical way to recode or impute the resulting missing data. After applying these criteria, 79 and 88 predictor variables (prior to preprocessing) were retained for the DUIA and DUIC models, respectively. These variables spanned multiple domains, including demographics, social roles, living situation, normative beliefs, and substance use behaviors. A complete list of predictor variables is provided in Supplemental Table 1.
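The second screening criterion can be illustrated with a short sketch. It assumes numeric predictors stored as plain lists and uses the absolute Pearson correlation as the measure of bivariate association; the function names are hypothetical, and the authors' actual screening code is not described in the text.

```python
from collections import Counter
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def near_zero_variance(values, cutoff=0.99):
    """True if the most common response accounts for >= cutoff of responses."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= cutoff

def resolve_redundant_pair(x1, x2, outcome, r_cutoff=0.70):
    """Return 'both' if the pair is not redundant; otherwise return which
    predictor ('first' or 'second') is more strongly associated with the outcome."""
    if abs(pearson_r(x1, x2)) < r_cutoff:
        return "both"
    r1 = abs(pearson_r(x1, outcome))
    r2 = abs(pearson_r(x2, outcome))
    return "first" if r1 >= r2 else "second"
```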
Analyses
The analytic workflow was implemented in a series of steps, described in the following subsections. All analyses were conducted using the tidymodels ecosystem (Kuhn & Silge, 2022) in R version 4.5.0 (R Core Team, 2025).
Data splitting and resampling.
First, each analytic sample was randomly split into a training set (75% of observations) and a testing set (25% of observations). The training set was used for model building and hyperparameter tuning, and the testing set was held out for evaluating the final model's out-of-sample performance. Within the training set, repeated 10-fold cross-validation (5 repeats; 50 total resamples) was used to estimate performance during model tuning and selection (Kuhn & Silge, 2022).
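The arithmetic behind the 50 resamples (10 folds × 5 repeats) can be made concrete with a small sketch. It works on row indices only and is not the tidymodels implementation; the seed values and function names are illustrative assumptions.

```python
import random

def train_test_split(n, train_prop=0.75, seed=1):
    """Randomly assign row indices to a training and a testing set."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_prop)
    return idx[:cut], idx[cut:]

def repeated_kfold(train_idx, k=10, repeats=5, seed=1):
    """Return (analysis, assessment) index pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    resamples = []
    for _ in range(repeats):
        shuffled = train_idx[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
        for held_out in folds:
            held = set(held_out)
            analysis = [i for i in shuffled if i not in held]
            resamples.append((analysis, held_out))
    return resamples

# With the DUIA sample size reported above (N = 9,852):
train, test = train_test_split(9852)
resamples = repeated_kfold(train)
```

Each of the 50 resamples fits the model on roughly 90% of the training rows and evaluates it on the remaining 10%; the testing set is never touched during tuning.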
Preprocessing.
Second, data were preprocessed to meet the requirements of each model type, following recommendations from Kuhn and Silge (2022). Both model types used the following steps: (1) categorical variable levels endorsed by fewer than 2% of participants were combined into one category, which was typically described as “Other” in supplemental tables containing model results (e.g., participants identifying as asexual or “a sexual orientation not listed here” were collapsed into one sexual orientation category due to low endorsement), and (2) missing categorical values were imputed with the mode and missing numeric values with the median. Additional preprocessing was applied to the regularized regression models: (3) categorical variables were dummy-coded and the most frequently endorsed category was set as the referent group (see Supplemental Tables 3 and 5 for the referent groups for each categorical variable in the DUIA and DUIC subsamples, respectively), (4) indicator variables with zero variance were removed, (5) the Yeo-Johnson transformation was applied to help normalize skewed continuous predictors and reduce the influence of outliers, and (6) all continuous predictors were standardized (i.e., centered and scaled) to ensure consistent penalization across predictors and to ease interpretation of coefficients.
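A few of these steps can be sketched in plain Python to show what each one does to a single column. The sketch covers steps (1), (2), and (6) only; dummy coding and the Yeo-Johnson transformation are omitted. Missing values are represented as None, the 2% threshold is computed among non-missing responses, and the sample standard deviation is used for scaling; all three are assumptions rather than details stated in the text.

```python
from collections import Counter
from statistics import mean, median, stdev

def lump_rare_levels(values, threshold=0.02, other="Other"):
    """Step 1: combine levels endorsed by fewer than `threshold` of respondents."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    rare = {lvl for lvl, c in counts.items() if c / len(observed) < threshold}
    return [other if v in rare else v for v in values]

def impute_mode(values):
    """Step 2 (categorical): replace missing values with the most common level."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def impute_median(values):
    """Step 2 (numeric): replace missing values with the median."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def standardize(values):
    """Step 6: center to mean 0 and scale to standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]
```

In practice, the quantities used here (rare levels, modes, medians, means, standard deviations) would be estimated on the training data and then applied unchanged to the testing data, so that no information from the testing set leaks into model building.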
Model specifications.
Third, models were initialized and hyperparameters were either fixed or designated for tuning. Hyperparameters are model parameters set prior to training that govern the learning process and influence model complexity or behavior but are not learned directly from the data. The regularized logistic regression models were fit using the elastic net algorithm in the glmnet package (Friedman et al., 2010; Tay et al., 2023). Elastic net regularization combines the L1 penalty from lasso regression and the L2 penalty from ridge regression (Jacobucci et al., 2023; James et al., 2021; Kuhn & Johnson, 2013). Ridge regression shrinks coefficients toward zero but retains all predictors, making it effective for managing multicollinearity. Lasso regression, in contrast, can shrink some coefficients exactly to zero, performing variable selection. Elastic net blends both penalties, offering a flexible approach that can handle correlated and high-dimensional data while stabilizing estimates. The regularized regression models included only main effects and did not incorporate interaction terms or nonlinear transformations (e.g., polynomial terms). Two hyperparameters were tuned: (1) the amount or strength of regularization and (2) the mixing parameter, which controls the balance of L1 versus L2 penalty.
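For reference, the penalized objective for logistic regression can be written in the form documented for the glmnet package. The symbols are not used elsewhere in this article and are introduced here only for illustration: λ denotes the regularization strength, α the mixing parameter, N the number of training observations, and x_i and y_i the predictors and binary outcome for participant i.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\;
\lambda\Big[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Big]
```

Setting α = 1 leaves only the L1 term (lasso), setting α = 0 leaves only the L2 term (ridge), and intermediate values blend the two. The intercept β₀ is not penalized.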
The random forest models were fit using the ranger package (Wright & Ziegler, 2017). Random forests are ensemble methods that aggregate predictions from multiple decision trees to improve accuracy and reduce overfitting (Jacobucci et al., 2023; James et al., 2021; Kuhn & Johnson, 2013). Each tree is trained on a bootstrap sample of the original dataset, and at each split, a randomly selected subset of predictors is considered. This approach introduces decorrelation among trees, which enhances generalization. Final predictions are made by aggregating across all trees (e.g., majority vote for classification). Two hyperparameters were tuned: (1) the number of predictors randomly selected at each split, and (2) the minimum number of observations required in a terminal node. The number of trees was fixed at 1,000, as model performance is generally robust to this parameter (Kuhn & Johnson, 2013).
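The three sources of randomness and aggregation described above can be sketched as small helper functions. Tree growing itself is omitted, and the function names are hypothetical; this is not how the ranger package is implemented internally.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; each tree is grown on one such sample."""
    return [rng.randrange(n) for _ in range(n)]

def candidate_predictors(predictors, mtry, rng):
    """At each split, only a random subset of `mtry` predictors is considered."""
    return rng.sample(predictors, mtry)

def majority_vote(tree_votes):
    """Aggregate class predictions from individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]

rng = random.Random(1)
rows = bootstrap_indices(100, rng)
subset = candidate_predictors([f"x{i}" for i in range(79)], 7, rng)
```

Because each tree sees a different bootstrap sample and a different subset of predictors at each split, the trees' errors are less correlated, which is what makes averaging across them effective.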
Hyperparameter tuning.
Fourth, hyperparameter tuning was conducted using a grid of 50 hyperparameter combinations generated via a space-filling design (Kuhn & Silge, 2022). This design ensures broad coverage of the hyperparameter space while minimizing redundancy. Models were trained and evaluated across the grid using resampled performance metrics, and the combination yielding the best performance was selected. For elastic net models, the regularization parameter ranged from 1 × 10⁻⁵ to 1, and the mixing parameter varied from 0.00 (pure ridge) to 1.00 (pure lasso). For random forests, the number of predictors at each split ranged from 5 to 50, and the minimum node size ranged from 1 to 15.
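One common space-filling design is the Latin hypercube, sketched below for the elastic net grid. The text does not state which space-filling design was used, so the choice of a Latin hypercube is an assumption, as is sampling the regularization parameter on a log10 scale.

```python
import random

def latin_hypercube(n_points, n_dims, rng):
    """Each dimension is cut into n_points equal strata; every stratum is
    sampled exactly once, and strata are paired at random across dimensions."""
    columns = []
    for _ in range(n_dims):
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)
        columns.append(strata)
    return list(zip(*columns))

def elastic_net_grid(n_points=50, seed=1):
    """Map unit-cube points onto the tuning ranges reported in the text."""
    rng = random.Random(seed)
    grid = []
    for u_penalty, u_mixture in latin_hypercube(n_points, 2, rng):
        log10_penalty = -5 + 5 * u_penalty  # assumed log10 scale: 1e-5 to 1
        grid.append({"penalty": 10 ** log10_penalty, "mixture": u_mixture})
    return grid

grid = elastic_net_grid()
```

Compared with a regular grid, this spreads the 50 candidates so that each hyperparameter is probed at 50 distinct values rather than at a handful of repeated ones.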
Model evaluation.
Fifth, the final models were trained on the full training set and evaluated on the held-out testing set. The primary performance metric was the area under the receiver operating characteristic curve (ROC AUC), which quantifies the model’s ability to discriminate between positive and negative cases across all possible classification thresholds; a value of 0.5 indicates no discriminative ability (i.e., random guessing), whereas a value of 1.0 indicates perfect discrimination. Additional metrics, including accuracy (proportion of correct predictions), sensitivity (proportion of actual positive cases that are correctly predicted; also known as recall), specificity (proportion of actual negative cases that are correctly predicted), precision (proportion of positive predictions that are true positives; also known as positive predictive value), and F1 score (the harmonic mean of precision and sensitivity), were also computed to provide a comprehensive assessment of model performance.
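The threshold-dependent metrics follow directly from the four cells of a confusion matrix, as the sketch below shows. The function name is hypothetical, and the counts in the usage example are invented for illustration; they are not study results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-dependent metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positives / all actual positives
    specificity = tn / (tn + fp)   # true negatives / all actual negatives
    precision = tp / (tp + fp)     # true positives / all positive predictions
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=20, tn=80, fn=10)
```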
Decision threshold.
By default, classification models use a threshold of 0.50 for model-predicted probabilities to classify observations into the positive (e.g., impaired driving = 1) or negative class (0). However, when the proportion of participants endorsing the positive class is very high or very low, this default may yield suboptimal sensitivity or specificity. To improve balance, we identified an optimal decision threshold using Youden’s J statistic (J = sensitivity + specificity – 1; Powers, 2011). This index ranges from 0 (no discrimination) to 1 (perfect discrimination). The threshold that maximizes J represents the point at which the model best distinguishes between classes. Using this criterion helps minimize both false positives and false negatives, which is especially important in public health contexts where both error types carry meaningful consequences.
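A minimal sketch of this threshold search is shown below. It evaluates every distinct predicted probability as a candidate threshold and classifies an observation as positive when its probability is at or above the threshold; both choices are assumptions, and the toy data are invented for illustration.

```python
def youden_threshold(probs, labels):
    """Return (threshold, J) for the threshold maximizing sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy predicted probabilities and observed outcomes (not study data).
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
threshold, j = youden_threshold(probs, labels)  # threshold 0.20, J = 0.75
```

In this toy example the selected threshold is well below 0.50, mirroring what happens when the positive class is relatively uncommon and predicted probabilities rarely exceed one half.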
Model interpretation.
Regularized logistic regression models produced standardized regression coefficients on the log-odds scale. Because all continuous predictors were standardized during preprocessing, each coefficient represents the expected change in the log-odds of the outcome (i.e., impaired driving) associated with a one-unit (i.e., one standard deviation) increase in the predictor, holding all other variables constant. Due to the regularization penalty, these coefficients are shrunk toward zero, with weaker predictors more strongly penalized. As a result, the magnitude of the coefficients reflects both the strength of association and the stability of the predictor in the presence of other correlated variables. Although exponentiating these coefficients yields odds ratios per one standard deviation increase, interpretation should focus on the relative ranking of predictors, rather than the precise magnitude of effect sizes, given the influence of regularization.
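The conversion from a log-odds coefficient to an odds ratio is a single exponentiation, illustrated below. The coefficient value is hypothetical and is not taken from the study's results.

```python
import math

def odds_ratio_per_sd(standardized_coefficient: float) -> float:
    """Convert a log-odds coefficient for a standardized predictor
    into an odds ratio per one standard deviation increase."""
    return math.exp(standardized_coefficient)

# Hypothetical coefficient of 0.30: the odds are multiplied by about 1.35
# for each one standard deviation increase in the predictor.
example_or = odds_ratio_per_sd(0.30)
```

Because regularization shrinks coefficients toward zero, odds ratios obtained this way are typically closer to 1 than their unpenalized counterparts would be.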
Random forest models produced variable importance scores, which reflect the relative contribution of each predictor to the model’s overall performance. These values were calculated using the permutation method (Kuhn & Silge, 2022). In this method, the values of a given predictor are randomly permuted (i.e., shuffled) in the test dataset, breaking any relationship between that predictor and the outcome. The decrease in model performance (ROC AUC) resulting from this permutation is recorded. A larger performance drop indicates that the predictor was important for accurate prediction, while a small or negligible drop suggests limited predictive value. Because this approach captures both main effects and interactions in the model, it is particularly well-suited for nonparametric methods like random forests. As with the regularized regression coefficients, variable importance scores were primarily used to rank predictors in terms of their relative importance for impaired driving prediction.
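The permutation procedure can be sketched for any fitted model that returns a score per observation. In the sketch, `predict` stands in for the fitted model, the toy model and data are invented, and averaging over several permutations (`n_repeats`) is an assumption that the text does not specify.

```python
import random

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, rows, labels, column, n_repeats=10, seed=1):
    """Mean drop in ROC AUC after shuffling one predictor's values."""
    rng = random.Random(seed)
    baseline = roc_auc([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - roc_auc([predict(r) for r in permuted], labels))
    return sum(drops) / n_repeats

# Toy "model" that scores on use frequency only and ignores the noise column.
toy_rows = [{"use_freq": f, "noise": z} for f, z in
            zip([1, 2, 3, 4, 5, 6, 7, 8], [5, 3, 8, 1, 7, 2, 6, 4])]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_predict = lambda row: row["use_freq"]

imp_freq = permutation_importance(toy_predict, toy_rows, toy_labels, "use_freq")
imp_noise = permutation_importance(toy_predict, toy_rows, toy_labels, "noise")
```

Shuffling a predictor the model relies on degrades discrimination, whereas shuffling one it ignores leaves ROC AUC unchanged, so the second importance is exactly zero here.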
Results
Descriptive statistics for numeric and categorical variables in the DUIA sample are presented in Supplemental Tables 2 and 3, respectively. Descriptive statistics for numeric and categorical variables in the DUIC sample are presented in Supplemental Tables 4 and 5, respectively. In the DUIA sample of participants who reported alcohol use in the past month, 18.4% reported past-month DUIA and 15.5% reported past-month DUIC. In the DUIC sample of participants who reported cannabis use in the past month, 36.6% reported past-month DUIC and 15.9% reported past-month DUIA.
Most Salient Predictors of DUIA
For the regularized logistic regression model predicting DUIA, after fitting across a grid of 50 hyperparameter combinations, the best-fitting model used a regularization penalty of 1.84 × 10⁻², with 16.6% lasso and 83.4% ridge penalization. The model demonstrated fair discriminative ability (ROC AUC = 0.74; Supplemental Figure 2a). At the default decision threshold of 0.50, classification accuracy in the test set was 0.82, driven primarily by very high specificity (0.99) but accompanied by very low sensitivity (0.05), indicating poor detection of true positives (Table 1). In contrast, using the decision threshold that maximized Youden’s J statistic (0.17) yielded a more balanced classification, with sensitivity increasing to 0.71 and specificity decreasing to 0.65. However, this came at the cost of lower overall accuracy, which declined to 0.66. These two sets of performance metrics provide context for interpreting the standardized coefficients, which reflect the relative salience of predictors and are unaffected by the chosen threshold.
Table 1.
Performance Metrics
| | Outcome: DUIA | | | | Outcome: DUIC | | | |
|---|---|---|---|---|---|---|---|---|
| | Regularized Regression | | Random Forest | | Regularized Regression | | Random Forest | |
| Classification Threshold | Default (0.50) | J-Index (0.17) | Default (0.50) | J-Index (0.19) | Default (0.50) | J-Index (0.33) | Default (0.50) | J-Index (0.34) |
| ROC AUC | 0.74 | 0.74 | 0.73 | 0.73 | 0.77 | 0.77 | 0.77 | 0.77 |
| Accuracy | 0.82 | 0.66 | 0.82 | 0.64 | 0.71 | 0.70 | 0.71 | 0.68 |
| Sensitivity | 0.05 | 0.71 | 0.02 | 0.73 | 0.52 | 0.80 | 0.52 | 0.81 |
| Specificity | 0.99 | 0.65 | 0.99 | 0.63 | 0.81 | 0.65 | 0.82 | 0.60 |
| Precision | 0.49 | 0.32 | 0.62 | 0.31 | 0.62 | 0.57 | 0.63 | 0.54 |
| F1 Score | 0.10 | 0.44 | 0.03 | 0.43 | 0.57 | 0.66 | 0.57 | 0.65 |
Note. DUIA = Driving under the influence of alcohol; DUIC = Driving under the influence of cannabis; ROC AUC = Area under the receiver operating characteristic curve.
Figure 1 presents a forest plot of the 10 strongest standardized coefficients, with values for all predictors provided in Supplemental Table 6. The strongest predictor of DUIA was alcohol use frequency followed by age and maximum number of drinks on an occasion in the past month, all of which were positively associated with DUIA. Other top predictors of DUIA included age of alcohol use initiation, calendar year, and cannabis use frequency, which were negatively associated with DUIA, and full-time employment, which was positively associated with DUIA.
Figure 1.

Standardized Coefficients for the Top 10 Predictors from the Regularized Logistic Regression Model Predicting Driving Under the Influence of Alcohol.
For the random forest model predicting DUIA, after hyperparameter tuning, the best-fitting model randomly selected 7 predictor variables at each tree split and used a minimum node size of one participant (i.e., node size did not restrict further tree splitting). This model also demonstrated fair discriminative ability (ROC AUC = 0.73; Supplemental Figure 2b). At the default decision threshold of 0.50, the test set accuracy was 0.82, driven by very high specificity (0.99) but extremely low sensitivity (0.02), indicating poor detection of true positives. Adjusting the threshold to maximize Youden’s J statistic (0.19) resulted in a more balanced classification, with sensitivity improving to 0.73 and specificity declining to 0.63. However, this improvement in balance came at the cost of reduced overall accuracy, which dropped to 0.64. As with the regularized regression model, variable importance scores are unaffected by the decision threshold used.
Figure 2 shows a variable importance plot of the 10 most influential predictors from the random forest model, with full results in Supplemental Table 7. Importance values represent the decrease in model performance (ROC AUC) when a given predictor is permuted, reflecting its contribution to the model’s overall predictive power. Although variable importance values do not convey directionality, associations were inferred from bivariate relationships with the outcome to aid interpretation. As in the regularized regression model, the most important predictor of DUIA was alcohol use frequency, followed by age and maximum number of drinks on an occasion in the past month, all of which were positively associated with DUIA. Next were cigarette use frequency, which was positively associated with DUIA, and age of cigarette use initiation, which was negatively associated with DUIA. Other top predictors of DUIA were descriptive norms for alcohol use frequency, full-time employment, and number of drinks consumed on a typical drinking occasion, which were all positively associated with DUIA, and age of alcohol use initiation and cannabis use frequency, which were negatively associated with DUIA.
Figure 2.

Variable Importance Scores for the Top 10 Predictors from the Random Forest Model Predicting Driving Under the Influence of Alcohol.
The regularized regression and random forest models predicting DUIA yielded highly similar results in terms of optimal decision thresholds, test set performance, and the most salient predictors. The stark differences in performance metrics (excluding ROC AUC) between the default decision threshold and the threshold that maximized Youden’s J statistic likely reflect the relatively low base rate of DUIA in the sample. Across both models, the top three predictors were alcohol use frequency, age, and maximum drinks per occasion. Additional predictors consistently ranked highly included age of alcohol use initiation, full-time employment, and cannabis use frequency. Despite structural differences between the models (i.e., parametric vs. nonparametric), the convergence of findings reinforces the robustness of these predictors in identifying individuals at risk for alcohol-impaired driving.
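The role of the base rate can be checked with simple arithmetic, because overall accuracy is a prevalence-weighted average of sensitivity and specificity. The sketch below plugs in the rounded values reported for the regularized regression model and assumes that the DUIA prevalence in the test set matches the 18.4% reported for the full DUIA sample.

```python
def expected_accuracy(prevalence, sensitivity, specificity):
    """Accuracy = P(positive) * sensitivity + P(negative) * specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

prevalence = 0.184  # assumed equal to the full-sample DUIA rate

# Default threshold (0.50): nearly everyone is classified as negative.
acc_default = expected_accuracy(prevalence, sensitivity=0.05, specificity=0.99)

# Youden's J threshold (0.17): more balanced, but more false positives.
acc_youden = expected_accuracy(prevalence, sensitivity=0.71, specificity=0.65)
```

Under these assumptions the two values round to 0.82 and 0.66, matching Table 1. With roughly four in five participants in the negative class, a model that almost never predicts DUIA still attains high accuracy, which is why accuracy alone is a poor guide at the default threshold.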
Most Salient Predictors of DUIC
For the regularized regression model predicting DUIC, after hyperparameter tuning, the best-fitting model used a regularization penalty of 2.33 × 10⁻², with 30.2% of the penalty being lasso and 69.8% ridge. The model demonstrated fair discriminative ability (ROC AUC = 0.77; Supplemental Figure 2c). At the default decision threshold of 0.50, test set accuracy was 0.71, with sensitivity of 0.52 and specificity of 0.81. When the decision threshold was adjusted to maximize Youden’s J statistic (0.33), sensitivity increased to 0.80 and specificity decreased to 0.65, while overall accuracy remained similar at 0.70.
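For readers less familiar with elastic net models, the tuned values above can be placed in the penalized objective as parameterized by Friedman et al. (2010). This is a sketch under two assumptions not stated in the text: that the reported penalty corresponds to λ and that the lasso share corresponds to the mixing parameter α. Here ℓ denotes the binomial log-likelihood and N the number of training observations.

```latex
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\,\ell(\beta_0,\beta)
  + \lambda\left[\alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right],
\qquad \lambda = 2.33\times10^{-2},\quad \alpha = 0.302
```

Under this reading, α = 1 would give a pure lasso penalty and α = 0 a pure ridge penalty, so the tuned model sits closer to ridge.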
Figure 3 shows a forest plot of the 10 strongest standardized coefficients, with full results presented in Supplemental Table 8. The strongest predictor of DUIC was cannabis use frequency followed by the frequency of cannabis-related memory problems and simultaneous alcohol and marijuana/cannabis (SAM) use frequency, all of which were positively associated with DUIC. Other top predictors of DUIC included living in an apartment or condo (versus in a house or townhouse), age of cannabis use initiation, and calendar year, which were all negatively associated with DUIC, and past-year pain reliever use frequency and noticing increased tolerance to cannabis, which were positively associated with DUIC.
Figure 3.

Standardized Coefficients for the Top 10 Predictors from the Regularized Logistic Regression Model Predicting Driving Under the Influence of Cannabis.
For the random forest model predicting DUIC, after hyperparameter tuning, the best-fitting model randomly selected 15 predictor variables at each tree split and required a minimum node size of 6 participants. The model demonstrated fair discriminative ability (ROC AUC = 0.77; Supplemental Figure 2d). At the default decision threshold of 0.50, test set accuracy was 0.71, with sensitivity of 0.52 and specificity of 0.82. When the threshold was adjusted to maximize Youden’s J statistic (0.34), sensitivity increased to 0.81 and specificity decreased to 0.60, resulting in a modest decrease in accuracy to 0.68.
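The threshold adjustment described above can be sketched as a simple search for the cutoff that maximizes Youden’s J (sensitivity + specificity − 1). The predicted probabilities, labels, and candidate grid below are hypothetical and are not drawn from the study data; the function names are illustrative only.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Classify as positive when the predicted probability is >= threshold."""
    tp = fn = tn = fp = 0
    for p, label in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if label == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return tp / (tp + fn), tn / (tn + fp)

def best_youden_threshold(probs, labels, candidates):
    """Return (threshold, J) for the first candidate that maximizes J."""
    best = None
    for t in candidates:
        sens, spec = sensitivity_specificity(probs, labels, t)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best

# Hypothetical predictions for a low-base-rate outcome (3 of 10 positive).
probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
candidates = [i / 100 for i in range(1, 100)]

threshold, j_best = best_youden_threshold(probs, labels, candidates)
sens_default, spec_default = sensitivity_specificity(probs, labels, 0.50)
print(threshold, j_best, sens_default, spec_default)
```

In this toy example the default cutoff of 0.50 misses most positive cases, while the J-maximizing cutoff falls well below 0.50, mirroring the trade of specificity for sensitivity reported for both models.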
Figure 4 shows a variable importance plot of the 10 most salient predictors from the random forest model, with full results in Supplemental Table 9. Similar to the regularized regression model, the most salient predictor of DUIC was cannabis use frequency, which was positively associated with DUIC. Next were the frequencies of experiencing cannabis-related memory problems and “having the munchies,” both of which were positively associated with DUIC. Other top predictors of DUIC were SAM use frequency, noticing increased tolerance to cannabis, the frequency of experiencing low motivation due to cannabis use, and cigarette use frequency, all of which were positively associated with DUIC, as well as age of cannabis use initiation, which was negatively associated with DUIC.
Figure 4.

Variable Importance Scores for the Top 10 Predictors from the Random Forest Model Predicting Driving Under the Influence of Cannabis.
As with DUIA, the regularized regression and random forest models predicting DUIC produced highly similar results in terms of optimal decision thresholds, performance metrics, and the most salient predictors. In both models, the top two predictors were cannabis use frequency and frequency of experiencing cannabis-related memory problems. Other consistently important predictors included SAM use frequency, noticing increased tolerance to cannabis, and age of cannabis use initiation. The models differed slightly in the exact ordering of predictors and in which variables were ranked as most salient beyond the top few. Nevertheless, despite differences in model structure (i.e., parametric vs. nonparametric), the consistency of findings across methods reinforces the robustness of these predictors in identifying individuals at risk for cannabis-impaired driving.
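The overlap between the two DUIC models can be tallied directly from the predictors named in the text. The sketch below uses only those named predictors (8 of each model's top 10; the remaining two per model are not listed in the prose), and the Jaccard index is added here purely as an illustration rather than as a measure used in the study.

```python
# Predictors named in the text among each model's top 10 for DUIC.
regularized_regression = {
    "cannabis use frequency",
    "cannabis-related memory problems",
    "SAM use frequency",
    "living in an apartment or condo",
    "age of cannabis use initiation",
    "calendar year",
    "past-year pain reliever use frequency",
    "increased cannabis tolerance",
}
random_forest = {
    "cannabis use frequency",
    "cannabis-related memory problems",
    "having the munchies",
    "SAM use frequency",
    "increased cannabis tolerance",
    "low motivation due to cannabis use",
    "cigarette use frequency",
    "age of cannabis use initiation",
}

# Predictors that both models place among their named top predictors.
shared = regularized_regression & random_forest
# Jaccard index: size of the intersection divided by size of the union.
jaccard = len(shared) / len(regularized_regression | random_forest)

print(sorted(shared))
print(round(jaccard, 2))
```

The five shared predictors are the same ones singled out in the summary above as consistently important across models.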
Discussion
This study offers new insights into the predictors of alcohol- and cannabis-impaired driving among young adults in a context where nonmedical cannabis legalization has been implemented, using a robust machine learning framework. The findings highlight the value of applying data-driven models to better identify the most salient risk factors of impaired driving while also supporting findings from parametric models employed in previous research. Moreover, the consistency between regularized regression and random forest models in terms of the most salient predictors, as well as their comparable performance metrics, increases confidence in the stability and validity of the results.
For alcohol-impaired driving, both modeling approaches revealed that alcohol use frequency was by far the strongest predictor, which is consistent with prior research and theoretical expectations (Birdsall et al., 2012; Fairlie et al., 2010; LaBrie et al., 2011; Naimi et al., 2009). Qualitative and mixed methods research indicates young drivers view low levels of drinking (i.e., 1–2 drinks) prior to driving as safe for people, in general, but tend to base the assessment of their own ability to drive on internal feelings of intoxication instead of their drinking behavior, which can be erroneous (e.g., Vaca et al., 2024). This may be particularly relevant for people who drink more frequently and who are less accurate in assessing their actual level of impairment (Aston & Liguori, 2013). A key strength of the machine learning methods used is their ability to assess the relative importance of correlated predictors without allowing dominant variables like alcohol use frequency to overshadow other meaningful factors. In contrast to traditional parametric models, which may allow highly correlated predictors to suppress one another’s effects, the machine learning approaches applied here enabled the identification of additional variables that made independent contributions to DUIA risk. Age, the maximum number of drinks on a single occasion, and full-time employment status also emerged as positively associated with DUIA. These findings highlight that DUIA risk reflects not only patterns of substance use but also broader sociodemographic and lifestyle characteristics (Birdsall et al., 2012; Fan et al., 2019; LaBrie et al., 2011; Simons-Morton et al., 2011). Additionally, the identification of age of alcohol use initiation as a negative predictor highlights the protective influence of delayed substance use onset (Dawson et al., 2008; Guttmannova et al., 2011).
The negative association between cannabis use frequency and DUIA is somewhat counterintuitive, as prior research has consistently shown that cannabis use frequency is positively associated with alcohol use frequency at the between-person level (e.g., Guttmannova et al., 2021). However, findings from event-level and day-level studies suggest that cannabis use may sometimes be associated with reduced alcohol use in specific contexts, such as when individuals use cannabis instead of drinking or delay alcohol use on cannabis use days (Gunn et al., 2022). These substitution patterns may help explain why those who reported more frequent cannabis use in this study also reported lower likelihood of alcohol-impaired driving, despite typically higher levels of alcohol involvement. Additionally, qualitative research consistently shows that DUIC is perceived as less dangerous than DUIA and, among some young adults, believed to improve driving ability (e.g., Colonna et al., 2021; Goodman et al., 2020; Resko et al., 2019; Wickens et al., 2019). It is possible that individuals who use cannabis frequently are less likely to engage in DUIA because they perceive DUIA as more harmful than DUIC. Additional research is needed to examine how the perceived safety of DUI affects these associations. Moreover, this association emerged after accounting for a wide array of covariates, including alcohol use frequency and quantity, suggesting a unique contribution of cannabis use frequency to lower DUIA risk.
For cannabis-impaired driving, cannabis use frequency emerged as the strongest predictor. As with alcohol-impaired driving, the use of machine learning methods allowed for the identification of additional salient predictors without allowing the dominant influence of cannabis frequency to overshadow other meaningful factors. Among these, the frequency of cannabis-related memory problems and SAM use stood out as particularly important. These results are consistent with prior research indicating that both cannabis and alcohol use are strong predictors of cannabis-impaired driving (Lloyd et al., 2020; Sterzer et al., 2022). Age of cannabis use initiation also emerged as a salient predictor, with earlier initiation associated with higher risk of DUIC, aligning with evidence that early substance use is linked to more problematic use patterns later in life (Fairman et al., 2019; Lloyd et al., 2020; Rioux et al., 2018).
The presence of memory problems and perceived increases in cannabis tolerance among the top predictors of DUIC suggests that subjective cognitive and physiological responses to cannabis may also be critical indicators of DUIC risk. Notably, both memory problems and tolerance are diagnostic criteria for CUD as defined in the DSM-5 (American Psychiatric Association, 2013). In addition, DUIC itself is commonly cited as a behavioral example of the CUD criterion involving recurrent use in physically hazardous situations. The emergence of these specific symptoms as leading predictors of DUIC provides empirical support for the validity of the CUD diagnostic framework and suggests that individuals who report DUIC may be at elevated risk of meeting CUD criteria. More broadly, these findings raise the possibility that certain CUD criteria may be more strongly associated with high-risk cannabis-related behaviors than others, pointing to meaningful variability in the functional impact of individual diagnostic features.
One of the notable contributions of this study is the use of complementary machine learning approaches (i.e., regularized regression and random forests) to evaluate a broad array of potential predictors without relying on the assumptions of traditional statistical models. Regularized regression offers several key strengths, including the ability to shrink less informative coefficients toward zero to reduce overfitting, while retaining interpretable, standardized coefficients that convey both the direction and relative strength of associations. This interpretability allows for more nuanced understanding of how specific variables are associated with impaired driving outcomes. In contrast, random forests provide a highly flexible, nonparametric modeling framework that can account for complex nonlinear relationships and interactions among predictors, even if these are not explicitly modeled or reported. Random forests also generate robust, permutation-based estimates of variable importance that present a rank ordering of predictors based on their contributions to model performance. The consistency observed across these two distinct modeling approaches, both in predictive performance and in identifying top predictors, strengthens the validity of the findings. This convergence is especially important given the “no free lunch” theorem in machine learning, which states that no single algorithm is universally optimal across all problems (Jacobucci et al., 2023; James et al., 2021; Wolpert, 1996). By applying two methodologically distinct approaches and observing aligned results, this study increases the likelihood that the identified predictors reflect genuine patterns in the data rather than artifacts of a specific model.
Furthermore, this study illustrates the importance of accounting for multiple domains of risk, including not only substance use behaviors but also social and behavioral factors and demographic characteristics. Prior work has shown that impaired driving is rarely the result of a single factor; rather, it emerges from a constellation of behavioral patterns, contextual factors, and individual vulnerabilities (e.g., Fan et al., 2019; Lloyd et al., 2020; Sterzer et al., 2022). For instance, demographic characteristics such as age, employment or socioeconomic status, and housing type may shape exposure to risk, access to transportation alternatives, and opportunities for substance use (Fan et al., 2019; White et al., 2021). Social influences, including perceived norms around drinking or cannabis use, can also play a powerful role in shaping attitudes and decisions around impaired driving (LaBrie et al., 2011; Lloyd et al., 2020). Accounting for these diverse domains allows for a more ecologically valid understanding of who is most at risk for impaired driving and why. By integrating these varied predictors into a unified modeling framework, this study moves the field toward a more comprehensive understanding of impaired driving behavior.
These findings have practical implications for the design of focused intervention strategies. Prevention and intervention efforts should include a specific focus on high-frequency substance use, early initiation of use, and cannabis-related functional impairments. Educational campaigns could also be tailored to individuals in high-risk living or employment situations, and messaging could focus on correcting misconceptions about the safety of cannabis or SAM use while driving. Additionally, integrating screening tools that assess polysubstance use patterns and related cognitive symptoms (e.g., memory problems) into routine health or educational settings could help identify at-risk individuals earlier and facilitate timely intervention.
Despite the strengths of this study, including the use of large-scale data and advanced modeling techniques, there are several limitations. First, the data are cross-sectional and self-reported, which introduces potential biases, particularly around sensitive behaviors such as impaired driving. Second, the models were trained on a specific set of variables available in the dataset, and other potentially important predictors not captured in the survey (e.g., mental health problems, access to alternative transportation, or law enforcement presence) may also play a significant role but were not evaluated here. Third, because the study was conducted in a single U.S. state where nonmedical cannabis use has been legal for over a decade, the findings may not generalize to states with different legal frameworks, enforcement practices, or cultural attitudes toward cannabis.
In conclusion, this study demonstrates that machine learning can meaningfully advance multidimensional understanding of impaired driving by identifying the most salient predictors across various ecological domains in complex datasets. The findings suggest both alcohol- and cannabis-impaired driving are closely tied to patterns of recent use, but also influenced by cognitive, behavioral, and sociodemographic factors. These insights may help inform data-driven prevention and intervention strategies tailored to high-risk subgroups and ultimately help reduce the public health burden of impaired driving.
Supplementary Material
Acknowledgments
This research was supported by a grant from the National Institute on Drug Abuse (R01DA057705, PI: Guttmannova), a contract with the Washington State Health Care Authority (Division of Behavioral Health and Recovery) (PI: Kilmer), and a grant from the National Institute on Alcohol Abuse and Alcoholism (R00AA030052, PI: McCabe). The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse, the National Institute on Alcohol Abuse and Alcoholism, the National Institutes of Health, or the Washington State Health Care Authority. The authors have no conflicts of interest to report.
References
- American Psychiatric Association (2013) Diagnostic and statistical manual of mental disorders. 5th ed. 10.1176/appi.books.9780890425596
- Arterberry BJ, Treloar HR, Smith AE, Martens MP, Pedersen SL, McCarthy DM (2013) Marijuana use, driving, and related cognitions. Psychol Addict Behav 27:854–860. https://psycnet.apa.org/doi/10.1037/a0030877
- Aston ER, Liguori A (2013) Self-estimation of blood alcohol concentration: A review. Addictive Behav 38:1944–1951. 10.1016/j.addbeh.2012.12.017
- Berg CJ, Daniel CN, Vu M, Li J, Martin K, Le L (2018) Marijuana use and driving under the influence among young adults: A socioecological perspective on risk factors. Subst Use Misuse 53:370–380. 10.1080/10826084.2017.1327979
- Birdsall WC, Reed BG, Huq SS, Wheeler L, Rush S (2012) Alcohol-impaired driving: Average quantity consumed and frequency of drinking do matter. Traffic Inj Prev 13:24–30. 10.1080/15389588.2011.629700
- Brown T, Banz B, Schmitt R, Gaffney G, Milavetz G, Camenga D, Li K, Brooks-Russell A, Vaca F (2022) A study of self-reported personal cannabis use and state legal status and associations with engagement in and perceptions of cannabis-impaired driving. Traffic Inj Prev 23(sup1):S183–S186. 10.1080/15389588.2022.2124803
- Centers for Disease Control and Prevention. Leading causes of death reports, 2018-2023 [database online]. Available at: https://wisqars.cdc.gov/fatal-leading. Accessed August 8, 2025.
- Colonna R, Hand CL, Holmes JD, Alvarez L (2021) Exploring youths’ beliefs towards cannabis and driving: A mixed method study. Transportation Research Part F: Traffic Psychology and Behaviour 82:429–439. 10.1016/j.trf.2021.09.013
- Dawson DA, Goldstein RB, Chou SP, Ruan WJ, Grant BF (2008) Age at first drink and the first incidence of adult-onset DSM-IV alcohol use disorders. Alcohol: Clin Exp Res 32:2149–2160. 10.1111/j.1530-0277.2008.00806.x
- Fairlie AM, Quinlan KJ, DeJong W, Wood MD, Lawson D, Witt CF (2010) Sociodemographic, behavioral, and cognitive predictors of alcohol-impaired driving in a sample of US college students. J Health Commun 15:218–232. 10.1080/10810730903528074
- Fairman BJ, Furr-Holden CD, Johnson RM (2019) When marijuana is used before cigarettes or alcohol: Demographic predictors and associations with heavy use, cannabis use disorder, and other drug-related outcomes. Prev Sci 20:225–233. 10.1007/s11121-018-0908-3
- Fan AZ, Grant BF, Ruan WJ, Huang B, Chou SP (2019) Drinking and driving among adults in the United States: Results from the 2012–2013 National Epidemiologic Survey on Alcohol and Related Conditions-III. Accid Anal Prev 125:49–55. 10.1016/j.aap.2019.01.016
- Farrelly KN, Wardell JD, Marsden E, Scarfe ML, Najdzionek P, Turna J, MacKillop J (2023) The impact of recreational cannabis legalization on cannabis use and associated outcomes: A systematic review. Subst Abuse: Res Treat 17:11782218231172054. 10.1177/11782218231172054
- Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1–22. 10.18637/jss.v033.i01
- Gebers MA, Peck RC (2003) Using traffic conviction correlates to identify high accident-risk drivers. Accid Anal Prev 35:903–912. 10.1016/S0001-4575(02)00098-2
- Goodman SE, Leos-Toro C, Hammond D (2020) Risk perceptions of cannabis-vs. alcohol-impaired driving among Canadian young people. Drugs: Educ Prev Policy 27:205–212. 10.1080/09687637.2019.1611738
- Gunn RL, Aston ER, Metrik J (2022) Patterns of cannabis and alcohol co-use: Substitution versus complementary effects. Alcohol Research: Current Reviews 42:04. 10.35946/arcr.v42.1.04
- Guttmannova K, Bailey JA, Hill KG, Lee JO, Hawkins JD, Woods ML, Catalano RF (2011) Sensitive periods for adolescent alcohol use initiation: Predicting the lifetime occurrence and chronicity of alcohol problems in adulthood. J Stud Alcohol Drugs 72:221–231. 10.15288/jsad.2011.72.221
- Guttmannova K, Fleming CB, Rhew IC, Abdallah DA, Patrick ME, Duckworth JC, Lee CM (2021) Dual trajectories of cannabis and alcohol use among young adults in a state with legal nonmedical cannabis. Alcohol: Clin Exp Res 45:1458–1467. 10.1111/acer.14629
- Hultgren BA, Guttmannova K, Cadigan JM, Kilmer JR, Delawalla ML, Lee CM, Larimer ME (2023) Injunctive norms and driving under the influence and riding with an impaired driver among young adults in Washington State. J Adolesc Health 73:852–858. 10.1016/j.jadohealth.2023.06.010
- Jacobucci R, Grimm KJ, Zhang Z (2023) Machine learning for social and behavioral research. Guilford Publications.
- James G, Witten D, Hastie T, Tibshirani R (2021) An introduction to statistical learning with Applications in R. 2nd ed. Springer.
- Kilmer JR, Rhew IC, Guttmannova K, Fleming CB, Hultgren BA, Gilson MS, Cooper RL, Dilley J, Larimer ME (2022) Cannabis use among young adults in Washington State after legalization of nonmedical cannabis. Am J Public Health 112:638–645. 10.2105/AJPH.2021.306641
- Kuhn M, Johnson K (2013) Applied predictive modeling. Springer.
- Kuhn M, Silge J (2022) Tidy modeling with R: A framework for modeling in the tidyverse. O'Reilly Media, Inc.
- LaBrie JW, Kenney SR, Mirza T, Lac A (2011) Identifying factors that increase the likelihood of driving after drinking among college students. Accid Anal Prev 43:1371–1377. 10.1016/j.aap.2011.02.011
- Lloyd SL, Lopez-Quintero C, Striley CW (2020) Sex differences in driving under the influence of cannabis: The role of medical and recreational cannabis use. Addictive Behav 110:106525. 10.1016/j.addbeh.2020.106525
- Naimi TS, Nelson DE, Brewer RD (2009) Driving after binge drinking. Am J Prev Med 37:314–320. 10.1016/j.amepre.2009.06.013
- National Center for Statistics and Analysis (2025, April, Revised) Traffic safety facts 2022: A compilation of motor vehicle traffic crash data (Report No. DOT HS 813 656). National Highway Traffic Safety Administration. Accessed on August 8, 2025. https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813656
- Pearlson GD, Stevens MC, D'Souza DC (2021) Cannabis and driving. Front Psychiatry 12:689444. 10.3389/fpsyt.2021.689444
- Powers DMW (2011) Evaluation: From precision, recall and F-score to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies 2:37–63. 10.48550/arXiv.2010.16061
- R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
- Resko S, Ellis J, Early TJ, Szechy KA, Rodriguez B, Agius E (2019) Understanding public attitudes toward cannabis legalization: Qualitative findings from a statewide survey. Subst Use Misuse 54:1247–1259. 10.1080/10826084.2018.1543327
- Rioux C, Castellanos-Ryan N, Parent S, Vitaro F, Tremblay RE, Séguin JR (2018) Age of cannabis use onset and adult drug abuse symptoms: A prospective study of common risk factors and indirect effects. Can J Psychiatry 63:457–464. 10.1177/0706743718760289
- Romano EO, Peck RC, Voas RB (2012) Traffic environment and demographic factors affecting impaired driving and crashes. J Saf Res 43:75–82. 10.1016/j.jsr.2011.12.001
- Salas-Wright CP, Cano M, Hai AH, Oh S, Vaughn MG (2021) Prevalence and correlates of driving under the influence of cannabis in the US. Am J Prev Med 60:e251–e260. 10.1016/j.amepre.2021.01.021
- Simons-Morton BG, Ouimet MC, Zhang Z, Klauer SE, Lee SE, Wang J, Chen R, Albert P, Dingus TA (2011) The effect of passengers and risk-taking friends on risky driving and crashes/near crashes among novice teenagers. J Adolesc Health 49:587–593. 10.1016/j.jadohealth.2011.02.009
- Sterzer FR, Caird JK, Simmons S, Bourdage JS (2022) A scoping review of predictors of driving under the influence of cannabis (DUIC) in young drivers. Transportation Research Part F: Traffic Psychology and Behaviour 88:168–183. 10.1016/j.trf.2022.05.014
- Substance Abuse and Mental Health Services Administration (SAMHSA) (2022) Results from the 2021 National Survey on Drug Use and Health: Detailed tables. Available at: https://www.samhsa.gov/data/report/2021-nsduh-detailed-tables. Accessed on August 5, 2025.
- Tay JK, Narasimhan B, Hastie T (2023) Elastic net regularization paths for all generalized linear models. J Stat Softw 106:1–31. 10.18637/jss.v106.i01
- Vaca FE, Camenga DR, Li K, Zuniga V, Banz B, Iannotti RJ, Grayton C, Simons-Morton B, Haynie DL, Curry LA (2024) Individual and social-environmental factors among young drivers informing decisions to ride with an impaired driver and drive impaired: A sequential mixed methods assessment. Traffic Inj Prev 25:S15–S24. 10.1080/15389588.2024.2368595
- Wickens CM, Stoduto G, Ilie G, Di Ciano P, McDonald AJ, Mistry A, Alawi A, Sharma S, Hamilton H, Nigatu YT, Elton-Marshall T, Mann RE (2022) Driving under the influence of cannabis among recreational and medical cannabis users: A population study. J Transp Health 26:101402. 10.1016/j.jth.2022.101402
- Wickens CM, Watson TM, Mann RE, Brands B (2019) Exploring perceptions among people who drive after cannabis use: Collision risk, comparative optimism and normative influence. Drug Alcohol Rev 38:443–451. 10.1111/dar.12923
- Windle SB, Sequeira C, Filion KB, Thombs BD, Reynier P, Grad R, Ells C, Eisenberg MJ (2021) Impaired driving and legalization of recreational cannabis. Can Med Assoc J 193:E481–E485. 10.1503/cmaj.191032
- Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8:1341–1390. 10.1162/neco.1996.8.7.1341
- Wright MN, Ziegler A (2017) ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17. 10.18637/jss.v077.i01
- Yadav AK, Velaga NR (2020) Alcohol-impaired driving in rural and urban road environments: Effect on speeding behaviour and crash probabilities. Accid Anal Prev 140:105512. 10.1016/j.aap.2020.105512
