ABSTRACT
A concern surrounding marijuana legalization is that driving after marijuana use may become more prevalent. Survey data are valuable for estimating policy effects; however, their observational nature and unequal sampling probabilities create challenges for causal inference. To estimate population-level effects using survey data, we propose a matched design and implement sensitivity analyses to quantify how robust our conclusions are to unmeasured confounding. Both theoretical justification and simulation studies are presented. We found no evidence that marijuana legalization increased tolerant behaviors and attitudes toward driving after marijuana use, and these conclusions appear moderately robust to unmeasured confounding.
Keywords: causal inference, marijuana legalization, propensity score matching, sensitivity analysis, survey sampling inference
1. Introduction
Surveys are a valuable source of information for health program and policy evaluation, as they can gather individual-level information about pertinent behaviors and attitudes that are not easily measured for all members of a population. Furthermore, through the use of weights, survey data can produce estimates that are representative of their target population. Policy evaluations are typically rooted in causal questions about the effect of a policy on relevant outcomes in a population. However, estimating population-level causal policy effects with survey data poses multiple challenges: policies are enacted at the level of specific subpopulations; the data are observational and may suffer from both measured and unmeasured confounding bias; and survey weights must be incorporated to preserve the population-level interpretation. In this paper, we propose a matched design to overcome these issues and estimate the effects of medical marijuana legalization on residents of states that are likely to pass such a bill in the near future.
For state-level health policy analyses, meeting the assumptions needed to infer causal effects from observational data (Rosenbaum 2010) is particularly challenging, because the "treatments," that is, the policies, which are better described as "exposures," are assigned at the state level rather than the individual level. Common policy analysis approaches such as interrupted time series (see, e.g., Kontopantelis et al. 2015) instead leverage temporal changes within a state or other municipal unit that has enacted a policy. Additional methods, such as controlled interrupted time-series analysis and difference-in-differences, also incorporate a control group that did not enact a policy within the time period of interest (Lechner 2011). But identifying comparable control groups at the state level is often difficult in practice. Another methodological advance in policy evaluation was the synthetic control method of Abadie, Diamond, and Hainmueller (2010), who used a weighted average of possible control states to create a more appropriate control group that resembles the exposed or treated group with respect to aggregate pretreatment measures of the outcome and observed confounders.
With population surveys, researchers often have access to individual data. Aggregating them at the state level may not take full advantage of this rich information. Moreover, an aggregated state-level analysis leaves a very limited number of comparison units, which makes comparability questionable. In the present study, we expand the traditional synthetic control approach by considering individual survey data. Our goal is to construct a matched data set in which (i) individual-level covariate balance mimics a randomized setting; and (ii) much like the synthetic control method, both the exposed and unexposed groups mimic the target population in aggregate. To achieve this, we start by casting the survey data into the potential outcomes framework and matching on the propensity score and survey weight to achieve individual-level covariate balance. To obtain the causal interpretation for the target population, we preserve the sample from the exposure group of interest and then weight the test statistic using their sampling weights. Even though we only use one year of survey data, the causal interpretation is justified because marijuana legalization had already occurred in all states considered in the synthetic comparison group, and their residents were exposed to the policy change prior to the survey.
Propensity score weighting and matching are popular tools for confounding control in observational studies; however, it is still not clear what the best practice is for combining the propensity score and the survey design. Zanutto (2006) combined the sampling weight and propensity score weight to produce a weighted estimator for the population average effect and suggested that the propensity score model should be fit without survey weights, as it aims to achieve sample-level balance. This idea was echoed by DuGoff, Schuler, and Stuart (2014) and Lenis et al. (2017). Wang et al. (2009) and Ridgeway et al. (2015), however, recommended estimating the propensity score with sampling weight adjustment to achieve population-level balance. A case study evaluation by Dong et al. (2020) did not find differences between various methods of incorporating survey weights into a propensity score analysis. Since there is not enough empirical justification for or against incorporating survey weights into a propensity score analysis, we based our procedure on the targeted population-level inference and theoretical results presented in Nattino (2019). Nattino argued that using survey weights in estimating propensity scores is advantageous if the survey weights contain additional important information not included in the covariates of the propensity score model. This may occur in practical scenarios when the survey administrator withholds some design information for confidentiality reasons, or when survey weights are adjusted for nonresponse. To be cautious, we follow Nattino's suggestion to estimate a survey-weighted propensity score. Moreover, whereas Nattino and others have discussed incorporating survey weights into propensity score weighting, we instead propose a propensity score matched design for survey data, which has not been discussed much in the literature (previous studies include DuGoff, Schuler, and Stuart 2014 and Lenis et al. 2017).
A key difference from conventional matching is that we also match on survey weights to ensure that the matched samples are representative of their target populations. Another advantage of matching over weighting is that it is relatively easy to implement a sensitivity analysis after matching to examine the potential impact of unmeasured confounders. We therefore further extend the conventional sensitivity analysis strategy to matched survey data to provide a complete picture for health policy evaluation.
1.1. Motivating Example: Medical Marijuana Legalization in Kentucky and Tennessee
Marijuana policies in the United States are rapidly liberalizing, prompting concerns about the impacts of increased availability and normalization of marijuana use (Anderson and Rees 2014). As of 2022, 36 states and the District of Columbia had legalized medical marijuana. Despite recent legislative efforts, Kentucky and Tennessee have yet to do so. Kentucky recently proposed House Bill 136 to legalize medical marijuana. Despite bipartisan support, the bill failed to move past committee assignment. Kentucky lawmakers and advocates alike continue to try to advance medical marijuana, and arguments will likely be heard in upcoming legislative sessions (Ragusa 2021). Tennessee, which as of 2022 prohibits medical marijuana for most purposes, recently established a commission to propose a state medical marijuana program (Garrett and Li 2021). Given these recent developments, we posit that medical marijuana legalization in Kentucky and Tennessee is likely in the coming years.
A common concern surrounding marijuana legalization is that, through tacit endorsement of marijuana use and increased availability, driving after marijuana use (DAMU) may become more prevalent (Anderson and Rees 2014). Although previous studies have found that medical marijuana legalization is associated with multiple marijuana-related outcomes (Hasin 2018), evidence of an association between these policies and traffic safety is mixed. Benedetti et al. (2020, 2021) and Fink et al. (2020) found that medical marijuana legalization is associated with a higher prevalence of self-reported DAMU. Acute consumption of Δ9-tetrahydrocannabinol (THC), the primary psychoactive ingredient in marijuana, causes motor and cognitive impairment, which has been linked to driving decrements (McCartney et al. 2021). However, both Santaella-Tenorio et al. (2017) and Cook et al. (2020) found that medical marijuana legalization was associated with lower motor vehicle fatality rates. These mixed results, occurring amidst increasingly permissive marijuana policies, motivate additional research on the effects of marijuana legalization on DAMU. Moreover, the concern that legalizing marijuana will affect traffic safety is prevalent in both academic and political discourse (see, e.g., Windle et al. 2022; Henry 2023). This concern posits a specific causal relationship: if medical marijuana were legalized in states that have not yet done so, then behaviors and attitudes toward driving after using marijuana would become more tolerant. These relationships can be tested using causal inference methods that estimate treatment effects for residents of states that have yet to legalize medical marijuana. Therefore, in our application, we are interested in the average treatment effect (ATE) of medical marijuana legalization in Kentucky and Tennessee. However, answering this causal policy question with survey data requires novel analytic techniques.
1.2. Motivating Data Set: Traffic Safety Culture Index (TSCI)
The data for our case study came from the 2017 TSCI, a national survey conducted by the AAA Foundation for Traffic Safety. TSCI data are drawn from KnowledgePanel, a large online research panel whose sample is obtained via stratified probability sampling of the US Postal Service's delivery sequence file. Although it is often convenient to analyze multiple years of TSCI data, the 2017 survey asked four questions related to DAMU, a feature that, to our knowledge, is unique among US surveys. It is therefore a valuable resource for studying how marijuana policies relate to tolerance of DAMU.
TSCI provides survey weights and information on all factors used in sample selection and weighting, namely, gender, age, race/ethnicity, education, income, household size, metropolitan status, and Census region. Our case study incorporated information from the following four items derived from TSCI survey questions:
1. In the last year, have you driven within 1 h of marijuana use (yes vs. no)?
2. How personally acceptable do you find DAMU (acceptable vs. not acceptable)?
3. How does DAMU affect one's driving (positively or not at all vs. negatively)?
4. Is drugged driving a threat to your personal safety (no vs. yes)?
Our outcome is a composite score that quantifies the respondents' tolerance of DAMU. The score is the sum of binary coding of the questions enumerated above, and can take on values consisting of integers from 0 to 4, with higher values indicating greater tolerance of DAMU.
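As a concrete illustration, the composite score can be computed by binary-coding the four items and summing. This is a Python sketch of that step, not the authors' code; the column names and response labels are hypothetical stand-ins for the TSCI variables:

```python
import pandas as pd

def damu_tolerance_score(df: pd.DataFrame) -> pd.Series:
    """Sum four binary items (1 = tolerant response) into a 0-4 score.

    Column names (q1..q4) are hypothetical; the binary coding follows the
    tolerant-vs.-not framing of each question enumerated above.
    """
    items = pd.DataFrame({
        "drove_after_use": (df["q1"] == "yes").astype(int),
        "finds_acceptable": (df["q2"] == "acceptable").astype(int),
        "no_negative_effect": (df["q3"] == "positively or not at all").astype(int),
        "not_a_threat": (df["q4"] == "no").astype(int),
    })
    return items.sum(axis=1)

example = pd.DataFrame({
    "q1": ["yes", "no"],
    "q2": ["not acceptable", "acceptable"],
    "q3": ["negatively", "negatively"],
    "q4": ["no", "yes"],
})
scores = damu_tolerance_score(example)  # first respondent: items 1 and 4 tolerant
```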
For this study, respondents from Kentucky and Tennessee belong to the unexposed group, in which medical marijuana is not legal. These states comprise our population of interest. To infer about the effects of medical marijuana legalization, unexposed subjects were matched to respondents from nearby states that had legalized medical marijuana by 2017: Arkansas, Florida, Louisiana, Maryland, Michigan, New Jersey, Ohio, and Pennsylvania. Although other states had legalized medical marijuana by 2017, we preferred to draw exposed subjects from states in relative proximity to the unexposed subjects. We hoped that doing so would partially mitigate unmeasured confounding due to cultural differences between states, which could not be captured with our set of measured confounders. That being said, we could have been more discerning in our selection of exposed states, for example, using only states that neighbor Tennessee and Kentucky. However, we opted to use a larger pool of states because we placed greater priority on individual-level covariate balance than on perhaps capturing more unmeasured confounding, while also being able to preserve the sample from the population of interest. Because we could not capture all unmeasured confounding, we also quantify the sensitivity of our findings to hidden bias in a sensitivity analysis.
The remainder of this paper proceeds as follows: in Section 2, we define notation and assumptions for our proposed method, while also providing additional background of randomization inference for survey data. In Section 3, we describe in greater detail our proposed methods, including the matched design, randomization inference, and sensitivity analysis. A simulation study is conducted to validate our method in a setting that aims to emulate our data application. Section 4 contains our data analysis, and Section 5 concludes with a discussion of our contributions, limitations, and opportunities for future work.
2. Notation and Causal Framework
2.1. Notation and Potential Outcomes
Our notation reflects a matched setup that produces pairs of observations, which will be described in greater detail in Section 3. Let $Y_{ij}$ be the observed outcome of the $j$th observation in the $i$th pair, $i = 1, \ldots, I$, $j = 1, 2$. $Z_{ij}$ is the exposure status of the $j$th observation in the $i$th pair. Specifically, $Z_{ij} = 1$ if the $j$th subject in the $i$th pair is in the exposed group, and $Z_{ij} = 0$ otherwise. Each pair consists of one exposed and one unexposed observation, therefore $Z_{i1} + Z_{i2} = 1$. Furthermore, we denote a set of observed covariates as $\mathbf{x}_{ij}$, an unobserved covariate or set of covariates as $u_{ij}$, and a survey weight as $w_{ij}$. We assume that $w_{ij}$ is fixed and known by survey design for simplicity.
For our application, we adopt the potential outcomes framework (see, e.g., Imbens and Rubin 2015). That is, we assume that each subject has two potential outcomes: one that manifests under the exposed condition, and another under the unexposed condition. Let $Y_{ij}(1)$ denote the potential outcome of the $j$th observation in the $i$th pair under the exposed condition, and $Y_{ij}(0)$ denote their potential outcome under the unexposed condition. Hence, $Y_{ij} = Z_{ij} Y_{ij}(1) + (1 - Z_{ij}) Y_{ij}(0)$.
To avoid confusion, it will be convenient to always designate subjects with a certain exposure status to a specific index. Therefore, without loss of generality, hereafter we let $Y_{i1}$ denote the observed outcome for the exposed observation in pair $i$, and $Y_{i2}$ denote the observed outcome for the unexposed observation in pair $i$.
2.2. Assumptions
There are two commonly used assumptions for causal inference with observational data. The first is the Stable Unit Treatment Value Assumption (SUTVA), which requires that there be no interference among participants and no different versions of the specified treatment. This assumption is untestable, but it seems reasonable in our analysis, since we do not expect a person's potential behavior or attitude to be affected by whether another person is exposed to marijuana legalization.
The second assumption is strongly ignorable treatment assignment, which requires independence of the potential outcomes and treatment status given only observed covariates. Specifically,

$(Y_{ij}(1), Y_{ij}(0)) \perp Z_{ij} \mid \mathbf{x}_{ij}$,

with $0 < P(Z_{ij} = 1 \mid \mathbf{x}_{ij}) < 1$.
We acknowledge that this assumption is likely untenable in our setting. Therefore, in Section 3.3, we propose a sensitivity analysis to measure the degree of unmeasured confounding that would be needed in order to change our conclusions. We believe that this sensitivity analysis provides a powerful tool for policy researchers, as usually the data source for policy evaluation does not include enough important covariates.
2.3. Fisher's Sharp Null and Randomization Inference for Survey Data
Under the null hypothesis of no causal effect, people should have the same tolerance toward DAMU regardless of the state-level marijuana policy. This motivates the use of Fisher's sharp null hypothesis (shown in Equation 1) with randomization inference, which does not rely on additional modeling assumptions. This is particularly helpful since our study has a relatively small sample size and the primary outcome is ordinal, and thus unlikely to be captured well by a parametric distribution.
$H_0: Y_{ij}(1) = Y_{ij}(0) \text{ for all } i, j \quad \text{vs.} \quad H_a: Y_{ij}(1) \ge Y_{ij}(0) \text{ for all } i, j, \text{ with strict inequality for some } i, j. \quad (1)$
The one-sided alternative hypothesis in (1) reflects the belief that marijuana legalization causes greater tolerance toward DAMU. We test the hypothesis in (1) via the permutation test described in Section 3.2, and we use the common p-value cutoff of 0.05 to denote a statistically significant result.
The literature on permutation tests for survey data is relatively sparse. Abadie et al. (2020) and Li and Ding (2017) discussed randomization inference for sample data, but focused on within-sample treatment effects. Schuessler and Selb (2019) argued for the value of the within-sample treatment effect, pointing out that a significant within-sample treatment effect allows one to then reject Fisher's sharp null at the population level. We believe this is a valid argument in the case where a treatment effect exists, and it may be bolstered by recent guidance on generalizing randomized experiments to target populations, which we attempt to mimic. For example, Miratrix et al. (2018) suggested that the sample and population ATE are often similar. However, we are concerned that these arguments may overlook the null case. Unless the available data are from a simple random sample, which is not the case in our setting, an unweighted sample estimate can be biased for the population treatment effect. In the case of no treatment effect, such a biased estimate can lead to inflated type-I error. Returning to the literature on randomized experiments, Stuart, Bradshaw, and Leaf (2015) showed that poststratification weights can be used to recover the population ATE from a randomized experiment. Therefore, we preferred to use a weighted estimate in our application.
Recently, Toth (2019) proposed a permutation test for complex survey data that is based on the residuals from a design-consistent linear estimate of the outcome. While both Toth (2019) and our proposed method are permutation tests for survey data, there are several key differences: first, we use matching to construct a synthetic exposure group, whereas Toth compares the groups' weighted averages. Second, our matching procedure incorporates survey weights: observations with similar survey weights are matched with one another, a step that attempts to group observations with similar characteristics that determine how representative they are of the population.
3. Matched Design
3.1. Matched Design to Create Synthetic Exposure Group with Survey Data
To remove observed confounding, we perform one-to-one matching without replacement of observations in the unexposed group to observations in the exposed group via optimal matching (Ho et al. 2007, 2011) on (i) the survey weight; and (ii) the propensity score obtained by estimating the probability of exposure status conditional on $\mathbf{x}$. The propensity score is denoted as $e(\mathbf{x}_{ij})$, and the vector of all propensity scores is denoted simply as $\mathbf{e}$. We clarify that, because our data are cross-sectional and we match respondents from unexposed states to respondents from exposed states, each pair contains exactly one respondent from the exposed group and one from the unexposed group. Our causal estimand is unique in the sense that it is not the commonly used ATE or average treatment effect on the treated (ATT). Since we retain the full sample of subjects in the unexposed group and select similar subjects from the exposed group, our estimator targets the average treatment effect on the unexposed (ATU), or average treatment effect on the control (ATC) to be consistent with the existing literature.
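A minimal sketch of this matching step, assuming estimated propensity scores and survey weights are already in hand: it uses a simple additive absolute-distance cost over the two matching variables and solves the optimal one-to-one assignment with SciPy. The function name, the cost form, and the trade-off parameter `lam` are our illustrative choices, not the paper's implementation (which used the optimal matching tools of Ho et al.):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pairs(ps_unexposed, w_unexposed, ps_exposed, w_exposed, lam=1.0):
    """Optimal 1:1 matching without replacement of every unexposed subject
    to a distinct exposed subject, minimizing the total distance on
    (propensity score, survey weight). `lam` trades off the two components
    and assumes they are on comparable scales."""
    ps_u, w_u = np.asarray(ps_unexposed), np.asarray(w_unexposed)
    ps_e, w_e = np.asarray(ps_exposed), np.asarray(w_exposed)
    # cost[i, j]: distance between unexposed subject i and exposed subject j
    cost = (np.abs(ps_u[:, None] - ps_e[None, :])
            + lam * np.abs(w_u[:, None] - w_e[None, :]))
    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    return list(zip(rows, cols))

# toy example: 2 unexposed subjects matched into a pool of 3 exposed subjects
pairs = match_pairs([0.2, 0.6], [1.0, 2.0], [0.61, 0.19, 0.9], [2.1, 1.1, 5.0])
```

Because `linear_sum_assignment` accepts rectangular cost matrices, the full unexposed sample is retained while only the best-matching exposed subjects enter the pairs, mirroring the ATC design above.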
3.2. Randomization Test for the Sharp Null Hypothesis and Causal Effect Estimation
To test Fisher's sharp null hypothesis, we consider the following test statistic:

$\hat{\tau} = \frac{\sum_{i=1}^{I} w_{i2} (Y_{i1} - Y_{i2})}{\sum_{i=1}^{I} w_{i2}}, \quad (2)$

where $w_{i2}$ is the survey weight for the unexposed subject in pair $i$, or equivalently $w_{ic}$, since we denote the second element in each pair as the unexposed subject. We keep the notation with "$c$" in the weights hereafter to emphasize that we test an ATC effect. Under the null hypothesis, in theory, we can determine the exact randomization distribution of the test statistic and then derive the $p$-value (Imbens and Rubin 2015). Even with a relatively modest sample size, however, the computation becomes burdensome. To perform computationally efficient inference about $\hat{\tau}$, we simulated its null distribution through a permutation approach described in Web Appendix A.
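The permutation approach can be sketched as follows, using the standard device for matched pairs under the sharp null: within-pair labels are exchangeable, so each pair difference keeps its magnitude and receives a random sign. This is an illustrative Python sketch, not the authors' Web Appendix A code:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_atc_test(y_exposed, y_unexposed, w_c, n_perm=10_000):
    """Permutation test of Fisher's sharp null for matched pairs.

    The statistic is the mean of the within-pair differences, weighted by
    the unexposed subjects' survey weights. Under the sharp null, the two
    labels within each pair are exchangeable, so the null distribution is
    approximated by flipping the sign of each difference at random.
    """
    d = np.asarray(y_exposed, dtype=float) - np.asarray(y_unexposed, dtype=float)
    w = np.asarray(w_c, dtype=float)
    t_obs = np.sum(w * d) / np.sum(w)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    t_null = (signs * d) @ w / np.sum(w)
    # one-sided p-value: larger statistics indicate more tolerance of DAMU
    p_value = float(np.mean(t_null >= t_obs))
    return t_obs, p_value
```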
With the inclusion of survey weights, $\hat{\tau}$ has a population-level causal interpretation. Proposition 1 formalizes conditions under which $\hat{\tau}$ is unbiased, and a proof is provided in Web Appendix B.
Proposition 1
Assuming strongly ignorable treatment assignment and a known probability sampling design, the matched estimator, as defined in Equation (2), is an unbiased estimator of the population ATE on the unexposed.
3.3. Sensitivity Analysis
The randomization test in Section 3.2 utilizes matching to obtain well-balanced pairs in terms of observed covariates. However, the causal interpretation still relies on strongly ignorable treatment assignment, which is difficult to verify in an observational policy study. To account for potential unmeasured confounding, Rosenbaum developed a series of sensitivity analysis strategies for various matched designs (Rosenbaum 1987). Starting with a hypothetical unmeasured confounder and a prespecified magnitude of its impact, we can identify upper and lower bounds for the test statistic. Even though we cannot conduct the test directly because of the unmeasured confounder, we can find upper and lower bounds on the $p$-value, and therefore infer, in a conservative fashion, how the unmeasured confounder may change the test result. The process is repeated for a range of plausible magnitudes to gauge how the study conclusion may change, and thereby to assess how sensitive the study is to unmeasured confounding. If the study conclusion changes substantially from the original analysis for a small magnitude of impact of the unmeasured confounder, the study is said to be sensitive to hidden bias; otherwise, it is robust to that level of hidden bias. A detailed discussion of sensitivity analysis is presented in Chapter 3 of Rosenbaum (2010). In this subsection, we explain our sensitivity analysis implementation, which is built upon an algorithm in Fogarty (2020), with adaptations to accommodate our use of survey weights.
We expand the ignorability assumption to include both unmeasured and measured confounders. That is, we assume that the true probability of exposure depends on both observed confounders $\mathbf{x}$ and unobserved confounders $u$. We assume that the log odds of exposure is a linear function of $\mathbf{x}$ and $u$, and a sensitivity parameter $\gamma$ quantifies the relationship between $u$ and the log odds of exposure. Provided that the $u_{ij}$ are normalized to fall between zero and one, this model implies the following bounds on the probability $\pi_{i1}$ that subject 1 in the matched pair is in the exposed group:

$\frac{1}{1 + \Gamma} \le \pi_{i1} \le \frac{\Gamma}{1 + \Gamma}, \quad \text{where } \Gamma = e^{\gamma}.$
Additional details on this model for sensitivity to hidden bias are available in Chapter 4 of Rosenbaum (2002).
The sensitivity analysis proceeds by (i) bounding and (ii) centering the within-pair treated-minus-control differences $D_i = Y_{i1} - Y_{i2}$. First, we define a random variable $\epsilon_i$ which takes on a value of $1$ with probability $\Gamma/(1+\Gamma)$ and $-1$ with probability $1/(1+\Gamma)$. Then, the quantity $D_i^{\Gamma}$ is obtained by taking the product of $\epsilon_i$ and $|D_i|$ and centering it at its expected value, hence:

$D_i^{\Gamma} = \epsilon_i |D_i| - \mathbb{E}(\epsilon_i |D_i|),$

and

$\mathbb{E}(\epsilon_i |D_i|) = \frac{\Gamma - 1}{\Gamma + 1} |D_i|.$

Similarly, the worst-case weighted statistic $T^{\Gamma}$ is defined as

$T^{\Gamma} = \frac{\sum_{i=1}^{I} w_{ic}\, \epsilon_i |D_i|}{\sum_{i=1}^{I} w_{ic}}.$

We note that $\epsilon_i |D_i|$ stochastically dominates $D_i$ under the null hypothesis and the sensitivity model at $\Gamma$, and weighted sums are monotonic increasing functions of their components; therefore $T^{\Gamma}$ stochastically dominates the observed statistic $T$ under the null. This allows one to estimate the upper bound of the $p$-value corresponding to $T$ under the sensitivity parameter $\Gamma$ through the following steps:

1. Over iterations indexed by $b = 1, \ldots, B$:
   a. For each pair $i$, generate $\epsilon_i^{(b)} = 2 B_i^{(b)} - 1$, with $B_i^{(b)}$ denoting a Bernoulli random variable with expectation $\Gamma/(1+\Gamma)$.
   b. Compute $D_i^{\Gamma,(b)} = \epsilon_i^{(b)} |D_i|$.
   c. Compute $T^{\Gamma,(b)} = \sum_{i=1}^{I} w_{ic}\, D_i^{\Gamma,(b)} \big/ \sum_{i=1}^{I} w_{ic}$.
2. Compute the upper bound on the $p$-value for $T$ as $B^{-1} \sum_{b=1}^{B} \mathbb{1}\{ T^{\Gamma,(b)} \ge T \}$.
The algorithm enumerated above would be used in the event that we rejected $H_0$ and wished to measure the amount of hidden bias needed to change our conclusions, that is, to fail to reject $H_0$. Anticipating the possibility that we observe a null result, that is, that medical marijuana legalization does not change people's tolerance toward DAMU, we adapt the algorithm to measure the amount of hidden bias needed to mask a treatment effect of size $\tau_0$.
First, we define a one-sided null hypothesis, specifically

$H_0': Y_{ij}(1) - Y_{ij}(0) = \tau_0 \text{ for all } i, j. \quad (3)$

To test $H_0'$, we define $D_i'$ as $D_i - \tau_0$, and

$T' = \frac{\sum_{i=1}^{I} w_{ic} D_i'}{\sum_{i=1}^{I} w_{ic}}.$

By using $D_i'$ in place of $D_i$, the algorithm above is suited to perform sensitivity analysis following a null result, measuring the amount of unmeasured confounding (as measured by $\Gamma$) needed to mask a treatment effect of size $\tau_0$.
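A compact sketch of the Monte Carlo bound, folding in the shift by the hypothesized effect size for the null-result adaptation. It compares worst-case draws (with sign probability $\Gamma/(1+\Gamma)$) against the observed weighted statistic; this is our illustrative implementation, not the Fogarty (2020) code:

```python
import numpy as np

rng = np.random.default_rng(1)

def pvalue_upper_bound(d, w_c, gamma, tau0=0.0, n_iter=10_000):
    """Monte Carlo upper bound on the one-sided p-value at sensitivity
    parameter `gamma` (capital Gamma in the text). Setting tau0 > 0 shifts
    each pair difference to assess whether hidden bias of magnitude gamma
    could mask a treatment effect of size tau0."""
    d = np.asarray(d, dtype=float) - tau0      # D_i' = D_i - tau0
    w = np.asarray(w_c, dtype=float)
    t_obs = np.sum(w * d) / np.sum(w)          # observed weighted statistic
    p_hi = gamma / (1.0 + gamma)               # worst-case P(positive sign)
    signs = np.where(rng.random((n_iter, d.size)) < p_hi, 1.0, -1.0)
    # worst-case draws: random signs applied to |D_i|, weighted average
    t_gamma = (signs * np.abs(d)) @ w / np.sum(w)
    return float(np.mean(t_gamma >= t_obs))
```

At `gamma = 1` the bound reduces to the ordinary sign-flip permutation p-value, and it grows toward 1 as `gamma` increases, reflecting greater allowed hidden bias.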
3.4. Simulation Study
In this subsection, we describe a series of simulation studies we performed to validate our proposed method. We ran simulations in four settings, each corresponding to different sample sizes. Within each of those four settings, we considered six treatment effect sizes. We compared our method to two existing methods: (i) the method of Toth (2019), described in greater detail in Section 3.4.2; and (ii) survey‐weighted linear regression without matching. The key takeaways of the simulation study are:
1. When estimating the ATE, the proposed method exhibits little bias and achieves near-nominal type-I error and coverage. It has sufficient power to detect an effect of size 0.25 or greater.
2. Our proposed method performs similarly to the two regression-based benchmark methods in settings with a constant treatment effect.
3. In a simulation where the treatment effect differs for exposed and unexposed subjects, our proposed method recovers the ATC, while both regression-based methods are biased.
3.4.1. Design and Data Generation
All simulations drew data from a single population data set. To generate population data mimicking real-world situations, we began with data from the American Community Survey's Public-Use Microdata Sample (PUMS), which contain much of the same demographic and socioeconomic information provided in TSCI. We used PUMS data from all 10 states used in our data application, specifically Kentucky, Tennessee, Arkansas, Florida, Louisiana, Maryland, Michigan, New Jersey, Ohio, and Pennsylvania. Ignoring the sampling weights contained within the PUMS data, we instead treat the unweighted data as the true population about which we want to infer. For this reason, we refer to these data as "pseudo-states."
We proceeded by generating outcome data for each individual in our population. We wanted to avoid imposing a particular functional relationship (e.g., linear) between outcome and covariates, while generating outcomes that somewhat resembled the observed outcomes in our main data analysis. For this reason, we trained a random forest (Breiman 2001) on the existing TSCI data using the randomForest package in R Version 3.6 (Liaw and Wiener 2002). The outcome was the composite score measuring tolerance of DAMU, and the predictors were age, gender, race/ethnicity, income, and education. We then applied this model to the population data, using the predicted values, denoted as $\hat{y}_i$, to generate outcome values between 0 and 4 for the simulated data. The outcome values, $Y_i$, also depend on the treatment effect. As in our data application, individuals from the Kentucky and Tennessee populations were unexposed ($Z_i = 0$), and all others were exposed ($Z_i = 1$). For an effect size of $\delta$, the outcome was generated as

$Y_i = \hat{y}_i + \delta Z_i.$
We generated data with six effect sizes: $\delta$ = 0, 0.10, 0.25, 0.50, 0.75, and 1.00.
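The outcome-generation step might be sketched as below. The additive constant shift for exposed individuals and the clipping to the observed 0-4 score range are our assumptions about this step, since the paper's exact formula is not reproduced here:

```python
import numpy as np

def generate_outcomes(y_hat, exposed, delta):
    """Add a constant treatment effect `delta` for exposed individuals
    (exposed = 1) and clip to the 0-4 composite-score range. Both the
    additive form and the clipping are assumptions for illustration."""
    y = np.asarray(y_hat, dtype=float) + delta * np.asarray(exposed, dtype=float)
    return np.clip(y, 0.0, 4.0)

# y_hat are random-forest predictions; the third individual is unexposed
y = generate_outcomes([0.5, 3.9, 1.2], [1, 1, 0], delta=0.25)
```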
Next, we selected a sample from each state. The sample was stratified on eight strata formed by crossing age (in four categories) with gender, and sampling weights were inversely proportional to the known sampling probabilities. The sample sizes varied over the four simulation settings and were primarily based on the sample sizes available to us from the 2017 TSCI. Below, we describe the unexposed data sets for each of the four simulation settings, with our shorthand labels for them:
1. "KY Only": the sample size is based on the Kentucky TSCI sample. We selected 60 subjects from the pseudo-Kentucky population data. Tennessee data are excluded from this simulation.
2. "TN Only": the sample size is based on the Tennessee TSCI sample. We selected 70 subjects from the pseudo-Tennessee population data.
3. "KY and TN": We selected 60 subjects from the pseudo-Kentucky population data and 70 subjects from the pseudo-Tennessee population data.
4. "Large Sample Size": We selected 600 subjects from the pseudo-Kentucky population data and 700 subjects from the pseudo-Tennessee population data.
For simulation settings 1–3 above, we selected samples of the following sizes from the exposed pseudo-states: 20 from Arkansas, 150 from Florida, 20 from Louisiana, 40 from Maryland, 60 from Michigan, 60 from New Jersey, 70 from Ohio, and 100 from Pennsylvania. For simulation setting 4, the sample sizes were increased by an order of magnitude.
After generating the data and selecting a sample, we performed the matching and analysis procedure described in Section 3. For each setting and effect size, we ran 500 simulations, with each simulation drawing a different sample from the pseudo-population data.
In addition to fitting the proposed model, we compared our method to two benchmarks that use survey-weighted regression (see Section 3.4.2 for more details). Because our data-generating procedure uses a constant treatment effect, we expect both of these benchmarks to recover the true treatment effect and exhibit near-nominal coverage probability, even though we avoided specific functional forms when generating the data. Therefore, we should not expect our proposed method to outperform the benchmarks; rather, we hope that it will perform similarly.
Hence, we performed one additional simulation to complement our proof of Proposition 1, which showed that our estimator recovers the ATC. Starting from simulation setting 3, in which we drew samples of 60 subjects from Kentucky and 70 from Tennessee, we randomly generated a Bernoulli random variable $B_i$ whose expectation is 0.5 in medical marijuana states and 0.7 in nonmedical marijuana states. Then, the treatment effect is

$\tau_i = 0.5 + 0.6 B_i,$

and the data are generated as follows:

$Y_i = \hat{y}_i + \tau_i Z_i.$
As a result, the ATE for the entire population is 0.814, the average treatment effect on the exposed group is 0.80, and the ATU is 0.92. We expect that our method will recover the ATC, whereas the other methods will not, simply because they were not designed to do so.
3.4.2. Simulation Comparisons: Permutation Test of Toth and Survey‐Weighted Regression
Toth (2019) proposed a survey‐weighted permutation test for estimating treatment effects from complex sample data that used the residuals from a design‐consistent linear estimate of the outcome. To our knowledge, this is among the only other proposed permutation tests for survey data. While TSCI is not a complex survey, we can nevertheless perform a permutation test on the residuals from a design‐consistent linear estimate of the outcome. To describe the approach in more detail, we will use the index from 1 to total observations in the full (unmatched) sample.
We began by fitting a survey‐weighted regression of on all covariates, excluding the exposure term , which provides a design‐consistent estimate of : . The residuals are defined as , and the test statistic is
To approximate the null distribution of , we randomly permuted the exposure status 10,000 times and computed the test statistic.
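To make the procedure concrete, the following is a minimal Python sketch of this kind of residual permutation test (the paper's analyses were carried out in R; the argument names y, z, x, and w for the outcome, exposure, covariates, and survey weights are ours, and the exact statistic may differ from Toth's):

```python
import numpy as np

def weighted_residual_perm_test(y, z, x, w, n_perm=10000, seed=0):
    """Permutation test on residuals from a survey-weighted regression:
    fit the outcome on covariates only (no exposure term), then compare
    weighted mean residuals between exposed and unexposed groups against
    a null distribution obtained by permuting the exposure labels."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), x])
    sw = np.sqrt(w)  # weighted least squares via row rescaling
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta

    def stat(labels):
        return (np.average(resid[labels == 1], weights=w[labels == 1])
                - np.average(resid[labels == 0], weights=w[labels == 0]))

    observed = stat(z)
    null = np.array([stat(rng.permutation(z)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)  # one-sided
    return observed, p_value
```

Here a larger observed statistic indicates higher residual outcomes in the exposed group, matching a one‐sided alternative.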
For our second comparison, we used a survey‐weighted linear regression model. We modeled the outcome as a numeric variable and included an indicator for medical marijuana policy, along with age, gender, race/ethnicity, income, and education. The model was weighted using the TSCI survey weights. All methods used in this simulation should, under ideal conditions, yield estimates with expectation equal to the data‐generating treatment effect.
All survey‐weighted regression procedures were fit using the survey package in R.
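As a rough illustration of this second benchmark, the point estimate from a survey‐weighted linear regression can be sketched as a weighted least squares fit; this Python sketch (with hypothetical argument names) reproduces only the coefficient, not the design‐based standard errors that R's survey::svyglm also provides:

```python
import numpy as np

def survey_weighted_effect(y, z, x, w):
    """Illustrative analogue of the survey-weighted regression benchmark:
    weighted least squares of the outcome on an intercept, the exposure
    indicator, and covariates. Returns the coefficient on the exposure,
    i.e., the treatment-effect estimate."""
    X = np.column_stack([np.ones(len(y)), z, x])
    sw = np.sqrt(w)  # WLS via rescaling each row by sqrt of its weight
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1]  # coefficient on the exposure indicator
```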
3.4.3. Simulation Results
Table 1 presents results for simulations 1 and 2, and Table 2 presents results for simulations 3 and 4. For simulations with no treatment effect, we provided the empirical type‐I error rate, the proportion of simulations in which the null hypothesis was erroneously rejected in favor of the alternative. For simulations with a treatment effect, we provided the empirical power, the proportion of simulations in which the null hypothesis was rejected. Additional empirical quantities included the bias, that is, the average difference across simulations between the estimated and true treatment effects; the empirical coverage probability, that is, the proportion of simulations in which a 95% confidence interval (CI) for the treatment effect contained the true value; and the average length of a CI for the treatment effect. Type‐I error and power (metrics pertaining to a hypothesis test) were one‐sided, whereas CIs were two‐sided and computed via bootstrap resampling.
TABLE 1.
Results for simulations 1 and 2: columns denote the setting and number of pairs, the effect size, the method, the type‐I error rate (only for simulations with no treatment effect) or the power (only for simulations with a treatment effect), the bias of the test statistic relative to the true treatment effect, the empirical probability that a 95% CI for the treatment effect contains the true value, and the average length of a CI for the treatment effect. The type‐I error, power, bias, coverage, and CI length are empirical quantities taken over 500 simulations. The bias is computed as the average test statistic minus the true data‐generating effect size, multiplied by 100 for ease of comparison.
Setting | Effect size | Method | Type‐I error or Power a | Bias | Coverage | CI length |
---|---|---|---|---|---|---|
1: KY Only | 0.00 | Matched perm. | 0.04 | −1.9 | 0.89 | 0.28 |
Toth | 0.05 | 0.2 | 0.96 | 0.28 | ||
Regression | 0.07 | 0.2 | 0.95 | 0.28 | ||
0.10 | Matched perm. | 0.28 | −2.7 | 0.91 | 0.28 | |
Toth | 0.38 | −0.7 | 0.95 | 0.28 | ||
Regression | 0.37 | −0.4 | 0.93 | 0.29 | ||
0.25 | Matched perm. | 0.93 | −2.5 | 0.93 | 0.28 | |
Toth | 0.97 | | 0.96 | 0.28 | |
Regression | 0.95 | −0.5 | 0.96 | 0.29 | ||
0.50 | Matched perm. | 1.00 | −1.7 | 0.92 | 0.28 | |
Toth | 1.00 | −0.7 | 0.95 | 0.28 | ||
Regression | 1.00 | −0.7 | 0.93 | 0.29 | ||
0.75 | Matched perm. | 1.00 | −2.1 | 0.92 | 0.28 | |
Toth | 1.00 | −2.1 | 0.94 | 0.28 | ||
Regression | 1.00 | 0.0 | 0.94 | 0.29 | ||
1.00 | Matched perm. | 1.00 | −2.6 | 0.91 | 0.28 | |
Toth | 1.00 | −3.1 | 0.92 | 0.28 | ||
Regression | 1.00 | −0.4 | 0.93 | 0.29 | ||
2: TN Only | 0.00 | Matched perm. | 0.07 | 1.3 | 0.94 | 0.27 |
Toth | 0.04 | −0.7 | 0.97 | 0.27 | ||
Regression | 0.05 | −0.7 | 0.96 | 0.27 | ||
0.10 | Matched perm. | 0.54 | 1.6 | 0.95 | 0.27 | |
Toth | 0.40 | −0.4 | 0.96 | 0.27 | ||
Regression | 0.41 | −0.2 | 0.94 | 0.27 | ||
0.25 | Matched perm. | 0.97 | 1.9 | 0.88 | 0.26 | |
Toth | 0.99 | −0.6 | 0.95 | 0.27 | ||
Regression | 0.98 | −0.1 | 0.96 | 0.27 | ||
0.50 | Matched perm. | 1.00 | 1.1 | 0.92 | 0.27 | |
Toth | 1.00 | −1.7 | 0.94 | 0.27 | ||
Regression | 1.00 | −0.7 | 0.93 | 0.27 | ||
0.75 | Matched perm. | 1.00 | 1.5 | 0.91 | 0.27 | |
Toth | 1.00 | −1.4 | 0.96 | 0.27 | ||
Regression | 1.00 | 0.1 | 0.95 | 0.27 | ||
1.00 | Matched perm. | 1.00 | 0.9 | 0.92 | 0.27 | |
Toth | 1.00 | −2.8 | 0.94 | 0.27 | ||
Regression | 1.00 | −0.7 | 0.96 | 0.27 |
Type‐I error rate is provided for simulations with an effect size of 0.00, whereas power (i.e., 1 − type‐II error rate) is provided for simulations with a nonzero effect size. Bold numbers indicate the best performance among the three methods for a given evaluation metric.
TABLE 2.
Results for simulations 3 and 4: columns denote the setting and number of pairs, the effect size, the method, the type‐I error rate (only for simulations with no treatment effect) or the power (only for simulations with a treatment effect), the bias of the test statistic relative to the true treatment effect, the empirical probability that a 95% CI for the treatment effect contains the true value, and the average length of a CI for the treatment effect. The type‐I error, power, bias, coverage, and CI length are empirical quantities taken over 500 simulations. The bias is computed as the average test statistic minus the true data‐generating effect size, multiplied by 100 for ease of comparison.
Setting | Effect size | Method | Type‐I error or Power a | Bias | Coverage | CI length |
---|---|---|---|---|---|---|
3: TN and KY | 0.00 | Matched perm. | 0.06 | 0.0 | 0.94 | 0.20 |
Toth | 0.03 | −0.3 | 0.95 | 0.21 | ||
Regression | 0.05 | −0.3 | 0.96 | 0.21 | ||
0.10 | Matched perm. | 0.58 | −0.1 | 0.94 | 0.20 | |
Toth | 0.58 | −0.5 | 0.97 | 0.21 | ||
Regression | 0.57 | −0.2 | 0.97 | 0.21 | ||
0.25 | Matched perm. | 1.00 | −0.2 | 0.93 | 0.20 | |
Toth | 1.00 | −0.8 | 0.95 | 0.20 | ||
Regression | 1.00 | −0.1 | 0.95 | 0.21 | ||
0.50 | Matched perm. | 1.00 | −0.3 | 0.95 | 0.20 | |
Toth | 1.00 | −1.9 | 0.94 | 0.20 | ||
Regression | 1.00 | −0.5 | 0.95 | 0.21 | ||
0.75 | Matched perm. | 1.00 | 0.0 | 0.92 | 0.20 | |
Toth | 1.00 | −2.3 | 0.95 | 0.20 | ||
Regression | 1.00 | −0.1 | 0.96 | 0.21 | ||
1.00 | Matched perm. | 1.00 | −0.2 | 0.92 | 0.20 | |
Toth | 1.00 | −3.2 | 0.91 | 0.21 | ||
Regression | 1.00 | −0.4 | 0.94 | 0.21 | ||
4: Large Sample Size | 0.00 | Matched perm. | 0.04 | −0.1 | 0.93 | 0.05 |
Toth | 0.03 | −0.4 | 0.95 | 0.06 | ||
Regression | 0.04 | −0.4 | 0.96 | 0.06 | ||
0.10 | Matched perm. | 1.00 | −0.1 | 0.91 | 0.04 | |
Toth | 1.00 | −0.5 | 0.95 | 0.06 | ||
Regression | 1.00 | −0.4 | 0.96 | 0.06 | ||
0.25 | Matched perm. | 1.00 | −0.1 | 0.92 | 0.05 | |
Toth | 1.00 | −0.7 | 0.94 | 0.07 | ||
Regression | 1.00 | −0.3 | 0.96 | 0.07 | ||
0.50 | Matched perm. | 1.00 | −0.1 | 0.92 | 0.05 | |
Toth | 1.00 | −1.3 | 0.90 | 0.07 | ||
Regression | 1.00 | −0.4 | 0.94 | 0.07 | ||
0.75 | Matched perm. | 1.00 | −0.1 | 0.92 | 0.05 | |
Toth | 1.00 | −1.6 | 0.85 | 0.07 | ||
Regression | 1.00 | −0.2 | 0.94 | 0.07 | ||
1.00 | Matched perm. | 1.00 | −0.1 | 0.91 | 0.04 | |
Toth | 1.00 | −2.0 | 0.80 | 0.07 | ||
Regression | 1.00 | −0.2 | 0.94 | 0.07 |
Type‐I error rate is provided for simulations with an effect size of 0.00, whereas power (i.e., 1 − type‐II error rate) is provided for simulations with a nonzero effect size. Bold numbers indicate the best performance among the three methods for a given evaluation metric.
Overall, each of the three methods performed well across the four simulation settings. In settings 1 and 2, our method exhibited slightly more bias than the other methods, and that bias was present for every effect size. In settings 3 and 4, our proposed method exhibited little bias. Coverage rates were close to nominal for each method, and CI lengths were similar in settings 1–3. In setting 4, the bias for our proposed method was consistently the lowest among the three methods, although the other methods were sometimes closer to nominal coverage. The coverage probability for our method was slightly lower than desired in simulation 4, ranging from 91% to 93% rather than 95%; its CI lengths were also slightly shorter than those of the benchmarks, which may explain the lower‐than‐desired coverage. The permutation test on residuals, which we labeled Toth's method, exhibited more bias at larger effect sizes in every simulation setting; however, this bias was relatively small even for the largest effect sizes.
Overall, these simulations demonstrated the following:
Our proposed method was able to estimate the treatment effect and achieve other desirable statistical properties, such as nominal coverage probabilities.
Our proposed method, which did not impose any functional relationship between outcome and covariates, performed similarly to other existing methods based on survey‐weighted regression.
Despite using only a matched subset of the data, our proposed method had CI length and power similar to the regression‐based methods, which used every sampled observation. In simulation 4, the matched estimator had shorter CI length.
Finally, Table 3 provides empirical validation of Proposition 1, which stated that our method recovers the ATC when data are generated with heterogeneous treatment effects. The true population ATC was 0.92, and our average estimate of the ATC across 1000 simulations was 0.912. Table 3 also includes the benchmark methods; those methods did not recover the ATC as well as ours, but we stress that they were not designed to estimate the ATC.
TABLE 3.
Results for a single simulation study with nonconstant treatment effects. Columns denote the setting, the true ATC, the method, the power, the bias of the test statistic relative to the true ATC, the probability that a 95% CI for the treatment effect contains the true value, and the average length of a CI for the treatment effect. The power, bias, coverage, and CI length are empirical quantities taken over 500 simulations. The bias is computed as the average test statistic minus the true data‐generating effect size, multiplied by 100 for ease of comparison.
Setting | ATC | Method | Power | Bias | Coverage | CI length |
---|---|---|---|---|---|---|
TN and KY | 0.92 | Matched perm. | 1.00 | −0.8 | 0.93 | 0.25 |
Toth | 1.00 | −7.0 | 0.75 | 0.21 | ||
Regression | 1.00 | −2.3 | 0.92 | 0.23 |
Bold numbers indicate the best performance among the three methods for a given evaluation metric.
4. Analysis of TSCI Data
This section presents the results from our real data analysis, in which we investigated the effect that medical marijuana legalization would have on behaviors and attitudes toward DAMU. We conducted three separate analyses: one in which the only unexposed data were from Kentucky; one in which the only unexposed data were from Tennessee; and one in which unexposed data were from both Kentucky and Tennessee.
Survey weights in TSCI were designed to yield population‐level inference for the United States. Because our populations of interest were individual states, we used iterative proportional fitting (raking) to calibrate the survey weights, which guaranteed that the weighted samples agreed with their corresponding states with respect to gender, age, race/ethnicity, education, income, household size, and metropolitan status. These were all of the factors (except for Census region) used in computing and raking the original TSCI survey weights.
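Raking itself is a simple iterative algorithm. The following is a generic Python sketch of iterative proportional fitting, under the assumptions that calibration variables are integer‐coded and that target margins are internally consistent; it is not the exact routine used for the TSCI weights:

```python
import numpy as np

def rake(base_weights, cal_vars, targets, n_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking): repeatedly rescale the
    weights so that, for each calibration variable, the weighted counts
    in each category match the target margins.

    cal_vars: list of integer-coded arrays (one per calibration variable)
    targets:  list of arrays of target weighted counts per category;
              all margins must sum to the same population total."""
    w = np.asarray(base_weights, dtype=float).copy()
    for _ in range(n_iter):
        max_dev = 0.0
        for cats, target in zip(cal_vars, targets):
            totals = np.bincount(cats, weights=w, minlength=len(target))
            ratio = target / totals
            w *= ratio[cats]  # rescale every unit in each category
            max_dev = max(max_dev, np.abs(ratio - 1.0).max())
        if max_dev < tol:  # every margin matched on this sweep
            break
    return w
```

In the analysis above, the calibration variables would be gender, age, race/ethnicity, education, income, household size, and metropolitan status, with margins taken from the corresponding state populations.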
4.1. Matching and Covariate Balance
We first estimated propensity scores via a logistic regression model that included race/ethnicity, gender, income, age, and education, as well as an interaction between gender and income. Because we could not guarantee that survey selection was independent of exposure status, we estimated a survey‐weighted propensity score, following the recommendation of Nattino (2019). Unexposed subjects (from Kentucky and Tennessee) were matched one‐to‐one with exposed subjects (from medical marijuana states) via optimal matching on both survey weights and propensity scores. We matched without replacement and retained all subjects from the unexposed group.
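The matching step can be illustrated with a simplified sketch. The analysis used optimal matching (as implemented, e.g., in R's MatchIt/optmatch), which minimizes the total within‐pair distance; the greedy nearest‐neighbor version below is only a stand‐in that does not guarantee optimality, and the particular distance combining propensity score and survey weight is also a simplifying assumption:

```python
import numpy as np

def greedy_pair_match(ps_ctrl, w_ctrl, ps_trt, w_trt):
    """Match each unexposed (control) subject to a distinct exposed
    subject, without replacement, using the sum of absolute differences
    in propensity score and survey weight as the distance. Greedy
    nearest-neighbor is a simplified stand-in for optimal matching."""
    available = set(range(len(ps_trt)))
    pairs = []
    for i in range(len(ps_ctrl)):
        j_best = min(available,
                     key=lambda j: abs(ps_ctrl[i] - ps_trt[j])
                                   + abs(w_ctrl[i] - w_trt[j]))
        pairs.append((i, j_best))  # (control index, matched exposed index)
        available.remove(j_best)
    return pairs
```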
Figure 1 provides a forest plot of the absolute standardized mean differences for all covariates used in computing the propensity score for the combined Tennessee and Kentucky analysis. All of the postmatching absolute standardized mean differences fell below 0.1, with most falling below 0.05, meaning the matched sample exhibited satisfactory balance with respect to survey weight, race/ethnicity, education, gender, income, and age. Similar assessments of the Kentucky and Tennessee samples alone also demonstrated satisfactory covariate balance. Web Appendix C presents an assessment of propensity score and survey weight balance for the matched sample.
FIGURE 1.
Forest plot of standardized mean differences of the covariates used in estimating the propensity scores, before and after matching.
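The balance criterion behind Figure 1 is the absolute standardized mean difference. A minimal sketch, assuming the common pooled‐standard‐deviation definition:

```python
import numpy as np

def abs_smd(x_exposed, x_unexposed):
    """Absolute standardized mean difference for one covariate:
    |difference in group means| / pooled standard deviation.
    Values below 0.1 are conventionally taken to indicate balance."""
    pooled_sd = np.sqrt((np.var(x_exposed, ddof=1)
                         + np.var(x_unexposed, ddof=1)) / 2.0)
    return abs(np.mean(x_exposed) - np.mean(x_unexposed)) / pooled_sd
```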
4.2. Permutation Test and Sensitivity Analysis of TSCI Data
With a successfully matched data set, biases due to observed covariates are believed to be removed. Table 4 presents the estimated treatment effect, the p‐value corresponding to a one‐sided test of the hypothesis that marijuana legalization causes higher tolerance for DAMU, and the sensitivity analysis measuring the amount of unmeasured confounding that would lead us to reach the opposite conclusion. We performed the analyses on Kentucky alone, Tennessee alone, and Kentucky and Tennessee combined.
TABLE 4.
Results from analyses of Kentucky and Tennessee. Columns denote the estimated treatment effect, the p‐value corresponding to the one‐sided hypothesis test, and the amount of unmeasured bias needed to mask a treatment effect of size 1.00 (the sensitivity parameter Γ).
Sample | Estimate | p‐value | Γ |
---|---|---|---|
KY | −0.333 | 0.975 | 8.2 |
TN | −0.348 | 0.941 | 4.7 |
KY and TN | −0.211 | 0.940 | 6.3 |
The hypothesis that more liberal marijuana policies lead to more dangerous driving conditions was not corroborated by our analysis. In all three analyses, there was practically no evidence to reject the null hypothesis in favor of the conclusion that medical marijuana legalization leads to more tolerant behaviors and attitudes toward DAMU. In fact, the signs of the effect estimates were opposite to the direction hypothesized in Equation (1).
To further evaluate the potential impact of unmeasured factors, we conducted the sensitivity analysis described in Section 3.3. We posited that a reasonable value for the masked effect size (see Equation 3) would be 1.0: in other words, the masked treatment effect would be equivalent to medical marijuana legalization changing one of the four survey items that comprised our outcome (see Section 1.2) from the intolerant response to the tolerant one. Using the Kentucky sample as an example, we started with a sensitivity parameter of Γ = 1, which indicates no unmeasured confounding. Under the null hypothesis of a treatment effect of at least one unit, the p‐value corresponding to the weighted randomization test was less than 0.001, suggesting rejection of the null hypothesis, that is, no treatment effect of one unit or more. Because we had not yet accounted for unmeasured confounding, this could be a false rejection if hidden bias exists, so we increased Γ to allow various magnitudes of hidden bias. For small magnitudes of hidden bias, the p‐value remained below 0.01, still significant. For moderate magnitudes, the p‐value remained significant but moved much closer to 0.05. For a moderately large magnitude, the p‐value reached 0.05, indicating that we should retain the null hypothesis: it remains possible that legalization leads to greater tolerance (one unit or more) toward DAMU but that this is hidden in our data by strong unmeasured confounding. A detailed table of this sensitivity analysis is presented in Web Appendix D.
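The mechanics of this kind of sensitivity analysis can be illustrated with a simplified, unweighted analogue: a sign test on matched pairs under Rosenbaum's sensitivity model, in which Γ bounds the within‐pair odds that the exposed unit has the larger response. This sketch is not the paper's weighted randomization test; it only shows how the worst‐case one‐sided p‐value grows with Γ:

```python
from math import comb

def rosenbaum_sign_pvalue(n_pos, n_pairs, gamma):
    """Worst-case one-sided p-value for the sign test on matched pairs
    under Rosenbaum's sensitivity model: with hidden bias of magnitude
    gamma, the chance that a pair favors the exposed unit is at most
    gamma / (1 + gamma), so the null count of positive pairs is
    stochastically bounded by a binomial with that success probability."""
    p_plus = gamma / (1 + gamma)  # worst-case chance a pair is positive
    return sum(comb(n_pairs, k) * p_plus**k * (1 - p_plus)**(n_pairs - k)
               for k in range(n_pos, n_pairs + 1))
```

As Γ grows, the worst‐case p‐value increases; in the same spirit, the Γ values reported in Table 4 are the points at which significance is lost in the paper's weighted test.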
The results of the three sensitivity analyses are presented in the last column of Table 4. The values of the sensitivity parameter needed to mask a true effect of one unit higher tolerance were 8.2 for Kentucky only, 4.7 for Tennessee only, and 6.3 for Kentucky and Tennessee together. In other words, an unmeasured confounder would have to be associated with more than 4.7 times higher odds of living in a medical marijuana state for greater tolerance after medical marijuana legalization to remain plausible. We acknowledge that the covariates included in our design may not have been comprehensive enough to rule out all unmeasured confounding, and we did not fully account for cultural differences between the states considered in this study. However, we believe that an unmeasured confounder with an association with exposure status strong enough to change our conclusions is unlikely to exist. For example, in the logistic regression model we used to estimate the propensity scores, the maximum odds ratio associated with exposure status was 1.75. Therefore, we believe our causal evaluation is robust to moderate amounts of hidden bias.
5. Discussion
Our study proposed a matched design to use survey data to estimate the causal effect of medical marijuana legalization in Kentucky and Tennessee. We rooted our study in the hypothesis that medical marijuana legalization would cause residents to exhibit greater tolerance toward DAMU. We found practically no evidence for this hypothesis, and our conclusions are unlikely to change under moderate levels of unmeasured confounding.
Our study contributes both to the literature on causal inference for survey data and to the literature on the effects of marijuana legalization in the United States. With respect to the former, our theoretical justification and simulation study demonstrated that matching on survey weight and propensity score yields an unbiased treatment‐effect estimator with a population‐level interpretation, and a test that achieves near‐nominal type‐I error and high power in settings that imitate our data application. Our matched design was used to construct a synthetic exposure group, an approach evocative of the synthetic control method (Abadie, Diamond, and Hainmueller 2010). The matched design also makes sensitivity analyses for unmeasured confounding easy to implement.
Previous studies have found small but statistically significant associations between medical marijuana legalization and self‐reported DAMU (Fink et al. 2020; Benedetti et al. 2020). This contrasts with our conclusion of no causal effect; however, we note that our outcome measured both behaviors and attitudes, and we considered only Southern and Midwestern states. Because self‐reported DAMU was rare, detecting meaningful policy effects is challenging; a strength of our study was the use of multiple survey items related to DAMU. Benedetti et al. (2021) found no pattern indicating that those living in states with more liberal marijuana policies (recreational or medical) were more tolerant of DAMU, a finding consistent with the present study.
One aspect of our proposed method that warrants further discussion is temporality. Our method was motivated by a data set with one year of data. Interpreting our estimand as causal requires that the exposure preceded the outcome. We posit that this assumption holds in our setting because medical marijuana policies in the exposed states were enacted before subjects took the survey. Potential violations may have arisen in a few cases: a subject may have driven within 1 h of using marijuana in the last 12 months (question 1 in Section 1.2) but before the medical marijuana policy was enacted. Because TSCI surveys were administered from October to November 2017, this could have occurred if a subject had driven after marijuana use (i) in Arkansas in October 2016 but not in November 2016 or later, or (ii) in Florida in October–December 2016 but not in January 2017 or later. While violations were possible, we believe they were few in number.
There were several additional limitations to our study. First, the data were self‐reported, with potential for response bias: those in states that legalized marijuana may have been more forthright about their marijuana‐related responses, although this bias did not appear to manifest in our data. Second, it is plausible that effects were heterogeneous within the population, for example, more extreme among certain demographic subgroups. While we entertained the idea of so‐called “uncommon but dramatic effects” (see Chapter 17 of Rosenbaum 2010), an exploratory analysis of the matched data provided no evidence that such an effect existed. Another limitation was the use of a single year of data, which in fact motivated the proposed method. The year 2017 was the only one in which TSCI included the four questions used in this study, hence we judged it valuable to study this year alone. However, a promising avenue for future work would be to extend this matched design to longitudinal settings; Stuart et al. (2014) proposed incorporating propensity score weights into the difference‐in‐differences framework to estimate policy effects in a way that ensures covariate balance across groups. Next, we only estimated treatment effects in two southern states, and our findings may not generalize to other states that have yet to legalize medical marijuana. In addition, one may reasonably question whether the SUTVA is met for our study, since statewide changes in behaviors and attitudes do not occur in a vacuum, especially in the age of social media and national connectivity; further investigation on this topic is desirable. Finally, our method estimates state‐level policy effects, and the sampling design of TSCI was relatively simple, so it may not be directly applicable to more complex sampling scenarios. For example, if interest lies in school‐level policy changes and schools are selected via clustered sampling, additional methodological considerations may be required. Among others, Arpino and Cannas (2016) and Zubizarreta and Keele (2017) have proposed matched designs for clustered data, which may provide good avenues for future work.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research Badges
This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results. The data are available in the Supporting Information section. This article has also earned a “Reproducible Research” badge for making publicly available the code necessary to reproduce the reported results; the reported results could be only partially reproduced owing to confidentiality restrictions.
Supporting information
Supporting Information
Supporting Information
Acknowledgments
The authors sincerely thank the AAA Foundation for Traffic Safety, especially Dr. Woon Kim, for their assistance in procuring the TSCI data. We also thank the associate editor and two anonymous reviewers for their insightful feedback, which led to substantial improvements of the manuscript. This work was partially supported by grant DMS‐2015552 from the National Science Foundation.
Funding: This work was partially supported by the National Science Foundation (Grant DMS‐2015552).
Data Availability Statement
The code used for the simulations and data analyses is available in the supplemental files. The data that support the findings of this study are available from the AAA Foundation for Traffic Safety. Restrictions apply to the availability of these data, which were used under license for this study. Data are available at https://aaafoundation.org/about/contact/ with the permission of the AAA Foundation for Traffic Safety.
References
- Abadie, A., Athey, S., Imbens, G. W., and Wooldridge, J. M. 2020. “Sampling‐Based Versus Design‐Based Uncertainty in Regression Analysis.” Econometrica 88: 265–296.
- Abadie, A., Diamond, A., and Hainmueller, J. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program.” Journal of the American Statistical Association 105: 493–505.
- Anderson, D. M., and Rees, D. I. 2014. “The Legalization of Recreational Marijuana: How Likely Is the Worst‐Case Scenario?” Journal of Policy Analysis and Management 33: 221–232.
- Arpino, B., and Cannas, M. 2016. “Propensity Score Matching With Clustered Data. An Application to the Estimation of the Impact of Caesarean Section on the Apgar Score.” Statistics in Medicine 35: 2074–2091.
- Benedetti, M. H., Li, L., Neuroth, L. M., Humphries, K. D., Brooks‐Russell, A., and Zhu, M. 2020. “Self‐Reported Driving After Marijuana Use in Association With Medical and Recreational Marijuana Policies.” International Journal of Drug Policy 92: 102944.
- Benedetti, M. H., Li, L., Neuroth, L. M., Humphries, K. D., Brooks‐Russell, A., and Zhu, M. 2021. “Demographic and Policy‐Based Differences in Behaviors and Attitudes Towards Driving After Marijuana Use: An Analysis of the 2013–2017 Traffic Safety Culture Index.” BMC Research Notes 14: 226.
- Breiman, L. 2001. “Random Forests.” Machine Learning 45: 5–32.
- Cook, A. C., Leung, G., and Smith, R. A. 2020. “Marijuana Decriminalization, Medical Marijuana Laws, and Fatal Traffic Crashes in US Cities, 2010–2017.” American Journal of Public Health 110: 363–369.
- Dong, N., Stuart, E. A., Lenis, D., and Nguyen, T. Q. 2020. “Using Propensity Score Analysis of Survey Data to Estimate Population Average Treatment Effects: A Case Study Comparing Different Methods.” Evaluation Review 44: 84–108.
- DuGoff, E. H., Schuler, M., and Stuart, E. A. 2014. “Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys.” Health Services Research 49: 284–303.
- Fink, D. S., Stohl, M., Sarvet, A. L., Cerda, M., Keyes, K. M., and Hasin, D. S. 2020. “Medical Marijuana Laws and Driving Under the Influence of Marijuana and Alcohol.” Addiction 115: 1944–1953.
- Fogarty, C. B. 2020. “Studentized Sensitivity Analysis for the Sample Average Treatment Effect in Paired Observational Studies.” Journal of the American Statistical Association 115: 1518–1530.
- Garrett, T., and Li, A. 2021. “Tennessee Expands (Minimally) Medical Marijuana Law and Establishes Cannabis Commission.” JD Supra.
- Hasin, D. S. 2018. “US Epidemiology of Cannabis Use and Associated Problems.” Neuropsychopharmacology 43: 195–212.
- Henry, M. 2023. “Could Legalizing Marijuana Cause More Fatal Car Crashes in Ohio? ‘It's Extraordinarily Complicated’.” Ohio Capital Journal.
- Ho, D. E., Imai, K., King, G., and Stuart, E. A. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15: 199–236.
- Ho, D. E., Imai, K., King, G., and Stuart, E. A. 2011. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” Journal of Statistical Software 42: 1–28.
- Imbens, G. W., and Rubin, D. B. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press.
- Kontopantelis, E., Doran, T., Springate, D. A., Buchan, I., and Reeves, D. 2015. “Regression Based Quasi‐Experimental Approach When Randomisation Is Not an Option: Interrupted Time Series Analysis.” BMJ 350: h2750.
- Lechner, M. 2011. The Estimation of Causal Effects by Difference‐in‐Difference Methods. Technical Report, Department of Economics, University of St. Gallen.
- Lenis, D., Nguyen, T. Q., Dong, N., and Stuart, E. A. 2017. “It's All About Balance: Propensity Score Matching in the Context of Complex Survey Data.” Biostatistics 20: 147–163.
- Li, X., and Ding, P. 2017. “General Forms of Finite Population Central Limit Theorems With Applications to Causal Inference.” Journal of the American Statistical Association 112: 1759–1769.
- Liaw, A., and Wiener, M. 2002. “Classification and Regression by randomForest.” R News 2: 18–22.
- McCartney, D., Arkell, T. R., Irwin, C., and McGregor, I. S. 2021. “Determining the Magnitude and Duration of Acute Δ9‐Tetrahydrocannabinol (Δ9‐THC)‐Induced Driving and Cognitive Impairment: A Systematic and Meta‐Analytic Review.” Neuroscience & Biobehavioral Reviews 126: 175–193.
- Miratrix, L. W., Sekhon, J. S., Theodoridis, A. G., and Campos, L. F. 2018. “Worth Weighting? How to Think About and Use Weights in Survey Experiments.” Political Analysis 26: 275–291.
- Nattino, G. 2019. “Causal Inference in Observational Studies With Complex Design: Multiple Arms, Complex Sampling and Intervention Effects.” PhD thesis, The Ohio State University.
- Ragusa, J. 2021. “Kentucky State Lawmaker Seeks Marijuana Legalization.” Spectrum News.
- Ridgeway, G., Kovalchik, S. A., Griffin, B. A., and Kabeto, M. U. 2015. “Propensity Score Analysis With Survey Weighted Data.” Journal of Causal Inference 3: 237–249.
- Rosenbaum, P. R. 1987. “Sensitivity Analysis for Certain Permutation Inferences in Matched Observational Studies.” Biometrika 74: 13–26.
- Rosenbaum, P. R. 2002. Observational Studies. New York, NY: Springer.
- Rosenbaum, P. R. 2010. Design of Observational Studies. New York, NY: Springer.
- Santaella‐Tenorio, J., Mauro, C. M., Wall, M. M., et al. 2017. “US Traffic Fatalities, 1985–2014, and Their Relationship to Medical Marijuana Laws.” American Journal of Public Health 107: 336–342.
- Schuessler, J., and Selb, P. 2019. “Graphical Causal Models for Survey Inference.” SocArXiv.
- Stuart, E. A., Bradshaw, C. P., and Leaf, P. J. 2015. “Assessing the Generalizability of Randomized Trial Results to Target Populations.” Prevention Science 16: 475–485.
- Stuart, E. A., Huskamp, H. A., Duckworth, K., et al. 2014. “Using Propensity Scores in Difference‐in‐Differences Models to Estimate the Effects of a Policy Change.” Health Services and Outcomes Research Methodology 14: 166–182.
- Toth, D. 2019. “A Permutation Test on Complex Sample Data.” Journal of Survey Statistics and Methodology 8: 772–791.
- Wang, W., Scharfstein, D., Tan, Z., and MacKenzie, E. J. 2009. “Causal Inference in Outcome‐Dependent Two‐Phase Sampling Designs.” Journal of the Royal Statistical Society, Series B (Statistical Methodology) 71: 947–969.
- Windle, S. B., Socha, P., Nazif‐Munoz, J. I., Harper, S., and Nandi, A. 2022. “The Impact of Cannabis Decriminalization and Legalization on Road Safety Outcomes: A Systematic Review.” American Journal of Preventive Medicine 63: 1037–1052.
- Zanutto, E. L. 2006. “A Comparison of Propensity Score and Linear Regression Analysis of Complex Survey Data.” Journal of Data Science 4: 67–91.
- Zubizarreta, J. R., and Keele, L. 2017. “Optimal Multilevel Matching in Clustered Observational Studies: A Case Study of the Effectiveness of Private Schools Under a Large‐Scale Voucher System.” Journal of the American Statistical Association 112: 547–560.