Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Epidemiology. 2020 Nov;31(6):815–822. doi: 10.1097/EDE.0000000000001252

Impact of Regression to the Mean on the Synthetic Control Method: Bias and Sensitivity Analysis

Nicholas Illenberger a, Dylan S Small b, Pamela A Shaw a
PMCID: PMC7541515  NIHMSID: NIHMS1616521  PMID: 32947369

Abstract

To make informed policy recommendations from observational panel data, researchers must consider the effects of confounding and temporal variability in outcome variables. Difference-in-difference methods allow for estimation of treatment effects under the parallel trends assumption. To justify this assumption, methods for matching based on covariates, outcome levels, and outcome trends, such as the synthetic control approach, have been proposed. While these tools can reduce bias and variability in some settings, we show that certain applications can introduce regression to the mean (RTM) bias into estimates of the treatment effect. Through simulations, we show RTM bias can lead to inflated type I error rates as well as bias towards the null in typical policy evaluation settings. We develop a novel correction for RTM bias that allows for valid inference and show how this correction can be used in a sensitivity analysis. We apply our proposed sensitivity analysis to reanalyze data concerning the effects of California’s Proposition 99, a large-scale tobacco control program, on statewide smoking rates.

Keywords: Regression to the Mean, Difference-in-difference, Matching, Synthetic Control, Sensitivity analysis

Introduction

Panel studies are a type of longitudinal study that can be used to estimate the effect of an intervention on an outcome of interest by comparing outcome measurements collected pre- and post-treatment. Because treatment is not typically randomized, differences in outcomes cannot be attributed to the intervention alone. If we consider the effect of a smoking cessation program on cigarette sales within a state, then state demographics, which may influence the likelihood of a cessation program being passed, can also affect sales trends. Additionally, temporal variation and outside events (such as natural disasters) can add noise to trends and affect estimates of the treatment effect. These features can occasionally create the illusion of a treatment effect where none exists.

Given a set of treated and control units with outcomes measured pre- and post-intervention, the difference-in-difference estimator is the difference in pre-treatment outcomes between the two units subtracted from the difference in post-treatment outcomes [5]. Under the assumption that the treated and control groups would have parallel outcome trends in the absence of treatment, this estimator is unbiased for the average treatment effect on the treated (ATT). Ease of use and robustness to unmeasured confounding have made the difference-in-difference approach popular among epidemiologists [21, 10, 19]. However, because the estimator is not robust to deviations from the parallel trends assumption, control units must be selected with care.

To improve the selection of controls, Abadie et al. [1] introduced the synthetic control approach. This method constructs a "synthetic control" unit using a weighted sum of donor controls. If donors are weighted such that they resemble the treated unit in the pre-intervention period, then the synthetic control should emulate how the treated unit would behave in the post-intervention period in the absence of treatment. This is akin to matching, in that control units that are similar to the treated unit are weighted more heavily than those that are not. While matching may improve comparability between treated and control units, recent work by Daw et al. [12] has shown that matching in difference-in-difference analyses can introduce regression to the mean (RTM) bias. Because of the similarities between matching and the synthetic control method, there is a need to better understand the effects of RTM when using synthetic controls.

In this paper, we examine the effect of RTM on estimates of the ATT coming from the synthetic control and other matched difference-in-difference methods. Through simulations, we show that RTM can result in inflated type I error rates and, in some settings, decreased power. Compared to other matching techniques, these effects are exaggerated in the synthetic control estimator. We also propose a novel sensitivity analysis, which can be used to check how robust inference may be to the effect of RTM bias. Sensitivity and quantitative bias analyses allow researchers to assess the potential effects of systematic error in an experiment [23] and are common in causal inference and missing data settings [24]. We apply our proposed sensitivity analysis to reanalyze data from Abadie et al. [2], estimating the effect of a large-scale tobacco control initiative on cigarette sales in California. The data are from a publicly available dataset on state-level annual cigarette sales and are not subject to human subjects review [3].

Methods

Matched Difference-in-Difference

Consider a setting where observations are measured pre-intervention, t = 0, and post-intervention, t = 1. Let Y(t) represent the observed outcome at time t, A be an indicator of treatment status, and X be measured or unmeasured confounders. Define Y^a(t) to be the potential outcome [25] which would be observed under treatment A = a at time t. In this setting Y^1(0) = Y^0(0) because neither group receives treatment at time t = 0. Assume the linear model E[Y^0(t) | X] = βX + γt for the expected potential outcome under no treatment at time t. Because the distribution of X typically differs between the treatment groups, the potential mean under no treatment will differ as well. This model assumes that the effect of time, γ, does not depend on confounders and that the effect of confounders, β, does not depend on time. These are jointly known as the parallel trends assumption. If both are true and if the distribution of covariates within each group remains the same over time, then the expected difference between the potential untreated outcomes for the treated and control units in the pre-treatment period is equivalent to that in the post-treatment period. Let D(t) be the expected difference between the treatment and control groups at time t. If, alongside the parallel trends assumption, we also assume consistency (Y^A(t) = Y(t)) and random treatment conditional on X (Y^0(t) ⫫ A | X), then we can show (see Appendix):

D(t) = E[Y(t) \mid A = 1] - E[Y(t) \mid A = 0] = E[Y^1(t) - Y^0(t) \mid A = 1] + \beta \left( E[X \mid A = 1] - E[X \mid A = 0] \right).

For t = 1, the first term in this summation is the ATT, which we define as θ. Because Y^1(0) = Y^0(0), it follows that θ is the difference between D(1) and D(0). A natural estimator of θ uses the empirical means within treatment groups at times t = 0 and t = 1. Specifically, θ̂ = D̂(1) − D̂(0), where

\hat{D}(t) = \frac{\sum_{i=1}^{n} Y_i(t) I(A_i = 1)}{\sum_{j=1}^{n} I(A_j = 1)} - \frac{\sum_{i=1}^{n} Y_i(t) I(A_i = 0)}{\sum_{j=1}^{n} I(A_j = 0)}.
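For concreteness, the two-period estimator can be computed directly from group means. The following sketch (in Python, with hypothetical argument names; not the authors' code) illustrates the calculation:

```python
import numpy as np

def did_estimate(y, a, t):
    """Two-period difference-in-difference estimate of the ATT.

    y : outcomes; a : 0/1 treatment indicators; t : 0/1 period
    indicators (0 = pre, 1 = post). Returns theta_hat = D_hat(1) - D_hat(0),
    where D_hat(t) is the treated-minus-control mean difference at period t.
    """
    y, a, t = map(np.asarray, (y, a, t))
    d = {}
    for period in (0, 1):
        treated_mean = y[(a == 1) & (t == period)].mean()
        control_mean = y[(a == 0) & (t == period)].mean()
        d[period] = treated_mean - control_mean
    return d[1] - d[0]
```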

In practice, it may be difficult to identify units such that β and γ are equivalent in the treated and control groups. Ryan et al. [26] have shown that matching can decrease bias by improving the comparability of units. They consider the case where treated and control units are drawn from the same underlying population, but the probability of treatment depends on pre-intervention outcome levels or trend. In this setting, if pre-intervention outcomes are correlated with future observations, then matched difference-in-difference estimators of the ATT are less biased than their unmatched counterparts. However, as we will discuss, if treated and control units are drawn from populations with different outcome distributions, then matching can induce RTM bias into estimates of the ATT.

Regression to the Mean

Regression to the mean (RTM) is a statistical phenomenon in which extreme measurements of a random variable tend towards their expected value upon repeat measurement. Bias due to RTM is introduced when three conditions hold: (1) there is variability in outcome measures, (2) the population from which the treated unit is drawn differs from the control population, and (3) matching is done on pre-treatment outcome levels. For example, suppose pre-treatment outcome measurements are obtained from control and treatment populations with mean outcome levels μ0 and μ1, respectively. If μ1 > μ0, then the nearest-neighbor match for a treated unit is expected to be a control unit with outcome greater than μ0. Because this control unit is expected to decrease upon repeat measurement in the post-treatment period (i.e., regress towards its mean), the differences in outcome levels between treated and matched control units are expected to be larger in the post-treatment period than in the pre-treatment period even when there is no treatment effect. In this setting, matching results in a violation of the parallel trends assumption, leading to bias.
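As an illustration of this mechanism, the following small simulation (a sketch with illustrative values μ0 = 0, μ1 = 2, and pre/post correlation 0.5; not one of the paper's reported simulation settings) matches a treated unit to its nearest control on the pre-period outcome and shows that the matched difference-in-difference is positive on average even though no treatment effect exists:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, rho, n0, n_sim = 0.0, 2.0, 0.5, 40, 5000
spurious = []
for _ in range(n_sim):
    # pre/post outcome pairs with correlation rho; no treatment effect anywhere
    cov = [[1.0, rho], [rho, 1.0]]
    controls = rng.multivariate_normal([mu0, mu0], cov, size=n0)
    treated = rng.multivariate_normal([mu1, mu1], cov)
    # nearest-neighbor match on the pre-period outcome level
    m = np.argmin(np.abs(controls[:, 0] - treated[0]))
    # difference-in-difference of treated vs matched control
    spurious.append((treated[1] - controls[m, 1]) - (treated[0] - controls[m, 0]))

# positive on average: the matched control regresses back toward mu0 post-period
print(np.mean(spurious))
```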

Matching procedures

Synthetic Control Method

The method of synthetic controls is described in detail elsewhere [2], so we provide only a brief overview. Suppose we collect data on a single treated unit and n0 controls, for a total of n0 + 1 units. Let i = 1 index the treated unit and C denote the set of indices for the control units. Collect τ outcome measurements Y_i = (Y_{i1}, …, Y_{iτ}) on each unit. Treatment is withheld until time τ0, so that j ∈ {1, …, τ0} indexes the pre-treatment period and j ∈ {τ0 + 1, …, τ} the post-treatment period. Select weights w_k, for k ∈ C, such that Y_{1j} ≈ Σ_{k∈C} w_k Y_{kj} for j ∈ {1, …, τ0} and Σ_{k∈C} w_k = 1. If weights are chosen so that these equalities approximately hold, then the weighted sum of the post-treatment control vectors can serve as a potential untreated outcome vector for the treated unit.
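A minimal sketch of this weight-fitting step, matching only on pre-treatment outcome levels (the full procedure in [2] also balances covariates and chooses a weighting matrix for them), is given below; the function name and solver choice are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def synth_weights(y1_pre, y0_pre):
    """Simplex-constrained weights so a donor combination tracks the
    treated unit's pre-period outcomes.

    y1_pre : (tau0,) pre-period outcomes for the treated unit.
    y0_pre : (n0, tau0) pre-period outcomes for the donor pool.
    """
    n0 = y0_pre.shape[0]
    objective = lambda w: np.sum((y1_pre - w @ y0_pre) ** 2)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]  # weights sum to 1
    bounds = [(0.0, 1.0)] * n0                                        # non-negative weights
    res = minimize(objective, np.full(n0, 1.0 / n0),
                   bounds=bounds, constraints=constraints, method="SLSQP")
    return res.x

# The synthetic control's full trajectory is then w @ Y0, with Y0 of shape (n0, tau).
```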

Nearest Neighbor matching

Let S be the set of pre-treatment outcome measurement vectors from the control group. If s is the vector of pre-treatment measurements for the treated unit, then we want to find the nearest neighbor match for s over the set S. Given some distance metric, this match is the element of S which minimizes the distance from s [7]. Different distance metrics can result in different matches. In this paper, we consider two implementations of nearest neighbor matching. The first is based upon the distance between pre-treatment outcome vectors as determined by the L2-norm, while the second uses the L1 distance between the coefficients of an OLS regression of pre-treatment outcome measurements on time (i.e., the pre-treatment trend).
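A sketch of the two nearest-neighbor variants follows; here the trend match compares both the fitted intercept and slope, which is one reading of "coefficients of an OLS regression of pre-treatment outcomes on time" (function names are illustrative):

```python
import numpy as np

def nn_match_level(y1_pre, y0_pre):
    """Index of the donor whose pre-period outcome vector is closest
    to the treated unit's in L2 distance (NN1)."""
    return int(np.argmin(np.linalg.norm(y0_pre - y1_pre, axis=1)))

def nn_match_trend(y1_pre, y0_pre):
    """Index of the donor whose OLS pre-period intercept and slope are
    closest to the treated unit's in L1 distance (NN2)."""
    t = np.arange(y1_pre.shape[0])
    X = np.column_stack([np.ones_like(t), t])
    coef1, *_ = np.linalg.lstsq(X, y1_pre, rcond=None)      # (2,)
    coef0, *_ = np.linalg.lstsq(X, y0_pre.T, rcond=None)    # (2, n0)
    return int(np.argmin(np.abs(coef0.T - coef1).sum(axis=1)))
```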

Simulations

To examine the effect of RTM bias, we simulate a single treated unit alongside n0 = 40 controls. For control units, outcome measurements are drawn from a multivariate normal distribution with mean μ0, marginal variance σ2 = 1, and a first-order autoregressive (AR(1)) covariance structure with correlation ρ^{|ti − tj|} between outcome measurements at times ti and tj. The treated unit is simulated similarly, with mean μ1 rather than μ0. For each simulated dataset, the treated unit is matched to controls using the synthetic control method, nearest neighbor based on the L2-norm, and nearest neighbor based on pre-treatment trend. For comparison, we provide an estimate of the treatment effect using the unmatched difference-in-difference. The situation we consider, that of a single treated unit and many controls, is typical for applications of synthetic control and unmatched difference-in-difference, but is uncommon for 1:1 nearest neighbor matching. Although it is not common under this setup, we implement the 1:1 nearest-neighbor matching approach to facilitate comparison with the other methods under study. Additionally, the simulations of Daw et al. [12] were based on 1:1 nearest-neighbor matching.
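The data-generating step of these simulations can be sketched as follows (an illustrative function under the stated normal/AR(1) assumptions; default values mirror those used in the text):

```python
import numpy as np

def simulate_units(mu0=0.0, mu1=1.0, sigma2=1.0, rho=0.5,
                   n0=40, tau=8, rng=None):
    """Draw n0 control units and one treated unit, each a length-tau
    multivariate normal series with constant mean and AR(1) correlation
    rho**|t_i - t_j| (no treatment effect anywhere)."""
    rng = rng or np.random.default_rng()
    times = np.arange(tau)
    cov = sigma2 * rho ** np.abs(times[:, None] - times[None, :])
    y_controls = rng.multivariate_normal(np.full(tau, mu0), cov, size=n0)
    y_treated = rng.multivariate_normal(np.full(tau, mu1), cov)
    return y_treated, y_controls
```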

If we define Ȳ_{0j} = n0^{-1} Σ_{k∈C} Y_{kj} as the mean of the control units’ outcomes at time j, then we calculate the unmatched difference-in-difference estimator as

\hat{\theta} = \frac{1}{\tau - \tau_0} \sum_{j=\tau_0+1}^{\tau} \left\{ Y_{1j} - \bar{Y}_{0j} \right\} - \frac{1}{\tau_0} \sum_{j=1}^{\tau_0} \left\{ Y_{1j} - \bar{Y}_{0j} \right\}.

For the nearest neighbor and synthetic control methods, the estimator for the treatment effect simply replaces Ȳ_{0j} with the value of the matched or synthetic control at time j. Because each unit is simulated multivariate normal with constant mean, the parallel trends assumption holds. Additionally, for μ1 relatively close to μ0, the synthetic control method should be able to find w_k such that Y_{1j} ≈ Σ_{k∈C} w_k Y_{kj} for j ∈ {1, …, τ0}. If treated and control units are drawn from the same underlying distribution (μ1 = μ0), then all estimators would be unbiased for the ATT. However, when μ1 ≠ μ0, both the matched difference-in-difference estimator and the synthetic control estimator will be biased by RTM.

Type I error rate

We first consider the effects of outcome-level matching on type I error rates. For each method, we use permutation tests to test the null hypothesis of no treatment effect. For i = 1, …, n0 + 1, we sequentially treat unit i as if it were the treated unit and compute the corresponding estimate θ̂_i. The p-value for this test is given as p = n^{-1} Σ_{i=1}^{n} I(|θ̂_1| ≤ |θ̂_i|), where n = n0 + 1.
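A sketch of this placebo-style permutation test, assuming the estimate for the actual treated unit is stored first in the vector of estimates:

```python
import numpy as np

def permutation_pvalue(estimates):
    """Placebo-based p-value: estimates[0] is theta_hat for the actual
    treated unit; the remaining entries come from re-running the procedure
    with each control treated as if it had received the intervention."""
    abs_est = np.abs(np.asarray(estimates))
    # proportion of placebo estimates at least as extreme as the observed one
    return np.mean(abs_est >= abs_est[0])
```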

We consider settings with varying levels of μ1 (μ1 = 1, …, 5), ρ (ρ = 0.00, 0.25, 0.50, 0.75, 0.90), and number of pre-treatment observations (4 or 10). Each unit is simulated with four post-treatment observations. For each setting, 2000 simulations are performed. Results for simulations with four and ten pre-treatment observations are presented in Tables 1 and 2, respectively. In both settings, as μ1 moves further from μ0, the type I error rate for the synthetic control and outcome-level nearest neighbor approaches increases. Additionally, as the correlation between repeat observations, ρ, increases, the type I error rate decreases. In all cases, the synthetic control method leads to greater type I error rate inflation than the nearest neighbor methods. Because the synthetic control uses information from all control units, there is less variance in the biased estimator. Matching on pre-treatment linear trend does not appear to increase type I error rates in any scenario. This is consistent with findings from Daw et al. [12]. Comparing Table 1 with Table 2, we see that the maximum type I error rate is greater when there are fewer pre-treatment observations. As the number of pre-intervention observations grows, there is less variability in the average pre-intervention outcome level of each unit. Because of this, the maximum average among control units is expected to be closer to μ0 when there are many pre-intervention observations than when there are few. This results in less bias due to RTM and lower type I error rates.

Table 1:

Type I error rates for the unmatched difference-in-difference, synthetic control (SC), nearest neighbor using the L2-norm (NN1), and nearest neighbor using linear trends (NN2). Data is simulated using 4 pre-intervention observations. In all simulation settings, μ0 = 0 and σ2 = 1. For simulations varying μ1, ρ = 0.5. For simulations varying ρ, μ1 = 5

Type I Error Rate: Varying μ1 (ρ = 0.5)
μ1	Unmatched	SC	NN1	NN2
1.00	0.05	0.16	0.09	0.05
2.00	0.05	0.31	0.19	0.05
3.00	0.05	0.35	0.25	0.05
4.00	0.05	0.35	0.26	0.05
5.00	0.04	0.33	0.25	0.05

Type I Error Rate: Varying ρ (μ1 = 5)
ρ	Unmatched	SC	NN1	NN2
0.00	0.05	0.40	0.29	0.06
0.25	0.04	0.39	0.29	0.05
0.50	0.04	0.36	0.27	0.05
0.75	0.05	0.25	0.18	0.05
0.90	0.05	0.18	0.13	0.05

Table 2:

Type I error rates for the unmatched difference-in-difference, synthetic control (SC), nearest neighbor using the L2-norm (NN1), and nearest neighbor using linear trends (NN2). Data is simulated using 10 pre-intervention observations. In all simulation settings, μ0 = 0 and σ2 = 1. For simulations varying μ1, ρ = 0.5. For simulations varying ρ, μ1 = 1

Type I Error Rate: Varying μ1 (ρ = 0.5)
μ1	Unmatched	SC	NN1	NN2
1.00	0.05	0.16	0.07	0.06
2.00	0.04	0.25	0.12	0.04
3.00	0.05	0.24	0.15	0.04
4.00	0.04	0.25	0.15	0.05
5.00	0.05	0.26	0.17	0.04

Type I Error Rate: Varying ρ (μ1 = 1)
ρ	Unmatched	SC	NN1	NN2
0.00	0.05	0.26	0.08	0.05
0.25	0.06	0.22	0.08	0.05
0.50	0.05	0.16	0.07	0.05
0.75	0.05	0.11	0.07	0.05
0.90	0.05	0.07	0.06	0.03

Matching with covariates

Because the motivating model for the synthetic control procedure includes covariates which are associated with the outcome of interest, we perform simulations to show the effects of regression to the mean in this setting. We simulate a covariate X from a multivariate normal distribution with mean γ0 for control units and γ1 for the treated unit. X is simulated with an AR(1) error covariance structure with variance σ_X^2 = 0.25 and correlation ρ_X = 0.4. For each unit, Y_i is multivariate normal with mean μ_i + β_x X_i, where μ_i = μ1 for the treated unit and μ_i = μ0 for control units. The synthetic control method now uses both pre-intervention levels of Y and X when constructing weights. We also consider unmatched and nearest neighbor matching in our simulations. The first variation of nearest neighbor matches on the pre-intervention levels of X, while the second matches on pre-intervention trend in X.

Table 3 contains type I error rates obtained while varying the value of γ1 between 0 and 5 and βx between 0 and 2. In these simulations, μ0 = 0, μ1 = 5, and ρ = 0.5. The type I error rate increases as the distributions of X in the treated and control groups move further apart and as the effect of X on the outcome level increases.

Table 3:

Type I error rate for the different estimators of the ATT. Data is simulated using 4 pre-treatment observations and a covariate associated with the outcome level. Here, μ0 = 0, μ1 = 2, ρ = 0.5, and σ2 = 1. When varying γ1, βx = 1. When varying βx, γ1 = 1.

Type I Error Rate: Varying γ1 (βx = 1)
γ1	Unmatched	SC	NN1	NN2
0.00	0.05	0.27	0.06	0.03
1.00	0.04	0.37	0.05	0.05
2.00	0.06	0.36	0.06	0.05
3.00	0.06	0.36	0.05	0.05
4.00	0.05	0.39	0.05	0.05

Type I Error Rate: Varying βx (γ1 = 1)
βx	Unmatched	SC	NN1	NN2
0.0	0.05	0.30	0.05	0.04
0.5	0.05	0.32	0.05	0.04
1.0	0.05	0.38	0.05	0.05
1.5	0.05	0.36	0.06	0.05
2.0	0.04	0.35	0.04	0.03

Bias towards the null

As evidenced by the previous simulations, matching on pre-treatment outcomes can lead to anti-conservative bias. However, in some settings where there is a treatment effect, RTM can also result in bias towards the null. To illustrate this phenomenon, we perform 1000 simulations with μ0 = 0, μ1 = 2, and ρ = 0.5. We induce a treatment effect, θ: for each additional time point in the treatment period, the expected outcome for the treated unit increases by θ. For negative θ, the treatment effect and the bias due to RTM work in opposite directions, resulting in bias towards the null and conservative rejection rates. Figure 1 provides rejection rates for the unmatched, nearest neighbor, and synthetic control procedures when θ is between 0 and −1.5. As in the previous simulations, we see that when there is no treatment effect, the synthetic control method exhibits inflated type I error rates while the unmatched estimator has appropriate rejection rates. As the magnitude of the treatment effect increases, the power of the unmatched estimator surpasses that of the synthetic control method. These results indicate that, depending on the direction of the treatment effect relative to the direction of RTM, the synthetic control method can exhibit either conservative or anti-conservative bias.

Figure 1:

Empirical probability of rejecting the hypothesis that there is no treatment effect (θ = 0) as a function of θ using the unmatched, synthetic control method (SC), nearest neighbor matching on L2-norm (NN1), and nearest neighbor matching on linear trend (NN2).

Correction and Sensitivity Analysis

Suppose Y_1, …, Y_T are jointly normal random variables. By properties of the multivariate normal distribution, for any given i and j, we have E[Y_i | Y_j] = μ_i + Σ_{ij} Σ_{jj}^{-1} (Y_j − μ_j). If we know the mean vectors and the covariance structure for each unit, then we can use this representation to account for RTM bias in matched difference-in-difference estimates of the ATT. To illustrate, suppose we have a single treated unit and a sample of control units. Using a 1:1 matching technique, the treated unit Y_1 is matched with a control unit, Y_m, and an estimate of the ATT, θ̂_obs, is obtained. This estimate can be conceptualized as the sum of the effect due to RTM bias and the effect due to treatment. Our correction technique subtracts the estimated effect of RTM, θ̂_rtm, from the observed effect to obtain a bias-adjusted estimate of the ATT, θ̂_adj = θ̂_obs − θ̂_rtm.

If we assume that observations are normally distributed and follow a Markov process (as would be true for an AR(1) error structure), then we can predict post-intervention observations using only the most recent pre-intervention observation. Define Ŷ_{ij} = μ_{ij} + Σ_{jτ0} Σ_{τ0τ0}^{-1} (Y_{iτ0} − μ_{iτ0}) for j > τ0 and i ∈ {1, m}, where m is the index of the matched control unit. Here, τ0 is the final pre-treatment observation time and μ_{ij} = E[Y_i^0(j)] is the expected potential outcome level under no treatment for unit i at post-treatment time j. Ŷ_{ij} is the expected observation for unit i at post-treatment time j, conditional on the pre-treatment observations and assuming no treatment effect. Estimating the ATT using these expected values in place of the observed post-treatment values for the treated and matched control units provides θ̂_rtm, which can be used to find θ̂_adj.

To generalize this adjustment for use with synthetic controls, first obtain the synthetic control weights w_k for k ∈ C. Using these weights, construct a synthetic outcome vector Y_S, where Y_{Sj} = Σ_{k∈C} w_k Y_{kj}, and the corresponding estimate of the ATT, θ̂_obs. To obtain the expected difference-in-difference estimate under RTM, construct an augmented synthetic control using the fitted weights and replacing the post-treatment control measurements with the previously defined Ŷ_{ij}; note that Ŷ_{ij} must now be computed for every unit i, not only the treated and matched units. Call this augmented control Ŷ_S and calculate θ̂_rtm by subtracting the mean difference in observed pre-treatment outcomes of the treated unit and the synthetic control unit from the mean difference in expected post-treatment outcomes. The covariance matrices used to construct the Ŷ_{ij} in this correction procedure are those of the unmatched and unweighted observations. The goal of this procedure is to use pre-intervention observations to estimate the expected outcomes of each unit in the post-intervention period under the assumption of no treatment effect. Applying the different estimators of the ATT to these projections allows us to determine what portion of the original estimate of the ATT is explained by RTM bias.
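Under the AR(1) assumption, Σ_{jτ0} Σ_{τ0τ0}^{-1} reduces to ρ^{j−τ0}, so the adjustment can be sketched as below. This is an illustrative function, not the authors' code; the null means, ρ, and weights are supplied by the analyst (for 1:1 matching, w would be a one-hot vector on the matched control):

```python
import numpy as np

def rtm_adjusted_att(y_treated, y_controls, w, mu_treated, mu_control,
                     rho, tau0):
    """RTM-adjusted synthetic-control ATT estimate, assuming normal outcomes
    with AR(1) correlation (so Sigma_{j,tau0} Sigma_{tau0,tau0}^{-1} = rho**(j - tau0))
    and constant null means mu_treated, mu_control."""
    tau = y_treated.shape[0]
    post = np.arange(tau0, tau)            # post-period indices (0-based)
    lags = post - (tau0 - 1)               # j - tau0 in 1-based time
    y_synth = w @ y_controls               # synthetic control series

    # observed difference-in-difference estimate
    pre_gap = (y_treated[:tau0] - y_synth[:tau0]).mean()
    theta_obs = (y_treated[post] - y_synth[post]).mean() - pre_gap

    # expected post-period values under no treatment, given the last
    # pre-period observation (Markov property of the AR(1) structure)
    yhat_treated = mu_treated + rho ** lags * (y_treated[tau0 - 1] - mu_treated)
    yhat_controls = mu_control + rho ** lags[None, :] * \
        (y_controls[:, tau0 - 1] - mu_control)[:, None]
    yhat_synth = w @ yhat_controls

    theta_rtm = (yhat_treated - yhat_synth).mean() - pre_gap
    return theta_obs - theta_rtm
```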

As a proof of concept, we perform 2000 simulations with outcomes drawn from a multivariate normal distribution with AR(1) error structure. Here, μ0 = 0, μ1 = 1, σ2 = 1, ρ = 0.5, and there is no treatment effect. For each simulation, we test the null hypothesis of no treatment effect using the permutation test described earlier, replacing θ̂_i with θ̂_{i,adj}. To test whether the adjustment is robust to deviations from the normality assumption, we also determine type I error rates when outcomes are drawn from a multivariate t-distribution. Simulation results are given in Table 4. When errors are normally distributed, the adjusted synthetic control estimate of the ATT attains nominal type I error rates. Error rates are inflated for t-distributed outcomes, particularly for highly correlated outcomes. However, observed error rates are lower than those obtained in Table 1 using the unadjusted synthetic control approach with normally distributed errors.

Table 4:

Type I error rates for the adjusted difference-in-difference estimator when the normality assumption is not satisfied. Errors come from a t-distribution with the indicated degrees of freedom.

Degrees of Freedom ρ = 0.25 ρ = 0.50 ρ = 0.75
∞ (Normal) 0.05 0.05 0.05
50 0.05 0.05 0.06
10 0.05 0.06 0.08
3 0.05 0.08 0.12

In practice, estimating θ̂_adj is not possible without assuming the values of μ_{1j}, μ_{0j}, ρ, or σ. We propose treating these as sensitivity parameters. By positing a range of values for these parameters and calculating θ̂_adj under each set, we can quantify how much an estimate of the ATT is affected by RTM.
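Operationally, this amounts to looping the adjustment over a grid of posited parameter values. The snippet below is a sketch that builds on the earlier illustrative functions (rtm_adjusted_att and the simulated or fitted objects y_treated, y_controls, w); the grid values are arbitrary examples:

```python
import itertools

# Sweep plausible null-model parameters and report the adjusted estimate under each.
# y_treated, y_controls, w come from the earlier simulation and weight-fitting sketches.
for mu1, rho in itertools.product([0.5, 1.0, 1.5], [0.25, 0.50, 0.75]):
    theta_adj = rtm_adjusted_att(y_treated, y_controls, w,
                                 mu_treated=mu1, mu_control=0.0,
                                 rho=rho, tau0=4)
    print(f"mu1={mu1:.2f} rho={rho:.2f} adjusted ATT={theta_adj:.3f}")
```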

Reanalysis of smoking cessation data

To further illustrate our proposed sensitivity analysis, we reanalyze data from Abadie et al. [2] concerning the effect of California’s Proposition 99 on smoking cessation. The act added a 25-cent-per-pack tax on the sale of cigarettes and earmarked the tax revenue for use in health care programs and anti-tobacco advertisements. The original analysis concluded that the initiative decreased cigarette consumption in California by approximately 20 packs per capita annually. Because that analysis was based upon the synthetic control method, we aim to determine whether its findings are robust to RTM bias using our proposed sensitivity analysis.

Following the original study, this analysis was based upon cigarette consumption rates in California and 38 control states. Figure 2 provides a plot of cigarette consumption rates between 1970 and 2000 for the included states. To construct the synthetic control unit, we use logged per capita GDP, the average retail price of cigarettes within each state, beer consumption per capita, the percentage of the population aged 15 to 24, and cigarette sales in the pre-treatment years 1975, 1980, and 1988. Without adjustment, we estimate that Proposition 99 reduced consumption by 18.9 packs per capita annually between the years 1989 and 2000.

Figure 2:

Tobacco consumption (per capita cigarette consumption) in a subset of states from 1970 to 2000. California is highlighted in black; treatment initiation is indicated by the dashed red line.

To perform the sensitivity analysis we must (1) propose a set of reasonable models for the distribution of outcomes under no treatment effect for both the treated and control groups, (2) calculate the expected value of outcomes in the post-treatment period given our assumed outcome models and the pre-treatment observations, and (3) calculate adjusted estimates of the ATT using these values. Proposed distributions of outcomes under no treatment effect can be drawn from domain knowledge or from statistical modeling. For illustration, we employ generalized estimating equations (GEE) with an AR(1) working correlation matrix to regress per capita cigarette sales on the covariates used to construct the synthetic control. Because this model is meant to describe outcomes under the assumption of no treatment effect, we fit the model using all states; if there is no treatment effect, then even California’s outcomes are representative of outcomes under no treatment. The estimated residual standard deviation is 11.6 and the sample correlation is 0.72. Let g_i(t) denote the predicted value from the fitted model for unit i at time t. Define Ŷ_{ij} = ρ̂^{j−1988} (Y_{i,1988} − g_i(1988)) + g_i(j) for j > 1988 and i ∈ {1, …, 39}. If w_k for k ∈ C are the weights for our synthetic control, then θ̂_rtm = (1/12) Σ_{j=1989}^{2000} (Ŷ_{1j} − Ŷ_{sj}) − (1/19) Σ_{j=1970}^{1988} (Y_{1j} − Y_{sj}), where Y_{sj} = Σ_{k∈C} w_k Y_{kj} and Ŷ_{sj} = Σ_{k∈C} w_k Ŷ_{kj}. This expected estimate under no treatment effect is then subtracted from the observed estimate to obtain the adjusted estimate of the ATT. Using this procedure, the adjusted estimate of the ATT is 12.1, with p-value 0.10. We also consider the set of outcome models indexed by Δ: g_i(j; Δ) = g_i(j) + Δ I(i = 1). This is the same null outcome model considered above, except that the mean of the treated unit is shifted by Δ. For Δ = −1 and 1, the adjusted estimates of the ATT are 11.3 (p = 0.1) and 12.9 (p = 0.05), respectively. Likewise, for Δ = −5 and 5, the estimates of the ATT become 8.14 (p = 0.3) and 16.0 (p = 0.05). Because the estimated treatment effects and associated p-values vary over relatively similar null outcome models, there is evidence that RTM may play a large role in our estimate of the ATT, suggesting that further research be done to determine the effect of the tobacco tax.
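For this reanalysis, the adjustment step can be sketched as follows. The fitted means g_i(t) and ρ̂ would come from a working model such as the GEE described above; the function and array names are illustrative, with California stored in row 0:

```python
import numpy as np

def prop99_rtm_adjustment(Y, g, w, rho_hat, years, last_pre=1988):
    """Sensitivity adjustment sketch for the Proposition 99 reanalysis.

    Y : (39, T) observed per capita sales, row 0 = California.
    g : (39, T) fitted null-model means g_i(t); w : (38,) synthetic-control
    weights for the donor states; rho_hat : estimated AR(1) correlation;
    years : length-T array of calendar years.
    """
    pre = years <= last_pre
    post = ~pre
    lags = years[post] - last_pre

    # projected post-1988 outcomes under no treatment effect
    resid_last = Y[:, years == last_pre].ravel() - g[:, years == last_pre].ravel()
    Y_hat = g[:, post] + rho_hat ** lags * resid_last[:, None]

    y_synth = w @ Y[1:, :]            # observed synthetic California
    yhat_synth = w @ Y_hat[1:, :]     # projected synthetic California

    pre_gap = (Y[0, pre] - y_synth[pre]).mean()
    theta_obs = (Y[0, post] - y_synth[post]).mean() - pre_gap
    theta_rtm = (Y_hat[0] - yhat_synth).mean() - pre_gap
    return theta_obs - theta_rtm
```

Shifting the treated unit's fitted means by Δ before calling such a function corresponds to the family of null models g_i(j; Δ) described above.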

Applying multiple models for the null outcome distribution when performing this analysis can help better characterize the sensitivity of results. If, instead of an AR(1) structure, we had chosen an unstructured error model, then the adjustment would proceed as described except that we would change our calculation of Ŷ. In this case, calculate Ŷ_{ij} = Σ_{j,pre} Σ_{pre,pre}^{-1} (Y_{i,pre} − g_{i,pre}) + g_i(j), where Y_{i,pre} is the vector of pre-intervention observations, g_{i,pre} is the vector of predicted pre-intervention outcomes obtained from our fitted model, and Σ_{pre,pre} is the estimated covariance matrix of the pre-intervention outcomes. Because the unstructured model does not have the Markov property, we must condition on all pre-treatment observations when calculating the expected values of Y under the null distribution.
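Under an unstructured covariance, the projection step replaces the single-lag formula with the general conditional-expectation formula; a minimal sketch (illustrative function and argument names):

```python
import numpy as np

def conditional_projection(y_pre, g_pre, g_post, Sigma):
    """Expected post-period outcomes given all pre-period observations,
    for a general (non-Markov) covariance Sigma over the full series.

    y_pre, g_pre : observed and fitted pre-period values (length p);
    g_post : fitted post-period means (length q);
    Sigma : (p + q, p + q) covariance matrix, pre-period block first.
    """
    p = len(y_pre)
    S_pre = Sigma[:p, :p]            # Sigma_{pre,pre}
    S_post_pre = Sigma[p:, :p]       # Sigma_{j,pre} stacked over post times j
    return g_post + S_post_pre @ np.linalg.solve(S_pre, y_pre - g_pre)
```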

Discussion

In this paper we have illustrated the effects of RTM bias on matched difference-in-difference estimators. This builds upon work done by Daw et al. [12] showing the bias induced by 1:1 nearest neighbor matching. Here, we have shown how the synthetic control approach can also introduce bias and have provided simulations showing the effect of this bias on type I error rates and power. Our results suggest that synthetic control approaches are more prone to bias than nearest neighbor matching. Added "confidence" in the model, gained from utilizing information from all of the control units, can increase the type I error rate by a factor of two over the nearest neighbor approach. We also developed an approach to determine the sensitivity of matched estimators of the ATT to RTM bias. Using our approach, we showed that previous results concerning the effect of Proposition 99 on cigarette consumption in California [2] may be overstated.

The results we obtained differ from those of Ryan et al. [26], which showed that matching can be beneficial in settings where the probability of treatment is associated with pre-intervention outcomes but both treated and control units are drawn from the same underlying distribution. In our simulations, and in those of Daw et al. [12], treated and control units are drawn from populations with different outcome distributions. Because it is difficult to know which of these two settings holds, we cannot know whether matching will be protective against bias or whether it will induce bias. There is a need for methodology that performs well in either setting. More recent implementations of synthetic controls, such as those of Doudchenko & Imbens [13] and Arkhangelsky et al. [6], weight control units based on similarity in pre-intervention outcome trends with the treated unit rather than on outcome levels. Because our results show that matching on trend did not induce bias, these implementations of synthetic control could be helpful. Further research is needed to determine how these and other recent approaches (such as that of Ben-Michael et al. [9]) are affected by RTM bias. Because the goal of this paper is to examine the effects of RTM on the simpler and more popular variants of synthetic controls and matched difference-in-difference, we do not consider these approaches. Additionally, we do not consider how k:1 matching techniques are affected by RTM bias. Stuart [28] suggests that whether k:1 matching is superior to 1:1 matching depends on the setting. Thus, comparisons with k:1 matching may be more nuanced and deserving of further research.

In the future, it may be worthwhile to look for ways to correct for RTM bias when we cannot assume normality of errors. For t-distributed errors, we noticed that type I error rates were slightly greater than the desired α-levels. While the adjustment still performed better than the unadjusted synthetic control estimator, we believe the method could be improved upon. As a whole, we believe that when researchers apply matched difference-in-difference estimators, they should also provide evidence that their results are robust to RTM bias, either by using our adjusted difference-in-difference estimator or by providing a sensitivity analysis.

Supplementary Material

Supplemental Digital Content

Acknowledgments

Sources of financial support:

The work of Drs. Shaw and Small was supported in part by NIH grant R01-AI131771.

A Appendix

We wish to prove:

D(t) = E[Y^1(t) - Y^0(t) \mid A = 1] + \beta \left( E[X \mid A = 1] - E[X \mid A = 0] \right)

Proof. Consider the following:

D(t) = E[Y(t) \mid A = 1] - E[Y(t) \mid A = 0]
     = E[Y^1(t) \mid A = 1] - E[Y^0(t) \mid A = 0]
     = E[Y^1(t) - Y^0(t) \mid A = 1] + E[Y^0(t) \mid A = 1] - E[Y^0(t) \mid A = 0]

Here, the second line follows from the consistency assumption. Next, note that for A = 0 or 1, the expected value of the potential outcome under no treatment can be rewritten as:

E[Y^0(t) \mid A] = E\{ E[Y^0(t) \mid X, A] \mid A \}
                 = E\{ E[Y^0(t) \mid X] \mid A \}
                 = E[\beta X + \gamma t \mid A]
                 = \beta E[X \mid A] + \gamma t

The second line is true because we assume Y^0(t) ⫫ A | X. Plugging this into the expression for D(t), we see that

D(t) = E[Y^1(t) - Y^0(t) \mid A = 1] + \beta \left( E[X \mid A = 1] - E[X \mid A = 0] \right)

Footnotes

Conflicts of Interest:

The authors report no conflicts of interest.

Data and Code:

All code for replicating the results of this article can be found in the eAppendix.

References

  • [1]. Abadie Alberto. Semiparametric Difference-in-Differences Estimators. The Review of Economic Studies, 72(1):1–19, 2005.
  • [2]. Abadie Alberto, Diamond Alexis, and Hainmueller Jens. Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association, 105(490):493–505, 2010.
  • [3]. Abadie Alberto, Diamond Alexis, and Hainmueller Jens. Synth: Stata Module to Implement Synthetic Control Methods for Comparative Case Studies. 2014.
  • [4]. Althauser Robert P and Rubin Donald. Measurement Error and Regression to the Mean in Matched Samples. Social Forces, 50(2):206–214, 1971.
  • [5]. Angrist Joshua D and Pischke Jörn-Steffen. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, 2008.
  • [6]. Arkhangelsky Dmitry, Athey Susan, Hirshberg David A, Imbens Guido W, and Wager Stefan. Synthetic Difference in Differences. Technical report, National Bureau of Economic Research, 2019.
  • [7]. Arya Sunil, Mount David M, Netanyahu Nathan, Silverman Ruth, and Wu Angela Y. An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. In Proc. 5th ACM-SIAM Sympos. Discrete Algorithms, pages 573–582, 1994.
  • [8]. Basu Sanjay, Rehkopf David H, Siddiqi Arjumand, Glymour M Maria, and Kawachi Ichiro. Health Behaviors, Mental Health, and Health Care Utilization among Single Mothers after Welfare Reforms in the 1990s. American Journal of Epidemiology, 183(6):531–538, 2016.
  • [9]. Ben-Michael Eli, Feller Avi, and Rothstein Jesse. The Augmented Synthetic Control Method. arXiv preprint arXiv:1811.04170, 2018.
  • [10]. Branas Charles C, Cheney Rose A, MacDonald John M, Tam Vicky W, Jackson Tara D, and Ten Have Thomas R. A Difference-in-Differences Analysis of Health, Safety, and Greening Vacant Urban Space. American Journal of Epidemiology, 174(11):1296–1306, 2011.
  • [11]. Card David and Krueger Alan B. Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 90(5):1397–1420, 2000.
  • [12]. Daw Jamie R and Hatfield Laura A. Matching and Regression to the Mean in Difference-in-Differences Analysis. Health Services Research, 2018.
  • [13]. Doudchenko Nikolay and Imbens Guido W. Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. Technical report, National Bureau of Economic Research, 2016.
  • [14]. Duflo Esther. Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment. American Economic Review, 91(4):795–813, 2001.
  • [15]. Dynarski Susan M. Does Aid Matter? Measuring the Effect of Student Aid on College Attendance and Completion. American Economic Review, 93(1):279–288, 2003.
  • [16]. Fichtenberg Caroline M and Glantz Stanton A. Association of the California Tobacco Control Program with Declines in Cigarette Consumption and Mortality from Heart Disease. New England Journal of Medicine, 343(24):1772–1777, 2000.
  • [17]. Gallin John I and Ognibene Frederick P. Principles and Practice of Clinical Research. Academic Press, 2012.
  • [18]. Glantz Stanton A. Changes in Cigarette Consumption, Prices, and Tobacco Industry Revenues Associated with California’s Proposition 99. Tobacco Control, 2(4):311, 1993.
  • [19]. Hamad Rita, Batra Akansha, Karasek Deborah, LeWinn Kaja Z, Bush Nicole R, Davis Robert L, and Tylavsky Frances A. The Impact of the Revised WIC Food Package on Maternal Nutrition during Pregnancy and Postpartum. American Journal of Epidemiology, 2019.
  • [20]. Hendryx Michael and Holland Benjamin. Unintended Consequences of the Clean Air Act: Mortality Rates in Appalachian Coal Mining Communities. Environmental Science & Policy, 63:1–6, 2016.
  • [21]. Kagawa Rose MC, Castillo-Carniglia Alvaro, Vernick Jon S, Webster Daniel, Crifasi Cassandra, Rudolph Kara E, Cerdá Magdalena, Shev Aaron, and Wintemute Garen J. Repeal of Comprehensive Background Check Policies and Firearm Homicide and Suicide. Epidemiology, 29(4):494–502, 2018.
  • [22]. Kreif Noémi, Grieve Richard, Hangartner Dominik, Turner Alex James, Nikolova Silviya, and Sutton Matt. Examination of the Synthetic Control Method for Evaluating Health Policies with Multiple Treated Units. Health Economics, 25(12):1514–1528, 2016.
  • [23]. Lash Timothy L, Fox Matthew P, MacLehose Richard F, Maldonado George, McCandless Lawrence C, and Greenland Sander. Good Practices for Quantitative Bias Analysis. International Journal of Epidemiology, 43(6):1969–1985, 2014.
  • [24]. Robins James M, Rotnitzky Andrea, and Scharfstein Daniel O. Sensitivity Analysis for Selection Bias and Unmeasured Confounding in Missing Data and Causal Inference Models. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, pages 1–94. Springer, 2000.
  • [25]. Rubin Donald B. Causal Inference using Potential Outcomes: Design, Modeling, Decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
  • [26]. Ryan Andrew M, Burgess James F Jr, and Dimick Justin B. Why We Should Not Be Indifferent to Specification Choices for Difference-in-Differences. Health Services Research, 50(4):1211–1235, 2015.
  • [27]. Siegel Michael. The Effectiveness of State-Level Tobacco Control Interventions: A Review of Program Implementation and Behavioral Outcomes. Annual Review of Public Health, 23(1):45–71, 2002.
  • [28]. Stuart Elizabeth A. Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1):1, 2010.
  • [29]. Tordoff Diana, Andrasik Michele, and Hajat Anjum. Misclassification of Sex Assigned at Birth in the Behavioral Risk Factor Surveillance System and Transgender Reproductive Health: A Quantitative Bias Analysis. Epidemiology, 30(5):669–678, 2019.
