The augmented synthetic control method in public health and biomedical research

Taylor Krajewski; Michael Hudgens

doi:10.1177/09622802231224638

2024 Feb 6;33(3):376–391.

PMCID: PMC10981189 NIHMSID: NIHMS1981586 PMID: 38320801

Abstract

Estimating treatment (or policy or intervention) effects on a single individual or unit has become increasingly important in health and biomedical sciences. One method to estimate these effects is the synthetic control method, which constructs a synthetic control, a weighted average of control units that best matches the treated unit’s pre-treatment outcomes and other relevant covariates. The intervention’s impact is then estimated by comparing the post-intervention outcomes of the treated unit and its synthetic control, which serves as a proxy for the counterfactual outcome had the treated unit not experienced the intervention. The augmented synthetic control method, a recent adaptation of the synthetic control method, relaxes some of the synthetic control method’s assumptions for broader applicability. While synthetic controls have been used in a variety of fields, their use in public health and biomedical research is more recent, and newer methods such as the augmented synthetic control method are underutilized. This paper briefly describes the synthetic control method and its application, explains the augmented synthetic control method and its differences from the synthetic control method, and estimates the effects of an antimalarial initiative in Mozambique using both the synthetic control method and the augmented synthetic control method to highlight the advantages of using the augmented synthetic control method to analyze the impact of interventions implemented in a single region.

Keywords: Causal inference, synthetic control, comparative case studies, counterfactual, treatment effect

1. Introduction

Estimates of treatment or policy effects on a single unit (e.g. country or region) are critical for informing public health interventions, but quantifying the impact of a treatment on a single treated unit is difficult because, in general, one cannot observe what that unit’s outcome would have been in the absence of treatment. Nonetheless, estimating the effect of a treatment on a single unit has become increasingly important, particularly in the health and biomedical sciences. For example, precision medicine seeks to develop individualized treatment plans because treatments that work well on average can elicit varied individual responses.¹ Similarly, policymakers are often interested in the effect of a public health policy or intervention on a single state or region.^2–4

Randomized controlled trials (RCTs) are the gold standard for measuring the effectiveness of a new intervention.⁵ However, RCTs may not be feasible or ethical in many scenarios. When treatment is not randomly assigned, methods that evaluate the impact of an intervention using observational data collected over time on the same participants may be used. Quasi-experimental methods such as interrupted time series (ITS) or difference-in-differences (DID) are often used to compare the longitudinal trajectory of the outcome of the treated unit to the evolution of the same outcome in a set of control units that did not experience the treatment. However, these methods have limitations that make them unsuitable for analyzing many policies or interventions. ITS is frequently used in policy evaluation, but it cannot account for other events or interventions that may be implemented at the same time as the intervention of interest.^6–8 The validity of DID estimates relies on the assumption that, in the absence of intervention, the outcome in the treated and control units would follow parallel trends, an assumption that may be violated when pre-treatment characteristics predictive of the outcome are unbalanced between the treated and control units.^7–9

The synthetic control method (SCM) is applicable in many settings in which the above-described methods are not valid. SCM uses a data-driven procedure to formally select control units and constructs a synthetic control, a weighted average of control units that best matches the treated unit’s pre-treatment outcomes and other relevant covariates.^2,10,11 Each control unit receives a non-negative weight, and thus one can describe the synthetic control in terms of each control unit’s contribution. The impact of the treatment is estimated by comparing the post-treatment outcomes of the treated unit and its synthetic control, which serves as a proxy for the counterfactual outcome had the treated unit not experienced the treatment. By creating a synthetic control that closely matches the treated unit’s outcome trajectory and other characteristics predictive of the outcome in the pre-treatment period, SCM does not rely on the parallel trends assumption of DID. Additionally, exposure to other events or interventions that may affect the outcome can be included when balancing characteristics in the pre-treatment period, addressing a limitation of ITS.

Since its inception in 2003, SCM has made substantial contributions to policy evaluation methods and has been extended and adapted in a myriad of ways and applied to a wide variety of fields.¹² One recent adaptation of SCM is the augmented synthetic control method (ASCM).¹³ While SCM requires the synthetic control to closely reproduce the trend in the treated unit’s outcome in the pre-treatment period, ASCM allows for situations where a close fit between the synthetic control and the treated unit in the pre-treatment period is infeasible. ASCM estimates the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate.¹³ When the bias is small (good pre-treatment fit), ASCM estimates will be similar to SCM estimates.

SCM and other methods that use synthetic controls have been widely applied in a variety of fields. In economics, SCM has been used to assess the effect of democratization or specific political regimes on child mortality,^14,15 to examine how the implementation of taxes on sweetened beverages in Philadelphia, PA impacted employment in related industries,¹⁶ and to study the effectiveness of post-COVID-19 fare-free public transport policies in increasing daily public transportation passenger flow.¹⁷ Political scientists have employed SCM to assess the impact of the George Floyd protests on police resignations¹⁸ and to estimate the effect of overseeing natural disaster relief on politicians’ election outcomes.¹⁹ Criminologists have also used SCM to estimate the impact of right-to-carry concealed handgun laws on violent crime.²⁰ Synthetic controls have also been used in education-related research to evaluate how initiatives aimed at improving teacher effectiveness affect student outcomes²¹ and to investigate how the Say Yes to Education reform in Syracuse, NY impacted district enrollment and graduation rates.²²

The use of synthetic control methods in public health and biomedical research, however, is more recent.²³ The COVID-19 pandemic has led to increased use of these methods to evaluate the impact of COVID-related policies. For instance, these methods have been used to analyze the effects of mandatory lockdowns, including investigating whether strict COVID-19 lockdown procedures led to improved air quality²⁴ and whether extending lockdown periods within specific regions would have decreased COVID-19 incidence in those regions.^3,25 Additionally, these methods have been used to analyze how vaccine lotteries affected vaccination rates and COVID-19 incidence, and whether mandatory COVID-19 certification (i.e., showing vaccination, a recent negative test, or proof of recovery) increased vaccine uptake.^26–28 Synthetic control methods continue to be used in the post-pandemic era to estimate the effect of COVID-19 on the economy, including its impact on public transit usage.²⁹ While SCM and other methods that use synthetic controls have been used in health-related studies in recent years, newer modifications of SCM, such as ASCM, are still underutilized in this setting.

This article aims to highlight the utility of synthetic control methods, specifically ASCM, in public health and biomedical research. A brief overview of SCM and its adaptations is provided in Section 2. Section 3 then explains one of those adaptations, ASCM. Section 4 presents a new application of ASCM to evaluate the short-term effects of the mass drug administration-based malaria elimination initiative in the Magude district of the Maputo province in Southern Mozambique, previously analyzed using SCM.³⁰ These parallel analyses are compared to highlight the advantages of using ASCM to evaluate the impact of interventions implemented in a single region. This article concludes with a discussion, including limitations and future directions, in Section 5.

2. The synthetic control method

This section provides a brief overview of SCM. For a more extensive summary of SCM, SCM-related methods, and implementation, see Bonander et al.³¹ or Abadie.³²

2.1. Notation and implementation

Suppose units $i = 1, \dots, N$ are observed for $t = 1, \dots, T$ time periods, and assume $N$ is fixed. For simplicity, consider the case where only the first unit ( $i = 1$ ) receives the treatment (or intervention or exposure), and the remaining $N_{0} = N - 1$ units are control units. The treated unit is unexposed at time points $t = 1, \dots, T_{0}$ (the pre-treatment period) and exposed thereafter. All other units ( $i = 2, \dots, N$ ) remain untreated throughout the $T$ time periods. Let $W_{i t}$ be an indicator of treatment for unit $i$ at time $t$ , such that $W_{i t} = 1$ for $i = 1$ and $t > T_{0}$ , and $W_{i t} = 0$ otherwise. Throughout, $\sum$ represents $\sum_{i = 2}^{N}$ , which is equivalent to $\sum_{i : W_{i} = 0}$ in this setting where only the first unit is treated.

Let $Y_{i t} (1)$ and $Y_{i t} (0)$ be the potential outcomes that would be observed for unit $i$ in period $t$ under treatment and in the absence of treatment, respectively. Throughout, the stable unit treatment value assumption (SUTVA) is presumed,³³ which states that each unit’s outcome is not affected by the treatment of other units and that there are no multiple versions of treatment (consistency). Thus, the observed outcome for unit $i$ at time $t$ is

$$Y_{i t} = W_{i t} Y_{i t} (1) + (1 - W_{i t}) Y_{i t} (0)$$

and we aim to estimate the effect of the intervention on the treated unit,

$$τ_{1 t} = Y_{1 t} (1) - Y_{1 t} (0) = Y_{1 t} - Y_{1 t} (0)$$

for $t > T_{0}$ , where the second equality holds by consistency.

Because only $Y_{1 t} = Y_{1 t} (1)$ is observed for $t > T_{0}$, $Y_{1 t} (0)$ must be estimated in order to estimate $τ_{1 t}$. Let $γ = (γ_{2}, \dots, γ_{N})^{'}$ be a vector of weights subject to the constraints

$$γ_{i} \geq 0 \quad \text{for } i = 2, \dots, N \tag{1}$$

and

$$\sum γ_{i} = 1 \tag{2}$$

The SCM constraints (1) and (2) limit $γ$ to the simplex $△^{N_{0}} = \{γ \in \mathbb{R}^{N_{0}} : γ_{i} \geq 0 \ \forall i, \ \sum γ_{i} = 1\}$. Each weight vector $γ$ can be used to construct an estimator of $Y_{1 t} (0)$ given by

$${\hat{Y}}_{1 t} (0) = \sum γ_{i} Y_{i t} \tag{3}$$

The SCM estimator corresponds to weights $γ^{scm}$ which minimize differences in pre-treatment period data between the treated unit and control units, as described below. The SCM estimator of $τ_{1 t}$ is thus

$${\hat{τ}}_{1 t}^{scm} = Y_{1 t} - \sum γ_{i}^{scm} Y_{i t} \tag{4}$$

The weight vector $γ^{scm}$ is estimated as follows. For unit $i$, let $Z_{i}$ be a (column) vector of pre-treatment covariates that are not affected by treatment² and are assumed to be predictive of the outcome variable, and let $Y_{i}$ be the vector of observed outcomes during the pre-treatment period. Let $X_{1} = (Z_{1}^{'}, Y_{1}^{'})^{'}$ be the vector describing the treated unit ( $i = 1$ ) in the pre-treatment period, and let $X_{0}$ be the matrix describing the control units ( $i = 2, \dots, N$ ), with one row $X_{i}^{'} = (Z_{i}^{'}, Y_{i}^{'})$ for each control unit, so that $X_{0}^{'} γ = \sum γ_{i} X_{i}$. The SCM weights $γ^{scm}$ solve the optimization problem

$$\min_{γ} \left\| V^{1/2} (X_{1} - X_{0}^{'} γ) \right\|_{2}^{2} = \min_{γ} \, (X_{1} - X_{0}^{'} γ)^{'} V (X_{1} - X_{0}^{'} γ) \tag{5}$$

subject to constraints (1) and (2) where $V$ is a symmetric, positive semidefinite matrix that can be chosen a priori (e.g. $V$ may be the identity matrix) or based on the observed data.^11,31
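A minimal sketch of problem (5) in Python, assuming NumPy and SciPy are available, $V$ defaults to the identity, and the control units' predictor vectors are stacked as rows of `X0`. The helpers `scm_weights` and `scm_effect` are hypothetical names, and SciPy's general-purpose SLSQP solver stands in for whatever optimizer a particular SCM package uses. The toy data are simulated so that the treated unit's pre-treatment outcomes lie exactly inside the controls' convex hull.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(X1, X0, V=None):
    """Solve problem (5) over the simplex defined by constraints (1) and (2).

    X1: (k,) pre-treatment predictors of the treated unit.
    X0: (N0, k) array with one row per control unit (so X0.T @ g is X0' γ).
    V:  (k, k) symmetric positive semidefinite matrix; identity if None.
    """
    N0, k = X0.shape
    V = np.eye(k) if V is None else V

    def loss(g):
        r = X1 - X0.T @ g          # X1 - X0' γ
        return r @ V @ r           # (X1 - X0'γ)' V (X1 - X0'γ)

    def grad(g):
        r = X1 - X0.T @ g
        return -2.0 * X0 @ (V @ r)

    res = minimize(
        loss,
        np.full(N0, 1.0 / N0),                                         # start at equal weights
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * N0,                                      # constraint (1)
        constraints=[{"type": "eq", "fun": lambda g: g.sum() - 1.0}],  # constraint (2)
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x


def scm_effect(Y1_post, Y0_post, gamma):
    """Estimator (4): Y_1t - Σ γ_i Y_it for each post-treatment period.
    Y1_post: (T - T0,), Y0_post: (N0, T - T0)."""
    return Y1_post - Y0_post.T @ gamma


# Hypothetical toy data: the treated unit's pre-period outcomes are an exact convex
# combination of the controls, and the simulated effect is +2 in every post period.
rng = np.random.default_rng(0)
N0, T0 = 5, 10
X0 = rng.normal(size=(N0, T0))
true_gamma = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
X1 = X0.T @ true_gamma
Y0_post = rng.normal(size=(N0, 3))
Y1_post = Y0_post.T @ true_gamma + 2.0

gamma_scm = scm_weights(X1, X0)
tau_hat = scm_effect(Y1_post, Y0_post, gamma_scm)
```

Because `X1` is an exact convex combination of the controls here, the recovered weights should be close to `true_gamma` and the estimated effects close to 2. With real data the fit is rarely exact, and a data-driven choice of $V$ adds an outer optimization that this sketch omits.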

If a synthetic control fails to adequately fit unit $i$’s outcome in the pre-treatment period, then it may fail to give a good approximation of $Y_{i t} (0)$ for $t > T_{0}$. Thus, after determining the weights $γ^{scm}$, diagnostics should be performed to assess the pre-treatment fit of the corresponding synthetic control. Pre-treatment fit can be evaluated visually by plotting the pre-treatment outcome trajectory of the treated unit against that of the synthetic control. Fit can also be summarized by the pre-treatment root mean squared prediction error (RMSPE), the square root of the average squared difference between the outcome of the treated unit and that of its synthetic control over the pre-treatment period, $\{T_{0}^{-1} \sum_{t = 1}^{T_{0}} (Y_{1 t} - \sum γ_{i}^{scm} Y_{i t})^{2}\}^{1 / 2}$. Because the magnitude of the RMSPE depends on the units of the outcome, it should be judged relative to the scale of the outcome variable. Finally, the pre-treatment covariate balance between the treated unit and synthetic control should be examined.
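The pre-treatment RMSPE can be computed directly from the per-period gaps between the treated unit and its synthetic control. A sketch with hypothetical numbers; the helper name `pre_treatment_rmspe` is ours, not from any package.

```python
import numpy as np


def pre_treatment_rmspe(Y1_pre, Y0_pre, gamma):
    """Pre-treatment RMSPE {T0^{-1} Σ_{t<=T0} (Y_1t - Σ γ_i Y_it)^2}^{1/2}.

    Y1_pre: (T0,) treated unit's pre-treatment outcomes.
    Y0_pre: (N0, T0) control units' pre-treatment outcomes (one row per unit).
    gamma:  (N0,) synthetic control weights.
    """
    gap = Y1_pre - Y0_pre.T @ gamma   # treated minus synthetic control, per period
    return float(np.sqrt(np.mean(gap ** 2)))


# Hypothetical numbers: two controls weighted equally, four pre-treatment periods.
Y1_pre = np.array([10.0, 12.0, 11.0, 13.0])
Y0_pre = np.array([[9.0, 11.0, 10.0, 12.0],
                   [11.0, 12.0, 11.0, 13.0]])
gamma = np.array([0.5, 0.5])
rmspe = pre_treatment_rmspe(Y1_pre, Y0_pre, gamma)
# Judge the RMSPE against the outcome's scale, here relative to its mean level.
relative = rmspe / np.mean(np.abs(Y1_pre))
```

Here the gaps are 0, 0.5, 0.5, and 0.5, so the RMSPE is $\sqrt{0.1875} \approx 0.433$, roughly 4% of the treated unit's average outcome of 11.5.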

2.2. Estimation error

The bias of the SCM estimator depends on the assumed data generating process. Suppose

$$Y_{i t} (0) = m_{i t} + ϵ_{i t} \tag{6}$$

where $m_{i t}$ may be a function of pre-treatment outcomes and/or predictors of the outcome, and the error terms $ϵ_{i t}$ are assumed to have mean zero. Different assumed model components $m_{i t}$ result in differing bounds on the bias of the SCM estimator. Chernozhukov et al.³⁴ provide an extensive discussion of various panel data models that are encompassed by (6). For example, a linear relationship between the outcome variable and predictors could be assumed, as in the linear factor model

$$m_{i t} = δ_{t} + θ_{t} Z_{i} + λ_{t} μ_{i} \tag{7}$$

where the parameter $δ_{t}$ may vary over time but is constant across units, $μ_{i}$ is a vector of unobserved predictors, and $θ_{t}$ and $λ_{t}$ are time-varying coefficients.³⁵ Here, if the predictors $Z_{i}$ and $μ_{i}$ are considered nonrandom quantities,¹³ then the randomness in $Y_{i t} (0)$ under (7) comes only from $ϵ_{i t}$. Under this model and certain additional assumptions, the bias of the SCM estimator (4) can be bounded by a function that goes to zero as the number of pre-treatment periods increases ( $T_{0} \to \infty$ ). Bias of (4) for other forms of $m_{i t}$, such as a vector autoregressive model or a linear function of lagged outcomes, has also been studied.^2,13
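To see why matching matters under (7), the sketch below draws hypothetical parameters and builds a treated unit whose observed $Z_1$ and unobserved $μ_1$ are the same convex combination of the controls' values. Because the weights sum to one, $δ_t$, $θ_t Z_i$, and $λ_t μ_i$ all average through, so the synthetic control reproduces $m_{1t}$ in every period, pre- and post-treatment, and the remaining error is weighted noise. In practice $μ_i$ is unobserved; heuristically, closely fitting a long run of pre-treatment outcomes is what makes such a match plausible. This illustrates the model only, not the bias bound itself.

```python
import numpy as np

rng = np.random.default_rng(1)
N0, T, n_cov, n_fac = 6, 20, 2, 3

# Time-varying parameters of model (7), all hypothetical draws
delta = rng.normal(size=T)               # δ_t, common to all units
theta = rng.normal(size=(T, n_cov))      # θ_t, coefficients on observed Z_i
lam = rng.normal(size=(T, n_fac))        # λ_t, loadings on unobserved μ_i

# Fixed (nonrandom) characteristics of the control units
Z0 = rng.normal(size=(N0, n_cov))
mu0 = rng.normal(size=(N0, n_fac))

# Treated unit whose Z_1 and μ_1 are the same convex combination of the controls'
gamma = rng.dirichlet(np.ones(N0))       # satisfies constraints (1) and (2)
Z1, mu1 = gamma @ Z0, gamma @ mu0


def m(Z, mu):
    """m_it = δ_t + θ_t'Z_i + λ_t'μ_i for all periods t at once (length-T array)."""
    return delta + theta @ Z + lam @ mu


m1 = m(Z1, mu1)                                                 # treated unit's m_1t
m_synth = sum(g * m(z, u) for g, z, u in zip(gamma, Z0, mu0))   # Σ γ_i m_it

eps = rng.normal(scale=0.5, size=(N0 + 1, T))                   # ϵ_it, mean zero
Y1_0 = m1 + eps[0]                                              # treated Y_1t(0)
Y0_0 = np.array([m(z, u) for z, u in zip(Z0, mu0)]) + eps[1:]   # controls' Y_it(0)
```

The synthetic control's error `Y1_0 - gamma @ Y0_0` then equals `eps[0] - gamma @ eps[1:]` in every period, a mean-zero quantity.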

2.3. Inference

Traditional statistical inferential methods are typically not well-suited for the SCM setting. In particular, because SCM is often performed with a small number of units, large sample approximations (based on $N \to \infty$ ) are generally not applicable. Likewise, treatment is typically not randomized in settings where SCM is applied, such that randomization-based inference methods are also not applicable. In addition, the inferential goal to estimate the effect of treatment on a single unit differs from sampling-based methods for drawing inference about population parameters. Further, in many settings where SCM is applied, the sample is not randomly selected from a well-defined population, and thus sampling-based methods are not applicable. For these reasons, inferential methods for SCM typically rely on permutation-type arguments or approximations based on $T_{0} \to \infty$ to conduct hypothesis tests and construct confidence intervals.

For example, Abadie et al.² proposed inference via a placebo test in which the SCM estimate is compared to a permutation distribution. To calculate the permutation distribution, SCM is applied separately to each of the control units to estimate the effect of the treatment on units that were not exposed. The estimated effects (or estimated average effects, when there are multiple post- $T_{0}$ time periods) for all units, treated and control, are then pooled to create the permutation distribution, and the p-value $p$ is the proportion of “placebo effects” that are as or more extreme than that of the treated unit. By construction, the placebo test p-value is constrained by the number of units; in particular, $p \in \{1 / N, 2 / N, \dots, 1\}$. The p-value equals the probability of obtaining results as extreme as those seen in the treated unit if any one of the $N$ units were randomly labeled as the treated unit, and thus a small p-value provides evidence of an intervention effect. When the intervention is randomly assigned, this procedure is the same as classical randomization inference.² However, random assignment is uncommon in most settings in which SCM is applied.
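Given the placebo estimates, the p-value is a simple proportion. A sketch, assuming each control's entry has already been obtained by re-running SCM with that control relabeled as treated, and taking absolute values as the meaning of "as or more extreme" (a two-sided choice the text leaves open). The function name is hypothetical.

```python
import numpy as np


def placebo_p_value(effects, treated_index=0):
    """Permutation p-value for the placebo test.

    effects: estimated effect (or average post-treatment effect) for every unit;
             the treated unit's entry is its SCM estimate, each control's entry a
             placebo estimate from re-running SCM with that control as "treated".
    Extremeness is measured by |effect| here, an assumption; a one-sided
    comparison may suit a directional hypothesis better.
    """
    stat = np.abs(np.asarray(effects, dtype=float))
    return float(np.mean(stat >= stat[treated_index]))


# Hypothetical estimates for N = 5 units (unit 0 is treated).
effects = [-20.0, 3.0, -5.0, 1.0, 8.0]
p = placebo_p_value(effects)   # only the treated unit is this extreme, so p = 1/5
```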

The placebo test described above is a permutation test in which the estimated treatment effect serves as the test statistic. Alternatively, Abadie et al.² recommend using an RMSPE-based test statistic as follows. First, calculate the RMSPE for each unit (treated and controls) separately in the pre- and post-treatment periods, where the post-treatment RMSPE is the square root of the average squared difference between the outcome of the unit and that of its synthetic control for $t > T_{0}$. Then the post/pre-treatment RMSPE ratio is calculated for each unit, and the treated unit’s ratio is compared to the distribution of all ratios. The p-value is the proportion of RMSPE ratios as large as or larger than the treated unit’s RMSPE ratio.
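The RMSPE-ratio version can be sketched the same way, assuming the gaps between each unit and its (placebo) synthetic control are already available as rows of a hypothetical `gaps` array whose first `T0` columns are pre-treatment periods:

```python
import numpy as np


def rmspe_ratio_p_value(gaps, T0, treated_index=0):
    """gaps: (N, T) array; row j holds unit j's outcome minus its synthetic control
    (the treated unit's actual fit or a control's placebo fit). Returns (ratios, p)."""
    gaps = np.asarray(gaps, dtype=float)
    pre = np.sqrt(np.mean(gaps[:, :T0] ** 2, axis=1))    # pre-treatment RMSPE per unit
    post = np.sqrt(np.mean(gaps[:, T0:] ** 2, axis=1))   # post-treatment RMSPE per unit
    ratios = post / pre
    p = float(np.mean(ratios >= ratios[treated_index]))
    return ratios, p


# Hypothetical gaps for N = 4 units, T0 = 2 pre- and 2 post-treatment periods.
gaps = [[0.1, -0.1, 2.0, 2.0],    # treated unit: tight pre fit, large post gap
        [1.0, -1.0, 1.0, 1.0],
        [0.5, 0.5, 1.0, -1.0],
        [0.2, 0.2, 0.2, 0.2]]
ratios, p = rmspe_ratio_p_value(gaps, T0=2)   # ratios ≈ [20, 1, 2, 1], p = 1/4
```

The ratio is undefined when a unit's pre-treatment RMSPE is exactly zero; the sketch does not guard against that case.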

Another option, when the available data includes a long pre-treatment period, is to perform an “in-time placebo test” in which the treatment time is falsely shifted to a time that occurs before the true time of treatment (i.e. a false treatment time).¹¹ The outcomes of the treated unit and its synthetic control, recalculated based on the false treatment time, are then compared in the period after the false treatment time. This process is repeated for multiple false treatment times, and the p-value is constructed by comparing the estimated treatment effect based on the true treatment time to the distribution of the estimated treatment effects when the treatment time is shifted to one of the false treatment times. Thus, by construction, the p-value is constrained by the number of pre-treatment periods, that is, $p \geq 1 / T_{0}$ .

Statistical inference procedures for SCM estimates are an active area of research, and several alternatives to placebo tests have been proposed to construct variance estimates and confidence intervals.^34,36,37 For example, Chernozhukov et al.³³ obtain p-values and corresponding confidence intervals by permuting estimated residuals across the time dimension in a repeated sampling framework, which is discussed further in Section 4.3. See Abadie for a summary of inferential methods for SCM.³²
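A simplified sketch of only the permutation step of the residual-based approach, assuming residuals $\hat{u}_t$ for all $T$ periods are already available from a fit under the null hypothesis; that fitting step, and the test inversion used to form confidence intervals, are omitted. The statistic $S_q(\hat{u}) = \{ (T - T_0)^{-1/2} \sum_{t > T_0} |\hat{u}_t|^q \}^{1/q}$ and the cyclic "moving block" shifts reflect our reading of Chernozhukov et al. and should be checked against the original paper; `moving_block_p_value` is a hypothetical name.

```python
import numpy as np


def moving_block_p_value(residuals, T0, q=1):
    """p-value from cyclic (moving-block) permutations of residuals (sketch).

    residuals: length-T array of û_t, t = 1..T, assumed to come from a model fit
               on all T periods under the null hypothesis (not computed here).
    T0:        number of pre-treatment periods; the last T - T0 are post-treatment.
    """
    u = np.asarray(residuals, dtype=float)
    T = u.size
    T_post = T - T0

    def S(v):
        # S_q = ( (T - T0)^{-1/2} * Σ_{t > T0} |v_t|^q )^{1/q}
        return (np.sum(np.abs(v[T0:]) ** q) / np.sqrt(T_post)) ** (1.0 / q)

    observed = S(u)
    # Shift j maps period t to t + j (mod T); j = 0 is the identity permutation.
    perm_stats = np.array([S(np.roll(u, -j)) for j in range(T)])
    return float(np.mean(perm_stats >= observed))


# Hypothetical residuals: small before T0 = 6, large in the two post periods.
u = [0.1, -0.2, 0.1, 0.0, -0.1, 0.2, 3.0, 2.5]
p = moving_block_p_value(u, T0=6)   # only the identity shift is as extreme: p = 1/8
```

As with the in-time placebo test, the smallest attainable p-value here is $1/T$, so a long series is needed for fine-grained inference.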

2.4. Considerations for application

Data availability: To employ SCM, longitudinal data on the outcome and covariates believed to be predictive of the outcome, but not affected by treatment,² should be available for the treated unit and potential controls. SCM analyses are generally considered more credible if data is utilized over a long pre-treatment period. Under certain assumptions, for example, a linear factor model (7), the bias of the synthetic control estimator is bounded by a function that goes to zero as $T_{0} \to \infty$ .² There is a greater risk of overfitting and bias when data is only available from a few pre-intervention time periods; that is, excellent pre-treatment fit may be (spuriously) attained even though the synthetic control poorly predicts the treated unit’s counterfactual outcome.^32,38 In addition to requiring data from multiple pre-treatment periods, outcomes must (of course) be available for at least one post-treatment period.

Selection of control units: Selecting potential control units is a crucial component of analyses employing SCM, as the control units’ similarity to the treated unit directly impacts the validity of the estimated treatment effect.^11,32 Potential control units should be plausible substitutes for the treated unit in the absence of treatment (e.g. control regions from the same or similar geographic locations). Units that experienced the same or a similar intervention to that of the treated unit and units with highly volatile pre-treatment outcomes should be discarded from the set of potential control units.^31,32 Restrictions on the weights assigned to the control units prevent extrapolation, but including units with pre-treatment outcomes very different from those of the treated unit could lead to so-called “interpolation bias.”^2,11,32,39 Data should be examined to ensure that the pre-treatment outcome trends of the control units are not systematically different from that of the treated unit. Likewise, pre-treatment covariates should be compared between the treated unit and control units. Pre-treatment outcomes and covariates for the treated unit ( $X_{1}$ ) should be close to or inside the convex hull of the values for control units ( $X_{0}$ ). Disparities between the treated unit and its corresponding synthetic control could lead to biased estimates because the treatment effect might be confounded by these imbalanced predictors.
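Whether $X_1$ lies inside the convex hull of the control units' values can be checked as a linear feasibility problem: find $γ$ satisfying (1) and (2) with $X_0^{'} γ = X_1$. A sketch using SciPy's `linprog` with a zero objective; the helper name is hypothetical. It tests exact membership only, so being "close to" the hull would instead be judged by, for example, the minimized value of (5).

```python
import numpy as np
from scipy.optimize import linprog


def in_convex_hull(X1, X0):
    """True if X1 = X0'γ for some γ satisfying constraints (1) and (2).

    X1: (k,) treated unit's pre-treatment predictors.
    X0: (N0, k) control units' predictors, one row per unit.
    """
    N0 = X0.shape[0]
    A_eq = np.vstack([X0.T, np.ones((1, N0))])   # rows: X0'γ = X1, then Σγ = 1
    b_eq = np.append(X1, 1.0)
    res = linprog(c=np.zeros(N0), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * N0, method="highs")
    return res.status == 0                       # 0: feasible point found; 2: infeasible


# Hypothetical two-predictor example: controls at the corners of a triangle.
X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
inside = in_convex_hull(np.array([0.2, 0.3]), X0)    # True  (γ = [0.5, 0.2, 0.3])
outside = in_convex_hull(np.array([1.0, 1.0]), X0)   # False (would need γ_1 = -1)
```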

No anticipation effect: SCM requires that treatment have no effect before the time of intervention ( $T_{0}$ ). The no-anticipation assumption is plausible in many settings; for example, the incidence of malaria in a specific region should not be affected by a malaria elimination initiative implemented in that region in the future. In other settings, however, it may be dubious. For example, if a policy aims to restrict access to a certain substance, then individuals may stock up on that substance before the policy takes effect. If there is an anticipation effect, SCM can still be used by backdating the intervention to a period before the anticipation effect begins.³²

No interference: As mentioned above, SCM assumes no interference between units, that is, the outcome of a unit is unaffected by the treatment of other units. In the setting of a single treated unit, this implies that the treatment cannot affect the outcomes of the potential control units. In the next section, SCM is employed to assess the effect of a tobacco control program implemented in California on California’s cigarette sales, with other states serving as control units. The no interference assumption in this example stipulates that the tobacco control program in California had no effect on cigarette sales in other states.

2.5. An application of SCM

Abadie et al.² implemented SCM to estimate the effect of California Proposition 99, a tobacco control program, on cigarette sales in California. Proposition 99 was passed in November 1988 and went into effect in January 1989. Since Proposition 99 was only implemented in California, California’s cigarette sales after its passage could be naively compared to the unweighted average of the rest of the United States (excluding 11 states that either implemented formal tobacco control programs or raised cigarette taxes by 50 cents or more in 1989–2000), as shown in Figure 1(A). However, Figure 1(A) also shows that the trend in per capita cigarette sales in California differed from that in the rest of the United States before the passage of Proposition 99.

To improve pre-treatment fit, SCM was used to create a synthetic control for California, letting $Z_{i}$ include GDP per capita, percent aged 15–24, retail price, and beer consumption per capita, averaged over 1980–1988, and $Y_{i}$ include cigarette sale averages in 1970, 1980, and 1988.² The synthetic control for California is a weighted average of only five states (Colorado, Connecticut, Montana, Nevada, and Utah, with SCM weights of 0.163, 0.068, 0.201, 0.234, and 0.334, respectively) that most closely resembled California in the pre-treatment period.¹ The SCM weights of all other control states are zero.

Figure 1(B) shows that California’s synthetic control approximates California’s cigarette consumption during the pre-treatment period better than the unweighted average of the rest of the United States does. The gap between California and its synthetic control after the passage of Proposition 99 in Figure 1(B) indicates that cigarette consumption was reduced by an average of almost 20 packs per capita in the post-treatment period, suggesting Proposition 99 was effective in lowering cigarette sales in California. Examining the distribution of the post/pre-treatment ratios of the mean squared prediction error (MSPE), which is the square of the RMSPE ratio described in Section 2.3 and therefore ranks units identically, shows that the post-treatment MSPE for California was about 130 times its pre-treatment MSPE. This ratio was much larger than that of any of the control states, with the majority of control states having an MSPE ratio less than 40.² Based on the distribution of the post/pre-treatment MSPE ratios, if one were to randomly assign the intervention to a state within the data set, the probability of obtaining a ratio as large as California’s would be $p = 0.026$ ( $1 / 39$; $N = 39$ states included in analyses).
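As a quick arithmetic check on the reported numbers, the five nonzero weights satisfy constraints (1) and (2), and a ratio ranking first among 39 states gives the stated p-value:

```python
# SCM weights reported above for California's synthetic control.
weights = {"Colorado": 0.163, "Connecticut": 0.068, "Montana": 0.201,
           "Nevada": 0.234, "Utah": 0.334}

assert all(w >= 0 for w in weights.values())   # constraint (1)
total = sum(weights.values())                  # constraint (2): should equal 1
rank, n_states = 1, 39                         # California's ratio ranks first of 39
p = rank / n_states
print(round(total, 3), round(p, 3))            # prints: 1.0 0.026
```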

2.6. Modifications of the synthetic control method

Often in practice, due to SCM’s constraints (1) and (2) on the weights, only a few control units receive non-zero weights, making the contribution of each control unit clear and allowing for simple interpretation of the counterfactual. For example, the synthetic control for California in Section 2.5 is a weighted average of only five states. While the transparency and interpretability of SCM are attractive in practice, the application of this method has limitations in some scenarios. For instance, SCM is not recommended when data is limited to a small number of pre-treatment periods due to the risk of overfitting.^11,32 SCM is also not recommended if there is poor pre-treatment fit since the bias bound of the SCM estimator under certain assumed data generating processes for the control potential outcomes relies on $X_{1} \approx X_{0}^{'} γ^{scm}$ .^2,32 Unfortunately, finding a synthetic control that closely fits the treated unit in the pre-treatment period can be difficult to achieve in some settings. Additionally, even if good pre-treatment fit is achieved, the SCM estimator is susceptible to so-called “interpolation bias.”^2,11,32,39 Such bias may arise because the synthetic control providing a good approximation to $Y_{1 t} (0)$ for $t \leq T_{0}$ does not necessarily imply the synthetic control will also well approximate $Y_{1 t} (0)$ for $t > T_{0}$ unless additional assumptions are made about the data generating process. Hence, the need for methods that modify SCM arises.

There have been several extensions and modifications of SCM, including extending to multiple treated units, correcting for bias, implementing regression-based methods, and applying matrix completion/estimation methods.^13,32 A variety of recently proposed estimators can be expressed as penalized SCM estimators. These methods utilize weights $γ$ that solve the constrained optimization problem

$$\min_{γ} \left\| V^{1/2} (X_{1} - X_{0}^{'} γ) \right\|_{2}^{2} + ξ \sum f (γ_{i}) \tag{8}$$

subject to constraints (1) and (2). Note that if $ξ \to 0$ , a penalized SCM estimator based on (8) will converge to the standard SCM estimator. Heuristically, the penalty term in (8) is specified to de-bias SCM estimates. For example, setting $f (γ_{i}) = γ_{i} | | X_{1} - X_{i} | |_{2}^{2}$ penalizes discrepancies between the treated unit and each control unit, weighted by the magnitude of their contributions, aiming to reduce interpolation bias. Here, as $ξ \to \infty$ , the penalized estimator converges to a one-to-one matching estimator.^32,40 Since, under various data generating processes, improving pre-treatment fit reduces the bias of the SCM estimator, some methods aimed at bias correction relax SCM constraint (1) on $γ$ to attain better pre-treatment fit. These methods let $f (γ_{i})$ be a function that penalizes the deviation of the new weight vector, with potentially negative weights, from the SCM weights, and $ξ$ controls the level of extrapolation. This modification is employed in ridge regression and Ridge ASCM, further discussed below.
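A sketch of the penalized problem (8) with $V = I$ and the matching penalty $f(γ_i) = γ_i \| X_1 - X_i \|_2^2$, again using SciPy's SLSQP as a generic solver on hypothetical data (the function name is ours). With $ξ = 0$ it reduces to the SCM problem; with a large $ξ$ all weight moves to the single nearest control, illustrating the one-to-one matching limit.

```python
import numpy as np
from scipy.optimize import minimize


def penalized_scm_weights(X1, X0, xi):
    """Problem (8) with V = I and f(γ_i) = γ_i ||X1 - X_i||², under constraints (1)-(2).
    X1: (k,), X0: (N0, k) with one row per control unit."""
    N0 = X0.shape[0]
    d = np.sum((X0 - X1) ** 2, axis=1)          # ||X1 - X_i||² for each control

    def obj(g):
        r = X1 - X0.T @ g
        return r @ r + xi * (g @ d)             # fit term + ξ Σ γ_i d_i

    def grad(g):
        return -2.0 * X0 @ (X1 - X0.T @ g) + xi * d

    res = minimize(obj, np.full(N0, 1.0 / N0), jac=grad, method="SLSQP",
                   bounds=[(0.0, 1.0)] * N0,
                   constraints=[{"type": "eq", "fun": lambda g: g.sum() - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 500})
    return res.x


# Hypothetical example: the treated unit sits inside a triangle of controls,
# nearest to the first control.
X0 = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X1 = np.array([0.5, 0.5])
g_scm = penalized_scm_weights(X1, X0, xi=0.0)      # ≈ [0.5, 0.25, 0.25]: exact fit
g_match = penalized_scm_weights(X1, X0, xi=100.0)  # ≈ [1, 0, 0]: nearest control only
```

Intermediate values of $ξ$ trade pre-treatment fit against how far the contributing controls are from the treated unit.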

3. The augmented synthetic control method

As noted in Section 2.4, disparities between the predictor values of the treated unit ( $X_{1}$ ) and those of its synthetic control ( $X_{0}^{'} γ^{scm}$ ) can lead to biased estimates of the treatment effect because of confounding by these imbalanced predictors. For instance, when evaluating the impact of Proposition 99 on California’s cigarette sales, a substantial disparity in pre-policy GDP per capita between California and its synthetic control could introduce bias, since pre-policy GDP per capita might independently influence post-policy cigarette sales. For example, individuals with higher incomes may be less sensitive to changes in cigarette prices and continue purchasing cigarettes despite cost increases, confounding the estimate of the treatment effect. If the synthetic control fails to approximate California’s pre-policy GDP per capita, differences in post-policy cigarette sales may be due to the imbalance in GDP per capita rather than to the policy itself. ASCM estimates the bias due to poor pre-treatment fit via an outcome regression model and then attempts to correct the SCM estimate for that bias.

This section summarizes ASCM developed by Ben-Michael et al.¹³ Continue to consider the case where only the first unit ( $i = 1$ ) receives the treatment at time $T_{0} < T$ , and for simplicity, further assume only one post-treatment period, $T = T_{0} + 1$ . In Sections 3.1 through 3.3, only the pre-treatment values of the outcome variable are included in $X_{1}$ and $X_{0}$ . Incorporating $Z_{1}$ and $Z_{0}$ is briefly discussed in Section 3.4.

3.1. Overview

In ASCM, the underlying data generating process is assumed to have the form given in (6). As described in Section 2.2, common forms of $m_{i t}$ include linear factor models, where for $t = 1, \dots, T$ , $Y_{i t} (0)$ is linear in a set of latent factors, and autoregressive models, where the post-treatment outcome $Y_{i T} (0)$ is linear in its lagged values. Letting ${\hat{m}}_{i T}$ be an estimator of $m_{i T}$ , the ASCM estimator of $Y_{1 T} (0)$ is:

$${\hat{Y}}_{1 T}^{aug} (0) = \sum γ_{i}^{scm} Y_{i T} + {\hat{m}}_{1 T} - \sum γ_{i}^{scm} {\hat{m}}_{i T} \tag{9}$$

Notice that (9) is the original SCM estimator with an added term, ${\hat{m}}_{1 T} - \sum γ_{i}^{scm} {\hat{m}}_{i T}$ , which corrects for the model-based estimate of bias. When the estimated bias is small, the ASCM estimator will approximate the SCM estimator.
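A sketch of estimator (9) with a ridge regression outcome model, the choice Ben-Michael et al. recommend, assuming $\hat{m}$ is fit on the control units only by regressing $Y_{iT}$ on their pre-treatment outcomes with an unpenalized intercept and ridge penalty `kappa` (our notation, to avoid a clash with $λ_t$ in (7)). The function names, the equal stand-in weights, and the noiseless linear data are hypothetical, chosen so the correction can be checked by hand; this is not the authors' implementation.

```python
import numpy as np


def ridge_outcome_model(X0, Y0_T, kappa):
    """Fit m̂(x) = η̂0 + x'η̂ by ridge regression of the controls' Y_iT on their
    pre-treatment outcomes X_i (intercept not penalized); return m̂ as a function."""
    xbar, ybar = X0.mean(axis=0), Y0_T.mean()
    Xc = X0 - xbar                                   # centre so the intercept is unpenalized
    eta = np.linalg.solve(Xc.T @ Xc + kappa * np.eye(X0.shape[1]), Xc.T @ (Y0_T - ybar))
    eta0 = ybar - xbar @ eta
    return lambda x: eta0 + x @ eta


def ascm_counterfactual(X1, X0, Y0_T, gamma_scm, kappa=1.0):
    """Estimator (9): Σ γ_i Y_iT + m̂_1T - Σ γ_i m̂_iT. Returns (ASCM, SCM, correction)."""
    m_hat = ridge_outcome_model(X0, Y0_T, kappa)
    scm_part = gamma_scm @ Y0_T
    correction = m_hat(X1) - gamma_scm @ m_hat(X0)
    return scm_part + correction, scm_part, correction


# Hypothetical data: noiseless linear outcome, treated unit outside the controls' range.
rng = np.random.default_rng(2)
N0, T0 = 8, 3
X0 = rng.normal(size=(N0, T0))
beta = np.array([1.0, 0.5, 2.0])
Y0_T = 1.0 + X0 @ beta
X1 = X0.max(axis=0) + 1.0
gamma = np.full(N0, 1.0 / N0)                        # stand-in for SCM weights

y_aug, y_scm, bias = ascm_counterfactual(X1, X0, Y0_T, gamma, kappa=1e-8)
# With perfect pre-treatment fit the estimated bias is zero and ASCM equals SCM.
_, _, bias_fit = ascm_counterfactual(X0.T @ gamma, X0, Y0_T, gamma, kappa=1.0)
```

Because the intercept $\hat{η}_0$ cancels when the weights sum to one, the correction equals $(X_1 - X_0^{'} γ^{scm})^{'} \hat{η}$: it vanishes under perfect pre-treatment fit, and with these noiseless linear outcomes `y_aug` recovers $1 + X_1^{'} β$ while `y_scm` does not.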

Correcting the SCM estimator for the model-based estimate of bias is similar to bias correction for inexact matching.^33,41 Inexact matching occurs when a treated unit is paired with control units that have similar characteristics (e.g. pre-treatment outcomes), but differences between the treated and control units still exist. When comparing the treated unit to these controls, the estimated treatment effect might be biased due to the imperfect matching. Bias correction techniques adjust the estimated treatment effect by accounting for differences that remain between a treated unit and its matched controls. In (9), the SCM estimate of the treatment effect is adjusted based on the discrepancy between the outcome model estimate for the treated unit and the synthetic control.

In augmenting the SCM estimator, the choice of the estimator ${\hat{m}}_{i T}$ has different implications.¹³ For example, if ${\hat{m}}_{i T}$ is linear in pre-treatment outcomes, that is, ${\hat{m}}_{i T} = {\hat{η}}_{0} + X_{i}^{'} \hat{η}$, then pre-treatment periods that are more predictive of the post-treatment outcome receive larger regression coefficients, and hence imbalances in those periods lead to larger adjustments to the SCM estimate in the augmented estimator (9). If ${\hat{m}}_{i T}$ is linear in the control units' post-treatment outcomes, that is, ${\hat{m}}_{i T} = \sum_{j} {\hat{α}}_{j} (X_{i}) Y_{j T}$, where ${\hat{α}}_{j}$ is some weighting function, then (9) can be re-expressed as a weighting estimator in which the SCM weight for unit $i$ is adjusted based on imbalance in the unit-specific transformation ${\hat{α}}_{i}$ of the lagged outcomes. The ridge regression estimator of $m_{i T}$ described below is linear in both the pre-treatment outcomes and the control units' post-treatment outcomes.

Ben-Michael et al.¹³ considered the performance of a range of estimators of $m_{i T}$, focusing on estimators that are functions of pre-treatment outcomes, ${\hat{m}}_{i T} \equiv \hat{m} (X_{i})$, based on: a unit fixed effect model; a penalized linear model, such as LASSO, ridge, or elastic net; a machine learning model such as random forest; Bayesian structural time series estimation; or matrix completion. They performed simulations using these outcome estimators under several data generating processes, including linear factor models, an autoregressive model, and a unit and time fixed effect model, and recommended augmenting SCM with a penalized regression model because of its relative simplicity and consistently good performance across data generating processes. In particular, they recommended ridge regression as the outcome model estimator (further explained in Section 3.2 below), as it directly controls pre-treatment fit while also minimizing extrapolation due to the outcome model.

3.2. Ridge ASCM

A special case of the class of ASCM estimators is the Ridge ASCM estimator, which estimates $m_{i t}$ by a ridge-regularized linear model, that is, $\hat{m} (X_{i}) = {\hat{η}}_{0}^{ridge} + X_{i}^{'} {\hat{η}}^{ridge}$ ,¹³ where ${\hat{η}}_{0}^{ridge}$ and ${\hat{η}}^{ridge}$ solve

$$ \underset{η_{0}, η}{\arg\min} \; \frac{1}{2} \sum \big(Y_{i} - (η_{0} + X_{i}^{'} η)\big)^{2} + λ^{ridge} \| η \|_{2}^{2} $$

(10)

Substituting this form of ${\hat{m}}_{i t}$ into (9) yields the Ridge ASCM estimator

$$ {\hat{Y}}_{1 T}^{aug} (0) = \sum γ_{i}^{scm} Y_{i T} + \Big(X_{1} - \sum γ_{i}^{scm} X_{i}\Big)^{'} {\hat{η}}^{ridge} $$

(11)
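The step from (9) to (11) can be written out explicitly. Substituting $\hat{m} (X_{i}) = {\hat{η}}_{0}^{ridge} + X_{i}^{'} {\hat{η}}^{ridge}$ into the bias-correction term, and using the fact that the SCM weights lie in the simplex $△^{N_{0}}$ and therefore sum to one, the intercept cancels:

```latex
\begin{aligned}
{\hat{m}}_{1 T} - \sum γ_{i}^{scm} {\hat{m}}_{i T}
  &= {\hat{η}}_{0}^{ridge} + X_{1}^{'} {\hat{η}}^{ridge}
     - \sum γ_{i}^{scm} \big( {\hat{η}}_{0}^{ridge} + X_{i}^{'} {\hat{η}}^{ridge} \big) \\
  &= {\hat{η}}_{0}^{ridge} \Big( 1 - \sum γ_{i}^{scm} \Big)
     + \Big( X_{1} - \sum γ_{i}^{scm} X_{i} \Big)^{'} {\hat{η}}^{ridge} \\
  &= \Big( X_{1} - \sum γ_{i}^{scm} X_{i} \Big)^{'} {\hat{η}}^{ridge}
\end{aligned}
```

Adding this term to $\sum γ_{i}^{scm} Y_{i T}$ gives (11).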

The Ridge ASCM estimator can equivalently be expressed as

$$ {\hat{Y}}_{1 T}^{aug} (0) = \sum γ_{i}^{aug} Y_{i T} $$

(12)

where

$$ γ_{i}^{aug} = γ_{i}^{scm} + (X_{1} - X_{0}^{'} γ^{scm})^{'} (X_{0}^{'} X_{0} + λ^{ridge} I_{T_{0}})^{-1} X_{i} $$

(13)

with $I_{T_{0}}$ the $T_{0} \times T_{0}$ identity matrix. In this form, the Ridge ASCM weights $γ_{i}^{aug}$ are the solution to

$$ \min_{γ} \; \frac{1}{2 λ^{ridge}} \| X_{1} - X_{0}^{'} γ \|_{2}^{2} + \frac{1}{2} \| γ - γ^{scm} \|_{2}^{2} $$

(14)

subject to constraint (2) on $γ$. Note that (14) is a special case of (8) with $f (γ_{i}) = (γ_{i} - γ_{i}^{scm})^{2}$. Also, notice that $γ_{i}^{aug}$ may be negative. When SCM weights do not achieve perfect pre-treatment fit, relaxing the non-negativity restriction allows the Ridge ASCM estimator to extrapolate beyond the convex hull of the control units, attaining pre-treatment fit that is always at least as good as SCM's (i.e. $\| X_{1} - X_{0}^{'} γ^{aug} \|_{2}^{2} \leq \| X_{1} - X_{0}^{'} γ^{scm} \|_{2}^{2}$).¹³
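The sketch below computes Ridge ASCM weights directly from (13) in plain Python, using a small hypothetical Gauss-Jordan helper `solve` for the $T_{0} \times T_{0}$ linear system; the SCM weights are assumed to be given. One assumption goes beyond the text: the pre-treatment outcomes are centred at the control-unit mean before applying (13), mirroring the unpenalized intercept in (10). Under that centring the adjustments sum to zero, so the augmented weights still sum to one whenever the SCM weights do. The helper `sq_imbalance` can be used to check the fit inequality stated above.

```python
def solve(a, b):
    """Solve a x = b for a square, nonsingular a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sq_imbalance(x1, x0, gamma):
    """Squared pre-treatment imbalance ||X_1 - X_0' gamma||_2^2."""
    return sum((x1[t] - sum(g * row[t] for g, row in zip(gamma, x0))) ** 2
               for t in range(len(x1)))


def ridge_ascm_weights(x1, x0, gamma_scm, lam):
    """Ridge ASCM weights via equation (13).

    x1        -- treated unit's pre-treatment outcomes (length T0)
    x0        -- control units' pre-treatment outcomes (N0 rows of length T0)
    gamma_scm -- SCM weights (assumed already fitted and summing to one)
    lam       -- ridge hyperparameter lambda^ridge > 0
    """
    n0, t0 = len(x0), len(x1)
    # Assumption of this sketch: centre every period at the control mean.
    mean = [sum(row[t] for row in x0) / n0 for t in range(t0)]
    x0c = [[row[t] - mean[t] for t in range(t0)] for row in x0]
    x1c = [x1[t] - mean[t] for t in range(t0)]
    # SCM imbalance X_1 - X_0' gamma_scm (unchanged by centring, since sum(gamma) = 1).
    imbalance = [x1c[t] - sum(g * row[t] for g, row in zip(gamma_scm, x0c))
                 for t in range(t0)]
    # T0 x T0 matrix X_0' X_0 + lambda I.
    gram = [[sum(row[s] * row[t] for row in x0c) + (lam if s == t else 0.0)
             for t in range(t0)] for s in range(t0)]
    v = solve(gram, imbalance)  # (X_0'X_0 + lambda I)^{-1} (X_1 - X_0' gamma_scm)
    # gamma_i^aug = gamma_i^scm + v' X_i (the matrix is symmetric).
    return [g + sum(v[t] * row[t] for t in range(t0))
            for g, row in zip(gamma_scm, x0c)]
```

As $λ^{ridge}$ grows the correction shrinks toward zero and the weights approach the SCM weights; smaller values allow more extrapolation (and possibly negative weights) in exchange for better pre-treatment fit.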

The hyperparameter $λ^{ridge}$ controls the amount of extrapolation and the improvement in pre-treatment fit over SCM. Cross-validation can be used to select a value for $λ^{ridge}$. Specifically, let ${\hat{Y}}_{1 t}^{(- t)} (0) = \sum γ_{i (- t)}^{aug} Y_{i t}$ be the estimator of $Y_{1 t} (0)$ based on (12) with time period $t$ excluded. Then the leave-one-out cross-validation mean squared error (MSE) over pre-treatment time periods, $C V (λ^{ridge}) = T_{0}^{- 1} \sum_{t = 1}^{T_{0}} (Y_{1 t} - {\hat{Y}}_{1 t}^{(- t)} (0))^{2}$, can be computed, and $λ^{ridge}$ may be chosen to minimize $C V (λ^{ridge})$ or, more conservatively, set to the maximum value within one standard deviation of the minimal MSE.^13,42 Since negative weights may be hard to interpret, SCM weights may be preferred if the synthetic control closely fits the trajectory of the treated unit in the pre-treatment period.¹¹ When good pre-treatment fit is achieved, the amount of extrapolation, which is determined by $λ^{ridge}$ and the degree of imbalance, will be small, and $γ^{aug}$ will be similar to $γ^{scm}$, maintaining interpretability.
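The tuning rule just described can be sketched as follows, assuming the leave-one-out predictions ${\hat{Y}}_{1 t}^{(- t)} (0)$ and a standard deviation for each $C V (λ^{ridge})$ have already been computed over a grid of candidate values. The function names are hypothetical, and measuring "one standard deviation" at the minimizing value is an assumption of this sketch.

```python
def loo_cv_mse(y1_pre, loo_predictions):
    """CV(lambda^ridge): mean squared leave-one-out error over the T0 pre-treatment periods."""
    t0 = len(y1_pre)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y1_pre, loo_predictions)) / t0


def select_lambda(lambdas, cv_mse, cv_sd, conservative=True):
    """Choose lambda^ridge from cross-validation results on a grid.

    conservative=False: the lambda that minimises CV(lambda).
    conservative=True:  the largest lambda whose CV(lambda) lies within one
                        standard deviation (taken at the minimiser) of the minimum.
    """
    best = min(range(len(lambdas)), key=lambda k: cv_mse[k])
    if not conservative:
        return lambdas[best]
    threshold = cv_mse[best] + cv_sd[best]
    # A larger lambda means less extrapolation, i.e. weights closer to SCM's.
    return max(lam for lam, mse in zip(lambdas, cv_mse) if mse <= threshold)
```

With a grid `[0.1, 1.0, 10.0, 100.0]`, CV errors `[2.0, 1.5, 1.7, 3.0]` and a standard deviation of 0.3 throughout, the minimizing choice is 1.0, while the conservative rule moves to 10.0.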

3.3. Inference

In addition to using ASCM for estimation, confidence intervals for $τ$ can be constructed via so-called “conformal inference” as follows.³⁴ For simplicity, assume only one post-treatment period, that is, $T = T_{0} + 1$ , and consider testing the null hypothesis $H_{0} : τ = τ_{0}$ . Let ${\tilde{Y}}_{1 T} = Y_{1 T} - τ_{0}$ denote the counterfactual post-treatment outcome for the treated unit under $H_{0}$ . Then, define an extended version of $X_{1}$ and $X_{0}$ which includes data through $t = T$ where ${\tilde{Y}}_{1 T}$ is used as the outcome of the treated unit at $t = T$ . Calculate weights $γ_{i} (τ_{0})$ by applying (13) to the extended data set and compute the time $T$ residual, ${\tilde{Y}}_{1 T} - \sum γ_{i} (τ_{0}) Y_{i T}$ , where the second term is the Ridge ASCM estimator of $Y_{1 T} (0)$ based on the extended data set. A p-value is obtained by comparing the time $T$ residual to the residuals in the pre-treatment period,

$$ p (τ_{0}) = \frac{1}{T} \sum_{t = 1}^{T_{0}} I \Big\{ \big| {\tilde{Y}}_{1 T} - \sum γ_{i} (τ_{0}) Y_{i T} \big| \leq \big| Y_{1 t} - \sum γ_{i} (τ_{0}) Y_{i t} \big| \Big\} + \frac{1}{T} $$

(15)

Note that $p (τ_{0})$ is constrained to $[1 / T, 1]$ under this approach, and hence depends on the number of time periods. Finally, the test is repeated over all possible null hypothesis values $τ_{0}$, and a $1 - α$ confidence interval for $τ$ is given by $\{τ_{0} : p (τ_{0}) > α\}$. Since $τ_{0} = Y_{1 T} - {\tilde{Y}}_{1 T}$ and $Y_{1 T}$ is observed, this is equivalent to inverting the test over all possible values of ${\tilde{Y}}_{1 T}$ to obtain a conformal prediction interval for $Y_{1 T} (0)$. By construction, these intervals have exact finite-sample coverage when the residuals are exchangeable. In the absence of exchangeable residuals, approximate validity can be achieved (as $T_{0} \to \infty$) under certain assumptions.¹³
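A sketch of (15) and of the test inversion over a grid of $τ_{0}$ values follows. It assumes a hypothetical callback `residuals_under(tau0)` that refits the weights $γ_{i} (τ_{0})$ on the extended data set and returns the pre-treatment residuals and the time $T$ residual. Searching a finite grid, and reporting only the smallest and largest accepted values (which presumes the accepted set is an interval), are simplifications of this sketch.

```python
def conformal_p_value(pre_residuals, post_residual):
    """p(tau_0) from (15).

    pre_residuals -- Y_1t - sum_i gamma_i(tau_0) Y_it for t = 1, ..., T0
    post_residual -- Ytilde_1T - sum_i gamma_i(tau_0) Y_iT
    """
    T = len(pre_residuals) + 1  # T0 pre-treatment periods plus one post-treatment period
    count = sum(1 for r in pre_residuals if abs(post_residual) <= abs(r))
    return (count + 1) / T  # the +1/T term: the time-T residual compared with itself


def conformal_interval(residuals_under, tau_grid, alpha=0.05):
    """Keep every tau_0 on the grid with p(tau_0) > alpha; return (min, max) or None."""
    accepted = [tau0 for tau0 in tau_grid
                if conformal_p_value(*residuals_under(tau0)) > alpha]
    if not accepted:
        return None
    return min(accepted), max(accepted)
```

Because $p (τ_{0})$ can only take the values $1/T, 2/T, \dots, 1$, a short pre-treatment period limits how small $α$ can usefully be: with $T = 5$, for example, no $τ_{0}$ can be rejected at $α = 0.2$ or below.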

3.4. Incorporating covariates

As with SCM, ASCM can also incorporate covariates $Z_{i}$ . First, expanding (8), weights are determined as the solution to the constrained optimization problem

$$ \min_{γ \in \Delta^{N_{0}}} \; θ_{x} \| X_{1} - X_{0}^{'} γ \|_{2}^{2} + θ_{z} \| Z_{1} - Z_{0}^{'} γ \|_{2}^{2} + ξ \sum f (γ_{i}) $$

(16)

Then, an outcome model conditional on covariates and pre-treatment outcomes is fit such that $\hat{m} (X, Z) = {\hat{η}}_{0} + X^{'} {\hat{η}}_{x} + Z^{'} {\hat{η}}_{z}$ where, as in (10), parameters are estimated via

$$ \min_{η_{0}, η_{x}, η_{z}} \; \frac{1}{2} \sum \big(Y_{i} - (η_{0} + X_{i}^{'} η_{x} + Z_{i}^{'} η_{z})\big)^{2} + λ_{x} \| η_{x} \|_{2}^{2} + λ_{z} \| η_{z} \|_{2}^{2} $$

(17)

Setting $θ_{x} = θ_{z} = 1$ and $λ_{x} = λ_{z} = λ^{ridge}$ incorporates the pre-treatment outcomes $X$ and covariates $Z$ equally, which parallels the process described in Section 2.1, where the covariates are included in $X$ . Under this setup, the process proceeds as described in Section 3.2. Other choices of the tuning parameters $θ_{x}$ , $θ_{z}$ , $λ_{x}$ , and $λ_{z}$ may be considered depending on the data being analyzed. For example, when the dimension of the covariates is small relative to the number of units, Ben-Michael et al.¹³ suggested regularizing the pre-treatment outcome coefficients $η_{x}$ but not regularizing the coefficients $η_{z}$ (i.e. set $λ_{z} = 0$ ).
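To make (17) concrete, the sketch below solves it through its normal equations in plain Python. Because (17) carries a factor of 1/2 on the squared-error term, each penalty enters the normal equations as $2 λ$ (a rescaling of $λ$ that does not change the family of solutions). The intercept is left unpenalized, and passing `lam_z=0.0` gives the option of regularizing only $η_{x}$. The function name `fit_ridge_blocks` and the Gauss-Jordan helper `solve` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for a square, nonsingular a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge_blocks(x, z, y, lam_x, lam_z):
    """Minimise (17): 0.5 * sum(resid^2) + lam_x ||eta_x||^2 + lam_z ||eta_z||^2.

    x -- pre-treatment outcomes per control unit; z -- covariates per unit;
    y -- post-treatment outcomes. Returns (eta_0, eta_x, eta_z).
    """
    rows = [[1.0] + list(xi) + list(zi) for xi, zi in zip(x, z)]
    px, pz = len(x[0]), len(z[0])
    penalty = [0.0] + [lam_x] * px + [lam_z] * pz  # intercept is unpenalised
    p = 1 + px + pz
    # Zero gradient of (17): (F'F + 2 diag(penalty)) eta = F'y.
    lhs = [[sum(r[a] * r[b] for r in rows) + (2.0 * penalty[a] if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    rhs = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    eta = solve(lhs, rhs)
    return eta[0], eta[1:1 + px], eta[1 + px:]
```

With $λ_{x} = λ_{z}$, this coincides with a single ridge fit on the pre-treatment outcomes with the covariates appended, which is the equal-weighting case described above.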

4. Malaria elimination initiative in Southern Mozambique

In August 2015, a large-scale mass drug administration (MDA)-based malaria elimination initiative was implemented in the Magude district of the Maputo province in Southern Mozambique, a country with one of the five highest malaria burdens in the world.⁴³ The elimination initiative was implemented on top of existing malaria control measures, which included intermittent preventive treatment for pregnant women and long-lasting insecticide-treated nets (LLINs). Phase 1 of the elimination initiative (MDA-1) took place over approximately 2 years, from August 2015 through June 2017 (see Thomas et al.³⁰ Figure S1 for the intervention timeline).

To evaluate the short-term effects of the MDA initiative in Magude, quantified by the reduction in weekly malaria incidence rate (cases per 1000 individuals at risk) for residents under 5 years of age (0–4) and 5 years of age or older (5 $+$), the data set prepared by Thomas et al.³⁰ for a similar analysis is used. In addition to weekly malaria cases obtained from Mozambique's Boletim Epidemiológico Semanal (BES), this data set incorporates other predictors of malaria incidence, including coverage of LLINs, average weekly temperature, and weekly precipitation. The control units for Magude consisted of 17 other districts in Maputo province or the neighboring province of Gaza. These districts were chosen based on their similarity to Magude in terms of location and epidemiological characteristics (see Table 1); districts that also received an MDA initiative were excluded from the set of possible control units.³⁰ Pre-treatment outcome trajectories for Magude and all control districts for both age groups can be found in Supplemental Figures S1 and S2.

Table 1.

Covariate balance between Magude, synthetic Magude from SCM, synthetic Magude from ASCM, and the unweighted control average of all potential control districts before the MDA-based malaria elimination initiative was implemented in Magude in August 2015.

Variables	Magude	SCM synthetic Magude (ages 0–4)	ASCM synthetic Magude (ages 0–4)	SCM synthetic Magude (ages 5+)	ASCM synthetic Magude (ages 5+)	Unweighted control average
Precipitation (mm)	1.71	1.73	1.73	1.71	1.71	1.73
Temperature (°C)	24.52	24.47	24.48	24.41	24.41	24.52
Net coverage (%)	24.83	22.70	24.68	27.38	27.33	27.66

SCM: synthetic control method; ASCM: augmented synthetic control method; MDA: mass drug administration.

For this analysis, let $Y_{i t}$ be the weekly malaria incidence (measured separately for 0–4 and 5 $+$ ) for district $i$ at time $t$ , and $W_{i}$ be an indicator that district $i$ received the MDA at $T_{0} < T$ , where $T_{0}$ is August 2015, the start date of MDA-1 in Magude. Let $X_{i}$ contain the pre-treatment weekly malaria incidence for district $i$ , weekly precipitation, weekly temperature, and coverage of LLINs. SCM and Ridge ASCM are employed in parallel to estimate the weekly reduction in malaria cases in Magude as a result of the MDA initiative. The effect is first estimated on the full data set in Section 4.1 and then on a reduced data set that includes only a subset of seven of the control districts in Section 4.2 to highlight the advantages of Ridge ASCM in settings where good pre-treatment fit is not feasible.

4.1. Effect of the MDA initiative in Magude using all available controls

The effect of MDA-1 was first estimated using SCM. The suitability of the control districts comprising synthetic Magude was assessed by examining both the covariate balance (Table 1) and pre-treatment outcome trajectories (Supplemental Figures S3 and S4) between Magude and the districts that comprise the synthetic controls. The pre-treatment fit displayed in Figures 2 and 3 shows that the difference between Magude and its synthetic control for both age groups prior to the intervention is concentrated around zero (Figures 2C and 3C), indicating a reasonably good fit. The synthetic Magude found via SCM for both 0–4 and 5 $+$ closely fit the trend in malaria incidence for Magude over the pre-treatment period (October 2013–July 2015) with RMSPEs of 2.1 and 1.5 malaria cases per week, respectively (Figures 2 and 3).

Figure 2. — Malaria incidence per 1000 individuals in Magude versus synthetic Magude for ages 0–4 estimated via (A) SCM and (B) Ridge ASCM, and gap plots of Magude versus synthetic Magude for ages 0–4 estimated via (C) SCM and (D) Ridge ASCM. Vertical line indicates the MDA implementation. SCM: synthetic control method; ASCM: augmented synthetic control method; MDA: mass drug administration.

Figure 3. — Malaria incidence per 1000 individuals in Magude versus synthetic Magude for ages 5+ estimated via (A) SCM and (B) Ridge ASCM, and gap plots of Magude versus synthetic Magude for ages 5+ estimated via (C) SCM and (D) Ridge ASCM. Vertical line indicates the MDA implementation. SCM: synthetic control method; ASCM: augmented synthetic control method; MDA: mass drug administration.

Based on the difference in malaria incidence between Magude and its synthetic control in the post-treatment period, an estimated 78.4% of expected malaria cases in the 0–4 age group and 61.6% of expected malaria cases in the 5 $+$ age group were averted from August 2015 to April 2017 (75.3% in 0–4 and 56.6% in 5 $+$ during intervention year 1, 82.0% in 0–4 and 65.5% in 5 $+$ during intervention year 2; see Table 2).
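The percentages above express cases averted as a share of the cases expected under the synthetic control. One plausible way to carry out this arithmetic is sketched below, assuming weekly incidence per 1000 for Magude and its synthetic control and a weekly population at risk as inputs; the exact bookkeeping used in the analysis (for example, which population denominators were used) is not stated in this section, so the function `cases_averted` is hypothetical and illustrative only.

```python
def cases_averted(observed_rate, synthetic_rate, population):
    """Cases averted and percent reduction from weekly incidence per 1000.

    observed_rate  -- weekly incidence per 1000 in the treated district
    synthetic_rate -- weekly synthetic-control incidence per 1000 (counterfactual)
    population     -- weekly population at risk (hypothetical input)
    """
    # Convert rates per 1000 into case counts and total them over the weeks.
    expected = sum(s * n / 1000 for s, n in zip(synthetic_rate, population))
    observed = sum(o * n / 1000 for o, n in zip(observed_rate, population))
    averted = expected - observed
    return averted, 100 * averted / expected
```

For example, two weeks with an observed rate of 1.0, a synthetic rate of 4.0 and 10,000 people at risk give 80 expected cases, 60 averted, and a 75% reduction.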

Table 2.

Estimated treatment effect and number of cases averted in the 0–4 and 5+ age groups.

Period	Weekly reduction per 1000, ages 0–4 (SCM)	Weekly reduction per 1000, ages 0–4 (ASCM)	Weekly reduction per 1000, ages 5+ (SCM)	Weekly reduction per 1000, ages 5+ (ASCM)	Cases averted (% reduction), ages 0–4 (SCM)	Cases averted (% reduction), ages 0–4 (ASCM)	Cases averted (% reduction), ages 5+ (SCM)	Cases averted (% reduction), ages 5+ (ASCM)
Aug 2015–Jul 2016 (intervention year 1)	−2.98	−3.42	−0.81	−0.82	1396 (75.3%)	1601 (77.7%)	1894 (56.6%)	1925 (57.0%)
Aug 2016–Apr 2017 (intervention year 2)	−4.00	−4.79	−1.70	−1.74	1332 (82.0%)	1596 (84.5%)	2840 (65.5%)	2911 (66.0%)
Total (Aug 2015–Apr 2017)					2728 (78.4%)	3197 (81.0%)	4734 (61.6%)	4836 (62.1%)
Peak malaria season (Dec 2015–Apr 2016)					600 (83.7%)	703 (85.7%)	1048 (73.8%)	1065 (74.1%)
Peak malaria season (Dec 2016–Apr 2017)					916 (82.1%)	1124 (84.9%)	2362 (66.2%)	2421 (66.8%)

SCM: synthetic control method; ASCM: augmented synthetic control method.

During peak malaria seasons, an estimated 83.7% of expected cases in the 0–4 age group and 73.8% in the 5 $+$ age group were averted between December 2015 and April 2016, and 82.1% in the 0–4 age group and 66.2% in the 5 $+$ age group were averted between December 2016 and April 2017. Note that Mozambique experienced a dry season in year 1 due to El Niño and a very wet season with heavy rainfall in year 2 due to La Niña, resulting in higher overall malaria incidence in year 2.

The same analysis was then performed using Ridge ASCM. The quality of the control districts was examined as described above for the SCM analysis. The outcomes and covariates were incorporated equally, as described in Section 3.4, and the hyperparameter $λ^{ridge}$ was selected via cross-validation to minimize $C V (λ^{ridge})$. In the 5 $+$ age group, minimal extrapolation was needed when implementing Ridge ASCM to estimate the effect of the MDA on malaria incidence in Magude (Figure 3). The pre-treatment RMSPE and weights estimated via Ridge ASCM were nearly identical to those from SCM (Supplemental Figure S5), and both methods produced similar estimates of the reduction in malaria cases (Table 2). In the 0–4 age group, several control districts had substantially higher malaria incidence than Magude before the MDA intervention (Supplemental Figure S6), making it more challenging for SCM to achieve good pre-treatment fit. Augmenting SCM with ridge regression decreased the pre-treatment RMSPE by approximately 11%, but the Ridge ASCM weights were nearly identical to the SCM weights (Supplemental Figure S5), and both methods produced consistent estimates of the reduction in malaria cases (Table 2). The results from both age groups are consistent with Thomas et al.'s finding that the MDA intervention resulted in a large reduction in malaria incidence over the 2 years post-intervention. These results illustrate that Ridge ASCM typically produces results similar to those of SCM when good pre-treatment fit is achieved.

Several sensitivity analyses and diagnostics were conducted. First, baseline covariates were removed and the SCM and ASCM analyses were repeated using only pre-intervention outcomes as predictors. Removing covariates resulted in estimated effects similar to those found when including covariates. Second, the cross-validation MSE over a range of $λ^{ridge}$ values was examined, and the analyses were repeated setting $λ^{ridge}$ to the maximum value within one standard deviation of the minimal cross-validation MSE, resulting in slightly attenuated effect estimates of the intervention in the 0–4 age group. However, these results were consistent with those found when using the $λ^{ridge}$ value that minimized the cross-validation MSE. In the 5+ age group, setting $λ^{ridge}$ to the maximum value within one standard deviation of the minimal cross-validation MSE was equivalent to using the $λ^{ridge}$ value that minimized the cross-validation MSE, and thus the results remained the same. Third, the intervention was backdated to May 2015, before the main 2015 malaria season, to assess the no-anticipation assumption.³² In both age groups, the difference between Magude and synthetic Magude found via ASCM remained around zero between May 2015 and August 2015, indicating little to no anticipation effect. Finally, the outcome model used to augment SCM was varied and the estimated treatment effects using the different augmentations were compared. The majority of outcome models used to augment SCM resulted in effect estimates consistent with those found via Ridge ASCM. Detailed results from each of these analyses can be found in the Supplemental Material.

4.2. Effect of the MDA initiative in Magude using a subset of available controls

While SCM is not recommended when pre-treatment fit is poor, in certain situations no available controls may closely match the treated unit in the pre-treatment period. For example, focusing on the 0–4 age group, suppose that data were available only for the seven control districts from the analysis in Section 4.1 that had higher malaria incidence than Magude before the MDA intervention (Supplemental Figure S6). As expected from the covariate balance (Table 3), the pre-treatment outcome trajectories (Supplemental Figure S7), and the differences in those trajectories (Figure 4C) between Magude and the districts that comprise the synthetic control, SCM was not able to create a weighted average of these seven control units whose pre-treatment outcomes matched those of Magude. The pre-treatment RMSPE of Magude compared to synthetic Magude was 5.6 for ages 0–4 (compared with 2.1 when all 17 controls were available). Augmenting SCM with ridge regression achieved a better pre-treatment fit, with an RMSPE of 2.8, roughly half of SCM's pre-treatment RMSPE (Figure 4). The change in weights was also larger than when using all 17 controls: the root mean square difference between SCM and Ridge ASCM weights was 0.2 in the limited control set (Supplemental Figure S8).
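The fit summaries quoted above can be computed as in the following sketch, which uses hypothetical function names. RMSPE is taken as the root mean squared gap between the treated unit and its synthetic control over the pre-treatment weeks, and the weight comparison as the root mean square difference across control units. Applied to the rounded RMSPEs reported above, `percent_reduction(5.6, 2.8)` gives a relative reduction of about 50%.

```python
import math


def rmspe(y_treated_pre, y_synthetic_pre):
    """Root mean squared prediction error over the pre-treatment periods."""
    n = len(y_treated_pre)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_treated_pre, y_synthetic_pre)) / n)


def rms_weight_difference(gamma_a, gamma_b):
    """Root mean square difference between two weight vectors over the control units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gamma_a, gamma_b)) / len(gamma_a))


def percent_reduction(before, after):
    """Relative reduction, in percent, from `before` to `after`."""
    return 100 * (before - after) / before
```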

Table 3.

Covariate balance between Magude, synthetic Magude from SCM, and synthetic Magude from ASCM in the 0–4 age group when using a subset of seven control districts, and the unweighted control average of all potential control districts before the MDA-based malaria elimination initiative was implemented in Magude in August 2015.

Variables	Magude	SCM synthetic Magude	ASCM synthetic Magude	Unweighted control average
Precipitation (mm)	1.71	1.72	1.73	1.74
Temperature (°C)	24.52	24.62	24.61	24.67
Net coverage (%)	24.83	27.73	23.98	23.53

SCM: synthetic control method; ASCM: augmented synthetic control method; MDA: mass drug administration.

Figure 4. — Malaria incidence per 1000 individuals in Magude versus synthetic Magude for ages 0–4 estimated via (A) SCM and (B) Ridge ASCM, and gap plots of Magude versus synthetic Magude for ages 0–4 estimated via (C) SCM and (D) Ridge ASCM, using a subset of seven control regions. Vertical line indicates the MDA implementation. SCM: synthetic control method; ASCM: augmented synthetic control method; MDA: mass drug administration.

Additionally, ASCM produced estimates of malaria reduction closer to those obtained when applying SCM or ASCM to the full set of control units, which includes units more similar to Magude in the pre-treatment period. SCM applied to the subset of controls estimates that 90.0% of expected malaria cases in Magude were averted from August 2015 to April 2017 (89.8% in intervention year 1, 90.2% in intervention year 2; Table 4). ASCM estimates that 80.7% of expected malaria cases in Magude were averted over the same period (83.0% in intervention year 1, 75.6% in intervention year 2; Table 4). In the absence of control units that closely match Magude's pre-treatment trend in malaria incidence, ASCM is preferred for this analysis, as evidenced by its improved pre-treatment fit and by results more consistent with those obtained using similar control units.

Table 4.

Estimated treatment effect and number of cases averted in the 0–4 age group when using a subset of seven control regions.

Period	Weekly reduction in malaria incidence per 1000 (SCM)	Weekly reduction in malaria incidence per 1000 (ASCM)	Cases averted, % reduction (SCM)	Cases averted, % reduction (ASCM)
Aug 2015–Jul 2016 (intervention year 1)	−8.66	−4.77	4055 (89.8%)	2234 (83.0%)
Aug 2016–Apr 2017 (intervention year 2)	−8.12	−2.72	2706 (90.2%)	907 (75.6%)
Total (Aug 2015–Apr 2017)			6762 (90.0%)	3141 (80.7%)
Peak malaria season (Dec 2015–Apr 2016)			1557 (93.0%)	914 (88.7%)
Peak malaria season (Dec 2016–Apr 2017)			1615 (89.0%)	448 (69.1%)

SCM: synthetic control method; ASCM: augmented synthetic control method.

Sensitivity analyses and diagnostics were performed as in Section 4.1 for the ASCM results. No major differences were found in the results when removing baseline covariates. Setting $λ^{ridge}$ to the maximum value within one standard deviation of the minimal cross-validation MSE resulted in slightly increased estimates of number of cases averted, but results were consistent with those found under the $λ^{ridge}$ value that minimized the cross-validation MSE. When backdating the intervention to May 2015, the estimated effect prior to the true date was centered around zero indicating little to no anticipation effect. When varying the outcome model under the reduced set of controls, estimated effect sizes were more varied than under the full set of controls; the Ridge ASCM estimate was similar to the average of all estimates for intervention year 1 and was smaller than the majority of estimates for intervention year 2. Detailed results from each of these analyses can be found in the Supplemental Material.

5. Discussion

SCM and variations such as ASCM estimate the effect of a treatment on a single unit by creating a synthetic control unit to approximate the counterfactual outcome that would have been observed had the treated unit not been treated. Synthetic controls created through SCM allow for simple interpretations of the counterfactual, as the contribution of each control unit is clear. However, the reliability of the synthetic control depends in part on its outcome trajectory matching the treated unit's outcome trajectory in the pre-treatment period. In recent years, much attention has been focused on adapting SCM to situations where finding a group of controls that closely fit the trajectory of the treated unit in the pre-treatment period is not feasible. One recent adaptation of SCM, Ridge ASCM, uses ridge regression as its outcome model, estimates the bias due to imperfect pre-treatment fit, and then de-biases the original SCM estimate. Ridge ASCM admits negative weights, using extrapolation to improve pre-treatment fit, but controls the level of extrapolation by penalizing the distance from the SCM weights. This augmentation of SCM always achieves pre-treatment fit at least as good as that of the original SCM.

There are several possible areas of future methodological research related to (augmented) synthetic controls. First, developing formal inferential methods for this setting is an active area of current research with many open problems.^{31,32,34,37,44,45} While placebo tests are often used for inference in SCM applications, the statistical power of such tests can be limited due to small sample sizes. Other proposed inferential methods rely on approximations as $T_{0} \to \infty$ and thus may not perform well in settings where data is only available for a limited number of pre-treatment periods.

ASCM could be modified to account for different data structures, such as hierarchical data, discrete or count outcomes, multiple outcomes for the same units, or interference between units.¹³ Some of these structures have been considered in the SCM setting. For example, Robbins et al.⁴⁶ proposed a synthetic control method to assess the effect of treatment across multiple outcomes; future research could consider an ASCM-type extension of this method. SCM-type methods have also recently been developed which permit interference.^47–50 The no interference assumption may be questionable in settings where, for example, the units are geographical regions in close proximity and it is plausible the treatment may have a spillover effect to regions near the treated region. Future research could entail developing extensions of ASCM which allow for interference.

Finally, in terms of application, while SCM and ASCM have been applied to evaluate population-level health interventions surrounding the COVID-19 pandemic, ASCM has potential for broader applicability to other areas of health and biomedical science, such as single-case experimental designs (i.e. n-of-1 trials), and their observational counterparts (i.e. “esametry”),⁵¹ or personalized medicine. For example, ASCM can be used to estimate the effect of a public health policy implemented within a particular state, the effect of a behavioral intervention on an individual’s health, or the effect of an experimental drug given to a single person.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802231224638 - Supplemental material for The augmented synthetic control method in public health and biomedical research

Supplemental material, sj-pdf-1-smm-10.1177_09622802231224638 for The augmented synthetic control method in public health and biomedical research by Taylor Krajewski and Michael Hudgens in Statistical Methods in Medical Research

sj-zip-2-smm-10.1177_09622802231224638 - Supplemental material for The augmented synthetic control method in public health and biomedical research

Supplemental material, sj-zip-2-smm-10.1177_09622802231224638 for The augmented synthetic control method in public health and biomedical research by Taylor Krajewski and Michael Hudgens in Statistical Methods in Medical Research

Acknowledgements

The authors would like to thank Ranjeeta Thomas and Laia Cirera for their collaboration on applying synthetic controls to the malaria elimination initiative in southern Mozambique. We also thank the Ministry of Health (MOH) and National Malaria Control Program (NMCP) in Mozambique for providing us access to the BES data. Helpful comments from four anonymous reviewers that improved the quality of this paper are also gratefully acknowledged.

^1.

Replication of Abadie et al.² analysis was performed in R and resulted in slightly different weights than the original analysis, which was performed in Stata.

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by NIH grant R01 AI085073 and by NIEHS grant T32 ES007018.

Data Availability Statement: The California Proposition 99 data analyzed in Section 2.5 are available at https://web.stanford.edu/jhain/synthpage.html. Data sets used in Section 4.3 to analyze the malaria elimination initiative in Magude were compiled using publicly available data from the National Malaria Control Programme (NMCP) of Mozambique. Requests for access and use of the data should be made directly to the NMCP. Since the NMCP data is not publicly accessible, a synthetic data set is available in the Supplemental Material along with code to analyze the synthetic data using both SCM and ASCM and corresponding results.

ORCID iD: Taylor Krajewski https://orcid.org/0000-0001-7410-8370

Supplemental material: Supplemental material for this article is available online.

References

1. National Academy of Medicine. Caring for the individual patient: Understanding heterogeneous treatment effects, 2019.
2. Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program. J Am Stat Assoc 2010; 105: 493–505.
3. Li Y, Undurraga EA, Zubizarreta JR. Effectiveness of localized lockdowns in the COVID-19 pandemic. Am J Epidemiol 2022; 191: 812–824.
4. Cunningham S, Shah M. Decriminalizing indoor prostitution: implications for sexual violence and public health. Rev Econ Stud 2017; 85: 1683–1715.
5. Friedman L, Furberg C, DeMets D. et al. Fundamentals of Clinical Trials. 5th ed. Cham, Switzerland: Springer International Publishing. ISBN 978-3-319-18538-5. DOI:10.1007/978-3-319-18539-2.
6. Bernal JL, Cummins S, Gasparrini A. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol 2017; 46: 348–355.
7. Bernal JL, Andrews N, Amirthalingam G. The use of quasi-experimental designs for vaccine evaluation. Clin Infect Dis 2019; 68: 1769–1776.
8. Shadish WR, Cook TD, Campbell D. Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin, 2002.
9. Abadie A. Semiparametric difference-in-differences estimators. Rev Econ Stud 2005; 72: 1–19.
10. Abadie A, Gardeazabal J. The economic costs of conflict: a case study of the Basque country. Am Econ Rev 2003; 93: 113–132.
11. Abadie A, Diamond A, Hainmueller J. Comparative politics and the synthetic control method. Am J Pol Sci 2015; 59: 495–510.
12. Athey S, Imbens GW. The state of applied econometrics: causality and policy evaluation. J Econ Perspect 2017; 31: 3–32.
13. Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc 2021; 116: 1–27.
14. Pieters H, Curzi D, Olper A. et al. Effect of democratic reforms on child mortality: a synthetic control analysis. Lancet Glob Health 2016; 4: e627–e632.
15. Geloso V, Pavlik JB. The Cuban revolution and infant mortality: a synthetic control approach. Explor Econ Hist 2021; 80: 101376.
16. Marinello S, Leider J, Pugach O. et al. The impact of the Philadelphia beverage tax on employment: a synthetic control analysis. Econ Hum Biol 2021; 40: 100939.
17. Dai J, Liu Z, Li R. Improving the subway attraction for the post-COVID-19 era: the role of fare-free public transport policy. Transp Policy (Oxf) 2021; 103: 21–30.
18. Mourtgos SM, Adams IT, Nix J. Elevated police turnover following the summer of George Floyd protests: a synthetic control study. Criminol Public Policy 2021; 21: 9–33.
19. Heersink B, Peterson BD, Jenkins JA. Disasters and elections: estimating the net effect of damage and relief in historical perspective. Polit Anal 2017; 25: 260–268.
20. Donohue JJ, Aneja A, Weber KD. Right-to-carry laws and violent crime: a comprehensive assessment using panel data and a state-level synthetic control analysis. J Empir Leg Stud 2019; 16: 198–247.
21. Gutierrez IA, Weinberger G, Engberg J. Improving Teaching Effectiveness: Impact on Student Outcomes: The Intensive Partnerships for Effective Teaching Through 2013–2014. Santa Monica, CA: RAND Corporation, 2016.
22. Bifulco R, Rubenstein R, Sohn H. Using synthetic controls to evaluate the effect of unique interventions: the case of say yes to education. Eval Rev 2017; 41: 593–619.
23. Bouttell J, Craig P, Lewsey J. et al. Synthetic control methodology as a tool for evaluating population-level health interventions. J Epidemiol Commun Health 2018; 72: 673–678.
24. Cole MA, Elliott RJR, Liu B. The impact of the Wuhan Covid-19 lockdown on air pollution and health: a machine learning and augmented synthetic control approach. Environ Resour Econ (Dordr) 2020; 76: 1–28.
25. Born B, Dietrich AM, Müller GJ. The lockdown effect: a counterfactual for Sweden. PLoS ONE 2021; 16: 1–13.
26.Barber A, West J. Conditional cash lotteries increase Covid-19 vaccination rates. J Health Econ 2022; 81: 102578. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Brehm M, Brehm P, Saavedra MH. The Ohio vaccine lottery and starting vaccination rates. Am J Health Econ 2022; 8: 387–411. [Google Scholar]
28.Mills MC, Rüttenauer T. The effect of mandatory Covid-19 certificates on vaccine uptake: synthetic-control modelling of six countries. Lancet Public Health 2022; 7: e15–e22. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Xin M, Shalaby A, Feng S. et al. Impacts of Covid-19 on urban rail transit ridership using the synthetic control method. Transp Policy (Oxf) 2021; 111: 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Thomas R, Cirera L, Brew J. et al. The short-term impact of a malaria elimination initiative in southern Mozambique: application of the synthetic control method to routine surveillance data. Health Econ 2021; 30: 2168–2184. [DOI] [PubMed] [Google Scholar]
31.Bonander C, Humphreys DK, Esposti MD. Synthetic control methods for the evaluation of single-unit interventions in epidemiology: a tutorial. Am J Epidemiol 2021; 190: 2700–2711. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Abadie A. Using synthetic controls: feasibility, data requirements, and methodological aspects. J Econ Lit 2021; 59: 391–425. [Google Scholar]
33.Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974; 66: 688–701. [Google Scholar]
34.Chernozhukov V, Wüthrich K, Zhu Y. An exact and robust conformal inference method for counterfactual and synthetic controls. J Am Stat Assoc 2021; 116: 1849–1864. [Google Scholar]
35.Bai J. Panel data models with interactive fixed effects. Econometrica 2009; 77: 1229–1279. [Google Scholar]
36.Firpo S, Possebom V. Synthetic control method: inference, sensitivity analysis and confidence sets. J Causal Inference 2018; 6: 20160026. [Google Scholar]
37.Li KT. Statistical inference for average treatment effects estimated by synthetic control methods. J Am Stat Assoc 2020; 115: 2068–2083. [Google Scholar]
38.Mcclelland R, Mucciolo L. An update on the synthetic control method as a tool to understand state policy. Tax Policy Center 2022. [Google Scholar]
39.Kellogg M, Mogstad M, Pouliot GA. et al. Combining matching and synthetic control to tradeoff biases from extrapolation and interpolation. J Am Stat Assoc 2021; 116: 1804–1816. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Abadie A, L’Hour J. A penalized synthetic control estimator for disaggregated data. J Am Stat Assoc 2021; 116: 1817–1834. [Google Scholar]
41.Abadie A, Diamond A, Hainmueller J. Synth: an R package for synthetic control methods in comparative case studies. J Stat Softw 2011; 42: 1–17. [Google Scholar]
42.Hastie T, Friedman J, Tisbshirani R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (2nd ed.) New York, NY: Springer, 2009. [Google Scholar]
43.World Health Organization. World malaria report, 2021.
44.Cattaneo MD, Feng Y, Titiunik R. Prediction intervals for synthetic control methods. J Am Stat Assoc 2021; 116: 1865–1880. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Shaikh AM, Toulis P. Randomization tests in observational studies with staggered adoption of treatment. J Am Stat Assoc 2021; 116: 1835–1848. [Google Scholar]
46.Robbins MW, Saunders J, Kilmer B. A framework for synthetic control methods with high-dimensional, micro-level data: evaluating a neighborhood-specific crime intervention. J Am Stat Assoc 2017; 112: 109–126. [Google Scholar]
47.Cao J, Dowd C. Estimation and Inference for Synthetic Control Methods with Spillover Effects, 2019. arXiv:1902.07343.
48.Di Stefano R, Mellace G. The inclusive synthetic control method. Working Papers 21/20, Sapienza University of Rome, DISS.. 2020. [Google Scholar]
49.Grossi G, Lattarulo P, Mariani M, et al. Direct and spillover effects of a new tramway line on the commercial vitality of peripheral streets. A synthetic-control approach, 2022. arXiv:1902.07343.
50.Menchetti F, Bojinov I. Estimating the effectiveness of permanent price reductions for competing products using multivariate Bayesian structural time series models. Ann Appl Stat 2022; 16: 414–435. [Google Scholar]
51.Daza EJ, Schneider L. Model-twin randomization (motr): A Monte Carlo method for estimating the within-individual average treatment effect using wearable sensors, 2023. arXiv.2208.00739.

Associated Data


Supplementary Materials

sj-pdf-1-smm-10.1177_09622802231224638 - Supplemental material for The augmented synthetic control method in public health and biomedical research

sj-pdf-1-smm-10.1177_09622802231224638.pdf (5.6 MB, PDF)

sj-zip-2-smm-10.1177_09622802231224638 - Supplemental material for The augmented synthetic control method in public health and biomedical research

sj-zip-2-smm-10.1177_09622802231224638.zip (139.5 KB, ZIP)

