Abstract
Many research questions in public health and medicine concern sustained interventions in populations defined by substantive priorities. Existing methods to answer such questions typically require a measured covariate set sufficient to control confounding, a requirement that can be questionable in observational studies. Difference-in-differences relies instead on the parallel trends assumption, allowing for some types of time-invariant unmeasured confounding. However, most existing difference-in-differences implementations are limited to point treatments in restricted subpopulations. We derive identification results for population effects of sustained treatments under parallel trends assumptions. In particular, in settings where all individuals begin follow-up with exposure status consistent with the treatment plan of interest but may deviate at later times, a version of Robins’ g-formula identifies the intervention-specific mean under SUTVA, positivity, and parallel trends. We develop consistent asymptotically normal estimators based on inverse-probability weighting and outcome regression, as well as a double robust estimator based on targeted maximum likelihood. Simulation studies confirm theoretical results and support the use of the proposed estimators at realistic sample sizes. As an example, the methods are used to estimate the effect of a hypothetical federal stay-at-home order on all-cause mortality during the COVID-19 pandemic in spring 2020 in the United States.
Keywords: causal inference, difference-in-differences, g-formula, observational study, unmeasured confounding
1. Introduction
Many epidemiologic and other empirical studies concern the effects of sustained treatment strategies on population average outcomes over time. A sustained treatment or intervention is one that sets values of a time-varying exposure via a predetermined plan or algorithm. For example, clinical studies are often concerned with optimal dosing plans for therapeutic drugs, and policy investigations are often concerned with policies that determine exposure distributions repeatedly over time for the population residing in a jurisdiction.
Existing approaches to estimating effects of sustained interventions in well-defined populations include g-computation (Robins, 1986), inverse probability of treatment weighted (IPTW) marginal structural models (Robins, 2000; Robins et al., 2000), g-estimation of structural nested models (Robins, 1989), and double robust methods such as augmented IPTW (Bang and Robins, 2005) and targeted maximum likelihood (van der Laan and Gruber, 2012). Importantly, these approaches base causal identification on a sequential version of exchangeability (Robins, 1986), also known as sequential ignorability or no unmeasured confounders (Robins, 2000). Sequential exchangeability posits that the potential outcomes are independent of treatment, given the history of some set of measured (possibly time-varying) covariates and treatment; this assumption is unverifiable and can be implausible in many settings. For example, individuals may select medical treatments based on unmeasured risk factors, and public policies are decided in highly complex political contexts that may influence health.
In contrast, difference-in-differences (DID) methods typically base identification on parallel trends assumptions rather than sequential exchangeability (Ashenfelter and Card, 1985; Roth et al., 2022). Parallel trends assumptions posit that time trends in average potential outcomes are independent of the observed treatment (Ashenfelter and Card, 1985; Marcus and Sant’Anna, 2021). DID methods typically focus on the average treatment effect on the treated for a treatment occurring at a single time point, although recent extensions have considered certain types of sustained treatment regimes (for a review, see Roth et al., 2022). In particular, Callaway and Sant’Anna (2021) and de Chaisemartin and D’Haultfoeuille (2020, 2021b) consider effects conditional on each observed treatment path in a monotonic (i.e., staggered adoption) setting, meaning that values of the observed time-varying treatment can either increase or decrease over time, but not both. Somewhat more generally, de Chaisemartin and D’Haultfoeuille (2021a) consider interventions fixing a (possibly non-monotonic) binary or ordinal exposure to its baseline status, focusing on unconditional cost-effectiveness ratios and outcome regression estimators. Relevant to the present work, these recent DID developments have included doubly robust estimators (Callaway and Sant’Anna, 2021; Sant’Anna and Zhao, 2020).
This paper considers a more general setting and different estimands than previous DID implementations, focusing on marginal effects of general sustained treatment strategies under parallel trends assumptions. The remainder of this paper is organized as follows. Section 2 introduces notation and the assumed data structure. Section 3 presents new identification results for an intervention-specific mean under parallel trends, where identifying formulas are modifications of Robins’ (1986) g-computation algorithm formula (g-formula). Section 4 then presents consistent and asymptotically normal (CAN) estimators based on inverse probability weighting, outcome regression, and a double robust estimator that combines both. Section 5 presents simulation results that support the identification result and theoretical large sample properties of the proposed estimators. Section 6 presents an example estimating the number of lives that would have been saved by a U.S. federal stay-at-home order during the COVID-19 pandemic in spring 2020. Section 7 considers sensitivity analysis for violations of parallel trends and application of the proposed approach to dynamic treatment regimens. Section 8 concludes.
2. Preliminaries
2.1. Data
Suppose data are observed on individuals (or units) at time points , where: are (possibly vector-valued) covariates; are discrete, possibly multivariate treatments realized after ; and are outcomes realized after , all measured without error. Denote history of a variable with overbars, e.g., , with and for by convention. Upper case is used throughout to refer to random variables, lower case refers to specific realizations, and scripts refer to the support. The subscript is omitted unless needed to resolve ambiguity. Throughout, it is assumed that represent independent and identically distributed (iid) draws from a relevant target population.
Assume the data come from a staggered discontinuation design, defined as follows. Suppose the target estimand is where denotes the treatment strategy or intervention plan of interest, and denotes a potential outcome; i.e., the value would take under the intervention setting . The approach in this paper requires that in the observed data distribution, , or in other words that for all . Such a scenario is said to be a staggered discontinuation design with respect to the treatment plan of interest, because units begin follow-up under the treatment plan but may discontinue at later points in a staggered way. Note that we do not require monotonic treatment assignment, in contrast to recent DID papers (e.g., Goodman-Bacon, 2021; Callaway and Sant’Anna, 2021).
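To make the design requirement concrete, the following minimal sketch checks whether a set of observed treatment paths forms a staggered discontinuation design with respect to a plan, and reports when each unit first deviates. The function names and the list-based encoding of treatment paths are illustrative assumptions, not part of the accompanying R package.

```python
def first_deviation(path, plan):
    """Return the first time index at which `path` departs from `plan`,
    or None if the unit follows the plan throughout follow-up."""
    for t, (observed, planned) in enumerate(zip(path, plan)):
        if observed != planned:
            return t
    return None


def is_staggered_discontinuation(paths, plan):
    """True if every unit's treatment at time 0 agrees with the plan."""
    return all(path[0] == plan[0] for path in paths)


# Hypothetical example: three units, plan "always treated".
plan = [1, 1, 1]
paths = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]

design_ok = is_staggered_discontinuation(paths, plan)
deviation_times = [first_deviation(p, plan) for p in paths]
print(design_ok, deviation_times)
```

The second unit returns to treatment after deviating, which is allowed here because monotonic treatment assignment is not required; only agreement with the plan at the start of follow-up matters for the design.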
2.2. Motivating example
Consider the question, “what effects on all-cause mortality would a U.S. federal stay-at-home order have had in spring 2020 during the COVID-19 pandemic?” Let be a binary indicator that individual died during week , , measured as weeks since April 6, 2020. Let be a binary indicator that the state in which individual was living during week was under a state-level stay-at-home or shelter-in-place order. Suppose it is of interest to estimate , the difference in U.S. mortality rates under a hypothetical federal stay-at-home order vs. under the observed treatment trajectory (i.e., the “natural course”). As of April 6, 43/50 U.S. states were under stay-at-home orders, which were discontinued at times ranging from late April to late June, with the exception of California which continued through December (Figure 1). Thus, the observed treatment trajectories give rise to a staggered discontinuation design with respect to the treatment plan setting everyone to remain under stay-at-home order in those 43 states. The methods developed below can be used to draw inference about what would have happened had such a policy been implemented.
Figure 1:
Dates of state-issued stay-at-home orders in U.S. states during the COVID-19 pandemic in 2020
3. Identification
In this section we consider identification, given data , of the quantity , the mean outcome at time , under the intervention to set all individuals to . Throughout, it is assumed that interest lies in only one intervention . Note that depends on , which is left implicit for notational simplicity. Consider the following assumptions:
Assumption 1.
(Stable Unit Treatment Value Assumption [SUTVA]): If , then for .
In words, Assumption 1 requires that an individual’s potential outcomes are not affected by other individuals’ treatments (i.e., no interference), allowing us to index potential outcomes by each individual’s treatments alone. Assumption 1 also requires that the treatment is defined precisely enough so that, for individuals whose observed treatment equals , observed outcomes can stand in for counterfactual outcomes under a hypothetical intervention . Implicit in Assumption 1 is that the future cannot affect the past, or for . Thus, SUTVA would be violated, for example, if individuals were able to anticipate future treatments and change their behavior. This particular violation has typically been addressed under a separate “no anticipation” assumption in the DID literature (e.g., Callaway and Sant’Anna, 2021). SUTVA is an unverifiable assumption in observational studies, though it is sometimes possible to test for the presence of interference (Halloran and Hudgens, 2016).
Assumption 2.
(Positivity): If , then , for .
Here and throughout, refers to a conditional density if is continuous, and a conditional probability mass function if is discrete. Assumption 2 requires that units whose treatment history up to time is consistent with the regime in question have positive probability of remaining under treatment plan at time . Positivity can sometimes be a verifiable assumption. In particular, if both and are low-dimensional and discrete, then among units with , if one observes units who remain under at time in every stratum of , this implies positivity in the population with probability 1 (but not necessarily the reverse).
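When covariates and treatments are low-dimensional and discrete, the empirical check described above can be sketched as follows. The helper name and the (stratum, on-plan indicator) encoding are assumptions made for illustration; the input is restricted to units whose treatment history through the previous time point is consistent with the plan.

```python
from collections import defaultdict


def positivity_violations(records):
    """records: iterable of (stratum, on_plan_now) pairs for units whose
    treatment history through the previous time point matches the plan.
    Returns the sorted list of covariate strata in which no unit remains
    on plan at the current time point."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_units, n_on_plan]
    for stratum, on_plan in records:
        counts[stratum][0] += 1
        counts[stratum][1] += int(bool(on_plan))
    return sorted(s for s, (_, k) in counts.items() if k == 0)


# Hypothetical data: two covariate strata.
records = [("low", 1), ("low", 0), ("high", 0), ("high", 0)]
empty_strata = positivity_violations(records)
print(empty_strata)
```

An empty result is evidence consistent with positivity; a non-empty result flags strata where positivity cannot be confirmed from the sample, which need not mean it fails in the population.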
Assumption 3.
(Parallel trends): For :
In words, Assumption 3 states that, among individuals whose treatment status is consistent with the intervention up to time , had (counter to fact) all individuals followed the intervention through time , trends would have been parallel for those who do and do not follow the intervention at time but have equal covariate histories. This assumption is very similar to those adopted in papers describing event study and staggered adoption DID designs such as Goodman-Bacon (2021) and Callaway and Sant’Anna (2021), and nearly identical to Assumption 12 in de Chaisemartin and D’Haultfoeuille (2021a). The latter differs by considering only certain types of treatment regimes and presuming a linear relation between covariates and counterfactual trends. Parallel trends is unverifiable, though closely related conditions can often be checked (Roth, 2019). Note that may include prior outcomes for . However, if is included in , then the parallel trends assumption is equivalent to sequential exchangeability, in which case existing causal inference methods for observational data with a longitudinal exposure can be used (e.g., Robins, 1986, 2000; van der Laan and Gruber, 2012).
The following Lemma presents the main identification results in this paper, which show that Assumptions 1-3 are sufficient to equate to a function of the observed data distribution.
Lemma 1.
(Parallel trends g-formula) Define the functional (i.e., statistical parameter) . Under a staggered discontinuation design and if Assumptions 1-3 hold, then .
Here and throughout, refers to a conditional cumulative distribution function. Lemma 1 states that the target causal quantity is identified by the parameter . The parameter is referred to as the parallel trends -formula because it represents a modification of the usual -formula (the dependence of on is also left implicit). A formal proof of Lemma 1 by induction is presented in Appendix A. Here we give a less formal explanation to build intuition. We have:
where the first equality follows by adding and subtracting constants, the second by iterated expectation, and the third by Assumption 3. Repeatedly applying iterated expectation and Assumption 3, we have:
where the second equality follows from Assumption 1 and the last equality from iterated expectation.
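As an illustration of the argument, consider the two-period special case. The notation below is an assumption made for this sketch: $W_t$, $A_t$, and $Y_t$ denote covariates, treatment, and outcome at times $t \in \{0, 1\}$; $\bar a^* = (a_0^*, a_1^*)$ is the plan; $Y_t(\bar a^*)$ is the potential outcome; and every unit has $A_0 = a_0^*$ by design. Assumption 3 is taken in the form $E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1, \bar A_1 = \bar a_1^*] = E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1, A_0 = a_0^*]$.

```latex
\begin{align*}
E[Y_1(\bar a^*)]
  &= E[Y_0(\bar a^*)] + E[Y_1(\bar a^*) - Y_0(\bar a^*)] \\
  &= E[Y_0(\bar a^*)]
     + E\big\{ E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1, A_0 = a_0^*] \big\} \\
  &= E[Y_0(\bar a^*)]
     + E\big\{ E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1, \bar A_1 = \bar a_1^*] \big\} \\
  &= E[Y_0]
     + E\big\{ E[Y_1 - Y_0 \mid \bar W_1, \bar A_1 = \bar a_1^*] \big\}.
\end{align*}
```

The first line adds and subtracts $E[Y_0(\bar a^*)]$, the second uses iterated expectation (conditioning on $A_0 = a_0^*$ is vacuous under the design), the third uses the assumed form of Assumption 3, and the fourth uses Assumption 1 to replace potential outcomes with observed outcomes. The final expression involves only observed data: the baseline mean plus the covariate-standardized trend among units that remain on plan.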
4. Estimators
This section presents estimators for the statistical parameter , which equals the target quantity under the above stated assumptions. The estimators in this section utilize existing estimators of under sequential exchangeability rather than parallel trends, all of which are CAN estimators of the g-formula by virtue of being solutions to unbiased estimating equations (Stefanski and Boos, 2002). Since is a continuous function of several g-formulas, the same function applied to estimators of those g-formulas is a CAN estimator for . The remainder of this section formalizes this logic and gives examples of specific estimators that function in this capacity. The estimators presented in this section are provided in an R package (see Supporting Information).
4.1. General form
Here we derive a general form of a CAN estimator for the target statistical parameter, . First, define:
| (1) |
Equation (1) is the g-formula, developed in the context of identifying parameters like under sequential exchangeability. Here, sequential exchangeability is not assumed, and therefore is not interpretable as a causal parameter; instead existing estimators of the statistical parameter are used to assemble estimators of (which equals the causal parameter under Assumptions 1-3) by noting that . Next, suppose there is an estimator of which is the solution to an unbiased estimating function ; i.e., . Let . Then simply define
| (2) |
Clearly, , indicating that an estimator that jointly solves will yield a CAN estimator for .
The following subsections show how different options for (including IPTW, g-computation, and targeted maximum likelihood estimation [TMLE]) can be constructed and stacked with (2) to form estimators of that inherit desirable properties (e.g., consistency, asymptotic normality, double robustness).
4.2. Inverse probability of treatment weighted (IPTW) estimator
Define stabilized inverse probability of treatment weights (IPTWs) as . Robins (2000) (Lemma 1.1) showed that , where is the unique function such that for all functions where the expectation exists. The equation defines a regression of on weighted by . In the context of sequential exchangeability, estimators based on this weighted regression formulation are called IPTW-marginal structural model estimators and given a causal interpretation (Robins et al., 1992; Robins, 2000). However, in the context of this paper, sequential exchangeability is not assumed and so the weighted regression equation does not have a causal interpretation on its own. Instead, the above result together with results from Section 4.1, implies that an IPTW estimator of can be formed by using a linear combination of IPTW marginal structural model estimators for .
For simplicity of presentation, assume that for and are known up to a finite dimensional parameter. That is, define and , and say we are willing to assume that is uniquely determined by the parametric model , and similarly that , where and are finite dimensional parameter vectors. Say we have an estimator that solves an unbiased estimating equation for . For example, and may consist of generalized linear models with parameters estimated by maximum likelihood. Specify as some appropriate functional form for the expected value of conditional on in the weighted data distribution, such as (i.e., leaving the model unrestricted when ). Then an estimator that solves is CAN for if , and are correctly specified. Finally, stack the score equations for , and , along with and equation (2) to yield an estimator for , say .
In other words, the IPTW estimator for the target parameter of interest is , where are estimators of each appropriate g-formula parameter based on an IPTW model. Note that under our assumption set, are not estimators of causal quantities in and of themselves, but simply functions of the observed data distribution that may be assembled appropriately to form the causal estimator . Clearly, solves an estimating equation that is unbiased if , and are all correctly specified, implying is CAN for under the same conditions. However, IPTW estimators are known to be inefficient and may similarly inherit this property. The following subsections present estimators that may improve on efficiency relative to IPTW.
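The following minimal sketch illustrates the weighting idea in a two-period setting with a single discrete covariate. It is a simplification under stated assumptions rather than the implementation in the R package: the function name and the `(w, a1, y0, y1)` row layout are hypothetical, the plan is "stay treated" (`a1 == 1`), treatment probabilities are estimated by stratum-specific proportions instead of a parametric model, and unstabilized weights are used with a normalized (weighted-mean) estimator.

```python
from collections import Counter


def iptw_parallel_trends(rows):
    """rows: (w, a1, y0, y1) tuples; every unit is on plan at time 0 and
    the plan keeps a1 == 1. Returns mean(Y0) plus the inverse-probability-
    weighted mean of Y1 - Y0 among units still on plan at time 1."""
    n = len(rows)
    n_w = Counter(w for w, _, _, _ in rows)
    n_w_on = Counter(w for w, a1, _, _ in rows if a1 == 1)
    mean_y0 = sum(y0 for _, _, y0, _ in rows) / n
    num = den = 0.0
    for w, a1, y0, y1 in rows:
        if a1 == 1:
            weight = n_w[w] / n_w_on[w]  # 1 / estimated P(A1 = 1 | W = w)
            num += weight * (y1 - y0)
            den += weight
    return mean_y0 + num / den


# Hypothetical toy data.
rows = [
    (0, 1, 1.0, 2.0), (0, 0, 1.0, 5.0),
    (1, 1, 2.0, 4.0), (1, 1, 0.0, 2.0), (1, 1, 1.0, 4.0), (1, 0, 1.0, 0.0),
]
iptw_estimate = iptw_parallel_trends(rows)
print(round(iptw_estimate, 4))
```

Outcomes of units that have left the plan (for example, the value 5.0 in the second row) contribute only through the baseline mean, never through the weighted trend.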
4.3. Iterated conditional expectation (ICE) estimator
Bang and Robins (2005) describe an estimator of based on the following iterated conditional expectation (ICE) representation:
| (3) |
which can equivalently be written as where, for and .
An estimator of can then be formulated based on this representation. For simplicity, say we are willing to assume that are known up to a finite dimensional parameter for . That is, assume , where for , are finite dimensional parameters. For example, may be a generalized linear model with parameters . Say we are in possession of an unbiased estimating function for . For example, if maximum likelihood is used, then is the vector of first derivatives of the model log-likelihood with respect to . Note that including as an argument to the estimating function makes explicit the nested nature of the iterated expectations being modeled. The ICE estimator of is then defined (Bang and Robins, 2005) as the solution to where
Then, simply stack with (2) to yield an estimator for . In other words, the ICE estimator of the target parameter is , where each is an estimator of the corresponding g-formula parameter based on ICE g-computation. Clearly, solves an unbiased estimating equation whenever all the iterated outcome models are correctly specified. Estimators of based on outcome regression generally have smaller asymptotic variance than IPTW estimators, and may inherit this property.
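A minimal two-period sketch of the outcome-regression approach is given below. It is an illustration under assumptions, not the package implementation: the function name and `(w, a1, y0, y1)` row layout are hypothetical, the plan is "stay treated" (`a1 == 1`), and the outcome models are saturated stratum means rather than generalized linear models. Two g-formula-type quantities are estimated by regressing each outcome on the covariate among on-plan units and averaging predictions over all units; they are then combined with the baseline mean.

```python
def ice_parallel_trends(rows):
    """rows: (w, a1, y0, y1) tuples; every unit is on plan at time 0 and
    the plan keeps a1 == 1. Combines plug-in g-formula estimates for Y1
    and Y0 with the observed baseline mean."""
    n = len(rows)

    def fitted(w, col):
        # Saturated outcome model: stratum mean among units still on plan.
        vals = [r[col] for r in rows if r[0] == w and r[1] == 1]
        return sum(vals) / len(vals)

    strata = {r[0] for r in rows}
    q1 = {w: fitted(w, 3) for w in strata}  # E[Y1 | W = w, A1 = 1]
    q0 = {w: fitted(w, 2) for w in strata}  # E[Y0 | W = w, A1 = 1]
    # Average predictions over the covariate distribution of ALL units.
    phi_y1 = sum(q1[r[0]] for r in rows) / n
    phi_y0 = sum(q0[r[0]] for r in rows) / n
    mean_y0 = sum(r[2] for r in rows) / n
    return mean_y0 + phi_y1 - phi_y0


# Hypothetical toy data.
rows = [
    (0, 1, 1.0, 2.0), (0, 0, 1.0, 5.0),
    (1, 1, 2.0, 4.0), (1, 1, 0.0, 2.0), (1, 1, 1.0, 4.0), (1, 0, 1.0, 0.0),
]
ice_estimate = ice_parallel_trends(rows)
print(round(ice_estimate, 4))
```

With saturated stratum means such as these, the plug-in value coincides numerically with a normalized inverse-probability-weighted mean that uses stratum-specific treatment proportions; the two approaches differ once parametric models are introduced.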
4.4. Doubly robust targeted maximum likelihood estimator (TMLE)
IPTW estimators are only guaranteed to be CAN if the treatment models are correctly specified, and ICE estimators are only guaranteed to be CAN if all the outcome models are correctly specified. Doubly robust estimators are CAN if either the outcome or treatment models are correct (but not necessarily both), which is an advantage because one is rarely certain that models are correctly specified.
Doubly robust estimators of generally consist of augmenting the ICE algorithm by including predicted values from the treatment models used to construct IPTWs in some way. Such estimators are called semiparametric efficient if they solve the estimating equation corresponding to the following efficient influence curve (van der Laan and Gruber, 2012; Tran et al., 2019):
| (4) |
with and defined as in previous sections. Many estimators correspond to this efficient influence curve, meaning they all have the smallest asymptotic variance of any regular asymptotically linear estimator in this class (Bang and Robins, 2005; van der Laan and Gruber, 2012). We present one such example of a targeted maximum likelihood estimator (TMLE) which may outperform others in finite samples (Tran et al., 2019).
First consider the TMLE of . For simplicity, assume that outcome models and treatment models are known up to a finite dimensional parameter. That is, assume and , where and are finite dimensional parameters, . Then proceed as follows:
1. For , estimate , for example using maximum likelihood. Denote estimators of as and corresponding estimators of as .
2. For , estimate , for example using maximum likelihood, denoting this estimator . Calculate for each unit and denote this estimator . Note that these are model predictions that implicitly depend on the data, and so vary across units .
3. Also for , update the initial fit by fitting a new model, defined as , where is an appropriate link function, is an intercept, and are conditional expectations under the updated model. Note, the response variable in this model is . The logit link is recommended to ensure the estimator respects bounds implied by the data (if is not bounded by (0,1), it will need to be appropriately transformed for the logit function to be defined) (van der Laan and Gruber, 2012). Estimators for the updated fit are found by maximizing an appropriate weighted likelihood with weights .
4. Repeat steps 2–3, estimating and for .
5. The TMLE for is then defined as .
Then, the TMLE for is defined as . Since solves the estimating equation corresponding to the efficient influence curve (4), it will be CAN for so long as either (i) the set of outcome models are correctly specified, or (ii) the set of treatment models are correctly specified, but it is not necessary that both be correct. Therefore, if one of these two conditions holds for all and , then will be CAN for . The double robustness property carries through to by virtue of the fact that the estimating equation in (2) is unbiased if the estimating equations for all the are unbiased, which is the case for under conditions (i) or (ii) above.
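The updating step in the procedure above (an intercept-only model with the initial predictions as an offset, fit by weighted likelihood with a logit link) can be sketched in isolation as follows. This is an illustrative assumption-laden sketch: the function names are hypothetical, outcomes are assumed to lie in [0, 1], initial predictions strictly in (0, 1), and the weighted score equation is solved by bisection rather than by a generalized linear model routine.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_intercept_update(y, q_init, weights, tol=1e-10):
    """Solve sum_i w_i * (y_i - expit(logit(q_i) + eps)) = 0 for eps.
    Returns (eps, updated predictions). Units off plan should carry
    weight 0 so that they do not contribute to the score."""
    offsets = [logit(q) for q in q_init]

    def score(eps):
        return sum(w * (yi - expit(o + eps))
                   for w, yi, o in zip(weights, y, offsets))

    lo, hi = -20.0, 20.0  # the score is decreasing in eps on this bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    eps = 0.5 * (lo + hi)
    return eps, [expit(o + eps) for o in offsets]


# Hypothetical inputs: outcomes, initial predictions, inverse-probability weights.
y = [1.0, 0.0, 1.0, 0.0, 1.0]
q_init = [0.2, 0.4, 0.6, 0.7, 0.9]
weights = [1.5, 2.0, 1.0, 0.0, 3.0]
eps, q_updated = tmle_intercept_update(y, q_init, weights)
weighted_residual = sum(w * (yi - q) for w, yi, q in zip(weights, y, q_updated))
print(abs(weighted_residual) < 1e-6)
```

After the update, the weighted residuals sum to (numerically) zero, which is the sense in which the updated fit solves the relevant component of the estimating equation; the updated predictions then feed the next regression in the sequence.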
5. Simulation study
A simulation study was conducted to evaluate the finite sample performance of the IPTW, ICE, and TMLE estimators described in Section 4 when Assumptions 1-3 hold and all models were correctly specified. The TMLE estimator was also evaluated under misspecification of either the treatment or outcome model. Code for the simulation is provided in an R package (see Supporting Information).
5.1. Data generating distribution
Data were generated from the distributions ; ; ; ; and ; with and if . The monotonic treatment assignment for is not necessary, but simplifies analysis. In all analyses, is treated as unmeasured, but all other variables are observed. Parallel trends hold in this setup, as one sufficient set of conditions for parallel trends (proven in Appendix B) is that the only unmeasured variables (here, ) are time-invariant, do not affect time-varying covariates, and enter the outcome model linearly with constant coefficient over time (here, is constant over ).
We simulated 1,000 datasets each for sample sizes , 10,000, and 100,000. All parameters were generated from a distribution, with the same values for each parameter used across all simulation runs. The target parameter was , the mean outcome at end of follow-up, had everyone remained untreated. The true difference compared to the natural course was .
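The role of a time-invariant unmeasured variable that enters the outcome linearly with a constant coefficient can be illustrated with a much simpler data generating process than the one used in this study. The sketch below is hypothetical in every detail (two periods, one binary covariate, plan "stay treated", coefficients chosen for convenience) and compares a parallel-trends plug-in estimate with a naive g-formula that conditions only on the measured covariate.

```python
import math
import random


def simulate(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)             # unmeasured, time-invariant
        w = 1 if rng.random() < 0.5 else 0  # measured covariate
        # Everyone starts on plan; staying on plan at time 1 depends on U.
        p_stay = 1.0 / (1.0 + math.exp(-(0.5 + u - w)))
        a1 = 1 if rng.random() < p_stay else 0
        # U has the same coefficient (1.0) in both outcome equations.
        y0 = u + w + rng.gauss(0.0, 1.0)
        y1 = u + 2.0 * w + 0.5 + 1.0 * a1 + rng.gauss(0.0, 1.0)
        rows.append((w, a1, y0, y1))
    return rows


def on_plan_mean(rows, w, value):
    vals = [value(r) for r in rows if r[0] == w and r[1] == 1]
    return sum(vals) / len(vals)


def estimates(rows):
    n = len(rows)
    pt = sum(r[2] for r in rows) / n  # start from the baseline mean of Y0
    naive = 0.0
    for w in (0, 1):
        share = sum(1 for r in rows if r[0] == w) / n
        pt += share * on_plan_mean(rows, w, lambda r: r[3] - r[2])
        naive += share * on_plan_mean(rows, w, lambda r: r[3])
    return pt, naive


TRUTH = 2.5  # E[Y1 under a1 = 1] = E[U] + 2 * E[W] + 0.5 + 1.0
pt, naive = estimates(simulate(20000, seed=1))
print(f"parallel trends: {pt:.3f}  naive: {naive:.3f}  truth: {TRUTH}")
```

In this setup the unmeasured variable cancels in the difference of outcomes, so the trend-based estimate should be close to the truth, whereas the naive estimate is pulled upward because units with larger values of the unmeasured variable are more likely to stay on plan.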
5.2. Estimator implementation
For each simulated dataset, the estimators , , and were calculated using correctly specified generalized linear regression models estimated using maximum likelihood. Additionally, was calculated with treatment models misspecified, with outcome models misspecified, and with treatment models, outcome models, or both misspecified. All misspecified models omitted the term for at each time .
5.3. Simulation results
Table 1 shows estimates of the bias, variance, and p-values from a Lilliefors test for normality, for each estimator of . The results suggest that all stated theoretical properties hold approximately in simulated data. First, when all models are correctly specified, all estimators appear approximately unbiased with decreasing variance as the sample size increases. When outcome models and treatment models are misspecified, ICE and IPTW estimators appear biased, respectively. TMLE appears consistent when either the treatment or outcome models are correctly specified, but not when both are misspecified, supporting the double robustness property. Lastly, all estimators appear normally distributed for all sample sizes considered, based on Lilliefors tests.
Table 1:
Simulation results
| estimator | n = 1,000: variance¹/n | n = 1,000: bias² | n = 1,000: p³ | n = 10,000: variance¹/n | n = 10,000: bias² | n = 10,000: p³ | n = 100,000: variance¹/n | n = 100,000: bias² | n = 100,000: p³ |
|---|---|---|---|---|---|---|---|---|---|
| ice_qfal | 4.3 | 1.15 | 0.68 | 4.6 | 1.10 | 0.81 | 4.5 | 1.16 | 0.76 |
| ice_true | 4.2 | −0.04 | 0.82 | 4.5 | −0.02 | 0.23 | 4.2 | 0.02 | 0.45 |
| iptw_gfal | 4.5 | 1.17 | 0.86 | 4.7 | 1.17 | 0.49 | 4.6 | 1.25 | 0.69 |
| iptw_true | 6.0 | −0.17 | 0.37 | 6.3 | −0.08 | 0.87 | 7.4 | −0.01 | 0.57 |
| tmle_bfal | 4.4 | 1.19 | 0.51 | 4.6 | 1.15 | 0.65 | 4.5 | 1.22 | 0.62 |
| tmle_gfal | 4.2 | −0.08 | 0.86 | 4.5 | −0.04 | 0.28 | 4.3 | 0.02 | 0.21 |
| tmle_qfal | 5.9 | −0.09 | 0.15 | 6.4 | −0.07 | 0.78 | 7.6 | −0.02 | 0.41 |
| tmle_true | 5.4 | −0.03 | 0.82 | 5.6 | 0.01 | 0.11 | 5.7 | 0.01 | 0.69 |
¹ Empirical variance of estimates over 1,000 simulated datasets.
² Multiplied by 100.
³ P-value for Lilliefors test against the null hypothesis of normality.
Abbreviations: ice=iterated conditional expectation, iptw=inverse probability of treatment weighted, tmle=targeted maximum likelihood, qfal=outcome models misspecified, gfal=treatment models misspecified, bfal=both sets of models misspecified, true=all models correctly specified.
6. COVID-19 application
6.1. Data
This section presents an analysis of the motivating example, introduced in Section 2.2. Code and data are provided in the R package didgformula (see Supporting Information). State-level weekly mortality data come from the Centers for Disease Control and Prevention’s National Death Index, and weekly counts of COVID-19 cases from the COVID-19 Data Repository at the Center for Systems Science and Engineering at Johns Hopkins University. Data on state-level stay-at-home orders come from the COVID-19 U.S. State Policy database. Though the outcome variable of interest is an individual-level indicator of death in week , this variable is not directly observed; instead the observed data represent counts of deaths occurring in each state. Let be a state index, and let be an indicator of mortality during week for the individual living in state , where denotes the population size in state , and million. The observed outcome variable is , the state-level weekly sum of individual-level mortality counts, along with population counts (drawn from the 2010 Census). The observed treatment variable is an indicator of state being under stay-at-home order in week . Finally, let be the change in confirmed COVID-19 cases reported per 100k population in the previous four weeks (i.e., the difference from week to ) in state . Thus, in this example, the parallel trends assumption is conditional on the local state of the pandemic, which may be plausible for pandemic-related policies (Callaway and Li, 2021).
6.2. Estimator implementation
6.2.1. IPTW
For the treatment models, the following parametric models pooled over were assumed:
where is a natural cubic spline basis with 3 degrees of freedom for time . The outcome model , , was specified, which allows the outcome to depend on the full exposure history. The parameters and were estimated using maximum likelihood, weighted by to account for differing population sizes across states. Then, were estimated by maximizing the state-level binomial likelihood weighted by inverse probability of treatment weights , where and denote maximum likelihood estimators of and . Then estimators , were calculated as , where and ; denote the weighted maximum likelihood estimators.
6.2.2. ICE
For ICE estimators, the following parametric outcome regression models pooled over were assumed:
for and , where again refers to a natural cubic spline basis with 3 degrees of freedom. Note that, due to the monotonic treatment pattern, the interaction between time and treatment allows the outcome to depend on the full exposure history. The parameters were estimated by maximizing a binomial quasilikelihood, with estimators denoted . To account for varying state population sizes, state contributions to the quasilikelihood were weighted by . Finally, ICE estimators were calculated as , where (as there are 43 states included in the analysis).
6.2.3. TMLE
For TMLE, the same treatment models as specified for IPTW were used, along with the same outcome models as specified for ICE. Specifically, when estimating , for the ICE step , the TMLE updating step was performed by maximizing another weighted quasibinomial likelihood with response variable with an intercept and offset , weighted by . Predictions from this model were then passed to the ICE step, and the process was repeated for . Finally, were calculated as , where .
6.2.4. Bootstrap standard errors and confidence intervals
Standard errors were estimated using a nonparametric bootstrap. Specifically, for bootstrap replicates , a resampled outcome variable was drawn from a multinomial distribution with trials and probabilities , where denotes the number of individuals who survived beyond in state . IPTW, ICE, and TMLE estimators , , were calculated on each replicate . Then, Wald 95% confidence intervals were computed using the standard deviation of bootstrap estimates.
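The final step, forming a Wald interval from the spread of bootstrap estimates, can be sketched as follows. The function name is hypothetical, and the sketch covers only the interval construction, not the resampling scheme itself.

```python
import statistics


def wald_ci(point_estimate, bootstrap_estimates, z=1.96):
    """Wald interval: point estimate +/- z times the sample standard
    deviation of the bootstrap replicate estimates."""
    se = statistics.stdev(bootstrap_estimates)
    return point_estimate - z * se, point_estimate + z * se


# Hypothetical replicate estimates.
lower, upper = wald_ci(10.0, [9.0, 10.0, 11.0])
print(lower, upper)
```

The interval is centered at the estimate from the original data rather than at the mean of the bootstrap replicates.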
6.3. Results
Figure 2 shows results in the form of estimated U.S. weekly mortality rates per 100,000 person weeks over the study period under the natural course (red) and under the hypothetical sustained treatment of setting for all , i.e., under a scenario where all 43 included states maintained stay-at-home orders through June 2020. The three estimators largely agree in their predictions that all-cause mortality rates would have been moderately lower throughout most of the study period, had stay-at-home orders remained in place. Translating the counterfactual mortality rate estimates to lives saved, if all causal and modeling assumptions hold, based on TMLE, stay-at-home orders remaining in place from April through June 2020 would have saved approximately 11,100 (95% CI: 6,800, 15,500) lives in those 43 states during the same time period. Results based on ICE were similar (point estimate: 11,300, 95% CI: 6,900, 15,600), whereas IPTW gave a smaller point estimate and somewhat wider CI (point estimate: 4,100, 95% CI: −500, 8,700).
Figure 2:
Estimated U.S. weekly mortality rates - observed (red) and estimated under hypothetical treatment setting all states to remain under stay-at-home order using IPTW (green), ICE (blue), and TMLE (purple). Note that TMLE and ICE estimates and 95% CIs are nearly identical.
7. Extensions
7.1. Violations of parallel trends
In some applications, the parallel trends assumption (Assumption 3) may be questionable, and investigators may be interested in how inferences are altered by plausible deviations from parallel trends. A sensitivity analysis can be conducted as follows. Let
where quantifies a deviation from parallel trends, which may depend on both the covariates and time . Then, consider the following statistical parameter:
If Assumptions 1 and 2 hold, then (the proof follows from results in Appendix A). If a particular value is assumed known for , then estimation can proceed by defining
and noting that . As with , the parameter is simply a special case of the usual g-formula, where the outcome variable is . Thus, the IPTW, ICE, and TMLE estimators can be used, replacing outcome variables with (but not for ). In practice, will typically not be known, and thus estimates may be computed over a range of plausible values of . Differences in trends between subgroups of units before discontinuation occurs may be helpful in determining plausible values of (Roth and Rambachan, 2019).
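To see why shifting the outcome suffices, consider the two-period special case under an assumed sign convention (the symbols and the direction of the difference below are assumptions of this sketch). Let $\delta_1(\bar w_1) = E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1 = \bar w_1, \bar A_1 = \bar a_1^*] - E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1 = \bar w_1, A_0 = a_0^*]$ denote the deviation from parallel trends, with every unit having $A_0 = a_0^*$ by design.

```latex
\begin{align*}
E[Y_1(\bar a^*)]
  &= E[Y_0]
     + E\big\{ E[Y_1(\bar a^*) - Y_0(\bar a^*) \mid \bar W_1, A_0 = a_0^*] \big\} \\
  &= E[Y_0]
     + E\big\{ E[Y_1 - Y_0 \mid \bar W_1, \bar A_1 = \bar a_1^*]
               - \delta_1(\bar W_1) \big\} \\
  &= E[Y_0]
     + E\big\{ E[\,(Y_1 - \delta_1(\bar W_1)) - Y_0 \mid \bar W_1, \bar A_1 = \bar a_1^*] \big\}.
\end{align*}
```

Under this convention, the observed-data formula is the same as under parallel trends but with the later outcome shifted by the assumed deviation, and setting $\delta_1 \equiv 0$ recovers the parallel trends case.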
7.2. Dynamic regimes
In addition to the static regimes considered above, the proposed approach can accommodate regimes where treatment decisions may depend on the history of covariates and/or treatments. Let denote a dynamic regime, where returns the treatment value that would be assigned given covariate history . Note that may depend on treatment history as well, which we suppress for notational simplicity. Likewise let be a potential outcome under treatment regime . Suppose interest is in the estimand . Then, consider the following modifications to Assumptions 1-3.
Assumption 4.
(SUTVA for dynamic regimes): If , then for .
Assumption 5.
(Positivity for dynamic regimes): If , then , for ; .
Assumption 6.
(Parallel trends for dynamic regimes): For , :
Lemma 2.
(Parallel trends g-formula, dynamic regimes) Define the functional (i.e., statistical parameter)
Under a staggered discontinuation design and if Assumptions 4-6 hold, then .
The proof of Lemma 2 follows from results in Appendix A. Thus, the IPTW, ICE, and TMLE estimators described can be used, with appropriately redefined.
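To make the notion of a dynamic regime concrete, the following minimal sketch encodes a rule that maps covariate history to a treatment value and finds the first time at which a unit's observed treatment departs from that rule. The rule itself (keep the order in place while the latest case rate is at least 50), the threshold, and the function names `regime` and `first_deviation` are hypothetical and do not come from the application in Section 6.

```python
def regime(t, covariate_history):
    """Hypothetical dynamic rule: treat (1) while the latest covariate value
    is at least 50, otherwise do not treat (0). `t` is unused here but kept
    so that time-dependent rules share the same signature."""
    return 1 if covariate_history[-1] >= 50.0 else 0


def first_deviation(covariates, treatments, rule):
    """Return the first time index at which observed treatment differs from
    the treatment the rule would assign, or None if the unit always follows it."""
    for t, a_t in enumerate(treatments):
        if a_t != rule(t, covariates[: t + 1]):
            return t
    return None


# Hypothetical unit-level histories (e.g., weekly case rates and order status).
case_rates = [80.0, 60.0, 40.0, 30.0]
print(first_deviation(case_rates, [1, 1, 0, 0], regime))  # follows the rule throughout
print(first_deviation(case_rates, [1, 0, 0, 0], regime))  # lifts the order too early
```

An indicator of this kind (whether a unit is still consistent with the regime at each time) is what regime-specific weighting or regression steps typically condition on.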
8. Discussion
This paper considers a new approach to identifying effects of sustained intervention strategies based on an assumption set that includes parallel trends. This assumption is popular in difference-in-differences because it allows for some degree of unmeasured confounding (Zeldow and Hatfield, 2021). Recently, parallel trends assumptions have been leveraged to target sustained treatment estimands, mainly considering certain types of treatment regimes (Callaway and Sant’Anna, 2021; de Chaisemartin and D’Haultfoeuille, 2020, 2021a,b). Relative to previous work, the main contribution of this paper is a framework for estimating marginal intervention-specific means for general treatment regimes (including dynamic regimes) under parallel trends. This is accomplished by building on IPTW, g-computation, and doubly robust TMLE developed in the context of sequential exchangeability, thus connecting disparate causal inference literatures from biostatistics (Robins, 1986, 2000; Bang and Robins, 2005; van der Laan and Gruber, 2012) and econometrics (Ashenfelter and Card, 1985; Callaway and Sant’Anna, 2021). Independently and concurrently with the present work, Shahn et al. (2022) developed g-estimation of structural nested models for general sustained treatment regimes under parallel trends, with results that imply identification of the intervention-specific means considered here. While it is possible (but complex) to estimate the latter quantity using the g-estimation approach of Shahn et al. (2022), the main strength of g-estimation is in exploring effect heterogeneity by time-varying covariates.
Regarding the example presented in Section 6, care should be taken when assuming parallel trends for pandemic-related outcomes without conditioning on pandemic state variables such as infection rates, as marginal parallel trends are incompatible with standard epidemic models (Callaway and Li, 2021). DID methods have previously been used to estimate effects of stay-at-home orders on the treated (e.g., Fowler et al., 2021). The methods in this paper allow for (i) a different target parameter that may correspond more directly to decisions facing policy makers and public health officials (Maldonado and Greenland, 2002), and (ii) adjustment for time-varying pandemic state variables likely affected by prior treatment, which DID methods have only recently begun to consider (Callaway and Li, 2021). That said, assessing the effects of stay-at-home orders is complex, and a comprehensive analysis would need to consider potential biases not factored into the present analysis; e.g., there is likely some interference (Haber et al., 2021). Thus, the application results are not meant to inform policy or scientific conclusions.
The approach presented here may have application in many other contexts. Many U.S. state-level policies have changed in such a way as to accommodate a staggered discontinuation design, including in domains other than pandemic mitigation. Outside of staggered discontinuation designs, the methods developed in this paper apply more generally in settings where baseline potential outcomes are identified. For example, the approach could be used to estimate per-protocol effects in a clinical trial of a time-varying treatment regime with non-adherence.
Several areas for future research remain. First, it will be important to explore efficiency for competing estimators in this framework. Notably, the TMLE presented here is only known to be semiparametric efficient for the nuisance parameters and not necessarily for the target parameter (van der Laan and Gruber, 2012). Second, though parallel trends may be considered more plausible than sequential exchangeability in some settings, strategies for formally evaluating the assumption using domain knowledge, e.g., using causal diagrams, are in their infancy (e.g., Ghanem et al., 2022). Finally, the focus of this paper was on settings where one treatment regime is of interest. Future research could consider extensions beyond a single regime, but caution should be exercised when assuming parallel trends for multiple regimes, which would imply certain restrictions on treatment effect heterogeneity that may not be plausible in some settings (Shahn et al., 2022).
Supplementary Material
Acknowledgements
The authors thank Dr. Whitney Robinson for helpful comments. This research was supported by the NIH grants T32-HD091058–02, T32-AI007001, R01 AI085073, and P2C-HD050924. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.
Appendix A Proof of Lemma 1
Here we provide a formal proof by induction of Lemma 1.
Proof. First, note that, by adding and subtracting constants,
and by SUTVA (Assumption 1), under a staggered discontinuation design because for all . Thus it remains to prove that
for . As our induction hypothesis, assume temporarily that the following holds for some such that :
| (A1) |
If (A1) is true, then, so long as , it follows that:
| (A2) |
where the first equality is by iterated expectation and the second by Assumption 3. In other words, if (A1) holds for some , (A2) proves that the same statement holds for . Next, note that, for all such that , by iterated expectation and Assumption 3:
which shows that (A1) holds for . Then, by (A2), (A1) holds for . Finally, for , by Assumption 1 we have:
□
Appendix B Proof of parallel trends in simulation data generating distribution
To prove that parallel trends holds under the simulation setup, we first introduce three additional assumptions which trivially hold in the simulation, and then show that they are sufficient to guarantee parallel trends.
Assumption 7.
(Additive equi-confounding) For all , ,
Assumption 8.
(Latent ignorability)
Assumption 9.
(Measured covariates conditionally independent of unmeasured ones)
Lemma 3.
(Implied parallel trends) Under Assumptions 7–9,
Proof of Lemma 3.
To show that parallel trends (Assumption 3) are implied by Assumptions 7–9, we show both that
| (B1) |
and
| (B2) |
To see (B1) note that, for any such that , by repeated applications of iterated expectation and Assumption 8 we have:
By Assumption 7:
By Assumption 9:
| (B3) |
The last equality holds because the inside terms do not depend on . To see (B2), repeated applications of iterated expectation and Assumption 8 again give:
Similarly, by Assumptions 7 and 9:
which is exactly equal to (B3), thus proving that parallel trends (Assumption 3) holds whenever Assumptions 7–9 hold. This ends the proof of Lemma 3. □
Thus, since Assumptions 7–9 hold in the simulation data-generating distribution, parallel trends also holds. In particular, Assumptions 8 and 9 are easy to verify in the simulation setup, and Assumption 7 holds because:
where is the constant (over ) linear model coefficient relating to in the simulation data generating distribution described in Section 5.1.
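The logic of this appendix can be checked numerically with a small simulation. The sketch below is not the data-generating distribution of Section 5.1; it is a hypothetical two-period example with a binary unmeasured confounder U whose additive effect on the outcome (`beta`) is constant over time, which is the additive equi-confounding idea behind Assumption 7. Treatment has no effect in this toy setup, so observed outcomes coincide with potential outcomes. All names and parameter values are assumptions made for illustration.

```python
import random


def simulate(n=20000, beta=2.0, seed=1):
    """Simulate (A, Y_0, Y_1) where U affects treatment and both outcomes,
    with the same additive coefficient `beta` at each time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0                 # unmeasured confounder
        a = 1 if rng.random() < 0.2 + 0.6 * u else 0       # treatment depends on U
        y0 = beta * u + rng.gauss(0, 1)                    # baseline outcome
        y1 = 1.0 + beta * u + rng.gauss(0, 1)              # common time trend of 1.0
        rows.append((a, y0, y1))
    return rows


def group_mean(rows, a, value):
    """Mean of value(row) among rows with treatment equal to a."""
    vals = [value(r) for r in rows if r[0] == a]
    return sum(vals) / len(vals)


rows = simulate()
# Outcome levels differ by treatment group (confounding by U) ...
level_gap = group_mean(rows, 1, lambda r: r[1]) - group_mean(rows, 0, lambda r: r[1])
# ... but outcome trends do not, because U's effect cancels in Y_1 - Y_0.
trend_gap = (group_mean(rows, 1, lambda r: r[2] - r[1])
             - group_mean(rows, 0, lambda r: r[2] - r[1]))
print("level gap:", round(level_gap, 2), "trend gap:", round(trend_gap, 2))
```

If the coefficient on U were allowed to change between the two periods, the trend gap would no longer be close to zero, which is how a violation of additive equi-confounding would show up in this toy example.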
Footnotes
Software
The R package didgformula, available at https://github.com/audreyrenson/didgformula, implements the estimators, simulation study (in vignette “simulation”), and example results (in vignette “example”) described in the paper.
Data availability statement
The data that support the findings in this paper are available as part of the R package didgformula, available at https://github.com/audreyrenson/didgformula, in the dataset called stayathome2020.
References
- Ashenfelter O. and Card D. (1985). Using the longitudinal structure of earnings to estimate the effect of training programs. The Review of Economics and Statistics 67, 648–660.
- Bang H. and Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973.
- Callaway B. and Li T. (2021). Policy evaluation during a pandemic. arXiv preprint arXiv:2105.06927 1–37.
- Callaway B. and Sant’Anna PH (2021). Difference-in-differences with multiple time periods. Journal of Econometrics 225, 200–230.
- de Chaisemartin C. and D’Haultfoeuille X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review 110, 2964–2996.
- de Chaisemartin C. and D’Haultfoeuille X. (2021a). Difference-in-differences estimators of intertemporal treatment effects. arXiv preprint arXiv:2007.04267 1–64.
- de Chaisemartin C. and D’Haultfoeuille X. (2021b). Two-way fixed effects regressions with several treatments. arXiv preprint arXiv:2012.10077 1–34.
- Fowler J, Hill S, Levin R. and Obradovich N. (2021). Stay-at-home orders associate with subsequent decreases in COVID-19 cases and fatalities in the United States. PLoS One 16, e0248849.
- Ghanem D, Sant’Anna P. and Wüthrich K. (2022). Selection and parallel trends. arXiv preprint arXiv:2203.09001.
- Goodman-Bacon A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics 225, 254–277.
- Haber NA, Clarke-Deelder E, Salomon JA, Feller A, and Stuart EA (2021). Impact evaluation of coronavirus disease 2019 policy: A guide to common design issues. American Journal of Epidemiology 190, 2474–2486.
- Halloran ME and Hudgens MG (2016). Dependent happenings: A recent methodological review. Current Epidemiology Reports 3, 297–305.
- Maldonado G. and Greenland S. (2002). Estimating causal effects. International Journal of Epidemiology 31, 422–429.
- Marcus M. and Sant’Anna PH (2021). The role of parallel trends in event study settings: An application to environmental economics. Journal of the Association of Environmental and Resource Economists 8, 235–275.
- Rambachan A. and Roth J. (2022). A more credible approach to parallel trends. Working Paper 1–47.
- Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393–1512.
- Robins JM (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. Health Service Research Methodology: A Focus on AIDS 113–159.
- Robins JM (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 95–133. Springer.
- Robins JM, Hernán MÁ, and Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560.
- Robins JM, Mark SD, and Newey WK (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 48, 479–495.
- Roth J. (2019). Pre-test with caution: Event-study estimates after testing for parallel trends. Working Paper 1–54.
- Roth J, Sant’Anna PH, Bilinski A, and Poe J. (2022). What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. arXiv preprint arXiv:2201.01194.
- Sant’Anna PH and Zhao J. (2022). Doubly robust difference-in-differences estimators. Journal of Econometrics 219, 101–122.
- Shahn Z, Dukes O, Richardson D, Tchetgen Tchetgen E, and Robins J. (2022). Structural nested mean models under parallel trends assumptions. arXiv preprint arXiv:2204.10291 1–42.
- Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician 56, 29–38.
- Tran L, Yiannoutsos C, Wools-Kaloustian K, Siika A, van der Laan M, and Petersen M. (2019). Double robust efficient estimators of longitudinal treatment effects: Comparative performance in simulations and a case study. International Journal of Biostatistics 15, 1–27.
- van der Laan MJ and Gruber S. (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. International Journal of Biostatistics 8, 1–39.
- Zeldow B. and Hatfield LA (2021). Confounding and regression adjustment in difference-in-differences studies. Health Services Research 56, 932–941.