Author manuscript; available in PMC: 2024 Feb 16.
Published in final edited form as: Biometrics. 2023 Apr 25;79(4):2998–3009. doi: 10.1111/biom.13862

Identifying and estimating effects of sustained interventions under parallel trends assumptions

Audrey Renson 1,2,*, Michael G Hudgens 3, Alexander P Keil 1, Paul N Zivich 1, Allison E Aiello 4
PMCID: PMC10539489  NIHMSID: NIHMS1902004  PMID: 36989497

Abstract

Many research questions in public health and medicine concern sustained interventions in populations defined by substantive priorities. Existing methods to answer such questions typically require a measured covariate set sufficient to control confounding, which can be questionable in observational studies. Differences-in-differences relies instead on the parallel trends assumption, allowing for some types of time-invariant unmeasured confounding. However, most existing difference-in-differences implementations are limited to point treatments in restricted subpopulations. We derive identification results for population effects of sustained treatments under parallel trends assumptions. In particular, in settings where all individuals begin follow-up with exposure status consistent with the treatment plan of interest but may deviate at later times, a version of Robins’ g-formula identifies the intervention-specific mean under SUTVA, positivity, and parallel trends. We develop consistent asymptotically normal estimators based on inverse-probability weighting, outcome regression, and a double robust estimator based on targeted maximum likelihood. Simulation studies confirm theoretical results and support the use of the proposed estimators at realistic sample sizes. As an example, the methods are used to estimate the effect of a hypothetical federal stay-at-home order on all-cause mortality during the COVID-19 pandemic in spring 2020 in the United States.

Keywords: causal inference, difference-in-differences, g-formula, observational study, unmeasured confounding

1. Introduction

Many epidemiologic and other empirical studies concern the effects of sustained treatment strategies on population average outcomes over time. A sustained treatment or intervention is one that sets values of a time-varying exposure via a predetermined plan or algorithm. For example, clinical studies are often concerned with optimal dosing plans for therapeutic drugs, and policy investigations are often concerned with policies that determine exposure distributions repeatedly over time for the population residing in a jurisdiction.

Existing approaches to estimating effects of sustained interventions in well-defined populations include g-computation (Robins, 1986), inverse probability of treatment weighted (IPTW) marginal structural models (Robins, 2000; Robins et al., 2000), g-estimation of structural nested models (Robins, 1989), and double robust methods such as augmented IPTW (Bang and Robins, 2005) and targeted maximum likelihood (van der Laan and Gruber, 2012). Importantly, these approaches base causal identification on a sequential version of exchangeability (Robins, 1986), also known as sequential ignorability or no unmeasured confounders (Robins, 2000). Sequential exchangeability posits that the potential outcomes are independent of treatment, given the history of some set of measured (possibly time-varying) covariates and treatment; this assumption is unverifiable and can be implausible in many settings. For example, individuals may select medical treatments based on unmeasured risk factors, and public policies are decided in highly complex political contexts that may influence health.

In contrast, difference-in-differences (DID) methods typically base identification on parallel trends assumptions rather than sequential exchangeability (Ashenfelter and Card, 1985; Roth et al., 2022). Parallel trends assumptions posit that time trends in average potential outcomes are independent of the observed treatment (Ashenfelter and Card, 1985; Marcus and Sant’Anna, 2021). DID methods typically focus on the average treatment effect in the treated for a treatment occurring at a single time point, although recently extensions have considered certain types of sustained treatment regimes (for a review, see Roth et al., 2022). In particular, Callaway and Sant’Anna (2021) and de Chaisemartin and D’Haultfoeuille (2020, 2021b) consider effects conditional on each observed treatment path in a monotonic (i.e., staggered adoption) setting, meaning that values of the observed time-varying treatment can either increase or decrease over time, but not both. Somewhat more generally, de Chaisemartin and D’Haultfoeuille (2021a) consider interventions fixing a (possibly non-monotonic) binary or ordinal exposure to its baseline status, focusing on unconditional cost-effectiveness ratios and outcome regression estimators. Relevant to the present work, these recent DID developments have included doubly robust estimators (Callaway and Sant’Anna, 2021; Sant’Anna and Zhao, 2020).

This paper considers a more general setting and different estimands than previous DID implementations, focusing on marginal effects of general sustained treatment strategies under parallel trends assumptions. The remainder of this paper is organized as follows. Section 2 introduces notation and the assumed data structure. Section 3 presents new identification results for an intervention-specific mean under parallel trends, where identifying formulas are modifications of Robins’ (1986) g-computation algorithm formula (g-formula). Section 4 then presents consistent and asymptotically normal (CAN) estimators based on inverse probability weighting, outcome regression, and a double robust estimator that combines both. Section 5 presents simulation results that support the identification result and theoretical large sample properties of the proposed estimators. Section 6 presents an example estimating the number of lives that would have been saved by a U.S. federal stay-at-home order during the COVID-19 pandemic in spring 2020. Section 7 considers sensitivity analysis for violations of parallel trends and application of the proposed approach to dynamic treatment regimens. Section 8 concludes.

2. Preliminaries

2.1. Data

Suppose data $O_{it}=\{W_{it},A_{it},Y_{it}\}$ are observed on $i=1,2,\ldots,n$ individuals (or units) at time points $t=0,1,\ldots,\tau$, where: $W_{it}$ are (possibly vector-valued) covariates; $A_{it}$ are discrete, possibly multivariate treatments realized after $W_{it}$; and $Y_{it}$ are outcomes realized after $A_{it}$, all measured without error. Denote history of a variable with overbars, e.g., $\bar{A}_{it}=(A_{i0},A_{i1},\ldots,A_{it})$, with $\bar{A}_i\equiv\bar{A}_{i\tau}$ and $A_{ik}=\{\emptyset\}$ for $k<0$ by convention. Upper case is used throughout to refer to random variables, lower case refers to specific realizations, and script letters refer to the support. The $i$ subscript is omitted unless needed to resolve ambiguity. Throughout, it is assumed that $\bar{O}_i\equiv\{\bar{W}_i,\bar{A}_i,\bar{Y}_i\}$ $(i=1,2,\ldots,n)$ represent independent and identically distributed (iid) draws from a relevant target population.

Assume the data come from a staggered discontinuation design, defined as follows. Suppose the target estimand is $E\{Y_t(\bar{a}^*)\}$ where $\bar{a}^*$ denotes the treatment strategy or intervention plan of interest, and $Y_t(\bar{a}^*)$ denotes a potential outcome; i.e., the value $Y_t$ would take under the intervention setting $\bar{A}_t=\bar{a}_t^*$. The approach in this paper requires that in the observed data distribution, $\Pr(A_0=a_0^*)=1$, or in other words that $A_{i0}=a_0^*$ for all $i$. Such a scenario is said to be a staggered discontinuation design with respect to the treatment plan $\bar{a}^*$ of interest, because units begin follow-up under the treatment plan but may discontinue at later points in a staggered way. Note that we do not require monotonic treatment assignment, in contrast to recent DID papers (e.g., Goodman-Bacon, 2021; Callaway and Sant’Anna, 2021).

2.2. Motivating example

Consider the question, “what effects on all-cause mortality would a U.S. federal stay-at-home order have had in spring 2020 during the COVID-19 pandemic?” Let $Y_{it}$ be a binary indicator that individual $i$ died during week $t$, $t=0,1,\ldots,11$, measured as weeks since April 6, 2020. Let $A_{it}$ be a binary indicator that the state in which individual $i$ was living during week $t$ was under a state-level stay-at-home or shelter-in-place order. Suppose it is of interest to estimate $E\{Y_t(\bar{1})\}-E(Y_t)$, the difference in U.S. mortality rates under a hypothetical federal stay-at-home order vs. under the observed treatment trajectory (i.e., the “natural course”). As of April 6, 43/50 U.S. states were under stay-at-home orders, which were discontinued at times ranging from late April to late June, with the exception of California which continued through December (Figure 1). Thus, the observed treatment trajectories give rise to a staggered discontinuation design with respect to the treatment plan $\bar{a}^*=\bar{1}$ setting everyone to remain under stay-at-home order in those 43 states. The methods developed below can be used to draw inference about what would have happened had such a policy been implemented.

Figure 1:

Dates of state-issued stay-at-home orders in U.S. states during the COVID-19 pandemic in 2020

3. Identification

In this section we consider identification, given data $\bar{O}$, of the quantity $\mu_t\equiv E\{Y_t(\bar{a}^*)\}$, the mean outcome at time $t$, under the intervention to set all individuals to $\bar{A}_i=\bar{a}^*$. Throughout, it is assumed that interest lies in only one intervention $\bar{a}^*$. Note that $\mu_t$ depends on $\bar{a}^*$, which is left implicit for notational simplicity. Consider the following assumptions:

Assumption 1.

(Stable Unit Treatment Value Assumption [SUTVA]): If $\bar{A}_{it}=\bar{a}_t^*$, then $Y_{it}=Y_{it}(\bar{a}_t^*)$ for $t\in\{0,1,\ldots,\tau\}$.

In words, Assumption 1 requires that an individual’s potential outcomes are not affected by other individuals’ treatments (i.e., no interference), allowing us to index potential outcomes by each individual’s treatments alone. Assumption 1 also requires that the treatment $\bar{a}^*$ is defined precisely enough so that, for individuals whose observed treatment equals $\bar{a}_t^*$, observed outcomes can stand in for counterfactual outcomes under a hypothetical intervention $\bar{a}_t^*$. Implicit in Assumption 1 is that the future cannot affect the past, or $Y_{it}(\bar{a}_k^*)=Y_{it}(\bar{a}_t^*)$ for $k>t$. Thus, SUTVA would be violated, for example, if individuals were able to anticipate future treatments and change their behavior. This particular violation has typically been addressed under a separate “no anticipation” assumption in the DID literature (e.g., Callaway and Sant’Anna, 2021). SUTVA is an unverifiable assumption in observational studies, though it is sometimes possible to test for the presence of interference (Halloran and Hudgens, 2016).

Assumption 2.

(Positivity): If $f(\bar{w}_t\mid\bar{A}_{t-1}=\bar{a}_{t-1}^*)>0$, then $f(\bar{a}_t^*\mid\bar{W}_t=\bar{w}_t,\bar{A}_{t-1}=\bar{a}_{t-1}^*)>0$, for $\bar{w}_t\in\bar{\mathcal{W}}_t$; $t\in\{1,2,\ldots,\tau\}$.

Here and throughout, $f(x)$ refers to a conditional density if $X$ is continuous, and a conditional probability mass function if $X$ is discrete. Assumption 2 requires that units whose treatment history up to time $t-1$ is consistent with the regime in question ($\bar{a}^*$) have positive probability of remaining under treatment plan $\bar{a}^*$ at time $t$. Positivity can sometimes be a verifiable assumption. In particular, if both $\bar{W}$ and $\bar{A}$ are low-dimensional and discrete, then among units with $\bar{A}_{t-1}=\bar{a}_{t-1}^*$, if one observes units who remain under $\bar{a}^*$ at time $t$ in every stratum of $\bar{W}_t$, this implies positivity in the population with probability 1 (but not necessarily the reverse).
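For discrete, low-dimensional data, the empirical check described above can be sketched in a few lines of Python. This is only an illustration: the function name `positivity_violations` and the record layout (dicts with time-indexed sequences under the keys "W" and "A") are assumptions made for this sketch and are not part of the paper or its R package.

```python
from collections import defaultdict


def positivity_violations(records, t, plan):
    """List covariate-history strata at time t in which no unit stays on the plan.

    records: list of dicts with keys "W" and "A", each a sequence indexed by
             time (a hypothetical layout chosen for this sketch).
    plan:    sequence (a*_0, ..., a*_tau) giving the treatment plan of interest.
    """
    at_risk = defaultdict(int)  # stratum -> units consistent with the plan through t-1
    stayed = defaultdict(int)   # stratum -> of those, units that also follow the plan at t
    for r in records:
        if tuple(r["A"][:t]) != tuple(plan[:t]):
            continue  # unit deviated from the plan before time t
        stratum = tuple(r["W"][: t + 1])
        at_risk[stratum] += 1
        if r["A"][t] == plan[t]:
            stayed[stratum] += 1
    return sorted(s for s in at_risk if stayed[s] == 0)
```

An empty list means every observed stratum contains at least one unit remaining under the plan at time t. As the text notes, the reverse does not hold: a non-empty list flags sparse strata in the sample but does not by itself establish a violation in the population.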

Assumption 3.

(Parallel trends): For $t\in\{1,2,\ldots,\tau\}$, $k\le t$:

$$E\{Y_t(\bar{a}^*)-Y_{t-1}(\bar{a}^*)\mid\bar{W}_k,\bar{A}_{k-1}=\bar{a}_{k-1}^*\}=E\{Y_t(\bar{a}^*)-Y_{t-1}(\bar{a}^*)\mid\bar{W}_k,\bar{A}_k=\bar{a}_k^*\}$$

In words, Assumption 3 states that, among individuals whose treatment status is consistent with the intervention $\bar{a}^*$ up to time $k-1$, had (counter to fact) all individuals followed the intervention through time $t$, trends would have been parallel for those who do and do not follow the intervention at time $k$ but have equal covariate histories. This assumption is very similar to those adopted in papers describing event study and staggered adoption DID designs such as Goodman-Bacon (2021) and Callaway and Sant’Anna (2021), and nearly identical to Assumption 12 in de Chaisemartin and D’Haultfoeuille (2021a). The latter differs by considering only certain types of treatment regimes and presuming a linear relation between covariates and counterfactual trends. Parallel trends is unverifiable, though closely related conditions can often be checked (Roth, 2019). Note that $W_{it}$ may include prior outcomes $Y_{im}$ for $m<t$. However, if $Y_{i,t-1}$ is included in $W_{it}$, then the parallel trends assumption is equivalent to sequential exchangeability, in which case existing causal inference methods for observational data with a longitudinal exposure can be used (e.g., Robins, 1986, 2000; van der Laan and Gruber, 2012).

The following Lemma presents the main identification results in this paper, which show that Assumptions 1-3 are sufficient to equate μt to a function of the observed data distribution.

Lemma 1.

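(Parallel trends g-formula) Define the functional (i.e., statistical parameter) $\psi_t\equiv E(Y_0)+\sum_{k=1}^{t}\int E(Y_k-Y_{k-1}\mid\bar{W}_k=\bar{w}_k,\bar{A}_k=\bar{a}_k^*)\prod_{m=0}^{k}dF(w_m\mid\bar{w}_{m-1},\bar{a}_{m-1}^*)$. Under a staggered discontinuation design and if Assumptions 1-3 hold, then $\psi_t=\mu_t$.

Here and throughout, $F(\cdot)$ refers to a conditional cumulative distribution function. Lemma 1 states that the target causal quantity $\mu_t$ is identified by the parameter $\psi_t$. The parameter $\psi_t$ is referred to as the parallel trends g-formula because it represents a modification of the usual g-formula (the dependence of $\psi_t$ on $\bar{a}^*$ is also left implicit). A formal proof of Lemma 1 by induction is presented in Appendix A. Here we give a less formal explanation to build intuition. We have:

$$\begin{aligned}\mu_t\equiv E\{Y_t(\bar{a}^*)\}&=E\{Y_0(\bar{a}^*)\}+\sum_{k=1}^{t}E\{Y_k(\bar{a}^*)-Y_{k-1}(\bar{a}^*)\}\\&=E\{Y_0(\bar{a}^*)\}+\sum_{k=1}^{t}E[E\{Y_k(\bar{a}^*)-Y_{k-1}(\bar{a}^*)\mid W_0\}]\\&=E\{Y_0(\bar{a}^*)\}+\sum_{k=1}^{t}E[E\{Y_k(\bar{a}^*)-Y_{k-1}(\bar{a}^*)\mid A_0=a_0^*,W_0\}]\end{aligned}$$

where the first equality follows by adding and subtracting constants, the second by iterated expectation, and the third by Assumption 3. Repeatedly applying iterated expectation and Assumption 3, we have:

$$\begin{aligned}\mu_t&=E\{Y_0(\bar{a}^*)\}+\sum_{k=1}^{t}E\Big(E\big[E\{Y_k(\bar{a}^*)-Y_{k-1}(\bar{a}^*)\mid\bar{A}_k=\bar{a}_k^*,\bar{W}_k\}\mid\bar{A}_{k-1}=\bar{a}_{k-1}^*,\bar{W}_{k-1}\big]\cdots\mid A_0=a_0^*,W_0\Big)\\&=E(Y_0)+\sum_{k=1}^{t}E\big[E\{E(Y_k-Y_{k-1}\mid\bar{A}_k=\bar{a}_k^*,\bar{W}_k)\mid\bar{A}_{k-1}=\bar{a}_{k-1}^*,\bar{W}_{k-1}\}\cdots\mid A_0=a_0^*,W_0\big]\\&=\psi_t\end{aligned}$$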

where the second equality follows from Assumption 1 and the last equality from iterated expectation.
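The simplest case of Lemma 1 (two time points, no measured covariates, plan "never treated") can be checked numerically. The sketch below is an illustration under an assumed data generating process that is not taken from the paper: a binary unmeasured confounder U shifts the outcome by the same amount at both times (so parallel trends holds) and also raises the probability of leaving the plan at time 1. All function names and numeric values are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Draw (A1, Y0, Y1) for n units; A0 = 0 for everyone (staggered discontinuation)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0             # unmeasured, time-invariant
        a1 = 1 if rng.random() < 0.2 + 0.6 * u else 0  # U drives deviation from the plan
        y0 = 2.0 * u + rng.gauss(0, 1)
        y1 = 1.0 + 2.0 * u + 2.0 * a1 + rng.gauss(0, 1)  # Y1(0) = 1 + 2U + noise
        rows.append((a1, y0, y1))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def parallel_trends_gformula(rows):
    """psi_1 = E(Y0) + E(Y1 - Y0 | A1 = 0) when there are no covariates."""
    return mean(y0 for _, y0, _ in rows) + mean(y1 - y0 for a1, y0, y1 in rows if a1 == 0)


def naive_mean(rows):
    """E(Y1 | A1 = 0): the usual g-formula without covariates, confounded by U here."""
    return mean(y1 for a1, _, y1 in rows if a1 == 0)


if __name__ == "__main__":
    rows = simulate(100_000)
    print("parallel trends g-formula:", round(parallel_trends_gformula(rows), 3))
    print("naive conditional mean:   ", round(naive_mean(rows), 3))
```

Under this assumed process the true value is E{Y1(0)} = 1 + 2(0.5) = 2. The parallel trends g-formula should land near 2, whereas the naive conditional mean converges to 1 + 2 Pr(U = 1 | A1 = 0) = 1.4, because units that stay on the plan are disproportionately those with U = 0.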

4. Estimators

This section presents estimators for the statistical parameter $\psi_t$, which equals the target quantity $\mu_t$ under the above stated assumptions. The estimators in this section utilize existing estimators of $\mu_t$ under sequential exchangeability rather than parallel trends, all of which are CAN estimators of the g-formula by virtue of being solutions to unbiased estimating equations (Stefanski and Boos, 2002). Since $\psi_t$ is a continuous function of several g-formulas, the same function applied to estimators of those g-formulas is a CAN estimator for $\psi_t$. The remainder of this section formalizes this logic and gives examples of specific estimators that function in this capacity. The estimators presented in this section are provided in an R package (see Supporting Information).

4.1. General form

Here we derive a general form of a CAN estimator for the target statistical parameter, $\psi_t$. First, define:

$$\phi_{j,k}=\int E(Y_j\mid\bar{A}_k=\bar{a}_k^*,\bar{W}_k=\bar{w}_k)\prod_{m=0}^{k}dF(w_m\mid\bar{W}_{m-1}=\bar{w}_{m-1},\bar{A}_{m-1}=\bar{a}_{m-1}^*)\quad(1)$$

Equation (1) is the g-formula, developed in the context of identifying parameters like $\mu_t$ under sequential exchangeability. Here, sequential exchangeability is not assumed, and therefore $\phi_{j,k}$ is not interpretable as a causal parameter; instead existing estimators of the statistical parameter $\phi_{j,k}$ are used to assemble estimators of $\psi_t$ (which equals the causal parameter $\mu_t$ under Assumptions 1-3) by noting that $\psi_t=\phi_{0,0}+\sum_{k=1}^{t}(\phi_{k,k}-\phi_{k-1,k})$. Next, suppose there is an estimator $\hat{\phi}_{jk}$ of $\phi_{jk}$ which is the solution to an unbiased estimating function $d_{\phi_{jk}}(O;\phi_{jk})$; i.e., $0=E\{d_{\phi_{jk}}(O;\hat{\phi}_{jk})\}$. Let $\phi=(\phi_{0,0},\phi_{0,1},\ldots,\phi_{t-1,t},\phi_{t,t})$. Then simply define

$$d_{\psi_t}(O;\phi,\psi_t)=\phi_{0,0}+\sum_{k=1}^{t}(\phi_{k,k}-\phi_{k-1,k})-\psi_t\quad(2)$$

Clearly, $E\{d_{\psi_t}(O;\phi,\psi_t)\}=0$, indicating that an estimator $(\hat{\phi},\hat{\psi}_t)$ that jointly solves $0=\sum_i d_{\psi_t}(O_i;\hat{\phi},\hat{\psi}_t)$ will yield a CAN estimator $\hat{\psi}_t$ for $\psi_t$.
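The assembly step in equation (2) is a plug-in sum, which the following Python sketch makes concrete. The function name `psi_from_phi` and the dictionary layout keyed by (j, k) are assumptions for illustration; any of the estimators of the g-formula components discussed in the following subsections could supply the inputs.

```python
def psi_from_phi(phi, t):
    """Combine g-formula component estimates into the parallel trends g-formula.

    phi: dict mapping (j, k) to an estimate of phi_{j,k}; it must contain
         (0, 0) and, for each k = 1, ..., t, both (k, k) and (k - 1, k).
    """
    return phi[(0, 0)] + sum(phi[(k, k)] - phi[(k - 1, k)] for k in range(1, t + 1))
```

For example, with hypothetical inputs `{(0, 0): 1.0, (1, 1): 1.5, (0, 1): 0.9}`, the value at t = 1 is 1.0 + (1.5 - 0.9) = 1.6: the baseline mean plus the estimated one-period trend among units who follow the plan.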

The following subsections show how different options for $d_{\phi_{jk}}(O;\phi_{jk})$ (including IPTW, g-computation, and targeted maximum likelihood estimation [TMLE]) can be constructed and stacked with (2) to form estimators of $\psi_t$ that inherit desirable properties (e.g., consistency, asymptotic normality, double robustness).

4.2. Inverse probability of treatment weighted (IPTW) estimator

Define stabilized inverse probability of treatment weights (IPTWs) as $\pi_k(\bar{a};\bar{W})=\prod_{m=1}^{k}f(a_m\mid\bar{a}_{m-1})/\prod_{m=1}^{k}f(a_m\mid\bar{a}_{m-1},\bar{W}_m)$. Robins (2000) (Lemma 1.1) showed that $\phi_{j,k}=c_{jk}(\bar{a}_k^*)$, where $c_{jk}(\cdot)$ is the unique function such that $E[q_{jk}(\bar{A}_k)\{Y_j-c_{jk}(\bar{A}_k)\}\pi_k(\bar{A};\bar{W})]=0$ for all functions $q_{jk}(\bar{A}_k)$ where the expectation exists. This equation defines a regression of $Y_j$ on $\bar{A}_k$ weighted by $\pi_k(\bar{A};\bar{W})$. In the context of sequential exchangeability, estimators based on this weighted regression formulation are called IPTW-marginal structural model estimators and given a causal interpretation (Robins et al., 1992; Robins, 2000). However, in the context of this paper, sequential exchangeability is not assumed and so the weighted regression equation does not have a causal interpretation on its own. Instead, the above result, together with results from Section 4.1, implies that an IPTW estimator of $\psi_t$ can be formed by using a linear combination of IPTW marginal structural model estimators for $\phi_{j,k}$.

For simplicity of presentation, assume that for $k=0,1,\ldots,t$, $f(a_k\mid\bar{a}_{k-1})$ and $f(a_k\mid\bar{a}_{k-1},\bar{w}_k)$ are known up to a finite dimensional parameter. That is, define $g_{0,k}(\bar{a},\cdot)\equiv\prod_{s=0}^{k}f(a_s\mid\bar{a}_{s-1})$ and $g_{0,k}(\bar{a},\bar{w})\equiv\prod_{s=0}^{k}f(a_s\mid\bar{a}_{s-1},\bar{w}_s)$, and say we are willing to assume that $g_{0,k}(\bar{a},\cdot)$ is uniquely determined by the parametric model $g_k(\bar{a},\cdot;\alpha_0)$, and similarly that $g_{0,k}(\bar{a},\bar{w})=g_k(\bar{a},\bar{w};\alpha_1)$, where $\alpha_0$ and $\alpha_1$ are finite dimensional parameter vectors. Say we have an estimator that solves an unbiased estimating equation for $(\alpha_0,\alpha_1)$. For example, $g_k(\bar{a},\cdot;\alpha_0)$ and $g_k(\bar{a},\bar{w};\alpha_1)$ may consist of generalized linear models with parameters estimated by maximum likelihood. Specify $c_{jk}(\bar{A}_k)$ as some appropriate functional form for the expected value of $Y_j$ conditional on $\bar{A}_k$ in the weighted data distribution, such as $c_{jk}(\bar{A}_k)=\gamma_{0jk}+\gamma_{1jk}I(\bar{A}_k=\bar{a}_k^*)$ (i.e., leaving the model unrestricted when $\bar{A}_k=\bar{a}_k^*$). Then an estimator $\hat{\phi}_{j,k}^{IPTW}$ that solves $0=c_{jk}(\bar{a}_k^*)-\phi_{j,k}$ is CAN for $\phi_{j,k}$ if $c_{jk}(\bar{A}_k)$, $g_k(\bar{A},\cdot;\alpha_0)$ and $g_k(\bar{A},\bar{w};\alpha_1)$ are correctly specified. Finally, stack the score equations for $g_k(\bar{a},\cdot;\alpha_0)$ and $g_k(\bar{a},\bar{w};\alpha_1)$, along with $0=c_{jk}(\bar{a}_k^*)-\phi_{j,k}$ $(k=0,1,\ldots,t;\ j=k-1,k)$ and equation (2) to yield an estimator for $\psi_t$, say $\hat{\psi}_t^{IPTW}$.

In other words, the IPTW estimator for the target parameter of interest is $\hat{\psi}_t^{IPTW}=\hat{\phi}_{0,0}^{IPTW}+\sum_{k=1}^{t}(\hat{\phi}_{k,k}^{IPTW}-\hat{\phi}_{k-1,k}^{IPTW})$, where $\hat{\phi}_{j,k}^{IPTW}$ are estimators of each appropriate g-formula parameter based on an IPTW model. Note that under our assumption set, $\hat{\phi}_{j,k}^{IPTW}$ are not estimators of causal quantities in and of themselves, but simply functions of the observed data distribution that may be assembled appropriately to form the causal estimator $\hat{\psi}_t^{IPTW}$. Clearly, $\hat{\psi}_t^{IPTW}$ solves an estimating equation that is unbiased if $g_k(\bar{A},\cdot;\alpha_0)$, $g_k(\bar{A},\bar{w};\alpha_1)$ and $c_{jk}(\bar{A}_k)$ are all correctly specified, implying $\hat{\psi}_t^{IPTW}$ is CAN for $\psi_t$ under the same conditions. However, IPTW estimators are known to be inefficient and $\hat{\psi}_t^{IPTW}$ may similarly inherit this property. The following subsections present estimators that may improve on efficiency relative to IPTW.
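As a rough illustration of the weighting step, the Python sketch below computes a stabilized weight from per-period treatment probabilities and then a weighted mean among units that follow the plan. It assumes the treatment probabilities have already been estimated elsewhere, and it relies on the fact that, with an identity link and the indicator-only working model described above, the fitted value at the plan equals the weighted mean of the outcome among plan followers. The function names and argument layout are hypothetical.

```python
def stabilized_weight(num_probs, den_probs):
    """Product over time of f(a_m | past treatment) / f(a_m | past treatment, covariates).

    num_probs, den_probs: per-period probabilities of the treatment actually
    received, from the marginal and covariate-conditional treatment models.
    """
    weight = 1.0
    for p_num, p_den in zip(num_probs, den_probs):
        weight *= p_num / p_den
    return weight


def phi_iptw(y, followed, weights):
    """Weighted mean of the outcome among units whose treatment history matches the plan.

    y:        outcome Y_j for each unit
    followed: booleans, True when the unit's history through time k matches the plan
    weights:  stabilized weights through time k for each unit
    """
    num = sum(w * yi for yi, f, w in zip(y, followed, weights) if f)
    den = sum(w for f, w in zip(followed, weights) if f)
    return num / den
```

Calling `phi_iptw` once per (j, k) pair and combining the results as in equation (2) would give a simple version of the IPTW estimator; standard errors would still need the stacked estimating equations or a bootstrap.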

4.3. Iterated conditional expectation (ICE) estimator

Bang and Robins (2005) describe an estimator of $\phi_{j,k}$ based on the following iterated conditional expectation (ICE) representation:

$$\phi_{j,k}=E\Big(E\big[\cdots E\{E(Y_j\mid\bar{A}_k=\bar{a}_k^*,\bar{W}_k)\mid\bar{A}_{k-1}=\bar{a}_{k-1}^*,\bar{W}_{k-1}\}\cdots\mid\bar{A}_1=\bar{a}_1^*,\bar{W}_1\big]\mid A_0=a_0^*,W_0\Big)\quad(3)$$

which can equivalently be written as $\phi_{j,k}=E\{Q_0^{j,k,0}(\bar{a}^*)\}$ where, for $m=0,1,\ldots,k$, $Q_0^{j,k,m}(\bar{a}^*)=E\{Q_0^{j,k,m+1}(\bar{a}^*)\mid\bar{A}_m=\bar{a}_m^*,\bar{W}_m\}$ and $Q_0^{j,k,k+1}(\bar{a}^*)=Y_j$.

An estimator of $\psi_t$ can then be formulated based on this representation. For simplicity, say we are willing to assume that $Q_0^{j,k,m}(\bar{a}^*)$ are known up to a finite dimensional parameter for $m=0,1,\ldots,k$. That is, assume $Q_0^{j,k,m}(\bar{a}^*)=Q^{j,k,m}(\bar{a}^*;\beta_m)$, where $\beta_m$ for $m=0,1,\ldots,k$, are finite dimensional parameters. For example, $Q^{j,k,m}(\bar{a}^*;\beta_m)$ may be a generalized linear model with parameters $\beta_m$. Say we are in possession of an unbiased estimating function $d^{j,k,m}\{O,Q^{j,k,m+1}(\bar{a}^*;\beta_{m+1});\beta_m\}$ for $\beta_m$. For example, if maximum likelihood is used, then $d^{j,k,m}\{O,Q^{j,k,m+1}(\bar{a}^*;\beta_{m+1});\beta_m\}$ is the vector of first derivatives of the model log-likelihood with respect to $\beta_m$. Note that including $Q^{j,k,m+1}(\bar{a}^*;\beta_{m+1})$ as an argument to the estimating function makes explicit the nested nature of the iterated expectations being modeled. The ICE estimator of $\phi_{j,k}$ is then defined (Bang and Robins, 2005) as the solution $\hat{\phi}_{j,k}^{ICE}$ to $0=\sum_{i=1}^{n}d^{j,k}(O_i;\phi_{j,k})$ where

$$d^{j,k}(O;\phi_{j,k})=\begin{pmatrix}d^{j,k,k}(O;\beta_k)\\d^{j,k,k-1}\{O,Q^{j,k,k}(\bar{a}^*;\beta_k);\beta_{k-1}\}\\\vdots\\d^{j,k,0}\{O,Q^{j,k,1}(\bar{a}^*;\beta_1);\beta_0\}\\Q^{j,k,0}(\bar{a}^*;\beta_0)-\phi_{j,k}\end{pmatrix}$$

Then, simply stack $d^{j,k}(O;\phi_{j,k})$ with (2) to yield an estimator $\hat{\psi}_t^{ICE}$ for $\psi_t$. In other words, the ICE estimator of the target parameter is $\hat{\psi}_t^{ICE}=\hat{\phi}_{0,0}^{ICE}+\sum_{k=1}^{t}(\hat{\phi}_{k,k}^{ICE}-\hat{\phi}_{k-1,k}^{ICE})$, where each $\hat{\phi}_{j,k}^{ICE}$ is an estimator of the corresponding g-formula parameter based on ICE g-computation. Clearly, $\hat{\psi}_t^{ICE}$ solves an unbiased estimating equation whenever all the iterated outcome models $\{Q^{j,k,m}(\bar{a}^*;\beta_m):k=0,1,\ldots,t;\ j=k,k-1;\ m=0,1,\ldots,k+1\}$ are correctly specified. Estimators of $\phi_{j,k}$ based on outcome regression generally have smaller asymptotic variance than IPTW estimators, and $\hat{\psi}_t^{ICE}$ may inherit this property.
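The backward recursion behind the ICE representation can be sketched for discrete covariates by replacing each parametric outcome model with stratum-specific sample means. This substitution is an assumption made to keep the example self-contained; the text above uses generalized linear models instead. The function name `ice_phi` and the record layout (dicts with time-indexed sequences under "W", "A", and "Y") are likewise hypothetical.

```python
from collections import defaultdict


def ice_phi(units, j, k, plan):
    """Nonparametric ICE estimate of phi_{j,k} for discrete covariate histories.

    units: list of dicts with keys "W", "A", "Y", each a sequence indexed by time.
    plan:  sequence (a*_0, ..., a*_k) giving the treatment plan of interest.
    """
    # Start the recursion at Q^{k+1} = Y_j for every unit.
    q = [u["Y"][j] for u in units]
    for m in range(k, -1, -1):
        sums, counts = defaultdict(float), defaultdict(int)
        # "Fit" step: average the current pseudo-outcome among plan followers
        # through time m, separately within each covariate history.
        for u, qi in zip(units, q):
            if tuple(u["A"][: m + 1]) == tuple(plan[: m + 1]):
                key = tuple(u["W"][: m + 1])
                sums[key] += qi
                counts[key] += 1
        # "Predict" step: every unit receives the mean for its own covariate
        # history (NaN if no follower shares that history).
        q = [
            sums[key] / counts[key] if key in counts else float("nan")
            for key in (tuple(u["W"][: m + 1]) for u in units)
        ]
    # Average Q^0 over all units.
    return sum(q) / len(q)
```

A NaN result signals an empirical positivity problem: some unit needed in the final average has a covariate history with no plan followers.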

4.4. Doubly robust targeted maximum likelihood estimator (TMLE)

IPTW estimators are only guaranteed to be CAN if the treatment models are correctly specified, and ICE estimators are only guaranteed to be CAN if all the outcome models are correctly specified. Doubly robust estimators are CAN if either the outcome or treatment models are correct (but not necessarily both), which is an advantage because one is rarely certain that models are correctly specified.

Doubly robust estimators of $\phi_{j,k}$ generally consist of augmenting the ICE algorithm by including predicted values from the treatment models used to construct IPTWs in some way. Such estimators are called semiparametric efficient if they solve the estimating equation corresponding to the following efficient influence curve (van der Laan and Gruber, 2012; Tran et al., 2019):

$$\sum_{m=0}^{k}\frac{I(\bar{A}_m=\bar{a}_m^*)}{g_{0,m}(\bar{a}^*,\bar{w})}\{Q_0^{j,k,m+1}(\bar{a}^*)-Q_0^{j,k,m}(\bar{a}^*)\}+Q_0^{j,k,0}(\bar{a}^*)-\phi_{j,k}\quad(4)$$

with $g_{0,m}(\bar{a}^*,\bar{w})$ and $Q_0^{j,k,m}$ defined as in previous sections. Many estimators correspond to this efficient influence curve, meaning they all have the smallest asymptotic variance of any regular asymptotically linear estimator in this class (Bang and Robins, 2005; van der Laan and Gruber, 2012). We present one such example of a targeted maximum likelihood estimator (TMLE) which may outperform others in finite samples (Tran et al., 2019).

First consider the TMLE of $\phi_{j,k}$. For simplicity, assume that outcome models $\{Q_0^{j,k,m}(\bar{a}^*):m=0,1,\ldots,k\}$ and treatment models $\{g_{0,m}(\bar{a}^*):m=0,1,\ldots,k\}$ are known up to a finite dimensional parameter. That is, assume $g_{0,m}(\bar{a}^*)=g_m(\bar{a}^*;\alpha_m)$ and $Q_0^{j,k,m}(\bar{a}^*)=Q^{j,k,m}(\bar{a}^*;\beta_m)$, where $\alpha_m$ and $\beta_m$ are finite dimensional parameters, $m=0,1,\ldots,k$. Then proceed as follows:

  1. For $m=0,1,\ldots,k$, estimate $\alpha_m$, for example using maximum likelihood. Denote estimators of $\alpha_m$ as $\hat{\alpha}_m$ and corresponding estimators of $g_m(\bar{a}^*;\alpha_m)$ as $g_m(\bar{a}^*;\hat{\alpha}_m)$.

  2. For $m=k$, estimate $\beta_m$, for example using maximum likelihood, denoting this estimator $\hat{\beta}_m$. Calculate $Q_i^{j,k,m}(\bar{a}^*;\hat{\beta}_m)$ for each unit $i$ and denote this estimator $\hat{Q}_i^{j,k,m}(\bar{a}^*)$. Note that these are model predictions that implicitly depend on the data, and so vary across units $i$.

  3. Also for $m=k$, update the initial fit $\hat{Q}_i^{j,k,m}(\bar{a}^*)$ by fitting a new model, defined as $h\{Q_i^{j,k,m,*}(\bar{a}^*)\}=h\{\hat{Q}_i^{j,k,m}(\bar{a}^*)\}+\epsilon^{j,k,m}$, where $h(\cdot)$ is an appropriate link function, $\epsilon^{j,k,m}$ is an intercept, and $Q_i^{j,k,m,*}(\bar{a}^*)$ are conditional expectations under the updated model. Note, the response variable in this model is $Q_i^{j,k,k+1}(\bar{a}^*)=Y_j$. The logit link is recommended to ensure the estimator respects bounds implied by the data (if $Y_j$ is not bounded by $(0,1)$, it will need to be appropriately transformed for the logit function to be defined) (van der Laan and Gruber, 2012). Estimators $\hat{Q}_i^{j,k,m,*}(\bar{a}^*)$ for the updated fit are found by maximizing an appropriate weighted likelihood with weights $I(\bar{A}_{im}=\bar{a}_m^*)/g_m(\bar{a}^*;\hat{\alpha}_m)$.

  4. Repeat steps 2–3, estimating $Q^{j,k,m}(\bar{a}^*;\beta_m)$ and $Q^{j,k,m,*}(\bar{a}^*)$ for $m=k-1,k-2,\ldots,0$.

  5. The TMLE for $\phi_{j,k}$ is then defined as $\hat{\phi}_{j,k}^{TMLE}=n^{-1}\sum_{i=1}^{n}\hat{Q}_i^{j,k,0,*}(\bar{a}^*)$.

Then, the TMLE for $\psi_t$ is defined as $\hat{\psi}_t^{TMLE}=\hat{\phi}_{0,0}^{TMLE}+\sum_{k=1}^{t}(\hat{\phi}_{k,k}^{TMLE}-\hat{\phi}_{k-1,k}^{TMLE})$. Since $\hat{\phi}_{j,k}^{TMLE}$ solves the estimating equation corresponding to the efficient influence curve (4), it will be CAN for $\phi_{j,k}$ so long as either (i) the set of outcome models $\{Q^{j,k,m}(\bar{a}^*;\beta_m):m=0,1,\ldots,k\}$ are correctly specified, or (ii) the set of treatment models $\{g_m(\bar{a}^*;\alpha_m):m=0,1,\ldots,k\}$ are correctly specified, but it is not necessary that both be correct. Therefore, if one of these two conditions holds for all $k=0,1,\ldots,t$ and $j=k-1,k$, then $\hat{\psi}_t^{TMLE}$ will be CAN for $\psi_t$. The double robustness property carries through to $\hat{\psi}_t^{TMLE}$ by virtue of the fact that the estimating equation in (2) is unbiased if the estimating equations for all the $\phi_{j,k}$ are unbiased, which is the case for $\hat{\phi}_{j,k}^{TMLE}$ under conditions (i) or (ii) above.
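The updating step (step 3) with a logit link amounts to solving a one-dimensional weighted score equation for the intercept. The Python sketch below does this by bisection rather than by a regression routine; this is a simplification chosen to avoid external dependencies and should give the same solution as the weighted intercept-only logistic fit with an offset. The function name and argument layout are hypothetical, responses are assumed to lie in [0, 1], and initial predictions are assumed to lie strictly inside (0, 1).

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_intercept_update(y, q_init, weights, tol=1e-10):
    """Solve sum_i w_i * {y_i - expit(logit(q_i) + eps)} = 0 for eps by bisection.

    y:       response for this step (Y_j at m = k, or the updated predictions
             carried over from the previous step)
    q_init:  initial predictions from the outcome model
    weights: I(unit follows the plan through m) / g_m, so non-followers get 0
    Returns (eps, updated predictions for every unit).
    """
    offsets = [logit(q) for q in q_init]

    def score(eps):
        return sum(w * (yi - expit(o + eps)) for yi, o, w in zip(y, offsets, weights))

    lo, hi = -20.0, 20.0  # the score is decreasing in eps, so bisection applies
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    eps = (lo + hi) / 2.0
    return eps, [expit(o + eps) for o in offsets]
```

Applying this function at m = k, then passing the updated predictions down as the response for m = k-1, and so on to m = 0, mirrors steps 2 to 4; averaging the final updated predictions gives the estimate in step 5.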

5. Simulation study

A simulation study was conducted to evaluate the finite sample performance of the IPTW, ICE, and TMLE estimators described in Section 4 when Assumptions 1-3 hold and all models were correctly specified. The TMLE estimator was also evaluated under misspecification of either the treatment or outcome model. Code for the simulation is provided in an R package (see Supporting Information).

5.1. Data generating distribution

Data $O_{it}=\{W_{it}=(W_{it1},W_{it2}),A_{it},Y_{it}\}$; $i=1,2,\ldots,n$; $t=0,\ldots,5$ were generated from the distributions $U_{i0}\sim\text{Bernoulli}\{\text{logit}^{-1}(\omega_0)\}$; $W_{it1}\sim\text{Bernoulli}\{\text{logit}^{-1}(\alpha_{0t}+\alpha_{1t}A_{i,t-1})\}$; $W_{it2}\sim N(\gamma_{0t}+\gamma_{1t}A_{i,t-1},1)$; $A_{it}\mid A_{i,t-1}=0\sim\text{Bernoulli}\{\text{logit}^{-1}(\delta_{0t}+\delta_{1t}U_{i0}+\delta_{2t}W_{it1}+\delta_{3t}W_{it2}+\delta_{4t}W_{it2}^2)\}$; and $Y_{it}\sim N(\beta_{0t}+\beta_{1t}W_{it1}+\beta_{2t}W_{it2}+\beta_{3t}W_{it2}^2+\beta_{4t}A_{it}+\theta U_{i0},1)$; with $A_{i0}=0$ and $A_{it}=1$ if $A_{i,t-1}=1$. The monotonic treatment assignment for $A_{it}$ is not necessary, but simplifies analysis. In all analyses, $U_{i0}$ is treated as unmeasured, but all other variables are observed. Parallel trends hold in this setup, as one sufficient set of conditions for parallel trends (proven in Appendix B) is that the only unmeasured variables (here, $U_{i0}$) are time-invariant, do not affect time-varying covariates, and enter the outcome model linearly with constant coefficient over time (here, $\theta$ is constant over $t$).

We simulated 1,000 datasets each for sample sizes $n=$ 1,000, 10,000, and 100,000. All parameters were generated from a $N(0.2,1)$ distribution, with the same values for each parameter used across all simulation runs. The target parameter was $\mu_5=E\{Y_5(\bar{0})\}=3.98$, the mean outcome at end of follow-up, had everyone remained untreated. The true difference compared to the natural course was $E\{Y_5(\bar{0})\}-E(Y_5)=3.98-(3.88)=0.10$.
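To make the structure of this data generating process concrete, the following Python sketch draws one unit's trajectory. It is a loose illustration rather than the simulation code from the R package: the parameter values are drawn from an arbitrary normal distribution chosen for this sketch (not the values used in the paper), the treatment before baseline is treated as 0 when generating baseline covariates, and all function and key names are hypothetical.

```python
import math
import random


def expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def draw_parameters(rng, tau=5):
    """Draw illustrative parameters once; the distribution here is an assumption."""
    names = ["alpha0", "alpha1", "gamma0", "gamma1",
             "delta0", "delta1", "delta2", "delta3", "delta4",
             "beta0", "beta1", "beta2", "beta3", "beta4"]
    return {
        "omega0": rng.gauss(0.0, 0.5),
        "theta": rng.gauss(0.0, 0.5),  # constant over time, as parallel trends requires here
        "t": [{name: rng.gauss(0.0, 0.5) for name in names} for _ in range(tau + 1)],
    }


def simulate_unit(rng, par, tau=5):
    """Return a list of per-time records for one unit; U is generated but not returned."""
    u = int(rng.random() < expit(par["omega0"]))
    a_prev, rows = 0, []
    for t in range(tau + 1):
        p = par["t"][t]
        w1 = int(rng.random() < expit(p["alpha0"] + p["alpha1"] * a_prev))
        w2 = rng.gauss(p["gamma0"] + p["gamma1"] * a_prev, 1)
        if t == 0:
            a = 0  # everyone starts untreated
        elif a_prev == 1:
            a = 1  # monotone treatment: once treated, always treated
        else:
            lin = (p["delta0"] + p["delta1"] * u + p["delta2"] * w1
                   + p["delta3"] * w2 + p["delta4"] * w2 ** 2)
            a = int(rng.random() < expit(lin))
        mu = (p["beta0"] + p["beta1"] * w1 + p["beta2"] * w2
              + p["beta3"] * w2 ** 2 + p["beta4"] * a + par["theta"] * u)
        rows.append({"t": t, "W1": w1, "W2": w2, "A": a, "Y": rng.gauss(mu, 1)})
        a_prev = a
    return rows
```

A dataset of size n would be built by drawing the parameters once and calling `simulate_unit` n times with the same parameter dictionary, matching the description that parameter values are held fixed across simulation runs.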

5.2. Estimator implementation

For each simulated dataset, the estimators $\hat{\mu}_t^{IPTW}$, $\hat{\mu}_t^{ICE}$, and $\hat{\mu}_t^{TMLE}$ were calculated using correctly specified generalized linear regression models estimated using maximum likelihood. Additionally, $\hat{\mu}_t^{IPTW}$ was calculated with treatment models misspecified, $\hat{\mu}_t^{ICE}$ with outcome models misspecified, and $\hat{\mu}_t^{TMLE}$ with treatment models, outcome models, or both misspecified. All misspecified models omitted the term for $W_{it2}^2$ at each time $t$.

5.3. Simulation results

Table 1 shows estimates of the bias, variance, and p-values from a Lilliefors test for normality, for each estimator of $\mu_5$. The results suggest that all stated theoretical properties hold approximately in simulated data. First, when all models are correctly specified, all estimators appear approximately unbiased with decreasing variance as the sample size increases. When outcome models and treatment models are misspecified, ICE and IPTW estimators appear biased, respectively. TMLE appears consistent when either the treatment or outcome models are correctly specified, but not when both are misspecified, supporting the double robustness property. Lastly, all estimators appear normally distributed for all sample sizes considered, based on Lilliefors tests.

Table 1:

Simulation results

                     n = 1,000                    n = 10,000                   n = 100,000
estimator   variance[1]/n  bias[2]  p[3] variance[1]/n  bias[2]  p[3] variance[1]/n  bias[2]  p[3]
ice_qfal              4.3     1.15  0.68           4.6     1.10  0.81           4.5     1.16  0.76
ice_true              4.2    −0.04  0.82           4.5    −0.02  0.23           4.2     0.02  0.45
iptw_gfal             4.5     1.17  0.86           4.7     1.17  0.49           4.6     1.25  0.69
iptw_true             6.0    −0.17  0.37           6.3    −0.08  0.87           7.4    −0.01  0.57
tmle_bfal             4.4     1.19  0.51           4.6     1.15  0.65           4.5     1.22  0.62
tmle_gfal             4.2    −0.08  0.86           4.5    −0.04  0.28           4.3     0.02  0.21
tmle_qfal             5.9    −0.09  0.15           6.4    −0.07  0.78           7.6    −0.02  0.41
tmle_true             5.4    −0.03  0.82           5.6     0.01  0.11           5.7     0.01  0.69

[1] Empirical variance of estimates over 1000 simulated datasets.

[2] Multiplied by 100.

[3] P-value for Lilliefors test against the null hypothesis of normality.

Abbreviations: ice=iterated conditional expectation, iptw=inverse probability of treatment weighted, tmle=targeted maximum likelihood, qfal=outcome models misspecified, gfal=treatment models misspecified, bfal=both sets of models misspecified, true=all models correctly specified.

6. COVID-19 application

6.1. Data

This section presents an analysis of the motivating example, introduced in Section 2.2. Code and data are provided in the R package didgformula (see Supporting Information). State-level weekly mortality data come from the Centers for Disease Control and Prevention’s National Death Index, and weekly counts of COVID-19 cases from the COVID-19 Data Repository at the Center for Systems Science and Engineering at Johns Hopkins University. Data on state-level stay-at-home orders come from the COVID-19 U.S. State Policy database. Though the outcome variable of interest is an individual-level indicator of death in week $t$, this variable is not directly observed; instead the observed data represent counts of deaths occurring in each state. Let $s=1,2,\ldots,43$ be a state index, and let $Y_{ist}$ be an indicator of mortality during week $t$ for the $i$th individual $(i=1,\ldots,n_s)$ living in state $s$, where $n_s$ denotes the population size in state $s$, and $n=\sum_{s=1}^{43}n_s\approx 309$ million. The observed outcome variable is $Y_{st}=\sum_{i=1}^{n_s}Y_{ist}$, the state-level weekly sum of individual-level mortality counts, along with population counts $n_s$ (drawn from the 2010 Census). The observed treatment variable $A_{st}$ is an indicator of state $s$ being under stay-at-home order in week $t$. Finally, let $W_{st}$ be the change in confirmed COVID-19 cases reported per 100k population in the previous four weeks (i.e., the difference from week $t-4$ to $t$) in state $s$. Thus, in this example, the parallel trends assumption is conditional on the local state of the pandemic, which may be plausible for pandemic-related policies (Callaway and Li, 2021).

6.2. Estimator implementation

6.2.1. IPTW

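For the treatment models, the following parametric models pooled over $k=1,\ldots,11$ were assumed:

$$\begin{aligned}f(A_{sk}\mid\bar{A}_{s,k-1};\alpha_0)&=\text{Bernoulli}\{\text{logit}^{-1}(\alpha_{00}+\alpha_{01}\omega(k)+\alpha_{02}A_{s,k-1})\}\\f(A_{sk}\mid\bar{A}_{s,k-1},\bar{W}_{sk};\alpha_1)&=\text{Bernoulli}\{\text{logit}^{-1}(\alpha_{10}+\alpha_{11}\omega(k)+\alpha_{12}A_{s,k-1}+\alpha_{13}\log W_{sk})\}\end{aligned}$$

where $\omega(k)$ is a natural cubic spline basis with 3 degrees of freedom for time $k$. The outcome model $c_{jk}(\bar{A})=\gamma_{0jk}+\gamma_{1jk}I(\bar{A}_k=\bar{1})$, $k=1,\ldots,11$, $j=k,k-1$ was specified, which allows the outcome to depend on the full exposure history. The parameters $\alpha_0=(\alpha_{00},\alpha_{01},\alpha_{02})$ and $\alpha_1=(\alpha_{10},\ldots,\alpha_{13})$ were estimated using maximum likelihood, weighted by $1/n_s$ to account for differing population sizes across states. Then, $\gamma_{0jk},\gamma_{1jk}$, $k=1,\ldots,11$, $j=k,k-1$ were estimated by maximizing the state-level binomial likelihood weighted by inverse probability of treatment weights $\pi_k(\bar{A};\bar{W},\hat{\alpha})=\prod_{m=1}^{k}f(A_m\mid\bar{A}_{m-1};\hat{\alpha}_0)/\prod_{m=1}^{k}f(A_m\mid\bar{A}_{m-1},\bar{W}_m;\hat{\alpha}_1)$, where $\hat{\alpha}_0$ and $\hat{\alpha}_1$ denote maximum likelihood estimators of $\alpha_0$ and $\alpha_1$. Then estimators $\hat{\psi}_t^{IPTW}$, $t=0,\ldots,11$ were calculated as $\hat{\phi}_{0,0}^{IPTW}+\sum_{k=1}^{t}(\hat{\phi}_{k,k}^{IPTW}-\hat{\phi}_{k-1,k}^{IPTW})$, where $\hat{\phi}_{j,k}^{IPTW}=\hat{\gamma}_{0jk}+\hat{\gamma}_{1jk}$ and $\hat{\gamma}_{0jk}$, $\hat{\gamma}_{1jk}$ denote the weighted maximum likelihood estimators.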

6.2.2. ICE

For ICE estimators, the following parametric outcome regression models pooled over $k = 1, \ldots, 11$ were assumed:

$$
Q_{j,k,m}(\bar{a}^*; \beta_m) = \text{logit}^{-1}\{\beta_{0jm} + \beta_{1jm}\omega(k) + \beta_{2jm}\omega(k)a_m^* + \beta_{3jm}\log W_m\}
$$

for $j = k, k-1$ and $m = k, k-1, \ldots, 0$, where again $\omega(k)$ refers to a natural cubic spline basis with 3 degrees of freedom. Note that, due to the monotonic treatment pattern, the interaction between time and treatment allows the outcome to depend on the full exposure history. The parameters $\beta_m$ were estimated by maximizing a binomial quasilikelihood, with estimators denoted $\hat{\beta}_m$. To account for varying state population sizes, state contributions to the quasilikelihood were weighted by $1/n_s$. Finally, ICE estimators $\hat{\psi}_t^{ICE}$, $t = 0, \ldots, 11$, were calculated as $\hat{\phi}_{0,0}^{ICE} + \sum_{k=1}^{t} (\hat{\phi}_{k,k}^{ICE} - \hat{\phi}_{k-1,k}^{ICE})$, where $\hat{\phi}_{j,k}^{ICE} = \sum_{r=1}^{43} Q_{rj,k,0}(\bar{a}^*; \hat{\beta}_m)/43$ (as there are 43 states included in the analysis).
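The iterated conditional expectation recursion can be sketched in a few lines of Python. To stay self-contained, this sketch replaces the pooled logistic models above with stratified means (a saturated regression) and handles only two time points with discrete covariates; the record layout and function names are hypothetical, and every covariate stratum is assumed to contain at least one unit still following the regime.

```python
from collections import defaultdict


def stratified_mean(keys, values):
    """Saturated 'regression': the mean of values within each key."""
    total, count = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        total[key] += value
        count[key] += 1
    return {key: total[key] / count[key] for key in total}


def ice_two_period(rows):
    """ICE estimate of a g-formula mean under a1 = 1.

    rows: dicts with keys w0, w1, a1, y. Every unit is assumed to start on
    treatment (a0 = 1), as in a staggered discontinuation design.
    """
    # Step m = 1: model the outcome among units that follow the regime.
    followers = [r for r in rows if r["a1"] == 1]
    q1 = stratified_mean([(r["w0"], r["w1"]) for r in followers],
                         [r["y"] for r in followers])
    # Predict for every unit as if a1 = 1 (fails if a stratum has no follower).
    pred1 = [q1[(r["w0"], r["w1"])] for r in rows]
    # Step m = 0: regress the predictions on baseline history only.
    q0 = stratified_mean([r["w0"] for r in rows], pred1)
    pred0 = [q0[r["w0"]] for r in rows]
    # Final step: average the last set of predictions over all units.
    return sum(pred0) / len(pred0)


rows = [
    {"w0": 0, "w1": 0, "a1": 1, "y": 0.2},
    {"w0": 0, "w1": 0, "a1": 0, "y": 0.9},
    {"w0": 0, "w1": 1, "a1": 1, "y": 0.4},
    {"w0": 1, "w1": 1, "a1": 1, "y": 0.6},
]
print(ice_two_period(rows))
```

In the analysis described above, this recursion would be run once with outcome $Y_k$ and once with $Y_{k-1}$ for each $k$, and the two results differenced.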

6.2.3. TMLE

For TMLE, the same treatment models as specified for IPTW were used, along with the same outcome models as specified for ICE. Specifically, when estimating $\phi_{jk}$, for the $m$th ICE step ($m = k, k-1, \ldots, 0$), the TMLE updating step was performed by maximizing another weighted quasibinomial likelihood with response variable $Q_{sj,k,m+1}(\bar{a}^*; \beta_{m+1})$, an intercept, and offset $Q_{sj,k,m}(\bar{a}^*; \hat{\beta}_m)$, weighted by $I(\bar{A}_k = \bar{1})/g_k(\bar{A}, \hat{\alpha}_k)$. Predictions $\hat{Q}_{sj,k,m,*}(\bar{a}^*)$ from this model were then passed to the $(m-1)$th ICE step, and the process was repeated for $m = k, k-1, \ldots, 0$. Finally, $\hat{\psi}_t^{TMLE}$, $t = 0, \ldots, 11$, were calculated as $\hat{\phi}_{0,0}^{TMLE} + \sum_{k=1}^{t} (\hat{\phi}_{k,k}^{TMLE} - \hat{\phi}_{k-1,k}^{TMLE})$, where $\hat{\phi}_{j,k}^{TMLE} = \sum_{r=1}^{43} \hat{Q}_{rj,k,0,*}(\bar{a}^*)/43$.
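A single updating step of this kind can be sketched as an intercept-only logistic fluctuation solved by Newton's method. The sketch below is written under stated assumptions rather than as the package's implementation: the initial prediction is assumed to enter as an offset on the logit scale, the weights are assumed to be zero for units that have deviated from the regime, and the function names and inputs are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_update(response, q_initial, weights, iterations=25):
    """Fit an intercept epsilon in a weighted quasibinomial model with offset.

    response: values in [0, 1] (the prediction from the previous ICE step).
    q_initial: initial predictions strictly inside (0, 1), used as the offset.
    weights: nonnegative weights (zero for units off the regime).
    Returns (epsilon, updated predictions).
    """
    offsets = [logit(q) for q in q_initial]
    eps = 0.0
    for _ in range(iterations):
        fitted = [expit(o + eps) for o in offsets]
        # Weighted score and information for the intercept.
        score = sum(w * (y - f) for w, y, f in zip(weights, response, fitted))
        info = sum(w * f * (1.0 - f) for w, f in zip(weights, fitted))
        if info == 0.0:
            break
        eps += score / info
    return eps, [expit(o + eps) for o in offsets]


# Hypothetical inputs: the second unit has weight zero (off the regime).
response = [0.1, 0.3, 0.6, 0.2]
q_initial = [0.2, 0.25, 0.5, 0.3]
weights = [1.0, 0.0, 2.0, 1.5]
eps, updated = tmle_update(response, q_initial, weights)
print(eps, updated)
```

After the update, the weighted residuals sum to approximately zero, which is the estimating equation the targeting step is designed to solve; the updated predictions for all units would then be carried into the next ICE step.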

6.2.4. Bootstrap standard errors and confidence intervals

Standard errors were estimated using a nonparametric bootstrap. Specifically, for $B$ bootstrap replicates ($b = 1, 2, \ldots, B$), a resampled outcome variable $Y_{st}^{b} = \sum_{i=1}^{n_s} Y_{ist}^{b}$ ($t = 0, \ldots, 12$) was drawn from a multinomial distribution with $n_s$ trials and probabilities $n_s^{-1}(Y_{s0}, Y_{s1}, \ldots, \bar{Y}_{s,12})$, where $\bar{Y}_{s,12}$ denotes the number of individuals who survived beyond $t = 11$ in state $s$. IPTW, ICE, and TMLE estimators $\hat{\psi}_t^{IPTW,b}$, $\hat{\psi}_t^{ICE,b}$, $\hat{\psi}_t^{TMLE,b}$ were calculated on each replicate ($b = 1, \ldots, B$). Then, Wald 95% confidence intervals were computed using the standard deviation of the bootstrap estimates.
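The resampling scheme can be illustrated with a minimal Python sketch. It draws one replicate of weekly death counts for a single state by assigning each individual to a death week or to the survivor category, and it forms a Wald interval from a list of bootstrap estimates. The function names and the tiny population are hypothetical; with real state populations a dedicated multinomial sampler would be far more efficient than drawing individuals one at a time.

```python
import random
import statistics


def resample_state(deaths_by_week, population, rng):
    """One bootstrap replicate of weekly death counts for a state.

    Each of `population` individuals is assigned to one weekly death category
    or to the survivor category, with probabilities proportional to the
    observed counts (a multinomial draw with `population` trials).
    """
    survivors = population - sum(deaths_by_week)
    categories = list(range(len(deaths_by_week) + 1))
    draws = rng.choices(categories,
                        weights=list(deaths_by_week) + [survivors],
                        k=population)
    counts = [0] * len(categories)
    for category in draws:
        counts[category] += 1
    return counts[:-1], counts[-1]


def wald_interval(estimate, bootstrap_estimates, z=1.96):
    """Wald interval using the standard deviation of bootstrap estimates."""
    se = statistics.stdev(bootstrap_estimates)
    return estimate - z * se, estimate + z * se


rng = random.Random(0)
deaths, survivors = resample_state([5, 3, 2], 1000, rng)
print(deaths, survivors)
print(wald_interval(10.0, [9.0, 10.0, 11.0]))
```

In a full analysis, the estimators would be recomputed on each replicate across all states, and `wald_interval` would receive the resulting $B$ estimates for each week $t$.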

6.3. Results

Figure 2 shows results in the form of estimated U.S. weekly mortality rates per 100,000 person-weeks over the study period under the natural course (red) and under the hypothetical sustained treatment of setting $A_t = 1$ for all $t$, i.e., under a scenario where all 43 included states maintained stay-at-home orders through June 2020. The three estimators largely agree in their predictions that all-cause mortality rates would have been moderately lower throughout most of the study period had stay-at-home orders remained in place. Translating the counterfactual mortality rate estimates to lives saved, if all causal and modeling assumptions hold, based on TMLE, stay-at-home orders remaining in place from April through June 2020 would have saved approximately 11,100 (95% CI: 6,800, 15,500) lives in those 43 states during the same time period. Results based on ICE were similar (point estimate: 11,300, 95% CI: 6,900, 15,600), whereas IPTW gave a smaller point estimate and somewhat wider CI (point estimate: 4,100, 95% CI: −500, 8,700).

Figure 2:


Estimated U.S. weekly mortality rates - observed (red) and estimated under hypothetical treatment setting all states to remain under stay-at-home order using IPTW (green), ICE (blue), and TMLE (purple). Note that TMLE and ICE estimates and 95% CIs are nearly identical.

7. Extensions

7.1. Violations of parallel trends

In some applications, the parallel trends assumption (Assumption 3) may be questionable, and investigators may be interested in how inferences are altered by plausible deviations from parallel trends. A sensitivity analysis can be conducted as follows. Let

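$$
\Delta(\bar{w}_k, t) = E\{Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{W}_k = \bar{w}_k, \bar{A}_{k-1} = \bar{a}^*_{k-1}\} - E\{Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{W}_k = \bar{w}_k, \bar{A}_k = \bar{a}^*_k\},
$$

where $\Delta(\bar{w}_k, t)$ quantifies a deviation from parallel trends, which may depend on both the covariates $\bar{w}_k$ and time $t$. Then, consider the following statistical parameter:

$$
\psi_t^{\Delta} = E(Y_0) + \sum_{k=1}^{t} \int \Big\{ E(Y_k - Y_{k-1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k = \bar{w}_k) + \sum_{m=1}^{k} \Delta(\bar{w}_m, k) \Big\} \prod_{m=0}^{k} dF(w_m \mid \bar{w}_{m-1}, \bar{a}^*_{m-1}).
$$

If Assumptions 1 and 2 hold, then $\mu_t = \psi_t^{\Delta}$ (the proof follows from results in Appendix A). If a particular value is assumed known for $\Delta(\bar{w}_k, t)$, then estimation can proceed by defining

$$
\phi_{j,k}^{\Delta} = \int \Big[ E\Big\{ Y_j + \sum_{m=1}^{k} \Delta(\bar{w}_m, k) \,\Big|\, \bar{A}_k = \bar{a}^*_k, \bar{W}_k = \bar{w}_k \Big\} \Big] \prod_{m=0}^{k} dF(w_m \mid \bar{W}_{m-1} = \bar{w}_{m-1}, \bar{A}_{m-1} = \bar{a}^*_{m-1}),
$$

and noting that $\psi_t^{\Delta} = \phi_{0,0} + \sum_{k=1}^{t} (\phi_{k,k}^{\Delta} - \phi_{k-1,k})$. As with $\phi_{j,k}$, the parameter $\phi_{j,k}^{\Delta}$ is simply a special case of the usual g-formula, where the outcome variable is $Y_j + \sum_{m=1}^{k} \Delta(\bar{w}_m, k)$. Thus, the IPTW, ICE, and TMLE estimators can be used, replacing the outcome variable $Y_k$ with $Y_k + \sum_{m=1}^{k} \Delta(\bar{w}_m, k)$ while leaving $Y_{k-1}$ unchanged. In practice, $\Delta(\bar{w}_k, t)$ will typically not be known, and thus estimates may be computed over a range of plausible values of $\Delta(\bar{w}_k, t)$. Differences in trends between subgroups of units before discontinuation occurs may be helpful in determining plausible values of $\Delta(\bar{w}_k, t)$ (Roth and Rambachan, 2019).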
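To make the sensitivity analysis concrete, consider the simplest (assumed, purely illustrative) case in which the deviation is a constant, $\Delta(\bar{w}_k, t) = \delta$ for all $\bar{w}_k$ and $t$. The inner sum over $m = 1, \ldots, k$ then equals $k\delta$, so at the level of the statistical parameter the sensitivity-adjusted value at time $t$ is the unadjusted value plus $\delta \sum_{k=1}^{t} k = \delta\, t(t+1)/2$. The Python sketch below applies this shift over a grid of $\delta$ values; the function name and the numbers are hypothetical.

```python
def shift_for_constant_deviation(psi, delta):
    """Adjust psi_0, ..., psi_T for a constant deviation delta.

    With Delta equal to delta everywhere, the term added at time t is
    delta * (1 + 2 + ... + t) = delta * t * (t + 1) / 2.
    """
    return [value + delta * t * (t + 1) / 2 for t, value in enumerate(psi)]


# Hypothetical point estimates for t = 0, 1, 2 and a grid of deviations.
psi_hat = [1.0, 2.0, 3.0]
for delta in (-0.5, 0.0, 0.5):
    print(delta, shift_for_constant_deviation(psi_hat, delta))
```

Because the shift grows quadratically in $t$, even a small constant deviation can matter at later follow-up times, which is one reason to report results over a range of values rather than a single choice.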

7.2. Dynamic regimes

In addition to the static regimes considered above, the proposed approach can accommodate regimes where treatment decisions may depend on the history of covariates and/or treatments. Let $\bar{g} = \{g_0(w_0), g_1(\bar{w}_1), \ldots, g_\tau(\bar{w}_\tau)\}$ denote a dynamic regime, where $g_k(\bar{w}_k)$ returns the treatment value $a_k$ that would be assigned given covariate history $\bar{w}_k$. Note that $g_k(\cdot)$ may depend on treatment history as well, which we suppress for notational simplicity. Likewise, let $Y_k(\bar{g})$ be a potential outcome under treatment regime $\bar{g}$. Suppose interest is in the estimand $\mu_t^g = E\{Y_t(\bar{g})\}$. Then, consider the following modifications to Assumptions 1-3.

Assumption 4.

(SUTVA for dynamic regimes): If $\bar{A}_{it} = \bar{g}_t(\bar{W}_{it})$, then $Y_{it} = Y_{it}(\bar{g}_t)$ for $t \in \{0, 1, \ldots, \tau\}$.

Assumption 5.

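(Positivity for dynamic regimes): If $f\{\bar{w}_t \mid \bar{A}_{t-1} = \bar{g}_{t-1}(\bar{W}_{t-1})\} > 0$, then $f\{g_t(\bar{w}_t) \mid \bar{W}_t = \bar{w}_t, \bar{A}_{t-1} = \bar{g}_{t-1}(\bar{w}_{t-1})\} > 0$, for $\bar{w}_t \in \bar{\mathcal{W}}_t$; $t \in \{1, 2, \ldots, \tau\}$.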

Assumption 6.

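(Parallel trends for dynamic regimes): For $t \in \{1, 2, \ldots, \tau\}$, $k \le t$:

$$
E\{Y_t(\bar{g}) - Y_{t-1}(\bar{g}) \mid \bar{W}_k, \bar{A}_{k-1} = \bar{g}_{k-1}(\bar{W}_{k-1})\} = E\{Y_t(\bar{g}) - Y_{t-1}(\bar{g}) \mid \bar{W}_k, \bar{A}_k = \bar{g}_k(\bar{W}_k)\}
$$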

Lemma 2.

(Parallel trends g-formula, dynamic regimes) Define the functional (i.e., statistical parameter)

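$$
\psi_t^g \equiv E(Y_0) + \sum_{k=1}^{t} \int E\{Y_k - Y_{k-1} \mid \bar{W}_k = \bar{w}_k, \bar{A}_k = \bar{g}_k(\bar{w}_k)\} \prod_{m=0}^{k} dF\{w_m \mid \bar{w}_{m-1}, \bar{g}_{m-1}(\bar{w}_{m-1})\}
$$

Under a staggered discontinuation design, if Assumptions 4-6 hold, then $\psi_t^g = \mu_t^g$.

The proof of Lemma 2 follows from results in Appendix A. Thus, the IPTW, ICE, and TMLE estimators described can be used, with $\phi_{j,k}$ appropriately redefined.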
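As an illustration of what a dynamic regime looks like in practice, the Python sketch below encodes a hypothetical rule $g_k$ that assigns treatment whenever the most recent covariate value exceeds a threshold, and checks whether an observed treatment history is compatible with that rule at every time point. The rule, the threshold of 50, and the function names are assumptions made for the example and do not appear in the paper.

```python
def threshold_rule(w_history, threshold=50.0):
    """Hypothetical g_k: treat (1) if the latest covariate exceeds threshold."""
    return 1 if w_history[-1] > threshold else 0


def follows_regime(a_history, w_history, rule):
    """True if A_k equals g_k(W-bar_k) at every time k observed so far."""
    return all(a == rule(w_history[:k + 1])
               for k, a in enumerate(a_history))


# Covariate history (e.g., case growth per 100k) and two treatment histories.
w_history = [80.0, 60.0, 20.0]
print(follows_regime([1, 1, 0], w_history, threshold_rule))
print(follows_regime([1, 0, 0], w_history, threshold_rule))
```

An indicator of this kind plays the role that $I(\bar{A}_k = \bar{1})$ plays for the static regime: it selects the units whose observed data are compatible with the regime of interest up to time $k$.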

8. Discussion

This paper considers a new approach to identifying effects of sustained intervention strategies based on an assumption set that includes parallel trends. This assumption is popular in difference-in-differences because it allows for some degree of unmeasured confounding (Zeldow and Hatfield, 2021). Recently, parallel trends assumptions have been leveraged to target sustained treatment estimands, mainly considering certain types of treatment regimes (Callaway and Sant’Anna, 2021; de Chaisemartin and D’Haultfoeuille, 2020, 2021b,a). Relative to previous work, the main contribution of this paper is a framework for estimating marginal intervention-specific means for general treatment regimes (including dynamic regimes) under parallel trends. This is accomplished by building on IPTW, g-computation, and doubly robust TMLE developed in the context of sequential exchangeability, thus connecting disparate causal inference literatures from biostatistics (Robins, 1986, 2000; Bang and Robins, 2005; van der Laan and Gruber, 2012) and econometrics (Ashenfelter and Card, 1985; Callaway and Sant’Anna, 2021). Independently and concurrently with the present work, Shahn et al. (2022) developed g-estimation of structural nested models for general sustained treatment regimes under parallel trends, with results that imply identification for the intervention-specific means considered here. While it is possible (but complex) to estimate the latter quantity using the g-estimation approach of Shahn et al. (2022), the main strength of g-estimation is in exploring effect heterogeneity by time-varying covariates.

Regarding the example presented in Section 6, care should be taken when assuming parallel trends for pandemic-related outcomes without conditioning on pandemic state variables such as infection rates, as marginal parallel trends are incompatible with standard epidemic models (Callaway and Li, 2021). DID methods have previously been used to estimate effects of stay-at-home orders on the treated (e.g. Fowler et al., 2021). The methods in this paper allow for (i) a different target parameter that may more directly correspond to decisions facing policy makers and public health officials (Maldonado and Greenland, 2002), and (ii) adjustment for time-varying pandemic state variables likely affected by prior treatment, which DID methods have only recently begun to consider (Callaway and Li, 2021). That said, assessing the effects of stay-at-home orders is complex, and a comprehensive analysis would need to consider potential biases not factored into the present analysis; e.g., there is likely some interference (Haber et al., 2021). Thus, the application results are not meant to inform policy or scientific conclusions.

The approach presented here may have application in many other contexts. Many U.S. state-level policies have changed in such a way as to accommodate a staggered discontinuation design, including in domains other than pandemic mitigation. Outside of staggered discontinuation designs, methods developed in this paper apply more generally in settings where baseline potential outcomes are identified. For example, the approach could be used to estimate per-protocol effects in a clinical trial of a time-varying treatment regime with non-adherence.

Several areas for future research remain. First, it will be important to explore efficiency for competing estimators in this framework. Notably, the TMLE presented here is only known to be semiparametric efficient for the nuisance parameters ϕjk and not necessarily for the target parameter ψt (van der Laan and Gruber, 2012). Second, though parallel trends may be considered more plausible than sequential exchangeability in some settings, strategies for formally evaluating the assumption using domain knowledge, e.g. using causal diagrams, are in their infancy (e.g., Ghanem et al., 2022). Finally, the focus of this paper was on settings where one treatment regime is of interest. Future research could consider extensions beyond a single regimen, but caution should be exercised when assuming parallel trends for multiple regimens, which would imply certain restrictions on treatment effect heterogeneity that may not be plausible in some settings (Shahn et al., 2022).

Supplementary Material

Web appendices referenced in Sections 3, 5, and 7, code for the simulation study in Section 5, and data and code for the application in Section 6 are available with this paper at the Biometrics website on Wiley Online Library. The R package didgformula, available at https://github.com/audreyrenson/didgformula, implements the estimators, as well as the simulation study (in vignette “simulation”) and application results (in vignette “example”).

Acknowledgements

The authors thank Dr. Whitney Robinson for helpful comments. This research was supported by the NIH grants T32-HD091058–02, T32-AI007001, R01 AI085073, and P2C-HD050924. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.

Appendix A Proof of Lemma 1

Here we provide a formal proof by induction of Lemma 1.

Proof. First, note that, by adding and subtracting constants,

$$
E[Y_t(\bar{a}^*)] = E[Y_0(\bar{a}^*)] + \sum_{k=1}^{t} E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)],
$$

and by SUTVA (Assumption 1), $E[Y_0(\bar{a}^*)] = E[Y_0]$ under a staggered discontinuation design because $A_{i0} = a_0^*$ for all $i$. Thus it remains to prove that

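$$
E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)] = \int E[Y_k - Y_{k-1} \mid \bar{W}_k = \bar{w}_k, \bar{A}_k = \bar{a}^*_k] \prod_{m=0}^{k} dF(w_m \mid \bar{w}_{m-1}, \bar{a}^*_{m-1})
$$

for $1 \le k \le t$. As our induction hypothesis, assume temporarily that the following holds for some $m$ such that $1 \le m < k \le t$:

$$
E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)] = \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_m = \bar{w}_m, \bar{A}_m = \bar{a}^*_m] \prod_{s=0}^{m} dF(w_s \mid \bar{w}_{s-1}, \bar{a}^*_{s-1}) \tag{A1}
$$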

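If (A1) is true, then, so long as $m \le t-1$, it follows that:

$$
\begin{aligned}
E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)] &= \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_{m+1} = \bar{w}_{m+1}, \bar{A}_m = \bar{a}^*_m] \prod_{s=0}^{m+1} dF(w_s \mid \bar{w}_{s-1}, \bar{a}^*_{s-1}) \\
&= \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_{m+1} = \bar{w}_{m+1}, \bar{A}_{m+1} = \bar{a}^*_{m+1}] \prod_{s=0}^{m+1} dF(w_s \mid \bar{w}_{s-1}, \bar{a}^*_{s-1})
\end{aligned} \tag{A2}
$$

where the first equality is by iterated expectation and the second by Assumption 3. In other words, if (A1) holds for some $m$, (A2) proves that the same statement holds for $m+1 \le t$. Next, note that, for all $k$ such that $1 \le k \le t$, by iterated expectation and Assumption 3:

$$
\begin{aligned}
E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)] &= \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_1 = \bar{w}_1] \prod_{s=0}^{1} dF(w_s \mid w_{s-1}, a^*_{s-1}) \\
&= \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_1 = \bar{w}_1, \bar{A}_1 = \bar{a}^*_1] \prod_{s=0}^{1} dF(w_s \mid w_{s-1}, a^*_{s-1})
\end{aligned}
$$

which shows that (A1) holds for $m = 1$. Then, by (A2), (A1) holds for $m = 2, \ldots, k$. Finally, for $m = k$, by Assumption 1 we have: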

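$$
\begin{aligned}
E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*)] &= \int E[Y_k(\bar{a}^*) - Y_{k-1}(\bar{a}^*) \mid \bar{W}_k = \bar{w}_k, \bar{A}_k = \bar{a}^*_k] \prod_{s=0}^{k} dF(w_s \mid \bar{w}_{s-1}, \bar{a}^*_{s-1}) \\
&= \int E[Y_k - Y_{k-1} \mid \bar{W}_k = \bar{w}_k, \bar{A}_k = \bar{a}^*_k] \prod_{s=0}^{k} dF(w_s \mid \bar{w}_{s-1}, \bar{a}^*_{s-1})
\end{aligned}
$$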
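As a concrete instance of the identity just proved, consider the simplest case $t = 1$. The display below is a sketch of how the formula specializes, assuming the convention (implicit in the proof) that conditioning on histories indexed by $-1$ is vacuous, so that $dF(w_0 \mid \bar{w}_{-1}, \bar{a}^*_{-1}) = dF(w_0)$:

$$
E[Y_1(\bar{a}^*)] = E[Y_0] + \int\!\!\int E[Y_1 - Y_0 \mid \bar{W}_1 = \bar{w}_1, \bar{A}_1 = \bar{a}^*_1] \, dF(w_1 \mid w_0, a^*_0) \, dF(w_0).
$$

In words, the observed trend among units still following the regime at time 1 is standardized to the distribution of $(W_0, W_1)$ given $A_0 = a^*_0$ and then added to the baseline mean.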

Appendix B Proof of parallel trends in simulation data generating distribution

To prove that parallel trends holds under the simulation setup, we first introduce three additional assumptions which trivially hold in the simulation, and then show that they are sufficient to guarantee parallel trends.

Assumption 7.

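(Additive equi-confounding) For all $u, u' \in \mathcal{U}_0$,

$$
E[Y_t(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u] - E[Y_t(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u'] = E[Y_{t-1}(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u] - E[Y_{t-1}(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u']
$$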

Assumption 8.

(Latent ignorability)

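$$
(Y_t(\bar{a}_t), Y_{t-1}(\bar{a}_t)) \perp\!\!\!\perp A_k \mid U_0, \bar{W}_k, \bar{A}_{k-1} = \bar{a}^*_{k-1} \quad \text{for } 1 \le t \le T, \; k \le t
$$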

Assumption 9.

(Measured covariates conditionally independent of unmeasured ones)

$$
W_t \perp\!\!\!\perp U_0 \mid \bar{W}_{t-1}, \bar{A}_{t-1} = \bar{a}^*_{t-1} \quad \text{for } 1 \le t \le T
$$

Lemma 3.

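(Implied parallel trends) Under Assumptions 7-9,

$$
E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{W}_k, \bar{A}_{k-1} = \bar{a}^*_{k-1}] = E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{W}_k, \bar{A}_k = \bar{a}^*_k] \quad \text{for } t \in \{1, 2, \ldots, \tau\}, \; k \le t.
$$

Proof of Lemma 3.

To show that parallel trends (Assumption 3) are implied by Assumptions 7-9, we show both that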

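$$
E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k] = E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big] \tag{B1}
$$

and

$$
E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_{k-1} = \bar{a}^*_{k-1}, \bar{W}_k] = E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big] \tag{B2}
$$

To see (B1), note that, for any $t, k$ such that $1 \le k \le t \le T$, by repeated applications of iterated expectation and Assumption 8 we have:

$$
E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k] = E_{U_0 \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big( E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k, U_0}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}, U_0}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t, U_0] \big\} \Big] \Big)
$$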

By Assumption 7:

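$$
= E_{U_0 \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big( E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k, U_0}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}, U_0}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big] \Big)
$$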

By Assumption 9:

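$$
\begin{aligned}
&= E_{U_0 \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big( E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big] \Big) \\
&= E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big]
\end{aligned} \tag{B3}
$$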

The last equality holds because the inside terms do not depend on U0. To see (B2), repeated applications of iterated expectation and Assumption 8 again give:

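$$
E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_{k-1} = \bar{a}^*_{k-1}, \bar{W}_k] = E_{U_0 \mid \bar{A}_{k-1} = \bar{a}^*_{k-1}, \bar{W}_k}\Big( E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k, U_0}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}, U_0}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t, U_0] \big\} \Big] \Big)
$$

Similarly, by Assumptions 7 and 9:

$$
\begin{aligned}
&= E_{U_0 \mid \bar{A}_{k-1} = \bar{a}^*_{k-1}, \bar{W}_k}\Big( E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big] \Big) \\
&= E_{W_{k+1} \mid \bar{A}_k = \bar{a}^*_k, \bar{W}_k}\Big[ \cdots E_{W_t \mid \bar{A}_{t-1} = \bar{a}^*_{t-1}, \bar{W}_{t-1}}\big\{ E[Y_t(\bar{a}^*) - Y_{t-1}(\bar{a}^*) \mid \bar{A}_t = \bar{a}^*_t, \bar{W}_t] \big\} \Big]
\end{aligned}
$$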

which is exactly equal to (B3), thus proving that parallel trends (Assumption 3) hold whenever Assumptions 7-9 hold. This ends the proof of Lemma 3. □

Thus, since Assumptions 7-9 hold in the simulation data-generating distribution, parallel trends also holds. In particular, Assumptions 8-9 are easy to see in the simulation setup, and Assumption 7 holds because:

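$$
E[Y_t(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u] - E[Y_t(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u'] = E[Y_{t-1}(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u] - E[Y_{t-1}(\bar{a}^*) \mid \bar{W}_t, \bar{A}_t = \bar{a}^*_t, U_0 = u'] = \theta(u - u')
$$

where $\theta$ is the constant (over $t$) linear model coefficient relating $Y_t$ to $U_0$ in the simulation data generating distribution described in Section 5.1.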
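The mechanism behind this argument can be checked numerically with a small simulation. The Python sketch below uses a deliberately simplified, hypothetical data-generating process (it is not the one described in Section 5.1 and omits measured covariates): an unmeasured $U_0$ drives treatment uptake and enters both potential outcomes additively with the same coefficient $\theta$. Under these assumptions, mean outcome levels differ sharply between treatment groups, while mean trends are approximately equal.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def simulate_trend_gap(n=100_000, theta=2.0, seed=1):
    """Compare trends and levels between units with A1 = 1 and A1 = 0.

    Hypothetical setup: U0 ~ N(0, 1) is unmeasured, raises the probability
    of A1 = 1, and shifts Y_0(a*) and Y_1(a*) by the same amount theta * U0.
    Returns (gap in mean trends, gap in mean time-1 levels).
    """
    rng = random.Random(seed)
    trend = {0: [], 1: []}
    level = {0: [], 1: []}
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        a1 = 1 if rng.random() < 1.0 / (1.0 + math.exp(-u)) else 0
        y0 = 1.0 + theta * u + rng.gauss(0.0, 1.0)  # Y_0(a*)
        y1 = 1.5 + theta * u + rng.gauss(0.0, 1.0)  # Y_1(a*)
        trend[a1].append(y1 - y0)
        level[a1].append(y1)
    return mean(trend[1]) - mean(trend[0]), mean(level[1]) - mean(level[0])


trend_gap, level_gap = simulate_trend_gap()
print(f"gap in trends: {trend_gap:.3f}, gap in levels: {level_gap:.3f}")
```

The term $\theta U_0$ cancels in the difference $Y_1(\bar{a}^*) - Y_0(\bar{a}^*)$, which is why the trend gap is close to zero even though $U_0$ confounds comparisons of outcome levels.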

Footnotes

Software

The R package didgformula, available at https://github.com/audreyrenson/didgformula, implements the estimators, simulation study (in vignette “simulation”), and example results (in vignette “example”) described in the paper.

Data availability statement

The data that support the findings in this paper are available as part of the R package didgformula, available at https://github.com/audreyrenson/didgformula, in the dataset called stayathome2020.

References

  1. Ashenfelter O. and Card D. (1985). Using the longitudinal structure of earnings to estimate the effect of training programs. The Review of Economics and Statistics 67, 648–660.
  2. Bang H. and Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973.
  3. Callaway B. and Li T. (2021). Policy evaluation during a pandemic. arXiv preprint arXiv:2105.06927 1–37.
  4. Callaway B. and Sant’Anna PH (2021). Difference-in-differences with multiple time periods. Journal of Econometrics 225, 200–230.
  5. de Chaisemartin C. and D’Haultfoeuille X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review 110, 2964–2996.
  6. de Chaisemartin C. and D’Haultfoeuille X. (2021a). Difference-in-differences estimators of intertemporal treatment effects. arXiv preprint arXiv:2007.04267 1–64.
  7. de Chaisemartin C. and D’Haultfoeuille X. (2021b). Two-way fixed effects regressions with several treatments. arXiv preprint arXiv:2012.10077 1–34.
  8. Fowler J, Hill S, Levin R, and Obradovich N. (2021). Stay-at-home orders associate with subsequent decreases in COVID-19 cases and fatalities in the United States. PLoS One 16, e0248849.
  9. Ghanem D, Sant’Anna P, and Wüthrich K. (2022). Selection and parallel trends. arXiv preprint arXiv:2203.09001.
  10. Goodman-Bacon A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics 225, 254–277.
  11. Haber NA, Clarke-Deelder E, Salomon JA, Feller A, and Stuart EA (2021). Impact evaluation of coronavirus disease 2019 policy: A guide to common design issues. American Journal of Epidemiology 190, 2474–2486.
  12. Halloran ME and Hudgens MG (2016). Dependent happenings: A recent methodological review. Current Epidemiology Reports 3, 297–305.
  13. Maldonado G. and Greenland S. (2002). Estimating causal effects. International Journal of Epidemiology 31, 422–429.
  14. Marcus M. and Sant’Anna PH (2021). The role of parallel trends in event study settings: An application to environmental economics. Journal of the Association of Environmental and Resource Economists 8, 235–275.
  15. Rambachan A. and Roth J. (2022). A more credible approach to parallel trends. Working Paper 1–47.
  16. Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393–1512.
  17. Robins JM (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. Health Service Research Methodology: A Focus on AIDS 113–159.
  18. Robins JM (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 95–133. Springer.
  19. Robins JM, Hernán MÁ, and Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560.
  20. Robins JM, Mark SD, and Newey WK (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 48, 479–495.
  21. Roth J. (2019). Pre-test with caution: Event-study estimates after testing for parallel trends. Working Paper 1–54.
  22. Roth J, Sant’Anna PH, Bilinski A, and Poe J. (2022). What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. arXiv preprint arXiv:2201.01194.
  23. Sant’Anna PHC and Zhao J. (2022). Doubly robust difference-in-differences estimators. Journal of Econometrics 219, 101–122.
  24. Shahn Z, Dukes O, Richardson D, Tchetgen Tchetgen E, and Robins J. (2022). Structural nested mean models under parallel trends assumptions. arXiv preprint arXiv:2204.10291 1–42.
  25. Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician 56, 29–38.
  26. Tran L, Yiannoutsos C, Wools-Kaloustian K, Siika A, van der Laan M, and Petersen M. (2019). Double robust efficient estimators of longitudinal treatment effects: Comparative performance in simulations and a case study. International Journal of Biostatistics 15, 1–27.
  27. van der Laan MJ and Gruber S. (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. International Journal of Biostatistics 8, 1–39.
  28. Zeldow B. and Hatfield LA (2021). Confounding and regression adjustment in difference-in-differences studies. Health Services Research 56, 932–941.
