Identifying and estimating effects of sustained interventions under parallel trends assumptions

Audrey Renson; Michael G Hudgens; Alexander P Keil; Paul N Zivich; Allison E Aiello

doi:10.1111/biom.13862

Author manuscript; available in PMC: 2024 Feb 16.

Published in final edited form as: Biometrics. 2023 Apr 25;79(4):2998–3009. doi: 10.1111/biom.13862

PMCID: PMC10539489 NIHMSID: NIHMS1902004 PMID: 36989497

Abstract

Many research questions in public health and medicine concern sustained interventions in populations defined by substantive priorities. Existing methods to answer such questions typically require a measured covariate set sufficient to control confounding, which can be questionable in observational studies. Difference-in-differences relies instead on the parallel trends assumption, allowing for some types of time-invariant unmeasured confounding. However, most existing difference-in-differences implementations are limited to point treatments in restricted subpopulations. We derive identification results for population effects of sustained treatments under parallel trends assumptions. In particular, in settings where all individuals begin follow-up with exposure status consistent with the treatment plan of interest but may deviate at later times, a version of Robins’ g-formula identifies the intervention-specific mean under SUTVA, positivity, and parallel trends. We develop consistent asymptotically normal estimators based on inverse-probability weighting, outcome regression, and a double robust estimator based on targeted maximum likelihood. Simulation studies confirm theoretical results and support the use of the proposed estimators at realistic sample sizes. As an example, the methods are used to estimate the effect of a hypothetical federal stay-at-home order on all-cause mortality during the COVID-19 pandemic in spring 2020 in the United States.

Keywords: causal inference, difference-in-differences, g-formula, observational study, unmeasured confounding

1. Introduction

Many epidemiologic and other empirical studies concern the effects of sustained treatment strategies on population average outcomes over time. A sustained treatment or intervention is one that sets values of a time-varying exposure via a predetermined plan or algorithm. For example, clinical studies are often concerned with optimal dosing plans for therapeutic drugs, and policy investigations are often concerned with policies that determine exposure distributions repeatedly over time for the population residing in a jurisdiction.

Existing approaches to estimating effects of sustained interventions in well-defined populations include g-computation (Robins, 1986), inverse probability of treatment weighted (IPTW) marginal structural models (Robins, 2000; Robins et al., 2000), g-estimation of structural nested models (Robins, 1989), and double robust methods such as augmented IPTW (Bang and Robins, 2005) and targeted maximum likelihood (van der Laan and Gruber, 2012). Importantly, these approaches base causal identification on a sequential version of exchangeability (Robins, 1986), also known as sequential ignorability or no unmeasured confounders (Robins, 2000). Sequential exchangeability posits that the potential outcomes are independent of treatment, given the history of some set of measured (possibly time-varying) covariates and treatment; this assumption is unverifiable and can be implausible in many settings. For example, individuals may select medical treatments based on unmeasured risk factors, and public policies are decided in highly complex political contexts that may influence health.

In contrast, difference-in-differences (DID) methods typically base identification on parallel trends assumptions rather than sequential exchangeability (Ashenfelter and Card, 1985; Roth et al., 2022). Parallel trends assumptions posit that time trends in average potential outcomes are independent of the observed treatment (Ashenfelter and Card, 1985; Marcus and Sant’Anna, 2021). DID methods typically focus on the average treatment effect in the treated for a treatment occurring at a single time point, although recently extensions have considered certain types of sustained treatment regimes (for a review, see Roth et al., 2022). In particular, Callaway and Sant’Anna (2021) and de Chaisemartin and D’Haultfoeuille (2020, 2021b) consider effects conditional on each observed treatment path in a monotonic (i.e., staggered adoption) setting, meaning that values of the observed time-varying treatment can either increase or decrease over time, but not both. Somewhat more generally, de Chaisemartin and D’Haultfoeuille (2021a) consider interventions fixing a (possibly non-monotonic) binary or ordinal exposure to its baseline status, focusing on unconditional cost-effectiveness ratios and outcome regression estimators. Relevant to the present work, these recent DID developments have included doubly robust estimators (Callaway and Sant’Anna, 2021; Sant’Anna and Zhao, 2020).

This paper considers a more general setting and different estimands than previous DID implementations, focusing on marginal effects of general sustained treatment strategies under parallel trends assumptions. The remainder of this paper is organized as follows. Section 2 introduces notation and the assumed data structure. Section 3 presents new identification results for an intervention-specific mean under parallel trends, where identifying formulas are modifications of Robins’ (1986) g-computation algorithm formula (g-formula). Section 4 then presents consistent and asymptotically normal (CAN) estimators based on inverse probability weighting, outcome regression, and a double robust estimator that combines both. Section 5 presents simulation results that support the identification result and theoretical large sample properties of the proposed estimators. Section 6 presents an example estimating the number of lives that would have been saved by a U.S. federal stay-at-home order during the COVID-19 pandemic in spring 2020. Section 7 considers sensitivity analysis for violations of parallel trends and application of the proposed approach to dynamic treatment regimes. Section 8 concludes.

2. Preliminaries

2.1. Data

Suppose data $O_{i t} = \{W_{i t}, A_{i t}, Y_{i t}\}$ are observed on $i = 1, 2, \dots, n$ individuals (or units) at time points $t = 0, 1, \dots, τ$, where $W_{i t}$ are (possibly vector-valued) covariates, $A_{i t}$ are discrete, possibly multivariate treatments realized after $W_{i t}$, and $Y_{i t}$ are outcomes realized after $A_{i t}$, all measured without error. Denote the history of a variable with overbars, e.g., ${\bar{A}}_{i t} = (A_{i 0}, A_{i 1}, \dots, A_{i t})$, with ${\bar{A}}_{i} \equiv {\bar{A}}_{i τ}$ and $A_{i k} = \emptyset$ for $k < 0$ by convention. Upper case letters denote random variables, lower case letters denote specific realizations, and script letters (e.g., $\mathcal{W}$) denote supports. The $i$ subscript is omitted unless needed to resolve ambiguity. Throughout, it is assumed that ${\bar{O}}_{i} \equiv \{{\bar{W}}_{i}, {\bar{A}}_{i}, {\bar{Y}}_{i}\}$ $(i = 1, 2, \dots, n)$ are independent and identically distributed (iid) draws from a relevant target population.

Assume the data come from a staggered discontinuation design, defined as follows. Suppose the target estimand is $E\{Y_{t}({\bar{a}}^{*})\}$, where ${\bar{a}}^{*}$ denotes the treatment strategy or intervention plan of interest and $Y_{t}({\bar{a}}^{*})$ denotes a potential outcome, i.e., the value $Y_{t}$ would take under the intervention setting ${\bar{A}}_{t} = {\bar{a}}_{t}^{*}$. The approach in this paper requires that in the observed data distribution $Pr(A_{0} = a_{0}^{*}) = 1$, or in other words that $A_{i 0} = a_{0}^{*}$ for all $i$. Such a scenario is called a staggered discontinuation design with respect to the treatment plan ${\bar{a}}^{*}$ of interest, because units begin follow-up under the treatment plan but may discontinue at later times in a staggered way. Note that, in contrast to recent DID papers (e.g., Goodman-Bacon, 2021; Callaway and Sant’Anna, 2021), we do not require monotonic treatment assignment.
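The design condition can be checked directly on observed treatment paths. The Python sketch below is illustrative only: it assumes treatments are stored as one list per unit (`A[i][t]`), and the helper names `is_staggered_discontinuation` and `follows_plan` are hypothetical, not functions from the authors' R package.

```python
from typing import List, Sequence


def is_staggered_discontinuation(A: Sequence[Sequence[int]], a_star: Sequence[int]) -> bool:
    """Empirical check of Pr(A_0 = a*_0) = 1: every unit starts follow-up on the plan."""
    return all(path[0] == a_star[0] for path in A)


def follows_plan(A: Sequence[Sequence[int]], a_star: Sequence[int], k: int) -> List[bool]:
    """Return I(Abar_{ik} = abar*_k) for each unit i: on the plan at every time 0, ..., k."""
    return [all(path[m] == a_star[m] for m in range(k + 1)) for path in A]


# Hypothetical toy data for the plan abar* = (1, 1, 1); one row per unit.
a_star = [1, 1, 1]
A = [
    [1, 1, 1],  # remains on the plan throughout
    [1, 0, 0],  # discontinues at t = 1
    [1, 1, 0],  # discontinues at t = 2
    [1, 0, 1],  # non-monotone path: allowed, but off the plan from t = 1 onward
]

print(is_staggered_discontinuation(A, a_star))
print(follows_plan(A, a_star, 1))
```

The non-monotone fourth path is permitted by the design; it simply stops contributing to quantities conditional on ${\bar{A}}_{k} = {\bar{a}}_{k}^{*}$ once it leaves the plan.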

2.2. Motivating example

Consider the question, “what effects on all-cause mortality would a U.S. federal stay-at-home order have had in spring 2020 during the COVID-19 pandemic?” Let $Y_{i t}$ be a binary indicator that individual $i$ died during week $t$, $t = 0, 1, \dots, 11$, measured as weeks since April 6, 2020. Let $A_{i t}$ be a binary indicator that the state in which individual $i$ was living during week $t$ was under a state-level stay-at-home or shelter-in-place order. Suppose it is of interest to estimate $E\{Y_{t}(\bar{1})\} - E(Y_{t})$, the difference in U.S. mortality rates under a hypothetical federal stay-at-home order vs. under the observed treatment trajectory (i.e., the “natural course”). As of April 6, 43 of the 50 U.S. states were under stay-at-home orders, which were discontinued at times ranging from late April to late June, with the exception of California, which continued through December (Figure 1). Thus, the observed treatment trajectories give rise to a staggered discontinuation design with respect to the treatment plan ${\bar{a}}^{*} = \bar{1}$, which sets everyone in those 43 states to remain under a stay-at-home order. The methods developed below can be used to draw inference about what would have happened had such a policy been implemented.

Figure 1: Dates of state-issued stay-at-home orders in U.S. states during the COVID-19 pandemic in 2020

3. Identification

In this section we consider identification, given data $\bar{O}$, of the quantity $μ_{t} \equiv E\{Y_{t}({\bar{a}}^{*})\}$, the mean outcome at time $t$ under the intervention that sets all individuals to ${\bar{A}}_{i} = {\bar{a}}^{*}$. Throughout, it is assumed that interest lies in only one intervention ${\bar{a}}^{*}$. Note that $μ_{t}$ depends on ${\bar{a}}^{*}$, which is left implicit for notational simplicity. Consider the following assumptions:

Assumption 1.

(Stable Unit Treatment Value Assumption [SUTVA]): If ${\bar{A}}_{i t} = {\bar{a}}_{t}^{*}$, then $Y_{i t} = Y_{i t}({\bar{a}}_{t}^{*})$ for $t \in \{0, 1, \dots, τ\}$.

In words, Assumption 1 requires that an individual’s potential outcomes are not affected by other individuals’ treatments (i.e., no interference), allowing us to index potential outcomes by each individual’s treatments alone. Assumption 1 also requires that the treatment ${\bar{a}}^{*}$ is defined precisely enough so that, for individuals whose observed treatment equals ${\bar{a}}_{t}^{*}$ , observed outcomes can stand in for counterfactual outcomes under a hypothetical intervention ${\bar{a}}_{t}^{*}$ . Implicit in Assumption 1 is that the future cannot affect the past, or $Y_{i t} ({\bar{a}}_{k}^{*}) = Y_{i t} ({\bar{a}}_{t}^{*})$ for $k > t$ . Thus, SUTVA would be violated, for example, if individuals were able to anticipate future treatments and change their behavior. This particular violation has typically been addressed under a separate “no anticipation” assumption in the DID literature (e.g., Callaway and Sant’Anna, 2021). SUTVA is an unverifiable assumption in observational studies, though it is sometimes possible to test for the presence of interference (Halloran and Hudgens, 2016).

Assumption 2.

(Positivity): If $f({\bar{w}}_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}) > 0$, then $f({\bar{a}}_{t}^{*} ∣ {\bar{W}}_{t} = {\bar{w}}_{t}, {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}) > 0$, for ${\bar{w}}_{t} \in {\bar{\mathcal{W}}}_{t}$ and $t \in \{1, 2, \dots, τ\}$.

Here and throughout, $f (x ∣ \cdot)$ refers to a conditional density if $X$ is continuous, and a conditional probability mass function if $X$ is discrete. Assumption 2 requires that units whose treatment history up to time $t - 1$ is consistent with the regime in question $({\bar{a}}^{*})$ have positive probability of remaining under treatment plan ${\bar{a}}^{*}$ at time $t$ . Positivity can sometimes be a verifiable assumption. In particular, if both $\bar{W}$ and $\bar{A}$ are low-dimensional and discrete, then among units with ${\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}$ , if one observes units who remain under ${\bar{a}}^{*}$ at time $t$ in every stratum of ${\bar{W}}_{t}$ , this implies positivity in the population with probability 1 (but not necessarily the reverse).
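When $\bar{W}$ and $\bar{A}$ are discrete, the empirical check described above can be sketched as follows. This is a hypothetical Python helper that assumes covariates are stored per unit and time as hashable values. It only inspects covariate histories that actually appear in the sample, so an empty result is consistent with positivity, while strata that were never observed are not examined at all.

```python
from collections import defaultdict


def unsupported_strata(W, A, a_star, t):
    """Covariate histories Wbar_t in which no at-risk unit remains on the plan at time t.

    W[i][m] is unit i's (discrete, hashable) covariate value at time m and A[i][m] its
    treatment. At-risk units are those with Abar_{t-1} = abar*_{t-1}, as in Assumption 2.
    """
    supported = defaultdict(bool)  # stratum -> some at-risk unit has A_t = a*_t
    for w_path, a_path in zip(W, A):
        if all(a_path[m] == a_star[m] for m in range(t)):
            stratum = tuple(w_path[: t + 1])
            supported[stratum] = supported[stratum] or a_path[t] == a_star[t]
    return sorted(s for s, ok in supported.items() if not ok)


# Hypothetical data: binary W_0, W_1; plan abar* = (1, 1).
W = [[0, 0], [0, 0], [0, 1], [1, 1]]
A = [[1, 1], [1, 0], [1, 0], [1, 1]]
print(unsupported_strata(W, A, [1, 1], t=1))
```

A non-empty result flags observed strata without units remaining on the plan; with small samples these may be sampling zeros rather than true violations of Assumption 2.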

Assumption 3.

(Parallel trends): For $t \in \{1, 2, \dots, τ\}$ and $k \leq t$:

$$E\{Y_{t}({\bar{a}}^{*}) - Y_{t - 1}({\bar{a}}^{*}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}\} = E\{Y_{t}({\bar{a}}^{*}) - Y_{t - 1}({\bar{a}}^{*}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}\}$$

In words, Assumption 3 states that, among individuals whose treatment status is consistent with the intervention ${\bar{a}}^{*}$ up to time $k - 1$ , had (counter to fact) all individuals followed the intervention through time $t$ , trends would have been parallel for those who do and do not follow the intervention at time $k$ but have equal covariate histories. This assumption is very similar to those adopted in papers describing event study and staggered adoption DID designs such as Goodman-Bacon (2021) and Callaway and Sant’Anna (2021), and nearly identical to Assumption 12 in de Chaisemartin and D’Haultfoeuille (2021a). The latter differs by considering only certain types of treatment regimes and presuming a linear relation between covariates and counterfactual trends. Parallel trends is unverifiable, though closely related conditions can often be checked (Roth, 2019). Note that $W_{i t}$ may include prior outcomes $Y_{i m}$ for $m < t$ . However, if $Y_{i, t - 1}$ is included in $W_{i t}$ , then the parallel trends assumption is equivalent to sequential exchangeability, in which case existing causal inference methods for observational data with a longitudinal exposure can be used (e.g., Robins, 1986, 2000; van der Laan and Gruber, 2012).

The following lemma presents the main identification result of this paper, which shows that Assumptions 1-3 are sufficient to equate $μ_{t}$ to a function of the observed data distribution.

Lemma 1.

(Parallel trends g-formula) Define the functional (i.e., statistical parameter) $ψ_{t} \equiv E (Y_{0}) + \sum_{k = 1}^{t} \int E (Y_{k} - Y_{k - 1} ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}) \prod_{m = 0}^{k} d F (w_{m} ∣ {\bar{w}}_{m - 1}, {\bar{a}}_{m - 1}^{*})$ . Under a staggered discontinuation design and if Assumptions 1-3 hold, then $ψ_{t} = μ_{t}$ .

Here and throughout, $F(\cdot ∣ \cdot)$ refers to a conditional cumulative distribution function. Lemma 1 states that the target causal quantity $μ_{t}$ is identified by the parameter $ψ_{t}$. The parameter $ψ_{t}$ is referred to as the parallel trends g-formula because it represents a modification of the usual g-formula (the dependence of $ψ_{t}$ on ${\bar{a}}^{*}$ is also left implicit). A formal proof of Lemma 1 by induction is presented in Appendix A. Here we give a less formal explanation to build intuition. We have:

$$μ_{t} \equiv E\{Y_{t}({\bar{a}}^{*})\} = E\{Y_{0}({\bar{a}}^{*})\} + \sum_{k = 1}^{t} E\{Y_{k}({\bar{a}}^{*}) - Y_{k - 1}({\bar{a}}^{*})\} = E\{Y_{0}({\bar{a}}^{*})\} + \sum_{k = 1}^{t} E[E\{Y_{k}({\bar{a}}^{*}) - Y_{k - 1}({\bar{a}}^{*}) ∣ W_{0}\}] = E\{Y_{0}({\bar{a}}^{*})\} + \sum_{k = 1}^{t} E[E\{Y_{k}({\bar{a}}^{*}) - Y_{k - 1}({\bar{a}}^{*}) ∣ A_{0} = a_{0}^{*}, W_{0}\}]$$

where the first equality follows by adding and subtracting constants, the second by iterated expectation, and the third by Assumption 3. Repeatedly applying iterated expectation and Assumption 3, we have:

$$μ_{t} = E\{Y_{0}({\bar{a}}^{*})\} + \sum_{k = 1}^{t} E(\dots E[E\{Y_{k}({\bar{a}}^{*}) - Y_{k - 1}({\bar{a}}^{*}) ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}\} ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k - 1}] \dots ∣ A_{0} = a_{0}^{*}, W_{0}) = E(Y_{0}) + \sum_{k = 1}^{t} E[\dots E\{E(Y_{k} - Y_{k - 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}) ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k - 1}\} \dots ∣ A_{0} = a_{0}^{*}, W_{0}] = ψ_{t}$$

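where the second equality follows from Assumption 1: within the stratum ${\bar{A}}_{k} = {\bar{a}}_{k}^{*}$ it gives $Y_{k}({\bar{a}}^{*}) - Y_{k - 1}({\bar{a}}^{*}) = Y_{k} - Y_{k - 1}$, and, because $A_{0} = a_{0}^{*}$ for all units in a staggered discontinuation design, it also gives $E\{Y_{0}({\bar{a}}^{*})\} = E(Y_{0})$. The last equality follows from iterated expectation.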
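To build intuition for Lemma 1, the following Python sketch simulates a hypothetical two-period design (all parameter values are invented for illustration) in which an unmeasured, time-invariant confounder $U$ affects both discontinuation and outcomes, with the same coefficient at both times. With no measured covariates and $t = 1$, $ψ_{1}$ reduces to $E(Y_{0}) + E(Y_{1} - Y_{0} ∣ {\bar{A}}_{1} = {\bar{a}}_{1}^{*})$, which should recover $μ_{1}$, whereas the naive conditional mean among units that stayed on the plan should not.

```python
import random


def simulate(n: int, seed: int = 2023):
    """Hypothetical two-period staggered discontinuation design under the plan abar* = (0, 0).

    U is an unmeasured, time-invariant confounder whose effect on Y (2.0) is the same at
    t = 0 and t = 1, so parallel trends holds even though U drives discontinuation.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1.0 if rng.random() < 0.5 else 0.0
        a1 = 1 if rng.random() < 0.2 + 0.5 * u else 0  # A_0 = 0 for everyone
        y0 = 1.0 + 2.0 * u + rng.gauss(0.0, 1.0)
        y1 = 1.5 + 2.0 * u + 1.0 * a1 + rng.gauss(0.0, 1.0)
        rows.append((a1, y0, y1))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate(100_000)
stayed = [r for r in rows if r[0] == 0]  # units with Abar_1 = abar*_1 = (0, 0)

# Parallel trends g-formula without covariates: E(Y_0) + E(Y_1 - Y_0 | Abar_1 = abar*_1)
psi_1 = mean(r[1] for r in rows) + mean(r[2] - r[1] for r in stayed)
naive = mean(r[2] for r in stayed)  # E(Y_1 | Abar_1 = abar*_1): confounded by U
truth = 1.5 + 2.0 * 0.5  # mu_1 = E{Y_1(0, 0)} by construction

print(f"psi_1 = {psi_1:.3f}, naive = {naive:.3f}, truth = {truth}")
```

Under this data-generating process $E(U ∣ A_{1} = 0) = 0.15/0.55 \approx 0.27$ rather than 0.5, so the naive mean is biased downward; differencing removes the $2U$ term, which is exactly what parallel trends licenses.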

4. Estimators

This section presents estimators for the statistical parameter $ψ_{t}$, which equals the target quantity $μ_{t}$ under the assumptions stated above. The estimators in this section build on existing estimators of $μ_{t}$ under sequential exchangeability rather than parallel trends, all of which are CAN estimators of the g-formula by virtue of being solutions to unbiased estimating equations (Stefanski and Boos, 2002). Since $ψ_{t}$ is a linear (hence continuous and differentiable) function of several g-formulas, the same function applied to jointly CAN estimators of those g-formulas is a CAN estimator for $ψ_{t}$. The remainder of this section formalizes this logic and gives examples of specific estimators that can serve in this capacity. The estimators presented in this section are provided in an R package (see Supporting Information).

4.1. General form

Here we derive a general form of a CAN estimator for the target statistical parameter, $ψ_{t}$ . First, define:

$$ϕ_{j, k} = \int E(Y_{j} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k} = {\bar{w}}_{k}) \prod_{m = 0}^{k} dF(w_{m} ∣ {\bar{W}}_{m - 1} = {\bar{w}}_{m - 1}, {\bar{A}}_{m - 1} = {\bar{a}}_{m - 1}^{*}) \tag{1}$$

Equation (1) is the g-formula, developed in the context of identifying parameters like $μ_{t}$ under sequential exchangeability. Here, sequential exchangeability is not assumed, and therefore $ϕ_{j, k}$ is not interpretable as a causal parameter. Instead, existing estimators of the statistical parameter $ϕ_{j, k}$ are used to assemble estimators of $ψ_{t}$ (which equals the causal parameter $μ_{t}$ under Assumptions 1-3) by noting that $ψ_{t} = ϕ_{0, 0} + \sum_{k = 1}^{t} (ϕ_{k, k} - ϕ_{k - 1, k})$. Next, suppose there is an estimator ${\hat{ϕ}}_{j, k}$ of $ϕ_{j, k}$ that solves an unbiased estimating equation with estimating function $d_{ϕ_{j, k}}(O; ϕ_{j, k})$; i.e., $E\{d_{ϕ_{j, k}}(O; ϕ_{j, k})\} = 0$ at the true value of $ϕ_{j, k}$, and ${\hat{ϕ}}_{j, k}$ solves $0 = \sum_{i = 1}^{n} d_{ϕ_{j, k}}(O_{i}; {\hat{ϕ}}_{j, k})$. Let $ϕ = (ϕ_{0, 0}, ϕ_{0, 1}, \dots, ϕ_{t - 1, t}, ϕ_{t, t})$. Then simply define

$$d_{ψ_{t}}(O; ϕ, ψ_{t}) = ϕ_{0, 0} + \sum_{k = 1}^{t} (ϕ_{k, k} - ϕ_{k - 1, k}) - ψ_{t} \tag{2}$$

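Clearly, $E\{d_{ψ_{t}}(O; ϕ, ψ_{t})\} = 0$ at the true values of $ϕ$ and $ψ_{t}$, so an estimator $(\hat{ϕ}, {\hat{ψ}}_{t})$ that jointly solves the stacked equations $0 = \sum_{i} d_{ϕ_{j, k}}(O_{i}; {\hat{ϕ}}_{j, k})$ (for each component of $ϕ$) and $0 = \sum_{i} d_{ψ_{t}}(O_{i}; \hat{ϕ}, {\hat{ψ}}_{t})$ will yield a CAN estimator ${\hat{ψ}}_{t}$ for $ψ_{t}$.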
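Because equation (2) is linear in $ϕ$, the stacked estimator reduces to a fixed linear combination of the ${\hat{ϕ}}_{j, k}$, and its variance can be obtained from the same linear combination of their influence curves (equivalently, from the sandwich variance of the stacked equations). The Python sketch below assumes per-unit influence-curve values for each ${\hat{ϕ}}_{j, k}$ are already available; the function names and the dictionary layout keyed by $(j, k)$ are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (j, k) index of phi_{j,k}


def combine_phi(phi: Dict[Key, float], t: int) -> float:
    """psi_t = phi_{0,0} + sum_{k=1}^{t} (phi_{k,k} - phi_{k-1,k}), the root of equation (2)."""
    return phi[(0, 0)] + sum(phi[(k, k)] - phi[(k - 1, k)] for k in range(1, t + 1))


def combined_se(ic: Dict[Key, List[float]], t: int) -> float:
    """Standard error of psi_t-hat from per-unit influence-curve values of each phi-hat_{j,k}.

    Since psi_t is linear in phi, its influence curve is the same linear combination of
    the phi influence curves; the variance estimate is mean(IC_psi^2) / n.
    """
    n = len(ic[(0, 0)])
    ic_psi = [
        ic[(0, 0)][i] + sum(ic[(k, k)][i] - ic[(k - 1, k)][i] for k in range(1, t + 1))
        for i in range(n)
    ]
    return math.sqrt(sum(v * v for v in ic_psi) / n / n)


# Hypothetical estimates for t = 2.
phi_hat = {(0, 0): 1.0, (0, 1): 1.2, (1, 1): 1.5, (1, 2): 1.6, (2, 2): 2.0}
print(combine_phi(phi_hat, 2))  # 1.0 + (1.5 - 1.2) + (2.0 - 1.6)
```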

The following subsections show how different options for $d_{ϕ_{j k}} (O; ϕ_{j k})$ (including IPTW, g-computation, and targeted maximum likelihood estimation [TMLE]) can be constructed and stacked with (2) to form estimators of $ψ_{t}$ that inherit desirable properties (e.g., consistency, asymptotic normality, double robustness).

4.2. Inverse probability of treatment weighted (IPTW) estimator

Define stabilized inverse probability of treatment weights (IPTWs) as $π_{k}(\bar{a}; \bar{W}) = \prod_{m = 1}^{k} f(a_{m} ∣ {\bar{a}}_{m - 1}) / \prod_{m = 1}^{k} f(a_{m} ∣ {\bar{a}}_{m - 1}, {\bar{W}}_{m})$. Robins (2000, Lemma 1.1) showed that $ϕ_{j, k} = c_{j k}({\bar{a}}_{k}^{*})$, where $c_{j k}(\cdot)$ is the unique function such that $E[q_{j k}({\bar{A}}_{k}) \{Y_{j} - c_{j k}({\bar{A}}_{k})\} π_{k}(\bar{A}; \bar{W})] = 0$ for all functions $q_{j k}({\bar{A}}_{k})$ for which the expectation exists. This equation defines a regression of $Y_{j}$ on ${\bar{A}}_{k}$ weighted by $π_{k}(\bar{A}; \bar{W})$. In the context of sequential exchangeability, estimators based on this weighted regression formulation are called IPTW marginal structural model estimators and are given a causal interpretation (Robins et al., 1992; Robins, 2000). However, sequential exchangeability is not assumed here, so the weighted regression equation does not have a causal interpretation on its own. Instead, the above result, together with the results of Section 4.1, implies that an IPTW estimator of $ψ_{t}$ can be formed as a linear combination of IPTW marginal structural model estimators of $ϕ_{j, k}$.
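The weight formula translates directly into code once fitted treatment probabilities are available. The following Python sketch is a minimal illustration, assuming the numerator and denominator models (for example, logistic regressions) have already been fit and evaluated at each unit's observed treatment; the function name is hypothetical.

```python
from typing import List, Sequence


def stabilized_weights(
    num_probs: Sequence[Sequence[float]],
    den_probs: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """pi_k = prod_{m=1}^k f(A_m | Abar_{m-1}) / prod_{m=1}^k f(A_m | Abar_{m-1}, Wbar_m).

    num_probs[i][m] and den_probs[i][m] are fitted probabilities of unit i's observed
    treatment at time m, from the numerator (treatment history only) and denominator
    (treatment and covariate history) models. Time 0 is skipped, matching the product
    from m = 1; in a staggered discontinuation design A_0 = a*_0 with probability 1.
    """
    weights = []
    for num, den in zip(num_probs, den_probs):
        w = 1.0
        for m in range(1, k + 1):
            w *= num[m] / den[m]
        weights.append(w)
    return weights


# Hypothetical fitted probabilities for two units over t = 0, 1, 2.
num_probs = [[1.0, 0.8, 0.9], [1.0, 0.2, 0.5]]
den_probs = [[1.0, 0.5, 0.6], [1.0, 0.4, 0.5]]
print(stabilized_weights(num_probs, den_probs, k=2))
```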

For simplicity of presentation, assume that for $k = 0, 1, \dots, t$, $f(a_{k} ∣ {\bar{a}}_{k - 1})$ and $f(a_{k} ∣ {\bar{a}}_{k - 1}, {\bar{w}}_{k})$ are known up to a finite dimensional parameter. That is, define $g_{0, k}(\bar{a}, \emptyset) \equiv \prod_{s = 0}^{k} f(a_{s} ∣ {\bar{a}}_{s - 1})$ and $g_{0, k}(\bar{a}, \bar{w}) \equiv \prod_{s = 0}^{k} f(a_{s} ∣ {\bar{a}}_{s - 1}, {\bar{w}}_{s})$, and suppose we are willing to assume that $g_{0, k}(\bar{a}, \emptyset)$ is uniquely determined by the parametric model $g_{k}(\bar{a}, \emptyset; α_{0})$, and similarly that $g_{0, k}(\bar{a}, \bar{w}) = g_{k}(\bar{a}, \bar{w}; α_{1})$, where $α_{0}$ and $α_{1}$ are finite dimensional parameter vectors. Suppose we have an estimator that solves an unbiased estimating equation for $(α_{0}, α_{1})$; for example, $g_{k}(\bar{a}, \emptyset; α_{0})$ and $g_{k}(\bar{a}, \bar{w}; α_{1})$ may consist of generalized linear models with parameters estimated by maximum likelihood. Specify $c_{j k}({\bar{A}}_{k})$ as some appropriate functional form for the expected value of $Y_{j}$ conditional on ${\bar{A}}_{k}$ in the weighted data distribution, such as $c_{j k}({\bar{A}}_{k}) = γ_{0 j k} + γ_{1 j k} I({\bar{A}}_{k} = {\bar{a}}_{k}^{*})$ (i.e., leaving the model unrestricted when ${\bar{A}}_{k} = {\bar{a}}_{k}^{*}$). Then an estimator ${\hat{ϕ}}_{j, k}^{I P T W}$ that solves $0 = c_{j k}({\bar{a}}_{k}^{*}) - ϕ_{j, k}$ is CAN for $ϕ_{j, k}$ if $c_{j k}({\bar{A}}_{k})$, $g_{k}(\bar{A}, \emptyset; α_{0})$ and $g_{k}(\bar{A}, \bar{W}; α_{1})$ are correctly specified. Finally, stack the score equations for $g_{k}(\bar{a}, \emptyset; α_{0})$ and $g_{k}(\bar{a}, \bar{w}; α_{1})$, the weighted estimating equations $0 = \sum_{i} q_{j k}({\bar{A}}_{i k}) \{Y_{i j} - c_{j k}({\bar{A}}_{i k})\} π_{k}({\bar{A}}_{i}; {\bar{W}}_{i})$ for the parameters of $c_{j k}$ (e.g., with $q_{j k}({\bar{A}}_{k}) = \{1, I({\bar{A}}_{k} = {\bar{a}}_{k}^{*})\}^{T}$), the equations $0 = c_{j k}({\bar{a}}_{k}^{*}) - ϕ_{j, k}$ $(k = 0, 1, \dots, t; j = k - 1, k)$, and equation (2) to yield an estimator for $ψ_{t}$, say ${\hat{ψ}}_{t}^{I P T W}$.

In other words, the IPTW estimator for the target parameter of interest is ${\hat{ψ}}_{t}^{I P T W} = {\hat{ϕ}}_{0, 0}^{I P T W} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{I P T W} - {\hat{ϕ}}_{k - 1, k}^{I P T W})$, where ${\hat{ϕ}}_{j, k}^{I P T W}$ are IPTW estimators of the corresponding g-formula parameters. Note that under our assumption set, ${\hat{ϕ}}_{j, k}^{I P T W}$ are not estimators of causal quantities in and of themselves, but simply functions of the observed data that may be assembled appropriately to form the causal estimator ${\hat{ψ}}_{t}^{I P T W}$. Clearly, ${\hat{ψ}}_{t}^{I P T W}$ solves an estimating equation that is unbiased if $g_{k}(\bar{A}, \emptyset; α_{0})$, $g_{k}(\bar{A}, \bar{W}; α_{1})$ and $c_{j k}({\bar{A}}_{k})$ are all correctly specified, implying that ${\hat{ψ}}_{t}^{I P T W}$ is CAN for $ψ_{t}$ under the same conditions. However, IPTW estimators are known to be inefficient, and ${\hat{ψ}}_{t}^{I P T W}$ may inherit this property. The following subsections present estimators that may be more efficient than IPTW.
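Under the saturated choice $c_{j k}({\bar{A}}_{k}) = γ_{0 j k} + γ_{1 j k} I({\bar{A}}_{k} = {\bar{a}}_{k}^{*})$ with $q_{j k} = \{1, I({\bar{A}}_{k} = {\bar{a}}_{k}^{*})\}^{T}$, solving the weighted regression equations gives $c_{j k}({\bar{a}}_{k}^{*})$ as a weighted mean of $Y_{j}$ among units still on the plan through time $k$ (a Hajek-type IPTW estimator). The Python sketch below assembles ${\hat{ψ}}_{t}^{I P T W}$ from such means, assuming the weights $π_{k}$ have already been computed and that outcomes are stored as `Y[j][i]`; the names are hypothetical and variance estimation is omitted.

```python
def phi_iptw(y_j, A, a_star, weights, k):
    """IPTW estimate of phi_{j,k} with c_{jk}(Abar_k) = gamma_0 + gamma_1 I(Abar_k = abar*_k).

    The weighted estimating equations give gamma_0 + gamma_1 = weighted mean of Y_j
    among units on the plan through time k, with weights pi_k.
    """
    num = den = 0.0
    for y, path, w in zip(y_j, A, weights):
        if all(path[m] == a_star[m] for m in range(k + 1)):
            num += w * y
            den += w
    return num / den


def psi_iptw(Y, A, a_star, weights_by_k, t):
    """psi_t-hat = phi_{0,0} + sum_k (phi_{k,k} - phi_{k-1,k}); Y[j][i] is Y_j for unit i."""
    psi = phi_iptw(Y[0], A, a_star, weights_by_k[0], 0)
    for k in range(1, t + 1):
        w_k = weights_by_k[k]  # pi_k for every unit
        psi += phi_iptw(Y[k], A, a_star, w_k, k) - phi_iptw(Y[k - 1], A, a_star, w_k, k)
    return psi


# Hypothetical data: three units, plan abar* = (0, 0); the second unit discontinues at t = 1.
A = [[0, 0], [0, 1], [0, 0]]
Y = [[1.0, 2.0, 3.0], [2.0, 5.0, 5.0]]
weights_by_k = [[1.0, 1.0, 1.0], [1.0, 1.0, 3.0]]
print(psi_iptw(Y, A, [0, 0], weights_by_k, t=1))
```

Note that $ϕ_{k, k}$ and $ϕ_{k - 1, k}$ share the same weights and the same on-plan units, so each term of the sum is a weighted mean change $Y_{k} - Y_{k - 1}$ among units still following ${\bar{a}}^{*}$ at time $k$.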

4.3. Iterated conditional expectation (ICE) estimator

Bang and Robins (2005) describe an estimator of $ϕ_{j, k}$ based on the following iterated conditional expectation (ICE) representation:

$$ϕ_{j, k} = E(E[\dots E\{E(Y_{j} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}) ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k - 1}\} \dots ∣ {\bar{A}}_{1} = {\bar{a}}_{1}^{*}, {\bar{W}}_{1}] ∣ A_{0} = a_{0}^{*}, W_{0}) \tag{3}$$

which can equivalently be written as $ϕ_{j, k} = E\{Q_{0}^{j, k, 0}({\bar{a}}^{*})\}$, where, for $m = 0, 1, \dots, k$, $Q_{0}^{j, k, m}({\bar{a}}^{*}) = E\{Q_{0}^{j, k, m + 1}({\bar{a}}^{*}) ∣ {\bar{A}}_{m} = {\bar{a}}_{m}^{*}, {\bar{W}}_{m}\}$ and $Q_{0}^{j, k, k + 1}({\bar{a}}^{*}) = Y_{j}$.

An estimator of $ψ_{t}$ can then be formulated based on this representation. For simplicity, suppose we are willing to assume that $Q_{0}^{j, k, m}({\bar{a}}^{*})$ are known up to a finite dimensional parameter for $m = 0, 1, \dots, k$. That is, assume $Q_{0}^{j, k, m}({\bar{a}}^{*}) = Q^{j, k, m}({\bar{a}}^{*}; β_{m})$, where $β_{m}$, $m = 0, 1, \dots, k$, are finite dimensional parameters. For example, $Q^{j, k, m}({\bar{a}}^{*}; β_{m})$ may be a generalized linear model with parameters $β_{m}$. Suppose we have an unbiased estimating function $d_{j, k, m}\{O, Q^{j, k, m + 1}({\bar{a}}^{*}; β_{m + 1}); β_{m}\}$ for $β_{m}$. For example, if maximum likelihood is used, then $d_{j, k, m}\{O, Q^{j, k, m + 1}({\bar{a}}^{*}; β_{m + 1}); β_{m}\}$ is the vector of first derivatives of the model log-likelihood with respect to $β_{m}$. Including $Q^{j, k, m + 1}({\bar{a}}^{*}; β_{m + 1})$ as an argument to the estimating function makes explicit the nested nature of the iterated expectations being modeled. The ICE estimator of $ϕ_{j, k}$ is then defined (Bang and Robins, 2005) as the solution ${\hat{ϕ}}_{j, k}^{I C E}$ to $0 = \sum_{i = 1}^{n} d_{j, k}(O_{i}; ϕ_{j, k})$, where

$$d_{j, k}(O; ϕ_{j, k}) = \begin{pmatrix} d_{j, k, k}(O; β_{k}) \\ d_{j, k, k - 1}\{O, Q^{j, k, k}({\bar{a}}^{*}; β_{k}); β_{k - 1}\} \\ \vdots \\ d_{j, k, 0}\{O, Q^{j, k, 1}({\bar{a}}^{*}; β_{1}); β_{0}\} \\ Q^{j, k, 0}({\bar{a}}^{*}; β_{0}) - ϕ_{j, k} \end{pmatrix}$$

Then, simply stack $d_{j, k}(O; ϕ_{j, k})$ with (2) to yield an estimator ${\hat{ψ}}_{t}^{I C E}$ for $ψ_{t}$. In other words, the ICE estimator of the target parameter is ${\hat{ψ}}_{t}^{I C E} = {\hat{ϕ}}_{0, 0}^{I C E} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{I C E} - {\hat{ϕ}}_{k - 1, k}^{I C E})$, where each ${\hat{ϕ}}_{j, k}^{I C E}$ is an estimator of the corresponding g-formula parameter based on ICE g-computation. Clearly, ${\hat{ψ}}_{t}^{I C E}$ solves an unbiased estimating equation whenever all the iterated outcome models $\{Q^{j, k, m}({\bar{a}}^{*}; β_{m}) : k = 0, 1, \dots, t; j = k - 1, k; m = 0, 1, \dots, k\}$ are correctly specified. Estimators of $ϕ_{j, k}$ based on outcome regression generally have smaller asymptotic variance than IPTW estimators, and ${\hat{ψ}}_{t}^{I C E}$ may inherit this property.
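For discrete covariates, each nested regression in (3) can be made saturated (a stratum mean within each observed covariate history), which sidesteps parametric model choices. The Python sketch below is a hypothetical illustration under that assumption, not the GLM-based implementation described above: at each step it fits among units still on the plan through $m$ and predicts for units on the plan through $m - 1$, which are exactly the units needed for the next regression.

```python
from collections import defaultdict


def phi_ice(y_j, W, A, a_star, k):
    """ICE estimate of phi_{j,k} following the nested regressions in equation (3).

    Each regression E{Q^{m+1} | Abar_m = abar*_m, Wbar_m} is saturated (a stratum mean
    within each observed covariate history), which is correctly specified when W is
    discrete. Raises ZeroDivisionError if a needed stratum has no on-plan units.
    """
    n = len(y_j)

    def on_plan(i, m):  # I(Abar_{im} = abar*_m); True for m < 0 by convention
        return all(A[i][s] == a_star[s] for s in range(m + 1))

    q = {i: y_j[i] for i in range(n) if on_plan(i, k)}  # Q^{j,k,k+1} = Y_j
    for m in range(k, -1, -1):
        sums, counts = defaultdict(float), defaultdict(int)
        for i, value in q.items():  # fit among units with Abar_m = abar*_m
            key = tuple(W[i][: m + 1])
            sums[key] += value
            counts[key] += 1
        # Predict Q^{j,k,m} for units with Abar_{m-1} = abar*_{m-1} (everyone when m = 0).
        q = {}
        for i in range(n):
            if on_plan(i, m - 1):
                key = tuple(W[i][: m + 1])
                q[i] = sums[key] / counts[key]
    return sum(q.values()) / n


def psi_ice(Y, W, A, a_star, t):
    """psi_t-hat = phi_{0,0} + sum_k (phi_{k,k} - phi_{k-1,k}) with ICE estimates of each phi."""
    psi = phi_ice(Y[0], W, A, a_star, 0)
    for k in range(1, t + 1):
        psi += phi_ice(Y[k], W, A, a_star, k) - phi_ice(Y[k - 1], W, A, a_star, k)
    return psi


# Hypothetical discrete data: plan abar* = (0, 0); W[i] = (W_0, W_1); Y[j][i] is Y_j.
W = [[0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
A = [[0, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
Y = [[1.0, 2.0, 4.0, 6.0, 2.0], [3.0, 9.0, 5.0, 9.0, 5.0]]
print(psi_ice(Y, W, A, [0, 0], t=1))
```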

4.4. Doubly robust targeted maximum likelihood estimator (TMLE)

IPTW estimators are only guaranteed to be CAN if the treatment models are correctly specified, and ICE estimators are only guaranteed to be CAN if all the outcome models are correctly specified. Doubly robust estimators are CAN if either the outcome or treatment models are correct (but not necessarily both), which is an advantage because one is rarely certain that models are correctly specified.

Doubly robust estimators of $ϕ_{j, k}$ generally augment the ICE algorithm with predicted values from the treatment models used to construct the IPTWs. Such estimators are called semiparametric efficient if they solve the estimating equation corresponding to the following efficient influence curve (van der Laan and Gruber, 2012; Tran et al., 2019):

$$\sum_{m = 0}^{k} \frac{I({\bar{A}}_{m} = {\bar{a}}_{m}^{*})}{g_{0, m}({\bar{a}}^{*}, \bar{w})} \{Q_{0}^{j, k, m + 1}({\bar{a}}^{*}) - Q_{0}^{j, k, m}({\bar{a}}^{*})\} + Q_{0}^{j, k, 0}({\bar{a}}^{*}) - ϕ_{j, k} \tag{4}$$

with $g_{0, m}({\bar{a}}^{*}, \bar{w})$ and $Q_{0}^{j, k, m}$ defined as in previous sections. Many estimators correspond to this efficient influence curve, meaning that, when both the treatment and outcome models are correctly specified, they all have the smallest asymptotic variance of any regular asymptotically linear estimator (Bang and Robins, 2005; van der Laan and Gruber, 2012). We present one such example, a targeted maximum likelihood estimator (TMLE), which may outperform others in finite samples (Tran et al., 2019).
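Setting the empirical mean of (4) equal to zero and solving directly for $ϕ_{j, k}$ gives an augmented IPW (one-step) estimator, a different member of the same efficient class from the TMLE presented next. The Python sketch below assumes the cumulative treatment probabilities and the iterated outcome predictions have already been fit; the array layout and function names are hypothetical.

```python
def eic_values(on_plan, g, Q, phi):
    """Per-unit values of the efficient influence curve (4) for phi_{j,k}.

    on_plan[i][m] = I(Abar_{im} = abar*_m); g[i][m] = g_{0,m}, the cumulative probability
    of following the plan through m; Q[i][m] = Q^{j,k,m}(abar*) for m = 0, ..., k + 1,
    with Q[i][k + 1] = Y_j. Terms with on_plan[i][m] == 0 contribute nothing.
    """
    out = []
    for ind, gi, qi in zip(on_plan, g, Q):
        aug = sum(ind[m] / gi[m] * (qi[m + 1] - qi[m]) for m in range(len(gi)))
        out.append(aug + qi[0] - phi)
    return out


def phi_aipw(on_plan, g, Q):
    """Solve mean(EIC) = 0 for phi directly: mean of Q^0 plus the weighted augmentation."""
    return sum(eic_values(on_plan, g, Q, 0.0)) / len(Q)


# Hypothetical nuisance fits for k = 1 and two units; the second leaves the plan at m = 1.
on_plan = [[1, 1], [1, 0]]
g = [[1.0, 0.5], [1.0, 0.5]]
Q = [[1.0, 2.0, 4.0], [1.0, 2.0, 0.0]]  # last column is Y_j; unused when off the plan
phi_hat = phi_aipw(on_plan, g, Q)
print(phi_hat, sum(eic_values(on_plan, g, Q, phi_hat)) / len(Q))
```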

First consider the TMLE of $ϕ_{j, k}$. For simplicity, assume that the outcome models $\{Q_{0}^{j, k, m}({\bar{a}}^{*}) : m = 0, 1, \dots, k\}$ and treatment models $\{g_{0, m}({\bar{a}}^{*}) : m = 0, 1, \dots, k\}$ are known up to a finite dimensional parameter. That is, assume $g_{0, m}({\bar{a}}^{*}) = g_{m}({\bar{a}}^{*}; α_{m})$ and $Q_{0}^{j, k, m}({\bar{a}}^{*}) = Q^{j, k, m}({\bar{a}}^{*}; β_{m})$, where $α_{m}$ and $β_{m}$ are finite dimensional parameters, $m = 0, 1, \dots, k$. Then proceed as follows:

1. For $m = 0, 1, \dots, k$, estimate $α_{m}$, for example using maximum likelihood. Denote estimators of $α_{m}$ as ${\hat{α}}_{m}$ and the corresponding estimators of $g_{m}({\bar{a}}^{*}; α_{m})$ as $g_{m}({\bar{a}}^{*}; {\hat{α}}_{m})$.
2. For $m = k$, estimate $β_{m}$, for example using maximum likelihood, denoting this estimator ${\hat{β}}_{m}$. Calculate $Q_{i}^{j, k, m}({\bar{a}}^{*}; {\hat{β}}_{m})$ for each unit $i$ and denote this estimator ${\hat{Q}}_{i}^{j, k, m}({\bar{a}}^{*})$. Note that these are model predictions that depend on each unit's data, and so vary across units $i$.
3. Also for $m = k$, update the initial fit ${\hat{Q}}_{i}^{j, k, m}({\bar{a}}^{*})$ by fitting a new model, defined as $h\{Q_{i}^{j, k, m, *}({\bar{a}}^{*})\} = h\{{\hat{Q}}_{i}^{j, k, m}({\bar{a}}^{*})\} + ϵ_{j, k, m}$, where $h(\cdot)$ is an appropriate link function, $ϵ_{j, k, m}$ is an intercept, and $Q_{i}^{j, k, m, *}({\bar{a}}^{*})$ are conditional expectations under the updated model. For $m = k$, the response variable in this model is $Q_{i}^{j, k, k + 1}({\bar{a}}^{*}) = Y_{j}$. The logit link is recommended to ensure the estimator respects bounds implied by the data; if $Y_{j}$ does not lie in $[0, 1]$, it must first be rescaled to that interval for the logit link to be usable (van der Laan and Gruber, 2012). Estimators ${\hat{Q}}_{i}^{j, k, m, *}({\bar{a}}^{*})$ of the updated fit are found by maximizing an appropriate weighted likelihood with weights $I({\bar{A}}_{i m} = {\bar{a}}_{m}^{*}) / g_{m}({\bar{a}}^{*}; {\hat{α}}_{m})$.
4. Repeat steps 2–3, estimating $Q^{j, k, m}({\bar{a}}^{*}; β_{m})$ and $Q^{j, k, m, *}({\bar{a}}^{*})$ for $m = k - 1, k - 2, \dots, 0$, now using the updated predictions ${\hat{Q}}_{i}^{j, k, m + 1, *}({\bar{a}}^{*})$ from the previous iteration as the response variable.
5. The TMLE for $ϕ_{j, k}$ is then defined as ${\hat{ϕ}}_{j, k}^{T M L E} = n^{- 1} \sum_{i = 1}^{n} {\hat{Q}}_{i}^{j, k, 0, *}({\bar{a}}^{*})$ (transformed back to the original scale if $Y_{j}$ was rescaled).
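Step 3 is an intercept-only weighted logistic regression with an offset, so for a single $(j, k, m)$ it reduces to solving one score equation in $ϵ_{j, k, m}$. The following Python sketch illustrates that step with Newton-Raphson, assuming the response has already been rescaled to $[0, 1]$ and the initial predictions lie strictly inside $(0, 1)$. In practice one would typically use a GLM routine with an offset and prior weights; the function name here is hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fluctuate(q_init, response, weights, tol=1e-10, max_iter=50):
    """One targeting step: fit logit(Q*) = logit(Q-hat) + eps by weighted maximum likelihood.

    q_init: initial predictions Q-hat_i in (0, 1) for every unit; response: Y_j when
    m = k, or Q-hat^{m+1,*} when m < k, already in [0, 1]; weights: I(Abar_im = abar*_m)
    / g_m, so off-plan units get weight 0. Newton-Raphson on the scalar eps solves
    sum_i w_i (response_i - Q*_i) = 0.
    """
    offsets = [logit(q) for q in q_init]
    eps = 0.0
    for _ in range(max_iter):
        p = [expit(o + eps) for o in offsets]
        score = sum(w * (y - pi) for w, y, pi in zip(weights, response, p))
        info = sum(w * pi * (1.0 - pi) for w, pi in zip(weights, p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps, [expit(o + eps) for o in offsets]


# Hypothetical inputs; the last unit is off the plan (weight 0) but still gets a prediction.
q_init = [0.2, 0.5, 0.7, 0.4]
response = [1.0, 0.0, 1.0, 0.3]
weights = [2.0, 1.0, 1.5, 0.0]
eps, q_star = fluctuate(q_init, response, weights)
print(round(eps, 4), [round(q, 3) for q in q_star])
```

In step 4, the returned `q_star` values would serve as the response when moving to $m - 1$, and at $m = 0$ their mean gives ${\hat{ϕ}}_{j, k}^{T M L E}$ on the rescaled outcome scale.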

Then, the TMLE for $ψ_{t}$ is defined as ${\hat{ψ}}_{t}^{T M L E} = {\hat{ϕ}}_{0, 0}^{T M L E} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{T M L E} - {\hat{ϕ}}_{k - 1, k}^{T M L E})$. Since ${\hat{ϕ}}_{j, k}^{T M L E}$ solves the estimating equation corresponding to the efficient influence curve (4), it is CAN for $ϕ_{j, k}$ so long as either (i) the set of outcome models $\{Q^{j, k, m}({\bar{a}}^{*}; β_{m}) : m = 0, 1, \dots, k\}$ is correctly specified, or (ii) the set of treatment models $\{g_{m}({\bar{a}}^{*}; α_{m}) : m = 0, 1, \dots, k\}$ is correctly specified; it is not necessary that both be correct. Therefore, if one of these two conditions holds for all $k = 0, 1, \dots, t$ and $j = k - 1, k$, then ${\hat{ψ}}_{t}^{T M L E}$ will be CAN for $ψ_{t}$. The double robustness property carries through to ${\hat{ψ}}_{t}^{T M L E}$ because the estimating equation in (2) is unbiased if the estimating equations for all the $ϕ_{j, k}$ are unbiased, which is the case for ${\hat{ϕ}}_{j, k}^{T M L E}$ under condition (i) or (ii) above.

5. Simulation study

A simulation study was conducted to evaluate the finite sample performance of the IPTW, ICE, and TMLE estimators described in Section 4 when Assumptions 1-3 held and all models were correctly specified. The estimators were also evaluated under model misspecification, as described in Section 5.2. Code for the simulation is provided in an R package (see Supporting Information).

5.1. Data generating distribution

Data $O_{i t} = \{W_{i t} = (W_{i t 1}, W_{i t 2}), A_{i t}, Y_{i t}\}$; $i = 1, 2, \dots, n$; $t = 0, \dots, 5$ were generated from the following distributions:

- $U_{i 0} \sim \text{Bernoulli}\{\text{logit}^{-1} (ω_{0})\}$;
- $W_{i t 1} \sim \text{Bernoulli}\{\text{logit}^{-1} (α_{0 t} + α_{1 t} A_{i, t - 1})\}$;
- $W_{i t 2} \sim N (γ_{0 t} + γ_{1 t} A_{i, t - 1}, 1)$;
- $A_{i t} ∣ A_{i, t - 1} = 0 \sim \text{Bernoulli}\{\text{logit}^{-1} (δ_{0 t} + δ_{1 t} U_{i 0} + δ_{2 t} W_{i t 1} + δ_{3 t} W_{i t 2} + δ_{4 t} W_{i t 2}^{2})\}$;
- $Y_{i t} \sim N (β_{0 t} + β_{1 t} W_{i t 1} + β_{2 t} W_{i t 2} + β_{3 t} W_{i t 2}^{2} + β_{4 t} A_{i t} + θ U_{i 0}, 1)$;

with $A_{i 0} = 0$ and $A_{i t} = 1$ if $A_{i, t - 1} = 1$. The monotonic treatment assignment for $A_{i t}$ is not necessary, but it simplifies the analysis. In all analyses, $U_{i 0}$ is treated as unmeasured, while all other variables are observed.

Parallel trends hold in this setup. One sufficient set of conditions for parallel trends (proven in Appendix B) is that the only unmeasured variables (here, $U_{i 0}$):

- are time-invariant;
- do not affect time-varying covariates;
- enter the outcome model linearly with a coefficient that is constant over time (here, $θ$ does not vary with $t$).
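The data generating distribution above can be reproduced with a short numpy sketch. This is an illustration under stated assumptions, not the authors' R code:

- The function names and the parameter dictionary layout are hypothetical.
- The lagged treatment $A_{i,-1}$ needed at $t = 0$ is assumed to be 0.
- Each coefficient is drawn once from $N(0.2, 1)$, as described in the next paragraph.

```python
import numpy as np

T = 5  # follow-up times t = 0, ..., 5

def expit(x):
    # clip to avoid overflow warnings for extreme linear predictors
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))

def draw_parameters(rng):
    """Draw every coefficient once from N(0.2, 1); reuse across simulated datasets."""
    draw = lambda *shape: rng.normal(0.2, 1.0, size=shape)
    return {
        "omega0": draw(1)[0],      # intercept of the U_{i0} model
        "alpha": draw(T + 1, 2),   # W_{it1} model: (alpha_0t, alpha_1t)
        "gamma": draw(T + 1, 2),   # W_{it2} model: (gamma_0t, gamma_1t)
        "delta": draw(T + 1, 5),   # treatment model: (delta_0t, ..., delta_4t)
        "beta": draw(T + 1, 5),    # outcome model: (beta_0t, ..., beta_4t)
        "theta": draw(1)[0],       # time-constant effect of U_{i0} on Y_{it}
    }

def simulate(n, p, rng):
    U = rng.binomial(1, expit(p["omega0"]), size=n)  # unmeasured, time-invariant
    W1, W2, Y = (np.zeros((n, T + 1)) for _ in range(3))
    A = np.zeros((n, T + 1), dtype=int)              # A_{i0} = 0 for everyone
    a_prev = np.zeros(n, dtype=int)                  # assumption: A_{i,-1} = 0
    for t in range(T + 1):
        al, ga, de, be = p["alpha"][t], p["gamma"][t], p["delta"][t], p["beta"][t]
        W1[:, t] = rng.binomial(1, expit(al[0] + al[1] * a_prev))
        W2[:, t] = rng.normal(ga[0] + ga[1] * a_prev, 1.0)
        if t > 0:
            lin = (de[0] + de[1] * U + de[2] * W1[:, t]
                   + de[3] * W2[:, t] + de[4] * W2[:, t] ** 2)
            start = rng.binomial(1, expit(lin))
            A[:, t] = np.where(a_prev == 1, 1, start)  # monotone: once treated, stay treated
        Y[:, t] = rng.normal(be[0] + be[1] * W1[:, t] + be[2] * W2[:, t]
                             + be[3] * W2[:, t] ** 2 + be[4] * A[:, t]
                             + p["theta"] * U, 1.0)
        a_prev = A[:, t]
    return U, W1, W2, A, Y
```

In an analysis, `U` would be discarded before estimation, because it plays the role of the unmeasured confounder $U_{i0}$.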

We simulated 1,000 datasets for each of the sample sizes $n = 1{,}000$, 10,000, and 100,000. All parameters were drawn from a $N (0.2, 1)$ distribution, with the same parameter values used across all simulation runs. The target parameter was $μ_{5} = E\{Y_{5} (\bar{0})\} = -3.98$, the mean outcome at the end of follow-up had everyone remained untreated. The true difference compared to the natural course was $E\{Y_{5} (\bar{0})\} - E (Y_{5}) = -3.98 - (-3.88) = -0.10$.

5.2. Estimator implementation

For each simulated dataset, the estimators ${\hat{μ}}_{t}^{I P T W}$ , ${\hat{μ}}_{t}^{I C E}$ , and ${\hat{μ}}_{t}^{T M L E}$ were calculated using correctly specified generalized linear regression models estimated using maximum likelihood. Additionally, ${\hat{μ}}_{t}^{I P T W}$ was calculated with treatment models misspecified, ${\hat{μ}}_{t}^{I C E}$ with outcome models misspecified, and ${\hat{μ}}_{t}^{T M L E}$ with treatment models, outcome models, or both misspecified. All misspecified models omitted the term for $W_{i t 2}^{2}$ at each time $t$ .

5.3. Simulation results

Table 1 shows estimates of the bias and variance, and p-values from a Lilliefors test for normality, for each estimator of $μ_{5}$. The results suggest that all stated theoretical properties hold approximately in the simulated data:

- When all models are correctly specified, all estimators appear approximately unbiased, with variance decreasing as the sample size increases.
- When the outcome models are misspecified, the ICE estimator appears biased; when the treatment models are misspecified, the IPTW estimator appears biased.
- TMLE appears consistent when either the treatment or the outcome models are correctly specified, but not when both are misspecified, supporting the double robustness property.
- Based on Lilliefors tests, all estimators appear normally distributed for all sample sizes considered.

Table 1:

Simulation results

	n = 1,000			n = 10,000			n = 100,000

estimator	variance¹ × n	bias²	p³	variance¹ × n	bias²	p³	variance¹ × n	bias²	p³
ice_qfal	4.3	1.15	0.68	4.6	1.10	0.81	4.5	1.16	0.76
ice_true	4.2	−0.04	0.82	4.5	−0.02	0.23	4.2	0.02	0.45
iptw_gfal	4.5	1.17	0.86	4.7	1.17	0.49	4.6	1.25	0.69
iptw_true	6.0	−0.17	0.37	6.3	−0.08	0.87	7.4	−0.01	0.57
tmle_bfal	4.4	1.19	0.51	4.6	1.15	0.65	4.5	1.22	0.62
tmle_gfal	4.2	−0.08	0.86	4.5	−0.04	0.28	4.3	0.02	0.21
tmle_qfal	5.9	−0.09	0.15	6.4	−0.07	0.78	7.6	−0.02	0.41
tmle_true	5.4	−0.03	0.82	5.6	0.01	0.11	5.7	0.01	0.69


¹ Empirical variance of estimates over 1,000 simulated datasets.

² Multiplied by 100.

³ P-value for the Lilliefors test of the null hypothesis of normality.

Abbreviations: ice=iterated conditional expectation, iptw=inverse probability of treatment weighted, tmle=targeted maximum likelihood, qfal=outcome models misspecified, gfal=treatment models misspecified, bfal=both sets of models misspecified, true=all models correctly specified.
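The summary columns of Table 1 can be computed from the replicate estimates as sketched below. Two readings are assumptions:

- The variance column is interpreted as the empirical variance scaled by $n$, because its values stay roughly constant across sample sizes.
- Bias is multiplied by 100, following footnote 2.

The function name `summarize` is hypothetical. The Lilliefors p-value is not reproduced here; in Python it is available from `statsmodels.stats.diagnostic.lilliefors`.

```python
import numpy as np

def summarize(estimates, truth, n):
    """Return (100 * bias, n * empirical variance) over simulation replicates.

    estimates : one estimate of mu_5 per simulated dataset
    truth     : true value of mu_5 (e.g., -3.98 in this simulation)
    n         : sample size of each simulated dataset
    """
    est = np.asarray(estimates, dtype=float)
    bias_x100 = 100.0 * (est.mean() - truth)
    var_x_n = n * est.var(ddof=1)  # sample variance across replicates, scaled by n
    return bias_x100, var_x_n
```

Scaling by $n$ makes a $\sqrt{n}$-consistent estimator's column roughly constant across sample sizes. This is a quick visual check of the expected rate of convergence.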

6. COVID-19 application

6.1. Data

This section presents an analysis of the motivating example introduced in Section 2.2. Code and data are provided in the R package didgformula (see Supporting Information). The data sources are:

- State-level weekly mortality data come from the Centers for Disease Control and Prevention’s National Death Index.
- Weekly counts of COVID-19 cases come from the COVID-19 Data Repository at the Center for Systems Science and Engineering at Johns Hopkins University.
- Data on state-level stay-at-home orders come from the COVID-19 U.S. State Policy database.

Though the outcome variable of interest is an individual-level indicator of death in week $t$, this variable is not directly observed; instead, the observed data are counts of deaths occurring in each state. Let $s = 1, 2, \dots, 43$ be a state index. Let $Y_{i s t}$ be an indicator of mortality during week $t$ for the $i$th individual $(i = 1, \dots, n_{s})$ living in state $s$, where $n_{s}$ denotes the population size in state $s$, and $n = \sum_{s = 1}^{43} n_{s} \approx 309$ million. The observed outcome variable is $Y_{s t} = \sum_{i = 1}^{n_{s}} Y_{i s t}$, the state-level weekly sum of the individual-level mortality indicators, observed along with population counts $n_{s}$ (drawn from the 2010 Census). The observed treatment variable $A_{s t}$ is an indicator of state $s$ being under a stay-at-home order in week $t$. Finally, let $W_{s t}$ be the change in confirmed COVID-19 cases reported per 100k population over the previous four weeks (i.e., the difference from week $t - 4$ to $t$) in state $s$. Thus, in this example, the parallel trends assumption is conditional on the local state of the pandemic, which may be plausible for pandemic-related policies (Callaway and Li, 2021).
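The covariate $W_{st}$ can be built from weekly case data roughly as follows. This sketch assumes the input is an array of cumulative confirmed cases by state and week, so the four-week change is a difference of cumulative counts. The function name is hypothetical, and the didgformula package may construct the variable differently.

```python
import numpy as np

def four_week_change_per_100k(cum_cases, population):
    """W[s, t] = (C[s, t] - C[s, t - 4]) / n_s * 100,000.

    cum_cases  : (n_states, n_weeks) cumulative confirmed cases (assumed input format)
    population : (n_states,) population counts n_s
    Weeks without four prior weeks of data are set to NaN.
    """
    cum = np.asarray(cum_cases, dtype=float)
    pop = np.asarray(population, dtype=float)[:, None]
    W = np.full_like(cum, np.nan)
    W[:, 4:] = (cum[:, 4:] - cum[:, :-4]) / pop * 1e5
    return W
```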

6.2. Estimator implementation

6.2.1. IPTW

For the treatment models, the following parametric models pooled over $k = 1, \dots, 11$ were assumed:

f (A_{s k} ∣ {\bar{A}}_{s, k - 1}; α_{0}) = \text{Bernoulli}\{\text{logit}^{-1} (α_{00} + α_{01} ω (k) + α_{02} A_{s, k - 1})\}

f (A_{s k} ∣ {\bar{A}}_{s, k - 1}, {\bar{W}}_{s k}; α_{1}) = \text{Bernoulli}\{\text{logit}^{-1} (α_{10} + α_{11} ω (k) + α_{12} A_{s, k - 1} + α_{13} \log W_{s k})\}

where $ω (k)$ is a natural cubic spline basis with 3 degrees of freedom for time $k$. The outcome model $c_{j k} (\bar{A}) = γ_{0 j k} + γ_{1 j k} I ({\bar{A}}_{k} = \bar{1})$, $k = 1, \dots, 11$, $j = k, k - 1$ was specified, which allows the outcome to depend on the full exposure history. The parameters $α_{0} = (α_{00}, α_{01}, α_{02})$ and $α_{1} = (α_{10}, \dots, α_{13})$ were estimated using maximum likelihood, weighted by $1 / n_{s}$ to account for differing population sizes across states.

Then, $γ_{0 j k}$ and $γ_{1 j k}$, $k = 1, \dots, 11$, $j = k, k - 1$ were estimated by maximizing the state-level binomial likelihood weighted by the stabilized inverse probability of treatment weights $π_{k} (\bar{A}; \bar{W}, \hat{α}) = \prod_{m = 1}^{k} f (A_{m} ∣ {\bar{A}}_{m - 1}; {\hat{α}}_{0}) / \prod_{m = 1}^{k} f (A_{m} ∣ {\bar{A}}_{m - 1}, {\bar{W}}_{m}; {\hat{α}}_{1})$. Here ${\hat{α}}_{0}$ and ${\hat{α}}_{1}$ denote the maximum likelihood estimators of $α_{0}$ and $α_{1}$. The estimators ${\hat{ψ}}_{t}^{I P T W}$, $t = 0, \dots, 11$, were then calculated as ${\hat{ϕ}}_{0, 0}^{I P T W} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{I P T W} - {\hat{ϕ}}_{k - 1, k}^{I P T W})$, where ${\hat{ϕ}}_{j, k}^{I P T W} = {\hat{γ}}_{0 j k} + {\hat{γ}}_{1 j k}$ and ${\hat{γ}}_{0 j k}$, ${\hat{γ}}_{1 j k}$ denote the weighted maximum likelihood estimators.
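Given fitted probabilities from the numerator and denominator treatment models, the weights $π_k$ are cumulative products over $m = 1, \dots, k$. The sketch below uses hypothetical argument names: `p_num` and `p_den` hold fitted values of $P(A_m = 1 ∣ \cdot)$ for each state and week $m = 1, \dots, K$. It illustrates the formula only and is not the package code.

```python
import numpy as np

def stabilized_weights(p_num, p_den, A):
    """pi[s, k-1] = prod_{m=1}^{k} f_num(A_sm) / f_den(A_sm).

    p_num : (n_states, K) fitted P(A_m = 1 | A_bar_{m-1}) from the numerator model
    p_den : (n_states, K) fitted P(A_m = 1 | A_bar_{m-1}, W_bar_m) from the denominator model
    A     : (n_states, K) observed treatment A_sm for m = 1, ..., K
    """
    p_num = np.asarray(p_num, dtype=float)
    p_den = np.asarray(p_den, dtype=float)
    A = np.asarray(A)
    f_num = np.where(A == 1, p_num, 1.0 - p_num)  # probability of the observed treatment
    f_den = np.where(A == 1, p_den, 1.0 - p_den)
    return np.cumprod(f_num / f_den, axis=1)      # column k-1 holds pi_k
```

Because the numerator model omits $\bar{W}$, the weights are stabilized. Their mean stays near 1 when the models fit well, which is a useful diagnostic.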

6.2.2. ICE

For ICE estimators, the following parametric outcome regression models pooled over $k = 1, \dots, 11$ were assumed:

Q^{j, k, m} ({\bar{a}}^{*}; β_{m}) = \text{logit}^{-1}\{β_{0 j m} + β_{1 j m} ω (k) + β_{2 j m} ω (k) a_{m}^{*} + β_{3 j m} \log W_{m}\}

for $j = k, k - 1$ and $m = k, k - 1, \dots, 0$ , where again $ω (k)$ refers to a natural cubic spline basis with 3 degrees of freedom. Note that, due to the monotonic treatment pattern, the interaction between time and treatment allows the outcome to depend on the full exposure history. The parameters $β_{m}$ were estimated by maximizing a binomial quasilikelihood, with estimators denoted ${\hat{β}}_{m}$ . To account for varying state population sizes, state contributions to the quasilikelihood were weighted by $1 / n_{s}$ . Finally, ICE estimators ${\hat{ψ}}_{t}^{I C E}, t = 0, \dots, 11$ were calculated as ${\hat{ϕ}}_{0, 0}^{I C E} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{I C E} - {\hat{ϕ}}_{k - 1, k}^{I C E})$ , where ${\hat{ϕ}}_{j, k}^{I C E} = \sum_{r = 1}^{43} Q_{r}^{j, k, 0} ({\bar{a}}^{*}; {\hat{β}}_{m}) / 43$ (as there are 43 states included in the analysis).
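The ICE recursion for a single $ϕ_{j,k}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the didgformula implementation:

- Ordinary least squares replaces the weighted binomial quasi-likelihood.
- There is one covariate per time point.
- The regression at step $m$ uses main terms in $(W_0, \dots, W_m)$.
- The function name `ice_phi` is hypothetical.

At each step the model is fit among units with $\bar{A}_m = \bar{a}_m^*$, and predictions are carried back to the previous step.

```python
import numpy as np

def ice_phi(Y_j, W, A, a_star, k):
    """ICE g-computation sketch for phi_{j,k}.

    Y_j    : (n,) outcome at time j (only used where A_bar_k = a*_bar_k)
    W      : (n, K+1) one covariate per time point
    A      : (n, K+1) observed treatments
    a_star : (K+1,) treatment plan of interest
    """
    n = len(Y_j)
    # follows[:, m] is True when A_bar_m equals a*_bar_m
    follows = np.cumprod(A == a_star, axis=1).astype(bool)
    Q = np.asarray(Y_j, dtype=float).copy()  # Q^{j,k,k+1} = Y_j
    for m in range(k, -1, -1):
        X = np.column_stack([np.ones(n), W[:, : m + 1]])
        rows = follows[:, m]  # fit among regime followers through time m
        beta, *_ = np.linalg.lstsq(X[rows], Q[rows], rcond=None)
        Q = X @ beta          # predictions for all units, used as next response
    return Q.mean()           # average over the baseline covariate distribution
```

An OLS fit with an intercept reproduces the mean of its response. So when every unit follows the plan, the sketch returns the sample mean of $Y_j$, which is a convenient sanity check.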

6.2.3. TMLE

For TMLE, the same treatment models as specified for IPTW were used, along with the same outcome models as specified for ICE. Specifically, when estimating $ϕ_{j k}$, for the $m$th ICE step $(m = k, k - 1, \dots, 0)$, the TMLE updating step was performed by maximizing another weighted quasibinomial likelihood. This likelihood has response variable $Q_{s}^{j, k, m + 1} ({\bar{a}}^{*}; β_{m + 1})$, an intercept, and offset $Q_{s}^{j, k, m} ({\bar{a}}^{*}; {\hat{β}}_{m})$, and is weighted by $I ({\bar{A}}_{k} = \bar{1}) / g_{k} (\bar{A}, {\hat{α}}_{k})$. Predictions ${\hat{Q}}_{s}^{j, k, m, *} ({\bar{a}}^{*})$ from this model were then passed to the $(m - 1)$th ICE step, and the process was repeated for $m = k, k - 1, \dots, 0$. Finally, ${\hat{ψ}}_{t}^{T M L E}$, $t = 0, \dots, 11$ were calculated as ${\hat{ϕ}}_{0, 0}^{T M L E} + \sum_{k = 1}^{t} ({\hat{ϕ}}_{k, k}^{T M L E} - {\hat{ϕ}}_{k - 1, k}^{T M L E})$, where ${\hat{ϕ}}_{j, k}^{T M L E} = \sum_{r = 1}^{43} {\hat{Q}}_{r}^{j, k, 0, *} ({\bar{a}}^{*}) / 43$.

6.2.4. Bootstrap standard errors and confidence intervals

Standard errors were estimated using a nonparametric bootstrap. Specifically, for $B$ bootstrap replicates $(b = 1, 2, \dots, B)$, resampled outcome variables $Y_{s t}^{b} = \sum_{i = 1}^{n_{s}} Y_{i s t}^{b}$ $(t = 0, \dots, 12)$ were drawn from a multinomial distribution with $n_{s}$ trials and probabilities $n_{s}^{-1} (Y_{s 0}, Y_{s 1}, \dots, Y_{s, 12})$. Here $Y_{s, 12}$ denotes the number of individuals in state $s$ who survived beyond $t = 11$. The IPTW, ICE, and TMLE estimators ${\hat{ψ}}_{t}^{I P T W, b}$, ${\hat{ψ}}_{t}^{I C E, b}$, and ${\hat{ψ}}_{t}^{T M L E, b}$ were calculated on each replicate $(b = 1, \dots, B)$. Wald 95% confidence intervals were then computed using the standard deviation of the bootstrap estimates as the standard error.
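One bootstrap draw of the state-level death counts, and the resulting Wald interval, can be sketched as below. The sketch assumes deaths are supplied as a (states × weeks) integer array for $t = 0, \dots, 11$, with survivors forming the final multinomial cell $Y_{s,12}$. The function names are hypothetical, and re-running the three estimators on each replicate is omitted.

```python
import numpy as np

def resample_deaths(deaths, n_s, rng):
    """Draw Y^b_{st} for one bootstrap replicate.

    deaths : (n_states, T) weekly death counts Y_st, t = 0, ..., T-1
    n_s    : (n_states,) population sizes (number of multinomial trials)
    The last multinomial cell holds survivors beyond the final week.
    """
    deaths = np.asarray(deaths, dtype=int)
    out = np.empty_like(deaths)
    for s in range(deaths.shape[0]):
        survivors = n_s[s] - deaths[s].sum()
        probs = np.append(deaths[s], survivors) / n_s[s]
        out[s] = rng.multinomial(n_s[s], probs)[:-1]  # drop the survivor cell
    return out

def wald_ci(estimate, boot_estimates, z=1.959963984540054):
    """95% Wald interval using the bootstrap standard deviation as the SE."""
    se = np.std(boot_estimates, ddof=1)
    return estimate - z * se, estimate + z * se
```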

6.3. Results

Figure 2 shows estimated U.S. weekly mortality rates per 100,000 person-weeks over the study period under two scenarios:

- the natural course (red);
- the hypothetical sustained treatment of setting $A_{t} = 1$ for all $t$, i.e., all 43 included states maintaining stay-at-home orders through June 2020.

The three estimators largely agree that all-cause mortality rates would have been moderately lower throughout most of the study period, had stay-at-home orders remained in place. Translating the counterfactual mortality rate estimates into lives saved, the TMLE suggests the following: if all causal and modeling assumptions hold, stay-at-home orders remaining in place from April through June 2020 would have saved approximately 11,100 (95% CI: 6,800, 15,500) lives in those 43 states during the same period. Results based on ICE were similar (point estimate: 11,300, 95% CI: 6,900, 15,600). IPTW gave a smaller point estimate and a somewhat wider CI (point estimate: 4,100, 95% CI: −500, 8,700).

7. Extensions

7.1. Violations of parallel trends

In some applications, the parallel trends assumption (Assumption 3) may be questionable, and investigators may be interested in how inferences are altered by plausible deviations from parallel trends. A sensitivity analysis can be conducted as follows. Let

Δ ({\bar{w}}_{k}, t) = E\{Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}\} - E\{Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}\},

where $Δ ({\bar{w}}_{k}, t)$ quantifies a deviation from parallel trends, which may depend on both the covariates ${\bar{w}}_{k}$ and time $t$ . Then, consider the following statistical parameter:

ψ_{t}^{'} = E (Y_{0}) + \sum_{k = 1}^{t} \int \Big[E (Y_{k} - Y_{k - 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k} = {\bar{w}}_{k}) + \sum_{m = 1}^{k} Δ ({\bar{w}}_{m}, k)\Big] \prod_{m = 0}^{k} d F (w_{m} ∣ {\bar{w}}_{m - 1}, {\bar{a}}_{m - 1}^{*}) .

If Assumptions 1 and 2 hold, then $μ_{t} = ψ_{t}^{'}$ (the proof follows from results in Appendix A). If a particular value is assumed known for $Δ ({\bar{w}}_{k}, t)$ , then estimation can proceed by defining

ϕ_{j, k}^{'} = \int E\Big\{Y_{j} + \sum_{m = 1}^{k} Δ ({\bar{w}}_{m}, k) ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k} = {\bar{w}}_{k}\Big\} \prod_{m = 0}^{k} d F (w_{m} ∣ {\bar{W}}_{m - 1} = {\bar{w}}_{m - 1}, {\bar{A}}_{m - 1} = {\bar{a}}_{m - 1}^{*}),

and noting that $ψ_{t}^{'} = ϕ_{0, 0} + \sum_{k = 1}^{t} (ϕ_{k, k}^{'} - ϕ_{k - 1, k})$ . As with $ϕ_{j, k}$ , the parameter $ϕ_{j, k}^{'}$ is simply a special case of the usual g-formula, where the outcome variable is $Y_{j} + \sum_{m = 1}^{k} Δ ({\bar{w}}_{m}, k)$ . Thus, the IPTW, ICE, and TMLE estimators can be used, replacing outcome variables $Y_{k}$ with $Y_{k} + \sum_{m = 1}^{k} Δ ({\bar{w}}_{m}, k)$ (but not for $Y_{k - 1}$ ). In practice, $Δ ({\bar{w}}_{k}, t)$ will typically not be known, and thus estimates may be computed over a range of plausible values of $Δ ({\bar{w}}_{k}, t)$ . Differences in trends between subgroups of units before discontinuation occurs may be helpful in determining plausible values of $Δ ({\bar{w}}_{k}, t)$ (Roth and Rambachan, 2019).
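As a worked special case (an illustration, not a result stated in the paper), suppose the deviation is a known constant, $Δ ({\bar{w}}_{k}, t) \equiv δ$. Each product of conditional distributions in $ψ_{t}^{'}$ integrates to one. The sensitivity parameter therefore shifts the parallel trends functional $ψ_{t} = E(Y_0) + \sum_{k=1}^{t} \int E(Y_k - Y_{k-1} ∣ \bar{A}_k = \bar{a}_k^*, \bar{W}_k = \bar{w}_k) \prod_{m=0}^{k} dF(w_m ∣ \bar{w}_{m-1}, \bar{a}_{m-1}^*)$ by a deterministic amount:

```latex
\sum_{m=1}^{k} \Delta(\bar{w}_m, k) = k\delta
\quad\Longrightarrow\quad
\psi_t' = \psi_t + \sum_{k=1}^{t} k\delta = \psi_t + \delta\,\frac{t(t+1)}{2}.
```

For example, with $t = 11$ the shift is $66δ$. Under this constant-deviation assumption, a grid of $δ$ values maps directly to a grid of adjusted estimates, without refitting any models.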

7.2. Dynamic regimes

In addition to the static regimes considered above, the proposed approach can accommodate regimes in which treatment decisions depend on the history of covariates and/or treatments. Let $\bar{g} = \{g_{0} (w_{0}), g_{1} ({\bar{w}}_{1}), \dots, g_{τ} ({\bar{w}}_{τ})\}$ denote a dynamic regime, where $g_{k} ({\bar{w}}_{k})$ returns the treatment value $a_{k}$ that would be assigned given covariate history ${\bar{w}}_{k}$. Note that $g_{k} (\cdot)$ may also depend on treatment history, which we suppress for notational simplicity. Likewise, let $Y_{k} (\bar{g})$ be the potential outcome under treatment regime $\bar{g}$. Suppose interest is in the estimand $μ_{t}^{g} = E\{Y_{t} (\bar{g})\}$. Then, consider the following modifications to Assumptions 1-3.
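To make the notation concrete, the sketch below encodes a hypothetical threshold rule, "treat at time $k$ when the current covariate exceeds $c$". It then computes the indicator $I\{\bar{A}_k = \bar{g}_k(\bar{W}_k)\}$ that replaces $I(\bar{A}_k = \bar{a}_k^*)$ in the estimators. The rule, the threshold `c`, and the function names are illustrative assumptions only.

```python
import numpy as np

def threshold_regime(w_hist, c):
    """Hypothetical rule g_k: assign a_k = 1 when the current covariate exceeds c."""
    return (w_hist[:, -1] > c).astype(int)

def follows_regime(A, W, regime):
    """follows[i, k] is True when A_bar_ik = g_bar_k(W_bar_ik).

    A      : (n, K+1) observed treatments
    W      : (n, K+1) one covariate per time point
    regime : function mapping covariate history W[:, :k+1] to assigned a_k
    """
    n, K1 = A.shape
    match = np.empty((n, K1), dtype=bool)
    for k in range(K1):
        match[:, k] = A[:, k] == regime(W[:, : k + 1])
    # a unit follows the regime through k only if it matched at every time up to k
    return np.cumprod(match, axis=1).astype(bool)

A = np.array([[0, 1], [0, 0]])
W = np.array([[0.0, 2.0], [0.0, 2.0]])
f = follows_regime(A, W, lambda wh: threshold_regime(wh, 1.0))
```

Here unit 0 follows the rule at both times. Unit 1 deviates at $k = 1$, because it remained untreated although its covariate exceeded the threshold.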

Assumption 4.

(SUTVA for dynamic regimes): If ${\bar{A}}_{i t} = {\bar{g}}_{t} ({\bar{W}}_{i t})$, then $Y_{i t} = Y_{i t} ({\bar{g}}_{t})$ for $t \in \{0, 1, \dots, τ\}$.

Assumption 5.

(Positivity for dynamic regimes): If $f\{{\bar{w}}_{t} ∣ {\bar{A}}_{t - 1} = {\bar{g}}_{t - 1} ({\bar{W}}_{t - 1})\} > 0$, then $f\{g_{t} ({\bar{w}}_{t}) ∣ {\bar{W}}_{t} = {\bar{w}}_{t}, {\bar{A}}_{t - 1} = {\bar{g}}_{t - 1} ({\bar{w}}_{t - 1})\} > 0$, for ${\bar{w}}_{t} \in {\bar{𝓦}}_{t}$; $t \in \{1, 2, \dots, τ\}$.

Assumption 6.

(Parallel trends for dynamic regimes): For $t \in \{1, 2, \dots, τ\}$, $k \leq t$:

E\{Y_{t} (\bar{g}) - Y_{t - 1} (\bar{g}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k - 1} = {\bar{g}}_{k - 1} ({\bar{W}}_{k - 1})\} = E\{Y_{t} (\bar{g}) - Y_{t - 1} (\bar{g}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k} = {\bar{g}}_{k} ({\bar{W}}_{k})\}

Lemma 2.

(Parallel trends g-formula, dynamic regimes) Define the functional (i.e., statistical parameter)

ψ_{t}^{g} \equiv E (Y_{0}) + \sum_{k = 1}^{t} \int E\{Y_{k} - Y_{k - 1} ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{g}}_{k} ({\bar{w}}_{k})\} \prod_{m = 0}^{k} d F\{w_{m} ∣ {\bar{w}}_{m - 1}, {\bar{g}}_{m - 1} ({\bar{w}}_{m - 1})\}

Under a staggered discontinuation design and if Assumptions 4-6 hold, then $ψ_{t}^{g} = μ_{t}^{g}$ .

The proof of Lemma 2 follows from results in Appendix A. Thus, the IPTW, ICE, and TMLE estimators described can be used, with $ϕ_{j, k}$ appropriately redefined.

8. Discussion

This paper considers a new approach to identifying effects of sustained intervention strategies based on an assumption set that includes parallel trends. This assumption is popular in difference-in-differences because it allows for some degree of unmeasured confounding (Zeldow and Hatfield, 2021). Recently, parallel trends assumptions have been leveraged to target sustained treatment estimands, mainly considering certain types of treatment regimes (Callaway and Sant’Anna, 2021; de Chaisemartin and D’Haultfoeuille, 2020, 2021b,a).

Relative to previous work, the main contribution of this paper is a framework for estimating marginal intervention-specific means for general treatment regimes (including dynamic regimes) under parallel trends. This is accomplished by building on IPTW, g-computation, and doubly robust TMLE developed in the context of sequential exchangeability. The framework thus connects disparate causal inference literatures from biostatistics (Robins, 1986, 2000; Bang and Robins, 2005; van der Laan and Gruber, 2012) and econometrics (Ashenfelter and Card, 1985; Callaway and Sant’Anna, 2021).

Independently and concurrently with the present work, Shahn et al. (2022) developed g-estimation of structural nested models for general sustained treatment regimes under parallel trends, with results that imply identification for the intervention-specific means considered here. While it is possible (but complex) to estimate the latter quantity using the g-estimation approach of Shahn et al. (2022), the main strength of g-estimation is in exploring effect heterogeneity by time-varying covariates.

Regarding the example presented in Section 6, care should be taken when assuming parallel trends for pandemic-related outcomes without conditioning on pandemic state variables such as infection rates, as marginal parallel trends are incompatible with standard epidemic models (Callaway and Li, 2021). DID methods have previously been used to estimate effects of stay-at-home orders on the treated (e.g. Fowler et al., 2021). The methods in this paper allow for (i) a different target parameter that may more directly correspond to decisions facing policy makers and public health officials (Maldonado and Greenland, 2002), and (ii) adjustment for time-varying pandemic state variables likely affected by prior treatment, which DID methods have only recently begun to consider (Callaway and Li, 2021). That said, assessing the effects of stay-at-home orders is complex, and a comprehensive analysis would need to consider potential biases not factored into the present analysis; e.g., there is likely some interference (Haber et al., 2021). Thus, the application results are not meant to inform policy or scientific conclusions.

The approach presented here may have application in many other contexts. Many U.S. state-level policies have changed in such a way as to accommodate a staggered discontinuation design, including in domains other than pandemic mitigation. Outside of staggered discontinuation designs, the methods developed in this paper apply more generally in settings where baseline potential outcomes are identified. For example, the approach could be used to estimate per-protocol effects in a clinical trial of a time-varying treatment regime with non-adherence.

Several areas for future research remain. First, it will be important to explore efficiency for competing estimators in this framework. Notably, the TMLE presented here is only known to be semiparametric efficient for the nuisance parameters $ϕ_{j k}$ and not necessarily for the target parameter $ψ_{t}$ (van der Laan and Gruber, 2012). Second, though parallel trends may be considered more plausible than sequential exchangeability in some settings, strategies for formally evaluating the assumption using domain knowledge, e.g. using causal diagrams, are in their infancy (e.g., Ghanem et al., 2022). Finally, the focus of this paper was on settings where one treatment regime is of interest. Future research could consider extensions beyond a single regimen, but caution should be exercised when assuming parallel trends for multiple regimens, which would imply certain restrictions on treatment effect heterogeneity that may not be plausible in some settings (Shahn et al., 2022).

Supplementary Material

Web appendices referenced in Sections 3, 5, and 7, code for the simulation study in Section 5, and data and code for the application in Section 6 are available with this paper at the Biometrics website on Wiley Online Library. The R package didgformula, available at https://github.com/audreyrenson/didgformula, implements the estimators, as well as the simulation study (in vignette “simulation”) and the application results (in vignette “example”).

Web appendices (PDF, 195.4KB)

Simulation replication materials (ZIP, 22.2KB)

Acknowledgements

The authors thank Dr. Whitney Robinson for helpful comments. This research was supported by the NIH grants T32-HD091058–02, T32-AI007001, R01 AI085073, and P2C-HD050924. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.

Appendix A Proof of Lemma 1

Here we provide a formal proof by induction of Lemma 1.

Proof. First, note that, by adding and subtracting constants,

E [Y_{t} ({\bar{a}}^{*})] = E [Y_{0} ({\bar{a}}^{*})] + \sum_{k = 1}^{t} E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})],

and by SUTVA (Assumption 1), $E [Y_{0} ({\bar{a}}^{*})] = E [Y_{0}]$ under a staggered discontinuation design because $A_{i 0} = a_{0}^{*}$ for all $i$ . Thus it remains to prove that

E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})] = \int E [Y_{k} - Y_{k - 1} ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}] \prod_{m = 0}^{k} d F (w_{m} ∣ {\bar{w}}_{m - 1}, {\bar{a}}_{m - 1}^{*})

for $1 \leq k \leq t$ . As our induction hypothesis, assume temporarily that the following holds for some $m$ such that $1 \leq m < k \leq t$ :

E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})] = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{m} = {\bar{w}}_{m}, {\bar{A}}_{m} = {\bar{a}}_{m}^{*}] \prod_{s = 0}^{m} d F (w_{s} ∣ {\bar{w}}_{s - 1}, {\bar{a}}_{s - 1}^{*})

(A1)

If (A1) is true, then, so long as $m \leq k - 1$, it follows that:

E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})] = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{m + 1} = {\bar{w}}_{m + 1}, {\bar{A}}_{m} = {\bar{a}}_{m}^{*}] \prod_{s = 0}^{m + 1} d F (w_{s} ∣ {\bar{w}}_{s - 1}, {\bar{a}}_{s - 1}^{*}) = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{m + 1} = {\bar{w}}_{m + 1}, {\bar{A}}_{m + 1} = {\bar{a}}_{m + 1}^{*}] \prod_{s = 0}^{m + 1} d F (w_{s} ∣ {\bar{w}}_{s - 1}, {\bar{a}}_{s - 1}^{*})

(A2)

where the first equality is by iterated expectation and the second by Assumption 3. In other words, if (A1) holds for some $m$, (A2) proves that the same statement holds for $m + 1 \leq k$. Next, note that, for all $k$ such that $1 \leq k \leq t$, by iterated expectation and Assumption 3:

E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})] = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{1} = {\bar{w}}_{1}] \prod_{s = 0}^{1} d F (w_{s} ∣ w_{s - 1}, a_{s - 1}^{*}) = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{1} = {\bar{w}}_{1}, {\bar{A}}_{1} = {\bar{a}}_{1}^{*}] \prod_{s = 0}^{1} d F (w_{s} ∣ w_{s - 1}, a_{s - 1}^{*})

which shows that (A1) holds for $m = 1$ . Then, by (A2), (A1) holds for $m = 2, \dots, k$ . Finally, for $m = k$ , by Assumption 1 we have:

E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*})] = \int E [Y_{k} ({\bar{a}}^{*}) - Y_{k - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}] \prod_{s = 0}^{k} d F (w_{s} ∣ {\bar{w}}_{s - 1}, {\bar{a}}_{s - 1}^{*}) = \int E [Y_{k} - Y_{k - 1} ∣ {\bar{W}}_{k} = {\bar{w}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}] \prod_{s = 0}^{k} d F (w_{s} ∣ {\bar{w}}_{s - 1}, {\bar{a}}_{s - 1}^{*})

□

Appendix B Proof of parallel trends in simulation data generating distribution

To prove that parallel trends holds under the simulation setup, we first introduce three additional assumptions which trivially hold in the simulation, and then show that they are sufficient to guarantee parallel trends.

Assumption 7.

(Additive equi-confounding) For all $u$ , $u^{'} \in 𝓤_{0}$ ,

E [Y_{t} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u] - E [Y_{t} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u^{'}] = E [Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u] - E [Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u^{'}]

Assumption 8.

(Latent ignorability)

(Y_{t} ({\bar{a}}_{t}), Y_{t - 1} ({\bar{a}}_{t})) ⫫ A_{k} ∣ U_{0}, {\bar{W}}_{k}, {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*} \quad \text{for } 1 \leq t \leq T,\ k \leq t

Assumption 9.

(Measured covariates conditionally independent of unmeasured ones)

W_{t} ⫫ U_{0} ∣ {\bar{W}}_{t - 1}, {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*} \quad \text{for } 1 \leq t \leq T

Lemma 3.

(Implied parallel trends) Under Assumptions 7–9,

E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}] = E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{k}, {\bar{A}}_{k} = {\bar{a}}_{k}^{*}] \quad \text{for } t \in \{1, 2, \dots, τ\},\ k \leq t .

Proof of Lemma 3.

To show that parallel trends (Assumption 3) are implied by Assumptions 7–9, we show both that

E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}] = E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]

(B1)

and

E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k}] = E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]

(B2)

To see (B1) note that, for any $t, k$ such that $1 \leq k \leq t \leq T$ , by repeated applications of iterated expectation and Assumption 8 we have:

E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}] = E_{U_{0} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} (E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}, U_{0}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}, U_{0}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}, U_{0}]} \dots])

By Assumption 7:

= E_{U_{0} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} (E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}, U_{0}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}, U_{0}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots])

By Assumption 9:

= E_{U_{0} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} (E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]) = E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]

(B3)

The last equality holds because the inside terms do not depend on $U_{0}$ . To see (B2), repeated applications of iterated expectation and Assumption 8 again give:

E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k}] = E_{U_{0} ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k}} (E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}, U_{0}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}, U_{0}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}, U_{0}]} \dots])

Similarly, by Assumptions 7 and 9:

= E_{U_{0} ∣ {\bar{A}}_{k - 1} = {\bar{a}}_{k - 1}^{*}, {\bar{W}}_{k}} (E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]) = E_{W_{k + 1} ∣ {\bar{A}}_{k} = {\bar{a}}_{k}^{*}, {\bar{W}}_{k}} [\dots E_{W_{t} ∣ {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}^{*}, {\bar{W}}_{t - 1}} {E [Y_{t} ({\bar{a}}^{*}) - Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, {\bar{W}}_{t}]} \dots]

which is exactly equal to (B3), thus proving that parallel trends (Assumption 3) hold whenever Assumptions 7-9 hold. This ends the proof of Lemma 3. □

Thus, since Assumptions 7-9 hold in the simulation data-generating distribution, parallel trends also holds. In particular, Assumptions 8-9 are easy to see in the simulation setup, and Assumption 7 holds because:

E [Y_{t} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u] - E [Y_{t} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u^{'}] = E [Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u] - E [Y_{t - 1} ({\bar{a}}^{*}) ∣ {\bar{W}}_{t}, {\bar{A}}_{t} = {\bar{a}}_{t}^{*}, U_{0} = u^{'}] = θ (u - u^{'})

where $θ$ is the constant (over $t$ ) linear model coefficient relating $Y_{t}$ to $U_{0}$ in the simulation data generating distribution described in Section 5.1.

Footnotes

Software

The R package didgformula, available at https://github.com/audreyrenson/didgformula, implements the estimators, simulation study (in vignette “simulation”), and example results (in vignette “example”) described in the paper.

Data availability statement

The data that support the findings in this paper are available as part of the R package didgformula, available at https://github.com/audreyrenson/didgformula, in the dataset called stayathome2020.

References

Ashenfelter O. and Card D. (1985). Using the longitudinal structure of earnings to estimate the effect of training programs. The Review of Economics and Statistics 67, 648–660.
Bang H. and Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973.
Callaway B. and Li T. (2021). Policy evaluation during a pandemic. arXiv preprint arXiv:2105.06927 1–37.
Callaway B. and Sant’Anna PH (2021). Difference-in-differences with multiple time periods. Journal of Econometrics 225, 200–230.
de Chaisemartin C. and D’Haultfoeuille X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review 110, 2964–2996.
de Chaisemartin C. and D’Haultfoeuille X. (2021a). Difference-in-differences estimators of intertemporal treatment effects. arXiv preprint arXiv:2007.04267 1–64.
de Chaisemartin C. and D’Haultfoeuille X. (2021b). Two-way fixed effects regressions with several treatments. arXiv preprint arXiv:2012.10077 1–34.
Fowler J, Hill S, Levin R. and Obradovich N. (2021). Stay-at-home orders associate with subsequent decreases in COVID-19 cases and fatalities in the United States. PLoS One. 16, e0248849. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ghanem D, Sant’Anna P. and Wüthrich K. (2022) Selection and parallel trends. arXiv Preprint arXiv:2203.09001.
Goodman-Bacon A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics 225, 254–277. [Google Scholar]
Haber NA, Clarke-Deelder E, Salomon JA, Feller A, and Stuart EA (2021). Impact evaluation of coronavirus disease 2019 policy: A guide to common design issues. American Journal of Epidemiology 190, 2474–2486. [DOI] [PMC free article] [PubMed] [Google Scholar]
Halloran ME and Hudgens MG (2016). Dependent happenings: A recent methodological review. Current Epidemiology Reports 3, 297–305. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maldonado G. and Greenland S. (2002). Estimating causal effects. International Journal of Epidemiology 31, 422–429. [PubMed] [Google Scholar]
Marcus M. and Sant’Anna PH (2021). The role of parallel trends in event study settings: An application to environmental economics. Journal of the Association of Environmental and Resource Economists 8, 235–275. [Google Scholar]
Rambachan A. and Roth J. (2022). A more credible approach to parallel trends. Working Paper 1–47
Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393–1512. [Google Scholar]
Robins JM (1989). The analysis of randomized and non-randomized aids treatment trials using a new approach to causal inference in longitudinal studies. Health Service Research Methodology: A Focus on AIDS 113–159.
Robins JM (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 95–133. Springer. [Google Scholar]
Robins JM, Herán MÁ, and Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560. [DOI] [PubMed] [Google Scholar]
Robins JM, Mark SD, and Newey WK (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 48, 479–495. [PubMed] [Google Scholar]
Roth J. (2019). Pre-test with caution: Event-study estimates after testing for parallel trends. Working Paper 1–54.
Roth J, Sant’Anna PH, Bilinski A, and Poe J. (2022). What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. arXiv preprint arXiv:2201.01194.
Sant’Anna Pedro H. C. and Zhao J. (2022). Doubly robust difference-in-differences estimators. Journal of Econometrics 219, 101–122. [Google Scholar]
Shahn Z, Dukes O, Richardson D, Tchetgen Tchetgen E, Robins J. (2022). Structural nested mean models under parallel trends assumptions. arXiv preprint arXiv:2204.10291 1–42.
Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician 56, 29–38. [Google Scholar]
Tran L, Yiannoutsos C, Wools-Kaloustian K, Siika A, van der Laan M, and Petersen M. (2019). Double robust efficient estimators of longitudinal treatment effects: Comparative performance in simulations and a case study. International Journal of Biostatistics 15, 1–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Laan MJ and Gruber S. (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. International Journal of Biostatistics 8, 1–39. [DOI] [PubMed] [Google Scholar]
Zeldow B. and Hatfield LA (2021). Confounding and regression adjustment in difference-in-differences studies. Health Services Research 56, 932–941. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Simulation replication materials

NIHMS1902004-supplement-Simulation_replication_materials.zip (22.2 KB, zip)