Author manuscript; available in PMC: 2024 Jun 1.
Published in final edited form as: Biometrics. 2022 Nov 9;79(2):569–581. doi: 10.1111/biom.13783

Instrumented difference-in-differences

Ting Ye 1, Ashkan Ertefaie 2, James Flory 3, Sean Hennessy 4, Dylan S Small 5
PMCID: PMC10484497  NIHMSID: NIHMS1900009  PMID: 36305081

Abstract

Unmeasured confounding is a key threat to reliable causal inference based on observational studies. Motivated by two powerful natural experiment devices, instrumental variables and difference-in-differences, we propose a new method called instrumented difference-in-differences that explicitly leverages exogenous randomness in an exposure trend to estimate the average and conditional average treatment effects in the presence of unmeasured confounding. We develop the identification assumptions using the potential outcomes framework. We propose a Wald estimator and a class of multiply robust and efficient semiparametric estimators, with provable consistency and asymptotic normality. In addition, we extend the instrumented difference-in-differences to a two-sample design to facilitate investigations of delayed treatment effects, and we provide a measure of weak identification. We demonstrate our results in simulated and real datasets.

Keywords: causal inference, effect modification, exclusion restriction, instrumental variables, multiple robustness

1 |. INTRODUCTION

Unmeasured confounding is a key threat to reliable causal inference based on observational studies (Lawlor et al., 2004; Rutter, 2007). A popular approach to handle unmeasured confounding is the instrumental variable (IV) method, which requires an IV that satisfies three core assumptions (Angrist et al., 1996; Baiocchi et al., 2014; Hernan & Robins, 2020): (i) (relevance) it is associated with the exposure; (ii) (independence) it is independent of any unmeasured confounder of the exposure–outcome relationship; (iii) (exclusion restriction) it has no direct effect on the outcome. By extracting exogenous variation in the exposure that is independent of the unmeasured confounder, IVs can be used to estimate the causal effect.

Meanwhile, the increasing availability of large longitudinal datasets such as administrative claims and electronic health records has created new opportunities to expand study designs to take advantage of the longitudinal structure. One method that is widely used in economics and other social sciences is difference-in-differences (DID) (Card & Krueger, 1994; Angrist & Pischke, 2008). The method of DID is based on a comparison of the trends in outcome for two exposure groups, where one group consists of individuals who switch from being unexposed to exposed and the other group consists of individuals who are never exposed. Under the parallel trends assumption, which says that the outcomes in the two exposure groups evolve in the same way over time in the absence of the exposure, DID is able to remove time-invariant bias from the unmeasured confounder. However, because the setup and assumptions of DID are motivated by applications in the social sciences, its applicability is limited in the biomedical sciences. For example, in the social sciences it is relatively common for a new policy to be applied to one region of the country but not another, creating a circumstance in which key assumptions such as parallel trends are likely to hold and facilitating a DID design. In the assignment of pharmacologic or other treatments in health care, such clear natural, exogenous sources of cleavage between exposed and unexposed groups are rare, making it more difficult to identify situations in which all assumptions of DID will be met.

In this paper, we connect these two powerful natural experiment devices (referred to as the standard IV and standard DID) and propose a new method called instrumented DID to estimate the causal effect of the exposure in the presence of unmeasured confounding. Unlike the standard DID, the instrumented DID exploits a haphazard encouragement, targeted at a subpopulation, toward faster uptake of the exposure, or a surrogate of such encouragement, which we call an IV for DID. Then, any observed nonparallel trends in outcome between the encouraged and unencouraged groups provide evidence for causation, as long as their trends in outcome would be parallel if all individuals were counterfactually not exposed. A prototypical example of instrumented DID is a longitudinal randomized experiment, where after a baseline period, some individuals are randomly selected to be encouraged to take the treatment regardless of their treatment history. If the encouragement is effective, the exposure rate would increase more for the encouraged group than for the unencouraged group. If additionally the encouragement has no direct effect on the trend in outcome, then any nonparallel trends in outcome must be due to the nonparallel trends in exposure. Therefore, by exploiting haphazard encouragement that affects the exposure trend, the instrumented DID is able to extract some variation in the exposure trend that is independent of the unmeasured confounder and to relax some of the most disputable assumptions of the standard IV and standard DID methods, particularly the exclusion restriction for the standard IV method and the parallel trends assumption for the standard DID method; see Section 2 for more discussion.

Reasoning similar to the instrumented DID has been applied informally in prior studies. A prominent example is the differential trends in smoking prevalence for men and women as a consequence of targeted tobacco advertising to women, which were associated with disproportional trends for men and women in lung cancer mortality (Burbank, 1972; Meigs, 1977; Patel et al., 2004). Specifically, because of marketing efforts designed to introduce specific women’s brands of cigarettes such as Virginia Slims in 1967, there was a considerable increase in smoking initiation by young women, which lasted through the mid-1970s (Pierce & Gilpin, 1995). Thirty years later, the lung cancer mortality rates for women at the age of 55 years or older had increased to almost four times the 1970 rate, whereas rates among men had no such dramatic change (Bailar & Gornik, 1997). In Section 7, we will analyze this example using the proposed method.

The rest of this paper is organized as follows. In Section 2, we establish the identification assumptions for the instrumented DID. In Section 3, we develop various estimation and inference approaches. In Section 4, we extend the instrumented DID to a two-sample design. In Section 5, we provide a measure of weak identification. Results from simulation studies and a real-data application are in Sections 6 and 7, respectively. The paper concludes with a discussion in Section 8. A review of IV and DID designs can be found in Section S1 of the Supporting information.

2 |. INSTRUMENTED DIFFERENCE-IN-DIFFERENCES: IDENTIFICATION

Suppose that random samples of a target population are collected at two time points t=0 and t=1, and there is no overlap between individuals in these two samples. We leave consideration of overlapping samples and multiple time points to future work. For each individual i in the pooled sample, we observe Oi = (Ti, Zi, Xi, Di, Yi), where Ti is a time indicator that equals one if this individual appears at t=1 and zero if at t=0, Zi is a binary IV for DID observed at the baseline, Xi is a vector of baseline covariates, Di is a binary exposure variable, and Yi is a real-valued outcome of interest. We assume that O1, ..., On are independent and identically distributed (i.i.d.) realizations of O = (T, Z, X, D, Y). This data setup is also commonly known as repeated cross-sectional data (Abadie, 2005).

We define causal effects using the potential outcomes framework (Neyman, 1923; Rubin, 1974). For each individual, let Dt(z) be the potential exposure if this individual were observed at time t and Z were externally set to z, and let Yt(d) be the potential outcome if this individual were observed and exposed to d at time t, with Z having the same value it actually had. The full data vector for each individual is (Z, X, Dt(z), Yt(d), t=0,1, z=0,1, d=0,1). Moreover, let Y(d) := T Y1(d) + (1-T) Y0(d) be the potential outcome if this individual were exposed to d at the time point at which it was actually sampled, with Z having the same value it actually had. Our target estimands are the average treatment effect (ATE) β0 = E(Y(1) - Y(0)) and the conditional average treatment effect (CATE) β0(v) = E(Y(1) - Y(0) | V=v), where V is a pre-specified subset of X representing the effect modifiers of interest; for example, setting V to be the empty set gives the unconditional ATE β0. Note that the separation of V and X separates the need to adjust for possible confounding from the specification of the effect modifiers of interest, which provides great flexibility and allows researchers to define the target estimand a priori. Throughout the paper, we consider treatment effects on the additive scale.

We make the following identification assumptions for using the instrumented DID.

Assumption 1.

  1. (Consistency) D=DT(Z) and Y=YT(D).

  2. (Positivity) 0<P(T=t,Z=z|X)<1 for t=0,1,z=0,1 with probability 1.

  3. (Random sampling) T ⊥ (Dt(z), Yt(d), t=0,1, z=0,1, d=0,1) | Z, X.

Assumption 1(a) states that the observed exposure is D=Dt(z) if and only if Z=z and T=t, and the observed outcome is Y=Yt(d) if and only if D=d and T=t. Implicit in this assumption is that an individual’s observed outcome is not affected by others’ exposure level or this individual’s exposure level at the other time point; this is known as the Stable Unit Treatment Value Assumption (Rubin, 1978, 1990). Assumption 1(b) postulates that there is a positive probability of receiving each (t,z) combination within each level of X, or equivalently, the support of X is the same for each level of (T,Z). Assumption 1(c) is often assumed for repeated cross-sectional studies and says that for each level of (Z,X), the collected data at every time point are a random sample from the underlying population; see Section 3.2.1 of Abadie (2005) that makes a similar assumption.

Assumption 2.

(Instrumented DID). With probability 1,

  1. (Trend relevance) E(D1(1) - D0(1) | Z=1, X) ≠ E(D1(0) - D0(0) | Z=0, X).

  2. (Independence & exclusion restriction) Z ⊥ (Dt(0), Dt(1), Y1(0) - Y0(0), Yt(1) - Yt(0), t=0,1) | X.

  3. (No unmeasured common effect modifier) Cov(Dt(1)-Dt(0),Yt(1)-Yt(0)|X)=0 for t=0,1.

  4. (Stable treatment effect over time) E(Y1(1)-Y1(0)|X)=E(Y0(1)-Y0(0)|X).

Assumptions 2(a) and (b) formalize the core assumptions that an IV for DID needs to satisfy, which are illustrated by a directed acyclic graph (DAG) in Figure 1. Assumptions 2(a) and (b) are also parallel to the core assumptions for the standard IV introduced in Section 1.

FIGURE 1

Directed acyclic graph (DAG) for instrumented difference-in-differences (DID). Suppose the existence of an unmeasured confounder Ut such that (D0, D1) ⊥ (Y0, Y1) | U0, U1, X. Assumption 2(a) states that Z must be associated with the change in exposure D1 - D0; Assumption 2(b) states that Z is independent of any unmeasured confounders U0, U1, has no direct effect on the change in outcome Y1 - Y0, and does not modify the treatment effect.

Assumption 2(a) says that the IV for DID Z, as an encouragement that disproportionately acts on only a subpopulation, affects the trend in exposure. For example, Z can be a random encouragement for some subjects in a longitudinal experiment, an advertisement campaign targeted at a certain geographic region or subpopulation, or a change in reimbursement policies for a certain insurance plan. Under Assumption 1, Assumption 2(a) is equivalent to E(D | T=1, Z=1, X) - E(D | T=0, Z=1, X) ≠ E(D | T=1, Z=0, X) - E(D | T=0, Z=0, X) with probability 1, and thus is checkable from observed data.

Assumption 2(b) is an integration of the independence and exclusion restriction assumptions. To see this, we adopt a more elaborate definition of the potential outcomes and define Yt(d,z) as the potential outcome if the individual were observed and exposed to d at time t and Z were externally set to z. Then Assumption 2(b) is implied by (independence) Z ⊥ (Dt(0), Dt(1), Y1(0,z) - Y0(0,z), Yt(1,z) - Yt(0,z), t=0,1, z=0,1) | X and (exclusion restriction) Yt(1,1) - Yt(0,1) | X ~d Yt(1,0) - Yt(0,0) | X and Y1(0,1) - Y0(0,1) | X ~d Y1(0,0) - Y0(0,0) | X, where ~d means having the same distribution; see Tan (2006) for a parallel statement for the standard IV and Hernán and Robins (2006) for connections and comparisons between different definitions of the standard IV. Hence, Assumption 2(b) essentially states that Z is unconfounded, has no direct effect on the trend in outcome, and does not modify the treatment effect. Here we see the main advantage of using Z as an IV for DID rather than as a standard IV: Z as an IV for DID is allowed to have a direct effect on the outcome, as long as it has no direct effect on the trend in outcome and does not modify the treatment effect. For example, Newman et al. (2012) considered using a hospital’s preference for phototherapy when treating newborns with hyperbilirubinemia as a standard IV to study the effect of phototherapy, but found evidence that hospitals that use more phototherapy also have greater use of infant formula, which is thought to be an effective treatment for hyperbilirubinemia. Hence, the hospital’s preference is a potentially invalid standard IV, as it can have a direct effect on the outcome through the use of infant formula. However, it may still qualify as an IV for DID if the use of phototherapy evolves differently between high- and low-preference hospitals over time but the use of infant formula in the two groups of hospitals does not change over time. These features suggest that variables like a hospital’s preference may be more plausible as IVs for DID than as standard IVs.

Assumption 2(c) was developed in Cui and Tchetgen Tchetgen (2021), and a slightly stronger version was proposed earlier in Wang and Tchetgen Tchetgen (2018). Suppose, in this paragraph only, the existence of an unmeasured confounder Ut such that (Dt(1), Dt(0)) ⊥ (Yt(1), Yt(0)) | Ut, X; then Assumption 2(c) holds if either (i) there is no additive Ut-z interaction in E(Dt(z) | Ut, X), that is, E(Dt(1) - Dt(0) | Ut, X) = E(Dt(1) - Dt(0) | X); or (ii) there is no additive Ut-d interaction in E(Yt(d) | Ut, X), that is, E(Yt(1) - Yt(0) | Ut, X) = E(Yt(1) - Yt(0) | X).

Assumption 2(d) requires that the CATE does not change over time. This is a strong assumption but may be plausible in many applications when the study period only spans a short period of time. In our application in Section 7, we conduct a sensitivity analysis to gauge the sensitivity of the study conclusion to violation of this assumption.

Two additional remarks on Assumption 2 are in order. First, an attractive feature of Assumptions 2(c) and (d) is that they are guaranteed to be true under the sharp null hypothesis of no treatment effect for all individuals. This means that the instrumented DID method can be used for testing the sharp null hypothesis under Assumptions 2(a) and (b) alone. Second, from the definition of the potential exposures Dt(z), the IV for DID Z is considered causal for the exposure. In Section S4.2, we present another version of the notation and assumptions that does not require Z to be causal; that is, Z is allowed to be merely correlated with a cause that affects the trend in exposure. This version is more suitable for our application in Section 7, in which we use gender as the IV for DID because of its correlation with the encouragement from targeted tobacco advertising.

For C ∈ {Y, D}, let μC(t,z,X) = E(C | T=t, Z=z, X), δC(X) = μC(1,1,X) - μC(0,1,X) - μC(1,0,X) + μC(0,0,X), and let μC(t,z) and δC denote their counterparts without observed covariates. The next proposition indicates that the (conditional) ATE can be identified under the above assumptions.

Proposition 1.

If Assumptions 1 and 2 hold, then

δ(X) := δY(X)/δD(X) = β0(X)  and  E[δ(X) | V=v] = β0(v). (1)

The proof of this proposition, and all other proofs in this paper, are given in Section S3.
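
In outline, the identification argument runs as follows; the display below is a compressed version of the proof in Section S3. Writing Yt(Dt(z)) = Yt(0) + Dt(z){Yt(1) - Yt(0)} and using consistency and random sampling (Assumption 1),

δY(X) = {E(Y1(0) - Y0(0) | Z=1, X) - E(Y1(0) - Y0(0) | Z=0, X)}
+ E{D1(1)(Y1(1) - Y1(0)) - D0(1)(Y0(1) - Y0(0)) | Z=1, X}
- E{D1(0)(Y1(1) - Y1(0)) - D0(0)(Y0(1) - Y0(0)) | Z=0, X}.

Assumption 2(b) makes the first term zero and allows the conditioning on Z to be dropped in the remaining terms, which regroup as

δY(X) = E{(D1(1) - D1(0))(Y1(1) - Y1(0)) | X} - E{(D0(1) - D0(0))(Y0(1) - Y0(0)) | X}.

Assumption 2(c) factors each term into E(Dt(1) - Dt(0) | X) E(Yt(1) - Yt(0) | X), and Assumption 2(d) replaces both conditional effects by the common value β0(X). Applying the same first two steps to the exposure gives δD(X) = E(D1(1) - D0(1) | X) - E(D1(0) - D0(0) | X), which, after rearranging terms, equals E(D1(1) - D1(0) | X) - E(D0(1) - D0(0) | X); hence δY(X) = β0(X) δD(X). Assumption 2(a) guarantees δD(X) ≠ 0, so the ratio in Equation (1) is well defined, and E[δ(X) | V=v] = β0(v) follows by iterated expectations because V is a subset of X.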

Now we contrast the instrumented DID with standard DID. As discussed in Section 1, the standard DID identifies the ATE for the treated in the post-treatment period from comparing the trends in outcome between two exposure groups, where every individual in one group switches from being unexposed to exposed between two time points, and every individual in the other group is never exposed. However, its key assumption, the parallel trends, will be violated if there is a time-invariant unmeasured confounder that has time-varying effects or there is a time-varying unmeasured confounder in the exposure–outcome relationship. We use time-varying unmeasured confounding to refer to either case. In contrast, the instrumented DID explicitly probes the relationship between the trend in outcome and the trend in exposure using an exogenous variable Z which often results in partial compliance with exposure within groups defined by levels of Z. Compared with standard DID, instrumented DID is robust to time-varying unmeasured confounding in the exposure–outcome relationship by making use of an exogenous variable Z that is not subject to this time-varying unmeasured confounding.

We remark that when there are no observed covariates, δY/δD has been derived in alternative ways in econometrics under different assumptions. It is the same as the standard IV Wald ratio after first differencing the exposure and outcome when each individual is observed at both time points (Wooldridge, 2010, Chap. 15.8), as motivated from the linear structural equation models. Importantly, Proposition 1 provides a justification of this approach using the potential outcomes framework without any modeling assumption. It is also the same as the Wald ratio in the fuzzy DID method for identification of a local ATE under the assumption that individuals can switch treatment in only one direction within each treatment group (de Chaisemartin & D’HaultfŒuille, 2018), as motivated from social science applications (e.g., Duflo 2001). Compared with this derivation, our proposed instrumented DID is less stringent in terms of the direction in which each individual can switch treatment, thus is better suited for applications using healthcare data where individuals can switch treatment in any direction. In addition, we complement the proposed instrumented DID with a novel semiparametric estimation and inference method in Section 3.2, two-sample design in Section 4, and measure of weak identification in Section 5.

Finally, we note that Assumption 2(c) can be replaced by the monotonicity assumption Dt(1) ≥ Dt(0) for t=0,1 with probability 1, under which δ(X) in (1) identifies a complier ATE; see Section S3.3 for details.

3 |. ESTIMATION AND INFERENCE

3.1 |. Wald estimator

When there are no observed covariates, based on Proposition 1, we can simply replace the conditional expectations in Equation (1) with their sample analogues and obtain the Wald estimator

β^wald = {μ^Y(1,1) - μ^Y(0,1) - μ^Y(1,0) + μ^Y(0,0)} / {μ^D(1,1) - μ^D(0,1) - μ^D(1,0) + μ^D(0,0)} = δ^Y/δ^D, (2)

where μ^C(t,z) = Σ_{i=1}^n Ci I(Ti=t, Zi=z) / Σ_{i=1}^n I(Ti=t, Zi=z) and δ^C = μ^C(1,1) - μ^C(0,1) - μ^C(1,0) + μ^C(0,0), for C ∈ {Y, D}. In Theorem S1, we prove consistency and asymptotic normality of β^wald and give a consistent variance estimator.
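
As a concrete illustration, Equation (2) can be computed in a few lines of R; the minimal sketch below uses a hypothetical data frame dat with binary columns T, Z, D and outcome Y.

```r
# Sketch of the one-sample instrumented DID Wald estimator in Equation (2).
idid_wald <- function(dat) {
  mu <- function(C, t, z) mean(C[dat$T == t & dat$Z == z])   # cell means mu_C(t, z)
  dY <- mu(dat$Y, 1, 1) - mu(dat$Y, 0, 1) - mu(dat$Y, 1, 0) + mu(dat$Y, 0, 0)
  dD <- mu(dat$D, 1, 1) - mu(dat$D, 0, 1) - mu(dat$D, 1, 0) + mu(dat$D, 0, 0)
  dY / dD
}
```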

3.2 |. Semiparametric theory and multiply robust estimators

Consider the case with a baseline observed covariate vector X. Suppose that we have a parametric model for β0(v), written as β(v;ψ) for some finite-dimensional parameter ψ. Importantly, we do not assume that this model is necessarily correct, but instead treat it as a working model and formulate our estimand as the projection of the CATE β0(v) onto the working model β(v;ψ). Specifically, we use the weighted least-squares projection given by

ψ0 = argmin_ψ E[w(V){β0(V) - β(V;ψ)}^2], (3)

where w(v) is a user-specified weight function, which can be tailored if there is subject-matter knowledge for emphasizing specific parts of the support of V; otherwise, we can set w(v) = 1. By definition, β(V;ψ0) is the best least-squares approximation to the CATE β0(V). For example, when effect modification is not of interest, we can specify β(v;ψ) = ψ such that β0(V) is projected onto a constant ψ0, which can be interpreted as the ATE; if we want to estimate a linear approximation of the CATE, we can specify β(v;ψ) = v^T ψ, with V including the intercept. This working model approach is also adopted in Abadie (2003), Ogburn et al. (2015), and Kennedy et al. (2019).
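
To make the projection concrete: with w(v) = 1, solving (3) is simply least squares of the CATE on the working model. A minimal R sketch, assuming hypothetical CATE estimates delta_hat of δ(Xi) and an effect modifier column V are already available in dat, is given below; this is essentially what the regression-based estimator of Section 3.2 does once δ(x) has been estimated.

```r
# Sketch: projection of estimated CATEs onto two working models (w(v) = 1).
psi_const  <- mean(dat$delta_hat)                   # constant model beta(v; psi) = psi
psi_linear <- coef(lm(delta_hat ~ V, data = dat))   # linear model beta(v; psi) = psi1 + psi2 * v
```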

Let π(t,z,x) = P(T=t, Z=z | X=x), bC(x) = μC(0,0,x), mCZ(x) = μC(0,1,x) - μC(0,0,x), mCT(x) = μC(1,0,x) - μC(0,0,x), and ΔC(x) = (bC(x), mCZ(x), mCT(x)), for C ∈ {Y, D}. Consider three sets of model assumptions:

M1: models for δ(x), ΔD(x), and ΔY(x) are correct.

M2: models for π(t,z,x) and δD(x) are correct.

M3: models for π(t,z,x) and δ(x) are correct.

In what follows, we first discuss three different estimators of ψ that are consistent and asymptotically normal under M1, M2, and M3, respectively. Bounded semiparametric estimators analogous to those in Wang and Tchetgen Tchetgen (2018) are developed in Section S2.4. Let Pn[X] = n^{-1} Σ_{i=1}^n Xi denote the empirical average. Under model M1, we present a regression-based estimator ψ^reg that solves

Pn[ q(V;ψ) {δ(X;α^reg) - β(V;ψ)} ] = 0,

where q(v;ψ) = w(v) ∂β(v;ψ)/∂ψ, δ(x;α) is a parametric specification of δ(x), α^reg is the solution to Pn[ hα(X) {Y - b^Y(X) - m^YZ(X)Z - m^YT(X)T - δ(X;α)(D - b^D(X) - m^DZ(X)Z - m^DT(X)T)} ] = 0, hα(X) is a vector of the same dimension as α, and b^D, m^DZ, m^DT, b^Y, m^YZ, m^YT are respectively estimators of bD, mDZ, mDT, bY, mYZ, mYT. Under model M2, we present an inverse probability weighting (IPW) estimator ψ^ipw that solves

Pn[ q(V;ψ) { (2Z-1)(2T-1)Y / {π^(T,Z,X) δD(X;θ^)} - β(V;ψ) } ] = 0,

where δD(x;θ) is a parametric specification of δD(x), θ^ is the solution to Pn[ hθ(X) {(2Z-1)(2T-1)D / π^(T,Z,X) - δD(X;θ)} ] = 0, π^(t,z,x) is an estimator of π(t,z,x), and hθ(X) is a vector of the same dimension as θ. Finally, under model M3, we present an estimator ψ^g based on g-estimation, defined as the solution to

Pn[ q(V;ψ) {δ(X;α^g) - β(V;ψ)} ] = 0,

where α^g is the solution to Pn[ hα(X) (2Z-1)(2T-1){Y - δ(X;α)D} / π^(T,Z,X) ] = 0. These three classes of estimators are consistent and asymptotically normal under M1, M2, and M3, respectively, following standard arguments, for example, as in Newey and McFadden (1994, Chap. 6.1). Depending on the specific application, some classes may be preferable when knowledge about certain nuisance parameters is available. In practice, when we are uncertain about which models are correctly specified, it is of interest to develop a multiply robust estimator that is guaranteed to deliver valid inference about ψ0 provided that one, but not necessarily more than one, of models M1, M2, M3 holds (Vansteelandt et al., 2008; Wang & Tchetgen Tchetgen, 2018; Shi et al., 2020).
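
To illustrate one of these classes, the R sketch below implements the IPW estimator under the simplest possible specifications: a constant working model β(v;ψ) = ψ, a constant model δD(x;θ) = θ, logistic regressions for T and Z given X, and the simplifying assumption (made here only for illustration) that T and Z are conditionally independent given X so that π(t,z,x) factorizes. The column names (T, Z, D, Y, X1, X2) are hypothetical, and this is not the implementation in the accompanying idid package.

```r
# Sketch of the IPW estimator with beta(v; psi) = psi and delta_D(x; theta) = theta.
idid_ipw <- function(dat) {
  pT <- glm(T ~ X1 + X2, family = binomial, data = dat)   # model for P(T = 1 | X)
  pZ <- glm(Z ~ X1 + X2, family = binomial, data = dat)   # model for P(Z = 1 | X)
  pt1 <- predict(pT, type = "response")
  pz1 <- predict(pZ, type = "response")
  pi_hat <- ifelse(dat$T == 1, pt1, 1 - pt1) *
            ifelse(dat$Z == 1, pz1, 1 - pz1)              # estimated pi(T_i, Z_i, X_i)
  s <- (2 * dat$Z - 1) * (2 * dat$T - 1)                  # sign of each (T, Z) cell
  theta_hat <- mean(s * dat$D / pi_hat)                   # constant-model estimate of delta_D
  mean(s * dat$Y / pi_hat) / theta_hat                    # psi_hat_ipw
}
```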

The next theorem derives the efficient influence function for ψ (Bickel et al., 1993; van der Vaart, 2000), which provides the basis of constructing a multiply robust estimator.

Theorem 1.

Suppose that Assumptions 1 and 2 hold and that ∂β(v;ψ)/∂ψ exists and is continuous. Under a nonparametric model, the efficient influence function for ψ is proportional to

φ(O;ψ,η) = q(V;ψ)[ δ(X) - β(V;ψ) + {(2Z-1)(2T-1) / {π(T,Z,X) δD(X)}} {Y - bY(X) - mYZ(X)Z - mYT(X)T - δ(X)(D - bD(X) - mDZ(X)Z - mDT(X)T)} ], (4)

where η = (π, δD, δ, ΔD, ΔY) denotes the vector of nuisance parameters, ΔD = (bD, mDZ, mDT), ΔY = (bY, mYZ, mYT), and q(v;ψ) = w(v) ∂β(v;ψ)/∂ψ.

Note that the efficient influence function gives an estimator ψ^mr defined as a solution to Pn[φ(O;ψ,η^)] = 0, where η^ is a vector of the estimated nuisance parameters. Among the nuisance parameters, π, ΔD, ΔY can be estimated directly from likelihood or moment equations, whereas the estimation of δD and δ relies on additional nuisance parameters. To achieve multiple robustness, we need to construct a consistent estimator of δD in the union of M1 and M2, as well as a consistent estimator of δ in the union of M1 and M3. We achieve these goals by using doubly robust g-estimation (Robins, 1994). Specifically, we solve for δ^D(x) = δD(x;θ^dr) and δ^(x) = δ(x;α^dr) respectively from

Pn[ hθ(X) {(2Z-1)(2T-1)/π^(T,Z,X)} {D - b^D(X) - m^DZ(X)Z - m^DT(X)T - δD(X;θ)ZT} ] = 0,
Pn[ hα(X) {(2Z-1)(2T-1)/π^(T,Z,X)} {Y - b^Y(X) - m^YZ(X)Z - m^YT(X)T - δ(X;α)(D - b^D(X) - m^DZ(X)Z - m^DT(X)T)} ] = 0.

We prove in the Supporting information that ψ^mr is multiply robust, in the sense that the estimator is consistent as long as at least one of the three models M1, M2, M3 holds.

Next, we derive the asymptotic properties of ψ^mr. Let →p denote convergence in probability, ||ψ|| = (ψ^T ψ)^{1/2} the Euclidean norm of any column vector ψ, ||f||2 = {∫ f^2(o) dP(o)}^{1/2} the L2(P) norm of any real-valued function f, and ||f||2 = Σ_{j=1}^{ℓ} ||fj||2 for any collection of real-valued functions f = (f1, ..., fℓ), where P denotes the distribution of O. Moreover, let η0 = (π0, δD0, δ0, ΔD0, ΔY0) denote the true values of the nuisance parameters.

Assumption 3.

  1. (ψ^mr, η^) →p (ψ0, η*), where η* = (π*, δD*, δ*, ΔD*, ΔY*) with either (i) δ* = δ0, ΔD* = ΔD0, ΔY* = ΔY0; or (ii) π* = π0 and δD* = δD0; or (iii) π* = π0 and δ* = δ0.

  2. For each ψ in an open subset of Euclidean space and each η in a metric space, let φ(o;ψ,η) be a measurable function such that the class of functions {φ(o;ψ,η) : ||ψ - ψ0|| < ϵ, ||η - η*||2 < ϵ} is Donsker for some ϵ > 0, and such that E||φ(O;ψ,η) - φ(O;ψ0,η*)||^2 → 0 as (ψ,η) → (ψ0,η*). The maps ψ ↦ E{φ(O;ψ,η)} are differentiable at ψ0, uniformly in η in a neighborhood of η*, with nonsingular derivative matrices M(ψ0,η) → M(ψ0,η*).

Assumption 3(a) describes the multiple robustness of our estimator. Assumption 3(b) is standard for M-estimators (van der Vaart, 2000, Chap. 5.4).

Theorem 2.

Under Assumptions 1–3, ψ^mr is consistent with rate of convergence

||ψ^mr - ψ0|| = Op( n^{-1/2} + ||δ^ - δ0||2 (||π^ - π0||2 + ||δ^D - δD0||2) + ||π^ - π0||2 (||Δ^Y - ΔY0||2 + ||Δ^D - ΔD0||2) ).

Suppose further that

||δ^ - δ0||2 (||π^ - π0||2 + ||δ^D - δD0||2) + ||π^ - π0||2 (||Δ^Y - ΔY0||2 + ||Δ^D - ΔD0||2) = op(n^{-1/2}),

then ψ^mr is asymptotically normal and semiparametric efficient, satisfying

√n (ψ^mr - ψ0) →d N( 0, M(ψ0,η0)^{-1} E[φ(O;ψ0,η0) φ(O;ψ0,η0)^T] {M(ψ0,η0)^{-1}}^T ). (5)

The first part of Theorem 2 describes the convergence rate of ψ^mr, which again indicates the multiple robustness of our estimator. That is, ψ^mr is consistent provided that (i) either π^ or (Δ^Y, Δ^D) is consistent, and (ii) either δ^ or (π^, δ^D) is consistent. The multiple robustness property is important in practice, because nuisance parameters such as π, δD, and δ may be easier to estimate than ΔY and ΔD. Even when all the nuisance parameters are consistently estimated, we can still benefit from using the semiparametric methods, in that even if the nuisance parameters are estimated at slower rates, ψ^mr can still attain the fast convergence rate. For example, if all the nuisance parameters are estimated at n^{-1/4} rates, then ψ^mr can still achieve the fast n^{-1/2} rate. The second part of Theorem 2 says that if the nuisance parameters are consistently estimated with fast rates, for example, if they are estimated using parametric methods, then their variance contributions are negligible, and ψ^mr achieves the semiparametric efficiency bound.

When Equation (5) holds, a plug-in variance estimator for √n(ψ^mr - ψ0) can be easily constructed as (M^)^{-1} {Pn[φ(O;ψ^mr,η^) φ(O;ψ^mr,η^)^T]} {(M^)^{-1}}^T, with M^ = Pn[ ∂φ(O;ψ,η^)/∂ψ ]|ψ=ψ^mr. Even if Equation (5) does not hold, for example, when only one of M1, M2, M3 holds but all the nuisance parameters are finite-dimensional and in the form of M-estimators, ψ^mr is still consistent and asymptotically normal from standard M-estimation theory (Newey & McFadden, 1994, Chap. 6). Thus, a consistent variance estimator for √n(ψ^mr - ψ0) can be constructed by stacking the efficient influence function φ(O;ψ,η) and the estimating equations for the nuisance parameters, solving for (ψ^mr, η^) simultaneously, and taking the corresponding diagonal component of the joint sandwich variance estimator. Alternatively, the nonparametric bootstrap is commonly used in practice (Cheng & Huang, 2010).
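
For instance, a nonparametric bootstrap resamples individuals and repeats the entire fitting procedure, including the nuisance estimation; a minimal sketch using the illustrative idid_ipw() function from above (with the hypothetical data frame dat) is:

```r
# Sketch: nonparametric bootstrap standard error with B = 500 resamples.
set.seed(1)
boot_est <- replicate(500, idid_ipw(dat[sample(nrow(dat), replace = TRUE), ]))
se_boot <- sd(boot_est)
```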

4 |. TWO-SAMPLE INSTRUMENTED DIFFERENCE-IN-DIFFERENCES

In some applications, it is hard to collect the exposure and outcome variables for the same individuals, especially when the outcome is defined to reflect a delayed treatment effect. For instance, in the smoking and lung cancer example in Section 1, the outcome of interest is lung cancer mortality after 35 years, and it is infeasible to follow the same individuals for 35 years. Motivated by Angrist and Krueger’s (1992, 1995) influential two-sample standard IV analysis, we extend the instrumented DID to a two-sample design.

Suppose there are na i.i.d. realizations of (Ta, Za, Da, Ya) from one sample and nb i.i.d. realizations of (Tb, Zb, Db, Yb) from another sample. These two samples are independent of each other, and we never observe Da and Yb. We write the observed data as (Tai, Zai, Yai), i = 1, ..., na, and (Tbi, Zbi, Dbi), i = 1, ..., nb, which are respectively referred to as the outcome dataset and the exposure dataset. Let δYa, δ^Ya, δDb, δ^Db, μ^Ya(t,z), μ^Db(t,z) be as defined in Equations (1) and (2) but evaluated correspondingly using the outcome dataset and the exposure dataset. Suppose that Assumptions 1 and 2 hold for the data-generating processes in both datasets and that E(Ya | Ta, Za) = E(Yb | Tb, Zb) and E(Da | Ta, Za) = E(Db | Tb, Zb); then the ATE is identified by β0 = δYa/δDb. Analogously, the two-sample instrumented DID Wald estimator is obtained as β^TSwald = δ^Ya/δ^Db. In Theorem S2, we establish the consistency and asymptotic normality of β^TSwald and provide a consistent variance estimator. Both β^TSwald and its variance estimator can be conveniently calculated based solely on the summary statistics μ^Ya(t,z) and μ^Db(t,z) and their standard errors (SEs).
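
Because β^TSwald depends on the data only through δ^Ya, δ^Db, and their SEs, the calculation can be scripted from summary statistics alone. The R sketch below uses the standard delta-method approximation for a ratio of independent estimates; it is meant only as an illustration and its form may differ from the exact expression in Equation (S6).

```r
# Sketch: two-sample iDID Wald estimate and a delta-method SE from summary statistics.
ts_wald <- function(delta_Ya, se_Ya, delta_Db, se_Db) {
  est <- delta_Ya / delta_Db
  se  <- sqrt(se_Ya^2 / delta_Db^2 + delta_Ya^2 * se_Db^2 / delta_Db^4)
  c(estimate = est, se = se)
}
```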

5 |. MEASURE OF WEAK IDENTIFICATION

Weak identification is a general challenge for IV-type methods and has recently received increased attention among theoretical and applied researchers; see Stock et al. (2002) for a survey. For the standard IV, weak identification refers to the situation in which the IVs are only weakly associated with the exposure. For the instrumented DID, weak identification refers to the situation in which the trends in exposure for Z=0 and Z=1 are nearly parallel. Under weak identification, the sampling distribution of the point estimators is generally non-normal, and standard inference can be unreliable (Bound et al., 1995). Therefore, it is important to have a measure of weak identification tailored to the instrumented DID as a diagnostic check to make sure the developed asymptotic inference procedures can be reliably applied.

Consider first the case when there are no observed covariates. We take the one-sample estimator β^wald as an example; the result for the two-sample estimator β^TSwald is similar. Note that δ^Y and δ^D can be respectively obtained from fitting a saturated model of Y or D on 1, ZT,Z, and T, where ZT is the interaction term. Let R be the n-dimensional vector of residuals from regressing ZT on 1, Z, and T. By using the Frisch–Waugh–Lovell theorem (Davidson & MacKinnon, 1993), β^wald in Equation (2) can be equivalently formulated as

β^wald = δ^Y/δ^D = {(R^T R)^{-1} R^T Y} / {(R^T R)^{-1} R^T D} = (D^T HR Y)/(D^T HR D),

where D^T = (D1, ..., Dn), Y^T = (Y1, ..., Yn), and HR = R(R^T R)^{-1} R^T is the hat matrix. Interestingly, the above formula indicates that β^wald can be alternatively obtained from a conventional two-stage least squares: the exposure D is first regressed on R (first-stage regression), and the outcome Y is then regressed on the predicted values from the first-stage regression. This shows that using Z as an IV for DID is equivalent to using ZT as a standard IV while further controlling for 1, Z, and T. Hence, the concentration parameter of ZT as the standard IV (controlling for 1, Z, and T) serves here as a measure of weak identification when using Z as the IV for DID. Specifically, this measure is defined as κ^2 = δD^2 R^T R / σε^2, where δD is defined in Proposition 1 and σε^2 is the population residual variance from the first-stage regression. Heuristically, κ^2 increases with a larger sample size n, a larger δD^2, or a larger limit of R^T R/n. For the usual inference based on the normal approximation to be accurate, κ^2 must be large.

A commonly used estimate of κ^2 is the F-statistic from the first-stage regression. When only summary data are available, that is, only δ^D and its SE are available, one can also use the squared z-score as an estimate of κ^2, where the z-score is the ratio of δ^D to its SE. When there are observed covariates, a measure of weak identification can also be easily calculated by defining R as the vector of residuals from regressing ZT on 1, Z, T, and X. We follow Stock et al. (2002) and recommend checking that the estimated κ^2 is larger than 10 before applying the inference methods in Sections 3 and 4.
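
In the no-covariate case, this diagnostic can be computed in one line: by the Frisch–Waugh–Lovell argument above, the squared t-statistic of the Z:T interaction in the regression of D on Z, T, and Z:T is the single-instrument first-stage F-statistic, and the same interaction test also serves as the empirical check of Assumption 2(a) discussed in Section 2. A minimal R sketch with hypothetical column names follows; with covariates, X would be added to the regression.

```r
# Sketch: first-stage F-statistic for the iDID weak-identification check.
first_stage <- lm(D ~ Z * T, data = dat)
F_stat <- summary(first_stage)$coefficients["Z:T", "t value"]^2
F_stat > 10   # rule-of-thumb threshold before applying the asymptotic inference
```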

6 |. SIMULATIONS

To evaluate the finite sample performance of the proposed instrumented DID (iDID) methods, we simulate data as follows: X = (X1, X2)^T, X1 ~ N(0,1), X2 ~ N(0,1), Z ~ Binom(expit(0.5 I(X1>0) + 0.5 I(X2>0))), T ~ Binom(0.5), Ut ~ N(2t-1, 1), ϵt ~ N(0,1), Dt ~ Binom(expit(-0.5 - Z Ut + 1.5 Ut)), Yt = (1 + X1 + X2)Dt + 2 + 2Ut + Z + (1 + X1 + X2) + ϵt, for t = 0, 1. We simulate n = 10^5 random samples from (T, Z, X, D0, D1, Y0, Y1) and let D = T D1 + (1-T) D0, Y = T Y1 + (1-T) Y0. The observed data are (Zi, Xi, Ti, Di, Yi), i = 1, ..., n.
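
The R sketch below generates one such dataset following the data-generating process as written above (the exact form of the outcome model is as reconstructed here); U and the exposure and outcome are drawn only at the observed time T, which is equivalent to drawing both time points and then selecting the observed one.

```r
# Sketch: simulate one dataset from the data-generating process described above.
set.seed(2022)
n  <- 1e5
X1 <- rnorm(n); X2 <- rnorm(n)
Z  <- rbinom(n, 1, plogis(0.5 * (X1 > 0) + 0.5 * (X2 > 0)))   # plogis() is expit()
T  <- rbinom(n, 1, 0.5)
U  <- rnorm(n, mean = 2 * T - 1, sd = 1)                      # U_t at the observed t = T
D  <- rbinom(n, 1, plogis(-0.5 - Z * U + 1.5 * U))
Y  <- (1 + X1 + X2) * D + 2 + 2 * U + Z + (1 + X1 + X2) + rnorm(n)
dat <- data.frame(T = T, Z = Z, X1 = X1, X2 = X2, D = D, Y = Y)
```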

Under this data-generating process, Assumptions 1 and 2 do not hold unconditionally, but they do hold in each of the four strata defined by (I(X1>0), I(X2>0)). Hence, the Wald estimator in Equation (2) is valid when considering each stratum separately; we denote the resulting stratum-specific Wald estimators by β^S1,wald, ..., β^S4,wald, and they respectively estimate the stratum-specific ATEs: −0.60, 1, 1, 2.60. On the other hand, Assumptions 1 and 2 hold when conditioning on X, and thus the three classes of semiparametric estimators ψ^reg, ψ^ipw, ψ^g and the multiply robust estimator ψ^mr proposed in Section 3.2 are all valid. For the semiparametric iDID method, we consider two working models for the CATE: a constant working model β(v;ψ) = ψ with V = 1 and a linear working model β(v;ψ) = ψ1 + ψ2 x1 with V = X1. The true values of ψ, ψ1, ψ2 are all equal to 1 because E(Yt(1) - Yt(0)) = 1 and E(Yt(1) - Yt(0) | X1) = 1 + X1. The weight function w(v) in Equation (3) is set to be 1.

We also examine the effect of model misspecification on the semiparametric iDID estimators. Note that the data-generating process implies that π(t,1,x) = expit(0.5 I(x1>0) + 0.5 I(x2>0))/2, π(t,0,x) = {1 - expit(0.5 I(x1>0) + 0.5 I(x2>0))}/2, and that δD(x), δ(x), bD(x), bY(x), mDZ(x), mDT(x), mYZ(x), mYT(x) are all linear in x. The misspecified model we fit for π(t,z,x) is a product of two logistic regressions, one for Z and the other for T, both in terms of exp(x1/2); the misspecified models for δD(x) and δ(x) are linear in x1, and those for bD(x), bY(x), mDZ(x), mDT(x), mYZ(x), mYT(x) are linear in exp(x1/2).

We compare with two other methods: direct treated-versus-control outcome comparison using ordinary least squares (OLS), and the standard IV method using Z as the IV, where the latter is implemented using the R package ivpack (Jiang & Small, 2014). The direct outcome comparison is invalid because of the unmeasured confounder Ut; the standard IV method is also invalid because of the direct effect of Z on the outcome. Table 1 shows the simulation results based on 1,000 repetitions, which include: (i) the simulation average bias and standard deviation (SD) of each estimator; (ii) the mean standard error (SE), calculated according to Equation (S4) in the Supporting information for the Wald estimators and using standard M-estimation theory for the semiparametric estimators; and (iii) the simulation coverage probability (CP) of 95% confidence intervals.

TABLE 1.

Bias, standard deviation (SD), average standard error (SE), and coverage probability (CP) of 95% asymptotic confidence intervals based on 1,000 repetitions with n = 10^5

Method Correct model Estimator Bias SD SE CP
OLS 2.466 0.024 0.023 0.000
Standard IV −38.525 3.385 3.316 0.000
iDID-Wald β^S1,wald −0.014 0.247 0.251 0.950
β^S2,wald 0.007 0.253 0.259 0.958
β^S3,wald −0.008 0.250 0.259 0.961
β^S4,wald 0.000 0.289 0.284 0.943
Constant working model β(V;ψ)=ψ
iDID-MR M1, M2, M3 all correct ψ^mr −0.002 0.111 0.114 0.956
M1 correct ψ^mr −0.001 0.110 0.114 0.960
M2 correct ψ^mr −0.003 0.136 0.139 0.944
M3 correct ψ^mr −0.003 0.137 0.140 0.945
none ψ^mr −0.355 0.144 0.142 0.293
iDID-Reg M1 correct ψ^reg −0.002 0.109 0.114 0.960
M1 incorrect ψ^reg −0.351 0.144 0.149 0.335
iDID-IPW M2 correct ψ^ipw −0.021 0.225 0.225 0.948
M2 incorrect ψ^ipw −0.271 0.234 0.242 0.816
iDID-G M3 correct ψ^g −0.021 0.225 0.224 0.948
M3 incorrect ψ^g −0.276 0.235 0.233 0.814
Linear working model β(V;ψ)=ψ1+ψ2X1
iDID-MR M1, M2, M3 all correct ψ^1,mr −0.002 0.110 0.114 0.956
ψ^2,mr 0.004 0.113 0.115 0.950
M1 correct ψ^1,mr −0.001 0.110 0.114 0.960
ψ^2,mr 0.004 0.115 0.118 0.946
M2 correct ψ^1,mr −0.003 0.136 0.139 0.944
ψ^2,mr 0.003 0.146 0.150 0.960
M3 correct ψ^1,mr −0.003 0.137 0.140 0.946
ψ^2,mr 0.004 0.144 0.149 0.958
none ψ^1,mr −0.355 0.144 0.142 0.292
ψ^2,mr −0.129 0.221 0.175 0.908
iDID-Reg M1 correct ψ^1,reg −0.001 0.109 0.114 0.960
ψ^2,reg −0.005 0.114 0.118 0.949
M1 incorrect ψ^1,reg −0.351 0.144 0.149 0.332
ψ^2,reg −0.110 0.473 0.478 0.942
iDID-IPW M2 correct ψ^1,ipw −0.021 0.225 0.225 0.948
ψ^2,ipw −0.010 0.270 0.269 0.957
M2 incorrect ψ^1,ipw −0.271 0.234 0.242 0.813
ψ^2,ipw −0.072 0.267 0.313 0.957
iDID-G M3 correct ψ^1,g −0.021 0.225 0.224 0.949
ψ^2,g −0.007 0.245 0.247 0.953
M3 incorrect ψ^1,g −0.276 0.235 0.233 0.812
ψ^2,g −0.234 0.246 0.241 0.863

Abbreviations: iDID, instrumented difference-in-differences; OLS, ordinary least squares.

The following is a summary of the results in Table 1. The OLS and standard IV estimators have large bias due to violations of their assumptions. The stratum-specific iDID Wald estimators show negligible bias and adequate coverage probability. The three classes of semiparametric iDID estimators that rely on M1, M2, M3 have negligible bias and adequate coverage probability when the corresponding models are correctly specified, but are biased when the models are misspecified. The multiply robust semiparametric iDID estimators exhibit negligible bias and adequate coverage probabilities when at least one of M1, M2, M3 is correct, which supports the multiple robustness property.

7 |. APPLICATION

We apply the proposed method to analyze the effect of cigarette smoking on lung cancer mortality. Given the lag between smoking exposure and lung cancer mortality, we adopt the two-sample instrumented DID design. Our analysis is based upon two datasets arranged by 10-year birth cohort: the 1970 National Health Interview Survey (NHIS) for nationally representative estimates of smoking prevalence (National Health Interview Survey, 1970), and the US Centers for Disease Control and Prevention’s (CDC) Wide-ranging ONline Data for Epidemiologic Research (WONDER) system for estimates of national lung cancer (ICD-8/9: 162; ICD-10: C33-C34) mortality rates (CDC, 2000a, 2000b, 2016). Only the 1970 NHIS is used because it is the first NHIS that records the initiation and cessation time of smoking such that a longitudinal structure is available. We closely follow the approach taken by Tolley et al. (1991, Chapter 3) to calculate the smoking prevalence rates.

Based on the data availability, we focus on four successive 10-year birth cohorts: 1911–1920, 1921–1930, 1931–1940, 1941–1950, whose smoking prevalence is estimated at years 1940, 1950, 1960, and 1970, respectively, when they are aged 20–29, and whose lung cancer mortality rates are estimated at years 1975, 1985, 1995, and 2005, respectively, when they are aged 55–64. Here, cohort of birth plays the role of time. Figure 2 shows the changes in prevalence of cigarette smoking among men and women aged 20–29 years, and the changes in lung cancer mortality rates 35 years later in the United States. From Figure 2, we see that the trends in lung cancer mortality rates follow the trends in smoking prevalence, with a lag of 35 years, which provides evidence that smoking increases lung cancer mortality rate.

FIGURE 2

Changes in prevalence of cigarette smoking for men and women aged 20–29 years, and in lung cancer mortality rates for men and women aged 55–64 years, among four successive 10-year birth cohorts: 1911–1920, 1921–1930, 1931–1940, 1941–1950

There have been many direct comparisons of the lung cancer mortality rates between smokers and non-smokers which have found higher rates among smokers (International Agency for Research on Cancer, 1986). Additional studies that replicate direct comparisons of smokers and non-smokers may not add much evidence beyond the first comparison. It is argued in Rosenbaum (2010) that “in such a situation, it may be possible to find haphazard nudges that, at the margin, enable or discourage [the exposure]. ... These nudges may be biased in various ways, but there may be no reason for them to be consistently biased in the same direction, so similar estimates of effect from studies subject to different potential biases gradually reduce ambiguity about what part is effect and what part is bias.” The instrumented DID is one such method that attempts to exploit the “haphazard nudges”, that is, the targeted tobacco advertising to women in the 1960s that led to a rapid increase in smoking among young women in a way that is presumably independent of other causes of lung cancer mortality.

To quantitatively evaluate the effect of cigarette smoking on lung cancer mortality, we take gender—a surrogate of whether each individual received encouragement (targeted tobacco advertising) or not—as the IV for DID. Note that gender does not need to have a causal effect on smoking; as proved in the Supporting information, it suffices that gender is correlated with smoking due to the encouragement from targeted tobacco advertising. We consider two successive 10-year birth cohorts, setting the earlier birth cohort as T=0 and the later birth cohort as T=1. Gender is likely a valid IV for DID, as it clearly satisfies the trend relevance assumption, the lung cancer mortality rates for men and women would have evolved similarly had all subjects counterfactually not smoked, and there is no evident gender difference in the cancer-causing effects of cigarette smoking (Patel et al., 2004).

Table 2 summarizes (i) the F-statistic proposed in Section 5 to measure weak identification; and (ii) the two-sample iDID Wald estimators β^TSwald defined in Section 4 and their SEs defined in Equation (S6). More details on the application are also in the Supporting information. From Table 2, under the assumption that gender is a valid IV for DID and the treatment effect is stable over time, we find evidence that smoking leads to significantly higher lung cancer mortality rates. Specifically, we find that smoking in one’s 20s leads to an elevated annual lung cancer mortality rate at age 55–64 years, with the effect size ranging from 0.285% to 0.568%. This is of a similar magnitude as the findings in Thun et al. (1982, 2013). Using different birth cohorts gives slightly different point estimates, but they are within two SEs of each other. Nonetheless, there is still concern about violating the stable treatment effect over time assumption (Assumption 2(d)), possibly because the cigarette design and composition have undergone changes that promote deeper inhalation of smoke (Thun et al., 2013; Warren et al., 2014). In Section S4, we perform a sensitivity analysis and find that increasing risk of smoking over time does not explain away the observed treatment effect.

TABLE 2.

Two-sample iDID Wald estimates and their standard errors (in parentheses) using two successive birth cohorts (in %)

Earlier birth cohort (T=0) 1911–1920 1921–1930 1931–1940
Later birth cohort (T=1) 1921–1930 1931–1940 1941–1950
F-statistic 13.94 47.28 21.33
β^TSwald 0.285 (0.089) 0.497 (0.076) 0.568 (0.127)

Notes: The F-statistic is the squared z-score; β^TSwald, defined in Section 4, estimates the ATE of smoking on lung cancer mortality. Abbreviations: ATE, average treatment effect; iDID, instrumented difference-in-differences.

8 |. RESULTS AND DISCUSSION

In this paper, we have proposed a new method called instrumented DID that explicitly leverages exogenous randomness in the exposure trends, and controls for unmeasured confounding in repeated cross-sectional studies. The instrumented DID method evolves from two powerful natural experiment devices, the standard IV and standard DID, but is able to relax some of their most disputable assumptions. Our motivation of assessing the causal effect by linking the change in outcome mean and the change in exposure rate is also related to the trend-in-trend design (Ji et al., 2017) and etiologic mixed design (Lash et al., 2021).

In principle, any variable that satisfies Assumptions 2(a)–(c) can be chosen as the IV for DID. Here, we list two common sources of the IV for DID: (i) administrative information, such as geographic region and insurance type; and (ii) variables that are commonly used as standard IVs, such as physician preference, distance to care provider, and genetic variants—see Baiocchi et al. (2014) for more examples; as discussed in Section 2, these variables are more likely to be an IV for DID compared to being a standard IV, because IVs for DID are allowed to have direct effects on the outcome.

Supplementary Material

supinfo

ACKNOWLEDGMENTS

We gratefully acknowledge support by grant R01AG064589 from the National Institutes of Health. We would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that led to a much improved paper.

Funding information

National Institutes of Health, Grant/Award Number: R01AG064589

Footnotes

OPEN RESEARCH BADGES

This article has earned Open Data and Open Materials badges. Data and materials are available at https://github.com/jfiksel/compregpaper, https://github.com/jfiksel/codalm.

SUPPORTING INFORMATION

Web Appendices referenced in Sections 2–7 are available with this paper at the Biometrics website on Wiley Online Library. Code and data for replicating the simulations and data analyses in this paper are being made available online with the paper. R package idid, available at https://github.com/tye27/idid, implements the proposed method.

Table S1: Sample sizes for 1970 NHIS datasets and 1975, 1985, 1995, 2005 CDC WONDER compressed mortality datasets by birth cohort and gender

Data and codes are available in the Supporting Information of this article.

DATA AVAILABILITY STATEMENT

The data that support the findings of this paper are openly available in a GitHub repository at https://github.com/jfiksel/compregpaper.

REFERENCES

  1. Abadie A (2003) Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113, 231–263.
  2. Abadie A (2005) Semiparametric difference-in-differences estimators. The Review of Economic Studies, 72, 1–19.
  3. Angrist JD, Imbens GW & Rubin DB (1996) Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
  4. Angrist JD & Krueger AB (1992) The effect of age at school entry on educational attainment: an application of instrumental variables with moments from two samples. Journal of the American Statistical Association, 87, 328–336.
  5. Angrist JD & Krueger AB (1995) Split-sample instrumental variables estimates of the return to schooling. Journal of Business & Economic Statistics, 13, 225–235.
  6. Angrist JD & Pischke J-S (2008) Mostly harmless econometrics: an empiricist’s companion. Princeton, NJ: Princeton University Press.
  7. Bailar JC & Gornik HL (1997) Cancer undefeated. New England Journal of Medicine, 336, 1569–1574.
  8. Baiocchi M, Cheng J & Small DS (2014) Instrumental variable methods for causal inference. Statistics in Medicine, 33, 2297–2340.
  9. Bickel P, Klaassen C, Ritov Y & Wellner J (1993) Efficient and adaptive estimation for semiparametric models. Springer.
  10. Burbank F (1972) U.S. lung cancer death rates begin to rise proportionately more rapidly for females than for males: a dose-response effect? Journal of Chronic Diseases, 25, 473–479.
  11. Card D & Krueger AB (1994) Minimum wages and employment: a case study of the fast food industry in New Jersey and Pennsylvania. American Economic Review, 84, 772–793.
  12. CDC (2000a) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed mortality file 1968–1978. CDC WONDER online database, compiled from compressed mortality file CMF 1968–1988, series 20, no. 2A, 2000. Available from: http://wonder.cdc.gov/cmf-icd8.html [Accessed 27th Aug 2020].
  13. CDC (2000b) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed mortality file 1979–1998. CDC WONDER online database, compiled from compressed mortality file CMF 1979–1998, series 20, no. 2A, 2000 and CMF 1989–1998, series 20, no. 2E, 2003. Available from: http://wonder.cdc.gov/cmf-icd9.html [Accessed 27th Aug 2020].
  14. CDC (2016) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed mortality file 1999–2016 on CDC WONDER online database, released June 2017. Data are from the compressed mortality file 1999–2016, series 20, no. 2U, 2016. Available from: http://wonder.cdc.gov/cmf-icd10.html [Accessed 28th Aug 2020].
  15. Cheng G & Huang JZ (2010) Bootstrap consistency for general semiparametric M-estimation. The Annals of Statistics, 38, 2884–2915.
  16. Cui Y & Tchetgen Tchetgen E (2021) A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity. Journal of the American Statistical Association, 116, 162–173.
  17. Davidson R & MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press.
  18. de Chaisemartin C & D’HaultfŒuille X (2018) Fuzzy differences-in-differences. The Review of Economic Studies, 85, 999–1028.
  19. Duflo E (2001) Schooling and labor market consequences of school construction in Indonesia: evidence from an unusual policy experiment. American Economic Review, 91, 795–813.
  20. Hernán MA & Robins JM (2006) Instruments for causal inference: an epidemiologist’s dream? Epidemiology, 17, 360–372.
  21. Hernan MA & Robins JM (2020) Causal inference: what if. Boca Raton, FL: Chapman & Hall.
  22. International Agency for Research on Cancer (1986) Tobacco smoking, vol. 38. World Health Organization.
  23. Ji X, Small DS, Leonard CE & Hennessy S (2017) The trend-in-trend research design for causal inference. Epidemiology, 28, 529–536.
  24. Jiang Y & Small DS (2014) ivpack: instrumental variable estimation. R package version 1.2.
  25. Kennedy EH, Lorch S & Small DS (2019) Robust causal inference with continuous instruments using the local instrumental variable curve. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81, 121–143.
  26. Lash TL, VanderWeele TJ, Haneuse S & Rothman KJ (2021) Modern epidemiology, vol. 4. Wolters Kluwer Health.
  27. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR & Ebrahim S (2004) Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence? Lancet, 363, 1724–1727.
  28. Meigs JW (1977) Epidemic lung cancer in women. JAMA, 238, 1055.
  29. National Health Interview Survey (1970) Available from: ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/nhis/1970 [Accessed 31st Aug 2020].
  30. Newey WK & McFadden D (1994) Large sample estimation and hypothesis testing. Chap. 36. Handbook of Econometrics, 4, 2111–2245.
  31. Neyman J (1923) On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5, 465–472. Trans. Dabrowska Dorota M. and Speed Terence P. (1990).
  32. Ogburn EL, Rotnitzky A & Robins JM (2015) Doubly robust estimation of the local average treatment effect curve. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77, 373–396.
  33. Patel JD, Bach PB & Kris MG (2004) Lung cancer in US women: a contemporary epidemic. JAMA, 291, 1763–1768.
  34. Pierce JP & Gilpin EA (1995) A historical analysis of tobacco marketing and the uptake of smoking by youth in the United States: 1890–1977. Health Psychology, 14, 500.
  35. Robins JM (1994) Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics: Theory and Methods, 23, 2379–2412.
  36. Rosenbaum PR (2010) Design of observational studies. Springer.
  37. Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
  38. Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Annals of Statistics, 6, 34–58.
  39. Rubin DB (1990) Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5, 472–480.
  40. Rutter M (2007) Identifying the environmental causes of disease: how should we decide what to believe and when to take action? Report synopsis. Academy of Medical Sciences.
  41. Shi X, Miao W, Nelson JC & Tchetgen Tchetgen EJ (2020) Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 521–540.
  42. Stock JH, Wright JH & Yogo M (2002) A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20, 518–529.
  43. Tan Z (2006) Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association, 101, 1607–1618.
  44. Thun JM, Day-Lally C, Myers GD, Calle EE, Flanders WD, Zhu B-P et al. (1982) Trends in tobacco smoking and mortality from cigarette use in cancer prevention studies I (1959–1965) and II (1982–1988). Changes in cigarette-related disease risks and their implication for prevention and control: smoking and tobacco control monograph. Vol. 8. Bethesda, MD: U.S. Department of Health and Human Services, National Institutes of Health, National Cancer Institute. NIH Pub.
  45. Thun MJ, Carter BD, Feskanich D, Freedman ND, Prentice R, Lopez AD, et al. (2013) 50-year trends in smoking-related mortality in the United States. New England Journal of Medicine, 368, 351–364.
  46. Tolley H, Crane L & Shipley N (1991) Strategies to control tobacco use in the United States—a blueprint for public health action in the 1990s. NIH publication no. 92–3316, pp. 75–144. Bethesda, MD: U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Cancer Institute.
  47. van der Vaart A (2000) Asymptotic statistics. Cambridge University Press.
  48. Vansteelandt S, VanderWeele TJ, Tchetgen Tchetgen EJ & Robins JM (2008) Multiply robust inference for statistical interactions. Journal of the American Statistical Association, 103, 1693–1704.
  49. Wang L & Tchetgen Tchetgen E (2018) Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 531–550.
  50. Warren GW, Alberg AJ, Kraft AS & Cummings KM (2014) The 2014 surgeon general’s report: “the health consequences of smoking—50 years of progress”: a paradigm shift in cancer care. Cancer, 120, 1914–1916.
  51. Wooldridge JM (2010) Econometric analysis of cross section and panel data. MIT Press.
