Author manuscript; available in PMC: 2024 May 23.
Published in final edited form as: Biometrics. 2022 Oct 21;79(2):1330–1343. doi: 10.1111/biom.13749

A formal causal interpretation of the case-crossover design

Zach Shahn 1,2, Miguel A Hernán 3,4, James M Robins 3,4
PMCID: PMC11115970  NIHMSID: NIHMS1983244  PMID: 36001285

Abstract

The case-crossover design of Maclure is widely used in epidemiology and other fields to study causal effects of transient treatments on acute outcomes. However, its validity and causal interpretation have only been justified under informal conditions. Here, we place the design in a formal counterfactual framework for the first time. Doing so helps to clarify its assumptions and interpretation. In particular, when the treatment effect is nonnull, we identify a previously unnoticed bias arising from strong common causes of the outcome at different person-times. We analyze this bias and demonstrate its potential importance with simulations. We also use our derivation of the limit of the case-crossover estimator to analyze its sensitivity to treatment effect heterogeneity, a violation of one of the informal criteria for validity. The upshot of this work for practitioners is that, while the case-crossover design can be useful for testing the causal null hypothesis in the presence of baseline confounders, extra caution is warranted when using the case-crossover design for point estimation of causal effects.

Keywords: case-crossover, causal inference, counterfactual framework

1 |. INTRODUCTION

The case-crossover design (Maclure, 1991) is used in epidemiology and other fields to study causal effects of transient treatments on acute outcomes. One of its major advantages is that it only requires information from individuals who experience the outcome of interest (the cases). Another appealing feature is that under certain circumstances (which we will discuss at length) the case-crossover estimator adjusts for unobserved time invariant confounding. In a seminal application of this design (Mittleman et al., 1993), researchers obtained data on the physical activity (a transient treatment) of individuals who experienced a myocardial infarction (MI, an acute outcome). They then defined any person-times less than 1 h after vigorous activity as “treated,” and all other person-times as “untreated.” Finally, they considered each person-time as an individual observation and computed a Mantel–Haenszel estimate of the corresponding hazard ratio (Greenland & Robins, 1985; Kleinbaum et al., 1982; Nurminen, 1981; Tarone et al., 1983). This hazard ratio estimate was interpreted as the causal effect of vigorous physical activity on MI. Some variants of the case-crossover design allow flexible control time selection strategies where control times can follow outcome occurrence (e.g., Levy et al., 2001), but in this paper we restrict attention to studies in which follow-up is terminated at the time of the first outcome occurrence as in the above MI example.

Past authors have extensively considered several threats to validity of the case-crossover design (Greenland, 1996; Janes et al., 2005; Levy et al., 2001; Maclure, 1991; Mittleman & Mostofsky, 2014; Vines & Farrington, 2001), and conditions for causal interpretation of the estimator have been informally stated in the literature. The usual criteria cited are that: (a) the outcome has acute onset; (b) the treatment effect on the outcome is transient; (c) there are no unobserved post-baseline common causes of treatment and outcome; (d) there are no time trends in treatment; and (e) the treatment effect is constant across subjects.

The Mantel–Haenszel estimator was originally applied to estimate the treatment-outcome odds ratio when subjects were classified in strata sharing values of confounders V, and observed subjects in each stratum could be conceived of as independent draws from the (hypothetically) infinite stratum population. Under the assumptions that stratum-specific odds ratios are all equal and observations are independent within each stratum, the Mantel–Haenszel estimator was later proven consistent for the constant odds ratio as the number of strata approaches infinity, even if only a few subjects are observed in each stratum (Breslow, 1981). Since the values of the confounders V are held constant within each stratum, the constant odds ratio can be endowed with a causal interpretation if V includes all confounders. The same goes for the rate ratio (Greenland & Robins, 1985).

Maclure’s idea was to regard person-times (rather than subjects) as the units of analysis and subjects as the strata, then apply the Mantel–Haenszel estimator. As Maclure (1991) put it: “In the case-crossover design, the population base is considered to be stratified in the extreme, so there is only one individual per stratum... Use of subjects as their own controls eliminates confounding by subject characteristics that remain constant.” Analogy to past applications of the Mantel–Haenszel estimator would seem to imply that the case-crossover design eliminates baseline confounding as a source of bias assuming a constant treatment effect across subjects (informal condition (e)) and independent identically distributed observations across time within each subject. Of course, these two assumptions are unlikely to be satisfied in most research settings: the effect of treatment is rarely the same in all subjects, and variables at different person-times are typically not independent within subjects. Informal assumptions (a)–(d) can be viewed as a more plausible alternative to independent person-times, but to determine when the case-crossover estimator is asymptotically unbiased for causal effects in the presence of unobserved confounding requires a formal analysis.

Here we place the case-crossover design in a formal counterfactual causal inference framework (Rubin, 1978; Robins, 1986). Doing so helps to clarify its assumptions and interpretation. In Section 2, we introduce notation, describe the (possibly hypothetical) cohort that gives rise to the data in a case-crossover analysis, and summarize the MI study in more detail so that it can serve as a running example. In Section 3, we define a natural estimand motivated by a hypothetical randomized trial practitioners of the case-crossover design might wish to emulate. In Section 4, we state formal assumptions (mostly analogous to informal assumptions (a)–(e)) that allow us to causally interpret the limit of the case-crossover estimator and under which the limit approximates the trial estimand from Section 3. We identify and characterize a previously unnoticed bias present when there exist strong common causes of the outcomes at different times (as would seem likely in many instances) and the treatment effect is non-null. In Section 5, we discuss this bias and illustrate it with simulations. We also use our results from Section 4 to analyze sensitivity to effect heterogeneity, that is, violations of informal assumption (e). In Section 6, we conclude. Our general message to practitioners is that, while the case-crossover can be a clever way to test the null hypothesis of no causal effect in the presence of unobserved baseline confounding, its point estimates of nonnull effects can be sensitive to violations of unrealistic assumptions.

2 |. DATA-GENERATING PROCESS

2.1 |. Notation

While case-crossover studies only use data from subjects who experience the outcome, we will nonetheless describe a full cohort from which these subjects are drawn in order to facilitate the definition of certain concepts and quantities of interest. Consider a cohort of individuals followed from baseline (i.e., study entry), defined by calendar time, age, or time of some pre-defined index event, until they develop the outcome or the administrative end of follow-up, whichever occurs first. For simplicity, we assume no individual is lost to follow-up. Subjects are indexed by $i$, $i\in\{1,\ldots,N\}$. Subject $i$ is followed for at most $T+1$ person-times (e.g., hours) indexed by $j\in\{0,\ldots,T\}$. For simplicity, we take $T$ to be the same for all subjects. Let $A_{ij}$ be a binary variable taking values 0 and 1 indicating whether subject $i$ was treated at time $j$. Let $Y_{ij}$ be a binary variable taking values 0 and 1 indicating whether the outcome of interest occurred in subject $i$ before time $j+1$. We assume that $Y_{ij}$ is a “time to event” outcome in the sense that if $Y_{ij}=1$ then $Y_{ij'}=1$ for all $j'>j$. The above implies the temporal ordering $A_{ij}$, $Y_{ij}$, $A_{i(j+1)}$. Thus the outcome has an acute onset as required by informal condition (a). We define $A_{ij}=\,?$ if the event has occurred by time $j$.

For a time-varying variable $Z$, we denote by $\bar{Z}_{ij}$ the history $(Z_{i0},\ldots,Z_{ij})$ of $Z$ in subject $i$ up (i.e., prior) to time $j+1$. We will often omit the subscript $i$ in the subsequent notation because we assume the data from different subjects $i$ are independent and identically distributed. Let $V$ denote a possibly multidimensional and unobserved baseline confounding variable that we assume has some population density $p(v)$. (For notational convenience, we shall write conditional probabilities $P\{\cdot\mid\cdot,V=v\}$ given $V=v$ as $p_v\{\cdot\mid\cdot\}$. To avoid measure theoretic subtleties, we shall henceforth assume that when $V$ has continuous components, conditions sufficient to pick out a particular version of $P\{\cdot\mid\cdot,V=v\}$ have been imposed as in Gill and Robins (2001).) Let $\bar{U}_T$ denote common causes of outcomes (but not treatments) at different person-times not included in $V$. For example, in the MI and exercise study, $U_j$ could denote formation of a blood clot by hour $j$ after baseline. We assume that the $N$ subjects are iid realizations of the random vector $(V,\bar{A}_T,\bar{Y}_T,\bar{U}_T)$ and that $U_j$ precedes $A_j$ and $Y_j$ in the temporal ordering at each $j$. Recall that in a case-crossover study the observed data on subject $i$ are $(\bar{A}_{iT},\bar{Y}_{iT})$, as data on $V$ and $\bar{U}_T$ are not available.

We assume that the causal directed acyclic graph (DAG) (Greenland et al., 1999) in Figure 1 describes the data generating process within levels of baseline confounders $V$. This DAG encodes aspects of informal assumptions (b) and (c). One salient feature of the DAG is that there are no directed paths from a current treatment to an outcome at a later time that do not first pass through the outcome at the time of the current treatment or through a later treatment. This can be considered a representation of informal assumption (b) that the treatment effect is transient. The DAG also excludes any common causes of treatments and outcomes other than $V$ not through past outcomes. (Since occurrence of the outcome at time $j$ determines the values of all variables at all later time points, outcome variable nodes in the DAG trivially must have arrows to all temporally subsequent variables.) This represents informal assumption (c) which bars nonbaseline confounding. This DAG also has fully forward connected treatments with arbitrary common causes of treatment at different times $\bar{U}_{AT}$, indicating that we put no causal restrictions on the treatment assignment process. (We will, however, impose distributional assumptions.) We provide a fuller discussion of causal assumptions in Section 4, but we find it helpful to keep this DAG in mind.

FIGURE 1.

Causal DAG within levels of V. A V node with arrows pointing into every other node was omitted for visual clarity. This figure appears in color in the electronic version of this article, and any mention of color refers to that version

2.2 |. The case-crossover design

The outcome-censored case-crossover Mantel–Haenszel estimator requires data from subjects who experience the outcome on treatment status at the time of outcome occurrence and at designated “control” times preceding the outcome. It is computed as follows:

  • Select a random sample of $H^{*}$ person-times from the $H$ person-times $ij$ satisfying $Y_{ij}=1$, $Y_{i(j-1)}=0$, and $j>W$, where $W$ is a maximum “look back” time chosen by the investigator. We refer to these $H$ person-times when the outcome occurred for the first time and after time $W$ as the set of “case” person-times.

  • Let $i_h j_h$ denote the person-time of the $h$th element of the set of $H^{*}$ sampled case person-times. From the same subject $i_h$, select $m$ times $\{j_h-c_1,\ldots,j_h-c_m\}$ from the $W$ times prior to the time $j_h$ of subject $i_h$’s first outcome event. We call these $m$ times the “control” person-times for subject $i_h$. We discuss selection of “control” times below.

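  • Let $A_h^{1}$ denote the treatment at the case time and $(A_{h1}^{0},\ldots,A_{hm}^{0})$ denote treatments at the $m$ control times in subject $i_h$. The Mantel–Haenszel case-crossover estimator $\widehat{IRR}_{MH}$ is

$$\widehat{IRR}_{MH}=\frac{\sum_{h}\sum_{l=1}^{m}\mathbb{1}\{A_h^{1}=1,A_{hl}^{0}=0\}}{\sum_{h}\sum_{l=1}^{m}\mathbb{1}\{A_h^{1}=0,A_{hl}^{0}=1\}}. \tag{1}$$

Note that for subject $i_h$ the only data necessary to compute $\widehat{IRR}_{MH}$ are $(A_h^{1},A_{h1}^{0},\ldots,A_{hm}^{0})$.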

Intuitively, the more subjects tend to be treated at the time of the outcome but not at earlier control times as opposed to vice versa, the stronger the estimated effect of treatment. To fix ideas, we consider an example of a case-crossover study from the literature. In a simplified version of the Mittleman et al. (1993) study on the impact of exercise on MI mentioned in the introduction, suppose we collect data from a random sample of patients suffering MI on a particular Sunday. We record whether each patient exercised in the hour immediately preceding their MI and whether they exercised in the same hour the day before their MI. We compute the Mantel–Haenszel case-crossover estimator (1): in the numerator is the number of subjects who exercised immediately prior to their MI but not 24 h before, and in the denominator is the number of subjects who did not exercise immediately prior to their MI but did 24 h before. Mittleman et al. estimated a ratio of 5.9 (95% CI 4.6, 7.7).

Many approaches to selecting control times might be acceptable. In the MI example, the lookback window is the 24 h before the MI and there is only one control time exactly 24 h before the outcome time. So $W=24$, $m=1$, and $c_1=24$ in the notation above.
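To make the counting in estimator (1) concrete, the following is a minimal Python sketch. The function name `mantel_haenszel_irr` and the toy treatment indicators are hypothetical illustrations, not data or code from any study; the sketch only assumes that each sampled case contributes one treatment indicator at the case time and $m$ indicators at its control times.

```python
from typing import Sequence


def mantel_haenszel_irr(
    case_treated: Sequence[int],
    control_treated: Sequence[Sequence[int]],
) -> float:
    """Ratio of discordant (case time, control time) pairs, as in estimator (1).

    case_treated[h]       : treatment indicator at the case time of sampled case h
    control_treated[h][l] : treatment indicator at the l-th control time of case h
    """
    treated_case_only = 0     # pairs with case time treated, control time untreated
    treated_control_only = 0  # pairs with case time untreated, control time treated
    for a_case, controls in zip(case_treated, control_treated):
        for a_control in controls:
            if a_case == 1 and a_control == 0:
                treated_case_only += 1
            elif a_case == 0 and a_control == 1:
                treated_control_only += 1
    if treated_control_only == 0:
        raise ValueError("no pairs with untreated case time and treated control time")
    return treated_case_only / treated_control_only


# Hypothetical toy data: five cases, one control time each (m = 1).
toy_case = [1, 1, 0, 1, 1]
toy_controls = [[0], [0], [1], [1], [0]]
toy_estimate = mantel_haenszel_irr(toy_case, toy_controls)
print(toy_estimate)
```

Concordant pairs (treated at both times, or at neither) drop out of the ratio, which is why only subjects whose treatment differs between case and control times inform the estimate.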

3 |. A NATURAL ESTIMAND

Consider $T$ parallel group randomized trials in which, in trial $j$, $j=1,\ldots,T$, treatment is randomly assigned at and only at time $j$ to all subjects who have yet to experience the outcome. Such a time $j$-specific trial could estimate the immediate effect of treatment at time $j$. To formalize, we adopt the counterfactual framework of Robins (1986). Let $Y_{ij}^{\bar{a}_j}$ be the value of the outcome at time $j$ had, possibly contrary to fact, subject $i$ followed treatment regime $\bar{a}_j\equiv(a_1,\ldots,a_j)$ through time $j$. We refer to $Y_{ij}^{\bar{a}_j}$ as a counterfactual or potential outcome. Since we will frequently consider treatment interventions at a single time point, we also introduce the notation $Y_j^{a_j}$ as shorthand for $Y_j^{\bar{A}_{j-1},a_j}$, that is, the counterfactual value of random variable $Y_j$ under observed treatment history through $j-1$ and treatment at time $j$ set to $a_j$. The $j$-specific trial would yield an estimate of the $j$-specific risk ratio or discrete hazard ratio $\beta_j\equiv P(Y_j^{1}=1\mid\bar{Y}_{j-1}=0)/P(Y_j^{0}=1\mid\bar{Y}_{j-1}=0)$. Until Section 5.2, we assume

$\beta\equiv\beta_j$ constant over $j$. (2)

In the next section, we establish (strong) assumptions under which the case-crossover estimator approximately converges to β.

4 |. DERIVATION OF THE COUNTERFACTUAL INTERPRETATION OF THE LIMIT OF THE CASE-CROSSOVER ESTIMATOR

4.1 |. Assumptions

Our goal is to specify natural and near minimal assumptions that allow us to causally interpret the limit of the case-crossover estimator. Counterfactuals and the observed data are linked by the following standard assumption:

Consistency: $Y_j^{\bar{A}_j}=Y_j$ for all $j$. (3)

Consistency states that the counterfactual outcomes corresponding to the observed treatment regimes are equal to the observed outcomes. Consistency is a technical assumption that has no counterpart in the informal assumptions (a)–(e) but is implicit in almost all analyses.

We assume that the causal graph in Figure 1 describes the data-generating process (Greenland et al., 1999). We will state some specific assumptions implied by the graph in counterfactual notation and also state additional assumptions. Figure 1 encodes informal assumption (c) that there are no post-baseline confounders not contained in V, that is,

Sequential Exchangeability: For all $\bar{a}_k$, $A_j \perp\!\!\!\perp \{Y_k^{\bar{a}_k};\,k\ge j\}\mid\bar{Y}_{j-1}=0,\bar{A}_{j-1}=\bar{a}_{j-1},V=v$ for all $j$. (4)

See Appendix B for further details. Assuming consistency, (4) can be read off the Single World Intervention Graph (Richardson & Robins, 2013) for the treatment $\bar{a}_T$ associated with the causal graph of Figure 1. An example violation of (4) in the MI study would be if caffeine intake at hour $j$ both encouraged exercise and increased MI risk at $j$. We might expect that confounders of this sort in the MI study (short-term encouragements to exercise that are associated with MI) are weak.

The DAG in Figure 1 also reflects informal assumption (b) that effects are transient by implying that $A_j$ has no direct effect on $Y_{j+1},\ldots,Y_T$ not through $A_{j+1},\ldots,A_T$ and $Y_j$ for all $j$. Graphically, this is the statement that the only treatment variable that is a parent of $Y_j$ is $A_j$. One might hope that the graphical definition of the transient effect assumption would be equivalent to the assumption that, conditional on $V$, counterfactual hazards are independent of past treatment history, that is, that $\lambda_{vj}^{\bar{a}_j}\equiv p_v(Y_j^{\bar{a}_j}=1\mid\bar{Y}_{j-1}^{\bar{a}_{j-1}}=0)=\lambda_{vj}^{a_j}(\bar{a}_{j-1})\equiv p_v(Y_j^{a_j}=1\mid\bar{Y}_{j-1}=0,\bar{A}_{j-1}=\bar{a}_{j-1})=p_v(Y_j=1\mid\bar{Y}_{j-1}=0,\bar{A}_{j-1}=\bar{a}_{j-1},A_j=a_j)$ is the same for all $\bar{a}_{j-1}$, where the equalities all follow from (4) and (3). However, this is not generally true due to collider bias (Hernán et al., 2004) stemming from the presence of the common causes $\bar{U}_j$ of the outcomes $\bar{Y}_j$ in Figure 1 since conditional on $\bar{Y}_{j-1}=0$, for example, the path $A_{j-1}\rightarrow Y_{j-1}\leftarrow U_{j-1}\rightarrow Y_j$ is open. Because of the collider bias, a formal counterfactual definition of the transient effect assumption requires conditioning on $U$-histories. Specifically, let $\lambda_{vj}^{a_j}(\bar{a}_{j-1},\bar{u}_j)$ denote $p_v(Y_j^{a_j}=1\mid\bar{Y}_{j-1}=0,\bar{A}_{j-1}=\bar{a}_{j-1},\bar{U}_j=\bar{u}_j)$, the conditional counterfactual hazard at time $j$ under treatment $a_j$ given past treatments $\bar{a}_{j-1}$, common causes of outcomes $\bar{u}_j$, and baseline confounders $v$. As implied by Figure 1, we henceforth assume:

UV-Transient Hazards: $\lambda_{vj}^{a_j}(\bar{A}_{j-1}=\bar{a}_{j-1},\bar{U}_j=\bar{u}_j)$ does not depend on $\bar{a}_{j-1}$. (5)

That is, conditional on V and the history of U, the current counterfactual hazard does not depend on past treatments. This assumption is consistent with the absence of any mention of such dependence in the case-crossover literature. Biological considerations determine the plausibility of (5). In the MI study, (5) would be violated if exercise can have delayed effects of more than one hour on MI. Maclure (1991) argued delayed effects would be weak in this setting.

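Under (5), we can write the counterfactual hazard $\lambda_{vj}^{a_j}(\bar{a}_{j-1},\bar{u}_j)$ for any $\bar{a}_{j-1}$ as $\lambda_{vj}^{a_j}(\bar{u}_j)$. Define $\lambda_{vj}^{a_j}=\sum_{\bar{u}_j}\lambda_{vj}^{a_j}(\bar{u}_j)\,p_v(\bar{u}_j\mid\bar{Y}_{j-1}=0)=\mathrm{pr}(Y_j^{a_j}=1\mid\bar{Y}_{j-1}=0,V=v)$ to be the counterfactual hazard at time $j$ given $v$ marginal over $\bar{U}_j$. Note $\lambda_{vj}^{a_j}$ is the treatment arm $a_j$, $v$-specific conditional risk being estimated in the RCT conducted at time $j$ described in Section 3. Thus, $\beta_{vj}=\lambda_{vj}^{1}/\lambda_{vj}^{0}$ is the $v,j$-specific relative risk. Define $\lambda_j^{a_j}=\int\lambda_{vj}^{a_j}\,p(v\mid\bar{Y}_{j-1}=0)\,dv=\mathrm{pr}(Y_j^{a_j}=1\mid\bar{Y}_{j-1}=0)$. Then $\beta_j=\lambda_j^{1}/\lambda_j^{0}=\int\beta_{vj}\,w(v)\,dv$ with weight $w(v)=\lambda_{vj}^{0}\,p(v\mid\bar{Y}_{j-1}=0)/\lambda_j^{0}$. $\beta_j$ is the parameter from the time $j$-specific RCT in (2).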
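The identity $\beta_j=\int\beta_{vj}\,w(v)\,dv$ can be checked numerically. The sketch below assumes, purely for illustration, a discrete $V$ with three levels and made-up hazards; the function name `marginal_and_weighted_ratio` and all numbers are hypothetical.

```python
def marginal_and_weighted_ratio(p_v, lam0, lam1):
    """Compare the marginal hazard ratio with the weighted average of
    stratum-specific ratios.

    p_v  : p(v | Ybar_{j-1} = 0) for each level of a discrete V
    lam0 : untreated counterfactual hazards lambda^0_{vj}
    lam1 : treated counterfactual hazards lambda^1_{vj}
    """
    lam0_marginal = sum(p * l0 for p, l0 in zip(p_v, lam0))
    lam1_marginal = sum(p * l1 for p, l1 in zip(p_v, lam1))
    beta_direct = lam1_marginal / lam0_marginal

    # w(v) = lambda^0_{vj} p(v | Ybar_{j-1} = 0) / lambda^0_j
    weights = [p * l0 / lam0_marginal for p, l0 in zip(p_v, lam0)]
    beta_weighted = sum(w * (l1 / l0) for w, l0, l1 in zip(weights, lam0, lam1))
    return beta_direct, beta_weighted, weights


# Hypothetical three-level V with heterogeneous hazard ratios 2, 1.5, and 4.
beta_direct, beta_weighted, weights = marginal_and_weighted_ratio(
    p_v=[0.5, 0.3, 0.2],
    lam0=[0.01, 0.02, 0.05],
    lam1=[0.02, 0.03, 0.20],
)
print(beta_direct, beta_weighted, sum(weights))
```

Because the weights are proportional to the untreated hazard, strata with high baseline risk dominate $\beta_j$ even when they are a small share of the surviving population.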

We make the constant effects assumption (e) that

Constant Causal Hazard Ratio: $\beta=\beta_j=\beta_{vj}\equiv\lambda_{vj}^{1}/\lambda_{vj}^{0}$ does not depend on $v$, (6)

which is stronger than assumption (2). Note under (6), $\beta_{vj}=\beta_j$ is collapsible over $v$ at each $j$ so that the definition of $\beta_j$ does not depend on the specific variables comprising $V$, which we leave unspecified. Although it is well known that hazard ratios are not collapsible (Greenland, 1999) in general, in our time $j$-specific RCT, $\beta_{vj}$ is just the conditional risk ratio in the follow up period from time $j$ to $j+1$ among those with $\bar{Y}_{j-1}=0$, which is collapsible under (6).

(6) is a very strong assumption unlikely to ever hold exactly. Violations can be less extreme in subpopulations, for example, subjects who exercise regularly in the MI study. We examine sensitivity to violations of (6) in Section 5.2.

We make a rare outcome assumption that holds under all levels of $V$, $\bar{U}$, and $\bar{A}$.

Rare Outcome: $\prod_{j=1}^{T}\big(1-\lambda_{vj}^{a_j}(\bar{u}_j)\big)>1-\epsilon\ \ \forall\,\bar{u}_T,v,\bar{a}_T$ and $\epsilon$ a small positive number. (7)

A consequence of the rare outcome assumption is that (to a good approximation) collider bias induced by conditioning on $\bar{Y}_{j-1}$ can be neglected. Because $(V,\bar{U})$ can be high dimensional and contain post-baseline information, it is unlikely this assumption holds in the MI study. For example, clot formation might cause a violation. But we will see that bias can be small even if this assumption fails as long as cases occurring under the violating $(V,\bar{U})$ levels do not account for a large proportion of total cases.

Mathematically, the final assumption we will need is

Weighted No Time Trends in Treatment:

$$\frac{\sum_{l}\sum_{k>W}\int_v\lambda_{vk}^{0}\,p_v(A_k=0,A_{k-c_l}=1)\left\{\dfrac{p_v(A_k=1,A_{k-c_l}=0)}{p_v(A_k=0,A_{k-c_l}=1)}-1\right\}p(v)\,dv}{\sum_{l}\sum_{k>W}\int_v\lambda_{vk}^{0}\,p_v(A_k=0,A_{k-c_l}=1)\,p(v)\,dv}\approx 0. \tag{8}$$

The left-hand side of (8) is a weighted average of time trends in treatment $\frac{p_v(A_k=1,A_{k-c_l}=0)}{p_v(A_k=0,A_{k-c_l}=1)}-1$ within levels of $V$, with weights equal to $\lambda_{vk}^{0}\,p_v(A_k=0,A_{k-c_l}=1)$. One way to satisfy (8) is if $\left|\frac{p_v(A_k=1,A_{k-c_l}=0)}{p_v(A_k=0,A_{k-c_l}=1)}-1\right|$ is almost always very small, but that is a very strong condition. We find it enlightening to consider a weaker trio of jointly sufficient conditions for (8).

The first of the three jointly sufficient conditions for (8) is

No Time-Modified Confounding:

$$\sum_{l=1}^{m}\sum_{k>W}\int_v\lambda_{vk}^{0}\{p_v(A_k=1,A_{k-c_l}=0)-p_v(A_k=0,A_{k-c_l}=1)\}\,p(v)\,dv\approx\sum_{l=1}^{m}\sum_{k>W}\left[\int_v\lambda_{vk}^{0}\,p(v)\,dv\times\int_v\{p_v(A_k=1,A_{k-c_l}=0)-p_v(A_k=0,A_{k-c_l}=1)\}\,p(v)\,dv\right], \tag{9}$$

where $k-c_l$ is a control time for an outcome occurring at $k$. A sufficient condition for (9) to hold is that, for each $k$ and $l$, the marginal covariance $\mathrm{Cov}\{p_V(A_k=1,A_{k-c_l}=0)-p_V(A_k=0,A_{k-c_l}=1),\,\lambda_{Vk}^{0}\}$ between the random functions $p_V(A_k=1,A_{k-c_l}=0)-p_V(A_k=0,A_{k-c_l}=1)$ and $\lambda_{Vk}^{0}$ of $V$ is near zero. In fact, we require only that the sum over $k$ and $l$ of the $k$-specific covariances for each control time is near zero. This condition prevents bias from so-called time modified baseline confounders $V$ (Platt et al., 2009) which, by definition, are baseline confounders $V$ that predict both (i) the hazard of an unexposed subject failing at various times $k$ and (ii) the difference in marginal probabilities of the events $(A_k=0,A_{k-c_l}=1)$ and $(A_k=1,A_{k-c_l}=0)$. The case-crossover literature distinguishes between baseline and post-baseline confounders and says the former are allowed but not the latter. The more relevant distinction is whether a confounder has time-varying effects. To understand the issue, first consider a post-baseline confounder. We gave the example earlier of caffeine intake ($C_k$) at time $k$ impacting probability of both exercise and MI at $k$ (more precisely, between $k$ and $k+1$). $C_k$ is temporally a post-baseline variable as its value is realized at time $k$, but in the causal ordering it could be equivalent to a baseline variable if it is not influenced by past treatments. For example, coffee at time $k$ could be equivalent to a $k$-hour delayed release caffeine pill at baseline. Suppose $D(k)\in V$ is a baseline variable (like the delayed release caffeine pill) such that $D(k)=1$ causes $A_k=1$ and $Y_k=1$ to be more likely. $D(k)$ would induce bias just like $C_k$, even though $D(k)\in V$ is a baseline confounder that (unlike $C_k$) would not lead to a violation of (4). However, whenever $D(k)=1$, $p_v(A_k=1,A_{k-c_l}=0)-p_v(A_k=0,A_{k-c_l}=1)$ and $\lambda_{vk}^{0}$ will both be large, inducing a correlation of the sort banned by (9). Thus, (9) serves to ban time modified confounding.
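A small numerical sketch may help show how the weighted time trend on the left-hand side of (8) reacts to a baseline variable that predicts both the untreated hazard and the direction of the within-stratum treatment trend. The sketch assumes a single case time $k$, a single control time, and a two-level discrete $V$; the function name `weighted_time_trend` and every number are hypothetical.

```python
def weighted_time_trend(strata):
    """Left-hand side of (8) for one case time, one control time, discrete V.

    Each stratum is a dict with hypothetical entries:
      p_v  : probability of the stratum
      lam0 : untreated hazard at the case time
      p10  : p_v(A_k = 1, A_{k-c} = 0)
      p01  : p_v(A_k = 0, A_{k-c} = 1)
    """
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        weight = s["lam0"] * s["p01"] * s["p_v"]
        trend = s["p10"] / s["p01"] - 1.0
        numerator += weight * trend
        denominator += weight
    return numerator / denominator


# Opposite within-stratum trends that cancel marginally (p10 = p01 = 0.11 overall).
equal_hazards = [
    {"p_v": 0.5, "lam0": 0.01, "p10": 0.10, "p01": 0.12},
    {"p_v": 0.5, "lam0": 0.01, "p10": 0.12, "p01": 0.10},
]
# Same trends, but the stratum with the upward trend also has the higher hazard.
unequal_hazards = [
    {"p_v": 0.5, "lam0": 0.01, "p10": 0.10, "p01": 0.12},
    {"p_v": 0.5, "lam0": 0.05, "p10": 0.12, "p01": 0.10},
]
trend_equal = weighted_time_trend(equal_hazards)
trend_unequal = weighted_time_trend(unequal_hazards)
print(trend_equal, trend_unequal)
```

In both toy scenarios the marginal probabilities of the two discordant patterns are equal, so there is no marginal time trend; only in the second does the hazard covary with the within-stratum trend, which is the kind of dependence (9) is meant to exclude.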

The second of the jointly sufficient conditions for (8) is a sort of positivity assumption that probability of outcome occurrences and probability of discordant exposure pairs are not so negatively correlated across levels of V that they almost never co-occur.

Discordance Positivity: $\int_v\lambda_{vk}^{0}\,p_v(A_k=0,A_{k-c_l}=1)\,p(v)\,dv>\delta\int_v\lambda_{vk}^{0}\,p(v)\,dv$ for some $\delta>0$ (10)

for all $k,c_l$ such that $k-c_l$ would be a control time if the outcome were to occur at $k$.

The last of the jointly sufficient conditions for (8) formalizes the informal assumption (d) of no time trends in treatment.

No Time Trends in Treatment: $[p(A_k=1,A_{k-c_l}=0)-p(A_k=0,A_{k-c_l}=1)]/\delta<\epsilon_2$ (11)

for all $k,c_l$ such that $k-c_l$ would be a control time if the outcome were to occur at $k$, where $\delta$ is defined in (10) and $\epsilon_2$ is a small positive number. Assumption (11) is essentially the marginal pairwise exchangeability assumption previously derived by Vines and Farrington (2001). Note that the assumption is marginal over $V$ and $\bar{U}_k$ so that it is empirically checkable (apart from $\delta$ being unknown). In Web Appendix A, we discuss bias inflation from small $\delta$ due to strong negative correlation between $\lambda_{vk}^{0}$ and $p_v(A_k=0,A_{k-c_l}=1)$ and provide an example where the treatment time trend is very small (on an absolute scale, though not compared to $\delta$), and yet the estimator is significantly biased. Under (11), exposure can still exhibit arbitrarily complex temporal dependence as in the DAG in Figure 1. Whether (11) holds can depend in part on how control times are chosen. In the MI study, control times 12 h prior to the outcome could be much less likely to satisfy (11) than control times 24 h prior (e.g., 2 PM the previous day would be a better control time than 2 AM the morning of an MI that occurred at 2 PM).

4.2 |. The limit of the Mantel–Haenszel estimator

In the theorem below, we consider the probability limit $IRR_{MH}$ of $\widehat{IRR}_{MH}$ (1) in the outcome-censored case-crossover design under an asymptotic sequence in which the full cohort, the number of cases in the cohort, and the number of sampled cases grow at similar rates, that is, $N\to\infty$, $H/N\to d_1>0$, and $H^{*}/H\to d_2>0$, where $H$ is the number of cases in the cohort and $H^{*}$ is the number of sampled cases. We also assume subjects are iid.

Theorem 1.

Assume (3)–(8), or (3)–(7) and (9)–(11), hold for some $V$ and $\bar{U}$. Then, under the outcome-censored case-crossover design, $\widehat{IRR}_{MH}\xrightarrow{p}IRR_{MH}\approx\beta$.

The proof is in Appendix A, along with byproducts of the derivation useful for bias analysis.

5 |. ANALYSIS OF SELECTED SOURCES OF BIAS

5.1 |. Bias due to strong common causes of the outcome

As discussed earlier, our rare outcome assumption within levels of (possibly post-baseline and high dimensional) common causes of the outcome is novel and unreasonably strong. In this subsection, we will examine analytically and through simulations the bias that arises when it fails even under a stronger constant effect assumption that $\beta=\lambda_{vk}^{1}(\bar{u}_k)/\lambda_{vk}^{0}(\bar{u}_k)$ does not depend on $v$, $k$, or $\bar{u}_k$. We first consider the special case in which, at each time $k$, exposure is determined by an independent coin flip with success probability $p$. In that case, as shown in Remark 1 in Appendix A, the multiplicative bias of the case-crossover estimator is well approximated by

$$\frac{\int_v\sum_{k>W}\sum_{\bar{u}_k}M_v(\bar{u}_k)\{1-\lambda_{v,k-c}^{0}(\bar{u}_{k-c})\}\,p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar{u}_k}M_v(\bar{u}_k)\{1-\lambda_{v,k-c}^{1}(\bar{u}_{k-c})\}\,p(v)\,dv}, \tag{12}$$

where $M_v(\bar{u}_k)=\lambda_{vk}^{0}(\bar{u}_k)\left\{\prod_{j=1}^{k}p_v(u_j\mid\bar{Y}_{j-1}=0,\bar{U}_{j-1}=\bar{u}_{j-1})\right\}$.

Disparities between the numerator and denominator of the bias term (12) will lead to bias of the estimator. Before examining disparities related to nonnegligible $V$ and $U$-specific hazards, we note that the bias contribution of a disparity at a given level of $v$ and $\bar{u}_k$ depends on the weight $M_v(\bar{u}_k)\,p(v)$, which is large when both the probability of observing $(v,\bar{u}_k)$ and the probability of an untreated event occurring at $k$ given $v$ and $\bar{u}_k$ are large. Thus, the larger the proportion of total cases occurring at $v$ and $\bar{u}_k$, the more that failure of the rare outcome assumption at $v$ and $\bar{u}_k$ biases the estimator.

The only difference between the numerator and denominator of (12) is that where $1-\lambda_{v,k-c}^{0}(\bar{u}_{k-c})$ appears in the numerator, $1-\lambda_{v,k-c}^{1}(\bar{u}_{k-c})$ appears in the denominator. The ratio of the term in the numerator to that in the denominator is $\frac{1-\lambda_{v,k-c}^{0}(\bar{u}_{k-c})}{1-\beta\lambda_{v,k-c}^{0}(\bar{u}_{k-c})}$. When $\beta=1$, this factor is equal to 1 and there is no bias. When $\beta\neq1$, the bias is away from the null since $\frac{1-\lambda_{v,k-c}^{0}(\bar{u}_{k-c})}{1-\beta\lambda_{v,k-c}^{0}(\bar{u}_{k-c})}>1$ if and only if $\beta>1$, and thus the MH estimator converges to a limit that is further from 1 than the true $\beta$ and in the same direction. For the multiplicative bias to be nonnegligible requires a violation of the rare outcome assumption in which there exist histories $v$, $\bar{u}_k$ for which both $M_v(\bar{u}_k)/\sum_{\bar{u}_k}M_v(\bar{u}_k)$ and $\beta\lambda_{v,k-c}^{0}(\bar{u}_{k-c})$ are nonnegligible.
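The direction-of-bias argument rests on the single factor $(1-\lambda^{0})/(1-\beta\lambda^{0})$. The sketch below evaluates it at a few illustrative values; the function name `survival_ratio` and the chosen hazards are hypothetical.

```python
def survival_ratio(beta: float, lam0_control: float) -> float:
    """Ratio (1 - lam0) / (1 - beta * lam0) of untreated to treated survival
    at the control time, for a given untreated hazard at that time."""
    lam1_control = beta * lam0_control
    if not (0.0 <= lam0_control < 1.0 and 0.0 <= lam1_control < 1.0):
        raise ValueError("hazards must lie in [0, 1)")
    return (1.0 - lam0_control) / (1.0 - lam1_control)


null_effect = survival_ratio(beta=1.0, lam0_control=0.45)     # no bias under the null
harmful_common = survival_ratio(beta=2.0, lam0_control=0.45)  # > 1: pushes limit above beta
protective = survival_ratio(beta=0.5, lam0_control=0.45)      # < 1: pushes limit below beta
harmful_rare = survival_ratio(beta=2.0, lam0_control=0.001)   # close to 1 when outcome is rare
print(null_effect, harmful_common, protective, harmful_rare)
```

The factor only departs meaningfully from 1 when the hazard at the control time is itself large, which is why the rare outcome assumption must hold within levels of $(V,\bar{U})$ and not merely marginally.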

We illustrate this bias with a simulation. For N=100,000 subjects, we simulated treatments and counterfactual outcomes for 24 time steps or until the first occurrence of the outcome according to the following data-generating process (DGP).

$$U_t\sim\mathrm{Bernoulli}(.001);\quad\lambda_t^{0}(U_{t-1},U_t)=\min(1/2,\ .45U_{t-1}+.45U_t)$$
$$Y_t^{0}\sim\mathrm{Bernoulli}(\lambda_t^{0}(U_{t-1},U_t));\quad\lambda_t^{1}(U_{t-1},U_t)=2\lambda_t^{0}(U_{t-1},U_t)$$
$$Y_t=A_tY_t^{1}+(1-A_t)Y_t^{0}$$

The DAG for this DGP is depicted in Figure 2. The true value of $\beta$ is 2. There are no common causes of treatments and outcomes, treatments are independent identically distributed and hence exhibit no time trends, and the outcome is rare when marginalized over $U$. (Although the outcome is not rare when $U_t=1$, it is rare that $U_t=1$.) Yet the limit of the case-crossover estimator using the time prior to outcome occurrence as the control is approximately 2.8. The estimator fails because the outcome was common when $U_t$ or $U_{t-1}$ were 1 and a large proportion of total cases occurred when $U_t$ or $U_{t-1}$ were 1. The bias is away from the null, as predicted by our analysis above. The effect of $U$ on the outcome needed to be strong to produce the bias in this simulation. If $\lambda_t^{0}(U_{t-1},U_t)=\min(1/2,\ .25U_{t-1}+.25U_t)$ instead of $\min(1/2,\ .45U_{t-1}+.45U_t)$, then the case-crossover estimator is about 2.3 instead of 2.8. A recently formed blood clot could roughly play the role of $\bar{U}$ in the MI example: a rare event that does not influence probability of exposure, greatly increases probability of the outcome at multiple time points after the clot forms, and without which the outcome is rare.
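The reported limits can be approximated without Monte Carlo noise by propagating probabilities through the DGP exactly. The following Python sketch is not the authors' simulation code. It assumes, beyond what the displayed DGP states, that $A_t\sim\mathrm{Bernoulli}(0.5)$ independently at each step, that $Y_t^{1}\sim\mathrm{Bernoulli}(\lambda_t^{1})$, that a clot indicator $U_0$ is drawn at time 0, and that only cases at $t\ge2$ are counted so that the preceding step is available as the control time. The function name `case_crossover_limit` and its parameters are hypothetical.

```python
def case_crossover_limit(beta=2.0, coef=0.45, p_u=0.001, p_a=0.5, n_steps=24):
    """Ratio of expected discordant-pair counts, with control time t - 1.

    alive[(u_prev, a_prev)] holds the probability of being event-free so far
    with the given previous-step values of U and A.
    """
    alive = {}
    for u0 in (0, 1):
        for a0 in (0, 1):  # a0 is a dummy: pairs with case time t = 1 are not counted
            alive[(u0, a0)] = (p_u if u0 else 1 - p_u) * (p_a if a0 else 1 - p_a)

    numerator = 0.0    # expected cases with A_t = 1, A_{t-1} = 0
    denominator = 0.0  # expected cases with A_t = 0, A_{t-1} = 1
    for t in range(1, n_steps + 1):
        next_alive = {(u, a): 0.0 for u in (0, 1) for a in (0, 1)}
        for (u_prev, a_prev), mass in alive.items():
            for u in (0, 1):
                for a in (0, 1):
                    weight = mass * (p_u if u else 1 - p_u) * (p_a if a else 1 - p_a)
                    lam0 = min(0.5, coef * u_prev + coef * u)
                    hazard = min(1.0, beta * lam0) if a else lam0
                    if t >= 2:
                        if a == 1 and a_prev == 0:
                            numerator += weight * hazard
                        elif a == 0 and a_prev == 1:
                            denominator += weight * hazard
                    next_alive[(u, a)] += weight * (1 - hazard)
        alive = next_alive
    return numerator / denominator


limit_strong = case_crossover_limit(coef=0.45)
limit_weaker = case_crossover_limit(coef=0.25)
limit_null = case_crossover_limit(beta=1.0)
print(limit_strong, limit_weaker, limit_null)
```

Under these assumptions the two nonnull limits should land near the values of about 2.8 and 2.3 quoted in the text, and the null case should return 1, consistent with the claim that the design remains valid for testing the causal null.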

FIGURE 2.

Causal DAG for simulation DGP with unobserved post-baseline common causes of outcomes at different times. This figure appears in color in the electronic version of this article, and any mention of color refers to that version

Now we consider bias in the more general scenario where treatments are correlated across time. In Appendix A, we expand the multiplicative bias term as

$$\frac{\int_v\sum_{k>W}\sum_{\bar{u}_k}M_v(\bar{u}_k)\big(1-\lambda_{v,k-c}^{0}(\bar{u}_{k-c})\big)\sum_{\bar{a}_{k/k,k-c}}G_v(1,0,\bar{a}_{k/k,k-c},\bar{u}_k)\prod_{s\neq k-c,k}\big(1-\lambda_{vs}^{a_s}(\bar{u}_s)\big)\,p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar{u}_k}M_v(\bar{u}_k)\big(1-\lambda_{v,k-c}^{1}(\bar{u}_{k-c})\big)\sum_{\bar{a}_{k/k,k-c}}G_v(0,1,\bar{a}_{k/k,k-c},\bar{u}_k)\prod_{s\neq k-c,k}\big(1-\lambda_{vs}^{a_s}(\bar{u}_s)\big)\,p(v)\,dv}, \tag{13}$$

where $\bar{a}_{k/k,k-c}$ denotes $\bar{a}_k$ excluding $a_k$ and $a_{k-c}$, and $G_v(a,a',\bar{a}_{k/k,k-c},\bar{u}_k)$ (defined in Appendix A) roughly corresponds to the probability of observing a treatment trajectory with $a_k=a$, $a_{k-c}=a'$, and treatment at the other time points equal to $\bar{a}_{k/k,k-c}$. When treatments are correlated, $G_v(1,0,\bar{a}_{k/k,k-c},\bar{u}_k)$ in the numerator might assign high weights to different treatment sequences $\bar{a}_{k/k,k-c}$ than $G_v(0,1,\bar{a}_{k/k,k-c},\bar{u}_k)$ in the denominator, and under failure of the rare outcome assumption the highly weighted treatment sequences in the numerator might have significantly different survival probabilities ($\prod_{s\neq k-c,k}(1-\lambda_{vs}^{a_s}(\bar{u}_s))$) for some values of $v$ and $\bar{u}_k$ than the highly weighted treatment trajectories in the denominator. By the reasoning we applied to infer direction of bias in the case with uncorrelated exposures, strongly weighted untreated survival probabilities in the numerator combined with strongly weighted treated survival probabilities in the denominator would lead to bias away from the null, and vice versa. Depending on the treatment correlation pattern, treated or untreated survival probabilities might be more strongly weighted in the numerator or denominator. Thus, in the correlated treatment case the resulting bias can be either toward or away from the null. As in the case without correlated exposures, the magnitude of the bias contribution stemming from this dynamic for a given $v$ and $\bar{u}_k$ depends on $M_v(\bar{u}_k)$.

To illustrate, we modify our previous simulation example to add correlations in treatments over a time period much greater than the duration of the exposure’s transient effect. Specifically, we reduce the duration of the transient effect from 1 h to 1 s. Exposure and the unobserved common cause of the outcome are still independently assigned to 1 h intervals as in the previous simulation. This induces perfect correlation between treatments corresponding to 1 s time bins within the same hour. The untreated 1 s discrete hazards are set to preserve the hourly untreated survival probability from the previous simulation, and the multiplicative treatment effect within each one second bin is again set to 2. To formalize, we simulated data according to

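$$\tilde U_k \sim \text{Bernoulli}(0.001) \text{ for } k\in\{1,\ldots,24\}; \quad U_{kt}=\tilde U_k \text{ for } k\in\{1,\ldots,24\},\ t\in\{1,\ldots,3600\}$$
$$\tilde A_k \sim \text{Bernoulli}(0.5) \text{ for } k\in\{1,\ldots,24\}; \quad A_{kt}=\tilde A_k \text{ for } k\in\{1,\ldots,24\},\ t\in\{1,\ldots,3600\}$$
$$\lambda^0_{kt}(\bar U_{kt})=0.000166\,(U_{kt}+U_{k-1,t}); \quad Y^0_{kt}\sim \text{Bernoulli}(\lambda^0_{kt}(\bar U_{kt})); \quad \lambda^1_{kt}(\bar U_{kt})=2\lambda^0_{kt}(\bar U_{kt})$$
$$Y^1_{kt}\sim \text{Bernoulli}(\lambda^1_{kt}(\bar U_{kt})); \quad Y_{kt}=A_{kt}Y^1_{kt}+(1-A_{kt})Y^0_{kt}$$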

where we have indexed “hours” by k and seconds within hours by t. The true value of β in this DGP is again 2, but the case-crossover estimate using the time bin exactly one hour (3600 s) prior to the case as the control (as in the previous simulation) is 1.84. So decreasing the transient exposure effect to 1 s without changing either the case-crossover estimator or the treatment duration of 1 h made the bias switch direction. The two simulations taken together illustrate that bias from strong common causes of the outcome, when present, can be both sizable and unpredictable. (See Web Appendix B for analytic confirmation of simulation results from both DGPs using (A.4), discussion of what drives the discrepancy between the two simulations, and further analysis of bias in the correlated exposure setting.)
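Because treatment and the common cause are constant within each hour in this DGP, the large-sample limit of the estimator can be sketched at the hour level without simulating individual seconds. The following Python sketch is not the authors' R code; the function names, the convention that the common cause is 0 before the first hour, and the hour-level recursion are assumptions made for illustration. It tracks the probability of surviving to the start of each hour jointly with the previous hour's common cause and treatment, and accumulates the two discordant-pair probabilities.

```python
from itertools import product


def hourly_event_prob(s, a, base=0.000166, effect=2.0, seconds=3600):
    """P(outcome within an hour | s = U_k + U_{k-1}, hourly treatment a)."""
    per_second = base * s * (effect if a else 1.0)
    return 1.0 - (1.0 - per_second) ** seconds


def case_crossover_limit(hours=24, p_u=0.001, p_a=0.5, effect=2.0):
    """Limit of the one-control estimator with the control 3600 s before the case.

    Assumption (not stated in the text): the common cause is 0 before hour 1.
    """
    # state[(u_prev, a_prev)] = P(no outcome yet, previous hour had u_prev, a_prev)
    state = {(0, None): 1.0}
    num = den = 0.0
    for k in range(1, hours + 1):
        new_state = {}
        for (u_prev, a_prev), mass in state.items():
            for u, a in product((0, 1), (0, 1)):
                w = mass * (p_u if u else 1 - p_u) * (p_a if a else 1 - p_a)
                p_event = hourly_event_prob(u + u_prev, a, effect=effect)
                if k >= 2:  # cases in the first hour have no control time
                    if a == 1 and a_prev == 0:
                        num += w * p_event
                    if a == 0 and a_prev == 1:
                        den += w * p_event
                key = (u, a)
                new_state[key] = new_state.get(key, 0.0) + w * (1 - p_event)
        state = new_state
    return num / den


# Should land near the 1.84 reported above (true hazard ratio is 2).
print(f"{case_crossover_limit():.3f}")
```

Setting `effect=1.0` returns a limit of 1, in line with the observation that common causes of the outcome do not induce bias under the null.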

5.2 |. Treatment effect heterogeneity

We now examine sensitivity to violations of the constant causal hazard ratio assumption if the rare outcome assumption holds. For simplicity, we consider a scenario where there are just two types of subjects and counterfactual hazard ratios are constant across time within types. For $g\in\{0,1\}$, say subjects of type $g$ arise from the following data-generating process:

$$A_1,\ldots,A_T \overset{iid}{\sim} \text{Bernoulli}(p_{A,g}); \quad Y^0_1,\ldots,Y^0_T \overset{iid}{\sim} \text{Bernoulli}(\lambda^0_g); \quad Y^1_1,\ldots,Y^1_T \overset{iid}{\sim} \text{Bernoulli}(\lambda^1_g); \quad Y_j=A_jY^1_j+(1-A_j)Y^0_j,$$

with data censored at the first occurrence of the outcome. So within each type $g$, the constant causal hazard ratio is $\lambda^1_g/\lambda^0_g$. Let $p_g$ denote the proportion of the population of type $g=1$ at baseline, which under the rare outcome assumption would also be approximately the proportion of type $g=1$ among surviving subjects at all subsequent follow-up times. According to Equation (A.6) from the proof of Theorem 1, if the rare outcome assumption holds, then the case-crossover estimator with $m=1$ (i.e., using just one control) will approach

$$\frac{\lambda^1_{g=1}\,p_{A,g=1}(1-p_{A,g=1})\,p_g+\lambda^1_{g=0}\,p_{A,g=0}(1-p_{A,g=0})(1-p_g)}{\lambda^0_{g=1}\,p_{A,g=1}(1-p_{A,g=1})\,p_g+\lambda^0_{g=0}\,p_{A,g=0}(1-p_{A,g=0})(1-p_g)}. \tag{14}$$

(14) can be expressed as a weighted average of $\lambda^1_{g=0}/\lambda^0_{g=0}$ and $\lambda^1_{g=1}/\lambda^0_{g=1}$,

$$\frac{\lambda^1_{g=0}}{\lambda^0_{g=0}}\,\frac{\delta}{\delta+\theta}+\frac{\lambda^1_{g=1}}{\lambda^0_{g=1}}\,\frac{\theta}{\delta+\theta},$$

where $\delta=\lambda^0_{g=0}\,p_{A,g=0}(1-p_{A,g=0})(1-p_g)$ and $\theta=\lambda^0_{g=1}\,p_{A,g=1}(1-p_{A,g=1})\,p_g$. Hence, the limit of the case-crossover estimator is bounded by the group-specific hazard ratios.
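The step from (14) to the weighted-average form is a one-line rearrangement. The shorthand $w_0$ and $w_1$ below is introduced here only for illustration and does not appear in the original; $\delta$ and $\theta$ are as defined above.

```latex
% Shorthand (assumed for this sketch):
%   w_0 = p_{A,g=0}(1-p_{A,g=0})(1-p_g),   w_1 = p_{A,g=1}(1-p_{A,g=1})\,p_g,
% so that \delta = \lambda^0_{g=0} w_0 and \theta = \lambda^0_{g=1} w_1.
\begin{align*}
\frac{\lambda^1_{g=1} w_1 + \lambda^1_{g=0} w_0}{\lambda^0_{g=1} w_1 + \lambda^0_{g=0} w_0}
  &= \frac{\dfrac{\lambda^1_{g=0}}{\lambda^0_{g=0}}\,\lambda^0_{g=0} w_0
         + \dfrac{\lambda^1_{g=1}}{\lambda^0_{g=1}}\,\lambda^0_{g=1} w_1}
          {\lambda^0_{g=0} w_0 + \lambda^0_{g=1} w_1} \\
  &= \frac{\lambda^1_{g=0}}{\lambda^0_{g=0}}\,\frac{\delta}{\delta+\theta}
   + \frac{\lambda^1_{g=1}}{\lambda^0_{g=1}}\,\frac{\theta}{\delta+\theta}.
\end{align*}
```

Since the two weights are nonnegative and sum to one, the limit lies between the two group-specific hazard ratios.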

The relative risk computed from any of the RCTs described in Section 3 would approach

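$$\frac{\lambda^1_{g=1}\,p_g+\lambda^1_{g=0}(1-p_g)}{\lambda^0_{g=1}\,p_g+\lambda^0_{g=0}(1-p_g)}. \tag{15}$$

Like the case-crossover limit, the RCT estimand can be expressed as a weighted average of $\lambda^1_{g=0}/\lambda^0_{g=0}$ and $\lambda^1_{g=1}/\lambda^0_{g=1}$:

$$\frac{\lambda^1_{g=0}}{\lambda^0_{g=0}}\,\frac{\lambda^0_{g=0}(1-p_g)}{\lambda^0_{g=0}(1-p_g)+\lambda^0_{g=1}\,p_g}+\frac{\lambda^1_{g=1}}{\lambda^0_{g=1}}\,\frac{\lambda^0_{g=1}\,p_g}{\lambda^0_{g=0}(1-p_g)+\lambda^0_{g=1}\,p_g}. \tag{16}$$

Without loss of generality, assume $\lambda^1_{g=0}/\lambda^0_{g=0}>\lambda^1_{g=1}/\lambda^0_{g=1}$. The ratio of the weight placed on the higher hazard ratio to the weight placed on the lower hazard ratio in the RCT estimand is

$$\gamma_{RCT}\equiv\frac{\lambda^0_{g=0}(1-p_g)}{\lambda^0_{g=1}\,p_g}. \tag{17}$$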

The corresponding case-crossover weight ratio is

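$$\gamma_{CC}\equiv\frac{\lambda^0_{g=0}(1-p_g)\,p_{A,g=0}(1-p_{A,g=0})}{\lambda^0_{g=1}\,p_g\,p_{A,g=1}(1-p_{A,g=1})}=\gamma_{RCT}\times\frac{p_{A,g=0}(1-p_{A,g=0})}{p_{A,g=1}(1-p_{A,g=1})}. \tag{18}$$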

(18) implies that bias of the case-crossover estimator due to treatment effect heterogeneity depends on the difference in treatment probability between groups with different effect sizes. If treatment probability does not vary across groups with different treatment effects, effect heterogeneity will not induce bias in the case-crossover estimator. When treatment probabilities do vary, whichever group has higher treatment variance $p_{A,g}(1-p_{A,g})$, that is, whichever group has probability of treatment closer to 0.5, will be weighted too highly by the case-crossover estimator compared to the RCT estimand. Some intuition behind this behavior is that the closer the treatment probability within a group is to 0.5, the more subjects from that group will contribute discordant case-control pairs to the case-crossover estimator, weighting the estimator disproportionately toward the effect within that group.
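The relationship between the two weight ratios can be checked numerically. The sketch below is illustrative only: the function name and the parameter values are hypothetical, and it uses the second equality in (18), under which the case-crossover ratio is the RCT ratio (17) rescaled by the ratio of treatment variances.

```python
import math


def weight_ratios(lam0_g0, lam0_g1, p_g, p_a_g0, p_a_g1):
    """Return (gamma_RCT, gamma_CC) following (17) and (18).

    lam0_g0, lam0_g1: untreated hazards in groups g=0 and g=1
    p_g: proportion of the population with g=1
    p_a_g0, p_a_g1: treatment probabilities in each group
    """
    gamma_rct = lam0_g0 * (1 - p_g) / (lam0_g1 * p_g)
    var_g0 = p_a_g0 * (1 - p_a_g0)  # treatment variance in group 0
    var_g1 = p_a_g1 * (1 - p_a_g1)  # treatment variance in group 1
    return gamma_rct, gamma_rct * var_g0 / var_g1


# Hypothetical values: equal treatment probabilities leave the ratio unchanged.
same = weight_ratios(0.001, 0.0005, 0.5, 0.3, 0.3)
print(math.isclose(same[0], same[1]))

# Group 1 is closer to 0.5, so the case-crossover limit shifts weight toward it
# (gamma_CC < gamma_RCT means relatively less weight on group 0).
diff = weight_ratios(0.001, 0.0005, 0.5, 0.9, 0.5)
print(diff[1] < diff[0])
```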

For illustrative purposes, consider a numerical example where we set

$$\lambda^0_{g=0}=0.001;\quad \lambda^1_{g=0}=0.002;\quad \lambda^0_{g=1}=0.0005;\quad \lambda^1_{g=1}=0.005;\quad p_{A,g=0}=0.8;\quad p_{A,g=1}=0.5;\quad p_g=0.5.$$

Then $\lambda^1_{g=1}/\lambda^0_{g=1}=10$, $\lambda^1_{g=0}/\lambda^0_{g=0}=2$, and the RCT estimand (15) is equal to 4.67. $\widehat{IRR}_{MH}$ converges to 5.5, while the naive cohort hazard ratio estimator $P(Y=1\mid A=1)/P(Y=1\mid A=0)$ that does not adjust for the confounder $g$ approaches 4.9. In this example, bias from effect heterogeneity overrides any benefits from control of unobserved confounding.
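The three limiting values in this example can be reproduced directly from (14) and (15). The sketch below is a minimal illustration rather than the authors' code; the function names are hypothetical, and the naive cohort quantity is computed as a single person-time risk ratio pooled over the two types, which is an assumed reading of the unadjusted estimator.

```python
# Hazards indexed as LAM[g][a]; treatment probabilities as P_A[g].
LAM = {0: {0: 0.001, 1: 0.002}, 1: {0: 0.0005, 1: 0.005}}
P_A = {0: 0.8, 1: 0.5}
P_G1 = 0.5  # proportion of subjects with g = 1


def type_prob(g, p_g1):
    return p_g1 if g == 1 else 1 - p_g1


def rct_limit(lam, p_g1):
    """Equation (15): marginal risk ratio under randomized treatment."""
    num = sum(lam[g][1] * type_prob(g, p_g1) for g in (0, 1))
    den = sum(lam[g][0] * type_prob(g, p_g1) for g in (0, 1))
    return num / den


def case_crossover_limit(lam, p_a, p_g1):
    """Equation (14): each type is weighted by its treatment variance."""
    w = {g: p_a[g] * (1 - p_a[g]) * type_prob(g, p_g1) for g in (0, 1)}
    num = sum(lam[g][1] * w[g] for g in (0, 1))
    den = sum(lam[g][0] * w[g] for g in (0, 1))
    return num / den


def naive_cohort_limit(lam, p_a, p_g1):
    """P(Y=1 | A=1) / P(Y=1 | A=0), ignoring the type g."""
    risk1 = (sum(lam[g][1] * p_a[g] * type_prob(g, p_g1) for g in (0, 1))
             / sum(p_a[g] * type_prob(g, p_g1) for g in (0, 1)))
    risk0 = (sum(lam[g][0] * (1 - p_a[g]) * type_prob(g, p_g1) for g in (0, 1))
             / sum((1 - p_a[g]) * type_prob(g, p_g1) for g in (0, 1)))
    return risk1 / risk0


print(f"{rct_limit(LAM, P_G1):.2f}")                  # 4.67
print(f"{case_crossover_limit(LAM, P_A, P_G1):.2f}")  # 5.51
print(f"{naive_cohort_limit(LAM, P_A, P_G1):.2f}")    # 4.91
```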

The specific numerical example above is a cautionary tale illustrating the potential significance of heterogeneity-induced bias. But if both cohort and case-crossover analyses are feasible with available data, and unobserved baseline confounding and effect heterogeneity vary within realistic ranges, does one estimator tend to be more biased than the other? We addressed this question in the framework of our toy example by computing the limiting values of case-crossover and cohort estimators for a large grid of data-generating process parameter settings. We let $\lambda^0_{g=0}$ and $\lambda^0_{g=1}$ take values in $\{0.0005, 0.001\}$, $\lambda^1_{g=0}/\lambda^0_{g=0}$ take values in $\{1,\ldots,5\}$, $\lambda^1_{g=1}/\lambda^0_{g=1}$ take values in $\{1\times\lambda^1_{g=0}/\lambda^0_{g=0},\ldots,10\times\lambda^1_{g=0}/\lambda^0_{g=0}\}$, and $p_{A,g=0}$ and $p_{A,g=1}$ take values in $\{1/20,\ldots,19/20\}$. Figure 3 shows that neither estimator has a general advantage over the other across parameter settings.
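A grid computation of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the proportion of type $g=1$ subjects is fixed at 0.5 because the text does not give the value used, and multiplicative bias is taken to be each limit divided by the RCT estimand (15).

```python
from itertools import product


def limits(lam0_g0, lam1_g0, lam0_g1, lam1_g1, p_a_g0, p_a_g1, p_g=0.5):
    """Return (RCT estimand, case-crossover limit, naive cohort limit)."""
    t0, t1 = 1 - p_g, p_g
    rct = (lam1_g1 * t1 + lam1_g0 * t0) / (lam0_g1 * t1 + lam0_g0 * t0)
    w0 = p_a_g0 * (1 - p_a_g0) * t0
    w1 = p_a_g1 * (1 - p_a_g1) * t1
    cc = (lam1_g1 * w1 + lam1_g0 * w0) / (lam0_g1 * w1 + lam0_g0 * w0)
    risk1 = (lam1_g1 * p_a_g1 * t1 + lam1_g0 * p_a_g0 * t0) / (p_a_g1 * t1 + p_a_g0 * t0)
    risk0 = ((lam0_g1 * (1 - p_a_g1) * t1 + lam0_g0 * (1 - p_a_g0) * t0)
             / ((1 - p_a_g1) * t1 + (1 - p_a_g0) * t0))
    return rct, cc, risk1 / risk0


def bias_grid(p_g=0.5):
    """Multiplicative bias of both estimators over the grid described above.

    Assumption: p_g is held at 0.5; the text does not state the value used.
    """
    rows = []
    base = (0.0005, 0.001)
    for lam0_g0, lam0_g1, hr0, mult, i, j in product(
            base, base, range(1, 6), range(1, 11), range(1, 20), range(1, 20)):
        hr1 = mult * hr0  # hazard ratio in group 1 as a multiple of group 0's
        rct, cc, cohort = limits(lam0_g0, hr0 * lam0_g0, lam0_g1, hr1 * lam0_g1,
                                 i / 20, j / 20, p_g)
        rows.append({"i": i, "j": j, "cc_bias": cc / rct, "cohort_bias": cohort / rct})
    return rows


rows = bias_grid()
cc_closer = sum(abs(r["cc_bias"] - 1) < abs(r["cohort_bias"] - 1) for r in rows)
print(len(rows), cc_closer)
```

The final line counts the settings in which the case-crossover limit is closer to the RCT estimand than the cohort limit is, which is one simple way to summarize the comparison.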

FIGURE 3.

Left: Scatterplot of case-crossover versus cohort estimator multiplicative bias across a range of settings. Middle: Distribution of case-crossover estimator bias across settings. Right: Distribution of ratio of case-crossover bias to cohort bias across settings. This figure appears in color in the electronic version of this article, and any mention of color refers to that version

In the MI study, the effect of exercise appeared much greater in subjects who rarely exercised than in those who exercised regularly. Probability of treatment (i.e., exercise) clearly varied considerably between regular and rare exercise groups. Hence, we would expect an estimate of the marginal effect to be biased. The authors of the MI study reported separate effect estimates for the strata (exercise frequency prior to the study period) over which the effect was thought to vary. This is appropriate, as marginal effect estimates for the full population can be misleading.

6 |. DISCUSSION

We have put the case-crossover estimator on more solid theoretical footing by providing a proof of its approximate convergence to a formal counterfactual causal estimand, β, under certain assumptions. This result alone may not be of much utility, but it was overdue for such a widely used method. And the derivation yielded some practical insights as by-products.

First, we discovered a new source of potential bias when the treatment effect is not null: strong common causes of the outcome across time. We analyzed this bias and illustrated its potential significance and unpredictability with simulations. The effect of the common cause needs to be quite strong to induce sizable bias, but the fact that $(V,\bar U)$ can be high dimensional and temporally postbaseline increases the likelihood of this in a real analysis. Formation of a blood clot might induce a bias of this sort in the MI example, but it is difficult to speculate about how often meaningful bias of this type appears in practice.

Second, expression (A.6) characterizing the limit of the case-crossover estimator allowed us to quantify sensitivity to violations of the constant treatment effect assumption. We analyzed a simple scenario with two groups of subjects having potentially different baseline risks, exposure rates, and treatment effects. The limit of the case-crossover estimator was a weighted average of the group-specific hazard ratios. The bias relative to the estimand (2) that would be targeted by an RCT depends on the exposure rates in the groups. If the groups have the same exposure rate, effect heterogeneity would not induce any bias. Otherwise, whichever group had exposure rate closer to 0.5 would be overweighted. We provided a numerical example in which significant unobserved baseline confounding (which could be controlled by the case-crossover estimator) and effect heterogeneity were both present. In this example, the effect heterogeneity bias in the case-crossover estimator was greater than the confounding bias in a standard cohort hazard ratio estimator, illustrating that effect heterogeneity can sometimes override benefits from control of unobserved baseline confounding in the case-crossover estimator. More extensive numerical analyses showed that neither the cohort estimator nor the case-crossover estimator had a general advantage across a range of settings in which the levels of unobserved confounding and effect heterogeneity varied. An analyst concerned about bias from effect heterogeneity could employ the general framework of our numerical studies to conduct a quantitative bias analysis (Lash et al., 2014).

Overall, the formal assumptions required for consistency mostly mapped onto informal assumptions (a)–(e). Unsurprisingly for a method that has been used for 30 years, our contributions do not drastically alter its recommended use. As an illustrative exercise, in Web Appendix C we assess our simplified version of the Mittleman et al. (1993) study of the effect of exercise on MI, assumption by assumption, through the lens of our analysis.

We might summarize our general guidance to practitioners and consumers of case-crossover analyses as follows. If unobserved baseline confounding is thought to be serious and/or data collection for a cohort study is unfeasible, the case-crossover design should be considered as an option. If interest lies only in testing the null hypothesis of no effect, fewer assumptions are necessary. Under the null: the transient treatment effects assumption automatically holds; common causes of the outcome do not induce bias; the rare outcome assumption is not necessary; and there is no treatment effect heterogeneity. Hence, the case-crossover design remains a clever method for causal null hypothesis testing in the presence of unmeasured baseline confounders under the exchangeability (4), no time trends in treatment (11), and no time-modified confounding (9) assumptions. If interest lies in obtaining a point estimate, results should be interpreted with considerable additional caution, as effect heterogeneity, delayed treatment effects, and common causes of outcomes will all be present to some degree and, as we have shown, can have a large impact on results.

There are many variants of the case-crossover design, of which we have here only analyzed arguably the simplest one. One important extension of the MH estimator adjusts for post-baseline confounders through matching. Another variant employs conditional logistic regression in place of the MH estimator. In this case, Vines and Farrington (2001) showed that joint exchangeability is required among all control times and the case time as opposed to just pairwise exchangeability. Additionally, in situations where time trends in treatment are present, the case-time-control method (Suissa, 1995) is often utilized and requires alternative assumptions (Greenland, 1996). The case-crossover design is also frequently applied in air pollution epidemiology. In this setting, the treatment regime is shared among all subjects and later values of treatment are not influenced by past values of subjects’ outcomes, allowing more flexible control time selection strategies, including using control times following outcome occurrence (Janes et al., 2005; Levy et al., 2001; Navidi, 1998). It would be interesting to investigate these variants in a similar counterfactual framework.

Supplementary Material

Web Appendix C (Pg 6)

ACKNOWLEDGMENTS

This research was partly funded by NIH grant R37 AI102634. We also wish to thank Sonia Hernandez-Diaz and Anke Neumann for guidance and helpful discussions.

Funding information

National Institutes of Health, Grant/Award Number: R37 AI102634

APPENDIX A: PROOF OF THEOREM 1

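From the definition of $\widehat{IRR}_{MH}$, it is clear that under the case-crossover design

$$\widehat{IRR}_{MH}\xrightarrow{p}IRR_{MH}=\frac{\sum_{l=1}^{m}\Pr_{MH}(A^1=1,A^0_l=0)}{\sum_{l=1}^{m}\Pr_{MH}(A^1=0,A^0_l=1)}=\frac{\sum_{l=1}^{m}\Pr(A_X=1,A_{X-c_l}=0,X>W)}{\sum_{l=1}^{m}\Pr(A_X=0,A_{X-c_l}=1,X>W)}, \tag{A.1}$$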

where $X$ is the outcome occurrence time random variable, set to $\infty$ if the outcome occurs after end of follow-up (i.e., $X>T$). For simplicity, we present the proof for the case where $m=1$ and, if the outcome occurs at time $k$, there is one control time $k-c$ (so $W=c$). Letting $\bar a_{k/k,k-c}$ denote $\bar a_k$ excluding $a_k$ and $a_{k-c}$, we can express (A.1) as

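$$IRR_{MH}=\frac{\int_v\sum_{k>W}\sum_{\bar a_{k/k,k-c},\,\bar u_k}p_v(Y_k=1,\bar Y_{k-1}=0,A_k=1,A_{k-c}=0,\bar A_{k/k,k-c}=\bar a_{k/k,k-c},\bar U_k=\bar u_k)\,p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar a_{k/k,k-c},\,\bar u_k}p_v(Y_k=1,\bar Y_{k-1}=0,A_k=0,A_{k-c}=1,\bar A_{k/k,k-c}=\bar a_{k/k,k-c},\bar U_k=\bar u_k)\,p(v)\,dv} \tag{A.2}$$
$$=\frac{\int_v\sum_{k>W}\sum_{\bar a_{k/k,k-c},\,\bar u_k}\lambda^1_{vk}(\bar u_k)\,p_v(\bar Y_{k-1}=0,A_k=1,A_{k-c}=0,\bar A_{k/k,k-c}=\bar a_{k/k,k-c},\bar U_k=\bar u_k)\,p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar a_{k/k,k-c},\,\bar u_k}\lambda^0_{vk}(\bar u_k)\,p_v(\bar Y_{k-1}=0,A_k=0,A_{k-c}=1,\bar A_{k/k,k-c}=\bar a_{k/k,k-c},\bar U_k=\bar u_k)\,p(v)\,dv} \tag{A.3}$$
$$=\frac{\sum_{k>W}\int_v\sum_{\bar u_k}\lambda^1_{vk}(\bar u_k)\,p_v(\bar U_k=\bar u_k,A_{k-c}=0,A_k=1,\bar Y_{k-1}=0)\,p(v)\,dv}{\sum_{k>W}\int_v\sum_{\bar u_k}\lambda^0_{vk}(\bar u_k)\,p_v(\bar U_k=\bar u_k,A_{k-c}=1,A_k=0,\bar Y_{k-1}=0)\,p(v)\,dv}. \tag{A.4}$$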

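We get (A.2) by basic probability rules; (A.3) by consistency (3), Sequential Exchangeability (4), and UV-transient hazards (5); and (A.4) by total probability. Under rare outcome assumption (7), $p_v(\bar U_k=\bar u_k\mid A_{k-c}=a',A_k=a,\bar Y_{k-1}=0)\approx p_v(\bar U_k=\bar u_k\mid\bar Y_{k-1}=0)$, as collider paths conditional on $\bar Y_{k-1}$ are effectively closed. So we can approximate (A.4) as

$$IRR_{MH}\approx\frac{\int_v\sum_{k>W}\sum_{\bar u_k}\lambda^1_{vk}(\bar u_k)\,p_v(\bar U_k=\bar u_k\mid\bar Y_{k-1}=0)\,p_v(A_k=1,A_{k-c}=0)\,p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar u_k}\lambda^0_{vk}(\bar u_k)\,p_v(\bar U_k=\bar u_k\mid\bar Y_{k-1}=0)\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv} \tag{A.5}$$
$$=\frac{\int_v\sum_{k>W}\lambda^1_{vk}\,p_v(A_k=1,A_{k-c}=0)\,p(v)\,dv}{\int_v\sum_{k>W}\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv}=\frac{\int_v\sum_{k>W}\beta_{vk}\lambda^0_{vk}\,p_v(A_k=1,A_{k-c}=0)\,p(v)\,dv}{\int_v\sum_{k>W}\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv} \tag{A.6}$$
$$=\beta\,\frac{\int_v\sum_{k>W}\lambda^0_{vk}\,p_v(A_k=1,A_{k-c}=0)\,p(v)\,dv}{\int_v\sum_{k>W}\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv}\equiv\beta\tau, \tag{A.7}$$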

where the first equality in (A.7) follows from the constant hazard ratio assumption (6). We can rewrite τ as

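$$\tau=\frac{\sum_{k>W}\int_v\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\left\{\dfrac{p_v(A_k=1,A_{k-c}=0)}{p_v(A_k=0,A_{k-c}=1)}-1\right\}p(v)\,dv}{\sum_{k>W}\int_v\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv}+1\approx 1\ \text{under (8)}. \tag{A.8}$$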

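Thus $IRR_{MH}\approx\beta$ under (3)–(8). Alternatively, under (9)–(11) we can write $\tau$ as

$$\tau=\frac{\sum_{k>W}\int_v\lambda^0_{vk}\left[p_v(A_k=0,A_{k-c}=1)+\left(p_v(A_k=1,A_{k-c}=0)-p_v(A_k=0,A_{k-c}=1)\right)\right]p(v)\,dv}{\sum_{k>W}\int_v\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv} \tag{A.9}$$
$$=1+\frac{\sum_{k>W}\int_v\lambda^0_{vk}\left(p_v(A_k=1,A_{k-c}=0)-p_v(A_k=0,A_{k-c}=1)\right)p(v)\,dv}{\sum_{k>W}\int_v\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv} \tag{A.10}$$
$$=1+\frac{\sum_{k>W}\int_v\lambda^0_{vk}\,p(v)\,dv\int_v\left(p_v(A_k=1,A_{k-c}=0)-p_v(A_k=0,A_{k-c}=1)\right)p(v)\,dv}{\sum_{k>W}\int_v\lambda^0_{vk}\,p_v(A_k=0,A_{k-c}=1)\,p(v)\,dv} \tag{A.11}$$
$$\approx 1, \tag{A.12}$$

where (A.9) and (A.10) are algebra, (A.11) follows from (9), and (A.12) follows from applying (10) to the denominator of the quotient in (A.11) and then (11) to the numerator, proving that $IRR_{MH}\approx\beta$ under (3)–(7) and (9)–(11).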
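For concreteness, the sample quantity whose limit (A.1) describes can be sketched as a ratio of discordant case-control pair counts. The function name and the toy data below are hypothetical and serve only to illustrate the form of the estimator; each case contributes its treatment at the case time and its treatment at each of its $m$ control times.

```python
def mh_case_crossover(cases):
    """Ratio of discordant pair counts, the sample analogue of (A.1).

    cases: iterable of (case_time_treatment, [control_time_treatments]),
           with treatments coded 0/1.
    """
    num = den = 0
    for a_case, controls in cases:
        for a_ctrl in controls:
            # treated at the case time, untreated at the control time
            num += int(a_case == 1 and a_ctrl == 0)
            # untreated at the case time, treated at the control time
            den += int(a_case == 0 and a_ctrl == 1)
    if den == 0:
        raise ValueError("no pairs with untreated case time and treated control time")
    return num / den


# Hypothetical toy data with one control time per case (m = 1).
toy = [(1, [0]), (1, [0]), (0, [1]), (1, [1]), (0, [0])]
print(mh_case_crossover(toy))  # 2.0
```

Concordant pairs, such as the last two subjects in the toy data, contribute to neither count.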

Remark 1.

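In the absence of rare disease and under the stronger constant effects assumption that $\beta=\lambda^1_{vk}(\bar u_k)/\lambda^0_{vk}(\bar u_k)$ does not depend on $v$, $k$, or $\bar u_k$, it follows from (A.4) that we can expand the multiplicative bias as

$$\frac{\int_v\sum_{k>W}\sum_{\bar u_k}M_v(\bar u_k)\left(1-\lambda^0_{v,k-c}(\bar u_{k-c})\right)\sum_{\bar a_{k/k,k-c}}G_v(1,0,\bar a_{k/k,k-c},\bar u_k)\prod_{s\neq k-c,\,k}\left(1-\lambda^{a_s}_{vs}(\bar u_s)\right)p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar u_k}M_v(\bar u_k)\left(1-\lambda^1_{v,k-c}(\bar u_{k-c})\right)\sum_{\bar a_{k/k,k-c}}G_v(0,1,\bar a_{k/k,k-c},\bar u_k)\prod_{s\neq k-c,\,k}\left(1-\lambda^{a_s}_{vs}(\bar u_s)\right)p(v)\,dv}, \tag{A.13}$$

where

$$M_v(\bar u_k)\equiv\lambda^0_{vk}(\bar u_k)\prod_{j=1}^{k}p_v(u_j\mid\bar Y_{j-1}=0,\bar U_{j-1}=\bar u_{j-1})$$

and

$$G_v(a,a',\bar a_{k/k,k-c},\bar u_k)\equiv p_v(A_k=a\mid\bar Y_{k-1}=0,A_{k-c}=a',\bar A_{k/k,k-c}=\bar a_{k/k,k-c},\bar U_k=\bar u_k)$$
$$\times\prod_{s=k-c+1}^{k-1}p_v(a_s\mid\bar Y_{s-1}=0,A_{k-c}=a',\bar A_{s-1/k-c}=\bar a_{s-1/k-c},\bar U_{k-c}=\bar u_{k-c})$$
$$\times\,p_v(A_{k-c}=a'\mid\bar Y_{k-c-1}=0,\bar A_{k-c-1}=\bar a_{k-c-1},\bar U_{k-c-1}=\bar u_{k-c-1})\prod_{s=1}^{k-c-1}p_v(a_s\mid\bar Y_{s-1}=0,\bar A_{s-1}=\bar a_{s-1},\bar U_s=\bar u_s).$$

If at each time $s$, $A_s\overset{iid}{\sim}\text{Bernoulli}(p)$, the bias is approximately

$$\frac{\int_v\sum_{k>W}\sum_{\bar u_k}M_v(\bar u_k)\left\{1-\lambda^0_{v,k-c}(\bar u_{k-c})\right\}p(v)\,dv}{\int_v\sum_{k>W}\sum_{\bar u_k}M_v(\bar u_k)\left\{1-\lambda^1_{v,k-c}(\bar u_{k-c})\right\}p(v)\,dv}.$$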

APPENDIX B: FURTHER DETAILS ON SEQUENTIAL EXCHANGEABILITY ASSUMPTION

More precisely, Equation (4) holds under the assumption (which we assume is true) that the causal DAG in Figure 1 represents an underlying FFRCISTG counterfactual causal model (Robins, 1986) and thus also under Pearl’s NPSEM with independent errors. See Richardson and Robins (2013) and Shpitser, Richardson, and Robins (2020).

Footnotes

SUPPORTING INFORMATION

Web Appendices referenced in Sections 4, 5, and 6 and R code for simulations in Section 5 are available with this paper at the Biometrics website on Wiley Online Library.

DATA AVAILABILITY STATEMENT

Data sharing is not applicable as no new data were created or analyzed in this paper.

REFERENCES

  1. Breslow N. (1981) Odds ratio estimators when the data are sparse. Biometrika, 68(1), 73–84.
  2. Gill RD & Robins JM (2001) Causal inference for complex longitudinal data: the continuous case. Annals of Statistics, 29, 1785–1811.
  3. Greenland S. (1996) Confounding and exposure trends in case-crossover and case-time-control designs. Epidemiology, 7(3), 231–239.
  4. Greenland S. & Robins JM (1985) Estimation of a common effect parameter from sparse follow-up data. Biometrics, 41, 55–68.
  5. Greenland S, Pearl J. & Robins JM (1999) Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.
  6. Hernan MA, Hernandez-Díaz S. & Robins JM (2004) A structural approach to selection bias. Epidemiology, 15, 615–625.
  7. Janes H, Sheppard L. & Lumley T. (2005) Case-crossover analyses of air pollution exposure data: referent selection strategies and their implications for bias. Epidemiology, 16(6), 717–726.
  8. Kleinbaum D, Kupper L. & Chambless L. (1982) Logistic regression analysis of epidemiologic data: theory and practice. Communications in Statistics-Theory and Methods, 11(5), 485–547.
  9. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC & Greenland S. (2014) Good practices for quantitative bias analysis. International Journal of Epidemiology, 43(6), 1969–1985.
  10. Levy D, Lumley T, Sheppard L, Kaufman J. & Checkoway H. (2001) Referent selection in case-crossover analyses of acute health effects of air pollution. Epidemiology, 12(2), 186–192.
  11. Maclure M. (1991) The case-crossover design: a method for studying transient effects on the risk of acute events. American Journal of Epidemiology, 133(2), 144–153.
  12. Mittleman MA, Maclure M, Tofler GH, Sherwood JB, Goldberg RJ & Muller JE (1993) Triggering of acute myocardial infarction by heavy physical exertion–protection against triggering by regular exertion. New England Journal of Medicine, 329(23), 1677–1683.
  13. Mittleman MA & Mostofsky E. (2014) Exchangeability in the case-crossover design. International Journal of Epidemiology, 43, 1645–1655.
  14. Navidi W. (1998) Bidirectional case-crossover designs for exposures with time trends. Biometrics, 54, 596–605.
  15. Nurminen M. (1981) Asymptotic efficiency of general noniterative estimators of common relative risk. Biometrika, 68(2), 525–530.
  16. Platt RW, Schisterman EF & Cole SR. (2009) Time-modified confounding. American Journal of Epidemiology, 170(6), 687–694.
  17. Richardson TS & Robins JM (2013) Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Center for the Statistics and the Social Sciences, University of Washington Series. Working Paper 128.
  18. Robins JM (1986) A new approach to causal inference in mortality studies with a sustained treatment period–application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9–12), 1393–1512.
  19. Robins JM & Hernan MA (2009) Estimation of the causal effects of time-varying treatments. New York, NY: Chapman and Hall/CRC Press.
  20. Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. The Annals of Statistics, 6, 34–58.
  21. Shpitser I, Richardson TS & Robins JM (2020) Multivariate counterfactual systems and causal graphical models. Preprint, arXiv:2008.06017.
  22. Suissa S. (1995) The case-time-control design. Epidemiology, 6(3), 248–253.
  23. Tarone RE, Gart J. & Hauck W. (1983) On the asymptotic inefficiency of certain noniterative estimators of a common relative risk or odds ratio. Biometrika, 70(2), 519–522.
  24. Vines SK & Farrington CP (2001) Within-subject treatment dependency in case-crossover studies. Statistics in Medicine, 20(20), 3039–3049.
