Abstract
In the presence of time-varying confounders affected by prior treatment, standard statistical methods for failure time analysis may be biased. Methods that correctly adjust for this type of covariate include the parametric g-formula, inverse probability weighted estimation of marginal structural Cox proportional hazards models, and g-estimation of structural nested accelerated failure time models. In this article, we propose a novel method to estimate the causal effect of a time-dependent treatment on failure in the presence of informative right-censoring and time-dependent confounders that may be affected by past treatment: g-estimation of structural nested cumulative failure time models (SNCFTMs). An SNCFTM considers the conditional effect of a final treatment at time m on the outcome at each later time k by modeling the ratio of two counterfactual cumulative risks at time k under treatment regimes that differ only at time m. Inverse probability weights are used to adjust for informative censoring. We also present a procedure that, under certain “no-interaction” conditions, uses the g-estimates of the model parameters to calculate unconditional cumulative risks under nondynamic (static) treatment regimes. The procedure is illustrated with an example using data from a longitudinal cohort study, in which the “treatments” are healthy behaviors and the outcome is coronary heart disease.
Keywords: Causal inference, Coronary heart disease, Epidemiology, G-estimation, Inverse probability weighting
1. INTRODUCTION
A common approach to estimating the causal effect of a time-varying treatment on failure time is to model the hazard of failure as a function of past treatment and covariate history (Kalbfleisch and Prentice 1980). However, this approach may be biased, even under the causal null, whether or not one adjusts for the past history of measured covariates in the analysis, when (a) there exists a time-dependent risk factor for failure that also predicts subsequent treatment, and (b) past treatment history predicts subsequent risk factor level (Robins 1986; Hernán, Hernández-Diaz, and Robins 2004). For example, when interested in the effect of a dietary intervention on the risk of coronary heart disease (CHD), diabetes is such a risk factor because (a) diabetes is a risk factor for CHD and receiving a diagnosis of diabetes may lead to changes in diet, and (b) the risk of diabetes is affected by an individual’s prior diet history.
In the context of survival analysis, Robins and colleagues have developed several methods to deal with the problem of time-dependent confounders that are themselves affected by previous treatment. These include the parametric g-formula (Robins 1986; Taubman et al. 2009), inverse probability weighted (IPW) estimation of marginal structural Cox proportional hazards models (Cox MSMs; Robins 1999; Hernán, Brumback, and Robins 2001), and g-estimation of structural nested accelerated failure time models (SNAFTMs; Robins 1998; Hernán et al. 2005).
In this article, we propose a new approach to estimate the causal effect of a time-dependent treatment on failure in the presence of informative right-censoring and time-dependent confounders that may be affected by past treatment. This approach is based on a new class of structural models called structural nested cumulative failure time models (SNCFTMs), previously discussed by Page (2005; unpublished Sc.D. dissertation) and briefly in a simulation study by Young et al. (2010). An SNCFTM can be viewed as a special case of the multivariate structural nested mean model (Robins 1994). The use of SNCFTMs requires that, for all values of the measured covariates, the conditional cumulative probability of failure under the possible treatment regimes satisfies a particular type of rare failure assumption, formally defined in Section 3.
In this “rare failure” context, the SNCFTM has advantages compared with Cox MSMs, SNAFTMs, and the parametric g-formula. Like the Cox MSM, but in contrast to the SNAFTM, the SNCFTM admits unbiased estimating equations that are differentiable in the model parameters and thus are easily solved. Like the SNAFTM, but in contrast to the Cox MSM, a correctly specified SNCFTM does not require a positivity assumption (i.e., it is not necessary for both treated and untreated to occur in every stratum of covariates); furthermore, it naturally allows one to investigate both quantitative and qualitative effect modification by evolving time-dependent covariates. Finally, the SNCFTM is dependent on fewer parametric assumptions than are required in applications of the parametric g-formula.
Besides describing the SNCFTM, this article presents an application to estimate the effects of lifestyle interventions on the 20-year risk of CHD among women in the Nurses’ Health Study. For concreteness, we introduce the notation and method in the context of our application. Although we restrict attention to time-varying binary treatments, the extension to continuous and/or multivariate treatments is straightforward (Robins 1994).
2. STUDY POPULATION AND NOTATION
The Nurses’ Health Study is a prospective cohort study of female nurses followed since 1976, when participants were 30–55 years old. Data are collected in mailed questionnaires every 2 years to update information on potential risk factors and newly diagnosed diseases (Colditz, Manson, and Hankinson 1997). Let k = 0, 1, … , K + 1 where k = 0 denotes the time of return of the 1982 questionnaire and time k is 2k years later, with the last questionnaire returned at time K. Follow-up ends at K + 1 = 10 (20 years). For brevity, we use “by time k” to mean “before the start of interval k,” and “reported at time k” to mean “reported on the questionnaire returned at the start of interval k.”
Following Stampfer et al. (2000) and Taubman et al. (2009), we define CHD as confirmed first myocardial infarction (fatal or nonfatal). Let Yk indicate CHD (0: no, 1: yes) by time k for k = 1, 2, …, K + 1. Our study population consists of 78,746 women who were free of CHD (Y0 = 0) and had complete baseline covariate data in 1982. During the follow-up, 2319 women experienced a CHD event, 5616 died from other causes, and 16,818 were censored when they first failed to return a questionnaire (Taubman et al. 2009). We explain how to handle censoring in Section 6.
We estimated the effects of separate interventions on cigarette smoking, exercise, diet score (as defined by Stampfer et al. 2000), and alcohol intake. The assessments of these variables have been validated (Colditz et al. 1986; Salvini et al. 1989; Wolf et al. 1994). We quantify these effects from a public health, rather than biological, perspective. Following Taubman and colleagues (2009), we estimated the effects of four hypothetical interventions sustained from 1982 to 2002:
(1) avoid smoking;
(2) exercise at least 30 min/day;
(3) keep diet score in a range corresponding to the top two quintiles of the observed data; and
(4) consume at least 5 g/day of alcohol.
Although there is ongoing debate about the potential nonlinearity of the effect of high levels of alcohol consumption on CHD risk, the choice of 5 g/day as a threshold for alcohol was made to be consistent with the definition of the low-risk group described by Stampfer et al. (2000). Very few women in the Nurses’ Health Study report heavy drinking; less than 6% of observations correspond to at least two drinks (27.4 g alcohol) per day.
Our specific goal is to estimate the effect of representative interventions or regimes as defined by Taubman et al. (2008). In a representative regime, the actual daily exercise duration, diet score, or amount of alcohol assigned at time m is randomly drawn from the observed distribution of the variable intervened on (e.g., daily exercise duration), conditional on past covariate history, among those subjects who, in the observed data, have followed the intervention (e.g., exercised at least 30 min/day) through time m (see Section 4 for a formal definition). Note that our estimates of the effect of a hypothetical intervention under a representative regime cannot be extrapolated to cohorts whose observed data distribution differs from that of the Nurses’ Health Study. We nonetheless chose to study representative regimes to achieve our goal of comparing our estimates with those obtained in previous analyses that used alternative statistical methods.
For each of the above interventions, we denote by Am the observed value at time m of the risk factor to be intervened on (smoking, exercise, diet score, or alcohol intake). Let Lm be the observed vector of covariates reported at time m, where L0 includes time-invariant (fixed or baseline) covariates. We always include the survival indicator Sj = 1 − Yj in Lj. We view Am−1 as a component of Vm = (Am−1, Lm); an arbitrary single component of Vm (either Am−1 or one of the components of Lm) will be denoted Xm. Table 1 lists the variables included in V0 and Vm in our application. For each intervention, we then defined an observed binary time-varying variable Rm to indicate whether a woman’s reported Am (for m ≥ 0) was consistent with the intervention under consideration. For example, when estimating the effects of the exercise intervention, Rm = 1 for those who exercise at least 30 min/day at time m ≥ 0, and Rm = 0 otherwise. That is, Rm = I(Am ≥ 30) for m ≥ 0 and Rm = 0 for m < 0. We refer to Rm as the “treatment” indicator, and thus “being treated” in the observed data (Rm = 1) means “behaving consistently with the intervention.” Note that for the “avoid smoking” intervention, Rm = 1 denotes no smoking at time m. At each time m, we assume the temporal ordering Ym, Lm, Rm, Am (Robins et al. 2004). Our dataset includes K + 1 observations for every uncensored individual. Further, since Yk = 1 denotes failure before k, Yk = 1 implies that Yj = 1 for all j with k ≤ j ≤ K + 1. When Yj = 1, we define, by convention, Lj = Rj = 0.
Table 1.
Covariates included in the models for treatment (Rm) and for remaining uncensored (Cm+1 = 0)
| Time-varying covariates in Vm | Years assessed | No. of categories |
|---|---|---|
| Time period | All | 10 |
| Multivitamin use | Starting in 1980 | 2 |
| Aspirin use | 1980, 1982, 1984, 1988 on | 2 |
| Statin use | 1988, 1994 on | 2 |
| Postmenopausal hormone use | All | 2 |
| Number of cigarettesᵃ | All | 5 |
| Exerciseᵃ | 1980, 1982, 1986, 1988, 1992 on | 6 |
| Diet scoreᵃ | 1980, 1984, 1986, 1990, 1994, 1998 | 5 |
| Alcohol intakeᵃ | 1980, 1984, 1986, 1990, 1994, 1998 | 4 |
| High blood pressure | All | 2 |
| High cholesterol | All | 2 |
| Diabetes | All | 2 |
| Angina | All | 2 |
| Stroke | All | 2 |
| Coronary artery bypass graft | Starting in 1986 | 2 |
| Cancer | All | 2 |
| Menopause | All | 2 |
| Osteoporosis | Starting in 1982 | 2 |
| BMI | All | 6 |
| Baseline covariates (1980) in V0 | | |
| Age | | 5 |
| Parental MI before age 60 | | 2 |
| Smoking history | | 2 |
| Oral contraceptive history | | 2 |
| BMI at age 18 | | 5 |
| BMI | | 6 |
| Exercise | | 6 |
| Diet score | | 5 |
| Alcohol intake | | 4 |
ᵃThis variable is omitted from the model for Rm when Rm represents behavior consistent with an intervention on this risk factor.
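As a toy illustration of the treatment-indicator convention above (Rm = I(Am ≥ 30) for the exercise intervention while Ym = 0, and Rm = 0 by convention after failure), the following minimal sketch may help; the function and variable names are ours, not from the study’s codebase.

```python
# Sketch: construct the treatment indicator R_m = I(A_m >= 30) for the
# exercise intervention from one woman's person-period records.
# Names and numbers are illustrative only.

def treatment_indicator(a_history, y_history, threshold=30.0):
    """Return R_m for each period m: 1 if reported exercise A_m meets the
    intervention threshold, 0 otherwise; by convention R_m = 0 once Y_m = 1."""
    r = []
    for a_m, y_m in zip(a_history, y_history):
        if y_m == 1:        # failed by time m: convention R_m = 0
            r.append(0)
        else:
            r.append(1 if a_m >= threshold else 0)
    return r

# Exercise minutes/day over four questionnaires, with failure by m = 3:
A = [45.0, 20.0, 35.0, 0.0]
Y = [0, 0, 0, 1]
print(treatment_indicator(A, Y))  # [1, 0, 1, 0]
```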
Overbars represent history from time 0, so R̄m = (R0, … , Rm) and V̄m = (V0, …, Vm). By convention, R̄−1 = V̄−1 = 0. To denote the full histories of treatment and covariates from time 0 to the end of the study, we use R̄ = (R0, … , RK) and V̄ = (V0, …, VK), respectively. The observed covariate and treatment history (V̄m, R̄m) can be written (V̄m, Rm) since each prior treatment Rj for j < m is determined by Aj, which is a component of Vj+1. We assume that the data observed on our n study participants are n independent, identically distributed realizations of a vector (ȲK+1, V̄K, RK, AK). In general, we use m to refer to a time when treatment was measured and k to refer to a time when the survival outcome was measured, with m < k. Capital letters represent random variables and the corresponding lowercase letters represent their realizations.
Let a treatment regime g ≡ ḡK ≡ (g0, … , gK) be a vector of functions where gm assigns to each covariate history v̄m, a treatment rm to subjects who survive until time m (i.e., those with ym = 0), and, by convention, 0 to those who have failed (i.e., those with ym = 1). We define a nondynamic regime g ≡ ḡK = r̄K to be one where for those with ym = 0, gm(v̄m) equals the same rm for all v̄m. The four hypothetical interventions described above are nondynamic regimes that assign the same value of rm to all subjects with ym = 0 at every time point m from 1982 to 2002 (until failure). The regime “Always treat” (when ym = 0) is represented by g = (1, 1, … , 1) = 1¯ while the regime “Never treat” is g = 0¯. Another regime that will be used below is the regime “Receive the treatment actually received through time m, but no treatment thereafter” denoted by g = (R̄m, 0), where 0 (when following R̄m) represents no treatment from time m + 1 up to the end of the study. For any treatment regime g, Yk,g denotes the counterfactual (or potential) outcome by time k under regime g. For example, Yk,g = 0 indicates that a woman would have been free of CHD at time k if she had followed treatment regime g. More generally, Vm,g = (Am−1,g, Lm,g) is the counterfactual Vm.
We will see below that for any nondynamic regime g, under appropriate consistency and exchangeability assumptions, the distribution of ȲK+1,g under the above definition is identified and equals the distribution of ȲK+1,grep(g), the outcome history under the corresponding representative regime.
3. STRUCTURAL NESTED CUMULATIVE FAILURE TIME MODELS
Let E[Yk,g|V̄m, Rm, Ym = 0] be the counterfactual risk of developing the outcome by time k, among those free of the outcome at time m < k, given their observed covariate and treatment history up to time m, had they followed treatment regime g. An SNCFTM models the ratio of two such risks under different regimes g conditional on covariate and treatment history (V̄m, Rm). The numerator of the ratio is the conditional risk had everyone received their actual treatment history R̄m through time m and no treatment from time m + 1 to the end of the study, that is, g = (R̄m, 0). The denominator of the ratio is the conditional risk had everyone received their actual treatment history R̄m−1 through time m − 1 and no treatment from time m to the end of the study (i.e., g = (R̄m−1, 0)). Adopt the convention that Rm = 0 for those who have Ym = 1. Then, the general form of the model is

E[Yk,g=(R̄m,0) | V̄m, Rm, Ym = 0] / E[Yk,g=(R̄m−1,0) | V̄m, Rm, Ym = 0] = exp[γm,k(V̄m, Rm; ψ*)],   (1)

for m = 0, …, K and k = m + 1, …, K + 1,   (2)

where γm,k(V̄m, Rm; ψ) is a function of treatment and covariate history indexed by the (possibly vector-valued) parameter ψ whose unknown true value is ψ*. Note that Rm = 0 implies exp[γm,k(V̄m, Rm; ψ*)] = 1.
An SNCFTM for the effect of the intervention “Require everyone to exercise at least 30 min/day at each time m” on the risk at time k compares the conditional CHD risk at each time k under two regimes: a regime that does not modify anyone’s observed exercise indicator (either at least 30 min/day or less than 30 min/day) through time m, and then makes everyone exercise less than 30 min/day thereafter (i.e., sets the indicator to 0), and the nearly identical regime that sets the indicator to 0 starting one interval sooner.
We refer to γm,k(V̄m, Rm; ψ) as the “blip function” because the two treatment regimes being compared in Equation (1) can differ only at time m, and thus an SNCFTM models the effect of a final “blip” of treatment at time m on the outcome at time k. For consistency, the exponentiated blip function exp[γm,k(V̄m, Rm; ψ)] must be 1 when Rm = 0, because then the two treatment regimes being compared are identical. The model should also be chosen so that exp[γm,k(V̄m, Rm; 0)] = 1 when there is no effect of treatment at time m on outcome at time k. An example of a blip function is
γm,k(V̄m, Rm; ψ) = ψRm,   (3)
so ψ = 0 corresponds to no effect, ψ < 0 to beneficial effect, and ψ > 0 to harmful effect.
The blip function we used in our analysis, which may be more realistic in some contexts, is

γm,k(V̄m, Rm; ψ) = Rm log[1 + (exp[ψ] − 1)/(k − m)].   (4)

The effect of Rm under model (4) is the same as that under model (3) at time k = m + 1, and then diminishes as the time elapsed since m, k − m, increases. Under certain conditions, this model is consistent with data generated under a particular SNAFTM (Young et al. 2008).
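The contrast between these two one-parameter models can be sketched numerically. Model (3) is γ = ψRm; for model (4) we assume the form γm,k = Rm log[1 + (exp(ψ) − 1)/(k − m)], which reproduces the “Always treat” factor er̄(p, m) = 1 + (exp[ψ*] − 1)/(p − m) quoted in Section 5; treat that functional form as our reading of the text, not a quotation.

```python
import math

# Sketch of blip functions (3) and (4). The exact form of (4) is assumed
# here (see lead-in); psi < 0 encodes a beneficial treatment effect.

def blip3(psi, r_m):
    """Blip function (3): a constant effect of treatment at m on every later k."""
    return psi * r_m

def blip4(psi, r_m, m, k):
    """Blip function (4): same as (3) at k = m + 1, attenuating as k - m grows."""
    return r_m * math.log(1.0 + (math.exp(psi) - 1.0) / (k - m))

psi = -0.2
print(blip4(psi, 1, m=0, k=1))                                 # equals psi at k = m + 1
print([round(blip4(psi, 1, 0, k), 4) for k in (1, 2, 5, 10)])  # shrinks toward 0
```

Note that both functions return 0 when r_m = 0, as the model requires (the two regimes compared are then identical).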
Other possible blip functions might include a product term between treatment Rm and a component of treatment history R̄m−1, for example,
γm,k(V̄m, Rm; ψ) = (ψ1 + ψ2Rm−1)Rm,   (5)
a product term between treatment Rm and a baseline covariate X0 in V0,
γm,k(V̄m, Rm; ψ) = (ψ1 + ψ2X0)Rm,   (6)
or a product term between treatment Rm and a component Xm of the time-varying covariate vector Vm,
γm,k(V̄m, Rm; ψ) = (ψ1 + ψ2Xm)Rm.   (7)
Page (2005) considers additional functional forms for the blip function.
Remark 1 (Restriction to rare failures). Note that the conditional expectations of the potential outcomes (failures) necessarily lie within the interval [0, 1]. Thus, exp[γm,k(V̄m, Rm; ψ*)] is bounded by

1 / E[Yk,g=(R̄m−1,0) | V̄m, Rm, Ym = 0].

We do not impose this constraint in fitting the SNCFTM model because estimation of ψ* would then require a correct model for E[Yk,g=(R̄m−1,0)|V̄m, Rm, Ym = 0]. Instead, we will estimate ψ* under a semiparametric model that leaves the conditional expectation in the denominator completely unspecified, and thus protects our estimates against misspecification of the model for this conditional expectation. As a consequence, our semiparametric SNCFTM should only be used for studies in which the cumulative probability of failure is small. As seen later, our methodology allows us to obtain estimates of E[YK+1,g=(0̄)] and, in certain cases, E[YK+1,g=(r̄K)] for all r̄K without requiring a model for the E[Yk,g=(R̄m−1,0)| V̄m, Rm, Ym = 0]. If these estimates are all much less than 1, we have some, but not conclusive, evidence for the rare failure assumption.
If we substitute the counterfactual survival indicators Sk,g=(R̄m,0) = 1 − Yk,g=(R̄m,0) and Sk,g=(R̄m−1,0) = 1 − Yk,g=(R̄m−1,0) for Yk,g=(R̄m,0) and Yk,g=(R̄m−1,0) in Equation (1), we obtain the structural nested multiplicative survival time model (SNMSTM) considered by Martinussen et al. (2011, p. 776). We chose to fit an SNCFTM rather than an SNMSTM because, with rare failures, unconstrained estimation of an SNMSTM may result in estimated survival probabilities exceeding 1.
The next two sections describe how to obtain a consistent estimate of ψ* via g-estimation, and how the parameter ψ* can, under a certain restrictive “no-interaction” condition, be used to calculate the absolute risk of failure for nondynamic treatment regimes.
4. IDENTIFICATION, REPRESENTATIVE REGIMES, AND G-ESTIMATION
This section describes assumptions under which the mean of ȲK+1,g is identified for all regimes g. We then show how the method of g-estimation can be used to estimate the parameter ψ* of the SNCFTM when these assumptions hold.
4.1 Assumptions and Definitions
Let ḡm(v̄m) ≡ (g0(v0), … , gm(v̄m)) = r̄m for any regime g that specifies the treatment r̄m through time m and consider the following identifying assumptions:
Assumption 1 (Consistency). For m = 0, …, K, and any regime g,

if R̄m = ḡm(V̄m), then Ȳm+1,g = Ȳm+1.

That is, if a subject has an observed treatment history through m equal to that prescribed by regime g, then her observed outcome history through m + 1 will equal her counterfactual outcome history under regime g through that time.
Assumption 2 (Exchangeability). For any treatment regime g, and for all m ≤ K,

Y̲m+1,g ∐ Rm | V̄m, Ym = 0,   (8)

Y̲m+1,g ∐ Am | V̄m, Rm, Ym = 0,   (9)

where Y̲m+1,g = (Ym+1,g, Ym+2,g, … , YK+1,g) and ∐ denotes statistical independence. Note these two conditions are equivalent to the single condition that

Y̲m+1,g ∐ Am | V̄m, Ym = 0,

since, in the observed data, Rm is a deterministic function of Am.
The exchangeability assumptions in Equations (8) and (9) are expected to hold in a sequentially randomized trial in which at each time m treatment Am is based on physical randomization with the randomization probabilities allowed to depend on the measured past V̄m. In an observational study, they cannot be guaranteed to hold and cannot be empirically tested from the data.
Assumption 3 (Positivity). If fV̄m (v̄m|Ym = 0) Pr[Ym = 0] ≠ 0, then

Pr[Rm = r | V̄m = v̄m, Ym = 0] > 0 for r = 0, 1,

where fV̄m (v̄m|Ym = 0) is the conditional density of V̄m evaluated at v̄m.
Assumptions 1 and 3 and Equation (8) nonparametrically identify the distribution of ȲK+1,g for all regimes g (Robins 1994). However, once we impose an SNCFTM, the model is no longer nonparametric and the positivity assumption is not required for identification. Therefore, in the absence of positivity, the estimates rely on some degree of extrapolation from the model.
All of the formal results in this article refer to the means of the ȲK+1,g. However, when Assumption 2 (i.e., both Equations 8 and 9) holds, the distribution of ȲK+1,g for any nondynamic regime g = r̄K has an additional interpretation as the distribution of ȲK+1,grep(g) under the representative regime grep ≡ grep(g) defined as follows (Taubman et al. 2008):
Definition 2. The representative regime grep(g) corresponding to a nondynamic regime g = r̄K is the random regime in which, for each subject with a counterfactual history (l̄m, ām−1), (a) Rm is set to rm and (b) Am is a random draw am from the observed conditional distribution of Am given L̄m = l̄m, Ām−1 = ām−1, R̄m = r̄m.
A Clarifying Remark. Note that the notation for the counterfactual outcome under regime g, Yk,g, can be replaced by a mathematically equivalent, but more cumbersome, notation that makes it transparent that the counterfactual outcome is also a function of the particular value of treatment Am that the subject would receive if assigned to treatment Rm (Hernán and VanderWeele 2011). By definition, under regime g = r̄K, for each subject we (a) set (manipulate) the indicator of following the intervention at m to rm, and then (b) a precise value of Am consistent with rm is chosen by each subject without further intervention. In contrast, under the representative regime, clause (b) is replaced by the following clause (b′): we also intervene on Am by drawing its value at random from the aforementioned distribution. In both cases, it is the value of Am that is the causal determinant of later failure. If Equation (9) holds, the choice of Am under regime g = r̄K is unconfounded; as a consequence, (b) and (b′) are equivalent in the sense that, for g = r̄K, ȲK+1,grep(g) and ȲK+1,g have the same distribution. Since it would be quite unusual for Equation (9) not to hold when Equation (8) holds, the possibility that, for g = r̄K, the means of ȲK+1,grep(g) and ȲK+1,g differ is mainly of theoretical interest, and readers can safely ignore it without compromising their understanding of the remainder of the article.
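Clause (b′) of a representative regime can be sketched as a resampling step: given Rm set to rm, a concrete Am is drawn from the observed values among subjects with the same covariate and treatment history who followed the intervention. In the toy sketch below, the single string-valued stratum key is a crude stand-in for conditioning on (L̄m, Ām−1, R̄m), and all values are invented.

```python
import numpy as np

# Sketch of clause (b') of a representative regime: draw A_m at random from
# the observed A_m values among "compliers" (R_m = 1) in the subject's
# stratum. The stratum key is a stand-in for the full history (Lbar, Abar, Rbar).

rng = np.random.default_rng(2)

def draw_representative_a(stratum, observed, n_draws=1):
    """observed: dict mapping stratum -> array of A_m values seen among
    subjects with R_m = 1 in that stratum."""
    pool = observed[stratum]
    return rng.choice(pool, size=n_draws, replace=True)

# Exercise minutes/day observed among compliers (all >= 30 by definition):
observed = {"age50_active": np.array([30.0, 45.0, 60.0, 35.0])}
draws = draw_representative_a("age50_active", observed, n_draws=3)
print(draws)  # three values drawn from the complier pool
```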
4.2 G-Estimation
To describe g-estimation, define

Hm,k(ψ) = Yk exp[−∑_{j=m}^{k−1} γj,k(V̄j, Rj; ψ)].

Under our identifying assumptions, this quantity has the same mean as the counterfactual outcome Yk,g=(R̄m−1,0) conditional on (V̄m, Rm, Ym = 0) when ψ = ψ* (Robins 1994; Page 2005). Therefore,

E[Hm,k(ψ*) | V̄m, Rm, Ym = 0] = E[Hm,k(ψ*) | V̄m, Ym = 0]

by Assumptions 1 and 2. We now use the above equality to describe the theory underlying g-estimation of the parameter ψ of the SNCFTM.
For our parameter vector ψ of dimension d, we consider estimating functions of the form

U(ψ; B, J) = ∑_{k=1}^{K+1} ∑_{m=0}^{k−1} I(Ym = 0){Hm,k(ψ) − B(V̄m, k)}D(V̄m, Rm, k)′,

where B(v̄m, k) is a user-supplied real-valued function, J(v̄m, rm, k) is a user-supplied (1 × d)-dimensional function with m < k, D(V̄m, Rm, k) = J(V̄m, Rm, k) − E[J(V̄m, Rm, k)| V̄m, Ym = 0], and ′ denotes the transpose. If f(Rm| V̄m, Ym = 0) is known for each m, then under Assumption 2 and the SNCFTM γm,k(V̄m, Rm; ψ), it follows that

E[U(ψ*; B, J)] = 0,

where ψ* is the true value of ψ. More generally, consider the class of estimating functions

𝒰 = {U(ψ*; B, J) : B, J}.

It follows from theorem A3.1 in the article by Robins (1994) that 𝒰 is the orthogonal complement of the nuisance tangent space for ψ* under the semiparametric model χ defined by the known f(Rm|V̄m, Ym = 0) and the SNCFTM γm,k(V̄m, Rm; ψ). It is well known that the influence functions of regular asymptotically linear (RAL) estimators of ψ* are elements of the orthogonal complement of the nuisance tangent space (Bickel et al. 1993; van der Laan and Robins 2003). Thus, if Pn(G) denotes the empirical mean n⁻¹∑_{i=1}^{n} Gi for any G, then any RAL estimator of ψ* under semiparametric model χ is asymptotically equivalent to a solution of an estimating equation of the form
Pn[U(ψ; B, J)] = 0   (10)

for some choice of functions B and J. That is, if ψ̂B,J denotes the solution of Equation (10), then √n(ψ̂RAL − ψ̂B,J) converges to zero in probability.
When f(Rm| V̄m, Ym = 0) is unknown, as in observational studies, we will replace it by an estimate to compute E[J(V̄m, Rm, k)| V̄m, Ym = 0]. If V̄m has two or more continuous components or many discrete components, we cannot hope to estimate E[J(V̄m, Rm, k)|V̄m, Ym = 0] using nonparametric smoothing due to the curse of dimensionality. Therefore, we estimate it under a parametric model. If, as in our example, Rm is dichotomous, then we can restrict consideration to functions J(V̄m, Rm, k) = J*(V̄m, k)′Rm, yielding D(V̄m, Rm, k)′ = J*(V̄m, k){Rm − E[Rm|V̄m, Ym = 0]}. Here J*(V̄m, k) is a (d × 1)-dimensional function. Hence, it suffices to postulate a parametric model for treatment,

Pr[Rm = 1 | V̄m, Ym = 0] = E[Rm | V̄m, Ym = 0; α*],   (11)

where α* is an unknown vector of parameters to be estimated and E[Rm|V̄m, Ym = 0; α] is a known function. The estimator solving Pn[U(ψ; α̂)] = 0, where U(ψ; α̂) denotes U(ψ; B, J) with E[Rm|V̄m, Ym = 0] replaced by E[Rm|V̄m, Ym = 0; α̂] and α̂ is the maximum likelihood estimator of α*, will be consistent and asymptotically normal (CAN) if the model for E[Rm|V̄m, Ym = 0] is correct.
For simplicity of computation in our application, we chose B(V̄m, k) = 0, and J*(V̄m, k) = 1 for one-parameter models (3) and (4), (1, Rm−1)′ for model (5), and (1, X0)′ for model (6). We estimated E[Rm| V̄m, Ym = 0] by its fitted value under the pooled logistic model logit Pr[Rm = 1| V̄m, Ym = 0; α] = α′v(V̄m), where α̂ is the maximum likelihood estimate and v(V̄m) is a function of the covariates listed in Table 1. Thus, we estimated D(V̄m, Rm, k)′ by J*(V̄m, k){Rm − Ê[Rm| V̄m, Ym = 0]}. For each covariate Xm in Vm that was not measured at every time m, the function v(V̄m) included a product term between the most recent measurement of that component Xt for t < m and the time since that measurement, m − t, as described by Taubman et al. (2009). For each covariate in Vm that was measured at every time, the function v(V̄m) included its values at times m and m − 1, unless these measurements were so collinear as to result in unstable coefficients or convergence problems, in which case only the value at time m was included. The exclusion of the measurements at time m − 1 did not materially affect our estimates.
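Fitting a pooled logistic model of this kind amounts to ordinary logistic maximum likelihood on person-period records with Ym = 0. A minimal sketch, with a simulated two-column design matrix standing in for v(V̄m) and our own Newton-Raphson fitter rather than any particular statistical package:

```python
import numpy as np

# Sketch: MLE for a pooled logistic treatment model
# logit Pr[R_m = 1 | Vbar_m, Y_m = 0] = alpha' v(Vbar_m),
# fit by Newton-Raphson. The design matrix is simulated for illustration.

def fit_pooled_logistic(V, R, n_iter=25):
    """Newton-Raphson MLE for logistic regression; V includes an intercept column."""
    alpha = np.zeros(V.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-V @ alpha))   # fitted Pr[R_m = 1 | .]
        w = p * (1.0 - p)
        hess = V.T @ (V * w[:, None])          # observed information matrix
        alpha += np.linalg.solve(hess, V.T @ (R - p))
    return alpha

rng = np.random.default_rng(0)
n = 5000
V = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
true_alpha = np.array([-1.0, 0.8])
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-V @ true_alpha)))
alpha_hat = fit_pooled_logistic(V, R)
print(alpha_hat)  # close to [-1.0, 0.8]
```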
The estimator ψ̂ that solves Pn[U(ψ; α̂)] = 0 is our g-estimate, where U(ψ; α̂) = U(ψ; B, J) and B, J were chosen as above. To solve this estimating equation, we minimized the quadratic form obtained by multiplying the transpose of the sum over all individuals of U(ψ; α̂), the generalized inverse of the empirical covariance matrix of U(ψ; α̂), and the sum itself. We conducted a grid search using 100 points within a user-defined interval for each component of the parameter (by default, −1.5 to 1.5) to determine the initial value for the optimization procedure. Then we used the Newton–Raphson method to find the minimum. When a minimum was not obtained within the region, the search band was widened until a minimum was found. In the case of model (3), Newton–Raphson did not converge and the Nelder–Mead simplex method was used instead. The g-estimation procedure was repeated in 200 nonparametric bootstrap samples to obtain 95% confidence intervals (CIs) for the g-estimate.
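To make the g-estimation recipe concrete, here is a deliberately stripped-down sketch: a single treatment time, blip function (3), simulated data, and a grid search followed by bisection, loosely mirroring the search described above. Every number in it (sample size, data-generating values, search band) is invented for illustration and is not the article's analysis.

```python
import numpy as np

# Minimal g-estimation sketch for a one-parameter blip gamma = psi * R in a
# single-period setting: H(psi) = Y * exp(-psi * R) "blips down" the outcome,
# and psi_hat solves sum_i H_i(psi) * (R_i - Pr[R = 1 | L_i]) = 0.

rng = np.random.default_rng(1)
n = 20000
L = rng.binomial(1, 0.5, n)                    # baseline confounder
pR = 1 / (1 + np.exp(-(-1.0 + 1.0 * L)))       # Pr[R = 1 | L]
R = rng.binomial(1, pR)
psi_true = 0.5
risk = 0.02 * (1 + L) * np.exp(psi_true * R)   # rare-failure outcome model
Y = rng.binomial(1, risk)

# Estimate Pr[R = 1 | L] nonparametrically within strata of L:
p_hat = np.where(L == 1, R[L == 1].mean(), R[L == 0].mean())

def U(psi):
    """Estimating function: mean-zero at the true psi under exchangeability."""
    return np.sum(Y * np.exp(-psi * R) * (R - p_hat))

# Coarse grid to bracket the root, then bisection:
grid = np.linspace(-1.5, 1.5, 100)
vals = np.array([U(p) for p in grid])
i = np.argmax(np.sign(vals[:-1]) != np.sign(vals[1:]))
lo, hi = grid[i], grid[i + 1]
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if U(lo) * U(mid) > 0 else (lo, mid)
print(round(0.5 * (lo + hi), 3))  # close to psi_true = 0.5
```

In the article's setting the estimating function additionally sums over all (m, k) pairs and ψ may be vector-valued, which is why a quadratic-form minimization with Newton-Raphson or Nelder-Mead replaces simple bisection.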
We note that our estimator did not require the positivity assumption. It follows that, under a nonsaturated SNCFTM, ψ*, and thus the cumulative risk E[Yk,g=0̄] under “Never treat” discussed in the next section, can be consistently estimated even though positivity does not hold.
When we replace Yk by Sk = 1 − Yk in the definition of Hm,k(ψ) and redefine Hm,k(ψ) as 0 when Ym = 1, the above g-estimation procedure consistently estimates the parameter vector of an SNMSTM.
5. COMPUTING THE MARGINAL RISK UNDER NONDYNAMIC INTERVENTIONS
Equation (1) and our estimate of ψ* provide an estimate of the contrast between the conditional means of Yk,g=(R̄m,0) and Yk,g=(R̄m−1,0) given (V̄m, Rm, Ym = 0). To consider the interventions’ effects at the population level, we are more interested in comparisons among the unconditional means of Yk,g, Yk,g=0̄, and Yk for various choices of g. Below we show that E[Yk,g=0̄] equals the expectation of an observable random variable when γm,k(V̄m, Rm; ψ*) is known. This raises the question whether, for certain other choices of g, E[Yk,g] is a function of the joint distribution of the data only through the γm,k(V̄m, Rm; ψ*) and E[Yk,g=0̄]. In general, the answer is no, except in the special case in which (a) there is no treatment by time-varying covariate interaction on the multiplicative cumulative failure scale, that is,

exp[γm,k(V̄m, Rm; ψ*)]

does not depend on V̄m for all m > 0, and (b) the regime g of interest is a “baseline-adjusted” nondynamic regime g = r̄K(v0); that is, at each time m, the treatment rm assigned by g to subjects with Ym = 0 does not depend on covariate history v̄m except through the baseline covariates v0. Note that (a) is satisfied by blip functions (3)–(6) but not by blip function (7), and that (b) is satisfied by all of the interventions we considered in our application. Condition (a) can be tested by specifying a model γm,k(V̄m, Rm; ψ) depending on V̄m with a subvector of ψ, say, ψint, encoding the dependence on V1, …, Vm, and then using g-estimation to test the hypothesis that ψ*int = 0. Below, for ease of notation, we restrict attention to nondynamic regimes g = r̄K. Extension to “baseline-adjusted” nondynamic regimes g = r̄K(v0) is straightforward.
Outside of the special case where (a) and (b) hold, estimation of E[Yk,g] requires that one estimate the nuisance functions E[Yk,g=(R̄m−1,0)|V̄m, Rm, Ym = 0] and the density of Vm given the past, which cannot be accomplished in a robust manner, would add substantially to the computational burden, and would make estimation more dependent on correct specification of additional models (Robins et al. 1992). Nonetheless, subject to these limitations, estimation of E[Yk,g] for any regime g is possible by using the approach provided in the appendix of the article by Robins (1994) and in section 8 of the article by Robins, Rotnitzky, and Scharfstein (1999).
Below we first describe the estimation of E[Yk,g = 0̄] for each time k. To do so, the SNCFTM model is used to mathematically “remove” the effect of each subject’s nonzero treatments, starting at the end of the study period (K + 1) and working backward to the beginning (k = 0). We refer to this process as “blipping down.”
We then describe the estimation of E[Yk,g=r̄K] under the additional assumption that (a) above holds. For concreteness, we suppose the regime g is the regime “Always treat,” that is, g = 1¯. Then, given the estimates of the E[Yk,g=0̄] obtained by “blipping down,” we use the SNCFTM to construct estimates of E[Yk,g=1̄], by starting at the first time point (k = 0) and working forward to the end of the study period (K + 1). We refer to this process as “blipping up” (Robins 1994; Page 2005).
5.1 Blipping Down
The quantity

Hj,k(ψ*) = Yk ∏_{m=j}^{k−1} exp[−γm,k(V̄m, Rm; ψ*)]

has the same conditional mean given V̄j, R̄j = r̄j as the counterfactual Yk,g=(r̄j−1,0) under the regime in which treatment is withheld starting at time j (Page 2005). For j = 0, H0,k(ψ*) has the same mean as the cumulative risk at time k under the intervention “Never treat.” Note that

E[Yk,g=0̄] = E[H0,k(ψ*)]   (12)

because Y0 = 0 for all participants. Thus, Pn[H0,k(ψ̂)] consistently estimates the cumulative risk E[Yk,g=0̄] at time k under “Never treat.” Indeed, Robins (1994) showed that Pn[H0,k(ψ̂)] is consistent under the weaker exchangeability condition that Y̲m+1,g=0̄ ∐ Rm|V̄m, Ym = 0.
When, as in our application, blip function (4) is used,

H0,k(ψ̂) = Yk ∏_{m=0}^{k−1} [1 + (exp[ψ̂] − 1)Rm/(k − m)]⁻¹,

where each factor is the reciprocal of the exponentiated blip function (4). If the treatment is exercising at least 30 min/day, Pn[H0,k(ψ̂)] is the estimated cumulative risk at time k if everyone had exercised less than 30 min/day throughout the period 1982–2002. As noted above, this blipping down procedure can be applied to any blip function, including those in Equations (3)–(7), regardless of whether (a) above holds.
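The blipping down computation itself is a simple product over treated periods. A minimal sketch under blip function (4), using the factor er̄(k, m) = 1 + (exp[ψ*] − 1)/(k − m) quoted later for treated periods; the treatment matrix and ψ̂ value below are illustrative, not study data.

```python
import numpy as np

# Sketch of "blipping down" under blip function (4):
# H_{0,k}(psi) = Y_k * prod_{m=0}^{k-1} [1 + (exp(psi) - 1) * R_m / (k - m)]^(-1);
# the sample mean of H_{0,k}(psi_hat) estimates E[Y_k under "Never treat"].

def blip_down(y_k, R, psi):
    """Return H_{0,k}(psi) per subject; R is an n x k array of R_0, ..., R_{k-1}."""
    n, k = R.shape
    factors = 1.0 + (np.exp(psi) - 1.0) * R / (k - np.arange(k))  # e_r(k, m)
    return y_k / factors.prod(axis=1)

R = np.array([[1, 1, 0],    # treated at m = 0 and 1
              [0, 0, 0],    # never treated
              [1, 0, 1]])   # treated at m = 0 and 2
y_k = np.array([1, 1, 0])   # CHD indicator by time k = 3
H = blip_down(y_k, R, -0.3) # psi_hat = -0.3: protective treatment
print(H.mean())             # estimated cumulative risk under "Never treat"
```

With a protective effect (ψ̂ < 0), blipping down inflates the contribution of treated failures, since their failure risk would have been higher had treatment been withheld.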
5.2 Blipping Up
We next consider blipping up, a perhaps surprisingly more complicated computation. In Remark 3 below, we explain why the computation is so complex. We first consider one-parameter models like blip functions (3) and (4). When (a) above and the exchangeability assumption (Equation (8)) both hold, we can “blip up” from E[H0,k(ψ*)] to the risk under any specified treatment regime g = r̄ with r̄ ≡ r̄K by the formula
(13)
where the coefficients tr̄(k, j, i) for i = −1, 0, 1, …, k − 2 are recursively defined as
with j ≤ s and er̄(p,m) = exp[γp(r̄m, ψ*)]. For example, under the regime “Always treat,” we have rm = 1 for all m > 0, so er̄(p, m) = exp[ψ*] for blip function (3), and er̄(p, m) = 1 + (exp[ψ*] − 1)/(p − m) for blip function (4) (see Appendix B for a proof of Equation (13)).
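The two expressions for er̄(p, m) just given are easy to compute. The following sketch evaluates them under “Always treat,” using the g-estimate for the smoking intervention reported in Section 7 as an illustrative value of ψ*, and shows that under blip function (4) the multiplicative effect of a treatment at time m wanes as the outcome time p moves away from m, whereas under blip function (3) it is constant:

```python
import math

def e_blip3(p, m, psi):
    # Blip function (3) under "Always treat": constant factor exp(psi).
    return math.exp(psi)

def e_blip4(p, m, psi):
    # Blip function (4) under "Always treat": 1 + (exp(psi) - 1)/(p - m),
    # so the factor shrinks toward 1 as p - m grows (a waning effect).
    return 1.0 + (math.exp(psi) - 1.0) / (p - m)

psi = -0.668  # g-estimate for the smoking intervention under blip function (4)
factors = [e_blip4(p, 0, psi) for p in (1, 5, 10)]  # approaches 1 as p grows
```

At p − m = 1 the two blip functions coincide, since 1 + (exp(ψ) − 1)/1 = exp(ψ).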
After algebraic rearrangement, applying the definition of the blip function and the recursive definitions of tr̄(k, j, i) shows that Equation (13) amounts to the following: the cumulative probability of failure by time k under the treatment regime g = r̄ is a sum of probabilities of failure at each time j < k (conditional on survival through j − 1), each weighted by the cumulative probability of survival through j − 1. The blip function for the treatment at time k − 1 multiplies only the final term in the sum (as only those who survive to time k − 1 can be treated at that time), whereas blip functions for earlier treatments enter the other terms in the sum. It follows that, given the estimates of ψ* and of the E[Yk,g=0¯] computed above, we obtain a “substitution” estimator of E[Yk,g=r̄] by substituting these estimates for their respective estimands in Equation (13).
Remark 3. Robins (1994) and Robins, Rotnitzky, and Scharfstein (1999) provided the following simple formula for E[Yk,g=r̄] under a nondynamic regime when (a) holds:
instead of Equation (13). The explanation for the difference is that the definition of a nondynamic regime for a failure time outcome used in this article is subtly different from that for the nonfailure time outcomes studied by Robins (1994) and Robins, Rotnitzky, and Scharfstein (1999). Indeed, under the definition in those articles, our nondynamic failure time regime is the dynamic regime: if still free of failure at time m, take treatment rm; otherwise, take treatment 0. This regime is dynamic because the treatment at time m depends on the value of an evolving time-dependent covariate, namely the indicator of survival at time m.
However, under an SNMSTM, the counterfactual survival probability E[Sk,g=r̄] is given by the right-hand side of the previous display with Sk,g=0¯ = 1 − Yk,g=0¯ substituted for Yk,g=0¯.
We now consider two-dimensional parameter models that depend on a prior treatment (blip function (5)) or a time-invariant covariate X0 (blip function (6)).
For blip function (5), including an interaction term with treatment at the previous time, we use Equation (13) as stated, where tr̄(k, j, −1) is now calculated using
and r−1 = 0.
For blip function (6), we first blip down separately in each stratum of X0 to compute the conditional means E[Yk,g=0¯|X0 = x0] for each k. We then blip up using a conditional version of Equation (13),
where er̄(p, m) becomes er̄,x0 (p, m) = exp[γp(x0, r̄m; ψ*)] for each x0. At each time k, the marginal risk under the treatment regime g = r̄ is the weighted average
(14)
where wx0 is the unconditional probability that X0 = x0.
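Equation (14) is simply a finite mixture over the baseline strata; a minimal sketch (with hypothetical stratum-specific risks and weights):

```python
def marginal_risk(cond_risk, weights):
    """Equation (14): marginal risk under regime g as the average of the
    X0-stratum-specific blipped-up risks, weighted by w_x0 = Pr[X0 = x0]."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[x0] * cond_risk[x0] for x0 in cond_risk)

# Hypothetical two-level baseline covariate X0.
risk_k = marginal_risk(cond_risk={0: 0.05, 1: 0.02}, weights={0: 0.7, 1: 0.3})
```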
So far, we have described one method to estimate the treatment effect E[Yk,g=1¯] − E[Yk,g=0¯] under condition (a) above: use g-estimation to obtain the estimate of ψ*, estimate E[Yk,g=0¯] by blipping down, and finally estimate E[Yk,g=1¯] with the “substitution” estimator obtained from blipping up.
However, there is also a second, inequivalent method: estimate E[Yk,g=0¯] by blipping down as above; then estimate E[Yk,g=1¯] as follows. Define the recoded treatment R*m = 1 − Rm and substitute R*m for Rm in Equation (12) and elsewhere. Define the recoded analogs of the g-estimate and of H0,k, based on R*m rather than Rm. Then the sample average of the recoded blipped-down outcome estimates the cumulative risk E[Yk,g=1¯] at time k, and its difference from the first estimate is an estimate of E[Yk,g=1¯] − E[Yk,g=0¯]. A major advantage of the second method is that, in contrast to the first method, the estimate of E[Yk,g=1¯] does not require assumption (a), since both estimates are obtained by blipping down. A major disadvantage of the second method is that, in contrast to the first method, the second method is logically inconsistent for the following reason.
The two blipped-down estimators are based on different structural models. The estimate based on the original coding is consistent for E[Yk,g=0¯] only if the SNCFTM of Equation (1) holds. In contrast, the estimate based on the recoded treatment is consistent for E[Yk,g=1¯] only if the model
holds. However, if the null hypothesis that E[Yk,g] = E[Yk] for all regimes g is false and, as in this article, the dimension of ψ is not large, these two structural models are generally incompatible; hence, at least one must be misspecified. Thus, the difference of the two estimates cannot be consistent for E[Yk,g=1¯] − E[Yk,g=0¯]. Of course, in truth both models will be somewhat misspecified and assumption (a) is never exactly true. In light of this, researchers may want to estimate E[Yk,g=1¯] − E[Yk,g=0¯] by both methods and check whether the difference in the estimates is substantively unimportant.
Relationship to Counterfactual Hazard Models. In Appendix A we derive assumptions under which exp[γk(V̄m,Rm; ψ*)] approximates a counterfactual hazard ratio. We also describe why, when these assumptions do not hold, estimation of the ratio of counterfactual hazards under the structural nested hazard ratio model defined in the appendix is much less robust than g-estimation of the ratio of the counterfactual cumulative risks under an SNCFTM.
6. CENSORING
Selection bias due to censoring may be adjusted for by inverse probability weighting as follows. Let Ck be an indicator for whether an individual is censored (i.e., dead from a competing cause or lost to follow-up) at time k. We define the inverse probability weights
where Pr[Cj+1 = 0|Cj = 0, Yj = 0, V̄j, Aj] is the probability of remaining uncensored through time j + 1 conditional on having survived uncensored through time j and on the treatment and covariate history through time j.
We can use g-estimation to consistently estimate the parameter ψ* of the SNCFTM by using the weighted estimating equation (Robins, Rotnitzky, and Zhao 1995)
under Assumptions 1–3 with Rm replaced by (Rm, Cm+1 = 0) and Ym = 0 by (Ym = 0, Cm = 0). The estimated inverse probability weights were obtained by fitting a logistic model:
where z(V̄j, Aj) is a vector of indicator variables for the categories of the covariates listed in Table 1. We fit separate models for censoring by death and censoring from other causes, and multiplied the estimated probabilities from each model at each time k to obtain the overall probability of remaining uncensored.
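To illustrate the structure of the weights, here is a dependency-free sketch. It is a simplification: empirical proportions within discrete covariate strata stand in for the paper's logistic models, and a single censoring process stands in for the separate death and loss-to-follow-up models:

```python
from collections import defaultdict

def censoring_weights(C, strata):
    """Unstabilized inverse probability of censoring weights.

    C      : C[i][j] = 1 if subject i is censored by time j (absorbing state)
    strata : strata[i][j] = covariate stratum of subject i at time j
    Returns W with W[i][k] = product over j < k of
    1 / Pr-hat[C_{j+1} = 0 | C_j = 0, stratum at j].
    """
    n, K = len(C), len(C[0])
    at_risk, stayed = defaultdict(int), defaultdict(int)
    for i in range(n):
        for j in range(K - 1):
            if C[i][j] == 0:
                key = (j, strata[i][j])
                at_risk[key] += 1
                stayed[key] += C[i][j + 1] == 0
    W = [[1.0] * K for _ in range(n)]
    for i in range(n):
        for k in range(1, K):
            j = k - 1
            if C[i][k] == 0:  # weights are only used for uncensored records
                key = (j, strata[i][j])
                W[i][k] = W[i][j] * at_risk[key] / stayed[key]
    return W

# Toy cohort: 4 subjects, 3 times, one stratum; subject 2 censored at time 1.
C = [[0, 0, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1]]
S = [[0, 0, 0] for _ in range(4)]
W = censoring_weights(C, S)
```

With this saturated (fully stratified) model, the weights of the uncensored subjects at each time sum to the original sample size, the usual Horvitz-Thompson accounting for those censored away.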
In our application, recall that J*(V̄m, k) = 1 for one-parameter models (3) and (4), (1, Rm−1)′ for model (5), and (1,X0)′ for model (6); B(V̄m, k) = 0.
For each intervention, the estimator that solved the weighted estimating equation was our g-estimate. We used the same minimization procedures described above, and nonparametric bootstraps based on 200 samples to obtain a 95% CI for the g-estimate.
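The nonparametric bootstrap used here re-runs the entire estimation on each resampled cohort. A generic percentile-interval sketch, in which the `estimator` argument stands in for the full g-estimation pipeline:

```python
import random

def bootstrap_ci(subjects, estimator, n_boot=200, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample subjects (not person-times) with
    replacement, re-estimate on each resample, and take the alpha/2 and
    1 - alpha/2 quantiles of the bootstrap distribution."""
    rng = random.Random(seed)
    n = len(subjects)
    stats = sorted(
        estimator([subjects[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]

# Toy check with a sample mean standing in for the g-estimate.
sample = [0.1 * i for i in range(50)]
point = sum(sample) / len(sample)
lo, hi = bootstrap_ci(sample, lambda d: sum(d) / len(d))
```

Resampling whole subjects, rather than individual person-time records, preserves the within-subject correlation of the longitudinal data.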
In addition, the blipping down procedure was slightly modified: the risk E[Yk,g=0¯] at each time k under no treatment is estimated as the weighted average of H0,k(ψ̂), using the estimated inverse probability weights. However, the blipping up procedure remains unchanged since, once the censoring data have been used to obtain the estimates of ψ* and the E[Yk,g=0¯], they are not used further in the estimation of E[Yk,g=r̄].
7. DATA ANALYSIS
In the Nurses’ Health Study, the g-estimates (bootstrap 95% CIs) for blip function (4) were −0.668 (−0.800, −0.565) for the smoking intervention, −0.069 (−0.244, 0.089) for the exercise intervention, −0.034 (−0.153, 0.101) for the diet intervention, and −0.152 (−0.310,−0.016) for the alcohol intervention.
The observed 20-year risk of CHD, computed via a cumulative incidence method that took into account the subjects who died from other causes (Gooley et al. 1999), was 3.50%. We used the g-estimates to calculate the 20-year CHD risks under the four “Always treat” interventions (g = 1¯) by blipping down and up as described in Section 5. Table 2 displays these risks and their 95% CIs. Following the approach of Taubman et al. (2009), we computed risk ratios and risk differences (Table 2) by comparing the risks under each intervention with the risk under no intervention (g = R̄, observed treatments), which was estimated, via a weighted Kaplan-Meier method using the estimated censoring weights, to be 4.16% (95% CI, 3.97–4.34). Note that this is the theoretical risk if nobody had died of other causes or been lost to follow-up. None of the four interventions resulted in a higher 20-year CHD risk than no intervention. The estimated risk ratios ranged from 0.80 for avoiding smoking to 0.97 for the dietary intervention. The bootstrap estimate of bias for the risk difference (absolute value of the difference between the point estimate and the bootstrap average) was ≤0.05 for each intervention. When we used the alternative method to calculate the risk under “Always treat” (reversing the coding for treatment and blipping down only), results were nearly identical.
Table 2.
Percent CHD risks under four “Always treat” interventions sustained over a 20-year period, risk differences (RD), and risk ratios (RR) comparing “Always treat” to “No intervention,” obtained from the SNCFTM, Nurses’ Health Study
| Intervention | % Risk under “Always treat” | 95% CI | RD | 95% CI | RR | 95% CI |
|---|---|---|---|---|---|---|
| Avoid smoking | 3.32 | 3.10, 3.47 | −0.84 | −0.98, −0.72 | 0.80 | 0.77, 0.83 |
| Exercise ≥30 min/day | 3.84 | 3.08, 4.65 | −0.33 | −1.03, 0.47 | 0.92 | 0.75, 1.11 |
| Diet in top two quintiles | 4.03 | 3.58, 4.63 | −0.13 | −0.57, 0.43 | 0.97 | 0.86, 1.10 |
| Consume ≥5 g/day alcohol | 3.59 | 3.07, 4.15 | −0.57 | −1.07, −0.07 | 0.86 | 0.75, 0.98 |
We chose these hypothetical interventions for comparability with previous analyses (Taubman et al. 2009). However, g-estimation of SNCFTMs can be generally applied to other interventions. For example, when the method is applied to the intervention “Keep diet score in a range corresponding to the top quintile of the observed data” (instead of the top two quintiles), the estimated risk ratio (95% CI) was 0.62 (0.40, 0.92).
We also used the g-estimates to calculate the 20-year CHD risks under the corresponding “Never treat” interventions (g = 0¯, in other words, the counterfactual risks obtained by blipping down), and the risk differences and risk ratios comparing each “Always treat” intervention with its corresponding “Never treat.” For each intervention, the bootstrap estimate of bias for this risk difference (absolute value of the difference between the point estimate and the bootstrap average) was ≤0.09. As expected, the estimates of the risk ratios in Table 3 were further from the null than those in Table 2. The difference increased with the proportion of women whose observed data were consistent with the “Always treat” intervention. Thus, the risk ratio for “Always treat” versus “Never treat” ranged from 0.41 (compared with 0.80 for “Always treat” versus “No intervention”) for avoiding smoking to 0.96 (compared with 0.97) for the dietary intervention.
Table 3.
Percent CHD risks under four “Never treat” interventions sustained over a 20-year period, risk differences (RD), and risk ratios (RR) comparing “Always treat” to “Never treat,” obtained from the SNCFTM, Nurses’ Health Study
| Intervention | % Risk under “Never treat” | 95% CI | RD | 95% CI | RR | 95% CI |
|---|---|---|---|---|---|---|
| Avoid smoking | 8.16 | 7.23, 9.41 | −4.84 | −6.10, −3.88 | 0.41 | 0.35, 0.47 |
| Exercise ≥30 min/day | 4.20 | 3.98, 4.40 | −0.37 | −1.16, 0.52 | 0.91 | 0.73, 1.13 |
| Diet in top two quintiles | 4.21 | 3.92, 4.47 | −0.18 | −0.79, 0.59 | 0.96 | 0.82, 1.15 |
| Consume ≥5 g/day alcohol | 4.38 | 4.07, 4.72 | −0.80 | −1.57, −0.09 | 0.82 | 0.67, 0.98 |
We repeated the analyses using the SNCFTMs defined by blip functions (3), (5), and (6). For blip function (6), we considered X0 = I(A−1 ≥ a−1), where a−1 is 30 min/day for exercise, the fourth quintile for diet score, or 5 g/day for alcohol, and X0 = I(A−1 = 0 cigarettes/day) for the smoking intervention. Figure 1 compares the risk ratios for “Always treat” versus “No intervention” for each of the four interventions using the four different models (3)–(6). The estimates for all four interventions were reasonably similar across all four models. The estimate for the exercise intervention for blip function (5) was stronger than it was for the other models, perhaps indicating (as supported by the signs of the parameters) that this model better captured the distinction between habitual exercisers and those who began exercising only when they realized they were at elevated risk for CHD.
Figure 1.
Comparison of results using four different structural models for the effect of four hypothetical interventions on 20-year risk of coronary heart disease in the Nurses’ Health Study. Ratios of the risks for “Always treat” to the risks for “No intervention” (with 95% confidence intervals) using blip function (4) are represented by diamonds, blip function (3) by squares, blip function (5) by triangles, and blip function (6) by circles. Data points are grouped by intervention: from left to right, the interventions are on smoking, exercise, diet, and alcohol.
In interval cohorts like the Nurses’ Health Study, Lm and Am are measured simultaneously, so their temporal order cannot be determined. The ordering used to define our causal parameters above—that of Lm preceding Am for any given intervention—is in line with the strong assumption that, while possibly associated, (a) no risk factor measured at time m causes any other risk factor in the same period, and (b) the common causes that correlate the treatments at each time m are independent of those at any time m′ different from m (Taubman et al. 2009; Robins, Hernán, and Siebert 2004). To assess the sensitivity of our estimates to this assumption, we repeated the analysis for each structural model assuming that Am occurred before Lm. Risk ratios were all in the same direction as those in our main analyses, and changed by no more than 0.09.
Software to implement SNCFTMs, along with documentation for its use, is available on the website: http://www.hsph.harvard.edu/causal.
8. CONCLUSION
This article describes SNCFTMs, a new class of structural models, and presents a practical example of their application to complex longitudinal data. SNCFTMs have a key advantage over the previously proposed SNAFTMs: the estimating function is smooth in the parameter, so g-estimation of SNCFTMs is numerically stable and can be carried out using standard optimization routines (e.g., the Newton–Raphson procedure).
It is useful to fit a number of different structural models. If results fluctuate dramatically for different choices, it will be important to consider the biological plausibility of the model, its vulnerability to residual confounding and measurement error, and the appropriateness of the interventions that are being considered.
Three approaches for causal inference from observational data are g-estimation of structural nested models, inverse probability weighting of marginal structural models, and the parametric g-formula. The comparison of effect estimates obtained using different methods can provide a crosscheck and shed light on the validity of the different modeling assumptions needed for each method. For comparison purposes, Figure 2 shows the risk ratios (with 95% CI) estimated for “Always treat” versus “No intervention” for our four representative interventions as presented in Table 2, alongside those obtained by using the parametric g-formula as described by Taubman et al. (2008) and allowing for censoring by death. The risk ratios estimated from the SNCFTM and the parametric g-formula are in the same direction for all four interventions. Unsurprisingly, given that the semiparametric SNCFTM requires far fewer modeling assumptions, its CIs ranged from about the same width (for the smoking intervention) to about twice as wide (for the diet intervention) as the CIs from the fully parametric g-formula.
Figure 2.
Comparison of results using SNCFTM and parametric g-formula for the effect of four hypothetical interventions on 20-Year risk of coronary heart disease in the Nurses’ Health Study. Ratios of the risks for “Always treat” to the risks for “No intervention” (with 95% confidence intervals) using SNCFTM with blip function (4) are represented by diamonds, and parametric g-formula by squares. Data points are grouped by intervention: from left to right, the interventions are on smoking, exercise, diet, and alcohol.
Our application of SNCFTMs has some limitations. First, our estimates are only valid under the assumption of exchangeability conditional on the measured variables. Because the Nurses’ Health Study is observational, this assumption is not guaranteed to hold. Second, the estimates may not be valid when the rare failure condition (Remark 1) fails to hold; in our application, there was no evidence of such a failure. Third, the functions B and J* that we chose for computational simplicity, described in Section 4, do not result in a doubly robust, locally efficient estimator (Page 2005). Fourth, we only considered structural models exp[γk(V̄m, Rm; ψ*)] that depended on V̄m only through V0 and were thus without effect modification by time-varying covariates. This allowed us to use the method in Section 5.2 to compute absolute risks. Fifth, we only considered binary treatment variables, because we aimed to compare the results from the SNCFTM with those from the parametric g-formula (Taubman et al. 2008).
Finally, we used inverse probability weighting to adjust for informative censoring due to both loss to follow-up and competing risks (i.e., death from other causes). Thus, our method theoretically estimates the risk if no one in the study population had been lost to follow-up or died from other causes. In contrast, Taubman et al. (2009) estimated the risk if no one in the population had been lost to follow-up, while allowing the subjects in the population to die from other causes according to the observed distribution of death. This different estimand explains, in part, the difference in the risk estimate under no intervention between our method (4.16) and that found by Taubman and collaborators (3.68). (For caveats on the use of inverse probability weighting to adjust for informative censoring due to competing risks see, for example, section 12.2 of the article by Robins 1986, along with Rubin 1998, and Hernán, Hernández-Diaz, and Robins 2004.)
Future extensions of this work will include the development of the software needed both to compute doubly robust, locally efficient estimators and to implement alternative methods to adjust for censoring by competing risks.
Acknowledgments
This research was funded by NIH grants R01 HL34594 and R01 HL080644. The authors thank Roger Logan for programming support, and JoAnn Manson, Frank Hu, and Walter Willett for helpful comments on the article.
APPENDIX A: APPROXIMATING THE HAZARD RATIO
In this appendix, we will show how the blip function γm,k (V̄m, Rm; ψ) relates to the hazard ratio.
Remark 4. If we take k to be the time point immediately after m, that is, k = m + 1 with a very small unit of time, and further assume that the hazard ratio is constant over time, then Equation (1) rewritten as
is an approximation to the conditional hazard ratio. This formulation does not require a rare failure assumption, but it does require that the time intervals be measured in units that are very small relative to, say, the median survival time.
For the rest of this section, we will consider time k as a continuous variable. Let X ≈ W mean that X and W are arbitrarily close. Since we are assuming a rare outcome, but the time intervals may not be small, we will prove the following theorem.
Theorem 5. Suppose Ym = 0. Let λYg (k|V̄m, Rm) represent the conditional counterfactual hazard of Y at time k under regime g. Define the conditional hazard ratio at time k comparing treatment at time m, Rm, with no treatment at time m (0m) among individuals with the same covariate and treatment history up to m, as follows:
Under the (limiting) rare failure assumption that for all g, Pr(Yg,K |V̄m, Rm) ≈ 0, if the conditional hazard ratio is constant over time, that is,
then
which does not depend on k.
Proof. In Equation (1), the conditional expectation of Y is the cumulative distribution of failure up to k given (V̄m, Rm). Since the density at any time u is equal to the product of hazard at u and the survival up to time u, Equation (1) may be written as
(A.1)
where SYg (u|V̄m, Rm) = Pr[Yu,g = 0|V̄m, Rm, Ym,g = 0] is the conditional counterfactual survival function of Yg at time u.
Under the rare failure assumption, SYg=(R̄m,0) (u|V̄m, Rm) is arbitrarily close to 1 for all u < K. Thus, Equation (A.1) may be approximated by
This approximation will be reasonable as long as SYg=(R̄m,0) (u|V̄m, Rm) is greater than about 0.95. If we rewrite the numerator and rearrange as follows, we obtain:
where
If HR(u|V̄m, Rm) = HR(V̄m, Rm) does not depend on u, then we can pull it out of the integral. Note that
and therefore eγm,k(V̄m, Rm) ≈ HR(V̄m, Rm). □
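Theorem 5's rare-failure approximation can be checked numerically: with a constant discrete-time hazard ratio and small per-period hazards, the ratio of cumulative risks is close to the hazard ratio, but the approximation degrades when the outcome is common. A toy check (not part of the proof):

```python
def cum_risk(hazards):
    """Discrete-time cumulative risk: the sum over periods of
    (hazard in the period) * (probability of surviving all earlier periods)."""
    risk, surv = 0.0, 1.0
    for h in hazards:
        risk += surv * h
        surv *= 1.0 - h
    return risk

HR = 2.0  # constant hazard ratio
k = 20    # number of periods

# Rare outcome: per-period hazard 0.001; the cumulative risk ratio is near HR.
ratio_rare = cum_risk([HR * 0.001] * k) / cum_risk([0.001] * k)

# Common outcome: per-period hazard 0.2; the cumulative risk ratio is badly
# attenuated toward 1, so the approximation fails.
ratio_common = cum_risk([HR * 0.2] * k) / cum_risk([0.2] * k)
```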
The above proof also shows that if HR(u|V̄m, Rm) depends on time u, then eγm,k(V̄m, Rm) depends on time k and is a weighted average of the HR(u|V̄m, Rm), m ≤ u ≤ k, with u-specific weights proportional to the hazard λYg=(R̄m−1,0)(u|V̄m, Rm). Since we are modeling eγm,k(V̄m, Rm) but may be interested in hazard ratios, we invert this last relationship and derive HR(u|V̄m, Rm) as a function of u when γm,k(V̄m, Rm) depends on k.
Theorem 6. If Ym = 0, then, under the rare failure assumption, for k > m,
Proof. In the proof of Theorem 5, we found
(A.2)
where
can be seen as a weight proportional to the hazard at time u under no treatment at time m.
Taking derivatives of Equation (A.2) with respect to k, using the chain rule on the left hand side and using Leibniz’s rule on the right results in the following:
The middle term is 0 because m does not depend on k. Next, inside the integral, pull the hazard ratio HR(u|V̄m, Rm) out of the derivative since it is constant with respect to k.
Now the numerator of w(m, u, k|V̄m, Rm) is also constant with respect to k, so
by the chain rule and the Fundamental Theorem of Calculus. Putting all of this together,
Note that w(m, k, k|V̄m, Rm) does not depend on u and can thus be pulled out of the integral. The rest of the integral is the right hand side of (A.2). From here, using Equation (A.2) and rearranging gives the desired result:
□
Note that if one were interested in HR(k|V̄m, Rm) rather than eγm,k(V̄m, Rm) as an effect measure, one might consider parametric models for HR(k|V̄m, Rm). Such a model is referred to as a structural nested hazard ratio model. However, in contrast with an SNCFTM, a structural nested hazard ratio model does not admit unbiased estimating functions even when treatment probabilities are known (as in a sequential randomized trial; Robins 2000).
These results only consider comparisons between conditional means of Yg=(R̄m,0) and Yg=(R̄m−1,0) given (V̄m, Rm). We ultimately would be more interested in comparisons between the unconditional means of Yg and Yg=0¯ for various choices of g. Define HRg(k) = λYg(k)/λYg=0¯(k). The above proofs can also be used to show that if the outcome is rare, that is,
for all combinations of (V̄m, Rm), then for k > K + 1 and for all g,
and
where, similar to above,
This raises the question of whether, for certain g’s, E[Yk,g] is a function of the joint distribution of the data only through γm,k(V̄m, Rm) and E[Yk,g=0¯]. In general, the answer is no except in the special case in which
does not depend on V̄m for all m > 0, in which case we say there is no treatment covariate interaction on the multiplicative cumulative failure scale; this case is explored in detail in Section 5.2 and Appendix B.
APPENDIX B: PROOF OF EQUATION (13) (BLIPPING UP)
In this appendix, we provide the proof of Equation (13). We will need two lemmas.
Lemma 7. Suppose that there are no unmeasured confounders and that (possibly after stratification on a pre-intervention variable) the blip function γm,k depends only on treatment. Let
Let m be an integer ≥ −1, and use the notation er̄(p,m) = exp[γp(r̄m; ψ)]. Then
Proof. By the definition of the blip function γk, and the assumption that the blip function does not depend on time-varying covariates, br(m + 1, m|m, m, m) = br(m + 1, m − 1|m, m, m) × er̄(m + 1, m). Under exchangeability, the actual treatment received at time m is independent of the counterfactual outcome at time m + 1, given past treatment and covariate history, so on both sides we can ignore the Rm in the condition. Thus,
Then, integrating out Vm, we conclude that
because the blip function depends only on treatment. □
The second lemma contains the main substance of the proof of the theorem.
Lemma 8. Define br(p, m|j, s, t), er̄(p, m), and m as above. Let s ≤ m be a positive integer and j ≤ s be a nonnegative integer, where j indexes integer valued differences of earlier failure times relative to m + 1 and s indexes integer valued differences of earlier treatment occasions relative to m + 1. Define the functions tr̄ recursively as follows:
Then, under the assumptions of Lemma 7:
and
Proof. Note that for each value of s, the first equation implies the second, by the same reasoning that was used in Lemma 7. Specifically, on both sides of the first equation, we change the condition from “|m − s, m − s, m − s)” to “|m − s, m − (s + 1), m − s)” using the assumption of no unmeasured confounders: the counterfactual outcome at a later time (m + 1 or m + 1 − j) under a given treatment regime is independent of actual treatment received at an earlier time (m − s) given the past (R̄m−(s+1)), and then we integrate out Vm−s on both sides. This yields the second equation.
The proof that the first equation is true proceeds by induction.
Base case: Consider s = 1. We want to prove:
Note that by the recursive definition,
and also
Replacing these terms, the equation we are trying to prove becomes:
Starting from the left hand side,
(B.1)
This is because if events Q and T are mutually exclusive, then Pr[Q ∨ T] = Pr[Q|¬T] × Pr[¬T] + Pr[T].
Here, in the population of people who survived up to time m − 1 with a given covariate and treatment history up to then, Q ∨ T refers to the (counterfactual) event “failure between time m − 1 and time m + 1 under a regime whose final blip of treatment occurs at time m,” which can occur in one of two mutually exclusive ways: T is the counterfactual event “failure between times m − 1 and m under a final blip of treatment at time m − 1” and Q is the counterfactual event “failure between times m and m + 1 under a final blip of treatment at time m,” which can only occur if T does not.
By Lemma 7, the first factor on the right hand side of Equation (B.1) can be rewritten to yield
However, since Pr[Q|¬T] × Pr[¬T] = Pr[Q ∨ T] − Pr[T], part of the product becomes a difference:
So now we have
Algebraic rearrangement yields:
which becomes, after applying the definition of the blip function again,
This is the right-hand side of the equation we wanted to prove. So the lemma holds for s = 1.
Induction Step: Suppose that the theorem holds for s − 1. In other words,
and
(B.2)
We want to prove that the theorem holds for s:
(B.3)
Starting from the left hand side of Equation (B.3), the first step follows the same reasoning as in the base case:
Now substituting in for the first br using the second part of the induction hypothesis (Equation (B.2)), we obtain
Distributing the {1 − br (m − (s − 1), m − s|m − s, m − s, m − s)} term into the sum and using the same reasoning as in the base case, this becomes:
Using some algebra, we get
Applying the definition of the blip function both inside the sum and to the term at the end of the equation, we obtain:
Now, in the sum j < s, so tr̄(m + 1, j, m − (s + 1)) = tr̄(m + 1, j, m − s) × er̄(m + 1 − j, m − s). We have exactly that product as a coefficient of br(m + 1 − j, m − (s + 1)|m − s, m − s, m − s). Meanwhile,
which is the coefficient of br(m − (s − 1), m − (s + 1)|m − s, m − s, m − s). So our expression becomes:
However, the last term can be incorporated into the sum, yielding
This is the right-hand side of Equation (B.3), which is what we were trying to prove. □
Theorem 9. Under exchangeability and no treatment–covariate interaction on the multiplicative cumulative failure scale (i.e., γk(V0, R̄m, ψ) is a function only of treatment history and baseline covariates), for any nondynamic regime g = r̄ and m ≤ K,
Proof. The theorem follows directly from Lemma 8 if we set s = m. Then m − s − 1 = −1 and m − s = 0, so the second formula from Lemma 8 becomes
which can be rewritten as
The conditioning event, however, is true by definition, so it can be left out. Thus,
□
Contributor Information
Sally Picciotto, Research Fellow in the Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115; Division of Environmental Health Sciences, UC Berkeley School of Public Health, Berkeley, CA 94720.
Miguel A. Hernán, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115.
John H. Page, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115; Channing Laboratory, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115.
Jessica G. Young, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115.
James M. Robins, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115
REFERENCES
- Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semipammetric Models. The Johns Hopkins University Press; Baltimore, MD: 1993. [5] [Google Scholar]
- Colditz GA, Manson JE, Hankinson SE. The Nurses’ Health Study: 20-Year Contribution to the Understanding of Health Among Women. Journal of Women’s Health. 1997;6:49–62. doi: 10.1089/jwh.1997.6.49. [2] [DOI] [PubMed] [Google Scholar]
- Colditz GA, Martin P, Stampfer MJ, Willett WC, Sampson L, Rosner B, Hennekens CH, Speizer FE. Validation of Questionnaire Information on Risk Factors and Disease Outcomes in a Prospective Cohort Study of Women. American Journal of Epidemiology. 1986;123:894–900. doi: 10.1093/oxfordjournals.aje.a114319. [2] [DOI] [PubMed] [Google Scholar]
- Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of Failure Probabilities in the Presence of Competing Risks: New Representations of Old Estimators. Statistics in Medicine. 1999;18:695–706. doi: 10.1002/(sici)1097-0258(19990330)18:6<695::aid-sim60>3.0.co;2-o. [8] [DOI] [PubMed] [Google Scholar]
- Hernán MA, Brumback B, Robins JM. Marginal Structural Models to Estimate the Joint Causal Effect of Nonrandomized Treatments. Journal of the American Statistical Association. 2001;96:440–448. [1] [Google Scholar]
- Hernán MA, Cole SR, Margolick J, Cohen M, Robins JM. Structural Accelerated Failure Time Models for Survival Analysis in Studies With Time-Varying Treatments. Pharmacoepidemiology and Drug Safety. 2005;14:477–491. doi: 10.1002/pds.1064. [1] [DOI] [PubMed] [Google Scholar]
- Hernán MA, Hernández-Diaz S, Robins JM. A Structural Approach to Selection Bias. Epidemiology. 2004;15:615–625. doi: 10.1097/01.ede.0000135174.63482.43. [1,10] [DOI] [PubMed] [Google Scholar]
- Hernán MA, VanderWeele TJ. Compound Treatments and Transportability of Causal Inference. Epidemiology. 2011;22(3):368–377. doi: 10.1097/EDE.0b013e3182109296. [5] [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kalbeisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley; New York: 1980. [1] [Google Scholar]
- Martinussen T, Vansteelandt S, Gerster M, Hjelmborg JVB. Estimation of Direct Effects for Survival Data by Using the Aalen Additive Hazards Model. Journal of the Royal Statistical Society, Series B. 2011;73:773–788.
- Page J. Doubly Robust Estimation: Structural Nested Cumulative Failure Time Models. Sc.D. dissertation, Departments of Epidemiology and Biostatistics, Harvard School of Public Health; 2005.
- Robins JM. A New Approach to Causal Inference in Mortality Studies With Sustained Exposure Periods: Application to Control of the Healthy Worker Survivor Effect. Mathematical Modelling. 1986;7:1393–1512.
- Robins JM. Correcting for Non-Compliance in Randomized Trials Using Structural Nested Mean Models. Communications in Statistics - Theory and Methods. 1994;23:2379–2412.
- Robins JM. Structural Nested Failure Time Models. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. Wiley; Chichester/New York: 1998. pp. 4372–4389.
- Robins JM. Association, Causation, and Marginal Structural Models. Synthese. 1999;121:151–179.
- Robins JM. Marginal Structural Models Versus Structural Nested Models as Tools for Causal Inference. In: Halloran ME, Berry D, editors. Statistical Models in Epidemiology: The Environment and Clinical Trials. Vol. 116. Springer-Verlag; New York: 2000. pp. 95–134.
- Robins JM, Blevins D, Ritter G, Wulfsohn M. G-Estimation of the Effect of Prophylaxis Therapy for Pneumocystis carinii Pneumonia on the Survival of AIDS Patients. Epidemiology. 1992;3:319–336. doi: 10.1097/00001648-199207000-00007.
- Robins JM, Blevins D, Ritter G, Wulfsohn M. Errata to G-Estimation of the Effect of Prophylaxis Therapy for Pneumocystis carinii Pneumonia on the Survival of AIDS Patients. Epidemiology. 1993;14:79–97.
- Robins JM, Hernán MA, Siebert U. Effects of Multiple Interventions. In: Ezzati M, Lopez AD, Rodgers A, Murray CJL, editors. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. I. World Health Organization; Geneva: 2004. pp. 2191–2230.
- Robins JM, Rotnitzky A, Scharfstein D. Sensitivity Analysis for Selection Bias and Unmeasured Confounding in Missing Data and Causal Inference Models. In: Halloran ME, Berry D, editors. Statistical Models in Epidemiology: The Environment and Clinical Trials. Vol. 116. Springer-Verlag; New York: 1999. pp. 1–92.
- Robins JM, Rotnitzky A, Zhao LP. Analysis of Semi-parametric Regression Models for Repeated Outcomes in the Presence of Missing Data. Journal of the American Statistical Association. 1995;90:106–121.
- Rubin DB. More Powerful Randomization-Based p-Values in Double-Blind Trials With Noncompliance. Statistics in Medicine. 1998;17:371–385. doi: 10.1002/(sici)1097-0258(19980215)17:3<371::aid-sim768>3.0.co;2-o.
- Salvini S, Hunter DJ, Sampson L, Stampfer MJ, Colditz GA, Rosner B, Willett WC. Food-Based Validation of a Dietary Questionnaire: The Effects of Week-to-Week Variation in Food Consumption. International Journal of Epidemiology. 1989;18:858–867. doi: 10.1093/ije/18.4.858.
- Stampfer MJ, Hu FB, Manson JE, Rimm EB, Willett WC. Primary Prevention of Coronary Heart Disease in Women Through Diet and Lifestyle. The New England Journal of Medicine. 2000;343:16–22. doi: 10.1056/NEJM200007063430103.
- Taubman SL, Robins JM, Mittleman MA, Hernán MA. Alternative Approaches to Estimating the Effects of Hypothetical Interventions. In: JSM Proceedings 2008, Health Policy Statistics Section. American Statistical Association; Alexandria, VA: 2008.
- Taubman SL, Robins JM, Mittleman MA, Hernán MA. Intervening on Risk Factors for Coronary Heart Disease: An Application of the Parametric g-Formula. International Journal of Epidemiology. 2009;38:1599–1611. doi: 10.1093/ije/dyp192.
- van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer; New York: 2003.
- Wolf AM, Hunter DJ, Colditz GA, Manson JE, Stampfer MJ, Corsano KA, Rosner B, Kriska A, Willett WC. Reproducibility and Validity of a Self-Administered Physical Activity Questionnaire. International Journal of Epidemiology. 1994;23:991–999. doi: 10.1093/ije/23.5.991.
- Young JG, Hernán MA, Picciotto S, Robins JM. Simulation From Structural Survival Models Under Complex Time-Varying Data Structures. In: Proceedings of the Section on Statistics in Epidemiology. ASA; Alexandria, VA: 2008. pp. 1130–1135.
- Young JG, Hernán MA, Picciotto S, Robins JM. Relation Between Three Classes of Structural Models for the Effect of a Time-Varying Exposure on Survival. Lifetime Data Analysis. 2010;16:71–84. doi: 10.1007/s10985-009-9135-3.