Biometrika. Author manuscript; available in PMC: 2020 Nov 6.
Published in final edited form as: Biometrika. 2019 Oct 29;107(1):123–136. doi: 10.1093/biomet/asz057

Semiparametric estimation of structural failure time models in continuous-time processes

S YANG 1, K PIEPER 2, F COOLS 3
PMCID: PMC7646189  NIHMSID: NIHMS1639113  PMID: 33162561

Summary

Structural failure time models are causal models for estimating the effect of time-varying treatments on a survival outcome. G-estimation and artificial censoring have been proposed for estimating the model parameters in the presence of time-dependent confounding and administrative censoring. However, most existing methods require manually pre-processing data into regularly spaced data, which may invalidate the subsequent causal analysis. Moreover, computation and inference are challenging because artificial censoring makes estimation nonsmooth. We propose a class of continuous-time structural failure time models that respects the continuous-time nature of the underlying data processes. Under a martingale condition of no unmeasured confounding, we show that the model parameters are identifiable from a potentially infinite number of estimating equations. Using semiparametric efficiency theory, we derive the first semiparametric doubly robust estimators, which are consistent if either the model for the treatment process or the failure time model, but not necessarily both, is correctly specified. Moreover, we propose using inverse probability of censoring weighting to deal with dependent censoring. In contrast to artificial censoring, our weighting strategy does not introduce nonsmoothness in estimation and ensures that resampling methods can be used for inference.

Keywords: Causality, Cox proportional hazards model, Discretization, Observational study, Semiparametric analysis, Survival data

1. Introduction

Confounding by indication is common in observational studies and obscures the causal relationship between treatment and outcome (Robins et al., 1992). In longitudinal observational studies, this problem is compounded by time-varying confounding, which arises when time-dependent covariates predict subsequent treatment and outcome and are themselves affected by past treatment. In this case, standard regression methods can give biased estimates whether or not they adjust for the confounders (Robins et al., 2000; Daniel et al., 2013).

Structural failure time models (Robins & Tsiatis, 1991; Robins, 1992) and marginal structural models (Robins, 2000; Hernán et al., 2001) have been used to handle time-varying confounding effectively. Structural failure time models remove the treatment effect from the observed failure time to simulate the failure time that would have been observed in the absence of treatment, referred to as the potential baseline failure time, while marginal structural models specify the marginal relationship between potential outcomes under different treatments, possibly adjusting for baseline covariates. Structural failure time models have several advantages over marginal structural models (Robins, 2000): they allow time-varying treatment effect modification by post-baseline time-dependent covariates; they translate biological hypotheses into model parameters more flexibly (Robins, 1998b; Lok, 2008); and g-estimation (Robins, 1998b) for structural failure time models does not require the probability of receiving treatment at each time-point to be positive for all subjects.

Most structural failure time models specify deterministic relationships between the observed failure time and the potential baseline failure time, and are therefore rank preserving (see, e.g., Mark & Robins, 1993a,b; Robins & Greenland, 1994; Robins, 2002; Hernán et al., 2005). Moreover, existing g-estimation approaches often use a discrete-time set-up, which requires all subjects to be followed at the same prespecified time-points. In practice, however, variables and processes are more likely to be measured at irregularly spaced time-points, which may differ across subjects (Robins, 1998a). To use existing estimators, one must discretize the timeline and recreate the measurements at each time-point, for example by averaging observations within each interval or by imputation if there are no observations. Such pre-processing may distort the relationships between variables and cast doubt on the sequential randomization assumption, which is essential to justify discrete-time g-estimation (Zhang et al., 2011). Much less work has addressed non-rank-preserving continuous-time causal models; exceptions include Robins (1998b), Lok et al. (2004) and Lok (2008, 2017). Robins (1998b) conjectured that g-estimation extends to settings with continuous-time processes, but the conjecture still relied on the rank-preserving assumption. More recently, Lok (2017) gave a formal proof of the extension without assuming rank preservation.

Despite these advances, estimation for continuous-time structural failure time models remains largely underdeveloped. Existing g-estimation is singly robust, in the sense that it relies on correct specification of the model for the treatment process. In missing data analysis and causal inference, many authors have proposed doubly robust estimators, which require only one of two model components to be correctly specified (Robins et al., 1994; Scharfstein et al., 1999; van der Laan et al., 2002; Lunceford & Davidian, 2004; Bang & Robins, 2005; Robins et al., 2007; Cao et al., 2009; Lok & DeGruttola, 2012). Yang & Lok (2016) constructed a doubly robust test procedure for structural nested mean models. To the best of our knowledge, no doubly robust estimator for structural failure time models exists.

We develop a general framework for structural failure time models with continuous-time processes. We relax the local rank-preservation condition by specifying a distributional rather than deterministic relationship between the treatment process and the potential baseline failure time. We impose a martingale condition of no unmeasured confounding, which serves as the basis for identification and estimation. Under the semiparametric model characterized by the structural failure time model and the no unmeasured confounding assumption, we develop a class of regular asymptotically linear estimators. This class of estimators contains the semiparametric efficient estimators (Bickel et al., 1993; Tsiatis, 2006). We further construct an optimal member among a wide class of semiparametric estimators that are relatively simple to compute. Moreover, we show that our estimators are doubly robust in the sense that they are consistent if either the model for the treatment process is correctly specified or the failure time model is correctly specified, but not necessarily both. Our framework is readily applicable to the traditional discrete-time settings.

In the presence of censoring, Robins and coauthors introduced the notion of the potential censoring time and proposed using it to estimate the treatment effect. Because this approach may terminate follow-up for some subjects before their observed failure or censoring times, it is often called artificial censoring. It works only for administrative censoring, where follow-up ends at a prespecified date, and it does not yield consistent estimators under dependent censoring (Rotnitzky & Robins, 1995), which is likely to arise when subjects drop out. Moreover, computation and inference are challenging because artificial censoring makes estimation nonsmooth (Joffe, 2001; Joffe et al., 2012). To overcome these limitations, we propose using inverse probability of censoring weighting. In contrast to artificial censoring, our weighting strategy is smooth, so resampling methods, which are straightforward to implement in practice, can be used for inference.

2. Notation, models and assumptions

2.1. Notation

We assume that $n$ subjects constitute a random sample from a larger population of interest and are therefore independent and identically distributed. For notational simplicity, we suppress the subscript $i$ for subjects. Let $T$ be the observed failure time. Let $L_t$ be a multi-dimensional covariate process, and let $A_t$ be the binary treatment process, i.e., $A_t = 1$ if the subject is on treatment at time $t$ and $A_t = 0$ if the subject is off treatment at time $t$. We assume that all subjects receive treatment at baseline and may discontinue treatment during follow-up. We also assume that treatment discontinuation is permanent, i.e., if $A_t = 0$ then $A_u = 0$ for all $u \ge t$. Let $V$ be the time to treatment discontinuation or failure, whichever comes first, and let $\Gamma$ be the binary indicator of treatment discontinuation at time $V$. For regularity, we assume that all continuous-time processes are càdlàg, i.e., continuous from the right with limits from the left. Let $H_t = (L_t, A_{t-})$ be the combined covariate and treatment process, where $A_{t-}$ denotes the treatment just before time $t$. We use an overbar to denote history; for example, $\bar H_t = (H_u : 0 \le u \le t)$ is the history of the covariate and treatment process up to time $t$. Following Cox & Oakes (1984), we assume that there exists a potential baseline failure time $U$, representing the failure time had treatment always been withheld. The full data consist of $F = (T, \bar H_T)$. Until § 4 we assume that there is no censoring before $T$.

2.2. Structural failure time model

The structural failure time model specifies the relationship between the potential baseline failure time $U$ and the observed failure time $T$. We assume that, given any $\bar H_t$,

$$U \sim U(\psi^*) = \int_0^T \exp[\{\psi_1^* + \psi_2^{*\rm T} g(L_u)\}A_u]\,du, \qquad (1)$$

where $\sim$ means "has the same distribution as" and $\psi^{*\rm T} = (\psi_1^*, \psi_2^{*\rm T})$ is a $p$-vector of unknown parameters. Model (1) entails that treatment accelerates or decelerates the failure time relative to the potential baseline failure time $U$. Intuitively, $\exp[\{\psi_1^* + \psi_2^{*\rm T} g(L_t)\}A_t]$ can be interpreted as the rate of the treatment effect on the outcome, possibly modified by the time-varying covariate $g(L_t)$. To aid understanding of the model, consider the simplified model $U(\psi^*) = \int_0^T \exp(\psi_1^* A_u)\,du$. The multiplicative factor $\exp(\psi_1^*)$ describes the relative increase or decrease in the failure time had the subject continuously received treatment compared with had treatment always been withheld: for a subject treated throughout, $U(\psi^*) = T\exp(\psi_1^*)$, so the failure time under continuous treatment is distributed as $U\exp(-\psi_1^*)$.
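For the simplified model and the treatment pattern used later in the simulation study (treatment from baseline until a permanent stop at time $V_1$), $U(\psi)$ is $\exp(\psi)$ times the time spent on treatment plus the time spent off treatment. The Python sketch below evaluates this map and, in the rank-preserving special case where (1) holds with equality, inverts it to obtain $T$ from $U$; that inverse is the rule used to generate $T$ in § 5. The helper names `u_psi` and `t_from_u` are hypothetical, not from the paper.

```python
import math


def u_psi(t_fail: float, v1: float, psi: float) -> float:
    """U(psi) = int_0^T exp(psi * A_u) du when A_u = 1 for u <= v1 and 0 afterwards."""
    time_on = min(t_fail, v1)             # time spent on treatment before failure
    time_off = max(t_fail - v1, 0.0)      # time spent off treatment before failure
    return math.exp(psi) * time_on + time_off


def t_from_u(u: float, v1: float, psi: float) -> float:
    """Invert u_psi: the failure time T with U(psi) = u (rank-preserving case)."""
    t1 = u * math.exp(-psi)               # failure time if treatment were never stopped
    if t1 < v1:
        return t1
    return u + v1 - v1 * math.exp(psi)    # treated until v1, untreated clock afterwards


# A subject who never stops treatment: T = U exp(-psi).
print(t_from_u(10.0, math.inf, 0.5))      # 10 * exp(-0.5), about 6.07
```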

Remark 1.

The rank-preserving structural failure time model specifies a deterministic relationship instead of a distributional relationship between the failure times, i.e., it uses = instead of ~ in model (1). Then, for subjects i and j who have the same observed treatment and covariate history, Ti < Tj must imply Ui < Uj. This may be restrictive in practice. In contrast, we link the distribution of the potential baseline failure time and the distribution of the actual failure time after removing the treatment effect. Specifically, we assume that the distributions of U and U(ψ*) are the same given past treatment and covariates, thus avoiding the rank-preserving restriction.

2.3. No unmeasured confounding

The model parameter ψ* is not identifiable in general, because U is missing for all subjects. To identify and estimate ψ*, we impose the following assumption (Yang et al., 2018).

Assumption 1 (No unmeasured confounding).

The hazard of treatment discontinuation is

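$$\lambda_V(t \mid F, U) = \lim_{h \to 0} h^{-1}\,\mathrm{pr}(t \le V < t + h, \Gamma = 1 \mid F, U, V \ge t) = \lim_{h \to 0} h^{-1}\,\mathrm{pr}(t \le V < t + h, \Gamma = 1 \mid \bar H_t, V \ge t) = \lambda_V(t \mid \bar H_t). \qquad (2)$$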

Assumption 1 implies that $\lambda_V(t \mid F, U)$ depends only on the treatment and covariate history up to time $t$, $\bar H_t$, and not on future variables or $U$. This assumption holds if the observed history contains all prognostic factors for the failure time that affect the decision to discontinue treatment at time $t$.

For an equivalent representation of the treatment process $A_t$, we define the counting process $N_V(t) = I(V \le t, \Gamma = 1)$ and the at-risk process $Y_V(t) = I(V \ge t)$ (Andersen et al., 1993). Let $\sigma(H_t)$ be the σ-field generated by $H_t$, and let $\sigma(\bar H_t)$ be the σ-field generated by $\bigcup_{u \le t}\sigma(H_u)$. We show in the Supplementary Material that, under model (1), (2) implies that

$$\lambda_V\{t \mid \bar H_t, U(\psi^*)\} = \lambda_V(t \mid \bar H_t).$$

Thus, under common regularity conditions for the counting process, $M_V(t) = N_V(t) - \int_0^t \lambda_V(u \mid \bar H_u)Y_V(u)\,du$ is a martingale with respect to $\sigma\{U(\psi^*), \bar H_t\}$, which renders $\psi^*$ identifiable.
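The martingale property can be checked numerically in the simplest case covered by Assumption 1, in which the discontinuation hazard is a constant $\lambda$ depending on nothing else. The Python sketch below assumes that case, with failure acting as an independent competing event (both hypothetical choices for illustration). It evaluates $M_V(t) = N_V(t) - \lambda\min(V, t)$ and checks that its sample mean is close to zero at several times.

```python
import numpy as np


def martingale_at(t, v, gamma, lam):
    """M_V(t) = N_V(t) - int_0^t lam * Y_V(u) du for a constant hazard lam."""
    n_v = ((v <= t) & (gamma == 1)).astype(float)   # N_V(t) = I(V <= t, Gamma = 1)
    compensator = lam * np.minimum(v, t)            # int_0^t lam * I(V >= u) du
    return n_v - compensator


rng = np.random.default_rng(2019)
n, lam = 200_000, 0.15
v1 = rng.exponential(1 / lam, n)            # time to discontinuation (hazard lam)
t_fail = rng.exponential(1 / 0.3, n)        # failure, independent of v1 in this toy
v = np.minimum(v1, t_fail)                  # V = min(discontinuation, failure)
gamma = (v1 <= t_fail).astype(int)          # Gamma = 1 if discontinuation comes first
means = {t: martingale_at(t, v, gamma, lam).mean() for t in (1.0, 5.0, 20.0)}
print(means)                                # each value should be close to 0
```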

3. Semiparametric estimation

We consider the semiparametric model characterized by (1) and Assumption 1. We derive a regular asymptotically linear estimator $\hat\psi$ of $\psi^*$ such that

$$n^{1/2}(\hat\psi - \psi^*) = n^{1/2}\,\mathbb{P}_n\Phi(F) + o_p(1),$$

where $\mathbb{P}_n$ is the empirical measure induced by $F_1, \ldots, F_n$, i.e., $\mathbb{P}_n\Phi(F) = n^{-1}\sum_{i=1}^n \Phi(F_i)$, and $\Phi(F)$ is the influence function of $\hat\psi$, which has zero mean and finite, nonsingular variance.

Let $f_F(T, \bar H_T; \psi, \theta)$ be the semiparametric likelihood function based on a single observation $F$, where $\psi$ is the parameter of interest and $\theta$ is the infinite-dimensional nuisance parameter. A fundamental result of Bickel et al. (1993) states that the influence functions of regular asymptotically linear estimators lie in the orthogonal complement of the nuisance tangent space, denoted by $\Lambda^\perp$. We characterize $\Lambda^\perp$ in the following theorem, whose proof is given in the Supplementary Material.

Theorem 1.

Under model (1) and Assumption 1, the orthogonal complement of the nuisance tangent space for ψ* is

$$\Lambda^\perp = \Big\{\int_0^\infty \big(h_u\{U(\psi^*), \bar H_u\} - E[h_u\{U(\psi^*), \bar H_u\} \mid \bar H_u, V \ge u]\big)\,dM_V(u)\Big\}$$

for all $p$-dimensional $h_u\{U(\psi^*), \bar H_u\}$.

The score function of $\psi^*$ is $S_\psi(F) = \partial\log f_F(T, \bar H_T; \psi, \theta)/\partial\psi$ evaluated at $(\psi^*, \theta^*)$. Following Bickel et al. (1993), the efficient score for $\psi^*$ is $S_{\rm eff}(F) = \Pi\{S_\psi(F) \mid \Lambda^\perp\}$, where $\Pi$ is the projection operator in the Hilbert space. The efficient influence function is $\Phi(F) = [E\{S_{\rm eff}(F)S_{\rm eff}(F)^{\rm T}\}]^{-1}S_{\rm eff}(F)$, whose variance $[E\{S_{\rm eff}(F)S_{\rm eff}(F)^{\rm T}\}]^{-1}$ attains the semiparametric efficiency bound. However, the analytical form of $S_\psi(F)$ is intractable in general. To facilitate estimation, we focus on a reduced subclass of $\Lambda^\perp$ with $h_u\{U(\psi^*), \bar H_u\} = c(\bar H_u)U(\psi^*)$ for $c(\bar H_u) \in \mathbb{R}^p$, leading to the following estimating function for $\psi^*$:

$$G(\psi; F) = \int_0^\infty c(\bar H_u)\big[U(\psi) - E\{U(\psi) \mid \bar H_u, V \ge u\}\big]\,dM_V(u). \qquad (3)$$

Because of the no unmeasured confounding assumption, $U(\psi^*) \perp\!\!\!\perp M_V(u) \mid (\bar H_u, V \ge u)$, and so $E\{G(\psi^*; F)\} = 0$. We obtain the estimator of $\psi^*$ by solving

$$\mathbb{P}_n\{G(\psi; F)\} = 0. \qquad (4)$$

Within this class, we show that the optimal choice of $c(\bar H_u)$ is

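$$c_{\rm opt}(\bar H_u) = E\{\partial\dot U_u(\psi)/\partial\psi \mid \bar H_u, V = u\}\,[\mathrm{var}\{U(\psi) \mid \bar H_u, V \ge u\}]^{-1}. \qquad (5)$$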

In practice, working models must be posited to approximate $c_{\rm opt}(\bar H_u)$; see the example in the simulation study. Compared with naive choices, such as $c(\bar H_u) = \{A_u, A_u g(L_u)^{\rm T}\}^{\rm T}$ for model (1), our simulation results show that the optimal choice yields gains in estimation efficiency.
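To make the estimating equation (4) concrete, the Python sketch below solves it in a deliberately simple setting that is an assumption of this illustration, not the paper's set-up: no covariates, no censoring, a known constant discontinuation hazard $\lambda$, the simple choice $c(\bar H_u) = A_u$, and the conditional expectation in (3) set to zero. Because every subject is on treatment while at risk of discontinuation, the integral in (3) collapses to $U(\psi)(\Gamma - \lambda V)$. Data follow the rank-preserving generation rule of § 5, and SciPy's scalar root finder `brentq` stands in for a Newton–Raphson step.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n, lam, psi_true = 200_000, 0.15, 0.5

# Assumed mechanism: no covariates, constant discontinuation hazard lam.
u = rng.exponential(1 / 0.2, n)                   # potential baseline failure time U
v1 = rng.exponential(1 / lam, n)                  # time to treatment discontinuation
t1 = u * np.exp(-psi_true)                        # failure time if never discontinued
t_fail = np.where(t1 < v1, t1, u + v1 - v1 * np.exp(psi_true))
v = np.minimum(v1, t_fail)                        # V = min(discontinuation, failure)
gamma = (v1 < t_fail).astype(float)               # Gamma = 1 if discontinuation observed


def u_psi(psi):
    # U(psi) = exp(psi) * (time on treatment) + (time off treatment)
    return np.exp(psi) * v + (t_fail - v)


def estimating_eq(psi):
    # With c(H_u) = A_u and no centring term, int c U(psi) dM_V = U(psi) * (Gamma - lam * V).
    return np.mean(u_psi(psi) * (gamma - lam * v))


psi_hat = brentq(estimating_eq, -3.0, 3.0)
print(round(psi_hat, 3))
```

In this toy the equation is linear in $\exp(\psi)$, so a bracketing root finder is sufficient; with a vector $\psi$, a multivariate solver would be needed.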

In (4), we assume that the hazard function for the treatment process and $E\{U(\psi) \mid \bar H_u, V \ge u\}$ are known. In practice, they are often unknown and must be modelled and estimated from the data. We posit a proportional hazards model with time-dependent covariates,

$$\lambda_V(t \mid \bar H_t; \gamma_V) = \lambda_{V,0}(t)\exp\{\gamma_V^{\rm T} g_V(t, \bar H_t)\},$$

where $\lambda_{V,0}(t)$ is an unknown nonnegative baseline hazard, $g_V(t, \bar H_t)$ is a prespecified function of $t$ and $\bar H_t$, and $\gamma_V$ is a vector of unknown parameters. We also posit a working model $E\{U(\psi) \mid \bar H_u, V \ge u; \xi\}$ indexed by $\xi$. We show that the estimating equation for $\psi^*$ achieves double robustness, or double protection (Rotnitzky & Vansteelandt, 2015).

Theorem 2 (Double robustness).

Under model (1) and Assumption 1, the estimating equation (4) for $\psi^*$ is unbiased if either the model for the treatment process or the failure time model $E\{U(\psi) \mid \bar H_u, V \ge u; \xi\}$ is correctly specified, but not necessarily both.

4. Censoring

4.1. Inverse probability of censoring weighting

In most studies, the failure time is subject to right censoring. Let $C$ denote the time to censoring. The observed data are $O = \{X = \min(T, C), \Delta = 1(T \le C), \bar H_X\}$. In the presence of censoring, $T$ may not be observed, so it may not be feasible to solve the estimating equation (4). A naive solution is to replace $T$ in $U(\psi)$ by $X$ and use $\tilde U(\psi) = \int_0^X \exp(\psi A_s)\,ds$; however, $\tilde U(\psi^*)$ depends on the whole treatment process and is therefore not independent of $M_V(t)$ given $(\bar H_t, V \ge t)$, which renders the estimating equation (4) biased (Hernán et al., 2005). Robins (1998b) proposed a strategy for administrative censoring, which occurs when subjects are censored because the study ends at a known calendar date. In this case, $C$ is independent of all other variables. In Robins's strategy, $U(\psi)$ is replaced by a function of $U(\psi)$ and $C$ that is always observable. For illustration, consider $U(\psi) = \int_0^T \exp(\psi A_u)\,du$ and

$$C(\psi) = \min_{a_s \in \{0, 1\}}\int_0^C \exp(\psi a_s)\,ds = \begin{cases} C, & \psi \ge 0, \\ C\exp(\psi), & \psi < 0. \end{cases}$$
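To see how artificial censoring arises, the Python sketch below evaluates $C(\psi)$ together with $\min\{U(\psi), C(\psi)\}$ and the indicator $1\{U(\psi) < C(\psi)\}$ used in Robins's approach. It assumes the treatment pattern of the simulation study (treatment from baseline until a permanent stop at $V_1$); the helper names are hypothetical. It shows a subject who is observed to fail but is nonetheless treated as censored.

```python
import math


def c_psi(c: float, psi: float) -> float:
    """C(psi) = min over treatment paths a of int_0^C exp(psi * a_s) ds."""
    return c if psi >= 0 else c * math.exp(psi)


def u_psi(t_fail: float, v1: float, psi: float) -> float:
    """U(psi) for a subject treated from baseline until a permanent stop at v1."""
    return math.exp(psi) * min(t_fail, v1) + max(t_fail - v1, 0.0)


def artificial_censoring(t_fail: float, c: float, v1: float, psi: float):
    """Return (min{U(psi), C(psi)}, 1{U(psi) < C(psi)}).

    Computed from full data here; in practice the pair is observable because
    U(psi) >= C(psi) whenever T > C."""
    u, cp = u_psi(t_fail, v1, psi), c_psi(c, psi)
    return min(u, cp), int(u < cp)


# Observed failure (T = 8 < C = 10) on treatment throughout, psi = 0.5:
print(artificial_censoring(8.0, 10.0, math.inf, 0.5))   # (10.0, 0): artificially censored
```

Here $U(0.5) = 8e^{0.5} \approx 13.2$ exceeds $C(0.5) = 10$, so the indicator is 0 even though $T < C$; at $\psi = -0.5$ the same subject has indicator 1. This switching as $\psi$ varies is what makes the resulting estimating equation nonsmooth.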

Then $\tilde U(\psi^*) = \min\{U(\psi^*), C(\psi^*)\}$ and $\Delta(\psi^*) = 1\{U(\psi^*) < C(\psi^*)\}$ are always computable and independent of $M_V(t)$ given $(\bar H_t, V \ge t)$; see the Supplementary Material. The g-estimator is constructed based on $\tilde U(\psi)$ and $\Delta(\psi)$. In this approach, a subject with $T < C$ may have $U(\psi) > C(\psi)$ and hence $\Delta(\psi) = 0$; that is, subjects who were actually observed to fail are treated as if they were censored. This is why the approach is called artificial censoring. Artificial censoring has several drawbacks. First, the resulting estimating equation is not smooth in $\psi$, so estimation and inference are challenging (Joffe et al., 2012). Second, if the censoring mechanism is dependent, the estimators are inconsistent (Robins, 1998b). To avoid these disadvantages and allow for more general censoring mechanisms, we use inverse probability of censoring weighting. Robins (1998b) suggested, and Witteman et al. (1998) applied, the weighting approach to handle censoring by competing risks in deterministic structural failure time models with discretized data. We now assume the following ignorable censoring mechanism.

Assumption 2.

The hazard of censoring is

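$$\lambda_C(t \mid F, T > t) = \lim_{h \to 0} h^{-1}\,\mathrm{pr}(t \le C < t + h \mid C \ge t, F, T > t) = \lim_{h \to 0} h^{-1}\,\mathrm{pr}(t \le C < t + h \mid C \ge t, \bar H_t, T > t) = \lambda_C(t \mid \bar H_t, T > t),$$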

written as $\lambda_C(t \mid \bar H_t)$ for short.

Assumption 2 says that $\lambda_C(t \mid F, T > t)$ depends only on the treatment and covariate history up to time $t$, and not on future variables or the failure time. This assumption holds if the observed history contains all prognostic factors for the failure time that affect loss to follow-up at time $t$. Under this assumption, the data missing due to censoring are missing at random (Rubin, 1976). In the presence of censoring, $V$ is redefined as the time to treatment discontinuation, failure or censoring, whichever comes first. We show in the Supplementary Material that $\lambda_V(t \mid \bar H_t)$ equals $\lambda_V(t \mid \bar H_t, C \ge t)$ and so can be estimated conditional on $V \ge t$ under the new definition of $V$. From $\lambda_C(t \mid \bar H_t)$ we define $K_C(t \mid \bar H_t) = \exp\{-\int_0^t \lambda_C(u \mid \bar H_u)\,du\}$, the probability that the subject is not censored before time $t$. For regularity, we also impose a positivity condition on $K_C(t \mid \bar H_t)$.
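When the censoring hazard is constant between covariate measurement times, as for the censoring model of § 5 if covariates are held fixed between measurements, the integral defining $K_C(t \mid \bar H_t)$ is a finite sum. The Python helper below is a hypothetical sketch of that calculation; the break points and hazard values in the example are illustrative assumptions.

```python
import math


def k_c(t: float, breaks: list[float], hazards: list[float]) -> float:
    """K_C(t | H_t) = exp{-int_0^t lambda_C(u | H_u) du} for a hazard equal to
    hazards[k] on [breaks[k], breaks[k + 1]); breaks[0] = 0, last piece open-ended."""
    cumulative = 0.0
    for k, lam in enumerate(hazards):
        start = breaks[k]
        end = breaks[k + 1] if k + 1 < len(breaks) else math.inf
        if t <= start:
            break
        cumulative += lam * (min(t, end) - start)   # hazard times time spent in piece k
    return math.exp(-cumulative)


# Hypothetical subject with X0 = 1 and L = (-4, -3, -2) measured at t = 0, 5, 10,
# using the censoring hazard 0.025 exp(0.15 X0 + 0.15 L_t) of the simulation study.
hazards = [0.025 * math.exp(0.15 * 1 + 0.15 * l) for l in (-4.0, -3.0, -2.0)]
print(k_c(12.0, [0.0, 5.0, 10.0], hazards))
```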

Assumption 3 (Positivity).

There exists a constant $\delta$ such that, with probability 1, $K_C(t \mid \bar H_t) \ge \delta > 0$ for $t$ in the support of $T$.

Under Assumptions 1–3, ψ* is identifiable; see the Supplementary Material for a proof. Following Rotnitzky et al. (2009), the main idea of inverse probability of censoring weighting is to redistribute the weights for the censored subjects to the remaining uncensored subjects.

Theorem 3.

Under Assumptions 1–3, an unbiased estimating equation for $\psi^*$ is

$$\mathbb{P}_n\Big\{\frac{\Delta}{K_C(T \mid \bar H_T)}\,G(\psi; F)\Big\} = 0, \qquad (6)$$

where G(ψ; F) is as defined in (3).
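The reweighting idea behind (6) can be seen in a toy example in which the full-data estimating function is replaced by the failure time itself, so that the target is $E(T)$. The Python sketch below assumes, hypothetically, uniform failure times on (0, 10), which keeps the support bounded so that Assumption 3 holds, and exponential censoring independent of everything else, so that $K_C(t) = \exp(-\kappa t)$ is known. Averaging only the observed failures over-represents early failures, whereas weighting each observed failure by $\Delta/K_C(T)$ recovers the full-data mean.

```python
import numpy as np

rng = np.random.default_rng(7)
n, kappa = 200_000, 0.05
t_fail = rng.uniform(0.0, 10.0, n)            # bounded support, so positivity holds
cens = rng.exponential(1 / kappa, n)          # censoring independent of everything
x = np.minimum(t_fail, cens)
delta = (t_fail <= cens).astype(float)
k_c_at_x = np.exp(-kappa * x)                 # equals K_C(T) = pr(C >= T | T) when delta = 1

complete_case_mean = x[delta == 1].mean()     # biased towards short failure times
ipcw_mean = np.mean(delta * x / k_c_at_x)     # weighted mean; targets E(T) = 5
print(round(complete_case_mean, 2), round(ipcw_mean, 2))
```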

Theorem 3 assumes that $\lambda_C(t \mid \bar H_t)$ is known. As for $\lambda_V(t \mid \bar H_t)$, we posit a proportional hazards model with time-dependent covariates,

$$\lambda_C(t \mid \bar H_t) = \lambda_{C,0}(t)\exp\{\gamma_C^{\rm T} g_C(t, \bar H_t)\},$$

where $\lambda_{C,0}(t)$ is an unknown nonnegative baseline hazard, $g_C(t, \bar H_t)$ is a prespecified function of $t$ and $\bar H_t$, and $\gamma_C$ is a vector of unknown parameters.

To summarize, the algorithm for developing an estimator of ψ* is as follows.

Step 1.

Using the data $(V_i, \Gamma_i, \bar H_{V_i, i})$ $(i = 1, \ldots, n)$, obtain estimators of $\lambda_V(t \mid \bar H_t) = \lambda_{V,0}(t)\exp\{\gamma_V^{\rm T} g_V(t, \bar H_t)\}$ and $M_V(t)$. To estimate $\gamma_V$, fit the time-dependent proportional hazards model treating treatment discontinuation as the event and failure and censoring as censored observations. Given the estimate $\hat\gamma_V$, estimate the baseline hazard increments $\lambda_{V,0}(t)\,dt$, and hence the cumulative baseline hazard, using the Breslow estimator

$$\hat\lambda_{V,0}(t)\,dt = \frac{\sum_{i=1}^n dN_{V,i}(t)}{\sum_{i=1}^n \exp\{\hat\gamma_V^{\rm T} g_V(t, \bar H_{t,i})\}\,Y_{V,i}(t)}.$$

Then we obtain $\hat M_V(t) = N_V(t) - \int_0^t \exp\{\hat\gamma_V^{\rm T} g_V(u, \bar H_u)\}\,\hat\lambda_{V,0}(u)\,Y_V(u)\,du$.
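Step 1 can be sketched in Python as follows, under simplifying assumptions that are not part of the paper: the covariates entering $g_V$ are time-fixed, so each subject has a single risk score $\exp(\hat\gamma_V^{\rm T} g_{V,i})$, and $\hat\gamma_V$ has already been obtained from a Cox fit elsewhere. The functions compute the Breslow increments at the distinct discontinuation times and the resulting $\hat M_{V,i}(\infty)$ for each subject; with time-varying covariates the risk scores would have to be re-evaluated at every event time.

```python
import numpy as np


def breslow_increments(time, event, risk_score):
    """Breslow increments of the baseline hazard at each distinct event time t_j:
    d_j / sum_i risk_score_i * I(time_i >= t_j)."""
    event_times = np.unique(time[event == 1])
    d_lambda = np.array(
        [event[time == tj].sum() / risk_score[time >= tj].sum() for tj in event_times]
    )
    return event_times, d_lambda


def martingale_residuals(time, event, risk_score):
    """M_hat_i(inf) = N_i(inf) - risk_score_i * sum_{t_j <= time_i} d Lambda_0(t_j)."""
    event_times, d_lambda = breslow_increments(time, event, risk_score)
    cum_hazard = np.array([d_lambda[event_times <= ti].sum() for ti in time])
    return event - risk_score * cum_hazard


# time = V_i, event = Gamma_i (discontinuation), risk_score = exp(gamma_hat' g_i).
v = np.array([1.0, 2.0, 3.0])
gamma = np.array([1, 0, 1])
print(martingale_residuals(v, gamma, np.ones(3)))   # approximately 2/3, -1/3, -1/3
```

Summed over subjects, these residuals are exactly zero for any $\hat\gamma_V$, which is a convenient check on an implementation.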

Step 2.

Using the data $(X_i, \Delta_i, \bar H_{X_i, i})$ $(i = 1, \ldots, n)$, obtain estimators of $\lambda_C(t \mid \bar H_t) = \lambda_{C,0}(t)\exp\{\gamma_C^{\rm T} g_C(t, \bar H_t)\}$ and $K_C(T_i \mid \bar H_{T_i})$. To estimate $\gamma_C$, fit the time-dependent proportional hazards model treating censoring as the event and failure as a censored observation. Given the estimate $\hat\gamma_C$, estimate $\lambda_{C,0}(t)\,dt$ using the Breslow estimator

$$\hat\lambda_{C,0}(t)\,dt = \frac{\sum_{i=1}^n dN_{C,i}(t)}{\sum_{i=1}^n \exp\{\hat\gamma_C^{\rm T} g_C(t, \bar H_{t,i})\}\,Y_{C,i}(t)},$$

where $N_C(t) = I(C \le t, \Delta = 0)$ and $Y_C(t) = I(C \ge t)$ are the counting process and the at-risk process for censoring, respectively. Then we estimate $K_C(t \mid \bar H_t)$ by

$$\hat K_C(t \mid \bar H_t) = \prod_{0 \le u \le t}\big[1 - \exp\{\hat\gamma_C^{\rm T} g_C(u, \bar H_u)\}\,\hat\lambda_{C,0}(u)\,du\big].$$
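The product-limit estimator in Step 2 can be sketched the same way, again assuming time-fixed covariates so that each subject has a single censoring risk score (an illustrative assumption). In this sketch the at-risk set at each censoring time $t_j$ is taken as subjects with observed time $X_i \ge t_j$. With all risk scores equal to one, the function reduces to the Kaplan–Meier estimator of the censoring distribution.

```python
import numpy as np


def censoring_survival(t, x, delta, risk_score, subject_score):
    """K_hat_C(t) = prod_{t_j <= t} [1 - subject_score * d Lambda_C0(t_j)], with Breslow
    increments that treat censoring (delta == 0) as the event of interest."""
    cens = 1 - delta                                   # N_C jumps when delta == 0
    jump_times = np.unique(x[cens == 1])
    k_hat = 1.0
    for tj in jump_times[jump_times <= t]:
        d_lambda = cens[x == tj].sum() / risk_score[x >= tj].sum()
        k_hat *= 1.0 - subject_score * d_lambda        # may go negative if scores are extreme
    return k_hat


x = np.array([1.0, 2.0, 3.0, 4.0])          # observed times X_i
delta = np.array([1, 0, 1, 0])               # Delta_i = 1 for observed failures
print([censoring_survival(t, x, delta, np.ones(4), 1.0) for t in x])
```

The weights in (7) are then $\Delta_i/\hat K_C(T_i \mid \bar H_{T_i})$; estimated values of $\hat K_C$ close to zero indicate a practical problem with the positivity condition in Assumption 3.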

Step 3.

We obtain the estimator $\hat\psi$ of $\psi^*$ by solving

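$$\mathbb{P}_n\Big\{\frac{\Delta}{\hat K_C(T \mid \bar H_T)}\int_0^\infty c(\bar H_u)\big[U(\psi) - E\{U(\psi) \mid \bar H_u, V \ge u; \hat\xi\}\big]\,d\hat M_V(u)\Big\} = 0, \qquad (7)$$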

where we estimate $E\{U(\psi) \mid \bar H_u, V \ge u; \xi\}$ by regressing $\hat K_C(T \mid \bar H_T)^{-1}\Delta U(\psi)$ on $(X_0, L_u, u)$ among subjects with $V \ge u$. The estimating equation (7) is continuously differentiable in $\psi$ and hence can generally be solved by a Newton–Raphson procedure (Atkinson, 1989); for example, one can use the multiroot function of the rootSolve package in R (R Development Core Team, 2020).
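An end-to-end toy version of Step 3 is sketched below in Python. It is a simplification under stated assumptions rather than the paper's implementation: there is no censoring, so all weights $\Delta/\hat K_C$ equal one; a single binary baseline confounder $X_0$ drives both $U$ and discontinuation; the discontinuation hazard is taken as known instead of being estimated in Step 1; $c(\bar H_u) = A_u$; and the conditional expectation is approximated by the mean of $U(\psi)$ within levels of $X_0$, mirroring the reduction to $E\{U(\psi) \mid \bar H_0\}$ used in § 5. Because the hazard model is correct, the estimating equation remains unbiased even though this working model ignores the conditioning on $V \ge u$. SciPy's `brentq` replaces Newton–Raphson because the toy parameter is scalar.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(11)
n, psi_true = 200_000, 0.5

# Hypothetical mechanism: X0 affects both U and the discontinuation hazard.
x0 = rng.binomial(1, 0.55, n)
u = rng.exponential(1 / (0.2 * np.exp(0.4 * x0)))   # potential baseline failure time U
lam = 0.15 * np.exp(0.5 * x0)                       # discontinuation hazard (known here)
v1 = rng.exponential(1 / lam)                       # time to treatment discontinuation
t1 = u * np.exp(-psi_true)
t_fail = np.where(t1 < v1, t1, u + v1 - v1 * np.exp(psi_true))
v = np.minimum(v1, t_fail)
gamma = (v1 < t_fail).astype(float)


def u_psi(psi):
    # exp(psi) * (time on treatment) + (time off treatment)
    return np.exp(psi) * v + (t_fail - v)


def centring(psi):
    # Working model for E{U(psi) | H_u, V >= u}: mean of U(psi) within each level of X0.
    up = u_psi(psi)
    group_means = np.array([up[x0 == g].mean() for g in (0, 1)])
    return group_means[x0]


def estimating_eq(psi):
    # c(H_u) = A_u, so the integral reduces to {U(psi) - centring} * (Gamma - lam * V).
    return np.mean((u_psi(psi) - centring(psi)) * (gamma - lam * v))


psi_hat = brentq(estimating_eq, -3.0, 3.0)
print(round(psi_hat, 3))
```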

Remark 2.

It is worth discussing the connection between the proposed framework and the existing framework for the discrete-time setting. If the processes are observed at discrete times $\{t_0, \ldots, t_K\}$, then for $t = t_m$, $\bar H_t = \{H_{t_1}, \ldots, H_{t_m}\}$, $dN_V(t)$ is a binary treatment indicator, and $\int_0^t \lambda_V(u \mid \bar H_u)Y_V(u)\,du$ becomes the propensity score $\mathrm{pr}\{dN_V(t) = 1 \mid \bar H_t\}$. As a result, in the special case where $E\{U(\psi) \mid \bar H_u, V \ge u; \hat\xi\}$ is zero, (7) simplifies to the existing estimating equation for $\psi^*$. Importantly, (7) provides, for the first time in the literature, a semiparametric doubly robust estimator $\hat\psi$ even in the discrete-time setting: $\hat\psi$ is consistent if either the model for the treatment process or the failure time model is correctly specified, provided that the models for the treatment effect mechanism and the censoring mechanism are correctly specified.

4.2. Asymptotic theory and variance estimation

In this section we discuss the asymptotic properties of the proposed estimator; technical details are given in the Supplementary Material. To reflect the dependence of the estimating equation on the nuisance models, write (7) as $\mathbb{P}_n\Phi(\psi, \hat\xi, \hat M_V, \hat K_C; F) = 0$, where

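$$\Phi(\psi, \xi, M_V, K_C; F) = \{K_C(T \mid \bar H_T)\}^{-1}\Delta \times \int_0^\infty c(\bar H_u)\big[U(\psi) - E\{U(\psi) \mid \bar H_u, V \ge u; \xi\}\big]\,dM_V(u).$$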

Let the probability limits of $\hat\xi$, $\hat M_V$ and $\hat K_C$ be $\xi^*$, $M_V^*$ and $K_C^*$, respectively. We impose standard regularity conditions for Z-estimators (van der Vaart & Wellner, 1996). Roughly speaking, these conditions restrict the flexibility and convergence rates of the nuisance estimators; for example, we assume that $\Phi(\psi, \xi, M_V, K_C; F)$ and $\partial\Phi(\psi, \xi, M_V, K_C; F)/\partial\psi$ belong to $P$-Donsker classes. The regularity conditions ensure that

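$$E\Big(\int_0^\infty c(\bar H_u)\big[E\{(U(\psi^*), \partial U(\psi^*)/\partial\psi) \mid \bar H_u, V \ge u; \hat\xi\} - E\{(U(\psi^*), \partial U(\psi^*)/\partial\psi) \mid \bar H_u, V \ge u; \xi^*\}\big]\,d\{\hat M_V(u) - M_V^*(u)\}\Big) = o(n^{-1/2}).$$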

Under Assumptions 1–3 and the further assumptions in the Supplementary Material, if $K_C$ is correctly specified and either $E\{U(\psi) \mid \bar H_u, V \ge u\}$ or $M_V$ is correctly specified, then $\hat\psi$ solving (7) with the estimated nuisance models is still consistent and asymptotically normal, with influence function $\tilde\Phi(\psi^*, \xi^*, M_V^*, K_C^*; F)$.

We can estimate the variance of $\hat\psi$ either by the empirical variance of the estimated influence function or by resampling. If all the nuisance models, $\xi$, $M_V$ and $K_C$, are correctly specified, we obtain an analytical expression for $\tilde\Phi(\psi^*, \xi^*, M_V^*, K_C^*; F)$. We can then estimate $\tilde\Phi(\psi^*, \xi^*, M_V^*, K_C^*; F)$ by plugging in estimates of $\psi^*$, $\xi^*$, $M_V^*$, $K_C^*$ and the required expectations, denoted by $\hat\Phi(\hat\psi, \hat\xi, \hat M_V, \hat K_C; F)$. The estimated variance of $n^{1/2}(\hat\psi - \psi^*)$ is then

$$\mathbb{P}_n\{\hat\Phi(\hat\psi, \hat\xi, \hat M_V, \hat K_C; F)\,\hat\Phi(\hat\psi, \hat\xi, \hat M_V, \hat K_C; F)^{\rm T}\}. \qquad (8)$$

However, when only one of $\xi$ and $M_V$ is correctly specified, characterizing $\tilde\Phi(\psi^*, \xi^*, M_V^*, K_C^*; F)$ is difficult, so (8) can no longer be used. To avoid this technical difficulty, we recommend estimating the asymptotic variance by resampling methods such as the bootstrap or jackknife (Efron, 1979; Efron & Stein, 1981). Resampling is valid here because $\hat\psi$ is regular and asymptotically normal.
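A delete-a-group jackknife, the variant used in the simulation study, can be sketched generically in Python: subjects are randomly split into $G$ groups, the estimator is recomputed with each group deleted, and the variance is estimated by $(G - 1)/G\sum_g(\hat\psi_{(g)} - \hat\psi)^2$. The `estimator` argument is any function from a data array to a scalar estimate, and the random grouping scheme is an assumption of this sketch.

```python
import numpy as np


def delete_a_group_jackknife(estimator, data, n_groups, rng):
    """Delete-a-group jackknife variance: (G - 1)/G * sum_g (theta_(g) - theta_hat)^2."""
    n = len(data)
    groups = rng.permutation(n) % n_groups          # random groups of near-equal size
    theta_hat = estimator(data)
    replicates = np.array(
        [estimator(data[groups != g]) for g in range(n_groups)]
    )
    return (n_groups - 1) / n_groups * np.sum((replicates - theta_hat) ** 2)


rng = np.random.default_rng(5)
sample = rng.normal(size=2000)
print(delete_a_group_jackknife(np.mean, sample, 500, rng), sample.var(ddof=1) / 2000)
```

With $G = n$ and the sample mean as the estimator, the formula reproduces the familiar $s^2/n$ exactly, a useful check before plugging in an estimator such as the solution of (7).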

5. Simulation study

We evaluate the finite-sample performance of the proposed estimator on simulated datasets. We generate $U$ from Ex(0.2) and generate the covariate process $(X_0, L_t)$ had treatment always been withheld. We generate $X_0$ from Ber(0.55). To generate $L_t$, we first generate a $1 \times 3$ row vector from a multivariate normal distribution with mean $0.2U - 4$ and covariance $0.7^{|i-j|}$ $(i, j = 1, 2, 3)$. This vector gives the values of $L_t$ at times $t_1 = 0$, $t_2 = 5$ and $t_3 = 10$; we assume that the time-dependent variable remains constant between measurements. We generate the time until treatment discontinuation, $V_1$, from the proportional hazards model $\lambda_V(t \mid X_0, \bar L_t) = 0.15\exp(0.15X_0 + 0.15L_t)$. This determines the treatment process: $A_t = 1$ if $t \le V_1$ and $A_t = 0$ if $t > V_1$. The observed time-dependent covariate process is $L_t$ if $t \le V_1$ and $L_t + \log(t - V_1)$ if $t > V_1$, reflecting that the covariate process is affected after treatment discontinuation. Let the history of covariates and treatment up to time $t$ be $\bar H_t = (X_0, \bar L_t, \bar A_t)$. We generate $T$ according to $U \sim \int_0^T \exp(\psi^* A_u)\,du$ as follows. Let $T_1 = U\exp(-\psi^*)$. If $T_1 < V_1$, then $T = T_1$; otherwise $T = U + V_1 - V_1\exp(\psi^*)$. Under this data-generating mechanism, the potential failure time under $\bar a_T$ also follows a Cox marginal structural model with hazard rate $\lambda_0(u)\exp(\psi^* A_u)$ at $u$ (Young et al., 2010). We generate $C$ from the proportional hazards model $\lambda_C(t \mid X_0, \bar L_t, C \ge t) = 0.025\exp(0.15X_0 + 0.15L_t)$. Let $X = \min(T, C)$, with $\Delta = 1$ if $T < C$ and $\Delta = 0$ otherwise. Finally, let $V = \min(V_1, T, C)$ and let $\Gamma$ indicate treatment discontinuation before failure or censoring, i.e., $\Gamma = 1$ if $V = V_1$ and $\Gamma = 0$ otherwise. The observed data are $(X_i, \Delta_i, V_i, \Gamma_i, \bar H_{X_i, i})$ for $i = 1, \ldots, n$. We consider $\psi^* \in \{-0.5, 0, 0.5\}$. Under this data-generating mechanism, 50–58% of observations are censored, and 70–80% of treatment discontinuation times are observed before the time to failure or censoring.
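A Python sketch of this data-generating mechanism follows. Several details are assumptions of the sketch rather than statements from the paper: Ex(0.2) is read as an exponential distribution with rate 0.2; each component of the covariate vector has mean $0.2U - 4$; and, so that censoring times can be drawn by exact inversion, the censoring hazard uses the untreated covariate values and ignores the $\log(t - V_1)$ shift after discontinuation. Event times under a hazard that is constant between the measurement times 0, 5 and 10 are drawn by inverting the cumulative hazard at a unit exponential variate. The censoring proportion printed at the end therefore need not match the 50–58% reported above.

```python
import numpy as np


def draw_piecewise_exp(hazards, rng):
    """Event times for hazards constant on [0, 5), [5, 10), [10, inf) (columns of
    `hazards`), drawn by inverting the cumulative hazard at a unit exponential."""
    e = rng.exponential(1.0, hazards.shape[0])
    c1 = 5.0 * hazards[:, 0]                       # cumulative hazard at t = 5
    c2 = c1 + 5.0 * hazards[:, 1]                  # cumulative hazard at t = 10
    return np.where(
        e < c1, e / hazards[:, 0],
        np.where(e < c2, 5.0 + (e - c1) / hazards[:, 1], 10.0 + (e - c2) / hazards[:, 2]),
    )


def simulate(n, psi, rng):
    u = rng.exponential(1 / 0.2, n)                            # U ~ Ex(0.2), read as rate 0.2
    x0 = rng.binomial(1, 0.55, n)                              # X0 ~ Ber(0.55)
    idx = np.arange(3)
    cov = 0.7 ** np.abs(idx[:, None] - idx[None, :])          # 0.7^{|i - j|}
    l = (0.2 * u - 4.0)[:, None] + rng.multivariate_normal(np.zeros(3), cov, n)
    lin = 0.15 * x0[:, None] + 0.15 * l                        # linear predictor at t = 0, 5, 10
    v1 = draw_piecewise_exp(0.15 * np.exp(lin), rng)          # treatment discontinuation
    t1 = u * np.exp(-psi)
    t_fail = np.where(t1 < v1, t1, u + v1 - v1 * np.exp(psi))
    cens = draw_piecewise_exp(0.025 * np.exp(lin), rng)       # simplified censoring model
    v = np.minimum(np.minimum(v1, t_fail), cens)
    return {
        "u": u, "x0": x0, "l": l, "v1": v1, "t": t_fail, "c": cens,
        "x": np.minimum(t_fail, cens),
        "delta": (t_fail < cens).astype(int),
        "v": v,
        "gamma": (v == v1).astype(int),
    }


rng = np.random.default_rng(2020)
data = simulate(1000, 0.5, rng)
print(1 - data["delta"].mean())     # proportion censored under these assumptions
```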

We consider the following estimators of $\psi^*$: (i) a naive estimator $\hat\psi_{\rm naive}$ obtained by solving (4) with $T$ in $U(\psi) = \int_0^T \exp(\psi A_u)\,du$ replaced by $X$; (ii) an inverse probability weighting estimator $\hat\psi_{\rm msm}$ for the Cox marginal structural model in continuous time (Yang et al., 2018); (iii) a simple inverse probability of censoring weighting estimator $\hat\psi_{\rm ipcw}$ obtained by solving $\mathbb{P}_n[\{\hat K_C(T \mid \bar H_T)\}^{-1}\Delta\int_0^\infty c(\bar H_u)U(\psi)\,dM_V(u)] = 0$; and (iv) the proposed doubly robust estimator $\hat\psi_{\rm dr}$ obtained by solving (7) with $E\{U(\psi) \mid \bar H_u, V \ge u\}$ reduced to the tractable function $E\{U(\psi) \mid \bar H_0\}$. Note that $\hat\psi_{\rm ipcw}$ is the special case of $\hat\psi_{\rm dr}$ with $E\{U(\psi) \mid \bar H_u, V \ge u\}$ misspecified as zero. Moreover, to demonstrate the effect of data discretization, we include the discrete-time g-estimator $\hat\psi_{\rm disc}$ applied to the pre-processed data with grid size 51. Details of $\hat\psi_{\rm msm}$ and $\hat\psi_{\rm disc}$ are given in the Supplementary Material. For estimators requiring a choice of $c(\bar H_u)$, we compare the simple choice $c(\bar H_u) = A_u$ and the optimal choice $c_{\rm opt}(\bar H_u)$ in (5), where $E\{\partial\dot U_u(\psi)/\partial\psi \mid \bar H_u, V = u\} = E(V - u \mid \bar H_u, V \ge u)$. We approximate $E(V - u \mid \bar H_u, V \ge u)$ by the mean of the exponential distribution with rate $\hat\lambda_V(u)$ and assume that $\mathrm{var}\{U(\psi) \mid \bar H_u, V \ge u\}$ is constant, which is common practice in the generalized estimating equation literature. We approximate $E\{U(\psi) \mid \bar H_u, V \ge u\}$ by regressing $\hat K_C(T \mid \bar H_T)^{-1}\Delta U(\psi)$ on $(X_0, L_0)$. To evaluate double robustness, we consider two specifications for the hazard of treatment discontinuation: (a) the true proportional hazards model, and (b) a misspecified Kaplan–Meier model (Kaplan & Meier, 1958). In calculating the censoring weights, we specify the censoring model as the true proportional hazards model; the impact of misspecifying the censoring model is assessed in the Supplementary Material. For standard errors, we use the delete-a-group jackknife variance estimator with 500 groups (Kott, 1998).

Table 1 summarizes the simulation results with $n = 1000$. The naive estimator $\hat\psi_{\rm naive}$ is biased, and its bias grows as $|\psi^*|$ increases. In scenario 1, where the treatment process model is correctly specified, $\hat\psi_{\rm ipcw}$, $\hat\psi_{\rm dr}$ and $\hat\psi_{\rm msm}$ show small biases for all values of $\psi^*$. Because $\hat\psi_{\rm ipcw}$ is the special case of the proposed estimator with $E\{U(\psi) \mid \bar H_u, V \ge u\}$ misspecified as zero, this demonstrates that the proposed estimator is robust to misspecification of $E\{U(\psi) \mid \bar H_u, V \ge u\}$ when the treatment process model is correctly specified. If, in addition, $E\{U(\psi) \mid \bar H_u, V \ge u\}$ is well approximated, $\hat\psi_{\rm dr}$ is more efficient than $\hat\psi_{\rm ipcw}$. Moreover, $\hat\psi_{\rm dr}$ is more efficient with $c_{\rm opt}$ than with $c$. In scenario 1, $\hat\psi_{\rm dr}$ has smaller standard errors than $\hat\psi_{\rm msm}$, because $\hat\psi_{\rm msm}$ weights directly by the inverse of the propensity score, whereas $\hat\psi_{\rm dr}$ does not use the propensity score as an inverse weight and so avoids the potentially large variability due to weighting. In scenario 2, where the treatment process model is misspecified, $\hat\psi_{\rm ipcw}$ and $\hat\psi_{\rm msm}$ show large biases, whereas $\hat\psi_{\rm dr}$ still has small biases, confirming its double robustness. The jackknife variance estimator performs well for $\hat\psi_{\rm dr}$, producing coverage rates close to the nominal level. The large biases of the discrete-time g-estimator $\hat\psi_{\rm disc}$ illustrate the consequences of data pre-processing for the subsequent analysis.

Table 1.

Simulation results: bias, standard deviation, root mean squared error, and coverage rate of 95% confidence intervals for exp(ψ*) over 1000 simulated datasets

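ψ* = −0.5 ψ* = 0 ψ* = 0.5
Scenario Estimator c Bias SE CR Bias SE CR Bias SE CR
Scenario 1 $\hat\psi_{\rm naive}$ $c$ 0.06 0.048 76.8 0.02 0.069 95.6 −0.06 0.112 92.4
Scenario 1 $\hat\psi_{\rm naive}$ $c_{\rm opt}$ 0.05 0.043 78.4 0.02 0.063 95.0 −0.05 0.107 91.8
Scenario 1 $\hat\psi_{\rm ipcw}$ $c$ −0.01 0.089 95.2 −0.02 0.123 97.2 −0.02 0.191 95.6
Scenario 1 $\hat\psi_{\rm ipcw}$ $c_{\rm opt}$ −0.01 0.070 96.4 −0.02 0.095 97.0 −0.02 0.148 95.6
Scenario 1 $\hat\psi_{\rm dr}$ $c$ 0.00 0.053 95.2 −0.00 0.076 96.8 −0.01 0.125 95.4
Scenario 1 $\hat\psi_{\rm dr}$ $c_{\rm opt}$ 0.00 0.049 95.4 −0.00 0.071 96.0 −0.00 0.118 94.8
Scenario 1 $\hat\psi_{\rm msm}$ n/a −0.00 0.050 95.8 0.00 0.081 96.4 0.00 0.148 95.2
Scenario 1 $\hat\psi_{\rm disc}$ n/a −0.37 0.041 0.0 −0.61 0.055 0.0 −1.01 0.092 0.6
Scenario 2 $\hat\psi_{\rm naive}$ $c$ 0.22 0.065 4.8 0.24 0.097 30.4 0.26 0.164 66.0
Scenario 2 $\hat\psi_{\rm naive}$ $c_{\rm opt}$ 0.22 0.066 5.4 0.24 0.097 31.8 0.26 0.163 68.2
Scenario 2 $\hat\psi_{\rm ipcw}$ $c$ 0.16 0.098 62.4 0.23 0.140 64.4 0.33 0.239 79.6
Scenario 2 $\hat\psi_{\rm ipcw}$ $c_{\rm opt}$ 0.16 0.098 62.2 0.23 0.140 65.8 0.33 0.234 79.4
Scenario 2 $\hat\psi_{\rm dr}$ $c$ 0.01 0.048 95.0 0.00 0.070 96.4 0.00 0.115 95.4
Scenario 2 $\hat\psi_{\rm dr}$ $c_{\rm opt}$ 0.01 0.048 95.4 0.00 0.070 96.6 0.00 0.115 95.2
Scenario 2 $\hat\psi_{\rm msm}$ n/a 0.13 0.069 54.4 −0.40 0.051 57.6 0.36 0.217 75.6
Scenario 2 $\hat\psi_{\rm disc}$ n/a −0.25 0.035 0.0 0.22 0.118 0.0 −0.72 0.092 1.0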

Scenario 1, the treatment discontinuation model is correctly specified; Scenario 2, the treatment discontinuation model is misspecified; SE, standard error; CR, coverage rate of 95% confidence intervals.

6. Application to the GARFIELD data

We analyse data from the Global Anticoagulant Registry in the FIELD with Atrial Fibrillation (GARFIELD-AF), an observational registry study of patients newly diagnosed with atrial fibrillation; see the study website at http://www.garfieldregistry.org/ for details. Our analysis includes 22811 patients who were enrolled between April 2013 and August 2016 and received oral anticoagulant therapy for stroke prevention. The goal is to investigate the effect of discontinuing oral anticoagulant therapy in patients with atrial fibrillation. The primary endpoint is a composite clinical outcome comprising death, non-haemorrhagic stroke, systemic embolism and myocardial infarction. Treatment discontinuation at time t is defined as treatment being stopped at time t and never restarted. In our study, 9.5% of patients discontinued oral anticoagulant therapy over a median follow-up of 710 days, with an interquartile range of (487, 731) days; 43.8% of discontinuations occurred within the first four months of treatment. Among patients who discontinued treatment, 512 stopped for more than seven days and then resumed treatment; we censor these patients at the time of restarting. This censoring mechanism is unlikely to be completely at random, because patients with poor prognosis may be more likely to restart treatment. We therefore assume a dependent censoring mechanism and use inverse probability of censoring weighting.

To answer the clinical question of interest, we consider the structural failure time model $U(\psi^*) = \int_0^T \exp(\psi^* A_u)\,du$, where $A_u = 1$ if the patient is on treatment at time $u$ and $A_u = 0$ otherwise. Under this model, if a patient had been on treatment continuously, $T = U(\psi^*)\exp(-\psi^*)$, so $U(\psi^*)\{\exp(-\psi^*) - 1\}$ is the time gained or lost while on treatment. We focus on estimating the multiplicative factor $\exp(\psi^*)$. Table 2 reports the results obtained using the naive estimator and the proposed doubly robust estimator as described in § 5; details of the nuisance models are given in the Supplementary Material. Although the estimates differ slightly between the naive and proposed analyses, both suggest that treatment prolongs the time to clinical events, and thus that treatment discontinuation is harmful. If a patient had been on treatment continuously, the time to clinical outcomes would have been $\exp(-\hat\psi) = 1/0.64 \approx 1.56$ times longer than if the patient had never received treatment. Importantly, the proposed analysis is designed to address a well-formulated causal question about the effect of treatment discontinuation.
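
The model can be illustrated numerically. The Python sketch below computes U(ψ) = ∫₀ᵀ exp(ψ A_u) du for a piecewise-constant treatment path with A_u = 1 while on treatment and 0 afterwards, and then checks the identities T = U(ψ) exp(−ψ) and "time gained = U(ψ){exp(−ψ) − 1}" for a continuously treated patient. The helper name `u_of_psi`, the segment representation and the follow-up times are hypothetical; only the value exp(ψ̂) = 0.64 is taken from Table 2.

```python
import math
from typing import Sequence, Tuple


def u_of_psi(segments: Sequence[Tuple[float, float, int]], T: float, psi: float) -> float:
    """Compute U(psi) = integral_0^T exp(psi * A_u) du.

    segments: (start, end, a) triples meaning A_u = a on [start, end);
    they are assumed to cover [0, T] without overlap.
    """
    total = 0.0
    for start, end, a in segments:
        hi = min(end, T)
        if hi > start:
            total += (hi - start) * math.exp(psi * a)
    return total


psi_hat = math.log(0.64)  # exp(psi_hat) = 0.64, the estimate reported in Table 2

# Hypothetical patient: on treatment until day 120, off afterwards, event at day 400.
u_disc = u_of_psi([(0.0, 120.0, 1), (120.0, math.inf, 0)], 400.0, psi_hat)
print(u_disc)  # 120 * 0.64 + 280, approximately 356.8

# Hypothetical continuously treated patient with T = 400: U is approximately 256.
u_cont = u_of_psi([(0.0, math.inf, 1)], 400.0, psi_hat)
# Recover T = U * exp(-psi_hat) and the time gained U * {exp(-psi_hat) - 1}.
print(u_cont * math.exp(-psi_hat), u_cont * (math.exp(-psi_hat) - 1.0))  # approximately 400 and 144
```

The ratio T/U for the continuously treated patient is 400/256 = 1/0.64 = 1.5625, which is the factor of about 1.56 quoted above.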

Table 2.

Results of the effect of oral anticoagulant therapy on the composite outcome; exp(ψ*) is the causal estimand

Est SE CI p-value
Naive method 0.68 0.176 (0.34, 1.03) 0.07
Proposed method 0.64 0.179 (0.29, 0.99) 0.04

Est, estimate of exp(ψ*); SE, standard error; CI, 95% confidence interval.
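
As an arithmetic check, the intervals and p-values in Table 2 appear consistent with Wald-type inference on the exp(ψ) scale, that is, estimate ± 1.96 × SE and a two-sided test of exp(ψ*) = 1 (no treatment effect). This is our reading of the table, not a description of the authors' procedure; the helper `wald_summary` is hypothetical, and because the reported estimates and standard errors are rounded, a recomputed limit can differ from the table in the last digit.

```python
import math


def wald_summary(est: float, se: float, null: float = 1.0):
    """Hypothetical helper: 95% Wald interval and two-sided p-value for H0: parameter = null."""
    z_crit = 1.959964  # 97.5% quantile of the standard normal distribution
    lower, upper = est - z_crit * se, est + z_crit * se
    z = (est - null) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * {1 - Phi(|z|)}
    return lower, upper, p


for name, est, se in [("Naive", 0.68, 0.176), ("Proposed", 0.64, 0.179)]:
    lo, hi, p = wald_summary(est, se)
    print(f"{name}: ({lo:.2f}, {hi:.2f}), p = {p:.2f}")
```

For the proposed method this reproduces (0.29, 0.99) and p = 0.04; for the naive method it gives a lower limit of 0.34 and p = 0.07, with an upper limit of 1.02 rather than the reported 1.03, a difference plausibly due to rounding of the inputs.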

7. Discussion

The proposed framework of structural failure time models can be used to adjust for time-varying confounding and selection bias with irregularly spaced observations under three assumptions: no unmeasured confounders, ignorability of censoring, and positivity. As discussed previously, the first and second assumptions hold if the analysis adjusts for all variables that are related to both treatment discontinuation and outcome, and for all variables that are related to both censoring and outcome. Although essential, these assumptions cannot be verified from the observed data; their plausibility must be assessed by subject-matter experts. Future work will investigate sensitivity to these assumptions using the methods of Yang & Lok (2017). The third assumption is that all subjects have a nonzero probability of remaining on study before the failure time; it requires that no predictors be deterministically related to censoring and outcome. Practitioners should therefore examine the question at hand carefully to rule out deterministic violations of positivity.

Our framework can be extended in two directions. First, the proposed doubly robust estimator still relies on correct specification of the censoring mechanism; if the censoring model is misspecified, the estimator may be biased, as the additional simulation results in the Supplementary Material illustrate. It would be interesting to construct an improved estimator that is multiply robust, in the sense of being consistent in the union of the three models for treatment, failure time and censoring (Molina et al., 2017). Second, it is important to develop tests for the goodness-of-fit of the treatment effect model (Yang & Lok, 2016).


Acknowledgement

We have benefited from comments by two reviewers and Anastasio A. Tsiatis. The first author was partially supported by the U.S. National Science Foundation and the National Cancer Institute.

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes proofs of the theoretical results and additional simulations. The R package implementing the methods in this article is available at https://github.com/shuyang1987/contTimeCausal.

Contributor Information

S. YANG, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, U.S.A.

K. PIEPER, Duke Clinical Research Institute, Duke University, 300 W. Morgan Street, Durham, North Carolina 27705, U.S.A.

F. COOLS, Department of Cardiology, AZ Klina, Augustijnslei 100, 2930 Brasschaat, Belgium

References

  1. Andersen PK, Borgan O, Gill RD & Keiding N (1993). Statistical Models Based on Counting Processes. New York: Springer.
  2. Atkinson KE (1989). An Introduction to Numerical Analysis. New York: Wiley.
  3. Bang H & Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–73.
  4. Bickel PJ, Klaassen C, Ritov Y & Wellner J (1993). Efficient and Adaptive Inference in Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press.
  5. Cao W, Tsiatis AA & Davidian M (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96, 723–34.
  6. Cox DR & Oakes D (1984). Analysis of Survival Data. London: Chapman and Hall.
  7. Daniel R, Cousens S, De Stavola B, Kenward M & Sterne J (2013). Methods for dealing with time-dependent confounding. Statist. Med. 32, 1584–618.
  8. Efron B (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7, 1–26.
  9. Efron B & Stein C (1981). The jackknife estimate of variance. Ann. Statist. 9, 586–96.
  10. Hernán MA, Brumback B & Robins JM (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. J. Am. Statist. Assoc. 96, 440–8.
  11. Hernán MA, Cole SR, Margolick J, Cohen M & Robins JM (2005). Structural accelerated failure time models for survival analysis in studies with time-varying treatments. Pharmacoepidemiol. Drug Safety 14, 477–91.
  12. Joffe MM (2001). Administrative and artificial censoring in censored regression models. Statist. Med. 20, 2287–304.
  13. Joffe MM, Yang WP & Feldman H (2012). G-estimation and artificial censoring: Problems, challenges, and applications. Biometrics 68, 275–86.
  14. Kaplan EL & Meier P (1958). Nonparametric estimation from incomplete observations. J. Am. Statist. Assoc. 53, 457–81.
  15. Kott PS (1998). Using the delete-a-group jackknife variance estimator in practice. In Proc. Surv. Res. Meth. Sect., ASA. Alexandria, Virginia: American Statistical Association, pp. 763–8.
  16. Lok J, Gill R, Van Der Vaart A & Robins J (2004). Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models. Statist. Neerl. 58, 271–95.
  17. Lok JJ (2008). Statistical modeling of causal effects in continuous time. Ann. Statist. 36, 1464–507.
  18. Lok JJ (2017). Mimicking counterfactual outcomes to estimate causal effects. Ann. Statist. 45, 461–99.
  19. Lok JJ & DeGruttola V (2012). Impact of time to start treatment following infection with application to initiating HAART in HIV-positive patients. Biometrics 68, 745–54.
  20. Lunceford JK & Davidian M (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statist. Med. 23, 2937–60.
  21. Mark SD & Robins JM (1993a). Estimating the causal effect of smoking cessation in the presence of confounding factors using a rank preserving structural failure time model. Statist. Med. 12, 1605–28.
  22. Mark SD & Robins JM (1993b). A method for the analysis of randomized trials with compliance information: An application to the multiple risk factor intervention trial. Contr. Clin. Trials 14, 79–97.
  23. Molina J, Rotnitzky A, Sued M & Robins J (2017). Multiple robustness in factorized likelihood models. Biometrika 104, 561–81.
  24. R Development Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org.
  25. Robins J (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika 79, 321–34.
  26. Robins J, Sued M, Lei-Gomez Q & Rotnitzky A (2007). Comment: Performance of double-robust estimators when ‘inverse probability’ weights are highly variable. Statist. Sci. 22, 544–59.
  27. Robins JM (1998a). Correction for non-compliance in equivalence trials. Statist. Med. 17, 269–302.
  28. Robins JM (1998b). Structural nested failure time models. In The Encyclopedia of Biostatistics, Armitage P & Colton T, eds. Chichester: Wiley, pp. 4372–89.
  29. Robins JM (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical Models in Epidemiology, the Environment, and Clinical Trials. New York: Springer, pp. 95–133.
  30. Robins JM (2002). Analytic methods for estimating HIV-treatment and cofactor effects. In Methodological Issues in AIDS Behavioral Research. New York: Springer, pp. 213–88.
  31. Robins JM, Blevins D, Ritter G & Wulfsohn M (1992). G-estimation of the effect of prophylaxis therapy for pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology 3, 319–36.
  32. Robins JM & Greenland S (1994). Adjusting for differential rates of prophylaxis therapy for PCP in high- versus low-dose AZT treatment arms in an AIDS randomized trial. J. Am. Statist. Assoc. 89, 737–49.
  33. Robins JM, Hernán MA & Brumback B (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–60.
  34. Robins JM, Rotnitzky A & Zhao LP (1994). Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 89, 846–66.
  35. Robins JM & Tsiatis AA (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Commun. Statist. A 20, 2609–31.
  36. Rotnitzky A, Bergesio A & Farall A (2009). Analysis of quality-of-life adjusted failure time data in the presence of competing, possibly informative, censoring mechanisms. Lifetime Data Anal. 15, 1–23.
  37. Rotnitzky A & Robins JM (1995). Semiparametric regression estimation in the presence of dependent censoring. Biometrika 82, 805–20.
  38. Rotnitzky A & Vansteelandt S (2015). Double-robust methods. In Handbook of Missing Data Methodology, Tsiatis A & Verbeke G, eds. Boca Raton, Florida: CRC Press, pp. 185–212.
  39. Rubin DB (1976). Inference and missing data. Biometrika 63, 581–92.
  40. Scharfstein DO, Rotnitzky A & Robins JM (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Am. Statist. Assoc. 94, 1096–120.
  41. Tsiatis A (2006). Semiparametric Theory and Missing Data. New York: Springer.
  42. Van Der Laan MJ, Hubbard AE & Robins JM (2002). Locally efficient estimation of a multivariate survival function in longitudinal studies. J. Am. Statist. Assoc. 97, 494–507.
  43. van der Vaart AW & Wellner JA (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer.
  44. Witteman JC, D’Agostino RB, Stijnen T, Kannel WB, Cobb JC, de Ridder MA, Hofman A & Robins JM (1998). G-estimation of causal effects: Isolated systolic hypertension and cardiovascular death in the Framingham Heart Study. Am. J. Epidemiol. 148, 390–401.
  45. Yang S & Lok JJ (2016). A goodness-of-fit test for structural nested mean models. Biometrika 103, 734–41.
  46. Yang S & Lok JJ (2017). Sensitivity analysis for unmeasured confounding in coarse structural nested mean models. Statist. Sinica 28, 1703–23.
  47. Yang S, Tsiatis AA & Blazing M (2018). Modeling survival distribution as a function of time to treatment discontinuation: A dynamic treatment regime approach. Biometrics 74, 900–9.
  48. Young JG, Hernán MA, Picciotto S & Robins JM (2010). Relation between three classes of structural models for the effect of a time-varying exposure on survival. Lifetime Data Anal. 16, 71–84.
  49. Zhang M, Joffe MM & Small DS (2011). Causal inference for continuous-time processes when covariates are observed only at discrete times. Ann. Statist. 39, 131–73.
