Author manuscript; available in PMC: 2020 Dec 14.
Published in final edited form as: Biometrics. 2018 Aug 22;75(1):100–109. doi: 10.1111/biom.12943

On Doubly Robust Estimation of the Hazard Difference

Oliver Dukes 1,*, Torben Martinussen 2, Eric J Tchetgen Tchetgen 3, Stijn Vansteelandt 1
PMCID: PMC7735191  NIHMSID: NIHMS1001906  PMID: 30133696

Summary.

The estimation of conditional treatment effects in an observational study with a survival outcome typically involves fitting a hazards regression model adjusted for a high-dimensional covariate. Standard estimation of the treatment effect is then not entirely satisfactory, as the misspecification of the effect of this covariate may induce a large bias. Such misspecification is a particular concern when inferring the hazard difference, because it is difficult to postulate additive hazards models that guarantee non-negative hazards over the entire observed covariate range. We therefore consider a novel class of semiparametric additive hazards models which leave the effects of covariates unspecified. The efficient score under this model is derived. We then propose two different estimation approaches for the hazard difference (and hence also the relative chance of survival), both of which yield estimators that are doubly robust. The approaches are illustrated using simulation studies and data on right heart catheterization and mortality from the SUPPORT study.

Keywords: Additive hazards model, Causal inference, Doubly robust estimation, Lifetime and survival analysis, Semiparametric inference

1. Introduction

In the analysis of time-to-event data, one is often interested in the effect of an exposure A on a survival outcome T, subject to a censoring time C, and conditional on a set of variables L. This adjusted association may be summarized by the hazard difference, which can be estimated by fitting a multivariable additive hazards model, or the hazard ratio, commonly estimated via the Cox proportional hazards model. For example, in the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT), Connors et al. (1996) investigated the effect of right heart catheterization (RHC), a binary exposure, on patient mortality. The exposure could also be continuous, for example, particulate air pollution. In observational studies, the dimension of covariates to adjust for is often high. The SUPPORT investigators used expert input from clinicians to identify 72 variables that could affect the decision of whether to use RHC or not, which they wished to adjust for in the analysis. The problem of bias resulting from misspecification of the hazards regression model then becomes a dominant consideration.

Such concerns have prompted the development of doubly robust estimators of treatment effects (Robins and Rotnitzky, 2001). These estimators require two working models, one of which is a regression model for the outcome, and another that relates to the treatment selection mechanism. Only one of these models needs to be correct in order to consistently estimate the treatment effect. Doubly robust estimators are now well established for parameters in linear, log-linear, and logistic conditional mean models (Robins, 1994; Tchetgen Tchetgen et al., 2010), and are particularly appealing when evaluating static treatment regimes or estimating optimal dynamic regimes in longitudinal studies. This is because it is challenging to specify a series of sequential regression models for the same outcome that are all simultaneously correct. More recently, the usefulness of doubly robust procedures has also been recognized in the context of data-adaptive selection or regularization. In particular, a number of common doubly robust estimators have turned out to be less susceptible to regularization bias than popular alternative estimators that do not possess the doubly robust property (Farrell, 2015). Standard confidence intervals for these doubly robust estimators are moreover uniformly valid, even when they ignore the use of such data-adaptive procedures (assuming the estimators for both working models converge sufficiently fast to the truth).

There has however been limited development of doubly robust estimators of the parameters indexing the hazard regression models (additive or multiplicative) popular in survival analysis. For semiparametric proportional hazards models we conjecture that, partly due to the non-collapsibility of the hazard ratio, no estimators of the treatment effect hazard ratio exist that are consistent whenever the treatment selection mechanism (more precisely, the distribution of A given L) is known (Tchetgen Tchetgen et al., 2010). Double robustness with respect to the treatment selection mechanism is therefore not attainable under such models. In contrast, as we will show in this article, doubly robust estimation strategies do exist for the hazard difference under semiparametric additive hazards models. The additional robustness is particularly advantageous for additive models; these are prone to misspecification since they do not impose the constraint that hazards are non-negative.

In Section 2, we introduce the new class of semiparametric additive hazards models. A theory of estimation for these models is developed in Section 3, and the efficient score function is identified. Because it requires specification of the entire conditional distribution of the treatment given the covariates, we also describe a subclass of estimators which only requires this conditional distribution to be correctly specified up to the mean. Drawing from these results, two practical strategies for the estimation of the treatment effect are proposed in Section 4, both of which yield doubly robust estimators. In Section 5, our estimators are compared in simulations with standard estimators of additive hazards models, and we reanalyze data from the SUPPORT study in Section 6.

2. The Model

We begin with some notation. The counting process corresponding to the survival time T is denoted by $N(t) = I(T \le t)$; $\mathcal{F}_t$ is the history spanned by $N(t)$, with $R(t) = I(T \ge t)$. Let L include 1 for the intercept. For the moment, we assume there is no censoring.

Consider an additive hazards model of the form

$$E\{dN(t)\mid \mathcal{F}_t, L, A\} = \{d\Gamma(t)^T L + \psi A\,dt\}R(t),$$

where $d\Gamma(t)^T$ is a vector of coefficients that are allowed to depend on time (McKeague and Sasieni, 1994). This model imposes restrictions on the effect of L on the hazard at any time point t; such restrictions are undesirable, because misspecification of an additive hazards model may be inevitable when L is high-dimensional and has continuous components. Incorrectly specifying the effect of L can then induce bias in estimation of ψ. We therefore further relax the model restrictions by developing inference for the semiparametric additive hazards model M, defined by

$$E\{dN(t)\mid \mathcal{F}_t, L, A\} = \{d\Omega(t,L) + \psi A\,dt\}R(t), \tag{1}$$

where dΩ (t, L) denotes the effect of time and the covariates on the hazard and is left unspecified. Restrictions are now only imposed on the association between A and the hazard. A further relaxation of model M is

$$E\{dN(t)\mid \mathcal{F}_t, L, A\} = \{d\Omega(t,L) + d\Psi(t)A\}R(t), \tag{2}$$

where $d\Psi(t)$ is an unknown locally integrable function of time. In Web Appendix A, we extend our results on estimation to model (2) (denoted by $M_{TV}$), but otherwise assume that the effect of A is constant. To simplify the exposition, we will also assume there is no effect modification by Z, where Z = z(L) is a vector function of L, and give details on extensions for multivariate ψ in the discussion.

By the equality

$$\frac{\mathrm{pr}(T>t\mid A,L)}{\mathrm{pr}(T>t\mid A=0,L)} = \exp(-\psi A t),$$

which is implied by (1), it follows that exp(− ψAt) can be interpreted as the adjusted relative change in the probability of surviving time t per unit increase in the exposure. This relative chance of survival is potentially easier to communicate than the hazard difference (or ratio). The reason is that contrasts between hazards lack a causal interpretation because they compare, at each time t, individuals who have not yet failed at that time. These individuals may not be exchangeable between treatment arms, even when the treatment is randomly assigned (Hernán, 2010).
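As a concrete reading of this equality, the following minimal Python sketch evaluates survival probabilities under model (1) for a hypothetical baseline cumulative hazard and checks that the survival ratio does not depend on that baseline. The function names and the quadratic form chosen for Ω(t, L) are illustrative assumptions only; they do not come from the analysis in this article.

```python
import math

def survival(t, a, psi, cum_baseline):
    """pr(T > t | A = a, L) under model (1): exp{-Omega(t, L) - psi * a * t}."""
    return math.exp(-cum_baseline(t) - psi * a * t)

def relative_survival(t, a, psi):
    """Relative chance of surviving time t for exposure level a versus 0."""
    return math.exp(-psi * a * t)

# Hypothetical cumulative baseline hazard Omega(t, L) at one covariate value.
def omega(t):
    return 0.3 * t + 0.05 * t ** 2

# The baseline cancels in the ratio, leaving exp(-psi * a * t).
ratio = survival(2.0, 1.0, 0.1, omega) / survival(2.0, 0.0, 0.1, omega)
```

Any other non-decreasing choice of `omega` would give the same ratio, which is why the contrast can be reported without specifying the covariate effects.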

3. Theory of Semiparametric Estimation

3.1. The Efficient Score for ψ

In this section, we develop a theory of estimation for ψ. We first give the semiparametric efficient score for ψ under model M and discuss the properties of an efficient estimator. The derivation of all results is left to Web Appendix A.

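Let $\Omega(t,L)=\int_0^t d\Omega(s,L)$ and $\lambda(t\mid A,L;\psi) = d\Omega(t,L)/dt + \psi A$, and let $dM(t;\psi) = dN(t) - \lambda(t\mid A,L;\psi)R(t)\,dt$ be the increment at time t of a local square-integrable martingale. Then the locally efficient score for ψ under model M is

$$S_{\mathrm{eff}} = \int_0^\infty \left[A - \frac{E\{\lambda^{-1}(t\mid A,L;\psi)\,A\exp(-\psi A t)\mid L\}}{E\{\lambda^{-1}(t\mid A,L;\psi)\exp(-\psi A t)\mid L\}}\right]\frac{dM(t;\psi)}{\lambda(t\mid A,L;\psi)}. \tag{3}$$

The solution $\hat{\psi}$ for ψ to the equation

$$0 = \sum_{i=1}^n \int_0^\infty \left[A_i - \frac{E\{\lambda^{-1}(t\mid A_i,L_i;\psi)\,A_i\exp(-\psi A_i t)\mid L_i\}}{E\{\lambda^{-1}(t\mid A_i,L_i;\psi)\exp(-\psi A_i t)\mid L_i\}}\right]\frac{dM_i(t;\psi)}{\lambda(t\mid A_i,L_i;\psi)}$$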

thus has an asymptotic variance which attains the semiparametric efficiency bound for ψ under model M, when f (A|L) is known and Ω(t, L) is correctly specified (Bickel et al., 1993).

In practice, the law f(A|L) will usually be unknown, and thus so will $E\{\lambda^{-1}(t\mid A,L;\psi)A\exp(-\psi A t)\mid L\}$ and $E\{\lambda^{-1}(t\mid A,L;\psi)\exp(-\psi A t)\mid L\}$. One may then postulate a parametric model $A_D$ for the population distribution f(A|L) = f(A|L; α), where f(A|L; α) is a known function smooth in an unknown finite-dimensional parameter α. In practice, α can be estimated using maximum likelihood. Since f(A|L) is ancillary to ψ, the efficiency bound for ψ is the same whether f(A|L) is estimated or known. The score (3) is thus efficient under the intersection model $M \cap A_D$.

Implementation of an efficient estimator also requires knowledge of dΩ(t, L). In the semiparametric model M, this function is left unspecified and is unknown to the data analyst. One option then is to estimate it via a working model B, such as dΩ(t, L) = dΩ{L; Γ(t)}, where dΩ{L; Γ(t)} is a known function, smooth at each time point t in an unknown finite-dimensional parameter Γ(t). With a slight abuse of notation, let $\bar{L}$ denote the set of covariates that are included in the model B; for example, if $L = (1, L_1)^T$, then a potential choice could be $\bar{L} = (1, L_1, L_1^2)^T$. We will postulate a linear model $d\Omega\{L;\Gamma(t)\} = d\Gamma(t)^T\bar{L}$. The parameters Γ(t) can then be consistently estimated using Aalen least-squares (Aalen, 1980), upon changing the increments $dN(t)$ to $dN(t) - d\hat{\Psi}_{\mathrm{init}}(t)A$, with $d\hat{\Psi}_{\mathrm{init}}(t)$ being a consistent estimator of the time-varying effect $d\Psi_{\mathrm{init}}(t)$ under the initial model $E\{dN(t)\mid \mathcal{F}_t, L, A\} = [d\Omega\{L;\Gamma(t)\} + d\Psi_{\mathrm{init}}(t)A]R(t)$.

We therefore arrive at the estimating function

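$$\int_0^\infty \left(A - \frac{E[\lambda^{-1}\{\Gamma(t),\psi\}A\exp(-\psi A t)\mid L;\alpha]}{E[\lambda^{-1}\{\Gamma(t),\psi\}\exp(-\psi A t)\mid L;\alpha]}\right) \times \frac{[dN(t) - d\Omega\{L;\Gamma(t)\} - \psi A\,dt]R(t)}{\lambda\{\Gamma(t),\psi\}}, \tag{4}$$

where λ{Γ(t), ψ} = dΩ{L; Γ(t)}/dt + ψA. Then the population expectation of (4) converges to

$$\int_0^\infty E\left[E\left\{\left(A - \frac{E[\lambda^{-1}\{\Gamma(t),\psi\}A\exp(-\psi A t)\mid L;\alpha]}{E[\lambda^{-1}\{\Gamma(t),\psi\}\exp(-\psi A t)\mid L;\alpha]}\right) \times \lambda^{-1}\{\Gamma(t),\psi\}\exp(-\psi A t)\,\Big|\,L\right\} \times \exp\{-\Omega(t,L)\}\,[d\Omega(t,L) - d\Omega\{L;\Gamma(t)\}]\right].$$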

It follows that the function in (4) has mean zero when, in addition to M, either model $A_D$ or B is correct. The efficient estimator under model $M \cap A_D$ is therefore doubly robust (Robins and Rotnitzky, 2001). The semiparametric efficiency bound under model $M \cap A_D$ is only met when model B is correctly specified and is thus attained locally.

We have argued that the score (3) is efficient under $M \cap A_D$. However, the same score in (3) also delivers an efficient doubly robust estimator; specifically, it is efficient under the union model $M \cap (A_D \cup B)$ at the intersection submodel $M \cap A_D \cap B$. This follows from a general result in Robins and Rotnitzky (2001); however, the efficiency bound under $M \cap B$ may be lower than the bound under the union model.

3.2. Efficiency in a Subclass of Estimators

A drawback of the efficient score derived in the previous section is that it requires postulation of a model for the entire conditional distribution f(A|L). In Web Appendix A, we therefore derive the subclass of influence functions which have mean zero when the conditional mean E(A|L) is known. When A is binary, this conditional mean is known as the propensity score (Rosenbaum and Rubin, 1983). We are then led to the class of estimating functions

$$\int_0^\infty d(t,L)\{A - E(A\mid L)\}R(t)\exp(\psi A t) \times \{dN(t) - d\Omega^*(t,L) - \psi A\,dt\}, \tag{5}$$

where d(t, l) and dΩ*(t, L) are arbitrary functions of t and l with finite variance. Note indeed that (5) no longer depends on f (A|L) but only on the conditional mean E(A|L). The term R(t) exp(ψAt) can be interpreted as the removal of the treatment effect (in expectation) from the at-risk indicators (Martinussen et al., 2011). That (5) has mean zero under model M when E(A|L) is known can be seen as follows:

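$$E[d(t,L)\{A - E(A\mid L)\}R(t)\exp(\psi A t)\{dN(t) - d\Omega^*(t,L) - \psi A\,dt\}] = E[d(t,L)\exp\{-\Omega(t,L)\}\{d\Omega(t,L) - d\Omega^*(t,L)\}E\{A - E(A\mid L)\mid L\}] = 0$$

for all d(t, L). It is shown in Web Appendix A that the optimal choice of d(t, L) for efficiency is

$$d_{\mathrm{eff}}(t,L) = \frac{\mathrm{var}(A\mid L)}{E[\{A - E(A\mid L)\}^2\exp(\psi A t)\lambda(t\mid A,L)\mid L]}; \tag{6}$$

for dΩ*(t, L), it is equal to dΩ(t, L). The efficient estimator within this subclass is obtained by solving the equations

$$0 = \sum_{i=1}^n \int_0^\infty \frac{\mathrm{var}(A_i\mid L_i)}{E[\{A_i - E(A_i\mid L_i)\}^2\exp(\psi A_i t)\lambda(t\mid A_i,L_i)\mid L_i]} \times \{A_i - E(A_i\mid L_i)\}R_i(t)\exp(\psi A_i t) \times \{dN_i(t) - d\Omega(t,L_i) - \psi A_i\,dt\}. \tag{7}$$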

The conditional expectation E(A|L) is typically unknown. It can be estimated under a parametric model $A_E$ for the conditional mean E(A|L) = E(A|L; β), where E(A|L; β) is a known function, smooth in an unknown finite-dimensional parameter β. Under model $A_E$, β can be estimated using maximum likelihood.

Estimation of model $A_E$ does not affect the efficiency bound for the class of estimators identified by (5). Furthermore, misspecification of $A_E$ does not induce bias when dΩ*(t, L) = dΩ(t, L) is consistently estimated, since the estimating function in (7) is unbiased under the union model $M \cap (A_E \cup B)$ and therefore doubly robust.
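The mean-zero argument for (5) rests on one intermediate identity that may be worth spelling out. Assuming Ω(t, L) is continuous in t, model M gives pr(T ≥ t | A, L) = exp{−Ω(t, L) − ψAt}, so that

```latex
\begin{align*}
E\{R(t)\exp(\psi A t)\mid A, L\}
  &= \exp(\psi A t)\,\mathrm{pr}(T \ge t \mid A, L) \\
  &= \exp(\psi A t)\exp\{-\Omega(t,L) - \psi A t\} \\
  &= \exp\{-\Omega(t,L)\}.
\end{align*}
```

The reweighted at-risk indicator therefore no longer depends on A given L, and the remaining factor A − E(A|L) averages to zero conditional on L.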

4. Implementation

4.1. Estimation via f (A|L)

In this section, we will build on the efficiency theory of the previous section and outline two potential estimation strategies for ψ. All of the estimators are consistent and asymptotically normal and accompanying variance estimators, unless stated otherwise, are given in Web Appendix B.

From the perspective of maximizing efficiency, a reasonable approach to take is to construct an estimator based on the efficient score (3). We note first that the score requires inverse weighting by the hazard function; this is also the case for efficient estimators of the parameters indexing other additive hazards models (McKeague and Sasieni, 1994). In practice, this can lead to estimators with unstable performance in small samples. If the hazard weights are removed from (3), as is common in standard fitting strategies for additive hazards models, we are left with the estimating function

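$$\int_0^\infty \left[A - \frac{E\{A\exp(-\psi A t)\mid L\}}{E\{\exp(-\psi A t)\mid L\}}\right]R(t)\{dN(t) - d\Omega(t,L) - \psi A\,dt\}. \tag{8}$$

The ratio

$$\frac{E\{A\exp(-\psi A t)\mid L\}}{E\{\exp(-\psi A t)\mid L\}} \tag{9}$$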

is the first order derivative of the cumulant generating function for f (A|L), evaluated at −ψt. Under some model AD, one may evaluate this ratio directly or using Monte Carlo integration. Estimators of the asymptotic variance can be derived following standard M-estimation arguments.
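To illustrate how the ratio (9) might be evaluated directly, the sketch below gives the closed form for a normal exposure (whose cumulant generating function is μs + σ²s²/2), a direct summation for a discrete exposure, and a simple midpoint-rule fallback for other continuous densities. All function names are hypothetical, and the midpoint rule is used here only as a stand-in for the Monte Carlo integration mentioned above.

```python
import math

def tilted_mean_normal(mu, sigma2, psi, t):
    """Ratio (9) when A | L ~ N(mu, sigma2): K'(s) = mu + sigma2 * s at s = -psi * t."""
    return mu - sigma2 * psi * t

def tilted_mean_discrete(values, probs, psi, t):
    """Ratio (9) for a discrete exposure, by direct summation over its support."""
    weights = [p * math.exp(-psi * a * t) for a, p in zip(values, probs)]
    return sum(a * w for a, w in zip(values, weights)) / sum(weights)

def tilted_mean_numeric(density, lo, hi, psi, t, steps=20000):
    """Ratio (9) for a continuous exposure, by a midpoint rule on [lo, hi].

    The density may be unnormalized, because the constant cancels in the ratio.
    """
    h = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        a = lo + (k + 0.5) * h
        w = density(a) * math.exp(-psi * a * t)
        num += a * w
        den += w
    return num / den

# Hypothetical check: normal exposure with mean 1 and variance 0.09.
def normal_kernel(a):
    return math.exp(-(a - 1.0) ** 2 / (2 * 0.09))

closed_form = tilted_mean_normal(1.0, 0.09, psi=0.1, t=2.0)
numeric = tilted_mean_numeric(normal_kernel, -2.0, 4.0, psi=0.1, t=2.0)
binary = tilted_mean_discrete([0.0, 1.0], [0.6, 0.4], psi=0.1, t=2.0)
```

The numerical route has to be repeated at every value of ψ visited by the equation solver, which is the computational inconvenience that motivates the alternative strategies in the rest of this section.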

However, specifying a correct model for a distribution in this fashion is unappealing, as it is difficult to formulate plausible models and any resulting misspecification may then have a potentially large impact on subsequent inference. Also, Monte Carlo integration may be computationally inconvenient as the integration needs to be done at all parameter values through which one iterates when numerically solving the equation. In light of these limitations, we will pursue alternative strategies in the remainder of this section.

We return to the estimating function (8). By Bayes' rule, it follows that under model M,

$$f(A\mid T\ge t, L=l) = \frac{\exp(-\psi A t)f(A\mid L=l)}{E\{\exp(-\psi A t)\mid L=l\}}$$

for all l. Therefore,

$$E(A\mid T\ge t, L) = \frac{E\{A\exp(-\psi A t)\mid L\}}{E\{\exp(-\psi A t)\mid L\}}, \tag{10}$$

which suggests that the unbiased estimating function (8) can also be written as

$$\int_0^\infty \{A - E(A\mid T\ge t, L)\}R(t)\{dN(t) - d\Omega(t,L) - \psi A\,dt\}. \tag{11}$$

Rather than modeling (8) indirectly via a model for the distribution of A given L, we may choose to instead specify a model $A_{TV}$ for the time-varying propensity score E(A|T ≥ t, L). A question then is how to specify a model at each time t that is congenial with model M. Specifically, a parameterization of $A_{TV}$ is congenial with model M if for each element in $A_{TV}$ and M, there exists a distribution f(A|L) such that the equality (10) holds. If no such distribution exists, then we know before even seeing the data that the proposed models for M and $A_{TV}$ cannot both be correct. In Web Appendix A, we show that the following generalized linear model

$$E(A\mid T\ge t, L) = E\{A\mid T\ge t, L;\theta(t)\} = g^{-1}\{\theta(t)^T\tilde{L}\}$$

is always congenial with M when the dispersion parameter for f(A|L) does not depend on L. Here, $g(\cdot)$ is a canonical link function, and $\tilde{L}$ is the vector of covariates that are included in the model $A_{TV}$. A similar estimating function to (11) appears in Kang et al. (2018); however, they use a different parameterization of $A_{TV}$ from the one we give above, which requires estimation of P(T ≥ t|A, L), P(T ≥ t|L), and E(A|L). Parametric models for P(T ≥ t|A, L) and P(T ≥ t|L) may not be congenial with model $M \cap B$, which undermines the feasibility of doubly robust inference. Their proposal therefore relies on kernel density estimators, which are not suitable when L is high-dimensional.
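For a binary exposure, the congeniality claim can be checked by hand from (10): tilting a Bernoulli distribution by exp(−ψAt) shifts its log-odds by −ψt, so a logistic model for pr(A = 1|L) remains logistic among those still at risk, with only the intercept changing over time. The short sketch below verifies this numerically; the function names and parameter values are illustrative assumptions.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def tv_propensity_binary(p_baseline, psi, t):
    """E(A | T >= t, L) from (10) when A is binary with pr(A = 1 | L) = p_baseline."""
    num = p_baseline * math.exp(-psi * t)
    return num / (num + 1.0 - p_baseline)

# Hypothetical values: baseline propensity 0.3, hazard difference 0.1, time 5.
p_t = tv_propensity_binary(0.3, psi=0.1, t=5.0)
shift = logit(p_t) - logit(0.3)  # equals -psi * t = -0.5 up to rounding
```

With a positive ψ the exposed fail faster, so the share of exposed individuals in the risk set declines over time, which is what the negative shift reflects.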

An advantage of our parameterization of ATV is that it admits a closed-form estimator of ψ, which is defined as

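$$\hat{\psi}_{\mathrm{TVPSDR}} = \frac{\sum_{i=1}^n\int_0^\infty \Lambda_i\{\hat{\theta}(t)\}R_i(t)J(t)[dN_i(t) - d\Omega\{L_i;\hat{\Gamma}(t)\}]}{\sum_{i=1}^n\int_0^\infty \Lambda_i\{\hat{\theta}(t)\}R_i(t)A_i\,dt}. \tag{12}$$

Here, $\Lambda\{\theta(t)\} = A - E\{A\mid T\ge t, L;\theta(t)\}$ and J(t) = 1 if both Y(t) and $\tilde{L}(t)$ have full rank and zero otherwise, where Y(t) denotes a matrix with ith row $R_i(t)(\bar{L}_i^T, A_i)$ and similarly for $\tilde{L}(t)$. It follows from the theory of M-estimation that this estimator is consistent and asymptotically normal under model $M \cap (A_{TV} \cup B)$.

When $\bar{L} = \tilde{L}$ (we hereby denote the common set of covariates by $\dot{L}$), the previous expressions can be further simplified. Given the use of the canonical link function, θ(t) can be estimated at time t as the solution to the estimating equations

$$0 = \sum_{i=1}^n \dot{L}_i^T R_i(t)[A_i - E\{A_i\mid T_i\ge t, L_i;\hat{\theta}(t)\}].$$

By estimating θ(t) in this way, we ensure that the estimating equations for ψ reduce to

$$0 = \sum_{i=1}^n\int_0^\infty \Lambda_i\{\theta(t)\}R_i(t)\{dN_i(t) - \psi A_i\,dt\} \tag{13}$$

and ψ can be estimated in closed form as

$$\hat{\psi}_{\mathrm{TVPSDR}} = \frac{\sum_{i=1}^n\int_0^\infty \Lambda_i\{\hat{\theta}(t)\}\,dN_i(t)J(t)}{\sum_{i=1}^n\int_0^\infty \Lambda_i\{\hat{\theta}(t)\}R_i(t)A_i\,dt}. \tag{14}$$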

Surprisingly, estimation of Γ(t) is no longer required, yet the doubly robust property is retained. To see why, note that when model ATV is misspecified, the expectation of (13) will converge to

$$\int_0^\infty E\left(\left[E(A\mid T\ge t, L) - E\{A\mid T\ge t, L;\theta^*(t)\}\right]R(t)\,d\Omega(t,L)\right),$$

where θ*(t) is the limiting value of $\hat{\theta}(t)$. Because of how θ(t) is estimated, the above display will equal zero when dΩ(t, L) = dΩ{L; Γ(t)}, thus demonstrating double robustness. This strategy is related to bias-reduced doubly robust estimation, as proposed by Vermeulen and Vansteelandt (2015); further discussion is given in Web Appendix A.
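The closed form (14) can be sketched in a deliberately simplified setting. The code below assumes that the common covariate vector contains only an intercept, so that the fitted time-varying propensity score at time t is just the mean exposure among those still at risk (this holds for both the identity and the logit link). It also assumes no tied follow-up times and omits J(t), since a risk set without variation in A contributes zero to both sums anyway. The function name and the simulated data are hypothetical; this is not the R code from the Supplementary Materials.

```python
import random

def closed_form_hazard_difference(times, events, exposure):
    """Evaluate (14) when the working covariate vector holds only an intercept.

    theta_hat(t) is then the risk-set mean of A, so that
    Lambda_i = A_i - (mean of A among subjects with follow-up >= t).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_risk = len(times)
    sum_a = sum(exposure)
    sum_a2 = sum(a * a for a in exposure)
    numerator = 0.0
    denominator = 0.0
    previous = 0.0
    for i in order:
        mean_a = sum_a / n_risk
        # The risk set is constant on (previous, times[i]], where
        # sum_j Lambda_j R_j A_j = sum(A^2) - mean(A) * sum(A).
        denominator += (sum_a2 - mean_a * sum_a) * (times[i] - previous)
        if events[i]:
            numerator += exposure[i] - mean_a  # jump of N_i at its event time
        # Subject i leaves the risk set after times[i].
        n_risk -= 1
        sum_a -= exposure[i]
        sum_a2 -= exposure[i] ** 2
        previous = times[i]
    return numerator / denominator

# Hypothetical data: binary exposure, hazard 1 + 0.5 * A, uniform censoring.
rng = random.Random(2018)
exposure = [float(rng.random() < 0.5) for _ in range(5000)]
event_times = [rng.expovariate(1.0 + 0.5 * a) for a in exposure]
censor_times = [rng.uniform(0.0, 3.0) for _ in exposure]
times = [min(t, c) for t, c in zip(event_times, censor_times)]
events = [t <= c for t, c in zip(event_times, censor_times)]
psi_hat = closed_form_hazard_difference(times, events, exposure)
```

With covariates in the model, the risk-set mean would be replaced by fitted values from a regression of A on the covariates refitted among those at risk at each event time, as described for the SUPPORT analysis.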

Vansteelandt et al. (2014) showed that Aalen least-squares estimators are robust to misspecification of the additive hazards model when A is normal with a mean that is linear in $\bar{L} = \tilde{L}$ and constant variance. Indeed, in this scenario, it follows from the Appendix of Vansteelandt et al. (2014) that the Aalen least-squares estimator is equivalent to the estimator given in (14). Using a Taylor expansion of the ratio (9) in ψt,

$$\frac{E\{A\exp(-\psi A t)\mid L\}}{E\{\exp(-\psi A t)\mid L\}} = E(A\mid L) - \mathrm{var}(A\mid L)\psi t + \frac{E(A^3\mid L) - 3E(A\mid L)E(A^2\mid L) + 2E(A\mid L)^3}{2!}\psi^2t^2 - \cdots,$$

it follows that this robustness holds more generally so long as the mean and central moments of f (A|L) are linear in L. This assumption would not generally hold if A is binary; however, our estimator given in (12) generalizes the robustness properties of Aalen least-squares to arbitrary exposure distributions. Furthermore, if the true treatment effect ψ*(t, L) is a function of t and L, such that model M no longer holds, then the estimator defined in (12) continues to have a useful interpretation. Assuming model ATV is correct, then the estimator converges to

$$\frac{\int_0^\infty E\{\mathrm{var}(A\mid T\ge t, L)R(t)\psi^*(t,L)\}\,dt}{\int_0^\infty E\{\mathrm{var}(A\mid T\ge t, L)R(t)\}\,dt},$$

which is a weighted average of the treatment effects at different times and covariate values. In contrast, the Aalen least-squares estimator of ψ in the corresponding additive hazards model is not generally a convex combination of the time/covariate-specific treatment effects, even when L is correctly modeled. It is in particular not guaranteed to lie within the range of time/covariate-specific treatment effects.

4.2. Estimation via E(A|L)

In Section 3.2, we identified a subclass of estimators that are consistent under a correctly specified model of the conditional mean of the exposure. We now exploit these results in order to develop inference for ψ.

Consider the estimating equations for ψ suggested by the function (5):

$$0 = \sum_{i=1}^n\int_0^\infty d(t,L_i)\{A_i - E(A_i\mid L_i)\}R_i(t)\exp(\psi A_i t) \times \{dN_i(t) - d\Omega(t,L_i) - \psi A_i\,dt\}. \tag{15}$$

In evaluating the integral in (15), note that the function d(t, L) impacts only the variance, rather than the unbiasedness of the estimating equations. The efficient choice (6) depends on the conditional distribution of the treatment A, and in certain cases may lead to the integral becoming analytically intractable. We therefore set d(t, L) to 1, leaving the search for efficient yet computationally feasible choices to future work.

Letting Δ(β) denote {A − E(A|L; β)}, the estimating equation becomes

$$0 = \sum_{i=1}^n\int_0^\infty \Delta_i(\beta)R_i(t)\exp(\psi A_i t)\{dN_i(t) - d\Omega(t,L_i) - \psi A_i\,dt\}$$

and ψ can therefore be estimated as a solution to the equations

$$0 = \sum_{i=1}^n \Delta_i(\hat{\beta})\left\{1 - \int_0^{T_i} d\Omega(s,L_i)\exp(\psi A_i s)\right\}. \tag{16}$$

It is vital for identification that the arbitrary function dΩ(t, L) is non-zero over a set of times t with positive Lebesgue measure. Otherwise, as we integrate to ∞, all information about the parameter ψ is lost. Therefore, dΩ(t, L) can be seen as a weighting term that prevents the integral in (15) from equaling zero at all ψ.

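Setting dΩ(t, Li) = 1 for all t > 0, the above equations reduce to $0 = \sum_{i=1}^n U_i(\psi,\hat{\beta})$, where

$$U_i(\psi,\hat{\beta}) = \begin{cases}\Delta_i(\hat{\beta})\{\exp(\psi A_iT_i) - 1\}/(\psi A_i) & \text{if } \psi A_i \ne 0,\\ \Delta_i(\hat{\beta})T_i & \text{if } \psi A_i = 0.\end{cases}$$

It follows from the theory of M-estimation that under standard regularity conditions, the solution $\hat{\psi}_{\mathrm{BPS}}$ to equation (15) delivers an estimator which is consistent and asymptotically normal under model $M \cap A_E$.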
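A minimal sketch of how the singly robust estimating equation might be solved numerically is given below. It codes the two branches of U_i, then finds the root of the summed equations by bisection. The bracketing interval, the function names, and the simulated data (no covariates, no censoring, propensity score estimated by the sample mean of A) are all assumptions made for illustration; the bisection presumes that the summed equation changes sign over the chosen interval.

```python
import math
import random

def u_i(psi, delta, a, t):
    """Contribution U_i(psi, beta_hat); the psi * a = 0 branch is its limit."""
    x = psi * a
    if x == 0.0:
        return delta * t
    return delta * math.expm1(x * t) / x  # expm1(y) = exp(y) - 1

def solve_psi(times, exposure, fitted_mean, lo=-2.0, hi=2.0, iters=60):
    """Bisection root of sum_i U_i(psi, beta_hat) = 0, assuming a sign change on [lo, hi]."""
    deltas = [a - m for a, m in zip(exposure, fitted_mean)]

    def total(psi):
        return sum(u_i(psi, d, a, t) for d, a, t in zip(deltas, exposure, times))

    f_lo = total(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = total(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)

# Hypothetical data: binary exposure, hazard 1 + 0.5 * A, no censoring.
rng = random.Random(75)
exposure = [float(rng.random() < 0.5) for _ in range(5000)]
times = [rng.expovariate(1.0 + 0.5 * a) for a in exposure]
mean_a = sum(exposure) / len(exposure)
psi_hat = solve_psi(times, exposure, [mean_a] * len(exposure))
```

Because the exponential term grows quickly in ψ and in the follow-up time, a modest bracketing interval helps keep the evaluation numerically stable.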

Doubly robust extensions to the previous proposal can also be made. Returning to equation (15), rather than setting dΩ(t, Li) = 1 for all t > 0, we now postulate a model B for dΩ(t, L), such as dΩ(t, L) = dΩ{L; Γ(t)}. After setting d(t, Li) = 1 again for all t > 0, it follows that ψ can be estimated as the solution to

$$0 = \sum_{i=1}^n \Delta_i(\hat{\beta})\left\{\left(\int_0^{T_i} J(s)\exp(\psi A_i s)[dN_i(s) - d\Omega\{L_i;\hat{\Gamma}(s)\}]\right) - \exp(\psi A_iT_i)\right\}. \tag{17}$$

The resulting solution $\hat{\psi}_{\mathrm{BPSDR}}$ is consistent and asymptotically normal under the model $M \cap (A_E \cup B)$.

It is straightforward to show that the efficient subclass score (7) is invariant to centering A by its conditional mean; if we are willing to work with a non-efficient estimator, it is also desirable that it has this property. We therefore recommend that A be substituted by Δ(β) in (16) and (17), such that the estimating equations implied by (16) reduce to $U_i(\psi,\hat{\beta}) = [\exp\{\psi\Delta_i(\hat{\beta})T_i\} - 1]/\psi$ if $\psi\Delta_i(\hat{\beta}) \ne 0$, and $\Delta_i(\hat{\beta})T_i$ otherwise. Centering will prevent the exponential terms in the estimating equations from becoming large at later time points, which could lead to improved finite-sample performance.

Solving the equation $0 = \sum_{i=1}^n U_i(\psi,\hat{\beta})$ for the singly robust estimator could also be a computationally fast first step towards an estimator that is nearly efficient (in the general class), if one is willing to specify the distribution f(A|L). This is because an initial estimate $\hat{\psi}_{\mathrm{init}}$ could be plugged into the ratio term (9) in the estimating function (8), making it linear in ψ. Under model M, the resulting two-step estimator is consistent and asymptotically normal if either or both of the models $A_D$ or B hold.

4.3. Censoring

When the survival time is censored by C, all of the approaches described above are valid under the assumption that censoring is independent of T and A, conditional on L, in the sense that C ⫫ (T, A)|L. All approaches are thus consistent when censoring depends only on L. The doubly robust estimators are moreover consistent when censoring depends additionally on A, and the additive hazards model is correctly specified. We can relax these assumptions by using inverse probability of censoring weighting (Scharfstein and Robins, 2002), under a model for the censoring mechanism:

$$E\{dN_C(t)\mid T\ge t, C\ge t, L, A, \bar{V}_t\} = a(t,L,A,\bar{V}_t;\pi).$$

Here, $N_C(t)$ is the counting process for the censoring time; $\bar{V}_t = \{V_s : s < t\}$, where $V_s$ is a collection of covariates measured at time s; and $a(t,L,A,\bar{V}_t;\pi)$ is a known function, smooth in an unknown parameter π. An additive or multiplicative hazards model could be postulated here. An individual’s contribution to the estimating function (12) at time t is then weighted by

$$\frac{1}{\exp\{-\int_0^t a(s,L,A,\bar{V}_s;\pi)\,ds\}},$$

such that ψ can be estimated as the solution to

$$0 = \sum_{i=1}^n\int_0^\infty \left[\frac{1}{\exp\{-\int_0^t a(s,L_i,A_i,\bar{V}_{si};\hat{\pi})\,ds\}}\right]\Lambda_i\{\hat{\theta}(t)\}R_i(t)J(t) \times [dN_i(t) - d\Omega\{L_i;\hat{\Gamma}(t)\} - \psi A_i\,dt]. \tag{18}$$

Weights can also be added to the estimating equations in (15), such that ψ can be estimated as the solution to

$$0 = \sum_{i=1}^n \Delta_i(\hat{\beta})\left(1 - \int_0^{T_i}\left[\frac{1}{\exp\{-\int_0^s a(u,L_i,A_i,\bar{V}_{ui};\pi)\,du\}}\right] \times d\Omega(s,L_i)\exp(\psi A_i s)\,ds\right),$$

and likewise for the doubly robust estimator given by (17).
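The weight is the reciprocal of the probability of remaining uncensored up to time t under the postulated censoring model. As a rough illustration, the sketch below computes it for a censoring hazard supplied as a function of time, approximating the cumulative hazard with a midpoint rule. The function name and the example hazard, which corresponds to a uniform censoring time and ignores covariates, are assumptions for illustration only.

```python
import math

def ipc_weight(t, censoring_hazard, steps=10000):
    """1 / exp{-integral_0^t a(s) ds}, with the integral taken by a midpoint rule."""
    h = t / steps
    cumulative = sum(censoring_hazard((k + 0.5) * h) for k in range(steps)) * h
    return math.exp(cumulative)

# Hypothetical example: C ~ unif(0, 3.5) has hazard 1 / (3.5 - s) for s < 3.5,
# so the exact weight at time t is 1 / (1 - t / 3.5).
def uniform_hazard(s):
    return 1.0 / (3.5 - s)

weight = ipc_weight(1.6, uniform_hazard)
```

Weights of this kind grow without bound as t approaches the end of the censoring distribution's support, which is one reason the stabilized version described next can be preferable.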

When the resulting weights are highly variable, stabilized inverse probability weights can be obtained under an additional model for the censoring mechanism:

$$E\{dN_C(t)\mid T\ge t, C\ge t, L\} = a(t,L;\kappa),$$

where a(t, L; κ) is a known function, smooth in an unknown parameter κ. Misspecification of the latter model does not affect the consistency of the estimator of ψ (Robins et al., 2000).

5. Simulation Study

We considered 4 estimators: i) the singly robust estimator $\hat{\psi}_{\mathrm{BPS}}$ described in Section 4.2 that is consistent under model $M \cap A_E$; ii) the doubly robust estimator $\hat{\psi}_{\mathrm{BPSDR}}$ based on display (17) that is consistent under model $M \cap (A_E \cup B)$; iii) the doubly robust estimator $\hat{\psi}_{\mathrm{TVPSDR}}$ given in closed form in (12) that is consistent under model $M \cap (A_{TV} \cup B)$; and iv) the Aalen least-squares estimator $\hat{\psi}_{\mathrm{ALS}}$ of the time-constant treatment coefficient from a covariate-adjusted additive hazards model (where the effects of the baseline covariates were allowed to vary over time) that is consistent under model $M \cap B$. Model-based standard errors were used to construct 95% confidence intervals for $\hat{\psi}_{\mathrm{ALS}}$ (the variance estimators used to construct the other 95% confidence intervals are described in Web Appendix B).

In order to evaluate the four different estimators, we considered eight different experiments. For each experiment, we simulated 1000 data sets of 1000 observations. We generated covariates L1 and L2, exposure A, event time T and censoring time C; an individual’s follow-up time was taken as min(T, C). In experiments 1–4, the exposure A was continuous (e.g., the increase in the dose of a drug), whereas it was binary in experiments 5–8 (see Table 1 for a description of the data-generating mechanisms). In experiments 1, 2, 5, and 6, all working models included only terms for L1, L2, and an intercept. In experiments 1 and 5, all models were correctly specified, whereas in experiments 2 and 6, the models $A_E$ and $A_{TV}$ were misspecified, as they excluded an interaction term. In experiments 3 and 7, the models $A_E$ and $A_{TV}$ correctly included an interaction term, whereas this term was excluded in experiments 4 and 8 (all models are wrong). In experiments 1, 2, 5, and 6, those for whom min(T, C) ≥ 1.6 were censored at t = 1.6, corresponding to the study being closed at this time point. The same was done in experiments 3, 4, 7, and 8 at t = 1.3. For all experiments, the chosen censoring mechanisms led to 25–30% of subjects being censored (with around 10% censored at the end of the study).

Table 1.

A description of the data-generating mechanisms behind experiments 1–8. In experiments 1–4, we standardized the exposure to give it mean zero and standard deviation 1. B(1, p): Bernoulli distribution with expectation p; $N(\mu,\sigma^2)$: normal distribution with expectation μ and variance $\sigma^2$; Exp(λ): exponential distribution with rate λ; unif(a, b): uniform distribution with minimum and maximum values a and b, respectively. We use $L = (1, L_1, L_2)^T$; in all settings, for model B, we fitted $E\{dN(t)\mid \mathcal{F}_t, L, A=0\} = d\Gamma(t)^T L\,R(t)$.

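| Exp. | Data-generating mechanism | Fitted exposure model |
|---|---|---|
| 1 | $L_1 \sim B(1, 0.6)$; $L_2 \sim B\{1, \mathrm{expit}(0.5L_1)\}$; $A \sim N\{1+0.25(L_1L_2), 0.09\}$; $T \sim \mathrm{Exp}(0.5 + 0.5L_1 + L_2 + 0.1A)$; $C \sim \mathrm{unif}(0, 3.5)$ | $A_E$: $\beta^TL$; $A_{TV}$: $\theta(t)^TL$ |
| 2 | $L_1 \sim B(1, 0.6)$; $L_2 \sim B\{1, \mathrm{expit}(0.5L_1)\}$; $A \sim N\{1+0.25(L_1L_2)+0.5L_1L_2, 0.09\}$; $T \sim \mathrm{Exp}(0.5 + 0.5L_1 + L_2 + 0.1A)$; $C \sim \mathrm{unif}(0, 3.5)$ | $A_E$: $\beta^TL$; $A_{TV}$: $\theta(t)^TL$ |
| 3, 4 | $L_1 \sim N(0,1)$; $L_2 \sim N(L_1,1)$; $A \sim N\{1+0.25(L_1L_2)+0.5L_1L_2, 0.09\}$; $T \sim \mathrm{Exp}\{0.3 + \lvert L_1\rvert + \log(1 + \lvert L_2\rvert) + 0.1A\}$; $C \sim \mathrm{unif}(0, 3)$ | Exp. 3: $A_E$: $\beta_1^TL+\beta_2L_1L_2$; $A_{TV}$: $\theta_1(t)^TL+\theta_2(t)L_1L_2$. Exp. 4: $A_E$: $\beta^TL$; $A_{TV}$: $\theta(t)^TL$ |
| 5 | $L_1 \sim B(1, 0.6)$; $L_2 \sim B\{1, \mathrm{expit}(0.5L_1)\}$; $A \sim B[1, \mathrm{expit}\{-1 + 0.25(L_1L_2)\}]$; $T \sim \mathrm{Exp}(0.5 + 0.5L_1 + L_2 + 0.1A)$; $C \sim \mathrm{unif}(0, 3.5)$ | $A_E$: $\mathrm{expit}(\beta^TL)$; $A_{TV}$: $\mathrm{expit}\{\theta(t)^TL\}$ |
| 6 | $L_1 \sim B(1, 0.6)$; $L_2 \sim B\{1, \mathrm{expit}(0.5L_1)\}$; $A \sim B[1, \mathrm{expit}\{-1 + 0.25(L_1L_2)+0.5L_1L_2\}]$; $T \sim \mathrm{Exp}(0.5 + 0.5L_1 + L_2 + 0.1A)$; $C \sim \mathrm{unif}(0, 3.5)$ | $A_E$: $\mathrm{expit}(\beta^TL)$; $A_{TV}$: $\mathrm{expit}\{\theta(t)^TL\}$ |
| 7, 8 | $L_1 \sim N(0,1)$; $L_2 \sim N(L_1,1)$; $A \sim B[1, \mathrm{expit}\{-1 + 0.25(L_1L_2)+0.5L_1L_2\}]$; $T \sim \mathrm{Exp}\{0.3 + \lvert L_1\rvert + \log(1 + \lvert L_2\rvert) + 0.1A\}$; $C \sim \mathrm{unif}(0, 3)$ | Exp. 7: $A_E$: $\mathrm{expit}(\beta_1^TL+\beta_2L_1L_2)$; $A_{TV}$: $\mathrm{expit}\{\theta_1(t)^TL+\theta_2(t)L_1L_2\}$. Exp. 8: $A_E$: $\mathrm{expit}(\beta^TL)$; $A_{TV}$: $\mathrm{expit}\{\theta(t)^TL\}$ |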

The results of the simulations are given in Table 2, and largely corroborate the theory. The doubly robust estimators were empirically unbiased when either of the working models was correctly specified, and when both models were correctly specified, they were more efficient than the estimator $\hat{\psi}_{\mathrm{BPS}}$. There was little difference in efficiency between the estimators, and they also behaved similarly in terms of bias when both working models were misspecified. In general, the standard errors performed well. When the treatment was normal, however, under misspecification of model $A_E$, performance of the standard errors for $\hat{\psi}_{\mathrm{BPS}}$ and $\hat{\psi}_{\mathrm{BPSDR}}$ was less than optimal because the distribution of the estimating function was characterized by outlying values.

Table 2.

Simulation results from experiments 1 to 8. Monte Carlo bias multiplied by 10 (bias); Monte Carlo standard deviation multiplied by 10 (SD); coverage of 95% Wald confidence intervals (Cov).

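| Exp | $\hat{\psi}_{\mathrm{BPS}}$ Bias | $\hat{\psi}_{\mathrm{BPS}}$ SD | $\hat{\psi}_{\mathrm{BPS}}$ Cov | $\hat{\psi}_{\mathrm{BPSDR}}$ Bias | $\hat{\psi}_{\mathrm{BPSDR}}$ SD | $\hat{\psi}_{\mathrm{BPSDR}}$ Cov | $\hat{\psi}_{\mathrm{TVPSDR}}$ Bias | $\hat{\psi}_{\mathrm{TVPSDR}}$ SD | $\hat{\psi}_{\mathrm{TVPSDR}}$ Cov | $\hat{\psi}_{\mathrm{ALS}}$ Bias | $\hat{\psi}_{\mathrm{ALS}}$ SD | $\hat{\psi}_{\mathrm{ALS}}$ Cov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.00 | 0.6 | 95.4 | 0.00 | 0.5 | 95.0 | −0.00 | 0.5 | 94.9 | −0.00 | 0.5 | 95.3 |
| 2 | −0.57 | 0.6 | 86.4 | −0.07 | 0.5 | 93.9 | 0.02 | 0.6 | 94.5 | 0.02 | 0.6 | 94.6 |
| 3 | 0.03 | 2.1 | 94.7 | 0.04 | 1.8 | 94.9 | −0.00 | 1.8 | 95.3 | 7.19 | 1.1 | 0.0 |
| 4 | 7.80 | 1.8 | 0.0 | 7.86 | 1.8 | 0.0 | 7.17 | 1.1 | 0.0 | 7.17 | 1.1 | 0.0 |
| 5 | 0.01 | 1.2 | 94.9 | 0.05 | 1.1 | 94.8 | 0.04 | 1.1 | 94.8 | 0.04 | 1.1 | 94.7 |
| 6 | −0.10 | 1.1 | 94.2 | 0.03 | 1.0 | 95.4 | 0.02 | 1.0 | 95.4 | 0.02 | 1.0 | 95.7 |
| 7 | −0.01 | 1.5 | 95.0 | 0.03 | 1.3 | 95.0 | −0.02 | 1.4 | 94.5 | 3.80 | 1.4 | 17.3 |
| 8 | 3.91 | 1.5 | 25.8 | 3.68 | 1.3 | 19.1 | 3.70 | 1.3 | 19.3 | 3.75 | 1.4 | 18.2 |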

6. Data Analysis

We applied our methods to data from the five-center SUPPORT study that took place between 1989 and 1994. Previously, Connors et al. (1996) analyzed the dataset in order to evaluate the effect of right heart catheterization (RHC) on patient mortality. Many cardiologists believed that RHC was beneficial, but due to ethical concerns, this benefit had not been demonstrated via a randomized controlled trial. Connors et al. (1996) constructed propensity scores using 72 potential confounders identified by clinicians. As part of the original analysis, Connors et al. (1996) evaluated the association between RHC and survival by fitting a Cox model, adjusted for treatment (RHC), the propensity score and a reduced set of outcome adjustment variables. They considered only the first thirty days after entry into the study, such that people who survived beyond thirty days were considered administratively censored. Surprisingly, the investigators found that undergoing RHC (compared with no RHC) led to a decrease in survival.

We attempted to fit an additive hazards model using Aalen least-squares, adjusting for treatment and all 72 covariates. Random noise from a uniform distribution on (0,0.001) was first added to the survival times in order to break ties. We also forced the effects of all covariates (excluding the intercept) to be constant over time. The estimate of the coefficient for treatment in the final adjusted model was 0.00365 (SE = 0.00077, 95%CI 0.00213–0.00516). To assess the fitted model, we obtained predicted hazards at each time point (day 1, 2,…, 30); over 99% of the predicted hazards at the 25th, 50th, and 75th percentiles of the distribution of survival times were less than zero. We would be cautious about drawing inferences from such a model, as the misspecification (suggested by the invalid predictions) could lead to biased estimates of the treatment effect and misleading model-based standard errors. In view of this, we will next report estimates that rely on propensity scores instead.

We fitted a logistic regression model for treatment, adjusting for all 72 covariates. We obtained the predicted values from this model in order to estimate the effect of RHC using the estimator $\hat{\psi}_{BPS}$. We also postulated a simplified additive hazards model, which included treatment and the outcome adjustment variables listed above, and allowed their effects to depend on time. The purpose of fitting this model was to construct the doubly robust estimators $\hat{\psi}_{BPSDR}$ and $\hat{\psi}_{TVPSDR}$. For $\hat{\psi}_{TVPSDR}$, we obtained the predictions after fitting, at each event time, a logistic regression model that included all 72 variables. R code is available in the Supplementary Materials (see also Web Appendix C).
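As a rough illustration of the propensity score step, the following Python sketch fits a logistic regression for a binary treatment by Newton-Raphson and extracts the predicted probabilities. It is not the authors' code (theirs is in R and adjusts for 72 covariates): the single covariate `x`, the simulated data, and the function name `fit_logistic` are all assumptions made to keep the example self-contained.

```python
import math
import random

def fit_logistic(x, a, n_iter=25):
    """Newton-Raphson fit of logit P(A = 1 | x) = b0 + b1 * x; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # observed information (negative Hessian)
        for xi, ai in zip(x, a):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ai - p
            g1 += (ai - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 linear system (information) * step = score.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data: one hypothetical confounder and a binary treatment.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
a = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-0.5 + xi))) else 0 for xi in x]

b0, b1 = fit_logistic(x, a)
# Predicted propensity scores P(A = 1 | x) for each subject.
ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(round(b0, 3), round(b1, 3))
```

A quick sanity check on any such fit is that the predicted probabilities sum to the observed number of treated subjects, which follows from the score equation for the intercept.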

The estimated hazard difference from the singly-robust estimator $\hat{\psi}_{BPS}$ was 0.00406 (SE = 0.00084, 95%CI 0.00241–0.0057). The doubly robust estimator $\hat{\psi}_{BPSDR}$ gave a hazard difference of 0.00334 (SE = 0.00074, 95%CI 0.0019–0.00479), and the estimator $\hat{\psi}_{TVPSDR}$ gave 0.00363 (SE = 0.00078, 95%CI 0.0021–0.00515). We also computed $E\{\mathrm{pr}(T \ge t \mid A = a, L)\}$, the average survival probability at time t if everyone received treatment a, standardized with respect to the observed distribution of L. This was done by taking sample averages of $R(t)\exp\{\hat{\psi}(A - a)t\}$. The adjusted survival curves are plotted in Figure 1; compared with the standard Kaplan–Meier estimates, the difference between the treatment groups is slightly shrunken towards the null. This is consistent with the results of Connors et al. (1996) and Vermeulen and Vansteelandt (2015), where small differences between unadjusted and adjusted treatment effect estimates (in the same direction) were observed.
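The standardization step can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation: it reads the averaged quantity as R(t) exp{ψ̂(A − a)t}, takes the at-risk indicator to be R(t) = 1 when the observed time is at least t, and assumes no censoring before t (in the analysis, censoring was administrative at day 30). The follow-up times and treatment indicators are hypothetical; only the value 0.00334 comes from the text.

```python
import math

def standardized_survival(psi_hat, t, a, times, treatments):
    """Sample average of R_i(t) * exp{psi_hat * (A_i - a) * t}, with R_i(t) = 1 if T_i >= t."""
    total = 0.0
    for T_i, A_i in zip(times, treatments):
        at_risk = 1.0 if T_i >= t else 0.0
        total += at_risk * math.exp(psi_hat * (A_i - a) * t)
    return total / len(times)

# Hypothetical follow-up times in days (30.0 = survived to day 30) and treatment indicators.
times = [3.2, 30.0, 12.5, 30.0, 30.0, 7.1, 30.0, 21.4]
treatments = [1, 0, 1, 1, 0, 0, 1, 0]
psi_hat = 0.00334  # hazard difference reported for the doubly robust estimator

t = 20.0
s1 = standardized_survival(psi_hat, t, 1, times, treatments)  # everyone treated
s0 = standardized_survival(psi_hat, t, 0, times, treatments)  # no one treated
print(s1 / s0, math.exp(-psi_hat * t))
```

By construction the ratio of the two standardized curves equals exp(−ψ̂t), the estimated relative chance of survival. If time is measured in days, as the daily grid used in the analysis suggests, then ψ̂ = 0.00334 corresponds to a relative chance of surviving to day 30 of about exp(−0.1002) ≈ 0.905.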

Figure 1. Thirty-day survival curves. $\hat{\psi}_{BPS}$, $\hat{\psi}_{BPSDR}$ and $\hat{\psi}_{TVPSDR}$ were used to compute $E\{\mathrm{pr}(T \ge t \mid A = a, L)\}$, the average survival probability at time t if everyone received treatment a (RHC or no RHC), standardized with respect to the observed distribution of the covariates L.

7. Discussion

In this article, we have developed a theory of estimation for the adjusted hazard difference/relative chance of survival using semiparametric additive hazards models. We have used this theory to develop several classes of doubly robust estimators, each strategy with its own strengths and limitations. The closed-form estimators described in Section 4.1, which are consistent and asymptotically normal under model $\mathcal{M} \cap (\mathcal{A}_{TV} \cup \mathcal{B})$, have several important advantages over competing strategies. First, under omitted interactions and/or time effects in model $\mathcal{M}$, they converge to a convex combination of time/covariate-specific treatment effects (assuming that model $\mathcal{A}_{TV}$ is correct). Second, since the term $\exp(\psi A t)$ does not appear in the estimating functions, their behavior is likely to be more stable. Third, simulations suggest that these estimators are reasonably efficient (although information may be lost by ignoring the fact that model $\mathcal{A}_{TV}$ contains information about $\psi$). In Web Appendix D, we compare these estimators with those of Kang et al. (2018) in additional simulations, and find ours to be more efficient in a low-dimensional setting.

Although for simplicity we have considered a scalar treatment effect, in many settings ψ will be a vector of parameters. For instance, when the effect of A on the hazard is modified by covariates Z, the semiparametric model is defined by the restriction

$$E\{dN(t) \mid \mathcal{F}_t, L, A\} = \{d\Omega(t, L) + (\psi_1 + \psi_2^{T} Z) A \, dt\} R(t)$$

where $\psi_2$ is a vector, and $\psi_1$ and $\psi_2$ give the adjusted effect of A within different levels of Z. We note that unlike the estimators of Wang et al. (2017), all of our proposals remain doubly robust outside of the ‘no treatment heterogeneity’ model (and are considerably more efficient under the homogeneous model by avoiding inverse probability weighting). If the vector-valued $\psi = (\psi_1, \psi_2)^{T}$ is estimated via the approach described in Section 4.1 that requires a model for $E(A \mid T \ge t, L)$, then to ensure a congenial model specification, the model $\mathcal{A}_{TV}$ must now also include Z, with its regression coefficient(s) allowed to depend on time.
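To see what the effect-modification parameters mean on the survival scale, the following short derivation may help. It is a sketch under assumptions not restated in this section: T is continuous, Z is a function of L, Ω(0, L) = 0, and the restriction above implies that the conditional hazard of T given (A, L) has cumulative form Ω(t, L) + (ψ1 + ψ2ᵀZ)At. Then

$$\mathrm{pr}(T \ge t \mid A = a, L) = \exp\{-\Omega(t, L) - (\psi_1 + \psi_2^{T} Z)\, a\, t\},$$

so that

$$\frac{\mathrm{pr}(T \ge t \mid A = a + 1, L)}{\mathrm{pr}(T \ge t \mid A = a, L)} = \exp\{-(\psi_1 + \psi_2^{T} Z)\, t\}.$$

Under these assumptions, the relative chance of survival per unit increase in A depends on L only through Z, and reduces to exp(−ψt) when there is no effect modification.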

The proposals described in this article are closely related to the method of G-estimation (Robins and Tsiatis, 1991; Robins, 1994). Picciotto et al. (2012) recently introduced the class of discrete-time structural cumulative failure time models, along with accompanying G-estimators. They postulate a multiplicative model for the probability of failure, rather than survival. Note that all of the methods described above can be adjusted in order to estimate the relative chance of failure. In Web Appendix A, we further investigate the efficiency of Picciotto et al.’s estimators.

In future work, the new class of semiparametric additive hazards models and the accompanying doubly robust estimators will be extended to estimate controlled direct effects in the presence of mediators and/or time varying confounders. Regarding mediation problems, Martinussen et al. (2011) previously considered the estimation of direct effects using additive hazards models; however, their approach was limited to settings with binary, randomly assigned treatments. By taking a semiparametric approach, the methods described in this article would be able to accommodate different types of treatment and adjustment for baseline covariates. Regarding the problem of time-varying confounding, such extensions would be useful in light of issues facing the two principal approaches used in survival analysis, marginal structural models (Robins et al., 2000) and structural accelerated failure time models (Robins and Tsiatis, 1991). Inference for the former requires inverse probability weighting; estimators can suffer heavily from large finite-sample bias and imprecision due to highly variable weights. Marginal structural models also prohibit investigation into effect modification by time-varying covariates. G-estimation has turned out to be problematic for structural accelerated failure time models, because administrative censoring is dealt with through an artificial recensoring process which can induce a lack of smoothness in the estimating equations.

Acknowledgements

Oliver Dukes is supported by a Strategic Basic Research PhD grant from the Research Foundation—Flanders (FWO). Torben Martinussen’s work is part of the Dynamical Systems Interdisciplinary Network, University of Copenhagen. Eric J. Tchetgen Tchetgen is funded by the National Institutes of Health grant A1104459. The authors are grateful to Shaun Seaman for helpful discussions, and to Wenbin Lu and Suhyun Kang for providing R code.

8. Supplementary Materials

Web Appendices referenced in Sections 2–7 are available with this article at the Biometrics website on Wiley Online Library. R code for the data analysis is also available there.

References

  1. Aalen O (1980). A model for nonparametric regression analysis of counting processes. In Lecture Notes in Statistics, Vol. 2, 1–25. New York, NY: Springer New York.
  2. Bickel PJ, Klaassen CA, Ritov Y, and Wellner JA (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Series in the Mathematical Sciences. Baltimore: Johns Hopkins University Press.
  3. Connors AF, Speroff T, Dawson NV, Thomas C, Harrell FE, Wagner D, et al. (1996). The effectiveness of right heart catheterization in the initial care of critically ill patients. SUPPORT Investigators. JAMA 276, 889–897.
  4. Farrell MH (2015). Robust inference on average treatment effects with possibly more covariates than observations. Journal of Econometrics 189, 1–23.
  5. Hernán MA (2010). The hazards of hazard ratios. Epidemiology 21, 13–15.
  6. Kang S, Lu W, and Zhang J (2018). On estimation of the optimal treatment regime with the additive hazards model. Statistica Sinica, in press. DOI: 10.5705/ss.202016.0543.
  7. Martinussen T, Vansteelandt S, Gerster M, and Hjelmborg JvB (2011). Estimation of direct effects for survival data by using the Aalen additive hazards model. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, 773–788.
  8. McKeague IW and Sasieni PD (1994). A partly parametric additive risk model. Biometrika 81, 501–514.
  9. Picciotto S, Hernán MA, Page JH, Young JG, and Robins JM (2012). Structural nested cumulative failure time models to estimate the effects of interventions. Journal of the American Statistical Association 107, 886–900.
  10. Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods 23, 2379–2412.
  11. Robins JM, Hernán MA, and Brumback B (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560.
  12. Robins JM and Rotnitzky A (2001). Comments. Statistica Sinica 11, 920–936.
  13. Robins JM and Tsiatis AA (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods 20, 2609–2631.
  14. Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
  15. Scharfstein DO and Robins JM (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika 89, 617–634.
  16. Tchetgen Tchetgen EJ, Robins JM, and Rotnitzky A (2010). On doubly robust estimation in a semiparametric odds ratio model. Biometrika 97, 171–180.
  17. Vansteelandt S, Martinussen T, and Tchetgen Tchetgen EJ (2014). On adjustment for auxiliary covariates in additive hazard models for the analysis of randomized experiments. Biometrika 101, 237–244.
  18. Vermeulen K and Vansteelandt S (2015). Bias-reduced doubly robust estimation. Journal of the American Statistical Association 110, 1024–1036.
  19. Wang Y, Lee M, Liu P, Shi L, Yu Z, Abu Awad Y, et al. (2017). Doubly robust additive hazards models to estimate effects of a continuous exposure on survival. Epidemiology 28, 771–779.
