Summary.
The estimation of conditional treatment effects in an observational study with a survival outcome typically involves fitting a hazards regression model adjusted for a high-dimensional covariate. Standard estimation of the treatment effect is then not entirely satisfactory, as the misspecification of the effect of this covariate may induce a large bias. Such misspecification is a particular concern when inferring the hazard difference, because it is difficult to postulate additive hazards models that guarantee non-negative hazards over the entire observed covariate range. We therefore consider a novel class of semiparametric additive hazards models which leave the effects of covariates unspecified. The efficient score under this model is derived. We then propose two different estimation approaches for the hazard difference (and hence also the relative chance of survival), both of which yield estimators that are doubly robust. The approaches are illustrated using simulation studies and data on right heart catheterization and mortality from the SUPPORT study.
Keywords: Additive hazards model, Causal inference, Doubly robust estimation, Lifetime and survival analysis, Semiparametric inference
1. Introduction
In the analysis of time-to-event data, one is often interested in the effect of an exposure A on a survival outcome T, subject to a censoring time C, and conditional on a set of variables L. This adjusted association may be summarized by the hazard difference, which can be estimated by fitting a multivariable additive hazards model, or the hazard ratio, commonly estimated via the Cox proportional hazards model. For example, in the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT), Connors et al. (1996) investigated the effect of right heart catheterization (RHC), a binary exposure, on patient mortality. The exposure could also be continuous, for example, particulate air pollution. In observational studies, the dimension of covariates to adjust for is often high. The SUPPORT investigators used expert input from clinicians to identify 72 variables that could affect the decision of whether to use RHC or not, which they wished to adjust for in the analysis. The problem of bias resulting from misspecification of the hazards regression model then becomes a dominant consideration.
Such concerns have prompted the development of doubly robust estimators of treatment effects (Robins and Rotnitzky, 2001). These estimators require two working models, one of which is a regression model for the outcome, and another that relates to the treatment selection mechanism. Only one of these models needs to be correct in order to consistently estimate the treatment effect. Doubly robust estimators are now well established for parameters in linear, log-linear, and logistic conditional mean models (Robins, 1994; Tchetgen Tchetgen et al., 2010), and are particularly appealing when evaluating static treatment regimes or estimating optimal dynamic regimes in longitudinal studies. This is because it is challenging to specify a series of sequential regression models for the same outcome that are all simultaneously correct. More recently, the usefulness of doubly robust procedures has also been recognized in the context of data-adaptive selection or regularization. In particular, a number of common doubly robust estimators have turned out to be less susceptible to regularization bias than popular alternative estimators that do not possess the doubly robust property (Farrell, 2015). Standard confidence intervals for these doubly robust estimators are moreover uniformly valid, even when they ignore the use of such data-adaptive procedures (assuming the estimators for both working models converge sufficiently fast to the truth).
There has however been limited development of doubly robust estimators of the parameters indexing the hazard regression models (additive or multiplicative) popular in survival analysis. For semiparametric proportional hazards models we conjecture that, partly due to the non-collapsibility of the hazard ratio, no estimators of the treatment effect hazard ratio exist that are consistent whenever the treatment selection mechanism (more precisely, the distribution of A given L) is known (Tchetgen Tchetgen et al., 2010). Double robustness with respect to the treatment selection mechanism is therefore not attainable under such models. In contrast, as we will show in this article, doubly robust estimation strategies do exist for the hazard difference under semiparametric additive hazards models. The additional robustness is particularly advantageous for additive models; these are prone to misspecification since they do not impose the constraint that hazards are non-negative.
In Section 2, we introduce the new class of semiparametric additive hazards models. A theory of estimation for these models is developed in Section 3, and the efficient score function is identified. Because it requires specification of the entire conditional distribution of the treatment given the covariates, we also describe a subclass of estimators which only requires this conditional distribution to be correctly specified up to the mean. Drawing from these results, two practical strategies for the estimation of the treatment effect are proposed in Section 4, both of which yield doubly robust estimators. In Section 5, our estimators are compared in simulations with standard estimators of additive hazards models, and we reanalyze data from the SUPPORT study in Section 6.
2. The Model
We begin with some notation. The counting process corresponding to the survival time T is denoted by N(t) = I(T ≤ t); $\mathcal{F}_t$ is the history spanned by N(t), and R(t) = I(T ≥ t) is the at-risk indicator. Let L include 1 for the intercept. For the moment, we assume there is no censoring.
Consider an additive hazards model of the form
where dΓ (t)T is a vector of coefficients that are allowed to depend on time (McKeague and Sasieni, 1994). This model imposes restrictions on the effect of L on the hazard at any time point t; such restrictions are undesirable, because misspecification of an additive hazards model may be inevitable when L is high-dimensional and has continuous components. Incorrectly specifying the effect of L can then induce bias in estimation of ψ. We therefore further relax the model restrictions by developing inference for the semiparametric additive hazards model , defined by
$$\lambda(t \mid A, L)\,dt = d\Omega(t, L) + \psi A\,dt \tag{1}$$
where dΩ (t, L) denotes the effect of time and the covariates on the hazard and is left unspecified. Restrictions are now only imposed on the association between A and the hazard. A further relaxation of model is
$$\lambda(t \mid A, L)\,dt = d\Omega(t, L) + A\,d\psi(t) \tag{2}$$
where dψ (t) is an unknown locally integrable function of time. In Web Appendix A, we extend our results on estimation to model (2) (denoted by ), but otherwise assume that the effect of A is constant. To simplify the exposition, we will also assume there is no effect modification by Z, where Z = z(L) is a vector function of L, and give details on extensions for multivariate ψ in the discussion.
By the equality
$$\frac{P(T \ge t \mid A, L)}{P(T \ge t \mid A = 0, L)} = \exp(-\psi A t),$$
which is implied by (1), it follows that exp(− ψAt) can be interpreted as the adjusted relative change in the probability of surviving time t per unit increase in the exposure. This relative chance of survival is potentially easier to communicate than the hazard difference (or ratio). The reason is that contrasts between hazards lack a causal interpretation because they compare, at each time t, individuals who have not yet failed at that time. These individuals may not be exchangeable between treatment arms, even when the treatment is randomly assigned (Hernán, 2010).
3. Theory of Semiparametric Estimation
3.1. The Efficient Score for ψ
In this section, we develop a theory of estimation for ψ. We first give the semiparametric efficient score for ψ under model and discuss the properties of an efficient estimator. The derivation of all results is left to Web Appendix A.
Let , λ(t|A, L;ψ) = dΩ(t, L)/dt + ψA and dM(t; ψ) = dN(t) − λ(t|A, L; ψ)R(t)dt be the increment at time t of a local square-integrable martingale. Then the locally efficient score for ψ under model is
| (3) |
The solution for ψ to the equation
thus has an asymptotic variance which attains the semiparametric efficiency bound for ψ under model , when f (A|L) is known and Ω(t, L) is correctly specified (Bickel et al., 1993).
In practice, the law f (A|L) will usually be unknown, and thus so will E{λ−1(t, A, L)A exp(−ψAt)|L} and E{λ−1(t, A, L) exp(−ψAt)|L}. One may then postulate a parametric model for the population distribution f (A|L) = f (A|L; α), where f (A|L; α) is a known function smooth in an unknown finite-dimensional parameter α. In practice, α can be estimated using maximum likelihood. Since f (A|L) is ancillary to ψ, the efficiency bound for ψ is the same whether f (A|L) is estimated or known. The score (3) is thus efficient under the intersection model .
Implementation of an efficient estimator also requires knowledge of dΩ(t, L). In the semiparametric model , this function is left unspecified and is unknown to the data analyst. One option then is to estimate it via a working model , such as dΩ(t, L) = dΩ{L; Γ(t)} where dΩ{L; Γ(t)} is a known function, smooth at each time point t in an unknown finite-dimensional parameter Γ(t). With a slight abuse of notation, let denote the set of covariates that are included in the model ; for example, if L (1, L1)T, then a potential choice could . We will postulate a linear model . The parameters Γ(t) can then be consistently estimated using Aalen least-squares (Aalen, 1980), upon changing the increments dN(t) to , with being a consistent estimator of the time varying effect dψinit(t) under the initial model .
We therefore arrive at the estimating function
| (4) |
where λ{Γ(t), ψ} = dΩ{L; Γ(t)}/dt + ψA. Then the population expectation of (4) converges to
It follows that the function in (4) has mean zero when, in addition to , either model or is correct. The efficient estimator under model is therefore doubly robust (Robins and Rotnitzky, 2001). The semiparametric efficiency bound under model is only met when model is correctly specified and thus attained locally.
We have argued that the score (3) is efficient under . However, the same score in (3) also delivers an efficient doubly robust estimator; specifically, it is efficient under the union model at the intersection submodel . This follows from a general result in Robins and Rotnitzky (2001); however, the efficiency bound under may be lower than the bound under the union model.
3.2. Efficiency in a Subclass of Estimators
A drawback of the efficient score derived in the previous section is that it requires postulation of a model for the entire conditional distribution f (A|L). In Web Appendix A, we therefore derive the subclass of influence functions which have mean zero when the conditional mean E(A|L) is known. When A is binary, this conditional mean is known as the propensity score (Rosenbaum and Rubin, 1983). We are then led to the class of estimating functions
| (5) |
where d(t, l) and dΩ*(t, L) are arbitrary functions of t and l with finite variance. Note indeed that (5) no longer depends on f (A|L) but only on the conditional mean E(A|L). The term R(t) exp(ψAt) can be interpreted as the removal of the treatment effect (in expectation) from the at-risk indicators (Martinussen et al., 2011). That (5) has mean zero under model when E(A|L) is known can be seen as follows:
for all d(t, L). It is shown in Web Appendix A that the optimal choice of d(t, L) for efficiency is
| (6) |
for dΩ*(t, L), it is equal to dΩ(t, L). The efficient estimator within this subclass is obtained by solving the equations
| (7) |
The conditional expectation E(A|L) is typically unknown. It can be estimated under a parametric model for the conditional mean E(A|L) = E(A|L; β), where E(A|L; β) is a known function, smooth in an unknown finite-dimensional parameter β. Under model , β can be estimated using maximum likelihood.
Estimation of model does not affect the efficiency bound for the class of estimators identified by (5). Furthermore, misspecification does not induce bias when dΩ*(t, L) = dΩ(t, L) is consistently estimated, since the estimating function in (7) is unbiased under the union model and therefore doubly robust.
4. Implementation
4.1. Estimation via f (A|L)
In this section, we will build on the efficiency theory of the previous section and outline two potential estimation strategies for ψ. All of the estimators are consistent and asymptotically normal and accompanying variance estimators, unless stated otherwise, are given in Web Appendix B.
From the perspective of maximizing efficiency, a reasonable approach to take is to construct an estimator based on the efficient score (3). We note first that the score requires inverse weighting by the hazard function; this is also the case for efficient estimators of the parameters indexing other additive hazards models (McKeague and Sasieni, 1994). In practice, this can lead to estimators with unstable performance in small samples. If the hazard weights are removed from (3), as is common in standard fitting strategies for additive hazards models, we are left with the estimating function
| (8) |
The ratio
$$\frac{E\{A \exp(-\psi A t) \mid L\}}{E\{\exp(-\psi A t) \mid L\}} \tag{9}$$
is the first order derivative of the cumulant generating function for f (A|L), evaluated at −ψt. Under some model , one may evaluate this ratio directly or using Monte Carlo integration. Estimators of the asymptotic variance can be derived following standard M-estimation arguments.
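As a minimal illustration of evaluating this ratio, the Python sketch below computes the derivative of the cumulant generating function at −ψt in closed form for two assumed exposure distributions (Bernoulli and normal) and compares each with a Monte Carlo approximation. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def tilted_mean_binary(p, psi, t):
    """Ratio E{A exp(-psi*A*t)|L} / E{exp(-psi*A*t)|L} for binary A with P(A=1|L) = p."""
    w = np.exp(-psi * t)
    return p * w / (1.0 - p + p * w)

def tilted_mean_normal(mu, sigma2, psi, t):
    """Same ratio for A|L ~ N(mu, sigma2): K'(s) = mu + sigma2*s, evaluated at s = -psi*t."""
    return mu - sigma2 * psi * t

def tilted_mean_monte_carlo(draws, psi, t):
    """Monte Carlo version: weighted average of draws from f(A|L) with weights exp(-psi*A*t)."""
    w = np.exp(-psi * t * draws)
    return np.sum(draws * w) / np.sum(w)

rng = np.random.default_rng(0)
psi, t = 0.1, 1.5            # illustrative values only

p = 0.3                      # assumed P(A = 1 | L) at one covariate value
exact_bin = tilted_mean_binary(p, psi, t)
mc_bin = tilted_mean_monte_carlo(rng.binomial(1, p, size=200_000), psi, t)

mu, sigma2 = 0.5, 2.0        # assumed conditional mean and variance of a normal exposure
exact_norm = tilted_mean_normal(mu, sigma2, psi, t)
mc_norm = tilted_mean_monte_carlo(rng.normal(mu, np.sqrt(sigma2), size=200_000), psi, t)

print(f"binary: exact {exact_bin:.4f}, Monte Carlo {mc_bin:.4f}")
print(f"normal: exact {exact_norm:.4f}, Monte Carlo {mc_norm:.4f}")
```

The Monte Carlo version has to be recomputed at every value of ψ and t visited by a numerical solver, which is the computational burden referred to in the text.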
However, specifying a correct model for a distribution in this fashion is unappealing, as it is difficult to formulate plausible models and any resulting misspecification may then have a potentially large impact on subsequent inference. Also, Monte Carlo integration may be computationally inconvenient as the integration needs to be done at all parameter values through which one iterates when numerically solving the equation. In light of these limitations, we will pursue alternative strategies in the remainder of this section.
We return to the estimating function (8). By Bayes rule, it follows that under model ,
for all l. Therefore,
$$E(A \mid T \ge t, L) = \frac{E\{A \exp(-\psi A t) \mid L\}}{E\{\exp(-\psi A t) \mid L\}} \tag{10}$$
which suggests that the unbiased estimating function (8) can also be written as
| (11) |
Rather than modeling (8) indirectly via a model for the distribution of A given L, we may choose to instead specify a model for the time-varying propensity score E(A|T ≥ t, L). A question then is how to specify a model at each time t that is congenial with model . Specifically, a parameterization of is congenial with model if for each element in and , there exists a distribution f (A|L) such that the equality (10) holds. If no such distribution exists, then we know before even seeing the data that the proposed models for and cannot both be correct. In Web Appendix A, we show that the following generalized linear model
is always congenial with when the dispersion parameter for f (A|L) does not depend on L. Here, g() is a canonical link function; and is the vector of covariates that are included in the model . A similar estimating function to (11) appears in Kang et al. (2018); however, they use a different parameterization of to the one we give above, which requires estimation of P(T ≥ t|A, L), P(T ≥ t|L), and E(A|L). Parametric models for P(T ≥ t|A, L) and P(T ≥ t|L) may not be congenial with model , which undermines the feasibility of doubly robust inference. Their proposal therefore relies on kernel density estimators, which are not suitable when L is high-dimensional.
An advantage of our parameterization of is that it admits a closed-form estimator of ψ, which is defined as
| (12) |
Here, Λ{θ(t)} = A − E{A|T ≥ t,L; θ(t)} and J(t) = 1 if both Y(t) and have full rank and zero otherwise, where Y(t) denotes a matrix with ith row and similarly for . It follows from the theory of M-estimation that this estimator is consistent and asymptotically normal under model .
When (we hereby denote the common set of covariates by ), the previous expressions can be further simplified. Given the use of the canonical link function, θ(t) can be estimated at time t as the solution to the estimating equations
By estimating θ(t) in this way, we ensure that the estimating equations for ψ reduce to
| (13) |
and ψ can be estimated in closed-form as
| (14) |
Surprisingly, estimation of Γ(t) is no longer required, yet the doubly robust property is retained. To see why, note that when model is misspecified, the expectation of (13) will converge to
where θ*(t) is the limiting value of . Because of how θ(t) is estimated, the above display will equal zero when dΩ(t, L) = dΩ{L; Γ(t)}, thus demonstrating double robustness. This strategy is related to bias-reduced doubly robust estimation, as proposed by Vermeulen and Vansteelandt (2015); further discussion is given in Web Appendix A.
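The closed-form strategy can be sketched numerically. The Python code below is not the authors' implementation; it assumes that the estimating equations take the form Σ_i ∫ Λ_i{θ(t)}{dN_i(t) − ψ A_i R_i(t) dt} = 0, so that the estimate of ψ is a ratio of two sums. For simplicity it uses two binary covariates, a saturated model for E(A|T ≥ t, L) (cell means of A among subjects still at risk, in place of a logistic regression), and a fine time grid to approximate the integrals. The data-generating values and all function names are assumptions made for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n, psi, rng, study_end=1.6):
    """Binary exposure with additive hazard 0.5 + 0.5*L1 + L2 + psi*A (assumed design)."""
    L1 = rng.binomial(1, 0.6, size=n)
    L2 = rng.binomial(1, expit(0.5 * L1))
    A = rng.binomial(1, expit(-1.0 + 0.25 * (L1 - L2)))
    T = rng.exponential(1.0 / (0.5 + 0.5 * L1 + L2 + psi * A))
    C = np.minimum(rng.uniform(0.0, 3.5, size=n), study_end)
    X = np.minimum(T, C)               # observed follow-up time
    delta = (T <= C).astype(int)       # 1 = event observed, 0 = censored
    return L1, L2, A, X, delta

def closed_form_psi(L1, L2, A, X, delta, n_grid=400):
    """Ratio estimator: sum of residual * events over sum of residual * A * time at risk."""
    edges = np.linspace(0.0, X.max(), n_grid + 1)
    cell = 2 * L1 + L2                 # four covariate cells
    num, den = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        at_risk = X > lo
        # Time-varying propensity score: mean of A among those at risk in the same cell.
        pscore = np.zeros(len(A))
        for c in range(4):
            idx = at_risk & (cell == c)
            if idx.any():
                pscore[idx] = A[idx].mean()
        resid = np.where(at_risk, A - pscore, 0.0)
        event = at_risk & (X <= hi) & (delta == 1)          # events in (lo, hi]
        time_at_risk = np.where(at_risk, np.minimum(X, hi) - lo, 0.0)
        num += np.sum(resid * event)
        den += np.sum(resid * A * time_at_risk)
    return num / den

rng = np.random.default_rng(2024)
L1, L2, A, X, delta = simulate(4000, psi=0.1, rng=rng)
psi_hat = closed_form_psi(L1, L2, A, X, delta)
print(f"closed-form estimate of psi: {psi_hat:.3f} (value used to generate the data: 0.1)")
```

Note that no model for the effect of L on the hazard is fitted anywhere in this sketch; only the at-risk propensity score is estimated.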
Vansteelandt et al. (2014) showed that Aalen least-squares estimators are robust to misspecification of the additive hazards model when A is normal with a mean that is linear in and constant variance. Indeed, in this scenario, it follows from the Appendix of Vansteelandt et al. (2014) that the Aalen least-squares estimator is equivalent to the estimator given in (14). Using a Taylor expansion around (9),
it follows that this robustness holds more generally so long as the mean and central moments of f (A|L) are linear in L. This assumption would not generally hold if A is binary; however, our estimator given in (12) generalizes the robustness properties of Aalen least-squares to arbitrary exposure distributions. Furthermore, if the true treatment effect ψ*(t, L) is a function of t and L, such that model no longer holds, then the estimator defined in (12) continues to have a useful interpretation. Assuming model is correct, then the estimator converges to
which is a weighted average of the treatment effects at different times and covariate values. In contrast, the Aalen least-squares estimator of ψ in the corresponding additive hazards model is not generally a convex combination of the time/covariate-specific treatment effects, even when L is correctly modeled. It is in particular not guaranteed to lie within the range of time/covariate-specific treatment effects.
4.2. Estimation via E(A|L)
In Section 3.2, we identified a subclass of estimators that are consistent under a correctly specified model of the conditional mean of the exposure. We now exploit these results in order to develop inference for ψ.
Consider the estimating equations for ψ suggested by the function (5):
| (15) |
In evaluating the integral in (15), note that the function d(t, L) impacts only the variance, rather than the unbiasedness of the estimating equations. The efficient choice (6) depends on the conditional distribution of the treatment A, and in certain cases may lead to the integral becoming analytically intractable. We therefore set d(t,L) to 1, leaving the search for efficient yet computationally feasible choices to future work.
Letting Δ(β) denote {A − E(A|L; β)}, the estimating equation becomes
and ψ can therefore be estimated as a solution to the equations
| (16) |
It is vital for identification that the arbitrary function dΩ(t,L) is non-zero over a set of times t with positive Lebesgue measure. Otherwise, as we integrate to ∞, all information about the parameter ψ is lost. Therefore, dΩ (t,L) can be seen as a weighting term that prevents the integral in (15) from equaling zero at all ψ.
Setting dΩ(t, Li) = 1 for all t > 0, the above equations reduce to , where
It follows from the theory of M-estimation that under standard regularity conditions, the solution to equation (15) delivers an estimator which is consistent and asymptotically normal under model
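A singly robust estimator of this kind can be sketched as follows. The Python code assumes no censoring, d(t, L) = 1 and dΩ(t, L) = dt, and solves Σ_i Δ_i ∫_0^{T_i} exp(ψ A_i t) dt = 0 by bisection. This form relies only on the fact that, under the model, the conditional mean of R(t) exp(ψAt) given A and L does not depend on A; the displayed equations in the article may differ by terms that do not involve ψ. The conditional mean E(A|L) is estimated by cell means over two binary covariates. The data-generating values and function names are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def integral_exp(psi, a, x):
    """Integral of exp(psi*a*t) over [0, x]; equals x when psi*a is zero."""
    c = psi * a
    safe = np.where(c == 0.0, 1.0, c)          # avoid division by zero
    return np.where(c == 0.0, x, np.expm1(safe * x) / safe)

def estimating_function(psi, A, T, e_hat):
    """Sum over subjects of {A - E(A|L)} times the integral of exp(psi*A*t) up to T."""
    return np.sum((A - e_hat) * integral_exp(psi, A, T))

def solve_bisection(f, lo, hi, tol=1e-8):
    """Root of f on [lo, hi]; requires a sign change over the interval."""
    f_lo, f_hi = f(lo), f(hi)
    if (f_lo > 0) == (f_hi > 0):
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated data without censoring (assumed design with true psi = 0.1).
rng = np.random.default_rng(11)
n, psi_true = 4000, 0.1
L1 = rng.binomial(1, 0.6, size=n)
L2 = rng.binomial(1, expit(0.5 * L1))
A = rng.binomial(1, expit(-1.0 + 0.25 * (L1 - L2)))
T = rng.exponential(1.0 / (0.5 + 0.5 * L1 + L2 + psi_true * A))

# Saturated estimate of E(A|L): mean of A within each covariate cell.
cell = 2 * L1 + L2
e_hat = np.zeros(n)
for c in range(4):
    e_hat[cell == c] = A[cell == c].mean()

psi_sr = solve_bisection(lambda p: estimating_function(p, A, T, e_hat), -3.0, 3.0)
print(f"singly robust estimate of psi: {psi_sr:.3f} (value used to generate the data: 0.1)")
```

For a binary exposure the estimating function is increasing in ψ, so the bisection search has a single root on the interval.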
Doubly robust extensions to the previous proposal can also be made. Returning to equation (15), then rather than setting dΩ(t, Li) = 1 for all t > 0, we now postulate a model for dΩ(t, L), such as dΩ(t, L) = dΩ{L; Γ(t)}. After setting d(t, Li) = 1 again for all t > 0, it follows that ψ can be estimated as the solution to
| (17) |
The resulting solution is consistent and asymptotically normal under the model .
It is straightforward to show that the efficient subclass score (7) is invariant to centering A by its conditional mean; if we are willing to work with a non-efficient estimator, it is also desirable that it has this property. We therefore recommend that A be substituted by Δ(β) in (16) and (17), such that the estimating equations implied by (16) reduce to otherwise. Centering will prevent the exponential terms in the estimating equations from becoming large at later time points, which could lead to improved finite-sample performance.
Solving the equation for the singly robust estimator could also be a computationally fast first step towards an estimator that is nearly efficient (in the general class), if one is willing to specify the distribution f (A|L). This is because an initial estimate could be plugged into the ratio term (9) in the estimating function (8), making it linear in ψ. Under model , the resulting two-step estimator is consistent and asymptotically normal if either or both of the models or hold.
4.3. Censoring
When the survival time is censored by C, all of the approaches described above are valid under the assumption that censoring is independent of T and A, conditional on L, in the sense that C ⫫ (T, A)|L. All approaches are thus consistent when censoring depends only on L. The doubly robust estimators are moreover consistent when censoring depends additionally on A, provided that the additive hazards model is correctly specified. We can relax these assumptions by using inverse probability of censoring weighting (Scharfstein and Robins, 2002), under a model for the censoring mechanism:
Here, NC(t) is the counting process for the censoring time; , where Vs is a collection of covariates measured at time s; and is a known function, smooth in an unknown parameter π. An additive or multiplicative hazards model could be postulated here. An individual’s contribution to the estimating function (12) at time t is then weighted by
such that ψ can be estimated as the solution to
| (18) |
Weights can also be added to the estimating equations in (15), such that ψ can be estimated as the solution to
and likewise for the doubly robust estimator given by (17).
When the resulting weights are highly variable, stabilized inverse probability weights can be obtained under an additional model for the censoring mechanism:
where a(t, L; κ) is a known function, smooth in an unknown parameter κ. Misspecification of the latter model does not affect the consistency of the estimator of ψ (Robins et al., 2000).
5. Simulation Study
We considered four estimators: i) the singly robust estimator described in Section 4.2 that is consistent under model ; ii) the doubly robust estimator based on display (17) that is consistent under model ; iii) the doubly robust estimator given in closed-form in (12) that is consistent under model ; and iv) the Aalen least-squares estimator of the time-constant treatment coefficient from a covariate-adjusted additive hazards model (where the effects of the baseline covariates were allowed to vary over time) that is consistent under model . Model-based standard errors were used to construct 95% confidence intervals for (the variance estimators used to construct the other 95% confidence intervals are described in Web Appendix B).
In order to evaluate the four different estimators, we considered eight different experiments. For each experiment, we simulated 1000 data sets of 1000 observations. We generated covariates L1 and L2, and exposure A, event time T and censoring time C; an individual’s follow-up time was taken as min(T, C). In experiments 1–4, the exposure A was continuous (e.g., the increase in the dose of a drug), whereas it was binary in experiments 5–8 (see Table 1 for a description of the data-generating mechanisms). In experiments 1, 2, 5, and 6, all working models included only terms for L1, L2, and an intercept. In experiments 1 and 5, all models were correctly specified, whereas in experiments 2 and 6, the models and were misspecified, as they excluded an interaction term. In experiments 3 and 7, the models and correctly included an interaction term, whereas this term was excluded in experiments 4 and 8 (so that all models were wrong). In experiments 1, 2, 5, and 6, those for whom min(T, C) ≥ 1.6 were censored at t = 1.6, corresponding to the study being closed at this time point. The same was done in experiments 3, 4, 7 and 8 at t = 1.3. For all experiments, the chosen censoring mechanisms led to 25–30% of subjects being censored (with around 10% censored at the end of the study).
Table 1.
A description of the data-generating mechanisms behind experiments 1–8. In experiments 1–4, we standardized the exposure to give it mean zero and standard deviation 1. B(1, p): Bernoulli distribution with expectation p; N(μ, σ2): normal distribution with expectation μ and variance σ2; Exp(λ): exponential distribution with rate λ; unif(a, b): uniform distribution with minimum and maximum values a and b, respectively. We use L = (1, L1, L2)T; in all settings, for model , we fitted .
| Exp. | Data-generating mechanism | Fitted exposure model |
|---|---|---|
| 1 | L1 ~ B(1, 0.6) | |
| | L2 ~ B{1, expit(0.5L1)} | |
| | T ~ Exp(0.5 + 0.5L1 + L2 + 0.1A) | |
| | C ~ unif(0, 3.5) | |
| 2 | L1 ~ B(1, 0.6) | |
| | L2 ~ B{1, expit(0.5L1)} | |
| | T ~ Exp(0.5 + 0.5L1 + L2 + 0.1A) | |
| | C ~ unif(0, 3.5) | |
| 3, 4 | T ~ Exp{0.3 + \|L1\| + log(1 + \|L2\|) + 0.1A} | |
| | C ~ unif(0, 3) | |
| 5 | L1 ~ B(1, 0.6) | |
| | L2 ~ B{1, expit(0.5L1)} | |
| | A ~ B[1, expit{−1 + 0.25(L1 − L2)}] | |
| | T ~ Exp(0.5 + 0.5L1 + L2 + 0.1A) | |
| | C ~ unif(0, 3.5) | |
| 6 | L1 ~ B(1, 0.6) | |
| | L2 ~ B{1, expit(0.5L1)} | |
| | A ~ B[1, expit{−1 + 0.25(L1 − L2) + 0.5L1L2}] | |
| | T ~ Exp(0.5 + 0.5L1 + L2 + 0.1A) | |
| | C ~ unif(0, 3.5) | |
| 7, 8 | A ~ B[1, expit{−1 + 0.25(L1 − L2) + 0.5L1L2}] | |
| | T ~ Exp{0.3 + \|L1\| + log(1 + \|L2\|) + 0.1A} | |
| | C ~ unif(0, 3) | |
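To make the design concrete, the following Python sketch generates one large data set under the mechanism listed for experiment 5 in Table 1, with administrative censoring at t = 1.6, and reports the resulting censoring proportions. It is an illustrative reimplementation rather than the authors' simulation code; the function name, seed, and sample size are assumptions.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_experiment5(n, rng, psi=0.1, study_end=1.6):
    """One data set under the experiment 5 mechanism of Table 1 (binary exposure)."""
    L1 = rng.binomial(1, 0.6, size=n)
    L2 = rng.binomial(1, expit(0.5 * L1))
    A = rng.binomial(1, expit(-1.0 + 0.25 * (L1 - L2)))
    T = rng.exponential(1.0 / (0.5 + 0.5 * L1 + L2 + psi * A))  # Exp with the stated rate
    C = rng.uniform(0.0, 3.5, size=n)
    # Subjects still under observation at study_end are administratively censored.
    admin = np.minimum(T, C) >= study_end
    follow_up = np.where(admin, study_end, np.minimum(T, C))
    event = (~admin) & (T <= C)
    return {"L1": L1, "L2": L2, "A": A, "time": follow_up, "event": event, "admin": admin}

rng = np.random.default_rng(5)
data = simulate_experiment5(100_000, rng)
censored_fraction = 1.0 - data["event"].mean()
admin_fraction = data["admin"].mean()
print(f"censored: {censored_fraction:.3f}, censored at end of study: {admin_fraction:.3f}")
```

With a large sample, the two proportions can be compared against the 25–30% overall censoring and roughly 10% end-of-study censoring described in the text.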
The results of the simulations are given in Table 2, and largely corroborate the theory. The doubly robust estimators were empirically unbiased when either of the working models was correctly specified, and when both models were correctly specified, they were more efficient than the estimator . There was little difference in efficiency between the estimators, and they also behaved similarly in terms of bias when both working models were misspecified. In general, the standard errors performed well. When the treatment was normal, however, under misspecification of model , performance of the standard errors for and was less than optimal because the distribution of the estimating function was characterized by outlying values.
Table 2.
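Simulation results from experiments 1 to 8. Monte Carlo bias multiplied by 10 (bias); Monte Carlo standard deviation multiplied by 10 (SD); coverage of 95% Wald confidence intervals (Cov). The four groups of columns correspond, from left to right, to estimators (i) to (iv) as listed in the text.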
| Exp | Bias | SD | Cov | Bias | SD | Cov | Bias | SD | Cov | Bias | SD | Cov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.00 | 0.6 | 95.4 | 0.00 | 0.5 | 95.0 | −0.00 | 0.5 | 94.9 | −0.00 | 0.5 | 95.3 |
| 2 | −0.57 | 0.6 | 86.4 | −0.07 | 0.5 | 93.9 | 0.02 | 0.6 | 94.5 | 0.02 | 0.6 | 94.6 |
| 3 | 0.03 | 2.1 | 94.7 | 0.04 | 1.8 | 94.9 | −0.00 | 1.8 | 95.3 | 7.19 | 1.1 | 0.0 |
| 4 | 7.80 | 1.8 | 0.0 | 7.86 | 1.8 | 0.0 | 7.17 | 1.1 | 0.0 | 7.17 | 1.1 | 0.0 |
| 5 | 0.01 | 1.2 | 94.9 | 0.05 | 1.1 | 94.8 | 0.04 | 1.1 | 94.8 | 0.04 | 1.1 | 94.7 |
| 6 | −0.10 | 1.1 | 94.2 | 0.03 | 1.0 | 95.4 | 0.02 | 1.0 | 95.4 | 0.02 | 1.0 | 95.7 |
| 7 | −0.01 | 1.5 | 95.0 | 0.03 | 1.3 | 95.0 | −0.02 | 1.4 | 94.5 | 3.80 | 1.4 | 17.3 |
| 8 | 3.91 | 1.5 | 25.8 | 3.68 | 1.3 | 19.1 | 3.70 | 1.3 | 19.3 | 3.75 | 1.4 | 18.2 |
6. Data Analysis
We applied our methods to data from the five-center SUPPORT study that took place between 1989 and 1994. Previously, Connors et al. (1996) analyzed the dataset in order to evaluate the effect of right heart catheterization (RHC) on patient mortality. Many cardiologists believed that RHC was beneficial, but due to ethical concerns, this benefit had not been demonstrated via a randomized controlled trial. Connors et al. (1996) constructed propensity scores using 72 potential confounders identified by clinicians. As part of the original analysis, Connors et al. (1996) evaluated the association between RHC and survival by fitting a Cox model, adjusted for treatment (RHC), the propensity score and a reduced set of outcome adjustment variables. They considered only the first thirty days after entry into the study, such that people who survived beyond thirty days were considered administratively censored. Surprisingly, the investigators found that undergoing RHC (compared with no RHC) led to a decrease in survival.
We attempted to fit an additive hazards model using Aalen least-squares, adjusting for treatment and all 72 covariates. Random noise from a uniform distribution on (0,0.001) was first added to the survival times in order to break ties. We also forced the effects of all covariates (excluding the intercept) to be constant over time. The estimate of the coefficient for treatment in the final adjusted model was 0.00365 (SE = 0.00077, 95%CI 0.00213–0.00516). To assess the fitted model, we obtained predicted hazards at each time point (day 1, 2,…, 30); over 99% of the predicted hazards at the 25th, 50th, and 75th percentiles of the distribution of survival times were less than zero. We would be cautious about drawing inferences from such a model, as the misspecification (suggested by the invalid predictions) could lead to biased estimates of the treatment effect and misleading model-based standard errors. In view of this, we will next report estimates that rely on propensity scores instead.
We fitted a logistic regression model for treatment, adjusting for all 72 covariates. We obtained the predicted values from this model in order to estimate the effect of RHC using the estimator . We also postulated a simplified additive hazards model, which included treatment and the outcome adjustment variables listed above, and allowed their effects to depend on time. The purpose of fitting this model was to construct the doubly robust estimators and . For , we obtained the predictions after fitting a logistic regression model that included all 72 variables at each event time. R code is available in the Supplementary Materials (see also Web Appendix C).
The estimated hazard difference from the singly-robust estimator was 0.00406 (SE = 0.00084, 95%CI 0.00241–0.0057). The doubly robust estimator gave a hazard difference of 0.00334 (SE = 0.00074, 95%CI 0.0019–0.00479), and the estimator gave 0.00363 (SE = 0.00078, 95%CI 0.0021–0.00515). We also computed E{pr(T ≥ t|A = a, L)}, the average survival probability at time t if everyone received treatment a, standardized with respect to the observed distribution of L. This was done by taking sample averages of . The adjusted survival curves are plotted in Figure 1; compared with the standard Kaplan–Meier estimates, the difference between the treatment groups is slightly shrunken towards the null. This is consistent with the results of Connors et al. (1996) and Vermeulen and Vansteelandt (2015), where small differences between unadjusted and adjusted treatment effect estimates (in the same direction) were observed.
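As a small worked example, the Python sketch below converts the three reported hazard differences and their confidence limits into the relative chance of survival exp(−ψat) at the end of follow-up. It assumes that time is measured in days (so that t = 30), which is suggested by the thirty-day follow-up but is an assumption here, and it obtains interval limits by transforming the reported endpoints. The labels and function name are hypothetical.

```python
import math

def relative_chance_of_survival(psi, t, a=1.0):
    """exp(-psi * a * t): relative chance of surviving time t for exposure a versus 0."""
    return math.exp(-psi * a * t)

# (estimate, lower 95% limit, upper 95% limit) as reported in the text.
hazard_differences = {
    "singly robust": (0.00406, 0.00241, 0.0057),
    "doubly robust (first reported)": (0.00334, 0.0019, 0.00479),
    "doubly robust (second reported)": (0.00363, 0.0021, 0.00515),
}

t_days = 30  # assumed time unit: days
results = {}
for name, (est, lower, upper) in hazard_differences.items():
    # The transformation is decreasing in psi, so the interval limits swap.
    results[name] = (
        relative_chance_of_survival(est, t_days),
        relative_chance_of_survival(upper, t_days),
        relative_chance_of_survival(lower, t_days),
    )

for name, (rel, lo, hi) in results.items():
    print(f"{name}: {rel:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Values below 1 indicate a lower chance of surviving to day 30 with RHC than without, in line with the direction of the reported hazard differences.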
Figure 1.

Thirty-day survival curves. , and were used to compute E[pr(T ≥ t|A = a, L)], the average survival probability at time t if everyone received treatment a (RHC or no RHC), standardized with respect to the observed distribution of the covariates L.
7. Discussion
In this article, we have developed a theory of estimation for the adjusted hazard difference/relative chance of survival using semiparametric additive hazards models. We have used this theory to develop several classes of doubly robust estimators, each strategy with its own strengths and limitations. The closed-form estimators described in Section 4.1, which are consistent and asymptotically normal under model , have several important advantages over competing strategies: under omitted interactions and/or time effects in model , they converge to a convex combination of time/covariate-specific treatment effects (assuming that model is correct); since the term exp(ψAt) does not appear in the estimating functions, their behavior is likely to be more stable; and simulations suggest that these estimators are reasonably efficient (although information may be lost by ignoring the fact that model contains information about ψ). In Web Appendix D, we compare these estimators with those of Kang et al. (2018) in additional simulations, and find ours to be more efficient in a low-dimensional setting.
Although for simplicity we have considered a scalar treatment effect, in many settings ψ will be a vector of parameters. For instance, when the effect of A on the hazard is modified by Z, then the semiparametric model is now defined by the restriction
where ψ2 is a vector, and ψ1 and ψ2 give the adjusted effect of A within different levels of Z. We note that unlike the estimators of Wang et al. (2017), all of our proposals remain doubly robust outside of the ‘no treatment heterogeneity’ model (and are considerably more efficient under the homogeneous model by avoiding inverse probability weighting). If the vector-valued ψ = (ψ1, ψ2)ᵀ is estimated via the approach described in Section 4.1 that requires a model for E(A|T ≥ t, L), then to ensure a congenial model specification, the model must now also include Z, with its regression coefficient(s) allowed to depend on time.
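As a hedged illustration of the kind of restriction meant here, suppose the scalar model takes the form λ(t|A, L) = ψA + ω(t, L), with ω(t, L) left unspecified, and that Z is a subset of the covariates L. Both the functional form and the symbol ω are assumptions made for this sketch. An effect-modification version, together with the relative chance of survival it implies for a unit increase in A, could then be written as:

```latex
\lambda(t \mid A, L) = \psi_1 A + \psi_2^{T} A Z + \omega(t, L),
\qquad
\frac{\operatorname{pr}(T \ge t \mid A = a + 1, L)}
     {\operatorname{pr}(T \ge t \mid A = a, L)}
  = \exp\{-(\psi_1 + \psi_2^{T} Z)\, t\}.
```

Under this assumed form, the hazard difference at covariate level Z is ψ1 + ψ2ᵀZ, which reduces to the scalar ψ when ψ2 = 0.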
The proposals described in this article are closely related to the method of G-estimation (Robins and Tsiatis, 1991; Robins, 1994). Picciotto et al. (2012) recently introduced the class of discrete-time structural cumulative failure time models, along with accompanying G-estimators. They postulate a multiplicative model for the probability of failure, rather than survival. Note that all of the methods described above can be adjusted in order to estimate the relative chance of failure. In Web Appendix A, we further investigate the efficiency of Picciotto et al.’s estimators.
In future work, the new class of semiparametric additive hazards models and the accompanying doubly robust estimators will be extended to estimate controlled direct effects in the presence of mediators and/or time-varying confounders. Regarding mediation problems, Martinussen et al. (2011) previously considered the estimation of direct effects using additive hazards models; however, their approach was limited to settings with binary, randomly assigned treatments. By taking a semiparametric approach, the methods described in this article would be able to accommodate different types of treatment and adjustment for baseline covariates. Regarding the problem of time-varying confounding, such extensions would be useful in light of issues facing the two principal approaches used in survival analysis, marginal structural models (Robins et al., 2000) and structural accelerated failure time models (Robins and Tsiatis, 1991). Inference for the former requires inverse probability weighting; the resulting estimators can suffer from large finite-sample bias and imprecision due to highly variable weights. Marginal structural models also prohibit investigation into effect modification by time-varying covariates. G-estimation has turned out to be problematic for structural accelerated failure time models, because administrative censoring is dealt with through an artificial recensoring process which can induce a lack of smoothness in the estimating equations.
Acknowledgements
Oliver Dukes is supported by a Strategic Basic Research PhD grant from the Research Foundation—Flanders (FWO). Torben Martinussen’s work is part of the Dynamical Systems Interdisciplinary Network, University of Copenhagen. Eric J. Tchetgen Tchetgen is funded by the National Institutes of Health grant A1104459. The authors are grateful to Shaun Seaman for helpful discussions, and to Wenbin Lu and Suhyun Kang for providing R code.
Footnotes
Supplementary Materials
Web Appendices referenced in Sections 2–7 are available with this article at the Biometrics website on Wiley Online Library. R code for the data analysis is also available here.
References
- Aalen O (1980). A model for nonparametric regression analysis of counting processes. In Lecture Notes in Statistics, Vol. 2, 1–25. New York, NY: Springer New York.
- Bickel PJ, Klaassen CA, Ritov Y, and Wellner JA (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Series in the Mathematical Sciences. Baltimore: Johns Hopkins University Press.
- Connors AF, Speroff T, Dawson NV, Thomas C, Harrell FE, Wagner D, et al. (1996). The effectiveness of right heart catheterization in the initial care of critically ill patients. SUPPORT Investigators. JAMA 276, 889–897.
- Farrell MH (2015). Robust inference on average treatment effects with possibly more covariates than observations. Journal of Econometrics 189, 1–23.
- Hernán MA (2010). The hazards of hazard ratios. Epidemiology 21, 13–15.
- Kang S, Lu W, and Zhang J (2018). On estimation of the optimal treatment regime with the additive hazards model. Statistica Sinica, in press. DOI: 10.5705/ss.202016.0543.
- Martinussen T, Vansteelandt S, Gerster M, and Hjelmborg JvB (2011). Estimation of direct effects for survival data by using the Aalen additive hazards model. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, 773–788.
- McKeague IW and Sasieni PD (1994). A partly parametric additive risk model. Biometrika 81, 501–514.
- Picciotto S, Hernán MA, Page JH, Young JG, and Robins JM (2012). Structural nested cumulative failure time models to estimate the effects of interventions. Journal of the American Statistical Association 107, 886–900.
- Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods 23, 2379–2412.
- Robins JM, Hernán MA, and Brumback B (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560.
- Robins JM and Rotnitzky A (2001). Comments. Statistica Sinica 11, 920–936.
- Robins JM and Tsiatis AA (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods 20, 2609–2631.
- Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
- Scharfstein DO and Robins JM (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika 89, 617–634.
- Tchetgen Tchetgen EJ, Robins JM, and Rotnitzky A (2010). On doubly robust estimation in a semiparametric odds ratio model. Biometrika 97, 171–180.
- Vansteelandt S, Martinussen T, and Tchetgen Tchetgen EJ (2014). On adjustment for auxiliary covariates in additive hazard models for the analysis of randomized experiments. Biometrika 101, 237–244.
- Vermeulen K and Vansteelandt S (2015). Bias-reduced doubly robust estimation. Journal of the American Statistical Association 110, 1024–1036.
- Wang Y, Lee M, Liu P, Shi L, Yu Z, Abu Awad Y, et al. (2017). Doubly robust additive hazards models to estimate effects of a continuous exposure on survival. Epidemiology 28, 771–779.