2024 Mar 11;30(2):383–403. doi: 10.1007/s10985-024-09616-z

Bias of the additive hazard model in the presence of causal effect heterogeneity

Richard A J Post 1, Edwin R van den Heuvel 1, Hein Putter 2,3
PMCID: PMC10957647  PMID: 38466520

Abstract

Hazard ratios are prone to selection bias, compromising their use as causal estimands. On the other hand, if Aalen’s additive hazard model applies, the hazard difference has been shown to remain unaffected by the selection of frailty factors over time. Then, in the absence of confounding, observed hazard differences are equal in expectation to the causal hazard differences. However, in the presence of effect (on the hazard) heterogeneity, the observed hazard difference is also affected by selection of survivors. In this work, we formalize how the observed hazard difference (from a randomized controlled trial) evolves by selecting favourable levels of effect modifiers in the exposed group and thus deviates from the causal effect of interest. Such selection may result in a non-linear integrated hazard difference curve even when the individual causal effects are time-invariant. Therefore, a homogeneous time-varying causal additive effect on the hazard cannot be distinguished from a time-invariant but heterogeneous causal effect. We illustrate this causal issue by studying the effect of chemotherapy on the survival time of patients suffering from carcinoma of the oropharynx using data from a clinical trial. The hazard difference can thus not be used as an appropriate measure of the causal effect without making untestable assumptions.

Keywords: Causal inference, Causal hazard difference, Individual effect modification, Selection bias, Aalen additive hazard model

Introduction

Hazard ratios, often obtained by fitting a Cox proportional hazards model (Cox 1972), are the most common effect measures when dealing with time-to-event data. However, the hazard ratio is prone to selection bias due to conditioning on survival and therefore not suitable for causal inference (Hernán 2010; Aalen et al. 2015; Stensrud et al. 2017). It has been recommended to use other, better interpretable, estimands when interested in causal effects (Hernán 2010; Stensrud et al. 2018; Bartlett et al. 2020; Young et al. 2020). Alternatively, using the additive hazard model can avoid interpretation issues (Aalen et al. 2015; Martinussen et al. 2020). In the nonparametric model proposed by Aalen (1989), the hazard rate at time t for individual i with (possibly time-dependent) covariates xi(t) (of dimension p) is determined by the values of xi up until time t and equals

$$\lambda\left(t \mid \{x_i(s)\}_{s \le t}\right) = \beta_0(t) + \beta_1(t)x_{i1}(t) + \dots + \beta_p(t)x_{ip}(t),$$

where the parameters $\beta_j(t)$ are arbitrary regression functions, allowing time-varying effects (Aalen et al. 2008). Restricted versions have been proposed by Lin and Ying (1994) and McKeague and Sasieni (1994), where all or some $\beta_j(t)$ are assumed to be constant over time. The cumulative regression function, $B_j(t)=\int_0^t \beta_j(s)\,ds$, may reveal changes in effect over time; see for example Aalen et al. (2008, pp. 160-162).

For cause-effect relations that can be accurately described with Aalen’s additive hazard model, the hazard difference is a collapsible measure (Martinussen and Vansteelandt 2013). Then, in the absence of confounding, the hazard difference can be appropriately used to estimate the causal effect, even in the case of unmeasured risk factors (Aalen et al. 2015). For this, it is necessary that the exposure effect on the hazard does not depend on unmeasured individual features (modifiers) and thus is the same for all individuals. However, due to the fundamental problem of causal inference, the effect homogeneity assumption is untestable (Holland 1986). In our companion paper (Post et al. 2024), we showed that next to unmeasured risk factors, i.e. frailty (Aalen et al. 2008, Chapter 6), effect heterogeneity at the level of the individual hazard results in selection bias of observed hazard ratios.

In this work, we extend the additive hazard model studied in Aalen et al. (2015) by allowing heterogeneity of the effect (on the hazard) and quantify the bias of using the observed hazard difference when estimating the causal effect. In Sect. 2, we introduce notation to describe the cause-effect relations using a structural causal model for which we can define the causal hazard difference. In practice, when applying an additive hazard model, the observed hazard difference is modelled. We show that (in the absence of confounding) the expected value of the observed hazard equals a causal hazard marginalized over survivors. The expectation of the observed hazard difference thus equals a difference between hazards marginalized over survivors in the exposed and unexposed universe. By selection of individuals with favourable values of the effect modifier, this difference can deviate from the causal hazard difference, as we formalize in Sect. 3. We present numerical examples to illustrate how this selection can result in a non-linear integrated hazard difference curve that can be interpreted as reflecting a time-varying causal effect, while the actual individual causal effects are time-invariant. To emphasize why it is important to be aware of such a difference between the expected observed hazard difference and the causal hazard difference, we reflect on the analysis of the effect of treatment on survival with carcinoma of the oropharynx from the clinical trial in Sect. 4. Finally, we present some concluding remarks in Sect. 5.

Notation and hazard differences

Probability distributions of factual and counterfactual outcomes are defined in the potential outcome framework (Neyman 1923; Rubin 1974). Let $T_i$ and $A_i$ represent the (factual) stochastic outcome and exposure assignment level of individual $i$. Let $T_i^a$ equal the potential outcome of individual $i$ under the intervention of the exposure to level $a$ (counterfactual when $A_i \neq a$). For those more familiar with the do-calculus, $T^a$ is equivalent to $T \mid do(A=a)$ as e.g. derived in Pearl (2009a, Equation 40) and Bongers et al. (2021, Definition 8.6). Throughout this paper, we will assume causal consistency, i.e. if $A_i=a$, then $T_i^a=T_i^{A_i}=T_i$. Causal consistency implies that potential outcomes are independent of the assigned exposure levels of other individuals. The hazard rate of the potential outcome can vary among individuals due to heterogeneity in risk factors $U_0$, as also considered by Aalen et al. (2015). The hazard difference of the potential outcomes with and without ($a=0$) exposure might also vary among individuals due to an effect modifier $U_1$. Therefore, the hazard rate of individual $i$ at time $t$ of the potential outcome under exposure to level $a$ is a function of $U_{0i}$ and $U_{1i}$, and thus random, and equals

$$\lambda_i^a(t) = \lim_{h \to 0} h^{-1} P\left(T_i^a \in [t,t+h) \mid T_i^a \ge t, U_{0i}, U_{1i}\right). \tag{1}$$

The hazard of the potential outcome $T_i^a$ can be parameterized with a function that depends on $U_{0i}$, $U_{1i}$ and $a$.

We describe cause-effect relations with a structural causal model (SCM), which is commonly used in the causal graphical literature, see e.g. Pearl (2009b, Chapter 1.4) and Peters et al. (2018, Chapter 6), to model observations. Instead, we include details on the individual effect modifier $U_1$ as well as the latent common cause of the outcomes $U_0$, so that the SCM consists of a joint probability distribution of $(N_A,U_0,U_1,N_T)$ and a collection of structural assignments (for more details, see Post et al. (2024, Section 2)) such that

[SCM (2): the structural assignments are shown as an image in the source and are not reproduced here]

If in SCM (2), $N_A \not\perp\!\!\!\perp (U_0,U_1,N_T)$, then there exists confounding as the distributions of $(U_0,U_1,N_T)$ are not exchangeable between exposed and non-exposed individuals. However, in this work we focus on the distribution of data observed from a properly executed RCT, where by the randomization $N_A \perp\!\!\!\perp U_0,U_1,N_T$ so that there is no confounding. It is important to realize that an SCM cannot be validated with data as it describes potential outcomes from different universes. For each individual the outcome can only be observed in one of the universes, and only the fit of the distribution of the outcomes in the factual world can be verified. In SCM (2), we did not restrict the distribution of $U_0$ and $U_1$ and only restricted $f_\lambda$ and $f_A$ to be properly defined hazard and inverse cumulative distribution functions, respectively, so that the structural model is quite general. However, in this work we limit ourselves to cause-effect relations where the causal effect of the exposure is described by $\lambda_i^a(t)-\lambda_i^0(t)=f_1(t,U_{1i},a)$.

Hazard differences

If SCM (2) applies and $\lambda_i^a(t)-\lambda_i^0(t)=f_1(t,a)$, then the causal effect is equal for each individual, i.e. effect homogeneity, and Aalen’s additive hazard model applies. Otherwise, when $\lambda_i^a(t)-\lambda_i^0(t)=f_1(t,U_{1i},a)$, the difference differs among individuals so that $\mathbb{E}[\lambda_i^a(t)-\lambda_i^0(t)]$ will typically be the estimand of interest. The latter contrast equals the difference between the expected hazard rate in the world where everyone is exposed to $a$ and the world without exposure, and will therefore be referred to as the causal hazard difference (CHD) defined in Definition 1.

Definition 1

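Causal hazard difference. The CHD for cause-effect relations that can be parameterized with SCM (2) equals

$$\mathbb{E}[\lambda_i^a(t)]-\mathbb{E}[\lambda_i^0(t)] = \mathbb{E}[f_1(t,U_1,a)] = \int \lim_{h\to0} h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)dF_{U_0,U_1} - \int \lim_{h\to0} h^{-1}P\left(T^0\in[t,t+h) \mid T^0\ge t,U_0\right)dF_{U_0}.$$

Throughout this paper, we abbreviate the Lebesgue-Stieltjes integral of a function $g$ with respect to probability law $F_X$, $\int g(x)\,dF_X(x)$, as $\int g(X)\,dF_X$.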

The CHD thus equals the difference of hazard rates of the potential outcomes marginalized over the population distribution of $(U_0,U_1)$. However, the distribution of $(U_0,U_1)$ among survivors will differ over time (in all worlds), i.e. $(U_0,U_1) \overset{d}{\neq} (U_0,U_1) \mid T^a \ge t$. In turn, $(U_0,U_1) \mid T^a \ge t$ and $(U_0,U_1) \mid T^0 \ge t$ can differ in distribution. As a consequence, the hazard rates for the observed (factual) outcomes are affected by these conditional distributions of $U_0$ and $U_1$, and the observed hazard difference may reflect both the causal effect of exposure and a difference in distributions of $(U_0,U_1)$ between exposed and unexposed individuals. Since practitioners are typically interested in the causal effect alone, it is important to understand this mixture. Since the focus of this work is on estimands, for readability, we refer to the expected value of the difference of the observed hazards as the observed hazard difference (OHD) presented in Definition 2.

Definition 2

Observed hazard difference The OHD at time t equals

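$$\lim_{h\to0} h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=a\right) - \lim_{h\to0} h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=0\right) = \lim_{h\to0} \int h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=a,U_0,U_1\right)dF_{U_0,U_1 \mid T\ge t,A=a} - \lim_{h\to0} \int h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=0,U_0\right)dF_{U_0 \mid T\ge t,A=0}.$$

To be precise, at time $t$ the hazard rate can only be observed for non-censored individuals at that time ($C(t)=0$). However, in this work we will assume independent censoring, so that $P\left(T \mid T\ge t,A=a\right)$ is equal to $P\left(T \mid T\ge t,A=a,C(t)=0\right)$.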

To compare the OHD to the CHD of interest, the OHD should be expressed in terms of potential outcomes. By causal consistency,

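$$P\left(T\in[t,t+h) \mid T\ge t,A=a\right) = P\left(T^a\in[t,t+h) \mid T^a\ge t,A=a\right). \tag{3}$$

For a randomized controlled trial (RCT), where by design of the trial $A \perp\!\!\!\perp T^a$ (in SCM (2) equivalent to $N_A \perp\!\!\!\perp U_0,U_1,N_T$),

$$P\left(T^a\in[t,t+h) \mid T^a\ge t,A=a\right) = P\left(T^a\in[t,t+h) \mid T^a\ge t\right). \tag{4}$$

The OHD at time $t$ is then equal to

$$\lim_{h\to0} h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t\right) - \lim_{h\to0} h^{-1}P\left(T^0\in[t,t+h) \mid T^0\ge t\right). \tag{5}$$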

We refer to (5) as the survivor marginalized causal hazard difference (SMCHD) that is rewritten in Definition 3.

Definition 3

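Survivor marginalized causal hazard difference. The SMCHD at time $t$ for cause-effect relations that can be parameterized with SCM (2) equals

$$\lim_{h\to0}\left(\int h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)dF_{U_0,U_1 \mid T^a\ge t} - \int h^{-1}P\left(T^0\in[t,t+h) \mid T^0\ge t,U_0\right)dF_{U_0 \mid T^0\ge t}\right).$$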

As the integration in Definition 1 is with respect to the population distribution of $(U_0,U_1)$ (instead of that of the survivors in the exposed and unexposed universe, respectively), the SMCHD is thus affected by the difference in distribution between $(U_0,U_1)$ and $(U_0,U_1) \mid T^a \ge t$ as well as the actual causal effect. Therefore, the SMCHD can deviate from the CHD. Nevertheless, Aalen et al. (2015) explained that $U_0 \perp\!\!\!\perp A \mid T \ge t$ so that for degenerate $U_1$, the SMCHD is only affected by the causal effect and equals the CHD, so that the latter can be unbiasedly estimated from RCT data. In the next section, we formalize the SMCHD in case of effect heterogeneity (non-degenerate $U_1$) and show that [inline formula shown as an image in the source], so that then the SMCHD deviates from the CHD.

Results

In the remainder of the paper, we will focus on binary exposures such that $a\in\{0,1\}$. In this section we quantify how the SMCHD describes both the causal effect and the difference in distribution of $(U_0,U_1)$ between survivors in the exposed and unexposed universes.

For cause-effect relations where SCM (2) applies with $\lambda_i^a(t)=f_0(t,U_{0i})+f_1(t,a)$, it is known from Aalen et al. (2015) that for an RCT $U_{0i} \perp\!\!\!\perp A_i \mid T_i \ge t$, so that $U_0$ remains exchangeable between exposed ($U_0 \mid T\ge t,A=1$) and nonexposed ($U_0 \mid T\ge t,A=0$) survivors. This independence, causal consistency, and the absence of confounding in an RCT ($T^a \perp\!\!\!\perp A$) imply

$$U_0 \mid T\ge t \overset{d}{=} U_0 \mid T\ge t,A=a \overset{d}{=} U_0 \mid T^a\ge t,A=a \overset{d}{=} U_0 \mid T^a\ge t.$$

Thus, in the absence of effect heterogeneity of the hazard difference, $U_0$ is exchangeable between survivors in the exposed ($U_0 \mid T^1\ge t$) and unexposed ($U_0 \mid T^0\ge t$) universes, so that the OHD from an RCT (that equals the SMCHD) equals the CHD and describes the causal effect.

However, if heterogeneity exists, there will also be a selection of the modifier (U1) in the exposed universe, where individuals with more favourable levels of U1 are more likely to survive. As a result of this selection, the SMCHD over time no longer represents the (population) average effect. For the main result of this paper, we consider hazard functions that satisfy Condition 1.

Condition 1

Hazard without infinite discontinuity

$$\forall t>0:\ \exists \tilde{h}>0 \text{ such that } \forall h\in(0,\tilde{h}):\ \mathbb{E}\left[f_0(t+h,U_0)+f_1(t+h,U_1,a) \mid T^a\ge t\right]<\infty.$$

In Theorem 1, we show that the SMCHD, for a hazard function that satisfies Condition 1, can be expressed in terms of conditional expectations of $f_1(t,U_1,1)$ and $f_0(t,U_0)$. In the presence of effect heterogeneity, the SMCHD thus deviates from the CHD, which equals $\mathbb{E}[f_1(t,U_1,1)]$.

Theorem 1

If the cause-effect relations of interest can be parameterized with SCM (2), where

$$\lambda_i^a(t) := f_0(t,U_{0i})+f_1(t,U_{1i},a),$$

and Condition 1 applies, then the SMCHD at time $t$ equals

$$\mathbb{E}\left[f_1(t,U_1,1) \mid T^1\ge t\right]+\mathbb{E}\left[f_0(t,U_0) \mid T^1\ge t\right]-\mathbb{E}\left[f_0(t,U_0) \mid T^0\ge t\right].$$

To illustrate how the SMCHD can deviate from the CHD, we continue by presenting some examples and apply Theorem 1. All programming code used for these examples can be found online at https://github.com/RAJP93/CHD. First, we consider cause-effect relations for which $U_0 \perp\!\!\!\perp U_1$.

Independent U0 and U1

As discussed at the start of this section, Aalen et al. (2015) implicitly showed that $U_0 \mid T^1\ge t \overset{d}{=} U_0 \mid T^0\ge t$ in the absence of effect heterogeneity of the hazard difference. Based on similar arguments, Lemma 1 states that the additive frailty is also exchangeable in the presence of effect heterogeneity at the hazard scale that is independent of the frailty.

Lemma 1

If the cause-effect relations of interest can be parameterized with SCM (2), where

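$$\lambda_i^a(t) := f_0(t,U_{0i})+f_1(t,U_{1i},a),$$

and $U_{0i} \perp\!\!\!\perp U_{1i}$, then

$$\mathbb{E}\left[f_0(t,U_0) \mid T^1\ge t\right]=\mathbb{E}\left[f_0(t,U_0) \mid T^0\ge t\right].$$

Note that while $\mathbb{E}[f_0(t,U_0) \mid T^1\ge t]=\mathbb{E}[f_0(t,U_0) \mid T^0\ge t]$, $\mathbb{E}[f_0(t,U_0) \mid T^a\ge t]\neq\mathbb{E}[f_0(t,U_0)]$, as the conditional expectations will decrease over time, representing the survival of less susceptible individuals. If $U_0 \perp\!\!\!\perp U_1$, as for the case of effect homogeneity, $U_0$ is thus exchangeable between survivors in the exposed ($U_0 \mid T^1\ge t$) and unexposed ($U_0 \mid T^0\ge t$) universes. By Theorem 1 and Lemma 1, the SMCHD at time $t$ now equals $\mathbb{E}[f_1(t,U_1,1) \mid T^1\ge t]$.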
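Lemma 1 can be checked exactly on a small discrete example. The sketch below is not taken from the authors' repository; it assumes, purely for illustration, time-constant hazards $f_0(t,U_0)=U_0$ and $f_1(t,U_1,a)=U_1 a$, a two-point distribution for each of $U_0$ and $U_1$, and hypothetical names such as `mean_f0_among_survivors`. Under independence the survivor-conditional mean of $f_0$ agrees between the exposed and unexposed worlds; with a dependent joint distribution it does not.

```python
import math


def mean_f0_among_survivors(t, a, joint_pmf):
    """E[f0(t,U0) | T^a >= t] for the illustrative choice f0 = U0, f1 = U1 * a.

    joint_pmf maps (u0, u1) to P(U0 = u0, U1 = u1).
    """
    numerator = 0.0
    denominator = 0.0
    for (u0, u1), prob in joint_pmf.items():
        # Survival probability given (u0, u1): exp(-cumulative hazard).
        survival = math.exp(-(u0 + u1 * a) * t)
        numerator += u0 * prob * survival
        denominator += prob * survival
    return numerator / denominator


# Independent U0 and U1 (product of two-point marginals, assumed values).
independent = {(0.5, -0.25): 0.25, (0.5, 1.0): 0.25,
               (2.0, -0.25): 0.25, (2.0, 1.0): 0.25}
# Same marginals, but positively dependent.
dependent = {(0.5, -0.25): 0.4, (0.5, 1.0): 0.1,
             (2.0, -0.25): 0.1, (2.0, 1.0): 0.4}

for label, pmf in (("independent", independent), ("dependent", dependent)):
    exposed = mean_f0_among_survivors(1.0, 1, pmf)
    unexposed = mean_f0_among_survivors(1.0, 0, pmf)
    print(label, round(exposed, 4), round(unexposed, 4))
```

In both settings the conditional means lie below the population mean of $U_0$ (1.25 here), which reflects the frailty selection described above.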

Let us consider cause-effect relations for which SCM (2) applies with

$$f_1(t,U_{1i},a)=U_{1i}f_1(t,a), \tag{6}$$

where $f_1(t,0)=0$, then $\mathbb{E}[f_1(t,U_1,1) \mid T^1\ge t]=f_1(t,a)\,\mathbb{E}[U_1 \mid T^1\ge t]$. By Definition 1, the CHD equals $f_1(t,a)\,\mathbb{E}[U_1]$, so that the difference with the SMCHD varies over time and equals

$$f_1(t,a)\left(\mathbb{E}[U_1]-\mathbb{E}[U_1 \mid T^1\ge t]\right). \tag{7}$$

For this multiplicative case, the conditional expectation $\mathbb{E}[U_1 \mid T^1\ge t]$ can be expressed in terms of the Laplace transform of $U_1$, as stated in Lemma 2.

Lemma 2

If the cause-effect relations of interest can be parameterized with SCM (2), where

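$$f_1(t,U_{1i},1)=U_{1i}f_1(t,1),$$

and $U_{0i} \perp\!\!\!\perp U_{1i}$, then

$$\mathbb{E}\left[U_1 \mid T^1\ge t\right]=-\frac{L'_{U_1}\left(\int_0^t f_1(s,1)\,ds\right)}{L_{U_1}\left(\int_0^t f_1(s,1)\,ds\right)},$$

where $L_{U_1}(c)=\mathbb{E}\left[\exp(-cU_1)\right]$ with derivative $L'_{U_1}(c)$.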

We continue to illustrate how effect heterogeneity can affect the integrated hazard difference when the causal effect is time-invariant for each individual. To do so, we let the additive hazard effect modifier $U_1$ equal $\mu_1$ ($\le 0$, for individuals that benefit) with probability $p_1$, $\mu_2$ ($\ge 0$, for individuals that are harmed) with probability $p_2$, or $0$ (for individuals that are not affected). We denote this distribution as the Benefit-Harm-Neutral, $\text{BHN}(p_1,\mu_1,p_2,\mu_2)$, distribution. Note that it is necessary that $\forall t: P(f_0(t,U_0)<|\mu_1|)=0$ to guarantee that the hazard rate is always positive for each individual. By Theorem 1 and Lemma 2 with $f_1(t,1)=1$, the SMCHD is equal to

$$\mathbb{E}\left[U_1 \mid T^1\ge t\right]=\frac{\mu_1p_1\exp(-t\mu_1)+\mu_2p_2\exp(-t\mu_2)}{p_1\exp(-t\mu_1)+p_2\exp(-t\mu_2)+1-p_1-p_2},$$

and deviates from the constant CHD equal to $\mathbb{E}[U_1]=p_1\mu_1+p_2\mu_2$. For an RCT, the OHD equals the SMCHD, so that the integrated OHD equals

$$B(t)=\int_0^t\mathbb{E}\left[U_1 \mid T^1\ge s\right]ds=-\log\left(p_1(\exp(-t\mu_1)-1)+p_2(\exp(-t\mu_2)-1)+1\right).$$

Thus, although the CHD is time-invariant, due to the selection (of $U_1$) effect over time $B(t)$ will not be linear and deviates from the function $g(t)=t\,\mathbb{E}[U_1]$. Three types of curves could be observed as shown in Fig. 1 where, for illustration, $p_1=p_2=0.5$, so there exist only two levels for the modifier $U_1$.

Fig. 1. $\int_0^t\mathbb{E}[U_1 \mid T^1\ge s]\,ds$, when $U_1\sim\text{BHN}(p_1,\mu_1,p_2,\mu_2)$ (solid), and $g(t)=t\,\mathbb{E}[U_1]$ (dashed)

First of all, let’s consider the case where the exposure harms some individuals (for which U1i=1) while others do not respond to the exposure at all (U1i=0); see the orange line in Fig. 1. Initially, B(t) evolves as tE[U1]=0.5t. However, the individuals harmed by the exposure are less likely to survive over time, so the curve’s derivative decreases. In the end, only individuals with U1i=0 are expected to survive so that B(t) remains constant. Concluding that the exposure initially harms but loses effect over time is false for this case as the effect is time-invariant for each individual.

Secondly, when some individuals do benefit (U1i=-0.25) while others are not affected (U1i=0), the derivative of B(t) evolves from -0.125 to -0.25 at the moment only those that benefit are expected to survive, as illustrated with the purple line in Fig. 1. The effect for an individual is again constant and does not become more beneficial over time.

Finally, different individuals in the population might have opposite effects ($U_{1i}=1$ or $U_{1i}=-0.1$), as illustrated with the pink line in Fig. 1. Initially, the integrated hazard differences increase as the expected effect is harmful. However, over time those individuals with $U_{1i}=-0.1$ are more likely to survive so that $\mathbb{E}[U_1 \mid T^1\ge t]$ changes sign. Finally, only those that benefit are expected to survive, and the curve decreases with a derivative equal to $-0.1$. For this example, it would be false to conclude that the exposure first harms but becomes beneficial over time.
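The three scenarios of Fig. 1 can be reproduced numerically from the closed-form expressions for the SMCHD and $B(t)$ under the BHN distribution. The following is a minimal sketch, not the authors' code; the function names `bhn_smchd` and `bhn_integrated` and the scenario labels are introduced here for illustration, and $f_1(t,1)=1$ with $U_0$ independent of $U_1$ is assumed as in the text.

```python
import math


def bhn_smchd(t, p1, mu1, p2, mu2):
    """E[U1 | T^1 >= t] for U1 ~ BHN(p1, mu1, p2, mu2), f1(t,1) = 1."""
    w1 = p1 * math.exp(-t * mu1)   # weight of the survivors with U1 = mu1
    w2 = p2 * math.exp(-t * mu2)   # weight of the survivors with U1 = mu2
    w0 = 1.0 - p1 - p2             # weight of the survivors with U1 = 0
    return (mu1 * w1 + mu2 * w2) / (w1 + w2 + w0)


def bhn_integrated(t, p1, mu1, p2, mu2):
    """B(t): the SMCHD integrated from 0 to t (closed form)."""
    return -math.log(p1 * (math.exp(-t * mu1) - 1.0)
                     + p2 * (math.exp(-t * mu2) - 1.0) + 1.0)


# Parameter values (p1, mu1, p2, mu2) matching the three cases in the text.
scenarios = {
    "harm or no effect": (0.5, 0.0, 0.5, 1.0),
    "benefit or no effect": (0.5, -0.25, 0.5, 0.0),
    "benefit or harm": (0.5, -0.1, 0.5, 1.0),
}

for label, params in scenarios.items():
    chd = params[0] * params[1] + params[2] * params[3]  # E[U1], constant in t
    print(label,
          "CHD:", chd,
          "SMCHD(0):", round(bhn_smchd(0.0, *params), 4),
          "SMCHD(200):", round(bhn_smchd(200.0, *params), 4),
          "B(5):", round(bhn_integrated(5.0, *params), 4))
```

At $t=0$ the SMCHD coincides with the CHD in each scenario, and for large $t$ it approaches the effect of the subgroup with the most favourable value of $U_1$.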

Similar patterns can be observed for a continuous $U_1$ distribution, in which case $\mathbb{E}[U_1 \mid T^1\ge t]$ will keep decreasing, as for example presented in Appendix B.

In summary, if $U_0 \perp\!\!\!\perp U_1$, then the SMCHD will be less than or equal to the CHD due to the selection of $U_1$. Therefore, decreasing or constant $B(t)$ curves that at some point increase again cannot be explained by the selection of $U_1$, since individuals with less beneficial values of $U_1$ are expected to survive shorter. However, if $U_0 \not\perp\!\!\!\perp U_1$, such a pattern of the $B(t)$ curve can still occur when the CHD is time-invariant, as we will show next.

Dependent U0 and U1

The bivariate joint distribution function of U0 and U1, F(U0,U1), can be written using the marginal distribution functions and a copula C (Sklar 1959). As such,

$$F_{(U_0,U_1)}(u_0,u_1)=C\left(F_{U_0}(u_0),F_{U_1}(u_1)\right)$$

and the Kendall’s τ correlation coefficient of U0 and U1 can be written as a function of the copula (Nelsen 2006). For the next example, we consider cause-effect relations for which SCM (2) applies with

$$f_0(t,U_{0i})=\ell+U_{0i}t^2, \tag{8}$$

and again

$$f_1(t,U_{1i},a)=U_{1i}a, \tag{9}$$

while $U_{0i}\sim\Gamma(1,1)$ and $(U_{1i}+\ell)\sim\Gamma(1,1)$, so that the hazard is nonnegative for each individual. To illustrate how the dependence can affect the integrated SMCHD for the settings presented in Fig. 5 in Appendix B, we use a Gaussian copula

$$C(x,y)=\Phi_{2,\rho}\left(\Phi^{-1}(x),\Phi^{-1}(y)\right),$$

where $\Phi$ and $\Phi_{2,\rho}$ are the standard normal and standard bivariate normal (with correlation $\rho$) cumulative distribution functions, respectively. In Fig. 2, for $\rho\in\{-1,\sin(-0.25\pi),0,\sin(0.25\pi),1\}$ (such that $\tau\in\{-1,-0.5,0,0.5,1\}$) and $\ell\in\{0,0.5,1\}$, we present the integrated SMCHD at time $t$, which equals $\int_0^t\left(\mathbb{E}[U_1 \mid T^1\ge s]+\mathbb{E}[U_0s^2 \mid T^1\ge s]-\mathbb{E}[U_0s^2 \mid T^0\ge s]\right)ds$ by Theorem 1. The conditional expectations are derived empirically from simulations ($n=10{,}000$), and the integral is approximated by taking discrete steps of size 0.1. For completeness, the survival curves of the potential outcomes can be found in Fig. 6 in Appendix C.
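The computation described above can be sketched as follows. This is an illustrative variant and not the authors' implementation (which is available in the repository cited earlier): instead of simulating event times, it weights each simulated pair $(U_0,U_1)$ by its conditional survival probability, which estimates the same conditional expectations. The function names, the seed, and the trapezoidal rule are assumptions made here; only the Python standard library is used, and $\Gamma(1,1)$ is treated as the unit exponential distribution.

```python
import math
import random
from statistics import NormalDist


def draw_modifiers(n, rho, ell, seed=2024):
    """Draw (U0, U1): Gamma(1,1) margins (U1 shifted by -ell), Gaussian copula."""
    rng = random.Random(seed)
    std_normal_cdf = NormalDist().cdf
    noise_scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    draws = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rho * z0 + noise_scale * rng.gauss(0.0, 1.0)
        # Exp(1) quantile of Phi(z): -log(1 - Phi(z)) = -log(Phi(-z)).
        u0 = -math.log(std_normal_cdf(-z0))
        u1 = -math.log(std_normal_cdf(-z1)) - ell
        draws.append((u0, u1))
    return draws


def smchd(s, draws, ell):
    """Theorem 1 expression at time s, using survival probabilities as weights."""
    sum_w0 = sum_w1 = sum_w0_f0 = sum_w1_f0 = sum_w1_u1 = 0.0
    for u0, u1 in draws:
        f0 = ell + u0 * s * s
        w0 = math.exp(-(ell * s + u0 * s ** 3 / 3.0))  # P(T^0 >= s | U0)
        w1 = w0 * math.exp(-u1 * s)                    # P(T^1 >= s | U0, U1)
        sum_w0 += w0
        sum_w1 += w1
        sum_w0_f0 += w0 * f0
        sum_w1_f0 += w1 * f0
        sum_w1_u1 += w1 * u1
    return sum_w1_u1 / sum_w1 + sum_w1_f0 / sum_w1 - sum_w0_f0 / sum_w0


def integrated_smchd(t, draws, ell, step=0.1):
    """Trapezoidal approximation of the integral of the SMCHD over [0, t]."""
    n_steps = int(round(t / step))
    values = [smchd(i * step, draws, ell) for i in range(n_steps + 1)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


if __name__ == "__main__":
    for rho in (0.0, math.sin(0.25 * math.pi)):
        sample = draw_modifiers(10_000, rho, ell=0.0)
        print("rho =", round(rho, 3), "B(2) =", round(integrated_smchd(2.0, sample, 0.0), 3))
```

For $\rho=0$ and $\ell=0$ the result can be compared with the closed form $\log(t+1)$ from Appendix B, which gives a simple sanity check on the weighting scheme.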

Fig. 5. $\int_0^t\mathbb{E}[U_1 \mid T^1\ge s]\,ds$ (thick lines), when $U_1+\ell\sim\Gamma(1,1)$, and $g(t)=t\,\mathbb{E}[U_1]$ (transparent lines)

Fig. 2. Integrated hazard difference, $B(t)$, when $f_0(t,U_{0i})=\ell+U_{0i}t^2$, $U_{0i}\sim\Gamma(1,1)$, $(U_{1i}+\ell)\sim\Gamma(1,1)$ for $\ell$ equal to 0 (left), $\tfrac{1}{2}$ (middle) and 1 (right) and different Kendall’s $\tau$ for $U_{0i}$ and $U_{1i}$. The lines for $\tau=0$ were already presented in Fig. 5 in Appendix B. Furthermore, $g(t)=t\,\mathbb{E}[U_1]$ are presented as gray lines

Fig. 6. Survival curves for $Y^1$ and $Y^0$ when $f_0(t,U_{0i})=\ell+U_{0i}t^2$, $U_{0i}\sim\Gamma(1,1)$, $(U_{1i}+\ell)\sim\Gamma(1,1)$ for $\ell$ equal to 0 (left), $\tfrac{1}{2}$ (middle) and 1 (right) and different Kendall’s $\tau$ for $U_{0i}$ and $U_{1i}$

The difference between the integrated OHD and SMCHD increases when $\tau>0$ (compared to $\tau=0$). On the other hand, for $\tau<0$, the difference is smaller most of the time than for $\tau=0$ since favourable $U_1$ are expected to occur with unfavourable levels of $U_0$. Moreover, for $\tau=-1$, at larger $t$, we observe that the difference can even change sign. For $\tau\neq0$, the SMCHD might thus be larger than the CHD, so the SMCHD is not a theoretical lower bound for the CHD. Note that if $U_0 \not\perp\!\!\!\perp U_1$, the integrated SMCHD depends on the functional form of $f_0$. In Fig. 7 in Appendix C, the results for $f_0(t,U_{0i})=\ell+U_{0i}\tfrac{t^2}{20}$ are presented, where the effect of the dependence is limited, and the corresponding survival curves of the potential outcomes are presented in Fig. 8.

Fig. 7. $B(t)$, when $f_0(t,U_{0i})=\ell+U_{0i}\tfrac{t^2}{20}$, $U_{0i}\sim\Gamma(1,1)$, $(U_{1i}+\ell)\sim\Gamma(1,1)$ for $\ell$ equal to 0 (left), $\tfrac{1}{2}$ (middle) and 1 (right) and different Kendall’s $\tau$ for $U_{0i}$ and $U_{1i}$. The lines for $\tau=0.5$ and $\tau=1$ do overlap, and the lines for $\tau=0$ were already presented in Fig. 5. Furthermore, $g(t)=t\,\mathbb{E}[U_1]$ are presented as gray lines

Fig. 8. Survival curves for $Y^1$ and $Y^0$ when $f_0(t,U_{0i})=\ell+U_{0i}\tfrac{t^2}{20}$, $U_{0i}\sim\Gamma(1,1)$, $(U_{1i}+\ell)\sim\Gamma(1,1)$ for $\ell$ equal to 0 (left), $\tfrac{1}{2}$ (middle) and 1 (right) and different Kendall’s $\tau$ for $U_{0i}$ and $U_{1i}$

Case study: the Radiation Therapy Oncology Group trial

With the findings of the previous section, we will reflect on a data analysis of an actual case study to illustrate why it is important for a practitioner to be aware of the possible difference between the SMCHD and CHD. We consider a large clinical trial carried out by the Radiation Therapy Oncology Group as described by Kalbfleisch and Prentice (2002, Section 1.1.2 and Appendix A) and also presented by Aalen (1989). From the patients with squamous cell carcinoma (a form of skin cancer) of 15 sites in the mouth and throat from 16 participating institutions, our focus is only on two sites (faucial arch and pharyngeal tongue) and patients from the six largest institutions. All participants were randomly assigned to radiation therapy alone or combined with a chemotherapeutic agent. So, we are interested in the causal effect of the chemotherapeutic agent in addition to radiation therapy on survival. If the causal mechanism can be parameterized with SCM (2) without effect heterogeneity, i.e. 

$$\lambda_i^a(t)=f_0(t,U_{0i})+f_1(t,a),$$

and the randomization was properly executed, implying $N_{Ai} \perp\!\!\!\perp U_{0i}$, then, by Theorem 1, the OHD equals the CHD. Moreover, the CHD can be unbiasedly estimated by fitting Aalen’s additive hazard model. We did so by using the aalen() function from the package timereg in R. The estimated cumulative regression function (and a corresponding 95% confidence interval) of treatment combined with a chemotherapeutic agent is presented by the black lines in Fig. 3.

Fig. 3. Estimated $B(t)$ and corresponding 95% confidence interval (black). Furthermore, the expected evolution of $B(t)$ when $\lambda_i^a(t)=f_0(t,U_{0i})+U_{1i}a$ and $U_{1i}\sim\text{BHN}(0.5,-0.1,0.5,0.4)$ is presented (green)

In the absence of effect heterogeneity (ignoring the statistical uncertainty), one could now conclude that initially adding the chemotherapy is expected to harm a patient, as $B(t)$ takes on positive values, and that the exposure loses its harmful effect over time, as the derivative of $B(t)$ decreases over time. Following a similar reasoning, Aalen et al. (2008, pp. 160-161) discuss a conclusion on the effect of N-stage (an index of lymph node metastasis) on survival that may be drawn by practitioners based on the same dataset (while also including patients with a tumour located at the tonsillar fossa): “The regression plot shows that this [non-significant P-value for a zero-effect of N-stage from a Cox analysis] is due to a strong initial positive effect being ‘watered down’ by a lack of, or even a slightly negative effect after one year. Hence, not taking into consideration the change in effect over time may lead to missing significant effects.” However, if in reality

$$\lambda_i^a(t)=f_0(t,U_{0i})+f_1(t,U_{1i},a),$$

where $f_1(t,U_{1i},0)=0$, the observed time-varying effect can also result from selection on the modifier $U_{1i}$. For example, when $\lambda_i^a(t)=f_0(t,U_{0i})+U_{1i}a$, and $U_{1i}\sim\text{BHN}(0.5,-0.1,0.5,0.4)$, by Theorem 1, this pattern is expected (see the green line in Fig. 3) while the actual causal effect is time-invariant for each individual. The CHD equals 0.15 at each time point, but over time individuals that are harmed by the chemotherapy ($U_{1i}=0.4$) are less likely to survive so that the SMCHD converges towards $-0.1$ (the effect for individuals that benefit from the chemotherapy). When we perform a stratified analysis by site in the oropharynx (where randomization remains), we observe that chemotherapy might have opposite effects for tumours located in the faucial arch and on the pharyngeal tongue, see Fig. 4. The tumour location could thus be the individual modifier underlying the BHN distribution. For this case study, we cannot be sure whether the effect of chemotherapy depends on the tumour location due to statistical uncertainty. However, it became clear that when statistical uncertainty is not the issue, it will be impossible to distinguish between a time-varying causal effect and a selection effect (of an unmeasured modifier) from data. Both phenomena can give rise to the same $B(t)$.
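The numbers quoted for this example follow directly from the BHN expressions of Sect. 3. As a worked check, assuming $f_1(t,1)=1$ and $U_0 \perp\!\!\!\perp U_1$ (so that Lemma 2 applies), and writing $t^\ast$ for the time at which the SMCHD changes sign (notation introduced here, expressed in the time unit of the model):

```latex
\begin{align*}
\text{CHD} &= \mathbb{E}[U_1] = 0.5\,(-0.1) + 0.5\,(0.4) = 0.15,\\
\text{SMCHD}(t) &= \mathbb{E}[U_1 \mid T^1 \ge t]
  = \frac{-0.05\,e^{0.1t} + 0.2\,e^{-0.4t}}{0.5\,e^{0.1t} + 0.5\,e^{-0.4t}}
  \;\xrightarrow[t\to\infty]{}\; -0.1,\\
B(t) &= -\log\left(0.5\,e^{0.1t} + 0.5\,e^{-0.4t}\right),\\
\text{SMCHD}(t^\ast) = 0 &\iff e^{0.5\,t^\ast} = 4 \iff t^\ast = 2\log 4 \approx 2.77 .
\end{align*}
```

Under these assumptions $B(t)$ therefore rises with initial slope 0.15, peaks at $t^\ast$ with $B(t^\ast)\approx 0.19$, and afterwards declines with a slope approaching $-0.1$, even though every individual effect is constant in time.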

Fig. 4. Estimated $B(t)$ and corresponding 95% confidence interval (black) for patients with tumours located at the faucial arch (left) and pharyngeal tongue (right), respectively. Furthermore, the $B(t)$ for a homogeneous population is presented (green), equal to $0.4t$ (left) and $-0.1t$ (right), for comparison

Discussion

The additive hazard model gives better interpretable estimates of causal effects than the proportional hazard model (Aalen et al. 2015). As discussed by Aalen et al. (2015), the model assumes that the additive part of the hazard involving the exposure (or treatment) is not affected by any other individual feature. Otherwise, if such effect heterogeneity at the hazard scale exists, we have shown that the SMCHD deviates from the CHD of interest. For an RCT, and independent censoring, a time-varying observed hazard difference can be the result of either an actual time-varying causal effect or of the selection of favourable effect-modifier levels over time. Therefore, it is impossible to distinguish these scenarios based on data without making untestable assumptions. It is important to remark that for cause-effect relations that can be parameterized with SCM (2) where U1 is degenerate (in which case the OHD equals the CHD), contrary to the individual hazard differences, the difference of the potential survival times, T1-T0 can be heterogeneous. So, heterogeneous effects can still exist under Aalen’s additive hazard model.

In the presented examples and the case study, we have illustrated that one should be very careful when concluding that the effect decreases over time based on the cumulative regression function, as this might result from the selection. The size of the bias depends on how much the distribution $F_{U_1 \mid T^1\ge t}$ changes over time. If $U_1$ is low in variability, the bias will be small. When analyzing data from an RCT with an additive hazard model, it can thus be helpful to adjust for potential effect modifiers to reduce the remaining variability of unmeasured effect modifiers. We want to remark that for cause-effect relations that cannot be described by SCM (2), the CHD is not the appropriate estimand to quantify the causal effect, which is then a more serious concern than the complicated causal interpretation of the observed hazard difference.

Even in the absence of confounding, the hazard difference and the hazard ratio (as discussed in Post et al. (2024)) have a difficult causal interpretation. Instead, contrasts of the survival probabilities, the median, or the restricted mean survival time, have clear causal interpretations and should thus be used to quantify the causal effect on time-to-event outcomes as suggested by others (Hernán 2010; Stensrud et al. 2018; Bartlett et al. 2020; Young et al. 2020). Nevertheless, (additive) hazard models can still be used for causal inference to derive these appropriate estimands (Ryalen et al. 2018).

Appendix A: Proofs

Appendix A.1: Proof of Theorem 1

Proof

By causal consistency (Hernán and Robins 2020),

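$$\lim_{h\to0}h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=a\right)=\lim_{h\to0}h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,A=a\right).$$

As $N_A \perp\!\!\!\perp U_0,U_1,N_T$, there is no confounding and $T^a \perp\!\!\!\perp A$, so that

$$\lim_{h\to0}h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=a\right)=\lim_{h\to0}h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t\right).$$

By the law of total probability,

$$\lim_{h\to0}h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t\right)=\lim_{h\to0}\int h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)dF_{U_0,U_1 \mid T^a\ge t}.$$

First, we focus on the integrand,

$$\begin{aligned}
h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)
&=h^{-1}\,\frac{P\left(T^a\ge t \mid U_0,U_1\right)-P\left(T^a\ge t+h \mid U_0,U_1\right)}{P\left(T^a\ge t \mid U_0,U_1\right)}\\
&=h^{-1}\left(1-\frac{P\left(T^a\ge t+h \mid U_0,U_1\right)}{P\left(T^a\ge t \mid U_0,U_1\right)}\right)\\
&=h^{-1}\left(1-\frac{\exp\left(-\int_0^{t+h}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)}{\exp\left(-\int_0^{t}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)}\right)\\
&=h^{-1}\left(1-\exp\left(-\int_t^{t+h}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)\right).
\end{aligned}$$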

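For monotonic conditional hazard functions, if $h_2<h_1$, then

$$h_1^{-1}\left(1-\exp\left(-\int_t^{t+h_1}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)\right)\ge h_2^{-1}\left(1-\exp\left(-\int_t^{t+h_2}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)\right)$$

or

$$h_1^{-1}\left(1-\exp\left(-\int_t^{t+h_1}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)\right)\le h_2^{-1}\left(1-\exp\left(-\int_t^{t+h_2}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)ds\right)\right)$$

as the average integrated conditional hazard over the interval increases (or decreases). Moreover,

$$\lim_{h\to0}h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)=f_0(t,U_0)+f_1(t,U_1,a)\ge0.$$

Then, the limit and integral can be interchanged by directly applying the monotone convergence theorem.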

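For non-monotone conditional hazard functions, when Condition 1 applies, for every $t$, there exists an $\tilde{h}$ so that $\forall h\in(0,\tilde{h}): \mathbb{E}[f_0(t+h,U_0)+f_1(t+h,U_1,a) \mid T^a\ge t]<\infty$. Moreover, let $t'=\arg\max_{s\in(t,t+\tilde{h})}\left(f_0(s,U_0)+f_1(s,U_1,a)\right)$, so that for $h<\tilde{h}$:

$$h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)\le h^{-1}\left(1-\exp\left(-h\left(f_0(t',U_0)+f_1(t',U_1,a)\right)\right)\right).$$

Using the power series definition of the exponential function,

$$\begin{aligned}
h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t,U_0,U_1\right)
&\le h^{-1}\left(1-\frac{1}{\sum_{k=0}^{\infty}h^k\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^k\frac{1}{k!}}\right)\\
&=h^{-1}\,\frac{\sum_{k=1}^{\infty}h^k\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^k\frac{1}{k!}}{\sum_{k=0}^{\infty}h^k\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^k\frac{1}{k!}}\\
&=\left(f_0(t',U_0)+f_1(t',U_1,a)\right)\frac{\sum_{k=1}^{\infty}h^{k-1}\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^{k-1}\frac{1}{k!}}{\sum_{k=0}^{\infty}h^k\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^k\frac{1}{k!}}\\
&=\left(f_0(t',U_0)+f_1(t',U_1,a)\right)\frac{\sum_{k=0}^{\infty}h^{k}\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^{k}\frac{1}{(k+1)!}}{\sum_{k=0}^{\infty}h^k\left(f_0(t',U_0)+f_1(t',U_1,a)\right)^k\frac{1}{k!}}\\
&<f_0(t',U_0)+f_1(t',U_1,a).
\end{aligned}$$

Moreover, $\mathbb{E}[f_0(t',U_0)+f_1(t',U_1,a) \mid T^a\ge t]<\infty$ when $\mathbb{E}[f_0(t+h,U_0)+f_1(t+h,U_1,a) \mid T^a\ge t]<\infty$ for all $h\in(0,\tilde{h})$. By application of the dominated convergence theorem, we can change the order of the limit and integral and conclude,

$$\lim_{h\to0}h^{-1}P\left(T^a\in[t,t+h) \mid T^a\ge t\right)=\mathbb{E}\left[f_0(t,U_0)+f_1(t,U_1,a) \mid T^a\ge t\right].$$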

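If $U_0 \perp\!\!\!\perp U_1$, by Lemma 1,

$$\mathbb{E}\left[f_0(t,U_0)+f_1(t,U_1,1) \mid T^1\ge t\right]-\mathbb{E}\left[f_0(t,U_0) \mid T^0\ge t\right]=\mathbb{E}\left[f_1(t,U_1,1) \mid T^1\ge t\right],$$

and

$$\lim_{h\to0}h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=1\right)-\lim_{h\to0}h^{-1}P\left(T\in[t,t+h) \mid T\ge t,A=0\right)=\mathbb{E}\left[f_1(t,U_1,1) \mid T^1\ge t\right].$$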

Appendix A.2: Proof of Lemma 1

Proof
Ef0(t,U0)Tat=f0(t,u0)dFU0Tat(u0)=f0(t,U0)P(TatU0=u0)P(Tat)dFU0(u0)=f0(t,U0)exp-(0tf0(s,U0)ds+0tf1(s,U1,a)ds)dFU1|U0=u0(u1)exp-(0tf0(k0,s)ds+0tf1(k1,a,s)ds)dFU1U0=k0(k1)dFU0(k0)dFU0(u0)=f0(t,U0)exp-0tf0(s,U0)dsexp-0tf1(s,U1,a)dsdFU1U0=u0(u1)exp-0tf0(k0,s)dsexp-0tf1(k1,a,s)dsdFU1U0=k0(k1)dFU0(k0)dFU0(u0).

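Moreover, if $U_0 \perp\!\!\!\perp U_1$, then

$$
\begin{aligned}
E\left[f_0(t,U_0) \mid T^{a} \geq t\right]
&= \int f_0(t,u_0)\,\frac{\exp\left(-\int_0^t f_0(s,u_0)\,ds\right)\int \exp\left(-\int_0^t f_1(s,u_1,a)\,ds\right) dF_{U_1}(u_1)}{\int\!\int \exp\left(-\int_0^t f_0(s,k_0)\,ds\right)\exp\left(-\int_0^t f_1(s,k_1,a)\,ds\right) dF_{U_1}(k_1)\,dF_{U_0}(k_0)}\,dF_{U_0}(u_0) \\
&= \int f_0(t,u_0)\,\frac{\exp\left(-\int_0^t f_0(s,u_0)\,ds\right)\int \exp\left(-\int_0^t f_1(s,u_1,a)\,ds\right) dF_{U_1}(u_1)}{\int \exp\left(-\int_0^t f_0(s,k_0)\,ds\right) dF_{U_0}(k_0)\,\int \exp\left(-\int_0^t f_1(s,k_1,a)\,ds\right) dF_{U_1}(k_1)}\,dF_{U_0}(u_0) \\
&= \int f_0(t,u_0)\,\frac{\exp\left(-\int_0^t f_0(s,u_0)\,ds\right)}{\int \exp\left(-\int_0^t f_0(s,k_0)\,ds\right) dF_{U_0}(k_0)}\,dF_{U_0}(u_0) \\
&= E\left[f_0(t,U_0) \mid T^{0} \geq t\right].
\end{aligned}
$$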
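Lemma 1 can be illustrated with discrete latent variables, for which all integrals over $F_{U_0}$ and $F_{U_1}$ become finite sums. The sketch below assumes hypothetical forms $f_0(t,u_0) = u_0 t$ and $f_1(t,u_1,a) = u_1 a$ and made-up support points and probabilities; none of these choices come from the paper.

```python
import math
from itertools import product

def f0(t, u0):
    # Hypothetical baseline hazard f0(t, u0) = u0 * t (assumption)
    return u0 * t

def cum_f0(t, u0):
    # Integral of f0(s, u0) over [0, t]
    return u0 * t**2 / 2.0

def cum_f1(t, u1, a):
    # Integral over [0, t] of the hypothetical effect f1(s, u1, a) = u1 * a
    return u1 * a * t

def cond_mean_f0(t, a, joint):
    # E[f0(t, U0) | T^a >= t] for a discrete joint distribution {(u0, u1): prob}
    surv = {k: p * math.exp(-(cum_f0(t, k[0]) + cum_f1(t, k[1], a)))
            for k, p in joint.items()}
    return sum(f0(t, k[0]) * w for k, w in surv.items()) / sum(surv.values())

u0_dist = {0.5: 0.3, 1.0: 0.4, 2.0: 0.3}
u1_dist = {0.2: 0.5, 1.5: 0.5}
independent = {(u0, u1): p0 * p1
               for (u0, p0), (u1, p1) in product(u0_dist.items(), u1_dist.items())}
dependent = {(0.5, 0.2): 0.5, (2.0, 1.5): 0.5}

for name, joint in (("independent", independent), ("dependent", dependent)):
    print(name, cond_mean_f0(1.0, 0, joint), cond_mean_f0(1.0, 1, joint))
```

Under independence the two conditional means agree for $a=0$ and $a=1$, as the lemma states. In the dependent example they differ, which is why the independence assumption cannot be dropped.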

Appendix A.3: Proof of Lemma 2

Proof

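If $U_0 \perp\!\!\!\perp U_1$, by Bayes' rule, the probability density of $U_1$ given $T^{1} \geq t$, $f(u_1 \mid T^{1} \geq t)$, equals

$$
\begin{aligned}
f(u_1 \mid T^{1} \geq t)
&= \frac{P\left(T^{1} \geq t \mid U_1=u_1\right) f(u_1)}{\int P\left(T^{1} \geq t \mid U_1=u_1\right) dF_{U_1}(u_1)} \\
&= \frac{\int \exp\left(-\int_0^t \{f_0(s,u_0)+u_1 f_1(s,1)\}\,ds\right) dF_{U_0}(u_0)\, f(u_1)}{\int\!\int \exp\left(-\int_0^t \{f_0(s,k_0)+k_1 f_1(s,1)\}\,ds\right) dF_{U_1}(k_1)\,dF_{U_0}(k_0)} \\
&= \frac{\exp\left(-\int_0^t u_1 f_1(s,1)\,ds\right)\int \exp\left(-\int_0^t f_0(s,u_0)\,ds\right) dF_{U_0}(u_0)\, f(u_1)}{\int \exp\left(-k_1\int_0^t f_1(s,1)\,ds\right) dF_{U_1}(k_1)\,\int \exp\left(-\int_0^t f_0(s,k_0)\,ds\right) dF_{U_0}(k_0)} \\
&= \frac{\exp\left(-\int_0^t u_1 f_1(s,1)\,ds\right) f(u_1)}{\int \exp\left(-k_1\int_0^t f_1(s,1)\,ds\right) dF_{U_1}(k_1)}.
\end{aligned}
$$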

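The Laplace transform of $f(u_1 \mid T^{1} \geq t)$ can therefore be written as

$$
\begin{aligned}
\mathcal{L}_{U_1 \mid T^{1} \geq t}(c)
&= E\left[\exp(-U_1 c) \mid T^{1} \geq t\right] \\
&= \int \exp(-u_1 c)\,dF_{U_1 \mid T^{1} \geq t}(u_1) \\
&= \int \exp(-u_1 c)\,\frac{\exp\left(-\int_0^t u_1 f_1(s,1)\,ds\right)}{\int \exp\left(-k_1\int_0^t f_1(s,1)\,ds\right) dF_{U_1}(k_1)}\,dF_{U_1}(u_1) \\
&= \int \frac{\exp\left(-u_1\left(c+\int_0^t f_1(s,1)\,ds\right)\right)}{\int \exp\left(-k_1\int_0^t f_1(s,1)\,ds\right) dF_{U_1}(k_1)}\,dF_{U_1}(u_1) \\
&= \frac{E\left[\exp\left(-U_1\left(c+\int_0^t f_1(s,1)\,ds\right)\right)\right]}{E\left[\exp\left(-U_1\int_0^t f_1(s,1)\,ds\right)\right]} \\
&= \frac{\mathcal{L}_{U_1}\left(c+\int_0^t f_1(s,1)\,ds\right)}{\mathcal{L}_{U_1}\left(\int_0^t f_1(s,1)\,ds\right)}.
\end{aligned}
$$

Since for a random variable $X$, $E[X] = -\mathcal{L}_X'(0)$,

$$
E\left[U_1 \mid T^{1} \geq t\right] = -\mathcal{L}_{U_1 \mid T^{1} \geq t}'(0) = -\frac{\mathcal{L}_{U_1}'\left(\int_0^t f_1(s,1)\,ds\right)}{\mathcal{L}_{U_1}\left(\int_0^t f_1(s,1)\,ds\right)}.
$$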
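The Laplace-transform expression can be compared against a direct computation of the conditional mean. The sketch below assumes discrete, independent $U_0$ and $U_1$ with made-up support points, a hypothetical baseline $f_0(s,u_0) = u_0 s$, and $f_1(s,1) = 1$, so that $\int_0^t f_1(s,1)\,ds = t$; all of these are illustrative choices and not the paper's.

```python
import math
from itertools import product

u0_dist = {0.5: 0.4, 2.0: 0.6}            # hypothetical U0 values and probabilities
u1_dist = {0.2: 0.3, 1.0: 0.5, 3.0: 0.2}  # hypothetical U1 values and probabilities

def laplace_u1(c):
    # L_{U1}(c) = E[exp(-c * U1)]
    return sum(p * math.exp(-c * u) for u, p in u1_dist.items())

def laplace_u1_prime(c):
    # Derivative of L_{U1} with respect to c
    return -sum(p * u * math.exp(-c * u) for u, p in u1_dist.items())

def cond_mean_direct(t):
    # E[U1 | T^1 >= t] from joint survival weights exp(-(u0 * t^2 / 2 + u1 * t))
    num = den = 0.0
    for (u0, p0), (u1, p1) in product(u0_dist.items(), u1_dist.items()):
        w = p0 * p1 * math.exp(-(u0 * t**2 / 2.0 + u1 * t))
        num += u1 * w
        den += w
    return num / den

def cond_mean_laplace(t):
    # Lemma 2 with the integral of f1(s, 1) over [0, t] equal to t
    return -laplace_u1_prime(t) / laplace_u1(t)

for t in (0.0, 0.5, 2.0):
    print(t, cond_mean_direct(t), cond_mean_laplace(t))
```

The two columns agree, and the conditional mean of $U_1$ falls over time because individuals with a large harmful effect are removed from the risk set first.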

Appendix B: Continuous $U_1$ distribution ($U_0 \perp\!\!\!\perp U_1$)

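Let us consider cause-effect relations for which SCM (2) applies with

$$
f_1(t,U_{1i},a) = U_{1i}\,a, \tag{10}
$$

and $U_0 \perp\!\!\!\perp U_1$. Moreover, let $(U_1+\ell) \sim \Gamma(k,\theta)$, then $\mathcal{L}_{U_1+\ell}(c) = (1+\theta c)^{-k}$. By Lemma 2,

$$
E\left[U_1 \mid T^{1} \geq t\right] = \frac{\theta k}{\theta t+1} - \ell,
$$

and

$$
B(t) = k\log(\theta t+1) - \ell t.
$$

The $B(t)$ are presented over time in Fig. 5 for $\theta = k = 1$ and $\ell \in \{0, \tfrac{1}{4}, \tfrac{1}{2}, 1\}$.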
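The closed forms of this appendix can be checked by numerical integration. The sketch below is written under stated assumptions: `ell` denotes the constant added to $U_1$ in the Gamma assumption, $\Gamma(k,\theta)$ is read as shape $k$ and scale $\theta$ (consistent with the Laplace transform $(1+\theta c)^{-k}$), and $B(t)$ is read as the integral of $E[U_1 \mid T^{1} \geq s]$ over $[0,t]$, which is consistent with the two displayed formulas. The Simpson helper and all function names are illustrative.

```python
import math

def gamma_pdf(g, k, theta):
    # Density of Gamma(shape k, scale theta); k >= 1 assumed to avoid a singularity at 0
    return g ** (k - 1) * math.exp(-g / theta) / (math.gamma(k) * theta ** k)

def simpson(func, lo, hi, n=20000):
    # Composite Simpson rule with an even number n of subintervals
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3.0

def cond_mean_numeric(t, k, theta, ell, upper=60.0):
    # E[U1 | T^1 >= t] with U1 = G - ell and G ~ Gamma(k, theta);
    # the factor exp(ell * t) cancels between numerator and denominator
    num = simpson(lambda g: (g - ell) * math.exp(-g * t) * gamma_pdf(g, k, theta), 0.0, upper)
    den = simpson(lambda g: math.exp(-g * t) * gamma_pdf(g, k, theta), 0.0, upper)
    return num / den

def cond_mean_closed(t, k, theta, ell):
    return theta * k / (theta * t + 1.0) - ell

def B_closed(t, k, theta, ell):
    return k * math.log(theta * t + 1.0) - ell * t

k = theta = 1.0
for ell in (0.0, 0.25, 0.5, 1.0):
    print(ell, cond_mean_numeric(2.0, k, theta, ell),
          cond_mean_closed(2.0, k, theta, ell), B_closed(2.0, k, theta, ell))
```

For $\ell > 0$ the curve $B(t)$ is concave and eventually decreases, even though each individual effect is constant in time.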

Appendix C: Figures for dependent $U_0$ and $U_1$

The additional figures for $f_0(t,U_{0i}) = \ell + U_{0i}\,\frac{t^{2}}{20}$.


References

  1. Aalen OO. A linear regression model for the analysis of life times. Stat Med. 1989;8(8):907–925. doi: 10.1002/sim.4780080803.
  2. Aalen OO, Borgan Ø, Gjessing HK. Survival and event history analysis. 1st ed. Berlin: Springer; 2008.
  3. Aalen OO, Cook RJ, Røysland K. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Anal. 2015;21(4):579–593. doi: 10.1007/s10985-015-9335-y.
  4. Bartlett JW, Morris TP, Stensrud MJ, Daniel RM, Vansteelandt SK, Burman CF. The hazards of period specific and weighted hazard ratios. Stat Biopharm Res. 2020;12(4):518–519. doi: 10.1080/19466315.2020.1755722.
  5. Bongers S, Forré P, Peters J, Mooij JM. Foundations of structural causal models with cycles and latent variables. Ann Stat. 2021;49(5):2885–2915. doi: 10.1214/21-AOS2064.
  6. Cox DR. Regression models and life-tables. J R Stat Soc Ser B. 1972;34(2):187–220.
  7. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13–15. doi: 10.1097/EDE.0b013e3181c1ea43.
  8. Hernán MA, Robins JM. Causal inference: what if. Boca Raton: Chapman & Hall/CRC; 2020.
  9. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–960. doi: 10.1080/01621459.1986.10478354.
  10. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. Wiley; 2002.
  11. Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81(1):61–71. doi: 10.1093/biomet/81.1.61.
  12. Martinussen T, Vansteelandt S. On collapsibility and confounding bias in Cox and Aalen regression models. Lifetime Data Anal. 2013;19(3):279–296. doi: 10.1007/s10985-013-9242-z.
  13. Martinussen T, Vansteelandt S, Andersen P. Subtleties in the interpretation of hazard contrasts. Lifetime Data Anal. 2020;26(4):833–855. doi: 10.1007/s10985-020-09501-5.
  14. McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika. 1994;81(3):501–514. doi: 10.1093/biomet/81.3.501.
  15. Nelsen RB. An introduction to copulas. 2nd ed. Berlin: Springer; 2006.
  16. Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Stat Sci. 1923;5(4):465–472.
  17. Pearl J. Causal inference in statistics: an overview. Stat Surv. 2009;3:96–146. doi: 10.1214/09-SS057.
  18. Pearl J. Causality: models, reasoning, and inference. 2nd ed. Cambridge: Cambridge University Press; 2009.
  19. Peters J, Janzing D, Schölkopf B. Elements of causal inference: foundations and learning algorithms. Cambridge, Massachusetts: The MIT Press; 2018.
  20. Post RAJ, van den Heuvel ER, Putter H. The built-in selection bias of hazard ratios formalized using structural causal models. Lifetime Data Anal. 2024. doi: 10.1007/s10985-024-09617-y.
  21. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701. doi: 10.1037/h0037350.
  22. Ryalen PC, Stensrud MJ, Røysland K. Transforming cumulative hazard estimates. Biometrika. 2018;105(4):905–916. doi: 10.1093/biomet/asy035.
  23. Sklar A. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de statistique de l’Université de Paris. 1959;8:229–231.
  24. Stensrud MJ, Valberg M, Røysland K, Aalen OO. Exploring selection bias by causal frailty models: the magnitude matters. Epidemiology. 2017;28(3):379–386. doi: 10.1097/EDE.0000000000000621.
  25. Stensrud MJ, Aalen JM, Aalen OO, Valberg M. Limitations of hazard ratios in clinical trials. Eur Heart J. 2018;40(17):1378–1383. doi: 10.1093/eurheartj/ehy770.
  26. Young JG, Stensrud MJ, Tchetgen Tchetgen EJ, Hernán MA. A causal framework for classical statistical estimands in failure-time settings with competing events. Stat Med. 2020;39(8):1199–1236. doi: 10.1002/sim.8471.

Articles from Lifetime Data Analysis are provided here courtesy of Springer
