Author manuscript; available in PMC: 2025 Oct 1.
Published in final edited form as: Biom J. 2025 Apr;67(2):e70041. doi: 10.1002/bimj.70041

Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks

Leah Comment 1, Fabrizia Mealli 2, Sebastien Haneuse 3, Corwin M Zigler 4,*
PMCID: PMC11887578  NIHMSID: NIHMS2055019  PMID: 40047176

Abstract

In semicompeting risks problems, nonterminal time-to-event outcomes, such as time to hospital readmission, are subject to truncation by death. These settings are often modeled with illness-death models for the hazards of the terminal and nonterminal events, but evaluating causal treatment effects with hazard models is problematic due to conditioning on survival (a post-treatment outcome), which is embedded in the definition of a hazard. Extending an existing survivor average causal effect (SACE) estimand, we frame the evaluation of treatment effects in the context of semicompeting risks with principal stratification and introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that parameterizes illness-death models for both treatment arms. We outline a frailty specification that can accommodate within-person correlation between nonterminal and terminal event times, and we discuss potential avenues for adding model flexibility. The method is demonstrated in the context of hospital readmission among late-stage pancreatic cancer patients.

Keywords: Causal inference, Hospital readmission, Principal stratification, Semicompeting risks, Survivor average causal effect

1. Introduction

In end-of-life care settings, doctors, patients, and policymakers often make decisions based on outcomes related to quality of the remaining lifespan, e.g., hospitalization, onset of dementia, or loss of independence. Comparing treatments on the basis of non-mortality time-to-event outcomes is complicated by the fact that interventions on quality of life can also affect mortality. As a motivating example, consider a hospital tasked with reducing hospital readmission among late-stage cancer patients. Death is a "terminal" or "truncating" event because its occurrence precludes future hospitalizations, whereas a hospital readmission is a "nonterminal" event that does not truncate death. An intervention that increases the risk of readmission may simultaneously increase the risk of death; because the additional deaths truncate readmissions, observed readmission rates can fall, leading a naïve analysis to attribute the lower rates to a beneficial effect of the intervention. This danger stems from a problem known as "truncation by death" and has been addressed in the causal inference literature using principal stratification (Zhang and Rubin, 2003).

Principal stratification handles truncation by death by restricting causal contrasts to groups that would not experience the truncating event under either treatment at a fixed time point. Initially introduced by Robins (1986) and formalized in Rubin (2000) and Zhang and Rubin (2003), the traditional survivor average causal effect (SACE) is the causal effect of the treatment on the truncated (i.e., non-mortality) outcome among the subpopulation that would survive regardless of treatment assignment. A number of papers have discussed the nonparametric identifiability conditions and assumptions for the SACE (Long and Hudgens, 2013; Zhang and Rubin, 2003; Robins, 1986; Tchetgen Tchetgen, 2014). The time point for defining always-survivorship is often implicit, such as “by the end of the study,” and typically only one such time t is considered. With a time-to-event structure of both the terminal (truncating) event and the nonterminal event, explicit definition of causal effects is indexed by both: (1) the time defining the “always survivors,” denoted with t, and (2) the time interval over which treatment contrasts on the nonterminal event are evaluated, r. For example, interest may lie in the causal effect on cumulative incidence of readmission at 30 days post-discharge (r=30) among patients who would survive under either treatment at 60 days (t=60). Examining such quantities across different values of (r,t) can serve different inferential purposes. We use the term “snapshot causal effect” to describe survivor causal effects with r=t.

When the truncated outcome is time-to-event, estimating the SACE at a single t can be problematic. First, providing one snapshot effect does not give decision makers information about the sensitivity of conclusions to the (possibly arbitrary) choice of t. It also does not account for the fact that timing matters for the nonterminal event. With hospital readmission, being hospitalized earlier may lead to more total hospitalizations or accelerate death. In other contexts – such as the onset of dementia – an earlier occurrence of the nonterminal event means more time spent in an unfavorable state, even if total lifespan remains unaffected. These concerns motivate the development of principal stratification methods that explicitly account for the time-to-event nature of the nonterminal outcome.

Others have partially grappled with principal stratification defined over time. For example, methods exist for treatment noncompliance in longitudinal contexts (Lin et al., 2008; Dai et al., 2012). But unlike treatment compliance status, which can vary over time arbitrarily, death at $t$ under either treatment condition necessarily precludes membership in an always-alive state at any $t' > t$. Defining strata on the basis of survival is also closely linked to principal strata generated by other continuously-scaled quantities (Jin and Rubin, 2008; Bartolucci and Grilli, 2011; Schwartz et al., 2011), which can, in principle, create an infinite number of strata that are difficult to collapse into meaningful subpopulations and can entail estimation strategies complicated by problems of partial identifiability. Others have deployed similar perspectives to time-to-event problems where survival is intertwined with interest in a nonterminal event. Mattei et al. (2024) consider a clinical trial where patients could switch treatment, with principal strata defined based on the timing of treatment switching. Nevo and Gorfine (2022) provide estimands anchored to potential outcomes but based on a principal stratification defined by the ordering of a terminal and a nonterminal event, focusing inference on estimating effects on both event times. Lyu et al. (2023) build on a Bayesian estimation approach similar to the one pursued here, focusing estimation on snapshot effects of a treatment on recurrent nonterminal events by a given time $r$ among principal strata who would survive under both treatments up to time $t = r$. Xu et al. (2022) provide a flexible Bayesian nonparametric estimation strategy targeting similar snapshot effects.

In the survival analysis literature, the problem of nonterminal time-to-event outcomes which may be truncated by terminal events is referred to as semicompeting risks because the terminal event acts as a competing risk for the nonterminal event, but the reverse is not true (Fine et al., 2001). Models which accommodate this semicompeting risks structure have been applied to a wide range of settings, including hospital readmission (Lee et al., 2015), cancer recurrence (Xu et al., 2010, 2022), career advancement (Pan and Gastwirth, 2013), and subscription product upgrades (Chen et al., 2017). Shared subject- and/or cluster-specific random effects, termed "frailties," can allow for within-subject or across-subject correlation between event times that is induced by unmeasured factors (Xu et al., 2010). Such models are typically constructed on the hazard scale and account for truncation by removing individuals from nonterminal risk sets after the time of their observed terminal event; this removal is akin to what occurs with cause-specific hazards in competing risks problems. Joint modelling of the time-to-event outcomes is used to describe cumulative incidences or hazard-based predictive models (Lee et al., 2015). With a few notable exceptions (Xu et al., 2022; Lyu et al., 2023; Nevo and Gorfine, 2022; Mattei et al., 2024), analyses of semicompeting risks emanating from the survival analysis literature typically do not focus explicitly on causal inference for treatment effects, and inference for regression parameters in those approaches does not necessarily support causal interpretation. We offer new causal estimands that describe how the causal treatment effect on the nonterminal event varies across strata of the population defined by how the treatment causally affects their survival for the terminal event, i.e., their "survivorship." Further clarifications of the relationship to causal estimands for competing events, such as total effects and controlled direct effects, appear in Young et al. (2020); Stensrud et al. (2020); Stensrud and Dukes (2022); Mattei et al. (2024).

This paper adapts existing semicompeting risks models, anchoring them to a principal stratification framework for the purpose of drawing causal inferences. As a first contribution, we outline a framework for principal strata defined by a continuous time-to-event truncating variable, such as death time. Second, we motivate and define two new causal estimands for truncated time-to-event outcomes, which represent extensions over previously-considered snapshot effects. We offer practical guidance for interpreting time-varying estimates of these quantities and describe their relationship to different types of decision making about the intervention. Third, we describe a density factorization which is innovative for principal stratification problems and that allows for explicit links to (non-causal) semicompeting risks models. Lastly, a Bayesian estimation procedure is provided with accompanying software.

The work here has important commonalities with work in Xu et al. (2022), who propose a Bayesian nonparametric estimation procedure for a distinct set of estimands for semicompeting risks problems, the aforementioned work in Nevo and Gorfine (2022), who provide causal estimands based on an alternative “population stratification,” and work in Lyu et al. (2023) which builds on the same foundation as our framework.

2. A potential outcomes approach for semicompeting risks data

2.1. Notation

Consider the evaluation of a binary intervention $Z$ (0 = control, 1 = treated), where interest lies in its effect on the times to a nonterminal event, $R$, and a terminal event, $D$, the occurrence of which may leave $R$ ill defined. We continue with the motivating setting of late-stage cancer care, where $Z$ is an intervention intended to reduce hospital readmission among recently discharged patients, $R$ is the time to hospital readmission, and $D$ is the time to death. The occurrence of death leaves future readmission undefined. Using the potential outcomes framework, let $R_i(z)$ and $D_i(z)$ denote the potential event times for readmission and death for person $i$, respectively, that would occur if the person were treated with $Z = z$. One or both of these events may be right censored by the potential censoring time $C_i(z)$. If death occurs without readmission, we set $R_i(z)$ to be $\bar{R}$, a non-real value. The observed times are $Y_i^R = \min\{R_i(Z_i), D_i(Z_i), C_i(Z_i)\}$ and $Y_i^D = \min\{D_i(Z_i), C_i(Z_i)\}$, where $\min(\bar{R}, x)$ is defined to be $x$ for any real $x$. The nonterminal event indicator $\delta_i^R = 1\{Y_i^R = R_i(Z_i)\}$ is one if the nonterminal event is observed to occur and zero otherwise. The analogous death event indicator is $\delta_i^D = 1\{Y_i^D = D_i(Z_i)\}$. The set of covariates available at baseline, denoted by $X$, may consist of confounders, predictors of censoring, and measured baseline predictors of either event type. Together, the observed data for individual $i$ are $O_i = (Y_i^R, \delta_i^R, Y_i^D, \delta_i^D, X_i, Z_i)$.
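To fix ideas, the following minimal sketch (Python; the function name and example values are ours, not taken from the accompanying software) maps one subject's potential outcomes under the assigned arm to the observed data $(Y_i^R, \delta_i^R, Y_i^D, \delta_i^D)$, encoding the non-real value $\bar{R}$ as infinity so that $\min(\bar{R}, x) = x$ holds automatically:

```python
import numpy as np

def observed_data(r_z, d_z, c_z):
    """Map potential outcomes under the assigned arm to observed data
    (Y^R, delta^R, Y^D, delta^D). np.inf encodes the non-real value
    R-bar (death without readmission)."""
    y_r = min(r_z, d_z, c_z)   # min(R-bar, x) = x because R-bar = inf
    y_d = min(d_z, c_z)
    delta_r = int(y_r == r_z and np.isfinite(r_z))  # readmission observed?
    delta_d = int(y_d == d_z)                       # death observed?
    return y_r, delta_r, y_d, delta_d

# Readmitted at day 12, died at day 40, administrative censoring at day 90:
print(observed_data(12.0, 40.0, 90.0))  # (12.0, 1, 40.0, 1)
```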

2.1.1. Principal stratification for continuous time

A principal stratification is a partition of the population into subpopulations defined by joint values of the potential outcomes under all treatment conditions. Our basic principal strata are defined by the pair of potential death times $(D_i(0), D_i(1))$. Since potential outcomes are not affected by treatment assignment, stratifications based on the basic principal strata – and unions of these strata – exist prior to treatment assignment and can play a role similar to baseline covariates. While the basic principal strata describe a unit's survival experience under both treatments across the entire time scale, it is useful to derive related quantities. For any $t$, let $V_i(t)$ denote the time-varying principal state (Lin et al., 2008; Dai et al., 2012) implied by the basic principal strata:

$$V_i(t) = \begin{cases} AA & \text{if } D_i(0) > t,\ D_i(1) > t \\ TK & \text{if } D_i(0) > t,\ D_i(1) \le t \\ CK & \text{if } D_i(0) \le t,\ D_i(1) > t \\ DD & \text{if } D_i(0) \le t,\ D_i(1) \le t. \end{cases}$$

The value of $V_i(t)$ represents a union of basic principal strata, depending on whether the individual is alive at $t$ in both arms ($AA$, for "always alive"), alive only under treatment ($CK$, for "control-killed") or only under control ($TK$, for "treatment-killed"), or dead under both ($DD$, for "doubly dead"). In the context of hospital readmission for cancer patients, we may be interested in readmission differences among the "always-alive" at 30 days, i.e., $\{i : V_i(30) = AA\}$, as well as the net difference in 30-day survival probabilities $P(V(30) = TK) - P(V(30) = CK)$.

The set of individuals with $V_i(t) = AA$ can also be viewed as a cohort with a well-defined and time-varying nonterminal event causal contrast function on the interval $(0, t)$. For various $t$, we can define survivorship cohorts, denoted by $\mathcal{A}_t$:

$$\mathcal{A}_t = \{i : \min(D_i(0), D_i(1)) > t\} = \{i : V_i(t) = AA\}$$

We note that $\mathcal{A}_{t'} \subseteq \mathcal{A}_t$ for $t' > t$. In the context of hospital readmission, $\mathcal{A}_{90}$ refers to the cohort of patients who would survive at least 90 days regardless of treatment assignment. Like the principal states, these principal strata are defined solely in terms of potential terminal event times. Within a cohort $\mathcal{A}_t$, there can be no treatment effect on survival during the interval $(0, t)$; this fact ensures the time at risk for the nonterminal event is the same under both treatment and control conditions, allowing us to separate the intervention's effect on the nonterminal event from the effect on survivorship. Time-varying estimands defined across different survivorship cohorts can inform downstream decision making about whether the intervention impacts different types of patients differently, as we will discuss in Section 2.3.
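As a concrete illustration, here is a small Python sketch (the helper names are ours) that classifies units into principal states at a given $t$ and extracts the survivorship cohort $\mathcal{A}_t$ from the pair of potential death times:

```python
import numpy as np

def principal_state(d0, d1, t):
    """Vectorized principal state V_i(t) from potential death times."""
    d0, d1 = np.asarray(d0), np.asarray(d1)
    state = np.empty(d0.shape, dtype=object)
    state[(d0 > t) & (d1 > t)] = "AA"    # alive under both arms
    state[(d0 > t) & (d1 <= t)] = "TK"   # alive only under control
    state[(d0 <= t) & (d1 > t)] = "CK"   # alive only under treatment
    state[(d0 <= t) & (d1 <= t)] = "DD"  # dead under both arms
    return state

def survivorship_cohort(d0, d1, t):
    """Indices of the cohort A_t = {i : min(D_i(0), D_i(1)) > t}."""
    return np.where(np.minimum(d0, d1) > t)[0]

# The cohorts are nested: A_90 is a subset of A_30.
d0, d1 = np.array([20., 95., 100.]), np.array([50., 80., 120.])
print(principal_state(d0, d1, 30))   # ['CK' 'AA' 'AA']
```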

2.2. Causal estimands for semicompeting risks

2.2.1. The time-varying survivor average causal effect (TV-SACE)

On the cumulative incidence scale, the existing “snapshot” survivor average causal effect is

$$\text{SACE}(t) = P(R(1) < t \mid V(t) = AA) - P(R(0) < t \mid V(t) = AA). \tag{1}$$

As previously discussed, snapshot estimands do not describe time-varying effects for any well-defined population. When they are estimated at a single time point, as is typically done, it is also unclear how sensitive conclusions are to the choice of $t$. To address these limitations, we define a new quantity, the time-varying survivor average causal effect (TV-SACE). This estimand is a function taking two arguments $r$ and $t$ with $r \le t$, and it conveys the difference in the cumulative incidence of nonterminal events by time $r$ among the group that survives past $t$ regardless of assigned treatment:

$$\text{TV-SACE}(r, t) = P(R(1) < r \mid V(t) = AA) - P(R(0) < r \mid V(t) = AA). \tag{2}$$

The TV-SACE captures the causal effect of $Z$ on $R$ that has manifested by time $r$, among the always-survivors at $t$. For example, hospitals may be interested in comparing 30-day ($r = 30$) and 90-day ($r = 90$) readmission rates among the cancer patients always-surviving at least 90 days post-discharge ($t = 90$). When $r = t$, the TV-SACE$(r, t)$ of Equation 2 coincides with the SACE$(t)$ as defined in Equation 1.
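Under oracle knowledge of all four potential outcomes (which, in practice, must be imputed as in Section 3), the TV-SACE is a simple plug-in contrast; a hypothetical sketch:

```python
import numpy as np

def tv_sace(r, t, r0, r1, d0, d1):
    """TV-SACE(r, t) from arrays of potential readmission times r0, r1
    (np.inf where readmission never occurs) and potential death times
    d0, d1, with r <= t."""
    aa = np.minimum(d0, d1) > t          # membership in A_t
    return np.mean(r1[aa] < r) - np.mean(r0[aa] < r)

# The snapshot effect SACE(t) of Equation 1 is the diagonal r = t:
# tv_sace(30, 30, r0, r1, d0, d1)
```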

Joint indexing of TV-SACE(r,t) by both r and t is essential for characterizing causal effects. For a fixed t, the function TV-SACE(r,t) (as a function of r) is a time-varying causal effect within the 𝒜t cohort. It describes the accumulation of benefit causally attributable to treatment among the well-defined – if latent – survivorship cohort. The shapes of these curves for cohorts defined by different t describe possible treatment effect heterogeneity on the nonterminal outcome across subpopulations with different underlying risks of death, and may reveal, for example, whether treatment effects steadily accrue or decay with time differently across the survivorship cohorts.

Importantly, viewing TV-SACE$(r, t)$ as a function of $t$ does not characterize a time-varying causal effect in a static population, but a function of snapshot effects, each defined within a different $\mathcal{A}_t$ cohort. In particular, the shape of the function $\text{TV}_{\text{snap}}(t) = \text{TV-SACE}(t, t)$ captures the sensitivity of causal effect estimation – in both sign and magnitude – to the moment in time used to define the always-survivorship group. Implications for decision makers are addressed in Section 2.3.

2.2.2. The restricted mean survivor average causal effect (RM-SACE)

Another causal estimand is a variation of the restricted mean survival time (RMST) and captures the length of the delay in the nonterminal event among always-survivors. This effect may be particularly relevant if the nonterminal event represents a permanent state change, such as the onset of irreversible dementia.

$$\text{RM-SACE}(r, t) = E[\min(R(1), r) \mid V(t) = AA] - E[\min(R(0), r) \mid V(t) = AA]$$

In the context of preventing hospital readmission, the RM-SACE$(r, t)$ captures how much expected hospitalization-free time the treatment causes one to accumulate by time $r$, defined among the always-survivors $\mathcal{A}_t$. Within the cohort $\mathcal{A}_t$, RM-SACE$(r, t)$ describes the timing of benefit accrual. If, within $\mathcal{A}_t$, the effect of treatment on the nonterminal events arises solely by delaying early events, the benefit accrues quickly and RM-SACE$(r, t)$ eventually levels off as $r$ increases. If different survivorship cohorts have very different curves, then the effect on the nonterminal event is heterogeneous with respect to the underlying risk of death.
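The analogous oracle computation for the RM-SACE (same hypothetical conventions as the TV-SACE sketch above):

```python
import numpy as np

def rm_sace(r, t, r0, r1, d0, d1):
    """RM-SACE(r, t): difference in expected readmission-free time
    accumulated by r among the always-alive at t (r <= t)."""
    aa = np.minimum(d0, d1) > t
    # min(R-bar, r) = r is automatic because R-bar is encoded as np.inf
    return np.mean(np.minimum(r1[aa], r)) - np.mean(np.minimum(r0[aa], r))
```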

Just as with $\text{TV}_{\text{snap}}(t)$, the function $\text{RM}_{\text{snap}}(t) = \text{RM-SACE}(t, t)$ conveys the sensitivity of the snapshot version to the choice of $t$. If $\text{RM}_{\text{snap}}(t)$ increases steadily, the choice of $t$ matters greatly, and reporting RM-SACE$(t, t)$ for only a single time point understates the total impact of the treatment on delaying the nonterminal event. On the other hand, if $\text{RM}_{\text{snap}}(t)$ levels off at some $t^*$, then any benefits attributable to treatment can be fully captured by an estimate of $\text{RM}_{\text{snap}}(t^*)$.

2.3. Using principal stratum estimands for decision making

Nonterminal and terminal events meaningfully characterize the patient experience, but decision makers rarely weigh them equally. Typically both event types are undesirable, but improvement in the nonterminal event rate at the cost of an increase in terminal events is unacceptable. When estimates of the principal stratum causal effects in Sections 2.2.1 and 2.2.2 agree qualitatively with estimates of the causal effect on terminal event times (e.g., the treatment is estimated to prolong survival and delay readmission for all or most survival cohorts), decision making is straightforward. When these disagree, the relevance of the principal stratum effects depends on the probability of belonging to $\mathcal{A}_t$ (i.e., $P(V(t) = AA)$ for a population, or $P(V(t) = AA \mid X)$ for subpopulations defined by $X$). We discuss considerations for three types of decisions: (1) policymakers evaluating whether an intervention should be made available to future patients and, if so, which ones ("access decisions"), (2) decision makers choosing whether to pursue an intervention for a specific individual ("personal decisions"), and (3) regulators selecting metrics to monitor the quality of an intervention ("monitoring decisions").

Access decisions prescribe an intervention to populations where it is judged favorably and prohibit it where the risk outweighs the demonstrated benefit. For example, a beneficial principal stratum effect involving hospital readmission (e.g., $\text{RM}_{\text{snap}}(t) > 0$) may justify access to an intervention despite a null effect or modestly negative effect on mortality. More likely, the intervention should be targeted to subgroups defined by observable baseline $X$ where $P(V(t) = AA \mid X)$ is large. Accordingly, researchers who wish to inform access decisions should report $\text{TV}_{\text{snap}}(t)$ or $\text{RM}_{\text{snap}}(t)$ across a wide range of $t$ to facilitate synthesis across studies.

Personal decisions mimic access decisions focused on a narrow subpopulation matching the patient of interest. If $P(V(t) = AA \mid X = x)$ is highly dependent on $x$, conditional effects like TV-SACE$(r, t \mid X = x)$ should be explored and reported in order to support these decisions. Additionally, individuals are sometimes expected to know more about their underlying latent health than is adequately captured by $X$. If evidence was initially generated in a randomized setting, exceptionally healthy patients may infer that they are more likely to belong to $\mathcal{A}_t$, increasing the relevance of $\text{TV}_{\text{snap}}(t)$ and $\text{RM}_{\text{snap}}(t)$.

Lastly, quality monitoring assesses whether an intervention continues to offer new patients a sufficiently favorable risk-benefit profile. Here, a regulator may decide to add SACE metrics to complement existing terminal event metrics such as 30-day mortality, or may use estimated functions of TV-SACE$(r, t)$ to decide if there is risk that standard nonterminal event quality of care metrics create perverse incentives (e.g., if variation in readmission tends to come at the cost of mortality). If SACE metrics are to be implemented, regulators must also choose a $t^*$ such that $\text{TV}_{\text{snap}}(t^*)$ or $\text{RM}_{\text{snap}}(t^*)$ reliably summarizes the intervention's impact on the nonterminal event. Metrics will be most relevant for $t^*$ where $P(V(t^*) = AA)$ is high, since this is the subpopulation for which the intervention has demonstrated value for the nonterminal event. For candidate $t^*$, $\text{TV}_{\text{snap}}(t^*)$ and $\text{RM}_{\text{snap}}(t^*)$ capture the benefit most completely when the corresponding TV-SACE$(r, t^*)$ and RM-SACE$(r, t^*)$ are fairly stable as $r \to t^*$.

2.4. Structural assumptions

We now review a set of assumptions essential to our estimation strategy for TV-SACE$(r, t)$ and RM-SACE$(r, t)$. For clarity, our exposition focuses on non-recurrent nonterminal events, where any individual who experiences the nonterminal event is no longer at risk for that event. For nonterminal events which are nonpermanent and that in principle could recur – like a second hospital readmission – the proposed framework may still be relevant with careful definition of the nonterminal event (e.g., time to first readmission).

Assumption 1

Consistency of potential outcomes.

$$R_i = Z_i R_i(1) + (1 - Z_i) R_i(0)$$
$$D_i = Z_i D_i(1) + (1 - Z_i) D_i(0)$$

Consistency is a standard assumption throughout the causal inference literature which connects observables R and D to their corresponding potential outcomes. Briefly, the treatment is well-defined such that there are no hidden variations within treatment level (Rubin, 1990).

Assumption 2

Conditional exchangeability (no unmeasured confounding).

The observed treatment assignment does not depend on the potential outcomes after accounting for the set of measured covariates X.

$$(R(z), D(z)) \perp Z \mid X \quad \text{for } z \in \{0, 1\}$$

In a randomized trial, this assumption holds by design since treatment assignment is independent of all measured and unmeasured variables. For observational settings, interpreting effect estimates as causal effects requires a sufficiently comprehensive X.

Assumption 3

Shared, non-informative censoring of event times.

The potential censoring times are shared (i.e., $C_i \equiv C_i(0) = C_i(1)$). Furthermore, the vector of potential censoring times $C$ is conditionally independent of all potential event times.

$$(R(0), D(0), R(1), D(1)) \perp C \mid X$$

Non-informative censoring is required for the consistent estimation of cumulative distribution functions. With administrative censoring, this assumption is satisfied by design.

2.5. Connection to traditional semicompeting risks models

We state a key simplifying assumption that builds a bridge to the semicompeting risks literature. With closely related nonterminal and terminal event processes, it is unrealistic to assume that any measured baseline set X will contain all sources of dependence between potential event times in (R(0),R(1),D(0),D(1)). However, if the cause of the dependence is baseline heterogeneity in the patient population, it may be reasonable to assume that baseline factors can be summarized by a one-dimensional subject-specific latent trait γi. As with any random effect, γi cannot adjust for unmeasured confounding. However, γi can be used to model sources of dependence in event times across treatment arms which are independent of the treatment assignment mechanism (i.e., unmeasured predictors).

Assumption 4

Independence of potential outcomes conditional on covariates and latent frailty.

Potential nonterminal and terminal event times under each treatment are conditionally independent given $X$ and an individual-level latent trait $\gamma$.

$$(R(0), D(0)) \perp (R(1), D(1)) \mid \gamma, X$$

Assumption 4 suggests a factorization of the joint density of the four potential outcomes R(0), R(1), D(0), and D(1) that is unusual within the principal stratification literature. Traditional model-based principal stratification approaches build a model for stratum membership given covariates (the “S-model”), and a model for the joint distribution of the potential outcomes conditional on the principal strata and covariates (the “Y-model”) (Schwartz et al., 2011). Instead, we choose an alternative factorization, shown in Equation 3, which further simplifies to Equation 4 under Assumption 4.

$$f(R(0), R(1), D(0), D(1) \mid X, \gamma) = f(R(0), D(0) \mid X, \gamma)\, f(R(1), D(1) \mid R(0), D(0), X, \gamma) \tag{3}$$
$$= f(R(0), D(0) \mid X, \gamma)\, f(R(1), D(1) \mid X, \gamma) \tag{4}$$

This arrangement makes it easy to enforce that $D_i(z)$ must exceed $R_i(z)$ whenever the nonterminal event occurs (i.e., whenever $R_i(z) \in \mathbb{R}^+$). We can also leverage existing illness-death transition models from the semicompeting risks literature to obtain a general form of the likelihood.

2.6. Likelihood

Within a single treatment condition, the semicompeting risks structure of the potential outcomes $R(z)$ and $D(z)$ can be seen as an illness-death transition model characterizing transitions among the event-free ("healthy"), nonterminal only ("ill"), and post-terminal ("dead") states. Hazards can be defined for the three types of event transitions: (1) healthy-ill, (2) healthy-dead, and (3) ill-dead.

$$\lambda_1^z(t) = \lim_{\Delta \to 0} \frac{P\big(Y^R(z) \in [t, t + \Delta) \mid Y^R(z) \ge t,\ Y^D(z) \ge t\big)}{\Delta}$$
$$\lambda_2^z(t) = \lim_{\Delta \to 0} \frac{P\big(Y^D(z) \in [t, t + \Delta) \mid Y^R(z) \ge t,\ Y^D(z) \ge t\big)}{\Delta}$$
$$\lambda_3^z(t \mid r) = \lim_{\Delta \to 0} \frac{P\big(Y^D(z) \in [t, t + \Delta) \mid Y^R(z) = r,\ Y^D(z) \ge t\big)}{\Delta}$$

The treatment arm-specific hazards conditional on covariates are denoted $\lambda_1^z(t \mid x_i, \gamma_i, \theta)$, $\lambda_2^z(t \mid x_i, \gamma_i, \theta)$, and $\lambda_3^z(t \mid r, x_i, \gamma_i, \theta)$, where $\theta$ is a vector of unknown parameters. With cumulative hazard $\Lambda_j^z(t) = \int_0^t \lambda_j^z(u)\, du$, the observed data likelihood conditional on the frailties $\gamma$ is given by

$$\mathcal{L}_c = \prod_{i=1}^n \Big( \big[\lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta)\big]^{\delta_i^R} \big[\lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta)\big]^{\delta_i^D (1 - \delta_i^R)} \big[\lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta)\big]^{\delta_i^D \delta_i^R} \\ \times \exp\big\{ -\Lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta) - \Lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta) - \delta_i^R\, \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta) \big\} \Big) \tag{5}$$

If the frailties are included as unknown parameters in an expanded parameter set $\theta^* = (\theta, \gamma)$ for $\gamma = (\gamma_1, \ldots, \gamma_n)$, the dimension of the parameter space is large and grows linearly with $n$, rendering estimation impracticable for large data sets. For computational efficiency and scalability, we use the marginalized likelihood $\mathcal{L}_m = \int \mathcal{L}_c\, f(\gamma)\, d\gamma$ rather than the conditional likelihood in our estimation algorithm. For selected choices of $f(\gamma)$, the form of $\mathcal{L}_m$ can be obtained analytically, but numerical integration within the Markov chain Monte Carlo (MCMC) computational algorithm can be used to accommodate arbitrary $f(\gamma)$. Computationally feasible estimation strategies are the focus of the next section.

3. Bayesian model-based estimation of causal effects

3.1. Identifiability in the Bayesian framework

We propose a Bayesian approach anchored to illness-death models for state transitions. Note that the likelihood in Section 2.6 does not support point identifiability of the principal stratum causal effects, a problem which also arises with the more traditional (i.e., snapshot) SACE (Long and Hudgens, 2013). This motivates our use of a Bayesian estimation procedure. In addition to the ability to handle large amounts of missing data (including unobserved potential outcomes) in much the same way as unknown parameters, the Bayesian procedure with proper prior distributions will yield proper posterior inference, even in the face of flat portions of the likelihood. In these instances, some of the unknown parameters in θ are only “partially identified”: even with infinite amounts of data, the posterior distribution converges to a non-degenerate distribution over a range of possible values that is smaller than that specified in the prior, but not equal to a single point (Gustafson, 2010).

3.2. Implementation with parametric illness-death models

In this paper we focus on hazards parameterized using Weibull regression models for each of the six possible transitions (three transition types in each of two treatment arms). Although alternative specifications are possible, we elect to use a semi-Markov model for the terminal event after the occurrence of the nonterminal event (i.e., for $t > R_i(z)$, the terminal event hazard at $t$ depends on $R_i(z)$ only through $t - R_i(z)$; Lee et al., 2015), and we require a non-negative correlation among transition hazards. For $z \in \{0, 1\}$ and $j \in \{1, 2, 3\}$, the Weibull shape for transition $j$ under $Z = z$ is denoted $\alpha_j^z$, and the baseline hazard rate is $\kappa_j^z$, giving hazard equations:

$$\lambda_1^z(t \mid x_i, \gamma_i, \theta) = \gamma_i\, \kappa_1^z \alpha_1^z t^{\alpha_1^z - 1} \exp(x_i' \beta_1^z)$$
$$\lambda_2^z(t \mid x_i, \gamma_i, \theta) = \gamma_i\, \kappa_2^z \alpha_2^z t^{\alpha_2^z - 1} \exp(x_i' \beta_2^z)$$
$$\lambda_3^z(t \mid r_i(z), x_i, \gamma_i, \theta) = \gamma_i\, \kappa_3^z \alpha_3^z (t - r_i(z))^{\alpha_3^z - 1} \exp(x_i' \beta_3^z) \times 1\{t > r_i(z)\}$$

The complete parameter vector for the above model specification is $\theta = (\alpha, \beta, \kappa, \sigma)$ for $\alpha = (\alpha_1^0, \ldots, \alpha_3^1)$, $\kappa = (\kappa_1^0, \ldots, \kappa_3^1)$, and $\beta = (\beta_1^0, \ldots, \beta_3^1)$. For computational convenience we suppose that the independent subject-specific frailties $\gamma_i$ arise from a gamma distribution constrained to have a mean of 1 with unknown variance $\sigma$. This parametric assumption allows the marginal likelihood to be computed analytically, regardless of the specific models used for the baseline hazards. Equation 6 gives the likelihood marginalizing over independent gamma-distributed frailties

$$\mathcal{L}_m = \prod_{i=1}^n \Big( (1 + \sigma)^{\delta_i^R \delta_i^D} \big[\lambda_1^{Z_i}(y_i^R \mid x_i, \theta)\big]^{\delta_i^R} \big[\lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\big]^{\delta_i^D (1 - \delta_i^R)} \big[\lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\big]^{\delta_i^D \delta_i^R} \\ \times \Big(1 + \sigma\big\{\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\big\}\Big)^{-(1/\sigma + \delta_i^R + \delta_i^D)} \Big) \tag{6}$$

where $\lambda_j^z(t \mid x_i, \theta) = \lambda_j^z(t \mid x_i, \gamma_i = 1, \theta)$ is the reference-level transition hazard for $j \in \{1, 2, 3\}$. Details of this marginalization can be found in Appendix B.
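For concreteness, here is a direct transcription of one subject's contribution to $\log \mathcal{L}_m$ under the Weibull specification above (a sketch; `xb_j` stands for the linear predictor $x_i'\beta_j^{Z_i}$, and the function names are ours rather than from the accompanying software):

```python
import numpy as np

def weibull_haz(t, kappa, alpha, xb):
    """Reference-level (gamma_i = 1) Weibull transition hazard and
    cumulative hazard at time t, with linear predictor xb = x_i' beta."""
    lam = kappa * alpha * t ** (alpha - 1.0) * np.exp(xb)
    Lam = kappa * t ** alpha * np.exp(xb)
    return lam, Lam

def marginal_loglik_i(y_r, y_d, dr, dd, sigma, par1, par2, par3):
    """Log of subject i's contribution to the gamma-frailty marginal
    likelihood (Equation 6; see also Appendix C). Each par_j is a tuple
    (kappa_j, alpha_j, xb_j) for the subject's arm Z_i; transition 3
    runs on the sojourn clock y_d - y_r and contributes only after an
    observed nonterminal event (dr = 1)."""
    lam1, Lam1 = weibull_haz(y_r, *par1)
    lam2, Lam2 = weibull_haz(y_r, *par2)
    lam3, Lam3 = weibull_haz(y_d - y_r, *par3) if dr and y_d > y_r else (1.0, 0.0)
    H = Lam1 + Lam2 + Lam3
    return (dr * dd * np.log1p(sigma)        # (1 + sigma)^{dR dD} term
            + dr * np.log(lam1)
            + dd * (1 - dr) * np.log(lam2)
            + dr * dd * np.log(lam3)
            - (1.0 / sigma + dr + dd) * np.log1p(sigma * H))
```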

As with any Bayesian procedure, prior distributions must be placed on all unknown parameters. For the frailty variance $\sigma$, we suggest eliciting weakly informative priors from subject matter experts, since we encountered convergence problems with dispersed starting values and vague priors (e.g., the Gamma(0.7, 0.7) prior on the precision $\sigma^{-1}$ suggested by Lee et al. (2015)). Priors for other components of $\theta$ are intended to be weakly informative; details can be found in Appendix A. Analyses can be performed with different prior distributions to gauge sensitivity of substantive conclusions to the choice of prior.

3.3. Estimation algorithm

The estimation procedure for the causal quantities can be summarized in four steps: (1) estimating regression model coefficients θ using MCMC, (2) sampling latent frailties γ conditional on the posterior of θ, (3) imputing missing factual and counterfactual outcomes conditional on the posterior of (θ,γ), and (4) using imputed potential outcomes to calculate causal estimands of interest.

We obtain posterior samples of $\theta$ with a modified Hamiltonian Monte Carlo No-U-Turn Sampler (NUTS) using the marginalized form of the likelihood in Stan, a programming language for Bayesian modeling, inference, and posterior analysis. Suppose there are $B$ post-warmup MCMC parameter samples $\theta^{(1)}, \ldots, \theta^{(B)}$. The closed form of $\gamma_i \mid \theta$ is a gamma distribution (see Appendix D), which facilitates sampling from the posterior for $\gamma$. Using these $B$ posterior samples of $(\theta, \gamma)$, we can draw from the posterior predictive distribution of the full set of potential outcomes $(R(Z), D(Z), R(1 - Z), D(1 - Z))$. Full details of the imputation procedure can be found in Appendix D.1.
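Step (2) has a convenient form: reading off the gamma kernel that appears in the marginalization of Appendix B, $\gamma_i \mid \theta, O_i$ should be gamma with shape $1/\sigma + \delta_i^R + \delta_i^D$ and rate $1/\sigma + \Lambda_1 + \Lambda_2 + \Lambda_3$ evaluated at reference level $\gamma_i = 1$ (Appendix D gives the closed form). A sampling sketch under that assumption:

```python
import numpy as np
rng = np.random.default_rng(0)

def draw_frailty(dr, dd, Lam1, Lam2, Lam3, sigma):
    """Draw gamma_i | theta, O_i from its gamma full conditional; the
    shape and rate are read off the kernel in Appendix B."""
    shape = 1.0 / sigma + dr + dd
    rate = 1.0 / sigma + Lam1 + Lam2 + Lam3
    return rng.gamma(shape, 1.0 / rate)  # numpy uses a scale parameter
```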

The final step of finite sample causal inference is straightforward once all potential outcomes have been either directly observed or imputed. For a sequence of $K$ time points $t_1, \ldots, t_K$ with $t_K \le \max_i y_i^R$ dictated by the scientific question, the principal state vector $V(t_k)$ is a deterministic function of $D(0)$ and $D(1)$. For MCMC iteration $b$, denote the principal state for person $i$ at time point $k$ by $V_i(t_k)^{(b)}$. Given $D(0)^{(b)}$ and $D(1)^{(b)}$, let $\mathcal{A}_{t_k}^{(b)}$ be the number in the always-alive state at $t_k$ (i.e., $\sum_{i=1}^n 1(V_i(t_k)^{(b)} = AA)$). For any $r \in \{t_1, \ldots, t_k\}$, a posterior draw of the sample time-varying survivor average causal effect is given by

$$\text{TV-SACE}(r, t_k)^{(b)} = \big(\mathcal{A}_{t_k}^{(b)}\big)^{-1} \sum_{i : V_i(t_k)^{(b)} = AA} \Big[ 1\big(R_i(1)^{(b)} < r\big) - 1\big(R_i(0)^{(b)} < r\big) \Big]$$

Similarly, the bth posterior draw of the sample restricted mean survivor average effect is

$$\text{RM-SACE}(r, t_k)^{(b)} = \big(\mathcal{A}_{t_k}^{(b)}\big)^{-1} \sum_{i : V_i(t_k)^{(b)} = AA} \Big[ \min\big(R_i(1)^{(b)}, r\big) - \min\big(R_i(0)^{(b)}, r\big) \Big]$$

As with any posterior sample, the $B$ draws of $\text{TV-SACE}(r, t_k)^{(b)}$ or $\text{RM-SACE}(r, t_k)^{(b)}$ can be summarized using means, medians, or quantile-based credible intervals for each $(r, t_k)$ pair.
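Summarizing the $B$ draws is routine; a sketch:

```python
import numpy as np

def summarize_draws(draws):
    """Posterior mean, median, and 95% quantile-based credible interval
    from B MCMC draws of TV-SACE(r, t_k) or RM-SACE(r, t_k)."""
    draws = np.asarray(draws)
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return {"mean": draws.mean(), "median": np.median(draws), "ci95": (lo, hi)}
```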

4. Simulation study

Our simulation study investigates operating characteristics of the estimation strategy and evaluates performance of the method under two possible misspecifications of the underlying frailty. To align with time frames for existing quality metrics, we simulate data with administrative censoring at 90 days and consider the TV-SACE and RM-SACE estimators at two time points, Day $t = 30$ and Day $t = 90$, in data sets of size $n = 6000$, similar to the pancreatic cancer and hospital readmission data application. The data generation processes (DGPs) are as follows: correct model specification (DGP1); parametric misspecification of the frailty distribution, where estimation assumes a gamma-distributed frailty while the true data generation process is log-normal (DGP2); and coefficient misspecification of the frailty distribution, where estimation incorrectly assumes that the frailty enters every hazard model in the same way (DGP3). For $z = 0, 1$, all three scenarios can be expressed as special cases of a data generation process with hazards

$$\lambda_1^z(t \mid x_i, \gamma_i, \theta) = (1 - z)\, \kappa_1^0 \alpha_1^0 t^{\alpha_1^0 - 1} \gamma_i^{\omega_1^0} \exp(x_i' \beta_1^0) + z\, \kappa_1^1 \alpha_1^1 t^{\alpha_1^1 - 1} \gamma_i^{\omega_1^1} \exp(x_i' \beta_1^1)$$
$$\lambda_2^z(t \mid x_i, \gamma_i, \theta) = (1 - z)\, \kappa_2^0 \alpha_2^0 t^{\alpha_2^0 - 1} \gamma_i^{\omega_2^0} \exp(x_i' \beta_2^0) + z\, \kappa_2^1 \alpha_2^1 t^{\alpha_2^1 - 1} \gamma_i^{\omega_2^1} \exp(x_i' \beta_2^1)$$
$$\lambda_3^z(t \mid r_i(z), x_i, \gamma_i, \theta) = (1 - z)\, \kappa_3^0 \alpha_3^0 (t - r_i(z))^{\alpha_3^0 - 1} \gamma_i^{\omega_3^0} \exp(x_i' \beta_3^0)\, 1\{t > r_i(z)\} + z\, \kappa_3^1 \alpha_3^1 (t - r_i(z))^{\alpha_3^1 - 1} \gamma_i^{\omega_3^1} \exp(x_i' \beta_3^1)\, 1\{t > r_i(z)\}$$

DGP1 and DGP2 differ only in the frailty distribution $f(\gamma)$; for both DGP1 and DGP2, $\omega_j^z \equiv 1$ for all $j$ and $z$. With DGP3, allowing $\omega_j^z \neq 1$ lets the importance of the frailty vary by treatment and transition type; values of $\omega_j^z > 1$ mean the frailty is more important for transition $j$ in arm $z$, while values of $\omega_j^z < 1$ mean the individual frailty is less important. Values for $\omega_j^z$ ranged from 0.81 to 1.23.

Parameters used for data generation were selected based on a simplified analysis of the readmission data in Section 5 to approximate a setting of end-of-life cancer care. Four independent binary baseline covariates were simulated, plus one continuous covariate. For $j = 1, 2, 3$, coefficients $\beta_j^1 = \beta_j^0$ were chosen to specify moderate hazard ratios ranging from 0.72 to 1.85. All frailty distributions were specified to have a variance of 1.3 to approximately match the readmission data. Appendix E provides more detail on these data generation processes, the process used to construct weakly data-driven priors for $\alpha_j^z$ and $\kappa_j^z$ (Appendix A), thresholds for adequate MCMC convergence, and sampler tuning parameters. Each replicate data set was analyzed with model fits using the final 2000 iterations of a 5000 iteration chain.
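For reference, a sketch of how one subject can be simulated from this class of DGPs (inverse-CDF draws for the Weibull transitions and a semi-Markov clock for transition 3; parameter values and function names are illustrative, not the ones used in the study):

```python
import numpy as np
rng = np.random.default_rng(2024)

def sim_weibull(kappa, alpha, mult):
    """Inverse-CDF draw from the Weibull with hazard mult*kappa*alpha*t^(alpha-1)."""
    return (-np.log(rng.uniform()) / (mult * kappa)) ** (1.0 / alpha)

def sim_subject(x, kappa, alpha, beta, omega, sigma, cens=90.0):
    """One subject from the illness-death DGP for a given arm z: kappa,
    alpha, beta, omega hold the three arm-specific transition parameters;
    gamma frailty with mean 1 and variance sigma; administrative
    censoring at `cens` days."""
    g = rng.gamma(1.0 / sigma, sigma)                    # mean-1 gamma frailty
    m = [g ** omega[j] * np.exp(x @ beta[j]) for j in range(3)]
    t1 = sim_weibull(kappa[0], alpha[0], m[0])           # healthy -> ill
    t2 = sim_weibull(kappa[1], alpha[1], m[1])           # healthy -> dead
    if t1 < t2:                                          # readmission first
        r = t1
        d = r + sim_weibull(kappa[2], alpha[2], m[2])    # semi-Markov sojourn
    else:
        r, d = np.inf, t2                                # death precludes R
    y_r, y_d = min(r, d, cens), min(d, cens)
    return y_r, int(y_r == r), y_d, int(y_d == d)
```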

4.1. Simulation results

Table 1 shows bias, mean squared error (MSE), and 95% credible interval coverage estimates for the true population quantities. In the correctly specified Scenario 1, there is small bias and low MSE, particularly for the fraction always-alive and the TV-SACE. However, note the slight under-coverage for the fraction always alive at times t=30,60,90 and the corresponding under-coverage of the true population estimates of TV-SACE and RM-SACE.

Table 1.

Bias, Mean Squared Error (MSE), and 95% interval coverage for estimators of population proportion always-alive, time-varying survivor average causal effect TV-SACE(t,t) and restricted mean survivor average causal effect RM-SACE(t,t) from 200 replicates for correct specification (Scenario 1), plus two incorrectly specified frailty scenarios (Scenarios 2 and 3).

| Scenario | Quantity | t | Bias | MSE | Coverage |
| --- | --- | --- | --- | --- | --- |
| 1 | Fraction always-alive at t | 30 | 0.002 | 3.917e-05 | 0.845 |
| 1 | Fraction always-alive at t | 60 | 0.001 | 3.956e-05 | 0.900 |
| 1 | Fraction always-alive at t | 90 | 0.001 | 5.163e-05 | 0.855 |
| 1 | TV-SACE(t,t) | 30 | 0.001 | 9.313e-05 | 0.935 |
| 1 | TV-SACE(t,t) | 60 | 0.002 | 1.707e-04 | 0.915 |
| 1 | TV-SACE(t,t) | 90 | 0.000 | 2.142e-04 | 0.915 |
| 1 | RM-SACE(t,t) | 30 | −0.016 | 3.310e-02 | 0.930 |
| 1 | RM-SACE(t,t) | 60 | −0.079 | 2.327e-01 | 0.945 |
| 1 | RM-SACE(t,t) | 90 | −0.060 | 6.976e-01 | 0.935 |
| 2 | Fraction always-alive at t | 30 | −0.006 | 8.058e-05 | 0.680 |
| 2 | Fraction always-alive at t | 60 | −0.005 | 7.189e-05 | 0.745 |
| 2 | Fraction always-alive at t | 90 | −0.002 | 6.038e-05 | 0.845 |
| 2 | TV-SACE(t,t) | 30 | 0.000 | 1.145e-04 | 0.910 |
| 2 | TV-SACE(t,t) | 60 | 0.003 | 1.917e-04 | 0.895 |
| 2 | TV-SACE(t,t) | 90 | 0.004 | 2.327e-04 | 0.940 |
| 2 | RM-SACE(t,t) | 30 | 0.094 | 5.732e-02 | 0.830 |
| 2 | RM-SACE(t,t) | 60 | −0.024 | 2.857e-01 | 0.920 |
| 2 | RM-SACE(t,t) | 90 | −0.246 | 8.786e-01 | 0.930 |
| 3 | Fraction always-alive at t | 30 | 0.003 | 4.811e-05 | 0.805 |
| 3 | Fraction always-alive at t | 60 | 0.003 | 4.940e-05 | 0.810 |
| 3 | Fraction always-alive at t | 90 | 0.004 | 6.904e-05 | 0.755 |
| 3 | TV-SACE(t,t) | 30 | 0.002 | 9.410e-05 | 0.925 |
| 3 | TV-SACE(t,t) | 60 | 0.008 | 2.089e-04 | 0.885 |
| 3 | TV-SACE(t,t) | 90 | 0.013 | 3.744e-04 | 0.810 |
| 3 | RM-SACE(t,t) | 30 | −0.044 | 3.768e-02 | 0.895 |
| 3 | RM-SACE(t,t) | 60 | −0.412 | 3.829e-01 | 0.845 |
| 3 | RM-SACE(t,t) | 90 | −0.956 | 1.548e+00 | 0.775 |

Figure 1 provides 95% credible intervals for quantities from Scenario 1 defined at $t = 30$, depicted against the superpopulation truth (vertical gray lines) and finite sample (i.e., individual replicate data set) truths. We find roughly nominal coverage for the finite sample fraction always-alive intervals (subplot A); thus, we recommend interpreting posterior uncertainty for the fraction always-alive primarily as an in-sample quantity. Credible intervals for the survivor average causal effects show similar coverage levels for the superpopulation and finite-sample causal TV-SACE and RM-SACE estimands (subplots B and C).

Figure 1.


Simulation Scenario 1: 95% credible intervals across 200 replicates for the fraction always-alive at t (A), TV-SACE(t,t) (B), and RM-SACE(t,t) (C) for t=30, compared to the replicates’ finite sample truth (points) and the population average (vertical gray lines).

Table 1 shows the expected deterioration in performance for the misspecified Scenarios 2 and 3, with both misspecified scenarios resulting in bias and lower coverage probabilities. The RM-SACE seems particularly sensitive to misspecification, but the TV-SACE has reasonably low MSE. In general, these misspecifications resulted in slight overestimates of the TV-SACE, moderate to substantial underestimates of the RM-SACE, and unpredictable directions of bias for the fraction always-alive.

5. Evaluation of supportive home care effects on mortality and hospital readmission among pancreatic cancer patients

5.1. Medicare Part A pancreatic cancer readmission data

We demonstrate our method in an analysis of hospital readmission using a data set of 12,091 newly diagnosed pancreatic cancer patients in the United States, adopting many of the principles involved in the emulation of target trials with observational data (Hernán and Robins, 2016) to support clear interpretation of the proposed causal estimands. The initial sample consisted of 17,685 Medicare Part A enrollees in California from 2000 to 2012 who were hospitalized and later discharged with a diagnosis of pancreatic cancer. We limited our analysis to the 12,091 patients who were healthy enough to be discharged to home (i.e., not to hospice or a skilled nursing facility). The baseline $t = 0$ was set to the date of discharge from the index hospitalization during which the cancer was diagnosed. Hospital readmission, as a proxy for quality of care, is usually assessed over a short window after the index hospitalization; to focus on these short-term effects, administrative censoring was applied at 90 days. More information can be found in Lee et al. (2015).

The scientific question of interest is whether in-home supportive care leads to lower rates of hospital readmission than discharging to home without additional support. In the terminology of Section 2.3, a related "access decision" determines whether additional support should be available to patients whose doctors request it, a "personal decision" answers whether additional support safely reduces readmission for specific covariate profiles, and a "monitoring decision" assesses whether readmission and mortality metrics adequately summarize benefit and are therefore suitable for ongoing evaluation of home support efficacy. Of the 12,091 patients discharged to home, 3,140 (26%) were sent home with supportive care. A major concern was that patients discharged without care would be systematically healthier than those discharged with support, presenting a strong threat of confounding. To reduce dependence on model-based confounding adjustment, a logistic regression propensity score model for the receipt of home care was constructed using all available baseline covariates: non-White race, age, dichotomized Charlson-Deyo comorbidity score, admission route, and length of stay during the index hospitalization. Estimated propensity scores were used to match (without replacement) 3,140 of the 8,951 patients discharged without care for comparison with those receiving supportive care (Ho et al., 2007), so that estimates target average treatment effects on the treated (i.e., the supported). Appendix F depicts covariate balance checks after matching.
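Schematically, the design stage resembles the following sketch (ours, not the original analysis code; it approximates the described logistic propensity model and 1:1 nearest-neighbor matching without replacement in the spirit of Ho et al. (2007)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match(X, z):
    """Greedy 1:1 nearest-neighbor match of untreated to treated on the
    estimated propensity score, without replacement. Assumes more
    controls than treated (8,951 vs. 3,140 in the application)."""
    ps = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    treated = np.where(z == 1)[0]
    pool = list(np.where(z == 0)[0])
    matches = []
    for i in treated:
        j = min(pool, key=lambda k: abs(ps[k] - ps[i]))  # closest control
        matches.append((i, j))
        pool.remove(j)                                   # without replacement
    return matches, ps
```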

Hazard regression models included the same covariate set, with all covariates mean centered. Age and length of stay were scaled to have a standard deviation of 1 to facilitate specification of priors for the coefficients. As proposed in Section 3.2, we adopted Weibull transition hazards with a semi-Markov specification for the post-readmission hazard of death. The baseline hazards $\kappa = (\kappa_1^0, \ldots, \kappa_3^1)$ and Weibull shape parameters $\alpha = (\alpha_1^0, \ldots, \alpha_3^1)$ were allowed to vary freely across treatment arm and transition type, but adjustment coefficients $\beta = (\beta_1^0, \ldots, \beta_3^1)$ were assumed to be the same across treatment arms (i.e., $\beta_j^0 = \beta_j^1$ for $j = 1, 2, 3$). Prior distributions were specified as in Section 3.2. In particular, the prior for $\sigma$ is specified as Inverse-Gamma(21, 7.1), which is an informative prior chosen to ensure stable MCMC inference for the remaining model parameters.

Posterior draws of $\theta = (\alpha, \kappa, \beta, \sigma)$ were obtained from 4 chains of 4,000 MCMC iterations each, with the first 3,000 iterations removed as warmup. Gelman-Rubin potential scale reduction factors $\hat{R}$ and effective sample sizes were calculated for each parameter (Gelman et al., 2013; Carpenter et al., 2017). Using the procedure outlined in Section 3.3, posterior draws of the frailties and missing potential outcomes were obtained using the 4,000 post-warmup samples of $\theta$.

5.2. Readmission and mortality results

All Gelman-Rubin $\hat{R}$ values were below 1.01, indicating good mixing of the chains, and the minimum effective sample size across all parameters was 2,770.

Part A of Figure 2 shows the posterior mean survival curves for each treatment group and their implications for the posterior mean size of the always-survivor subpopulation. The fastest-declining survival curve, shown in green, is the in-sample average of time to first potential death (i.e., $\min(D_i(0), D_i(1))$); its "survival" $S(t)$ equals $P(V(t) = AA)$. The other two curves in Part A show the mean $S(t)$ for the counterfactual survival probabilities if everyone in the (matched) sample had been treated with extra care ($z = 1$, dashed orange) or discharged home without extra care ($z = 0$, dotted navy). Based on these covariate-adjusted survival curves, the treatment of receiving additional support at home leads to reduced lifespan across the 90 days, i.e., $P(D(0) > t) > P(D(1) > t)$ for $t \in (0, 90]$. For all curves, uncertainty increases with time because of the decreasing number of subjects used to estimate survival. Relative to the $D(z)$ survival curves, there is additional uncertainty in the $P(V(t) = AA)$ estimates due to uncertainty in $\sigma$ and $\gamma$.

Figure 2.


Posterior mean survival curves among newly diagnosed pancreatic cancer patients discharged home, with supportive care (z=1) and without (z=0), with the corresponding implications for always-alive principal stratum size (A) and posterior mean population composition of always-alive (AA), treatment-killed (TK), control-killed (CK), and doubly dead (DD) principal states (B)

Part B of Figure 2 shows the posterior mean proportion of the population in each principal state $\{AA, TK, CK, DD\}$ over time. For small $t$, nearly the entire population is in the $AA$ state because few deaths are observed or imputed under $z \in \{0, 1\}$. With time, more deaths accumulate among patients discharged home with support, leading to a greater proportion of the population in the $TK$ state than the $CK$ state. The population fractions in $TK$ and $CK$ stay relatively constant after approximately day 45, suggesting that most patients who would die under only one of the discharge conditions do so relatively early in the 90-day time frame. The overall effect is that depletion of the always-alive principal stratum occurs mostly during the early part of the 90-day window.

5.2.1. Population-level causal effects

Part A of Figure 3 shows posterior means for TV-SACE$(r, t)$ for five always-survivor cohorts $\mathcal{A}_t$, $t \in \{15, 30, 45, 60, 90\}$. In all cohorts, support leads to greater incidence of hospital readmission. In the first days after discharge from the index hospitalization, the healthier, longer-surviving cohorts like $\mathcal{A}_{90}$ have treatment effects on readmission rates which are slightly larger than cohorts with less stringent survivorship requirements (e.g., $\mathcal{A}_{15}$). However, effects among the longer-surviving cohorts grow more similar over time. This may point to heterogeneity in the reasons requiring a readmission; that is, readmissions occurring in the first week or so after diagnosis may be caused by a different mixture of proximate causes than the admissions during the rest of the 90 days.

Figure 3.


Estimated time-varying (TV-SACE) and restricted mean (RM-SACE) survivor average causal effects of home care (vs. no additional care at home) on the cumulative incidence of hospital readmission among 6,280 newly diagnosed late-stage pancreatic cancer patients

Part B of Figure 3 shows estimated curves of $\text{TV}_{\text{snap}}(t) = \text{TV-SACE}(t, t)$ across $t$, with each of the 1,000 lines derived from a representative posterior draw of $\theta$. The color of the lines at each $t$ gives the proportion of the study population in the always-alive state at $t$ according to that set of posterior predictive potential outcome samples. The shape suggests that, on the cumulative incidence scale, there is no natural time point for evaluating the causal effect of discharge support on hospital readmission because $\text{TV}_{\text{snap}}(t)$ never completely levels off. However, the direction of the effect (i.e., higher cumulative incidence in the group discharged with care) is largely consistent over time.

Like the time-varying survivor average causal effect, the restricted mean effects also suggest that being discharged home with support increases readmissions. Part C of Figure 3 shows the within-cohort accumulation of readmission-free days attributable to being discharged with support. Because the accumulation is negative, this finding is consistent with faster and ultimately greater cumulative incidence of readmission among the treated (i.e., supported) group. In part due to the natural ceiling of $t$ in the definition of the restricted mean, the estimated snapshot function $\text{RM}_{\text{snap}}(t) = \text{RM-SACE}(t, t)$ in Part D of Figure 3 steadily grows in magnitude over the course of the 90 days.

5.2.2. Implications for individual-level decisionmaking

The posterior distribution for frailty variance σ shows substantial remaining variability in prognosis that is not explained by the covariates included in the models, with a mean of 1.31 (95% CI: 1.12, 1.51). To put this estimate into perspective, σ=1.31 corresponds to patients in the 90th percentile of the latent frailty experiencing event hazards that are 41.9 times the hazards for comparable patients in the 10th percentile. Relative to the variation in prognoses explained by predictive covariates, large values for σ pose additional difficulties for tailored decisionmaking. Nevertheless, covariate-specific posterior predictions may be used to differentiate treatment recommendations.
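The 41.9-fold figure follows directly from the mean-1 gamma frailty assumption; a quick check (assuming only what the model already states):

```python
from scipy.stats import gamma

sigma = 1.31
# Mean-1 gamma frailty: shape 1/sigma, scale sigma
q10, q90 = gamma.ppf([0.10, 0.90], a=1.0 / sigma, scale=sigma)
print(q90 / q10)  # hazard ratio, 90th vs. 10th frailty percentile (~42)
```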

Table 2 gives examples of tailored prognoses for two individuals with selected covariate patterns. As expected, the posterior predictive state probabilities show that, for comparable levels of underlying frailness, a younger White woman is much more likely to be in the always-alive state at 90 days than an older non-White man with the same comorbidity score and duration of index hospitalization. However, the magnitude of this survival advantage varies greatly. For individuals in the 10th or 90th percentile of latent health (i.e., the 90th or 10th percentile for $\gamma$), the difference in the probability of being always-alive at $t = 90$ is approximately 0.08; for individuals of average frailty, the difference is more pronounced at 0.312 (0.420 vs. 0.108). We can also conclude that frail patients of either covariate pattern are unlikely to be in the always-alive state at 90 days. Together, these findings highlight the degree to which an individual can tailor decision making based on additional information.

Table 2.

Posterior predictive means for principal state probabilities and principal stratum causal effects for new patients of two covariate patterns

Columns AA, CK, and TK give the principal state probabilities at $t$; the final two columns give the causal effect of being discharged to home with support (vs. without) among the always-alive.

| Patient characteristics | Latent health² | Day t | AA¹ | CK¹ | TK¹ | Difference in readmission incidence by t | Additional readmission-free days accumulated by t |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Nonwhite male aged 85, average comorbidity score and hospital length of stay | Frail | 30 | 0.180 | 0.150 | 0.362 | 0.037 | −0.296 |
| | Frail | 90 | 0.003 | 0.020 | 0.115 | 0.023 | −3.622 |
| | Average | 30 | 0.550 | 0.126 | 0.263 | 0.041 | −0.449 |
| | Average | 90 | 0.108 | 0.124 | 0.353 | 0.079 | −5.439 |
| | Healthy | 30 | 0.972 | 0.009 | 0.019 | 0.006 | −0.065 |
| | Healthy | 90 | 0.895 | 0.032 | 0.071 | 0.023 | −0.912 |
| White female aged 65, average comorbidity score and hospital length of stay | Frail | 30 | 0.495 | 0.140 | 0.278 | 0.029 | −0.346 |
| | Frail | 90 | 0.090 | 0.117 | 0.308 | 0.005 | −1.346 |
| | Average | 30 | 0.813 | 0.061 | 0.116 | 0.045 | −0.550 |
| | Average | 90 | 0.420 | 0.147 | 0.312 | 0.043 | −3.337 |
| | Healthy | 30 | 0.994 | 0.002 | 0.004 | 0.008 | −0.092 |
| | Healthy | 90 | 0.976 | 0.008 | 0.016 | 0.030 | −1.191 |

¹ Always-alive (AA), dead only under control (CK), and dead only under treatment (TK)

² Frail and healthy correspond to the 90th and 10th percentiles of γ, while average health corresponds to γ = 1

6. Conclusions

In this paper we have proposed a general approach to principal stratification where the strata are defined by potential times to a truncating event and there is administrative censoring. From a decision making perspective, this stratification is a natural one because it groups units according to their time horizon for comparing quality of care. To quantify those differences, we formulated two new causal estimands, the TV-SACE and RM-SACE, for contrasting nonterminal time-to-event outcomes that are truncated by death. We then described a Bayesian model-based estimation procedure that builds upon existing strategies for semicompeting risks models. Our innovative factorization scheme facilitates connections to existing illness-death models, putting a sharper causal focus on this literature and clarifying how such models can be adapted to yield causally interpretable quantities.

The methods outlined here have several limitations that warrant discussion. First, in our implementation, the latent trait is assumed to be gamma-distributed, and the effect of the frailty is constrained to be identical across all hazard types and treatment arms. Both assumptions were made largely for computational convenience because they allow parameter sampling using the marginal likelihood. Computational stability also required an informative prior for the frailty variance, with less informative prior specifications difficult to assess due to issues with MCMC convergence. Other parametric distributions could be assumed for the latent trait (e.g., log-normal), and shared frailty models have previously incorporated transition-specific coefficients for the log-frailty (Liu et al., 2004). These adaptations do not result in analytically tractable marginal likelihoods, although numerical integration can be used. In practice, we found MCMC performance using the unmarginalized likelihood to be inconsistent, slow, and prone to divergent NUTS transitions. Second, parametric Weibull hazard models may not be appropriate for all scenarios. More flexible baseline hazard specifications could be achieved with splines (Royston and Parmar, 2002) or Bayesian nonparametrics (Lee et al., 2015), although posterior prediction would become more difficult due to problems extrapolating beyond the observed time scale. The Bayesian nonparametric approach of Xu et al. (2022) sidesteps parametric assumptions but requires randomized treatment assignment. While our approach can be used in observational settings, confounding will only be controlled with correct specification of the propensity score or outcome models, which may be difficult under parametric restrictions.

Notwithstanding these limitations, this work offers a new, causally informed approach to the analysis of semicompeting risks data. Illness-death models pose two challenges for causal inference on the non-terminal outcome: (1) the use of hazard-based estimation strategies, which implicitly condition on the post-treatment outcome of survival, and (2) handling truncation when the treatments also affect survival. By formulating causal estimands using potential outcomes notation, we separate the model estimation from the choice of causal estimand. Our method also indirectly addresses decision makers’ need to balance nonmortality considerations with treatment impacts on survival; this is achieved by quantifying, for every time point, the relative size of the population for whom quality of care contrasts are relevant. While we consider the case of administrative censoring, other contexts with more complex censoring may motivate extensions of the TV-SACE and RM-SACE to additionally depend on potential censoring times. The use of posterior predictive sampling to estimate the effects allows for the innovative density factorization which connects to an existing semicompeting risks approach. Analogous factorizations may prove useful for truncated outcomes which are not time-to-event. Lastly, because we operate in the Bayesian framework, we properly account for uncertainty due to partial identifiability of the causal effects.

Future work on these and other principal stratification models for non-survival outcomes in high-mortality settings may be extended to incorporate utility functions within a formal decision-theoretic framework. These methods open up more possibilities for causally valid research on non-mortality outcomes related to quality of life or quality of care among high-mortality patient populations. In turn, this provides evidence that is more directly useful to the individuals and policymakers making decisions on the basis of multiple criteria.

Acknowledgements

Support for this work was provided by NIH grants T32CA009337 and T32ES007142 (LC), R01CA181360 (SH), and R01ES026217 and R01GM111339 (CZ), as well as EPA grant RD835872 (CZ). FM received support from Dipartimenti Eccellenti 2018–2022 ministerial funds. LC was also supported by a Rose Traveling Fellowship. We thank Alessandra Mattei for helpful discussion.

A. Additional details on the prior specification

For binary covariates and continuous variables rescaled to have unit variance, hazard ratios are unlikely to exceed 5; therefore we set $\pi(\beta_j^z)$ to be $\mathcal{N}(0, 2.5^2)$ for $j = 1, 2, 3$ and $z = 0, 1$. With mean-centered covariates and $\alpha_j^z = 1$, the baseline hazard $\kappa_j^z$ corresponds to the hazard experienced by those at the sample mean covariate values. Thus, reasonable priors for the log-baseline hazards are $\mathcal{N}(\log(E_j / PT_j), (\log(100)/2)^2)$, where $E_j$ is the number of observed events and $PT_j$ is the total at-risk person-time for transition $j$, pooling across treatment arms. For exponential hazards, this prior asserts that the true hazard experienced at the sample mean covariate values has only $\approx 5\%$ probability of being more than two orders of magnitude away from the crude (pooled) event rate. The data-driven prior specification for the log-baseline hazards makes the model invariant to the time scale of the data (i.e., days vs. years). (An alternative approach would be to rescale all times so that the mean event times were $\approx 1$.) Lastly, the Weibull shape parameters $\alpha_1^0, \ldots, \alpha_3^1$ are given $\text{LogNormal}(0, 2^2)$ priors to express moderate belief that any changes in the hazards occur slowly rather than quickly decaying or exploding. Assuming variation independence of the different parameter blocks $\beta$, $\alpha$, and $\kappa$, we can construct a prior as $\pi(\theta) = \pi(\beta)\, \pi(\alpha)\, \pi(\kappa)$.
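A sketch of this data-driven construction for the log-baseline-hazard hyperparameters (the event count and person-time below are hypothetical):

```python
import numpy as np

def log_kappa_prior(n_events, person_time):
    """Normal prior hyperparameters for log(kappa_j): centered at the
    crude pooled event rate, with ~5% prior mass more than two orders
    of magnitude away from it."""
    mu = np.log(n_events / person_time)
    sd = np.log(100.0) / 2.0          # two sd = a factor of 100
    return mu, sd

# e.g., 1200 transition-1 events over 150,000 person-days (hypothetical):
print(log_kappa_prior(1200, 150_000))
```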

B. Marginalization of the conditional likelihood $\mathcal{L}_c$ over the frailties $\gamma$

Let $\lambda_j(t)$ denote the instantaneous hazard and $\Lambda_j(t)$ the cumulative hazard for transition $j \in \{1, 2, 3\}$. First, note that $\Lambda_3(t \mid r) \equiv 0$ for $t \le r$ and that $y_i^D \ge y_i^R$ by definition, regardless of specific distributional assumptions. Because $y_i^D > y_i^R$ exactly when the nonterminal event was observed (i.e., $\delta_i^R = 1$), we have the following equivalence:

$$\begin{aligned}
\Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta) &= \mathbf{1}\{y_i^D \le y_i^R\}\,\Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta) + \mathbf{1}\{y_i^D > y_i^R\}\,\Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta) \\
&= \delta_i^R\,\Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta)
\end{aligned}$$

Then the conditional likelihood contribution of individual i is

$$\mathcal{L}_{c,i} = \left[\lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta)\right]^{\delta_i^R} \left[\lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta)\right]^{\delta_i^D(1-\delta_i^R)} \left[\lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta)\right]^{\delta_i^D \delta_i^R} \times \exp\left\{-\Lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta) - \Lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta) - \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta)\right\} \tag{7}$$

with $f(\gamma \mid \sigma) = \prod_{i=1}^n \frac{(1/\sigma)^{1/\sigma}}{\Gamma(1/\sigma)}\, \gamma_i^{1/\sigma - 1} \exp\{-\gamma_i(1/\sigma)\}$, as in the main text. The marginal likelihood across all observations can be written as

$$\mathcal{L}_m = \prod_{i=1}^n \left( \int_0^\infty \mathcal{L}_{c,i}\, f(\gamma_i \mid \sigma)\, d\gamma_i \right) \equiv \prod_{i=1}^n \mathcal{L}_{m,i} \tag{8}$$

This was first stated in Xu et al. (2010), but we provide a proof here.

First, we define some shorthand notation to suppress indexing that is unnecessary for this proof:

$$\begin{aligned}
h_1 &= \lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta), & H_1 &= \Lambda_1^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta), \\
h_2 &= \lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta), & H_2 &= \Lambda_2^{Z_i}(y_i^R \mid x_i, \gamma_i, \theta), \\
h_3 &= \lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta), & H_3 &= \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \gamma_i, \theta), \\
s &= \sigma, & d_E &= \delta_i^E \text{ for } E \in \{R, D\}, \\
g &= \gamma_i, &&
\end{aligned}$$

so that $(d_r, d_d) = (\delta_i^R, \delta_i^D)$. Because the frailty multiplies each transition hazard, we may write $h_j = g\,\bar{h}_j$ and $H_j = g\,\bar{H}_j$, where $\bar{h}_j$ and $\bar{H}_j$ denote the corresponding frailty-free hazards and cumulative hazards (i.e., evaluated at $\gamma_i = 1$).

We also note a general property of the gamma function: for $k \in \mathbb{R}^+$,

$$\Gamma(k + 1) = k\,\Gamma(k) \tag{9}$$
Substituting $h_j = g\,\bar{h}_j$ and $H_j = g\,\bar{H}_j$, the integrand below is proportional to $g^{1/s + d_r + d_d - 1}\exp\{-g(1/s + \bar{H}_1 + \bar{H}_2 + \bar{H}_3)\}$, an unnormalized gamma density, so the integral has a closed form:

$$\begin{aligned}
\mathcal{L}_{m,i} &= \int_0^\infty \frac{g^{1/s - 1}}{s^{1/s}\,\Gamma(1/s)}\; h_1^{d_r}\, h_2^{d_d(1 - d_r)}\, h_3^{d_r d_d}\, \exp\{-H_1 - H_2 - H_3 - g/s\}\, dg \\
&= \frac{s^{-1/s}\,\Gamma(1/s + d_r + d_d)\; \bar{h}_1^{d_r}\, \bar{h}_2^{d_d(1 - d_r)}\, \bar{h}_3^{d_r d_d}}{\Gamma(1/s)\left(1/s + \bar{H}_1 + \bar{H}_2 + \bar{H}_3\right)^{1/s + d_r + d_d}} \\
&= \frac{\Gamma(1/s + d_r + d_d)}{\Gamma(1/s)}\; \bar{h}_1^{d_r}\, \bar{h}_2^{d_d(1 - d_r)}\, \bar{h}_3^{d_r d_d}\; s^{d_r + d_d} \left[1 + s\left(\bar{H}_1 + \bar{H}_2 + \bar{H}_3\right)\right]^{-1/s - d_r - d_d} \\
&= \bar{h}_1^{d_r}\, \bar{h}_2^{d_d(1 - d_r)}\, \bar{h}_3^{d_r d_d} \times (1 + s)^{d_r d_d} \times \left[1 + s\left(\bar{H}_1 + \bar{H}_2 + \bar{H}_3\right)\right]^{-1/s - d_r - d_d},
\end{aligned}$$

where the final line uses $s^{d_r + d_d}\,\Gamma(1/s + d_r + d_d)/\Gamma(1/s) = (1 + s)^{d_r d_d}$ for the relevant (binary) $d_r, d_d$.

To see the final line, consider the four possible values taken on by the binary indicators $(d_r, d_d)$:

Case 1: $(d_r, d_d) = (0, 0)$

$$(1 + s)^{0} = 1 = s^{0 + 0}\,\frac{\Gamma(1/s)}{\Gamma(1/s)}$$

Cases 2 and 3: $(d_r, d_d) = (0, 1)$ and $(1, 0)$

$$(1 + s)^{0} = 1 = s^{1}\,\frac{\Gamma(1/s + 1)}{\Gamma(1/s)}$$

where the rightmost equality holds by Equation (9).

Case 4: $(d_r, d_d) = (1, 1)$

$$s^{1+1}\,\frac{\Gamma(1/s + 1 + 1)}{\Gamma(1/s)} = s^2\;\underbrace{\frac{\Gamma(1/s + 2)}{\Gamma(1/s + 1)}}_{=\,1/s + 1 \text{ by (9)}}\;\underbrace{\frac{\Gamma(1/s + 1)}{\Gamma(1/s)}}_{=\,1/s \text{ by (9)}} = s^2\left(\frac{1}{s} + 1\right)\frac{1}{s} = s + 1 = (1 + s)^{1 \cdot 1}$$

This proves the marginal likelihood has the form stated in the main text:

$$\mathcal{L}_m = \prod_{i=1}^n (1 + \sigma)^{\delta_i^R \delta_i^D} \left[\lambda_1^{Z_i}(y_i^R \mid x_i, \theta)\right]^{\delta_i^R} \left[\lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\right]^{\delta_i^D(1 - \delta_i^R)} \left[\lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\right]^{\delta_i^D \delta_i^R} \times \left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\right)\right]^{-(1/\sigma + \delta_i^R + \delta_i^D)}$$
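As a numerical sanity check on this closed form, the R sketch below compares the integral in Equation (8) with the expression above for a single observation with both events observed. The Weibull hazards suppress covariates, transition 3 is on the sojourn scale as in Appendix C, and all numerical values are illustrative rather than taken from the paper.

```r
## Numerical check (R) of the closed-form marginalization for one observation.
kappa <- c(0.016, 0.003, 0.009)   # illustrative baseline hazards
alpha <- c(0.97, 1.19, 0.91)      # illustrative Weibull shapes
sigma <- 1.3                      # illustrative frailty variance
yR <- 30; yD <- 80; dR <- 1; dD <- 1   # both events observed

lam <- function(j, t) kappa[j] * alpha[j] * t^(alpha[j] - 1)  # hazard
Lam <- function(j, t) kappa[j] * t^alpha[j]                   # cumulative hazard

## Conditional likelihood (Equation 7): the frailty g multiplies each hazard
Lc <- function(g) {
  (g * lam(1, yR))^dR * (g * lam(2, yR))^(dD * (1 - dR)) *
    (g * lam(3, yD - yR))^(dR * dD) *
    exp(-g * (Lam(1, yR) + Lam(2, yR) + Lam(3, yD - yR)))
}
fg <- function(g) dgamma(g, shape = 1 / sigma, rate = 1 / sigma)

numeric_Lm <- integrate(function(g) Lc(g) * fg(g), 0, Inf)$value

H <- Lam(1, yR) + Lam(2, yR) + Lam(3, yD - yR)
closed_Lm <- (1 + sigma)^(dR * dD) *
  lam(1, yR)^dR * lam(2, yR)^(dD * (1 - dR)) * lam(3, yD - yR)^(dR * dD) *
  (1 + sigma * H)^(-(1 / sigma + dR + dD))

all.equal(numeric_Lm, closed_Lm)  # TRUE up to quadrature error
```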

C. Log-likelihood contributions by observed data pattern

The log-likelihood marginalized over the frailties is

$$\ell_m^i = \delta_i^R \delta_i^D \left(\log(1 + \sigma) + \log \lambda_3\right) + \delta_i^R \log \lambda_1 + \delta_i^D (1 - \delta_i^R) \log \lambda_2 - \left(1/\sigma + \delta_i^R + \delta_i^D\right) \log(1 + B)$$

with $B = \sigma(\Lambda_1 + \Lambda_2 + \Lambda_3)$. This is the log-likelihood contribution added to the target log density within Stan.
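For reference, here is a direct R transcription of this contribution (a sketch; the function and argument names are ours, not from the paper's software). The hazards are assumed to be evaluated at the appropriate observed times, with transition 3 on the sojourn scale.

```r
## Sketch (R): marginalized log-likelihood contribution for one subject.
## lam1, lam2, lam3 are frailty-free hazards at the observed times;
## Lam1, Lam2, Lam3 are the matching cumulative hazards.
loglik_i <- function(dR, dD, lam1, lam2, lam3, Lam1, Lam2, Lam3, sigma) {
  B <- sigma * (Lam1 + Lam2 + Lam3)
  dR * dD * (log1p(sigma) + log(lam3)) +
    dR * log(lam1) +
    dD * (1 - dR) * log(lam2) -
    (1 / sigma + dR + dD) * log1p(B)
}
```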

The marginal likelihood $\mathcal{L}_m$ in the main text corresponds to four types of marginal likelihood and log-likelihood contributions: (1) neither event occurs, (2) only the nonterminal event occurs, (3) only the terminal event occurs, and (4) both events occur.

  1. Observe neither event ($\delta_i^R = \delta_i^D = 0$):
    $$\mathcal{L}_m^i = \left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\right)\right]^{-1/\sigma}$$
    $$\ell_m^i = -\frac{1}{\sigma}\log\left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\right)\right] = -\frac{1}{\sigma}\log\left[1 + \sigma\left(\kappa_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i}} + \kappa_2^{Z_i} e^{x_i\beta_2^{Z_i}}(y_i^R)^{\alpha_2^{Z_i}}\right)\right]$$
  2. Observe only the nonterminal event ($\delta_i^R = 1$, $\delta_i^D = 0$):
    $$\mathcal{L}_m^i = \lambda_1^{Z_i}(y_i^R \mid x_i, \theta)\left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\right)\right]^{-(1/\sigma + 1)}$$
    $$\ell_m^i = \log \lambda_1^{Z_i}(y_i^R \mid x_i, \theta) - \left(\frac{1}{\sigma} + 1\right)\log\left[1 + \sigma\left(\Lambda_1^{Z_i} + \Lambda_2^{Z_i} + \Lambda_3^{Z_i}\right)\right] = \log\left(\kappa_1^{Z_i}\alpha_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i} - 1}\right) - \left(\frac{1}{\sigma} + 1\right)\log\left[1 + \sigma\left(\kappa_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i}} + \kappa_2^{Z_i} e^{x_i\beta_2^{Z_i}}(y_i^R)^{\alpha_2^{Z_i}} + \kappa_3^{Z_i} e^{x_i\beta_3^{Z_i}}(y_i^D - y_i^R)^{\alpha_3^{Z_i}}\right)\right]$$
  3. Observe only the terminal event ($\delta_i^R = 0$, $\delta_i^D = 1$):
    $$\mathcal{L}_m^i = \lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta)\right)\right]^{-(1/\sigma + 1)}$$
    $$\ell_m^i = \log \lambda_2^{Z_i}(y_i^R \mid x_i, \theta) - \left(\frac{1}{\sigma} + 1\right)\log\left[1 + \sigma\left(\Lambda_1^{Z_i} + \Lambda_2^{Z_i}\right)\right] = \log\left(\kappa_2^{Z_i}\alpha_2^{Z_i} e^{x_i\beta_2^{Z_i}}(y_i^R)^{\alpha_2^{Z_i} - 1}\right) - \left(\frac{1}{\sigma} + 1\right)\log\left[1 + \sigma\left(\kappa_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i}} + \kappa_2^{Z_i} e^{x_i\beta_2^{Z_i}}(y_i^R)^{\alpha_2^{Z_i}}\right)\right]$$
  4. Observe both events ($\delta_i^R = \delta_i^D = 1$):
    $$\mathcal{L}_m^i = (1 + \sigma)\,\lambda_1^{Z_i}(y_i^R \mid x_i, \theta)\,\lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\left[1 + \sigma\left(\Lambda_1^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_2^{Z_i}(y_i^R \mid x_i, \theta) + \Lambda_3^{Z_i}(y_i^D \mid y_i^R, x_i, \theta)\right)\right]^{-(1/\sigma + 2)}$$
    $$\ell_m^i = \log(1 + \sigma) + \log\left(\kappa_1^{Z_i}\alpha_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i} - 1}\right) + \log\left(\kappa_3^{Z_i}\alpha_3^{Z_i} e^{x_i\beta_3^{Z_i}}(y_i^D - y_i^R)^{\alpha_3^{Z_i} - 1}\right) - \left(\frac{1}{\sigma} + 2\right)\log\left[1 + \sigma\left(\kappa_1^{Z_i} e^{x_i\beta_1^{Z_i}}(y_i^R)^{\alpha_1^{Z_i}} + \kappa_2^{Z_i} e^{x_i\beta_2^{Z_i}}(y_i^R)^{\alpha_2^{Z_i}} + \kappa_3^{Z_i} e^{x_i\beta_3^{Z_i}}(y_i^D - y_i^R)^{\alpha_3^{Z_i}}\right)\right]$$

D. Frailty marginalization and posterior predictive imputation

D.1. Full conditional form of frailties

Omitting terms that do not depend on $\gamma_i$, the conditional likelihood as a function of $\gamma_i$ is

$$\mathcal{L}_c \propto \gamma_i^{\delta_i^R + \delta_i^D}\, \exp\left\{-\gamma_i\left(\Lambda_1(y_i^R \mid x_i, \theta) + \Lambda_2(y_i^R \mid x_i, \theta) + \Lambda_3(y_i^D \mid y_i^R, x_i, \theta)\right)\right\}$$

This demonstrates that the posterior distribution of $\gamma_i$, conditional on $\theta$, depends on the data only through $O_i$. The only other place $\gamma_i$ appears in the posterior is in $f(\gamma_i \mid \sigma)$, which has kernel $\gamma_i^{1/\sigma - 1}\exp\{-\gamma_i(1/\sigma)\}$. Thus, the full conditional distribution for $\gamma_i$ is

$$\pi(\gamma_i \mid \cdot) \propto \gamma_i^{1/\sigma + \delta_i^R + \delta_i^D - 1}\, \exp\left\{-\gamma_i\left(1/\sigma + \Lambda_1(y_i^R \mid x_i, \theta) + \Lambda_2(y_i^R \mid x_i, \theta) + \Lambda_3(y_i^D \mid y_i^R, x_i, \theta)\right)\right\}$$

which can be recognized as the kernel of a $\text{Gamma}(a_1, a_2)$ distribution with shape $a_1 = 1/\sigma + \delta_i^R + \delta_i^D$ and rate $a_2 = 1/\sigma + \Lambda_1(y_i^R \mid x_i, \theta) + \Lambda_2(y_i^R \mid x_i, \theta) + \Lambda_3(y_i^D \mid y_i^R, x_i, \theta)$.

D.2. Frailty imputation

For each $i = 1, \ldots, n$ and $b = 1, \ldots, B$, sample $\gamma_i^{(b)}$ as

$$\gamma_i^{(b)} \mid \theta^{(b)} \sim \text{Gamma}\left(\frac{1}{\sigma^{(b)}} + \delta_i^R + \delta_i^D,\;\; \frac{1}{\sigma^{(b)}} + \Lambda_1(y_i^R \mid x_i, \theta^{(b)}) + \Lambda_2(y_i^R \mid x_i, \theta^{(b)}) + \Lambda_3(y_i^D \mid y_i^R, x_i, \theta^{(b)})\right)$$
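In R, this draw can be vectorized across subjects; the sketch below (ours) assumes `Lam1`, `Lam2`, and `Lam3` have already been evaluated at each subject's observed times under the current parameter draw $\theta^{(b)}$.

```r
## Sketch (R): per-subject frailty draws from the Gamma full conditional.
impute_frailty <- function(dR, dD, Lam1, Lam2, Lam3, sigma) {
  rgamma(length(dR),
         shape = 1 / sigma + dR + dD,             # a_1
         rate  = 1 / sigma + Lam1 + Lam2 + Lam3)  # a_2
}
```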

D.3. Imputation of censored outcomes

Censoring is the cause of missing outcome data in the factual treatment arm. In the presence of censoring for individual $i$, there is only partial information on one or both of $(R_i(Z_i), D_i(Z_i))$. Given the censoring time $C_i$ and draw $b$ of the posterior parameter and frailty vectors $(\theta^{(b)}, \gamma_i^{(b)})$, we can impute $R_i^{(b)}(Z_i)$ or the pair $(R_i^{(b)}(Z_i), D_i^{(b)}(Z_i))$. The hazards specified in the main text lead to a simple imputation strategy based on Weibull random deviates, and the resulting draws of $(R_i^{(b)}(Z_i), D_i^{(b)}(Z_i))$ are compatible with $\theta^{(b)}$, $\gamma_i^{(b)}$, and $O_i$. If individual $i$ was censored before the nonterminal event occurred, we impute the missing event times according to the following algorithm (an R sketch follows the list).

  1. Impute a candidate nonterminal event time $R^*$ from a Weibull distribution with shape parameter $\alpha_1^{Z_i,(b)}$ and scale parameter $\exp\{-(\log(\gamma_i^{(b)}\kappa_1^{Z_i,(b)}) + x_i\beta_1^{Z_i,(b)})/\alpha_1^{Z_i,(b)}\}$ that is truncated to have no mass below $C_i$.

  2. Impute a candidate death time $D^*$ from a Weibull distribution with shape parameter $\alpha_2^{Z_i,(b)}$ and scale parameter $\exp\{-(\log(\gamma_i^{(b)}\kappa_2^{Z_i,(b)}) + x_i\beta_2^{Z_i,(b)})/\alpha_2^{Z_i,(b)}\}$ that is truncated to have no mass below $C_i$. If $D^* < R^*$, set $R_i^{(b)}(Z_i) = \bar{R}$ and $D_i^{(b)}(Z_i) = D^*$; this yields a complete pair $(R_i^{(b)}(Z_i), D_i^{(b)}(Z_i))$ and the imputation concludes. Otherwise, set $R_i^{(b)}(Z_i) = R^*$ and continue to Step 3.

  3. Impute a sojourn time $S^*$ from a Weibull distribution with shape parameter $\alpha_3^{Z_i,(b)}$ and scale parameter $\exp\{-(\log(\gamma_i^{(b)}\kappa_3^{Z_i,(b)}) + x_i\beta_3^{Z_i,(b)})/\alpha_3^{Z_i,(b)}\}$.

  4. Set the imputed death time to $D_i^{(b)}(Z_i) = R_i^{(b)}(Z_i) + S^*$.

For individuals censored after the nonterminal event, the procedure starts at Step 3, with the modification that the distribution of the sojourn time must be truncated to have no mass below $C_i - R_i(Z_i)$. After imputation, each individual has a complete set of four potential outcomes at each of the $B$ MCMC iterations.
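Below is a hedged R sketch of Steps 1–4 for a single subject and posterior draw. The mapping of the paper's parameterization onto R's Weibull functions follows from matching cumulative hazards; `rtweibull()` is a small helper we introduce here (not part of the paper's code), and a nonterminal event that never occurs ($\bar{R}$) is coded as `Inf` purely for convenience.

```r
## Helper: draw from a Weibull truncated to (lower, Inf) by CDF inversion.
rtweibull <- function(n, shape, scale, lower = 0) {
  u <- runif(n, pweibull(lower, shape, scale), 1)
  qweibull(u, shape, scale)
}

## One posterior draw for a subject censored at time C before the nonterminal
## event. alpha, kappa (length-3 vectors) and beta (list of 3 coefficient
## vectors) are the transition-specific values for arm Z_i at iteration b;
## g is the frailty draw gamma_i^(b); x is the covariate vector.
impute_censored <- function(C, g, alpha, kappa, beta, x) {
  scale_j <- function(j) exp(-(log(g * kappa[j]) + sum(x * beta[[j]])) / alpha[j])
  Rstar <- rtweibull(1, alpha[1], scale_j(1), lower = C)  # Step 1
  Dstar <- rtweibull(1, alpha[2], scale_j(2), lower = C)  # Step 2
  if (Dstar < Rstar) {
    return(c(R = Inf, D = Dstar))   # death first: R-bar, coded here as Inf
  }
  Sstar <- rtweibull(1, alpha[3], scale_j(3))             # Step 3: sojourn
  c(R = Rstar, D = Rstar + Sstar)                         # Step 4
}
```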

D.4. Imputation of counterfactual potential outcomes

Missingness in the outcome pair $(R_i(1 - Z_i), D_i(1 - Z_i))$ is due to the so-called fundamental problem of causal inference. Because the frailties are assumed to be independent and gamma-distributed as in the main text, posterior draws of $(R_i^{(b)}(1 - Z_i), D_i^{(b)}(1 - Z_i))$ depend only on $\gamma_i^{(b)}$, $\theta^{(b)}$, and $O_i$.

Imputation of outcomes in the treatment arm counter to fact is actually simpler because there is no need to truncate the draws to agree with the observed $(Y_i^R, \delta_i^R, Y_i^D, \delta_i^D)$. Replace $Z_i$ with $1 - Z_i$ and $C_i$ with 0, then follow the algorithm in Section D.3.

E. Additional details for simulation study

  • Sample size $n = 6000$

  • Four independent Bernoulli covariates $X_1, \ldots, X_4$ with $p = 0.5$

  • One continuous covariate $X_5 \sim \mathcal{N}(0, 1)$

  • Mean-centering of the design matrices

  • Regression coefficients, baseline hazard rates, and Weibull shape parameters match the posterior medians for the first five covariates in a simplified version of the data application, where the covariate effects were assumed to be shared across treatment arms:

    $\beta_1^0 = \beta_1^1 = (0.2153716, -0.2693217, 0.4919442, 0.2751461, -0.1407124)$

    $\beta_2^0 = \beta_2^1 = (-0.2626574, -0.322248, 0.5197795, 0.6148415, 0.2845544)$

    $\beta_3^0 = \beta_3^1 = (0.16109025, -0.1152938, 0.4407258, 0.40880864, 0.09707071)$

    $(\alpha_1^0, \alpha_2^0, \alpha_3^0) = (0.96949, 1.19239, 0.91350)$

    $(\alpha_1^1, \alpha_2^1, \alpha_3^1) = (1.04497, 1.30856, 0.94227)$

    $(\kappa_1^0, \kappa_2^0, \kappa_3^0) = (0.01591, 0.00322, 0.00857)$

    $(\kappa_1^1, \kappa_2^1, \kappa_3^1) = (0.0147, 0.0041, 0.01352)$

  • Frailty variance $\sigma$ equal to 1.314846, the posterior median of $\sigma$ in the data application

Scenario 1: Correct specification

  • $\gamma_i \overset{\text{i.i.d.}}{\sim} \text{Gamma}(1/\sigma, 1/\sigma)$

  • $\omega_j^z = 1$ for all $j \in \{1, 2, 3\}$ and $z \in \{0, 1\}$
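Putting these elements together, the following R sketch (ours) generates one replicate under Scenario 1. Because $\omega_j^z = 1$, the frailty multiplies each transition hazard directly. The 1:1 randomized treatment assignment is our assumption for illustration, since the appendix does not state the assignment mechanism, and administrative censoring is omitted.

```r
## Sketch (R): one simulated replicate under Scenario 1.
set.seed(1)
n <- 6000; sigma <- 1.314846
X <- cbind(matrix(rbinom(n * 4, 1, 0.5), n, 4), rnorm(n))
X <- scale(X, center = TRUE, scale = FALSE)   # mean-center the design matrix
Z <- rbinom(n, 1, 0.5)                        # hypothetical 1:1 assignment
g <- rgamma(n, shape = 1 / sigma, rate = 1 / sigma)   # gamma frailties

alpha <- rbind(c(0.96949, 1.19239, 0.91350),  # row 1: arm z = 0; cols: transitions
               c(1.04497, 1.30856, 0.94227))  # row 2: arm z = 1
kappa <- rbind(c(0.01591, 0.00322, 0.00857),
               c(0.01470, 0.00410, 0.01352))
beta <- list(c(0.2153716, -0.2693217, 0.4919442, 0.2751461, -0.1407124),
             c(-0.2626574, -0.322248, 0.5197795, 0.6148415, 0.2845544),
             c(0.16109025, -0.1152938, 0.4407258, 0.40880864, 0.09707071))

## Weibull draw for transition j under each subject's assigned arm
rtrans <- function(j) {
  a <- alpha[Z + 1, j]
  scl <- exp(-(log(g * kappa[Z + 1, j]) + drop(X %*% beta[[j]])) / a)
  rweibull(n, shape = a, scale = scl)
}
T1 <- rtrans(1)                    # latent time to the nonterminal event
T2 <- rtrans(2)                    # latent time to death without the event
R <- ifelse(T1 < T2, T1, Inf)      # readmission occurs only if it comes first
D <- ifelse(T1 < T2, T1 + rtrans(3), T2)   # add a sojourn draw if readmitted
```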

Scenario 2: Log-normal frailty

A log-normal distribution $\text{log-}\mathcal{N}(\mu_L, \sigma_L^2)$ has the following properties:

  • Mean: $\exp(\mu_L + \sigma_L^2/2)$

  • Variance: $\left(\exp(\sigma_L^2) - 1\right)\exp(2\mu_L + \sigma_L^2)$

For identifiability reasons, we constrain the mean of the distribution to be one (i.e., require $\mu_L = -\sigma_L^2/2$), yielding the curved exponential family distribution $\text{log-}\mathcal{N}(-\sigma_L^2/2, \sigma_L^2)$. To make the variance of the log-normal frailties match the variance $\sigma$ in the other scenarios, we choose $\sigma_L^2 = \log(\sigma + 1)$ for the $\sigma$ stated above.

  • $\gamma_i \overset{\text{i.i.d.}}{\sim} \text{log-}\mathcal{N}\left(\mu_L = -\log(\sigma + 1)/2,\; \sigma_L^2 = \log(\sigma + 1)\right)$

  • $\omega_j^z = 1$ for all $j \in \{1, 2, 3\}$ and $z \in \{0, 1\}$
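As a quick numerical check on this construction (our own sketch, not from the paper), draws from the constrained log-normal should have mean ≈ 1 and variance ≈ $\sigma$:

```r
## Check (R): log-normal frailties with mean 1 and variance sigma.
sigma <- 1.314846
sdlog2 <- log(sigma + 1)                          # sigma_L^2
g <- rlnorm(1e6, meanlog = -sdlog2 / 2, sdlog = sqrt(sdlog2))
c(mean = mean(g), var = var(g))                   # approx (1, sigma)
```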

Scenario 3: Variable frailty exponents

The final scenario retains a gamma distribution for the frailty, but the relevance of the frailty differs across the transition-specific models.

  • $\gamma_i \overset{\text{i.i.d.}}{\sim} \text{Gamma}(1/\sigma, 1/\sigma)$

  • The exponents $\omega_j^z$ vary:

    $\omega_1^0 = \exp\{-0.1\}$, $\omega_2^0 = \exp\{-0.2\}$, $\omega_3^0 = \exp\{0\}$

    $\omega_1^1 = \exp\{0.15\}$, $\omega_2^1 = \exp\{0.2\}$, $\omega_3^1 = \exp\{0\}$

E.1. Prior specification

The mean of the frailty distribution must be constrained to 1 for identifiability purposes, and the frailties must be assumed to arise from a gamma distribution in order to arrive at the marginalized likelihood. These two criteria are met by distributions of the form $\text{Gamma}(1/\sigma, 1/\sigma)$. In all simulation scenarios, data were analyzed with an inverse-gamma prior for $\sigma$.

  • $\gamma_i \overset{\text{i.i.d.}}{\sim} \text{Gamma}(1/\sigma, 1/\sigma)$

  • $\sigma \sim \text{Inverse-Gamma}(18/d, 25/d)$ for $d > 0$

Numerators for the $\sigma$ hyperparameters were chosen to approximately center the prior around the underlying $\sigma$. Increasing $d$ leads to vaguer priors for $\sigma$ and offers a path to performing sensitivity analysis. Simulation results presented here arise from $d = 7$.
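A small R sketch (ours) of this prior, using the fact that the reciprocal of a $\text{Gamma}(a, \text{rate} = b)$ draw is $\text{Inverse-Gamma}(a, b)$, illustrates how larger $d$ widens the prior while keeping it near the data-application value $\sigma \approx 1.3$:

```r
## Sketch (R): the Inverse-Gamma(18/d, 25/d) prior for sigma.
rinvgamma <- function(n, shape, scale) 1 / rgamma(n, shape = shape, rate = scale)
for (d in c(1, 7, 14)) {
  s <- rinvgamma(1e5, shape = 18 / d, scale = 25 / d)
  print(round(c(d = d, q05 = unname(quantile(s, 0.05)),
                median = median(s), q95 = unname(quantile(s, 0.95))), 2))
}
```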

E.2. MCMC specification

Each replicate data set was analyzed with model fits using the final 2,000 iterations of a 5,000-iteration chain.

Across 200 replicates in each of the three scenarios, none of the fits displayed signs of poor convergence, defined as having any of the following characteristics:

  • Energy fraction of missing information (E-FMI) < 0.2

  • One or more post-warmup HMC divergences in any chain

  • One or more post-warmup iterations that reached the maximum tree depth

  • Effective post-warmup sample size for TV-SACE(30) or RM-SACE(90) below 500.
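Assuming the model is fit with rstan (the paper fits its models in Stan), the first three screens above could be computed along the following lines; `fit` is a hypothetical stanfit object, the E-FMI expression is the usual energy diagnostic up to its normalization convention, and a maximum tree depth of 10 (the Stan default) is assumed.

```r
## Sketch (R): convergence screens from rstan sampler diagnostics.
sp <- rstan::get_sampler_params(fit, inc_warmup = FALSE)
divergences    <- sum(sapply(sp, function(x) sum(x[, "divergent__"])))
max_depth_hits <- sum(sapply(sp, function(x) sum(x[, "treedepth__"] >= 10)))
efmi <- sapply(sp, function(x) {
  e <- x[, "energy__"]
  sum(diff(e)^2) / ((length(e) - 1) * var(e))  # energy fraction of missing info
})
poor_fit <- divergences > 0 || max_depth_hits > 0 || any(efmi < 0.2)
```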

Monte Carlo standard errors were calculated following the jackknife-after-bootstrap formula of Koehler et al. (2009). Specifically, let $X = (X_1, \ldots, X_R)$ designate the $R$ simulated observed data sets, and let $\hat{\phi}_R(X)$ be a Monte Carlo estimate of an operating characteristic, such as the 95% credible interval coverage for $\text{TV-SACE}_{\text{snap}}(t = 30)$. The estimated Monte Carlo standard error is

$$\sqrt{\frac{R - 1}{R} \sum_{r=1}^R \left( \hat{\phi}_{R-1}(X_{(-r)}) - \frac{1}{R}\sum_{r'=1}^R \hat{\phi}_{R-1}(X_{(-r')}) \right)^2},$$

where $\hat{\phi}_{R-1}(X_{(-r)})$ is the estimate calculated with the $r$th replicate removed.
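For the common case where the operating characteristic is itself an average across replicates (e.g., coverage), the formula reduces to the following R sketch (ours):

```r
## Sketch (R): jackknife Monte Carlo standard error for a replicate average.
## `est` is a hypothetical length-R vector of per-replicate values, e.g.,
## 1 if the 95% credible interval covered the true TV-SACE and 0 otherwise.
mcse_jackknife <- function(est) {
  R <- length(est)
  loo <- sapply(seq_len(R), function(r) mean(est[-r]))  # estimate without rep r
  sqrt((R - 1) / R * sum((loo - mean(loo))^2))
}
```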

F. Hospital readmission data application

Figure 4 shows the degree to which matching on the estimated propensity score reduces imbalance in the observed covariates. All treated patients (i.e., patients discharged to home care with support) were matched to similar control patients (i.e., patients discharged to home care without additional support). While covariate imbalance is substantially reduced in the post-matching sample, some imbalance remains on the length of the initial hospital stay.
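The matching step can be reproduced with standard software; below is a hedged R sketch in the spirit of Ho et al. (2007), in which the data frame `discharge`, the treatment indicator `support`, and the covariate names are hypothetical placeholders rather than the study's actual variables.

```r
## Hedged sketch (R): nearest-neighbor propensity score matching with MatchIt.
## All variable names are hypothetical placeholders.
library(MatchIt)
m_out <- matchit(support ~ age + sex + stay_length + comorbidity,
                 data = discharge, method = "nearest", distance = "glm")
summary(m_out)   # covariate balance before/after matching (cf. Figure 4)
```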

Figure 4. Reduction in covariate imbalance after propensity score matching of late-stage pancreatic cancer patients discharged to home care with support.

Footnotes

Conflict of Interest

The authors have declared no conflict of interest.

References

  1. Bartolucci F and Grilli L (2011). Modeling partial compliance through copulas in a principal stratification framework. JASA 106, 469–479.
  2. Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, and Riddell A (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76.
  3. Chen D, Li J, and Chong JK (2017). Hazards regression for freemium products and services: a competing risks approach. Journal of Statistical Computation and Simulation 87, 1863–1876.
  4. Dai J, Gilbert P, and Mâsse B (2012). Partially hidden Markov model for time-varying principal stratification in HIV prevention trials. JASA 107, 52–65.
  5. Fine JP, Jiang H, and Chappell R (2001). On semi-competing risks data. Biometrika 88, 907–919.
  6. Gelman A, Carlin J, Stern H, Dunson D, Vehtari A, and Rubin D (2013). Bayesian Data Analysis. Chapman and Hall/CRC.
  7. Gustafson P (2010). Bayesian inference for partially identified models. The International Journal of Biostatistics 6.
  8. Hernán MA and Robins JM (2016). Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology 183, 758–764.
  9. Ho D, Imai K, King G, and Stuart E (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis.
  10. Jin H and Rubin D (2008). Principal stratification for causal inference with extended partial compliance. JASA 103, 101–111.
  11. Koehler E, Brown E, and Haneuse SJ-P (2009). On the assessment of Monte Carlo error in simulation-based statistical analyses. The American Statistician 63, 155–162.
  12. Lee K, Haneuse S, Schrag D, and Dominici F (2015). Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis. JRSS:C 64, 253–273.
  13. Lin J, Ten Have T, and Elliott M (2008). Longitudinal nested compliance class model in the presence of time-varying noncompliance. JASA 103, 462–473.
  14. Liu L, Wolfe R, and Huang X (2004). Shared frailty models for recurrent events and a terminal event. Biometrics 60, 747–756.
  15. Long D and Hudgens M (2013). Sharpening bounds on principal effects with covariates. Biometrics 69, 812–819.
  16. Lyu T, Bornkamp B, Mueller-Velten G, and Schmidli H (2023). Bayesian inference for a principal stratum estimand on recurrent events truncated by death. Biometrics. https://onlinelibrary.wiley.com/doi/pdf/10.1111/biom.13831
  17. Mattei A, Ding P, Ballerini V, and Mealli F (2024). Assessing causal effects in the presence of treatment switching through principal stratification. Bayesian Analysis, to appear.
  18. Nevo D and Gorfine M (2022). Causal inference for semi-competing risks data. Biostatistics 23, 1115–1132.
  19. Pan Q and Gastwirth J (2013). Estimating restricted mean job tenures in semi-competing risk data compensating victims of discrimination. The Annals of Applied Statistics 7, 1474–1496.
  20. Robins J (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393–1512.
  21. Royston P and Parmar M (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21, 2175–2197.
  22. Rubin D (1990). Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science 5, 472–480.
  23. Rubin D (2000). Causal inference without counterfactuals: comment. JASA 95, 435–438.
  24. Schwartz S, Li F, and Mealli F (2011). A Bayesian semiparametric approach to intermediate variables in causal inference. JASA 106, 1331–1344.
  25. Stensrud MJ and Dukes O (2022). Translating questions to estimands in randomized clinical trials with intercurrent events. Statistics in Medicine 41, 3211–3228.
  26. Stensrud MJ, Young JG, Didelez V, Robins JM, and Hernán MA (2020). Separable effects for causal inference in the presence of competing events. Journal of the American Statistical Association, 1–9.
  27. Tchetgen Tchetgen E (2014). Identification and estimation of survivor average causal effects. Statistics in Medicine 33, 3601–3628.
  28. Xu J, Kalbfleisch J, and Tai B (2010). Statistical analysis of illness–death processes and semicompeting risks data. Biometrics 66, 716–725.
  29. Xu Y, Scharfstein D, Müller P, and Daniels M (2022). A Bayesian nonparametric approach for evaluating the causal effect of treatment in randomized trials with semi-competing risks. Biostatistics 23, 34–49.
  30. Young JG, Stensrud MJ, Tchetgen Tchetgen EJ, and Hernán MA (2020). A causal framework for classical statistical estimands in failure-time settings with competing events. Statistics in Medicine 39, 1199–1236.
  31. Zhang J and Rubin D (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by "death". Journal of Educational and Behavioral Statistics 28, 353–368.
