ABSTRACT
This study proposes a heterogeneous mediation analysis for survival data that accommodates multiple mediators and sparsity of the predictors. We introduce a joint modeling approach that links the mediation regression and proportional hazards models through Bayesian additive regression trees with shared topologies. The shared tree component is motivated by the fact that confounders and effect modifiers on the causal pathways linked by different mediators often overlap. A sparsity‐inducing prior is incorporated to capture the most relevant confounders and effect modifiers on different causal pathways. The individual‐specific interventional direct and indirect effects are derived on the scale of the logarithm of hazards and survival function. A Bayesian approach with an efficient Markov chain Monte Carlo algorithm is developed to estimate the conditional interventional effects through the Monte Carlo implementation of the mediation formula. Simulation studies are conducted to verify the empirical performance of the proposed method. An application to the ACTG175 study further demonstrates the method's utility in causal discovery and heterogeneity quantification.
Keywords: heterogeneous effect, multiple mediators, survival outcome, variable selection
1. Introduction
Unraveling the mechanisms through which an exposure or treatment exerts its effect on an outcome of interest is a fundamental pursuit across multiple disciplines, including epidemiology, psychology, and social sciences. Mediation analysis has emerged as a powerful tool for disentangling the intricate causal pathways from the exposure to the outcome, which can be implemented through one or more intermediate variables. Various mediation models have been developed in survival analysis to elucidate the causal mechanism underlying the relationship between a therapy or risk factor and a time‐to‐event outcome. A majority of these approaches are grounded in classic frameworks, such as a linear structural equation model [1] combined with an additive hazards model [2, 3], proportional hazards model [4, 5], or transformation models [6, 7], enabling efficient estimation of the population‐level direct and indirect effects with good interpretability.
Beyond the scope of mediation analysis, the growing accessibility of large‐scale clinical trials and observational studies, coupled with recent advancements in machine learning techniques, has inevitably shifted the focus of causal inference from population‐level average treatment effects (ATE) to group‐level conditional average treatment effects (CATE) given individual‐specific features. This trend is prominent in biomedical studies involving survival data, where patients with distinct characteristics naturally respond differently to the same treatment. The rise of precision medicine has fueled the development of various approaches for estimating heterogeneous treatment effects, including causal survival forest [8], nonparametric accelerated failure time model based on Bayesian trees [9, 10], subgroup analysis via gradient tree boosting [11], and deep neural networks‐based Cox regression [12]. However, these approaches primarily focus on capturing heterogeneity within the total effect of the exposure on the survival outcome. They often overlook the potential existence of embedded causal pathways, let alone the more intricate scenario where multiple mediators are present, each of which may exhibit heterogeneity. Although conventional moderated mediation analysis and its recent adaptation to the causal framework [13, 14] provide a straightforward solution by including first‐order interaction terms of a suspected moderator with the exposure in the mediator(s) regression model, as well as the interaction of the suspected moderator with the exposure or the mediator(s) in the outcome regression model, this strategy typically targets one moderator at a time and can become less practical when many candidate moderators exist.
On the other hand, large‐scale trials and observational datasets often come with many pretreatment covariates or even posttreatment covariates serving as potential mediators. However, a substantial proportion of these covariates may neither affect the mediator/outcome nor interact with the exposure. In other words, in practical applications, it is often the case that only a small portion of the pretreatment covariates truly interact with the exposure or the mediators on their pathways to the outcome, and similarly, only a small part of the posttreatment variables genuinely mediate the effect of the exposure on the outcome. In such cases, the accurate estimation of CATE for different subgroups and the reliability of the detected sources of heterogeneity inevitably rely on feature selection for the sparse regression surface. Therefore, it is critically important to differentiate the genuine confounders and effect modifiers from the irrelevant covariates and, more crucially, to distinguish the true mediators from a set of posttreatment covariates that were measured temporally between treatment initiation and the outcome of interest, and consequently included among the candidate mediators. Although recent literature has seen various works addressing this issue for causal survival analysis [15, 16] or high‐dimensional mediation analysis [17], these approaches have yet to consider a simultaneous selection of confounders, effect modifiers, and mediators for heterogeneous mediation analysis with a survival outcome.
To address these challenges, we propose a novel Bayesian semiparametric approach for mediation analysis with survival outcomes that adapts to the existence of multiple mediators, heterogeneity across causal pathways, sparsity of the regression surfaces, and overlapping patterns of confounders or effect modifiers across different pathways. Our model comprises two major components built upon the Bayesian Causal Forest (BCF) [18] framework, which provides a flexible and interpretable way to estimate CATE. Specifically, the first component is a mediator regression model, which characterizes the prognostic factors of the exposure–mediator relationships and modifiers of the direct effect on the mediators through two separate Bayesian additive regression trees (BART) [19]. Motivated by the mixed‐scale Bayesian forest [20], we allow the tree topologies within each BART to be shared among the mediators, such that the overlap of the confounders or effect modifiers can be accommodated straightforwardly. The second component fits the survival outcome through a proportional hazards (PH) model with two separate BART, one capturing the prognostic factors of the exposure–outcome relationship and the other describing the direct effect of the exposure on the logarithm of hazards. The PH model is linked to the mediator regression model in that the second BART is allowed to share tree topologies with its counterpart in the mediator regression model, thereby accommodating the presence of overlapping modifiers for the direct effects of the exposure. In addition, we extend the BCF to incorporate shrinkage priors for variable selection [21], enabling the selection of relevant confounders and effect modifiers while simultaneously fitting the possibly complex and nonlinear regression surfaces.
By combining the flexibility of a Bayesian nonparametric ensemble of trees with shared topologies and a sparsity‐inducing prior, the proposed approach offers a powerful tool for uncovering heterogeneous mediation mechanisms in survival data without imposing restrictive assumptions on the regression surfaces or requiring manual variable preselection.
The rest of the article is organized as follows. Section 2 introduces the proposed heterogeneous mediation model with shared tree ensembles. Section 3 defines the interventional path‐specific effects under the potential outcomes framework, along with a set of identifiability assumptions. Section 4 elucidates the Bayesian inference procedure for the proposed methodology. Section 5 presents simulation studies that evaluate the empirical performance of the proposed model. Section 6 provides a real‐world application example using a dataset collected from a clinical trial on HIV‐infected patients. Section 7 concludes the paper. Technical details are provided in Data S1.
2. Model Description
2.1. A Brief Overview of BART
Let $\mathbf{x} = (x_1, \ldots, x_P)^\top$ be a vector of predictors and $y$ be a continuous response variable. BART is a flexible nonparametric model that approximates the unknown regression function of $y$ on $\mathbf{x}$ via a sum of binary trees, where each binary tree can be viewed as a simple step function that splits the original dataset into more homogeneous subsets according to the values of $\mathbf{x}$. Such a regression problem is usually set up as
| $y = f(\mathbf{x}) + \epsilon, \quad f(\mathbf{x}) = \sum_{j=1}^{m} g(\mathbf{x}; T_j, M_j)$ | (1) |
where $f$ is the true unknown regression function to be learnt from the data and $\epsilon$ is a normally distributed error term with zero mean. $T_j$ denotes the structure of the $j$th binary tree composed of a collection of internal nodes with splitting rules and terminal nodes, and $M_j = \{\mu_{1j}, \ldots, \mu_{b_j j}\}$ denotes the vector of parameter values corresponding to each terminal node, with $\mu_{kj}$ serving as a gauge of the average outcome inside the $k$th node. At each internal node, binary splits are made according to rules of the form $x_p \le c$ vs. $x_p > c$, with $x_p$ being the $p$th predictor and $c$ being a threshold. The top‐down sequence of all such splitting rules across each binary tree $T_j$, as a whole, recursively partitions the original covariate space into subsets represented by the terminal nodes. $g(\mathbf{x}; T_j, M_j)$ is a function that maps a given $\mathbf{x}$ to the node parameter $\mu_{kj}$ when $\mathbf{x}$ is allocated to the $k$th ($k = 1, \ldots, b_j$) terminal node of $T_j$ according to the above rules, thereby ending up as a piecewise constant function. Since any given $\mathbf{x}$ is assigned to one unique terminal node within each tree, $f(\mathbf{x})$ (or the conditional mean function $E(y \mid \mathbf{x})$) is fitted as a sum of the corresponding $\mu_{kj}$s over the $m$ trees.
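The sum‐of‐trees construction described above can be illustrated with a minimal Python sketch. The dictionary‐based tree layout, the function names, and all numerical values are hypothetical choices made for illustration; they are not taken from any BART software or from the proposed model.

```python
# Illustrative sum-of-trees predictor. The dict-based tree layout is a
# hypothetical representation chosen for this sketch, not a BART package API.

def tree_predict(node, x):
    """Follow the splitting rules from the root until a terminal node is reached."""
    while "leaf" not in node:
        # Rule of the form x[var] <= threshold (go left) vs. x[var] > threshold (go right).
        node = node["left"] if x[node["var"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def ensemble_predict(trees, x):
    """Sum the leaf parameters that x is allocated to, one per tree."""
    return sum(tree_predict(tree, x) for tree in trees)


# Two small trees over a two-dimensional predictor (all values are made up).
tree_1 = {"var": 0, "threshold": 0.5,
          "left": {"leaf": -1.0},
          "right": {"var": 1, "threshold": 2.0,
                    "left": {"leaf": 0.5}, "right": {"leaf": 1.5}}}
tree_2 = {"var": 1, "threshold": 1.0,
          "left": {"leaf": 0.2}, "right": {"leaf": -0.3}}
trees = [tree_1, tree_2]
```

Each tree contributes exactly one leaf value for a given predictor vector, so the fitted function is piecewise constant over the partition induced jointly by all trees.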
The BART framework has seen significant extensions to handle diverse data types, including mixed‐scale responses, time‐to‐event outcomes, and high‐dimensional predictors. Notable examples include the BCF, which introduced tailored regularization for estimating CATEs, and shrinkage priors [21, 22, 23] that enable variable selection from redundant covariates, among others. In the next section, we will build upon the BCF framework to introduce the proposed heterogeneous mediation model. For a comprehensive introduction to BART, readers are directed to Tan and Roy [24], while Hill, Linero, and Murray [25] provide an overview of recent developments in this area.
2.2. Parallel Mediators With Shared Ensembles
We consider a two‐arm study design with continuous mediator(s) and a right‐censored time‐to‐event outcome. For subject , let denote the vector of pretreatment covariates. Let be the treatment indicator, with if subject is assigned to the treated group and otherwise. Let be a vector of parallel (i.e., causally non‐ordered) mediators with each element representing a specific mediator that links a potential causal pathway between the treatment and the time‐to‐event outcome. Causal heterogeneity is usually characterized as interaction between the treatment indicator and a subset of , also known as the effect modifiers. Such effect modifiers in moderated mediation analysis can be shared across pathways from the treatment to multiple mediators. Therefore, we model the parallel mediators through two shared ensembles of trees as follows:
| (2) |
where is a dimensional vector that captures the prognostic factors of the treatment‐mediator relationship through the ensemble of binary trees with leaf node parameters and , . is another BART that captures the direct effect of the treatment on the mediators through shared tree topology with leaf node parameters and . Note that and within and are dimensional vectors since the mediators are modeled jointly without any prespecified causal structure. For a special single‐mediator case with , and in Equation (2) simply degenerate to two regular BART (i.e., and ) with one‐dimensional leaf node parameters and , respectively. denotes the corresponding node‐parameter‐allocation functions. is the residual term of the mediator regression equations. While an alternative approach is to model the residual distribution nonparametrically as a location mixture of Gaussian distributions (see, e.g., Henderson et al. [9]), we found through a pilot simulation that combining the shared tree ensembles and Dirichlet process prior did not synergistically improve estimation accuracy of the proposed method. The additional model complexity introduced rendered it less desirable compared to the simpler assumption of normally distributed residuals.
Our model development up to this point has focused on scenarios where the mediators are continuous variables. However, it is straightforward to extend the mediator regression model to accommodate binary mediators using the multivariate probit model. Specifically,
| (3) |
where are the observed binary mediators, are the underlying latent Gaussian variables as introduced by Chib and Greenberg [26], is the indicator function, and are the residuals with the diagonal elements of fixed at 1.0. Similar extensions to accommodate binary mediators or outcomes have been adopted and seen to be effective in existing BART‐based causal models [24, 25, 27].
2.3. Cox Proportional Hazards Model
Let and represent the event time of interest and censoring time for subject , respectively. Define as an indicator of whether the th subject experiences the event of interest or is censored, and as the observed time. Throughout this paper, we consider the scenario where a subject has the event or is censored after the mediators have been measured. Recognizing that effect modifiers can exert influence on multiple causal pathways simultaneously, that is, from both the treatment to the mediators and the treatment to the outcome, we employ the shared tree ensembles again in the PH model to jointly capture the causal relationships between the treatment, mediators, and the hazard at time :
| (4) |
where captures the direct effect of the treatment on the hazard through the shared tree topology with but possesses a different set of leaf node parameters, . is another separate BART that quantifies the prognostic factors, or equivalently, the main effects and interaction of the pretreatment covariates and the mediators, on the pathways to the survival response through the sum of binary trees with leaf node parameters . is the unknown baseline hazard function.
The proposed modeling approach stems from the frequent occurrence of shared effect modifiers influencing both the treatment–mediator and treatment–outcome relationships in real‐world scenarios. These shared modifiers can exhibit consistent moderation patterns, even in high‐dimensional settings with a sparse subset of true modifiers. The prognostic factors in the treatment–mediator relationship may also demonstrate an overlapping pattern. The shared tree topologies, and , naturally accommodate the overlapping prognostic factors or moderators in the causal pathways. In addition, our approach extends moderated mediation to a semiparametric framework, accommodating arbitrary interactions among the effect modifiers and the treatment. This flexibility applies to both the mediator regression model and the outcome regression model, in contrast to the classical use of first‐order interaction terms. The BCF structure on the right‐hand side of Equations (2)–(4) directly captures the confounders and modifiers as splitting variables within different tree ensembles, with no need to prespecify the functional form of the covariates.
2.4. Model Specification and Variable Selection
We adopt the default BART prior outlined in Chipman, George, and McCulloch [19] and the BCF regularization prior of Hahn, Murray, and Carvalho [18] to specify the proposed model. However, we introduce a slight modification to account for the shared tree structure inherent in our modeling approach. For the mediator regression equation, consider first the following priors on each binary tree in or : (i) the probability that a node at depth continues splitting is given by or , where and are hyperparameters set a priori to control the scope of each binary tree; (ii) the splitting variable at each internal node is selected uniformly from the set of all available variables, and the splitting value at the internal node given the known splitting variable is drawn uniformly from the set of all available splitting values constructed from the interpolated sample quantiles; and (iii) given or , the leaf node parameters or are assumed with conjugate normal priors or , respectively, where are prespecified hyperparameters that ensure a substantial prior probability is assigned within a desirable range for each mediator marginally, and and are diagonal positive definite matrices. Priors (i)–(iii) collectively impose a regularization effect on each individual binary tree, restricting it to a weak learner that contributes only a small portion to the overall fit. The sequential accumulation of such weak learners under such regularization helps mitigate the risk of overfitting.
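To make the regularizing role of prior (i) concrete, the following sketch evaluates a depth‐dependent splitting probability. It assumes the standard form alpha * (1 + d)^(-beta) from the BART prior of Chipman, George, and McCulloch [19]; the two parameter pairs used below are the widely quoted BART and BCF defaults and are not necessarily the values adopted in this paper.

```python
def split_probability(depth, alpha, beta):
    """Prior probability that a node at the given depth splits: alpha * (1 + depth) ** (-beta)."""
    return alpha * (1.0 + depth) ** (-beta)


def split_profile(alpha, beta, max_depth=3):
    """Splitting probabilities for depths 0, 1, ..., max_depth."""
    return [split_probability(d, alpha, beta) for d in range(max_depth + 1)]


# Assumed illustrative settings (not confirmed by the source):
# (0.95, 2) is the usual BART default, (0.25, 3) the stronger BCF-style regularization.
default_profile = split_profile(alpha=0.95, beta=2.0)
shallow_profile = split_profile(alpha=0.25, beta=3.0)
```

Under either setting the probability of further splitting decays quickly with depth, which is what keeps each tree a weak learner; the second setting makes even a first split unlikely.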
A common concern regarding the default BART prior is its potential limitation in high‐dimensional settings with a substantial number of irrelevant predictors. The discrete uniform prior (ii) used for predictor selection lacks an explicit mechanism for inducing sparsity or feature shrinkage. Specifically, consider as the vector of splitting probabilities for the predictors . The discrete uniform prior with for gives each predictor an equal chance of being selected as a splitting variable. Consequently, it may struggle to identify the real confounders or effect modifiers in our mediation problem, hindering model accuracy and interpretability. As an alternative, we employ the Gibbs‐type prior proposed by Linero and Du [21] to achieve variable selection in the proposed heterogeneous mediation analysis. This sparsity‐inducing prior is given as follows:
| (5) |
For an internal node at depth , the idea is to first sample , the number of predictors allowed to be split on. A subset of predictors, of size , is then sampled from the complete set of predictors . For the sampled predictors with , a Dirichlet prior is assigned to the corresponding splitting probability . is a hyperparameter that encodes the preference for sparsity. Notably, with and , the Gibbs‐type prior (ii′) reduces to the discrete uniform prior.
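The subset‐then‐Dirichlet idea behind the sparsity‐inducing prior can be sketched as follows. This is only an illustration under simplifying assumptions: the number of active predictors is passed in directly rather than drawn from its own prior, the Dirichlet is taken to be symmetric, and the function name and arguments are hypothetical.

```python
import random


def sample_sparse_split_probs(n_predictors, n_active, concentration, rng):
    """Draw splitting probabilities that are nonzero on a random subset only."""
    # Choose which predictors are allowed to be split on (uniformly, without replacement).
    active = rng.sample(range(n_predictors), n_active)
    # A symmetric Dirichlet draw is obtained by normalizing independent gamma draws.
    gammas = [rng.gammavariate(concentration, 1.0) for _ in active]
    total = sum(gammas)
    probs = [0.0] * n_predictors
    for index, value in zip(active, gammas):
        probs[index] = value / total
    return probs


rng = random.Random(2024)
probs = sample_sparse_split_probs(n_predictors=20, n_active=3, concentration=1.0, rng=rng)
```

Predictors outside the sampled subset receive zero splitting probability, so they can never be chosen as splitting variables under that draw; this is the mechanism that separates relevant covariates from irrelevant ones.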
We use a similar approach to specify the ensemble of trees in the PH model, assuming a prior node‐splitting probability of for each binary tree in and the same sparsity‐inducing prior for the splitting rules as described in (ii′) above. Throughout the paper, we consider a scenario where the dimension of the candidate mediators, , does not necessarily grow as the number of observations increases and remains relatively small (as demonstrated in Sections 5 and 6). The primary difference lies in the priors assigned to the leaf node parameters, and . Based on the Bayesian justification of Cox's partial likelihood [28], we assign conjugate log‐gamma priors to the leaf node parameters, that is, and , where are hyperparameters that ensure that each binary tree is a weak learner and that the ensemble has a reasonable prior range. Section 4 contains a detailed description of the choice of the hyperparameters.
3. Definition and Identification of the Interventional Conditional Effects
In this section, we construct the interventional conditional path‐specific effects (ICPSEs) under the counterfactual framework, using “distribution shifts” of the potential outcomes given the covariates. Let be the potential survival time that would have been observed if the th subject had been exposed to a treatment level of and a mediator level of , where each element is set to the level of , . Let be the th potential mediator that would have been observed for the th subject had the treatment level been set to . Following the definition of interventional (in)direct effects [29, 30], we further denote as the potential survival time that would have been observed for the th subject if the treatment level is set to and the value of each mediator is set to a random draw from its (counterfactual) marginal distribution under a treatment level , that is, , with , , and denoting the cumulative distribution function of conditional on . Comparatively, let be the potential survival time for the th subject if the treatment level is set to and the mediator levels are set to a random draw from the (counterfactual) joint distribution of the potential mediators under a treatment level , that is, , with .
Throughout this paper, we invoke three widely accepted assumptions that underpin causal inference: the stable unit treatment value assumption (SUTVA), the positivity assumption, and the consistency assumption. SUTVA postulates that treatment assignment for one individual does not influence the potential outcomes of any other individual, ruling out interference between units. The positivity assumption ensures that all units have a nonzero probability of being assigned to any of the treatment arms, preventing scenarios where certain subgroups are deterministically excluded from a specific treatment level. Within each treatment arm, it is further stipulated that densities of the candidate mediators conditional on the pretreatment covariates are nonzero with probability 1 for each value of their respective support. The consistency assumption states that an individual's potential mediators or potential outcome under the treatment condition they actually experienced is precisely their observed mediators/outcome.
Let denote a user‐specified function of the potential survival time conditional on the covariates. We define the conditional path‐specific effects under the interventional framework by substituting with or . For notational simplicity, we omit and write it as or in what follows. For each individual, the conditional average total effect of the binary treatment is defined as
| (6) |
where and correspond to the two treatment arms. This conditional average total effect can be decomposed into a conditional direct effect defined as
| (7) |
which is implemented along the direct causal pathway from the treatment to survival time, and a joint conditional indirect effect defined as
| (8) |
which is implemented by shifting the joint distribution of the potential mediators. Since the proposed model implicitly assumes no treatment–mediator interaction in the outcome regression equation, the two possible ways of decomposition, and , should work equivalently up to sign.
Additionally, the conditional indirect effect through each mediator separately is defined as
| (9) |
that is, by shifting the counterfactual marginal distribution of only the th mediator from one treatment arm () to the other (), while keeping the remaining mediators unchanged. Existing studies employed the interventional (in)direct effect as a randomized interventional analogue of the natural (in)direct effect and focused on their population‐level interpretation through hypothetical interventions on and [31, 32]. Our definition above extends this approach to the subpopulation or individual level, enabling the creation of randomized interventional analogues of the conditional average (in)direct effects. This extension allows us to interpret as the effect that passes through the paths from the treatment to and then to the survival outcome directly, which is implemented through the hypothetical intervention that shifts its marginal distribution conditional on the covariate level of the th subject. It is worth noting that such indirect effects through each mediator separately do not necessarily sum up to the joint indirect effect. Instead, the difference between the sum of the separate indirect effects and the joint indirect effect can be decomposed into two parts,
| (10) |
and
| (11) |
where the former is referred to as the indirect effect via the mediators' mutual dependence and the latter stands for a remainder effect [29, 30]. When the mediators are assumed to be causally unordered with no interactions in the outcome regression model, the indirect effect via the mediators' mutual dependence and the remainder effect become zero. In this scenario, the interventional joint indirect effect can be “decomposed” into the separate indirect effects via each mediator. Besides, for the simple single‐mediator case with , the joint indirect effect just coincides with the indirect effect through , that is,
| (12) |
with the additional effect in Equations (10) and (11) reducing to zero.
To identify the targeted functions from the proposed model, we invoke a series of sequential ignorability assumptions:
Assumption 1
, that is, there are no unmeasured confounders for the causal pathway from the treatment to the time‐to‐event outcome conditional on .
Assumption 2
, that is, there are no unmeasured confounders for the causal pathway from the mediator(s) to the time‐to‐event outcome conditional on and .
Assumption 3
, that is, there are no unmeasured confounders for the causal pathway from the treatment to the mediator(s) conditional on .
“A ⊥ B | C” in the above assumptions denotes independence between A and B conditional on C, and . Under these assumptions, can be identified nonparametrically as
| (13) |
where , , denotes the marginal distribution function of conditional on treatment level and covariate . The integration over can be approximated through Monte Carlo integration, where the integrand is calculated and averaged over random realizations of in the treatment arm simulated from the corresponding posterior samples. Similarly, can be identified as
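The Monte Carlo approximation of the mediation formula can be sketched with a toy example. Everything below is an assumption made for illustration: the linear log‐hazard function, the independent normal mediator model, the parameter values, and the function names are hypothetical stand‐ins for posterior draws from the fitted tree ensembles, evaluated at one fixed covariate profile.

```python
import random


def mc_expectation(outcome_fn, draw_mediators, a, a_star, n_draws, rng):
    """Average outcome_fn(a, m) over mediator draws simulated under treatment a_star."""
    total = 0.0
    for _ in range(n_draws):
        total += outcome_fn(a, draw_mediators(a_star, rng))
    return total / n_draws


# Hypothetical toy model for one fixed covariate profile (all numbers made up).
TAU = -0.3            # direct effect of treatment on the log-hazard
BETA = [0.4, 0.2]     # effect of each mediator on the log-hazard
DELTA = [1.0, -0.5]   # effect of treatment on each mediator mean


def log_hazard(a, m):
    return TAU * a + sum(b * v for b, v in zip(BETA, m))


def draw_mediators(a_star, rng):
    # Independent unit-variance normal mediators; a stand-in for posterior predictive draws.
    return [rng.gauss(d * a_star, 1.0) for d in DELTA]


rng = random.Random(7)
n = 20000
h_11 = mc_expectation(log_hazard, draw_mediators, 1, 1, n, rng)
h_10 = mc_expectation(log_hazard, draw_mediators, 1, 0, n, rng)
h_00 = mc_expectation(log_hazard, draw_mediators, 0, 0, n, rng)
indirect = h_11 - h_10   # shift the mediator distribution, hold treatment at 1
direct = h_10 - h_00     # shift treatment, hold the mediator distribution at arm 0
```

In this linear toy model the indirect effect has the closed form 0.4 × 1.0 + 0.2 × (−0.5) = 0.3 and the direct effect equals −0.3, so the Monte Carlo estimates can be checked against known values. The contrast shown is only one of the two possible decompositions.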
| (14) |
where denotes the joint distribution function of conditional on treatment level and covariate . The group‐level or population‐level average mediation effect, which is at the heart of conventional mediation analysis, can be identified through
| (15) |
and
| (16) |
where denotes the distribution function of covariates . The integration is often approximated by averaging the function of the potential survival time identified in (13) and (14) over the empirical distribution of to avoid separate modeling of the covariate distribution. Derivation of Equations (13)–(16) under the assumptions outlined above is provided in Appendix A of the Supporting Information.
The proposed model links the causal pathways from the treatment and the mediators to the time‐to‐event outcome through a PH model. Consequently, the logarithm of hazards serves as a natural choice for the targeted function . Other options are also viable, including survival probability [33], transformation of survival times [2, 6], and restricted mean survival time [4, 5], each offering a different perspective on the survival outcome. This study focuses on the hazard function and the survival probability as illustrative examples of the targeted function to demonstrate the estimation of the ICPSEs.
3.1. ICPSEs on Logarithm of Hazards
With , we define the ICPSEs on the logarithm of hazards as follows:
| (17) |
whereas the conditional average total effect on the logarithm of hazards adds up to
| (18) |
Similarly, with , the conditional indirect effect through each mediator separately is given by
| (19) |
The counterfactual logarithm of hazards, and , are identified from the proposed model as given in (13) and (14).
3.2. ICPSEs on Survival Probability
Based on the PH model, the survival function at a given time can be expressed as
| (20) |
With , and , the ICPSEs and total effect on the probability of surviving over are defined as
| (21) |
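As a rough illustration of how effects on the survival scale can be computed under a PH model, the sketch below uses the generic relation S(t) = exp{−H0(t) exp(η)}, where H0(t) is the cumulative baseline hazard and η is the log‐hazard term. The function names and the idea of passing η as a list of Monte Carlo draws are assumptions made for this example.

```python
import math


def survival_probability(cumulative_baseline_hazard, log_hazard_shift):
    """S(t) = exp(-H0(t) * exp(eta)) under a proportional hazards model."""
    return math.exp(-cumulative_baseline_hazard * math.exp(log_hazard_shift))


def mean_survival(cumulative_baseline_hazard, log_hazard_draws):
    """Average the survival probability over Monte Carlo draws of the log-hazard term."""
    values = [survival_probability(cumulative_baseline_hazard, eta)
              for eta in log_hazard_draws]
    return sum(values) / len(values)


def survival_contrast(cumulative_baseline_hazard, draws_a, draws_b):
    """Difference in mean survival probability between two counterfactual scenarios."""
    return (mean_survival(cumulative_baseline_hazard, draws_a)
            - mean_survival(cumulative_baseline_hazard, draws_b))
```

Because the survival probability is a nonlinear function of the log‐hazard, the average is taken over the survival probabilities themselves rather than by plugging in an averaged log‐hazard.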
4. Bayesian Analysis
4.1. Prior Specification
Let , , , , and . Let denote the observed data. The complete data likelihood of the proposed model is expressed as
| (22) |
where is the cumulative baseline hazard function. We adopt Bayesian P‐splines [34] to achieve flexible estimation of the baseline hazard and the ICPSEs. The basic idea is to approximate through a set of B‐spline basis functions, i.e.,
| (23) |
where is the number of B‐spline segments determined by a prespecified number of knots on , is the corresponding set of cubic B‐spline basis functions, and is the vector of unknown coefficients. Following the common practice in survival analysis [4, 35], we set 10 equidistant knots on and apply a roughness penalty on the coefficients of the B‐splines to counterbalance their flexibility and avoid overfitting. The penalty is imposed on the (higher‐order) finite differences of adjacent B‐spline coefficients. We consider a second‐order difference penalty of the form: , where is the penalty parameter that controls smoothness of the fit and is the second‐order difference operator with a matrix representation
| (24) |
Accounting for this penalty, the prior distribution for the B‐spline coefficients is specified as , that is, a multivariate Gaussian distribution with a precision matrix . We assign a gamma hyperprior for the penalty parameter, that is, , where and are prespecified hyperparameters. A common choice is and , which produces a dispersed prior.
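The second‐order difference penalty can be illustrated with a small sketch. It assumes the usual P‐spline convention in which each row of the difference matrix applies the pattern (1, −2, 1) to three adjacent coefficients, and it defines the penalty as the smoothing parameter times the sum of squared second differences, up to any constant factor used in the paper's own definition. Function names are hypothetical.

```python
def second_diff_matrix(k):
    """(k - 2) x k matrix whose rows compute c[j] - 2 * c[j + 1] + c[j + 2]."""
    rows = []
    for j in range(k - 2):
        row = [0.0] * k
        row[j], row[j + 1], row[j + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows


def roughness_penalty(coefs, smoothing):
    """smoothing * sum of squared second-order differences of the coefficients."""
    d = second_diff_matrix(len(coefs))
    diffs = [sum(r * c for r, c in zip(row, coefs)) for row in d]
    return smoothing * sum(v * v for v in diffs)


linear_coefs = [0.0, 1.0, 2.0, 3.0, 4.0]   # no curvature, so no penalty
spiky_coefs = [0.0, 0.0, 1.0, 0.0, 0.0]    # a single bump is penalized
```

Coefficients that lie on a straight line incur zero penalty, which is why a second‐order penalty shrinks the fitted curve toward a smooth, locally linear shape rather than toward zero.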
Assuming prior independence among the individual trees and the residual covariance matrix , the prior distribution for the remaining parameters within the tree ensembles can be formulated as follows:
| (25) |
Among the hyperparameters , the s and s restrict the depth of each individual binary tree through , , and , and controls the variable selection process through the splitting rules, while the rest confine the prior probability for , , , and through , , , and , respectively. For the prognostic functions and , we set and such that shallow trees with two or three leaf nodes are allowed with higher prior probability. For and that capture the direct effect of the treatment through shared tree topology, we set and to grow even shallower individual trees to avoid false discovery of the effect modifiers or heterogeneity in the ICPSEs. Different combinations of commonly used BART priors are also explored in a sensitivity analysis to evaluate the robustness of the estimated causal effects. For the sparsity‐inducing Gibbs‐type prior on the splitting rules, we follow the default setup of Linero and Du [21] to set .
For the mediator regression equations, we assign a conjugate inverse‐Wishart prior to the residual covariance matrix, , where is a diagonal matrix. To set the diagonal elements of , we first get a rough estimate of the residual variance for each mediator, denoted as , by performing an ordinary least squares (OLS) regression of on and . is expected to overestimate the th diagonal element of , that is, , given that no higher‐order terms, such as interactions, are considered in the OLS regression. Therefore, we assign an inverse‐chi‐squared prior, , where is chosen to ensure that the prior probability of being greater than the rough estimate is controlled at around . is a prespecified constant that adjusts the comparative scale of based on , and a common choice is 0.95. The degrees‐of‐freedom hyperparameter of the inverse‐chi‐squared prior is set as as suggested by Chipman, George, and McCulloch [19]. The marginal inverse‐chi‐squared priors then lead to . For the leaf node hyperparameters and , we adopt the common practice of setting and the th diagonal element of and as the sample SD of , denoted by . This results in a normal prior, , for the th element of the prognostic function , such that the interval covers 95% of its prior probability. Similarly, a normal prior is assigned to the th element of , which is the conditional average effect of on characterized by the sum of leaf node parameters s. By choosing and , the normal priors control the (marginal) prior distribution of each expected mediator under treatment within a range of with a probability of around 99.6%.
For the leaf node parameters in the PH model, we assign conjugate log‐gamma priors, . Following Linero et al. [36], we introduce regularization to homogeneity on the prognostic functions by setting , where is the digamma function, such that is controlled with a prior mean of 0 and a prior variance of . We follow the suggestions therein to set and further adopt a simple approximation of and , as proposed by Murray [37], to ease the numerical calculation of the digamma and trigamma functions. The same set of procedures is also implemented for , but with stronger regularization to homogeneity introduced by .
We set for the prognostic functions in both the mediator regression model and the PH model, which is a commonly used default for BART‐based approaches. For and , which characterize the direct pathways from the treatment, we consider to accommodate varying overlapped structure in the effect modifiers. Different choices for the above set of hyperparameters are studied in the simulation studies to check stability of the proposed method under different prior regularizations on heterogeneity.
4.2. Posterior Inference
Within the Bayesian framework, the estimation of the ICPSEs is facilitated through posterior sampling from the full conditional distributions of the binary trees, which are derived from the complete‐data likelihood in Equation (22) and the prior distributions outlined above. We employ a Bayesian backfitting MCMC algorithm in tandem with the Gibbs sampler to implement efficient sampling. This integrated procedure allows for sequential updates of the individual binary trees and nuisance parameters, as outlined in Algorithm 1. A detailed derivation of the full conditional distributions is provided in Appendix B of the Supporting Information. Additionally, a multivariate extension of the comonotonic sampling strategy [27] is adopted to enhance the numerical implementation of the mediation formula. The details of this extended sampling strategy are presented in Algorithm S1 in Appendix A of the Supporting Information.
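To give a feel for why a comonotonic strategy can help the Monte Carlo implementation of the mediation formula, the toy sketch below compares two ways of drawing bivariate normal mediators under treatment and control: with shared standard normal noise and with independent noise. This is our simplified reading of the idea as a common‐random‐numbers device, not the authors' Algorithm S1. The outcome functional, the means, the Cholesky factor, and all function names are hypothetical.

```python
import math
import random

def draw_mediators(mean, chol, z):
    # Bivariate normal draw: mean + L z, with L a lower-triangular Cholesky factor.
    m1 = mean[0] + chol[0][0] * z[0]
    m2 = mean[1] + chol[1][0] * z[0] + chol[1][1] * z[1]
    return (m1, m2)

def outcome(m):
    # Toy survival-type functional of the mediators (hypothetical, for illustration).
    return math.exp(-math.exp(0.3 * m[0] + 0.2 * m[1] - 1.0))

def effect_estimate(mean_treated, mean_control, chol, n_draws, rng, shared_noise):
    total = 0.0
    for _ in range(n_draws):
        z1 = (rng.gauss(0, 1), rng.gauss(0, 1))
        # Shared noise reuses z1 for the control draw; otherwise fresh noise is drawn.
        z0 = z1 if shared_noise else (rng.gauss(0, 1), rng.gauss(0, 1))
        total += outcome(draw_mediators(mean_treated, chol, z1))
        total -= outcome(draw_mediators(mean_control, chol, z0))
    return total / n_draws

def variance(values):
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2024)
chol = [[1.0, 0.0], [0.5, 0.8]]
args = ((0.5, 0.2), (0.0, 0.0), chol, 50, rng)
shared = [effect_estimate(*args, shared_noise=True) for _ in range(300)]
separate = [effect_estimate(*args, shared_noise=False) for _ in range(300)]
print(f"variance with shared noise:      {variance(shared):.2e}")
print(f"variance with independent noise: {variance(separate):.2e}")
```

Both estimators target the same contrast, but the shared‐noise version varies far less across repetitions, so fewer Monte Carlo draws per MCMC iteration are needed for a given accuracy.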
ALGORITHM 1. The hybrid MCMC algorithm with Bayesian backfitting and Gibbs sampler.

Leveraging the posterior samples of the ICPSEs, we can assess the evidence of heterogeneity in each causal pathway by evaluating the posterior probability of differential treatment effects, as proposed by Henderson et al. [9]. Specifically, let , where denotes the sample average direct effect, and let . Across the MCMC iterations, an individualized direct effect that deviates from the sample average level becomes evident when the corresponding value of approaches 1, or when is close to either 0 or 1. Similar indices, , , and , can be defined for the separate and joint indirect effects, as well as the total effect. Following the recommendations of Henderson et al. [9] and Chen et al. [38], we consider (i.e., or ) as strong evidence of individual‐specific differential PSEs, (i.e., or ) as moderate evidence, and (i.e., or ) as mild evidence. The potential sources of heterogeneity can be evaluated by examining the posterior splitting proportions of the predictors in , , and across the iterations, which are readily available from the collected posterior draws of the corresponding binary trees. To further elucidate the specific manner in which the top‐selected effect modifiers induce heterogeneity on each causal pathway, we can leverage group‐average ICPSEs and the partial effect of certain covariates of interest, denoted by . The partial effect of can be estimated by averaging the ICPSEs over the samples in which is fixed at a set of reasonable values.
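The heterogeneity indices described above can be sketched from a matrix of posterior draws as follows. The code assumes that the first index is the posterior probability that an individual's effect is at least the sample average effect, evaluated iteration by iteration, and that the second index is the larger of that probability and its complement, in the spirit of Henderson et al. [9]. The function names, the toy draws, and the 0.95 cutoff in the usage line are illustrative assumptions.

```python
def differential_effect_indices(draws):
    """draws[t][i] is the effect for individual i at MCMC iteration t."""
    n_iter = len(draws)
    n_ind = len(draws[0])
    exceed = [0] * n_ind
    for row in draws:
        average = sum(row) / n_ind  # sample average effect at this iteration
        for i, value in enumerate(row):
            if value >= average:
                exceed[i] += 1
    d = [count / n_iter for count in exceed]
    d_star = [max(p, 1.0 - p) for p in d]
    return d, d_star

def share_above(d_star, threshold):
    # Proportion of individuals whose index exceeds a given evidence threshold.
    return sum(v > threshold for v in d_star) / len(d_star)

# Toy posterior draws: 4 iterations (rows) for 3 individuals (columns).
draws = [
    [1.0, 0.0, 0.6],
    [1.2, -0.1, 0.4],
    [0.9, 0.1, 0.6],
    [1.1, 0.0, 0.2],
]
d, d_star = differential_effect_indices(draws)
strong_share = share_above(d_star, 0.95)  # 0.95 is an assumed cutoff
print(d, d_star, strong_share)
```

In this toy example the first two individuals sit consistently above or below the average, so their index equals 1, while the third individual is indistinguishable from the average.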
5. Simulation Study
In this section, we assessed the finite sample performance of the proposed method in estimating the ICPSEs under a multiple‐mediator case with . We generated correlated covariates from a multivariate Gaussian distribution, , with the entries of the covariance matrix being . The prognostic functions and true direct effects on the mediators and hazard of the survival outcome were set as
The true propensity score for each subject was generated by
where is the sample SD of the function on the numerator and is a noise component. We simulated the event times with a true baseline hazard of . The mediators were generated with a true residual variance–covariance matrix of . We considered two sample sizes, and . For each scenario, 100 replications were conducted. Prior inputs were specified as described in Section 4.1, and the unknown baseline hazard function was approximated using cubic B‐splines with 10 equidistant knots. Following common practice, we included the estimated propensity scores as an additional covariate/predictor for the prognostic functions and to alleviate regularization‐induced confounding.
To compute the Bayesian estimates of the ICPSEs, we ran the MCMC algorithm with 2000 iterations after a burn‐in stage of 1000 iterations and conducted Monte Carlo implementation of the mediation formula with through comonotonic sampling within each iteration. Figure S1 depicts the distribution of the true and estimated ICPSEs on each scale based on one randomly selected replication. The distributions of the ICPSEs across the causal pathways are recovered by the proposed method with satisfactory accuracy, and the sample average interventional effects are estimated precisely. We evaluated the performance of the proposed methodology using both average‐level and individual‐level criteria, which have been adopted in existing works on heterogeneous mediation analysis [27, 39]. At the average level, we computed the bias, relative bias, and root mean squared error (RMSE) for the sample average interventional PSEs regarding each causal pathway. At the individual level, we reported the square root of the precision in estimating heterogeneous effects (PEHE) [40]. For example, for the direct path, was calculated as . Table 1 presents the average bias, relative bias, and RMSE for the estimated sample average interventional PSEs, as well as the average based on the 100 replications. BART‐based models with the default discrete uniform prior were also fitted for comparison. The proposed method with the Gibbs‐type prior produced better estimates of the ICPSEs, as evidenced by the substantially smaller bias, RMSE, and values. As expected, an increasing sample size led to improved estimation results for both methods, but the proposed method consistently outperformed the default one across different sample sizes. Figure 1 shows the average posterior splitting proportions of the predictors in each tree ensemble based on the replications. The true confounders and effect modifiers were selected with high probabilities by the proposed method, while irrelevant variables were largely excluded, with near‐zero probabilities.
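The evaluation criteria can be written down compactly. The sketch below computes the square root of the PEHE for one replication and the bias, relative bias, and RMSE of sample average effects across replications. The relative bias is taken here as the absolute bias divided by the absolute mean true value, which is an assumption since the exact definition is not restated in this section; the function names and toy numbers are likewise illustrative.

```python
import math

def sqrt_pehe(estimated, true):
    # Root of the mean squared difference between estimated and true individual effects.
    n = len(true)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)) / n)

def average_level_summary(estimated_averages, true_averages):
    """Bias, relative bias, and RMSE of sample average effects across replications."""
    errors = [e - t for e, t in zip(estimated_averages, true_averages)]
    r = len(errors)
    bias = sum(errors) / r
    mean_truth = sum(true_averages) / r
    return {
        "bias": bias,
        # Assumed definition: absolute bias relative to the absolute mean true effect.
        "relative_bias": abs(bias) / abs(mean_truth),
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / r),
    }

# Toy inputs: two individuals in one replication, then three replications.
individual = sqrt_pehe([0.5, 1.5], [1.0, 1.0])
summary = average_level_summary([0.9, 1.1, 1.3], [1.0, 1.0, 1.0])
print(individual, summary)
```

The individual‐level criterion penalizes errors in each subject's effect, so it can remain sizable even when the sample average effect is estimated with little bias.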
TABLE 1.
Average bias, relative bias, root mean squared error (RMSE) for the sample average interventional path‐specific effects (PSEs), and √PEHE for the interventional conditional path‐specific effects (ICPSEs) on the scale of logarithm of hazards and survival probability at the mean observed event time under setup (i) with two mediators.
| | | First sample size | | | | Second sample size | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | Gibbs type | | Discrete uniform | | Gibbs type | | Discrete uniform | |
| PSE | Criterion | Logh | Survp | Logh | Survp | Logh | Survp | Logh | Survp |
| DE | Bias | 0.085 | −0.006 | 0.215 | −0.022 | 0.056 | −0.005 | 0.141 | −0.013 |
| | RBias | 0.115 | 0.086 | 0.254 | 0.264 | 0.078 | 0.061 | 0.192 | 0.150 |
| | RMSE | 0.117 | 0.012 | 0.267 | 0.033 | 0.078 | 0.009 | 0.153 | 0.020 |
| | √PEHE | 0.387 | 0.061 | 0.513 | 0.080 | 0.305 | 0.049 | 0.392 | 0.063 |
| IE (separate, first mediator) | Bias | −0.090 | 0.013 | −0.257 | 0.036 | −0.070 | 0.009 | −0.163 | 0.020 |
| | RBias | 0.072 | 0.094 | 0.197 | 0.237 | 0.057 | 0.068 | 0.130 | 0.145 |
| | RMSE | 0.119 | 0.017 | 0.291 | 0.039 | 0.091 | 0.012 | 0.187 | 0.024 |
| | √PEHE | 0.405 | 0.058 | 0.593 | 0.076 | 0.308 | 0.046 | 0.436 | 0.059 |
| IE (separate, second mediator) | Bias | −0.037 | 0.009 | −0.047 | 0.012 | −0.027 | 0.005 | −0.045 | 0.007 |
| | RBias | 0.093 | 0.117 | 0.094 | 0.138 | 0.070 | 0.080 | 0.085 | 0.108 |
| | RMSE | 0.086 | 0.012 | 0.090 | 0.011 | 0.063 | 0.008 | 0.076 | 0.010 |
| | √PEHE | 0.557 | 0.064 | 0.610 | 0.071 | 0.446 | 0.052 | 0.497 | 0.059 |
| IE (joint) | Bias | −0.126 | 0.018 | −0.312 | 0.043 | −0.097 | 0.012 | −0.219 | 0.025 |
| | RBias | 0.069 | 0.074 | 0.169 | 0.176 | 0.051 | 0.053 | 0.114 | 0.106 |
| | RMSE | 0.177 | 0.023 | 0.394 | 0.049 | 0.133 | 0.016 | 0.274 | 0.031 |
| | √PEHE | 0.737 | 0.095 | 0.919 | 0.117 | 0.582 | 0.075 | 0.713 | 0.090 |
| TE | Bias | −0.044 | 0.013 | −0.090 | 0.010 | −0.041 | 0.007 | −0.083 | 0.006 |
| | RBias | 0.037 | 0.044 | 0.029 | 0.033 | 0.031 | 0.030 | 0.027 | 0.024 |
| | RMSE | 0.142 | 0.019 | 0.142 | 0.017 | 0.106 | 0.013 | 0.121 | 0.013 |
| | √PEHE | 0.824 | 0.110 | 0.921 | 0.120 | 0.646 | 0.087 | 0.717 | 0.096 |
Abbreviations: DE, the direct effect; IE, the indirect effects (the two separate indirect effects through each mediator, followed by the joint indirect effect); Logh, the logarithm of hazards; Survp, survival probability; TE, the total effect.
FIGURE 1.

Posterior splitting proportions in the tree ensembles for each covariate under the Gibbs‐type prior (left) and the default discrete uniform prior (right) in setup (i) with . The horizontal dotted lines stand for the discrete uniform splitting probabilities.
In addition to setup (i), we conducted an additional simulation, referred to as setup (ii), to further validate the proposed model. This setup involved a fake mediator generated as a normal random variable from and intentionally designed to have no effect on the survival outcome. The objective was to examine the impact of including the fake mediator, that is, a posttreatment variable that is listed among the candidate mediators but does not genuinely link the treatment to the survival outcome, on the estimation of the ICPSEs corresponding to both the truly existing causal pathways and the nonexistent one. Results summarized in Table S1 indicate that inclusion of the invalid mediator led to slightly increased average bias, RMSE, and √PEHE for the estimated sample average interventional PSEs along the true underlying causal pathways. Nonetheless, the proposed model still demonstrated improved performance as the sample size increased. Moreover, the estimated separate indirect effect through the fake mediator was consistently close to zero for each individual across the replications, suggesting that the incorporation of this invalid mediator did not induce any significant spurious effects into the discovered causal mechanism. This finding is also supported by Figure S2, which displays the distribution of the true and estimated ICPSEs along each causal pathway. Figure S3 shows the average posterior splitting proportions of the predictors, where the proposed method effectively captures the confounders and modifiers of the relationship between and , while simultaneously excluding from the selected predictors in the PH model.
To further evaluate the variable selection performance of the proposed method, we conducted additional simulations under scenarios where the number of redundant covariates or candidate mediators increased with the number of observations. Additionally, we performed sensitivity analyses with respect to different choices of hyperparameters, the number of spline basis functions, the baseline hazard, and the censoring rate. To demonstrate the robustness of the proposed method against violations of the normality assumption, we also considered non‐normally distributed mediators, where residual terms in model (2) were generated from heavy‐tailed, skewed, or mixture distributions. The performance of the proposed method was relatively stable across the scenarios. Detailed setups and results are provided in Appendix C of the Supporting Information.
The computing code for conducting the preceding analysis is available at https://github.com/roxiesun/HMedCox.
6. Real Data Application
In this section, we applied the proposed method to a dataset extracted from the AIDS Clinical Trials Group Protocol 175 (ACTG175) [41] to demonstrate its utility. ACTG175 is a double‐blind clinical trial that collected medical history and demographic characteristics of 2139 adults infected with HIV to compare nucleoside monotherapy with zidovudine (ZDV) or didanosine (ddI) to combination therapy. Medical findings have suggested the superiority of combination therapy over monotherapy in slowing the progression of HIV disease, while recent advancements in causal inference revealed heterogeneity among patients of varying ages and sexual activity levels [8, 42].
In this study, we focused on a subset of 532 subjects who received ZDV only as the control group () and 522 subjects who received the ZDV+ddI combination as the treatment group (). The primary objective is to identify the causal pathways linking combination therapy to the survival of individuals and to explore potential sources of heterogeneity introduced by the effect modifiers. We considered three possible mediators: the CD4:CD8 ratio (), as well as the changes in CD4 cell counts () and CD8 cell counts (), all of which were measured at weeks from baseline. Variations in these two types of T cells are closely related to the low CD4:CD8 ratio in HIV‐infected individuals, which is indicative of disease progression. The candidate mediators were regarded as causally non‐ordered based on two key considerations: first, they were measured simultaneously in a cross‐sectional manner; second, the trends in these two types of cell counts are typically monitored in parallel in medical studies, along with their ratio, rather than being viewed as causally influencing one another. In the original study, the survival endpoint was defined as a decline in CD4 count of at least 50%, or occurrence of an AIDS‐defining event, or death. We excluded four individuals in the treatment group who had the event or were lost to follow‐up within the first 15 weeks after treatment initiation, resulting in a mean observed event time of 566 days, a maximum follow‐up time of 1,231 days, and a censoring rate of 73.3% for the targeted subset of individuals. Figure 2 depicts the Kaplan–Meier curves for subjects within the two treatment groups.
FIGURE 2.

The Kaplan–Meier curves for HIV‐infected subjects who received zidovudine (ZDV)+didanosine (ddI) () and who received ZDV only () in the ACTG175 trial.
We considered 14 pretreatment covariates as potential confounders or effect modifiers, including six continuous variables: age (years), weight (kg), the number of days of previously received antiretroviral therapy (preanti, days), Karnofsky score (karnof, 0–100), baseline CD4 cell count (cd40, cells/mm³), baseline CD8 cell count (cd80, cells/mm³), and eight binary variables: hemophilia (hemo, 1 = yes), homosexual activity (homo, 1 = yes), history of intravenous drug use (drugs, 1 = yes), ZDV use in the 30 days prior to treatment initiation (z30, 1 = yes), race (0 = White, 1 = non‐White), gender (1 = male), history of prior antiretroviral therapy (str2, 1 = experienced), and symptoms of HIV infection at enrolment (symptom, 0 = asymptomatic, 1 = symptomatic). The estimated propensity score was also included as a predictor.
We used the trace plots of three Markov chains starting from different initial values to check convergence. Figure S4 depicts the trace plots of several randomly selected parameters, showing that the Markov chains mixed well within a few iterations. Therefore, we ran 5000 MCMC iterations and discarded the first half as burn‐in. The upper panel of Table 2 presents the estimated sample average interventional PSEs and the proportion of individuals with various levels of differential ICPSEs based on the posterior samples. The combination therapy of ZDV+ddI shows an advantage over ZDV only in slowing the progression of HIV infection with an average hazard ratio of and an average increase of 0.110 in the probability of surviving over 566 days. Specifically, we observed that the treatment effect is mainly carried through the direct causal pathway and partially through the first two mediators, that is, the CD4:CD8 ratio and changes in the CD4 cell counts measured at weeks. In contrast, the changes in CD8 cell counts appeared to be an invalid mediator. The estimated sample average indirect effect that corresponds to the pathway is nonsignificant and consistently near zero on both scales, suggesting that does not truly mediate the effect of the combination therapy on survival of the HIV‐infected subjects. On the scale of the logarithm of hazard, we found only mild evidence of heterogeneity on the direct pathway, with of the subjects exhibiting a differential direct effect with . On the scale of the probability of surviving over 566 days, however, such evidence was detected on both the direct pathway and the indirect pathways through and . Figure 3 shows the boxplot and histogram of the estimated ICPSEs across each causal pathway.
TABLE 2.
Estimated sample average interventional path‐specific effects (PSEs) on the scale of logarithm of hazard and survival probability, together with the evidence of individual‐specific differential PSEs and the top selected splitting variables in the analysis of ACTG175.
| Path | Index | Logh | Survp |
|---|---|---|---|
| DE | Est. (SD) | | 0.064 (0.020) |
| | Strong evidence | 0% | 1.9% |
| | Moderate evidence | 0% | 7.1% |
| | Mild evidence | 13.7% | 15.9% |
| IE (CD4:CD8 ratio) | Est. (SD) | | 0.027 (0.013) |
| | Strong evidence | 0% | 0.2% |
| | Moderate evidence | 0% | 5.0% |
| | Mild evidence | 0.1% | 12.2% |
| IE (change in CD4 count) | Est. (SD) | | 0.021 (0.019) |
| | Strong evidence | 0% | 0% |
| | Moderate evidence | 0% | 0.5% |
| | Mild evidence | 0% | 15.0% |
| IE (change in CD8 count) | Est. (SD) | | 0.000 (0.001) |
| | Strong evidence | 0% | 0% |
| | Moderate evidence | 0% | 0% |
| | Mild evidence | 0% | 0% |
| IE (joint) | Est. (SD) | | 0.046 (0.021) |
| | Strong evidence | 0% | 0.1% |
| | Moderate evidence | 0% | 2.3% |
| | Mild evidence | 0.1% | 8.2% |
| TE | Est. (SD) | | 0.110 (0.023) |
| | Strong evidence | 0% | 1.4% |
| | Moderate evidence | 0% | 5.9% |
| | Mild evidence | 1.0% | 13.0% |
| Tree ensembles | | | |
| Top splitting variable | | | |
| First | Gender (17.4%) | (11.9%) | Homo (34.2%) |
| Second | Homo (17.2%) | Drugs (8.3%) | Hemo (32.4%) |
| Third | Race (17.0%) | Race (8.3%) | Drugs (31.5%) |
| Fourth | cd80 (12.3%) | Karnof (8.3%) | Weight (0.9%) |
Abbreviations: Est, Bayesian estimates of the sample average PSEs; SD, posterior standard deviation.
FIGURE 3.

Estimated ICPSEs for each individual on the logarithm scale of hazards (left) and survival probability with respect to days (right). The vertical black lines stand for the estimated sample average interventional PSEs correspondingly.
The lower panel of Table 2 presents the top selected splitting variables for each component of the tree ensembles, which serve as the most likely sources of heterogeneity. The posterior splitting proportions for the predictors in each tree ensemble are plotted in Figure 4. Overall, the effects of the combination therapy on the mediators and the survival outcomes are moderated by hemophilia, intravenous drug use history, and homosexual activity. Figure S5 depicts the partial effects of the effect modifiers on the distribution of the ICPSEs. Finally, the levels of the mediators at weeks are mainly related to gender, homosexual activity, race, and the baseline CD8 cell counts, while the prognosis of the HIV‐infected subjects can also be confounded by the Karnofsky score, race, and homosexual activity.
FIGURE 4.

Posterior splitting proportions in the tree ensembles for each covariate in the ACTG175 dataset. The horizontal dotted lines stand for the discrete uniform splitting probabilities.
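A minimal sketch of how posterior splitting proportions such as those in Figures 1 and 4 could be computed from stored tree draws is given below. It assumes that, for each retained MCMC iteration, the number of splitting rules that use each predictor has been recorded for a given tree ensemble, and that the reported proportion is the average of the per‐iteration shares. Both the averaging convention and the names are assumptions made for illustration; the discrete uniform reference level is one over the number of predictors.

```python
def splitting_proportions(split_counts):
    """split_counts[t][j]: number of splitting rules using predictor j at iteration t."""
    n_pred = len(split_counts[0])
    totals = [0.0] * n_pred
    used = 0
    for counts in split_counts:
        total = sum(counts)
        if total == 0:
            continue  # an ensemble of root-only trees carries no split information
        used += 1
        for j, c in enumerate(counts):
            totals[j] += c / total
    if used == 0:
        raise ValueError("no iteration contains any splitting rule")
    return [v / used for v in totals]

# Toy counts: 3 iterations (rows) for 4 predictors (columns).
counts = [
    [6, 2, 0, 0],
    [5, 3, 0, 0],
    [7, 0, 1, 0],
]
proportions = splitting_proportions(counts)
uniform_level = 1.0 / len(proportions)
selected = [j for j, p in enumerate(proportions) if p > uniform_level]
print(proportions, selected)
```

Predictors whose proportion clearly exceeds the uniform level are the natural candidates for confounders or effect modifiers, which is the comparison the dotted lines in the figures make visible.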
7. Conclusion
This study introduces a novel heterogeneous mediation analysis tailored for survival data with multiple potential mediators based on joint modeling of the mediator regression model and the PH model. The proposed approach with a shared ensemble of trees explicitly accounts for the overlapping pattern of confounders and effect modifiers among the mediators and outcome, which is ubiquitous in real‐world scenarios. The incorporation of sparsity‐inducing Gibbs‐type priors into the shared tree ensembles addresses the challenges of feature selection and heterogeneity quantification, enabling the model to identify the most relevant predictors while simultaneously capturing the intricate interplay between effect modifiers across multiple causal pathways. A fully Bayesian approach is developed to estimate the individual‐specific and sample average interventional PSEs under the potential outcome framework. The utility and performance of the proposed method are evaluated through extensive simulation studies and a real‐world application to the ACTG175 dataset.
There are several promising new directions that can be explored. First, we have assumed causally non‐ordered mediators to model their joint distribution with the time‐to‐event outcome through three separate ensembles of trees. The marginal distribution of each mediator is therefore directly attainable to facilitate the Monte Carlo implementation of the mediation formula. Strengths of the shared tree topologies are manifested in contrast to modeling the prognostic functions and direct causal pathways separately for the mediator and outcome regression models using tree ensembles with univariate leaf node parameters. However, it is worth mentioning that such efficiency of the shared structure may trade off the ability to disentangle the causal structure among multiple causally ordered mediators. A possible solution would be to substitute in Equation (2) with , allowing each mediator to be regressed on the remaining mediators (excluding itself), and embedding the relevant field knowledge on their causal ordering through prior distributions assigned to the corresponding splitting rules of the tree ensembles. We consider this a promising future direction that warrants continuing effort, given that the causal structure learning problem itself remains an ongoing topic of debate. From the perspective of variable selection, it is also a worthwhile endeavor to allow and to be high‐dimensional and study the respective asymptotic properties regarding their order. Second, the interventionist framework employed in this study can handle more general cases, such as the presence of posttreatment confounding, without the need to impose the cross‐world independence assumption when focusing on the sample average or population level effects. Although our proposed method can accommodate posttreatment variables as possible mediators, incorporating heterogeneity and addressing more general causal structure simultaneously remains a direction for future research. Therefore, relaxing this assumption under the current joint model is of considerable interest. Third, it should also be possible to address violations of the positivity assumption, which can be viewed as an extreme case of heterogeneity. Exploring the above extensions would require substantial efforts in future research endeavors.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1. Supporting information.
Acknowledgments
This research was fully supported by GRF Grants (14303622, 14302220) from the Research Grants Council of the Hong Kong Special Administrative Region. The authors are thankful to the editor, the associate editor, and two anonymous reviewers for their valuable comments.
Funding: The authors received no specific funding for this work.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References
- 1. Baron R. M. and Kenny D. A., “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology 51, no. 6 (1986): 1173–1182.
- 2. VanderWeele T. J., “Causal Mediation Analysis With Survival Data,” Epidemiology 22, no. 4 (2011): 582–585.
- 3. Aalen O. O., Stensrud M. J., Didelez V., Daniel R., Røysland K., and Strohmaier S., “Time‐Dependent Mediators in Survival Analysis: Modeling Direct and Indirect Effects With the Additive Hazards Model,” Biometrical Journal 62, no. 3 (2020): 532–549.
- 4. Zhou X. and Song X., “Mediation Analysis for Mixture Cox Proportional Hazards Cure Models,” Statistical Methods in Medical Research 30, no. 6 (2021): 1554–1572.
- 5. Wang W. and Albert J. M., “Causal Mediation Analysis for the Cox Proportional Hazards Model With a Smooth Baseline Hazard Estimator,” Journal of the Royal Statistical Society. Series C, Applied Statistics 66, no. 4 (2017): 741.
- 6. Huang Y. T. and Yang H. I., “Causal Mediation Analysis of Survival Outcome With Multiple Mediators,” Epidemiology 28, no. 3 (2017): 370–378.
- 7. Fulcher I. R., Tchetgen E. T., and Williams P. L., “Mediation Analysis for Censored Survival Data Under an Accelerated Failure Time Model,” Epidemiology 28, no. 5 (2017): 660–666.
- 8. Cui Y., Kosorok M. R., Sverdrup E., Wager S., and Zhu R., “Estimating Heterogeneous Treatment Effects With Right‐Censored Data via Causal Survival Forests,” Journal of the Royal Statistical Society, Series B (Statistical Methodology) 85, no. 2 (2023): 179–211.
- 9. Henderson N. C., Louis T. A., Rosner G. L., and Varadhan R., “Individualized Treatment Effects With Censored Data via Fully Nonparametric Bayesian Accelerated Failure Time Models,” Biostatistics 21, no. 1 (2020): 50–68.
- 10. Sun R. and Song X., “A Tree‐Based Bayesian Accelerated Failure Time Cure Model for Estimating Heterogeneous Treatment Effect,” Bayesian Analysis 1, no. 1 (2023): 1–29.
- 11. Zhang P., Ma J., Chen X., and Shentu Y., “A Nonparametric Method for Value Function Guided Subgroup Identification via Gradient Tree Boosting for Censored Survival Data,” Statistics in Medicine 39, no. 28 (2020): 4133–4146.
- 12. Katzman J. L., Shaham U., Cloninger A., Bates J., Jiang T., and Kluger Y., “DeepSurv: Personalized Treatment Recommender System Using a Cox Proportional Hazards Deep Neural Network,” BMC Medical Research Methodology 18 (2018): 1–12.
- 13. Preacher K. J., Rucker D. D., and Hayes A. F., “Addressing Moderated Mediation Hypotheses: Theory, Methods, and Prescriptions,” Multivariate Behavioral Research 42, no. 1 (2007): 185–227.
- 14. Qin X. and Wang L., “Causal Moderated Mediation Analysis: Methods and Software,” Behavior Research Methods 56, no. 3 (2024): 1314–1334.
- 15. Hu L., “A New Method for Clustered Survival Data: Estimation of Treatment Effect Heterogeneity and Variable Selection,” Biometrical Journal 66, no. 1 (2024): 2200178.
- 16. Xu Y., Ignatiadis N., Sverdrup E., Fleming S., Wager S., and Shah N., “Treatment Heterogeneity With Survival Outcomes,” in Handbook of Matching and Weighting Adjustments for Causal Inference, 1st ed., eds. Zubizarreta J. R., Stuart E. A., Small D. S., and Rosenbaum P. R. (Boca Raton, FL: CRC Press, 2023), 445–482.
- 17. Zhang H., Zheng Y., Hou L., Zheng C., and Liu L., “Mediation Analysis for Survival Data With High‐Dimensional Mediators,” Bioinformatics 37, no. 21 (2021): 3815–3821.
- 18. Hahn P. R., Murray J. S., and Carvalho C. M., “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (With Discussion),” Bayesian Analysis 15, no. 3 (2020): 965–1056.
- 19. Chipman H. A., George E. I., and McCulloch R. E., “BART: Bayesian Additive Regression Trees,” Annals of Applied Statistics 4, no. 1 (2010): 266–298.
- 20. Linero A. R., Sinha D., and Lipsitz S. R., “Semiparametric Mixed‐Scale Models Using Shared Bayesian Forests,” Biometrics 76, no. 1 (2020): 131–144.
- 21. Linero A. R. and Du J., “Gibbs Priors for Bayesian Nonparametric Variable Selection With Weak Learners,” Journal of Computational and Graphical Statistics 32, no. 3 (2023): 1046–1059.
- 22. Linero A. R., “Bayesian Regression Trees for High‐Dimensional Prediction and Variable Selection,” Journal of the American Statistical Association 113, no. 522 (2018): 626–636.
- 23. Liu Y., Ročková V., and Wang Y., “Variable Selection With ABC Bayesian Forests,” Journal of the Royal Statistical Society, Series B (Statistical Methodology) 83, no. 3 (2021): 453–481.
- 24. Tan Y. V. and Roy J., “Bayesian Additive Regression Trees and the General BART Model,” Statistics in Medicine 38, no. 25 (2019): 5048–5069.
- 25. Hill J., Linero A., and Murray J., “Bayesian Additive Regression Trees: A Review and Look Forward,” Annual Review of Statistics and Its Application 7, no. 1 (2020): 251–278.
- 26. Chib S. and Greenberg E., “Analysis of Multivariate Probit Models,” Biometrika 85, no. 2 (1998): 347–361.
- 27. Linero A. R. and Zhang Q., “Mediation Analysis Using Bayesian Tree Ensembles,” Psychological Methods (2022): 1–23.
- 28. Sinha D., Ibrahim J. G., and Chen M. H., “A Bayesian Justification of Cox's Partial Likelihood,” Biometrika 90, no. 3 (2003): 629–641.
- 29. Vansteelandt S. and Daniel R. M., “Interventional Effects for Mediation Analysis With Multiple Mediators,” Epidemiology 28, no. 2 (2017): 258.
- 30. Loh W. W., Moerkerke B., Loeys T., and Vansteelandt S., “Nonlinear Mediation Analysis With High‐Dimensional Mediators Whose Causal Structure Is Unknown,” Biometrics 78, no. 1 (2022): 46–59.
- 31. Lin S. H. and VanderWeele T., “Interventional Approach for Path‐Specific Effects,” Journal of Causal Inference 5, no. 1 (2017): 1–10.
- 32. VanderWeele T. J. and Tchetgen Tchetgen E. J., “Mediation Analysis With Time Varying Exposures and Mediators,” Journal of the Royal Statistical Society, Series B (Statistical Methodology) 79, no. 3 (2017): 917–938.
- 33. Tchetgen Tchetgen E. J., “On Causal Mediation Analysis With a Survival Outcome,” International Journal of Biostatistics 7, no. 1 (2011): 1–38.
- 34. Lang S. and Brezger A., “Bayesian P‐Splines,” Journal of Computational and Graphical Statistics 13, no. 1 (2004): 183–212.
- 35. Çetinyürek Yavuz A. and Lambert P., “Smooth Estimation of Survival Functions and Hazard Ratios From Interval‐Censored Data Using Bayesian Penalized B‐Splines,” Statistics in Medicine 30, no. 1 (2011): 75–90.
- 36. Linero A. R., Basak P., Li Y., and Sinha D., “Bayesian Survival Tree Ensembles With Submodel Shrinkage,” Bayesian Analysis 17, no. 3 (2022): 997–1020.
- 37. Murray J. S., “Log‐Linear Bayesian Additive Regression Trees for Multinomial Logistic and Count Regression Models,” Journal of the American Statistical Association 116, no. 534 (2021): 756–769.
- 38. Chen X., Harhay M. O., Tong G., and Li F., “A Bayesian Machine Learning Approach for Estimating Heterogeneous Survivor Causal Effects: Applications to a Critical Care Trial,” Annals of Applied Statistics 18, no. 1 (2024): 350–374.
- 39. Xue F., Tang X., Kim G., et al., “Heterogeneous Mediation Analysis on Epigenomic PTSD and Traumatic Stress in a Predominantly African American Cohort,” Journal of the American Statistical Association 117, no. 540 (2022): 1669–1683.
- 40. Hill J. L., “Bayesian Nonparametric Modeling for Causal Inference,” Journal of Computational and Graphical Statistics 20, no. 1 (2011): 217–240.
- 41. Hammer S. M., Katzenstein D. A., Hughes M. D., et al., “A Trial Comparing Nucleoside Monotherapy With Combination Therapy in HIV‐Infected Adults With CD4 Cell Counts From 200 to 500 per Cubic Millimeter,” New England Journal of Medicine 335, no. 15 (1996): 1081–1090.
- 42. Lu W., Zhang H. H., and Zeng D., “Variable Selection for Optimal Treatment Decision,” Statistical Methods in Medical Research 22, no. 5 (2013): 493–504.
