Bayesian variable selection for a semi-competing risks model with three hazard functions

Andrew G Chapple; Marina Vannucci; Peter F Thall; Steven Lin

doi:10.1016/j.csda.2017.03.002

Comput Stat Data Anal. Author manuscript; available in PMC: 2018 Aug 1.

Published in final edited form as: Comput Stat Data Anal. 2017 Mar 22;112:170–185. doi: 10.1016/j.csda.2017.03.002

PMCID: PMC5637455 NIHMSID: NIHMS857317 PMID: 29033478

Abstract

A variable selection procedure is developed for a semi-competing risks regression model with three hazard functions. It uses spike-and-slab priors and stochastic search variable selection algorithms for posterior inference. A rule based on the Deviance Information Criterion (DIC) is devised for choosing the threshold on the marginal posterior probability of variable inclusion, and the rule is examined in a simulation study. The method is applied to data from esophageal cancer patients from the MD Anderson Cancer Center, Houston, TX, where the most important covariates are selected in each of the hazards of effusion, death before effusion, and death after effusion. The proposed DIC procedure leads to similar selected models across different choices of some of the hyperparameters. The application results show that patients treated with intensity-modulated radiation therapy have significantly reduced risks of pericardial effusion, pleural effusion, and death before either effusion type.

Keywords: Semi-Competing Risks, Variable Selection, Metropolis-Hastings

1. Introduction

Global cancer incidence estimates from 2008 indicate that esophageal cancer is the eighth most common cancer and the sixth most deadly [2]. Torrey et al. [3] estimated that there were 455,800 new cases and 400,200 deaths in 2012. The two most common types of esophageal cancer are squamous cell carcinoma and adenocarcinoma, the latter of which has been linked to obesity and gastrointestinal problems. Definitive concurrent chemoradiotherapy (CRT) is the standard treatment for patients with inoperable esophageal tumors. Several methods for delivering radiation are used, most notably three-dimensional conformal radiation therapy (3D-CRT). All of these methods increase patient survival but also have adverse effects, the most common being pleural effusion (PE) and pericardial effusion (PCE) [4, 5]. Pleural and pericardial effusion occur when excess fluid accumulates around the lungs and heart, respectively, and can lead to poor function of these organs and death. These adverse events are associated with higher radiation doses to the heart and lungs [5].

Intensity-modulated radiation therapy (IMRT) has been shown to reduce the volume of a patient's non-cancerous organs exposed to radiation and to increase the volume of radiation on esophageal tumors compared to 3D-CRT [6]. Chandra et al. [7] showed that IMRT reduced the volume of the lungs that received various radiation doses compared to 3D-CRT. Because effusion rates increase with dose, IMRT could potentially result in fewer pleural and pericardial effusions in esophageal cancer patients than standard 3D-CRT. Assessing the impact of IMRT on time to effusion is more complicated than assessing its impact on overall survival, even when pleural and pericardial effusion are considered separately. When the event of interest is a non-terminal event such as effusion, death is commonly treated as a non-informative, independent censoring event [8]. This assumption is invalid here: both pleural and pericardial effusion can lead to death, so a patient's death may indicate that the patient experienced effusion, which makes death an informative censoring event. A further complication of this data structure is that patients can experience effusion followed by death, but not death followed by effusion. Administrative right censoring can occur before a patient experiences either event or after a patient has effusion. Because of these complications, these data must be analyzed with a semi-competing risks model, which has separate hazards for three events: the non-terminal event, death before the non-terminal event, and death after the non-terminal event.

Lee et al. [9] developed a novel Bayesian semi-parametric semi-competing risks regression model for a non-terminal event and death. Their motivating non-terminal event of interest was hospital readmission for patients diagnosed with advanced pancreatic cancer. Since pancreatic cancer has high mortality rates, they were concerned with end of life care and keeping patients comfortable at home during their final days. They considered three different hazard functions: the hazard of a non-terminal event, the hazard of death without a non-terminal event, and the hazard of death after a non-terminal event. Each of these three hazard functions resembled a Cox-type regression including a baseline hazard function which was assumed to be piecewise exponential, individual patient frailty parameters, and a linear combination of patient covariates. They used the posterior sample of the beta coefficients in the three hazards for inference on what types of homecare affected the hazard that a patient would return to the hospital, the hazard of death before returning to the hospital, and the hazard of death after patients were readmitted to the hospital. They implemented their algorithm in the package SemiCompRisks [10].

We initially aimed to implement the semi-competing risks model of Lee et al. [9] to analyze the effects of IMRT on effusion and overall survival in an observational study of 470 patients at The University of Texas M.D. Anderson Cancer Center in Houston, TX, treated between January 1998 and April 2012 [11]. However, it was unclear which baseline covariates should be included in the model for analyzing the IMRT effect, particularly because of the correlation between treatment assignment and the baseline covariates, which could affect clinical conclusions. Consequently, in this paper we develop a variable selection procedure for the semi-competing risks model of Lee et al. that uses spike-and-slab priors and stochastic search variable selection (SSVS) algorithms for posterior inference. The proposed procedure performs variable selection for each of the linear terms in the three hazard functions. Furthermore, we devise a protocol, based on the Deviance Information Criterion (DIC), for choosing the threshold on the marginal posterior probability of variable inclusion. The code for the described methodology is available in the R package SCRSELECT [1]. In the application to the esophageal cancer data, we do not perform variable selection on IMRT status. To correct for some of the bias introduced by a nonrandomized observational study, we estimate each patient's probability of receiving IMRT as a function of other covariates and include this propensity score in each hazard function. This allows us to compare the effects of IMRT on effusion and death while correcting for non-randomization bias in every candidate model. We present results from separate analyses for pleural and pericardial effusion, and we show that the proposed DIC procedure leads to similar selected models regardless of the choice of some of the hyperparameters.
We find that patients treated with IMRT have significantly reduced risks of pericardial effusion, pleural effusion, and death before either effusion type. The rest of the paper is organized as follows. In Section 2, we describe the Bayesian semi-parametric semi-competing risks model, the variable selection priors, and the Markov chain Monte Carlo procedure for posterior inference, and we present the proposed DIC-based procedure for the final covariate selection. In Section 3, we perform a simulation study to assess the proposed DIC-based procedure. In Section 4, we describe the case study data and discuss the results and their sensitivity to hyperparameter choices. Section 5 concludes the paper with a discussion.

2. Methods

2.1. Semi-Parametric Semi-Competing Risks Model

Let $T_{1i}$ denote the time to a non-terminal event and $T_{2i}$ the time to death for patient $i$. Lee et al. [9] model covariate effects in three hazard functions: $h_1$, the hazard of the non-terminal event; $h_2$, the hazard of the terminal event when the non-terminal event has not occurred; and $h_3$, the hazard of the terminal event after the non-terminal event has occurred. Let $x_i$ denote the vector of patient covariates and $\beta_1, \beta_2, \beta_3$ the coefficient vectors associated with $x_i$ in hazards 1, 2, and 3, respectively. The three hazards of the semi-Markov model are

$$h_{1}(T_{1i} \mid \gamma_i, \beta_1, x_i) = \gamma_i\, h_{01}(T_{1i}) \exp(x_i^t \beta_1), \tag{1}$$

$$h_{2}(T_{2i} \mid \gamma_i, \beta_2, x_i) = \gamma_i\, h_{02}(T_{2i}) \exp(x_i^t \beta_2), \tag{2}$$

and

$$h_{3}(T_{2i} \mid T_{1i}, \gamma_i, \beta_3, x_i) = \gamma_i\, h_{03}(T_{2i} - T_{1i}) \exp(x_i^t \beta_3). \tag{3}$$

Here $\gamma_i$ is the frailty for patient $i$ and $h_{0g}$ is the baseline hazard function for event $g = 1, 2, 3$. Although we use the same set of covariates in all three hazards, model (1)–(3) can accommodate a different list of covariates for each hazard function. Lee et al. [9] assume that the baseline hazards are piecewise constant (a piecewise exponential model); that is, $\lambda_{g,j} = \log h_{0g}(t)$ is constant for $t \in I_{g,j} = (s_{g,j-1}, s_{g,j}]$, for a partition of the time scale $0 = s_{g,0} < s_{g,1} < \dots < s_{g,J_g} < s_{g,J_g+1} = s_{g,\max}$, where $s_{g,\max}$ is the largest observed time for event $g$. The observed data are $Y_{1i} = \min(T_{1i}, T_{2i}, C_i)$, $\delta_{1i} = I[T_{1i} \le \min(T_{2i}, C_i)]$, $Y_{2i} = \min(T_{2i}, C_i)$, and $\delta_{2i} = I[T_{2i} \le C_i]$, which are functions of $T_{1i}$, the time to the non-terminal event, $T_{2i}$, the time to death, and $C_i$, the independent censoring time for patient $i$. Under this parameterization of the baseline hazards and the hazard functions (1)–(3), the likelihood is

$$\begin{aligned} & L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_1, \beta_2, \beta_3, \gamma, s_1, s_2, s_3, \lambda_1, \lambda_2, \lambda_3) \\ &= \prod_{j=1}^{J_1+1} \exp\Big\{\lambda_{1j} d_{1j} - e^{\lambda_{1j}} \sum_{m \in \mathcal{R}_{1j}} \Delta^{1}_{mj}\, \gamma_m \exp(x_m^t \beta_1)\Big\} \prod_{m' \in \mathcal{D}_{1j}} \gamma_{m'} \exp(x_{m'}^t \beta_1) \\ &\quad \times \prod_{k=1}^{J_2+1} \exp\Big\{\lambda_{2k} d_{2k} - e^{\lambda_{2k}} \sum_{q \in \mathcal{R}_{2k}} \Delta^{2}_{qk}\, \gamma_q \exp(x_q^t \beta_2)\Big\} \prod_{q' \in \mathcal{D}_{2k}} \gamma_{q'} \exp(x_{q'}^t \beta_2) \\ &\quad \times \prod_{l=1}^{J_3+1} \exp\Big\{\lambda_{3l} d_{3l} - e^{\lambda_{3l}} \sum_{r \in \mathcal{R}_{3l}} \Delta^{3}_{rl}\, \gamma_r \exp(x_r^t \beta_3)\Big\} \prod_{r' \in \mathcal{D}_{3l}} \gamma_{r'} \exp(x_{r'}^t \beta_3), \end{aligned}$$

where $d_{1j}$ is the number of patients who experienced the non-terminal event in the interval $(s_{1,j-1}, s_{1,j}]$, $d_{2k}$ is the number of patients who experienced the terminal event without a previous non-terminal event in the interval $(s_{2,k-1}, s_{2,k}]$, and $d_{3l} = \#\{i : s_{3,l-1} < Y_{2i} - Y_{1i} \le s_{3,l},\ \delta_{1i} = 1,\ \delta_{2i} = 1\}$ is the number of patients whose time from effusion to death falls in the interval $(s_{3,l-1}, s_{3,l}]$. The exposure of patient $i$ in each interval is $\Delta_{ij}^{g} = \max\{0, \min(y_{1i}, s_{g,j}) - s_{g,j-1}\}$ for $g = 1, 2$ and $\Delta_{il}^{3} = \max\{0, \min(y_{2i} - y_{1i}, s_{3,l}) - s_{3,l-1}\}$. We denote by $\mathcal{R}_{gj}$ the risk set for interval $I_{g,j}$ (the set of patients who have neither experienced event $g$ nor been censored by time $s_{g,j-1}$) and by $\mathcal{D}_{gj}$ the set of patients who experienced event $g$ in this interval. Following Lee et al., we assign the following priors for $g = 1, 2, 3$:

$$\begin{aligned} \lambda_g \mid J_g, \mu_{\lambda_g}, \sigma^2_{\lambda_g} &\sim N_{J_g+1}\big(\mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma_{\lambda_g}\big), \\ \sigma^{-2}_{\lambda_g} &\sim \mathrm{Gamma}(a_g, b_g), \\ \pi(\mu_{\lambda_g}) &\propto 1, \\ \gamma_i &\sim \mathrm{Gamma}(\theta^{-1}, \theta^{-1}), \\ \theta^{-1} &\sim \mathrm{Gamma}(\psi, \omega). \end{aligned}$$

This prior formulation has hyperparameters $a_g$, $b_g$, $\psi$, and $\omega$. Here $\Sigma_{\lambda_g}$ is a function of the current partition $s_g$ and corresponds to the multivariate normal intrinsic conditional autoregressive (ICAR) model formulated by Besag and Kooperberg [12].
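The likelihood displayed earlier in this section factorizes over the three hazards, and each factor has the same piecewise exponential form. The Python sketch below computes one such factor: the exposures $\Delta_{ij}$, the event counts $d_j$, and the resulting log-likelihood terms. It is a minimal sketch under assumptions: the function names are hypothetical, event times are assumed strictly positive, and the exposure is summed over all patients, which is equivalent to summing over $\mathcal{R}_{gj}$ because a patient's exposure is zero in intervals after the patient leaves the risk set.

```python
import numpy as np


def exposure_matrix(y, s):
    """Delta[i, j] = max(0, min(y_i, s_j) - s_{j-1}) for the partition s = (0, s_1, ..., s_max)."""
    y = np.asarray(y, dtype=float)[:, None]
    s = np.asarray(s, dtype=float)
    return np.maximum(0.0, np.minimum(y, s[None, 1:]) - s[None, :-1])


def hazard_loglik(y, event, X, beta, gamma, s, lam):
    """Log-likelihood contribution of one hazard with piecewise-constant log baseline lam.

    y: follow-up time for this hazard, event: 0/1 event indicator, X: (n, p) covariates,
    beta: (p,) coefficients, gamma: (n,) frailties, s: split points including 0 and s_max,
    lam: (len(s) - 1,) log baseline hazard heights.
    """
    y = np.asarray(y, dtype=float)
    event = np.asarray(event, dtype=bool)
    gamma = np.asarray(gamma, dtype=float)
    s = np.asarray(s, dtype=float)
    lam = np.asarray(lam, dtype=float)
    lin = X @ beta                                   # x_i^t beta
    delta = exposure_matrix(y, s)                    # time spent in each interval
    # 0-based interval index j of each event time, so that y lies in (s_j, s_{j+1}].
    idx = np.searchsorted(s, y[event], side="left") - 1
    d = np.bincount(idx, minlength=lam.size)         # d_j: number of events per interval
    cum_base = delta @ np.exp(lam)                   # cumulative baseline hazard at y_i
    return float(
        d @ lam
        - np.sum(gamma * np.exp(lin) * cum_base)
        + np.sum(np.log(gamma[event]) + lin[event])
    )
```

Under these assumptions, the full log-likelihood is the sum of three calls: hazard 1 with times $y_{1i}$ and indicators $\delta_{1i}$, hazard 2 with times $y_{1i}$ and indicators $\delta_{2i}(1 - \delta_{1i})$, and hazard 3 with gap times $y_{2i} - y_{1i}$ and indicators $\delta_{2i}$ restricted to patients with $\delta_{1i} = 1$.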

Lee et al. [9] take the number of split points $J_g$ to be a Poisson random variable with mean $\alpha_g$. They place a prior distribution on the partition $s_g \mid J_g$ by drawing $2J_g + 1$ uniform random variables on $[0, s_{g,\max}]$ and taking the even-indexed order statistics as the split points. We adopt the same prior formulation, which limits the number of intervals between split points that contain no events.
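The partition prior just described can be simulated directly. The sketch below is a minimal NumPy illustration; the function name and the optional cap `j_max` are hypothetical. The cap mirrors the maximum of 10 split points used later in the simulations and is implemented here by rejection, which is an assumption rather than a documented detail of the sampler.

```python
import numpy as np


def sample_partition(alpha, s_max, j_max=None, rng=None):
    """Draw split points s = (0, s_1, ..., s_J, s_max) from the partition prior.

    J ~ Poisson(alpha), optionally truncated to J <= j_max by rejection (an assumption).
    The J split points are the even-indexed order statistics of 2J + 1 Uniform(0, s_max) draws.
    """
    rng = np.random.default_rng(rng)
    J = rng.poisson(alpha)
    while j_max is not None and J > j_max:
        J = rng.poisson(alpha)
    u = np.sort(rng.uniform(0.0, s_max, size=2 * J + 1))
    splits = u[1::2]                  # 2nd, 4th, ..., 2J-th order statistics
    return np.concatenate(([0.0], splits, [s_max]))
```

Taking every second order statistic makes the prior density of the split points proportional to the product of the interval lengths, so very short intervals are discouraged.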

2.2. Variable Selection Priors

Bayesian variable selection methods were introduced by George and McCulloch [13, 14] for normal linear models. The basic idea is to introduce a latent binary vector $\eta = (\eta_1, \dots, \eta_p)$, with $\eta_k = 0$ indicating that variable $x_{(k)}$ is excluded from the model and $\eta_k = 1$ otherwise. Generalizing this notation to three hazard functions, we introduce one latent vector $\eta_g = (\eta_{g,1}, \dots, \eta_{g,p})$ for each hazard $g = 1, 2, 3$, where $\eta_{g,k} = 1$ indicates that variable $x_{(k)}$ is important in hazard $g$ and $\eta_{g,k} = 0$ otherwise. The indicator $\eta_{g,k}$ enters the prior distribution of $\beta_{g,k}$ through the mixture

$$\beta_{g,k} \mid \eta_{g,k} \sim \eta_{g,k}\, N(0, \tau^2_{g,k}) + (1 - \eta_{g,k})\, \delta_0(\beta_{g,k}), \tag{4}$$

where $\delta_0(\cdot)$ is the point mass at 0. Mixture priors of type (4) are known as spike-and-slab priors in the Bayesian variable selection literature. We choose $\tau^2_{g,k}$ in (4) as the $k$th diagonal element of $c\,(X_{(\eta_g)}^t X_{(\eta_g)})^{-1}$, where $X_{(\eta_g)}$ is the matrix of the columns of $X$ corresponding to $\eta_{g,k} = 1$, which gives a prior resembling Zellner's g-prior [15]. This prior mimics the correlation structure of the data. Denote by $\beta_{(\eta_g)}$ the coefficient vector corresponding to the entries with $\eta_{g,k} = 1$ and by $\beta_{(\eta_g^c)}$ the coefficient vector corresponding to the entries with $\eta_{g,k} = 0$. Then the prior distribution of $\beta_g \mid \eta_g$ can be written as

$$\beta_{(\eta_g)} \sim N\big[\mathbf{0}, c\,(X_{(\eta_g)}^t X_{(\eta_g)})^{-1}\big], \tag{5}$$

and $P[\beta_{(\eta_g^c)} = 0] = 1$. We assume $\eta_{g,k} \sim \mathrm{Bernoulli}(w_g)$ for all $k = 1, \dots, p$ and $g = 1, 2, 3$, so that the prior of $\eta_g \mid w_g$ is

$$\pi(\eta_g \mid w_g) = \prod_{k=1}^{p} w_g^{\eta_{g,k}} (1 - w_g)^{1 - \eta_{g,k}} = w_g^{t_g} (1 - w_g)^{p - t_g},$$

where $t_g = \sum_{k=1}^{p} \eta_{g,k}$ is the number of variables included in hazard $g$. We assign each $w_g$ a beta prior with parameters $z_{g1}$ and $z_{g2}$. Following Brown et al. [16], we integrate out $w_g$ to obtain the marginal prior

$$\pi(\eta_g) = \int \pi(\eta_g \mid w_g)\, \pi(w_g)\, dw_g = \frac{\mathrm{Beta}(t_g + z_{g1}, p - t_g + z_{g2})}{\mathrm{Beta}(z_{g1}, z_{g2})},$$

where Beta(·, ·) is the beta function.
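The marginal prior above is a beta-binomial distribution on model size, so it depends on $\eta_g$ only through $t_g$. A minimal Python sketch (standard library only; the function names are hypothetical) evaluates it on the log scale with `math.lgamma` and checks that it sums to one over all $2^p$ inclusion vectors:

```python
from itertools import product
from math import exp, lgamma


def log_beta(a, b):
    """log Beta(a, b) computed via log-gamma functions."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_prior_eta(eta, z1, z2):
    """log pi(eta_g) = log Beta(t + z1, p - t + z2) - log Beta(z1, z2), with t = sum(eta)."""
    p, t = len(eta), sum(eta)
    return log_beta(t + z1, p - t + z2) - log_beta(z1, z2)


def total_prior_mass(p, z1, z2):
    """Sum of pi(eta) over all 2^p inclusion vectors; should equal 1."""
    return sum(exp(log_prior_eta(eta, z1, z2)) for eta in product((0, 1), repeat=p))
```

With the setting $(z_{g1}, z_{g2}) = (0.4, 1.6)$ used later in the simulations, the prior mean of $w_g$ is $z_{g1}/(z_{g1} + z_{g2}) = 0.2$, so a priori about one in five candidate variables is expected in each hazard. Working on the log scale keeps the Beta-function ratio numerically stable inside Metropolis acceptance ratios.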

2.3. Markov Chain Monte Carlo

For posterior inference, we implement a Markov chain Monte Carlo (MCMC) sampling scheme with stochastic search variable selection (SSVS) in each of the three hazard functions, using add, delete, and swap moves. We use some of the sampling schemes of Lee et al., but we use different algorithms to sample $s_g$, $\lambda_g \mid s_g$, $\mu_{\lambda_g}$, and $\sigma_{\lambda_g}$, and we order the Gibbs sampler differently. Lee et al. [9] used a random scan Gibbs sampler in which the probability of adding a split point or deleting one of the current split points was a function of the current number of split points and of hyperparameters. The remaining moves (the frailties, the hierarchical frailty parameter, the 3 baseline hazards, the 3 sets of hierarchical baseline hazard parameters, and the 3 regression parameter vectors) were assigned equal shares of the probability left after accounting for the birth and death probabilities of the three baseline hazard functions. Our approach differs in that we do not randomly select which move to perform at each iteration; instead, we perform all the moves for each of the three hazard functions consecutively. In summary, a generic iteration of the MCMC sampler does the following:

Update $(\eta_g, \beta_g)$ jointly via a Metropolis step, using add, delete, and swap moves. If $t_g = 0$, automatically perform an add move, and if $t_g = p$, automatically perform a delete move. Otherwise, with probability $\phi$ perform a swap move, and with probability $1 - \phi$ perform an add/delete move: randomly select one entry of $\eta_g$, say $\eta_{g,k}$, and perform a delete move if $\eta_{g,k} = 1$ or an add move if $\eta_{g,k} = 0$. For a covariance matrix $\Sigma$, denote by $\Sigma_{k,k}$ its $k$th diagonal element, by $\Sigma_{k,(-k)}$ its $k$th row without the $k$th column entry, and by $\Sigma_{(-k),(-k)}$ the submatrix without row and column $k$. The three moves are:
1. Add. If $\eta_{g,k} = 0$, set $\eta^*_{g,k} = 1$ and sample $\beta^*_{g,k} \mid \beta_{g,(-k)}, \eta^*_g$ from a normal distribution. With $\Sigma = c\,(X_{(\eta_g^*)}^t X_{(\eta_g^*)})^{-1}$, the proposal distribution has mean $\mu_{new} = \Sigma_{k,(-k)} \Sigma_{(-k),(-k)}^{-1} \beta_{(\eta_g),(-k)}$ and variance $\sigma^2_{new} = \Sigma_{k,k} - \Sigma_{k,(-k)} \Sigma_{(-k),(-k)}^{-1} \Sigma_{(-k),k}$. The proposal $(\eta_g^*, \beta_g^*)$ is accepted jointly with probability
  $\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g^*, \dots)\, N(\beta^*_{g,k} \mid \mu_{new}, \sigma^2_{new})\, \mathrm{Beta}(t_g + 1 + z_{g1}, p - t_g - 1 + z_{g2})}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g, \dots)\, \mathrm{Beta}(t_g + z_{g1}, p - t_g + z_{g2})}, 1\right].$
2. Delete. If $\eta_{g,k} = 1$, set $\eta^*_{g,k} = 0$ and $\beta^*_{g,k} = 0$. With $R = c\,(X_{(\eta_g)}^t X_{(\eta_g)})^{-1}$, denote $\mu_{old} = R_{k,(-k)} R_{(-k),(-k)}^{-1} \beta_{(\eta_g),(-k)}$ and $\sigma^2_{old} = R_{k,k} - R_{k,(-k)} R_{(-k),(-k)}^{-1} R_{(-k),k}$. The proposal $(\eta_g^*, \beta_g^*)$ is accepted with probability
  $\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g^*, \dots)\, \mathrm{Beta}(t_g - 1 + z_{g1}, p - t_g + 1 + z_{g2})}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g, \dots)\, N(\beta_{g,k} \mid \mu_{old}, \sigma^2_{old})\, \mathrm{Beta}(t_g + z_{g1}, p - t_g + z_{g2})}, 1\right].$
3. Swap. Randomly select one $\eta_{g,k} = 1$ and one $\eta_{g,j} = 0$ and swap their values, setting $\eta^*_{g,k} = 0$ and $\eta^*_{g,j} = 1$. Then set $\beta^*_{g,k} = 0$ and sample $\beta^*_{g,j} \mid \eta^*_g, \beta_{g,(-j)}$ from a normal distribution with mean $\mu_{new}$ and variance $\sigma^2_{new}$, computed as in the add move for variable $j$, with $\mu_{old}$ and $\sigma^2_{old}$ computed as in the delete move for variable $k$. Because $t_g$ is unchanged, the prior of $\eta_g$ cancels, and the proposal $(\eta_g^*, \beta_g^*)$ is accepted with probability
  $\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g^*, \dots)\, N(\beta^*_{g,j} \mid \mu_{new}, \sigma^2_{new})}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \beta_g, \dots)\, N(\beta_{g,k} \mid \mu_{old}, \sigma^2_{old})}, 1\right].$
Update $\beta_{(\eta_g)}$ via Metropolis steps. Each of the $t_g$ coefficients currently in hazard $g$ is first updated conditionally by sampling $\beta_{(\eta_g),k} \mid \eta_g, \beta_{(\eta_g),(-k)}$ from $N(\mu_{old}, \sigma^2_{old})$. Then $\beta_{(\eta_g)}$ is updated jointly by sampling a proposal $\beta^*_{(\eta_g)} \sim N(\mathbf{0}, c\,(X_{(\eta_g)}^t X_{(\eta_g)})^{-1})$. This improves the mixing of our Markov chain by moving entries of $\beta_{(\eta_g)}$ further from or closer to zero, which affects subsequent $(\eta_g, \beta_g)$ updates.
Update $\varepsilon = \theta^{-1}$ via a Metropolis step, as described in the supplemental material of Lee et al.
Update $\gamma_i$ for $i = 1, \dots, n$ via a Gibbs step, as described in the supplemental material of Lee et al.
Update $\mu_{\lambda_g}$ and $\sigma^2_{\lambda_g}$ for $g = 1, 2, 3$ via Gibbs steps, as described in the supplemental material of Lee et al. Their full conditional distributions are normal and inverse-gamma, respectively.
Update $\lambda_{g,j} \mid \lambda_{g,(-j)}, s_g$ for $j = 1, \dots, J_g + 1$ via a Metropolis step. Lee et al. use the first and second derivatives of $\pi(\lambda_g \mid \text{Data})$ in the proposal distribution for $\lambda_g \mid s_g, \mu_{\lambda_g}, \sigma^2_{\lambda_g}$. These derivatives involve $\beta_g$, whose entries often change drastically in magnitude during the SSVS procedure, which makes the sampling scheme of Lee et al. very inefficient in this setting. Furthermore, the tuning parameter in their proposal distribution for $\lambda_g$ must be tuned extensively for each hazard $g = 1, 2, 3$ to obtain good Metropolis-Hastings acceptance rates. To avoid these issues, we sample $\lambda^*_{g,j} \mid \lambda_{g,(-j)}$ from a $U(\lambda_{g,j} - c_g, \lambda_{g,j} + c_g)$ distribution, where $\lambda_{g,j}$ is the previously sampled value. This follows the approach of Haneuse et al. [17], and we use their default value $c_g = 0.25$ for each $g$, which gives good acceptance rates in our MCMC. Because the proposal is symmetric, the proposal ratio is $2c_g / 2c_g = 1$, and the proposed value $\lambda^*_{g,j}$ is accepted with probability
$\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \lambda_g^*, \dots)\, N_{J_g+1}(\lambda_g^* \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma_{\lambda_g})}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid \lambda_g, \dots)\, N_{J_g+1}(\lambda_g \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma_{\lambda_g})}, 1\right].$
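The add, delete, and swap moves above all draw or evaluate one coefficient under the conditional distribution of a single component of the g-prior normal, given the other included components. A minimal NumPy sketch of those conditional moments follows; the helper names are hypothetical, `c` is the g-prior constant from (5), and `k` indexes a position within the included set rather than the original covariate index.

```python
import numpy as np


def gprior_cov(X, included, c):
    """Sigma = c (X_eta^t X_eta)^{-1} for the columns of X listed in `included`."""
    Xs = X[:, included]
    return c * np.linalg.inv(Xs.T @ Xs)


def conditional_normal(Sigma, beta_others, k):
    """Mean and variance of component k of N(0, Sigma) given the other components.

    mean = Sigma[k, -k] Sigma[-k, -k]^{-1} beta_others
    var  = Sigma[k, k] - Sigma[k, -k] Sigma[-k, -k]^{-1} Sigma[-k, k]
    """
    others = [i for i in range(Sigma.shape[0]) if i != k]
    if not others:                        # single variable: the marginal N(0, Sigma_kk)
        return 0.0, float(Sigma[k, k])
    s_ko = Sigma[k, others]
    s_oo = Sigma[np.ix_(others, others)]
    w = np.linalg.solve(s_oo, s_ko)       # Sigma[-k, -k]^{-1} Sigma[-k, k] (Sigma is symmetric)
    mean = float(w @ np.asarray(beta_others, dtype=float))
    var = float(Sigma[k, k] - s_ko @ w)
    return mean, var


# Example: an add move that brings the second of three columns into the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Sigma = gprior_cov(X, [0, 1, 2], c=20.0)
mu_new, var_new = conditional_normal(Sigma, beta_others=[0.4, -0.3], k=1)
beta_new = rng.normal(mu_new, np.sqrt(var_new))
```

Using `np.linalg.solve` instead of explicitly inverting $\Sigma_{(-k),(-k)}$ is a numerical convenience; it gives the same moments as the formulas in the add and delete moves.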

Add a split point to $s_g$ and update $(s_g, \lambda_g)$ jointly via a Metropolis-Hastings-Green step. Propose a new split point $s^* \sim U[0, s_{g,\max}]$. The new heights $\lambda_g^*$ created by adding this split point are based on a multiplicative perturbation, as in Green [18] and Lee et al. [9]. That is, if $s^*$ falls in the interval $[s_{g,j-1}, s_{g,j}]$, the new values are

$$\begin{aligned} \lambda^*_{g,j} &= \lambda_{g,j} - \frac{s_{g,j} - s^*}{s_{g,j} - s_{g,j-1}} \log\left(\frac{1 - U_g}{U_g}\right) \quad \text{and} \\ \lambda^*_{g,j+1} &= \lambda_{g,j} + \frac{s^* - s_{g,j-1}}{s_{g,j} - s_{g,j-1}} \log\left(\frac{1 - U_g}{U_g}\right), \end{aligned}$$

so that $\lambda^*_{g,j+1} - \lambda^*_{g,j} = \log\{(1 - U_g)/U_g\}$ and the length-weighted average of the two new log heights equals $\lambda_{g,j}$.

U_g ~ U[0, 1] is drawn at every iteration. This is the only difference in our sampler from the sampler of Lee et al. for the Metropolis-Hastings-Green step, as they set U_g as a tuning parameter. This move is accepted with probability

$$\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid s_g^*, \lambda_g^*, \dots)\, P(J_g + 1 \mid \alpha_g)\, N_{J_g+2}(\lambda_g^* \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma^*_{\lambda_g})\, (2J_g + 3)(2J_g + 2)(s^* - s_{g,j-1})(s_{g,j} - s^*)}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid s_g, \lambda_g, \dots)\, P(J_g \mid \alpha_g)\, N_{J_g+1}(\lambda_g \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma_{\lambda_g})\, (s_{g,j} - s_{g,j-1})\, s_{g,\max}^2\, U_g (1 - U_g)}, 1\right].$$

Delete a split point from $s_g$ and update $(s_g, \lambda_g)$ jointly via a Metropolis-Hastings-Green step. We select one of the current split points other than $0$ and $s_{g,\max}$ uniformly at random and delete it. Suppose we delete split point $s_{g,j}$. Following Green [18], the multiplicative perturbation satisfies

$$e^{\lambda_{g,j+1} - \lambda_{g,j}} = \frac{1 - U_g}{U_g},$$

and the height of the interval created by deleting the split point is a compromise between the two previous heights over this interval, defined as

$$\lambda^*_{g,j} = \frac{(s_{g,j} - s_{g,j-1})\, \lambda_{g,j} + (s_{g,j+1} - s_{g,j})\, \lambda_{g,j+1}}{s_{g,j+1} - s_{g,j-1}}.$$

We draw a new U_g ~ U[0, 1] as in Green [18], rather than setting it to be a hyperparameter as in Lee et al. Now we accept the vector (s_g^*, λ_g^*) jointly with probability

$$\min\left[\frac{L(Y_1, Y_2, \delta_1, \delta_2, X \mid s_g^*, \lambda_g^*, \dots)\, P(J_g - 1 \mid \alpha_g)\, N_{J_g}(\lambda_g^* \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma^*_{\lambda_g})\, (s_{g,j+1} - s_{g,j-1})\, s_{g,\max}^2\, U_g (1 - U_g)}{L(Y_1, Y_2, \delta_1, \delta_2, X \mid s_g, \lambda_g, \dots)\, P(J_g \mid \alpha_g)\, N_{J_g+1}(\lambda_g \mid \mu_{\lambda_g} \mathbf{1}, \sigma^2_{\lambda_g} \Sigma_{\lambda_g})\, (2J_g + 1)(2J_g)(s_{g,j} - s_{g,j-1})(s_{g,j+1} - s_{g,j})}, 1\right].$$
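The height updates of the birth and death moves can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names; arrays are 0-based, so height `lam[j]` belongs to the interval $(s_j, s_{j+1}]$, and only the proposal step is shown, not the acceptance ratio. The birth sketch follows Green's [18] construction, in which the two new log heights differ by $\log\{(1 - U_g)/U_g\}$ and their length-weighted average equals the old height. The death sketch computes the merged height displayed above and solves the displayed relation $e^{\lambda_{g,j+1} - \lambda_{g,j}} = (1 - U_g)/U_g$ for $U_g$; treating $U_g$ as determined by the current heights in the reverse move is our assumption, made so that the death move exactly undoes a birth move.

```python
import numpy as np


def birth_heights(s, lam, s_new, u):
    """Insert split point s_new and split the height of the interval containing it."""
    s = np.asarray(s, dtype=float)
    lam = np.asarray(lam, dtype=float)
    j = int(np.searchsorted(s, s_new)) - 1        # s[j] < s_new < s[j + 1]
    lo, hi = s[j], s[j + 1]
    log_ratio = np.log((1.0 - u) / u)
    lam_left = lam[j] - (hi - s_new) / (hi - lo) * log_ratio
    lam_right = lam[j] + (s_new - lo) / (hi - lo) * log_ratio
    new_s = np.insert(s, j + 1, s_new)
    new_lam = np.concatenate((lam[:j], [lam_left, lam_right], lam[j + 1:]))
    return new_s, new_lam


def death_heights(s, lam, i):
    """Delete interior split point s[i] and merge the two adjacent heights.

    Returns the new partition, the new heights, and the U solving
    exp(lam[i] - lam[i - 1]) = (1 - U) / U.
    """
    s = np.asarray(s, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if not 1 <= i <= s.size - 2:
        raise ValueError("only interior split points can be deleted")
    w_left, w_right = s[i] - s[i - 1], s[i + 1] - s[i]
    merged = (w_left * lam[i - 1] + w_right * lam[i]) / (w_left + w_right)
    u = 1.0 / (1.0 + np.exp(lam[i] - lam[i - 1]))
    new_s = np.delete(s, i)
    new_lam = np.concatenate((lam[:i - 1], [merged], lam[i + 1:]))
    return new_s, new_lam, float(u)
```

As a check of the construction, a birth at $s^* = 1.25$ with $U_g = 0.3$ on the partition $(0, 1, 2)$, followed by deleting the new split point, returns the original partition, the original heights, and $U_g = 0.3$.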

2.4. Model Determination

To determine the final model used for inference on the treatment and covariate effects, we calculate the marginal posterior probability of inclusion (PPI) of each variable $k = 1, \dots, p$ in hazard $g$ as the proportion of posterior samples with $\eta_{g,k} = 1$. We then select the variables in hazard $g$ whose PPI exceeds a threshold $\tau_g \in (0, 1)$. Formally, variable $k$ in hazard $g = 1, 2, 3$ is included in the final model if

$$P[\eta_{g,k} = 1 \mid \text{Data}] > \tau_g. \tag{6}$$
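Computing the PPIs and applying criterion (6) is a short operation on the stored inclusion indicators. A minimal NumPy sketch, assuming the posterior draws of $\eta_g$ for one hazard are stored as a (draws × p) 0/1 array; the function names are hypothetical:

```python
import numpy as np


def marginal_ppi(eta_draws):
    """Proportion of posterior draws with eta_{g,k} = 1, for each variable k."""
    return np.asarray(eta_draws, dtype=float).mean(axis=0)


def select_variables(ppi, tau):
    """0-based indices of variables satisfying criterion (6): PPI > tau (strict)."""
    return np.flatnonzero(np.asarray(ppi) > tau)
```

Because the inequality in (6) is strict, a variable whose PPI equals the threshold exactly is excluded.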

Thus, one must specify $\tau_g$ for each of the three hazard functions. We choose the optimal vector $\tau^* = (\tau_1^*, \tau_2^*, \tau_3^*)$ using the deviance information criterion, via the DIC-$\tau_g$ procedure defined below. Recall that the deviance information criterion of Spiegelhalter et al. [19] for $\beta$ is

$$\mathrm{DIC} = -2 \log L(\hat{\beta}) + 2 p_{DIC}, \tag{7}$$

where $p_{DIC} = 2\big(\log L(\hat{\beta}) - \frac{1}{B} \sum_{b=1}^{B} \log L(\beta^{b})\big)$, $\beta^b$ is the sampled value of $\beta$ at iteration $b$, and $\hat{\beta}$ is the posterior mean of $\beta = (\beta_1, \beta_2, \beta_3)$. Since our likelihood contains many nuisance parameters ($\gamma$, $J_g$, $s_g$, $\lambda_g$, …), Spiegelhalter et al. [19] recommend plugging in posterior means of these parameters when computing the DIC for a given $\beta$ vector. We use the DIC to select the optimal model via the following algorithm, which we call the DIC-$\tau_g$ procedure.

Calculate the posterior mode or median of $J_g$ for $g = 1, 2, 3$, depending on the shape of its posterior distribution: use the posterior mode if the mode has much greater posterior probability than every other value of $J_g$, and the posterior median if several values of $J_g$ occur with about the same frequency.
Compute the posterior means of the nuisance parameters $\gamma = (\gamma_1, \dots, \gamma_n)$, $s_g \mid J_g$, and $\lambda_g \mid J_g$ for $g = 1, 2, 3$.
Perform a three-dimensional grid search for the optimal $\tau = (\tau_1, \tau_2, \tau_3)$ over the values $\tau_g = 0.05, 0.1, \dots, 0.9$ that give different models under the threshold criterion (6) for $g = 1, 2, 3$:
1. For a given $\tau = (\tau_1, \tau_2, \tau_3)$, find the $\eta^* = (\eta_1^*, \eta_2^*, \eta_3^*)$ that satisfies (6).
2. Sample $\beta_{(\eta_g^*)}$ 10,000 times via a Metropolis-Hastings algorithm, using the prior distribution (5), the same value of $c$ used in the variable selection sampler, and the posterior summaries of $J_g$ and the nuisance parameters computed in the previous steps.
3. Discard the first half of the sample and save the DIC.
The vector $\tau^* = (\tau_1^*, \tau_2^*, \tau_3^*)$ chosen by the DIC-$\tau_g$ procedure is the one that produces the smallest DIC. When several models have DIC values within 1 of the smallest DIC, the most parsimonious of them is selected as the final model, because a DIC difference this small indicates that the additional variables do not meaningfully improve the criterion over the more parsimonious model.
The final model includes the variables that satisfy (6) for the optimal $\tau^* = (\tau_1^*, \tau_2^*, \tau_3^*)$.

After the DIC-$\tau_g$ procedure selects $\eta^* = (\eta_1^*, \eta_2^*, \eta_3^*)$ based on $\tau^* = (\tau_1^*, \tau_2^*, \tau_3^*)$, we resample $\beta_{(\eta_g^*)}$ 100,000 times using the prior (5), $\beta_{(\eta_g^*)} \sim N(\mathbf{0}, c\,(X_{(\eta_g^*)}^t X_{(\eta_g^*)})^{-1})$, and perform posterior inference after discarding the first 50,000 MCMC samples. Although we could compute the DIC for all $19^3$ different $\tau$ vectors, a change in $\tau_g$ of 0.05 or 0.10 may not add any variables to hazard $g$, so two different $\tau$ vectors can lead to the same $\eta$. This reduces the number of $\tau_g$ values that we need to try, based on the marginal posterior probabilities of inclusion. On the other hand, changing $\tau_g$ by 0.05 can add or remove more than one variable in hazard $g$. Since our case study has 11 variables, there are at most $11^3$ different $\tau = (\tau_1, \tau_2, \tau_3)$ vectors that choose a unique subset of variables. We did not consider a finer grid ($\tau_g = 0.01, 0.02, \dots$) because a spacing of 0.05 tends to include together variables that occur with about the same frequency in the posterior distribution. This contrasts with searching through all $(2^{11})^3$ possible models for the lowest DIC without the SSVS. Because the SSVS provides a posterior ordering of variable importance, we compute the DIC for fewer than 0.0001% of all possible models.
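The DIC computation (7) and the search over distinct $\tau$ vectors can be sketched as below. This is a minimal NumPy illustration under assumptions: `fit_dic` is a hypothetical user-supplied function that refits a selected model (steps 1 to 3 of the grid search) and returns its DIC, the PPIs are supplied per hazard, and models within `tol = 1` of the minimum DIC are ranked by the total number of included variables and then by DIC. Each distinct model is fitted only once, however many $\tau$ vectors produce it.

```python
import itertools

import numpy as np


def dic(loglik_at_mean, loglik_draws):
    """DIC = -2 log L(beta_hat) + 2 p_DIC, with p_DIC = 2 (log L(beta_hat) - mean_b log L(beta_b))."""
    p_dic = 2.0 * (loglik_at_mean - np.mean(loglik_draws))
    return -2.0 * loglik_at_mean + 2.0 * p_dic


def dic_tau_search(ppis, fit_dic, grid=None, tol=1.0):
    """Grid search over (tau_1, tau_2, tau_3), fitting each distinct model once.

    ppis: sequence of three PPI arrays (one per hazard).
    fit_dic: hypothetical callback taking a model (tuple of three frozensets of
             included 0-based indices) and returning its DIC.
    Returns (chosen model, one tau vector producing it, its DIC).
    """
    if grid is None:
        grid = np.round(np.arange(0.05, 0.9 + 1e-9, 0.05), 2)   # 0.05, 0.10, ..., 0.90
    models = {}
    for taus in itertools.product(grid, repeat=3):
        model = tuple(
            frozenset(np.flatnonzero(np.asarray(p) > t).tolist()) for p, t in zip(ppis, taus)
        )
        models.setdefault(model, taus)    # record the first tau vector giving this model
    scores = {m: fit_dic(m) for m in models}
    best = min(scores.values())
    near_best = [m for m, v in scores.items() if v <= best + tol]
    # Parsimony rule: fewest included variables among near-best models, then lowest DIC.
    chosen = min(near_best, key=lambda m: (sum(len(h) for h in m), scores[m]))
    return chosen, models[chosen], scores[chosen]
```

For example, `dic(-100.0, [-102.0, -102.0])` gives $p_{DIC} = 4$ and a DIC of 208. Caching by model rather than by $\tau$ vector is what keeps the number of refits far below the size of the grid.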

3. Simulation Study

We performed a simulation study under seven different scenarios to assess how well our method identifies important variables through the marginal posterior probabilities of inclusion, and how accurately the DIC-$\tau_g$ method selects the final model. We used the simID function from the SemiCompRisks package to simulate patient data under the Weibull semi-competing risks model, in order to examine how well our method performs when the baseline hazards are not truly piecewise exponential. For all simulations, we used $\kappa_1 = 0.05$, $\kappa_2 = 0.01$, $\kappa_3 = 0.01$, $\alpha_1 = 0.8$, $\alpha_2 = 1.1$, $\alpha_3 = 0.9$, and $\theta = 0.5$, which are the values given in the example in the simID documentation. We set the censoring time for all patients at 2000 days. For each simulated data set, we used the actual patient covariate matrix from our data to mimic a correlated data structure, rather than generating covariates independently. Each simulation proceeded as follows:

Run two MCMC chains for 100,000 iterations using the package SCRSELECT [1], with dispersed starting values: one chain starts with all variables included and the other with no variables included.
Discard the first 40,000 samples and combine the two chains, saving the marginal posterior probabilities of inclusion and the posterior means of the nuisance parameters for the baseline hazards and frailties ($s_g \mid J_g$, $\lambda_g \mid J_g$, $\gamma$), as outlined in the DIC-$\tau_g$ procedure.
For $\tau_g = 0.05, 0.1, \dots, 0.9$ and $g = 1, 2, 3$, run an MCMC as described in Section 2.4 for the DIC-$\tau_g$ procedure, using the quantities from step 2 and skipping any $(\tau_1, \tau_2, \tau_3)$ vectors that do not change the variables included in each hazard.
Find the combination $(\tau_1, \tau_2, \tau_3)$ that produces the smallest DIC. When different models have DIC values within 1 of each other, take the one with fewer variables included (higher $\tau$ values).
Assess the results of the simulation.

We present results averaged over 100 replicated data sets for each scenario. In each of the seven scenarios, we held the last two entries of each $\beta_g$ vector fixed and did not subject them to selection. These entries correspond to the propensity score and the treatment effect of the XRT modality, respectively, and are set to $(0.15, -0.15)$ for the hazard of effusion, $(0.3, -0.2)$ for the hazard of death before effusion, and $(0.01, -0.05)$ for the hazard of death after effusion. These values were chosen in part to mimic the case study results. This allowed selection on 11 of the 13 covariates in each hazard. Below is a summary of the seven simulation scenarios that we considered for the DIC-$\tau_g$ method, where we list only the true coefficients of the variables that were subject to selection.

Since the $g$th hazard of the model includes the factor $\exp(x_i^t \beta_g)$, coefficients of magnitude greater than 1 would be unrealistic and too easy to pick out in simulations. Instead, we focused on coefficients with magnitudes from 0.3 to 0.9, with a few challenging coefficients of magnitude less than 0.3. For comparison, we used the same hyperparameter settings in the simulations as in the application. We set the variable selection hyperparameters $c = 20$ in (5) and $(z_{g1}, z_{g2}) = (0.4, 1.6)$ for $g = 1, 2, 3$. We set $(a_g, b_g) = (\psi, \omega) = (0.7, 0.7)$ as non-informative priors for $\sigma^{-2}_{\lambda_g}$ and $\theta^{-1}$. We set $\alpha_g = 3$, corresponding to 3 expected split points in each baseline hazard, and the maximum possible number of split points to 10.
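For readers without R, the following NumPy sketch generates data of the same general form as the simulations: gamma frailties with mean 1 and variance $\theta$, Weibull hazards, a semi-Markov hazard for death after the non-terminal event, and administrative censoring at a fixed time. It is an illustrative re-implementation, not the simID function itself; the Weibull parameterization $h_{0g}(t) = \alpha_g \kappa_g t^{\alpha_g - 1}$ and the function name are assumptions.

```python
import numpy as np


def simulate_weibull_scr(X, betas, kappa, alpha, theta, cens_time, rng=None):
    """Simulate (Y1, delta1, Y2, delta2) under a semi-Markov Weibull illness-death model.

    Assumed baseline hazards h_0g(t) = alpha_g * kappa_g * t**(alpha_g - 1), g = 1, 2, 3,
    so the cumulative hazard for patient i is gamma_i * exp(x_i^t beta_g) * kappa_g * t**alpha_g.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    gamma = rng.gamma(shape=1.0 / theta, scale=theta, size=n)   # Gamma(1/theta, rate 1/theta)

    def draw(g):
        # Inverse-CDF draw: S(t) = exp(-rate * t**alpha)  =>  t = (-log U / rate)**(1 / alpha)
        rate = gamma * np.exp(X @ betas[g]) * kappa[g]
        return (-np.log(rng.uniform(size=n)) / rate) ** (1.0 / alpha[g])

    t_nonterminal = draw(0)
    t_death_first = draw(1)
    gap = draw(2)                          # semi-Markov: the clock restarts at the non-terminal event
    nt_first = t_nonterminal < t_death_first
    T1 = np.where(nt_first, t_nonterminal, np.inf)
    T2 = np.where(nt_first, t_nonterminal + gap, t_death_first)
    y1 = np.minimum(np.minimum(T1, T2), cens_time)
    d1 = (T1 <= np.minimum(T2, cens_time)).astype(int)
    y2 = np.minimum(T2, cens_time)
    d2 = (T2 <= cens_time).astype(int)
    return y1, d1, y2, d2
```

With the settings listed in this section, a call would look like `simulate_weibull_scr(X, betas, (0.05, 0.01, 0.01), (0.8, 1.1, 0.9), 0.5, 2000.0)`, where `X` is a covariate matrix and `betas` holds the three true coefficient vectors.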

Before examining how well the DIC-$\tau_g$ procedure picks out the final model, we examined the marginal posterior probabilities of inclusion in the simulations, separately for included and excluded variables. In most hazards and scenarios, the mean marginal probability of inclusion for the included variables is greater than 0.9. The exceptions can be explained by the simulation settings. For example, in hazard three of scenario two, the mean marginal probability of inclusion for the two included variables is 0.797, but the median is 1.00. This is due to the small true value of 0.2 for variable 8, which is much harder to detect with SSVS.

Table 2 summarizes the results of the DIC-τ_g procedure: the mean number of false positives, the mean number of false negatives, and the mean probability of a correct decision. We denote by NFP the mean number of false positives and by N₀ the total number of variables with true β_g,k = 0, summed over all three hazards. Similarly, NFN denotes the mean number of false negatives and N₊ the total number of variables with true β_g,k ≠ 0, summed over all three hazards. PCD denotes the probability of a correct decision about whether or not a variable is included. We also computed a statistic resembling the area under the curve (AUC): we plotted the true positive and false positive rates of all 100 simulations, connected these points in the same manner as an ROC curve, and estimated the area under this curve. This statistic, which we denote AUC*, is interpreted like the traditional AUC: larger values indicate a better classifier, and values near .5 indicate a poor one. We used AUC* because it evaluates the full DIC-τ_g procedure, whereas the traditional AUC would evaluate only the SVSS portion. AUC* (like the traditional AUC) is undefined for scenario 7 because that scenario has no truly included variables, so it is omitted there.
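The AUC* construction can be sketched as follows. This is a minimal sketch under stated assumptions: each replicate contributes one (false positive rate, true positive rate) point, computed as FP/N₀ and TP/N₊; the points are sorted by false positive rate, anchored at (0, 0) and (1, 1), and joined by straight lines, and the area is computed by the trapezoidal rule. The text does not specify the exact connection rule, and the function names `rates` and `auc_star` are hypothetical.

```python
from typing import Iterable, List, Tuple


def rates(fp: int, tp: int, n_zero: int, n_nonzero: int) -> Tuple[float, float]:
    """(FPR, TPR) for one replicate: FP / N0 and TP / N+."""
    return fp / n_zero, tp / n_nonzero


def auc_star(points: Iterable[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through per-replicate (FPR, TPR) points.

    Assumption: points are sorted by FPR (ties by TPR), anchored at (0, 0)
    and (1, 1), and joined by straight lines, as for an empirical ROC curve.
    """
    pts: List[Tuple[float, float]] = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between neighbouring points
    return area
```

For example, `auc_star([rates(6, 13, 20, 13), rates(4, 12, 20, 13)])` would score two hypothetical scenario-1 replicates.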

Table 2.

Simulation Study of the DIC-τ_g Procedure.

Scenario #	NFP/N₀	NFN/N₊	PCD	AUC*
1	5.92/20	0.48/13	0.806	0.972
2	7.29/23	0.50/10	0.764	0.957
3	5.94/28	0.00/5	0.820	1.000
4	5.75/21	0.28/12	0.817	0.935
5	6.21/24	0.45/9	0.798	0.928
6	5.66/13	1.98/20	0.769	0.881
7	5.88/33	0.00/0	0.822	——


Across all seven scenarios, the DIC-τ_g method made correct inclusion decisions with mean probability between .76 and .82, and all AUC* values exceeded .88, indicating that the DIC-τ_g procedure performs well in determining variable inclusion. The DIC-τ_g method recovered the true model exactly in at least one replicate of scenarios 2, 3 and 7. Most incorrect inclusion decisions were false positives; false negatives did not occur at all in scenarios 3 and 7, the sparse and null scenarios. Each scenario had about 6 false positives on average, with scenario 2 having the largest mean number of false positives (7.29). The false positives are partly due to correlations among the covariates and to insufficient separation among the marginal posterior probabilities of inclusion, which may be mitigated by tuning the hyperparameter c.
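Assuming PCD is the proportion of the N₀ + N₊ = 33 inclusion decisions that are correct in each replicate, averaged over replicates, linearity of the mean gives PCD = 1 − (NFP + NFN)/(N₀ + N₊). The short check below, with Table 2 values typed in by hand and a hypothetical helper name, reproduces the reported PCD column to within rounding of the reported means (the largest discrepancy is about .0005, for scenario 6).

```python
# Table 2 values: scenario -> (NFP, N0, NFN, N+, reported PCD)
TABLE2 = {
    1: (5.92, 20, 0.48, 13, 0.806),
    2: (7.29, 23, 0.50, 10, 0.764),
    3: (5.94, 28, 0.00, 5, 0.820),
    4: (5.75, 21, 0.28, 12, 0.817),
    5: (6.21, 24, 0.45, 9, 0.798),
    6: (5.66, 13, 1.98, 20, 0.769),
    7: (5.88, 33, 0.00, 0, 0.822),
}


def pcd_from_errors(nfp: float, n0: int, nfn: float, npos: int) -> float:
    """Mean proportion of correct inclusion decisions across the N0 + N+ variables."""
    return 1.0 - (nfp + nfn) / (n0 + npos)


if __name__ == "__main__":
    for scenario, (nfp, n0, nfn, npos, reported) in TABLE2.items():
        print(scenario, round(pcd_from_errors(nfp, n0, nfn, npos), 3), reported)
```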

4. Application

4.1. Data

This observational data set came from the M.D. Anderson Cancer Center in Houston, TX. It consists of 470 esophageal cancer patients who received one of two XRT modalities: 3-Dimensional Conformal Radiation Therapy (3D-CRT) or Intensity-Modulated Radiation Therapy (IMRT). Some patients also received induction chemotherapy. Patients were followed from the end of radiation therapy until death or censoring, and the dates of any pleural or pericardial effusion during this period were recorded. The two path diagrams in Figure 1 describe the semi-competing risks structure and enumerate the possible patient outcomes. For each path, we display the total number of patients who experienced each event, followed by the numbers of patients who received 3D-CRT and IMRT, respectively. Pleural effusion occurred much more frequently than pericardial effusion in these patients. Over 75% of patients who experienced either pericardial or pleural effusion subsequently died (i.e., their deaths were observed rather than censored).

Path diagrams for the three event types, with the number of patients (Total, 3D-CRT, IMRT) who experienced each event, for pleural and pericardial effusion. 212 patients received 3D-CRT and 258 patients received IMRT.

We performed separate analyses for the pleural effusion and pericardial effusion path structures shown in Figure 1. Each patient had 20 baseline covariates, including individual characteristics and characteristics of the tumor. Several covariates were either highly correlated with other covariates or observed in only a small number of patients; because these caused severe MCMC convergence problems, they were excluded from the analysis. The covariates considered were XRT modality, induction chemotherapy, age, BMI, asthma, diabetes, smoking status, and binary indicators of adeno histology, a good KPS performance score, stage 3–4 cancer, and tumor location 3 (lower/distal) or tumor location 2 (middle) vs. tumor location 1 (upper). Patients without smoking status, BMI, or tumor histology information were removed from the data, leaving 470 patients. Clinicians were primarily interested in comparing the effects of the two modalities on pleural effusion, pericardial effusion, and death, but it was unclear which covariates should be included in each hazard function. We addressed this through our variable selection method for the three hazard functions. We included IMRT status in each hazard to evaluate the effect of radiation therapy modality on each of the three hazard functions. Because the data were observational rather than randomized, we always included each patient's propensity score as a covariate in each hazard to correct for some of the resulting treatment-selection bias. The propensity score was estimated from the fitted logistic regression of IMRT status on these covariates, logit(Pr(IMRT|X)) = .25 + .48(Hypertension) − .20(Differentiation) + .30(Adeno Histology). We did not allow the propensity score or IMRT status to be removed from the model. Thus, X_{(η_g)} always contains columns for the individual propensity scores and XRT status.
Likewise, β_{(η_g)} always contains entries corresponding to the coefficients of the propensity score and XRT status in each hazard function.
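The fitted propensity model can be evaluated directly: the sketch below applies the inverse logit to the linear predictor to obtain Pr(IMRT | X). It assumes each of the three covariates is coded 0/1 (the text does not state how Differentiation is coded), and the function name `propensity_score` is hypothetical.

```python
import math


def propensity_score(hypertension: int, differentiation: int, adeno: int) -> float:
    """Pr(IMRT | X) from the fitted logistic model; covariates assumed coded 0/1."""
    # Linear predictor = logit(Pr(IMRT | X)) with the reported coefficients.
    eta = 0.25 + 0.48 * hypertension - 0.20 * differentiation + 0.30 * adeno
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit
```

The resulting score enters every hazard as an always-included covariate, alongside IMRT status.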

4.2. Hyperparameter Settings

For the hyperparameters, we set c = 20 in (5) for both case studies. Varying c changed the posterior inclusion probabilities greatly: larger values of c forced more variables out of the model, and smaller values kept more variables in. Smith and Kohn [20] recommend a value between 10 and 100. Following Brown et al. [16], we obtained vague priors on w_g by imposing the constraint z_g₁ + z_g₂ = 2, which corresponds to a prior effective sample size of 2 for the beta prior on the covariate inclusion probability while allowing a desired mean inclusion percentage. We set (z_g₁, z_g₂) = (.4, 1.6) for g = 1, 2, 3, corresponding to an expectation of 2 to 3 variables being included in each hazard function. We set (a_g, b_g) = (ψ, ω) = (.7, .7) as a non-informative prior for $σ_{λ_{g}}^{- 2}$ and θ⁻¹. This corresponds to .025 and .975 quantiles of .23 and 155.61 for $σ_{λ_{g}}^{2}$ and θ. We set α_g = 3, corresponding to 3 expected split points in each baseline hazard, and set the maximum possible number of split points to 10, which neither data set reached for any hazard. Hyperparameter settings for the ICAR formulation are discussed in the appendix. For both data sets, two chains were run with different starting values for the three hazard β_g vectors: the first with all η_g,k = 1 and β_g,k = 1 for all k = 1, …, p and g = 1, 2, 3, and the second with η_1,3 = 1, η_2,5 = 1 and η_3,7 = 1 (these entries were chosen at random) and all non-zero coefficients set to β_g,k = −1.
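The stated quantiles can be checked numerically. The sketch below assumes that $σ_{λ_{g}}^{- 2}$ (and likewise θ⁻¹) follows a Gamma distribution with shape a_g = .7 and rate b_g = .7; the rate parameterization is our assumption, chosen because it is the one consistent with the reported values. Then $σ_{λ_{g}}^{2}$ is inverse-gamma, and its p-quantile is the reciprocal of the (1 − p)-quantile of the Gamma variable. Only the Python standard library is used: a power series for the regularized lower incomplete gamma function and bisection for the quantiles. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def gamma_quantile(p: float, shape: float, rate: float) -> float:
    """p-quantile of Gamma(shape, rate) by bisection on the CDF."""
    lo, hi = 0.0, 60.0  # bracket for the unit-rate variable; ample for shape 0.7
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / rate


def inv_gamma_quantiles(shape: float, rate: float,
                        probs: Sequence[float] = (0.025, 0.975)) -> Tuple[float, ...]:
    """Quantiles of sigma^2 = 1 / tau when the precision tau ~ Gamma(shape, rate)."""
    return tuple(1.0 / gamma_quantile(1.0 - p, shape, rate) for p in probs)
```

Under these assumptions, `inv_gamma_quantiles(0.7, 0.7)` returns approximately (.23, 155.6), in line with the values above.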

4.3. Case Study Results: Pleural Effusion

The two chains were run for 100,000 iterations, with the first 40,000 discarded as burn-in. For the β_g coefficients, the point estimates and upper confidence limits of the potential scale reduction factors were all below 1.01, which, together with the trace plots, indicates good convergence of both chains [21]. Furthermore, the posterior probabilities of inclusion did not differ substantially between the two chains; the largest difference was 4.46%. The correlations between the two chains' marginal posterior probabilities of inclusion were above .99 for all three hazards. The posterior samples of both chains were then combined, and the resulting marginal posterior probabilities of inclusion are shown graphically in Figure 2 and numerically in Table 5.
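For readers unfamiliar with the potential scale reduction factor, the sketch below computes its basic point estimate for one scalar parameter from several chains of equal length. It omits the degrees-of-freedom correction and the upper confidence limit that, for example, R's `coda::gelman.diag` also reports, so it is illustrative rather than the exact diagnostic used here; the function name `psrf` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Basic Gelman-Rubin potential scale reduction factor (no df correction)."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    w = mean(variance(c) for c in chains)        # mean within-chain variance
    b = n * variance([mean(c) for c in chains])  # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled estimate of posterior variance
    return math.sqrt(var_hat / w)
```

Values near 1 indicate that the chains have mixed over the same distribution; chains stuck in different regions inflate the between-chain term and push the factor well above 1.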

Pleural Effusion Results: Combined marginal posterior probabilities of inclusion and the DIC-τ_g cutoff thresholds $τ_{g}^{*}$

Table 5.

Sensitivity Analysis of prior hyperparameter effects on marginal PPI for the Pleural Effusion case study. Notation: c is a parameter in the prior of β_{(η_g)}, given in (5) and (z_g₁, z_g₂) are the prior hyperparameters on w_g, the probability that η_g,k = 1, given on page 4.

Variable	c = 20	c = 30	c = 20	c = 30
(z_g₁, z_g₂) =	(.4, 1.6)	(.4, 1.6)	(.1, 1.9)	(.1, 1.9)
	h₁: Effusion before Death

Asthma	.378	.184	.207	.080
Diabetes	.373	.174	.199	.073
Stage 3–4	.352	.168	.191	.071
Location 2	.369	.174	.201	.077
Location 3	.426	.209	.227	.091
Age	.759	.629	.509	.361
Smoker	.373	.181	.194	.074
BMI	.600	.378	.343	.165
Induction Chemo	.345	.166	.191	.071
Good KPS	.430	.253	.242	.111
Adeno Histology	.364	.176	.197	.071
	h₂: Death w/o Effusion

Asthma	.648	.451	.513	.291
Diabetes	.889	.760	.779	.543
Stage 3–4	.833	.712	.731	.507
Location 2	.801	.631	.686	.421
Location 3	.760	.583	.638	.400
Age	.643	.479	.562	.330
Smoker	.599	.412	.500	.272
BMI	.805	.621	.667	.400
Induction Chemo	.607	.419	.497	.270
Good KPS	.968	.922	.896	.713
Adeno Histology	.821	.623	.717	.466
	h₃: Death after Effusion

Asthma	.393	.273	.296	.169
Diabetes	.361	.223	.251	.136
Stage 3–4	.577	.451	.443	.292
Location 2	.459	.321	.345	.195
Location 3	.476	.325	.347	.189
Age	.418	.293	.317	.186
Smoker	.359	.238	.261	.146
BMI	.392	.254	.277	.149
Induction Chemo	.357	.225	.263	.134
Good KPS	.417	.283	.301	.168
Adeno Histology	.405	.264	.291	.151


The optimal model under the DIC-τ_g procedure had a DIC of 1050.845, with inclusion thresholds $(τ_{1}^{*}, τ_{2}^{*}, τ_{3}^{*}) = (.45, .75, .45)$. This model included 2, 7 and 3 variables in hazards 1, 2 and 3, respectively. After the optimal model was identified, the Gibbs sampler was rerun for 100,000 iterations. Table 3 displays the fitted optimal model, with posterior hazard ratios exp[β_g,k], their 95% credible intervals, and P = P[β_g,k > 0|Data] for the kth variable included in each hazard. That is, P is the posterior probability that covariate k increases the hazard of event g, so large values of P (above .95 or .99) indicate a harmful effect of the covariate, and small values (below .05 or .01) indicate a protective effect.
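Applying the selected thresholds is a simple filter on the marginal PPIs. The sketch below types in the c = 20, (z_g₁, z_g₂) = (.4, 1.6) column of Table 5 and keeps each variable whose PPI is at least τ_g*. Whether the threshold is inclusive is an assumption, but no PPI in that column equals a threshold, so the choice does not matter here. The names `PPI` and `select` are hypothetical. The filter recovers the 2, 7 and 3 variables reported above, which match the variables shown in Table 3.

```python
from typing import Dict, List, Sequence

VARS = ["Asthma", "Diabetes", "Stage 3-4", "Location 2", "Location 3", "Age",
        "Smoker", "BMI", "Induction Chemo", "Good KPS", "Adeno Histology"]

# Marginal PPIs, pleural effusion data, c = 20 and (z_g1, z_g2) = (.4, 1.6) (Table 5).
PPI: Dict[int, List[float]] = {
    1: [.378, .373, .352, .369, .426, .759, .373, .600, .345, .430, .364],
    2: [.648, .889, .833, .801, .760, .643, .599, .805, .607, .968, .821],
    3: [.393, .361, .577, .459, .476, .418, .359, .392, .357, .417, .405],
}


def select(ppi: Dict[int, List[float]], tau: Sequence[float]) -> Dict[int, List[str]]:
    """Variables with marginal PPI >= tau_g in each hazard g = 1, 2, 3."""
    chosen: Dict[int, List[str]] = {}
    for g in sorted(ppi):
        threshold = tau[g - 1]
        chosen[g] = [v for v, p in zip(VARS, ppi[g]) if p >= threshold]
    return chosen
```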

Table 3.

Pleural Effusion Case Study: Posterior quantities for the model chosen by the DIC-τ_g procedure. Notation: P = P[β_g,k > 0|Data], HR (CI) = posterior hazard ratio and 95% credible interval. g = 1, 2, 3 index the three hazard functions and k = 1, …, p index the covariates.

Variable	h₁: Effusion		h₂: Death w/o Effusion		h₃: Death after Effusion

	HR (CI)	P	HR (CI)	P	HR (CI)	P
Asthma	—	—	—	—	—	—
Diabetes	—	—	1.18 (1.03–1.33)	.99	—	—
Stage 3–4	—	—	1.13 (1.00–1.29)	.97	1.19 (1.03–1.37)	.99
Location 2	—	—	.86 (.74–1.01)	.03	1.18 (.98–1.41)	.96
Location 3	—	—	.87 (.73–1.04)	.06	1.18 (.99–1.42)	.97
Age	1.18 (1.04–1.36)	.99	—	—	—	—
Smoker	—	—	—	—	—	—
BMI	.90 (.78–1.02)	.05	.88 (.76–1.00)	.03	—	—
Induction Chemo	—	—	—	—	—	—
Good KPS	—	—	.80 (.71–.91)	< .01	—	—
Adeno Histology	—	—	1.21 (1.02–1.42)	.99	—	—
Propensity Score	.97 (.86–1.12)	.34	.92 (.81–1.05)	.10	.98 (.88–1.11)	.40
IMRT	.84 (.74–.96)	< .01	.85 (.75–.97)	< .01	.98 (.85–1.13)	.40


The final model chosen by the DIC-τ_g procedure shows that patients who received IMRT had significantly reduced hazards of pleural effusion and of death before pleural effusion compared with patients who received 3D-CRT (P < .01). Older age increased the hazard of pleural effusion (P = .99), and patients with Stage 3–4 cancer had an increased hazard of death after pleural effusion (P = .99). Patients with diabetes or adeno histology had significantly increased hazards of death before pleural effusion (P = .99), and patients with a good KPS score had a significantly reduced hazard of death before pleural effusion (P < .01). Stage 3–4 cancer was associated with a modestly increased hazard of death before pleural effusion (P = .97, 95% credible interval 1.00–1.29), whereas its association with death before pericardial effusion was much stronger (Table 4). This difference may be partly attributable to the larger number of patients who died without pericardial effusion (n = 285) than without pleural effusion (n = 197).

Table 4.

Pericardial Effusion Case Study: Posterior quantities for the model chosen by the DIC-τ_g procedure. Notation: P = P[β_g,k > 0|Data], HR (CI) = posterior hazard ratio and 95% credible interval. g = 1, 2, 3 index the three hazard functions and k = 1, …, p index the covariates.

Variable	h₁: Effusion		h₂: Death w/o Effusion		h₃: Death after Effusion

	HR (CI)	P	HR (CI)	P	HR (CI)	P
Asthma	—	—	—	—	—	—
Diabetes	.91 (.75–1.09)	.15	1.14 (1.03–1.27)	> .99	1.11 (.90–1.36)	.83
Stage 3–4	—	—	1.30 (1.17–1.45)	> .99	—	—
Location 2	1.16 (.95–1.42)	.93	—	—	—	—
Location 3	1.20 (.94–1.54)	.92	—	—	1.10 (.92–1.33)	.85
Age	.85 (.72–1.06)	.03	—	—	—	—
Smoker	.89 (.74–1.06)	.09	—	—	—	—
BMI	.91 (.76–1.09)	.16	.85 (.76–.95)	< .01	—	—
Induction Chemo	—	—	—	—	—	—
Good KPS	—	—	.83 (.75–.92)	< .01	—	—
Adeno Histology	.83 (.68–1.03)	.04	1.21 (1.08–1.37)	> .99	—	—
Propensity Score	1.07 (.89–1.30)	.75	.96 (.87–1.08)	.25	1.00 (.87–1.17)	.51
IMRT	.73 (.62–.87)	< .01	.80 (.72–.89)	< .01	1.08 (.90–1.30)	.81


4.4. Case Study Results: Pericardial Effusion

The two chains were run for 100,000 iterations, with the first 40,000 discarded as burn-in. For the β_g coefficients, the point estimates and upper confidence limits of the potential scale reduction factors were all below 1.01, which, together with the trace plots, indicates good convergence of both chains [21]. The posterior probabilities of inclusion did not differ substantially between the two chains; the largest difference was 2.80%. The correlations between the two chains' marginal posterior probabilities of inclusion were .994, .995 and .972 for the three hazards, respectively. The posterior samples of both chains were combined, and the resulting marginal posterior probabilities of inclusion are shown graphically in Figure 3 and numerically in Table 7.

Pericardial Effusion Results: Combined marginal posterior probabilities of inclusion and the DIC-τ_g cutoff thresholds $τ_{g}^{*}$

Table 7.

Sensitivity Analysis of prior hyperparameter effects on marginal PPI for the Pericardial Effusion case study. Notation: c is a parameter in the prior of β_{(η_g)}, given in (5) and (z_g₁, z_g₂) are the prior hyperparameters on w_g, the probability that η_g,k = 1, given on page 4.

Variable	c = 20	c = 30	c = 20	c = 30
(z_g₁, z_g₂) =	(.4, 1.6)	(.4, 1.6)	(.1, 1.9)	(.1, 1.9)
	h₁: Effusion before Death

Asthma	.556	.277	.338	.130
Diabetes	.615	.399	.418	.192
Stage 3-4	.538	.305	.357	.137
Location 2	.658	.410	.439	.199
Location 3	.624	.363	.410	.167
Age	.762	.551	.525	.276
Smoker	.639	.394	.435	.191
BMI	.631	.412	.436	.199
Induction Chemo	.508	.388	.329	.126
Good KPS	.521	.440	.347	.130
Adeno Histology	.731	.405	.508	.257
	h₂: Death w/o Effusion

Asthma	.622	.460	.568	.400
Diabetes	.839	.745	.820	.703
Stage 3-4	.993	.993	.990	.975
Location 2	.705	.562	.688	.538
Location 3	.630	.485	.579	.425
Age	.601	.462	.565	.413
Smoker	.630	.479	.577	.423
BMI	.847	.759	.831	.673
Induction Chemo	.686	.563	.638	.490
Good KPS	.946	.923	.929	.880
Adeno Histology	.898	.839	.867	.789
	h₃: Death after Effusion

Asthma	.413	.242	.222	.122
Diabetes	.488	.315	.271	.172
Stage 3-4	.385	.216	.204	.110
Location 2	.422	.259	.227	.135
Location 3	.457	.282	.251	.142
Age	.378	.215	.206	.109
Smoker	.397	.231	.218	.116
BMI	.412	.236	.218	.119
Induction Chemo	.388	.225	.212	.111
Good KPS	.440	.270	.244	.134
Adeno Histology	.405	.237	.218	.119


Variables above the thresholds in Figure 3 are included in the final model. The optimal model under the DIC-τ_g procedure had a DIC of 895.37, with τ^* = (.55, .65, .45). This model included 7, 6 and 2 variables in hazards 1, 2 and 3, respectively. After the optimal model was identified, the Gibbs sampler was rerun for 100,000 iterations, with the first 50,000 discarded as burn-in. Table 4 displays the fitted optimal model, with posterior hazard ratios exp[β_g,k], their 95% credible intervals, and P = P[β_g,k > 0|Data], the posterior probability that a larger value of the covariate is more hazardous, for the kth variable included in each hazard g. This model used the same hyperparameter settings as the pleural effusion model, namely c = 20, z_g₁ = .4 and z_g₂ = 1.6.
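The posterior summaries in Tables 3 and 4 can be computed from MCMC draws of each coefficient. In the sketch below, the hazard ratio point estimate is taken to be the exponentiated posterior median of β_g,k (an assumption; the tables do not say whether a median or a mean is reported), the credible limits are obtained by exponentiating the 2.5% and 97.5% quantiles of β_g,k, and P is the fraction of draws with β_g,k > 0. The function names are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def quantile(sorted_x: Sequence[float], p: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    pos = p * (len(sorted_x) - 1)
    i = int(math.floor(pos))
    j = min(i + 1, len(sorted_x) - 1)
    return sorted_x[i] + (pos - i) * (sorted_x[j] - sorted_x[i])


def summarize(beta_draws: Sequence[float]) -> Tuple[float, float, float, float]:
    """(HR, lower, upper, P) from posterior draws of one coefficient beta_{g,k}."""
    xs = sorted(beta_draws)
    hr = math.exp(median(xs))                 # exponentiated posterior median
    lo = math.exp(quantile(xs, 0.025))        # exp is monotone, so limits map through
    hi = math.exp(quantile(xs, 0.975))
    p_pos = sum(x > 0 for x in xs) / len(xs)  # P[beta_{g,k} > 0 | Data]
    return hr, lo, hi, p_pos
```

Applied to the retained draws of, for example, the IMRT coefficient in h₁, this returns the four quantities in one row of the table.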

Table 4 shows that patients who received IMRT had significantly decreased hazards of pericardial effusion and of death before pericardial effusion compared with those who received 3D-CRT (P < .01 for both hazards). All other 95% credible intervals for the hazard of pericardial effusion contain 1; thus, although the posterior probabilities for Age and Adeno histology suggest that these variables decrease the hazard of pericardial effusion (P = .03 and .04, respectively), the evidence for these effects is weaker. The final model for the hazard of death before pericardial effusion shows clearer trends: patients with Stage 3–4 cancer, diabetes, or adeno histology had significantly increased hazards of death without prior pericardial effusion (P > .99), with 95% credible intervals excluding 1. IMRT and higher BMI were each associated with a significantly reduced hazard of death before pericardial effusion (P < .01). No variable included in the hazard of death after pericardial effusion changed that hazard significantly, suggesting that patient baseline covariates and radiation therapy modality have little effect on survival after pericardial effusion.

4.5. Sensitivity Analyses

We assessed how sensitive the marginal posterior probabilities of inclusion, and the model selected by the DIC-τ_g procedure, were to the hyperparameters c, z_g₁ and z_g₂. Because we wanted to distinguish among variables, we did not consider values of c and (z_g₁, z_g₂) that force all variables into or out of the model. For c, we first identified a range of values that, for these data, neither included nor excluded all of the variables; varying c within this range did not have a large impact on the final models. We tested the sensitivity of our model to c using c = 20 and c = 30, each combined with two w_g hyperparameter settings: (z_g₁, z_g₂) = (.4, 1.6), as in the primary analysis, and (z_g₁, z_g₂) = (.1, 1.9), corresponding to an expectation of 1 or 2 variables included in each hazard. The marginal posterior probabilities of inclusion were sensitive to c and (z_g₁, z_g₂), which control how strongly they are shrunk, but the final model chosen by the DIC-τ_g procedure did not differ much. This suggests that final inferences about treatment and covariate effects are not overly sensitive to these choices. For sensitivity to the survival hyperparameters, we refer to the supplemental material of Lee et al. [9].

To assess sensitivity in the pleural effusion case study, Table 5 shows the marginal posterior probability of inclusion in each hazard function under each of the four settings of c and z_g₁. For every hazard function, increasing c had a greater shrinkage effect than decreasing z_g₁ for the pleural effusion data.

In general, the ordering of the variables' importance by marginal posterior probability of inclusion (PPI) does not change much. The main differences in ordering occur among variables whose marginal PPIs are low relative to the highest PPIs in each hazard; these variables tend to have marginal PPIs very close to one another, which makes different orderings likely by chance alone. Because different values of c and z_g₁ induce different degrees of sparsity, we also wanted to know whether the final model selected by the DIC-τ_g procedure was sensitive to c and z_g₁. Table 6 displays the final models selected by the DIC-τ_g procedure under each of the four settings. For the hazard of pleural effusion, the same two variables, Age and BMI, were chosen under all but one hyperparameter setting, which selected Age alone. In all four settings, the hazard of pleural effusion was decreased significantly for patients who received IMRT (P < .01) and increased for older patients (P ≥ .99). For the hazard of death before pleural effusion, Diabetes, Stage 3-4, Location 2, Location 3, BMI, good KPS and adeno histology were included under all four hyperparameter settings, and in all four settings this hazard was reduced significantly for patients who received IMRT (P ≤ .01). All four settings included Stage 3-4 in the final model for the hazard of death after pleural effusion, and in each model, stage 3-4 cancer significantly increased a patient's hazard of death after pleural effusion (P ≥ .99). In the models that included more than Stage 3-4 cancer in h₃, all other credible intervals contained 1, indicating that Stage 3-4 cancer is the key driver of this hazard.

Table 6.

Sensitivity Analysis of prior hyperparameter effects on the model selected in the Pleural Effusion case study. Notation: c is a dispersion parameter in the prior of β_{(η_g)}, given in (5) and (z_g₁, z_g₂) are the beta prior hyperparameters on w_g, the probability that η_g,k = 1 for k = 1, ..., p, given on page 4.

c	(z_g₁, z_g₂)	Included Variables
		h₁: Effusion before Death

20	(.4, 1.6)	Age, BMI
30	(.4, 1.6)	Age, BMI
20	(.1, 1.9)	Age, BMI
30	(.1, 1.9)	Age
		h₂: Death w/o Effusion

20	(.4, 1.6)	Diabetes, Stage 3-4, Location 2, Location 3, BMI, Good KPS, Adeno Histology
30	(.4, 1.6)	Diabetes, Stage 3-4, Location 2, Location 3, BMI, Good KPS, Adeno Histology
20	(.1, 1.9)	Diabetes, Stage 3-4, Location 2, Location 3, BMI, Good KPS, Adeno Histology
30	(.1, 1.9)	Diabetes, Stage 3-4, Location 2, Location 3, BMI, Good KPS, Adeno Histology
		h₃: Death after Effusion

20	(.4, 1.6)	Stage 3-4, Location 2, Location 3
30	(.4, 1.6)	Stage 3-4, Location 2, Location 3
20	(.1, 1.9)	Stage 3-4, Location 2, Location 3, Age, Good KPS
30	(.1, 1.9)	Stage 3-4


Next, we similarly assessed the sensitivity of the pericardial effusion analysis to the variable selection hyperparameters. For the first two hazard functions, increasing c caused more shrinkage than decreasing z_g₁, whereas for the third hazard function lowering z_g₁ caused more shrinkage. The four settings agreed well on the order of importance, based on marginal posterior probability of inclusion, for the second and third hazard functions. Table 7 shows that Stage 3-4, good KPS and Adeno histology had the three highest marginal posterior probabilities of inclusion (in that order) for the hazard of death before pericardial effusion in all four settings. Diabetes, Location 3 and a good KPS score had the three highest marginal posterior probabilities of inclusion for the hazard of death after pericardial effusion in all four settings. There was, however, more sensitivity to c and z_g₁ for the hazard of pericardial effusion. While all four settings identified Age as the variable with the highest marginal posterior probability of inclusion for hazard 1, three of the four settings identified Adeno Histology and Location 2 as the second and third most important variables based on marginal PPI. The setting that did not show this pattern, c = 30 with (z_g₁, z_g₂) = (.4, 1.6), had four variables with marginal PPIs between .40 and .44, including Adeno Histology and a good KPS score. This discrepancy in ordering could be due, in part, to these four marginal PPIs being close to each other. Table 8 shows the final models selected by the DIC-τ_g procedure under each of the four settings.

Table 8.

Sensitivity Analysis of prior hyperparameter effects on the model selected in the Pericardial Effusion case study. Notation: c is a dispersion parameter in the prior of β_{(η_g)}, given in (5) and (z_g₁, z_g₂) are the beta prior hyperparameters on w_g, the probability that η_g,k = 1 for k = 1, ..., p, given on page 4.

c	(z_g₁, z_g₂)	Included Variables
		h₁: Effusion before Death

20	(.4, 1.6)	Diabetes, Location 2, Location 3, Age, Smoking Status, BMI, Adeno Histology
30	(.4, 1.6)	Diabetes, Location 2, Location 3, Age, Smoking Status, BMI, Adeno Histology
20	(.1, 1.9)	Diabetes, Location 2, Location 3, Age, Smoking Status, BMI, Adeno Histology
30	(.1, 1.9)	Diabetes, Location 2, Location 3, Age, Smoking Status, BMI, Adeno Histology
		h₂: Death w/o Effusion

20	(.4, 1.6)	Diabetes, Stage 3-4, BMI, Good KPS, Adeno Histology
30	(.4, 1.6)	Diabetes, Stage 3-4, BMI, Good KPS, Adeno Histology
20	(.1, 1.9)	Diabetes, Stage 3-4, BMI, Good KPS, Adeno Histology
30	(.1, 1.9)	Diabetes, Stage 3-4, BMI, Good KPS, Adeno Histology
		h₃: Death after Effusion

20	(.4, 1.6)	Diabetes, Location 3
30	(.4, 1.6)	Diabetes
20	(.1, 1.9)	Diabetes, Location 3
30	(.1, 1.9)	Diabetes


The models chosen for the pericardial effusion data do not differ much for the four c and z_g₁ values considered. The hazard of pericardial effusion included Diabetes, Location 2, Location 3, Age, Smoking Status, BMI and Adeno histology in all four models considered. All four models included Diabetes, Stage 3-4, BMI, a good KPS score and Adeno Histology in the hazard of death before pericardial effusion. The variables selected for the hazard of death after pericardial effusion showed some sensitivity to c and z_g₁ choices, but all four models contained Diabetes as a covariate. Additionally, the conclusion that no covariates impact the hazard of death after pericardial effusion did not change for any of the four models.

5. Discussion

We have developed a variable selection procedure for a three-hazard model for semi-competing risks data using spike-and-slab priors and the Stochastic Variable Selection Search (SVSS) algorithm. We devised a criterion, DIC-τ_g, for choosing the threshold on the marginal posterior probability of variable inclusion based on the Deviance Information Criterion. As the sensitivity analyses show, the DIC-τ_g procedure led to similar selected models regardless of the choices of the hyperparameters c, z_g₁ and z_g₂. For the hazard of death before effusion of either type, the same set of variables was chosen under all hyperparameter choices in both data sets. For the hazard of pericardial effusion, the same set of variables was chosen under all four settings, and the remaining hazard functions showed only minor differences in the variables chosen. In contrast, c and (z_g₁, z_g₂) have a substantial effect on the marginal posterior probabilities of inclusion, and hence on inclusion decisions, when an arbitrary τ_g is used; the DIC-τ_g procedure mitigates much of this sensitivity.
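For reference, the sketch below computes DIC from posterior draws using the original definition of Spiegelhalter et al. (2002): DIC = D̄ + p_D with p_D = D̄ − D(θ̄), where D̄ is the posterior mean deviance and D(θ̄) is the deviance evaluated at the posterior mean. This section does not restate which DIC variant the DIC-τ_g procedure uses, so treat this form as an assumption; the function name `dic` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def dic(deviance_draws: Sequence[float], deviance_at_mean: float) -> float:
    """DIC = Dbar + pD, with pD = Dbar - D(theta_bar)."""
    d_bar = mean(deviance_draws)    # posterior mean deviance, -2 log L averaged over draws
    p_d = d_bar - deviance_at_mean  # effective number of parameters
    return d_bar + p_d
```

For example, with deviance draws 10, 12 and 14 and D(θ̄) = 11, we get D̄ = 12, p_D = 1 and DIC = 13.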

Our simulation study showed that the DIC-τ_g procedure performs well in determining which variables should be included in the model. Its performance may be improved further by tuning the sparsity parameter c to better separate the marginal posterior probabilities of inclusion of the variables.

In an application to data from esophageal cancer patients, we used this procedure to select the important covariates in each of the three hazards, and used the selected variables to evaluate treatment and covariate effects on the hazards of effusion, death before effusion, and death after effusion. Both selected models led to the same medical conclusion: there was strong evidence that patients who received IMRT had significantly reduced risks of pericardial effusion, pleural effusion, and death before either effusion type. Further evidence of this treatment effect was seen in the SVSS posteriors, with P[β_1,IMRT > 0|Data] = .02 and P[β_2,IMRT > 0|Data] = .04 for pleural effusion and P[β_1,IMRT > 0|Data] < .01 and P[β_2,IMRT > 0|Data] < .01 for pericardial effusion. In the models determined by the SVSS algorithm, patients who received IMRT had significantly reduced risks of pericardial and pleural effusion as well as of death before either effusion type. This agrees with results from several previous studies, which showed that IMRT increased patient survival compared with 3D-CRT [22] [23].

The proposed method provides a flexible, practical variable selection procedure for semi-competing risks. The SVSS algorithm for semi-competing risks is implemented in the R package SCRSELECT [1], available on CRAN. Its main function computes and saves all posterior quantities and returns the marginal posterior probabilities of inclusion for each hazard function. The code allows 0, 1, …, p − 2 variables to be excluded from the selection procedure (i.e., always kept in the model); in our application, we kept two variables out of the selection procedure to evaluate the treatment effect. The program takes between two and four hours to run 100,000 iterations, depending on the number of split points in each hazard during the MCMC. An additional program runs the SVSS algorithm on two dispersed chains and then applies the DIC-τ_g procedure, which performs a grid search over all possible models for the given vectors of marginal posterior probabilities of inclusion. This takes six to twelve hours, depending on how well separated the variables' marginal posterior probabilities of inclusion are, and returns these posterior probabilities, the inclusion thresholds and the final model selected by the DIC-τ_g procedure.
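The DIC-τ_g grid search described above can be sketched as follows. This is a hypothetical illustration, not the SCRSELECT interface: `dic_tau_grid` and the callback `fit_dic` (which would refit the selected model and return its DIC) are names we introduce, and the grid of candidate thresholds (.05, .10, …, 1.00) is an assumption. Each threshold vector maps to the model containing, in each hazard g, the variables whose marginal PPI is at least τ_g; identical models are fitted only once, and the threshold vector with the smallest DIC is returned.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Model = Tuple[FrozenSet[str], ...]
GRID = tuple(round(0.05 * i, 2) for i in range(1, 21))  # .05, .10, ..., 1.00 (assumed)


def dic_tau_grid(
    ppis: Sequence[Dict[str, float]],
    fit_dic: Callable[[Model], float],
    grid: Sequence[float] = GRID,
) -> Tuple[Tuple[float, ...], Model, float]:
    """Hypothetical DIC-tau_g search: one threshold per hazard, keep the lowest DIC."""
    cache: Dict[Model, float] = {}
    best: Optional[Tuple[Tuple[float, ...], Model, float]] = None
    for tau in product(grid, repeat=len(ppis)):
        # Model induced by tau: in hazard g, variables whose marginal PPI is >= tau_g.
        model = tuple(
            frozenset(v for v, p in ppi.items() if p >= t) for ppi, t in zip(ppis, tau)
        )
        if model not in cache:              # many tau vectors induce the same model
            cache[model] = fit_dic(model)   # one (expensive) refit per distinct model
        if best is None or cache[model] < best[2]:
            best = (tau, model, cache[model])
    if best is None:
        raise ValueError("empty threshold grid")
    return best
```

Caching matters because many threshold vectors induce the same model, and each DIC evaluation would require a full MCMC fit.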

Table 1.

Simulation Scenarios.

Scenario #	β₁ (h₁: Non-Terminal Event)	β₂ (h₂: Death w/o Non-Terminal Event)	β₃ (h₃: Death after Non-Terminal Event)
1	(.9,0,−.7,0,0,0,0,−.6,0,0,.5)	(−.5,0,0,.7,0,0,0,.7,−.7,.5,0)	(.6,0,0,−.5,0,−.6,0,−.8,0,0,0)
2	(.9,0,−.7,0,0,0,0,−.6,0,0,.5)	(.7,0,0,−.2,0,0,0,.3,0,−.45,0)	(−.9,0,0,0,0,0,0,.2,0,0,0)
3	(0,0,0,0,.9,0,0,0,0,−.7,0)	(0,0,0,0,0,0,0,0,0,0,−.6)	(.6,−.7,0,0,0,0,0,0,0,0,0)
4	(0,−.7,.6,0,0,0,.3,.5,0,−.9,.4)	(−.85,0,0,−.4,0,0,0,.3,0,.8,0)	(−.5,0,0,0,0,0,0,.6,0,0,0)
5	(−.7,0,0,−.5,0,.9,0,0,0,0,0)	(.4,0,0,−.6,0,0,0,0,−.7,0,.5)	(−.3,0,0,.8,0,0,0,0,0,0,0)
6	(.8,.7,.4,−.7,−.7,−.8,0,0,0,0)	(.8,−.7,.9,0,0,0,0,0,.8,−.4,−.7)	(.4,.8,−.7,.35,.8,−.9,.4,−.7,0,0,0)
7	(0,0,0,0,0,0,0,0,0,0,0)	(0,0,0,0,0,0,0,0,0,0,0)	(0,0,0,0,0,0,0,0,0,0,0)


Acknowledgments

The authors thank the two referees for their constructive comments on an earlier version of this paper. This study was performed at the M.D. Anderson Cancer Center in Houston, TX. Peter Thall’s research was supported by NIH R01 CA 83932. Andrew Chapple was partially supported by NIH grant 5T32-CA096520-07.

Appendix: MCMC algorithm details on (η_g, β_g) and λ_g updates

We report additional details of the (η_g, β_g) and λ_g Markov Chain Monte Carlo samplers here.

We update (η_g, β_g) via a Metropolis step. With probability ϕ we randomly choose one entry of η_g and flip it: if it is 0, we set it to 1, and if it is 1, we set it to 0. With probability 1 − ϕ, we randomly choose one entry that is 1 and one entry that is 0 and swap their values. If η_g contained no 1s or no 0s at the previous iteration, only an add or delete step is possible, and we randomly choose one entry to change. We then update β_g in one of three ways. If η_g,k is “deleted”, we set $β_{g, k}^{*} = 0$ and $η_{g, k}^{*} = 0$. If η_g,k is “added”, we sample $β_{g, k}^{*}$ from $β_{g, k} ∣ η_{g}^{*}, β_{g, (-k)}$ and set $η_{g, k}^{*} = 1$. Finally, if η_g,k = 0 and η_g,j = 1 are swapped, we set $η_{g, j}^{*} = 0$, $η_{g, k}^{*} = 1$ and $β_{g, j}^{*} = 0$, and sample $β_{g, k}^{*}$ from the distribution of $β_{g, k} ∣ η_{g}^{*}, β_{g, (-k)}$. The vectors $η_{g}^{*}$ and $β_{g}^{*}$ are jointly accepted over the previous values (η_g, β_g) with probabilities $α_{1}^{*} = \min (α_{1}, 1)$, $α_{2}^{*} = \min (α_{2}, 1)$ and $α_{3}^{*} = \min (α_{3}, 1)$ for the delete, add and swap moves, respectively.
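The proposal mechanism for η_g can be sketched as follows. The sketch assumes η_g is a 0/1 list over the candidate covariates only (the always-included propensity score and IMRT columns are not part of it), and `propose_eta` is a hypothetical name. It returns the proposed vector, the move type and the indices involved, so that the matching acceptance probability α₁, α₂ or α₃ can be applied.

```python
import random
from typing import List, Tuple


def propose_eta(eta: List[int], phi: float,
                rng: random.Random) -> Tuple[List[int], str, Tuple[int, ...]]:
    """Propose eta* by an add/delete move (prob. phi) or a swap move (prob. 1 - phi)."""
    new = list(eta)
    ones = [i for i, e in enumerate(eta) if e == 1]
    zeros = [i for i, e in enumerate(eta) if e == 0]
    # A swap needs at least one 1 and one 0; otherwise fall back to add/delete.
    if not ones or not zeros or rng.random() < phi:
        k = rng.randrange(len(eta))  # flip one randomly chosen indicator
        new[k] = 1 - new[k]
        return new, ("add" if new[k] == 1 else "delete"), (k,)
    j, k = rng.choice(ones), rng.choice(zeros)
    new[j], new[k] = 0, 1            # drop covariate j, include covariate k
    return new, "swap", (j, k)
```

An add or delete proposal changes exactly one indicator, while a swap changes two and leaves the model size unchanged.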

Delete Move

For a delete move, the acceptance probability is

$$\begin{aligned} α_{1} &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, π(β_{g,k}^{*} = 0 ∣ η_{g}^{*})\, π(η_{g}^{*})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, π(β_{g,k} ∣ η_{g}, β_{g,(-k)})\, π(η_{g})} \\ &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, \mathrm{Beta}(t_{g} - 1 + z_{g,1}, m_{g} - t_{g} + 1 + z_{g,2})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, N(β_{g,k} ∣ μ_{old}, σ_{old}^{2})\, \mathrm{Beta}(t_{g} + z_{g,1}, m_{g} - t_{g} + z_{g,2})}, \end{aligned}$$

where $μ_{old} = R_{k,(-k)} R_{(-k),(-k)}^{-1} β_{(η_{g}),(-k)}$ and $σ_{old}^{2} = R_{k,k} - R_{k,(-k)} R_{(-k),(-k)}^{-1} R_{(-k),k}$, with $R = c\,(X_{(η_{g})}^{t} X_{(η_{g})})^{-1}$, and Beta(·, ·) denotes the beta function.
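μ_old and σ²_old are the usual conditional-normal moments of one coordinate of a Gaussian vector given the others. The sketch below assumes, as the formulas imply, that the included coefficients have prior β_(η_g) ~ N(0, R) with R = c(X^t X)^{-1}; the function name `conditional_normal` is hypothetical, and the design matrix is assumed to be already restricted to the included covariates.

```python
import numpy as np


def conditional_normal(X_incl, c, beta_incl, k):
    """Conditional mean and variance of coefficient k given the others.

    Assumes beta_incl ~ N(0, R) with R = c (X_incl^t X_incl)^{-1}.
    X_incl: n x t design matrix restricted to the included covariates.
    beta_incl: length-t NumPy array (entry k is ignored).
    k: position of the coefficient within the included set.
    """
    R = c * np.linalg.inv(X_incl.T @ X_incl)
    rest = [i for i in range(R.shape[0]) if i != k]
    if not rest:                      # only one covariate is included
        return 0.0, float(R[k, k])
    R_k_rest = R[k, rest]
    R_rest_rest = R[np.ix_(rest, rest)]
    # w = R_(-k),(-k)^{-1} R_(-k),k, computed with a solve instead of an inverse
    w = np.linalg.solve(R_rest_rest, R_k_rest)
    mu = float(w @ beta_incl[rest])
    var = float(R[k, k] - w @ R_k_rest)
    return mu, var
```

Applying the same function to the design matrix restricted to $η_{g}^{*}$ gives μ_new and σ²_new for the add and swap moves.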

Add Move

For the add step we set $η_{g,k}^{*} = 1$, sample $β_{g,k}^{*}$ from $π(β_{g,k} ∣ β_{g,(-k)}, η_{g}^{*})$, and accept $(η_{g}^{*}, β_{g,k}^{*})$ with probability $α_{2}^{*} = \min(α_{2}, 1)$, where

$$\begin{aligned} α_{2} &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, π(β_{g,k}^{*} ∣ η_{g}^{*}, β_{g,(-k)})\, π(η_{g}^{*})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, π(β_{g,k} = 0 ∣ η_{g})\, π(η_{g})} \\ &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, N(β_{g,k}^{*} ∣ μ_{new}, σ_{new}^{2})\, \mathrm{Beta}(t_{g} + 1 + z_{g,1}, m_{g} - t_{g} - 1 + z_{g,2})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, \mathrm{Beta}(t_{g} + z_{g,1}, m_{g} - t_{g} + z_{g,2})}, \end{aligned}$$

and μ_new and σ²_new are the conditional moments defined for the swap move below.
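A sketch that transcribes α_1 and α_2 as written above, on the log scale to avoid overflow in the likelihood ratio. The log-likelihood values are assumed to be computed elsewhere, Beta(·, ·) is evaluated as the beta function through `math.lgamma`, t is the number of covariates included before the move, and all function names are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """log B(a, b), the log of the beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_alpha_delete(loglik_new, loglik_old, beta_k, mu_old, var_old, t, m, z1, z2):
    """log(alpha_1) for deleting covariate k; t is the current number included."""
    return (loglik_new - loglik_old
            + log_beta_fn(t - 1 + z1, m - t + 1 + z2)
            - log_normal_pdf(beta_k, mu_old, var_old)
            - log_beta_fn(t + z1, m - t + z2))


def log_alpha_add(loglik_new, loglik_old, beta_k_new, mu_new, var_new, t, m, z1, z2):
    """log(alpha_2) for adding covariate k; t is the current number included."""
    return (loglik_new - loglik_old
            + log_normal_pdf(beta_k_new, mu_new, var_new)
            + log_beta_fn(t + 1 + z1, m - t - 1 + z2)
            - log_beta_fn(t + z1, m - t + z2))
```

The beta-function ratios reduce to simple fractions: with a = t_g + z_{g,1} and b = m_g − t_g + z_{g,2}, B(a + 1, b − 1)/B(a, b) = a/(b − 1) for an add move and B(a − 1, b + 1)/B(a, b) = b/(a − 1) for a delete move.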

Swap Move

For the swap step, where entry j is set to 0 and $β_{g,k}^{*}$ is sampled for the newly included entry k, we jointly accept $(η_{g}^{*}, β_{g,j}^{*}, β_{g,k}^{*})$ with probability $α_{3}^{*} = \min(1, α_{3})$, where

$$\begin{aligned} α_{3} &= \frac{L(β_{g,j}^{*}, β_{g,k}^{*} ∣ β_{g,(-k,-j)}, \dots)\, π(β_{g,j}^{*}, β_{g,k}^{*} ∣ η_{g}^{*}, β_{g,(-k,-j)})\, π(η_{g}^{*})}{L(β_{g,j}, β_{g,k} ∣ β_{g,(-k,-j)}, \dots)\, π(β_{g,j}, β_{g,k} ∣ η_{g}, β_{g,(-k,-j)})\, π(η_{g})} \\ &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, π(β_{g,k}^{*} ∣ η_{g}^{*}, β_{g,(-k)})\, π(β_{g,j}^{*} = 0 ∣ η_{g}^{*})\, \mathrm{Beta}(t_{g} + z_{g,1}, m_{g} - t_{g} + z_{g,2})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, π(β_{g,j} ∣ η_{g}, β_{g,(-j)})\, π(β_{g,k} = 0 ∣ η_{g})\, \mathrm{Beta}(t_{g} + z_{g,1}, m_{g} - t_{g} + z_{g,2})} \\ &= \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}^{*}, \dots)\, N(β_{g,k}^{*} ∣ μ_{new}, σ_{new}^{2})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ β_{g}, \dots)\, N(β_{g,j} ∣ μ_{old}, σ_{old}^{2})}, \end{aligned}$$

where $μ_{new} = Σ_{k,(-k)} Σ_{(-k),(-k)}^{-1} β_{(η_{g}^{*}),(-k)}$ and $σ_{new}^{2} = Σ_{k,k} - Σ_{k,(-k)} Σ_{(-k),(-k)}^{-1} Σ_{(-k),k}$, with $Σ = c\,(X_{(η_{g}^{*})}^{t} X_{(η_{g}^{*})})^{-1}$. Likewise, $μ_{old} = R_{j,(-j)} R_{(-j),(-j)}^{-1} β_{(η_{g}),(-j)}$ and $σ_{old}^{2} = R_{j,j} - R_{j,(-j)} R_{(-j),(-j)}^{-1} R_{(-j),j}$, with $R = c\,(X_{(η_{g})}^{t} X_{(η_{g})})^{-1}$. In the last step, the point masses $π(β_{g,j}^{*} = 0 ∣ η_{g}^{*})$ and $π(β_{g,k} = 0 ∣ η_{g})$ equal 1, and the beta-function terms cancel because a swap leaves the number of included covariates t_g unchanged.
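The swap ratio α_3 and the final accept/reject decision can be sketched in the same log-scale style. `log_alpha_swap` and `metropolis_accept` are hypothetical names, and the log-likelihoods and conditional moments are assumed to be computed elsewhere.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_alpha_swap(loglik_new, loglik_old, beta_k_new, mu_new, var_new,
                   beta_j_old, mu_old, var_old):
    """log(alpha_3) for swapping j out and k in; the beta-function terms cancel."""
    return (loglik_new - loglik_old
            + log_normal_pdf(beta_k_new, mu_new, var_new)
            - log_normal_pdf(beta_j_old, mu_old, var_old))


def metropolis_accept(log_alpha, rng=random):
    """Return True with probability min(1, exp(log_alpha))."""
    return rng.random() < math.exp(min(0.0, log_alpha))
```

Comparing a uniform draw with exp(min(0, log α)) keeps the decision well defined even when log α is −∞.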

After jointly accepting or rejecting $(η_{g}^{*}, β_{g}^{*})$, we resample each entry of β_g | η_g conditional on the other nonzero entries and then sample β_g | η_g jointly, to improve the mixing of the Markov chain.

λ_g update

We used the same samplers as Lee et al. [9] for all parameters except for the birth and death of a split point in hazard g = 1, 2, 3, and we also used a different proposal distribution for $λ_{g} ∣ s_{g}, μ_{λ_{g}}, σ_{λ_{g}}^{2}, Σ_{λ_{g}}$. Recall the MVN-ICAR prior for λ_g employed by Lee et al. [9], which is built from two matrices based on the distances between adjacent split points [12]. Formally, define ${\bar{Δ}}_{j}^{g} = s_{g,j} - s_{g,j-1}$ and let W^g be a tridiagonal matrix with zero diagonal and off-diagonal entries
$W_{j(j-1)}^{g} = \frac{c_{λ_{g}} ({\bar{Δ}}_{j-1}^{g} + {\bar{Δ}}_{j}^{g})}{{\bar{Δ}}_{j-1}^{g} + 2 {\bar{Δ}}_{j}^{g} + {\bar{Δ}}_{j+1}^{g}}$ and $W_{j(j+1)}^{g} = \frac{c_{λ_{g}} ({\bar{Δ}}_{j+1}^{g} + {\bar{Δ}}_{j}^{g})}{{\bar{Δ}}_{j-1}^{g} + 2 {\bar{Δ}}_{j}^{g} + {\bar{Δ}}_{j+1}^{g}}$,

where c_{λ_g} ∈ [0, 1] characterizes the dependence between the heights of adjacent split-point intervals. In our computation, we set c_{λ_g} = 1 to encourage spatial dependence between adjacent intervals. Let Q^g be a diagonal matrix with entries $Q_{j}^{g} = 2 / ({\bar{Δ}}_{j-1}^{g} + 2 {\bar{Δ}}_{j}^{g} + {\bar{Δ}}_{j+1}^{g})$. Then $Σ_{λ_{g}} = (I - W^{g})^{-1} Q^{g}$, and the MVN-ICAR prior is
$λ_{g} ∣ J_{g}, μ_{λ_{g}}, σ_{λ_{g}}^{2} \sim N_{J_{g}+1}(μ_{λ_{g}} \underline{1}, σ_{λ_{g}}^{2} Σ_{λ_{g}})$.
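The construction of W^g, Q^g and Σ_{λ_g} can be sketched with NumPy as below. The function name is hypothetical, and the treatment of the first and last intervals is an assumption: the widths ${\bar{Δ}}_{0}^{g}$ and ${\bar{Δ}}_{J_{g}+2}^{g}$, which the formulas reference but which do not exist, are set to 0.

```python
import numpy as np


def icar_covariance(split_points, c_lam=1.0):
    """Return (W, Q, Sigma) with Sigma = (I - W)^{-1} Q for the MVN-ICAR prior.

    split_points: s_0 < s_1 < ... < s_{J+1}, giving J + 1 intervals.
    Assumption: the nonexistent boundary widths Delta_0 and Delta_{J+2} are 0.
    """
    s = np.asarray(split_points, dtype=float)
    delta = np.diff(s)                               # Delta_1, ..., Delta_{J+1}
    n = delta.size                                   # J + 1 intervals
    padded = np.concatenate(([0.0], delta, [0.0]))  # assumed zero boundary widths
    W = np.zeros((n, n))
    Q = np.zeros((n, n))
    for j in range(n):
        d_prev, d_cur, d_next = padded[j], padded[j + 1], padded[j + 2]
        denom = d_prev + 2.0 * d_cur + d_next
        if j > 0:
            W[j, j - 1] = c_lam * (d_prev + d_cur) / denom
        if j < n - 1:
            W[j, j + 1] = c_lam * (d_next + d_cur) / denom
        Q[j, j] = 2.0 / denom
    Sigma = np.linalg.solve(np.eye(n) - W, Q)        # (I - W)^{-1} Q
    return W, Q, Sigma
```

With this boundary choice the implied precision matrix (Q^g)^{-1}(I − W^g) is symmetric, because $W_{j(j-1)}^{g} / Q_{j}^{g} = c_{λ_{g}}({\bar{Δ}}_{j-1}^{g} + {\bar{Δ}}_{j}^{g})/2 = W_{(j-1)j}^{g} / Q_{j-1}^{g}$.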

Our sampler differs from that of Lee et al. [9] in that we draw the proposal $λ_{g,k}^{*}$ from a uniform distribution on $[λ_{g,k} - c_{g}, λ_{g,k} + c_{g}]$ for k = 1, ..., J_g + 1, where c_g is a tuning parameter. We set c_g = .25, as in Haneuse et al. [17]. Because this proposal is symmetric, the proposal ratio equals 1, and the move is accepted with probability min(1, α), where
$α = \frac{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ λ_{g}^{*}, \dots)\, N_{J_{g}+1}(λ_{g}^{*} ∣ μ_{λ_{g}} \underline{1}, σ_{λ_{g}}^{2} Σ_{λ_{g}})}{L(Y_{1}, Y_{2}, δ_{1}, δ_{2}, X ∣ λ_{g}, \dots)\, N_{J_{g}+1}(λ_{g} ∣ μ_{λ_{g}} \underline{1}, σ_{λ_{g}}^{2} Σ_{λ_{g}})}$.
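A sketch of this λ_g update, assuming the components are updated one at a time and that a user-supplied function `loglik_fn` (hypothetical) returns the log-likelihood as a function of λ_g with all other parameters held fixed. Because the uniform proposal is symmetric, no proposal-density term appears in log α; the function names are illustrative only.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at x via a Cholesky factorisation."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + z @ z) - np.log(np.diag(L)).sum()


def update_lambda(lam, loglik_fn, mu_lam, sig2_lam, Sigma_lam, c_g=0.25, rng=None):
    """One sweep of component-wise uniform random-walk Metropolis updates."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.array(lam, dtype=float)          # work on a copy
    mean = np.full(lam.size, mu_lam)          # mu_lambda times the vector of ones
    cov = sig2_lam * Sigma_lam
    cur_ll, cur_lp = loglik_fn(lam), mvn_logpdf(lam, mean, cov)
    for k in range(lam.size):
        prop = lam.copy()
        prop[k] = rng.uniform(lam[k] - c_g, lam[k] + c_g)
        prop_ll, prop_lp = loglik_fn(prop), mvn_logpdf(prop, mean, cov)
        log_alpha = prop_ll - cur_ll + prop_lp - cur_lp   # proposal ratio is 1
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            lam, cur_ll, cur_lp = prop, prop_ll, prop_lp
    return lam
```

Updating one component at a time keeps each proposal local, so c_g controls the step size per interval height; a joint proposal over all J_g + 1 heights would also yield a proposal ratio of 1.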

Footnotes

A computer program implementing the methodology, named SCRSELECT, is available on CRAN [1].

References

1. Chapple A. Package: SCRSELECT. CRAN.
2. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. International Journal of Cancer. 2010;127(12):2893–2917. doi: 10.1002/ijc.25516.
3. Torre L, Bray F, Siegel R, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA: A Cancer Journal for Clinicians. 2015;65(2):87–108. doi: 10.3322/caac.21262.
4. Ishikura S, Nihei K, Ohtsu A, Boku N, Hironaka S, Mera K, Muto M, Ogino T, Yoshida S. Long-term toxicity after definitive chemoradiotherapy for squamous cell carcinoma of the thoracic esophagus. Journal of Clinical Oncology. 2003;21(14):2697–2702. doi: 10.1200/JCO.2003.03.055.
5. Wei X, Liu H, Tucker S, Wang S, Mohan R, Cox J, Komaki R, Liao Z. Risk factors for pericardial effusion in inoperable esophageal cancer patients treated with definitive chemoradiation therapy. International Journal of Radiation Oncology, Biology, Physics. 2009;70(3):707–714. doi: 10.1016/j.ijrobp.2007.10.056.
6. Fenkell L, Kaminsky I, Breen S, Huang S, Prooijen MV, Ringash J. Dosimetric comparison of IMRT vs. 3D conformal radiotherapy in the treatment of cancer of the cervical esophagus. Radiotherapy and Oncology. 2008;89(3):287–291. doi: 10.1016/j.radonc.2008.08.008.
7. Chandra A, Guerrero T, Liu H, Tucker S, Liao Z, Wang X, Murshed H, Bonnen M, Stevens C, Chang J, Jeter M, Mohan R, Cox J, Komaki R. Feasibility of using intensity-modulated radiotherapy to improve lung sparing in treatment planning for distal esophageal cancer. Radiotherapy and Oncology. 2005;77(3):247–253. doi: 10.1016/j.radonc.2005.10.017.
8. Shirai K, Tamaki Y, Kitamoto Y, Murata K, Satoh Y, Higuchi K, Nonaka T, Ishikawa H, Katoh H, Takahashi T, Nakano T. Dose-volume histogram parameters and clinical factors associated with pleural effusion after chemoradiotherapy in esophageal cancer patients. International Journal of Radiation Oncology, Biology, Physics. 2011;80(4):1002–1007. doi: 10.1016/j.ijrobp.2010.03.046.
9. Lee K, Haneuse S, Schrag D, Dominici F. Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2015;64(2):253–273. doi: 10.1111/rssc.12078.
10. Lee K, Haneuse S. Package: SemiCompRisks. CRAN.
11. He L, Chapple A, Liao Z, Komaki R, Thall P, Lin S. Bayesian regression analyses of radiation modality effects on pericardial and pleural effusion and survival in esophageal cancer. Radiotherapy and Oncology. doi: 10.1016/j.radonc.2016.08.005.
12. Besag J, Kooperberg C. On conditional and intrinsic autoregression. Biometrika. 1995;82(4):733–746.
13. George E, McCulloch R. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993;88(423):881–889.
14. George E, McCulloch R. Approaches for Bayesian variable selection. Statistica Sinica. 1997:339–373.
15. Zellner A. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel P, Zellner A, editors. Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Elsevier Science Ltd; 1986. pp. 233–243.
16. Brown P, Vannucci M, Fearn T. Bayesian wavelength selection in multicomponent analysis. Journal of Chemometrics. 1998;12(3):173–182. doi: 10.1002/(SICI)1099-128X(199805/06)12:3<173::AID-CEM505>3.0.CO;2-0.
17. Haneuse S, Rudser K, Gillen D. Bayesian survival modeling of the time-varying effect of a time-dependent exposure. Biostatistics. 2008;9(3):400–410. doi: 10.1093/biostatistics/kxm038.
18. Green P. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82(4):711–732.
19. Spiegelhalter D, Best N, Carlin B, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(4):583–639.
20. Smith M, Kohn R. Nonparametric regression via Bayesian variable selection. Journal of Econometrics. 1996;75(2):317–343. doi: 10.1016/0304-4076(95)01763-1.
21. Gelman A, Rubin D. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457–511.
22. Lin S, Wang L, Myles B, Thall P, Hofstetter W, Swisher S, Ajani J, Cox J, Komaki R, Liao Z. Propensity score based comparison of long term outcomes with 3D conformal radiotherapy (3DCRT) versus intensity modulated radiation therapy (IMRT) in the treatment of esophageal cancer. International Journal of Radiation Oncology, Biology, Physics. 2012;84(5):1078–1085. doi: 10.1016/j.ijrobp.2012.02.015.
23. Lin S, Zhang N, Godby J, Wang J, Marsh G, Liao Z, Komaki R, Ho L, Hofstetter W, Swisher S, Mehran R, Buchholz T, Elting L, Giordano S. Radiation modality use and cardiopulmonary mortality risk in elderly patients with esophageal cancer. Cancer. 2016;122(6):917–928. doi: 10.1002/cncr.29857.

