BAYESIAN INFERENCE OF HIDDEN GAMMA WEAR PROCESS MODEL FOR SURVIVAL DATA WITH TIES

Arijit Sinha; Zhiyi Chi; Ming-Hui Chen

doi:10.5705/ss.2012.351

Author manuscript; available in PMC: 2016 Oct 1.

Published in final edited form as: Stat Sin. 2015 Oct;25(4):1613–1635. doi: 10.5705/ss.2012.351

PMCID: PMC4643298 NIHMSID: NIHMS723313 PMID: 26576105

Abstract

Survival data often contain tied event times. Inference without careful treatment of the ties can lead to biased estimates. This paper develops the Bayesian analysis of a stochastic wear process model to fit survival data that might have a large number of ties. Under a general wear process model, we derive the likelihood of parameters. When the wear process is a Gamma process, the likelihood has a semi-closed form that allows posterior sampling to be carried out for the parameters, hence achieving model selection using Bayesian deviance information criterion. An innovative simulation algorithm via direct forward sampling and Gibbs sampling is developed to sample event times that may have ties in the presence of arbitrary covariates; this provides a tool to assess the precision of inference. An extensive simulation study is reported and a data set is used to further illustrate the proposed methodology.

Keywords and phrases: Direct forward sampling, Gibbs sampling, jump process, latent variables, proportional hazards model, tied event times

1. Introduction

Tied event times are a common phenomenon in time-to-event studies. For events that only happen at specific points in time, ties occur naturally, and for events that can happen at any point in time, ties may arise when a coarse time scale is used to record data (cf., Rossi, Berk, and Lenihan (1980)). Even when continuous event times are recorded at a fine time scale, ties can occur. For example, machines in a workshop can stop working instantaneously because of a power outage, an abrupt worsening of air quality can cause multiple emergency calls in a short time period, and in a sudden natural or man-made disaster casualties tend to happen at the same time (cf., Gold et al. (2007)). It can be more useful to model observed ties as the outcome of a certain mechanism underpinning the events than either to treat them as artifacts or to ignore them altogether.

Under the Proportional Hazards (PH) model of Cox (1972, 1975), the survival function of a subject with time-invariant covariates is expressed as

P (T > t ∣ x) = exp {- H (t) exp (x^{'} β)},

(1.1)

where T is the failure time, x is a vector of covariates, H is a completely unspecified baseline cumulative hazard function, and β is a vector of regression coefficients. In the non-Bayesian setting, inference from the model uses the partial likelihood of β, which implicitly assumes that the baseline hazard rate H′(t) exists, hence ruling out ties between independent failure times. In practice, in the presence of ties, an approximation can be made by applying the formula for the no-tie case or by discretizing time (Cox (1972); Peto (1972); Breslow (1974); Efron (1977)).

Modeling H as a stochastic process provides a powerful way to handle ties. In reliability analysis, Gaver (1963) took H to be a process with independent increments. Reynolds and Savage (1971) studied Gaver’s model in detail and obtained a likelihood function of its parameters and, for the case of a Gamma process, several closed form results. However, as Gaver’s model sets β = 0 in (1.1), its primary concern is different from the PH model. Within the Bayesian setting, Dirichlet, Gamma, Beta, beta-Stacy and, more generally, neutral-to-the-right processes have been introduced as priors on H or its transformations (Ferguson (1973); Doksum (1974); Kalbfleisch (1978); Ferguson and Phadia (1979); Hjort (1990); Walker and Muliere (1997); Epifani, Lijoi, and Prünster (2003)), and inference on β can be based on maximum likelihood estimation or Bayesian posterior analysis (Kalbfleisch (1978); Hjort (1990); Damien, Laud, and Smith (1996); Laud, Damien, and Smith (1998); Kim and Lee (2003); Lee and Kim (2004)). Under these priors, tied event times occur with positive probability. On the other hand, a standard Bayesian approach that rules out ties imposes priors on the baseline hazard rate function H′, which leads to continuous H (Antelman and Savage (1965); Reynolds and Savage (1971); Dykstra and Laud (1981); Lo and Weng (1989); Clayton (1991); Ibrahim, Chen, and Sinha (2001); Nieto-Barajas and Walker (2004); James (2005, 2006); Lijoi, Prünster, and Walker (2008b); Peccati and Prünster (2008); Kim and Kim (2009); Kim, Park, and Kim (2011)). In data analysis, tie-breaking has been used to artificially transform tied event times into distinct ones (Kalbfleisch (1978); Chen, Ibrahim, and Shao (2006)). However, the resulting estimate can be seriously biased if the proportion of ties is large (Burridge (1981)).

Following Gaver (1963), we regard H as a hidden stochastic wear process underlying the failures, rather than a parameter with a certain prior distribution. Based on the joint likelihood of β and the parameters of the process H, we establish Bayesian inference and model selection to analyze survival data with ties. The joint likelihood is obtained under a general multivariate process model that associates with each subject a possibly different H. This model includes the PH model as a limiting case, along with others such as the Lévy copula model (Epifani and Lijoi (2010)). We obtain the likelihood of the parameters using an argument similar in spirit to those for several special cases (Reynolds and Savage (1971); Lijoi, Prünster, and Walker (2008a); Epifani and Lijoi (2010)). For homogeneous Gamma wear processes, we derive the likelihood in a semi-closed form. By imposing suitable noninformative priors on β and the parameters of the Gamma process, we can sample from their joint posterior distribution efficiently using Gibbs sampling. While similar methods have been used for posterior sampling of β (Damien, Laud, and Smith (1996); Laud, Damien, and Smith (1998)), the joint sampling appears to be new. With this in place, we propose to use the Bayesian deviance information criterion (DIC) (Spiegelhalter et al. (2002)) to guide the selection of Gamma process models.

We develop a Gibbs sampling-based simulation algorithm, termed the Direct Forward Sampling (DFS), to sample multiple failure times allowing for ties from a homogeneous Gamma process H in the presence of arbitrary values of β and covariates. The sampling is clearly different from posterior sampling, which has failure times already observed, and it does not rely on truncating the Lévy measure or sampling the path of H at pre-selected time points (Damien, Laud, and Smith (1996); Laud, Damien, and Smith (1998); Lee and Kim (2004)). Except for the approximation error of the Gibbs sampling of H just before and at failure times, as in posterior Gibbs sampling (Laud, Damien, and Smith (1998)), our sampling method is precise. In fact, by replacing Gibbs sampling with rejection sampling, exact sampling can be achieved (Chi (2012)).

The rest of the paper is organized as follows. Section 2 sets up notation. In Section 3 we propose a general multivariate additive process model, derive the likelihood function for the model, and apply it to Gamma wear processes. In Section 4 we describe the DFS algorithm. Section 5 details a Gibbs sampling algorithm for posterior computation. In Section 6 we report on an extensive simulation study to examine the empirical properties of the Gamma wear process model, using DIC to guide the choice of parameters. In Section 7 we analyze a prostate cancer data set with our methodology. Section 8 ends with a discussion and potential future research work on this topic. Proofs and a discussion of possible extensions to processes other than Gamma processes are given in the Supplementary Material.

2. Basic Setup

Suppose n subjects are observed in a time-to-event study. Denote by T_i, C_i, and Y_i = min(T_i, C_i) the random failure time, right-censoring time, and endpoint of the i^th subject, respectively. We use the corresponding lower-case letters to denote the actual values of the random variables. Thus y_i = min(t_i, c_i) is the observed endpoint for the i^th subject, with t_i observable if and only if t_i ≤ c_i. Let δ_i = I {y_i = t_i} = I {t_i ≤ c_i}. The observed data are D_obs = {y_i, δ_i, x_i; i ≤ n}, where x_i is the vector of covariates of the i^th subject. Let $\mathcal{D}$ = {i : δ_i = 1} and $\mathcal{N}$ = {i : δ_i = 0}, so $\mathcal{D}$ consists of those subjects that fail before censoring, and $\mathcal{N}$ of those that are censored. Denote by 0 < τ₁ < τ₂ <···< τ_N the distinct values of y₁, …, y_n and τ₀ = 0. For j ≤ N, let

\begin{array}{l} D_{j} = \{i \in \mathcal{D} : y_{i} = τ_{j}\}, N_{j} = \{i \in \mathcal{N} : y_{i} = τ_{j}\}, \\ R_{j} = \underset{i \geq j}{\cup} (D_{i} \cup N_{i}), R_{j}^{'} = R_{j} \setminus D_{j}, \end{array}

(2.1)

so D_j consists of subjects that fail at time τ_j, N_j those censored at τ_j, R_j those at risk in time interval (0, τ_j), and $R_{j}^{'}$ those at risk in time interval (0, τ_j]. Let n_T = |{j : D_j ≠ ∅}| be the number of endpoints where failures occur. For A ⊂ {1, …, n}, take κ_A = (a₁, …, a_n) with a_i = I {i ∈ A}. For brevity, write $κ_{i} = κ_{\{i\}}$ . Let

ϱ_{j} = κ_{R_{j}}, ω_{j} = κ_{R_{j}^{'}}, j = 1, \dots, N

(2.2)

and ϱ_N₊₁ = ω_N₊₁ = (0, 0, …, 0). All analyses are conditional on C₁, …, C_n.

A stochastic process W = (W(t) : t ≥ 0) is said to be additive if it has independent increments, is stochastically continuous, and with probability 1, the function t → W(t) is right-continuous in t ≥ 0 with W(0) = 0 and has left limits in t > 0. In this paper, W = (H₁, …, H_n) is an additive process taking values in $ℝ_{+}^{n}$ with ℝ₊ := [0, ∞), and we refer to W as a pure jump process. It is well known that each H_i in W is nondecreasing and, for $a \in ℝ_{+}^{n}$ ,

E [e^{- a^{'} W (t)}] = e^{- Ψ (a, t)} with Ψ (a, t) = \int_{0}^{t} d v \int (1 - e^{- a^{'} s}) φ (d s ∣ v),

(2.3)

where given t > 0, φ(ds |t) is a Lévy measure on $ℝ_{+}^{n}$ with ∫ min(1, |s|) φ(ds |t) < ∞ (Sato (1999)). We refer to Ψ as the characteristic exponent of W. By Ferguson and Phadia (1979), W is homogeneous if Ψ(a, t) = Ψ₁(a) Ψ₂(t).

Denote by U(0, 1) the uniform distribution on (0, 1), Gamma(a, b) the distribution with density $I \{x > 0\} b^{- a} x^{a - 1} e^{- x / b} / Γ (a)$ (so that b is a scale parameter), Exp(c) = Gamma(1, c), and δ the unit mass concentrated at 0. If F is a nondecreasing function on ℝ₊, then let F^*(z) = inf{t > 0 : F(t) ≥ z} with the convention inf ∅ = ∞.

3. Joint Likelihood for Wear Process Model

3.1. N-variate wear process model

Assume that each of n subjects is exposed to a type of environmental fluctuation characterized by a nondecreasing stochastic process H_i with H_i(0) = 0 and H_i(∞) = ∞, such that W = (H₁, …, H_n) is a pure jump process and, conditional on W, the failure times T₁, …, T_n of the subjects are independent, with

P (T_{i} > t ∣ W) = e^{- H_{i} (t)}, i \leq n .

(3.1)

We assume W is unobservable. We also assume the right-censoring times C₁, …, C_n are independent of W and T₁, …, T_n. The process W is referred to as a (cumulative) wear process (Gaver (1963); Reynolds and Savage (1971)).

Example 1 (PH model)

In a Bayesian analysis of the PH model, typically there is a univariate pure jump process H such that, conditional on H, T₁, …, T_n are independent with $P (T_{i} > t ∣ H) = e^{- γ_{i} H (t)}$ , where γ_i is a constant that may incorporate covariates of the i^th subject. Here, H is often referred to as the baseline cumulative hazard function. To account for possible changes over time of the covariates, one might take $P (T_{i} > t ∣ H) = exp {- \int_{0}^{t} γ_{i} (v) d H (v)}$ , where γ_i ≥ 0 is a bounded nonrandom function such that $\int_{0}^{\infty} γ_{i} (v) d H (v) = \infty$ with probability 1. Let $H_{i} (t) = \int_{0}^{t} γ_{i} (v) d H (v)$ and W = (H₁, …, H_n). Since W is H-measurable,

\begin{array}{l} P (T_{1} > t_{1}, \dots, T_{n} > t_{n} ∣ W) = E [P (T_{1} > t_{1}, \dots, T_{n} > t_{n} ∣ H) ∣ W] \\ = E [\prod_{i = 1}^{n} e^{- H_{i} (t_{i})} | W] = \prod_{i = 1}^{n} e^{- H_{i} (t_{i})} . \end{array}

The PH model can thus be formulated as an n-variate model with W the wear process. Let φ₀(dx | t) be the Lévy measure of H. For $a \in ℝ_{+}^{n}$ and t > 0, since $E [e^{- a^{'} W (t)}] = E [e^{- \int_{0}^{t} λ (v) d H (v)}] = exp {- \int_{0}^{t} d v \int_{0}^{\infty} [1 - e^{- λ (v) x}] φ_{0} (d x ∣ v)}$ , with λ(t) = a₁γ₁(t) + ··· + a_n γ_n(t), the characteristic exponent of W is

Ψ (a, t) = \int_{0}^{t} d v \int_{0}^{\infty} [1 - e^{- a_{1} γ_{1} (v) x - \dots - a_{n} γ_{n} (v) x}] φ_{0} (d x ∣ v) .

Consequently, the Lévy measure φ(ds |t) of W, where s = (s₁, …, s_n), is as follows. Given t > 0, if all γ_i(t) = 0, then φ(ds | t) = 0. On the other hand, if γ_i(t) > 0 for some i, then

φ (d s ∣ t) = φ_{0} (\frac{d s_{i}}{γ_{i} (t)} ∣ t) \prod_{j \neq i} δ (d s_{j} - \frac{γ_{j} (t) s_{i}}{γ_{i} (t)}) .

(3.2)

Clearly, φ is determined by both φ₀ and the γ_i. In Bayesian analysis, often only the parameters in γ_i are estimated, while the parameters of φ₀ are regarded as hyperparameters. However, under the n-variate model, this distinction between φ₀ and γ_i disappears, as both become parameters of the wear process W.

Example 2 (Lévy copula)

A Lévy copula survival model was studied by Epifani and Lijoi (2010), in which the subjects are divided into two nonempty groups and a bivariate pure jump process Z = (Z₁, Z₂) is used as the wear process such that, conditional on Z, the failure times are independent, and for each i = 1, …, n and j = 1, 2, if the i^th subject is in the j^th group, then $P (T_{i} > t ∣ Z) = e^{- Z_{j} (t)}$ . By letting H_i = Z_j, the model becomes an n-variate model. Suppose subject 1 belongs to group 1, subject 2 belongs to group 2 and, for each i > 2, j_i is the index of the group subject i belongs to. Then the characteristic exponent and Lévy measure of W = (H₁, …, H_n) are

\begin{array}{rl} Ψ (a, t) & = \int_{0}^{t} d v \int (1 - e^{- a_{1} s_{1} - a_{2} s_{2} - \sum_{i > 2} a_{i} s_{j_{i}}}) φ_{0} (d s_{1}, d s_{2} ∣ v), \\ φ (d s ∣ t) & = φ_{0} (d s_{1}, d s_{2} ∣ t) \prod_{i > 2} δ (d s_{i} - s_{j_{i}}), s = (s_{1}, \dots, s_{n}), \end{array}

respectively, where φ₀ is the Lévy measure of Z.

Example 3 (Independent failure times)

In the above examples, the H_i are dependent processes, making T_i dependent random variables. If the H_i are independent, then the T_i are independent. If the Lévy measure of each H_i is φ_i(dx | t), then the characteristic exponent and Lévy measure of W are

Ψ (a, t) = \sum_{i = 1}^{n} Ψ_{i} (a_{i}, t), φ (d s ∣ t) = \sum_{i = 1}^{n} φ_{i} (d s_{i} ∣ t) \prod_{j \neq i} δ (d s_{j}),

respectively, where Ψ_i(a_i, t) is the characteristic exponent of H_i.

Under the n-variate model, when the H_i are dependent, the probability of ties among T_i is positive. From (3.1),

P (T_{i} > t) = exp {- \int_{0}^{t} f_{i} (v) d v}, with f_{i} (v) = \int (1 - e^{- s_{i}}) φ (d s ∣ v) \geq 0,

so each T_i has a probability density. Since independent random variables with densities tie with probability 0, the positive probability of ties implies that the T_i are dependent. It is worth noting, however, that the T_i are pairwise locally independent as defined by Oakes (1989). Let X, Y > 0 be random variables. For t = (t₁, t₂), let S(t) = P (X > t₁, Y > t₂) and D_α = ∂/∂t_α. Then $θ_{X Y}^{*} (t) = S (t) D_{1} D_{2} S (t) / [D_{1} S (t) \times D_{2} S (t)]$ is the ratio of the conditional hazard rate of X at t₁ given Y = t₂, to that of X at t₁ given Y > t₂. X and Y are called locally independent if $θ_{X Y}^{*} (t) \equiv 1$ .

Proposition 1

T₁, …, T_n are pairwise locally independent.

3.2. Likelihood function

The Lévy measure φ can be regarded as the only parameter of W.

Theorem 1

The likelihood function of φ based on D_obs is

L (φ ∣ D_{obs}) = \prod_{j = 1}^{N} e^{- Ψ (ϱ_{j}, τ_{j}) + Ψ (ϱ_{j}, τ_{j - 1})} \times \prod_{D_{j} \neq \emptyset} \int e^{- ω_{j}^{'} s} \prod_{i \in D_{j}} (1 - e^{- s_{i}}) φ (d s ∣ τ_{j}) .

(3.3)

Kalbfleisch (1978) observed that in the setting of Example 1, if H is a Gamma process, then depending on its variability a spectrum of likelihoods can be obtained. To characterize this in general, write φ(ds|t) = cν(cds|t), with c > 0, where ν(ds|t) is a Lévy measure with support in $ℝ_{+}^{n}$ . Suppose for all i = 1, …, n and t > 0,

m_{i} (t) : = \int_{0}^{t} d v \int s_{i} ν (d s ∣ v) < \infty,

(3.4)

and σ_ii(t) < ∞, where $σ_{i j} (t) = \int_{0}^{t} d v \int s_{i} s_{j} ν (d s ∣ v)$ , j = 1, …, n. Let m(t) = (m₁(t), …, m_n(t)) and Σ(t) = (σ_ij(t)). Then

\begin{array}{rl} E [W (t)] & = \int_{0}^{t} d v \int s φ (d s ∣ v) = \int_{0}^{t} d v \int c s ν (c d s ∣ v) = m (t), \\ Var [W (t)] & = \int_{0}^{t} d v \int s s^{'} φ (d s ∣ v) = \int_{0}^{t} d v \int c s s^{'} ν (c d s ∣ v) = c^{- 1} Σ (t) . \end{array}

Here, c is called a precision parameter; the larger c is, the less variable W is. In our analysis, c is fixed at a value determined via model selection, so it is not among the parameters to be estimated; see Sections 6–7 for more detail. Whenever c is involved, we write the likelihood as L(ν|c, D_obs).

Proposition 2

If we fit D_obs to the model W, with Lévy measure cν(cds|t) satisfying (3.4), then, as c → ∞,

L (ν ∣ c, D_{obs}) \to I \{\text{all } ∣ D_{j} ∣ = 0 \text{ or } 1\} \times \prod_{i = 1}^{n} e^{- m_{i} (y_{i})} {[m_{i}^{'} (y_{i})]}^{δ_{i}} .

Consider Example 1 again. Suppose φ₀(dx|t) = ch(cx, t)dx for x > 0. Letting $g (t) = \int_{0}^{t} d v \int_{0}^{\infty} s h (s, v) d s$ , it can be seen that $m_{i} (t) = \int_{0}^{t} γ_{i} (v) g^{'} (v) d v$ . As a result, as c → ∞, the likelihood tends to

I \{\text{all } ∣ D_{j} ∣ = 0 \text{ or } 1\} \times \prod_{i = 1}^{n} e^{- \int_{0}^{y_{i}} γ_{i} (v) g^{'} (v) d v} {[γ_{i} (y_{i}) g^{'} (y_{i})]}^{δ_{i}},

and hence it behaves similarly to the one under the PH model. However, whereas the likelihood based on the n-variate model automatically discriminates against ties in this case, the one based on the PH model cannot.

The result implicitly assumes that m is differentiable at every τ_j with D_j ≠ ∅. Following the argument for the existence of probability density of T_i, this indeed holds with probability 1. Here then is some information on the probability of ties for the case of most interest to us.

Proposition 3

Let W be homogeneous such that, for any $a \in ℝ_{+}^{n}$ , Ψ(a, t) = Ψ₁(a)Ψ₂(t). Then

P (T_{i} = T_{j}) = \frac{Ψ_{1} (κ_{i}) + Ψ_{1} (κ_{j})}{Ψ_{1} (κ_{i} + κ_{j})} - 1, i \neq j .

(3.5)

If W has Lévy measure φ(ds|t) = cν(cds|t) with ν(ds|t) = h(t)λ(ds) satisfying (3.4), where λ is a Lévy measure on $ℝ_{+}^{n}$ , then as c → ∞, P(T_i all different) → 1.

3.3. A Gamma wear process model

Let W = γH, where γ = (γ₁, …, γ_n) is a constant vector with γ_i > 0 and H is a homogeneous Gamma process with Lévy measure

φ_{0} (d s ∣ t) = c f (t) I \{s > 0\} s^{- 1} e^{- c s} d s,

with c > 0 being the precision parameter and f = F′. Denote H ~ $\mathcal{G}$(cF, c). We refer to the corresponding n-variate model as the Gamma Process (GP) model. The parameters of the model are γ, F, and c, but c will be fixed via model selection and only γ and F will be estimated.

Corollary 1

The likelihood function for the GP model is

L (γ, F ∣ c, D_{obs}) = \prod_{j = 1}^{N} {(\frac{c}{c + ϱ_{j}^{'} γ})}^{c [F (τ_{j}) - F (τ_{j - 1})]} \times \prod_{j : D_{j} \neq \emptyset} c f (τ_{j}) \int_{0}^{\infty} s^{- 1} e^{- (c + ω_{j}^{'} γ) s} \prod_{i \in D_{j}} (1 - e^{- γ_{i} s}) d s .

(3.6)

The proof of (3.6) is quick. As a′W(t) = a′γH(t) ~ Gamma(cF(t), a′γ/c) for $0 \neq a \in ℝ_{+}^{n}$ , $e^{- Ψ (a, t)} = {(1 + a^{'} γ / c)}^{- c F (t)}$ . Then the first factor on the right-hand side of (3.6) follows from that in (3.3); the second factor follows from that in (3.3) together with (3.2).
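For small tie groups, the integral in the second factor of (3.6) can also be written in closed form. Expanding the product over i ∈ D_j and applying the Frullani integral $\int_{0}^{\infty} s^{- 1} (e^{- a s} - e^{- b s}) d s = ln (b / a)$ gives a signed sum of logarithms over the nonempty subsets of D_j. The following is a minimal sketch of that identity, not the implementation used in this paper (Section 6 reports numerical quadrature); the function name `tie_integral` and its arguments are hypothetical, and the cost grows like 2^|D_j|, so it is only practical when few subjects are tied.

```python
import math
from itertools import combinations

def tie_integral(a, gammas):
    """Evaluate int_0^inf s^{-1} e^{-a s} prod_i (1 - e^{-gamma_i s}) ds.

    a      : c + omega_j' gamma, which must be positive
    gammas : the gamma_i of the subjects in the tie group D_j (non-empty)
    """
    if a <= 0 or not gammas:
        raise ValueError("need a > 0 and a non-empty tie group")
    total = 0.0
    for size in range(1, len(gammas) + 1):
        # Inclusion-exclusion sign: + for odd subsets, - for even subsets.
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(gammas, size):
            total += sign * math.log1p(sum(subset) / a)
    return total

# A single failure reduces to the Frullani integral ln(1 + gamma / a).
print(tie_integral(2.0, [1.0]))
```

For large tie groups the alternating sum suffers from cancellation, in which case direct numerical integration is the safer choice.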

When the data have no ties, (3.6) can be shown to coincide with (14) in Kalbfleisch (1978). To take ties into account, Kalbfleisch (1978) derived a likelihood of regression coefficients in his (23) which, if expressed in integral form, is a part of the likelihood in (3.6) but with the factor $\prod_{j : D_{j} \neq \emptyset} c f (τ_{j})$ missing. From Proposition 2, we get the following when c → ∞.

Corollary 2

Given D_obs, let the GP model be fit with W = γH, where γ = (γ₁, …, γ_n) with γ_i > 0 and H ~ $\mathcal{G}$(cF, c). Fixing γ and F, as c → ∞,

L (γ, F ∣ c, D_{obs}) \to I \{\text{all } ∣ D_{j} ∣ = 0 \text{ or } 1\} \times \prod_{i = 1}^{n} e^{- γ_{i} F (y_{i})} {[γ_{i} f (y_{i})]}^{δ_{i}} .

Corollary 3

Suppose there is no censoring. Then for any i ≠ j,

P (T_{i} = T_{j}) = \frac{ln (1 + γ_{i} / c) + ln (1 + γ_{j} / c)}{ln (1 + γ_{i} / c + γ_{j} / c)} - 1.

Thus, as c → 0, P(T_i all equal) → 1, and as c → ∞, P (T_i all different) → 1.
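To make the dependence on c concrete, the pairwise tie probability in Corollary 3 can be evaluated directly. The snippet below is an illustrative sketch; the function name `tie_probability` is hypothetical and the values of γ_i, γ_j, and c are arbitrary choices.

```python
import math

def tie_probability(gamma_i, gamma_j, c):
    """P(T_i = T_j) under the GP model without censoring (Corollary 3)."""
    num = math.log1p(gamma_i / c) + math.log1p(gamma_j / c)
    den = math.log1p((gamma_i + gamma_j) / c)
    return num / den - 1.0

# The tie probability decreases as the precision parameter c grows.
for c in (0.01, 1.0, 10.0, 100.0):
    print(c, tie_probability(1.0, 1.0, c))
```

The convergence to 1 as c → 0 is only logarithmic in 1/c, so very small values of c are needed before nearly all pairs tie.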

It should be noted that in general, as the wear process becomes more variable, it is not necessarily true that P(T_i all equal) → 1. For example, let H be a generalized Gamma process with time-independent Lévy density c²h₁(cs), where $h_{1} (s) = s^{- α - 1} e^{- s}$ , 0 < α < 1 (Hougaard (1986); Brix (1999); Epifani, Lijoi, and Prünster (2003); Lijoi, Mena, and Prünster (2007); Argiento, Guglielmi, and Pievatolo (2010)). Then, up to a positive factor that cancels in (3.5), Ψ₁(λ) = (1 + λ/c)^α − 1 and, by Proposition 3, as c → 0,

P (T_{i} = T_{j}) = \frac{{(γ_{i} + c)}^{α} + {(γ_{j} + c)}^{α} - 2 c^{α}}{{(γ_{i} + γ_{j} + c)}^{α} - c^{α}} - 1 \to \frac{γ_{i}^{α} + γ_{j}^{α}}{{(γ_{i} + γ_{j})}^{α}} - 1 \in (0, 1) .

4. Sampling of Survival Data from Gamma Process

Proposition 4

Let F be strictly increasing and α(t) = cF(t). Let G ~ $\mathcal{G}$(t, 1) be a standard Gamma process and η_i be i.i.d. Exp(c) random variables independent of G. Then (T₁, …, T_n) ~ (α⁻¹(G^*(η₁/γ₁)), …, α⁻¹(G^*(η_n/γ_n))).

Here, since η_i and G are independent, if we can sample G^*(θ_i) for an arbitrary fixed set of θ_i > 0, then we can sample T₁, …, T_n. The result follows from the inversion formula for univariate distributions (Devroye (1986)). The inversion is used by Bender, Augustin, and Blettner (2005) to sample failure times with no ties for the PH model. We next describe how to jointly sample G^*(θ_i) forwardly. For convenience, suppose θ_i are already sorted in increasing order.

Theorem 2

Let G ~ $\mathcal{G}$(t, 1). Given a single θ > 0, the distribution function of G^*(θ) is given by

P (G^{*} (θ) \leq t) = \frac{1}{Γ (t)} \int_{θ}^{\infty} u^{t - 1} e^{- u} d u

(4.1)

and, given G^*(θ) = τ, the conditional distribution of G(τ) is

P (G (τ) \leq r ∣ G^{*} (θ) = τ) = \frac{M_{θ, τ} (r)}{M_{θ, τ} (\infty)}, θ \leq r < \infty,

(4.2)

where

M_{θ, τ} (r) = \int_{θ}^{r} e^{- s} (\int_{0}^{θ} \frac{u^{τ - 1} d u}{s - u}) d s .

This enables us to sample G^*(θ_i) for 0 < θ₁ < ··· < θ_n, as follows. First sample τ₁ = G^*(θ₁) from (4.1) and r₁ = G(τ₁), conditional on τ₁, from (4.2). If r₁ ≥ θ_n, then all G^*(θ_i) = τ₁. Otherwise, with s the index such that θ_s ≤ r₁ < θ_s₊₁, G^*(θ₁) = ··· = G^*(θ_s) = τ₁ < G^*(θ_s₊₁) ≤ ··· ≤ G^*(θ_n). In general, if (τ₁, r₁), …, (τ_k, r_k) have been sampled but there is s < n such that θ_s ≤ r_k < θ_s₊₁, then τ_k₊₁ and r_k₊₁ = G(τ_k₊₁) are sampled as follows. First, independently of all (τ_j, r_j), j ≤ k, sample τ̃_k₊₁ ~ G^*(θ_s₊₁ − r_k) from (4.1) and r̃_k₊₁ ~ G(τ̃_k₊₁), conditional on G^*(θ_s₊₁ − r_k) = τ̃_k₊₁, from (4.2). Then τ_k₊₁ = G^*(θ_s₊₁) = τ_k + τ̃_k₊₁ and r_k₊₁ = r_k + r̃_k₊₁. If r_k₊₁ ≥ θ_n, then G^*(θ_s₊₁) = ··· = G^*(θ_n) = τ_k₊₁; otherwise, sample the next distinct failure time τ_k₊₂ and r_k₊₂. The procedure continues until all G^*(θ_i) are sampled.
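The first draw in each round, G^*(θ) from (4.1), can be obtained by inverting the distribution function, since P(G^*(θ) ≤ t) = P(G(t) ≥ θ) is the regularized upper incomplete gamma function viewed as a function of its shape argument t. The sketch below is one possible way to do this and is not taken from the paper: the helper names are hypothetical, the incomplete gamma function is computed from its power series (adequate for moderate θ only), and the inversion uses plain bisection. In practice a library routine such as SciPy's `scipy.special.gammaincc` would be a more robust choice for the distribution function.

```python
import math
import random

def lower_reg_gamma(a, x, tol=1e-14, max_terms=100000):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term <= tol * total:
            break
    return min(1.0, total)

def hitting_time_cdf(t, theta):
    """P(G*(theta) <= t) = P(G(t) >= theta), as in (4.1)."""
    return 1.0 - lower_reg_gamma(t, theta)

def sample_hitting_time(theta, rng=random):
    """Draw G*(theta) by solving hitting_time_cdf(t, theta) = u for t."""
    u = rng.random()
    lo, hi = 0.0, 1.0
    while hitting_time_cdf(hi, theta) < u:   # bracket the root
        lo, hi = hi, 2.0 * hi
    for _ in range(100):                     # bisection
        mid = 0.5 * (lo + hi)
        if hitting_time_cdf(mid, theta) < u:
            lo = mid
        else:
            hi = mid
    return hi

print(sample_hitting_time(2.0, random.Random(0)))
```

At t = 1 the distribution function reduces to e^{−θ}, which gives a quick sanity check on the series.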

Gibbs sampling can be applied to the conditional distribution (4.2). Introduce two latent variables U and V such that 0 < U < θ, V > 0 and, conditional on G^*(θ) = τ, G(τ), U, and V have joint density

m_{θ} (r, u, v) \propto e^{- r} u^{τ - 1} e^{- v (r - u)}, θ < r < \infty, 0 < u < θ, 0 < v < \infty .

Let ζ = ln[U/(θ − U)] and denote the conditional joint density of G(τ), ζ, and V by m_θ(r, z, v). Using the collapsed Gibbs method (Liu (1994); Chen, Shao, and Ibrahim (2000)), we then sample from, in turn: (i) m_θ(z|r), (ii) m_θ(v|r, z), and (iii) m_θ(r|v). For (i), we have

m_{θ} (z ∣ r) \propto \frac{e^{τ z}}{{(1 + e^{z})}^{τ}} \times \frac{1}{r + (r - θ) e^{z}}, - \infty < z < \infty;

this can be shown to be a log-concave density with the conditional mode

z_{mod} = ln [\frac{(τ - 1) (r - θ) + {{[(τ - 1) (r - θ)]}^{2} + 4 (r - θ) τ r}^{1 / 2}}{2 (r - θ)}],

thus allowing the application of the adaptive-rejection algorithm of Gilks and Wild (1992) to sample ζ conditional on G(τ) = r. For (ii), we have

m_{θ} (v ∣ r, z) \propto exp {- v (r - \frac{θ e^{z}}{1 + e^{z}})}, 0 < v < \infty,

which is an exponential density with mean [r − θe^z/(1+e^z)]⁻¹. For (iii), $m_{θ} (r ∣ v) \propto e^{- r (1 + v)}$ , r > θ, that is, r − θ is exponential with mean 1/(1 + v), and hence sampling r is also straightforward. We use the following algorithm to generate failure times that may have ties.

Direct Forward Sampling (DFS) Algorithm

Step 1. Set n (number of failure times), c (precision parameter), and γ₁, …, γ_n (coefficients).
Step 2. Generate η_i i.i.d. ~ Exp(c) and set θ_i = η_i/γ_i for i ≤ n.
Step 3. Rearrange (γ_i, θ_i) so that 0 < θ₁ < · · · < θ_n.
Step 4. Initialize k = 0, t = 0, and h = 0.
Step 5. Generate τ, which is a realization of G^*(θ_{k+1} − h).
Step 6. Generate r, which is a realization of G(τ) conditional on G^*(θ_{k+1} − h) = τ.
Step 7. Update t to t + τ.
Step 8. Update h to h + r.
Step 9. If h ≥ θ_n, then G^*(θ_{k+1}) = · · · = G^*(θ_n) = t. Go to Step 14.
Step 10. If θ_{k+s} ≤ h < θ_{k+s+1} for some 1 ≤ s < n − k, then G^*(θ_{k+1}) = · · · = G^*(θ_{k+s}) = t.
Step 11. Update k to k + s.
Step 12. Go to Step 5.
Step 13. Follow Steps 8 through 12 until all G^*(θ₁), …, G^*(θ_n) are generated.
Step 14. Return α⁻¹(G^*(θ_i)) = F⁻¹(G^*(θ_i)/c), i ≤ n, the failure times under the GP model with precision parameter c.
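The bookkeeping of the DFS algorithm, that is, how one jump of G can cross several thresholds θ_i at once and thereby create ties, can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the authors' code: the two samplers are passed in as hypothetical callables `sample_tau` (a draw from (4.1)) and `sample_level` (a draw from (4.2), for example by the Gibbs steps described above), and the usage example replaces them with deterministic stand-ins purely to show the grouping.

```python
def direct_forward_grouping(thetas, sample_tau, sample_level):
    """Return G*(theta_i) for sorted thresholds; equal entries are ties.

    thetas       : thresholds theta_1 < ... < theta_n (already sorted)
    sample_tau   : callable(theta) -> draw of G*(theta)           (assumed)
    sample_level : callable(theta, tau) -> draw of G(tau) given
                   G*(theta) = tau; must return a value >= theta  (assumed)
    """
    n = len(thetas)
    hits = [None] * n
    k, t, h = 0, 0.0, 0.0            # next threshold, current time, current level
    while k < n:
        gap = thetas[k] - h          # distance from current level to next threshold
        tau = sample_tau(gap)        # waiting time until that threshold is crossed
        r = sample_level(gap, tau)   # level gained at the crossing (>= gap)
        t += tau
        h += r
        # Every threshold at or below the new level is crossed by the same jump.
        while k < n and thetas[k] <= h:
            hits[k] = t
            k += 1
    return hits

# Deterministic stand-ins: unit waiting times and a fixed overshoot of 0.5.
hits = direct_forward_grouping([1.0, 1.2, 3.0],
                               lambda theta: 1.0,
                               lambda theta, tau: theta + 0.5)
print(hits)  # [1.0, 1.0, 2.0]: the first two thresholds are tied
```

The failure times are then obtained by applying α⁻¹ to each entry, as in Proposition 4; with F(t) = t this amounts to dividing by c.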

5. Bayesian Posterior Inference

Henceforth, we assume that in the GP model $γ_{i} = exp (x_{i}^{'} β)$ , where x_i is the vector of covariates of the i^th subject and β the vector of regression coefficients. Our goal is to develop posterior inference for (β, F) given c.

5.1. Prior

We assume a piecewise linear model for F as follows. Partition the time axis into K intervals (a₀, a₁], (a₁, a₂], …, (a_K₋₁, a_K], where a₀ = 0 and a_K ≥ τ_N. Then let f(t) = F′(t) = λ_k for a_k₋₁ < t ≤ a_k. Under our model,

F (τ_{j}) - F (τ_{j - 1}) = \sum_{k = 1}^{K} λ_{k} d_{j k}, with d_{j k} = ∣ (a_{k - 1}, a_{k}) \cap (τ_{j - 1}, τ_{j}) ∣ .

(5.1)
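The quantities d_jk in (5.1) are simply the lengths of the overlaps between the partition intervals and the gaps between consecutive endpoints. A minimal sketch of their computation is given below; the function names `overlap_lengths` and `increments_of_F` are hypothetical, and intervals are indexed from 0 in the code.

```python
def overlap_lengths(cuts, taus):
    """d[j][k] = length of (a_k, a_{k+1}] ∩ (tau_{j-1}, tau_j], with tau_0 = 0.

    cuts : [a_0 = 0, a_1, ..., a_K], increasing, with a_K >= max(taus)
    taus : distinct observed endpoints tau_1 < ... < tau_N
    """
    d = []
    prev = 0.0
    for tau in taus:
        row = [max(0.0, min(tau, cuts[k + 1]) - max(prev, cuts[k]))
               for k in range(len(cuts) - 1)]
        d.append(row)
        prev = tau
    return d

def increments_of_F(lams, d):
    """F(tau_j) - F(tau_{j-1}) = sum_k lambda_k d_jk, as in (5.1)."""
    return [sum(lam * length for lam, length in zip(lams, row)) for row in d]

d = overlap_lengths([0.0, 1.0, 3.0], [0.5, 2.0])
print(d)                               # [[0.5, 0.0], [0.5, 1.0]]
print(increments_of_F([2.0, 1.0], d))  # [1.0, 2.0]
```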

Let λ = (λ₁, …, λ_K)′, and for j ≤ N, let ν(j) be the unique index with $a_{ν (j) - 1} < τ_{j} \leq a_{ν (j)}$ . Then, the likelihood function in (3.6) can be rewritten as

L (β, λ ∣ c, D_{obs}) = \prod_{j = 1}^{N} {(\frac{c}{c + ϱ_{j}^{'} γ})}^{c [F (τ_{j}) - F (τ_{j - 1})]} \times {[c λ_{ν (j)} \int_{0}^{\infty} s^{- 1} e^{- (c + ω_{j}^{'} γ) s} \prod_{i \in D_{j}} (1 - e^{- γ_{i} s}) d s]}^{I \{D_{j} \neq \emptyset\}} .

(5.2)

We assume the prior $π (β, λ) \propto exp (- β^{'} Σ_{0}^{- 1} β / 2) \prod_{k = 1}^{K} λ_{k}^{α_{0} - 1} e^{- α_{1} λ_{k}}$ . Under the prior, β and λ₁, …, λ_K are independent, with β ~ N_p(0, Σ₀) and λ_k ~ Gamma(α₀, 1/α₁). In Sections 6 and 7, we specify Σ₀ = 10⁴I_p and α₀ = α₁ = 10⁻², which lead to a relatively vague prior for (β, λ).

5.2. Posterior computation

To sample the joint posterior distribution

π (β, λ ∣ c, D_{obs}) \propto L (β, λ ∣ c, D_{obs}) π (β, λ),

introduce the latent variables s = (s_j : D_j ≠ ∅, j ≤ N) and define an augmented joint posterior distribution π(β, λ, s | c, D_obs) ∝ L(β, λ, s | c, D_obs)π(β, λ), where

L (β, λ, s ∣ c, D_{obs}) = \prod_{j = 1}^{N} {(\frac{c}{c + ϱ_{j}^{'} γ})}^{c [F (τ_{j}) - F (τ_{j - 1})]} {[c λ_{ν (j)} s_{j}^{- 1} e^{- s_{j} (c + ω_{j}^{'} γ)} \prod_{i \in D_{j}} (1 - e^{- γ_{i} s_{j}})]}^{I {D_{j} \neq \emptyset}} .

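The Gibbs sampler draws in turn from (i) π(β | λ, s, c, D_obs), (ii) π(λ | β, s, c, D_obs), and (iii) π(s | β, c, D_obs). For (i), we have

π (β ∣ λ, s, c, D_{obs}) \propto \prod_{j = 1}^{N} {(\frac{c}{c + ϱ_{j}^{'} γ})}^{c [F (τ_{j}) - F (τ_{j - 1})]} {[e^{- s_{j} (c + ω_{j}^{'} γ)} \prod_{i \in D_{j}} (1 - e^{- γ_{i} s_{j}})]}^{I \{D_{j} \neq \emptyset\}} π (β) .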

Since $γ_{i} = exp (x_{i}^{'} β)$ , it is easy to show that π(β | λ, s, D_obs) is log-concave in each component of β, and so we can use the adaptive rejection algorithm of Gilks and Wild (1992) to sample β. For (ii), given β and s, λ₁, …, λ_K are conditionally independent and, for each k, the conditional posterior distribution of λ_k is

π (λ_{k} ∣ β, s, c, D_{obs}) \propto λ_{k}^{α_{0} + \sum_{j = 1}^{N} I \{D_{j} \neq \emptyset, a_{k - 1} < τ_{j} \leq a_{k}\} - 1} exp \{- λ_{k} [α_{1} - c \sum_{j = 1}^{N} d_{j k} ln (\frac{c}{c + ϱ_{j}^{'} γ})]\} .

Thus, λ_k follows a Gamma distribution that is easy to sample. For (iii), given β, s₁, …, s_N are conditionally independent and, for each j with D_j ≠ ∅,

π (s_{j} ∣ β, c, D_{obs}) \propto s_{j}^{- 1} e^{- (c + ω_{j}^{'} γ) s_{j}} \prod_{i \in D_{j}} (1 - e^{- γ_{i} s_{j}}) .

Let u_j = ln s_j. Then the conditional posterior density of u_j is

π (u_{j} ∣ β, c, D_{obs}) \propto exp {- (c + ω_{j}^{'} γ) e^{u_{j}}} \prod_{i \in D_{j}} (1 - exp {- γ_{i} e^{u_{j}}}) .

It is easy to show that π(u_j | β, D_obs) is log-concave. Then we again can use the adaptive rejection algorithm to sample u_j and set s_j = exp(u_j).
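Of the three Gibbs steps, the update of λ_k is the only one available in closed form. Multiplying the prior kernel $λ_{k}^{α_{0} - 1} e^{- α_{1} λ_{k}}$ by the λ_k-dependent factors of the augmented likelihood gives a Gamma distribution whose shape is α₀ plus the number of distinct failure times in the k^th interval and whose rate is α₁ + c Σ_j d_jk ln(1 + ϱ_j′γ/c). The sketch below computes these two parameters; the function name and argument layout are hypothetical, intervals are indexed from 0, and the draw uses the shape and scale convention of Python's `random.gammavariate`.

```python
import math
import random

def lambda_full_conditional(k, alpha0, alpha1, c, d, risk_sums, fail_interval):
    """Shape and rate of the Gamma full conditional of lambda_k.

    d[j][k]          : overlap lengths d_jk from (5.1)
    risk_sums[j]     : varrho_j' gamma, the sum of gamma_i over the risk set R_j
    fail_interval[j] : index nu(j) of the interval containing tau_j when D_j is
                       non-empty, and None when D_j is empty
    """
    count = sum(1 for v in fail_interval if v == k)
    rate = alpha1 + c * sum(d[j][k] * math.log1p(risk_sums[j] / c)
                            for j in range(len(d)))
    return alpha0 + count, rate

shape, rate = lambda_full_conditional(
    0, 0.01, 0.01, 1.0,
    d=[[0.5, 0.0], [0.5, 1.0]], risk_sums=[3.0, 1.0], fail_interval=[0, 1])
# gammavariate takes (shape, scale), so the scale is 1 / rate.
draw = random.Random(1).gammavariate(shape, 1.0 / rate)
print(shape, rate, draw)
```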

6. A Simulation Study

We conducted a simulation study to compare the PH model and the GP model with H ~ $\mathcal{G}$(cF, c). As the value of c is unknown in practice, to guide the choice of c in fitting the GP model, we use the deviance information criterion (DIC) (Spiegelhalter et al. (2002)). Define the deviance function

D (ψ) = - 2 ln L (β, λ ∣ c, D_{obs}),

where ψ = (β′, λ′)′ and L(β, λ | c, D_obs) is given in (5.2). Then

DIC = D (\bar{ψ}) + 2 p_{D},

(6.1)

where ψ̄ = E[ψ |D_obs] and $p_{D} = \bar{D (ψ)} - D (\bar{ψ})$ with $\bar{D (ψ)} = E [D (ψ) ∣ D_{obs}]$ . In (6.1), D(ψ̄) measures the goodness-of-fit, and p_D is the effective number of model parameters. The DIC is a Bayesian measure of fit or adequacy with 2p_D being the dimensional penalty term. The smaller the DIC value, the better the model fits the data. In this simulation study, our second goal was to examine the performance of DIC in correctly identifying c in the fitted GP model.
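Given posterior draws, (6.1) requires only the average of the deviance over the draws and the deviance evaluated at the posterior mean. A minimal sketch follows; the function name `dic` is hypothetical and the numbers in the usage line are made up for illustration.

```python
def dic(deviance_draws, deviance_at_mean):
    """Return (DIC, p_D) as in (6.1).

    deviance_draws   : D(psi) evaluated at each posterior draw of psi
    deviance_at_mean : D(psi_bar), the deviance at the posterior mean
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_mean   # effective number of parameters
    return deviance_at_mean + 2.0 * p_d, p_d

print(dic([10.0, 12.0, 14.0], 11.0))  # (13.0, 1.0)
```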

In the simulation study, the data were generated as follows. We generated x_i = (x_i₁, x_i₂)′, i ≤ n, where x_i₁ ~ N(0, 1) and x_i₂ ~ Bernoulli(0.7) were all independent. We set β = (β₁, β₂) = (1, −0.5) and F(t) = t, and considered sample sizes n = 250 and 500. We used the DFS algorithm in Section 4 to generate failure times from the GP model with $γ_{i} = exp (x_{i}^{'} β)$ and c = 1, 10, and 100. We independently generated n censoring times from a rescaled beta distribution such that C_i = 38q_i with q_i ~ beta(1, 3), which yielded approximately 15% censored observations for each simulated data set. We independently generated 500 data sets under each simulation setting.

For each data set, we let N_total = Σ_j |D_j| I {|D_j| > 1} and N_max = max_j |D_j|, where D_j is defined in (2.1). Figure 1 shows the boxplots of $N_{total}^{(1)}, \dots, N_{total}^{(500)}$ and Figure 2 shows the boxplots of $N_{max}^{(1)}, \dots, N_{max}^{(500)}$ for the 500 simulated data sets under the six simulation settings. From Figure 1, we can see that as c increases from 1 to 100, $N_{total}^{(1)}, \dots, N_{total}^{(500)}$ in the simulated data sets decrease as a whole, and their median drops substantially.

Figure 1. Boxplots of the total numbers of ties in 500 simulated data sets of sizes n = 250 and n = 500 generated from the GP model with c = 1, 10, and 100.

Figure 2. Boxplots of the maximum numbers of ties in 500 simulated data sets of sizes n = 250 and n = 500 generated from the GP model with c = 1, 10, and 100.

For each simulated data set, we fit the PH model with a constant baseline hazard rate function and the GP model with F(t) = t; the true value of c was used when fitting the GP model. For each simulated data set, we implemented the Gibbs sampling algorithm of Section 5.2 and used 5,000 Gibbs iterations after a burn-in of 500 iterations to compute the posterior estimates. Let β̂_{jℓ} and sd_ℓ(β_j) denote the posterior mean and the posterior standard deviation of β_j computed from the ℓ^th simulated data set for ℓ = 1, …, 500. The simulation posterior estimate (Est), the simulation posterior standard deviation (SD), the simulation error (SE), and the mean squared error (MSE) for β_j are, respectively, ${\bar{\hat{β}}}_{j} = (1 / 500) \sum_{ℓ = 1}^{500} {\hat{β}}_{j ℓ}, \bar{sd} (β_{j}) = (1 / 500) \sum_{ℓ = 1}^{500} {sd}_{ℓ} (β_{j}), SE (β_{j}) = {[(1 / 499) \sum_{ℓ = 1}^{500} {({\hat{β}}_{j ℓ} - {\bar{\hat{β}}}_{j})}^{2}]}^{1 / 2}$ , and $MSE (β_{j}) = (1 / 500) \sum_{ℓ = 1}^{500} {({\hat{β}}_{j ℓ} - β_{j})}^{2}$ , where β_j is the true value. We define the same simulation summary statistics for λ. We let CP denote the proportion of the 500 simulated data sets in which the 95% highest posterior density (HPD) interval, computed using the Monte Carlo method developed by Chen and Shao (1999), contains the true parameter value. Table 1 shows these simulation summary statistics. We see that the GP model generally performed well: the posterior estimates were very close to the true values of β and λ, and the coverage probabilities were close to 95%. Meanwhile, the PH model performed poorly and there were substantial biases in the posterior estimates, especially when c was small. When c = 100, the performance of the PH model improved and the biases of the posterior estimates under the PH model were reduced considerably, but the coverage probabilities were still smaller than the expected 95%, especially for β₁ when n = 500.
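The four summary statistics defined above can be computed from the per-data-set posterior means and standard deviations as follows. This is a small illustrative sketch; the function name `simulation_summary` and the toy inputs are hypothetical.

```python
import math

def simulation_summary(post_means, post_sds, truth):
    """Est, SD, SE, and MSE over simulated data sets, as defined in the text."""
    m = len(post_means)
    est = sum(post_means) / m
    sd = sum(post_sds) / m
    se = math.sqrt(sum((b - est) ** 2 for b in post_means) / (m - 1))
    mse = sum((b - truth) ** 2 for b in post_means) / m
    return {"Est": est, "SD": sd, "SE": se, "MSE": mse}

print(simulation_summary([0.9, 1.1], [0.1, 0.3], truth=1.0))
```

Since MSE equals the squared bias plus (m − 1)/m times SE², the entries of Table 1 can be cross-checked: for β₁ under the PH fit with true c = 1 and n = 250, (0.827 − 1)² + 0.242² ≈ 0.088, close to the reported 0.089.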

Table 1.

Summary of posterior estimates for the GP and the PH Models in simulation studies.

n	Parameter	Statistic	True c = 1		True c = 10		True c = 100
			GP	PH	GP	PH	GP	PH
250	β₁	True	1		1		1
		Est	1.012	0.827	1.013	0.965	1.002	0.995
		SD	0.096	0.070	0.085	0.072	0.078	0.073
		SE	0.094	0.242	0.085	0.127	0.079	0.085
		MSE	0.009	0.089	0.007	0.017	0.006	0.007
		CP	0.944	0.300	0.950	0.682	0.942	0.906

	β₂	True	−0.5		−0.5		−0.5
		Est	−0.504	−0.407	−0.514	−0.482	−0.498	−0.494
		SD	0.178	0.158	0.152	0.156	0.139	0.156
		SE	0.157	0.176	0.142	0.158	0.128	0.144
		MSE	0.025	0.040	0.020	0.025	0.016	0.021
		CP	0.974	0.894	0.962	0.956	0.966	0.970

	λ	True	1		1		1
		Est	1.055	0.903	1.020	0.964	1.011	1.007
		SD	0.225	0.116	0.135	0.123	0.113	0.128
		SE	0.234	0.782	0.136	0.254	0.109	0.145
		MSE	0.057	0.619	0.019	0.066	0.012	0.021
		CP	0.940	0.262	0.944	0.654	0.962	0.924

500	β₁	True	1		1		1
		Est	1.010	0.824	0.999	0.964	1.003	0.995
		SD	0.068	0.050	0.061	0.051	0.057	0.051
		SE	0.071	0.244	0.063	0.129	0.055	0.062
		MSE	0.005	0.091	0.004	0.018	0.003	0.004
		CP	0.938	0.184	0.948	0.532	0.964	0.892

	β₂	True	−0.5		−0.5		−0.5
		Est	−0.501	−0.400	−0.511	−0.489	−0.508	−0.500
		SD	0.129	0.113	0.113	0.111	0.098	0.110
		SE	0.117	0.156	0.098	0.112	0.094	0.108
		MSE	0.014	0.034	0.010	0.013	0.009	0.012
		CP	0.966	0.754	0.972	0.952	0.948	0.952

	λ	True	1		1		1
		Est	1.026	0.860	1.020	0.986	1.013	1.000
		SD	0.188	0.078	0.108	0.089	0.081	0.090
		SE	0.190	0.597	0.102	0.263	0.079	0.112
		MSE	0.037	0.375	0.011	0.070	0.006	0.013
		CP	0.940	0.200	0.970	0.498	0.956	0.870


To examine the performance of DIC, for each simulated data set, we fit the GP model with c = 1, 10, and 100 when the true c = 1 or 100. When the true c = 10, we fit the GP model with c = 1, 10, 15, and 100. In the DIC computation, we used (3.6) to compute L(β, λ, c | D_obs). Our simulation codes were written in FORTRAN 95 with double precision. The IMSL subroutine DQDAGI was used for evaluating all one-dimensional integrals involved in the likelihood function. For ℓ = 1, …, 500, let DIC_{c,ℓ} denote the DIC computed for the ℓ^th simulated data set under the GP model fitted with the given value of c. The boxplots of the DIC differences Δ_ℓ(c, c′) = DIC_{c′,ℓ} − DIC_{c,ℓ} for different values of c′ and c in Figure 3 show that DIC could identify the true GP model correctly for most of the simulated data sets, and that the DIC differences were quite large when the value of c in the fitted GP model was far from the true c. Even when c in the fitted GP model was close to the true one, for example, when c = 15 in the fitted GP model and the true c = 10, the boxplot shown in Figure 3 lies almost entirely above zero, although the DIC differences are much smaller.
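As a rough illustration of the DIC comparison, the sketch below computes DIC and p_D from posterior draws using the standard definition of Spiegelhalter et al. (2002): D(θ) = −2 log L(θ), p_D = D̄ − D(θ̄), and DIC = D(θ̄) + 2 p_D. The argument `log_lik` is a hypothetical stand-in for the logarithm of the likelihood in (3.6), which is not reproduced here; the toy quadratic log-likelihood in the usage note is purely an assumption for demonstration.

```python
import numpy as np

def dic(log_lik, draws):
    """Return (DIC, p_D) from posterior draws.

    log_lik: callable mapping a parameter vector to a log-likelihood value
             (hypothetical stand-in for log L(beta, lambda, c | D_obs)).
    draws:   array of shape (n_draws, n_params) of posterior samples.
    """
    draws = np.asarray(draws, dtype=float)
    deviances = np.array([-2.0 * log_lik(theta) for theta in draws])
    d_bar = deviances.mean()                        # posterior mean deviance
    d_at_mean = -2.0 * log_lik(draws.mean(axis=0))  # deviance at posterior mean
    p_d = d_bar - d_at_mean                         # effective number of parameters
    return d_at_mean + 2.0 * p_d, p_d

def dic_difference(dic_c_prime, dic_c):
    """Delta(c, c') = DIC_{c'} - DIC_c; positive values favor the model with c."""
    return dic_c_prime - dic_c
```

For example, with the toy log-likelihood `lambda th: -0.5 * float(np.sum((th - 1.0) ** 2))` and draws `[[0.0], [2.0]]`, the mean deviance is 1, the deviance at the posterior mean is 0, so p_D = 1 and DIC = 2.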

Boxplots of DIC differences in 500 simulations for the data of sizes n = 250 and n = 500 generated from the GP model with c = 1, c = 10, and c = 100.

7. Analysis of Prostate Cancer Data

We considered a subset of the data from a prostate cancer study published by D’Amico et al. (2010), which consisted of 558 patients with high-risk prostate cancer, namely, prostate specific antigen (PSA) > 20, clinical Gleason score ≥ 8, or clinical stage T3 or higher. All patients in the subset were treated with radical prostatectomy (RP) between 1989 and 2008. In these data, the response is time to PSA failure or time to the last follow-up from the time of RP, whichever is smaller. The time of PSA failure is the time of prostate cancer recurrence after RP. The clinical implication of PSA recurrence is that men are offered salvage therapy (second treatment), which may prolong life or cure the patient but may have side effects. The covariates include age in years at the date of RP, the logarithm of PSA (logpsa), pathological Gleason score (pGS7 and pGS8H), pathological stage (pT3H), positive surgical margin (Margin), and year of RP. Among these covariates, age, logpsa, and year of RP are continuous, while pGS7 = 1 and pGS8H = 0 if the pathological Gleason score was 7, pGS7 = 0 and pGS8H = 1 if the pathological Gleason score was 8 or higher, and pGS7 = 0 and pGS8H = 0 if the pathological Gleason score was 6 or less; pT3H = 1 if the pathological stage was T3 or higher and 0 otherwise; Margin = 1 if the surgical margin was positive and 0 if the surgical margin was negative. There are 216 censored and 342 failed patients in the data. The total number of ties is 215 and the maximum size of a tied group is 16.
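The indicator coding of the pathological Gleason score described above can be written out as a small helper. The function name `gleason_dummies` is hypothetical and the snippet is only meant to make the three-level coding explicit, with a score of 6 or less as the reference level.

```python
def gleason_dummies(score):
    """Map a pathological Gleason score to the pair (pGS7, pGS8H).

    Reference level: score of 6 or less, coded as (0, 0).
    """
    if score >= 8:
        return (0, 1)   # 8 or higher
    if score == 7:
        return (1, 0)   # exactly 7
    return (0, 0)       # 6 or less
```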

We fit the GP model with the seven covariates (age, logpsa, pGS7, pGS8H, pT3H, Margin, year of RP) to the data. In all the posterior computations, the covariates were standardized. A piecewise linear model was assumed for F(t) in the GP model. The endpoints a_k of the intervals (a_{k−1}, a_k] were chosen to be the (100k/K)^th percentiles of the ordered distinct failure times for k ≤ K. The model parameters included β = (β₁, …, β₇)′ and λ = (λ₁, …, λ_K)′. We computed DIC and p_D under various values of c and K. The values of DIC are plotted in Figure 4. For all the values of c considered, the values of p_D range from 12.03 to 12.16 for K = 5, 17.03 to 17.13 for K = 10, and 22.12 to 22.23 for K = 15; these are almost the same as the corresponding numbers of parameters, 7 + K = 12, 17, and 22. The GP model with c = 185 and K = 10 attained the smallest DIC value among all of the combinations of (c, K) considered. However, as seen from Figure 4, the DIC values were very close for 170 ≤ c ≤ 200. In fact, for K = 10, the DIC values were 2,939.63, 2,939.61, 2,939.49, 2,939.32, 2,939.40, 2,939.64 for c = 170, 175, 180, 185, 190, 200, respectively. To further verify this finding, we simulated 500 data sets of size n = 558 from the GP model with c = 185 under the simulation setting discussed in Section 6; the resulting median and IQR of N_total were 219 and (208, 229), closely matching N_total = 215 in the prostate cancer data. In addition, as shown in Figure 4, the GP model with K = 10 clearly outperforms those with K = 5 and K = 15 according to the DIC measure.
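The percentile-based choice of interval endpoints can be sketched as follows. Several details here are assumptions rather than statements from the paper: a_0 is taken to be 0, percentiles use NumPy's default linear interpolation, and "piecewise linear F(t)" is read as F having slope λ_k on (a_{k−1}, a_k]. The function names are hypothetical.

```python
import numpy as np

def percentile_cutpoints(failure_times, K):
    """Endpoints a_0 < a_1 < ... < a_K from distinct failure times.

    Assumption: a_0 = 0 and a_k is the (100k/K)th percentile, computed
    with NumPy's default linear interpolation.
    """
    distinct = np.unique(failure_times)            # sorted distinct times
    q = 100.0 * np.arange(1, K + 1) / K            # 100k/K for k = 1..K
    return np.concatenate(([0.0], np.percentile(distinct, q)))

def piecewise_linear_F(t, cuts, lam):
    """F(t) = sum_k lam_k * |(a_{k-1}, a_k] intersected with (0, t]|.

    Assumption: slope lam_k on the k-th interval; for t beyond a_K this
    sketch returns F(a_K) rather than extrapolating.
    """
    cuts = np.asarray(cuts, dtype=float)
    lower, upper = cuts[:-1], cuts[1:]
    overlap = np.clip(np.minimum(t, upper) - lower, 0.0, None)
    return float(np.dot(lam, overlap))
```

For instance, failure times 1, 2, 2, 3, 4, 4, 5 with K = 2 have distinct values 1 through 5, giving endpoints 0, 3, and 5 under these assumptions.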

Plots of DIC values versus c with K = 5, 10, 15 for the prostate cancer data.

Under the GP model with the smallest DIC (c = 185 and K = 10), we computed the posterior means, posterior standard deviations (SD), and 95% HPD intervals of β. We also fit the PH model with the piecewise linear baseline hazard function with K = 10. Table 2 shows the maximum partial likelihood estimates (MPLEs) and Bayes estimates of β. From Table 2, we see that (i) under the PH model, the Bayes estimates were very close to the MPLEs; (ii) the estimates of β₁ and β₇ were very similar under the PH and GP models; (iii) the estimates of β₂, β₅, and β₆ under the PH model were slightly smaller than those under the GP model; and (iv) the estimates of β₃ and β₄ under the PH model were much smaller than those under the GP model. The difference in the estimates of β is expected because there were a large number of ties in the data and the best GP model according to the DIC measure was the one with c = 185. Also, due to the large value of c, the difference in the estimates of β should not be too large, as shown in our simulation study. When the regression coefficients are underestimated, the effects of the covariates cannot be accurately assessed, which may lead to an incorrect conclusion regarding the impact of important clinical factors, such as pathological Gleason score, on the risk of PSA failure.

Table 2.

Estimates of β under the PH and GP Models for the prostate cancer data.

Method	Variable	Parameter	Estimate	SD^*	95% Interval^†
MPLE	age	β₁	0.003	0.008	(−0.012, 0.019)
	logpsa	β₂	0.282	0.066	(0.152, 0.411)
	pGS7	β₃	0.461	0.178	(0.113, 0.809)
	pGS8H	β₄	0.916	0.177	(0.570, 1.263)
	pT3H	β₅	0.533	0.145	(0.248, 0.818)
	Margin	β₆	0.532	0.121	(0.296, 0.769)
	year of RP	β₇	−0.054	0.014	(−0.082, −0.026)

Bayes Based on PH Model	age	β₁	0.003	0.008	(−0.013, 0.019)
	logpsa	β₂	0.270	0.066	(0.144, 0.403)
	pGS7	β₃	0.468	0.179	(0.114, 0.818)
	pGS8H	β₄	0.913	0.178	(0.569, 1.267)
	pT3H	β₅	0.534	0.146	(0.249, 0.823)
	Margin	β₆	0.527	0.121	(0.289, 0.762)
	year of RP	β₇	−0.052	0.014	(−0.079, −0.024)

Bayes Based on GP Model	age	β₁	0.003	0.008	(−0.012, 0.019)
	logpsa	β₂	0.309	0.067	(0.175, 0.437)
	pGS7	β₃	0.507	0.177	(0.163, 0.854)
	pGS8H	β₄	0.992	0.175	(0.639, 1.329)
	pT3H	β₅	0.553	0.144	(0.278, 0.840)
	Margin	β₆	0.566	0.120	(0.337, 0.806)
	year of RP	β₇	−0.053	0.014	(−0.081, −0.025)


^* For MPLE, the values under the SD column are the standard errors of the estimates.

^† For MPLE, the 95% intervals are the 95% confidence intervals while for Bayes, those intervals are the 95% HPD intervals.

In all the Bayesian computations in this section, we used 50,000 Gibbs iterations after a burn-in of 1,000 iterations to compute the posterior estimates, including DICs, posterior means, posterior standard deviations, and 95% HPD intervals. The convergence of the Gibbs sampling algorithm was checked and the autocorrelations for all model parameters disappeared before lag 5.
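An autocorrelation check of the kind mentioned above could be carried out along the following lines. This is a generic sketch of the sample autocorrelation function for a single parameter's chain, not the authors' diagnostic code; the function name `autocorr` is hypothetical.

```python
import numpy as np

def autocorr(chain, max_lag):
    """Sample autocorrelations of one MCMC chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()                      # center the chain
    denom = np.dot(x, x)                  # lag-0 sum of squares
    n = x.size
    return np.array([np.dot(x[: n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])
```

Applied to the post-burn-in draws of each parameter, values near zero from a small lag onward would be consistent with the behavior reported above.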

8. Discussion

We have carried out an in-depth investigation of the GP model and its properties. Our results are obtained as special cases of a general multivariate wear process model. A novel DFS algorithm and a new Gibbs sampling algorithm have been developed that allow us to generate tied failure times from the GP model and to carry out posterior computations. The simulation study of Section 6 revealed some empirical properties of the GP model and the degree of bias in the parameter estimates when the PH model is fit to data generated from the GP model.

One potential limitation of our analysis is its use of a homogeneous Gamma process to model the baseline wear process H. This choice allows us to obtain the joint likelihood in a form explicit enough to achieve several goals, including posterior sampling of parameters and model selection. To the best of our knowledge, processes such as the Dirichlet and the Beta do not yield such formulas for the joint likelihood. Furthermore, in S2 of the Supplementary Material, we argue that, under mild conditions, a Beta process is a homogeneous Gamma process plus an independent compound Poisson process with bounded Lévy density. This suggests that our sampling algorithm of failure times can be extended to Beta processes. It also implies that Beta and Gamma processes have similar behavior at small jumps, while the former have more large jumps. Therefore, for data sets that exhibit few large jumps, these two should have similar performance as models for H. In S2, we also comment on how to extend the sampling algorithm to other types of pure jump processes.

In our simulation study and data analysis, we used DIC to determine the value of c when we fit the GP model to survival data with ties. Our simulation study showed that DIC was an effective measure in determining the true value of c. As an extension of this research, one can treat c as an unknown parameter, assign it a prior distribution, and carry out posterior inference on it. An unknown c may pose a computational challenge in sampling from its conditional posterior distribution. Theoretically, when there is a large number of ties, the PH model is not appropriate because under the model the probability of tied failure times is zero. When there are no ties in failure times, as shown in Proposition 2, the likelihood function under the GP model converges to the one under the PH model when the fitted wear process is concentrated. In practice, one can fit a GP model to survival data and then determine the “best” value of c according to DIC. When c is large, the PH model might be appropriate for fitting such survival data. Other extensions of the proposed methodology include time-dependent covariates, multivariate failure times, and non-proportional hazards models. These extensions are currently under investigation.

Supplementary Material

Supplemental 1

NIHMS723313-supplement-Supplemental_1.pdf (244.3 KB, PDF)

Acknowledgments

The authors thank the Editor, an associate editor, and the referees for their careful reviews. A suggestion of one of the referees inspired our development of the multivariate wear process model. The authors thank Dr. Anthony V. D’Amico of Brigham and Women’s Hospital and Dana-Farber Cancer Institute for providing the prostate cancer data and advice on the clinical implication of PSA failure. Dr. Chen’s research was partially supported by NIH grants #GM70335 and #P01CA142538.

Footnotes

Supplementary Materials

The online supplementary material has two sections. Section S1 contains proofs of the theoretical results of the paper. Section S2 is a discussion on possible extension to wear processes other than the Gamma processes considered in the paper.

References

Antelman G, Savage IR. Characteristic functions of stochastic integrals and reliability theory. Naval Res Logist Quart. 1965;12:199–222.
Argiento R, Guglielmi A, Pievatolo A. Bayesian density estimation and model selection using nonparametric hierarchical mixtures. Comput Statist Data Anal. 2010;54:816–832.
Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Statist Med. 2005;24:1713–1723. doi: 10.1002/sim.2059.
Breslow NE. Covariance analysis of censored survival data. Biometrics. 1974;30:89–100.
Brix A. Generalized gamma measures and shot-noise Cox processes. Adv Appl Probab. 1999;31:929–953.
Burridge J. Empirical Bayes analysis of survival time data. J Roy Statist Soc Ser B. 1981;43:65–75.
Chen MH, Shao QM. Monte Carlo estimation of Bayesian credible and HPD intervals. J Comput Graph Statist. 1999;8:69–92.
Chen MH, Ibrahim JG, Shao QM. Posterior propriety and computation for the Cox regression model with applications to missing covariates. Biometrika. 2006;93:791–807.
Chen M-H, Shao Q-M, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. Springer-Verlag; New York: 2000.
Chi Z. On exact sampling of the first passage event of Lévy process with infinite Lévy measure and bounded variation. Technical Report 28. Department of Statistics, University of Connecticut; 2012. Available at arxiv.org with article id 1207.2495.
Clayton DG. A Monte Carlo method for Bayesian inference in frailty models. Biometrics. 1991;47:467–485.
Cox DR. Regression models and life-tables. J Roy Statist Soc Ser B. 1972;34:187–220.
Cox DR. Partial likelihood. Biometrika. 1975;62:269–276.
D’Amico AV, Chen MH, Sun L, Lee WR, Mouraviev V, Robertson CN, Walther PJ, Polascik TJ, Albala DM, Moul JW. Adjuvant versus salvage radiation therapy for prostate cancer and the risk of death. BJU International. 2010;106:1618–1622. doi: 10.1111/j.1464-410X.2010.09447.x.
Damien P, Laud PW, Smith AFM. Implementation of Bayesian non-parametric inference based on beta processes. Scand J Statist. 1996;23:27–36.
Devroye L. Nonuniform Random Variate Generation. Springer-Verlag; New York: 1986.
Doksum K. Tailfree and neutral random probabilities and their posterior distributions. Ann Probab. 1974;2:183–201.
Dykstra RL, Laud P. A Bayesian nonparametric approach to reliability. Ann Stat. 1981;9:356–367.
Efron B. The efficiency of Cox’s likelihood function for censored data. J Amer Statist Assoc. 1977;72:557–565.
Epifani I, Lijoi A. Nonparametric priors for vectors of survival functions. Statist Sinica. 2010;20:1455–1484.
Epifani I, Lijoi A, Prünster I. Exponential functionals and means of neutral-to-the-right priors. Biometrika. 2003;90:791–808.
Ferguson TS. A Bayesian analysis of some nonparametric problems. Ann Stat. 1973;1:209–230.
Ferguson TS, Phadia EG. Bayesian nonparametric estimation based on censored data. Ann Stat. 1979;7:163–186.
Gaver DP Jr. Random hazard in reliability problems. Technometrics. 1963;5:211–226.
Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Appl Statist. 1992;41:337–348.
Gold LS, Kane LB, Sotoodehnia N, Rea T. Disaster events and the risk of sudden cardiac death: a Washington State investigation. Prehosp Disaster Medicine. 2007;22:313–317. doi: 10.1017/s1049023x00004921.
Hjort NL. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann Stat. 1990;18:1259–1294.
Hougaard P. Survival models for heterogeneous populations derived from stable distributions. Biometrika. 1986;73:387–396.
Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. Springer-Verlag; New York: 2001.
James LF. Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann Stat. 2005;33:1771–1799.
James LF. Poisson calculus for spatial neutral to the right processes. Ann Stat. 2006;34:416–440.
Kalbfleisch JD. Non-parametric Bayesian analysis of survival time data. J Roy Statist Soc Ser B. 1978;40:214–221.
Kim Y, Kim D. Bayesian partial likelihood approach for tied observations. J Statist Plann Inference. 2009;139:469–477.
Kim Y, Lee J. Bayesian analysis of proportional hazard models. Ann Stat. 2003;31:493–511.
Kim Y, Park JK, Kim G. Bayesian analysis for monotone hazard ratio. Lifetime Data Anal. 2011;17:302–320. doi: 10.1007/s10985-010-9181-x.
Laud PW, Damien P, Smith AFM. Bayesian nonparametric and covariate analysis of failure time data. In: Practical Nonparametric and Semiparametric Bayesian Statistics. Springer; New York: 1998. pp. 213–225.
Lee J, Kim Y. A new algorithm to generate beta processes. Comput Statist Data Anal. 2004;47:441–453.
Lijoi A, Mena RH, Prünster I. Controlling the reinforcement in Bayesian non-parametric mixture models. J Roy Statist Soc Ser B. 2007;69:715–740.
Lijoi A, Prünster I, Walker SG. Investigating nonparametric priors with Gibbs structure. Statist Sinica. 2008a;18:1653–1668.
Lijoi A, Prünster I, Walker SG. Posterior analysis for some classes of nonparametric models. J Nonparametr Statist. 2008b;20:447–457.
Liu JS. The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J Amer Statist Assoc. 1994;89:958–966.
Lo AY, Weng CS. On a class of Bayesian nonparametric estimates. II Hazard rate estimates. Ann Inst Statist Math. 1989;41:227–245.
Nieto-Barajas LE, Walker SG. Bayesian nonparametric survival analysis via Lévy driven Markov processes. Statist Sinica. 2004;14:1127–1146.
Oakes D. Bivariate survival models induced by frailties. J Amer Statist Assoc. 1989;84:487–493.
Peccati G, Prünster I. Linear and quadratic functionals of random hazard rates: an asymptotic analysis. Ann Appl Probab. 2008;18:1910–1943.
Peto R. Contribution to the discussion of paper by D. R. Cox. J Roy Statist Soc Ser B. 1972;34:205–207.
Reynolds DS, Savage IR. Random wear models in reliability theory. Adv Appl Probab. 1971;3:229–248.
Rossi PH, Berk RA, Lenihan KJ. Money, Work and Crime: Experimental Evidence. Academic Press; New York: 1980.
Sato K-I. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press; Cambridge: 1999.
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J Roy Statist Soc Ser B. 2002;64:583–639.
Walker S, Muliere P. Beta-Stacy processes and a generalization of the Pólya-urn scheme. Ann Stat. 1997;25:1762–1780.

