Abstract
Competing risks data are routinely encountered in various medical applications due to the fact that patients may die from different causes. Recently, several models have been proposed for fitting such survival data. In this paper, we develop a fully specified subdistribution model for survival data in the presence of competing risks via a subdistribution model for the primary cause of death and conditional distributions for other causes of death. Various properties of this fully specified subdistribution model have been examined. An efficient Gibbs sampling algorithm via latent variables is developed to carry out posterior computations. Deviance Information Criterion (DIC) and Logarithm of the Pseudomarginal Likelihood (LPML) are used for model comparison. An extensive simulation study is carried out to examine the performance of DIC and LPML in comparing the cause-specific hazards model, the mixture model, and the fully specified subdistribution model. The proposed methodology is applied to analyze a real dataset from a prostate cancer study in detail.
Keywords: Latent variables, Markov chain Monte Carlo, Partial likelihood, Proportional hazards
1 Introduction
Competing risks data are frequently encountered in medical applications because patients may die from different causes. Research on this topic has been active and productive. Gail (1975) proposed a multivariate model of failure times due to different causes. Tsiatis (1975) showed that for any joint distribution of n failure times there exists a joint distribution of n independent failure times such that the marginal cause-specific cumulative incidence functions from the two joint distributions coincide, which implies that the correlations between the failure times are not identifiable in the multivariate failure time model. Prentice et al. (1978) introduced a cause-specific hazards model. Larson and Dinse (1985) established a mixture model with a hazard function conditional on failure from a specific cause. Fine and Gray (1999) discussed the subdistribution model with the proportional hazards assumption to assess covariate effects on the cumulative incidence function of the cause of interest. Recently, Fan (2008) introduced a Bayesian nonparametric methodology based on the full likelihood for the proportional subdistribution hazards model. Elashoff et al. (2007, 2008) jointly modeled longitudinal measurements and survival data with competing risks, extending the cause-specific hazards model and the mixture model, respectively, and using latent random variables to link the sub-models for longitudinal measurements and survival data.
The Bayesian literature on competing risks analysis is still sparse. Fan (2008) developed Bayesian methods by extending the subdistribution model of Fine and Gray (1999) for each cause-specific risk. More recently, Hu et al. (2009) and Huang et al. (2011) developed Bayesian methods for a joint analysis of longitudinal measurements and survival data with competing risks, in which cause-specific hazards sub-models were considered for modeling survival times. As pointed out in Fine and Gray (1999), one of the attractive properties of the subdistribution model is that the effect of a covariate on the marginal probability function can be directly assessed. However, the subdistribution model proposed by Fine and Gray (1999) cannot be formally compared with the two other established models because the competing risks for other causes are not specified in their model. For this reason, we develop a fully specified subdistribution model with a subdistribution hazard for the primary cause of death and conditional hazards for other causes of death. Under this fully specified subdistribution model, we are able to establish a theoretical connection between the partial likelihood of Fine and Gray (1999) and the likelihood under the fully specified subdistribution model for the cause of primary interest when all failure times are observed. We note that this connection may not be established under the models discussed in Fan (2008). With this new development, formal model comparisons between the fully specified subdistribution model and two other established models, namely, the cause-specific hazards model (Prentice et al., 1978) and the mixture model (Larson and Dinse, 1985), can be carried out via the Bayesian Deviance Information Criterion (DIC) and the Logarithm of the Pseudomarginal Likelihood (LPML). Furthermore, the fully specified subdistribution model also facilitates an efficient implementation of the Gibbs sampling algorithm.
The rest of the article is organized as follows. In Section 2, we present a detailed development of the fully specified subdistribution model and examine its properties. The prior and posterior are discussed and an efficient Gibbs sampling algorithm via a set of latent variables is developed in Section 3. In Section 4, we briefly review the cause-specific hazards model (Prentice et al., 1978) and the mixture model (Larson and Dinse, 1985), and provide the necessary mathematical formulations for DIC and LPML under these two models and the fully specified subdistribution model. In Section 5, we present the design of a simulation study and the simulation algorithms for generating the data under the three competing risks models. To the best of our knowledge, these three competing risks models have never been formally compared. In Section 6, we analyze a real dataset from a prostate cancer study in detail. We conclude the paper with a brief discussion and some extensions of the proposed model in Section 7.
2 Subdistribution Based Models for Competing Risks
2.1 Preliminary
We consider two competing risks throughout the paper and the extension to more than two competing risks will be discussed in Section 7. Let Tj be the time to failure due to cause j for j = 1, 2 and δ be the index of cause of death. Also let T = min{T1, T2}. Assume cause 1 is the cause of primary interest. The subdistribution hazard for cause 1 defined in Gray (1988) is given as follows:
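$$
h_1(t) = \lim_{\Delta t \to 0} \frac{\Pr\{t \le T \le t + \Delta t,\ \delta = 1 \mid T \ge t \ \text{or}\ (T \le t \ \text{and}\ \delta \ne 1)\}}{\Delta t} = \frac{dF_1(t)/dt}{1 - F_1(t)} = -\frac{d \log\{1 - F_1(t)\}}{dt}, \tag{2.1}
$$
where F1(t) = Pr(T ≤ t, δ = 1). Following Fine and Gray (1999), a regression model for (2.1) under the proportional hazards assumption takes h1(t|x) = h10(t) exp(x′β1) and $F_1(t \mid x) = 1 - \exp\{-H_{10}(t)\exp(x'\beta_1)\}$ with $H_{10}(t) = \int_0^t h_{10}(u)\,du$, where x is a vector of covariates and β1 is a vector of the corresponding regression coefficients. As pointed out in Fine and Gray (1999), the covariate effects can be directly assessed on the cumulative incidence function for the primary cause under the subdistribution model. However, the distributions for failure times due to other causes are never specified in Fine and Gray (1999).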
2.2 A Fully Specified Subdistribution Model for Two Competing Risks
Let $T_j^* = 1\{\delta = j\} \times T_j + 1\{\delta \ne j\} \times \infty$, j = 1, 2, where we define ∞ × 0 = 0. Write $T = \min\{T_1^*, T_2^*\}$ as the observed time to failure. We propose the cause-specific cumulative incidence functions for both causes as follows:
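$$
F_1(t) = \Pr(T \le t,\ \delta = 1) \quad \text{and} \quad F_2(t) = \Pr(T \le t,\ \delta = 2) = \Pr(\delta = 2)\, M_2(t), \qquad M_2(t) = \Pr(T \le t \mid \delta = 2), \tag{2.2}
$$
where M2(t) is the cumulative incidence function conditional on cause 2. The fact that $F_j(\infty) = \Pr(\delta = j) < 1$ implies that Fj(t) is improper. Yet M2(t) is proper because $M_2(\infty) = \Pr(T < \infty \mid \delta = 2) = 1$. Note that in (2.2), we do not directly model the correlation between T1 and T2, which is not identifiable as shown in Tsiatis (1975). Instead, F1(t) and F2(t) are related to each other via Pr(δ = 2) = 1 − Pr(δ = 1) = 1 − F1(∞).
We apply the definition of the subdistribution hazard in Fine and Gray (1999) for cause 1, $h_1(t) = -d \log\{1 - F_1(t)\}/dt$, so that $H_1(t) = \int_0^t h_1(u)\,du = -\log\{1 - F_1(t)\}$. Then H1(t) is improper because $H_1(\infty) = -\log\{1 - F_1(\infty)\} = -\log \Pr(\delta = 2) < \infty$. We specify a proportional hazards model with an improper baseline hazard function for F1(t|x) as
$$
F_1(t \mid x) = 1 - \exp\{-H_{10}(t)\exp(x'\beta_1)\}. \tag{2.3}
$$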
For cause 2, we propose a proportional hazards model for M2(t|x) as
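$$
M_2(t \mid x) = 1 - \exp\{-H_{20}(t)\exp(x'\beta_2)\}, \qquad H_{20}(t) = \int_0^t h_{20}(u)\,du. \tag{2.4}
$$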
The model defined by (2.3) and (2.4) is thus called the fully specified subdistribution (FS) model. Under the FS model, Pr(δ = 2|x) = 1 − Pr(δ = 1|x) = exp{−H10(∞) exp(x′β1)}.
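As a rough illustration of (2.3) and (2.4), the following Python sketch evaluates the two cumulative incidence functions of the FS model for one covariate profile. The baseline cumulative hazards `H10` and `H20` and the constant `H10_INF` are hypothetical choices made only for this example (the paper's own baseline is the piecewise form of Section 3); the only structural requirement used here is that H10(∞) is finite.

```python
import math

def fs_cumulative_incidence(t, xb1, xb2, H10, H10_inf, H20):
    """Return (F1(t|x), F2(t|x)) under the FS model.

    xb1, xb2 : linear predictors x'beta1 and x'beta2
    H10, H20 : callables giving the cumulative baseline hazards
    H10_inf  : H10(infinity), which must be finite (improper baseline)
    """
    F1 = 1.0 - math.exp(-H10(t) * math.exp(xb1))      # (2.3)
    p2 = math.exp(-H10_inf * math.exp(xb1))           # Pr(delta = 2 | x)
    M2 = 1.0 - math.exp(-H20(t) * math.exp(xb2))      # (2.4)
    return F1, p2 * M2

# Hypothetical baselines, chosen only for illustration.
H10_INF = 0.5
H10 = lambda t: H10_INF * (1.0 - math.exp(-0.2 * t))  # bounded by H10_INF
H20 = lambda t: 0.1 * t                               # unbounded, so M2 is proper

F1, F2 = fs_cumulative_incidence(10.0, xb1=0.3, xb2=-0.2,
                                 H10=H10, H10_inf=H10_INF, H20=H20)
```

At very large t the two values add up to one, which reflects Pr(δ = 1|x) + Pr(δ = 2|x) = 1.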
Assume there are n observations with the vector of observed time t = (t1, t2, … , tn)′, the matrix of covariates X = (x1, x2, … , xn)′, and the vector of cause indicator δ = (δ1, δ2, … , δn)′, where δi takes possible values of 0, 1, and 2, corresponding to “censored”, “died due to cause 1”, and “died due to cause 2” for the ith subject, respectively. Under the model defined in (2.3) and (2.4), the likelihood function is given by
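$$
\begin{aligned}
L(\beta_1, \beta_2, h_{10}, h_{20} \mid t, X, \delta) = \prod_{i=1}^{n}
&\Big[ h_{10}(t_i)\, e^{x_i'\beta_1} \exp\{-H_{10}(t_i)\, e^{x_i'\beta_1}\} \Big]^{1\{\delta_i = 1\}} \\
\times &\Big[ \exp\{-H_{10}(\infty)\, e^{x_i'\beta_1}\}\, h_{20}(t_i)\, e^{x_i'\beta_2} \exp\{-H_{20}(t_i)\, e^{x_i'\beta_2}\} \Big]^{1\{\delta_i = 2\}} \\
\times &\Big[ \exp\{-H_{10}(t_i)\, e^{x_i'\beta_1}\} - \exp\{-H_{10}(\infty)\, e^{x_i'\beta_1}\} \big( 1 - \exp\{-H_{20}(t_i)\, e^{x_i'\beta_2}\} \big) \Big]^{1\{\delta_i = 0\}}.
\end{aligned} \tag{2.5}
$$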
2.3 Justification of Fine and Gray’s Partial Likelihood
The FS model is not only a natural expansion of the subdistribution model of Fine and Gray (1999) but also provides novel justifications of Fine and Gray’s partial likelihood under certain conditions. Assume there are n complete observations with the vector of observed time t = (t1, t2, … , tn)′, the matrix of covariates X = (x1, x2, … , xn)′, and the vector of cause indicator δ = (δ1, δ2, … , δn)′. The partial likelihood of β1 for cause 1 given in Fine and Gray (1999) is of the form
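$$
L_p(\beta_1) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i'\beta_1)}{\sum_{l \in \mathcal{R}_i} \exp(x_l'\beta_1)} \right]^{1\{\delta_i = 1\}}, \tag{2.6}
$$
where $\mathcal{R}_i$ is defined as a special risk set at failure time ti given by
$$
\mathcal{R}_i = \{\, l : t_l \ge t_i \ \text{or}\ (t_l \le t_i \ \text{and}\ \delta_l \ne 1) \,\}. \tag{2.7}
$$
Note that the risk set $\mathcal{R}_i$ is quite different from the risk set in Cox’s partial likelihood (Cox, 1972, 1975), as the patients who died from cause 2 before ti are also included in $\mathcal{R}_i$.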
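To make the special risk set concrete, here is a minimal Python sketch of the log partial likelihood for complete (uncensored) data with two causes. The function name and argument layout are illustrative assumptions, not code from the paper or from the cmprsk package, and ties and censoring weights are not handled.

```python
import math

def fine_gray_log_partial_likelihood(beta, times, causes, X):
    """Log partial likelihood for cause 1 with complete data.

    beta   : list of regression coefficients
    times  : failure times t_i
    causes : cause indicators in {1, 2} (no censoring)
    X      : list of covariate rows x_i
    """
    n = len(times)
    linpred = [sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    loglik = 0.0
    for i in range(n):
        if causes[i] != 1:
            continue  # only cause-1 failures contribute a factor
        # Risk set: still at risk at t_i, or already failed from another cause.
        denom = sum(math.exp(linpred[l]) for l in range(n)
                    if times[l] >= times[i] or causes[l] != 1)
        loglik += linpred[i] - math.log(denom)
    return loglik

# Three subjects; the cause-2 failure at time 2 stays in the risk set at time 3.
value = fine_gray_log_partial_likelihood([0.0], [1.0, 2.0, 3.0], [1, 2, 1],
                                         [[0.0], [0.0], [0.0]])
```

In this toy example the risk sets have sizes 3 and 2, so the value is −log 6, whereas the ordinary Cox risk sets would have sizes 3 and 1.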
Three theorems are established below to show that the partial likelihood (2.6) can be obtained under the FS regression model via three different approaches with detailed proofs given in Appendix B. Denote D1 as the number of deaths due to cause 1. Let yi = ti when δi = 1, and yi = ∞ when δi ≠ 1. Write y = (y(1), y(2), … , y(n))′, where 0 = y(0) < y(1) < ⋯ < y(D1) < y(D1+1) = ⋯ = y(n) = ∞. Since all observations are failure times, the likelihood function of β1 for cause 1 given the n complete observations is the part of the likelihood function in (2.5) involving β1:
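$$
L(\beta_1, h_{10} \mid y, X, \delta) = \prod_{i=1}^{n} \{h_{10}(y_i)\exp(x_i'\beta_1)\}^{1\{\delta_i = 1\}} \exp\{-H_{10}(y_i)\exp(x_i'\beta_1)\}. \tag{2.8}
$$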
Theorem 1 With n complete observations, assume that in the FS model the baseline hazard rate h10 is zero after the last failure time due to cause 1. The partial likelihood function (2.6) can be attained by the profile likelihood approach, which is to plug in the profile maximum likelihood estimator of h10 in the likelihood function L(β1, h10|y, X, δ).
Theorem 2 With n complete observations, assume that in the FS model the baseline hazard rate h10(t) is zero after the last failure time due to cause 1 and the prior of h10(t) is degenerate at 0 everywhere except at the yi’s with δi = 1. Let h10(yi) = λi when δi = 1 and λ = (λ1, … , λD1)′. We further assume independent Jeffreys-type priors for the λi’s, i.e., $\pi(\lambda) \propto \prod_{i=1}^{D_1} \lambda_i^{-1}$. Then, the partial likelihood function (2.6) is obtained by
$$
L_p(\beta_1) \propto \int L(\beta_1, h_{10} \mid y, X, \delta)\, \pi(\lambda)\, d\lambda,
$$
where L(β1, h10|y, X, δ) is defined in (2.8).
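The key integral behind Theorem 2 can be outlined as follows. This is only a sketch, not the proof in Appendix B, and it assumes that (2.8) has the form $\prod_i \{h_{10}(y_i) e^{x_i'\beta_1}\}^{1\{\delta_i = 1\}} \exp\{-H_{10}(y_i) e^{x_i'\beta_1}\}$ and that $\mathcal{R}_i$ denotes the risk set in (2.7); the shorthand $S_i$ is introduced here for convenience.

```latex
% With h_{10} concentrated on the cause-1 failure times,
% H_{10}(y_l) = \sum_{i:\,\delta_i = 1,\ y_i \le y_l} \lambda_i, and y_l = \infty when \delta_l \ne 1.
\begin{align*}
\sum_{l=1}^{n} H_{10}(y_l)\, e^{x_l'\beta_1}
  &= \sum_{i:\,\delta_i = 1} \lambda_i \sum_{l:\, y_l \ge y_i} e^{x_l'\beta_1}
   = \sum_{i:\,\delta_i = 1} \lambda_i\, S_i,
  \qquad S_i = \sum_{l \in \mathcal{R}_i} e^{x_l'\beta_1}, \\
\int_0^\infty \lambda_i\, e^{x_i'\beta_1} \exp(-\lambda_i S_i)\, \lambda_i^{-1}\, d\lambda_i
  &= \frac{e^{x_i'\beta_1}}{S_i}.
\end{align*}
% Taking the product over the D_1 cause-1 failures gives the partial likelihood (2.6).
```

The first identity uses the fact that, with complete data, the set {l : y_l ≥ y_i} contains every subject who failed from cause 2 (y_l = ∞) together with the cause-1 failures at or after y_i, which is exactly the risk set in (2.7).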
Theorem 3 With n complete observations, assume that in the FS model the baseline hazard rate h10(t) is zero after the last failure time due to cause 1 and that H10(t) has a Gamma process prior, i.e., h1i ~ Gamma(c0h0i, c0), where c0 > 0, h1i = H10(y(i)) − H10(y(i−1)) for i = 1, 2, … , D1, h1, D1+1 = ⋯ = h1n = 0, h0i = H0(y(i)) − H0(y(i−1)), H0(y) is increasing and differentiable at y1, … , yD1 with H0(0) = 0, and the h1i are independent of each other. Then the partial likelihood function (2.6) can be approximated by
where g(c0) is a function of c0, which is free from β1.
Fine and Gray (1999) also showed that the partial likelihood arises from complete data using a certain reduced data structure, without any assumptions on the models for the subdistribution for other causes. The results established in the above theorems give insight into Fine and Gray’s partial likelihood.
3 Prior, Posterior, and Computational Development
3.1 Prior and Posterior
For the sake of simpler calculation, a special case of the gamma process prior for the cumulative baseline hazard function assumed in Theorem 3 is considered here for the FS model. Assume the baseline hazard functions respectively have piecewise constant forms, which are, with Kj + 1 partitions of the time axis, 0 = sj0 < sj1 < sj2 < ⋯ < sjKj < ∞,
(3.1) |
To construct posterior distributions for the unknown parameters, we assume βj follows an improper uniform prior, λjk follows a Jeffreys-type prior, and λ1,K1+1 follows a gamma prior. We further assume that βj, λjk, and λ1,K1+1 are independent for k = 1, … , Kj and j = 1, 2. Let λ1 = (λ11, λ12, … , λ1,K1+1)′ and λ2 = (λ21, λ22, … , λ2K2)′, then the joint prior of (β1, β2, λ1, λ2) is specified as follows
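$$
\pi(\beta_1, \beta_2, \lambda_1, \lambda_2) \propto \pi(\lambda_{1,K_1+1}) \prod_{j=1}^{2} \prod_{k=1}^{K_j} \frac{1}{\lambda_{jk}}, \tag{3.2}
$$
where $\pi(\lambda_{1,K_1+1}) \propto \lambda_{1,K_1+1}^{a-1} \exp(-b\,\lambda_{1,K_1+1})$ with a > 0 and b > 0, which are prespecified hyperparameters. The joint posterior distribution is given by
$$
\pi(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta) \propto L(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta)\, \pi(\beta_1, \beta_2, \lambda_1, \lambda_2), \tag{3.3}
$$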
where L(β1, β2, λ1, λ2|t, X, δ) is given by (2.5) with h10(t) and h20(t) defined in (3.1). Recently, Wang et al. (2012) established a theoretical connection between the gamma process prior specified for the cumulative baseline function and the independent gamma priors for the λjk for the interval-censored survival data. Also, the independent gamma priors assumed for the baseline hazard function approximate the gamma process prior when c0 → 0+. Thus, the priors in (3.2) for the λjk can be considered as a special case of the gamma process priors specified for the cumulative baseline function in this sense.
Let νjik = 1 if the ith subject failed or was censored in the kth interval (sj,k−1, sjk], and 0 otherwise for k = 1, 2, … , Kj +1, and i = 1, 2, … , n, where sj,Kj+1 = ∞, for j = 1, 2. Also let Xj be a matrix with its ith row equal to for j = 1, 2. Then, we are led to the following theorem regarding the propriety of the posterior distribution of (β1, β2, λ1, λ2) with an improper prior given by (3.2).
Theorem 4 Assume that (i) when δi > 0, ti > 0 and $\sum_{i=1}^{n} 1\{\delta_i = j\}\, \nu_{jik} > 0$ for k = 1, 2, … , Kj for j = 1, 2, and (ii) X1 and X2 are of full rank. Then, the posterior distribution π(β1, β2, λ1, λ2|t, X, δ) in (3.3) with the prior specified in (3.2) is proper.
A proof of Theorem 4 is given in Appendix B. Theorem 4 gives very mild conditions for ensuring propriety of the joint posterior distribution of (β1, β2, λ1, λ2) under the fully specified subdistribution model. Conditions (i) and (ii) essentially require that all event times are strictly positive, that at least one event occurs in each chosen interval (sj,k−1, sjk], and that the corresponding covariate matrix is of full rank. Notice that we do not require any events in the last interval (s1K1, ∞) for the primary cause, as we specify a proper prior for λ1,K1+1. These conditions are satisfied in most applications and are easy to check. Under certain additional conditions, we can also show that when π(λ1,K1+1) ∝ 1 in (3.2), the resulting posterior of (β1, β2, λ1, λ2) is still proper. We also note that, following Chen et al. (2006), posterior propriety under full gamma process priors on H10 and H20 with c0 > 0 can be established; however, stronger propriety conditions are required in this case.
3.2 Computational Development
Due to the complexity of the likelihood structure of the FS model, an analytical evaluation of the posterior distribution does not appear to be possible. In order to carry out posterior inference, we adopt Markov chain Monte Carlo (MCMC) methods and develop a computationally efficient Gibbs sampling algorithm to sample from the posterior distribution in (3.3).
In order to avoid the complicated form in the censored part of the likelihood function, we introduce a latent variable ηi to indicate whether or not subject i would eventually fail from cause 1 or 2 and another latent variable ui to be the failure time such that ui ≥ ti when subject i was censored at ti and ηi = 1. Then the complete likelihood is constructed as
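$$
\begin{aligned}
L(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta, \eta, u) =
&\prod_{i:\,\delta_i = 1} h_{10}(t_i)\, e^{x_i'\beta_1} \exp\{-H_{10}(t_i)\, e^{x_i'\beta_1}\}
 \prod_{i:\,\delta_i = 2} \exp\{-H_{10}(\infty)\, e^{x_i'\beta_1}\}\, h_{20}(t_i)\, e^{x_i'\beta_2} \exp\{-H_{20}(t_i)\, e^{x_i'\beta_2}\} \\
\times &\prod_{i:\,\delta_i = 0} \Big[ h_{10}(u_i)\, e^{x_i'\beta_1} \exp\{-H_{10}(u_i)\, e^{x_i'\beta_1}\}\, 1\{u_i \ge t_i\} \Big]^{\eta_i}
 \Big[ \exp\{-H_{10}(\infty)\, e^{x_i'\beta_1}\} \exp\{-H_{20}(t_i)\, e^{x_i'\beta_2}\} \Big]^{1 - \eta_i},
\end{aligned} \tag{3.4}
$$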
where η = (ηi : δi = 0, 1 ≤ i ≤ n) and u = (ui : δi = 0, ηi = 1, 1 ≤ i ≤ n). Based on the complete data likelihood, the augmented posterior of (β1, β2, λ1, λ2, η, u) is given by
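$$
\pi(\beta_1, \beta_2, \lambda_1, \lambda_2, \eta, u \mid t, X, \delta) \propto L(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta, \eta, u)\, \pi(\beta_1, \beta_2, \lambda_1, \lambda_2), \tag{3.5}
$$
where h10(t) and h20(t) are defined in (3.1). It is easy to show that
$$
\sum_{\eta} \int \pi(\beta_1, \beta_2, \lambda_1, \lambda_2, \eta, u \mid t, X, \delta)\, du = \pi(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta),
$$
where π(β1, β2, λ1, λ2|t, X, δ) is the posterior given in (3.3). This result ensures that whenever (β1, β2, λ1, λ2, η, u) ~ π(β1, β2, λ1, λ2, η, u|t, X, δ), then (β1, β2, λ1, λ2) ~ π(β1, β2, λ1, λ2|t, X, δ).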
The introduction of the latent variables η and u greatly facilitates a convenient implementation of the Gibbs sampling algorithm. To develop an efficient Gibbs sampling algorithm, we use the collapsed Gibbs method of Liu (1994). First, we group (β2, λ2, η, u) together. Then, the Gibbs sampling algorithm requires sampling from the following conditional posterior distributions in turn: (i) [β1|λ1, β2, λ2, η, u, t, X, δ]; (ii) [λ1|β1, β2, λ2, η, u, t, X, δ]; and (iii) [β2, λ2, η, u|β1, λ1, t, X, δ]. For (i), the conditional posterior density of β1 given (λ1, β2, λ2, η, u, t, X, δ) is log-concave in each component of β1. Thus, we can use the adaptive rejection algorithm of Gilks and Wild (1992) to sample β1. For (ii), it can be shown that given (β1, β2, λ2, η, u, t, X, δ), the λ1k’s are conditionally independent and each of them follows a gamma distribution. Thus, sampling λ1 is straightforward. For (iii), it is easy to see that
$$
[\beta_2, \lambda_2, \eta, u \mid \beta_1, \lambda_1, t, X, \delta] = [\beta_2, \lambda_2, \eta \mid \beta_1, \lambda_1, t, X, \delta]\; [u \mid \beta_1, \lambda_1, \beta_2, \lambda_2, \eta, t, X, \delta]. \tag{3.6}
$$
In (3.6), we collapse out u in the conditional distribution [β2, λ2, η|β1, λ1, t, X, δ]. However, jointly sampling (β2, λ2, η) from their conditional distribution is not possible. Thus, we need to run a sub-Gibbs sampling algorithm to sample (β2, λ2, η) from this conditional posterior distribution. This approach is called the modified collapsed Gibbs sampling algorithm. As shown in Chen et al. (2000), the modified collapsed Gibbs sampling algorithm yields the target posterior as its stationary distribution. The sub-Gibbs sampling algorithm requires sampling from the following three additional conditional posterior distributions in turn: (iiia) [β2|β1, λ1, λ2, η, t, X, δ]; (iiib) [λ2|β1, λ1, β2, η, t, X, δ]; and (iiic) [η|β1, λ1, β2, λ2, t, X, δ]. For (iiia), the conditional posterior density of β2 is log-concave in each component of β2 and we again use the adaptive rejection algorithm of Gilks and Wild (1992) to sample β2. For (iiib), the λ2k’s are conditionally independent and each of them follows a gamma distribution. Finally, for (iiic), the ηi’s are conditionally independent and each ηi follows a Bernoulli distribution. The technical details of sampling η from [η|β1, λ1, β2, λ2, t, X, δ] and sampling u from [u|β1, λ1, λ2, η, t, X, δ] are given in Appendix C.
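Step (iiic) can be illustrated with a small Python sketch. The exact conditional used by the authors is in Appendix C and is not reproduced here; the probability below is what Bayes' rule gives under (2.3) and (2.4) once u is integrated out, assuming the coding ηi = 1 for eventual failure from cause 1 and ηi = 0 for cause 2. The function names and arguments are hypothetical.

```python
import math
import random

def prob_eta_cause1(H10_t, H10_inf, H20_t, xb1, xb2):
    """Pr(eta_i = 1 | censored at t_i, parameters) for one subject.

    H10_t, H20_t : cumulative baseline hazards evaluated at the censoring time
    H10_inf      : H10(infinity), finite under the FS model
    xb1, xb2     : linear predictors x_i'beta1 and x_i'beta2
    """
    e1, e2 = math.exp(xb1), math.exp(xb2)
    # Pr(T > t, delta = 1 | x): cause-1 failure after the censoring time
    w1 = math.exp(-H10_t * e1) - math.exp(-H10_inf * e1)
    # Pr(T > t, delta = 2 | x): cause-2 failure after the censoring time
    w2 = math.exp(-H10_inf * e1) * math.exp(-H20_t * e2)
    return w1 / (w1 + w2)

def draw_eta(p, rng):
    """Bernoulli draw: 1 means eventual cause-1 failure, 0 means cause 2."""
    return 1 if rng.random() < p else 0

rng = random.Random(2024)
p = prob_eta_cause1(H10_t=0.2, H10_inf=0.5, H20_t=0.3, xb1=0.1, xb2=-0.4)
eta = draw_eta(p, rng)
```

The two weights sum to the censored-subject survival probability, so the draw is consistent with marginalizing η and u back to the observed-data likelihood.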
4 Model Comparison
The cause-specific hazards model and the mixture model are two well-established models for competing risks survival data. We discuss details of these two models and compare them with the FS model theoretically, in simulation, and in an analysis of a real dataset.
4.1 Other Models for Comparison
Cause-specific Hazards Model
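As discussed in Gaynor et al. (1993), the cause-specific hazard function is denoted by
$$
h_{Cj}(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\ \delta = j \mid T \ge t)}{\Delta t}.
$$
Under the proportional hazards assumption, hCj(t|x) = hCj0(t) exp(x′βj) and the cumulative incidence function of cause j is given by
$$
F_j(t \mid x) = \int_0^t h_{Cj0}(u) \exp(x'\beta_j) \exp\Big\{ -\sum_{l=1}^{2} H_{Cl0}(u) \exp(x'\beta_l) \Big\}\, du, \qquad H_{Cl0}(u) = \int_0^u h_{Cl0}(v)\, dv.
$$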
The covariate effects can be directly assessed on the cause-specific hazard functions. But they cannot be directly estimated by βj alone on the cause-specific cumulative incidence function of cause j, as the cause-specific cumulative incidence function of cause 1 also depends on regression coefficients β2 for cause 2.
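The dependence of the cause-1 cumulative incidence on the cause-2 hazard can be checked numerically. The sketch below assumes constant cause-specific hazards h1 and h2 (an assumption made purely for illustration, in which case the integral has a closed form) and compares a midpoint-rule approximation with that closed form.

```python
import math

def cif_cause1(t, h1, h2, steps=10000):
    """Midpoint-rule approximation of F1(t) = int_0^t h1 * exp(-(h1 + h2) u) du."""
    du = t / steps
    return sum(h1 * math.exp(-(h1 + h2) * (k + 0.5) * du)
               for k in range(steps)) * du

def cif_cause1_closed(t, h1, h2):
    """Closed form of the same integral for constant hazards."""
    return h1 / (h1 + h2) * (1.0 - math.exp(-(h1 + h2) * t))

# Same cause-1 hazard, two different cause-2 hazards.
low_competition = cif_cause1(5.0, h1=0.1, h2=0.2)
high_competition = cif_cause1(5.0, h1=0.1, h2=0.4)
```

With h1 held fixed, raising h2 lowers F1, which is why β1 alone does not summarize a covariate's effect on the cumulative incidence under this model.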
Mixture Model
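Larson and Dinse (1985) discussed the mixture model. Assume the types of cause-specific failures follow a multinomial distribution. Define the probability of failing from cause j as pj = Pr(δ = j) for j = 1, 2, where p1 + p2 = 1. Define hMj(t) as the hazard function conditional on failure from cause j,
$$
h_{Mj}(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t,\ \delta = j)}{\Delta t}.
$$
Under the proportional hazards assumption, hMj(t|x) = hMj0(t) exp(x′βj) and the cause-specific cumulative incidence function is given by
$$
F_j(t \mid x) = p_j \big[ 1 - \exp\{-H_{Mj0}(t) \exp(x'\beta_j)\} \big], \qquad H_{Mj0}(t) = \int_0^t h_{Mj0}(u)\, du.
$$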
It is observed that both cause-specific hazard functions and cause-specific cumulative incidence functions depend on regression coefficients of the corresponding cause as well as the probability of failing from that cause.
Notice that the definition of the subdistribution hazard is different from the definition of the cause-specific hazard function and from the definition of the conditional hazard function in the mixture model. Thus, if the proportional structure on the hazard function of one model is true, the hazard functions of the other two models cannot satisfy the Cox proportional hazards assumption.
4.2 Model Comparison Measures
The Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002) and the Logarithm of the Pseudomarginal Likelihood (LPML) (Ibrahim et al., 2001) are used here to compare the cause-specific hazards model, the mixture model, and the FS model. Let θ denote a collection of model parameters. DIC is defined as DIC = D(θ̂) + 2pD, where D(θ) is a deviance function, pD = D̄ − D(θ̂) is the effective number of parameters, and D̄ and θ̂ are the posterior means of D(θ) and θ, respectively. LPML is given by $\mathrm{LPML} = \sum_{i=1}^{n} \log(\mathrm{CPO}_i)$, where the Conditional Predictive Ordinate (CPO) is CPOi = f(ti|xi, D(i)) = ∫ f(ti|θ, xi)π(θ|D(i)) dθ, D(i) is the data with the ith observation deleted, and π(θ|D(i)) is the posterior distribution based on the data D(i). According to Gelfand and Dey (1994), LPML asymptotically includes a dimensional penalty similar to that of AIC.
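Both criteria are usually computed from MCMC output. The Python sketch below shows one common way to do so: DIC from the deviance evaluated at each draw and at the posterior mean, and LPML with each CPO estimated by the harmonic mean of the likelihood ordinates over the posterior draws. The harmonic-mean estimator is a standard Monte Carlo choice assumed here for illustration; the source does not state which estimator was used, and the function names are hypothetical.

```python
import math

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) from deviance values at MCMC draws and at theta-hat."""
    d_bar = sum(deviance_draws) / len(deviance_draws)   # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean            # effective number of parameters
    return deviance_at_posterior_mean + 2.0 * p_d, p_d

def lpml(lik_draws):
    """LPML from likelihood ordinates.

    lik_draws[m][i] is f(t_i | theta_m, x_i) for MCMC draw m and subject i.
    CPO_i is estimated by the harmonic mean over draws:
        CPO_i ~= 1 / mean_m { 1 / f(t_i | theta_m, x_i) }.
    """
    n_draws = len(lik_draws)
    n_obs = len(lik_draws[0])
    total = 0.0
    for i in range(n_obs):
        inv_mean = sum(1.0 / lik_draws[m][i] for m in range(n_draws)) / n_draws
        total += -math.log(inv_mean)
    return total

dic_value, p_d = dic([10.0, 12.0, 14.0], 11.0)
lpml_value = lpml([[0.5, 0.25], [0.5, 0.25]])
```

A smaller DIC and a larger LPML indicate a better-fitting model. In practice the ordinates are often accumulated on the log scale to avoid underflow, which this short sketch does not do.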
For the proposed FS model, θ = (β1, λ1, β2, λ2), and the deviance function D(θ) is given by D(θ) = −2 log L(β1, β2, h10, h20|t, X, δ), where L(β1, β2, h10, h20|t, X, δ) is given by (2.5) and h10(t) and h20(t) are defined in (3.1).
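For the cause-specific hazards model, we assume piecewise exponential models for hCj0(t), j = 1, 2. The deviance function is defined by
$$
D(\theta) = -2 \log L(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta),
$$
where
$$
L(\beta_1, \beta_2, \lambda_1, \lambda_2 \mid t, X, \delta) = \prod_{j=1}^{2} \prod_{i=1}^{n} \{h_{Cj0}(t_i) \exp(x_i'\beta_j)\}^{1\{\delta_i = j\}} \exp\{-H_{Cj0}(t_i) \exp(x_i'\beta_j)\}.
$$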
From the above likelihood function, it is easy to see that under the cause-specific hazards model, DIC=DIC1+DIC2, where DICj is the DIC of the survival model with single cause j by treating other causes of death as censored.
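For the mixture model, let p1i denote the probability of death due to cause 1 for the ith subject, p1 = (p11, p12, … , p1n)′. The likelihood function is
$$
L(\beta_1, \beta_2, \lambda_1, \lambda_2, p_1 \mid t, X, \delta) = \prod_{i=1}^{n} \big[ p_{1i}\, f_{M1}(t_i \mid x_i) \big]^{1\{\delta_i = 1\}} \big[ (1 - p_{1i})\, f_{M2}(t_i \mid x_i) \big]^{1\{\delta_i = 2\}} \big[ p_{1i}\, S_{M1}(t_i \mid x_i) + (1 - p_{1i})\, S_{M2}(t_i \mid x_i) \big]^{1\{\delta_i = 0\}},
$$
where $S_{Mj}(t \mid x) = \exp\{-H_{Mj0}(t)\exp(x'\beta_j)\}$ and $f_{Mj}(t \mid x) = h_{Mj0}(t)\exp(x'\beta_j)\, S_{Mj}(t \mid x)$.
Assume $p_{1i} = \exp(z_i'\phi) / \{1 + \exp(z_i'\phi)\}$, where zi is a vector of covariates, which may be a subset of xi.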
For the cause-specific hazards model and mixture model, the form of the baseline hazard functions hCj0 and hMj0 and the prior of (β1, β2, λ1, λ2) are assumed in the same way as those in the FS model excluding the last piece h10(t) when t ≥ s1,K1. Similar to the FS model, the propriety of the joint posterior distribution of (β1, β2, λ1, λ2) can also be established under an improper joint prior π(β1, β2, λ1, λ2), which is similar to (3.2).
5 A Simulation Study
In this section, we carry out a simulation study to compare the cause-specific hazards model, mixture model, and fully specified subdistribution model via DIC and LPML. To generate data from the FS model, we assume there are two causes with cause 1 to be the cause of interest, and there are two covariates with true parameters β1 = (β11, β12)′ and β2 = (β21, β22)′, which are chosen such that Pr(δ = 1) is around 1/3. Covariates xi1 are generated from N(0, 1) and xi2 given xi1 are generated from Bernoulli(p(xi1)), where . Assume the failure times of two causes follow distinct piecewise exponential distributions, where for cause 1 the time is partitioned as s10 = 0, s11 = 8, s12 = 12, s13 = 15, s14 = 16, and s15 = 17 with corresponding λ1 = (0.001, 0.01, 0.03, 0.02, 0.3)′, and for cause 2 the time is partitioned as s20 = 0, s21 = 3, s22 = 5, s23 = 8, s24 = 10, s25 = 11, s26 = 12, s27 = 13, s28 = 15, s29 = 17, and s210 = 18 with corresponding λ2 = (0.001, 0.005, 0.01, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, 1.0)′. Generate ri from U(0, 1). If ri < Pr(δi = 1|xi), then generate of cause 1 from a piecewise exponential distribution
If ri ≥ Pr(δi = 1|xi), then generate of cause 2 from a piecewise exponential distribution
The censoring time ci is generated from a uniform distribution, U(ac, bc), where 0 < ac < bc are chosen so that the proportion of death is around 2/5, and then ti is taken to be . Note that under the FS model, .
For the mixture model, the settings of model parameters and covariates are similar to those for the FS model, while Pr(δ = 1) is calculated by , where ϕ = (ϕ1, ϕ2)′ is chosen such that p1 is around 1/3. If ri from U(0, 1) falls in (0, Pr(δ = 1)), then of cause 1 is generated from a piecewise exponential distribution
Otherwise, of cause 2 is generated from a piecewise exponential distribution
For the cause-specific hazards model, the parameter and covariates settings are also similar as above. According to Lu and Tsiatis (2001), ti is generated by ti = min{t1i, t2i, ci}, where t1i, t2i, and ci are generated independently, respectively, from a piecewise exponential distribution
a piecewise exponential distribution
and a uniform distribution, U(ac, bc), such that the proportion of death is around 2/5.
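The data-generating step for the cause-specific hazards model can be sketched with inverse-transform sampling from a piecewise exponential distribution. This is a generic illustration under stated assumptions, not the authors' simulation code: the function names, the convention that the last rate extends to infinity, and the example cut points and rates are all hypothetical, and all rates are assumed strictly positive.

```python
import math
import random

def inverse_piecewise_cumhaz(target, starts, rates):
    """Solve H(t) = target, where the hazard is rates[k] on the piece
    beginning at starts[k]; the last piece extends to infinity."""
    acc = 0.0
    for k, rate in enumerate(rates):
        last = (k == len(rates) - 1)
        if not last:
            width = starts[k + 1] - starts[k]
            if acc + rate * width < target:
                acc += rate * width        # target lies beyond this piece
                continue
        return starts[k] + (target - acc) / rate

def draw_piecewise_exponential(starts, rates, linpred, rng):
    """Draw T with hazard rates[k] * exp(linpred): H(T) * exp(linpred) ~ Exp(1)."""
    e = -math.log(1.0 - rng.random())      # Exp(1) draw; 1 - U avoids log(0)
    return inverse_piecewise_cumhaz(e / math.exp(linpred), starts, rates)

def draw_cause_specific(starts1, rates1, xb1, starts2, rates2, xb2,
                        c_lo, c_hi, rng):
    """Observed (t, delta) as the minimum of two latent times and a censoring time."""
    t1 = draw_piecewise_exponential(starts1, rates1, xb1, rng)
    t2 = draw_piecewise_exponential(starts2, rates2, xb2, rng)
    c = rng.uniform(c_lo, c_hi)
    t = min(t1, t2, c)
    delta = 0 if t == c else (1 if t == t1 else 2)
    return t, delta

rng = random.Random(1)
sample = [draw_cause_specific([0.0, 8.0], [0.01, 0.05], 0.2,
                              [0.0, 5.0], [0.02, 0.10], -0.1,
                              5.0, 20.0, rng) for _ in range(200)]
```

The censoring bounds would then be tuned, as in the text, until the observed proportion of deaths is close to the target.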
A total of 500 datasets with n = 500 observations each were generated from each of the three models as described above. Each simulated dataset was fitted by all three models and the corresponding DICs and LPMLs were calculated. Table 1 reports the proportion of simulated datasets for which each model was selected as best. In every scenario, the model most frequently chosen by DIC and LPML is the true model from which the data were simulated, although the proportion of correct selections ranges from about 0.48 to 0.92. The mean DIC and mean LPML under each model and scenario are shown in Table 4 of Appendix A. The true value, estimate, standard deviation, mean square error, and coverage probability of each covariate coefficient when estimating from the true model for all three models are given in Table 5 of Appendix A. It is observed that the standard deviations and mean square errors are moderate and stable, and that the coverage probabilities are always around 0.95 under all three scenarios of (K1, K2).
Table 1.
| True model | (K1, K2) | DIC: FS best | DIC: C best | DIC: M best | LPML: FS best | LPML: C best | LPML: M best |
|---|---|---|---|---|---|---|---|
| FS | (5, 10) | 0.626 | 0.002 | 0.372 | 0.608 | 0.000 | 0.392 |
| FS | (10, 20) | 0.764 | 0.004 | 0.232 | 0.778 | 0.008 | 0.214 |
| FS | (15, 30) | 0.892 | 0.006 | 0.102 | 0.918 | 0.006 | 0.076 |
| C | (5, 10) | 0.194 | 0.524 | 0.282 | 0.190 | 0.482 | 0.328 |
| C | (10, 20) | 0.130 | 0.722 | 0.148 | 0.132 | 0.716 | 0.152 |
| C | (15, 30) | 0.156 | 0.732 | 0.112 | 0.158 | 0.726 | 0.116 |
| M | (5, 10) | 0.096 | 0.044 | 0.860 | 0.100 | 0.072 | 0.828 |
| M | (10, 20) | 0.104 | 0.094 | 0.802 | 0.110 | 0.142 | 0.748 |
| M | (15, 30) | 0.098 | 0.154 | 0.748 | 0.120 | 0.192 | 0.688 |
To further compare the models, the median and interquartile range (IQR) of the pairwise differences of DIC and LPML for two models under each scenario are calculated, and the corresponding boxplots are shown in Figure 1. When the data were simulated from another model, the mixture model and fully specified subdistribution model fit better than the cause-specific hazards model.
6 Analysis of the Prostate Cancer Data
A subset of data from the prostate cancer studies published in Choueiri et al. (2010) is analyzed using the three models. The response variable was the time from prostate-specific antigen (PSA) failure to death or to the last follow-up, whichever came first. The median follow-up time after PSA failure was 11.2 years with IQR = (5.8, 16.0). The sample size was 546, with 54 prostate cancer deaths and 151 deaths from other causes. Seven covariates were considered in the analysis, including patient’s age at the date of PSA failure, the natural logarithm of PSA (logpsa), prostatectomy Gleason score (7 (GS7 = 1) or otherwise (GS7 = 0)), prostatectomy Gleason score (8 to 10 (GS8H = 1) or otherwise (GS8H = 0)), prostatectomy T classification (T3 and higher (T3 = 1) or otherwise (T3 = 0)), surgical margin status (positive (margin = 1) or otherwise (margin = 0)), and PSA doubling time (DT) (less than 6 months (DT6 = 1) or otherwise (DT6 = 0)). In this analysis, death from prostate cancer is the primary cause and the other cause is death from any cause other than prostate cancer. We let β1 = (β11, β12, … , β17) denote the vector of the corresponding regression coefficients for the prostate cancer cause of death.
Different values of K1 and K2 were tried out for optimizing the model fitting. In the mixture model, z is the vector of all covariates without preselecting for a fair comparison between the models. The values of DIC, pD, and LPML for the 3 × 3 combinations of K1 and K2 under all three models are reported in Table 2. We see that (K1, K2) = (15, 20) is the optimum combination of (K1, K2) for almost all models, and that the fully specified subdistribution model always outperforms the other two models by achieving the smallest DIC and the largest LPML.
Table 2.
| K1 | K2 | C Model DIC | C Model pD | C Model LPML | M Model DIC | M Model pD | M Model LPML | FS Model DIC | FS Model pD | FS Model LPML |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 1586.5 | 34.3 | −795.3 | 1580.0 | 42.5 | −791.9 | 1565.3 | 34.4 | −784.2 |
| 10 | 20 | 1578.6 | 44.6 | −792.5 | 1574.9 | 52.7 | −791.2 | 1560.1 | 44.7 | −782.9 |
| 10 | 30 | 1600.8 | 55.0 | −805.6 | 1596.0 | 63.0 | −803.1 | 1581.5 | 55.1 | −795.0 |
| 15 | 10 | 1584.2 | 39.9 | −795.7 | 1576.4 | 48.3 | −791.9 | 1564.4 | 39.8 | −785.0 |
| 15 | 20 | 1575.8 | 49.9 | −792.7 | 1571.1 | 58.4 | −790.3 | 1558.9 | 49.9 | −783.5 |
| 15 | 30 | 1598.0 | 60.3 | −805.2 | 1592.1 | 68.8 | −802.6 | 1580.1 | 60.5 | −795.4 |
| 20 | 10 | 1599.8 | 45.1 | −805.4 | 1592.6 | 53.6 | −801.4 | 1579.4 | 45.2 | −794.6 |
| 20 | 20 | 1592.0 | 55.5 | −802.6 | 1587.9 | 64.1 | −800.3 | 1574.7 | 55.8 | −793.4 |
| 20 | 30 | 1614.1 | 65.9 | −815.2 | 1609.6 | 74.7 | −813.0 | 1595.4 | 65.9 | −805.2 |
The subdistribution model of Fine and Gray (1999) was also fit to the data. For prostate cancer death, the estimates, standard errors (SEs), and 95% confidence intervals (CIs) of β1 under the subdistribution model of Fine and Gray (1999) and the posterior means (estimates), posterior standard deviations (SDs), and 95% highest posterior density (HPD) intervals of β1 under the FS model for the scenario of (K1, K2) = (15, 20) are shown in Table 3. We see from Table 3 that the estimates were quite close and that the two models gave consistent conclusions in terms of significance of covariates at a significance level of 0.05. Note that the estimates of β1 under the subdistribution model of Fine and Gray (1999) were computed using the R package cmprsk.
Table 3.
| Variable | Subdistribution Model: Estimate | Subdistribution Model: SE | Subdistribution Model: 95% CI | FS Model: Estimate | FS Model: SD | FS Model: 95% HPD Interval |
|---|---|---|---|---|---|---|
| age | 0.017 | 0.022 | (−0.026, 0.059) | 0.020 | 0.020 | (−0.017, 0.061) |
| logpsa | −0.057 | 0.143 | (−0.337, 0.223) | 0.036 | 0.139 | (−0.225, 0.316) |
| GS7 | −0.173 | 0.431 | (−1.018, 0.672) | −0.123 | 0.411 | (−0.935, 0.676) |
| GS8H | 0.298 | 0.400 | (−0.486, 1.082) | 0.153 | 0.402 | (−0.615, 0.951) |
| T3 | 0.453 | 0.409 | (−0.348, 1.255) | 0.598 | 0.417 | (−0.195, 1.414) |
| margin | 0.483 | 0.305 | (−0.115, 1.080) | 0.417 | 0.295 | (−0.124, 1.035) |
| DT6 | 0.990 | 0.278 | (0.445, 1.535) | 0.919 | 0.271 | (0.407, 1.467) |
Let the prostate cancer specific mortality (PCSM) be the cumulative incidence function corresponding to the primary cause of death due to prostate cancer. The covariate effect was further investigated by comparing the posterior means of the PCSM at different times stratified by PSA doubling time (DT6 = 1 versus DT6 = 0) under each of the three models, where the other covariates were fixed at age = mean age (66.5), logpsa = mean logpsa (2.5), GS7 = 1, GS8H = 0, T3 = 1, and margin = 1. The PCSM plots are shown in Figure 2. The shapes of the PCSM curves under the three models were similar except in the tail, and the difference between the two curves was slightly smaller under the cause-specific hazards model. For example, at the 10th and 15th year after PSA failure, the posterior means of PCSM under the cause-specific hazards model were 0.055 and 0.166 for patients with PSA doubling time less than 6 months and 0.021 and 0.071 for patients with PSA doubling time greater than or equal to 6 months; under the mixture model the posterior means of PCSM were 0.052 and 0.17 for patients with PSA doubling time less than 6 months and 0.022 and 0.073 for patients with PSA doubling time greater than or equal to 6 months; under the FS model the posterior means of PCSM were 0.061 and 0.192 for patients with PSA doubling time less than 6 months and 0.025 and 0.082 for patients with PSA doubling time greater than or equal to 6 months. These PCSM plots indicate that the patients with PSA doubling time less than 6 months had worse PCSMs than those with PSA doubling time greater than or equal to 6 months. This covariate effect can be seen directly from Table 3 under the FS model, as DT6 was significant at a significance level of 0.05. In addition, the proportional hazards structure of the FS model also allows us to compute the adjusted hazard ratio (AHR), which is defined as exp(β17), of DT6 for the PCSM.
Specifically, the posterior mean and 95% HPD interval of the AHR of DT6 were 2.601 and (1.369, 4.061), respectively. However, this covariate effect could not be directly assessed under the other two models. For example, under the mixture model with K1 = 15, K2 = 20, for the hazard regression sub-model corresponding to the prostate cancer death, the posterior mean, SD, and 95% HPD interval of β17 for DT6 were 0.122, 0.384, and (−0.627, 0.865), while the posterior mean, SD, and 95% HPD interval of ϕ7 for DT6 were 0.519, 0.168, and (0.187, 0.839) in the logistic regression sub-model for p1, indicating that DT6 was significant in the logistic regression sub-model but not in the hazard regression sub-model.
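To illustrate how a posterior summary of an adjusted hazard ratio can be obtained from Gibbs output, the following minimal Python sketch transforms draws of a regression coefficient to the hazard ratio scale and averages them. The function name `hazard_ratio_summary` and the synthetic normal draws are assumptions made for illustration only; they are not the actual Gibbs samples from the prostate cancer analysis.

```python
import numpy as np


def hazard_ratio_summary(beta_draws):
    """Summarize exp(beta) from posterior draws of a log hazard ratio."""
    beta = np.asarray(beta_draws, dtype=float)
    hr = np.exp(beta)  # transform each draw, then summarize
    return {
        "posterior_mean": float(hr.mean()),
        "posterior_sd": float(hr.std(ddof=1)),
        # exp of the posterior mean of beta; by Jensen's inequality this
        # never exceeds the posterior mean of exp(beta)
        "plug_in": float(np.exp(beta.mean())),
    }


if __name__ == "__main__":
    # Hypothetical draws standing in for real MCMC output.
    rng = np.random.default_rng(2024)
    draws = rng.normal(loc=0.9, scale=0.27, size=10_000)
    print(hazard_ratio_summary(draws))
```

The posterior mean of the hazard ratio is computed by averaging exp(β) over the draws rather than exponentiating the posterior mean of β, which is why the two quantities returned above generally differ.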
7 Discussion
In this paper, we have developed a fully specified subdistribution model of Fine and Gray (1999) and provided a justification of Fine and Gray’s partial likelihood via the profile likelihood approach and the Bayesian approach. Our Bayesian justification is the first such development in the context of competing risks models after the Bayesian justification of Cox’s partial likelihood (Kalbfleisch, 1978; Sinha et al., 2003), as the risk set at time t in Fine and Gray’s partial likelihood includes all patients who are still alive prior to t as well as the patients who died from other causes up to t, which is quite different from the usual risk set in Cox’s partial likelihood (Cox, 1972, 1975). To fit the proposed FS model, a piecewise exponential model with Jeffreys-type priors, which is a special case of the gamma process prior when c0 → 0+, is assumed for the baseline hazard function. Compared to the full gamma process priors, the gamma priors based on the piecewise exponential model relax the conditions for posterior propriety and facilitate the development of an efficient Gibbs sampling algorithm for carrying out the posterior computation.
In Section 5, we conducted an extensive simulation study to examine the performance of DIC and LPML in identifying the model from which the data were generated. Our simulation results showed empirically that when the data are generated from one of the three models (the cause-specific hazards model, the mixture model, or the fully specified subdistribution model), it is unlikely that either of the other two models would have a smaller DIC or a larger LPML than the true model. This may be due to the different proportional hazards structures assumed under these three models. For the prostate cancer data, the FS model had a much smaller DIC and a larger LPML than the cause-specific hazards model and the mixture model, implying that the FS model was much more appropriate for fitting this dataset than the other two models.
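As a rough illustration of how DIC and LPML can be computed from MCMC output, the sketch below assumes the commonly used definitions DIC = D(θ̄) + 2pD (Spiegelhalter et al., 2002) and LPML = Σ log CPOi, with each CPOi estimated by the harmonic mean of the likelihood contributions over the draws. The inputs `loglik` (an S × n array of per-observation log-likelihoods at each of S draws) and `loglik_at_mean` (the n log-likelihood contributions evaluated at the posterior means) are hypothetical names, and the sketch is not a transcription of the Fortran code used in this paper.

```python
import numpy as np


def lpml(loglik):
    """LPML = sum_i log(CPO_i), where 1/CPO_i = mean_s exp(-loglik[s, i])."""
    loglik = np.asarray(loglik, dtype=float)
    n_draws = loglik.shape[0]
    neg = -loglik
    shift = neg.max(axis=0)  # subtract the column maximum for numerical stability
    log_inv_cpo = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    return float(-log_inv_cpo.sum())


def dic(loglik, loglik_at_mean):
    """Return (DIC, p_D) with DIC = D(theta_bar) + 2 * p_D."""
    loglik = np.asarray(loglik, dtype=float)
    dev_bar = -2.0 * loglik.sum(axis=1).mean()  # posterior mean deviance
    dev_at_mean = -2.0 * float(np.sum(loglik_at_mean))  # deviance at posterior means
    p_d = dev_bar - dev_at_mean  # effective number of parameters
    return dev_at_mean + 2.0 * p_d, p_d
```

Under these definitions, a smaller DIC and a larger LPML both point to the better fitting model, which is the direction of comparison used in Tables 4 and 5.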
The fully specified subdistribution model can be further extended to the cases with more than 2 competing risks. Assume there are J competing risks with cause 1 as the cause of interest. Denote , j = 1, 2, … , J, and . The cause-specific cumulative incidence functions can be constructed as follows: F1(t) = Pr(T* ≤ t, δ = 1) = Pr(T1 ≤ t, δ = 1), and Fj(t) = Pr(T* ≤ t, δ = j) = Mj(t)Pr(δ ≠ j − 1|δ ≠ 1, … , δ ≠ j − 2) … Pr(δ ≠ 2|δ ≠ 1)Pr(δ ≠ 1), j = 2, … , J, where Mj(t) is the probability of failure from cause j by time t conditional on not failing from causes 1, 2, … , j − 1.
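The sequential construction above can be sketched numerically. The following Python fragment rests on one interpretation that is an assumption rather than a statement from the text: Pr(δ ≠ 1) = 1 − F1(∞) and Pr(δ ≠ k | δ ≠ 1, … , δ ≠ k − 1) = 1 − Mk(∞), so that Fj(t) equals Mj(t) multiplied by the probability of not failing from causes 1, … , j − 1. The function name and argument names are hypothetical.

```python
def cumulative_incidences(cond_cdf_at_t, cond_cdf_limits):
    """Compute F_1(t), ..., F_J(t) from the conditional distributions.

    cond_cdf_at_t[j]   : M_{j+1}(t), with M_1 taken to be F_1
    cond_cdf_limits[j] : M_{j+1}(infinity), the eventual failure probability
                         from cause j+1 given no failure from earlier causes
    """
    incidences = []
    prob_no_earlier_cause = 1.0  # Pr(not failing from causes 1, ..., j-1)
    for m_t, m_inf in zip(cond_cdf_at_t, cond_cdf_limits):
        incidences.append(m_t * prob_no_earlier_cause)
        prob_no_earlier_cause *= 1.0 - m_inf
    return incidences


if __name__ == "__main__":
    # Hypothetical values for J = 3 causes at some time t.
    limits = [0.3, 0.5, 1.0]
    at_t = [0.1, 0.2, 0.4]
    print(cumulative_incidences(at_t, limits))
```

Under this reading, the cumulative incidence functions add up to one as t → ∞ whenever the last conditional distribution MJ is proper, which mirrors the two-cause FS model.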
In this paper, we only considered fixed covariates. Including time-dependent covariates in the FS model, as well as jointly modeling longitudinal measurements (e.g., a series of PSA measures over time) and the survival endpoints of cause-specific death times, are important future research topics, which are currently under investigation. Models with frailty terms (Clayton, 1978; Vaupel et al., 1979) are commonly used for correlated survival data with multivariate risk factors. Dixon et al. (2011) introduced a multivariate subdistribution hazard model including frailty to induce correlations among clustered survival times. The FS model can be extended in a similar manner, by including frailties, to correlated survival data in the presence of competing risks. This is another interesting topic for future research.
In all the Bayesian computations, we used 10,000 Gibbs samples after a burn-in of 1000 iterations for each model to compute all the posterior estimates, including posterior means, posterior standard deviations, 95% HPD intervals, DICs, and LPMLs. We also generated 50,000 Gibbs samples after a burn-in of 1000 iterations to re-compute those posterior quantities, and the results were very similar. The HPD intervals were computed via the Monte Carlo method developed by Chen and Shao (1999). The code was written in Fortran 95, and we used IMSL subroutines with double precision accuracy. The Fortran code for the FS model is available upon request.
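For readers who wish to reproduce this kind of interval estimate, the following Python sketch implements the usual form of the Chen and Shao (1999) Monte Carlo estimator for a unimodal posterior: sort the draws and take the shortest interval spanning [(1 − α)n] order statistics. The function name `hpd_interval` is an assumption, and the sketch is independent of the Fortran implementation mentioned above.

```python
import numpy as np


def hpd_interval(samples, alpha=0.05):
    """Shortest interval between order statistics covering about 1 - alpha."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    span = int(np.floor((1.0 - alpha) * n))  # index gap between the endpoints
    if span < 1 or span >= n:
        raise ValueError("too few samples for the requested level")
    # Candidate intervals (x[j], x[j + span]) for j = 0, ..., n - span - 1.
    widths = x[span:] - x[: n - span]
    j = int(np.argmin(widths))
    return float(x[j]), float(x[j + span])


if __name__ == "__main__":
    # Hypothetical skewed draws standing in for real MCMC output.
    rng = np.random.default_rng(1)
    draws = rng.gamma(shape=2.0, scale=1.0, size=10_000)
    print(hpd_interval(draws))
```

For a skewed posterior, the interval returned this way is shifted toward the mode relative to the equal-tailed interval, which is the reason HPD intervals are reported instead of percentile intervals.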
Acknowledgements
The authors wish to thank the Editor-in-Chief, the Associate Editor, and the two referees for their helpful comments and suggestions, which have led to an improved version of this article. This research was partially supported by NIH grants #GM 70335 and #CA 74015.
Appendix A: Additional Tables
Table 4.
(K1, K2) | DIC: FS Model | DIC: C Model | DIC: M Model | LPML: FS Model | LPML: C Model | LPML: M Model |
---|---|---|---|---|---|---|
Data Simulated from FS Model | ||||||
(5, 10) | 2142.0 | 2167.1 | 2144.2 | −1071.1 | −1083.7 | −1072.0 |
(10, 20) | 2119.2 | 2145.6 | 2123.2 | −1060.6 | −1073.9 | −1062.6 |
(15, 30) | 2126.5 | 2153.9 | 2131.8 | −1065.8 | −1079.6 | −1068.4 |
Data Simulated from C Model | ||||||
(5, 10) | 2064.8 | 2055.8 | 2056.6 | −1032.5 | −1028.0 | −1028.2 |
(10, 20) | 2039.0 | 2030.7 | 2033.4 | −1020.5 | −1016.4 | −1017.8 |
(15, 30) | 2044.6 | 2037.5 | 2040.8 | −1024.9 | −1021.4 | −1023.0 |
Data Simulated from M Model | ||||||
(5, 10) | 2286.8 | 2283.7 | 2274.7 | −1144.0 | −1142.3 | −1137.7 |
(10, 20) | 2285.9 | 2282.3 | 2276.1 | −1144.4 | −1142.4 | −1139.5 |
(15, 30) | 2295.9 | 2291.8 | 2287.0 | −1150.9 | −1148.7 | −1146.5 |
Table 5.
Parameter | Quantity | FS Model, (K1, K2) = (5, 10) | FS Model, (K1, K2) = (10, 20) | FS Model, (K1, K2) = (15, 30) | C Model, (K1, K2) = (5, 10) | C Model, (K1, K2) = (10, 20) | C Model, (K1, K2) = (15, 30) | M Model, (K1, K2) = (5, 10) | M Model, (K1, K2) = (10, 20) | M Model, (K1, K2) = (15, 30) |
---|---|---|---|---|---|---|---|---|---|---|
β11 | True | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
β11 | Est | 0.2 | 0.2 | 0.2 | 0.18 | 0.19 | 0.19 | 0.19 | 0.20 | 0.20 |
β11 | SD | 0.09 | 0.09 | 0.09 | 0.11 | 0.11 | 0.11 | 0.13 | 0.13 | 0.13 |
β11 | MSE | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 |
β11 | CP | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.94 | 0.94 | 0.94 |
β12 | True | 0.8 | 0.8 | 0.8 | 1.0 | 1.0 | 1.0 | 1.5 | 1.5 | 1.5 |
β12 | Est | 0.78 | 0.78 | 0.78 | 1.00 | 1.02 | 1.02 | 1.47 | 1.48 | 1.48 |
β12 | SD | 0.23 | 0.23 | 0.23 | 0.24 | 0.24 | 0.24 | 0.35 | 0.36 | 0.37 |
β12 | MSE | 0.05 | 0.05 | 0.06 | 0.06 | 0.06 | 0.06 | 0.14 | 0.15 | 0.16 |
β12 | CP | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.94 | 0.94 |
β21 | True | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
β21 | Est | 0.29 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 |
β21 | SD | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
β21 | MSE | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
β21 | CP | 0.97 | 0.96 | 0.96 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
β22 | True | 1.0 | 1.0 | 1.0 | 0.2 | 0.2 | 0.2 | 0.5 | 0.5 | 0.5 |
β22 | Est | 0.98 | 1.00 | 1.00 | 0.19 | 0.20 | 0.20 | 0.53 | 0.53 | 0.54 |
β22 | SD | 0.17 | 0.17 | 0.17 | 0.15 | 0.15 | 0.15 | 0.18 | 0.18 | 0.18 |
β22 | MSE | 0.03 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 | 0.03 | 0.03 | 0.03 |
β22 | CP | 0.94 | 0.95 | 0.95 | 0.96 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 |
Note that Est, SD, MSE, and CP denote the average of the posterior means, the average of the posterior standard deviations, the mean squared error, and the coverage probability of the 95% HPD intervals, respectively, over 500 simulations.
Appendix B: Proofs of Theorems
Proof of Theorem 1 With the assumption that h10 is zero after the last observation of failure due to cause 1, the likelihood function is now
The profile likelihood approach assumes that h10 is zero except for the failure times due to cause 1. Then
Therefore, the profile maximum likelihood estimator of h10 is given by
Plugging ĥ10(y(i)) in L(β1, h10|y, X, δ) results in the profile likelihood function given by
which is (2.6).
Proof of Theorem 2 Assume the prior of h10 only has value λi at times y(i) such that δi = 1. Then λi = H10(y(i)) − H10(y(i−1)), and λD1+1 = ⋯ = λn = 0, i = 1, … , D1. The survival function at time t is
(B.1) |
Then the likelihood function in (2.8) reduces to
Since , we have
Proof of Theorem 3 Assume H10 follows a gamma process prior. Let h1i = H10(y(i)) − H10(y(i−1)), h1i ~ G(c0h0i, c0), i = 1, … , D1. h1i’s are independent of each other, and h1,D1+1 = ⋯ = h1n = 0. Similar to (B.1), we can show that the survival function at time t is given by
Taking expectation with respect to the gamma process prior gives
Now the expectation of the likelihood function in (2.8) with respect to the gamma process prior reduces to
Since and
we have
Proof of Theorem 4 To show that (3.3) is proper, we need to show that
where π*(β1, β2, λ1, λ2|t, X, δ) is the unnormalized joint posterior density defined in (3.3). After some algebra, we can show that
It suffices to show that
since ∫ π**(β2, λ2|t, X, δ)dβ2dλ2 < ∞ can be proved in a similar way.
Consider the transformation uj = log(λ1j), j = 1, 2, … , K1, and let u = (u1, u2, … , uK1)′. Then, we have
(B.2) |
It is easy to show that , where M1 > 0 is a constant. Under condition (ii), X1 is of full rank. Thus, there exist distinct i1, i2, … , iK1+p1, where p1 = dim(β1), such that the (K1+p1) × (K1+p1) matrix , which has rows for ℓ = 1, … , K1+p1, is of full rank. Let kiℓ be an integer such that tiℓ ∈ (s1,kiℓ−1, s1kiℓ] for ℓ = 1, … , K1+p1. We take a one-to-one transformation . Using (B.2), we have
where M2 and M3 are two positive constants. This completes the proof.
Appendix C: Generating η from [η|β1, λ1, β2, λ2, t, X, δ] and u from [u|β1, λ1, λ2, η, t, X, δ]
Generating η from [η|β1, λ1, β2, λ2, t, X, δ] When δi = 0, we generate ηi by I(ηi = 1) ~ Bin(1, pi1) and I(ηi = 2) = 1−I(ηi = 1), where .
Generating u from [u|β1, λ1, λ2, η, t, X, δ] When δi = 0 and ηi = 1, we generate ui from a truncated piecewise exponential distribution f(ui), where
Denote δji as the index such that sj, δji−1 ≤ ti < sjδji. Let
Generate vi from a U(0, 1) distribution. If vi falls into the interval such that
the inverse distribution function method is used to calculate ui as
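As a generic illustration of the inverse distribution function method for a piecewise exponential distribution truncated from below, the following Python sketch may help. It is not necessarily the exact form of f(ui) used above: it assumes a proper piecewise exponential distribution with positive hazards on every interval, the last interval extending to infinity, and truncation to values greater than a lower bound (for example, ti). All function and argument names are hypothetical.

```python
import math
import random


def sample_truncated_pwe(cuts, rates, lower, rng=random):
    """Draw u > lower from a piecewise exponential law via inverse CDF.

    cuts  : increasing interior cut points s_1 < ... < s_{K-1}
    rates : K positive hazards; rates[k] applies on (s_k, s_{k+1}],
            with s_0 = 0 and s_K = infinity
    lower : left truncation point
    """
    v = rng.random()  # v ~ U(0, 1)
    # Solve H(u) - H(lower) = -log(1 - v) for u, where H is the
    # cumulative hazard of the piecewise exponential distribution.
    target = -math.log(1.0 - v)
    edges = [0.0] + list(cuts) + [math.inf]
    for k, lam in enumerate(rates):
        left, right = edges[k], edges[k + 1]
        if right <= lower:
            continue  # interval lies entirely below the truncation point
        start = max(left, lower)
        available = lam * (right - start)  # cumulative hazard in this interval
        if target <= available:
            return start + target / lam
        target -= available
    raise ValueError("rates must be positive on the last interval")


if __name__ == "__main__":
    random.seed(7)
    print(sample_truncated_pwe([1.0, 2.0], [0.5, 1.0, 1.5], lower=0.8))
```

The loop locates the interval into which the uniform draw falls on the cumulative hazard scale, and the final expression inverts the exponential distribution function within that interval.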
Contributor Information
Miaomiao Ge, Clinical Bio Statistics, Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgefield Road, Ridgefield, CT, 06877.
Ming-Hui Chen, Department of Statistics, University of Connecticut, 215 Glenbrook Road, U-4120, Storrs, CT 06269, ming-hui.chen@uconn.edu.
References
- Chen MH, Ibrahim JG, Shao QM. Posterior propriety and computation for the Cox regression model with applications to missing covariates. Biometrika. 2006;93:791–807.
- Chen MH, Shao QM. Monte Carlo estimation of Bayesian credible and HPD intervals. J Comput Graph Stat. 1999;8:69–92.
- Chen MH, Shao QM, Ibrahim JG. Monte Carlo methods in Bayesian computation. New York: Springer-Verlag; 2000.
- Choueiri TK, Chen MH, D’Amico AV, Sun L, Nguyen PL, Hayes JH, Robertson CN, Walther PJ, Polascik TJ, Albala DM, Moul JW. Impact of postoperative prostate-specific antigen disease recurrence and the use of salvage therapy on the risk of death. Cancer. 2010;116:1887–1892. doi: 10.1002/cncr.25013.
- Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–151.
- Cox DR. Regression models and life tables (with discussion). J Roy Stat Society B. 1972;34:187–220.
- Cox DR. Partial likelihood. Biometrika. 1975;62:269–276.
- Dixon SN, Darlington GA, Desmond AF. A competing risks model for correlated data based on the subdistribution hazard. Lifetime Data Anal. 2011;17:473–495. doi: 10.1007/s10985-011-9198-9.
- Elashoff RM, Li G, Li N. An approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med. 2007;26:2813–2835. doi: 10.1002/sim.2749.
- Elashoff RM, Li G, Li N. A joint model for longitudinal measurements and survival data in the presence of multiple failure types. Biometrics. 2008;64:762–771. doi: 10.1111/j.1541-0420.2007.00952.x.
- Fan X. Bayesian nonparametric inference for competing risks data. Unpublished Ph.D. Dissertation, Division of Biostatistics, Medical College of Wisconsin; 2008.
- Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
- Gail M. A review and critique on some models used in competing risk analysis. Biometrics. 1975;31:209–222.
- Gaynor JJ, Feuer EJ, Tan CC, Wu DH, Little CR, Strauss DJ, Clarkson BD, Brennan MF. On the use of cause-specific failure and conditional failure probabilities: examples from clinical oncology data. J Am Stat Assoc. 1993;88:400–409.
- Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. J Roy Stat Society B. 1994;56:501–514.
- Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Appl Stat. 1992;41:337–348.
- Gray RJ. A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat. 1988;16:1141–1154.
- Hu W, Li G, Li N. A Bayesian approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med. 2009;28:1601–1619. doi: 10.1002/sim.3562.
- Huang X, Li G, Elashoff RM, Pan J. A general joint model for longitudinal measurements and competing risks survival data with heterogeneous random effects. Lifetime Data Anal. 2011;17:80–100. doi: 10.1007/s10985-010-9169-6.
- Ibrahim JG, Chen MH, Sinha D. Bayesian survival analysis. New York: Springer-Verlag; 2001.
- Kalbfleisch JD. Non-parametric Bayesian analysis of survival time data. J Roy Stat Society B. 1978;40:214–221.
- Larson MG, Dinse GE. A mixture model for the regression analysis of competing risks data. Appl Stat. 1985;34:201–211.
- Liu JS. The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J Am Stat Assoc. 1994;89:958–966.
- Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. doi: 10.1111/j.0006-341x.2001.01191.x.
- Prentice RL, Kalbfleisch JD, Peterson A, Flournoy N, Farewell V, Breslow N. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554.
- Sinha D, Ibrahim JG, Chen MH. A Bayesian justification of Cox’s partial likelihood. Biometrika. 2003;90:629–641.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). J Roy Stat Society B. 2002;64:583–639.
- Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci USA. 1975;72:20–22. doi: 10.1073/pnas.72.1.20.
- Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–454.
- Wang X, Sinha A, Yang J, Chen MH. Bayesian inference of interval-censored survival data. In: Chen DG, Sun J, Peace KE, editors. Interval-censored time-to-event data: methods and applications. Boca Raton, FL: Chapman & Hall; 2012. in press.