Journal of Applied Statistics. 2023 Sep 27;51(11):2139–2156. doi: 10.1080/02664763.2023.2263819

Bayesian transformation model for spatial partly interval-censored data

Mingyue Qiu, Tao Hu
PMCID: PMC11328804  PMID: 39157272

Abstract

The transformation model with partly interval-censored data offers a highly flexible modeling framework that can simultaneously support multiple common survival models and a wide variety of censored data types. However, the real data may contain unexplained heterogeneity that cannot be entirely explained by covariates and may be brought on by a variety of unmeasured regional characteristics. Due to this, we introduce the conditionally autoregressive prior into the transformation model with partly interval-censored data and take the spatial frailty into account. An efficient Markov chain Monte Carlo method is proposed to handle the posterior sampling and model inference. The approach is simple to use and does not include any challenging Metropolis steps owing to four-stage data augmentation. Through several simulations, the suggested method's empirical performance is assessed and then the method is used in a leukemia study.

Keywords: Data augmentation, MCMC method, partly interval-censored data, spatial effect, semiparametric transformation model

1. Introduction

Partly interval-censored (PIC) data have attracted increasing interest in recent years as a mixture of exactly observed, left-censored, interval-censored, and right-censored observations. In the frequentist framework, Kim [17] proposed the maximum likelihood estimation for the proportional hazard (PH) model with PIC data. Gao, Zeng, and Lin [8] discussed the accelerated failure time (AFT) model with PIC data. Zhou, Sun, and Gilbert [39] studied the effects of time-dependent covariates on PIC failure time via a class of semiparametric transformation models. In the Bayesian framework, Pan, Cai, and Wang [25] and Wang, Jiang, and Song [34] proposed a Bayesian method with PIC data under the PH model and transformation models, respectively.

However, unexplained heterogeneity may remain in the data after accounting for certain risk factors (fixed effects). It could be caused by many unmeasured regional characteristics such as socioeconomic status, health care quality, and environmental exposure. Motivated by this, spatial patterns can be modeled to reveal differences in clinical practice among medical centers or to inform further epidemiological studies. For instance, Henderson, Shimakura, and Gorst [11] investigated possible spatial variation in the survival of adult acute myeloid leukemia patients based on a multivariate gamma frailty model. Banerjee, Wall, and Carlin [2] added a spatial random effect to the Weibull model and applied it to a dataset on infant mortality. All of the aforementioned works assume that the survival model is parametric. However, nonparametric analysis is more typical in real-world settings. Therefore, Kneib [18] studied the same data with the PH model. In addition, Hennerfeind, Brezger, and Fahrmeir [12] added a spatial component to the usual linear predictor of the PH model and estimated the waiting times for coronary artery bypass grafting in a different area. Li and Ryan [20] and Li and Lin [19] analyzed asthma data with semiparametric frailty models and semiparametric normal transformation models, respectively. Schnell et al. [29] explored the time to tooth loss using a marginal cure rate proportional hazards model conditioned on spatial frailties. Martins, Silva, and Andreozzi [23] studied HIV/AIDS patients by jointly modeling longitudinal and survival data, considering functional time effects and spatial frailty effects.

Generally, there are two ways to describe the spatial arrangement of the strata: geostatistical approaches, which use the exact geographic locations (e.g. latitude and longitude) of the strata, and lattice approaches, which use the positions of the strata relative to each other (e.g. which areas neighbor each other). For lattice approaches, the conventional model is the conditionally autoregressive (CAR) distribution, initially developed by Besag [3], which uses only the adjacency information between areas. Starting from the framework established by Besag [3,4] for the CAR model, there have been many recent developments. Banerjee, Wall, and Carlin [2] used the CAR model in the parametric Weibull model. Jin, Carlin, and Banerjee [14] applied hierarchical multivariate CAR models to spatiotemporally correlated survival data. In addition to the advances in the frequentist framework, the Bayesian framework has also developed rapidly. Hodges, Carlin, and Fan [13] discussed the exact form of the CAR prior in spatial models. Based on this, Banerjee, Carlin, and Gelfand [1] developed the hierarchical Bayes model. Liu, Sun, and He [21] proposed a Bayesian hierarchical linear mixed model for colorectal cancer data with geographical characteristics.

Recently, two papers in particular have considered spatial effects using lattice approaches. One developed a unified Bayesian approach that fits the PH, proportional odds (PO), and AFT models under PIC data [38]. However, in the posterior sampling algorithms of that work, most parameters have no standard full conditional distribution, so the Metropolis–Hastings (M–H) algorithm is adopted for almost all parameters. Hence, this method is inefficient and slow. The other proposed a Bayesian semiparametric method with the PH model under interval-censored data [26]. However, the sampler for the regression parameters in this approach is complex: adaptive rejection Metropolis sampling (ARMS) is used for each dimension of the parameters. This one-dimension-at-a-time strategy slows down the sampling, and the disadvantage becomes more obvious as the dimension of the covariates increases. Furthermore, the models covered by the two papers are limited, and their restrictive assumptions about the hazard function (such as those of the PH and AFT models) are often untenable.

In contrast to these two techniques, the method proposed in this study first adopts a transformation model that includes the PH and PO models as special cases. Second, to address the issues brought on by the intricate model and data structure, we introduce several latent variables. As a result, the approach does not involve any difficult Metropolis procedures, and all sampling steps are simple or automatic. Finally, the regression parameters are sampled as a vector, which accelerates the computation. In addition, the code for this method is written in Julia, a language designed specifically for scientific computing that remains simple and fast when processing massive amounts of data [5,27].

The rest of the paper is outlined as follows. Section 2 introduces the data structure, the transformation model, and the spline approximation of the baseline hazard function. Section 3 describes the proposed method including four-stage data augmentation, the CAR model, and the construction of Markov Chain Monte Carlo (MCMC) sampling algorithms. Section 4 offers some simulations and comparisons illustrating the quality of estimation. In Section 5, we apply the proposed method to the spatial leukemia survival data. Section 6 contains conclusions and suggestions for future research. In addition, some technical details are given in Appendix.

2. Data and model

2.1. Data structure and the likelihood function

The basic model used in this paper is the semiparametric transformation model. Under this model, the cumulative hazard function for event time t given the covariates x takes the form

$$\Lambda(t|x)=G\left[\Lambda_0(t)\exp(x^{\top}\beta)\right],$$

where $G(\cdot)$ is a strictly increasing transformation function and $\Lambda_0(\cdot)$ is an unknown increasing function [37]. The PH and PO models are special cases of the transformation model, obtained by choosing $G(x)=x$ and $G(x)=\log(1+x)$, respectively. We now introduce the spatial term into $\Lambda(t|x)$. First, we number the areas and define an indicator vector $\mathbf{1}$ of zeros and ones, in which the position of the 1 marks the area the subject comes from. Next, let the spatial effects of all areas form the parameter vector $\phi$. The cumulative hazard function with spatial effect is then

$$\Lambda(t|x)=G\left[\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\right],$$

where $\phi$ is the vector of area-level spatial effects. Consequently, the cumulative distribution function of t given x is $F(t|x)=1-\exp[-\Lambda(t|x)]$, and the density function is $f(t|x)=\lambda(t|x)\exp[-\Lambda(t|x)]$, where $\lambda(t|x)$ is the first derivative of $\Lambda(t|x)$ with respect to t.

Assume that the subjects are observed at M distinct spatial areas $s_1,\ldots,s_M$. Let $t_{ij}$ be the true event time and $x_{ij}$ the p-dimensional covariate vector for subject j in area $s_i$, where $j=1,\ldots,n_i$ and $i=1,\ldots,M$, so the sample size is $n=\sum_{i=1}^{M}n_i$. We introduce $\delta_{ij}$ to describe the type of censoring and the interval $(a_{ij},b_{ij})$ to denote where $t_{ij}$ lies. Specifically, let $\delta_{ij}=(\delta_{ij0},\delta_{ij1},\delta_{ij2},\delta_{ij3})$ be a four-dimensional indicator vector with the constraint $\delta_{ij0}+\delta_{ij1}+\delta_{ij2}+\delta_{ij3}=1$. Then $\delta_{ij0}=1$ means the observed time of subject j in area $s_i$ is exact and $a_{ij}=b_{ij}$; $\delta_{ij1}=1$ means left-censoring, with an interval of the form $(0,b_{ij})$; $\delta_{ij2}=1$ means interval-censoring, with an interval of the form $(a_{ij},b_{ij})$; and $\delta_{ij3}=1$ means right-censoring, with an interval of the form $(a_{ij},\infty)$. The event and censoring times are assumed to be independent given the observed covariates [31]. The observed data are denoted by $D=\{a_{ij},b_{ij},x_{ij},\delta_{ij}: j=1,\ldots,n_i;\ i=1,\ldots,M\}$. With these assumptions and notation, the likelihood contribution of subject j in area $s_i$ is

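$$L_{ij}(\beta,\Lambda_0,\phi\mid D)=f(a_{ij}|x_{ij})^{\delta_{ij0}}\,F(b_{ij}|x_{ij})^{\delta_{ij1}}\times\left[F(b_{ij}|x_{ij})-F(a_{ij}|x_{ij})\right]^{\delta_{ij2}}\left[1-F(a_{ij}|x_{ij})\right]^{\delta_{ij3}}. \tag{1}$$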
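As an illustration of equation (1), the following minimal Python sketch evaluates one subject's likelihood contribution for each censoring type. It assumes the logarithmic transformation G(x) = log(1 + rx)/r used later in the paper and a known baseline; the function names (`G`, `cdf`, `density`, `likelihood_ij`) and the argument `eta` (standing for the linear predictor plus the area effect) are hypothetical and are not taken from the authors' Julia code.

```python
import math

def G(x, r):
    """Logarithmic transformation: r = 0 gives PH, r = 1 gives PO."""
    return x if r == 0 else math.log1p(r * x) / r

def G_prime(x, r):
    """Derivative of G, needed for the hazard lambda(t|x)."""
    return 1.0 if r == 0 else 1.0 / (1.0 + r * x)

def cdf(t, eta, r, Lambda0):
    """F(t|x) = 1 - exp(-G[Lambda0(t) * exp(eta)])."""
    return -math.expm1(-G(Lambda0(t) * math.exp(eta), r))

def density(t, eta, r, Lambda0, lambda0):
    """f(t|x) = lambda(t|x) * exp(-Lambda(t|x)), using the chain rule."""
    inner = Lambda0(t) * math.exp(eta)
    hazard = G_prime(inner, r) * lambda0(t) * math.exp(eta)
    return hazard * math.exp(-G(inner, r))

def likelihood_ij(a, b, delta, eta, r, Lambda0, lambda0):
    """Equation (1) for one subject; delta = (d0, d1, d2, d3) is one-hot."""
    d0, d1, d2, d3 = delta
    if d0:  # exact observation, a == b
        return density(a, eta, r, Lambda0, lambda0)
    if d1:  # left-censored on (0, b)
        return cdf(b, eta, r, Lambda0)
    if d2:  # interval-censored on (a, b)
        return cdf(b, eta, r, Lambda0) - cdf(a, eta, r, Lambda0)
    return 1.0 - cdf(a, eta, r, Lambda0)  # right-censored on (a, inf)

# Illustrative baseline only: Lambda0(t) = 0.5 t, so lambda0(t) = 0.5.
Lambda0 = lambda t: 0.5 * t
lambda0 = lambda t: 0.5
```

Because the left-censored, interval-censored and right-censored contributions over adjacent intervals partition the time axis, they sum to one for any fixed `eta` and `r`, which gives a quick sanity check.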

2.2. Modeling Λ0(t) and λ0(t) by monotone spline

The infinite dimensionality of $\Lambda_0(\cdot)$ poses a great challenge for estimation. Following [7,16], we overcome this difficulty by modeling the cumulative baseline hazard function $\Lambda_0(t)$ as a linear combination of monotone splines. In particular, I-splines are used, that is,

$$\Lambda_0(t)=\sum_{l=1}^{k}\gamma_l I_l(t), \tag{2}$$

where the $I_l(\cdot)$ are monotone spline basis functions that are non-decreasing from 0 to 1, and the $\gamma_l$ are nonnegative spline coefficients, which ensures that $\Lambda_0(t)$ is nondecreasing. The I-splines are the integrals of M-splines [28]. Consequently, the baseline hazard function $\lambda_0(t)$ can be written as

$$\lambda_0(t)=\sum_{l=1}^{k}\gamma_l M_l(t). \tag{3}$$

The spline basis functions can be determined by specifying the degree d, the number of interior knots m, and the locations of interior knots. The number of basis functions k is equal to d + m. For the degree d, we recommend using 2 or 3 to ensure adequate smoothness of the spline functions. For the number of interior knots m, using 10 to 30 should provide adequate modeling flexibility for data sets with thousands of observations [7].
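To make the basis construction concrete, here is a small pure-Python sketch of M-splines (via the standard recursion) and I-splines (their integrals). It assumes that I-splines of degree d are obtained by integrating M-splines of order d on a knot vector whose boundary knots are repeated d times, which yields k = d + m basis functions; this convention and all function names are assumptions for illustration, not the authors' implementation. The integral uses Simpson's rule on each knot span, which is exact for the recommended d = 2 or 3.

```python
def build_knots(lower, upper, interior, order):
    """Extended knot vector with boundary knots repeated `order` times."""
    return [lower] * order + sorted(interior) + [upper] * order

def m_spline(x, i, order, knots):
    """M-spline basis i of the given order, normalised to integrate to 1."""
    lo, hi = knots[i], knots[i + order]
    if hi == lo:
        return 0.0
    if order == 1:
        # Half-open spans, except that the last span includes the upper boundary.
        inside = lo <= x < hi or (x == hi == knots[-1])
        return 1.0 / (hi - lo) if inside else 0.0
    left = (x - lo) * m_spline(x, i, order - 1, knots)
    right = (hi - x) * m_spline(x, i + 1, order - 1, knots)
    return order * (left + right) / ((order - 1) * (hi - lo))

def i_spline(t, i, order, knots):
    """Integral of M-spline i from the lower boundary to t (Simpson per span)."""
    breaks = sorted(set(knots))
    total = 0.0
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        if t <= lo:
            break
        hi = min(hi, t)
        mid = 0.5 * (lo + hi)
        total += (hi - lo) / 6.0 * (
            m_spline(lo, i, order, knots)
            + 4.0 * m_spline(mid, i, order, knots)
            + m_spline(hi, i, order, knots)
        )
    return total

# Example: degree d = 3 with m = 2 interior knots on [0, 4], so k = 5.
degree = 3
knots = build_knots(0.0, 4.0, [1.0, 2.5], degree)
k = len(knots) - degree
gamma_coef = [0.2, 0.1, 0.4, 0.3, 0.5]  # illustrative nonnegative coefficients

def Lambda0(t):
    """Equation (2): nondecreasing because every gamma_l is nonnegative."""
    return sum(g * i_spline(t, l, degree, knots) for l, g in enumerate(gamma_coef))
```

Each basis function rises from 0 at the lower boundary to 1 at the upper boundary, so in this sketch the cumulative baseline hazard at the upper boundary equals the sum of the coefficients.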

3. Methodology

3.1. Data augmentation

3.1.1. Frailty transformation

Due to the complexity of the survival and hazard functions in the transformation model, the likelihood function (1) is highly intractable. Consequently, the class of frailty-induced transformations is used to simplify the calculation, and the transformation function can be rewritten as

$$G(x)=-\log\int_0^{\infty}\exp(-x\xi)f(\xi)\,d\xi,$$

where $f(\xi)$ is the density function of a frailty variable with support $[0,\infty)$ [37]. If $G(x)$ belongs to the class of logarithmic transformations, that is, $G(x)=\log(1+rx)/r$ with $r\ge 0$, then $f(\xi)$ is the gamma density with unit mean and variance r. Here r = 0 gives the PH model and r = 1 gives the PO model. After the frailty transformation, $F(t|x)$ takes the form

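$$\begin{aligned}F(t|x)&=1-\exp\left\{-G\left[\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\right]\right\}\\&=1-\int_0^{\infty}\exp\left[-\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi\right]f(\xi)\,d\xi\\&=\int_0^{\infty}\left\{1-\exp\left[-\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi\right]\right\}f(\xi)\,d\xi.\end{aligned}$$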

By the law of total probability, the conditional cumulative distribution function given $\xi$ is

$$F(t|x,\xi)=1-\exp\left[-\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi\right], \tag{4}$$

and $\xi$ has density $f(\xi)$. Similarly, the conditional probability density function can be written as

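$$f(t|x,\xi)=\lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\,\xi\,\exp\left[-\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi\right]. \tag{5}$$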
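The frailty representation can be checked numerically. The sketch below, with hypothetical function names, integrates the gamma frailty density (unit mean, variance r) against exp(-xξ) by the midpoint rule and compares minus the logarithm of the result with log(1 + rx)/r. It is only a sanity check under the assumption 0 < r ≤ 1, where the density is bounded near zero.

```python
import math

def G_log(x, r):
    """Closed form of the logarithmic transformation."""
    return x if r == 0 else math.log1p(r * x) / r

def gamma_frailty_density(xi, r):
    """Gamma density with unit mean and variance r (shape 1/r, rate 1/r)."""
    shape = 1.0 / r
    log_density = (shape * math.log(shape) + (shape - 1.0) * math.log(xi)
                   - shape * xi - math.lgamma(shape))
    return math.exp(log_density)

def G_from_frailty(x, r, upper=50.0, steps=100_000):
    """-log of the integral of exp(-x * xi) f(xi) over (0, upper), midpoint rule."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        xi = (j + 0.5) * h
        total += math.exp(-x * xi) * gamma_frailty_density(xi, r)
    return -math.log(total * h)
```

For example, `G_from_frailty(2.0, 1.0)` and `G_log(2.0, 1.0)` should both be close to log 3, the PO case.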

3.1.2. Poisson latent variables for interval-censoring

Inspired by [35], the relationship between the conditional cumulative distribution function (4) and a non-homogeneous Poisson process can be used to simplify the calculation. Specifically, let $N(t)$ be a latent Poisson process with cumulative intensity function $\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi$. Then the cumulative distribution function of $T=\inf\{t:N(t)>0\}$, the time of the first occurrence in the Poisson process, has the same form as $F(t|x,\xi)$.

Define two time points $t_1<t_2$ by $t_1=b\delta_1+a\delta_2+a^{-}\delta_3$ and $t_2=b^{+}\delta_1+b\delta_2+a\delta_3$, where a and b are the endpoints of the interval containing t, $\delta_1,\delta_2,\delta_3$ indicate the censoring type, $a^{-}$ is an arbitrary time point between 0 and a, and $b^{+}$ is an arbitrary time point greater than b. The two latent variables $z=N(t_1)$ and $w=N(t_2)-N(t_1)$ are the numbers of events in the corresponding intervals. Consequently, z and w are independent Poisson random variables with means $\Lambda_0(t_1)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi$ and $[\Lambda_0(t_2)-\Lambda_0(t_1)]\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\xi$, respectively. To reflect the relation between event occurrence and the latent variables, the following constraints are imposed: $z>0$ for $\delta_1=1$; $z=0$ and $w>0$ for $\delta_2=1$; and $z=w=0$ for $\delta_3=1$. Owing to the additivity of the Poisson distribution and the summation form of the cumulative baseline hazard function (2), we decompose z and w into k independent Poisson latent variables $z_l$ and $w_l$, respectively, where $l=1,\ldots,k$. It follows that $(z_1,\ldots,z_k)$ has a multinomial distribution given z, and $(w_1,\ldots,w_k)$ has a multinomial distribution given w.
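The two sampling primitives implied by this augmentation are a zero-truncated Poisson draw (for the constraint z > 0 or w > 0) and a multinomial split of the count across the k spline components. A minimal Python sketch follows; the function names are hypothetical, and the inverse-CDF sampler is one simple choice suited to moderate means rather than the method used in the authors' code.

```python
import math
import random

def sample_zero_truncated_poisson(mean, rng):
    """Draw z ~ Poisson(mean) conditioned on z > 0 by inverting the CDF."""
    u = rng.random()
    z = 1
    # P(z = 1 | z > 0); -expm1(-mean) equals 1 - exp(-mean).
    prob = mean * math.exp(-mean) / -math.expm1(-mean)
    cumulative = prob
    while u > cumulative and prob > 0.0:
        z += 1
        prob *= mean / z
        cumulative += prob
    return z

def split_counts(z, weights, rng):
    """Multinomial(z; weights / sum(weights)) as a list of k counts."""
    counts = [0] * len(weights)
    for l in rng.choices(range(len(weights)), weights=weights, k=z):
        counts[l] += 1
    return counts

# Illustrative values: weights play the role of gamma_l * I_l(b).
rng = random.Random(2023)
weights = [0.3 * 0.9, 0.5 * 0.4, 0.2 * 0.1]
z = sample_zero_truncated_poisson(0.8, rng)
z_parts = split_counts(z, weights, rng)
```

The components of `z_parts` always add up to `z`, mirroring the decomposition of z into the latent counts for the individual basis functions.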

3.1.3. Multinomial latent variable for exact observation

Turning to the conditional probability density function (5), the summation in the baseline hazard function (3) makes it difficult to sample the $\gamma_l$ directly. To obtain the posterior distribution of the spline parameters $\gamma_l$ more easily, the latent variable $u=(u_1,\ldots,u_k)\sim\text{Multinomial}(1;1/k,\ldots,1/k)$ is introduced [36]. This replaces $\sum_{l=1}^{k}\gamma_lM_l(t)$ with $k\prod_{l=1}^{k}(\gamma_lM_l(t))^{u_l}$.
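A short derivation shows why this augmentation leaves the model unchanged. Writing $e_l$ for the lth unit vector (notation introduced here only for illustration), marginalizing over u recovers the baseline hazard (3), and the implied conditional distribution of u has probabilities proportional to $\gamma_lM_l(t)$, in line with the multinomial draw used for exact observations in the sampler:

```latex
\begin{aligned}
\mathrm{E}_u\!\left[k\prod_{l=1}^{k}\bigl(\gamma_l M_l(t)\bigr)^{u_l}\right]
  &= \sum_{l=1}^{k}\Pr(u=e_l)\,k\,\gamma_l M_l(t)
   = \sum_{l=1}^{k}\frac{1}{k}\,k\,\gamma_l M_l(t)
   = \sum_{l=1}^{k}\gamma_l M_l(t)=\lambda_0(t),\\[4pt]
\Pr(u=e_l\mid \gamma,t)
  &= \frac{\gamma_l M_l(t)}{\sum_{l'=1}^{k}\gamma_{l'}M_{l'}(t)},\qquad l=1,\ldots,k.
\end{aligned}
```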

3.2. CAR model

For the spatial term ϕ, we refer to [3] and apply the conditionally autoregressive (CAR) prior

$$\phi_i\mid\{\phi_j:j\neq i\}\sim N\!\left(\sum_{j}h_{ij}\phi_j/h_{i+},\ 1/(\tau h_{i+})\right),\quad i=1,\ldots,M,$$

where $h_{ij}$ is the (i, j) element of the adjacency matrix H and $\phi_i$ is the ith component of $\phi$, for $i=1,\ldots,M$. Specifically, $h_{ij}=1$ if areas $s_i$ and $s_j$ are neighbors and 0 otherwise, with $h_{ii}=0$, and $h_{i+}=\sum_{j}h_{ij}$ is the total number of neighbors of area $s_i$. The spatial precision parameter $\tau$ must be positive. By Brook's Lemma [6], the CAR prior distribution of $\phi$ can be rewritten as

$$\pi(\phi\mid\tau)\propto\exp\left\{-\frac{\tau}{2}\phi^{\top}(D_h-H)\phi\right\},$$

where $D_h$ is a diagonal matrix with $(D_h)_{ii}=h_{i+}$. However, $D_h-H$ is not a positive definite matrix. In this study, we impose the sum-to-zero constraint $\sum_{i}\phi_i=0$, motivated by [1]. In addition, the density needs a proportionality constant of the form $\tau^{q}$. Hodges, Carlin, and Fan [13] derived $q=(M-g)/2$, where g is the number of disconnected groups of areas.
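The structure of the CAR precision matrix can be illustrated on a toy map. The following Python sketch (hypothetical names, a four-area chain as an assumed example) builds $D_h-H$ from an adjacency matrix, evaluates the quadratic form, and shows that shifting every $\phi_i$ by the same constant leaves the quadratic form unchanged, which is why a sum-to-zero constraint is needed.

```python
import math

def car_precision(H):
    """Return D_h - H for a symmetric 0/1 adjacency matrix H (list of lists)."""
    M = len(H)
    return [[(sum(H[i]) if i == j else 0) - H[i][j] for j in range(M)]
            for i in range(M)]

def quadratic_form(Q, phi):
    """phi' Q phi."""
    M = len(phi)
    return sum(phi[i] * Q[i][j] * phi[j] for i in range(M) for j in range(M))

def center(phi):
    """Impose the sum-to-zero constraint by subtracting the mean."""
    mean = sum(phi) / len(phi)
    return [p - mean for p in phi]

def car_log_density(phi, tau, Q, g=1):
    """Log CAR density up to a constant: ((M - g)/2) log(tau) - (tau/2) phi'Q phi."""
    return 0.5 * (len(phi) - g) * math.log(tau) - 0.5 * tau * quadratic_form(Q, phi)

# Toy example: four areas in a chain 1-2-3-4 (one connected group, g = 1).
H = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Q = car_precision(H)
phi = center([0.5, -0.2, 0.1, -0.4])
```

In this example the quadratic form equals the sum of squared differences between neighboring areas, (0.7)² + (0.3)² + (0.5)² = 0.83.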

3.3. Prior specification and posterior inference

For the regression parameter $\beta$, a normal prior $N(0_p,\sigma_0I_{p\times p})$ is used, where $\sigma_0>0$. For the spline parameters $\gamma_l$, independent exponential priors $\text{Exp}(\eta)$ are used. This prior is not only conjugate but also acts as a shrinkage prior on $\gamma_l$. The parameter $\eta$ is treated as random and assigned a hyperprior, so that it is tuned automatically at a lower computational cost than selecting it by cross-validation. Here, a gamma hyperprior $\text{Gamma}(a_\eta,b_\eta)$ is assigned to $\eta$, with mean $a_\eta/b_\eta$ and variance $a_\eta/b_\eta^{2}$. Finally, the spatial precision parameter $\tau$ is given a gamma prior $\text{Gamma}(a_\tau,b_\tau)$. These priors lead to conjugate posteriors for both $\eta$ and $\tau$.

Combining the four-stage augmented likelihood function, CAR model, and the priors of parameters given above, the hierarchical representation can be described as follows:

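$$\begin{aligned}
L(\beta,\gamma,\phi\mid D,z,w,u,\xi)&\propto\prod_{i=1}^{M}\prod_{j=1}^{n_i}f(t_{ij}|x_{ij},\xi_{ij})^{\delta_{ij0}}\times P\!\left(z_{ij}\mid\Lambda_0(t_{ij1})\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)^{\delta_{ij1}+\delta_{ij2}+\delta_{ij3}}\\
&\qquad\times P\!\left(w_{ij}\mid[\Lambda_0(t_{ij2})-\Lambda_0(t_{ij1})]\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)^{\delta_{ij2}+\delta_{ij3}},\\
\pi(\phi\mid\tau)&\propto\tau^{\frac{M-g}{2}}\exp\left\{-\frac{\tau}{2}\phi^{\top}(D_h-H)\phi\right\},\ \text{with }\sum_{i}\phi_i=0,\\
\beta\mid\sigma_0&\sim N_p(0_p,\sigma_0I_{p\times p}),\\
\gamma_l\mid\eta&\sim\text{Exp}(\eta),\quad l=1,\ldots,k,\\
\eta&\sim\text{Gamma}(a_\eta,b_\eta),\\
\tau&\sim\text{Gamma}(a_\tau,b_\tau),
\end{aligned}\tag{6}$$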

where ∝ means that the ratio of the two sides is constant, and $P(\cdot\mid\alpha)$ is the probability mass function of the Poisson distribution with parameter $\alpha$. $\sigma_0$, $a_\eta$, $b_\eta$, $a_\tau$ and $b_\tau$ denote hyperparameters with preassigned values. Moreover, the spatial term of subject j from area $s_i$ is $\mathbf{1}^{\top}\phi=\phi_i$ in the likelihood function $L(\beta,\gamma,\phi\mid D,z,w,u,\xi)$. Consequently, the full joint posterior distribution of all parameters is given by

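$$\begin{aligned}
L_{\text{full}}(\beta,\gamma,\phi,\eta,\tau\mid D,z,w,u,\xi)&\propto\prod_{i=1}^{M}\prod_{j=1}^{n_i}f(t_{ij}|x_{ij},\xi_{ij})^{\delta_{ij0}}\,P\!\left(z_{ij}\mid\Lambda_0(t_{ij1})\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)^{\delta_{ij1}+\delta_{ij2}+\delta_{ij3}}\\
&\qquad\times P\!\left(w_{ij}\mid[\Lambda_0(t_{ij2})-\Lambda_0(t_{ij1})]\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)^{\delta_{ij2}+\delta_{ij3}}\\
&\qquad\times\exp\left\{-\frac{\beta^{\top}\beta}{2\sigma_0}\right\}\times\prod_{l=1}^{k}\eta\exp(-\eta\gamma_l)\times\eta^{a_\eta-1}\exp(-b_\eta\eta)\\
&\qquad\times\tau^{\frac{M-g}{2}}\exp\left\{-\frac{\tau}{2}\phi^{\top}(D_h-H)\phi\right\}\times\tau^{a_\tau-1}\exp(-b_\tau\tau).
\end{aligned}$$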

The Bayesian estimator is the mean or mode of the posterior samples drawn from the full joint posterior distribution. Due to the complexity of posterior distribution, a Gibbs sampler is employed to generate the posterior sample of each component of θ, given the others iteratively, where θ=(β,γ,ϕ,η,τ,z,w,u,ξ). Based on the full joint posterior distribution and the initializing values for the parameters, the steps of the proposed MCMC algorithm are described as follows,

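  1. Sample $z_{ij}$, $w_{ij}$, $z_{ijl}$, $w_{ijl}$ and $u_{ijl}$ for $l=1,\ldots,k$, $j=1,\ldots,n_i$, $i=1,\ldots,M$. First set all of them to 0. Then for each ij, if $\delta_{ij0}=1$, namely an exact observation, then
    $$(u_{ij1},\ldots,u_{ijk})\sim\text{Multinomial}\left(1;\gamma_1M_1(a_{ij}),\ldots,\gamma_kM_k(a_{ij})\right).$$
    If $\delta_{ij1}=1$, namely left-censored, then
    $$z_{ij}\sim\text{Poisson}\left(\Lambda_0(b_{ij})\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)I(z_{ij}>0),\quad(z_{ij1},\ldots,z_{ijk})\sim\text{Multinomial}\left(z_{ij};\gamma_1I_1(b_{ij}),\ldots,\gamma_kI_k(b_{ij})\right).$$
    If $\delta_{ij2}=1$, namely interval-censored, then
    $$w_{ij}\sim\text{Poisson}\left(\{\Lambda_0(b_{ij})-\Lambda_0(a_{ij})\}\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\right)I(w_{ij}>0),\quad(w_{ij1},\ldots,w_{ijk})\sim\text{Multinomial}\left(w_{ij};\gamma_1\{I_1(b_{ij})-I_1(a_{ij})\},\ldots,\gamma_k\{I_k(b_{ij})-I_k(a_{ij})\}\right).$$
    In each multinomial distribution, the listed weights are normalized to sum to one.
  2. Sample $\xi_{ij}$ for $j=1,\ldots,n_i$, $i=1,\ldots,M$, from
    $$\text{Gamma}\left(\delta_{ij0}+z_{ij}\delta_{ij1}+w_{ij}\delta_{ij2}+\frac{1}{r},\ \left[\Lambda_0(a_{ij})(\delta_{ij0}+\delta_{ij3})+\Lambda_0(b_{ij})(\delta_{ij1}+\delta_{ij2})\right]\exp(x_{ij}^{\top}\beta+\phi_i)+\frac{1}{r}\right).$$
  3. For $\beta$, the full conditional distribution is proportional to
    $$p(\beta\mid D,\theta_{-\beta})\propto\exp\left\{-\frac{\beta^{\top}\beta}{2\sigma_0}+\sum_{i=1}^{M}\sum_{j=1}^{n_i}x_{ij}^{\top}\beta\,(\delta_{ij0}+z_{ij}\delta_{ij1}+w_{ij}\delta_{ij2})-\sum_{i=1}^{M}\sum_{j=1}^{n_i}\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\left[\Lambda_0(a_{ij})(\delta_{ij0}+\delta_{ij3})+\Lambda_0(b_{ij})(\delta_{ij1}+\delta_{ij2})\right]\right\}. \tag{7}$$
    However, the posterior distribution of $\beta$ is neither conjugate nor standard. Here, the M–H algorithm is used to obtain the posterior samples. The details of the sampler are given in the Appendix.
  4. Sample $\gamma_l$ for $l=1,\ldots,k$, from
    $$\text{Gamma}\left(1+\sum_{i=1}^{M}\sum_{j=1}^{n_i}(u_{ijl}\delta_{ij0}+z_{ijl}\delta_{ij1}+w_{ijl}\delta_{ij2}),\ \eta+\sum_{i=1}^{M}\sum_{j=1}^{n_i}\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\left[I_l(a_{ij})(\delta_{ij0}+\delta_{ij3})+I_l(b_{ij})(\delta_{ij1}+\delta_{ij2})\right]\right).$$
  5. Sample $\eta$ from
    $$\text{Gamma}\left(a_\eta+k,\ b_\eta+\sum_{l=1}^{k}\gamma_l\right).$$
  6. For $\phi_i$ with $i=1,\ldots,M$, the full conditional distribution can be written as
    $$p(\phi_i\mid D,\theta_{-\phi_i})\propto\exp\left\{-\frac{h_{i+}\tau}{2}\left(\phi_i-\frac{\sum_{j}h_{ij}\phi_j}{h_{i+}}\right)^{2}+\sum_{j=1}^{n_i}(\delta_{ij0}+z_{ij}\delta_{ij1}+w_{ij}\delta_{ij2})\phi_i-\sum_{j=1}^{n_i}\exp(x_{ij}^{\top}\beta+\phi_i)\xi_{ij}\left[\Lambda_0(a_{ij})(\delta_{ij0}+\delta_{ij3})+\Lambda_0(b_{ij})(\delta_{ij1}+\delta_{ij2})\right]\right\}. \tag{8}$$
    As with $\beta$, the M–H algorithm is used.
  7. Sample $\tau$ from
    $$\text{Gamma}\left(\frac{M-g}{2}+a_\tau,\ \frac{\phi^{\top}(D_h-H)\phi}{2}+b_\tau\right).$$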
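The conjugate steps (2, 5 and 7) reduce to gamma draws once the shape and rate are assembled. The sketch below is a minimal Python illustration with hypothetical function names; it assumes the shape/rate parameterization of the gamma distribution used in the steps above and converts to the shape/scale form expected by `random.gammavariate`.

```python
import random

def draw_gamma(shape, rate, rng):
    """random.gammavariate takes shape and scale, where scale = 1 / rate."""
    return rng.gammavariate(shape, 1.0 / rate)

def update_xi(count, cum_hazard, r, rng):
    """Step 2 for r > 0: xi ~ Gamma(count + 1/r, cum_hazard + 1/r)."""
    return draw_gamma(count + 1.0 / r, cum_hazard + 1.0 / r, rng)

def update_eta(gamma, a_eta, b_eta, rng):
    """Step 5: eta ~ Gamma(a_eta + k, b_eta + sum_l gamma_l)."""
    return draw_gamma(a_eta + len(gamma), b_eta + sum(gamma), rng)

def tau_full_conditional(phi, H, a_tau, b_tau, g=1):
    """Shape and rate of step 7; phi'(D_h - H)phi is computed as the sum of
    squared differences over neighbouring pairs."""
    M = len(phi)
    quad = sum(H[i][j] * (phi[i] - phi[j]) ** 2
               for i in range(M) for j in range(i + 1, M))
    return 0.5 * (M - g) + a_tau, 0.5 * quad + b_tau

def update_tau(phi, H, a_tau, b_tau, rng, g=1):
    """Step 7: draw tau from its gamma full conditional."""
    shape, rate = tau_full_conditional(phi, H, a_tau, b_tau, g)
    return draw_gamma(shape, rate, rng)
```

Here `count` stands for $\delta_{ij0}+z_{ij}\delta_{ij1}+w_{ij}\delta_{ij2}$ and `cum_hazard` for the bracketed cumulative hazard term multiplied by $\exp(x_{ij}^{\top}\beta+\phi_i)$ in step 2.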

As seen from the above construction of the Gibbs sampler, the proposed method is easy to implement and computationally fast.
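For the non-conjugate full conditional (7), the paper's M–H sampler is specified in its Appendix, which is not reproduced here. As a generic stand-in, the following Python sketch evaluates the logarithm of (7) and performs one random-walk Metropolis update of the whole vector $\beta$. The proposal, the step size and all names are assumptions for illustration and should not be read as the authors' algorithm.

```python
import math
import random

def log_target_beta(beta, rows, sigma0):
    """Log of (7) up to a constant. Each row is (x, count, weight), where
    count = d0 + z*d1 + w*d2 and weight = exp(phi_i) * xi * [Lambda0 term]."""
    value = -sum(b * b for b in beta) / (2.0 * sigma0)
    for x, count, weight in rows:
        linear = sum(xk * bk for xk, bk in zip(x, beta))
        value += count * linear - weight * math.exp(linear)
    return value

def rw_metropolis_step(beta, rows, sigma0, step, rng):
    """One random-walk Metropolis update of the full beta vector."""
    proposal = [b + rng.gauss(0.0, step) for b in beta]
    log_ratio = (log_target_beta(proposal, rows, sigma0)
                 - log_target_beta(beta, rows, sigma0))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return beta, False

# Toy run with a single subject and a one-dimensional covariate.
rng = random.Random(11)
rows = [((1.0,), 3, 1.0)]
beta, accepted, trace = [0.0], 0, []
for _ in range(5000):
    beta, moved = rw_metropolis_step(beta, rows, 100.0, 0.8, rng)
    accepted += moved
    trace.append(beta[0])
acceptance_rate = accepted / 5000
```

Updating $\beta$ as a vector, as opposed to one coordinate at a time, matches the design choice described in the Introduction; the step size would need tuning to reach a target acceptance rate.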

4. Simulation studies

In this section, the performance of the proposed method is assessed through several simulation studies. First, 500 data sets were generated. The failure times were generated from the following model,

$$F(t|x)=1-\exp\left\{-G\left[\Lambda_0(t)\exp(x^{\top}\beta+\mathbf{1}^{\top}\phi)\right]\right\},$$

where $G(x)=\log(1+rx)/r$. The covariates $x_{ij}=(x_{ij1},x_{ij2})$ of subject j in area $s_i$ were generated as $x_{ij1}\overset{iid}{\sim}\text{Bernoulli}(0.5)$ and $x_{ij2}\overset{iid}{\sim}N(0,1)$. The spatial effects $(\phi_1,\ldots,\phi_{46})$ followed the CAR model with $\tau=1$, and H was set to the adjacency matrix included in the R package PICBayes. The spatial layout is based on the 46 counties in South Carolina, with n = 20 subjects within each county [24]. Hence, the total sample size is 20×46=920. The true values of the regression parameters were taken as $\beta=(0.5,1)$. The failure time t was generated from the above model with the cumulative baseline hazard function given by $\Lambda_0(t)=0.5t$ or $0.2t^{2}$. The rate of exact observations was set to 20%. For the remaining samples, we assumed that each subject has a random number of observation times, generated as 1 plus a Poisson random variable with a mean of 2. Furthermore, the gap times between adjacent observation times were independently generated from an exponential distribution with parameter 1. The observed interval [a,b) consisted of the two adjacent observation times (including 0 and ∞) that contain the failure time t. The right-censoring rates of our simulated data sets were approximately 25% to 35%.
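The data-generating scheme can be sketched in a few lines of Python. The failure time is obtained by inverting F at a unit-exponential draw, and the observation process follows the description above. The function names and the `baseline` labels are hypothetical, the area effect is passed in through `eta` and not drawn from the CAR model here, and the authors' own simulation code (in Julia) may differ in detail.

```python
import math
import random

def G_inverse(y, r):
    """Inverse of G(x) = log(1 + r x) / r, with G(x) = x when r = 0."""
    return y if r == 0 else math.expm1(r * y) / r

def draw_failure_time(eta, r, baseline, rng):
    """Solve G[Lambda0(t) exp(eta)] = E for t, where E is unit exponential."""
    target = G_inverse(rng.expovariate(1.0), r) * math.exp(-eta)
    if baseline == "linear":          # Lambda0(t) = 0.5 t
        return target / 0.5
    return math.sqrt(target / 0.2)    # Lambda0(t) = 0.2 t^2

def draw_poisson(mean, rng):
    """Knuth's multiplication method, adequate for small means."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def observe(t, exact_rate, rng):
    """Return (a, b, delta) with delta = (exact, left, interval, right)."""
    if rng.random() < exact_rate:
        return t, t, (1, 0, 0, 0)
    visits, clock = [], 0.0
    for _ in range(1 + draw_poisson(2.0, rng)):
        clock += rng.expovariate(1.0)   # exponential gap with parameter 1
        visits.append(clock)
    a, b = 0.0, math.inf
    for visit in visits:
        if visit <= t:
            a = visit
        else:
            b = visit
            break
    if a == 0.0:
        return a, b, (0, 1, 0, 0)       # left-censored
    if b == math.inf:
        return a, b, (0, 0, 0, 1)       # right-censored
    return a, b, (0, 0, 1, 0)           # interval-censored
```

A subject's linear predictor would be formed as `eta = 0.5 * x1 + 1.0 * x2 + phi_i` under the stated true values before calling `draw_failure_time`.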

In this paper, the hyperparameters were set to $\sigma_0=100$, which makes the prior distribution of $\beta$ approximately noninformative, and $a_\eta=1$, $b_\eta=1$, $a_\tau=0.1$ and $b_\tau=0.1$, which give $\eta$ and $\tau$ suitable prior means and variances. The knots of the spline function were equidistant, and their number was 10. The hyperparameters $c_0$ and $c_1$ of the M–H algorithm (described in the Appendix) were set to 1.5 to keep the acceptance rate at about 70%. A few test runs showed that MCMC chains starting from different initial values mixed well within 1000 iterations. Therefore, with the first 1000 iterations as burn-in, 5000 scans thinned from the remaining 10,000 iterations were sufficient for Bayesian inference. Figure 1 depicts the trace plots of the posterior samples, which indicate convergence of the parameters. The simulations of the proposed method were run in Julia 1.6.3. The code can be found at https://github.com/qiumingyue/BTM-for-Spatial-PIC-Data. All computations were performed on an 11th Gen Intel(R) Core(TM) i7-11700KF (3.60GHz) CPU with 16 GB RAM.
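The burn-in and thinning scheme amounts to dropping the first 1000 draws and keeping every second draw of the remaining 10,000. A minimal Python sketch, with a hypothetical helper name and a list of integers standing in for the stored chain:

```python
def burn_and_thin(chain, burn_in=1000, keep=5000):
    """Discard the burn-in, then keep `keep` equally spaced draws."""
    kept = chain[burn_in:]
    stride = max(1, len(kept) // keep)
    return kept[::stride][:keep]

# Stand-in chain: iteration indices 0..10999 instead of parameter draws.
chain = list(range(11000))
posterior_sample = burn_and_thin(chain)
```

With 11,000 stored iterations the stride is 2, so the retained sample consists of iterations 1000, 1002, and so on.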

Figure 1.


Trace plots of $\beta$ when the baseline function is $\Lambda_0(t)=0.2t^{2}$. (a) $\beta_1=0.5$, PH (r = 0); (b) $\beta_2=1$, PH (r = 0); (c) $\beta_1=0.5$, r = 0.5; (d) $\beta_2=1$, r = 0.5; (e) $\beta_1=0.5$, PO (r = 1); and (f) $\beta_2=1$, PO (r = 1).

Table 1 summarizes the results for the regression parameters $\beta$ and the spatial precision parameter $\tau$, including the average bias of the 500 posterior means (Bias–Mean, with the empirical standard error (ESE) in brackets), the average bias of the 500 posterior modes (Bias–Mode, with the ESE in brackets), the sample standard deviation (SSD), and the coverage probability of the 95% credible intervals (CP). The results show that the point estimators of $\beta$, both mean and mode, have small bias in all six cases, the SSDs are close to the ESEs, and the CP values are near the nominal 95%. So, both the posterior mean and the posterior mode can serve as estimates of $\beta$. However, for the spatial precision parameter $\tau$, the posterior means are far from the true value and have large ESEs, which suggests that the posterior mean is not a suitable estimate of this parameter. Judging from the form of the posterior distribution of $\tau$ and from the observed posterior samples, the posterior is a highly skewed gamma distribution, and its mean is strongly affected by extreme values. So, the posterior mode is preferred for estimating $\tau$. In summary, this paper uses the posterior mode to estimate both $\beta$ and $\tau$.

Table 1.

Simulation I: the average bias between the 500 posterior mean (Bias–Mean, the empirical standard error (ESE) in the brackets), the average bias between the 500 posterior mode (Bias–Mode, the empirical standard error (ESE) in the brackets), sample standard deviation (SSD) and the coverage probability for the 95% credible intervals (CP).

| Hazard function | Model | Parameter | Bias–Mean (ESE) | Bias–Mode (ESE) | SSD | CP |
|---|---|---|---|---|---|---|
| Λ0(t)=0.5t | PH (r = 0) | β1=0.5 | 0.008 (0.095) | 0.011 (0.097) | 0.091 | 0.94 |
| | | β2=1 | 0.011 (0.060) | 0.017 (0.059) | 0.059 | 0.94 |
| | | τ=1 | 0.168 (0.459) | 0.056 (0.365) | 0.407 | 0.95 |
| | r = 0.5 | β1=0.5 | 0.007 (0.116) | 0.009 (0.124) | 0.117 | 0.96 |
| | | β2=1 | 0.008 (0.073) | 0.015 (0.068) | 0.069 | 0.92 |
| | | τ=1 | 0.245 (0.635) | 0.055 (0.443) | 0.562 | 0.95 |
| | PO (r = 1) | β1=0.5 | 0.004 (0.138) | 0.007 (0.142) | 0.137 | 0.94 |
| | | β2=1 | 0.006 (0.084) | 0.015 (0.076) | 0.078 | 0.92 |
| | | τ=1 | 0.414 (0.946) | 0.062 (0.498) | 0.811 | 0.97 |
| Λ0(t)=0.2t^2 | PH (r = 0) | β1=0.5 | 0.012 (0.092) | 0.019 (0.096) | 0.094 | 0.95 |
| | | β2=1 | 0.023 (0.059) | 0.029 (0.058) | 0.060 | 0.92 |
| | | τ=1 | 0.239 (0.506) | 0.097 (0.396) | 0.451 | 0.94 |
| | r = 0.5 | β1=0.5 | 0.022 (0.118) | 0.018 (0.128) | 0.119 | 0.94 |
| | | β2=1 | 0.023 (0.068) | 0.033 (0.066) | 0.069 | 0.94 |
| | | τ=1 | 0.393 (0.821) | 0.093 (0.435) | 0.670 | 0.94 |
| | PO (r = 1) | β1=0.5 | 0.009 (0.144) | 0.026 (0.144) | 0.139 | 0.94 |
| | | β2=1 | 0.024 (0.075) | 0.033 (0.072) | 0.078 | 0.94 |
| | | τ=1 | 0.522 (0.957) | 0.118 (0.530) | 0.915 | 0.96 |

Next, we compared the proposed method, in terms of running time and performance, with the R function spatialPIC in the package PICBayes, which is designed for the PH model, and the R function survregbayes in the package spBayesSurv, which is designed for the PH, PO and AFT models. Both can be applied to PIC data with spatial frailty, and both were run in R. Here, we compared these methods under the PH and PO models. The parameter settings are the same as above.

Figure 2 shows the boxplots for the proposed method, spatialPIC and survregbayes. It indicates that the estimates from the proposed method are closer to and more concentrated around the true value. Table 2 summarizes the comparison results for the regression parameters $\beta$ and the spatial precision parameter $\tau$ in detail. To make the performance of each method easy to compare, we report the bias between the posterior mode and the true value (Bias) here. First, the results show that the point estimators of our method are better in most cases. Meanwhile, the SSD and ESE of the three methods are similar. As to the CP, that of our method is stably around 95%. Finally, the running time of our method is one-quarter to one-third of that of the other methods, except for spatialPIC, where it is roughly 40%. In other words, our method can produce a more accurate estimate in less time.

Figure 2.


Boxplots of the three methods in each case (spatialPIC cannot be used for PO (r = 1)). (a) $\Lambda_0(t)=0.5t$, PH (r = 0); (b) $\Lambda_0(t)=0.2t^{2}$, PH (r = 0); (c) $\Lambda_0(t)=0.5t$, PO (r = 1); and (d) $\Lambda_0(t)=0.2t^{2}$, PO (r = 1).

Table 2.

Simulation II: comparison of the proposed method with spatialPIC and survregbayes.

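| Hazard function | Model | Method | Time | Parameter | Bias | ESE | SSD | CP |
|---|---|---|---|---|---|---|---|---|
| Λ0(t)=0.5t | PH (r = 0) | proposed method | 89 | β1=0.5 | 0.011 | 0.097 | 0.091 | 0.94 |
| | | | | β2=1 | 0.017 | 0.059 | 0.059 | 0.94 |
| | | | | τ=1 | 0.056 | 0.365 | 0.407 | 0.95 |
| | | spatialPIC | 212 | β1=0.5 | 0.003 | 0.124 | 0.110 | 0.92 |
| | | | | β2=1 | 0.014 | 0.068 | 0.055 | 0.84 |
| | | | | τ=1 | 0.148 | 0.573 | 0.465 | 0.93 |
| | | survregbayes | 257 | β1=0.5 | 0.010 | 0.085 | 0.091 | 0.97 |
| | | | | β2=1 | 0.032 | 0.057 | 0.059 | 0.92 |
| | | | | τ=1 | 0.037 | 0.302 | 0.314 | 0.94 |
| | PO (r = 1) | proposed method | 64 | β1=0.5 | 0.007 | 0.142 | 0.137 | 0.94 |
| | | | | β2=1 | 0.015 | 0.076 | 0.078 | 0.92 |
| | | | | τ=1 | 0.062 | 0.498 | 0.811 | 0.97 |
| | | survregbayes | 231 | β1=0.5 | 0.007 | 0.129 | 0.138 | 0.96 |
| | | | | β2=1 | 0.007 | 0.080 | 0.079 | 0.94 |
| | | | | τ=1 | 0.050 | 0.442 | 0.434 | 0.93 |
| Λ0(t)=0.2t^2 | PH (r = 0) | proposed method | 75 | β1=0.5 | 0.019 | 0.096 | 0.094 | 0.95 |
| | | | | β2=1 | 0.029 | 0.058 | 0.060 | 0.92 |
| | | | | τ=1 | 0.097 | 0.396 | 0.451 | 0.94 |
| | | spatialPIC | 188 | β1=0.5 | 0.014 | 0.128 | 0.116 | 0.93 |
| | | | | β2=1 | 0.047 | 0.065 | 0.054 | 0.77 |
| | | | | τ=1 | 0.460 | 0.804 | 0.667 | 0.92 |
| | | survregbayes | 252 | β1=0.5 | 0.028 | 0.087 | 0.094 | 0.96 |
| | | | | β2=1 | 0.064 | 0.055 | 0.058 | 0.80 |
| | | | | τ=1 | 0.109 | 0.281 | 0.300 | 0.93 |
| | PO (r = 1) | proposed method | 67 | β1=0.5 | 0.026 | 0.144 | 0.139 | 0.94 |
| | | | | β2=1 | 0.033 | 0.072 | 0.078 | 0.94 |
| | | | | τ=1 | 0.118 | 0.530 | 0.915 | 0.96 |
| | | survregbayes | 229 | β1=0.5 | 0.008 | 0.140 | 0.142 | 0.94 |
| | | | | β2=1 | 0.007 | 0.082 | 0.081 | 0.95 |
| | | | | τ=1 | 0.051 | 0.458 | 0.442 | 0.92 |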

Notes: Bias is the average bias between the posterior mode and the true value over 500 replications, and time is measured in seconds. The other indicators have the same meaning as above.

5. Application

In this section, we investigate whether spatial effects exist, using data maintained by the North West Leukemia Register in the United Kingdom [11]. The data include 1043 cases recorded between 1982 and 1998. Complete information is available for four covariates: age; sex (0 for female, 1 for male); white blood cell count (wbc) at diagnosis, truncated at 500 units with 1 unit = $50\times10^{9}$/L; and Townsend score (tpi), which is a quantitative measure with higher values indicating less affluent areas. There are M = 24 administrative districts, whose locations are shown in Figure 3(a) (numbers are district identifiers). Furthermore, Figure 3(b) illustrates the mean survival time in days by district. We can see that the mean survival times of adjacent districts are similar, being either long or short together. This encourages us to consider the spatial effect and explore its existence.

Figure 3.


(a) District boundaries and locations of AML cases in North West England; numbers are district identifiers. (b) Mean survival time in days by district.

With respect to the selection of r, two model selection criteria are applied: the deviance information criterion (DIC) [30] and the log pseudo marginal likelihood (LPML) [9]. Specifically, LPML is the sum of the log conditional predictive ordinates and measures the cross-validation predictive performance of the model, while DIC equals the posterior mean of the deviance plus the model's effective number of parameters. Smaller DIC and smaller absolute values of LPML indicate better model fit. A few test runs showed that MCMC chains starting from different initial values mixed well within 4000 iterations. Meanwhile, to reduce the effect of autocorrelation in the posterior samples, 5000 samples thinned from 10,000 iterations after a burn-in period of 4000 are used for Bayesian inference. As in the simulation, the posterior mode is used as the parameter estimate.
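Both criteria can be computed from a matrix of pointwise log-likelihoods evaluated at the posterior draws. The Python sketch below uses the usual Monte Carlo estimate of the conditional predictive ordinate (the harmonic mean of the likelihood over draws) and the definition of DIC quoted above; the function names and the toy input are illustrative assumptions, and the paper does not state which estimators its code uses.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def lpml(loglik):
    """loglik[s][i] = log L_i(theta_s). log CPO_i = -log mean_s exp(-loglik[s][i])."""
    n_draws, n_obs = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        total -= log_mean_exp([-loglik[s][i] for s in range(n_draws)])
    return total

def dic(loglik, loglik_at_mean):
    """DIC = mean deviance + p_D, where p_D = mean deviance - deviance at the
    posterior mean of the parameters (loglik_at_mean holds log L_i there)."""
    mean_deviance = -2.0 * sum(sum(row) for row in loglik) / len(loglik)
    deviance_at_mean = -2.0 * sum(loglik_at_mean)
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d

# Toy input: two posterior draws, two observations.
toy_loglik = [[-1.0, -2.0], [-3.0, -2.0]]
toy_lpml = lpml(toy_loglik)
```

When the likelihood of an observation does not vary across draws, its log CPO equals its log-likelihood, so LPML reduces to the total log-likelihood in that degenerate case.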

To compare the performance of different r, we take r on [0,3] with a step size of 0.1. The smallest DIC and absolute value of LPML arose around r = 2. From the practical point of view, however, one may also want to consider the model interpretation. Based on these, one may prefer to choose r = 0 or r = 1, which correspond to PH and PO, respectively. Additionally, r = 1 can give the two criteria close to the smallest. So, here, we fit this data by r = 0, 0.5 and 1 with non-frailty and CAR frailty to see which one performs better.

From Table 3, we can see that the PO (r = 1) model always outperforms the PH (r = 0) model and the r = 0.5 model, regardless of the spatial frailty assumption. Therefore, the PO model is more suitable for these data. As for spatial frailty, under the PH (r = 0) and r = 0.5 models, CAR frailty improves the goodness of fit over the non-frailty model. Under the PO (r = 1) model, the CAR frailty and non-spatial versions have similar LPML, but CAR frailty has a lower DIC. This also supports the impression from Figure 3 that a spatial effect does exist. At the same time, Table 3 shows that the posterior 95% credible interval of sex covers 0, which suggests that sex is a nonsignificant covariate for leukemia survival. Finally, age, wbc, and tpi are retained. To sum up, we use the PO model with CAR spatial effect and the covariates age, wbc and tpi to fit the data. After model selection, we compare the proposed method with other methods. Since spatialPIC cannot fit the PO model, we only compare our method with survregbayes in Table 4. Compared with the result of the PO model with CAR spatial effect in Table 3, the model without sex has similar LPML and lower DIC, which again indicates that sex is a nonsignificant covariate. Additionally, our method and survregbayes give matching estimates, but the absolute value of LPML and the DIC of our method are lower. As for the covariates, tpi has the largest estimated effect: a higher value indicates a less affluent area and a higher risk. The second most influential covariate is age: the older the patient, the higher the risk. Finally, the effect of wbc is small but cannot be ignored.

Table 3.

Comparison of the model (r = 0, 0.5, 1) and frailty (CAR frailty vs non-spatial).

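| Frailty | Model | LPML | DIC | Covariate | Estimator | SE | 95% CI |
|---|---|---|---|---|---|---|---|
| CAR frailty | PH (r = 0) | 6013 | 12016 | age | 0.0320 | 0.0020 | (0.0283, 0.0363) |
| | | | | sex | 0.0798 | 0.0702 | (−0.0835, 0.1940) |
| | | | | wbc | 0.0035 | 0.0005 | (0.0025, 0.0043) |
| | | | | tpi | 0.0302 | 0.0100 | (0.0131, 0.0524) |
| | r = 0.5 | 5962 | 11916 | age | 0.0458 | 0.0027 | (0.0414, 0.0514) |
| | | | | sex | 0.0712 | 0.0943 | (−0.0889, 0.2867) |
| | | | | wbc | 0.0059 | 0.0007 | (0.0045, 0.0072) |
| | | | | tpi | 0.0542 | 0.0141 | (0.0245, 0.0724) |
| | PO (r = 1) | 5938 | 11869 | age | 0.0524 | 0.0030 | (0.0244, 0.0805) |
| | | | | sex | 0.1205 | 0.1204 | (−0.1471, 0.3216) |
| | | | | wbc | 0.0078 | 0.0009 | (0.0062, 0.0095) |
| | | | | tpi | 0.0610 | 0.0178 | (0.0288, 0.1000) |
| Non-spatial | PH (r = 0) | 6022 | 12040 | age | 0.0303 | 0.0019 | (0.0270, 0.0346) |
| | | | | sex | 0.0596 | 0.0670 | (−0.0866, 0.1884) |
| | | | | wbc | 0.0031 | 0.0005 | (0.0022, 0.0040) |
| | | | | tpi | 0.0297 | 0.0093 | (0.0138, 0.0492) |
| | r = 0.5 | 5965 | 11928 | age | 0.0454 | 0.0032 | (0.0383, 0.0503) |
| | | | | sex | 0.0927 | 0.0892 | (−0.0895, 0.2558) |
| | | | | wbc | 0.0060 | 0.0007 | (0.0044, 0.0071) |
| | | | | tpi | 0.0520 | 0.0119 | (0.0313, 0.0781) |
| | PO (r = 1) | 5939 | 11874 | age | 0.0511 | 0.0038 | (0.0470, 0.0619) |
| | | | | sex | 0.0746 | 0.1112 | (−0.1365, 0.2922) |
| | | | | wbc | 0.0081 | 0.0009 | (0.0060, 0.0096) |
| | | | | tpi | 0.0678 | 0.0156 | (0.0383, 0.0998) |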

Note: SE is the standard error of posterior samples and 95%CI is the posterior 95% credible interval.
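The summary columns described in the note can be obtained from posterior draws along the following lines. This is a minimal sketch under two assumptions that the text does not confirm: that the estimator is the posterior mean, and that the 95% credible interval is equal-tailed (quantile-based) rather than a highest-posterior-density interval. The function name `summarize` and the synthetic draws are hypothetical.

```python
import numpy as np


def summarize(draws, level=0.95):
    """Posterior mean, posterior standard deviation and equal-tailed interval."""
    draws = np.asarray(draws, dtype=float)
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return {
        "estimate": float(draws.mean()),
        "se": float(draws.std(ddof=1)),
        "ci": (float(lower), float(upper)),
        # A covariate is flagged as nonsignificant when the interval covers 0.
        "covers_zero": bool(lower <= 0.0 <= upper),
    }


# Synthetic draws for illustration only (not taken from the fitted models).
rng = np.random.default_rng(1)
wide = summarize(rng.normal(0.05, 0.10, size=5000))     # diffuse posterior
narrow = summarize(rng.normal(0.05, 0.005, size=5000))  # concentrated posterior
print(wide)
print(narrow)
```

An interval that covers 0, as for sex in Table 3, corresponds to `covers_zero` being true in this sketch.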

Table 4.

Comparison with survregbayes under the PO model using age, wbc and tpi.

Method           Time  LPML  DIC    Covariate  Estimator  SE      95% CI
proposed method  54    5937  11865  age        0.0512     0.0028  (0.0485, 0.0595)
                                    wbc        0.0076     0.0008  (0.0059, 0.0093)
                                    tpi        0.0654     0.0172  (0.0317, 0.0983)
survregbayes     295   5962  11916  age        0.0517     0.0034  (0.0452, 0.0590)
                                    wbc        0.0060     0.0008  (0.0044, 0.0077)
                                    tpi        0.0597     0.0156  (0.0276, 0.0900)

6. Conclusion

In this paper, we develop a Bayesian method that incorporates a spatial effect into a semiparametric transformation model by placing a CAR prior on the spatial random effect. The general data type, PIC data, is considered. After introducing a series of latent variables, the proposed method does not contain any complicated Metropolis steps, and all sampling steps are straightforward or automatic. Simulations show that the proposed method performs comparably to spatialPIC and survregbayes. Furthermore, its running time is shorter and the models it can fit are more flexible. In the analysis of the leukemia data, we find that there is indeed a spatial effect and identify the significant covariates tpi, age, and wbc.

There are several directions for extension and future research. This paper assumes that every subject in the population under study will eventually experience the event of interest if followed for a sufficiently long period. However, medical breakthroughs over the last decades have produced new treatments, so that many diseases previously considered fatal can now be cured. This has led to the emergence of cure rate models [10,32], and it would be worthwhile to take the spatial effect into account under such models. Moreover, we focus on only one type of spatial structure, namely lattice (areal) data; other correlation structures, such as geostatistical approaches [15,33], could also be used to characterize the spatial effect. Lastly, the proposed approach assumes that the censoring mechanism is noninformative, an assumption that is sometimes untenable in practice [22,31].

Acknowledgments

The authors wish to thank the Editor, the Associate Editor and two reviewers for their many helpful and insightful comments and suggestions.

Appendix: The M–H algorithm for sampling β and ϕ.

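In Section 3.3, the full conditional distribution of β in (7) has a complex form. As a result, we consider its log form,

$$
\log p(\beta \mid D, \theta_{-\beta}) = -\frac{\beta^{\top}\beta}{2\sigma_0} + \sum_{i=1}^{M}\sum_{j=1}^{n_i} x_{ij}^{\top}\beta\,(\delta_{ij0} + z_i\delta_{ij1} + w_i\delta_{ij2}) - \sum_{i=1}^{M}\sum_{j=1}^{n_i} \exp(x_{ij}^{\top}\beta + \phi_i)\,\xi_{ij}\,\bigl[\Lambda_0(a_{ij})(\delta_{ij0} + \delta_{ij3}) + \Lambda_0(b_{ij})(\delta_{ij1} + \delta_{ij2})\bigr].
$$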

It is not a standard distribution. Thus, the M–H algorithm is employed to sample β, which is described as follows.

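  1. Generate a candidate vector $\beta^{(\mathrm{prop})}$ from the proposal distribution $g(\beta \mid \beta^{(m)}) \overset{d}{=} N(\beta^{(m)}, \Sigma_{\beta}^{(m)})$, where $\overset{d}{=}$ denotes equality in distribution and $\beta^{(m)}$ is the value of β at the mth iteration. The covariance matrix $\Sigma_{\beta}^{(m)}$ has the form

    $$
    \Sigma_{\beta}^{(m)} = c_0^2 \left\{ -\left[ \frac{\partial^2 \log p(\beta \mid D, \theta_{-\beta})}{\partial\beta\,\partial\beta^{\top}} \right] \Bigg|_{\beta^{(m)}} \right\}^{-1} = c_0^2 \left\{ \sum_{i=1}^{M}\sum_{j=1}^{n_i} \exp(x_{ij}^{\top}\beta + \phi_i)\,\xi_{ij}\,\bigl[\Lambda_0(a_{ij})(\delta_{ij0} + \delta_{ij3}) + \Lambda_0(b_{ij})(\delta_{ij1} + \delta_{ij2})\bigr]\, x_{ij} x_{ij}^{\top} + \frac{I}{\sigma_0} \right\}^{-1},
    $$
    where the right-hand side is evaluated at $\beta = \beta^{(m)}$ and $c_0$ is a tuning parameter that controls the acceptance rate of β.
  2. Generate u from the uniform distribution U(0,1) and calculate the acceptance rate $r_{\mathrm{accept}}$ from the following equation:
    $$
    r_{\mathrm{accept}} = \min\left\{ 1, \frac{p(\beta^{(\mathrm{prop})} \mid D, \theta_{-\beta})\, g(\beta^{(m)} \mid \beta^{(\mathrm{prop})})}{p(\beta^{(m)} \mid D, \theta_{-\beta})\, g(\beta^{(\mathrm{prop})} \mid \beta^{(m)})} \right\}.
    $$
    If $u < r_{\mathrm{accept}}$, accept $\beta^{(\mathrm{prop})}$ and let $\beta^{(m+1)} = \beta^{(\mathrm{prop})}$; otherwise, let $\beta^{(m+1)} = \beta^{(m)}$.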
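The two steps above can be sketched in code as follows. This is an illustration on a toy target, not the authors' implementation: the log density has the same structure as the log full conditional of β (a normal prior term, a linear term, and an exponential term), but the counts `y` and the positive weights `w` are hypothetical stand-ins for the latent-variable and baseline-hazard quantities. Because the proposal covariance depends on the current state, the proposal is not symmetric, so both proposal densities are kept in the acceptance ratio.

```python
import numpy as np


def log_post(beta, X, y, w, sigma0):
    """Toy log target: -b'b/(2 sigma0) + sum_k y_k x_k'b - sum_k w_k exp(x_k'b)."""
    eta = X @ beta
    return -beta @ beta / (2.0 * sigma0) + y @ eta - np.sum(w * np.exp(eta))


def proposal_cov(beta, X, w, sigma0, c0):
    """c0^2 times the inverse negative Hessian of the log target at beta."""
    weights = w * np.exp(X @ beta)
    neg_hess = (X * weights[:, None]).T @ X + np.eye(X.shape[1]) / sigma0
    cov = c0 ** 2 * np.linalg.inv(neg_hess)
    return 0.5 * (cov + cov.T)  # enforce exact symmetry


def log_mvn(x, mean, cov):
    """Log N(mean, cov) density at x, omitting the constant that cancels."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)


def mh_step(beta, X, y, w, sigma0, c0, rng):
    """One M-H update; returns the new state and whether the proposal was accepted."""
    cov_cur = proposal_cov(beta, X, w, sigma0, c0)
    prop = rng.multivariate_normal(beta, cov_cur)            # step 1
    cov_prop = proposal_cov(prop, X, w, sigma0, c0)
    log_ratio = (log_post(prop, X, y, w, sigma0) + log_mvn(beta, prop, cov_prop)
                 - log_post(beta, X, y, w, sigma0) - log_mvn(prop, beta, cov_cur))
    if np.log(rng.uniform()) < min(0.0, log_ratio):          # step 2
        return prop, True
    return beta, False


# Hypothetical data for the toy target.
rng = np.random.default_rng(2)
beta_true = np.array([0.5, -0.3])
X = rng.normal(size=(300, 2))
w = np.ones(300)
y = rng.poisson(w * np.exp(X @ beta_true))

beta = np.zeros(2)
chain, n_accept = [], 0
for _ in range(2000):
    beta, accepted = mh_step(beta, X, y, w, sigma0=100.0, c0=1.0, rng=rng)
    chain.append(beta)
    n_accept += accepted

post_mean = np.mean(chain[500:], axis=0)  # discard the first 500 draws as burn-in
accept_rate = n_accept / 2000
print(post_mean, accept_rate)
```

Raising `c0` widens the proposal and typically lowers the acceptance rate; lowering it has the opposite effect.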

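Similarly, $\phi = (\phi_1, \ldots, \phi_M)^{\top}$ can be sampled as above by considering the log form of (8). Here, ϕ can be sampled one component at a time, and the proposal distribution of $\phi_i$ is $g(\phi_i \mid \phi_i^{(m)}) \overset{d}{=} N(\phi_i^{(m)}, \sigma_{\phi_i}^{(m)})$, where $\sigma_{\phi_i}^{(m)}$ has the form

$$
\sigma_{\phi_i}^{(m)} = c_1^2 \left\{ -\left[ \frac{\partial^2 \log p(\phi_i \mid D, \theta_{-\phi_i})}{\partial\phi_i^2} \right] \Bigg|_{\phi_i^{(m)}} \right\}^{-1} = c_1^2 \left\{ \sum_{j=1}^{n_i} \exp(x_{ij}^{\top}\beta + \phi_i)\,\xi_{ij}\,\bigl[\Lambda_0(a_{ij})(\delta_{ij0} + \delta_{ij3}) + \Lambda_0(b_{ij})(\delta_{ij1} + \delta_{ij2})\bigr] + h_{i+}\tau \right\}^{-1}.
$$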

Funding Statement

This research was partially supported by the Beijing Natural Science Foundation [grant number Z210003] and the National Natural Science Foundation of China [grant numbers 12171328 and 11971064].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Banerjee S., Carlin B., and Gelfand A., Hierarchical Modeling and Analysis of Spatial Data, 2nd ed., Chapman and Hall/CRC, 2014.
  • 2. Banerjee S., Wall M.M., and Carlin B.P., Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota, Biostatistics 4 (2003), pp. 123–142.
  • 3. Besag J., Spatial interaction and the statistical analysis of lattice systems, J. R. Stat. Soc. B Methodol. 36 (1974), pp. 192–225.
  • 4. Besag J. and Kooperberg C., On conditional and intrinsic autoregressions, Biometrika 82 (1995), pp. 733–746.
  • 5. Bezanson J., Edelman A., Karpinski S., and Shah V.B., Julia: A fresh approach to numerical computing, SIAM Rev. 59 (2017), pp. 65–98.
  • 6. Brook D.F., On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbour systems, Biometrika 51 (1964), pp. 481–483.
  • 7. Cai B., Lin X., and Wang L., Bayesian proportional hazards model for current status data with monotone splines, Comput. Stat. Data Anal. 55 (2011), pp. 2644–2651.
  • 8. Gao F., Zeng D., and Lin D.Y., Semiparametric estimation of the accelerated failure time model with partly interval-censored data, Biometrics 73 (2017), pp. 1161–1168.
  • 9. Geisser S. and Eddy W.F., A predictive approach to model selection, J. Am. Stat. Assoc. 74 (1979), pp. 153–160.
  • 10. Gressani O., Faes C., and Hens N., Laplacian-p-splines for Bayesian inference in the mixture cure model, Stat. Med. 41 (2022), pp. 2602–2626.
  • 11. Henderson R., Shimakura S., and Gorst D., Modeling spatial variation in leukemia survival data, J. Am. Stat. Assoc. 97 (2002), pp. 965–972.
  • 12. Hennerfeind A., Brezger A., and Fahrmeir L., Geoadditive survival models, J. Am. Stat. Assoc. 101 (2006), pp. 1065–1075.
  • 13. Hodges J.S., Carlin B.P., and Fan Q., On the precision of the conditionally autoregressive prior in spatial models, Biometrics 59 (2003), pp. 317–322.
  • 14. Jin X., Carlin B., and Banerjee S., Generalized hierarchical multivariate CAR models for areal data, Biometrics 61 (2005), pp. 950–961.
  • 15. Jin J., Zhang L., Leng E., Metzger G.J., and Koopmeiners J.S., Bayesian spatial models for voxel-wise prostate cancer classification using multi-parametric magnetic resonance imaging data, Stat. Med. 41 (2022), pp. 483–499.
  • 16. Joly P., Commenges D., and Letenneur L., A penalized likelihood approach for arbitrarily censored and truncated data: application to age-specific incidence of dementia, Biometrics 54 (1998), pp. 185–194.
  • 17. Kim J., Maximum likelihood estimation for the proportional hazards model with partly interval-censored data, J. R. Stat. Soc. B. 65 (2003), pp. 489–502.
  • 18. Kneib T., Mixed model-based inference in geoadditive hazard regression for interval-censored survival times, Comput. Stat. Data Anal. 51 (2006), pp. 777–792.
  • 19. Li Y. and Lin X., Semiparametric normal transformation models for spatially correlated survival data, J. Am. Stat. Assoc. 101 (2006), pp. 591–603.
  • 20. Li Y. and Ryan L., Modeling spatial survival data using semiparametric frailty models, Biometrics 58 (2002), pp. 287–297.
  • 21. Liu Y., Sun D., and He C.Z., A hierarchical conditional autoregressive model for colorectal cancer survival data, Wiley Interdiscip. Rev. Comput. Stat. 6 (2014), pp. 37–44.
  • 22. Ma L., Hu T., and Sun J., Sieve maximum likelihood regression analysis of dependent current status data, Biometrika 102 (2015), pp. 731–738.
  • 23. Martins R., Silva G.L., and Andreozzi V., Bayesian joint modeling of longitudinal and spatial survival AIDS data, Stat. Med. 35 (2016), pp. 3368–3384.
  • 24. Pan C. and Cai B., A Bayesian model for spatial partly interval-censored data, Commun. Stat. Simul. Comput. 51 (2022), pp. 7513–7525.
  • 25. Pan C., Cai B., and Wang L., A Bayesian approach for analyzing partly interval-censored data under the proportional hazards model, Stat. Methods. Med. Res. 29 (2020), pp. 3192–3204.
  • 26. Pan C., Cai B., Wang L., and Lin X., Bayesian semiparametric model for spatially correlated interval-censored survival data, Comput. Stat. Data Anal. 74 (2014), pp. 198–208.
  • 27. Perkel J.M., Julia: come for the syntax, stay for the speed, Nature 572 (2019), pp. 141–142.
  • 28. Ramsay J., Monotone regression splines in action, Stat. Sci. 3 (1988), pp. 425–441.
  • 29. Schnell P., Bandyopadhyay D., Reich B., and Nunn M., A marginal cure rate proportional hazards model for spatial survival data, J. R. Stat. Soc. C. Appl. Stat. 64 (2015), pp. 673–691.
  • 30. Spiegelhalter D., Best N., Carlin B., and Linde A., Bayesian measures of model complexity and fit (with discussion), J. R. Stat. Soc. B. 64 (2002), pp. 583–639.
  • 31. Sun J., The Statistical Analysis of Interval-censored Failure Time Data, Springer, New York, 2006.
  • 32. Sun L., Li S., Wang L., and Song X., A semiparametric mixture model approach for regression analysis of partly interval-censored data with a cured subgroup, Stat. Methods. Med. Res. 30 (2021), pp. 1890–1903.
  • 33. Utazi C., Thorley J., Alegana V., Ferrari M., Nilsen K., Takahashi S., Metcalf C., Lessler J., and Tatem A., A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping, Stat. Methods. Med. Res. 28 (2019), pp. 3226–3241.
  • 34. Wang C., Jiang J., and Song X., Bayesian transformation models with partly interval-censored data, Stat. Med. 41 (2021), pp. 1263–1279.
  • 35. Wang L., McMahan C., Hudgens M., and Qureshi Z., A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data, Biometrics 72 (2015), pp. 222–231.
  • 36. Wang L. and Wang L., Regression analysis of arbitrarily censored survival data under the proportional odds model, Stat. Med. 40 (2021), pp. 3724–3739.
  • 37. Zeng D., Mao L., and Lin D., Maximum likelihood estimation for semiparametric transformation models with interval-censored data, Biometrika 103 (2016), pp. 253–271.
  • 38. Zhou H. and Hanson T., A unified framework for fitting Bayesian semiparametric models to arbitrarily censored survival data, including spatially referenced data, J. Am. Stat. Assoc. 113 (2018), pp. 571–581.
  • 39. Zhou Q., Sun Y., and Gilbert P.B., Semiparametric regression analysis of partly interval-censored failure time data with application to an AIDS clinical trial, Stat. Med. 40 (2021), pp. 4376–4394.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
