Abstract
In some applications, clustered survival data are arranged spatially, for example by clinical center or geographical region. Incorporating spatial variation in these data not only can improve the accuracy and efficiency of parameter estimation, but also reveals spatial patterns of survivorship that help identify high-risk areas. Competing risks in survival data concern a situation where there is more than one cause of failure, but only the occurrence of the first one is observable. In this paper, we considered Bayesian subdistribution hazard regression models with spatial random effects for clustered HIV/AIDS data. An intrinsic conditional autoregressive (ICAR) distribution was employed to model the areal spatial random effects. Competing models were compared using the deviance information criterion. We illustrated the gains of our model through an application to the HIV/AIDS data and through simulation studies.
KEYWORDS: Competing risks, subdistribution hazard, cumulative incidence function, spatial random effect, Markov chain Monte Carlo
1. Introduction
In biomedical studies, it is common to have time-to-event data in which the event of interest is usually death, giving rise to survival analysis. In many situations, some potential risk factors are unmeasured or unobservable. In the presence of such risk factors, the usual survival models such as the Cox proportional hazards model are not appropriate [28]. To solve this problem, Vaupel et al. introduced a model with a univariate random effect that corresponds to the health status of the stratum [36]. They referred to this random effect as frailty, and these models therefore became known as frailty models. Clayton and Cuzick generalized the proportional hazards model by adding a random effect to the Cox model to account for variability due to unknown risk factors [11]. Frailty models can improve the accuracy and efficiency of parameter estimation when the survival data are independent. In some situations, however, the survival data are dependent, for example when individuals are grouped into clusters such as clinical centers or geographic regions.
If individuals come from different regions, the survival data will be spatially correlated, because geographically closer regions tend to share similar environmental and social factors; data from the same or nearby regions are therefore likely to be more similar than data from distant regions. Ignoring this spatial correlation can result in biased estimates and misleading inferences. Moreover, by mapping the spatial distribution, spatial survival analysis can identify geographical inequalities in survival and find places or populations that require public health improvements. Spatial models are grouped into three categories according to the data structure: point-referenced (geostatistical) data, where the exact geographic locations (e.g. latitude and longitude) are used; areal (lattice) data, where the region of study is divided into a number of areal units with well-defined boundaries and the positions of the units relative to each other are used (e.g. which units neighbor which others); and point pattern data, where the response is often fixed and only the locations are assumed to be random [3].
In biostatistical and epidemiological research, modeling spatial survival data using region-specific random effects that account for spatial association has become increasingly popular. For example, Li and Ryan [35] and Banerjee et al. [5] proposed a spatial survival model using the proportional hazards structure from the classical and Bayesian perspectives, respectively. In the latter paper, Banerjee et al. also compared the geostatistical and areal approaches and showed that the geostatistical frailty model is time-consuming and produces results that differ little from those of the areal frailty model. Survival models for capturing spatiotemporal variation in survival data were investigated by Banerjee and Carlin [5] and Hanson et al. [21]. Banerjee and Dey presented a semiparametric hierarchical modeling framework for the proportional odds model in spatial survival data [4]. Diva et al. used a spatial Bayesian survival model within the proportional hazards and proportional odds frameworks for patients diagnosed with multiple gastrointestinal cancers [13]. Pan et al. proposed a spatial Bayesian semiparametric model to analyze interval-censored survival data [29]. More recently, Cramb et al. proposed a spatial flexible parametric relative survival model [12], and Zhou and Hanson applied a spatial semiparametric survival model to arbitrarily censored survival data [40].
In survival data, there is also a situation where the time from the starting point to an event of interest may not be observable because of the occurrence of another, so-called competing, event. Thus, in competing risks data, there is more than one cause of failure, but only the occurrence of the first one is observable. The modeling and interpretation of competing risks data are not straightforward because of the problem of estimating failure probabilities after exclusion of one of the competing risks [32]. There are two approaches for analyzing competing risks data: (i) modeling the cause-specific hazard function and (ii) directly modeling the cause-specific cumulative incidence function (CIF). The first model relates the covariate effects to the cause-specific hazard, while the second relates the covariate effects to the cause-specific cumulative incidence (absolute risk). Because direct information on the cumulative incidence is more appropriate in much medical research, direct modeling of the cause-specific CIF is an interpretation-friendly approach. The most common method for direct modeling of the cause-specific CIF is to model the subhazard (the hazard function of the subdistribution), first described by Fine and Gray [16].
In terms of competing risks models for clustered data, Christian et al. considered a flexible cause-specific hazard frailty model estimated by hierarchical likelihood [10]. Lai et al. proposed a cause-specific hazard model with bivariate random effects to analyze clustered data with two competing risks using the estimation procedure of the generalized linear mixed model [26]. Chen et al. investigated clustered competing risks data using a nonparametric inference procedure and developed an estimator for a multivariate CIF with robust point-wise standard errors [9]. Gorfine and Hsu applied a flexible cause-specific hazard frailty model with a flexible dependency structure among competing risks within a cluster and an unspecified within-subject dependency structure [19]. Another approach to handling the potential heterogeneity in clustered competing risks data is to extend the Fine and Gray model to a subhazard frailty model [14,24,38]. Zhou et al. proposed a procedure for modeling the subdistribution hazard by stratification, allowing the baseline hazard to vary across strata [39]. Ha et al. investigated a subhazard model with multivariate frailty to account for potential heterogeneity in treatment effects among centers [20]. Regarding spatially clustered data, Hesam et al. more recently used a cause-specific hazard spatial frailty model with a multivariate conditional autoregressive distribution for the frailties in clustered competing risks data [22]. To the best of our knowledge, however, no model for the subdistribution hazards has been applied to spatially correlated competing risks data. Thus, the new aspect of this paper is the extension of a regression model for the subdistribution hazards to the spatially clustered competing risks HIV/AIDS data.
The motivation for our model in the HIV/AIDS data is to address the primary question of the effect of prognostic factors on the cumulative incidences of AIDS progression and death before AIDS while accounting for the spatial heterogeneity among regions. Subsequent objectives are (i) investigating the patterns of spatial inequality in the cumulative incidences of AIDS progression and death before AIDS in order to identify high-risk areas and (ii) investigating whether spatially varying covariates such as socioeconomic status and access to and quality of healthcare need to be included in the modeling of the subdistribution hazards.
The rest of this paper is outlined as follows. Section 2 describes the HIV/AIDS data that were the motivation of our proposed model. Section 3 formulates the proposed model, including the notations, modeling the proportional subdistribution hazards with the spatial random effects, the estimation method and model comparison using deviance information criterion (DIC) [35]. Section 4 analyzes the HIV/AIDS data. In Section 5, the performance of the proposed model is evaluated through simulation studies. Finally, Section 6 presents a discussion of related issues and future investigation areas.
2. The HIV/AIDS data
The data were from a retrospective cohort study conducted in Hamadan Province, in the central-western part of Iran, from 1997 to 2011. All 585 HIV-positive people who had a medical record in the HIV testing and treatment centers were included in this study. The explanatory variables were as follows: age at the time of diagnosis, classified into three categories (≤24, 25–44 and ≥45 years), sex, marital status, method of transmission (injection drug use (IDU), sexual, mother to child, IDU/sexual) and co-infection with tuberculosis (TB). Also, the date of HIV diagnosis, date of progression to AIDS, date of death (if any) and patient's county of residence were collected from the information documented in the patients' medical records. Information about vital status was checked through December 31, 2011, using active contact with patients. The main outcome in this study was the time interval, in years, between HIV diagnosis and AIDS progression, so the event of interest was AIDS progression. However, some HIV-infected patients die before AIDS progression. Thus, death before AIDS was considered a competing risk because it prevents AIDS progression. Patients who did not experience either of these events by December 31, 2011, or who were lost to follow-up were considered censored. Accordingly, in the final outcome classification, patients were divided into three categories: those who developed AIDS (23.4%), those who died before AIDS (22.9%), and those who were censored (53.7%). The mean duration of follow-up was 2.46 years, with a range of 0 to 14 years. The mean (standard deviation) age of patients was 32.59 (8.71) years, ranging from birth to 66 years. Also, of the 585 HIV-positive subjects, 521 (89.1%) were men and 64 (10.9%) were women, and the majority of HIV-infected subjects acquired HIV through injection drug use (Table 1). The number of HIV/AIDS individuals per county of Hamadan Province is presented in the supplementary material (Table 1).
Figure 1 plots the Aalen–Johansen estimates of the cumulative incidence curves for the risks of AIDS progression and death before AIDS. As seen, few AIDS progressions and deaths before AIDS occur after 10 years, and the cumulative incidence function for both risks plateaus between 10 and 15 years.
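As an illustration of the estimator behind Figure 1, the following is a minimal sketch of a nonparametric (Aalen–Johansen type) cumulative incidence estimate for one cause. The function name `aalen_johansen_cif` and the toy data are hypothetical and are not taken from the study; the coding 0 = censored, 1, 2 = failure types is an assumption made for the example.

```python
from collections import Counter


def aalen_johansen_cif(times, causes, cause):
    """Nonparametric cumulative incidence for one failure type.

    times  : observed times (event or censoring time)
    causes : 0 = censored, 1, 2, ... = failure type (assumed coding)
    Returns a list of (time, CIF) pairs at the distinct event times.
    """
    n_at_risk = len(times)
    events_all = Counter(t for t, c in zip(times, causes) if c != 0)
    events_g = Counter(t for t, c in zip(times, causes) if c == cause)
    leaving = Counter(times)  # every subject leaves the risk set at its own time

    surv = 1.0  # probability of being free of any event just before t
    cif = 0.0
    out = []
    for t in sorted(leaving):
        if events_all[t] > 0:
            # increment = P(event-free before t) * cause-specific hazard at t
            cif += surv * events_g[t] / n_at_risk
            surv *= 1.0 - events_all[t] / n_at_risk
            out.append((t, cif))
        n_at_risk -= leaving[t]
    return out


# Toy example: five subjects, two failure types, one censored observation.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 2]
cif_cause1 = aalen_johansen_cif(toy_times, toy_causes, cause=1)
cif_cause2 = aalen_johansen_cif(toy_times, toy_causes, cause=2)
```

In this sketch the two cause-specific curves add up to one minus the overall event-free survival, which is the property that distinguishes the cumulative incidence from a naive one-minus-Kaplan–Meier estimate per cause.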
Table 1.
Demographic characteristics of the study patients by their final outcome.
Variable | Alive or lost to follow-up: Number (314) | Percent (53.7) | AIDS progression: Number (137) | Percent (23.4) | Death before AIDS: Number (134) | Percent (22.9) | Total: Number (585) | Percent (100) |
---|---|---|---|---|---|---|---|---|
Gender | ||||||||
Male | 273 | 52.4 | 116 | 22.3 | 132 | 25.3 | 521 | 89.05 |
Female | 41 | 64.1 | 21 | 32.8 | 2 | 3.1 | 64 | 10.94 |
Marital status | ||||||||
Single | 153 | 57.1 | 58 | 21.6 | 57 | 21.3 | 268 | 45.81 |
Married | 117 | 52.5 | 51 | 22.9 | 55 | 24.7 | 223 | 38.11 |
Divorced | 35 | 46.7 | 21 | 28.0 | 19 | 25.3 | 75 | 12.82 |
Widowed | 9 | 47.4 | 7 | 36.8 | 3 | 15.8 | 19 | 3.24 |
Age | ||||||||
1–24 | 40 | 57.1 | 16 | 22.9 | 14 | 20.0 | 70 | 11.96 |
25–44 | 253 | 55.4 | 105 | 23.0 | 99 | 21.7 | 457 | 78.11 |
45–74 | 21 | 36.8 | 15 | 26.3 | 21 | 36.8 | 57 | 9.74 |
Tuberculosis infection | ||||||||
No | 311 | 55.1 | 121 | 21.5 | 132 | 23.4 | 564 | 96.41 |
Yes | 3 | 14.3 | 16 | 76.2 | 2 | 9.5 | 21 | 3.58 |
Mode of HIV transmission | ||||||||
Injection drug use | 246 | 51.8 | 103 | 21.7 | 126 | 26.5 | 475 | 81.19 |
Sexual | 48 | 66.7 | 19 | 26.4 | 5 | 6.9 | 72 | 12.30 |
Mother to child | 3 | 33.3 | 6 | 66.7 | 0 | 0 | 9 | 1.53 |
Injecting drug use/sexual | 15 | 57.7 | 9 | 34.6 | 2 | 7.7 | 26 | 4.44 |
Figure 1.
The Aalen–Johansen estimates of the cumulative incidence functions for AIDS progression and death before AIDS.
3. Model formulation
In this section, we suppose that the $n$ individuals under study come from $K$ counties. The number of individuals sampled in the $k$th county is $n_k$, where $k = 1, \ldots, K$ and $\sum_{k=1}^{K} n_k = n$. Also, let $(T_{ik}, \delta_{ik}, \mathbf{x}_{ik})$ be the competing risks data for the $i$th individual living in the $k$th county, $i = 1, \ldots, n_k$, where $T_{ik}$ denotes the time to an event, which may be a right-censoring time. In the setting of competing risks, each individual could experience one of $G$ possible failure types during follow-up or could be right censored. Hence, the failure type indicator $\delta_{ik}$ takes values in $\{0, 1, \ldots, G\}$, with $\delta_{ik} = 0$ indicating a censored observation and $\delta_{ik} = g$, where $g = 1, \ldots, G$, indicating that the $i$th subject fails from the $g$th type of failure. Moreover, the censoring mechanism is assumed to be non-informative.
3.1. Modeling of subdistribution hazards with spatial random effects
The cause-specific CIF for the $g$th risk is defined by
$$F_g(t \mid \mathbf{x}_g) = P(T \le t,\ \delta = g \mid \mathbf{x}_g), \tag{1}$$
where $\mathbf{x}_g$ is a vector of observed explanatory covariates associated with the $g$th type of failure. The cause-specific CIF for the $g$th risk gives the probability that an individual will experience the $g$th type of failure by time $t$. Fine and Gray developed a proportional hazards model for the subdistribution that directly links the regression coefficients with the CIF [16]. The corresponding subdistribution hazard function for the $g$th type of failure at time $t$ is defined by
$$\lambda_g(t \mid \mathbf{x}_g) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, P\{t \le T < t + \Delta t,\ \delta = g \mid T \ge t \ \text{or}\ (T < t,\ \delta \ne g),\ \mathbf{x}_g\}, \tag{2}$$
which gives the instantaneous rate of failure from the $g$th type in the time interval $[t, t + \Delta t)$, given that an individual has not experienced any event until time $t$ or has previously experienced an event other than the $g$th type. In other words, an individual who has experienced an event before time $t$ that is not the event of interest remains in the risk set for all future failure times. Also, the conditional subhazard function for the $g$th type of failure under a proportional hazards assumption is modeled as
$$\lambda_g(t \mid \mathbf{x}_g) = \lambda_{g0}(t) \exp(\mathbf{x}_g^{\top} \boldsymbol{\beta}_g), \tag{3}$$
where $\lambda_{g0}(t)$ is an unspecified subdistribution baseline hazard function for the $g$th type of failure. This model is semi-parametric, and the baseline function is not directly estimated because, as in the Cox model, it does not enter the partial likelihood. Also, $\boldsymbol{\beta}_g$ is a vector of regression parameters associated with the vector of observed explanatory covariates $\mathbf{x}_g$ for the $g$th type of failure. The CIF of the $g$th type of failure is linked directly to the subdistribution hazard model in the following way:
$$F_g(t \mid \mathbf{x}_g) = 1 - \exp\left\{-\int_0^t \lambda_{g0}(s) \exp(\mathbf{x}_g^{\top} \boldsymbol{\beta}_g)\, ds\right\}. \tag{4}$$
Hence, there is a one-to-one relationship between the subdistribution hazard function and the CIF. In other words, the CIF for the $g$th type of failure can be estimated directly from the regression coefficients obtained from the subdistribution hazard model. However, these coefficients do not quantify the expected change in the CIF for a one-unit change in the predictor. Hence, the interpretation of these coefficients is not as straightforward as that of the regression coefficients in the cause-specific hazard model. The subhazard ratio is the resulting effect measure for each predictor. A subhazard ratio equal to one indicates no association between the predictor and the CIF, a subhazard ratio greater than one indicates that an increase in the predictor value is associated with an increased risk, and a subhazard ratio less than one indicates the opposite. For further details, see Wolbers et al. [37].
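To make the link between the subhazard ratio and the CIF concrete, the following is a minimal sketch that evaluates the CIF from a cumulative baseline subhazard under the proportional subdistribution hazards form. The function name and the baseline values (0.1 and 1e-6) are hypothetical choices for illustration; only the coefficient 1.798 for TB co-infection is taken from the fitted model reported later in the paper.

```python
import math


def cif_from_subhazard(cum_baseline_subhazard, x, beta):
    """CIF implied by a proportional subdistribution hazards model:
    F(t | x) = 1 - exp(-Lambda_0(t) * exp(x' beta))."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 - math.exp(-cum_baseline_subhazard * math.exp(linear_predictor))


beta_tb = [1.798]                       # log subhazard ratio for TB co-infection
subhazard_ratio = math.exp(beta_tb[0])

# Hypothetical cumulative baseline subhazard at some time t.
cif_no_tb = cif_from_subhazard(0.1, [0.0], beta_tb)
cif_tb = cif_from_subhazard(0.1, [1.0], beta_tb)
cif_ratio = cif_tb / cif_no_tb          # smaller than the subhazard ratio

# Only when the baseline CIF is very small does the CIF ratio approach the
# subhazard ratio.
early_ratio = (cif_from_subhazard(1e-6, [1.0], beta_tb)
               / cif_from_subhazard(1e-6, [0.0], beta_tb))
```

The sketch shows why a subhazard ratio gives the direction of the effect on the CIF but not its size: the ratio of the two CIFs depends on the baseline level and is generally closer to one than the subhazard ratio itself.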
The subdistribution hazard model does not make any assumptions about the dependence among the events and does not require independence of latent failure times. However, since subjects come from different counties, and subjects living within the same county or in neighboring counties share common or similar health services and environmental risk factors, we consider a survival model with spatial random effects. Adding the spatial random effects to the hazard function accommodates a grouping association for subjects living in the same county and a neighborhood association for subjects living in neighboring counties. Let $\phi_{gk}$, $k = 1, \ldots, K$, $g = 1, \ldots, G$, denote the spatial effect of latent risk factors for the $k$th county and the $g$th type of failure. We introduce these random effects through (3) as
$$\lambda_g(t \mid \mathbf{x}_{ik}, \phi_{gk}) = \lambda_{g0}(t) \exp(\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk}). \tag{5}$$
In other words, in the HIV/AIDS data, the model (5) captures the residual or unexplained log-relative subhazards of AIDS progression and death before AIDS in each of the nine Hamadan counties. We used the areal approach for the spatial random effects such that we used only the information about the adjacency of each county relative to each other rather than other metrics based on geographic distance.
Also, in the presence of right censoring, the inverse probability of censoring weighting approach is used in the Fine and Gray model to obtain consistent estimators of the regression coefficients [33]. We assessed the association between the censoring times and the explanatory variables by a Cox regression, and none of the variables had a significant effect. Hence, the weight of the $i$th subject of the $k$th county at time $t$ for the event of interest $g$ (AIDS progression) is defined as follows:
$$w_{ik}(t) = \begin{cases} 1, & t \le T_{ik}, \\ 0, & t > T_{ik},\ \delta_{ik} \in \{0, g\}, \\ \hat{G}(t)/\hat{G}(T_{ik}), & t > T_{ik},\ \delta_{ik} \notin \{0, g\}, \end{cases} \tag{6}$$
where $\hat{G}(\cdot)$ is the Kaplan–Meier estimate of the survival function of the censoring times. Here, $w_{ik}(t) = 1$ as long as individuals have not failed, $w_{ik}(t) = 0$ if they failed from the event of interest or have been right censored, and $w_{ik}(t) = \hat{G}(t)/\hat{G}(T_{ik})$ if they failed from competing events. Therefore, individuals who have not failed contribute fully to the partial likelihood, while those who failed from the event of interest or have been right censored make no contribution. Also, the contribution of individuals who failed from competing events has to be weighted according to their probability of being censored, to account for the fact that these individuals could have been censored if they had remained at risk. We estimated the censoring distribution by pooling all counties, because the county sizes were small and a county-specific estimate of the censoring distribution would not be consistent. In other words, we estimated $\hat{G}$ while ignoring the spatial association among counties.
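The weighting scheme described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the analysis: the function names `km_censoring` and `ipcw_weight` are hypothetical, censoring is coded as cause 0, the censoring distribution is pooled over all subjects, and the right-continuous value of the Kaplan–Meier estimate is used at tied times.

```python
from collections import Counter


def km_censoring(times, causes):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring
    (cause 0, an assumed coding) as the event. Returns a step function."""
    n_at_risk = len(times)
    censored = Counter(t for t, c in zip(times, causes) if c == 0)
    leaving = Counter(times)
    steps, g = [], 1.0
    for t in sorted(leaving):
        if censored[t]:
            g *= 1.0 - censored[t] / n_at_risk
            steps.append((t, g))
        n_at_risk -= leaving[t]

    def G(t):
        value = 1.0
        for s, g_s in steps:
            if s > t:
                break
            value = g_s
        return value

    return G


def ipcw_weight(t, T_i, cause_i, cause_of_interest, G):
    """Weight of a subject with observed time T_i and cause cause_i at time t."""
    if t <= T_i:
        return 1.0                      # has not failed yet: full contribution
    if cause_i in (0, cause_of_interest):
        return 0.0                      # censored or failed from event of interest
    return G(t) / G(T_i)                # competing event: down-weighted over time


# Toy data: cause 1 = event of interest, cause 2 = competing event.
toy_times = [1, 2, 3, 4, 5, 6]
toy_causes = [2, 0, 1, 0, 1, 0]
G_hat = km_censoring(toy_times, toy_causes)
w_competing = ipcw_weight(3, 1, 2, 1, G_hat)   # subject who died at time 1
```

A subject who failed from the competing event therefore stays in the risk set with a weight that decays as the chance of still being uncensored decreases.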
3.2. Estimation method
In the Bayesian approach, we specified prior distributions for all parameters and sampled from the posterior distributions using Markov chain Monte Carlo (MCMC) methods, which involve the Gibbs sampler and the Metropolis–Hastings algorithm. Since the marginal posterior distributions do not have closed forms, the MCMC methods were applied to avoid analytically intractable integrals. Let $\boldsymbol{\theta}_g = (\boldsymbol{\beta}_g, \boldsymbol{\phi}_g, \sigma_g^2)$ denote all the unknown parameters for the $g$th type of failure and let $D$ represent the observed data.
Under the usual assumptions of independent and non-informative censoring, the partial likelihood of the failure times given in Fine and Gray is of the form
$$L(\boldsymbol{\beta}_g, \boldsymbol{\phi}_g \mid D) = \prod_{k=1}^{K} \prod_{i=1}^{n_k} \left[ \frac{\exp(\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk})}{\sum_{(j,l) \in R_{ik}} \exp(\mathbf{x}_{jl}^{\top} \boldsymbol{\beta}_g + \phi_{gl})} \right]^{I(\delta_{ik} = g)}, \tag{7}$$
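where $R_{ik}$ denotes the risk set at the failure time of the $i$th individual in the $k$th county for the $g$th type of failure. The partial likelihood of the failure times was then modified using inverse probability of censoring weighting, as originally proposed by Fine and Gray:
$$L_w(\boldsymbol{\beta}_g, \boldsymbol{\phi}_g \mid D) = \prod_{k=1}^{K} \prod_{i=1}^{n_k} \left[ \frac{\exp(\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk})}{\sum_{(j,l) \in R_{ik}} w_{jl}(T_{ik}) \exp(\mathbf{x}_{jl}^{\top} \boldsymbol{\beta}_g + \phi_{gl})} \right]^{I(\delta_{ik} = g)}. \tag{8}$$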
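A weighted partial likelihood of this kind can be evaluated as in the following minimal sketch. It is an illustration under simplifying assumptions rather than the authors' implementation: the function names are hypothetical, the linear predictor `eta` is assumed to already contain the covariate effects plus the county random effect, and the example weight function corresponds to the special case without censoring (so that the censoring survival function equals one).

```python
import math


def weighted_log_partial_likelihood(times, causes, eta, weight, cause=1):
    """Log of a Fine-Gray type weighted partial likelihood.

    eta[i]       : linear predictor of subject i (covariates + random effect)
    weight(j, t) : weight of subject j in the risk set at time t
    """
    n = len(times)
    log_lik = 0.0
    for i in range(n):
        if causes[i] != cause:
            continue  # only failures from the cause of interest contribute a term
        denom = sum(weight(j, times[i]) * math.exp(eta[j]) for j in range(n))
        log_lik += eta[i] - math.log(denom)
    return log_lik


# Toy data without censoring: subject 1 fails from the competing cause 2.
toy_times = [1.0, 2.0, 3.0]
toy_causes = [1, 2, 1]
toy_eta = [0.0, 0.0, 0.0]


def weight_no_censoring(j, t):
    """Assumed special case: no censoring, so G(t) = 1 for all t."""
    if t <= toy_times[j]:
        return 1.0
    return 1.0 if toy_causes[j] == 2 else 0.0


log_lik = weighted_log_partial_likelihood(
    toy_times, toy_causes, toy_eta, weight_no_censoring, cause=1
)
```

In the toy data the subject who failed from the competing cause at time 2 is still counted in the denominator at time 3, which is exactly what separates the subdistribution risk set from the ordinary Cox risk set.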
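Also, for the spatial random effects of the $g$th type of failure, we used the intrinsic conditional autoregressive (ICAR) prior. A common model for areal data collected over a geographic region in a univariate case, such as a single disease, is the conditional autoregressive (CAR) distribution developed by Besag [6]. Let $\boldsymbol{\phi}_g = (\phi_{g1}, \ldots, \phi_{gK})^{\top}$ be the vector of spatial random effects at the $K$ areal counties for the $g$th type of failure; then the general form of the CAR conditional distributions for the $g$th type of failure is
$$\phi_{gk} \mid \boldsymbol{\phi}_{g(-k)} \sim N\left(\rho_g \sum_{l \sim k} b_{kl}\, \phi_{gl},\ \tau_{gk}^2\right), \tag{9}$$
where $\boldsymbol{\phi}_{g(-k)}$ is the set of all spatial random effects except the one for county $k$, and $l \sim k$ indicates that county $l$ is a neighbor of county $k$ [2]. Also, $\tau_{gk}^2$ is the conditional variance and $\rho_g$ is called the smoothing parameter for the $g$th type of failure. By Brook's (1964) lemma [8], the full conditional distributions in (9) determine the joint distribution as
$$p(\boldsymbol{\phi}_g) \propto \exp\left\{-\tfrac{1}{2}\, \boldsymbol{\phi}_g^{\top} M_g^{-1} (I - \rho_g B)\, \boldsymbol{\phi}_g\right\}, \tag{10}$$
where $B = (b_{kl})$ and $M_g$ is a $K \times K$ diagonal matrix with $(M_g)_{kk} = \tau_{gk}^2$. If we specify $b_{kl} = w_{kl}/w_{k+}$ and $\tau_{gk}^2 = \sigma_g^2 / w_{k+}$, where $w_{k+}$ is the number of neighbors of the $k$th county and $W = (w_{kl})$ is the adjacency matrix of the graph representing our counties ($w_{kl} = 1$ if county $l$ is a neighbor of county $k$, and $w_{kl} = 0$ otherwise), we obtain the so-called ICAR model, which is the most common CAR distribution [7]. In this structure, the smoothing parameter is set to $\rho_g = 1$. Hence, the joint prior for $\boldsymbol{\phi}_g$ is
$$p(\boldsymbol{\phi}_g \mid \sigma_g^2) \propto \exp\left\{-\frac{1}{2 \sigma_g^2}\, \boldsymbol{\phi}_g^{\top} (D_w - W)\, \boldsymbol{\phi}_g\right\}, \tag{11}$$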
where $D_w = \mathrm{diag}(w_{1+}, \ldots, w_{K+})$ is a $K \times K$ diagonal matrix. The model specification in the Bayesian setting is completed by assigning prior distributions to the spatial random effects variance, $\sigma_g^2$, and to the regression coefficients of the subhazard models, $\boldsymbol{\beta}_g$. An inverse-gamma prior and a multivariate normal prior were used for the spatial random effects variance and the regression coefficients, respectively. For all these priors, we chose distributions with very high variance. Also, we considered nonspatial random effects to account for global county heterogeneity, as is common practice in spatial areal modeling: we replaced $\phi_{gk}$ in (5) by $\theta_{gk}$ and assigned the $\theta_{gk}$ independent zero-mean normal priors. Moreover, we considered a model that included both spatial and nonspatial random effects. One problem with this approach is that the random effects become identified only by the prior, so the proper choice of priors for the spatial and nonspatial variance components becomes problematic [15]. We chose vague hyperpriors for these two hyperparameters (having mean 1 but variance 100) in order to allow maximum flexibility in the partitioning of the random effects into spatial and nonspatial components. The joint posterior distribution of the proposed model for the $g$th type of failure is denoted by $p(\boldsymbol{\theta}_g \mid D)$ and is proportional to
$$L_w(\boldsymbol{\beta}_g, \boldsymbol{\phi}_g \mid D)\; p(\boldsymbol{\phi}_g \mid \sigma_g^2)\; p(\sigma_g^2)\; p(\boldsymbol{\beta}_g). \tag{12}$$
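The ICAR prior used for the spatial random effects depends on the map only through the adjacency structure. The following minimal sketch builds the ICAR precision structure (number of neighbors on the diagonal, minus the adjacency matrix) and evaluates the log prior kernel. The 2 × 2 toy map, the function names, and the numerical values are hypothetical and do not correspond to the nine Hamadan counties.

```python
def icar_structure(adjacency):
    """Return D_w - W, where W is a symmetric 0/1 adjacency matrix and
    D_w is diagonal with the number of neighbors of each area."""
    K = len(adjacency)
    return [
        [(sum(adjacency[k]) if k == l else 0) - adjacency[k][l] for l in range(K)]
        for k in range(K)
    ]


def icar_log_kernel(phi, adjacency, sigma2):
    """Log ICAR kernel: -phi'(D_w - W)phi / (2 sigma2), up to a constant."""
    Q = icar_structure(adjacency)
    K = len(phi)
    quad = sum(phi[k] * Q[k][l] * phi[l] for k in range(K) for l in range(K))
    return -quad / (2.0 * sigma2)


def pairwise_form(phi, adjacency):
    """Equivalent form: sum of squared differences over neighboring pairs."""
    K = len(phi)
    return sum(
        (phi[k] - phi[l]) ** 2
        for k in range(K) for l in range(k + 1, K) if adjacency[k][l]
    )


# Hypothetical map: four areas on a 2 x 2 grid, neighbors share an edge.
W_toy = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
phi_toy = [0.2, -0.1, 0.0, -0.1]
log_kernel = icar_log_kernel(phi_toy, W_toy, sigma2=1.0)
```

Because the kernel only penalizes differences between neighbors, adding the same constant to every area leaves it unchanged. This is why the ICAR prior is improper and why a sum-to-zero (centering) constraint is commonly imposed on the spatial effects.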
After initial values were set for the parameters, samples were drawn from the full conditional distributions by the MCMC algorithm. All statistical analyses and the mapping of the results were performed using OpenBUGS version 3.2.3, GeoBUGS, and the R package R2OpenBUGS.
For model selection, we used the deviance information criterion (DIC) as a summary measure [35]. The DIC is based on the posterior distribution of the deviance statistic (a measure of goodness of fit) and an effective number of parameters (a measure of model complexity). Absolute DIC values are not interpretable on their own; only differences in DIC between models matter. Among a collection of alternative models, the model with the smallest DIC is preferred. The DIC was computed from the posterior samples obtained by the MCMC methods.
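As a brief illustration of how the DIC is assembled from MCMC output, the sketch below uses the usual definition (posterior mean deviance plus effective number of parameters, the latter taken as the posterior mean deviance minus the deviance at the posterior mean of the parameters). The function name and the toy deviance draws are hypothetical; the four DIC values are those reported for AIDS progression in Table 2.

```python
def dic_from_samples(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from posterior deviance draws.

    p_D = mean deviance - deviance at the posterior mean (model complexity)
    DIC = mean deviance + p_D
    """
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d


# Hypothetical deviance draws, only to show the arithmetic.
toy_dic, toy_pd = dic_from_samples([100.0, 102.0, 104.0], 99.0)

# Model comparison uses differences in DIC (values for AIDS progression).
dic_aids = {"Model 1": 2687.0, "Model 2": 2687.0, "Model 3": 2669.0, "Model 4": 2681.0}
best_model = min(dic_aids, key=dic_aids.get)
dic_differences = {m: v - dic_aids[best_model] for m, v in dic_aids.items()}
```

Only the entries of `dic_differences` are meaningful for comparison; the absolute level of the DIC depends on constants in the likelihood.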
4. Analysis of the HIV/AIDS data
In order to show the benefits of our proposed model, four models were selected by varying the log-relative subhazard as follows:
Model 1: The model with no random effects in the log-relative subhazard ($\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g$)
Model 2: The model with the nonspatial random effects in the log-relative subhazard ($\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \theta_{gk}$)
Model 3: The model with the spatial random effects in the log-relative subhazard ($\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk}$)
Model 4: The model with the spatial and nonspatial random effects in the log-relative subhazard ($\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk} + \theta_{gk}$).
The first model does not account for any correlation among the counties (the Fine and Gray model). The second model incorporates only the global correlation among the counties. In the third model, we considered the ICAR prior distribution for the random effects to account for spatial correlation among counties. The fourth model is the full model, which accounts for both spatial and nonspatial correlation.
We analyzed the HIV/AIDS data using separate models for each risk because both types of failure were of interest in these data. The proposed models were fitted using sampling chains of 50,000 iterations, thinned by keeping every fifth draw to reduce autocorrelation, with the first 10,000 iterations discarded as burn-in. Trace, autocorrelation and density plots of the posterior distributions were assessed for convergence of the MCMC chains. The Gelman–Rubin statistics for all parameters were between 1.0 and 1.1. Also, the consistent batch means estimates of the Monte Carlo standard errors ranged from 0.0002 to 0.020 for all parameters. The posterior mean and standard deviation were obtained as summary measures for all parameters. Moreover, for the regression coefficients, the adjusted subhazard ratios were calculated with 95% credible intervals.
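The Gelman–Rubin diagnostic mentioned above compares the variability between parallel chains with the variability within them. The following is a minimal sketch of the basic potential scale reduction factor for one scalar parameter; the function name and the toy chains are hypothetical, and refinements used by some software (such as chain splitting or degrees-of-freedom corrections) are deliberately omitted.

```python
import math


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains : list of equal-length lists of post burn-in draws
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance (scaled by chain length) and mean within-chain variance.
    between = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled_variance = (n - 1) / n * within + between / n
    return math.sqrt(pooled_variance / within)


# Two toy chains that agree, and two that are stuck in different regions.
chain_a = [0.0, 1.0] * 50
chain_b = [x + 10.0 for x in chain_a]
r_agree = gelman_rubin([chain_a, list(chain_a)])
r_disagree = gelman_rubin([chain_a, chain_b])
```

Values close to one, as reported for all parameters here, indicate that the chains are sampling from the same distribution; values well above 1.1 are usually read as a sign that the chains have not yet mixed.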
In order to examine the effect of hyperprior specification, we carried out a sensitivity analysis on the prior distribution for the spatial variance component in model 3. We used seven different inverse gamma priors suggested by Silva et al. [34]. The DIC values based on model 3 for these seven priors showed that the prior with moderate information had the best fit for both risks in the HIV/AIDS data (Table 2 of the supplementary material). Also, Table 2 compares the four models based on the DIC for both risks. The DIC values for the four proposed models were close, but model 3, which introduces the spatial random effects, showed the best DIC for both risks. In addition, these values suggested that the unstructured (nonspatial) random effects had only a small effect on the DIC value. This means that the structured spatial random effects, $\phi_{gk}$, account for essentially all the residual spatial heterogeneity among the Hamadan counties for both risks. The summary measures of all parameters of the third model for both risks are presented in Table 3. Based on the adjusted subhazard ratio estimates, TB co-infection and mode of transmission were significantly related to the risk of AIDS progression. Also, the adjusted relationships for the subhazard of death before AIDS were statistically significant for gender, TB co-infection and mode of transmission. In other words, HIV-positive patients who were co-infected with TB or who became infected through mother-to-child transmission had a higher cumulative incidence of AIDS progression than those who were infected with HIV alone or infected through IDU, respectively. Also, the cumulative incidence of death before AIDS was lower in females than in males, with an adjusted subhazard ratio of 0.22.
Table 2.
Model comparison results for the HIV/AIDS data, bold number shows the best fit.
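Model (log-relative subhazard) | AIDS progression: DIC | AIDS progression: $p_D$ | Death before AIDS: DIC | Death before AIDS: $p_D$ |
---|---|---|---|---|
Model 1: $\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g$ | 2687.0 | 19.18 | 2657.0 | 18.11 |
Model 2: $\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \theta_{gk}$ | 2687.0 | 19.72 | 2656.0 | 19.42 |
Model 3: $\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk}$ | **2669.0** | 7.34 | **2641.0** | 6.34 |
Model 4: $\mathbf{x}_{ik}^{\top} \boldsymbol{\beta}_g + \phi_{gk} + \theta_{gk}$ | 2681.0 | 15.45 | 2656.0 | 19.06 |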
Table 3.
Posterior estimation results of model 3.
Variable / category | AIDS progression: Mean (SD) | AIDS progression: Subhazard ratio (95% credible interval) | Death before AIDS: Mean (SD) | Death before AIDS: Subhazard ratio (95% credible interval) |
---|---|---|---|---|
Gender | | | | |
Male | Reference | | Reference | |
Female | 0.246 (0.38) | 1.379 (0.608, 2.693) | −1.866 (0.92) | 0.223 (0.021, 0.747) |
Marital status | | | | |
Single | Reference | | Reference | |
Married | 0.048 (0.22) | 1.075 (0.686, 1.581) | 0.295 (0.20) | 1.370 (0.910, 1.970) |
Divorced | 0.189 (0.27) | 1.254 (0.710, 2.066) | 0.226 (0.28) | 1.305 (0.704, 2.083) |
Widowed | 0.707 (0.43) | 2.223 (0.803, 4.537) | −0.091 (0.65) | 1.111 (0.213, 2.808) |
Age | | | | |
1–24 | Reference | | Reference | |
25–44 | 0.262 (0.32) | 1.369 (0.715, 2.488) | −0.165 (0.27) | 0.880 (0.512, 1.442) |
45–74 | 0.512 (0.41) | 1.823 (0.744, 3.780) | 0.388 (0.35) | 1.571 (0.730, 2.909) |
Tuberculosis infection | | | | |
No | Reference | | Reference | |
Yes | 1.798 (0.29) | 6.290 (3.349, 10.340) | −1.460 (0.81) | 0.306 (0.038, 0.864) |
Mode of transmission | | | | |
Injection drug use | Reference | | Reference | |
Sexual | 0.247 (0.37) | 1.371 (0.607, 2.593) | −0.698 (0.56) | 0.575 (0.149, 1.315) |
Mother to child | 1.977 (0.59) | 8.632 (2.155, 24.10) | ND | ND |
Injection drug use/sexual | 0.728 (0.36) | 2.210 (0.966, 4.024) | −1.365 (0.79) | 0.333 (0.040, 0.913) |
Spatial random effect variance, posterior mean (95% credible interval) | | 0.436 (0.040, 1.919) | | 0.092 (0.004, 0.515) |
We also mapped the summaries of our results. Figure 2 shows two maps that represent the estimated spatial random effects related to the cumulative incidence of both risks in the nine counties of Hamadan Province. The map on the left is for the relative subhazard of AIDS progression and the map on the right is for the relative subhazard of death before AIDS, defined by $\exp(\phi_{1k})$ and $\exp(\phi_{2k})$, respectively, for $k = 1, \ldots, 9$. The posterior estimates of the county-specific random effects were categorized by the quintiles of their distribution in order to show the spatial inequalities on the map. As shown in Figure 2, for the cumulative incidence of AIDS progression, one cluster of counties with the highest risk was identified in the east (two out of nine counties) and one county with the lowest risk was identified in the south. Also, for the cumulative incidence of death before AIDS, the highest-risk cluster consisted of the west and southwest regions (two out of nine counties) and the lowest-risk county was in the east.
Figure 2.
Maps of the spatial relative subhazard for AIDS progression (left) and death before AIDS (right) based on model 3.
Furthermore, the values of the spatial random effects for the subhazard models of AIDS progression and death before AIDS were in the ranges (−0.339, 0.237) and (−0.054, 0.056), respectively (Table 4 of the supplementary material). Such values of the spatial random effects suggest that regional differences had an effect of around 24% to 34% and around 5.4% to 5.6% on the subhazards of AIDS progression and death before AIDS, respectively. Also, the values of the spatial random effects can be interpreted as residual heterogeneity after adjusting for the covariate effects, and such values indicate that covariates such as county economic status, quality of healthcare or total county population are needed in the subhazard model for AIDS progression. Given the small values of the spatial random effects for the risk of death before AIDS, however, these variables are not needed in its subhazard model. On the other hand, the values of the spatial random effects help in the visual representation of spatial inequalities, allowing hot spot areas to be identified. The variances of the spatial random effects were estimated as 0.43 with 95% CI (0.04, 1.91) and 0.09 with 95% CI (0.004, 0.51) for the risks of AIDS progression and death before AIDS, respectively. These values illustrate a high and a low amount of variation across counties in the cumulative incidence of AIDS progression and death before AIDS, respectively.
Table 4.
Simulation results for the estimation of parameters under model 3 when the true parameter values were $(p, \boldsymbol{\beta}_1^{\top}, \boldsymbol{\beta}_2^{\top}) = (0.3, 0.5, 0.5, −0.5, 0.5)$.
Censoring rate | Parameter | $n_k = 15$ ($n = 240$): Estimate | Rel.Bias | MSE | CP | $n_k = 30$ ($n = 480$): Estimate | Rel.Bias | MSE | CP | $n_k = 50$ ($n = 800$): Estimate | Rel.Bias | MSE | CP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20% | $\beta_{11} = 0.5$ | 0.486 | −0.025 | 0.027 | 96 | 0.496 | −0.005 | 0.014 | 96 | 0.495 | −0.007 | 0.008 | 94 |
 | $\beta_{12} = 0.5$ | 0.490 | −0.016 | 0.010 | 93 | 0.504 | 0.010 | 0.007 | 95 | 0.503 | 0.003 | 0.006 | 96 |
 | $\sigma^2 = 1$ | 1.181 | 0.181 | 0.142 | 91 | 1.123 | 0.123 | 0.101 | 93 | 1.106 | 0.106 | 0.068 | 97 |
40% | $\beta_{11} = 0.5$ | 0.481 | −0.032 | 0.018 | 92 | 0.491 | −0.014 | 0.008 | 97 | 0.506 | 0.008 | 0.007 | 96 |
 | $\beta_{12} = 0.5$ | 0.478 | −0.034 | 0.021 | 96 | 0.493 | −0.013 | 0.018 | 94 | 0.492 | −0.014 | 0.009 | 94 |
 | $\sigma^2 = 1$ | 1.292 | 0.292 | 0.194 | 94 | 1.186 | 0.186 | 0.142 | 96 | 0.893 | −0.107 | 0.071 | 98 |
60% | $\beta_{11} = 0.5$ | 0.488 | −0.022 | 0.027 | 93 | 0.471 | −0.039 | 0.016 | 95 | 0.491 | −0.014 | 0.012 | 94 |
 | $\beta_{12} = 0.5$ | 0.484 | −0.028 | 0.032 | 94 | 0.475 | −0.035 | 0.026 | 94 | 0.520 | 0.033 | 0.015 | 97 |
 | $\sigma^2 = 1$ | 1.312 | 0.312 | 0.213 | 92 | 1.298 | 0.298 | 0.184 | 97 | 0.857 | −0.143 | 0.080 | 96 |
5. Simulation study
The performance of model 3 was evaluated through a series of scenarios in a simulation study. For each dataset, we assumed an area containing 16 spatial regions on a grid. We generated a total of 500 simulated datasets with three levels of sample size in each spatial region (nk = 15, 30 and 50) and three levels of censoring rate (low, 20%; medium, 40%; and high, 60%). The sample sizes were assumed to be equal across spatial regions for convenience only; our model allows different sample sizes in different counties. With this setup, the generated datasets had three levels of total sample size (240, 480 and 800). We used a continuous covariate, generated from a standard normal distribution, and a binary treatment covariate (1 for treatment and 0 for placebo), generated from a Bernoulli distribution with success probability 0.5. For each dataset, we simulated two competing risks, risk 1 and risk 2. The failure times were generated with an algorithm similar to the one that Fine and Gray presented for competing risks data [16], in which the proportion of failure type 1 is set in advance as a simulation parameter. First, a random number R was generated from a continuous uniform distribution on [0, 1]; then, for each individual, the time of failure type 1 was obtained by applying to R the inverse of the subdistribution function of failure type 1. The times of failure type 2 were generated directly from their conditional distribution given a failure of type 2, which is a proper distribution function. Therefore, for a failure of type 2, the failure time can be simulated directly by plugging a random number R into the inverse of this conditional distribution.
Following Fine and Gray [16] and Katsahian and Boudreau [23], the proportion of failure type 1 and the corresponding regression coefficients were set to (0.3, 0.5, 0.5, −0.5, 0.5) in one set of scenarios and to (0.6, 1, −1, 1, 1) in the other. The spatial random effects were generated from the ICAR distribution and then centered around their mean. An additional parameter was introduced to make the precision matrix invertible, and the centering makes the spatial random effects identifiable. In the structure of the ICAR distribution, the spatial random effects variance was set to 1, the true value listed in Tables 4 and 5. The censoring times were generated from a uniform (0, a) distribution, where the value of a was selected empirically to achieve the approximate right-censoring rate: low (around 20%), medium (around 40%) or high (around 60%).
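The data-generating steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the Fine and Gray form of the type 1 subdistribution, 1 − [1 − p(1 − exp(−t))]^exp(η), with the spatial effect added to the type 1 linear predictor only; an exponential conditional distribution for type 2 times; a 4 × 4 rook-neighbour grid; and a proper-CAR factor `alpha` < 1 as the device that makes the precision matrix invertible. All function and argument names are hypothetical.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Rook-neighbour adjacency matrix for a rows x cols lattice (assumed layout)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:                      # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < rows:                      # neighbour below
                W[i, i + cols] = W[i + cols, i] = 1.0
    return W

def sample_spatial_effects(W, sigma2, alpha, rng):
    """Draw centred effects with precision (D - alpha * W) / sigma2.

    alpha < 1 is an assumed device that makes the precision matrix invertible.
    """
    D = np.diag(W.sum(axis=1))
    cov = np.linalg.inv((D - alpha * W) / sigma2)
    cov = (cov + cov.T) / 2.0                     # remove round-off asymmetry
    u = rng.multivariate_normal(np.zeros(W.shape[0]), cov)
    return u - u.mean()                           # centre for identifiability

def simulate_dataset(n_per_region, p, beta1, beta2, sigma2, a, rng, alpha=0.99):
    """Simulate one spatially clustered competing risks dataset (sketch)."""
    u = sample_spatial_effects(grid_adjacency(4, 4), sigma2, alpha, rng)
    region = np.repeat(np.arange(16), n_per_region)
    n = region.size
    z1 = rng.standard_normal(n)                   # continuous covariate
    z2 = rng.binomial(1, 0.5, size=n)             # binary treatment indicator
    eta1 = beta1[0] * z1 + beta1[1] * z2 + u[region]
    eta2 = beta2[0] * z1 + beta2[1] * z2

    # Probability that the failure is of type 1 (limit of the subdistribution).
    prob1 = 1.0 - (1.0 - p) ** np.exp(eta1)
    R = rng.uniform(size=n)
    type1 = R < prob1

    t = np.empty(n)
    # Type 1: invert the assumed subdistribution function at R.
    inner = (1.0 - (1.0 - R[type1]) ** np.exp(-eta1[type1])) / p
    t[type1] = -np.log(1.0 - inner)
    # Type 2: inverse transform for an exponential with rate exp(eta2).
    R2 = rng.uniform(size=int((~type1).sum()))
    t[~type1] = -np.log(1.0 - R2) / np.exp(eta2[~type1])

    c = rng.uniform(0.0, a, size=n)               # uniform censoring times
    status = np.where(t <= c, np.where(type1, 1, 2), 0)   # 0 = censored
    return {"time": np.minimum(t, c), "status": status,
            "region": region, "z1": z1, "z2": z2, "u": u}
```

For instance, `simulate_dataset(15, 0.3, (0.5, 0.5), (-0.5, 0.5), 1.0, 5.0, np.random.default_rng(1))` would produce one dataset with 240 individuals. The value 5.0 for `a` is arbitrary here; in practice it would be tuned until the observed share of `status == 0` matches the target censoring rate.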
For each fitted model, the convergence of the MCMC chains was assessed with trace plots, autocorrelation plots and the Gelman-Rubin diagnostic [18], using two chains with different starting values. Each MCMC chain was run for 15,000 iterations, and the first 5,000 were discarded as burn-in. For each parameter, the point estimate, relative bias and mean squared error (MSE) were calculated as the averages of the means, relative biases and squared errors over the 500 replicates, respectively. The coverage probability (CP) was calculated as the proportion of the 95% credible intervals that contained the true value. The relative bias and squared error were computed for each replicate from the true value of the parameter and its estimate in the ith sample.
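The summary measures described above can be computed as in the following sketch. It assumes the usual definitions, relative bias (estimate − truth) / truth and squared error (estimate − truth)², averaged over replicates; the paper's own displayed formulas may differ in detail, for example by normalising with the absolute value of the true parameter. The function name and arguments are hypothetical.

```python
import numpy as np

def summarize_replicates(estimates, lower, upper, truth):
    """Average estimate, relative bias, MSE and coverage over replicates.

    estimates, lower, upper: one value per simulated dataset (point estimate
    and the two ends of the 95% credible interval); truth: true parameter value.
    """
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    rel_bias = (estimates - truth) / truth        # assumed definition
    sq_err = (estimates - truth) ** 2
    covered = (lower <= truth) & (truth <= upper)

    return {
        "estimate": estimates.mean(),
        "rel_bias": rel_bias.mean(),
        "mse": sq_err.mean(),
        "cp": 100.0 * covered.mean(),             # reported as a percentage
    }

# Toy illustration with three made-up replicates (not results from the paper).
summary = summarize_replicates(
    estimates=[0.4, 0.5, 0.6],
    lower=[0.3, 0.4, 0.55],
    upper=[0.5, 0.6, 0.7],
    truth=0.5,
)
print(summary)
```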
5.1. Simulation results
The results of the simulation study for the eighteen scenarios are reported in Tables 4 and 5. The chief focus of this simulation study was the regression coefficients; nevertheless, the estimation of the spatial random effects variance was also examined. For the regression coefficients, our proposed model performed well overall in all scenarios. It is worth highlighting that an increase in the sample size per region reduced the relative bias moderately. The trend of the estimates of the regression coefficients in Table 5 was quite similar to that in Table 4. Tables 4 and 5 also summarize the estimates of the spatial random effects variance under all scenarios. The estimate of the spatial random effects variance had a consistently small relative bias for the medium and large total sample sizes (480 and 800) with censoring rates of 20% and 40%. Regarding the spatial random effects variance, the trend of the estimates in Table 5 was similar to that in Table 4. In other words, our proposed model was not robust to varying proportions of censoring in terms of the estimates of the spatial random effects variance. The MSE values for all parameters were close to one another, and as the sample size increased and the censoring rate decreased, the estimation accuracy of the parameters increased in both simulation studies. Also, all parameters had coverage probabilities close to the nominal level of 0.95 under all scenarios.
Table 5.
Simulation results for the estimation of parameters under model 3 when the true values of the proportion of failure type 1 and the regression coefficients were (0.6, 1, −1, 1, 1).
Censoring rate | Parameter (true value) | nk = 15: Estimate | nk = 15: Rel.Bias | nk = 15: MSE | nk = 15: CP (%) | nk = 30: Estimate | nk = 30: Rel.Bias | nk = 30: MSE | nk = 30: CP (%) | nk = 50: Estimate | nk = 50: Rel.Bias | nk = 50: MSE | nk = 50: CP (%)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
20% | First regression coefficient (1) | 0.982 | −0.018 | 0.050 | 94 | 1.002 | 0.002 | 0.019 | 97 | 0.996 | −0.003 | 0.010 | 95
20% | Second regression coefficient (−1) | 0.989 | 0.010 | 0.028 | 96 | 0.984 | 0.016 | 0.013 | 95 | 0.988 | 0.011 | 0.007 | 92
20% | Spatial random effects variance (1) | 1.135 | 0.135 | 0.112 | 92 | 1.131 | 0.131 | 0.074 | 94 | 1.061 | 0.061 | 0.059 | 96
40% | First regression coefficient (1) | 0.993 | −0.007 | 0.054 | 94 | 0.994 | −0.006 | 0.024 | 95 | 0.999 | −0.0008 | 0.013 | 96
40% | Second regression coefficient (−1) | 1.003 | −0.003 | 0.039 | 95 | 0.995 | 0.004 | 0.019 | 97 | 0.992 | 0.007 | 0.009 | 95
40% | Spatial random effects variance (1) | 1.194 | 0.194 | 0.144 | 94 | 1.143 | 0.143 | 0.151 | 95 | 1.091 | 0.091 | 0.109 | 97
60% | First regression coefficient (1) | 0.938 | −0.061 | 0.067 | 95 | 0.954 | −0.045 | 0.044 | 93 | 0.974 | −0.025 | 0.021 | 94
60% | Second regression coefficient (−1) | 1.017 | −0.017 | 0.060 | 96 | 1.001 | −0.001 | 0.030 | 97 | 0.995 | 0.004 | 0.015 | 93
60% | Spatial random effects variance (1) | 1.210 | 0.210 | 0.243 | 93 | 1.284 | 0.284 | 0.166 | 96 | 1.158 | 0.158 | 0.076 | 96
6. Discussion
In this paper, we used proportional subhazard models in a Bayesian setting for spatially clustered HIV/AIDS survival data. The data were from Hamadan Province, Iran, from 1997 to 2011. Our proposed model allows the estimation of spatial inequalities in the cumulative incidence and the incorporation of spatial heterogeneity in prognostic analyses. The simulation study also showed that the estimates improved slightly as the sample size per county increased. Hence, our proposed model is a promising choice when there is a reasonable number of subjects in each region.
Regarding spatially clustered competing risks data, Hesam et al. recently used a cause-specific hazard model with spatial random effects [22]. The multivariate conditional autoregressive distribution was used for the spatial random effects: one random effect was used for every type of event in each cluster, and the correlation between random effects across geographic regions was taken into account. In this paper, by contrast, the univariate conditional autoregressive distribution was used for the spatial random effects, because the Fine and Gray model makes no assumptions about the dependence among the events. Moreover, the estimated regression coefficients of the Fine and Gray model give the effect of the covariates on the subhazard for the gth type of failure and can be interpreted as an effect on the cumulative incidence function [16, 32], whereas the results of the cause-specific hazard model give the covariate effect on the instantaneous risk, not on the cumulative incidence. Hence, the spatial term in the study of Hesam et al. estimates the relative risk for the gth type of failure in a given region, whereas in our model it estimates the spatial relative subhazard.
In HIV disease, where patient survival is the outcome of interest, there are multiple event times. These may consist of an intermediate nonterminal event such as AIDS and a terminal event such as death. Hence, we could consider three transitions to two different states (AIDS and death) in these data, as follows: 1) HIV → AIDS, 2) AIDS → death, and 3) HIV → death. When there is a terminal event (e.g. death) and an intermediate nonterminal event (e.g. AIDS), semi-competing risks data are encountered [17]. In semi-competing risks data, the times to the terminal and nonterminal events of an individual are usually correlated [30]. Hence, HIV/AIDS data could be analyzed by methods for semi-competing risks data that take into account the dependence between the nonterminal and terminal events (e.g. multistate models). Several studies have proposed models for analyzing cluster-correlated semi-competing risks data [1, 27, 31]. However, if one uses only the information on the time and type of the first event, the reduced data can be analyzed as competing risks [30]. Hence, when interest is in the time from HIV infection to AIDS diagnosis, death before AIDS is a competing risk, and vice versa [32]. In other words, either of the transitions 1) HIV → AIDS or 2) HIV → death could be modeled by the Fine and Gray model.
According to the posterior estimates of the adjusted subhazard ratios from the proposed model, TB co-infection and mode of transmission were significantly associated with the cumulative incidence of AIDS progression. Gender, TB co-infection and mode of transmission were significantly associated with the cumulative incidence of death before AIDS.
In the field of HIV/AIDS, understanding the geographic variation of the cumulative incidence of AIDS progression and death before AIDS provides a greater opportunity to identify high-burden areas. According to the results of the present study, the highest-risk cluster for AIDS progression consists of counties with relatively high population density. The lowest-risk county was in a remote area, far from the most populous county in Hamadan Province. It is worth mentioning that, when our proposed model was fitted to the HIV/AIDS data, the length of the 95% credible intervals for the covariates decreased slightly compared with the Fine and Gray model (Table 3 of the supplementary material). In other words, taking the spatial heterogeneity into consideration improves the prediction of survival.
The present study had limitations that should be mentioned. First, the 95% credible intervals of the subhazard ratio were wide for some categories of covariates such as TB co-infection and mode of transmission, because the sample size in these categories was small in the HIV/AIDS data. Second, to capture more spatial variation, areas smaller than counties should be used; unfortunately, these were not available in our dataset. Third, the exact geographic locations (e.g. latitude and longitude) were not available in the HIV/AIDS data, so a geostatistical frailty model could not be used for the analysis. Fourth, our simulation design was limited to a fixed county size and to observations without ties. It is worth mentioning that a Bayesian partial likelihood posterior is not a good approximation of the full Bayesian posterior when many tied observations exist [25].
Finally, our proposed model could be extended in several ways. First, an extension to a spatiotemporal framework is possible. Second, the robustness of the spatial random effects to violations of the normality assumption could be assessed. Nonparametric priors for the spatial random effects could also be considered to relax this assumption. A further direction is to investigate parametric modeling of the cause-specific cumulative incidence function with spatial random effects.
Supplementary Material
Acknowledgements
The authors are thankful to the anonymous referees and an associate editor for their constructive and helpful comments which led to a significantly improved presentation. The authors are also grateful to Dr. Leili Tapak for her help with the R code.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Alvares D., Haneuse S., Lee C., and Lee K.H., Semicomprisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data. RJ 11 (2019), pp. 376.
- 2. Banerjee S., Carlin B., and Gelfand A., Hierarchical Modeling and Analysis for Spatial Statistics, Chapman&Hall/CRC, New York, 2004.
- 3. Banerjee S., Carlin B.P., and Gelfand A.E., Hierarchical Modeling and Analysis for Spatial Data, ed, CRC press, Boca Raton, 2014.
- 4. Banerjee S., and Dey D.K., Semiparametric proportional odds models for spatially correlated survival data. Lifetime Data Anal. 11 (2005), pp. 175–191.
- 5. Banerjee S., Wall M.M., and Carlin B.P., Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics 4 (2003), pp. 123–142.
- 6. Besag J., Spatial interaction and the statistical analysis of lattice systems. J. Royal Stat. Soc. Ser B (Methodol.) 36 (1974), pp. 192–236.
- 7. Besag J., and Kooperberg C., On conditional and intrinsic autoregressions. Biometrika 82 (1995), pp. 733–746.
- 8. Brook D., On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbour systems. Biometrika 51 (1964), pp. 481–483.
- 9. Chen B.E., et al., Competing risks analysis of correlated failure time data. Biometrics 64 (2008), pp. 172–179.
- 10. Christian N.J., Ha I.D., and Jeong J.H., Hierarchical likelihood inference on clustered competing risks data. Stat. Med. 35 (2016), pp. 251–267.
- 11. Clayton D., and Cuzick J., Multivariate generalizations of the proportional hazards model. J. Royal Stat. Soc. Ser. A (Gen.) 148 (1985), pp. 82–117.
- 12. Cramb S.M., et al., A flexible parametric approach to examining spatial variation in relative survival. Stat. Med. 35 (2016), pp. 5448–5463.
- 13. Diva U., Dey D.K., and Banerjee S., Parametric models for spatially correlated survival data for individuals with multiple cancers. Stat. Med. 27 (2008), pp. 2127–2144.
- 14. Dixon S.N., Darlington G.A., and Desmond A.F., A competing risks model for correlated data based on the subdistribution hazard. Lifetime Data Anal. 17 (2011), pp. 473–495.
- 15. Eberly L.E., and Carlin B.P., Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Stat. Med. 19 (2000), pp. 2279–2294.
- 16. Fine J.P., and Gray R.J., A proportional hazards model for the subdistribution of a competing risk. J. Am. Stat. Assoc. 94 (1999), pp. 496–509.
- 17. Fine J.P., Jiang H., and Chappell R., On semi-competing risks data. Biometrika 88 (2001), pp. 907–919.
- 18. Gelman A., and Rubin D.B., Inference from iterative simulation using multiple sequences. Stat. Sci. 7 (1992), pp. 457–472.
- 19. Gorfine M., and Hsu L., Frailty-based competing risks model for multivariate survival data. Biometrics 67 (2011), pp. 415–426.
- 20. Ha I.D., et al., Analysis of clustered competing risks data using subdistribution hazard models with multivariate frailties. Stat. Method Med. Res. 25 (2016), pp. 2488–2505.
- 21. Hanson T.E., Jara A., and Zhao L., A Bayesian semiparametric temporally-stratified proportional hazards model with spatial frailties. Bayesian Anal. 7 (2012), pp. 147–188.
- 22. Hesam S., et al., A cause-specific hazard spatial frailty model for competing risks data. Spat. Stat. 26 (2018), pp. 101–124.
- 23. Katsahian S., and Boudreau C., Estimating and testing for center effects in competing risks. Stat. Med. 30 (2011), pp. 1608–1617.
- 24. Katsahian S., et al., Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution. Stat. Med. 25 (2006), pp. 4267–4278.
- 25. Kim Y., and Kim D., Bayesian partial likelihood approach for tied observations. J. Stat. Plan. Infer. 139 (2009), pp. 469–477.
- 26. Lai X., Yau K.K., and Liu L., Competing risk model with bivariate random effects for clustered survival data. Comput. Stat. Data Anal. 112 (2017), pp. 215–223.
- 27. Lee K.H., et al., Hierarchical models for semicompeting risks data with application to quality of end-of-life care for pancreatic cancer. J. Am. Stat. Assoc. 111 (2016), pp. 1075–1095.
- 28. Motarjem K., Mohammadzadeh M., and Abyar A., Geostatistical survival model with Gaussian random effect. Stat. Pap. 61 (2017), pp. 1–23.
- 29. Pan C., et al., Bayesian semiparametric model for spatially correlated interval-censored survival data. Comput. Stat. Data Anal. 74 (2014), pp. 198–208.
- 30. Peng L., and Fine J.P., Regression modeling of semicompeting risks data. Biometrics 63 (2007), pp. 96–108.
- 31. Peng M., Xiang L., and Wang S., Semiparametric regression analysis of clustered survival data with semi-competing risks. Comput. Stat. Data Anal. 124 (2018), pp. 53–70.
- 32. Putter H., Fiocco M., and Geskus R.B., Tutorial in biostatistics: competing risks and multi-state models. Stat. Med. 26 (2007), pp. 2389–2430.
- 33. Robins J.M., and Rotnitzky A., Recovery of information and adjustment for dependent censoring using surrogate markers, in AIDS Epidemiology, Birkhäuser, Boston, MA, 1992, pp. 297–331.
- 34. Silva G.L., et al., Hierarchical Bayesian spatiotemporal analysis of revascularization odds using smoothing splines. Stat. Med. 27 (2008), pp. 2381–2401.
- 35. Spiegelhalter D.J., et al., Bayesian measures of model complexity and fit. J. Royal Stat. Soc. Ser. B (Stat. Methodol.) 64 (2002), pp. 583–639.
- 36. Vaupel J.W., Manton K.G., and Stallard E., The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16 (1979), pp. 439–454.
- 37. Wolbers M., et al., Competing risks analyses: Objectives and approaches. European Heart J. 35 (2014), pp. 2936–2941.
- 38. Zhou B., et al., Competing risks regression for clustered data. Biostatistics 13 (2012), pp. 371–383.
- 39. Zhou B., et al., Competing risks regression for stratified data. Biometrics 67 (2011), pp. 661–670.
- 40. Zhou H., and Hanson T., A unified framework for fitting Bayesian semiparametric models to arbitrarily censored survival data, including spatially referenced data. J. Am. Stat. Assoc. 113 (2018), pp. 571–581.