ABSTRACT
In survival analysis, the Accelerated Failure Time (AFT) shared frailty model is a widely used framework for analyzing time-to-event data while accounting for unobserved heterogeneity among individuals. This paper extends the traditional Weibull AFT shared frailty model with the half logistic-G family of distributions (Type I, Type II and Type II exponentiated) using Bayesian methods. This approach offers flexibility in capturing covariate effects and in handling heavy-tailed frailty distributions. Bayesian inference with MCMC provides parameter estimates and credible intervals. Simulation studies show improved predictive performance compared with existing models, and real-world applications demonstrate practical utility. In summary, our Bayesian Weibull AFT shared frailty model with the Type I, Type II and Type II exponentiated half logistic-G families enhances time-to-event data analysis, making it a versatile tool for survival analysis in various fields; all models are implemented with Stan in R.
KEYWORDS: Type I half logistic-G, Type II half logistic-G, Type II exponentiated half logistic-G, Weibull distribution, AFT, shared frailty, censored data, LOOIC, WAIC, Stan
1. Introduction
In data analysis, understanding how long it takes for events to happen is crucial. Survival analysis helps us make sense of such time-to-event data, especially when there are hidden differences among individuals that might affect the outcomes. The Accelerated Failure Time (AFT) shared frailty model [21] is a popular tool for this type of analysis because it allows these hidden factors to be taken into account. This study focuses on improving the traditional Weibull AFT shared frailty model by adding a new element: the half logistic-G family of distributions. We explore three variations within this family: the Type I [15], Type II [16] and Type II exponentiated [2] families. Censoring occurs frequently in survival analysis, and there is no exact theory for handling censored data within the frequentist approach, whereas the Bayesian framework consistently allows exact inference in such situations. The prominent frequentist theory is maximum likelihood, known for its asymptotic properties such as consistency and asymptotic normality; however, it offers little guidance for small sample sizes. The Bayesian approach is not constrained by these limitations. To make these extensions, we use Bayesian methods, which give us more flexibility in understanding the impact of different factors and in handling situations where the hidden differences have a large impact.
Our main goal is to improve the Weibull AFT shared frailty model by including the half logistic-G family of distributions, which improves the predictive performance of the survival models under Bayesian methods. Using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques, we aim to provide accurate estimates of the model parameters together with measures of how certain we can be about those estimates. We also demonstrate, through simulation studies, that our model performs better than existing ones; these studies show how well the model predicts outcomes compared with established alternatives. To ensure the work is practical and useful, we apply the model to real data from different fields. The use of Stan in R, widely available statistical tools, underlines that our methods are not just theoretical but can be applied easily by researchers and analysts. To sum up, our study aims to make a significant contribution to survival analysis. The improved Bayesian Weibull AFT shared frailty model, featuring the Type I, Type II and Type II exponentiated half logistic-G family distributions, is designed to better handle time-to-event data. This research focuses on enhancing the predictive performance of survival models across diverse fields by employing Bayesian information criteria, specifically the Leave-One-Out Information Criterion (LOOIC) and the Watanabe-Akaike Information Criterion (WAIC). Through systematic data-point omission and model re-estimation, lower LOOIC and WAIC values indicate superior predictive accuracy, accounting for both goodness of fit and model complexity. This study emphasizes the practical utility of LOOIC and WAIC in model comparison, providing a concise toolkit for researchers seeking nuanced insights and improved decision-making in event prediction [5].
2. Generalized family of distribution
A generalized family of distributions is a broad class of probability distributions that includes several specific distributions as special cases. Such families are characterized by a set of parameters that allows flexible modeling of a wide range of data patterns; the term 'generalized' indicates that the family contains various well-known distributions as particular instances. In this study, we consider the half logistic families of distributions, namely the Type I, Type II and Type II exponentiated generalized families.
2.1. Type I half logistic (TIHL-G) family
A continuous random variable T is said to follow the TIHL-G family, written T ∼ TIHL-G(ν, β), if it has the following probability density, cumulative distribution, survival and hazard functions [15], respectively.
$f(t;\nu,\boldsymbol{\beta}) = \dfrac{2\nu\, g(t;\boldsymbol{\beta})\,[1-G(t;\boldsymbol{\beta})]^{\nu-1}}{\left\{1+[1-G(t;\boldsymbol{\beta})]^{\nu}\right\}^{2}}$ | (1) |
$F(t;\nu,\boldsymbol{\beta}) = \dfrac{1-[1-G(t;\boldsymbol{\beta})]^{\nu}}{1+[1-G(t;\boldsymbol{\beta})]^{\nu}}$ | (2) |
$S(t;\nu,\boldsymbol{\beta}) = \dfrac{2\,[1-G(t;\boldsymbol{\beta})]^{\nu}}{1+[1-G(t;\boldsymbol{\beta})]^{\nu}}$ | (3) |
$h(t;\nu,\boldsymbol{\beta}) = \dfrac{\nu\, g(t;\boldsymbol{\beta})}{[1-G(t;\boldsymbol{\beta})]\left\{1+[1-G(t;\boldsymbol{\beta})]^{\nu}\right\}}$ | (4) |
Here G(t; β) and g(t; β) are the baseline cumulative distribution and probability density functions of any continuous random variable, depending on a parameter vector β, and ν > 0 is the additional shape parameter of the family.
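For a small illustration, the TIHL-G CDF in Equation (2) can be evaluated in R for any baseline CDF. The sketch below is not code from the paper; the baseline choice and parameter values are assumptions for illustration.

```r
# Sketch: TIHL-G CDF built from an arbitrary baseline CDF G(.)
tihl_cdf <- function(t, nu, G, ...) {
  Gbar <- 1 - G(t, ...)                 # baseline survival function
  (1 - Gbar^nu) / (1 + Gbar^nu)         # Eq. (2)
}
# Example with a Weibull baseline (illustrative parameter values)
tihl_cdf(c(1, 5, 10), nu = 1.5, G = pweibull, shape = 1.2, scale = 10)
```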
2.2. Type II half-logistic (TIIHL-G) family
A continuous random variable T is said to follow the TIIHL-G family, written T ∼ TIIHL-G(ν, β), if it has the following probability density and cumulative distribution functions [16], respectively.
$f(t;\nu,\boldsymbol{\beta}) = \dfrac{2\nu\, g(t;\boldsymbol{\beta})\,[G(t;\boldsymbol{\beta})]^{\nu-1}}{\left\{1+[G(t;\boldsymbol{\beta})]^{\nu}\right\}^{2}}$ | (5) |
$F(t;\nu,\boldsymbol{\beta}) = \dfrac{2\,[G(t;\boldsymbol{\beta})]^{\nu}}{1+[G(t;\boldsymbol{\beta})]^{\nu}}$ | (6) |
Here G(t; β) and g(t; β) are the baseline cumulative distribution and probability density functions of any continuous random variable, depending on a parameter vector β, and ν > 0 is the additional shape parameter of the family.
2.3. Type II exponentiated half logistic (TIIEHL-G) family
A continuous random variable T is said to follow the TIIEHL-G family, written T ∼ TIIEHL-G(ν, λ, β), if it has the following probability density and cumulative distribution functions [2], respectively.
$f(t;\nu,\lambda,\boldsymbol{\beta}) = \dfrac{2\nu\lambda\, g(t;\boldsymbol{\beta})\,[G(t;\boldsymbol{\beta})]^{\lambda-1}\left\{1-[G(t;\boldsymbol{\beta})]^{\lambda}\right\}^{\nu-1}}{\left\{1+[G(t;\boldsymbol{\beta})]^{\lambda}\right\}^{\nu+1}}$ | (7) |
$F(t;\nu,\lambda,\boldsymbol{\beta}) = 1-\left[\dfrac{1-[G(t;\boldsymbol{\beta})]^{\lambda}}{1+[G(t;\boldsymbol{\beta})]^{\lambda}}\right]^{\nu}$ | (8) |
Here G(t; β) and g(t; β) are the baseline cumulative distribution and probability density functions of any continuous random variable, depending on a parameter vector β, and ν > 0 and λ > 0 are the additional shape parameters of the family.
3. AFT shared frailty model
An Accelerated Failure Time (AFT) shared frailty model is a statistical technique employed in survival analysis to investigate the time until a specific event, such as death or failure, occurs [21]. It extends the conventional AFT model, which posits a linear association between the logarithm of survival time and the predictor variables. Incorporating a frailty term into the AFT model allows unobservable variation or clustering among individuals or groups to be accounted for. The AFT model assumes that the natural logarithm of the survival time can be expressed as a linear function of the covariates, and can be represented as follows:
$\log T = b_1 x_1 + b_2 x_2 + \cdots + b_p x_p + \sigma\epsilon = \mathbf{b}^{\prime}\mathbf{x} + \sigma\epsilon$ | (9) |
where T is the survival time, b is the vector of regression coefficients, the x's are the predictor variables, ϵ is an error term assumed to follow a specific distribution, and σ is a scale parameter. If the random variable ϵ has a standard extreme value distribution, substituting it into Equation (9) yields the Weibull AFT model [5].
The AFT shared frailty model extends the AFT model by incorporating a shared frailty term z_i. The model equation becomes:
$\log T_{ij} = \mathbf{b}^{\prime}\mathbf{x}_{ij} + z_i + \sigma\epsilon_{ij}$ | (10) |
where z_i is the shared frailty term, representing the unobserved heterogeneity or clustering effects among individuals. The term ϵ_{ij} is a random variable representing the discrepancy between log T_{ij} and the linear part of the model; it is assumed to follow a specific parametric distribution.
The shared frailty term captures the variation that is not explained by the observed covariates and provides a way to account for correlations among individuals within the same group or cluster. This is particularly useful when there is reason to believe that the survival times within a group are more similar than what can be explained by the observed covariates alone.
The general expression for the survivor function of the jth individual in an AFT model can be stated as follows:
$S_j(t) = S_0\!\left(\dfrac{t}{\eta_j}\right), \qquad \eta_j = \exp(\mathbf{b}^{\prime}\mathbf{x}_j)$ | (11) |
In this formulation, η_j = exp(b′x_j) represents the acceleration factor associated with the jth individual.
The overarching parametric AFT model that includes a shared frailty component can be expressed as follows:
$S_{ij}(t) = S_0\!\left(\dfrac{t}{\eta_{ij}}\right), \qquad \eta_{ij} = \exp(\mathbf{b}^{\prime}\mathbf{x}_{ij} + z_i)$ | (12) |
Here, the acceleration factor for the jth individual in the ith group, η_{ij} = exp(b′x_{ij} + z_i), is a pivotal component, and the z_i are the distinct random effects attributed to each cluster.
3.1. Weibull AFT shared frailty model
The Weibull distribution is a two-parameter distribution with shape parameter α and scale parameter β, obtained as a generalization of the exponential distribution. The probability density, cumulative distribution and survival functions of the Weibull distribution with shape parameter α and scale parameter β are given below.
$f(t;\alpha,\beta) = \dfrac{\alpha}{\beta}\left(\dfrac{t}{\beta}\right)^{\alpha-1}\exp\left\{-\left(\dfrac{t}{\beta}\right)^{\alpha}\right\}, \quad t>0$ | (13) |
$F(t;\alpha,\beta) = 1-\exp\left\{-\left(\dfrac{t}{\beta}\right)^{\alpha}\right\}$ | (14) |
$S(t;\alpha,\beta) = \exp\left\{-\left(\dfrac{t}{\beta}\right)^{\alpha}\right\}$ | (15) |
The baseline survival function of the Weibull distribution is $S_0(t) = \exp\{-(t/\beta)^{\alpha}\}$.
Hence, the survival function of the Weibull AFT shared frailty model is as follows:
$S_{ij}(t) = S_0\!\left(\dfrac{t}{\eta_{ij}}\right) = \exp\left\{-\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha}\right\}$ | (16) |
$\eta_{ij} = \exp(\mathbf{b}^{\prime}\mathbf{x}_{ij} + z_i)$ | (17) |
From this survival function it is clear that the Weibull AFT shared frailty model again follows a Weibull distribution, T ∼ Weibull(α, β η_{ij}) with η_{ij} = exp(b′x_{ij} + z_i); the probability density and cumulative distribution functions of the Weibull AFT shared frailty model are then as follows:
$f_{ij}(t) = \dfrac{\alpha}{\beta\,\eta_{ij}}\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha-1}\exp\left\{-\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha}\right\}$ | (18) |
$F_{ij}(t) = 1-\exp\left\{-\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha}\right\}$ | (19) |
4. Method of generating random numbers
Inverse CDF method for generating random numbers: to generate random times from any survival model, we use the inverse CDF method, equating the CDF F(t) to u, where u is a realization of a Uniform(0,1) variate U, and solving this equation for t.
$F(t) = u, \qquad u \sim \mathrm{Uniform}(0,1)$ | (20) |
$t = F^{-1}(u)$ | (21) |
The resulting t is a random number following the desired probability distribution. The method exploits the fact that applying the inverse CDF to uniform random numbers yields random numbers that follow the target distribution.
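For concreteness, the following R sketch applies the inverse-CDF method to a Weibull baseline with shape α and scale β, so that F(t) = 1 − exp{−(t/β)^α}; the parameter values are illustrative only.

```r
# Inverse-CDF sampling from a Weibull(shape = alpha, scale = beta) baseline:
# F(t) = 1 - exp(-(t/beta)^alpha)  =>  t = beta * (-log(1 - u))^(1/alpha)
set.seed(123)
n     <- 1000
alpha <- 1.5                                  # shape (illustrative value)
beta  <- 60                                   # scale (illustrative value)
u     <- runif(n)                             # u ~ Uniform(0, 1)
t_sim <- beta * (-log(1 - u))^(1 / alpha)     # solve F(t) = u for t
# Check: the simulated sample should track the target Weibull CDF
plot(ecdf(t_sim), main = "Empirical CDF of simulated times")
curve(pweibull(x, shape = alpha, scale = beta), add = TRUE, col = "red")
```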
5. Proposed models
Our research introduces a trio of survival models, namely the Type I, Type II and Type II exponentiated half logistic-G families of distributions, each integrated with the Weibull Accelerated Failure Time (AFT) shared frailty model as the baseline. These proposed models collectively contribute to Bayesian survival analysis, offering researchers a comprehensive toolkit for shared frailty scenarios that aligns with the underlying characteristics of their data. The preferred model is the one with the best predictive performance, gauged using Bayesian information criteria such as LOOIC and WAIC. These criteria are popular in Bayesian analysis for their flexibility and reliability in model comparison, especially with smaller sample sizes or when none of the candidate models represents the true model exactly. However, the choice of information criterion depends on the specific traits of the data and the modeling context.
LOOIC is based on leave-one-out cross-validation, which conceptually refits the model once for every observation, each time leaving that observation out. This process provides a more robust estimate of predictive performance than many other criteria.
5.1. Type I half logistic-Weibull AFT shared frailty model
The Type I half logistic-Weibull AFT shared frailty (TIHL-WAF) model is obtained by taking the Weibull AFT shared frailty model as the baseline distribution in the TIHL-G family [15]; its probability density, cumulative distribution and survival functions are given below.
| (22) |
| (23) |
| (24) |
Also, using the above expressions, the hazard rate function (HRF) can be obtained as
| (25) |
Therefore, we can denote the distribution of T as T ∼ TIHL-WAF(α, β, ·). Thus, random numbers from the TIHL-WAF model are generated as follows:
| (26) |
| (27) |
| (28) |
5.2. Type II half logistic-Weibull AFT shared frailty model
The Type II half logistic-Weibull AFT shared frailty (TIIHL-WAF) model is obtained by taking the Weibull AFT shared frailty model as the baseline distribution in the TIIHL-G family [16]; its probability density and cumulative distribution functions are given below.
| (29) |
| (30) |
Therefore, we can denote the distribution of T as T ∼ TIIHL-WAF(α, β, ·).
Thus, random numbers from the TIIHL-WAF model are generated as given below:
| (31) |
| (32) |
| (33) |
5.3. Type II exponentiated half logistic-Weibull AFT shared frailty model
The Type II exponentiated half logistic-Weibull AFT shared frailty (TIIEHL-WAF) model is obtained by taking the Weibull AFT shared frailty model as the baseline distribution in the TIIEHL-G family [2]; its probability density and cumulative distribution functions are given below.
| (34) |
| (35) |
Therefore, we can denote the distribution of T as T ∼ TIIEHL-WAF(·).
Thus, random numbers from the TIIEHL-WAF model are generated as given below:
| (36) |
| (37) |
| (38) |
5.4. Weibull AFT shared frailty model
The probability density and cumulative distribution functions of the Weibull distribution are given below, restated from Section 3.1.
| (39) |
| (40) |
Consequently, random numbers from the Weibull AFT shared frailty model are generated as follows:
$u = F_{ij}(t) = 1-\exp\left\{-\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha}\right\}$ | (41) |
$\left(\dfrac{t}{\beta\,\eta_{ij}}\right)^{\alpha} = -\log(1-u)$ | (42) |
$t = \beta\,\eta_{ij}\left[-\log(1-u)\right]^{1/\alpha}$ | (43) |
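A short R sketch of this generation scheme is given below. It assumes the shape-scale parameterization of Section 3.1, with acceleration factor η = exp(x′b + z) and normally distributed frailties; the function name, argument layout and parameter values are ours, for illustration only.

```r
# Sketch: inverse-CDF generation of times from a Weibull AFT shared frailty
# model, assuming S(t) = exp(-(t / (beta * eta))^alpha) with
# eta = exp(x'b + z) and z ~ Normal(0, omega_z).
r_weibull_aft_frailty <- function(X, group, b, alpha, beta, omega_z) {
  z   <- rnorm(length(unique(group)), mean = 0, sd = omega_z)  # shared frailties
  eta <- exp(as.vector(X %*% b) + z[group])                    # acceleration factors
  u   <- runif(nrow(X))
  beta * eta * (-log(1 - u))^(1 / alpha)                       # solve F(t) = u for t
}
# Example: 38 clusters with 2 observations each and one binary covariate
set.seed(2023)
group <- rep(1:38, each = 2)
X     <- cbind(1, rbinom(length(group), 1, 0.5))
times <- r_weibull_aft_frailty(X, group, b = c(2, 0.5),
                               alpha = 1.2, beta = 1, omega_z = 0.6)
```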
The Type I, Type II and Type II exponentiated half logistic-Weibull AFT shared frailty models are compared with the Weibull AFT shared frailty model to show the value of incorporating a generalized family into AFT shared frailty models and to compare the predictive performance of the respective models.
6. Comprehending hazard curves and the practical use of half-logistic Weibull-AFT models
Hazard curves depict the system or component's instantaneous failure rate at a specific point in time. These curves are essential tools in reliability analysis, offering insights into the likelihood of failure over varying time intervals. By illustrating how the failure rate changes over time, hazard curves help engineers and medical researchers make informed decisions about maintenance schedules, system improvements and overall risk management strategies.
Type I half logistic-WAF (TIHL-WAF): This model provides excellent flexibility for modeling bathtub-shaped hazard functions, where the failure rate initially decreases, then remains roughly constant, and finally increases. In medical research, this could represent diseases with early susceptibility, followed by a period of reduced risk due to interventions, and then rising susceptibility due to age-related decline. In engineering, it could model component failures with initial wear-in, followed by stable operation and eventual wear-out.
Type II half logistic-WAF (TIIHL-WAF): Similar to TIHL-WAF, this model allows for bathtub-shaped hazard functions but offers computational advantages. It is useful when the focus is on the later stages of the bathtub curve, such as modeling wear-out in engineered systems.
Type II exponentiated half logistic-WAF (TIIEHL-WAF): This model excels at modeling monotone (steadily increasing or decreasing) hazard functions. It is valuable for medical studies where failure risk steadily increases with age, such as analyzing cancer incidence, or for engineering applications where components degrade uniformly over time.
TIHL-WAF, TIIHL-WAF and TIIEHL-WAF can analyze disease progression, treatment effectiveness, and survival data, and also can model age-related disease incidence, organ failure, and cancer risk.
TIHL-WAF, TIIHL-WAF and TIIEHL-WAF can assess component failure times, equipment deterioration, and maintenance schedules, and also can model material degradation, fatigue behavior, and system reliability.
7. Exploring the data and visualizing the survival function to validate the proposed model
The dataset, first discussed by McGilchrist and Aisbett and also analyzed in [5], concerns the times between instances of infection at the catheter insertion site for kidney patients using portable dialysis equipment. It contains the times to the first and second recurrence of kidney infection for 38 patients, each with two recorded observations. These survival times denote the time to infection following catheter insertion. Instances where a catheter is removed for reasons other than infection are treated as censored observations, which account for approximately 24% of the dataset. These data have often been used to illustrate the use of random effects (frailty) in a survival model.
An inherent, unmeasured, or ‘random’ factor is present in this dataset in the form of a patient identification code. This factor introduces diversity among the patients. The dataset is accessible through the survival package [17] in the R programming environment [14].
Description of kidney catheter data variables are given below
time: time to infection in days
status: event status, 1=infection occurs or 0=censored
age: age in years
sex: 1=male, 2=female
disease: disease type (0=GN, 1=AN, 2=PKD, 3=Other)
id: identification code of the patients
7.1. Kaplan-Meier survival curve alongside parametric survival curve
The Kaplan-Meier survival curve is a graphical representation commonly used in survival analysis to estimate the survival probability over time. In the context of the kidney catheter data, the Kaplan-Meier curve is used to analyze and visualize the time to infection (in days) for a group of individuals with kidney catheters.
The Kaplan-Meier survival curve analysis of kidney catheter data highlights notable trends. Survival probabilities for females consistently exceed those for males over time, suggesting a more favorable outlook for female patients. Additionally, the order of disease types by survival probability is glomerulonephritis (GN), amyloidosis (AN), other unspecified conditions and polycystic kidney disease (PKD). These findings underscore the importance of considering both gender and specific disease types when assessing survival probabilities in this patient population.
The evident concordance between the non-parametric survival curve (Kaplan-Meier) and the parametric survival curve of the TIIEHL-WAF model highlights a consistent trend, offering strong support for the model's fit to the data. This alignment strengthens the model's reliability and fosters confidence in its depiction of survival probabilities over time. The close agreement between the non-parametric and parametric survival curves establishes TIIEHL-WAF as an appropriate choice for fitting these data, underscoring its capacity to provide accurate insights into survival probabilities.
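The Kaplan-Meier curves described above can be reproduced with the survival package; a brief sketch using the package's own kidney data is shown below.

```r
# Sketch: Kaplan-Meier curves for the kidney catheter data by sex
library(survival)
kd <- survival::kidney
km_sex <- survfit(Surv(time, status) ~ sex, data = kd)
plot(km_sex, col = c("blue", "red"), lty = 1,
     xlab = "Time to infection (days)", ylab = "Survival probability")
legend("topright", legend = c("Male (sex = 1)", "Female (sex = 2)"),
       col = c("blue", "red"), lty = 1)
# Curves by disease type (GN, AN, PKD, Other) can be drawn analogously:
km_disease <- survfit(Surv(time, status) ~ disease, data = kd)
```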
7.2. Data creation for computation in stan
To prepare the data, we assemble the model matrix X together with the response and censoring variables; the number of predictors is denoted by M and the number of observations by N. Censoring is encoded so that the value 0 denotes a censored observation and the value 1 an uncensored (event) observation. Combining these components produces a data set named 'datk' in list format. The appendix contains the Stan code for the respective models.
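A sketch of how such a list might be assembled from the kidney catheter data is given below; the element names (N, M, G, X, time, status, id) are illustrative and may differ from those in the authors' code.

```r
# Sketch: assembling the list 'datk' passed to Stan
library(survival)
kd <- survival::kidney
X  <- model.matrix(~ age + sex + factor(disease), data = kd)  # model matrix with intercept
datk <- list(
  N      = nrow(kd),                      # number of observations
  M      = ncol(X),                       # number of predictors (columns of X)
  G      = length(unique(kd$id)),         # number of patients (clusters)
  X      = X,
  time   = kd$time,                       # time to infection in days
  status = kd$status,                     # 1 = infection (event), 0 = censored
  id     = as.integer(factor(kd$id))      # consecutive cluster index
)
str(datk)
```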
8. The Bayesian framework and its corresponding procedure
Our objective in Bayesian analysis is to calculate the posterior distribution, which represents the exact distribution of the parameters. In this approach, the prior distribution of the parameters and the likelihood of the data are combined using Bayes' theorem. Before the Bayesian regression model can be constructed, both a prior distribution for the model parameters and a likelihood function for the data must be defined. The random group effects z_i are not observed; they are realizations of a random variable with probability density f(z_i). In this situation, we integrate the likelihood over the possible values of the random effects, giving the following likelihood function for right-censored data [5].
$L(\theta) = \prod_{i=1}^{g}\int_{-\infty}^{\infty}\left\{\prod_{j=1}^{n_i}\left[h_{ij}(t_{ij})\right]^{\delta_{ij}}\, S_{ij}(t_{ij})\right\} f(z_i)\, dz_i$ | (44) |
In this context, δ_{ij} is the indicator variable, or censoring indicator; it takes the value 0 when the observation is censored and 1 otherwise. Equation (44) allows us to substitute the hazard and survival functions of each of the models to obtain their respective likelihoods.
8.1. Incorporating covariates into the modeling process
Following [12], we introduce covariates through the log link function to construct a regression model.
| (45) |
| (46) |
| (47) |
Here b is the vector of regression coefficients and the x_{ij}'s are the covariates of the jth individual in the ith group.
In Stan, the transformed parameters block of the model code contains the above regression model.
8.2. Gaining insight into the underlying workings of prior and posterior distributions
Bayesian methods, known for their flexibility, leverage prior knowledge, handle uncertainty through probability distributions and offer a natural framework for updating beliefs; they are particularly valuable with limited data or complex models compared to traditional frequentist approaches. Bayesian regression models require prior distributions for the parameters. This study adopts a Half-Cauchy prior for the shape and scale parameters, along with a Normal prior (mean 0, standard deviation 5) for the regression coefficients as a regularization measure. These regularizing priors play a pivotal role in promoting simpler, interpretable solutions, preventing overfitting and enhancing predictive performance. The use of a Half-Cauchy prior for various parameters is supported by [8], with detailed explanations provided by [6] and [11]. Consequently, the prior distribution for these parameters is a Half-Cauchy distribution with scale γ, represented by its probability density function.
$f(x;\gamma) = \dfrac{2\gamma}{\pi\left(x^{2}+\gamma^{2}\right)}, \quad x>0,\ \gamma>0$ | (48) |
$\alpha \sim \text{Half-Cauchy}(\gamma), \qquad \beta \sim \text{Half-Cauchy}(\gamma)$ | (49) |
The regularizing priors for the regression coefficients b and the random effect variable z_i, together with the corresponding probability distribution, are given below.
$b_k \sim \mathrm{Normal}(0,5), \quad k=1,\ldots,M$ | (50) |
$z_i \mid \omega_z \sim \mathrm{Normal}(0,\omega_z), \quad i=1,\ldots,G$ | (51) |
$f(z_i;\omega_z) = \dfrac{1}{\omega_z\sqrt{2\pi}}\exp\left\{-\dfrac{z_i^{2}}{2\omega_z^{2}}\right\}$ | (52) |
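A minimal sketch, not the authors' appendix code, of how these priors and the log-link regression of Section 8.1 might be written as a Stan program for the baseline Weibull AFT shared frailty model is given below. The data names, the half-Cauchy scale value of 25 and the use of Stan's built-in Weibull functions are assumptions made for illustration.

```stan
// Sketch: Weibull AFT shared frailty model with right censoring
data {
  int<lower=1> N;                      // number of observations
  int<lower=1> M;                      // number of columns of the model matrix
  int<lower=1> G;                      // number of groups (patients)
  matrix[N, M] X;                      // model matrix (includes intercept)
  vector<lower=0>[N] time;             // observed or censored times
  int<lower=0, upper=1> status[N];     // 1 = event (infection), 0 = censored
  int<lower=1, upper=G> id[N];         // group index
}
parameters {
  vector[M] b;                         // regression coefficients
  real<lower=0> alpha;                 // Weibull shape
  real<lower=0> omega_z;               // frailty standard deviation
  vector[G] z;                         // shared frailty (random effects)
}
transformed parameters {
  vector<lower=0>[N] sigma;            // AFT scale through the log link
  for (n in 1:N)
    sigma[n] = exp(X[n] * b + z[id[n]]);
}
model {
  b ~ normal(0, 5);                    // regularizing prior on coefficients
  alpha ~ cauchy(0, 25);               // half-Cauchy via <lower=0>; scale illustrative
  omega_z ~ cauchy(0, 25);             // half-Cauchy via <lower=0>; scale illustrative
  z ~ normal(0, omega_z);              // shared frailty distribution
  for (n in 1:N) {
    if (status[n] == 1)
      target += weibull_lpdf(time[n] | alpha, sigma[n]);
    else
      target += weibull_lccdf(time[n] | alpha, sigma[n]);  // right censoring
  }
}
```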
From Equation (45), we obtain a joint function of b and z, which can be expressed as
| (53) |
The corresponding probability density function can be written as follows:
| (54) |
| (55) |
By employing Bayes' theorem, the joint posterior distribution of the parameters given the observed data is obtained as follows:
| (56) |
| (57) |
| (58) |
8.3. Posterior density function of TIHL-WAF model
By substituting the hazard function, the survival function and the prior distribution of each parameter of the TIHL-WAF model into Equation (58), we obtain its posterior density function.
| (59) |
For MCMC implementation, one needs to obtain the full conditional distribution of each parameter up to proportionality.
| (60) |
| (61) |
| (62) |
| (63) |
8.4. Posterior density function of TIIHL-WAF model
| (64) |
8.5. Posterior density function of TIIEHL-WAF model
| (65) |
We can likewise derive the joint posterior distribution of the Weibull AFT shared frailty model by plugging its hazard function, survival function and priors into Equation (58), and then obtain the marginal posterior distributions of the parameters of the TIIHL-WAF, TIIEHL-WAF and Weibull AFT shared frailty models for MCMC implementation.
8.6. MCMC simulation: A concise overview of essential Bayesian techniques
To calculate the marginal posterior distributions, the high-dimensional integral over all model parameters must be solved, which is challenging; accurate marginal distributions and the normalized joint posterior distribution cannot be obtained analytically. Hence, we approximate these integrals using MCMC simulation [13], specifically the Hamiltonian Monte Carlo (HMC) method and its adaptive No-U-Turn sampler (NUTS) algorithm [10]. We provide details on their implementation and setup. Using MCMC simulation in Stan, we perform the estimation procedure and extract the noteworthy findings.
8.6.1. HMC algorithm
Hamiltonian Monte Carlo (HMC) efficiently traverses the posterior distribution using derivatives of the target density function. Employing a numerical simulation of Hamiltonian dynamics, corrected by a Metropolis acceptance step, HMC draws samples from a joint density of the parameters θ and auxiliary momentum variables ϕ. This section presents the HMC technique in the notation of [4], aligned with [9], with the aim of drawing samples from a density p(θ) for the parameter θ, which in Bayesian analysis is the posterior conditioned on observed data X and is typically expressed as a Stan program.
$p(\phi,\theta) = p(\phi\mid\theta)\, p(\theta)$ | (66) |
A multivariate normal distribution, ϕ ∼ MultiNormal(0, Σ), is frequently used as the auxiliary density in most HMC applications, including those in Stan. It is crucial to remember that this auxiliary density does not depend on the parameters θ. Here Σ is the Euclidean metric, acting as a measure of variability; this supplementary density allows more efficient and economical sampling by rescaling the parameter space. By default in Stan, Σ is set to a diagonal estimate of the covariance computed during the warm-up phase. The joint density defines the Hamiltonian.
$H(\phi,\theta) = -\log p(\phi,\theta)$ | (67) |
$= -\log p(\phi\mid\theta) - \log p(\theta)$ | (68) |
$= T(\phi\mid\theta) + V(\theta)$ | (69) |
The quantity T(ϕ | θ) = −log p(ϕ | θ) is referred to as the kinetic energy, while V(θ) = −log p(θ) is known as the potential energy. In the Stan program, the log density describes the characteristics of the distribution at the current parameter value θ. The transition to a new state occurs in two stages before a Metropolis acceptance step. First, the momentum is sampled independently, ϕ ∼ MultiNormal(0, Σ), and is not carried over between iterations. Next, Hamilton's equations are used to evolve the joint system, which includes both the current parameter values θ and the newly sampled momentum ϕ.
$\dfrac{d\theta}{dt} = +\dfrac{\partial H}{\partial\phi} = \dfrac{\partial T}{\partial\phi}$ | (70) |
$\dfrac{d\phi}{dt} = -\dfrac{\partial H}{\partial\theta} = -\dfrac{\partial T}{\partial\theta} - \dfrac{\partial V}{\partial\theta}$ | (71) |
Since the momentum density does not depend on the target density (i.e. p(ϕ | θ) = p(ϕ)), the first term in the time derivative of the momentum, ∂T/∂θ, becomes zero, and the pair of time derivatives simplifies to:
$\dfrac{d\theta}{dt} = \dfrac{\partial T}{\partial\phi}$ | (72) |
$\dfrac{d\phi}{dt} = -\dfrac{\partial V}{\partial\theta}$ | (73) |
Following the preceding section, we are confronted with a two-state differential equation that must be solved. As in other implementations of Hamiltonian Monte Carlo, Stan uses the leapfrog integrator, a numerical technique designed to be stable for systems of Hamiltonian equations. Like many numerical integrators, the leapfrog algorithm takes discrete steps of a small time interval ϵ. It begins by sampling a new momentum term ϕ ∼ MultiNormal(0, Σ), independently of the parameter values θ and of the previous momentum value. It then alternates between half-step updates of the momentum and full-step updates of the position.
$\phi \leftarrow \phi - \dfrac{\epsilon}{2}\,\dfrac{\partial V(\theta)}{\partial\theta}$ | (74) |
$\theta \leftarrow \theta + \epsilon\,\Sigma\,\phi$ | (75) |
By taking L leapfrog steps, a total simulated time of Lϵ is accumulated. At the end of this simulation, after L repetitions of the above steps, the resulting state is (θ*, ϕ*).
If the leapfrog integrator were numerically exact, introducing randomness solely through the random momentum vector drawn at each transition would be sufficient. In practice, to account for numerical integration error, a Metropolis acceptance step is incorporated. This step determines the probability of accepting the proposal (θ*, ϕ*) generated by the transition from the current state (θ, ϕ). The acceptance probability is calculated as follows:
$\min\left(1,\ \exp\left\{H(\phi,\theta) - H(\phi^{*},\theta^{*})\right\}\right)$ | (76) |
If the proposal is rejected, the previous parameter value is retained and used to initiate the next iteration.
8.6.2. Summary of algorithm
The HMC algorithm begins by initializing the parameter vector θ, either provided by the user or randomly generated within the Stan framework. At each iteration, a new momentum vector is sampled and the leapfrog integrator updates the current parameter value θ. Leapfrog integration, following Hamiltonian dynamics, is carried out with discretization time interval ϵ and a specified number of steps L. An acceptance step guided by the Metropolis criterion then decides whether to move to the new state or keep the current one.
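As a toy illustration of the leapfrog updates in Equations (74) and (75), and not Stan's actual implementation, the following R sketch integrates Hamilton's equations for a standard normal target, where the potential energy is V(θ) = θ²/2 and a unit metric is assumed.

```r
# Toy leapfrog integrator for a standard normal target: V(theta) = theta^2 / 2
grad_V <- function(theta) theta                     # dV/dtheta
leapfrog <- function(theta, phi, eps, L) {
  for (l in seq_len(L)) {
    phi   <- phi - (eps / 2) * grad_V(theta)        # half-step for momentum
    theta <- theta + eps * phi                      # full-step for position
    phi   <- phi - (eps / 2) * grad_V(theta)        # second half-step for momentum
  }
  list(theta = theta, phi = phi)
}
set.seed(1)
leapfrog(theta = 1, phi = rnorm(1), eps = 0.1, L = 20)
```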
8.7. Efficient model implementation and fitting: Harnessing the power of stan for robust model execution
The rstan package forms the foundation for executing Stan code in R. A Stan program comprises up to six distinct blocks designed for specific Bayesian modeling tasks: data, transformed data, parameters, transformed parameters, model and generated quantities. The Stan code provided in the appendix encapsulates these blocks for the proposed Weibull AFT shared frailty models. All models are fitted through the 'stan' function from the rstan package, using MCMC sampling with 2000 iterations in each of 4 parallel chains, for a total of 8000 iterations. At its core, the Stan framework relies on a compiler for seamless execution. The following subsections present succinct code snippets for generating both numeric and graphical summaries.
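A hedged sketch of what such a fitting call might look like is shown below; the file name 'TIHL_WAF.stan' is a placeholder for one of the appendix programs, the seed is illustrative, and 'datk' is the data list assembled in Section 7.2.

```r
# Sketch: fitting one of the models with rstan (4 chains x 2000 iterations)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
fit <- stan(
  file   = "TIHL_WAF.stan",   # placeholder file name for an appendix program
  data   = datk,              # list created in Section 7.2
  chains = 4,
  iter   = 2000,
  seed   = 2023               # illustrative seed
)
# Posterior summaries matching the quantiles reported in the tables below
print(fit, pars = c("b", "alpha", "beta", "omega_z"),
      probs = c(0.025, 0.25, 0.975))
```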
8.8. The execution of the TIHL-Weibull AFT shared frailty model using stan
8.9. Generating a summary of the output and providing interpretation
The outcomes of fitting the TIHL-WAF model with Bayesian techniques are presented in Table 1, along with visual aids that summarize the posterior density and assess model convergence. Notably, the analysis reveals that parameters b1 (intercept), with an estimate of 4.365, and b3 (sex), with an estimate of 1.68, are statistically significant at the 95% credible level. This translates into a larger acceleration factor of infection for female patients than for male patients, suggesting that, after catheter insertion, the infection rate is higher in male patients than in female patients. However, parameters b2 (age), b4 (AN), with an estimate of -0.108, b5 (GN) and b6 (PKD), with an estimate of 0.919, are not statistically significant at the 95% credible level; their credible intervals contain zero, implying that these effects are not statistically meaningful. The summary table provides various posterior estimates, including the mean, se_mean (standard error of the mean), standard deviation (sd) and credible intervals. It also reports the effective number of samples (n_eff), which gauges how independent the draws from the posterior distribution are, and Rhat (the potential scale reduction factor) [9], used to assess convergence towards the target distribution. In general, n_eff values exceeding 100 and Rhat values below 1.1 are deemed acceptable for obtaining precise parameter estimates and ensuring model convergence.
Table 1.
Posterior estimates for TIHL-Weibull AFT shared frailty model parameters.
| Parameters | Mean | se_mean | sd | 2.5% | 25% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|
| b[1] | 4.365 | 0.034 | 1.710 | 0.898 | 3.339 | 7.723 | 2482 | 1.000 |
| b[2] |  | 0.000 | 0.014 |  |  | 0.025 | 2354 | 0.999 |
| b[3] | 1.680 | 0.008 | 0.379 | 0.906 | 1.436 | 2.399 | 2555 | 1.002 |
| b[4] |  | 0.010 | 0.486 |  |  | 0.857 | 2147 | 1.000 |
| b[5] |  | 0.010 | 0.481 |  |  | 0.415 | 2178 | 1.000 |
| b[6] | 0.919 | 0.019 | 0.731 |  | 0.434 | 2.353 | 1462 | 1.000 |
| alpha | 1.029 | 0.005 | 0.151 | 0.765 | 0.922 | 1.349 | 1045 | 1.001 |
| beta | 60.928 | 8.205 | 392.858 | 0.441 | 6.804 | 325.232 | 2293 | 1.000 |
| omega_z | 0.632 | 0.015 | 0.221 | 0.189 | 0.482 | 1.048 | 210 | 1.008 |
For the TIHL-WAF model, all Rhat values fall within the acceptable range, indicating that the Markov chains have converged to the desired distribution, and the effective sample sizes are satisfactory. The model's consistency with the observed data is assessed visually using the bayesplot package [7]. The posterior predictive distribution (PPD) plot (Figure 1) demonstrates that the TIHL-WAF model aligns well with the data.
Figure 1.
(a) The traceplot for the TIHL-Weibull AFT shared frailty model shows four separate runs of chains. Successfully combining these four chains confirms that the MCMC algorithm has converged to the target joint posterior distribution. (b) The model convergence of the TIHL-Weibull AFT shared frailty is assessed through a posterior predictive density (PPD) plot, which demonstrates a good fit to the data: (a) trace plot and (b) PPD plot.
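The diagnostics in Figure 1 can be produced with the bayesplot package. The sketch below assumes the Stan program also stores replicated survival times as 'y_rep' in a generated quantities block; that name and the selected parameters are assumptions for illustration.

```r
# Sketch: trace plots and posterior predictive density overlay with bayesplot
library(rstan)
library(bayesplot)
mcmc_trace(as.array(fit), pars = c("alpha", "omega_z"))   # per-chain trace plots
y_rep <- as.matrix(fit, pars = "y_rep")                   # assumes a generated quantity
ppc_dens_overlay(y = datk$time, yrep = y_rep[1:100, ])    # PPD vs. observed times
```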
8.10. The execution of the TIIHL-Weibull AFT shared frailty model using stan
8.11. Generating a summary of the output and providing interpretation
Table 2 presents the results of fitting the TIIHL-Weibull AFT shared frailty model using Bayesian methods. Upon examination, the posterior estimate of parameter b3 (sex) is 1.545 and is statistically significant at the 95% credible level. The acceleration factor of infection is larger for female patients than for male patients, which implies that the infection rate after catheter insertion is higher in male patients than in female patients. The Rhat values for the model parameters are all below 1.1, indicating that the Markov chains have converged to the desired distribution. Additionally, the effective number of samples exceeds 100, further confirming the adequacy of the sample size. The numeric summary in Table 2 reveals that the 95% credible interval for the coefficient of sex (b3) does not contain zero, signifying its statistical significance. Furthermore, the posterior predictive distribution (PPD) plot (Figure 2) of the TIIHL-WAF model demonstrates a close alignment between the model's predictions and the observed data.
Table 2.
Posterior estimates for TIIHL-Weibull AFT shared frailty model parameters.
| Parameters | Mean | se_mean | sd | 2.5% | 25% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|
| b[1] | 0.448 | 0.083 | 2.021 |  |  | 3.581 | 591 | 1.000 |
| b[2] |  | 0.000 | 0.013 |  |  | 0.023 | 919 | 1.004 |
| b[3] | 1.545 | 0.012 | 0.354 | 0.851 | 1.316 | 2.257 | 872 | 0.999 |
| b[4] |  | 0.016 | 0.485 |  |  | 0.675 | 941 | 1.003 |
| b[5] |  | 0.017 | 0.485 |  |  | 0.430 | 777 | 1.003 |
| b[6] | 0.727 | 0.022 | 0.689 |  | 0.274 | 2.089 | 1016 | 1.005 |
| alpha | 0.698 | 0.039 | 0.666 | 0.222 | 0.347 | 2.716 | 297 | 1.003 |
| beta | 9.802 | 0.569 | 13.940 | 0.401 | 2.250 | 45.685 | 600 | 1.001 |
| omega_z | 0.431 | 0.063 | 0.272 | 0.019 | 0.216 | 0.980 | 99 | 1.100 |
Figure 2.
(a) The traceplot for the TIIHL-Weibull AFT shared frailty model shows four separate runs of chains. Successfully combining these four chains confirms that the MCMC algorithm has converged to the target joint posterior distribution. (b) The model convergence of the TIIHL-Weibull AFT shared frailty is assessed through a posterior predictive density (PPD) plot, which demonstrates a good fit to the data: (a) Trace plot and (b) PPD plot.
8.12. The execution of the TIIEHL-Weibull AFT shared frailty model using stan
8.13. Generating a summary of the output and providing interpretation
Table 3 displays the Bayesian results from fitting the TIIEHL-Weibull AFT shared frailty model. The posterior estimate for parameter b3 (sex) is 1.448, which is statistically significant at the 95% credible level. The acceleration factor for infection is larger for female patients than for male patients, suggesting a higher infection rate in male patients after catheter insertion. Rhat values below 1.1 indicate successful convergence of the Markov chains, and an effective number of samples surpassing 100 confirms an adequate sample size. The numeric summary in Table 3 highlights that the 95% credible interval for the coefficient of sex (b3) excludes zero, signifying statistical significance. Additionally, the posterior predictive distribution (PPD) plot (Figure 3) demonstrates a close alignment between the model's predictions and the observed data.
Table 3.
Posterior estimates for TIIEHL-Weibull AFT shared frailty model parameters.
| Parameters | mean | se_mean | sd | 2.5% | 25% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|
| b[1] | 1.048 | 0.273 | 4.066 |  |  | 7.706 | 221 | 1.014 |
| b[2] |  | 0.000 | 0.013 |  |  | 0.022 | 956 | 1.005 |
| b[3] | 1.448 | 0.012 | 0.378 | 0.678 | 1.199 | 2.165 | 959 | 1.001 |
| b[4] |  | 0.017 | 0.481 |  |  | 0.729 | 814 | 1.008 |
| b[5] |  | 0.017 | 0.476 |  |  | 0.284 | 770 | 1.006 |
| b[6] | 0.919 | 0.027 | 0.711 |  | 0.444 | 2.287 | 671 | 1.001 |
| alpha | 0.489 | 0.084 | 0.820 | 0.087 | 0.128 | 3.339 | 95 | 1.014 |
| beta | 73.067 | 8.412 | 296.895 | 4.294 | 14.139 | 371.989 | 1246 | 1.000 |
| lambda | 14.978 | 0.895 | 14.972 | 0.335 | 4.234 | 55.433 | 280 | 1.009 |
| omega_z | 0.478 | 0.031 | 0.241 | 0.084 | 0.282 | 0.969 | 99 | 1.027 |
Figure 3.
(a) The TIIEHL-Weibull AFT shared frailty model's traceplot exhibits successful convergence of four separate chains, confirming MCMC convergence to the target joint posterior distribution. (b) The model convergence is further validated through a posterior predictive density (PPD) plot, demonstrating a robust fit to the data: (a) Trace plot and (b) PPD plot.
8.14. The execution of the Weibull AFT shared frailty model using Stan
8.15. Generating a summary of the output and providing interpretation
Table 4 presents the outcomes of Bayesian model fitting using the Weibull AFT shared frailty model. Notably, the intercept b1 and the coefficient of sex (b3) are statistically significant at the 95% credible level. The Rhat values for the model parameters are all below 1.1, indicating that the Markov chains have successfully converged to the intended distribution. Additionally, the effective number of samples surpasses 100, providing further confidence in the adequacy of the sample size. Examining the numerical summary in Table 4, the 95% credible intervals for coefficients b2 (age), b4 (AN), b5 (GN) and b6 (PKD) contain zero, so these coefficients are not statistically significant. Furthermore, the posterior predictive distribution (PPD) plot (Figure 4) of the Weibull AFT shared frailty model displays a close match between the model's predictions and the actual data, indicating a good fit.
Table 4.
Posterior estimates for Weibull AFT shared frailty model parameters.
| Parameters | mean | se_mean | sd | 2.5% | 25% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|
| b[1] | 2.120 | 0.016 | 0.815 | 0.546 | 1.563 | 3.758 | 2442 | 1.000 |
| b[2] |  | 0.000 | 0.013 |  |  | 0.024 | 2219 | 1.000 |
| b[3] | 1.627 | 0.007 | 0.383 | 0.855 | 1.371 | 2.385 | 2653 | 1.000 |
| b[4] |  | 0.012 | 0.494 |  |  | 0.893 | 1795 | 1.001 |
| b[5] |  | 0.012 | 0.489 |  |  | 0.463 | 1748 | 1.001 |
| b[6] | 0.934 | 0.019 | 0.725 |  | 0.441 | 2.431 | 1517 | 1.001 |
| alpha | 1.153 | 0.006 | 0.157 | 0.876 | 1.041 | 1.492 | 643 | 1.003 |
| omega_z | 0.588 | 0.029 | 0.242 | 0.085 | 0.444 | 1.036 | 98 | 1.019 |
Figure 4.
(a) The traceplot of the Weibull AFT shared frailty model exhibits four converged chains, confirming MCMC convergence to the target joint posterior distribution.(b) The model convergence is validated by a posterior predictive density (PPD) plot, illustrating a strong fit to the data: (a) Trace plot and (b) PPD plot.
9. Model comparison with Bayesian criteria
For model comparison, we employ the Leave-One-Out cross-validation Information Criterion (LOOIC) and the Watanabe-Akaike Information Criterion (WAIC) [18,20]. These criteria aid in selecting the most suitable model by evaluating predictive performance: models with lower LOOIC and WAIC values are preferred, as they indicate a closer match to the observed data and better capture of the underlying patterns. LOOIC and WAIC are particularly advantageous in situations with limited data, providing more precise estimates of out-of-sample predictive performance. They balance fit against complexity, promoting better generalization and accurate prediction beyond the training dataset. Mathematically, LOOIC and WAIC are calculated as follows:
$\mathrm{LOOIC} = -2\,\widehat{\mathrm{elpd}}_{\mathrm{loo}} = -2\sum_{j=1}^{n}\log p(y_j\mid y_{-j})$ | (77) |
$\mathrm{WAIC} = -2\left(\mathrm{lppd} - p_{\mathrm{WAIC}}\right)$ | (78) |
Here elpd_loo represents the expected log pointwise predictive density under leave-one-out (LOO) cross-validation, lppd is the log pointwise predictive density, p(y_j | y_{-j}) denotes the predictive density for the jth data point obtained after excluding it from the dataset, and p_WAIC is the posterior variance of the log predictive densities, known as the penalty term. For a more comprehensive explanation, refer to [19]. Recently, [3], [1], [6] and [11] used LOOIC and WAIC as the basis for comparing Bayesian survival models.
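In practice these quantities are computed with the loo package. The sketch below assumes that each Stan program stores the pointwise log-likelihood contributions as 'log_lik' in a generated quantities block; that name is an assumption for illustration.

```r
# Sketch: LOOIC and WAIC for a fitted model with a 'log_lik' generated quantity
library(loo)
log_lik <- extract_log_lik(fit, parameter_name = "log_lik", merge_chains = FALSE)
r_eff   <- relative_eff(exp(log_lik))        # relative effective sample sizes
loo_res  <- loo(log_lik, r_eff = r_eff)      # LOOIC = -2 * elpd_loo
waic_res <- waic(log_lik)                    # WAIC  = -2 * elpd_waic
print(loo_res)
print(waic_res)
# Repeat for each model (TIHL-WAF, TIIHL-WAF, TIIEHL-WAF, WAFT) and rank by
# LOOIC/WAIC, lower being better.
```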
From Table 5, we observe that the TIIEHL-WAF model has the lowest LOOIC and WAIC values of the four models, indicating the best predictive performance and demonstrating that it is superior to the other three models for the kidney catheter data from the survival package.
Table 5.
LOOIC and WAIC values for TIHL-WAF, TIIHL-WAF,TIIEHL-WAF and WAFT models.
| Model | LOOIC | WAIC |
|---|---|---|
| TIHL-WAF | 672.6 | 669.5 |
| TIIHL-WAF | 670.9 | 669.2 |
| TIIEHL-WAF | 661.8 | 659.1 |
| WAFT | 675.1 | 674.3 |
10. Conclusion and discussion
In this study, the Bayesian paradigm was applied to the analysis of censored survival data using the TIHL-G, TIIHL-G and TIIEHL-G families. The rstan package in R was used to implement the simulation and analytical approximation techniques. Trace plots show that the Markov chains for all models converge to the target distribution. The TIIEHL-Weibull AFT shared frailty model is the most appropriate model for the kidney catheter data, because its predictive performance is the best among the four models according to the posterior predictive density plots, LOOIC and WAIC. Additionally, the analysis shows that the infection rate after catheter insertion is higher in male patients than in female patients.
This work can be further extended by investigating methods for selecting important covariates or predictors, to enhance the interpretability and efficiency of the model in capturing the underlying hazard rates; by accommodating time-varying covariates, since many real-world survival studies involve predictors that change over time; and by exploring richer frailty structures to account for unobserved heterogeneity among subjects, which can have a significant impact on survival outcomes.
Appendices.
Appendix 1. Implementation of the TIHL-Weibull AFT shared frailty model using stan code.
Appendix 2. Implementation of the TIIHL-Weibull AFT shared frailty model using stan code.
Appendix 3. Implementation of the TIIEHL-Weibull AFT shared frailty model using stan code.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. AbuJarad M.H., AbuJarad E.S., and Khan A.A., Bayesian survival analysis of type I general exponential distributions, Ann. Data Sci. 1 (2019), pp. 1–21.
- 2. Al-Mofleh H., Elgarhy M., Afify A., and Zannon M., Type II exponentiated half logistic generated family of distributions with applications, Electron. J. Appl. Stat. Anal. 13 (2020), pp. 536–561.
- 3. Ashraf-Ul-Alam M. and Khan A.A., Generalized Topp–Leone–Weibull AFT modelling: A Bayesian analysis with MCMC tools using R and Stan, Aust. J. Stat. 50 (2021), pp. 52–76.
- 4. Betancourt M. and Girolami M., Hamiltonian Monte Carlo for hierarchical models, Curr. Trends Bayes. Methodol. Appl. 79 (2015), pp. 2–4.
- 5. Collett D., Modelling Survival Data in Medical Research, 3rd ed., CRC Press, Boca Raton, FL, 2015.
- 6. Farhin S., Bayesian survival modeling of Marshall–Olkin generalized-G family with random effects using R and Stan, Reliab. Theory Appl. 17 (2022), pp. 422–440.
- 7. Gabry J., Simpson D., Vehtari A., Betancourt M., and Gelman A., Visualization in Bayesian workflow, J. R. Stat. Soc. Ser. A Stat. Soc. 182 (2019), pp. 389–402.
- 8. Gelman A., Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper), Bayesian Anal. 1 (2006), pp. 515–534.
- 9. Gelman A., Carlin J.B., Stern H.S., Dunson D.B., Vehtari A., and Rubin D.B., Models for robust inference, in Bayesian Data Analysis, 3rd ed., Chapman & Hall/CRC, Boca Raton, FL, 2014, pp. 435–446.
- 10. Hoffman M.D. and Gelman A., The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo, J. Mach. Learn. Res. 15 (2014), pp. 1593–1623.
- 11. Khan A.A., Bayesian analysis of type II generalized Topp–Leone accelerated failure time models using R and Stan, Reliab. Theory Appl. 17 (2022), pp. 477–493.
- 12. Lawless J.F., Statistical Models and Methods for Lifetime Data, John Wiley & Sons, Hoboken, NJ, 2011.
- 13. Neal R.M., MCMC using Hamiltonian dynamics, in Handbook of Markov Chain Monte Carlo, Chapman & Hall/CRC, Boca Raton, FL, 2011.
- 14. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2021.
- 15. Shrahili M., Muhammad M., Elbatal I., Muhammad I., Bouchane M., and Abba B., Properties and applications of the type I half-logistic Nadarajah–Haghighi distribution, Aust. J. Stat. 52 (2023), pp. 1–21.
- 16. Soliman A.H., Elgarhy M.A.E., and Shakil M., Type II half logistic family of distributions with applications, Pak. J. Stat. Oper. Res. 13 (2017), pp. 245–264.
- 17. Therneau T.M. and Lumley T., Package 'survival', R Top. Doc. 128 (2015), pp. 28–33.
- 18. Vehtari A., Gabry J., Yao Y., and Gelman A., loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models, R package version 2 (2018).
- 19. Vehtari A., Gelman A., and Gabry J., Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, Stat. Comput. 27 (2017), pp. 1413–1432.
- 20. Watanabe S., Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory, J. Mach. Learn. Res. 11 (2010), pp. 3571–3594.
- 21. Wienke A., Frailty Models in Survival Analysis, CRC Press, New York, 2010.




