Abstract
We introduce a new modeling framework for long-term survival data, assuming that the number of competing causes follows a mixture of the Poisson and Birnbaum–Saunders distributions. We present some statistical properties of the model and show that the promotion time model emerges as a limiting case. Specific models within this class are discussed in detail. In particular, the expected number of competing causes is allowed to depend on covariates, which makes it possible to model the cure rate directly as a function of covariates. We propose an expectation–maximization (EM) algorithm for maximum likelihood (ML) estimation and discuss inference on the model parameters. Additionally, we outline sufficient conditions ensuring the consistency and asymptotic normality of the ML estimators. To evaluate the estimation method, we conduct Monte Carlo simulations that assess its asymptotic properties, together with a power study of the likelihood ratio (LR) test comparing our model with the promotion time model. To demonstrate the practical applicability of the model, we apply it to a real medical dataset from a population-based study of breast cancer incidence in São Paulo, Brazil. Our results indicate that the proposed model can outperform traditional approaches in terms of model fit, highlighting its potential usefulness in real-world scenarios.
Keywords: Birnbaum–Saunders, breast cancer data, competing causes, cure rate model, expectation–maximization algorithm
1 |. Introduction
Cancer is a major global public health issue: it is a leading cause of death and a major obstacle to increasing life expectancy. In most countries, it ranks as the first or second leading cause of premature death before the age of 70. Both incidence and mortality are increasing rapidly worldwide, driven by demographic and epidemiological transitions (Sung et al. 2021). This increase also reflects lifestyle changes that many families have adopted over time: behavioral and environmental factors, such as dietary habits and exposure to environmental pollutants, contribute to the rise in cancer incidence and mortality. These factors also affect mobility, recreation, and the broader structural conditions that influence health and quality of life (Wild, Weiderpass, and Stewart 2020).
Effective interventions for the prevention, early detection, and treatment of the disease have been implemented in countries with high human development indices, with a substantial impact on cancer incidence and mortality (Sung et al. 2021). According to the Global Cancer Estimates Observatory (GLOBOCAN), a web-based platform presenting global cancer statistics prepared by the International Agency for Research on Cancer, the burden of cancer worldwide in 2020 was considerable: there were 19.3 million new cancer cases (18.1 million excluding nonmelanoma skin cancer), and it is estimated that about one in five individuals will receive a cancer diagnosis during their lifetime (Ferlay et al. 2021; Sung et al. 2021).
Long-term survival of breast cancer patients has improved significantly over the past 50 years, owing to advances in medical science and the introduction of new treatment approaches. As a result, an increasing number of patients are now considered “cured,” that is, immune to the event of interest. A certain proportion of patients is expected to respond positively to treatment, which improves overall survival. Long-term survival (cure rate) models are designed specifically to account for this characteristic. Note that "cure" here refers to the disease under study, not to overall survival: even if a patient is cured of a specific disease, they remain vulnerable to other diseases, so a complete cure of all illnesses is impossible.
Cure rate models are appropriate when the population contains individuals who will never experience the event of interest. The pioneering model in this context was proposed by Berkson and Gage (1952), who assumed two distinct groups: individuals who are immune to the event of interest and individuals who are susceptible to it. Chen, Ibrahim, and Sinha (1999) discussed an alternative model with a biological interpretation. In this model, the authors assumed the existence of a latent number $N$ of carcinogenic cells for each individual. Subjects are classified as cured or susceptible according to whether $N = 0$ or $N \geq 1$, respectively. In their initial proposal, the authors assumed that $N$ follows a Poisson distribution with mean $\theta$.
In the literature, various alternatives to the Poisson distribution for $N$ have been proposed. Notable examples include the negative binomial (NB), which has well-known distributions such as the Bernoulli (Bern), binomial (Bin), Poisson (Poi), and geometric (Geo) as particular cases (Rodrigues, Cancho, et al. 2009); the COM-Poisson (Rodrigues, de Castro, et al. 2009); power series (Cancho, Louzada, and Ortega 2013); Yule–Simon (Gallardo, Gómez, and Bolfarine 2017a); polylogarithm (Gallardo, Gómez, and de Castro 2018); zero-modified geometric (Leão et al. 2020); compound Poisson (Gómez et al. 2023); and a mixture of power series (Brandão et al. 2023), among others.
An interesting class of models was discussed in Barreto-Souza (2015), where it is assumed that, conditional on a latent variable $Z$, $N \sim \text{Poisson}(\theta Z)$. The author considered distributions in the exponential family (EF) with mean 1 for $Z$, a wide class that includes the gamma, inverse Gaussian, and generalized hyperbolic secant distributions, among others. In this paper, we explore a similar idea, but we consider the Birnbaum–Saunders (BS) distribution for $Z$. The BS model does not belong to the EF but has many interesting properties: it can be parameterized directly in terms of its mean, its moment-generating function has a simple closed form, and it can be expressed as a mixture of distributions, among other characteristics. The proposed model is therefore a compelling alternative to the widely used negative binomial model; both models allow for overdispersion of the number of competing causes relative to its mean. Incorporating the BS mixing distribution thus not only accommodates overdispersion but also adds versatility to the set of available modeling options.
The article is organized as follows. In Section 2, we introduce the Poisson–Birnbaum–Saunders (PBS) mixture model. Section 3 discusses the maximization of the log-likelihood function for this model and proposes an estimation procedure based on the expectation–maximization (EM) algorithm. The performance of the proposed model is examined in Section 4 through two simulation studies. To illustrate the practical application of the methodology, in Section 5 we analyze a dataset of survival times of patients with breast cancer in the State of São Paulo, Brazil. Finally, in Section 6, we discuss the main findings and implications of this study.
2 |. The Proposed Model
In this section, we provide an overview of the Birnbaum–Saunders distribution and introduce our proposed modeling approach.
2.1 |. Birnbaum–Saunders Model
The BS distribution has received wide attention in the literature owing to its physical motivation, its favorable statistical properties, and its connection with the normal distribution. The BS model was proposed by Birnbaum and Saunders (1969) and has been extensively applied to model failure times in engineering. However, novel applications have also emerged in the biological, environmental, and financial sciences; see, for instance, Desmond (1985), Kotz, Leiva, and Sanhueza (2010), Saulo et al. (2013), Leiva, Santos-Neto, et al. (2014), Leiva, Saulo, et al. (2014), Leiva, Marchant, et al. (2015), Leiva, Tejo, et al. (2015), and Leiva et al. (2017).
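In the context of the BS distribution, Santos-Neto et al. (2012) introduced various parameterizations. One of them is defined by $\mu = \beta_{BS}(1 + \alpha_{BS}^2/2)$ and $\phi = 2/\alpha_{BS}^2$, where $\alpha_{BS} > 0$ and $\beta_{BS} > 0$ are the original BS shape and scale parameters (Birnbaum and Saunders 1969); $\mu > 0$ is a scale parameter that equals the mean of the distribution, while $\phi > 0$ acts as a shape and precision parameter. We use the notation $Z \sim \mathrm{BS}(\mu, \phi)$ to denote a random variable following this distribution. If $Z \sim \mathrm{BS}(\mu, \phi)$, then $E(Z) = \mu$, $\mathrm{Var}(Z) = \mu^2(2\phi + 5)/(\phi + 1)^2$, and its probability density function (PDF) is
$$f_Z(z;\mu,\phi) = \frac{\exp(\phi/2)\sqrt{\phi+1}}{4\sqrt{\pi\mu}\,z^{3/2}}\left(z + \frac{\phi\mu}{\phi+1}\right)\exp\left\{-\frac{\phi}{4}\left[\frac{(\phi+1)z}{\phi\mu} + \frac{\phi\mu}{(\phi+1)z}\right]\right\}, \quad z > 0.$$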
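For the particular case $\mu = 1$ (which is the one of interest here), and defining $a = (\phi+1)/2$ and $b = \phi^2/[2(\phi+1)]$, we have that
$$Z \sim \tfrac{1}{2}\,\mathrm{GIG}\left(\tfrac{1}{2}, a, b\right) + \tfrac{1}{2}\,\mathrm{GIG}\left(-\tfrac{1}{2}, a, b\right), \qquad (1)$$
where $\mathrm{GIG}(\lambda, a, b)$ denotes the generalized inverse Gaussian distribution with PDF given by
$$g(z;\lambda,a,b) = \frac{(a/b)^{\lambda/2}}{2K_{\lambda}(\sqrt{ab})}\,z^{\lambda-1}\exp\left\{-\frac{1}{2}\left(az + \frac{b}{z}\right)\right\}, \quad z > 0,$$
and $K_\lambda(\cdot)$ is the modified Bessel function of the second kind of order $\lambda$.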
The result in (1) shows that the BS model is an equal-weight mixture of two GIG distributions; this result was also presented in Equation (22) of Balakrishnan and Kundu (2019). Fixing $\mu = 1$ ensures identifiability of the model: since the Poisson mean of the number of competing causes depends on $\theta$ and $Z$ only through $\theta Z$, and $\theta Z \sim \mathrm{BS}(\theta\mu, \phi)$, only the product $\theta\mu$ could otherwise be estimated. This restriction is very useful in the estimation process developed later and does not reduce the applicability of the model in practice.
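A quick numerical check can make representation (1) concrete. The Python sketch below is illustrative only: it assumes NumPy and SciPy, the helper names (bs_pdf, gig_pdf, bs_mixture_pdf) are hypothetical, and it uses the GIG parameterization with density proportional to $z^{\lambda-1}\exp\{-(az + b/z)/2\}$, with $a = (\phi+1)/2$ and $b = \phi^2/[2(\phi+1)]$. It compares the BS(1, ϕ) density with the equal-weight GIG mixture on a grid and checks by numerical integration that E(Z) = 1 and Var(Z) = (2ϕ + 5)/(ϕ + 1)².

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv


def bs_pdf(z, phi, mu=1.0):
    """Density of BS(mu, phi) in the mean/precision parameterization."""
    return (np.exp(phi / 2) * np.sqrt(phi + 1) / (4 * np.sqrt(np.pi * mu) * z**1.5)
            * (z + phi * mu / (phi + 1))
            * np.exp(-phi / 4 * ((phi + 1) * z / (phi * mu) + phi * mu / ((phi + 1) * z))))


def gig_pdf(z, lam, a, b):
    """GIG(lam, a, b) density, proportional to z**(lam - 1) * exp(-(a*z + b/z) / 2)."""
    return ((a / b) ** (lam / 2) / (2 * kv(lam, np.sqrt(a * b)))
            * z ** (lam - 1) * np.exp(-(a * z + b / z) / 2))


def bs_mixture_pdf(z, phi):
    """Equal-weight mixture of GIG(1/2, a, b) and GIG(-1/2, a, b) for mu = 1."""
    a = (phi + 1) / 2
    b = phi**2 / (2 * (phi + 1))
    return 0.5 * gig_pdf(z, 0.5, a, b) + 0.5 * gig_pdf(z, -0.5, a, b)


phi = 1.36
grid = np.linspace(0.05, 8.0, 50)
max_gap = np.max(np.abs(bs_pdf(grid, phi) - bs_mixture_pdf(grid, phi)))
mean_z = quad(lambda z: z * bs_pdf(z, phi), 0, np.inf)[0]
var_z = quad(lambda z: z**2 * bs_pdf(z, phi), 0, np.inf)[0] - mean_z**2
print(max_gap, mean_z, var_z, (2 * phi + 5) / (phi + 1) ** 2)
```

With ϕ = 1.36, the closed-form variance is about 1.386, the value of Var(Z) used in the simulation study.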
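Finally, the moment-generating function (MGF) of $Z \sim \mathrm{BS}(\mu, \phi)$ can be expressed as
$$M_Z(s) = E\left(e^{sZ}\right) = \frac{1}{2}\left[1 + \left(1 - \frac{4\mu s}{\phi+1}\right)^{-1/2}\right]\exp\left\{\frac{\phi}{2}\left[1 - \sqrt{1 - \frac{4\mu s}{\phi+1}}\right]\right\}, \quad s < \frac{\phi+1}{4\mu}.$$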
2.2 |. Poisson–Birnbaum–Saunders Mixture Model
In this subsection, we introduce a new cure rate model based on a mixture of the Poisson and BS distributions. The PBS mixture distribution itself was proposed by Gonçalves, Barreto-Souza, and Ombao (2022); here, we use it to describe the number of competing causes. In addition to exploring its properties, we also discuss a method for generating values from this model.
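Let $N$ be an unobserved variable denoting the initial number of competing causes related to the occurrence of an event of interest. In a medical context, such as with cancer patients, $N$ represents the number of carcinogenic cells in patients undergoing cancer treatment. We assume that, conditional on $Z$, $N \sim \text{Poisson}(\theta Z)$, with $\theta > 0$. We further assume that $Z \sim \mathrm{BS}(1, \phi)$, that is, $E(Z) = 1$ and $\mathrm{Var}(Z) = (2\phi + 5)/(\phi + 1)^2$. It is straightforward to see that $\mathrm{Var}(Z) \to 0$ as $\phi \to \infty$, so that $Z$ becomes degenerate at 1 in the limit.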
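Under this scheme, $E(N) = \theta$ and $\mathrm{Var}(N) = \theta + \theta^2(2\phi + 5)/(\phi + 1)^2 > E(N)$, that is, the distribution of $N$ is overdispersed. Furthermore, we can readily compute the probability-generating function (PGF) of $N$ as follows:
$$G_N(s) = E\left(s^N\right) = E\left[e^{\theta Z(s-1)}\right] = \frac{1}{2}\left[1 + \left(1 + \frac{4\theta(1-s)}{\phi+1}\right)^{-1/2}\right]\exp\left\{\frac{\phi}{2}\left[1 - \sqrt{1 + \frac{4\theta(1-s)}{\phi+1}}\right]\right\}, \quad 0 \le s \le 1.$$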
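The mixture representation also gives a simple way to generate values from the PBS mixture model: draw Z from one of the two GIG components with probability 1/2 each, then draw N from a Poisson distribution with mean θZ. The Python sketch below follows this route under stated assumptions: illustrative values θ = 2 and ϕ = 1.36, hypothetical function names, and SciPy's geninvgauss (density proportional to $x^{p-1}\exp\{-b(x + 1/x)/2\}$) mapped to GIG(λ, a, b) through shape $\sqrt{ab}$ and scale $\sqrt{b/a}$. It is not necessarily the generator used by the authors; it only checks the simulated mean, variance, and P(N = 0) against θ, θ + θ²(2ϕ + 5)/(ϕ + 1)², and the PGF evaluated at s = 0.

```python
import numpy as np
from scipy.stats import geninvgauss


def sample_bs1(phi, size, rng):
    """Draw Z ~ BS(1, phi) via the equal-weight GIG(+-1/2, a, b) mixture."""
    a, b = (phi + 1) / 2, phi**2 / (2 * (phi + 1))
    shape, scale = np.sqrt(a * b), np.sqrt(b / a)  # maps GIG(lam, a, b) to geninvgauss
    k = rng.binomial(size, 0.5)                    # number of draws from GIG(1/2, a, b)
    z = np.concatenate([
        geninvgauss.rvs(0.5, shape, scale=scale, size=k, random_state=rng),
        geninvgauss.rvs(-0.5, shape, scale=scale, size=size - k, random_state=rng),
    ])
    rng.shuffle(z)
    return z


def pbs_pgf(s, theta, phi):
    """PGF of N when N | Z ~ Poisson(theta * Z) and Z ~ BS(1, phi)."""
    r = np.sqrt(1 + 4 * theta * (1 - s) / (phi + 1))
    return 0.5 * (1 + 1 / r) * np.exp(phi / 2 * (1 - r))


rng = np.random.default_rng(2024)
theta, phi = 2.0, 1.36
n_draws = rng.poisson(theta * sample_bs1(phi, 200_000, rng))
var_theory = theta + theta**2 * (2 * phi + 5) / (phi + 1) ** 2
print("mean", n_draws.mean(), "vs", theta)
print("variance", n_draws.var(), "vs", var_theory)
print("P(N = 0)", np.mean(n_draws == 0), "vs", pbs_pgf(0.0, theta, phi))
```

Drawing the two components separately (rather than passing an array of orders to geninvgauss) keeps each call to a single scalar parameterization.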
The usual scheme here assumes that $W_j$, $j = 1, \ldots, N$, the times for each of the carcinogenic cells to produce a detectable cancer, are conditionally independent given $N$, with common survival function $S(t) = 1 - F(t)$. In addition, if $N = 0$, the individual is considered cured, and we define $W_0 = \infty$. With this notation, the time to event for an individual can be represented as $T = \min\{W_0, W_1, \ldots, W_N\}$. Under this usual competing risks framework, and by Theorem 2 in Rodrigues, Cancho, et al. (2009), the (improper) population survival function (SF) and PDF of the PBS mixture cure rate model are given by
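$$S_{pop}(t;\vartheta) = E\left[S(t)^N\right] = \frac{1}{2}\left[1 + \frac{1}{v(t)}\right]\exp\left\{\frac{\phi}{2}\left[1 - v(t)\right]\right\}, \qquad v(t) = \sqrt{1 + \frac{4\theta F(t)}{\phi+1}}, \qquad (2)$$
and
$$f_{pop}(t;\vartheta) = -\frac{d S_{pop}(t;\vartheta)}{dt} = \frac{\theta f(t)}{(\phi+1)\,v(t)}\left[\frac{1}{v(t)^{2}} + \frac{\phi}{2}\left(1 + \frac{1}{v(t)}\right)\right]\exp\left\{\frac{\phi}{2}\left[1 - v(t)\right]\right\},$$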
where $F(t) = 1 - S(t)$, $S(t)$ and $f(t)$ are the survival function and PDF of the time to event associated with each competing cause, and $\vartheta$ denotes the vector of unknown parameters. We assume that this time to event follows a Weibull distribution. An important aspect of the relationship between the PBS mixture model and the Poisson model with mean $\theta$ is that, when the BS parameter $\phi$ tends to infinity, the population survival function $S_{pop}(t;\vartheta)$ in Equation (2) converges to the population survival function of the Poisson cure rate model (also known in the literature as the promotion time cure rate model). In other words, $\lim_{\phi\to\infty} S_{pop}(t;\vartheta) = \exp\{-\theta F(t)\}$, as described in Rodrigues, Cancho, et al. (2009).
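To see Equation (2) and the corresponding density at work, the following Python sketch evaluates them for a Weibull baseline. The parameterization $S(t) = \exp\{-e^{\alpha}t^{\nu}\}$ is a hypothetical choice made here for illustration (the text only states that a Weibull distribution is used), as are the parameter values and function names. The sketch checks numerically that $f_{pop}(t) = -dS_{pop}(t)/dt$ and that $S_{pop}(t)$ approaches the promotion time SF $\exp\{-\theta F(t)\}$ when ϕ is very large.

```python
import numpy as np


def weibull_sf_pdf(t, alpha, nu):
    """Hypothetical Weibull baseline with S(t) = exp(-exp(alpha) * t**nu)."""
    sf = np.exp(-np.exp(alpha) * t**nu)
    pdf = np.exp(alpha) * nu * t ** (nu - 1) * sf
    return sf, pdf


def pbs_population(t, theta, phi, alpha, nu):
    """Improper population SF and density of the PBS mixture cure rate model."""
    sf, pdf = weibull_sf_pdf(t, alpha, nu)
    v = np.sqrt(1 + 4 * theta * (1 - sf) / (phi + 1))
    expo = np.exp(phi / 2 * (1 - v))
    s_pop = 0.5 * (1 + 1 / v) * expo
    f_pop = theta * pdf / ((phi + 1) * v) * (1 / v**2 + phi / 2 * (1 + 1 / v)) * expo
    return s_pop, f_pop


theta, phi, alpha, nu = 2.0, 1.36, -3.52, 1.40
t = np.linspace(0.5, 20.0, 40)
s_pop, f_pop = pbs_population(t, theta, phi, alpha, nu)

# f_pop should match -dS_pop/dt (central finite difference).
h = 1e-5
deriv = -(pbs_population(t + h, theta, phi, alpha, nu)[0]
          - pbs_population(t - h, theta, phi, alpha, nu)[0]) / (2 * h)
print(np.max(np.abs(deriv - f_pop)))

# For very large phi, S_pop approaches the promotion time SF exp(-theta * F(t)).
sf, _ = weibull_sf_pdf(t, alpha, nu)
big = pbs_population(t, theta, 1e6, alpha, nu)[0]
print(np.max(np.abs(big - np.exp(-theta * (1 - sf)))))
```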
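Letting $t \to \infty$ in Equation (2), so that $F(t) \to 1$, it is immediate that the cure rate of the model has the following expression:
$$p_0 = \lim_{t\to\infty} S_{pop}(t;\vartheta) = \frac{1}{2}\left[1 + \left(1 + \frac{4\theta}{\phi+1}\right)^{-1/2}\right]\exp\left\{\frac{\phi}{2}\left[1 - \sqrt{1 + \frac{4\theta}{\phi+1}}\right]\right\}. \qquad (3)$$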
For heterogeneous populations, explanatory variables can be introduced into the model through the cure rate, via the Poisson parameter $\theta$ of our mixing approach. When covariates are included, a distinct parameter $\theta_i$ is assigned to each patient or subject, $i = 1, \ldots, n$, where $n$ is the number of individuals or subjects in the study. Different link functions can be employed to capture the influence of these explanatory variables on the cure rate.
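As an illustration of how covariates enter through $\theta_i$, the sketch below assumes a log link, $\theta_i = \exp(\mathbf{x}_i^\top\boldsymbol{\beta})$, which is one common choice but an assumption here, together with a hypothetical intercept-plus-dummies coding of a three-level covariate. With the regression coefficients and ϕ later used in the simulation study of Section 4.1, it computes the cure fraction in (3) for each level and checks the promotion time limit $e^{-\theta}$ for very large ϕ.

```python
import numpy as np


def cure_rate(theta, phi):
    """Cure fraction p0 = P(N = 0) of the PBS mixture model, as in Equation (3)."""
    r = np.sqrt(1 + 4 * theta / (phi + 1))
    return 0.5 * (1 + 1 / r) * np.exp(phi / 2 * (1 - r))


# Hypothetical design: intercept plus two dummies for a three-level covariate.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
beta = np.array([-1.67, 1.21, 2.53])
theta = np.exp(X @ beta)  # assumed log link: theta_i = exp(x_i' beta)
p0 = cure_rate(theta, phi=1.36)
print(np.round(p0, 3))
print(np.max(np.abs(cure_rate(theta, 1e8) - np.exp(-theta))))  # promotion time limit
```

Since $p_0$ decreases in θ, a positive coefficient on a covariate lowers the cure fraction of the corresponding group under this link.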
Remark 2.1.
Hashimoto et al. (2014) present a model named Poisson–Birnbaum–Saunders, which corresponds to assuming $N \sim \text{Poisson}(\theta)$ and taking $S(t)$ as the survival function of the BS distribution. Despite the similarity in name, our PBS mixture model makes a very different assumption, namely $N \mid Z \sim \text{Poisson}(\theta Z)$ with $Z \sim \mathrm{BS}(1, \phi)$, and the mixture itself does not require any particular choice for $S(t)$.
3 |. Estimation
In this section, we focus on estimating the model parameters. Consider the situation in which the time to event is not completely observed and is subject to right censoring. Let $C_i$ be the censoring time and $T_i$ the failure time for the $i$th individual. We observe $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, where $\delta_i = 1$ if $t_i$ is a time to event and $\delta_i = 0$ if it is right-censored, for $i = 1, \ldots, n$. The observed data consist of the vectors $(t_i, \delta_i, \mathbf{x}_i)$, where $\mathbf{x}_i$ is the covariate vector of dimension $p$ related to the cure of the $i$th individual. The effect of these covariates on the cure fraction in (3) can be modeled via a link function for $\theta_i$. To this end, let $\boldsymbol{\beta}$ be the vector of regression coefficients to be estimated, so that $\theta_i$ is related to the observed values $\mathbf{x}_i$ of the explanatory variables for the $i$th patient, which are associated with the cured fraction. Different link functions can be considered; the choice depends on the parameter space of $\theta_i$. The latent variables of the model (such as $N_i$ and $Z_i$) are not observable, so the complete data augment the observed data with these latent variables.
To obtain estimates of $\vartheta = (\boldsymbol{\beta}^\top, \alpha, \nu, \phi)^\top$, where $\alpha$ and $\nu$ are the Weibull parameters, we can maximize the log-likelihood function under noninformative censoring, which is expressed as
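$$\ell(\vartheta) = \sum_{i=1}^{n}\left[\delta_i \log f_{pop}(t_i;\vartheta) + (1 - \delta_i)\log S_{pop}(t_i;\vartheta)\right]. \qquad (4)$$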
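The log-likelihood (4) is straightforward to code once $S_{pop}$ and $f_{pop}$ are available. The following Python sketch is a minimal version under hypothetical choices: a log link for $\theta_i$, the Weibull form $S(t) = \exp\{-e^{\alpha}t^{\nu}\}$, and an unconstrained parameter vector with ν and ϕ on the log scale. Its negative could be passed to a general-purpose optimizer such as scipy.optimize.minimize for direct maximization; the check compares it with the promotion time log-likelihood for very large ϕ.

```python
import numpy as np


def pbs_loglik(params, t, delta, X):
    """Observed log-likelihood (4) of the PBS mixture cure rate model.

    Hypothetical choices: params = (beta..., alpha, log_nu, log_phi),
    log link theta_i = exp(x_i' beta), Weibull S(t) = exp(-exp(alpha) * t**nu).
    """
    p = X.shape[1]
    beta = params[:p]
    alpha, nu, phi = params[p], np.exp(params[p + 1]), np.exp(params[p + 2])
    theta = np.exp(X @ beta)
    sf = np.exp(-np.exp(alpha) * t**nu)
    pdf = np.exp(alpha) * nu * t ** (nu - 1) * sf
    v = np.sqrt(1 + 4 * theta * (1 - sf) / (phi + 1))
    log_expo = phi / 2 * (1 - v)
    log_s_pop = np.log(0.5 * (1 + 1 / v)) + log_expo
    log_f_pop = (np.log(theta * pdf / ((phi + 1) * v))
                 + np.log(1 / v**2 + phi / 2 * (1 + 1 / v)) + log_expo)
    # Events contribute log f_pop; right-censored observations contribute log S_pop.
    return float(np.sum(np.where(delta == 1, log_f_pop, log_s_pop)))


t = np.array([0.5, 1.2, 3.0, 7.5, 12.0])
delta = np.array([1, 0, 1, 1, 0])
X = np.column_stack([np.ones(5), [0, 1, 0, 1, 1]])
params = np.array([-0.5, 0.8, -3.52, np.log(1.40), np.log(1.36)])
print(pbs_loglik(params, t, delta, X))
```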
To obtain the maximum likelihood (ML) estimators, it is necessary to maximize (4) with respect to $\vartheta$, that is, to solve a $(p+3)$-dimensional maximization problem. In the following subsection, we discuss an EM-type algorithm that provides a more attractive and robust estimation procedure.
3.1 |. EM Algorithm
We now focus on estimating the model parameters when the problem involves incomplete data, latent variables, or missing data, using the EM algorithm of Dempster, Laird, and Rubin (1977). The EM algorithm is commonly employed to compute ML estimates in such settings, since it deals directly with the incompleteness of the data. At each iteration, the E-step computes the conditional expectation of the complete-data log-likelihood given the observed data and the current parameter estimates; then, in the M-step, this conditional expectation is maximized to update the ML estimates of the parameters.
The formulas for the E-step follow from the proposition and corollary below; their proofs are given in Appendix A.
Proposition 3.1.
For the PBS model, the conditional distribution of (i) ; (ii) and; (iii) are, respectively, given by
with , , , and
| (5) |
where $K_\lambda(\cdot)$ is the modified Bessel function of the second kind of order $\lambda$ (Abramowitz and Stegun 1972).
Corollary 3.1.
The expected values for , , , and given are, respectively,
More details of the results presented above are provided in the Appendix.
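For readers who wish to experiment with the E-step, the sketch below computes two conditional expectations directly from the model assumptions. It is our own illustrative derivation, not a transcription of Proposition 3.1 or Corollary 3.1. Given Z = z, $P(T > t \mid z) = \exp\{-\theta z F(t)\}$ and the density of T is $\theta z f(t)\exp\{-\theta z F(t)\}$, so each GIG(λ, a, b) component of (1) is updated to GIG(λ + δ, a + 2θF(t), b) with new mixture weights; in addition, $E(N \mid Z, t, \delta) = \delta + \theta Z S(t)$. The function names, the values θ = 2 and ϕ = 1.36, and the GIG parameterization (density proportional to $z^{\lambda-1}\exp\{-(az + b/z)/2\}$, $a = (\phi+1)/2$, $b = \phi^2/[2(\phi+1)]$) are assumptions; the results are checked against numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kve


def posterior_mean_z(F, delta, theta, phi):
    """E[Z | t, delta] for Z ~ BS(1, phi) and N | Z ~ Poisson(theta * Z); F = F(t)."""
    a, b = (phi + 1) / 2, phi**2 / (2 * (phi + 1))
    a_post = a + 2 * theta * F
    w0, w1 = np.sqrt(a * b), np.sqrt(a_post * b)
    log_w, means = [], []
    for lam in (0.5, -0.5):
        lam_post = lam + delta
        # log(prior normalizing constant * integral of the updated GIG kernel);
        # kve(v, x) = kv(v, x) * exp(x) avoids overflow and underflow.
        log_w.append(lam / 2 * np.log(a / b) - np.log(kve(lam, w0)) + w0
                     + np.log(kve(lam_post, w1)) - w1 + lam_post / 2 * np.log(b / a_post))
        means.append(np.sqrt(b / a_post) * kve(lam_post + 1, w1) / kve(lam_post, w1))
    w = np.exp(np.array(log_w) - max(log_w))
    return float(w @ np.array(means) / w.sum())


def posterior_mean_n(sf, delta, theta, phi):
    """E[N | t, delta] = delta + theta * S(t) * E[Z | t, delta]."""
    return delta + theta * sf * posterior_mean_z(1 - sf, delta, theta, phi)


def bs_pdf(z, phi):
    """BS(1, phi) density, used only for the numerical check below."""
    return (np.exp(phi / 2) * np.sqrt(phi + 1) / (4 * np.sqrt(np.pi) * z**1.5)
            * (z + phi / (phi + 1))
            * np.exp(-phi / 4 * ((phi + 1) * z / phi + phi / ((phi + 1) * z))))


theta, phi = 2.0, 1.36
results = []
for F, d in [(0.0, 0), (0.3, 1), (0.9, 0), (0.9, 1)]:
    kern = lambda z: bs_pdf(z, phi) * z**d * np.exp(-theta * z * F)
    numeric = quad(lambda z: z * kern(z), 0, np.inf)[0] / quad(kern, 0, np.inf)[0]
    closed = posterior_mean_z(F, d, theta, phi)
    results.append((closed, numeric))
    print(F, d, closed, numeric)
print("E[N | t, delta = 1] with S(t) = 0.4:", posterior_mean_n(0.4, 1, theta, phi))
```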
The complete log-likelihood for , with and and thus the complete data are denoted by is given by
| (6) |
where
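Let $\vartheta^{(k)}$ be the estimate of $\vartheta$ at the $k$th iteration, and denote by $Q(\vartheta \mid \vartheta^{(k)})$ the conditional expectation of the complete-data log-likelihood in (6) given the observed data and $\vartheta^{(k)}$. This conditional expectation can then be decomposed as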
with
| (7) |
| (8) |
| (9) |
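Note that all the required expected values can be computed using Corollary 3.1, and that each of the functions in (7), (8), and (9) depends on only one block of parameters: the regression coefficients $\boldsymbol{\beta}$, the precision parameter $\phi$, or the Weibull parameters $(\alpha, \nu)$.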
In short, the $(k+1)$th iteration of the EM algorithm is as follows.
E-step: Following Corollary 3.1, for $i = 1, \ldots, n$, update the four conditional expectations of the latent variables given the observed data and $\vartheta^{(k)}$.
M-step: Given the current values of these conditional expectations, obtain $\vartheta^{(k+1)}$ by maximizing (7), (8), and (9) with respect to their corresponding parameters.
The E-step and M-step are iterated until a predefined convergence criterion is met, namely, until the difference between consecutive estimates falls below a predetermined tolerance. The standard errors of the estimator $\hat{\vartheta}$ can then be derived from the Hessian matrix of the observed log-likelihood function in (4), which is given by
$$\mathbf{H}(\vartheta) = \frac{\partial^2 \ell(\vartheta)}{\partial \vartheta\, \partial \vartheta^{\top}}.$$
This matrix can be computed using the hessian function of the pracma package (Borchers 2023) in R (R Core Team 2023). Under appropriate regularity conditions, Kalbfleisch and Prentice (2002) showed that the asymptotic distribution of the estimator is
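$$\left[-\mathbf{H}(\hat{\vartheta})\right]^{1/2}\left(\hat{\vartheta} - \vartheta\right) \xrightarrow{\;d\;} N_{p+3}\left(\mathbf{0}_{p+3}, \mathbf{I}_{p+3}\right), \qquad (10)$$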
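where $\mathbf{0}_{p+3}$ represents a vector of zeros of dimension $p+3$ and $\mathbf{I}_{p+3}$ denotes the identity matrix of order $p+3$. In addition, if $\hat{\sigma}^2_{\phi}$ denotes the estimated variance of $\hat{\phi}$, then, by the delta method (Hajek, Sidak, and Sen 1999), for $\mathrm{Var}(Z) = (2\phi + 5)/(\phi + 1)^2$ we obtain
$$\widehat{\mathrm{Var}}(Z) = \frac{2\hat{\phi} + 5}{(\hat{\phi} + 1)^2} \ \overset{a}{\sim}\ N\left(\mathrm{Var}(Z),\ \left[\frac{2(\hat{\phi} + 4)}{(\hat{\phi} + 1)^{3}}\right]^{2}\hat{\sigma}^2_{\phi}\right). \qquad (11)$$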
The results in Equations (10) and (11) allow us to build confidence intervals for each parameter in $\vartheta$ and for $\mathrm{Var}(Z)$.
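A sketch of the delta-method step for $\mathrm{Var}(Z) = (2\phi + 5)/(\phi + 1)^2$, whose derivative with respect to ϕ is $-2(\phi + 4)/(\phi + 1)^3$. In practice the standard error of $\hat{\phi}$ would come from the Hessian as described above; the value 0.5 used below is purely hypothetical, as are the function names, and the interval is a plain Wald interval.

```python
import numpy as np
from scipy.stats import norm


def var_z(phi):
    """Var(Z) = (2 phi + 5) / (phi + 1)**2 for Z ~ BS(1, phi)."""
    return (2 * phi + 5) / (phi + 1) ** 2


def dvar_z(phi):
    """Derivative of Var(Z) with respect to phi: -2 (phi + 4) / (phi + 1)**3."""
    return -2 * (phi + 4) / (phi + 1) ** 3


def wald_ci_var_z(phi_hat, se_phi, level=0.95):
    """Delta-method standard error and Wald interval for Var(Z)."""
    est = var_z(phi_hat)
    se = abs(dvar_z(phi_hat)) * se_phi
    q = norm.ppf(0.5 + level / 2)
    return est, se, (est - q * se, est + q * se)


# Hypothetical inputs: an estimate of phi and its Hessian-based standard error.
est, se, ci = wald_ci_var_z(phi_hat=1.36, se_phi=0.5)
print(est, se, ci)
```

Because Var(Z) is bounded (it lies in (0, 5] for ϕ > 0), a Wald interval computed on a transformed scale may behave better in small samples; the sketch keeps the untransformed form of (11).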
4 |. Monte Carlo Simulation Studies
In this section, we present the results of two simulation studies. The first is related to assessing the performance of the ML estimator for the PBS model through the EM algorithm. The second study is devoted to evaluating the performance of the likelihood ratio (LR) test to decide between the PBS model and the traditional promotion time cure rate model.
4.1 |. Asymptotic Properties
In this study, we assess the effectiveness of parameter estimation via the EM algorithm by recovering the parameter values used to simulate datasets. The data were generated as follows. The time-to-event values were drawn from the Weibull distribution with fixed parameters $\alpha = -3.52$ and $\nu = 1.40$. As our investigation involves covariates in the cure fraction, we generated the number of competing causes from the PBS mixture model with fixed regression coefficients $\beta_0 = -1.67$, $\beta_1 = 1.21$, and $\beta_2 = 2.53$, and $\phi = 1.36$, which yields $\mathrm{Var}(Z) = 1.386$. To evaluate different sample sizes ($n$ = 200, 400, 600, 800, 1000, 1200, 1400, 1600, and 5000), we conducted a Monte Carlo study with 1000 replications for each size.
For each individual, a categorical covariate with three levels was considered, represented by indicator (dummy) variables for $i = 1, \ldots, n$. The levels of this covariate were sampled from a multinomial distribution with probabilities 0.15, 0.26, and 0.59, respectively. The censoring times for all sample sizes were drawn from a uniform distribution between 0 and 20, resulting in an average censoring percentage of approximately 20%. The selected parameter values approximate the estimates obtained when applying the proposed model in the next section.
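The data-generating scheme of this subsection can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the intercept-plus-two-dummies coding of the three-level covariate (first level as reference), the log link $\theta_i = \exp(\mathbf{x}_i^\top\boldsymbol{\beta})$, and the Weibull form $S(t) = \exp\{-e^{\alpha}t^{\nu}\}$ are all assumptions, so the censoring percentage it produces need not match the 20% reported above. Given N = m ≥ 1, the time to event $\min(W_1, \ldots, W_m)$ has survival function $\exp\{-m e^{\alpha}t^{\nu}\}$ and can be drawn by inversion; subjects with N = 0 are cured and therefore always censored.

```python
import numpy as np
from scipy.stats import geninvgauss


def sample_bs1(phi, size, rng):
    """Draw Z ~ BS(1, phi) via the equal-weight GIG(+-1/2, a, b) mixture."""
    a, b = (phi + 1) / 2, phi**2 / (2 * (phi + 1))
    shape, scale = np.sqrt(a * b), np.sqrt(b / a)
    k = rng.binomial(size, 0.5)
    z = np.concatenate([
        geninvgauss.rvs(0.5, shape, scale=scale, size=k, random_state=rng),
        geninvgauss.rvs(-0.5, shape, scale=scale, size=size - k, random_state=rng),
    ])
    rng.shuffle(z)
    return z


def simulate_pbs_data(n, beta, phi, alpha, nu, probs=(0.15, 0.26, 0.59),
                      c_max=20.0, seed=0):
    """One hypothetical replicate of the Section 4.1 design (codings are assumptions)."""
    rng = np.random.default_rng(seed)
    level = rng.choice(3, size=n, p=probs)
    X = np.column_stack([np.ones(n), level == 1, level == 2]).astype(float)
    theta = np.exp(X @ np.asarray(beta))  # assumed log link
    n_causes = rng.poisson(theta * sample_bs1(phi, n, rng))
    # Cured subjects (N = 0) never fail; the others are drawn by inversion.
    T = np.full(n, np.inf)
    sus = n_causes > 0
    T[sus] = (rng.exponential(size=sus.sum()) / (n_causes[sus] * np.exp(alpha))) ** (1 / nu)
    C = rng.uniform(0.0, c_max, size=n)
    return np.minimum(T, C), (T <= C).astype(int), X, n_causes


t, delta, X, n_causes = simulate_pbs_data(1000, [-1.67, 1.21, 2.53], 1.36, -3.52, 1.40)
print("censoring proportion:", 1 - delta.mean(), "cured proportion:", np.mean(n_causes == 0))
```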
For each parameter value and sample size, we report the empirical standard deviation (SD) of the estimates, the estimated bias and root mean squared error (RMSE) of the ML estimators, and the coverage probabilities (CP) of the asymptotic 95% confidence intervals based on the asymptotic distribution in Equation (10). The results are shown in Table 1. The standard errors used to build these confidence intervals were obtained from the Hessian matrix, computed with the hessian function of the pracma package (Borchers 2023) in R (R Core Team 2023) and evaluated at the EM estimates.
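The summary measures reported in Table 1 can be computed from the replicate estimates and their standard errors as in the sketch below (hypothetical function name; 95% Wald intervals). We use the population standard deviation (ddof = 0), an assumption under which RMSE² = Bias² + SD² holds exactly; the usage example relies on synthetic replicates, not on the study's results.

```python
import numpy as np
from scipy.stats import norm


def mc_summary(estimates, std_errors, true_value, level=0.95):
    """Bias, empirical SD, RMSE and Wald-interval coverage over MC replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    q = norm.ppf(0.5 + level / 2)
    covered = (est - q * se <= true_value) & (true_value <= est + q * se)
    return {
        "bias": est.mean() - true_value,
        "sd": est.std(),  # ddof=0, so that RMSE**2 = bias**2 + SD**2 exactly
        "rmse": np.sqrt(np.mean((est - true_value) ** 2)),
        "cp": covered.mean(),
    }


# Synthetic replicates: an unbiased estimator of beta_0 with correct standard errors.
rng = np.random.default_rng(1)
est = -1.67 + rng.normal(0.0, 0.3, size=1000)
summary = mc_summary(est, np.full(1000, 0.3), true_value=-1.67)
print(summary)
```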
TABLE 1 |.
Empirical bias, root mean squared error (RMSE), standard deviation (EM SD), and coverage probability (CP) of the ML estimators for the Poisson–Birnbaum–Saunders mixture model with Weibull times to event and regression on the number of competing causes.
| Sample size (n) | Measure | β0 | β1 | β2 | α | ν | ϕ | Var(Z) |
|---|---|---|---|---|---|---|---|---|
| | Real value | −1.67 | 1.21 | 2.53 | −3.52 | 1.40 | 1.36 | 1.386 |
| 200 | Bias | −0.199 | 0.325 | 0.444 | −0.283 | 0.070 | 9.844 | 0.253 |
| | RMSE | 1.611 | 1.676 | 1.730 | 0.805 | 0.240 | 34.427 | 1.170 |
| | EM SD | 1.598 | 1.644 | 1.672 | 0.754 | 0.229 | 32.989 | 1.707 |
| | CP | 0.993 | 0.971 | 0.971 | 0.933 | 0.954 | 0.818 | 0.719 |
| 400 | Bias | −0.024 | 0.086 | 0.150 | −0.147 | 0.042 | 4.472 | 0.133 |
| | RMSE | 0.466 | 0.526 | 0.623 | 0.577 | 0.168 | 18.528 | 0.948 |
| | EM SD | 0.466 | 0.519 | 0.605 | 0.558 | 0.163 | 17.980 | 1.157 |
| | CP | 0.977 | 0.965 | 0.952 | 0.934 | 0.947 | 0.857 | 0.721 |
| 600 | Bias | −0.023 | 0.051 | 0.079 | −0.074 | 0.025 | 3.880 | 0.040 |
| | RMSE | 0.369 | 0.392 | 0.465 | 0.452 | 0.127 | 16.134 | 0.795 |
| | EM SD | 0.368 | 0.389 | 0.459 | 0.446 | 0.124 | 15.661 | 0.839 |
| | CP | 0.969 | 0.954 | 0.950 | 0.935 | 0.956 | 0.883 | 0.728 |
| 800 | Bias | −0.020 | 0.031 | 0.046 | −0.033 | 0.010 | 1.905 | −0.014 |
| | RMSE | 0.312 | 0.336 | 0.407 | 0.391 | 0.110 | 6.678 | 0.706 |
| | EM SD | 0.311 | 0.335 | 0.405 | 0.389 | 0.110 | 6.400 | 0.676 |
| | CP | 0.970 | 0.955 | 0.959 | 0.948 | 0.956 | 0.916 | 0.740 |
| 1000 | Bias | −0.010 | 0.021 | 0.031 | −0.034 | 0.011 | 1.255 | 0.012 |
| | RMSE | 0.270 | 0.295 | 0.347 | 0.340 | 0.097 | 5.465 | 0.631 |
| | EM SD | 0.270 | 0.294 | 0.346 | 0.338 | 0.096 | 5.319 | 0.602 |
| | CP | 0.958 | 0.959 | 0.956 | 0.955 | 0.959 | 0.922 | 0.761 |
| 1200 | Bias | −0.017 | 0.030 | 0.039 | −0.033 | 0.012 | 1.033 | 0.005 |
| | RMSE | 0.255 | 0.282 | 0.330 | 0.320 | 0.091 | 4.950 | 0.602 |
| | EM SD | 0.255 | 0.281 | 0.328 | 0.319 | 0.090 | 4.841 | 0.563 |
| | CP | 0.961 | 0.955 | 0.953 | 0.959 | 0.963 | 0.927 | 0.776 |
| 1400 | Bias | −0.007 | 0.011 | 0.022 | −0.022 | 0.009 | 0.581 | 0.000 |
| | RMSE | 0.232 | 0.254 | 0.303 | 0.296 | 0.085 | 2.284 | 0.570 |
| | EM SD | 0.232 | 0.254 | 0.302 | 0.295 | 0.084 | 2.209 | 0.523 |
| | CP | 0.959 | 0.950 | 0.964 | 0.958 | 0.956 | 0.916 | 0.775 |
| 1600 | Bias | −0.025 | 0.016 | 0.024 | −0.010 | 0.006 | 0.669 | −0.016 |
| | RMSE | 0.211 | 0.229 | 0.286 | 0.283 | 0.080 | 2.750 | 0.543 |
| | EM SD | 0.210 | 0.229 | 0.285 | 0.283 | 0.079 | 2.667 | 0.483 |
| | CP | 0.955 | 0.958 | 0.968 | 0.945 | 0.957 | 0.931 | 0.776 |
| 5000 | Bias | −0.001 | 0.006 | 0.004 | −0.007 | 0.003 | 0.091 | −0.002 |
| | RMSE | 0.121 | 0.132 | 0.155 | 0.152 | 0.043 | 0.503 | 0.295 |
| | EM SD | 0.121 | 0.132 | 0.155 | 0.152 | 0.043 | 0.495 | 0.248 |
| | CP | 0.948 | 0.951 | 0.967 | 0.971 | 0.967 | 0.952 | 0.845 |
The performance evaluation of the proposed model is based on the results of a Monte Carlo (MC) study. Table 1 summarizes the parameter estimates from 1000 replicates. As the sample size increases, the biases and RMSEs decrease in most cases, in line with the consistency of the ML estimators. The estimator of the variance of $Z$, which is a function of $\phi$ only, is much less biased than the estimator of $\phi$ itself (for example, biases of 0.253 and 9.844, respectively, at $n = 200$), so it provides a more reliable summary of the dispersion of the data across all sample sizes studied. For all regression coefficients, the bias approaches zero as the sample size increases.
For most parameters, the SD and RMSE are close to each other, indicating that the bias is small relative to the sampling variability of the estimators. The biases of the time-to-event distribution parameters $\alpha$ and $\nu$ also become smaller as the sample size increases. Furthermore, these estimated values are larger than those of the regression structure, because the values chosen for the simulation study were larger, in absolute terms, than those of the vector of regression parameters. Finally, the CPs for the regression coefficients and the Weibull parameters are close to the nominal level (95%) in all scenarios; for $\phi$, they approach the nominal level as $n$ increases (from 0.818 at $n = 200$ to 0.952 at $n = 5000$), whereas those for $\mathrm{Var}(Z)$ remain below it (between 0.719 and 0.845). Additional MC simulation results for a different set of parameters can be found in Tables B1 and B2 of Appendix B.
In addition to the simulation study in Table 1, and considering the relationship between the PBS mixture model and the promotion time model as $\phi \to \infty$, Table 2 reports a more challenging scenario for parameter estimation. In this study, 1000 Monte Carlo replicates were generated for fixed sample sizes, varying the true value of $\phi$ in {0.5, 1, 5, 10} while keeping the other model parameters ($\beta_0$, $\beta_1$, $\beta_2$, $\alpha$, and $\nu$) fixed. As expected, the EM estimates of the precision parameter $\phi$ become more biased as $\phi$ increases. The SD and RMSE are again close to each other, indicating that the bias is small relative to the sampling variability. All simulated scenarios yielded CPs close to the nominal level (95%), except for $\phi$ itself when $\phi = 10$ (0.864). The estimates of the other parameters were essentially unaffected as $\phi$ increased.
TABLE 2 |.
Empirical bias, root mean squared error (RMSE), standard deviation (EM SD), and coverage probability (CP) of the ML estimators for the Poisson–Birnbaum–Saunders mixture model with Weibull times to event and regression on the number of competing causes, for different values of the ϕ parameter.
| True value | Measure | ϕ | β0 | β1 | β2 | α | ν |
|---|---|---|---|---|---|---|---|
| ϕ = 0.5 | Bias | 0.019 | 0.003 | −0.008 | −0.009 | 0.002 | −0.001 |
| | RMSE | 0.120 | 0.127 | 0.142 | 0.151 | 0.132 | 0.040 |
| | EM SD | 0.119 | 0.127 | 0.142 | 0.151 | 0.132 | 0.040 |
| | CP | 0.978 | 0.962 | 0.959 | 0.979 | 0.978 | 0.977 |
| ϕ = 1 | Bias | 0.069 | −0.007 | 0.002 | −0.005 | 0.007 | 0.001 |
| | RMSE | 0.320 | 0.121 | 0.134 | 0.159 | 0.151 | 0.044 |
| | EM SD | 0.313 | 0.121 | 0.135 | 0.159 | 0.151 | 0.044 |
| | CP | 0.971 | 0.957 | 0.955 | 0.964 | 0.966 | 0.972 |
| ϕ = 5 | Bias | 1.736 | 0.003 | 0.001 | 0.002 | −0.005 | 0.001 |
| | RMSE | 6.298 | 0.107 | 0.116 | 0.140 | 0.140 | 0.038 |
| | EM SD | 6.054 | 0.107 | 0.116 | 0.140 | 0.139 | 0.038 |
| | CP | 0.931 | 0.958 | 0.952 | 0.951 | 0.945 | 0.944 |
| ϕ = 10 | Bias | 4.384 | 0.001 | 0.006 | 0.016 | −0.021 | 0.005 |
| | RMSE | 14.331 | 0.101 | 0.109 | 0.126 | 0.127 | 0.035 |
| | EM SD | 13.644 | 0.101 | 0.109 | 0.125 | 0.125 | 0.034 |
| | CP | 0.864 | 0.958 | 0.952 | 0.970 | 0.941 | 0.954 |
4.2 |. Hypothesis Testing
In this subsection, a Monte Carlo study was conducted to evaluate the performance of the LR test for comparing the proposed PBS mixture model with the Poisson (promotion time) model as the parameter $\phi$ increases. The main purpose of this simulation study is to assess the performance of the test for different values of $\phi$ within the parameter space. Drawing on the findings of Barreto-Souza (2015), where the promotion time model is presented as a limiting case, we investigate the model's efficacy through the LR test. Our exploration centers on testing the hypotheses
$$H_0: \phi^{-1} = 0 \ (\text{promotion time model}) \quad \text{versus} \quad H_1: \phi^{-1} > 0 \ (\text{PBS mixture model}).$$
The LR test allows us to detect substantial deviations from the null hypothesis in the direction of the PBS mixture model, and this study aims to gauge its power to do so. Notably, the null hypothesis above lies on the boundary of the parameter space for $\phi$, so the usual LR statistic $\Lambda = 2\left[\ell(\hat{\vartheta}_1) - \ell(\hat{\vartheta}_0)\right]$, where $\ell(\hat{\vartheta}_j)$ is the log-likelihood function evaluated at the ML estimator under hypothesis $H_j$, $j = 0, 1$, does not follow the standard chi-squared distribution with 1 degree of freedom ($\chi^2_1$). Instead, in this scenario, its asymptotic distribution is the 50:50 mixture $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$, where $\chi^2_0$ denotes a point mass at zero, as demonstrated by Stram and Lee (1994).
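Because of this boundary problem, critical values and p-values must come from the ½χ²₀ + ½χ²₁ mixture rather than from χ²₁. A minimal Python sketch (hypothetical function names; SciPy's chi2 distribution): for a positive statistic the p-value is half the χ²₁ upper tail probability, and the critical value at level a is the upper 2a quantile of χ²₁, so at the 5% level it is about 2.71 instead of 3.84.

```python
from scipy.stats import chi2


def boundary_lr_pvalue(lr_stat):
    """p-value of the LR statistic under the 0.5*chi2_0 + 0.5*chi2_1 null mixture."""
    if lr_stat <= 0:
        return 1.0
    return 0.5 * chi2.sf(lr_stat, df=1)


def boundary_critical_value(level):
    """Critical value c with 0.5 * P(chi2_1 > c) = level (valid for level < 0.5)."""
    return chi2.isf(2 * level, df=1)


for level in (0.01, 0.05, 0.10):
    c = boundary_critical_value(level)
    print(level, round(c, 3), "naive chi2_1 value:", round(chi2.isf(level, df=1), 3))
```

Using the naive χ²₁ critical values would make the test conservative, which would understate the power reported below.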
The simulation scenario was configured as follows. The data were generated under the alternative hypothesis, that is, with the number of competing causes following the PBS mixture model. The time-to-event values were sampled from the Weibull distribution with fixed parameters $\alpha$ and $\nu$, and the number of competing causes was generated from the PBS mixture model with fixed regression coefficients, varying $\phi$ in {0.6, 1, 10, 25, 100} to analyze the deviations from the null hypothesis mentioned above. The sample sizes were $n \in$ {200, 400, 600, 1000}, and the significance levels were set at 1%, 5%, and 10%. The proportions of rejection of the null hypothesis are displayed in Table 3.
TABLE 3 |.
Power of the LR test (proportion of rejections of H0) for different values of ϕ, significance levels, and sample sizes (n).
| ϕ | 1%, n = 200 | 1%, n = 400 | 1%, n = 600 | 1%, n = 1000 | 5%, n = 200 | 5%, n = 400 | 5%, n = 600 | 5%, n = 1000 | 10%, n = 200 | 10%, n = 400 | 10%, n = 600 | 10%, n = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6 | 0.246 | 0.470 | 0.689 | 0.889 | 0.516 | 0.727 | 0.887 | 0.971 | 0.643 | 0.847 | 0.939 | 0.988 |
| 1 | 0.171 | 0.407 | 0.605 | 0.809 | 0.430 | 0.680 | 0.809 | 0.934 | 0.581 | 0.801 | 0.894 | 0.976 |
| 10 | 0.050 | 0.073 | 0.114 | 0.199 | 0.158 | 0.247 | 0.306 | 0.435 | 0.267 | 0.370 | 0.424 | 0.565 |
| 25 | 0.021 | 0.024 | 0.049 | 0.068 | 0.105 | 0.112 | 0.161 | 0.220 | 0.179 | 0.201 | 0.261 | 0.325 |
| 100 | 0.018 | 0.016 | 0.015 | 0.012 | 0.075 | 0.069 | 0.067 | 0.075 | 0.126 | 0.124 | 0.136 | 0.138 |
From Table 3, the power of the LR test increases with the sample size for $\phi \le 25$, as anticipated. For small values of $\phi$, the test rejects the null hypothesis more often, indicating that it is effective in detecting deviations from the promotion time model in the direction of the PBS mixture model. As $\phi$ increases, the rejection rate decreases; for $\phi = 100$, it stays close to, though slightly above, the nominal significance levels for all sample sizes, indicating that, for large values of $\phi$, the Poisson model is adequate for data with this feature, as expected.
All the scenarios in this simulation study required considerable computational effort to compute the rejection proportions; the larger the value of $\phi$ and the sample size, the greater the computational effort required.
5 |. Application With Breast Cancer Data
In this section, we analyze real data on breast cancer in the State of São Paulo, Brazil. Female breast cancer is the most commonly diagnosed cancer in the world, with 2.3 million new cases (11.7%), followed by lung cancer with 2.2 million cases (11.4%). Additionally, colon and rectum cancer accounted for 1.9 million cases (10.0%), prostate cancer for 1.4 million cases (7.3%), and nonmelanoma skin cancer for 1.2 million cases (6.2%) (INCA 2022). As in the rest of the world, cancer plays a significant role in public health in Brazil, causing a substantial number of deaths and placing pressure on the costs of the public healthcare system. Among the most prevalent types of cancer in the country, breast cancer stands out as a major contributor to this increasing mortality trend. Globally, it is estimated that 70% of breast cancer deaths occur in women from low- and middle-income countries (Goss, Lee, and Badovinac-Crnjevic 2013). According to data from the National Cancer Institute, the estimated crude and adjusted incidence rates in the State of São Paulo for 2023 were 97.72 and 58.90 cases per 100,000 inhabitants, respectively.
Swaminathan, Saravanamurali, and Yadav (2023) conducted a comprehensive assessment of treatments aimed at improving survival rates for breast cancer. Therapeutic decisions are typically based on the recognition of the unique characteristics of tumors. In cases of nonmetastatic breast cancer, the standard approach involves surgical excision and the removal of axillary lymph nodes, followed by postoperative radiotherapy as a local therapy. In essence, the primary procedures for treating nonmetastatic breast cancer include removing the tumor and regional lymph nodes from the breast and preventing metastatic recurrence. Surgery and radiotherapy are commonly employed as regional approaches for early-stage tumors. Chemotherapy is considered a gold-standard treatment strategy, utilizing combinations of cytotoxic drugs to either destroy or reduce the growth of breast cancer cells. The selection or combination of these medical approaches depends on the overall condition of the patients.
In this paper, we use a dataset from the Oncology Foundation of São Paulo (FOSP), São Paulo, Brazil, a public institution affiliated with the State Health Secretariat that coordinates and provides guidance on healthcare through the state's Hospital Cancer Registry. Moreover, it plays a crucial role in enabling oncology hospitals to develop protocols and enhance care practices (De Andrade et al. 2012). The dataset comprises observations from a retrospective survey of patients diagnosed with breast cancer in the State of São Paulo, Brazil, during 2009–2016, with follow-up until 2021. The event of interest was defined as death due to breast cancer, and the time to event was calculated as the period between the date of diagnosis and the date of death attributed to cancer. Patients who did not die of cancer during the follow-up period were treated as right-censored observations. The dataset contains a total of 59,300 patients, and the explanatory variables considered in our analysis are the clinical cancer stage (Stage I, Stage II, Stage III, or Stage IV); the patient's treatment, given by indicators of surgery (yes/no), radiotherapy (yes/no), and chemotherapy (yes/no); and age at diagnosis in years (mean ± SD, 56.3 ± 13.62). The maximum observed follow-up time was 13.85 years. The median and mean follow-up times were approximately 5.27 and 5.35 years, respectively. The percentage of censored observations was 77.67%.
Figure 1 shows the estimated Kaplan–Meier (KM) curves by clinical stage and type of treatment. Survival was higher in the early clinical stages (I and II) and poorest in the advanced Stage IV. Survival was also higher among patients who underwent surgery, those who received radiotherapy, and those who did not receive chemotherapy. For patients receiving chemotherapy, survival was better within the first 2 years, whereas long-term survival was better for those who did not receive chemotherapy. This result is expected, since a large proportion of patients in clinical stages III and IV, which typically carry a poorer prognosis, received chemotherapy (80.8%).
FIGURE 1 |.

Estimated survival function (SF) obtained from the Kaplan–Meier estimator for overall patients diagnosed with breast cancer, by clinical stage, surgery, radiotherapy, chemotherapy, and combinations of treatments.
We carried out the hypothesis test proposed by Maller and Zhou (1992), available in the npcure package (López-de-Ullibarri et al. 2020) of R (R Core Team 2023), to verify whether “immune” individuals are present in the study; the estimated proportion of immunes is also used to test whether the study has sufficient follow-up time. For this application, the test yielded a p-value < 0.0001, providing evidence at the 5% significance level that immune individuals are present and that the follow-up time is sufficient. Similar long-term survival patterns in the KM curves of breast cancer data were discussed by Rodrigues et al. (2016), Makdissi et al. (2019), and Pal (2021).
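The KM curves of Figure 1 and the plateau they exhibit can be illustrated with a short sketch. The Python code below (standard library only; it is not the npcure implementation) computes the product-limit estimator and, following the idea in Maller and Zhou (1992), uses the KM value at the largest observed time as a nonparametric estimate of the proportion of immunes. It does not implement the sufficient-follow-up test itself, and the function names are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (t, S(t)) at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event (e.g., death from breast cancer) was observed, 0 if censored
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    leaving = Counter(times)  # events plus censorings at each distinct time
    n_at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Censorings tied with deaths are treated as occurring just after them.
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


def immune_proportion(times, events):
    """KM survival at the largest observed time (the plateau of the KM curve)."""
    curve = kaplan_meier(times, events)
    return curve[-1][1] if curve else 1.0


# Toy data: three deaths followed by a run of censored times (a plateau).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 0, 1, 0, 1, 0, 0, 0]
print(immune_proportion(times, events))  # equals 7/8 * 5/6 * 3/4 up to rounding
```

A long, flat right tail containing many censored observations is what makes this plateau estimate, and the cure rate models fitted below, meaningful.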
In this section, we fit the proposed model, as well as several cure rate models from the literature, to the breast cancer data described in the previous section. The event of interest is death due to breast cancer. Our objective is to assess the effect of age at diagnosis, clinical stage, surgery, radiotherapy, and chemotherapy on survival.
We obtained the ML estimates with the EM algorithm detailed in Section 3.1, which we implemented in R (R Core Team 2023); the code is available to the community upon request. For the competing models, namely the negative binomial, Poisson, and Bernoulli (standard mixture) models, we computed the ML estimates with the EM.PScr function of the R package PScr (Gallardo and Azimi 2023).
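To illustrate the EM idea in its simplest setting, the sketch below fits the Bernoulli (standard mixture) cure model without covariates and with an exponential, rather than Weibull, time-to-event for susceptible individuals, so that both M-step updates are in closed form. It is only an illustration under these simplifying assumptions, not the EM algorithm of Section 3.1 for the PBS model nor the EM.PScr routine; all names are hypothetical.

```python
import math
import random


def em_mixture_cure_exponential(times, events, pi=0.5, lam=1.0, tol=1e-7, max_iter=2000):
    """EM estimates of (pi, lam) for S_pop(t) = pi + (1 - pi) * exp(-lam * t).

    pi  : cure fraction (probability of being a long-term survivor)
    lam : exponential event rate of susceptible (uncured) individuals
    """
    n = len(times)
    n_events = sum(events)
    for _ in range(max_iter):
        # E-step: posterior probability that each subject is susceptible.
        w = []
        for t, d in zip(times, events):
            if d == 1:
                w.append(1.0)  # an observed death comes from a susceptible subject
            else:
                s_u = (1.0 - pi) * math.exp(-lam * t)
                w.append(s_u / (pi + s_u))
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        new_pi = 1.0 - sum(w) / n
        new_lam = n_events / sum(wi * t for wi, t in zip(w, times))
        done = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if done:
            break
    return pi, lam


def simulate(n, pi, lam, max_follow_up, seed=1):
    """Right-censored data with cured fraction pi and uniform censoring times."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        t_event = math.inf if rng.random() < pi else rng.expovariate(lam)
        c = rng.uniform(0.0, max_follow_up)
        times.append(min(t_event, c))
        events.append(1 if t_event <= c else 0)
    return times, events


times, events = simulate(5000, pi=0.4, lam=1.0, max_follow_up=8.0)
pi_hat, lam_hat = em_mixture_cure_exponential(times, events)
print(round(pi_hat, 3), round(lam_hat, 3))  # should be close to 0.4 and 1.0
```

The latent cure indicator is what makes EM attractive here: given it, the likelihood factorizes into a Bernoulli part and a survival part, each maximized separately.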
Note that, in our proposal, the time-to-event distribution for individuals at risk is the Weibull distribution, with survival and density functions parameterized by (α, ν) as detailed in Table 2 of Gallardo, Romeo, and Meyer (2017b). This parameterization differs from that of the Weibull distribution in the PScr package, used to compute the EM estimates for the competing models, which is parameterized by (ν, σ), where ν and σ are the conventional shape and scale parameters, respectively.
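Because Table 2 of Gallardo, Romeo, and Meyer (2017b) is not reproduced here, the sketch below assumes, purely for illustration, that the proposed model writes the Weibull survival function as S(t) = exp(−e^α t^ν) and that PScr uses the conventional form S(t) = exp(−(t/σ)^ν). Under these assumptions the shape parameters coincide and the scales are linked by σ = exp(−α/ν). The function names are hypothetical.

```python
import math


def weibull_surv_alpha_nu(t, alpha, nu):
    """Assumed (alpha, nu) form: S(t) = exp(-exp(alpha) * t**nu)."""
    return math.exp(-math.exp(alpha) * t ** nu)


def weibull_surv_nu_sigma(t, nu, sigma):
    """Conventional shape-scale form: S(t) = exp(-(t / sigma)**nu)."""
    return math.exp(-((t / sigma) ** nu))


def alpha_to_sigma(alpha, nu):
    # exp(alpha) * t**nu == (t / sigma)**nu for all t  <=>  sigma**(-nu) == exp(alpha)
    return math.exp(-alpha / nu)


def sigma_to_alpha(sigma, nu):
    return -nu * math.log(sigma)


# PBS estimates from Table 5 (alpha = -3.526, nu = 1.396) mapped to a scale parameter.
sigma = alpha_to_sigma(-3.526, 1.396)
print(round(sigma, 2))
print(weibull_surv_alpha_nu(5.0, -3.526, 1.396), weibull_surv_nu_sigma(5.0, 1.396, sigma))
```

Under the assumed form, the PBS estimates give σ ≈ 12.5. This value should not be compared directly with the PScr scale estimates in Table 5, since the shape estimates also differ across models.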
We compared the fit of the proposed PBS mixture model with that of the negative binomial, Poisson, and Bernoulli (standard mixture) models on this dataset, assuming a Weibull distribution for the time-to-event in all models.
Table 4 reports the Akaike information criterion (AIC; Akaike 1973), the Bayesian information criterion (BIC; Schwarz 1978), and the Bayes factor (BF). We use the BF to evaluate the magnitude of the difference between two BIC values; see Kass and Raftery (1995). AIC and BIC are computed for all models, whereas the BF is computed for the comparisons PBS versus negative binomial (NB), PBS versus Poisson, and PBS versus Bernoulli. The best fit is decided according to the interpretation of the BF given in Table 6 of Leiva, Tejo, et al. (2015). Table 4 indicates that the PBS mixture model provides the best overall fit in terms of AIC, BIC, and BF.
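The criteria in Table 4 can be recomputed from the maximized log-likelihoods, the sample size n = 59,300, and the number of parameters implied by Table 5 (11 for the PBS and NB models, 10 for the Poisson and Bernoulli models). In our check, the BF column coincides, up to rounding, with the BIC difference relative to the PBS model, which under the Schwarz approximation corresponds approximately to 2 log BF (Kass and Raftery 1995). The sketch below is our own illustration, not the authors' code.

```python
import math

N_PATIENTS = 59_300  # sample size of the FOSP breast cancer dataset

# (model, maximized log-likelihood from Table 4, number of parameters implied by Table 5)
FITS = [
    ("PBS mixture", -46_322.32, 11),        # beta_0..beta_7, alpha, nu, phi
    ("Negative binomial", -46_341.94, 11),  # beta_0..beta_7, nu, sigma, q
    ("Poisson", -46_562.33, 10),            # beta_0..beta_7, nu, sigma
    ("Bernoulli", -47_696.87, 10),          # beta_0..beta_7, nu, sigma
]


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)


def criteria_table(fits, n, reference=0):
    """AIC, BIC, and BIC difference with respect to the reference model (here PBS)."""
    _, ref_ll, ref_k = fits[reference]
    ref_bic = bic(ref_ll, ref_k, n)
    rows = []
    for name, ll, k in fits:
        b = bic(ll, k, n)
        rows.append((name, aic(ll, k), b, b - ref_bic))
    return rows


for name, a, b, delta in criteria_table(FITS, N_PATIENTS):
    print(f"{name:20s} AIC={a:10.2f} BIC={b:10.2f} dBIC={delta:8.2f}")
```

On the Kass–Raftery scale, BIC differences above 10 are usually read as very strong evidence, which is the case for all three comparisons.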
TABLE 4 |.
AIC, BIC, and BF values obtained by fitting the mixture Poisson–Birnbaum–Saunders, negative binomial, Poisson, and Bernoulli (standard mixture) models to the breast cancer dataset.
| Model | AIC | BIC | Maximized log-likelihood | BF (PBS vs. model) |
|---|---|---|---|---|
| Poisson–Birnbaum–Saunders mixture | 92,666.63 | 92,765.53 | −46,322.32 | — |
| Negative binomial | 92,705.88 | 92,804.77 | −46,341.94 | 39.24 |
| Poisson | 93,144.66 | 93,234.56 | −46,562.33 | 469.04 |
| Bernoulli (standard mixture) | 95,413.74 | 95,503.65 | −47,696.87 | 2738.12 |
The ML estimates of the model parameters, accompanied by their corresponding standard errors and p-values for each model, can be found in Table 5. Given that both AIC and BIC criteria have indicated our proposal as the most suitable among the four fitted models, the interpretation of the results will be based on the estimated parameters of this specific model.
TABLE 5 |.
ML estimates, standard errors (SE), and p-values obtained by fitting the Poisson–Birnbaum–Saunders (PBS) mixture, negative binomial (NB), Poisson, and Bernoulli (standard mixture) cure rate models to the breast cancer dataset.
| Parameter | PBS ML | PBS SE | NB ML | NB SE | Poisson ML | Poisson SE | Bernoulli ML | Bernoulli SE |
|---|---|---|---|---|---|---|---|---|
| β₀: Intercept | −1.673 | 0.092 | −1.477 | 0.154 | −1.905 | 0.081 | −2.649 | 0.092 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₁: Stage II | 1.219 | 0.050 | 1.194 | 0.048 | 1.120 | 0.046 | 1.169 | 0.050 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₂: Stage III | 2.533 | 0.055 | 2.472 | 0.051 | 2.219 | 0.045 | 2.562 | 0.051 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₃: Stage IV | 4.088 | 0.075 | 3.986 | 0.064 | 3.361 | 0.046 | 7.953 | 0.830 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₄: Surgery | −0.755 | 0.030 | −0.748 | 0.028 | −0.561 | 0.020 | −0.732 | 0.038 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₅: Radiotherapy | −0.327 | 0.024 | −0.326 | 0.023 | −0.252 | 0.019 | −0.319 | 0.030 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| β₆: Chemotherapy | 0.052 | 0.030 | 0.022 | 0.029 | 0.091 | 0.023 | 0.357 | 0.039 |
| p-value | 0.080 | | 0.459 | | <0.001 | | <0.001 | |
| β₇: Age | 0.008 | 0.001 | 0.008 | 0.001 | 0.006 | 0.001 | 0.009 | 0.001 |
| p-value | <0.001 | | <0.001 | | <0.001 | | <0.001 | |
| α | −3.526 | 0.070 | — | — | — | — | — | — |
| ν (PBS); ν (PScr) | 1.396 | 0.019 | 1.331 | 0.016 | 1.182 | 0.012 | 1.167 | 0.009 |
| σ (PScr) | — | — | 17.233 | 1.497 | 12.500 | 0.795 | 5.006 | 0.057 |
| ϕ | 1.359 | 0.141 | — | — | — | — | — | — |
| q | — | — | 1.120 | 0.080 | — | — | — | — |
Based on the results in Table 5, all covariates included in the analysis are significantly associated with the time-to-event at the 5% significance level, except chemotherapy (p = 0.080), which is significant only at levels of 8% or higher. Positive estimated coefficients were obtained for clinical stage and age at diagnosis, indicating that higher clinical stages and older age at diagnosis are associated with worse survival. Conversely, negative estimated coefficients were obtained for surgery and radiotherapy, indicating that patients who undergo surgery and radiotherapy have better survival than those who do not. The estimated chemotherapy coefficient varies considerably across models, particularly for the Bernoulli (standard mixture) model. This discrepancy may stem from the distribution of the observations for this variable, since chemotherapy use is strongly associated with the advanced clinical stages III and IV. Moreover, the Bernoulli model attains a much lower maximized log-likelihood than the other models (Table 4), which may also explain why its estimates differ from those of the other fits.
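The p-values in Table 5 are presumably two-sided Wald z-tests of each coefficient being zero (the test is not named explicitly in this section). The sketch below recomputes them from the reported ML estimates and SEs; because these are rounded to three decimals, the results differ slightly from the table, for example about 0.083 instead of 0.080 for chemotherapy in the PBS model.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, with z = estimate / SE."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


# Chemotherapy coefficient (Table 5): PBS mixture and negative binomial fits.
print(round(wald_p_value(0.052, 0.030), 3))
print(round(wald_p_value(0.022, 0.029), 3))
```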
All of the findings in this study are consistent with observations made in routine clinical practice. Clinical stage and age at diagnosis have previously been reported as prognostic factors, indicating that younger patients in early clinical stages who undergo surgery and radiotherapy tend to have a better prognosis (Makdissi et al. 2019).
Table 6 shows the estimated proportions of long-term survivors (cure fractions), obtained from Equation (3), for patients aged 20, 56 (approximately the mean age), and 70 years at diagnosis, under various treatments and across the four clinical stages. The estimated cure fractions decrease with age, indicating that younger patients have better survival, especially when diagnosed early. As expected, patients in clinical Stage IV had a poorer prognosis regardless of age at diagnosis and treatment received. Physicians sometimes combine treatments, and some combinations yield a higher estimated probability of cure than a single treatment; for example, for a 56-year-old patient in Stage II, the estimated cure fraction is 0.761 with surgery and radiotherapy versus 0.699 with surgery alone.
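Equation (3) is not reproduced in this section, so the sketch below rests on an assumption: that the number of competing causes satisfies N | Z ~ Poisson(θZ), with θ = exp(x′β) and Z following a Birnbaum–Saunders distribution with unit mean and parameter ϕ, so that the cure fraction is P(N = 0) = E[exp(−θZ)]. Writing the BS law as an equal mixture of an inverse Gaussian law and its length-biased version gives this expectation in closed form. Covariates are coded as in Table 5 (Stage I as reference, treatments 0/1, age in years). In our checks with the rounded Table 5 estimates, this reconstruction reproduces the Table 6 entries we recomputed to within about 0.005, but it is not the authors' code; the names are hypothetical.

```python
import math

# Rounded ML estimates of the PBS mixture model (Table 5).
BETA = {
    "intercept": -1.673, "stage2": 1.219, "stage3": 2.533, "stage4": 4.088,
    "surgery": -0.755, "radiotherapy": -0.327, "chemotherapy": 0.052, "age": 0.008,
}
PHI = 1.359


def pbs_cure_fraction(theta, phi):
    """p0 = E[exp(-theta * Z)] for Z ~ BS with E[Z] = 1 and parameter phi (assumed form).

    The BS law is an equal mixture of an inverse Gaussian (IG) law and its
    length-biased version, so its Laplace transform is available in closed form.
    """
    root = math.sqrt(1.0 + 4.0 * theta / (phi + 1.0))
    laplace_ig = math.exp(0.5 * phi * (1.0 - root))  # Laplace transform of the IG part
    return 0.5 * laplace_ig * (1.0 + 1.0 / root)


def cure_fraction(stage, surgery, radiotherapy, chemotherapy, age, beta=BETA, phi=PHI):
    """Cure fraction for a patient; stage in {1, 2, 3, 4}, treatments coded 0/1."""
    eta = beta["intercept"] + beta["age"] * age
    if stage > 1:
        eta += beta[f"stage{stage}"]  # Stage I is the reference category
    eta += (beta["surgery"] * surgery + beta["radiotherapy"] * radiotherapy
            + beta["chemotherapy"] * chemotherapy)
    theta = math.exp(eta)  # expected number of competing causes (since E[Z] = 1)
    return pbs_cure_fraction(theta, phi)


for stage in (1, 2, 3, 4):
    # 20-year-old patient without treatment; compare with the first rows of Table 6.
    print(stage, round(cure_fraction(stage, 0, 0, 0, 20), 3))
```

As ϕ grows, Z concentrates at 1 and `pbs_cure_fraction` tends to exp(−θ), the promotion time cure fraction, in line with the limiting case mentioned in the abstract.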
TABLE 6 |.
ML estimates of cure rate and 95% confidence interval (CI) obtained by the delta method for the Poisson–Birnbaum–Saunders mixture cure rate model applied to the breast cancer dataset through the stage of disease and treatments.
| Treatment | | Stage I | Stage II | Stage III | Stage IV |
|---|---|---|---|---|---|
| Age: 20 years old | | | | | |
| No treatment | Estimate | 0.826 | 0.593 | 0.278 | 0.046 |
| | 95% CI | (0.802; 0.849) | (0.555; 0.631) | (0.243; 0.314) | (0.036; 0.055) |
| Surgery | Estimate | 0.908 | 0.751 | 0.457 | 0.129 |
| | 95% CI | (0.896; 0.921) | (0.724; 0.778) | (0.418; 0.496) | (0.107; 0.152) |
| Radiotherapy | Estimate | 0.867 | 0.666 | 0.353 | 0.075 |
| | 95% CI | (0.848; 0.886) | (0.632; 0.701) | (0.314; 0.392) | (0.061; 0.089) |
| Chemotherapy | Estimate | 0.818 | 0.581 | 0.267 | 0.042 |
| | 95% CI | (0.795; 0.842) | (0.546; 0.616) | (0.236; 0.298) | (0.034; 0.050) |
| Surgery and radiotherapy | Estimate | 0.932 | 0.806 | 0.537 | 0.183 |
| | 95% CI | (0.922; 0.942) | (0.784; 0.828) | (0.450; 0.575) | (0.155; 0.212) |
| Surgery and chemotherapy | Estimate | 0.904 | 0.742 | 0.444 | 0.122 |
| | 95% CI | (0.892; 0.917) | (0.717; 0.766) | (0.410; 0.478) | (0.103; 0.141) |
| Radiotherapy and chemotherapy | Estimate | 0.861 | 0.655 | 0.341 | 0.070 |
| | 95% CI | (0.842; 0.880) | (0.623; 0.687) | (0.306; 0.375) | (0.057; 0.082) |
| Surgery, radiotherapy and chemotherapy | Estimate | 0.929 | 0.798 | 0.525 | 0.174 |
| | 95% CI | (0.919; 0.938) | (0.778; 0.817) | (0.492; 0.558) | (0.150; 0.199) |
| Age: 56 years old | | | | | |
| No treatment | Estimate | 0.784 | 0.528 | 0.221 | 0.028 |
| | 95% CI | (0.759; 0.810) | (0.487; 0.568) | (0.193; 0.250) | (0.023; 0.034) |
| Surgery | Estimate | 0.884 | 0.699 | 0.390 | 0.093 |
| | 95% CI | (0.870; 0.898) | (0.672; 0.726) | (0.356; 0.425) | (0.077; 0.109) |
| Radiotherapy | Estimate | 0.833 | 0.605 | 0.290 | 0.050 |
| | 95% CI | (0.813; 0.854) | (0.572; 0.639) | (0.257; 0.323) | (0.040; 0.059) |
| Chemotherapy | Estimate | 0.776 | 0.515 | 0.211 | 0.026 |
| | 95% CI | (0.749; 0.802) | (0.481; 0.549) | (0.185; 0.237) | (0.021; 0.031) |
| Surgery and radiotherapy | Estimate | 0.913 | 0.761 | 0.471 | 0.138 |
| | 95% CI | (0.902; 0.924) | (0.739; 0.783) | (0.436; 0.505) | (0.116; 0.159) |
| Surgery and chemotherapy | Estimate | 0.878 | 0.688 | 0.378 | 0.087 |
| | 95% CI | (0.864; 0.893) | (0.663; 0.713) | (0.346; 0.409) | (0.073; 0.101) |
| Radiotherapy and chemotherapy | Estimate | 0.826 | 0.593 | 0.279 | 0.046 |
| | 95% CI | (0.805; 0.847) | (0.561; 0.625) | (0.249; 0.309) | (0.038; 0.054) |
| Surgery, radiotherapy and chemotherapy | Estimate | 0.909 | 0.752 | 0.458 | 0.130 |
| | 95% CI | (0.898; 0.920) | (0.731; 0.773) | (0.426; 0.489) | (0.111; 0.149) |
| Age: 70 years old | | | | | |
| No treatment | Estimate | 0.766 | 0.502 | 0.201 | 0.023 |
| | 95% CI | (0.739; 0.794) | (0.459; 0.545) | (0.174; 0.228) | (0.018; 0.028) |
| Surgery | Estimate | 0.873 | 0.677 | 0.365 | 0.080 |
| | 95% CI | (0.857; 0.888) | (0.649; 0.705) | (0.330; 0.399) | (0.066; 0.095) |
| Radiotherapy | Estimate | 0.818 | 0.581 | 0.267 | 0.042 |
| | 95% CI | (0.796; 0.841) | (0.546; 0.615) | (0.235; 0.299) | (0.034; 0.050) |
| Chemotherapy | Estimate | 0.757 | 0.489 | 0.191 | 0.021 |
| | 95% CI | (0.729; 0.786) | (0.454; 0.524) | (0.166; 0.216) | (0.017; 0.025) |
| Surgery and radiotherapy | Estimate | 0.904 | 0.742 | 0.444 | 0.122 |
| | 95% CI | (0.892; 0.916) | (0.718; 0.766) | (0.409; 0.479) | (0.102; 0.142) |
| Surgery and chemotherapy | Estimate | 0.867 | 0.666 | 0.352 | 0.075 |
| | 95% CI | (0.851; 0.883) | (0.639; 0.693) | (0.321; 0.384) | (0.062; 0.088) |
| Radiotherapy and chemotherapy | Estimate | 0.811 | 0.568 | 0.256 | 0.038 |
| | 95% CI | (0.787; 0.834) | (0.535; 0.602) | (0.227; 0.285) | (0.031; 0.046) |
| Surgery, radiotherapy and chemotherapy | Estimate | 0.900 | 0.732 | 0.432 | 0.115 |
| | 95% CI | (0.887; 0.912) | (0.709; 0.755) | (0.400; 0.464) | (0.097; 0.132) |
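The 95% CIs in Table 6 were obtained by the delta method. Reproducing them requires the full estimated covariance matrix of the ML estimators, which is not reported here, so the sketch below is a generic helper: it takes any smooth function g of the parameter vector (for instance, the cure fraction of a given covariate profile) and a user-supplied covariance matrix, and approximates the gradient by central finite differences. The function name and the covariance matrix in the usage example are hypothetical.

```python
import math


def delta_method_ci(g, estimate, cov, z=1.959963984540054, h=1e-6):
    """Delta-method (Wald-type) confidence interval for a scalar function g(theta).

    g        : function mapping a parameter vector (list of floats) to a float
    estimate : ML estimate of the parameter vector
    cov      : estimated covariance matrix of the estimator (list of lists)
    """
    k = len(estimate)
    grad = []
    for j in range(k):
        up, down = list(estimate), list(estimate)
        up[j] += h
        down[j] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))  # central finite difference
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    value, se = g(estimate), math.sqrt(var)
    return value, (value - z * se, value + z * se)


# Hypothetical example: g(b0, b1) = b0 * b1 with a made-up diagonal covariance matrix.
value, (lower, upper) = delta_method_ci(lambda b: b[0] * b[1], [2.0, 3.0],
                                        [[0.01, 0.0], [0.0, 0.04]])
print(value, round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the probability scale, it is not guaranteed to stay within [0, 1] when applied to cure fractions close to 0 or 1.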
Figure 2 shows the quantile–quantile plot of the normalized randomized quantile residuals for the PBS mixture model, which indicates good agreement with the standard normal distribution. Figure 2 also displays the estimated survival curves for patients who underwent radiotherapy and chemotherapy, at three ages and by stage of disease. These estimates also indicate that younger patients in all clinical stages who undergo radiotherapy and chemotherapy tend to have a better prognosis. Furthermore, the estimated survival decreases for all ages as the clinical stage advances, and in all scenarios older patients had lower cure rates.
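Assuming the normalized randomized quantile residuals follow the usual Dunn–Smyth construction for censored data, that is, r_i = Φ⁻¹(u_i) with u_i = F̂(t_i) for observed events and u_i drawn uniformly on (F̂(t_i), 1) for censored times, where F̂ = 1 − Ŝ_pop, they can be computed as sketched below. The toy model and simulated data are hypothetical; under a correctly specified model the residuals are exactly standard normal, which is what the QQ plot in Figure 2 checks.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def randomized_quantile_residuals(times, events, surv_pop, seed=None):
    """Normalized randomized quantile residuals for right-censored data.

    surv_pop : fitted population (improper) survival function S_pop(t)
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    residuals = []
    for t, d in zip(times, events):
        f_t = 1.0 - surv_pop(t)  # fitted (improper) distribution function at t
        # Observed event: u = F(t); censored: u drawn uniformly on (F(t), 1).
        u = f_t if d == 1 else rng.uniform(f_t, 1.0)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf well defined
        residuals.append(std_normal.inv_cdf(u))
    return residuals


def _simulate(n, pi, lam, max_follow_up, rng):
    """Toy data from S_pop(t) = pi + (1 - pi) * exp(-lam * t) with uniform censoring."""
    times, events = [], []
    for _ in range(n):
        t_event = math.inf if rng.random() < pi else rng.expovariate(lam)
        c = rng.uniform(0.0, max_follow_up)
        times.append(min(t_event, c))
        events.append(1 if t_event <= c else 0)
    return times, events


rng = random.Random(2024)
times, events = _simulate(20_000, 0.3, 0.5, 10.0, rng)
res = randomized_quantile_residuals(
    times, events, lambda t: 0.3 + 0.7 * math.exp(-0.5 * t), seed=7)
# Under the true model the residuals should be close to standard normal.
print(round(fmean(res), 3), round(pstdev(res), 3))
```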
FIGURE 2 |.

Normalized randomized quantile residuals for the PBS mixture applied to the breast cancer dataset. Estimated SF for the PBS mixture for patients 20, 56, and 70 years old who underwent radiotherapy and chemotherapy through different stages of the disease: Stage I (black), Stage II (red), Stage III (green), and Stage IV (blue).
6 |. Concluding Remarks
In this paper, we propose a new model for long-term survival data, assuming that the number of concurrent causes for the event of interest follows a mixture of the Poisson and BS distributions. This approach is novel because the BS model, which does not belong to the exponential family, has several interesting properties and applications in medical and biological research. A distinguishing feature of the proposed model is that all conditional expectations have closed-form expressions, which allows efficient ML estimation. In addition, the estimation algorithm is simple to implement, since all of its steps are explicitly defined. The model is a competitive alternative to the negative binomial model, which is widely used in the literature. Both models allow the number of concurrent causes to be overdispersed, that is, to have variance greater than the mean. However, our model is based on the BS distribution, a popular alternative in the recent literature, which adds versatility to the available modeling options.
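The overdispersion can be made concrete with the law of total variance. Assuming that N | Z ~ Poisson(θZ), with Z following a BS distribution with unit mean and parameter ϕ, we get E(N) = θ and Var(N) = θ + θ² Var(Z), with Var(Z) = (2ϕ + 5)/(ϕ + 1)². This variance formula reproduces the value Var(Z) = 2.42 reported for ϕ = 0.6 in Table B2, but it is our assumption rather than a formula quoted from the paper; the function names are hypothetical.

```python
def var_z(phi):
    """Var(Z) for Z ~ BS with unit mean and parameter phi (assumed parameterization)."""
    return (2.0 * phi + 5.0) / (phi + 1.0) ** 2


def pbs_count_moments(theta, phi):
    """Mean and variance of N when N | Z ~ Poisson(theta * Z) and E[Z] = 1."""
    mean = theta
    variance = theta + theta ** 2 * var_z(phi)  # law of total variance
    return mean, variance


for phi in (0.6, 1.359, 10.0, 1000.0):
    mean, variance = pbs_count_moments(1.0, phi)
    # Dispersion index Var(N) / E(N) = 1 + theta * Var(Z) > 1 (overdispersion).
    print(phi, round(variance / mean, 3))
```

The dispersion index approaches 1 as ϕ grows, so the Poisson (promotion time) case is recovered in the limit.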
The simulation study suggests that the ML estimators perform well in terms of bias, RMSE, and coverage probability, despite the difficulties inherent in small samples, particularly in estimating the dispersion parameter ϕ and in the overall fit. In the power study of the likelihood ratio (LR) test, we generated samples from the proposed model and tested them against the Poisson model. The rejection rates of the Poisson model were high for small values of the dispersion parameter ϕ and decreased as ϕ increased. This behavior is expected, as argued in Section 2 for the development of the proposed model.
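The summary measures reported in Tables B1 and B2 can be computed from the Monte Carlo replicates as sketched below. We assume, since the section does not say, that SE is the average estimated standard error across replicates and that CP is the proportion of nominal 95% Wald intervals, estimate ± 1.96 SE, that contain the true value. The function name is hypothetical.

```python
import math
from statistics import fmean


def mc_summary(estimates, std_errors, true_value, z=1.959963984540054):
    """Bias, RMSE, mean SE, and coverage probability (CP) over Monte Carlo replicates."""
    bias = fmean(estimates) - true_value
    rmse = math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    mean_se = fmean(std_errors)
    covered = [abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors)]
    return {"bias": bias, "rmse": rmse, "se": mean_se, "cp": sum(covered) / len(covered)}


# Three toy replicates of an estimator whose true value is 2.0.
print(mc_summary([1.0, 3.0, 2.5], [1.0, 1.0, 0.1], 2.0))
```

Comparing the RMSE with the mean SE, as the tables allow, indicates whether the estimated standard errors are well calibrated; for ϕ at small n they clearly are not.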
The proposed methodology provided a good fit to the FOSP dataset, a retrospective survey of 59,300 patients diagnosed with breast cancer in the State of São Paulo, Brazil. The AIC, BIC, and BF criteria showed that the PBS mixture cure rate model fitted better than the Poisson, negative binomial, and Bernoulli models. Furthermore, the normalized randomized quantile residuals indicated a good fit.
In short, our methodology provided a better fit, and hence a sounder basis for inference on the impact of disease stage, type of treatment, and patient age, than the promotion time model commonly used for survival data with a cure fraction, as well as the negative binomial and standard mixture models. Furthermore, this study underscores the importance of early disease detection for treatment success, emphasizing the role of both breast self-examination and regular screening in improving treatment efficacy and achieving higher recovery rates through therapeutic interventions.
Supplementary Material
Acknowledgments
The authors thank the editors and reviewers for their constructive comments on an earlier version of this article. The research was partially supported by CNPq and CAPES grants from the Brazilian federal government, by FAPEAM grants from the government of the State of Amazonas, and by the NIH National Center for Advancing Translational Sciences UCLA CTSI UL1 TR001881. Marcelo Bourguignon and Jeremias Leão are grateful for CNPq grants 304140/2021-0 and 304015/2021-0, respectively.
Appendix A
In this section, we provide details for Proposition 3.1 and Corollary 3.1.
A. 1 |. Proof of Proposition 3.1
Using the results in Gallardo et al. (2017), we obtain For .
where , , . The result is obtained recognizing the distributions in each case.
A.2 |. Proof of Corollary 3.1
As described later, if the conditional distribution of , we have the following results:
Using the properties of conditional expectation, it is easy to see that
Since the distribution of with defined in Equation (5) and using the expressions calculated later it is straightforward that
From Proposition 3.1, conditional distribution for so that its expected value is . Similarly, using the properties of conditional expectation, we have that
Appendix B: Simulation Study
TABLE B1 |.
Empirical bias, root mean squared error (RMSE), standard error (SE), and coverage probability (CP) of the ML estimators for the Poisson–Birnbaum–Saunders mixture model using the Weibull distribution for the time-to-event in the concurrent causes regression.
| Sample size (n) | Statistic | β₀ | β₁ | β₂ | α | ν | ϕ | Var(Z) |
|---|---|---|---|---|---|---|---|---|
| | True value | −0.81 | 1.26 | 2.64 | −3.93 | 1.43 | 0.94 | 1.80 |
| 400 | Bias | −0.011 | −0.010 | −0.010 | 0.002 | 0.006 | 1.265 | −0.102 |
| | RMSE | 0.395 | 0.406 | 0.484 | 0.486 | 0.145 | 6.814 | 0.714 |
| | SE | 0.458 | 0.438 | 0.575 | 0.681 | 0.173 | 6.754 | 0.967 |
| | CP | 0.974 | 0.959 | 0.960 | 0.959 | 0.953 | 0.969 | 0.798 |
| 600 | Bias | −0.050 | 0.016 | −0.007 | 0.039 | −0.003 | 0.945 | −0.115 |
| | RMSE | 0.322 | 0.331 | 0.394 | 0.408 | 0.122 | 6.065 | 0.618 |
| | SE | 0.365 | 0.356 | 0.470 | 0.548 | 0.141 | 2.703 | 0.796 |
| | CP | 0.961 | 0.957 | 0.959 | 0.955 | 0.948 | 0.973 | 0.819 |
| 800 | Bias | −0.042 | 0.005 | −0.015 | 0.034 | 0.000 | 0.365 | −0.101 |
| | RMSE | 0.283 | 0.293 | 0.341 | 0.356 | 0.105 | 2.279 | 0.530 |
| | SE | 0.314 | 0.308 | 0.406 | 0.476 | 0.123 | 1.048 | 0.676 |
| | CP | 0.964 | 0.959 | 0.973 | 0.967 | 0.967 | 0.978 | 0.842 |
| 1000 | Bias | −0.026 | 0.014 | 0.001 | 0.008 | 0.003 | 0.210 | −0.060 |
| | RMSE | 0.255 | 0.249 | 0.294 | 0.315 | 0.092 | 0.851 | 0.473 |
| | SE | 0.282 | 0.276 | 0.365 | 0.428 | 0.110 | 0.731 | 0.628 |
| | CP | 0.972 | 0.967 | 0.968 | 0.976 | 0.971 | 0.979 | 0.864 |
| 1200 | Bias | −0.024 | 0.014 | 0.004 | 0.011 | 0.000 | 0.214 | −0.045 |
| | RMSE | 0.228 | 0.238 | 0.284 | 0.291 | 0.086 | 2.141 | 0.435 |
| | SE | 0.258 | 0.252 | 0.333 | 0.390 | 0.101 | 0.822 | 0.583 |
| | CP | 0.973 | 0.958 | 0.958 | 0.959 | 0.964 | 0.982 | 0.876 |
| 1400 | Bias | −0.017 | 0.007 | 0.000 | 0.006 | 0.002 | 0.152 | −0.042 |
| | RMSE | 0.207 | 0.215 | 0.247 | 0.259 | 0.078 | 0.836 | 0.395 |
| | SE | 0.237 | 0.233 | 0.309 | 0.361 | 0.094 | 0.585 | 0.528 |
| | CP | 0.965 | 0.968 | 0.977 | 0.968 | 0.969 | 0.981 | 0.892 |
| 1600 | Bias | −0.020 | 0.011 | 0.005 | 0.008 | 0.001 | 0.115 | −0.037 |
| | RMSE | 0.195 | 0.193 | 0.231 | 0.243 | 0.072 | 0.567 | 0.373 |
| | SE | 0.221 | 0.218 | 0.288 | 0.337 | 0.087 | 0.467 | 0.499 |
| | CP | 0.960 | 0.971 | 0.981 | 0.969 | 0.970 | 0.984 | 0.896 |
| 5000 | Bias | −0.005 | −0.004 | −0.004 | 0.009 | −0.003 | 0.028 | −0.020 |
| | RMSE | 0.108 | 0.113 | 0.121 | 0.111 | 0.037 | 0.158 | 0.170 |
| | SE | 0.125 | 0.123 | 0.163 | 0.190 | 0.049 | 0.217 | 0.225 |
| | CP | 0.972 | 0.962 | 0.979 | 0.979 | 0.975 | 0.997 | 0.944 |
TABLE B2 |.
Empirical bias, root mean squared error (RMSE), standard error (SE), and coverage probability (CP) of the ML estimators for the Poisson–Birnbaum–Saunders mixture model using the Weibull distribution for the time-to-event in the concurrent causes regression.
| Sample size (n) | Statistic | β₀ | β₁ | β₂ | α | ν | ϕ | Var(Z) |
|---|---|---|---|---|---|---|---|---|
| | True value | 1.9 | −1.5 | −0.2 | −4 | 1.8 | 0.6 | 2.42 |
| 400 | Bias | 0.023 | −0.009 | 0.005 | −0.047 | 0.023 | 0.855 | −0.067 |
| | RMSE | 0.654 | 0.425 | 0.305 | 0.579 | 0.195 | 5.321 | 0.840 |
| | SE | 0.782 | 0.460 | 0.309 | 0.699 | 0.229 | 3.545 | 1.945 |
| | CP | 0.948 | 0.953 | 0.953 | 0.957 | 0.954 | 0.902 | 0.876 |
| 600 | Bias | −0.047 | 0.018 | 0.008 | 0.020 | −0.001 | 0.450 | −0.117 |
| | RMSE | 0.486 | 0.325 | 0.238 | 0.429 | 0.143 | 3.226 | 0.657 |
| | SE | 0.641 | 0.372 | 0.247 | 0.571 | 0.187 | 2.027 | 1.403 |
| | CP | 0.970 | 0.963 | 0.967 | 0.966 | 0.962 | 0.974 | 0.894 |
| 800 | Bias | −0.075 | 0.029 | 0.004 | 0.053 | −0.013 | 0.216 | −0.145 |
| | RMSE | 0.417 | 0.270 | 0.198 | 0.370 | 0.122 | 0.941 | 0.578 |
| | SE | 0.556 | 0.320 | 0.211 | 0.494 | 0.161 | 0.681 | 1.174 |
| | CP | 0.959 | 0.968 | 0.967 | 0.961 | 0.967 | 0.974 | 0.895 |
| 1000 | Bias | −0.011 | 0.006 | 0.011 | −0.010 | 0.006 | 0.164 | −0.042 |
| | RMSE | 0.403 | 0.260 | 0.185 | 0.349 | 0.118 | 1.167 | 0.536 |
| | SE | 0.500 | 0.289 | 0.192 | 0.444 | 0.146 | 0.595 | 1.204 |
| | CP | 0.960 | 0.964 | 0.951 | 0.968 | 0.961 | 0.964 | 0.923 |
| 1200 | Bias | −0.043 | 0.009 | 0.005 | 0.028 | −0.007 | 0.093 | −0.089 |
| | RMSE | 0.339 | 0.237 | 0.171 | 0.291 | 0.098 | 0.349 | 0.446 |
| | SE | 0.453 | 0.263 | 0.173 | 0.402 | 0.132 | 0.344 | 0.940 |
| | CP | 0.973 | 0.958 | 0.951 | 0.972 | 0.969 | 0.981 | 0.939 |
Footnotes
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research Badges
This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results. The data are available in the Supporting Information section.
This article has earned a “Reproducible Research” badge for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be fully reproduced.
Supporting Information
Additional supporting information can be found online in the Supporting Information section.
Data Availability Statement
The data that support the findings of this study are available in the Supplementary Material.
References
- Abramowitz M, and Stegun I. 1972. Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables. New York: Dover.
- Akaike H. 1973. “Information Theory and an Extension of the Maximum Likelihood Principle.” In International Symposium on Information Theory, edited by Petrov BN and Csaki F, 267–281. Budapest, Hungary: Akademiai Kiado.
- Balakrishnan N, and Kundu D. 2019. “Birnbaum-Saunders Distribution: A Review of Models, Analysis, and Applications.” Applied Stochastic Models in Business and Industry 35, no. 1: 4–49.
- Barreto-Souza W. 2015. “Long-Term Survival Models With Overdispersed Number of Competing Causes.” Computational Statistics and Data Analysis 91, no. 1: 51–63.
- Berkson J, and Gage RP. 1952. “Survival Curve for Cancer Patients Following Treatment.” Journal of the American Statistical Association 47, no. 259: 501–515.
- Birnbaum ZW, and Saunders SC. 1969. “A New Family of Life Distributions.” Journal of Applied Probability 6: 319–327.
- Borchers HW. 2023. pracma: Practical Numerical Math Functions. R package version 2.3.8. https://CRAN.R-project.org/package=pracma.
- Brandão M, Leão J, Gallardo D, and Bourguignon M. 2023. “Cure Rate Models for Heterogeneous Competing Causes.” Statistical Methods in Medical Research 32, no. 9: 1823–1841.
- Cancho VG, Louzada F, and Ortega EM. 2013. “The Power Series Cure Rate Model: An Application to a Cutaneous Melanoma Data.” Communications in Statistics-Simulation and Computation 42, no. 3: 586–602.
- Chen M-H, Ibrahim J, and Sinha D. 1999. “A New Bayesian Model for Survival Data With a Surviving Fraction.” Journal of the American Statistical Association 94, no. 447: 909–919.
- De Andrade CT, Magedanz A, Escobosa DM, et al. 2012. “The Importance of a Database in the Management of Healthcare Services.” Einstein (São Paulo) 10: 360–365.
- Dempster AP, Laird NM, and Rubin DB. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological) 39, no. 1: 1–38.
- Desmond A. 1985. “Stochastic Models of Failure in Random Environments.” Canadian Journal of Statistics 13, no. 2: 171–183.
- Ferlay J, Colombet M, Soerjomataram I, et al. 2021. “Cancer Statistics for the Year 2020: An Overview.” International Journal of Cancer 149, no. 4: 778–789.
- Gallardo DI, Gómez HW, and Bolfarine H. 2017a. “A New Cure Rate Model Based on the Yule–Simon Distribution With Application to a Melanoma Data Set.” Journal of Applied Statistics 44, no. 7: 1153–1164.
- Gallardo DI, Gómez YM, and de Castro M. 2018. “A Flexible Cure Rate Model Based on the Polylogarithm Distribution.” Journal of Statistical Computation and Simulation 88, no. 11: 2137–2149.
- Gallardo DI, Romeo JS, and Meyer R. 2017b. “A Simplified Estimation Procedure Based on the EM Algorithm for the Power Series Cure Rate Model.” Communications in Statistics-Simulation and Computation 46, no. 8: 6342–6359.
- Gallardo DI, and Azimi R. 2023. PScr: Estimation for the Power Series Cure Rate Model. R package version 1.1. https://CRAN.R-project.org/package=PScr.
- Gómez Y, Gallardo D, Bourguignon M, Bertolli E, and Calsavara V. 2023. “A General Class of Promotion Time Cure Rate Models With a New Biological Interpretation.” Lifetime Data Analysis 29: 66–86.
- Gonçalves J, Barreto-Souza W, and Ombao H. 2022. “Poisson-Birnbaum-Saunders Regression Model for Clustered Count Data.” Preprint. 10.48550/arXiv.2202.10162.
- Goss PE, Lee BL, and Badovinac-Crnjevic T. 2013. “Planning Cancer Control in Latin America and the Caribbean.” The Lancet Oncology 14, no. 5: 391–436.
- Hajek J, Sidak Z, and Sen PK. 1999. Theory of Rank Tests. San Diego, CA: Academic Press.
- Hashimoto E, Ortega E, Cordeiro G, and Cancho V. 2014. “The Poisson Birnbaum–Saunders Model With Long-Term Survivors.” Statistics: A Journal of Theoretical and Applied Statistics 48, no. 6: 1394–1413.
- INCA. 2022. Estimativa 2023: Incidência do Câncer no Brasil. Rio de Janeiro: INCA-Instituto Nacional do Cancer.
- Kalbfleisch JD, and Prentice RL. 2002. The Statistical Analysis of Failure Time Data. New York: Wiley.
- Kass RE, and Raftery AE. 1995. “Bayes Factors.” Journal of the American Statistical Association 90, no. 430: 773–795. 10.1080/01621459.1995.10476572.
- Kotz S, Leiva V, and Sanhueza A. 2010. “Two New Mixture Models Related to the Inverse Gaussian Distribution.” Methodology and Computing in Applied Probability 12: 199–212.
- Leão J, Bourguignon M, Gallardo DI, Rocha R, and Tomazella V. 2020. “A New Cure Rate Model With Flexible Competing Causes With Applications to Melanoma and Transplantation Data.” Statistics in Medicine 39, no. 24: 3272–3284.
- Leiva V, Marchant C, Ruggeri F, and Saulo H. 2015. “A Criterion for Environmental Assessment Using Birnbaum-Saunders Attribute Control Charts.” Environmetrics 26: 463–476.
- Leiva V, Ruggeri F, Saulo H, and Vivanco JF. 2017. “A Methodology Based on the Birnbaum-Saunders Distribution for Reliability Analysis Applied to Nano-Materials.” Reliability Engineering and System Safety 157: 192–201.
- Leiva V, Tejo M, Guiraud P, Schmachtenberg O, Orio P, and Marmolejo-Ramos F. 2015. “Modeling Neural Activity With Cumulative Damage Distributions.” Biological Cybernetics 109: 421–433.
- Leiva V, Santos-Neto M, Cysneiros FJA, and Barros M. 2014. “Birnbaum-Saunders Statistical Modelling: A New Approach.” Statistical Modelling 14: 21–48.
- Leiva V, Saulo H, Leão J, and Marchant C. 2014. “A Family of Autoregressive Conditional Duration Models Applied to Financial Data.” Computational Statistics and Data Analysis 79: 175–191.
- López-de-Ullibarri I, López-Cheda A, Jácome MA, and Borchers HW. 2020. npcure: Nonparametric Estimation in Mixture Cure Models. R package version 0.1–5. https://CRAN.R-project.org/package=npcure.
- Makdissi FB, Leite FPM, Peres SV, et al. 2019. “Breast Cancer Survival in a Brazilian Cancer Center: A Cohort Study of 5,095 Patients.” Mastology 29, no. 1: 37–46.
- Maller RA, and Zhou S. 1992. “Estimating the Proportion of Immunes in a Censored Sample.” Biometrika 79, no. 4: 731–739.
- Pal S. 2021. “A Simplified Stochastic EM Algorithm for Cure Rate Model With Negative Binomial Competing Risks: An Application to Breast Cancer Data.” Statistics in Medicine 40, no. 28.
- R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
- Rodrigues J, de Castro M, Cancho VG, and Balakrishnan N. 2009. “Com–Poisson Cure Rate Survival Models and an Application to a Cutaneous Melanoma Data.” Journal of Statistical Planning and Inference 139, no. 10: 3605–3611. 10.1016/j.jspi.2009.04.014.
- Rodrigues J, Cordeiro GM, Cancho VG, and Balakrishnan N. 2016. “Relaxed Poisson Cure Rate Models.” Biometrical Journal 58: 397–415.
- Rodrigues J, Cancho V, De Castro M, and Louzada-Neto F. 2009. “On the Unification of Long-Term Survival Models.” Statistics and Probability Letters 79, no. 6: 753–759. 10.1016/j.spl.2008.10.029.
- Santos-Neto M, Cysneiros FJA, Leiva V, and Ahmed S. 2012. “On New Parameterizations of the Birnbaum-Saunders Distribution.” Pakistan Journal of Statistics 28: 1–26.
- Saulo H, Leiva V, Ziegelmann FA, and Marchant C. 2013. “A Nonparametric Method for Estimating Asymmetric Densities Based on Skewed Birnbaum-Saunders Distributions Applied to Environmental Data.” Stochastic Environmental Research and Risk Assessment 27: 1479–1491.
- Schwarz G. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6: 461–464.
- Stram D, and Lee J. 1994. “Variance Components Testing in the Longitudinal Mixed Effects Model.” Biometrics 50: 1171–1177.
- Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. 2021. “Global Cancer Statistics 2020: Globocan Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries.” CA: A Cancer Journal for Clinicians 71, no. 3: 209–249.
- Swaminathan H, Saravanamurali K, and Yadav SA. 2023. “Extensive Review on Breast Cancer Its Etiology, Progression, Prognostic Markers, and Treatment.” Medical Oncology 40, no. 8: 238.
- Wild CP, Weiderpass E, and Stewart B. 2020. World Cancer Report: Cancer Research for Cancer Prevention. Lyon, France: International Agency for Research on Cancer.
