Abstract
In obstetrics and gynecology, knowledge about how women's features are associated with childbirth is important. This leads to establishing guidelines and can help managers to describe the dynamics of pregnant women's hospital stays. Time is therefore a variable of great importance and can be described by survival models. An issue that should be considered in the modeling is the inclusion of women for whom the duration of labor cannot be observed due to fetal death, generating a proportion of times equal to zero. Additionally, another proportion of women's times may be censored due to some intervention. The aim of this paper was to present the Log-Normal zero-inflated cure regression model and to evaluate likelihood-based parameter estimation by a simulation study. In general, the inference procedures showed a better performance for larger samples and low proportions of zero inflation and cure. To exemplify how this model can be an important tool for investigating the course of the childbirth process, we considered the Better Outcomes in Labor Difficulty project dataset and showed that parity and educational level are associated with the main outcomes. We acknowledge the World Health Organization for granting us permission to use the dataset.
Keywords: Childbirth, cure, duration of labor, survival analysis, zero-inflation
1. Introduction
Survival and hazard models are widely used in different research fields. In medical studies, for example, the most common application for these models is to evaluate time until the occurrence of an event, such as the recurrence of a disease under treatment or the death of a patient. There are also other medical areas, such as obstetrics and gynecology, in which these models are less used but have great potential for applications. Thus, for survival analysis to be increasingly used in different areas, it is important that the usual models are adapted to the complexity present in the data.
The usual survival models assume that the probability that an individual does not present the event at time 0 is equal to 1, i.e. $S(0) = 1$. From the practical point of view, this assumption prevents times equal to zero from being analyzed. Another assumption is that $\lim_{t \to \infty} S(t) = 0$, i.e. for sufficiently long times, all individuals will present the event of interest [17]. In order to overcome the limitations inherent to the standard models, the cure models and, more recently, the zero-inflated survival models were proposed.
The cure models were proposed in order to allow a proportion of the subjects not to present the event of interest, even after a long follow-up (immune subjects). Perhaps the most popular models with this attribute are the mixture model [3,5] and the hierarchical model [11]. Over the last few years, proposals for cure models have intended to cover more characteristics related to data heterogeneity, for example: more flexible distributions, the presence of covariates, frailty and competing risks, among others. Perdoná and Louzada-Neto (2011) [28] proposed a general risk cure model, which covers several particular cases derived from the Weibull distribution. Other authors also sought to present flexible methods related to data distribution [14,30]. In terms of competing risks, we can cite Barreto-Souza [1], Leão et al. [18] and Cancho et al. [10]. There have also been major developments in terms of cure models that deal with frailty and covariates [9,15,31,33]. Therefore, various authors have contributed to proposing broader models that incorporate the features necessary to understand the phenomenon under study, and a variety of recent updates in this area can be cited [2,7,13,22,23,35].
Concerning the zero-inflation, methods were proposed in a variety of statistical contexts [4,16,19,24,27]. However, in the context of survival data, this feature is still less explored. As far as we know, there are four published papers in this field. Braekers and Grouwels [6] describe a Cox proportional-hazard model with the presence of zeros to study mice sleep times after ingesting ethanol. Louzada, Moreira and Oliveira [20] and Oliveira, Moreira and Louzada [12] provide models considering the Weibull distribution in the context of bank data to study the time until loan fraud. Calsavara et al. [8] propose using defective models with the presence of times equal to zero to study times until the occlusion of the endoscopic stent and time until the beginning of using insulin. Therefore, given the innovative feature of these models, it is important that further studies considering other possibilities are carried out.
In this paper, we will consider a mixture model that includes the two properties described above: cure and zero-inflation. The population survival function is given by $S(t) = p_1 + (1 - p_0 - p_1) S_0(t)$, where $t > 0$, $S_0$ is the baseline survival function related to the subjects susceptible to experiencing the event of interest, $p_1$ is the proportion of subjects immune to the event and $p_0$ is the proportion of individuals with time equal to zero. Our practical motivation is to present a more comprehensive model in the study of duration of labor from a real dataset of African pregnant women. In this context, we focus on the Log-Normal distribution, which has been used in studies related to delivery time and childbirth [36,37].
Thus, our main aim is to evaluate the properties of inference methods for the Log-Normal zero-inflated cure (ZIC) model by a simulation study and to present an important example of application considering data from a real African population. Therefore, the paper is organized as follows. In Section 2, we formulate the model and present the approach for parameter estimation. A study based on Monte Carlo simulations with a variety of different parameter values is presented in Section 3. In Section 4, we present more details about the data set and the application results. Some general remarks are presented in Section 5.
2. The Log-Normal zero-inflated cure model
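Let T be a non-negative random variable representing the time to non-operative childbirth of a woman in a population and consider that the Log-Normal is the distribution associated with the occurrence of the event in the susceptible group. In this context, the Log-Normal ZIC model is obtained by considering the Log-Normal distribution as the baseline survival, as follows

$$S(t) = p_1 + (1 - p_0 - p_1)\left[1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)\right], \quad t > 0, \tag{1}$$

where Φ is the cumulative distribution function (CDF) of the standard Normal, and $\mu \in \mathbb{R}$ and $\sigma > 0$ are, respectively, location and scale parameters. The parameter $p_1$ represents the proportion of subjects immune to the event (cure) and the parameter $p_0$ represents the proportion of individuals with time equal to zero (zero-inflation), with $0 < p_0, p_1 < 1$ and $p_0 + p_1 < 1$. Thus, the ZIC models allow the researcher to study together a population consisting of three groups: the subjects with t = 0, the immune and the susceptible ones. Note that $S$ is an improper survival function, as $S(0) = 1 - p_0 < 1$ and $\lim_{t \to \infty} S(t) = p_1 > 0$.
The probability density function (PDF) and the hazard function are given by

$$f(t) = \begin{cases} p_0, & t = 0, \\ (1 - p_0 - p_1)\, f_0(t), & t > 0, \end{cases} \qquad h(t) = \frac{f(t)}{S(t)} = \frac{(1 - p_0 - p_1)\, f_0(t)}{p_1 + (1 - p_0 - p_1)\, S_0(t)}, \quad t > 0,$$

where $S_0(t) = 1 - \Phi\left((\log t - \mu)/\sigma\right)$ and $f_0(t) = \phi\left((\log t - \mu)/\sigma\right)/(\sigma t)$ are the survival function and PDF of the Log-Normal distribution, respectively, and φ is the PDF of the standard Normal. Examples of survival and hazard curves for a variety of parameters are available on a shiny application on: http://200.144.255.68:3838/gleici/LNZICR/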
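The three functions of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `zic_survival`, `zic_density` and `zic_hazard` are hypothetical, `p0` and `p1` denote the zero-inflation and cure proportions, and the value returned at t = 0 follows the convention S(0) = 1 − p0.

```python
import math

def norm_cdf(z):
    """Standard Normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard Normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def zic_survival(t, p0, p1, mu, sigma):
    """Improper population survival: p1 + (1 - p0 - p1) * S0(t)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t == 0:
        return 1.0 - p0  # only the zero-time group has already had the event
    z = (math.log(t) - mu) / sigma
    return p1 + (1.0 - p0 - p1) * (1.0 - norm_cdf(z))

def zic_density(t, p0, p1, mu, sigma):
    """Density for t > 0: (1 - p0 - p1) * f0(t), with f0 the Log-Normal PDF."""
    z = (math.log(t) - mu) / sigma
    return (1.0 - p0 - p1) * norm_pdf(z) / (sigma * t)

def zic_hazard(t, p0, p1, mu, sigma):
    """Population hazard for t > 0: f(t) / S(t)."""
    return zic_density(t, p0, p1, mu, sigma) / zic_survival(t, p0, p1, mu, sigma)
```

For very large t the survival returned by `zic_survival` levels off at `p1` rather than at zero, which is the plateau that characterizes a cure model.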
2.1. Log-likelihood
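The point estimation is maximum likelihood based. Suppose the observed data of n individuals are given by the pairs $(t_i, \delta_i)$, $i = 1, \ldots, n$, in which the ith individual is assumed to have a lifetime $T_i$ and a censoring time $C_i$, with T and C independent, $t_i = \min(T_i, C_i)$, and $\delta_i = I(T_i \le C_i)$ is the censoring indicator variable. Suppose that m < n individuals have t = 0. Then, the log-likelihood function for $\theta = (p_0, p_1, \mu, \sigma)$, where $p_0$ and $p_1$ denote the zero-inflation and cure proportions, corresponding to the observed data is given by

$$\ell(\theta) = m \log p_0 + \sum_{i:\, t_i > 0} \Big\{ \delta_i \log\big[(1 - p_0 - p_1)\, f_0(t_i)\big] + (1 - \delta_i) \log\big[p_1 + (1 - p_0 - p_1)\, S_0(t_i)\big] \Big\}, \tag{2}$$

where $f_0$ and $S_0$ are the PDF and survival function of the Log-Normal distribution with parameters μ and σ.
The log-likelihood partial derivatives, which compose the score function, are given as follows. Writing $z_i = (\log t_i - \mu)/\sigma$, $q = 1 - p_0 - p_1$ and $S(t_i) = p_1 + q\, S_0(t_i)$, with φ the standard Normal PDF,

$$\begin{aligned}
\frac{\partial \ell}{\partial p_0} &= \frac{m}{p_0} - \sum_{i:\, t_i > 0} \left[ \frac{\delta_i}{q} + (1 - \delta_i)\, \frac{S_0(t_i)}{S(t_i)} \right], \\
\frac{\partial \ell}{\partial p_1} &= \sum_{i:\, t_i > 0} \left[ -\frac{\delta_i}{q} + (1 - \delta_i)\, \frac{1 - S_0(t_i)}{S(t_i)} \right], \\
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma} \sum_{i:\, t_i > 0} \left[ \delta_i z_i + (1 - \delta_i)\, \frac{q\, \phi(z_i)}{S(t_i)} \right], \\
\frac{\partial \ell}{\partial \sigma} &= \frac{1}{\sigma} \sum_{i:\, t_i > 0} \left[ \delta_i (z_i^2 - 1) + (1 - \delta_i)\, \frac{q\, z_i\, \phi(z_i)}{S(t_i)} \right].
\end{aligned} \tag{3}$$

The maximum likelihood estimates (MLE) are obtained by solving the non-linear system of equations $\partial \ell(\theta) / \partial \theta = 0$, which is difficult to do analytically. We can therefore solve it using iterative techniques, such as Newton-type (quasi-Newton) algorithms, which are available in the R routine optim [29].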
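As an illustration of how the log-likelihood of this section can be evaluated before being handed to a numerical optimizer, the following Python sketch computes it for given parameter values. The function name `zic_loglik` is hypothetical, and two conventions are assumptions rather than statements from the paper: every zero time contributes log p0 regardless of its indicator, and inadmissible parameter values return minus infinity so that a maximizer moves away from them.

```python
import math

def zic_loglik(times, events, p0, p1, mu, sigma):
    """Log-likelihood of the Log-Normal ZIC model without covariates.

    times  : observed times t_i (zeros allowed)
    events : indicators delta_i (1 = event observed, 0 = censored)
    """
    q = 1.0 - p0 - p1
    if p0 <= 0 or p1 <= 0 or q <= 0 or sigma <= 0:
        return -math.inf  # outside the parameter space
    loglik = 0.0
    for t, delta in zip(times, events):
        if t == 0:
            loglik += math.log(p0)  # zero-inflated observation
            continue
        z = (math.log(t) - mu) / sigma
        if delta == 1:
            # susceptible subject with an observed event: log[q * f0(t)]
            log_f0 = -0.5 * z * z - math.log(sigma * t) - 0.5 * math.log(2.0 * math.pi)
            loglik += math.log(q) + log_f0
        else:
            # censored subject: cured, or susceptible with event after t
            s0 = 0.5 * math.erfc(z / math.sqrt(2.0))  # Log-Normal survival
            loglik += math.log(p1 + q * s0)
    return loglik
```

In practice the negative of this function would be minimized, typically after reparameterizing so that the optimizer works on an unconstrained scale (for example log σ and the multinomial logits of p0 and p1).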
2.2. Regression model
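One of the main goals of biomedical research is to evaluate how patients' characteristics (or independent variables) are associated with the final outcome. Regression models can be a powerful tool in this context. The Log-Normal ZIC regression model allows us to link the independent variables with all the parameters of the model. Thus, if we have a vector $\mathbf{x}$ of k covariates, we can rewrite the survival defined in (1) as follows:

$$S(t \mid \mathbf{x}) = p_1(\mathbf{x}) + \left[1 - p_0(\mathbf{x}) - p_1(\mathbf{x})\right]\left[1 - \Phi\left(\frac{\log t - \mu(\mathbf{x})}{\sigma(\mathbf{x})}\right)\right]. \tag{4}$$

Suppose that our covariates are organized in a matrix as follows

$$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix},$$

where the first column corresponds to the intercept, and consider four unknown vectors of regression coefficients: $\boldsymbol{\beta}_{p_0}$, $\boldsymbol{\beta}_{p_1}$, $\boldsymbol{\beta}_{\mu}$ and $\boldsymbol{\beta}_{\sigma}$, where $\boldsymbol{\beta}_{j} = (\beta_{0j}, \beta_{1j}, \ldots, \beta_{kj})^\top$ and the length of each $\boldsymbol{\beta}_{j}$ is k + 1. Then, the regression version of the Log-Normal ZIC model is defined by (4) and by the following systematic components:

$$(p_{0i}, p_{1i}) = g\left(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_0},\, \mathbf{x}_i^\top \boldsymbol{\beta}_{p_1}\right), \qquad \mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}_{\mu}, \qquad \log \sigma_i = \mathbf{x}_i^\top \boldsymbol{\beta}_{\sigma}, \tag{5}$$

where $\mathbf{x}_i^\top$ is the ith line of the matrix $\mathbf{X}$, and $\beta_{0j}$ represents the intercept of each linear predictor. Because the proportions $p_0$ and $p_1$ are dependent, the function g is established as the multinomial logistic regression, as follows:

$$p_{0i} = \frac{\exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_0})}{1 + \exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_0}) + \exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_1})}, \qquad p_{1i} = \frac{\exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_1})}{1 + \exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_0}) + \exp(\mathbf{x}_i^\top \boldsymbol{\beta}_{p_1})}.$$

It is important to note that each unknown vector of regression coefficients is linked with the parameters defined in (4): $\boldsymbol{\beta}_{p_0}$ and $\boldsymbol{\beta}_{p_1}$ are related to $p_0$ and $p_1$. Additionally, $\boldsymbol{\beta}_{\mu}$ and $\boldsymbol{\beta}_{\sigma}$ are related to the location and scale of the Log-Normal distribution (μ and σ).
The estimates of the regression coefficients are also maximum likelihood based. To obtain interval estimates of a coefficient β, we consider the standard asymptotic confidence interval, i.e. the approximate $100(1 - \alpha)\%$ confidence interval (CI) for β is given by $\hat{\beta} \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(\hat{\beta})}$, where $z_{\alpha/2}$ represents the standard normal quantile and $\widehat{\mathrm{Var}}(\hat{\beta})$ is obtained from the inverse of the observed information matrix.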
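The systematic components and the asymptotic interval can be sketched as follows. The function names `zic_links` and `wald_ci` are hypothetical, and the sketch assumes an identity link for μ, a log link for σ and the multinomial logistic link for the two proportions; the coefficient values in the usage lines are illustrative only.

```python
import math
from statistics import NormalDist

def zic_links(x, beta_p0, beta_p1, beta_mu, beta_sigma):
    """Map one covariate row x (leading 1 for the intercept) to model parameters."""
    def eta(beta):
        # linear predictor x' beta
        return sum(b * xi for b, xi in zip(beta, x))
    e0, e1 = math.exp(eta(beta_p0)), math.exp(eta(beta_p1))
    denom = 1.0 + e0 + e1  # multinomial logit keeps p0 + p1 < 1
    return {
        "p0": e0 / denom,                    # zero-inflation proportion
        "p1": e1 / denom,                    # cure proportion
        "mu": eta(beta_mu),                  # identity link (assumed)
        "sigma": math.exp(eta(beta_sigma)),  # log link (assumed)
    }

def wald_ci(estimate, std_error, level=0.95):
    """Asymptotic interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

# Illustrative use with one binary covariate set to 0
params = zic_links([1, 0], [-3.0, 1.0], [-2.5, 0.3], [0.5, 0.5], [0.25, 0.75])
lower, upper = wald_ci(-2.314, 0.125)
```

Because both proportions share the same denominator, any real-valued pair of linear predictors yields admissible values of p0 and p1, which is the practical reason for preferring the multinomial logit over two separate logistic links.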
3. Simulation study
In order to evaluate the behavior of point and interval estimators of the Log-Normal ZIC regression model, we perform a Monte Carlo simulation study with different combinations of parameter values and sample sizes. We assume x as a single binary covariate representing a woman's characteristic, with values drawn from a Bernoulli distribution with parameter 0.5, linked with the zero-inflation and cure proportions and with μ and σ by the link functions defined in (5).
For each one of three parameter scenarios (Table 1), we considered five different sample sizes (n = 100, 250, 500, 750 and 1000), generating B = 1000 sample replications, resulting in 15,000 samples. The parameter values are selected in order to assess the ML estimation performance under different location and scale parameters and also under a composition of different proportions of zero-inflated data and cure. Figure 1 presents the survival curves obtained in each simulation scenario and enables us to note that the proportions of zero-inflation and cure are smaller in Scenario I and bigger in Scenario III. To generate the samples, we considered the CDF inversion method [32], with a random censoring process and a Uniform distribution for the censoring times.
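A minimal sketch of how one such sample could be generated by inverting the improper CDF is given below. It is not the authors' code: the function `simulate_zic`, the upper limit `cens_max` of the Uniform censoring distribution and the treatment of zero times as observed events are all assumptions made for illustration, as is the mapping of the example coefficients to the four parameters.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def linear(beta, x):
    """Linear predictor beta[0] + beta[1] * x for a single covariate."""
    return beta[0] + beta[1] * x

def simulate_zic(n, b_p0, b_p1, b_mu, b_sigma, cens_max, rng):
    """Draw n triples (time, event, x) from a Log-Normal ZIC regression model.

    cens_max is a hypothetical upper limit of the Uniform(0, cens_max)
    censoring times; its value is an assumption of this sketch.
    """
    sample = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0       # Bernoulli(0.5) covariate
        e0, e1 = math.exp(linear(b_p0, x)), math.exp(linear(b_p1, x))
        p0, p1 = e0 / (1 + e0 + e1), e1 / (1 + e0 + e1)
        mu, sigma = linear(b_mu, x), math.exp(linear(b_sigma, x))

        u = rng.random()                         # invert F(t) = 1 - S(t)
        if u < p0:                               # jump of size p0 at t = 0
            sample.append((0.0, 1, x))           # assumption: zeros are observed
            continue
        if u >= 1 - p1:                          # plateau of F: cured subject
            t_event = math.inf
        else:                                    # susceptible subject
            v = (u - p0) / (1 - p0 - p1)
            v = min(max(v, 1e-12), 1 - 1e-12)    # keep inv_cdf inside (0, 1)
            t_event = math.exp(mu + sigma * STD_NORMAL.inv_cdf(v))
        c = rng.uniform(0.0, cens_max)           # random censoring time
        sample.append((min(t_event, c), int(t_event <= c), x))
    return sample

# Illustrative coefficients (intercept, slope) for each linear predictor
data = simulate_zic(5000, (-3.0, 1.0), (-2.5, 0.3), (0.5, 0.5), (0.25, 0.75),
                    cens_max=20.0, rng=random.Random(2021))
```

A single uniform draw decides the group: values below p0 fall on the jump at zero, values above 1 − p1 fall on the plateau of the cured, and the remaining values are rescaled and passed through the Log-Normal quantile function.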
Table 1.
Parameter values fixed in each scenario (I, II, III).
| Parameter | Coefficient | I | II | III |
|---|---|---|---|---|
| Zero inflation ($p_0$) | Intercept | −3.00 | −2.00 | −0.50 |
| | x | 1.00 | 2.00 | 0.75 |
| Cure ($p_1$) | Intercept | −2.50 | −1.50 | −0.35 |
| | x | 0.30 | 1.50 | 1.75 |
| μ | Intercept | 0.50 | −0.50 | −1.00 |
| | x | 0.50 | 1.50 | 2.00 |
| σ | Intercept | 0.25 | −1.50 | −0.50 |
| | x | 0.75 | 1.50 | −0.50 |
Figure 1.
Survival curves of simulation study parameter scenarios. (a) Scenario I. (b) Scenario II. (c) Scenario III.
Bias and Root Mean Square Error (RMSE) of the MLE and coverage probabilities (CP) of the CI are presented. To assess whether an estimated CP converges to the fixed CI level, 0.95, we set nominal coverage bounds given by $0.95 \pm 1.96\sqrt{0.95 \times 0.05 / B}$, where B is the number of samples generated to obtain the CP [21]. Thus, in this case (B = 1000) the nominal coverage bounds are given by (0.9365, 0.9635).
Tables 2 and 3 present the simulation results for the three simulated scenarios. In general, the biases and RMSE are closer to zero and the CPs are closer to 0.95 as the sample size increases. In most cases, the biases of the coefficients of the zero-inflation and cure proportions are close to zero even for small sample sizes, except for Scenario I, in which the biases converge to zero when n is equal to 250 or bigger. Similarly, the CP of some of these coefficients in Scenario I needs a bigger sample size (n = 500) to achieve the nominal coverage. These results show that the parameters related to the zero-inflation and cure proportions present better results in the second and third scenarios.
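The performance measures used in this section can be computed from the B replicated estimates as in the sketch below. The helper names are hypothetical, and `coverage_bounds` assumes the usual normal approximation to the binomial with a 1.96 quantile, which is an assumption of this sketch rather than a formula taken from the paper.

```python
import math
from statistics import NormalDist, fmean

def summarize_replicates(estimates, std_errors, true_value, level=0.95):
    """Bias, RMSE and coverage probability of Wald intervals over replicates."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    bias = fmean(estimates) - true_value
    rmse = math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return bias, rmse, covered / len(estimates)

def coverage_bounds(level, n_replicates, z=1.96):
    """Range in which an estimated CP is compatible with the nominal level."""
    half_width = z * math.sqrt(level * (1.0 - level) / n_replicates)
    return level - half_width, level + half_width

# With B = 1000 replicates and a 95% nominal level
low, high = coverage_bounds(0.95, 1000)
```

An estimated CP falling outside `(low, high)` suggests that the asymptotic interval is either too wide or too narrow at that sample size.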
Table 2.
Bias and RMSE of the maximum likelihood estimates of the Log-Normal ZIC regression model parameters for simulated data.
| Scenario | Parameter | Coefficient | Bias (n = 100) | Bias (n = 250) | Bias (n = 500) | Bias (n = 750) | Bias (n = 1000) | RMSE (n = 100) | RMSE (n = 250) | RMSE (n = 500) | RMSE (n = 750) | RMSE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | Zero inflation | Intercept | −0.798 | −0.111 | −0.033 | −0.021 | −0.006 | 2.503 | 0.595 | 0.338 | 0.262 | 0.221 |
| | | x | 0.636 | 0.077 | 0.004 | 0.004 | −0.004 | 2.602 | 0.645 | 0.398 | 0.316 | 0.275 |
| | Cure | Intercept | −0.291 | −0.066 | −0.040 | −0.024 | −0.012 | 1.463 | 0.381 | 0.256 | 0.211 | 0.164 |
| | | x | −0.269 | −0.064 | 0.010 | 0.011 | −0.003 | 2.311 | 0.840 | 0.382 | 0.289 | 0.238 |
| | μ | Intercept | 0.002 | 0.001 | 0.001 | 0.002 | 0.000 | 0.201 | 0.129 | 0.087 | 0.071 | 0.061 |
| | | x | −0.021 | 0.007 | 0.000 | 0.003 | −0.001 | 0.541 | 0.323 | 0.219 | 0.173 | 0.152 |
| | σ | Intercept | −0.025 | −0.009 | −0.003 | −0.004 | −0.002 | 0.118 | 0.071 | 0.050 | 0.040 | 0.033 |
| | | x | −0.007 | 0.006 | −0.004 | 0.001 | 0.001 | 0.186 | 0.109 | 0.077 | 0.059 | 0.050 |
| II | Zero inflation | Intercept | −0.161 | −0.043 | −0.002 | −0.015 | −0.002 | 0.954 | 0.318 | 0.219 | 0.182 | 0.155 |
| | | x | 0.133 | 0.048 | −0.002 | 0.018 | 0.007 | 1.052 | 0.399 | 0.277 | 0.221 | 0.187 |
| | Cure | Intercept | −0.054 | −0.015 | −0.017 | −0.005 | −0.007 | 0.437 | 0.249 | 0.185 | 0.143 | 0.125 |
| | | x | −0.264 | −0.001 | 0.007 | 0.006 | 0.006 | 1.644 | 0.453 | 0.276 | 0.207 | 0.179 |
| | μ | Intercept | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | 0.039 | 0.025 | 0.016 | 0.013 | 0.011 |
| | | x | 0.017 | 0.002 | 0.013 | 0.003 | 0.003 | 0.485 | 0.246 | 0.147 | 0.108 | 0.094 |
| | σ | Intercept | −0.034 | −0.012 | −0.008 | −0.003 | −0.002 | 0.130 | 0.077 | 0.053 | 0.044 | 0.036 |
| | | x | −0.077 | −0.024 | −0.007 | −0.007 | −0.004 | 0.354 | 0.189 | 0.122 | 0.092 | 0.078 |
| III | Zero inflation | Intercept | −0.020 | −0.003 | 0.003 | −0.003 | −0.003 | 0.361 | 0.220 | 0.160 | 0.128 | 0.106 |
| | | x | −0.141 | −0.010 | 0.003 | 0.003 | −0.005 | 0.802 | 0.589 | 0.360 | 0.264 | 0.234 |
| | Cure | Intercept | −0.018 | 0.005 | −0.002 | 0.000 | −0.003 | 0.390 | 0.231 | 0.161 | 0.131 | 0.108 |
| | | x | −0.462 | −0.098 | 0.003 | −0.010 | −0.010 | 1.837 | 0.981 | 0.425 | 0.289 | 0.249 |
| | μ | Intercept | 0.004 | 0.000 | −0.003 | −0.003 | 0.001 | 0.146 | 0.089 | 0.061 | 0.050 | 0.043 |
| | | x | 0.035 | 0.011 | 0.008 | 0.005 | 0.006 | 0.460 | 0.278 | 0.163 | 0.110 | 0.098 |
| | σ | Intercept | −0.060 | −0.024 | −0.013 | −0.009 | −0.004 | 0.198 | 0.112 | 0.077 | 0.061 | 0.052 |
| | | x | −0.533 | −0.140 | −0.052 | −0.030 | −0.015 | 1.023 | 0.490 | 0.257 | 0.189 | 0.153 |
Table 3.
Coverage probability (CP) of the confidence intervals for the Log-Normal ZIC regression model parameters for simulated data. Values in bold are inside the nominal coverage bounds.
| Scenario | Parameter | Coefficient | CP (n = 100) | CP (n = 250) | CP (n = 500) | CP (n = 750) | CP (n = 1000) |
|---|---|---|---|---|---|---|---|
| I | Zero inflation | Intercept | 0.965 | 0.970 | 0.953 | 0.949 | 0.956 |
| | | x | 0.978 | 0.980 | 0.961 | 0.948 | 0.949 |
| | Cure | Intercept | 0.954 | 0.960 | 0.951 | 0.955 | 0.971 |
| | | x | 0.992 | 0.976 | 0.962 | 0.951 | 0.964 |
| | μ | Intercept | 0.933 | 0.934 | 0.946 | 0.945 | 0.958 |
| | | x | 0.931 | 0.945 | 0.948 | 0.945 | 0.950 |
| | σ | Intercept | 0.935 | 0.936 | 0.930 | 0.945 | 0.955 |
| | | x | 0.939 | 0.946 | 0.941 | 0.953 | 0.956 |
| II | Zero inflation | Intercept | 0.972 | 0.962 | 0.941 | 0.946 | 0.951 |
| | | x | 0.957 | 0.966 | 0.945 | 0.949 | 0.957 |
| | Cure | Intercept | 0.959 | 0.949 | 0.954 | 0.949 | 0.944 |
| | | x | 0.967 | 0.958 | 0.942 | 0.948 | 0.947 |
| | μ | Intercept | 0.933 | 0.941 | 0.956 | 0.945 | 0.958 |
| | | x | 0.869 | 0.934 | 0.941 | 0.957 | 0.952 |
| | σ | Intercept | 0.929 | 0.947 | 0.955 | 0.943 | 0.964 |
| | | x | 0.907 | 0.932 | 0.941 | 0.949 | 0.948 |
| III | Zero inflation | Intercept | 0.957 | 0.963 | 0.943 | 0.953 | 0.959 |
| | | x | 0.946 | 0.947 | 0.964 | 0.963 | 0.940 |
| | Cure | Intercept | 0.956 | 0.952 | 0.946 | 0.942 | 0.967 |
| | | x | 0.983 | 0.957 | 0.963 | 0.952 | 0.933 |
| | μ | Intercept | 0.915 | 0.947 | 0.953 | 0.948 | 0.950 |
| | | x | 0.834 | 0.883 | 0.922 | 0.951 | 0.951 |
| | σ | Intercept | 0.917 | 0.932 | 0.933 | 0.946 | 0.952 |
| | | x | 0.748 | 0.867 | 0.914 | 0.928 | 0.934 |
In relation to the coefficients of μ and σ, it can be observed that the biases obtained from Scenarios II and III need a bigger sample size to converge to zero, particularly for the covariate coefficient of σ. In terms of CP, the covariate coefficients of μ and σ in Scenarios II and III also need a minimum n of 500 or 750 to converge to the nominal CP, indicating that the parameters related to μ and σ presented better results in the first scenario.
In summary, it can be observed that high proportions of zero inflation and censoring improve the estimates of the zero-inflation and cure parameters, but have the opposite effect on the estimates of μ and σ.
4. Data application: sub-Saharan African pregnant women
In this section, we present an application of the proposed model to a database consisting of information on the labor of 7062 sub-Saharan African pregnant women, who were selected according to clinical characteristics of interest, as described in Figure A1. The dataset considered is from the Better Outcomes in Labor Difficulty (BOLD) study, a cohort study of women giving birth in health facilities in Nigeria and Uganda that was an initiative of the World Health Organization (WHO) [25,26,34].
An important issue in improving the care of pregnant women and neonates is to study the pattern of childbirth progression. This should be based on the knowledge of how possible events occur and change over time from hospital admission until vaginal birth and how women's characteristics (e.g. age, parity) are associated with this behavior. Concerning the duration of labor in the BOLD data, i.e. the time between hospital admission and vaginal birth (the main outcome), three main groups can be identified. The first one refers to women who arrive at the hospital already having had a stillbirth; generally, the time is not registered, leading to the need to consider that the time is equal to zero (Fetal Death (FD) group). On the other hand, there are some women who may not undergo vaginal birth because of an intervention (cesarean section, for example), which affects the normal progress of labor (Non Vaginal Birth (NVB) group). Finally, there is a third group (Vaginal Birth (VB) group), which presents a normal progression, with vaginal birth.
Given the presence of these three groups, the proposed model allows studying the duration of childbirth by relating each group to the terms of the model. The proportions of FD and NVB women are modeled by the zero inflation ($p_0$) and cure ($p_1$) terms, respectively. The VB group can be considered as having positive and uncensored times, which are modeled by the Log-Normal baseline distribution ($S_0$). Therefore, it can be assessed whether women's characteristics are associated with time until childbirth and with the propensity of presenting a previous fetal death or a non-vaginal birth.
In order to study the effect on childbirth time of two independent variables, educational level and parity, we fitted three Log-Normal ZIC models. In the first one, we inserted only the educational level into all the model parameters. In the second, only parity was inserted as a covariate. Finally, we fitted a model with the two covariates together, by running a forward selection procedure with the Akaike Criterion (AIC) as the insertion criterion. Additionally, in order to evaluate how more traditional models fit our data, we also fitted these models considering the Weibull distribution and the models without cure. Both independent variables are dichotomous: the educational level assumes 0 for lower levels (complete primary education or less) and 1 for higher levels (incomplete secondary education or more), and parity assumes 0 for nulliparous and 1 for multiparous. Table 4 presents the AIC of the fitted models without and with independent variables. The results indicate that the Log-Normal ZIC model obtained by the selection method is the most adequate to fit the data, as its AIC value is the lowest.
Table 4.
Akaike criterion (AIC) values for the zero-inflated models with cure (ZIC) and without cure (ZI), assuming Log-Normal and Weibull distributions: without covariates, with educational level, with parity, and with the covariates chosen by forward selection.
| No covariate | Educational Level | Parity | Forward | |
|---|---|---|---|---|
| Log-Normal ZIC | 30745.8 | 30702.6 | 30288.4 | 30248.88 |
| Weibull ZIC | 31195.0 | 31182.6 | 30905.9 | 31305.31 |
| Log-Normal ZI | 30756.1 | 30713.0 | 30301.0 | 30263.10 |
| Weibull ZI | 31364.6 | 31328.9 | 31081.9 | 31303.32 |
Figure 2(a) presents the Kaplan-Meier (KM) and fitted survival curves. Furthermore, Tables 5 and 6 present the estimated parameters and estimated survival of the Log-Normal ZIC regression model considering educational level and parity as covariates after variable selection. In this model, parity was inserted as a covariate in the parameter μ and educational level in the zero-inflation ($p_0$) and σ parameters. In general, the model is well fitted when compared to the KM curves.
Figure 2.
Results of Log-Normal ZIC model by educational level and parity: fitted survival curves with Kaplan-Meier estimates (a) and fitted hazard function (b). (a) Survival. (b) Hazard.
Table 5.
MLE, standard deviation (SD) and 95% CI of Log-Normal ZIC regression model parameters with educational level and parity as covariates.
| Parameter | Coefficient | MLE | SD | CI (95%) lower | CI (95%) upper |
|---|---|---|---|---|---|
| Zero inflation ($p_0$) | Intercept | −2.314 | 0.125 | −2.56 | −2.07 |
| | Educational level | −0.844 | 0.140 | −1.12 | −0.57 |
| Cure ($p_1$) | Intercept | −3.220 | 0.267 | −3.74 | −2.70 |
| μ | Intercept | 2.539 | 0.028 | 2.48 | 2.59 |
| | Parity | −0.628 | 0.029 | −0.69 | −0.57 |
| σ | Intercept | −0.100 | 0.033 | −0.16 | −0.04 |
| | Educational level | 0.106 | 0.034 | 0.04 | 0.17 |
Table 6.
Survival (%) estimates at admission (Adm) and at 6, 12 and 18 h.
| Group | Adm | 6 h | 12 h | 18 h |
|---|---|---|---|---|
| Parity = 0, low education | 91.32 | 73.37 | 49.51 | 34.15 |
| Parity = 0, high education | 96.07 | 74.94 | 51.87 | 37.27 |
| Parity > 0, low education | 91.32 | 52.01 | 26.59 | 15.75 |
| Parity > 0, high education | 96.07 | 54.23 | 29.93 | 18.93 |
The coefficients and CI presented in Table 5 allow us to interpret how each variable changes the pattern of childbirth. In the first place, it can be observed that the coefficients of educational level in the zero-inflation proportion and in σ have strictly negative and strictly positive CI, respectively. Therefore, there is evidence that women with a higher education have a smaller probability of arriving at hospital with fetal death and a higher variability of time until vaginal childbirth than women with a lower education. The estimated fetal death proportion (zero inflation) is equal to 8.68% and 3.93% when the educational level is low and high, respectively. On the other hand, the estimated values of cure proportions are 3.51% and 3.69% for low and high educational levels, respectively. Concerning parity, the model suggests that this variable affects the average duration of labor. In addition, the CI for the parity coefficient in μ is strictly negative, indicating that multiparous women have a shorter mean time than nulliparous women. Six hours after admission, about 73% and 75% of nulliparous women with low and high education, respectively, still have the possibility of undergoing vaginal birth. These proportions decrease to 52% and 54% for multiparous women (Table 6).
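As a worked check of these figures, the following Python sketch recomputes the fitted survival from the point estimates in Table 5. The function name `fitted_survival` is hypothetical, and the sketch assumes the multinomial logistic link for the two proportions, an identity link for μ and a log link for σ; under these assumptions the values of Table 6 are recovered up to rounding of the coefficients.

```python
import math

# Point estimates copied from Table 5
B_P0 = (-2.314, -0.844)    # zero inflation: intercept, educational level
B_P1 = (-3.220,)           # cure: intercept only
B_MU = (2.539, -0.628)     # mu: intercept, parity
B_SIGMA = (-0.100, 0.106)  # log(sigma): intercept, educational level

def fitted_survival(hours, parity, education):
    """Fitted population survival (as a fraction) for one covariate profile.

    parity    : 0 = nulliparous, 1 = multiparous
    education : 0 = low, 1 = high
    """
    e0 = math.exp(B_P0[0] + B_P0[1] * education)
    e1 = math.exp(B_P1[0])
    p0 = e0 / (1.0 + e0 + e1)  # estimated fetal death proportion
    p1 = e1 / (1.0 + e0 + e1)  # estimated cure proportion
    if hours == 0:
        return 1.0 - p0        # survival at admission
    mu = B_MU[0] + B_MU[1] * parity
    sigma = math.exp(B_SIGMA[0] + B_SIGMA[1] * education)
    z = (math.log(hours) - mu) / sigma
    s0 = 0.5 * math.erfc(z / math.sqrt(2.0))  # Log-Normal baseline survival
    return p1 + (1.0 - p0 - p1) * s0

# Survival (%) of nulliparous women with low education, 6 h after admission
six_hours = 100.0 * fitted_survival(6, parity=0, education=0)
```

The survival at admission depends only on educational level because parity enters the fitted model through μ alone, which explains the repeated values in the "Adm" column of Table 6.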
Finally, Figure 2(b) presents the baseline hazard function, which reflects the instantaneous rate of occurrence of vaginal delivery for susceptible women (VB group). We can observe that the hazard shape is unimodal for all parity and educational level categories, which means that the hazard initially increases and, after reaching a maximum, decreases. Parity presents the greater difference between the categories, as multiparous women have a higher hazard than nulliparous women over time. The maximum instantaneous rate is observed at 9.9 and 7.7 h for nulliparous women with lower and higher education, respectively. For multiparous women, the maximum instantaneous rates are observed at 5.2 and 4.1 h.
5. Final remarks
In this paper, we presented the Log-Normal zero-inflated cure model as a new survival method. To the best of our knowledge, there is no previous literature that presents the use of Log-Normal distribution in the context of zero-inflated survival models. Our model includes an important characteristic of childbirth times in developing countries: a proportion of times equal to zero due to a fetal death. Therefore, the model allowed us to estimate the proportions of three groups of women in a given dataset: the one where time is equal to zero (fetal death); a segment of those who are susceptible to the event of interest (vaginal childbirth); and a segment of those who are not susceptible to the event (non vaginal birth).
The simulation study enabled us to assess the performance of the MLEs and confidence intervals. The performance of the MLEs is satisfactory even for the smallest sample size evaluated (n = 100) and it improves for larger samples, given that bias and variance decrease as the sample size increases. High proportions of zero inflation and censoring improve the estimates of the zero-inflation and cure parameters, but have the opposite effect on the estimates of the parameters linked to the distribution of the baseline survival.
We presented the application of the proposed model to a dataset of sub-Saharan African women. Our model presented good results in the estimation of the survival with two independent dichotomous covariates: educational level and parity. The model enabled us to verify that a high educational level is related to a decreased fetal death rate, in contrast to parity, which is related to a difference in the mean time for non-operative labor. Thus, we show that the model is an important tool for investigating the course of the childbirth process, from the moment that a woman arrives in the facility until the time of childbirth. From a clinical perspective, these findings may not be unexpected; however, they draw attention to the need to further explore interventions to reduce the risk of fetal death among women with lower levels of education.
As mentioned above, the class of survival models that consider the possibility of times equal to zero has been underexplored. Thus, we believe that it is necessary to investigate the properties of this class considering, for example, more general distributions and the possibility of competing risks. Regarding the study of labor itself, it is known to be a multifactorial event; thus it would be interesting to evaluate methods to select variables and also to consider unobservable factors through a frailty term.
It is important to highlight that our model has several other possibilities of application besides the study of the childbirth time presented here. There are different situations in which databases present a proportion of the times that are equal to zero. For example, in oncology, when there is interest in the time between diagnosis and the occurrence of metastasis, it is possible that patients already present metastasis at the time of diagnosis. In the financial area, it is observed that a proportion of clients do not pay their loans from the beginning of the contract, leading to times of default equal to zero. In general, the times equal to zero are excluded from the analysis, because the standard survival models do not allow it. As the ZIC models are recent, we believe that it is very important to evaluate the properties for different baseline distributions. This will help future research to fully explore the data with more flexibility.
Acknowledgments
This manuscript reports on a secondary analysis of the World Health Organization BOLD Project database. This project was implemented by WHO to further understand the patterns of labor progression in sub-Saharan African women and to contribute to the development of evidence-based guidelines to improve women's experience with labor and childbirth. The UNDP-UNFPA-UNICEF-WHO-World Bank Special Programme of Research, Development and Research Training in Human Reproduction (HRP), a cosponsored program executed by the World Health Organization (WHO) provided access to the BOLD data set for this analysis. The named authors alone are responsible for the views expressed in this publication. The authors would like to thank Professor João Paulo Souza for comments provided in early versions of this manuscript.
Appendix 1. Sample selection flow chart.
Figure A1.
Sample selection flow chart.
Funding Statement
The research was partially sponsored by the following Brazilian Funding Agencies: FAEPA, CAPES, CNPq and FAPESP.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Barreto-Souza W., Long-term survival models with overdispersed number of competing causes, Comput. Stat. Data. Anal. 91 (2015), pp. 51–63. [Google Scholar]
- 2.Beesley L.J. and Taylor J.M., Em algorithms for fitting multistate cure models, Biostatistics 20 (2019), pp. 416–432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Berkson J. and Gage R.P., Survival curve for cancer patients following treatment, J. Am. Stat. Assoc. 47 (1952), pp. 501–515. [Google Scholar]
- 4.Bertoli W., Conceição K.S., Andrade M.G., and Louzada F., A bayesian approach for some zero-modified poisson mixture models, Stat. Modelling. 20 (2020), pp. 467–501. [Google Scholar]
- 5.Boag J.W., Maximum likelihood estimates of the proportion of patients cured by cancer therapy, J. R. Stat. Soc. Ser. B (Methodological) 11 (1949), pp. 15–53. [Google Scholar]
- 6.Braekers R. and Grouwels Y., A semi-parametric Cox's regression model for zero-inflated left-censored time to event data, Commun. Stat. Theor. Meth. 45 (2016), pp. 1969–1988. [Google Scholar]
- 7.Bremhorst V. and Lambert P., Flexible estimation in cure survival models using Bayesian P-splines, Comput. Stat. Data Anal. 93 (2016), pp. 270–284. [Google Scholar]
- 8.Calsavara V.F., Rodrigues A.S., Rocha R., Louzada F., Tomazella V., Souza A.C., Costa R.A., and Francisco R.P., Zero-adjusted defective regression models for modeling lifetime data, J. Appl. Stat. 46 (2019), pp. 2434–2459. [Google Scholar]
- 9.Calsavara V.F., Rodrigues A.S., Tomazella V.L.D., and de Castro M., Frailty models power variance function with cure fraction and latent risk factors negative binomial, Commun. Stat. Theor. Meth. 46 (2017), pp. 9763–9776. [Google Scholar]
- 10.Cancho V.G., Louzada F., Dey D.K., and Barriga G.D., A new lifetime model for multivariate survival data with a surviving fraction, J. Stat. Comput. Simul. 86 (2016), pp. 279–292. [Google Scholar]
- 11.Chen M.-H., Ibrahim J.G., and Sinha D., A new Bayesian model for survival data with a surviving fraction, J. Am. Stat. Assoc. 94 (1999), pp. 909–919. [Google Scholar]
- 12.de Oliveira M.R., Moreira F., and Louzada F., The zero-inflated promotion cure rate model applied to financial data on time-to-default, Cogent Econom. Financ. 5 (2017), pp. 1395950. [Google Scholar]
- 13.Gallardo D.I., Bolfarine H., and Pedroso-de Lima A.C., Destructive weighted poisson cure rate models with bivariate random effects: classical and Bayesian approaches, Comput. Stat. Data. Anal. 98 (2016), pp. 31–45. [Google Scholar]
- 14.Gallardo D.I., Gómez Y.M., and de Castro M., A flexible cure rate model based on the polylogarithm distribution, J. Stat. Comput. Simul. 88 (2018), pp. 2137–2149. [Google Scholar]
- 15.Gonzales J.F.B., Tomazella V.L.D., and Taconelli J.P., Estimação paramétrica do modelo de mistura com fragilidade gama na presença de covariáveis, Rev. Bras. Biom. 1 (2013), pp. 233–247. [Google Scholar]
- 16.Lambert D., Zero-inflated poisson regression, with an application to defects in manufacturing, Technometrics 34 (1992), pp. 1–14. [Google Scholar]
- 17. Lawless J.F., Statistical Models and Methods for Lifetime Data, Vol. 362, John Wiley & Sons, Hoboken, NJ, 2011.
- 18. Leão J., Bourguignon M., Gallardo D.I., Rocha R., and Tomazella V., A new cure rate model with flexible competing causes with applications to melanoma and transplantation data, Stat. Med. 39 (2020), pp. 3272–3284.
- 19. Liu L., Shih Y.-C.T., Strawderman R.L., Zhang D., Johnson B.A., and Chai H., Statistical analysis of zero-inflated nonnegative continuous data: a review, Stat. Sci. 34 (2019), pp. 253–279.
- 20. Louzada F., Moreira F.F., and de Oliveira M.R., A zero-inflated non default rate regression model for credit scoring data, Commun. Stat. Theor. Meth. 47 (2018), pp. 3002–3021.
- 21. Louzada-Neto F., Extended hazard regression model for reliability and survival analysis, Lifetime Data Anal. 3 (1997), pp. 367–381.
- 22. Marinho A.R. and Loschi R.H., Bayesian cure fraction models with measurement error in the scale mixture of normal distribution, Stat. Methods Med. Res. 29 (2020), pp. 2411–2444.
- 23. Martinez E.Z., Achcar J.A., and Icuma T.R., Bivariate Basu-Dhar geometric model for survival data with a cure fraction, Electron. J. Appl. Stat. Anal. 11 (2018), pp. 655–673.
- 24. Neelon B., O'Malley A.J., and Smith V.A., Modeling zero-modified count and semicontinuous data in health services research part 1: background and overview, Stat. Med. 35 (2016), pp. 5070–5093.
- 25. Oladapo O.T., Souza J.P., Bohren M.A., Fawole B., Mugerwa K., and Gülmezoglu A.M., WHO better outcomes in labour difficulty (BOLD) project: innovating to improve quality of care around the time of childbirth, Reprod. Health 12 (2015), pp. 48.
- 26. Oladapo O.T., Souza J.P., Fawole B., Mugerwa K., Perdoná G., Alves D., Souza H., Reis R., Oliveira-Ciabati L., Maiorano A., Akintan A., Alu F.E., Oyeneyin L., Adebayo A., Byamugisha J., Nakalembe M., Idris H.A., Okike O., Althabe F., Hundley V., Donnay F., Pattinson R., Sanghvi H.C., Jardine J.E., Tunçalp Ö., Vogel J.P., Stanton M.E., Bohren M., Zhang J., Lavender T., Liljestrand J., ten Hoope-Bender P., Mathai M., Bahl R., Gülmezoglu A.M., and Persson L., Progression of the first stage of spontaneous labour: a prospective cohort study in two sub-Saharan African countries, PLoS Med. 15 (2018), pp. e1002492.
- 27. Ospina R. and Ferrari S.L., A general class of zero-or-one inflated beta regression models, Comput. Stat. Data Anal. 56 (2012), pp. 1609–1623.
- 28. Perdoná G.C. and Louzada-Neto F., A general hazard model for lifetime data in the presence of cure rate, J. Appl. Stat. 38 (2011), pp. 1395–1405.
- 29. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2017.
- 30. Ramires T.G., Hens N., Cordeiro G.M., and Ortega E.M., Estimating nonlinear effects in the presence of cure fraction using a semi-parametric regression model, Comput. Stat. 33 (2018), pp. 709–730.
- 31. Rondeau V., Schaffner E., Corbière F., Gonzalez J.R., and Mathoulin-Pélissier S., Cure frailty models for survival data: application to recurrences for breast cancer and to hospital readmissions for colorectal cancer, Stat. Methods Med. Res. 22 (2013), pp. 243–260.
- 32. Ross S.M., Introduction to Probability Models, Academic Press, Los Angeles, CA, 2014.
- 33. Scudilio J., Calsavara V.F., Rocha R., Louzada F., Tomazella V., and Rodrigues A.S., Defective models induced by gamma frailty term for survival data with cured fraction, J. Appl. Stat. 46 (2019), pp. 484–507.
- 34. Souza J.P., Oladapo O.T., Bohren M.A., Mugerwa K., Fawole B., Moscovici L., Alves D., Perdona G., Oliveira-Ciabati L., Vogel J.P., Tunçalp Ö., Zhang J., Hofmeyr J., Bahl R., and Gülmezoglu A.M., The development of a simplified, effective, labour monitoring-to-action (SELMA) tool for better outcomes in labour difficulty (BOLD): study protocol, Reprod. Health 12 (2015), pp. 49.
- 35. Su C.-L. and Lin F.-C., Analysis of clustered failure time data with cure fraction using copula, Stat. Med. 38 (2019), pp. 3961–3973.
- 36. Vahratian A., Zhang J., Troendle J.F., Sciscione A.C., and Hoffman M.K., Labor progression and risk of cesarean delivery in electively induced nulliparas, Obstet. Gynecol. 105 (2005), pp. 698–704.
- 37. Zhang J., Landy H.J., Branch D.W., Burkman R., Haberman S., Gregory K.D., Hatjis C.G., Ramirez M.M., Bailit J.L., Gonzalez-Quintero V.H., and Hibbard J.U., Contemporary patterns of spontaneous labor with normal neonatal outcomes, Obstet. Gynecol. 116 (2010), pp. 1281.