ABSTRACT
In reliability and survival analysis, the hazard function plays a central role because it describes the instantaneous failure rate at any time point. In practice, an abrupt change in the hazard function may occur at an unknown time point after a maintenance activity or major operation. Under these circumstances, identifying the change point and estimating the size of the change are of practical interest. In this paper, we assume that the hazard function is piecewise constant with a single jump at an unknown time. We propose a single change-point model for interval-censored survival data with a cure fraction. Estimation methods for the proposed model are investigated, and large-sample properties of the estimators are established. Simulation studies are carried out to evaluate the performance of the estimation method. Liver cancer data and breast cancer data are analyzed as applications.
KEYWORDS: Survival analysis, interval censoring, change-point hazard model, cure fraction, pseudo-maximum likelihood
1. Introduction
The change-point problem for distributions arises in quality control and has recently received much attention. In survival analysis, it is of great importance to detect a time lag in a treatment effect or to identify a change point in a hazard function. In medical follow-up studies, after a major operation such as liver transplantation or bone marrow transplantation, the initial risk is usually high and then drops to a lower, constant long-term risk. In this case, a hazard function with a change point is commonly employed:
| $\lambda(t) = \beta + \theta I(t > \tau)$ | (1) |
where β, β + θ and τ are positive constants. In this model, I(·) is the indicator function of an event, and β and β + θ are the hazard rates before and after the change point τ, respectively. The jump θ can be either positive or negative, reflecting an increase or decrease in the hazard rate. There are three main lines of research on this model in the literature. The first fits the model by maximum-likelihood estimation (MLE) (see [3,13,15,17]). The second tests for the existence of the change point, as considered by [10,16]. The third estimates the change-point location from the structural properties of (1), as explored by [2,6]; these estimation procedures require a certain function of the estimated cumulative hazard function.
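As a small illustration of model (1), the sketch below evaluates the piecewise-constant hazard together with its cumulative hazard and survival function. It assumes the indicator form β + θ·I(t > τ); the function names and the parameter values in the comments are illustrative and not taken from the paper.

```python
import math

def hazard(t, beta, theta, tau):
    """Hazard of model (1): beta up to tau, beta + theta afterwards."""
    return beta + (theta if t > tau else 0.0)

def cumulative_hazard(t, beta, theta, tau):
    """Integral of the hazard from 0 to t, which is piecewise linear in t."""
    return beta * t + theta * max(t - tau, 0.0)

def survival(t, beta, theta, tau):
    """Survival function exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, beta, theta, tau))

# Hypothetical setting: high early risk (2.0) dropping to 0.5 after tau = 1.
early_risk = hazard(0.5, 2.0, -1.5, 1.0)
late_risk = hazard(2.0, 2.0, -1.5, 1.0)
```

A negative θ, as in the commented setting, corresponds to the post-operative situation described above, where the risk falls to a lower long-term level.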
For standard survival models, we generally assume that all the patients will die from the event of interest. However, in clinical studies, a substantial proportion of patients who respond favorably to the treatment appear subsequently to be free of any signs or symptoms of the disease and may be considered cured, while the remaining patients may eventually relapse. The standard survival models may be inappropriate for this type of data. To deal with this situation, the cure model was proposed by [1] as follows:
| $S(t) = 1 - p + p\,S^{*}(t)$ | (2) |
where S(t) is the proportion of people surviving at any given time t, p is the proportion that is not cured, and $S^{*}(t)$ represents the survival function of the uncured people. Several methods have been proposed for cure models, such as the expectation-maximization algorithm and Markov chain Monte Carlo methods [1,19,27]. Survival models with a cure fraction have been studied extensively for decades and many applications have been reported [14], but these models do not consider possible change-point phenomena. In reality, cured patients may well exist in change-point situations. For example, the nonlymphoblastic leukemia data studied in [15] were shown to exhibit an abrupt change in the recurrence rate. At the same time, the Kaplan–Meier estimator of the survival function levels off at a value strictly between 0 and 1, indicating the presence of cured patients who will never suffer a relapse of leukemia. Inspired by this, we pursue the change-point model (1) with the possible presence of a cure fraction.
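The plateau behaviour of a cure model can be checked numerically. The sketch below assumes the standard mixture form of (2), in which the population survival is (1 − p) plus p times the survival function of the uncured; the exponential choice for the uncured and all names are hypothetical.

```python
import math

def population_survival(t, p, uncured_survival):
    """Mixture cure model: the cured fraction (1 - p) never fails."""
    return (1.0 - p) + p * uncured_survival(t)

# Hypothetical uncured survival: exponential with rate 1.
uncured = lambda t: math.exp(-t)

start = population_survival(0.0, 0.8, uncured)     # everyone alive at t = 0
plateau = population_survival(50.0, 0.8, uncured)  # approaches 1 - p
```

With p = 0.8 the curve levels off near 0.2 rather than decaying to 0, which is the signature of a cure fraction in a survival plot.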
Interval censoring is an increasingly common type of censoring, and there is a voluminous literature on the statistical analysis of interval-censored failure time data (see [9,23,24]). In these studies, we only know that the event occurs within a time interval determined by the follow-up visits. In some situations, interval censoring results from data processing. The liver cancer data in our real data analysis are from the Surveillance, Epidemiology, and End Results (SEER) cancer incidence public-use database. SEER publishes only the processed survival month. Thus, we cannot obtain day-level information, let alone the exact event time. The survival month (T) after diagnosis is defined by
where . By the definition of survival month T, the exact event time lies in the interval or [29]. The breast cancer data in our application come from a retrospective study of 94 patients who received radiation therapy. The clinician, who determined whether or not retraction had occurred, examined each patient at a series of observation times. Hence, the time of retraction is known only to lie between the last observation time at which retraction was absent and the first at which it was present.
In this article, we focus on modeling 'case 2' interval-censored survival data with cured patients through model (1). Compared with [28], we deal with a different type of data. We replace the Kaplan–Meier method with the iterative convex minorant (ICM) method. The nonparametric maximum-likelihood estimator (NPMLE) computed by ICM is consistent, which in turn helps establish the consistency of the estimator of the uncured rate. Additionally, we need some assumptions on the interval censoring to obtain asymptotic properties, and we propose a change-point test procedure.
The rest of the article is organized as follows. In Section 2, we outline the notations and model descriptions of the single change-point hazard model with a cure fraction for interval-censored data. Details of pseudo-maximum-likelihood estimation are presented in Section 3. The procedure of the change-point test is proposed in Section 4. Asymptotic properties are investigated in Section 5. Extensive simulation results are reported in Section 6. In Section 7, we apply the proposed method to the liver cancer data and breast cancer data. Technical proofs are relegated to the supplemental material.
2. Model formulation
Under the cure model (2), let η be the indicator variable with η = 1 if the patient is uncured and η = 0 if cured, T be the failure time of a patient and $T^{*}$ be the failure time of an uncured patient. Define $p = P(\eta = 1)$. That is, p is the probability of being uncured. Then the relationship between the cumulative distribution functions (c.d.f.) of T and $T^{*}$ is
$$F(t) = p\,F^{*}(t),$$
where $F$ and $F^{*}$ are the c.d.f. of T and $T^{*}$, respectively. Correspondingly, provided $f$ and $f^{*}$ are their density functions, the hazard rate function of T is of the form
| $\lambda(t) = \dfrac{f(t)}{1 - F(t)} = \dfrac{p\,f^{*}(t)}{1 - p\,F^{*}(t)}$ | (3) |
Note that $\lambda(t) \to 0$ as $t \to \infty$ for p<1, and hence the hazard rate cannot remain constant in the presence of a cure fraction. A simple example is the exponential lifetime with cured patients, where $T^{*}$ is exponentially distributed with a constant hazard rate ψ. Then the hazard rate of T is
$$\lambda(t) = \dfrac{p\,\psi e^{-\psi t}}{1 - p + p\,e^{-\psi t}},$$
which is no longer constant.
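The exponential example can be verified numerically. The sketch below assumes the mixture relation in which the population density and c.d.f. are p times those of the uncured, so that the population hazard is p·f*(t) / (1 − p·F*(t)); function names are illustrative.

```python
import math

def mixture_hazard(t, p, uncured_density, uncured_cdf):
    """Population hazard p f*(t) / (1 - p F*(t)) under a cure fraction."""
    return p * uncured_density(t) / (1.0 - p * uncured_cdf(t))

def exponential_cure_hazard(t, p, psi):
    """Special case: uncured lifetimes are exponential with rate psi."""
    density = lambda s: psi * math.exp(-psi * s)
    cdf = lambda s: 1.0 - math.exp(-psi * s)
    return mixture_hazard(t, p, density, cdf)

# With p = 0.8 and psi = 2 the hazard starts at p * psi = 1.6 and decays to 0.
values = [exponential_cure_hazard(t, 0.8, 2.0) for t in (0.0, 1.0, 5.0)]
```

Only when p = 1 does the hazard stay at the constant level ψ; for any p < 1 the surviving population is increasingly made up of cured subjects and the hazard decays.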
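Now, assume that the hazard function of $T^{*}$ is specified as
$$\lambda^{*}(t) = \beta + \theta I(t > \tau).$$
Then the corresponding density function and c.d.f. are, respectively,
$$f^{*}(t) = \{\beta + \theta I(t > \tau)\}\exp\{-\beta t - \theta (t - \tau)^{+}\}$$
and
$$F^{*}(t) = 1 - \exp\{-\beta t - \theta (t - \tau)^{+}\},$$
where $(t - \tau)^{+} = \max(t - \tau, 0)$. By (3), the hazard function of T is
| $\lambda(t) = \dfrac{p\{\beta + \theta I(t > \tau)\}\exp\{-\beta t - \theta (t - \tau)^{+}\}}{1 - p + p\exp\{-\beta t - \theta (t - \tau)^{+}\}}$ | (4) |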
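A numerical sketch of the hazard in (4) is given below. It assumes that the uncured hazard is β + θ·I(t > τ) and that the population hazard follows from (3); under these assumptions the discontinuity at τ equals pθe^{−βτ} / (1 − p + pe^{−βτ}). Function names are hypothetical.

```python
import math

def population_hazard(t, beta, theta, tau, p):
    """Hazard of T when the uncured hazard is beta + theta * I(t > tau)."""
    s = math.exp(-beta * t - theta * max(t - tau, 0.0))  # uncured survival
    rate = beta + (theta if t > tau else 0.0)            # uncured hazard
    return p * rate * s / (1.0 - p + p * s)

def jump_at_tau(beta, theta, tau, p):
    """Right limit minus left value of the population hazard at tau."""
    s = math.exp(-beta * tau)
    return p * theta * s / (1.0 - p + p * s)

# Compare the closed form with a finite difference across tau.
eps = 1e-9
numeric_jump = (population_hazard(1.0 + eps, 1.0, 1.0, 1.0, 0.8)
                - population_hazard(1.0, 1.0, 1.0, 1.0, 0.8))
closed_form_jump = jump_at_tau(1.0, 1.0, 1.0, 0.8)
```

The jump seen in the population hazard is smaller in magnitude than θ whenever p < 1, because part of the population at risk at time τ is cured.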
There is a jump at τ of size
$$\lambda(\tau+) - \lambda(\tau) = \dfrac{p\,\theta e^{-\beta\tau}}{1 - p + p\,e^{-\beta\tau}},$$
which is increasing with respect to p if θ > 0 and decreasing if θ < 0, and it reaches its maximum (minimum) value θ at p=1 for the case of θ > 0 (θ < 0). Apparently, there are two different mathematical forms before and after the point τ, and . As is common in change-point models, we suppose the existence of bounds and such that (see [2,6,16]).

In medical research, one often has to deal with interval-censored survival data when patients are assessed only at pre-scheduled visits. If the event has not occurred at one visit but has occurred by the following visit, the time T is known within an interval. Following the usual formulation, let , be a sample of random variables (r.v.) . We postulate , are non-negative, independent with , following a joint distribution function H, and for some positive constant c. Moreover, we assume that H has a density h, satisfying
| (5) |
Let , denote the data for subjects, where , , and are the censoring indicators and satisfy that when is left censored (), when is right censored (), and when is interval-censored (). Note that for all i. The observed likelihood function of the parameters β, θ, p and τ under the model (4) for interval-censored data is given by
| (6) |
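To make the structure of an interval-censored likelihood concrete, the sketch below evaluates a log-likelihood under the change-point cure model. It is written under stated assumptions rather than as a transcription of (6): each subject contributes F(R) − F(L) for an observed interval (L, R], a left-censored subject has L = 0, a right-censored subject has R = ∞ and contributes 1 − F(L), and F(t) is p times the uncured c.d.f. implied by a hazard of β + θ·I(t > τ). All names are hypothetical.

```python
import math

def population_cdf(t, beta, theta, tau, p):
    """F(t) = p * F*(t); F(inf) is set to 1 so that a right-censored
    interval (L, inf) contributes 1 - F(L), cured subjects included."""
    if math.isinf(t):
        return 1.0
    return p * (1.0 - math.exp(-beta * t - theta * max(t - tau, 0.0)))

def log_likelihood(intervals, beta, theta, tau, p):
    """Sum of log{F(R_i) - F(L_i)} over observed intervals (L_i, R_i]."""
    total = 0.0
    for left, right in intervals:
        mass = (population_cdf(right, beta, theta, tau, p)
                - population_cdf(left, beta, theta, tau, p))
        total += math.log(mass)
    return total

# Hypothetical data: one left-censored, one interval-censored, one right-censored.
data = [(0.0, 0.5), (0.5, 1.5), (2.0, math.inf)]
value = log_likelihood(data, 1.0, 0.5, 1.0, 0.8)
```

Encoding the three censoring types through the endpoints of a single interval avoids carrying separate censoring indicators, at the cost of treating ∞ as a special endpoint.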
3. Pseudo-maximum-likelihood estimation
Following the formulation in Section 2, the log-likelihood function based on the observed data can be written as
where with and is the log-likelihood of a single observation and given by
| (7) |
From (7), it is clear that the log-likelihood function is not continuous in τ. Hence, the usual sufficient conditions for consistency are not met, and classical maximum-likelihood estimation (MLE) is not appropriate. We therefore resort to the pseudo-likelihood approach, which overcomes this problem.
The pseudo-likelihood approach was proposed by [7] and further studied by others, including [11,12]. The key idea is to replace the true (but unknown) 'nuisance' parameters p and τ in (7) by consistent estimators $\hat{p}$ and $\hat{\tau}$, and then to treat the resulting log-likelihood function, called the pseudo log-likelihood function, as a usual likelihood function of β and θ to obtain the pseudo-MLE of (β, θ).
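The two-stage idea can be sketched as follows: the nuisance parameters are fixed at preliminary estimates, and only β and θ are optimized. The code assumes the interval-censored contribution F(R) − F(L) with F(t) equal to p times the uncured c.d.f., and it uses a plain grid search as a stand-in for a numerical optimizer. The names `tau_hat`, `p_hat` and the grids are hypothetical.

```python
import math

def pseudo_log_likelihood(intervals, beta, theta, tau_hat, p_hat):
    """Log-likelihood in (beta, theta) with p and tau fixed at estimates."""
    def cdf(t):
        if math.isinf(t):
            return 1.0  # cured subjects never fail, so P(T <= inf) = 1
        return p_hat * (1.0 - math.exp(-beta * t - theta * max(t - tau_hat, 0.0)))
    return sum(math.log(cdf(right) - cdf(left)) for left, right in intervals)

def pseudo_mle(intervals, tau_hat, p_hat, beta_grid, theta_grid):
    """Grid search over (beta, theta), keeping both hazard levels positive."""
    best = None
    for beta in beta_grid:
        for theta in theta_grid:
            if beta <= 0.0 or beta + theta <= 0.0:
                continue  # outside the parameter space of the hazard model
            value = pseudo_log_likelihood(intervals, beta, theta, tau_hat, p_hat)
            if best is None or value > best[0]:
                best = (value, beta, theta)
    return best  # (maximized value, beta, theta)

# Hypothetical intervals (L, R]; math.inf marks right censoring.
data = [(0.0, 0.5), (0.5, 1.0), (0.2, 0.9), (1.0, 2.0), (1.5, math.inf), (2.0, math.inf)]
beta_grid = [0.5, 1.0, 1.5]
theta_grid = [-0.4, 0.0, 0.5, 1.0]
result = pseudo_mle(data, 1.0, 0.8, beta_grid, theta_grid)
```

Because τ is held fixed, the objective is a smooth function of (β, θ), so in practice the grid could be replaced by any standard gradient-based optimizer.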
The consistent estimators of τ and p can be obtained by nonparametric methods as follows. As in [14], let
| (8) |
where , and denotes the nonparametric estimate of the c.d.f. of failure times which is achieved by the Iterative Convex Minorant (ICM) algorithm. The ICM algorithm proposed by [9] is fast in computing the nonparametric maximum-likelihood estimation (NPMLE) of the distribution function for interval-censored data without covariates. We will show that is consistent in Section 5.
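For intuition about a nonparametric estimate of the c.d.f. from interval-censored data, the sketch below uses a Turnbull-style self-consistency (EM) iteration. This is a simpler substitute chosen for illustration and is not the ICM algorithm; both target the same NPMLE, but ICM is typically faster. The plug-in for p is taken here as the estimated mass on finite times, which is an assumption made for this sketch and not a transcription of (8). All names are hypothetical.

```python
import math

def self_consistent_npmle(intervals, n_iter=500):
    """Self-consistency (EM) estimate of the masses on candidate points.

    Candidate support points are the distinct right endpoints (math.inf
    for right-censored intervals). Point s is compatible with an
    observation (left, right] when left < s <= right.
    """
    points = sorted({right for _, right in intervals})
    compat = [[left < s <= right for s in points] for left, right in intervals]
    weights = [1.0 / len(points)] * len(points)
    n = len(intervals)
    for _ in range(n_iter):
        new = [0.0] * len(points)
        for row in compat:
            denom = sum(w for w, ok in zip(weights, row) if ok)
            for j, ok in enumerate(row):
                if ok:
                    # Share of this observation attributed to point j.
                    new[j] += weights[j] / denom / n
        weights = new
    return points, weights

def uncured_fraction(points, weights):
    """Estimated mass on finite times, used here as a plug-in for p."""
    return sum(w for s, w in zip(points, weights) if not math.isinf(s))

# Hypothetical data: three failures observed by time 1, one right-censored.
pts, wts = self_consistent_npmle([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (1.0, math.inf)])
p_plugin = uncured_fraction(pts, wts)
```

In this toy data set the estimate places mass 0.75 on finite times, matching the observed proportion of failures.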
To develop the estimator of τ, note that the cumulative hazard function of T can be obtained by
Let
which is a piecewise linear function of t. Further define
| (9) |
for 0<t<D, where and , . Then we have
| (10) |
which is increasing (decreasing) on and decreasing (increasing) on for . Let be the empirical version of (9) with unknown cumulative hazard function and p replaced by the estimator and in (8), respectively.
Then an estimator of τ is given by
| (11) |
In the absence of a cure fraction, Chang et al. [2] showed that is a consistent estimator of τ, with . Using the asymptotic properties of and , we can also establish the consistency of .
4. Change-point test
In this section, we propose a test to determine whether there is a change point in the hazard function for the data. We test the null hypothesis : or which means there is no change point in the survival distribution versus the alternative hypothesis that there is one change point. Following the results of [21,25,26], together with our model (4), we propose two modified test statistics:
where and are estimators of the parameters corresponding to the model (4) obtained by maximizing the pseudo-likelihood , is the MLE when or , is obtained from (8) and are equally spaced points in the interval . The asymptotic distributions of and under the null hypothesis are complicated. Hence, we apply a resampling procedure to obtain the critical values under , taking as an example:
Step 1. Calculate by (8), and by maximizing .
Step 2. Obtain a series of observation times , m ≤ 2n, by arranging all the points in the set in increasing order and removing the repeated points.
Step 3. Generate the failure time data by the model (4) with , , and . Obtain the interval-censored data by setting if .
Step 4. Generate a total of B (e.g. B = 500) simulated trials by repeating Step 3. Obtain the likelihood ratio statistics for each trial.
Step 5. Reject the null hypothesis if , the likelihood ratio statistic calculated from the original trial, is larger than the 95% percentile of .
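The decision rule in Steps 4 and 5 can be sketched generically. The code below takes a user-supplied function that simulates one trial under the null model and returns its test statistic, then compares the observed statistic with the empirical 95th percentile. The function names are hypothetical, and the uniform null statistic in the example is a placeholder with no connection to the paper's statistics.

```python
import math
import random

def resampling_critical_value(simulate_null_statistic, n_trials=500, level=0.05, seed=0):
    """Empirical (1 - level) quantile of the statistic over simulated null trials."""
    rng = random.Random(seed)
    stats = sorted(simulate_null_statistic(rng) for _ in range(n_trials))
    index = min(n_trials - 1, math.ceil((1.0 - level) * n_trials) - 1)
    return stats[index]

def reject_null(observed_statistic, critical_value):
    """Reject when the observed statistic exceeds the resampled critical value."""
    return observed_statistic > critical_value

# Placeholder null statistic: a uniform draw stands in for one simulated trial.
critical = resampling_critical_value(lambda rng: rng.random())
```

In an actual application the placeholder would be replaced by a function that generates interval-censored data under the fitted null model and recomputes the test statistic on each simulated data set.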
5. Asymptotic properties
We first introduce the notations. For ease of presentation, we consider the case only. Let , and . The true value of is denoted by . Write for the probability measure of and employ the abbreviation for any measurable function g. Denote as the empirical measure of observations and . Define the parameter spaces for and as
| (12) |
where , , , and are some small positive constants. The first partial derivative of with respect to is denoted by where
| (13) |
and
| (14) |
Denote by . From (13) and (14), we can easily have that
| (15) |
provided that the censoring distribution has a finite variance. Here, a vector or matrix being less than infinity means that all of its components are less than infinity. Note that is the unique point such that ; we then obtain by solving .
The main results on asymptotic properties of , , and are presented in the next five theorems, and their proofs are given in the supplemental material. In order to describe the theorems, we need to define the right extreme of by
Under assumptions of H, we can obtain that , and hence there exists a constant satisfying .
Theorem 1 Consistency of —
Suppose that 0<p<1 and F is continuous at in case . Then in probability as .
Remark 1
Theorem 1 is a modification of [14] for interval-censored data. Because the Kaplan–Meier estimation method is not appropriate for case 2 interval-censored data, in this paper the nonparametric estimate of the c.d.f. of the failure time is obtained by the ICM algorithm. The strong consistency of obtained by the ICM algorithm is proved by [9], which is sufficient for the proof of Theorem 1.
Using the consistency of and , we establish the consistency of in the following theorem.
Theorem 2 Consistency of —
Assume that F is continuous at in case and . Then the estimator of τ defined in (11) is consistent.
Theorem 3 Consistency of —
Suppose that and are obtained by (8) and (11), respectively. Then almost surely, and converges in outer probability to .
Remark 2
Note that in the following representations indicates convergence to zero in outer probability in case that the term involved is not Borel measurable.
Theorem 4 The rate of convergence of —
Under the conditions in Theorem 3, .
Theorem 5 Asymptotic normality —
Under the conditions in Theorem 3, is asymptotically normal with mean 0 and variance where and and are random vectors satisfying
Remark 3
For the asymptotic variance of in Theorem 5, a precise representation of can be found in Corollary 3.1.4 of [11] for i.i.d. setup. In this case, there exists satisfying that , which presents , where is defined by (3.1.21) in [11]. Without such a ), a closed form of is not available, but we can estimate the variance by the bootstrap method as discussed below.
Since the asymptotic variances of and are often intractable, we resort to the bootstrap method applied in [18,22]. The algorithm proceeds in three steps.
Step 1. Resample the pairs with probability 1/n at each pair . Denote the resampled data by , for some positive integer B.
Step 2. For each set of bootstrap data , , evaluate the estimates of interest.
Step 3. Calculate the sample means and standard deviations of estimators.
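The three bootstrap steps can be sketched as follows. Observations are resampled with replacement, each with probability 1/n, an estimator is recomputed on every resample, and the mean and standard deviation of the replicates are reported. The estimator used in the example (the mean left endpoint) is a placeholder and not one of the paper's estimators; all names are hypothetical.

```python
import random
import statistics

def bootstrap_summary(data, estimator, n_boot=200, seed=0):
    """Mean and standard deviation of an estimator over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Step 1: draw n observations with replacement (probability 1/n each).
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: evaluate the estimate of interest on the resampled data.
        estimates.append(estimator(resample))
    # Step 3: summarize the bootstrap replicates.
    return statistics.mean(estimates), statistics.stdev(estimates)

# Hypothetical interval data (L, R) and a placeholder estimator.
intervals = [(float(i), float(i + 1)) for i in range(20)]
mean_left = lambda sample: sum(left for left, _ in sample) / len(sample)
boot_mean, boot_sd = bootstrap_summary(intervals, mean_left)
```

Resampling whole observations keeps each subject's interval endpoints together, so the censoring pattern of the original sample is preserved in every replicate.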
6. Simulation results
To evaluate our approach, we conduct extensive simulations under a range of settings. In particular, the studies explore the influence of the value of q in (9), the jump size, the change-point location, the sample size and the censoring level. We also check the power of the test procedure. In all cases, we estimate p by (8) and τ by (11), and then obtain the estimates of β and θ.
The study considers data simulated from the hazard function defined in (4), for which the corresponding distribution function is
$$F(t) = p\,[1 - \exp\{-\beta t - \theta \max(t - \tau, 0)\}],$$
where p=0.8, and . To check the influence of the jump size, we fix τ = 1 and set θ ∈ {0.5, 1, 1.5}. We set τ ∈ {0.1, 0.5, 1, 1.5} with θ=1 to see the effect of the change-point location. The change-point search range is set to . There are six parameter configurations in total. We also let for in (9) to assess the influence of q.

Each is generated by solving numerically, where . The total number of visit times for each subject is generated as 1 plus a Poisson random variable with mean parameter σ. The first observation times are sampled from where b is a positive constant. The gap times between adjacent visits are sampled from an exponential distribution with mean 2. Subsequently, the visit times are given by the cumulative sums of the gap times. The observed interval for the ith subject is then determined by the two consecutive observation times whose interval contains , with the convention that if is less (greater) than the smallest (largest) observation time then the lower (upper) bound of the observed interval is 0 (∞). Different censoring levels can be obtained by adjusting b and σ. We consider four left and right censoring levels (CL): (30%, 30%), (20%, 20%), (0, 50%) and (0, 30%). For the purposes of this study, 1000 data sets of the form are generated for each considered parameter configuration where .
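The data-generating mechanism can be sketched as below, under several labeled assumptions. The failure time is drawn by inverting the cumulative hazard in closed form, which is equivalent to solving the c.d.f. equation numerically. A subject is cured, with an infinite failure time, with probability 1 − p. The first visit time is assumed to be Uniform(0, b), since the distribution of the first observation time is not recoverable from the text. All function names are hypothetical.

```python
import math
import random

def sample_failure_time(rng, beta, theta, tau, p):
    """Draw T: infinite for a cured subject, otherwise from the change-point hazard."""
    if rng.random() >= p:
        return math.inf
    e = rng.expovariate(1.0)  # unit exponential equals the cumulative hazard at T
    if e <= beta * tau:
        return e / beta
    return tau + (e - beta * tau) / (beta + theta)

def sample_poisson(rng, mean):
    """Knuth's multiplication method for a Poisson draw (suitable for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_visits(rng, sigma, b, mean_gap=2.0):
    """1 + Poisson(sigma) visits; the first visit is ASSUMED Uniform(0, b)."""
    count = 1 + sample_poisson(rng, sigma)
    visits = [rng.uniform(0.0, b)]
    for _ in range(count - 1):
        visits.append(visits[-1] + rng.expovariate(1.0 / mean_gap))
    return visits

def observe(t, visits):
    """Return the interval (left, right] bracketing t; 0 and inf are the outer bounds."""
    left = 0.0
    for v in visits:
        if t <= v:
            return left, v
        left = v
    return left, math.inf

rng = random.Random(1)
sample = []
for _ in range(2000):
    t = sample_failure_time(rng, 1.0, 1.0, 1.0, 0.8)
    sample.append((t,) + observe(t, sample_visits(rng, 3.0, 1.0)))
```

Each element of `sample` holds the latent failure time followed by the observed interval; only the interval would be available to the estimation procedure.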
First, we investigate the effect of q in (9). Since the choice of q does not affect , and directly, the result of is the only assessment of q. Table 1 displays the empirical biases (bias) and sample standard deviations (SD) of under different model scenarios where , p=0.8, , and the sample size n=800. The results indicate that the proposed method with any performs reasonably well. However, smaller sample standard deviations of were obtained when q=1 in most situations. Hence, q=1 is suggested.
Table 1. Results of the estimation with , p=0.8, , and n=800. CL denotes the average rates of both left and right censoring. Denote , , and the pseudo-maximum-likelihood method with , respectively.
| Con. | CL | τ | θ | Bias | SD | Bias | SD | Bias | SD | Bias | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (30%, 30%) | 0.1 | 1 | 0.050 | 0.271 | 0.048 | 0.256 | 0.042 | 0.241 | 0.039 | 0.231 |
| 2 | (30%, 30%) | 0.5 | 1 | 0.016 | 0.387 | 0.010 | 0.251 | 0.002 | 0.161 | 0.003 | 0.111 |
| 3 | (30%, 30%) | 1 | 1 | 0.007 | 0.227 | 0.001 | 0.195 | 0.011 | 0.181 | 0.003 | 0.162 |
| 4 | (30%, 30%) | 1.5 | 1 | 0.007 | 0.172 | 0.017 | 0.157 | 0.019 | 0.168 | 0.019 | 0.168 |
| 5 | (30%, 30%) | 1 | 0.5 | 0.008 | 0.373 | 0.021 | 0.342 | 0.020 | 0.331 | 0.006 | 0.337 |
| 6 | (30%, 30%) | 1 | 1.5 | 0.016 | 0.159 | 0.006 | 0.144 | 0.004 | 0.123 | 0.001 | 0.119 |
| 7 | (20%, 20%) | 0.1 | 1 | 0.017 | 0.157 | 0.015 | 0.148 | 0.007 | 0.151 | 0.003 | 0.151 |
| 8 | (20%, 20%) | 0.5 | 1 | 0.005 | 0.213 | 0.003 | 0.118 | 0.005 | 0.104 | 0.006 | 0.096 |
| 9 | (20%, 20%) | 1 | 1 | 0.005 | 0.175 | 0.006 | 0.163 | 0.005 | 0.158 | 0.001 | 0.141 |
| 10 | (20%, 20%) | 1.5 | 1 | 0.005 | 0.144 | 0.003 | 0.145 | 0.009 | 0.126 | 0.006 | 0.119 |
| 11 | (20%, 20%) | 1 | 0.5 | 0.003 | 0.245 | 0.001 | 0.237 | 0.009 | 0.203 | 0.002 | 0.195 |
| 12 | (20%, 20%) | 1 | 1.5 | 0.007 | 0.130 | 0.007 | 0.114 | 0.002 | 0.101 | 0.005 | 0.097 |
| 13 | (0, 50%) | 0.1 | 1 | 0.125 | 0.403 | 0.124 | 0.398 | 0.121 | 0.393 | 0.115 | 0.296 |
| 14 | (0, 50%) | 0.5 | 1 | 0.017 | 0.228 | 0.005 | 0.260 | 0.018 | 0.196 | 0.003 | 0.211 |
| 15 | (0, 50%) | 1 | 1 | 0.002 | 0.308 | 0.005 | 0.288 | 0.004 | 0.287 | 0.001 | 0.270 |
| 16 | (0, 50%) | 1.5 | 1 | 0.004 | 0.220 | 0.009 | 0.218 | 0.004 | 0.199 | 0.016 | 0.216 |
| 17 | (0, 50%) | 1 | 0.5 | 0.007 | 0.374 | 0.009 | 0.378 | 0.007 | 0.378 | 0.003 | 0.367 |
| 18 | (0, 50%) | 1 | 1.5 | 0.001 | 0.193 | 0.005 | 0.189 | 0.006 | 0.172 | 0.002 | 0.157 |
| 19 | (0, 30%) | 0.1 | 1 | 0.090 | 0.304 | 0.087 | 0.299 | 0.083 | 0.293 | 0.072 | 0.282 |
| 20 | (0, 30%) | 0.5 | 1 | 0.001 | 0.133 | 0.004 | 0.098 | 0.007 | 0.093 | 0.003 | 0.089 |
| 21 | (0, 30%) | 1 | 1 | 0.011 | 0.198 | 0.001 | 0.191 | 0.011 | 0.168 | 0.008 | 0.167 |
| 22 | (0, 30%) | 1.5 | 1 | 0.006 | 0.177 | 0.003 | 0.169 | 0.007 | 0.169 | 0.001 | 0.168 |
| 23 | (0, 30%) | 1 | 0.5 | 0.001 | 0.377 | 0.001 | 0.297 | 0.006 | 0.292 | 0.007 | 0.289 |
| 24 | (0, 30%) | 1 | 1.5 | 0.002 | 0.148 | 0.001 | 0.140 | 0.004 | 0.138 | 0.001 | 0.134 |
Next, we compare our method with the MLE method suggested by [3,15,25,26]. They obtain the estimators as follows: with a fixed τ, let be the value of ξ maximizing . Then τ is estimated by
Then the maximum-likelihood estimator of is obtained as .
Table 2 displays the empirical biases and sample standard deviations of the estimators considering different configurations of model parameters from the sets , . We also set different censoring levels and consider samples of size 800. The results in Table 2 indicate that both methods perform reasonably well in estimating the model parameters for the cases investigated, except when τ = 0.1 (see the configuration sequence 1–7–13–19), since the number of failures occurring before 0.1 is too small to yield good estimators. The performances of the two methods are generally comparable, showing similar improvement or deterioration as the model parameters change. More specifically, biases and standard deviations of the estimators tend to be smaller with a larger jump size (see configuration sequences 3–5–6, 9–11–12, 15–17–18 and 21–23–24). Comparing configurations 1 to 6 with 7 to 12 in Table 2, we see that the biases and standard deviations of the estimators are smaller with a lower left and right censoring level. Similarly, configurations 13 to 24 show that the performance is better with a smaller right censoring rate. Looking at the configuration sequences 2–3–4, 8–9–10, 14–15–16 and 20–21–22, the change-point location has little influence on the estimates, provided there are enough observations on both sides of the change point. Our method also provides smaller biases than the MLE method in most cases. Further, when the jump size is small or the change point is close to 0, our method performs markedly better. Table 3 examines the influence of the sample size. As expected, biases and standard deviations decrease as the sample size increases.
Table 2. Results of the estimation with , p=0.8, , and n=800. PMLE denotes the pseudo-maximum-likelihood method with q=1 and CL denotes the level of right and left censoring.
| Con. | CL | τ | θ | | PMLE | PMLE | PMLE | PMLE | MLE | MLE | MLE | MLE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (30%, 30%) | 0.1 | 1 | Bias | 0.005 | 0.039 | 0.087 | 0.089 | 0.001 | 0.116 | 0.087 | 0.229 |
| | | | | SD | 0.022 | 0.231 | 0.433 | 0.451 | 0.020 | 0.341 | 0.496 | 0.415 |
| 2 | (30%, 30%) | 0.5 | 1 | Bias | 0.001 | 0.003 | 0.003 | 0.014 | 0.003 | 0.015 | 0.008 | 0.046 |
| | | | | SD | 0.020 | 0.111 | 0.113 | 0.222 | 0.019 | 0.093 | 0.109 | 0.229 |
| 3 | (30%, 30%) | 1 | 1 | Bias | 0.001 | 0.003 | 0.002 | 0.012 | 0.001 | 0.015 | 0.001 | 0.040 |
| | | | | SD | 0.022 | 0.111 | 0.067 | 0.298 | 0.021 | 0.125 | 0.069 | 0.320 |
| 4 | (30%, 30%) | 1.5 | 1 | Bias | 0.001 | 0.019 | 0.001 | 0.011 | 0.002 | 0.074 | 0.005 | 0.113 |
| | | | | SD | 0.023 | 0.168 | 0.062 | 0.414 | 0.020 | 0.284 | 0.075 | 0.582 |
| 5 | (30%, 30%) | 1 | 0.5 | Bias | 0.002 | 0.006 | 0.019 | 0.038 | 0.002 | 0.015 | 0.014 | 0.105 |
| | | | | SD | 0.024 | 0.337 | 0.063 | 0.237 | 0.021 | 0.261 | 0.084 | 0.337 |
| 6 | (30%, 30%) | 1 | 1.5 | Bias | 0.001 | 0.001 | 0.003 | 0.010 | 0.001 | 0.011 | 0.004 | 0.040 |
| | | | | SD | 0.019 | 0.119 | 0.066 | 0.349 | 0.019 | 0.071 | 0.062 | 0.411 |
| 7 | (20%, 20%) | 0.1 | 1 | Bias | 0.001 | 0.003 | 0.100 | 0.103 | 0.001 | 0.037 | 0.079 | 0.116 |
| | | | | SD | 0.015 | 0.151 | 0.379 | 0.363 | 0.015 | 0.185 | 0.401 | 0.371 |
| 8 | (20%, 20%) | 0.5 | 1 | Bias | 0.001 | 0.006 | 0.011 | 0.024 | 0.001 | 0.004 | 0.015 | 0.026 |
| | | | | SD | 0.015 | 0.096 | 0.091 | 0.142 | 0.016 | 0.079 | 0.092 | 0.135 |
| 9 | (20%, 20%) | 1 | 1 | Bias | 0.001 | 0.001 | 0.001 | 0.007 | 0.001 | 0.014 | 0.002 | 0.018 |
| | | | | SD | 0.015 | 0.141 | 0.063 | 0.212 | 0.015 | 0.125 | 0.066 | 0.202 |
| 10 | (20%, 20%) | 1.5 | 1 | Bias | 0.001 | 0.006 | 0.002 | 0.024 | 0.001 | 0.018 | 0.002 | 0.027 |
| | | | | SD | 0.015 | 0.119 | 0.054 | 0.236 | 0.015 | 0.159 | 0.056 | 0.263 |
| 11 | (20%, 20%) | 1 | 0.5 | Bias | 0.001 | 0.002 | 0.002 | 0.019 | 0.001 | 0.023 | 0.002 | 0.052 |
| | | | | SD | 0.015 | 0.195 | 0.061 | 0.139 | 0.015 | 0.210 | 0.063 | 0.161 |
| 12 | (20%, 20%) | 1 | 1.5 | Bias | 0.001 | 0.005 | 0.001 | 0.017 | 0.001 | 0.005 | 0.001 | 0.024 |
| | | | | SD | 0.015 | 0.097 | 0.062 | 0.247 | 0.015 | 0.097 | 0.062 | 0.229 |
| 13 | (0, 50%) | 0.1 | 1 | Bias | 0.010 | 0.115 | 0.002 | 0.065 | 0.001 | 0.130 | 0.032 | 0.252 |
| | | | | SD | 0.032 | 0.296 | 0.413 | 0.478 | 0.028 | 0.410 | 0.500 | 0.447 |
| 14 | (0, 50%) | 0.5 | 1 | Bias | 0.008 | 0.003 | 0.046 | 0.004 | 0.001 | 0.001 | 0.029 | 0.135 |
| | | | | SD | 0.033 | 0.211 | 0.119 | 0.317 | 0.029 | 0.218 | 0.123 | 0.351 |
| 15 | (0, 50%) | 1 | 1 | Bias | 0.009 | 0.001 | 0.022 | 0.044 | 0.002 | 0.002 | 0.008 | 0.251 |
| | | | | SD | 0.037 | 0.270 | 0.075 | 0.479 | 0.027 | 0.245 | 0.091 | 0.667 |
| 16 | (0, 50%) | 1.5 | 1 | Bias | 0.005 | 0.016 | 0.006 | 0.045 | 0.006 | 0.128 | 0.001 | 0.319 |
| | | | | SD | 0.035 | 0.216 | 0.085 | 0.591 | 0.032 | 0.358 | 0.092 | 0.868 |
| 17 | (0, 50%) | 1 | 0.5 | Bias | 0.002 | 0.003 | 0.014 | 0.108 | 0.009 | 0.023 | 0.003 | 0.399 |
| | | | | SD | 0.038 | 0.367 | 0.083 | 0.438 | 0.034 | 0.363 | 0.107 | 0.622 |
| 18 | (0, 50%) | 1 | 1.5 | Bias | 0.005 | 0.002 | 0.006 | 0.027 | 0.001 | 0.020 | 0.005 | 0.084 |
| | | | | SD | 0.025 | 0.157 | 0.077 | 0.440 | 0.025 | 0.134 | 0.088 | 0.655 |
| 19 | (0, 30%) | 0.1 | 1 | Bias | 0.004 | 0.072 | 0.045 | 0.034 | 0.001 | 0.120 | 0.053 | 0.073 |
| | | | | SD | 0.023 | 0.282 | 0.366 | 0.392 | 0.018 | 0.294 | 0.436 | 0.400 |
| 20 | (0, 30%) | 0.5 | 1 | Bias | 0.001 | 0.003 | 0.020 | 0.028 | 0.001 | 0.001 | 0.023 | 0.057 |
| | | | | SD | 0.019 | 0.089 | 0.091 | 0.199 | 0.019 | 0.090 | 0.101 | 0.207 |
| 21 | (0, 30%) | 1 | 1 | Bias | 0.001 | 0.008 | 0.003 | 0.045 | 0.001 | 0.007 | 0.005 | 0.082 |
| | | | | SD | 0.020 | 0.167 | 0.069 | 0.284 | 0.019 | 0.149 | 0.068 | 0.276 |
| 22 | (0, 30%) | 1.5 | 1 | Bias | 0.001 | 0.001 | 0.003 | 0.018 | 0.001 | 0.040 | 0.008 | 0.075 |
| | | | | SD | 0.020 | 0.168 | 0.062 | 0.349 | 0.019 | 0.231 | 0.069 | 0.454 |
| 23 | (0, 30%) | 1 | 0.5 | Bias | 0.003 | 0.007 | 0.012 | 0.047 | 0.001 | 0.008 | 0.012 | 0.078 |
| | | | | SD | 0.022 | 0.289 | 0.062 | 0.221 | 0.020 | 0.265 | 0.074 | 0.217 |
| 24 | (0, 30%) | 1 | 1.5 | Bias | 0.001 | 0.001 | 0.005 | 0.016 | 0.001 | 0.005 | 0.009 | 0.088 |
| | | | | SD | 0.017 | 0.134 | 0.069 | 0.301 | 0.017 | 0.098 | 0.070 | 0.416 |
Table 3. Results of the estimation with , p=0.8, , and . PMLE denotes the pseudo-maximum-likelihood method with q=1, and CL denotes the level of right and left censoring.
| Con. | CL | θ | n | | PMLE | PMLE | PMLE | PMLE | MLE | MLE | MLE | MLE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | (30%, 30%) | 0.5 | 200 | Bias | 0.001 | 0.046 | 0.081 | 0.127 | 0.006 | 0.055 | 0.035 | 0.586 |
| | | | | SD | 0.035 | 0.159 | 0.206 | 0.378 | 0.037 | 0.439 | 0.172 | 0.730 |
| | | | 400 | Bias | 0.004 | 0.011 | 0.031 | 0.105 | 0.001 | 0.020 | 0.026 | 0.214 |
| | | | | SD | 0.028 | 0.139 | 0.158 | 0.293 | 0.028 | 0.353 | 0.112 | 0.364 |
| | | | 800 | Bias | 0.002 | 0.006 | 0.019 | 0.038 | 0.002 | 0.005 | 0.014 | 0.105 |
| | | | | SD | 0.024 | 0.337 | 0.063 | 0.237 | 0.021 | 0.120 | 0.084 | 0.337 |
| 6 | (30%, 30%) | 1.5 | 200 | Bias | 0.001 | 0.003 | 0.002 | 0.089 | 0.002 | 0.021 | 0.005 | 0.497 |
| | | | | SD | 0.036 | 0.215 | 0.116 | 0.433 | 0.037 | 0.199 | 0.120 | 0.775 |
| | | | 400 | Bias | 0.002 | 0.005 | 0.001 | 0.052 | 0.001 | 0.012 | 0.005 | 0.276 |
| | | | | SD | 0.029 | 0.167 | 0.088 | 0.387 | 0.027 | 0.157 | 0.093 | 0.593 |
| | | | 800 | Bias | 0.001 | 0.001 | 0.003 | 0.010 | 0.001 | 0.011 | 0.004 | 0.040 |
| | | | | SD | 0.019 | 0.119 | 0.066 | 0.349 | 0.019 | 0.071 | 0.062 | 0.411 |
| 11 | (20%, 20%) | 0.5 | 200 | Bias | 0.002 | 0.013 | 0.012 | 0.099 | 0.003 | 0.071 | 0.017 | 0.282 |
| | | | | SD | 0.027 | 0.372 | 0.095 | 0.279 | 0.028 | 0.443 | 0.118 | 0.504 |
| | | | 400 | Bias | 0.001 | 0.024 | 0.003 | 0.043 | 0.001 | 0.069 | 0.003 | 0.122 |
| | | | | SD | 0.022 | 0.313 | 0.078 | 0.219 | 0.022 | 0.348 | 0.081 | 0.290 |
| | | | 800 | Bias | 0.001 | 0.002 | 0.002 | 0.019 | 0.001 | 0.023 | 0.002 | 0.052 |
| | | | | SD | 0.015 | 0.195 | 0.061 | 0.139 | 0.015 | 0.210 | 0.063 | 0.161 |
| 12 | (20%, 20%) | 1.5 | 200 | Bias | 0.001 | 0.014 | 0.005 | 0.035 | 0.002 | 0.057 | 0.014 | 0.318 |
| | | | | SD | 0.027 | 0.203 | 0.100 | 0.413 | 0.027 | 0.203 | 0.107 | 0.579 |
| | | | 400 | Bias | 0.001 | 0.005 | 0.006 | 0.020 | 0.002 | 0.019 | 0.007 | 0.058 |
| | | | | SD | 0.022 | 0.134 | 0.077 | 0.340 | 0.022 | 0.113 | 0.082 | 0.364 |
| | | | 800 | Bias | 0.001 | 0.005 | 0.001 | 0.017 | 0.001 | 0.005 | 0.001 | 0.024 |
| | | | | SD | 0.015 | 0.097 | 0.062 | 0.247 | 0.015 | 0.097 | 0.062 | 0.229 |
| 17 | (0, 50%) | 0.5 | 200 | Bias | 0.012 | 0.012 | 0.060 | 0.395 | 0.006 | 0.005 | 0.048 | 1.013 |
| | | | | SD | 0.053 | 0.451 | 0.167 | 0.657 | 0.052 | 0.471 | 0.208 | 1.075 |
| | | | 400 | Bias | 0.012 | 0.026 | 0.045 | 0.185 | 0.005 | 0.016 | 0.026 | 0.636 |
| | | | | SD | 0.048 | 0.451 | 0.105 | 0.512 | 0.038 | 0.416 | 0.135 | 0.856 |
| | | | 800 | Bias | 0.002 | 0.003 | 0.014 | 0.108 | 0.009 | 0.023 | 0.003 | 0.399 |
| | | | | SD | 0.038 | 0.367 | 0.083 | 0.438 | 0.034 | 0.363 | 0.107 | 0.622 |
| 18 | (0, 50%) | 1.5 | 200 | Bias | 0.007 | 0.009 | 0.001 | 0.034 | 0.001 | 0.013 | 0.011 | 0.598 |
| | | | | SD | 0.051 | 0.297 | 0.148 | 0.592 | 0.045 | 0.318 | 0.181 | 0.935 |
| | | | 400 | Bias | 0.009 | 0.002 | 0.014 | 0.029 | 0.002 | 0.002 | 0.018 | 0.386 |
| | | | | SD | 0.042 | 0.270 | 0.108 | 0.541 | 0.036 | 0.256 | 0.127 | 0.837 |
| | | | 800 | Bias | 0.005 | 0.002 | 0.006 | 0.027 | 0.001 | 0.020 | 0.005 | 0.084 |
| | | | | SD | 0.025 | 0.157 | 0.077 | 0.440 | 0.025 | 0.080 | 0.088 | 0.655 |
| 23 | (0, 30%) | 0.5 | 200 | Bias | 0.004 | 0.006 | 0.030 | 0.232 | 0.002 | 0.109 | 0.027 | 0.492 |
| | | | | SD | 0.038 | 0.356 | 0.109 | 0.434 | 0.034 | 0.414 | 0.137 | 0.614 |
| | | | 400 | Bias | 0.003 | 0.024 | 0.014 | 0.145 | 0.001 | 0.036 | 0.018 | 0.311 |
| | | | | SD | 0.027 | 0.349 | 0.094 | 0.368 | 0.026 | 0.394 | 0.121 | 0.492 |
| | | | 800 | Bias | 0.003 | 0.007 | 0.012 | 0.047 | 0.001 | 0.008 | 0.012 | 0.078 |
| | | | | SD | 0.022 | 0.289 | 0.062 | 0.221 | 0.020 | 0.265 | 0.074 | 0.217 |
| 24 | (0, 30%) | 1.5 | 200 | Bias | 0.002 | 0.015 | 0.004 | 0.023 | 0.001 | 0.019 | 0.006 | 0.351 |
| | | | | SD | 0.035 | 0.260 | 0.128 | 0.361 | 0.034 | 0.267 | 0.136 | 0.748 |
| | | | 400 | Bias | 0.001 | 0.008 | 0.002 | 0.016 | 0.001 | 0.013 | 0.003 | 0.180 |
| | | | | SD | 0.026 | 0.158 | 0.095 | 0.383 | 0.026 | 0.164 | 0.104 | 0.540 |
| | | | 800 | Bias | 0.001 | 0.001 | 0.005 | 0.016 | 0.001 | 0.005 | 0.009 | 0.016 |
| | | | | SD | 0.017 | 0.134 | 0.069 | 0.301 | 0.017 | 0.098 | 0.070 | 0.310 |
To assess the power of the proposed change-point test statistics, we examine the percentage of simulated trials in which a change point is detected. We consider samples of size 800, , p=0.8, , and the left and right censoring level is . The number of equally spaced points in the interval is 500. The results are listed in Table 4. The powers of the two statistics increase with the jump size and are smaller when the change point is close to 0 (τ = 0.1) or large (τ = 1.5). For most configurations, the power of is slightly larger than that of . For the configuration θ = 0, the false positive rates are 5% and 4.8% for and , respectively, which shows that our methods maintain the nominal type I error rate when the true model has no change point. Overall, the results demonstrate that our methods perform well in identifying the true model and estimating the parameters.
Table 4. Size and powers with one or no change point.
| τ | Statistic | θ = 0 | θ = 0.1 | θ = 0.2 | θ = 0.3 | θ = 0.4 | θ = 0.5 |
|---|---|---|---|---|---|---|---|
| 0.1 | first | 5.0% | 12.8% | 20.6% | 36.4% | 48.6% | 60.4% |
| | second | 4.8% | 11.0% | 20.0% | 30.8% | 40.2% | 48.0% |
| 0.5 | first | 5.0% | 22.6% | 64.0% | 88.4% | 98.2% | 100% |
| | second | 4.8% | 26.0% | 54.4% | 76.0% | 92.2% | 100% |
| 1 | first | 5.0% | 22.0% | 56.0% | 84.4% | 96.6% | 100% |
| | second | 4.8% | 24.4% | 50.2% | 84.2% | 94.8% | 100% |
| 1.5 | first | 5.0% | 16.4% | 32.4% | 64.4% | 84.8% | 90.4% |
| | second | 4.8% | 16.8% | 32.0% | 56.6% | 82.6% | 86.0% |
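To make the simulation design more concrete, the following minimal Python sketch shows one way to draw event times from a piecewise-constant hazard with a single jump and a cure fraction, by inverting the cumulative hazard. It assumes the hazard of uncured subjects is β before τ and β + θ after it, and that a subject is uncured with probability p. The function name `sample_change_point_times` and the values of β, θ and τ in the usage lines are illustrative assumptions, and the interval-censoring step is omitted; this is not the authors' simulation code.

```python
import numpy as np

def sample_change_point_times(n, p, beta, theta, tau, rng):
    """Draw n event times under a change-point hazard with a cure fraction.

    Uncured subjects (probability p) have hazard beta for t <= tau and
    beta + theta for t > tau. Cured subjects get np.inf (never fail).
    """
    if beta <= 0 or beta + theta <= 0:
        raise ValueError("hazard must be positive before and after tau")
    # If E ~ Exp(1), then T = Lambda^{-1}(E) has cumulative hazard Lambda.
    e = rng.exponential(size=n)
    before = e <= beta * tau                      # failure occurs before tau
    t = np.where(before,
                 e / beta,                        # invert Lambda(t) = beta * t
                 tau + (e - beta * tau) / (beta + theta))
    uncured = rng.random(n) < p
    return np.where(uncured, t, np.inf)

# Illustrative usage: n = 800 and p = 0.8 as in the text, other values assumed.
rng = np.random.default_rng(2024)
times = sample_change_point_times(800, p=0.8, beta=1.0, theta=0.5, tau=0.5, rng=rng)
print("observed cure fraction:", np.mean(np.isinf(times)))
```

Interval-censored observations could then be produced by recording only the pair of inspection times that brackets each finite event time.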
7. Real data analysis
We are interested in determining whether a change point exists in the hazard function and whether the cure model is appropriate, and in estimating the location of the change point together with all other model parameters.
7.1. The liver cancer data
The proposed method is applied to the 2008–2013 Iowa liver cancer data set from the Surveillance, Epidemiology, and End Results cancer incidence public-use database (SEER). Observations with missing values for any of these variables are excluded from this study. People who were lost to follow-up after diagnosis or died immediately after diagnosis are also excluded. The final analytic data set includes 546 patients, where a total of of the observations are subject to right censoring and the others are interval-censored. As discussed in Section 1, we calculate the survival month T as the difference between the last contact date and the diagnosis date, rounded down. We therefore treat the data as interval-censored and apply the change-point model to analyze them.
Table 5 records the 5-year relative survival probability for liver cancer; the improvement in this probability that accompanies advances in medical treatment is evident. In the liver cancer data, 107 patients were still alive at the last visit time, and the 95% confidence interval of is . Hence, it is appropriate to admit the presence of long-term survivors, and our suggested model (4) is suitable for the analysis of the data. Based on the change-point model (4), the MLE of the parameters are calculated to be , which implies that the hazard function has a change point around 55 months with a jump of 0.0256.
Table 5. Liver cancer 5-year relative survival probability.
| Year | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|---|---|---|
| 5-year relative survival probability | 3.0% | 3.2% | 7.0% | 5.3% | 5.7% | 11.7% | 16.8% | 16.8% |
Figure 1 shows two modified cumulative hazard functions. The broken line is generated from the estimates of model (4), and the irregular line is the NPMLE. In the figure, the slope of the irregular line changes around 50–60 months, which gives a rough range for the change point, and the kink of the broken line is at 55, the change point obtained from model (4). Table 6 records the estimates of each parameter. Since the asymptotic variance of is often intractable, we resort to the bootstrap method proposed by [4] to produce the standard deviation (SD) based on 1000 repetitions. The results in Table 6 give a value of p of around 0.95, indicating a proportion of for the long-term survivors. The change point τ is estimated as 55 months, with an estimated jump of about 0.0256 in the hazard rate. We fitted the data using the survival cure model (4), and the test procedure in Section 4 was applied to determine whether there is an abrupt change in the recurrence rate of liver cancer. We obtain that and . Under the null hypothesis, the 95% quantiles of and are 254.478 and 70.339, respectively. Hence, there is overwhelming evidence to reject the null hypothesis and conclude that a change point does exist for the data.
Figure 1. Modified cumulative hazard function for liver cancer data, where the broken line is the result of the estimation of the change-point model and the irregular line is the result of the NPMLE.
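The broken line has a simple closed form under a piecewise-constant hazard: the cumulative hazard is piecewise linear, with slope β before τ and β + θ after it. The Python sketch below evaluates it at the liver cancer estimates, reading the Table 6 entries as β = 0.0065, θ = 0.0256 and τ = 55 (this column reading is an assumption). The function names are illustrative, and the "modified" cumulative hazard plotted in Figure 1 may include an adjustment for the cure fraction that is not reproduced here.

```python
import numpy as np

def hazard(t, beta, theta, tau):
    """Piecewise-constant hazard: beta up to tau, beta + theta afterwards."""
    t = np.asarray(t, dtype=float)
    return beta + theta * (t > tau)

def cumulative_hazard(t, beta, theta, tau):
    """Integral of the hazard from 0 to t; linear with a kink at tau."""
    t = np.asarray(t, dtype=float)
    return beta * t + theta * np.clip(t - tau, 0.0, None)

# Estimates as read from Table 6 (assumed column order: p, tau, beta, theta).
BETA, THETA, TAU = 0.0065, 0.0256, 55.0

months = np.array([0.0, 30.0, 55.0, 60.0, 72.0])
for m, lam in zip(months, cumulative_hazard(months, BETA, THETA, TAU)):
    print(f"t = {m:5.1f} months, cumulative hazard = {lam:.4f}")
```

The slope changes from 0.0065 to 0.0321 per month at τ = 55, which is the kink that appears in the broken line.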
Table 6. Estimates for liver cancer data.
| p | SD | τ | SD | β | SD | θ | SD |
|---|---|---|---|---|---|---|---|
| 0.954 | 0.0130 | 55.00 | 0.274 | 0.0065 | 0.000637 | 0.0256 | 0.00179 |
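The bootstrap standard deviations reported for the parameter estimates can be illustrated with a generic nonparametric bootstrap. The sketch below is a minimal version under stated assumptions: subjects are resampled with replacement, the estimator is re-applied to each resample, and the SD of the replicates is returned. The helper `bootstrap_sd` is hypothetical, and the toy estimator (an exponential rate) only stands in for the pseudo-maximum-likelihood fit of model (4), which is not implemented here.

```python
import numpy as np

def bootstrap_sd(data, estimator, n_boot=1000, seed=0):
    """Bootstrap SD of estimator(data) from n_boot resamples of the rows of data.

    data      : array whose first axis indexes subjects
    estimator : callable returning a scalar or a 1-D array of estimates
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    replicates = np.array([
        estimator(data[rng.integers(0, n, size=n)])   # resample subjects
        for _ in range(n_boot)
    ])
    return replicates.std(axis=0, ddof=1)

# Toy usage: SD of the exponential-rate MLE (1 / sample mean), 1000 repetitions.
rng = np.random.default_rng(1)
toy_times = rng.exponential(scale=2.0, size=400)
print("bootstrap SD of rate estimate:", bootstrap_sd(toy_times, lambda x: 1.0 / x.mean()))
```

For interval-censored data, each row of `data` would hold one subject's interval endpoints, so that resampling keeps the two endpoints together.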
7.2. The breast cancer data
The breast cancer data set is described and given in [5]. It records the time to cosmetic deterioration of the breast for women with Stage 1 breast cancer who have undergone radiotherapy. The data come from a retrospective study of 94 patients who received radiation. Each woman made a series of visits to a clinician, who determined whether or not the retraction had occurred. If the retraction had occurred, the time was known only to lie between the times of the present and last visits. In total, 40.6% of the observations were right-censored and 5% were left-censored.
By the bootstrap procedure described in Section 5 and (8), the 95% confidence interval of is . Hence the cure model (4) is suitable for analyzing the data. Our estimates for , together with their sample standard deviations, are shown in Table 7. We apply the test procedures in Section 4 to determine whether there is a change point. This gives the deviances and . The 95% quantiles of and under the null hypothesis are 6.804 and 3.987, respectively. Hence, there is strong evidence to reject the null hypothesis and conclude that a change point does exist for the data.
Table 7. Estimates for breast cancer data.
| p | SD | τ | SD | β | SD | θ | SD |
|---|---|---|---|---|---|---|---|
| 0.882 | 0.0170 | 16.00 | 0.074 | 0.017 | 0.004 | 0.028 | 0.010 |
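The decision rule used in both data analyses compares an observed test statistic with the 95% quantile of its null distribution. As a rough illustration only, the Python sketch below computes an empirical critical value from a set of null statistics and applies the rule. How the null statistics are generated is defined in Section 4 and is not reproduced here; the chi-square draws are a stand-in, and the function names are hypothetical.

```python
import numpy as np

def critical_value(null_stats, level=0.05):
    """Empirical (1 - level) quantile of statistics obtained under the null."""
    return float(np.quantile(np.asarray(null_stats, dtype=float), 1.0 - level))

def reject_null(observed, null_stats, level=0.05):
    """Reject 'no change point' when the observed statistic exceeds the critical value."""
    return observed > critical_value(null_stats, level)

# Stand-in null distribution (an assumption, not the paper's null law).
rng = np.random.default_rng(7)
null_stats = rng.chisquare(df=2, size=5000)
print("95% critical value:", critical_value(null_stats))
print("reject at observed = 12.0:", reject_null(12.0, null_stats))
```

Applying the same rule to many simulated data sets and averaging the rejections gives empirical size and power figures of the kind reported in Table 4.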
8. Discussion
In this paper, we develop a pseudo-maximum-likelihood method to handle the single change-point hazard model for interval-censored data with a cure fraction. To guarantee the consistency of the uncured rate estimate, we have applied the ICM method to calculate the NPMLE. Compared with [3], our approach yields a smaller bias in the estimate of the change size. The simulation and two real data examples illustrate that the proposed method can effectively deal with interval-censored data with a cure fraction.
In future work, we will focus on the multiple change-point hazard model. Some research exists on this model, such as [8,20]; however, it does not consider interval-censored data.
Funding Statement
The research work of Wang is supported by the National Natural Science Foundation of China grant 11471065. The research work of Song is supported by the National Natural Science Foundation of China grants 11371077 and 61175041.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- 1. Berkson J. and Gage R.P., Survival curve for cancer patients following treatment, J. Am. Stat. Assoc. 47 (1952), pp. 501–515. doi: 10.1080/01621459.1952.10501187
- 2. Chang I.S., Chen C.H., and Hsiung C.A., Estimation in change-point hazard rate models with random censorship, Lecture Notes Monogr. Ser. 23 (1994), pp. 78–92. doi: 10.1214/lnms/1215463115
- 3. Dupuy J.F., Estimation in a change-point hazard regression model, Stat. Probab. Lett. 76 (2006), pp. 182–190. doi: 10.1016/j.spl.2005.07.013
- 4. Efron B., Bootstrap methods: another look at the jackknife, Ann. Stat. 7 (1979), pp. 1–26. doi: 10.1214/aos/1176344552
- 5. Finkelstein D.M. and Wolfe R.A., A semiparametric model for regression analysis of interval-censored failure time data, Biometrics 41 (1985), pp. 933–945. doi: 10.2307/2530965
- 6. Gijbels I. and Gürler Ü., Estimation of a change point in a hazard function based on censored data, Lifetime Data Anal. 9 (2003), pp. 395–411. doi: 10.1023/B:LIDA.0000012424.71723.9d
- 7. Gong G. and Samaniego F.J., Pseudo maximum likelihood estimation: theory and applications, Ann. Stat. 9 (1981), pp. 861–869. doi: 10.1214/aos/1176345526
- 8. Goodman M.S., Li Y., and Tiwari R.C., Detecting multiple change points in piecewise constant hazard functions, J. Appl. Stat. 38 (2011), pp. 2523–2532. PMID: 22707842. doi: 10.1080/02664763.2011.559209
- 9. Groeneboom P. and Wellner J., Information Bounds and Nonparametric Maximum Likelihood Estimation, Oberwolfach Seminars. Birkhäuser, Basel, 1992.
- 10. Henderson R., A problem with the likelihood ratio test for a change-point hazard rate model, Biometrika 77 (1990), pp. 835–843. doi: 10.1093/biomet/77.4.835
- 11. Hu H., Large sample theory for pseudo-maximum likelihood estimates in semiparametric models, Ph.D. diss., University of Washington, 1998.
- 12. Huang J., Efficient estimation for the proportional hazards model with interval censoring, Ann. Stat. 24 (1996), pp. 540–568. doi: 10.1214/aos/1032894452
- 13. Loader C.R., Inference for a hazard rate change point, Biometrika 78 (1991), pp. 749–757. doi: 10.1093/biomet/78.4.749
- 14. Maller R.A. and Zhou X., Survival Analysis with Long-term Survivors, Wiley, New York, 1996.
- 15. Matthews D.E. and Farewell V.T., On testing for a constant hazard against a change-point alternative, Biometrics 38 (1982), pp. 463–468. doi: 10.2307/2530460
- 16. Matthews D.E., Farewell V.T., and Pyke R., Asymptotic score-statistic processes and tests for constant hazard against a change-point alternative, Ann. Stat. 13 (1985), pp. 583–591. doi: 10.1214/aos/1176349540
- 17. Nguyen H.T., Rogers G.S., and Walker E.A., Estimation in change-point hazard rate models, Biometrika 71 (1984), pp. 299–304. doi: 10.1093/biomet/71.2.299
- 18. Pan W., Extending the iterative convex minorant algorithm to the Cox model for interval-censored data, J. Comput. Graph. Stat. 8 (1999), pp. 109–120.
- 19. Peng Y. and Dear K.B.G., A nonparametric mixture model for cure rate estimation, Biometrics 56 (2000), pp. 237–243. doi: 10.1111/j.0006-341X.2000.00237.x
- 20. Qian L. and Zhang W., Multiple change-point detection in piecewise exponential hazard regression models with long-term survivors and right censoring, in Contemporary Developments in Statistical Theory: A Festschrift for Hira Lal Koul, Springer International Publishing, Cham, 2014, pp. 289–304.
- 21. Qin J. and Sun J., Statistical analysis of right-censored failure-time data with partially specified hazard rates, Can. J. Stat. 25 (1997), pp. 325–336. doi: 10.2307/3315782
- 22. Sun J., Variance estimation of a survival function for interval-censored survival data, Stat. Med. 20 (2001), pp. 1249–1257. doi: 10.1002/sim.719
- 23. Sun J., The Statistical Analysis of Interval-censored Failure Time Data, Springer, New York, 2006.
- 24. Turnbull B.W., The empirical distribution function with arbitrarily grouped, censored and truncated data, J. R. Stat. Soc. Ser. B (Methodological) 38 (1976), pp. 290–295.
- 25. Vexler A. and Hutson A., Statistics in the Health Sciences, Chapman and Hall/CRC, New York, 2018.
- 26. Vexler A., Hutson A., and Chen X., Statistical Testing Strategies in the Health Sciences, Chapman and Hall/CRC, New York, 2016.
- 27. Zhang J. and Peng Y., A new estimation method for the semiparametric accelerated failure time mixture cure model, Stat. Med. 26 (2007), pp. 3157–3171. doi: 10.1002/sim.2748
- 28. Zhao X., Wu X., and Zhou X., A change-point model for survival data with long-term survivors, Stat. Sin. 19 (2009), pp. 377–390.
- 29. Zhou J., Zhang J., McLain A.C., and Cai B., A multiple imputation approach for semiparametric cure model with interval censored data, Comput. Stat. Data. Anal. 99 (2016), pp. 105–114. doi: 10.1016/j.csda.2016.01.013