Abstract
The gap time between recurrent events is often of primary interest in many fields such as medical studies (Cook and Lawless 2007; Kang, Sun, and Zhao 2015; Schaubel and Cai 2004), and in this paper, we discuss regression analysis of the gap times arising from a general class of additive transformation models. For the problem, we propose two estimation procedures, the modified within-cluster resampling (MWCR) method and the weighted risk-set (WRS) method, and the proposed estimators are shown to be consistent and asymptotically follow the normal distribution. In particular, the estimators have closed forms and can be easily determined, and the methods have the advantage of leaving the correlation among gap times arbitrary. A simulation study is conducted for assessing the finite sample performance of the presented methods and suggests that they work well in practical situations. Also the methods are applied to a set of real data from a chronic granulomatous disease (CGD) clinical trial.
Keywords: additive transformation model, latent variable, recurrent event data, gap times, within-cluster resampling
Mathematics Subject Classification: 62N01, 62N02, 62G09
1. Introduction
Recurrent event data frequently occur in many fields such as clinical and observational studies and usually concern the occurrence rate of some recurrent events (Cook and Lawless 2007; Huang and Chen 2003; Wang and Chang 1999). Examples of such events include, among others, repeated hospitalizations, multiple infections and tumor recurrences. Many methods have been developed in the literature for both nonparametric analysis and regression analysis of recurrent event data with the focus on the occurrence times of the event of interest since enrollment. Among others, Cook and Lawless (2007) gave a comprehensive review of the literature on the analysis of recurrent event data.
In many applications, the gap time, the time between the successive occurrences of the recurrent event of interest, is another natural outcome of interest (Gail, Santner, and Brown 1980). An example of this is given by the time between hospitalizations, which could be important from the point of view of both patients and hospitals. It is easy to see that the analysis of the occurrence time of a recurrent event from enrollment and the analysis of recurrent gap time need different methods due to the special structure of the latter. One difference is that although the first gap time may be subject to independent censoring, the second and later gap times are subject to dependent censoring as gap times from the same subject are usually correlated. Another special feature of the gap time data is that the last censored gap time tends to be longer than the uncensored gap times. Also the number of gap times is informative about the underlying distribution because the subjects at a higher risk of experiencing recurrent events are likely to have shorter and hence more gap times.
Some methods have been developed for the analysis of recurrent gap time data. For example, Huang and Chen (2003), Schaubel and Cai (2004), Luo and Huang (2011) and Darlington and Dixon (2013) discussed regression analysis of the data arising from the proportional hazards model, and Chang (2004) and Strawderman (2005) considered the problem under the accelerated time models. Also Sun, Park, and Sun (2006) and Ding and Sun (2017) discussed the problem with the use of the additive hazards model and the additive mixed effect model, respectively, and Kang, Sun, and Zhao (2015) fitted a class of transformed hazards models to recurrent gap time data. In addition, Lin, Sun, and Ying (1999) and Wang and Chang (1999) investigated the nonparametric estimation problem for recurrent gap time data among others. In the following, we will consider a general class of additive transformation models (Zeng and Cai 2010; Liu, Sun and Zhou 2013), which may be more appropriate in some situations than the models mentioned above (Lin and Ying 1994; O’Neill 1986) and for which there does not seem to exist an established inference procedure.
The rest of the article is organized as follows. In Section 2, we will first define some notation and describe the models and assumptions that will be used throughout the paper. Two inference procedures, a modified within-cluster resampling (MWCR) approach and a weighted risk-set (WRS) method (Luo and Huang 2011), will then be presented in Section 3 with the use of the approach given in Lin and Ying (1994) for right-censored data. Both methods can be easily implemented and the resulting estimators are consistent and asymptotically follow the normal distribution. In Section 4, we provide some results obtained from a simulation study conducted to assess the performance of the proposed approaches and they suggest that the methods work well in practical situations. Section 5 applies the proposed methods to a set of recurrent gap time data arising from a chronic granulomatous disease (CGD) clinical trial and Section 6 gives some discussion and concluding remarks.
2. Notation, Models and Assumptions
Consider an event history study concerning a recurrent event and including n independent subjects. For subject i, suppose that there exists a vector of covariates denoted by Zi and let Ci denote the censoring or follow-up time. Also for subject i, let Tij denote the time between the (j – 1)-th and the j-th recurrences of the event and mi the number of both observed and censored gap times, j = 1, …, mi, i = 1, …, n. It follows that Ti1 + … + Tij denotes the time of the j-th recurrence of the event and we have that
$$\sum_{j=1}^{m_i-1} T_{ij} \le C_i \quad \text{and} \quad \sum_{j=1}^{m_i} T_{ij} > C_i$$
with $\sum_{j=1}^{0} T_{ij} \equiv 0$. Define δi = I(mi > 1). Note that δi = 0 or mi = 1 means that subject i is event-free during the follow-up and one observes only one censored gap time, Ci, while δi = 1 or mi > 1 indicates that for the subject, we observe the first (mi – 1) gap times and the last one is censored at $C_i - \sum_{j=1}^{m_i-1} T_{ij}$. In the following, for convenience, we will define $m_i^* = \max(m_i - 1, 1)$, denoting the number of uncensored gap times for the subjects with at least one observed gap time.
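The notation above can be illustrated with a minimal sketch that converts one subject's recurrence times and follow-up time into gap times. The function name `to_gap_times` and its return layout are hypothetical, and the count of uncensored gap times is taken here to be max(mi − 1, 1), which is an assumption about the convenience quantity defined above.

```python
import numpy as np

def to_gap_times(event_times, c):
    """Convert recurrence times (measured from enrollment) and the follow-up
    time c into gap times for one subject.

    Returns (gaps, delta, m_star); gaps holds all m_i gap times and its last
    entry is the censored one.
    """
    t = np.sort(np.asarray(event_times, dtype=float))
    t = t[t <= c]                            # recurrences seen before censoring
    boundaries = np.concatenate(([0.0], t, [c]))
    gaps = np.diff(boundaries)               # m_i - 1 uncensored gaps + 1 censored gap
    m = len(gaps)
    delta = int(m > 1)                       # delta_i = I(m_i > 1)
    m_star = max(m - 1, 1)                   # assumed convention for the count
    return gaps, delta, m_star

gaps, delta, m_star = to_gap_times([1.0, 2.5], c=4.0)
print(gaps, delta, m_star)
```

A subject with no recurrence before Ci yields a single censored gap time equal to Ci and δi = 0.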
To describe the covariate effects on the Tij’s, we will assume that given Zi, which may depend on time, the cumulative hazard function of Tij has the form
$$\Lambda(t \mid Z_i) = K\left\{\Lambda_0(t) + \int_0^t \beta_0' Z_i(s)\,ds\right\} \tag{1}$$
(Zeng and Cai 2010; Liu, Sun and Zhou 2013). In the above, Λ0(t) is an unknown baseline cumulative hazard function, β0 denotes the vector of unknown regression coefficients, and K(x) is an arbitrary and monotone transformation function. Clearly if K(x) = x we have the additive hazards model. In the following, we will focus on the situation where the covariates are time-independent and K(x) satisfies E [exp{−ξx}] = exp{−K(x)} for some positive latent variable ξ. In this case, the marginal survival function of the gap time has the form
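$$S(t \mid Z_i) = \exp\left[-K\left\{\Lambda_0(t) + \beta_0' Z_i\, t\right\}\right]$$
and it follows that the model (1) above can be written as
$$\Lambda(t \mid Z_i, \xi_i) = \xi_i\left\{\Lambda_0(t) + \beta_0' Z_i\, t\right\} \tag{2}$$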
where the ξi’s can be seen as the subject-specific latent variables that account for the correlation among the gap times from the same subject. It will be assumed below that the ξi’s have an arbitrary distribution with mean one.
It is easy to see that models (1) or (2) can have various forms with different K(x) or distributions of the ξi’s. For example, by assuming that the ξi’s follow a Gamma distribution with the shape and scale parameters θ1 and θ2 satisfying θ1θ2 = 1, we have
$$K(x) = \theta_1 \ln(1 + \theta_2 x) = \theta_1 \ln(1 + x/\theta_1).$$
If the ξi’s follow the uniform distribution U(a, b) with a > 0, b > 0 and (a + b)/2 = 1, the corresponding K(x) has the form
$$K(x) = \ln\{(b - a)x\} - \ln\left(e^{-ax} - e^{-bx}\right).$$
Another common choice for the distribution of the ξi’s is the inverse Gaussian distribution with mean μ > 0 and shape parameter γ > 0 and this gives
$$K(x) = \frac{\gamma}{\mu}\left\{\left(1 + \frac{2\mu^2 x}{\gamma}\right)^{1/2} - 1\right\}.$$
In addition to the choices above, the simplest one is clearly the one that sets ξi = 1 for all i, that is, the variance of the ξi’s is zero. In this case, model (2) reduces to the commonly used additive hazards model
$$\lambda(t \mid Z_i) = \lambda_0(t) + \beta_0' Z_i \tag{3}$$
for right-censored failure time data (Lin and Ying 1994). Actually Sun, Park, and Sun (2006) discussed the fitting of the model above to recurrent gap time data, but their inference procedure does not take into account the correlation among gap times from the same subject. To address this, Ding and Sun (2017) considered the following additive mixed effect model
$$\lambda(t \mid Z_i, \omega_i) = \lambda_0(t) + \beta_0' Z_i + \omega_i \tag{4}$$
where ωi is a subject-specific latent variable. It is easy to see that both models (3) and (4) assume a linear relationship between covariates and the hazard rate of the gap time, which may not always be appropriate, while models (1) or (2) allow nonlinear relationships. Also model (2) may be more reasonable than model (4) for a clinical trial where the patient information characterized by the latent variable ξi could amplify or attenuate the treatment effect on the hazard of gap times. In the following, we will assume that Ci is independent of Tij given Zi and ξi and the Tij’s are conditionally independent of each other given Zi and ξi.
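The relation E[exp{−ξx}] = exp{−K(x)} used earlier in this section can be checked numerically. The following minimal sketch does so for a gamma latent variable with mean one, for which the Laplace transform of the gamma distribution gives K(x) = θ ln(1 + x/θ); with θ = 2 this is the choice K(x) = 2 ln(1 + x/2) used in the simulation study. The function names are hypothetical and the Monte Carlo sample size is arbitrary.

```python
import numpy as np

def K_gamma(x, theta):
    """K(x) = theta * ln(1 + x/theta) for a gamma latent variable with
    shape theta and scale 1/theta (so that its mean is one)."""
    return theta * np.log1p(x / theta)

def K_monte_carlo(x, xi):
    """Empirical version of -ln E[exp(-xi * x)] from draws xi."""
    return -np.log(np.mean(np.exp(-xi * x)))

rng = np.random.default_rng(0)
theta = 2.0
xi = rng.gamma(shape=theta, scale=1.0 / theta, size=200_000)

for x in (0.5, 1.0, 2.0):
    print(x, K_gamma(x, theta), K_monte_carlo(x, xi))
```

The two columns printed for each x should agree up to Monte Carlo error, and the same check applies to any other positive latent variable whose draws are available.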
3. Inference Procedures
In this section, we present two estimation procedures for models (1) or (2) and for this, for subject i, we will let denote the set of all uncensored gap times, which will be assumed to be exchangeable, if mi > 1 or the set of only the censored gap time otherwise. First we will describe the MWCR procedure (Hoffman, Sen and Weinberg 2001; Luo and Huang 2011).
3.1. The MWCR-based Estimation Procedure
Let Q be an integer and for each q = 1, …, Q and i, let denote an observation randomly selected with replacement from , i = 1, …, n. Then based on the assumption that the gap times from different subjects are independent, the resampled dataset of gap times gives a random sample for each q. In other words, for each q, the resampled dataset can be treated as a random sample from and analyzed by the additive hazards model (3) by employing the estimating equation approach given in, for example, Lin and Ying (1994).
More specifically, for each q, define , the risk indicator process, and , a counting process, i = 1, …, n. Let τ denote the largest observation time in the study and denote the estimator of β0 given by the solution to the following estimating equation
where
It can be easily shown that can be explicitly expressed as
| (5) |
where a⊗2 = a a′ for a vector a. Furthermore, Lin and Ying (1994) showed that is consistent and converges weakly to a normal vector with mean zero and the covariance matrix that can be consistently estimated by , where
and
It is apparent that sometimes one may also be interested in estimating the baseline cumulative hazard function and for this, given , a natural estimator is given by
Note that the estimator above may not always be monotone in t and to deal with this, by following Lin and Ying (1994), one can use the adjusted estimator . Finally, we suggest defining the MWCR estimator of β0 as the average of the Q resample-based estimators, given by
whose variance-covariance matrix can be estimated by
Under some regularity conditions, in the Appendix, we show that is consistent and converges in distribution to a zero-mean normal random vector. Similarly one can define the MWCR estimator of Λ0(t) as
where .
Note that the idea described above was motivated by that discussed in Hoffman, Sen, and Weinberg (2001) for clustered data but the direct application of the original idea to recurrent gap times could result in biased estimation because of the special data structure here. Also an idea similar to that described above was considered by Luo and Huang (2011) for recurrent gap time data. One advantage of the estimation procedure given above is that it can be implemented by using the existing software for right-censored failure time data.
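The procedure of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: `lin_ying_fit` codes the standard closed-form additive hazards estimator of Lin and Ying (1994) for time-independent covariates together with its sandwich covariance, and `mwcr` draws one gap time per subject from the set of uncensored gap times (or the single censored one when mi = 1), refits, and averages. The function names, the data layout (each subject's gap times listed with the censored one last), and the divisor Q in the between-resample covariance are assumptions.

```python
import numpy as np

def lin_ying_fit(x, d, z):
    """Closed-form additive hazards fit for right-censored data.

    x: observed times, d: event indicators (1 = uncensored), z: (n, p) covariates.
    Returns (beta_hat, cov_hat) with cov_hat = A^{-1} B A^{-1}.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    order = np.argsort(x, kind="stable")
    x, d, z = x[order], d[order], z[order]
    p = z.shape[1]
    A, B, U = np.zeros((p, p)), np.zeros((p, p)), np.zeros(p)
    prev = 0.0
    for k in range(len(x)):
        start = np.searchsorted(x, x[k], side="left")   # risk set: x_j >= x[k]
        zr = z[start:]
        zbar = zr.mean(axis=0)
        c = zr - zbar
        A += (x[k] - prev) * (c.T @ c)   # integral of Y_i (Z_i - Zbar)^{outer 2} dt
        r = z[k] - zbar
        U += d[k] * r                    # integral of (Z_i - Zbar) dN_i
        B += d[k] * np.outer(r, r)
        prev = x[k]
    A_inv = np.linalg.inv(A)
    return A_inv @ U, A_inv @ B @ A_inv

def mwcr(gap_times, z, Q=200, seed=0):
    """gap_times[i]: all gap times of subject i, the last one censored."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float).reshape(len(gap_times), -1)
    # candidate set: uncensored gaps if m_i > 1, otherwise the censored gap
    cand = [np.asarray(g[:-1] if len(g) > 1 else g, dtype=float) for g in gap_times]
    d = np.array([float(len(g) > 1) for g in gap_times])
    betas, covs = [], []
    for _ in range(Q):
        x = np.array([rng.choice(c) for c in cand])     # one gap time per subject
        b, v = lin_ying_fit(x, d, z)
        betas.append(b)
        covs.append(v)
    betas = np.array(betas)
    beta_hat = betas.mean(axis=0)
    dev = betas - beta_hat
    # average within-resample covariance minus between-resample covariance
    # (divisor Q is an assumption; Q - 1 differs negligibly for large Q)
    cov_hat = np.mean(covs, axis=0) - dev.T @ dev / Q
    return beta_hat, cov_hat
```

Because each resampled dataset contains one observation per subject, `lin_ying_fit` could be replaced by any existing routine for the additive hazards model with right-censored data. In small samples the difference of the two covariance terms is not guaranteed to be positive definite.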
3.2. The WRS-based Estimation Procedure
To describe the WRS estimation procedure, for the subjects with δi = 1, define the averaged at-risk process
and the averaged counting process
For the subjects with δi = 0, define and . Then for estimation of β0, by following Luo and Huang (2011) and applying the estimation procedure given in Lin and Ying (1994) directly to , we can consider the estimating equation
where
Let denote the solution to the equation above, which will be referred to as the WRS estimator. Then it can be easily shown that
Also it will be shown in the Appendix that, like the MWCR estimator, under some regularity conditions, the new estimator is consistent and asymptotically follows the normal distribution with mean zero and the covariance matrix that can be consistently estimated by . Here
and
where
and
It is easy to see that the main difference between the two methods above is how the average is taken and the two estimators are expected to be asymptotically equivalent. The WRS estimator has an advantage of being conceptually simpler than the MWCR estimator.
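The WRS point estimator can be sketched in the same spirit. The sketch below assumes, following Luo and Huang (2011), that the averaged at-risk and counting processes give each of a subject's uncensored gap times the weight 1/mi*, where mi* is the number of uncensored gap times, and that an event-free subject contributes the single censored gap time with weight one. Under that assumption the estimator reduces to a weighted version of the Lin and Ying (1994) closed form. The function name `wrs_fit` and the data layout are hypothetical, and only the point estimate is computed; the sandwich covariance is omitted.

```python
import numpy as np

def wrs_fit(gap_times, z):
    """gap_times[i]: all gap times of subject i, the last one censored.
    z: (n, p) time-independent covariates. Returns beta_hat."""
    z = np.asarray(z, dtype=float).reshape(len(gap_times), -1)
    xs, ws, ds, zs = [], [], [], []
    for g, zi in zip(gap_times, z):
        g = np.asarray(g, dtype=float)
        obs = g[:-1] if len(g) > 1 else g        # uncensored gaps, or the censored one
        for x in obs:
            xs.append(x)
            ws.append(1.0 / len(obs))            # weight 1 / m_i^*
            ds.append(float(len(g) > 1))         # delta_i
            zs.append(zi)
    x, w, d, zz = map(np.array, (xs, ws, ds, zs))
    order = np.argsort(x, kind="stable")
    x, w, d, zz = x[order], w[order], d[order], zz[order]
    p = zz.shape[1]
    A, U = np.zeros((p, p)), np.zeros(p)
    prev = 0.0
    for k in range(len(x)):
        start = np.searchsorted(x, x[k], side="left")   # weighted risk set at x[k]
        wr, zr = w[start:], zz[start:]
        zbar = (wr[:, None] * zr).sum(axis=0) / wr.sum()
        c = zr - zbar
        A += (x[k] - prev) * (c.T @ (wr[:, None] * c))
        U += w[k] * d[k] * (zz[k] - zbar)
        prev = x[k]
    return np.linalg.solve(A, U)

# subject 1: two uncensored gaps (1.0, 3.0) and a censored one; subject 2: event-free
print(wrs_fit([[1.0, 3.0, 0.5], [2.0]], [[0.0], [1.0]]))
```

When every subject contributes exactly one gap time, all weights equal one and the computation coincides with the unweighted additive hazards fit.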
4. A Simulation Study
A simulation study was conducted to assess the finite sample performance of the two estimation procedures proposed in the previous sections with the focus on the estimation of regression parameters. In the study, we considered the situation with one covariate generated from the Bernoulli distribution with the success probability p = 0.5 and assumed that the follow-up times Ci’s follow the uniform distribution U(0, 4). To generate the correlated gap times, we first generated both the subject-specific variables Ai’s and the episode-specific variables Bij’s independently from the normal distributions with mean zero and the variances ρ and 1 – ρ, respectively, where ρ ∈ [0, 1]. Also we set λ0(t) = λ0 with λ0 being a constant or λ0(t) = t, and K(x) = 2 ln(1 + x/2) or K(x) = 4 ln(1 + x/4) or generated the ξi’s from the gamma distribution with mean one. Then the baseline gap times were defined to be with Φ(·) denoting the standard normal distribution function. It can be verified that follows the standard exponential distribution. Given the ’s, the general gap times were taken to be
for the situation with λ0(t) = λ0, or determined through
for the situation with λ0(t) = t. Note that ρ above measures the heterogeneity among the subjects and 1 – ρ controls the heterogeneity among the gap times within a subject. The results given below are based on n = 200 or 300 with Q = 1000 for the MWCR estimator and 1000 replications.
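The data-generating scheme for the constant-baseline case can be sketched as below. Two steps are assumptions rather than the formulas of the study: the baseline gap time is taken as −ln{1 − Φ(Ai + Bij)}, one transform of Φ(Ai + Bij) that yields a standard exponential variable, and the general gap time is obtained by inverting K{(λ0 + β0Zi)t} for K(x) = θ ln(1 + x/θ), so that the marginal survival function is exp[−K{(λ0 + β0Zi)t}]. The function name `simulate_subject` and its arguments are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def simulate_subject(z, c, rho, beta, lam0, theta, rng):
    """Gap times for one subject; the last entry is censored at follow-up time c.

    Assumes K(x) = theta * ln(1 + x/theta) and a constant baseline hazard lam0.
    """
    a = rng.normal(0.0, sqrt(rho))                    # subject-specific A_i
    gaps, total = [], 0.0
    while True:
        b = rng.normal(0.0, sqrt(1.0 - rho))          # episode-specific B_ij
        # 1 - Phi(u) = erfc(u / sqrt(2)) / 2, so t0 is standard exponential
        t0 = -np.log(0.5 * erfc((a + b) / sqrt(2.0)))
        # invert t0 = K{(lam0 + beta * z) * t}  (assumed construction)
        t = theta * np.expm1(t0 / theta) / (lam0 + beta * z)
        if total + t > c:
            gaps.append(c - total)                    # censored last gap
            return gaps
        gaps.append(t)
        total += t

rng = np.random.default_rng(2024)
z = rng.binomial(1, 0.5)                              # Bernoulli(0.5) covariate
c = rng.uniform(0.0, 4.0)                             # follow-up time from U(0, 4)
print(simulate_subject(z, c, rho=0.5, beta=0.2, lam0=2.0, theta=4.0, rng=rng))
```

The sketch requires λ0 + β0Zi > 0 so that the hazard stays positive. The case λ0(t) = t would need the corresponding equation in t to be solved instead of the explicit inversion used here.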
Table 1 presents the results obtained for estimation of β with the true value of β0 being −0.2, 0 or 0.2, ρ = 0.25, 0.5 or 0.75, K(x) = 4 ln(1 + x/4), and λ0(t) = 2. The results given in Table 2 were obtained under the same set-up as in Table 1 except that K(x) = 2 ln(1 + x/2) and λ0(t) = t. The results include the estimated bias given by the average of the estimates minus the true value (BIAS), the mean of the estimated standard deviations (ESD), the sample standard error of the estimates (SSE), and the 95% empirical coverage probability (CP). It can be seen from the tables that both proposed estimators of β seem to be unbiased and the variance estimation also seems to be appropriate. The 95% empirical coverage probabilities are close to the nominal level, indicating that the normal approximation to the distributions of the proposed estimators is reasonable. In addition, as expected, the results became better when the sample size increased. To further assess the normal approximation, we investigated the quantile plots of the standardized estimates against the standard normal distribution and Figure 1 presents some plots corresponding to the set-up considered in Table 1. One can see that these quantile plots again indicate that the normal approximation appears to be appropriate.
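The four summary measures reported in the tables can be computed from the replicated estimates as in the following minimal sketch. The function name `summarize` is hypothetical, and the coverage probability is assumed to be based on Wald-type intervals, estimate ± 1.96 × estimated standard deviation.

```python
import numpy as np

def summarize(estimates, std_errors, beta0, z_crit=1.96):
    """BIAS, ESD, SSE and CP over simulation replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return {
        "BIAS": est.mean() - beta0,            # average estimate minus true value
        "ESD": se.mean(),                      # mean of the estimated standard deviations
        "SSE": est.std(ddof=1),                # sample standard error of the estimates
        "CP": np.mean(np.abs(est - beta0) <= z_crit * se),   # empirical coverage
    }

print(summarize([0.1, 0.3], [0.1, 0.1], beta0=0.2))
```

Close agreement between ESD and SSE indicates that the variance estimator is tracking the actual sampling variability of the estimator.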
Table 1:
Estimation of the regression parameter with K(x) = 4 ln(1 + x/4) and λ0(t) = 2
| ρ | β0 | WRS BIAS | WRS ESD | WRS SSE | WRS CP | MWCR BIAS | MWCR ESD | MWCR SSE | MWCR CP |
|---|---|---|---|---|---|---|---|---|---|
| n = 200 | | | | | | | | | |
| 0.25 | −0.2 | −0.0104 | 0.2427 | 0.2418 | 0.947 | −0.0064 | 0.2415 | 0.2367 | 0.955 |
| | 0 | −0.0078 | 0.2512 | 0.2596 | 0.945 | −0.0079 | 0.2495 | 0.2532 | 0.949 |
| | 0.2 | −0.0188 | 0.2600 | 0.2711 | 0.941 | −0.0218 | 0.2578 | 0.2653 | 0.945 |
| 0.5 | −0.2 | −0.0141 | 0.2607 | 0.2708 | 0.946 | −0.0114 | 0.2589 | 0.2658 | 0.951 |
| | 0 | −0.0087 | 0.2708 | 0.2717 | 0.944 | −0.0076 | 0.2689 | 0.2662 | 0.943 |
| | 0.2 | 0.0188 | 0.2834 | 0.2993 | 0.941 | 0.0144 | 0.2813 | 0.2927 | 0.945 |
| 0.75 | −0.2 | −0.0033 | 0.2785 | 0.2821 | 0.947 | 0.0004 | 0.2767 | 0.2770 | 0.953 |
| | 0 | 0.0109 | 0.2913 | 0.3025 | 0.948 | 0.0107 | 0.2886 | 0.2966 | 0.947 |
| | 0.2 | −0.0149 | 0.3049 | 0.3179 | 0.942 | −0.0186 | 0.3021 | 0.3108 | 0.940 |
| n = 300 | | | | | | | | | |
| 0.25 | −0.2 | 0.0002 | 0.1976 | 0.2006 | 0.947 | 0.0029 | 0.1970 | 0.1983 | 0.954 |
| | 0 | −0.0008 | 0.2044 | 0.1999 | 0.952 | −0.0018 | 0.2035 | 0.1969 | 0.954 |
| | 0.2 | −0.0023 | 0.2120 | 0.2167 | 0.944 | −0.0048 | 0.2111 | 0.2132 | 0.947 |
| 0.5 | −0.2 | −0.0058 | 0.2114 | 0.2239 | 0.945 | −0.0017 | 0.2109 | 0.2209 | 0.948 |
| | 0 | −0.0013 | 0.2202 | 0.2330 | 0.934 | −0.0014 | 0.2193 | 0.2298 | 0.939 |
| | 0.2 | −0.0113 | 0.2299 | 0.2408 | 0.936 | −0.0143 | 0.2288 | 0.2371 | 0.943 |
| 0.75 | −0.2 | −0.0057 | 0.2267 | 0.2286 | 0.948 | −0.0029 | 0.2257 | 0.2256 | 0.951 |
| | 0 | 0.0062 | 0.2374 | 0.2478 | 0.942 | 0.0060 | 0.2361 | 0.2446 | 0.942 |
| | 0.2 | 0.0056 | 0.2468 | 0.2489 | 0.941 | 0.0026 | 0.2454 | 0.2455 | 0.941 |
Table 2:
Estimation of the regression parameter with K(x) = 2 ln(1 + x/2) and λ0(t) = t
| ρ | β0 | WRS BIAS | WRS ESD | WRS SSE | WRS CP | MWCR BIAS | MWCR ESD | MWCR SSE | MWCR CP |
|---|---|---|---|---|---|---|---|---|---|
| n = 200 | | | | | | | | | |
| 0.25 | −0.2 | −0.0122 | 0.0893 | 0.0922 | 0.930 | −0.0101 | 0.0903 | 0.0906 | 0.945 |
| | 0 | −0.0015 | 0.0955 | 0.1014 | 0.931 | −0.0014 | 0.0957 | 0.0993 | 0.948 |
| | 0.2 | −0.0140 | 0.1025 | 0.1076 | 0.932 | −0.0153 | 0.1025 | 0.1054 | 0.942 |
| 0.5 | −0.2 | −0.0151 | 0.0949 | 0.0989 | 0.927 | −0.0135 | 0.0959 | 0.0972 | 0.946 |
| | 0 | −0.0024 | 0.1026 | 0.1041 | 0.953 | −0.0022 | 0.1028 | 0.1021 | 0.960 |
| | 0.2 | −0.0116 | 0.1103 | 0.1129 | 0.944 | −0.0135 | 0.1103 | 0.1109 | 0.953 |
| 0.75 | −0.2 | −0.0133 | 0.1003 | 0.1033 | 0.940 | −0.0108 | 0.1014 | 0.1017 | 0.952 |
| | 0 | 0.0054 | 0.1091 | 0.1157 | 0.931 | 0.0051 | 0.1094 | 0.1140 | 0.943 |
| | 0.2 | −0.0090 | 0.1182 | 0.1258 | 0.936 | −0.0104 | 0.1182 | 0.1237 | 0.944 |
| n = 300 | | | | | | | | | |
| 0.25 | −0.2 | −0.0094 | 0.0732 | 0.0747 | 0.937 | −0.0079 | 0.0739 | 0.0736 | 0.946 |
| | 0 | 0.0014 | 0.0782 | 0.07934 | 0.949 | 0.0013 | 0.0783 | 0.0780 | 0.955 |
| | 0.2 | −0.0100 | 0.0841 | 0.0876 | 0.938 | −0.0122 | 0.0839 | 0.0864 | 0.943 |
| 0.5 | −0.2 | −0.0142 | 0.0777 | 0.0803 | 0.931 | −0.0121 | 0.0786 | 0.0794 | 0.945 |
| | 0 | 0.0038 | 0.0840 | 0.0848 | 0.945 | 0.0038 | 0.0841 | 0.0837 | 0.950 |
| | 0.2 | −0.0111 | 0.0902 | 0.0895 | 0.946 | −0.0127 | 0.0902 | 0.0884 | 0.949 |
| 0.75 | −0.2 | −0.0109 | 0.0822 | 0.0821 | 0.947 | −0.0094 | 0.0829 | 0.0811 | 0.954 |
| | 0 | 0.0006 | 0.0889 | 0.0884 | 0.942 | 0.0006 | 0.0892 | 0.0872 | 0.947 |
| | 0.2 | −0.0027 | 0.0965 | 0.0982 | 0.938 | −0.0053 | 0.0965 | 0.0970 | 0.947 |
Figure 1:
Quantile plots of WRS and MWCR estimators with n = 200 or 300
Note that in the above, for simplicity, we have assumed that the follow-up times Ci’s follow the same distribution and it is apparent that this may not be true in practice as, for example, they may depend on covariates. To assess this, we repeated the study above by generating the follow-up times from the mixed uniform distribution U(0, 4)I(Zi = 0) + U(0, 6)I(Zi = 1), and Table 3 gives the results obtained on estimation of β with n = 200 and the Ci’s generated above, and all other set-up being the same as in Table 1. It is clear that they gave the same conclusions as above. We also considered some other set-ups, including different K(x) functions and different covariates, and obtained similar results.
Table 3:
Estimation of the regression parameter with follow-up times depending on covariates
| ρ | β0 | WRS BIAS | WRS ESD | WRS SSE | WRS CP | MWCR BIAS | MWCR ESD | MWCR SSE | MWCR CP |
|---|---|---|---|---|---|---|---|---|---|
| 0.25 | −0.2 | −0.0033 | 0.2299 | 0.2387 | 0.936 | 0.0004 | 0.2293 | 0.2340 | 0.939 |
| | 0 | −0.0032 | 0.2376 | 0.2401 | 0.939 | −0.0038 | 0.2353 | 0.2344 | 0.949 |
| | 0.2 | 0.0016 | 0.2454 | 0.2562 | 0.946 | −0.0029 | 0.2429 | 0.2492 | 0.943 |
| 0.5 | −0.2 | −0.0157 | 0.2531 | 0.2648 | 0.946 | −0.0129 | 0.2513 | 0.2589 | 0.958 |
| | 0 | 0.0003 | 0.2618 | 0.2636 | 0.947 | −0.0014 | 0.2598 | 0.2576 | 0.954 |
| | 0.2 | 0.0070 | 0.2726 | 0.2781 | 0.945 | 0.0004 | 0.2700 | 0.2725 | 0.941 |
| 0.75 | −0.2 | −0.0123 | 0.2728 | 0.2819 | 0.944 | −0.0116 | 0.2703 | 0.2761 | 0.949 |
| | 0 | 0.0084 | 0.2847 | 0.2910 | 0.952 | 0.0044 | 0.2820 | 0.2844 | 0.960 |
| | 0.2 | −0.0018 | 0.2971 | 0.3131 | 0.944 | −0.0094 | 0.2945 | 0.3060 | 0.946 |
5. An Application
Now we apply the two estimation procedures proposed in the previous sections to a set of recurrent event data arising from a double-blinded randomized clinical trial of gamma interferon in chronic granulomatous disease (CGD) discussed by Ding and Sun (2017) among others. CGD is a group of rare inherited disorders characterized by recurrent pyogenic infections that usually present early in life and may lead to death in childhood. In order to study the ability of gamma interferon to reduce the rate of serious infections, 128 patients with CGD were randomized to either placebo or gamma interferon with 65 in the placebo group and 63 in the treatment group. By the end of the study, 30 patients in the placebo group and 14 patients in the gamma interferon group had experienced at least one serious infection with the number of serious infections ranging from 1 to 8. In addition to the information on the occurrences of serious infections, the data also include the information on six covariates: the gender, age, BMI at the study entry and pattern of inheritance of each patient, and whether they used steroids and prophylactic antibiotics at the study entry. One of the objectives is to assess if any of the risk factors above is significantly associated with the gap times between the successive CGD infections.
To apply the proposed estimation approach, for patient i, define Zi1 = 1 if the patient was assigned to the gamma interferon group and 0 otherwise, Zi2 = 1 if the patient is female and 0 otherwise, Zi3 and Zi4 to be the age and BMI of the patient, respectively, Zi5 = 1 if the pattern of inheritance was X-linked and 0 if it was autosomal recessive, Zi6 = 1 if the patient used steroids and 0 otherwise, and Zi7 = 1 if the patient used prophylactic antibiotics and 0 otherwise. Table 4 presents the obtained results on the seven risk factors given by the two proposed estimation procedures with Q = 1000 for the MWCR approach. The table includes the estimated effect of each risk factor, the estimated standard error and the p-value for testing no effect on the gap time between the successive CGD infections. One can see that the two methods gave similar results and both indicate that the patients in the gamma interferon group had significantly longer gap times than the patients in the placebo group. However, none of the other risk factors seemed to significantly affect or be associated with the gap time. Ding and Sun (2017) gave similar results.
Table 4:
Estimates of the covariate effects for the CGD data
| Covariate | WRS EST | WRS SE | WRS p-value | MWCR EST | MWCR SE | MWCR p-value |
|---|---|---|---|---|---|---|
| Treatment | −0.0449 | 0.0137 | 0.0011 | −0.0447 | 0.0140 | 0.0014 |
| Gender | −0.0090 | 0.0156 | 0.5645 | −0.0090 | 0.0154 | 0.5585 |
| Age | −0.0008 | 0.0006 | 0.1531 | −0.0008 | 0.0006 | 0.1534 |
| BMI | −0.0002 | 0.0006 | 0.7435 | −0.0002 | 0.0006 | 0.7222 |
| Pattern of inheritance | −0.0048 | 0.0138 | 0.7287 | −0.0048 | 0.0140 | 0.7309 |
| Steroids | 0.0533 | 0.0656 | 0.4168 | 0.0531 | 0.0684 | 0.4378 |
| Prophylactic antibiotics | −0.0181 | 0.0228 | 0.4267 | −0.0180 | 0.0230 | 0.4340 |
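The p-values in Table 4 appear consistent with two-sided Wald tests based on the normal approximation, that is, p = 2{1 − Φ(|EST/SE|)}. This is an inference from the reported numbers rather than a statement made in the text. The following minimal sketch, with the hypothetical function name `wald_p_value`, recomputes them from the rounded table entries, so small discrepancies in the last digit are to be expected.

```python
from math import erfc, sqrt

def wald_p_value(est, se):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation.

    Uses 2 * {1 - Phi(|z|)} = erfc(|z| / sqrt(2)) with z = est / se.
    """
    return erfc(abs(est / se) / sqrt(2.0))

# WRS entries for Treatment and Gender, taken from Table 4 (rounded inputs)
for name, est, se in [("Treatment", -0.0449, 0.0137), ("Gender", -0.0090, 0.0156)]:
    print(name, wald_p_value(est, se))
```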
In the two proposed methods above, we have assumed that the gap times Tij’s follow models (1) or (2) and it is apparent that one question of interest is the appropriateness of the model for the data analyzed here. To investigate this, we obtained the nonparametric estimates of the survival function of the gap times between the successive CGD infections for the patients in the two treatment groups separately by using the method given in Wang and Chang (1999). Also by using the WRS procedure proposed above and assuming no other covariate effect, we obtained the estimates of the same two survival functions under the assumed model and present them in Figure 2. One can see that they suggest that the additive transformation model seems to fit the CGD gap time data well.
Figure 2:
Estimates of the survival function of the gap times between CGD infections
6. Discussion and Concluding Remarks
In this paper we discussed regression analysis of recurrent gap time data and for the problem, we presented a general class of additive transformation models that allow both linear and nonlinear covariate effects and include some existing models as special cases. For inference, two estimation approaches were developed with one based on the averaged empirical processes and the other based on the within-cluster resampling. The MWCR procedure has the advantage that it can be easily implemented through the use of the existing software for right-censored failure time data, while the WRS procedure is conceptually simpler than the former. The simulation study suggested that both procedures work well in practical situations.
As mentioned above, one advantage of models (1) or (2) or the proposed methods is that they leave the correlation among the gap times arbitrary, and this is especially clear from model (2). In other words, the proposed estimation procedures do not require the specification of the unknown function K(x) or the distribution of the latent variable ξ. This is quite important in general since in practice, it is difficult or impossible to know the correlation structure of gap times or the distribution of ξ. For the case where such information is known or available, it is apparent that one could develop some more efficient estimation procedures by taking into account the extra information.
In the proposed methods, for simplicity, we have assumed that the censoring or follow-up times follow the same distribution and as discussed in Section 4, the proposed methods can still apply if the distribution of the follow-up times depends on covariates. A more complicated situation is that in addition to such follow-up times, there may exist a terminal event that is related to the recurrent event of interest and can occur before the usual follow-up time. The proposed methods will not be valid for this latter situation and new methods that can take into account the terminal event have to be developed. Also in the preceding sections, we have only focused on the situation with time-independent covariates and it is apparent that sometimes there may exist time-dependent covariates. For this situation, the proposed methods may result in biased estimation if covariate values change between gaps, as the exchangeability of uncensored gap times that is needed for the proposed estimation procedures would no longer hold. In other words, one would need some new methods.
Acknowledgements
The authors wish to thank the Editor-in-Chief, Dr. Balakrishnan, the Associate editor and two reviewers for their many comments and suggestions that greatly improved the paper. This research was partly supported by the NSFC grant No. 11471252 to the second author.
Appendix
Some brief proofs of the main results obtained in the previous sections will be given in this appendix. We first summarize some regularity conditions needed for the proofs of the asymptotic theories. Specifically, for i = 1, …, n, and q = 1, …, Q, assume that P{Yi1(t) = 1, 0 ≤ t ≤ τ} > 0 for some constant τ > 0, the norm of the covariate vector Zi is bounded, and . Also assume that , , and uniformly converge to κ(t), π(t), and , respectively.
A. The proof for the asymptotic normality of
For i = 1, …, n, let XiIq(i) denote the selected gap time of subject i in the q-th resampling, where is the index for the selected gap time for subject i. To establish the asymptotic normality of the estimator , we start from the following equation
| (A.1) |
where β* is on the line segment between and β0. Note that (A.1) can be rewritten as
It is easy to see that Aq is a positive definite matrix. Since the Q resamples are identically distributed, Aq converges in probability to a deterministic and positive definite matrix denoted by . Meanwhile, we rewrite n−1/2Uq(β0) as
where . Note that is equal to
where , Zi, Yij(u) : 0 ≤ u ≤ t}, and is the σ-algebra consisting of all open sets in R. It can be seen that Mij(t) is a local square-integrable martingale with respect to the marginal filtration when .
For given Q, is equal to
Since the resampling is uniform within a subject, it yields that converges to almost surely as Q → ∞. Therefore, we get
where Φi(β0), i = 1, …, n are independent and have zero mean and finite variance. Combining the multivariate central limit theorem with Slutsky’s theorem, it yields that converges in distribution to a normal random vector with zero mean and covariance matrix that can be consistently estimated by .
Next, in order to obtain the consistent estimator of the covariance matrix, we first note that the following equation holds (Hoffman, Sen and Weinberg 2001),
where the expectations on the right-hand side are with respect to the resampling distribution for given the observed data. It can be easily seen that , and hence we have
| (A.2) |
For the q-th resampled data set, it is obvious that is a consistent estimator of . The resulting estimator denoted by , obtained by averaging over the Q resamples, is also a consistent estimator of . For the second term on the right-hand side of equation (A.2), note that
Denote
the estimated variance-covariance matrix of (Cong, Yin and Shen 2007) is
Finally, to show the consistency of , it is sufficient to show that , in probability, as n → ∞. It is obvious that as n → ∞. Moreover, using the same arguments as those in Cong, Yin and Shen (2007), we can show that as n → ∞. This completes the proof.
B. The proof for the asymptotic normality of
Firstly, we can see that and have the same limits as those of and . Thus we obtain
Since is the solution of the estimating equation U*(β) = 0, using a Taylor expansion, we have
| (A.3) |
where β* is on the line segment between and β0. Rewriting (A.3) yields
where . Some simple computation yields
which is positive definite, and converges in probability to a deterministic and positive definite matrix denoted by . Hence, has the following asymptotic representation
It can be seen that Θi(β0), i = 1, …, n are independent random vectors with zero mean and bounded variance. By using the multivariate central limit theorem and Slutsky’s theorem, we obtain that converges in distribution to a normal random vector with zero mean and covariance matrix that can be consistently estimated by .
References
- Chang SH 2004. Estimating marginal effects in accelerated failure time models for serial sojourn times among repeated events. Lifetime Data Analysis 10: 175–190.
- Cong X, Yin G, and Shen Y 2007. Marginal analysis of correlated failure time data with informative cluster sizes. Biometrics 63: 663–672.
- Cook R and Lawless J 2007. The Statistical Analysis of Recurrent Events, 1st edn, New York: Springer-Verlag.
- Darlington GA and Dixon SN 2013. Event-weighted proportional hazards modeling for recurrent gap time data. Statistics in Medicine 32: 124–130.
- Ding J and Sun L 2017. Additive mixed effect model for recurrent gap time data. Lifetime Data Analysis 23: 223–253.
- Gail MH, Santner TJ, and Brown CC 1980. An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36: 255–266.
- Hoffman EB, Sen PK, and Weinberg CR 2001. Within-cluster resampling. Biometrika 88: 1121–1134.
- Huang Y and Chen YQ 2003. Marginal regression of gaps between recurrent events. Lifetime Data Analysis 9: 293–303.
- Kang F, Sun L, and Zhao X 2015. A class of transformed hazard models for recurrent gap times. Computational Statistics and Data Analysis 83: 151–167.
- Lin DY, Sun W, and Ying Z 1999. Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika 86: 59–70.
- Lin DY and Ying Z 1994. Semiparametric analysis of the additive risk model. Biometrika 81: 61–71.
- Liu Y, Sun L, and Zhou Y 2013. Additive transformation models for recurrent events. Communications in Statistics - Theory and Methods 42: 4043–4055.
- Luo X and Huang CY 2011. Analysis of recurrent gap time data using the weighted risk-set method and the modified within-cluster resampling method. Statistics in Medicine 30: 301–311.
- O’Neill TJ 1986. Inconsistency of the misspecified proportional hazards model. Statistics & Probability Letters 4: 219–222.
- Schaubel DE and Cai J 2004. Regression analysis for gap time hazard functions of sequentially ordered multiplicative failure time data. Biometrika 91: 291–303.
- Strawderman RL 2005. The accelerated gap times model. Biometrika 92: 1299–1315.
- Sun L, Park D, and Sun J 2006. The additive hazards model for recurrent gap times. Statistica Sinica 16: 919–932.
- Wang MC and Chang SH 1999. Nonparametric estimation of a recurrent survival function. Journal of the American Statistical Association 94: 146–153.
- Zeng D and Cai J 2010. Additive transformation models for clustered failure time data. Lifetime Data Analysis 16: 333–352.


