Author manuscript; available in PMC: 2015 Apr 23.
Published in final edited form as: J Nonparametr Stat. 2012 Sep 18;24(4):1041–1050. doi: 10.1080/10485252.2012.720256

Regression analysis of clustered interval-censored failure time data with the additive hazards model

Junlong Li a,*, Chunjie Wang b, Jianguo Sun a
PMCID: PMC4407380  NIHMSID: NIHMS474043  PMID: 25914511

Abstract

This paper discusses regression analysis of clustered failure time data, which means that the failure times of interest are clustered into small groups instead of being independent. Clustering occurs in many fields such as medical studies. For the problem, a number of methods have been proposed, but most of them apply only to clustered right-censored data. In practice, failure time data are often interval-censored; that is, the failure times of interest are known only to lie in certain intervals. We propose an estimating equation-based approach for regression analysis of clustered interval-censored failure time data generated from the additive hazards model. A major advantage of the proposed method is that it does not involve the estimation of any baseline hazard function. Both asymptotic and finite sample properties of the proposed estimates of regression parameters are established and the method is illustrated using data arising from a lymphatic filariasis study.

Keywords: additive hazards model, clustered data, estimating equation, interval censoring, semi-parametric regression analysis

1. Introduction

This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are clustered into small groups or some study subjects are related such as siblings, families, or communities. The subjects from the same cluster or group usually share certain unobserved characteristics and their failure times tend to be correlated as a result. Siblings, for example, share similar genetic and environmental influences. A key feature of the failure time data is censoring and one type of censoring that has been extensively discussed is right censoring. Another complex type is interval censoring that arises when the failure event of interest cannot be observed directly but is known only to have occurred over a time interval. This is common and natural in a clinical trial or longitudinal study in which there is a periodic follow-up. For instance, an individual who is monitored weekly for a response may miss visits for a few weeks and return in a changed response state, thus contributing an interval-censored observation. In the following, we will focus on the regression analysis of such data.

For regression analysis of clustered failure time data, several methods have been proposed when only right censoring is present (Cai and Prentice 1997; Cai, Wei and Wilcox 2000; Zeng, Lin and Lin 2008). For example, Cai et al. (2000) and Zeng et al. (2008) investigated the fitting of semi-parametric linear transformation models to clustered right-censored data and Rossini and Moore (1999) considered the use of estimating equations and pseudolikelihood for the analysis. For the case of interval-censored data, there also exist some procedures when there is no clustering (Jewell 1994; Wang and Ding 2000; Jewell and van der Laan 2004). For example, Huang (1996) considered the maximum-likelihood approach for Cox model regression of current status data, a special case of interval-censored data, and Sun (2006) gave a comprehensive review of the existing literature on the topic.

The additive hazards model is one of the most commonly used models for regression analysis of failure time data (Lin, Oakes and Ying 1998; Ghosh 2001; Martinussen and Scheike 2002; Wang, Sun and Tong 2010; Cai and Zeng 2011). For instance, Lin et al. (1998) considered the fitting of the model to current status data and developed some estimating equation approaches for the estimation of regression parameters. Wang et al. (2010) discussed the fitting of the model to general interval-censored failure time data and Cai and Zeng (2011) investigated the additive mixed effect model for clustered right-censored failure time data. In this paper, an approach is developed for fitting the additive hazards model to clustered interval-censored failure time data.

The remainder of the paper is organised as follows. We begin in Section 2 by describing the notation and models that will be used throughout the paper. In particular, the failure time of interest is assumed to follow the additive hazards model marginally. In Section 3, an estimating equation-based approach is developed for estimating regression parameters of interest and the asymptotic properties of the proposed estimates are established. The approach can be easily implemented and does not require the estimation of the baseline hazard function. Section 4 presents some results obtained from a simulation study performed to evaluate the proposed estimation procedure and Section 5 illustrates the proposed approach using a set of clustered interval-censored data arising from a lymphatic filariasis (LF) study. Some concluding remarks and discussion are provided in Section 6.

2. Notation and models

Consider a survival study that involves n clusters of subjects, with ni denoting the size of cluster i, where i = 1,…, n. Let Tij denote the failure time of interest for subject j in cluster i, where j = 1,…, ni, and let Zij(t) be a possibly time-dependent p-dimensional vector of covariates that is associated with the subject and assumed to be completely observed. Throughout this paper, the Tijs are considered to be independent for subjects in different clusters but could be dependent for subjects within the same cluster, and only interval-censored data on the Tijs are observed. Specifically, for each Tij, there exist two monitoring times denoted by Uij and Vij with Uij ≤ Vij such that Tij is known only to be smaller than Uij, between Uij and Vij, or greater than Vij. That is, we have clustered interval-censored data on the Tijs. Furthermore, it will be assumed that Tij is independent of the monitoring times Uij and Vij given the covariate process Zij(t).

As mentioned above, our focus will be on the estimation of the covariate effect on the Tijs. For this, we assume that for each cluster, there exists a latent variable bi and all Tijs are independent given bi. Also, the hazard function for Tij is specified by the following additive hazard model:

$$\lambda_{ij}(t \mid Z_{ij}(s), b_i, s \le t) = \lambda_0(t) + \beta_0^{T} Z_{ij}(t) + b_i \qquad (1)$$

given bi and the covariate process up to time t (Lin et al. 1998; Ghosh 2001). Here, λ0(t) is an unknown baseline hazard function and β0 denotes the p-dimensional vector of regression parameters.
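As a brief worked step (a sketch added for illustration, not taken from the original derivation), model (1) implies the conditional survival function below. The shorthand $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ and $Z_{ij}^{*}(t) = \int_0^t Z_{ij}(s)\,ds$ is assumed here.

```latex
\begin{aligned}
P(T_{ij} \ge t \mid Z_{ij}, b_i)
  &= \exp\Big\{ -\int_0^t \big[ \lambda_0(s) + \beta_0^{T} Z_{ij}(s) + b_i \big]\, ds \Big\} \\
  &= e^{-\Lambda_0(t)} \, \exp\{ -\beta_0^{T} Z_{ij}^{*}(t) \} \, e^{-b_i t}.
\end{aligned}
```

If bi is independent of the covariates, averaging over bi replaces the last factor by $E_b(e^{-b_i t})$, which is free of the regression parameters.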

In practice, the monitoring variables Uij and Vij could depend on covariates, too. To model this dependence, and because of the strict order restriction between them, it is natural to treat them as failure times as well. Here, we consider employing the Cox-type models

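$$\lambda_{ij}^{U}(t \mid Z_{ij}(s), s \le t) = \lambda_1(t) \exp\{\gamma_0^{T} Z_{ij}(t)\} \qquad (2)$$

and

$$\lambda_{ij}^{V}(t \mid U_{ij}, Z_{ij}(s), s \le t) = I(t > U_{ij})\, \lambda_2(t) \exp\{\gamma_0^{T} Z_{ij}(t)\} \qquad (3)$$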

for their hazard functions. In the above, both λ1(t) and λ2(t) are unspecified baseline hazard functions and γ0 is a p-dimensional vector of regression parameters. Note that model (3) essentially assumes that the gap time between the two monitoring times Uij and Vij follows a Cox-type model conditional on Uij. There are several motivations for considering the models described. First, it is well known that the Cox model is one of the most widely used models partly due to its simplicity and it will be seen that under the models above, an easy estimation procedure can be developed for regression coefficients without the need to estimate the baseline hazard functions. Also, the model assumptions can be easily checked since we have complete data for the monitoring times. Furthermore, the same baseline hazard functions and covariate effects for all subjects were assumed in the models (1)–(3), respectively, for simplicity and the method developed below can be easily generalised to more general situations. Some comments on these will be given below.

For subject j in cluster i, define

$$\delta_{1ij} = I(T_{ij} < U_{ij}), \quad \delta_{2ij} = I(U_{ij} \le T_{ij} < V_{ij}), \quad \delta_{3ij} = I(T_{ij} \ge V_{ij}) = 1 - \delta_{1ij} - \delta_{2ij},$$

where j = 1,…, ni; i = 1,…, n. Then, define the following counting processes:

$$N_{ij}^{(1)}(t) = (1 - \delta_{1ij})\, I(U_{ij} \le t) = \delta_{ij}^{(1)} I(U_{ij} \le t)$$

and conditional on Uij,

$$N_{ij}^{(2)}(t) = \delta_{3ij}\, I(V_{ij} \le t) = \delta_{ij}^{(2)} I(V_{ij} \le t),$$

where $\delta_{ij}^{(1)} = 1 - \delta_{1ij}$ and $\delta_{ij}^{(2)} = \delta_{3ij}$. Note that the definition of $N_{ij}^{(2)}$ is naturally based on the order restriction between Uij and Vij and indicates that Vij is considered only after Uij has been observed. Based on the models (1)–(3), following the same arguments as those in Wang et al. (2010), one can derive the intensity functions of $N_{ij}^{(1)}$ and $N_{ij}^{(2)}$ as

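$$\lambda_{ij}^{(1)}(t \mid Z_{ij}) = I(U_{ij} \ge t)\, E_b(e^{-b_i t})\, e^{-\Lambda_0(t)}\, \lambda_1(t) \exp\{-\beta_0^{T} Z_{ij}^{*}(t) + \gamma_0^{T} Z_{ij}(t)\} \qquad (4)$$

and

$$\lambda_{ij}^{(2)}(t \mid Z_{ij}) = I(U_{ij} < t \le V_{ij})\, E_b(e^{-b_i t})\, e^{-\Lambda_0(t)}\, \lambda_2(t) \exp\{-\beta_0^{T} Z_{ij}^{*}(t) + \gamma_0^{T} Z_{ij}(t)\}, \qquad (5)$$

respectively, where $Z_{ij}^{*}(t) = \int_0^t Z_{ij}(s)\,ds$ and $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. It can be seen that models (4) and (5) are Cox-type ones similar to models (2) and (3) and that model (5) is a conditional one since the starting time point is the observed monitoring time Uij. In the following section, we will establish some estimating equations for the estimation of regression coefficients β0 and γ0.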
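The censoring indicators and counting processes defined in this section can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the latent failure time `t` is passed in solely to produce the indicators, since in real data only the indicators and the monitoring times are observed.

```python
def censoring_indicators(t, u, v):
    """Return (delta1, delta2, delta3, delta_k1, delta_k2) for one subject.

    t is the (latent) failure time; u <= v are the two monitoring times.
    delta_k1 = 1 - delta1 and delta_k2 = delta3 are the indicators that
    multiply the counting processes N^(1) and N^(2).
    """
    if u > v:
        raise ValueError("monitoring times must satisfy u <= v")
    d1 = int(t < u)        # failure before the first monitoring time
    d2 = int(u <= t < v)   # failure between the two monitoring times
    d3 = 1 - d1 - d2       # failure at or after the second monitoring time
    return d1, d2, d3, 1 - d1, d3


def counting_processes(t, u, v, s):
    """Values of N^(1)(s) and N^(2)(s) for one subject at time s."""
    _, _, d3, dk1, _ = censoring_indicators(t, u, v)
    n1 = dk1 * int(u <= s)   # jumps at u only if the subject has not failed by u
    n2 = d3 * int(v <= s)    # jumps at v only if the subject has not failed by v
    return n1, n2
```

For example, a subject with monitoring times 1 and 2 who fails at time 1.5 contributes a jump of N^(1) at time 1 and no jump of N^(2).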

3. Estimation of regression parameters

Now we consider the estimation of regression parameters β0 and γ0 and for this, we will employ the estimating equation approach. For l = 0, 1 and 2, define

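$$S_{1,\beta}^{(l)}(t; \beta, \gamma) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} I(t \le U_{ij})\, Z_{ij}^{*(l)}(t) \exp\{-\beta^{T} Z_{ij}^{*}(t) + \gamma^{T} Z_{ij}(t)\}$$

and

$$S_{2,\beta}^{(l)}(t; \beta, \gamma) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} I(U_{ij} < t \le V_{ij})\, Z_{ij}^{*(l)}(t) \exp\{-\beta^{T} Z_{ij}^{*}(t) + \gamma^{T} Z_{ij}(t)\},$$

where $Z_{ij}^{*(l)}(t) = \int_0^t Z_{ij}^{(l)}(s)\,ds$ and $Z_{ij}^{*}(t) = Z_{ij}^{*(1)}(t)$. Here for any vector a, a(0) = 1, a(1) = a and a(2) = aaT. Based on models (4) and (5) and motivated by Zhu, Tong and Sun (2008) and Wang et al. (2010), we propose the following estimating function:

$$\begin{aligned}
U_\beta(\beta, \gamma) &= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \int_0^\infty \Big\{ Z_{ij}^{*}(t) - \frac{S_{1,\beta}^{(1)}(t; \beta, \gamma)}{S_{1,\beta}^{(0)}(t; \beta, \gamma)} \Big\}\, dN_{ij}^{(1)}(t) \\
&\quad + \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \int_0^\infty \Big\{ Z_{ij}^{*}(t) - \frac{S_{2,\beta}^{(1)}(t; \beta, \gamma)}{S_{2,\beta}^{(0)}(t; \beta, \gamma)} \Big\}\, dN_{ij}^{(2)}(t) \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} \Big\{ Z_{ij}^{*}(W_{ij}^{(k)}) - \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma)} \Big\}
\end{aligned}$$

for the estimation of β0 if γ0 is known, where $W_{ij}^{(1)} = U_{ij}$ and $W_{ij}^{(2)} = V_{ij}$.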

The key idea here is to reduce general interval-censored data to current status data (Zhu et al. 2008). In the expression of Uβ(β, γ), the first term is the partial likelihood score function under model (4) if one considers only the current status data given by the observations on the Uijs, while the second term is the partial likelihood score function under model (5) if one considers only the current status data given by the observations on the Vijs. Thus, Uβ(β, γ) is an unbiased estimating function. A similar estimating function can be developed for the estimation of γ0. However, complete data are actually available for the monitoring times Uij and Vij, and thus it is more efficient to estimate γ0 based on models (2) and (3).
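A minimal sketch of how the estimating function for β0 could be evaluated and solved when γ is treated as known is given below. It assumes a scalar, time-independent covariate, so that the integrated covariate reduces to Z·t, and it takes the l = 0 weight to be 1 as in the usual partial likelihood score; both are simplifying assumptions made here. The toy data, the function names and the bisection solver are hypothetical and only illustrate the mechanics.

```python
import math

# Hypothetical toy data, one tuple per subject: (Z, U, V, delta1, delta3)
SUBJECTS = [
    (1.0, 1.0, 2.0, 0, 0),   # failure in [U, V)
    (0.0, 2.0, 3.0, 0, 0),   # failure in [U, V)
    (1.0, 3.0, 5.0, 1, 0),   # failure before U
    (0.0, 4.0, 6.0, 0, 1),   # failure at or after V
]
N_CLUSTERS = 2  # assumed number of clusters n (normalising constant only)


def s_beta(k, l, t, beta, gamma):
    """S_{k,beta}^{(l)}(t; beta, gamma) for l = 0 or 1 and a scalar covariate."""
    total = 0.0
    for z, u, v, _, _ in SUBJECTS:
        at_risk = (t <= u) if k == 1 else (u < t <= v)
        if at_risk:
            z_star = z * t  # integrated covariate for a time-independent Z
            total += (z_star ** l) * math.exp(-beta * z_star + gamma * z)
    return total / N_CLUSTERS


def u_beta(beta, gamma):
    """Estimating function U_beta(beta, gamma) in its summed form."""
    total = 0.0
    for z, u, v, d1, d3 in SUBJECTS:
        for k, w, delta in ((1, u, 1 - d1), (2, v, d3)):
            if delta:
                ratio = s_beta(k, 1, w, beta, gamma) / s_beta(k, 0, w, beta, gamma)
                total += z * w - ratio
    return total / N_CLUSTERS


def solve_increasing(f, lo, hi, tol=1e-10):
    """Bisection for the root of an increasing function with f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Solve U_beta(beta, gamma) = 0 with gamma fixed at 0 for illustration.
beta_hat = solve_increasing(lambda b: u_beta(b, 0.0), -20.0, 20.0)
```

In this scalar setting the function is nondecreasing in β, which is why a simple bracketing search is enough; with vector covariates a Newton-type solver would be the more natural choice.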

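To this end, define $\tilde N_{ij}^{(1)}(t) = I(U_{ij} \le t)$ and $\tilde N_{ij}^{(2)}(t) = I(V_{ij} \le t)$ given Uij. Also for l = 0, 1 and 2, define

$$S_{1,\gamma}^{(l)}(t; \gamma) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} I(t \le U_{ij}) \exp\{\gamma^{T} Z_{ij}(t)\}\, Z_{ij}^{(l)}(t)$$

and

$$S_{2,\gamma}^{(l)}(t; \gamma) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} I(U_{ij} < t \le V_{ij}) \exp\{\gamma^{T} Z_{ij}(t)\}\, Z_{ij}^{(l)}(t).$$

Then, an estimating function for γ0 can be constructed as

$$\begin{aligned}
U_\gamma(\gamma) &= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \int_0^\infty \Big\{ Z_{ij}(t) - \frac{S_{1,\gamma}^{(1)}(t; \gamma)}{S_{1,\gamma}^{(0)}(t; \gamma)} \Big\}\, d\tilde N_{ij}^{(1)}(t) \\
&\quad + \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \int_0^\infty \Big\{ Z_{ij}(t) - \frac{S_{2,\gamma}^{(1)}(t; \gamma)}{S_{2,\gamma}^{(0)}(t; \gamma)} \Big\}\, d\tilde N_{ij}^{(2)}(t) \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{S_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma)}{S_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma)} \Big\}.
\end{aligned}$$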

Define the estimate $\hat\gamma$ of γ0 to be the solution to Uγ(γ) = 0 and the estimate $\hat\beta$ of β0 to be the solution to $U_\beta(\beta, \hat\gamma) = 0$. It can be easily shown that $\hat\gamma$ is consistent and has an asymptotic normal distribution (Wei, Lin and Weissfeld 1989; Lin 1994). The consistency of $\hat\beta$ can be similarly proved by noting that $\hat A_\beta(\beta, \hat\gamma) = n^{-1}\, \partial U_\beta(\beta, \hat\gamma) / \partial \beta$ converges to a positive definite matrix at β0. For the asymptotic distribution of $\hat\beta$, one can first show that $n^{1/2} U_\beta(\beta_0, \hat\gamma)$ converges in distribution to a normal random vector with mean zero and a covariance matrix that can be consistently estimated. It then follows from the Taylor series expansion of $U_\beta(\hat\beta, \hat\gamma)$ around β0 that the distribution of $n^{1/2}(\hat\beta - \beta_0)$ can be asymptotically approximated by the normal distribution with mean zero and a covariance matrix that can be consistently estimated. The estimate of this covariance matrix is provided in the appendix, where the proof of the result is also sketched.
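To make the first estimation step concrete, the following sketch solves Uγ(γ) = 0 for a scalar, time-independent covariate. The toy data, the function names and the bisection solver are hypothetical; the function is nonincreasing in γ in this setting, so a bracketing search suffices for illustration.

```python
import math

# Hypothetical toy data, one tuple per subject: (Z, U, V); U and V are fully observed
SUBJECTS = [(1.0, 1.0, 2.0), (0.0, 2.0, 3.0), (1.0, 3.0, 5.0), (0.0, 4.0, 6.0)]
N_CLUSTERS = 2  # assumed number of clusters n (normalising constant only)


def s_gamma(k, l, t, gamma):
    """S_{k,gamma}^{(l)}(t; gamma) for l = 0 or 1 and a scalar covariate."""
    total = 0.0
    for z, u, v in SUBJECTS:
        at_risk = (t <= u) if k == 1 else (u < t <= v)
        if at_risk:
            total += (z ** l) * math.exp(gamma * z)
    return total / N_CLUSTERS


def u_gamma(gamma):
    """Estimating function U_gamma(gamma) in its summed form."""
    total = 0.0
    for z, u, v in SUBJECTS:
        for k, w in ((1, u), (2, v)):
            total += z - s_gamma(k, 1, w, gamma) / s_gamma(k, 0, w, gamma)
    return total / N_CLUSTERS


def solve_decreasing(f, lo, hi, tol=1e-10):
    """Bisection for the root of a decreasing function with f(lo) > 0 > f(hi)."""
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma_hat = solve_decreasing(u_gamma, -20.0, 20.0)
```

For this particular toy data set the equation reduces to 3/(1 + g) = g/(2 + g) with g = exp(γ), so the root can be checked by hand as γ = log(1 + √7).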

4. A simulation study

We conducted an extensive simulation study to assess the finite sample performance of the estimation procedure proposed in the previous sections. To generate the failure times of interest in the study, it was assumed that Tij followed the model

$$\lambda_{ij}(t \mid Z_{ij}, b_i) = \lambda_0(t) + \beta_0^{T} Z_{ij} + b_i$$

with λ0(t) = 2. Note that the covariate was assumed to be time independent for simplicity and was generated from the Bernoulli distribution with success probability p = 0.5. Also, the latent variables bi were assumed to follow a normal distribution with zero mean and variance equal to 1/4. The monitoring variables Uij and Vij were generated based on models (2) and (3) with λ1(t) = 4 and λ2(t) = 2, respectively. Finally, the cluster size ni was assumed to follow the discrete uniform distribution on {2, 3, 4}.
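The data-generating mechanism just described can be sketched as follows. With constant baseline hazards, the failure time is exponential given the covariate and the latent effect, and the gap between the two monitoring times is exponential as well. The function names, the frailty variance default of 0.25 and the rule of redrawing the latent effect until the hazard is positive are assumptions made for this sketch, not details stated in the paper.

```python
import math
import random


def simulate_cluster(rng, beta0, gamma0, lam0=2.0, lam1=4.0, lam2=2.0, frailty_var=0.25):
    """Simulate one cluster; returns a list of (z, u, v, delta1, delta3) tuples."""
    n_i = rng.choice([2, 3, 4])  # cluster size, uniform on {2, 3, 4}
    # Assumption: redraw the latent effect until every hazard in the cluster is positive.
    while True:
        b = rng.gauss(0.0, math.sqrt(frailty_var))
        if lam0 + min(beta0, 0.0) + b > 0.0:
            break
    rows = []
    for _ in range(n_i):
        z = 1 if rng.random() < 0.5 else 0          # Bernoulli(0.5) covariate
        t = rng.expovariate(lam0 + beta0 * z + b)   # constant additive hazard
        rate = math.exp(gamma0 * z)
        u = rng.expovariate(lam1 * rate)            # first monitoring time, model (2)
        v = u + rng.expovariate(lam2 * rate)        # gap time after u, model (3)
        # Only the censoring indicators are kept; t itself is never observed.
        rows.append((z, u, v, int(t < u), int(t >= v)))
    return rows


def simulate_study(n_clusters, beta0, gamma0, seed=0):
    """Simulate a whole study as a list of clusters."""
    rng = random.Random(seed)
    return [simulate_cluster(rng, beta0, gamma0) for _ in range(n_clusters)]


data = simulate_study(200, beta0=0.25, gamma0=-0.25)
```

Each replication of a simulation study would call `simulate_study` with a different seed and then apply the estimation procedure to the resulting indicators and monitoring times.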

The following tables give the results based on 1000 replications, including the bias of the estimates, given by the average of the estimates minus the true value (BIAS), the sample standard deviation of the estimates (SSD), the average of the estimated standard errors (ESE) and the 95% empirical coverage probability (95%-CP). In particular, Table 1 presents the results on the estimates β^ and γ^ based on the simulated data with the true value of β0 being −0.25, 0, 0.25, 0.5 or 1 and with γ0 = −0.25. Two choices of the number of clusters, n = 200 and 400, are considered. The results suggest that the proposed estimates are approximately unbiased and that the SSD is close to the ESE, suggesting that the proposed variance estimate and the normal approximation to the distribution of the estimates are reasonable. The parameter γ is estimated more precisely than the parameter β. This is reasonable, as complete data are available for the former while only incomplete data are observed for the latter.
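As an illustration of how the four reported summaries could be computed from the replications, consider the sketch below. The function name is hypothetical, and the coverage probability is assumed to be based on the usual Wald interval (estimate ± 1.96 × estimated standard error), which the text does not spell out.

```python
import math
import statistics


def summarize(estimates, std_errors, true_value, z=1.959964):
    """Return (BIAS, SSD, ESE, 95%-CP) over a set of replications."""
    bias = statistics.mean(estimates) - true_value   # average estimate minus truth
    ssd = statistics.stdev(estimates)                # sample SD of the estimates
    ese = statistics.mean(std_errors)                # average estimated standard error
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    cp = sum(covered) / len(covered)                 # share of intervals covering truth
    return bias, ssd, ese, cp


# Four made-up replications, used only to show the call.
bias, ssd, ese, cp = summarize([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], 1.0)
```

In this made-up example three of the four intervals cover the true value, so the empirical coverage is 0.75.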

Table 1.

Estimation of γ and β with binary covariate and γ0 = −0.25.

(Columns 3 to 6: n = 200; columns 7 to 10: n = 400.)
TRUE EST BIAS SSD ESE 95%-CP BIAS SSD ESE 95%-CP
γ0 = −0.25 γ^ −0.0068 0.1107 0.1077 0.948 −0.0029 0.0772 0.0760 0.949
β0 = −0.25 β^ −0.0638 0.8182 0.7605 0.937 −0.0299 0.5587 0.5331 0.940
γ0 = −0.25 γ^ −0.0084 0.1087 0.1059 0.948 0.0058 0.0752 0.0748 0.965
β0 = 0.00 β^ −0.0694 0.8162 0.7619 0.966 −0.0310 0.5616 0.5352 0.969
γ0 = −0.25 γ^ 0.0112 0.1074 0.1046 0.941 0.0082 0.0745 0.0739 0.952
β0 = 0.25 β^ −0.0767 0.8226 0.7672 0.965 −0.0317 0.5658 0.5376 0.960
γ0 = −0.25 γ^ 0.0193 0.1063 0.1036 0.944 0.0120 0.0734 0.0732 0.945
β0 = 0.50 β^ −0.0832 0.8256 0.7728 0.967 −0.0319 0.5640 0.5419 0.945
γ0 = −0.25 γ^ 0.0341 0.1044 0.1021 0.934 0.0259 0.0721 0.0720 0.937
β0 = 1.00 β^ −0.0962 0.8518 0.7902 0.955 −0.0480 0.5797 0.5533 0.938

Tables 2 and 3 give the results obtained for the estimation of the regression parameters β and γ with γ0 = 0 and 0.25, respectively, and all other set-ups being the same as those in Table 1. They are similar to those given in Table 1 and again suggest that the proposed estimation approach seems to work well for the situations considered.

Table 2.

Estimation of γ and β with binary covariate and γ0 = 0.00.

(Columns 3 to 6: n = 200; columns 7 to 10: n = 400.)
TRUE EST BIAS SSD ESE 95%-CP BIAS SSD ESE 95%-CP
γ0 = 0.00 γ^ −0.0051 0.1126 0.1100 0.952 −0.0034 0.0799 0.0777 0.945
β0 = −0.25 β^ 0.0172 0.8776 0.8255 0.938 0.0155 0.6065 0.5795 0.942
γ0 = 0.00 γ^ 0.0019 0.1106 0.1081 0.948 0.0039 0.0785 0.0764 0.945
β0 = 0.00 β^ 0.0315 0.8804 0.8273 0.975 0.0065 0.6158 0.5810 0.968
γ0 = 0.00 γ^ 0.0073 0.1102 0.1065 0.946 0.0122 0.0768 0.0753 0.943
β0 = 0.25 β^ −0.0225 0.8835 0.8303 0.973 −0.0061 0.6106 0.5838 0.963
γ0 = 0.00 γ^ 0.0122 0.1087 0.1053 0.941 0.0143 0.0755 0.0744 0.946
β0 = 0.50 β^ −0.0422 0.8903 0.8389 0.931 −0.0169 0.6110 0.5892 0.941
γ0 = 0.00 γ^ 0.0263 0.1063 0.1035 0.932 0.0212 0.0746 0.0732 0.940
β0 = 1.00 β^ −0.0748 0.9116 0.8583 0.933 −0.0498 0.6539 0.6036 0.960

Table 3.

Estimation of γ and β with binary covariate and γ0 = 0.25.

(Columns 3 to 6: n = 200; columns 7 to 10: n = 400.)
TRUE EST BIAS SSD ESE 95%-CP BIAS SSD ESE 95%-CP
γ0 = 0.25 γ^ −0.0036 0.1177 0.1151 0.941 −0.0009 0.0829 0.0831 0.956
β0 = −0.25 β^ 0.0992 0.9643 0.9112 0.963 0.0328 0.7017 0.6768 0.956
γ0 = 0.25 γ^ 0.0035 0.1154 0.1126 0.941 0.0054 0.0812 0.0762 0.948
β0 = 0.00 β^ 0.0850 0.9610 0.9150 0.940 0.0353 0.6957 0.6747 0.956
γ0 = 0.25 γ^ 0.0122 0.1138 0.1107 0.948 0.0118 0.0802 0.0783 0.945
β0 = 0.25 β^ 0.0731 0.9634 0.9133 0.971 0.0307 0.7038 0.6763 0.959
γ0 = 0.25 γ^ 0.0193 0.1124 0.1093 0.941 0.0186 0.0792 0.0772 0.938
β0 = 0.50 β^ 0.0627 0.9690 0.9216 0.936 0.0262 0.7102 0.6822 0.960
γ0 = 0.25 γ^ 0.0286 0.1112 0.1070 0.926 0.0208 0.0769 0.0757 0.935
β0 = 1.00 β^ 0.0432 0.9721 0.9289 0.972 0.0299 0.7129 0.6964 0.969

5. An application

In this section, we apply the estimation procedure proposed in the previous sections to a set of clustered interval-censored failure time data arising from an LF study (Williamson, Kim, Manatunga and Addiss 2008). The study followed 47 men with LF, a debilitating parasitic disease in which several worms live together in several nests. An effective treatment is expected to kill the worms in all of the nests. The goal of the study was to compare the effect of the co-administration of diethylcarbamazine (DEC) and albendazole (ALB) (new treatment) versus DEC alone (standard treatment) for the treatment of LF. The patients in the study were followed for a year after their treatment and periodically examined by ultrasound to see if the worms were still alive. Thus, for the times to the clearance of the worms in each nest, the variables of interest, only clustered interval-censored data were observed, with each patient serving as a cluster and the cluster size being the number of nests of adult filarial worms in the body of each patient.

Among 47 patients, 22 received the co-administration of DEC and ALB, while the others were given DEC alone. In total, 78 adult worm nests were detected by ultrasound with the cluster size ni ranging from 1 to 5. In addition to the treatment indicator, the age of each subject in years was also observed, ranging from 16 to 66. In the analysis below, we define X1i to be 0 if subject i was given the co-administration of DEC and ALB and 1 otherwise and let X2i be the age of the corresponding patient. Note that here we only have cluster-specific covariates.

To apply the proposed method, we assume that the times to the clearance of the worms and the monitoring times can be described by models (1)–(3), respectively. Since the observed data were given in the form Tij ∈ [Lij, Rij), that is, we have a mixture of left-, interval-, and right-censored observations, we need to transform them into the form expressed by Uij and Vij to implement the proposed estimation procedure. For an observed interval [Lij, Rij), we set Uij = Rij and Vij to be the largest observation time in the study if Lij = 0; if 0 < Lij < Rij < +∞, we let Uij = Lij and Vij = Rij; if Rij = +∞, we take Vij = Lij and Uij to be the smallest observation time in the study. Correspondingly, the estimating function Uγ (γ) needs to be adjusted to

$$U_\gamma(\gamma) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \tilde\delta_{ij}^{(k)} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{S_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma)}{S_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma)} \Big\},$$

where $\tilde\delta_{ij}^{(1)} = 1 - \delta_{3ij}$ and $\tilde\delta_{ij}^{(2)} = 1 - \delta_{1ij}$. In contrast, the estimating function Uβ(β, γ) remains the same. This essentially treats Uij as missing in the right-censored case, and Vij as missing in the left-censored case.
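The conversion rule from an observed interval to the pair of monitoring times, together with the adjusted indicators, can be sketched as below. The function name and the arguments `t_min` and `t_max` (the smallest and largest observation times in the study) are hypothetical labels chosen for this illustration.

```python
import math


def to_monitoring_times(left, right, t_min, t_max):
    """Map an observed interval [left, right) to (U, V, delta_tilde_1, delta_tilde_2).

    delta_tilde_1 = 1 - delta3 flags whether U is used in the adjusted U_gamma;
    delta_tilde_2 = 1 - delta1 flags whether V is used.
    """
    if left == 0 and math.isinf(right):
        raise ValueError("interval [0, inf) carries no information")
    if left == 0:
        # Left-censored: failure before `right`; V is treated as missing.
        return right, t_max, 1, 0
    if math.isinf(right):
        # Right-censored: failure at or after `left`; U is treated as missing.
        return t_min, left, 0, 1
    # Interval-censored: both monitoring times are observed.
    return left, right, 1, 1


# Example calls with made-up observation times (in months).
left_censored = to_monitoring_times(0, 3.0, 1.0, 12.0)
interval_censored = to_monitoring_times(2.0, 5.0, 1.0, 12.0)
right_censored = to_monitoring_times(7.0, math.inf, 1.0, 12.0)
```

The placeholder value assigned to the missing monitoring time does not enter the adjusted Uγ(γ) through its own term, because the corresponding indicator is zero.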

The results obtained by applying the proposed estimation procedure to the data are presented in Table 4, which includes the estimated treatment and age effects on the time to the clearance of the worms, the estimated standard deviations (SD), and the p-values for testing whether the covariate effects are equal to zero. They suggest that the two treatments do not differ significantly in killing or clearing the worms and that the clearance of the worms is not significantly related to the age of the patient. However, these conclusions should be interpreted with caution because of the small number of subjects.

Table 4.

Estimations from the LF study.

Covariate Estimate SD p-Value
Treatment (γ1) −0.375 0.383 0.327
Age (γ2) 0.0061 0.020 0.755
Treatment (β1) −0.522 0.442 0.237
Age (β2) 0.0108 0.415 0.794
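The paper does not state which test produced the p-values. Assuming a two-sided Wald test based on the normal approximation, which is an assumption made here, the calculation could be sketched as follows; the function name is hypothetical.

```python
import math


def wald_p_value(estimate, sd):
    """Two-sided p-value for H0: effect = 0 under a normal approximation."""
    z = abs(estimate / sd)
    # 2 * (1 - Phi(z)) written with the complementary error function
    return math.erfc(z / math.sqrt(2.0))


p_treatment_gamma = wald_p_value(-0.375, 0.383)
p_treatment_beta = wald_p_value(-0.522, 0.442)
```

Under this assumption the two treatment rows give values close to the reported 0.327 and 0.237.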

6. Concluding remarks and discussion

As mentioned before, clustered failure time data occur in a study if study subjects are related through being clustered into small groups. In this case, one has to take into account the correlation among them to perform a valid analysis. In the previous sections, an estimation procedure was developed for the problem in the presence of interval censoring for data arising from the additive hazards model and the asymptotic properties of the proposed estimates were established. One major advantage of the presented method is that it does not involve the estimation of the baseline hazard functions.

For the problem considered here, an alternative approach is to employ the full likelihood approach and a main advantage of this method is that one can expect that it could be more efficient than the estimating equation method proposed here. However, the full likelihood approach would be time-consuming and may be infeasible if a nonparametric or semi-parametric approach was adopted since it involves the estimation of the infinite-dimensional functions. Also, it could be difficult to derive the asymptotic properties of the resulting estimates. In contrast, the proposed method can be easily implemented.

The proposed methodology involves modelling gap times between the adjacent monitoring times using the Cox model. An alternative choice is to model the monitoring times using the Cox model marginally. However, such modelling requires stricter conditions due to the order relationship. In contrast, the gap time modelling approach is more flexible. In the presented approach, the latent variables bis are assumed to follow the normal distribution for simplicity and the methodology still applies for other distributions.

Acknowledgements

The authors are grateful to the editor and the reviewers for their insightful comments on the article. This work was partially supported by NIH grant 5 R01 CA152035 to the third author.

Appendix

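In the following, we will sketch the proofs of the asymptotic normality of the estimates $\hat\gamma$ and $\hat\beta$. First, we will discuss the asymptotic normality of $\hat\gamma$. For this, let $s_{k,\gamma}^{(l)}$ denote the limit of $S_{k,\gamma}^{(l)}$ for l = 0, 1, 2 and k = 1, 2. By following Wang et al. (2010), one can obtain that

$$\hat\gamma - \gamma_0 = U_{\gamma\gamma}^{-1}\, \frac{1}{n} \sum_{i=1}^{n} U_i(\gamma_0) + o_p(n^{-1/2}), \qquad (1)$$

where

$$U_i(\gamma) = \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{s_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma)}{s_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma)} \Big\}$$

and

$$U_{\gamma\gamma} = E\Big\{ \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big[ \frac{s_{k,\gamma}^{(2)}(W_{ij}^{(k)}; \gamma_0)}{s_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma_0)} - \frac{s_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma_0)\, s_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma_0)^{T}}{s_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma_0)^{2}} \Big] \Big\},$$

with $W_{ij}^{(1)} = U_{ij}$ and $W_{ij}^{(2)} = V_{ij}$. Thus, the distribution of $n^{1/2}(\hat\gamma - \gamma_0)$ can be approximated by the normal distribution with mean zero and covariance matrix $U_{\gamma\gamma}^{-1} \Sigma_\gamma U_{\gamma\gamma}^{-1}$, which can be consistently estimated by $\hat U_{\gamma\gamma}^{-1} \hat\Sigma_\gamma \hat U_{\gamma\gamma}^{-1}$. In the above,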

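$$\hat\Sigma_\gamma = \frac{1}{n} \sum_{i=1}^{n} U_i(\hat\gamma)\, U_i(\hat\gamma)^{T},$$

$$U_i(\hat\gamma) = \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{s_{k,\hat\gamma}^{(1)}(W_{ij}^{(k)}; \hat\gamma)}{s_{k,\hat\gamma}^{(0)}(W_{ij}^{(k)}; \hat\gamma)} \Big\}$$

and

$$\hat U_{\gamma\gamma} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big\{ \frac{S_{k,\gamma}^{(2)}(W_{ij}^{(k)}; \hat\gamma)}{S_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \hat\gamma)} - \frac{S_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \hat\gamma)\, S_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \hat\gamma)^{T}}{S_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \hat\gamma)^{2}} \Big\}.$$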
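For a scalar, time-independent covariate, the sandwich-type variance estimate for the monitoring-time coefficient could be computed along the following lines. This is a sketch under stated assumptions: the toy data and function names are hypothetical, the empirical quantities S are used in the cluster-level terms, and the value plugged in as the estimate is the root of the estimating function for this particular toy data set.

```python
import math

# Hypothetical toy data, one tuple per subject: (cluster index, Z, U, V)
SUBJECTS = [(0, 1.0, 1.0, 2.0), (0, 0.0, 2.0, 3.0), (1, 1.0, 3.0, 5.0), (1, 0.0, 4.0, 6.0)]
N_CLUSTERS = 2


def s_gamma(k, l, t, gamma):
    """S_{k,gamma}^{(l)}(t; gamma) for l = 0, 1, 2 and a scalar covariate."""
    total = 0.0
    for _, z, u, v in SUBJECTS:
        at_risk = (t <= u) if k == 1 else (u < t <= v)
        if at_risk:
            total += (z ** l) * math.exp(gamma * z)
    return total / N_CLUSTERS


def sandwich_se(gamma_hat):
    """Return (U_gg_hat, Sigma_hat, standard error of gamma_hat)."""
    scores = [0.0] * N_CLUSTERS   # cluster-level terms U_i(gamma_hat)
    info = 0.0
    for c, z, u, v in SUBJECTS:
        for k, w in ((1, u), (2, v)):
            s0, s1, s2 = (s_gamma(k, l, w, gamma_hat) for l in (0, 1, 2))
            scores[c] += z - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
    u_gg = info / N_CLUSTERS
    sigma = sum(s * s for s in scores) / N_CLUSTERS
    # Variance of n^{1/2}(gamma_hat - gamma_0) is sigma / u_gg^2; divide by n for gamma_hat.
    return u_gg, sigma, math.sqrt(sigma / u_gg ** 2 / N_CLUSTERS)


gamma_hat = math.log(1.0 + math.sqrt(7.0))  # root of U_gamma for this toy data set only
u_gg, sigma, se = sandwich_se(gamma_hat)
```

Summing the score terms within each cluster before squaring is what allows for correlation among subjects in the same cluster.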

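For the asymptotic normality of $\hat\beta$, let $s_{k,\beta}^{(l)}$ denote the limit of $S_{k,\beta}^{(l)}$, l = 0, 1, 2 and k = 1, 2. By some calculation and Equation (1), we have

$$\begin{aligned}
U_\beta(\beta, \gamma_0) - U_\beta(\beta, \hat\gamma)
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} \Big\{ \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \hat\gamma) - S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \hat\gamma)} \\
&\qquad - \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0) \big[ S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \hat\gamma) - S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0) \big]}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \hat\gamma)\, S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)} \Big\} \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} \Big\{ \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \hat\gamma) - S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)} \\
&\qquad - \frac{s_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0) \big[ S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \hat\gamma) - S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0) \big]}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)^{2}} \Big\} + o_p(n^{-1/2}).
\end{aligned}$$

Note that here $\delta_{ij}^{(1)} = 1 - \delta_{1ij}$ and $\delta_{ij}^{(2)} = \delta_{3ij}$. Furthermore, define the following for l = 0, 1,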

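$$A_1^{(l)}(t; \beta, \gamma) = E\Big[ \sum_{j=1}^{n_i} I(t \le U_{ij})\, Z_{ij}^{*(l)}(t)\, Z_{ij}(t)^{T} \exp\{-\beta^{T} Z_{ij}^{*}(t) + \gamma^{T} Z_{ij}(t)\} \Big],$$

$$A_2^{(l)}(t; \beta, \gamma) = E\Big[ \sum_{j=1}^{n_i} I(U_{ij} < t \le V_{ij})\, Z_{ij}^{*(l)}(t)\, Z_{ij}(t)^{T} \exp\{-\beta^{T} Z_{ij}^{*}(t) + \gamma^{T} Z_{ij}(t)\} \Big]$$

and

$$A_3(\beta, \gamma) = E\Big[ \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} \Big\{ \frac{A_k^{(1)}(W_{ij}^{(k)}; \beta, \gamma)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma)} - \frac{s_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma)\, A_k^{(0)}(W_{ij}^{(k)}; \beta, \gamma)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma)^{2}} \Big\} \Big],$$

where $Z_{ij}^{*(l)}(t) = \int_0^t Z_{ij}^{(l)}(s)\,ds$ and $Z_{ij}^{*}(t) = Z_{ij}^{*(1)}(t)$.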

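By the Taylor series expansion and Equation (1), one obtains that

$$\begin{aligned}
U_\beta(\beta, \hat\gamma) - U_\beta(\beta, \gamma_0)
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} (\hat\gamma - \gamma_0) \Big\{ \frac{A_k^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)} - \frac{s_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)\, A_k^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)^{2}} \Big\} + o_p(n^{-1/2}) \\
&= E\Big[ \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} (\hat\gamma - \gamma_0) \Big\{ \frac{A_k^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)} - \frac{s_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma_0)\, A_k^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma_0)^{2}} \Big\} \Big] + o_p(n^{-1/2}).
\end{aligned}$$

This yields that

$$U_\beta(\beta, \hat\gamma) = \frac{1}{n} \sum_{i=1}^{n} a_i(\beta, \gamma_0) + o_p(n^{-1/2}),$$

where

$$a_i(\beta, \gamma) = \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big[ \delta_{ij}^{(k)} \Big\{ Z_{ij}^{*}(W_{ij}^{(k)}) - \frac{s_{k,\beta}^{(1)}(W_{ij}^{(k)}; \beta, \gamma)}{s_{k,\beta}^{(0)}(W_{ij}^{(k)}; \beta, \gamma)} \Big\} + A_3(\beta, \gamma)\, U_{\gamma\gamma}^{-1} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{s_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \gamma)}{s_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \gamma)} \Big\} \Big].$$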

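This shows that the distribution of $n^{1/2}(\hat\beta - \beta_0)$ can be approximated by the normal distribution with zero mean and a covariance matrix that can be consistently estimated by $\hat\Gamma^{-1} \hat\Sigma \hat\Gamma^{-1}$, where

$$\hat\Gamma = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big\{ \frac{S_{k,\beta}^{(2)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)} - \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)\, S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)^{T}}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)^{2}} \Big\},$$

$$\hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} a_i(\hat\beta, \hat\gamma)\, a_i(\hat\beta, \hat\gamma)^{T},$$

$$a_i(\hat\beta, \hat\gamma) = \sum_{j=1}^{n_i} \sum_{k=1}^{2} \Big[ \delta_{ij}^{(k)} \Big\{ Z_{ij}^{*}(W_{ij}^{(k)}) - \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)} \Big\} + A_3(\hat\beta, \hat\gamma)\, \hat U_{\gamma\gamma}^{-1} \Big\{ Z_{ij}(W_{ij}^{(k)}) - \frac{S_{k,\gamma}^{(1)}(W_{ij}^{(k)}; \hat\gamma)}{S_{k,\gamma}^{(0)}(W_{ij}^{(k)}; \hat\gamma)} \Big\} \Big]$$

and

$$A_3(\hat\beta, \hat\gamma) = \frac{1}{n} \sum_{i=1}^{n} \Big[ \sum_{j=1}^{n_i} \sum_{k=1}^{2} \delta_{ij}^{(k)} \Big\{ \frac{A_k^{(1)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)} - \frac{S_{k,\beta}^{(1)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)\, A_k^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)}{S_{k,\beta}^{(0)}(W_{ij}^{(k)}; \hat\beta, \hat\gamma)^{2}} \Big\} \Big].$$

Here $Z_{ij}^{*}(t) = \int_0^t Z_{ij}(s)\,ds$.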

References

  1. Cai J, Prentice R. Regression Estimation Using Multivariate Failure Time Data and a Common Baseline Hazard Function Model. Lifetime Data Analysis. 1997;3:197–213. doi: 10.1023/a:1009613313677.
  2. Cai J, Zeng D. Additive Mixed Effect Model for Clustered Failure Time Data. Biometrics. 2011;67:1340–1351. doi: 10.1111/j.1541-0420.2011.01590.x.
  3. Cai T, Wei L, Wilcox M. Semi-parametric Regression Analysis for Clustered Failure Time Data. Biometrika. 2000;87:867–878.
  4. Ghosh D. Efficiency Considerations in the Additive Hazard Model with Current Status Data. Statistica Neerlandica. 2001;55:367–376.
  5. Huang J. Estimation for the Cox Model with Interval Censoring. The Annals of Statistics. 1996;24:540–568.
  6. Jewell NP. Non-parametric Estimation and Doubly-Censored Data: General Ideas and Applications to Aids. Statistics in Medicine. 1994;13:2081–2095. doi: 10.1002/sim.4780131917.
  7. Jewell NP, van der Laan MJ. Current Status Data: Review, Recent Developments and Open Problems. Advances in Survival Analysis. 2004;23:625–642.
  8. Lin D. Cox Regression Analysis of Multivariate Failure Time Data: The Marginal Approach. Statistics in Medicine. 1994;13:2233–2247. doi: 10.1002/sim.4780132105.
  9. Lin D, Oakes D, Ying Z. Additive Hazard Regression with Current Status Data. Biometrika. 1998;85:289–298.
  10. Martinussen T, Scheike TH. Efficient Estimation in Additive Hazard Regression with Current Status Data. Biometrika. 2002;89:649–658.
  11. Rossini A, Moore D. Modeling Clustered, Discrete, or Grouped Time Survival Data with Covariates. Biometrics. 1999;55:813–819. doi: 10.1111/j.0006-341x.1999.00813.x.
  12. Sun J. The Statistical Analysis of Interval Censored Failure Time Data. Springer; New York: 2006.
  13. Wang W, Ding AA. On Assessing the Association for Bivariate Current Status Data. Biometrika. 2000;87:879–893.
  14. Wang L, Sun J, Tong X. Regression Analysis of Case II Interval-Censored Failure Time Data with the Additive Hazard Model. Statistica Sinica. 2010;20:1709–1723.
  15. Wei L, Lin D, Weissfeld L. Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association. 1989;84:1065–1073.
  16. Williamson J, Kim H, Manatunga A, Addiss D. Modeling Survival Data with Informative Cluster Size. Statistics in Medicine. 2008;27:543–555. doi: 10.1002/sim.3003.
  17. Zeng D, Lin D, Lin X. Semiparametric Transformation Models with Random Effects for Clustered Failure Time Data. Statistica Sinica. 2008;18:355–377.
  18. Zhu L, Tong X, Sun J. A Transformation Approach for the Analysis of Interval-Censored Failure Time Data. Lifetime Data Analysis. 2008;14:167–178. doi: 10.1007/s10985-007-9075-8.
