Author manuscript; available in PMC: 2019 Oct 14.
Published in final edited form as: Lifetime Data Anal. 2017 Jun 12;24(1):94–109. doi: 10.1007/s10985-017-9397-0

Joint analysis of interval-censored failure time data and panel count data

Da Xu 1, Hui Zhao 2, Jianguo Sun 1,3
PMCID: PMC6790980  NIHMSID: NIHMS1054144  PMID: 28608228

Abstract

Interval-censored failure time data and panel count data are two types of incomplete data that commonly occur in event history studies, and many methods have been developed for their separate analysis (Sun in The statistical analysis of interval-censored failure time data. Springer, New York, 2006; Sun and Zhao in The statistical analysis of panel count data. Springer, New York, 2013). Sometimes one may be interested in, or need to conduct, their joint analysis, such as in clinical trials with composite endpoints, for which there does not seem to exist an established approach in the literature. In this paper, a sieve maximum likelihood approach is developed for the joint analysis, in which Bernstein polynomials are used to approximate the unknown functions. The asymptotic properties of the resulting estimators are established and, in particular, the proposed estimators of the regression parameters are shown to be semiparametrically efficient. In addition, an extensive simulation study was conducted, and the proposed method is applied to a set of real data arising from a skin cancer study.

Keywords: Bernstein polynomial, Event history study, Frailty model, Sieve maximum likelihood estimation

1. Introduction

Interval-censored failure time data and panel count data are two types of incomplete data that commonly occur in event history studies (Sun 2006; Sun and Zhao 2013). The former concerns the occurrence rate of a failure event or the event that occurs only once or only whose first occurrence is of interest, while the latter provides information on the occurrence rate of a recurrent event. Furthermore, the former means that the occurrence time of the event is observed or known only to belong to an interval, while the latter means that one only observes the numbers of the occurrences of the event between some discrete observation times. A common feature behind them is that they typically involve or occur with a periodic follow-up observation scheme. In the following, we will focus on the situations where the two types of the incomplete data occur together and discuss their joint analysis.

There exist many fields that often produce both interval-censored failure time data and panel count data together such as economic studies, medical studies, reliability experiments and social sciences. In an economic study, for example, one may have or need to analyze the data both on the length of holding a certain credit card and on employment changes. It is easy to see that such data are usually collected from periodic follow-up studies and thus given in the forms of interval-censored data and panel count data, respectively. An example from medical studies is clinical trials with two composite prime endpoints defined by a failure time event and a recurrent event. It is well-known that clinical trials usually employ periodic follow-up schemes and thus only interval-censored data and panel count data are available.

Many authors have discussed the analysis of either interval-censored failure time data or panel count data. For example, some early references on interval-censored data are given by Turnbull (1976) and Finkelstein (1986). The former discussed nonparametric estimation of a survival function based on interval-censored data and the latter investigated regression analysis of interval-censored data under the proportional hazards model. Some more recent references on interval-censored data include Huang and Rossini (1997) and Banerjee and Sen (2007); in particular, Chen et al. (2012) and Sun (2006) give relatively complete reviews of the literature on interval-censored data. The authors who have investigated the analysis of panel count data include Groeneboom and Wellner (1992), Sun and Wei (2000), Wellner and Zhang (2000), and Zhang et al. (2005). In particular, Sun and Zhao (2013) provide a relatively complete review of that literature, and several authors have considered sieve estimation methods for either interval-censored or panel count data separately (Hua et al. 2014; Lu et al. 2009; Ma et al. 2015; Zhang et al. 2010). However, there does not seem to exist an established approach for the joint analysis of the two types of data. In the following, we will present a sieve maximum likelihood approach for the problem based on Bernstein polynomials.

To describe the approach, we will first introduce some notation and the models as well as the resulting likelihood function in Sect. 2. Section 3 presents the proposed sieve maximum likelihood estimation procedure and in the approach, Bernstein polynomials are employed to approximate the unknown baseline mean and baseline cumulative hazard functions. In addition, the asymptotic properties of the resulting estimators are established. In Sect. 4, we present some results obtained from an extensive simulation study for the assessment of the proposed method and they suggest that it works well for practical situations. Section 5 provides an illustrative example and some discussion and concluding remarks are given in Sect. 6.

2. Notation, models and likelihood function

Consider an event history study that involves n independent subjects and two events of interest, a failure event and a recurrent event. For subject i, suppose that there exists a p-dimensional vector of covariates denoted by $Z_i$, and let $T_i$ and $N_i(t)$ denote the occurrence time of the failure event and the number of the occurrences of the recurrent event up to time t, respectively, i = 1, 2, … , n. Also suppose that each subject is observed only at a sequence of time points denoted by $s_{i1} < s_{i2} < \cdots < s_{i,m_i}$, where $m_i$ denotes the number of observations on subject i. Hence for each subject, one only observes

$$X_i=\{m_i,\ N_i(s_{ij}),\ \delta_{ik}=I(T_i\in(s_{i,k-1},s_{i,k}]);\ j=1,2,\ldots,m_i,\ k=1,2,\ldots,m_i+1\},$$

where $s_{i0} = 0$ and $s_{i,m_i+1}=\infty$. That is, we only have interval-censored data on the $T_i$’s and panel count data on the $N_i(t)$’s.
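As a small illustration of the observed-data structure just described, the following sketch computes the indicators $\delta_{ik}$ from a failure time and a subject's observation times. The function name `interval_indicators` and the list-based layout are assumptions made for this example only; in practice $T_i$ itself is never observed, so the function is useful mainly for simulated data.

```python
from bisect import bisect_left

def interval_indicators(T, s):
    """Return [delta_1, ..., delta_{m+1}] for observation times s_1 < ... < s_m.

    delta_k = 1 if T lies in (s_{k-1}, s_k], with s_0 = 0 and s_{m+1} = infinity.
    """
    # bisect_left gives the number of observation times strictly below T,
    # which is the 0-based index of the half-open interval containing T.
    k = bisect_left(s, T)
    return [1 if j == k else 0 for j in range(len(s) + 1)]
```

For example, with observation times 1, 2 and 3, a failure time of 2.5 falls in the third interval, and a failure time beyond 3 falls in the last, right-censored interval.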

To describe the effects of covariates on Ti and Ni(t) and the possible correlation between Ti and Ni(t), we will assume that there exists a latent variable ηi with mean 1 and unknown variance γ > 0. Suppose that given Zi and ηi, the cumulative hazard function of Ti has the form

$$\Lambda(t\mid Z_i,\eta_i)=\eta_i\,\Lambda_1(t)\,e^{\alpha' Z_i}, \tag{1}$$

where Λ1 denotes an unknown baseline cumulative hazard function and α is a vector of regression parameters. That is, Ti follows the proportional hazards frailty model. For Ni(t), we will assume that it is a nonhomogeneous Poisson process with the proportional mean function

$$E\{N_i(t)\mid Z_i,\eta_i\}=\eta_i\,\Lambda_2(t)\,e^{\beta' Z_i}, \tag{2}$$

where Λ2 is an unknown, nondecreasing baseline mean function and β a vector of regression parameters as α. Note that it is easy to see that ηi represents a measure of the association between Ti and Ni(t) and ηi = 1 means that they are independent given covariates. In the following, we will assume that the frailty ηi is independent of {Zi, Ti, Ni(t)} and given Zi and ηi, Ti and Ni(t) are independent. Also it will be assumed that given Zi, {mi, sij; j = 1, 2, … , mi} and ηi, Ti, Ni(t) are independent, and the conditional distribution of {mi, sij; j = 1, 2, … , mi} given Zi does not involve the parameters in models (1) and (2).

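Define θ = (α′, β′, γ, Λ1, Λ2). Then the likelihood function of θ is given by $L_n(\theta)=\prod_{i=1}^{n}L(\theta\mid X_i)$, where

$$L(\theta\mid X_i)=E_{\eta}\Big\{\prod_{k=1}^{m_i+1}\Big[e^{-\eta_i\Lambda_1(s_{i,k-1})e^{\alpha' Z_i}}-e^{-\eta_i\Lambda_1(s_{i,k})e^{\alpha' Z_i}}\Big]^{\delta_{ik}}\times\prod_{j=1}^{m_i}\Big[\eta_i\,\Delta\Lambda_2(s_{ij})\,e^{\beta' Z_i}\Big]^{\Delta N_i(s_{ij})}e^{-\eta_i\,\Delta\Lambda_2(s_{ij})\,e^{\beta' Z_i}}\Big\},$$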

with

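$$\Delta N_i(s_{ij})=N_i(s_{ij})-N_i(s_{i,j-1}),\quad \Delta N_i(s_{i1})=N_i(s_{i1}),$$
$$\Delta\Lambda_2(s_{ij})=\Lambda_2(s_{ij})-\Lambda_2(s_{i,j-1}),\quad \Delta\Lambda_2(s_{i1})=\Lambda_2(s_{i1}).$$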

Note that if the ηi’s are assumed to follow the gamma distribution, the likelihood contribution L(θ|Xi) can be simplified to

$$L(\theta\mid X_i)=Q_i\prod_{k=1}^{m_i+1}\big(A_{i,k-1}-A_{i,k}\big)^{\delta_{ik}},$$

where

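$$Q_i=\frac{\Gamma(N_i+\gamma^{-1})}{\Gamma(\gamma^{-1})}\prod_{j=1}^{m_i}\Delta\Lambda_2(s_{ij})^{\Delta N_i(s_{ij})}\,\big(\gamma e^{\beta' Z_i}\big)^{N_i},$$
$$A_{i0}=\big[1+\gamma e^{\beta' Z_i}\Lambda_2(s_{i,m_i})\big]^{-N_i-\gamma^{-1}},\quad A_{i,m_i+1}=0,$$
$$A_{ik}=\big[1+\gamma e^{\beta' Z_i}\Lambda_2(s_{i,m_i})+\gamma e^{\alpha' Z_i}\Lambda_1(s_{ik})\big]^{-N_i-\gamma^{-1}},\quad k=1,2,\ldots,m_i,$$

with $N_i=\sum_{j=1}^{m_i}\Delta N_i(s_{ij})=N_i(s_{i,m_i})$. In the following, for simplicity, we will focus on the situation where the $\eta_i$’s follow the gamma distribution, but the method developed below applies to general situations. In the next section, we will discuss the estimation of θ.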
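The closed-form gamma-frailty contribution above can be evaluated directly. The sketch below is a minimal illustration, not the authors' implementation: the function name `log_lik_gamma`, its argument layout, and the choice to pass the linear predictors $\alpha' Z_i$ and $\beta' Z_i$ as scalars are all assumptions made for this example. `Lam1` and `Lam2` are arbitrary callables standing in for $\Lambda_1$ and $\Lambda_2$.

```python
import math

def log_lik_gamma(delta, s, dN, z_alpha, z_beta, gamma, Lam1, Lam2):
    """log L(theta | X_i) under a gamma frailty with mean 1 and variance gamma.

    delta : m+1 interval indicators, s : m observation times,
    dN : m count increments, z_alpha = alpha'Z_i, z_beta = beta'Z_i.
    """
    N = sum(dN)                      # N_i = N_i(s_{i,m_i})
    ea, eb = math.exp(z_alpha), math.exp(z_beta)
    power = N + 1.0 / gamma          # exponent N_i + 1/gamma
    # log Q_i
    log_q = (math.lgamma(power) - math.lgamma(1.0 / gamma)
             + N * math.log(gamma * eb))
    prev = 0.0
    for t, d in zip(s, dN):
        cur = Lam2(t)
        if d > 0:
            log_q += d * math.log(cur - prev)
        prev = cur
    # A_{i0}, A_{i1}, ..., A_{i,m_i}, A_{i,m_i+1} = 0
    base = 1.0 + gamma * eb * Lam2(s[-1])
    A = [base ** (-power)]
    A += [(base + gamma * ea * Lam1(t)) ** (-power) for t in s]
    A.append(0.0)
    k = delta.index(1)               # interval k+1 contributes A[k] - A[k+1]
    return log_q + math.log(A[k] - A[k + 1])
```

Summing this quantity over subjects gives the log likelihood $l_n(\theta)$ for a given pair of candidate functions. Working on the log scale with `math.lgamma` avoids overflow in the gamma-function ratio when $N_i$ is large.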

3. Estimation and inference procedures

Now we will discuss the estimation of the parameters θ and for this, it is apparent that a natural way would be to maximize the likelihood function Ln(θ). On the other hand, it is easy to see that this would be difficult or not straightforward. Thus instead, by following Huang and Rossini (1997) and Ma et al. (2015) as well as others, we propose to employ the sieve maximum likelihood estimation approach. More specifically, define

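$$\Theta=\{\theta=(\alpha',\beta',\gamma,\Lambda_1,\Lambda_2)\in\mathcal{B}\otimes\mathcal{M}_1\otimes\mathcal{M}_2\},$$

the parameter space of θ, and

$$\Theta_n=\{\theta_n=(\alpha',\beta',\gamma,\Lambda_{1n},\Lambda_{2n})\in\mathcal{B}\otimes\mathcal{M}_{n1}\otimes\mathcal{M}_{n2}\},$$

the sieve space. In the above,

$$\mathcal{B}=\{(\alpha',\beta',\gamma)\mid(\alpha',\beta',\gamma)\in R^{2p}\times R^{+},\ \|\alpha\|+\|\beta\|+\|\gamma\|\le M\}$$

with M being a positive constant, $\mathcal{M}_j$ denotes the collection of all bounded and continuous non-decreasing, non-negative functions over the interval $[c_j, u_j]$, and

$$\mathcal{M}_{nj}=\Big\{\Lambda_{jn}(t)=\sum_{k=0}^{m}\phi_{jk}B_k(t,m,c_j,u_j):\ \sum_{0\le k\le m}|\phi_{jk}|\le M_n,\ 0\le\phi_{j0}\le\phi_{j1}\le\cdots\le\phi_{jm}\Big\}$$

with

$$B_k(t,m,c_j,u_j)=\binom{m}{k}\Big(\frac{t-c_j}{u_j-c_j}\Big)^{k}\Big(1-\frac{t-c_j}{u_j-c_j}\Big)^{m-k},\quad k=0,\ldots,m,$$

the Bernstein basis polynomials of degree $m = o(n^{\nu})$ for some ν ∈ (0, 1), where $0 \le c_j < u_j < \infty$ with $[c_j, u_j]$ usually taken as the range of observed data, j = 1, 2.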

It is easy to see that one can approximate $\Lambda_j(t)$ by $\Lambda_{jn}(t)$ with the coefficients $\phi_{jk} = \Lambda_j(c_j + (k/m)(u_j - c_j))$, or approximate the parameter space Θ by the Bernstein polynomial-based sieve space Θn. The use of Bernstein polynomials transforms an estimation problem about both finite-dimensional and infinite-dimensional parameters into a simpler estimation problem that involves only finite-dimensional parameters. Of course, one may use other approximations such as splines and piecewise linear functions (Huang and Rossini 1997; Ding and Nan 2011). One advantage of Bernstein polynomials is that they can naturally model the nonnegativity and monotonicity of Λ1 and Λ2 under the constraint $0 \le \phi_{j0} \le \phi_{j1} \le \cdots \le \phi_{jm}$, which can be easily removed through reparameterization in implementation (Lorentz 1986; Osman and Ghosh 2012). Also it is known that the Bernstein polynomial has the optimal shape preserving property among all approximation polynomials (Carnicer and Peña 1993). Furthermore, although the use of Bernstein polynomials may seem complex, it can actually be implemented relatively easily, as seen below, and, unlike spline functions, they do not require the specification of interior knots. One can show that the size of the sieve space defined above can be controlled by $M_n = O(n^{a})$ with a being a positive constant (Lorentz 1986; Shen 1997).
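To make the approximation concrete, the following sketch evaluates the Bernstein basis on $[c, u]$ and builds the approximating polynomial with coefficients taken as the function values at the equally spaced points $c + (k/m)(u - c)$. The names `bernstein_basis` and `bernstein_approx` are introduced here for illustration only.

```python
from math import comb

def bernstein_basis(k, m, t, c, u):
    """B_k(t, m, c, u): k-th Bernstein basis polynomial of degree m on [c, u]."""
    x = (t - c) / (u - c)
    return comb(m, k) * x ** k * (1.0 - x) ** (m - k)

def bernstein_approx(Lam, m, c, u):
    """Return (phi, Lam_n) where phi_k = Lam(c + (k/m)(u - c))."""
    phi = [Lam(c + (k / m) * (u - c)) for k in range(m + 1)]

    def Lam_n(t):
        return sum(p * bernstein_basis(k, m, t, c, u) for k, p in enumerate(phi))

    return phi, Lam_n
```

Because the basis functions are nonnegative and sum to one, nondecreasing and nonnegative coefficients yield a nondecreasing and nonnegative polynomial, which is the shape constraint needed for $\Lambda_1$ and $\Lambda_2$. A linear function such as $\Lambda(t) = t$ is reproduced exactly for any degree $m \ge 1$.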

For estimation of θ, define the sieve maximum likelihood estimator $\hat\theta_n=(\hat\alpha_n',\hat\beta_n',\hat\gamma_n,\hat\Lambda_{1n},\hat\Lambda_{2n})$ to be the value of θ that maximizes the log likelihood function $l_n(\theta) = \log\{L_n(\theta)\}$ over the sieve space Θn. Let $\theta_0=(\alpha_0',\beta_0',\gamma_0,\Lambda_{10},\Lambda_{20})$ denote the true value of θ. For the asymptotic properties of $\hat\theta_n$, we need the following regularity conditions.

Condition 1 The covariate Z has a bounded support in $R^{p}$.

Condition 2 If there exist constants $\xi_0$ and $a_0$ such that $\xi_0' Z=a_0$ almost surely, then $a_0 = 0$ and $\xi_0 = 0$.

Condition 3 For j = 1, 2, the κth derivative of $\Lambda_{j0}(\cdot)$, denoted by $\Lambda_{j0}^{(\kappa)}(\cdot)$, is Hölder continuous such that $|\Lambda_{j0}^{(\kappa)}(t_1)-\Lambda_{j0}^{(\kappa)}(t_2)|\le M|t_1-t_2|^{\eta}$ for some η ∈ (0, 1] and $t_1, t_2 \in [c, u]$, where 0 < c < u < ∞ and M are some constants. Define r = κ + η.

Condition 4 For any ε > 0, $\sup_{d(\theta,\theta_0)\ge\varepsilon}P\,l(\theta,X)<P\,l(\theta_0,X)$.

Condition 5 The matrix $E(S_{\vartheta}S_{\vartheta}')$ is finite and positive definite with $\vartheta = (\alpha', \beta', \gamma)'$, where $S_{\vartheta}$ is defined below.

Note that the conditions above are generally mild and satisfied in practical situations (Huang and Rossini 1997). In particular, Condition 2 is needed for the identifiability of the parameters and is equivalent to the linear independence of the components of Z. The following theorems give the consistency, the rate of convergence and the asymptotic normality of the estimator.

Theorem 1 Suppose that Conditions 1–4 described above hold. Then $\hat\alpha_n$, $\hat\beta_n$ and $\hat\gamma_n$ are strongly consistent, and as n → ∞,

$$\|\hat\Lambda_{1n}-\Lambda_{10}\|_2\to 0,\quad \|\hat\Lambda_{2n}-\Lambda_{20}\|_2\to 0$$

almost surely, where $\|f(X)\|_2 = (\int |f|^2\,dP)^{1/2}$ is defined as the norm of a function f, with P being the probability measure for X.

Theorem 2 Suppose that Conditions 1–4 described above hold and r > 2 with r defined in Condition 3. Then as n → ∞, we have that

$$\|\hat\Lambda_{1n}-\Lambda_{10}\|_2+\|\hat\Lambda_{2n}-\Lambda_{20}\|_2=O_p\big(n^{-(1-\nu)/2}+n^{-r\nu/2}\big).$$

Theorem 3 Suppose that Conditions 1–5 described above hold and r > 2. Then as n → ∞, we have

$$\sqrt{n}\,(\hat\vartheta_n-\vartheta_0)\to N(0,\Sigma)$$

in distribution and furthermore $\hat\vartheta_n$ is semiparametrically efficient, where Σ is defined in the Appendix, $\hat\vartheta_n=(\hat\alpha_n',\hat\beta_n',\hat\gamma_n)'$ and $\vartheta_0=(\alpha_0',\beta_0',\gamma_0)'$.

The proofs of the theorems above are sketched in the Appendix. To make use of the results above, it is apparent that one needs to estimate the covariance matrix of $\hat\vartheta_n$. One natural way would be to employ the inverse of the information matrix of the log likelihood function $l_n(\theta)$. On the other hand, this is quite difficult because of the complicated form of the information matrix. To deal with this, as suggested by a referee, we adopt the following simple bootstrap procedure. Let B denote a prespecified positive integer. For each b, where 1 ≤ b ≤ B, draw a simple random sample of size n,

$$D^{(b)}=\{m_i^{(b)},\ N_i^{(b)}(s_{ij}^{(b)}),\ \delta_{ik}^{(b)}=I(T_i\in(s_{i,k-1}^{(b)},s_{i,k}^{(b)}]);\ j=1,2,\ldots,m_i^{(b)},\ k=1,2,\ldots,m_i^{(b)}+1,\ i=1,2,\ldots,n\},$$

with replacement from the observed data

$$D=\{m_i,\ N_i(s_{ij}),\ \delta_{ik}=I(T_i\in(s_{i,k-1},s_{i,k}]);\ j=1,2,\ldots,m_i,\ k=1,2,\ldots,m_i+1,\ i=1,2,\ldots,n\}.$$

Let $\hat\vartheta_n^{(b)}$ denote the proposed estimate of ϑ based on the data set $D^{(b)}$ defined above. Then a natural estimate of the covariance matrix of $\hat\vartheta_n$ is given by

$$\hat\Sigma=\frac{1}{B-1}\sum_{b=1}^{B}\Big\{\hat\vartheta_n^{(b)}-\frac{1}{B}\sum_{b=1}^{B}\hat\vartheta_n^{(b)}\Big\}^{\otimes 2},$$

where $a^{\otimes 2}=aa'$ for a column vector a.

For the choice of positive integer B, for a practical problem, one may start with some reasonable values and then increase them until the results are stable. For example, it is common to choose B = 100. For a simulation study, if using enough replications, one may actually only need to use smaller values to save the computational effort.
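The bootstrap procedure above can be sketched generically as follows. This is an illustration under stated assumptions: `bootstrap_cov` and its `estimator` argument are hypothetical names, and `estimator` stands in for whatever routine returns the vector $\hat\vartheta_n$ from a data set (here it can be any function mapping a list of subjects to a list of numbers).

```python
import random

def bootstrap_cov(data, estimator, B=100, seed=0):
    """Bootstrap covariance matrix of estimator(data), resampling subjects."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(B):
        # Simple random sample of size n with replacement from the subjects.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(sample))
    d = len(reps[0])
    mean = [sum(r[j] for r in reps) / B for j in range(d)]
    # (1 / (B - 1)) * sum of outer products of centred replicates.
    return [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in reps) / (B - 1)
             for k in range(d)] for j in range(d)]
```

The square roots of the diagonal entries give bootstrap standard errors. Resampling whole subjects, rather than individual observation times, keeps each subject's interval-censored and panel count records together.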

For the implementation of the estimation procedure proposed above, two issues need to be addressed. One is that there exist some restrictions on the parameters due to the nonnegativity and monotonicity of the functions Λ1 and Λ2, and these can be easily removed by reparameterization. A natural way is to reparameterize the frailty variance parameter γ as $\exp(\gamma^{*})$ and the parameters $\{\phi_{j0}, \ldots, \phi_{jm}\}$ as the cumulative sums of $\{\exp(\phi_{j0}^{*}),\ldots,\exp(\phi_{jm}^{*})\}$, j = 1, 2, so that the total number of parameters to be estimated is 2(p + m) + 3. Another issue is the selection of the degree m of the Bernstein polynomials for the sieve space Θn, which controls the roughness or smoothness of the approximation. It is apparent that a simple approach is to use several different values that are of the order $o(n^{\nu})$ and compare the results. As an alternative, by following the BIC criterion commonly used for model selection (Burnham and Anderson 2003), one can choose the value of m that minimizes

$$\mathrm{BIC}=-2\,l_n(\hat\theta_n)+\{2(p+m)+3\}\log n.$$
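The two implementation steps just described, the reparameterization and the BIC, can be sketched as follows. The function names `to_constrained` and `bic` are assumptions made for this example; the maximized log likelihood passed to `bic` would come from whatever optimizer is used.

```python
import math
from itertools import accumulate

def to_constrained(gamma_star, phi_star):
    """Map unconstrained values to gamma > 0 and increasing, positive phi."""
    gamma = math.exp(gamma_star)
    # Cumulative sums of positive terms are positive and increasing.
    phi = list(accumulate(math.exp(v) for v in phi_star))
    return gamma, phi

def bic(loglik, p, m, n):
    """BIC with 2(p + m) + 3 free parameters and sample size n."""
    return -2.0 * loglik + (2 * (p + m) + 3) * math.log(n)
```

With this mapping an unconstrained optimizer can be applied to $(\alpha, \beta, \gamma^{*}, \phi_1^{*}, \phi_2^{*})$, and candidate degrees m can be compared by evaluating `bic` at each fitted model.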

4. A simulation study

An extensive simulation study was conducted to assess the performance of the estimation procedure proposed in the previous sections, including the general performance and the robustness to some assumptions. In the study, we used the design similar to that used in Hua et al. (2014) and others and generated the covariate Zi’s and the latent variables ηi’s from the Bernoulli distribution with the probability of success 0.5 and the gamma distribution with mean 1 and variance γ, respectively. Furthermore, the total number of observation times mi was assumed to follow the uniform distribution over {1, 2, 3, 4, 5}, and given mi, the observation times sij’s were taken to be the order statistics of the mi random variables from the uniform distribution over (0.02, 5). Then given the Zi’s, ηi’s, mi’s and sij’s, the failure times Ti’s were generated under model (1) with Λ1(t) = t and the panel count data Ni(sij)’s under model (2) with Λ2(t) = t such that

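$$N_i(s_{i,1})\sim\mathrm{Poisson}\{\eta_i\,\Lambda_2(s_{i,1})\exp(\beta Z_i)\},$$
$$N_i(s_{ij})-N_i(s_{i,j-1})\sim\mathrm{Poisson}\{\eta_i\,[\Lambda_2(s_{ij})-\Lambda_2(s_{i,j-1})]\exp(\beta Z_i)\},$$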

for j = 2, … , mi, i = 1, … , n. The results given below are based on n = 200, B = 100 and 1000 replications.
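The data-generating design described above can be sketched for one subject as follows. This is an illustrative reading of the design, not the authors' code: the names `poisson` and `simulate_subject` are hypothetical, and the failure time is drawn by noting that under model (1) with $\Lambda_1(t) = t$, $T_i$ given $(Z_i, \eta_i)$ is exponential with rate $\eta_i \exp(\alpha Z_i)$.

```python
import math
import random

def poisson(rng, mu):
    """Poisson draw by Knuth's multiplication method (adequate for modest mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_subject(rng, alpha, beta, gamma):
    z = 1 if rng.random() < 0.5 else 0            # Bernoulli(0.5) covariate
    eta = rng.gammavariate(1.0 / gamma, gamma)    # shape, scale: mean 1, variance gamma
    m = rng.randint(1, 5)                         # number of observation times
    s = sorted(rng.uniform(0.02, 5.0) for _ in range(m))
    T = rng.expovariate(eta * math.exp(alpha * z))   # Lambda_1(t) = t
    bounds = [0.0] + s + [math.inf]
    delta = [1 if bounds[k] < T <= bounds[k + 1] else 0 for k in range(m + 1)]
    dN, prev = [], 0.0
    for t in s:                                   # Lambda_2(t) = t
        dN.append(poisson(rng, eta * (t - prev) * math.exp(beta * z)))
        prev = t
    return {"z": z, "s": s, "delta": delta, "dN": dN}
```

Calling `simulate_subject` n = 200 times with a seeded `random.Random` instance gives one simulated data set; only `z`, `s`, `delta` and `dN` would be passed to the estimation step, since $T_i$ and $\eta_i$ are unobserved.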

Table 1 presents the results obtained on estimation of the parameters α, β and γ with α0 = −0.5, 0 or 0.5, β0 = −0.5, 0 or 0.5, and γ0 = 1.2. Here we took $[c_j, u_j] = [0.02, 5]$ and used $m = [n^{1/4}] = 3$, the largest integer smaller than $n^{1/4}$. The results include the estimated bias (Bias) calculated as the average of the point estimates minus the true value, the sample standard errors (SSE) of the point estimates, the average of the bootstrap standard error estimates (BSE), and the 95% empirical coverage probability (CP). They suggest that the proposed estimator seems to be unbiased and the bootstrap variance estimation also seems to be appropriate. In addition, the normal approximation to the distribution of the proposed estimators appears to be reasonable. We also considered several other values for m and obtained similar results. In other words, the proposed estimator seems to be robust to the choice of m.
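The four summary measures can be computed from the replicate results as in the following sketch. The function name `summarize` is hypothetical, and the coverage calculation assumes the usual normal-approximation interval, estimate ± 1.96 × BSE, which the text does not spell out.

```python
import math

def summarize(estimates, bse, truth, z=1.96):
    """Return (Bias, SSE, BSE, CP in percent) over simulation replicates."""
    R = len(estimates)
    mean = sum(estimates) / R
    bias = mean - truth
    # Sample standard deviation of the point estimates.
    sse = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (R - 1))
    avg_bse = sum(bse) / R
    # Share of replicates whose interval estimate +/- z * BSE covers the truth.
    covered = sum(1 for e, se in zip(estimates, bse) if abs(e - truth) <= z * se)
    return bias, sse, avg_bse, 100.0 * covered / R
```

Agreement between SSE and BSE indicates that the bootstrap variance estimate is tracking the true sampling variability, and a CP near 95 supports the normal approximation.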

Table 1.

Estimation of regression parameters α and β and the variance parameter γ

(α0, β0) Parameters Bias SSE BSE CP
(0, 0) α −0.0115 0.2818 0.2800 94.3
β −0.0076 0.1793 0.1876 94.6
γ −0.0204 0.1247 0.1283 95.4
(0, 0.5) α −0.0179 0.2693 0.2625 94.4
β −0.0110 0.1680 0.1733 95.3
γ −0.0126 0.1240 0.1221 94.6
(0, −0.5) α 0.0130 0.2740 0.2746 94.4
β −0.0062 0.1749 0.1778 94.6
γ −0.0051 0.1375 0.1337 94.0
(0.5, 0) α −0.0119 0.2885 0.2992 94.7
β −0.0046 0.1792 0.1715 94.2
γ −0.0209 0.1229 0.1279 94.6
(0.5, 0.5) α 0.0099 0.2734 0.2790 94.8
β −0.0084 0.1637 0.1740 95.6
γ −0.0125 0.1252 0.1194 93.8
(0.5, −0.5) α 0.0161 0.2796 0.2865 94.4
β −0.0067 0.1746 0.1779 95.2
γ −0.0058 0.1361 0.1331 94.1
(−0.5, 0) α −0.0165 0.2855 0.2913 95.7
β −0.0111 0.1789 0.1815 95.3
γ −0.0205 0.1244 0.1289 95.6
(−0.5, 0.5) α −0.0009 0.2824 0.2875 94.9
β 0.0062 0.1738 0.1794 95.5
γ −0.0153 0.1224 0.1272 95.5
(−0.5, −0.5) α 0.0040 0.2750 0.2722 94.8
β −0.0062 0.1748 0.1779 94.8
γ −0.0038 0.1375 0.1346 94.0

Note that in the study above, we generated the latent variables ηi’s from the gamma distribution, and a question of practical interest is the performance of the proposed method in the case of misspecified distributions for the latent variables. To investigate this, we repeated the simulation study above, generating the ηi’s from the log-normal distribution with the location parameter μ = −0.168 and the scale parameter σ² = 0.336 but treating them as if they were from the gamma distribution. Table 2 gives the results obtained on estimation of the regression parameters α and β, and they yield conclusions similar to those given in Table 1. In other words, this suggests that the proposed estimation procedure seems to be robust to the misspecification of the distribution of the latent variables.
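As a quick check on the log-normal parameters quoted above, the standard moment formulas for a log-normal variable with location μ and scale σ² show that this choice keeps the frailty mean at 1, as the model requires:

```latex
E(\eta_i) = \exp\!\Big(\mu + \frac{\sigma^2}{2}\Big)
          = \exp(-0.168 + 0.168) = 1,
\qquad
\operatorname{Var}(\eta_i) = \big(e^{\sigma^2} - 1\big)\exp(2\mu + \sigma^2)
          = e^{0.336} - 1 \approx 0.399 .
```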

Table 2.

Estimation of regression parameters α and β with misspecified frailty distribution

(α0, β0) Parameters Bias SSE BSE CP
(0, 0) α −0.0059 0.2002 0.2104 94.8
β −0.0020 0.1125 0.1162 94.4
(0, 0.5) α 0.0360 0.2324 0.2208 94.3
β −0.0022 0.1207 0.1133 93.3
(0, −0.5) α −0.0154 0.2104 0.2174 95.5
β 0.0009 0.1143 0.1137 94.6
(0.5, 0) α −0.0050 0.2096 0.2065 94.4
β 0.0001 0.1147 0.1076 93.2
(0.5, 0.5) α 0.0052 0.2318 0.2368 94.3
β 0.0031 0.1186 0.1186 94.3
(0.5, −0.5) α 0.0198 0.2282 0.2225 94.8
β −0.0163 0.1251 0.1214 95.0
(−0.5, 0) α −0.0054 0.2282 0.2403 95.7
β −0.0018 0.1249 0.1277 94.6
(−0.5, 0.5) α 0.0198 0.2282 0.2225 94.8
β −0.0163 0.1252 0.1264 95.1
(−0.5, −0.5) α −0.0168 0.2145 0.2078 93.8
β −0.0024 0.1221 0.1217 94.4

In the next scenario, we repeated the study that gave the results in Table 2 but generated the panel count data $N_i(s_{ij})$’s from mixed Poisson processes. Specifically, we first generated a random sample {ν1, ν2, … , νn} from {−0.25, 0, 0.25} with P(νi = −0.25) = P(νi = 0.25) = 1/4 and P(νi = 0) = 1/2. For each i, given νi, ηi and Zi, the $\{N_i(s_{ij})\}_{j=1}^{m_i}$ were then generated from the Poisson process with the mean function ηi (1 − νi) Λ2(t) exp(βZi). The results obtained on estimation of the regression parameters are presented in Table 3 and indicate that the proposed estimation procedure seems to work well for the situations considered. In other words, the estimation procedure appears to be robust against departures from the Poisson assumption.
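A short calculation, using only the distribution of νi stated above, shows why this design violates the Poisson assumption while leaving the mean model (2) intact. Writing $\mu_i(t) = \eta_i \Lambda_2(t)\exp(\beta Z_i)$ (a shorthand introduced here), the law of total expectation and the law of total variance give:

```latex
E(\nu_i) = (-0.25)\tfrac14 + 0\cdot\tfrac12 + (0.25)\tfrac14 = 0,
\qquad
\operatorname{Var}(\nu_i) = 2\cdot\tfrac14\,(0.25)^2 = \tfrac{1}{32},
\\[4pt]
E\{N_i(t)\mid Z_i,\eta_i\} = \mu_i(t)\,E(1-\nu_i) = \mu_i(t),
\qquad
\operatorname{Var}\{N_i(t)\mid Z_i,\eta_i\} = \mu_i(t) + \tfrac{1}{32}\,\mu_i(t)^2 .
```

The conditional mean is unchanged, but the variance exceeds the mean, so the counts are overdispersed relative to a Poisson process.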

Table 3.

Estimation of regression parameters α and β with misspecified frailty distribution and based on the data from mixed Poisson processes

(α0, β0) Parameters Bias SSE BSE CP
(0, 0) α −0.0001 0.2167 0.2105 94.3
β 0.0014 0.1209 0.1154 94.2
(0, 0.5) α −0.0181 0.2099 0.2082 94.8
β −0.0030 0.1208 0.1293 95.2
(0, −0.5) α 0.0074 0.2146 0.2219 95.2
β 0.0043 0.1260 0.1239 95.2
(0.5, 0) α 0.0474 0.2290 0.2230 93.8
β 0.0087 0.1222 0.1156 92.8
(0.5, 0.5) α 0.0214 0.2094 0.2284 95.6
β −0.0054 0.1205 0.1146 93.9
(0.5, −0.5) α 0.0341 0.2285 0.2255 94.8
β 0.0026 0.1381 0.1435 95.1
(−0.5, 0) α −0.0400 0.2067 0.2093 94.6
β 0.0038 0.1220 0.1195 94.4
(−0.5, 0.5) α −0.0106 0.2075 0.2184 95.6
β −0.0004 0.1227 0.1204 94.3
(−0.5, −0.5) α −0.0232 0.2263 0.2172 93.4
β −0.0061 0.1216 0.1283 95.2

5. An application

In this section, we apply the estimation procedure proposed in the previous sections to a set of real data arising from a skin cancer chemoprevention trial conducted by the University of Wisconsin Comprehensive Cancer Center in Madison, Wisconsin (Sun and Zhao 2013). It is a five-year double-blinded and placebo-controlled randomized Phase III clinical trial. The primary objective of this trial is to evaluate the effectiveness of 0.5 g/m²/day PO difluoromethylornithine (DFMO) in reducing new skin cancers in a population of patients with a history of non-melanoma skin cancers: basal cell carcinoma and squamous cell carcinoma. During the study, the patients were scheduled to be assessed or observed every 6 months for the development of new skin cancers. As expected, the actual observation times differ from patient to patient, and so do the follow-up times. The study consists of 291 patients randomized to either the placebo group (147) or the DFMO group (144), and the data include the numbers of occurrences of both basal cell carcinoma and squamous cell carcinoma between observation times. On the time to the first recurrence of squamous cell carcinoma and the overall recurrence process of basal cell carcinoma, only interval-censored data and panel count data are available, respectively. Note that both variables are important for characterizing the carcinoma recurrence process, and thus for assessing the treatment effect on the carcinoma recurrence process, one may want to consider both together.

For the study subject, in addition to the treatment indicator, we also have information on three baseline covariates, gender, age at the diagnosis and the number of prior skin cancers from the first diagnosis to randomization. For the analysis below, we will focus on the 290 patients (147 in the placebo group and 143 in the DFMO group) with at least one observation. To apply the estimation approach proposed in the previous sections, for patient i, let Ti denote the time to the first recurrence of squamous cell carcinoma and Ni(t) be the number of the basal cell carcinoma that have occurred up to time t, i = 1, … , 290. Also define the covariate Zi1 to be 1 if the ith patient was in the DFMO group and 0 otherwise, Zi2 and Zi3 to denote the number of prior skin cancers and the age of the patient, respectively, and Zi4 = 1 if patient i is female and 0 otherwise. Table 4 presents the analysis results given by the application of the proposed estimation procedure with m = 3, 4, 5, 6 and B = 1000, and they include the estimated covariate effects, the estimated standard error based on the bootstrap procedure (BSE) and the p-values for testing no covariate effects.

Table 4.

Joint analysis results of the skin cancer data

α̂1 α̂2 α̂3 α̂4 β̂1 β̂2 β̂3 β̂4 γ̂
m = 3
 Estimator −0.086 0.094 0.039 −0.335 −0.133 0.106 −0.015 −0.096 −0.294
 BSE 0.227 0.025 0.008 0.279 0.171 0.020 0.007 0.179 0.178
p-value 0.707 0.000 0.000 0.230 0.438 0.000 0.038 0.529 0.098
m = 4
 Estimator −0.097 0.093 0.035 −0.347 −0.145 0.106 −0.017 −0.105 −0.299
 BSE 0.226 0.025 0.010 0.278 0.171 0.020 0.009 0.178 0.178
p-value 0.666 0.000 0.000 0.212 0.400 0.000 0.042 0.554 0.092
m = 5
 Estimator −0.098 0.093 0.035 −0.347 −0.147 0.106 −0.018 −0.106 −0.299
 BSE 0.225 0.025 0.012 0.279 0.171 0.020 0.009 0.178 0.178
p-value 0.664 0.000 0.002 0.213 0.390 0.000 0.049 0.550 0.091
m = 6
 Estimator −0.100 0.093 0.034 −0.350 −0.148 0.106 −0.018 −0.106 −0.299
 BSE 0.225 0.025 0.012 0.278 0.171 0.020 0.009 0.178 0.177
p-value 0.654 0.000 0.004 0.208 0.389 0.000 0.051 0.551 0.091

One can see from Table 4 that the DFMO treatment did not seem to have any significant effect in reducing the risk of the first recurrence of squamous cell carcinoma or the recurrence process of basal cell carcinoma. Also, neither the first recurrence of squamous cell carcinoma nor the recurrence process of basal cell carcinoma seemed to be significantly related to the gender of the patient. However, both variables seem to be positively related to the number of prior skin cancers. The age of the patient seems to be positively related to the first recurrence of squamous cell carcinoma but negatively related to the recurrence process of basal cell carcinoma. Note that the results also indicate that the time to the first squamous cell carcinoma recurrence and the overall recurrence of basal cell carcinoma were not significantly correlated. In addition, one can see that the results are similar for different values of m, which suggests that the proposed approach seems to be robust to the choice of m as expected.
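The text does not state how the p-values in Table 4 were computed. Assuming they are two-sided Wald tests based on the ratio of the estimate to its bootstrap standard error, they can be approximated as in the sketch below; the function name `wald_p_value` is hypothetical, and because the table entries are rounded, the published values are reproduced only approximately.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided p-value for H0: parameter = 0 under a normal approximation."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For instance, the gender effect on the failure time with m = 3 has estimate −0.335 and BSE 0.279, and `wald_p_value(-0.335, 0.279)` gives about 0.230, in line with the corresponding table entry.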

6. Discussion and concluding remarks

This paper discussed joint regression analysis of interval-censored failure time data and panel count data. As mentioned above, a large literature exists for the analysis of either type of the data separately, but there does not seem to exist an established approach for their joint analysis. The proposed procedure allows one to combine two different sources together for the analysis or perform treatment comparison based on composite endpoints. The proposed method made use of the sieve approach and Bernstein polynomials and the resulting estimators of regression parameters are consistent, asymptotically normal and semiparametrically efficient. Also the simulation study suggested that it works well for practical situations.

Note that in the method described above, for simplicity, we have assumed that the observation processes on the failure time and on the recurrent event process of interest are the same, but sometimes they could be different. In other words, the observation times, say the $s_{ik}^{(1)}$’s, on the $T_i$’s and the observation times, say the $s_{ij}^{(2)}$’s, on the $N_i(t)$’s are different. In this case, the observed data would have the form

$$\{N_i(s_{ij}^{(2)}),\ \delta_{ik}=I(T_i\in(s_{i,k-1}^{(1)},s_{i,k}^{(1)}])\}$$

and more complicated structures. However, one can still follow the idea used above to develop a similar estimation procedure as above.

The proposed method relies on several assumptions or has some limitations, which also provide several directions for extensions. One is the Poisson process assumption on the underlying recurrent event process, which can be treated as a working assumption (Wellner and Zhang 2000). For this, note that the simulation study suggested that the estimation procedure is robust to it and also this has already been demonstrated theoretically by others under similar contexts. Also on the underlying recurrent event process, we only considered the situation where it follows the proportional mean model and it is apparent that it is useful to generalize the method to more general models. In the method, it has been assumed that the failure time of interest follows the proportional hazards model. Although this model is quite common, it would be helpful to develop similar methods for other commonly used models. Another assumption used above is that the observation process is noninformative and sometimes this may not be true (Sun and Zhao 2013). Note that for the latter, one faces three related processes, failure time process, the underlying recurrent event process and the observation processes, which is also a counting process. In other words, for the development of the corresponding inference method, one needs to model all three processes together.

Acknowledgements

The authors wish to thank the Editor-in-Chief, Dr. Mei-Ling Lee, an Associate Editor and two reviewers for their many helpful comments and suggestions. The work was partly supported by the National Nature Science Foundation of China grants 11471135 and 11571133 and the Central China Normal University grant MOE 15ZD011 to the second author and the NIH grant R21CA198641 to the third author.

Appendix:

Proofs of the asymptotic properties of $\hat\theta_n$

In this Appendix, we will sketch the proofs for the results given in Theorems 1–3 by using the empirical process theory (van der Vaart and Wellner 1996) and some techniques commonly used in the nonparametric literature.

First let X = {X1, … , Xn} denote the observed data and define $\mathcal{L}_n=\{l(\theta,X):\theta\in\Theta_n\}$.

Proof of Theorem 1. For any $\theta_1=(\alpha_1',\beta_1',\gamma_1,\Lambda_{11},\Lambda_{21})$ and $\theta_2=(\alpha_2',\beta_2',\gamma_2,\Lambda_{12},\Lambda_{22})\in\Theta_n$, it is easy to show that

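$$|l(\theta_1,X)-l(\theta_2,X)|\le K\big(\|\alpha_1-\alpha_2\|+\|\beta_1-\beta_2\|+\|\gamma_1-\gamma_2\|+\|\Lambda_{11}-\Lambda_{12}\|+\|\Lambda_{21}-\Lambda_{22}\|\big) \tag{1}$$

by Taylor’s series expansion. According to the conclusion on page 94 of van der Vaart and Wellner (1996), we can show that the covering number of $\mathcal{L}_n$ satisfies

$$N(\epsilon,\mathcal{L}_n,L_1(P_n))\le N\Big(\frac{\epsilon}{3M},\mathcal{B},\|\cdot\|\Big)\,N\Big(\frac{\epsilon}{3M_n},\mathcal{M}_{n1},L_\infty\Big)\,N\Big(\frac{\epsilon}{3M_n},\mathcal{M}_{n2},L_\infty\Big)\le\Big(\frac{9M^2}{\epsilon}\Big)^{2p+1}\Big(\frac{9M_n^2}{\epsilon}\Big)^{m+1}\Big(\frac{9M_n^2}{\epsilon}\Big)^{m+1}\le K M^{4p+2}M_n^{4m+4}\epsilon^{-p_m},$$

where $p_m = 2p + 2m + 3$. Then it follows from the inequality (31) of Pollard (1984) (page 31) that

$$\sup_{\theta\in\Theta_n}|P_n l(\theta,X)-P\,l(\theta,X)|\to 0 \tag{2}$$

in probability. Define $\Theta_\epsilon = \{\theta : d(\theta, \theta_0) \ge \epsilon,\ \theta\in\Theta_n\}$ and let $J(\theta, X) = -l(\theta, X)$, $\zeta_{1n}=\sup_{\theta\in\Theta_n}|P_nJ(\theta,X)-PJ(\theta,X)|$, and $\zeta_{2n} = P_n J(\theta_0, X) - PJ(\theta_0, X)$. Then

$$\inf_{\Theta_\epsilon}PJ(\theta,X)=\inf_{\Theta_\epsilon}\{PJ(\theta,X)-P_nJ(\theta,X)+P_nJ(\theta,X)\}\le\zeta_{1n}+\inf_{\Theta_\epsilon}P_nJ(\theta,X).$$

If $\hat\theta_n\in\Theta_\epsilon$, we have

$$\inf_{\Theta_\epsilon}P_nJ(\theta,X)=P_nJ(\hat\theta_n,X)\le P_nJ(\theta_0,X)=\zeta_{2n}+PJ(\theta_0,X).$$

By Condition 4, we have that $\inf_{\Theta_\epsilon}PJ(\theta,X)-PJ(\theta_0,X)=\delta_\epsilon>0$. Thus,

$$\inf_{\Theta_\epsilon}PJ(\theta,X)\le\zeta_{1n}+\zeta_{2n}+PJ(\theta_0,X)=\zeta_n+PJ(\theta_0,X)$$

with $\zeta_n = \zeta_{1n} +\zeta_{2n}$. Hence we obtain that $\zeta_n\ge\delta_\epsilon$ and furthermore $\{\hat\theta_n\in\Theta_\epsilon\}\subseteq\{\zeta_n\ge\delta_\epsilon\}$. By (2) and the Strong Law of Large Numbers, we have $\zeta_{1n} = o(1)$, $\zeta_{2n} = o(1)$ and then $\zeta_n = o(1)$ almost surely. Therefore, $\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}\{\hat\theta_n\notin\Theta_\epsilon\}\supseteq\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}\{\zeta_n<\delta_\epsilon\}$, which shows the strong consistency of $\hat\theta_n$.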

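Proof of Theorem 2 To establish the convergence rate, note that by Theorem 1.6.2 of Lorentz (1986) or the proof of Theorem 2 in Osman and Ghosh (2012), if $m = o(n^{\nu})$, there exist Bernstein polynomials $\Lambda_{1n0}$ and $\Lambda_{2n0}$ such that $\|\Lambda_{1n0}-\Lambda_{10}\|=O(n^{-r\nu/2})$ and $\|\Lambda_{2n0}-\Lambda_{20}\|=O(n^{-r\nu/2})$, respectively. For any η, define the class $\mathcal{F}_\eta=\{l(\theta_{n0},X)-l(\theta,X):\theta\in\Theta_n,\ d(\theta,\theta_{n0})<\eta\}$ with $\theta_{n0}=(\alpha_0',\beta_0',\gamma_0,\Lambda_{1n0},\Lambda_{2n0})$. Following the calculation of Shen and Wong (1994, p. 597), we can establish that $\log N_{[\,]}(\varepsilon,\mathcal{F}_\eta,\|\cdot\|_2)\le CN\log(\eta/\varepsilon)$ with N = 2(m + 1). Moreover, some algebraic calculations lead to $\|l(\theta_{n0},X)-l(\theta,X)\|_2^2\le C\eta^2$ for any $l(\theta_{n0},X)-l(\theta,X)\in\mathcal{F}_\eta$. Therefore it follows from Lemma 3.4.2 of van der Vaart and Wellner (1996) that

$$E_P\|n^{1/2}(P_n-P)\|_{\mathcal{F}_\eta}\le C\,J_\eta(\varepsilon,\mathcal{F}_\eta,\|\cdot\|_2)\Big\{1+\frac{J_\eta(\varepsilon,\mathcal{F}_\eta,\|\cdot\|_2)}{\eta^2 n^{1/2}}\Big\}, \tag{3}$$

where $J_\eta(\varepsilon,\mathcal{F}_\eta,\|\cdot\|_2)=\int_0^\eta\{1+\log N_{[\,]}(\varepsilon,\mathcal{F}_\eta,\|\cdot\|_2)\}^{1/2}d\varepsilon\le CN^{1/2}\eta$.

Note that the right-hand side of (3) gives $\phi_n(\eta) = C(N^{1/2}\eta + N/n^{1/2})$. Also it is easy to see that $\phi_n(\eta)/\eta$ decreases in η and $r_n^2\phi_n(1/r_n)=r_nN^{1/2}+r_n^2N/n^{1/2}\le 2n^{1/2}$, where $r_n = N^{-1/2}n^{1/2} = n^{(1-\nu)/2}$ with 0 < ν < 0.5. Hence $n^{(1-\nu)/2}d(\hat\theta_n,\theta_{n0})=O_p(1)$ by Theorem 3.4.1 of van der Vaart and Wellner (1996). This, together with $d(\theta_{n0}, \theta_0) = O_p(n^{-r\nu/2})$, yields that $d(\hat\theta_n,\theta_0)=O_p(n^{-(1-\nu)/2}+n^{-r\nu/2})$. The choice of ν = 1/(1 + r) yields the rate of convergence $d(\hat\theta_n,\theta_0)=O_p(n^{-r/(2+2r)})$.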
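The final step, the choice ν = 1/(1 + r), can be verified by substituting into the two exponents of the rate; both terms then decay at the same speed:

```latex
\nu = \frac{1}{1+r}
\;\Longrightarrow\;
\frac{1-\nu}{2} = \frac{1}{2}\cdot\frac{r}{1+r} = \frac{r}{2+2r},
\qquad
\frac{r\nu}{2} = \frac{r}{2(1+r)} = \frac{r}{2+2r},
\\[4pt]
\text{so}\quad n^{-(1-\nu)/2} + n^{-r\nu/2} = 2\,n^{-r/(2+2r)} .
```

For example, r = 3 gives ν = 1/4 and the rate $n^{-3/8}$. As r grows, the exponent r/(2 + 2r) approaches 1/2, so smoother baseline functions bring the rate closer to the parametric rate $n^{-1/2}$.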

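Proof of Theorem 3. As above, let $\theta_0$ denote the true value of the parameter $\theta$ and define $V$ to be the linear span of $\Theta - \theta_0$. Also let $l(\theta,X)$ be the log-likelihood for a sample of size one and $\delta_n = n^{-(1-\nu)/2} + n^{-r\nu/2}$. For any $\theta \in \{\theta \in \Theta : \|\theta - \theta_0\| = O(\delta_n)\}$, define the first order directional derivative of $l(\theta,X)$ in the direction $v \in V$ as

$$\dot l(\theta,X)[v] = \frac{d\,l(\theta + sv,X)}{ds}\bigg|_{s=0},$$

and the second order directional derivative as

$$\ddot l(\theta,X)[v,\tilde v] = \frac{d\,\dot l(\theta + \tilde s\tilde v,X)[v]}{d\tilde s}\bigg|_{\tilde s=0} = \frac{d^2 l(\theta + sv + \tilde s\tilde v,X)}{d\tilde s\,ds}\bigg|_{\tilde s=s=0}.$$

Also define the Fisher inner product on the space $V$ as

$$\langle v,\tilde v\rangle = P\{\dot l(\theta,X)[v]\,\dot l(\theta,X)[\tilde v]\}$$

and the Fisher norm for $v \in V$ as $\|v\| = \langle v,v\rangle^{1/2}$. Let $\bar V$ be the closed linear span of $V$ under the Fisher norm. Then $(\bar V,\|\cdot\|)$ is a Hilbert space.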
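In a finite-dimensional model the directional derivative defined earlier in this proof is simply the gradient of the log-likelihood dotted with the direction v. The sketch below checks this numerically for a toy Poisson log-linear model; the model, the function names, and the numerical values are hypothetical and are not the likelihood used in the paper.

```python
import math


def loglik(theta, x, z):
    """Poisson log-likelihood (up to a constant) with log-mean theta[0] + theta[1]*z."""
    eta = theta[0] + theta[1] * z
    return x * eta - math.exp(eta)


def score(theta, x, z):
    """Analytic gradient of loglik with respect to theta."""
    resid = x - math.exp(theta[0] + theta[1] * z)
    return (resid, resid * z)


def directional_derivative(theta, v, x, z, h=1e-6):
    """Central-difference approximation of d/ds loglik(theta + s*v) at s = 0."""
    plus = [t + h * d for t, d in zip(theta, v)]
    minus = [t - h * d for t, d in zip(theta, v)]
    return (loglik(plus, x, z) - loglik(minus, x, z)) / (2 * h)


# Hypothetical parameter value, direction, count x and covariate z.
theta, v, x, z = (0.2, -0.5), (1.0, 2.0), 3, 0.7
numeric = directional_derivative(theta, v, x, z)
analytic = sum(g * d for g, d in zip(score(theta, x, z), v))
print(numeric, analytic)
```

The two printed values should agree up to finite-difference error. In the semiparametric setting of the proof, v also has functional components for Λ1 and Λ2, so the dot product is replaced by the directional derivative itself.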

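Furthermore, define the smooth functional of $\theta$ as

$$\psi(\theta) = b_1^\top\alpha + b_2^\top\beta + b_3\gamma,$$

where $b = (b_1^\top, b_2^\top, b_3)^\top$ is any vector of dimension $2p+1$ with $\|b\| \le 1$. For any $v \in V$, we denote

$$\dot\psi(\theta_0)[v] = \frac{d\,\psi(\theta_0 + sv)}{ds}\bigg|_{s=0}.$$

Note that $\psi(\theta) - \psi(\theta_0) = \dot\psi(\theta_0)[\theta - \theta_0]$. It follows from the Riesz representation theorem that there exists $v^* \in \bar V$ such that $\dot\psi(\theta_0)[v] = \langle v^*,v\rangle$ for all $v \in \bar V$ and $\|v^*\|^2 = \|\dot\psi(\theta_0)\|^2$. Thus it follows from the Cramér-Wold device that to prove Theorem 3, it suffices to show that

$$n^{1/2}\langle\hat\theta_n - \theta_0, v^*\rangle \to N(0, b^\top\Sigma b) \ \text{in distribution} \qquad (4)$$

since $b^\top\{(\hat\alpha_n - \alpha_0)^\top, (\hat\beta_n - \beta_0)^\top, (\hat\gamma_n - \gamma_0)\}^\top = \psi(\hat\theta_n) - \psi(\theta_0) = \dot\psi(\theta_0)[\hat\theta_n - \theta_0] = \langle\hat\theta_n - \theta_0, v^*\rangle$. In fact, (4) can be proved using arguments similar to those in the proof of Theorem 1 of Shen (1997). For each component $\vartheta_q$ of $\vartheta$, $q = 1, 2, \ldots, 2p+1$, we denote by $\zeta_q^* = (b_{1q}^*, b_{2q}^*)$ the solution to

$$\inf_{\zeta_q} E\{l_\vartheta^\top e_q - l_{b_1}[b_{1q}] - l_{b_2}[b_{2q}]\}^2,$$

where $l_\vartheta = (l_\alpha^\top, l_\beta^\top, l_\gamma)^\top$, and $e_q$ is a $(2p+1)$-dimensional vector of zeros except for the $q$-th element, which equals 1. Here $l_{b_1}[b_1]$ and $l_{b_2}[b_2]$ are the directional derivatives with respect to $\Lambda_1$ and $\Lambda_2$ and can be calculated as the directional derivatives defined at the beginning of the proof of Theorem 3. Now let $\zeta^* = (\zeta_1^*, \ldots, \zeta_{2p+1}^*)$. By the calculations of Chen et al. (2012), we have $\|v^*\|^2 = \|\dot\psi(\theta_0)\|^2 = \sup_{v \in \bar V : \|v\| > 0} |\dot\psi(\theta_0)[v]|^2/\|v\|^2 = b^\top\Sigma b$, where $\Sigma = [E(S_\vartheta S_\vartheta^\top)]^{-1}$ and $S_\vartheta = l_\vartheta - l_{b_1}[b_1^*] - l_{b_2}[b_2^*]$. Hence the semiparametric efficiency can be established by applying the result of Theorem 4 in Shen (1997), which completes the proof.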
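The projection that defines the efficient score in this proof has a simple finite-dimensional analogue, sketched below under the assumption of one parameter of interest and one scalar nuisance parameter. In the paper the nuisance directions are functions, so this only illustrates the least-squares idea; the function names and second-moment values are hypothetical.

```python
from fractions import Fraction


def efficient_information(a, b, c):
    """Return (t_star, info) for scores with E[l1^2] = a, E[l1*l2] = b, E[l2^2] = c.

    t_star minimizes E{(l1 - t*l2)^2} = a - 2*t*b + t^2*c over t, and info is
    the minimized value, i.e. the second moment of the efficient score
    l1 - t_star*l2.
    """
    t_star = b / c
    info = a - 2 * t_star * b + t_star ** 2 * c
    return t_star, info


def top_left_of_inverse(a, b, c):
    """(1,1) entry of the inverse of the 2x2 information matrix [[a, b], [b, c]]."""
    return c / (a * c - b * b)


# Hypothetical second moments of the two scores.
a, b, c = Fraction(4), Fraction(2), Fraction(2)
t_star, info = efficient_information(a, b, c)
print(t_star, info, 1 / info, top_left_of_inverse(a, b, c))
```

The inverse of the efficient information coincides with the corresponding entry of the inverse of the full information matrix, which mirrors the role of Σ as the inverse of the second moment of the efficient score.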

References

  1. Banerjee M, Sen B (2007) A pseudolikelihood method for analyzing interval censored data. Biometrika 94:71–86
  2. Burnham KP, Anderson DR (2003) Model selection and multimodel inference: a practical information-theoretic approach. Springer, Berlin
  3. Carnicer JM, Peña JM (1993) Shape preserving representations and optimality of the Bernstein basis. Adv Comput Math 1:173–196
  4. Chen D, Sun J, Peace K (2012) Interval-censored time-to-event data: methods and applications. Chapman & Hall/CRC
  5. Ding Y, Nan B (2011) A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. Ann Stat 39:3032–3061
  6. Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42:845–854
  7. Groeneboom P, Wellner JA (1992) Information bounds and nonparametric maximum likelihood estimation. Springer, Berlin
  8. Hua L, Zhang Y, Tu W (2014) A spline-based semiparametric sieve likelihood method for over-dispersed panel count data. Can J Stat 42(2):217–245
  9. Huang J, Rossini AJ (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92(439):960–967
  10. Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Co, New York
  11. Lu M, Zhang Y, Huang J (2009) Semiparametric estimation methods for panel count data using monotone B-splines. J Am Stat Assoc 104(487):1060–1070
  12. Ma L, Hu T, Sun J (2015) Sieve maximum likelihood regression analysis of dependent current status data. Biometrika 102(3):731–738
  13. Osman M, Ghosh SK (2012) Nonparametric regression models for right-censored data using Bernstein polynomials. Comput Stat Data Anal 56(3):559–573
  14. Pollard D (1984) Convergence of stochastic processes. Springer, New York
  15. Shen X (1997) On methods of sieves and penalization. Ann Stat 25(6):2555–2591
  16. Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615
  17. Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
  18. Sun J, Wei LJ (2000) Regression analysis of panel count data with covariate-dependent observation and censoring times. J R Stat Soc Ser B (Stat Methodol) 62(2):293–302
  19. Sun J, Zhao X (2013) The statistical analysis of panel count data. Springer, New York
  20. Turnbull BW (1976) The empirical distribution function with arbitrarily grouped censored and truncated data. J R Stat Soc Ser B 38:290–295
  21. van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer
  22. Wellner JA, Zhang Y (2000) Two estimators of the mean of a counting process with panel count data. Ann Stat 28:779–814
  23. Zhang Z, Sun J, Sun L (2005) Statistical analysis of current status data with informative observation times. Stat Med 24(9):1399–1407
  24. Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand J Stat 37(2):338–354
