Biometrika. 2016 May 24;103(2):253–271. doi: 10.1093/biomet/asw013

Maximum likelihood estimation for semiparametric transformation models with interval-censored data

Donglin Zeng, Lu Mao, D. Y. Lin
PMCID: PMC4890294  PMID: 27279656

Abstract

Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand.

Keywords: Current-status data, EM algorithm, Interval censoring, Linear transformation model, Nonparametric likelihood, Proportional hazards, Proportional odds, Semiparametric efficiency, Time-dependent covariate

1. Introduction

Interval-censored data arise when the event or failure of interest is known only to occur within a time interval. Such data are commonly encountered in disease research, where the ascertainment of an asymptomatic event is costly or invasive and so can take place only at a small number of monitoring times. For example, in HIV/AIDS studies, blood samples are periodically drawn from at-risk subjects to look for evidence of HIV sero-conversion. Likewise, biopsies are performed on patients at clinic visits to determine the occurrence or recurrence of cancer.

There are several types of interval-censored data. The simplest and most studied type is called case-1 or current-status data, which involves only one monitoring time per subject and is routinely found in cross-sectional studies. When there are two or $K$ monitoring times per subject, the resulting data are referred to as case-2 or case-$K$ interval censoring (Huang & Wellner, 1997). The most general and common type allows for varying numbers of monitoring times among subjects and is termed mixed-case interval censoring (Schick & Yu, 2000).

The fact that the failure time is never observed exactly poses theoretical and computational challenges in semiparametric regression analysis of such data. Huang (1995, 1996) and Huang & Wellner (1997) studied nonparametric maximum likelihood estimation for the proportional hazards and proportional odds models with case-1 and case-2 data. The estimators are obtained by the iterative convex minorant algorithm, which becomes unstable for large datasets. Sieve maximum likelihood estimation for the proportional odds model was considered by Rossini & Tsiatis (1996) with case-1 data and by Huang & Rossini (1997) and Shen (1998) with case-2 data; however, it is difficult to choose an appropriate sieve parameter space and, especially, to choose the number of knots. For the proportional odds model with case-1 and case-2 data, Rabinowitz et al. (2000) derived an approximate conditional likelihood, which does not perform well in small samples. Gu et al. (2005), Sun & Sun (2005), Zhang et al. (2005) and Zhang & Zhao (2013) constructed rank-based estimators for linear transformation models, but such estimators are computationally demanding and statistically inefficient. None of the existing work accommodates time-dependent covariates or can handle case-$K$ or mixed-case interval censoring.

In this paper we consider interval censoring in the most general form, that is, mixed-case data. We study nonparametric maximum likelihood estimation for a broad class of transformation models that allows time-dependent covariates and includes the proportional hazards and proportional odds models as special cases. We develop an EM-type algorithm, which is demonstrated to perform satisfactorily in a wide variety of settings, even with time-dependent covariates. Using empirical process theory (van der Vaart & Wellner, 1996; van de Geer, 2000) and semiparametric efficiency theory (Bickel et al., 1993), we establish that, under mild conditions, the proposed estimators for the regression parameters are consistent and asymptotically normal and the limiting covariance matrix attains the semiparametric efficiency bound and can be estimated analytically by the profile likelihood method (Murphy & van der Vaart, 2000). The theoretical development requires careful treatment of the time trajectories of covariate processes and the joint distribution for an arbitrary sequence of monitoring times.

2. Methods

2.1. Transformation models and likelihood construction

Let $T$ denote the failure time, and let $X(\cdot)$ denote a $p$-vector of potentially time-dependent covariates. Under the semiparametric transformation model, the cumulative hazard function for $T$ conditional on $X(\cdot)$ takes the form

$$\Lambda(t; X) = G\left[\int_0^t \exp\{\beta^{\mathrm T} X(s)\}\,\mathrm{d}\Lambda(s)\right], \tag{1}$$

where $G(\cdot)$ is a specific transformation function that is strictly increasing, $\beta$ is a $p$-vector of unknown regression parameters, and $\Lambda(\cdot)$ is an unknown increasing function (Zeng & Lin, 2006). The choices of $G(x) = x$ and $G(x) = \log(1 + x)$ yield the proportional hazards and proportional odds models, respectively. It is useful to consider the class of frailty-induced transformations

$$G(x) = -\log \int_0^\infty \exp(-xt)\, f(t)\,\mathrm{d}t,$$

where Inline graphic is the density function of a frailty variable with support Inline graphic. The choice of the gamma density with unit mean and variance Inline graphic for Inline graphic yields the class of logarithmic transformations, Inline graphicInline graphic, and the choice of the positive stable distribution with parameter Inline graphic yields the class of Box–Cox transformations, Inline graphic. When all the covariates are time-independent, model (1) can be rewritten as a linear transformation model

$$\log \Lambda(T) = -\beta^{\mathrm T} X + \epsilon,$$

where Inline graphic is an error term with distribution function Inline graphic (Chen et al., 2002). Thus, Inline graphic can be interpreted as the effects of covariates on a transformation of Inline graphic.
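The logarithmic class of transformations can be made concrete with a few lines of code. The sketch below assumes the usual form of that class, G(x) = log(1 + rx)/r with G(x) = x as the r = 0 limit; the function names `G_log` and `survival` are illustrative and not taken from the authors' software.

```python
import math


def G_log(x, r):
    """Logarithmic transformation G(x) = log(1 + r x) / r (assumed form).

    r = 0 is taken as the limiting case G(x) = x (proportional hazards);
    r = 1 gives G(x) = log(1 + x) (proportional odds).
    """
    if r == 0:
        return x
    return math.log1p(r * x) / r


def survival(x, r):
    """Survival probability exp{-G(x)}, where x stands for the integrated
    quantity inside G in model (1)."""
    return math.exp(-G_log(x, r))
```

With r = 1 the odds of failure, (1 - S)/S, equal x itself, which is why that choice corresponds to the proportional odds model.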

We formulate the mixed-case interval censoring by assuming that the number of monitoring times, denoted by Inline graphic, is random and that there exists a random sequence of monitoring times, denoted by Inline graphic. We do not model Inline graphic. Write Inline graphic, where Inline graphic and Inline graphic. Also, define Inline graphic, where Inline graphicInline graphic with Inline graphic denoting the indicator function. Then the observed data from a random sample of Inline graphic subjects consist of Inline graphicInline graphic, where Inline graphic and Inline graphic. If Inline graphic or 2 Inline graphic, then the observation scheme becomes case-1 or case-2, respectively.

Suppose that Inline graphic is independent of Inline graphic conditional on Inline graphic. Then the observed-data likelihood function concerning parameters Inline graphic takes the form

2.1.

Since only one Inline graphic is unity for each subject and the others equal zero,

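$$L_n(\beta, \Lambda) = \prod_{i=1}^n \left[\exp\left\{-G\left(\int_0^{L_i} e^{\beta^{\mathrm T} X_i(s)}\,\mathrm{d}\Lambda(s)\right)\right\} - \exp\left\{-G\left(\int_0^{R_i} e^{\beta^{\mathrm T} X_i(s)}\,\mathrm{d}\Lambda(s)\right)\right\}\right],$$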

where Inline graphic is the smallest interval that brackets Inline graphic, i.e., Inline graphic and Inline graphic. Clearly, Inline graphic indicates that the Inline graphicth subject is left censored, while Inline graphic indicates that the subject is right censored.
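As an illustration of how the bracketing interval is obtained from a sequence of monitoring times, here is a minimal sketch. It assumes the half-open convention L < T ≤ R, with L = 0 for a left-censored subject and R = ∞ for a right-censored one; the function name `bracket` is hypothetical.

```python
import bisect
import math


def bracket(monitoring_times, t):
    """Return the smallest interval (L, R] among the monitoring times that
    contains the failure time t (assumed convention: L < t <= R)."""
    u = sorted(monitoring_times)
    k = bisect.bisect_left(u, t)              # number of monitoring times < t
    left = u[k - 1] if k > 0 else 0.0          # no earlier visit: left censored
    right = u[k] if k < len(u) else math.inf   # no later visit: right censored
    return left, right
```

Only the pair returned by `bracket` enters the likelihood; the remaining monitoring times can be discarded for estimation, in line with Remark 1.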

Remark 1. —

The sequence of monitoring times may not be completely observed and, in fact, need not be for the purpose of inference. We only need to know the values of Inline graphic and Inline graphic, since the other monitoring times do not contribute to the likelihood. The theoretical development, however, requires consideration of the joint distribution for the entire sequence of monitoring times.

2.2. Nonparametric maximum likelihood estimation

To estimate Inline graphic and Inline graphic, we adopt the nonparametric maximum likelihood approach, under which Inline graphic is regarded as a step function with nonnegative jumps at the endpoints of the smallest intervals that bracket the failure times. Specifically, if Inline graphic denotes the set consisting of 0 and the unique values of Inline graphic and Inline graphicInline graphic, then the estimator for Inline graphic is a step function with jump size Inline graphic at Inline graphic and with Inline graphic. Hence, we maximize the function

2.2. (2)

Direct maximization of (2) is difficult due to the lack of an analytical expression for the parameters Inline graphicInline graphic. An even more severe challenge is that not all the Inline graphic and Inline graphic are informative about the failure times, so many of the Inline graphic are zero and hence lie on the boundary of the parameter space. For example, if there are no interval endpoints between some Inline graphic and Inline graphic with Inline graphic, then the jump size at Inline graphic must be zero in order to maximize (2). The existing iterative convex minorant algorithm works only for the proportional hazards and proportional odds models with time-independent covariates (Huang & Wellner, 1997). In the following, we construct an EM algorithm to maximize (2).

For the class of frailty-induced transformations described in §2.1, the observed-data likelihood can be written as

2.2.

so that the estimation of the transformation model becomes that of the proportional hazards frailty model. With Inline graphic as a step function with jumps Inline graphic at Inline graphicInline graphic, this likelihood becomes

2.2. (3)

where Inline graphic. We introduce latent variables Inline graphicInline graphic which, conditional on Inline graphic, are independent Poisson random variables with means Inline graphic. We show below that the nonconcave likelihood function given in (3) is equivalent to a likelihood function for these Poisson variables, so the M-step becomes maximization of a weighted sum of Poisson loglikelihood functions which is strictly concave and has closed-form solutions for Inline graphicInline graphic. Similar Poisson variables were recently used by Wang et al. (2015) in spline-based estimation of the proportional hazards model with time-independent covariates.

Define Inline graphic and Inline graphic. Suppose that the observed data consist of Inline graphicInline graphic, where Inline graphic means that Inline graphic is known to be zero and that Inline graphic means that Inline graphic is known to be positive, such that Inline graphic for Inline graphic and at least one Inline graphic for Inline graphic with Inline graphic. Then the likelihood takes the form

2.2. (4)

which is the same as (3). Thus, maximization of (3) is equivalent to maximum likelihood estimation based on the data Inline graphicInline graphic.

We maximize (4) through an EM algorithm by treating Inline graphic and Inline graphic as missing data. The complete-data loglikelihood is

2.2. (5)

where Inline graphic. In the M-step, we calculate

2.2. (6)

where Inline graphic denotes the posterior mean given the observed data. After incorporating (6) into the conditional expectation of (5), we update Inline graphic by solving the following equation using the one-step Newton–Raphson method:

2.2.

In the E-step, we evaluate the posterior means Inline graphic and Inline graphic. The posterior density function of Inline graphic given the observed data is proportional to Inline graphic, where Inline graphic and Inline graphic. Hence, we evaluate the posterior means by noting that for Inline graphic,

2.2.

and for Inline graphic with Inline graphic,

2.2.

which can be calculated using Gaussian–Laguerre quadrature. In addition,

2.2.

where Inline graphic for any function Inline graphic.
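The one-dimensional integrals in the E-step can be illustrated numerically. The sketch below makes two assumptions that are not spelled out in the text as reproduced here: the frailty density is gamma with unit mean and unit variance (r = 1, so f(ξ) = exp(−ξ)), and the posterior kernel of the frailty is {exp(−ξS₁) − exp(−ξS₂)} f(ξ), where S₁ and S₂ denote the accumulated hazards up to the left and right endpoints. The function name is hypothetical.

```python
import math

from numpy.polynomial.laguerre import laggauss


def posterior_mean_frailty(s_left, s_right, n_nodes=40):
    """Posterior mean of the frailty for one subject when f(xi) = exp(-xi).

    Gauss-Laguerre quadrature approximates integrals of exp(-x) g(x) over
    (0, inf), so the exponential density is absorbed by the weights.
    """
    x, w = laggauss(n_nodes)
    kernel = [math.exp(-xi * s_left)
              - (math.exp(-xi * s_right) if math.isfinite(s_right) else 0.0)
              for xi in x]
    numerator = sum(wi * xi * ki for wi, xi, ki in zip(w, x, kernel))
    denominator = sum(wi * ki for wi, ki in zip(w, kernel))
    return numerator / denominator
```

For this particular density the integrals have closed forms, ∫ξ exp{−ξ(1 + s)} dξ = (1 + s)⁻² and ∫exp{−ξ(1 + s)} dξ = (1 + s)⁻¹, which gives a convenient check on the quadrature.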

We iterate between the E- and M-steps until the sum of the absolute differences of the estimates at two successive iterations is less than, say, Inline graphic. This EM algorithm has several desirable features. First, the conditional expectations in the E-step involve at most one-dimensional integration, so they can be evaluated accurately by Gaussian quadrature. Second, in the M-step, the high-dimensional parameters Inline graphicInline graphic are calculated explicitly, while the low-dimensional parameter vector Inline graphic is updated by the Newton–Raphson method. In this way, the algorithm avoids the inversion of any high-dimensional matrices. Finally, the observed-data likelihood is guaranteed to increase after each iteration. To avoid local maxima, we suggest using a range of initial values for Inline graphic while setting Inline graphic to Inline graphic. We denote the final results by Inline graphic.
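To make the structure of the iteration concrete, the following is a minimal sketch for the simplest special case only: G(x) = x (no frailty) and no covariates, so that each latent Poisson variable has mean equal to the jump size itself. The E-step uses the mean of a Poisson sum conditioned to be positive, and the M-step is the closed-form weighted average. The starting value 1/m, the tolerance, and the function name are assumptions; this is not the authors' implementation.

```python
import math


def em_cumulative_hazard(intervals, tol=1e-10, max_iter=5000):
    """EM for the jump sizes of a step cumulative hazard from (L, R] data.

    intervals: list of (L, R) with 0 <= L < R; R = math.inf if right censored.
    Returns (support_points, jump_sizes).
    """
    times = sorted({x for L, R in intervals for x in (L, R)
                    if 0 < x < math.inf})
    m = len(times)
    if m == 0:
        return [], []
    lam = [1.0 / m] * m                       # assumed initial jump sizes
    for _ in range(max_iter):
        num = [0.0] * m                       # sum over subjects of E(W_ik)
        den = [0.0] * m                       # subjects with t_k <= R_i^*
        for L, R in intervals:
            r_star = R if R < math.inf else L
            inside = [k for k, t in enumerate(times)
                      if R < math.inf and L < t <= R]
            for k, t in enumerate(times):
                if t <= r_star:
                    den[k] += 1.0
            s = sum(lam[k] for k in inside)
            if s > 0:
                # E(W_k | sum over the bracket > 0) = lam_k / (1 - exp(-s))
                scale = 1.0 / (-math.expm1(-s))
                for k in inside:
                    num[k] += lam[k] * scale
        new = [num[k] / den[k] for k in range(m)]
        diff = sum(abs(a - b) for a, b in zip(new, lam))
        lam = new
        if diff < tol:
            break
    return times, lam
```

With three subjects observed in (0, 1] and one right censored at 1, the likelihood is (1 − e^{−λ})³ e^{−λ}, whose maximizer is λ = log 4, and the iteration converges to that value.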

2.3. Variance estimation

We use profile likelihood (Murphy & van der Vaart, 2000) to estimate the covariance matrix of Inline graphic. Specifically, we define the profile loglikelihood

$$pl_n(\beta) = \max_{\Lambda \in \mathcal{C}} \log L_n(\beta, \Lambda),$$

where Inline graphic is the set of step functions with nonnegative jumps at Inline graphic. Then the covariance matrix of Inline graphic is estimated by the negative inverse of the matrix whose Inline graphicth element is

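$$\frac{pl_n(\hat\beta) - pl_n(\hat\beta + h_n e_j) - pl_n(\hat\beta + h_n e_k) + pl_n(\hat\beta + h_n e_j + h_n e_k)}{h_n^2},$$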

where Inline graphic is the Inline graphicth canonical vector in Inline graphic and Inline graphic is a constant of order Inline graphic. To calculate Inline graphic for each Inline graphic, we reuse the proposed EM algorithm with Inline graphic held fixed. Thus, the only step in the EM algorithm is to explicitly evaluate Inline graphic and Inline graphic so as to update Inline graphic using (6). The iteration converges quickly with Inline graphic as the initial value.
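The second-order difference described above can be sketched as follows. The callable `pl` stands for a user-supplied profile loglikelihood (in practice, the EM algorithm run with the regression parameters held fixed); it and the function name `profile_hessian` are hypothetical. The covariance estimate would then be the negative inverse of the returned matrix.

```python
def profile_hessian(pl, beta_hat, h):
    """Matrix of second-order forward differences of pl at beta_hat.

    Entry (j, k) is {pl(b) - pl(b + h e_j) - pl(b + h e_k)
                     + pl(b + h e_j + h e_k)} / h^2 with b = beta_hat.
    """
    p = len(beta_hat)

    def shifted(*indices):
        b = list(beta_hat)
        for j in indices:
            b[j] += h                     # add h along each listed coordinate
        return pl(b)

    base = pl(list(beta_hat))
    one = [shifted(j) for j in range(p)]
    return [[(base - one[j] - one[k] + shifted(j, k)) / h ** 2
             for k in range(p)] for j in range(p)]
```

For a quadratic function the differences reproduce the Hessian up to rounding error, which makes a simple sanity check before plugging in a real profile loglikelihood.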

3. Asymptotic theory

We establish the asymptotic properties of Inline graphic under the following regularity conditions.

Condition 1. —

The true value of Inline graphic, denoted by Inline graphic, lies in the interior of a known compact set Inline graphic in Inline graphic, and the true value of Inline graphic, denoted by Inline graphic, is continuously differentiable with positive derivatives in Inline graphic, where Inline graphic is the union of the supports of Inline graphic.

Condition 2. —

The vector Inline graphic is uniformly bounded with uniformly bounded total variation over Inline graphic, and its left limit exists for any Inline graphic. In addition, for any continuously differentiable function Inline graphic, the expectations Inline graphicInline graphic are continuously differentiable in Inline graphic, where Inline graphic and Inline graphic are increasing functions in the decomposition Inline graphic.

Condition 3. —

If Inline graphic for all Inline graphic with probability 1, then Inline graphic for Inline graphic and Inline graphic.

Condition 4. —

The number of monitoring times, Inline graphic, is positive, and Inline graphic. The conditional probability Inline graphic is greater than some positive constant Inline graphic. In addition, Inline graphicInline graphic for some positive constant Inline graphic. Finally, the conditional densities of Inline graphic given Inline graphic and Inline graphic, denoted by Inline graphicInline graphic, have continuous second-order partial derivatives with respect to Inline graphic and Inline graphic when Inline graphic and are continuously differentiable with respect to Inline graphic.

Condition 5. —

The transformation function Inline graphic is twice continuously differentiable on Inline graphic with Inline graphic, Inline graphic and Inline graphic.

Remark 2. —

Condition 1 is standard in survival analysis. Condition 2 allows Inline graphic to have discontinuous trajectories, but the expectation of any smooth functional of Inline graphic must be differentiable. One example would be that Inline graphic is a stochastic process with a finite number of piecewise-smooth trajectories, where the discontinuity points have a continuous joint distribution. This condition excludes taking Brownian motion as a process for Inline graphic. Condition 3 holds if the matrix Inline graphic is nonsingular for some Inline graphic. Condition 4 pertains to the joint distribution of monitoring times. First, it requires that the monitoring occur anywhere in Inline graphic and that the largest monitoring time be equal to Inline graphic with positive probability. The latter assumption may be removed, at the expense of more complicated proofs. Condition 4 also requires that two adjacent monitoring times be separated by at least Inline graphic; otherwise, the data may contain exact observations, which would entail a different theoretical treatment. The smoothness condition for the joint density of monitoring times is used to prove the Donsker property of some function classes and the smoothness of the least favourable direction. Finally, Condition 5 pertains to the transformation function and holds for both the logarithmic family Inline graphicInline graphic and the Box–Cox family Inline graphicInline graphic.

The following theorem establishes the strong consistency of Inline graphic.

Theorem 1. —

Under Conditions 1–5, Inline graphic almost surely as Inline graphic, where Inline graphic is the Euclidean norm.

It is implicitly assumed in Theorem 1 that Inline graphic is restricted to Inline graphic, although in practice Inline graphic is allowed to be very large. The proof of Theorem 1 is based on the Kullback–Leibler information and makes use of the strong consistency of empirical processes. Careful arguments are needed to establish a preliminary bound for Inline graphic and to handle time-dependent covariates. Our next theorem establishes the asymptotic normality and semiparametric efficiency of Inline graphic.

Theorem 2. —

Under Conditions 1–5, Inline graphic converges in distribution as Inline graphic to a zero-mean normal random vector whose covariance matrix attains the semiparametric efficiency bound.

The proof of Theorem 2 relies on the derivation of the least favourable submodel for Inline graphic and utilizes modern empirical process theory. A key step is to show that Inline graphic converges to Inline graphic at the Inline graphic rate. Although the general procedure is similar to that of Huang & Wellner (1997), a major innovation is the derivation of the least favourable submodel for general interval censoring and time-dependent covariates by carefully handling the trajectories of Inline graphic and the joint distribution of Inline graphic. The existence of the least favourable submodel is also used at the end of the Appendix to show consistency of the profile-likelihood covariance estimator given in §2.3.

4. Simulation studies

We conducted simulation studies to assess the operating characteristics of the proposed numerical and inferential procedures. In the first study, we considered two time-independent covariates, Inline graphic and Inline graphic. In the second study, we allowed Inline graphic to vary over time by imitating two-stage randomization: Inline graphic, where Inline graphic and Inline graphic are independent Inline graphic and Inline graphic with Inline graphic. In both studies, we generated the failure times from the transformation model

4.

where Inline graphicInline graphic. We set Inline graphic, Inline graphic and Inline graphic. To create interval censoring, we randomly generated two monitoring times, Inline graphic and Inline graphic, so that the time axis Inline graphic was partitioned into three intervals, Inline graphic, Inline graphic and Inline graphic. On average, there were 25–35% left-censored observations and 50–60% right-censored ones. We set $n = 200$, 400 or 800 and used 10 000 replicates for each sample size.
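Failure times under a transformation model with time-independent covariates can be drawn by inverting the conditional survival function. The sketch below is illustrative only: it assumes the logarithmic transformation G(x) = log(1 + rx)/r and a baseline function Λ(t) = log(1 + t/2), neither of which is legible in the text as reproduced here, and the function name is hypothetical.

```python
import math
import random


def failure_time(u, eta, r):
    """Solve S(T | X) = u, where S = exp[-G{Lambda(T) exp(eta)}].

    u   : uniform(0, 1) draw
    eta : linear predictor beta^T X (time-independent covariates)
    r   : parameter of the assumed logarithmic transformation
    """
    # G^{-1}(-log u): -log u when r = 0, (u^{-r} - 1) / r otherwise
    x = -math.log(u) if r == 0 else (u ** (-r) - 1.0) / r
    cum_base = x * math.exp(-eta)          # value of Lambda(T)
    return 2.0 * math.expm1(cum_base)      # invert Lambda(t) = log(1 + t/2)


rng = random.Random(2016)
sample = [failure_time(1.0 - rng.random(), 0.5, 1.0) for _ in range(5)]
```

Each simulated time would then be reduced to its bracketing interval among the subject's monitoring times before being passed to the estimation routine.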

For each dataset, we applied the proposed EM algorithm by setting the initial value of Inline graphic to 0 and the initial value of Inline graphic to Inline graphic, and we set the convergence threshold to Inline graphic. We also tried other initial values for Inline graphic, but they all led to the same estimates. For the variance estimation, we set Inline graphic, but the results differed only in the third decimal place if we used Inline graphic or Inline graphic. There was no nonconvergence in any of the EM iterations.

Tables 1 and 2 summarize the results of the two simulation studies under $r = 0$, 0·5 or 1. The parameter estimators have small bias, and the bias decreases rapidly as $n$ increases. The variance estimators accurately reflect the true variabilities, and the confidence intervals have proper coverage probabilities. As shown in Fig. 1, the estimated cumulative hazard functions have negligible bias.

Table 1.

Summary statistics for the simulation study with time-independent covariates

| $r$ | Parameter | $n=200$: Est | SE | SEE | CP | $n=400$: Est | SE | SEE | CP | $n=800$: Est | SE | SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | $\beta_1$ | 0·515 | 0·209 | 0·216 | 96 | 0·506 | 0·148 | 0·149 | 95 | 0·503 | 0·103 | 0·104 | 95 |
| | $\beta_2$ | −0·515 | 0·366 | 0·354 | 94 | −0·505 | 0·254 | 0·248 | 95 | −0·504 | 0·176 | 0·174 | 95 |
| 0·5 | $\beta_1$ | 0·514 | 0·255 | 0·259 | 96 | 0·507 | 0·180 | 0·176 | 94 | 0·503 | 0·125 | 0·125 | 95 |
| | $\beta_2$ | −0·516 | 0·451 | 0·434 | 94 | −0·505 | 0·311 | 0·303 | 94 | −0·503 | 0·215 | 0·212 | 94 |
| 1 | $\beta_1$ | 0·516 | 0·294 | 0·297 | 95 | 0·506 | 0·209 | 0·207 | 95 | 0·504 | 0·145 | 0·144 | 95 |
| | $\beta_2$ | −0·517 | 0·522 | 0·503 | 94 | −0·505 | 0·358 | 0·350 | 95 | −0·502 | 0·249 | 0·244 | 94 |

Est, empirical average of the parameter estimator; SE, standard error of the parameter estimator; SEE, empirical average of the standard error estimator; CP, empirical coverage percentage of the 95% confidence interval.

Table 2.

Summary statistics for the simulation study with time-dependent covariates

| $r$ | Parameter | $n=200$: Est | SE | SEE | CP | $n=400$: Est | SE | SEE | CP | $n=800$: Est | SE | SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | $\beta_1$ | 0·529 | 0·241 | 0·239 | 95 | 0·518 | 0·166 | 0·164 | 95 | 0·509 | 0·114 | 0·114 | 95 |
| | $\beta_2$ | −0·515 | 0·363 | 0·353 | 95 | −0·511 | 0·253 | 0·247 | 94 | −0·503 | 0·175 | 0·173 | 95 |
| 0·5 | $\beta_1$ | 0·533 | 0·292 | 0·280 | 94 | 0·522 | 0·198 | 0·193 | 94 | 0·511 | 0·138 | 0·134 | 94 |
| | $\beta_2$ | −0·514 | 0·441 | 0·433 | 95 | −0·512 | 0·307 | 0·302 | 95 | −0·503 | 0·214 | 0·211 | 95 |
| 1 | $\beta_1$ | 0·537 | 0·336 | 0·317 | 94 | 0·525 | 0·228 | 0·219 | 94 | 0·514 | 0·157 | 0·152 | 94 |
| | $\beta_2$ | −0·518 | 0·512 | 0·502 | 95 | −0·513 | 0·358 | 0·349 | 95 | −0·505 | 0·250 | 0·243 | 94 |

Fig. 1.


Estimation of Inline graphic with Inline graphic, for (a) Inline graphic and (b) Inline graphic. The solid and dashed curves represent the true values and mean estimates, respectively.

We conducted an additional simulation study with five covariates. We set Inline graphic to zero-mean normal with unit variances and pairwise correlations of 0·5 and took Inline graphic; the other simulation settings were left unchanged. The results are summarized in the Supplementary Material. The proposed methods performed well in this simulation too. Again, there were no cases of nonconvergence.

To evaluate the performance of the EM algorithm in even larger datasets, we set Inline graphic and Inline graphic to ten standard normal random variables with pairwise correlations of 0·25 and regression coefficients of 0·5. The algorithm converged to values close to 0·5 in all 10 000 replicates.

5. Application

The Bangkok Metropolitan Administration conducted a cohort study of 1209 injecting drug users who were initially sero-negative for the HIV-1 virus. Subjects from 15 drug treatment clinics were followed from 1995 to 1998. At study enrolment and approximately every four months thereafter, subjects were assessed for HIV-1 sero-positivity through blood tests. As of December 1998 there were 133 HIV-1 sero-conversions and roughly 2300 person-years of follow-up.

We aim to identify the factors that influence HIV-1 infection. We fit model (1) with the class of logarithmic transformations $G(x) = r^{-1}\log(1 + rx)$ $(r \ge 0)$. The covariates include age at recruitment, gender, history of needle sharing, and drug injection in jail before recruitment; age is measured in years, gender takes value 1 for male and 0 for female, and history of needle sharing and drug injection are binary indicators of yes or no. In addition, we include a time-dependent covariate indicating imprisonment since the last clinic visit.

To select a transformation function, we vary $r$ from 0 to 1·5 in steps of 0·05. For each $r$, we estimate $\beta$ and $\Lambda$ by the EM algorithm and evaluate the loglikelihood at the parameter estimates. Figure 2(a) shows that the loglikelihood changes only very slowly as $r$ varies and is maximized at Inline graphic. We choose $r = 1$, which corresponds to the proportional odds model. Table 3 shows the results under this model. For comparison, we also include the results for $r = 0$, which corresponds to the proportional hazards model.
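The grid evaluation described above can be sketched as follows. The callable `loglik_at` is a hypothetical stand-in for "fit the model at a given transformation parameter and return the maximized loglikelihood"; it is not part of the authors' software.

```python
def select_transformation(loglik_at, r_max=1.5, step=0.05):
    """Evaluate the maximized loglikelihood on a grid of transformation
    parameters and return the best grid point with the full path."""
    n_steps = int(round(r_max / step))
    grid = [round(i * step, 10) for i in range(n_steps + 1)]   # 0, 0.05, ...
    values = [loglik_at(r) for r in grid]
    best = max(range(len(grid)), key=values.__getitem__)
    return grid[best], list(zip(grid, values))
```

Returning the whole path rather than only the maximizer makes it easy to see how flat the loglikelihood is across the grid, which matters when the maximum is weakly determined.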

Fig. 2.


Analysis of the Bangkok Metropolitan Administration HIV-1 study: (a) the loglikelihood at the nonparametric maximum likelihood estimates plotted as a function of $r$ in the logarithmic transformations; (b) estimation of infection-free probabilities, where the upper lines correspond to a low-risk subject under the proportional hazards (solid) and proportional odds (dashed) models, and the lower lines correspond to a high-risk subject under the proportional hazards (solid) and proportional odds (dashed) models.

Table 3.

Regression analysis of the Bangkok Metropolitan Administration HIV-1 infection data

| Covariates | Proportional hazards: Estimate | Standard error | $p$-value | Proportional odds: Estimate | Standard error | $p$-value |
|---|---|---|---|---|---|---|
| Age | −0·028 | 0·012 | 0·021 | −0·031 | 0·013 | 0·016 |
| Gender | 0·424 | 0·270 | 0·117 | 0·539 | 0·310 | 0·082 |
| Needle sharing | 0·237 | 0·183 | 0·196 | 0·251 | 0·196 | 0·200 |
| Drug injection | 0·313 | 0·184 | 0·089 | 0·360 | 0·198 | 0·069 |
| Imprisonment over time | 0·502 | 0·211 | 0·017 | 0·494 | 0·219 | 0·024 |

Under either model, ageing reduces the risk of HIV-1 infection, whereas being male increases it. In addition, drug injection increases the risk. Finally, subjects who have recently been imprisoned have an elevated risk of HIV-1 infection.

Figure 2(b) shows the prediction of HIV-1 infection for a low-risk subject versus a high-risk subject. The low-risk subject is a 50-year-old female with no history of needle sharing, no drug injection in jail before recruitment, and no imprisonment during follow-up; the high-risk subject is a 20-year-old male with a history of needle sharing, drug injection in jail before recruitment, and imprisonment over time. The estimated probabilities of infection for the low-risk subject are similar under the proportional odds and proportional hazards models. For the high-risk subject, however, the proportional odds model yields slightly higher risks of infection than the proportional hazards model during the first part of the follow-up period, with the opposite being true during the later part of the follow-up period.

6. Remarks

The presence of time-dependent covariates poses major computational and theoretical challenges. With time-dependent covariates, the parameters Inline graphic and Inline graphicInline graphic in the likelihood function are entangled. As a result, the diagonal approximation to the Hessian matrix in the iterative convex minorant algorithm (Huang & Wellner, 1997) is inaccurate, and the algorithm becomes unstable. By contrast, each iteration of our EM algorithm only solves a low-dimensional equation for Inline graphic while calculating the jump sizes of Inline graphic explicitly as weighted sums of Poisson rates. Thus, our algorithm is fast and stable. In extensive numerical studies we have never encountered nonconvergence. Our software is available at http://dlin.web.unc.edu/software.

Our theoretical development requires that the population average of the covariate process be smooth but allows individual covariate trajectories to be discontinuous. We treat Inline graphic as a bundled process of Inline graphic and Inline graphic when proving the identifiability of Inline graphic in Theorem 1, the convergence rate of Inline graphic in Lemma A1, and the invertibility of the information operator in Theorem 2. The Donsker property for this class of processes indexed by Inline graphic is used repeatedly in the proofs. Besides time-dependent covariates, one major theoretical challenge is dealing with general interval censoring, which allows each subject to have a different number of monitoring times. In particular, the derivation of the least favourable direction for Inline graphic requires careful consideration of the joint distribution for an arbitrary sequence of monitoring times, and the Lax–Milgram theorem is used to prove the existence of a least favourable direction. That theorem greatly simplifies the proof, in contrast to the approach of Huang & Wellner (1997), even for case-2 data.

To apply the transformation model to real data, one must choose a transformation function. In the analysis of the Bangkok Metropolitan Administration HIV-1 data, we used the AIC to select the transformation function, although the likelihood surface is fairly flat. It would be worthwhile to develop formal diagnostic procedures to check the appropriateness of the transformation function and other model assumptions. One possible strategy is to examine the behaviour of the posterior mean of the martingale residuals (Chen et al., 2012) given the observed intervals.

In many applications, the event of interest may occur repeatedly over time. Recurrent events under interval censoring are called panel count data, which have been studied by Sun & Wei (2000), Zhang (2002) and Wellner & Zhang (2007), among others. There are also studies in which each subject can experience different types of events or where subjects are sampled in clusters such that the failure times within the same cluster are correlated. We are currently developing regression methods to handle such multivariate failure time data.

We are also extending our work to competing risks data. Indeed, the Bangkok Metropolitan Administration HIV-1 study contains information on HIV-1 infection by viral subtypes B and E, which are two competing risks. We propose to formulate the effects of potentially time-dependent covariates on the cumulative incidence functions of competing risks in the form of model (1). We will modify the EM algorithm to deal with multiple subdistribution functions and establish the asymptotic theory under suitable conditions.

Supplementary material

Supplementary material available at Biometrika online includes additional simulation results and a zip file containing the computer code and documentation.

Acknowledgments

This research was supported by the U.S. National Institutes of Health. The authors thank the editor, an associate editor and a referee for helpful comments.

Appendix

Technical details

We use Inline graphic to denote the empirical measure from Inline graphic independent observations and Inline graphic to denote the true probability measure. The corresponding empirical process is Inline graphic. Let Inline graphic be the observed-data loglikelihood for a single subject, that is,

graphic file with name M472.gif

Proof of Theorem 1. —

We first show that Inline graphic with probability 1. By Condition 4, the measure generated by the function Inline graphic is dominated by the sum of the Lebesgue measure in Inline graphic and the counting measure at Inline graphic, and its Radon–Nikodym derivative, denoted by Inline graphic, is bounded away from zero. We define

graphic file with name M478.gif

Clearly, Inline graphic is a step function with jumps only at Inline graphic. Since

graphic file with name M481.gif

uniformly in Inline graphic with probability 1 as Inline graphic, we conclude that Inline graphic converges uniformly to Inline graphic with probability 1 for Inline graphic.

By the definition of Inline graphic, we have Inline graphic. Because of its bounded total variation, Inline graphic belongs to a Donsker class indexed by Inline graphic. Hence, the class of functions

graphic file with name M491.gif

where Inline graphic denotes functions which have total variation in Inline graphic bounded by a given constant Inline graphic, is a convex hull of functions Inline graphic, so it is a Donsker class. Furthermore,

graphic file with name M496.gif

is bounded away from zero. Therefore, Inline graphic belongs to some Donsker class due to the preservation property of the Donsker class under Lipschitz-continuous transformations. We conclude that Inline graphic almost surely. In addition, by the construction of Inline graphic, Inline graphic converges almost surely to Inline graphic, which is finite. Therefore, with probability 1,

graphic file with name M502.gif (A1)

Let Inline graphic be such that Inline graphic for Inline graphic. Then the left-hand side of (A1) is less than or equal to

graphic file with name M506.gif

Hence Inline graphic. Since as Inline graphic, Inline graphic, which is positive, Condition 5 implies that Inline graphic with probability 1.

We can now restrict Inline graphic to a class of functions with uniformly bounded total variation, equipped with the weak topology on Inline graphic. By Helly's selection lemma, for any subsequence of Inline graphic we can choose a further subsequence such that Inline graphic converges weakly to some Inline graphic on Inline graphic, Inline graphic converges to Inline graphic, and Inline graphic converges to Inline graphic. Clearly, Inline graphic implies that Inline graphic, where Inline graphic, so that

graphic file with name M524.gif

By the above arguments for proving the Donsker property of Inline graphic, together with the fact that the total variation of Inline graphic is bounded by a constant, we can show that Inline graphic belongs to a Donsker class with a bounded envelope function, so that Inline graphic. It follows that Inline graphic Furthermore,

graphic file with name M530.gif

where Inline graphic is the measure corresponding to Inline graphic. According to Conditions 2 and 4, Inline graphic is dominated by the Lebesgue measure with bounded derivative in Inline graphic and has a point mass at Inline graphic. Hence Inline graphic almost surely. We therefore conclude that Inline graphic

By the properties of the Kullback–Leibler information, Inline graphic with probability 1. In particular, for any Inline graphic, we choose Inline graphic to obtain

graphic file with name M541.gif

Thus, for any Inline graphic,

graphic file with name M543.gif (A2)

Differentiating both sides with respect to Inline graphic, we have

graphic file with name M545.gif

By Condition 3, Inline graphic and Inline graphic for Inline graphic. We let Inline graphic by redefining Inline graphic to centre at a deterministic function in the support of Inline graphic, and we set Inline graphic in (A2) to obtain Inline graphic. Hence, Inline graphic for Inline graphic. It follows that Inline graphic and Inline graphic converges weakly to Inline graphic almost surely. The latter convergence can be strengthened to uniform convergence since Inline graphic is continuous. Thus, we have proved Theorem 1. □

Since we have established consistency, we may restrict the space of Inline graphic to

graphic file with name M561.gif

for some Inline graphic. Thus, when Inline graphic is large enough, Inline graphic belongs to Inline graphic with probability 1. Before proving Theorem 2, we need to establish the convergence rate for Inline graphic. Specifically, the following lemma holds.

Lemma A1. —

Under Conditions 1–5,

graphic file with name M567.gif

Proof. —

The proof relies on the convergence-rate result in Theorem 3.4.1 of van der Vaart & Wellner (1996). To use that theorem, we define Inline graphic and let

graphic file with name M569.gif

be a class of functions indexed by Inline graphic and Inline graphic.

We first calculate the Inline graphic-bracketing number of Inline graphic. Because Inline graphic consists of increasing and uniformly bounded functions on Inline graphic, Lemma 2.2 of van de Geer (2000) implies that for any Inline graphic, the bracketing number satisfies

graphic file with name M577.gif

where Inline graphic denotes the Inline graphic-norm with respect to the Lebesgue measure on Inline graphic, and Inline graphic means that Inline graphic for a positive constant Inline graphic. For Inline graphic, we can find Inline graphic number of brackets Inline graphic with Inline graphic and Inline graphic to cover Inline graphic. In addition, there are Inline graphic number of brackets covering Inline graphic, such that any two Inline graphic within the same bracket differ by at most Inline graphic. Hence, there are a total of Inline graphic brackets covering Inline graphic. For any pair of Inline graphic and Inline graphic, there exist some constants Inline graphic and Inline graphic such that

graphic file with name M600.gif

Because the measures Inline graphic and Inline graphic have bounded derivatives with respect to the Lebesgue measure in Inline graphic and the former has a finite point mass at Inline graphic,

graphic file with name M605.gif

Thus, the bracketing number for Inline graphic satisfies

graphic file with name M607.gif

and so it has a finite entropy integral. Define

graphic file with name M608.gif

It is easy to show that Inline graphic. In addition, by Lemma 1.3 of van de Geer (2000),

graphic file with name M610.gif

where Inline graphic is the Hellinger distance, defined as

graphic file with name M612.gif

with respect to the dominating measure Inline graphic.
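To make the Hellinger distance concrete, here is a small Python sketch for discrete distributions, with the counting measure as the dominating measure. Conventions differ on whether a factor of 1/2 appears inside the square root; the `half` flag is an illustrative device, and no assumption is made here about which convention the displayed definition follows.

```python
import math


def hellinger(p, q, half=False):
    """Hellinger distance between two discrete distributions p and q.

    Computes sqrt(c * sum_i (sqrt(p_i) - sqrt(q_i))**2), with c = 1/2 if
    `half` is True and c = 1 otherwise (both conventions are in use).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total if half else total)


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(hellinger(p, p))             # identical distributions
print(hellinger(p, q))             # partially overlapping supports
print(hellinger(p, q, half=True))  # same pair under the 1/2 convention
```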

The above results, together with the fact that Inline graphic maximizes Inline graphic and the consistency result in Theorem 1, imply that all the conditions in Theorem 3.4.1 of van der Vaart & Wellner (1996) hold. Thus, we conclude that Inline graphic, where Inline graphic satisfies Inline graphic. In particular, we can choose Inline graphic of the order Inline graphic such that Inline graphic. By the mean value theorem,

graphic file with name M622.gif

On the left-hand side of the above equation, we consider the event Inline graphic to find that

graphic file with name M624.gif

Next, we consider Inline graphic to obtain the same equation as the one above but with Inline graphic replaced by Inline graphic. By repeating this process, we conclude that, conditional on Inline graphic and for any Inline graphic,

graphic file with name M630.gif

It then follows from the mean value theorem that

graphic file with name M631.gif

and so the lemma is proved. □
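The rate calculation underlying this kind of argument can be sketched as follows. This is the standard computation for a class whose log bracketing number is of order 1/ε, as holds for uniformly bounded monotone functions. The symbols J, φ_n and r_n follow the usual notation of van der Vaart & Wellner (1996) and are assumptions here, since the displayed formulas are not reproduced in this copy.

```latex
\begin{align*}
\log N_{[\,]}(\epsilon, \mathcal{F}, L_2) \lesssim \epsilon^{-1}
  \;&\Longrightarrow\;
  J(\delta) = \int_0^\delta \bigl\{1 + \log N_{[\,]}(\epsilon, \mathcal{F}, L_2)\bigr\}^{1/2}\, d\epsilon
  \lesssim \delta^{1/2} \quad (0 < \delta \le 1), \\
\phi_n(\delta) &= J(\delta)\left\{1 + \frac{J(\delta)}{\delta^{2} n^{1/2}}\right\}
  \lesssim \delta^{1/2} + \frac{1}{\delta\, n^{1/2}}, \\
r_n^{2}\, \phi_n(r_n^{-1}) &\lesssim r_n^{3/2} + r_n^{3}\, n^{-1/2}.
\end{align*}
```

The last bound is of order n^{1/2} when r_n is of order n^{1/3}, so under these assumptions the rate condition of the theorem holds with r_n = n^{1/3}.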

Now we are ready to prove Theorem 2.

Proof of Theorem 2. —

It is helpful to introduce the following notation: Inline graphic,

graphic file with name M633.gif

and

graphic file with name M634.gif

The score function for Inline graphic is

graphic file with name M636.gif

To obtain the score operator for Inline graphic, we consider any parametric submodel of Inline graphic defined by Inline graphicInline graphic, where Inline graphic. The score function along this submodel is

graphic file with name M642.gif

Clearly, Inline graphic and Inline graphic. Hence

graphic file with name M645.gif

We apply Taylor series expansions about Inline graphic to the right-hand sides of the above two equations. By Lemma A1, the second-order terms are bounded by

graphic file with name M647.gif

Hence

graphic file with name M648.gif (A3)
graphic file with name M649.gif (A4)

where Inline graphic is the second derivative of Inline graphic with respect to Inline graphic, Inline graphic is the derivative of Inline graphic along the submodel Inline graphic, Inline graphic is the derivative of Inline graphic with respect to Inline graphic, and Inline graphic is the derivative of Inline graphic along the submodel Inline graphic. All the derivatives on the right-hand sides of (A3) and (A4) are evaluated at Inline graphic.

We choose Inline graphic to be the least favourable direction Inline graphic, a Inline graphic-vector with components in Inline graphic that solves the normal equation

graphic file with name M667.gif (A5)

where Inline graphic is the adjoint operator of Inline graphic. Then

graphic file with name M670.gif

so that the difference between (A3) and (A4) yields

graphic file with name M671.gif (A6)

Consequently, Theorem 2 will be established if we can show that:

  (i) equation (A5) has a solution Inline graphic;

  (ii) Inline graphic belongs to a Donsker class and converges in the Inline graphic-norm to Inline graphic;

  (iii) the matrix Inline graphic is invertible.

The reason is that when (i)–(iii) hold, (A6) entails Inline graphic and further yields

graphic file with name M678.gif

This implies that the influence function for Inline graphic is exactly the efficient influence function, so that Inline graphic converges to a zero-mean normal random vector whose covariance matrix attains the semiparametric efficiency bound (Bickel et al., 1993, p. 65).

We first verify (i). For any Inline graphic,

graphic file with name M682.gif

where

graphic file with name M683.gif

Likewise,

graphic file with name M684.gif

Therefore, by the definition of the adjoint operator Inline graphic, solving the normal equation (A5) is equivalent to solving the integral equation

graphic file with name M686.gif (A7)

We define the left-hand side of (A7) as a linear operator Inline graphic which maps Inline graphic to itself. In addition, we equip Inline graphic with an inner product Inline graphic so that it becomes a Hilbert space. On the same space, we define Inline graphic. It is easy to show that Inline graphic is a seminorm for Inline graphic. Furthermore, if Inline graphic, then Inline graphic. Thus, with probability 1, Inline graphic is zero, i.e., for any Inline graphic,

graphic file with name M698.gif

Setting Inline graphic we obtain

graphic file with name M700.gif

Thus, Inline graphic for Inline graphic, implying that Inline graphic is a norm in Inline graphic. Clearly, Inline graphic for some constant Inline graphic. According to the bounded inverse theorem in Banach spaces, we have Inline graphic for another constant Inline graphic; that is, we have Inline graphic. By the Lax–Milgram theorem (Zeidler, 1995), the solution to (A7), namely Inline graphic exists. So we have verified (i).
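For reference, the general form of the Lax–Milgram theorem invoked in this step can be stated as follows. The symbols H, B, f, c_1 and c_2 are generic and are not the paper's notation; the correspondence with the operator defined through (A7) is as described in the preceding paragraph.

```latex
Let $H$ be a real Hilbert space with norm $\|\cdot\|$, and let
$B : H \times H \to \mathbb{R}$ be a bilinear form satisfying
\begin{align*}
|B(u, v)| &\le c_1 \|u\|\,\|v\| && \text{(boundedness)}, \\
B(u, u) &\ge c_2 \|u\|^{2} && \text{(coercivity)}
\end{align*}
for all $u, v \in H$ and some constants $c_1, c_2 > 0$. Then for every bounded
linear functional $f$ on $H$ there exists a unique $u_f \in H$ such that
\[
B(u_f, v) = f(v) \quad \text{for all } v \in H,
\qquad \|u_f\| \le c_2^{-1} \|f\|.
\]
```

In the argument above, the two-sided norm bounds play the role of the boundedness and coercivity conditions.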

To verify (ii), we examine Inline graphic by considering Inline graphic and Inline graphic. Along the lines of Huang & Wellner (1997), we differentiate the integral equation (A7) with respect to Inline graphic to obtain

graphic file with name M715.gif

where Inline graphic and Inline graphicInline graphic are continuously differentiable with respect to their arguments. Hence Inline graphic is continuously differentiable in Inline graphic. This fact implies that Inline graphic belongs to some Donsker class. It then follows from the Donsker property of the class Inline graphic that (ii) is true.

Finally, we verify (iii). If the matrix is singular, then there exists a nonzero vector Inline graphic such that

graphic file with name M724.gif

It follows that, with probability 1, the score function along the submodel Inline graphic is zero; that is, for any Inline graphic,

graphic file with name M727.gif

where Inline graphic. We consider Inline graphic for Inline graphic to obtain

graphic file with name M731.gif

Therefore, with probability 1, Inline graphic for any Inline graphic. This implies that Inline graphic, so Inline graphic by Condition 3. Thus, we have verified (iii). □

Remark A1. —

For a given Inline graphic, we define Inline graphic as the step function that maximizes Inline graphic. The arguments in the proof of Theorem 1 can be used to show that Inline graphic is bounded asymptotically when Inline graphic is in a small neighbourhood of Inline graphic, so Inline graphic converges to Inline graphic as Inline graphic converges to Inline graphic. In addition, the arguments in the proof of Lemma A1 yield Inline graphic in the Inline graphic space. Finally, in light of the existence of Inline graphic in the proof of Theorem 2, we can define Inline graphic to obtain the least favourable submodel as Inline graphic, where Inline graphic is in a neighbourhood of Inline graphic. Thus, we can easily verify conditions (8), (9) and (10) of Murphy & van der Vaart (2000) for the likelihood function along this submodel. Along with the Donsker property of the functional classes for the first and second derivatives of Inline graphic with respect to Inline graphic and Inline graphic, we conclude that Theorem 1 of Murphy & van der Vaart (2000) is applicable, and hence the covariance matrix estimator given in §2.3 with Inline graphic is consistent for the limiting covariance matrix of Inline graphic.
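The idea of estimating a covariance from numerical differences of a profile loglikelihood can be illustrated on a toy model. The sketch below is not the estimator of §2.3 itself. It assumes a simple linear model y = βx + μ + error, with the intercept μ as a stand-in nuisance parameter, and it uses first-order differences of per-subject profile loglikelihood contributions with a perturbation of order n^{-1/2}. All function names are hypothetical.

```python
import math
import random


def profile_contributions(beta, xs, ys):
    """Per-subject profile loglikelihood for y = beta*x + mu + N(0, 1) error.

    The nuisance intercept mu is profiled out in closed form for each beta.
    """
    mu_hat = sum(y - beta * x for x, y in zip(xs, ys)) / len(xs)
    return [-0.5 * (y - beta * x - mu_hat) ** 2 for x, y in zip(xs, ys)]


def profile_variance(beta_hat, xs, ys, h):
    """Variance estimate for beta_hat from first-order numerical differences.

    Forms per-subject difference quotients of the profile loglikelihood,
    sums their squares, and inverts (scalar case of an outer-product form).
    """
    base = profile_contributions(beta_hat, xs, ys)
    shifted = profile_contributions(beta_hat + h, xs, ys)
    info = sum(((s - b) / h) ** 2 for s, b in zip(shifted, base))
    return 1.0 / info


random.seed(0)
n = 2000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.5 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]

# Least-squares slope, which maximizes the profile loglikelihood in this toy model.
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx

var_hat = profile_variance(beta_hat, xs, ys, h=1.0 / math.sqrt(n))
print(beta_hat, var_hat, 1.0 / sxx)  # compare with the model-based variance 1/Sxx
```

In this toy setting the difference-based estimate should be close to the model-based variance 1/Sxx, which gives a quick sanity check on the choice of perturbation size.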

References

  1. Bickel P. J., Klaassen C. A. J., Ritov Y. & Wellner J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
  2. Chen K., Jin Z. & Ying Z. (2002). Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–68.
  3. Chen L., Lin D. Y. & Zeng D. (2012). Checking semiparametric transformation models with censored data. Biostatistics 13, 18–31.
  4. Gu M. G., Sun L. & Zuo G. (2005). A baseline-free procedure for transformation models under interval censorship. Lifetime Data Anal. 11, 473–88.
  5. Huang J. (1995). Maximum likelihood estimation for proportional odds regression model with current status data. In Analysis of Censored Data, vol. 27 of IMS Lecture Notes—Monograph Series, H. L. Koul and J. V. Deshpande, eds. Hayward: Institute of Mathematical Statistics, pp. 129–46.
  6. Huang J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 24, 540–68.
  7. Huang J. & Rossini A. J. (1997). Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J. Am. Statist. Assoc. 92, 960–7.
  8. Huang J. & Wellner J. A. (1997). Interval censored survival data: A review of recent progress. In Proc. 1st Seattle Symp. Biostatist.: Survival Anal., D. Y. Lin and T. R. Fleming, eds. New York: Springer, pp. 123–69.
  9. Murphy S. A. & van der Vaart A. W. (2000). On profile likelihood. J. Am. Statist. Assoc. 95, 449–65.
  10. Rabinowitz D., Betensky R. A. & Tsiatis A. A. (2000). Using conditional logistic regression to fit proportional odds models to interval censored data. Biometrics 56, 511–8.
  11. Rossini A. J. & Tsiatis A. A. (1996). A semiparametric proportional odds regression model for the analysis of current status data. J. Am. Statist. Assoc. 91, 713–21.
  12. Schick A. & Yu Q. (2000). Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 27, 45–55.
  13. Shen X. (1998). Proportional odds regression and sieve maximum likelihood estimation. Biometrika 85, 165–77.
  14. Sun J. & Sun L. (2005). Semiparametric linear transformation models for current status data. Can. J. Statist. 33, 85–96.
  15. Sun J. & Wei L. J. (2000). Regression analysis of panel count data with covariate-dependent observation and censoring times. J. R. Statist. Soc. B 62, 293–302.
  16. van de Geer S. A. (2000). Empirical Process Theory and Applications. Cambridge: Cambridge University Press.
  17. van der Vaart A. W. & Wellner J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer.
  18. Wang L., McMahan C. S., Hudgens M. G. & Qureshi Z. P. (2015). A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72, 222–31.
  19. Wellner J. A. & Zhang Y. (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35, 2106–42.
  20. Zeidler E. (1995). Applied Functional Analysis: Applications to Mathematical Physics. New York: Springer.
  21. Zeng D. & Lin D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–40.
  22. Zhang Y. (2002). A semiparametric pseudolikelihood estimation method for panel count data. Biometrika 89, 39–48.
  23. Zhang Z. & Zhao Y. (2013). Empirical likelihood for linear transformation models with interval-censored failure time data. J. Mult. Anal. 116, 398–409.
  24. Zhang Z., Sun L., Zhao X. & Sun J. (2005). Regression analysis of interval-censored failure time data with linear transformation models. Can. J. Statist. 33, 61–70.

Articles from Biometrika are provided here courtesy of Oxford University Press
