Abstract
Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand.
Keywords: Current-status data, EM algorithm, Interval censoring, Linear transformation model, Nonparametric likelihood, Proportional hazards, Proportional odds, Semiparametric efficiency, Time-dependent covariate
1. Introduction
Interval-censored data arise when the event or failure of interest is known only to occur within a time interval. Such data are commonly encountered in disease research, where the ascertainment of an asymptomatic event is costly or invasive and so can take place only at a small number of monitoring times. For example, in HIV/AIDS studies, blood samples are periodically drawn from at-risk subjects to look for evidence of HIV sero-conversion. Likewise, biopsies are performed on patients at clinic visits to determine the occurrence or recurrence of cancer.
There are several types of interval-censored data. The simplest and most studied type is called case-1 or current-status data, which involves only one monitoring time per subject and is routinely found in cross-sectional studies. When there are two or more monitoring times per subject, the resulting data are referred to as case-2 or case-$k$ interval censoring (Huang & Wellner, 1997). The most general and common type allows for varying numbers of monitoring times among subjects and is termed mixed-case interval censoring (Schick & Yu, 2000).
The fact that the failure time is never observed exactly poses theoretical and computational challenges in semiparametric regression analysis of such data. Huang (1995, 1996) and Huang & Wellner (1997) studied nonparametric maximum likelihood estimation for the proportional hazards and proportional odds models with case-1 and case-2 data. The estimators are obtained by the iterative convex minorant algorithm, which becomes unstable for large datasets. Sieve maximum likelihood estimation for the proportional odds model was considered by Rossini & Tsiatis (1996) with case-1 data and by Huang & Rossini (1997) and Shen (1998) with case-2 data; however, it is difficult to choose an appropriate sieve parameter space and, especially, to choose the number of knots. For the proportional odds model with case-1 and case-2 data, Rabinowitz et al. (2000) derived an approximate conditional likelihood, which does not perform well in small samples. Gu et al. (2005), Sun & Sun (2005), Zhang et al. (2005) and Zhang & Zhao (2013) constructed rank-based estimators for linear transformation models, but such estimators are computationally demanding and statistically inefficient. None of the existing work accommodates time-dependent covariates or can handle case-$k$ or mixed-case interval censoring.
In this paper we consider interval censoring in the most general form, that is, mixed-case data. We study nonparametric maximum likelihood estimation for a broad class of transformation models that allows time-dependent covariates and includes the proportional hazards and proportional odds models as special cases. We develop an EM-type algorithm, which is demonstrated to perform satisfactorily in a wide variety of settings, even with time-dependent covariates. Using empirical process theory (van der Vaart & Wellner, 1996; van de Geer, 2000) and semiparametric efficiency theory (Bickel et al., 1993), we establish that, under mild conditions, the proposed estimators for the regression parameters are consistent and asymptotically normal and the limiting covariance matrix attains the semiparametric efficiency bound and can be estimated analytically by the profile likelihood method (Murphy & van der Vaart, 2000). The theoretical development requires careful treatment of the time trajectories of covariate processes and the joint distribution for an arbitrary sequence of monitoring times.
2. Methods
2.1. Transformation models and likelihood construction
Let $T$ denote the failure time, and let $Z(\cdot)$ denote a $d$-vector of potentially time-dependent covariates. Under the semiparametric transformation model, the cumulative hazard function for $T$ conditional on $Z(\cdot)$ takes the form
$$\Lambda(t \mid Z) = G\left[ \int_0^t \exp\{\beta^{\mathrm{T}} Z(s)\}\, d\Lambda(s) \right], \qquad (1)$$
where $G$ is a specific transformation function that is strictly increasing, $\beta$ is a vector of unknown regression parameters, and $\Lambda(\cdot)$ is an unknown increasing function (Zeng & Lin, 2006). The choices of $G(x) = x$ and $G(x) = \log(1 + x)$ yield the proportional hazards and proportional odds models, respectively. It is useful to consider the class of frailty-induced transformations
$$G(x) = -\log \int_0^\infty \exp(-x\xi) f(\xi)\, d\xi,$$
where $f(\xi)$ is the density function of a frailty variable $\xi$ with support $[0, \infty)$. The choice of the gamma density with unit mean and variance $r$ yields the class of logarithmic transformations $G(x) = \log(1 + rx)/r$ $(r \ge 0)$, and the choice of the positive stable distribution with parameter $\rho$ yields the class of Box–Cox transformations $G(x) = \{(1 + x)^{\rho} - 1\}/\rho$ $(0 < \rho \le 1)$. When all the covariates are time-independent, model (1) can be rewritten as a linear transformation model
$$H(T) = -\beta^{\mathrm{T}} Z + \epsilon,$$
where $H(t) = \log \Lambda(t)$ and $\epsilon$ is an error term with distribution function $F(s) = 1 - \exp[-G\{\exp(s)\}]$ (Chen et al., 2002). Thus, $\beta$ can be interpreted as the effects of the covariates on a transformation of $T$.
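The frailty representation above can be checked numerically. The sketch below is our own illustration, not code from the paper: it verifies that a gamma frailty with unit mean and variance $r$ induces the logarithmic transformation $G(x) = \log(1 + rx)/r$, with the proportional hazards model recovered as $r \to 0$. The function names are ours.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss
from math import gamma as gamma_fn

def G_log(x, r):
    """Logarithmic transformation G(x) = log(1 + r x)/r; the limit r -> 0
    gives G(x) = x, i.e., the proportional hazards model."""
    return x if r == 0 else np.log1p(r * x) / r

def G_from_frailty(x, r, n_nodes=80):
    """Recover G(x) = -log E{exp(-x * xi)} for a gamma frailty xi with mean 1
    and variance r, evaluating the Laplace transform by Gauss-Laguerre
    quadrature."""
    shape, scale = 1.0 / r, r
    nodes, weights = laggauss(n_nodes)
    # Gamma(shape, scale) density at the quadrature nodes
    dens = nodes ** (shape - 1) * np.exp(-nodes / scale) / (gamma_fn(shape) * scale ** shape)
    # \int_0^inf e^{-x u} f(u) du, written as \int e^{-u} {e^{-x u} f(u) e^{u}} du
    laplace = np.sum(weights * np.exp(-x * nodes) * dens * np.exp(nodes))
    return -np.log(laplace)
```

For instance, `G_from_frailty(1.0, r=0.5)` agrees with `G_log(1.0, 0.5)` $= 2\log 1.5 \approx 0.811$ to quadrature accuracy.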
We formulate mixed-case interval censoring by assuming that the number of monitoring times, denoted by $K$, is random and that there exists a random sequence of monitoring times, denoted by $U = (U_1, \ldots, U_K)$ with $U_1 < \cdots < U_K$. We do not model $(K, U)$. Write $U_0 = 0$ and $U_{K+1} = \infty$. Also, define $\Delta = (\Delta_1, \ldots, \Delta_{K+1})$, where $\Delta_k = I(U_{k-1} < T \le U_k)$ with $I(\cdot)$ denoting the indicator function. Then the observed data from a random sample of $n$ subjects consist of $(K_i, U_i, \Delta_i, Z_i)$ $(i = 1, \ldots, n)$, where $U_i = (U_{i1}, \ldots, U_{iK_i})$ and $\Delta_i = (\Delta_{i1}, \ldots, \Delta_{i,K_i+1})$. If $K = 1$ or 2, then the observation scheme becomes case-1 or case-2, respectively.
Suppose that $T$ is independent of $(K, U)$ conditional on $Z$. Then the observed-data likelihood function concerning the parameters $(\beta, \Lambda)$ takes the form
$$L_n(\beta, \Lambda) = \prod_{i=1}^n \prod_{k=1}^{K_i+1} \left( \exp\left[ -G\left\{ \int_0^{U_{i,k-1}} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} \right] - \exp\left[ -G\left\{ \int_0^{U_{ik}} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} \right] \right)^{\Delta_{ik}}.$$
Since only one $\Delta_{ik}$ is unity for each subject and the others equal zero,
$$L_n(\beta, \Lambda) = \prod_{i=1}^n \left( \exp\left[ -G\left\{ \int_0^{L_i} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} \right] - \exp\left[ -G\left\{ \int_0^{R_i} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} \right] \right),$$
where $(L_i, R_i]$ is the smallest interval that brackets $T_i$, i.e., $L_i = U_{i,k-1}$ and $R_i = U_{ik}$ when $\Delta_{ik} = 1$. Clearly, $L_i = 0$ indicates that the $i$th subject is left censored, while $R_i = \infty$ indicates that the subject is right censored.
Remark 1. —
The sequence of monitoring times may not be completely observed and, in fact, need not be for the purpose of inference. We only need to know the values of $L_i$ and $R_i$, since the other monitoring times do not contribute to the likelihood. The theoretical development, however, requires consideration of the joint distribution for the entire sequence of monitoring times.
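The reduction of the monitoring sequence to the bracketing interval can be sketched as follows. This is an illustrative helper of our own, with hypothetical names; it follows the convention $U_0 = 0$ and $U_{K+1} = \infty$ used in the text.

```python
import numpy as np

def bracketing_interval(monitoring_times, event_indicators):
    """Given ordered monitoring times U_1 < ... < U_K and the indicators
    Delta_k = I(U_{k-1} < T <= U_k) for k = 1, ..., K+1 (with U_0 = 0 and
    U_{K+1} = inf), return the smallest bracketing interval (L, R]."""
    U = [0.0] + list(monitoring_times) + [np.inf]
    k = int(np.argmax(event_indicators))  # the unique index with Delta_k = 1
    return U[k], U[k + 1]
```

For example, `bracketing_interval([2.0, 5.0, 8.0], [0, 0, 1, 0])` returns `(5.0, 8.0)`; a leading 1 gives a left-censored interval starting at 0, and a trailing 1 gives a right-censored interval ending at infinity.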
2.2. Nonparametric maximum likelihood estimation
To estimate $\beta$ and $\Lambda$, we adopt the nonparametric maximum likelihood approach, under which $\Lambda$ is regarded as a step function with nonnegative jumps at the endpoints of the smallest intervals that bracket the failure times. Specifically, if $0 = t_0 < t_1 < \cdots < t_m$ denotes the set consisting of 0 and the unique values of $\{L_i\}$ and $\{R_i < \infty\}$, then the estimator for $\Lambda$ is a step function with jump size $\lambda_k$ at $t_k$ $(k = 1, \ldots, m)$ and with $\Lambda(0) = 0$. Hence, we maximize the function
$$L_n(\beta, \lambda_1, \ldots, \lambda_m) = \prod_{i=1}^n \left( \exp\left[ -G\left\{ \sum_{t_k \le L_i} \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} \right] - \exp\left[ -G\left\{ \sum_{t_k \le R_i} \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} \right] \right). \qquad (2)$$
Direct maximization of (2) is difficult because the jump sizes $(\lambda_1, \ldots, \lambda_m)$ do not have an analytical expression. An even more severe challenge is that not all of the $L_i$ and $R_i$ are informative about the failure times, so many of the $\lambda_k$ are zero and hence lie on the boundary of the parameter space. For example, if there are no interval endpoints between some and with , then the jump size at must be zero in order to maximize (2). The existing iterative convex minorant algorithm works only for the proportional hazards and proportional odds models with time-independent covariates (Huang & Wellner, 1997). In the following, we construct an EM algorithm to maximize (2).
For the class of frailty-induced transformations described in §2.1, the observed-data likelihood can be written as
$$L_n(\beta, \Lambda) = \prod_{i=1}^n \int_0^\infty \left( \exp\left\{ -\xi \int_0^{L_i} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} - \exp\left\{ -\xi \int_0^{R_i} e^{\beta^{\mathrm{T}} Z_i(t)}\, d\Lambda(t) \right\} \right) f(\xi)\, d\xi,$$
so that the estimation of the transformation model becomes that of the proportional hazards frailty model. With $\Lambda$ as a step function with jumps $\lambda_k$ at $t_k$, this likelihood becomes
$$\prod_{i=1}^n \int_0^\infty \left[ \exp\left\{ -\xi \sum_{t_k \le L_i} \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} - \exp\left\{ -\xi \sum_{t_k \le R_i} \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} \right] f(\xi)\, d\xi. \qquad (3)$$
We introduce latent variables $W_{ik}$ which, conditional on the frailty $\xi_i$, are independent Poisson random variables with means $\xi_i \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)}$. We show below that the nonconcave likelihood function given in (3) is equivalent to a likelihood function for these Poisson variables, so the M-step becomes maximization of a weighted sum of Poisson loglikelihood functions, which is strictly concave and has closed-form solutions for $(\lambda_1, \ldots, \lambda_m)$. Similar Poisson variables were recently used by Wang et al. (2015) in spline-based estimation of the proportional hazards model with time-independent covariates.
Define $R_i^* = R_i$ if $R_i < \infty$ and $R_i^* = L_i$ otherwise, and let $\Delta_i = I(R_i < \infty)$. Suppose that the observed data consist of $\{Z_i,\, (W_{ik} : t_k \le L_i),\, I(\sum_{L_i < t_k \le R_i} W_{ik} > 0) : i = 1, \ldots, n\}$; that is, $W_{ik}$ is known to be zero for $t_k \le L_i$, and at least one $W_{ik}$ is known to be positive for $L_i < t_k \le R_i$ when $\Delta_i = 1$. Then the likelihood takes the form
$$\prod_{i=1}^n \int_0^\infty \left[ \prod_{t_k \le L_i} \exp\left\{ -\xi \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} \right] \left[ 1 - \exp\left\{ -\xi \sum_{L_i < t_k \le R_i} \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right\} \right]^{\Delta_i} f(\xi)\, d\xi, \qquad (4)$$
which is the same as (3). Thus, maximization of (3) is equivalent to maximum likelihood estimation based on the data described above.
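The equivalence between the interval-censoring likelihood and its Poisson formulation can be checked numerically. The sketch below is ours, not the authors' code; for clarity it takes the proportional hazards case, i.e., the frailty degenerate at 1, and a single subject with hypothetical jump sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = rng.uniform(0.05, 0.3, size=8)               # jump sizes at t_1 < ... < t_8
w = np.exp(0.7)                                    # exp(beta'Z) for one subject
in_L = np.arange(8) < 3                            # support points with t_k <= L
in_LR = (np.arange(8) >= 3) & (np.arange(8) < 6)   # support points with L < t_k <= R

S_L = np.sum(lam[in_L])
S_R = S_L + np.sum(lam[in_LR])

# Interval-censoring likelihood contribution: pr(L < T <= R | Z)
direct = np.exp(-w * S_L) - np.exp(-w * S_R)

# Poisson formulation: independent W_k ~ Poisson(lam_k * w); those with
# t_k <= L are all zero, and at least one with L < t_k <= R is positive
poisson = np.exp(-w * S_L) * (1.0 - np.exp(-w * (S_R - S_L)))
```

The two expressions agree up to floating-point rounding, which is the single-subject version of the identity between (3) and (4).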
We maximize (4) through an EM algorithm by treating $\xi_i$ and $W_{ik}$ as missing data. The complete-data loglikelihood is
$$\sum_{i=1}^n \left[ \log f(\xi_i) + \sum_{t_k \le R_i^*} \left\{ W_{ik} \log\left( \xi_i \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} \right) - \xi_i \lambda_k e^{\beta^{\mathrm{T}} Z_i(t_k)} - \log(W_{ik}!) \right\} \right]. \qquad (5)$$
In the M-step, we calculate
$$\lambda_k = \frac{ \sum_{i=1}^n I(t_k \le R_i^*)\, \widehat{E}(W_{ik}) }{ \sum_{i=1}^n I(t_k \le R_i^*)\, \widehat{E}(\xi_i)\, e^{\beta^{\mathrm{T}} Z_i(t_k)} }, \qquad (6)$$
where $\widehat{E}(\cdot)$ denotes the posterior mean given the observed data. After incorporating (6) into the conditional expectation of (5), we update $\beta$ by solving the following equation using the one-step Newton–Raphson method:
$$\sum_{i=1}^n \sum_{t_k \le R_i^*} \widehat{E}(W_{ik}) \left\{ Z_i(t_k) - \frac{ \sum_{j=1}^n I(t_k \le R_j^*)\, \widehat{E}(\xi_j)\, e^{\beta^{\mathrm{T}} Z_j(t_k)} Z_j(t_k) }{ \sum_{j=1}^n I(t_k \le R_j^*)\, \widehat{E}(\xi_j)\, e^{\beta^{\mathrm{T}} Z_j(t_k)} } \right\} = 0.$$
In the E-step, we evaluate the posterior means and . The posterior density function of given the observed data is proportional to , where and . Hence, we evaluate the posterior means by noting that for ,
and for with ,
which can be calculated using Gauss–Laguerre quadrature. In addition,
where for any function .
We iterate between the E- and M-steps until the sum of the absolute differences of the estimates at two successive iterations is less than, say, . This EM algorithm has several desirable features. First, the conditional expectations in the E-step involve at most one-dimensional integration, so they can be evaluated accurately by Gaussian quadrature. Second, in the M-step, the high-dimensional parameters are calculated explicitly, while the low-dimensional parameter vector is updated by the Newton–Raphson method. In this way, the algorithm avoids the inversion of any high-dimensional matrices. Finally, the observed-data likelihood is guaranteed to increase after each iteration. To avoid local maxima, we suggest using a range of initial values for while setting to . We denote the final results by .
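To make the iteration concrete, here is a minimal sketch of the EM algorithm for the proportional hazards special case, $G(x) = x$ with the frailty fixed at 1, and time-independent covariates. It is our own illustration under these simplifying assumptions, not the authors' software, and all names are hypothetical.

```python
import numpy as np

def em_interval_ph(L, R, Z, n_iter=300):
    """EM sketch for the proportional hazards special case (G(x) = x) with
    time-independent covariates, using the Poisson data augmentation.
    L[i] = 0 encodes left censoring and R[i] = np.inf right censoring.
    Returns (beta, support points t, jump sizes lam)."""
    L, R, Z = np.asarray(L, float), np.asarray(R, float), np.asarray(Z, float)
    n, d = Z.shape
    finite = np.isfinite(R)
    t = np.unique(np.concatenate([L[L > 0], R[finite]]))   # support of Lambda
    R_star = np.where(finite, R, L)                        # R_i* as in the text
    D = t[None, :] <= R_star[:, None]                      # W_ik in the likelihood
    window = D & (t[None, :] > L[:, None])                 # L_i < t_k <= R_i
    beta, lam = np.zeros(d), np.full(t.size, 1.0 / t.size)
    for _ in range(n_iter):
        w = np.exp(Z @ beta)
        # E-step: posterior means of the latent Poisson variables W_ik;
        # those with t_k <= L_i are observed to be zero
        S = (window * lam).sum(axis=1) * w
        frac = np.zeros(n)
        pos = S > 0
        frac[pos] = w[pos] / -np.expm1(-S[pos])            # w_i / {1 - exp(-S_i)}
        EW = window * lam * frac[:, None]
        # M-step: closed-form jump sizes, then one Newton step for beta
        lam = EW.sum(axis=0) / (D * w[:, None]).sum(axis=0)
        a = (D * lam).sum(axis=1) * w                      # expected event counts
        score = Z.T @ (EW.sum(axis=1) - a)
        hess = (Z * a[:, None]).T @ Z
        beta = beta + np.linalg.solve(hess, score)
    return beta, t, lam
```

Each iteration updates the jump sizes in closed form and takes a single Newton step for the regression parameters, mirroring the M-step described above; no high-dimensional matrix is ever inverted.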
2.3. Variance estimation
We use profile likelihood (Murphy & van der Vaart, 2000) to estimate the covariance matrix of $\hat\beta$. Specifically, we define the profile loglikelihood
$$pl_n(\beta) = \max_{\Lambda \in \mathcal{A}} \log L_n(\beta, \Lambda),$$
where $\mathcal{A}$ is the set of step functions with nonnegative jumps at $t_1, \ldots, t_m$. Then the covariance matrix of $\hat\beta$ is estimated by the negative inverse of the matrix whose $(s, t)$th element is
$$\frac{ pl_n(\hat\beta + h_n e_s + h_n e_t) - pl_n(\hat\beta + h_n e_s) - pl_n(\hat\beta + h_n e_t) + pl_n(\hat\beta) }{ h_n^2 },$$
where $e_s$ is the $s$th canonical vector in $\mathbb{R}^d$ and $h_n$ is a constant of order $n^{-1/2}$. To calculate $pl_n(\beta)$ for each fixed $\beta$, we reuse the proposed EM algorithm with $\beta$ held fixed. Thus, the only step in the EM algorithm is to explicitly evaluate $\widehat{E}(W_{ik})$ and $\widehat{E}(\xi_i)$ so as to update $(\lambda_1, \ldots, \lambda_m)$ using (6). The iteration converges quickly with $\hat\Lambda$ as the initial value.
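The variance recipe amounts to a second-difference approximation of the profile-loglikelihood curvature. A generic sketch of our own follows, where `pl` stands for any routine returning the profile loglikelihood, for instance the EM algorithm run with the regression parameters held fixed.

```python
import numpy as np

def profile_covariance(pl, beta_hat, h):
    """Estimate cov(beta_hat) as the negative inverse of the second-difference
    approximation to the Hessian of the profile loglikelihood pl, using a step
    h of order n^{-1/2}. pl(beta) must return the loglikelihood maximized over
    the nuisance parameter with beta held fixed."""
    d = beta_hat.size
    e = np.eye(d) * h
    H = np.empty((d, d))
    p0 = pl(beta_hat)
    for s in range(d):
        for t in range(d):
            H[s, t] = (pl(beta_hat + e[s] + e[t]) - pl(beta_hat + e[s])
                       - pl(beta_hat + e[t]) + p0) / h ** 2
    return -np.linalg.inv(H)
```

On an exactly quadratic profile loglikelihood the second difference is exact for any step size, which makes the routine easy to unit-test.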
3. Asymptotic theory
We establish the asymptotic properties of under the following regularity conditions.
Condition 1. —
The true value of , denoted by , lies in the interior of a known compact set in , and the true value of , denoted by , is continuously differentiable with positive derivatives in , where is the union of the supports of .
Condition 2. —
The vector is uniformly bounded with uniformly bounded total variation over , and its left limit exists for any . In addition, for any continuously differentiable function , the expectations are continuously differentiable in , where and are increasing functions in the decomposition .
Condition 3. —
If for all with probability 1, then for and .
Condition 4. —
The number of monitoring times, , is positive, and . The conditional probability is greater than some positive constant . In addition, for some positive constant . Finally, the conditional densities of given and , denoted by , have continuous second-order partial derivatives with respect to and when and are continuously differentiable with respect to .
Condition 5. —
The transformation function is twice continuously differentiable on with , and .
Remark 2. —
Condition 1 is standard in survival analysis. Condition 2 allows to have discontinuous trajectories, but the expectation of any smooth functional of must be differentiable. One example would be that is a stochastic process with a finite number of piecewise-smooth trajectories, where the discontinuity points have a continuous joint distribution. This condition excludes taking Brownian motion as a process for . Condition 3 holds if the matrix is nonsingular for some . Condition 4 pertains to the joint distribution of monitoring times. First, it requires that the monitoring occur anywhere in and that the largest monitoring time be equal to with positive probability. The latter assumption may be removed, at the expense of more complicated proofs. Condition 4 also requires that two adjacent monitoring times be separated by at least ; otherwise, the data may contain exact observations, which would entail a different theoretical treatment. The smoothness condition for the joint density of monitoring times is used to prove the Donsker property of some function classes and the smoothness of the least favourable direction. Finally, Condition 5 pertains to the transformation function and holds for both the logarithmic family and the Box–Cox family .
The following theorem establishes the strong consistency of .
Theorem 1. —
Under Conditions 1–5, $\|\hat\beta - \beta_0\| + \sup_{t \in [0, \tau]} |\hat\Lambda(t) - \Lambda_0(t)| \to 0$ almost surely as $n \to \infty$, where $\|\cdot\|$ is the Euclidean norm.
It is implicitly assumed in Theorem 1 that is restricted to , although in practice is allowed to be very large. The proof of Theorem 1 is based on the Kullback–Leibler information and makes use of the strong consistency of empirical processes. Careful arguments are needed to establish a preliminary bound for and to handle time-dependent covariates. Our next theorem establishes the asymptotic normality and semiparametric efficiency of .
Theorem 2. —
Under Conditions 1–5, $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution as $n \to \infty$ to a zero-mean normal random vector whose covariance matrix attains the semiparametric efficiency bound.
The proof of Theorem 2 relies on the derivation of the least favourable submodel and utilizes modern empirical process theory. A key step is to show that $\hat\Lambda$ converges to $\Lambda_0$ at the $n^{1/3}$ rate. Although the general procedure is similar to that of Huang & Wellner (1997), a major innovation is the derivation of the least favourable submodel for general interval censoring and time-dependent covariates, which requires careful handling of the trajectories of $Z(\cdot)$ and the joint distribution of $(K, U)$. The existence of the least favourable submodel is also used at the end of the Appendix to show consistency of the profile-likelihood covariance estimator given in §2.3.
4. Simulation studies
We conducted simulation studies to assess the operating characteristics of the proposed numerical and inferential procedures. In the first study, we considered two time-independent covariates, and . In the second study, we allowed to vary over time by imitating two-stage randomization: , where and are independent and with . In both studies, we generated the failure times from the transformation model
where . We set , and . To create interval censoring, we randomly generated two monitoring times, and , so that the time axis was partitioned into three intervals, , and . On average, there were 25–35% left-censored observations and 50–60% right-censored ones. We set $n = 200$, 400 or 800 and used 10 000 replicates for each sample size.
For each dataset, we applied the proposed EM algorithm by setting the initial value of to 0 and the initial value of to , and we set the convergence threshold to . We also tried other initial values for , but they all led to the same estimates. For the variance estimation, we set , but the results differed only in the third decimal place if we used or . There was no nonconvergence in any of the EM iterations.
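The data-generating scheme can be sketched as follows for the logarithmic transformation with $r = 1$ (proportional odds), using the gamma-frailty representation of §2.1. The specific baseline $\Lambda(t) = t$ and the monitoring-time distributions below are our own illustrative choices, not necessarily those of the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 100_000, 1.0                     # r = 1 gives the proportional odds model
beta = np.array([0.5, -0.5])
Z = np.column_stack([rng.standard_normal(n), rng.binomial(1, 0.5, n).astype(float)])

# T from model (1) with Lambda(t) = t and G(x) = log(1 + r x)/r: given a gamma
# frailty xi (mean 1, variance r), T is exponential with rate xi * exp(beta'Z)
xi = rng.gamma(shape=1.0 / r, scale=r, size=n)
T = rng.exponential(1.0, size=n) / (xi * np.exp(Z @ beta))

# Two monitoring times partition the axis into (0, U1], (U1, U2] and (U2, inf);
# record the interval (L, R] that brackets T
U1 = rng.uniform(0.1, 1.0, n)
U2 = U1 + rng.uniform(0.1, 1.0, n)
L = np.where(T <= U1, 0.0, np.where(T <= U2, U1, U2))
R = np.where(T <= U1, U1, np.where(T <= U2, U2, np.inf))

# Sanity check of the generator: with Z fixed at zero, S(t) = 1/(1 + t)
xi0 = rng.gamma(1.0 / r, r, size=n)
T0 = rng.exponential(1.0, size=n) / xi0
```

The final two lines verify the proportional odds survival function at the baseline, since a unit-mean, unit-variance gamma frailty gives $P(T > t \mid Z = 0) = 1/(1 + t)$.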
Tables 1 and 2 summarize the results of the two simulation studies under $r = 0$, $0.5$ or 1. The parameter estimators have small bias, and the bias decreases rapidly as $n$ increases. The variance estimators accurately reflect the true variabilities, and the confidence intervals have proper coverage probabilities. As shown in Fig. 1, the estimated cumulative hazard functions have negligible bias.
Table 1. Simulation results for the first study, with time-independent covariates; results are grouped by sample size $n = 200$, 400 and 800

r | Parameter | Est | SE | SEE | CP | Est | SE | SEE | CP | Est | SE | SEE | CP
0 | β1 | 0.515 | 0.209 | 0.216 | 96 | 0.506 | 0.148 | 0.149 | 95 | 0.503 | 0.103 | 0.104 | 95
 | β2 | −0.515 | 0.366 | 0.354 | 94 | −0.505 | 0.254 | 0.248 | 95 | −0.504 | 0.176 | 0.174 | 95
0.5 | β1 | 0.514 | 0.255 | 0.259 | 96 | 0.507 | 0.180 | 0.176 | 94 | 0.503 | 0.125 | 0.125 | 95
 | β2 | −0.516 | 0.451 | 0.434 | 94 | −0.505 | 0.311 | 0.303 | 94 | −0.503 | 0.215 | 0.212 | 94
1 | β1 | 0.516 | 0.294 | 0.297 | 95 | 0.506 | 0.209 | 0.207 | 95 | 0.504 | 0.145 | 0.144 | 95
 | β2 | −0.517 | 0.522 | 0.503 | 94 | −0.505 | 0.358 | 0.350 | 95 | −0.502 | 0.249 | 0.244 | 94
Est, empirical average of the parameter estimator; SE, standard error of the parameter estimator; SEE, empirical average of the standard error estimator; CP, empirical coverage percentage of the 95% confidence interval.
Table 2. Simulation results for the second study, with a time-dependent covariate; results are grouped by sample size $n = 200$, 400 and 800

r | Parameter | Est | SE | SEE | CP | Est | SE | SEE | CP | Est | SE | SEE | CP
0 | β1 | 0.529 | 0.241 | 0.239 | 95 | 0.518 | 0.166 | 0.164 | 95 | 0.509 | 0.114 | 0.114 | 95
 | β2 | −0.515 | 0.363 | 0.353 | 95 | −0.511 | 0.253 | 0.247 | 94 | −0.503 | 0.175 | 0.173 | 95
0.5 | β1 | 0.533 | 0.292 | 0.280 | 94 | 0.522 | 0.198 | 0.193 | 94 | 0.511 | 0.138 | 0.134 | 94
 | β2 | −0.514 | 0.441 | 0.433 | 95 | −0.512 | 0.307 | 0.302 | 95 | −0.503 | 0.214 | 0.211 | 95
1 | β1 | 0.537 | 0.336 | 0.317 | 94 | 0.525 | 0.228 | 0.219 | 94 | 0.514 | 0.157 | 0.152 | 94
 | β2 | −0.518 | 0.512 | 0.502 | 95 | −0.513 | 0.358 | 0.349 | 95 | −0.505 | 0.250 | 0.243 | 94
We conducted an additional simulation study with five covariates. We set the covariates to be zero-mean normal with unit variances and pairwise correlations of 0.5; the other simulation settings were left unchanged. The results are summarized in the Supplementary Material. The proposed methods performed well in this simulation too. Again, there were no cases of nonconvergence.
To evaluate the performance of the EM algorithm in even larger datasets, we set the covariates to ten standard normal random variables with pairwise correlations of 0.25 and the regression coefficients to 0.5. The algorithm converged to values close to 0.5 in all 10 000 replicates.
5. Application
The Bangkok Metropolitan Administration conducted a cohort study of 1209 injecting drug users who were initially sero-negative for the HIV-1 virus. Subjects from 15 drug treatment clinics were followed from 1995 to 1998. At study enrolment and approximately every four months thereafter, subjects were assessed for HIV-1 sero-positivity through blood tests. As of December 1998 there were 133 HIV-1 sero-conversions and roughly 2300 person-years of follow-up.
We aim to identify the factors that influence HIV-1 infection. We fit model (1) with the class of logarithmic transformations . The covariates include age at recruitment, gender, history of needle sharing, and drug injection in jail before recruitment; age is measured in years, gender takes value 1 for male and 0 for female, and history of needle sharing and drug injection are binary indicators of yes or no. In addition, we include a time-dependent covariate indicating imprisonment since the last clinic visit.
To select a transformation function, we vary $r$ from 0 to 1.5 in steps of 0.05. For each $r$, we estimate $\beta$ and $\Lambda$ by the EM algorithm and evaluate the loglikelihood at the parameter estimates. Figure 2(a) shows that the loglikelihood changes only very slowly as $r$ varies and is maximized at . We choose $r = 1$, which corresponds to the proportional odds model. Table 3 shows the results under this model. For comparison, we also include the results for $r = 0$, which corresponds to the proportional hazards model.
Table 3. Estimates of covariate effects on HIV-1 infection under the proportional hazards and proportional odds models

Covariates | Estimate | Standard error | p-value | Estimate | Standard error | p-value
 | Proportional hazards | | | Proportional odds | |
Age | −0.028 | 0.012 | 0.021 | −0.031 | 0.013 | 0.016
Gender | 0.424 | 0.270 | 0.117 | 0.539 | 0.310 | 0.082
Needle sharing | 0.237 | 0.183 | 0.196 | 0.251 | 0.196 | 0.200
Drug injection | 0.313 | 0.184 | 0.089 | 0.360 | 0.198 | 0.069
Imprisonment over time | 0.502 | 0.211 | 0.017 | 0.494 | 0.219 | 0.024
Under either model, older age is associated with a reduced risk of HIV-1 infection, whereas being male is associated with an increased risk. In addition, drug injection in jail before recruitment increases the risk. Finally, subjects who have recently been imprisoned have an elevated risk of HIV-1 infection.
Figure 2(b) shows the prediction of HIV-1 infection for a low-risk subject versus a high-risk subject. The low-risk subject is a 50-year-old female with no history of needle sharing, no drug injection in jail before recruitment, and no imprisonment during follow-up; the high-risk subject is a 20-year-old male with a history of needle sharing, drug injection in jail before recruitment, and imprisonment over time. The estimated probabilities of infection for the low-risk subject are similar under the proportional odds and proportional hazards models. For the high-risk subject, however, the proportional odds model yields slightly higher risks of infection than the proportional hazards model during the first part of the follow-up period, with the opposite being true during the later part of the follow-up period.
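Predictions of the kind shown in Fig. 2(b) follow from $P(T \le t \mid Z = z) = 1 - \exp[-G\{\Lambda(t) e^{\beta^{\mathrm{T}} z}\}]$. The sketch below is illustrative only: the baseline function is a hypothetical placeholder, the coefficients are the Table 3 point estimates under the proportional odds model, and the time-dependent imprisonment covariate is treated as fixed, so this does not reproduce the paper's figure.

```python
import numpy as np

def infection_prob(t, z, beta, Lambda, r=1.0):
    """P(T <= t | Z = z) under model (1) with the logarithmic transformation
    G(x) = log(1 + r x)/r; r = 1 is the proportional odds model chosen for
    these data. Lambda is a callable cumulative baseline function."""
    x = Lambda(t) * np.exp(np.dot(beta, z))
    G = x if r == 0 else np.log1p(r * x) / r
    return -np.expm1(-G)

# Hypothetical inputs for illustration: a linear baseline (t in years) and the
# proportional odds point estimates from Table 3
beta_hat = np.array([-0.031, 0.539, 0.251, 0.360, 0.494])
Lam = lambda t: 0.05 * t
z_low = np.array([50.0, 0, 0, 0, 0])    # 50-year-old female, no risk factors
z_high = np.array([20.0, 1, 1, 1, 1])   # 20-year-old male, all risk factors
p_low, p_high = (infection_prob(3.0, z, beta_hat, Lam) for z in (z_low, z_high))
```

With these placeholder inputs the high-risk profile has a substantially larger three-year infection probability than the low-risk profile, in line with the qualitative comparison in the text.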
6. Remarks
The presence of time-dependent covariates poses major computational and theoretical challenges. With time-dependent covariates, the parameters $\beta$ and $(\lambda_1, \ldots, \lambda_m)$ in the likelihood function are entangled. As a result, the diagonal approximation to the Hessian matrix in the iterative convex minorant algorithm (Huang & Wellner, 1997) is inaccurate, and the algorithm becomes unstable. By contrast, each iteration of our EM algorithm solves only a low-dimensional equation for $\beta$ while calculating the jump sizes of $\Lambda$ explicitly as weighted sums of Poisson rates. Thus, our algorithm is fast and stable. In extensive numerical studies we have never encountered nonconvergence. Our software is available at http://dlin.web.unc.edu/software.
Our theoretical development requires that the population average of the covariate process be smooth but allows individual covariate trajectories to be discontinuous. We treat as a bundled process of and when proving the identifiability of in Theorem 1, the convergence rate of in Lemma A1, and the invertibility of the information operator in Theorem 2. The Donsker property for this class of processes indexed by is used repeatedly in the proofs. Besides time-dependent covariates, one major theoretical challenge is dealing with general interval censoring, which allows each subject to have a different number of monitoring times. In particular, the derivation of the least favourable direction for requires careful consideration of the joint distribution for an arbitrary sequence of monitoring times, and the Lax–Milgram theorem is used to prove the existence of a least favourable direction. That theorem greatly simplifies the proof, in contrast to the approach of Huang & Wellner (1997), even for case-2 data.
To apply the transformation model to real data, one must choose a transformation function. In the analysis of the Bangkok Metropolitan Administration HIV-1 data, we used the AIC to select the transformation function, although the likelihood surface is fairly flat. It would be worthwhile to develop formal diagnostic procedures to check the appropriateness of the transformation function and other model assumptions. One possible strategy is to examine the behaviour of the posterior mean of the martingale residuals (Chen et al., 2012) given the observed intervals.
In many applications, the event of interest may occur repeatedly over time. Recurrent events under interval censoring are called panel count data, which have been studied by Sun & Wei (2000), Zhang (2002) and Wellner & Zhang (2007), among others. There are also studies in which each subject can experience different types of events or where subjects are sampled in clusters such that the failure times with the same cluster are correlated. We are currently developing regression methods to handle such multivariate failure time data.
We are also extending our work to competing risks data. Indeed, the Bangkok Metropolitan Administration HIV-1 study contains information on HIV-1 infection by viral subtypes B and E, which are two competing risks. We propose to formulate the effects of potentially time-dependent covariates on the cumulative incidence functions of competing risks in the form of model (1). We will modify the EM algorithm to deal with multiple subdistribution functions and establish the asymptotic theory under suitable conditions.
Supplementary material
Acknowledgments
This research was supported by the U.S. National Institutes of Health. The authors thank the editor, an associate editor and a referee for helpful comments.
Appendix: Technical details
We use to denote the empirical measure from independent observations and to denote the true probability measure. The corresponding empirical process is . Let be the observed-data loglikelihood for a single subject, that is,
Proof of Theorem 1. —
We first show that with probability 1. By Condition 4, the measure generated by the function is dominated by the sum of the Lebesgue measure in and the counting measure at , and its Radon–Nikodym derivative, denoted by , is bounded away from zero. We define
Clearly, is a step function with jumps only at . Since
uniformly in with probability 1 as , we conclude that converges uniformly to with probability 1 for .
By the definition of , we have . Because of its bounded total variation, belongs to a Donsker class indexed by . Hence, the class of functions
where denotes functions which have total variation in bounded by a given constant , is a convex hull of functions , so it is a Donsker class. Furthermore,
is bounded away from zero. Therefore, belongs to some Donsker class due to the preservation property of the Donsker class under Lipschitz-continuous transformations. We conclude that almost surely. In addition, by the construction of , converges almost surely to , which is finite. Therefore, with probability 1,
(A1) Let be such that for . Then the left-hand side of (A1) is less than or equal to
Hence . Since as , , which is positive, Condition 5 implies that with probability 1.
We can now restrict to a class of functions with uniformly bounded total variation, equipped with the weak topology on . By Helly's selection lemma, for any subsequence of we can choose a further subsequence such that converges weakly to some on , converges to , and converges to . Clearly, implies that , where , so that
By the above arguments for proving the Donsker property of , together with the fact that the total variation of is bounded by a constant, we can show that belongs to a Donsker class with a bounded envelope function, so that . It follows that Furthermore,
where is the measure corresponding to . According to Conditions 2 and 4, is dominated by the Lebesgue measure with bounded derivative in and has a point mass at . Hence almost surely. We therefore conclude that
By the properties of the Kullback–Leibler information, with probability 1. In particular, for any , we choose to obtain
Thus, for any ,
(A2) Differentiating both sides with respect to , we have
By Condition 3, and for . We let by redefining to centre at a deterministic function in the support of , and we set in (A2) to obtain . Hence, for . It follows that and converges weakly to almost surely. The latter convergence can be strengthened to uniform convergence since is continuous. Thus, we have proved Theorem 1. □
Since we have established consistency, we may restrict the space of to
for some . Thus, when is large enough, belongs to with probability 1. Before proving Theorem 2, we need to establish the convergence rate for . Specifically, the following lemma holds.
Lemma A1. —
Under Conditions 1–5,
Proof. —
The proof relies on the convergence-rate result in Theorem 3.4.1 of van der Vaart & Wellner (1996). To use that theorem, we define and let
be a class of functions indexed by and .
We first calculate the -bracketing number of . Because consists of increasing and uniformly bounded functions on , Lemma 2.2 of van de Geer (2000) implies that for any , the bracketing number satisfies
where denotes the -norm with respect to the Lebesgue measure on , and means that for a positive constant . For , we can find number of brackets with and to cover . In addition, there are number of brackets covering , such that any two within the same bracket differ by at most . Hence, there are a total of brackets covering . For any pair of and , there exist some constants and such that
Because the measures and have bounded derivatives with respect to the Lebesgue measure in and the former has a finite point mass at ,
Thus, the bracketing number for satisfies
and so it has a finite entropy integral. Define
It is easy to show that . In addition, by Lemma 1.3 of van de Geer (2000),
where is the Hellinger distance, defined as
with respect to the dominating measure .
The above results, together with the fact that maximizes and the consistency result in Theorem 1, imply that all the conditions in Theorem 3.4.1 of van der Vaart & Wellner (1996) hold. Thus, we conclude that , where satisfies . In particular, we can choose in the order of such that . By the mean value theorem,
On the left-hand side of the above equation, we consider the event to find that
Next, we consider to obtain the same equation as the one above but with replaced by . By repeating this process, we conclude that, conditional on and for any ,
It then follows from the mean value theorem that
and so the lemma is proved. □
Now we are ready to prove Theorem 2.
Proof of Theorem 2. —
It is helpful to introduce the following notation: ,
and
The score function for is
To obtain the score operator for , we consider any parametric submodel of defined by , where . The score function along this submodel is
Clearly, and . Hence
We apply Taylor series expansions about to the right-hand sides of the above two equations. By Lemma A1, the second-order terms are bounded by
Hence
(A3)
(A4) where is the second derivative of with respect to , is the derivative of along the submodel , is the derivative of with respect to , and is the derivative of along the submodel . All the derivatives on the right-hand sides of (A3) and (A4) are evaluated at .
We choose to be the least favourable direction , a -vector with components in that solves the normal equation
(A5) where is the adjoint operator of . Then
so that the difference between (A3) and (A4) yields
(A6) Consequently, Theorem 2 will be established if we can show that:
equation (A5) has a solution ;
belongs to a Donsker class and converges in the -norm to ;
the matrix is invertible.
The reason is that when (i)–(iii) hold, (A6) entails and further yields
This implies that the influence function for is exactly the efficient influence function, so that converges to a zero-mean normal random vector whose covariance matrix attains the semiparametric efficiency bound (Bickel et al., 1993, p. 65).
We first verify (i). For any ,
where
Likewise,
Therefore, by the definition of the dual operator, solving the normal equation (A5) is equivalent to solving the integral equation
(A7) We define the left-hand side of (A7) as a linear operator which maps to itself. In addition, we equip with an inner product so that it becomes a Hilbert space. On the same space, we define . It is easy to show that is a seminorm for . Furthermore, if , then . Thus, with probability 1, is zero, i.e., for any ,
Setting we obtain
Thus, for , implying that is a norm in . Clearly, for some constant . According to the bounded inverse theorem in Banach spaces, we have for another constant ; that is, we have . By the Lax–Milgram theorem (Zeidler, 1995), the solution to (A7), namely exists. So we have verified (i).
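For completeness, the form of the Lax–Milgram theorem used here (Zeidler, 1995) can be stated generically as follows: if $a(\cdot,\cdot)$ is a bilinear form on a Hilbert space $H$ satisfying, for constants $C \ge c > 0$,

```latex
|a(u,v)| \le C \, \|u\| \, \|v\| \quad\text{(boundedness)},
\qquad
a(u,u) \ge c \, \|u\|^2 \quad\text{(coercivity)},
```

then for every bounded linear functional $f$ on $H$ there exists a unique $u^* \in H$ with $a(u^*, v) = f(v)$ for all $v \in H$. The coercivity constant is supplied here by the norm equivalence just established.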
To verify (ii), we examine by considering and . Along the lines of Huang & Wellner (1997), we differentiate the integral equation (A7) with respect to to obtain
where and are continuously differentiable with respect to their arguments. Hence is continuously differentiable in . This fact implies that belongs to some Donsker class. It then follows from the Donsker property of the class that (ii) is true.
Finally, we verify (iii). If the matrix is singular, then there exists a nonzero vector such that
It follows that, with probability 1, the score function along the submodel is zero; that is, for any ,
where . We consider for to obtain
Therefore, with probability 1, for any . This implies that , so by Condition 3. Thus, we have verified (iii). □
Remark A1. —
For a given , we define as the step function that maximizes . The arguments in the proof of Theorem 1 can be used to show that is bounded asymptotically when is in a small neighbourhood of , so converges to as converges to . In addition, the arguments in the proof of Lemma A1 yield in the space. Finally, in light of the existence of in the proof of Theorem 2, we can define to obtain the least favourable submodel as , where is in a neighbourhood of . Thus, we can easily verify conditions (8), (9) and (10) of Murphy & van der Vaart (2000) for the likelihood function along this submodel. Along with the Donsker property of the functional classes for the first and second derivatives of with respect to and , we conclude that Theorem 1 of Murphy & van der Vaart (2000) is applicable, and hence the covariance matrix estimator given in §2.3 with is consistent for the limiting covariance matrix of .
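The covariance matrix estimator referenced at the end of the remark can be computed by second-differencing the profile log-likelihood, in the spirit of Murphy & van der Vaart (2000). The sketch below is a hypothetical illustration, not the authors' code: the function name and the toy quadratic objective are our own. For a quadratic objective the second-difference scheme recovers the curvature matrix up to rounding error:

```python
import numpy as np

def second_difference_info(pl, theta, eps=1e-4):
    # Approximate the negative Hessian of a profile log-likelihood pl
    # at theta via second differences: the (i, j) entry is
    #   -[pl(t+e_i+e_j) - pl(t+e_i) - pl(t+e_j) + pl(t)] / eps^2.
    p = len(theta)
    H = np.empty((p, p))
    for i in range(p):
        ei = np.zeros(p)
        ei[i] = eps
        for j in range(p):
            ej = np.zeros(p)
            ej[j] = eps
            H[i, j] = -(pl(theta + ei + ej) - pl(theta + ei)
                        - pl(theta + ej) + pl(theta)) / eps ** 2
    return H

# Toy check: for pl(b) = -b' A b / 2 the scheme returns A (up to rounding).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
pl = lambda b: -0.5 * b @ A @ b
I_hat = second_difference_info(pl, np.zeros(2))
```

In practice pl would be the profiled log-likelihood of §2.3, maximized over the nuisance parameter at each perturbed value of the regression parameter, and the inverse of the resulting matrix estimates the covariance matrix.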
References
- Bickel P. J., Klaassen C. A. J., Ritov Y. & Wellner J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
- Chen K., Jin Z. & Ying Z. (2002). Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–68.
- Chen L., Lin D. Y. & Zeng D. (2012). Checking semiparametric transformation models with censored data. Biostatistics 13, 18–31.
- Gu M. G., Sun L. & Zuo G. (2005). A baseline-free procedure for transformation models under interval censorship. Lifetime Data Anal. 11, 473–88.
- Huang J. (1995). Maximum likelihood estimation for proportional odds regression model with current status data. In Analysis of Censored Data, vol. 27 of IMS Lecture Notes—Monograph Series, H. L. Koul and J. V. Deshpande, eds. Hayward: Institute of Mathematical Statistics, pp. 129–46.
- Huang J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 24, 540–68.
- Huang J. & Rossini A. J. (1997). Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J. Am. Statist. Assoc. 92, 960–7.
- Huang J. & Wellner J. A. (1997). Interval censored survival data: A review of recent progress. In Proc. 1st Seattle Symp. Biostatist.: Survival Anal., D. Y. Lin and T. R. Fleming, eds. New York: Springer, pp. 123–69.
- Murphy S. A. & van der Vaart A. W. (2000). On profile likelihood. J. Am. Statist. Assoc. 95, 449–65.
- Rabinowitz D., Betensky R. A. & Tsiatis A. A. (2000). Using conditional logistic regression to fit proportional odds models to interval censored data. Biometrics 56, 511–8.
- Rossini A. J. & Tsiatis A. A. (1996). A semiparametric proportional odds regression model for the analysis of current status data. J. Am. Statist. Assoc. 91, 713–21.
- Schick A. & Yu Q. (2000). Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 27, 45–55.
- Shen X. (1998). Proportional odds regression and sieve maximum likelihood estimation. Biometrika 85, 165–77.
- Sun J. & Sun L. (2005). Semiparametric linear transformation models for current status data. Can. J. Statist. 33, 85–96.
- Sun J. & Wei L. J. (2000). Regression analysis of panel count data with covariate-dependent observation and censoring times. J. R. Statist. Soc. B 62, 293–302.
- van de Geer S. A. (2000). Empirical Process Theory and Applications. Cambridge: Cambridge University Press.
- van der Vaart A. W. & Wellner J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer.
- Wang L., McMahan C. S., Hudgens M. G. & Qureshi Z. P. (2015). A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72, 222–31.
- Wellner J. A. & Zhang Y. (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35, 2106–42.
- Zeidler E. (1995). Applied Functional Analysis: Applications to Mathematical Physics. New York: Springer.
- Zeng D. & Lin D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–40.
- Zhang Y. (2002). A semiparametric pseudolikelihood estimation method for panel count data. Biometrika 89, 39–48.
- Zhang Z. & Zhao Y. (2013). Empirical likelihood for linear transformation models with interval-censored failure time data. J. Mult. Anal. 116, 398–409.
- Zhang Z., Sun L., Zhao X. & Sun J. (2005). Regression analysis of interval-censored failure time data with linear transformation models. Can. J. Statist. 33, 61–70.