Abstract
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed “doubly-censored data”. This model extends current survival literature by broadening the application of frailty models from right-censoring to a more complicated situation with additional left censoring.
Our approach is motivated by a recent Hepatitis B study where the sample consists of families. We adopt a likelihood approach that aims at the nonparametric maximum likelihood estimators (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves over the existing algorithm for independent and doubly-censored data, a special case in which the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively.
Asymptotic properties of the NPMLE are established along with semi-parametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. We conduct simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data.
Keywords and phrases: Frailty model, semi-parametric efficiency, EM algorithm, Monte Carlo integrations
1. Introduction
In the past decades, Cox’s proportional hazards model (Cox, 1972), along with its generalizations, has been widely explored and the corresponding asymptotic theories have been well established for independently sampled subjects. When subjects are correlated, e.g. under a clustered sampling plan in familial-type studies, the approaches for independent samples are no longer suitable. A common approach to accommodate familial or, more generally, clustered data is the shared-frailty model, which assumes independence for subjects from different clusters but a shared-frailty variable for subjects in the same cluster. Such a frailty model is generally useful to explain the dependency of subjects within the same cluster due to shared genes and environmental background.
For the Cox proportional hazards model with a shared frailty, this leads to a proportional hazards model with a multiplicative frailty term w, which is random and unobservable for subjects within the same cluster, and which explains the dependency among subjects. This class of models was first introduced and termed “frailty” by Vaupel, Manton and Stallard (1979), and subsequently studied for right-censored data by Nielsen et al. (1992), Murphy (1994, 1995), and Parner (1998) among others. Due to the latent term in the model, the elegant partial likelihood approach (Cox, 1972, 1975) is no longer applicable. Two alternative approaches have been adopted in the literature: one reverts to the full likelihood approach, and the other treats the frailty variables as parameters in the estimation process but imposes a penalty in the partial likelihood to regularize the high-dimensional parameters induced by frailties (Therneau, Grambsch and Pankratz, 2003).
The full likelihood approach leads to nonparametric maximum likelihood estimators (NPMLE) when the baseline hazard function is modeled nonparametrically, and an expectation-maximization (EM) algorithm was proposed in Nielsen et al. (1992) for the case where the frailty follows a Gamma distribution. The corresponding asymptotic theories, including consistency and asymptotic normality of the NPMLE, were studied by Murphy (1994, 1995) and Parner (1998) for the cases without and with covariates, respectively. All these approaches adopt the Gamma frailty assumption, mainly due to its computational advantages, as the posterior distribution involved in the E-step of the EM algorithm is also a Gamma distribution. Other frailty distributions, such as the log-normal or Weibull distribution, could be employed at additional computational cost, since numerical integration methods, such as Monte Carlo (MC) integration, would be needed to estimate the posterior distributions at each step of the EM algorithm. Besides using a full likelihood approach, Ripatti and Palmgren (2000) investigated the penalized partial likelihood estimator with a log-normal frailty distribution and Therneau, Grambsch and Pankratz (2003) showed that, with a gamma frailty distribution and a special type of penalty, this leads to the same estimates for the regression parameters as those obtained from an EM algorithm.
All the aforementioned approaches are for right-censored data. Our focus in this paper is to study the estimating procedure and accompanying theory for the shared-frailty model when data are subject to both right and left censoring, i.e., double-censoring. An example is a familial-type study for Hepatitis B patients, whose age at e-antigen seroconversion is the primary focus of the study. However, due to delayed entry into the study, subjects who had e-seroconverted prior to entry into the study are left-censored (only their age at entry is available), while all other subjects are subject to the usual right censorship that is common in longitudinal follow-up studies. This leads to the double-censorship considered in this paper. We note here that the terminology “double-censoring” is ambiguous, as there are two different definitions in the literature. The first definition is that the survival time of interest can only be observed within a certain window determined by the left- and right-censoring times. Outside this window, the survival time is only known to be either less than the left-censoring time or greater than the right-censoring time. This is the situation considered in this paper, and it has also been considered by Turnbull (1974), Chang and Yang (1987), Chang (1990), Mykland and Ren (1996), Cai and Cheng (2004), Zhang and Jamshidian (2004) and Kim, Kim and Jang (2010). Another definition of double-censoring, as adopted in DeGruttola and Lagakos (1989), Kim, DeGruttola and Lagakos (1993) and Kim (2006), refers to a time period where both endpoints of the time period are subject to either left, right or interval censoring. This second type of double censorship is not considered here as our data conform to the first type of double-censorship.
Estimating the survival function when there is no covariate has been well investigated in the literature. For instance, Chang and Yang (1987) and Chang (1990) address the consistency and asymptotic normality of the self-consistent estimators of the survival function, while Mykland and Ren (1996) and Zhang and Jamshidian (2004) discuss algorithmic issues for self-consistent estimators and maximum likelihood estimators. The NPMLE under the Cox model had not been explored until Kim, Kim and Jang (2010) established its consistency and asymptotic normality. However, the numerical computation of NPMLE remains a challenge and, to the best of our knowledge, the shared-frailty model for the first type of double censoring has eluded the attention of researchers.
We investigate in this paper the nonparametric maximum likelihood approach and study the asymptotic theory for the NPMLEs. Additionally, a workable numerical algorithm to locate the NPMLEs is proposed along with sufficient conditions to ensure convergence of the algorithm. Our approach works with or without a frailty term and resolves the computational difficulties for doubly-censored data without a frailty term. We note that the proposed numerical method without frailty terms was developed independently of that in Kim, Kim and Jang (2013), though the idea of treating the left-censorship as missing data in the EM algorithm is similar to their work. This idea was first demonstrated in Su (2011), and is further studied in this paper. The model is introduced in Section 2, followed by a computational algorithm presented in Section 3. The left censorship present in the data poses computational challenges in contrast to the right-censoring situation due to the lack of a closed-form solution for the score equation during the M-step of the EM algorithm. We resolve this difficulty by introducing in Section 3 a modified MCEM algorithm which can be seen as a weighted version of the regular MCEM algorithm. Asymptotic properties and estimation of the standard errors of the proposed NPMLEs are discussed in Section 4. Simulation studies are presented in Section 5 to provide numerical support for the new algorithm, and an analysis of the motivating example is provided in Section 6.
2. Model and the NPMLE
Consider a cluster sampling plan for, e.g., familial data, in which n independent families are sampled and data are collected for each of the ni subjects in the ith cluster. The goal is to study the association between the response variable, the survival time T of a subject, and its vector of covariates Z. Because the survival times of subjects from the same family may be correlated due to shared genes or environmental background, we assume that a shared-frailty variable W, which could be a vector, explains the dependency of all subjects from the same family/cluster. More specifically, let Tij denote the survival time of the jth subject from the ith cluster with frailty variable Wi and zij be the observed value of its covariates.
The shared-frailty model assumes that, given the value wi of the frailty variable and the covariate value zij, the hazard function for this subject takes the form:
$$\lambda(t \mid w_i, z_{ij}) = w_i\,\lambda_0(t)\exp(\beta^\top z_{ij}), \tag{1}$$
where λ0 is the baseline hazard rate and β stands for the regression parameter. Besides the violation of the independence assumption, we face another complication for the motivating Hepatitis B data in that the survival time T of a subject is subject to either left- or right-censoring by L and R, respectively. In the following we denote by T̃ = max(L, min(T, R)) the observed event time, and by δ = I(T ≤ R) and η = I(T ≥ L) the right- and left-censoring indicators, respectively. Because each subject is subject to at most one type of censoring, δ + η = 1 or 2 always holds. It is worth pointing out that the left-censoring time (for example, the time at recruitment in the Hepatitis B study) is always observed. This differs from the case of right-censoring only. Consequently, the observed data for a subject are either (T̃, δ, η, Z, L) for an uncensored or right-censored individual, or (T̃, δ, η, Z) for a left-censored one, for whom T̃ is exactly equal to L. The following conditions are for the identifiability of this model and the construction of the likelihood function.
-
C1
The left- and right-censoring times for the jth subject of the ith cluster, denoted respectively as Lij and Rij, are continuously distributed on [0, ∞) with density functions fL and fR, respectively; where Lij is the age at the entry of the study, and the right censoring time Rij is written as Lij + Yij with Yij ≥ 0, since right censoring can only occur after a subject enters the study.
-
C2
Let Zi be the ni × q covariate matrix for the ith cluster, where the jth row, denoted by Zij, is the covariate vector of the jth subject in the ith cluster. The probability that Zi has full column rank is positive. Moreover, if $c^\top Z_{ij} = 0$ with positive probability, then c = 0. These conditions mean that the covariates are linearly independent within and between subjects.
-
C3
Conditional on Wi and Zij, (Lij, Yij) are independent of Tij and their joint distribution does not involve β or the frailty distribution. This implies that both the left and right censoring schemes are noninformative.
-
C4
The frailty variables, W1, …, Wn, are i.i.d. from a density fW (·| γ) with mean 1 and variance γ. The Laplace transform of fW, denoted by Mγ(t) = Eγ[exp(−Wt)], for any 0 < t < ∞, satisfies the following conditions: Mγ(0+) = 1, Mγ(t) > 0, , and for all γ in a compact set in ℝ. Moreover, the frailty W and the covariate Z are independent.
Conditions C1 and C3 are standard assumptions for survival data in the presence of left or right censoring to facilitate the expression of the likelihood function. Conditions C2 and C4 are needed for the identifiability of the frailty model and integrability of those integrals that involve the frailty distribution.
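As an illustration of the data structure implied by model (1) and conditions C1, C3 and C4, the following Python sketch simulates one cluster of doubly-censored observations. The constant baseline hazard, the binary covariate, the uniform entry age and the exponential follow-up length are assumptions made only for this example; they are not taken from the study.

```python
import math
import random


def simulate_cluster(n_i, beta, gamma, base_rate, rng):
    """Simulate one cluster of doubly-censored data under a gamma frailty.

    Returns a list of tuples (t_obs, delta, eta, z, left).
    """
    # Gamma frailty with shape 1/gamma and scale gamma: mean 1, variance gamma (C4).
    w = rng.gammavariate(1.0 / gamma, gamma)
    rows = []
    for _ in range(n_i):
        z = rng.choice([0.0, 1.0])                   # hypothetical binary covariate
        rate = w * base_rate * math.exp(beta * z)    # model (1) with constant lambda_0
        t = rng.expovariate(rate)                    # latent survival time T
        left = rng.uniform(0.0, 2.0)                 # entry age L (assumed distribution)
        right = left + rng.expovariate(0.5)          # R = L + Y with Y >= 0, as in C1
        t_obs = max(left, min(t, right))             # observed time
        delta = int(t <= right)                      # 0 only when right-censored
        eta = int(t >= left)                         # 0 only when left-censored
        rows.append((t_obs, delta, eta, z, left))
    return rows


if __name__ == "__main__":
    rng = random.Random(2015)
    for row in simulate_cluster(4, beta=0.5, gamma=0.5, base_rate=1.0, rng=rng):
        print(row)
```

In this sketch a left-censored subject has eta = 0 and an observed time equal to its entry age, while a right-censored subject has delta = 0, so delta + eta is always 1 or 2.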
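Let f, S, and F stand for the density, survival, and cumulative distribution functions, respectively, of the random variable in the subscript, and let Λ0 be the baseline cumulative hazard function of the survival time, i.e. $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. The likelihood contributed by a left-censored subject, given the frailty w and covariates z, is FT (t̃|w, z). The likelihood contributed by an uncensored or right-censored subject, given the frailty w and covariates z, is $[f_T(\tilde t \mid w, z)]^{\delta\eta}\,[S_T(\tilde t \mid w, z)]^{(1-\delta)\eta}$ (see Appendix 1 for the details). Therefore, the likelihood contributed by the observation Oi from the ith cluster can be expressed as
$$L_i(\theta) = \int \prod_{j=1}^{n_i} [f_T(\tilde t_{ij} \mid w, z_{ij})]^{\delta_{ij}\eta_{ij}}\,[S_T(\tilde t_{ij} \mid w, z_{ij})]^{(1-\delta_{ij})\eta_{ij}}\,[F_T(\tilde t_{ij} \mid w, z_{ij})]^{1-\eta_{ij}}\, f_W(w \mid \gamma)\,dw, \tag{2}$$
where, under model (1), $S_T(t \mid w, z) = \exp\{-w\Lambda_0(t)e^{\beta^\top z}\}$, $F_T = 1 - S_T$, and $f_T(t \mid w, z) = w\lambda_0(t)e^{\beta^\top z}\,S_T(t \mid w, z)$.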
In (2) we consider Λ0 instead of λ0 as the parameter, because Λ0 can be estimated at the same parametric rate as β and γ. As is common for models with a nonparametric parameter, the maximum likelihood estimator does not exist due to the infinite-dimensional parameter associated with Λ0. Therefore, we turn to the nonparametric maximum likelihood approach, which leads to a discrete probability measure with positive point mass assigned to all uncensored observations and an additional set of left-censored observations. This is similar to the left-censored case described in Mykland and Ren (1996), which is for a single population without covariates and the frailty term. The following lemma extends their result (Corollary 5) and provides a description of the NPMLE of Λ0 under double censoring. The proof is similar to theirs and thus omitted.
Lemma 2.1
Denote the ranked observation time points in ascending order by t̃(l), $l = 1, \ldots, \sum_{i=1}^{n} n_i$. The NPMLE of the cumulative baseline hazard function, Λ0(·), is a non-decreasing step function with jumps at all uncensored observations and an additional set of observations from left-censored subjects. The left-censored observations that receive positive mass consist of: the smallest observation at time t̃(1), if it is left-censored, and for l ≥ 2 all the left-censored observations at time t̃(l) such that the observation immediately preceding it at time t̃(l−1) is right-censored. We denote those time points with positive mass in ascending order by t1, …, tK with corresponding jump sizes λ1, …, λK.
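The selection rule in Lemma 2.1 can be sketched in a few lines of Python. This is a minimal illustration that assumes all observation times are distinct (ties are not handled); the function name `npmle_support` is hypothetical.

```python
def npmle_support(times, delta, eta):
    """Return, in ascending order, the time points that receive positive mass.

    A subject is uncensored when delta == 1 and eta == 1, right-censored when
    delta == 0, and left-censored when eta == 0. Distinct times are assumed.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    support = []
    prev_right_censored = False
    for rank, i in enumerate(order):
        uncensored = delta[i] == 1 and eta[i] == 1
        left_censored = eta[i] == 0
        if uncensored:
            support.append(times[i])
        elif left_censored and (rank == 0 or prev_right_censored):
            # Smallest observation, or immediately preceded by a right-censored one.
            support.append(times[i])
        prev_right_censored = delta[i] == 0
    return support


if __name__ == "__main__":
    # Statuses in time order: left, uncensored, right, left, left, uncensored.
    times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    delta = [1, 1, 0, 1, 1, 1]
    eta = [0, 1, 1, 0, 0, 1]
    print(npmle_support(times, delta, eta))  # [1.0, 2.0, 4.0, 6.0]
```

In the example, the left-censored time 5.0 receives no mass because the observation just before it is left-censored rather than right-censored.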
Direct maximization of the likelihood to obtain the NPMLE poses computational challenges due to the latent frailty term and the contribution to the likelihood by left-censored data. The latter, which involves the cumulative distribution function FT in (2), is a more serious problem, as even when the frailty variable is known to be 1, i.e., no clustering effect and all survival data are independent, the profile likelihood has no explicit form. Kim, Kim and Jang (2010) proposed to adopt a Gauss-Seidel algorithm to solve the high-dimensional equations, which works well for small sample sizes but often fails to converge when the sample size grows to several hundred, which is common for a medical or epidemiological study.
For computational stability and because none of the previous approaches accommodate a frailty variable, we took a different and more appealing approach in this paper to treat the left-censored survival times, along with the frailty variable, as missing variables so that a different and more effective EM algorithm can be employed to overcome the aforementioned computational challenges. Following this idea and conditioning first on the frailty variable and then left censored data, the likelihood (2) from the ith cluster can be written as
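$$L_i(\theta) = \int \prod_{j=1}^{n_i} [f_T(\tilde t_{ij} \mid w, z_{ij})]^{\delta_{ij}\eta_{ij}}\,[S_T(\tilde t_{ij} \mid w, z_{ij})]^{(1-\delta_{ij})\eta_{ij}} \left[\int_0^{\tilde t_{ij}} f_T(u \mid w, z_{ij})\,du\right]^{1-\eta_{ij}} f_W(w \mid \gamma)\,dw, \tag{3}$$
since $F_T(\tilde t_{ij} \mid w, z_{ij}) = \int_0^{\tilde t_{ij}} f_T(u \mid w, z_{ij})\,du$.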
At first glance, the likelihood in (3) involves many integrations because all left-censored survival times are treated as missing data. However, as shown in Proposition 3.1 later, the actual E-step only involves one-dimensional integration over the frailty distribution due to the appealing structure of the proportional hazards model. This leads to a stable EM algorithm with only one-dimensional Monte Carlo integration in the E-steps and a low-dimensional nonlinear maximization in the M-steps.
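To make the structure of the cluster likelihood concrete, the sketch below approximates the integral over the frailty in (2) by plain Monte Carlo under a gamma frailty with mean 1 and variance γ. The baseline functions are passed in as callables; the linear cumulative hazard in the usage example is an assumption chosen so that the result can be checked against the gamma Laplace transform $(1 + \gamma t)^{-1/\gamma}$.

```python
import math
import random


def subject_contribution(w, t_obs, delta, eta, z, beta, cum_hazard, hazard):
    """Likelihood contribution of one subject given the frailty w."""
    risk = w * math.exp(beta * z)
    surv = math.exp(-risk * cum_hazard(t_obs))
    if eta == 0:                      # left-censored: F_T
        return 1.0 - surv
    if delta == 1:                    # uncensored: f_T
        return risk * hazard(t_obs) * surv
    return surv                       # right-censored: S_T


def cluster_likelihood_mc(cluster, beta, gamma, cum_hazard, hazard, n_draws, rng):
    """Monte Carlo approximation of the cluster likelihood under a gamma frailty.

    `cluster` is a list of tuples (t_obs, delta, eta, z).
    """
    total = 0.0
    for _ in range(n_draws):
        w = rng.gammavariate(1.0 / gamma, gamma)   # mean 1, variance gamma
        prod = 1.0
        for (t_obs, delta, eta, z) in cluster:
            prod *= subject_contribution(w, t_obs, delta, eta, z,
                                         beta, cum_hazard, hazard)
        total += prod
    return total / n_draws


if __name__ == "__main__":
    rng = random.Random(0)
    one_right_censored = [(1.0, 0, 1, 0.0)]
    approx = cluster_likelihood_mc(one_right_censored, 0.0, 0.5,
                                   lambda t: t, lambda t: 1.0, 100000, rng)
    exact = (1.0 + 0.5 * 1.0) ** (-1.0 / 0.5)      # Laplace transform of the frailty
    print(approx, exact)
```

For a single right-censored subject the likelihood reduces to the Laplace transform of the frailty evaluated at the cumulative hazard, which gives a simple sanity check for the simulation.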
3. EM-Algorithm
For ease of presentation, we illustrate the EM algorithm with gamma frailty, but other frailty distributions could be employed with additional computational cost in the E-step. The computational advantage of gamma frailty is that the posterior distribution required in the EM algorithm remains a gamma distribution. This feature allows direct sampling from a known gamma distribution and enhances the computational efficiency of Monte Carlo integration. This will be further illustrated in this section. Treating the frailty term and left-censored times as missing data, the integrand of (3) provides the complete likelihood. Let Uij = TijI(Tij ≤ T̃ij) denote the unobserved event time of a left-censored subject; it is zero otherwise. The resulting complete log-likelihood for the ith cluster is
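$$l_i^C(\theta) = \log f_W(w_i \mid \gamma) + \sum_{j=1}^{n_i}\Big\{\delta_{ij}\eta_{ij}\big[\log w_i + \log\lambda_0(\tilde t_{ij}) + \beta^\top z_{ij}\big] - \eta_{ij}\,w_i\Lambda_0(\tilde t_{ij})e^{\beta^\top z_{ij}} + (1-\eta_{ij})\big[\log w_i + \log\lambda_0(U_{ij}) + \beta^\top z_{ij} - w_i\Lambda_0(U_{ij})e^{\beta^\top z_{ij}}\big]\Big\}. \tag{4}$$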
The complete log-likelihood, denoted as lC, from all clusters is then the sum of (4) over i = 1, …, n. For simplicity, we will denote the parameters of interest, (β, Λ0, γ), or equivalently (β, λ1, …, λK, γ), by θ, and the corresponding parameter space as ΘEM = Θβ × Θ(λ1,…, λK) × Θγ, in the following illustration.
3.1. E-step
The expected complete log-likelihood contributed by the ith cluster with the posterior parameter value θ′ is:
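$$E_{\theta'}\big[l_i^C(\theta) \mid O_i\big] = E_{\theta'}\big[\log f_W(W_i \mid \gamma) \mid O_i\big] + \sum_{j=1}^{n_i}\Big\{(\delta_{ij}\eta_{ij} + 1 - \eta_{ij})\big[E_{\theta'}(\log W_i \mid O_i) + \beta^\top z_{ij}\big] + \delta_{ij}\eta_{ij}\log\lambda_0(\tilde t_{ij}) - \eta_{ij}\,E_{\theta'}(W_i \mid O_i)\Lambda_0(\tilde t_{ij})e^{\beta^\top z_{ij}} + (1-\eta_{ij})\big[E_{\theta'}\{\log\lambda_0(U_{ij}) \mid O_i\} - E_{\theta'}\{W_i\Lambda_0(U_{ij}) \mid O_i\}e^{\beta^\top z_{ij}}\big]\Big\}, \tag{5}$$
which involves imputation of functions of Wi, Uij, or both of them given the observed data. Fortunately, the imputation of functions of Wi, such as Wi, log(Wi), and log fW(Wi) in (5), has a simple form if one employs Bayes' rule effectively, as described below. For illustration purposes, we consider the imputation of a general function h(Wi).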
Consider the three sets of variables, , and , where
and
By Bayes rule, the imputation of h(Wi) can be expressed as
| (6) |
where . Under the gamma-frailty model, the posterior distribution of Wi conditioning on the non-left-censored part is a gamma distribution with parameters
and
The imputation of functions involving Uij, say log λ0(Uij) and WiΛ0(Uij) in (5), is more complicated because of the semiparametric setting on T. For a generic function of Uij, say h(Uij), the imputation given the observed data Oi is of the form
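$$E_{\theta'}\big[h(U_{ij}) \mid O_i\big] = \int \left[\int h(u)\, f_{U|(W,O)}(u \mid w, O_{ij})\,du\right] f_{W|O}(w \mid O_i)\,dw, \tag{7}$$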
where Oij is the observed data from the jth subject within the ith cluster, and fW|O(w|Oi) is defined in (6). Since the cumulative hazard function is a non-decreasing step function with positive jumps at t1 < ⋯ < tK, the ordered observed time points mentioned in Lemma 2.1, and corresponding jump sizes λ1, …, λK, the conditional density fU|(W,O)(u|w, Oij) in (7) is
when u = tk, for any tk ≤ t̃ij, and 0 otherwise. Because of this explicit form for fU|(W,O)(u|w, Oij), no Monte-Carlo integration is needed to evaluate the inner integral in (7), so only one-dimensional Monte-Carlo integration is needed for (7) and we arrive at the following proposition.
Proposition 3.1
The imputations of the two functions involving Uij in the imputed complete log-likelihood can be expressed as follows.
where
and
with
Note that this proposition implies that only one-dimensional Monte Carlo integrations are involved in the calculation of ak,ij(θ′) and ck,ij(θ′). Let M(l+1) denote the number of Monte Carlo draws generated in the (l+1)th iteration and θ(l) be the value of θ obtained in the previous iteration. The detailed imputation procedure in the E-step is provided below.
- Step 1. Generate wi1, …, wiM(l+1) from a gamma distribution with parameters
- Step 2. Evaluate the following terms at wim, for m = 1, …, M(l+1), and plug in the current value, θ(l), for θ.
- .
- Step 3. Take sample means on the four sets of M(l+1) values in (a) to (d) in Step 2 and replace the integrals to be evaluated by the sample means in corresponding forms.
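The weighted Monte Carlo idea behind these steps can be sketched as follows for the imputation of a generic function h(Wi). The sketch assumes a gamma frailty with mean 1 and variance γ, so that the posterior given the non-left-censored subjects is gamma with shape 1/γ plus the number of uncensored subjects and rate 1/γ plus the sum of Λ0(t̃ij)exp(βzij) over those subjects. This parameterization is derived here from model (1) and is an assumption of the example, as are the function name `impute_h` and the scalar covariate.

```python
import math
import random


def impute_h(cluster, h, beta, gamma, cum_hazard, n_draws, rng):
    """Weighted Monte Carlo estimate of E[h(W_i) | O_i] for one cluster.

    `cluster` is a list of tuples (t_obs, delta, eta, z). Draws come from the
    gamma posterior based on the non-left-censored subjects; each draw is
    weighted by the product of F_T over the left-censored subjects.
    """
    shape = 1.0 / gamma
    rate = 1.0 / gamma
    left_terms = []
    for (t_obs, delta, eta, z) in cluster:
        a = cum_hazard(t_obs) * math.exp(beta * z)
        if eta == 1:
            shape += delta            # one extra power of w per uncensored subject
            rate += a
        else:
            left_terms.append(a)
    num = 0.0
    den = 0.0
    for _ in range(n_draws):
        w = rng.gammavariate(shape, 1.0 / rate)   # second argument is the scale
        weight = 1.0
        for a in left_terms:
            weight *= 1.0 - math.exp(-w * a)      # F_T for a left-censored subject
        num += weight * h(w)
        den += weight
    return num / den


if __name__ == "__main__":
    rng = random.Random(7)
    cluster = [(1.0, 1, 1, 0.0), (2.0, 0, 1, 0.0), (1.0, 1, 0, 0.0)]
    print(impute_h(cluster, lambda w: w, 0.0, 0.5, lambda t: t, 50000, rng))
```

When a cluster has no left-censored subject all weights equal one and the estimate reduces to an ordinary sample mean from the gamma posterior.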
The imputed complete log-likelihood can thus be rewritten as
where Λ{·} represents the jump size of a step function Λ at the specified time point inside the bracket.
3.2. M-step
In the M-step, the NPMLEs of β, γ and (λ1, …, λK) are located by taking derivatives of (5) and solving the resulting system of equations. The MLE of λk is the solution to the following equation
| (8) |
where δk and ηk both correspond to the observed time point tk, and Rk stands for the corresponding risk set defined as {k′ : tk′ ≥ tk}. On the other hand, there is no explicit form for the NPMLE of β and γ, so a one-step Newton-Raphson method is used to update the estimates in each iteration. The updating formula for β is
where Sβ, the score function of β, takes the following form
| (9) |
One interesting finding from (8) and (9) is that the structure of the resulting forms in the proposed EM algorithm is very similar to that for right-censored data. The difference is that, for doubly-censored data, every left-censored observation contributes part of its probability mass to each jump point preceding it during each iteration of the EM algorithm. This redistribution-to-the-left algorithm is similar to the self-consistency property for right-censored data, which redistributes the weight of each right-censored observation to all observations after it. It also reflects the fact that for a left-censored subject, the unobserved event of interest has happened sometime in the past. However, there is a major difference in that some left-censored observations also carry positive mass.
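The redistribution to the left can be illustrated with a short Python sketch. It uses one natural convention, which is an assumption of this example rather than the exact expression used in the E-step: under a step-function Λ0 the survival function is exp(−w·risk·Λ0(t)), the mass at a jump point is the drop in survival there, and the masses are normalized over the jump points not exceeding the left-censoring time.

```python
import math


def redistribute_left(jump_times, jumps, t_obs, w, risk):
    """Conditional masses of a left-censored event time over the jump points.

    `jump_times` must be ascending, `jumps` are the jump sizes of the baseline
    cumulative hazard, and `risk` stands for exp(beta * z). Jump points after
    t_obs receive zero mass.
    """
    masses = []
    cum = 0.0
    surv_prev = 1.0
    for t_k, lam_k in zip(jump_times, jumps):
        if t_k > t_obs:
            masses.append(0.0)
            continue
        cum += lam_k
        surv = math.exp(-w * risk * cum)
        masses.append(surv_prev - surv)   # probability that the event occurs at t_k
        surv_prev = surv
    total = 1.0 - surv_prev               # probability of an event by t_obs
    if total <= 0.0:
        raise ValueError("no jump point at or before the left-censoring time")
    return [m / total for m in masses]


if __name__ == "__main__":
    print(redistribute_left([1.0, 2.0, 3.0, 4.0], [0.2, 0.3, 0.1, 0.4],
                            t_obs=3.0, w=1.0, risk=1.0))
```

The returned masses sum to one over the jump points at or before the left-censoring time, which mirrors how a left-censored subject spreads its weight over earlier jump points.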
Under the assumption of a gamma frailty, the Newton-Raphson algorithm for the parameter γ is based on the following updating rule
with Sγ, the score function of γ, defined as
3.3. Convergence of the algorithm
Since Monte Carlo errors are induced in the E-step of the MCEM algorithm, the convergence of the proposed algorithm to the true NPMLE is no longer guaranteed. To address this issue we increase the Monte Carlo sample size M(l+1) with each iteration to enhance the convergence of our algorithm and refer to Chan and Ledolter (1995), Booth and Hobert (1999), Fort and Moulines (2003), and Caffo, Jank and Jones (2005) for discussions of how this overcomes the convergence issue. Although the frailty model is formulated under a semi-parametric setting, the problem of locating the NPMLE given an observed sample via the MCEM algorithm is no different from the parametric setting since the jump points t1, …, tK of the NPMLE of Λ0 are fixed across iterations. This feature allows us to investigate the convergence issue in a manner similar to the existing parametric literature as long as the stationary points of the observed log-likelihood (points where the derivative of the observed log-likelihood is zero) are all isolated points and there is no left censoring involved. See Fort and Moulines (2003) for more details. However, the situation is much more complicated in the presence of left-censoring, because the Monte Carlo approximation in the proposed algorithm is nonstandard. In standard MCEM algorithms the sample mean of the Monte Carlo samples is used, but in our setup a weighted average (as an empirical counterpart to (6)) is employed in Step 2 of the EM algorithm to approximate the needed integrals in the likelihood. A new convergence theory is thus needed and we establish this in the proposition below, which provides some sufficient conditions for the convergence of the proposed algorithm in the presence of left-censoring. The proof of Proposition 3.2 is relegated to the supplemental material (Su and Wang, 2015).
In what follows, L(θ|O) and l(θ|O) denote the observed likelihood and log-likelihood, respectively, with the cumulative baseline hazard function Λ replaced by a non-decreasing step function. The notation ℒ stands for the set of stationary points of l(θ|O).
Proposition 3.2
Under the following conditions (a)–(e), the sequence {l(θ(l)|O)} of the observed log-likelihood evaluated at {θ(l), l = 1, 2, …} converges with probability 1 to l(θ*|O), where θ* is a local maximizer of l(·), and {θ(l)} converges to θ*.
(a) fW (w|γ) is continuous w.r.t. w. Moreover, Eθ′(Wi|Oi), Eθ′(log W|O), ak,ij(θ′), and ck,ij(θ′) are all continuous w.r.t. θ′.
(b) {θ ∈ ΘEM : L(θ) ≥ c} is compact for any given constant c and the stationary points of l(θ|O) are all isolated points in ℒ.
(c) The initial value θ(0) falls in a compact neighborhood 𝒞* of θ*, and θ* is the only point in {θ ∈ ℒ : l(θ|O) = l(θ*|O)}.
(d) For any compact subset 𝒞 ⊆ ΘEM,
where h(1)(W, t, Oij) = log(W), h(2)(W, t, Oij) = W, h(3)(W, t, Oij) = fU|(W,O)(t|W, Oij), and h(4)(W, t, Oij) = fU|(W,O)(t|W, Oij)W.
(e) The Monte Carlo sample size {M(l)} satisfies , and grows fast enough such that l(θ(l)|O) ≥ l(θ*|O) − M infinitely often, for some constant M > 0, and {l(θ(l)|O)} ≤ l(θ*|O) with probability 1.
Conditions (a)–(c) are also required for standard parametric MCEM (Fort and Moulines, 2003), where condition (a) ensures the continuity of Eθ′ [lC(θ|O)], the expected complete log-likelihood w.r.t. θ′, and condition (c) requires a good initial value which is close to the local maximizer. A sufficient condition for condition (b) is the continuity of the (K + 3)th derivative of l(θ|O), where K varies with the sample size in the semi-parametric setting in contrast to parametric settings where K is fixed. This is the major price for the convergence of the proposed EM algorithm under a semi-parametric model. Condition (d) controls the error induced by the Monte Carlo approximation, and condition (e) specifies the required size of the Monte Carlo samples.
The convergence of {l(θ(l)|O)} suggests a stopping rule based on the difference of the observed likelihood. The algorithm stops when the difference between the observed likelihood at two consecutive iterations is smaller than a pre-specified tolerance of error.
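A generic driver that combines a growing Monte Carlo sample size with this stopping rule might look as follows. It is a sketch only: `mcem_step` and `obs_loglik` are hypothetical callables standing in for one E-step plus M-step and for the observed log-likelihood, and the geometric growth of the sample size with a cap is an assumed schedule, not the rate required by condition (e).

```python
def run_mcem(theta0, mcem_step, obs_loglik, m0=100, growth=1.5,
             m_max=100000, tol=1e-6, max_iter=200):
    """Iterate an MCEM update until the observed log-likelihood stabilizes.

    mcem_step(theta, m) returns the updated parameter using m Monte Carlo draws;
    obs_loglik(theta) returns the observed log-likelihood at theta.
    Returns the final parameter value and the number of iterations used.
    """
    theta = theta0
    ll_old = obs_loglik(theta)
    m = float(m0)
    for iteration in range(1, max_iter + 1):
        theta = mcem_step(theta, int(m))
        ll_new = obs_loglik(theta)
        if abs(ll_new - ll_old) < tol:     # stopping rule on consecutive values
            return theta, iteration
        ll_old = ll_new
        m = min(m * growth, m_max)         # larger Monte Carlo sample next time
    return theta, max_iter


if __name__ == "__main__":
    # Toy deterministic stand-ins: the update moves halfway toward 2.
    step = lambda theta, m: theta + 0.5 * (2.0 - theta)
    loglik = lambda theta: -(theta - 2.0) ** 2
    print(run_mcem(0.0, step, loglik, tol=1e-10))
```

In practice the Monte Carlo noise in the log-likelihood differences means the tolerance has to be chosen with the sample size in mind; the toy example avoids this by using a deterministic update.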
Corollary 3.1
Under the conditions (a)–(e) in Proposition 3.2 and given a good set of initial values in a neighborhood of the NPMLE, the estimators from the proposed MCEM algorithm converge a.s. to the NPMLE.
4. Main Theorems
We first list the technical assumptions for the theoretical results of the NPMLE. Hereafter, τ denotes the endpoint time of the study.
-
A1
The baseline hazard rate function λ0(t) is bounded and positive in [0, τ]. Moreover, the cumulative hazard function is bounded at τ, i.e. Λ0(τ) < ∞. Let Di be the number of right-censored subjects at time τ in the ith cluster. E(Di) > 0.
-
A2
The expected number of subjects in a family, E(ni), is bounded above. Also ni is noninformative to the parameters of interest.
-
A3
Θβ × Θγ, the parameter space of (β, γ) is compact, and the true value (β0, γ0) falls in the interior of the parameter space.
-
A4
The covariate Z is bounded, i.e. there exists MZ > 0, such that |Z| ≤ MZ. Moreover, exists and is bounded away from 0 over the parameter space.
-
A5
Eθ0 [W exp(β0Z)I(T ≥ t)] exists and is bounded away from 0 for all t ∈ [0, τ]. Moreover, Eθ0 [WΛ0(T̃)Z^2 exp(β0Z)] exists and is greater than 0.
-
A6
The distribution fW (·|γ) is continuous with respect to γ and has a continuous second derivative with respect to γ. Furthermore, the Fisher information matrix from fW is positive definite.
The assumptions Λ0(τ) < ∞ and E(Di) > 0 in A1 are satisfied for a follow-up study that needs to end before all subjects have failed. This is common in medical studies. Assumption A2 is typically satisfied in familial-type studies. A3 is a common assumption on the true values of the parameters. Assumptions A4–A5 are technical assumptions for the boundedness of Λ̂n(τ) and for the invertibility and boundedness of the Fisher information operator, which are used in the proofs of the consistency and asymptotic normality of the proposed estimators. The differentiability of the frailty distribution with respect to γ and the invertibility of its Fisher information are stated in A6.
4.1. Asymptotic properties of the NPMLE
Theorem 4.1 (Consistency)
Under assumptions C1–C4 and A1–A5, the NPMLE θ̂ = (β̂, Λ̂, γ̂) converges strongly to θ0 = (β0, Λ0, γ0) under the Euclidean norm |·| for vector parameters and the supremum norm |·| for functions on [0, τ] respectively.
Theorem 4.2 (Asymptotic Normality and Efficiency)
Under assumptions C1–C4 and A1–A6, which imply the consistency (for functions on [0, τ]) in Theorem 4.1, the process converges in distribution to a normal element G in l∞(Hp), where Hp is a collection of directions as defined at the beginning of section A.2, with mean 0 and a covariance structure
∀h, h* ∈ Hp, where σθ0,k, k = 1, 2, 3, are the information operators derived in the Appendix. Moreover, β̂ and γ̂ are efficient estimators for β0 and γ0 respectively in the semi-parametric sense.
We provide the detailed proofs for the two theorems in the Appendix. Basically, the proofs are based on demonstrating the Glivenko-Cantelli property on the terms involved in the NPMLE, and the Donsker property on the score functions.
4.2. Estimating the standard error of β̂
As pointed out in the literature under a semiparametric setting with latent variables, estimation of the standard error of the estimates for finite-dimensional parameters involves the inversion of a high-dimensional matrix, where each entry further involves integrals. This often poses computational challenges and is also the case with our setting, where the inverse of the information operator has no explicit form. Thus, even under right censorship, the straightforward method of utilizing the asymptotic variance-covariance matrix, as proposed by Murphy (1995) and Parner (1998), is not applicable to estimate the standard errors. There are two alternative methods in the literature to estimate standard errors under a semiparametric setting: the profile likelihood approach (Murphy, Rossini and van der Vaart, 1997) and the bootstrap method (Tseng, Hsieh and Wang, 2005). The first approach has also been successfully implemented by Zeng and Cai (2005) in joint modeling of right-censored survival data and longitudinal covariates. Therefore, we explored both approaches in order to compare them. It turns out that the profile likelihood approach in Murphy, Rossini and van der Vaart (1997) and Zeng and Cai (2005) does not work well in our setting, but we are able to modify it and the modified version works well in the simulation study reported in Section 5.
A profile log-likelihood is defined as
$$pl_n(\beta) = \max_{\Lambda,\,\gamma}\; l(\beta, \Lambda, \gamma \mid O).$$
The curvature of pln around β̂ provides an estimate for the negative value of the information matrix. However, a direct derivation of the second derivative of pln is not feasible since there is no closed form for pln due to the integration involved in the likelihood function. Murphy, Rossini and van der Vaart (1997) proposed a second difference method to numerically approximate the information. The second difference method is a numerical approach to approximate the second derivative of a target function pln at a point of interest β̂. We start with the first difference: to estimate the first derivative of pln at β̂, we can use the first difference $[pl_n(\hat\beta + h) - pl_n(\hat\beta)]/h$, for a small h, to approximate $pl_n'(\hat\beta)$. By applying this idea a second time to the first differences $[pl_n(\hat\beta + h) - pl_n(\hat\beta)]/h$ and $[pl_n(\hat\beta) - pl_n(\hat\beta - h)]/h$, the second difference $[pl_n(\hat\beta + h) - 2\,pl_n(\hat\beta) + pl_n(\hat\beta - h)]/h^2$, as defined in Murphy, Rossini and van der Vaart (1997), gives a numerical second derivative of the target function. However, in the presence of double censoring, their method often results in negative estimates. We were thus motivated to look for an alternative approach to estimate the second derivative of pln around β̂.
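The three-point second difference can be sketched as below for a scalar β. The function `profile_loglik` is a hypothetical callable that returns the profile log-likelihood at a given β; in practice each evaluation would require running the EM algorithm with β held fixed. Returning `None` for a non-negative curvature is a design choice of this example.

```python
import math


def second_difference_se(profile_loglik, beta_hat, h):
    """Standard error of beta_hat from the three-point second difference.

    Returns None when the estimated curvature is not negative, in which case
    no valid standard error can be formed.
    """
    curvature = (profile_loglik(beta_hat + h)
                 - 2.0 * profile_loglik(beta_hat)
                 + profile_loglik(beta_hat - h)) / (h * h)
    if curvature >= 0.0:
        return None
    return (-curvature) ** -0.5


if __name__ == "__main__":
    smooth = lambda b: 4.0 - 12.5 * (b - 1.3) ** 2
    print(second_difference_se(smooth, 1.3, 0.05))
    # A small oscillation on top of the same quadratic can flip the sign.
    wiggly = lambda b: smooth(b) - 0.05 * math.cos(200.0 * (b - 1.3))
    print(second_difference_se(wiggly, 1.3, math.pi / 200.0))
```

The second call shows how a local oscillation of the profile log-likelihood at the scale of h can make the three-point estimate unusable.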
The key idea of our approach is, instead of the simple difference method which are very case sensitive, we fit a quadratic curves on pln around β̂ and then take the estimated leading second-order term to estimate the second derivative of pln. To implement this method, we evaluate pln on d equal-distant points β1 …, βd within a window (β̂ − hn, β̂ + hn), with the half-width hn taken to be of the order O(n−1/2). Although there is no closed form for pln, the evaluation can be done by the EM algorithm. A point regarding the calculation of the profile likelihood in our algorithm needs to be addressed as following. Although left-censored data are treated as missing data in the estimation of NPMLE, we use the original form of the likelihood (3) to calculate the profile log-likelihood after obtaining the maximizer Λ(β) and γ(β) corresponding to each fixed β. Specifically, we fit a quadratic model a0 + a1β + a2β2 on the pairs (β1, pln(β1)), …, (βd, pln(βd)), the stand error of β̂ is estimated by . This method only involves fitting a linear regression model with two predictors, so a moderate number of points β1, …, βd, say 20, is enough for the implementation.
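A minimal Python sketch of the quadratic-fit estimate for a scalar β is given below. It assumes that pln is the unnormalized profile log-likelihood, so that the fitted curvature 2a2 estimates the negative information, and it places the d points symmetrically around β̂ so that the least-squares coefficient of the quadratic term has a closed form. The name `profile_loglik` is again a hypothetical callable.

```python
def quadratic_profile_se(profile_loglik, beta_hat, half_width, d=20):
    """Standard error of beta_hat from a least-squares quadratic fit.

    Uses d equidistant points strictly inside
    (beta_hat - half_width, beta_hat + half_width).
    """
    # Offsets from beta_hat; the grid is symmetric about zero.
    xs = [-half_width + 2.0 * half_width * (i + 1) / (d + 1) for i in range(d)]
    ys = [profile_loglik(beta_hat + x) for x in xs]
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    s2y = sum(x * x * y for x, y in zip(xs, ys))
    # On a symmetric grid the odd moments vanish, so the normal equations for
    # the intercept and the quadratic coefficient decouple from the slope.
    a2 = (d * s2y - s2 * sy) / (d * s4 - s2 ** 2)
    if a2 >= 0.0:
        raise ValueError("fitted profile log-likelihood is not concave")
    return (-2.0 * a2) ** -0.5


if __name__ == "__main__":
    # Toy profile log-likelihood with information 25, so the SE is 0.2.
    toy = lambda b: 4.0 - 12.5 * (b - 1.3) ** 2
    print(quadratic_profile_se(toy, 1.3, 0.1))
```

Because the quadratic coefficient is unchanged by shifting β, fitting in offsets from β̂ gives the same a2 as fitting in β itself while being numerically better conditioned.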
Although the proposed method needs more computational effort than the method in Murphy, Rossini and van der Vaart (1997), for which pln is evaluated at only 3 points, it provides a more stable and accurate estimate of the standard errors. Based on our experience, the performance of both profile likelihood methods depends on the half-width of the window, hn, and the method of Murphy, Rossini and van der Vaart (1997) is much more sensitive to the choice of hn. If the window is too narrow, the profile likelihood approach may yield a negative estimate of the standard error due to the highly oscillatory behavior of the profile log-likelihood around β̂. A wider window may overcome this issue of negative estimates, but at the cost of higher bias. For the procedure advocated in Murphy, Rossini and van der Vaart (1997), the bias is always downward and quite serious. Moreover, negative estimates for the standard errors occur much more frequently than with our approach based on quadratic approximations. We compare these two profile methods through a simulation study in Section 5, where the simple profile method fails to produce meaningful results.
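The two curvature estimates discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that pln is the unnormalized profile log-likelihood, so that the variance of β̂ is approximated by the inverse of minus its second derivative. The function `toy_pl` is a hypothetical stand-in: in the setting of this paper, each evaluation of pln would require a run of the EM algorithm. The grid here includes the two endpoints of the window.

```python
import numpy as np


def second_difference_se(pl, beta_hat, h):
    """Standard error from a three-point second difference of pl at beta_hat."""
    d2 = (pl(beta_hat + h) - 2.0 * pl(beta_hat) + pl(beta_hat - h)) / h**2
    info = -d2  # information = minus the curvature of the log-likelihood
    return float(np.sqrt(1.0 / info)) if info > 0 else float("nan")


def quadratic_fit_se(pl, beta_hat, h, d=20):
    """Standard error from a least-squares quadratic fitted on d grid points."""
    grid = np.linspace(beta_hat - h, beta_hat + h, d)
    values = np.array([pl(b) for b in grid])
    # Centre the grid at beta_hat for numerical stability; the coefficient
    # of the squared term is unchanged by this shift.
    a2, _a1, _a0 = np.polyfit(grid - beta_hat, values, deg=2)
    info = -2.0 * a2  # second derivative of a0 + a1*b + a2*b**2 is 2*a2
    return float(np.sqrt(1.0 / info)) if info > 0 else float("nan")


def toy_pl(beta):
    """Hypothetical stand-in: curvature -25 (true SE 0.2) plus a small wiggle."""
    return -12.5 * (beta - 1.0) ** 2 + 0.01 * np.sin(200.0 * beta)


if __name__ == "__main__":
    for h in (0.02, 0.05, 0.1):
        print(h, second_difference_se(toy_pl, 1.0, h), quadratic_fit_se(toy_pl, 1.0, h))
```

Both functions return NaN when the estimated curvature has the wrong sign, which corresponds to the negative estimates mentioned in the text.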
On the other hand, the bootstrap method has been widely used to estimate the standard errors of estimates under many semiparametric models when a simple closed form of the standard error is not available. It provides a numerically valid estimate of the standard error by resampling from the observed sample, provided the number of resamples is fairly large. However, a theoretical justification of the bootstrap method under semiparametric models was not available until Cheng and Huang (2010) and Cheng (2012), which demonstrate distribution consistency and moment consistency, respectively. Those works provide general theory for investigating the consistency of the nonparametric bootstrap standard error under the frailty model subject to double censoring, as stated in the following theorem. The proof involves verifying the conditions listed in Theorem 1 in Cheng (2012) and is presented in the Appendix. Below, σβ̂ denotes the standard error of β̂, to which the bootstrap sample standard error is compared.
Theorem 4.3 (Consistency of the bootstrap standard error)
Under assumptions A1–A6, the nonparametric bootstrap standard error converges in probability to σβ̂, as n → ∞.
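As a rough illustration of the nonparametric bootstrap for clustered data, the sketch below resamples whole clusters with multinomial frequencies and reports the sample standard deviation of the re-estimated values. The names `cluster_bootstrap_se` and `pooled_mean` are hypothetical, and the pooled mean is only a placeholder estimator: in the setting of this paper, the estimator would refit the NPMLE on each resample. The default of 50 resamples mirrors the number used in the simulations of Section 5.

```python
import numpy as np


def cluster_bootstrap_se(clusters, estimator, n_boot=50, seed=0):
    """Bootstrap standard error obtained by resampling whole clusters."""
    rng = np.random.default_rng(seed)
    n = len(clusters)
    estimates = []
    for _ in range(n_boot):
        # counts[i] is how often cluster i is drawn:
        # (M_n1, ..., M_nn) ~ Multinomial(n, (1/n, ..., 1/n)).
        counts = rng.multinomial(n, np.full(n, 1.0 / n))
        resample = [clusters[i] for i in range(n) for _ in range(counts[i])]
        estimates.append(estimator(resample))
    return float(np.std(estimates, ddof=1))


def pooled_mean(clusters):
    """Placeholder estimator; an NPMLE refit would take its place."""
    return float(np.mean(np.concatenate(clusters)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    families = [rng.normal(size=rng.integers(2, 5)) for _ in range(49)]
    print(cluster_bootstrap_se(families, pooled_mean))
```

Resampling clusters rather than individual subjects keeps the within-family dependence intact in every bootstrap sample.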
5. Simulation
5.1. Evaluation of the proposed EM algorithm
To study the numerical performance of the proposed EM algorithm, we considered four simulation settings, each based on 100 Monte Carlo samples. For each setting, we consider a binary covariate that takes the value 0 or 1 with equal probability, and the number of subjects within each family is chosen randomly from {2, 3, 4} with equal probabilities, which reflects the structure of the familial data in Section 6, where 49 families participated in the study. The survival times are generated from a Cox model with β = 1, where λ0 is the hazard function of an exponential distribution with mean 1, and the frailty term is generated from a gamma distribution with mean and variance both equal to 1.
The four simulation settings correspond to two numbers of clusters, 50 and 100, and the following two types of left censorship: (1) the left-censoring time is generated from an exponential distribution with mean 0.05, and (2) the left-censoring time is generated from an exponential distribution with mean 0.2. In each of the four settings, the right-censoring time is the sum of the left-censoring time and an independent random variable from an exponential distribution with mean 8 (cf. Condition C1). For type (1) left censorship, this resulted in an average of 8% left-censored data and an additional 17% right-censored data, leading to a total of 25% censoring. This reflects a lightly left-censored case, in contrast to the scenario of type (2), where on average 22% of the data are left-censored with an additional 16% right-censored.
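The data-generating mechanism described above can be sketched as follows. This is an illustrative reading of the settings, not the authors' code. The function name, the returned dictionary, and the convention that the observed time is the survival time clipped to the interval between the left- and right-censoring times are assumptions made for the sketch.

```python
import numpy as np


def simulate_doubly_censored(n_clusters, left_mean, beta=1.0, frailty_var=1.0,
                             right_gap_mean=8.0, seed=0):
    """Simulate clustered, doubly-censored survival data with gamma frailty."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(2, 5, size=n_clusters)      # family size in {2, 3, 4}
    cluster = np.repeat(np.arange(n_clusters), sizes)
    n_subjects = cluster.size
    # Gamma frailty with mean 1 and variance frailty_var, shared within a cluster.
    frailty = rng.gamma(shape=1.0 / frailty_var, scale=frailty_var, size=n_clusters)
    w = frailty[cluster]
    z = rng.integers(0, 2, size=n_subjects)          # binary covariate, P(z = 1) = 0.5
    # The baseline hazard is constant (equal to 1), so T given (w, z) is
    # exponential with rate w * exp(beta * z).
    t = rng.exponential(scale=1.0 / (w * np.exp(beta * z)))
    left = rng.exponential(scale=left_mean, size=n_subjects)
    right = left + rng.exponential(scale=right_gap_mean, size=n_subjects)
    status = np.where(t < left, "left", np.where(t > right, "right", "exact"))
    observed = np.clip(t, left, right)               # censored times become the bound
    return {"cluster": cluster, "z": z, "time": observed, "status": status,
            "left": left, "right": right}


if __name__ == "__main__":
    data = simulate_doubly_censored(n_clusters=100, left_mean=0.05)
    for label in ("left", "right", "exact"):
        print(label, np.mean(data["status"] == label))
```

With `left_mean=0.05` and `left_mean=0.2`, the left-censoring proportions should fall near the 8% and 22% reported for the two scenarios, up to Monte Carlo variation.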
The results of the NPMLE for the finite-dimensional parameters are listed in Table 1. For the case n = 50, the bias for β under light left-censoring is 0.0072 with a standard error of 0.2173. The variance of the gamma-frailty term can be estimated with a bias of 0.0468 in absolute value and a standard error of 0.3026. Overall, β can be estimated with more precision than γ. Both the biases and the standard errors for β and γ decrease, to 0.0070 (0.1549) and 0.0465 (0.2293) respectively, as the number of clusters increases to n = 100. As expected, the performance of both estimates for β and γ generally deteriorates under the heavier left-censoring scenario (2), but the differences are not large. Considering that a total of 38% of the data are censored under scenario (2), the numerical performance of the procedure seems satisfactory. In addition to the accuracy and precision of the estimator, the proposed EM algorithm also performs well in terms of numerical stability. It attains a high convergence rate under all scenarios. In the simulation, we allow a maximum of 100 iterations along with a relative error tolerance of 0.001. The convergence rates with 50 clusters are 100% and 99% under 8% and 22% of left-censoring respectively. When the number of clusters increases to 100, the convergence rates reach 100% under both 8% and 22% of left-censoring.
Table 1. Simulation results from the proposed EM algorithm.
Results on two different simulation settings with 2 sample sizes, 50 and 100, under each setting. The true baseline hazard is a constant function. The notations σ̂·,MC stand for estimated standard deviation from 100 Monte Carlo samples. MSE stands for the mean square error of the estimates.
| Cases | n | β0 | β̂ | σ̂β,MC | MSE(β̂) | γ0 | γ̂ | σ̂γ,MC | MSE(γ̂) |
|---|---|---|---|---|---|---|---|---|---|
| 8% left-censored | 50 | 1 | 1.0072 | 0.2173 | 0.0473 | 1 | 0.9532 | 0.3026 | 0.0938 |
| | 100 | 1 | 1.0070 | 0.1549 | 0.0240 | 1 | 0.9535 | 0.2293 | 0.0547 |
| 22% left-censored | 50 | 1 | 1.0200 | 0.2274 | 0.0521 | 1 | 0.9364 | 0.3069 | 0.0982 |
| | 100 | 1 | 1.0138 | 0.1559 | 0.0245 | 1 | 0.9580 | 0.2209 | 0.0506 |
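As a reading aid, the MSE columns of Table 1 agree with the usual decomposition into squared bias plus Monte Carlo variance. The decomposition is not stated explicitly in the text, so this is our reading of the table. For the first row (8% left-censored, n = 50), with the bar denoting the Monte Carlo average of the estimates:

```latex
\begin{align*}
\mathrm{MSE}(\hat\beta) &\approx (\bar{\hat\beta} - \beta_0)^2 + \hat\sigma_{\beta,MC}^2
  = (1.0072 - 1)^2 + 0.2173^2 \approx 0.00005 + 0.04722 \approx 0.0473, \\
\mathrm{MSE}(\hat\gamma) &\approx (\bar{\hat\gamma} - \gamma_0)^2 + \hat\sigma_{\gamma,MC}^2
  = (0.9532 - 1)^2 + 0.3026^2 \approx 0.00219 + 0.09157 \approx 0.0938 .
\end{align*}
```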
For estimating the standard errors of the estimates, we started by comparing three approaches: the bootstrap method, the profile likelihood method of Murphy, Rossini and van der Vaart (1997), and our version of the profile likelihood method discussed in Section 4.2. The bootstrap method is similar to the one described in Tseng, Hsieh and Wang (2005). Both versions of the profile likelihood method involve the choice of a window width h, as discussed in Section 4.2. Based on our experience in simulations, the performance of the estimated standard errors depends on the choice of the window width, and the approach of Murphy, Rossini and van der Vaart (1997) is more sensitive to the window width than ours. If the window width is too small, the profile likelihood method may result in unreasonable values of the standard error, while a larger width leads to bias. We tried different widths hk, indexed by k = 1, 3, 5, 7, 9, for both profile likelihood approaches, but the approach of Murphy, Rossini and van der Vaart (1997) still resulted in many negative estimates as the width increased. Our profile approach resulted in a few negative estimates for small h but none for k = 7 and k = 9. Naturally, k = 7 performed better than k = 9. For these reasons, we report in Table 2 only our results for k = 7 together with the results of the bootstrap method. Both approaches are comparable and produce results close to the Monte Carlo standard deviation, σ̂β,MC. Since it is difficult to know in practice how to choose the window width, the bootstrap method may be the preferred choice if computational time is not a concern. Otherwise, we recommend our profile likelihood method with a small width h that leads to a positive estimate.
Table 2. Simulation results on estimating the standard error of β̂.
The notations with subscript “BT” stand for the estimated standard error based on 50 bootstrap resamples. Those with subscript “PL” stand for the estimated standard error based on the profile likelihood approach with the window width indexed by k = 7.
| Cases | n | σ̂β,MC | σ̂β,BT | σ̂β,PL,7 | σ̂γ,MC | σ̂γ,BT | σ̂γ,PL,7 |
|---|---|---|---|---|---|---|---|
| 8% left-censored | 50 | 0.2173 | 0.25442 | 0.2409 | 0.3026 | 0.3019 | 0.2913 |
| | 100 | 0.1549 | 0.1703 | 0.1525 | 0.2293 | 0.2053 | 0.2451 |
| 22% left-censored | 50 | 0.2274 | 0.2561 | 0.2010 | 0.3069 | 0.2871 | 0.3042 |
| | 100 | 0.1559 | 0.1769 | 0.1622 | 0.2209 | 0.2087 | 0.2239 |
5.2. Ascent property of the proposed EM algorithm
One issue commonly encountered in Monte Carlo EM algorithms is the convergence to the true maximizer of the (marginal) likelihood function. As discussed in the literature, maximizing the likelihood approximated by Monte Carlo integration will not locate the MLE exactly, due to the presence of Monte Carlo errors. An efficacious EM algorithm should sustain the so-called ascent property, which describes the increasing pattern of the targeted marginal likelihood along iterations. Herein a simulation is conducted to verify the ascent property of the proposed EM algorithm. In order to obtain an analytical form of the marginal likelihood in each iteration, we consider a simple scenario with 100 clusters of size 2. The survival times are generated by a gamma frailty model with β = 1, λ0 as the hazard function of an exponential distribution with mean 1, and γ = 1. The left-censoring rate is about 8% and there is at most 1 left-censored subject within each cluster. The marginal likelihood function is evaluated at the estimated values obtained by maximizing the surrogate likelihood in each iteration. Figure 1 shows the patterns of the marginal log-likelihood along the iteration steps, obtained from 12 randomly selected sets of simulations. As observed in the plots, the marginal log-likelihood increases drastically in the first few iterations and continues to climb until the algorithm converges. The ascent property of the proposed EM algorithm is clearly demonstrated by the trends in the plots.
Fig 1.
Plots of marginal log-likelihood evaluated at the NPMLE calculated in each iteration step based on 12 datasets with 100 clusters of size 2.
5.3. Misspecification on the frailty distribution
To study the effect of misspecifying the frailty distribution, we conducted some simulations with misspecified frailty distributions. The frailty term is generated from (1) a log-normal distribution whose logarithm has mean −0.5 and standard deviation 1, and (2) a mixture of two gamma distributions, Gamma(2,0.1) and Gamma(18,0.1), with equal weights. The two scenarios represent a unimodal non-gamma distribution with mean 1 and variance about 1.72, and a bimodal distribution with mean 1 and variance about 0.74. We explore the two types of misspecified frailty distributions with 50 and 100 clusters, and with the settings on other factors similar to the first two simulations in Section 5.1. The left-censoring rate is about 8%, with an additional 17% of right-censorship on average. Under both scenarios, a gamma frailty model is fitted via the proposed method for estimating the parameters.
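The stated means and variances of the two frailty distributions can be checked in closed form. The sketch below assumes that Gamma(a, b) denotes shape a and scale b, a reading that is consistent with the stated mean of 1 and variance of about 0.74. The function names are ours.

```python
import math


def lognormal_moments(mu, sigma):
    """Mean and variance of exp(X) with X ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2.0)
    var = (math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2)
    return mean, var


def gamma_mixture_moments(components, weights):
    """Mean and variance of a mixture of Gamma(shape, scale) distributions."""
    first = [k * s for k, s in components]                     # E[W] per component
    second = [k * s * s + (k * s) ** 2 for k, s in components]  # E[W^2] per component
    mean = sum(w * m for w, m in zip(weights, first))
    var = sum(w * m2 for w, m2 in zip(weights, second)) - mean**2
    return mean, var


if __name__ == "__main__":
    print(lognormal_moments(-0.5, 1.0))
    print(gamma_mixture_moments([(2, 0.1), (18, 0.1)], [0.5, 0.5]))
```

The mixture variance splits into a within-component part (0.1) and a between-component part (0.64), which is what makes the distribution bimodal around 0.2 and 1.8.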
The results of the NPMLE obtained from a misspecified model are shown in Table 3. The performance of the estimated regression coefficient β̂ is comparable to the results under the correct model in Table 1. The biases are slightly greater than those under a correct model, yet still within 4% (0.0277 and 0.0328 in absolute value based on 50 clusters for the unimodal and bimodal models respectively). Increasing the number of clusters to 100 reduces the bias to less than 2% (0.0193 and 0.0154 for the unimodal and bimodal models respectively) and gains efficiency as well. However, the variance component of the frailty cannot be accurately recovered under model misspecification. As demonstrated in Table 3, there can be a non-negligible bias in estimating γ. The relative biases are about 60% and 50% of the true parameter γ0 for the unimodal and bimodal cases respectively. This is expected under model misspecification, as the targets have changed. To summarize, given that the survival regression coefficients are usually the primary interest of a study, the proposed NPMLE is fairly robust against departures from the assumed frailty distribution. In particular, the survival regression coefficient can be estimated with high accuracy and precision even when the frailty component is incorrectly modeled.
Table 3. Simulation results with misspecification on the frailty distribution.
Results on two scenarios on frailty distributions with 2 sample sizes, 50 and 100, under each setting. The true baseline hazard is a constant function. The notations σ̂·,MC stand for estimated standard deviation from 100 Monte Carlo samples. MSE stands for the mean square error of the estimates.
| True frailty distribution | n | β0 | β̂ | σ̂β,MC | MSE(β̂) | γ0 | γ̂ | σ̂γ,MC | MSE(γ̂) |
|---|---|---|---|---|---|---|---|---|---|
| Log-normal | 50 | 1 | 0.9723 | 0.2371 | 0.0570 | 1.72 | 0.6835 | 0.2375 | 0.1566 |
| | 100 | 1 | 0.9807 | 0.1697 | 0.0292 | 1.72 | 0.6551 | 0.1442 | 0.1398 |
| Mixture of Gammas | 50 | 1 | 0.9672 | 0.2460 | 0.0616 | 0.74 | 1.0997 | 0.3057 | 0.2228 |
| | 100 | 1 | 0.9846 | 0.1593 | 0.0256 | 0.74 | 1.1940 | 0.2091 | 0.2498 |
6. Numerical Example
Our motivating example is a study of children with chronic hepatitis B virus (HBV) infection. Hepatitis B is an infectious liver disease caused by HBV. About a quarter of the world's population has been infected. Patients with chronic HBV may infect others over a long period of time and are more likely to develop liver cirrhosis and cancer. It is thus important to control and monitor this disease. HBeAg (hepatitis B e antigen) is a marker of a patient's degree of infectiousness; a positive result indicates that the person has high levels of virus and greater infectiousness. E-seroconversion occurs when an infected individual's immune system produces the corresponding antibodies to the e antigen. This is an important therapeutic end point and the primary interest of this study.
Our goal is to understand the seroconversion process of the e antigen and its association with two risk factors: ALT (alanine aminotransferase) measured at the baseline clinical visit, and the HBV status (yes = 1, no = 0) of the child's mother. ALT is the liver enzyme marker that is followed most closely in those chronically infected with hepatitis B. An elevated level of ALT indicates damage to liver cells. Because ALT levels can take extremely large values, a logarithm transformation is often applied, and the covariate used in the analysis is the logarithm of the baseline ALT level.
The study includes 107 HBeAg-positive children from 49 families, recruited between 1974 and 1992 and followed up until 2008. Since subjects entered the study at different ages, some of them had completed e-seroconversion before the first clinical visit; the survival times of those patients are thus left-censored. Those who had not developed e-seroconversion at the time of entry into the study are subject to the usual right censorship. Thus, this data set is subject to the double censorship considered in this paper. The left-censoring rate is about 2.8% and the right-censoring rate is about 19.4%. A detailed description of the data can be found in Wu et al. (2006), which included a subset of the sample and focused on a different problem. Although the left-censoring rate is low in this data set, ignoring the left-censored subjects may result in a biased sample, as with left-truncated data. To avoid this issue, we retain those subjects in the dataset.
Due to the familial structure, a frailty model is employed to accommodate the dependence among subjects from the same family. We consider a shared-frailty model with these two covariates and apply the proposed approach in Sections 2 and 3 to obtain statistical inferences. The results are provided in Table 4. The mother's HBV status is insignificant but negatively associated with the incidence rate of HB e antigen seroconversion. In the final model, the regression coefficient of the logarithm of the baseline ALT level is 0.6091 with a p-value of 0.0030, indicating a positive and significant effect on the incidence rate of e-seroconversion. This may seem surprising at first, but it is consistent with the clinical observation that patients with a higher level of ALT when entering the study tend to have a higher incidence rate of e-seroconversion. This could be explained by higher ALT levels being more likely to trigger the development of antibodies to the HBV e antigen. The estimated variance of the frailty term is 1.4065 with an estimated standard error of 0.6865. That is, children from the same family tend to have correlated seroconversion times. The estimated cumulative baseline hazard function is shown in Figure 2 along with a 95% pointwise confidence band obtained from the bootstrap.
Table 4. Fitted results for the HB study under the full and reduced models.
| Model | Parameters | Estimates | Estimated SD | p-value |
|---|---|---|---|---|
| Full | Mom HBV carrier | −0.0311 | 0.4348 | 0.9430 |
| | Baseline ALT | 0.6191 | 0.2101 | 0.0032 |
| | v | 0.5303 | 0.2370 | – |
| Reduced | Baseline ALT | 0.6091 | 0.2053 | 0.0030 |
| | v | 0.4723 | 0.2350 | – |
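The p-values in Table 4 are numerically consistent with two-sided Wald tests based on a normal approximation, that is, with referring the ratio of the estimate to its estimated standard deviation to a standard normal distribution. The text does not state which test was used, so the sketch below is an assumption that happens to reproduce the reported values to the printed precision. The function name is ours.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value for H0: parameter = 0 under a normal approximation."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    rows = [
        ("Full: Mom HBV carrier", -0.0311, 0.4348),
        ("Full: Baseline ALT", 0.6191, 0.2101),
        ("Reduced: Baseline ALT", 0.6091, 0.2053),
    ]
    for label, est, se in rows:
        print(f"{label}: p = {wald_p_value(est, se):.4f}")
```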
Fig 2.
The solid line is the estimated cumulative baseline hazard function obtained from the proposed method. The dashed lines give a pointwise 95% confidence band from the bootstrap.
We close this section with a remark on the usage of the baseline ALT. Due to the sampling plan, the ALT measurements taken from the left-censored subjects at their first clinical visits are post-seroconversion; hence there may be a concern that the significant result is due to reverse causation. The validity of using the baseline ALT obtained from the left-censored subjects can be justified by the following two reasons: (1) the left-censoring proportion is very low (only 3 out of 107 subjects) for this data set, so it is unlikely to induce a serious bias, and (2) ALT levels tend to stabilize to a normal level after seroconversion, so the ALT level at study entry for a left-censored subject is likely to be lower than the ALT level prior to seroconversion. The implication is that the actual p-value should be smaller than the ones reported in Table 4, leading to an even more significant finding. Thus, the significant finding observed in this paper is not a result of reverse causation. To provide further assurance, a separate analysis is conducted on the same data but omitting the three left-censored subjects. This results in a left-truncation (left-censored data are truncated) and right-censoring (LTRC) scenario. The new algorithm we developed for LTRC clustered survival data resulted in an estimate of 0.7238 (S.E. = 0.2570) for the regression coefficient of ALT (p-value = 0.0049). Thus, the new analysis underscores the significant association between the baseline ALT level and seroconversion time.
7. Conclusions
In this paper, we propose a likelihood approach to estimate the unknown components in a shared-frailty model for clustered survival data that are subject to double censoring. We show that the nonparametric maximum likelihood method leads to consistent and semi-parametrically efficient estimators for all finite-dimensional parameters and for the cumulative baseline hazard function.
Two estimates for the standard deviation of the NPMLE for finite-dimensional parameters are investigated, one based on bootstrap method and the other based on a new quadratic approximation for the profile likelihood. Both approaches are supported by numerical evidence and lead to reliable and stable estimates. They complement each other in that the bootstrap method is conceptually simpler but computationally costly. The quadratic approximation method is computationally efficient and a remedy for the simple profile likelihood approach proposed in Murphy, Rossini and van der Vaart (1997), which often leads to negative estimates of the standard errors when data are doubly censored.
In addition to theoretical contributions, a new and effective algorithm is proposed to estimate the nonparametric maximum likelihood estimates through a modified EM algorithm by treating the unobserved frailty terms and all left-censored survival times as missing data. The distinctive features of the proposed algorithm are: (i) it provides a computationally simple and stable algorithm that involves only one-dimensional Monte-Carlo integrations, with respect to the latent frailty, in the E-step of the EM-algorithm, (ii) it involves simple and tractable maximization in the M-step of the EM-algorithm, and (iii) for a special and simpler case where the frailty variable is constant, it involves no Monte-Carlo integration and overcomes the computational instability of an existing method (Kim, Kim and Jang, 2010) that tackles the full nonparametric likelihood by the Gauss-Seidel method, which involves solving high-dimensional equation systems. Thus, we not only provide a viable solution to a new problem but also resolve a lingering computational issue for independent left- or doubly-censored data.
Acknowledgments
We thank the Editor, the Associate Editor and two anonymous reviewers; their insightful suggestions greatly improved this paper. Furthermore, we are grateful to Dr. Masanao Yajima at Fred Hutchinson Cancer Research Center for his helpful comments.
This research is partially supported by NIH grants R21ES022332 and R01AG014358.
This research is partially supported by NIH grants 1R01AG025218-01 and 1R56AG043995-01.
Appendix
A.1. Construction of the likelihood contributed by uncensored and right-censored subjects
We focus on uncensored subjects. An analogous argument can be extended to right-censored subjects. For an uncensored subject, the observed data are (T̃ = t̃, δ = 1, η = 1, Z = z, L = l), where t̃ > l. Under the assumption of independence between W and Z, the conditional density fobs of the observed data given W = w is
| (10) |
The last equation holds since whenever t̃ > l it implies that η = 1. By substituting δ = 1 with R ≥ T,
| (11) |
By the conditional independence in C3 between T and (L, Y) given (Z, W) the right-hand side of (11) becomes
| (12) |
The non-informative assumption on L and Y in C3 implies that the second term above does not involve any parameter of interest, hence the observed left-censoring time does not contribute information to the likelihood.
A.2. Proof of Theorem 4.1
OUTLINE OF THE PROOF
We shall use a subscript n, the number of families, for the NPMLE, since the asymptotic properties are stated in terms of n. We point out that the NPMLE exists with probability 1 under our setting. This can be verified by an apagogic argument analogous to that on pages 2140–2141 of Zeng and Cai (2005). Consistency of the NPMLE can be demonstrated by first showing that Λ̂n (τ) is bounded almost surely as n → ∞. This implies that Λ̂n can be regarded as a bounded measure. Then, by Helly’s selection theorem and the compactness of the parameter space Θβ × Θγ, every subsequence of θ̂n = (β̂n, Λ̂n, γ̂n) has a further subsequence θ̂q(n) = (β̂q(n), Λ̂q(n), γ̂q(n)), indexed by a subsequence {q(n)} of {n}, that converges to a certain inner point θ* = (β*, Λ*, γ*), where Λ* is continuous as shown in Zeng and Cai (2005), and Λ̂q(n) converges uniformly to it in the whole parameter space. The proof will be completed if we can show that θ* = θ0. However, we do not know what θ* is, since there is no closed-form solution for θ̂n. Therefore, we rely on an intermediate function Λ̄n (·), which converges to Λ0 uniformly on [0, τ]. The claim that θ* = θ0 can then be established by arguments similar to those in the literature (Murphy, 1994; Dupuy, Grama and Mesbah, 2006). Below, we provide details of the proof.
To prove the boundedness of Λ̂n (τ), we take derivatives on the observed log likelihood function with respect to all λk’s, and it can be shown that
| (13) |
The second term in the numerator of (13) is bounded above by , since the sum of ak,ij (θ̂) over all k such that t̃k ≤ t̃ij is bounded above by 1. Then
| (14) |
The upper bound in (14) converges a.s. to a finite number as n tends to infinity, by the law of large numbers and assumptions A1, A2, and A4. This implies the boundedness of Λ̂n (τ). Consequently the NPMLE Λ̂n is a finite measure on [0, τ]. According to Helly’s selection lemma, every subsequence of Λ̂n has a further subsequence Λ̂q(n) such that ‖Λ̂q(n) − Λ*‖ converges to 0 with probability 1 on [0, τ].
Before defining an intermediate term used in this proof, we re-write Λ̂n as
where
and
stands for the empirical process. Then the intermediate term Λ̄ is defined as
Since the class {Q2(·, O, θ0) on [0, τ]} can be shown to be Glivenko-Cantelli by establishing the uniform boundedness and bounded variation of Q2, assumption A5 then implies the convergence of the first term to as n goes to infinity. Likewise, the pointwise convergence of the second term to , for each t* in [0, τ] can be established. It is easy to see that the sum of the above two limits is Λ0 (t). The Glivenko-Cantelli lemma and the continuity of Λ0 imply that ‖Λ̄n − Λ0‖ converges to 0 with probability 1. By the definition of Λ̂n (t*) and Λ̄n (t*), we have that Λ̂n (t*) is absolutely continuous with respect to Λ̄n (t*), and
where . By applying the Glivenko-Cantelli property along with the dominated convergence theorem, the convergence of θ̂q(n) to θ* implies that
Then the absolute continuity of Λ* with respect to Λ0 holds. Moreover, converges uniformly to .
Now we consider the following difference in log-likelihood
The left-hand side converges a.s. to Eθ0 [l(θ*) − l(θ0)] by Lebesgue’s theorem. Since the limit is the negative of a Kullback-Leibler divergence, which is non-positive, the only possibility is that the limit is exactly zero. By the identifiability under conditions C1 to C4, we conclude that θ* = θ0. The proof is now complete.
A.3. Proof of Theorem 4.2
Proof of Asymptotic Normality
The proof will follow the framework of Theorem 3.3.1 in van der Vaart and Wellner (1996) and involves several key steps.
Let Hp = {h = (h1, h2, h3) : |h1| + ‖ h2 ‖v + |h3| ≤ p}, where h1 and h3 ∈ ℝ1, h2 is a function of bounded variation on [0, τ], and ‖ h2 ‖v denotes the sum of the absolute value of h2 at 0 and the total variation of h2 on [0, τ]. Here we consider θ = (β, Λ, γ) as a functional on Hp defined as
Hence the parameter space Θ is a subspace of l∞(Hp). To verify the Fréchet differentiability of the score function, for a fixed h = (h1, h2, h3) ∈ Hp, we shall consider a one-dimensional submodel θt = (β + th1, Λt(h2), γ + th3), where t ∈ ℝ1 and . Here, for sufficiently small |t|, Λt(h2) satisfies the requirements of a cumulative hazard function since h2 is a function of bounded variation on [0, τ].
Let θ̃ denote a certain value of θ, the score function for t at θ along the direction h is
| (15) |
where
and
The corresponding Fréchet derivative of the score at the true value θ0 can be shown to be
| (16) |
where
and
We shall term σθ0 = (σθ0, 1, σθ0, 2, σθ0, 3) the Fisher information operator. Since the Fréchet derivative ∇θSθ0 (θ0) is a linear form of σ, it suffices to show the continuous invertibility of σ by proving that: (i) it is one-to-one, and (ii) it can be expressed as a sum of a continuously invertible operator and a compact operator. The one-to-one property (i) can be shown by an apagogic argument and is a consequence of the identifiability of the model.
To demonstrate (ii), we consider the following decomposition of the Fisher information operator.
where
and
The continuous invertibility of σθ0,L is straightforward under assumptions A1 to A6. To show the compactness of σθ0,C, we consider a sequence hn = (h1,n, h2,n, h3,n) ∈ Hp, and prove the existence of a convergent subsequence of σθ0,C (hn). By applying Helly’s selection theorem along with the Bolzano-Weierstrass theorem, we obtain a subsequence hq(n) of hn which converges to a limit h*. Since the norm of the distance between σθ0,C (hq(n)) and σθ0,C (h*) can be expressed as
| (17) |
assumptions A1 to A6 imply that (17) is bounded above by
for some constant c. The dominated convergence theorem gives the convergence of the upper bound to zero, and then implies the convergence of σθ0,C (hq(n)) to σθ0,C (h*). The operator σθ0,C has been shown to be compact and then the continuous invertibility of σθ0 holds.
In the following step, we demonstrate the convergence of the difference between the empirical score process Sn,θ̂n and the mean score process Sθ0 evaluated at the true θ0. The processes Sn,θ̂n (θ) and Sθ0(θ) are defined as follows. For the empirical score process, we define
where
For the mean score process, we define
where
To illustrate the convergence of the process , the main point is to demonstrate the Donsker property of the classes of functions shown in Sn,θ̂n (θ0). The Donsker property on the class {h1Sn,θ̂n,1(θ0) + h3Sn,θ̂n,3 (θ0) : |h1|, |h3| ≤ p} holds due to the boundedness assumptions in A4 and A5, and the fact that it is a parametric class, parameterized by h on a bounded subset, of measurable score functions. This is illustrated in Example 19.7 of van der Vaart (1998). Moreover, since a class of functions that are both uniformly bounded on [0, τ] and of bounded variation is Donsker, the Donsker property holds for the class {Sn,θ̂n,2(θ0)(h2) : h2 ∈ BVp}, where BVp is the space of functions of bounded variation whose total variations are smaller than p on [0, τ]. This leads to the convergence of to a tight element on l∞(Hp).
Next, we verify condition (a) in Theorem 3.3.1 in van der Vaart and Wellner (1996). From now on, we denote the score functions based on one cluster by sθ̃,O (θ)(h) = h1sθ̃,O,1(θ) + sθ̃,O,2(θ)(h2) + h3sθ̃,O,3(θ), where
According to Lemma 3.3.5 in van der Vaart and Wellner (1996), it suffices to show the following two steps: (i) the class of random functions {sθ,O(θ)(h) − sθ0,O(θ0)(h) : ‖ θ − θ0 ‖ < δ, h ∈ H}, for certain δ > 0, is Donsker, and (ii) as θ → θ0. The Donsker property for the class in (i) can be verified in a similar way as shown previously for condition (b), by looking at sθ,O,k(θ) − sθ0,O,k(θ0), k = 1, 2, 3. The second step follows from the dominated convergence theorem. Therefore, condition (a) holds.
We have now verified conditions (a), (b), and (c) in Theorem 3.3.1 in van der Vaart and Wellner (1996). Along with the consistency of θ̂n shown in Theorem 4.1, the weak convergence of is concluded.
Proof of semiparametric efficiency
The Fréchet differentiability and the of θ̂ shown previously imply
| (18) |
where op(1) is a random term that converges in probability to the zero element in l∞(Hp). Since the continuous invertibility of the Fisher information operator σ has been verified, its inverse operator, denoted as σ−1, exists, and for each given h we have h̃ = (h̃1, h̃2, h̃3) = σ−1(h). By replacing h by h̃ on the right-hand side of (18) and according to (16), we obtain the following equation.
| (19) |
Hence, converges weakly to a tight Gaussian element in l∞(Hp). By taking h2 = 0 in (19), we observe that the influence function for (β̂nh1, γ̂nh3) is a linear span of the score functions. By applying proposition 3.3.1 in van der Vaart and Wellner (1996), the semiparametric efficiency of (β̂, γ̂) is concluded.
A.4. Proof of Theorem 4.3
We complete the proof of the consistency of the bootstrap standard error by verifying the three conditions M1–M3 listed in Theorem 1 in Cheng (2012). In the following, we denote the log-likelihood contributed by a cluster by l(θ). Condition M1, which states the quadratic behavior of the log-likelihood, can be illustrated by considering the second-order Taylor expansion of the expected log-likelihood Eθ0 l(θ). By the identifiability of the model, θ0 maximizes Eθ0 l(θ); hence the expected difference between l(θ) and l(θ0) can be expressed by a linear form of the information operator defined in (16), with θ replaced by θ − θ0, plus the remainder term. By assumptions A1–A6, Eθ0 [l(θ) − l(θ0)] is bounded above by a certain constant times . Thus condition M1 holds for the current model. Condition M3 in Theorem 1 in Cheng (2012) requires the consistency of the NPMLE Λ̂ and the bootstrap NPMLE Λ̂*. The consistency of Λ̂ is established in Theorem 4.1. Analogously, the consistency of Λ̂* can be verified, since the log-likelihood from a nonparametric bootstrap sample can be expressed as a weighted sum of the cluster log-likelihood contributions with weights Mni, where Mni is the frequency of the ith cluster being resampled, and (Mn1, …, Mnn) ~ Multinomial (n, (n−1, …, n−1)).
Condition M2 in Theorem 1 in Cheng (2012) describes the moment condition of the empirical process over a class of functions defined as 𝒩δ = {l(θ) − l(θ0) : θ ∈ Θ, |β − β0| ≤ δ, |γ − γ0| ≤ δ, ‖Λ − Λ0‖∞ ≤ δ} for some δ > 0. Here we introduce some notation for later use. Let 𝒩δ be the envelope function of the class 𝒩δ. Define empirical processes and , where Pf = ∫ f dP, , and . Notation marked with * denotes the corresponding quantities based on bootstrap samples. Moreover, we define and , and write “a(b) ≲ b” to mean that a(b) is bounded by b, for all b, up to a universal constant.
Since the function l(θ) − l(θ0), for fixed θ0, is globally Lipschitz continuous with a finite constant Lipschitz coefficient under assumptions A1–A6, the Lp′-norm of the envelope function satisfies
| (20) |
It also implies that
| (21) |
By the compactness of the finite-dimensional parameter space Θβ × Θγ and the fact that the class of bounded monotone functions is a VC-hull class, the class 𝒩δ has a finite uniform entropy integral. This fact, along with (20), implies
| (22) |
Moreover, under the nonparametric sampling scheme, (21) and (22) lead to the following inequality, according to Appendix A.5 in Cheng (2012):
| (23) |
The two inequalities in (22) and (23) complete the verification of condition M2, and hence the theorem.
Footnotes
Supplement to “Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data”. Owing to space constraints, the proof of Proposition 3.2 is given in the supplemental material (Su and Wang, 2015).
References
- Booth JG, Hobert JP. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of Royal Statistical Society Series B - Statistical Methodology. 1999;61:265–285.
- Caffo BS, Jank W, Jones GL. Ascent-based Monte Carlo expectation-maximization. Journal of Royal Statistical Society Series B - Statistical Methodology. 2005;67:235–251.
- Cai T, Cheng S. Semiparametric regression analysis for doubly censored data. Biometrika. 2004;91:277–290.
- Chan KS, Ledolter J. Monte Carlo EM Estimation for Time Series Models Involving Counts. Journal of the American Statistical Association. 1995;90:242–252.
- Chang MN. Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Annals of Statistics. 1990;18:391–404.
- Chang MN, Yang GL. Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Annals of Statistics. 1987;15:1536–1547.
- Cheng G. A Note on Bootstrap Moment Consistency for Semiparametric M-Estimation. 2012.
- Cheng G, Huang J. Bootstrap Consistency for General Semiparametric M-Estimation. The Annals of Statistics. 2010;38:2884–2915.
- Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B - Statistical Methodology. 1972;34:187–220.
- Cox DR. Partial Likelihood. Biometrika. 1975;62:269–276.
- DeGruttola V, Lagakos SW. Analysis of doubly-censored survival data, with application to AIDS. Biometrics. 1989;45:1–11.
- Dupuy JF, Grama I, Mesbah M. Asymptotic theory for the Cox model with missing time-dependent covariate. Annals of Statistics. 2006;34:903–924.
- Fort G, Moulines E. Convergence of the Monte Carlo expectation maximization for curved exponential families. The Annals of Statistics. 2003;31:1220–1259.
- Kim YJ. Regression analysis of doubly censored failure time data with frailty. Biometrics. 2006;62:458–464. doi: 10.1111/j.1541-0420.2005.00487.x.
- Kim MY, DeGruttola V, Lagakos SW. Analyzing doubly censored data with covariates, with application to AIDS. Biometrics. 1993;49:13–22.
- Kim Y, Kim B, Jang W. Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. Journal of Multivariate Analysis. 2010;101:1339–1351.
- Kim Y, Kim J, Jang W. An EM algorithm for the proportional hazards model with doubly censored data. Computational Statistics and Data Analysis. 2013;57:41–51.
- Murphy SA. Consistency in a proportional hazard model incorporating a random effect. Annals of Statistics. 1994;22:712–731.
- Murphy SA. Asymptotic theory for the frailty model. Annals of Statistics. 1995;23:182–198.
- Murphy SA, Rossini AJ, van der Vaart AW. Maximum likelihood estimation in the proportional odds model. Journal of American Statistical Association. 1997;92:968–976.
- Mykland P, Ren JJ. Algorithms for computing self-consistent and maximum likelihood estimators with doubly censored data. The Annals of Statistics. 1996;24:1740–1764.
- Nielsen GG, Gill RD, Andersen PK, Sorensen TIA. A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist. 1992;19:25–44.
- Parner E. Asymptotic theory for correlated gamma-frailty model. Annals of Statistics. 1998;26:183–214.
- Ripatti S, Palmgren J. Estimation of multivariate frailty models using penalized partial likelihood. Biometrics. 2000;56:1016–1022. doi: 10.1111/j.0006-341x.2000.01016.x.
- Su Y-R. Survival analysis for incomplete data. PhD thesis. University of California, Davis; 2011.
- Su Y-R, Wang J-L. Supplement to “Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data”. 2015. doi: 10.1214/15-AOS1406.
- Therneau TM, Grambsch PM, Pankratz VS. Penalized survival models and frailty. Journal of Computational and Graphical Statistics. 2003;12:156–175.
- Tseng YK, Hsieh F, Wang JL. Joint modelling of accelerated failure time and longitudinal data. Biometrika. 2005;92:587–603.
- Turnbull BW. Nonparametric estimation of a survivorship function with doubly censored data. Journal of American Statistical Association. 1974;69:169–173.
- van der Vaart AW. Asymptotic statistics. Cambridge Univ. Press; 1998.
- van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer; New York: 1996.
- Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–454.
- Wu JF, Chen CC, Hsieh RP, Shih HH, Chen YH, Li CR, Chiang CY, Shau WY, Ni YH, Chen HL, Hsu HY, Chang MH. HLA typing associated with hepatitis B E antigen seroconversion in children with chronic hepatitis B virus infection: a longterm prospective sibling cohort study in Taiwan. The Journal of Pediatrics. 2006;148:647–651. doi: 10.1016/j.jpeds.2005.12.025.
- Zeng D, Cai J. Asymptotic Results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics. 2005;33:2132–2163.
- Zhang Y, Jamshidian M. On algorithm for the nonparametric maximum likelihood estimator of the failure function with censored data. Journal of Computational and Graphical Statistics. 2004;13:123–140.