Abstract
We study nonparametric estimation with two types of data structures. In the first data structure n i.i.d. copies of (C, N(C)) are observed, where N is a finite state counting process jumping at time-variables of interest and C a random monitoring time. In the second data structure n i.i.d. copies of (C ∧ T, I (T ≤ C), N(C ∧ T)) are observed, where N is a counting process with a final jump at time T (e.g., death). This data structure includes observing right-censored data on T and a marker variable at the censoring time.
In these data structures, easy-to-compute estimators are available, namely (weighted) pool-adjacent-violators estimators for the marginal distributions of the unobservable time variables, and the Kaplan–Meier estimator for the time T till the final observable event. These estimators ignore seemingly important information in the data. In this paper we prove that, at many continuous data generating distributions, the ad hoc estimators yield asymptotically efficient estimators of √n-estimable parameters.
Key words and phrases: Asymptotically linear estimator, asymptotically efficient estimator, current status data, right-censored data, isotonic regression.
1. Introduction
In this paper we study nonparametric estimation with two types of data structures. First, we discuss these two data structures in detail. Subsequently, we provide an overview of the rest of the paper.
1.1. Current status data on a finite counting process
Consider a finite state counting process N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t), where Tj is the time-variable at which a specified event occurs and where N jumps from value j − 1 to j at time Tj. The number of jumps k is fixed and known. We allow that there is a positive probability that the counting process never reaches jump j0 for any particular j0 ∈ {1, …, k}; since T1 < ··· < Tk, this implies that there is also a positive probability that N never reaches jump j for j = j0, …, k: that is, we allow multivariate distributions of (T1, …, Tk) with P (Tj = ∞) > 0 for j = j0, …, k. In this manner we allow applications in which the number of jumps of N is random on {1, …, k}.
We consider the data structure (C, N (C)) for a single random monitoring time C. The only assumption is that C is independent of N: the cumulative distribution G of C, and the probability distribution F of N are unspecified. Note that the distribution of N, denoted by F, is not a cumulative distribution function, but a probability distribution that is identified by the multivariate cumulative distribution of (T1, …, Tk).
Such data structures occur in cross-sectional studies where each subject is monitored once. For example, in some carcinogenicity experiments, one can only determine a discretized occult tumor size at time t in a randomly sampled mouse, as measured by N (t), by sacrificing a mouse at time t. In this example, T1 might represent time till onset of the tumor and T2, …, Tk might correspond with times till increasing sizes of the tumor. Similarly, Tj might denote the age at which a child has mastered the j th skill among a set of k skills ordered in difficulty. We refer to Jewell and van der Laan (1995) for additional applications.
The distribution of (C, N (C)) depends on the distribution of T⃗ = (T1, …, Tk) only through the marginal distributions Fj of Tj, j = 1, …, k (see Section 2). In this problem, the NPMLE of the distribution of Tj requires an iterative algorithm. On the other hand, an ad hoc method for estimation of the distribution of Tj is directly available: reduce the observation (C, N (C)) to a standard current status observation (C, Δj = I (Tj ≤ C)) on Tj. Then one can estimate the distribution of Tj with the NPMLE based on the reduced current status observations, which we will refer to as the reduced data NPMLE (RNPMLE). This estimator provides regular and asymptotically linear estimators of pathwise differentiable functionals of Fj such as μj = ∫(1 − Fj)(u)r(u) du, for a given r, in the nonparametric model under certain conditions [Groeneboom and Wellner (1992)]. Previous work and examples of traditional current status data on a time variable T can be found in Diamond, McDonald and Shah (1986), Jewell and Shiboski (1990), Diamond and McDonald (1992), Keiding (1991) and Sun and Kalbfleisch (1993). In its nonparametric setting, the current status data structure is also known as case I interval censored data [Groeneboom and Wellner (1992)]. Current status data commonly arise in epidemiological investigations of the natural history of disease and in animal tumorigenicity experiments. Jewell, Malani and Vittinghoff (1994) give two examples that arise from studies of Human Immunodeficiency Virus (HIV) disease.
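For concreteness, the data reduction underlying the RNPMLE is a one-line operation on the observed data, using only the identity {Tj ≤ C} = {N(C) ≥ j}. The following Python sketch is purely illustrative (the variable names are ours, not from the literature):

```python
import numpy as np

def reduce_to_current_status(C, NC, j):
    """Reduce (C, N(C)) observations to current status data (C, Delta_j) on T_j,
    using that {T_j <= C} = {N(C) >= j}.

    C  : monitoring times
    NC : observed counts N(C), values in 0, ..., k
    j  : index of the jump time of interest (1 <= j <= k)
    """
    C = np.asarray(C, dtype=float)
    delta_j = (np.asarray(NC) >= j).astype(int)
    return C, delta_j

# Example with k = 3 and five monitored subjects.
C = [1.2, 0.7, 2.5, 1.9, 3.1]
NC = [0, 1, 3, 2, 1]
print(reduce_to_current_status(C, NC, j=2))  # Delta_2 = (0, 0, 1, 1, 0)
```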
Note that the RNPMLE of Fj ignores the value of N (C), beyond information on whether N (C) ≥ j or not. For example, if N (t) is tumor size in a carcinogenicity experiment, then the simple current status estimator of the distribution of time, T1, till onset of tumor would not distinguish between an observation (C, N (C)) with N (C) large and an observation (C, N (C)) with N (C) small but larger than 0, while the latter observation seems to suggest that onset occurred recently. Nonetheless, we establish that the RNPMLE yields efficient estimators of pathwise differentiable parameters at a large class of continuous data generating distributions of interest.
1.2. Current status data on a finite counting process when the final event is right censored
We also consider the data structure (T̃k ≡ C ∧ Tk, N (T̃k)) for a finite state counting process N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t), where Tk represents the final event (say death) which is right censored by the monitoring time C, and k is known. Note that this observation includes observing the failure indicator I (T̃k = Tk). For example, consider a carcinogenicity experiment with mice in which T1 is time till onset of colon tumor, T2 time to liver metastasis and T3 time to death from tumor, where we assume that colon tumors do not cause death except through liver failure secondary to metastasis. Here C is either a sacrificing time or time till death from any unrelated cause.
Consider another example concerning estimation of the survival function of the time T = J − I between time I at seroconversion and time J at death of a hemophiliac patient infected with HIV. For this purpose we observe n i.i.d. subjects in a fixed time-interval of 10 years. If we assume that the time I at seroconversion of the subject is observed (which is approximately true for hemophiliacs), then the subject’s survival time T is right censored by C ≡ 10 − I, where T will play the role of Tk. We define Tj as the time till a given monotone “surrogate” process Z(t) achieves the jth value among a set of k − 1 increasing values, j = 1, …, k − 1, where we assume that death T = Tk always and only occurs after the value Z(Tk−1) has been reached. Let N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t) be the counting process. Here Z(t) measures the progression of the disease of the subject t years after seroconversion; for example, Z(t) might be a measure of viral load of the subject t years after seroconversion, where it may be reasonable to assume that the viral load is a nondecreasing process in the absence of treatment.
Suppose that for every subject who did not die before the end of the study C one measures the “surrogate” Z(C) at time C only. In other words, we observe failure times only for subjects who fail before end of follow up and for every subject who is alive at end of follow up we also have a marker indicating future prognosis. Note that the observed data on a subject is given by (T̃ = T ∧ C, Z(T̃)). We only assume that C is independent of Z. A seemingly ad hoc estimator of S(t) = P (T > t) is the Kaplan–Meier estimator which simply ignores the marker information. In this example, a natural question is whether one can improve on the Kaplan–Meier estimator using the information in the surrogate process Z. In this paper we prove that the Kaplan–Meier estimator is asymptotically efficient at many continuous data generating distributions for which Fj have compact support.
A special case of this data structure has been treated in the literature. Consider a carcinogenicity experiment with k = 2 and N(t) = I(T1 ≤ t) + I(T2 ≤ t), where T1 is time till onset of tumor and T2 is time till death from tumor. Thus one observes (T̃2 ≡ C ∧ T2, N (T̃2)). This data structure has been considered in Kodell, Shaw and Johnson (1982), Dinse and Lagakos (1982), Turnbull and Mitchell (1984), van der Laan, Jewell and Peterson (1997), and recently Groeneboom (1998). The NPMLE for this data structure requires an iterative algorithm: Turnbull and Mitchell (1984) implemented the NPMLE via the EM-algorithm (using an initial distribution with point masses at each data point so that the EM-algorithm indeed converges to the NPMLE), while Groeneboom (1998) implements the NPMLE by maximizing the actual likelihood with a modern optimization algorithm. In this problem, an ad hoc estimator of the distribution of T2 is the Kaplan–Meier estimator based on the reduced data (T̃2, Δ2 = I (T̃2 = T2)). In Dinse and Lagakos (1982), the Kaplan–Meier estimator of F2 was proposed and it was suggested that the NPMLE might be more efficient than the Kaplan–Meier estimator. In van der Laan, Jewell and Peterson (1997) it is shown that the Kaplan–Meier estimator is efficient under a weak condition on (F1, F2). Moreover, an isotonic regression estimator of F1 was provided: note that estimation of F1 is complicated by the fact that for some subjects one only observes T2, and thus that T1 < T2, where T2 cannot be viewed as an independent monitoring time for T1. We note here that, in van der Laan, Jewell and Peterson (1997), a simulation study was carried out which incorrectly implements the NPMLE, so that finite sample comparisons between the Kaplan–Meier estimator and the NPMLE remain open to study [specifically, the derivation of the score equations in van der Laan, Jewell and Peterson (1997) for the NPMLE was not valid since the authors incorrectly assumed that the NPMLE F̂1 is strictly smaller than the NPMLE F̂2].
1.3. Organization and overview of results
In Section 2 we prove, for the data structure of Section 1.1, that if the Fj’s are continuous with Lebesgue density bounded away from zero on [0, τj] and zero elsewhere, and G is also continuous, then any estimator of a parameter that is regular and asymptotically linear at PF,G is also asymptotically efficient. The complexity of the NPMLE is discussed, including the fact that it is more efficient at many data generating distributions with singular pairs Fj1, Fj2 (for example, F1 discrete and F2 continuous).
In Section 3, we prove an analogous result for the nonparametric model with the data structure (C ∧ Tk, N (C ∧ Tk)). This shows that the Kaplan–Meier estimator of the distribution of Tk, based on the reduced data (T̃k, Δk ≡ I (Tk ≤ C)), is asymptotically efficient at many continuous data generating distributions, extending the result in van der Laan, Jewell and Peterson (1997) for the case k = 2. Moreover, simple isotonic regression estimators for the distributions Fj, j = 1, …, k − 1, are proposed that also yield asymptotically efficient estimators of smooth functionals by our general result.
2. Current status data on a counting process
2.1. Traditional current status data
Traditional current status data can be viewed as current status data on a simple counting process as follows. Let T be a univariate failure time of interest and define the process Δ(t) = I (T ≤ t) as the counting process with one single jump at point T. Let Y = (C, Δ(C)) represent current status data on Δ at a monitoring time C. We assume that C is independent of T [i.e., of Δ(·)]. The parameter of interest is the distribution F of T.
The properties of the NPMLE Fn of the distribution of T were established in Groeneboom and Wellner (1992). Here the NPMLE is defined as the maximum likelihood estimator over all discrete distributions with jumps at the monitoring times. Beyond proving a limit distribution result for Fn, these authors also established efficiency of smooth functionals of Fn with a closed form expression of the limit variance so that Wald-type confidence intervals are directly available. Huang and Wellner (1995) provide an alternative proof of asymptotic linearity of the NPMLE of smooth functionals of F under weak conditions.
We refer to Bickel, Klaassen, Ritov and Wellner (1993) for definitions of a regular, asymptotically linear and efficient estimator and influence curve of an estimator. The semiparametric information bound at PF,G is defined as the infimum of parametric information bounds over a specified class of parametric submodels. We choose as parametric one-dimensional submodels ε → PFε,h1,Gε,h2,
where dFε,h1(·) = (1 + εh1(·)) dF(·), dGε,h2(·) = (1 + εh2(·)) dG(·) and ε is the unknown parameter with parameter space [−δ, δ] for some small δ > 0. The tangent space at PF,G is now defined as the closure in L₀²(PF,G) of the linear span of all the scores of these one-dimensional submodels, where, for a given measure μ, we define L₀²(μ) as the Hilbert space of square-integrable functions with mean zero, endowed with inner product 〈h1, h2〉μ = ∫h1(y)h2(y) dμ(y). Thus the tangent space at PF,G is a sub-Hilbert space of L₀²(PF,G).
In this paper it is particularly important to realize that efficiency of an estimator is a local property in the sense that a regular estimator can be efficient at a particular PF,G and inefficient at another element of the model.
Lemma 2.1
Consider the nonparametric model for Y = (C, Δ(C)), where Δ(·) ≡ I (T ≤ ·), T has unspecified distribution F and C is independent of T with unspecified distribution G. We observe n i.i.d. observations of Y = (C, Δ(C)). Consider the parameter μ = ∫(1 − F)(u)r(u) du for a given function r and the estimator μn = ∫(1 − Fn)(u)r(u) du, where Fn is the NPMLE of F. We have that μn is regular and asymptotically linear at any (F, G) for which F is continuous with density fT > 0 on [0, M] and zero elsewhere (M < ∞), g(x) = dG/dx > 0 on [0, M], and r is bounded on [0, M].
The influence curve of μn is given by
(1) IC(C, Δ) = (r(C)/g(C))(F(C) − Δ).
The variance of IC is given by ∫[0,M] r(u)²F(u)(1 − F(u))/g(u) du.
This lemma is proved in Huang and Wellner (1995).
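Because Lemma 2.1 gives the limit variance in closed form (of the form ∫ r(u)²F(u)(1 − F(u))/g(u) du), a Wald-type confidence interval for μ can be formed by plugging in estimates. The sketch below is only an illustration under that assumption; it presumes estimates of F and g on a grid covering [0, M], a point estimate μn and the function r are already available, and all names are ours.

```python
import numpy as np

def wald_ci_mu(grid, F_hat, g_hat, r, n, mu_hat, z=1.96):
    """Wald-type confidence interval for mu = int_0^M (1 - F)(u) r(u) du,
    based on the plug-in limit variance int_0^M r(u)^2 F(u)(1 - F(u)) / g(u) du
    (trapezoidal approximation of the integral; z = 1.96 gives a 95% interval)."""
    grid = np.asarray(grid, dtype=float)
    integrand = r(grid) ** 2 * np.asarray(F_hat) * (1.0 - np.asarray(F_hat)) / np.asarray(g_hat)
    sigma2_hat = np.trapz(integrand, grid)
    half = z * np.sqrt(sigma2_hat / n)
    return mu_hat - half, mu_hat + half
```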
We can also prove the following tangent space result.
Lemma 2.2
Consider the nonparametric model for Y = (C, Δ(C)), where Δ(·) ≡I (T ≤ ·), T has unspecified distribution F and C is independent of T with unspecified distribution G. We observe n i.i.d. observations of Y = (C, Δ(C)). Suppose that:
(1) F has a Lebesgue density f with f > 0 on [0, τF) and, if τF < ∞ (τF = ∞ is allowed), then f = 0 on (τF, ∞), and
(2) G has a Lebesgue density g.
We allow F ({∞}) > 0. Then the tangent space at PF,G equals L₀²(PF,G). This implies that an estimator of a parameter μ (F) which is regular and asymptotically linear at PF,G is also asymptotically efficient if F, G satisfy (1) and (2).
In Gill, van der Laan and Robins (1997) it is proved that if one only assumes that the conditional distribution of the observed data Y, given the full data T, satisfies “coarsening at random” (CAR), then the tangent space at PF,G is saturated, that is, equals L₀²(PF,G). The tangent space generated by G(· | T) under the sole assumption CAR equals TCAR ≡ {V(Y) ∈ L₀²(PF,G): E(V(Y) | T) = 0}. Therefore, the main idea of the proof below is to show that under the independent censoring model G(· | T) = G(·), the tangent space of the marginal distribution G equals TCAR at a PF,G satisfying (1) and (2) of Lemma 2.2. The proof below will be an ingredient of the proofs of our two main theorems.
Proof of Lemma 2.2
Let AF : L₀²(F) → L₀²(PF,G), AF(h)(Y) = E(h(T) | Y), be the score operator for F and let AF⊤, AF⊤(V)(T) = E(V(Y) | T), be its adjoint. The closure of the range of a Hilbert space operator A equals the orthogonal complement of the null-space of its adjoint; that is, R̄(A) = N(A⊤)⊥, where R̄(A) is the closure of the range of A and N (A⊤) is the null space of A⊤. Thus R̄(AF)⊥ = N(AF⊤).
The data generating distribution is indexed by two locally variation-independent parameters F and G, so that the tangent space at PF,G can be obtained as a sum of two tangent spaces, namely the tangent space for F, which is given by R̄(AF), and the tangent space for G. For every h2 ∈ L₀²(G) with finite supremum norm, we have that ε →(1 + εh2) dG is a one-dimensional submodel through G at ε = 0. Thus the tangent space corresponding with submodels ε → PF,Gε equals TG ≡ {h2(C): h2 ∈ L₀²(G)}. Thus we have that the tangent space is given by R̄(AF) + TG. We conclude that it suffices to show that N(AF⊤) ⊂ TG.
We have AF⊤(V)(T) = E(V(C, Δ(C)) | T) = ∫ V(c, I(T ≤ c)) dG(c).
Thus ∫V (c, Δ(c)) dG(c) = 0 F-a.e. implies that
(2) ∫[0,t) V(c, 0) dG(c) + ∫[t,∞) V(c, 1) dG(c) = 0 for F-a.e. t.
Differentiation w.r.t. t yields V (C, 0) = V (C, 1) on [0, τF) G-a.e. If τF < ∞ and c > τF, then c > T and thus V (c, Δ(c)) = V (c, 1). Thus V (C, 0) = V (C, 1) G-a.e., which proves N(AF⊤) ⊂ TG. □
It is of interest to note that one can represent FT (t) as a monotonic regression of Δ on C since F (t) = E(Δ (C) | C = t). This suggests that one can estimate FT with the estimator Fn(t) which minimizes the sum of squares ∑i=1,…,n (Δi − FT(Ci))² over all distribution functions FT. Fn(t) can be computed using the pool-adjacent-violators algorithm [see Barlow, Bartholomew, Bremner and Brunk (1972)] which, in fact, yields the NPMLE.
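A minimal implementation sketch of this computation (Python, illustrative names): sort the observations by C and apply the pool-adjacent-violators algorithm to the indicators. For simplicity the sketch assumes distinct monitoring times; it returns the NPMLE evaluated at the ordered monitoring times.

```python
import numpy as np

def pava(y, w=None):
    """Weighted pool-adjacent-violators algorithm: returns the nondecreasing
    vector m minimizing sum_i w_i (y_i - m_i)^2."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); sizes.append(1)
        # pool adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, w2, s2 = vals.pop(), wts.pop(), sizes.pop()
            v1, w1, s1 = vals.pop(), wts.pop(), sizes.pop()
            wt = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / wt)
            wts.append(wt)
            sizes.append(s1 + s2)
    return np.repeat(vals, sizes)

def current_status_npmle(C, delta):
    """NPMLE of F at the observed monitoring times from current status data
    (C_i, Delta_i): the monotone regression of Delta on C (distinct C's assumed)."""
    C = np.asarray(C, dtype=float)
    delta = np.asarray(delta, dtype=float)
    order = np.argsort(C)
    return C[order], pava(delta[order])
```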
2.2. Current status data on a counting process
Let the process of interest be a counting process N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t), where Tj is the time-variable at which an event occurs and where N jumps from value j − 1 to j. Let C be a monitoring time and consider the data structure Y = (C, N (C)). We observe n i.i.d. copies of Y. We only assume that C is independent of N.
The distribution of (C, N (C)) depends on the distribution of T⃗ only through the marginal distributions Fj of Tj, j = 1, …, k. To be precise, we have (denoting Si = 1 − Fi), for j ∈ {0, …, k},
P(N(C) = j | C = c) = (Sj+1 − Sj)(c), where S0 ≡ 0 and Sk+1 ≡ 1.
Thus the distribution of Y = (C, N (C)) only identifies the marginal distributions of Tj, j = 1, …, k.
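The following simulation illustrates this point under assumed distributions (uniform marginals, k = 2): two different joint laws of (T1, T2) with T1 < T2 a.s. and the same marginals produce the same distribution of (C, N(C)), while a functional of the joint law that is not determined by the marginals differs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two joint laws of (T1, T2) with T1 < T2 a.s. and identical marginals:
# T1 ~ U(0, 1) and T2 ~ U(1, 2) in both cases.
T1 = rng.uniform(0.0, 1.0, n)
couplings = {
    "comonotone": T1 + 1.0,                    # T2 fully determined by T1
    "independent": rng.uniform(1.0, 2.0, n),   # T2 independent of T1
}
C = rng.uniform(0.0, 2.0, n)                   # monitoring time, independent of (T1, T2)

for label, T2 in couplings.items():
    NC = (T1 <= C).astype(int) + (T2 <= C).astype(int)   # N(C) for k = 2
    # Observed-data probabilities P(N(C) = j, C <= c): determined by the marginals,
    # so the two rows agree up to Monte Carlo error.
    obs = [np.mean((NC == j) & (C <= c)) for j in (0, 1, 2) for c in (0.75, 1.5)]
    # A full-data functional depending on the joint law: differs across couplings.
    full = np.mean(T2 - T1 <= 1.1)
    print(label, np.round(obs, 3), "P(T2 - T1 <= 1.1) =", round(float(full), 3))
```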
The NPMLE does not exist in closed form and can only be computed with an iterative algorithm. For a given j, we can reduce the observation (C, N (C)) to simple current status data (C, Δj = I (Tj ≤ C)) on Tj, and estimate Fj with the RNPMLE. Under the conditions stated in Lemma 2.1, with F = Fj and G = G, this estimator provides regular and asymptotically linear estimators of smooth functionals of the type μj = ∫(1 − FTj)(u)r(u) du, for a given r, in the nonparametric model. The following theorem proves that, at a data generating distribution of Y satisfying a specified condition, any regular asymptotically linear estimator will provide asymptotically efficient estimators of smooth functionals of FTj. We decided to state a condition (3) which is easy to understand, but our proof shows that this can be weakened, for example, to allow the analogue of (3) for the case where all distributions G, F1, …, Fk are discrete with a finite number of support points; that is, the support points of Fj are contained in the support points of Fj+1, j = 1, …, k − 1, and G is discrete with support contained in the support of Fk.
Theorem 2.1
Let T1 < T2 < ··· < Tk be time-variables corresponding to the chronological events of interest. Define the counting process with jumps of size 1 at these Tj’s by N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t).
Let Y = (C, N (C)). Consider the following semiparametric model for Y: Let C ~ G be independent of T⃗~ F, but leave G and F unspecified. Then, the distribution of Y only depends on the multivariate distribution F of T⃗ = (T1, …, Tk) through the marginal distributions F1, …, Fk of T1, …, Tk.
Consider a data generating distribution PF,G in the model above, satisfying the following condition (3): For certain τ1 < ··· < τk < ∞, let Fj have Lebesgue density fj on [0, τj] with
(3)
We allow that pj ≡ P (Tj = ∞) > 0 for j = j0, …, k and j0 ∈ {1, …, k}.
Then the tangent space at PF,G equals L₀²(PF,G) and is thus saturated.
This implies that any estimator of a real valued parameter of F that is a regular and asymptotically linear estimator at PF,G is also asymptotically efficient if PF,G satisfies (3). In particular, given j ∈ {1, …, k}, if PF,G satisfies (3), and Fj, G satisfy the conditions of Lemma 2.1 for the RNPMLE of μFj based on (C, I (Tj ≤ C)) (thus with F = Fj and G = G), then the RNPMLE of μFj is asymptotically efficient.
2.2.1. Heuristic understanding of the difference between NPMLE and RNPMLE
To understand the difference between the NPMLE and the RNPMLE, we consider the special case k = 2 in detail. In this case N can have three possible values: N(C) = 0 if C < T1, N(C) = 1 if T1 ≤ C < T2, and N(C) = 2 if T2 ≤ C.
Let us assume that C has a Lebesgue density g. The likelihood of (C, N (C)) is given by pF1,F2,G(c, j) = g(c)S1(c) for j = 0, g(c)(S2 − S1)(c) for j = 1 and g(c)F2(c) for j = 2.
We note that the density pF1,F2,G can be reparametrized as pR,F2,G(c, j) = g(c)(RS2)(c) for j = 0, g(c)(S2(1 − R))(c) for j = 1 and g(c)(1 − S2)(c) for j = 2,
where R(t) ≡ S1(t)/S2(t). Thus, if we ignore the relation between F2 and R, then the NPMLE of F2 of the likelihood corresponding with pR,F2,G would actually be equal to the reduced data NPMLE based on the reduced data (C, I (T2 ≤ C)). However, F2 and R are related since S2R has to be a survival function. Therefore, it is not possible to determine the NPMLE by separate maximization w.r.t. F2 and R, which explains why the NPMLE and the RNPMLE of F2 differ.
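To make the reparametrization concrete, the following sketch (Python, illustrative names) evaluates the F-part of the log-likelihood at candidate values of S2 and R at the observed monitoring times. The terms involving S2 do not involve R, so unconstrained maximization over S2 reproduces the reduced data NPMLE of F2; only the requirement that S2R be a survival function couples the two maximizations.

```python
import numpy as np

def loglik_F_part_k2(NC, S2_at_C, R_at_C):
    """F-part of the log-likelihood of (C, N(C)) for k = 2 in the (R, F2)
    parametrization (the g(C)-factor does not involve F and is dropped):
        P(N(c)=0) = R(c)S2(c),  P(N(c)=1) = S2(c)(1 - R(c)),  P(N(c)=2) = 1 - S2(c),
    where R = S1/S2.  NC holds the observed values of N(C); S2_at_C and R_at_C
    are candidate values of S2 and R at the observed monitoring times."""
    NC = np.asarray(NC)
    S2 = np.asarray(S2_at_C, dtype=float)
    R = np.asarray(R_at_C, dtype=float)
    cell = np.where(NC == 0, R * S2, np.where(NC == 1, S2 * (1.0 - R), 1.0 - S2))
    return float(np.sum(np.log(cell)))
```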
Theorem 2.1 shows that this relation between F2 and R is not informative for estimation of smooth functionals of F2 at a large class of data generating distributions, since the RNPMLE, which ignores this relation, is still asymptotically efficient for estimation of √n-estimable parameters. Our proof of Theorem 2.1 for k = 2 shows that the efficient score operator (for the definition of an efficient score operator, see the proof) of F2 equals the efficient score operator for F2 in the reduced data model based on (C, Δ2). This implies that, at (F1, F2) satisfying (3), the efficient influence curve for any smooth functional of F2 equals the influence curve of the RNPMLE as given in Lemma 2.1. Closer inspection of the proof for k = 2 also shows that, if (e.g.) F2 is continuous while F1 is discrete on [0, τ1], or F2 is discrete with support not containing the support of a discrete F1, then the efficient score operator for F2 is not the same as the efficient score operator for F2 in the reduced data model, so that, in particular, the efficient influence curves (and information bounds) differ for the two models. Thus, at such (F1, F2), the RNPMLE of smooth functionals of F2 is inefficient.
Here, we provide a likelihood-based explanation of this fact. Let Rn be the NPMLE of R. The NPMLE of F2 maximizes the likelihood corresponding with pRn,F2 over all F2 for which S2Rn is a survival function, while the RNPMLE maximizes the likelihood over all distributions F2. Suppose now that the model consists of discrete F1’s and continuous F2’s. This model, though smaller than the model with F1, F2 being unspecified, has the same semiparametric efficiency bound at a (F1, F2) in this smaller model as the efficiency bound in the original model. This follows from the fact that the class of one-dimensional submodels as needed to compute the tangent space can still be chosen the same. In this smaller model, an R = S1/S2 will be discrete at the support points of F1, and the shape of R between the support points equals the shape of 1/S2. As a consequence, since R determines the shape of F2 between the support points, knowing R in the smaller model helps enormously in estimating S2. In particular, for a given Rn, maximizing the likelihood corresponding with pRn,F2 over F2 with S2Rn being a survival function is very different from maximizing this likelihood over all possible distributions F2. This shows that the RNPMLE in the smaller model is inefficient at such (F1, F2). Since the efficiency bound in the smaller model is the same as the efficiency bound in the original model, the RNPMLE is also inefficient in the original model at such (F1, F2).
Proof of Theorem 2.1
We need to prove that assumption (3) implies that the tangent space at PF,G equals L₀²(PF,G), and is thus saturated. The data generating distribution PF,G is indexed by F and G, where the dependence on F is only through the marginals Fj, j = 1, …, k. Thus, the tangent space at PF,G can be obtained as a sum of two tangent spaces, namely the tangent space for F and the tangent space for G, where the latter equals TG = {h(C): h ∈ L₀²(G)}. Let F, G be given and satisfy (3). We now claim that the tangent space for F is given by the closure of the sum of the k tangent spaces for Fj calculated as if the Fj’s are variation-independent parameters, j = 1, …, k. We will show this now. Let hj ∈ L₀²(Fj) have finite supremum norm, and let Fj,ε,hj, dFj,ε,hj = (1 + εhj) dFj, be the one-dimensional perturbation through Fj at ε = 0, j = 1, …, k. First, note that the support of Fj,ε,hj equals the support of Fj, j = 1, …, k. Since Fj > Fj+1 (strictly) on (0, τj] we have that, given an arbitrarily small δ1 > 0, there exists a δ > 0 such that, for all ε ∈ (−δ, δ), Fj,ε,hj ≥ Fj+1,ε,hj+1 on (δ1, τj] for all j = 1, …, k − 1. Thus, PFj,ε,hj,j = 1,…,k,G satisfies the constraints Fj ≥ Fj+1, j = 1, …, k − 1, of our model except on an arbitrarily small neighborhood of 0. Thus, by modifying hj on an arbitrarily small neighborhood of 0, we can make ε → PFj,ε,hj, j=1,…,k,G a true one-dimensional submodel. Since a tangent space for F is obtained as the closure in L₀²(PF,G) of the linear span of scores of all possible one-dimensional submodels, it follows that the score of the unmodified ε → PFj,ε,hj, j=1,…,k,G also belongs to the tangent space. This proves our claim.
Let j ∈ {1, …, k} be given. For a given hj ∈ L₀²(Fj), we consider the one-dimensional submodel Fj,ε given by ε → (1 + εhj (t)) dFj (t) which goes through Fj at ε = 0. For notational convenience, define the random variable R = N (C) + 1 ∈ {1, …, k + 1}, and let F−j be the (k − 1)-dimensional vector of c.d.f.’s excluding Fj. This one-dimensional submodel Fj,ε implies a score for PFj,ε,F−j,G given by
If we define S0 ≡ 0 and Sk+1 ≡ 1, then, for j = 1, …, k,
where we use that S1 − S0 = S1, and Sk+1 − Sk = Fk. Here Aj is called the score operator of Fj, j = 1, …, k. The tangent space for Fj is given by the closure of the range of Aj, denoted by R̄(Aj). Define AF by AF (h1, …, hk) = A1(h1) + ··· + Ak(hk). Then, the tangent space for F equals R̄(AF), so that the tangent space at PF,G is given by R̄(AF) + TG. Thus, to prove the theorem, it suffices to show that R̄(AF) + TG = L₀²(PF,G) at any F, G satisfying (3).
The remaining task is to understand the range of AF. We decompose AF as a sum of efficient score operators Aj*, j = 1, …, k, where Aj* is defined as Aj minus its projection on the sum-space spanned by the ranges of the other score operators A1, …, Aj−1, Aj+1, …, Ak. We will prove that the efficient score operator of Fj at a PF,G satisfying (3) equals the score operator for the reduced current status data structure (C, Δj), j = 1, …, k. Since the information bounds for smooth functionals of Fj are, in both models, solely expressed in terms of the efficient score operator for Fj, the latter result proves that an efficient estimator of μj based on (C, Δj), j = 1, …, k, like the RNPMLE, is also efficient in the model for the more informative data structure (C, N (C)) [e.g., Bickel, Klaassen, Ritov and Wellner (1993)]. This proves that the RNPMLE actually yields efficient estimators. Subsequently, we show that this special structure of the efficient score operators implies that the tangent space at a PF,G satisfying (3) is saturated, proving the more general statement of Theorem 2.1.
Derivation of the efficient score operators of Fj
Since E(Al(hl)(Y)Am(hm)(Y)) is equal to 0 if | l − m | ≥ 2, it will follow that the efficient score operators mainly involve projections of the type Π(Aj(hj) | R(Aj−1)) and Π(Aj(hj) | R(Aj+1)). Therefore we first obtain closed form expressions, in general, for these projection operators.
If the projection is actually an element of R(Aj −1), then this projection is given by (compare with the formula X(X′X)−X′Y for the least squares estimator):
(4) Π(Aj(hj) | R(Aj−1)) = Aj−1(Aj−1⊤Aj−1)−Aj−1⊤Aj(hj),
where Aj−1⊤ is the adjoint of Aj−1, and (Aj−1⊤Aj−1)− stands for the generalized inverse of Aj−1⊤Aj−1. Similarly,
(5) Π(Aj(hj) | R(Aj+1)) = Aj+1(Aj+1⊤Aj+1)−Aj+1⊤Aj(hj).
The adjoint Al⊤ : L₀²(PF,G) → L₀²(Fl) of Al is defined by Al⊤(V)(Tl) = E(V(C, R) | Tl).
It is easily shown that for l ∈ {1, …, k},
We have that
where
or, in fact, with our convention of S0 = 0 and Sk+1 = 1,
Here φl (t) ≡ 0 if Sl(t) = 0.
If pl = P (Tl = ∞) > 0, then we can write
Thus, given a K with K ≪ G, a solution (if it exists) of has to satisfy: for G-a.e., c ∈ [0, τl],
(6)
and, if pl = P (Tl = ∞) > 0, then the equation yields
(7)
Thus, even when pl > 0, (6) is the principal equation to solve (and will imply our conditions) since its solution hl on [0, τl] yields the complete solution hl(Tl) = hl(Tl)I[0,τl](Tl) + I(Tl = ∞)hl(∞). This two-step method for solving for hl first solves for hl I[0,τl] and then uses that, if pl > 0, hl (∞) is a function of hl I[0, τl].
We have, for l ∈ {1, …, k − 1},
We note that this element is indeed absolutely continuous w.r.t. G. Similarly, it follows that, for l ∈ {1, …, k − 1},
Thus, hj−1,j ≡ (Aj−1⊤Aj−1)−Aj−1⊤Aj(hj) is the h satisfying
(8)
for G-a.e. c ∈ [0, τj−1] and, if pj−1 > 0, then h(∞) is a simple function of hI[0,τj−1] as given above. Similarly, hj+1,j ≡ (Aj+1⊤Aj+1)−Aj+1⊤Aj(hj) is the h satisfying
(9)
for G-a.e. c ∈ [0, τj+1] and, if pj+1 > 0, then h(∞) is a simple function of hI[0,τj+1]. If we can take a derivative of the right-hand sides in (8) and (9) w.r.t. Fj−1 and Fj+1, then, in terms of h, equations (8) and (9) have a solution. This is possible if Fj ≪ Fl (i.e., Fj is absolutely continuous w.r.t. Fl) on [0, τl], l ∈ {j − 1, j + 1}, which holds under assumption (3) since we assumed that all Fj have positive Lebesgue density on [0, τj]. The efficient score operator also involves projections requiring existence of solutions hl−1,l, hl+1,l for l different from j. Therefore, the assumed condition (3) includes (via an easy-to-understand condition) the necessary and sufficient conditions for the existence of hl−1,l, hl+1,l for all possible l, as needed below.
This gives the following closed form expressions for the projections (4) and (5) by simply replacing the argument h in Al(h) by the expressions above. We have, for j = 1, …, k − 1,
(10)
and, for j = 2, …, k,
(11)
For simplicity we derive the efficient score operators for the case k = 3. (The proof generalizes to the general case.) First, define
The efficient score operators are given by
Calculation of the efficient score operator of F2. Applying (10) and (11) with j = 2 gives us
and
Thus,
Now, notice that
Thus (using ),
Calculation of the efficient score operator of F1. Formula (10) with j = 1 gives us
Thus,
We now note that
Thus,
It is easily verified that the adjoint is given by
Subsequently, we can now verify that
where
We need to find with
This solution has to satisfy on [0, τ2]:
and, as shown previously, h23,1(∞) is a simple function of h23,1I[0,τ2]. We note that h23,1 exists under the assumption Fj ≡ Fk (i.e., Fj ≪ Fk and Fk ≪ Fj) on [0, τj], j = 1, …, k − 1, which follows from (3). We conclude that
Using F2/(F1(S2 − S1)) − 1/(S2 − S1) = −1/F1 and yields
Calculation of the efficient score operator of F3. This calculation is very similar to the one above for F1 and is omitted. We have
Proving that the tangent space is saturated
Given the expressions for the efficient score operators derived above, we now prove that the tangent space at a PF,G satisfying (3) is saturated. Under our assumption (3), the tangent space equals TG (the scores generated by G) plus the closure of the range of the operator A* defined by A*(h1, …, hk) = A1*(h1) + ··· + Ak*(hk),
where the marginal efficient score operators Aj* are as derived above. The closure of the range of a Hilbert space operator equals the orthogonal complement of the null-space of its adjoint, that is, R̄(A*) = N(A*⊤)⊥. Thus we need to show that N(A*⊤) ⊂ TG. The adjoint A*⊤ is given by A*⊤(V) = (A1*⊤(V), …, Ak*⊤(V)),
where it is easily verified that the adjoint Aj*⊤ of Aj* is given by
Consider the operator mapping a function of (C, Δj) to its conditional expectation given Tj, defined on H(C, Δj), the space of functions of (C, Δj) with finite variance and zero mean (both taken w.r.t. PF,G). Using precisely the same proof as the proof of Lemma 2.2, it follows that, if Fj has a Lebesgue density fj > 0 on [0, τj], then the null-space of this operator consists of functions of C alone, that is, it consists of functions independent of Δj. Thus, under (3), A*⊤(V) = 0 implies that E(V (C, R) | C, Δj) = E (V (C, R) | C) ≡ φ(C), j = 1, …, k.
Setting Δ1 = 0 yields φ(C) = E(V (C, R) | C, Δ1 = 0) = V (C, 1). Now, we note that
where P (R = m | c) = (Sm − Sm−1)(c). Thus, E(V (C, R) | C, Δj = 1) is given by
For j = k, this equality gives V (c, k + 1) = φ(c). For j = k − 1, this equality gives then
so that V (c, k) = φ(c). In this manner, we subsequently find φ(c) = V (c, k + 1) = V (c, k) = … = V (c, 2). This shows that V (C, R) does not depend on R. This completes the proof.
3. Current status data on a counting process when final event is right censored
The following theorem proves efficiency of any regular asymptotically linear estimator at a specified rich sub-model.
Theorem 3.1
Let N (t) be a counting process for random variables T1 < ··· < Tk. Let C be a random censoring time. For every subject we observe the following data structure: Y = (T̃, N(T̃)), where T̃ ≡ C ∧ Tk (so that Δ ≡ I(Tk ≤ C) = I(N(T̃) = k) is observed).
We assume that C is independent of (T1, …, Tk). The distribution of Y only depends on the multivariate distribution F of (T1, …, Tk) through the marginal distributions F1, …, Fk of (T1, …, Tk).
Consider a data generating distribution PF,G in the model above satisfying the following condition (12): For certain τ1 < …< τk < ∞, let Fj have Lebesgue density fj on [0, τj] with
(12)
We allow that pj ≡ P(Tj = ∞) > 0 for j = j0, …, k and j0 ∈{1, …, k}.
Then, the tangent space at PF,G equals L₀²(PF,G) and is thus saturated. This implies that an estimator of a real valued parameter of the distribution F which is regular and asymptotically linear at PF,G is also asymptotically efficient if PF,G satisfies (12). In particular, if Ḡ(t) > 0 and F, G satisfy (12), then the Kaplan–Meier estimator Sk,KM (t) of Sk(t) = P (Tk > t), based on the i.i.d. data (T̃, Δ), is asymptotically efficient.
3.1. Regular and asymptotically linear estimators
The important implication of Theorem 3.1 is that, if we can construct an estimator of √n-estimable parameters of Fj which is regular and asymptotically linear, then this estimator will be asymptotically efficient at any F satisfying (12), j = 1, …, k. In this subsection, we provide relatively simple regular and asymptotically linear estimators.
First, consider estimation of Sk(t) = P (Tk > t). It is well known that Sk,KM (t) is a regular asymptotically linear estimator of Sk(t) whenever Ḡ(t) > 0. Second, consider estimation of Sj (t) = P (Tj > t), j = 1, …, k − 1. Let Δj ≡ I (Tj ≤ C). Under independent censoring (we can weaken this to noninformative censoring of Tk), we have
(13) E(1 − Δj | C, Tk > C) = Sj(C)/Sk(C).
So
(14) Sj(C) = E(Sk(C)(1 − Δj) | C, Tk > C).
In other words, estimating Sj can be viewed as estimating a monotonic regression of Sk(C)(1 − Δj) on the observed C’s. This suggests replacing Sk by the efficient Kaplan–Meier estimator Sk,KM and minimizing
(15) ∑i=1,…,n wi I(Ci < Tk,i){Sk,KM(Ci)(1 − Δj,i) − Sj(Ci)}²
over the vector (Sj (Ci): i = 1, …, n), under the constraint that Sj is monotone, where wi, i = 1, …, n, is a given set of weights possibly assigning more mass to observations with smaller variance. The solution Sj,n of this problem can be obtained with the pool-adjacent-violator-algorithm (PAVA) [see, e.g., Barlow, Bartholomew, Bremner and Brunk (1972)].
A simple calculation shows that
(16)
Since Rj is not identified from the data at a better rate than Sj, a good set of weights is the one proposed in van der Laan, Jewell and Peterson (1997).
It is beyond the scope of this paper to prove that smooth functionals of Sj,n are regular and asymptotically linear. Since it is straightforward to prove such a theorem for a standard histogram regression estimator of the regression of Sk(C)(1 − Δj) on the observed C’s, one expects that the more sophisticated isotonic regression estimate Sj,n (which only differs because it selects its bins adaptively) is regular and asymptotically linear under the same conditions. We note that the choice of weights wi, i = 1, …, n, has no effect on the limit distribution of smooth functionals of Sj,n.
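A minimal implementation sketch of the estimator of Section 3.1 (Python; all names are illustrative): a product-limit estimator of Sk is combined with a weighted antitonic regression, computed by PAVA, of Sk,KM(Ci)(1 − Δj,i) on the censoring times Ci of the censored subjects. In the sketch, Δj for censored subjects is supplied by the user (read off from N(T̃)), and the weights are an optional argument, in line with the remark above that their choice does not affect the limit distribution of smooth functionals.

```python
import numpy as np

def kaplan_meier(t_tilde, delta):
    """Product-limit estimator of S_k(t) = P(T_k > t) from right-censored data
    (t_tilde = T_k ^ C, delta = I(T_k <= C)).  Returns a step function."""
    t_tilde = np.asarray(t_tilde, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(t_tilde[delta == 1])
    surv, km_t, km_s = 1.0, [], []
    for t in times:
        at_risk = np.sum(t_tilde >= t)
        deaths = np.sum((t_tilde == t) & (delta == 1))
        surv *= 1.0 - deaths / at_risk
        km_t.append(t); km_s.append(surv)
    def S(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        idx = np.searchsorted(km_t, t, side="right")
        return np.where(idx == 0, 1.0, np.r_[1.0, km_s][idx])
    return S

def pava_decreasing(y, w):
    """Weighted pool-adjacent-violators algorithm for a nonincreasing fit."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); sizes.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:   # violation of decreasing order
            v2, w2, s2 = vals.pop(), wts.pop(), sizes.pop()
            v1, w1, s1 = vals.pop(), wts.pop(), sizes.pop()
            wt = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / wt); wts.append(wt); sizes.append(s1 + s2)
    return np.repeat(vals, sizes)

def isotonic_Sj(t_tilde, delta, delta_j, weights=None):
    """Estimate S_j at the observed censoring times by weighted antitonic regression
    of S_{k,KM}(C_i)(1 - Delta_{j,i}) on C_i, using only censored subjects (delta == 0)."""
    S_KM = kaplan_meier(t_tilde, delta)
    cens = np.asarray(delta) == 0
    C = np.asarray(t_tilde, dtype=float)[cens]
    y = S_KM(C) * (1.0 - np.asarray(delta_j, dtype=float)[cens])
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)[cens]
    order = np.argsort(C)
    return C[order], pava_decreasing(y[order], w[order])
```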
3.2. Proof of Theorem 3.1
In the first part of the proof we establish that, if condition (12) holds, then the efficient score operator of Fk equals the efficient score operator of Fk in the reduced data model for (T̃k, Δk), thereby establishing a proof of the efficiency of the Kaplan–Meier estimator SKM (t). Subsequently, exploiting this special form of the efficient score operator of Fk, we prove saturation of the tangent space and thus Theorem 3.1.
Consider the data structure (T̃k = Tk ∧ C, N (T̃k)), where N(t) = I(T1 ≤ t) + ··· + I(Tk ≤ t) and T1 < T2 < ··· < Tk are ordered random variables. Let R = N (T̃k) + 1. The density of the data is given by pF,G(t, r) = g(t)(Sr − Sr−1)(t) for r = 1, …, k (a censored observation at t = C with N(C) = r − 1) and pF,G(t, k + 1) = Ḡ(t)fk(t) (an uncensored observation T̃k = Tk = t), where S0 ≡ 0 and Sk+1 ≡ 1.
where S0 ≡ 0 and Sk+1 ≡ 1. We refer to the beginning of the proof of Theorem 2.1 to show that the tangent space at a PF,G satisfying condition (12) is the closure of the sum of the tangent spaces generated by Fj, j = 1, …, k and the tangent space of G, treating Fj as locally variation-independent. We have that the score operators: for Fj, j= 1, …, k − 1, are given by
and
Derivation of efficient score operator of Fk
We first determine the efficient score operator for Fk. For notational convenience, we consider the case k = 3. We have
where
Applying formula (11) gives
where we need to assume that F2 ≪ F1 on [0, τ1]. Thus, an easy calculation shows that
Another straightforward calculation shows that the adjoint of is given by
A straightforward calculation now shows that
We also have
This shows that h21,3 satisfies, on [0, τ2],
and, if p2 = P (T2 = ∞) > 0, then h21,3(∞) is a simple function of h21,3I[0,τ2] as shown above (see (7)). Here we need to assume that this equation can be solved for h21,3. This is true if F3 ≪ F2 on [0, τ2]. Then
This proves that
Thus, we have proved that, if Fk ≡ Fj on [0, τj], j = 1, …, k − 1, then the efficient score operator of Fk equals the efficient score operator of Fk in the reduced data model for (T̃k, Δk). The latter condition holds, in particular, if (12) holds. This proves the statement of Theorem 3.1 regarding efficiency of the Kaplan–Meier estimator SKM.
Saturated tangent space result
Note that, for a random variable Y, we define H(Y) as the space of functions of Y with finite variance and zero mean (both taken w.r.t. PF,G). For simplicity, we prove saturation for k = 3. Let A be defined by A(h1, h2) = A1(h1) + A2(h2). Then, the tangent space of F is given by R̄(A) + R̄(A3). Thus, the tangent space at PF,G is given by R̄(A) + R̄(A3) + R̄(B), where B is the score operator for the censoring mechanism G, given by B(h) = E(h(C) | T̃3, Δ3). By factorization of the likelihood into F and G parts, we have that R(B) is orthogonal to the F-scores. It is well known that the tangent space for the nonparametric right-censored data model for (T̃3, Δ3), only assuming that C is independent of T, is saturated, that is, equals H(T̃3, Δ3) [e.g., Bickel, Klaassen, Ritov and Wellner (1993)]; combined with the efficient score calculation above, this gives that the tangent space at PF,G equals R̄(A) + H(T̃3, Δ3). Thus, we need to prove that R̄(A) + H(T̃3, Δ3) = L₀²(PF,G), which is equivalent to proving N(A⊤) ⊂ H(T̃3, Δ3), where A⊤ is the adjoint of A and N(A⊤) denotes its null space.
First, we decompose A1 + …+ Ak− 1 into a sum of orthogonal operators (efficient score operators in the model with Fk known). Let and . By (4), it follows that
where we need the equivalence assumptions Fj ≡ Fj +1 on [0, τj] for j = 1, …, k, again. A more compact manner of representing these operators is
(17)
Consider the operator defined by . Proving is equivalent to proving , where A′⊤ is the adjoint of A′.
From the representation (17), the adjoint is given by
and thus, .
Consider now a solution V I (T3 > C) ∈ H (C, R) satisfying . In order to prove , it suffices to show I (T3 > C)V = I (T3 > C)φ(C) for some φ. Using precisely the same proof as the proof of Lemma 2.2, it follows that, if Fj has a Lebesgue density fj > 0 on [0, τj] and G has a Lebesgue density, then, for any function I (T3 > C)η(C, Δj), E(I (T3 > C) η (C, Δj) | Tj) = 0 implies η (C, 1) = η (C, 0). This proves that E(V (C, R)I (T3 > C) | C, Δj, T3 > C) = E(V (C, R)I (T3 > C) | C, T3 > C) ≡ I (T3 > C)φ(C) does not depend on Δj, j = 1, 2.
Setting Δ1 = 0 yields I (T3 > C)φ(C) = E(V (C, R)I (T3 > C) | C, Δ1 = 0, T3 > C) = V (C, 1)I (T3 > C). Now, we note that
Thus, E(V (C, R)I (T3 > C) | C, Δj = 1, T3 > C) is given by
For j = 2, this equality gives I (T3 > C)V (C, 3) = I (T3 > C)φ(C). For j = 1, this equality gives
so that I (T3 > C)V (C, 2) = I (T3 > C)φ(C). We have shown I (T3 > C)V (C, 1) = I (T3 > C)V (C, 2) = I (T3 > C)V (C, 3), which proves that V = I (T3 < C)V1(T3) + I (T3 > C)φ(C) for some functions V1 and φ, and thus that V is a function of (T̃3, Δ3), that is, V ∈ H(T̃3, Δ3). This completes the proof. □
Acknowledgments
The authors thank the referees and Associate Editor for their helpful comments.
Footnotes
Supported by a FIRST award (GM53722) from the National Institute of General Medical Sciences and the National Institutes of Health.
References
Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference under Order Restrictions. Wiley; New York: 1972.
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation in Semi-Parametric Models. Johns Hopkins Univ. Press; 1993.
Diamond ID, McDonald JW. The analysis of current status data. In: Trussell J, Hankinson R, Tilton J, editors. Demographic Applications of Event History Analysis. Oxford Univ. Press; 1992. pp. 231–252.
Diamond ID, McDonald JW, Shah IH. Proportional hazards models for current status data: Application to the study of differentials in age at weaning in Pakistan. Demography. 1986;23:607–620.
Dinse GE, Lagakos SW. Nonparametric estimation of lifetime and disease onset distributions from incomplete observations. Biometrics. 1982;38:921–932.
Gill RD, van der Laan MJ, Robins JM. Coarsening at random: Characterizations, conjectures and counterexamples. In: Proc. First Seattle Symposium in Biostatistics. Lecture Notes in Statist. Vol. 123. Springer; New York: 1997. pp. 255–294.
Groeneboom PJ. Special topics course 593C: Nonparametric estimation for inverse problems: algorithms and asymptotics. Technical Report 344, Dept. Statistics, Univ. Washington; 1998. (For related software see www.stat.washington.edu/jaw/RESEARCH/SOFTWARE/software.list.html.)
Groeneboom P, Wellner JA. Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser; Basel: 1992.
Huang J, Wellner JA. Asymptotic normality of the NPMLE of linear functionals for interval censored data, case I. Statist. Neerlandica. 1995;49:153–163.
Jewell NP, Malani HM, Vittinghoff E. Nonparametric estimation for a form of doubly censored data with application to two problems in AIDS. J. Amer. Statist. Assoc. 1994;89:7–18.
Jewell NP, Shiboski SC. Statistical analysis of HIV infectivity based on partner studies. Biometrics. 1990;46:1133–1150.
Jewell NP, van der Laan MJ. Generalizations of current status data with applications. Lifetime Data Analysis. 1995;1:101–109. doi: 10.1007/BF00985261.
Jongbloed G. Three statistical inverse problems. Ph.D. dissertation, Delft Univ. Technology; 1995.
Keiding N. Age-specific incidence and prevalence: A statistical perspective (with discussion). J. Roy. Statist. Soc. Ser. A. 1991;154:371–412.
Kodell RL, Shaw GW, Johnson AM. Nonparametric joint estimators for disease resistance and survival functions in survival/sacrifice experiments. Biometrics. 1982;38:43–58.
Sun J, Kalbfleisch JD. The analysis of current status data on point processes. J. Amer. Statist. Assoc. 1993;88:1449–1454.
Turnbull BW, Mitchell TJ. Nonparametric estimation of the distribution of time to onset for specific diseases in survival/sacrifice experiments. Biometrics. 1984;40:41–50.
van der Laan MJ, Jewell NP, Peterson DR. Efficient estimation of the lifetime and disease onset distribution. Biometrika. 1997;84:539–554.