Abstract
The Kaplan–Meier product-limit estimator is a simple and powerful tool in time-to-event analysis. An extension exists for populations stratified into cohorts, where a population survival curve is generated by weighted averaging of cohort-level survival curves. To make population-level comparisons using this statistic, we analyse the distribution of the area between two such weighted survival curves. We derive the large-sample behaviour of this statistic based on an empirical process of product-limit estimators. This estimator was used by an interdisciplinary National Institutes of Health–Social Security Administration team to identify medical conditions to prioritize for adjudication in disability benefits processing.
Keywords: survival analysis, Kaplan–Meier, heterogeneous distribution, non-parametric, hypothesis test, asymptotic analysis
1. Introduction
Survival analysis addresses the classical statistical problem of determining characteristics of the waiting time until an event, canonically death, from occurrences sampled within a population. The problem is not trivial because the expected waiting time typically depends on the time already waited. For instance, a hundred-year-old can be more certain of surviving to his or her one-hundred-and-first birthday than a newborn might reasonably be. However, the comparison may shift in the newborn's favour when considering survival to 121, particularly in light of medical advances that make survival probabilities non-stationary. Parametric approaches for assembling survival curves are usually not flexible enough to capture this complexity.
One simple approach to this problem was pioneered by the work of Kaplan & Meier [1]. Their product-limit estimator [2–5] is a non-parametric statistic that is used for inferring the survival function for members of a population from observed lifetimes. This method is particularly useful in that it naturally handles the presence of right censoring, where some event times are only partially observed because they fall outside the observation window. It was not, however, designed to account for varying subpopulations that may yield non-homogeneity in overall population survival. For instance, in the example given above, subpopulations with distinct survival characteristics may be defined by birth year or by a subject's entry cohort in a particular study (figure 1).
Figure 1.
Inhomogeneity of survival within populations can arise for at least two reasons. In (a), inhomogeneity results from a categorical covariate that influences survival statistics. In (b), inhomogeneity results from non-stationarity, where cohorts of individuals are sampled at different times. In this case, the problem of progressive censoring is apparent because later cohorts have not been observed as long.
Several existing statistical methods address variants of this limitation. A natural approach is to consider the varying subpopulations as defining underlying covariates, thus laying the framework for a proportional hazards model. The assumption of proportional hazards is quite strong; when considering time-dependent statistics (as in the motivational example), it is violated in all but a few specific cases. Likewise, frailty models, first developed by Hougaard (cf. [6]) and extended by Aalen (cf. [7]), capture heterogeneity through multivariate event distributions, but they too make assumptions on the underlying event distributions and assume proportional hazards.
Other existing methods, such as bivariate survival analysis (cf. [8]), consider the time to observation and the time to event as conditionally independent random times. Underlying these methods is the assumption that upon the time of observation, all individuals will then have a similar event time distribution, thus failing to acknowledge the temporal changes.
These complexities arose in the identification of new disorders to incorporate into the United States Social Security Administration (SSA)’s Compassionate Allowances (CAL) initiative. The CAL initiative seeks to identify candidate medical conditions for fast-tracking in the processing of disability applications. The intent of this initiative is to prioritize applicants who are most likely to die in the time-course of usual case processing so that they may receive benefits while still living.
At its inception, the CAL initiative identified conditions based on the counsel of expert opinion [9]. The SSA, in collaboration with the National Institutes of Health (NIH), sought to expand the list of CAL conditions systematically, using a data-based approach. Using in part the survival estimator described in this paper, the NIH identified 24 conditions for inclusion into the list of conditions [9].
The methodology used in CAL is related to that of the work of Pepe & Fleming (cf. [10,11]), where a class of weighted Kaplan–Meier statistics is introduced. Though these statistics exhibit the same limitations as in the standard Kaplan–Meier case, it should be noted that [11] introduces the stratified weighted Kaplan–Meier statistic. The statistic presented here is a priori quite similar, but instead of a weighting function, it includes the empirical prevalence. In doing so, the weight is no longer independent of the event time estimate, and thus requires substantially different methods of proof.
We thus consider the overall survival distribution for a population of individuals with subpopulations that exhibit non-homogeneous survival distributions. Through this consideration, a new test statistic, based upon the empirical process of product-limit estimators, is developed. Through constructive methods, this test statistic compares survival distributions among the distinct subpopulations, and weights according to the distribution of the identified subgroups.
2. Statistical method
Suppose Γ(1) and Γ(2) are disjoint populations of individuals where each individual belongs to exactly one of d distinct cohorts labelled z ∈ {1, …, d}. For randomly selected individuals γ ∈ Γ(i) within population i, we wish to understand the statistics of the event time under the assumption that survival is conditional on cohort and population.
One representation of the marginal survival probability for members of population i is found by conditioning on cohort

$$ S_t^{(i)} = \sum_{z=1}^{d} q_z^{(i)} S_{z,t}^{(i)}, \qquad (2.1) $$

where $S_{z,t}^{(i)}$ represents the survival function for individuals of cohort z in population i, $q_z^{(i)}$ is the prevalence of cohort z in population i, and each individual's cohort membership is known.
We use this representation of the survival probability as motivation to formulate an estimator for the population-average survival functions

$$ \hat{\Sigma}_t^{(i)} = \sum_{z=1}^{d} \hat{q}_z^{(i)} \hat{S}_{z,t}^{(i)}, \qquad (2.2) $$

where $\hat{q}_z^{(i)}$ and $\hat{S}_{z,t}^{(i)}$ are estimators of the cohort prevalence and cohort-wise survival, respectively. This weighted Kaplan–Meier method has appeared previously in the literature [12], and has been empirically validated against the pure Kaplan–Meier method [13], where the weighting procedure was found to reduce the bias in the construction of survival curves. The asymptotic convergence of the product-limit estimator and weighted variants is well established [11,14]. We use this survival curve reconstruction method as a base in constructing a new statistic for comparing populations. The focus of this paper is not the properties of this survival estimator but rather the asymptotic convergence of its bounding area and the use of such a quantity for evaluating a null hypothesis.
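The estimator of equation (2.2) is straightforward to compute from per-cohort samples. The paper's implementation is an R package (appendix B); purely as an illustrative sketch, the following self-contained Python fragment (function names are ours, not the package's) builds the product-limit estimate for each cohort and averages the resulting step functions with empirical-prevalence weights:

```python
import numpy as np

def km_curve(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if right-censored
    Returns (t, s): jump times (starting at 0) and the survival value
    holding on [t[i], t[i+1]).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    t_out, s_out = [0.0], [1.0]
    s, at_risk, i = 1.0, n, 0
    while i < n:
        t = times[i]
        d = c = 0  # events / censorings tied at time t
        while i < n and times[i] == t:
            d += events[i]
            c += 1 - events[i]
            i += 1
        if d > 0:
            s *= 1.0 - d / at_risk
            t_out.append(t)
            s_out.append(s)
        at_risk -= d + c
    return np.array(t_out), np.array(s_out)

def weighted_km(cohort_samples):
    """Prevalence-weighted average of cohort-level KM curves.

    cohort_samples : dict mapping cohort label z -> (times, events)
    Returns a common grid of jump times and the weighted survival values.
    """
    n = sum(len(t) for t, _ in cohort_samples.values())
    curves = {z: km_curve(t, e) for z, (t, e) in cohort_samples.items()}
    grid = np.unique(np.concatenate([c[0] for c in curves.values()]))
    sigma = np.zeros_like(grid)
    for z, (t, _) in cohort_samples.items():
        tz, sz = curves[z]
        idx = np.searchsorted(tz, grid, side="right") - 1  # step lookup
        sigma += (len(t) / n) * sz[idx]
    return grid, sigma
```

Ties are handled by aggregating events and censorings at each distinct time before updating the at-risk count, with censorings at a tied time treated as occurring after the events.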
Our concern is the general situation where random samples of size n(i) are chosen from each of the respective populations. Within these samples, the number of individuals within each cohort, $n_z^{(i)}$, is counted, from which an estimator of the cohort distribution is obtained

$$ \hat{q}_z^{(i)} = \frac{n_z^{(i)}}{n^{(i)}}. \qquad (2.3) $$

In turn, we assume that the cohort-level survival functions are estimated independently using the product-limit estimator. Note that since the product-limit estimator is not a linear functional of sampled lifetimes, $\hat{\Sigma}_t^{(i)}$ is distinct from the estimator obtained by applying the product-limit estimator directly to all n(i) samples of population i. To prevent confusion, we denote all direct applications of the product-limit estimator using $\hat{S}$ and all instances of weighted sums of product-limit estimators using the Greek letter $\Sigma$.
With these elements in place, we define our test statistic

$$ \hat{\Delta} = \sum_{z=1}^{d} \int_0^{\tau_z} \left( \hat{q}_z^{(1)} \hat{S}_{z,t}^{(1)} - \hat{q}_z^{(2)} \hat{S}_{z,t}^{(2)} \right) \mathrm{d}t, \qquad (2.4) $$

where τz denotes the time at which cohort z is censored in observations. Note that in the absence of random prevalence this statistic is equivalent to a comparison of mean lifetimes between the two populations [10]. We state here the main result of the paper—the large-sample behaviour of this statistic within a null-hypothesis statistical testing framework.
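Because the estimated survival curves are step functions, the integrals in the test statistic reduce to finite sums of rectangle areas. A minimal sketch in pure Python (the function names are ours, and it integrates a generic right-continuous step curve up to a censoring horizon `tau`):

```python
def step_integral(times, values, tau):
    """Exact integral over [0, tau] of a right-continuous step function.

    times  : sorted jump locations, starting at 0
    values : function value held on [times[i], times[i+1])
    """
    total = 0.0
    for i, left in enumerate(times):
        if left >= tau:
            break
        right = times[i + 1] if i + 1 < len(times) else tau
        total += values[i] * (min(right, tau) - left)
    return total

def area_between(t1, s1, t2, s2, tau):
    """Signed area between two step survival curves on [0, tau]."""
    return step_integral(t1, s1, tau) - step_integral(t2, s2, tau)
```

Summing such per-cohort areas, with each cohort cut off at its own censoring time, yields the statistic exactly, with no quadrature error.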
Theorem 2.1. —
Let denote the probability that a z-type individual has not yet been censored at time t ≥ 0 (the survival probability relative to the occurrence of censoring), and denote the probability that an individual in population i is of cohort z, and let p(i) = n(i)/(n(1) + n(2)). Suppose that Then , as n(i) → ∞, with
where for , where τz is the time at which samples of cohort z are censored, , ϕz ≡ ϕz,0, Sz,t is the survival function for the pooled data of cohort z, and
Note that this quantity is well defined since by definition of τz, for all t ≤ τz. The variance σ2 may be consistently estimated by
2.5 where for , is the product-limit estimate of the pooled data for cohort z,
2.6 is the product-limit estimate associated with the event of censoring for cohort z within population i, and
2.7
Note that this quantity is also well defined for all t ≤ τz. In appendix A, we provide a proof of theorem 2.1 in an empirical process framework. Note that since the survival estimates are step functions, all integrals are exactly computable.
3. Numerical investigation
A computational implementation of the test statistic and weighted survival estimators is available in the form of a package for R. This package also contains a class to handle arithmetic involving right-continuous piecewise linear functions. In the appendices, we have provided source code that may be used for installing and invoking this package.
Here, we present a computational investigation of the weighted survival curve estimator and the corresponding test statistic. Using simulations, we investigated the statistical power of our test statistic, contrasted with that of existing non-parametric methods. Using a real dataset, we demonstrate the computation of the weighted survival estimates and the test statistic, and evaluate type I error.
3.1. Evaluating statistical power through simulations
Using simulations, we explored the statistical power of the test statistic in a case where populations are difficult to distinguish based purely on mean survival time. As test populations, we examined admixtures of exponential and Weibull distributions for the event time, and compared survival in these mixture populations to survival of a population of purely exponential event times (figure 2). Population 1 consists of individuals having an exponentially distributed lifetime with a mean of λ⁻¹ = 4 years. Population 2 consists of two types of individuals: those who have an exponentially distributed lifetime with a mean of 5 years (type z = 1), and those of type z = 2 who have a Weibull distributed lifetime with shape parameter k = 5 and scale parameter λ = 1.
Figure 2.
Admixture test distributions used in simulated investigations of our estimator. Populations are formed using q2 ∈ [0, 1) admixtures of (1 − q2) exponential(λ = 5⁻¹) and q2 Weibull(k = 5, λ = 1) event time distributions. Event time density functions πt and corresponding survival functions St are shown for various values of q2.
Since Population 1 is homogeneous, we only track subpopulations of Population 2—we drop the superscript and denote the proportion of Population 2's members of type 2 by q2. It is most instructive to examine our method in the neighbourhood where both populations have approximately the same expected event time, which occurs for q2 ≈ 0.245. For this reason, we chose values near 0.25 for our simulations.
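The crossing value q2 ≈ 0.245 follows from matching means: the Weibull(k = 5, λ = 1) mean is Γ(1 + 1/5) ≈ 0.918, so solving (1 − q2)·5 + q2·0.918 = 4 gives q2 ≈ 0.245. An illustrative Python sketch of a Population 2 sampler (our names; the Weibull component is drawn by inverse-CDF sampling):

```python
import math
import random

def draw_population2(n, q2, rng=None):
    """Sample n event times from the mixture: with probability q2, a
    Weibull(shape k=5, scale 1) lifetime; otherwise exponential(mean 5)."""
    rng = rng or random.Random(0)
    times = []
    for _ in range(n):
        if rng.random() < q2:
            u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
            # Weibull inverse CDF: S(t) = exp(-t^k)  =>  t = (-ln u)^(1/k)
            times.append((-math.log(u)) ** (1.0 / 5.0))
        else:
            times.append(rng.expovariate(1.0 / 5.0))  # rate 0.2, mean 5
    return times

# Mean-matching: the q2 at which the mixture mean equals 4 years
q2_star = (5.0 - 4.0) / (5.0 - math.gamma(1.2))
```

Evaluating `q2_star` reproduces the value ≈ 0.245 quoted above.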
To compare the reweighted Kaplan–Meier estimator (equation (2.2)) to the standard Kaplan–Meier estimator, we estimated survival for the admixed population for q2 = 0.25, using various sample sizes. In figure 3, we present example reconstructions using these two methods. The estimator variance was approximated using 10 000 resamplings of sample size n of the admixed population, for each value of n. The estimation error, as defined by mean-squared difference between the reconstruction and the true survival function, was approximated in the same manner.
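The variance and mean-squared-error approximation described above is a plain Monte Carlo loop. The following simplified Python skeleton shows the scheme; as a deliberately small stand-in for the estimators of equation (2.2), it estimates survival at a fixed time for an uncensored exponential(mean 4) sample by the empirical survival fraction (all names are ours):

```python
import math
import random

def mc_variance_and_mse(n, reps=2000, t0=2.0, rng=None):
    """Monte-Carlo approximation of an estimator's variance and MSE.

    Stand-in estimator: the empirical fraction of an exponential(mean 4)
    sample surviving past t0, resampled `reps` times at sample size n.
    """
    rng = rng or random.Random(1)
    true_s = math.exp(-t0 / 4.0)  # true survival probability at t0
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(0.25) for _ in range(n)]
        estimates.append(sum(x > t0 for x in sample) / n)
    mean_est = sum(estimates) / reps
    var = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - true_s) ** 2 for e in estimates) / reps
    return var, mse
```

For the survival-curve estimators themselves, the same loop is applied pointwise in time, with the mean-squared difference to the true survival function playing the role of `mse`.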
Figure 3.
Comparing estimators of survival. The survival estimation method of equation (2.2) compared to pure Kaplan–Meier for a population containing an admixture of (1 − q2) exponential (mean 5) and q2 Weibull(k = 5, λ = 1) individuals, where q2 = 0.25. At a given sample size n, the survival estimates are obtained (top row: examples shown and contrasted). The estimator variance and mean square error were approximated using 10 000 resamplings for each of the sample sizes.
To better understand the performance of the test statistic (equation (2.4)), we evaluated its statistical power against that of other test statistics in distinguishing between Population 1 and Population 2 for various values of q2. For samples of size n(i) ∈ {30, 50, 100, 200, 1000} taken from each population, we performed 1000 null hypothesis statistical tests using our method, the log-rank method [15], and the standard Kaplan–Meier Wilcoxon signed-rank difference-of-means method [16,17]. The power of the test, i.e. the proportion of times that the null hypothesis was correctly rejected, is shown in figure 4.
Figure 4.
Simulated power computation comparing exponentially distributed lifetimes against a mixture of q2 Weibull and (1 − q2) exponential distributions, where q2 determines the amount of mixing. A larger value of q2 implies more real difference between the survival functions of the two populations. The power of our method (black) is compared to the power of Kaplan–Meier Wilcoxon signed rank (blue) and log-rank (red) methods. (More power is better.)
3.2. Evaluating type I error in a real world example
We applied the survival estimator and statistic to NCCTG Lung Cancer data [18] available within the survival package for R. We compared the survival between male (n(1) = 136) and female (n(2) = 90) cancer patients, organized by ECOG performance score (z ∈ {0, 1, 2}) as cohort. Using males as population 1 and females as population 2, we arrived at a test-statistic estimate with 95% asymptotic confidence interval (−1527, −396), which supports rejection (p ≈ 0.0009) of the null hypothesis of no difference at α = 0.05. For reference, both the Wilcoxon (p ≈ 0.0012) and log-rank (p ≈ 0.0015) tests referenced in figure 5 also rejected the null hypothesis. In figure 5, cohort-level survival estimates are also shown.
Figure 5.

Weighted survival estimates for days of lung cancer survival in males (population 1) versus females (population 2) from the NCCTG lung cancer dataset. The statistic implies an asymptotic p-value of 0.0009, rejecting H0 at α = 0.05.
In theory, the type I error is set by the significance level at study design. Whether a statistic controls type I error correctly depends on accurate evaluation of its sampling distribution. In the case of our statistic, the main result is that its sampling distribution converges asymptotically in distribution to a Gaussian with a definite variance. However, small-sample behaviour is not guaranteed. To evaluate type I error, we used the same dataset, restricted to male patients. For each of n ∈ {40, 80, 136}, we sampled n male patients without replacement, split them into two groups with n(1) = n(2) = n/2, and compared survival between the two random groups. Repeating this procedure 10 000 times, we generated the observed distribution of p-values, presented in figure 6 in log-scale. The distributions computed using the three methods are similar. All three methods rejected H0 at approximately the nominal rate, except for our statistic at n = 40, which deviated from that rate. Essentially, asymptotic convergence, as defined by accurate evaluation of the α = 0.05 type I error, occurs somewhere between 40 and 80 samples for this particular dataset.
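The random-split procedure above can be sketched as follows in Python. This is illustrative only: we substitute a simple two-sample z-test on mean lifetimes for the paper's statistic (so the rejection rates are not those reported here), and all names are ours:

```python
import math
import random

def type1_rate(data, n_splits=2000, alpha=0.05, rng=None):
    """Empirical type I error by repeated random splitting of one sample.

    Each iteration shuffles the data, splits it into two equal halves,
    computes a two-sided z-test p-value for equal means, and records
    whether H0 was rejected at level alpha.
    """
    rng = rng or random.Random(2)
    rejections = 0
    for _ in range(n_splits):
        d = data[:]
        rng.shuffle(d)
        half = len(d) // 2
        g1, g2 = d[:half], d[half:2 * half]
        m1, m2 = sum(g1) / half, sum(g2) / half
        v1 = sum((x - m1) ** 2 for x in g1) / (half - 1)
        v2 = sum((x - m2) ** 2 for x in g2) / (half - 1)
        z = (m1 - m2) / math.sqrt((v1 + v2) / half)
        # two-sided normal p-value via the error function
        p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
        rejections += p < alpha
    return rejections / n_splits
```

Since both halves are drawn from the same pool, H0 holds by construction, and the returned rate should sit near the nominal alpha whenever the test's sampling distribution is well approximated.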
Figure 6.
P-value distributions for the comparison between samples of size n/2 of two random subpopulations of male patients in the lung cancer data. The proportion of null hypotheses rejected for each of the three statistical methods is similar, at approximately the nominal rate for α = 0.05.
Probing deeper, we examined the sampling distributions of the test statistic for each of n ∈ {50, 60, 70}, comparing each to the Gaussian distribution stated in theorem 2.1, with the approximation computed using only the first sample of size n. The results of these simulations are shown in figure 7, where it is seen that the sampling distribution of the statistic is approximately the same as the computed asymptotic Gaussian distribution, which is traced out in red.
Figure 7.
Histograms of sampling distributions for comparing survival between random subsets of male lung cancer patients using sample sizes of n ∈ {50, 60, 70}. Traced in red, the asymptotic Gaussian density as computed using theorem 2.1 on the first sample set of each size is overlaid.
The R code used to compute these examples is available in appendix B.3.
4. Discussion and conclusion
In this paper, we have proposed a test statistic that uses a cohort-averaged survival function estimator in order to make cross-population comparisons of survival within a null hypothesis statistical testing framework. The proposed survival estimator was an empirically weighted average of cohort-level product-limit estimates. The test statistic involved computation of the area between estimated survival functions for two populations. By invoking an empirical stochastic process, we proved asymptotic normality of this test statistic.
Using simulations, we contrasted the weighted survival estimator against the pure Kaplan–Meier estimator. As seen in figure 3, the survival curves generated by the two methods are distinct yet similar. The second and third rows of figure 3 show that the reweighted estimator has performance comparable to the pure Kaplan–Meier estimator at large sample sizes. Asymptotically, both estimators converge to the true survival function, with variance converging to zero. At small sample sizes, there are differences: the reweighted estimator has lower variance at all times, at the cost of larger bias and error at earlier times, and this early error is offset by decreased error at later times (better reconstruction of the tails). Hence, depending on the relevant costs, for small samples this reweighted estimator may be preferable to the pure Kaplan–Meier estimator.
In simulations of the test statistic derived from the reweighted survival estimator, we saw superior performance compared to existing methods. In figure 4, it is seen that in all cases the test statistic was better at distinguishing between the two populations than either the Wilcoxon signed-rank test or the log-rank test. The relatively high statistical power of this statistic is due to its smaller sampling variability. In nearly all cases (greater than 99.5%), the estimator variance for the tested method was less than that of the other two tests (not shown).
This paper derives the asymptotic convergence in distribution of the statistic. Numerically, we demonstrated this convergence in figures 6 and 7, where we verified that the asymptotic approximation respects type I error at α = 0.05 and observed a good match between the sampling distribution of the statistic and the asymptotic Gaussian distribution provided by theorem 2.1.
A variant of this method was used in Rasch et al. [9] in order to classify physical disorders based on severity for the sake of prioritizing the processing of disability claims. Since the underlying survival surface is non-stationary, and the fixed observation windows create progressive censoring, that paper illustrates the utility of this statistical method. In that paper, the cohorts were defined based on binned application entry times, and a heuristic 'survival surface' was generated in order to obtain a single overall picture of the survivability of a given disorder. The censoring parameters τz varied due to the finite sampling window and the fact that more recent cohorts are not observed for as long a time period as older cohorts, as depicted in figure 1b. It was also expected that survival by cohort would vary due to differences in healthcare administration and treatment between entry cohorts. The use of the empirical prevalences made it possible to account for variability in disability application volume by sufferers of given disorders, conditional on entry date.
We note that a strong limitation of the presented method lies in its framing in terms of null hypothesis statistical testing. The statistic only provides a p-value, as opposed to other tests such as the log-rank test which provide hazard ratios as well. As a trade-off for statistical power, one is sacrificing interpretability in the form of effect sizes.
Although the most direct and natural applications of the method that we have presented here involve discretely indexed covariates, it is possible to use this method for continuously indexed covariates such as time by employing the binning strategy used by Rasch et al. [9]. This approach is particularly fruitful if the sampling windows are coarse, and there is a clear separation between cohorts to maintain statistical independence. In this situation, it may be unreasonable to expect to construct a full continuous surface for survival. Nonetheless, a possible future extension of this method might involve replacing the sum of equation (2.1) with an integral and using statistical regularization tools [19] in order to infer true continuously indexed surfaces.
Supplementary Material
Acknowledgements
The authors thank Dr Leighton Chan and Dr Elizabeth Rasch for insightful discussions, guidance and support, and Dr Pei-Shu Ho for help obtaining data.
Appendix A. Proof of the main theorem
To prove the main theorem, we use an empirical process modelling framework: we first develop the asymptotic properties of deterministically weighted Kaplan–Meier estimators, and then replace the deterministic proportions with estimates given by the sample prevalences of the cohorts. Here, we restate the main theorem and prove it through a series of lemmata.
Theorem 2.1. —
Let denote the probability that a z-type individual has not yet been censored at time t ≥ 0 (the survival probability relative to the occurrence of censoring), and denote the probability that an individual in population i is of cohort z, and let p(i) = n(i)/(n(1) + n(2)). Suppose that Then , as n(i) → ∞, with
where for , where τz is the time at which samples of cohort z are censored, , ϕz ≡ ϕz,0, Sz,t is the survival function for the pooled data of cohort z, and
The variance σ2 may be consistently estimated by
A 1 where for , is the product-limit estimate of the pooled data for cohort z
A 2 is the product-limit estimate associated with the event of censoring for cohort z within population i, and
A 3 Overview of proof of theorem 2.1. To prove the main theorem, we turn to the modelling framework that we present in A.2. In general, we proceed by first assuming fixed sample proportions and then extending results to random proportions as given by empirical prevalence (equation (A 2)). The convergence of follows directly from corollary A.9 and equation (A 18). The consistency of follows from theorem 4.2.2 of [2], which provides for weak convergence of the product limit estimator to a Gaussian process, and the Glivenko–Cantelli theorem. ▪
A.1. Preliminaries and notation
Given any pair of random elements X, Y, we denote equality in a distributional sense by X ≈ Y. Let $P$ be a probability measure on the measurable space $(\mathcal{X}, \mathcal{A})$. The empirical measure generated by the sample of random elements x1, …, xn is given by

$$ \mathbb{P}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}, \qquad (\mathrm{A}\,4) $$

where, for any $x \in \mathcal{X}$ and any $A \in \mathcal{A}$,

$$ \delta_x(A) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases} \qquad (\mathrm{A}\,5) $$

Note that alternatively, when needed, one may write δx(A) as the indicator function 1A(x) on the set A. Furthermore, in the case that A = {k} for some $k \in \mathcal{X}$, we write δx(A) ≡ δx,k.
Given $\mathcal{F}$, a class of measurable functions $h : \mathcal{X} \to \mathbb{R}$, the empirical measure generates the map $h \mapsto \mathbb{P}_n h$, where for any signed measure Q and measurable function h we use the notation $Qh = \int h \, \mathrm{d}Q$. Furthermore, define the $\mathcal{F}$-indexed empirical process by

$$ \mathbb{G}_n h = \sqrt{n} \left( \mathbb{P}_n h - P h \right), \qquad h \in \mathcal{F}, \qquad (\mathrm{A}\,6) $$

and with the empirical process, identify the signed measure $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$.
Note that for a measurable function h, from the law of large numbers and the central limit theorem it follows that $\mathbb{P}_n h \to Ph$ almost surely and $\mathbb{G}_n h \rightsquigarrow N(0, P(h - Ph)^2)$, provided $Ph$ exists and $Ph^2 < \infty$, where '$\rightsquigarrow$' denotes convergence in distribution. In addition to the preceding notation, given elements f and fn, we denote convergence in probability and in distribution of fn to f by $f_n \xrightarrow{P} f$ and $f_n \rightsquigarrow f$, respectively.
For any map $Q : \mathcal{F} \to \mathbb{R}$, define the uniform norm by

$$ \|Q\|_{\mathcal{F}} = \sup_{h \in \mathcal{F}} |Qh|. \qquad (\mathrm{A}\,7) $$

A class $\mathcal{F}$ for which $\|\mathbb{P}_n - P\|_{\mathcal{F}} \to 0$ almost surely is called a $P$-Glivenko–Cantelli class. Denote by $\ell^\infty(\mathcal{F})$ the class of uniformly bounded real-valued functions on $\mathcal{F}$. If, for some tight Borel measurable element $\mathbb{G}$, $\mathbb{G}_n \rightsquigarrow \mathbb{G}$ in $\ell^\infty(\mathcal{F})$, we say that $\mathcal{F}$ is a $P$-Donsker class.
A.2. Empirical process framework
To prove theorem 2.1, we turn to an empirical modelling framework that will provide us with the asymptotic statistics of the weighted product-limit estimator. Consider a closed particle system that, according to a predefined set of characteristics, can be subdivided into mutually exclusive subsystems.
Each particle corresponds to the observed state of a particular individual in a fixed population cohort. Note that we will restrict this discussion to only a single population of particles. These arguments will extend to multiple populations as mentioned in this paper by treating separate populations as independent.
At any given time t ≥ 0, each particle will have exactly one associated state x in the set , referring, respectively, to states of
| A 8 |
Assume that the path of any particle is statistically dependent upon its particular subsystem, and that given the respective subsystems of any two particles, their resulting paths are statistically independent. Assume further that at a reference time t = 0, all particles enter into the active state (x = 1), and that particles are considered dormant for all t < 0.
Let and τ ∈ (0, ∞) be fixed. We will assume the existence of a collection of individuals Γ, assumed to be infinite in size, where each individual γ ∈ Γ exhibits a càdlàg path-valued state , for t ≥ 0. For each γ ∈ Γ, is determined by the individual's particle type and a random jump time . The particle type is distributed in the population through the probability mass , where satisfies . Let St = (S1,t, …, Sd,t) be the survival vector , which is assumed continuous for t ≥ 0. Suppose that it is desired to understand the event probabilities for randomly selected γ ∈ Γ, unconditional on subgroup membership. We assume that members of each cohort are in the inactive (0) state at times t < 0.
Given a random sample γ1, …, γn, of individuals, let n = (n1, …, nd) and
| A 9 |
where nz is the random number of drawn individuals of cohort z. In considering the event time probabilities of each subgroup, the random number of particles precludes the use of many well-established results in survival analysis. Therefore, we begin with a somewhat restricted framework, and assume a known number of initial individuals of each type.
Assume the sample contains a known number nz = az n, az ∈ (0, 1), of individuals of cohort z, and let be the number of the cohort z individuals who are in state at time t, so that
| A 10 |
is conserved. Also, we assume that there exists τz < ∞ by which all particles have either become inactive or been censored, so that τz is the infimum time at which the condition
| A 11 |
holds.
For the sample of size nz, we denote the z-type cumulative hazard by Λz,t and, respectively, define the z-type cumulative hazard and survival estimates by
| A 12 |
and
| A 13 |
Define further
and note that and for all t ≥ τz.
From [2], it follows that is a mean-zero square-integrable martingale with Meyer bracket process
| A 14 |
where δ is the Kronecker delta function.
A.3. Convergence theorems
In order to guarantee convergence of the estimator, we make the following assumptions (based upon an initially known sample size distribution n).
Assumption A.1. —
We assume that the initial sample is chosen large enough to ensure that individuals of cohort z, at state 1 (active), exist at all points t ∈ [0, τz], z ∈ {1, …, d}. That is,
Since any survival function is monotone, an immediate result that follows from the above assumption is
| A 15 |
for some constant c > 0.
Assumption A.2. —
It is assumed that as n becomes large, the sample size for each individual type will grow to infinity. That is,
Assumption A.3. —
For each z ∈ {1, …, d} there exists a non-increasing continuous function mz : [0, ∞) → (0, 1] such that
Note that in the case of fixed censoring, that is, in the case that censoring exists only at time τ, the above is satisfied by mz,t = Sz,t. In the general case, mz,t can be seen as the probability that an individual of cohort z has not yet left state 1. That is, mz,t is the probability that an individual has not left due to censoring or death by time t, and so mz,t = Sz,t Cz,t−, where Cz,t is the probability that censoring has not occurred by time t.
To prove the main theorem, we now present a series of lemmata.
Lemma A.4. —
If is defined as in equation (2.3) and is defined as in equation (A 13), then
as n → ∞, uniformly in t ≥ 0.
Proof. —
It is claimed that to prove the statement of the lemma, it suffices to show that
A 16 uniformly in t ≥ 0, for each .
Indeed, for if the above holds, then
uniformly in t ≥ 0. Since the central limit theorem implies that , each term in the sum would converge in probability to 0, uniformly in t ≥ 0.
And so, if denotes the expectation given N, we have that
for some arbitrary constant C. From Lenglart’s inequality (cf. [20]),
for any arbitrary η, ε > 0. Therefore, from assumption A.2, since nz → ∞ a.s., the desired result follows. ▪
Turning momentarily to the situation where there are two populations denoted by superscripts (1) and (2), for any t ≥ 0, define
noting that setting recovers our test statistic of equation (2.4). For a general survival function θ, with respective estimate , define by
| A 17 |
If the process converges in distribution to some Y ∼ N(0, σ2), since n(i)/(n(1) + n(2)) converges to p(i), i = 1, 2, it follows that
| A 18 |
Now we turn to analysis under a single population, dropping the superscripts. Note that , where
| A 19 |
Therefore, if it can be shown that
uniformly in t, then convergence of is dependent only upon the convergence of the d-dimensional vector-valued process given by
| A 20 |
with a = (a1, …, ad) ∈ (0, 1) d chosen in a sufficiently small neighbourhood V of q. This decomposition will thus lead to the main theorem. To show the desired convergence of , we first focus on convergence of .
Let and write , where
| A 21 |
and
| A 22 |
Lemma A.5. —
Suppose that and are the processes respectively defined by equations (A 21) and (A 22), and that is the d-dimensional mean-zero Gaussian process defined by
A 23 Then in the space of compactly supported functions for each a ∈ V, where is the mean-zero square-integrable Gaussian process defined by
A 24 and is given by
A 25 The processes and are independent, and there exist Skorohod representations such that
and
almost surely as n → ∞.
Proof. —
To begin note that independence follows immediately from the independence of the respective limiting processes. Since n is a multinomial random variable, (A 24) follows from the central limit theorem. In the case of , we first consider .
An application of Lenglart’s inequality, very similar to that in the proof of lemma A.4, along with assumption A.2, shows that
Moreover, from assumption A.3,
It follows that
uniformly in t ≥ 0, and since m_{z,t} = S_{z,t} C_{z,t−},
Therefore, from theorem 4.2.1 of [2], , and there exists a Skorohod representation of such that
almost surely as n → ∞. Since almost sure convergence of implies almost sure convergence of bounded functionals of , the desired convergence of follows from theorem 2.1 of [3]. ▪
Corollary A.6. —
If the process is defined by equation (A 20), then
(A 26)
Proof. —
From the previous lemma, we may assume that and almost surely, uniformly for a ∈ V and t ≥ 0. Therefore,
almost surely, uniformly for a ∈ V and t ≥ 0. The statement of the corollary then follows from theorem 5.1 of [21]. ▪
Since , from theorem 4.4 of [21]
Define the map by , then
Furthermore, if for any we have that
for some δ > 0, then
Therefore, g is continuous at any (a, f) such that f is continuous at a, uniformly in t. It thus follows from the continuous mapping theorem (cf. [4]) that if is continuous, uniformly in t, then
(A 27)
Lemma A.7. —
If {ζt(a) : t ≥ 0} is defined as in corollary A.6, then the map
is continuous for a ∈ V, uniformly in t ≥ 0.
Proof. —
For any a, b ∈ V, it follows that
Since for all z, from Doob’s martingale inequality (cf. [22]),
for some arbitrary constant C. For each , since a_z and b_z are sufficiently close to q_z ∈ (0, 1), it follows that there exists some δ > 0 such that . Therefore,
Combining the above two results gives
and so, by Kolmogorov’s continuity criterion (cf. [22]), the desired result follows. ▪
The above lemma, along with the argument immediately preceding, gives the following.
Theorem A.8. —
Let and be defined as in corollary A.6, then
(A 28)
Corollary A.9. —
If , then
where
Proof. —
Note that when t = τ_z, we have
which are independent and normally distributed, implying that is also normally distributed. Furthermore,
which after recombining the final terms, gives the desired result. ▪
Appendix B. Computation
B.1. Installation of R package
The following code installs the R package from GitHub sources:
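The original listing did not survive extraction. The following is a minimal sketch of a typical GitHub installation, assuming the `devtools` package and using a placeholder repository path (`user/package` is hypothetical and stands in for the actual path given in the lost block):

```r
# Placeholder repository path: "user/package" is hypothetical;
# the actual GitHub path appeared in the original listing.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("user/package")
```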

B.2. Simulation of data used in this paper
We simulated draws from the populations mentioned in the main text using the following R code:
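The simulation listing was likewise lost in extraction. As an illustrative stand-in only (the cohort hazards and censoring scheme below are assumptions, not the paper's actual populations), a base-R sketch that draws right-censored lifetimes from a cohort-stratified population might look like:

```r
set.seed(1)

# Hypothetical cohort-stratified population: two cohorts with
# different exponential hazards, subject to uniform right censoring.
# All numeric choices here are illustrative assumptions.
simulate_cohorts <- function(n, hazards, cens_max = 10) {
  z   <- sample(seq_along(hazards), n, replace = TRUE)  # cohort labels
  ev  <- rexp(n, rate = hazards[z])                     # latent lifetimes
  cen <- runif(n, 0, cens_max)                          # censoring times
  data.frame(
    cohort = z,
    time   = pmin(ev, cen),         # observed (possibly censored) time
    status = as.integer(ev <= cen)  # 1 = event observed, 0 = censored
  )
}

pop <- simulate_cohorts(500, hazards = c(0.2, 0.5))
```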

B.3. Real-world example
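The code for this example was not recoverable. As a hedged reconstruction of the general pattern only, the sketch below uses the NCCTG lung-cancer data of [18], which ships with the `survival` package as `lung` (its use here is an assumption), to form a weighted average of stratum-level Kaplan–Meier curves and the area beneath it:

```r
library(survival)

# General pattern only: stratify the NCCTG lung data (ref. [18]) by sex,
# fit a Kaplan-Meier curve per stratum, and average the curves with
# weights proportional to stratum size.
fit  <- survfit(Surv(time, status) ~ sex, data = lung)
grid <- seq(0, max(lung$time), by = 1)

# Evaluate a single-stratum survfit object as a right-continuous
# step function on the common time grid.
surv_on_grid <- function(f) stepfun(f$time, c(1, f$surv))(grid)

curves  <- sapply(seq_along(fit$strata), function(i) surv_on_grid(fit[i]))
weights <- as.numeric(table(lung$sex)) / nrow(lung)
weighted_curve <- as.vector(curves %*% weights)

# Area under the weighted curve (trapezoidal rule); comparing two such
# areas gives the between-population statistic analysed in appendix A.
area <- sum(diff(grid) *
            (head(weighted_curve, -1) + tail(weighted_curve, -1)) / 2)
```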

B.3.1. Simulations for examining the sampling distribution of the test statistic
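The original simulation code is also missing. The following base-R sketch, which omits cohort weighting for brevity and uses assumed distributions, illustrates the general approach: repeatedly draw pairs of samples under the null hypothesis of identical populations and record the area between their Kaplan–Meier curves, whose empirical distribution can then be checked against the asymptotic normal approximation:

```r
library(survival)

# Area between two Kaplan-Meier curves on [0, tau] (trapezoidal rule).
km_area_diff <- function(d1, d2, tau) {
  grid <- seq(0, tau, length.out = 200)
  km <- function(d) {
    f <- survfit(Surv(time, status) ~ 1, data = d)
    stepfun(f$time, c(1, f$surv))(grid)  # evaluate the KM curve on grid
  }
  diffs <- km(d1) - km(d2)
  sum(diff(grid) * (head(diffs, -1) + tail(diffs, -1)) / 2)
}

# Draw one right-censored sample; hazard and censoring are assumptions.
draw <- function(n) {
  ev  <- rexp(n, rate = 0.3)   # latent event times
  cen <- runif(n, 0, 8)        # censoring times
  data.frame(time = pmin(ev, cen), status = as.integer(ev <= cen))
}

set.seed(2)
stats <- replicate(500, km_area_diff(draw(200), draw(200), tau = 5))
# Under the null the statistic should be centred at zero and roughly
# normal; inspect with, e.g., qqnorm(stats).
```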

Data accessibility
All data in this paper are simulated, with R source code provided in appendix B.2.
Authors' contributions
A.H. and M.H. developed the statistical method. A.H., M.H. and J.C.C. wrote the proof, performed the simulations and wrote the manuscript. J.C.C. generated the figures. All authors gave final approval for publication.
Competing interests
The authors declare no competing interests.
Funding
This work is supported by the Intramural Research Program of the National Institutes of Health Clinical Center and the US Social Security Administration.
References
- 1. Kaplan EL, Meier P. 1958. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481. (doi:10.1080/01621459.1958.10501452)
- 2. Gill RD. 1980. Censoring and stochastic integrals. Statistica Neerlandica 34, 124–124. (doi:10.1111/j.1467-9574.1980.tb00692.x)
- 3. Gill R. 1983. Large sample behaviour of the product-limit estimator on the whole line. Ann. Stat. 11, 49–58. (doi:10.1214/aos/1176346055)
- 4. Van der Vaart A. 1996. New Donsker classes. Ann. Probab. 24, 2128–2140. (doi:10.1214/aop/1041903221)
- 5. Shorack GR, Wellner JA. 2009. Empirical processes with applications to statistics, vol. 59. Philadelphia, PA: SIAM.
- 6. Hougaard P. 1984. Life table methods for heterogeneous populations: distributions describing the heterogeneity. Biometrika 71, 75–83. (doi:10.1093/biomet/71.1.75)
- 7. Aalen OO. 1994. Effects of frailty in survival analysis. Stat. Methods Med. Res. 3, 227–243. (doi:10.1177/096228029400300303)
- 8. Lin DY, Ying Z. 1993. A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika 80, 573–581. (doi:10.1093/biomet/80.3.573)
- 9. Rasch EK, Huynh M, Ho P-S, Heuser A, Houtenville A, Chan L. 2014. First in line: prioritizing receipt of social security disability benefits based on likelihood of death during adjudication. Med. Care 52, 944–950. (doi:10.1097/MLR.0000000000000204)
- 10. Sullivan Pepe M, Fleming TR. 1989. Weighted Kaplan–Meier statistics: a class of distance tests for censored survival data. Biometrics 45, 497–507. (doi:10.2307/2531492)
- 11. Sullivan Pepe M, Fleming TR. 1991. Weighted Kaplan–Meier statistics: large sample and optimality considerations. J. R. Stat. Soc. B 53, 341–352. (doi:10.2307/2345745)
- 12. Murray S. 2001. Using weighted Kaplan–Meier statistics in nonparametric comparisons of paired censored survival outcomes. Biometrics 57, 361–368. (doi:10.1111/j.0006-341X.2001.00361.x)
- 13. Zare A, Mahmoodi M, Mohammad K, Zeraati H, Hosseini M, Naieni KH. 2014. A comparison between Kaplan–Meier and weighted Kaplan–Meier methods of five-year survival estimation of patients with gastric cancer. Acta Medica Iranica 52, 764–767.
- 14. Cai Z. 1998. Asymptotic properties of Kaplan–Meier estimator for censored dependent data. Stat. Probab. Lett. 37, 381–389. (doi:10.1016/S0167-7152(97)00141-7)
- 15. Berty HP, Shi H, Lyons-Weiler J. 2010. Determining the statistical significance of survivorship prediction models. J. Eval. Clin. Pract. 16, 155–165. (doi:10.1111/j.1365-2753.2009.01199.x)
- 16. Wilcoxon F. 1945. Individual comparisons by ranking methods. Biometr. Bull. 1, 80–83. (doi:10.2307/3001968)
- 17. Schoenfeld D. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68, 316–319. (doi:10.1093/biomet/68.1.316)
- 18. Loprinzi CL et al. 1994. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol. 12, 601–607. (doi:10.1200/JCO.1994.12.3.601)
- 19. Chang JC, Savage VM, Chou T. 2014. A path-integral approach to Bayesian inference for inverse problems using the semiclassical approximation. J. Stat. Phys. 157, 582–602. (doi:10.1007/s10955-014-1059-y)
- 20. Lenglart E. 1977. Relation de domination entre deux processus. Ann. Inst. H. Poincaré Sect. B (NS) 13, 171–179.
- 21. Billingsley P. 2013. Convergence of probability measures. New York, NY: John Wiley & Sons.
- 22. Karatzas I, Shreve S. 2012. Brownian motion and stochastic calculus, vol. 113. Berlin, Germany: Springer Science & Business Media.