Royal Society Open Science. 2018 Nov 21;5(11):180496. doi: 10.1098/rsos.180496

Asymptotic convergence in distribution of the area bounded by prevalence-weighted Kaplan–Meier curves using empirical process modelling

Aaron Heuser, Minh Huynh, Joshua C. Chang
PMCID: PMC6281901  PMID: 30564383

Abstract

The Kaplan–Meier product-limit estimator is a simple and powerful tool in time to event analysis. An extension exists for populations stratified into cohorts where a population survival curve is generated by weighted averaging of cohort-level survival curves. For making population-level comparisons using this statistic, we analyse the statistics of the area between two such weighted survival curves. We derive the large sample behaviour of this statistic based on an empirical process of product-limit estimators. This estimator was used by an interdisciplinary National Institutes of Health–Social Security Administration team in the identification of medical conditions to prioritize for adjudication in disability benefits processing.

Keywords: survival analysis, Kaplan–Meier, heterogeneous distribution, non-parametric, hypothesis test, asymptotic analysis

1. Introduction

Survival analysis addresses the classical statistical problem of determining characteristics of the waiting time until an event, canonically death, from observations of occurrences sampled from within a population. The problem is not trivial because the expected waiting time typically depends on the time already waited. For instance, a hundred-year-old can be more certain of surviving to his or her one-hundred-and-first birthday than a newborn might reasonably be. However, the comparison may shift in the newborn's favour when considering survival to 121, particularly in light of medical advances that make survival probabilities non-stationary. Parametric approaches for assembling survival curves are usually not flexible enough to capture this complexity.

One simple approach to this problem was pioneered by Kaplan & Meier [1]. Their product-limit estimator [2–5] is a non-parametric statistic used to infer the survival function for members of a population from observed lifetimes. The method is particularly useful in that it naturally handles right censoring, where some event times are only partially observed because they fall outside the observation window. It was not, however, designed to account for varying subpopulations that may yield non-homogeneity in overall population survival. For instance, in the example given above, subpopulations with distinct survival characteristics may be defined by birth year or by the entry cohort of a subject into a particular study (figure 1).

Figure 1.

Inhomogeneity of survival within populations can arise for at least two reasons. In (a), inhomogeneity results from a categorical covariate that influences survival statistics. In (b), inhomogeneity results from non-stationarity, where cohorts of individuals are sampled at different times. In this case, the problem of progressive censoring is apparent because later cohorts have not been observed for as long.

Several existing statistical methods address variants of this limitation. A natural approach is to treat the varying subpopulations as defining underlying covariates, thus laying the framework for a proportional hazards model. The proportional hazards assumption is quite strong, however; when considering time-dependent statistics (as in the motivating example), it is violated in all but a few specific cases. Likewise, frailty models, first developed by Hougaard (cf. [6]) and extended by Aalen (cf. [7]), allow multivariate event distributions but also make assumptions on the underlying event distributions and assume proportional hazards.

Other existing methods, such as bivariate survival analysis (cf. [8]), consider the time to observation and the time to event as conditionally independent random times. Underlying these methods is the assumption that, conditional on the time of observation, all individuals share a similar event time distribution, thus failing to acknowledge temporal changes.

These complexities arose in the identification of new disorders to incorporate into the United States Social Security Administration (SSA)’s Compassionate Allowances (CAL) initiative. The CAL initiative seeks to identify candidate medical conditions for fast-tracking in the processing of disability applications. The intent of this initiative is to prioritize applicants who are most likely to die in the time-course of usual case processing so that they may receive benefits while still living.

At its inception, the CAL initiative identified conditions based on the counsel of expert opinion [9]. The SSA, in collaboration with the National Institutes of Health (NIH), sought to expand the list of CAL conditions systematically, using a data-driven approach. Using in part the survival estimator described in this paper, the NIH identified 24 conditions for inclusion into the list of conditions [9].

The methodology used in CAL is related to the work of Pepe & Fleming (cf. [10,11]), which introduced a class of weighted Kaplan–Meier statistics. Though these statistics exhibit the same limitations as the standard Kaplan–Meier case, it should be noted that [11] introduces the stratified weighted Kaplan–Meier statistic. The statistic presented here is a priori quite similar, but uses the empirical prevalence instead of a deterministic weighting function. In doing so, the weight is no longer independent of the event time estimate, which requires substantially different methods of proof.

We thus consider the overall survival distribution for a population of individuals whose sub-populations exhibit non-homogeneous survival distributions. Through this consideration, a new test statistic, based upon an empirical process of product-limit estimators, is developed. Through constructive methods, this test statistic compares survival distributions among the distinct subpopulations, weighting according to the distribution of the identified subgroups.

2. Statistical method

Suppose $\Gamma^{(1)}$ and $\Gamma^{(2)}$ are disjoint populations of individuals where each individual belongs to exactly one of $d$ distinct cohorts labelled $z \in \mathbb{Z}_d$. For randomly selected individuals $\gamma \in \Gamma^{(i)}$ within population $i$, we desire to understand the statistics of the event time $T_\gamma$ under the assumption that survival is conditional on the cohort $z_\gamma$ and the population.

One representation of the marginal survival probability for members of population $i$, $\theta_t^{(i)} = P\{T_\gamma > t \mid \gamma \in \Gamma^{(i)}\}$, is found by conditioning on cohort

$$\theta_t^{(i)} = \sum_{z=1}^{d} \underbrace{P\{T_\gamma > t \mid z_\gamma = z,\, \gamma \in \Gamma^{(i)}\}}_{S_{z,t}^{(i)}}\; \underbrace{P\{z_\gamma = z \mid \gamma \in \Gamma^{(i)}\}}_{q_z^{(i)}}, \qquad (2.1)$$

where $S_{z,t}^{(i)}$ represents the survival function for individuals of cohort $z$ in population $i$, where each individual's cohort membership is known.

We use this representation of the survival probability as motivation to formulate an estimator for the population-average survival functions

$$\hat\theta_t^{(i)} = \sum_{z=1}^{d} \hat q_z^{(i)} \hat S_{z,t}^{(i)}, \qquad (2.2)$$

where $\hat q_z^{(i)}$ and $\hat S_{z,t}^{(i)}$ are estimators of the cohort prevalence and cohort-wise survival, respectively. This weighted Kaplan–Meier method has appeared previously in the literature [12], and has been empirically validated against the pure Kaplan–Meier method [13], where the weighting procedure was found to reduce the bias in the construction of survival curves. The asymptotic convergence of the product-limit estimator and weighted variants is well established [11,14]. We use this survival curve reconstruction method as a base in constructing a new statistic for comparing populations. The focus of this paper is not the properties of this survival estimator but rather the asymptotic convergence of its bounding area and the use of such a quantity for evaluating a null hypothesis.

Our concern is the general situation where random samples of size $n^{(i)}$ are chosen from each of the respective populations. Within these samples, the number of individuals within each cohort, $n_z^{(i)}$, is counted, from which an estimator of the cohort distribution is obtained

$$\hat q_z^{(i)} = \frac{n_z^{(i)}}{n^{(i)}}. \qquad (2.3)$$

In turn, we assume that the cohort-level survival functions $\hat S_{z,t}^{(i)}$ are estimated independently using the product-limit estimator. Note that since the product-limit estimator is not a linear functional of sampled lifetimes, $\hat\theta_t^{(i)}$ is distinct from the estimator obtained by applying the product-limit estimator directly to all $n^{(i)}$ samples of population $i$. To prevent confusion, we denote all direct applications of the product-limit estimator using $\hat S$ and all weighted sums of product-limit estimators using the Greek letter $\hat\theta$.
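As a concrete illustration, the stratified estimator of equations (2.2) and (2.3) can be written in a few lines. The following Python sketch is our own (the paper's implementation is an R package; all function names here are illustrative): it computes a per-cohort product-limit estimate and averages the cohort curves with empirical prevalence weights.

```python
import numpy as np

def km_survival(times, events):
    """Product-limit (Kaplan-Meier) estimate for one cohort.
    times: observed times (event or censoring); events: 1 = event, 0 = censored.
    Returns (jump_times, survival_values) with values just after each jump."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    jumps = np.unique(times[events == 1])
    S, surv = 1.0, []
    for t in jumps:
        at_risk = np.sum(times >= t)                    # still under observation
        d = np.sum((times == t) & (events == 1))        # events exactly at t
        S *= 1.0 - d / at_risk                          # product-limit update
        surv.append(S)
    return jumps, np.array(surv)

def weighted_survival(cohorts):
    """cohorts: list of (times, events) arrays, one entry per cohort of one
    population. Returns a function t -> weighted estimate, using the empirical
    prevalences n_z / n as weights (equations (2.2)-(2.3))."""
    n_total = sum(len(t) for t, _ in cohorts)
    fits = [(len(t) / n_total, km_survival(t, e)) for t, e in cohorts]
    def theta(t):
        out = 0.0
        for q, (jumps, surv) in fits:
            k = np.searchsorted(jumps, t, side="right")
            out += q * (1.0 if k == 0 else surv[k - 1])
        return out
    return theta
```

With two toy cohorts of sizes 3 and 2, the weights are 0.6 and 0.4, and the weighted curve is the prevalence-weighted average of the two step functions.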

With these elements in place, we define our test statistic

$$\hat\Theta = \sqrt{\frac{n^{(1)} n^{(2)}}{n^{(1)} + n^{(2)}}} \int_0^{\tau} \mathrm{d}t\, \big(\hat\theta_t^{(1)} - \hat\theta_t^{(2)}\big), \qquad (2.4)$$

where $\tau = \inf\{\tau_z : z \in \mathbb{Z}_d\}$, and $\tau_z$ denotes the time at which cohort $z$ is censored in observations. Note that in the absence of random prevalence this statistic is equivalent to a comparison of mean lifetimes between the two populations [10]. We state here the main result of the paper—the large sample behaviour of this statistic within a null-hypothesis statistical testing framework.

Theorem 2.1. —

Let $C_{z,t}^{(i)}$ denote the probability that a $z$-type individual has not yet been censored at time $t \ge 0$ (the survival probability relative to the occurrence of censoring), let $q_z^{(i)}$ denote the probability that an individual in population $i$ is of cohort $z$, and let $p^{(i)} = n^{(i)}/(n^{(1)} + n^{(2)})$. Suppose that $\theta_t^{(1)} = \theta_t^{(2)}$. Then $\hat\Theta \xrightarrow{d} N(0, \sigma^2)$ as $n^{(i)} \to \infty$, with

$$\sigma^2 = \sum_{i=1}^{2}\big(1 - p^{(i)}\big)\left[\sum_{z=1}^{d} q_z^{(i)}\phi_z^2 - \Big(\sum_{z=1}^{d} q_z^{(i)}\phi_z\Big)^2\right] - \sum_{z=1}^{d}\int_0^{\tau_z} \mathrm{d}S_{z,t}\, W_{z,t}\left(\frac{\phi_{z,t}}{S_{z,t}}\right)^2,$$

where for $0 \le t \le \tau_z$, where $\tau_z$ is the time at which samples of cohort $z$ are censored, $\phi_{z,t} = \int_t^{\tau_z} \mathrm{d}s\, S_{z,s}$, $\phi_z \equiv \phi_{z,0}$, $S_{z,t}$ is the survival function for the pooled data of cohort $z$, and

$$W_{z,t} = \frac{p^{(1)} C_{z,t}^{(1)} q_z^{(2)} + p^{(2)} C_{z,t}^{(2)} q_z^{(1)}}{C_{z,t}^{(1)} C_{z,t}^{(2)}}.$$

Note that this quantity is well defined since, by definition of $\tau_z$, $C_{z,t}^{(i)} > 0$ for all $t \le \tau_z$. The variance $\sigma^2$ may be consistently estimated by

$$\hat\sigma^2 = \sum_{i=1}^{2}\big(1 - p^{(i)}\big)\left[\sum_{z=1}^{d} \hat q_z^{(i)}\hat\phi_z^2 - \Big(\sum_{z=1}^{d} \hat q_z^{(i)}\hat\phi_z\Big)^2\right] - \sum_{z=1}^{d}\int_0^{\tau_z} \mathrm{d}\hat S_{z,t}\, \hat W_{z,t}\left(\frac{\hat\phi_{z,t}}{\hat S_{z,t}}\right)^2, \qquad (2.5)$$

where for $0 \le t \le \tau_z$, $\hat S_{z,t}$ is the product-limit estimate of the pooled data for cohort $z$,

$$\hat\phi_{z,t} = \int_t^{\tau_z} \mathrm{d}s\, \hat S_{z,s}, \qquad (2.6)$$

$\hat C_{z,t}^{(i)}$ is the product-limit estimate associated with the event of censoring for cohort $z$ within population $i$, $\hat\phi_z \equiv \hat\phi_{z,0}$, and

$$\hat W_{z,t} = \frac{p^{(1)} \hat C_{z,t}^{(1)} \hat q_z^{(2)} + p^{(2)} \hat C_{z,t}^{(2)} \hat q_z^{(1)}}{\hat C_{z,t}^{(1)} \hat C_{z,t}^{(2)}}. \qquad (2.7)$$

Note that this quantity is also well defined since $\hat C_{z,t}^{(i)} > 0$ for all $t \le \tau_z$. In appendix A, we provide a proof of theorem 2.1 in an empirical process framework. Note that since the survival estimates $\hat\theta$ and $\hat S$ are step functions, all integrals are exactly computable.
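Because the survival estimates are right-continuous step functions, the integral in equation (2.4) reduces to a finite sum over the merged jump points of the two curves. A minimal Python sketch of this exact computation (our own illustration, not the paper's R package):

```python
import numpy as np

def step_val(jumps, vals, t):
    """Value at t of a right-continuous step function equal to 1 on
    [0, jumps[0]) and vals[k-1] on [jumps[k-1], jumps[k])."""
    k = np.searchsorted(jumps, t, side="right")
    return 1.0 if k == 0 else vals[k - 1]

def integrate_step_difference(jumps1, vals1, jumps2, vals2, tau):
    """Exact integral of f1 - f2 over [0, tau] for two step survival curves."""
    grid = np.unique(np.concatenate(([0.0], jumps1, jumps2, [tau])))
    grid = grid[grid <= tau]
    total = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        # the integrand is constant on each interval [a, b)
        total += (step_val(jumps1, vals1, a) - step_val(jumps2, vals2, a)) * (b - a)
    return total
```

For instance, a curve dropping to 0.5 at t = 1 versus a curve dropping to 0 at t = 2 integrate to the same area over [0, 3], so their difference integrates to zero there.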

3. Numerical investigation

A computational implementation of the test statistic Θ^ and weighted survival estimators is available in the form of a package for R. This package also contains a class to handle arithmetic involving right-continuous piecewise linear functions. In the appendices, we have provided source code that may be used for installing and invoking this package.

Here, we present a computational investigation of the weighted survival curve estimator and the corresponding test statistic. Using simulations, we investigated the statistical power of $\hat\Theta$, contrasted with that of existing non-parametric methods. Using a real dataset, we demonstrate the computation of $\hat\Theta$ and $\hat\theta_t$, and evaluate type I error.

3.1. Evaluating statistical power through simulations

Using simulations, we explored the statistical power of the test statistic $\hat\Theta$ in a case where populations are difficult to distinguish based purely on mean survival time. As test populations, we examined admixtures of exponential and Weibull distributions for the event time, and compared survival in these mixture populations to survival of a population of purely exponential event times (figure 2). Population 1 consists of individuals having an exponentially distributed lifetime with a mean of $\lambda^{-1} = 4$ years. Population 2 consists of two types of individuals: those who have an exponentially distributed lifetime with a mean of 5 years (type $z = 1$), and those of type $z = 2$ who have a Weibull distributed lifetime with shape parameter $k = 5$ and scale parameter $\lambda = 1$.

Figure 2.

Admixture test distributions used in simulated investigations of our estimator. Populations are formed using $q_2 \in [0, 1)$ admixtures of $(1 - q_2)\,\mathrm{exponential}(\lambda = 5^{-1})$ and $q_2\,\mathrm{Weibull}(k = 5, \lambda = 1)$ event time distributions. Event time density functions $\pi_t$ and corresponding survival functions $S_t$ are shown for various values of $q_2$.

Since Population 1 is homogeneous, we only track subpopulations of Population 2—we drop the superscript and denote the proportion of Population 2’s members of type 2 by q2. It is most instructive to examine our method in the neighbourhood where both populations have approximately the same expectation value for the event time, which occurs for q2 ≈ 0.245. For this reason, we chose values near 0.25 for our simulations.
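The crossing point $q_2 \approx 0.245$ can be checked directly by matching the mixture's mean lifetime to Population 1's mean of 4 years. A quick numerical check (our own, assuming the standard shape/scale parametrization in which a Weibull$(k, \lambda)$ lifetime has mean $\lambda\,\Gamma(1 + 1/k)$):

```python
import math

# Mean of a Weibull(shape k = 5, scale lam = 1) lifetime: lam * Gamma(1 + 1/k)
weibull_mean = 1.0 * math.gamma(1.0 + 1.0 / 5.0)   # about 0.918 years

# Match the mixture mean to Population 1's mean of 4 years:
# (1 - q2) * 5 + q2 * weibull_mean = 4, solved for the mixing proportion q2
q2 = (5.0 - 4.0) / (5.0 - weibull_mean)
```

Solving gives $q_2 \approx 0.245$, consistent with the value quoted above.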

To compare the reweighted Kaplan–Meier estimator (equation (2.2)) to the standard Kaplan–Meier estimator, we estimated survival for the admixed population for q2 = 0.25, using various sample sizes. In figure 3, we present example reconstructions using these two methods. The estimator variance was approximated using 10 000 resamplings of sample size n of the admixed population, for each value of n. The estimation error, as defined by mean-squared difference between the reconstruction and the true survival function, was approximated in the same manner.

Figure 3.

Comparing estimators of survival. The survival estimation method of equation (2.2) compared to pure Kaplan–Meier for a population containing an admixture of $(1 - q_2)\,\mathrm{exponential}(\lambda = 5^{-1})$ and $q_2\,\mathrm{Weibull}(k = 5, \lambda = 1)$ individuals, where $q_2 = 0.25$. At a given sample size $n$, the survival estimates are obtained (top row: examples shown and contrasted). The estimator variance and mean square error were approximated using 10 000 resamplings for each of the sample sizes.

To better understand the performance of the test statistic (equation (2.4)), we evaluated its statistical power against that of other test statistics in distinguishing between Population 1 and Population 2 for various values of $q_2$. For samples of size $n^{(i)} \in \{30, 50, 100, 200, 1000\}$ taken from each population, we performed 1000 null hypothesis statistical tests using our method, the log-rank method [15] and the standard Kaplan–Meier Wilcoxon signed-rank difference-of-means method [16,17]. The power of the test, or the proportion of times that the null hypothesis was correctly rejected, is shown in figure 4.

Figure 4.

Simulated power computation comparing exponentially distributed lifetimes against a mixture of q2 Weibull and (1 − q2) exponential distributions, where q2 determines the amount of mixing. A larger value of q2 implies more real difference between the survival functions of the two populations. The power of our method (black) is compared to the power of Kaplan–Meier Wilcoxon signed rank (blue) and log-rank (red) methods. (More power is better.)

3.2. Evaluating type I error in a real world example

We applied the survival estimator and statistic to the NCCTG Lung Cancer data [18] available within the survival package for R. We compared survival between male ($n^{(1)} = 136$) and female ($n^{(2)} = 90$) cancer patients, organized by ECOG performance score ($z \in \{0, 1, 2\}$) as cohort. Using males as population 1 and females as population 2, we arrived at the test-statistic estimate $\hat\Theta = -961$, with 95% asymptotic confidence interval $(-1527, -396)$, which supports rejection ($p \approx 0.0009$) of the null hypothesis ($\theta_t^{(1)} = \theta_t^{(2)}$) at $\alpha = 0.05$. For reference, both the Wilcoxon ($p \approx 0.0012$) and log-rank ($p \approx 0.0015$) tests referenced in figure 5 also rejected the null hypothesis. In figure 5, cohort-level survival estimates are also shown.

Figure 5.

$\hat\theta_t$ estimates for days of lung cancer survival in males (population 1) versus females (population 2) from the NCCTG lung cancer dataset. The statistic $\hat\Theta$ implies an asymptotic $p$-value of 0.0009, rejecting $H_0$ at $\alpha = 0.05$.

In theory, the type I error is set by the significance level at study design. Whether a statistic controls type I error correctly depends on accurate evaluation of its sampling distribution. In the case of $\hat\Theta$, our main result is that the sampling distribution of this estimator converges asymptotically to a Gaussian with a definite variance; however, small-sample behaviour is not guaranteed. To evaluate type I error, we used the same dataset, restricted to male patients. For each of $n \in \{40, 80, 136\}$, we sampled $n$ male patients without replacement, split them into two groups so that $n^{(1)} = n^{(2)} = n/2$, and compared survival between the two random groups. Repeating this procedure 10 000 times, we generated the observed distribution of $p$-values, presented in figure 6 in log-scale. The distributions computed using the three methods are similar. The three methods all rejected $H_0$ approximately 5% of the time, except for $\hat\Theta$ at $n = 40$, which rejected $H_0$ approximately 6% of the time. Essentially, asymptotic convergence, as defined by accurate evaluation of $\alpha = 0.05$ type I error, occurs somewhere between 40 and 80 samples for this particular dataset.
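The evaluation above is essentially a random-split procedure: under $H_0$, both halves come from the same population, so the observed statistics trace out the null distribution. A schematic Python version, with a simple difference-of-means statistic standing in for $\hat\Theta$ (our own sketch, not the paper's R code):

```python
import numpy as np

def random_split_stat(times, rng):
    """Split one sample into two equal halves at random and return the
    difference of mean lifetimes between the halves (a stand-in statistic)."""
    perm = rng.permutation(len(times))
    half = len(times) // 2
    return times[perm[:half]].mean() - times[perm[half:]].mean()

rng = np.random.default_rng(0)
times = rng.exponential(scale=4.0, size=100)   # one homogeneous "population"
null_stats = np.array([random_split_stat(times, rng) for _ in range(2000)])
# Empirical p-values for observed statistics follow from this null distribution;
# a well-calibrated test rejects about alpha of the time under H0.
```

Because the two halves are exchangeable under the split, the null statistics centre on zero; the same logic underlies the male-only resampling experiment described above.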

Figure 6.

P-value distributions for the comparison between samples of size n/2 of two random subpopulations of male patients in the lung cancer data. The proportion of null hypotheses rejected for each of the three statistical methods is similar, at approximately 5% for α = 0.05.

Probing deeper, we examined the sampling distributions of $\hat\Theta$ for each of $n \in \{50, 60, 70\}$, in each instance compared to the Gaussian distribution stated in theorem 2.1, where the approximation is computed using only the first sample of size $n$. The results for these simulations are shown in figure 7, where it is seen that the sampling distribution of $\hat\Theta$ is approximately the same as the computed asymptotic Gaussian distribution, which is traced out in red.

Figure 7.

Histograms of $\hat\Theta$ sampling distributions for comparing survival between random subsets of male lung cancer patients using sample sizes of $n \in \{50, 60, 70\}$. The asymptotic Gaussian density, as computed using theorem 2.1 on the first sample set of each size, is overlaid in red.

The R code used to compute these examples is available in appendix B.3.

4. Discussion and conclusion

In this paper, we have proposed a test statistic that uses a cohort-averaged survival function estimator in order to make cross-population comparisons of survival within a null hypothesis statistical testing framework. The proposed survival estimator was an empirically weighted average of cohort-level product-limit estimates. The test statistic involved computation of the area between estimated survival functions for two populations. By invoking an empirical stochastic process, we proved asymptotic normality of this test statistic.

Using simulations, we contrasted the weighted survival estimator against the pure Kaplan–Meier estimator. As seen in figure 3, the survival curves generated by the two methods are distinct yet similar. The second and third rows of figure 3 show that the reweighted estimator has performance comparable to the pure Kaplan–Meier estimator at large sample sizes. Asymptotically, both estimators converge to the true survival function, with variance converging to zero. At small sample sizes, there are differences: the reweighted estimator trades larger bias at earlier times for decreased error at later times (better reconstruction of tails), while its variance is lower at all times. Hence, depending on costs, for small samples this reweighted estimator may be preferable to the pure Kaplan–Meier estimator.

In simulations of the test statistic derived from the reweighted survival estimator, we saw superior performance compared to existing methods. Figure 4 shows that, in all cases, the test statistic $\hat\Theta$ was better at distinguishing between the two populations than either the Wilcoxon signed-rank test or the log-rank test. The relatively high statistical power of this statistic is due to its tighter variation. In nearly all cases (greater than 99.5%), the estimator variance for the tested method was less than that of the other two tests (not shown).

This paper derives the asymptotic convergence in distribution of the $\hat\Theta$ statistic. Numerically, we demonstrated convergence of the statistic in figures 6 and 7, where we verified that the asymptotic approximation respects type I error at $\alpha = 0.05$ and observed a good match between the sampling distribution of $\hat\Theta$ and the asymptotic Gaussian distribution provided by theorem 2.1.

A variant of this method was used in Rasch et al. [9] in order to classify physical disorders based on severity for the sake of prioritization of processing for disability claims. Since the underlying survival surface is non-stationary and the fixed observation windows create progressive censoring, that paper illustrates the utility of this statistical method. In that paper, the cohorts were defined based on binned application entry times, and a heuristic 'survival surface' was generated in order to get a single overall picture of the survivability of a given disorder. The censoring parameters $\tau_z$ varied due to the finite sampling window and the fact that more recent cohorts are not observed for as long a time period as older cohorts, as depicted in figure 1b. It was also expected that survival by cohort would vary due to differences in healthcare administration and treatment between entry cohorts. The use of the empirical prevalences ($\hat q_z$) made it possible to account for variability in disability application volume by sufferers of given disorders, conditional on entry date.

We note that a strong limitation of the presented method lies in its framing in terms of null hypothesis statistical testing. The $\hat\Theta$ statistic only provides a $p$-value, as opposed to other tests, such as the log-rank test, which provide hazard ratios as well. As a trade-off for statistical power, one is sacrificing interpretability in the form of effect sizes.

Although the most direct and natural applications of the method that we have presented here involve discretely indexed covariates, it is possible to use this method for continuously indexed covariates such as time by employing the binning strategy used by Rasch et al. [9]. This approach is particularly fruitful if the sampling windows are coarse, and there is a clear separation between cohorts to maintain statistical independence. In this situation, it may be unreasonable to expect to construct a full continuous surface for survival. Nonetheless, a possible future extension of this method might involve replacing the sum of equation (2.1) with an integral and using statistical regularization tools [19] in order to infer true continuously indexed surfaces.


Acknowledgements

The authors thank Dr Leighton Chan and Dr Elizabeth Rasch for insightful discussions, guidance and support, and Dr Pei-Shu Ho for help obtaining data.

Appendix A. Proof of the main theorem

To prove the main theorem, we use an empirical process modelling framework to develop the asymptotic properties of Kaplan–Meier estimators weighted first by deterministic proportions. We then replace the deterministic proportions with estimates given by the sample prevalences of the cohorts. Here, we restate the main theorem and prove it through a series of lemmata.

Theorem 2.1. —

Let $C_{z,t}^{(i)}$ denote the probability that a $z$-type individual has not yet been censored at time $t \ge 0$ (the survival probability relative to the occurrence of censoring), let $q_z^{(i)}$ denote the probability that an individual in population $i$ is of cohort $z$, and let $p^{(i)} = n^{(i)}/(n^{(1)} + n^{(2)})$. Suppose that $\theta_t^{(1)} = \theta_t^{(2)}$. Then $\hat\Theta \xrightarrow{d} N(0, \sigma^2)$ as $n^{(i)} \to \infty$, with

$$\sigma^2 = \sum_{i=1}^{2}\big(1 - p^{(i)}\big)\left[\sum_{z=1}^{d} q_z^{(i)}\phi_z^2 - \Big(\sum_{z=1}^{d} q_z^{(i)}\phi_z\Big)^2\right] - \sum_{z=1}^{d}\int_0^{\tau_z} \mathrm{d}S_{z,t}\, W_{z,t}\left(\frac{\phi_{z,t}}{S_{z,t}}\right)^2,$$

where for $0 \le t \le \tau_z$, where $\tau_z$ is the time at which samples of cohort $z$ are censored, $\phi_{z,t} = \int_t^{\tau_z} \mathrm{d}s\, S_{z,s}$, $\phi_z \equiv \phi_{z,0}$, $S_{z,t}$ is the survival function for the pooled data of cohort $z$, and

$$W_{z,t} = \frac{p^{(1)} C_{z,t}^{(1)} q_z^{(2)} + p^{(2)} C_{z,t}^{(2)} q_z^{(1)}}{C_{z,t}^{(1)} C_{z,t}^{(2)}}.$$

The variance $\sigma^2$ may be consistently estimated by

$$\hat\sigma^2 = \sum_{i=1}^{2}\big(1 - p^{(i)}\big)\left[\sum_{z=1}^{d} \hat q_z^{(i)}\hat\phi_z^2 - \Big(\sum_{z=1}^{d} \hat q_z^{(i)}\hat\phi_z\Big)^2\right] - \sum_{z=1}^{d}\int_0^{\tau_z} \mathrm{d}\hat S_{z,t}\, \hat W_{z,t}\left(\frac{\hat\phi_{z,t}}{\hat S_{z,t}}\right)^2, \qquad (\text{A } 1)$$

where for $0 \le t \le \tau_z$, $\hat S_{z,t}$ is the product-limit estimate of the pooled data for cohort $z$,

$$\hat\phi_{z,t} = \int_t^{\tau_z} \mathrm{d}s\, \hat S_{z,s}, \qquad (\text{A } 2)$$

$\hat C_{z,t}^{(i)}$ is the product-limit estimate associated with the event of censoring for cohort $z$ within population $i$, $\hat\phi_z \equiv \hat\phi_{z,0}$, and

$$\hat W_{z,t} = \frac{p^{(1)} \hat C_{z,t}^{(1)} \hat q_z^{(2)} + p^{(2)} \hat C_{z,t}^{(2)} \hat q_z^{(1)}}{\hat C_{z,t}^{(1)} \hat C_{z,t}^{(2)}}. \qquad (\text{A } 3)$$

Overview of proof of theorem 2.1. To prove the main theorem, we turn to the modelling framework presented in A.2. In general, we proceed by first assuming fixed sample proportions and then extending results to random proportions as given by the empirical prevalence (equation (2.3)). The convergence of $\hat\Theta$ follows directly from corollary A.9 and equation (A 18). The consistency of $\hat\sigma^2$ follows from theorem 4.2.2 of [2], which provides for weak convergence of the product-limit estimator to a Gaussian process, and the Glivenko–Cantelli theorem. ▪

A.1. Preliminaries and notation

Given any pair of random elements $X$, $Y$, we denote equality in a distributional sense by $X \overset{d}{=} Y$. Let $P$ be a probability measure on the measurable space $(\mathcal{X}, \mathcal{A})$. The empirical measure generated by the sample of random elements $x_1, \ldots, x_n$, $n \in \mathbb{N}$, is given by

$$P_n = n^{-1} \sum_{i=1}^{n} \delta_{x_i}, \qquad (\text{A } 4)$$

where for any $x \in \mathcal{X}$ and any $A \in \mathcal{A}$,

$$\delta_x(A) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases} \qquad (\text{A } 5)$$

Note that alternatively, when needed, one may write $\delta_x(A)$ as the indicator function $\mathbf{1}_A(x)$ on the set $A$. Furthermore, in the case that $A = \{k\}$, $k \in \mathbb{Z}$, and $x \in \mathbb{Z}$, we write $\delta_x(A) \equiv \delta_{x,k}$.

Given $\mathcal{H}$, a class of measurable functions $h : \mathcal{X} \to \mathbb{R}$, the empirical measure generates the map $\mathcal{H} \to \mathbb{R}$ given by $h \mapsto P_n h$, where for any signed measure $Q$ and measurable function $h$ we use the notation $Qh = \int h\, \mathrm{d}Q$. Furthermore, define the $\mathcal{H}$-indexed empirical process $G_n$ by

$$G_n h = \sqrt{n}\,(P_n - P)h = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big(h(x_i) - Ph\big), \qquad (\text{A } 6)$$

and with the empirical process, identify the signed measure $G_n = n^{-1/2} \sum_{i=1}^{n} (\delta_{x_i} - P)$.

Note that for a measurable function $h$, from the law of large numbers and the central limit theorem it follows that $P_n h \xrightarrow{a.s.} Ph$ and $G_n h \xrightarrow{d} N\big(0, Ph^2 - (Ph)^2\big)$, provided $Ph$ exists and $Ph^2 < \infty$, where '$\xrightarrow{d}$' denotes convergence in distribution. In addition to the preceding notation, given the elements $f$ and $f_n$, $n \in \mathbb{N}$, we denote convergence in probability of $f_n$ to $f$ by $f_n \xrightarrow{p} f$ and convergence in distribution by $f_n \xrightarrow{d} f$.
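The convergence $G_n h \xrightarrow{d} N(0, Ph^2 - (Ph)^2)$ is easy to verify numerically. A small Monte Carlo sketch (our own illustration) with $h(x) = x$ and $P = \mathrm{Uniform}(0, 1)$, for which $Ph = 1/2$ and $Ph^2 - (Ph)^2 = 1/12$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 4000

# reps independent samples of size n from P = Uniform(0, 1)
draws = rng.uniform(size=(reps, n))

# G_n h = sqrt(n) * (P_n h - P h) with h(x) = x and P h = 1/2
Gn_h = np.sqrt(n) * (draws.mean(axis=1) - 0.5)

# The empirical variance of G_n h should be close to P h^2 - (P h)^2 = 1/12
```

The empirical mean of `Gn_h` is near zero and its variance near 1/12, as the central limit theorem predicts.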

For any map $x : \mathcal{H} \to \mathbb{R}^k$, $k \in \mathbb{N}$, define the uniform norm $\|x\|_{\mathcal{H}}$ by

$$\|x\|_{\mathcal{H}} = \sup\{|x(h)| : h \in \mathcal{H}\}, \qquad (\text{A } 7)$$

and in the case that $\mathcal{H} \subseteq \mathbb{R}$, write $\|\cdot\|_{\mathcal{H}} \equiv \|\cdot\|_\infty$. A class $\mathcal{H}$ for which $\|P_n - P\|_{\mathcal{H}} \to 0$ is called a $P$-Glivenko–Cantelli class. Denote by $\ell^\infty(\mathcal{H})$ the class of uniformly bounded functions on $\mathcal{H}$. That is, for a general $k \in \mathbb{N}$,

$$\ell^\infty(\mathcal{H}) = \big\{x : \mathcal{H} \to \mathbb{R}^k : \|x\|_{\mathcal{H}} < \infty\big\}.$$

If for some tight Borel measurable element $G \in \ell^\infty(\mathcal{H})$, $G_n \xrightarrow{d} G$ in $\ell^\infty(\mathcal{H})$, we say that $\mathcal{H}$ is a $P$-Donsker class.

A.2. Empirical process framework

To prove theorem 2.1, we turn to an empirical modelling framework that will provide us the asymptotic statistics of the weighted product limit estimator. Consider a closed particle system, such that according to a predefined set of characteristics, the system can be subdivided into mutually exclusive subsystems.

Each particle corresponds to the observed state of a particular individual in a fixed population cohort. Note that we will restrict this discussion to only a single population of particles. These arguments will extend to multiple populations as mentioned in this paper by treating separate populations as independent.

At any given time $t \ge 0$, each particle will have exactly one associated state $x$ in the set $\mathbb{Z}_4 = \{0, 1, 2, 3\}$, referring, respectively, to states of

$$x = \begin{cases} 0, & \text{dormancy}, \\ 1, & \text{activity}, \\ 2, & \text{inactivity}, \\ 3, & \text{censored}. \end{cases} \qquad (\text{A } 8)$$

Assume that the path of any particle is statistically dependent upon its particular subsystem, and that given the respective subsystems of any two particles, their resulting paths are statistically independent. Assume further that at a reference time t = 0, all particles enter into the active state (x = 1), and that particles are considered dormant for all t < 0.

Let $d \in \mathbb{N}$ and $\tau \in (0, \infty)$ be fixed. We will assume the existence of a collection of individuals $\Gamma$, assumed to be infinite in size, where each individual $\gamma \in \Gamma$ exhibits a càdlàg path-valued state $x_t^\gamma$ for $t \ge 0$. For each $\gamma \in \Gamma$, $x_t^\gamma$ is determined by the individual's particle type $z_\gamma$ and a random jump time $\xi_\gamma$. The particle type $z_\gamma$ is distributed in the population through the probability mass $P(z_\gamma = z) = q_z$, where $q = (q_1, \ldots, q_d) \in (0, 1)^d$ satisfies $\sum_{z=1}^{d} q_z = 1$. Let $S_t = (S_{1,t}, \ldots, S_{d,t})$ be the survival vector, $S_{z,t} = P\{T_z > t\}$, which is assumed continuous for $t \ge 0$. Suppose that it is desired to understand the event probabilities for randomly selected $\gamma \in \Gamma$, unconditional on subgroup membership. We assume that members of each cohort are in the dormant (0) state at times $t < 0$.

Given a random sample $\gamma_1, \ldots, \gamma_n$, $n \in \mathbb{N}$, of individuals, let $\mathbf{n} = (n_1, \ldots, n_d)$ and

$$n = \sum_{z=1}^{d} n_z, \qquad (\text{A } 9)$$

where $n_z$ is the random number of drawn individuals of cohort $z$. In considering the event time probabilities of each subgroup, the randomness of the number of particles precludes the use of many well-established results in survival analysis. Therefore, we begin with a somewhat restricted framework and assume a known number of initial individuals of each type.

Assume the sample contains a known number $n_z = a_z n$, $a_z \in (0, 1)$, of individuals of cohort $z$, and let $\mu_{j,z,t}^{n_z} \ge 0$ be the number of cohort-$z$ individuals who are in state $j \in \mathbb{Z}_4$ at time $t$, so that

$$\sum_{j=0}^{3} \mu_{j,z,t}^{n_z} = n_z \qquad (\text{A } 10)$$

is conserved. Also, we assume that there exists $\tau_z < \infty$ by which all particles have either become inactive or censored, so that $\tau_z$ is the infimum time where the condition

$$n_z = \mu_{2,z,t}^{n_z} + \mu_{3,z,t}^{n_z} \quad \forall\, t > \tau_z \qquad (\text{A } 11)$$

holds.

For the sample of size $n_z$, we denote the $z$-type cumulative hazard by $\Lambda_{z,t}$ and, respectively, define the $z$-type cumulative hazard and survival estimates by

$$\hat\Lambda_{z,t}^{n_z} = \int_0^t \frac{\mathrm{d}\mu_{2,z,s}^{n_z}}{\mu_{1,z,s}^{n_z}} \qquad (\text{A } 12)$$

and

$$\hat S_{z,t}^{n_z} = \prod_{s \le t} \big(1 - \mathrm{d}\hat\Lambda_{z,s}^{n_z}\big). \qquad (\text{A } 13)$$

Define further

$$B_{z,t}^{n_z} = \sqrt{n_z}\, \frac{\hat S_{z,t}^{n_z} - S_{z,t}}{S_{z,t}}$$

and note that $\hat S_{z,t}^{n_z} = \hat S_{z,\tau_z}^{n_z}$ and $B_{z,t}^{n_z} = B_{z,\tau_z}^{n_z}$ for all $t \ge \tau_z$.

From [2], it follows that $\{B_{z,t}^{n_z} : t \ge 0\}$ is a mean-zero square-integrable martingale with Meyer bracket process

$$\big\langle B_{z}^{n_z}, B_{w}^{n_z} \big\rangle_t = \delta_{zw}\, n_z \int_0^{t \wedge \tau_z} \mathrm{d}\Lambda_{z,s} \left(\frac{\hat S_{z,s}^{n_z}}{S_{z,s}}\right)^2 \frac{\mathbf{1}\{\mu_{1,z,s}^{n_z} > 0\}}{\mu_{1,z,s}^{n_z}}, \qquad (\text{A } 14)$$

where $t \wedge \tau_z = \min\{t, \tau_z\}$ and $\delta_{zw}$ is the Kronecker delta.

A.3. Convergence theorems

In order to guarantee convergence of the estimator, we make the following assumptions (based upon an initially known sample size distribution n).

Assumption A.1. —

We assume that the initial sample is chosen large enough to ensure that individuals of cohort $z$ at state 1 (active) exist at all points $t \in [0, \tau_z]$, $z \in \{1, \ldots, d\}$. That is,

$$\inf_{z \in \mathbb{Z}_d} \mu_{1,z,\tau_z}^{n_z} > 0, \quad \text{a.s.}$$

Since any survival function is monotone, an immediate result that follows from the above assumption is

$$c < S_{z,\tau_z} \le S_{z,t} \le 1, \quad t \ge 0, \qquad (\text{A } 15)$$

for some constant $c > 0$.

for some constant c > 0.

Assumption A.2. —

It is assumed that as $n$ becomes large, the sample size for each individual type will grow to infinity. That is,

$$\lim_{n \to \infty}\, \inf_{z \in \mathbb{Z}_d,\, a \in V} \mu_{1,z,\tau_z}^{n a_z} = \infty, \quad \text{a.s.}$$

Assumption A.3. —

For each $z \in \{1, \ldots, d\}$ there exists a non-increasing continuous function $m_z : [0, \infty) \to (0, 1]$ such that

$$\lim_{n \to \infty}\, \sup_{t \ge 0} \left| \frac{\mu_{1,z,t}^{n a_z}}{n a_z} - m_{z,t} \right| = 0 \quad \text{a.s.}$$

Note that in the case of fixed censoring, that is, in the case that censoring exists only at time τ, the above is satisfied by mz,t = Sz,t. In the general case, mz,t can be seen as the probability that an individual of cohort z has not yet left state 1. That is, mz,t is the probability that an individual has not left due to censoring or death by time t, and so mz,t = Sz,t Cz,t, where Cz,t is the probability that censoring has not occurred by time t.

To prove the main theorem, we now present a series of lemmata.

Lemma A.4. —

If $\hat q$ is defined as in equation (2.3) and $\hat S_{z,s}^{n_z}$ is defined as in equation (A 13), then

$$\sqrt{n} \sum_{z=1}^{d} (\hat q_z - q_z) \int_0^{t \wedge \tau_z} \mathrm{d}s\, \big(\hat S_{z,s}^{n \hat q_z} - S_{z,s}\big) \xrightarrow{p} 0,$$

as $n \to \infty$, uniformly in $t \ge 0$.

Proof. —

It is claimed that to prove the statement of the lemma, it suffices to show that

$$\sup_{t\ge 0}\left(\frac{\hat S^{n\hat q_z}_{z,t} - S_{z,t}}{S_{z,t}}\right)^2 \xrightarrow{p} 0, \quad \text{(A 16)}$$

for each $z = 1,\ldots,d$.

Indeed, if the above holds, then

$$\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat S^{n\hat q_z}_{z,s} - S_{z,s}\right) \xrightarrow{p} 0,$$

uniformly in $t\ge 0$. Since the central limit theorem implies that $\sqrt{n}(\hat q_z - q_z)\xrightarrow{d} N(0,\, q_z(1-q_z))$, each term in the sum would then converge in probability to 0, uniformly in $t\ge 0$.

And so, if $E_N$ denotes the expectation given $N$, we have that

$$E\left(\frac{\hat S^{n\hat q_z}_{z,t} - S_{z,t}}{S_{z,t}}\right)^2 = E\,\frac{1}{n\hat q_z}E_{n\hat q_z}\left(B^{n\hat q_z}_{z,t}\right)^2 = E\,\frac{1}{n\hat q_z}E_{n\hat q_z}\, n\hat q_z\int_0^{t\wedge\tau_z}\frac{\mathrm{d}\Lambda_{z,s}}{\mu^{n\hat q_z}_{1,z,s}}\left(\frac{\hat S^{n\hat q_z}_{z,s}}{S_{z,s}}\right)^2 = E\int_0^{t\wedge\tau_z}\frac{\mathrm{d}\Lambda_{z,s}}{\mu^{n\hat q_z}_{1,z,s}}\left(\frac{\hat S^{n\hat q_z}_{z,s}}{S_{z,s}}\right)^2 \le C\,E\left(\mu^{n\hat q_z}_{1,z,\tau_z}\right)^{-1},$$

for some constant $C$. From Lenglart's inequality (cf. [20]),

$$P\left\{\sup_t\left(\frac{\hat S^{n\hat q_z}_{z,t} - S_{z,t}}{S_{z,t}}\right)^2 > \epsilon\right\} \le \frac{\eta}{\epsilon} + P\left\{\mu^{n\hat q_z}_{1,z,\tau_z} < \frac{C}{\eta}\right\},$$

for any $\eta, \epsilon > 0$. Therefore, from assumption A.2, since $n_z\to\infty$ a.s., the desired result follows. ▪

Turning momentarily to the situation where there are two populations denoted by superscripts (1) and (2), for any $t\ge 0$, define

$$\hat\Theta^\delta_t = \sqrt{\frac{n^{(2)}}{n^{(1)}+n^{(2)}}}\int_0^{t\wedge\tau}\mathrm{d}s\,\sqrt{n^{(1)}}\left(\hat\theta^{(1)}_s - \theta^{(1)}_s\right) - \sqrt{\frac{n^{(1)}}{n^{(1)}+n^{(2)}}}\int_0^{t\wedge\tau}\mathrm{d}s\,\sqrt{n^{(2)}}\left(\hat\theta^{(2)}_s - \theta^{(2)}_s\right),$$

noting that setting $\theta^{(1)}_s = \theta^{(2)}_s$ recovers our test statistic of equation (2.4). For a general survival function $\theta$, with respective estimate $\hat\theta$, define $\hat Y_t$ by

$$\hat Y_t = \int_0^{t\wedge\tau}\mathrm{d}s\,\sqrt{n}\left(\hat\theta_s - \theta_s\right), \quad t\ge 0. \quad \text{(A 17)}$$

If the process $\hat Y$ converges in distribution to some $Y\sim N(0,\sigma^2)$, then, since $n^{(i)}/(n^{(1)}+n^{(2)})$ converges to $p^{(i)}$, $i = 1, 2$, it follows that

$$\hat\Theta^\delta_t \xrightarrow{d} \sqrt{p^{(2)}}\,Y^{(1)}_t - \sqrt{p^{(1)}}\,Y^{(2)}_t \sim N\left(0,\ p^{(2)}\sigma_1^2 + p^{(1)}\sigma_2^2\right). \quad \text{(A 18)}$$
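As a computational illustration of (A 18): under the null $\theta^{(1)} = \theta^{(2)}$, the statistic reduces to $\sqrt{n^{(1)}n^{(2)}/(n^{(1)}+n^{(2)})}$ times the area between the two estimated curves, which can then be standardized by the plug-in variance $p^{(2)}\sigma_1^2 + p^{(1)}\sigma_2^2$. The Python sketch below is ours, not the paper's R implementation; the function name and trapezoidal integration are assumptions, and the variance estimates are taken as supplied:

```python
import math

def area_test(t_grid, surv1, surv2, n1, n2, sigma1_sq, sigma2_sq):
    """Two-sample area-between-curves test, standardized as in (A 18).

    t_grid       : common, increasing time grid (trapezoidal integration)
    surv1, surv2 : population survival estimates evaluated on t_grid
    n1, n2       : sample sizes of the two populations
    sigma*_sq    : plug-in estimates of the limiting variances of Y(1), Y(2)
    Returns the standardized statistic and a two-sided normal p-value.
    """
    # Under the null, Theta = sqrt(n1*n2/(n1+n2)) * integral (S1 - S2) ds.
    area = 0.0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        area += 0.5 * ((surv1[k] - surv2[k]) + (surv1[k + 1] - surv2[k + 1])) * dt
    theta = math.sqrt(n1 * n2 / (n1 + n2)) * area
    # Limiting variance p2*sigma1^2 + p1*sigma2^2, with p(i) = n(i)/(n1+n2).
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)
    z = theta / math.sqrt(p2 * sigma1_sq + p1 * sigma2_sq)
    # Two-sided p-value from the standard normal CDF, Phi(x) = (1+erf(x/sqrt 2))/2.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p_value
```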

Now we turn to analysis under a single population, dropping the superscripts. Note that $\hat Y_t = \sum_{z=1}^d \hat Z_{z,t}$, where

$$\hat Z_{z,t} = \sqrt{n}\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat q_z\hat S^{n\hat q_z}_{z,s} - q_zS_{z,s}\right) = \sqrt{n}(\hat q_z - q_z)\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat S^{n\hat q_z}_{z,s} - S_{z,s}\right) + \sqrt{n}(\hat q_z - q_z)\int_0^{t\wedge\tau_z}\mathrm{d}s\,S_{z,s} + \sqrt{n}\,q_z\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat S^{n\hat q_z}_{z,s} - S_{z,s}\right). \quad \text{(A 19)}$$

Therefore, if it can be shown that

$$\sqrt{n}\sum_{z=1}^d(\hat q_z - q_z)\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat S^{n\hat q_z}_{z,s} - S_{z,s}\right) \xrightarrow{p} 0,$$

uniformly in $t$, then convergence of $(\hat Y_t : t\ge 0)$ depends only upon the convergence of the $d$-dimensional vector-valued process $\hat\zeta(\hat q)$ given by

$$\hat\zeta_{z,t}(a) = \sqrt{n}(\hat q_z - q_z)\int_0^{t\wedge\tau_z}\mathrm{d}s\,S_{z,s} + \sqrt{n}\,q_z\int_0^{t\wedge\tau_z}\mathrm{d}s\left(\hat S^{na_z}_{z,s} - S_{z,s}\right), \quad \text{(A 20)}$$

with $a = (a_1,\ldots,a_d)\in(0,1)^d$ chosen in a sufficiently small neighbourhood $V$ of $q$. This decomposition will thus lead to the main theorem. To show the desired convergence of $\hat\zeta_t(\hat q)$, we first focus on the convergence of $\hat\zeta_t(a)$.

Let $\phi_{z,t} = \int_t^{\tau_z}\mathrm{d}s\,S_{z,s}$ and write $\hat\zeta_t(a) = \hat\zeta^1_t + \hat\zeta^2_t(a)$, where

$$\hat\zeta^1_{z,t} = \sqrt{n}(\hat q_z - q_z)\int_0^{t\wedge\tau_z}(-\mathrm{d}\phi_{z,s}) \quad \text{(A 21)}$$

and

$$\hat\zeta^2_{z,t}(a) = \frac{q_z}{\sqrt{a_z}}\int_0^{t\wedge\tau_z}(-\mathrm{d}\phi_{z,s})\,B^{na_z}_{z,s}. \quad \text{(A 22)}$$

Lemma A.5. —

Suppose that $\{\hat\zeta^1_t : t\ge 0\}$ and $\{\hat\zeta^2_t(a) : t\ge 0\}$ are the processes respectively defined by equations (A 21) and (A 22), and that $\tilde B$ is the $d$-dimensional mean-zero Gaussian process defined by

$$\left\langle\tilde B_z, \tilde B_w\right\rangle_t = \delta_{z,w}\int_0^{t\wedge\tau_z}\frac{\mathrm{d}\Lambda_{z,s}}{S_{z,s}C_{z,s}}. \quad \text{(A 23)}$$

Then $\hat\zeta^1_t\xrightarrow{d}\zeta^1_t$ and $\hat\zeta^2_t(a)\xrightarrow{d}\zeta^2_t(a)$, in the space of compactly supported functions $D_{\mathbb{R}^d}[0,\infty)$ as $n\to\infty$, for each $a\in V$, where $\zeta^1_t = (\zeta^1_{1,t},\ldots,\zeta^1_{d,t})$ is the mean-zero square-integrable Gaussian process defined by

$$\left\langle\zeta^1_z,\zeta^1_w\right\rangle_t = -q_zq_w\int_0^{t\wedge\tau_z}\mathrm{d}s\,S_{z,s}\int_0^{t\wedge\tau_w}\mathrm{d}s\,S_{w,s} + \delta_{z,w}\,q_z\left(\int_0^{t\wedge\tau_z}\mathrm{d}s\,S_{z,s}\right)^2, \quad \text{(A 24)}$$

and $\zeta^2_t(a) = (\zeta^2_{1,t}(a),\ldots,\zeta^2_{d,t}(a))$ is given by

$$\zeta^2_{z,t}(a) = \frac{q_z}{\sqrt{a_z}}\left(\int_0^{t\wedge\tau_z}\mathrm{d}\tilde B_{z,s}\,\phi_{z,s} - \phi_{z,t\wedge\tau_z}\tilde B_{z,t\wedge\tau_z}\right). \quad \text{(A 25)}$$

The processes $\hat\zeta^1$ and $\hat\zeta^2(a)$ are independent, and there exist Skorohod representations such that

$$\sup_{t\ge 0}\left|\hat\zeta^1_{z,t} - \zeta^1_{z,t}\right| \to 0$$

and

$$\sup_{t\ge 0,\,a\in V}\left|\hat\zeta^2_{z,t}(a) - \zeta^2_{z,t}(a)\right| \to 0,$$

almost surely as $n\to\infty$.

Proof. —

To begin, note that independence follows immediately from the independence of the respective limiting processes. Since $\mathbf{n}$ is a multinomial random variable, (A 24) follows from the central limit theorem. In the case of $\hat\zeta^2_t(a)$, we first consider $B^{na_z}_{z,t}$.

An application of Lenglart's inequality, very similar to that in the proof of lemma A.4, along with assumption A.2, shows that

$$\sup_{a\in V,\,t\ge 0}\left|\hat S^{na_z}_{z,t} - S_{z,t}\right| \xrightarrow{p} 0, \quad \text{as } n\to\infty.$$

Moreover, from assumption A.3,

$$\sup_{a\in V,\,t\ge 0}\left|\frac{na_z}{\mu^{na_z}_{1,z,t}} - \frac{1}{m_{z,t}}\right| \xrightarrow{p} 0, \quad \text{as } n\to\infty.$$

It follows that

$$\frac{na_z}{\mu^{na_z}_{1,z,t}}\left(\frac{\hat S^{na_z}_{z,t}}{S_{z,t}}\right)^2 \xrightarrow{p} \frac{1}{m_{z,t}},$$

uniformly in $t\ge 0$, and since $m_{z,t} = S_{z,t}C_{z,t}$,

$$\left\langle B^{na_z}_z, B^{na_w}_w\right\rangle_t \xrightarrow{p} \delta_{z,w}\int_0^{t\wedge\tau_z}\frac{\mathrm{d}\Lambda_{z,s}}{S_{z,s}C_{z,s}}.$$

Therefore, from theorem 4.2.1 of [2], $B^{na_z}_{z,t}\xrightarrow{d}\tilde B_{z,t}$, and there exists a Skorohod representation of $B^{na_z}_{z,t}$ such that

$$\sup_{t\ge 0,\,a\in V}\left|B^{na_z}_{z,t} - \tilde B_{z,t}\right| \to 0,$$

almost surely as $n\to\infty$. Since almost sure convergence of $B^{na_z}_{z,t}$ implies almost sure convergence of bounded functionals of $B^{na_z}_{z,t}$, the desired convergence of $\hat\zeta^2(a)$ follows from theorem 2.1 of [3]. ▪

Corollary A.6. —

If the process $\hat\zeta(a) = \{\hat\zeta_t(a)\}$ is defined by equation (A 20), then

$$\sum_{z=1}^d\hat\zeta_z(a) \xrightarrow{d} \sum_{z=1}^d\zeta_{z,t}(a) = \sum_{z=1}^d\left(\zeta^1_{z,t} + \zeta^2_{z,t}(a)\right). \quad \text{(A 26)}$$

Proof. —

From the previous theorem, we may assume that $\hat\zeta^1_{z,t}\to\zeta^1_{z,t}$ and $\hat\zeta^2_{z,t}(a)\to\zeta^2_{z,t}(a)$ almost surely, uniformly for $a\in V$ and $t\ge 0$. Therefore,

$$\hat\zeta_t(a) \to \zeta_t(a)$$

almost surely, uniformly for $a\in V$ and $t\ge 0$. The statement of the corollary then follows from theorem 5.1 of [21]. ▪

Since $\mathbf{n}/n\xrightarrow{p} q$, from theorem 4.4 of [21],

$$\left(\frac{\mathbf{n}}{n},\ \left\{\sum_{z=1}^d\hat\zeta_{z,t}(a) : t\ge 0\right\}\right) \xrightarrow{d} \left(q,\ \left\{\sum_{z=1}^d\zeta_{z,t}(a) : t\ge 0\right\}\right).$$

Define the map $g : V\times\ell^\infty(V\times[0,\infty))\to\ell^\infty([0,\infty))$ by $g(a,f) = f(a,\cdot)$; then

$$\sum_{z=1}^d\zeta_{z,t}\left(\frac{\mathbf{n}}{n}\right) = g\left(\frac{\mathbf{n}}{n},\ \sum_{z=1}^d\zeta_z\right).$$

Furthermore, if for any $(a_1,f_1), (a_2,f_2)\in V\times\ell^\infty(V\times[0,\infty))$ we have that

$$|a_1 - a_2| + \sup_{a\in V,\,t\ge 0}\left|f_1(a,t) - f_2(a,t)\right| < \delta$$

for some $\delta > 0$, then

$$\sup_{t\ge 0}\left|g(a_1,f_1)(t) - g(a_2,f_2)(t)\right| = \sup_{t\ge 0}\left|f_1(a_1,t) - f_2(a_2,t)\right| \le \sup_{t\ge 0}\left|f_1(a_1,t) - f_1(a_2,t)\right| + \sup_{t\ge 0}\left|f_1(a_2,t) - f_2(a_2,t)\right|.$$

Therefore, $g$ is continuous at any $(a, f)$ such that $f$ is continuous at $a$, uniformly in $t$. It thus follows from the continuous mapping theorem (cf. [4]) that if $a\mapsto\sum_{z=1}^d\zeta_{z,t}(a)$ is continuous, uniformly in $t$, then

$$g\left(\frac{\mathbf{n}}{n},\ \sum_{z=1}^d\hat\zeta_z\right) \xrightarrow{d} g\left(q,\ \sum_{z=1}^d\zeta_z\right). \quad \text{(A 27)}$$

Lemma A.7. —

If $\{\zeta_t(a) : t\ge 0\}$ is defined as in corollary A.6, then the map

$$a \mapsto \sum_{z=1}^d\zeta_{z,t}(a)$$

is continuous for $a\in V$, uniformly in $t\ge 0$.

Proof. —

For any $a, b\in V$, it follows that

$$\sum_{z=1}^d\zeta_{z,t}(a) - \sum_{z=1}^d\zeta_{z,t}(b) = \sum_{z=1}^d q_z\left(\frac{1}{\sqrt{a_z}} - \frac{1}{\sqrt{b_z}}\right)\left(\int_0^{t\wedge\tau_z}\mathrm{d}\tilde B_{z,s}\,\phi_{z,s} - \phi_{z,t\wedge\tau_z}\tilde B_{z,t\wedge\tau_z}\right).$$

Since $S_{z,\tau_z} > 0$ for all $z$, from Doob's martingale inequality (cf. [22]),

$$E\sup_{t\ge 0}\left(\sum_{z=1}^d\zeta_{z,t}(a) - \sum_{z=1}^d\zeta_{z,t}(b)\right)^2 \le C\sum_{z=1}^d\left(\frac{1}{\sqrt{a_z}} - \frac{1}{\sqrt{b_z}}\right)^2,$$

for some constant $C$. For each $z\in\mathbb{N}_d$, since $a_z$ and $b_z$ are sufficiently close to $q_z\in(0,1)$, there exists some $\delta > 0$ such that $a_z\wedge b_z > \delta$. Therefore,

$$\left(\frac{1}{\sqrt{a_z}} - \frac{1}{\sqrt{b_z}}\right)^2 = \frac{\left(\sqrt{a_z} - \sqrt{b_z}\right)^2}{a_zb_z} \le \delta^{-2}\,\frac{(a_z - b_z)^2}{\left(\sqrt{a_z} + \sqrt{b_z}\right)^2} \le \frac{1}{4\delta^3}(a_z - b_z)^2.$$

Combining the above two results gives

$$E\sup_{t\ge 0}\left(\sum_{z=1}^d\zeta_{z,t}(a) - \sum_{z=1}^d\zeta_{z,t}(b)\right)^2 \le C|a - b|^2,$$

and so, by Kolmogorov's continuity criterion (cf. [22]), the desired result follows. ▪

The above lemma, along with the argument immediately preceding, gives the following.

Theorem A.8. —

Let $\sum_{z=1}^d\hat\zeta_{z,t}(\cdot)$ and $\sum_{z=1}^d\zeta_{z,t}(\cdot)$ be defined as in corollary A.6; then

$$\sum_{z=1}^d\hat\zeta_{z,t}\left(\frac{\mathbf{n}}{n}\right) \xrightarrow{d} \sum_{z=1}^d\zeta_{z,t}(q), \quad \text{in } D_{\mathbb{R}}[0,\infty), \text{ as } n\to\infty. \quad \text{(A 28)}$$

Corollary A.9. —

If $\hat\zeta = \sum_{z=1}^d\zeta_{z,\tau_z}(q)$, then

$$\hat\zeta \sim N(0,\sigma^2),$$

where

$$\sigma^2 = \sum_{z=1}^d q_z\phi_{z,0}^2 - \left(\sum_{z=1}^d q_z\phi_{z,0}\right)^2 - \sum_{z=1}^d q_z\int_0^{\tau_z}\frac{\mathrm{d}S_{z,t}}{C_{z,t}}\left(\frac{\phi_{z,t}}{S_{z,t}}\right)^2.$$

Proof. —

Note that when $t = \tau_z$, we have

$$\zeta_{z,\tau_z}(q) = \zeta^1_{z,\tau_z} + \sqrt{q_z}\int_0^{\tau_z}\mathrm{d}\tilde B_{z,t}\,\phi_{z,t},$$

whose two summands are independent and normally distributed, implying that $\hat\zeta$ is also normally distributed. Furthermore,

$$E\hat\zeta^2 = \sum_{z=1}^d E\left(\zeta^1_{z,\tau_z} + \sqrt{q_z}\int_0^{\tau_z}\mathrm{d}\tilde B_{z,t}\,\phi_{z,t}\right)^2 + \sum_{\substack{z,w=1\\ z\ne w}}^d E\left[\left(\zeta^1_{z,\tau_z} + \sqrt{q_z}\int_0^{\tau_z}\mathrm{d}\tilde B_{z,t}\,\phi_{z,t}\right)\left(\zeta^1_{w,\tau_w} + \sqrt{q_w}\int_0^{\tau_w}\mathrm{d}\tilde B_{w,t}\,\phi_{w,t}\right)\right]$$

$$= \sum_{z=1}^d\left[E\left(\zeta^1_{z,\tau_z}\right)^2 + E\left(\sqrt{q_z}\int_0^{\tau_z}\mathrm{d}\tilde B_{z,t}\,\phi_{z,t}\right)^2\right] + \sum_{\substack{z,w=1\\ z\ne w}}^d E\,\zeta^1_{z,\tau_z}\zeta^1_{w,\tau_w}$$

$$= \sum_{z=1}^d\left[q_z(1 - q_z)\phi_{z,0}^2 + q_z\int_0^{\tau_z}\frac{\mathrm{d}\Lambda_{z,t}\,\phi_{z,t}^2}{S_{z,t}C_{z,t}}\right] - \sum_{\substack{z,w=1\\ z\ne w}}^d q_zq_w\phi_{z,0}\phi_{w,0},$$

which, after recombining the final terms, gives the desired result. ▪
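The variance in corollary A.9 can be evaluated numerically once plug-in estimates of $q$, $S_{z,t}$ and $C_{z,t}$ are available. The following Python sketch is a hypothetical helper (not from the paper's R package) that approximates $\phi_{z,t}$ by a trapezoid rule and accumulates the three terms of $\sigma^2$:

```python
import numpy as np

def sigma_sq_plugin(q, t_grids, S_list, C_list):
    """Plug-in evaluation of the limiting variance of corollary A.9:

        sigma^2 = sum_z q_z phi_{z,0}^2 - (sum_z q_z phi_{z,0})^2
                  - sum_z q_z int_0^{tau_z} dS_{z,t} (phi_{z,t}/S_{z,t})^2 / C_{z,t}

    q       : cohort weights q_z (summing to 1)
    t_grids : per-cohort time grids ending at tau_z
    S_list  : survival S_{z,t} on each grid (S > 0 on [0, tau_z])
    C_list  : censoring-survival C_{z,t} on each grid
    All inputs are treated as known; in practice they are estimates.
    """
    term1 = term2 = term3 = 0.0
    for qz, t, S, C in zip(q, t_grids, S_list, C_list):
        t, S, C = map(np.asarray, (t, S, C))
        # phi_{z,t} = int_t^{tau_z} S ds, via right-to-left trapezoid sums
        seg = 0.5 * (S[:-1] + S[1:]) * np.diff(t)
        phi = np.concatenate((np.cumsum(seg[::-1])[::-1], [0.0]))
        term1 += qz * phi[0] ** 2
        term2 += qz * phi[0]
        # -int dS (phi/S)^2 / C; dS <= 0, so this contribution is non-negative
        dS = np.diff(S)
        mid = 0.5 * ((phi[:-1] / S[:-1]) ** 2 / C[:-1]
                     + (phi[1:] / S[1:]) ** 2 / C[1:])
        term3 += qz * np.sum(-dS * mid)
    return term1 - term2 ** 2 + term3
```

With a single cohort ($d = 1$, $q_1 = 1$) the first two terms cancel exactly, leaving only the martingale contribution, which provides a quick sanity check of an implementation.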

Appendix B. Computation

B.1. Installation of R package

The following code installs the R package from github sources:

[R code listing, rendered as an image (rsos180496-i1.jpg) in the published article.]

B.2. Simulation of data used in this paper

We simulated draws from the populations mentioned in the main text using the following R code:

[R code listing, rendered as an image (rsos180496-i2.jpg) in the published article.]
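Since the published listing survives only as an image, a comparable stratified simulation can be sketched in Python. The cohort weights, exponential event rates and exponential censoring below are illustrative assumptions of ours, not necessarily the distributions used by the authors:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_cohort_sample(n, q, rates, censor_rate):
    """Draw one stratified sample: cohort labels are multinomial with
    weights q, event times are exponential with cohort-specific rates,
    and censoring is an independent exponential. Returns (cohort, time, event),
    where event = 1 if the event was observed before censoring.
    """
    cohort = rng.choice(len(q), size=n, p=q)
    event_time = rng.exponential(1.0 / np.array(rates)[cohort])
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)
    return cohort, time, event

# Example draw: two cohorts with weights 0.3/0.7 (hypothetical parameters).
cohort, time, event = draw_cohort_sample(
    n=500, q=[0.3, 0.7], rates=[1.0, 0.25], censor_rate=0.2)
```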

B.3. Real-world example

[R code listing, rendered as an image (rsos180496-i3.jpg) in the published article.]

B.3.1. Simulations for examining the sampling distribution of $\hat\Theta$

[R code listing, rendered as an image (rsos180496-i4.jpg) in the published article.]

Data accessibility

All data in this paper are simulated, with R source code provided in appendix B.2.

Authors' contributions

A.H. and M.H. developed the statistical method. A.H., M.H. and J.C.C. wrote the proof, performed the simulations and wrote the manuscript. J.C.C. generated the figures. All authors gave final approval for publication.

Competing interests

The authors declare no competing interests.

Funding

This work is supported by the Intramural Research Program of the National Institutes of Health Clinical Center and the US Social Security Administration.

References

  • 1. Kaplan EL, Meier P. 1958. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481. (doi:10.1080/01621459.1958.10501452)
  • 2. Gill RD. 1980. Censoring and stochastic integrals. Statistica Neerlandica 34, 124. (doi:10.1111/j.1467-9574.1980.tb00692.x)
  • 3. Gill R. 1983. Large sample behaviour of the product-limit estimator on the whole line. Ann. Stat. 11, 49–58. (doi:10.1214/aos/1176346055)
  • 4. van der Vaart A. 1996. New Donsker classes. Ann. Probab. 24, 2128–2140. (doi:10.1214/aop/1041903221)
  • 5. Shorack GR, Wellner JA. 2009. Empirical processes with applications to statistics, vol. 59. Philadelphia, PA: SIAM.
  • 6. Hougaard P. 1984. Life table methods for heterogeneous populations: distributions describing the heterogeneity. Biometrika 71, 75–83. (doi:10.1093/biomet/71.1.75)
  • 7. Aalen OO. 1994. Effects of frailty in survival analysis. Stat. Methods Med. Res. 3, 227–243. (doi:10.1177/096228029400300303)
  • 8. Lin DY, Ying Z. 1993. A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika 80, 573–581. (doi:10.1093/biomet/80.3.573)
  • 9. Rasch EK, Huynh M, Ho P-S, Heuser A, Houtenville A, Chan L. 2014. First in line: prioritizing receipt of social security disability benefits based on likelihood of death during adjudication. Med. Care 52, 944–950. (doi:10.1097/MLR.0000000000000204)
  • 10. Pepe MS, Fleming TR. 1989. Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data. Biometrics 45, 497–507. (doi:10.2307/2531492)
  • 11. Pepe MS, Fleming TR. 1991. Weighted Kaplan-Meier statistics: large sample and optimality considerations. J. R. Stat. Soc. B 53, 341–352. (doi:10.2307/2345745)
  • 12. Murray S. 2001. Using weighted Kaplan-Meier statistics in nonparametric comparisons of paired censored survival outcomes. Biometrics 57, 361–368. (doi:10.1111/j.0006-341X.2001.00361.x)
  • 13. Zare A, Mahmoodi M, Mohammad K, Zeraati H, Hosseini M, Naieni KH. 2014. A comparison between Kaplan-Meier and weighted Kaplan-Meier methods of five-year survival estimation of patients with gastric cancer. Acta Medica Iranica 52, 764–767.
  • 14. Cai Z. 1998. Asymptotic properties of Kaplan-Meier estimator for censored dependent data. Stat. Probab. Lett. 37, 381–389. (doi:10.1016/S0167-7152(97)00141-7)
  • 15. Berty HP, Shi H, Lyons-Weiler J. 2010. Determining the statistical significance of survivorship prediction models. J. Eval. Clin. Pract. 16, 155–165. (doi:10.1111/j.1365-2753.2009.01199.x)
  • 16. Wilcoxon F. 1945. Individual comparisons by ranking methods. Biometr. Bull. 1, 80–83. (doi:10.2307/3001968)
  • 17. Schoenfeld D. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68, 316–319. (doi:10.1093/biomet/68.1.316)
  • 18. Loprinzi CL et al. 1994. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol. 12, 601–607. (doi:10.1200/JCO.1994.12.3.601)
  • 19. Chang JC, Savage VM, Chou T. 2014. A path-integral approach to Bayesian inference for inverse problems using the semiclassical approximation. J. Stat. Phys. 157, 582–602. (doi:10.1007/s10955-014-1059-y)
  • 20. Lenglart E. 1977. Relation de domination entre deux processus. Ann. Inst. H. Poincaré Sect. B (NS) 13, 171–179.
  • 21. Billingsley P. 2013. Convergence of probability measures. New York, NY: John Wiley & Sons.
  • 22. Karatzas I, Shreve S. 2012. Brownian motion and stochastic calculus, vol. 113. Berlin, Germany: Springer Science & Business Media.

Supplementary Materials

Reviewer comments