The Kaplan-Meier Estimator as an Inverse-Probability-of-Censoring Weighted Average

Glen A Satten; Somnath Datta

doi:10.1198/000313001317098185

Author manuscript; available in PMC: 2017 Aug 23.

Published in final edited form as: Am Stat. 2012 Jan 1;55(3):207–210. doi: 10.1198/000313001317098185

PMCID: PMC5568678 NIHMSID: NIHMS810169 PMID: 28845048

Abstract

The Kaplan-Meier (product-limit) estimator of the survival function of randomly-censored time-to-event data is a central quantity in survival analysis. It is usually introduced as a nonparametric maximum likelihood estimator, or else as the output of an imputation scheme for censored observations such as redistribute-to-the-right or self-consistency. Following recent work by Robins and Rotnitzky, we show that the Kaplan-Meier estimator can also be represented as a weighted average of identically distributed terms, where the weights are related to the survival function of censoring times. We give two demonstrations of this representation; the first assumes a Kaplan-Meier form for the censoring time survival function, the second estimates the survival functions of failure and censoring times simultaneously and can be developed without prior introduction to the Kaplan-Meier estimator.

1. Introduction

The Kaplan-Meier (product-limit) estimator for the survival function of randomly-censored time-to-event data (Kaplan and Meier, 1958) is often introduced as the maximizer of a nonparametric likelihood (Kalbfleisch and Prentice, 1980). Because data are subject to censoring, estimating the survival function can be thought of as a missing data problem. There are two general approaches to missing data problems: imputation and weighting. Alternative presentations of the Kaplan-Meier estimator, including the redistribution-to-the-right algorithm of Efron (1967), the self-consistency property (Efron, 1967), and the E-M algorithm approach (Turnbull, 1976), are all examples of the imputation approach. In a series of papers, Robins and coworkers have shown that the weighting approach to missing data problems has a number of advantages over the imputation approach (Robins and Rotnitzky, 1992; Robins 1993; and Robins and Finkelstein 2000 relate directly to survival analysis). An outcome of their approach applied to survival analysis is an inverse-probability-of-censoring representation of the Kaplan-Meier estimator. The purpose of this paper is to give two simple demonstrations of this representation. The first, found in Section 3, is more straightforward but uses as weights the Kaplan-Meier estimator for censoring times, and hence does not stand alone. For this reason, we give a second approach in Section 4 that simultaneously estimates the cumulative distribution functions of survival and censoring times using coupled inverse-probability-weighted sums. The weighted average form given in this paper is convenient for asymptotic theory and it leads to an interesting variance decomposition for the Kaplan-Meier estimator (not shown here; see Satten et al. 2001 or Robins and Finkelstein 2000 for examples of this type of result).

2. Notation and Preliminary Results

For i = 1, …, N let $T_{i}^{*}$ be the random variable denoting the (possibly unobserved) failure time and C_i be the random variable denoting the (possibly unobserved) censoring time for the ith person. We adopt the usual convention that realizations of random variables are denoted by lower-case letters. Let T_i = min( $T_{i}^{*}$ , C_i) and let Δ_i = I[ $T_{i}^{*}$ ≤ C_i]. The observed data consist of i.i.d. replicates of (T_i, Δ_i). We assume “random censoring,” i.e. that $T_{i}^{*}$ and C_i are independent. The goal is to estimate the survival function S(t) = Pr[ $T_{i}^{*}$ >t] or, equivalently, the cumulative distribution function F(t) = 1−S(t).

Let the ordered failure or censoring times be τ_j, j = 1, …, J and let n_j be the number of persons who fail at time τ_j and m_j be the number of persons censored at time τ_j. We assume that no person can have a failure time equal to their censoring time (i.e. such persons are taken to be uncensored with failure time τ_j). Then, the risk set (number of persons at risk for failure at time t) can be written as

Y (t) = \sum_{j = 1}^{J} (n_{j} + m_{j}) I [τ_{j} \geq t] .

(1)

The Kaplan-Meier estimator ${\hat{S}}_{k m} (t)$ of S(t) is

{\hat{S}}_{k m} (t) = \prod_{{j | τ_{j} \leq t}} (1 - \frac{n_{j}}{Y (τ_{j})})

(2)

We can also estimate the survival function for censoring times, K(t) = Pr[C_i>t] using the Kaplan-Meier approach but considering failure events as “censored” observations and censored observations as “failures.” The Kaplan-Meier estimator of K(t) is thus

\hat{K} (t) = \prod_{{j | τ_{j} \leq t}} (1 - \frac{m_{j}}{Y (τ_{j})}) .

(3)
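As a rough illustration of equations (1)–(3), the following sketch computes the risk set and both product-limit estimators with exact rational arithmetic. The function names `km_tables` and `product_limit` and the five-person data set are hypothetical choices made for this example, not part of the original development.

```python
from fractions import Fraction

def km_tables(times, deltas):
    """Return (taus, n, m, Y): distinct times, failure counts n_j,
    censoring counts m_j, and risk-set sizes Y(tau_j) from equation (1)."""
    taus = sorted(set(times))
    n = [sum(1 for t, d in zip(times, deltas) if t == tau and d == 1) for tau in taus]
    m = [sum(1 for t, d in zip(times, deltas) if t == tau and d == 0) for tau in taus]
    Y = [sum(1 for t in times if t >= tau) for tau in taus]
    return taus, n, m, Y

def product_limit(taus, events, Y, t):
    """Product over tau_j <= t of (1 - events_j / Y(tau_j)).
    Passing n gives equation (2); passing m gives equation (3)."""
    value = Fraction(1)
    for tau, e, y in zip(taus, events, Y):
        if tau <= t:
            value *= 1 - Fraction(e, y)
    return value

# Hypothetical data: observed times and failure indicators for N = 5 persons.
times = [1, 2, 3, 4, 5]
deltas = [1, 0, 1, 0, 1]

taus, n, m, Y = km_tables(times, deltas)
S_hat = [product_limit(taus, n, Y, t) for t in taus]  # failure-time survival
K_hat = [product_limit(taus, m, Y, t) for t in taus]  # censoring-time survival

for tau, s, k in zip(taus, S_hat, K_hat):
    print(tau, s, k)
```

The only difference between the two estimators in this sketch is which counts play the role of "events," which mirrors the role swap described above.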

If there were no censoring, we could estimate F(t) by the empirical cumulative distribution function

F^{*} (t) = \frac{1}{N} \sum_{i = 1}^{N} I [t_{i}^{*} \leq t],

(4)

which, considered as a random variable for each t, is an average of iid terms. The inverse-probability-of-censoring estimator analogous to F*(t) is also an average of iid terms $I [t_{i}^{*} \leq t]$ , each multiplied by $δ_{i} = I [t_{i}^{*} \leq c_{i}]$ and weighted inversely by the probability that the failure time is observed, i.e. by $P r [C_{i} \geq t_{i}^{*}] \equiv K (t_{i}^{*} -)$ . Of course we do not know K(t) so we must use an estimate; we use the Kaplan-Meier estimator of K(t) given in (3). Because this estimator was first proposed by Robins and Rotnitzky (1992) we denote the resulting estimator by ${\hat{F}}_{r r} (t)$ ; it is given by

{\hat{F}}_{r r} (t) = \frac{1}{N} \sum_{i = 1}^{N} \frac{I [t_{i} \leq t] δ_{i}}{\hat{K} (t_{i} -)} .

(5)

Note that we have used I[t_i ≤ t] rather than $I [t_{i}^{*} \leq t]$ in (5) to emphasize that ${\hat{F}}_{r r} (t)$ can be calculated using the observed data; this replacement is possible as $I [t_{i} \leq t] δ_{i} = I [t_{i}^{*} \leq t] δ_{i}$ .
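A minimal sketch of the weighted average in (5), assuming no ties between failure and censoring times, is given below. The helper names `k_hat_left` and `f_rr` and the data are hypothetical; the weight for each observed failure is the reciprocal of the Kaplan-Meier estimate of K just before that failure time.

```python
from fractions import Fraction

def k_hat_left(times, deltas, t):
    """Kaplan-Meier estimate of K(t-): the product in equation (3)
    restricted to times tau_j strictly less than t."""
    value = Fraction(1)
    for tau in sorted(set(times)):
        if tau < t:
            m = sum(1 for s, d in zip(times, deltas) if s == tau and d == 0)
            y = sum(1 for s in times if s >= tau)
            value *= 1 - Fraction(m, y)
    return value

def f_rr(times, deltas, t):
    """Inverse-probability-of-censoring weighted estimate of F(t), equation (5)."""
    N = len(times)
    total = Fraction(0)
    for t_i, d_i in zip(times, deltas):
        if d_i == 1 and t_i <= t:  # only observed failures contribute
            total += 1 / k_hat_left(times, deltas, t_i)
    return total / N

# Hypothetical data with no tied failure and censoring times.
times = [1, 2, 3, 4, 5]
deltas = [1, 0, 1, 0, 1]
F_values = [f_rr(times, deltas, t) for t in times]
print(F_values)
```

On these data the values at t = 1, 3, 5 are 1/5, 7/15 and 1, which agree with one minus the Kaplan-Meier survival estimate computed from the same observations.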

3. Equivalence of ${\hat{F}}_{r r} (t)$ and ${\hat{F}}_{k m} (t)$

Note that both ${\hat{F}}_{r r} (t)$ and ${\hat{F}}_{k m} (t) : = 1 - {\hat{S}}_{k m} (t)$ are right-continuous step functions with possible jumps at times τ_j. Thus, ${\hat{F}}_{r r}$ and ${\hat{F}}_{k m}$ are the same if the magnitudes of the jumps in the two functions are equal. The jump in ${\hat{F}}_{k m}$ at time τ_j is given by

{\hat{S}}_{k m} (τ_{j} -) - {\hat{S}}_{k m} (τ_{j}) = {\hat{S}}_{k m} (τ_{j} -) \frac{n_{j}}{Y (τ_{j})}

(6)

while the jump in ${\hat{F}}_{r r} (τ_{j})$ is given by

{\hat{F}}_{r r} (τ_{j}) - {\hat{F}}_{r r} (τ_{j} -) = \frac{1}{N} \frac{n_{j}}{\hat{K} (τ_{j} -)}

The jumps are equal provided

\frac{1}{N} \frac{1}{\hat{K} (τ_{j} -)} = \frac{{\hat{S}}_{k m} (τ_{j} -)}{Y (τ_{j})}

or, equivalently,

{\hat{S}}_{k m} (τ_{j} -) \hat{K} (τ_{j} -) = \frac{1}{N} Y (τ_{j}) .

(7)

As long as there is no time τ_j for which n_j m_j > 0 (i.e., no ties between deaths and censored values), then

{\hat{S}}_{k m} (τ_{j} -) \hat{K} (τ_{j} -) = \prod_{j^{'} < j} (1 - \frac{n_{j^{'}}}{Y (τ_{j^{'}})}) \prod_{j^{'} < j} (1 - \frac{m_{j^{'}}}{Y (τ_{j^{'}})}) = \prod_{j^{'} < j} (1 - \frac{n_{j^{'}} + m_{j^{'}}}{Y (τ_{j^{'}})});

but

\prod_{j^{'} < j} (1 - \frac{n_{j^{'}} + m_{j^{'}}}{Y (τ_{j^{'}})}) = (1 - \frac{n_{1} + m_{1}}{n_{1} + m_{1} + \dots + n_{J} + m_{J}}) (1 - \frac{n_{2} + m_{2}}{n_{2} + m_{2} + \dots + n_{J} + m_{J}}) \dots (1 - \frac{n_{j - 1} + m_{j - 1}}{n_{j - 1} + m_{j - 1} + \dots + n_{J} + m_{J}}) = (\frac{n_{2} + m_{2} + \dots + n_{J} + m_{J}}{n_{1} + m_{1} + \dots + n_{J} + m_{J}}) (\frac{n_{3} + m_{3} + \dots + n_{J} + m_{J}}{n_{2} + m_{2} + \dots + n_{J} + m_{J}}) \dots (\frac{n_{j} + m_{j} + \dots + n_{J} + m_{J}}{n_{j - 1} + m_{j - 1} + \dots + n_{J} + m_{J}}) = (\frac{n_{j} + m_{j} + \dots + n_{J} + m_{J}}{n_{1} + m_{1} + \dots + n_{J} + m_{J}}) = \frac{Y (τ_{j})}{N}

since n₁+m₁+⋯+n_J+m_J = N, so that equation (7) holds.
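The telescoping identity (7) can be checked numerically. The sketch below, which uses a hypothetical helper `left_limit_product` and a hypothetical data set without ties, compares the product of the two left limits with Y(τ_j)/N at every observed time.

```python
from fractions import Fraction

def left_limit_product(times, deltas, t, event_flag):
    """Product-limit estimate just before t, treating observations whose
    indicator equals event_flag as events (1: failures, 0: censorings)."""
    value = Fraction(1)
    for tau in sorted(set(times)):
        if tau < t:
            events = sum(1 for s, d in zip(times, deltas) if s == tau and d == event_flag)
            at_risk = sum(1 for s in times if s >= tau)
            value *= 1 - Fraction(events, at_risk)
    return value

# Hypothetical data with no tied failure and censoring times.
times = [1, 2, 3, 4, 5]
deltas = [1, 0, 1, 0, 1]
N = len(times)
taus = sorted(set(times))

# Left side of (7): S_km(tau_j -) * K_hat(tau_j -)
lhs = [left_limit_product(times, deltas, tau, 1) *
       left_limit_product(times, deltas, tau, 0) for tau in taus]
# Right side of (7): Y(tau_j) / N
rhs = [Fraction(sum(1 for s in times if s >= tau), N) for tau in taus]

print(lhs == rhs)
```

Because the arithmetic is exact, equality here is an exact check of (7) on this data set rather than an approximate one.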

For the case where n_j m_j > 0 for some j, the argument above breaks down because

(1 - \frac{n_{j}}{Y (τ_{j})}) (1 - \frac{m_{j}}{Y (τ_{j})}) \neq (1 - \frac{n_{j} + m_{j}}{Y (τ_{j})})

We can ask, what function K′(t) of the form $K' (t) = \prod_{{j | τ_{j} \leq t}} (1 - d_{j})$ would make ${\hat{F}}_{r r} (t)$ equal to ${\hat{F}}_{k m} (t)$ even in the presence of ties. The appropriate choice of d_j solves

(1 - \frac{n_{j}}{Y (τ_{j})}) (1 - d_{j}) = (1 - \frac{n_{j} + m_{j}}{Y (τ_{j})})

for each j, from which we obtain d_j = m_j/{Y(τ_j)−n_j} and hence

K^{'} (t) = \prod_{{j | τ_{j} \leq t}} (1 - \frac{m_{j}}{Y (τ_{j}) - n_{j}}) .

Note that K′ is the Kaplan-Meier estimator of censoring times we would obtain if we broke the ties between failures and censored observations by assuming that the failures had occurred just before the censored observations. This coincides with the usual convention when calculating the Kaplan-Meier estimator of failure times with data where there are ties between failure and censoring times (Kaplan and Meier 1958, p. 461).
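To illustrate the effect of ties, the following sketch evaluates the weighted average with both choices of censoring weights on a small hypothetical data set in which one failure and one censoring share the time 2. The function names and the flag `failures_first` are assumptions of this example; setting the flag uses the denominators Y(τ_j) − n_j of K′.

```python
from fractions import Fraction

def censoring_left_limit(times, deltas, t, failures_first):
    """Estimate of the censoring survival function just before t.
    failures_first=False uses Y(tau_j) as in (3); True uses Y(tau_j) - n_j as in K'."""
    value = Fraction(1)
    for tau in sorted(set(times)):
        if tau >= t:
            continue
        n = sum(1 for s, d in zip(times, deltas) if s == tau and d == 1)
        m = sum(1 for s, d in zip(times, deltas) if s == tau and d == 0)
        if m == 0:
            continue  # factor equals 1; also avoids a zero denominator
        at_risk = sum(1 for s in times if s >= tau)
        denom = at_risk - n if failures_first else at_risk
        value *= 1 - Fraction(m, denom)
    return value

def ipcw_cdf(times, deltas, t, failures_first):
    """Weighted average of observed failures up to t, as in equation (5)."""
    total = Fraction(0)
    for t_i, d_i in zip(times, deltas):
        if d_i == 1 and t_i <= t:
            total += 1 / censoring_left_limit(times, deltas, t_i, failures_first)
    return total / len(times)

def km_cdf(times, deltas, t):
    """One minus the Kaplan-Meier survival estimate of equation (2)."""
    surv = Fraction(1)
    for tau in sorted(set(times)):
        if tau <= t:
            n = sum(1 for s, d in zip(times, deltas) if s == tau and d == 1)
            at_risk = sum(1 for s in times if s >= tau)
            surv *= 1 - Fraction(n, at_risk)
    return 1 - surv

# Hypothetical data: a failure and a censoring are tied at time 2.
times = [1, 2, 2, 3]
deltas = [1, 1, 0, 1]
for t in (1, 2, 3):
    print(t, km_cdf(times, deltas, t),
          ipcw_cdf(times, deltas, t, False),
          ipcw_cdf(times, deltas, t, True))
```

At t = 3 the unadjusted weights give 7/8 while the K′ weights give 1, the Kaplan-Meier value, so the discrepancy appears only after the tied time.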

4. Coupled Estimation of the Distribution of Failure and Censoring Times

The results in Section 3 are somewhat unsatisfactory in that the definition of ${\hat{F}}_{r r} (τ_{j})$ uses a Kaplan-Meier estimator (for the censoring times, $\hat{K}$ ). Hence, these results would be unsuitable for an a priori development of the Kaplan-Meier estimator. In this section, we introduce a “new” inverse-probability-of-censoring weighted estimator of F(t) that makes no reference to the Kaplan-Meier estimator of the censoring times. We then show that this “new” estimator is identical to the Kaplan-Meier estimator. Our approach is to simultaneously estimate $F (t) = P r [T_{i}^{*} \leq t]$ and $G (t) = 1 - K (t) = P r [C_{i} \leq t]$ using coupled inverse-probability-of-censoring weighted estimators. Let $\hat{F} (t)$ and $\hat{G} (t)$ be given by

\hat{F} (t) = \frac{1}{N} \sum_{i = 1}^{N} \frac{I [t_{i} \leq t] δ_{i}}{1 - \hat{G} (t_{i} -)}

and

\hat{G} (t) = \frac{1}{N} \sum_{i = 1}^{N} \frac{I [t_{i} \leq t] {\bar{δ}}_{i}}{1 - \hat{F} (t_{i})}

where ${\bar{δ}}_{i} = I [c_{i} < t_{i}^{*}]$ . Then $\hat{F} (t)$ is a step function with jumps at times τ_j for which n_j > 0 and $\hat{G} (t)$ is a step function with jumps at times τ_j for which m_j > 0. The asymmetry in definitions of $\hat{F} (t)$ and $\hat{G} (t)$ reflects the choice that when failure and censoring times are tied, the censored observations are considered to have been lost to follow-up after the failures had occurred. Denoting the jumps in $\hat{F} (τ_{j})$ and $\hat{G} (τ_{j})$ by f_j and g_j we have

f_{j} = \frac{1}{N} \frac{n_{j}}{(1 - \sum_{j^{'} < j} g_{j^{'}})}

(8)

and

g_{j} = \frac{1}{N} \frac{m_{j}}{(1 - \sum_{j^{'} \leq j} f_{j^{'}})},

(9)

where the empty sum $\sum_{j^{'} < 1} g_{j^{'}}$ (arising when j = 1) is to be interpreted as 0. Note that these equations are easily uncoupled to yield

f_{j} = \frac{n_{j}}{N - \sum_{j^{'} < j} \frac{m_{j^{'}}}{(1 - \sum_{j^{″} \leq j^{'}} f_{j^{″}})}}

(10)

and

g_{j} = \frac{m_{j}}{N - \sum_{j^{'} \leq j} \frac{n_{j^{'}}}{(1 - \sum_{j^{″} < j^{'}} g_{j^{″}})}} .

(11)

Equations (10)–(11) for the f_j and g_j are triangular, i.e. the right hand side of the equation (10) expresses f_j in terms of $f_{j^{'}}, j^{'} < j$ . Hence, the f_j and hence $\hat{F} (t)$ can be calculated recursively using (10). Similarly, $\hat{G} (t)$ can be calculated using (11), if desired.
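The recursion can be sketched as follows. Rather than coding (10) and (11) directly, this example alternates the coupled equations (8) and (9), which is algebraically the same computation; the function names `coupled_jumps` and `km_jumps` and the count vectors are hypothetical.

```python
from fractions import Fraction

def coupled_jumps(n, m):
    """Jumps f_j and g_j obtained by alternating equations (8) and (9).
    n[j] and m[j] are the failure and censoring counts at tau_j."""
    N = sum(n) + sum(m)
    f, g = [], []
    for n_j, m_j in zip(n, m):
        # (8): censoring mass at times strictly before tau_j
        f.append(Fraction(n_j) / (N * (1 - sum(g))) if n_j else Fraction(0))
        # (9): failure mass at times up to and including tau_j
        g.append(Fraction(m_j) / (N * (1 - sum(f))) if m_j else Fraction(0))
    return f, g

def km_jumps(n, m):
    """Jumps of the Kaplan-Meier estimator, equation (6), for comparison."""
    at_risk = sum(n) + sum(m)
    surv = Fraction(1)
    jumps = []
    for n_j, m_j in zip(n, m):
        jumps.append(surv * Fraction(n_j, at_risk))
        surv *= 1 - Fraction(n_j, at_risk)
        at_risk -= n_j + m_j
    return jumps

# Hypothetical counts: a failure and a censoring are tied at the second time.
n = [1, 1, 1]
m = [0, 1, 0]
f, g = coupled_jumps(n, m)
print(f, g, km_jumps(n, m))
```

The guards that return 0 when a count is zero avoid dividing by a zero denominator at the last time point, where all mass may already have been assigned.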

Although it is not immediately obvious, $\hat{F} (t) = {\hat{F}}_{k m} (t)$ . To see this, recall that the masses in the Kaplan-Meier estimator are the maximizers of the likelihood

L = \prod_{j = 1}^{J} f_{j}^{n_{j}} {(\sum_{j^{'} > j} f_{j^{'}})}^{m_{j}}

(11)

subject to $\sum_{j = 1}^{J + 1} f_{j} = 1$ . Following Turnbull (1976), note that $\{f_{j}, 1 \leq j \leq J + 1\}$ solves this maximization problem if

D_{j} : = \frac{\partial \ln L}{\partial f_{j}} - \sum_{j^{'} = 1}^{J + 1} f_{j^{'}} \frac{\partial \ln L}{\partial f_{j^{'}}} = 0

and $\sum_{j = 1}^{J + 1} f_{j} = 1$ . Some algebra shows that the condition D_j = 0 can be rewritten as

\frac{n_{j}}{f_{j}} + \sum_{j^{'} < j} \frac{m_{j^{'}}}{(\sum_{j^{″} > j^{'}} f_{j^{″}})} - N = 0;

(12)

solving (12) for f_j yields Equation (10), establishing the equivalence of $\hat{F} (t)$ and ${\hat{F}}_{k m} (t)$ .
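The algebra behind the condition D_j = 0 can be spelled out as follows. This is a sketch of the intermediate steps for an index j ≤ J with n_j > 0, using only the likelihood and the constraint stated above; the steps themselves are not written out in the original argument.

```latex
\begin{align*}
\ln L &= \sum_{j=1}^{J} n_j \ln f_j
        + \sum_{j=1}^{J} m_j \ln\Bigl(\sum_{j'>j} f_{j'}\Bigr), \\
\frac{\partial \ln L}{\partial f_j}
      &= \frac{n_j}{f_j}
        + \sum_{j'<j} \frac{m_{j'}}{\sum_{j''>j'} f_{j''}}, \\
\sum_{j'=1}^{J+1} f_{j'} \frac{\partial \ln L}{\partial f_{j'}}
      &= \sum_{j=1}^{J} n_j + \sum_{j=1}^{J} m_j = N, \\
\sum_{j''>j'} f_{j''} &= 1 - \sum_{j''\le j'} f_{j''}
      \quad\text{(from the constraint)}, \\
D_j = 0 \;&\Longrightarrow\;
f_j = \frac{n_j}{N - \sum_{j'<j} \dfrac{m_{j'}}{1 - \sum_{j''\le j'} f_{j''}}}.
\end{align*}
```

The third line holds because each m_{j'} term is multiplied by the total mass beyond τ_{j'}, which cancels its denominator. The last line has the same form as the recursion for f_j in Equation (10).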

Discussion

The Kaplan-Meier estimator is a fundamental tool in survival analysis. It is usually introduced as a nonparametric maximum likelihood estimator. The likelihood-based approach has led to useful generalizations when data are subject to interval censoring (Turnbull, 1976), truncation (Woodroofe 1985, Wang, Jewell and Tsai 1986) or both (Frydman 1994). We have shown that the Kaplan-Meier estimator can also be expressed as an inverse-probability-of-censoring weighted estimator.

The weighted average form given in this paper with the true K is an average of i.i.d. terms under the random censoring model. Even under the model when censoring times are regarded fixed (Meier, 1975), it is an average of independent (but not necessarily identically distributed) terms and is therefore subject to appropriate laws of large numbers and central limit theorems. Thus, the inverse-probability-of-censoring weighted estimator is also convenient for asymptotic theory.

Since the inverse-probability-of-censoring approach in survival analysis was introduced by Robins and Rotnitzky (1992), it has also led to useful generalizations, primarily to more general censoring models where the censoring hazard may depend on an observable covariate history (see e.g. Robins and Finkelstein, 2000, Satten and Datta 2002 and Satten et al., 2001, for recent discussions). We have given two demonstrations of the equivalence of the inverse-probability-of-censoring weighted sum and product-limit representations of the Kaplan-Meier estimator. The first, given in Section 3, is designed to achieve the result quickly, but requires the availability of the Kaplan-Meier estimator of censoring times. The second, given in Section 4, is less direct, but constructs the weighted estimator without making any reference to the Kaplan-Meier estimator.

Contributor Information

Glen A. Satten, Division of HIV/AIDS Prevention - Surveillance and Epidemiology, National Center for HIV, STD and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333 USA

Somnath Datta, Department of Statistics, University of Georgia, Athens, GA 30602 USA.

References

Efron B. Proceedings 5th Berkeley Symposium on Mathematical Statistics and Probability. IV. University of California Press; Berkeley, CA: 1967. The two sample problem with censored data; pp. 831–853.
Frydman H. A note on nonparametric estimation of the distribution function from interval-censored and truncated observations. Journal of the Royal Statistical Society, Series B. 1994;56:71–74.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
Kalbfleisch J, Prentice R. The Statistical Analysis of Failure Time Data. New York: John Wiley; 1980.
Meier P. Perspectives in Probability and Statistics. Ed. J. Gani, Sheffield, Eng.: Applied Probability Trust; 1975. Estimation of a distribution function from incomplete observations.
Miller RG Jr. Survival Analysis. New York: John Wiley & Sons; 1981.
Robins JM. Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. Proceedings of the American Statistical Association - Biopharmaceutical Section. 1993:24–33.
Robins J, Finkelstein D. Correcting for non-compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56:779–788. doi: 10.1111/j.0006-341x.2000.00779.x.
Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS Epidemiology – Methodological Issues. Boston: Birkhauser; 1992. pp. 297–331.
Satten GA, Datta S. Marginal Estimation for Multistage Models: Waiting Time Distributions and Competing Risks Analyses. Statistics in Medicine. 2002;21:3–19.
Satten GA, Datta S, Robins JM. Estimating the Marginal Survival Function in the Presence of Time Dependent Covariates. Statistics and Probability Letters. 2001;54:397–403.
Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B. 1976;38:290–295.
Wang M-C, Jewell N, Tsai W-Y. Asymptotic properties of the product limit estimator under random truncation. The Annals of Statistics. 1986;14:1597–1605.
Woodroofe M. Estimating a distribution function with truncated data. The Annals of Statistics. 1985;13:163–177. Correction, 15, 883 (1987).
