Abstract
In the presence of informative right censoring and time-dependent covariates, we estimate the survival function in a fully nonparametric fashion. We introduce a novel method for incorporating multiple observations per subject when estimating the survival function at different covariate values and compare several competing methods via simulation. The proposed method is applied to survival data from people awaiting liver transplant.
Keywords: Survival analysis, Kaplan–Meier estimation, inverse probability censoring weighting, dependent censoring, organ transplantation
1. Introduction
The Cox proportional hazards model [1] can be used to semiparametrically estimate survival functions that depend on temporally constant covariates. It requires a proportional hazards assumption which is often violated in practice. This is sometimes remedied by extending the Cox model to allow time-dependent covariates [4], but this does not always solve the problem, and it prevents us from recovering the survival function [2, p. 151].
The Kaplan–Meier product-limit estimator [3] avoids the proportional hazards assumption, but it does not naturally utilize covariate information, whether it be constant or varying in time. However, in a data-rich environment it is possible to apply the Kaplan–Meier estimator to each level of a covariate (or each possible combination of a set of covariates) separately. Up to how ties are handled, this is equivalent to the stratified Cox model with no predictors. Unfortunately, the Kaplan–Meier estimator (and the Cox model, for that matter) may be subject to bias if censoring and failure times are not independent.
This can happen, for example, when estimating survival without a transplant for people on the liver transplant waiting list, because the sickest patients (i.e. those closest to failure) are prioritized for transplant, a common censoring mechanism. In this context, patient medical histories constitute time-varying covariate information. With this application in mind, we develop a Kaplan–Meier-type estimator which can incorporate time-dependent covariates and avoids the bias arising from dependent failure and censoring times. Through simulation we demonstrate improvements over traditional Kaplan–Meier and Cox models. We then apply the proposed technique to estimate without-transplant survival using a de-identified data set of liver waitlist candidates from the Scientific Registry of Transplant Recipients (SRTR).
2. Methods
2.1. Notation and preliminaries
For person i, let $T_i$ be the random variable denoting the possibly unobserved failure time, let $C_i$ be the random variable denoting censoring time, let $\delta_i = I(T_i \le C_i)$ be the indicator of failure (i.e. $\delta_i = 0$ indicates censoring and $\delta_i = 1$ indicates failure), and let $X_i = \min(T_i, C_i)$. We follow the usual practice of allowing lowercase letters to denote the realizations of these random variables. In addition, let $Z_i(t)$ be a time-dependent covariate that may affect both failure and censoring times. Let $\bar{Z}_i(t)$ be a finite vector containing the realized history of $Z_i$ for times up to and including time t. For example, in the context of liver transplantation, $\bar{Z}_i(t)$ might encode a patient's medical history, which would be relevant for predicting both death (failure) and transplant (censoring). We assume that, conditional on the covariate value z, failure times are independent and identically distributed with common conditional survival function $S_z(t)$. Our goal is to estimate $S_z(t)$ nonparametrically, taking covariate history into account and adjusting for possible dependence between failure times and censoring times.
We note that survival analysis literature often refers to non-informative and informative censoring; but these terms are not always precisely defined. Very broadly, informative censoring is censoring that ‘provides more information than the fact that survival time exceeded a certain time’ [6, p. 95]. Lagakos [5] provides a more technical discussion of conditions for non-informative censoring. The approach we propose addresses instances where failure and censoring times are marginally dependent (because they both depend on one or more common covariates), but conditionally independent (given the entire covariate history).
2.2. The Kaplan–Meier estimator and incorporating time-dependent covariates
Under the assumption that censoring is independent of failure time, the Kaplan–Meier product-limit estimator is a consistent estimator for the survival function [3, p. 479]. This estimator is of the form:
\[
\hat{S}^{\mathrm{KM}}(t) = \prod_{j\,:\,t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \tag{1}
\]
where the $t_j$'s are the distinct times at which at least one failure happened, $d_j$ is the number of failures that occurred at time $t_j$, and $n_j$ is the number of individuals still at risk of failure right before time $t_j$, i.e. the number of people who had neither failed nor been censored before $t_j$.
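For concreteness, the product-limit calculation can be sketched in a few lines of code. This is a minimal Python illustration of Equation (1); the function name is ours and not part of any package.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate (Equation (1)) evaluated at each distinct
    failure time t_j.

    times  : observed times x_i (failure or censoring, whichever came first)
    events : 1 if the corresponding x_i is a failure, 0 if censored
    Returns a list of (t_j, S_hat(t_j)) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_j
    curve, s = [], 1.0
    for t_j in sorted(deaths):
        n_j = sum(1 for x in times if x >= t_j)  # at risk just before t_j
        s *= 1.0 - deaths[t_j] / n_j             # factor (1 - d_j / n_j)
        curve.append((t_j, s))
    return curve
```

Note that subjects censored exactly at a failure time are conventionally counted as still at risk at that time, which the comparison `x >= t_j` reflects.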
We presuppose a data-rich environment, such as a national database containing tens of thousands of observations, in which it is plausible to estimate $S_z(t)$ for each level z of some covariate of interest, or for each possible combination of levels in the case of multiple covariates. If necessary, levels can be binned to increase within-level sample sizes. Suppose further that we have multiple observations per person, each taken at a different time point and having a potentially different value of z. If censoring and failure times are independent, we can consistently estimate each $S_z(t)$ with the Kaplan–Meier estimator, using a data set comprising up to one survival time per person per z. In this manner, we allow the same person to contribute data to multiple $\hat{S}_z$'s, which may induce dependence between them. We account for this via bootstrapping, as explained in Section 2.6. To guarantee independence of observations within the data set for a particular z, we require that if a person has the same z-value multiple times, we use only one of them. For example, we might use just the survival time from the moment the person first reached level z.
2.3. IPCW Kaplan–Meier estimator
When censoring time and failure time are dependent, however, the Kaplan–Meier estimator is no longer asymptotically unbiased. To take an extreme situation, suppose individuals were systematically censored immediately before death; the estimated survival function would then (trivially) remain at 1. In general, if the dependence between censoring time and failure time is positive, survival will be overestimated; if the dependence is negative, survival will be underestimated. In many medical applications, positive dependence is the primary worry: for example, people may drop out of a study because they are too sick to continue with treatment.
A solution, the inverse-probability-of-censoring-weighted (IPCW) estimator, was proposed by Robins and various coauthors [see 9, and references therein], with a user-friendly implementation for R users given by Willems et al. [15] that makes use of the popular survival package [12]. The IPCW estimator takes a similar form to the Kaplan–Meier estimator, but the counts of deaths and of those at risk are modified by weights equal to the inverse of the probability of remaining uncensored. In particular, let $K_i(t)$ be the probability that person i is still uncensored at time t, conditional on the covariate history up until that time. Some authors write this as $K_i^{\bar{Z}}(t)$ to emphasize the dependence on $\bar{Z}_i(t)$, but we omit the superscript for convenience. In practice, this probability is not available, so we use an estimate of it, which we shall denote $\hat{K}_i(t)$. (The method for obtaining this estimate is explained below.) The (estimated) inverse probability weight for subject i at time t is therefore $\hat{w}_i(t) = 1/\hat{K}_i(t)$. Equipped with such weights, we compute the IPCW estimator as:
\[
\hat{S}^{\mathrm{IPCW}}(t) = \prod_{j\,:\,t_j \le t} \left(1 - \frac{\sum_{i=1}^{n} \hat{w}_i(t_j)\, d_i(t_j)}{\sum_{i=1}^{n} \hat{w}_i(t_j)\, r_i(t_j)}\right) \tag{2}
\]
where the numerator is the sum of the weights at time $t_j$ over all people who died at time $t_j$, and the denominator is the sum of the weights at time $t_j$ over all people who were at risk at $t_j$. In other words, $d_i(t_j)$ is the indicator that person i died at time $t_j$, $r_i(t_j)$ is the indicator that person i was at risk at time $t_j$, and n is the total number of people in the data set. Note that if everyone's estimated weight function is the same, i.e. if $\hat{w}_i(t) = \hat{w}(t)$ for all i, then $\hat{S}^{\mathrm{IPCW}}(t)$ reduces to $\hat{S}^{\mathrm{KM}}(t)$.
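A minimal sketch of Equation (2) follows, assuming the weights have already been estimated. The `weight_fn` interface is hypothetical; in practice the weights come from the procedure in Section 2.5.

```python
def ipcw_kaplan_meier(times, events, weight_fn):
    """IPCW product-limit estimator (Equation (2)).

    weight_fn(i, t) returns the estimated weight w_i(t) = 1 / K_i(t) for
    subject i at time t (a hypothetical interface; see Section 2.5 for how
    the weights are actually constructed).
    """
    n = len(times)
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    curve, s = [], 1.0
    for t_j in death_times:
        num = sum(weight_fn(i, t_j) for i in range(n)
                  if events[i] == 1 and times[i] == t_j)  # weighted deaths at t_j
        den = sum(weight_fn(i, t_j) for i in range(n)
                  if times[i] >= t_j)                     # weighted risk set
        s *= 1.0 - num / den
        curve.append((t_j, s))
    return curve
```

With identical weights for everyone, this reproduces the unweighted Kaplan–Meier estimate, as noted above.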
2.4. Intuition behind IPCW
The IPCW estimator increases the weight of data from individuals who have not yet been censored but who, based on their covariate history $\bar{Z}_i(t)$, were likely to have been. Such individuals are underrepresented in the traditional Kaplan–Meier estimate: had there been other people with these same histories, they would have tended to be censored and thus not appear in the risk or death sets.
Paraphrasing an example by Robins and Finkelstein [9], suppose a person i is observed to survive uncensored to time τ, and suppose further that this person has an estimated conditional probability (given the history $\bar{z}_i(\tau)$) of $\hat{K}_i(\tau) = 1/4$ of having avoided censoring until time τ. Then there would, on average, have been three other prognostically similar persons (‘ghosts’ in the language of Robins and Finkelstein) who were censored before time τ and who would have survived until τ had censoring been prevented. We therefore assign person i a weight of $\hat{w}_i(\tau) = 4$ in the denominator of Equation (2), which denominator estimates the number of people who would have been at risk at time τ in the absence of censoring.
Similarly, a person who died at τ with $\hat{K}_i(\tau) = 1/4$ is representative of three other ghosts who would have died then with similar covariate histories, had they not been removed from the risk pool by censoring. We likewise assign a weight of 4 to such a person in our calculation of the numerator.
2.5. Calculating the weights
Recall that the weight function is simply $\hat{w}_i(t) = 1/\hat{K}_i(t)$, where $\hat{K}_i(t)$ is the estimated probability that subject i remains uncensored until time t. In practice, we estimate this probability by taking a product of Kaplan–Meier estimates across the subject's covariate history. Each Kaplan–Meier estimate is the (estimated) probability that the subject remained uncensored for their sojourn at a particular covariate value. In this situation, ‘failure’ occurs when the sojourn ends by censoring, while ‘censoring’ occurs when the sojourn ends in some other way, usually by failure (i.e. death) or by a change in the covariate value.
In particular, let $U_{ij}$ be the amount of time spent by subject i ($i = 1, \ldots, n$) during their jth sojourn ($j = 1, \ldots, J_i$) at a covariate value z before getting a transplant. Let $\Delta_{ij}$ be 1 if the sojourn ended in censoring and 0 if not. Let $u_{ij}$ and $\delta_{ij}$ denote realizations of these random variables. Let $F_z$ denote the cumulative distribution function for ‘survival’ time (i.e. time until censoring) for someone with covariate value z, let $f_z$ denote the corresponding probability density function, and let $K_z$ denote the corresponding survival function, i.e. $K_z(t) = 1 - F_z(t)$. Writing $z_{ij}$ for the covariate value during subject i's jth sojourn and assuming that a subject's sojourn times are independent, the likelihood of the observed sojourns takes the form:
\[
L = \prod_{i=1}^{n} \prod_{j=1}^{J_i} f_{z_{ij}}(u_{ij})^{\delta_{ij}}\, K_{z_{ij}}(u_{ij})^{1 - \delta_{ij}}
\]
It is well known that the Kaplan–Meier estimator maximizes this likelihood [3, p. 475–476]. We therefore let $\hat{K}_z(t)$ be the Kaplan–Meier estimate of the probability of remaining untransplanted until time t, based on data from all subjects from all sojourns during which the covariate was z, treating censoring events as failure events and vice versa, as described above. Let $v_{i\ell} = \sum_{j=1}^{\ell} u_{ij}$ with $v_{i0} = 0$ be the sum of the first ℓ sojourn times for subject i; and for any time t>0, define m to be the largest integer such that $v_{im} < t$. Recall that $z_i(t)$ is the covariate value for subject i at time t. Then $\hat{K}_i(t)$ can be obtained as the product of $\hat{K}_z$'s over subject i's sojourns as follows:
\[
\hat{K}_i(t) = \hat{K}_{z_i(t)}(t - v_{im}) \prod_{j=1}^{m} \hat{K}_{z_{ij}}(u_{ij}) \tag{3}
\]
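Equation (3) amounts to walking through a subject's completed sojourns, multiplying the corresponding $\hat{K}_z$'s, and finishing with a partial factor for the sojourn in progress at time t. A sketch, assuming the per-value estimates $\hat{K}_z$ are supplied as functions (this interface is ours, for illustration only):

```python
def k_hat_i(t, sojourns, k_hat):
    """Estimated probability that subject i remains uncensored at time t
    (Equation (3)).

    sojourns : chronological list of (z, u) pairs, giving the covariate value
               and the length of each of subject i's sojourns
    k_hat    : dict mapping a covariate value z to the function K_hat_z
    """
    prob, elapsed = 1.0, 0.0
    for z, u in sojourns:
        if elapsed + u >= t:                     # t falls within this sojourn
            return prob * k_hat[z](t - elapsed)  # partial factor K_z(t - v_im)
        prob *= k_hat[z](u)                      # completed sojourn, K_z(u_ij)
        elapsed += u
    return prob  # t is beyond the recorded history
```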
2.6. Standard errors
We calculated standard errors using formula (5) of Xie and Liu [16, p. 3093]:
\[
\widehat{\operatorname{Var}}\left\{\hat{S}^{\mathrm{IPCW}}(t)\right\} = \left\{\hat{S}^{\mathrm{IPCW}}(t)\right\}^2 \sum_{j\,:\,t_j \le t} \frac{d_j^{w^2}}{Y_j^{w}\left(Y_j^{w} - d_j^{w}\right)} \tag{4}
\]
where $d_j^{w} = \sum_{i=1}^{n} \hat{w}_i(t_j)\, d_i(t_j)$, $d_j^{w^2} = \sum_{i=1}^{n} \hat{w}_i(t_j)^2\, d_i(t_j)$, and $Y_j^{w} = \sum_{i=1}^{n} \hat{w}_i(t_j)\, r_i(t_j)$.
In the present context, this approach assumes that the probabilities of remaining untransplanted can be consistently estimated and are bounded away from zero, and that getting a transplant is independent of censoring and conditionally independent of survival time, given the patient's medical history. All of these assumptions are approximately met, except the independence of censoring status and whether a subject got a transplant. In Section 3 we quantify the extent to which standard errors are underestimated in the presence of such dependence.
Note that if everyone's estimated weight function is the same, i.e. if $\hat{w}_i(t) = \hat{w}(t)$ for all i, then the righthand side of Equation (4) reduces to the familiar Greenwood's formula. Curiously, the survival package [12] incorrectly calculates the standard error for weighted Kaplan–Meier survival, substituting $\hat{w}_i(t_j)$ for $\hat{w}_i(t_j)^2$. At considerable computational cost, the standard errors can alternatively be estimated by bootstrapping both the weight-construction and the weighted Kaplan–Meier steps.
The above variance formula is for the bias-corrected survival at a particular value of the covariate. If one desires to conduct joint inference on the survival at multiple covariate values, some care is needed, because the data used to fit the model at one value $z_1$ are correlated with the data used to fit the model at another value $z_2$: time-to-event observations from the same subjects can occur in both data sets. To deal with such correlation, we recommend bootstrapping. These are the steps: (1) sample n subjects with replacement; (2) for some time of interest t>0, calculate $\hat{S}_z^{\mathrm{IPCW}}(t)$ for all covariate values z in some set Z; (3) repeat steps 1 and 2 1000 times. The resulting set of 1000 estimates can be used for joint inference. For instance, to get a 95% confidence interval for the difference in survival at time t between $z_1$ and $z_2$, calculate the difference $\hat{S}_{z_1}^{\mathrm{IPCW}}(t) - \hat{S}_{z_2}^{\mathrm{IPCW}}(t)$ for each bootstrap iteration and take the 2.5th and 97.5th percentiles.
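The three bootstrap steps above can be sketched as follows. The `estimator` argument stands in for the full weight-construction and weighted Kaplan–Meier pipeline, which we do not reproduce here.

```python
import random

def bootstrap_diff_ci(subjects, z1, z2, t, estimator, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% CI for S_{z1}(t) - S_{z2}(t) (Section 2.6).

    subjects  : list of per-subject records (all of one person's sojourns)
    estimator : function(sample, z, t) -> estimate of S_z(t); a stand-in for
                the weight-construction plus weighted Kaplan-Meier steps
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(subjects) for _ in subjects]  # resample n subjects
        diffs.append(estimator(sample, z1, t) - estimator(sample, z2, t))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

Resampling whole subjects (rather than individual time-to-event rows) preserves the within-subject dependence that motivates the bootstrap in the first place.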
2.7. SRTR data
This study used data from the Scientific Registry of Transplant Recipients (SRTR). The SRTR data system includes data on all donors, waitlisted candidates, and transplant recipients in the U.S., submitted by the members of the Organ Procurement and Transplantation Network (OPTN). The Health Resources and Services Administration (HRSA), U.S. Department of Health and Human Services provides oversight to the activities of the OPTN and SRTR contractors. This data system has previously been described elsewhere [7]. The Johns Hopkins Medicine Institutional Review Board determined this research was exempt from federal regulations applying to human subjects research.
3. Simulations
3.1. Setup
To demonstrate the efficacy of the proposed method, we conducted simulations meant to mimic liver disease progression and the liver transplantation program. For disease progression, we assumed a discrete-time Markov model with 3 states: disease scores of z = 1 (least severe; probability of death in the next day is 0.005) and z = 2 (most severe; probability of death in the next day is 0.08), and an absorbing death state z = 3. The one-day transition matrix is:
where $P_{ij}$ represents the probability of going from state i to state j in one day. Thus, in the next day a person in disease state 2 improves to state 1 with probability 2%, stays at state 2 with probability 90%, or dies with probability 8%. We assumed that the probability of transplant in the next day also increased with disease severity.
We simulated day-to-day disease progression without transplant for 20,000 subjects for 100 days using the matrix P, with initial (day-0) disease states selected at random according to the probability vector (0.80, 0.20, 0.00). Based on the disease state trajectories, for each day we simulated whether each candidate received a transplant. If the transplant date was before the death date, this was counted as censoring due to transplant. If a death occurred before a transplant and on or before day 100, this was counted as a death. Candidates experiencing neither death nor transplant by day 100 were administratively censored.
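A sketch of the simulation of a single candidate follows. The row-2 transition probabilities and both death probabilities are those stated above; the state-1-to-state-2 rate and the daily transplant probabilities are illustrative placeholders, not the values used in the paper, and the within-day ordering of the transplant check and the disease-state update is a simplification.

```python
import random

# One-day transition probabilities for the 3-state disease model. Row 2 and
# both death probabilities are as stated in the text; the state-1 -> state-2
# rate and the transplant probabilities P_TX are illustrative placeholders,
# NOT the paper's values.
P = {1: [(1, 0.975), (2, 0.020), (3, 0.005)],
     2: [(1, 0.020), (2, 0.900), (3, 0.080)]}
P_TX = {1: 0.01, 2: 0.05}  # placeholder daily transplant probabilities

def simulate_subject(rng, horizon=100):
    """Simulate one candidate; returns (time, status), where status is
    'death', 'transplant', or 'censored' (administrative, at `horizon`)."""
    z = 1 if rng.random() < 0.80 else 2  # initial states ~ (0.80, 0.20, 0.00)
    for day in range(1, horizon + 1):
        if rng.random() < P_TX[z]:       # transplant check for the day
            return day, 'transplant'
        u, cum = rng.random(), 0.0
        for state, p in P[z]:            # disease-state update
            cum += p
            if u < cum:
                z = state
                break
        if z == 3:
            return day, 'death'
    return horizon, 'censored'
```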
It may be more realistic to allow the transition matrix to differ across subjects, but estimating subject-specific survival functions nonparametrically would not be possible in general because there would be so few observations on each subject. In the Appendix we use an additional simulation to show that failing to take into account such heterogeneity can result in bias, but the bias is negligible for shorter time horizons.
3.2. Estimation methods
We compared four estimation methods:
Kaplan–Meier using one observation per subject (‘NaiveKM1’),
bias-corrected Kaplan–Meier using one observation per subject (‘NaiveKM2’),
bias-corrected Cox model using one observation per subject (‘Cox’), and
bias-corrected Kaplan–Meier using multiple observations per subject (‘SmartKM’, the proposed model).
3.3. Results
Figure 1 contains the estimated survival curves and standard errors when z = 1 and when z = 2 for each of the four methods. Each estimated survival curve and standard error is based on one simulation replicate consisting of 20,000 subjects. First, notice from the bottom lefthand panel that the Cox and NaiveKM1 models are biased when z = 2, whereas SmartKM and NaiveKM2 are not. When z = 1 (top panels), SmartKM and NaiveKM2 are very similar, since almost every subject spent time in the z = 1 state. But when z = 2, the NaiveKM2 standard errors are at least 50% larger. This is because SmartKM can use a subject's z = 2 data even if that subject did not start at z = 2, whereas NaiveKM2 cannot. The simulation results suggest that the proposed SmartKM model is better than the other models.
Figure 1.
Estimated survival curves (left) and standard errors (right) for z = 1 (top) and z = 2 (bottom).
3.4. Standard error analysis
We now turn our attention to the standard errors and coverage of the unbiased models. By coverage, we mean the probability that a confidence interval contains the true value of a parameter of interest, such as a survival probability. It can be estimated empirically by observing the proportion of confidence intervals that contain the true parameter value across numerous simulations. A model is working well when the level of confidence matches the observed coverage, e.g. when 95% confidence intervals capture the true parameter value 95% of the time. Table 1 reports empirical standard deviations, median standard errors, and coverage estimates based on 500 replicates, wherein each replicate consists of 5,000 subjects (instead of 20,000) per sample.
Table 1.
Empirical standard deviation versus median standard error and estimated coverage for nominally 68% (± 1 s.e.) and 95% (± 2 s.e.) confidence intervals, by method. Only bias-corrected methods are included.
| z | t | SmartKM sd | SmartKM median(s.e.) | SmartKM 68% | SmartKM 95% | NaiveKM2 sd | NaiveKM2 median(s.e.) | NaiveKM2 68% | NaiveKM2 95% |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 0.0021 | 0.0021 | 0.70 | 0.96 | 0.0021 | 0.0022 | 0.70 | 0.97 |
| 1 | 7 | 0.0037 | 0.0037 | 0.64 | 0.95 | 0.0038 | 0.0037 | 0.63 | 0.95 |
| 1 | 30 | 0.0108 | 0.0093 | 0.58 | 0.90 | 0.0110 | 0.0094 | 0.58 | 0.90 |
| 1 | 90 | 0.0211 | 0.0158 | 0.59 | 0.87 | 0.0210 | 0.0159 | 0.59 | 0.88 |
| 2 | 3 | 0.0091 | 0.0088 | 0.68 | 0.94 | 0.0140 | 0.0138 | 0.69 | 0.94 |
| 2 | 7 | 0.0120 | 0.0119 | 0.66 | 0.96 | 0.0188 | 0.0186 | 0.66 | 0.95 |
| 2 | 30 | 0.0178 | 0.0167 | 0.65 | 0.96 | 0.0286 | 0.0258 | 0.63 | 0.92 |
| 2 | 90 | 0.0241 | 0.0181 | 0.54 | 0.86 | 0.0319 | 0.0251 | 0.59 | 0.88 |
For 30-day survival, the median standard error is about 85% to 95% as large as the empirical standard deviation, regardless of the disease score z or the estimation method. For 90-day survival, this proportion drops to approximately 75%. Both methods tend to underestimate the true variability in survival for longer time horizons. This results in moderate undercoverage for such time horizons.
It is worth pointing out that even though SmartKM's standard error underestimates the true variability in survival, its empirical standard deviation is still considerably lower than NaiveKM2's when z = 2, mirroring what we saw in Figure 1 and what we see when comparing median standard errors. This demonstrates the practical benefit of using multiple observations per subject instead of just one.
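The coverage computation described in this section is straightforward; for reference, a minimal sketch with `k` standard errors on each side of the estimate:

```python
def empirical_coverage(true_value, estimates, std_errors, k=2.0):
    """Proportion of intervals (estimate - k*se, estimate + k*se) containing
    true_value across replicates; k = 1 gives nominal 68% intervals and
    k = 2 nominal 95% intervals, as in Table 1."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - k * se <= true_value <= est + k * se)
    return hits / len(estimates)
```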
4. Application: estimating probability of survival without a liver transplant
An important application of the proposed method is in the context of liver transplantation. In the United States, about 12,000 people are added to the liver transplant waiting list each year, but there are only about 8,000 liver transplants per year [8]. The scarcity of donor livers makes it necessary to prioritize the sickest candidates first. Prioritization is accomplished via a liver disease severity index known as MELD (the Model for End-Stage Liver Disease). When liver transplant candidates arrive on the waitlist (and at subsequent intervals thereafter), they are assigned a MELD score. The score is an integer from 6 to 40, with a higher number indicating more severe liver disease. Those with higher MELD scores are prioritized for transplant.
A consideration for patients on the waitlist is their probability of surviving a certain amount of time without a transplant, given their current MELD score. Traditional survival analysis methods which require noninformative censoring are inappropriate in this case since one of the most common forms of censoring is by transplant, which, by design, occurs for those closest to death. An additional feature of liver transplant data that complicates analysis is that MELD scores are updated from time to time. Fortunately, the method proposed in this paper can make use of this time-dependent covariate while also correcting for the informative censoring. We next describe how to apply our method in this context, and then show some results.
4.1. Details on implementing proposed method
We first build, for every MELD score between 6 and 40, a separate Kaplan–Meier model for the probability of remaining untransplanted for t days. These constitute our $\hat{K}_z$'s from Section 2.5, and portions of some of them can be found in Table 2. Armed with each MELD score's survival function for time until transplant, we can calculate (using Equation (3)) a person's probability of remaining untransplanted at any time, given their MELD trajectory up until that point. For example, consider the medical history belonging to person i, shown in the first four columns of Table 3. Entries in the fifth and final column of Table 3 come from Table 2, as explained in the next paragraph. These will be helpful in calculating the Kaplan–Meier weights.
Table 2.
Selected $\hat{K}_z$'s.
| t | $\hat{K}_{30}(t)$ | $\hat{K}_{31}(t)$ | $\hat{K}_{35}(t)$ |
|---|---|---|---|
| 1 | 0.9633 | 0.9560 | 0.9491 |
| 3 | 0.9263 | 0.8974 | 0.8353 |
| 10 | 0.8744 | 0.8326 | 0.7281 |
| 12 | 0.8744 | 0.8326 | 0.7073 |
Table 3.
Example MELD trajectory.
| MELD | Start | Stop | Death | $\hat{K}_{\mathrm{MELD}}(\mathrm{Stop}-\mathrm{Start})$ |
|---|---|---|---|---|
| 30 | 0 | 12 | 0 | 0.8744 |
| 31 | 12 | 15 | 0 | 0.8974 |
| 35 | 15 | 25 | 0 | 0.7281 |
| 31 | 25 | 28 | 0 | 0.8974 |
Referring to Table 2, we find that the (estimated) probability of remaining untransplanted after 12 days at a MELD of 30 is $\hat{K}_{30}(12) = 0.8744$. Similarly, $\hat{K}_{31}(3) = 0.8974$ and $\hat{K}_{35}(10) = 0.7281$. The final row of Table 3 corresponds to going 3 days without transplant at MELD 31 again, so we can reuse the value $\hat{K}_{31}(3) = 0.8974$. Assuming independent sojourn times, we find (see Equation (3)) that the probability that person i remains untransplanted at day 28, conditional on the MELD history given above, is:
\[
\hat{K}_i(28) = 0.8744 \times 0.8974 \times 0.7281 \times 0.8974 \approx 0.5127.
\]
Thus, the weight for person i at time $\tau = 28$ is $\hat{w}_i(28) = 1/0.5127 \approx 1.9505$. Recall that the weight needs to be calculated for each person and each failure time. Suppose $\tau = 26$ is another death time. To calculate the associated weight for person i, we need the probability that a MELD 31 patient remains untransplanted after 1 day, which is given by $\hat{K}_{31}(1) = 0.9560$. It follows that
\[
\hat{w}_i(26) = \left(0.8744 \times 0.8974 \times 0.7281 \times 0.9560\right)^{-1} \approx 1.8309.
\]
This sort of calculation can be made for any person on the list at any time τ.
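This arithmetic is easy to verify. The following snippet reproduces the weights in Table 4 from the Table 2 entries (up to rounding, since Table 2 reports only four decimal places):

```python
# Entries from Table 2 (reported to four decimal places)
K30_12, K31_3, K35_10, K31_1 = 0.8744, 0.8974, 0.7281, 0.9560

K_i_28 = K30_12 * K31_3 * K35_10 * K31_3  # uncensored through day 28
K_i_26 = K30_12 * K31_3 * K35_10 * K31_1  # uncensored through day 26

w_28 = 1 / K_i_28  # ~1.95, cf. 1.9505 in Table 4 (rounding)
w_26 = 1 / K_i_26  # ~1.83, cf. 1.8309 in Table 4
w_12 = 1 / K30_12  # ~1.14, cf. 1.1436 in Table 4
```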
4.2. Why censoring is noninformative in the models
It is worth reiterating that in the construction of these $\hat{K}_z$'s, death, a change in MELD, administrative end of study, or leaving the waitlist for some other reason gets coded as ‘censoring’, while getting a transplant is coded as ‘failure’. In this situation, the Kaplan–Meier assumption of independent ‘failure’ and ‘censoring’ times is much more tenable than in the model where death constitutes failure, because those with a MELD of, say, 30 who are censored by death or change of MELD are indeed quite representative (in terms of their probability of getting a transplant) of those who are not censored. In the analogous model with death as failure, however, there is no censoring for change of MELD, and someone who had a MELD of 30 and gets censored (by transplant) likely has a higher MELD by that point and is therefore no longer representative (in terms of without-transplant probability of death) of those who do not get censored. If one were to build a model with death as failure in which there were censoring for change of MELD, it could not be used to answer our ultimate question about without-transplant survival probability, as it implicitly conditions on MELD staying the same until death. Such a conditional probability is uninteresting to a patient whose future MELD status is unknown.
4.3. Formatting data for calculating the probability of survival for each MELD
To estimate without-transplant survival by MELD, we use a data set comprising, initially, one (possibly censored) survival time since arriving at that MELD per person. If a candidate reached the same MELD multiple times, we selected the first occurrence only, so that the model estimates survival since first arriving at a certain MELD. Refer again to person i's data in Table 3 above. In the calculation of the survival curve for MELD 30, person i would contribute one row of data: (Time = 28, Death = 0), since that is how long person i survived (before getting censored) after obtaining a MELD score of 30. This initial data set is then stretched into a longer format by expanding each person's row into a set of rows of adjacent intervals in time, with endpoints at the unique $t_j$'s (i.e. the unique failure times), until that person is censored or dies. The weights are then calculated separately for each row in the stretched data set. For example, assume that the only deaths among MELD 30 patients occurred at days 12, 26, 28, 30, and 45; that is, these are the $t_j$'s. Then person i's single row of (Time = 28, Death = 0) would expand into three rows, wherein each row's weight is $\hat{w}_i(\mathrm{Stop})$ (see Table 4). We note that this differs slightly from the approach of Willems et al. [15]; for a large data set wherein at least one death occurs at almost every $t_j$, the difference in estimated survival functions is very small.
Table 4.
Example data for MELD 30 expanded into ‘long’ format.
| Start | Stop | Death | Weight(Stop) |
|---|---|---|---|
| 0 | 12 | 0 | 1.1436 |
| 12 | 26 | 0 | 1.8309 |
| 26 | 28 | 0 | 1.9505 |
Once each person's row is similarly expanded, the resulting stretched data set with weights is used to construct the survival estimate in Equation (2). This can then be repeated for every MELD score.
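The expansion step can be sketched as follows (a minimal illustration; the `weight` argument is the subject's estimated weight function $\hat{w}_i$ from Section 2.5):

```python
def expand_long(time, death, death_times, weight):
    """Expand one subject's (Time, Death) record into 'long' format: one row
    (Start, Stop, Death, Weight) per interval between consecutive failure
    times, with the IPCW weight evaluated at the row's Stop time.

    weight : the subject's estimated weight function, t -> w_i(t)
    """
    cuts = [t for t in sorted(death_times) if t <= time]
    if not cuts or cuts[-1] != time:
        cuts.append(time)  # final interval ends at the subject's own time
    rows, start = [], 0
    for stop in cuts:
        is_death = int(death == 1 and stop == time)  # event only in last row
        rows.append((start, stop, is_death, round(weight(stop), 4)))
        start = stop
    return rows
```

Fed person i's record (Time = 28, Death = 0) and the failure times 12, 26, 28, 30, 45, this reproduces the three rows of Table 4.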
4.4. SRTR data
The Scientific Registry of Transplant Recipients (SRTR) data system includes data on all donors, waitlisted candidates, and transplant recipients in the U.S. There were 34,878 adult (i.e. at least 18 years old at listing) liver transplant waitlist candidates who were newly listed or who had at least one MELD change during the study period between 01/01/2016 and 01/01/2019. This number excludes cases with missing data as well as data from any candidates who received a MELD exception during the study period. From the 34,878 waitlist candidates, we took a random sample of 10,000 candidates for analysis. This subsampling step was necessary because fitting the Cox model to the full data set required more computer storage than we had available. Importantly, the proposed model handles the full data set with ease, as we demonstrated in a recent paper written for a medical audience [14].
4.5. Results from SRTR analysis
Figure 2 contains the 30-day survival estimates and 95% confidence intervals from three of the models studied in Section 3: the naive Kaplan–Meier model with bias correction (NaiveKM2), the bias-corrected Cox model (Cox), and the proposed model (SmartKM). (We omitted NaiveKM1 since intuition suggested and simulation confirmed that NaiveKM2 is preferable.) In all cases, the bias correction was done using weights calculated as described in Section 2.5. Note that the x-axis of Figure 2 is MELD (i.e. z), not time as in Figure 1. In other words, Figure 2 was created using estimates (± 2 standard errors) for each MELD score z.
Figure 2.

30-day without-transplant survival and 95% confidence intervals, as estimated by the bias-corrected Cox model, the bias-corrected naive Kaplan–Meier, and the proposed method (SmartKM).
We see that the naive Kaplan–Meier and the proposed method produce similar estimates, but the proposed method's confidence intervals are much narrower. The Cox model appears moderately biased for high MELDs despite the bias correction. In light of this, the narrowness of its confidence intervals is largely irrelevant. These findings agree qualitatively with what we saw in the simulation study.
5. Discussion and future work
There are some limitations to the proposed approach. First, recall that we construct weights by inverting the estimated probability of remaining untransplanted. But Tin [13] showed that there are better techniques for estimating a reciprocal $1/K$ than the plug-in estimator $1/\hat{K}$, especially when the number of data points is small. Whether improved ratio estimation would have a practical impact in the context of liver disease is unclear, however.
Second, we saw moderate undercoverage for large t when using the standard errors of Xie and Liu [16]. We suspect that this stems from the strong dependence between whether someone remains untransplanted and the censoring status. (It is worth noting that because some censoring mechanisms are not due to transplant, these variables are not perfectly correlated.) Some preliminary investigation on our part suggests that the bootstrap may mitigate this problem, but a systematic study is needed to confirm this. Such a study is bound to be quite computationally expensive, which is why we have not undertaken it here. In light of the computational burden of bootstrapping, derivation of a closed-form standard error that takes said dependence into account would be especially welcome.
A third potential improvement stems from our assumption that, conditional on MELD, individuals have the same cumulative distribution function for time until transplant. In practice, certain geographic locations have more available livers per person, so the probability of transplant in the next t days, i.e. $F_z(t)$, is higher in those regions. A model for $F_z$ (or, equivalently, $K_z$) which takes location or liver supply into account may therefore be more accurate than the model we have chosen. In addition, it is worth noting that semiparametric or parametric alternatives for $F_z$ (or $K_z$) fit very naturally within the proposed framework.
In this paper, we corrected for the bias induced when one censoring mechanism is informative of failure time. It is possible, however, for multiple censoring mechanisms to be informative of failure time. For example, in the liver transplant context, it is possible for patients to leave the waitlist because their condition improved, even though they did not get a transplant. Those who are censored because of improvement are not representative of those who remain on the waitlist; in particular, the former will have better without-transplant survival. We chose to ignore this in the analysis presented here, because in our data set censoring by improvement occurred rather rarely (about 15% as often as transplant). It is possible, however, to construct another set of weights based on the probability of remaining unimproved (instead of untransplanted). The weights used in the weighted Kaplan–Meier would be the product of the original set of weights and the additional set. This same methodology extends straightforwardly to more than two mechanisms of informative censoring.
Because it is fully nonparametric, the proposed model suffers the drawbacks inherent to that class of methods. In particular, a correctly specified parametric model will tend to outperform ours, and a lot of data may be required to reliably estimate some survival probabilities, even with the efficiency gains enabled by our use of multiple data points per subject. At a reviewer's suggestion, we tried a kernel smoothing method to deal with this problem: we binned adjacent MELD scores into groups and calculated survival for each group using the SmartKM procedure applied to the data from each bin. As expected, this smooths out the curve presented in Figure 2, but it makes interpretation somewhat more difficult. We think such techniques would be interesting to explore in future work.
Despite these limitations, the proposed model improves upon traditional approaches in several important ways. First, it avoids the Cox model's proportional hazards assumption, which is often violated in practice and may yield biased estimates. And while Cox modeling with time-dependent covariates has been suggested as another way to handle the violated proportional hazards assumption, that approach can only be used to estimate hazard functions and cannot recover survival estimates, as we do here. Second, the proposed method has lower standard errors than a Kaplan–Meier model that uses just one observation per subject. Third, incorporating nonparametrically estimated weights allows us to remove bias due to informative censoring, a technique missing in most of the without-transplant survival literature. (For notable exceptions, see works by Schaubel and coauthors, e.g. [10,11].) We recommend the proposed method whenever one would like to estimate survival nonparametrically in the presence of a time-dependent covariate that affects both censoring and survival.
6. Supplemental materials
We have included the R code for the simulation presented in Figure 1, along with an .RData file created by said code.
The data reported here have been supplied by the Hennepin Healthcare Research Institute (HHRI) as the contractor for the Scientific Registry of Transplant Recipients (SRTR). The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy of or interpretation by the SRTR or the U.S. Government.
Appendix: Relaxing the assumption of a common survival function.
Individuals may differ in their disease state transition matrix, which induces differences in without-transplant survival given the current disease state. We used simulation to learn whether failing to take such heterogeneity into account induces bias in the proposed estimation procedure. We used the same setup as described in Section 3.4, except that instead of a common one-day transition matrix we generated a separate transition matrix for each of the 5,000 candidates. Rows were sampled independently from the Dirichlet distribution, with vector parameter equal to 100 times the corresponding row of P. Results are contained in Table A1, where we see negligible bias for t = 3 and t = 7, small overestimation for t = 30, and moderate overestimation for t = 90.
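The per-candidate matrices can be generated as in the following sketch (pure-Python Dirichlet sampling via normalized gamma variates; the matrix `P` used in our test is a placeholder, not the matrix from Section 3.4, and all entries of `P` are assumed positive):

```python
import random

def dirichlet(alpha, rng=random):
    """Sample from Dirichlet(alpha) by normalizing independent
    Gamma(alpha_i, 1) draws; requires every alpha_i > 0."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def perturbed_transition_matrix(P, concentration=100.0, rng=random):
    """Draw each row independently from a Dirichlet distribution whose
    parameter vector is `concentration` times the corresponding row of P,
    so each sampled row has mean equal to the row of P."""
    return [dirichlet([concentration * p for p in row], rng) for row in P]
```

Because each sampled row has mean equal to the corresponding row of P, the population-average dynamics are preserved while individual candidates deviate from them; larger concentration values shrink the deviations.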
Table A1.
Mean survival estimate (across 500 replicates) versus true survival probability under relaxation of the common transition matrix assumption.
| z | t | mean estimate | true |
|---|---|---|---|
| 1 | 3 | 0.9818 | 0.9818 |
| 1 | 7 | 0.9466 | 0.9463 |
| 1 | 30 | 0.7102 | 0.6835 |
| 1 | 90 | 0.3642 | 0.2551 |
| 2 | 3 | 0.7846 | 0.7829 |
| 2 | 7 | 0.5880 | 0.5802 |
| 2 | 30 | 0.2117 | 0.1921 |
| 2 | 90 | 0.0912 | 0.0611 |
Funding Statement
This work was supported in part by grant number R01DK111233 from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK).
Notes
In a simulation documented in the Appendix, we explore what happens when we relax this assumption by allowing individuals to have their own survival functions drawn from a common distribution.
The (z = 2, t = 90) case was actually based on 497 replicates because 3 had an undefined survival estimate, which happens whenever the last subject is censored before t = 90.
MELD is an acronym for ‘model for end-stage liver disease’.
In some circumstances, the MELD score is known to not accurately convey the urgency with which one needs a transplant; priority for these candidates is determined in a different manner that we do not discuss here.
Disclosure statement
The authors report no potential conflict of interest.
References
- 1. Cox D., Regression models and life tables, J. R. Stat. Soc. Ser. B 34 (1972), pp. 187–220.
- 2. Fisher L. and Lin D., Time-dependent covariates in the Cox proportional-hazards regression model, Annu. Rev. Public Health 20 (1999), pp. 145–157.
- 3. Kaplan E. and Meier P., Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53 (1958), pp. 457–481.
- 4. Kleinbaum D.G. and Klein M., Survival Analysis: A Self-Learning Text, 3rd ed., Springer, New York, 2012.
- 5. Lagakos S., General right censoring and its impact on the analysis of survival data, Biometrics 35 (1979), pp. 139–156.
- 6. Leung K.-M., Elashoff R., and Afifi A., Censoring issues in survival analysis, Annu. Rev. Public Health 18 (1997), pp. 83–104.
- 7. Massie A., Kucirka L., and Segev D., Big data in organ transplantation: Registries and administrative claims, Am. J. Transplant. 14 (2014), pp. 1723–1730.
- 8. Organ Procurement and Transplantation Network, National Data (2021). https://optn.transplant.hrsa.gov/data/view-data-reports/national-data/.
- 9. Robins J.M. and Finkelstein D.M., Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests, Biometrics 56 (2000), pp. 779–788.
- 10. Schaubel D. and Wei G., Double inverse-weighted estimation of cumulative treatment effects under nonproportional hazards and dependent censoring, Biometrics 67 (2011), pp. 29–38.
- 11. Schaubel D., Wolfe A., Sima C., and Merion R., Estimating the effect of a time-dependent treatment by levels of an internal time-dependent covariate: Application to the contrast between liver waitlist and posttransplant mortality, J. Am. Stat. Assoc. 104 (2009), pp. 49–59.
- 12. Therneau T., A package for survival analysis in R, R package version 3.2-11, 2021.
- 13. Tin M., Comparison of some ratio estimators, J. Am. Stat. Assoc. 60 (1965), pp. 294–307.
- 14. VanDerwerken D., Wood N., Segev D., and Gentry S., The precise relationship between MELD and survival without a liver transplant, Hepatology 74 (2021), pp. 950–960.
- 15. Willems S., Schat A., van Noorden M., and Fiocco M., Correcting for dependent censoring in routine outcome monitoring data by applying the inverse probability censoring weighted estimator, Stat. Meth. Med. Res. 27 (2018), pp. 323–335.
- 16. Xie J. and Liu C., Adjusted Kaplan–Meier estimator and log-rank test with inverse probability of treatment weighting for survival data, Stat. Med. 24 (2005), pp. 3089–3110.