Author manuscript; available in PMC: 2014 Jan 6.
Published in final edited form as: J Stat Theory Pract. 2013 Apr 5;7(2):10.1080/15598608.2013.771556. doi: 10.1080/15598608.2013.771556

Generalized Redistribute-to-the-Right Algorithm: Application to the Analysis of Censored Cost Data

SHUAI CHEN 1, HONGWEI ZHAO 2
PMCID: PMC3882135  NIHMSID: NIHMS534713  PMID: 24403869

Abstract

Medical cost estimation is a challenging task when censoring of data is present. Although researchers have proposed methods for estimating mean costs, these are often derived from theory and are not always easy to understand. We provide an alternative method, based on a replace-from-the-right algorithm, for estimating mean costs more efficiently. We show that our estimator is equivalent to an existing one that is based on the inverse probability weighting principle and semiparametric efficiency theory. We also propose an alternative method for estimating the survival function of costs, based on the redistribute-to-the-right algorithm, that was originally used for explaining the Kaplan–Meier estimator. We show that this second proposed estimator is equivalent to a simple weighted survival estimator of costs. Finally, we develop a more efficient survival estimator of costs, using the same redistribute-to-the-right principle. This estimator is naturally monotone, more efficient than some existing survival estimators, and has a quite small bias in many realistic settings. We conduct numerical studies to examine the finite sample property of the survival estimators for costs, and show that our new estimator has small mean squared errors when the sample size is not too large. We apply both existing and new estimators to a data example from a randomized cardiovascular clinical trial.

Keywords: Mean cost, Median cost, Redistribute-to-the-right, Replace-from-the-right, Survival analysis, Survival estimator for costs

1. Introduction

High and rising health care costs in an environment of limited resources have sharpened the focus on economic evaluation of new treatments. Studies of cost-effectiveness usually aim at evaluating new treatments in the hope of finding an effective treatment that does not cause too much financial burden on society. In clinical trials and observational studies, survival time and health costs frequently are censored for administrative reasons, since not all patients can be observed until events such as death or disease relapse occur. Censoring poses a unique problem for cost estimation due to the “induced informative censoring” problem, first noted by Lin and colleagues (Lin et al. 1997). Traditional survival analysis methods assume that the censoring time is independent of the survival time (conditional on some covariates). However, the costs at censoring time are no longer independent of the total uncensored costs. For example, a healthier patient will accumulate costs more slowly, and therefore will have lower costs at the censoring time and at the potential event time (Lin 2003). Thus, many standard approaches for survival analysis, such as the Kaplan–Meier estimator (Kaplan and Meier 1958), or the Cox regression model (Cox 1972), are not valid for the analysis of cost data.

Many researchers have proposed methods for estimating mean medical costs. Most focus on restricted medical costs, that is, the costs accumulated within a time limit. Among them, Lin et al. (1997) proposed estimators via survival probability weighting using partitioned time intervals; Bang and Tsiatis (2000) proposed consistent estimators using the inverse probability weighting technique; and Zhao and Tian (2001) proposed a more efficient estimator. Later, Zhao et al. (2007) discovered some special conditions under which the estimators without using cost history and those using cost history become identical within each class.

Although many estimators for the mean costs have appeared in the literature, these often are deeply based in theory and therefore less accessible to practitioners. To address this situation, Zhao et al. (2011) established a mathematical equivalency between the BT estimator for the mean costs (Bang and Tsiatis 2000), and a replace-from-the-right (RR) algorithm (Pfeifer and Bang 2005). Thus, the BT estimator, which is based on the inverse probability weighting technique (Horvitz and Thompson 1952), has a more intuitive explanation from the point of the RR algorithm. Motivated by this idea, we propose a modified RR algorithm, the RRimp method, which utilizes cost history information and therefore is generally more efficient than the RR estimator. We provide a proof of the mathematical equivalence between the RRimp method and an existing estimator for the mean costs, the ZT estimator (Zhao and Tian 2001). The ZT estimator was derived from complicated theory. Therefore, the RRimp algorithm provides insight on how the ZT estimator works and eventually can help promote its application in practice.

Cost data are often highly skewed, with most patients incurring relatively small costs but a few accumulating huge costs. It is often desirable, therefore, to estimate the median and other quantiles of the costs. These quantities are readily available if we can estimate the survival function of costs. Using the original redistribute-to-the-right algorithm (Efron 1967), which was used for explaining the Kaplan–Meier estimator, we propose an RRS survival estimator for costs, and show that it is equivalent to a simple weighted (SW) survival estimator for costs (Zhao and Tsiatis 1997; Zhao et al. 2012), which uses the inverse probability weighting technique. We further extend this method to propose an RRimpS survival estimator. We conduct simulation studies to compare this RRimpS survival estimator with the RRS survival estimator (or equivalent SW estimator), and with a more efficient ZTS survival estimator (Zhao and Tsiatis 1997; Zhao et al. 2012). We discuss our findings in the Conclusion section.

2. Notation and Assumptions

For the ith individual in the study, i = 1, 2, . . . , n, we define Ti as the survival time from the beginning of the study until the occurrence of some event, for example, death or disease relapse. The censoring time for the ith individual is denoted as Ci. We can observe either the survival time or the censoring time, whichever is shorter; that is, we observe the follow-up time Xi = min(Ti, Ci) and the indicator variable Δi = I(Ti ≤ Ci). We define Mi(t) as the accumulated cost of patient i from time 0 to t. For some real applications, we observe only the total cost Mi = Mi(Xi). However, in other studies, we may know the entire cost history, Mi(t), 0 < t < Xi.

We assume that the censoring variable is independent of the survival time and cost accumulation process, a condition that is often satisfied in well-conducted clinical trials and in some observational studies where censoring occurs mainly for administrative reasons. Due to the presence of censoring, the marginal distribution of cost may be nowhere identifiable without making some parametric assumptions (Huang 2002). Hence we adopt an approach that focuses on the accumulated cost by a time limit L, where L is chosen such that a reasonable number of subjects are still being observed at that time. A consequence of applying such a restriction is that a survival time longer than L can be considered equivalently as having an event at time L, that is, $T_i^L = \min(T_i, L)$ (we still use Ti for notational convenience).

We consider the problem of estimating the mean cost, μ = E{Mi(Ti)}, and the survival function of cost, S(x) = Pr{Mi(Ti) > x}, for costs accumulated to a time L. For reasons that become clear in the following, we also need to define the survival function for the event time as ST(t) = Pr(Ti > t), and the survival function for the censoring time as K(t) = Pr(Ci > t).

3. Estimating the Mean Cost

3.1. Without Using Cost History: The BT Estimator and Its Equivalent RR Estimator

Bang and Tsiatis (2000) proposed a consistent estimator for the mean costs accumulated over time L with censored data, based on the inverse probability weighting technique:

$$\hat{\mu}_{BT} = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_i M_i}{\hat{K}(T_i)}, \tag{1}$$

where Mi is the total observed cost for the ith individual, and K̂(Ti) is the Kaplan–Meier estimator for the survival function of the censoring time, K(t) = Pr(Ci > t), evaluated at Ti. K(Ti) represents the probability that a subject is uncensored at Ti. The basic idea of the BT estimator is that each complete observation represents 1/K̂(Ti) potential observations, some of which might have been censored.
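As a minimal sketch of formula (1), the Python snippet below computes K̂ and the BT estimate for a five-subject data set (the same numbers as the worked example in Section 3.2). The function names `censoring_km` and `bt_mean` are illustrative, not from the paper, and the code assumes distinct follow-up times.

```python
from fractions import Fraction

def censoring_km(X, delta):
    """Kaplan-Meier estimate of K(t) = Pr(C > t), evaluated at each X[i].

    Censorings (delta == 0) are the 'events' of this curve.
    Assumes distinct follow-up times (a simplification for this sketch).
    """
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    K = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if delta[i] == 0:
            surv *= Fraction(at_risk - 1, at_risk)
        K[i] = surv
        at_risk -= 1
    return K

def bt_mean(X, delta, M):
    """Inverse-probability-weighted mean cost, formula (1)."""
    K = censoring_km(X, delta)
    n = len(X)
    return sum(Fraction(M[i]) / K[i] for i in range(n) if delta[i] == 1) / n

X = [1, 2, 3, 4, 5]            # follow-up times
delta = [1, 0, 1, 0, 1]        # 1 = event observed, 0 = censored
M = [10, 50, 100, 60, 40]      # total observed costs

K_hat = censoring_km(X, delta)
mu_bt = bt_mean(X, delta, M)
print(mu_bt)
```

Exact fractions are used only so that the small example can be checked by hand; floating-point arithmetic would serve equally well in practice.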

Although the BT estimator is easy to compute, its mechanism is not intuitive to many readers. The replace-from-the-right (RR) estimator proposed by Pfeifer and Bang (2005), on the other hand, is more intuitive. To explain the main idea of the RR method, first we note that in the absence of censoring, a mean cost estimator is simply the average of costs from all observations. When a subject is censored, we know that this subject lives longer than his/her censoring time, but we do not have information on his/her total cost. In the RR algorithm, we replace this subject's cost by an average of costs from those individuals who survived longer than this subject. Specifically, an RR estimator for the mean costs can be obtained by first arranging all the subjects from the shortest observed time to the longest. If some of these are equal, we put the event time before the (same) censored time. Since we focus on time-restricted cost estimation, we can assume that the individual with the longest observed time is uncensored. We then move from the right (the longest observation time) to the left (the shortest observation time). When we encounter the first censored observation, say, at time Ci, we replace its costs by the average of costs from all the observations to its right,

$$M_i^{RR} = \frac{\sum_{j=1}^{n} I(X_j > C_i)\, M_j}{\sum_{j=1}^{n} I(X_j > C_i)}. \tag{2}$$

We move to the left and repeat this process of replacing all the censored costs with the average of all upstream costs (some of which are real costs and some are replaced costs). The RR mean cost estimator is simply an average of all the costs from both complete observations and censored observations (replaced costs), that is,

$$\hat{\mu}_{RR} = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta_i M_i + (1-\Delta_i) M_i^{RR}\right\}. \tag{3}$$

Although the BT estimator (1) and the RR method (3) look quite different—the former is based on a well-known theory, and the latter makes intuitive sense—it is rather amazing that the two estimators in fact are mathematically equivalent (see Zhao et al. [2011] for a detailed proof).
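The RR steps described above can be sketched in a few lines of Python. The function name `rr_mean` is illustrative, and ties among censored observations are simply broken by sort order, which is an assumption of this sketch. On the five-subject data set used later in Section 3.2 (total costs 10, 50, 100, 60, 40), the sketch returns 50, the same value that formula (1) gives for these data.

```python
from fractions import Fraction

def rr_mean(X, delta, M):
    """Replace-from-the-right mean cost, formulas (2) and (3)."""
    n = len(X)
    # Sort by follow-up time; at ties, events (delta == 1) come first.
    order = sorted(range(n), key=lambda i: (X[i], -delta[i]))
    if delta[order[-1]] == 0:
        raise ValueError("the longest follow-up time must be uncensored")
    cost = [Fraction(m) for m in M]
    # Walk from the right; each censored cost becomes the average of all
    # costs to its right (real costs and already-replaced costs alike).
    for pos in range(n - 2, -1, -1):
        i = order[pos]
        if delta[i] == 0:
            right = order[pos + 1:]
            cost[i] = sum(cost[j] for j in right) / len(right)
    return sum(cost) / n, cost

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
M = [10, 50, 100, 60, 40]

mu_rr, replaced = rr_mean(X, delta, M)
print(mu_rr, replaced)
```

Subject 4 is replaced by 40 (the only cost to its right), and subject 2 by (100 + 40 + 40)/3 = 60.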

Note that if we replace the costs M by the survival time T (restricted by time L), we also obtain an equivalency between the RR estimator for the mean (restricted) survival time, and a simple weighted estimator for the mean survival time,

$$\hat{\mu}_T = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_i T_i}{\hat{K}(T_i)}.$$

Since this simple weighted estimator has been shown to be equivalent to the area under the Kaplan–Meier survival curve (Satten and Datta 2001; Zhao and Tian 2001), we are providing an alternative and simpler way for obtaining the (restricted) area under the Kaplan–Meier survival curve using the RR algorithm.
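The equivalence with the area under the Kaplan–Meier curve can be checked numerically on a toy data set. The sketch below is illustrative only: the helper names are hypothetical, the five follow-up times are the ones used in the examples of this paper, and distinct times are assumed.

```python
from fractions import Fraction

def km_values(X, is_event):
    """Right-continuous Kaplan-Meier values at each X[i] (distinct times assumed)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    values = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if is_event[i]:
            surv *= Fraction(at_risk - 1, at_risk)
        values[i] = surv
        at_risk -= 1
    return values

def weighted_mean_survival(X, delta):
    """Simple weighted estimator of the mean (restricted) survival time."""
    K = km_values(X, [1 - d for d in delta])   # censoring distribution
    n = len(X)
    return sum(Fraction(X[i]) / K[i] for i in range(n) if delta[i] == 1) / n

def km_area(X, delta):
    """Area under the Kaplan-Meier survival curve from 0 to the largest X."""
    S = km_values(X, delta)
    area, prev_t, prev_s = Fraction(0), 0, Fraction(1)
    for t, s in sorted(zip(X, S)):
        area += prev_s * (t - prev_t)   # curve is flat at prev_s on [prev_t, t)
        prev_t, prev_s = t, s
    return area

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]

mean_weighted = weighted_mean_survival(X, delta)
mean_area = km_area(X, delta)
print(mean_weighted, mean_area)
```

Both quantities equal 11/3 here: the weighted sum is (1 + 4 + 40/3)/5, and the area is 1 + 2(4/5) + 2(8/15).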

3.2. Using the Cost History: the ZT Estimator and Its Equivalent RRimp Estimator

The BT estimator and its equivalent RR algorithm use only the total cost information from uncensored subjects. Hence, they are not very efficient. An improved estimator proposed by Zhao and Tian (2001) utilizes cost history information from both censored and uncensored observations. Therefore this ZT estimator is often more efficient. It has the following simplified form (Pfeifer and Bang, 2005):

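$$\hat{\mu}_{ZT} = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_i M_i}{\hat{K}(T_i)} + \frac{1}{n}\sum_{i=1}^{n}\frac{(1-\Delta_i)\left\{M_i(C_i) - \overline{M(C_i)}\right\}}{\hat{K}(C_i)}, \tag{4}$$

where $\overline{M(C_i)} = \dfrac{\sum_{j=1}^{n} I(X_j \ge C_i)\, M_j(C_i)}{\sum_{j=1}^{n} I(X_j \ge C_i)}$, which is the average cumulative cost at time Ci of those subjects who are alive at Ci.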

The ZT estimator consists of two terms. The first is the BT estimator. The second term is constructed using cost history information, which can be viewed as an adjustment term. The ZT estimator gains more efficiency through an adjustment made to the BT estimator using the difference of censored costs and the average accumulated costs at the same time point. Zhao and Tian (2001) established the large sample property for this estimator, and showed that the estimator is consistent and asymptotically normally distributed. Furthermore, Zhao et al. (2007) described the conditions under which this estimator is equivalent to the partitioned Bang and Tsiatis (2000) estimator (BTp), as well as to the two estimators of medical costs LinA/B proposed by Lin et al. (1997).

Since the BT estimator has an intuitive explanation through the RR algorithm, naturally one may wonder whether the ZT estimator has a similar intuitive explanation. Therefore we propose an RRimp algorithm, which makes intuitive sense, and later we show that it is equivalent to the ZT estimator. In contrast to the simple RR method, which depends only on the total costs from complete observations, the RRimp algorithm uses the cost history information from both censored and complete observations. Intuitively, for a censored subject i, we already know his/her accumulated cost before censoring Mi = Mi(Ci). Hence, we need only to estimate his/her cost beyond the censoring time point, Mi(Ti) – Mi(Ci). We propose to impute this cost using the average of all additional costs beyond the censoring point Ci from those subjects who survive longer. The detailed RRimp estimator can be described as follows. First, arrange all the subjects from the shortest to the longest follow-up time. If some of these are the same, we assume events happen shortly before censoring times. Since we focus on time-restricted (say, by L) cost estimation, we assume that the individual with the longest observed time (i.e., L) is uncensored. Starting from the right (the longest observed time) we move to the left. We first find the longest censoring time, denoted as Ci. We replace the cost for this observation by summation of his/her observed costs and the average additional accumulated costs from all subjects who have a longer survival time, that is,

$$M_i^{RRimp} = M_i + \frac{\sum_{j=1}^{n} I(X_j > C_i)\left\{M_j - M_j(C_i)\right\}}{\sum_{j=1}^{n} I(X_j > C_i)}. \tag{5}$$

We then move to the second longest censoring time and perform the same replacement procedure, using the replaced cost for the longest censoring time in calculating the average. We move to the left and repeat this process until we replace all the censored costs. The RRimp estimator is then obtained by an average of costs from all complete observations (real costs) and the censored observations (replaced costs), that is,

$$\hat{\mu}_{RRimp} = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta_i M_i + (1-\Delta_i) M_i^{RRimp}\right\}. \tag{6}$$

We illustrate this algorithm using a simple example. Suppose we observe the following data: follow-up time X = {1, 2, 3, 4, 5}, death indicator Δ = {1, 0, 1, 0, 1}, and their accumulated costs Mi(·) are shown in the figure that follows. Here the 2nd and 4th subjects are censored. In Step 1, we try to obtain the replacement cost for subject 4. Since subject 5 is the only one surviving longer than subject 4, the replacement cost for subject 4 is equal to the summation of the censored cost of subject 4 (= 60) and the additional cost of subject 5 beyond time C4 (= 40 – 30), which is 70. Similarly, in Step 2 we try to obtain the replacement cost for subject 2 by adding the observed cost of subject 2 (= 50) and the average of additional costs after time C2 for subject 3 (= 100 – 60, real costs), subject 4 (= 70 – 20, replaced costs) and subject 5 (= 40 – 10, real costs), which is equal to 90. Therefore, the mean cost estimated from the RRimp method gives an estimate of 62, as shown in the graph here.

Xi     =    1     2     3     4     5
Status:     x     o     x     o     x      (x = event, o = censored)
M1(·)  =   10
M2(·)  =   20    50
M3(·)  =   30    60   100
M4(·)  =   10    20    40    60
M5(·)  =    5    10    20    30    40

Step 1: $M_4^{RRimp} = 60 + (40 - 30) = 70$
Step 2: $M_2^{RRimp} = 50 + \{(100 - 60) + (70 - 20) + (40 - 10)\}/3 = 90$

$$\hat{\mu}_{RRimp} = \frac{10 + 90 + 100 + 70 + 40}{5} = 62.$$
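The two replacement steps of this example can be reproduced with a short Python sketch of the RRimp algorithm. The representation of a cost history as a list of (time, cumulative cost) pairs and the names `cost_at` and `rrimp_mean` are assumptions made for illustration; ties among censored observations are broken by sort order.

```python
from fractions import Fraction

def cost_at(path, t):
    """Cumulative cost at time t from a time-sorted list of (time, cumulative cost)."""
    cost = 0
    for time, value in path:
        if time <= t:
            cost = value
    return cost

def rrimp_mean(X, delta, paths):
    """RRimp mean cost, formulas (5) and (6)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: (X[i], -delta[i]))
    if delta[order[-1]] == 0:
        raise ValueError("the longest follow-up time must be uncensored")
    total = [Fraction(cost_at(paths[i], X[i])) for i in range(n)]
    for pos in range(n - 2, -1, -1):
        i = order[pos]
        if delta[i] == 0:
            right = order[pos + 1:]
            # Additional cost beyond C_i for everyone to the right, using
            # replaced totals for censored subjects already processed.
            extra = [total[j] - cost_at(paths[j], X[i]) for j in right]
            total[i] += sum(extra) / len(extra)
    return sum(total) / n, total

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
paths = [
    [(1, 10)],
    [(1, 20), (2, 50)],
    [(1, 30), (2, 60), (3, 100)],
    [(1, 10), (2, 20), (3, 40), (4, 60)],
    [(1, 5), (2, 10), (3, 20), (4, 30), (5, 40)],
]

mu_rrimp, replaced = rrimp_mean(X, delta, paths)
print(mu_rrimp, replaced)
```

The sketch gives replaced costs of 70 for subject 4 and 90 for subject 2, and a mean of 62, matching Steps 1 and 2 above.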

Meanwhile, the ZT estimator of the mean cost obtained from the same data set is:

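$$\begin{aligned}
\hat{\mu}_{ZT} &= \frac{1}{5}\sum_{i=1}^{5}\frac{\Delta_i M_i}{\hat{K}(T_i)} + \frac{1}{5}\sum_{i=1}^{5}\frac{(1-\Delta_i)\left\{M_i(C_i) - \overline{M(C_i)}\right\}}{\hat{K}(C_i)} \\
&= \frac{1}{5}\left(\frac{10}{1} + \frac{100}{3/4} + \frac{40}{3/8}\right) + \frac{1}{5}\left(\frac{50 - 35}{3/4} + \frac{60 - 45}{3/8}\right) \\
&= \frac{1}{5}\left(10 + \frac{400}{3} + \frac{320}{3}\right) + \frac{1}{5}(20 + 40) = 50 + 12 = 62,
\end{aligned}$$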

where the Kaplan–Meier estimates for K(t) = Pr(Ci > t) are K̂(Xi) = (1, 3/4, 3/4, 3/8, 3/8), at Xi = {1, 2, 3, 4, 5}, and $\overline{M(C_i)} = \{35, 45\}$, at Ci = {2, 4}, respectively. Hence, we obtain exactly the same estimate for the mean costs through both the ZT estimator and the RRimp method using this data set. In the appendix we provide mathematical proof of the equivalence between the ZT estimator and the RRimp estimator for any data set.
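For readers who prefer to verify the arithmetic in code, the following sketch evaluates the two terms of formula (4) on the same data. The function names and the (time, cumulative cost) representation of cost histories are illustrative assumptions, and distinct follow-up times are assumed.

```python
from fractions import Fraction

def cost_at(path, t):
    """Cumulative cost at time t from a time-sorted list of (time, cumulative cost)."""
    cost = 0
    for time, value in path:
        if time <= t:
            cost = value
    return cost

def censoring_km(X, delta):
    """Kaplan-Meier estimate of Pr(C > t) at each X[i] (distinct times assumed)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    K = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if delta[i] == 0:
            surv *= Fraction(at_risk - 1, at_risk)
        K[i] = surv
        at_risk -= 1
    return K

def zt_terms(X, delta, paths):
    """Return the BT term and the adjustment term of formula (4)."""
    n = len(X)
    K = censoring_km(X, delta)
    M = [cost_at(paths[i], X[i]) for i in range(n)]
    bt_term = sum(Fraction(M[i]) / K[i] for i in range(n) if delta[i] == 1) / n
    adjustment = Fraction(0)
    for i in range(n):
        if delta[i] == 0:
            at_risk = [j for j in range(n) if X[j] >= X[i]]
            mean_cost = Fraction(sum(cost_at(paths[j], X[i]) for j in at_risk),
                                 len(at_risk))
            adjustment += (M[i] - mean_cost) / K[i]
    return bt_term, adjustment / n

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
paths = [
    [(1, 10)],
    [(1, 20), (2, 50)],
    [(1, 30), (2, 60), (3, 100)],
    [(1, 10), (2, 20), (3, 40), (4, 60)],
    [(1, 5), (2, 10), (3, 20), (4, 30), (5, 40)],
]

bt_term, adjustment = zt_terms(X, delta, paths)
mu_zt = bt_term + adjustment
print(bt_term, adjustment, mu_zt)
```

The BT term is 50 and the adjustment term is 12, so the sum is 62, in agreement with the RRimp estimate.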

In summary, when censoring of data is present, we cannot observe full costs for every subject. If we have cost history information, we can replace the censored cost by supplementing what we can observe with the average of the additional accumulated costs from upstream observations. This RRimp method is mathematically equivalent to the ZT estimator, and, as demonstrated by simulations and examples in Zhao and Tian (2001), is generally more efficient than the BT estimator and its equivalent RR method.

4. Estimating Survival Functions for Costs

In addition to estimating the mean costs, we may want to estimate the survival function of costs in practice. The survival function can provide more information about costs, such as medians and quartiles, which are more robust to outliers. Motivated by the idea of the replace-from-the-right algorithm for estimating mean costs, we investigate how to use similar approaches to develop survival estimators for the costs. We show that a naive way of deriving the survival estimator based on the replace-from-the-right algorithm will result in a biased estimator. Instead, we propose a new RRS estimator for the survival function of costs, based on the original redistribute-to-the-right idea from Efron (1967) for estimating the survival function of a failure time. Within this section only, when the context is clear, we use the same abbreviation “RR” to stand for redistribute-to-the-right. We show that the RRS estimator is equivalent to a simple weighted (SW) survival estimator of costs, whose form was first described in the context of estimating quality-adjusted lifetime by Zhao and Tsiatis (1997). We also attempt to derive a survival estimator RRimpS based on a modified RR algorithm that uses cost history information. We discuss the advantages and disadvantages of such an estimator.

4.1. The SW Estimator and Its Equivalent RRS Estimator

Following the work of Zhao and Tsiatis (1997) and Zhao et al. (2012), a SW estimator for the survival function of costs can be obtained by:

$$\hat{S}_{SW}(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_i}{\hat{K}(T_i)}\, I(M_i > x). \tag{7}$$

The large sample properties of this estimator, such as its consistency and asymptotic normality, were established by Zhao and Tsiatis (1997).

To construct an equivalent survival estimator, one is tempted to use the replacement costs at each censoring point and estimate the survival function for costs using the following formula:

$$\hat{S}_{naive}(x) = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta_i I(M_i > x) + (1-\Delta_i)\, I(M_i^{RR} > x)\right\}. \tag{8}$$

Unfortunately, if we use the empirical distribution function just shown to estimate the survival function for costs, treating the replaced costs as if they were the real costs, the estimated curve will be biased although the area under the curve, that is, the estimated mean costs, is unbiased. This is demonstrated in subsequent simulation studies.

In order to find an equivalent RRS estimator, we rely on the original redistribute-to-the-right idea proposed by Efron (1967), used to explain the Kaplan–Meier estimator for survival time. For each censored subject, since we do not know the actual costs, we will find the contributions from observations that have longer follow-up time than this subject. Specifically, we first sort all subjects according to their observation times from the shortest (left) to the longest (right). For any tied observations, we assume the death event occurs a little earlier than the censored time. We also assume that the individual with the longest observed time is uncensored, since we focus on time-restricted cost estimation. Consider a censored observation i whose initial weight is set to be 1. We distribute its weight evenly to all the time points to its right. For example, if there are ni such observations, then each one gets a weight of 1/ni. Next we find the nearest censored observation to its right, and redistribute its weight again evenly to all the observations to its right. We repeat this process until we have redistributed the weight of the longest censoring time. Note that after redistribution the weights are nonzero only at those complete observations that are on the right side of the censored observation i. Denote the final weight at the jth complete event time as $W_j^{(i)}$, representing the contribution of a complete subject j to the censored subject i.

Due to censoring we often cannot evaluate the mark I(Mi > x). Instead we use the weighted sum

$$I(M_i > x)^{RR} = \sum_{j=1}^{n}\Delta_j\, I(T_j > X_i)\, W_j^{(i)}\, I(M_j > x) \tag{9}$$

as the replacement mark. As a result, the RRS estimator for the survival function of costs is

$$\hat{S}_{RR}(x) = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta_i I(M_i > x) + (1-\Delta_i)\, I(M_i > x)^{RR}\right\}. \tag{10}$$

We illustrate this idea using a simple example. Assume we have data [X = {1, 2, 3, 4, 5}, Δ = {1, 0, 1, 0, 1}, M = {10, 20, 40, 30, 50}]. As shown in the following graph, we first find the weight $W_j^{(2)}$, that is, the contribution of complete observations to the censored observation 2. In Step 0, the censored observation 2 gets the weight of 1. In Step 1, we distribute its weight of 1 to all of the 3 observations to its right, so that each gets a weight of 1/3. Moving to the next censoring time, observation 4, we distribute its weight of 1/3 to the one observation to its right, making the weight at time 5 to be 2/3. Hence we have $W_3^{(2)} = 1/3$ and $W_5^{(2)} = 2/3$.

Xj      =    1     2     3     4     5
Status:      x     o     x     o     x
Step 0:      0     1     0     0     0
Step 1:      0     0    1/3   1/3   1/3
Step 2:      0     0    1/3    0    2/3  (= 1/3 + 1/3)
$W_j^{(2)}$:            1/3         2/3

It is easy to obtain the contributions of complete observations to the censored observation 4, in this case $W_5^{(4)} = 1$. Hence the RRS estimator is

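$$\begin{aligned}
\hat{S}_{RR}(x) &= \frac{1}{5}\sum_{i=1}^{5}\left[\Delta_i I(M_i > x) + (1-\Delta_i)\, I(M_i > x)^{RR}\right] \\
&= \frac{1}{5}\left\{I(M_1 > x) + I(M_3 > x) + I(M_5 > x) + I(M_2 > x)^{RR} + I(M_4 > x)^{RR}\right\} \\
&= \frac{1}{5}\left\{I(M_1 > x) + I(M_3 > x) + I(M_5 > x) + \tfrac{1}{3} I(M_3 > x) + \tfrac{2}{3} I(M_5 > x) + I(M_5 > x)\right\} \\
&= \frac{1}{5}\left\{I(M_1 > x) + \tfrac{4}{3} I(M_3 > x) + \tfrac{8}{3} I(M_5 > x)\right\}.
\end{aligned}$$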
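The redistribution steps and the resulting RRS estimator can be sketched in Python as follows. The function names are illustrative, and ties among censored observations are broken by sort order, which is an assumption of this sketch rather than a rule from the paper.

```python
from fractions import Fraction

def redistribute_weights(X, delta, i):
    """Final weights on complete observations for censored subject i."""
    n = len(X)
    order = sorted(range(n), key=lambda j: (X[j], -delta[j]))
    if delta[order[-1]] == 0:
        raise ValueError("the longest follow-up time must be uncensored")
    weight = [Fraction(0)] * n
    weight[i] = Fraction(1)                      # Step 0
    for pos in range(order.index(i), n):
        j = order[pos]
        if delta[j] == 0 and weight[j] != 0:
            right = order[pos + 1:]
            share = weight[j] / len(right)       # spread evenly to the right
            for k in right:
                weight[k] += share
            weight[j] = Fraction(0)
    return weight

def rrs_survival(x, X, delta, M):
    """RRS estimator of Pr(cost > x), formulas (9) and (10)."""
    n = len(X)
    total = Fraction(0)
    for i in range(n):
        if delta[i] == 1:
            total += 1 if M[i] > x else 0
        else:
            W = redistribute_weights(X, delta, i)
            total += sum(W[j] for j in range(n) if M[j] > x)
    return total / n

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
M = [10, 20, 40, 30, 50]

W2 = redistribute_weights(X, delta, 1)   # censored observation 2
W4 = redistribute_weights(X, delta, 3)   # censored observation 4
print(W2, W4, rrs_survival(10, X, delta, M))
```

Because censored positions end with weight zero, summing `W[j]` over all j with M[j] > x only picks up complete observations, as required by (9).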

The simple weighted estimator for this example is

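$$\begin{aligned}
\hat{S}_{SW}(x) &= \frac{1}{5}\sum_{i=1}^{5}\left\{\frac{\Delta_i I(M_i > x)}{\hat{K}(T_i)}\right\} \\
&= \frac{1}{5}\left\{\frac{I(M_1 > x)}{1} + \frac{I(M_3 > x)}{3/4} + \frac{I(M_5 > x)}{3/8}\right\} \\
&= \frac{1}{5}\left\{I(M_1 > x) + \tfrac{4}{3} I(M_3 > x) + \tfrac{8}{3} I(M_5 > x)\right\}.
\end{aligned}$$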

It is clear that the RRS estimator is equivalent to the SW survival estimator for costs in this example.
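A short Python sketch of the SW estimator (7) on the same data is given below, together with one possible way of reading a quantile off the estimated step function. The quantile rule used here (the smallest complete-case cost m with estimated survival at most 1 − p) is an assumption of this sketch; the paper does not prescribe a specific rule. Distinct follow-up times are assumed.

```python
from fractions import Fraction

def censoring_km(X, delta):
    """Kaplan-Meier estimate of Pr(C > t) at each X[i] (distinct times assumed)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    K = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if delta[i] == 0:
            surv *= Fraction(at_risk - 1, at_risk)
        K[i] = surv
        at_risk -= 1
    return K

def sw_survival(x, X, delta, M):
    """Simple weighted estimator of Pr(cost > x), formula (7)."""
    K = censoring_km(X, delta)
    n = len(X)
    weights = (1 / K[i] for i in range(n) if delta[i] == 1 and M[i] > x)
    return sum(weights, Fraction(0)) / n

def cost_quantile(p, X, delta, M):
    """Smallest complete-case cost m with S_SW(m) <= 1 - p (illustrative rule)."""
    for m in sorted(M[i] for i in range(len(X)) if delta[i] == 1):
        if sw_survival(m, X, delta, M) <= 1 - p:
            return m
    return None

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
M = [10, 20, 40, 30, 50]

median_cost = cost_quantile(Fraction(1, 2), X, delta, M)
print([sw_survival(x, X, delta, M) for x in (0, 10, 40, 50)], median_cost)
```

The estimated survival function takes the values 1, 4/5, 8/15, and 0 at x = 0, 10, 40, and 50, so under this rule the estimated median cost is 50.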

Remarks.

  1. It is not difficult to show that the weight $W_j^{(i)}$ is related to the estimated conditional probability of an event occurring at Xj given that the subject is alive at Xi (discrete case). Thus, $W_j^{(i)}$ can be easily obtained as follows:
    $$W_j^{(i)} = \frac{1}{n\,\hat{S}_T(C_i)\,\hat{K}(T_j)}, \tag{11}$$
    where ŜT(x) is the Kaplan–Meier estimator for Pr(T > x), and K̂(x) is the Kaplan–Meier estimator for Pr(C > x).
  2. We can show that this RRS estimator (10) for the survival function of costs is mathematically equivalent to the SW estimator based on the similar proofs for mean cost estimators.

  3. The weights $W_j^{(i)}$ are exactly the weights needed for obtaining the replaced costs for a censored observation i, in estimating the mean costs by the replace-from-the-right algorithm, that is,
    $$M_i^{RR} = \sum_{j=1}^{n}\Delta_j\, I(X_j > X_i)\, W_j^{(i)}\, M_j.$$
    Therefore, the replace-from-the-right algorithm for the mean cost estimator is a generalized version of the redistribute-to-the-right algorithm.
  4. The replaced costs $M_i^{RRimp}$ from the RRimp estimator, however, are not equivalent to
    $$\sum_{j=1}^{n}\Delta_j\, I(X_j > X_i)\, W_j^{(i)}\left\{M_i + M_j - M_j(C_i)\right\}, \tag{12}$$
    since $M_i^{RRimp}$ from (5) utilizes the cost information from censored observations beyond Ci while (12) does not.
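Remarks 1 and 3 can be checked numerically on the five-subject example. The sketch below evaluates the closed form (11) and then uses the weights to recover the replace-from-the-right costs for the total costs of Section 3.2 (10, 50, 100, 60, 40). Function names are illustrative and distinct follow-up times are assumed.

```python
from fractions import Fraction

def km_values(X, is_event):
    """Right-continuous Kaplan-Meier values at each X[i] (distinct times assumed)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    values = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if is_event[i]:
            surv *= Fraction(at_risk - 1, at_risk)
        values[i] = surv
        at_risk -= 1
    return values

def closed_form_weight(X, delta, i, j):
    """Weight of complete subject j for censored subject i, formula (11)."""
    n = len(X)
    S_T = km_values(X, delta)                      # survival of event time
    K = km_values(X, [1 - d for d in delta])       # survival of censoring time
    return 1 / (n * S_T[i] * K[j])

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
M = [10, 50, 100, 60, 40]

w23 = closed_form_weight(X, delta, 1, 2)   # subject 3 -> censored subject 2
w25 = closed_form_weight(X, delta, 1, 4)   # subject 5 -> censored subject 2
w45 = closed_form_weight(X, delta, 3, 4)   # subject 5 -> censored subject 4

# Remark 3: weighted sums of complete costs give the RR replaced costs.
m2_rr = w23 * M[2] + w25 * M[4]
m4_rr = w45 * M[4]
print(w23, w25, w45, m2_rr, m4_rr)
```

The closed form returns 1/3, 2/3, and 1, the same weights as the step-by-step redistribution, and the implied replaced costs are 60 and 40.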

4.2. RR Improved Survival Estimator for the Survival Function of Costs

As in the case of estimating the mean costs, the SW and its equivalent RRS estimator for the survival function of costs are not efficient since they utilize only the costs from complete observations. Based on the principles of constructing the RRS survival estimator and the RRimp estimator for mean costs, we propose an improved RR survival (RRimpS) estimator, as shown next:

$$\hat{S}_{RRimp}(x) = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta_i I(M_i > x) + (1-\Delta_i)\, I(M_i > x)^{RRimp}\right\}, \tag{13}$$

where

$$I(M_i > x)^{RRimp} = \sum_{j=1}^{n}\Delta_j\, I(T_j > X_i)\, W_j^{(i)}\, I\!\left(M_j^{(i)} > x\right) \tag{14}$$

is the new replacement mark, and $M_j^{(i)} = M_i + M_j - M_j(C_i)$ is the replacement cost, combining information from censored observation i and complete observation j.

For a censored subject i, if we observe Mi(Ci) > x, then we know for sure that Mi(Ti) > x. This information is not utilized in the SW estimator (7), or the equivalent RRS estimator (10). However, it is captured in the RRimpS estimator (13) and (14), since $M_j^{(i)} = M_i(C_i) + M_j - M_j(C_i) > x$ always holds under Mi(Ci) > x, and the sum of weights $W_j^{(i)}$ is 1, giving rise to $I(M_i > x)^{RRimp} = 1$.

Because $I(M_j^{(i)} > x)$ is monotone in x and the weights are nonnegative, this RRimpS estimator is always monotone, which is a desirable property for a survival estimator. In contrast, an improved survival function estimator of costs, ZTS, first developed by Zhao and Tsiatis (1997) in the context of quality-adjusted survival time, and later applied to cost estimation (Zhao et al. 2012), cannot be guaranteed to be monotone (Huang and Louis 1998). From subsequent simulation studies and the real example, we see that the RRimpS estimator is also more efficient, in many practical situations, than both the SW estimator and the ZTS estimator.

Unfortunately, unlike the SW and the ZTS estimators, this RRimpS estimator is not always consistent. An intuitive reason for this inconsistency is as follows. We replace I(Mj > x) by $I(M_j^{(i)} > x)$ in the RRimpS estimator. Since Mj(Ci) and Mj − Mj(Ci) are dependent, while Mi(Ci) and Mj − Mj(Ci) are independent, the distribution of the replaced cost $M_j^{(i)} = M_i(C_i) + M_j - M_j(C_i)$ is different from the distribution of the true cost Mj = Mj(Ci) + {Mj − Mj(Ci)}. As a result, the RRimpS estimator performs worse when there is a high correlation among costs accumulated in different periods. Nonetheless, the simulation studies show that the bias is quite small, even for the worst-case scenario with a high correlation.
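As an illustration of (13) and (14), the sketch below evaluates the RRimpS estimator on the five-subject cost histories from Section 3.2, taking the weights from the closed form (11). The function names and the (time, cumulative cost) representation are assumptions of this sketch, and distinct follow-up times are assumed.

```python
from fractions import Fraction

def cost_at(path, t):
    """Cumulative cost at time t from a time-sorted list of (time, cumulative cost)."""
    cost = 0
    for time, value in path:
        if time <= t:
            cost = value
    return cost

def km_values(X, is_event):
    """Right-continuous Kaplan-Meier values at each X[i] (distinct times assumed)."""
    n = len(X)
    order = sorted(range(n), key=lambda i: X[i])
    values = [None] * n
    surv, at_risk = Fraction(1), n
    for i in order:
        if is_event[i]:
            surv *= Fraction(at_risk - 1, at_risk)
        values[i] = surv
        at_risk -= 1
    return values

def rrimps_survival(x, X, delta, paths):
    """RRimpS estimator of Pr(cost > x), formulas (13) and (14)."""
    n = len(X)
    S_T = km_values(X, delta)
    K = km_values(X, [1 - d for d in delta])
    M = [cost_at(paths[i], X[i]) for i in range(n)]
    total = Fraction(0)
    for i in range(n):
        if delta[i] == 1:
            total += 1 if M[i] > x else 0
            continue
        for j in range(n):
            if delta[j] == 1 and X[j] > X[i]:
                w = 1 / (n * S_T[i] * K[j])                       # weight (11)
                combined = M[i] + M[j] - cost_at(paths[j], X[i])  # replacement cost
                if combined > x:
                    total += w
    return total / n

X = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
paths = [
    [(1, 10)],
    [(1, 20), (2, 50)],
    [(1, 30), (2, 60), (3, 100)],
    [(1, 10), (2, 20), (3, 40), (4, 60)],
    [(1, 5), (2, 10), (3, 20), (4, 30), (5, 40)],
]

grid = [0, 45, 75, 85, 95, 100]
curve = [rrimps_survival(x, X, delta, paths) for x in grid]
print(curve)
```

At x = 45, which is below the censored cost M2(C2) = 50, the replacement mark for subject 2 is exactly 1, as the argument above predicts, and the values along the grid are nonincreasing.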

5. Simulation Studies

We conduct simulation studies under several different settings to evaluate the survival function estimators for costs. We generate survival times using an exponential distribution T ~ exp(10), and a uniform distribution T ~ Unif(0, 15). The survival time is truncated at L = 10. We generate censoring times using a uniform distribution: C ~ Unif(0, 22), for light censoring (25%-30%), and Unif(0, 15), for heavy censoring (37%-44%). The sample size is set to be 100, and the number of simulations is 1000.
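As a rough sanity check on the stated censoring levels, the expected censoring proportion can be computed in closed form. For C ~ Unif(0, c) independent of T, with c ≥ L, the probability of censoring is Pr{C < min(T, L)} = E{min(T, L)}/c. The sketch below evaluates this under the assumption that exp(10) denotes an exponential distribution with mean 10; the variable names are illustrative.

```python
import math

L = 10.0  # time limit

# E[min(T, L)] for T exponential with mean 10 (assumed reading of exp(10)).
exp_restricted_mean = 10.0 * (1.0 - math.exp(-L / 10.0))

# E[min(T, L)] for T ~ Unif(0, 15): integral of t/15 over [0, L] plus L * Pr(T > L).
unif_restricted_mean = L ** 2 / (2.0 * 15.0) + L * (1.0 - L / 15.0)

def censoring_rate(restricted_mean, c):
    """Pr(C < min(T, L)) for C ~ Unif(0, c) independent of T, with c >= L."""
    return restricted_mean / c

rates = {
    "exponential, light": censoring_rate(exp_restricted_mean, 22.0),
    "exponential, heavy": censoring_rate(exp_restricted_mean, 15.0),
    "uniform, light": censoring_rate(unif_restricted_mean, 22.0),
    "uniform, heavy": censoring_rate(unif_restricted_mean, 15.0),
}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Under these assumptions the expected proportions are about 0.29 and 0.30 for light censoring and about 0.42 and 0.44 for heavy censoring, broadly in line with the ranges reported above.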

We consider U-shaped sample paths for the cost distribution, similar to the simulation settings of Lin et al. (1997), Bang and Tsiatis (2002), and Zhao et al. (2012). We partition the entire time period of 10 years into 10 equal intervals. Each individual's costs consist of initial diagnostic costs incurred at time 0, terminal costs incurred during the last year before the failure time, fixed annual costs, and random annual costs (which vary from year to year). The diagnostic costs, fixed annual costs, random annual costs, and terminal costs are generated using a log normal distribution with parameters (10, 0.245²), (6, 0.245²), (4, 0.245²), and (9, 0.632²), respectively. We estimate the survival function of costs using the SW/RRS estimator, the ZTS estimator from Zhao and Tsiatis (1997), and our RRimpS estimator, under the four different simulation scenarios. We also examine the naive survival estimator of (8) for one of the settings.

Figure 1 shows the true survival function for costs and the average of the survival curves from the 1000 simulations using different estimators, for the setting with heavy censoring and exponential survival time. As expected, the SW/RRS estimator and the ZTS estimator both show negligible bias, since their average curves almost coincide with the true survival curve. However, the naive estimator, obtained by using the replacement costs as the true costs, is severely biased. We observe similar biases for the naive method under other scenarios.

Figure 1.

The mean of estimated survival estimators for costs based on 1000 replications with exponential survival time under heavy censoring: the solid curve is the true survival function; the dashed curve is the SW/RRS estimator; the dot-dashed curve is the ZTS estimator; the dotted curve is the naive estimator.

Figure 2 and Figure 3 display the mean and sample variances of different survival function estimators for costs based on 1000 replications, under four simulation scenarios. The SW/RRS and ZTS estimators are consistent as in Figure 1, since these almost coincide with the true survival curve. Although from a theoretical point of view the new proposed RRimpS estimator is not always consistent, its average survival curves follow the true survival curves very well, for all the settings considered here. This indicates that the bias of the RRimpS survival estimator is relatively small. In the plots of the sample variances, we find that the ZTS estimator is more efficient than the SW/RRS estimator. More importantly, our RRimpS estimator outperforms both SW/RRS and ZTS estimators under all four of these scenarios, with more efficiency gain under heavy censoring. Hence, the RRimpS survival function makes a significant improvement in efficiency. This improvement is achieved without sacrificing the monotonicity property, unlike in the case of the ZTS estimator.

Figure 2.

The mean of estimated survival estimators for costs based on 1000 replications: the solid curve is for true survival function; the dashed curve is for SW/RRS estimator; the dot-dashed curve is for ZTS estimator; the dotted curve is for RRimpS estimator. (a) Scenario with exponential survival time under light censoring. (b) Scenario with exponential survival time under heavy censoring. (c) Scenario with uniform survival time under light censoring. (d) Scenario with uniform survival time under heavy censoring.

Figure 3.

The sample variance of estimated survival estimators for costs based on 1000 replications: the solid curve is for SW/RRS estimator; the dashed curve is for ZTS estimator; the dotted curve is for RRimpS estimator. (a) Scenario with exponential survival time under light censoring. (b) Scenario with exponential survival time under heavy censoring. (c) Scenario with uniform survival time under light censoring. (d) Scenario with uniform survival time under heavy censoring.

Since the RRimpS survival estimator performs worse when there is a high correlation between costs accumulated in different periods, we design an extreme case in order to examine how biased the RRimpS estimator could be. We generate the fixed annual costs using a log normal distribution with parameters (8, 0.245²), while setting the diagnostic costs, random annual costs, and terminal costs to be 0. All other parameters stay the same. Figure 4 displays the mean survival curves and the mean squared errors (MSE = variance + bias²), for the case with exponential survival time and heavy censoring, and for different sample sizes (n = 100, 400). We observe similar trends for other simulation settings. The bias for the RRimpS estimator is noticeable now, albeit very small. The MSE for the RRimpS estimator remains mostly the smallest among the three methods available, even when the sample size is as large as 400. In general, as the sample size gets larger, the variance becomes smaller but the bias stays the same. We expect the gain in terms of MSE for the RRimpS estimator will be most prominent when the sample size is small, or when the censoring rate is high.

Figure 4.

The mean and MSE of estimated survival estimators for costs under the extreme case based on 1000 replications with exponential survival time under heavy censoring. (a) Mean of estimated survival estimators for costs with sample size 100. (b) MSE of estimated survival estimators for costs with sample size 100. (c) Mean of estimated survival estimators for costs with sample size 400. (d) MSE of estimated survival estimators for costs with sample size 400.

6. A Real Data Example: MADIT-II

The Multicenter Automatic Defibrillator Implantation Trial II (MADIT-II) was one of a series of studies designed to examine the potential survival benefit of a prophylactically implanted defibrillator in patients with a prior myocardial infarction and other selection criteria (Moss et al. 2002). Patients were recruited into the study over time and were randomized into either the implantable cardiac defibrillator (ICD) arm or the conventional therapy (CONV) arm, with a ratio of 2:1. After the trial was completed, it was shown that the risk of death in the ICD group was lower (hazard ratio = 0.69, p-value = 0.016).

Given the huge costs associated with the defibrillator and the implantation process, a cost-effectiveness analysis was conducted based on patients from the U.S. centers, with 664 patients in the ICD arm and 431 in the CONV arm (Zwanziger et al. 2006). The follow-up time varied from 11 days to 55 months, and the average was 22 months. As in their original paper, we examine the costs accumulated over 3.5 years. The estimated survival functions for medical costs for the ICD and CONV groups, based on the SW/RRS, ZTS, and RRimpS estimators, are shown in Figure 5. As mentioned earlier, the ZTS estimator is not monotone, while both the SW/RRS and the RRimpS estimators are monotone. Our RRimpS survival estimator for costs is also smoother than the SW/RRS and ZTS estimators. Figure 6 displays the standard errors of the estimators obtained by the bootstrap method. As in the simulation studies, the standard errors of RRimpS are mostly the smallest across cost values, and those of SW/RRS are the largest. Therefore, our proposed RRimpS method might be a good alternative for smooth and efficient estimation of the survival function of costs.
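The bootstrap standard errors can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (the MADIT-II data are not reproduced here), the function names are ours, and the estimator being resampled is a simple inverse-probability-weighted survival estimator of the form n⁻¹ Σ Δi I(Mi > x) / K̂(Ti), which we take to be the SW form. Any other survival estimator for costs could be substituted for `sw_survival`.

```python
import numpy as np

def censoring_km_before(X, delta, x_eval):
    """Kaplan-Meier estimate of the censoring survival function just before x_eval."""
    tj = np.unique(X[delta == 0])                        # distinct censoring times
    Y = np.array([(X > t).sum() for t in tj])            # observed beyond t_j
    nj = np.array([((X == t) & (delta == 0)).sum() for t in tj])
    K = np.cumprod(Y / (Y + nj))                         # K_hat(t_j)
    idx = np.searchsorted(tj, x_eval, side="left")       # number of t_j strictly below x
    return np.concatenate(([1.0], K))[idx]

def sw_survival(X, delta, M, grid):
    """Assumed simple weighted estimator of P(cost > x) on a grid of x values."""
    w = delta / censoring_km_before(X, delta, X)         # weight is zero for censored subjects
    return np.array([(w * (M > x)).mean() for x in grid])

def bootstrap_se(X, delta, M, grid, n_boot=200, seed=0):
    """Standard errors from resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # one bootstrap sample of subjects
        curves[b] = sw_survival(X[idx], delta[idx], M[idx], grid)
    return curves.std(axis=0, ddof=1)

# Simulated stand-in data (hypothetical distributions and cost scale).
rng = np.random.default_rng(1)
n = 200
T = rng.exponential(2.0, n)
C = rng.uniform(0.0, 6.0, n)
X = np.minimum(T, C)
delta = (T <= C).astype(int)
M = 1000.0 * X * rng.lognormal(0.0, 0.3, n)              # observed cost at X
grid = np.linspace(0.0, 5000.0, 11)

se = bootstrap_se(X, delta, M, grid)
print(np.round(se, 4))
```

Resampling whole subjects keeps each subject's follow-up time, censoring indicator, and cost together, so the dependence between survival and cost is preserved in every bootstrap sample.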

Figure 5

Estimated survival function for medical costs for the MADIT-II study: (a) is for the ICD arm, and (b) is for the CONV arm.

Figure 6

Standard errors (SEs) of the survival estimators for costs obtained by 200 bootstrap replications for the MADIT-II study: (a) SEs for the ICD arm, and (b) SEs for the CONV arm.

7. Conclusion

In this article we extend the research of Zhao et al. (2011), who provided a link between a theoretically justified mean cost estimator based on the inverse probability weighting techniques, that is, the BT estimator, and an intuitive replace-from-the-right estimator, the RR estimator. We propose a modified replace-from-the-right algorithm, the RRimp estimator, which utilizes the cost history process and therefore is generally more efficient than the RR estimator. We establish a mathematical equivalency between the RRimp estimator and an improved mean cost estimator, the ZT estimator. In doing so we provide an intuitive explanation for how the ZT estimator works, and thereby engender a better understanding of the theoretically derived mean cost estimators, the BT and ZT estimators. Meanwhile, this article also gives justification for the simple, intuition-based RR and RRimp estimators. Without the theoretical background for a full understanding of the BT and ZT estimators, some practitioners may hesitate to use these. With a facilitated interpretation of the RR and RRimp estimators, and an established equivalency between these estimators and the BT and ZT estimators, we believe the proposed estimators can become more accessible and useful to practitioners.

Deriving an intuitive estimator for the survival function of costs proves to be a tougher problem. We show that a naive method using the replaced cost as the true cost in an empirical survival function gives rise to a biased estimator. Resorting to the original redistribute-to-the-right idea (Efron 1967) derived for explaining the Kaplan–Meier estimator, we construct an RRS survival estimator which can be shown to be equivalent to the SW survival estimator for costs. We also propose an RRimpS survival estimator that has the desirable property of being monotone and that was usually more efficient than the SW/RRS survival estimator in the simulation studies and the real example we conducted. Unfortunately, this estimator is not always consistent. Judging from the simulations we conducted, however, the bias seems to be quite small. The RRimpS estimator may be considered as an alternative survival estimator for costs in a real setting when cost history information is available, especially when the sample size is not very large or the censoring rate is high.

Both the replace-from-the-right and the redistribute-to-the-right algorithms can be viewed as special cases of imputation of missing data. Our work may motivate more research in the area of censored marked variables; quality-adjusted survival time and repeated events are two additional examples. Even though we demonstrated that the proposed RRimpS estimator was more efficient than the SW estimator in realistic settings, we did not provide theoretical justifications. In our future research we will attempt to develop the standard error estimate of the RRimpS estimator and to provide theoretical justification for its greater efficiency. We also aim to find a survival estimator for costs that is monotone, consistent, and efficient, if possible.

Acknowledgments

The authors thank Dr. Heejung Bang for her motivating ideas for this article. We also thank Dr. Arthur Moss and Boston Scientific for use of their data in our example. We thank the reviewers for providing constructive comments. This research was supported by R01 HL096575 from the National Heart, Lung, and Blood Institute.

Appendix

Proof of the Equivalency of the ZT Estimator and the RRimp Method for Estimating the Mean Costs

Suppose we have observed the following survival and cost history data

\[
\left[\{X_i, \Delta_i, M_i, M_i(t_j),\ j = 1, \ldots, J\},\ i = 1, \ldots, n\right],
\]

where $i$ denotes individuals and $t_j$ ($j = 1, \ldots, J$) denotes the ordered distinct censoring times. Let $Y_j$ indicate the number of people who have observation times greater than $t_j$ (i.e., $Y_j = \sum_{i=1}^{n} I(X_i > t_j)$), and let $n_j$ represent the number of people who are censored at time $t_j$. If an event occurs at a censoring time $t_j$, we assume this event happens shortly before $t_j$. Therefore, the set $\{X_i = t_j\}$ consists only of censored data.

First, for a subject $i$ who is censored at $t_j$ (note that we allow multiple subjects to be censored at time $t_j$), define $\delta M_i(t_j)$ as the difference between the observed cost at time $t_j$ for the $i$th subject and the average accumulated cost at $t_j$ for subjects who are still alive at $t_j$:

\[
\delta M_i(t_j) = M_i(t_j) - \overline{M(t_j)} = M_i(t_j) - \frac{\sum_{i: X_i \ge t_j} M_i(t_j)}{Y_j + n_j}.
\]

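Define $M^{*}(t_j)$ as the sum of $\delta M_i(t_j)$ over all subjects who are censored at $t_j$:

\[
M^{*}(t_j) = \sum_{i: X_i = t_j} \delta M_i(t_j)
= \sum_{i: X_i = t_j} M_i(t_j) - n_j \overline{M(t_j)}
= \sum_{i: X_i = t_j} M_i(t_j) - \frac{n_j}{Y_j + n_j} \sum_{i: X_i \ge t_j} M_i(t_j).
\]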

Starting from the longest censoring time $t_J$, there are $Y_J$ subjects who have complete costs and whose survival times are greater than $t_J$. Hence, the RRimp cost for the $k$th subject censored at $t_J$ is

\[
M^{RRimp}_{J,k} = M_k(t_J) + \frac{1}{Y_J} \sum_{i: X_i > t_J} \{M_i - M_i(t_J)\}.
\]

Recall that the replacement cost from the RR method for the $k$th subject censored at time $t_J$ is

\[
M^{RR}_{J} = \frac{1}{Y_J} \sum_{i: X_i > t_J} M_i,
\]

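and thus, the sum of the differences between $M^{RRimp}_{J,k}$ (in the RRimp method) and $M^{RR}_{J}$ (in the RR method) at $t_J$ is

\[
\begin{aligned}
\sum_{k: X_k = t_J} \left(M^{RRimp}_{J,k} - M^{RR}_{J}\right)
&= \sum_{k: X_k = t_J} M_k(t_J) + \frac{n_J}{Y_J} \sum_{i: X_i > t_J} \{M_i - M_i(t_J)\} - \frac{n_J}{Y_J} \sum_{i: X_i > t_J} M_i \\
&= \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_J}{Y_J} \sum_{i: X_i > t_J} M_i(t_J) \\
&= \left(1 + \frac{n_J}{Y_J}\right) \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_J}{Y_J} \sum_{i: X_i \ge t_J} M_i(t_J) \\
&= \left(1 + \frac{n_J}{Y_J}\right) \left\{ \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_J}{Y_J + n_J} \sum_{i: X_i \ge t_J} M_i(t_J) \right\} \\
&= \left(1 + \frac{n_J}{Y_J}\right) M^{*}(t_J).
\end{aligned}
\tag{A.1}
\]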
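As a quick sanity check of (A.1), consider a hypothetical data set with three subjects. All numbers are invented for illustration and do not come from the article. Subject 2 is the only censored subject, so $t_J = 2$, $Y_J = 1$, and $n_J = 1$.

```latex
% Hypothetical data: X = (1, 2, 3), \Delta = (1, 0, 1).
% Assumed costs: M_2(t_J) = 20, M_3(t_J) = 40, M_3 = 60.
\begin{align*}
M^{RRimp}_{J,2} &= M_2(t_J) + \frac{1}{Y_J}\{M_3 - M_3(t_J)\} = 20 + (60 - 40) = 40, \\
M^{RR}_{J} &= \frac{1}{Y_J} M_3 = 60,
  \qquad M^{RRimp}_{J,2} - M^{RR}_{J} = -20, \\
M^{*}(t_J) &= M_2(t_J) - \frac{n_J}{Y_J + n_J}\{M_2(t_J) + M_3(t_J)\}
  = 20 - \tfrac{1}{2}(60) = -10, \\
\left(1 + \frac{n_J}{Y_J}\right) M^{*}(t_J) &= 2 \times (-10) = -20.
\end{align*}
```

Both sides of (A.1) equal $-20$ in this example: the censored subject had accumulated less than the average cost by time $t_J$, so the RRimp replacement is lower than the RR replacement.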

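Now we move to the second longest censoring time $t_{J-1}$, where the number of subjects surviving longer than $t_{J-1}$ is $Y_{J-1}$. The RRimp cost for the $k$th censored subject at $t_{J-1}$ is

\[
\begin{aligned}
M^{RRimp}_{J-1,k}
&= M_k(t_{J-1}) + \frac{1}{Y_{J-1}} \sum_{i: X_i > t_{J-1}} \{M_i - M_i(t_{J-1})\} \\
&= M_k(t_{J-1}) + \frac{1}{Y_{J-1}} \left[ \sum_{i: X_i > t_{J-1}} \Delta_i \{M_i - M_i(t_{J-1})\} + \sum_{i: X_i = t_J} \left\{ M^{RRimp}_{J,i} - M_i(t_{J-1}) \right\} \right] \\
&= M_k(t_{J-1}) + \frac{1}{Y_{J-1}} \left[ \sum_{i: X_i > t_{J-1}} \Delta_i M_i - \sum_{i: X_i > t_{J-1}} \Delta_i M_i(t_{J-1}) - \sum_{i: X_i = t_J} M_i(t_{J-1}) + \sum_{i: X_i = t_J} M_i(t_J) + \frac{n_J}{Y_J} \sum_{i: X_i > t_J} \Delta_i \{M_i - M_i(t_J)\} \right] \\
&= M_k(t_{J-1}) + \frac{1}{Y_{J-1}} \left\{ \sum_{i: X_i > t_J} \Delta_i M_i + \sum_{i: t_{J-1} < X_i \le t_J} \Delta_i M_i - \sum_{i: X_i > t_{J-1}} M_i(t_{J-1}) + \sum_{i: X_i = t_J} M_i(t_J) + \frac{n_J}{Y_J} \sum_{i: X_i > t_J} \Delta_i M_i - \frac{n_J}{Y_J} \sum_{i: X_i > t_J} M_i(t_J) \right\} \\
&= \frac{1}{Y_{J-1}} \left(1 + \frac{n_J}{Y_J}\right) \sum_{i: X_i > t_J} \Delta_i M_i + \frac{1}{Y_{J-1}} \sum_{i: t_{J-1} < X_i \le t_J} \Delta_i M_i \\
&\quad + M_k(t_{J-1}) - \frac{1}{Y_{J-1}} \sum_{i: X_i > t_{J-1}} M_i(t_{J-1}) + \frac{1}{Y_{J-1}} \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_J}{Y_J Y_{J-1}} \sum_{i: X_i > t_J} M_i(t_J),
\end{aligned}
\]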

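where the first two terms satisfy $\frac{1}{Y_{J-1}} \left(1 + \frac{n_J}{Y_J}\right) \sum_{i: X_i > t_J} \Delta_i M_i + \frac{1}{Y_{J-1}} \sum_{i: t_{J-1} < X_i \le t_J} \Delta_i M_i = M^{RR}_{J-1}$ (Zhao et al. 2011). Thus, the sum of the differences between $M^{RRimp}_{J-1,k}$ and $M^{RR}_{J-1}$ at $t_{J-1}$ is

\[
\begin{aligned}
\sum_{k: X_k = t_{J-1}} \left(M^{RRimp}_{J-1,k} - M^{RR}_{J-1}\right)
&= \sum_{i: X_i = t_{J-1}} M_i(t_{J-1}) - \frac{n_{J-1}}{Y_{J-1}} \sum_{i: X_i > t_{J-1}} M_i(t_{J-1}) + \frac{n_{J-1}}{Y_{J-1}} \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_{J-1} n_J}{Y_{J-1} Y_J} \sum_{i: X_i > t_J} M_i(t_J) \\
&= \left(1 + \frac{n_{J-1}}{Y_{J-1}}\right) \sum_{i: X_i = t_{J-1}} M_i(t_{J-1}) - \frac{n_{J-1}}{Y_{J-1}} \sum_{i: X_i \ge t_{J-1}} M_i(t_{J-1}) \\
&\quad + \frac{n_{J-1}}{Y_{J-1}} \left(1 + \frac{n_J}{Y_J}\right) \sum_{i: X_i = t_J} M_i(t_J) - \frac{n_{J-1} n_J}{Y_{J-1} Y_J} \sum_{i: X_i \ge t_J} M_i(t_J) \\
&= \left(1 + \frac{n_{J-1}}{Y_{J-1}}\right) M^{*}(t_{J-1}) + \frac{n_{J-1}}{Y_{J-1}} \left(1 + \frac{n_J}{Y_J}\right) M^{*}(t_J).
\end{aligned}
\tag{A.2}
\]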

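Similarly, we have

\[
\begin{aligned}
\sum_{k: X_k = t_{J-2}} \left(M^{RRimp}_{J-2,k} - M^{RR}_{J-2}\right)
&= \left(1 + \frac{n_{J-2}}{Y_{J-2}}\right) M^{*}(t_{J-2}) + \frac{n_{J-2}}{Y_{J-2}} \left(1 + \frac{n_{J-1}}{Y_{J-1}}\right) M^{*}(t_{J-1}) \\
&\quad + \frac{n_{J-2}}{Y_{J-2}} \left(1 + \frac{n_{J-1}}{Y_{J-1}}\right) \left(1 + \frac{n_J}{Y_J}\right) M^{*}(t_J).
\end{aligned}
\tag{A.3}
\]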

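In (A.1), the coefficient of $M^{*}(t_J)$ is $\left(1 + \frac{n_J}{Y_J}\right)$. In (A.2), its coefficient is $\frac{n_{J-1}}{Y_{J-1}} \left(1 + \frac{n_J}{Y_J}\right)$. In (A.3), its coefficient is $\frac{n_{J-2}}{Y_{J-2}} \left(1 + \frac{n_{J-1}}{Y_{J-1}}\right) \left(1 + \frac{n_J}{Y_J}\right)$. If we generalize this pattern and sum the equations from $J$ down to 1, we find that the total contribution of $M^{*}(t_J)$ is

\[
\left(1 + \frac{n_J}{Y_J}\right) + \left(1 + \frac{n_J}{Y_J}\right) \frac{n_{J-1}}{Y_{J-1}} + \cdots + \left(1 + \frac{n_J}{Y_J}\right) \cdots \left(1 + \frac{n_2}{Y_2}\right) \frac{n_1}{Y_1} = \prod_{j=1}^{J} \left(1 + \frac{n_j}{Y_j}\right).
\]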

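Similarly, the contribution of $M^{*}(t_j)$ is

\[
\left(1 + \frac{n_j}{Y_j}\right) + \left(1 + \frac{n_j}{Y_j}\right) \frac{n_{j-1}}{Y_{j-1}} + \cdots + \left(1 + \frac{n_j}{Y_j}\right) \cdots \left(1 + \frac{n_2}{Y_2}\right) \frac{n_1}{Y_1} = \prod_{l=1}^{j} \left(1 + \frac{n_l}{Y_l}\right).
\]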
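The collapse of this sum into a product follows from a short induction, sketched here in our own shorthand: $a_l = n_l / Y_l$, and $P_m$ denotes the sum of the first $m$ terms on the left-hand side (the contributions from the equations for $t_j, t_{j-1}, \ldots, t_{j-m+1}$). Neither symbol appears in the original derivation.

```latex
% Induction hypothesis: P_m = \prod_{l=j-m+1}^{j} (1 + a_l), with a_l = n_l / Y_l.
\begin{align*}
P_1 &= 1 + a_j, \\
P_{m+1} &= P_m + a_{j-m} \prod_{l=j-m+1}^{j} (1 + a_l)
         = (1 + a_{j-m}) \prod_{l=j-m+1}^{j} (1 + a_l)
         = \prod_{l=j-m}^{j} (1 + a_l), \\
P_j &= \prod_{l=1}^{j} \left(1 + \frac{n_l}{Y_l}\right).
\end{align*}
```

Each additional term multiplies the running product by one more factor $(1 + a_l)$, which is why the sum telescopes into the product on the right-hand side.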

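Hence,

\[
\begin{aligned}
\hat{\mu}_{RRimp}
&= \frac{1}{n} \left\{ \sum_{i=1}^{n} \Delta_i M_i + \sum_{k: X_k = t_J} M^{RRimp}_{J,k} + \sum_{k: X_k = t_{J-1}} M^{RRimp}_{J-1,k} + \cdots + \sum_{k: X_k = t_1} M^{RRimp}_{1,k} \right\} \\
&= \frac{1}{n} \left\{ \sum_{i=1}^{n} \Delta_i M_i + \sum_{k: X_k = t_J} M^{RR}_{J} + \sum_{k: X_k = t_{J-1}} M^{RR}_{J-1} + \cdots + \sum_{k: X_k = t_1} M^{RR}_{1} \right\} \\
&\quad + \frac{1}{n} \left\{ \prod_{j=1}^{J} \left(1 + \frac{n_j}{Y_j}\right) M^{*}(t_J) + \prod_{j=1}^{J-1} \left(1 + \frac{n_j}{Y_j}\right) M^{*}(t_{J-1}) + \cdots + \left(1 + \frac{n_1}{Y_1}\right) M^{*}(t_1) \right\} \\
&= \hat{\mu}_{RR} + \frac{1}{n} \left\{ \prod_{j=1}^{J} \left(1 + \frac{n_j}{Y_j}\right) M^{*}(t_J) + \prod_{j=1}^{J-1} \left(1 + \frac{n_j}{Y_j}\right) M^{*}(t_{J-1}) + \cdots + \left(1 + \frac{n_1}{Y_1}\right) M^{*}(t_1) \right\},
\end{aligned}
\]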

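where $\hat{\mu}_{RR} = \hat{\mu}_{BT}$ is already known, and $M^{*}(t_j) = \sum_{i: X_i = t_j} \left[ M_i(t_j) - \overline{M(t_j)} \right]$ according to its definition. It can also be shown that the Kaplan–Meier estimator for $K(t_j)$ is

\[
\hat{K}(t_j) = \prod_{l=1}^{j} \frac{Y_l}{Y_l + n_l},
\]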

which means

\[
\frac{1}{\hat{K}(t_j)} = \frac{1}{\prod_{l=1}^{j} \frac{Y_l}{Y_l + n_l}} = \prod_{l=1}^{j} \left(1 + \frac{n_l}{Y_l}\right).
\]
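The reciprocal identity can be checked numerically on a small example. The follow-up times and censoring indicators below are invented for illustration, and the variable names are ours.

```python
import numpy as np

# Hypothetical data: follow-up times X and indicators delta (0 = censored).
X = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0])
delta = np.array([1, 0, 0, 1, 0, 1, 1])

tj = np.unique(X[delta == 0])                                  # distinct censoring times
Y = np.array([(X > t).sum() for t in tj])                      # Y_j: observed beyond t_j
nj = np.array([((X == t) & (delta == 0)).sum() for t in tj])   # n_j: censored at t_j

K_hat = np.cumprod(Y / (Y + nj))       # Kaplan-Meier estimate of K(t_j)
inv_prod = np.cumprod(1.0 + nj / Y)    # running product of (1 + n_l / Y_l)

print(tj, Y, nj)
print(1.0 / K_hat, inv_prod)
```

Here the censoring times are 2 and 4, with $Y = (4, 2)$ and $n = (2, 1)$, so both printed arrays equal $(1.5, 2.25)$.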

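Thus,

\[
\begin{aligned}
\hat{\mu}_{RRimp}
&= \hat{\mu}_{BT} + \frac{1}{n} \left[ \frac{\sum_{i: X_i = t_J} \left\{ M_i(t_J) - \overline{M(t_J)} \right\}}{\hat{K}(t_J)} + \frac{\sum_{i: X_i = t_{J-1}} \left\{ M_i(t_{J-1}) - \overline{M(t_{J-1})} \right\}}{\hat{K}(t_{J-1})} \right. \\
&\qquad \left. + \frac{\sum_{i: X_i = t_{J-2}} \left\{ M_i(t_{J-2}) - \overline{M(t_{J-2})} \right\}}{\hat{K}(t_{J-2})} + \cdots + \frac{\sum_{i: X_i = t_1} \left\{ M_i(t_1) - \overline{M(t_1)} \right\}}{\hat{K}(t_1)} \right] \\
&= \hat{\mu}_{BT} + \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - \Delta_i) \left\{ M_i - \overline{M(C_i)} \right\}}{\hat{K}(C_i)} \\
&= \hat{\mu}_{ZT}.
\end{aligned}
\]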

We have proved that the RRimp estimator is the same as the ZT estimator for estimating the mean cost.
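The equivalence can also be checked numerically. The sketch below implements the RR and RRimp replacement rules and the BT and ZT formulas as they are written in this appendix, and compares them on simulated data. The data-generating model (exponential survival, uniform censoring, an initial cost plus a linear cost rate) and all function names are our own assumptions for illustration. The longest follow-up is treated as complete so that $Y_J > 0$.

```python
import numpy as np

def make_data(n=50, seed=0):
    """Simulate hypothetical follow-up times, indicators, and cost parameters."""
    rng = np.random.default_rng(seed)
    T = rng.exponential(2.0, n)
    C = rng.uniform(0.0, 5.0, n)
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    delta[np.argmax(X)] = 1                    # longest follow-up treated as complete
    init = rng.lognormal(7.0, 0.5, n)          # assumed initial cost
    rate = rng.lognormal(8.0, 0.5, n)          # assumed cost per unit time
    return X, delta, init, rate

def cost_history(X, init, rate, t):
    """M_i(t): cost accumulated by time t, frozen after follow-up ends at X_i."""
    return init + rate * np.minimum(X, t)

def rr_and_rrimp(X, delta, init, rate):
    total = cost_history(X, init, rate, np.inf)        # observed costs M_i
    rr, rrimp = total.copy(), total.copy()
    for t in np.unique(X[delta == 0])[::-1]:           # longest censoring time first
        later = X > t
        cens = (X == t) & (delta == 0)
        Mt = cost_history(X, init, rate, t)
        rr[cens] = rr[later].mean()                                # RR replacement
        rrimp[cens] = Mt[cens] + (rrimp[later] - Mt[later]).mean() # RRimp replacement
    return rr.mean(), rrimp.mean()

def bt_and_zt(X, delta, init, rate):
    n = len(X)
    total = cost_history(X, init, rate, np.inf)
    tj = np.unique(X[delta == 0])
    Y = np.array([(X > t).sum() for t in tj])
    nj = np.array([((X == t) & (delta == 0)).sum() for t in tj])
    K = np.cumprod(Y / (Y + nj))                       # K_hat(t_j)
    # K_hat just before each X_i (events at a censoring time occur shortly before it)
    K_before = np.concatenate(([1.0], K))[np.searchsorted(tj, X, side="left")]
    bt = np.sum(delta * total / K_before) / n
    aug = 0.0
    for j, t in enumerate(tj):
        cens = (X == t) & (delta == 0)
        risk = (X > t) | cens                          # the Y_j + n_j subjects at t_j
        Mt = cost_history(X, init, rate, t)
        aug += np.sum(Mt[cens] - Mt[risk].mean()) / K[j]
    return bt, bt + aug / n

X, delta, init, rate = make_data()
rr_mean, rrimp_mean = rr_and_rrimp(X, delta, init, rate)
bt_mean, zt_mean = bt_and_zt(X, delta, init, rate)
print(rr_mean, bt_mean)
print(rrimp_mean, zt_mean)
```

Up to floating-point rounding, the first printed pair should agree (RR and BT) and so should the second (RRimp and ZT), for any seed and sample size for which the largest observation is uncensored.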

References

  1. Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87:329–343.
  2. Bang H, Tsiatis AA. Median regression with censored cost data. Biometrics. 2002;58:643–649. doi: 10.1111/j.0006-341x.2002.00643.x.
  3. Cox D. Regression models and life tables. J. R. Stat. Soc., Series B. 1972;34:187–220.
  4. Efron B. The two sample problem with censored data. Proceedings of the 5th Berkeley Symposium; Berkeley. University of California Press; 1967. pp. 831–853.
  5. Horvitz D, Thompson D. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952;47:663–685.
  6. Huang Y. Calibration regression of censored lifetime medical cost. J. Am. Stat. Assoc. 2002;97:318–327.
  7. Huang Y, Louis R. Nonparametric estimation of the joint distribution of survival time and mark variables. Biometrika. 1998;85:785–798.
  8. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958;53:457–481.
  9. Lin D. Regression analysis of incomplete medical cost data. Stat. Med. 2003;22:1181–1200. doi: 10.1002/sim.1377.
  10. Lin D, Feuer E, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics. 1997;53:419–434.
  11. Moss A, Zareba W, Hall W, Klein H, Wilber D, Cannom V, Daubert J, Higgins S, Brown M, Andrews M. Prophylactic implantation of a defibrillator in patients with myocardial infarction and reduced ejection fraction. N. Eng. J. Med. 2002;346:877–883. doi: 10.1056/NEJMoa013474.
  12. Pfeifer PE, Bang H. Non-parametric estimation of mean customer lifetime value. J. Interactive Marketing. 2005;19:48–66.
  13. Satten GA, Datta S. The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. Am. Stat. 2001;55:207–210. doi: 10.1198/000313001317098185.
  14. Zhao H, Bang H, Wang H, Pfeifer PE. On the equivalence of some medical cost estimators with censored data. Stat. Med. 2007;26:4520–4530. doi: 10.1002/sim.2882.
  15. Zhao H, Cheng Y, Bang H. Some insight on censored cost estimators. Stat. Med. 2011;30:2381–2389. doi: 10.1002/sim.4295.
  16. Zhao H, Tian L. On estimating medical cost and incremental cost-effectiveness ratios with censored data. Biometrics. 2001;57:1002–1008. doi: 10.1111/j.0006-341x.2001.01002.x.
  17. Zhao H, Tsiatis AA. A consistent estimator for the distribution of quality adjusted survival time. Biometrika. 1997;84:339–348.
  18. Zhao H, Zuo C, Chen S, Bang H. Nonparametric inference for median costs with censored data. Biometrics. 2012;68:717–725. doi: 10.1111/j.1541-0420.2012.01755.x.
  19. Zwanziger J, Hall WJ, Dick AW, Zhao H, Mushlin AI, Hahn R, Wang H, Andrews M, Mooney C, Wang C, Moss A. The cost-effectiveness of implantable cardiac defibrillators: Results from MADIT II. J. Am. Coll. Cardiol. 2006;47:2310–2318. doi: 10.1016/j.jacc.2006.03.032.
