Abstract
The censored linear regression model, also referred to as the accelerated failure time (AFT) model when the logarithm of the survival time is used as the response variable, is widely seen as an alternative to the popular Cox model when the assumption of proportional hazards is questionable. Buckley and James [Linear regression with censored data, Biometrika 66 (1979) 429−436] extended the least squares estimator to the semiparametric censored linear regression model in which the error distribution is completely unspecified. The Buckley–James estimator performs well in many simulation studies and examples. As Cox has pointed out, the direct physical interpretation of the AFT model also makes it more attractive than the Cox model in practical situations. However, the application of Buckley–James estimation has been limited in practice, mainly because its variance is difficult to estimate. In this paper, we use the empirical likelihood method to derive a new test and confidence interval based on the Buckley–James estimator of the regression coefficient. A standard chi-square distribution is used to calculate the P-value and the confidence interval. The proposed empirical likelihood method does not involve variance estimation. It also shows much better small sample performance than some existing methods in our simulation studies.
Keywords: Censored data, Wilks theorem, Accelerated failure time model, Linear regression model
1. Introduction
The Cox proportional hazards regression model [2] is very popular and has been routinely used in modeling covariate effects with right censored survival data. However, there are many cases where the proportional hazards model does not apply. As Cox pointed out in an interview [22], “Of course, another issue is the physical or substantive basis for the proportional hazards model. I think that is one of its weaknesses, that accelerated life models are in many ways more appealing because of their quite direct physical interpretation, particularly in an engineering context”. See also [27] and the references therein.
One of the most promising estimators in the semiparametric accelerated failure time (AFT) model is the Buckley–James estimator [1]. Buckley and James extended the least squares estimation method to linear regression with right censored data. The Buckley–James estimator is calculated using an iterative algorithm. Cheap, fast computers and ever-improving software have made the calculation of the Buckley–James estimator routine in recent years. It has also been observed to perform well in many simulation studies and case studies [14,5,6,24]. However, variance estimation for the Buckley–James estimator remains very difficult. The program bj() within the Design library of Harrell (http://biostat.mc.vanderbilt.edu/s/Design), available for both S-plus and R [4], uses a variance estimation formula given in Buckley and James's original paper, which does not have a rigorous justification and indeed may not be consistent, as pointed out by Lai and Ying [9]. On the other hand, the variance given by Lai and Ying [9] involves the density and the derivative of the density of the unknown distribution. Estimation of such functions can be highly unstable.
Several approaches have been proposed to tackle this problem [13,8,10,28,21,12,7]. Most recently, Jin et al. [7] used a novel resampling method for estimating the variance of the Buckley–James estimator. Qin and Jing [21] and Li and Wang [12] attempted to use the empirical likelihood (EL) method [18] for censored regression analysis. See Owen [18] for some discussion of bootstrap versus EL method. The EL method was first proposed by Thomas and Grunkemeier [25] to obtain better confidence intervals for a survival probability in connection with the Kaplan–Meier estimator. Owen [16,17] and many others developed this into a general methodology. It has many desirable statistical properties [18]. One of the nice features of the EL method particularly appreciated in censored data analysis is that one can construct confidence intervals without estimating the variance of the statistic. It also has better performance than the traditional normal approximation (Wald) method. However, the applications of Qin and Jing [21] and Li and Wang's [12] methods are hampered by the fact that the limiting distribution of their EL ratio is not a standard chi-square but a linear combination of chi-squares with coefficients depending on the unknown underlying distributions. Moreover, their methods are based on a synthetic data method [8] which does not perform well for small samples with heavy censoring [12].
In this paper, we propose a new EL test procedure for the Buckley–James estimator. In contrast to the synthetic data approach of [21,12], we use the true censored EL and show that the corresponding EL ratio has a standard chi-square limiting distribution. Thus the likelihood ratio test and confidence intervals can be obtained without estimating other quantities. The proposed method has demonstrated much more accurate coverage probability than that of [21,12] in our simulation study.
The rest of the paper is organized as follows. In Section 2, we derive an EL associated with the Buckley–James estimator for a semiparametric linear regression model with censored data. The EL ratio is shown to have a standard chi-square limiting distribution. We also discuss how to extend our method to M-estimators for censored linear regression. Section 3 presents simulation results to evaluate the finite sample performance of the proposed method compared to the synthetic-data-based methods of Qin and Jing [21] and Li and Wang [12]. We finally illustrate our method on the Stanford Heart Transplantation data.
2. The regression model and the empirical likelihood
2.1. The model and notations
Consider the linear regression model

yi = βt xi + εi,   i = 1, . . . , n,

where for subject i, yi is the logarithm of the survival time and xi is the associated vector of q covariates. We assume that the εi's are independent and identically distributed with an unspecified distribution that has zero mean and finite variance.
Suppose that one observes the following right censored observations:

(ti, δi, xi),   ti = min(yi, ci),   δi = I[yi ≤ ci],   i = 1, . . . , n,

where for subject i, ci is the censoring time, assumed independent of yi given xi.
For any candidate estimator b of β, we define the residuals ei(b) = ti − bt xi. Let us order the ei(b)'s:

e(1)(b) ≤ e(2)(b) ≤ · · · ≤ e(n)(b),
and order δi and xi along with ei(b). Notice this ordering is dependent on b. For simplicity, we assume for the rest of the paper that the ei(b)'s are already ordered and thus e(i)(b) = ei(b). We will also omit the symbol b as appropriate.
2.2. The empirical likelihood and estimation equation
Let F̂KM(t) denote the Kaplan–Meier estimator based on the right censored sample (ei(b), δi), i = 1, . . . , n. We form an n × n weight matrix M whose (i, j)th element M[i, j] is defined as follows: if δi = 0 then

M[i, j] = ΔF̂KM(ej) / [1 − F̂KM(ei)]  for j such that ej > ei and δj = 1,  and M[i, j] = 0 otherwise;

if δi = 1 then M[i, i] = 1 and M[i, j] = 0 for j ≠ i. It is easy to see that M is an upper triangular matrix satisfying Σj M[i, j] = 1 for all i.

Let wj = (1/n) Σi M[i, j]. Then wj, j = 1, 2, . . . , n, is a probability distribution with support on the uncensored ei's. Because the Kaplan–Meier estimator is a self-consistent estimator, we have wj = ΔF̂KM(ej), j = 1, . . . , n.
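To make the construction concrete, the following R sketch (ours, not code from the paper) builds the weight matrix M from the Kaplan–Meier estimator of the residuals; the column sums divided by n then reproduce the Kaplan–Meier jumps wj. The function name bj_weight_matrix is hypothetical, and the sketch assumes the residuals are already sorted, have no ties, and that the largest residual is uncensored.

```r
library(survival)

## Build the n x n redistribution matrix M of Section 2.2 from sorted residuals e
## and censoring indicators d (1 = uncensored).  Sketch only: assumes no ties and
## that the largest residual is uncensored, so each row of M sums to one.
bj_weight_matrix <- function(e, d) {
  n    <- length(e)
  km   <- survfit(Surv(e, d) ~ 1)                        # KM of the residual distribution
  Fhat <- 1 - summary(km, times = e, extend = TRUE)$surv # F-hat_KM(e_i)
  jump <- c(Fhat[1], diff(Fhat))                         # Delta F-hat_KM(e_i); 0 at censored e_i
  M <- matrix(0, n, n)
  for (i in 1:n) {
    if (d[i] == 1) {
      M[i, i] <- 1                                       # uncensored: keeps its own mass
    } else {
      right <- which(e > e[i] & d == 1)                  # uncensored residuals to the right
      M[i, right] <- jump[right] / (1 - Fhat[i])         # conditional KM mass given e > e_i
    }
  }
  M
}

## check: w_j = (1/n) * colSums(M) equals the KM jump at each uncensored residual
# w <- colSums(bj_weight_matrix(e, d)) / length(e)
```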
The Buckley–James estimating equation (cf. [23]) can be written as

0 = Σi xi { δi ei(b) + (1 − δi) Σj M[i, j] ej(b) },   (1)

where all n terms in the above summation are nonzero. The Buckley–James estimator, denoted by β̂, is the solution of the above equation.
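For readers who want to see the estimator itself, here is a rough sketch of the Buckley–James iteration solving an equation of the form (1): impute each censored response by its conditional mean under the current Kaplan–Meier estimate of the residual distribution, refit least squares, and repeat. It reuses the bj_weight_matrix sketch above; the function name buckley_james and the convergence settings are ours, the intercept is absorbed into the residual distribution (it can be recovered afterwards from the imputed residuals), and, as is well known, the iteration is not guaranteed to converge and may oscillate.

```r
## Sketch of the Buckley-James iteration.
## tobs: observed (possibly censored) log-times, d: censoring indicators, x: covariate matrix.
buckley_james <- function(x, tobs, d, maxit = 50, tol = 1e-6) {
  x <- as.matrix(x)
  b <- coef(lm(tobs ~ x))[-1]                      # naive least squares slope as starting value
  for (k in 1:maxit) {
    e  <- as.vector(tobs - x %*% b)                # current residuals
    o  <- order(e)
    eo <- e[o]; do <- d[o]; xo <- x[o, , drop = FALSE]; to <- tobs[o]
    M  <- bj_weight_matrix(eo, do)                 # weight matrix from the sketch above
    ## imputed response: observed value if uncensored, conditional mean if censored
    ystar <- do * to + (1 - do) * as.vector(xo %*% b + M %*% eo)
    bnew  <- coef(lm(ystar ~ xo))[-1]
    if (max(abs(bnew - b)) < tol) break
    b <- bnew
  }
  b
}
```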
We rewrite the Buckley–James estimating equation (1) by grouping terms according to the ei's:

0 = Σi δi ei(b) Σk xk M[k, i],   (2)

0 = Σi δi ei(b) (nwi) { Σk xk M[k, i] / (nwi) }.   (3)
Notice that the number of nonzero terms in the above summation (3) is now the same as the number of nonzero δ's. This suggests that the equation to be used with the censored data EL (defined in (5) below) should be

0 = Σi pi δi x̄i ei(b),  with x̄i = Σk xk M[k, i]/(nwi),   (4)

where pi = ΔF(ei), i = 1, . . . , n, for any distribution F that has support on the uncensored ei's.
The censored EL based on (ei(b), δi), i = 1, . . . , n, is defined by

EL(F) = Πi [ΔF(ei)]^δi [1 − F(ei)]^(1−δi),   (5)

where ΔF(ei) = F(ei) − F(ei−) = pi.
We are to find a distribution F or pi's that (a) has support only on the uncensored ei's; (b) satisfies the estimating equation (4); and (c) maximizes the censored EL (5).
Remark 1
The denominator nwi in (3) may seem a bit arbitrary, but it makes the vector m[k, i]/(nwi), k = 1, . . . , n, a (conditional) probability measure for each i. Therefore, (3) can be interpreted as a double expectation. In fact, (1) and (3) are just two versions of the same double expectation.
Remark 2
When b = β, the true parameter value, then yi – βt xi are independent and identically distributed. Our censored EL is then the same as the censored EL based on independent identically distributed right censored observations used by [25,11,15,20], among others.
Remark 3
When maximizing the censored EL with respect to the pi's under the constraint (4), the weight matrix M and wi remain unchanged for a fixed b.
Remark 4
Clearly, if b = β̂ (the Buckley–James estimator) then pi = ΔF̂KM(ei) satisfies the estimating equation (4) and maximizes the EL (5) among all CDFs. Therefore, the confidence regions based on our EL ratio will be "centered" at β̂.
Remark 5
The numerical problem of maximizing the censored EL with respect to the pi's under the constraint (4) is the same as the one faced by a censored EL with a mean-type constraint Σi pi f(ei) = μ; here f(ei) = δi x̄i ei(b) and μ = 0. This computational problem does not have an explicit solution, but it can be handled by the modified EM algorithm proposed by Zhou [29].
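The same modified EM algorithm is implemented in the emplik package. As a minimal illustration of the analogous one-sample problem (a censored EL maximized under a mean-type constraint Σi pi f(ei) = μ), the sketch below assumes the el.cen.EM() function keeps its documented interface (x, d, fun, mu) and returns the −2 log EL ratio in the component "-2LLR"; it is not the regression computation itself.

```r
library(emplik)

set.seed(3)
e    <- rexp(30)                        # "residuals" from an Exp(1) distribution
cen  <- rexp(30, rate = 0.7)            # censoring variables
tobs <- pmin(e, cen)
d    <- as.numeric(e <= cen)            # 1 = uncensored

## -2 log EL ratio for the constraint  integral of t dF(t) = 1  (the true mean)
out <- el.cen.EM(x = tobs, d = d, fun = function(t) t, mu = 1)
out$"-2LLR"                             # compare with qchisq(0.95, df = 1)
```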
2.3. The asymptotics
It is worth noting that our constraint equation (4) is different from the mean constraint used in previous work on EL in that the function f(·) in our problem depends on the data, and thus should be denoted by fn(·):

fn(ei) = δi x̄i ei(b),  with x̄i = Σk xk M[k, i]/(nwi),

so that constraint (4) reads Σi pi fn(ei) = 0.
Consequently, we need a more general EL Wilks theorem for right censored data which is stated in the following theorem.
Theorem 1. Suppose Xi, i = 1, 2, . . . , n, are independent random variables with common distribution P(Xi ≤ t) = F0(t). By right censored data we mean the pairs (Ti, δi), where Ti = min(Xi, Ci) and δi = I[Xi ≤ Ci], with Ci independent of Xi. Define the censored EL

EL(F) = Πi [ΔF(Ti)]^δi [1 − F(Ti)]^(1−δi).
In addition, suppose that for every n, fn(t) is a predictable random function with respect to the standard counting process filtration (see for example [3]) and satisfies the regularity conditions (for gn) in Lemma 1. Then we have
Then we have

−2 log { max EL(F) / EL(F̂KM) } → χ²

in distribution as n → ∞, with degrees of freedom equal to the dimension of fn, where F̂KM is the Kaplan–Meier estimator based on the (Ti, δi) and the numerator EL is maximized under the constraint

∫ fn(t) dF(t) = 0.
With the help of Lemma 1, this theorem can be proved similarly to [19,20]. We defer the proof of the theorem to the Appendix.
Lemma 1. Consider censored data as described in Theorem 1. Let gn(t) be predictable functions such that gn(t) → g(t) in probability as n → ∞, with g(t) satisfying σ²(g) < ∞ (σ² defined below). Then we have

√n ∫ gn(t) d[F̂KM(t) − F0(t)] → N(0, σ²(g))
in distribution as n → ∞. The asymptotic variance is given by
| (6) |
where G0(x) = lim 1/n . Furthermore, denote by the associated with the Kaplan–Meier estimator F̂KM(t) based on the (Ti, δi) i = 1, . . . , n, the asymptotic variance, , can be consistently estimated by
Proof. By the assumption we have
Integration by parts will give
Using the same fact again, we have

The integrand inside {} above is clearly a predictable function and thus the integral is also a martingale. By the CLT for martingales, it converges to a normal distribution with zero mean and a variance that can be consistently estimated by

Replacing F0(·) by its consistent estimate F̂KM(·) gives the desired result. □
Lemma 2. The weight function m[j, i]/(nwi) is predictable.

Proof. Notice that whether or not the Kaplan–Meier estimator jumps at t is not predictable, but we are only concerned here with the size of the jump, if there is one. The size of the next jump of the Kaplan–Meier estimator can always be computed from the history and thus is predictable. More specifically, the next jump size of the Kaplan–Meier estimator at time t, if there is one, is equal to 1/n × 1/(1 − Ĝ(t−)), where Ĝ is the Kaplan–Meier estimator of the censoring distribution.

Similarly, we can infer from the history which part of the jump, if there is one, came from tj, tj < t. This proportion is precisely m[j, i]/(nwi). □
Armed with Theorem 1 and Lemma 2, we are ready to prove the following Wilks theorem for the Buckley–James estimator.
Theorem 2. When β = β0, the residuals are independent and identically distributed before censoring, and the estimating equation (4) can be written as

0 = Σi pi δi x̄i ei(β0),

where x̄i denotes the average of the xj's with respect to the (conditional) probability weights m[j, i]/(nwi), j = 1, 2, . . . , n. As n → ∞, we have

−2 log { max EL(F) / EL(F̂KM) } → χ² with q degrees of freedom

in distribution, where both ELs are as defined in (5) and the numerator EL is maximized under constraint (4).
Proof. Since x is independent of ε, the function fn in (4) satisfies the conditions of Theorem 1, with Lemma 2 supplying the required predictability. The desired result then follows from Lemma 2 and Theorem 1. □
2.4. Empirical likelihood for M-estimators
Our EL method for the Buckley–James estimator can be extended to a class of M-estimators along the same lines. For complete data the regression M-estimator is defined as the minimizer of Σi ρ(yi − bt xi), or equivalently as the solution to the equation

0 = Σi xi ψ(yi − bt xi),

where ψ = ρ′. Usually we assume ψ is monotone.

For right censored data, the Buckley–James estimating equation (for the M-estimator) is

0 = Σi xi { δi ψ(ei(b)) + (1 − δi) Σj M[i, j] ψ(ej(b)) }.

Rewriting the estimating equation according to the ei's gives

0 = Σi δi ψ(ei(b)) (nwi) x̄i.

In the EL analysis of the censored Buckley–James regression M-estimator, the definition of the censored EL remains unchanged as in (5). The constraint (or estimating equation) to be used with the EL is

0 = Σi pi δi x̄i ψ(ei(b)).

Results similar to those for the least squares estimator can be obtained for the regression M-estimator; we omit the details here, but a small illustration of the modified constraint is sketched below.
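As a concrete instance, with Huber's ψ the only change to the EL constraint is that ei(b) is replaced by ψ(ei(b)). The short R sketch below uses our own (hypothetical) helper names and merely evaluates the constraint for given weights; it is illustrative, not code from the paper.

```r
## Huber's psi: psi(e) = max(-k, min(e, k))
huber_psi <- function(e, k = 1.345) pmax(-k, pmin(e, k))

## value of the EL constraint  sum_i p_i * delta_i * xbar_i * psi(e_i(b));
## it equals zero at the censored regression M-estimate (xbar_i as in Section 2.2)
m_constraint <- function(p, d, xbar, e, k = 1.345) sum(p * d * xbar * huber_psi(e, k))
```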
3. Simulations
The computations in this section and the next were done in R [4] with the added package emplik, available from any CRAN site (e.g. http://cran.us.r-project.org).
We first considered the following regression model yi = 2xi + εi, where xi is uniform(0.5, 1.5) and εi is uniform(−0.5, 0.5). We further take ci to be 1 + 3.2 exp(1), where exp(1) represents a random variable with standard exponential distribution. The sample size n is 100.
The −2 log EL ratio is computed for each simulation run for the hypothesis H0: β = 2. The resulting Q–Q plot shows a good fit of the distribution of the EL ratio to the chi-square distribution with 1 degree of freedom (Fig. 1).
Fig. 1.
Q–Q plot of −2 log EL ratio, 5000 simulation runs, sample size = 100.
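A simulation of this kind could be scripted as follows. The sketch assumes the bjtest() function of the emplik package (interface y, d, x, beta, returning the −2 log EL ratio in the component "-2LLR"), which implements an EL test of the Buckley–James type; it is our illustration rather than the authors' original script, and it uses 1000 runs instead of 5000 to keep it fast.

```r
library(emplik)

set.seed(1)
nrun <- 1000; n <- 100
elr <- numeric(nrun)
for (r in 1:nrun) {
  x    <- runif(n, 0.5, 1.5)
  y    <- 2 * x + runif(n, -0.5, 0.5)          # true model: y = 2x + eps
  cen  <- 1 + 3.2 * rexp(n)                    # censoring times
  tobs <- pmin(y, cen)
  d    <- as.numeric(y <= cen)                 # 1 = uncensored
  elr[r] <- bjtest(y = tobs, d = d, x = cbind(x), beta = 2)$"-2LLR"
}
qqplot(qchisq(ppoints(nrun), df = 1), elr,
       xlab = "chi-square(1) quantiles", ylab = "-2 log EL ratio")
abline(0, 1)
```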
We also did a simulation to evaluate the coverage accuracy of the proposed EL method compared with the synthetic data EL methods of [21,12]. We used the regression model yi = xi + εi where εi ∼ N(0, 0.25), xi ∼ N(1, 0.5) and Ci ∼ N(μ, 16), with μ = −1.8, 1, 3.1 and 6.1, respectively. This produces samples with censoring percentages equal to 75%, 50%, 30% and 10%, approximately.
Table 1 gives the achieved coverage probabilities of confidence intervals based on our proposed empirical likelihood method for the Buckley–James estimator (ELBJ) and the synthetic data empirical likelihood method (ELSD) [21, 12], respectively. Each entry is based on 5000 simulation runs.
Table 1.
Simulated coverage probabilities of empirical likelihood confidence intervals for β: ELSD refers to the empirical likelihood method of [21,12] based on synthetic data; ELBJ is our proposed empirical likelihood method for the Buckley–James estimator
| Sample size | Censoring rate | ELSD (90% nominal) | ELBJ (90% nominal) | ELSD (95% nominal) | ELBJ (95% nominal) |
|---|---|---|---|---|---|
| 50 | 0.75 | 0.79 | 0.84 | 0.85 | 0.90 |
| 100 | 0.75 | 0.83 | 0.88 | 0.90 | 0.93 |
| 200 | 0.75 | 0.87 | 0.89 | 0.92 | 0.94 |
| 50 | 0.50 | 0.84 | 0.88 | 0.91 | 0.93 |
| 100 | 0.50 | 0.88 | 0.89 | 0.95 | 0.94 |
| 200 | 0.50 | 0.91 | 0.90 | 0.95 | 0.95 |
| 50 | 0.30 | 0.87 | 0.89 | 0.92 | 0.94 |
| 100 | 0.30 | 0.89 | 0.89 | 0.93 | 0.95 |
| 200 | 0.30 | 0.91 | 0.89 | 0.95 | 0.95 |
| 50 | 0.10 | 0.85 | 0.89 | 0.93 | 0.94 |
| 100 | 0.10 | 0.91 | 0.89 | 0.94 | 0.94 |
| 200 | 0.10 | 0.90 | 0.88 | 0.94 | 0.95 |
We see from Table 1 that both methods perform reasonably well for large samples. However, our proposed empirical likelihood method (ELBJ) has noticeably better coverage accuracy than the synthetic data empirical likelihood method (ELSD) for smaller samples (say n = 50), especially in the case of heavy censoring (75%). In addition, the confidence level is much easier to set with the ELBJ method because of its simpler limiting distribution; a confidence interval for β is obtained directly by inverting the EL test, as sketched below.
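A sketch of the inversion: the confidence interval is the set of values b with −2 log ELR(b) ≤ qchisq(level, 1), and by Remark 4 the ratio is essentially zero at the Buckley–James estimate, so the two endpoints can be located with uniroot() on either side of it. The code again relies on emplik::bjtest() with interface (y, d, x, beta); the helper name el_ci and the bracketing width are ours, and width must be large enough for −2 log ELR to exceed the cutoff at both ends.

```r
library(emplik)

el_ci <- function(y, d, x, bhat, level = 0.95, width = 1) {
  cut <- qchisq(level, df = 1)
  f <- function(b) bjtest(y, d, x = cbind(x), beta = b)$"-2LLR" - cut
  lower <- uniroot(f, c(bhat - width, bhat))$root   # -2 log ELR crosses the cutoff
  upper <- uniroot(f, c(bhat, bhat + width))$root   # on each side of the estimate
  c(lower, upper)
}
```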
As discussed in the introduction, the function bj() inside the Design package of R computes a variance estimator proposed by Buckley and James [1]. The next simulation shows that this variance estimator is not consistent, confirming the observation of Lai and Ying [9].
We found that the Wald statistic formed from this variance estimator and the Buckley–James estimator is smaller than the corresponding chi-square values under the null hypothesis. Moreover, this discrepancy does not diminish as the sample size increases. We also plot our EL ratio statistic for comparison.
The regression model used is Y = 0.5 + 1.5X + ε; the censoring variable is distributed as 1 + 2.5 exp(1); the covariate X is distributed as Unif[1, 2]; the error ε is distributed as N(0, sd = 0.5); and the sample size is n = 400. The censoring percentage was about 49%.
We also ran the same simulation with sample size n = 2000; the Q–Q plot is almost identical to the one for n = 400 (Fig. 2).
Fig. 2.
Q–Q plot of −2 log ELR (EL) and Wald statistics (BJvar). 1000 simulation runs.
4. An example
We illustrate the EL analysis of the Buckley–James estimator with the Stanford Heart Transplant data. Following [14], we use only 152 cases. The specific AFT model we used is log10(Ti) = β0 + β1 age + εi.
The Buckley–James estimate of (β0, β1) is marked by an X on the contour plot of the −2 log EL ratio (Fig. 3). From the plot we see that the contours are fairly symmetric and elliptical, indicating that the normal approximation is quite good for the Buckley–James estimator here.
From the plot we also see that the estimate of β1 is strongly negatively correlated with that of β0. The 95% confidence interval for β1 alone is approximately [−0.0357, −0.0028]; the 95% confidence interval for β0 alone is approximately [2.755, 4.255]. These are obtained as the left-most (right-most, upper-most or lower-most) points of the contour at level 3.84. They are approximate because we used a coarse grid of points to produce the contour plot, so interpolation was used in the plot (Fig. 3).
Fig. 3.
Contour plot for the −2 log EL ratio, Stanford Heart Transplant Data, 152 cases.
From the bj() function from the Design library of Harrell, the following results are obtained:
> bj(Surv(log10(time), status) ~ age, data=stanford5, link="identity")

Buckley–James Censored Data Regression

Obs = 152, Events = 97, d.f. = 1, error d.f. = 95, sigma = 0.6796

Intercept: Value = 3.52696, Std. Error = 0.299123, Z = 11.79, Pr(>|Z|) = 4.344e-32

age: Value = −0.01990, Std. Error = 0.006632, Z = −3.00, Pr(>|Z|) = 2.700e-03
Our confidence intervals are slightly wider than the Wald confidence intervals obtained using the standard error estimator given by the function bj(). We remind readers that the standard error estimator produced by bj() has no rigorous theoretical justification.
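A sketch of how the contour in Fig. 3 could be produced: evaluate the −2 log EL ratio of Section 2 on a grid of (β0, β1) values via emplik::bjtest (again assuming the interface y, d, x, beta and the "-2LLR" component), with the 152-case data assumed to be in a data frame stanford5 containing the variables time, status and age used in the bj() call above. The grid limits and resolution are ours; level 3.84 corresponds to the single-parameter intervals quoted in the text, and 5.99 to a joint 95% region.

```r
library(emplik)

y <- log10(stanford5$time)
d <- stanford5$status
X <- cbind(1, stanford5$age)                  # intercept column plus age

b0 <- seq(2.6, 4.4, length.out = 60)          # grid for the intercept
b1 <- seq(-0.04, 0.00, length.out = 60)       # grid for the age coefficient
elr <- matrix(NA, length(b0), length(b1))
for (i in seq_along(b0))
  for (j in seq_along(b1))
    elr[i, j] <- bjtest(y, d, x = X, beta = c(b0[i], b1[j]))$"-2LLR"

contour(b0, b1, elr, levels = c(3.84, 5.99),
        xlab = "beta0 (intercept)", ylab = "beta1 (age)")
```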
5. Concluding remarks
We developed a new empirical likelihood method for the Buckley–James estimator for censored linear regression. The empirical likelihood can be calculated using a modified EM algorithm proposed by Zhou [29]. Our method is different from the previous empirical likelihood methods for censored linear regression in that we use a “true” likelihood for censored data, while previous methods use an “estimated” empirical likelihood based on synthetic data. Consequently, our empirical likelihood has the standard chi-square null limiting distribution while previous methods do not. The empirical likelihood approach is appealing because it does not involve variance estimation which is difficult for the Buckley–James estimator. Our simulation also showed that our empirical likelihood method has much better small sample performance than previous methods. Finally, our results can be extended to a class of regression M-estimators for censored data.
Acknowledgement
Mai Zhou's research was partially supported by NSF grant DMS 0604920.
Appendix
We outline the proof of Theorem 1 here. First of all we define a class of functions
Furthermore, we define a one-parameter family of distribution functions
where
| (7) |
and C(λ) is just a normalizing constant
The parameter λ is well defined in a neighborhood of zero, and for λ = 0 we get back the Kaplan–Meier estimator: Fλ=0 = F̂KM. Within this family of distributions, there is only one that satisfies the constraint equation
| (8) |
We denote the parameter for this unique distribution as λ0.
Finally, we define a class of profile empirical likelihood ratio functions as follows:
Lemma A. Assume all the conditions in Lemma 1. Then, as n → ∞, (1) λ0 = Op(n−1/2), (2) in distribution.
Proof. (outline) Expanding (8), we have
from there we have
By Lemma 1, as n → ∞,
in distribution, and
in probability. By Slutsky's theorem in distribution as n → ∞, where
| (9) |
Theorem A. If the conditions in Lemma A hold, then, as n → ∞
in distribution, where
and
Furthermore, infh rh = 1.
Proof. Define
| (10) |
where and . From the definition we can see that
By Lemma A, λ0 = Op(n−1/2) where λ0 is the root of (8). Hence we can apply Taylor's expansion for f(λ0):
Some tedious but straightforward calculations show that f′(0) = 0 and that the second derivative of f with respect to λ, evaluated at λ = 0, is
It is not hard to show that, as n → ∞, the following three quantities all converge in probability:
Hence we have
| (11) |
in probability. Finally, by similar calculations we can show that the third derivative of f evaluated at ξ is
| (12) |
Now observe
By Lemma A, (11), (12), and Slutsky's theorem, we obtain
in distribution.
We now prove the infimum of the constant rh over h is one. First we notice that
is precisely the information defined by van der Vaart [26], as iα in his (4.1).
The infimum of iα over all one-dimensional sub-models is called the "efficient Fisher information". In this case (right censored observations), its reciprocal is given by the last equation on p. 193 of van der Vaart [26] (as the lower bound for the asymptotic variance when estimating ∫ g dF):

Lastly, we notice that ∫ gn dF̂KM is an efficient estimate, and therefore we can easily check
Therefore, infh rh = 1. □
References
- 1. Buckley JJ, James IR. Linear regression with censored data. Biometrika. 1979;66:429–436.
- 2. Cox DR. Regression models and life tables (with discussion). J. Roy. Statist. Soc. B. 1972;34:187–220.
- 3. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. Wiley; New York: 1991.
- 4. Gentleman R, Ihaka R. R: a language for data analysis and graphics. J. Comput. Graph. Statist. 1996;5:299–314.
- 5. Heller G, Simonoff JS. A comparison of estimators for regression with a censored response variable. Biometrika. 1990;77:515–520.
- 6. Heller G, Simonoff JS. Prediction in censored survival data: a comparison of the proportional hazards and linear regression models. Biometrics. 1992;48:101–115.
- 7. Jin Z, Lin DY, Ying Z. On least-squares regression with censored data. Biometrika. 2006;93:147–161.
- 8. Koul H, Susarla V, van Ryzin J. Regression analysis with randomly right-censored data. Ann. Statist. 1981;9:1276–1288.
- 9. Lai TL, Ying Z. Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. Ann. Statist. 1991;19:1370–1402.
- 10. Lai TL, Ying Z. Linear rank statistics in regression analysis with censored or truncated data. J. Multivariate Anal. 1992;40:13–45.
- 11. Li G. On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statist. Probab. Lett. 1995;25:95–104.
- 12. Li G, Wang Q-H. Empirical likelihood regression analysis for right censored data. Statist. Sinica. 2003;13:51–68.
- 13. Lin JS, Wei LJ. Linear regression analysis based on Buckley–James estimating equation. Biometrics. 1992;48:679–681.
- 14. Miller RG, Halpern J. Regression with censored data. Biometrika. 1982;69:521–531.
- 15. Murphy S, van der Vaart A. Semiparametric likelihood ratio inference. Ann. Statist. 1997;25:1471–1509.
- 16. Owen A. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249.
- 17. Owen A. Empirical likelihood ratio confidence regions. Ann. Statist. 1990;18:90–120.
- 18. Owen A. Empirical Likelihood. Chapman & Hall; London: 2001.
- 19. Pan XR. Empirical likelihood ratio method for censored data. Ph.D. Dissertation, Department of Statistics, University of Kentucky; 1997.
- 20. Pan XR, Zhou M. Using one parameter sub-family of distributions in empirical likelihood with censored data. J. Statist. Plann. Inference. 1999;75:379–392.
- 21. Qin G, Jing BY. Empirical likelihood for censored linear regression. Scand. J. Statist. 2001;28:661–673.
- 22. Reid N. A conversation with Sir David Cox. Statist. Sci. 1994;9:439–455.
- 23. Ritov Y. Estimation in a linear regression with censored data. Ann. Statist. 1990;18:303–328.
- 24. Stare J, Heinzl H, Harrell F. On the use of Buckley and James least squares regression for survival data. In: Ferligoj A, Mrvar A, editors. New Approaches in Applied Statistics. 2000.
- 25. Thomas DR, Grunkemeier GL. Confidence interval estimation of survival probabilities for censored data. J. Amer. Statist. Assoc. 1975;70:865–871.
- 26. van der Vaart A. On differentiable functionals. Ann. Statist. 1991;19:178–204.
- 27. Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statist. Med. 1992;11:1871–1879. doi: 10.1002/sim.4780111409.
- 28. Zhou M. M-estimation in censored linear models. Biometrika. 1992;79:837–841.
- 29. Zhou M. Empirical likelihood ratio with arbitrarily censored/truncated data by a modified EM algorithm. J. Comput. Graph. Statist. 2005;14:643–656.