Abstract
Logistic regression can be used to regress the τ-year survival probability on covariates when no observation is censored before time τ. When some observations are censored before time τ, however, standard logistic regression cannot be applied. Jung (1996) proposed modifying the score function of logistic regression to accommodate right-censored observations. His modified score function, designed for consistent estimation of the regression parameters, reduces to the usual logistic score function if no observation is censored before time τ. In this paper, we propose a modification of Jung’s estimating function that yields an estimator of the regression parameters that is optimal as well as consistent. We prove that the optimal estimator is more efficient than Jung’s estimator. This theoretical comparison is illustrated with an analysis of real data and with simulations.
Keywords: Censoring distribution, Logistic regression, Nonnegative definite, Survival probability
1 Introduction
Cox (1972) regression, based on the proportional hazards model, has been widely used to relate the survival distribution of a patient to a set of covariates. Cox regression thus investigates the impact of covariates on the patient’s survival time over the whole time span. In certain clinical settings, however, there may be a landmark time τ that is relevant for determining the patient’s disease status. Suppose, for example, that a cancer patient who remains free of disease for 5 years after surgery for tumor resection is considered to be completely cured of the cancer. In this case, we may be interested in relating the disease-free probability at time τ = 5 years to a given set of covariates. Regression analysis of the survival probability at a chosen time point is also of interest for time-dependent ROC curve methods; see, e.g., Heagerty and Zheng (2005) and Zheng et al. (2010). If no observation is censored before time τ, then we can apply a logistic regression method. When some observations are censored before time τ, however, logistic regression cannot be applied to the survival data.
The censoring mechanism is usually assumed to be independent of the survival time itself. Such an assumption is reasonable, for example, when the survival time is administratively censored. Under the independence assumption, the probability of being observed to survive beyond time τ is the product of two probabilities: the probability of not being censored by τ and the probability of surviving beyond τ. Using this property, Jung (1996) adjusted the score function of logistic regression for the censoring probability and regressed the survival probability at time τ on covariates.
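The product property can be checked numerically. The following is a minimal sketch, assuming an exponential survival time and a uniform censoring time chosen purely for illustration (neither distribution comes from the paper); it compares the observed proportion of Xi ≥ τ with pr(Ti ≥ τ) × pr(Ci ≥ τ).

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 200_000, 1.0

# Illustrative distributions (assumptions, not from the paper):
t = rng.exponential(scale=2.0, size=n)   # pr(T >= 1) = exp(-1/2)
c = rng.uniform(0.0, 4.0, size=n)        # pr(C >= 1) = 3/4, independent of T

# X = min(T, C) is the observed time; I(X >= tau) is always observable.
observed = np.mean(np.minimum(t, c) >= tau)
product = np.exp(-0.5) * 0.75

print(f"proportion with X >= tau: {observed:.4f}")
print(f"pr(T >= tau) * pr(C >= tau): {product:.4f}")
```

The two numbers agree up to Monte Carlo error, which is why dividing the observable indicator by an estimate of pr(Ci ≥ τ) recovers information about the survival probability.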
Jung’s correction for the censoring probability is aimed at consistent estimation of the regression coefficients. In this paper we propose an inference method that yields regression estimators that are optimal as well as consistent. We prove theoretically that the proposed estimators are more efficient than Jung’s. This comparison is illustrated by an analysis of real data and by simulation studies.
2 An Optimal Regression Method for τ-Year Survival Probability
For patient i (i = 1, …, n), let Ti be the survival time and Zi = (Z1i, …, Zpi)T the covariate vector. Suppose that, for a time point τ, the survival probability πi = pr(Ti ≥ τ) satisfies the relation

$$\phi(\pi_i) = \beta_0^{T} Z_i, \tag{1}$$
where ϕ, called a link function, is a monotonic differentiable function on [0, 1] and β0 is the true value of the regression parameter β = (β1, …, βp)T. Popular link functions include the logit link ϕ(π) = log{π/(1−π)}, the log-log link ϕ(π) = −log{− log(π)}, and the probit link, the inverse of the standard normal cumulative distribution function. Doksum and Gasko (1990) extensively review the correspondence between models in binary regression analysis and in survival analysis.
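As a small illustration of these link functions, the sketch below implements each link ϕ and its inverse π using only the Python standard library. The function and dictionary names are hypothetical, chosen for this example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def logit(p):
    """Logit link: log{p / (1 - p)}."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def loglog(p):
    """Log-log link as written in the text: -log{-log(p)}."""
    return -math.log(-math.log(p))

def inv_loglog(eta):
    return math.exp(-math.exp(-eta))

def probit(p):
    """Probit link: inverse of the standard normal distribution function."""
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

# Each entry pairs a link phi with its inverse pi.
LINKS = {
    "logit": (logit, inv_logit),
    "log-log": (loglog, inv_loglog),
    "probit": (probit, inv_probit),
}

for name, (phi, pi) in LINKS.items():
    eta = phi(0.8)
    print(f"{name:8s} phi(0.8) = {eta:.4f}, pi(phi(0.8)) = {pi(eta):.4f}")
```

Each link maps a probability in (0, 1) to the whole real line, so the linear predictor β0TZi is unconstrained.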
If all survival times are observed, the maximum likelihood estimator of β is obtained by solving the equation

$$\sum_{i=1}^{n} \frac{\pi'(\beta^{T}Z_i)\,Z_i}{\pi(\beta^{T}Z_i)\{1-\pi(\beta^{T}Z_i)\}}\,\{I(T_i \ge \tau) - \pi(\beta^{T}Z_i)\} = 0,$$

where π is the inverse of the link function ϕ and π′(ϕ) = ∂π(ϕ)/∂ϕ. In most studies of lifetimes, however, some survival times are censored. If a survival time is censored before time τ, then it is not known whether the subject will survive beyond τ. In this case, the regular maximum likelihood method cannot be used to estimate the regression parameters β of model (1). Let Xi = min(Ti, Ci), where the censoring time Ci is independent of Ti, and let Δi be the event indicator taking the value 1 if Ti is observed and 0 otherwise. To simplify our discussion, we assume that the Ci are independent and identically distributed with G(t) = pr(Ci ≥ t). Then, noting that E{I(Xi ≥ τ)} = pr(Ti ≥ τ)G(τ), Jung (1996) proposed a modified estimating equation, Ũ(β) = 0, where

$$\tilde{U}(\beta) = \sum_{i=1}^{n} \frac{\pi'(\beta^{T}Z_i)\,Z_i}{\pi(\beta^{T}Z_i)\{1-\pi(\beta^{T}Z_i)\}}\,\{I(X_i \ge \tau) - \hat{\theta}\,\pi(\beta^{T}Z_i)\} \tag{2}$$

and θ̂ is the Kaplan-Meier estimate of θ = G(τ), obtained by exchanging the roles of censoring and event. He proves that the estimator is consistent and asymptotically normal under mild assumptions. Estimating function (2), however, is aimed at consistency of the regression estimator rather than at efficiency. In this article, we propose an optimal estimating function that improves efficiency while maintaining consistency.
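The Kaplan-Meier estimate θ̂ of G(τ) can be sketched as follows. This is a minimal illustration rather than the authors’ implementation; the function name is hypothetical, and ties between event and censoring times are not given any special treatment, which is an assumption that is harmless for continuous data.

```python
import numpy as np

def censoring_survival_at(x, delta, tau):
    """Kaplan-Meier estimate of G(tau) = pr(C >= tau).

    The roles of event and censoring are exchanged: an observation with
    delta == 0 (censored survival time) counts as an observed censoring time.
    """
    x = np.asarray(x, dtype=float)
    censored = np.asarray(delta, dtype=int) == 0
    theta = 1.0
    # Product over distinct censoring times strictly before tau.
    for t in np.unique(x[censored & (x < tau)]):
        at_risk = np.sum(x >= t)               # subjects still under observation
        d = np.sum((x == t) & censored)        # censorings at time t
        theta *= 1.0 - d / at_risk
    return theta

# Toy data: censoring times 2 and 4, event times 1, 3 and 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 0, 1, 0, 1]
print(censoring_survival_at(x, delta, tau=4.5))
```

For the toy data, the censoring at time 2 has four subjects at risk and the censoring at time 4 has two, so the estimate is (1 − 1/4)(1 − 1/2) = 0.375. If no observation is censored before τ, the function returns 1 and the residual I(Xi ≥ τ) − θ̂π reduces to the usual binary-regression residual.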
Let ε(β) = {ε1(β), …, εn(β)}T with εi(β) = I(Xi ≥ τ) − θ̂π(βTZi),

$$D(\beta) = \theta\,\{\pi'(\beta^{T}Z_1)Z_1, \ldots, \pi'(\beta^{T}Z_n)Z_n\}^{T},$$

and V = cov{ε(β0)}. Noting from estimating function (2) that ε(β) plays the role of a residual vector, we consider the estimating function

$$U(\beta) = D^{T}V^{-1}\varepsilon(\beta),$$

where D = D(β0). As in Jung (1996), we can show that β̂, the solution of U(β) = 0, is consistent and asymptotically normal with mean β0 and covariance matrix (DTV−1D)−1.
We now show that β̂ is an optimal estimator. For an n × p full-rank matrix H, consider the estimating function UH(β) = HT ε(β). Note that

$$H^{T} = \left[\frac{\pi'(\beta_0^{T}Z_1)\,Z_1}{\pi_1(1-\pi_1)}, \ldots, \frac{\pi'(\beta_0^{T}Z_n)\,Z_n}{\pi_n(1-\pi_n)}\right]$$

for Ũ and HT = DTV−1 for U. Using arguments similar to those of Jung (1996), we can show that the solution β̂H to the equation UH(β) = 0 is consistent and approximately normal with mean β0 and covariance matrix

$$\operatorname{cov}(\hat{\beta}_H) = (H^{T}D)^{-1}H^{T}VH(D^{T}H)^{-1}.$$

Further, Appendix A shows that cov(β̂H) − cov(β̂) is nonnegative definite, meaning that, for any linear combination of β, β̂ is at least as efficient as β̂H.
Since V is an n × n matrix, inverting it in the optimal estimating equation looks problematic, especially when the sample size n is large. Because of its special structure, however, the inverse can be obtained easily. From Appendix B, we have

$$V = A - \gamma\,\pi^{\otimes 2}, \tag{3}$$

where A = θ×diag{π1(1−θπ1), …, πn(1−θπn)}, π = (π1, …, πn)T, γ = θ² ∫0τ Y(t)−1 dΛ(t), Λ(t) = −log G(t), Y(t) = Σi I(Xi ≥ t), and a⊗2 = aaT for a vector a. Since A is a diagonal matrix, the term γπ⊗2 in (3) reflects the dependency between residuals. Noting that γ = Op(n−1), we see that the dependency gets weaker as n increases. Now, by applying the Bartlett (1951) equality to (3), we have

$$V^{-1} = A^{-1} + \frac{\gamma\,A^{-1}\pi^{\otimes 2}A^{-1}}{1 - \gamma\,\pi^{T}A^{-1}\pi}. \tag{4}$$

Hence, inversion of V involves only that of the diagonal matrix A.
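The rank-one inversion can be verified numerically. The sketch below assumes V = A − γπ⊗2 with A diagonal, as in Appendix B, and uses arbitrary illustrative values for θ, γ and π; the helper name `v_inverse` is hypothetical.

```python
import numpy as np

def v_inverse(a_diag, pi, gamma):
    """Inverse of V = diag(a_diag) - gamma * pi pi^T by the rank-one update formula."""
    a_inv_pi = pi / a_diag                   # A^{-1} pi, elementwise because A is diagonal
    denom = 1.0 - gamma * (pi @ a_inv_pi)    # 1 - gamma * pi^T A^{-1} pi
    return np.diag(1.0 / a_diag) + gamma * np.outer(a_inv_pi, a_inv_pi) / denom

rng = np.random.default_rng(0)
n, theta, gamma = 6, 0.8, 0.01               # illustrative values only
pi = rng.uniform(0.2, 0.9, size=n)
a_diag = theta * pi * (1.0 - theta * pi)     # diagonal of A

V = np.diag(a_diag) - gamma * np.outer(pi, pi)
max_err = np.max(np.abs(v_inverse(a_diag, pi, gamma) @ V - np.eye(n)))
print(f"max |V^-1 V - I| = {max_err:.2e}")
```

In practice the n × n matrix never needs to be formed: products such as V−1D can be accumulated from A−1D and the vector A−1π, at a cost that is linear in n.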
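Using the Newton-Raphson method, the solution to the equation U(β) = 0 is updated at iteration k by

$$\hat{\beta}^{(k)} = \hat{\beta}^{(k-1)} + (D^{T}V^{-1}D)^{-1}D^{T}V^{-1}\varepsilon(\hat{\beta}^{(k-1)}).$$

Since D and V involve the unknown parameters β0, θ and Λ(t), we replace them with β̂(k−1), θ̂ and Nelson’s (1969) estimate Λ̂(t) = ∫0t Y(s)−1 dN(s). Here, N(t) = Σi Ni(t), Ni(t) = (1 − Δi)I(Xi ≤ t), and Y(s) = Σi I(Xi ≥ s) is the number of subjects at risk at time s. Replacing the parameters in the weight DTV−1 with their consistent estimates does not change the asymptotic properties of the resulting estimator β̂.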
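The iteration can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors’ Fortran code: it uses the logit link, takes D to have rows θ̂π′(βTZi)ZiT, estimates γ by θ̂² Σ d/Y² over censoring times before τ, and computes V−1D from the structure V = A − γπ⊗2. All function names and the simulated data are hypothetical.

```python
import numpy as np

def reverse_km(x, delta, tau):
    """Return (theta_hat, gamma_hat) from the censoring times before tau."""
    theta, integral = 1.0, 0.0
    for t in np.unique(x[(delta == 0) & (x < tau)]):
        at_risk = np.sum(x >= t)
        d = np.sum((x == t) & (delta == 0))
        theta *= 1.0 - d / at_risk          # Kaplan-Meier factor for censoring
        integral += d / at_risk**2          # increment of int dLambda_hat / Y
    return theta, theta**2 * integral

def score_parts(beta, z, resp, theta, gamma):
    """Return W = V^{-1} D (n x p), residuals eps, and D, for the logit link."""
    pi = 1.0 / (1.0 + np.exp(-(z @ beta)))
    eps = resp - theta * pi
    d_mat = theta * (pi * (1.0 - pi))[:, None] * z   # assumed form of D
    a = theta * pi * (1.0 - theta * pi)              # diagonal of A
    a_inv_pi = pi / a
    denom = 1.0 - gamma * (pi @ a_inv_pi)
    w = d_mat / a[:, None] + gamma * np.outer(a_inv_pi, a_inv_pi @ d_mat) / denom
    return w, eps, d_mat

def fit_optimal(x, delta, z, tau, n_iter=50, tol=1e-10):
    """Iterate beta <- beta + (D'V^{-1}D)^{-1} D'V^{-1} eps until the step is tiny."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    delta = np.asarray(delta, int)
    resp = (x >= tau).astype(float)
    theta, gamma = reverse_km(x, delta, tau)
    beta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        w, eps, d_mat = score_parts(beta, z, resp, theta, gamma)
        step = np.linalg.solve(d_mat.T @ w, w.T @ eps)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    w, eps, d_mat = score_parts(beta, z, resp, theta, gamma)
    cov = np.linalg.inv(d_mat.T @ w)        # estimate of (D'V^{-1}D)^{-1}
    return beta, cov, w.T @ eps             # estimate, covariance, final score

# Simulated example in which the logit model holds at tau = 1 (illustrative values).
rng = np.random.default_rng(1)
n, tau = 500, 1.0
z = np.column_stack([np.ones(n), rng.uniform(size=n)])
pi_true = 1.0 / (1.0 + np.exp(-(z @ np.array([-0.5, 1.5]))))
t = rng.exponential(scale=-tau / np.log(pi_true))   # pr(T >= tau | Z) = pi_true
c = rng.uniform(0.5, 3.0, size=n)                    # about 20% of C fall before tau
x, delta = np.minimum(t, c), (t <= c).astype(int)

beta_hat, cov_hat, score = fit_optimal(x, delta, z, tau)
print("beta_hat:", beta_hat)
print("se:", np.sqrt(np.diag(cov_hat)))
```

When no observation is censored before τ, θ̂ = 1 and γ̂ = 0 in this sketch, so V−1D reduces to the covariate matrix and the iteration becomes ordinary Fisher scoring for logistic regression.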
3 Example
The proposed optimal estimating method is applied to real data. Phenobarbital was studied for the treatment of children with febrile seizures in a double-blind clinical trial (Farwell et al., 1990). Two hundred seventeen eligible children with febrile seizures were randomized to either a two-year prophylactic phenobarbital treatment arm or a placebo arm. The variable of interest was the time from the index febrile seizure to the recurrent seizure. The number of prior febrile seizures (NPF) and the age at the index seizure (AGE) were considered important covariates for the time to the recurrent seizure. We relate three covariates, treatment (TRT), NPF and AGE, to the probability of staying seizure-free for two years, when the treatment was tapered. In these data, 29 observations were censored before 2 years. From Table 1, we observe that the standard errors of the regression estimates by the optimal method are smaller than those by Jung’s (1996) method. The AGE variable is significant at the 5% significance level by the optimal method, but not by Jung’s method. The results suggest that children with fewer prior febrile seizures and children who were older at the index seizure are more likely to remain seizure-free for two years. The treatment effect is not significant. These results are based on the logistic regression model, but regression models using other link functions give similar results.
Table 1.
Logistic regression of the 2-year seizure-free probability using the Phenobarbital Trial data
| Covariate | Jung’s method: β̂ | Jung’s method: se(β̂) | Jung’s method: P-value | Optimal method: β̂ | Optimal method: se(β̂) | Optimal method: P-value |
|---|---|---|---|---|---|---|
| Intercept | 0.322 | 0.593 | 0.588 | 0.276 | 0.585 | 0.637 |
| TRT | 0.238 | 0.355 | 0.501 | 0.243 | 0.352 | 0.490 |
| NPF | −0.689 | 0.239 | 0.004 | −0.706 | 0.234 | 0.003 |
| AGE | 0.057 | 0.030 | 0.053 | 0.061 | 0.029 | 0.037 |
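The P-values in Table 1 are consistent with two-sided Wald tests based on β̂/se(β̂). That reading is an assumption, since the text does not name the test. The sketch below recomputes them from the rounded table entries, so the third decimal can differ slightly from the printed values.

```python
from statistics import NormalDist

def wald_p_value(estimate, std_error):
    """Two-sided P-value for the Wald statistic estimate / std_error."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))

# (estimate, standard error) pairs as printed in Table 1.
table1 = {
    "Jung": {"Intercept": (0.322, 0.593), "TRT": (0.238, 0.355),
             "NPF": (-0.689, 0.239), "AGE": (0.057, 0.030)},
    "Optimal": {"Intercept": (0.276, 0.585), "TRT": (0.243, 0.352),
                "NPF": (-0.706, 0.234), "AGE": (0.061, 0.029)},
}

for method, rows in table1.items():
    for name, (est, se) in rows.items():
        print(f"{method:8s} {name:10s} p = {wald_p_value(est, se):.3f}")
```

The recomputed values reproduce the qualitative conclusion in the text: AGE falls below the 5% level under the optimal method but not under Jung’s method.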
4 Simulation Studies
Extensive simulation studies were conducted to investigate the finite-sample properties of the proposed optimal estimating method, which is based on asymptotic theory.
In the first simulation, the covariate Zi was generated from the U(0, 1) distribution. Given Zi, Ti was generated from an exponential distribution with hazard rate exp(−βZi), so that pr(Ti ≥ τ) = exp{−τ exp(−βZi)}. With the log-log link ϕ(π) = −log{− log(π)} and πi = pr(Ti ≥ τ), this survival distribution entails the model

$$\phi(\pi_i) = \beta_0 + \beta_1 Z_i,$$

where β0 = − log(τ) and β1 = β. Censoring variables were generated from U(c0, c0 + c1). For a given β value, the constant c1 was chosen to give 30% censoring before τ = 1 under U(0, c1). With this c1 value fixed, c0 was then chosen to give 10% or 20% censoring before τ = 1 under U(c0, c0 + c1). For each combination of β and censoring proportion, 1000 random samples of {(Xi, Δi, Zi), i = 1, …, n} were generated with n = 100, 200 or 300. For each sample, (β̂0, β̂1) and its variance were calculated using Jung’s (1996) method and the optimal method. Table 2 reports the bias and standard error (SE) of β̂1, and the empirical power (PWR) for H0 : β1 = 0, calculated as the proportion of the 1000 tests at nominal level 0.05 that rejected H0. The regression estimates by both methods have very small bias, especially under β1 = 0. Under H1 : β1 = 2, the optimal method has a slightly smaller bias than Jung’s method in most configurations. When β1 = 0, the empirical rejection rates of both methods are fairly close to the nominal level 0.05. When β1 = 2, the optimal estimation method is more powerful than Jung’s method. For example, with n = 100 and 30% censoring, the power of the optimal method is about twice that of Jung’s method (0.278 vs. 0.136). The difference in power between the two methods increases as the censoring proportion increases. Note that, if there is no censoring before τ, the two estimating methods are identical. The standard error of the optimal method is never larger than that of Jung’s method, and it is smaller in all but one configuration.
Table 2.
Bias and standard error (SE) of β̂1 and empirical power (PWR) for H0 : β1 = 0
| n | Censoring within τ | Jung’s method: Bias | Jung’s method: SE | Jung’s method: PWR | Optimal method: Bias | Optimal method: SE | Optimal method: PWR |
|---|---|---|---|---|---|---|---|
| (a) Under H0 : β1 = 0 | | | | | | | |
| 100 | 10% | 0.042 | 0.616 | 0.054 | 0.042 | 0.613 | 0.058 |
| 100 | 20% | 0.003 | 0.715 | 0.033 | 0.004 | 0.708 | 0.045 |
| 100 | 30% | 0.014 | 0.804 | 0.034 | 0.015 | 0.795 | 0.041 |
| 200 | 10% | −0.015 | 0.430 | 0.047 | −0.015 | 0.428 | 0.048 |
| 200 | 20% | 0.035 | 0.495 | 0.045 | 0.035 | 0.493 | 0.051 |
| 200 | 30% | 0.003 | 0.554 | 0.050 | 0.005 | 0.550 | 0.056 |
| 300 | 10% | 0.012 | 0.350 | 0.040 | 0.012 | 0.350 | 0.042 |
| 300 | 20% | −0.001 | 0.400 | 0.046 | −0.001 | 0.399 | 0.049 |
| 300 | 30% | 0.006 | 0.449 | 0.037 | 0.006 | 0.447 | 0.040 |
| (b) Under H1 : β1 = 2 | | | | | | | |
| 100 | 10% | 0.126 | 1.078 | 0.534 | 0.101 | 1.017 | 0.558 |
| 100 | 20% | 0.137 | 1.395 | 0.292 | 0.080 | 1.227 | 0.380 |
| 100 | 30% | 0.146 | 1.696 | 0.136 | 0.108 | 1.426 | 0.278 |
| 200 | 10% | 0.037 | 0.719 | 0.846 | 0.027 | 0.698 | 0.853 |
| 200 | 20% | 0.059 | 0.880 | 0.707 | 0.060 | 0.829 | 0.724 |
| 200 | 30% | 0.145 | 1.099 | 0.542 | 0.097 | 0.982 | 0.590 |
| 300 | 10% | 0.044 | 0.573 | 0.964 | 0.040 | 0.561 | 0.965 |
| 300 | 20% | 0.040 | 0.709 | 0.865 | 0.038 | 0.673 | 0.878 |
| 300 | 30% | 0.065 | 0.848 | 0.732 | 0.050 | 0.785 | 0.769 |
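The data-generating step of the first simulation can be sketched as follows. The text does not say how c1 was found, so the bisection on a large Monte Carlo sample below is an assumption, as is the reading of "censoring before τ" as the event that Ci < Ti and Ci < τ. Function names are hypothetical.

```python
import numpy as np

def simulate(n, beta, c0, c1, rng):
    """Generate (X, Delta, Z) with Z ~ U(0,1), T ~ Exp(hazard exp(-beta Z)), C ~ U(c0, c0 + c1)."""
    z = rng.uniform(size=n)
    t = rng.exponential(scale=np.exp(beta * z))   # hazard exp(-beta z) means scale exp(beta z)
    c = rng.uniform(c0, c0 + c1, size=n)
    return np.minimum(t, c), (t <= c).astype(int), z

def censored_before(x, delta, tau):
    """Proportion of observations censored before tau."""
    return np.mean((delta == 0) & (x < tau))

def calibrate_c1(beta, tau, target, lo=0.1, hi=50.0, n_mc=100_000, seed=0):
    """Bisection on c1 so that U(0, c1) censoring gives the target proportion before tau."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        # Re-using the seed keeps the Monte Carlo proportion monotone in c1.
        x, delta, _ = simulate(n_mc, beta, 0.0, mid, np.random.default_rng(seed))
        if censored_before(x, delta, tau) > target:
            lo = mid      # too much early censoring: widen the censoring interval
        else:
            hi = mid
    return 0.5 * (lo + hi)

c1 = calibrate_c1(beta=2.0, tau=1.0, target=0.30)
x, delta, z = simulate(200_000, 2.0, 0.0, c1, np.random.default_rng(123))
print(f"c1 = {c1:.3f}, censored before tau: {censored_before(x, delta, 1.0):.3f}")
```

With c1 fixed, the same bisection applied to c0 would give the 10% and 20% settings, since shifting the censoring interval to the right reduces early censoring.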
In the second simulation, we check the robustness of the estimation methods against misspecification of the link function. Two covariates Z1i and Z2i were generated from Bernoulli(1/2) and U(0, 1), respectively. Given Z1i and Z2i, Ti was generated from an exponential distribution with hazard rate exp(−β1Z1i − β2Z2i). With the log-log link ϕ(π) = − log{− log(π)} and πi = pr(Ti ≥ τ), this survival distribution entails the model

$$\phi(\pi_i) = \beta_0 + \beta_1 Z_{1i} + \beta_2 Z_{2i},$$

where β0 = − log(τ). We introduce a second covariate in the regression model because regression models with a single covariate do not depend on the choice of link function under H0. The censoring variable was generated as in the first simulation. For each sample, the regression estimates and their standard errors were calculated using the popular logit link, which is a wrong choice here. Table 3 reports the standard error of β̂1 and the empirical power for H0 : β1 = 0 with τ = 1 and n = 100. The empirical rejection rates of both methods are fairly close to the nominal level 0.05 under the null hypothesis, although Jung’s method is slightly conservative under heavy censoring (30%). When β1 = 0.5, the optimal estimation method is always more powerful than Jung’s method. As in the first simulation, the standard error of the optimal method is always smaller than that of Jung’s method, and the difference in power between the two methods increases as the censoring proportion increases.
Table 3.
Standard Error (SE) of β̂1 and empirical power (PWR) for H0 : β1 = 0 when a wrong link is used
| Censoring within τ | β1 = 0, Jung: SE | β1 = 0, Jung: PWR | β1 = 0, Optimal: SE | β1 = 0, Optimal: PWR | β1 = 0.5, Jung: SE | β1 = 0.5, Jung: PWR | β1 = 0.5, Optimal: SE | β1 = 0.5, Optimal: PWR |
|---|---|---|---|---|---|---|---|---|
| 10% | 0.524 | 0.044 | 0.519 | 0.052 | 0.485 | 0.329 | 0.483 | 0.334 |
| 20% | 0.639 | 0.037 | 0.624 | 0.051 | 0.558 | 0.243 | 0.551 | 0.251 |
| 30% | 0.772 | 0.026 | 0.729 | 0.037 | 0.642 | 0.195 | 0.626 | 0.206 |
5 Concluding Remarks
The optimal regression method of the τ-year survival probability is easy to understand and does not require any assumption on the form of the underlying survival distribution. Its coefficients can be estimated by assigning optimal weights to the regression residual terms. This approach is proved to be more efficient than the approach proposed by Jung (1996). This theoretical comparison is further illustrated by a real data analysis and simulations.
We have assumed that the censoring distribution is free of covariates. This assumption holds for most well-conducted clinical trials, where the major cause of censoring is administrative, but it may be violated in some retrospective studies. If the censoring distribution depends on a discrete covariate, we may partition the data into strata identified by the values of the covariate, so that the censoring times within each stratum have an identical distribution. Suppose that there are K strata, and within stratum k (k = 1, …, K) the censoring times are independent and identically distributed with survivor function Gk(t). Let {(Xki, Δki, Zki), i = 1, …, nk} denote the data from stratum k, where n1 + ⋯ + nK = n. Then the optimal estimating function U(β) is extended to

$$U(\beta) = \sum_{k=1}^{K} D_k^{T} V_k^{-1} \varepsilon_k(\beta),$$

where εk(β) = {εk1(β), …, εk,nk(β)}T, εki(β) = I(Xki ≥ τ) − θ̂kπ(βTZki), Dk is defined as D with the data and θk = Gk(τ) of stratum k, Vk = cov{εk(β0)}, and θ̂k is the Kaplan-Meier estimator of Gk(τ) from {(Xki, 1 − Δki), i = 1, …, nk}. It is easy to show that this estimating function has asymptotic properties similar to those of U(β). If the censoring distribution depends on a continuous covariate, we may partition the range of the covariate into a number of intervals using cutoff values, so that the censoring distribution is approximately free of the covariate within each interval, and then apply the above stratified analysis.
In this paper our focus has been on optimal estimation of the regression parameters for the survival probability at a fixed time point. Unlike time-dependent covariates, which may change up to the time of the patient’s death, the covariates are assumed to be fixed for each patient. This assumption is reasonable because our consideration is limited to a pre-specified time point rather than following each patient until death or censoring.
The regression method does not require heavy computation, and all the computations were done on a generic personal computer using Fortran 77 code. Random numbers were generated using CM-library subroutines. The Fortran code will be made available from the authors upon request.
Acknowledgements
We would like to thank the editor, the associate editor and the reviewers for valuable comments that greatly improved the presentation of the article. This work was supported by the SungKyunKwan University Research Grant.
Appendix A
Nonnegative definiteness of cov(β̂H) − cov(β̂)
In order to prove that cov(β̂H) − cov(β̂) = (HTD)−1HTV H(DTH)−1 − (DTV−1D)−1 is nonnegative definite, it suffices to show that M = DTV−1 D − DTH(HTV H)−1 HTD is nonnegative definite, because M is the difference between the inverses of cov(β̂) and cov(β̂H). It is easy to show that M is the covariance matrix of {DTV−1 − DTH(HTV H)−1 HT}ε(β0), so M is nonnegative definite.
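The inequality can be checked numerically for arbitrary inputs. The sketch below uses randomly generated D, H and a positive definite V, none of which come from the paper, and confirms that the smallest eigenvalue of cov(β̂H) − cov(β̂) is nonnegative up to rounding. The helper name `sandwich` is hypothetical.

```python
import numpy as np

def sandwich(H, D, V):
    """Covariance (H'D)^{-1} H'VH (D'H)^{-1} of the solution to H' eps(beta) = 0."""
    htd_inv = np.linalg.inv(H.T @ D)
    return htd_inv @ (H.T @ V @ H) @ htd_inv.T

rng = np.random.default_rng(2)
n, p = 8, 2
D = rng.normal(size=(n, p))
H = rng.normal(size=(n, p))                 # an arbitrary competing weight matrix
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)                 # arbitrary symmetric positive definite matrix

cov_opt = np.linalg.inv(D.T @ np.linalg.solve(V, D))   # (D'V^{-1}D)^{-1}
cov_H = sandwich(H, D, V)
diff = cov_H - cov_opt
min_eig = np.linalg.eigvalsh(0.5 * (diff + diff.T)).min()

# With H = V^{-1}D the sandwich formula collapses to the optimal covariance.
cov_check = sandwich(np.linalg.solve(V, D), D, V)

print(f"smallest eigenvalue of cov_H - cov_opt: {min_eig:.3e}")
print("sandwich with optimal weights equals cov_opt:", np.allclose(cov_check, cov_opt))
```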
Appendix B
V = A − γπ⊗2
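Let V = (σij)i,j=1, …, n. Then,

$$\sigma_{ii} = \operatorname{var}\{I(X_i \ge \tau)\} + \pi_i^{2}\operatorname{var}(\hat{\theta}) - 2\pi_i\operatorname{cov}\{I(X_i \ge \tau), \hat{\theta}\}.$$

Here,

$$\operatorname{var}\{I(X_i \ge \tau)\} = \theta\pi_i(1-\theta\pi_i) \tag{A1}$$

and

$$\operatorname{var}(\hat{\theta}) = \theta^{2}\int_0^{\tau}\frac{d\Lambda(t)}{Y(t)} = \gamma \tag{A2}$$

by Greenwood (1926), where Y(t) = Σi I(Xi ≥ t). Also, by the martingale representation of a Kaplan-Meier estimator,

$$\hat{\theta} - \theta = -\theta\sum_{j=1}^{n}\xi_j + o_p(n^{-1/2}), \tag{A3}$$

where ξj = ∫0τ Y(t)−1 dMj(t) and Mj(t) = Nj(t) − ∫0t I(Xj ≥ s) dΛ(s). Since ξi are asymptotically independent,

$$\operatorname{cov}\{I(X_i \ge \tau), \hat{\theta}\} = -\theta\operatorname{cov}\{I(X_i \ge \tau), \xi_i\}. \tag{A4}$$

It can be easily shown that the right-hand side of (A4) equals γπi. Hence, from (A1), (A2) and (A4), we have

$$\sigma_{ii} = \theta\pi_i(1-\theta\pi_i) - \gamma\pi_i^{2}. \tag{A5}$$

Similarly, it can be shown that, for i ≠ j,

$$\sigma_{ij} = -\gamma\pi_i\pi_j. \tag{A6}$$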
Contributor Information
Minjung Kwak, Department of Statistics, Yeungnam University, Gyeongsan, Gyeongbuk 712749, ROK, mjkwak@yu.ac.kr, Tel: 82-53-810-2321.
Jinseog Kim, Department of Statistics and Information Science, Dongguk University, Gyeongju 780714, ROK, jinseog.kim@gmail.com, Tel:82-54-770-2247.
Sin-Ho Jung, Samsung Center Research Institute, Samsung Medical Center, SungKyunKwan University, Seoul, 135710, ROK. sinho.jung@duke.edu, Tel:82-2-3410-9815.
References
- Bartlett MS. An inverse matrix adjustment arising in discriminant analysis. Annals of Mathematical Statistics. 1951;22:107–111.
- Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–202.
- Doksum KA, Gasko M. On a correspondence between models in binary regression analysis and in survival analysis. International Statistical Review. 1990;58:243–252.
- Farwell JR, Lee YJ, Hirtz DG, Sulzbacher SI, Ellenberg JH, Nelson KB. Phenobarbital for febrile seizures: effects on intelligence and on seizure recurrence. New England Journal of Medicine. 1990;322:364–369. doi: 10.1056/NEJM199002083220604.
- Greenwood M. The natural duration of cancer. Reports on Public Health and Medical Subjects. Vol. 33. London: Her Majesty’s Stationery Office; 1926. pp. 1–26.
- Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
- Jung SH. Regression analysis for long-term survival rate. Biometrika. 1996;83:227–232.
- Nelson W. Hazard plotting for incomplete failure data. Journal of Quality Technology. 1969;1:27–52.
- Zheng Y, Cai T, Stanford JL, Feng Z. Semiparametric models of time-dependent predictive values of prognostic biomarkers. Biometrics. 2010;66:50–60. doi: 10.1111/j.1541-0420.2009.01246.x.
