Abstract
The linear regression model for right censored data, also known as the accelerated failure time model using the logarithm of survival time as the response variable, is a useful alternative to the Cox proportional hazards model. Empirical likelihood as a nonparametric approach has been demonstrated to have many desirable merits thanks to its robustness against model misspecification. However, the linear regression model with right censored data cannot directly benefit from the empirical likelihood for inferences mainly due to dependent elements in estimating equations of the conventional approach. In this paper, we propose an empirical likelihood approach with a new estimating equation for linear regression with right censored data. A nested coordinate algorithm with majorization is used for solving the optimization problems with nondifferentiable objective function. We show that the Wilks theorem holds for the new empirical likelihood. We also consider the variable selection problem with empirical likelihood when the number of predictors can be large. Since the new estimating equation is nondifferentiable, a quadratic approximation is applied to study the asymptotic properties of penalized empirical likelihood. We prove the oracle properties and evaluate the properties with simulated data. We apply our method to a SEER small intestine cancer dataset.
Keywords: Accelerated failure time model, Coordinate descent algorithm, High-dimensional data analysis, Linear regression model, Oracle property, Variable selection, Wilks’ theorem
1. Introduction
The empirical likelihood (Owen 1988, 2001) is an influential computation-intensive statistical approach. The conventional parametric likelihood approach plays a fundamental role in practical data analysis for its convenient and efficient estimation and inferences. It, however, relies on correctly assuming some parameterized distributions. Being a nonparametric likelihood approach in the sense of being distributional assumption free, the empirical likelihood shares some convenient merits of parametric likelihood (e.g. χ2 distributed empirical likelihood ratio) and has many desirable advantages in deriving estimators and confidence sets for unknown parameters, including objective determination of the shape of the confidence region (Owen 1988), seamless incorporation of auxiliary or prior information (Qin and Lawless 1994), Bartlett correctability (Chen and Cui 2006, 2007), and many others. In the past two decades, methods using the empirical likelihood approach have been actively and extensively developed with applications in numerous scientific areas; see Owen (2001) for an overview and Chen and Van Keilegom (2009) for a review.
Among those previous works, the empirical likelihood method has been extended to linear regression with right censored time to event data; see, for example, Qin and Jing (2001), Li and Wang (2003), Zhou (2005), Lu and Liang (2006), Zhou and Li (2008), Zhao and Yang (2012), and Li and Lu (2009). For instance, Qin and Jing (2001) and Li and Wang (2003) proposed an approximate complete-data empirical likelihood to estimate the regression coefficients based on the synthetic data approach of Koul et al. (1981). Zhou and Li (2008) and Fang et al. (2013) developed empirical likelihood methods based on the estimating equation of the Buckley-James estimator (Buckley and James 1979) using a censored-data empirical likelihood and an approximate complete-data empirical likelihood to deal with censored data, respectively. However, each of these methods has its limitations. The synthetic data approach makes a strong assumption of independence between censoring time C and survival time and covariates (Y, X). The approximate complete-data empirical likelihood does not have the standard χ2 limiting distribution and requires an adjustment in order to use the standard chi-square approximation (Li and Wang 2003; Li and Lu 2009). The censored-data empirical likelihood (Zhou and Li 2008) has the standard χ2 limiting distribution. However, it is more computationally intensive and more difficult to cover problems such as incorporating auxiliary information and constructing confidence regions for linear combinations of the regression coefficients. Zheng et al. (2012) introduced an EL method based on derived influence functions of parameters of interest with plugging-in consistent nuisance parameter estimators. This influence function based EL method was proved to retain the nonparametric Wilks property under appropriate conditions.
With the current trend of statistical methodological development, empirical likelihood has been investigated in scenarios with increasing complexity and data dimensionality (Hjort et al. 2009; Chen and Van Keilegom 2009; Chen et al. 2009). Tang and Leng (2010) and Leng and Tang (2012) extend the empirical likelihood approach to high-dimensional data by selecting important variables from the feature space. They apply the empirical likelihood approach in conjunction with appropriate penalty functions on the magnitude of the parameters of interest and construct a framework to build sparse models. All developments require more intensive and demanding computations in solving practical problems with empirical likelihood to continue enjoying those nonparametric merits. On the other hand, algorithms for implementing empirical likelihood are relatively less developed in the recent literature. Tang and Wu (2013) develop two versions of nested coordinate descent algorithms to improve practical implementation of empirical likelihood. As for the linear regression with right censored data discussed above, the empirical likelihood and penalized empirical likelihood approaches remain less investigated partly because the estimating equations generally involve non-smooth functions of the parameter of interest. On the other hand, the merits of the penalized empirical likelihood approach such as parsimonious model construction and the oracle properties are desirable in analyzing right censored survival data.
The first goal of this paper is to introduce a new empirical likelihood approach for right censored linear regression that retains the desirable properties of conventional empirical likelihood. The second goal is to build parsimonious and robust right censored models with empirical likelihood when the number of covariates is large. Specifically, we propose a complete-data empirical likelihood in combination with an alternative approach to the Buckley-James estimator. Unlike the previous methods in which the estimating equation is a sum of dependent terms, our estimating equation is a sum of approximately i.i.d. terms. Consequently, our empirical likelihood method naturally inherits all nice features of the complete data empirical likelihood method. For example, its empirical likelihood ratio has the standard χ2 limiting distribution. We also consider a penalized approach for selecting important variables when the number of variables is large. However, the new estimating equation is nondifferentiable, so we use the quadratic approximation developed by Parente and Smith (2010) to study the asymptotic properties of the penalized empirical likelihood. With this treatment, we prove that the penalized right censored empirical likelihood regression model has the oracle properties (Fan and Li 2001).
Both our proposed method and the method in Zheng et al. (2012) are plug-in empirical likelihood-based methods. However, the main result of Zheng et al. (2012), their Theorem 2.1, does not generally apply to right censored data problems because it requires a continuously differentiable estimating function, whereas the Buckley-James estimating function and the estimating function considered in our paper are not differentiable. In summary, our approach differs from that of Zheng et al. (2012) in the following ways: (1) our method is an estimating equation based approach; (2) the estimating function is not required to be continuously differentiable in our method; (3) the Buckley-James estimating function is re-expressed as approximately a sum of i.i.d. random variables, which allows us to apply the standard complete-data empirical likelihood method in the censored regression setting.
The rest of the paper is organized as follows. A new empirical likelihood method for right censored linear regression is introduced in Section 2. This complete-data empirical likelihood combined with a new estimating equation is then extended to variable selection for many covariates, and its asymptotic properties are investigated. Computational issues are briefly discussed at the end of that section. Simulation experiments in Section 3 evaluate the finite-sample performance and the variable selection. Section 4 applies our method to the SEER (Surveillance, Epidemiology, and End Results) small intestine cancer data. A brief discussion is provided in Section 5. Regularity conditions can be found in the Appendix. All the proofs of theorems are provided in the online supplementary file (http://www.urmc.rochester.edu/biostat/people/faculty/wu-tongtong.cfm).
2. Empirical Likelihood for Linear Regression with Right Censored Data
2.1. Estimating Equation for Right Censored Linear Regression
Let Y and C denote times to event and censoring, respectively. The observed survival time T is min(Y, C) and the censoring indicator is δ = I(Y ≤ C), which equals 1 if Y ≤ C and 0 otherwise. The survival time T is in the range [0, ς], where ς denotes the end of a study. Suppose X ∈ Rp is a covariate vector associated with the true event time. Conditionally on X, Y and C are assumed to be independent. Let {(Xi, Yi, Ci, δi), i = 1, …, n} be n independent samples of {(X, Y, C, δ)} and {(Xi, Ti, δi), i = 1, …, n} be the observed samples. For simplicity, Yi denotes the log-transformed survival time from here on. The linear regression model of the survival time on covariates is given by
$$Y_i = X_i^T \beta + \epsilon_i, \quad i = 1, \ldots, n, \tag{1}$$
where ϵi is the ith random error for the observation following some unknown distribution F(t) = P(ϵi ≤ t) and is independent of Xi and Ci. We assume no intercept in model (1) to ensure F is identifiable.
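To make the observation scheme concrete, the following minimal Python sketch builds the observed pair (Ti, δi) from a latent event time and a censoring time on the log scale, and reports the empirical censoring rate. The function names `observe` and `censoring_rate` and the toy numbers are illustrative assumptions, not part of the original analysis.

```python
def observe(y, c):
    """Return (t, delta) with t = min(y, c) and delta = 1 if y <= c, else 0."""
    return (min(y, c), 1 if y <= c else 0)


def censoring_rate(deltas):
    """Fraction of censored observations, i.e. those with delta == 0."""
    return sum(1 for d in deltas if d == 0) / len(deltas)


# Toy latent (log) event times and censoring times; the values are made up.
latent_y = [0.8, 2.1, 1.4, 3.0]
censor_c = [1.0, 1.5, 1.4, 2.2]

sample = [observe(y, c) for y, c in zip(latent_y, censor_c)]
deltas = [d for _, d in sample]
rate = censoring_rate(deltas)
```

Only `sample` would be available to the analyst in practice; `latent_y` and `censor_c` are shown here purely to illustrate how T and δ arise.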
Define to be the conditional expectation of the underlying survival time Yi on (Ti, δi, Xi). Thus
| (2) |
Or, it is equivalent to re-write (2) as
| (3) |
where is the ith censored residual, and Fβ(t) is the distribution function of depending on β. It is easy to see that
| (4) |
so we denote by Wi(β, Fβ, μX). If μX and Fβ are known, then the sample version of (4) is . Therefore, it is possible to solve the following estimating equation for β
| (5) |
Unfortunately, μX and Fβ are unknown in practice so the estimating equation (5) is not useful.
In order to derive a useful estimating equation for β, we note that, for a fixed β, the probability distribution function Fβ can be estimated by the Kaplan-Meier estimator based on the observed right censored residuals ’s. If we replace Fβ and μX in Wi(β, Fβ, μX) with and , the estimating equation (5) becomes
| (6) |
which is the original estimating equation of the Buckley-James estimator (Buckley and James 1979; Ritov 1990). However, because the summands in (6) are not independent, the standard empirical likelihood results for complete data without censoring do not apply directly. Additional efforts are needed to develop the empirical likelihood method based on (6) (Zhou and Li 2008; Li and Lu 2009). Below we consider an alternative estimating equation in which the summands are approximately i.i.d., which will allow adaptation of standard complete data empirical likelihood methods.
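The plug-in step described above, estimating Fβ from the censored residuals for a fixed β, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the residual is taken to be Ti minus the linear predictor, and the names `censored_residuals` and `kaplan_meier` are hypothetical helpers, not the authors' code.

```python
def censored_residuals(times, x_rows, beta):
    """Censored residuals T_i - X_i^T beta for a fixed beta (assumed form)."""
    return [t - sum(xj * bj for xj, bj in zip(x, beta))
            for t, x in zip(times, x_rows)]


def kaplan_meier(residuals, deltas):
    """Kaplan-Meier estimate of the residual distribution function.

    Returns a list of (e, F_hat(e)) pairs at the distinct uncensored
    residual values, where F_hat = 1 - (product-limit survival estimate).
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    surv = 1.0
    out = []
    i = 0
    while i < n:
        e = residuals[order[i]]
        at_risk = n - i          # residuals that are >= e
        events = 0
        j = i
        while j < n and residuals[order[j]] == e:
            events += deltas[order[j]]   # count uncensored ties at e
            j += 1
        if events > 0:
            surv *= 1.0 - events / at_risk
            out.append((e, 1.0 - surv))
        i = j
    return out


res = censored_residuals([1.5, 2.0], [[1.0], [2.0]], [0.5])
km = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

Because the residuals depend on β, the estimate has to be recomputed whenever β changes, which is what makes the resulting estimating function a step function of β.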
Instead of solving the Buckley-James family of estimating equation (6), in which the summands are not independent because of the Kaplan-Meier estimate , we consider the Tsiatis family of estimating equation (Tsiatis 1990):
| (7) |
where , is the counting process for the ith censored residual , dMi(t) = dNi(t) − Ri(t)dΛ(t) with Λ(t) = −log[1 – F(t)], , and is the at-risk process for the ith censored residual. It follows from Proposition 4.1 of Ritov (1990) that Ψn(β) = Φn(β) + op(1).
Therefore, we can solve Φn(β) = 0 for β.
Since w(t) still contains the unknown distribution function Fβ(t), we therefore propose the following estimating equation by replacing Fβ(t) by the Kaplan-Meier estimate based on the censored residuals:
| (8) |
where
| (9) |
, and with the Nelson-Aalen estimator based on the censored residuals for Λ(t). We prove in Lemma 1 in the Appendix that Γn(β) = Φn(β) + op(1), therefore, instead of solving (6) or (7), we can equivalently solve (8) for β. We also prove that Γn(β) is a sum of approximately i.i.d. random variables, so the results of complete-data empirical likelihood can apply. This new estimating equation leads to the following definition of empirical likelihood for β. By solving the estimating equation Γn(β) = 0 for β, we define an M-estimator satisfying . It is straightforward from Theorem 5.1 in Ritov (1990), is -consistent and asymptotically normal.
2.2. Empirical Likelihood for Right Censored Linear Regression and Asymptotic Properties
We now consider constructing empirical likelihood for β for convenient and robust statistical inferences. With the estimating function (8), we define the empirical likelihood for β as follows:
| (10) |
The empirical likelihood ratio is given by
| (11) |
where λ is the Lagrange multiplier satisfying
| (12) |
Since Si(β) are independent and identically distributed asymptotically, we show that the Wilks theorem holds for the above empirical likelihood.
Theorem 1. Under the regularity conditions in the Appendix, we have as n → ∞,
in distribution, where p is the dimensionality of β.
The proof of Theorem 1 is given in the Appendix. Correspondingly, we can define the profile empirical likelihood for a subset of regression coefficients as follows. Let , where β1 is a sub-vector of β being the components of parameter of interest. The profile empirical likelihood for β1 is defined by
| (13) |
Then we can show that the profile empirical likelihood ratio is also central χ2 distributed.
Theorem 2. Under the regularity conditions in the Appendix, we have as n → ∞,
in distribution, where q is the dimensionality of β1.
With Theorem 2, the 100(1 − α)% confidence region for β1 can be constructed by
| (14) |
where is the upper α-level quantile of the χ2 distribution with q degrees of freedom.
In summary, Theorems 1 and 2 show that, with approximately independent summands in the estimating equation, the empirical likelihood ratio and the profile empirical likelihood ratio enjoy the Wilks property, so that our empirical likelihood approach can be conveniently applied to practical problems with right censored data.
2.3. Penalized Empirical Likelihood for Variable Selection
To build robust learning models and identify relevant predictors to the response variable, one can use the variable selection technique of penalizing the objective function. In this section, we show that variable selection can also be conducted in empirical likelihood for linear regression models on right censored data. The SCAD penalty (Fan and Li 2001) is used as a demonstration in this paper:
where τ is the tuning constant controlling the strength of shrinkage, and α = 3.7 following the recommendation in Fan and Li (2001).
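For readers who want to experiment with the penalty, here is a minimal Python sketch of the SCAD penalty and its derivative in the standard form of Fan and Li (2001). The argument `a` corresponds to the constant fixed at 3.7 in the text; the function names are illustrative assumptions, and any extra scaling of the penalty used in the paper's objective (15) is not reproduced here.

```python
def scad_penalty(theta, tau, a=3.7):
    """SCAD penalty value at theta for tuning constant tau (Fan and Li 2001)."""
    t = abs(theta)
    if t <= tau:
        # Lasso-like linear part near zero.
        return tau * t
    if t <= a * tau:
        # Quadratic transition that tapers the shrinkage.
        return (2.0 * a * tau * t - t * t - tau * tau) / (2.0 * (a - 1.0))
    # Constant beyond a * tau: large coefficients are not shrunk further.
    return tau * tau * (a + 1.0) / 2.0


def scad_derivative(theta, tau, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= tau:
        return tau
    return max(a * tau - t, 0.0) / (a - 1.0)
```

The derivative vanishes for |θ| > aτ, which is the feature that lets large coefficients be estimated without the bias a lasso-type penalty would introduce.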
We define the penalized empirical likelihood estimator to be the minimizer of
| (15) |
where λ is the Lagrange multiplier satisfying (12). As pointed out by Tang and Leng (2010), involving the penalty function on β has no impact on the evaluation of the empirical likelihood ratio for given β, which conveniently facilitates the calculation of the penalized empirical likelihood.
Since Si(β) involves β in an indicator function, it is not differentiable. Therefore, standard results in Qin and Lawless (1994) for empirical likelihood with estimating equations do not straightforwardly apply. In addition, the results in Leng and Tang (2012) for penalized empirical likelihood with estimating equations do not carry over automatically. As shown in the Appendix, the quadratic approximation of Parente and Smith (2010) for empirical likelihood is used for studying the asymptotic properties of the penalized empirical likelihood with nondifferentiable estimating functions.
To choose the optimal value for the tuning parameter τ, we combine our penalized empirical likelihood method with three information criteria: BIC of Schwarz (1978), BICC of Wang et al. (2009), and EBIC of Chen and Chen (2008). The three BIC-type criteria are defined as
where βτ is the estimate of β with τ being the tuning parameter, and dfτ is the number of nonzero coefficients in βτ. The optimal τ minimizes the chosen criterion. In general, all three BIC-type criteria work well. One can use BIC for its well-known model selection consistency.
Without loss of generality, assume the truth , where denotes the true nonzero components. Now suppose the penalized estimator is the minimizer of (15) and is arranged in the same way, i.e., , where and . We prove the selection consistency in Theorem 3.
Theorem 3. Under the regularity conditions in the Appendix, we have as n → ∞.
Define , g(β) = E{gi(β)}, G = ∂E{gi(β)}/∂β, and . Let Ω = (GTQ−1G)−1, which is the asymptotic efficiency (Ritov 1990) of , i.e., as n → ∞,
where is the unpenalized estimator that solves Γn(β) = 0. We can decompose the variance-covariance matrix Ω as a block matrix according to the arrangement of β0 as
The efficiency of is summarized in the following theorem.
Theorem 4. Under the regularity conditions in the Appendix, we have as n → ∞,
in distribution.
From Theorem 4, it is clear that the penalized estimate is more efficient than the unpenalized estimate by comparing their variances and Ω11. The improvement of the asymptotic variance is seen as a gain by using the penalized empirical likelihood. By estimating the unimportant components in the parameter as zero, the penalized empirical likelihood approach achieves better variance of the important elements, which is analogous to the situation of using empirical likelihood with prior information β2 = 0; see Qin and Lawless (1995).
2.4. Computation
Though the empirical likelihood and penalized empirical likelihood have good properties for analyzing right censored data, dedicated effort is needed to handle the optimizations in the empirical likelihood approach. It is well known that the computation of empirical likelihood is difficult, especially when the dimension of covariates is moderate to high. A recent paper of Tang and Wu (2013) develops two computational algorithms to tackle this problem. One is a nested coordinate descent algorithm with two layers to estimate λ and β separately, and the other couples the nested coordinate descent algorithm with the MM (majorization-minimization) algorithm (Hunter and Lange 2004; Lange 2004; Wu and Lange 2010). The latter finds a surrogate matrix to approximate the Hessian matrix of the empirical likelihood and can further simplify the computation. Since Si(β) is not differentiable, Newton-Raphson type algorithms for optimizing with respect to β are not feasible. Thus, we propose to solve the optimization with the coordinate descent MM algorithm. Combining with the MM algorithm also alleviates the instability problem in the computation of EL methods.
There are two layers of optimizations. The first one is to evaluate ℓ(β) in (11) for a given β. Note that a pseudo-logarithm function is used throughout this paper to overcome the bounded support problem of the logarithm function (Owen 2001):
where ϵ is usually chosen to be 1/n. This pseudo-logarithm function is twice differentiable and is more stable in solving empirical likelihood than the original logarithm function. In our study, we use ℓ⋆(β), obtained by replacing the logarithm in ℓ(β) with the pseudo-logarithm, as a surrogate for ℓ(β). We apply the coordinate descent algorithm proposed in Tang and Wu (2013) to evaluate ℓ⋆(β). In practice, the pseudo-logarithm function does not change the asymptotic properties of empirical likelihood.
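The first layer can be illustrated with a small Python sketch. It assumes the pseudo-logarithm takes the standard form given in Owen (2001), namely log(z) for z ≥ ϵ and the quadratic extension log(ϵ) − 1.5 + 2z/ϵ − z²/(2ϵ²) below ϵ, and it treats a scalar estimating function so that λ is one-dimensional. The solver is a damped Newton iteration, swapped in here for brevity in place of the coordinate descent algorithm of Tang and Wu (2013); all function names are hypothetical.

```python
import math


def pseudo_log(z, eps):
    """Pseudo-logarithm: log(z) above eps, quadratic extension below."""
    if z >= eps:
        return math.log(z)
    return math.log(eps) - 1.5 + 2.0 * z / eps - z * z / (2.0 * eps * eps)


def pseudo_log_d1(z, eps):
    """First derivative of the pseudo-logarithm."""
    return 1.0 / z if z >= eps else 2.0 / eps - z / (eps * eps)


def pseudo_log_d2(z, eps):
    """Second derivative; bounded below by -1 / eps**2."""
    return -1.0 / (z * z) if z >= eps else -1.0 / (eps * eps)


def el_log_ratio_scalar(s, max_iter=100, tol=1e-10):
    """Maximize sum_i log*(1 + lam * s_i) over a scalar lam.

    s holds the values of a scalar estimating function at a fixed beta.
    Returns (lam, value); eps is set to 1/n as suggested in the text.
    """
    eps = 1.0 / len(s)
    lam = 0.0

    def objective(l):
        return sum(pseudo_log(1.0 + l * si, eps) for si in s)

    for _ in range(max_iter):
        grad = sum(si * pseudo_log_d1(1.0 + lam * si, eps) for si in s)
        if abs(grad) < tol:
            break
        hess = sum(si * si * pseudo_log_d2(1.0 + lam * si, eps) for si in s)
        step = -grad / hess                 # Newton ascent direction
        current = objective(lam)
        while objective(lam + step) < current and abs(step) > 1e-16:
            step *= 0.5                     # halve until the value improves
        lam += step
    return lam, objective(lam)


s_values = [1.2, -0.4, 0.7, -0.9, 0.3, 0.5]
lam_hat, ell_star = el_log_ratio_scalar(s_values)
```

Because the pseudo-logarithm is defined on the whole real line, the iteration never has to enforce the constraint 1 + λSi > 0 explicitly, which is the stability benefit mentioned above.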
The second layer of optimization is with respect to β, or a subset of its components, for evaluating the profile empirical likelihood and penalized empirical likelihood. We propose to apply majorization by using the quadratic bound principle:
where the matrix B satisfies B ⪰ d2ℓ⋆(β) (i.e., B − d2ℓ⋆(β) is nonnegative definite). Since the pseudo-logarithm function is twice differentiable and has a bounded curvature −λλT /ϵ2, it is easy to define the B matrix.
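The quadratic bound principle can be illustrated on a toy one-dimensional problem. The sketch below is not the paper's objective: it minimizes the smooth convex function f(x) = log(1 + e^x) − 0.3x, whose second derivative never exceeds 1/4, so the constant B = 1/4 plays the role of the dominating matrix. Each MM step minimizes the quadratic surrogate f(x_k) + f′(x_k)(x − x_k) + (B/2)(x − x_k)², which gives the update x_{k+1} = x_k − f′(x_k)/B.

```python
import math


def f(x):
    """Illustrative smooth convex objective (not the paper's criterion)."""
    return math.log1p(math.exp(x)) - 0.3 * x


def f_prime(x):
    """Derivative of f: logistic function minus 0.3."""
    return 1.0 / (1.0 + math.exp(-x)) - 0.3


B = 0.25  # upper bound on f''(x) = sigma(x) * (1 - sigma(x)) <= 1/4


def mm_minimize(x0, n_iter=200):
    """Iterate the minimizer of the quadratic majorizer at each step."""
    x = x0
    values = [f(x)]
    for _ in range(n_iter):
        x = x - f_prime(x) / B
        values.append(f(x))
    return x, values


x_hat, trace = mm_minimize(3.0)
```

Since the surrogate lies above f and touches it at the current iterate, the objective values in `trace` never increase, and no second derivatives of f are evaluated along the way.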
3. Simulation Examples
3.1. Asymptotic Distribution of Empirical Likelihood Ratio
In this example, we examine the distribution of the empirical likelihood ratio ℓ(β) evaluated at the true value β0 and compare with χ2 distributions. We generate n = 100 random samples from multivariate normal distributions in with mean 0 and the pairwise covariance
for i = 1, …, n. Both independent (ρ = 0) and correlated cases (ρ = 0.5 and 0.9) are considered. The true survival time Yi is generated by
where βT = (3, 2, 1)T. The censoring time is generated from a normal distribution with mean 1 and variance 5². The average censoring rate is 45%. We repeat the simulation 100 times.
Quantile-quantile (QQ) plots in Figure 1 compare the empirical likelihood ratios with the χ2 distribution with p = 3 degrees of freedom. The empirical distribution from our method is almost identical to this reference distribution, which supports the validity of the new empirical likelihood with its approximate central χ2 distribution. For comparison, the second row of Figure 1 shows the QQ plots for the EL based on the Buckley-James (BJ) estimating equation such that
where is defined in equation (2). The QQ plots based on the Buckley-James estimating equation deviate clearly from a straight line, which means the empirical likelihood ratios are far from a central χ2 distribution.
Figure 1:
QQ plots of empirical likelihood ratios vs. the reference χ2 distribution for n = 100. The first row is the EL ratios based on our proposed method and the second row is based on the Buckley-James (BJ) estimating equation. The first column is for the case when ρ = 0; the second column is for ρ = 0.5; and the third column is for ρ = 0.9.
3.2. χ2 Distributions for Profiled Empirical Likelihood
In this simulation example, we examine the empirical distributions of ℓ(β2) evaluated at the truth β20 by varying its size s. The empirical coverage of the confidence region (14) is also checked.
The predictor variables X are generated as in the previous example, with different values of the correlation coefficient ρ. The true survival time Yi is generated by
where βT = (5, 4, 3, 2, 1)T. The censoring time is generated from a normal distribution with mean 1 and variance 5². The average censoring rate is 48%.
We examine s = 1, …, 3 corresponding to the profiled empirical likelihood for s component(s) in β. Without loss of generality, β2 is chosen to be the first s components of β. We fix β2 at its truth β20 and calculate (13) using the nested coordinate descent algorithm (Tang and Wu 2013). The simulation is repeated 500 times, and we record the value 2ℓ(β20) in each replicate.
The coverage probabilities, i.e., the frequencies with which the confidence region (14) covers the truth β20, are calculated from the 500 replicates at different ρ’s and s’s and summarized in Table 1. The empirical distribution of 2ℓ(β20), which asymptotically follows a χ2 distribution with s degrees of freedom, is compared with that distribution by examining the QQ plots in Figure 2 at different ρ’s and s’s. Both Table 1 and Figure 2 indicate that the nested optimization based on coordinate descent is satisfactory, both in maintaining the nominal level of coverage and in approximating the asymptotic distributions.
Table 1:
Coverage probabilities (CP) of profiled empirical likelihood based on 500 replicates
| s | ρ | CP (n = 100) | CP (n = 200) | CP (n = 500) |
|---|---|---|---|---|
| 1 | 0 | 0.966 | 0.966 | 0.956 |
| 1 | 0.5 | 0.934 | 0.956 | 0.944 |
| 1 | 0.9 | 0.920 | 0.936 | 0.940 |
| 2 | 0 | 0.936 | 0.966 | 0.966 |
| 2 | 0.5 | 0.902 | 0.958 | 0.972 |
| 2 | 0.9 | 0.924 | 0.958 | 0.942 |
| 3 | 0 | 0.920 | 0.938 | 0.966 |
| 3 | 0.5 | 0.920 | 0.948 | 0.956 |
| 3 | 0.9 | 0.892 | 0.936 | 0.952 |
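When reading coverage probabilities estimated from a finite number of replicates, it can help to keep the Monte Carlo error in mind. The short Python sketch below computes the binomial standard error of an estimated coverage and the corresponding approximate 95% band around the nominal level. The nominal level of 0.95 is an assumption made for illustration, since the text does not state α explicitly, and the helper names are hypothetical.

```python
import math


def coverage_se(nominal, reps):
    """Binomial standard error of a coverage estimate from `reps` replicates."""
    return math.sqrt(nominal * (1.0 - nominal) / reps)


def mc_band(nominal, reps, z=1.96):
    """Approximate 95% Monte Carlo band around the nominal coverage."""
    half = z * coverage_se(nominal, reps)
    return (nominal - half, nominal + half)


se = coverage_se(0.95, 500)          # assumed nominal level 0.95, 500 replicates
lower, upper = mc_band(0.95, 500)
```

With 500 replicates the standard error is a little under 0.01, so differences of that order between table entries are within simulation noise.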
Figure 2:
QQ plots of quantiles of profiled empirical likelihood vs. the reference χ2 distribution for n = 100 based on 500 replicates. (a) s = 1, ρ = 0; (b) s = 1, ρ = 0.5; (c) s = 1, ρ = 0.9; (d) s = 2, ρ = 0; (e) s = 2, ρ = 0.5; (f) s = 2, ρ = 0.9; (g) s = 3, ρ = 0; (h) s = 3, ρ = 0.5; (i) s = 3, ρ = 0.9.
3.3. Penalized Empirical Likelihood
We illustrate the applications of empirical likelihood in variable selection in right censored linear regression. The SCAD penalty (Fan and Li 2001) is used in (15).
The predictor Xi, i = 1, …, n is generated from a multivariate normal distribution with mean zero and a compound symmetry variance-covariance matrix with σ = 1 and ρ = 0.5. The true survival time Yi is generated by
where βT = (3, 1.5, 0, 0, 2, 0, …, 0)T. The censoring time is generated from a normal distribution with mean 1 and variance 5². The average censoring rate is 48%.
Table 2 reports the results for n = 100, 200, p = 10, 20, and ρ = 0.5. Our penalized empirical likelihood method is combined with three information criteria (BIC, BICC, and EBIC) for selecting the tuning parameter τ and compared with the R package bujar developed by Wang and Wang (2010). The column Nnonzero reports the average total number of selected predictors, and the column Ntrue reports the average number of selected true predictors from (X1, X2, X5). The last column reports the average model error (ME) defined by
where Σ is the variance-covariance matrix of the predictors. Standard errors are reported in parentheses. The model error of the oracle model (i.e., the model using only the true predictors) is also reported in the last column of each setting. Both the PEL and bujar methods select all three true predictors in all settings. When the predictors are correlated, the R package bujar selects more false positives. The model errors of the penalized empirical likelihood methods are smaller than those of bujar and are close to that of the oracle model.
Table 2:
Simulation results of our proposed penalized empirical likelihood (PEL) and R package bujar for penalized linear regression based on 100 replicates.
| (n, p) | ρ | Method | Nnonzero | Ntrue | ME |
|---|---|---|---|---|---|
| (100, 10) | 0.5 | PEL-BIC | 3.59 (0.08) | 3 (0) | 0.212 (0.023) |
| | | PEL-BICC | 3.59 (0.08) | 3 (0) | 0.212 (0.023) |
| | | PEL-EBIC | 3.30 (0.06) | 3 (0) | 0.188 (0.022) |
| | | bujar | 4.47 (0.10) | 3 (0) | 0.281 (0.015) |
| | | Oracle | | | 0.067 (0.005) |
| (200, 10) | 0.5 | PEL-BIC | 3.09 (0.03) | 3 (0) | 0.038 (0.004) |
| | | PEL-BICC | 3.09 (0.03) | 3 (0) | 0.038 (0.004) |
| | | PEL-EBIC | 3.05 (0.02) | 3 (0) | 0.038 (0.004) |
| | | bujar | 4.08 (0.10) | 3 (0) | 0.189 (0.010) |
| | | Oracle | | | 0.028 (0.003) |
| (200, 20) | 0.5 | PEL-BIC | 3.66 (0.1) | 3 (0) | 0.061 (0.005) |
| | | PEL-BICC | 3.64 (0.1) | 3 (0) | 0.061 (0.005) |
| | | PEL-EBIC | 3.24 (0.06) | 3 (0) | 0.067 (0.008) |
| | | bujar | 4.97 (0.13) | 3 (0) | 0.203 (0.010) |
| | | Oracle | | | 0.025 (0.002) |
4. Analysis of SEER Small Intestine Data
In this section, we apply our method to the small intestine data set from the Surveillance, Epidemiology, and End Results (SEER 1973–2002) Program supported by the National Cancer Institute (NCI). Small intestine cancer is a relatively rare cancer compared to other gastrointestinal malignancies such as gastric cancer (stomach cancer) and colorectal cancer (Terry and Santora 2006). Surgery and radiation therapy are the most commonly used treatments for small intestine cancer. However, exposure to radiation may cause damage to cellular DNA and increase the risk for certain types of tumors. For example, Hashibe et al. (2005) showed that radiotherapy for first primary oral cancer is a significant risk factor for cancer development.
The goal of the analysis is to determine the efficacy of surgery and radiation therapy for small intestine cancer. The data set includes 2,669 patients who were diagnosed with small intestine cancer (ICD9 = 152) as a single or first primary tumor in the period 1973–2002. Each of the 2,669 patients has complete information on surgery status (yes or no), radiation therapy (yes or no), and the survival time, which is defined as the time from the diagnosis of the first primary small intestine cancer to the diagnosis of the second primary cancer or death. Covariate information is also available on age at the first primary cancer diagnosis (age < 60 or ≥ 60), gender, race (white or other), stage (distant or other), and tumor grade (III or IV, or other). All these covariates are binary and have been previously shown to affect cancer development. Patients with survival times less than 6 months are considered as synchronous cases and are excluded from the study.
Li and Wu (2010) showed that the Cox model does not fit this dataset well as the proportionality assumption does not hold. We therefore fit a linear regression model to this dataset. Table 3 shows the estimates of regression coefficients. The variables surgery, race, stage, and grade are selected, with nonzero coefficients, by both penalized empirical likelihood (PEL) and bujar. The penalized empirical likelihood methods show that a white small intestine cancer patient who is at distant stage and grade I or II has longer survival time. Surgery prolongs small intestine cancer patients’ survival time. The variable radiation is selected by bujar only. We argue that the penalized empirical likelihood results are more reliable since radiation was found to have no significant impact on the survival time by Li and Wu (2010).
Table 3:
Result of SEER small intestine data set with 7 covariates using unpenalized empirical likelihood, penalized empirical likelihood (PEL) and R package bujar
| Variable | Unpenalized | PEL-BIC | PEL-BICC | PEL-EBIC | bujar |
|---|---|---|---|---|---|
| surgery | 0.54 | 2.33 | 2.33 | 2.33 | 1.87 |
| radiation | 0.1 | 0 | 0 | 0 | −0.07 |
| age60 | −0.28 | 0 | 0 | 0 | 0 |
| male | 0.08 | 0 | 0 | 0 | 0 |
| white | 1.36 | 2.99 | 2.99 | 2.99 | 2.63 |
| stage-distant | 1.62 | 2.92 | 2.92 | 2.92 | 2.53 |
| grade34 | −0.57 | −1.94 | −1.94 | −1.94 | −1.48 |
In conclusion, controlling for the other variables, surgery for first primary small intestine cancer appears to reduce the risk of second primary cancer or death significantly. Radiotherapy for first primary small intestine cancer does not seem to affect the development of second primary cancer.
5. Discussion
Our result is based on the assumption that the censored errors are i.i.d. The method will fail, as does the Buckley-James estimator, when the i.i.d. assumption does not hold. In principle, the covariate dimension could be allowed to grow with the sample size n. However, the convergence rate could be slow for two reasons. First, the convergence rate is slow in the empirical likelihood framework with a diverging number of parameters, even for complete data, due to its nonparametric nature (Leng and Tang 2012). Second, the estimating equation is discontinuous and the estimated regression coefficients are an M-estimator with an unconventional form. It is also unclear whether the quadratic approximation developed in Parente and Smith (2010) continues to apply. The convergence rate is likely to be slower than in the situation considered in Leng and Tang (2012). We therefore do not discuss the case of a diverging number of covariates in this paper, and hope to address the problem in future work.
Supplementary Material
Table 4:
Selection frequencies based on 100 random subsets of 500 out of 2669 SEER small intestine samples
| Variable | PEL-BIC | PEL-BICC | PEL-EBIC | bujar |
|---|---|---|---|---|
| surgery | 80 | 80 | 53 | 100 |
| radiation | 4 | 4 | 2 | 37 |
| age60 | 9 | 9 | 6 | 29 |
| male | 6 | 6 | 3 | 15 |
| white | 99 | 99 | 94 | 100 |
| stage-distant | 100 | 100 | 99 | 100 |
| grade34 | 83 | 83 | 54 | 99 |
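The subsampling procedure behind these frequencies can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the subsets are assumed to be drawn without replacement, and `select` is a hypothetical callback standing in for any variable selector (for example, a penalized empirical likelihood fit) that returns the names of the variables it keeps for a given set of row indices.

```python
import random
from typing import Callable, Dict, List, Sequence


def selection_frequencies(
    n_samples: int,
    variables: Sequence[str],
    select: Callable[[List[int]], Sequence[str]],
    n_subsets: int = 100,
    subset_size: int = 500,
    seed: int = 0,
) -> Dict[str, int]:
    """Count how many random subsets lead to each variable being selected."""
    rng = random.Random(seed)
    counts = {v: 0 for v in variables}
    for _ in range(n_subsets):
        # Draw row indices without replacement (an assumption of this sketch).
        rows = rng.sample(range(n_samples), subset_size)
        # Count each variable at most once per subset; ignore unknown names.
        for v in set(select(rows)):
            if v in counts:
                counts[v] += 1
    return counts


def toy_select(rows: List[int]) -> List[str]:
    """Hypothetical stand-in selector, used only to exercise the function."""
    chosen = ["stage-distant"]      # always selected
    if 0 in rows:                   # selected only when row 0 is in the subset
        chosen.append("surgery")
    return chosen


variables = ["surgery", "radiation", "age60", "male",
             "white", "stage-distant", "grade34"]
counts = selection_frequencies(2669, variables, toy_select)
print(counts)
```

With 100 subsets, each count is directly comparable to the entries in the table; replacing `toy_select` with a real fitting routine is left to the reader.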
Acknowledgement
Wu’s research is supported in part by NSF Grant CCF-0926181. Li’s research is supported in part by NIH grants CA016042, U54 RR031268–01, and P01AT003960. The authors thank the editor, associate editor, and referees for their helpful comments and suggestions.
Appendix: Regularity Conditions
Define , g(β) = E{gi(β)}, G = ∂E{gi(β)}/∂β, and . Our theoretical analysis assumes the following conditions:
- C.1 Xi has compact support with mean μX.
- C.2 Fβ has finite Fisher information.
- C.3 where is compact. The truth β0 is in the interior of , and β0 is the unique solution to 0 = g(β), and the matrix Q is nonsingular.
- C.4 g(β) is differentiable at β0, and the matrix G is of full rank. In addition let where then for any δn → 0
Here conditions C.1 and C.2 are those from Ritov (1990) for analyzing right censored data, condition C.3 is a technical device to ensure consistency of the root of , and C.4 is from Parente and Smith (2010) for handling the nonsmooth function gi(β). Essentially, it is assumed that the expectation of the nonsmooth criterion function of the parameter is continuous and differentiable. The second part of C.4 is required for establishing the asymptotic normality of , and a similar condition can be found in Ritov (1990).
Footnotes
Supplementary Materials
All the proofs of theorems are provided in the online supplementary file (http://www.urmc.rochester.edu/biostat/people/faculty/wu-tongtong.cfm).
References
- Buckley J and James L (1979), “Linear regression with censored data,” Biometrika, 66, 429–436.
- Chen J and Chen Z (2008), “Extended Bayesian information criterion for model selection with large model space,” Biometrika, 95, 759–771.
- Chen SX and Cui HJ (2006), “On Bartlett correction of empirical likelihood in the presence of nuisance parameters,” Biometrika, 16, 1101–1115.
- Chen SX and Cui HJ (2007), “On the second order properties of empirical likelihood with moment restrictions,” Journal of Econometrics, 141, 492–516.
- Chen SX, Peng L, and Qin YL (2009), “Effects of data dimension on empirical likelihood,” Biometrika, 96, 711–722.
- Chen SX and Van Keilegom I (2009), “A review on empirical likelihood methods for regression (with discussions),” Test, 18, 415–447.
- Fan J and Li R (2001), “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, 96, 1348–1360.
- Fang KT, Li G, Lu X, and Qin H (2013), “An Empirical Likelihood Method for Semiparametric Linear Regression with Right Censored Data,” Computational and Mathematical Methods in Medicine, 2013, Article ID 469373.
- Hashibe M, Ritz B, Le AD, Li G, Sankaranarayanan R, and Zhang Z-F (2005), “Radiotherapy for oral cancer as a risk factor for second primary cancers,” Cancer Letters, 220, 185–195.
- Hjort NL, McKeague I, and Van Keilegom I (2009), “Extending the scope of empirical likelihood,” The Annals of Statistics, 37, 1079–1111.
- Hunter DR and Lange K (2004), “A tutorial on MM algorithms,” The American Statistician, 58, 30–37.
- Koul H, Susarla V, and Van Ryzin J (1981), “Regression analysis with randomly right-censored data,” The Annals of Statistics, 9, 1276–1288.
- Lange K (2004), Optimization, New York: Springer-Verlag.
- Leng C and Tang CY (2012), “Penalized empirical likelihood and growing dimensional general estimating equations,” Biometrika, 99, 703–716.
- Li G and Lu X (2009), “Comments on: A review on empirical likelihood methods for regression,” Test, 18, 463–467.
- Li G and Wang QH (2003), “Empirical likelihood regression analysis for right censored data,” Statistica Sinica, 13, 51–68.
- Li G and Wu TT (2010), “Semiparametric additive risks regression for two-stage design survival studies,” Statistica Sinica, 20, 1581–1607.
- Lu W and Liang Y (2006), “Empirical likelihood inference for linear transformation models,” Journal of Multivariate Analysis, 97, 1586–1599.
- Owen AB (1988), “Empirical likelihood ratio confidence intervals for a single function,” Biometrika, 75, 237–249.
- Owen AB (2001), Empirical Likelihood, New York: Chapman and Hall/CRC.
- Parente PM and Smith RJ (2010), “GEL Methods For Nonsmooth Moment Indicators,” Econometric Theory, 27, 74–113.
- Qin G and Jing BY (2001), “Empirical likelihood for censored linear regression,” Scandinavian Journal of Statistics, 28, 661–673.
- Qin J and Lawless J (1994), “Empirical likelihood and general estimating equations,” The Annals of Statistics, 22, 300–325.
- Qin J and Lawless J (1995), “Estimating equations, empirical likelihood and constraints on parameters,” The Canadian Journal of Statistics, 23, 145–159.
- Ritov Y (1990), “Estimation in a Linear Regression Model with Censored Data,” The Annals of Statistics, 18, 303–328.
- Schwarz G (1978), “Estimating the dimension of a model,” The Annals of Statistics, 6, 461–464.
- SEER (1973–2002), “Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Public-Use Data,” National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch.
- Tang CY and Leng C (2010), “Penalized high dimensional empirical likelihood,” Biometrika, 97, 905–920.
- Tang CY and Wu TT (2013), “Nested Coordinate Descent Algorithms for Empirical Likelihood,” Journal of Statistical Computation and Simulation, in press.
- Terry SM and Santora T (2006), “Benign Neoplasm of the Small Intestine (http://www.emedicine.com/med/topic2652.htm).”
- Tsiatis AA (1990), “Estimating Regression Parameters Using Linear Rank Tests for Censored Data,” The Annals of Statistics, 18, 354–372.
- Wang H, Li B, and Leng C (2009), “Shrinkage tuning parameter selection with a diverging number of parameters,” Journal of the Royal Statistical Society, Series B, 71, 671–683.
- Wang Z and Wang C (2010), “Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data,” Statistical Applications in Genetics and Molecular Biology, 9, 1–33.
- Wu TT and Lange K (2010), “The MM Alternative to EM,” Statistical Science, 25, 492–505.
- Zhao Y and Yang S (2012), “Empirical likelihood confidence intervals for regression parameters of the survival rate,” Journal of Nonparametric Statistics, 24, 59–70.
- Zheng M, Zhao Z, and Yu W (2012), “Empirical likelihood methods based on influence functions,” Statistics and Its Interface, 5, 355–366.
- Zhou M (2005), “Empirical likelihood analysis of the rank estimator for the censored accelerated failure time model,” Biometrika, 92, 492–498.
- Zhou M and Li G (2008), “Empirical likelihood analysis of the Buckley-James estimator,” Journal of Multivariate Analysis, 99, 649–664.


