Abstract
The varying-coefficient Cox model is flexible and useful for modeling the dynamic changes of regression coefficients in survival analysis. In this paper, we study feature screening for varying-coefficient Cox models with ultrahigh-dimensional covariates. The proposed screening procedure is based on the joint partial likelihood of all predictors, and thus differs from the marginal screening procedures available in the literature. To carry out the new procedure, we propose an effective algorithm and establish its ascent property. We further prove that the proposed procedure possesses the sure screening property: with probability tending to 1, the selected variable set includes the actual active predictors. We conduct simulations to evaluate the finite-sample performance of the proposed procedure and compare it with marginal screening procedures. A genomic data set is used for illustration purposes.
Keywords: Cox model, Partial likelihood, Penalized likelihood, Ultrahigh-dimensional survival data
1. Introduction
Feature screening can effectively reduce ultrahigh dimensionality and therefore has attracted considerable attention in the recent literature. Fan and Lv [12] proposed a marginal screening procedure for the ultrahigh-dimensional Gaussian linear model, and further showed that marginal screening procedures may possess a sure screening property under certain conditions. Feature screening procedures for varying-coefficient models (VCM) with ultrahigh-dimensional covariates have been proposed in the literature. Liu et al. [20] developed a sure independence screening (SIS) procedure for ultrahigh-dimensional VCM by taking conditional Pearson correlation coefficients as a marginal utility for ranking the importance of predictors. Fan et al. [13] proposed an SIS procedure for ultrahigh-dimensional VCM by extending the B-spline techniques in Fan et al. [10] for additive models. Xia et al. [26] further extended the SIS procedure proposed in [13] to generalized varying-coefficient models (GVCM). Cheng et al. [5] proposed a forward variable selection procedure for ultrahigh-dimensional VCM based on techniques related to B-spline regression and grouped variable selection. Song et al. [22] extended the procedure in [13] to longitudinal data without taking into account within-subject correlation, while Chu et al. [6] proposed an SIS procedure for longitudinal data based on a weighted residual sum of squares that uses within-subject correlation to improve the accuracy of feature screening. Kong et al. [17] proposed a new screening method that leaves a variable in the active set if it has, jointly with some other variables, a high canonical correlation with the response.
Survival analysis has been widely used in medical science, economics, finance, and social science, among others. In many studies, survival data have primary outcomes or responses that are subject to censoring. The Cox model [7, 8] is the most commonly used regression model for survival data, and the partial likelihood method has become a standard approach to parameter estimation and statistical inference. Recently, variable selection and parameter estimation in Cox regression models have been considered by various authors (see, e.g., [4, 9, 14, 18, 19, 30]). Huang et al. [15] studied the penalized partial likelihood with the ℓ1-penalty for the Cox model with high-dimensional covariates. Yan and Huang [29] proposed the adaptive group Lasso in a Cox regression model with time-varying coefficients. However, these works did not consider varying-coefficient models.
In this paper, we propose a new feature screening procedure for ultrahigh-dimensional varying-coefficient Cox models. It is distinguished from SIS procedures [11, 32] in that the proposed procedure is based on the joint partial likelihood of potentially important features, rather than the marginal partial likelihood of individual features. Xu and Chen [27] proposed a joint screening procedure and showed its advantage over SIS procedures in the context of generalized linear models. Yang et al. [28] extended the procedures in [27] to Cox models. This work further extends the joint screening strategy and develops a feature screening procedure for varying-coefficient Cox models, which are natural extensions of Cox models and can be useful for exploring nonlinear interaction effects between a primary covariate and other covariates.
The asymptotic properties of the proposed procedure are studied systematically. It is technically challenging to establish its sure screening property, and the techniques used in [28] and other works related to SIS procedures cannot be applied in the present setting. We first develop a Hoeffding-type inequality for a sequence of martingale differences and then establish a concentration inequality for the score function of a partial likelihood. Based on the concentration inequality, we prove the screening property for our proposed sure joint screening procedure. We also conduct simulation studies to assess the finite-sample performance of the proposed procedure and compare its performance with existing sure screening procedures for ultrahigh-dimensional survival data. The proposed methodology is demonstrated through an empirical analysis of a genomic data set.
The rest of this paper is organized as follows. In Section 2, we propose a new feature screening procedure for the varying-coefficient Cox model, develop an algorithm to carry it out, and demonstrate the ascent property of the proposed algorithm. We study the sampling property of the proposed procedure and establish its sure screening property. In Section 3, we present numerical comparisons and an empirical analysis of a real data set. Discussion is in Section 4. Technical proofs are in the Appendix.
2. New feature screening procedure for varying-coefficient Cox model
Let T be the survival time, x a p-dimensional covariate vector, and U a univariate covariate. Throughout this paper, we consider the varying-coefficient Cox proportional hazards model given by
h(t | x, U) = h0(t) exp{xᵀα(U)},  (1)
where h0(t) is an unspecified baseline hazard function and α(U) = (α1(U), …, αp(U))ᵀ consists of the unknown nonparametric coefficient functions. It is assumed that the support of U is a bounded interval, denoted [a, b]. In survival data analysis, survival times are subject to a censoring time C. Denote the observed time by Z = min(T, C) and the event indicator by δ = 1(T ≤ C). It is assumed throughout this paper that the censoring mechanism is noninformative: given x and U, T and C are conditionally independent.
Suppose that (x1, U1, Z1, δ1), …, (xn, Un, Zn, δn) is a random sample from model (1). Let t(1) < ⋯ < t(N) be the ordered observed failure times. Let (j) be the label for the subject failing at time t(j), so that the covariates associated with the N failures are x(1), …, x(N) and U(1), …, U(N). Denote the risk set right before time t(j) by R(j) = {i : Zi ≥ t(j)}. The partial likelihood function [8] of the random sample is
L(α) = ∏_{j=1}^{N} exp{x(j)ᵀα(U(j))} / ∑_{i∈R(j)} exp{xiᵀα(Ui)}.  (2)
To estimate the nonparametric coefficient functions, we use a B-spline basis. Let Sn be the space of polynomial splines of degree ℓ ≥ 1, and let {ψjk(·), k ∈ {1, …, dnj}} denote a normalized B-spline basis with ‖ψjk‖∞ ≤ 1 and dnj = O(n^{1/5}), where ‖·‖∞ is the supremum norm. For any j ∈ {1, …, p} and U ∈ [a, b], we have
αj(U) ≈ ∑_{k=1}^{dnj} βjk ψjk(U)  (3)
for some coefficients βjk. Here we allow dnj to increase with n and to differ across j because different coefficient functions may have different smoothness. Under some conditions, the nonparametric coefficient functions can be well approximated by functions in Sn.
Denote βj = (βj1, …, βjdnj)ᵀ and β = (β1ᵀ, …, βpᵀ)ᵀ, and let zi = (xi1ψ1(Ui)ᵀ, …, xipψp(Ui)ᵀ)ᵀ with ψj(U) = (ψj1(U), …, ψjdnj(U))ᵀ; define z(j) similarly to x(j). Substituting (3) into (2) and taking logarithms, maximum partial likelihood estimation reduces to maximizing
ℓp(β) = ∑_{j=1}^{N} [ z(j)ᵀβ − ln{∑_{i∈R(j)} exp(ziᵀβ)} ]  (4)
with respect to β. We next propose a feature screening procedure based on (4).
2.1. A new feature screening procedure
Denote ‖αj‖2 = {∫ab αj²(u) du}^{1/2}, the L2-norm of αj(·). For ease of presentation, let s denote an arbitrary subset of {1, …, p}, xs = {xj : j ∈ s} and αs(U) = {αj(U) : j ∈ s}. For a set s, τ(s) stands for the cardinality of s. Suppose the effect of x is sparse, let α∗(U) denote the true value of α(U), and let β∗ denote the corresponding spline coefficients of α∗(U). Denote s∗ = {j : ‖α∗j‖2 > 0}. By sparsity, we mean that τ(s∗) is much less than p. The goal of feature screening is to identify a subset s such that s∗ ⊂ s with overwhelming probability and τ(s) is also much less than p. According to (4), we propose screening features for the varying-coefficient Cox model by the constrained partial likelihood
β̂m = arg maxβ ℓp(β) subject to ∑_{j=1}^{p} 1(‖βj‖ > 0) ≤ m  (5)
for a pre-specified m, which is assumed to be greater than τ(s∗), the number of active features.
For high-dimensional problems, it becomes almost impossible to solve the constrained maximization problem (5) directly. Alternatively, we consider a proxy of the partial likelihood function. A Taylor expansion of ℓp(γ) at a point β lying within a neighborhood of γ gives

ℓp(γ) ≈ ℓp(β) + (γ − β)ᵀℓp′(β) + ½(γ − β)ᵀℓp″(β)(γ − β),

where ℓp′(β) = ∂ℓp(β)/∂β and ℓp″(β) = ∂²ℓp(β)/∂β∂βᵀ. Denote Dn = ∑_{j=1}^{p} dnj, the total number of spline coefficients. If ℓp″(β) is invertible, the computational complexity of calculating its inverse is O(Dn³). For large Dn, small n problems (i.e., Dn ≫ n), ℓp″(β) is not invertible. Low computational cost is always desirable for feature screening. To deal with the singularity of the Hessian matrix and to save computational cost, we propose using the approximation
h(γ | β) = ℓp(β) + (γ − β)ᵀℓp′(β) − (u/2)(γ − β)ᵀW(β)(γ − β)  (6)
for all γ, where u is a scaling constant to be specified and W(β) = diag{W1(β), …, Wp(β)} is a block diagonal matrix with Wj(β) being a dnj × dnj matrix. Under some conditions, (6) is a minorization of the original objective function: h(γ|β) ≤ ℓp(γ) for all γ. Due to the properties of the minorization–maximization (MM) algorithm, iteratively maximizing (6) can yield the same estimates as maximizing the original objective function; the two functions themselves, however, are not numerically equal. Here we allow W(β) to depend on β. This amounts to approximating ℓp″(β) by −uW(β). Throughout this paper, we take Wj(β) to be the dnj × dnj identity matrix.
It can be seen that h(β|β) = ℓp(β) and, under some conditions, h(γ|β) ≤ ℓp(γ) for all γ. This ensures the ascent property; see Theorem 1 below for more details. Since W(β) is a block diagonal matrix, h(γ|β) is an additive function of the γj for any given β. The additivity enables us to obtain a closed-form solution for the maximization problem
maxγ h(γ | β) subject to ∑_{j=1}^{p} 1(‖γj‖ > 0) ≤ m  (7)
for given β and m. Define for j ∈ {1, …, p}, and is the maximizer of h(γ|β). Denote for j ∈ {1, …, p}, and sort gj so that . The solution of the maximization problem (7) is the hard-thresholding rule defined below:
This enables us to effectively screen features by using the following algorithm.
Theorem 1. Suppose that Conditions (D1)–(D4) in the Appendix hold. Let β(t) be the sequence defined in Step 2b of the above algorithm. Denote

ρ(t) = λmax[{W(β(t))}⁻¹{−ℓp″(β(t))}],

where λmax(A) stands for the maximal eigenvalue of a matrix A. If ut ≥ ρ(t), then ℓp(β(t+1)) ≥ ℓp(β(t)), where β(t+1) is defined in Step 2b of the above algorithm.
Theorem 1 establishes the ascent property of the proposed algorithm when ut is appropriately chosen. That is, the proposed algorithm does not decrease the partial likelihood at each iteration within the feasible region (i.e., among β with at most m active features), so the estimate obtained in the current step serves as a refinement of that of the last step. This theorem also provides insight into how to choose ut in practical implementation.
2.2. Sure screening property
For a subset s of {1, …, p} with cardinality τ(s), recall the notation xs = {xj : j ∈ s} and the associated coefficient functions αs(U) = {αj(U) : j ∈ s} corresponding to βs = {βj : j ∈ s}. We denote the true model by s∗ = {j : ‖α∗j‖2 > 0}, with τ(s∗) = q. The objective of feature screening is to obtain a subset ŝ such that s∗ ⊂ ŝ with very high probability.
We now provide some theoretical justification for the proposed screening procedure for the ultrahigh-dimensional varying-coefficient Cox model. The sure screening property [12] refers to

Pr(s∗ ⊂ ŝ) → 1 as n → ∞.  (8)
To establish this sure screening property for the proposed screening procedure, we introduce some additional notation. For any model s, let ℓp′(βs) and ℓp″(βs) be the score function and the Hessian matrix of ℓp as a function of βs, respectively. Assume that a screening procedure retains m out of p features, with τ(s∗) = q < m. We define

𝒪 = {s : s∗ ⊂ s, τ(s) ≤ m} and 𝒰 = {s : s∗ ⊄ s, τ(s) ≤ m}

as the collections of the over-fitted models and the under-fitted models, respectively. We investigate the asymptotic properties of ŝ under the scenario where p, q, m and β∗ are allowed to depend on the sample size n. We impose the following conditions, some of which are purely technical and merely serve to facilitate theoretical understanding of the proposed feature screening procedure. For ease of presentation and without loss of generality, it is assumed that dnj = dn for all j ∈ {1, …, p}.
(C1) The support of U is the bounded interval [a, b].
(C2) The functions α1(U), …, αp(U) belong to a class of functions ℱ whose rth derivative α^{(r)} exists and is Lipschitz of order η:

|α^{(r)}(u1) − α^{(r)}(u2)| ≤ K|u1 − u2|^η for all u1, u2 ∈ [a, b],

for some positive constant K, where r is a nonnegative integer and η ∈ (0, 1] such that ν = r + η > 0.5.
(C3) There exist w1, w2 > 0 and some non-negative constants τ1, τ2 such that τ1 + τ2 < ½ and

min_{j∈s∗} ‖α∗j‖2 ≥ w1 n^{−τ1} and q < m ≤ w2 n^{τ2}.
(C4) ln p = O(n^κ) for some 0 ≤ κ < 1 − 2(τ1 + τ2).
(C5) There exist constants C1, C2 > 0 and δ > 0 such that, for sufficiently large n,

C1 ≤ λmin{−n⁻¹ℓp″(βs)} ≤ λmax{−n⁻¹ℓp″(βs)} ≤ C2

for all s with τ(s) ≤ 2m and all βs with ‖βs − β∗s‖ ≤ δ, where λmin and λmax denote the smallest and largest eigenvalues of a matrix, respectively.
Under Conditions (C1)–(C2), the following two properties of B-splines are valid.
(a) de Boor [3]: For k ∈ {1, …, dn}, ψjk(U) ≥ 0 and ∑_{k=1}^{dn} ψjk(U) = 1 for U ∈ [a, b]. In addition, there exist positive constants C3 and C4 such that, for any b = (b1, …, bdn)ᵀ,

C3 dn⁻¹ ∑_{k=1}^{dn} bk² ≤ ∫ab {∑_{k=1}^{dn} bk ψjk(u)}² du ≤ C4 dn⁻¹ ∑_{k=1}^{dn} bk².
(b) Stone [23, 24]: If {α1, …, αp} is a set of functions in the class ℱ described in Condition (C2), there exist spline coefficients βjk and a positive constant C5, not depending on αj(U), such that the uniform approximation error satisfies

sup_{U∈[a,b]} |αj(U) − ∑_{k=1}^{dn} βjk ψjk(U)| ≤ C5 dn^{−ν} for all j ∈ {1, …, p}, as dn → ∞.
Conditions (C1)–(C2) ensure properties (a) and (b), which are required for the B-spline approximation and for establishing the sure screening property. Note that property (a) implies ‖αj‖2² ≍ dn⁻¹‖βj‖² for functions in Sn. Based on properties (a) and (b) and Condition (C3), we can derive that

min_{j∈s∗} ‖β∗j‖² ≥ C dn n^{−2τ1} for some constant C > 0.
Condition (C3) states a few requirements for establishing the sure screening property of the proposed procedure. The first is the sparsity of α∗(U), which makes sure screening possible with τ(s∗) = q much smaller than n. It also requires that the minimal component of α∗(U) not degenerate too quickly, so that the signal remains detectable along the asymptotic sequence. Meanwhile, together with (C4), it confines an appropriate order of m that guarantees the identifiability of s∗ over s with τ(s) ≤ m. Condition (C4) allows p to diverge with n at up to an exponential rate, so the number of covariates can be substantially larger than the sample size. Condition (C5) requires the Hessian of the partial likelihood to be well conditioned over sparse models in a neighborhood of the truth.
We establish the sure screening property of the proposed procedure in the following theorem.
Theorem 2. Suppose that Conditions (C1)–(C5) and Conditions (D1)–(D7) in the Appendix hold. Let ŝ be the model of size m obtained from (5). Then Pr(s∗ ⊂ ŝ) → 1 as n → ∞.
The proof is given in the Appendix. The sure screening property is an appealing property of a screening procedure because it ensures that the true active predictors are retained in the model selected by the screening procedure. To distinguish it from SIS procedures, the proposed procedure is referred to as a sure joint screening (SJS) procedure.
3. Numerical studies
In this section, we assess the finite-sample performance of the proposed procedure, compare it with existing procedures via simulation, and illustrate the proposed procedure by an empirical analysis of a genomic data set.
3.1. Simulation studies
The main purpose of our simulation studies is to assess the performance of the proposed procedure by comparing it with the SIS [11] and the SJS [28] procedures for the Cox model. The model sizes selected by the three methods are set to be the same for comparison. We vary the dimension of predictors p, the sample size n and the correlation parameter ρ to examine their impact on the performance of the proposed procedure. We use the success rate of active predictors being selected and computing time as our criteria to compare the performance of screening procedures.
In our simulation, the predictors x are generated from a p-dimensional normal distribution with mean zero and covariance matrix Σ = (σij). Two commonly used covariance structures are used in our simulation:
(S1) Σ is compound symmetric (i.e., σij = ρ for i ≠ j and σii = 1). We choose ρ ∈ {0.25, 0.5, 0.75}.
(S2) Σ has an AR(1) autoregressive structure (i.e., σij = ρ^{|i−j|}). We choose ρ ∈ {0.25, 0.5, 0.75}.
We generate the survival time from the Cox model (1) with h0(t) = 1 and the censoring time from a uniform distribution. Three settings of the coefficient functions α(u) are considered:
(a1): , , ;
(a2): , , ;
(a3): , , .
We consider n ∈ {200, 400} and p ∈ {2000, 5000}. For the feature screening model size, we follow Liu et al. [20] and set m = ⌊n^{0.8}/ln(n^{0.8})⌋, where ⌊a⌋ denotes the integer part of a. For each combination of settings, we conduct 1000 repetitions.
To illustrate the performance of a statistical procedure in survival data analysis, we want the censoring rates to lie within a reasonable range. Table 1 depicts the censoring rates for the 18 combinations of covariance structure, correlation parameter ρ and coefficient setting α(u). The censoring rates range from 22% to 37%, which is reasonable for simulation studies.
Table 1:
Censoring rates.
| Σ | ρ=.25, (a1) | ρ=.25, (a2) | ρ=.25, (a3) | ρ=.5, (a1) | ρ=.5, (a2) | ρ=.5, (a3) | ρ=.75, (a1) | ρ=.75, (a2) | ρ=.75, (a3) |
|---|---|---|---|---|---|---|---|---|---|
| S1 | .276 | .367 | .223 | .277 | .356 | .260 | .277 | .340 | .248 |
| S2 | .275 | .365 | .265 | .279 | .358 | .283 | .278 | .347 | .245 |
We compare the performance of the feature screening procedures using the following two criteria: Ps, the proportion of repetitions in which an individual active predictor is selected, and Pa, the proportion in which all active predictors are selected. The performance of the proposed varying-coefficient SJS (VSJS) procedure is expected to depend on the following factors: the structure of the covariance matrix, the values of α(u), the dimension p of the candidate features, the correlation parameter ρ, and the sample size n.
Tables 2–3 report Ps and Pa of VSJS, SIS and SJS for the active predictors under (S1). Overall, VSJS outperforms both SIS and SJS for all three sets of α(u) in terms of Ps and Pa. For (a1), VSJS achieves a high success rate in detecting the signals of X1 and X2, while SIS and SJS fail from time to time.
Table 2:
Comparison between VSJS, SIS and SJS with Σ = (1 − ρ)I + ρ11T (n = 200).
| α(U) | VSJS Ps(X1) | VSJS Ps(X2) | VSJS Ps(X3) | VSJS Pa | VSJS time (s) | SIS Ps(X1) | SIS Ps(X2) | SIS Ps(X3) | SIS Pa | SIS time (s) | SJS Ps(X1) | SJS Ps(X2) | SJS Ps(X3) | SJS Pa | SJS time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n = 200, p = 2000 and ρ = .25 | |||||||||||||||
| α(1) | .989 | 1 | 1 | .989 | 74.5 | .796 | .747 | .990 | .580 | 9.5 | .499 | .419 | .936 | .190 | 3.6 |
| α(2) | .999 | .998 | .999 | .996 | 67.7 | .016 | .002 | 1 | 0 | 8.3 | .018 | .037 | .999 | .002 | 2.4 |
| α(3) | 1 | .810 | .993 | .803 | 82.2 | 1 | .771 | .992 | .763 | 6.0 | 1 | .785 | .996 | .781 | 2.8 |
| n = 200, p = 2000 and ρ = .5 | |||||||||||||||
| α(1) | .970 | .976 | .915 | .868 | 68.9 | .621 | .557 | .968 | .325 | 9.2 | .392 | .311 | .863 | .092 | 2.9 |
| α(2) | .922 | .922 | .990 | .848 | 66.8 | .006 | .003 | 1 | 0 | 7.8 | .020 | .052 | .997 | 0 | 2.5 |
| α(3) | .998 | .617 | .938 | .581 | 74.8 | .999 | .611 | .932 | .573 | 5.3 | 1 | .574 | .932 | .542 | 3.2 |
| n = 200, p = 2000 and ρ = .75 | |||||||||||||||
| α(1) | .628 | .670 | .682 | .259 | 62.4 | .357 | .316 | .879 | .093 | 9.4 | .247 | .211 | .701 | .031 | 3.0 |
| α(2) | .485 | .535 | .738 | .204 | 67.3 | .005 | .001 | 1 | 0 | 6.8 | .018 | .059 | .935 | 0 | 3.4 |
| α(3) | .910 | .361 | .686 | .247 | 62.5 | .987 | .341 | .736 | .250 | 5.3 | .958 | .286 | .644 | .181 | 3.4 |
| n = 200, p = 5000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | .993 | .993 | 464.0 | .721 | .649 | .983 | .456 | 15.4 | .391 | .326 | .865 | .097 | 32.9 |
| α(2) | .996 | .994 | 1 | .990 | 416.3 | .004 | .004 | 1 | 0 | 18.1 | .007 | .016 | .994 | 0 | 17.6 |
| α(3) | 1 | .708 | .984 | .694 | 451.5 | 1 | .684 | .974 | .667 | 15.2 | 1 | .627 | .980 | .615 | 16.8 |
| n = 200, p = 5000 and ρ = .5 | |||||||||||||||
| α(1) | .925 | .930 | .845 | .725 | 412.7 | .496 | .430 | .954 | .199 | 22.9 | .281 | .224 | .779 | .040 | 16.8 |
| α(2) | .856 | .876 | .976 | .740 | 423.7 | .005 | .002 | 1 | 0 | 16.1 | .007 | .030 | .968 | 0 | 18.9 |
| α(3) | .992 | .508 | .884 | .446 | 390.4 | .999 | .455 | .866 | .38 | 15.2 | .998 | .435 | .878 | .383 | 24.0 |
| n = 200, p = 5000 and ρ = .75 | |||||||||||||||
| α(1) | .510 | .501 | .504 | .121 | 398.1 | .261 | .218 | .803 | .042 | 15.3 | .135 | .140 | .541 | .010 | 20.3 |
| α(2) | .372 | .399 | .625 | .093 | 396.6 | .002 | 0 | .999 | 0 | 14.9 | .006 | .022 | .867 | 0 | 22.2 |
| α(3) | .892 | .276 | .597 | .158 | 369.5 | .977 | .258 | .624 | .159 | 13.3 | .909 | .164 | .493 | .075 | 24.7 |
Table 3:
Comparison between VSJS, SIS and SJS with Σ = (1 − ρ)I + ρ11T (n = 400).
| α(U) | VSJS Ps(X1) | VSJS Ps(X2) | VSJS Ps(X3) | VSJS Pa | VSJS time (s) | SIS Ps(X1) | SIS Ps(X2) | SIS Ps(X3) | SIS Pa | SIS time (s) | SJS Ps(X1) | SJS Ps(X2) | SJS Ps(X3) | SJS Pa | SJS time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n = 400, p = 2000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 217.7 | 1 | .960 | 1 | .960 | 8.8 | .859 | .805 | .999 | .686 | 5.8 |
| α(2) | 1 | 1 | 1 | 1 | 205.9 | .020 | .001 | 1 | 0 | 7.9 | .010 | .076 | 1 | 0 | 5.6 |
| α(3) | 1 | 1 | 1 | 1 | 215.3 | 1 | .974 | 1 | .974 | 8.3 | 1 | .997 | 1 | .997 | 4.9 |
| n = 400, p = 2000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 190.2 | .900 | .871 | .999 | .779 | 8.5 | .736 | .607 | .998 | .437 | 4.6 |
| α(2) | 1 | 1 | 1 | 1 | 184.3 | .010 | .001 | 1 | 0 | 8.5 | .023 | .133 | 1 | .002 | 6.3 |
| α(3) | 1 | .988 | 1 | .988 | 199.5 | 1 | .918 | .997 | .916 | 8.2 | 1 | .944 | 1 | .944 | 5.1 |
| n = 400, p = 2000 and ρ = .75 | |||||||||||||||
| α(1) | .984 | .991 | .976 | .955 | 169.0 | .655 | .566 | .997 | .349 | 8.6 | .474 | .356 | .955 | .155 | 6.3 |
| α(2) | .998 | .995 | 1 | .994 | 162.2 | .001 | 0 | 1 | 0 | 9.5 | .035 | .193 | .999 | .004 | 6.6 |
| α(3) | 1 | .733 | .982 | .719 | 162.8 | 1 | .676 | .968 | .657 | 8.2 | 1 | .576 | .938 | .540 | 6.1 |
| n = 400, p = 5000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 1202 | .963 | .957 | 1 | .920 | 21.6 | .963 | .957 | 1 | .920 | 21.6 |
| α(2) | 1 | 1 | 1 | 1 | 1164 | .006 | .001 | 1 | 0 | 20.6 | .004 | .038 | 1 | .001 | 31.1 |
| α(3) | 1 | 1 | 1 | 1 | 1180 | 1 | .960 | 1 | .960 | 18.2 | 1 | .993 | 1 | .993 | 36.5 |
| n = 400, p = 5000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 1086 | .849 | .798 | .999 | .669 | 21.0 | .849 | .798 | .999 | .669 | 21.1 |
| α(2) | 1 | 1 | 1 | 1 | 1101 | .001 | 0 | 1 | 0 | 22.2 | .011 | .071 | 1 | .002 | 32.1 |
| α(3) | 1 | .975 | 1 | .975 | 1071 | 1 | .840 | .998 | .838 | 19.6 | 1 | .872 | 1 | .872 | 40.3 |
| n = 400, p = 5000 and ρ = .75 | |||||||||||||||
| α(1) | 1 | 1 | .980 | .980 | 929.0 | .562 | .426 | .994 | .224 | 21.0 | .336 | .267 | .933 | .073 | 35.9 |
| α(2) | .994 | .992 | .997 | .988 | 936.7 | .001 | 0 | 1 | 0 | 20.8 | .016 | .109 | 1 | .001 | 35.3 |
| α(3) | .995 | .621 | .926 | .586 | 909.6 | 1 | .580 | .935 | .535 | 18.3 | .999 | .446 | .900 | .401 | 46.1 |
We next consider the performance of VSJS under (a2). For the zero-centered coefficient functions of X1 and X2, VSJS successfully detects their varying signals and achieves high success rates. In contrast, SIS and SJS completely fail to identify X1 and X2 as active predictors in (a2). In general, VSJS also performs somewhat better in (a3), though SIS slightly outperforms VSJS in a few cases.
Tables 2–3 clearly show how performance is affected by the correlation parameter ρ, the predictor dimension p, and the sample size n. When ρ increases, n decreases, or p increases, all three methods perform worse under (S1). Compared to SIS and SJS, the performance of VSJS is more resistant to these changes. Also, Tables 2–3 suggest that VSJS is computationally more expensive than SIS and SJS.
Tables 4–5 report Ps and Pa of VSJS, SIS, and SJS for the active predictors under (S2). Overall, VSJS still outperforms SIS and SJS. It is worth noting that all three methods perform much better under (S2) than under (S1), especially when the correlation ρ is larger. In (a1), VSJS and SIS both perform perfectly and slightly better than SJS. Under (a2), SIS and SJS perform better under (S2) and successfully identify X1 and X2 from time to time. However, VSJS again outperforms them in (a2). For (a3), the three methods achieve almost a 100% success rate in selecting active predictors; SJS misses some active predictors in a few cases.
Table 4:
Comparison between VSJS, SIS and SJS with Σ = (ρ|i− j|) (n = 200).
| α(U) | VSJS Ps(X1) | VSJS Ps(X2) | VSJS Ps(X3) | VSJS Pa | VSJS time (s) | SIS Ps(X1) | SIS Ps(X2) | SIS Ps(X3) | SIS Pa | SIS time (s) | SJS Ps(X1) | SJS Ps(X2) | SJS Ps(X3) | SJS Pa | SJS time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n = 200, p = 2000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 76.9 | 1 | 1 | .988 | .988 | 5.2 | .856 | .809 | .997 | .684 | 2.6 |
| α(2) | 1 | 1 | 1 | 1 | 70.8 | .042 | .116 | 1 | .008 | 5.9 | .047 | .027 | 1 | 0 | 2.4 |
| α(3) | 1 | 1 | 1 | 1 | 86.6 | 1 | 1 | 1 | 1 | 7.1 | 1 | .981 | 1 | .981 | 3.0 |
| n = 200, p = 2000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 73.1 | 1 | 1 | .999 | .999 | 8.2 | .889 | .792 | .990 | .690 | 2.5 |
| α(2) | 1 | 1 | 1 | 1 | 67.6 | .166 | .611 | 1 | .145 | 5.8 | .052 | .065 | 1 | .011 | 2.4 |
| α(3) | 1 | 1 | 1 | 1 | 82.8 | 1 | 1 | 1 | 1 | 7.7 | 1 | .977 | 1 | .977 | 3.1 |
| n = 200, p = 2000 and ρ = .75 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 75.5 | 1 | 1 | 1 | 1 | 5.2 | .877 | .768 | .990 | .642 | 3.0 |
| α(2) | 1 | 1 | 1 | 1 | 68.6 | .722 | .968 | 1 | .720 | 5.8 | .125 | .417 | .997 | .076 | 2.6 |
| α(3) | 1 | .997 | 1 | .997 | 79.4 | 1 | 1 | 1 | 1 | 8.4 | 1 | .926 | .991 | .917 | 3.1 |
| n = 200, p = 5000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 456.4 | .968 | .997 | 1 | .965 | 15.4 | .785 | .734 | .989 | .559 | 16.1 |
| α(2) | 1 | 1 | 1 | 1 | 463.8 | .016 | .067 | 1 | .004 | 14.6 | .016 | .022 | .999 | 0 | 14.9 |
| α(3) | 1 | 1 | .998 | .998 | 477.1 | 1 | .999 | 1 | .999 | 16.2 | 1 | .967 | 1 | .967 | 20.1 |
| n = 200, p = 5000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 451.1 | 1 | 1 | 1 | 1 | 13.1 | .799 | .730 | .979 | .543 | 13.2 |
| α(2) | 1 | 1 | 1 | 1 | 439.9 | .121 | .501 | 1 | .103 | 14.3 | .030 | .025 | 1 | .003 | 16.0 |
| α(3) | 1 | 1 | 1 | 1 | 475.4 | 1 | 1 | 1 | 1 | 15.8 | 1 | .966 | .997 | .963 | 20.3 |
| n = 200, p = 5000 and ρ = .75 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 448.2 | 1 | 1 | 1 | 1 | 15.4 | .844 | .685 | .987 | .538 | 19.0 |
| α(2) | 1 | 1 | 1 | 1 | 427.3 | .627 | .938 | 1 | .626 | 14.8 | .062 | .327 | 1 | .040 | 15.9 |
| α(3) | 1 | .996 | 1 | .996 | 453.9 | 1 | 1 | 1 | 1 | 14.4 | 1 | .916 | .980 | .896 | 23.3 |
Table 5:
Comparison between VSJS, SIS and SJS with Σ = (ρ|i− j|) (n = 400).
| α(U) | VSJS Ps(X1) | VSJS Ps(X2) | VSJS Ps(X3) | VSJS Pa | VSJS time (s) | SIS Ps(X1) | SIS Ps(X2) | SIS Ps(X3) | SIS Pa | SIS time (s) | SJS Ps(X1) | SJS Ps(X2) | SJS Ps(X3) | SJS Pa | SJS time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n = 400, p = 2000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 229.6 | 1 | 1 | 1 | 1 | 8.6 | .991 | .979 | 1 | .970 | 6.3 |
| α(2) | 1 | 1 | 1 | 1 | 223.3 | .083 | .251 | 1 | .036 | 8.5 | .047 | .040 | 1 | .001 | 5.2 |
| α(3) | 1 | 1 | 1 | 1 | 240.1 | 1 | 1 | 1 | 1 | 11.9 | 1 | 1 | 1 | 1 | 7.0 |
| n = 400, p = 2000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 225.9 | 1 | 1 | 1 | 1 | 7.5 | .992 | .959 | 1 | .951 | 5.3 |
| α(2) | 1 | 1 | 1 | 1 | 226.1 | .387 | .922 | 1 | .382 | 8.8 | .070 | .263 | 1 | .031 | 5.2 |
| α(3) | 1 | 1 | 1 | 1 | 236.8 | 1 | 1 | 1 | 1 | 8.5 | 1 | 1 | 1 | 1 | 7.3 |
| n = 400, p = 2000 and ρ = .75 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 217.9 | 1 | 1 | 1 | 1 | 8.9 | .979 | .907 | 1 | .886 | 6.4 |
| α(2) | 1 | 1 | 1 | 1 | 218.4 | .969 | 1 | 1 | .969 | 9.1 | .139 | .598 | 1 | .080 | 5.8 |
| α(3) | 1 | .999 | 1 | .999 | 227.8 | 1 | 1 | 1 | 1 | 11.9 | 1 | .997 | 1 | .997 | 7.6 |
| n = 400, p = 5000 and ρ = .25 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 1264 | 1 | 1 | 1 | 1 | 20.6 | .988 | .962 | 1 | .952 | 29.5 |
| α(2) | 1 | 1 | 1 | 1 | 1265 | .054 | .183 | 1 | .018 | 18.7 | .029 | .032 | 1 | 0 | 28.8 |
| α(3) | 1 | 1 | 1 | 1 | 1215 | 1 | 1 | 1 | 1 | 20.8 | 1 | 1 | 1 | 1 | 33.8 |
| n = 400, p = 5000 and ρ = .5 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 1274 | 1 | 1 | 1 | 1 | 20.5 | .976 | .924 | 1 | .900 | 32.5 |
| α(2) | 1 | 1 | 1 | 1 | 1256 | .318 | .884 | 1 | .312 | 19.9 | .038 | .162 | 1 | .017 | 29.1 |
| α(3) | 1 | 1 | 1 | 1 | 1194 | 1 | 1 | 1 | 1 | 20.6 | 1 | .999 | 1 | .999 | 35.6 |
| n = 400, p = 5000 and ρ = .75 | |||||||||||||||
| α(1) | 1 | 1 | 1 | 1 | 1202 | 1 | 1 | 1 | 1 | 20.7 | .969 | .902 | 1 | .871 | 36.9 |
| α(2) | 1 | 1 | 1 | 1 | 1225 | .954 | 1 | 1 | .954 | 21.9 | .085 | .548 | 1 | .051 | 29.9 |
| α(3) | 1 | 1 | 1 | 1 | 1139 | 1 | 1 | 1 | 1 | 29.5 | 1 | .995 | 1 | .995 | 34.6 |
We can conclude from Tables 4 and 5 that SIS and SJS tend to perform better when ρ increases, n increases, or p decreases. VSJS performs almost perfectly in all three settings under (S2). Tables 4 and 5 likewise show that VSJS is more computationally intensive than SIS and SJS.
3.2. Real data analysis
We analyze The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov/) data on liver hepatocellular carcinoma to illustrate the proposed procedure. Liver hepatocellular carcinoma is the most common form of liver cancer and the third leading cause of cancer death worldwide. Zhang and Sun [31] studied 27,255 patients in the Surveillance, Epidemiology, and End Results Program (SEER; https://seer.cancer.gov/) cancer registry and suggested that age is a prognostic factor for liver cancer. Therefore, we consider age as the univariate covariate for the coefficient functions, allowing the effects of gene expression on survival time to vary with age. After removing five subjects whose survival time is zero, we obtain 354 subjects with gene expressions (IlluminaHiSeq RNA-seq v2 platform), age at diagnosis, and survival months. We apply a log2 transformation to the gene expressions and analyze 14,683 genes that have more than 90% nonzero observations.
For VSJS, we use a linear combination of five B-spline basis functions to approximate the varying-coefficient functions. As a result, VSJS retains 23 = ⌊354^{0.8}/ln(354^{0.8})⌋ genes, and the partial likelihood function value for the corresponding model is −544.9. With the same number of genes retained, the resulting partial likelihood function values for SIS and SJS are −589.2 and −588.4, respectively. Simultaneous modeling of the 23 retained genes shows a clear advantage of VSJS in terms of a higher partial likelihood value.
To better understand the screening result of VSJS, we apply a backward selection procedure to those 23 genes and obtain a more parsimonious model. Specifically, each backward elimination step removes the gene with the smallest likelihood ratio test statistic, until all remaining genes are significant at level 0.05. Table 6 provides the final list of 11 genes after applying the backward elimination, and Figure 1 depicts their varying coefficients.
Table 6:
Genes selected by backward elimination.
| Gene Name | ANLN | CEP55 | DYNC1LI1 | GTPBP4 |
|---|---|---|---|---|
| LRT Stat | 15.869 | 14.137 | 18.171 | 22.658 |
| p-value | 0.00723 | 0.0148 | 0.00274 | < 0.001 |
| Gene Name | SLC2A1 | KIF2C | KIF20A | KPNA2 |
| LRT Stat | 18.465 | 26.261 | 15.839 | 14.511 |
| p-value | 0.00241 | < 0.001 | 0.00731 | 0.0127 |
| Gene Name | LIMS2 | TRIP13 | UCK2 | |
| LRT Stat | 23.093 | 17.517 | 14.671 | |
| p-value | < 0.001 | 0.00361 | 0.0119 | |
Figure 1:
Estimated coefficient functions and the pointwise confidence intervals of the selected genes. The red line represents the average level of the varying-coefficient functions.
Our literature search reveals that those 11 genes are all associated with cancer risk. For example, GTPBP4 [21] and SLC2A2 [16] are promising prognostic factors for hepatocellular carcinoma. To test whether those 11 genes have varying rather than constant coefficients, a test of H0: αj(·) ≡ αj for some constant αj versus H1: αj(·) is nonconstant can be conducted for each j in the selected gene set. The test results are shown in Table 7: all the genes except DYNC1LI1 have coefficient functions that vary significantly with age at the 5% level of significance. Such age-varying effects have not been reported in the current medical literature, so our study may suggest directions for a more granular investigation of those genes.
Table 7:
LRT statistics and p-values for the varying coefficients of the final selected genes
| Gene Name | ANLN | CEP55 | DYNC1LI1 | GTPBP4 |
|---|---|---|---|---|
| LRT Stat | 15.058 | 10.495 | 8.268 | 19.036 |
| p-value | 0.00458 | 0.0328 | 0.0822 | 0.000773 |
| Gene Name | SLC2A1 | KIF2C | KIF20A | KPNA2 |
| LRT Stat | 17.473 | 24.253 | 15.183 | 14.238 |
| p-value | 0.00156 | 0.000071 | 0.00433 | 0.00657 |
| Gene Name | LIMS2 | TRIP13 | UCK2 | |
| LRT Stat | 23.097 | 16.191 | 13.803 | |
| p-value | 0.000121 | 0.00277 | 0.00795 | |
4. Discussion
We have proposed an SJS procedure for the varying-coefficient Cox model with ultrahigh-dimensional covariates based on the partial likelihood. The proposed SJS is distinguished from existing SIS procedures in that it is based on the joint partial likelihood of potential candidate features. We also proposed an effective algorithm to carry out the feature screening procedure and showed that the proposed algorithm possesses an ascent property. We studied the sampling properties of SJS and established its sure screening property.
Theorem 1 ensures the ascent property of the proposed algorithm under certain conditions, but it does not imply that the proposed algorithm converges to the global optimizer. If the proposed algorithm converges to a global maximizer of (5), then Theorem 2 shows that such a solution enjoys the sure screening property.
Acknowledgments
Yang’s research was supported by the National Natural Science Foundation of China grants 11471086 and 11871173, the National Social Science Foundation of China grant 16BTJ032, the National Statistical Scientific Center grant 2015LD02, and the Fundamental Research Funds for the Central Universities of Jinan University Qimingxing Plan 15JNQM019. Zhang and Li’s research was supported by NIDA grant P50 DA039838, NSF grant DMS 1820702, and NNSFC grants 11690014 and 11690015. Huang’s research was supported by NIH grant 5R01NS091161 and CHDI Foundation grant A-13343. The content is solely the responsibility of the authors and does not necessarily represent the official views of the CHDI, the NIDA, the NIAID, the NIH, the NSF, or the NNSFC.
Appendix
We use the following notation to present the regularity conditions for the partial likelihood and the Cox model. Most of the notation is adapted from Andersen and Gill [1], in which counting processes were introduced for the Cox model and the consistency and asymptotic normality of the partial likelihood estimate were established. Denote Ni(t) = 1(Zi ≤ t, δi = 1) and Ri(t) = 1(Ti ≥ t, Ci ≥ t). Assume that no two component processes Ni(t) jump at the same time. For simplicity, we work on the finite interval [0, τ].
In Cox’s model, properties of stochastic processes, such as being a local martingale or a predictable process, are relative to a right-continuous nondecreasing family (ℱt) of sub-σ-algebras on a sample space (Ω, ℱ, Pr); ℱt represents everything that happens up to time t. Throughout this section, we define

ℱt = σ{Ni(u), Ri(u⁺), xi, Ui : 0 ≤ u ≤ t, i ∈ {1, …, n}}.

By stating that Ni(t) has intensity process λi(t) = Ri(t)h0(t) exp{xiᵀα(Ui)}, we mean that the processes Mi(t) defined, for each i ∈ {1, …, n}, by

Mi(t) = Ni(t) − ∫0t λi(u) du
are local martingales on the time interval [0, τ]. For k ∈ {0, 1, 2}, define

S(k)(β, t) = n⁻¹ ∑_{i=1}^{n} Ri(t) exp(ziᵀβ) zi⊗k

and

E(β, t) = S(1)(β, t)/S(0)(β, t), V(β, t) = S(2)(β, t)/S(0)(β, t) − E(β, t)⊗2,

where a⊗0 = 1, a⊗1 = a and a⊗2 = aaᵀ. Note that S(0)(β, t) is a scalar, S(1)(β, t) and E(β, t) are vectors, and S(2)(β, t) and V(β, t) are square matrices. Define
Here, Qj is the martingale defined in (A.6) below, i.e., Qj = ∑_{i=1}^{n} ∫0^{tj} bi(u) dMi(u). Let bj = Qj − Qj−1; then b1, b2, … is a sequence of bounded martingale differences on (Ω, ℱ, Pr). That is, bj is bounded almost surely and E(bj | ℱt_{j−1}) = 0 for j ∈ {1, 2, …}.
(D1) Finite interval: ∫0τ h0(t) dt < ∞.
(D2) Asymptotic stability: There exists a neighborhood ℬ of β∗ and scalar, vector and matrix functions s(0), s(1) and s(2) defined on ℬ × [0, τ] such that, for k ∈ {0, 1, 2},

sup_{t∈[0,τ], β∈ℬ} ‖S(k)(β, t) − s(k)(β, t)‖ → 0 in probability.
(D3) Lindeberg condition: There exists δ > 0 such that

n^{−1/2} sup_{i,t} |zi| Ri(t) 1{|zi| > δ n^{1/2}} → 0 in probability.
(D4) Asymptotic regularity conditions: Let ℬ, s(0), s(1) and s(2) be as in Condition (D2) and define e = s(1)/s(0) and v = s(2)/s(0) − e⊗2. For all β ∈ ℬ and t ∈ [0, τ],

∂s(0)(β, t)/∂β = s(1)(β, t), ∂²s(0)(β, t)/∂β∂βᵀ = s(2)(β, t);

s(0)(·, t), s(1)(·, t) and s(2)(·, t) are continuous functions of β ∈ ℬ, uniformly in t ∈ [0, τ]; s(0), s(1) and s(2) are bounded on ℬ × [0, τ]; s(0) is bounded away from zero on ℬ × [0, τ]; and the matrix

Σ = ∫0τ v(β∗, t) s(0)(β∗, t) h0(t) dt

is positive definite.
(D5) The functions S(0)(β∗, t) and s(0)(β∗, t) are bounded away from 0 on [0, τ].
(D6) There exist constants C1, C2 > 0 such that maxi,j |zij| < C1 and maxi |ziᵀβ∗| < C2.
(D7) b1, b2, … is a sequence of martingale differences, and there exist nonnegative constants c1, …, cN such that, for every real number t and all j ∈ {1, …, N},

E{exp(t bj) | ℱt_{j−1}} ≤ exp(cj² t²/2) almost surely.

For each j ∈ {1, …, N}, the smallest such cj is denoted by η(bj); moreover, |bj| ≤ Kj almost surely and E(bj1 bj2 ⋯ bjk) = 0 for any j1 < ⋯ < jk.
Note that the partial-derivative conditions relating s(0), s(1) and s(2) in (D4) are satisfied by S(0), S(1) and S(2); furthermore, the matrix Σ in (D4) is automatically positive semidefinite. Moreover, the interval [0, τ] in the conditions may everywhere be replaced by the set {t : h0(t) > 0}.
Conditions (D1)–(D5) are standard requirements for the proportional hazards model [1], which are weaker than those required by Bradic et al. [4], and they ensure that S(k)(β∗, t) converges uniformly to s(k)(β∗, t). Condition (D6) is routine and is needed to apply the concentration inequality for general empirical processes; for example, a bounded-covariate assumption is used by Huang et al. [15] in discussing the Lasso estimator of proportional hazards models. Condition (D7) is needed for the asymptotic behavior of the score function of the partial likelihood: the score function cannot be represented as a sum of independent random vectors, but it can be represented as a sum of a sequence of martingale differences.
Proof of Theorem 1. Applying the Taylor expansion to ℓp(γ) at γ = β, one finds

ℓp(γ) = ℓp(β) + (γ − β)ᵀℓp′(β) + ½(γ − β)ᵀℓp″(β̃)(γ − β),

where β̃ lies between γ and β. Recall from (6) that

h(γ|β) = ℓp(β) + (γ − β)ᵀℓp′(β) − (u/2)(γ − β)ᵀW(β)(γ − β),

where W(β) is a block diagonal matrix with Wj(β) being a dnj × dnj matrix. Given that −ℓp″(β̃) is non-negative definite, ρ(t) ≥ 0. Thus, if ut ≥ ρ(t), then

utW(β(t)) + ℓp″(β̃) is non-negative definite.

Thus it follows that ℓp(γ) ≥ h(γ|β(t)) for all γ, and ℓp(β(t)) = h(β(t)|β(t)) by the definition of h(γ|β). The solution of maximizing h(γ|β(t)) subject to the constraint in (7) is β(t+1). Hence, under the conditions of Theorem 1, it follows that

ℓp(β(t+1)) ≥ h(β(t+1)|β(t)) ≥ h(β(t)|β(t)) = ℓp(β(t)).

The second inequality is due to the fact that β(t+1) maximizes h(γ|β(t)) subject to the constraint that at most m of the γj are nonzero, while β(t) also satisfies this constraint. This proves Theorem 1. □
Proof of Theorem 2. For a given model s, a subset of {1, …, p}, let β̂s be the maximum partial likelihood estimate of βs based on the spline approximation. Since ŝ maximizes (5), the event {s∗ ⊄ ŝ} implies that some under-fitted model attains at least the partial likelihood of the model s∗. Thus, it suffices to show that

Pr{ max_{s∈𝒰} ℓp(β̂s) < ℓp(β̂s∗) } → 1 as n → ∞.  (A.1)
For each j ∈ {1, …, p}, we approximate the coefficient function αj(U) by

αnj(U) = ∑_{k=1}^{dn} βjk ψjk(U),  (A.2)

where the ψjk are basis functions and dn is the number of basis functions, which is allowed to increase with the sample size n. For αnj(U), define the approximation error, for each j ∈ {1, …, p}, by

ρnj = sup_{U∈[a,b]} |αnj(U) − αj(U)|.
Let ρn = max_{1≤j≤p} ρnj. Let αn(U) = (αn1(U), …, αnp(U))ᵀ and α(U) = (α1(U), …, αp(U))ᵀ. For any s,

xsᵀαns(U) = xsᵀΨs(U)βs,

where Ψs(U) = diag{ψ1(U), …, ψτ(s)(U)} with ψj(U) = (ψj1(U), …, ψjdn(U))ᵀ and βj = (βj1, …, βjdn)ᵀ for all j ∈ s. For any s ∈ 𝒰, define s′ = s ∪ s∗. So, we have the decomposition

n⁻¹{ℓp(β̂s′) − ℓp(β∗s′)} = ∆1 + ∆2 + ∆3,

where ∆2 and ∆3 are remainder terms evaluated at two intermediate values and driven by the spline approximation error.
Thus, we have to show that ∆2 and ∆3 are asymptotically negligible. For ∆2, by the Cauchy–Schwarz inequality, we have a bound proportional to ρn‖β̂s′ − β∗s′‖. By Condition (C5) and Corollary 1 in [25], we obtain ∆2 = op(1). Similarly to ∆2, we can also conclude that ∆3 = op(1).
Next, we consider the term ∆1. For any s ∈ 𝒰, recall that s′ = s ∪ s∗. Under Condition (C3), we consider βs′ close to β∗s′ such that ‖βs′ − β∗s′‖ = w1 n^{−τ1} for some w1, τ1 > 0. Clearly, when n is sufficiently large, βs′ falls into a small neighborhood of β∗s′, so that Condition (C5) becomes applicable. Thus, it follows from Condition (C5) and the Cauchy–Schwarz inequality that

ℓp(βs′) − ℓp(β∗s′) ≤ ‖βs′ − β∗s′‖ ‖ℓp′(β∗s′)‖ − (C1/2) n ‖βs′ − β∗s′‖²,  (A.3)

where the quadratic term comes from evaluating ℓp″ at an intermediate value β̃s′ between βs′ and β∗s′. Thus, we have that ℓp(βs′) < ℓp(β∗s′) unless ‖ℓp′(β∗s′)‖ is large. Also, by (C3), we have ‖βs′ − β∗s′‖ = w1 n^{−τ1}, and also the following probability inequality:

Pr{ℓp(βs′) ≥ ℓp(β∗s′)} ≤ Pr{‖ℓp′(β∗s′)‖ ≥ C̃ n^{1−τ1}},  (A.4)
where C̃ denotes some generic positive constant. Recalling (2), by differentiation and rearrangement of terms, it can be shown as in [1] that the gradient of ℓp(β) is

ℓp′(β) = ∑_{i=1}^{n} ∫0τ {zi − E(β, u)} dNi(u),  (A.5)

where E(β, u) = S(1)(β, u)/S(0)(β, u) and the intensity of Ni is driven by the true coefficient functions α(·) rather than their spline approximation. As a result, the partial score function evaluated at β∗ no longer has an exact martingale structure, and the large deviation results for continuous-time martingales in [4] and [15] are not directly applicable. The martingale process associated with ℓp′(β∗) is given by

∑_{i=1}^{n} ∫0t {zi − E(β∗, u)} dMi(u).
For each j ∈ {1, …, N}, let tj be the time of the jth jump of the process ∑_{i=1}^{n} Ni(t) and set t0 = 0. Then the tj are stopping times. For j ∈ {0, …, N}, further define

Qj = ∑_{i=1}^{n} ∫0^{tj} bi(u) dMi(u),  (A.6)

where the integrands bi(u), i ∈ {1, …, n}, are suitably normalized components of zi − E(β∗, u); they are predictable and, provided that no two component processes jump at the same time and (D6) holds, satisfy |bi(u)| ≤ 1.
Since the Mi(u) are martingales and the bi(u) are predictable, {Q0, Q1, …} is a martingale with differences |Qj − Qj−1| ≤ max_{u,i} |bi(u)| ≤ 1. Recalling the definition of N in Section 2, we write N = nC0, where C0 ≤ 1 is a constant. So, by the martingale version of Hoeffding's inequality [2] and under Condition (D7), we have

Pr(|QN| > N x) ≤ 2 exp(−N x²/2).

By (A.6), a component of the score ℓp′(β∗) exceeds nC0x in absolute value if and only if the corresponding |QN| > nC0x. Thus, the left-hand side of (3.15) in Lemma 3.3 of [15] is no greater than Pr(|QN| > nC0x) ≤ 2 exp(−nx²/2). Now (A.4) can be rewritten as follows:
Pr{‖ℓp′(β∗s′)‖ ≥ C̃ n^{1−τ1}} ≤ 2τ(s′)dn exp(−c n^{1−2τ1})  (A.7)

for some constant c > 0. By the same arguments, we have

Pr{ℓp(β̂s′) ≥ ℓp(β∗s′)} ≤ 2τ(s′)dn exp(−c n^{1−2τ1}).  (A.8)

Inequalities (A.7) and (A.8) imply that

Pr{ℓp(β̂s) ≥ ℓp(β̂s∗)} ≤ 4τ(s′)dn exp(−c n^{1−2τ1}).

Consequently, by Bonferroni's inequality and under Conditions (C3)–(C4), we have

Pr{ max_{s∈𝒰} ℓp(β̂s) ≥ ℓp(β̂s∗) } ≤ exp(a1 n^{τ2} ln p − a2 n^{1−2τ1}) → 0  (A.9)

as n → ∞ for some generic positive constants a1 = 4w2 and a2. By Condition (C5), ℓp(βs′) is concave in βs′, and (A.9) holds for any βs′ such that ‖βs′ − β∗s′‖ ≤ w1 n^{−τ1}.
For any s ∈ 𝒰, let β̂s′ be β̂s augmented with zeros corresponding to the elements in s′ \ s, where s′ = s ∪ s∗. By Condition (C3), the active predictors missed by any under-fitted model carry a signal bounded below by w1 n^{−τ1}, so ℓp(β̂s′) − ℓp(β̂s) cannot be negligible. Consequently,

Pr{ max_{s∈𝒰} ℓp(β̂s) < ℓp(β̂s∗) } → 1.

So, we have shown that Pr(ŝ ∈ 𝒰) → 0, i.e., Pr(s∗ ⊂ ŝ) → 1 as n → ∞. Therefore, the theorem is proved. □
References
- [1] Andersen PK, Gill RD, Cox's regression model for counting processes: A large sample study, Ann. Statist. 10 (1982) 1100–1120.
- [2] Azuma K, Weighted sums of certain dependent random variables, Tohoku Math. J. 19 (1967) 357–367.
- [3] de Boor C, A Practical Guide to Splines, Springer, New York, 1978.
- [4] Bradic J, Fan J, Jiang J, Regularization for Cox's proportional hazards model with NP-dimensionality, Ann. Statist. 39 (2011) 3092–3120.
- [5] Cheng M-Y, Honda T, Zhang J-T, Forward variable selection for sparse ultra-high dimensional varying-coefficient models, J. Amer. Statist. Assoc. 111 (2016) 1209–1221.
- [6] Chu W, Li R, Reimherr M, Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data, Ann. Appl. Statist. 10 (2016) 596–617.
- [7] Cox DR, Regression models and life tables (with discussion), J. R. Stat. Soc. Ser. B 34 (1972) 187–220.
- [8] Cox DR, Partial likelihood, Biometrika 62 (1975) 269–276.
- [9] Du P, Ma S, Liang H, Penalized variable selection procedure for Cox models with semiparametric relative risk, Ann. Statist. 38 (2010) 2092–2117.
- [10] Fan J, Feng Y, Song R, Nonparametric independence screening in sparse ultra-high-dimensional additive models, J. Amer. Statist. Assoc. 106 (2011) 544–557.
- [11] Fan J, Feng Y, Wu Y, High-dimensional variable selection for Cox's proportional hazards model, in: Borrowing Strength: Theory Powering Applications — A Festschrift for Lawrence D. Brown, IMS Collections 6, Inst. Math. Statist., Beachwood, OH, 2010, pp. 70–86.
- [12] Fan J, Lv J, Sure independence screening for ultrahigh dimensional feature space (with discussion), J. R. Stat. Soc. Ser. B 70 (2008) 849–911.
- [13] Fan J, Ma Y, Dai W, Nonparametric independence screening in sparse ultra-high dimensional varying-coefficient models, J. Amer. Statist. Assoc. 109 (2014) 1270–1284.
- [14] Hu Y, Liang H, Variable selection in a partially linear proportional hazards model with a diverging dimensionality, Statist. Probab. Lett. 83 (2013) 61–69.
- [15] Huang J, Sun T, Ying Z, Yu Y, Zhang C-H, Oracle inequalities for the LASSO in the Cox model, Ann. Statist. 41 (2013) 1142–1165.
- [16] Kim YH, Jeong DC, Pak K, Han M-E, Kim J-Y, Liangwen L, Kim HJ, Kim TW, Kim TH, Hyun DW, Oh S-O, SLC2A2 (GLUT2) as a novel prognostic factor for hepatocellular carcinoma, Oncotarget 8 (2017) 68381–68392.
- [17] Kong X-B, Liu Z, Yao Y, Zhou W, Sure screening by ranking the canonical correlations, Test 26 (2017) 46–70.
- [18] Leng C, Zhang H, Model selection in nonparametric hazard regression, J. Nonparametr. Stat. 18 (2006) 417–429.
- [19] Lian H, Li J, Hu Y, Shrinkage variable selection and estimation in proportional hazards models with additive structure and high dimensionality, Comput. Stat. Data Anal. 63 (2013) 99–112.
- [20] Liu J, Li R, Wu R, Feature selection for varying-coefficient models with ultrahigh-dimensional covariates, J. Amer. Statist. Assoc. 109 (2014) 266–274.
- [21] Liu W-B, Jia W-D, Ma J-L, Xu G-L, Zhou H-C, Peng Y, Wang W, Knockdown of GTPBP4 inhibits cell growth and survival in human hepatocellular carcinoma and its prognostic significance, Oncotarget 8 (2017) 93984–93997.
- [22] Song R, Yi F, Zou H, On varying-coefficient independence screening for high-dimensional varying-coefficient models, Stat. Sinica 24 (2014) 1735–1752.
- [23] Stone CJ, Optimal global rates of convergence for nonparametric regression, Ann. Statist. 10 (1982) 1040–1053.
- [24] Stone CJ, Additive regression and other nonparametric models, Ann. Statist. 13 (1985) 689–705.
- [25] Wei F, Huang J, Li H, Variable selection and estimation in high-dimensional varying-coefficient models, Stat. Sinica 21 (2011) 1515–1540.
- [26] Xia X, Yang H, Li J, Feature screening for generalized varying-coefficient models with application to dichotomous responses, Comput. Stat. Data Anal. 102 (2016) 85–97.
- [27] Xu C, Chen J, The sparse MLE for ultrahigh-dimensional feature screening, J. Amer. Statist. Assoc. 109 (2014) 1257–1269.
- [28] Yang G, Yu Y, Li R, Buu A, Feature screening in ultrahigh dimensional Cox's model, Stat. Sinica 26 (2016) 881–901.
- [29] Yan J, Huang J, Model selection for Cox models with time-varying coefficients, Biometrics 68 (2012) 419–428.
- [30] Zhang H, Lu W, Adaptive Lasso for Cox's proportional hazards model, Biometrika 94 (2007) 691–703.
- [31] Zhang W, Sun B, Impact of age on the survival of patients with liver cancer: An analysis of 27,255 patients in the SEER database, Oncotarget 6 (2015) 633–641.
- [32] Zhao S, Li Y, Principled sure independence screening for Cox models with ultra-high-dimensional covariates, J. Multivariate Anal. 105 (2012) 397–411.