Censored Rank Independence Screening for High-dimensional Survival Data

Rui Song; Wenbin Lu; Shuangge Ma; X Jessie Jeng

doi:10.1093/biomet/asu047

Author manuscript; available in PMC: 2015 Feb 5.

Published in final edited form as: Biometrika. 2014;101(4):799–814. doi: 10.1093/biomet/asu047


PMCID: PMC4318124 NIHMSID: NIHMS625927 PMID: 25663709

Summary

In modern statistical applications, the dimension of covariates can be much larger than the sample size. In the context of linear models, correlation screening (Fan and Lv, 2008) has been shown to reduce the dimension of such data effectively while achieving the sure screening property, i.e., all of the active variables can be retained with high probability. However, screening based on the Pearson correlation does not perform well when applied to contaminated covariates and/or censored outcomes. In this paper, we study censored rank independence screening of high-dimensional survival data. The proposed method is robust to predictors that contain outliers, works for a general class of survival models, and enjoys the sure screening property. Simulations and an analysis of real data demonstrate that the proposed method performs competitively on survival data sets of moderate size and high-dimensional predictors, even when these are contaminated.

Some key words: High-dimensional survival data, Rank independence screening, Sure screening property

1. Introduction

Our study was motivated by a breast cancer data set (van Houwelingen et al., 2006) that contains expression profiles of 24885 candidate genes for 295 patients with breast cancer. The primary interest was to find genes that are predictive of the overall survival time of breast cancer patients. In addition to the dimensionality being large, some predictors are not normally distributed and contain outliers; see the Supplementary Material. These phenomena are common in microarray data. High dimensionality and the existence of outliers make variable selection for censored survival data challenging.

There are numerous studies in the literature regarding variable selection for regression problems with and without censoring. Recently, many studies have focused on penalized methods, such as the lasso (Tibshirani, 1996), the smoothly clipped absolute deviation (Fan & Li, 2001), the Dantzig selector (Candes & Tao, 2007), and their variants. These methods have been thoroughly studied for variable selection with high-dimensional data (e.g., Bickel et al., 2009; Meinshausen & Yu, 2009; van de Geer, 2008). Studies of variable selection for survival outcomes include penalized partial likelihood (Fan & Li, 2002; Tibshirani, 1997; Zhang & Lu, 2007), penalized estimating equations (Johnson, 2008; Johnson et al., 2008), and other approaches that can be used for simultaneous variable selection and estimation. Generally, the associated optimization problems may be solved quickly for moderate to large p, such as p being hundreds or thousands. However, for very large p, such as is encountered in microarray data, these methods remain computationally demanding.

A computationally simple method for very high-dimensional data that can work well in practice is sure independence screening, which was demonstrated in the classical regression context in Fan & Lv (2008). In this method, the outcome variable is regressed on each covariate separately. Sure independence screening recruits the features that have the best marginal utility. In the context of least-squares regression for a linear model, this corresponds to the largest marginal absolute Pearson correlation between the response and the predictor. Fan & Lv (2008) showed that this method has a sure screening property: with probability very close to 1, the method can retain all of the important features in the model. It can also be derived from an empirical likelihood point of view (Hall et al., 2009). Correlation screening is a crude yet effective way to decrease the dimensionality of data. However, the Pearson correlation might not work well for censored survival data because it cannot be estimated reliably, especially when the censoring rate is high. In addition, its performance can be ruined by outliers in predictors because correlation is not a robust measure for association. Such outliers cause trouble for theoretical studies of screening methods, most of which require tail probability conditions for the covariates.

Variable screening methods for high-dimensional survival data are mostly based on the partial-likelihood of the Cox model. For example, Tibshirani (2009) used a lasso penalization approach for pre-screening. Zhao & Li (2012) proposed a screening method based on standardized marginal maximum partial likelihood estimators. However, in practice, the true models often remain unknown, and it is unclear if these methods will work well under model misspecification. Gorst-Rasmussen & Scheike (2013) proposed a model-free screening statistic: the feature aberration at survival times. For each covariate, this new statistic is equivalent to the numerator of the marginal log-rank test. These screening methods might be influenced by outliers in predictors.

In this paper, we propose a censored rank independence screening method for high-dimensional survival data. The rank statistic we consider can be viewed as an inverse probability-of-censoring weighted Kendall’s τ (Kendall, 1962). Our proposed method has several advantages. First, it is robust against the existence of outliers. This robustness is inherited from Kendall’s τ coefficient (Sen, 1968). Second, it is model-free, so it works for a wide class of survival models. In contrast to Pearson’s correlation, Kendall’s τ is invariant under monotonic transformations of responses and predictors. This invariance allows our method to discover any nonlinear relationships between the response and predictors. Third, the proposed method has technical improvements over some other high-dimensional methods, as the proposed screening utility is a U-statistic with a bounded kernel function, which enables us to obtain the sure screening property without requiring tail probability conditions.

2. Censored Rank Independence Screening

Let T denote the event time of interest, C denote the censoring time, and X = (X_(1), …, X_(p))′ denote the p-dimensional vector of covariates. Further, define V = min(T, C) and Δ = I(T ≤ C), where I(·) denotes the indicator function. The observed data are independent and identically distributed copies of W = (X, V, Δ) and are denoted by W_i = (X_i, V_i, Δ_i) for i = 1, …, n, where X_i = (X_1i, …, X_pi)′. Throughout the paper, it is assumed that the censoring time, C, is independent of the event time, T, and the covariates, X. Let 𝒜 = {1, …, p} and X_ℬ = {X_(j) : j ∈ ℬ} for a set ℬ ⊂ 𝒜. Let ℳ_⋆ denote the index set of the active variables:

$$\mathcal{M}_{\star} = \{k : \operatorname{pr}(T > t \mid X) \text{ functionally depends on } X_{(k)}\}.$$

Our goal is to select the set of active variables, X_{ℳ_⋆}, where ℳ_⋆ ⊂ 𝒜.

We consider the following inverse probability-of-censoring weighted marginal rank correlation utility,

$$\hat{\tau}_{k} = \binom{n}{2}^{-1} \sum_{i < j} \frac{\Delta_{j}}{\hat{S}^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) - \frac{1}{4}, \quad k = 1, \dots, p,$$

where Ŝ(·) is the Kaplan–Meier estimator of S(t) = pr(C ≥ t). We define 0/0 = 0 to make τ̂_k well-defined. For a prespecified γ_n, we select the set

$$\hat{\mathcal{M}} = \{k : |\hat{\tau}_{k}| \geq \gamma_{n}\}$$

as active variables. In this way, the dimension of the covariates used in the model can be reduced to a value much smaller than n.

Let τ_k = pr(X_ki > X_kj, T_i > T_j) − 1/4. It can be shown that

$$E\left\{\frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j})\right\} = \operatorname{pr}(X_{ki} > X_{kj}, T_{i} > T_{j}),$$

and it follows that τ̂_k provides a consistent estimate of τ_k. Without assuming any particular model structure, such as the proportional hazards model, the set selected by the proposed censored rank independence screening comprises the variables that have strong marginal rank correlation with the failure time. In the next section we show that the proposed method enjoys the sure screening property under general conditions.
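The screening utility above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `censoring_survival`, `cris_statistics`, and `screen` are hypothetical, the Kaplan–Meier estimate is taken in its left-continuous form to match S(t) = pr(C ≥ t), and tied observation times are handled naively on the assumption that the times are continuous.

```python
import numpy as np


def censoring_survival(v, delta):
    """Kaplan-Meier estimate of S(t) = pr(C >= t), evaluated at each v[j].

    Censoring (delta == 0) is treated as the 'event'; the product runs over
    observed times strictly before t, giving the left-continuous version.
    """
    v = np.asarray(v, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(v)  # sorted distinct observed times
    at_risk = np.array([(v >= t).sum() for t in times])
    n_cens = np.array([((v == t) & (delta == 0)).sum() for t in times])
    factors = 1.0 - n_cens / at_risk
    surv_left = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return surv_left[np.searchsorted(times, v)]


def cris_statistics(x, v, delta):
    """Return tau_hat_k for every column k of the n-by-p matrix x."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n, p = x.shape
    s = censoring_survival(v, delta)
    # Weight Delta_j / S_hat(V_j)^2, with the convention 0/0 = 0.
    w = np.zeros(n)
    ok = (delta == 1) & (s > 0)
    w[ok] = 1.0 / s[ok] ** 2
    # Pairs i < j with V_i > V_j, each carrying the weight of subject j.
    i_lt_j = np.triu(np.ones((n, n), dtype=bool), k=1)
    base = ((v[:, None] > v[None, :]) & i_lt_j) * w[None, :]
    n_pairs = n * (n - 1) / 2.0
    tau = np.empty(p)
    for k in range(p):
        xk = x[:, k]
        tau[k] = (base * (xk[:, None] > xk[None, :])).sum() / n_pairs - 0.25
    return tau


def screen(tau, gamma):
    """Indices k with |tau_hat_k| >= gamma."""
    return np.flatnonzero(np.abs(tau) >= gamma)
```

Each covariate costs O(n²) operations here, which is adequate for sample sizes in the hundreds; the pairwise comparison of observation times is computed once and reused across covariates.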

3. Sure Screening Property

Let {C_i, V_i, X_ki} (i = 1, 2) be independent and identically distributed copies of {C, V, X_(k)}, for k = 1, …, p. The following conditions are required:

Condition A. There exists a ν > 0 such that pr(C = ν) > 0 and pr(C > ν) = 0.
Condition B. min_{k∈ℳ_⋆} |pr(X_k1 > X_k2, T₁ > T₂) − 1/4| ≥ c₀n^{−κ} for some 0 < κ < 1/2 and c₀ > 0.

Condition A, adopted from Peng & Fine (2009), is a technical condition that simplifies the derivation of asymptotic properties. Because Condition A is satisfied in many clinical settings, it is widely used in the literature. Condition B is a key assumption to ensure the sure screening property, even without assuming specific model forms. This indicates that to ensure the sure screening property, the minimal marginal rank correlation between the active variables and the response variable should exceed a certain threshold.

Theorem 1. Under Condition A, for any positive constants c₅ ≤ c₆, when $n > \max[D^{2}(1 - \delta)^{-2}\{(3/2)^{1/2} - 1\}^{-2}\|S\|_{\infty}^{-2},\; 25D^{2}c_{5}^{2}n^{2\kappa}(1 - \delta)^{-2}\|S\|_{\infty}^{-2},\; (c_{5}/1.12)^{1/\kappa}]$, there exist constants c₁, c₂, and c₄ such that

$$\operatorname{pr}\Big(\max_{1 \leq k \leq p} |\hat{\tau}_{k} - \tau_{k}| > c_{6}n^{-\kappa}\Big) \leq p\{2.5n\exp(-c_{1}n) + 4\exp(-c_{4}n^{1 - 2\kappa}) + 2.5n\exp(-c_{2}n^{1 - 2\kappa})\},$$

where ‖ · ‖_∞ is the L_∞ norm, and D is a constant introduced in Lemma 1 of the Appendix. Moreover, when Condition B holds, taking γ_n = c₇n^{−κ} with c₇ ≤ c₀/2 leads to

$$\operatorname{pr}(\mathcal{M}_{\star} \subset \hat{\mathcal{M}}) \geq 1 - s\{2.5n\exp(-c_{1}n) + 4\exp(-c_{4}n^{1 - 2\kappa}) + 2.5n\exp(-c_{2}n^{1 - 2\kappa})\},$$

where s is the number of variables in ℳ_⋆.

The first result of Theorem 1 leads to the conditions whereby the sure screening property of our method is ensured. Specifically, as n goes to infinity, the maximum dimensionality is p = o{exp(n^{1−2κ})}, for κ ∈ (0, 1/2). This limit is of the same order as that obtained in Fan & Lv (2008) for correlation learning in the linear model set-up, and it is stronger than the result obtained in Fan et al. (2010). Because no tail probability conditions for covariates are needed, the conditions for the sure screening property are more relaxed than those in Fan & Lv (2008) and Fan et al. (2010). Therefore, our method generally allows heavy tailed covariates. Moreover, the method is robust to model misspecification because no model assumptions are required for the sure screening property to hold. In the next section, we apply the proposed method to a general class of transformation models, under which a set of sufficient conditions for showing Condition B will be provided and the size of the set ℳ̂ can be controlled.

The threshold, γ_n, controls how many covariates pass screening. To ensure the sure screening property, γ_n can be taken as any value that is smaller than the minimum signal, provided the minimum signal is distinguishable from the estimation noise. Model selection consistency, i.e., pr(ℳ̂ = ℳ_⋆) = 1 − o(1), can be achieved if there is a gap between signal variables and noise variables. In our case, a sufficient condition for model selection consistency is that X_{ℳ_⋆} and $X_{ℳ_{⋆}^{c}}$ are independent. Then

$$\operatorname{pr}(T_{1} > T_{2}, X_{j1} > X_{j2}) = E[E\{1(T_{1} > T_{2}, X_{j1} > X_{j2}) \mid X_{\mathcal{M}_{\star}}\}] = 1/4, \quad j \in \mathcal{M}_{\star}^{c}.$$
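The value 1/4 in the display above can be traced through one intermediate step. The sketch below assumes, in addition to the stated independence, that T and X_(j) are continuous so that ties have probability zero, and it uses the definition of ℳ_⋆, under which T depends on the covariates only through the active ones. Conditioning is on the active covariates of both subjects, so the inactive pair (X_j1, X_j2) is independent of (T₁, T₂):

$$\operatorname{pr}(T_{1} > T_{2}, X_{j1} > X_{j2}) = E\{\operatorname{pr}(T_{1} > T_{2} \mid X_{\mathcal{M}_{\star}})\}\,\operatorname{pr}(X_{j1} > X_{j2}) = \operatorname{pr}(T_{1} > T_{2}) \times \frac{1}{2} = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}.$$

Hence τ_j = 0 for every inactive variable, while Condition B keeps |τ_k| at least c₀n^{−κ} for every active variable; this is the gap between signal and noise variables referred to above.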

4. Selection of the important set

In applications, it is common practice to select a prefixed number of top-ranked variables for follow-up study. The prefixed number may reflect researchers’ prior knowledge of the number of susceptible predictors, or budget limitations (Kuo & Zaykin, 2011; Skol et al., 2006). Another commonly used procedure is to set the size of ℳ̂ to a number less than the sample size, so that follow-up regression analysis can be performed in a p < n scenario (Fan & Lv, 2008). Data-driven procedures for selecting the size of the important set based on screening statistics are appealing but relatively limited. Zhao & Li (2012) proposed a principled selection method based on controlling the false positive rate, but it can be conservative for screening purposes because controlling the false positive rate at a low level can lead to a large false negative error.

We propose to estimate the size of the important set using a technique developed in the multiple testing literature. Specifically, consider the hypotheses H_0k : τ_k = 0 and H_ak : τ_k ≠ 0 (k = 1, …, p). Under H_0k, we can show that n^{1/2}τ̂_k converges in distribution to a mean-zero normal random variable, and its asymptotic variance can be consistently estimated using U-statistic techniques similar to those studied in Fine et al. (1998). Let $\hat{\sigma}_{k}^{2}$ denote the estimated asymptotic variance of τ̂_k. Then, the p-value for testing H_0k can be computed as q_k = 2{1 − Φ(|τ̂_k|/σ̂_k)}, where Φ(·) is the standard normal cumulative distribution function. Order the p-values as q_(1) ≤ ⋯ ≤ q_(p). Let |𝒜| denote the size of a set 𝒜. The proportion of true signals is π = |ℳ_⋆|/p. For a large number of independently tested hypotheses, Meinshausen et al. (2006) showed that π can be consistently estimated by

$$\hat{\pi} = \max_{i = 1, \dots, p/2} \frac{i/p - q_{(i)} - (2\log\log p/p)^{1/2}\{q_{(i)}(1 - q_{(i)})\}^{1/2}}{1 - q_{(i)}}. \qquad (1)$$

However, for general dependent test statistics, such as our proposed censored rank screening statistics τ̂_k’s, the consistency of π̂ is usually unclear. In this paper, we use π̂ as an estimator of π and set |ℳ̂| = π̂p. We study the empirical performance of this estimator in Section 6.
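Estimator (1) and the p-values that feed it are straightforward to compute. The following Python sketch is illustrative only: the function names are hypothetical, the variance estimates σ̂_k are taken as given inputs rather than computed, and for odd p the maximum is taken over i = 1, …, ⌊p/2⌋, which is an assumption about how p/2 should be read.

```python
from math import erfc, sqrt

import numpy as np


def two_sided_pvalues(tau_hat, sigma_hat):
    """q_k = 2{1 - Phi(|tau_hat_k| / sigma_hat_k)}, computed via erfc."""
    z = np.abs(np.atleast_1d(tau_hat).astype(float))
    z = z / np.atleast_1d(sigma_hat).astype(float)
    return np.array([erfc(zi / sqrt(2.0)) for zi in z])


def estimate_signal_proportion(pvalues):
    """Evaluate estimator (1) on a vector of p-values."""
    q = np.sort(np.asarray(pvalues, dtype=float))
    p = q.size
    if p < 3:
        raise ValueError("need at least 3 p-values so that log log p > 0")
    half = p // 2
    i = np.arange(1, half + 1)
    qi = q[:half]
    if np.any(qi >= 1.0):
        raise ValueError("the smallest p/2 p-values must be below 1")
    bound = sqrt(2.0 * np.log(np.log(p)) / p) * np.sqrt(qi * (1.0 - qi))
    return float(np.max((i / p - qi - bound) / (1.0 - qi)))
```

The size of the selected set would then be taken as π̂p rounded to an integer, for example `int(round(max(pi_hat, 0.0) * p))`; truncating a negative π̂ at zero is a choice made here, not one stated in the text.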

5. Application to a general class of transformation models

Although the sure screening property of our method does not depend on the specific modeling form, the active set, ℳ_⋆, is not easily specified without assuming a model structure. To benefit from aspects of both the model-based and the model-free approaches, it is helpful to consider a wide class of models that contains the underlying true model. Here, we consider a general class of transformation models, under which the active set, ℳ_⋆, can be easily specified, and the sure screening property will hold.

Specifically, the general class of transformation models is given by

$$H(T) = m(X) + \varepsilon, \qquad (2)$$

where H(·) is an increasing transformation function, m(·) is monotone in each element of X, and ε is independent of X and has a continuous distribution function. Under model (2), the conditional survival function takes the form

$$\operatorname{pr}(T > t \mid X) = S_{\varepsilon}\{H(t) - m(X)\}, \qquad (3)$$

where S_ε(·) is the survival function of ε. This class of transformation models includes many popular survival models as special cases. For example, when H(·) is unknown, S_ε(·) is specified, and m(X) = β′X, model (2) becomes the linear transformation model (Clayton & Cuzick, 1985). This model includes the proportional hazards and proportional odds models as special cases. When H is the log transformation, S_ε(·) is unspecified, and m(X) = β′X, model (2) becomes the accelerated failure time model (Kalbfleisch & Prentice, 2002). Other examples of transformation models include the odds-rate, inverse Gaussian and log-normal families (Kosorok et al., 2004; Scharfstein et al., 1998).
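The step from model (2) to the survival function (3) is short enough to spell out. The sketch below assumes that H is strictly increasing and uses the independence of ε and X:

$$\operatorname{pr}(T > t \mid X) = \operatorname{pr}\{H(T) > H(t) \mid X\} = \operatorname{pr}\{\varepsilon > H(t) - m(X) \mid X\} = S_{\varepsilon}\{H(t) - m(X)\}.$$

As an illustration of a special case, suppose ε follows the extreme value distribution with survival function S_ε(u) = exp(−e^u), and write Λ(t | X) for the conditional cumulative hazard, a symbol introduced here only for this example. Then

$$\operatorname{pr}(T > t \mid X) = \exp\{-e^{H(t)}e^{-m(X)}\}, \qquad \Lambda(t \mid X) = e^{H(t)}\exp\{-m(X)\},$$

so the covariates act multiplicatively on the baseline cumulative hazard e^{H(t)}, which is the proportional hazards form.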

For transformation models with m(X) = β′X, it is clear that ℳ_⋆ = {j ∈ 𝒜 : β_j ≠ 0}, where β = (β₁, …, β_p)′. In general, ℳ_⋆ can be defined as the smallest subset of 𝒜 such that m(·) is only a function of covariates in ℳ_⋆, i.e., the transformation models can be equivalently written as H(T) = m(X_{ℳ_⋆}) + ε. Define s_n = |ℳ_⋆|, the number of active variables in ℳ_⋆, and $ℳ_{⋆}^{c} = 𝒜 \setminus ℳ_{⋆}$.

For k ∈ ℳ_⋆, define m_k(x) = E{m(x, X_{ℳ_⋆∖{k}})}, where the expectation is taken with respect to the joint distribution of covariates in ℳ_⋆∖{k} with X_(k) fixed at x. Without loss of generality, X_(k) is assumed to have mean 0 and variance 1. The following conditions are sufficient to show Condition B for a general class of transformation models.

Condition C1. For any k ∈ ℳ_⋆, the conditional density function of H(T₁) − H(T₂) − {m_k(X_k1) − m_k(X_k2)} given m_k(X_k1) − m_k(X_k2) is unimodal and symmetric around zero.
Condition C2. For any k ∈ ℳ_⋆, there exist positive constants σ₁ and σ₂ such that the variance of m_k(X_k1) is uniformly bounded above by $σ_{1}^{2}$, and the conditional variance of H(T₁) − H(T₂) − {m_k(X_k1) − m_k(X_k2)} given m_k(X_k1) − m_k(X_k2) is uniformly bounded above by $σ_{2}^{2}$.
Condition C3. For any k ∈ ℳ_⋆, there exists a positive constant, d₀, that is independent of p such that min_k E{|m_k(X_(k)) − Em_k(X_(k))|} ≥ d₀n^−κ/2 for 0 < κ < 1/2.

Proposition 1. If Conditions C1–C3 hold, then Condition B holds for some c₀ > 0.

As m(·) is monotone in each element of ℳ_⋆, m_k(x) is monotone in x. As a marginal projection of m(·) onto the univariate dimension of X_k, m_k(x) is utilized as a parsimonious way to pass along the monotonicity from the joint model to the marginal model. As technical conditions, C1 and C2 can be checked empirically. Condition C3 states that the least absolute deviation of m_k(X_(k)) in the active set, ℳ_⋆, can serve as the measurement of detectable signals for transformation models.

Next, we show that the size of set ℳ̂ can be controlled for linear transformation models with m(X) = β′X. This result is similar in rationale to the result of Theorem 5 in Fan and Song (2010). The following condition, C4, is needed.

Condition C4. For k ∈ ℳ_⋆, the conditional density function of H(T₁) − H(T₂) − (X_k1 − X_k2)EX_(k)Y given X_k1 − X_k2 is unimodal and symmetric around zero.

Theorem 2. Under Conditions A and C1–C4, when var{H(T)} = O(1), for γ_n = c₇n^{−κ}, there exist positive constants c₁, c₂, and c₄ such that

$$\operatorname{pr}[|\hat{\mathcal{M}}| \leq O\{\gamma_{n}^{-2}\lambda_{\max}(\Sigma)\}] \geq 1 - p\{2.5n\exp(-c_{1}n) + 4\exp(-c_{4}n^{1 - 2\kappa}) + 2.5n\exp(-c_{2}n^{1 - 2\kappa})\},$$

where Σ is the covariance matrix of X and λ_max(Σ) is its largest eigenvalue.

Taking γ_n as in Theorem 1, if the largest eigenvalue of the covariance matrix of X grows at most polynomially, i.e., λ_max(Σ) = O(n^τ) for some τ > 0, then by Theorem 2 the size of ℳ̂ is also of polynomial order, O(n^{2κ+τ}). This indicates that the size of the selected set can indeed be effectively controlled.

6. Simulation Studies

We conducted simulations to evaluate the empirical performance of the proposed censored rank independence screening method. For comparison, we considered three alternative methods: feature aberration at survival times screening (Gorst-Rasmussen & Scheike, 2013), partial likelihood ratio screening, and correlation screening. For partial likelihood ratio screening, we fit a marginal Cox model for each covariate and constructed the corresponding partial likelihood ratio statistic versus the no covariates model. This method is asymptotically equivalent to the screening method proposed by Zhao & Li (2012). For correlation screening, we used uncensored data to compute the marginal correlation between event time and covariate using an inverse probability-of-censoring weighted method. This generalizes standard sure independence screening for linear regression to survival data.

The failure time, T_i, was generated from the class of linear transformation models

$$H(T_{i}) = -\beta' X_{i} + \varepsilon_{i}, \quad i = 1, \dots, n,$$

where H(t) = log{0.5(e^{2t} − 1)}, and X_i is a p-dimensional vector of covariates. We set n = 100, 300 and p = 5000, 10000. The covariates, X_i, were generated from a multivariate normal distribution with mean 0, variance 1, and a first order autoregressive structure, i.e., corr(X_ij, X_ik) = 0.5^{|j−k|} (j, k = 1, …, p). We considered two scenarios for the true regression coefficients: Scenario 1, $\beta = (-1, -0.9, 0_{6}', 0.8, 1.0, 0_{p - 10}')'$, and Scenario 2,

$$\beta = (-1, -0.9, 0.5, 0.8, 0.6, 0_{5}', 0.3, 0.7, -0.8, -0.5, -1.0, 0_{5}', -0.3, 0.6, 0.8, -0.7, -0.9, 0_{p - 25}')'.$$

Scenario 2 is more challenging than Scenario 1 because there are several active variables with relatively small effects. We considered three error distributions: the standard extreme value distribution, which corresponds to a proportional hazards model; the standard logistic distribution, which corresponds to a proportional odds model; and the standard normal distribution, which corresponds to a normal transformation model. The censoring time was generated from a uniform distribution on [0, c], where the constant, c, was chosen to achieve censoring proportions of 15% and 40%. For each setting, we conducted 100 simulation runs. The simulation results for normal covariates are given in the Supplementary Material. Based on the results, the selection performances of the proposed method and partial likelihood ratio screening were comparable for the Scenario 1 settings and for the 15% censoring rate setting in Scenario 2. The performance of partial likelihood ratio screening was slightly better than the proposed method for Scenario 2 when the censoring rate was high, i.e., 40%. Generally, correlation screening performed poorly relative to both the proposed method and partial likelihood ratio screening. Correlation screening became very poor for Scenario 2 when the censoring rate was high.
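The data-generating mechanism for the proportional hazards setting can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the extreme value error is generated as the logarithm of a unit exponential variable (survival function exp(−e^u)), and the upper limit `c` of the uniform censoring distribution is left as an input because the values used to reach 15% and 40% censoring are not reported. Failure times are obtained by inverting H, i.e., T = 0.5 log(1 + 2e^u) with u = −β′X + ε.

```python
import numpy as np


def h_inverse(u):
    """Inverse of H(t) = log{0.5 (exp(2t) - 1)}."""
    return 0.5 * np.log1p(2.0 * np.exp(u))


def ar1_normal(n, p, rho, rng):
    """n draws of a p-vector with unit variances and corr rho**|j - k|."""
    z = rng.standard_normal((n, p))
    x = np.empty((n, p))
    x[:, 0] = z[:, 0]
    scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, p):
        x[:, j] = rho * x[:, j - 1] + scale * z[:, j]
    return x


def simulate_ph(n, p, beta, c, rng):
    """Simulate (X, V, Delta) from H(T) = -beta'X + eps with uniform censoring."""
    x = ar1_normal(n, p, 0.5, rng)
    eps = np.log(rng.exponential(size=n))  # extreme value error
    t = h_inverse(-x @ beta + eps)
    cens = rng.uniform(0.0, c, size=n)     # censoring time on [0, c]
    v = np.minimum(t, cens)
    delta = (t <= cens).astype(int)
    return x, v, delta
```

In practice `c` would be tuned, for instance by a grid search on a large simulated sample, until the proportion of zeros in `delta` matches the target censoring rate.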

To study the performances when the covariates might be contaminated by outliers, we added outliers to the covariates. All other settings were unchanged. Specifically, with a probability of 0.1, each covariate was replaced by a random variable generated from a t distribution. Again, we conducted 100 simulation runs. For Scenario 1, we report the average number of active variables contained in the top 4, 10, 20, 30, 40, and 50 selected variables, denoted by true positive; the true number was 4. For Scenario 2, we report the corresponding number in the top 15, 30, 45, 60, 75, 90, 120, and 150 selected variables, and the true number was 15. The results for the proportional hazards model under Scenarios 1 and 2 are summarized in Figures 1 and 2, respectively. Results for the proportional odds model are similar, and are provided in the Supplementary Material.

Fig. 1. Average numbers of active variables contained in the top 4, 10, 20, 30, 40, and 50 selected variables for the proportional hazards model under Scenario 1. Solid circles denote the results obtained using the proposed method; triangles denote feature aberration at survival times screening; squares denote partial likelihood ratio screening; and diamonds denote correlation screening. For each method, the upper line is for the case of 15% censoring, and the lower line is for the case of 40% censoring.

Fig. 2. Average numbers of active variables contained in the top 15, 30, 45, 60, 75, 90, 120, and 150 selected variables for the proportional hazards model under Scenario 2. Solid circles denote the results obtained using the proposed method; triangles denote feature aberration at survival times screening; squares denote partial likelihood ratio screening; and diamonds denote correlation screening. For each method, the upper line is for the case of 15% censoring, and the lower line is for the case of 40% censoring.

Under all settings, the selection performance of the proposed method was very similar to that when the covariates did not contain outliers, so it is robust to outliers in covariates. In addition, the selection performance improved when the sample size increased and the censoring rate decreased, though the performances when p = 5000 and p = 10000 were similar. For most settings, the performance of the proposed method was superior to both partial likelihood ratio screening and the feature aberration at survival times statistic. However, these three methods had comparable performances when n = 100 and the censoring rate was 40% in Scenario 2. Generally, partial likelihood ratio screening and feature aberration at survival times screening had comparable performances. In addition, the censoring proportion had less effect on their performances than on the performance of the proposed method. This behavior is expected because the proposed method uses the inverse probability-of-censoring weighted technique to address censoring, which might lose some efficiency when the censoring proportion is high. As was the case for normal covariates, correlation screening had the poorest selection performance.

Next, we examine the performance of the method proposed in Section 4 for selecting the number of important predictors. We considered n = 100 and 300, and p = 5000. The average numbers of selected predictors over 100 simulations are given in Table 1. The average numbers of selected predictors are much larger for Scenario 2 and censoring rate of 40%, which is expected because the signal is much weaker under Scenario 2 and the censoring rate is higher. In addition, with n = 300, the selected sets cover all the true signals in almost all the simulation runs for Scenario 1 under both models and censoring rates, while for Scenario 2, they cover almost half of the true signals on average. For smaller sample size of 100, the average numbers of selected true signals decrease, and the magnitude of decrease is relatively large for 40% censoring rate and Scenario 2.

Table 1. Average numbers of important selected predictors. Entries are N.sel (SD).

Model	Scenario	n = 100, CP = 15%	n = 100, CP = 40%	n = 300, CP = 15%	n = 300, CP = 40%
PH	1	13.8 (4.8)	28.1 (8.9)	7.4 (3.0)	28.7 (10.2)
PH	2	31.3 (16.4)	116.7 (80.5)	9.6 (6.5)	101.6 (79.2)
PO	1	11.6 (4.7)	26.1 (7.8)	7.5 (3.3)	23.1 (10.6)
PO	2	45.0 (15.6)	125.4 (87.3)	25.0 (8.0)	129.8 (100.4)


CP denotes censoring proportion; N.sel denotes the average number of selected important predictors over 100 simulations with the number in parenthesis being the standard deviation over 100 simulations; PH denotes the proportional hazards model; PO denotes the proportional odds model.

Although the independent censoring condition was imposed for theoretical development, it can be relaxed in practice. We conducted simulations using a censoring distribution that depended on covariates. Specifically, the censoring times were generated from an exponential distribution with mean c exp(X₁ − X₈), with c chosen to achieve censoring rates of 15% and 40%. All other settings were unchanged from the previous simulations. Here, we only considered the proportional hazards model under Scenario 2 with n = 100, 300 and p = 5000. The simulation results are given in the Supplementary Material. Although the Kaplan–Meier estimator is not consistent for the survival distribution of the censoring time, the proposed method continued to perform competitively in this limited simulation study. In addition, we conducted simulations to examine the performance of the proposed and competing methods under a censoring rate of 70%. The simulation results are given in the Supplementary Material. In summary, under heavy censoring the proposed method remained comparable to, though slightly worse than, partial likelihood ratio screening and feature aberration at survival times screening. This behavior is expected because the proposed method uses the inverse probability-of-censoring weighted technique to address censoring, which might lose some efficiency when the censoring proportion is high.

7. Application to breast cancer data

We applied our proposed rank independence screening method to the analysis of survival from a breast cancer study (van Houwelingen et al., 2006), with 295 female patients with primary invasive breast carcinoma. For each patient, the expressions of 24885 genes were profiled on cDNA arrays from all tumors. A set of 4919 candidate genes were selected after initial screening using the Rosetta error model (van’t Veer et al., 2002). The primary endpoint of interest was the overall survival time. Of the 295 patients, 216 had censored responses, giving a 73% censoring rate. A main goal of the study was to identify genes that are associated with the overall survival of breast cancer patients. As discussed in §1, gene expression profiles commonly contain outliers. For the breast cancer data, we identified potential outliers in the gene expressions. Specifically, for the jth gene of subject i, we calculated a modified z-statistic: z_ij = 0.6745|X_ji − m_j|/ν_j, where m_j and ν_j are the median and median absolute deviation of the jth gene expression profiles over the 295 subjects. If z_ij > 3.5, the data point was claimed to be an outlier. This criterion was suggested by Iglewicz & Hoaglin (1993) for outlier detection and has been widely used in the literature. Based on this rule, of the 4919 genes, 3488 contained at least 1 outlier, 582 contained at least 10 outliers, and 58 contained at least 30 outliers.
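The outlier rule described above is simple to reproduce. The sketch below is a minimal illustration, assuming the expression data are stored as an n-by-p array with one column per gene; the function name `modified_z_outliers` is hypothetical, and genes whose median absolute deviation is zero are treated as having no outliers, a convention chosen here and not stated in the text.

```python
import numpy as np


def modified_z_outliers(x, cutoff=3.5):
    """Flag entries whose modified z-statistic exceeds the cutoff.

    x is an n-by-p array (subjects in rows, genes in columns). For gene j,
    z_ij = 0.6745 * |x_ij - median_j| / MAD_j.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0)
    z = np.zeros_like(x)
    pos = mad > 0  # columns with MAD == 0 are left with z = 0 (assumption)
    z[:, pos] = 0.6745 * np.abs(x[:, pos] - med[pos]) / mad[pos]
    return z > cutoff
```

With `flags = modified_z_outliers(x)`, the number of genes containing at least m outliers is `(flags.sum(axis=0) >= m).sum()`.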

To check the covariate-independent censoring assumption, we fit a marginal Cox proportional hazards model for each individual predictor. Out of 4919 genes, 368 genes have significant coefficients, with p-values less than 0.05. After Bonferroni correction, only one gene is significantly related to censoring times. For the 368 significant genes, we replaced the Kaplan–Meier estimator of the censoring survival function by the estimated conditional survival function from the fitted proportional hazards model and recalculated the corresponding screening statistics. We found that the rankings of the screening statistics are nearly unchanged. Therefore, in our data application, we used the Kaplan–Meier estimator for the censoring survival function.

We used the proposed method to analyze both the original data and the data with outliers removed. For comparison, we also include the selection results obtained using partial likelihood ratio screening and correlation screening. We used our method to estimate the number of important predictors based on the original data. The estimated size of the important set is 492. We report the symbols of the top 20 selected genes in Table 2. For the original data, none of the top 20 genes selected by correlation screening were selected by either partial likelihood ratio screening or the proposed method. However, several genes were selected by both partial likelihood ratio screening and the proposed method: 3 of the top 10 genes and 13 of the top 20 genes. For the data with outliers removed, only 1 of the top 20 genes selected by correlation screening was also selected by partial likelihood ratio screening. For each method, we compared the top 20 genes selected from the original data to the top 20 selected genes obtained from the data with outliers removed. The two sets of genes selected using partial likelihood ratio screening were completely different. However, the two sets selected by the proposed method had 18 genes in common, with a similar order. These results imply that partial likelihood ratio screening and the proposed method might give more reliable selection results than correlation screening; and the proposed method is robust to outliers in covariates, but partial likelihood ratio screening is not.

Table 2. Top 20 selected genes for breast cancer data.

Order	CS (W)	CS (WO)	PLRS (W)	PLRS (WO)	CRIS (W)	CRIS (WO)
1	NM_007117	NM_005310	NM_018410	NM_001942	Contig38288_RC	Contig38288_RC
2	AL137517	NM_001871	NM_006607	NM_000509	Contig57584_RC	Contig57584_RC
3	U22029	AB037821	D38553	NM_004496	NM_001168	NM_001168
4	NM_000764	NM_007117	D14678	NM_016267	Contig48270_RC	NM_003258
5	NM_012067	NM_004291	NM_004701	Contig45588_RC	NM_003258	Contig48270_RC
6	NM_001819	NM_000764	Contig57584_RC	NM_006121	NM_006623	NM_006623
7	M33318	NM_001819	NM_003981	Contig44191_RC	NM_001333	NM_006607
8	NM_001871	U22029	D43950	NM_000423	NM_006607	NM_006739
9	NM_004291	M33318	NM_016359	Contig51687_RC	NM_006096	Contig31288_RC
10	NM_005310	NM_012242	NM_001168	Contig27882_RC	Contig31288_RC	NM_006096
11	NM_001741	NM_001327	U96131	NM_002590	NM_004701	NM_004701
12	NM_005982	Contig50979_RC	NM_001333	NM_001844	M96577	M96577
13	NM_016569	Contig21930_RC	Contig38288_RC	NM_000427	NM_004217	NM_001333
14	NM_000439	NM_002809	Contig31288_RC	NM_005310	NM_016359	NM_004217
15	Contig50979_RC	NM_000735	NM_001809	NM_003360	U96131	NM_016359
16	AF070536	NM_001741	NM_004217	NM_014211	NM_004456	NM_007019
17	AF131851	NM_020411	M96577	NM_006536	U74612	U96131
18	NM_000363	AJ275978	NM_001605	Contig53968_RC	NM_007019	NM_003981
19	Contig49589_RC	NM_004988	NM_005733	Contig43195_RC	NM_003981	NM_006579
20	Contig32563_RC	Contig42751_RC	NM_007019	NM_001453	NM_006579	Contig41413_RC

CS denotes correlation screening; PLRS denotes partial likelihood ratio screening; CRIS denotes proposed censored rank independence screening; W denotes the original data; and WO denotes the data with outliers removed.
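The overlap counts quoted in the text can be recomputed directly from Table 2. The sketch below does this for the two CRIS columns only; the helper name `overlap` is introduced here for illustration, and the gene identifiers are copied from the table.

```python
# Top 20 genes selected by CRIS, copied from Table 2 (W = original data,
# WO = data with outliers removed).
cris_w = [
    "Contig38288_RC", "Contig57584_RC", "NM_001168", "Contig48270_RC",
    "NM_003258", "NM_006623", "NM_001333", "NM_006607", "NM_006096",
    "Contig31288_RC", "NM_004701", "M96577", "NM_004217", "NM_016359",
    "U96131", "NM_004456", "U74612", "NM_007019", "NM_003981", "NM_006579",
]
cris_wo = [
    "Contig38288_RC", "Contig57584_RC", "NM_001168", "NM_003258",
    "Contig48270_RC", "NM_006623", "NM_006607", "NM_006739",
    "Contig31288_RC", "NM_006096", "NM_004701", "M96577", "NM_001333",
    "NM_004217", "NM_016359", "NM_007019", "U96131", "NM_003981",
    "NM_006579", "Contig41413_RC",
]


def overlap(a, b, k):
    """Number of identifiers shared by the first k entries of two ranked lists."""
    return len(set(a[:k]) & set(b[:k]))


print(overlap(cris_w, cris_wo, 20))
```

For the CRIS columns this gives the 18 shared genes reported in the text; the same helper can be applied to any other pair of columns.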

8. Discussion

In our method, censoring times are assumed to be independent of failure times and predictors, which may be too restrictive in some applications. This assumption can be relaxed to a certain extent. For example, as considered in He et al. (2013) for quantile-adaptive variable screening, we may assume that T_i and C_i are conditionally independent given a single predictor. Then, the survival function, S(t), of censoring times can be replaced by the conditional survival function S(t | X_ij) = pr(C_i ≥ t | X_ij), which can be consistently estimated by the local Kaplan–Meier estimator (Gonzalez-Manteiga & Cadarso-Suarez, 1994). Alternatively, we may build a semi-parametric survival model for censoring times, for example, a proportional hazards model with lasso selection of important predictors, and compute the model-based conditional survival function for censoring times. The sure screening property of the associated statistics needs to be further investigated.
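As a rough illustration of the first relaxation, a kernel-weighted (local) Kaplan–Meier estimate of the conditional censoring survival function S(t | x) = pr(C ≥ t | x) might be sketched as below. This is not the authors' implementation: the Gaussian kernel, the `bandwidth` argument, the function name, and the simplification that observed times have no ties are all assumptions made here for illustration.

```python
import numpy as np


def local_km_censoring_survival(t, x0, X, V, delta, bandwidth):
    """Kernel-weighted product-limit estimate of pr(C >= t | X = x0).

    X: single predictor, V: observed times, delta: 1 = failure, 0 = censored.
    Censored observations (delta == 0) are the "events" for this estimator.
    Assumes no tied observed times (illustrative simplification).
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    delta = np.asarray(delta, dtype=int)
    # Gaussian kernel weights centred at x0 (hypothetical choice of kernel).
    k = np.exp(-0.5 * ((X - x0) / bandwidth) ** 2)
    w = k / k.sum()
    surv = 1.0
    for l in np.argsort(V):
        if V[l] >= t:          # only censoring times strictly before t count
            break
        if delta[l] == 0:
            at_risk = w[V >= V[l]].sum()   # total weight still at risk
            surv *= 1.0 - w[l] / at_risk
    return surv


# With a very large bandwidth the weights are nearly equal, so the estimate
# reduces to the ordinary Kaplan-Meier estimate for the censoring times.
print(local_km_censoring_survival(3.0, 0.5, [0.0, 0.5, 1.0],
                                  [1.0, 2.0, 3.0], [1, 0, 1], bandwidth=1e6))
```

Whether the sure screening property survives this substitution is, as noted above, an open question; the sketch only shows how the plug-in quantity could be computed.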

Supplementary Material

Web Appendix

NIHMS625927-supplement-Web_Appendix.pdf (245.4KB, pdf)

Acknowledgement

We would like to thank the editor, an associate editor and three referees for very insightful comments. We also thank Shannon Holloway for careful reading of the manuscript. R. S.’s research was supported by the National Science Foundation. W. L.’s research was supported by the National Cancer Institute. S. M.’s research was supported by the National Institutes of Health.

Appendix

Proof of the sure screening property

The following lemmas are used to prove the sure screening property of τ̂_k.

Lemma 1. (Bitouze et al., 1999, Theorem 1) Let $\{T_{i}\}_{i = 1}^{n}$ and $\{C_{i}\}_{i = 1}^{n}$ be independent sequences of independent and identically distributed nonnegative random variables with distribution functions F and G, respectively. Let F̂_n be the Kaplan–Meier estimator of the distribution function F. There exists a positive constant, D, such that for any positive constant λ,

$$\mathrm{pr}\{n^{1/2} \|(1 - G)(\hat{F}_{n} - F)\|_{\infty} > \lambda\} \leq 2.5\, e^{-2\lambda^{2} + D\lambda} .$$

Lemma 2. (Hoeffding, 1963) Let g = g(x₁, …, x_m) be a symmetric kernel of the U-statistic, U, with a ≤ g(x₁, …, x_m) ≤ b. For any t > 0 and m ≤ n, we have

$$\mathrm{pr}\{|U - E(U)| > t\} \leq 2 \exp\left\{\frac{-2 \lfloor n/m \rfloor t^{2}}{(b - a)^{2}}\right\} .$$
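To get a feel for how quickly the bound in Lemma 2 tightens with the sample size, its right-hand side can be evaluated numerically. The following is a minimal sketch; the function name and the illustrative inputs (a kernel bounded in [0, 1], t = 0.1) are assumptions, while n = 295 is the sample size of the breast cancer data.

```python
import math


def hoeffding_u_bound(n, m, t, a, b):
    """Right-hand side of Lemma 2: 2 * exp(-2 * floor(n/m) * t^2 / (b - a)^2)."""
    if not (0 < m <= n) or t <= 0 or b <= a:
        raise ValueError("need 0 < m <= n, t > 0 and a < b")
    return 2.0 * math.exp(-2.0 * (n // m) * t ** 2 / (b - a) ** 2)


# Kernel of order m = 2 bounded in [0, 1]; deviation t = 0.1 (illustrative).
print(hoeffding_u_bound(n=295, m=2, t=0.1, a=0.0, b=1.0))
```

Because the exponent scales with ⌊n/m⌋t²/(b − a)², a wider kernel range (larger b − a) has to be offset by a larger sample size to reach the same tail probability.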

Lemma 3. For any c > 0, when $n \geq D^{2} c_{1}^{-1}$, where $c_{1} = (1 - \delta)^{2} \{(2c + 1)^{1/2} (c + 1)^{-1/2} - 1\}^{2} \|S\|_{\infty}^{2}$,

$$\mathrm{pr}\left\{\max_{i} \left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq c\right\} \leq 2.5\, n\, e^{-c_{1} n} . \tag{A1}$$

Moreover, for any 0 < l < 1.12, when $n > 25 D^{2} l^{-2} (1 - \delta)^{-2} \|S\|_{\infty}^{-2}$,

$$\mathrm{pr}\left\{\max_{i} \left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq l\right\} \leq 2.5\, n\, e^{-c_{2} n l^{2}}, \tag{A2}$$

where $c_{2} = 0.04 (1 - \delta)^{2} \|S\|_{\infty}^{2}$.

Proof of Lemma 3. To show Lemma 3, we claim the following result, giving its proof at the end: for $t \in (0, 2^{1/2} - 1)$, $|\hat{S}^{-2}(V_{i}) - S^{-2}(V_{i})| \geq c S^{-2}(V_{i})$ implies $|\hat{S}(V_{i}) - S(V_{i})| \geq t S(V_{i})$, where $c = \{(t + 1)^{2} - 1\}/\{2 - (t + 1)^{2}\}$. This claim further implies that $\|\hat{S} - S\|_{\infty} \geq t \|S\|_{\infty}$. Therefore,

$$\mathrm{pr}\left\{\max_{i} \left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq c\right\} \leq n\, \mathrm{pr}\left\{\left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq c\right\} \leq n\, \mathrm{pr}\{|\hat{S}(V_{i}) - S(V_{i})| \geq t S(V_{i})\} \leq n\, \mathrm{pr}(\|\hat{S} - S\|_{\infty} \geq t \|S\|_{\infty}) . \tag{A3}$$

Let G(·) denote the cumulative distribution function of T, i.e., G(t) = pr(T ≤ t). By Condition A, 1 − δ < |1 − G(V_i)| < 1, for i = 1, …, n. Therefore, by Lemma 1, we have

$$\mathrm{pr}(\|\hat{S} - S\|_{\infty} \geq t \|S\|_{\infty}) \leq \mathrm{pr}\{n^{1/2} \|(1 - G)(\hat{S} - S)\|_{\infty} > n^{1/2} (1 - \delta) \|S\|_{\infty} t\} \leq 2.5 \exp\{-2 n (1 - \delta)^{2} \|S\|_{\infty}^{2} t^{2} + D n^{1/2} (1 - \delta) \|S\|_{\infty} t\} . \tag{A4}$$

When $n^{1/2} > D t^{-1} (1 - \delta)^{-1} \|S\|_{\infty}^{-1}$, (A4) is further bounded by $2.5 \exp\{-n (1 - \delta)^{2} \|S\|_{\infty}^{2} t^{2}\}$.
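The step from (A4) to this simpler bound can be spelled out as follows. The shorthand u is introduced here only for illustration and does not appear elsewhere in the proof.

$$u = (1 - \delta) \|S\|_{\infty} t > 0, \qquad n^{1/2} > D u^{-1} \;\Longrightarrow\; D n^{1/2} u < n u^{2} \;\Longrightarrow\; -2 n u^{2} + D n^{1/2} u < -n u^{2} .$$

Since the exponent in (A4) is exactly $-2 n u^{2} + D n^{1/2} u$, the right-hand side of (A4) is at most $2.5 \exp(-n u^{2})$ under the stated condition on n.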

For $0 < c < 1$, i.e., $0 < t < (3/2)^{1/2} - 1$, because

$$c = \frac{1}{2 - (t + 1)^{2}} - 1 < 5t,$$

from the above calculations, for $n^{1/2} > D t^{-1} (1 - \delta)^{-1} \|S\|_{\infty}^{-1}$,

$$\mathrm{pr}\left\{\left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq 5t\right\} \leq \mathrm{pr}\left\{\left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq c\right\} \leq 2.5 \exp\{-n (1 - \delta)^{2} \|S\|_{\infty}^{2} t^{2}\} .$$

The desired result (A2) now follows from the union bound of probability and by setting l = 5t.
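The inequality c < 5t used in this step, and the constant 1.12 in the statement of (A2), can be checked numerically. The sketch below is only a sanity check on a grid, not a proof; the function name `c_of_t` is introduced here for illustration.

```python
import numpy as np


def c_of_t(t):
    """c = {(t + 1)^2 - 1} / {2 - (t + 1)^2}, as defined in the proof of Lemma 3."""
    return ((t + 1.0) ** 2 - 1.0) / (2.0 - (t + 1.0) ** 2)


# Upper end of the range 0 < t < (3/2)^{1/2} - 1, where c reaches 1.
t_max = np.sqrt(1.5) - 1.0
t_grid = np.linspace(1e-6, t_max, 10_000, endpoint=False)
c_grid = c_of_t(t_grid)

# c < 5t on the whole grid, and l = 5t stays below 5 * t_max.
holds = bool(np.all(c_grid < 5.0 * t_grid))
print(holds, float(c_grid.max()), 5.0 * t_max)
```

The value of 5 · t_max is slightly above 1.12, which is consistent with restricting l to the interval (0, 1.12).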

We now show the result given at the beginning of the proof. Take $A = \hat{S}(V_{i})$ and $B = S(V_{i})$. Let $a = (t + 1)^{2} - 1$. Therefore, $c = 1/(1 - a) - 1$. Because

$$|A^{-2} - B^{-2}| \geq \{1/(1 - a) - 1\} B^{-2},$$

we have $A^{-2} - B^{-2} \leq -\{1/(1 - a) - 1\} B^{-2}$, or $A^{-2} - B^{-2} \geq \{1/(1 - a) - 1\} B^{-2}$. For $t \in (0, 2^{1/2} - 1)$, i.e., $a \in (0, 1)$, we have $1 - 1/(1 + a) < 1/(1 - a) - 1$. It follows that $A^{-2} - B^{-2} \leq -\{1 - 1/(1 + a)\} B^{-2}$, or $A^{-2} - B^{-2} \geq \{1/(1 - a) - 1\} B^{-2}$, which implies $|A - B| \geq tB$.

Proof of Theorem 1. Rewrite $\hat{\tau}_{k} = 2\{n(n - 1)\}^{-1} \sum_{i < j} g(W_{i}, W_{j}) - 1/4$, where

$$g(W_{i}, W_{j}) = \left\{\frac{\Delta_{j}}{\hat{S}^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) + \frac{\Delta_{i}}{\hat{S}^{2}(V_{i})} I(X_{kj} > X_{ki}, V_{j} > V_{i})\right\} / 2$$

is the symmetric kernel of τ̂_k. Therefore, τ̂_k is a U-statistic.
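For a single predictor, the statistic can be computed as a double sum over ordered pairs i ≠ j of Δ_j Ŝ⁻²(V_j) I(X_ki > X_kj, V_i > V_j), which equals the U-statistic form because each unordered pair contributes both orderings. The sketch below is a minimal O(n²) illustration, not the authors' code: the function names are hypothetical, and it assumes that Ŝ is the left-continuous Kaplan–Meier estimate of the censoring survival function pr(C ≥ t), with censored observations treated as the events.

```python
import numpy as np


def censoring_km_left(V, delta):
    """Left-continuous Kaplan-Meier estimate of pr(C >= t) at each observed V_i.

    Censored observations (delta == 0) are the events; the product runs over
    distinct censoring times strictly before V_i.
    """
    V = np.asarray(V, dtype=float)
    delta = np.asarray(delta, dtype=int)
    S_hat = np.ones(len(V))
    for i in range(len(V)):
        for u in np.unique(V[(delta == 0) & (V < V[i])]):
            at_risk = np.sum(V >= u)
            n_cens = np.sum((V == u) & (delta == 0))
            S_hat[i] *= 1.0 - n_cens / at_risk
    return S_hat


def cris_statistic(x, V, delta):
    """tau_hat for one predictor x, written as a sum over ordered pairs (i, j)."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(V, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(V)
    # Inverse-probability-of-censoring weight attached to index j.
    weights = delta / censoring_km_left(V, delta) ** 2
    # concordant[i, j] is True when X_i > X_j and V_i > V_j.
    concordant = (x[:, None] > x[None, :]) & (V[:, None] > V[None, :])
    return float(np.sum(concordant * weights[None, :])) / (n * (n - 1)) - 0.25


# Without censoring and with x perfectly ordered like V, half of the ordered
# pairs are concordant, so the statistic is 1/2 - 1/4.
print(cris_statistic([1, 2, 3, 4], [1, 2, 3, 4], [1, 1, 1, 1]))
```

Because only the ordering of x enters through the indicator, rescaling or contaminating a few predictor values with outliers changes the statistic only insofar as it changes ranks.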

Let $U_{n} f = 2\{n(n - 1)\}^{-1} \sum_{i < j} f(X_{i}, X_{j})$ denote the empirical function for U-statistics. After some algebra, we have $\hat{\tau}_{k} - \tau_{k} = I_{k1} + I_{k2}$, where

$$I_{k1} = \binom{n}{2}^{-1} \sum_{i < j} \frac{\Delta_{j}}{\hat{S}^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) - \binom{n}{2}^{-1} \sum_{i < j} \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) = U_{n}\left[\left\{\frac{1}{\hat{S}^{2}(V_{j})} - \frac{1}{S^{2}(V_{j})}\right\} \Delta_{j} I(X_{ki} > X_{kj}, V_{i} > V_{j})\right],$$

and

$$I_{k2} = \binom{n}{2}^{-1} \sum_{i < j} \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) - \mathrm{pr}(X_{ki} > X_{kj}, T_{i} > T_{j}) = (U_{n} - E) \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) .$$

We bound $I_{k1}$ and $I_{k2}$ piece by piece. In particular, $I_{k1}$ can be bounded from above as

$$I_{k1} \leq \binom{n}{2}^{-1} \max_{j} \left| \frac{S^{2}(V_{j})}{\hat{S}^{2}(V_{j})} - 1 \right| \sum_{i < j} \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) . \tag{A5}$$

By the triangle inequality, (A5) can be further bounded above as

$$\max_{j} \left| \frac{S^{2}(V_{j})}{\hat{S}^{2}(V_{j})} - 1 \right| \left| (U_{n} - E) \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) \right| + \max_{j} \left| \frac{S^{2}(V_{j})}{\hat{S}^{2}(V_{j})} - 1 \right| \left| E \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) \right| \equiv I_{k11} + I_{k12} .$$

To bound $I_{k11}$, recall that

$$g(W_{i}, W_{j}) = (1/2) \left\{\frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) + \frac{\Delta_{i}}{S^{2}(V_{i})} I(X_{kj} > X_{ki}, V_{j} > V_{i})\right\} - 1/4 .$$

By Condition A, we have

$$-\delta^{-2} - 1/4 \leq g(W_{i}, W_{j}) \leq \delta^{-2} - 1/4 .$$

For any $c_{3} > 0$, by Lemma 2, there exists $c_{4} = 4 c_{3}^{2}$ such that

$$\mathrm{pr}\left\{\left| (U_{n} - E) \frac{\Delta_{j}}{S^{2}(V_{j})} I(X_{ki} > X_{kj}, V_{i} > V_{j}) \right| > c_{3} n^{-\kappa}\right\} \leq 2 \exp(-c_{4} n^{1 - 2\kappa}) . \tag{A6}$$

By Lemma 3, letting c = 1 in (A1), when $n \geq D^{2} (1 - \delta)^{-2} \{(3/2)^{1/2} - 1\}^{-2} \|S\|_{\infty}^{-2}$,

$$\mathrm{pr}\left\{\max_{i} \left| \frac{S^{2}(V_{i})}{\hat{S}^{2}(V_{i})} - 1 \right| \geq 1\right\} \leq 2.5\, n \exp(-c_{1} n) . \tag{A7}$$

It follows from (A6) and (A7) that for any $c_{3} > 0$, when $n \geq D^{2} (1 - \delta)^{-2} \{(3/2)^{1/2} - 1\}^{-2} \|S\|_{\infty}^{-2}$, there exist $c_{1}$ and $c_{4}$ such that

$$\mathrm{pr}(|I_{k11}| \geq c_{3} n^{-\kappa}) \leq 2.5\, n \exp(-c_{1} n) + 2 \exp(-c_{4} n^{1 - 2\kappa}) . \tag{A8}$$

Because $|E\{\Delta_{j} S(V_{j})^{-2} I(X_{ki} > X_{kj}, V_{i} > V_{j})\}| \leq 1$, it follows from (A2) with $l = c_{5} n^{-\kappa}$ that for any $c_{5} > 0$ and $c_{5} n^{-\kappa} < 1.12$, when $n > 25 D^{2} c_{5}^{-2} n^{2\kappa} (1 - \delta)^{-2} \|S\|_{\infty}^{-2}$, there exists $c_{2} > 0$ such that

$$\mathrm{pr}(|I_{k12}| \geq c_{5} n^{-\kappa}) \leq 2.5\, n \exp(-c_{2} n^{1 - 2\kappa}) . \tag{A9}$$

By the triangle inequality, it now follows from (A6), (A8), and (A9) that for any $c_{3}, c_{5} > 0$, when $n > \max[D^{2} (1 - \delta)^{-2} \{(3/2)^{1/2} - 1\}^{-2} \|S\|_{\infty}^{-2},\ 25 D^{2} c_{5}^{-2} n^{2\kappa} (1 - \delta)^{-2} \|S\|_{\infty}^{-2},\ (c_{5}/1.12)^{1/\kappa}]$, there exist $c_{1}$, $c_{2}$, and $c_{4}$ such that

$$\mathrm{pr}\left\{\max_{1 \leq k \leq p} |\hat{\tau}_{k} - \tau_{k}| > (2 c_{3} + c_{5}) n^{-\kappa}\right\} \leq p \{2.5\, n \exp(-c_{1} n) + 4 \exp(-c_{4} n^{1 - 2\kappa}) + 2.5\, n \exp(-c_{2} n^{1 - 2\kappa})\} .$$

The first result follows by letting c₆ = 2c₃ + c₅.

For the second part, note that on the event

$$A_{n} \equiv \left\{\max_{k \in \mathcal{M}_{\star}} |\hat{\tau}_{k} - \tau_{k}| \leq c_{0} n^{-\kappa} / 2\right\},$$

by Condition B, we have $|\hat{\tau}_{k}| \geq c_{0} n^{-\kappa}/2$, for all $k \in \mathcal{M}_{\star}$. Therefore, by the choice of $\nu_{n}$, we have $\mathcal{M}_{\star} \subset \hat{\mathcal{M}}_{\nu_{n}}$. The result now follows from a simple union bound:

$$\mathrm{pr}(A_{n}^{c}) \leq s \{2.5\, n \exp(-c_{1} n) + 4 \exp(-c_{4} n^{1 - 2\kappa}) + 2.5\, n \exp(-c_{2} n^{1 - 2\kappa})\} .$$

This completes the proof.
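The screening step analysed in this proof keeps the predictors whose |τ̂_k| is at least ν_n. A minimal sketch of that thresholding is given below, assuming the statistics have already been computed and that the threshold is supplied by the user; the function name and the numerical values are made up for illustration.

```python
import numpy as np


def screen_by_threshold(tau_hat, nu):
    """Return indices k with |tau_hat[k]| >= nu, ordered by decreasing |tau_hat[k]|."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    order = np.argsort(-np.abs(tau_hat), kind="stable")
    return [int(k) for k in order if abs(tau_hat[k]) >= nu]


# Made-up statistics for five predictors and an illustrative threshold.
tau_hat = [0.02, -0.11, 0.20, -0.01, 0.07]
selected = screen_by_threshold(tau_hat, nu=0.05)
print(selected)
```

Both positive and negative values of the statistic count as signal, which is why the comparison uses the absolute value.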

Verification of Condition B for a general class of transformation models

Proof of Proposition 1. Recall that $\tau_{k} = \mathrm{pr}(X_{k1} > X_{k2}, T_{1} > T_{2}) - 1/4$. Next we will show that $|\tau_{k}| \geq c_{0} n^{-\kappa}$ for some $c_{0} > 0$, if $k \in \mathcal{M}_{\star}$. For $k \in \mathcal{M}_{\star}$, we have

$$\begin{aligned} \tau_{k} &= E\{I(X_{k1} > X_{k2}, T_{1} > T_{2})\} - \frac{1}{4} \\ &= E\left(I(X_{k1} > X_{k2})\, I[H(T_{1}) - H(T_{2}) - \{m_{k}(X_{k1}) - m_{k}(X_{k2})\} > m_{k}(X_{k2}) - m_{k}(X_{k1})]\right) - \frac{1}{4} \\ &= E\left(I(X_{k1} > X_{k2}) [1 - F_{\Delta\varepsilon_{k} | \Delta m_{k}}\{m_{k}(X_{k2}) - m_{k}(X_{k1})\}]\right) - \frac{1}{4}, \end{aligned}$$

where F_{Δε_k|Δm_k} (·) is the conditional cumulative distribution function of Δε_k = H(T₁) − H(T₂) − {m_k(X_k1) − m_k(X_k2)} given Δm_k = m_k(X_k1) − m_k(X_k2). Because m_k(·) is a monotone function, m_k(X_k2) − m_k(X_k1) is either greater than or less than zero for all X_k1 > X_k2. This implies that 1 − F_{Δε_k|Δm_k}{m_k(X_k2) − m_k(X_k1)} is either greater or less than 1/2 due to Condition C1. Therefore, τ_k is either greater or less than zero for k ∈ ℳ_⋆. In the following, we further establish the lower bound of |τ_k|.

Without loss of generality, assume m_k(·) is monotone increasing. Note that τ_k can be equivalently written as

$$\tau_{k} = E\left(I(X_{k1} > X_{k2}) [F_{\Delta\varepsilon_{k} | \Delta m_{k}}\{m_{k}(X_{k1}) - m_{k}(X_{k2})\} - F_{\Delta\varepsilon_{k} | \Delta m_{k}}(0)]\right) = E\left\{I(X_{k1} > X_{k2}) \int_{0}^{m_{k}(X_{k1}) - m_{k}(X_{k2})} f_{\Delta\varepsilon_{k} | \Delta m_{k}}(t)\, dt\right\} .$$

According to Corollary 3 in Sellke & Sellke (1997), for a random variable X with mean zero, variance σ², and unimodal symmetric distribution, $\mathrm{pr}(|X| \geq t) \leq 3^{1/2}\sigma/(t + 3^{1/2}\sigma)$. By Condition C1, we have

$$\int_{0}^{m_{k}(X_{k1}) - m_{k}(X_{k2})} f_{\Delta\varepsilon_{k} | \Delta m_{k}}(t)\, dt \geq \frac{1}{2} \left\{1 - \frac{3^{1/2}\sigma_{2}}{m_{k}(X_{k1}) - m_{k}(X_{k2}) + 3^{1/2}\sigma_{2}}\right\} = \frac{m_{k}(X_{k1}) - m_{k}(X_{k2})}{2 \{m_{k}(X_{k1}) - m_{k}(X_{k2}) + 3^{1/2}\sigma_{2}\}} .$$

This expression leads to

$$\begin{aligned} \tau_{k} &\geq E\left[I\{m_{k}(X_{k1}) > m_{k}(X_{k2})\} \frac{m_{k}(X_{k1}) - m_{k}(X_{k2})}{2 \{m_{k}(X_{k1}) - m_{k}(X_{k2}) + 3^{1/2}\sigma_{2}\}}\right] \\ &\geq \frac{1}{2M + 2 \times 3^{1/2}\sigma_{2}} E\left[\{m_{k}(X_{k1}) - m_{k}(X_{k2})\}\, I\{0 < m_{k}(X_{k1}) - m_{k}(X_{k2}) \leq M\}\right] \\ &= \frac{1}{2M + 2 \times 3^{1/2}\sigma_{2}} E\left[\{m_{k}(X_{k1}) - m_{k}(X_{k2})\}\, I\{m_{k}(X_{k1}) - m_{k}(X_{k2}) > 0\} - \{m_{k}(X_{k1}) - m_{k}(X_{k2})\}\, I\{m_{k}(X_{k1}) - m_{k}(X_{k2}) > M\}\right], \end{aligned}$$

for M > 0. By Chebyshev’s inequality and C2, the last expression can be further bounded below as

$$\frac{1}{2M + 2 \times 3^{1/2}\sigma_{2}} \left(\frac{1}{2} E|m_{k}(X_{k1}) - m_{k}(X_{k2})| - [2\sigma_{1}^{2}\, \mathrm{pr}\{m_{k}(X_{k1}) - m_{k}(X_{k2}) > M\}]^{1/2}\right) \geq \frac{1}{2M + 2 \times 3^{1/2}\sigma_{2}} \left\{\frac{1}{2} E|m_{k}(X_{k1}) - E m_{k}(X_{k1})| - \frac{2\sigma_{1}^{2}}{M}\right\} .$$

Let $M = 8\sigma_{1}^{2} d_{0}^{-1} n^{\kappa/2}$. Then, $M \geq 3^{1/2}\sigma_{2}$ when n is large. We have $\tau_{k} \geq c_{0} n^{-\kappa}$, where $c_{0} = d_{0}^{2}/(128\sigma_{1}^{2})$. Similarly, if $m_{k}(\cdot)$ is monotone decreasing, we can show that $\tau_{k} \leq -c_{0} n^{-\kappa}$. Therefore, $|\tau_{k}| \geq c_{0} n^{-\kappa}$ for any $k \in \mathcal{M}_{\star}$. Condition B is then proved.
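The arithmetic behind the constant c₀ can be traced as follows. This is a sketch that assumes Condition C2, which is stated in the main text and not reproduced in this appendix, supplies the lower bound $E|m_{k}(X_{k1}) - E m_{k}(X_{k1})| \geq d_{0} n^{-\kappa/2}$. With M as chosen above and $M \geq 3^{1/2}\sigma_{2}$,

$$\frac{2\sigma_{1}^{2}}{M} = \frac{d_{0} n^{-\kappa/2}}{4}, \qquad \frac{1}{2} d_{0} n^{-\kappa/2} - \frac{d_{0} n^{-\kappa/2}}{4} = \frac{d_{0} n^{-\kappa/2}}{4}, \qquad \frac{1}{2M + 2 \times 3^{1/2}\sigma_{2}} \geq \frac{1}{4M} = \frac{d_{0} n^{-\kappa/2}}{32\sigma_{1}^{2}},$$

so that

$$\tau_{k} \geq \frac{d_{0} n^{-\kappa/2}}{32\sigma_{1}^{2}} \cdot \frac{d_{0} n^{-\kappa/2}}{4} = \frac{d_{0}^{2}}{128\sigma_{1}^{2}}\, n^{-\kappa} .$$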

Proof of Theorem 2. To show Theorem 2, we note that for any $c_{8} > 0$, if $|E X_{(k)} Y| > c_{8} n^{-\kappa}$, then $|\tau_{k}| > c_{0} n^{-\kappa}$ for some $c_{0} > 0$. The proof of this statement is similar to that in Proposition 1, hence we omit the details.

Because var{H(T)} = O(1), we have that $\sum_{k = 1}^{p} \{E X_{(k)} H(T)\}^{2} = \|E X' H(T)\|^{2} = \|E X' (X'\beta + \varepsilon)\|^{2} = O(\|\Sigma\beta\|^{2}) = O\{\lambda_{\max}(\Sigma)\}$. Therefore, the number of $\{k : |\tau_{k}| > c_{0} n^{-\kappa}\} = \{k : |E X_{(k)} H(T)| > c_{8} n^{-\kappa}\} = O\{n^{2\kappa}\lambda_{\max}(\Sigma)\}$. On the set $\{\max_{1 \leq k \leq p} |\hat{\tau}_{k} - \tau_{k}| \leq c_{0} n^{-\kappa}\}$, the number of $\{k : |\hat{\tau}_{k}| > 2 c_{0} n^{-\kappa}\}$ is no bigger than the number of $\{k : |\tau_{k}| > c_{0} n^{-\kappa}\}$. Therefore, by taking $c_{8} = c_{7}/2$,

$$\mathrm{pr}[|\hat{\mathcal{M}}_{\nu_{n}}| \leq O\{n^{2\kappa}\lambda_{\max}(\Sigma)\}] \geq \mathrm{pr}\left(\max_{1 \leq k \leq p} |\hat{\tau}_{k} - \tau_{k}| \leq c_{0} n^{-\kappa}\right) .$$

The conclusion follows from the tail probability in Theorem 1. This completes the proof.

Contributor Information

Rui Song, Email: rsong@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA.

Wenbin Lu, Email: lu@stat.ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA.

Shuangge Ma, Email: shuangge.ma@yale.edu, Division of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut 06510, USA.

X. Jessie Jeng, Email: xjjeng@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA.

References

Bickel P, Ritov Y, Tsybakov A. Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics. 2009;37:1705–1732.
Bitouze D, Laurent B, Massart P. A Dvoretzky–Kiefer–Wolfowitz type inequality for the Kaplan–Meier estimator. Annales de l’Institut Henri Poincare (B) Probability and Statistics. 1999;35:735–763.
Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n (with discussion). The Annals of Statistics. 2007;35:2313–2404.
Clayton D, Cuzick J. Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A. 1985;148:82–117.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
Fan J, Li R. Variable selection for Cox’s proportional hazards model and frailty model. The Annals of Statistics. 2002;30:74–99.
Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x.
Fan J, Song R. Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics. 2010;38:3567–3604.
Gonzalez-Manteiga W, Cadarso-Suarez C. Asymptotic properties of a generalized Kaplan–Meier estimator with some applications. Journal of Nonparametric Statistics. 1994;4:65–78.
Gorst-Rasmussen A, Scheike T. Independent screening for single-index hazard rate models with ultrahigh dimensional features. Journal of the Royal Statistical Society, Series B. 2013;75:217–245.
Hall P, Titterington DM, Xue J. Tilting methods for assessing the influence of components in a classifier. Journal of the Royal Statistical Society, Series B. 2009;71:783–803.
He X, Wang L, Hong HG. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. The Annals of Statistics. 2013;41:342–369.
Hoeffding W. Probability inequality for sums of bounded random variables. Journal of the American Statistical Association. 1963;58:13–30.
Iglewicz B, Hoaglin DC. How to Detect and Handle Outliers. Milwaukee, WI: American Society for Quality Control; 1993.
Johnson BA. Variable selection in semiparametric linear regression with censored data. Journal of the Royal Statistical Society, Series B. 2008;70:351–370.
Johnson BA, Lin DY, Zeng D. Penalized estimating functions and variable selection in semi-parametric regression models. Journal of the American Statistical Association. 2008;103:672–680. doi: 10.1198/016214508000000184.
Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2011.
Kendall MG. Rank Correlation Methods. 3rd ed. London: Griffin & Co; 1962.
Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression models. The Annals of Statistics. 2004;32:1448–1491.
Kuo C-L, Zaykin DV. Novel rank-based approaches for discovery and replication in genome-wide association studies. Genetics. 2011;189:329–340. doi: 10.1534/genetics.111.130542.
Meinshausen N, Rice J. Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics. 2006;34:373–393.
Meinshausen N, Yu B. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics. 2009;37:246–270.
Peng L, Fine J. Competing risks quantile regression. Journal of the American Statistical Association. 2009;104:1440–1453.
Scharfstein DO, Tsiatis AA, Gilbert PB. Semiparametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis. 1998;4:355–391. doi: 10.1023/a:1009634103154.
Sellke TM, Sellke SH. Chebyshev inequalities for unimodal distributions. The American Statistician. 1997;51:34–39.
Sen PK. Estimates of the regression coefficient based on Kendall’s tau. Journal of the American Statistical Association. 1968;63:1379–1389.
Skol D, Scott L, Abecasis G, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature Genetics. 2006;38:209–213. doi: 10.1038/ng1706.
Tibshirani RJ. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
Tibshirani RJ. The lasso method for variable selection in the Cox model. Statistics in Medicine. 1997;16:385–395. doi: 10.1002/(sici)1097-0258(19970228)16:4<385::aid-sim380>3.0.co;2-3.
Tibshirani RJ. Univariate shrinkage in the Cox model for high dimensional data. Statistical Applications in Genetics and Molecular Biology. 2009;8:1–18. doi: 10.2202/1544-6115.1438.
van de Geer S. High-dimensional generalized linear models and the lasso. The Annals of Statistics. 2008;36:614–645.
van Houwelingen HC, Bruinsma T, Hart AAM, Van ’t Veer LJ, Wessels LFA. Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine. 2006;25:3201–3216. doi: 10.1002/sim.2353.
van ’t Veer L, Dai H, van de Vijver MJ, He Y, Hart A, Mao M, Peterse H, van der Kooy K, Marton MJ, Witteveen AT, Schreiber G, Kerkhoven R, Roberts C, Linsley P, Bernards R, Friend SH. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. doi: 10.1038/415530a.
Zhang HH, Lu W. Adaptive lasso for Cox’s proportional hazards model. Biometrika. 2007;94:691–703.
Zhao DS, Li Y. Principled sure independence screening for Cox models with ultra-high-dimensional covariates. Journal of Multivariate Analysis. 2012;105:397–411. doi: 10.1016/j.jmva.2011.08.002.
