Abstract
Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools that select predictive biomarkers. The vast number of biomarkers defies existing regularization-based variable selection methods. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given biological information known a priori. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods for analyzing survival data.
Keywords: Conditional screening, Cox model, Diffuse large B-cell lymphoma, High-dimensional variable screening
1 Introduction
Despite much progress made in the past two decades, many cancers do not have a proven means of prevention or effective treatment. Precision medicine, which takes into account individual susceptibility, has become a promising approach to gaining better insights into the biological influences on cancers and is expected to benefit millions of cancer patients. A critical component of precision medicine lies in detecting and identifying important biomarkers that are predictive of cancer patients' prognosis. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools that select predictive biomarkers. The present work is motivated by a genomic study of diffuse large B-cell lymphoma (DLBCL) (Rosenwald et al. 2002), with the goal of identifying gene signatures out of 7399 genes for predicting survival among 240 DLBCL patients. The results may help address whether DLBCL patients' survival after chemotherapy is regulated by molecular features.
The recently developed variable screening methods, such as the sure independence screening proposed by Fan and Lv (2008), have emerged as a powerful tool for this problem, but their validity often hinges upon the partial faithfulness assumption, that is, that jointly important variables are also marginally important. Consequently, they fail to identify hidden variables that are jointly important but have weak marginal associations with the outcome, resulting in a poor understanding of the molecular mechanisms underlying or regulating the disease. To alleviate this problem, Fan and Lv (2008) further suggested an iterative procedure (ISIS) that repeatedly uses the residuals from previous iterations, which has gained much popularity. However, the required iterations increase the computational burden, and the statistical properties of the procedure remain elusive.
On the other hand, intensive biomedical research has generated a large body of biological knowledge. For example, several studies have confirmed that AA805575, a germinal-center B-cell signature gene, is relevant to DLBCL survival (Gui and Li 2005; Liu et al. 2013). Incorporating such prior knowledge to improve the accuracy of variable selection has drawn much interest. Barut et al. (2016) proposed a conditional screening (CS) approach in the framework of the generalized linear model (GLM) when some prior knowledge on feature selection is available, and showed that the CS approach provides a useful means of identifying jointly informative but marginally weak associations; Hong et al. (2016) further proposed to integrate prior information using data-driven approaches.
The development of high-dimensional screening tools for survival outcomes has been fruitful. Related work includes an (iterative) sure screening procedure for Cox's proportional hazards model (Fan et al. 2010), a screening procedure based on the marginal maximum partial likelihood estimator (MPLE) (Zhao and Li 2012), and a censored rank independence screening method that is robust to outliers and applicable to a general class of survival models (Song et al. 2014). To the best of our knowledge, however, all these methods essentially posit the partial faithfulness assumption and do not incorporate known prior biological information. As a result, they are likely to miss marginally weak but jointly important signals.
To fill the gap, we propose a new CS method for the Cox proportional hazards model that computes the marginal contribution of each covariate given information known a priori. We refer to it as Cox conditional screening (CoxCS). As opposed to conventional marginal screening methods, our method is more likely to detect marginally weak but jointly important signals, which has important biological applications as shown in the data example section. Moreover, in contrast with most screening methods, which usually employ subjective thresholds, we also propose a principled cut-off to govern the screening and control the false positives, in light of Zhao and Li (2012). This is especially important in the presence of hidden variables.
To demonstrate the utility of CoxCS in recovering important hidden variables, we briefly discuss Example 1 of Sect. 4. By design, variable 6 is the hidden variable in that example, as it is only weakly correlated with the survival time marginally. The screening statistic of the CoxCS approach (defined in Sect. 2) is computed conditional on a set of variables that are pre-included in the model; when this conditioning set is empty, CoxCS reduces to the marginal screening approach for the Cox proportional hazards model. Figure 1 summarizes the densities of the screening statistics for the hidden variable 6 and the noisy variables 10–1000 under different conditioning sets, based on 400 simulated datasets. The results show that, with high probability, the marginal screening statistic for hidden variable 6 is much smaller than those of the noisy variables. When the conditioning set includes one truly active variable, the density plots show a clear separation between the hidden variable and the noisy variables, and this separation grows as more truly active variables are included. Interestingly, when conditioning on noisy variables that are correlated with both the active and the hidden variables in the model, the chance of identifying the hidden variable using CoxCS is still higher than with marginal screening. A similar phenomenon was observed in the GLM setting (Barut et al. 2016). This is because such "noisy" variables, being correlated with both the marginally important variables and the hidden variables, may effectively function as surrogates for the active variables, and conditioning on them can help detect the hidden variables.
The theory of CS for the GLM has been established by Barut et al. (2016), but its extension to the survival context is challenging and calls for new techniques. To this end, we propose two new functional operators on random variables to characterize their linear associations given other random variables: the conditional linear expectation and the conditional linear covariance. Both are critical for formulating the regularity conditions underlying the population-level properties of CoxCS with statistically meaningful interpretations, and they facilitate the development of theory for CS approaches in general settings. A similar concept to the conditional linear covariance was introduced by Barut et al. (2016), but it cannot be used for survival outcome data. In summary, the proposed method is computationally efficient, adapts to sparse and weak signals, enjoys good theoretical properties under weak regularity conditions, and works robustly in a variety of settings to identify hidden variables.
The remainder of this paper is organized as follows. In Sect. 2, we review the Cox proportional hazards model and present the CoxCS approach together with some alternatives. In Sect. 3, we list the regularity conditions and establish the sure screening properties. In Sect. 4, we conduct simulation studies to compare our method with the major competing methods under a number of scenarios. In Sect. 5, we apply our method to the DLBCL data. We conclude with a brief discussion of future work in Sect. 6.
2 Model
2.1 The Cox proportional hazards model
Suppose we have n observations with p covariates, and let i and j index subjects and covariates, respectively. Denote by Zi,j covariate j for subject i, and write Zi = (Zi,1,…, Zi,p)T. Let Ti be the underlying survival time and Ci the censoring time. We observe Xi = min(Ti, Ci) and δi = I(Ti ≤ Ci), where I(·) is the indicator function. Assume that there exists τ > 0 such that P(Xi > τ | Zi) > 0, and that the event time Ti and the censoring time Ci are independent given Zi. Suppose Ti follows the Cox proportional hazards model
$$\lambda(t \mid Z_i) = \lambda_0(t)\exp(\alpha^{T} Z_i), \qquad (1)$$
where λ0(t) is an unspecified baseline hazard and α = (α1, …, αp)T is the vector of true coefficients. Let Λ0(t) = ∫0^t λ0(s) ds be the baseline cumulative hazard function. Suppose there is a set of covariates that are known a priori to be related to the survival outcome, and denote by 𝒞 the index set of these covariates. Let q = |𝒞| be the number of covariates in 𝒞. Write Z_{i,𝒞} and α_𝒞 for the subvectors of Zi and α indexed by 𝒞, and Z_{i,−𝒞} and α_{−𝒞} for the remaining components. Note that in our problem 𝒞 is known but both α_𝒞 and α_{−𝒞} are unknown. Then the true hazard function in (1) is equivalent to
$$\lambda(t \mid Z_i) = \lambda_0(t)\exp(\alpha_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \alpha_{-\mathcal{C}}^{T} Z_{i,-\mathcal{C}}). \qquad (2)$$
To estimate α_𝒞 and α_{−𝒞}, we introduce the counting process Ni(t) = I(Xi ≤ t, δi = 1) and the at-risk process Yi(t) = I(Xi ≥ t). When p is small, we can obtain the maximum partial likelihood estimator by solving the estimating equation U(α) = 0p with U(α) = (U1(α),…,Up(α))T and
$$U_j(\alpha) = \frac{1}{n}\sum_{i=1}^{n}\int_{0}^{\tau}\left\{Z_{i,j} - \frac{S_{n,j}^{(1)}(t,\alpha)}{S_{n}^{(0)}(t,\alpha)}\right\} dN_i(t), \qquad (3)$$
with
$$S_{n}^{(0)}(t,\alpha) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)\exp(\alpha^{T} Z_i), \qquad S_{n,j}^{(1)}(t,\alpha) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)\, Z_{i,j}\exp(\alpha^{T} Z_i), \qquad (4)$$
for j = 1, …, p. When p > n, it is computationally and theoretically infeasible to directly solve Eq. (3). By imposing sparsity on the coefficients, one may maximize the penalized partial likelihood to obtain solutions. However, when p is much larger than n, we need to employ a variable screening procedure before performing regularized regression. We propose a new CS procedure in the next section.
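To make the quantities above concrete, the following sketch evaluates the log partial likelihood and the score in plain numpy. It is a minimal illustration rather than the authors' implementation: it assumes no tied event times, omits the 1/n scaling in (3), and all function names are ours.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, Z, time, event):
    """Negative log partial likelihood of the Cox model.

    Z: (n, p) covariates; time: observed times X_i; event: indicators delta_i.
    Assumes no tied event times.
    """
    eta = Z @ beta                           # linear predictors
    order = np.argsort(-time)                # sort by decreasing observed time
    eta, ev = eta[order], event[order]
    # running log of the risk-set sum: log sum_{l: X_l >= X_i} exp(eta_l)
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum(ev * (eta - log_risk))

def cox_score(beta, Z, time, event):
    """Score vector: sum over events of Z_i minus the risk-set weighted
    covariate average (proportional to U(alpha) in (3); 1/n factor omitted)."""
    eta = Z @ beta
    order = np.argsort(-time)
    Zs, es, ev = Z[order], np.exp(eta[order]), event[order]
    s0 = np.cumsum(es)                          # risk-set sums, as in S^(0)
    s1 = np.cumsum(es[:, None] * Zs, axis=0)    # risk-set sums, as in S^(1)
    return np.sum(ev[:, None] * (Zs - s1 / s0[:, None]), axis=0)
```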
2.2 Cox conditional screening
We fit marginal Cox regressions that always include the known covariates indexed by 𝒞. Specifically, for each j ∉ 𝒞, we consider the following marginal Cox regression model
$$\lambda(t \mid Z_{i,\mathcal{C}}, Z_{i,j}) = \lambda_{0,j}(t)\exp(\beta_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \beta_j Z_{i,j}), \qquad (5)$$
from which the maximum partial likelihood estimator can be obtained as the solution of the following estimating equation:
$$\hat U_j(\beta_{\mathcal{C}}, \beta_j) = \frac{1}{n}\sum_{i=1}^{n}\int_{0}^{\tau}\left\{W_{i,j} - \frac{\tilde S_{n,j}^{(1)}(t,\beta_{\mathcal{C}},\beta_j)}{\tilde S_{n,j}^{(0)}(t,\beta_{\mathcal{C}},\beta_j)}\right\} dN_i(t) = 0_{q+1}, \qquad (6)$$
with
$$\tilde S_{n,j}^{(0)}(t,\beta_{\mathcal{C}},\beta_j) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)\exp(\beta_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \beta_j Z_{i,j}) \qquad (7)$$
and
$$\tilde S_{n,j}^{(1)}(t,\beta_{\mathcal{C}},\beta_j) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)\, W_{i,j}\exp(\beta_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \beta_j Z_{i,j}), \qquad (8)$$
where $W_{i,j} = (Z_{i,\mathcal{C}}^{T}, Z_{i,j})^{T}$,
for each j ∉ 𝒞. Denote by β̂j the resulting maximum partial likelihood estimate of βj. For a given threshold γ > 0, the selected index set, in addition to the set 𝒞, is given by
$$\hat{\mathcal{A}}_{\gamma} = \{\, j \notin \mathcal{C} : |\hat\beta_j| \ge \gamma \,\}. \qquad (9)$$
Namely, we recruit the variables that make a large additional contribution given 𝒞. We refer to this method as Cox conditional screening (CoxCS).
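A minimal sketch of the CoxCS procedure is given below, reusing the partial-likelihood functions above together with scipy's BFGS optimizer: for each candidate covariate j outside the conditioning set, model (5) is fitted and |β̂j| is used as the screening statistic, as in (9). The fallback of keeping the top n/log(n) covariates when no threshold γ is supplied is our convention, not the paper's, and the helper names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def _fit_cox(Zsub, time, event):
    """Maximum partial likelihood fit via BFGS (uses the functions above)."""
    return minimize(cox_neg_log_partial_likelihood, np.zeros(Zsub.shape[1]),
                    args=(Zsub, time, event),
                    jac=lambda b, *a: -cox_score(b, *a), method="BFGS")

def cox_cs(Z, time, event, cond_set, gamma=None):
    """CoxCS screening: condition on Z_C, screen every j outside cond_set."""
    n, p = Z.shape
    cond_set = list(cond_set)
    stats = np.full(p, np.nan)                 # NaN for the conditioned covariates
    for j in range(p):
        if j in cond_set:
            continue
        fit = _fit_cox(Z[:, cond_set + [j]], time, event)
        stats[j] = abs(fit.x[-1])              # conditional coefficient of Z_j
    if gamma is None:
        # no threshold supplied: keep the top n / log(n) covariates instead
        keep = np.argsort(-np.nan_to_num(stats, nan=-np.inf))[:int(n / np.log(n))]
    else:
        keep = np.where(stats >= gamma)[0]
    return stats, np.sort(keep)
```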
3 Theoretical results
We establish the theoretical properties of the proposed methods by introducing a few new definitions along with the basic properties.
3.1 Definitions and basic properties
Let (Ω, ℱ, P) be the probability space for all random variables introduced in this paper, where Ω is the sample space, ℱ is the σ-algebra of events and P is the probability measure.
For any d ≥ 1, any random variable and any operator A, denote by A[•|ξ] conditional A of • given ξ. For any vector , let be the subvector where all its elements are indexed in . Let be the L-d norm for any vector . For a sequence of random variables indexed by if and only if for any ε1 > 0 and ε2 > 0, there exists N such that for any .
For simplicity, let T, C, X, Y(t), Zj, Z and δ represent Ti, Ci, Xi, Yi(t), Zi,j, Zi and δi, respectively, by removing the subject index i. Let and represent the survival functions of the event time T and the censoring time C. Let .
Definition 1
Let 𝒜 = {j ∉ 𝒞 : αj ≠ 0} be the true set of non-zero coefficients in model (1), aside from the important predictors known a priori, and let |𝒜| denote its cardinality.
To study the asymptotic properties of the screening estimates, we define the corresponding population-level quantities as follows:
Definition 2
Let be the solution of the following equations
(10) |
with
(11) |
where and
Definition 3
Let be the solution of the following equations
(12) |
To understand the intuition behind the population-level properties of CoxCS, we need to define the conditional linear expectation. A similar concept has been used to study the conditional sure independence screening (CSIS) in the GLM setting by Barut et al. (2016). We provide a formal definition here.
Definition 4
For two random variables ζ and ξ, the conditional linear expectation of ζ given ξ is defined as
(13) |
where . Also, define notation .
Definition 5
For any random variables ζ1, ζ2 and ξ, the conditional linear covariance between ζ1 and ζ2 given ξ is defined as
(14) |
Definition 6
Define
where vector such that .
We list the properties of the conditional linear expectation and the conditional linear covariance in Propositions 1 and 2 in the Appendix. They are quite useful for the theoretical development of the sure screening property. Also, combining Propositions 1 and 4 and Lemma 1 in the Appendix, we show the uniqueness of the solutions to the score equations in Definitions 3 and 6.
3.2 Regularity conditions
We list all conditions for the theoretical results.
Condition 1
For each and , there exists a neighborhood of , which is defined as
such that for each τ < ∞.
- For m = 0, 1,
in probability as . - There exists a constant L > 0 such that
Of note, does not depend on k. Let .
Condition 2
The covariates Zj’s satisfy the following conditions.
For and there exists a constant K0 such that .
Zj is a time constant variable, for all j.
All Zj’s, are independent of all Zk’s, given .
- For constant c1 > 0 and κ <1/2,
Condition 3
There exists a constant K1 such that
for all p > 0.
Condition 4
For all , there exists a constant M > 0 such that
where .
To study the sure screening property of the proposed method, we first link the marginal parameter βj to its counterpart αj in the joint model. We then study its properties at both the population level and the sample level.
3.3 Properties on population level
Theorem 1
Suppose Condition 2 holds. Then βj = 0 if and only if αj = 0, for all j ∉ 𝒞.
To identify covariate j, its signal strength needs to be at least of order O(n−1/2). The following Theorem 2 specifies the signal strength.
Theorem 2
Suppose Condition 2 holds. There exist constants c2 > 0 and κ < 1/2 such that
3.4 Properties on sample level
Theorem 3 proves the uniform convergence of the proposed estimator and the sure screening property of the procedure.
Theorem 3
Suppose Conditions 1–4 hold. For any ε1 > 0 and any ε2 > 0, there exist positive constants c3, c4 and an integer N such that for any n > N,
- For any 0 < κ < 1/2,
where w is the size of , q is the size of 𝒞, c2 is the same value as in Theorem 2, and c3 does not depend on ε1, ε2 and κ, but N depends on ε1 and ε2. - If , where κ is the same number as in Condition 2, then
where c4 does not depend on ε1, ε2 and κ; thus (15) and (16) hold.
3.5 Alternative methods
In this section we introduce two alternative methods for controlling the false discovery rate.
Define the information matrix
$$\hat I_j(\beta_{\mathcal{C}}, \beta_j) = -\,\partial \hat U_j(\beta_{\mathcal{C}}, \beta_j)\big/\partial(\beta_{\mathcal{C}}^{T}, \beta_j), \qquad (17)$$
which is of dimension (q + 1) × (q + 1). Denote by σ̂j the estimated standard error of β̂j, obtained from the inverse of (17) evaluated at the estimates. For a given threshold γ > 0, we then have an alternative way to select the index set, given by
$$\hat{\mathcal{A}}^{W}_{\gamma} = \{\, j \notin \mathcal{C} : |\hat\beta_j|/\hat\sigma_j \ge \gamma \,\}. \qquad (18)$$
We refer to (18) as “CS-Wald”. Zhao and Li (2012) suggested a principled choice of γ based on f, the number of false positives that one is willing to tolerate, and Φ(·), the standard normal cumulative distribution function. Another alternative for constructing the screening statistics, which is also scale free, is to utilize the partial log-likelihood ratio statistic. Specifically,
$$\ell_j(\beta_{\mathcal{C}}, \beta_j) = \sum_{i=1}^{n}\delta_i\left[\beta_{\mathcal{C}}^{T} Z_{i,\mathcal{C}} + \beta_j Z_{i,j} - \log\Big\{\sum_{l=1}^{n} Y_l(X_i)\exp\big(\beta_{\mathcal{C}}^{T} Z_{l,\mathcal{C}} + \beta_j Z_{l,j}\big)\Big\}\right]. \qquad (19)$$
Suppose (β̃_𝒞, β̃j) maximizes (19) for a given j. Then, for a given threshold γ > 0, the index set can be chosen by considering the following likelihood ratio statistic:
$$\hat{\mathcal{A}}^{L}_{\gamma} = \Big\{\, j \notin \mathcal{C} : 2\big\{\ell_j(\tilde\beta_{\mathcal{C}}, \tilde\beta_j) - \ell_0(\tilde\beta_{\mathcal{C}}^{0})\big\} \ge \gamma \,\Big\}, \qquad (20)$$
where ℓ0(·) denotes the partial log-likelihood of the model that includes only Z_𝒞 and β̃_𝒞^0 is its maximizer. Hereafter (20) will be referred to as “CS-PLIK”.
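The two scale-free variants can be computed from the same conditional fits. The sketch below reuses `_fit_cox` from Sect. 2.2 and approximates the inverse of the information matrix in (17) by the inverse-Hessian approximation that BFGS returns; a production implementation would compute the observed information exactly, so this is only an assumption-laden illustration.

```python
import numpy as np

def cox_cs_alternatives(Z, time, event, cond_set):
    """Wald-type statistics as in (18) and partial likelihood ratio
    statistics as in (20), both conditional on the covariates in cond_set."""
    n, p = Z.shape
    cond_set = list(cond_set)
    base = _fit_cox(Z[:, cond_set], time, event)   # model containing Z_C only
    wald = np.full(p, np.nan)
    plik = np.full(p, np.nan)
    for j in range(p):
        if j in cond_set:
            continue
        full = _fit_cox(Z[:, cond_set + [j]], time, event)
        se_j = np.sqrt(full.hess_inv[-1, -1])      # BFGS inverse-Hessian proxy
        wald[j] = abs(full.x[-1]) / se_j           # standardized coefficient
        plik[j] = 2.0 * (base.fun - full.fun)      # LR statistic for adding Z_j
    return wald, plik
```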
4 Simulation studies
The utility of the proposed methods was evaluated via extensive simulations. Denote by CS-MPLE the version of CoxCS based on the criterion in (9). For completeness, we also considered two other variations of CoxCS, namely CS-PLIK and CS-Wald. The finite sample performance of the proposed methods was compared with the following marginal screening methods designed for survival data.
- CRIS: censored rank independence screening proposed by Song et al. (2014).
- CORS: correlation screening, an extension of sure independence screening to censored outcome data that uses inverse probability weighting; see Song et al. (2014).
- PSIS-Wald: the Wald test based on the marginal Cox model fitted on each covariate; see Zhao and Li (2012).
- PSIS-PLIK: the partial likelihood ratio test based on the marginal Cox model fitted on each covariate, which is asymptotically equivalent to the Wald-based screening of Zhao and Li (2012).
We illustrated our methods and compared them with the competing methods on data simulated as below.
Example 1 The survival time was generated from a Cox model with the baseline hazard function set to 1, i.e., λ(t | Z) = exp(βT Z), where the covariates Z were generated from the multivariate standard normal distribution with an equal correlation of 0.5.
Example 2 The survival time was generated from a Cox model with the baseline hazard function set to exp(−1), i.e., λ(t | Z) = exp(βT Z − 1), where all covariates were generated independently from the standard normal distribution.
Example 3 The same as Example 2 except that the first p – 1 covariates were generated from the multivariate standard normal distribution with an equal correlation of 0.9.
Example 4 The same as Example 1 except that the magnitudes of the active coefficients were reduced by half.
The simulated data examples were designed so that Z6 in Example 1 and Zp in Example 3 possess marginally weak but conditionally strong signals, which makes marginal screening approaches ill-suited for identifying them. Example 4 exhibits a scenario with smaller signal-to-noise ratios compared with Examples 1–3. For the GLM, Barut et al. (2016) provided a similar simulation design in the context of non-censored regression. Figure 2 depicts the distribution of the absolute correlation between the survival time and the covariates, where the uncensored observations were used to compute the marginal correlation between the event time and each covariate via inverse probability weighting (Song et al. 2014). In Examples 1–3, the magnitude of the marginal correlation between the survival time and the inactive variables is sometimes even larger than that between the survival time and the active variables. It is therefore expected that competing methods such as CORS, which rely on the marginal correlation between the survival time and the covariates, would have difficulty identifying the active variables. Clearly, in Example 1 the marginal signal strength of Z6 is weaker than that of most noisy (inactive) variables (Z7–Z1000), while the marginal signal strength of Z1000 in Examples 2 and 3 is similar to or even lower than that of most noisy variables. The marginal correlation between the survival time and each variable becomes weaker with heavier censoring.
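For reference, one plausible way to compute inverse-probability-weighted marginal correlations of this kind is sketched below: the censoring distribution is estimated by Kaplan–Meier, uncensored subjects are weighted by 1/Ĝ(Xi), and a weighted Pearson correlation is computed. The exact weighting and left-limit conventions of Song et al. (2014) may differ, and the function names are ours.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t),
    evaluated at each subject's observed time (censoring is the 'event')."""
    order = np.argsort(time)
    c = 1 - event[order]                       # censoring indicators, time-ordered
    at_risk = np.arange(len(time), 0, -1)
    surv = np.cumprod(1.0 - c / at_risk)
    G = np.empty(len(time))
    G[order] = surv
    return G

def ipw_marginal_correlation(Z, time, event):
    """Weighted correlation between each covariate and the observed time,
    using only uncensored subjects with weights proportional to 1/G(X_i)."""
    G = km_censoring_survival(time, event)
    w = event / np.clip(G, 1e-8, None)         # zero weight for censored subjects
    w = w / w.sum()
    tc = time - np.sum(w * time)               # weighted centering
    Zc = Z - Z.T @ w
    cov = Zc.T @ (w * tc)
    sd_t = np.sqrt(np.sum(w * tc ** 2))
    sd_z = np.sqrt(np.sum(w[:, None] * Zc ** 2, axis=0))
    return cov / (sd_t * sd_z)
```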
In all these examples, the censoring times Ci were independently generated from a uniform distribution U[0, c], with c chosen to yield approximately 20% and 60% censoring. We set n = 100 and varied p from 1000 (high-dimensional) to 10,000 (ultrahigh-dimensional). For each configuration, a total of 400 simulated datasets were generated.
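The following generator mimics the structure of Example 1 (equicorrelated normal covariates, unit baseline hazard, uniform censoring). The coefficient values are hypothetical placeholders, chosen only so that Z6 has zero marginal covariance with the linear predictor; the paper's actual coefficients are not reproduced here, and the censoring upper bound c must be tuned to hit the 20% or 60% censoring rates.

```python
import numpy as np

def simulate_example1_like(n=100, p=1000, c=3.0, seed=0):
    """Cox data with equal correlation 0.5, lambda_0(t) = 1, U[0, c] censoring."""
    rng = np.random.default_rng(seed)
    rho = 0.5                                            # equal pairwise correlation
    W0 = rng.standard_normal((n, 1))                     # shared factor
    Z = np.sqrt(rho) * W0 + np.sqrt(1 - rho) * rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 1.0                                       # hypothetical active signals
    beta[5] = -2.5                  # hidden variable: Cov(Z6, Z @ beta) = 0 by design
    T = rng.exponential(scale=np.exp(-Z @ beta))         # Cox model with lambda_0 = 1
    C = rng.uniform(0.0, c, size=n)                      # tune c for 20% / 60% censoring
    return Z, np.minimum(T, C), (T <= C).astype(int), beta
```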
We considered two metrics to compare the performance of the different methods: the minimum model size (MMS), which is the minimum number of variables that need to be selected in order to include all active variables, and the true positive rate (TPR), which is the proportion of active variables included among the first n selected variables. Hence, a method with a small MMS and a large TPR is more efficient at discovering true signals. For a fair comparison, we added the number of conditioning variables to the MMS of the proposed methods (CS-MPLE, CS-Wald and CS-PLIK). In practice, identifying the conditioning set normally requires some prior biological information; in the absence of such knowledge, a practical solution is to condition on the variable with the highest marginal signal strength. In Examples 1–4, we considered two conditioning sets. The first choice was the singleton set containing Z1, since the signal of the first variable is strong enough to be easily picked up by the other marginal screening methods. The second choice consisted of one active variable (Z1) and four noisy variables, which allows us to assess the robustness of the proposed method when the conditioning set is dominated by noisy variables.
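Given a vector of screening statistics, the two metrics can be computed as sketched below. Tie-handling and whether the conditioning variables count toward the first n selected variables are not spelled out in the text, so the conventions used here (conditioning variables always treated as selected, and their number added to MMS) are assumptions.

```python
import numpy as np

def mms_and_tpr(stats, active, n, cond_set=()):
    """Minimum model size and true positive rate for one simulated dataset."""
    order = np.argsort(-np.nan_to_num(stats, nan=-np.inf))   # decreasing statistics
    ranks = np.empty(len(stats), dtype=int)
    ranks[order] = np.arange(1, len(stats) + 1)
    to_find = [a for a in active if a not in set(cond_set)]
    mms = int(ranks[to_find].max()) + len(cond_set)   # conditioning set counted in
    selected = set(order[:n].tolist()) | set(cond_set)
    tpr = float(np.mean([a in selected for a in active]))
    return mms, tpr
```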
Tables 1 and 2 demonstrate the advantages of the proposed methods under the difficult scenarios reflected in Examples 1–4. Although the performance of the proposed methods deteriorates somewhat in Example 4 due to the smaller signal-to-noise ratio, in all examples (including Example 4) the proposed methods outperform the marginal approaches by greatly reducing the MMS. Moreover, all the marginal screening methods have difficulty identifying Z6 in Examples 1 and 4 and Z1000 in Examples 2 and 3; indeed, these variables have the lowest priority to be included by the marginal methods. Finally, the results with the second, noise-dominated conditioning set demonstrate that conditioning can be beneficial even when the conditioning set is dominated by noisy variables.
Table 1.
Scenario | Method | MMS, (n, p) = (100, 1000) | TPR, (n, p) = (100, 1000) | MMS, (n, p) = (100, 10000) | TPR, (n, p) = (100, 10000)
---|---|---|---|---|---
CRIS | 1000.0 (3.0) | 0.50 (0.17) | 9995.0 (37.0) | 0.17 (0.17) | |
CORS | 944.0 (168.2) | 0.33 (0.33) | 9466.0 (1101.8) | 0.00 (0.17) | |
Example 1 CR ≈ 20% | PSIS-PLIK | 1000.0 (0.0) | 0.67 (0.17) | 10000.0 (2.2) | 0.33 (0.17) |
PSIS-Wald | 1000.0 (0.0) | 0.67 (0.17) | 10000.0 (2.2) | 0.33 (0.17) | |
CS-PLIK ( ) | 152.5 (272.2) | 0.83 (0.17) | 1322.0 (2253.8) | 0.67 (0.17) | |
CS-Wald ( ) | 154.5 (274.5) | 0.83 (0.17) | 1286.5 (2287.0) | 0.67 (0.17) | |
CS-MPLE ( ) | 143.0 (249.0) | 0.83 (0.17) | 1321.0 (2305.8) | 0.50 (0.17) | |
CS-PLIK ( ) | 109.5 (212.0) | 0.83 (0.17) | 1012.5 (1906.5) | 0.67 (0.33) | |
CS-Wald ( ) | 111.0 (206.8) | 0.83 (0.17) | 1021.0 (1906.0) | 0.67 (0.33) | |
CS-MPLE ( ) | 110.0 (199.8) | 0.83 (0.17) | 1070.0 (1892.8) | 0.67 (0.17) | |
CRIS | 926.5 (151.8) | 0.33 (0.17) | 9429.0 (1405.0) | 0.00 (0.17) | |
CORS | 898.0 (152.2) | 0.00 (0.17) | 8976.0 (1610.0) | 0.00 (0.00) | |
Example 1 CR ≈ 60% | PSIS-PLIK | 1000.0 (2.0) | 0.67 (0.17) | 9999.0 (17.0) | 0.33 (0.17) |
PSIS-Wald | 1000.0 (2.0) | 0.67 (0.17) | 9999.0 (17.0) | 0.33 (0.17) | |
CS-PLIK ( ) | 227.0 (351.2) | 0.83 (0.17) | 2383.5 (3331.5) | 0.50 (0.33) | |
CS-Wald ( ) | 227.5 (348.2) | 0.83 (0.17) | 2403.5 (3233.2) | 0.50 (0.33) | |
CS-MPLE ( ) | 228.5 (320.5) | 0.83 (0.17) | 2229.5 (3046.8) | 0.50 (0.17) | |
CS-PLIK ( ) | 206.0 (279.2) | 0.83 (0.17) | 1012.5 (1906.5) | 0.67 (0.33) | |
CS-Wald ( ) | 207.0 (278.0) | 0.83 (0.17) | 1021.0 (1906.0) | 0.67 (0.33) | |
CS-MPLE ( ) | 207.5 (267.2) | 0.83 (0.17) | 1070.0 (1892.8) | 0.67 (0.17) | |
CRIS | 262.5 (401.8) | 0.50 (0.00) | 2871.0 (4314.0) | 0.50 (0.00) | |
CORS | 490.5 (510.8) | 0.50 (0.00) | 4878.5 (5099.8) | 0.50 (0.00) | |
Example 2 CR ≈ 20% | PSIS-PLIK | 318.0 (492.8) | 0.50 (0.00) | 3777.0 (5690.8) | 0.50 (0.00) |
PSIS-Wald | 318.5 (494.2) | 0.50 (0.00) | 3786.5 (5707.8) | 0.50 (0.00) | |
CS-PLIK ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-Wald ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-PLIK ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (0.0) | 1.00 (0.00) | |
CS-Wald ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (0.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (0.0) | 1.00 (0.00) | |
CRIS | 399.5 (478.0) | 0.50 (0.00) | 3601.0 (4894.2) | 0.50 (0.00) | |
CORS | 603.5 (390.5) | 0.00 (0.00) | 5988.0 (4353.0) | 0.00 (0.00) | |
Example 2 CR ≈ 60% | PSIS-PLIK | 325.0 (498.5) | 0.50 (0.00) | 3942.5 (5679.5) | 0.50 (0.00) |
PSIS-Wald | 322.0 (501.0) | 0.50 (0.00) | 3915.5 (5679.5) | 0.50 (0.00) | |
CS-PLIK ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (1.0) | 1.00 (0.00) | |
CS-Wald ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (2.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (4.0) | 1.00 (0.00) | |
CS-PLIK ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (4.0) | 1.00 (0.00) | |
CS-Wald ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (5.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 6.0 (0.0) | 1.00 (0.00) | 7.0 (7.2) | 1.00 (0.00) | |
CRIS | 1000.0 (0.0) | 0.50 (0.00) | 10000.0 (0.0) | 0.50 (0.00) | |
CORS | 1000.0 (0.0) | 0.50 (0.50) | 10000.0 (0.0) | 0.00 (0.00) | |
Example 3 CR ≈ 20% | PSIS-PLIK | 1000.0 (0.0) | 0.50 (0.00) | 10000.0 (0.0) | 0.50 (0.00) |
PSIS-Wald | 1000.0 (0.0) | 0.50 (0.50) | 10000.0 (0.0) | 0.00 (0.50) | |
CS-PLIK ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-Wald ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 3.0 (4.0) | 1.00 (0.00) | 7.5 (33.0) | 1.00 (0.00) | |
CS-PLIK ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (0.0) | 1.00 (0.00) | |
CS-Wald ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (0.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 24.0 (29.0) | 1.00 (0.00) | 168.5 (266.8) | 0.50 (0.50) | |
CRIS | 1000.0 (0.0) | 0.50 (0.00) | 10000.0 (0.0) | 0.50 (0.00) | |
CORS | 783.0 (463.8) | 0.00 (0.50) | 7967.0 (4972.2) | 0.00 (0.00) | |
Example 3 CR ≈ 60% | PSIS-PLIK | 1000.0 (0.0) | 0.50 (0.00) | 10000.0 (0.0) | 0.50 (0.00) |
PSIS-Wald | 1000.0 (0.0) | 0.00 (0.00) | 10000.0 (0.0) | 0.00 (0.00) | |
CS-PLIK ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-Wald ( ) | 2.0 (0.0) | 1.00 (0.00) | 2.0 (0.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 20.0 (55.2) | 1.00 (0.00) | 161.5 (553.5) | 0.50 (0.50) | |
CS-PLIK ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (2.0) | 1.00 (0.00) | |
CS-Wald ( ) | 6.0 (0.0) | 1.00 (0.00) | 6.0 (3.0) | 1.00 (0.00) | |
CS-MPLE ( ) | 105.0 (127.5) | 0.50 (0.50) | 973.5 (1303.2) | 0.50 (0.00) |
Table 2.
Scenario | Method | MMS, (n, p) = (100, 1000) | TPR, (n, p) = (100, 1000) | MMS, (n, p) = (100, 10000) | TPR, (n, p) = (100, 10000)
---|---|---|---|---|---
CRIS | 997.0 (17.0) | 0.33 (0.17) | 9954.5 (205.0) | 0.17 (0.17) | |
CORS | 947.5 (180.5) | 0.33 (0.33) | 9566.0 (1618.2) | 0.00 (0.17) | |
Example 4 CR ≈ 20% | PSIS-PLIK | 1000.0 (4.0) | 0.67 (0.17) | 9997.0 (47.2) | 0.33 (0.17) |
PSIS-Wald | 1000.0 (4.0) | 0.67 (0.17) | 9997.0 (47.2) | 0.33(0.17) | |
CS-PLIK ( ) | 252.0 (383.0) | 0.83 (0.17) | 2428.0 (3387.2) | 0.50 (0.33) | |
CS-Wald ( ) | 253.0 (385.2) | 0.83 (0.17) | 2451.0 (3394.0) | 0.50 (0.33) | |
CS-MPLE ( ) | 250.0 (349.5) | 0.83 (0.17) | 2320.5 (3354.5) | 0.50 (0.17) | |
CS-PLIK ( ) | 237.5 (387.0) | 0.83 (0.17) | 2248.5 (3610.2) | 0.50 (0.17) | |
CS-Wald ( ) | 241.0 (388.8) | 0.83 (0.17) | 2258.0 (3629.2) | 0.50 (0.17) | |
CS-MPLE ( ) | 232.0 (366.5) | 0.83 (0.17) | 2139.0 (3554.8) | 0.50 (0.33) | |
CRIS | 918.0 (144.2) | 0.17 (0.33) | 9137.0 (1529.2) | 0.00 (0.00) | |
CORS | 895.0 (155.0) | 0.00 (0.17) | 8876.5 (1514.2) | 0.00 (0.00) | |
Example 4 CR ≈ 60% | PSIS-PLIK | 998.0 (26.0) | 0.50 (0.33) | 9972.5 (206.0) | 0.17 (0.17) |
PSIS-Wald | 998.0 (26.0) | 0.50 (0.33) | 9972.5 (212.2) | 0.17(0.17) | |
CS-PLIK ( ) | 387.0 (418.8) | 0.67 (0.33) | 4082.5 (4522.8) | 0.33 (0.17) | |
CS-Wald ( ) | 389.0 (423.5) | 0.67 (0.33) | 4076.5 (4553.5) | 0.33 (0.17) | |
CS-MPLE ( ) | 365.5 (377.2) | 0.67 (0.33) | 3906.5 (4356.5) | 0.33 (0.33) | |
CS-PLIK ( ) | 375.5 (472.2) | 0.67 (0.33) | 4298.5 (4557.8) | 0.33 (0.17) | |
CS-Wald ( ) | 373.0 (471.5) | 0.67 (0.33) | 4283.5 (4585.5) | 0.33 (0.17) | |
CS-MPLE ( ) | 373.0 (448.2) | 0.67 (0.33) | 4151.5 (4479.5) | 0.33 (0.17) |
The performance of CS-MPLE, CS-Wald and CS-PLIK is quite similar in all the cases of Examples 1, 2 and 4. In Example 3, where the correlation among the covariates is very high, CS-MPLE has a slightly larger MMS than CS-Wald and CS-PLIK, which control the false discovery rate well in this case.
Figure 3 shows that the proposed methods efficiently identify all active variables while including far fewer variables than the other methods. Specifically, in Example 1 the proposed methods achieve a 100% TPR by choosing at most one fifth of the variables. In contrast, the other competing methods have to include almost all the variables in order to discover all active variables. In Examples 2 and 3, the proposed methods quickly detect the remaining active variable (Zp) by conditioning on the active variable Z1, whereas the marginal screening approaches have to include far more variables to ensure a 100% TPR.
5 Application
We illustrated the practical utility of the proposed method by applying it to the DLBCL dataset of Rosenwald et al. (2002). The dataset, originally collected to identify gene signatures relevant to patient survival from the time of chemotherapy, included a total of 240 DLBCL patients with 138 deaths observed during follow-up and a median survival time of 2.8 years. Along with the clinical outcomes, the expression levels of 7399 genes were available for analysis. In our subsequent analysis, each gene expression was standardized to have mean zero and unit variance.
To facilitate the use of our method, we identified the conditioning set by resorting to the medical literature. As the gene AA805575, a germinal-center B-cell signature gene, is known to be predictive of DLBCL patients' survival (see Liu et al. 2013; Gui and Li 2005), we used it as the conditioning variable in our proposed procedure. For comparison, we also analyzed the same data using the competing methods introduced in the simulation section and computed the corresponding concordance statistics (C-statistics) (Uno et al. 2011).
Specifically, we randomly assigned 160 patients to the training set and 80 patients to the testing set, while keeping the censoring proportion roughly the same in each set. For each split, we applied each method to the training set to select the top 31 (≈ 160/log(160)) variables, as suggested by Fan and Lv (2008). LASSO was then performed for refined modeling, with the tuning parameter selected by 10-fold cross-validation. The risk score for each subject was obtained from the final model selected by LASSO in the training set, and the C-statistic was computed on the testing set. A total of 100 splits were made, and the average C-statistic and model size (MS) are reported in Table 3. By the criterion of the C-statistic, the proposed methods appear to have greater predictive power.
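A sketch of one train/test split of this screen-then-refit pipeline is given below, reusing `cox_cs` from Sect. 2.2. It is only an approximation of the analysis in the text: the L1 penalty comes from lifelines' elastic-net implementation with a fixed `penalizer` standing in for the 10-fold cross-validated tuning parameter, and Harrell's concordance index is used in place of Uno's C-statistic.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

def screen_then_lasso_cstat(Z, time, event, cond_set, n_train=160,
                            penalizer=0.1, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(time))
    tr, te = idx[:n_train], idx[n_train:]

    # Step 1: conditional screening on the training set (top n/log(n) genes).
    _, keep = cox_cs(Z[tr], time[tr], event[tr], cond_set)
    kept = sorted(set(keep.tolist()) | set(cond_set))
    cols = [f"g{j}" for j in kept]

    # Step 2: LASSO-penalized Cox refit on the screened genes.
    train = pd.DataFrame(Z[tr][:, kept], columns=cols)
    train["time"], train["event"] = time[tr], event[tr]
    cph = CoxPHFitter(penalizer=penalizer, l1_ratio=1.0)
    cph.fit(train, duration_col="time", event_col="event")

    # Step 3: concordance on the held-out test set (higher risk = shorter survival).
    test = pd.DataFrame(Z[te][:, kept], columns=cols)
    risk = cph.predict_partial_hazard(test).to_numpy().ravel()
    return concordance_index(time[te], -risk, event[te])
```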
Table 3.
 | CRIS | CORS | PSIS-PLIK | PSIS-Wald
---|---|---|---|---
C-statistics | 0.54 (0.21) | 0.58 (0.20) | 0.58 (0.19) | 0.55 (0.20)
Model size | 14.41 (3.00) | 6.83 (3.70) | 15.22 (2.93) | 15.65 (2.89)

 | CS-MPLE | CS-PLIK | CS-Wald
---|---|---|---
C-statistics | 0.63 (0.18) | 0.63 (0.18) | 0.62 (0.19)
Model size | 16.74 (3.26) | 15.90 (3.01) | 16.28 (3.41)
Our further scientific investigation focused on identifying the relevant genes using the full dataset. Applying our proposed method, we selected the top 44 (≈ 240/log(240)) genes before using LASSO to reach the final list. After the LASSO step, CS-MPLE, CS-PLIK and CS-Wald retained 20, 16 and 16 genes, respectively. In total, 22 unique genes were selected by at least one of the three methods, of which 14 were selected by all three; they are reported in Table 4. Twelve of these 22 genes belong to the lymph-node signature, proliferation signature, or germinal-center B-cell groups defined by Rosenwald et al. (2002). We observed that 13 of the 22 genes were also chosen by at least one of CRIS, CORS, PSIS-PLIK and PSIS-Wald. On the other hand, the genes AB007866, Z50115, S78085, U00238, AL050283, J03040, U50196, AA830781 and M81695 were identified only by our methods.
Table 4.
GenBank ID | Signature | Description | CS-MPLE | CS-PLIK | CS-Wald
---|---|---|---|---|---
LC_25054 | | | ✓ | ✓ | ✓
X77743 | Proliferation | Cyclin-dependent kinase 7 | ✓ | ✓ | ✓
U15552 | | Acidic 82 kDa protein mRNA | ✓ | ✓ | ✓
AB007866 | | KIAA0406 gene product | ✓ | |
BC012161 | Proliferation | Septin 1 | ✓ | ✓ | ✓
AF134159 | Proliferation | Chromosome 14 open reading frame 1 | ✓ | ✓ | ✓
Z50115 | Proliferation | Thimet oligopeptidase 1 | ✓ | |
S78085 | Proliferation | Programmed cell death 2 | ✓ | |
M29536 | Proliferation | Eukaryotic translation initiation factor 2 | ✓ | ✓ | ✓
U00238 | Proliferation | Phosphoribosyl pyrophosphate amidotransferase | ✓ | |
AL050283 | Proliferation | Sentrin/SUMO-specific protease 3 | ✓ | |
BF129543 | Germinal-center B-cell | ESTs, weakly similar to A47224 thyroxine-binding globulin precursor | ✓ | ✓ | ✓
M81695 | | Integrin, alpha X | ✓ | ✓ | ✓
D13666 | Lymph node | Osteoblast specific factor 2 (fasciclin I-like) | ✓ | ✓ | ✓
J03040 | Lymph node | Secreted protein | | ✓ | ✓
U50196 | | Adenosine kinase | ✓ | ✓ | ✓
U28918 | Proliferation | Suppression of tumorigenicity 13 | | ✓ | ✓
AA721746 | | ESTs | ✓ | ✓ | ✓
AA830781 | | | ✓ | ✓ | ✓
AF127481 | | Lymphoid blast crisis oncogene | ✓ | |
M61906 | | Phosphoinositide-3-kinase | ✓ | ✓ | ✓
D42043 | | KIAA0084 protein | ✓ | ✓ | ✓
In fact, only a few studies have suggested an important role of M81695 (Deb and Reddy 2003; Chow et al. 2001; Mikovits et al. 2001; Stewart and Schuh 2000) or AA830781 (Li and Luan 2005; Binder and Schumacher 2009; Schifano et al. 2010) in predicting DLBCL survival. As the marginal correlations between M81695 and the survival time and between AA830781 and the survival time are markedly low, at 0.008 and 0.097 respectively, these two genes are very likely to be missed by conventional marginal screening approaches. Indeed, AA830781 was identified by Schifano et al. (2010) only because of its co-expression or correlation with other relevant genes. A more detailed investigation of the functions of the identified genes in the context of a broader class of blood cancers, including lymphoma, may shed light on preventing, treating and controlling these lethal cancers.
6 Discussion
In this paper, we have proposed a new conditional variable screening approach for the Cox proportional hazards model with ultrahigh-dimensional covariates. The proposed partial-likelihood-based CS approaches are computationally efficient and rest on a solid theoretical foundation. Our method and theory extend the conditional sure independence screening (CSIS) of Barut et al. (2016), which is designed for the GLM. In the development of the theory, we introduce the concept of the conditional linear covariance, which is useful for specifying the regularity conditions for model identifiability and the sure screening property. This also provides a building block for a general theoretical framework of conditional variable screening in the context of other semiparametric models, such as the partially linear single-index model.
We have mainly focused on studying the theoretical properties of CS-MPLE, which extends the work of Barut et al. (2016) in the GLM setting; the development of inference procedures for the two variants of the proposed method, CS-PLIK and CS-Wald, is more involved and beyond the scope of this paper. However, as indicated by the simulation studies, these two variants may yield substantial improvements, especially when the covariates are highly correlated. More research is warranted.
Our work also suggests a few directions that are worth pursuing. First, as our proposal requires the prior information to be known and informative, it remains statistically challenging to develop efficient screening methods in the absence of such information. Recently, in the context of the GLM, Hong et al. (2016) proposed a data-driven alternative when a pre-selected set of variables is unavailable. It is thus of substantial interest to develop a data-driven CS procedure for survival models. Second, even with prior knowledge, an open question often lies in how to balance it against the information extracted from the data at hand. There has been some recent work on incorporating prior information. For example, Wang et al. (2013) developed a LASSO method that assigns different prior distributions to each subset according to a modified Bayesian information criterion incorporating prior knowledge on both the network structure and the pathway information, and Jiang et al. (2015) proposed the "prior lasso" (plasso) to balance the prior information and the data. A natural extension of the current work is to develop a variable screening approach that incorporates more complex prior knowledge, such as the network structure or the spatial information of the covariates. We will report the progress elsewhere.
Acknowledgments
This research was partially supported by a grant from NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang) and Chinese Natural Science Foundation (11528102, Li).
Appendix
The basic properties of the conditional linear expectation are listed in the following proposition.
Proposition 1
if and only if , for all .
The proof is straightforward based on Definitions 2 and 3.
Proposition 2
Let and be any four random variables in the probability space The following properties hold for the conditional linear expectation given :
Closed form: .
Stability: .
Linearity: , where A1 and A2 are two matrices that are compatible with the equation.
Law of total expectation: .
Remark 1 In general, . Also, independence of ζ and ξ does not imply , unless ζ and ξ are jointly normally distributed.
Remark 2 By Proposition 2, we can easily verify the following properties.
Proposition 3
The conditional linear covariance defined in Definition 5 has the following properties:
- Linear independence and linear zero correlation:
- Expectation of conditional linear covariance:
- Sign: for any increasing function and random variable , then
Combining Propositions 1–3 and based on Definition 6 we have the following property.
Proposition 4
if and only if .
Lemma 1
The solution of and the solution of are both unique, for any .
Proof of Theorem 1
Proof First we connect βj to the expected conditional linear covariance between Zj and given , that is
then by Condition 2, we relate it to . For any and , it is straightforward to see that
(21) |
and
(22) |
for m = 0, 1. Then
(23) |
where
By Proposition 2,
By Definition 6,
where
and
By Definition 2, ,
When , then by Condition 2.3. Thus . Also, by Propositions 1 and 2, , then . By uniqueness in Lemma 1, βj = 0.
When α j ≠ 0, by Condition 2, we have
This implies that and are both nonzero and have the same signs, since they are equal. Next we show that, for any , and have opposite signs unless they are equal to zero. This fact implies that βj ≠ 0. Specifically, note that P(δ = 1 | Z) is the probability of the event occurring and represents the probability of being at risk at time t. Based on model (1), for any t,
By Proposition 3, and have the opposite signs unless they are zero. This further implies that for any ,
and have opposite signs unless they are equal to zero. Therefore, βj ≠ 0.
Proof of Theorem 2
Proof For any , we have βj ≠ 0 by Theorem 1. By the mean value theorem, for some ,
Next we show that is bounded. For any given , consider as a function of β. Then
where
By Condition 2.1, P(|Z| < K0) = 1, then Thus,
By the proof of Theorem 1, and have opposite signs, and by Condition 2,
Taking , . This completes the proof.□
Proof of Theorem 3
Proof For any and , by Lin and Wei (1989), we have
where En [·] denotes the empirical measure, which is defined as for any random variables , and are independent over i, and write with
Note that given any i, j, k, with probability one are uniformly bounded. Specifically, by Conditions 1.2, 2.1 and 3, with probability one, for all , ,
and
Thus, with probability one,
where . By the fact that ,
By Lemma 2.2.9 (Bernstein's inequality) of van der Vaart and Wellner (1996), for any t > 0 and all j, k, and β, we have
Note that the above inequality holds for every and . By the Bonferroni inequality,
Since,
Then for any ε1 > 0 and ε2 > 0, there exists N1 such that for any n > N1,
where M is the same value as in Condition 4. By the triangle inequality and the Bonferroni inequality, we have
When , taking on both sides of the inequality, where c2 is the same value as in Theorem 2, we have
Take N = max{⌈(K2/3)1/κ⌉,N1}, then for any n > N, n−κ< 3/K2, and
Note that the above inequality holds for all , particularly for . Also, we have . By Condition 4, we have
Taking and applying the Bonferroni inequality completes the proof of part 1.
For part 2, by Theorem 2,
Note that, for any , event
Take with c4 = c2/4,
Thus,
Let we have for any ,
Note that the left side of the above equation does not depend on n any more. Taking completes the proof.
References
- Barut E, Fan J, Verhasselt A. Conditional sure independence screening. J Am Stat Assoc. 2016;116:544–557. doi: 10.1080/01621459.2015.1092974.
- Binder H, Schumacher M. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinform. 2009;10:18. doi: 10.1186/1471-2105-10-18.
- Chow ML, Moler EJ, Mian IS. Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. Physiol Genomics. 2001;5:99–111. doi: 10.1152/physiolgenomics.2001.5.2.99.
- Deb K, Reddy AR. Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems. 2003;72:111–129. doi: 10.1016/s0303-2647(03)00138-2.
- Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion). J R Stat Soc B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x.
- Fan J, Feng Y, Wu Y. High-dimensional variable selection for Cox's proportional hazards model. In: Borrowing Strength: Theory Powering Applications—A Festschrift for Lawrence D. Brown. IMS Collections, Vol. 6. Institute of Mathematical Statistics; Beachwood: 2010. pp. 70–86.
- Gui J, Li H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005;21(13):3001–3008. doi: 10.1093/bioinformatics/bti422.
- Hong H, Wang L, He X. A data-driven approach to conditional screening of high dimensional variables. Stat. 2016;5(1):200–212.
- Jiang Y, He Y, Zhang H. Variable selection with prior information for generalized linear models via the prior Lasso method. J Am Stat Assoc. 2015;111(513):355–376. doi: 10.1080/01621459.2015.1008363.
- Li H, Luan Y. Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics. 2005;21(10):2403–2409. doi: 10.1093/bioinformatics/bti324.
- Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84(408):1074–1078.
- Liu XY, Liang Y, Xu ZB, Zhang H, Leung KS. Adaptive shooting regularization method for survival analysis using gene expression data. Sci World J. 2013;2013:475702. doi: 10.1155/2013/475702.
- Mikovits J, Ruscetti F, Zhu W, Bagni R, Dorjsuren D, Shoemaker R. Potential cellular signatures of viral infections in human hematopoietic cells. Dis Markers. 2001;17(3):173–178. doi: 10.1155/2001/896953.
- Rosenwald A, Wright G, Chan W, Connors J, Campo E, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914.
- Schifano ED, Strawderman RL, Wells MT. MM algorithms for minimizing nonsmoothly penalized objective functions. Electron J Stat. 2010;4:1258–1299.
- Song R, Lu W, Ma S, Jeng XJ. Censored rank independence screening for high-dimensional survival data. Biometrika. 2014;101(4):799–814. doi: 10.1093/biomet/asu047.
- Stewart AK, Schuh AC. White cells 2: impact of understanding the molecular basis of haematological malignant disorders on clinical practice. Lancet. 2000;355(9213):1447–1453. doi: 10.1016/s0140-6736(00)02150-4.
- Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–1117. doi: 10.1002/sim.4154.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; New York: 1996.
- Wang Z, Xu W, San Lucas F, Liu Y. Incorporating prior knowledge into gene network study. Bioinformatics. 2013;29:2633–2640. doi: 10.1093/bioinformatics/btt443.
- Zhao SD, Li Y. Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal. 2012;105(1):397–411. doi: 10.1016/j.jmva.2011.08.002.