Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2018 Jan 6.
Published in final edited form as: Lifetime Data Anal. 2016 Dec 8;24(1):45–71. doi: 10.1007/s10985-016-9387-7

Conditional screening for ultra-high dimensional covariates with survival outcomes

Hyokyoung G Hong 1, Jian Kang 2,, Yi Li 2
PMCID: PMC5494024  NIHMSID: NIHMS865250  PMID: 27933468

Abstract

Identifying important biomarkers that are predictive for cancer patients’ prognosis is key in gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve excessive number of biomarkers, has brought high demand in designing efficient screening tools for selecting predictive biomarkers. The vast amount of biomarkers defies any existing variable selection methods via regularization. The recently developed variable screening methods, though powerful in many practical setting, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak while jointly important signals. We propose a new conditional screening method for survival outcome data by computing the marginal contribution of each biomarker given priorily known biological information. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods of analyzing survival data.

Keywords: Conditional screening, Cox model, Diffuse large B-cell lymphoma, High-dimensional variable screening

1 Introduction

Despite much progress made in the past two decades, many cancers do not have a proven means of prevention or effective treatments. Precision medicine that takes into account individual susceptibility has become a valid approach to gaining better insights into the biological influences on cancers, which is expected to benefit millions of cancer patients. A critical component of precision medicine lies in detecting and identifying important biomarkers that are predictive for cancer patients’ prognosis. The emergence of large-scale biomedical survival studies, which typically involve excessive number of biomarkers, has brought high demand in designing efficient screening tools for selecting predictive biomarkers. The presented work is motivated by a genomic study of diffuse large B-cell lymphoma (DLBCL) (Rosenwald et al. 2002), with the goal of identifying gene signatures out of 7399 genes for predicting survival among 240 DLBCL patients. The results may address whether the DLBCL patients’ survival after chemotherapy could be regulated by the molecular features.

The recently developed variable screening methods, such as the sure independence screening proposed by Fan and Lv (2008), have emerged as a powerful tool to solve this problem, but their validity often hinges upon the partial faithfulness assumption, that is, the jointly important variables are also marginally important. Consequently, they will fail to identify the hidden variables that are jointly important but have weak marginal associations with the outcome, resulting in poor understanding of the molecular mechanism underlying or regulating the disease. To alleviate this problem, Fan and Lv (2008) further suggested an iterative procedure (ISIS) by repeatedly using the residuals from the previous iterations, which has gained much popularity. However, the required iterations have increased the computational burden, and the statistical properties are elusive.

On the other hand, intensive biomedical research has generated a large body of biological knowledge. For example, several studies have confirmed AA805575, a germinal-center B-cell signature gene, is relevant to DLBCL survival (Gui and Li 2005; Liu et al. 2013). Including such prior knowledge for improved accuracy in variable selection has drawn much interest. Barut et al. (2016) proposed a conditional screening (CS) approach in the framework of a generalized linear model (GLM) when some prior knowledge on feature selection is known, and showed that the CS approach provides a useful means to identify jointly informative but marginally weak associations, and Hong et al. (2016) further proposed to integrate prior information using data-driven approaches.

Development of high dimensional screening tools with survival outcome has been fruitful. Some related work includes an (iterative) sure screening procedure for Cox’s proportional hazards model (Fan et al. 2010), a marginal maximum partial likelihood estimator (MPLE) based screening procedure (Zhao and Li 2012), a censored rank independence screening method which is robust to outliers and applicable to a general class of survival models (Song et al. 2014). But to the best of our knowledge, all these methods essentially posit the partial faithfulness assumption and do not incorporate the known prior biological information. As a result, they will be likely to suffer the inability to identify marginally weak but jointly important signals.

To fill the gap, we propose a new CS method for the Cox proportional hazards model by computing the marginal contribution of each covariate given priorily known information. We refer to it as Cox conditional screening (CoxCS). As opposed to the conventional marginal screening methods, our method is more likely to detect marginally weak but jointly important signals, which will have important biological applications as shown in the data example section. Moreover, in contrast with most screening methods that usually employ subjective thresholds for screening, we also propose a principled cut-off to govern the screening and control the false positives in light of Zhao and Li (2012). This will be especially important in the presence of hidden variables.

To demonstrate the utility of CoxCS in recovering important hidden variables, we briefly discuss using Example 1 in Sect. 4. By design, variable 6 is the hidden variable in that example as it is only weakly correlated with the survival time marginally. Let β^C,j denote the screening statistics by the CoxCS approach (defined in Sect. 2), where C indexes variables that are pre-included into the model. When C=, the CoxCS is equivalent to the marginal screening approach for the Cox proportional hazards model. Figure 1 summarizes the densities of the screening statistics for the hidden variable 6 and noisy variables 10–1000 for different sets of conditional variables based on 400 simulated datasets. The results show that with a high probability the marginal screening statistic for hidden variable 6 is much smaller than those of noisy variables. When the conditioning set includes one truly active variable, the density plots show a clear separation between the hidden variables and the noisy variables. When we include more truly active variables, this separation becomes larger. Interestingly, when conditional on noisy variables that are correlated with both active and hidden variables in the model, the chance of identifying the hidden variable using CoxCS is still higher than the marginal screening. A similar phenomenon was observed in the GLM setting (Barut et al. 2016). This is because when such “noisy” variables are correlated with both marginally important variables and hidden variables, they may effectively function as surrogates for the active variables and conditioning on them can help detect hidden variables.

Fig. 1.

Fig. 1

Density of the screening statistics |β^C,6| (red) for the hidden variable compared with a mixture of densities of screening statistics |β^C,11:1000| (blue) for the noise variables with different conditioning sets: a C={} which is equivalent to marginal screening; b C={1} one truly active variables; c C={1,2} two truly active variables; d C={7,,10} four noisy variables (Color figure online)

The theory of CS for GLM has been established by Barut et al. (2016). But its extension to the survival context is challenging and elusive, calling for new techniques. To this end, we propose two new functional operators on random variables to characterize their linear associations given other random variables: the conditional linear expectation and the conditional linear covariance. Both are critical to formulate the regularity conditions for the population level properties of CoxCS with statistically meaningful interpretations, and facilitate the development of theory for CS approaches in general settings. A similar concept of the conditional linear covariance has been introduced by Barut et al. (2016), but it can not be used for the survival outcome data. In summary, the proposed method is computationally efficient, adapts to sparse and weak signals, enjoys the good theoretical properties under weak regularity conditions, and works robustly in a variety of settings to identify hidden variables.

The remaining of this paper is organized as follows. In Sect. 2, we review the Cox proportional hazards model and present CoxCS approach with some alternatives. In Sect. 3, we list the regularity conditions and establish the sure screening properties. In Sect. 4, we further conduct simulation studies to compare our method with the major competing methods under under a number of scenarios. In Sect. 4, we apply our method to study the DLBCL data. We conclude with a brief discussion on the future work in Sect. 5.

2 Model

2.1 The Cox proportional hazards model

Suppose we have n observations with p covariates. Let i and j be respectively index subjects and covariates. Denote by Zi,j covariate j for subject i, write Zi = (Zi,1,…, Zi,p)T. Let Ti be the underlying survival time and Ci be the censoring time. We observe Xi=min{Ti,Ci} and, δi=I[TiCi] where I(·) is the indicator function. Assume that there exists τ > 0, such that P(Xi > τ|Zi) > 0 and assume that the event time Ti and the censoring time Ci are independent given Zi. Suppose Ti follows a Cox proportional hazards model

λ(t;Zi)=λ0(t)exp(αTZi), (1)

where λ0(t) is an-unspecified baseline hazard and α=(α1,,αp)T is the true-coefficient. Let Λ0(t)=0tλ0(s)ds be the baseline cumulative hazard function. Suppose there is a set of covariates that are known a priori to be related to the survival outcome. Denote by C the indices of these covariates. Let q = | C| be the number of covariates in C. Write Zi,C=(Zi,j,jC)T, Zi,C=(Zi,j,jC)T, αC=(αj,jC)T and αC=(αj,jC)T.Note that in our problem, C is known but both αC and αC are unknown. Then the true hazard function in (1) is equivalent to

λ(t;Zi)=λ0(t)exp(αCTZi,C+αCTZi,C). (2)

To estimate αC and αC, we introduce the independent counting process Ni(t)=I(Xit,δi=1) and the at-risk process Yi(t)=I[Xit]. When p is small, we can obtain the partial likelihood estimator α^=(α^CT,α^CT)T by solving the estimation equation U(α) = 0p with U(α) = (U1(α),…,Up(α)) and

Uj(α)=i=1n0τ{Zi,jSj(1)(t,α)Sj(0)(t,α)dNi(t)}, (3)

with

Sj(m)(t,α)=1ni=1nZi,jmYi(t)exp(αCTZi,C+αCTZi,C), (4)

for m{0,1,2,} When p > n, it is computationally and theoretically infeasible to directly solve Eq. (3). By imposing sparsity on the coefficients, one may maximize the penalized partial likelihood to obtain solutions. However, when pn, we need to employ a variable screening procedure first before performing regularized regression. We propose a new CS procedure in the next section.

2.2 Cox conditional screening

We fit the marginal Cox regression by including the known covariates in C. Specifically, for jC, we have the following marginal Cox regression model

λj(t;Zi)=λj,0(t)exp(βCTZi,C+βZi,j), (5)

from which the maximum partial likelihood estimation equation (β^C,jT,β^j)T can be obtained. It is given by the solution of the following equation:

Vj(βC,β)=[Vj,k(βC,β),kC{j}]T=0q+1, (6)

with

Vj,k(βC,β)=i=1n0τ{Zi,kRj,k(1)(βC,β,t)Rj,k(0)(βC,β,t)dNi(t)}, (7)

and

Rj,k(m)(βC,β,t)=1ni=1nZi,kmYi(t)exp(βCTZi,C+βZi,j), (8)

for kC{j} and m{0,1,2,}. For a given threshold γ > 0. The selected index set in addition to set C is given by

^C={jC:|β^j|γ}. (9)

Namely, we recruit variables with large additional contribution given ZC. We refer to this method as Cox conditional screening (CoxCS).

3 Theoretical results

We establish the theoretical properties of the proposed methods by introducing a few new definitions along with the basic properties.

3.1 Definitions and basic properties

Let ( Ω,,P) be the probability space for all random variables introduced in this paper, where Ω is the sample space, is the σ-algebra as the set of events and P is the probability measure.

For any d ≥ 1, any random variable ξ:Ωd and any operator A, denote by A[•|ξ] conditional A of • given ξ. For any vector a=(a1,,ap)p, let aC=(aj,jC)T be the subvector where all its elements are indexed in C. Let ad=j=1p|a|jdd be the L-d norm for any vector ap. For a sequence of random variables indexed by {ξn},ξn=op(1) if and only if for any ε1 > 0 and ε2 > 0, there exists N such that for any n>N,P[|ξn|>1]<2.

For simplicity, let T, C, X, Y (t), Zj, Z and δ represent Ti, Ci, Xi, Yi (t), Zi,j, Zi and δi respectively, by removing the subject index i. Let ST(t|Z) and SC(t) represent the survival functions for the event time T and censored time C. Let FT(t|Z)=1ST(t|Z).

Definition 1

Let C={jC,αj0} and w=jCI[αj0] be the true set of non-zero coefficients and its cardinality in model (1), aside from the important predictors known a priori.

To study the asymptotic property of (β^C,jT,β^j)T, define the population level quantity as follows:

Definition 2

Let (βC,jT,βj)T be the solution of the following equations

Vj(βC,β)=[vj,k(βC,β),kC{j}]T=0q+1, (10)

with

vj,k(βC,β)=0τ[sk(1)(t)rj,k(1)(t,βC,β)rj,k(0)(t,βC,β)sk(0)(t)]dt, (11)

where sk(m)(t)=E[ZkmdN(t)] and rj,k(m)(t,βC,β)=E[Rj,k(m)(t,βc,β)]

Definition 3

Let βC,0 be the solution of the following equations

vC(βC)=[vj,k(βC,0),kC]T=0q. (12)

To understand the intuition of the population level properties for the CoxCS, we need to define a conditional linear expectation. A similar concept has been used to study the conditional sure independence screening (CSIS) in the GLM setting by Barut et al. (2016). We provide a formal definition here.

Definition 4

For two random variables ζ:Ωd and ξ:Ωp. The conditional linear expectation of ζ given ξ is defined as

E(ζ|ξ)=E[ζ]+AT{ξE(ξ)}, (13)

where A=argminBd×pE[(ζE[ζ]BT{ξE(ξ)})2|ξ]. Also, define notation E(ζ)=E(ζ).

Definition 5

For any random variables ζ1:Ωd1 ζ2:Ωd2 and ξ:Ωp. The conditional linear covariance between ζ1 and ζ2 given ξ is defined as

Cov(ζ1,ζ2|ξ)=E[{ζ1E(ζ1|ξ)}{ζ2E(ζ2|ξ)}|ξ]. (14)

Definition 6

Define

vj(βC,β)=vj,j(βC,β)+kCakvj,k(βC,β),

where vector aC=[ak,kC]T such that E[Zj|ZC]=kCakZk.

We list all the properties of the conditional linear expectation and the conditional linear covariance in Propositions 1 and 2 in Appendix. They are quite useful for the theoretical developments of the sure screening property. Also, combining Propositions 1, 4 and Lemma 1 in Appendix, we show the uniqueness of the solution to score equations in Definitions 3 and 6.

3.2 Regularity conditions

We list all conditions for the theoretical results.

Condition 1

For each jC and kC{j}, there exists a neighborhood of (βC,jT,βj)T, which is defined as

j={(βCT,β)T:(βCT,βj)T(βC,jT,βj)T1<δj},withδj>0,

such that for each τ < ∞.

  1. For m = 0, 1,
    supt[0,τ],(βCT,β)TlRj,k(m)(βC,β)rj,k(m)(βC,β)20,
    in probability as n.
  2. There exists a constant L > 0 such that
    L=minjC[inft[0,τ],(βCT,β)Tj{rj,k(0)(βC,β,t)}].
    Of note, {rj,k(0)(βC,β,t)} does not depend on k. Let δ=maxjCδj.

Condition 2

The covariates Zj’s satisfy the following conditions.

  1. For j{1,,p},E[Zj]=0 and there exists a constant K0 such that P(Zj>K0)=0.

  2. Zj is a time constant variable, for all j.

  3. All Zj’s, jC are independent of all Zk’s, kC given ZC.

  4. For constant c1 > 0 and κ <1/2,
    minjC|E[Cov(Zj,P[δ=1|Z]|ZC)])|c1nk.

Condition 3

There exists a constant K1 such that

α1<K1and(βC,jT,βj)T1<K1,

for all p > 0.

Condition 4

For all jC, there exists a constant M > 0 such that

M(β^C,jT,β^j)TβC,jT,βj)T2V¯j(β^C,j,β^j)V¯j(βC,j,βj)2,

where V¯j=n1Vj.

To study the sure screening property of our proposed method, we first link βj in a joint model to its counterpart in a marginal model. Thus we first understand its properties on the population as well as the sample level.

3.3 Properties on population level

Theorem 1

Suppose Condition 2 holds, βj = 0 if and only if αj.= 0 for all jC

To identify covariate jC,|βj| needs to be at least O (n−1/2). The following Theorem 2 specifies the signal strength jC.

Theorem 2

Suppose Condition 2 holds. There exist constants c2 > 0 and κ < 1/2 such that

minjC|βj|c2nκ.

3.4 Properties on sample level

Theorem 3 proves the uniform convergence of the proposed estimator and the sure screening property of the procedure.

Theorem 3

Suppose Conditions 1–4 hold. For any ε1 > 0 and any ε2 > 0, there exits positive constants c3, c4 and integer N such that for any n > N,

  1. For any 0 < κ < 1/2,
    P[maxjC|β^jβj|>c22(nκ+ε1)]2w(q+1)exp(c3n12κ)+ε2,
    where w is the size of C, q is the size of C and c2 is the same value in Theorem 2 and c3 does not dependent on ε1, ε2 and κ, but N depends on ε1 and ε2.
  2. If γn=c4nκ, where κ is the same number in Condition 2, then
    P[minjC|β^j|>γn]12w(q+1)exp(c3n12κ)2, (15)
    where c4 does not depend on ε1, ε2 and κ, thus
    limnP[C^C]=1. (16)

3.5 Alternative methods

In this section we introduce two alternative methods for controlling the false discovery rate.

Define the information matrix

Ij(βC,βj)=(Vj,k(βC,βj)βk)k,kC{j}, (17)

which is of q + 1 dimension. Denote σ^j2=[Ij(βC,j,β^j)]q+1,q+11 be the variance estimate of β^j. For a given threshold γ > 0, we can have a different way to select the index which is given by

^C={jC:|β^j|σ^jγ}. (18)

We refer to (18) as “CS-Wald”.) suggested that γ=Φ1(1f/2p) where f is the number of false positive that one is willing to tolerate and Φ(·) is the standard normal cumulative distribution function. Another alternative to construct the screening statistics, which is also scale free, is to utilize the partial log likelihood ratio statistic. Specifically,

(βC,β)=i=1n0τ{βCTZi,C+βZi,jlog(Rj,k(0)(βC,β,t))dNi(t)}. (19)

Suppose (β^C,jT,β^j)T maximizes (19) for a given j. Then, for a given threshold γ > 0, the index set can be chosen by considering the following likelihood ratio statistic.

^C={jC:(β^C,j,β^j)(β^C,0,β=0)γ}, (20)

where β^C,0 maximizes (βC,0). Hereafter (20) will be referred to as “CS-PLIK”.

4 Simulation studies

The utility of the proposed methods was evaluated via extensive simulations. Denote by CS-MPLE, a version of CoxCS that is based on the criteria of (9). For completeness, we considered two other variations of CoxCS, namely, CS-PLIK and CS-Wald. The finite sample performance of the proposed methods was compared with the following marginal screening methods designed for the survival data.

  • CRIS: censored rank independence screening proposed by Song et al. (2014).

  • CORS: correlation screening, which is an extension of sure independence screening to the censored outcome data by using inverse probability weighting; see Song et al. (2014).

  • PSIS-Wald: Wald test based on the marginal Cox model fitted on each covariate; see Zhao and Li (2012).

  • PSIS-PLIK: partial likelihood ratio test based on the marginal Cox model fitted on each covariate, which is asymptotically equivalent to Zhao and Li (2012).

We illustrated our methods and compared them with the competing methods on data simulated as below.

Example 1 The survival time was generated from a Cox model with baseline hazards function being set to be 1, i.e.,

λ(t|Z)=exp(βTZ),

where Z were generated from the standard normal distribution with equal correlation 0.5 and β=(15T,2.5,0p6T)T.

Example 2 The survival time was generated from a Cox model with baseline hazards function being set to be exp(−1), i.e.,

λ(t|Z)=exp(1+βTZ),

where β=(10,0p2T,1)T and all covariates were generated from the independent standard normal distribution.

Example 3 The same as Example 2 except that the first p – 1 covariates were generated from the multivariate standard normal distribution with an equal correlation of 0.9.

Example 4 The same as Example 1 except that the magnitude of the active variables were reduced to half, i.e., β=(0.5,0.5,0.5,0.5,0.5,1.25,0p6T)T.

The simulated data examples were designed in such a way that variables Z6 in Example 1 and Zp in Example 3 possess marginally weak but conditionally strong signals, which makes marginal screening approaches not ideal for identifying them. Example 4 exhibits a scenario with smaller signal-to-noise ratios, compared with Examples 1–3. For the GLM, Barut et al. (2016) provided a similar simulation design in the context of non-censored regression. Figure 2 depicts the distribution of the absolute correlation between the survival time and the covariate variables, where uncensored data were used to compute the marginal correlation between the event time and the covariate using an inverse probability weighting (Song et al. 2014). In Examples 1–3, the magnitude of the marginal correlation between the survival and inactive variables are sometimes even larger than that of between the survival and active variables. Therefore, it is expected that the competing methods such as CORS which relies on the marginal correlation between the survival and covariates would be difficult to identify active variables. Clearly, in Example 1 the marginal signal strength of Z6 is weaker than most noisy (inactive) variables (Z7Z1000), while the marginal signal strength of Z1000 in Examples 2 and 3 is similar or even lower than most noisy variables. The marginal correlation between the survival time and each variable is getting weaker with heavier censoring.

Fig. 2.

Fig. 2

Absolute correlation of the survival time and the covariate variables. The blue short-dashed lines (……) represents the distribution of the inactive variables; the green long-dashed lines (– – –) for the active variables with relatively strong signals; the red solid lines (— —) for the hidden active variable (Color figure online)

In all these examples, the censoring times Ci were independently generated from a uniform distribution U [0, c], with c chosen to give approximately 20% and 60% of censoring proportions. We set n = 100 and varied p from 1000 (high-dimensional) to 10, 000 (ultrahigh-dimensional). For each configuration, a total of 400 simulated datasets were generated.

We considered two metrics to compare the performance between different methods: the minimum model size (MMS) which is the minimum number of variables that need to be selected in order to include all active variables, and the true positive rate (TPR) which is the proportion of active variables that are included in the first n selected variables. Hence, a method with small MMS and large TPR can be more efficient to discover true signals. For a fair comparison, we added the number of conditioning variable in our examples to MMS for the proposed methods (CS-MPLE, CS-Wald and CS-PLIK). In practice, identifying conditional sets normally requires some prior biological information. In our simulations, we simply chose the covariate Z1 as the conditioning variable. In practice, we propose to choose the variable with the highest marginal signal strength as the conditioning variable, which can be a practical solution in the absence of prior biological knowledge. In Examples 1–4, we considered two sets of conditioning sets. The first choice was C1={1}since the signal of the first variable is strong enough to be easily picked by other marginal screening methods. Our second choice was C2 that consists of one active variable (the first variable in our examples) and 4 noisy variables. The purpose was to assess the robustness of the proposed method when the conditioning set is dominated by noisy variables.

Tables 1 and 2 demonstrate the advantages of our proposed methods under the difficult scenarios as reflected in Examples 1–4. Although the performance of the proposed methods deteriorates a bit in Example 4 due to small signal-to-noise ratios, in all examples (including Example 4) the proposed methods outperform the marginal approaches by largely reducing MMS. Moreover, all the marginal screening methods have had some difficulties in identifying Z6 in Example 1 and Example 4 and Z1000 in Examples 2 and 3. Indeed, these variables have the lowest priorities to be included by using the marginal methods. Moreover, the results with C2 demonstrate that conditioning can be beneficial even if the conditioning set is dominated by noisy variables.

Table 1.

Median minimum model size (MMS) and median true positive rates (TPR) along with their corresponding IQRs (in the parentheses) based on 400 simulated data sets

Method (n, p) = (100, 1000)
(n, p) = (100, 10000)
MMS TPR MMS TPR
CRIS 1000.0 (3.0) 0.50 (0.17) 9995.0 (37.0) 0.17 (0.17)
CORS 944.0 (168.2) 0.33 (0.33) 9466.0 (1101.8) 0.00 (0.17)
Example 1 CR ≈ 20% PSIS-PLIK 1000.0 (0.0) 0.67 (0.17) 10000.0 (2.2) 0.33 (0.17)
PSIS-Wald 1000.0 (0.0) 0.67 (0.17) 10000.0 (2.2) 0.33 (0.17)
CS-PLIK ( C1) 152.5 (272.2) 0.83 (0.17) 1322.0 (2253.8) 0.67 (0.17)
CS-Wald ( C1) 154.5 (274.5) 0.83 (0.17) 1286.5 (2287.0) 0.67 (0.17)
CS-MPLE ( C1) 143.0 (249.0) 0.83 (0.17) 1321.0 (2305.8) 0.50 (0.17)
CS-PLIK ( C2) 109.5 (212.0) 0.83 (0.17) 1012.5 (1906.5) 0.67 (0.33)
CS-Wald ( C2) 111.0 (206.8) 0.83 (0.17) 1021.0 (1906.0) 0.67 (0.33)
CS-MPLE ( C2) 110.0 (199.8) 0.83 (0.17) 1070.0 (1892.8) 0.67 (0.17)
CRIS 926.5 (151.8) 0.33 (0.17) 9429.0 (1405.0) 0.00 (0.17)
CORS 898.0 (152.2) 0.00 (0.17) 8976.0 (1610.0) 0.00 (0.00)
Example 1 CR ≈ 60% PSIS-PLIK 1000.0 (2.0) 0.67 (0.17) 9999.0 (17.0) 0.33 (0.17)
PSIS-Wald 1000.0 (2.0) 0.67 (0.17) 9999.0 (17.0) 0.33 (0.17)
CS-PLIK ( C1) 227.0 (351.2) 0.83 (0.17) 2383.5 (3331.5) 0.50 (0.33)
CS-Wald ( C1) 227.5 (348.2) 0.83 (0.17) 2403.5 (3233.2) 0.50 (0.33)
CS-MPLE ( C1) 228.5 (320.5) 0.83 (0.17) 2229.5 (3046.8) 0.50 (0.17)
CS-PLIK ( C2) 206.0 (279.2) 0.83 (0.17) 1012.5 (1906.5) 0.67 (0.33)
CS-Wald ( C2) 207.0 (278.0) 0.83 (0.17) 1021.0 (1906.0) 0.67 (0.33)
CS-MPLE ( C2) 207.5 (267.2) 0.83 (0.17) 1070.0 (1892.8) 0.67 (0.17)
CRIS 262.5 (401.8) 0.50 (0.00) 2871.0 (4314.0) 0.50 (0.00)
CORS 490.5 (510.8) 0.50 (0.00) 4878.5 (5099.8) 0.50 (0.00)
Example 2 CR ≈ 20% PSIS-PLIK 318.0 (492.8) 0.50 (0.00) 3777.0 (5690.8) 0.50 (0.00)
PSIS-Wald 318.5 (494.2) 0.50 (0.00) 3786.5 (5707.8) 0.50 (0.00)
CS-PLIK ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-Wald ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-MPLE ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-PLIK ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (0.0) 1.00 (0.00)
CS-Wald ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (0.0) 1.00 (0.00)
CS-MPLE ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (0.0) 1.00 (0.00)
CRIS 399.5 (478.0) 0.50 (0.00) 3601.0 (4894.2) 0.50 (0.00)
CORS 603.5 (390.5) 0.00 (0.00) 5988.0 (4353.0) 0.00 (0.00)
Example 2 CR ≈ 60% PSIS-PLIK 325.0 (498.5) 0.50 (0.00) 3942.5 (5679.5) 0.50 (0.00)
PSIS-Wald 322.0 (501.0) 0.50 (0.00) 3915.5 (5679.5) 0.50 (0.00)
CS-PLIK ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (1.0) 1.00 (0.00)
CS-Wald ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (2.0) 1.00 (0.00)
CS-MPLE ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (4.0) 1.00 (0.00)
CS-PLIK ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (4.0) 1.00 (0.00)
CS-Wald ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (5.0) 1.00 (0.00)
CS-MPLE ( C2) 6.0 (0.0) 1.00 (0.00) 7.0 (7.2) 1.00 (0.00)
CRIS 1000.0 (0.0) 0.50 (0.00) 10000.0 (0.0) 0.50 (0.00)
CORS 1000.0 (0.0) 0.50 (0.50) 10000.0 (0.0) 0.00 (0.00)
Example 3 CR ≈ 20% PSIS-PLIK 1000.0 (0.0) 0.50 (0.00) 10000.0 (0.0) 0.50 (0.00)
PSIS-Wald 1000.0 (0.0) 0.50 (0.50) 10000.0 (0.0) 0.00 (0.50)
CS-PLIK ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-Wald ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-MPLE ( C1) 3.0 (4.0) 1.00 (0.00) 7.5 (33.0) 1.00 (0.00)
CS-PLIK ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (0.0) 1.00 (0.00)
CS-Wald ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (0.0) 1.00 (0.00)
CS-MPLE ( C2) 24.0 (29.0) 1.00 (0.00) 168.5 (266.8) 0.50 (0.50)
CRIS 1000.0 (0.0) 0.50 (0.00) 10000.0 (0.0) 0.50 (0.00)
CORS 783.0 (463.8) 0.00 (0.50) 7967.0 (4972.2) 0.00 (0.00)
Example 3 CR ≈ 60% PSIS-PLIK 1000.0 (0.0) 0.50 (0.00) 10000.0 (0.0) 0.50 (0.00)
PSIS-Wald 1000.0 (0.0) 0.00 (0.00) 10000.0 (0.0) 0.00 (0.00)
CS-PLIK ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-Wald ( C1) 2.0 (0.0) 1.00 (0.00) 2.0 (0.0) 1.00 (0.00)
CS-MPLE ( C1) 20.0 (55.2) 1.00 (0.00) 161.5 (553.5) 0.50 (0.50)
CS-PLIK ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (2.0) 1.00 (0.00)
CS-Wald ( C2) 6.0 (0.0) 1.00 (0.00) 6.0 (3.0) 1.00 (0.00)
CS-MPLE ( C2) 105.0 (127.5) 0.50 (0.50) 973.5 (1303.2) 0.50 (0.00)

Table 2.

Median minimum model size (MMS) and median true positive rates (TPR) along with their corresponding IQRs (in the parentheses) based on 400 simulated data sets

Method (n, p) = (100, 1000)
(n, p) = (100, 10000)
MMS TPR MMS TPR
CRIS 997.0 (17.0) 0.33 (0.17) 9954.5 (205.0) 0.17 (0.17)
CORS 947.5 (180.5) 0.33 (0.33) 9566.0 (1618.2) 0.00 (0.17)
Example 4 CR ≈ 20% PSIS-PLIK 1000.0 (4.0) 0.67 (0.17) 9997.0 (47.2) 0.33 (0.17)
PSIS-Wald 1000.0 (4.0) 0.67 (0.17) 9997.0 (47.2) 0.33(0.17)
CS-PLIK ( C1) 252.0 (383.0) 0.83 (0.17) 2428.0 (3387.2) 0.50 (0.33)
CS-Wald ( C1) 253.0 (385.2) 0.83 (0.17) 2451.0 (3394.0) 0.50 (0.33)
CS-MPLE ( C1) 250.0 (349.5) 0.83 (0.17) 2320.5 (3354.5) 0.50 (0.17)
CS-PLIK ( C2) 237.5 (387.0) 0.83 (0.17) 2248.5 (3610.2) 0.50 (0.17)
CS-Wald ( C2) 241.0 (388.8) 0.83 (0.17) 2258.0 (3629.2) 0.50 (0.17)
CS-MPLE ( C2) 232.0 (366.5) 0.83 (0.17) 2139.0 (3554.8) 0.50 (0.33)
CRIS 918.0 (144.2) 0.17 (0.33) 9137.0 (1529.2) 0.00 (0.00)
CORS 895.0 (155.0) 0.00 (0.17) 8876.5 (1514.2) 0.00 (0.00)
Example 4 CR ≈ 60% PSIS-PLIK 998.0 (26.0) 0.50 (0.33) 9972.5 (206.0) 0.17 (0.17)
PSIS-Wald 998.0 (26.0) 0.50 (0.33) 9972.5 (212.2) 0.17(0.17)
CS-PLIK ( C1) 387.0 (418.8) 0.67 (0.33) 4082.5 (4522.8) 0.33 (0.17)
CS-Wald ( C1) 389.0 (423.5) 0.67 (0.33) 4076.5 (4553.5) 0.33 (0.17)
CS-MPLE ( C1) 365.5 (377.2) 0.67 (0.33) 3906.5 (4356.5) 0.33 (0.33)
CS-PLIK ( C2) 375.5 (472.2) 0.67 (0.33) 4298.5 (4557.8) 0.33 (0.17)
CS-Wald ( C2) 373.0 (471.5) 0.67 (0.33) 4283.5 (4585.5) 0.33 (0.17)
CS-MPLE ( C2) 373.0 (448.2) 0.67 (0.33) 4151.5 (4479.5) 0.33 (0.17)

The performance by CS-MPLE, CS-Wald and CS-PLIK are quite similar in all the cases in Examples 1, 2, and 4. In Example 3, there is a very high correlation among the covariate variables, CS-MPLE has a slightly larger MMS compared to the CS-Wald and CS-PLIK which well control the false discover rate in this cases.

Figure 3 shows that the proposed methods with C1 efficiently identify all active variables by including much fewer variables than the other methods. Specifically, in Example 1 the proposed methods are capable of achieving a 100% TPR by choosing at most one fifth of the variables. In contrast, other competing methods have to include almost all the variables in order to discover all active variables. In Examples 2 and 3 the proposed methods quickly detect the remaining active variable (Zp) conditioning on the active variable (Z1), as opposed to the marginal screening approaches that have to include far more variables to ensure a 100% TPR.

Fig. 3.

Fig. 3

Median number of active variables that are included in the model with different thresholds by different methods

5 Application

We illustrated the practical utility of the proposed method by applying it to analyze the DLBCL dataset of Rosenwald et al. (2002). The dataset, which was originally collected for identifying gene signatures relevant to the patient survival from time of chemotherapy, included a total of 240 DLBCL patients with 138 deaths observed during the followup and a median survival time of 2.8 years. Along with the clinical outcomes, the expression levels of 7399 genes were available for analysis. In our subsequent analysis, each gene expression was standardized to have mean zero and variance 1.

To facilitate the use of our method, we identified the conditional set by resorting to the medical literature. As gene AA805575, a Germinal-center B-cell signature gene, has been known to be predictive to DCBCL patients’ survival in the literature (see Liu et al. 2013; Gui and Li 2005), we used it as the conditional variable in our proposed procedure. For comparisons, we also analyzed the same data using various competing methods introduced in the simulation section and computed the corresponding concordance statistics (C-statistics) (Uno et al. 2011).

Specifically, we randomly assigned 160 patients to the training set and 80 patients to the testing set, while maintaining the censoring proportion roughly the same in each set. For each split, we applied each method to select top 31(= 160/ log(160)) variables as suggested by Fan and Lv (2008) using the training set. LASSO was performed subsequently for refined modeling, with the tuning parameter selected by the 10-fold cross-validation. The risk score for each subject was obtained by using the final model selected by LASSO in the training dataset and the C-statistics was obtained in the testing dataset. A total of 100 splits were made and the average C-statistics and the model size (MS) were reported in Table 3. By the criterion of C-statistics, the proposed method seemed to have more predictive power.

Table 3.

Summary of C-statistics and the model size for different methods

CRIS CORS PSIS-PLIK PSIS-Wald
C-statistics 0.54 (0.21) 0.58 (0.20) 0.58 (0.19) 0.55 (0.20)
Model size 14.41 (3.00) 6.83 (3.70) 15.22 (2.93) 15.65 (2.89)

CS-MPLE CS-PLIK CS-Wald

C-statistics 0.63 (0.18) 0.63 (0.18) 0.62 (0.19)
Model size 16.74 (3.26) 15.90 (3.01) 16.28 (3.41)

Our further scientific investigation focused on identifying the relevant genes by utilizing the full dataset. Applying our proposed method, we selected top 240/ log(240) = 44 genes before using LASSO to reach the final list. It follows that CS-MPLE, CS-PLIK and CS-Wald selected 20, 16 and 16 genes, respectively. Among the 22 uniquely selected genes by either of them, 14 genes were overlapped and were reported in Table 4. Twelve genes among these 22 genes belong to lymph-node signature group, proliferation signature group, and germinal-center B-cell group defined by Rosenwald et al. (2002). We observed that 13 of these 22 genes were chosen by at least one of CRIS, CORS, PSIS-PLIK, and PSIS-Wald. On the other hand, gene AB007866, Z50115, S78085, U00238, AL050283, J03040, U50196, and AA830781, and M81695 were only identified by using our methods.

Table 4.

A comparison of genes that are selected by CS-MPLE, CS-PLIK and CS-Wald

GenBank ID Signature Description CS-MPEE CS-PLIK CS-Wald
LC_25054
X77743 Proliferation Cyclin-dependent kinase 7
U15552 Acidic 82 kDa protein mRNA
AB007866 KIAA0406 gene product
BC012161 Proliferation Septin 1
AF134159 Proliferation Vhromosome 14 open reading frame 1
Z50115 Proliferation Thimet oligopeptidase 1
S78085 Proliferation Programmed cell death 2
M29536 Proliferation Eukaryotic translation initiation factor 2
U00238 Proliferation Phosphoribosyl pyrophosphate amidotransferase
AL050283 Proliferation Sentrin/SUMO-specific protease 3
BF129543 Germinal-center-B-cell ESTs, Weakly similar to A47224 thyroxine-binding globulin precursor
M81695 integrin, alpha X
D13666 Lymph node Osteoblast specific factor 2 (fasciclin I-like)
J03040 Lymph node Secreted protein
U50196 Adenosine kinase
U28918 Proliferation Suppression of tumorigenicity 13
AA721746 ESTs
AA830781
AF127481 Lymphoid blast crisis oncogene
M61906 Phosphoinositide-3-kinase
D42043 KIAA0084 protein

In fact, only a few studies have suggested an important role of M81695 (Deb and Reddy 2003; Chow et al. 2001; Mikovits et al. 2001; Stewart and Schuh 2000) or AA830781 (Li and Luan 2005; Binder and Schumacher 2009; Schifano et al. 2010) in predicting DLBCL survival. As the marginal correlation between M81695 and the survival time and between AA830781 and the survival time are markedly low at 0.008 and 0.097, respectively, these two genes are highly likely to be missed by using the conventional marginal screening approaches. Indeed, AA830781 was identified by Schifano et al. (2010) only because of its co-expression or correlation with other relevant genes. A more detailed investigation of the functions of the identified genes in the context of a broader class of blood cancers, including lymphoma, may shed light on preventing, treating and controlling the lethal blood cancers.

6 Discussion

In this paper, we have proposed a new conditional variable screening approach for the Cox proportional hazards model with ultra-high dimensional covariates. The proposed partial likelihood based CS approaches are computationally efficient, with a solid theoretical foundation. Our method and theory are extensions of the conditional sure independence screening (CSIS) (Barut et al. 2016), which is designed for the GLM. In the development of theory, we introduce the new concept of the conditional linear covariance for the first time, which is useful to specify the regularity conditions for the model identifiability and the sure screening property. This also provides a building block for a general theoretical framework of conditional variable screening in the context of other semi-parametric models, such as the partially linear single-index model.

We have mainly focused on studying the theoretical properties of CS-MPLE, which extends the work of Barut et al. (2016) in the GLM setting, though development of the inference procedures for the two variants of the proposed method, namely, CS-PLIK and CS-Wald, will be more involved and out of scope of this paper. However, as indicated by the simulation studies, these two variants may induce substantial improvement especially when the variables are highly correlated. More research is warranted.

Our work also enlightens a few directions that are worth ensuing effort. First, as our proposal requires the prior information to be known and informative, it remains statistically challenging to develop efficient screening methods in the absence of such information. Recently, in the context of GLM, Hong et al. (2016) has proposed a data-driven alternative when a pre-selected set of variables is unknown. It is thus of substantial interest to develop a data-driven CS for the survival model. Second, even with prior knowledge, an open question often lies in how to balance it with the information extracted from the given data. There has been some recent work on how to incorporate prior information. For example, Wang et al. (2013) developed a LASSO method by assigning different prior distributions to each subset according to a modified Bayesian information criterion that incorporates prior knowledge on both the network structure and the pathway information, and Jiang et al. (2015) proposed “prior lasso” (plasso) to balance between the prior information and the data. A natural extension of the current work is to develop a variable screening approach that incorporates more complex prior knowledge, such as the network structure or the spatial information of the covariates. We will report the progress elsewhere.

Acknowledgments

This research was partially supported by a grant from NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang) and Chinese Natural Science Foundation (11528102, Li).

Appendix

The basic properties of the conditional linear expectation are listed in the following proposition.

Proposition 1

vj(βC,0,0)=0q+1 if and only if vj,j(βC,0,0)=0, for all jC.

The proof is straightforward based on Definitions 2 and 3.

Proposition 2

Let ζ,ζ1,ζ2 and ξ be any four random variables in the probability space (Ω,,P) The following properties hold for the conditional linear expectation E[|ξ] given ξ:

  1. Closed form: E(ζ|ξ)=E[ζ]+Cov(ζ,ξ)Var[ξ]1{ξE(ξ)}.

  2. Stability: E[ξ|ξ]=ξ.

  3. Linearity: E[A1ζ1+A2ζ2|ξ]=A1E[ζ1|ξ]+A2E[ζ2|ξ], where A1 and A2 are two matrices that are compatible with the equation.

  4. Law of total expectation: E[E(ζ|ξ)]=E[E(ζ|ξ)]=E[ζ].

Remark 1 In general, E(ζ|ξ)E(ζ|ξ). Also, ζ and ξ are independent does not imply E(ζ|ξ)=0, unless ζ and ξ are jointly normally distributed.

Remark 2 By Proposition 2, we can easily verify the following properties.

Proposition 3

The conditional linear covariance defined in Definition 5 has the following properties:

  1. Linear independence and linear zero correlation:
    Cov(ζ1,ζ2|ξ)=0E(ζ1ζ2|ξ)=E(ζ1|ξ)E(ζ2|ξ)
  2. Expectation of conditional linear covariance:
    E[Cov(ζ1,ξ2|ξ)]=Cov(ζ1,ξ2)Cov(ζ1,ξ)Var(ξ)1Cov(ξ,ζ2).
  3. Sign: for any increasing function h(): and random variable η:Ω, then
    Cov(h(η),η|ξ)0.

Combining Propositions 1–3 and based on Definition 6 we have the following property.

Proposition 4

vj,j(βC,0,0)=0 if and only if vj(βC,0,0)=0.

Lemma 1

The solution of vj(βC,β)=0q+1 and the solution of vC(βC)=0q are both unique, for any jC.

Proof of Theorem 1

Proof First we make the connection between βj to the expected conditional linear covariance between Zj and P[δ=1|Z] given ZC, that is

E[Cov(Zj,P[δ=1|Z]|ZC)],

then by Condition 2, we relate it to αj. For any jC and kC, it is straightforward to see that

sk(m)(t)=E[Zkmλ0(t)exp(ZTα)ST(t|Z)SC(t)], (21)

and

rj,k(m)(t,βC,β)=E[Zkmexp(ZCTβC+Zjβ)ST(t|Z)SC(t)], (22)

for m = 0, 1. Then

vj,k(βC,β)=0τE[Wj,k(t,βC,β)exp(ZTα)ST(t|Z)SC(t)λ0(t)]dt, (23)

where

Wj,k(t,βC,β)=ZkE[Zkexp(ZCTβC+Zjβ)ST(t|Z)SC(t)]E[exp(ZCTβC+Zjβ)ST(t|Z)SC(t)]

By Proposition 2,

E[Wj,k(t,βC,β)exp(ZTα)ST(t|Z)SC(t)]=E{E[Wj,k(t,βC,β)exp(ZTα)ST(t|Z)SC(t)]}.

By Definition 6,

vj(βC,β)=vj,j(βC,β)kCakvj,k(βC,β)=E[Cov(Zj,P[δ=1|Z]|ZC)]gj(βC,β),

where

E[Cov(Zj,P[δ=1|Z]|ZC)]=0τE[(ZjE[Zj|ZC)]exp(ZTα)ST(t|Z)SC(t)λ0(t)]dt,

and

gj(βC,β)=0τE[(ZjE[Zj|ZC])exp(ZCTβC+Zjβ)ST(t|Z)SC(t)]E[exp(ZCTβC+Zjβ)ST(t|Z)SC(t)]×E[exp(ZTα)ST(t|Z)λ0(t)SC(t)]dt.

By Definition 2, vj(βC,j,βj)=0q+1,

gj(βC,j,βj)=E[Cov(Zj,P[δ=1|Z]|ZC)].

When αj=0, then E[Cov(Zj,P[δ=1|Z]|ZC)]=0 by Condition 2.3. Thus gj(βC,j,βj)=0. Also, by Propositions 1 and 2, gj(βC,0,0)=0, then vj(βC,0,0)=0q+1. By uniqueness in Lemma 1, βj = 0.

When α j ≠ 0, by Condition 2, we have

|gj(βC,j,βj)|=|E[Cov(Zj,P[δ=1|Z]|ZC)]|>c1nκ.

This implies that gj(βC,j,βj) and E[Cov(Zj,P[δ=1|Z]|ZC)] are both nonzero and have the same signs since they are equal. Next we show for any βC,gj(βC,0) and E[Cov(Zj,P[δ=1|Z]|ZC)] have the opposite signs unless they are equal to zero. This fact implies that βj ≠ 0. Specifically, note that P(δ = 1| Z) is the probability of occurring the event and ST(t|Z)SC(t)=P(X>t|Z) represents the probability at risk at time t. Based on Model (1), for any t,

P(X>t|Z)Zj×P(δ=1|Z)Zj0.

By Proposition 3, Cov(Zj,P[δ=1|Z]|ZC) and Cov[Zj,ST(t|Z)SC(t)|ZC] have the opposite signs unless they are zero. This further implies that for any βC,

gj(βC,0)=0τE[exp(ZCTβC)Cov[Zj,ST(t|Z)SC(t)|ZC]]E[exp(ZCTβC)ST(t|Z)SC(t)]×E[exp(ZTα)ST(t|Z)λ0(t)SC(t)]dt,

and E[Cov(Zj,P[δ=1|Z]|ZC)] have opposite signs unless they are equal to zero. Therefore, βj ≠ 0.

Proof of Theorem 2

Proof For any jC, we have βj ≠ 0 by Theorem 1, by mean value theorem, for some βj(0,βj),

|vj(βC,j,0)|=|vj(βC,j,βj)vj(βC,j,0)|=|vjβ(βC,j,βj)||βj|.

Next we show that |vjβ(βC,j,βj)| is bounded. For given any βC, consider gj(βC,β) as a function of β, Then

gjβ(βC,β)=E[0τHj(t,βC,β)SC(t)dFT(t|Z)],

where

Hj(t,βC,β)=E[exp(ZCTβC)Cov[Zj2exp(Zjβ),ST(t|Z)|ZC]]E[exp(ZCTβC+Zjβ),ST(t|Z)]E[exp(ZCTβC)Cov[Zjexp(Zjβ),ST(t|Z)|ZC]]E[Zjexp(ZCTβC+Zjβ),ST(t|Z)][E[exp(ZCTβC+Zjβ),ST(t|Z)]]2.

By Condition 2.1, P(|Z| < K0) = 1, then supβC,β|Hj(t,βC,β)|2K02 Thus,

|vjβ(βC,j,βj)|supβC,β|gjβ(βC,β)|2K02|E[E[SC(T)|Z]]2K02.

By the proof in Theorem 1, g(βC,j,0) and E[Cov(Zj,E{FT(C|Z)|Z}|ZC)] have the opposite signs, and by Condition 2,

|vj(βC,j,0)|=|E[Cov(Zj,P[δ=1|Z]|ZC)]|+|gj(βC,j,0)|>c1nκ.

Taking c2=0.5K02c1, βj>0.5K02|vj(βC,j,0)|>c2nκ. This completes the proof.□

Proof of Theorem 3

Proof For any jC and kC{j}, by Lin and Wei (1989), we have

V¯j(βC,β)=En{Wi,j(βC,β)}+op(1),

where En [·] denotes the empirical measure, which is defined as En[ξi]=n1i=1nξi for any random variables ξ1,,ξn, and Wi,j(βC,β) are independent over i, and write Wi,j(βC,β)=[Wi,j,k(βC,β),kC{j}]T with

Wi,j,k(βC,β)=0τ{Zi,krj,k(1)(βC,β,t)rj,k(0)(βC,β,t)}dNi(t)0τYi(t)exp(Zi,CβCT+Zi,jβ)rj,k(0)(βC,β,t){Zi,krj,k(1)(βC,β,t)rj,k(0)(βC,β,t)}dE[Ni(t)].

Note that given any i, j, k, with probability one |Wi,j,k(βC,β)| are uniformly bounded. Specifically, by Conditions 1.2, 2.1 and 3, with probability one, for all t[0,τ], (βCT,β)Tj,

|Zi,krj,k(1)(βC,β,t)rj,k(0)(βC,β,t)||Zi,k|+K0,|Yi(t)exp(Zi,CβCT+Zi,jβ)rj,k(0)(βC,β,t)|exp{K0(K1+δ)log(L)},

and

|0τdE[Ni(t)]|Λ0(τ)exp(K0K1).

Thus, with probability one,

|Wi,j,k(βC,β)|K2,

where K2=2K0(1+Λ0(τ)exp(2K0K1+K0δlogL)). By the fact that E[Wi,j,k(βC,β)]=0,

Var[Wi,j,k(βc,β)]=E[|Wi,j,k(βc,β)|2]<K22.

By Lemma 2.2.9 (Bernsterin’s inequality) of Vaart and Wellner (1996), for any t > 0, for all j, k, βC and β, we have

P(|En(Wi,j,k(βC,β))|>tn)2exp(12t2nK22+K2t/3).

Note that the above inequality holds for every jC and kC{j} By Bonferroni inequality,

P(En(Wi,j(βC,β))2>t(q+1)n)2(q+1)exp(12t2nK22+K2t/3).

Since,

V¯j(βC,β)En(Wi,j(βC,β))2=op(1).

Then for any ε1 > 0 and ε2 > 0, there exits N1, such that for any n > N1,

P(V¯j(βC,β)En(Wi,j(βC,β))2>M1/2)<2,

where M is the same value in Condition 4. By Triangle inequality and Bonferroni inequality, we have

P(V¯j(βC,β)2>t(q+1)n)P(En(Wi,j(βC,β))2>t(q+1)nM1/2)+P(V¯j(βC,β)En(Wi,j(βC,β))2>M2/2).

When n, take t=c2M(q+1)n1κ/2>0 on both side of the inequality, where c2 is the same value in Theorem 2, we have

P(V¯j(βC,β)2>Mc22(nκε1))2(q+1)exp(M2c228(q+1)2n12κK22+K2nκ/3)+ε2.

Take N = max{⌈(K2/3)1/κ⌉,N1}, then for any n > N, n−κ< 3/K2, and

P(V¯j(βC,β)2>Mc22(nκ1))2(q+1)exp(M2c228(q+1)2n12κK22+1)+ε2

Note that the above inequality holds for all (βCT,β)Tj, particularly for (βC,jT,βj)T,jC. Also, we have Vj¯(β^C,j,β^j)=0q+1.By Condition 4, we have

P(|β^jβj|>c22(nκ1))P((β^C,jT,β^j)T(βC,jT,βj)T2>c22(nκ1))2(q+1)exp(M2c228(q+1)2n12κK22+1)+2.

Taking c3=M2c228(q+1)2(K22+1) and by Bonferroni completes the proof for part 1.

For part 2, by Theorem 2,

minjC|βj|>c2nκ.

Note that, for any jC, event

{|β^jβj|c2nκ/21}{|β^j||βj|c2nκ/2+1}{|β^j||c2nκ/2+1}.

Take γn=c4nκ with c4 = c2/4,

{maxjC|β^jβj|c2nκ/21}{maxjC|β^j|c2nκ/2+1}{maxjC|β^j|γn+1}.

Thus,

P[C^C]=P[minjC|β^j|>γn]P[minjC|β^j|>γn+1]1P[minjC|β^jβj|c2nκ/21]12w(q+1)exp(c3n12κ)2.

Let n we have for any ε2>0,

limnP[MCM^C]1ε2.

Note that the left side of the above equation does not depends on n any more. Taking ε20 completes proof.

References

  1. Barut E, Fan J, Verhasselt A. Conditional sure independence screening. J Am Stat Assoc. 2016;116:544–557. doi: 10.1080/01621459.2015.1092974. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Binder H, Schumacher M. Incorporating pathway information into boosting estimation of highdimensional risk prediction models. BMC Bioinform. 2009;10:18. doi: 10.1186/1471-2105-10-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Chow ML, Moler EJ, Mian IS. Identifying marker genes intranscriptionprofiling data using a mixture of feature relevance experts. Physiol Genomics. 2001;5:99–111. doi: 10.1152/physiolgenomics.2001.5.2.99. [DOI] [PubMed] [Google Scholar]
  4. Deb K, Reddy AR. Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems. 2003;72:111–129. doi: 10.1016/s0303-2647(03)00138-2. [DOI] [PubMed] [Google Scholar]
  5. Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion) J R Stat Soc B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Fan J, Feng Y, Wu Y. IMS collections borrowing strength: theory powering applications—A Festschrift for Lawrence D Brown. Vol. 6. Institute of Mathematical Statistics; Beachwood: 2010. High-dimensional variable selection for Cox’s proportional hazards model; pp. 70–86. [Google Scholar]
  7. Gui J, Li H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005;21(13):3001–3008. doi: 10.1093/bioinformatics/bti422. [DOI] [PubMed] [Google Scholar]
  8. Hong H, Wang L, He X. A data-driven approach to conditional screening of high dimensional variables. Stat. 2016;5(1):200–212. [Google Scholar]
  9. Jiang Y, He Y, Zhang H. Variable selection with prior information for generalized linear models via the prior Lasso method. J Am Stat Assoc. 2015;111(513):355–376. doi: 10.1080/01621459.2015.1008363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Li H, Luan Y. Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics. 2005;21(10):2403–2409. doi: 10.1093/bioinformatics/bti324. [DOI] [PubMed] [Google Scholar]
  11. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84(408):1074–1078. [Google Scholar]
  12. Liu XY, Liang Y, Xu ZB, Zhang H, Leung KS. Adaptive l1/2 shooting regularization method for survival analysis using gene expression data. Sci World J. 2013;2013:475702. doi: 10.1155/2013/475702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Mikovits J, Ruscetti F, Zhu W, Bagni R, Dorjsuren D, Shoemaker R. Potential cellular signatures of viral infections in human hematopoietic cells. Dis Markers. 2001;17(3):173–178. doi: 10.1155/2001/896953. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Rosenwald A, Wright G, Chan W, Connors J, Campo E, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma. N Engl J Med. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914. [DOI] [PubMed] [Google Scholar]
  15. Schifano ED, Strawderman RL, Wells MT. Mm algorithms for minimizing nonsmoothly penalized objective functions. Electron J Stat. 2010;4:1258–1299. [Google Scholar]
  16. Song R, Lu W, Ma S, Jeng XJ. Censored rankindependence screening for high-dimensional survival data. Biometrika. 2014;101(4):799–814. doi: 10.1093/biomet/asu047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Stewart AK, Schuh AC. White cells 2: impact of understanding the molecular basis of haematological malignant disorders on clinical practice. Lancet. 2000;355(9213):1447–1453. doi: 10.1016/s0140-6736(00)02150-4. [DOI] [PubMed] [Google Scholar]
  18. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–1117. doi: 10.1002/sim.4154. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Van Der Vaart AW, Wellner JA. Weak convergence. Springer; New York: 1996. [Google Scholar]
  20. Wang Z, Xu W, San Lucas F, Liu Y. Incorporating prior knowledge into gene network study. Bioinformatics. 2013;29:2633–2640. doi: 10.1093/bioinformatics/btt443. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Zhao SD, Li Y. Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal. 2012;105(1):397–411. doi: 10.1016/j.jmva.2011.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

RESOURCES