Author manuscript; available in PMC: 2020 Jan 14.
Published in final edited form as: Stat Sin. 2019 Oct;29(4):2105–2139. doi: 10.5705/ss.202017.0298

Marginal screening for high-dimensional predictors of survival outcomes

Tzu-Jung Huang 1, Ian W McKeague 1, Min Qian 1
PMCID: PMC6959482  NIHMSID: NIHMS990031  PMID: 31938013

Abstract

This study develops a marginal screening test to detect the presence of significant predictors for a right-censored time-to-event outcome under a high-dimensional accelerated failure time (AFT) model. Establishing a rigorous screening test in this setting is challenging, because of the right censoring and the post-selection inference. In the latter case, an implicit variable selection step needs to be included to avoid inflating the Type-I error. A prior study solved this problem by constructing an adaptive resampling test under an ordinary linear regression. To accommodate right censoring, we develop a new approach based on a maximally selected Koul–Susarla–Van Ryzin estimator from a marginal AFT working model. A regularized bootstrap method is used to calibrate the test. Our test is more powerful and less conservative than both a Bonferroni correction of the marginal tests and other competing methods. The proposed method is evaluated in simulation studies and applied to two real data sets.

Keywords: Accelerated failure time model, Bootstrap, Family-wise error rate, Inverse probability weighting, Multiple testing, Post-selection inference

1. Introduction

The problem of detecting informative predictors of a survival outcome has received much attention over the past decade, especially since the advent of high-throughput genomic data. For example, a specific gene expression may influence a patient’s survival time from diffuse large B-cell lymphoma (DLBCL). Identifying such associations from massive collections of gene-expression data remains a challenging issue. Motivated by a DLBCL study (Rosenwald et al. (2002)), we consider the fundamental detection problem of whether there exists at least one predictor (or genetic feature) that is associated with the survival outcome in the presence of right censoring.

To address this problem, we develop an adaptive resampling test for survival data (ARTS), related to the approach developed by McKeague and Qian (2015) (henceforth, MQ) for uncensored outcomes. This test provides marginal screening of the predictors, along with rigorous control of the family-wise error rate (FWER) resulting from the implicit multiple testing. Furthermore, our testing procedure adjusts for low-dimensional baseline clinical covariates that are not included in the systematic screening of the gene-expression measurements. To identify the full set of active predictors, we propose a forward-stepwise version of the ARTS procedure that adjusts for previously included predictors at each step, and continues until no further significant predictors are found.

We specify the link between the survival outcome and the predictors in terms of a general semiparametric accelerated failure time (AFT) model that does not make any distributional assumption on the error term. Our approach also applies when the error distribution is modeled parametrically (as in Kalbfleisch and Prentice (2002), Medeiros et al. (2014)), although we focus on the semiparametric case. Let $T$ be the (log-transformed) time-to-event outcome, and let $U = (U_1, \ldots, U_p)^T$ denote a $p$-dimensional vector of predictors. Here, $p$ can be large, although it is taken to be fixed for the purpose of developing the asymptotic theory. The AFT model is given by

$$T = \alpha_0 + U^T\beta_0 + \varepsilon, \tag{1}$$

where $\alpha_0$ is an intercept, and $\beta_0 \in \mathbb{R}^p$ is a vector of regression coefficients. We assume that the error term $\varepsilon$ has zero mean and finite variance, and is uncorrelated with $U$. The transformed survival outcome $T$ is possibly right-censored by $C$, which is assumed to be independent of $(T, U)$ and bounded above by $\tau$, the time to the end of the follow-up. We also make the standard assumption that $P(T \le C) > 0$ to ensure that sufficient failure times are observed over the follow-up period (asymptotically).

In the framework of semiparametric AFT models, Koul et al. (1981) (henceforth, KSV) introduced the technique of inversely weighting the observed outcomes by the Kaplan–Meier estimate for the censoring, enabling them to apply standard least squares estimators from the uncensored linear model. Subsequently, two additional sophisticated methods were proposed to fit the semiparametric AFT model. The Buckley–James estimator replaces the censored survival outcome by the conditional expectation of T, given the data (Buckley and James (1979), Ritov (1990)). The rank-based method is an estimating equation approach formulated in terms of the partial likelihood score function (Tsiatis (1990), Lai and Ying (1991a), Lai and Ying (1991b), Ying (1993), Jin et al. (2003)). Our proposed marginal screening test is based on the KSV estimator, which has an advantage over the Buckley–James and rank-based methods in that it preserves a direct link with the linear model. In particular, it maintains the marginal correlations between the inversely weighted response and the predictors.

An especially attractive feature of the AFT model is that the marginal association between T and each predictor can be represented directly in terms of a correlation. As discussed below, this allows us to reduce the high-dimensional screening problem to a single test of whether the most correlated predictor with T is significant. The most popular approach to the screening of predictors in survival analyses is to use relative or excess conditional hazard function representations of associations. However, the AFT approach has the advantage that a lack of any marginal correlation implies the absence of all correlation between T and U; in the hazard-rate setting, there is no such connection.

Another attractive feature of the AFT model is that it is relatively insensitive to unmeasured heterogeneity, because the error term can act as a latent variable representing omitted confounders (Keiding et al. (1997)). In hazard-rate approaches, latent variables are typically included using inflexible parametric frailty models that are not easily applied in practice. In general, the presence of unmeasured heterogeneity causes the attenuation of parameter estimates. This is especially pronounced in hazard-rate approaches, such as the Cox model or additive risk models (Lin and Ying (1994), McKeague and Sasieni (1994)). On the other hand, such attenuation is much less problematic for the AFT model because the error term is only assumed to be uncorrelated with the predictors, and requires no special distributional assumption.

Under the AFT model (1), we test the null hypothesis $\beta_0 = 0$, that is, that no predictor is linearly associated with $T$, against the omnibus alternative. The data consist of independent and identically distributed (i.i.d.) copies $(X_i, \delta_i, U_i)$, for $i = 1, \ldots, n$, of $(X, \delta, U)$, where $X = \min(T, C)$ and $\delta = 1(T \le C)$. The ARTS marginal screening procedure fits a series of working AFT models using one component of $U$ at a time, and then selects the marginal KSV regression parameter estimate $\hat\theta_n$ that has the maximal absolute value. When the predictors are pre-standardized, the maximal regression parameter corresponds to the maximal correlation between $T$ and any component of $U$, motivating $\sqrt{n}\,\hat\theta_n$ as a suitable test statistic. The limiting distribution of this test statistic is nonregular (discontinuous at zero as a function of $\beta_0$), making it difficult to calibrate the test, as explained in the standard linear regression setting by MQ. Furthermore, the presence of censoring introduces additional (discontinuous) dispersion in the limiting distribution of $\sqrt{n}\,\hat\theta_n$, which needs to be addressed.

The marginal KSV estimates stem from regressing the estimated synthetic response $Y = \delta X/\hat{G}_n(X)$ on successive components of $U$, where $Y$ is regarded as an inverse probability weighted estimate, and $\hat{G}_n$ is the standard Kaplan–Meier estimator of the survival function of $C$ (denoted by $G_0$). Under independent censoring (as stated earlier), the use of least squares estimators, treating $Y$ as a response variable, is justified in view of the uniform consistency of $\hat{G}_n$ under mild conditions (e.g., when the distribution functions of $T$ and $C$ have no common jumps; see Stute and Wang (1993)). Independent censoring is a common assumption in the high-dimensional screening of predictors for survival outcomes (He et al. (2013), Song et al. (2014), Li et al. (2016)). However, it is much less restrictive to assume that $T$ and $C$ are conditionally independent, given $U$, in which case the conditional survival function $G_0(\cdot \mid U)$ of $C$ given $U$ can depend on the predictors. Estimating $G_0(\cdot \mid U)$ is challenging unless there is prior knowledge that only a single predictor is involved, in which case a local Kaplan–Meier estimator can be used (Dabrowska (1989)). For simplicity, however, we assume independent censoring throughout.

Variable selection methods for right-censored survival data are widely available, although formal testing procedures are far less prevalent. For example, variants of the regularized Cox regression have been studied by Tibshirani (1997), Fan and Li (2002), Bunea and McKeague (2005), Zhang and Lu (2007), Bøvelstad et al. (2009), Engler and Li (2009), Antoniadis et al. (2010), Binder et al. (2011), Wu (2012), and Sinnott and Cai (2016). Penalized AFT models have been considered by Huang et al. (2006), Datta et al. (2007), Johnson (2008), Johnson et al. (2008), Cai et al. (2009), Huang and Ma (2010), Bradic et al. (2011), Ma and Du (2012), and Li et al. (2014). These methods ensure the consistency of variable selection only (i.e., the oracle property), and do not address the issue of post-selection inference. Fang et al. (2017) have established asymptotically valid confidence intervals for a preconceived regression parameter in a high-dimensional Cox model after variable selection on the remaining predictors, but this does not apply to marginal screening (where no regression parameter is singled out, a priori). Zhong et al. (2015) have considered the same problem for preconceived regression parameters within a high-dimensional additive risk model. Taylor and Tibshirani (2018) recently proposed a method of finding post-selection corrected p-values and confidence intervals for the Cox model based on conditional testing. However, to the best of our knowledge, their method has not been explored theoretically (except in a linear regression setting with independent normal errors; see Lockhart et al. (2014)).

Statistical methods for variable selection based on marginal screening on survival data have been studied by Fan et al. (2010), who extended sure independence screening to survival outcomes based on the Cox model. Their method applies to the selection of components of ultra-high-dimensional predictors, although no formal testing is available. Other relevant references include Zhao and Li (2012), Gorst-Rasmussen and Scheike (2013), He et al. (2013), Song et al. (2014), Zhao and Li (2014), Hong et al. (2018b), Li et al. (2016), and Hong et al. (2018a).

The remainder of the paper is organized as follows. In Section 2, we formulate the testing problem and introduce the proposed test statistic based on marginal KSV estimators. The adaptive bootstrap procedure used to calibrate the test is provided at the end of Section 2. In Section 3, we propose a variant of ARTS that adjusts for the effect of baseline clinical covariates. A forward-stepwise ARTS procedure is developed in Section 4. Various competing methods are discussed in Section 5. The numerical results reported in Section 6 show that ARTS performs favorably compared with these competing methods. In Section 7, we present applications to gene-expression data and primary biliary cirrhosis data. Concluding remarks are given in Section 8. The proofs of all the results are provided in the online Supplementary Material.

2. ARTS procedure

2.1. Preliminaries

The method proposed by Koul et al. (1981) for fitting the AFT model (1) replaces $T$ by the synthetic response $\tilde{Y} = \delta X/G_0(X)$, which is justified by the property

$$E[\tilde{Y} \mid U] = E\left[\frac{\delta X}{G_0(X)} \,\Big|\, U\right] = E\left[\frac{T}{G_0(T)}\, E[\delta \mid T] \,\Big|\, U\right] = E[T \mid U], \tag{2}$$

where $G_0$ is unknown, but can be estimated by its Kaplan–Meier estimator. In other words, $T$ and $\tilde{Y}$ have identical conditional means, given $U$, assuming independent censoring. Therefore, we can recast the AFT model as $\tilde{Y} = \alpha_0 + U^T\beta_0 + \tilde{\varepsilon}$, using a new error term $\tilde{\varepsilon}$ that still has zero mean and finite variance, and is uncorrelated with $U$ (see the Supplementary Material for a detailed proof). Using similar arguments, we can show that $E[\tilde{Y}^2] = E[T^2/G_0(T)] \ge E[T^2]$ and $E[U_j\tilde{Y}] = E[U_jT]$, for $j = 1, \ldots, p$. Hence, the correlation between $T$ and $U_j$ is uniformly proportional to the correlation between $\tilde{Y}$ and $U_j$ over $j$, leading to the equality

$$\arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, T)| = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, \tilde{Y})|. \tag{3}$$

In the next section, we use (3) to reduce the screening problem to a test of whether the predictor most correlated with $T$ (or, equivalently, with $\tilde{Y}$) is significant. In practice, we recommend the pre-standardization of the predictors (as is common in variable selection) to provide scale-invariance. However, we develop the ARTS procedure in terms of the unstandardized predictors for simplicity of notation.
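As an illustration of the synthetic response, the following minimal Python sketch computes a Kaplan–Meier estimate of the censoring survival function and the resulting values of $\delta X/\hat{G}_n(X)$. It is not the authors' implementation: the function names are hypothetical, ties among the observed times are assumed absent, and the left-continuous version $\hat{G}_n(X-)$ is used (an assumption made here to keep the weights strictly positive at observed failure times).

```python
def km_censoring_left(x, delta, t):
    """Left-continuous Kaplan-Meier estimate of P(C >= t).

    Censored observations (delta == 0) play the role of 'events' here.
    Assumes no ties among the observed times x.
    """
    surv = 1.0
    censor_times = sorted({xi for xi, di in zip(x, delta) if di == 0 and xi < t})
    for s in censor_times:
        at_risk = sum(1 for xi in x if xi >= s)
        censored = sum(1 for xi, di in zip(x, delta) if xi == s and di == 0)
        surv *= 1.0 - censored / at_risk
    return surv


def synthetic_response(x, delta):
    """Return Y_i = delta_i * X_i / G_hat(X_i-); censored observations get 0."""
    return [xi / km_censoring_left(x, delta, xi) if di == 1 else 0.0
            for xi, di in zip(x, delta)]


# Toy data: the second observation is censored.
y = synthetic_response([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

Uncensored observations after a censoring time are inflated by the inverse weight, which is how the synthetic response compensates for the censored observations being set to zero.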

2.2. Maximally selected KSV estimator

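To specify the predictor that is the most correlated with $T$, we introduce the notation

$$j(b) = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, U^Tb)| \quad \text{for any } b \in \mathbb{R}^p. \tag{4}$$

Under model (1), it is natural to have $\mathrm{Corr}(U_j, T) = \mathrm{Corr}(U_j, U^T\beta_0)$, which indicates that $j(\beta_0) = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, T)|$. We assume $j(\beta_0)$ is unique when $\beta_0 \ne 0$. Thus, testing whether $\beta_0 = 0$ is equivalent to a test of

$$H_0: \theta_0 = 0 \quad \text{versus} \quad H_A: \theta_0 \ne 0,$$

where $\theta_0$ denotes the marginal regression coefficient of $U_{j(\beta_0)}$, the predictor most correlated with $T$ (or, equivalently, with $\tilde{Y}$ by (3)). Henceforth, for notational simplicity, we denote the label $j(\beta_0)$ by $j_0$.

The synthetic response $\tilde{Y}$ is not observed, but it can be estimated by $Y = \delta X/\hat{G}_n(X)$, which leads to the sample version of $j_0$ given by

$$\hat{j}_n = \arg\max_{j=1,\ldots,p} \left| \frac{\mathbb{P}_n (U_j - \mathbb{P}_n U_j) Y}{S_j S_Y} \right|, \tag{5}$$

where $\mathbb{P}_n$ is the empirical distribution, and $S_j$ and $S_Y$ are the sample standard deviations of $U_j$ and $Y$, respectively. The best fitting marginal linear model for $T$ with predictor $U_{j_0}$ has the intercept and slope

$$(a_0, \theta_0) = \left( ET - \theta_0 EU_{j_0},\ \frac{\mathrm{Cov}(U_{j_0}, T)}{\mathrm{Var}(U_{j_0})} \right).$$

The maximally selected KSV estimator of $(a_0, \theta_0)$ is

$$(\hat{\alpha}_n, \hat{\theta}_n) = \left( \mathbb{P}_n Y - \hat{\theta}_n \mathbb{P}_n U_{\hat{j}_n},\ \frac{1}{S_{\hat{j}_n}^2}\, \mathbb{P}_n (U_{\hat{j}_n} - \mathbb{P}_n U_{\hat{j}_n}) Y \right), \tag{6}$$

where $S_{\hat{j}_n}^2$ denotes the sample variance of $U_{\hat{j}_n}$. We reject $H_0$ in favor of $H_A$ for extreme values of the test statistic $\sqrt{n}\,\hat{\theta}_n$.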
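The selection in (5) and the estimator in (6) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' code: the function name `max_selected_ksv` is hypothetical, the predictors are passed as a list of rows, and sample variances use the divisor $n$ (the empirical-measure form), which the text does not specify.

```python
import math


def max_selected_ksv(U, y):
    """Maximally selected marginal least-squares fit.

    U: list of n rows, each a list of p predictor values.
    y: list of n (estimated) synthetic responses.
    Returns (j_hat, alpha_hat, theta_hat, sqrt(n) * theta_hat).
    """
    n, p = len(U), len(U[0])
    y_bar = sum(y) / n
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    best = None
    for j in range(p):
        col = [row[j] for row in U]
        u_bar = sum(col) / n
        cov = sum((u - u_bar) * yi for u, yi in zip(col, y)) / n  # P_n (U_j - P_n U_j) Y
        var = sum((u - u_bar) ** 2 for u in col) / n              # S_j^2, divisor n
        corr = abs(cov) / (math.sqrt(var) * s_y)                  # criterion in (5)
        if best is None or corr > best[0]:
            best = (corr, j, cov / var, u_bar)
    _, j_hat, theta_hat, u_bar = best
    alpha_hat = y_bar - theta_hat * u_bar                         # intercept in (6)
    return j_hat, alpha_hat, theta_hat, math.sqrt(n) * theta_hat


# Toy example: y depends linearly on the first predictor only.
U = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [1.0, 3.0, 5.0, 7.0]
j_hat, alpha_hat, theta_hat, stat = max_selected_ksv(U, y)
```

In this toy example the first predictor is selected, with slope 2 and intercept 1.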

2.3. Local behavior of $\hat{\theta}_n$

The challenge of calibrating a test based on $\sqrt{n}\,\hat{\theta}_n$ is to adapt to its nonregular limiting behavior at $\beta_0 = 0$ (as shown in Theorem 1 below). To accurately capture the asymptotic behavior of $\hat{\theta}_n$ in $\sqrt{n}$-neighborhoods of $\beta_0 = 0$, we consider the local linear model

$$T^{(n)} = \alpha_0 + U^T\beta_n + \varepsilon, \tag{7}$$

where $\beta_n = \beta_0 + b_0/\sqrt{n}$, with a local parameter $b_0 \in \mathbb{R}^p$, and $\varepsilon$ is unchanged.

Under model (7), the observed time and the censoring status are denoted by $X^{(n)} = \min(T^{(n)}, C)$ and $\delta^{(n)} = 1(T^{(n)} \le C)$, respectively. We also define the synthetic response $\tilde{Y}^{(n)}$ and the estimated synthetic response $Y^{(n)}$ in an analogous fashion:

$$\tilde{Y}^{(n)} = \frac{\delta^{(n)} X^{(n)}}{G_0(X^{(n)})} \quad \text{and} \quad Y^{(n)} = \frac{\delta^{(n)} X^{(n)}}{\hat{G}_n(X^{(n)})}.$$

For any fixed $n$, $\tilde{Y}^{(n)}$ has the same mean and covariance with $U$ as those of $T^{(n)}$. The error term associated with $\tilde{Y}^{(n)}$ is $\tilde{\varepsilon}_n = \tilde{Y}^{(n)} - \alpha_0 - U^T\beta_n$, which also has zero mean and is uncorrelated with $U$. Instead of $j_0$, the label of the predictor most correlated with $T^{(n)}$ is

$$j_n \equiv j(\beta_n) = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, T^{(n)})| = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(U_j, \tilde{Y}^{(n)})|,$$

and our earlier hypotheses become

$$H_0: \theta_n = 0 \quad \text{versus} \quad H_A: \theta_n \ne 0,$$

where

$$\theta_n = \frac{\mathrm{Cov}(U_{j_n}, T^{(n)})}{\mathrm{Var}(U_{j_n})}. \tag{8}$$

Note that $j_n = j(b_0)$ when $\beta_0 = 0$ but $b_0 \ne 0$, and $j(b_0)$ is assumed unique. Otherwise, $j_n$ is not well defined, and the null hypothesis $\theta_n = 0$ holds when $\beta_0 = 0$ and $b_0 = 0$. If $j_0$ is unique, then $j_n \to j_0$. The estimators $\hat{j}_n$ and $\hat{\theta}_n$ are now defined by replacing $Y$ by $Y^{(n)}$ in (5) and (6).

We develop the limiting distribution of $\sqrt{n}\,\hat{\theta}_n$ in the following theorem under assumptions (A.1)–(A.4) below. The proof is based on the functional delta method (van der Vaart (2000), Chap. 20) and a functional central limit theorem (Pollard (1990), Sec. 10), and is provided in the Supplementary Material.

  • (A.1)

    The predictors $U_j$, for $j = 1, \ldots, p$, are bounded, and $|\mathrm{Corr}(U_j, U_k)| < 1$, for all $j \ne k$.

  • (A.2)

    The error term ε in (7) has a zero mean and finite variance, and is uncorrelated with U.

  • (A.3)

    The censoring time C is independent of (T,U) and is bounded above by τ (the time to the end of the follow-up).

  • (A.4)

    The marginal survival function of the censoring, $G_0$, is continuous on $\mathcal{T}$, and there exists a positive constant $c_g$ such that $G_0(\tau) > c_g > 0$. In addition, the marginal survival function of $T$, $F_0$, is continuous on $\mathcal{T}$, and there exists a positive constant $c_f$ such that $F_0(\tau) > c_f > 0$.

Theorem 1

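Suppose that $j_0 = j(\beta_0)$ is unique when $\beta_0 \ne 0$, that $j(b_0)$ is unique when $\beta_0 = 0$ and $b_0 \ne 0$, and that the regularity conditions (A.1)–(A.4) hold. Under the local model (7),

$$\sqrt{n}(\hat{\theta}_n - \theta_n) \xrightarrow{d} \begin{cases} (M_{j_0} + \varphi_{j_0}(L))/V_{j_0} & \text{if } \beta_0 \ne 0, \\ (M_J + \varphi_J(L))/V_J + \left(C_J/V_J - C_{j(b_0)}/V_{j(b_0)}\right)^T b_0 & \text{if } \beta_0 = 0, \end{cases}$$

where $V_j = \mathrm{Var}(U_j)$, $C_j = \mathrm{Cov}(U_j, U)$, $J = \arg\max_{j=1,\ldots,p} \{M_j + \varphi_j(L) + C_j^T b_0\}^2/V_j$, $M = \{M_j, j = 1, \ldots, p\}$ is a mean-zero normal random vector, $L$ is a mean-zero Gaussian process, and $(M, L)$ is jointly a mean-zero Gaussian process, the covariance of which is provided in the Supplementary Material. The $j$-indexed functional $\varphi_j$, a real-valued map on the space of bounded functions on $\mathcal{T}$, is defined by

$$\varphi_j(h) = -E\left[ \frac{(U_j - EU_j)\, T\, h(T)}{G_0(T)} \right].$$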

Remark 1

The Gaussian process $L$ is the weak limit of the process $\sqrt{n}(\hat{G}_n - G_0)$. When there is no censoring, $\hat{G}_n(t) = G_0(t) = 1$ for all $t$, so that $L$ is a zero process. Then, $\varphi_j(L) = 0$ for all $j$, and the limiting distribution reduces to that given by MQ. When there is censoring, $L$ is a nontrivial Gaussian process and introduces further dispersion into our limiting distribution.

Remark 2

When there is censoring and $\beta_0 \ne 0$, $T$ and $U$ are correlated, leading to nonzero $\varphi_j(L)$ for all $j$. Because the process $L$ is nontrivial, the additional term $\varphi_{j_0}(L)$ is then present in the limiting distribution.

Remark 3

When there is censoring and $\beta_0 = 0$, $\varphi_j(L)$ vanishes almost surely (a.s.) for all $j$ if $\varepsilon$ and $U$ are independent. As a result, the additional term $\varphi_J(L)$ disappears. Given the independence between $\varepsilon$ and $U$, the limiting distribution simplifies to

$$M_J/V_J + \left(C_J/V_J - C_{j(b_0)}/V_{j(b_0)}\right)^T b_0.$$

This less complex form of the limiting distribution can be estimated easily from the data. In addition to the possibility of evaluating the asymptotic power (discussed in Section 6), it enables calibration via simulation from the estimated null limiting distribution of $\sqrt{n}\,\hat{\theta}_n$ (later introduced as “CEND” in Section 5). However, the validity of this approach relies on the highly restrictive assumption that $\varepsilon$ and $U$ are independent.

The discontinuity of the limiting distribution at $\beta_0 = 0$ introduces difficulties when designing a screening test based on $\hat{\theta}_n$. If $\beta_0 \ne 0$, naive resampling methods can give consistent estimates of the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta_n)$. However, if $\beta_0 = 0$, resampling methods that fail to consider the local behavior of $\sqrt{n}\,\hat{\theta}_n$ around $\beta_0 = 0$ will give inconsistent estimates of the limiting distribution. To accommodate this nonuniform weak convergence at the point of nonregularity (i.e., $\beta_0 = 0$), our proposed ARTS allows for the flexibility of using different bootstrap strategies to approximate the limiting distribution when $\beta_0 \ne 0$ or $\beta_0 = 0$. Recall that $S_j^2$ is the sample variance of $U_j$, for all $j$. We decompose $\sqrt{n}(\hat{\theta}_n - \theta_n)$ into

$$\sqrt{n}(\hat{\theta}_n - \theta_n)\, 1(|T_n| > \lambda_n \text{ or } \beta_0 \ne 0) + \sqrt{n}(\hat{\theta}_n - \theta_n)\, 1(|T_n| \le \lambda_n,\ \beta_0 = 0), \tag{9}$$

where $T_n = \sqrt{n}\,\hat{\theta}_n/\hat{\sigma}_n$ is the maximally selected studentized statistic, and

$$\hat{\sigma}_n^2 = \mathbb{P}_n (Y - \hat{\alpha}_n - \hat{\theta}_n U_{\hat{j}_n})^2 / S_{\hat{j}_n}^2$$

with $(\hat{\alpha}_n, \hat{\theta}_n, \hat{j}_n)$ defined in (5) and (6). The statistic $T_n$ serves as a pretest to identify the nonregular situation in which we need a more accurate bootstrap strategy to capture the local asymptotic behavior of $\hat{\theta}_n$. Although the asymptotic variance of the KSV estimator in the fixed design case is known (Zhou (1992), Srinivasan and Zhou (1994)), in the present random design case it is simpler to avoid using such a complex standard error estimator. Instead, we base the pretest on the relatively simple statistic $T_n$. We show that $\hat{\sigma}_n^2$ is asymptotically bounded away from zero and bounded above (the proof is provided in the Supplementary Material). Together with the results in Theorem 1, we prove that $|T_n| \to \infty$ a.s. when $\beta_0 \ne 0$, and $|T_n| = O_p(1)$ when $\beta_0 = 0$. The specification of $\lambda_n$ is presented in the next section.
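The studentized pretest statistic can be computed directly from the selected predictor and the fitted marginal line. The sketch below is illustrative only: the function name `studentized_stat` is hypothetical, and the sample variance again uses the divisor $n$ as an assumption.

```python
import math


def studentized_stat(u_sel, y, alpha_hat, theta_hat):
    """T_n = sqrt(n) * theta_hat / sigma_hat for the selected predictor.

    sigma_hat^2 is the mean squared residual of the marginal fit divided by
    the sample variance (divisor n) of the selected predictor.
    """
    n = len(y)
    u_bar = sum(u_sel) / n
    s2 = sum((u - u_bar) ** 2 for u in u_sel) / n
    mean_sq_resid = sum((yi - alpha_hat - theta_hat * u) ** 2
                        for u, yi in zip(u_sel, y)) / n
    sigma_hat = math.sqrt(mean_sq_resid / s2)
    return math.sqrt(n) * theta_hat / sigma_hat


# Toy example with least-squares fit alpha_hat = 1.3, theta_hat = 1.8.
t_n = studentized_stat([0.0, 1.0, 2.0, 3.0], [1.5, 2.5, 5.5, 6.5], 1.3, 1.8)
```

A large value of $|T_n|$ relative to the threshold $\lambda_n$ points to the regular case, in which the centered percentile bootstrap is used.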

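We isolate the possibility of $\beta_0 = 0$ by comparing $|T_n|$ with some screening threshold $\lambda_n$. The first term in (9) can be estimated consistently using a centered percentile bootstrap whenever $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$, because we show $1(|T_n| > \lambda_n) \xrightarrow{p} 1(\beta_0 \ne 0)$ (stated as Lemma 4.1 in the Supplementary Material, along with a detailed proof). Estimating the second term in (9) entails additional work. Recall that $\mathbb{P}_n$ is the empirical distribution, $P$ is the distribution of $(X^{(n)}, \delta^{(n)}, U)$, and $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$. For $j = 1, \ldots, p$, we define

$$M_{n,j} = \mathbb{G}_n \tilde{\varepsilon}_n (U_j - \mathbb{P}_n U_j) \quad \text{and} \quad D_{n,j} = \sqrt{n}\, \mathbb{P}_n (U_j - \mathbb{P}_n U_j)(Y^{(n)} - \tilde{Y}^{(n)}).$$

For $b \in \mathbb{R}^p$, we define

$$J_n(b) = \arg\max_{j=1,\ldots,p} \left( M_{n,j} + D_{n,j} + \mathbb{P}_n (U_j - \mathbb{P}_n U_j) U^T b \right)^2 / S_j^2,$$

and a $b$-indexed process

$$\mathbb{V}_n(b) = \left( M_{n,J_n(b)} + D_{n,J_n(b)} + \mathbb{P}_n (U_{J_n(b)} - \mathbb{P}_n U_{J_n(b)}) U^T b \right) / S_{J_n(b)}^2 - C_{j(b)}^T b / V_{j(b)}.$$

Below, we express the second term in (9) as a function of $\mathbb{V}_n(b_0)$. When $\beta_0 = 0$, it is easy to see that

$$\begin{aligned} \sqrt{n}\,\hat{\theta}_j &= \sqrt{n}\, \mathbb{P}_n (U_j - \mathbb{P}_n U_j) \tilde{Y}^{(n)} / S_j^2 + \sqrt{n}\, \mathbb{P}_n (U_j - \mathbb{P}_n U_j)(Y^{(n)} - \tilde{Y}^{(n)}) / S_j^2 \\ &= \left( \mathbb{G}_n \tilde{\varepsilon}_n (U_j - \mathbb{P}_n U_j) + \sqrt{n}\, \mathbb{P}_n (U_j - \mathbb{P}_n U_j)(Y^{(n)} - \tilde{Y}^{(n)}) + \mathbb{P}_n (U_j - \mathbb{P}_n U_j) U^T b_0 \right) / S_j^2 \\ &= \left( M_{n,j} + D_{n,j} + \mathbb{P}_n (U_j - \mathbb{P}_n U_j) U^T b_0 \right) / S_j^2, \end{aligned}$$

for all $j$. Along with $\hat{j}_n = J_n(b_0)$ and $j_n = j(b_0)$ when $\beta_0 = 0$, we have $\sqrt{n}\,\theta_n = C_{j(b_0)}^T b_0 / V_{j(b_0)}$ and therefore, $\sqrt{n}(\hat{\theta}_n - \theta_n) = \mathbb{V}_n(b_0)$. Hence, the decomposition of $\sqrt{n}(\hat{\theta}_n - \theta_n)$ can be expressed as

$$\sqrt{n}(\hat{\theta}_n - \theta_n) = \sqrt{n}(\hat{\theta}_n - \theta_n)\, 1(|T_n| > \lambda_n \text{ or } \beta_0 \ne 0) + \mathbb{V}_n(b_0)\, 1(|T_n| \le \lambda_n,\ \beta_0 = 0). \tag{10}$$

In Theorem 2 below, we show that $\mathbb{V}_n(b)$ can be consistently bootstrapped for any given $b$. Provided that $b_0$ is known, we can directly bootstrap the expression in (10) to consistently estimate the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta_n)$. Hereafter, the superscript $*$ is used to indicate the bootstrap version of an estimator.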

Theorem 2

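Suppose that all conditions for Theorem 1 hold, and the tuning parameter $\lambda_n$ satisfies $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$ as $n \to \infty$. Under the local model (7),

$$\sqrt{n}(\hat{\theta}_n^* - \hat{\theta}_n)\, 1(|T_n^*| > \lambda_n \text{ or } |T_n| > \lambda_n) + \mathbb{V}_n^*(b_0)\, 1(|T_n^*| \le \lambda_n,\ |T_n| \le \lambda_n)$$

converges to the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta_n)$ conditionally (on the data) in probability.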

2.4. ARTS screening procedure

The ARTS screening procedure uses a bootstrap calibration for the test statistic $\sqrt{n}\,\hat{\theta}_n$ based on a special case of Theorem 2, specifically, $b_0 = 0$. To approximate the limiting distribution of $\sqrt{n}\,\hat{\theta}_n$ under the null, it suffices to bootstrap

$$B_n = \sqrt{n}(\hat{\theta}_n - \theta_n)\, 1(|T_n| > \lambda_n \text{ or } \beta_0 \ne 0) + \mathbb{V}_n(0)\, 1(|T_n| \le \lambda_n,\ \beta_0 = 0), \tag{11}$$

and the corresponding bootstrap version is

$$B_n^* = \sqrt{n}(\hat{\theta}_n^* - \hat{\theta}_n)\, 1(|T_n^*| > \lambda_n \text{ or } |T_n| > \lambda_n) + \mathbb{V}_n^*(0)\, 1(|T_n^*| \le \lambda_n,\ |T_n| \le \lambda_n). \tag{12}$$

For some nominal level $\alpha$, define the critical values $c_l$ and $c_u$, respectively, by the lower and upper $100(\alpha/2)$-th percentiles of 1000 replications of $B_n^*$. We reject the null hypothesis, and conclude that there is at least one significant predictor, if $\sqrt{n}\,\hat{\theta}_n$ falls outside the interval $[c_l, c_u]$.
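The rejection rule just described can be sketched as follows. The sketch takes the bootstrap replicates of the calibrating quantity as given (how they are generated is not shown here), and the function names and the linear-interpolation quantile rule are assumptions made for illustration.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_test(stat, boot_reps, alpha=0.05):
    """Reject when the test statistic falls outside [c_l, c_u].

    boot_reps holds the bootstrap replicates; c_l and c_u are their
    lower and upper alpha/2 quantiles.
    """
    reps = sorted(boot_reps)
    c_l = quantile(reps, alpha / 2)
    c_u = quantile(reps, 1 - alpha / 2)
    reject = stat < c_l or stat > c_u
    return reject, (c_l, c_u)


# Toy replicates standing in for 1001 bootstrap draws.
reps = [float(v) for v in range(-500, 501)]
reject, (c_l, c_u) = bootstrap_test(480.0, reps)
```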

Given the conditions that $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$, the pretest has an asymptotically negligible Type-I error rate, $P(|T_n| > \lambda_n \mid \theta_n = 0) \to 0$, because we have shown that $P(|T_n| > \lambda_n) \to 1(\beta_0 \ne 0)$ in Lemma 4.1, stated in the Supplementary Material. Provided that $\tilde{\varepsilon}$ and $U$ are independent, a special case of Theorem 1 indicates that $T_n \xrightarrow{d} \max_{j=1,\ldots,p} |Z_j|$ at the null, where $\{Z_j, j = 1, \ldots, p\}$ is a vector of standard normal random variables. Using similar arguments to those of MQ, the asymptotic Type-I error rate of the pretest can be controlled below level $\alpha$ if we set $\lambda_n \ge \Phi^{-1}(1 - \alpha/(2p))$, where $\Phi$ denotes the standard normal distribution function. To satisfy the conditions that $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$, one reasonable selection of the threshold would be $\lambda_n = \max\{\sqrt{a \log n},\ \Phi^{-1}(1 - \alpha/(2p))\}$, for some constant $a > 0$.
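For concreteness, the threshold can be evaluated with the Python standard library. The sketch assumes that the threshold takes the form $\max\{\sqrt{a \log n},\ \Phi^{-1}(1 - \alpha/(2p))\}$; the function name `screening_threshold` and the example values of $a$ are hypothetical, and in practice $a$ is chosen by the double bootstrap.

```python
import math
from statistics import NormalDist


def screening_threshold(n, p, a, alpha=0.05):
    """lambda_n = max{ sqrt(a * log n), Phi^{-1}(1 - alpha / (2p)) } (assumed form)."""
    bonferroni_quantile = NormalDist().inv_cdf(1 - alpha / (2 * p))
    return max(math.sqrt(a * math.log(n)), bonferroni_quantile)


# With few predictors the sqrt(a log n) term can dominate ...
lam_small_p = screening_threshold(n=100, p=10, a=2.0)
# ... whereas with many predictors the Bonferroni-type quantile dominates.
lam_large_p = screening_threshold(n=100, p=1000, a=0.1)
```

The second term guarantees that the pretest is at least as stringent as a Bonferroni-type cutoff, while the first term ensures that the threshold diverges slowly with $n$.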

To determine the value of the constant $a$ in practice, we use a double bootstrap. That is, we produce 1000 bootstrap estimates $\hat{\theta}_n^*$, and apply the ARTS to a further 1000 nested double-bootstrap samples to obtain the acceptance region $[c_l^*, c_u^*]$ for each $\hat{\theta}_n^*$. If the test statistic $\sqrt{n}(\hat{\theta}_n^* - \hat{\theta}_n)$ falls outside $[c_l^*, c_u^*]$, we record this as a rejection. The constant $a$ is specified as the value that results in 5% of these 1000 ARTS procedures being rejected. This data-driven selection of $a$ is adopted in our numerical studies and applications to real data. Note that in each bootstrap and nested double-bootstrap sample, we set $\tau$ as the 90% empirical percentile of the observed time and control the censoring rate around the same level, as in the original data.

3. ARTS adjusted for baseline covariates

When screening high-dimensional predictors of survival outcomes, it is common practice to adjust for baseline demographic and clinical covariates. These baseline covariates include age, disease stage, tumor thickness, and lymph node status; in the DLBCL study, we have the International Prognostic Index (IPI). The IPI is a widely used prognostic index that reflects the combination of clinical covariates (cf., The International Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993)). Such baseline covariates (with moderate dimensionality) do not need to be screened, but do need to be incorporated as covariates in the AFT model. In this section, we modify the ARTS (as adjusted ARTS) to account for the effect of these covariates.

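Let $\tilde{U} = (\tilde{U}_1, \ldots, \tilde{U}_q)^T$ be a vector of baseline covariates. With $\tilde{U}$ included, the true AFT model (1) can be expressed as

$$T = \alpha_0 + U^T\beta_0 + \tilde{U}^T\gamma_0 + \varepsilon, \tag{13}$$

where $\gamma_0 \in \mathbb{R}^q$, $\tilde{U}$ is assumed to be bounded, and the error term $\varepsilon$ is uncorrelated with $\tilde{U}$. We wish to test whether $\beta_0 = 0$ while adjusting for $\tilde{U}$. Projecting each component of $U$ onto the space spanned by $\tilde{U}$, we reformulate the AFT model (13) as

$$T = \alpha_0' + D^T\beta_0 + \varepsilon', \tag{14}$$

where $D = (D_1, \ldots, D_p)^T$ with $D_j = U_j - \tilde{\alpha}_j - \tilde{U}^T\tilde{\gamma}_j$; at the same time,

$$\begin{aligned} (\tilde{\alpha}_j, \tilde{\gamma}_j^T) &= \left( E[U_j] - E[\tilde{U}^T\tilde{\gamma}_j],\ \left(\Sigma_{\tilde{U}}^{-1} \mathrm{Cov}(U_j, \tilde{U})\right)^T \right), \\ \alpha_0' &= \alpha_0 + (\tilde{\alpha}_1, \ldots, \tilde{\alpha}_p)\beta_0 + E\left[\tilde{U}^T\left((\tilde{\gamma}_1, \ldots, \tilde{\gamma}_p)\beta_0 + \gamma_0\right)\right], \\ \varepsilon' &= \tilde{U}^T\left((\tilde{\gamma}_1, \ldots, \tilde{\gamma}_p)\beta_0 + \gamma_0\right) - E\left[\tilde{U}^T\left((\tilde{\gamma}_1, \ldots, \tilde{\gamma}_p)\beta_0 + \gamma_0\right)\right] + \varepsilon, \end{aligned}$$

and $\Sigma_{\tilde{U}}$ is the covariance matrix of $\tilde{U}$. Note that $\tilde{\alpha}_j + \tilde{U}^T\tilde{\gamma}_j$ is the best linear unbiased predictor of $U_j$ based on $\tilde{U}$. From the definition of $(\tilde{\alpha}_j, \tilde{\gamma}_j)$, it follows that $E[D_j] = 0$ and $\mathrm{Cov}(D_j, \tilde{U}^T\gamma) = 0$, for all $j$ and any vector $\gamma \in \mathbb{R}^q$. The new error term $\varepsilon'$ inherits the properties of $\varepsilon$ and satisfies the moment conditions required for the ARTS: $E[\varepsilon'] = 0$, $E[(\varepsilon')^2] < \infty$, and $\varepsilon'$ is uncorrelated with $D$. To test whether $\beta_0 = 0$ under model (14), it suffices to test

$$H_0: \theta_0' = 0 \quad \text{versus} \quad H_A: \theta_0' \ne 0,$$

where $\theta_0' = \mathrm{Cov}(D_{j'(\beta_0)}, T)/\mathrm{Var}(D_{j'(\beta_0)})$, and $j'(b) = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(D_j, D^Tb)|$ for any $b \in \mathbb{R}^p$, implying $j'(\beta_0) = \arg\max_{j=1,\ldots,p} |\mathrm{Corr}(D_j, T)|$.

The adjusted ARTS regresses each screening predictor on the baseline covariates and applies the ARTS with the corresponding residuals $\hat{D} = (\hat{D}_1, \ldots, \hat{D}_p)^T$ as predictors. Because $\hat{D}_j$ involves a least-squares-type estimate of $(\tilde{\alpha}_j, \tilde{\gamma}_j)$ for $j = 1, \ldots, p$, we can use the strong consistency of the estimates over all $j$ (implied by the strong law of large numbers and fixed $p$) to justify the replacement of $D$ by $\hat{D}$. The bootstrap consistency is also guaranteed. Thus, we only need to resample the residuals in the bootstrap and double-bootstrap procedures. This offers a considerable saving in computation, especially when $p$ is large, because the projections do not have to be recomputed for every bootstrap or double-bootstrap sample. We tailor the adjustment for $\tilde{U}$ to fit within the ARTS framework to avoid using a test statistic in matrix form, which is inevitable when fitting a multi-variable AFT model to adjust for $\tilde{U}$. This construction is important because it allows the theoretical results developed for the ARTS to be extended to the adjusted ARTS.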
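The residualization step can be sketched without any linear-algebra library by orthogonalizing the centered baseline covariates and projecting them out of each centered predictor, which yields the least-squares residuals from regressing each predictor on an intercept and the baseline covariates. The function names below are hypothetical, and columns are passed as plain lists; this is an illustration rather than the authors' implementation.

```python
def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _project_out(r, basis):
    """Remove from r its components along mutually orthogonal basis vectors."""
    for b in basis:
        c = _dot(r, b) / _dot(b, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def residualize(pred_cols, base_cols):
    """Least-squares residuals of each predictor column on (1, baseline columns)."""
    basis = []
    for w in base_cols:
        r = _project_out(_center(w), basis)   # Gram-Schmidt step
        if _dot(r, r) > 1e-12:                # skip (near-)collinear covariates
            basis.append(r)
    return [_project_out(_center(u), basis) for u in pred_cols]


# Toy example: u = 1 + 2 * w + e, with e orthogonal to both 1 and w.
w = [0.0, 1.0, 2.0, 3.0]
u = [2.0, 2.0, 4.0, 8.0]
d_hat = residualize([u], [w])[0]
```

In the toy example the residual column equals the orthogonal component $e = (1, -1, -1, 1)$, and it can then be passed to the screening step in place of the original predictor.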

4. Forward-stepwise ARTS

Given one significant predictor detected by the ARTS, it is natural to continue searching for other potential predictors, conditional on the information provided by the identified predictor. We implement the idea used in the adjusted ARTS procedure to fulfill this task in a forward and stepwise direction. The conditional screening continues until no further significance is detected. We refer to this screening procedure as the forward-stepwise ARTS, implemented as follows:

  1. Given the predictor $U_{\hat{j}_n}$ detected by the ARTS, obtain residuals from regressing $U_j$ on $U_{\hat{j}_n}$ whenever $j \ne \hat{j}_n$. Treat the residuals as screened predictors and run the adjusted ARTS. If no significant results are returned, stop the procedure; otherwise, collect the newly found significant predictor $U_{\tilde{j}_n}$.

  2. Use the residuals from regressing $U_j$ on $(U_{\hat{j}_n}, U_{\tilde{j}_n})$ as updated predictors, for all $j \notin (\hat{j}_n, \tilde{j}_n)$. Implement the adjusted ARTS based on these updated predictors, in order to detect the next significant predictor.

  3. Keep accumulating predictors until no further significant predictors are detected.

Our forward-stepwise ARTS procedure successively updates the predictors using the residuals from regressing on previously identified predictors. Compared with the residual analysis suggested by MQ, our forward-stepwise procedure allows the regression coefficients of all already included predictors to be refitted at each step. This implies the detection of further significant predictors, adjusting for those already-included.
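The control flow of the forward-stepwise procedure can be summarized in a short sketch. Both callables are placeholders supplied by the user and are assumptions of this illustration: `screen(cols, y)` stands for one pass of the (adjusted) ARTS and returns the position of the selected column together with a rejection flag, and `residualize(cols, adj_cols)` returns the residuals of each column after regression on the already selected predictors.

```python
def forward_stepwise(columns, y, screen, residualize):
    """Accumulate predictors until the screening test stops rejecting.

    columns: list of predictor columns; y: (synthetic) responses.
    screen and residualize are user-supplied callables (hypothetical here).
    Returns the indices of the selected predictors, in order of selection.
    """
    selected = []
    remaining = list(range(len(columns)))
    while remaining:
        candidates = [columns[j] for j in remaining]
        if selected:
            # Adjust the remaining predictors for all previously selected ones.
            adjust_for = [columns[j] for j in selected]
            candidates = residualize(candidates, adjust_for)
        k, rejected = screen(candidates, y)
        if not rejected:
            break
        selected.append(remaining.pop(k))
    return selected
```

Because the residuals are recomputed from the original predictors at every step, each new pass adjusts for all predictors selected so far rather than only for the most recent one.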

5. Competing methods

We compare the performance of the ARTS with several procedures that are widely applied to detect the presence of significant predictors for the survival outcome. When considering the adjustment of baseline covariates, these procedures can be modified as alternatives to the adjusted ARTS procedure.

5.1. AFT model approaches

Marginal parametric AFT models with Bonferroni correction (BONF-AFT)

A marginal parametric AFT model is often used to predict $T$ from each predictor by specifying a parametric form of the error distribution, from which we obtain the maximum likelihood estimate of the marginal regression coefficient of each predictor. A Z-test with a Bonferroni correction is carried out to test whether each marginal regression coefficient is zero. This method can be implemented using the survreg function from the survival package of R. To adjust for baseline covariates, we treat the residual $\hat{D}_j$ as the predictor in a marginal parametric AFT model, for $j = 1, \ldots, p$. In our finite-sample simulations, we specify that the error term follows a standard normal distribution.
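Given the marginal Z-statistics (however they are obtained), the Bonferroni decision can be sketched as below. The function name is hypothetical, and the Z-statistics in the example are made-up numbers used only to show the rule.

```python
from statistics import NormalDist


def bonferroni_reject(z_stats, alpha=0.05):
    """Two-sided marginal Z-tests with a Bonferroni correction.

    Returns (reject_global_null, list_of_marginal_p_values); the global null
    is rejected when any marginal p-value is at most alpha / p.
    """
    p = len(z_stats)
    std_normal = NormalDist()
    p_values = [2.0 * (1.0 - std_normal.cdf(abs(z))) for z in z_stats]
    return any(pv <= alpha / p for pv in p_values), p_values


# Illustrative Z-statistics for three predictors.
reject, p_values = bonferroni_reject([0.5, 3.5, -1.0])
```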

Marginal AFT models with higher criticism correction (HC)

The higher criticism method is a test proposed by John Tukey for determining the overall significance of a collection of independent p-values. We use the statistic developed by Donoho and Jin, which is expected to perform well if the predictors are nearly uncorrelated (Donoho and Jin (2004), Donoho and Jin (2015)).
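One common form of the Donoho–Jin higher criticism statistic, computed from the sorted marginal p-values $p_{(1)} \le \cdots \le p_{(N)}$, is $\max_{1 \le i \le \alpha_0 N} \sqrt{N}\,(i/N - p_{(i)})/\sqrt{p_{(i)}(1 - p_{(i)})}$. The sketch below implements this variant; the choice of variant, the default $\alpha_0 = 0.5$, and the function name are assumptions, because the text does not state which version is used.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values."""
    n = len(p_values)
    ordered = sorted(p_values)
    k = max(1, int(alpha0 * n))
    best = float("-inf")
    for i in range(1, k + 1):
        p_i = ordered[i - 1]
        if 0.0 < p_i < 1.0:  # skip degenerate p-values to avoid division by zero
            value = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
            best = max(best, value)
    return best


# Illustrative p-values: one small p-value among otherwise unremarkable ones.
hc = higher_criticism([0.01, 0.5, 0.5, 0.5])
```

Large values of the statistic indicate that the smallest p-values are smaller than would be expected under the global null.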

Centered percentile bootstrap with AFT model (CPB-AFT)

In contrast to the ARTS, this procedure works on the premise that there is at least one active predictor. Thus, it only bootstraps the first part of (10) to estimate the upper and lower $100(\alpha/2)$-th percentiles of the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta_n)$. The estimated percentiles provide critical values for the test statistic $\sqrt{n}\,\hat{\theta}_n$ (Efron and Tibshirani (1993)). Note that this method yields a special case of the ARTS with $\lambda_n = 0$. We can easily modify this method to adjust for baseline covariates by replacing $\theta_n$ and $\hat{\theta}_n$ with their counterparts in the framework given in Section 3.

Calibration by simulation from the estimated null distribution (CEND)

The asymptotic acceptance region is used to calibrate the test, and can be constructed from the special case in which $\varepsilon$ and $U$ are independent. Here, we simulate the limiting distribution of the scaled test statistic $\sqrt{n}\,\hat{\theta}_n/s$ under the null, where $s^2 = \mathbb{P}_n (Y^{(n)} - \hat{\alpha}_n - \hat{\theta}_n U_{\hat{j}_n})^2$. At the null, Theorem 1 implies that $\sqrt{n}\,\hat{\theta}_n/s \xrightarrow{d} \tilde{M}_J/V_J$, where $\{\tilde{M}_j, j = 1, \ldots, p\} \sim N_p(0, \Sigma_U)$, $\Sigma_U$ is the covariance matrix of $U$, and $J = \arg\max_j \tilde{M}_j^2/V_j$. With $\Sigma_U$ estimated using the sample covariance matrix of $U$, we generate 1000 realizations from $N_p(0, \Sigma_U)$, which we use to obtain 1000 random copies of $\sqrt{n}\,\hat{\theta}_n$. Then, we use the corresponding percentiles to develop the acceptance region. We reject the null hypothesis if $\sqrt{n}\,\hat{\theta}_n$ falls outside this region. The version that adjusts for baseline covariates can be developed analogously by taking $\hat{D}$ as predictors.
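The simulation step can be sketched as follows: draw $\tilde{M} \sim N_p(0, \Sigma_U)$ through a Cholesky factor, pick $J = \arg\max_j \tilde{M}_j^2/V_j$, and record $\tilde{M}_J/V_J$. The function names are hypothetical, and the covariance matrix passed in is assumed to be positive definite (which a sample covariance matrix need not be when $p$ is large relative to $n$).

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a positive-definite matrix S."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cend_null_draws(sigma_u, n_draws=1000, seed=0):
    """Simulate M_J / V_J with M ~ N_p(0, sigma_u) and J = argmax_j M_j^2 / V_j."""
    rng = random.Random(seed)
    p = len(sigma_u)
    L = cholesky(sigma_u)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        m = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        J = max(range(p), key=lambda j: m[j] ** 2 / sigma_u[j][j])
        draws.append(m[J] / sigma_u[J][J])
    return draws


# Toy covariance matrix for two correlated predictors.
draws = cend_null_draws([[1.0, 0.5], [0.5, 2.0]], n_draws=500, seed=1)
```

The lower and upper $\alpha/2$ quantiles of the simulated draws then give the acceptance region for the scaled test statistic.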

5.2. Cox model approaches

The other popular approach for linking predictors to the survival outcome is the Cox model, where the related statistical inference can be developed based on the partial likelihood (Cox (1972), Cox (1975)).

Partial likelihood ratio test (PLRT)

This test uses the likelihood ratio test statistic Λ, which is the ratio of the partial likelihood from the full Cox model to that from the reduced model at the null. Provided that $\Lambda \to_d \chi_p^2$ (chi-square distribution with p degrees of freedom), comparing Λ with a $\chi_p^2$-distributed random variable gives the p-value to calibrate the test. However, the PLRT is only feasible in the case of n > p, because it involves a full linear model containing all of the predictors. To adjust for baseline covariates, we define the test statistic as the ratio of the partial likelihood from a Cox model containing $(U, \tilde{U})$ to that from a Cox model considering $\tilde{U}$ only. This statistic weakly converges to $\chi_p^2$.

Marginal Cox models with Bonferroni correction (BONF-COX)

This procedure is similar to the BONF-AFT, but is based on marginal Cox models to link the survival outcome to each predictor $U_j$, for j = 1, …, p. Given the asymptotic normality of the maximum partial likelihood estimator (MPLE) (Andersen and Gill (1982)), we conduct a Z-test with a Bonferroni correction to investigate whether each marginal regression coefficient is zero. To adjust for baseline covariates, we can instead fit Cox models containing $(U_j, \tilde{U})$ for all j, and use the corresponding MPLE of the regression coefficient of $U_j$ as the test statistic.

Centered percentile bootstrap with Cox model (CPB-COX)

This procedure is similar to the CPB-AFT in general, but the selected predictor is determined in a different fashion. The marginal p-values are obtained from Z-tests based on separate marginal Cox models, and we select the predictor with the smallest marginal p-value. We apply a centered percentile bootstrap to the MPLE of the regression coefficient of this selected predictor (i.e., the most significant predictor). To account for additional baseline covariates, we fit Cox models containing $(U_j, \tilde{U})$, for all j, and bootstrap the MPLE of the regression coefficient of the most significant predictor among the $U_j$, while adjusting for $\tilde{U}$.

Global test based on Cox model (GLOBAL)

A score test is proposed to investigate whether the predictors U contribute to the hazard rate (Goeman et al. (2005)). The components of β0 are assumed to be random and independently follow a prior distribution with mean zero and common variance v. Here, it suffices to test whether v = 0 to investigate whether β0 = 0. Let $r = (r_1, \dots, r_n)^T$, with $r_i = U_i^T\beta_0$ for all i, and note that r is not observed because it involves the unknown parameter vector β0. By the assumptions on β0, r has mean zero and covariance matrix $vUU^T$. Under the noninformative censoring assumption, the marginal likelihood function of v is defined by

$$L(v) = E_r\Big[\exp\Big(\sum_{i=1}^{n}\big[\delta_i\big(\ln(h_0(X_i)) + r_i\big) - \exp(r_i)H_0(X_i)\big]\Big)\Big], \qquad (15)$$

where $H_0(t) = \int_0^t h_0(s)\,ds$ is the cumulative baseline hazard function up to time t. Applying the second-order Taylor expansion to the exponential term in (15) with respect to r, L(v) can be expressed by the first and second moments of r (Le Cessie and van Houwelingen (1995)). This implies that we can establish the desired test statistic in terms of the score function of v, which only involves the first and second moments of β0, without specifying the prior distribution. There are two ways to calculate the p-value: using asymptotic theory, and using permutation arguments. We compare both to the ARTS in our numerical studies. This global test can be modified to adjust for baseline covariates by simultaneously including U and $\tilde{U}$ in the Cox model, and the test statistic is constructed conditional on the MPLE of the regression coefficients of $\tilde{U}$.

6. Numerical studies

6.1. Finite-sample simulations

The performance of the ARTS is evaluated using numerical studies under different data-generating scenarios. The underlying survival outcome can follow either an AFT model or a proportional hazards model. For the former, we consider three data-generating models:

  • Model 1 T = ε;

  • Model 2 T = U1/4 + ε;

  • Model 3 $T = \sum_{j=1}^{p}\beta_j U_j + \varepsilon$ with β1 = … = β5 = 0.15, β6 = … = β10 = −0.1, and βj = 0 for j ≥ 11,

where ε denotes the noise, which follows a standard normal distribution and is independent of U. In Model 1, there is no active predictor, whereas there is only a single active predictor in Model 2. In Model 3, we have 10 active predictors and the most correlated predictor is not unique. The censoring time C follows an exponential distribution with various rate parameters for light censoring (10% of subjects with censored survival outcomes), moderate censoring (20%), and heavy censoring (40%). The vector of predictors U follows a p-dimensional normal distribution with each component $U_j \sim N(0, 1)$, and an exchangeable correlation structure Corr(Uj, Uk) = 0.5 for j ≠ k.
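The data-generating mechanism for Models 1–3 can be sketched as follows. The code assumes that censoring acts on the same scale as T, so that the observed pair is (min(T, C), 1{T ≤ C}), and it leaves the exponential rate as a free parameter to be tuned to the target censoring percentage; the function names and the rate used in the example are hypothetical.

```python
import math
import random

def model3_beta(p):
    """Coefficients of Model 3: five of 0.15, five of -0.1, the rest zero (needs p >= 10)."""
    return [0.15] * 5 + [-0.1] * 5 + [0.0] * (p - 10)

def simulate_aft(n, p, beta, censor_rate, rho=0.5, seed=0):
    """Draw n triples (X, delta, U) with X = min(T, C) and delta = 1{T <= C}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # A shared factor z0 gives unit variances and Corr(Uj, Uk) = rho.
        z0 = rng.gauss(0, 1)
        u = [math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0, 1) for _ in range(p)]
        t = sum(b * x for b, x in zip(beta, u)) + rng.gauss(0, 1)
        c = rng.expovariate(censor_rate)  # exponential censoring time
        data.append((min(t, c), int(t <= c), u))
    return data

# Model 1 (no active predictor) with a hypothetical censoring rate parameter.
sample = simulate_aft(n=2000, p=10, beta=[0.0] * 10, censor_rate=0.2)
censored_fraction = 1 - sum(d for _, d, _ in sample) / len(sample)
```

Model 2 corresponds to `beta = [0.25] + [0.0] * (p - 1)`, and the rate can be adjusted until `censored_fraction` matches the desired censoring level.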

We also generate the survival outcome based on the following proportional hazards models (Bender et al. (2005)):

  • Model 4 h(t|U) = 2exp(t);

  • Model 5 h(t|U) = 2exp(t)exp(U1/4);

  • Model 6 $h(t|U) = 2\exp(t)\exp\big(\sum_{j=1}^{p}\beta_j U_j\big)$ with the value of (β1, …, βp) as stated in Model 3.

To achieve the designed censoring rates, we generate the censoring time as an exponential random variable, for various choices of the rate parameter. We use Models 1 and 4 as the null models, Models 2 and 5 as the alternative models with a sparse signal, and Models 3 and 6 as the alternative models with weak dense signals.
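Event times under Models 4–6 can be drawn by inverting the cumulative hazard, in the spirit of Bender et al. (2005). For the hazard 2exp(t)exp(η), with η the linear predictor, the cumulative hazard is 2exp(η)(exp(t) − 1), so setting the survival function equal to a uniform draw V gives T = log(1 − log(V)/(2exp(η))). The sketch below implements this inversion; the function name is hypothetical.

```python
import math
import random

def simulate_ph_time(lin_pred, rng):
    """One event time from h(t|U) = 2 * exp(t) * exp(lin_pred) by inverse transform."""
    v = 1.0 - rng.random()  # uniform on (0, 1], keeps log(v) finite
    return math.log(1.0 - math.log(v) / (2.0 * math.exp(lin_pred)))

rng = random.Random(0)
# Model 4: no predictor effect, so the linear predictor is zero.
times_null = [simulate_ph_time(0.0, rng) for _ in range(20000)]
# Model 5: linear predictor U1 / 4 for a standard normal U1.
times_alt = [simulate_ph_time(rng.gauss(0, 1) / 4.0, rng) for _ in range(20000)]
```

Under Model 4 the survival probability at t = 0.5 is exp(−2(exp(0.5) − 1)), roughly 0.273, which gives a quick sanity check on the simulated times.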

For each data-generating scenario, we consider two sample sizes (n = 100 and 200), and five values for the dimension of the predictors (p = 10, 50, 100, 150, and 200). A nominal significance level of 5% is used throughout. The number of bootstrap replications is set as 1000. The selection of the threshold λn follows the steps stated in Section 2.4. To provide a full comparison, we compare the performance of the ARTS with the competing methods introduced in Section 5. The empirical rejection rates based on 1000 Monte Carlo replications under various censoring rates are displayed in Figures 1 and 2. The panels for Models 1 and 4 give Type-I error rates, which we compare with the nominal level of 5%. The panels for Models 2, 3, 5, and 6 indicate the power of each test.

Figure 1:

Empirical rejection rates based on 1000 samples generated from Models 1–3, with the dimension ranging from p = 10 to p = 200.

Figure 2:

Empirical rejection rates based on 1000 samples generated from Models 4–6, with the dimension ranging from p = 10 to p = 200.

In Figure 1, the ARTS controls the Type-I error rates (or equivalently, FWERs) around the nominal level, and demonstrates relatively high power for all alternative models. The BONF-AFT method gives more conservative Type-I error rates and lower power than the ARTS, except under alternative models with heavy censoring and n = 200, where its power is similar to that of the ARTS. The HC method is anti-conservative and fails to control the Type-I errors. We suspect this is due to the relatively high correlation between the predictors, for which HC is not designed. The BONF-COX method and the global test based on asymptotic theory (GLOBAL-asymp) are highly conservative and lead to low power. Both the CPB-AFT and the CPB-COX are anti-conservative, with empirical Type-I error rates that considerably exceed the nominal level across sample sizes and censoring rates (and thus fall out of range in parts of the left panels of Figure 1). The global test based on permutation arguments (GLOBAL-permut) controls the Type-I error rates well, but has much lower power than the ARTS, especially under light or moderate censoring. Both the CEND and the PLRT perform poorly: the former yields large Type-I error rates but low power, whereas the latter yields extremely high Type-I error rates. (The results of the PLRT are not shown here.) The unsatisfactory performance of the CEND may result from the small sample sizes in the simulations, given that the CEND is developed based on a simplified form of the limiting distribution. The power of each approach rises as the sample size increases and the censoring rate decreases. A comparison between the results of Models 2 and 3 shows no adverse impact on the power of the ARTS when the maximally correlated predictor is nonunique.

In Figure 2, where the data are not generated from AFT models, the ARTS retains good control of the Type-I error rates. On the other hand, the power of the ARTS is unstable when n = 100 or in the case of heavy censoring. Under light or moderate censoring, the power of the ARTS under Models 5 and 6 deteriorates sharply when n = 100 and p increases, whereas the ARTS maintains stable power when n = 200. With a misspecified error distribution, the BONF-AFT surprisingly controls the Type-I error rates well, but leads to much worse power. In contrast, the BONF-COX yields relatively greater power when the underlying survival outcome is generated from the proportional hazards model, although it is still conservative at the null. Other competing methods present similar results to those in Figure 1. Despite being unstable in terms of power owing to model misspecification, the ARTS still strikes a better balance between controlling the Type-I error and achieving sufficient power than other methods do, especially for light or moderate censoring and a large sample size. Comparing Figure 1 with Figure 2, we find that the ARTS is less susceptible to model misspecification than competing methods are. In the scenarios of the AFT data-generating models, the ARTS apparently dominates the Cox model approaches throughout; in the scenarios where the data are generated from proportional hazards models, the ARTS still exhibits better performance in the FWER and power than that of the Cox-model-relevant approaches when the censoring is light or moderate and n = 200.

6.2. Screening performance of ARTS

We further assess the performance of the ARTS as a full screening method (i.e., retaining all covariates with marginal test statistics beyond the critical values calculated for $\sqrt{n}\hat{\theta}_n$) in terms of the false discovery rate (FDR), false negative rate (FNR), and false positive rate (FPR). Using a simulation study, we compare the screening performance of the ARTS with the Benjamini–Hochberg procedure (BH, Benjamini and Hochberg (1995)) and the Holm–Bonferroni procedure (HB, Holm (1979)). Relevant results are presented in Section S5 of the Supplementary Material.

The power (as given by the average values of (1-FNR)) is slightly less for the ARTS than for the BH, which is expected because the acceptance region is constructed from the critical values of the maximum correlation statistic $\hat{\theta}_n$, leading to results that are more conservative. We expect, however, that the forward-stepwise ARTS will outperform the ARTS screening procedure because it re-calibrates at each step. In terms of the FDR and FPR, the performance of the ARTS and BH are comparable, although that of the Bonferroni method is more conservative as expected. The HB and Bonferroni methods show similar performance with respect to all the measures.
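For a single simulated data set, the three screening measures can be computed from the selected and truly active index sets. The sketch below assumes the usual per-replication definitions (false discoveries over selections, missed actives over actives, and false selections over inactive predictors), chosen so that 1 − FNR is the power as described above; the exact definitions used in the Supplementary Material may differ.

```python
def screening_errors(selected, active, p):
    """Per-replication false discovery, false negative, and false positive proportions."""
    selected, active = set(selected), set(active)
    false_pos = len(selected - active)   # selected but inactive
    false_neg = len(active - selected)   # active but missed
    n_inactive = p - len(active)
    fdr = false_pos / len(selected) if selected else 0.0
    fnr = false_neg / len(active) if active else 0.0
    fpr = false_pos / n_inactive if n_inactive else 0.0
    return fdr, fnr, fpr

# Hypothetical outcome: 10 active predictors out of p = 50, three selections.
fdr, fnr, fpr = screening_errors(selected=[0, 1, 12], active=range(10), p=50)
```

Averaging these proportions over Monte Carlo replications gives the reported rates.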

6.3. Asymptotic power evaluation

In this section, we conduct a simulation study to evaluate the asymptotic FWER and the power of the ARTS, as compared with those of the BONF-AFT. We assess the asymptotic FWER and power based on the limiting distribution shown in Theorem 1. This approach can be a computationally efficient alternative to the simulation method used in our finite-sample studies, because it avoids the required double-bootstrap (for threshold selection) that incurs a heavy computation when implementing the ARTS.

Owing to the complicated limiting distribution shown in Theorem 1, this approach is only feasible when $\varphi_j(L)$ is reasonably negligible for all j. One possible situation is when β0 = 0 and the error term ε is independent of U. This restriction on ε facilitates the evaluation of the asymptotic FWER at the null (β0 = 0, b0 = 0) and the asymptotic power at local alternatives (β0 = 0, b0 ≠ 0). This offers a saving in terms of computational costs, at the price of being sensitive to model misspecification.

Consider a local model

$$T^{(n)} = (n^{-1/2} b_0)\,U_1 + \varepsilon, \qquad (16)$$

where U1 is the first element of U. The predictors U, the error term ε, and the censoring time C are generated as in Section 6.1. We allow b0 to vary over a grid in [0, 5] by increments of 0.5. Under this local model, the complex limiting distribution reduces to a simpler form:

$$\sqrt{n}(\hat{\theta}_n - \theta_n) \to_d \frac{M_J + b_0\,\mathrm{Cov}(U_J, U_1)}{\mathrm{Var}(U_J)} - b_0, \qquad (17)$$

where $J = \arg\max_j \{M_j + b_0\,\mathrm{Cov}(U_j, U_1)\}^2/\mathrm{Var}(U_j)$, and $M = \{M_j, j = 1, \dots, p\}$ is a mean-zero normal random vector with a covariance matrix given by that of the random vector $\{\tilde{\varepsilon}(U_j - EU_j), j = 1, \dots, p\}$. This evaluation procedure is implemented as follows.

  1. For each value of b0 on the grid, generate a large sample (with n = 10,000) from the local model (16) and compute the corresponding $Y^{(n)}$. Using a fixed threshold λn, use the ARTS to develop the acceptance region $[c_l, c_u]$ based on this sample.

  2. For each given b0, take 10,000 draws from the limiting distribution in (17), and then obtain 10,000 realizations of $\sqrt{n}\hat{\theta}_n$.

  3. The asymptotic rejection rate of the ARTS (for the given b0) is assessed by computing the proportion of the realizations that fall outside $[c_l, c_u]$ from the 10,000 realizations of $\sqrt{n}\hat{\theta}_n$.

To reflect the random variation of the asymptotic FWER and the power over the samples generated in Step 1, we independently implement the above procedure 20 times and display the corresponding asymptotic rejection rates in a box plot for each b0. For comparison, we also plot the asymptotic power of the BONF-AFT, which is approximated by the rejection rate from 1000 samples, each of size n = 10,000.
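Steps 2 and 3 can be sketched as below. The code takes the acceptance region from Step 1 as given, and it assumes that the covariance of M equals a constant (`noise_sd` squared) times the covariance matrix of U, which holds when the transformed error is independent of U; `noise_sd`, the function names, and the interval used in the example are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def asymptotic_rejection_rate(b0, sigma_u, noise_sd, c_l, c_u, n_draws=10000, seed=0):
    """Share of limiting realizations of sqrt(n) * theta_hat outside [c_l, c_u]."""
    rng = random.Random(seed)
    p = len(sigma_u)
    low = cholesky(sigma_u)
    outside = 0
    for _ in range(n_draws):
        z = [rng.gauss(0, 1) for _ in range(p)]
        m = [noise_sd * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        # Shifted numerators M_j + b0 * Cov(U_j, U_1); U_1 has index 0 here.
        shifted = [m[j] + b0 * sigma_u[j][0] for j in range(p)]
        j_star = max(range(p), key=lambda j: shifted[j] ** 2 / sigma_u[j][j])
        stat = shifted[j_star] / sigma_u[j_star][j_star]
        if stat < c_l or stat > c_u:
            outside += 1
    return outside / n_draws

# Exchangeable correlation 0.5 with p = 5 and a hypothetical region [-3, 3].
sigma = [[1.0 if i == j else 0.5 for j in range(5)] for i in range(5)]
rate_null = asymptotic_rejection_rate(0.0, sigma, 1.0, -3.0, 3.0, n_draws=2000)
rate_alt = asymptotic_rejection_rate(5.0, sigma, 1.0, -3.0, 3.0, n_draws=2000)
```

Repeating the calculation with regions obtained from independent Step 1 samples would give the spread summarized by the box plots.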

To make the above evaluation practical for large p, say p = 1000, the threshold λn is fixed at 0, 4.3, 6.1, and 7.4, corresponding to the constant a = 0, 2, 4, and 6, respectively. We present the results under light censoring (Figure 3), moderate censoring (Figure 4), and heavy censoring (Figure 5). Because the plots for a = 0 and a = 1 are similar, and there is no obvious difference when a ≥ 6, we only present the results for a = 0, 2, 4, 6, for conciseness. From these figures, we observe that smaller values of a make the ARTS more anti-conservative, as observed in previous numerical studies. When a = 0, in particular, the ARTS reduces to the CPB-AFT. On the other hand, the ARTS behaves more stably and provides more accurate control of the Type-I error rates as a increases. In addition, the variation within each box plot decreases when the value of a increases.
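As a quick arithmetic check, the four threshold values quoted above are consistent with the form λn = sqrt(a log n) evaluated at n = 10,000. That functional form is an assumption made here for illustration, because the selection rule of Section 2.4 is not restated in this section.

```python
import math

n = 10_000
# Assumed form: lambda_n = sqrt(a * log(n)), rounded to one decimal place.
thresholds = {a: round(math.sqrt(a * math.log(n)), 1) for a in (0, 2, 4, 6)}
```

The resulting values can be compared directly with 0, 4.3, 6.1, and 7.4.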

Figure 3:

Asymptotic Type-I error and power of ARTS compared with BONF-AFT for p = 1000 under light censoring, where ARTS is implemented with a fixed threshold λn specified by a = {0, 2, 4, 6}, and each box plot is based on 20 independent replications with n = 10,000.

Figure 4:

Asymptotic Type-I error and power, as in Figure 3, except under moderate censoring.

Figure 5:

Asymptotic Type-I error and power, as in Figure 3, except under heavy censoring.

Comparing the asymptotic power of the BONF-AFT (denoted by the circle) with the median of each box plot, we find that the ARTS has more satisfactory performance than that of the BONF-AFT in most cases. In terms of median power, the ARTS even provides an extra 20% power in some situations (e.g., at b0 = 3, when a = 4 or a = 6 for all types of censoring). To control the asymptotic FWER, a reasonable choice is a = 4 under light or moderate censoring, because the median FWER starts to touch the nominal level and the corresponding variation within the box plot diminishes. On the other hand, the selection of a should fall between 2 and 4 under heavy censoring, because the median FWER remains higher than 5% when a = 2, but drops below 5% at a = 4.

6.4. Error dependent on predictors

In this section, we examine how well the ARTS controls the FWER when the error term ε is uncorrelated with, but dependent on, the predictors U. For simplicity, U follows a p-dimensional normal distribution with mean zero and an identity covariance matrix, implying that the predictors are independent of each other. The FWERs of other AFT-model-relevant methods are also provided; here we omit the anti-conservative results of the CPB-AFT for conciseness, focusing instead on the CEND, which requires independence between ε and U.

To produce a dependent error structure on the predictors, we generate the error term ε by random replications from a normal distribution with mean zero and a standard deviation of 0.7(|U1| + 0.7). Then, we simulate the transformed time-to-event outcome under the null model T = ε. Though not independent, ε remains uncorrelated with U by Cov(ε, U1) = E[εU1] = E{U1E[ε|U1]} = 0, and Cov(ε, Uj) = E{UjE[ε|U1]} = 0 for j ≠ 1. The censoring time C still follows an exponential distribution, with varying rate parameters specified for different censoring rates. Figure 6 shows that only the ARTS controls the FWER around the nominal level in the case of dependent errors, except for giving slightly conservative FWERs for p ≥ 50, heavy censoring, and n = 100.
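The distinction between uncorrelated and independent errors can be checked numerically. The sketch below generates ε with the standard deviation 0.7(|U1| + 0.7) described above and compares two sample correlations: the one between ε and U1, which should be near zero, and the one between ε² and |U1|, which should be clearly positive. The helper names are hypothetical.

```python
import math
import random

def simulate_dependent_errors(n, seed=0):
    """Draw (U1, eps) with eps | U1 ~ N(0, [0.7 * (|U1| + 0.7)]^2)."""
    rng = random.Random(seed)
    u1 = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 0.7 * (abs(u) + 0.7)) for u in u1]
    return u1, eps

def sample_corr(x, y):
    """Pearson sample correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

u1, eps = simulate_dependent_errors(20000)
corr_linear = sample_corr(eps, u1)                                      # near zero
corr_scale = sample_corr([e * e for e in eps], [abs(u) for u in u1])    # positive
```

The second correlation reflects the heteroscedasticity that violates the independence assumption required by the CEND.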

Figure 6:

Empirical rejection rates based on 1000 samples generated from the null model with dependent errors under various p, sample sizes, and censoring rates.

7. Applications to real data

7.1. DLBCL data

We revisit the DLBCL data introduced earlier (Rosenwald et al. (2002)). This data set contains the after-chemotherapy survival time from DLBCL diseases, the categorical IPI variable (with three levels: low, medium, and high), and 7,399 genetic features of 222 patients with complete information on genetic predictors. The censoring rate is 43%. More details about the DLBCL data can be found in the literature (cf., Bøvelstad et al. (2009), Binder et al. (2011)). To adjust for the prognostic information provided by IPI, we apply the adjusted ARTS to this data set to detect the presence of significant genetic features. To maintain the stability of the KSV estimator, the observed event times are restricted up to τ = 2.36, which corresponds to the 90% empirical percentile of the observed event times. This excludes one observation that has an estimated synthetic response of 55.867 and severely distorts the estimation of the marginal regression coefficients. For the ARTS, we use the double-bootstrap to select the constant a from 0 to 15, by increments of 0.5. Before implementing the ARTS, we perform a pre-processing step to filter out genes that lack significant differentiation between the censored group (patients still alive at the end of the follow-up) and the uncensored group (patients who died of DLBCL diseases within the follow-up). For each gene, a standard two-sample t-test is conducted to determine whether the gene-expression measurement differentiates between these two groups. By comparing the corresponding p-values with the nominal level of 5%, this pre-processing step reduces the number of screening genetic features to 1026 (p = 1026).

To give a fair comparison with the ARTS, we also apply the following AFT-model-relevant competing methods: BONF-AFT and CPB-AFT, with IPI information adjusted. The CEND method is not included, because it is challenging to verify the required assumption of independence between the error and the predictors. In addition, the HC method is not considered because it is designed for nearly uncorrelated predictors, which is unrealistic in gene-expression data. The three implemented approaches yield similar p-values. The minimal Bonferroni corrected p-value from the BONF-AFT is 4.39%. The ARTS procedure reduces to a special case with λn = 0 and gives the same p-value of 3.40% as that of the CPB-AFT, from 1000 bootstrap samples. Figure 7 shows the sampling distribution of the test statistics used by the ARTS and CPB-AFT based on these bootstrap samples, as well as how the corresponding p-values are obtained. Given the nominal level of 5%, these three approaches all indicate one significant gene for the survival time of patients. The ID of the detected gene is “27766,” which belongs to the group of major histocompatibility class (MHC) II signatures. This finding supports the notion that a loss of MHC II expression correlates with a worse survival outcome, and corresponds to the results provided by Miller et al. (1988), Rosenwald et al. (2002), Rimsza et al. (2004), Roberts et al. (2006), and Higashi et al. (2016), among others.

Figure 7:

DLBCL example. Left panel: histogram of $B_n^*$, giving the two-sided ARTS p-value 3.40%. Right panel: histogram of $\sqrt{n}(\hat{\theta}_n^* - \hat{\theta}_n)$, giving the two-sided CPB-AFT p-value 3.40%.

7.2. Primary biliary cirrhosis data

In this example, we demonstrate how to apply the forward-stepwise ARTS to successively identify interaction effects, provided that the main effects of some covariates have been shown statistically or clinically significant. We use data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984 (Fleming and Harrington (1991), Appendix D.1). A total of 312 PBC patients participated in the randomized placebo controlled trial of the drug D-penicillamine; in our data analysis, we restrict our attention to the 276 patients for whom we have complete covariate information. The censoring rate is 60%.

The survival outcome is the time from registration to death. Over the follow-up, there is no significant treatment effect (Fleming and Harrington (1991)). Only five of the 16 risk factors were found to be statistically significant under the setting of the Cox model (Dickson et al. (1989)) or under the AFT model (Jin et al. (2003)). Furthermore, they were identified as a subset of the active predictors under the general Cox model (Bunea and McKeague (2005)). These significant risk factors are age (in years), presence of edema (0 = no; 0.5 = resolved; 1 = unresolved with therapy), serum bilirubin (in mg/dl), albumin (in gm/dl), and protime (standardized blood clotting time, in seconds). Of these risk factors, serum bilirubin, albumin, and protime are log-transformed. We successively locate significant pairwise interaction terms of 17 variables, adjusting for the five aforementioned risk factors. These 17 variables include the treatment indicator and 16 clinical risk factors for the survival time ($p = \binom{17}{2} = 136$).

Figure 8 displays the pattern of p-values for the newly entered interaction term at each step. The forward-stepwise ARTS procedure detects one significant interaction term, where the constant a and the end of the follow-up τ are selected as in Section 7.1. This detected interaction is between platelet (platelets per cubic ml/1000) and alk.phos (alkaline phosphatase, in U/liter). For comparison, we also present the successive p-values given by the CPB-AFT. The conclusion remains the same, but the p-values of the CPB-AFT are smaller, as expected.

Figure 8:

PBC example. The patterns of p-values for forward-stepwise ARTS and CPB-AFT.

To examine the effect of taking covariate-dependent censoring into account when applying the ARTS in this example, we run the forward-stepwise ARTS as before, except we replace $\hat{G}_n$ by a Cox-model-based estimate, conditional on selected covariates (alkaline phosphatase and log-transformed protime). In contrast to our earlier finding of one significant interaction term, here we find none (results not shown). The CPB-AFT procedure (with the same Cox model estimate of G0) leads to the same conclusion.

8. Discussion

We have developed an adaptive resampling test for survival data (ARTS) to detect the presence of significant predictors for right-censored survival outcomes. We use marginal correlation screening to reduce the high-dimensional detection problem to a single test of whether θ0 = 0, where θ0 is the marginal regression coefficient of the most correlated predictor with the survival outcome. In the setting of marginal screening for survival data, few studies have examined the problem of post-selection inference. The problem is challenging, not only because of the nonregular asymptotic behavior of the test statistic at the null (i.e., θ0 = 0), but also because of the presence of censoring. Within this framework, the ARTS is designed to adapt to the nonregularity, while dealing with the increased dispersion introduced by the censoring. The advantage of the ARTS is that it provides a post-selection-corrected p-value without sacrificing power, while avoiding distributional assumptions, specific correlation structures between predictors, and a preconceived choice of the regression parameters of interest. The ARTS procedure is also versatile for practical use. Various extensions of the ARTS are proposed to adjust for additional baseline covariates of clinicians’ interests and to successively identify further active predictors.

We recognize that the ARTS requires an independent-censoring assumption that may be violated in some clinical contexts. One direction for future work is to develop rigorous theoretical results for the ARTS under the assumption of conditionally independent censoring, given the predictors. To address this type of censoring mechanism, we can use the Cox model or the local Kaplan–Meier estimator to incorporate covariates into the estimation of the conditional survival function of the censoring on the predictors G0(·|U). The generalization of the censoring mechanism could still be challenging in our framework, even with some of the proposals for estimating G0(·|U) listed above. One challenge is how to determine the covariates to be included in the estimation of G0(·|U) under the high-dimensional AFT model. Then, we need to find out whether the post-selection inference results would be affected, because these included covariates may not be completely contained under a series of working AFT models using one predictor per time. To the best of our knowledge, this question has not been fully answered in the area of marginal screening based on survival data, and is worth further attention.

Although our simulation results show that the ARTS performs well when p > n, we have provided theoretical support only under the assumption of a fixed p. Formal testing procedures that can adjust to the nonregular behavior of $\hat{\theta}_n$ under diverging p appear to be challenging. A potential alternative approach that might be able to handle a diverging p would be to extend the efficient influence function technique of Luedtke and van der Laan (2018) to the right-censored setting in terms of a regularized version of the KSV estimator.

Acknowledgments

This research was partially supported by NIH Grants R01GM095722 and R21MH108999, and NSF Grant DMS-1307838. The authors thank the associate editor and reviewers for their helpful comments.

Footnotes

Supplementary Material

The online Supplementary Material includes detailed proofs of the theorems, as well as additional simulation results.

References

  1. Andersen PK and Gill RD (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10, 1100–1120.
  2. Antoniadis A, Fryzlewicz P, and Letué F (2010). The Dantzig selector in Cox’s proportional hazards model. Scandinavian Journal of Statistics, 37, 531–552.
  3. Bender R, Augustin T, and Blettner M (2005). Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24, 1713–1723.
  4. Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 289–300. [Google Scholar]
  5. Binder H, Porzelius C, and Schumacher M (2011). An overview of techniques for linking high-dimensional molecular data to time-to-event endpoints by risk prediction models. Biometrical Journal, 53, 170–189. [DOI] [PubMed] [Google Scholar]
  6. Bøvelstad HM, Nygård S, and Borgan Ø (2009). Survival prediction from clinic-genomic models–A comparative study. BMC bioinformatics, 10, Article 413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bradic J, Fan J, and Jiang J (2011). Regularization for Cox’s proportional hazards model with NP-dimensionality. Annals of Statistics, 39, 3092–3120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Buckley J and James I (1979). Linear regression with censored data. Biometrika, 66, 429–436. [Google Scholar]
  9. Bunea F and McKeague IW (2005). Covariate selection for semiparametric hazard function regression models. Journal of Multivariate Analysis, 92, 186–204. [Google Scholar]
  10. Cai T, Huang J, and Tian L (2009). Regularized estimation for the accelerated failure time model. Biometrics, 65, 394–404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cox D (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34, 187–202. [Google Scholar]
  12. Cox DR (1975). Partial likelihood. Biometrika, 62, 269–276. [Google Scholar]
  13. Dabrowska D (1989). Uniform consistency of the kernel conditional Kaplan–Meier estimate. The Annals of Statistics, 17, 1157–1167. [Google Scholar]
  14. Datta S, Le-Rademacher J, and Datta S (2007). Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics, 63, 259–271. [DOI] [PubMed] [Google Scholar]
  15. Dickson ER, Grambsch PM, Fleming TR, Fisher LD, and Langworthy A (1989). Prognosis in primary biliary cirrhosis: model for decision making. Hepatology, 10, 1–7. [DOI] [PubMed] [Google Scholar]
  16. Donoho D and Jin J (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics, 32, 962–994.
  17. Donoho D and Jin J (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statistical Science, 30, 1–25.
  18. Efron B and Tibshirani RJ (1993). An Introduction to the Bootstrap (Monographs on Statistics & Applied Probability). Chapman and Hall/CRC.
  19. Engler D and Li Y (2009). Survival analysis with high-dimensional covariates: An application in microarray studies. Statistical Applications in Genetics and Molecular Biology, 8, Article 14.
  20. Fan J, Feng Y, and Wu Y (2010). High-dimensional variable selection for Cox’s proportional hazards model. Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown. Institute of Mathematical Statistics; Beachwood OH, 6, 70–86.
  21. Fan J and Li R (2002). Variable selection for Cox’s proportional hazards model and frailty model. Annals of Statistics, 30, 74–99.
  22. Fang EX, Ning Y, and Liu H (2017). Testing and confidence intervals for high dimensional proportional hazards models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1415–1437.
  23. Fleming TR and Harrington DP (1991). Counting Processes and Survival Analysis. John Wiley & Sons, Inc.
  24. Goeman JJ, Oosting J, Cleton-Jansen AM, Anninga JK, and van Houwelingen HC (2005). Testing association of a pathway with survival using gene expression data. Bioinformatics, 21, 1950–1957.
  25. Gorst-Rasmussen A and Scheike T (2013). Independent screening for single-index hazard rate models with ultra-high dimensional features. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75, 217–245.
  26. He X, Wang L, and Hong HG (2013). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Annals of Statistics, 41, 342–369.
  27. Higashi M, Tokuhira M, Fujino S, Yamashita T, Abe K, Arai E, Kizaki M, and Tamaru J-I (2016). Loss of HLA-DR expression is related to tumor microenvironment and predicts adverse outcome in diffuse large B-cell lymphoma. Leukemia & Lymphoma, 57, 161–166.
  28. Holm S (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
  29. Hong HG, Chen X, Christiani DC, and Li Y (2018a). Integrated powered density: screening ultra-high dimensional covariates with survival outcomes. Biometrics, 74, 421–429.
  30. Hong HG, Kang J, and Li Y (2018b). Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Analysis, 24, 45–71.
  31. Huang J and Ma S (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16, 176–195.
  32. Huang J, Ma S, and Xie H (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics, 62, 813–820.
  33. Jin Z, Lin DY, Wei LJ, and Ying Z (2003). Rank-based inference for the accelerated failure time model. Biometrika, 90, 341–353.
  34. Johnson BA (2008). Variable selection in semiparametric linear regression with censored data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 351–370.
  35. Johnson BA, Lin DY, and Zeng D (2008). Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103, 672–680.
  36. Kalbfleisch JD and Prentice RL (2002). The Statistical Analysis of Failure Time Data. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, USA.
  37. Keiding N, Andersen PK, and Klein JP (1997). The role of frailty models and accelerated failure time models in describing heterogeneity due to omitted covariates. Statistics in Medicine, 16, 215–224.
  38. Koul H, Susarla V, and Van Ryzin J (1981). Regression analysis with randomly right-censored data. The Annals of Statistics, 9, 1276–1288.
  39. Lai TL and Ying Z (1991a). Large sample theory of a modified Buckley-James estimator for regression analysis with censored data. The Annals of Statistics, 19, 1370–1402.
  40. Lai TL and Ying Z (1991b). Rank regression methods for left-truncated and right-censored data. The Annals of Statistics, 19, 531–556.
  41. Le Cessie S and van Houwelingen HC (1995). Testing the fit of a regression model via score tests in random effects models. Biometrics, 51, 600–614.
  42. Li J, Zheng Q, Peng L, and Huang Z (2016). Survival impact index and ultra-high dimensional model-free screening with survival outcomes. Biometrics, 72, 1145–1154.
  43. Li Y, Dicker L, and Zhao SD (2014). The Dantzig selector for censored linear regression models. Statistica Sinica, 24, 251–268.
  44. Lin DY and Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61–71.
  45. Lockhart R, Taylor J, Tibshirani RJ, and Tibshirani R (2014). A significance test for the lasso. Annals of Statistics, 42, 413–468.
  46. Luedtke AR and van der Laan MJ (2018). Parametric-rate inference for one-sided differentiable parameters. Journal of the American Statistical Association, 113, 780–788.
  47. Ma S and Du P (2012). Variable selection in partly linear regression model with diverging dimensions for right censored data. Statistica Sinica, 22, 1003–1020.
  48. McKeague IW and Qian M (2015). An adaptive resampling test for detecting the presence of significant predictors (with discussion). Journal of the American Statistical Association, 110, 1422–1433.
  49. McKeague IW and Sasieni PD (1994). A partly parametric additive risk model. Biometrika, 81, 501–514.
  50. Medeiros FM, da Silva-Júnior AH, Valença DM, and Ferrari SL (2014). Testing inference in accelerated failure time models. International Journal of Statistics and Probability, 3, 121–131.
  51. Miller TP, Lippman SM, Spier CM, Slymen DJ, and Grogan TM (1988). HLA-DR (Ia) immune phenotype predicts outcome for patients with diffuse large cell lymphoma. The Journal of Clinical Investigation, 82, 370–372.
  52. Pollard D (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 2. Institute of Mathematical Statistics.
  53. Rimsza LM, Roberts RA, Miller TP, Unger JM, LeBlanc M, Braziel RM, Weisenburger DD, Chan WC, Muller-Hermelink HK, Jaffe ES, Gascoyne RD, Campo E, Fuchs DA, Spier CM, Fisher RI, Delabie J, Rosenwald A, Staudt LM, and Grogan TM (2004). Loss of MHC class II gene and protein expression in diffuse large B-cell lymphoma is related to decreased tumor immunosurveillance and poor patient survival regardless of other prognostic factors: a follow-up study from the leukemia and lymphoma molecular profiling project. Blood, 103, 4251–4258.
  54. Ritov Y (1990). Estimation in linear regression with censored data. Annals of Statistics, 18, 303–328.
  55. Roberts RA, Wright G, Rosenwald AR, Jaramillo MA, Grogan TM, Miller TP, Frutiger Y, Chan WC, Gascoyne RD, Ott G, Muller-Hermelink HK, Staudt LM, and Rimsza LM (2006). Loss of major histocompatibility class II gene and protein expression in primary mediastinal large B-cell lymphoma is highly coordinated and related to poor patient survival. Blood, 108, 311–318.
  56. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Giltnane JM, Hurt EM, Zhao H, Averett L, Yang L, Wilson WH, Jaffe ES, Simon R, Klausner RD, Powell J, Duffey PL, Longo DL, Greiner TC, Weisenburger DD, Sanger WG, Dave BJ, Lynch JC, Vose J, Armitage JO, Montserrat E, López-Guillermo A, Grogan TM, Miller TP, LeBlanc M, Ott G, Kvaloy S, Delabie J, Holte H, Krajci P, Stokke T, and Staudt LM (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine, 346, 1937–1947.
  57. Sinnott JA and Cai T (2016). Inference for survival prediction under the regularized Cox model. Biostatistics, 17, 692–707.
  58. Song R, Lu W, Ma S, and Jessie Jeng X (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101, 799–814.
  59. Srinivasan C and Zhou M (1994). Linear regression with censoring. Journal of Multivariate Analysis, 49, 179–201.
  60. Stute W and Wang J-L (1993). The strong law under random censorship. The Annals of Statistics, 21, 1591–1607.
  61. Taylor J and Tibshirani R (2018). Post-selection inference for ℓ1-penalized likelihood models. Canadian Journal of Statistics, 46, 41–61.
  62. The International Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993). A predictive model for aggressive non-Hodgkin’s lymphoma. New England Journal of Medicine, 329, 987–994.
  63. Tibshirani R (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
  64. Tsiatis AA (1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics, 18, 354–372.
  65. van der Vaart AW (2000). Asymptotic Statistics. Cambridge University Press.
  66. Wu Y (2012). Elastic net for Cox’s proportional hazards model with a solution path algorithm. Statistica Sinica, 22, 271–294.
  67. Ying Z (1993). A large sample study of rank estimation for censored regression data. The Annals of Statistics, 21, 76–99.
  68. Zhang HH and Lu W (2007). Adaptive Lasso for Cox’s proportional hazards model. Biometrika, 94, 691–703.
  69. Zhao SD and Li Y (2012). Principled sure independence screening for Cox models with ultra-high dimensional covariates. Journal of Multivariate Analysis, 105, 397–411.
  70. Zhao SD and Li Y (2014). Score test variable screening. Biometrics, 70, 862–871.
  71. Zhong P-S, Hu T, and Li J (2015). Tests for coefficients in high-dimensional additive hazard models. Scandinavian Journal of Statistics, 42, 649–664.
  72. Zhou M (1992). Asymptotic normality of the ‘synthetic data’ regression estimator for censored survival data. The Annals of Statistics, 20, 1002–1021.
