Variable selection in the accelerated failure time model via the bridge method

Jian Huang; Shuangge Ma

doi:10.1007/s10985-009-9144-2

Author manuscript; available in PMC: 2010 Nov 20.

Published in final edited form as: Lifetime Data Anal. 2009 Dec 16;16(2):176–195. doi: 10.1007/s10985-009-9144-2

PMCID: PMC2989175 NIHMSID: NIHMS251151 PMID: 20013308

Abstract

In high throughput genomic studies, an important goal is to identify a small number of genomic markers that are associated with development and progression of diseases. A representative example is microarray prognostic studies, where the goal is to identify genes whose expressions are associated with disease free or overall survival. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be directly applied. In addition, among the thousands of genes surveyed, only a subset are disease-associated. Gene selection is needed along with estimation. In this article, we model the relationship between gene expressions and survival using accelerated failure time (AFT) models. We use the bridge penalization for regularized estimation and gene selection. An efficient iterative computational algorithm is proposed. Tuning parameters are selected using V-fold cross validation. We use a resampling method to evaluate the prediction performance of the bridge estimator and the relative stability of identified genes. We show that the proposed bridge estimator is selection consistent under appropriate conditions. Analysis of two lymphoma prognostic studies suggests that the bridge estimator can identify a small number of genes and can have better prediction performance than the Lasso.

Keywords: Bridge penalization, Censored data, High dimensional data, Selection consistency, Stability, Sparse model

1 Introduction

High throughput technologies make it possible to identify genomic markers that are associated with disease development and progression. Gene profiling studies have been extensively conducted using microarrays. Identification of genomic markers from analysis of microarray data may lead to a better understanding of the genomic mechanism beneath disease development and assist future clinical diagnosis and prognosis. Among many disease outcomes measured in microarray studies, censored disease-free or overall survival has attracted much attention. See Alizadeh et al. (2000), Rosenwald et al. (2003), and Dave et al. (2004) for representative examples. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be directly used. In addition, among the thousands of genes surveyed, only a subset may be associated with disease. Thus, gene selection is needed along with survival model construction.

When analyzing censored survival data with microarray gene expression measurements, the Cox proportional hazards model and the additive risk model have been adopted (Gui and Li 2005; Ma and Huang 2007). An alternative to those models is the accelerated failure time (AFT) model. Unlike the Cox and additive models, the AFT model is a linear regression model, in which the logarithm (or in general a known monotone transformation) of the failure time is directly regressed on gene expressions (Kalbfleisch and Prentice 1980). Compared with the Cox and additive models, the AFT model has an intuitive linear regression interpretation (Wei 1992). In this article, we apply the method of Zhou (1992) and Stute (1993), which uses the Kaplan–Meier weights to account for censoring and has a weighted least squares loss function. The simple form of the loss function makes this estimation approach especially suitable for high dimensional data.

To tackle the high dimensionality problem of gene expression data, various dimension reduction or variable selection techniques have been employed. Previously used dimension reduction techniques include principal component analysis, singular value decomposition, partial least squares, and others. Among the many variable selection techniques developed, penalized selection has attracted extensive attention. Penalization methods put penalties on the regression coefficients. By properly balancing goodness of fit and model complexity, penalization approaches can lead to parsimonious models with reasonable fit.

The most famous example of penalization methods is the Lasso (Tibshirani 1996), which has been used in gene expression analysis with survival data (Gui and Li 2005; Ma and Huang 2007; Wang et al. 2008). However, it has been shown that the Lasso is in general not variable selection consistent (Leng et al. 2006). Various penalization methods that can have consistent selection have been proposed. Examples include the adaptive Lasso and the SCAD. Another penalty that also enjoys consistent selection is the bridge penalty. The bridge penalty has been investigated under the conventional setup, i.e., when the number of observations is much larger than the number of covariates. See for example Fu (1998). Huang et al. (2008a) show that the bridge penalty can have the oracle estimation and selection properties in linear regression models with a divergent number of covariates.

In this article, we consider genomic studies where gene expressions are measured along with censored disease survival. The bridge penalization approach is used for regularized estimation and gene selection. The rest of the article is organized as follows. The AFT model and bridge estimation are introduced in Sect. 2. An efficient computational algorithm is proposed in Sect. 3. Resampling based methods are proposed to evaluate prediction performance and relative stability of selected genes in Sect. 4. Asymptotic selection consistency is established in Sect. 5. Analyses of two lymphoma studies are provided to illustrate the proposed method in Sect. 6. The article concludes with discussions in Sect. 7. Proofs are given in the Appendix.

2 Bridge estimation in the AFT model

Let T_i be the logarithm of the failure time and X_i be the p-dimensional vector of gene expressions for the ith subject in a random sample of size n. The AFT model assumes

T_{i} = α + X_{i}^{'} β + ε_{i}, i = 1, \dots, n,

(1)

where α is the intercept, β ∈ ℝ^p is the vector of regression coefficients, and ε_i is the error term. When T_i is subject to right censoring, we observe (Y_i, δ_i, X_i), where Y_i = min{T_i, C_i}, C_i is the logarithm of the censoring time, and δ_i = 1{T_i ≤ C_i} is the censoring indicator.

Estimation in the AFT model with an unspecified error distribution has been studied extensively. The following two approaches have received special attention. The first is the Buckley-James estimator, which adjusts censored observations using the Kaplan–Meier estimator (Buckley and James 1979); the second is the rank based estimator motivated by the score function of the partial likelihood function (Ying 1993). Although both perform well when there are a small number of covariates, both have high computational cost with high dimensional gene expression data.

A computationally more feasible approach is the weighted least squares (LS) approach (Zhou 1992; Stute 1993). Let F̂_n be the Kaplan–Meier estimator of F, the distribution function of T. It can be computed as $\hat{F}_{n} (y) = \sum_{i = 1}^{n} w_{i} 1\{Y_{(i)} \leq y\}$. Here the w_i's are the jumps in the Kaplan–Meier estimator, computed as $w_{1} = \frac{δ_{(1)}}{n}$ and $w_{i} = \frac{δ_{(i)}}{n - i + 1} \prod_{j = 1}^{i - 1} \left(\frac{n - j}{n - j + 1}\right)^{δ_{(j)}}$, i = 2, …, n. The w_i's have also been referred to as the Kaplan–Meier weights in Stute (1993). Here Y₍₁₎ ≤ ⋯ ≤ Y_(n) are the order statistics of the Y_i's, δ₍₁₎, …, δ_(n) are the associated censoring indicators, and X₍₁₎, …, X_(n) are the associated covariates. The weighted LS loss function is

\frac{1}{2} \sum_{i = 1}^{n} w_{i} {(Y_{(i)} - α - X_{(i)}^{'} β)}^{2} .
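The Kaplan–Meier weights can be computed directly from the product formula above. The following is a minimal sketch, assuming NumPy; the function name `km_weights` and the tie-breaking rule (events sorted before censored observations at equal times) are illustrative choices that the text does not specify.

```python
import numpy as np

def km_weights(y, delta):
    """Return the sort order and Kaplan-Meier jumps w_1, ..., w_n.

    y     : observed (log) times Y_i = min(T_i, C_i)
    delta : 1 if the event was observed, 0 if censored
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(y)
    # Sort by time; at ties put events first (an assumed convention).
    order = np.lexsort((1 - delta, y))
    d = delta[order]
    i = np.arange(1, n + 1)
    # factor_j = ((n - j) / (n - j + 1)) ** delta_(j)
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product over j < i (empty product = 1 for i = 1).
    prev_prod = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w = d / (n - i + 1.0) * prev_prod
    return order, w
```

Without censoring every weight equals 1/n, so the loss reduces to ordinary least squares; censored observations receive weight zero and their mass is passed on to later event times.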

We center X_(i) and Y_(i) with their w_i-weighted means, respectively. Let $\bar{X}_{w} = \sum_{i = 1}^{n} w_{i} X_{(i)} / \sum_{i = 1}^{n} w_{i}$ and $\bar{Y}_{w} = \sum_{i = 1}^{n} w_{i} Y_{(i)} / \sum_{i = 1}^{n} w_{i}$. Denote $X_{(i)}^{*} = (n w_{i})^{1 / 2} (X_{(i)} - \bar{X}_{w})$ and $Y_{(i)}^{*} = (n w_{i})^{1 / 2} (Y_{(i)} - \bar{Y}_{w})$. We can then rewrite the weighted LS loss function as

Q_{n} (β) = \frac{1}{2} \sum_{i = 1}^{n} {(Y_{(i)}^{*} - X_{(i)}^{*'} β)}^{2} .

(2)
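The weighted centering and rescaling can be sketched in a few lines. This is an illustrative sketch assuming NumPy, with `X`, `y`, and `w` already arranged in the order of the sorted observations; the function names are hypothetical.

```python
import numpy as np

def weighted_center_scale(X, y, w):
    """Center by w-weighted means and rescale rows by sqrt(n * w_i)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    x_bar = w @ X / w.sum()          # weighted covariate means
    y_bar = w @ y / w.sum()          # weighted response mean
    scale = np.sqrt(n * w)
    X_star = scale[:, None] * (X - x_bar)
    y_star = scale * (y - y_bar)
    return X_star, y_star

def Q_n(beta, X_star, y_star):
    """Unweighted least squares loss (2) on the rescaled data."""
    resid = y_star - X_star @ beta
    return 0.5 * resid @ resid
```

With this scaling, Q_n(β) equals n times the weighted LS loss evaluated at the intercept α = Ȳ_w − X̄_w′β that minimizes it, so both criteria have the same minimizer in β.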

The bridge penalized objective function is

L_{n} (β) = Q_{n} (β) + λ \sum_{j = 1}^{p} | β_{j} |^{γ},

(3)

where λ is a data dependent tuning parameter and γ > 0 is the bridge index. The value β̂_n that minimizes (3) is called the bridge estimator (Frank and Friedman 1993; Fu 1998).

The bridge estimator includes two important special cases. When γ = 2, it is the familiar ridge estimator, which does not have a “built-in” variable selection mechanism. When γ = 1, it is the Lasso estimator. In this article, we focus on the case with γ < 1.

3 Computation

3.1 Computational algorithm

Direct minimization of L_n(β) is difficult, since the bridge penalty is not convex. An approximation approach is proposed in Huang et al. (2008a). As an alternative, we consider the following approach, which is more efficient and does not need any approximation.

For 0 < γ < 1, define

S_{n} (β, θ) = Q_{n} (β) + \sum_{j = 1}^{p} θ_{j}^{1 - 1 / γ} | β_{j} | + τ_{n} \sum_{j = 1}^{p} θ_{j},

(4)

where τ_n is a penalty parameter.

Proposition 1 If $λ = τ_{n}^{1 - γ} γ^{- γ} {(1 - γ)}^{γ - 1}$ , then β̂_n minimizes L_n(β) if and only if (β̂_n, θ̂) minimizes S_n(β, θ) subject to θ̂_j ≥ 0 for j = 1, …, p.
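
The constant linking λ and τ_n can be checked directly. The following is a sketch of the one-dimensional calculation behind Proposition 1, not a full proof; it assumes β_j ≠ 0 and treats θ_j as a free positive variable.

```latex
% For fixed \beta_j \neq 0 and 0 < \gamma < 1, minimize over \theta_j > 0:
g(\theta_j) = \theta_j^{1 - 1/\gamma} |\beta_j| + \tau_n \theta_j .
% First-order condition (g is convex on \theta_j > 0 because 1 - 1/\gamma < 0):
g'(\theta_j) = \left(1 - \tfrac{1}{\gamma}\right) \theta_j^{-1/\gamma} |\beta_j| + \tau_n = 0
\;\Longrightarrow\;
\theta_j^{*} = \left(\frac{1 - \gamma}{\tau_n \gamma}\right)^{\gamma} |\beta_j|^{\gamma} .
% Substituting \theta_j^{*} back into g:
g(\theta_j^{*}) = \tau_n^{1 - \gamma} \gamma^{-\gamma} (1 - \gamma)^{\gamma - 1} |\beta_j|^{\gamma}
               = \lambda |\beta_j|^{\gamma} .
```

Profiling out θ therefore reproduces the bridge penalty in (3) with the stated value of λ.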

This proposition can be proved as in Huang et al. (2009), in which a similar result is shown for the group bridge estimator in linear regression without censoring. Based on Proposition 1, we propose the following iterative algorithm for computing the bridge estimate in the AFT models.

1. Compute an initial estimate β⁽⁰⁾. Specifically, we propose using the Lasso estimate, i.e., the minimizer of Eq. 3 with γ = 1.
2. For s = 1, 2, …, compute $θ_{j}^{(s)} = \left(\frac{1 - γ}{τ_{n} γ}\right)^{γ} | β_{j}^{(s - 1)} |^{γ}$, j = 1, …, p.
3. Compute $β^{(s)} = \arg\min_{β} \left\{ Q_{n} (β) + \sum_{j = 1}^{p} (θ_{j}^{(s)})^{1 - 1 / γ} | β_{j} | \right\}$.
4. Repeat Steps 2–3 until convergence.

The proposed algorithm always converges, since at each step the non-negative objective function (4) decreases. In our numerical studies, convergence is usually achieved within ten iterations. In Step 1, we choose the Lasso estimate as the initial value, with the penalty parameter in the Lasso criterion determined by V-fold cross validation. Theorem 1 in Sect. 5 establishes that the Lasso tends to select all the important genes plus a few false positives. Thus, using the Lasso as the starting value will not miss any important genes. The main computational cost comes from Step 3, which computes a weighted Lasso estimate and can be carried out with many existing algorithms, such as LARS (Efron et al. 2004) or boosting (Ma and Huang 2007). In this article, we adopt boosting, since its computational cost is relatively insensitive to the number of genes.
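
The iteration can be sketched as follows, assuming NumPy and data already rescaled to (X*, Y*). This is not the authors' implementation: the weighted Lasso in Step 3 is solved here by plain coordinate descent rather than boosting or LARS, the Lasso start reuses the same λ instead of a cross-validated one, and the function names are hypothetical.

```python
import numpy as np

def weighted_lasso_cd(X, y, pen, beta_start, n_sweeps=500, tol=1e-10):
    """Coordinate descent for 0.5*||y - X b||^2 + sum_j pen[j]*|b_j|.

    pen[j] = inf pins b_j at zero. Stand-in for boosting or LARS.
    """
    beta = np.array(beta_start, dtype=float)
    col_sq = (X ** 2).sum(axis=0)
    resid = y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - pen[j], 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def bridge_aft(X_star, y_star, lam, gamma=0.5, max_iter=50, tol=1e-8):
    """Iterative bridge estimate; returns (bridge, Lasso start)."""
    p = X_star.shape[1]
    # Invert Proposition 1 to recover tau_n from lambda.
    tau = (lam * gamma ** gamma * (1 - gamma) ** (1 - gamma)) ** (1 / (1 - gamma))
    # Step 1: Lasso start (assumption: same lam as the bridge penalty).
    beta = weighted_lasso_cd(X_star, y_star, np.full(p, lam), np.zeros(p))
    beta_lasso = beta.copy()
    for _ in range(max_iter):
        # Step 2: closed-form update of theta.
        theta = ((1 - gamma) / (tau * gamma)) ** gamma * np.abs(beta) ** gamma
        # Step 3: weights theta^(1 - 1/gamma); theta = 0 gives an
        # infinite weight, so coefficients at zero stay at zero.
        with np.errstate(divide="ignore"):
            pen = theta ** (1 - 1 / gamma)
        beta_new = weighted_lasso_cd(X_star, y_star, pen, beta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, beta_lasso
```

Because a zero coefficient receives an infinite weight in the next step, the genes kept by the bridge are always a subset of those in the starting Lasso fit.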

3.2 Tuning parameter selection

We use V-fold cross validation to determine the tuning parameter λ. For a pre-defined integer V, partition the data randomly into V non-overlapping subsets of equal size. For a given λ, we define $\mathrm{CV\ score} = \sum_{υ = 1}^{V} Q^{(υ)} (\hat{β}^{(- υ)})$, where β̂^(−υ) is the bridge estimator of β based on the data without the υth subset and Q^(υ) is the function defined in (2) evaluated on the υth subset. The optimal λ is defined as the minimizer of the CV score. In this article, we set V = 5.
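
The CV score can be sketched as below, assuming NumPy. The argument `fit` is a hypothetical callable `fit(X, y, lam)` returning a coefficient vector, and for simplicity the sketch splits already rescaled data rather than recomputing Kaplan–Meier weights within each training fold, which is a simplifying assumption.

```python
import numpy as np

def cv_score(X_star, y_star, lam, fit, V=5, seed=0):
    """Sum over folds of the loss (2) on the held-out subset."""
    n = len(y_star)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), V)
    score = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta = fit(X_star[train], y_star[train], lam)
        resid = y_star[held_out] - X_star[held_out] @ beta
        score += 0.5 * resid @ resid
    return score

def select_lambda(X_star, y_star, lam_grid, fit, V=5, seed=0):
    """Return the grid value minimizing the CV score, plus all scores."""
    scores = [cv_score(X_star, y_star, lam, fit, V, seed) for lam in lam_grid]
    return lam_grid[int(np.argmin(scores))], scores
```

Fixing `seed` keeps the same fold assignment across the grid, so the scores for different values of λ are compared on identical partitions.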

4 Evaluation

With gene expression data, p ≫ n. Most of the conventional evaluation techniques are valid only under the p ≪ n scenario and cannot be applied here. In this study, we are most interested in two aspects: (1) prediction performance, that is, whether the identified genes and corresponding AFT models can make proper predictions for subjects not used in the model estimation; and (2) stability of identified genes. Early studies have shown that gene signatures identified from analysis of gene expression data may suffer from low reproducibility. That is, genes identified using different data sets may differ significantly. Ideally, evaluation should be based on independent data, which are usually not available. As an alternative, we propose the following resampling based approaches.

4.1 Evaluation of prediction

We propose prediction evaluation based on random partitions as follows.

1. Partition the data randomly into a training set of size n₁ and a testing set of size n₂ with n₁ + n₂ = n. We use n₁ = 2n/3.
2. Compute the bridge estimate using the training set data. Cross validation is needed to select the optimal tuning for the training set.
3. Use the training set estimate to make predictions for subjects in the testing set. Specifically, we first compute the risk scores X′β̂. We then dichotomize the risk scores at the median and create two risk groups (referred to as high and low risk groups, respectively). Compare the survival functions for the two risk groups, and compute the logrank statistic.
4. To avoid over fitting caused by a “lucky” partition, we repeat Steps 1–3 B = 500 times. Each time a new partition is made and the value of the logrank statistic is computed.

We partition the dataset in Step 1. To generate a fair evaluation, we recompute tuning parameters for each individual partition in Step 2. In Step 3, we adopt the logrank statistic as the evaluation measurement. A larger logrank statistic suggests that the high and low risk groups are better separated and the proposed approach is more effective. We create two risk groups, mainly because of the small sample sizes. By repeating the partitioning process many times, we can obtain a Monte Carlo estimate of the distribution of the logrank statistic (as opposed to a single logrank statistic in several early studies). We call it the observed predictive distribution (OPD) of the logrank statistic.
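
The partition-and-score loop can be sketched as follows, assuming NumPy. The two-group logrank statistic is written out with the standard hypergeometric variance; `fit` is a hypothetical callable `fit(X, time, event)` that returns β̂ and is assumed to perform its own tuning on the training data.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-sample logrank chi-square statistic; group is a 0/1 array."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        died = (time == t) & (event == 1)
        d = died.sum()
        d1 = (died & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def observed_predictive_distribution(X, time, event, fit, B=500, seed=0):
    """Logrank statistics over B random 2/3 vs 1/3 partitions."""
    rng = np.random.default_rng(seed)
    n = len(time)
    n1 = (2 * n) // 3
    stats = []
    for _ in range(B):
        perm = rng.permutation(n)
        train, test = perm[:n1], perm[n1:]
        beta = fit(X[train], time[train], event[train])
        score = X[test] @ beta
        # Dichotomize the test-set scores at their median.
        group = (score > np.median(score)).astype(int)
        stats.append(logrank_statistic(time[test], event[test], group))
    return np.array(stats)
```

The logrank statistic depends on the times only through their ranks, so it gives the same value whether `time` holds raw or log-transformed survival times.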

When n ≫ p, the logrank statistic is asymptotically χ² distributed. With gene expression data and n ≪ p, it is not clear how effective the χ² approximation is. To tackle this problem, we propose the following permutation based approach to generate the reference distribution for the OPD. We first randomly permute the event times together with the censoring indicators. We then follow the same procedure as for the OPD and obtain a Monte Carlo estimate of the distribution of the logrank statistic under permutation. We call it the permutation predictive distribution (PPD) of the logrank statistic. With permutation, the event times and gene expressions are expected to be independent. The distribution of logrank statistics so computed can serve as the reference distribution for the OPD.

Calculations of the OPD and PPD are parallel: the OPD is calculated from the observed data, whereas the PPD is calculated from the permuted data. Well separated OPD and PPD may indicate that the proposed approach can identify genes and models with satisfactory prediction performance, whereas substantially overlapping distributions suggest that either the proposed approach is not effective or the gene expressions do not have good discriminant power.
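
The permutation step and a simple separation summary can be sketched as below, assuming NumPy. Permuting the (time, indicator) pairs jointly keeps each censoring indicator attached to its own time while breaking the link to the covariate rows. The helper names are hypothetical, and the 90% default for `level` is just one possible choice of PPD percentile.

```python
import numpy as np

def permute_outcomes(time, event, rng):
    """Permute (time, event) pairs jointly; covariate rows stay in place."""
    idx = rng.permutation(len(time))
    return np.asarray(time)[idx], np.asarray(event)[idx]

def fraction_above_reference(opd, ppd, level=0.90):
    """Share of OPD statistics exceeding the given PPD quantile."""
    cutoff = np.quantile(ppd, level)
    return float(np.mean(np.asarray(opd) > cutoff)), float(cutoff)
```

The PPD is then obtained by running the same partition-and-score procedure on the permuted outcomes with the original covariate matrix.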

4.2 Evaluation of stability

The prediction evaluation described above assesses overall performance of the proposed approach and selected genes/models. In what follows, we evaluate the relative stability of each identified gene. The rationale behind the proposed approach is that, if a gene is more “important” or more “stable”, it should be identified “more often” in analysis of multiple data sets. Since multiple independent data sets are not available, we resort to random sampling again.

We first randomly sample n₁ = 2n/3 subjects. We then use the bridge approach to identify genes in the sampled subset. We repeat this procedure B = 500 times. For the jth gene, we count c_j, the number of times it is identified. The proportion o_j = c_j/B gives a measure of the relative importance and stability of the jth gene, and will be referred to as the observed occurrence index (OOI). Following the same rationale as in the above section, we also permute the data and recompute the occurrence index, which will be referred to as the permutation occurrence index (POI) (since permuted data are used). The occurrence indexes are simply byproducts of the prediction evaluation and incur no additional computational cost.
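
The occurrence index is a simple selection frequency over subsamples. A minimal sketch, assuming NumPy; `select` is a hypothetical callable `select(X, y, delta)` returning a boolean mask of selected genes (for example, the nonzero pattern of a bridge fit).

```python
import numpy as np

def occurrence_index(X, y, delta, select, B=500, seed=0):
    """Proportion of B random 2/3 subsamples in which each gene is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n1 = (2 * n) // 3
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n1, replace=False)
        counts += np.asarray(select(X[idx], y[idx], delta[idx]), dtype=float)
    return counts / B
```

Calling the same function on outcomes permuted jointly in (y, delta) gives the POI.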

5 Asymptotic properties

In this section, we investigate asymptotic properties of the proposed bridge approach with p ≫ n. We are especially interested in the gene selection consistency property, because once genes are consistently selected, standard approaches can lead to consistent estimates. We note that for fixed p, the asymptotic results can be obtained easily using standard approaches. Since the case with fixed p is not relevant to our data applications, we will not consider it here.

We note that, with the proposed iterative algorithm, the Lasso estimate is used as the starting value. Genes not selected by the Lasso will not be selected in the final model. Thus, it is crucial to first establish properties of the Lasso estimate under the present data/model setup. Careful inspection of the proposed computational algorithm suggests that, once the initial estimate is obtained, in each step, an adaptive Lasso estimate is computed. Thus, we are able to use similar methods as in Huang et al. (2008b), which studies properties of the adaptive Lasso in high dimensional linear regression models, to establish properties of the bridge estimate.

We consider the rescaled $X_{(i)}^{*}$ and $Y_{(i)}^{*}$ defined in Sect. 2. For simplicity of notation, we use X_(i) and Y_(i) to denote $X_{(i)}^{*}$ and $Y_{(i)}^{*}$ hereafter. Let Y = (Y₍₁₎, …, Y_(n))′. Let X be the n × p covariate matrix consisting of row vectors $X_{(1)}^{'}, \dots, X_{(n)}^{'}$. Let X₁, …, X_p be the p columns of X. Let W = diag(nw₁, …, nw_n) be the diagonal matrix of the Kaplan–Meier weights. For A ⊆ {1, …, p}, let X_A = (X_j, j ∈ A) be the matrix with columns X_j for j ∈ A. Denote $Σ_{A} = X_{A}^{'} W X_{A} / n$. Denote the cardinality of A by |A|.

Let β₀ = (β₀₁, …, β_0p)′ be the true value of the regression coefficients. Let A₁ = {j : β_0j ≠ 0} be the set of nonzero coefficients and let q = |A₁|. We make the following assumptions.

(A1) The number of nonzero coefficients q is finite.
(A2) (a) The observations (Y_i, X_i, δ_i), 1 ≤ i ≤ n, are independent and identically distributed; (b) The errors ε₁, …, ε_n are independent and identically distributed with mean 0 and finite variance σ². Furthermore, they are subgaussian, in the sense that there exist K₁, K₂ > 0 such that the tail probabilities of ε_i satisfy P(|ε_i| > x) ≤ K₂ exp(−K₁x²) for all x ≥ 0 and all i.
(A3) (a) The errors (ε₁, …, ε_n) are independent of the Kaplan–Meier weights (w₁, …, w_n); (b) The covariates are bounded. That is, there is a constant M > 0 such that |X_ij| ≤ M, 1 ≤ i ≤ n, 1 ≤ j ≤ p.
(A4) The covariate matrix satisfies the sparse Riesz condition (SRC) with rank q^*: there exist constants 0 < c_* < c^* < ∞ such that, for q^* = (3 + 4C)q and C = c^*/c_*, with probability converging to 1, $c_{*} \leq \frac{ν' Σ_{A} ν}{‖ ν ‖^{2}} \leq c^{*}$ for all A with |A| = q^* and all ν ∈ ℝ^{q^*}, where ‖ · ‖ is the ℓ₂ norm.

By (A1), the model is sparse in the sense that although the total number of covariates may be large, the number of covariates with nonzero coefficients is still small. The tail probability assumption in (A2) has been made with high-dimensional linear regression models. See for example Zhang and Huang (2008). With assumption (A3), it can be shown that the subgaussian tail property still holds under censoring. The SRC condition (A4) has been formulated in the study of the Lasso in linear regression without censoring (Zhang and Huang 2008). This condition implies that all the eigenvalues of any d × d submatrix of X′WX/n with d ≤ q^* lie between c_* and c^*. It ensures that any model with dimension no greater than q^* is identifiable.

We first consider the Lasso estimator defined as $\tilde{β} = \arg\min_{β} \left\{ Q_{n} (β) + λ \sum_{j = 1}^{p} | β_{j} | \right\}$. With β̃ = (β̃₁, …, β̃_p)′, let Ã₁ = {j : β̃_j ≠ 0} be the set of nonzero Lasso estimated coefficients.

Theorem 1 Suppose that (A1)–(A4) hold and $λ \geq O (1) \sqrt{n \log p}$. Then

(i) With probability converging to 1, |Ã₁| ≤ (2 + 4C)q.
(ii) If λ/n → 0 and (log p)/n → 0, then with probability converging to 1, all the covariates with nonzero coefficients are selected.
(iii) $‖ \tilde{β} - β_{0} ‖_{2}^{2} \leq \frac{16 λ^{2} q}{n^{2} c_{*}^{2}} + O_{p} \left(\frac{| \tilde{A}_{1} | \log p}{n c_{*}^{2}}\right)$. In particular, if $λ = O (\sqrt{n \log p})$, then $‖ \tilde{β} - β_{0} ‖_{2}^{2} = O_{p} (\log p / n)$.

This theorem suggests that, with high probability, the number of covariates selected by the Lasso is a finite multiple of the number of covariates with nonzero coefficients. Moreover, all the covariates with nonzero coefficients are selected with probability converging to one. This justifies using the Lasso as the initial estimator in the algorithm proposed in Sect. 3.1. In addition, the Lasso estimator is estimation consistent.

Starting from the initial Lasso estimator β̃, we denote β̂ as the estimate after one iteration (in the algorithm described in Sect. 3.1). Simple algebra shows that the value of $θ_{j}^{(1)}$ computed in Step 2 of the proposed algorithm is $θ_{j}^{(1)} = (λ / 2) | {\tilde{β}}_{j} |^{- 1 / 2}$ . Thus Step 3 of the proposed algorithm is

\hat{β} = \arg\min_{β} \left\{ Q_{n} (β) + \frac{λ}{2} \sum_{j = 1}^{p} | \tilde{β}_{j} |^{- 1 / 2} | β_{j} | \right\} .

β̂ computed above takes the form of an adaptive Lasso estimator. Of note, here, the penalty parameter is the same as the λ used in the Lasso estimator.
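
The adaptive Lasso form can be traced through the algorithm of Sect. 3.1. The sketch below assumes γ = 1/2; the passage does not state the value of γ, and this choice is inferred from the exponent −1/2 in the display.

```latex
% Assumption: \gamma = 1/2, so that 1 - 1/\gamma = -1.
% Proposition 1 with \gamma = 1/2:
\lambda = \tau_n^{1/2} (1/2)^{-1/2} (1/2)^{-1/2} = 2 \tau_n^{1/2}
\;\Longrightarrow\; \tau_n^{1/2} = \lambda / 2 .
% Step 2 with \beta^{(0)} = \tilde{\beta}:
\theta_j^{(1)} = \left(\frac{1/2}{\tau_n / 2}\right)^{1/2} |\tilde{\beta}_j|^{1/2}
              = \tau_n^{-1/2} |\tilde{\beta}_j|^{1/2} .
% Penalty weight multiplying |\beta_j| in Step 3:
\left(\theta_j^{(1)}\right)^{1 - 1/\gamma} = \left(\theta_j^{(1)}\right)^{-1}
  = \tau_n^{1/2} |\tilde{\beta}_j|^{-1/2}
  = \frac{\lambda}{2} |\tilde{\beta}_j|^{-1/2} .
```

Under this assumption, (λ/2)|β̃_j|^{−1/2} is the weight applied to |β_j| in Step 3, which matches the adaptive Lasso criterion displayed above.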

For any vector x = (x₁, x₂, …), denote its sign vector by sgn(x) = (sgn(x₁), sgn(x₂), …), where sgn(x_i) = 1, 0, −1 if x_i > 0, x_i = 0, x_i < 0, respectively.

Theorem 2 Suppose that (A1)–(A4) are satisfied, (log p)/n → 0, and $λ = O (\sqrt{n log p})$ . Then

P (sgn (\hat{β}) = sgn (β_{0})) \to 1 .

The above theorem shows that the one-step estimator is sign consistent. Thus, the one-step estimator is selection consistent, in the sense that it can correctly distinguish covariates with zero and nonzero coefficients with probability converging to 1. Following similar arguments, we can prove that any finite-step estimator (computed from the algorithm described in Sect. 3.1) is sign consistent and hence selection consistent. We note that, although the one-step estimator is selection consistent, our numerical studies suggest that iterating until convergence tends to improve finite sample performance.

We note that in Theorem 2, we allow log p = o(n) or p = exp(o(n)). Thus the dimension of covariates can be larger than the sample size, which accommodates gene expression data.

6 Data analysis

6.1 Mantle cell lymphoma data

A study using microarray expression analysis of mantle cell lymphoma (MCL) is reported in Rosenwald et al. (2003). The primary goal of this study is to identify genes that have good predictive power of patients’ survival risk. Among 101 untreated patients with no history of previous lymphoma, 92 were classified as having MCL based on established morphologic and immunophenotypic criteria. Survival times of 64 patients were available and the other 28 patients were censored. The median survival time was 2.8 years (range 0.02 to 14.05 years). Lymphochip DNA microarrays were used to quantify mRNA expression in the lymphoma samples from the 92 patients. The gene expression data that contains expression values of 8810 cDNA elements is available at http://llmpp.nih.gov/MCL/.

We model survival with the AFT model, and use the proposed bridge approach for gene selection. Although there is no limitation on the number of genes that can be used in the proposed approach, we pre-process the data as follows to reduce noise and gain further stability: (1) Unsupervised screening: compute the interquartile ranges of all gene expressions. Remove genes with interquartile ranges smaller than their first quartile. 6608 genes pass this screening; (2) Supervised screening: compute correlation coefficients of the uncensored survival times with gene expressions. Select the 500 genes with the largest absolute values of the correlation coefficients. We then standardize these 500 gene expressions to have zero mean and unit variance. We note that the supervised screening utilizes the survival information. In the random sampling based evaluation, to guarantee a fair evaluation, the supervised screening needs to be conducted for each sampled data set.
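
The two screening steps can be sketched as below, assuming NumPy and a samples-by-genes matrix `X`. The unsupervised rule is read here as "drop genes whose interquartile range falls below the first quartile of all genes' interquartile ranges"; this reading is an assumption, although it is consistent with about 75% of the 8810 elements (6608) passing. Function names are hypothetical.

```python
import numpy as np

def unsupervised_screen(X):
    """Indices of genes whose IQR is at least the first quartile of all IQRs."""
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    iqr = q75 - q25
    return np.flatnonzero(iqr >= np.percentile(iqr, 25))

def supervised_screen(X, y, delta, k=500):
    """Indices of the k genes most correlated with uncensored times."""
    unc = np.asarray(delta) == 1
    Xu = X[unc] - X[unc].mean(axis=0)
    yu = y[unc] - y[unc].mean()
    denom = np.sqrt((Xu ** 2).sum(axis=0) * (yu ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant genes get correlation 0 instead of NaN.
        corr = np.where(denom > 0, Xu.T @ yu / denom, 0.0)
    return np.argsort(-np.abs(corr))[:k]

def standardize(X):
    """Scale each gene to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

Because `supervised_screen` uses the outcome, it belongs inside the resampling loop and should be applied to each training subset separately.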

We employ the proposed approach and select the optimal tuning with 5-fold cross validation. Genes selected with the bridge approach and their corresponding estimates are shown in Table 1. For comparison, we also provide the Lasso estimate. 40 and 34 genes are identified using the Lasso and bridge approaches, respectively. Because of the special setup of the computational algorithm, genes identified using the bridge are a subset of those identified using the Lasso.

Table 1.

Mantle cell lymphoma data

UNIQID	Gene name	Lasso	Bridge
16541	Coagulation factor V (proaccelerin, labile factor)	0.180	0.133
16561	Aurora kinase B	−0.040	−0.040
16822	Chemokine (C-C motif) ligand 3	−0.032	−0.044
17174	DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11	−0.048	−0.069
17434	RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae)	−0.023	−0.041
23972	Zinc finger protein, multitype 2	0.012	0.041
24376	Serine (or cysteine) proteinase inhibitor, clade A, member 9	0.167	0.126
24379	Immunoglobulin superfamily receptor translocation associated 1	0.082	0.102
24488	Eukaryotic translation elongation factor 1 alpha 1	0.176	0.106
24845	Interleukin 2 receptor, beta	0.135	0.109
24972		0.038	0.042
26192	Aldolase B, fructose-bisphosphate	0.124	0.061
26474	Chemokine (C-C motif) ligand 3	−0.027	−0.063
26475	Chemokine (C-C motif) ligand 3	−0.054	−0.054
27116	RAB interacting factor	−0.046	−0.061
27659	Hypothetical protein MGC61571	0.027	0.080
27678		0.020	0.036
27838	Lymphocyte antigen 64 homolog, radioprotective 105kDa	−0.069
27969	CDC-like kinase 3	−0.055
28075	Transforming, acidic coiled-coil containing protein 1	−0.019	−0.038
28638	BCL2-related protein A1	0.050	0.073
28645	EPH receptor B4	−0.029
28990	Cell division cycle 2, G1 to S and G2 to M	−0.035	−0.021
29347	Split hand/foot malformation (ectrodactyly) type 1	0.017
29357	Polymerase (DNA directed), epsilon 2 (p59 subunit)	−0.018	−0.031
29897	Asp (abnormal spindle)-like, microcephaly associated	−0.029	−0.027
30110	Natural killer-tumor recognition sequence	0.056
30144	Natural killer-tumor recognition sequence	0.016	0.052
30284	PR domain containing 15	−0.031	−0.053
31049	Polymerase (DNA directed), theta	−0.053	−0.043
31081	Asp (abnormal spindle)-like, microcephaly associated	−0.036	−0.059
31101	Similar to CG1399-PB	−0.028	−0.082
32023	Membrane-spanning 4-domains, subfamily A, member 1	0.023	0.071
32187	AF15q14 protein	−0.024	−0.022
32830	Hypothetical protein LOC284019	−0.080	−0.108
32935	Zinc finger protein 148 (pHZ-52)	0.018	0.041
32979		0.036	0.073
33781	Chromosome 6 open reading frame 83	0.018	0.053
33851	GRIP and coiled-coil domain containing 2	0.044
33880	HP1-BP74	−0.062	−0.055


Genes identified using the Lasso and Bridge: UNIQID, gene names, and estimates

We evaluate prediction performance using the approach described in Sect. 4.1. For comparison, we also evaluate the Lasso approach using the same evaluation technique. We show in Fig. 1 (upper panels) the density estimates of the OPD and PPD. We can see that (1) the bridge yields well separated OPD and PPD (Wilcoxon test p-value < 0.001), which suggests satisfactory prediction performance. The 90th percentile of the PPD is 3.32, and 62% of the logrank statistics from the OPD are larger than that value; (2) the PPD is close to the χ² distribution, but there is still a small discrepancy. The 95th and 90th percentiles of the PPD are 4.11 and 3.32, respectively, which are slightly larger than their counterparts (3.84 and 2.71) from the χ²; (3) the prediction performance of the Lasso is also satisfactory, but inferior to that of the bridge. The mean and median of the Lasso OPD are 4.617 and 3.616, which are smaller than their bridge counterparts (5.319 and 4.404).

We evaluate stability of identified genes using the approach described in Sect. 4.2. Results are shown in the lower panels of Fig. 1. For a better view, we only plot the 500 genes that pass the screening in the whole dataset. We can see that (1) most of the identified genes have relatively large OOI; (2) there are a few genes that are not identified, but have moderate OOI. It is still not clear why those genes are not identified. Such a question is worth investigating in future studies; and (3) with permuted data, the POI for all genes are small, with no genes having significantly larger occurrence indexes than others.

6.2 Diffuse large B-cell lymphoma data

The DLBCL (diffuse large B-cell lymphoma) data was first analyzed in Rosenwald et al. (2002). This dataset consists of 240 patients with DLBCL, including 138 patient deaths during the follow-up. Expression profiles of 7399 genes are obtained. Missing values are imputed using a K-nearest neighbors approach. To gain further stability, we carry out supervised screening and select the 500 genes with the largest absolute values of marginal correlation coefficients with the uncensored event times. Gene expressions are then normalized to have zero mean and unit variance. We note that, in the random sampling based evaluation, the supervised screening is conducted for each sampled data set.

With the proposed approach and optimal tuning selected using 5-fold cross validation, 44 genes are identified. As a comparison, the Lasso identifies 46 genes. The UNIQID, gene names, and bridge and Lasso estimated coefficients are shown in Table 2.

Table 2.

DLBCL data

UNIQID	Gene name	Lasso	Bridge
16117	diacylglycerol kinase, delta (130 kD)	−0.155	−0.112
32446		−0.110	−0.101
26329		−0.030	−0.047
25933		0.031	0.074
29585	Epoxide hydrolase 2, cytoplasmic	0.081	0.067
28388	Bloom syndrome	0.044	0.127
17414	Occludin	−0.033	−0.074
30348	chromatin assembly factor 1, subunit A (p150)	−0.059	−0.071
17722	RAD23 homolog A (S. cerevisiae)	−0.088	−0.102
28837	G1 to S phase transition 1	−0.035	−0.064
34827	pM5 protein	−0.120	−0.094
24231	calnexin	−0.032	−0.059
33026	Sialyltransferase 7D (N-acetyl galactosaminide alpha-2,6-sialyltransferase)	−0.166	−0.148
29944	Solute carrier family 21 (organic anion transporter), member 12	−0.066	−0.100
28737	PAS domain containing serine/threonine kinase	0.081	0.114
34729	Forkhead box O1A (rhabdomyosarcoma)	0.233	0.167
34042	CD19 antigen	0.246	0.148
27704	Early B-cell factor	0.072	0.099
30355	Early B-cell factor	0.050	0.090
27681	G protein-coupled receptor 18	0.222	0.163
27341	Sarcoma amplified sequence	0.054	0.042
26231		−0.032
26185		−0.019	−0.072
24400	Monoglyceride lipase	−0.068	−0.079
16636	Glucose regulated protein, 58 kD	−0.057	−0.056
28641	Osteoblast specific factor 2 (fasciclin I-like)	0.081	0.100
26081	Growth arrest-specific 1	0.044	0.063
26020	Melanoma cell adhesion molecule	0.075	0.095
19363	Lymphotoxin beta (TNF superfamily, member 3)	0.040	0.089
27509	Matrix metalloproteinase 9 (gelatinase B, 92 kD gelatinase)	0.040	0.101
24433		0.083	0.146
28415	PTK7 protein tyrosine kinase 7	−0.034	−0.076
16179	Lymphocyte-specific protein tyrosine kinase	0.040	0.090
17140	Protein tyrosine phosphatase, non-receptor type 2	0.060
31728	Hypothetical protein FLJ00024	−0.057	−0.058
28681	CD58 antigen, (lymphocyte function-associated antigen 3)	0.034	0.070
17292		−0.052	−0.090
33912	Nuclear factor of kappa light polypeptide gene enhancer	−0.034	−0.088
29117	Frizzled homolog 1 (Drosophila)	0.145	0.100
17182	Caspase 10, apoptosis-related cysteine protease	−0.035	−0.079
16701	myosin, light polypeptide 2, regulatory, cardiac, slow	0.051	0.087
17391	tec protein tyrosine kinase	−0.121	−0.103
30130	Homo sapiens cDNA FLJ12727 fis, clone NT2RP2000027	−0.168	−0.121
23922		−0.034	−0.089
32836	ESTs	−0.168	−0.158
24612	immunoglobulin superfamily receptor translocation associated 1	0.050	0.093

Genes identified using the Lasso and Bridge: UNIQID, gene names, and estimates

We evaluate prediction performance and show the results in Fig. 2. Conclusions similar to those in Sect. 6.1 can be drawn. With the bridge, the OPD has mean 4.416 and median 3.674, which are larger than their Lasso counterparts (3.53 and 2.59). Of the logrank statistics from the OPD, 61% are greater than the 90th percentile of the PPD. The Wilcoxon test suggests that the OPD and PPD are well separated (p-value < 0.001). Evaluation of stability using the occurrence index is presented in Fig. 2 (lower panels). The observations are similar to those summarized in Sect. 6.1.
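The summaries reported above (mean, median, the share of OPD values above the 90th percentile of the PPD, and a two-sample rank test) can be computed along the following lines. This is a minimal sketch, not the authors' code: the two samples are synthetic placeholders rather than the DLBCL logrank statistics, the helper names are hypothetical, and it assumes the rank test in question is the two-sample Wilcoxon rank-sum test, evaluated here with a normal approximation and no tie correction.

```python
import math
import random
import statistics


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n, m = len(x), len(y)
    # Mann-Whitney U: number of pairs with x_i > y_j, counting ties as one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic stand-ins for the two sets of logrank statistics (NOT the paper's data).
rng = random.Random(0)
opd = [rng.expovariate(1 / 4.0) for _ in range(200)]
ppd = [rng.expovariate(1.0) for _ in range(200)]

cutoff = percentile(ppd, 90)
share_above = sum(v > cutoff for v in opd) / len(opd)
print("OPD mean and median:", statistics.mean(opd), statistics.median(opd))
print("share of OPD above the PPD 90th percentile:", share_above)
print("rank-sum p-value:", rank_sum_pvalue(opd, ppd))
```

For large samples or heavy ties, an exact or tie-corrected test from a statistics library would be the safer choice; the quadratic pair count here is only meant to keep the sketch dependency-free.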

6.3 Remark

Analyses of the MCL and DLBCL data suggest that the bridge approach is capable of identifying a smaller number of genes than the Lasso. With gene expression data, a smaller number of identified genes means more focused hypotheses for future confirmation studies, and is thus preferred. In addition, the prediction performance of the bridge is better than that of the Lasso. We note that, although prediction is not evaluated on completely independent data, resampling is used properly and the bridge and Lasso are compared on the same basis, so the prediction comparison is expected to be valid.

7 Discussions

Genomic studies with high dimensional markers measured along with censored survival outcomes are becoming more and more common. In this article, we model the relationship between gene expressions and censored survival with AFT models. AFT models have been commonly adopted and provide useful alternatives to the Cox and additive hazards models. Of note, since it is still not clear how to compare different models under the “large p, small n” setting, we do not pursue any model comparison. More methodological studies are needed before such a comparison can be conducted.

We propose using the bridge penalty for gene selection. Our numerical studies suggest that the bridge performs better than the Lasso in terms of variable selection in AFT models. Other penalties, for example the adaptive Lasso and SCAD, can also be used in the present setup. Since it is beyond the scope of this paper to compare the proposed method with all existing ones, we only pursue a comparison with the Lasso, which has been commonly used as a benchmark.

Acknowledgements

This work is partially supported by CA120988 from the National Cancer Institute and DMS 0805670 from the National Science Foundation. We thank the editors and reviewers for their helpful and constructive comments on an earlier version of the paper.

Appendix: Proofs

Let τ = (τ₁, …, τ_n)′, where $τ_{i} = w_{i} ε_{(i)} \equiv w_{i} (Y_{(i)} - X_{(i)}^{'} β_{0})$.

Lemma 1 Suppose that conditions (A2) and (A3) hold. Let $ξ_{j} = \sum_{i = 1}^{n} X_{ij} τ_{i}, 1 \leq j \leq p$, and let $ξ_{n} = \max_{1 \leq j \leq p} | ξ_{j} |$. Then

E (ξ_{n}) \leq C_{1} \sqrt{log (p)} {(\sqrt{2 C_{2} n log (p)} + 4 log (2 p) + C_{2} n)}^{1 / 2},

where C₁ and C₂ are two positive constants. In particular, when log (p)/n → 0,

E (ξ_{n}) = O (1) \sqrt{n log p} .

Proof of Lemma 1 Let $s_{nj}^{2} = \sum_{i = 1}^{n} X_{ij}^{2}$ and $s_{n}^{2} = \max_{1 \leq j \leq p} s_{nj}^{2}$. Conditional on the X_ij’s, assumptions (A2) and (A3) imply that the ξ_j’s are subgaussian. By (A2) and the maximal inequality for subgaussian random variables (Van der Vaart and Wellner 1996, Lemmas 2.2.1 and 2.2.2),

E (max_{1 \leq j \leq p} | ξ_{j} | | X_{ij}, 1 \leq i \leq n, 1 \leq j \leq p) \leq C_{1} s_{n} \sqrt{log (p)},

for a constant C₁ > 0. Therefore,

E (max_{1 \leq j \leq p} | ξ_{j} |) \leq C_{1} \sqrt{log (p)} E (s_{n}) .

(5)

Since

\sum_{i = 1}^{n} E {[X_{ij}^{2} - E X_{ij}^{2}]}^{2} \leq 4 C_{2} n,

(6)

and

max_{1 \leq j \leq p} \sum_{i = 1}^{n} E X_{ij}^{2} \leq C_{2} n,

(7)

by Lemma 4.2 of Van de Geer (2008), (6) implies

E (max_{1 \leq j \leq p} | \sum_{i = 1}^{n} \{X_{ij}^{2} - E X_{ij}^{2}\} |) \leq \sqrt{2 C_{2} n log (p)} + 4 log (2 p) .

Therefore, by (7) and the triangle inequality,

E s_{n}^{2} \leq \sqrt{2 C_{2} n log (p)} + 4 log (2 p) + C_{2} n .

Now since $E s_{n} \leq {(E s_{n}^{2})}^{1 / 2}$ , we have

E s_{n} \leq {(\sqrt{2 C_{2} n log (p)} + 4 log (2 p) + C_{2} n)}^{1 / 2} .

(8)

The lemma follows from (5) and (8).
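The √(n log p) rate in Lemma 1 can be illustrated numerically. The sketch below is a simulation under simplifying assumptions that are not part of the lemma: the X_ij and τ_i are drawn as independent standard normal variables (one particular subgaussian case), and the function name and the (n, p) grid are hypothetical. It only checks that the average of max_j |ξ_j| stays within a constant multiple of √(n log p) as n and p grow.

```python
import math
import random


def max_abs_xi(n, p, rng):
    """One draw of max_j |sum_i X_ij * tau_i| with independent N(0, 1) entries."""
    tau = [rng.gauss(0.0, 1.0) for _ in range(n)]
    largest = 0.0
    for _ in range(p):
        xi_j = sum(rng.gauss(0.0, 1.0) * t for t in tau)
        largest = max(largest, abs(xi_j))
    return largest


rng = random.Random(2009)
ratios = {}
for n, p in [(50, 100), (100, 200), (200, 400)]:
    draws = [max_abs_xi(n, p, rng) for _ in range(10)]
    # Monte Carlo estimate of E(xi_n), scaled by sqrt(n log p).
    ratios[(n, p)] = (sum(draws) / len(draws)) / math.sqrt(n * math.log(p))

for key, value in ratios.items():
    print(key, round(value, 3))
```

If the rate holds, the printed ratios should stay of the same order across the grid rather than drift upward with n and p.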

In the proofs below, let $Y^{*} = W^{1 / 2} Y$ and $X^{*} = W^{1 / 2} X$. Then

Q_{n} (β) = \frac{1}{2} \sum_{i = 1}^{n} {(Y_{i}^{*} - X_{i}^{*'} β)}^{2} = \frac{1}{2} {‖ Y^{*} - X^{*} β ‖}^{2},

where ‖ · ‖ is the ℓ₂ norm.
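The change of variables above turns a weighted least squares criterion into an ordinary one. A minimal sketch of that equivalence follows, assuming W is the diagonal matrix of nonnegative weights w_i and that Q_n is the weighted criterion (1/2) Σ w_i (Y_i − X_i′β)² defined earlier in the paper; the function names and the toy numbers are hypothetical.

```python
import math


def weighted_loss(y, X, w, beta):
    """(1/2) * sum_i w_i * (y_i - x_i' beta)^2."""
    return 0.5 * sum(
        wi * (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for yi, xi, wi in zip(y, X, w)
    )


def transformed_loss(y, X, w, beta):
    """(1/2) * ||Y* - X* beta||^2 with Y* = W^{1/2} Y and X* = W^{1/2} X."""
    y_star = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    X_star = [[math.sqrt(wi) * xij for xij in xi] for xi, wi in zip(X, w)]
    residuals = [
        ys - sum(xij * bj for xij, bj in zip(xs, beta))
        for ys, xs in zip(y_star, X_star)
    ]
    return 0.5 * sum(r * r for r in residuals)


# Toy data; the second observation has weight zero and so drops out of both forms.
y = [1.2, 0.7, 2.5, 1.9]
X = [[1.0, 0.5], [0.3, -1.2], [2.0, 0.1], [-0.4, 0.8]]
w = [0.25, 0.0, 0.35, 0.40]
beta = [0.5, -0.3]

print(weighted_loss(y, X, w, beta), transformed_loss(y, X, w, beta))
```

Working with the rescaled data means any solver written for unweighted penalized least squares can be reused without modification.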

Proof of Theorem 1, part (i) Part (i) follows from the proof of Theorem 1 of Zhang and Huang (2008). The only difference is that here we use the subgaussian assumption to control certain tail probabilities, instead of the normality condition assumed in Zhang and Huang (2008). Since subgaussian random variables have the same tail behavior as normal random variables, the argument of Zhang and Huang goes through.

Proof of Theorem 1, part (ii) Part (ii) follows from part (iii) and the assumption that the number of nonzero coefficients is fixed. Thus the absolute values of the nonzero coefficients are bounded away from 0 by a positive constant independent of n.

Proof of Theorem 1, part (iii) By the definition of β̃,

{‖ Y^{*} - X^{*} \tilde{β} ‖}_{2}^{2} + 2 λ \sum_{j = 1}^{p_{n}} | {\tilde{β}}_{j} | \leq {‖ Y^{*} - X^{*} β_{0} ‖}^{2} + 2 λ \sum_{j = 1}^{p_{n}} | β_{0 j} | .

Thus

{‖ Y^{*} - X^{*} \tilde{β} ‖}_{2}^{2} + 2 λ \sum_{j \in A_{1}} | {\tilde{β}}_{j} | \leq {‖ Y^{*} - X^{*} β_{0} ‖}^{2} + 2 λ \sum_{j \in A_{1}} | β_{0 j} | .

This implies

{‖ Y^{*} - X^{*} \tilde{β} ‖}_{2}^{2} - {‖ Y^{*} - X^{*} β_{0} ‖}^{2} \leq 2 λ \sum_{j \in A_{1}} | {\tilde{β}}_{j} - β_{0 j} | .

That is,

{‖ X^{*} (\tilde{β} - β_{0}) ‖}^{2} - 2 τ' X^{*} (\tilde{β} - β_{0}) \leq 2 λ \sum_{j \in A_{1}} | {\tilde{β}}_{j} - β_{0 j} | .

(9)

Let B = A₁ ∪ A₂ = {j : β_0j ≠ 0 or β̃_j ≠ 0}. Note that |B| ≤ q^* with probability converging to 1 by part (i), where q^* is given in (A4). Write $X_{B}^{*} = (X_{j}^{*}, j \in B)$, β̃_B = (β̃_j, j ∈ B), and β_0B = (β_0j, j ∈ B), and denote

η_{B} = X_{B}^{*} ({\tilde{β}}_{B} - β_{0 B}) .

Since A₁ ⊂ B,

\sum_{j \in A_{1}} | {\tilde{β}}_{j} - β_{0 j} | \leq \sqrt{| A_{1} |} ‖ {\tilde{β}}_{A_{1}} - β_{0 A_{1}} ‖ \leq \sqrt{| A_{1} |} ‖ {\tilde{β}}_{B} - β_{0 B} ‖ .

(10)

By (9) and (10),

{‖ η_{B} ‖}^{2} - 2 τ' η_{B} \leq 2 λ \sqrt{| A_{1} |} ‖ {\tilde{β}}_{B} - β_{0 B} ‖ .

(11)

Let $τ_{B}^{*}$ be the projection of τ to the span of $X_{B}^{*}$ , i.e., $τ_{B}^{*} = X_{B}^{*} {(X_{B}^{*'} X_{B}^{*})}^{- 1} X_{B}^{*'} τ$ . We have

τ' η_{B} = τ' X_{B}^{*} ({\tilde{β}}_{B} - β_{0 B}) = \{{(X_{B}^{*'} X_{B}^{*})}^{- 1 / 2} X_{B}^{*'} τ\}' \{{(X_{B}^{*'} X_{B}^{*})}^{1 / 2} ({\tilde{β}}_{B} - β_{0 B})\} .

Therefore, by the Cauchy–Schwarz inequality,

2 | τ' η_{B} | \leq 2 ‖ τ_{B}^{*} ‖ \cdot ‖ η_{B} ‖ \leq 2 {‖ τ_{B}^{*} ‖}^{2} + \frac{1}{2} {‖ η_{B} ‖}^{2} .

(12)
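The first inequality in (12) uses the fact that η_B lies in the column span of $X_{B}^{*}$, and the second is an elementary quadratic bound. A short sketch of both steps, writing $P_{B}$ for the orthogonal projection onto that span and a, b for the two norms (symbols introduced here only for illustration):

```latex
% P_B = X_B^* (X_B^{*'} X_B^*)^{-1} X_B^{*'}, so P_B \tau = \tau_B^* and P_B \eta_B = \eta_B.
\[
\begin{aligned}
\tau' \eta_B &= \tau' P_B \eta_B = (P_B \tau)' \eta_B = \tau_B^{*\prime} \eta_B,
&\text{hence}\quad |\tau' \eta_B| &\le \|\tau_B^*\| \, \|\eta_B\| , \\
0 &\le \Bigl( \sqrt{2}\, a - \tfrac{1}{\sqrt{2}}\, b \Bigr)^2 = 2 a^2 - 2 a b + \tfrac{1}{2} b^2,
&\text{hence}\quad 2 a b &\le 2 a^2 + \tfrac{1}{2} b^2 ,
\end{aligned}
\]
% applied with a = \|\tau_B^*\| and b = \|\eta_B\|.
```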

Combining (11) and (12),

{‖ η_{B} ‖}^{2} \leq 4 {‖ τ_{B}^{*} ‖}^{2} + 4 λ \sqrt{| A_{1} |} \cdot ‖ {\tilde{β}}_{B} - β_{0 B} ‖

(13)

By the SRC condition (A4), ‖η_B‖² ≥ nc_*‖β̃_B − β_0B‖². Thus (13) implies

{nc}_{*} {‖ {\tilde{β}}_{B} - β_{0 B} ‖}^{2} \leq 4 {‖ τ_{B}^{*} ‖}^{2} + \frac{{(4 λ \sqrt{| A_{1} |})}^{2}}{2 {nc}_{*}} + \frac{1}{2} {nc}_{*} {‖ {\tilde{β}}_{B} - β_{0 B} ‖}^{2} .

It follows that

{‖ {\tilde{β}}_{B} - β_{0 B} ‖}^{2} \leq \frac{8 {‖ τ_{B}^{*} ‖}^{2}}{{nc}_{*}} + \frac{16 λ^{2} | A_{1} |}{n^{2} c_{*}^{2}} .

(14)

Now

{‖ τ_{B}^{*} ‖}^{2} = {‖ {(X_{B}^{*'} X_{B}^{*})}^{- 1 / 2} X_{B}^{*'} τ ‖}^{2} \leq \frac{1}{{nc}_{*}} {‖ X_{B}^{*'} τ ‖}^{2} \leq \frac{1}{{nc}_{*}} max_{A : | A | \leq q^{*}} {‖ X_{A}^{*'} τ ‖}^{2} .

We have

max_{A : | A | \leq q^{*}} {‖ X_{A}^{*'} τ ‖}^{2} = max_{A : | A | \leq q^{*}} \sum_{j \in A} | X_{j}^{*'} τ |^{2} \leq q^{*} max_{1 \leq j \leq p} {| X_{j}^{*'} τ |}^{2} .

By Lemma 1,

max_{1 \leq j \leq p} | X_{j}^{*'} τ |^{2} = n max_{1 \leq j \leq p} | n^{- 1 / 2} X_{j}^{*'} τ |^{2} = O_{p} (n log p) .

Therefore,

{‖ τ_{B}^{*} ‖}^{2} = O_{p} (\frac{q^{*} log p}{c_{*}}) .

(15)

The result follows from (14) and (15).
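To make the last step explicit, substituting (15) into (14) gives the bound below. This is only a sketch of the arithmetic; the precise rate claimed in part (iii) should be read from the statement of Theorem 1 itself.

```latex
\[
\| \tilde{\beta}_B - \beta_{0B} \|^2
\;\le\; \frac{8 \, \| \tau_B^* \|^2}{n c_*} + \frac{16 \lambda^2 |A_1|}{n^2 c_*^2}
\;=\; O_p\!\left( \frac{q^* \log p}{n c_*^2} \right) + \frac{16 \lambda^2 |A_1|}{n^2 c_*^2} .
\]
```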

Proof of Theorem 2 The proof follows the argument of Huang et al. (2008b), so we only outline the basic idea. Let $a_{j} = | {\tilde{β}}_{j} |^{- 1 / 2} / 2$, 1 ≤ j ≤ p. By the Karush–Kuhn–Tucker conditions, β̂ = (β̂₁, …, β̂_p)′ is the unique solution of the adaptive Lasso if

\left\{ \begin{matrix} X_{j}^{*'} (Y^{*} - X^{*} \hat{β}) = λ_{n} a_{j} sgn ({\hat{β}}_{j}), & {\hat{β}}_{j} \neq 0 \\ | X_{j}^{*'} (Y^{*} - X^{*} \hat{β}) | \leq λ_{n} a_{j}, & {\hat{β}}_{j} = 0 \end{matrix} \right.

(16)

and the vectors $\{X_{j}^{*}, {\hat{β}}_{j} \neq 0\}$ are linearly independent. Recall A₁ = {j : β_0j ≠ 0}. Let s̃_n1 = (a_j sgn(β_0j), j ∈ A₁)′, $X_{A_{1}}^{*} = (X_{j}^{*}, j \in A_{1})$, and β_0A₁ = (β_0j, j ∈ A₁)′. Thus $X_{A_{1}}^{*}$ is an n × q matrix.
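The conditions in (16) are easy to check numerically for a candidate solution. The sketch below assumes the adaptive Lasso criterion is (1/2)‖Y* − X*β‖² + λ_n Σ a_j |β_j|, the objective whose stationarity conditions match (16). The function names, the toy design, and the fixed weights a_j = 0.5 are hypothetical; in the proof the weights are a_j = |β̃_j|^{−1/2}/2. The design has orthonormal columns, so the minimizer is obtained coordinate-wise by soft-thresholding.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sgn(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def kkt_holds(X, y, beta, lam, a, tol=1e-10):
    """Check the two conditions in (16) for a candidate coefficient vector."""
    n, p = len(X), len(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for j in range(p):
        corr_j = sum(X[i][j] * resid[i] for i in range(n))  # X_j'(y - X beta)
        if beta[j] != 0.0:
            sign = 1.0 if beta[j] > 0 else -1.0
            if abs(corr_j - lam * a[j] * sign) > tol:
                return False
        elif abs(corr_j) > lam * a[j] + tol:
            return False
    return True


# Toy design with orthonormal columns (hypothetical numbers).
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [3.0, 0.2, 1.0]
lam, a = 1.0, [0.5, 0.5]

z = [sum(X[i][j] * y[i] for i in range(3)) for j in range(2)]
beta_hat = [soft_threshold(z[j], lam * a[j]) for j in range(2)]

print(beta_hat, kkt_holds(X, y, beta_hat, lam, a))
print(kkt_holds(X, y, [3.0, 0.2], lam, a))  # unpenalized least squares fit
```

The second coefficient is set exactly to zero because its correlation with the residual stays below λ_n a_j, which is the inequality in the second row of (16).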

Define

{\hat{β}}_{A_{1}} = {(X_{A_{1}}^{*'} X_{A_{1}}^{*})}^{- 1} (X_{A_{1}}^{*'} Y^{*} - λ_{n} {\tilde{s}}_{n 1}) = β_{0 A_{1}} + C_{11}^{- 1} (X_{A_{1}}^{*'} τ - λ_{n} {\tilde{s}}_{n 1}) / n,

(17)

where $C_{11} = X_{A_{1}}^{*'} X_{A_{1}}^{*} / n$. If sgn(β̂_A₁) = sgn(β_0A₁), then the equation in (16) holds for $\hat{β} = ({\hat{β}}_{A_{1}}^{'}, 0')'$. Thus, since $X^{*} \hat{β} = X_{A_{1}}^{*} {\hat{β}}_{A_{1}}$ for this β̂,

sgn (\hat{β}) = sgn (β_{0}) if \left\{ \begin{matrix} sgn ({\hat{β}}_{A_{1}}) = sgn (β_{0 A_{1}}) \\ | X_{j}^{*'} (Y^{*} - X_{A_{1}}^{*} {\hat{β}}_{A_{1}}) | \leq λ_{n} a_{j}, \forall j \notin A_{1} . \end{matrix} \right.

(18)

Let $H_{n} = I_{n} - X_{A_{1}}^{*} C_{11}^{- 1} X_{A_{1}}^{*'} / n$. It follows from (17) that $Y^{*} - X_{A_{1}}^{*} {\hat{β}}_{A_{1}} = τ - X_{A_{1}}^{*} ({\hat{β}}_{A_{1}} - β_{0 A_{1}}) = H_{n} τ + X_{A_{1}}^{*} C_{11}^{- 1} {\tilde{s}}_{n 1} λ_{n} / n$, so that by (18),

sgn (\hat{β}) = sgn (β_{0}) if \left\{ \begin{matrix} sgn (β_{0 j}) (β_{0 j} - {\hat{β}}_{j}) \leq | β_{0 j} |, & \forall j \in A_{1} \\ | X_{j}^{*'} (H_{n} τ + X_{A_{1}}^{*} C_{11}^{- 1} {\tilde{s}}_{n 1} λ_{n} / n) | < λ_{n} a_{j}, & \forall j \notin A_{1} . \end{matrix} \right.

(19)

Thus, by (19) and (17),

P \{sgn (\hat{β}) \neq sgn (β_{0})\} \leq P \{| e_{j}^{'} C_{11}^{- 1} X_{A_{1}}^{*'} τ | / n \geq | β_{0 j} | / 2 for some j \in A_{1}\} + P \{| e_{j}^{'} C_{11}^{- 1} {\tilde{s}}_{n 1} | λ_{n} / n \geq | β_{0 j} | / 2 for some j \in A_{1}\} + P \{| X_{j}^{*'} H_{n} τ | \geq λ_{n} a_{j} / 2 for some j \notin A_{1}\} + P \{| X_{j}^{*'} X_{A_{1}}^{*} C_{11}^{- 1} {\tilde{s}}_{n 1} | / n \geq a_{j} / 2 for some j \notin A_{1}\} = P \{B_{n 1}\} + P \{B_{n 2}\} + P \{B_{n 3}\} + P \{B_{n 4}\}, say,

where e_j is the unit vector in the direction of the j-th coordinate. Therefore, to prove the theorem, it suffices to show that each probability in the last line converges to zero. The same argument as in Huang et al. (2008b) can be used here and is omitted. This completes the outline of the proof.

Contributor Information

Jian Huang, Email: jian-huang@uiowa.edu, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA; Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA.

Shuangge Ma, Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA.

References

Alizadeh AA, Eisen MB, Davis RE, Ma C, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–511. doi: 10.1038/35000501.
Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
Dave SS, Wright G, Tan B, et al. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. New Engl J Med. 2004;351:2159–2169. doi: 10.1056/NEJMoa041869.
Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat. 2004;32:407–499.
Frank IE, Friedman JH. A statistical view of some chemometrics regression tools (with discussion). Technometrics. 1993;35:109–148.
Fu WJ. Penalized regressions: the bridge versus the Lasso. J Comput Graph Stat. 1998;7:397–416.
Gui J, Li H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005;21:3001–3008. doi: 10.1093/bioinformatics/bti422.
Huang J, Ma SG, Xie HL. Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics. 2006;62:813–820. doi: 10.1111/j.1541-0420.2006.00562.x.
Huang J, Horowitz JL, Ma S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann Stat. 2008a;36:587–613.
Huang J, Ma SG, Xie HL, Zhang C-H. A group bridge approach for variable selection. Biometrika. 2009;96:339–355. doi: 10.1093/biomet/asp020.
Huang J, Ma S, Zhang C. Adaptive Lasso for high-dimensional regression models. Stat Sinica. 2008b;18:1603–1618.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John Wiley; 1980.
Leng C, Lin Y, Wahba G. A note on the LASSO and related procedures in model selection. Stat Sinica. 2006;16:1273–1284.
Ma S, Huang J. Additive risk survival model with microarray data. BMC Bioinform. 2007;8:192. doi: 10.1186/1471-2105-8-192.
Rosenwald A, Wright G, Chan WC, Conners JM, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large B cell lymphoma. New Engl J Med. 2002;346:1937–1947. doi: 10.1056/NEJMoa012914.
Rosenwald A, Wright G, Wiestner A, Chan WC, et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell. 2003;3:185–197. doi: 10.1016/s1535-6108(03)00028-x.
Stute W. Consistent estimation under random censorship when covariables are available. J Multivar Anal. 1993;45:89–103.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B. 1996;58:267–288.
van de Geer S. High-dimensional generalized linear models and the Lasso. Ann Stat. 2008;36:614–645.
Van der Vaart AW, Wellner JA. Weak convergence and empirical processes: with applications to statistics. New York: Springer; 1996.
Wang S, Nan B, Zhu J, Beer DG. Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics. 2008;64:132–140. doi: 10.1111/j.1541-0420.2007.00877.x.
Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med. 1992;11:1871–1879. doi: 10.1002/sim.4780111409.
Ying ZL. A large sample study of rank estimation for censored regression data. Ann Stat. 1993;21:76–99.
Zhang C, Huang J. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat. 2008;36:1567–1594.
Zhou M. M-estimation in censored linear models. Biometrika. 1992;79:837–841.
