Variable selection with group structure in competing risks quantile regression

Kwang Woo Ahn; Soyoung Kim

doi:10.1002/sim.7619

Author manuscript; available in PMC: 2019 Apr 30.

Published in final edited form as: Stat Med. 2018 Feb 21;37(9):1577–1586. doi: 10.1002/sim.7619


PMCID: PMC5889760 NIHMSID: NIHMS939135 PMID: 29468710

Abstract

We study the group bridge and the adaptive group bridge penalties for competing risks quantile regression with group variables. While the group bridge consistently identifies non-zero group variables, the adaptive group bridge consistently selects variables not only at group level, but also at within-group level. We allow the number of covariates to diverge as the sample size increases. The oracle property for both methods is also studied. The performance of the group bridge and the adaptive group bridge is compared in simulation and in a real data analysis. The simulation study shows that the adaptive group bridge selects non-zero within-group variables more consistently than the group bridge. A bone marrow transplant study is provided as an example.

Keywords: Adaptive lasso, Competing risks quantile regression, Group bridge

1. Introduction

Quantile regression provides an alternative method to the Cox proportional hazards model and the accelerated failure time (AFT) model in survival analysis [1]. It is often preferred when the survival distribution is skewed. There is a rich literature on survival quantile regression. Peng and Huang [2] proposed martingale-based estimating equations. Reich and Smith [3] developed a semiparametric Bayesian quantile regression model for censored data. Yin et al. [4] studied a power-transformed quantile regression model for survival data. Yin and Cai [5] proposed quantile regression models for correlated survival data.

Recently, quantile regression for competing risks data has received much attention. Peng and Fine [1] proposed a semiparametric model based on the competing risks AFT model. Sun et al. [6] developed a regression model when the failure type is missing in competing risks data. Lee and Fine [7] studied parametric and nonparametric methods to make inference on cumulative incidence quantiles.

In spite of increasing popularity of quantile regression for survival and competing risks data, the current literature on variable selection is somewhat limited. Jiang et al. [8] proposed the adaptive lasso for a composite quantile regression with randomly censored data. Wang et al. [9] also studied the adaptive lasso for censored quantile regression. They all studied a survival setting, not a competing risks setting. In addition, their proposed methods addressed variable selection at individual level, not at group level. In practice, clinicians often encounter group variables such as categorical variables. For example, Verneris et al. [10] studied the outcomes of the patients having reduced-intensity conditioning allogeneic hematopoietic cell transplantation from 1999 to 2011. They studied competing risks outcomes including relapse and treatment-related mortality (TRM), where relapse and TRM are competing risks to each other. The variables that they considered for analysis consisted of binary and categorical variables.

Several penalties have been proposed to select group variables for linear regression and competing risks settings. Yuan and Lin [11] proposed the group lasso, which selects variables at group level, not at within-group level. Huang et al. [12] developed the group bridge to select both non-zero group and non-zero within-group variables. However, they studied group selection consistency only and did not show within-group variable selection consistency. Zhou and Zhu [13] proposed an adaptive hierarchical lasso having group variable selection consistency and within-group variable selection consistency. Zhao et al. [14] applied the adaptive hierarchical lasso penalty to identify non-zero variables at both levels for quantile linear regression. Fu et al. [15] extensively studied lasso, adaptive lasso, SCAD, and MCP for individual variable selection and their group variable selection versions for the subdistribution hazards model. However, they did not address within-group variable selection. In addition, their oracle property was limited to a fixed number of covariates. Despite extensive work in group variable selection for linear, linear quantile, and subdistribution hazards regression models, there is little literature on group variable selection in competing risks quantile regression. In particular, group and within-group level variable selection techniques remain unexplored in the current literature to the best of the authors’ knowledge.

We propose the group bridge and the adaptive group bridge for bi-level variable selection, that is, group and within-group variable selection, under the competing risks quantile regression model of Peng and Fine [1]. While the group bridge consistently identifies non-zero group variables, the adaptive group bridge consistently selects non-zero variables at both group level and within-group level. When there is no group structure for variables, individual variable selection can be handled as a special case of the proposed methods. To our knowledge, even individual variable selection has not been studied for competing risks quantile regression. We study the oracle property of both methods while allowing the number of variables to diverge as the sample size increases. We show in a simulation study that the adaptive group bridge identifies non-zero within-group variables more consistently than the group bridge. In Section 2, we describe the proposed methods and study their theoretical properties. In Section 3, we compare the performance of the adaptive group bridge and the group bridge via a simulation study. We illustrate a real data example in Section 4 and give a brief conclusion in Section 5. All the proofs of the theorems and the lemmas in this paper can be found in the online Supplementary Materials.

2. Method

In this section, we propose a penalized competing risks quantile regression model and study its theoretical properties. We begin with some notation. Without loss of generality, we consider two causes of failure ε ∈ {1,2} with sample size n. We allow the number of covariates d_n to increase as n increases. Let T_i, C_i, ε_i, and $Z_{i} = {(1, Z_{i 1}, \dots, Z_{i d_{n}})}^{T}$ be the event time, censoring time, cause of failure, and covariate vector of subject i for i = 1, …, n. Denote β₀(τ) = {β_j,0(τ); j = 0, …, d_n}^T as the true parameter vector given quantile τ, where β_0,0(τ) is the true intercept coefficient. Let X_i = T_i ∧ C_i be the observed time and δ_i = I(T_i ≤ C_i)I(ε_i = 1), where a ∧ b = min(a, b). We assume that (T_i, ε_i, C_i, Z_i) are independent and identically distributed, and the T_i’s and C_i’s are independent given Z_i for i = 1, …, n. The study period is [0, L]. Let F₁(t|Z_i) be the cumulative incidence of cause 1 at time t given Z_i, where F₁(t|Z_i) = P(T_i ≤ t, ε_i = 1|Z_i). Given covariate Z, we define the τth conditional quantile of F₁(t|Z) as Q₁(τ|Z) = inf{t: F₁(t|Z) ≥ τ}. For τ ∈ [τ_L, τ_U] with 0 < τ_L, τ_U < 1, we consider Q₁(τ|Z) = g{Z^Tβ(τ)}, where g(·) is a known monotone link function. Let ‖·‖ be the Euclidean norm and a^⊗2 = aa^T for a vector a.

Let ${\tilde{Z}}_{i} = {(Z_{i 1}, \dots, Z_{i d_{n}})}^{T}$ . For simplicity, we assume that ${\tilde{Z}}_{i}$ ’s are fixed over time. Let $N_{i}^{G} (t) = I (C_{i} \leq T_{i}) I (C_{i} \leq t)$ be the counting process for censoring and Y_i(t) = I(X_i ≥ t). We use the Cox proportional hazards model to fit censoring time C_i’s:

λ^{G} (t | {\tilde{Z}}_{i}) = λ_{0}^{G} (t) e^{α^{T} {\tilde{Z}}_{i}},

where $λ_{0}^{G} (t)$ is an arbitrary baseline hazard function for censoring and α is the unknown parameter vector. Define

S_{G}^{(d)} (α, t) = n^{- 1} \sum_{i = 1}^{n} Y_{i} (t) {\tilde{Z}}_{i}^{\otimes d} e^{α^{T} {\tilde{Z}}_{i}},

where d = 0,1, and 2. The baseline cumulative hazard function for censoring $Λ_{0}^{G} (t)$ is estimated by the Breslow-type estimator [16]:

{\hat{Λ}}_{0}^{G} (t; \hat{α}) = \int_{0}^{t} \frac{\sum_{i = 1}^{n} d N_{i}^{G} (u)}{n S_{G}^{(0)} (\hat{α}, u)},

where $\hat{α}$ is the estimator of α based on the Cox proportional hazards model. Then, we estimate $G (t | {\tilde{Z}}_{i})$ as follows:

\hat{G} (t | {\tilde{Z}}_{i}) = \exp \{- \int_{0}^{t} e^{{\hat{α}}^{T} {\tilde{Z}}_{i}} d {\hat{Λ}}_{0}^{G} (u; \hat{α})\} .

We can obtain the consistency of $\hat{α}$, ${\hat{Λ}}_{0}^{G} (t; \hat{α})$, and $\hat{G} (t | {\tilde{Z}}_{i})$ as follows:

Lemma 2.1

Assume Conditions (a)-(e) in the Appendix. Then, we have $‖ \hat{α} - α ‖ = O_{p} (\sqrt{d_{n} / n})$, $\sup_{t} | {\hat{Λ}}_{0}^{G} (t; \hat{α}) - Λ_{0}^{G} (t) | = O_{p} (\sqrt{d_{n} / n})$, and $\sup_{t} | \hat{G} (t | \tilde{Z}) - G (t | \tilde{Z}) | = O_{p} (\sqrt{d_{n} / n})$.

When the censoring distribution G does not depend on any covariates, the Kaplan-Meier estimator can be used instead of the Breslow estimator. The proof of Lemma 2.1 can be found in the online Supplementary Materials.
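For the covariate-independent case, the Kaplan-Meier estimate of the censoring survival function can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `km_censoring`, the toy data, and the right-continuous convention for the step function are assumptions made here. Censoring is treated as the event, and the at-risk indicator follows Y_i(t) = I(X_i ≥ t).

```python
def km_censoring(times, censored):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    times:    observed times X_i = min(T_i, C_i)
    censored: 1 if subject i was censored (C_i <= T_i), else 0
    Returns a right-continuous step function t -> G_hat(t).
    """
    jump_times = sorted({t for t, c in zip(times, censored) if c == 1})
    steps = []  # (jump time, value of G_hat from that time onward)
    surv = 1.0
    for u in jump_times:
        at_risk = sum(1 for t in times if t >= u)  # sum_i Y_i(u)
        d_cens = sum(1 for t, c in zip(times, censored) if t == u and c == 1)
        surv *= 1.0 - d_cens / at_risk
        steps.append((u, surv))

    def g_hat(t):
        value = 1.0
        for u, s in steps:
            if u <= t:
                value = s
            else:
                break
        return value

    return g_hat


# Toy data (hypothetical): six subjects, two of them censored.
times = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
censored = [0, 1, 0, 0, 1, 0]
g_hat = km_censoring(times, censored)
```

The values g_hat(X_i) would then play the role of the inverse-probability-of-censoring weights in the estimating equation.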

Next, we define some notation for group variables and their memberships. Assume that we have K groups of variables. Let A₁,…,A_K be subsets of {1, …, d_n} representing group memberships of variables, where A_k’s may overlap. Define β_A(τ) = {β_j(τ); j ∈ A}^T and β_A,0(τ) = {β_j,0(τ); j ∈ A}^T for a set A. To distinguish the individual memberships between non-zero β_j,0(τ)’s and zero β_j,0(τ)’s, we define B₁ and B₂ such that β_j,0(τ) ≠ 0 if j ∈ B₁ and β_j,0(τ) = 0 if j ∈ B₂. To distinguish the group memberships between non-zero $β_{A_{k}, 0} (τ)$’s and zero $β_{A_{k}, 0} (τ)$’s, without loss of generality we further define E₁ and E₂ such that $E_{1} = \cup_{k = 1}^{K_{1}} A_{k}$ and $E_{2} = \cup_{k = K_{1} + 1}^{K} A_{k}$, where $β_{A_{k}, 0} (τ) \neq 0$ for 1 ≤ k ≤ K₁ and $β_{A_{k}, 0} (τ) = 0$ for K₁ + 1 ≤ k ≤ K.

To estimate β(τ), Peng and Fine [1] considered the estimating equation S_n(b, τ) = 0, where

S_{n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} [\frac{I \{X_{i} \leq g (Z_{i}^{T} b)\} I (δ_{i} = 1)}{\hat{G} (X_{i} | {\tilde{Z}}_{i})} - τ] .

(1)

To solve S_n(b, τ) = 0, Peng and Fine [1] proposed the following L₁-type convex function:

U_{n} (b, τ) = \sum_{i = 1}^{n} I (δ_{i} = 1) | \frac{g^{- 1} (X_{i}) - b^{T} Z_{i}}{\hat{G} (X_{i} | {\tilde{Z}}_{i})} | + | M - b^{T} \sum_{i = 1}^{n} \frac{- Z_{i} I (δ_{i} = 1)}{\hat{G} (X_{i} | {\tilde{Z}}_{i})} | + | M - b^{T} \sum_{i = 1}^{n} 2 Z_{i} τ |,

where M is a very large positive number to bound $| b^{T} \sum_{i = 1}^{n} - Z_{i} I (δ_{i} = 1) / \hat{G} (X_{i} | {\tilde{Z}}_{i}) |$ and $| b^{T} \sum_{i = 1}^{n} 2 Z_{i} τ |$ for all b’s in the parameter space for β₀(τ). They studied the consistency and the asymptotic normality of the estimator of β₀(τ) obtained by solving S_n(b, τ) = 0 when G is non-covariate dependent and d_n is fixed.
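Because U_n(b, τ) is a sum of absolute values of functions linear in b, it can be written as an ordinary least-absolute-deviations objective on an augmented data set: one pseudo-observation per cause-1 event plus two extra rows carrying M. The sketch below builds that augmented data and evaluates U_n. The function names, the toy data, the log link g⁻¹ = log, and the value of `big_m` are assumptions for illustration only.

```python
import math


def augmented_l1_data(x, delta, z, g_hat, tau, g_inv=math.log, big_m=1e6):
    """Return (ys, rows) such that U_n(b, tau) = sum_r |ys[r] - b'rows[r]|."""
    p = len(z[0])
    ys, rows = [], []
    v1 = [0.0] * p  # sum_i -Z_i I(delta_i = 1) / G_hat(X_i)
    v2 = [0.0] * p  # sum_i 2 Z_i tau
    for xi, di, zi, gi in zip(x, delta, z, g_hat):
        if di == 1:
            ys.append(g_inv(xi) / gi)
            rows.append([zij / gi for zij in zi])
            for j in range(p):
                v1[j] -= zi[j] / gi
        for j in range(p):
            v2[j] += 2.0 * tau * zi[j]
    ys.extend([big_m, big_m])
    rows.extend([v1, v2])
    return ys, rows


def u_n(b, ys, rows):
    """Evaluate the L1 objective on the augmented data."""
    return sum(abs(y - sum(bj * rj for bj, rj in zip(b, row)))
               for y, row in zip(ys, rows))


# Toy data (hypothetical): Z_i = (1, Z_i1), with G_hat(X_i) given.
x = [1.0, 2.0, 4.0]
delta = [1, 0, 1]
z = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
g_hat = [1.0, 0.8, 0.5]
ys, rows = augmented_l1_data(x, delta, z, g_hat, tau=0.25)
b = [0.1, 0.2]
value = u_n(b, ys, rows)
```

Any median-regression routine applied to (ys, rows) without an extra intercept would then minimize U_n, provided `big_m` is large enough that the last two residuals stay positive.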

To select variables at bi-level, we propose the following penalized function:

W_{n} (b, τ) = U_{n} (b, τ) + λ_{n} \sum_{k = 1}^{K} c_{k} {(\sum_{j \in A_{k}} \frac{| b_{j} |}{{| {\tilde{β}}_{j} (τ) |}^{ν}})}^{γ},

(2)

where ${\tilde{β}}_{j} (τ)$ is a consistent estimator of β_j,0(τ), ν ≥ 0, λ_n > 0, and 0 < γ < 1. Following Huang et al. [12], we set $c_{k} \propto {| A_{k} |}^{1 - γ}$, where |A| is the cardinality of A. If ν = 0, the penalty term is the group bridge penalty of Huang et al. [12] and Huang et al. [17]. When ν > 0, we call the penalty term the adaptive group bridge penalty. The adaptive group bridge reduces to i) individual variable selection when |A_k| = 1 for all k; and ii) the adaptive hierarchical lasso penalty of Zhou and Zhu [13] when γ = 1/2 and c_k = 1 for all k.

Minimizing W_n(b, τ) can be reformulated as minimizing
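As an illustration, the penalty term in (2) can be evaluated as in the sketch below. The function name and the toy inputs are hypothetical, indices are 0-based (unlike the 1-based indexing in the text), c_k is taken to be exactly |A_k|^(1−γ), and the initial estimates are assumed non-zero whenever ν > 0.

```python
import math


def adaptive_group_bridge_penalty(b, groups, beta_tilde, lam, gamma, nu):
    """lam * sum_k c_k * (sum_{j in A_k} |b_j| / |beta_tilde_j|^nu)^gamma."""
    total = 0.0
    for idx in groups:
        c_k = len(idx) ** (1.0 - gamma)  # assumption: c_k = |A_k|^(1 - gamma)
        inner = sum(abs(b[j]) / abs(beta_tilde[j]) ** nu for j in idx)
        total += c_k * inner ** gamma
    return lam * total


# Two groups of two variables each; only the first coefficient is non-zero.
b = [1.0, 0.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
beta_tilde = [0.8, 0.1, 0.05, 0.02]

# nu = 0 gives the (non-adaptive) group bridge penalty.
pen_gb = adaptive_group_bridge_penalty(b, groups, beta_tilde, 1.0, 0.5, 0.0)
# nu = 1 rescales each |b_j| by the inverse of its initial estimate.
pen_agb = adaptive_group_bridge_penalty(b, groups, beta_tilde, 1.0, 0.5, 1.0)
```

With ν > 0, coefficients whose initial estimates are close to zero are penalized more heavily, which is what drives within-group selection.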

{\tilde{W}}_{n} (b, θ, τ) = U_{n} (b, τ) + \sum_{k = 1}^{K} θ_{k}^{1 - 1 / γ} c_{k}^{1 / γ} \sum_{j \in A_{k}} \frac{| b_{j} |}{{| {\tilde{β}}_{j} (τ) |}^{ν}} + ζ_{n} \sum_{k = 1}^{K} θ_{k},

(3)

where θ = (θ₁, …, θ_K)^T. By defining

θ_{k} = c_{k} {(\frac{1 - γ}{ζ_{n} γ})}^{γ} {(\sum_{j \in A_{k}} \frac{| β_{j} (τ) |}{{| {\tilde{β}}_{j} (τ) |}^{ν}})}^{γ}, k = 1, \dots, K,

we can show the following lemma similarly to Proposition 1 of Huang et al. [12] and thus its proof is omitted:

Lemma 2.2

Assume that $λ_{n} = ζ_{n}^{1 - γ} γ^{- γ} {(1 - γ)}^{γ - 1}$ for 0 < γ < 1. Then, $\hat{β} (τ)$ minimizes W_n (b, τ) if and only if $\{\hat{β} (τ), \hat{θ}\}$ minimizes ${\tilde{W}}_{n} (b, θ, τ)$, where θ_k > 0 and ${\hat{θ}}_{k} > 0$ for k = 1,…, K.

Define ${\tilde{S}}_{n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} [F_{1} \{g (Z_{i}^{T} b) | Z_{i}\} - τ]$. Denote $\nabla {\tilde{S}}_{n} (b, τ)$ as the first derivative of ${\tilde{S}}_{n} (b, τ)$ with respect to b. We first study the oracle property of the group bridge estimator given τ. We assume that
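The relation between ζ_n and λ_n in Lemma 2.2 can be checked by profiling θ_k out of the reformulated objective for a single group. The following is only a sketch: s_k and φ_k are shorthand introduced here (not the paper's notation), and it assumes that the weighted sum s_k enters the reformulated objective linearly, as in Proposition 1 of Huang et al. [12].

```latex
\begin{aligned}
s_k &= \sum_{j \in A_k} \frac{|b_j|}{|\tilde{β}_j(τ)|^{ν}}, \qquad
φ_k(θ_k) = θ_k^{1 - 1/γ} c_k^{1/γ} s_k + ζ_n θ_k, \\
φ_k'(θ_k) = 0 \;&\Longrightarrow\;
θ_k = c_k \Big(\frac{1 - γ}{ζ_n γ}\Big)^{γ} s_k^{γ}, \\
φ_k(θ_k) &= c_k s_k^{γ} ζ_n^{1 - γ}
  \big\{ γ^{1 - γ} (1 - γ)^{γ - 1} + γ^{-γ} (1 - γ)^{γ} \big\}
  = ζ_n^{1 - γ} γ^{-γ} (1 - γ)^{γ - 1} c_k s_k^{γ}
  = λ_n c_k s_k^{γ}.
\end{aligned}
```

Summing over k recovers the penalty in (2), which is why the two minimization problems share the same minimizer in b.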

(C1) There exists ω > 0 such that $P (C = ω | \tilde{Z}) \geq c > 0$ and $P (C > ω | \tilde{Z}) = 0$ for any $\tilde{Z}$.
(C2) Z_ij and β_j,0(τ) are uniformly bounded for j = 1, …, d_n.
(C3) f₁(t|z) is bounded above uniformly in t and z, where f₁(t|z) = dF₁(t|z)/dt.
(C4) Define $H (b) = E \{n^{- 1 / 2} \nabla {\tilde{S}}_{n} (b, τ)\} = E [Z^{\otimes 2} f_{1} \{g (Z^{T} b) | Z\} g^{'} (Z^{T} b)]$. For some ρ₀ > 0, C₁ > 0, and C₂ > 0, $\inf_{b ∈ B (ρ_{0})} κ \{H (b)\} \geq C_{1}$ and $\sup_{b ∈ B (ρ_{0})} κ \{H (b)\} \leq C_{2} < \infty$, where $B (ρ_{0}) = \{b ∈ ℝ^{d_{n} + 1} : ‖ b - β_{0} (τ) ‖ \leq ρ_{0}\}$ and κ(H) is the eigenvalue of a matrix H.
(C5) Σ(τ) = Var{S_n (b,τ)}. There exist C₃ > 0 and C₄ > 0 such that $\inf_{β ∈ B (ρ_{0})} κ \{Σ (τ)\} \geq C_{3}$ and $\sup_{β ∈ B (ρ_{0})} κ \{Σ (τ)\} \leq C_{4} < \infty$, where ρ₀ > 0.
(C6) There exists a constant C₅ > 0 such that $\sup_{b ∈ B (ρ_{0}), 0 \leq i \leq d_{n}} n^{- 1} Cov \{\nabla {\tilde{S}}_{n, i j} (b, τ), \nabla {\tilde{S}}_{n, i j^{'}} (b, τ)\} \leq C_{5} < \infty$ for all 0 < j, j′ < d_n, where $\nabla {\tilde{S}}_{n, i j} (b, τ)$ is the (i, j)th entry of $\nabla {\tilde{S}}_{n} (b, τ)$.
(C7) $d_{n}^{4} / n \to 0$.
(C8) $C_{n}^{*} = \max_{j} \sum_{k = 1}^{K} I (j ∈ A_{k})$ is bounded and $\frac{λ_{n}^{2}}{n} \sum_{k = 1}^{K_{1}} c_{k}^{2} \{\sum_{j ∈ A_{k}} | β_{j, 0} (τ) |\}^{2 γ - 2} | A_{k} | \leq d_{n} M_{n}$, M_n = O_p (1), where $λ_{n} / [n^{γ / 2} κ_{\max} \{Σ (τ)\} d_{n}^{1 - γ / 2}] \to \infty$ as n → ∞.
(C9) $λ_{n} n^{- 1 / 2} \to 0$, $1 / κ_{\min} \{Σ (τ)\} + κ_{\max} \{Σ (τ)\} + \sum_{k = 1}^{K} c_{k}^{2} = O (1)$, and $λ_{n} / (n^{γ / 2} d_{n}^{1 - γ / 2}) \to \infty$ as n → ∞.

(C1)−(C5) are similar to the standard conditions for the competing risks quantile regression of Peng and Fine [1]. Peng and Fine [1] suggested using a truncated censoring time C = min(C, L) for ω in (C1) so that (C1) is always satisfied. In practice, ω can be chosen as large as possible so that only small information loss occurs [1]. (C4)−(C6) and (C8) control the behavior of the estimating equation as d_n grows. Similar conditions to (C4)−(C8) were used to allow d_n to diverge as n → ∞ in Cai et al. [18], Huang et al. [12], and Huang et al. [17]. (C5), (C6), and (C9) restrict the variability of Var{S_n(b,τ)} and $Var \{{\tilde{S}}_{n} (b, τ)\}$ as n and d_n increase. (C8) and (C9) control λ_n, the number of variables within group, and the magnitude of the true parameters in non-zero groups, which were used in Huang et al. [17]. The variance matrix Σ(τ) in Condition (C5) can be specified as follows: Define e_G(α₀,t) and A(α₀) as in the Appendix. We further define $h (t, u, Z_{i}) = \exp (α_{0}^{T} Z_{i}) \int_{u}^{t} \{Z_{i} - e_{G} (α_{0}, u)\} d Λ_{0}^{G} (v)$, $M_{i}^{G} (t) = N_{i}^{G} (t) - \int_{0}^{t} Y_{i} (u) \exp (α_{0}^{T} Z_{i}) d Λ_{0}^{G} (u)$, and

q (t) = E (\frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{L} [h^{T} (t, 0, Z_{i}) A {(α_{0})}^{- 1} \{Z_{i} - e_{G} (α_{0}, t)\} + \frac{\exp (α_{0}^{T} Z_{i}) I (u \leq t)}{s_{G}^{(0)} (α_{0}, u)}] d M_{i}^{G} (u)),

and $w_{i} (b) = Z_{i} I \{X_{i} \leq g (Z_{i}^{T} b)\} I (δ_{i} = 1) q (X_{i}) / G (X_{i} | Z_{i})$. Then, $Σ (τ) = E \{η_{1} (τ) η_{1} {(τ)}^{T}\}$, where $η_{i} (τ) = Z_{i} [I \{X_{i} \leq g (Z_{i}^{T} β_{0} (τ))\} I (δ_{i} = 1) / G (X_{i} | Z_{i}) - τ] + w_{i} \{β_{0} (τ)\}$. The online Supplementary Materials include the detailed derivation of η_i(τ) and the asymptotic normality of the estimator obtained by solving S_n(b,τ) = 0 for fixed d_n. Denote convergence in distribution by →_d.

First of all, the following lemma shows the consistency of the estimator obtained by solving S_n (b,τ) = 0 when d_n diverges as n ⟶ ∞:

Lemma 2.3

Let $\tilde{β} (τ)$ be the estimator obtained by solving S_n (b, τ) = 0. Then, under the conditions (C1) − (C7), we have $‖ \tilde{β} (τ) - β_{0} (τ) ‖ = O_{p} (\sqrt{d_{n} / n})$ .

The proof of Lemma 2.3 can be found in the online Supplementary Materials. Peng and Fine [1] studied the consistency of $\tilde{β} (τ)$ for non-covariate dependent censoring with fixed number of covariates. Lemma 2.3 extends their result to covariate-dependent censoring with diverging d_n. Similarly to Huang et al. [17], we have the following theorem for the group bridge estimator given τ:

Theorem 2.4

Assume ν = 0 in (2). Under (C1) − (C9), we have

Consistency: $‖ \hat{β} (τ) - β_{0} (τ) ‖ = O_{p} (\sqrt{d_{n} / n})$.
Group variable selection consistency: $P \{{\hat{β}}_{E_{2}} (τ) = 0\} \to 1$.
Asymptotic distribution: for fixed unknown $\{E_{1}, β_{E_{1}, 0}\}$,
$\sqrt{n} \{{\hat{β}}_{E_{1}} (τ) - β_{E_{1}, 0} (τ)\} \to_{d} N [0, H_{11}^{*} \{β_{0} (τ)\}^{- 1} Σ_{11}^{*} (τ) H_{11}^{*} \{β_{0} (τ)\}^{- 1}],$
where $H_{11}^{*} \{β_{0} (τ)\}$ and $Σ_{11}^{*} (τ)$ are the leading |E₁| × |E₁| submatrices of H{β₀(τ)} and Σ(τ), respectively.

Using Lemma 2.3, Theorem 2.4 can be shown similarly to the proofs of Theorems 1 and 2 of Huang et al. [17] and thus its proof is omitted. Theorem 2.4 shows the group variable selection consistency and the $\sqrt{n / d_{n}}$-consistency of the group bridge estimator.

Although the group bridge can consistently select non-zero group variables, it may not effectively eliminate zero individual variables within non-zero group variables. This may be improved by using ν > 0 in (2), that is, the adaptive group bridge penalty. For the adaptive group bridge, we have the following theorem given τ:

Theorem 2.5

Assume ν > 0 in (2). In addition to (C1) − (C7), we assume

(C8b) For some ν₁ and ν₂ such that 0 < ν₁ < 1, 0 < ν₂, and ν₂/(1 − ν₁) < ν, $\min_{j ∈ B_{1}} | β_{j, 0} (τ) | = O_{p} \{{(d_{n} / n)}^{ν_{1} / 2}\}$, $\max_{k} | A_{k} \cap B_{1} | = O \{{(n / d_{n})}^{ν_{2} / 2}\}$, and

\sum_{k = 1}^{K_{1}} c_{k} \{{(\sum_{j ∈ A_{k} \cap B_{1}} {| β_{j, 0} (τ) |}^{1 - ν})}^{γ - 1} \sum_{j ∈ A_{k} \cap B_{1}} \frac{1}{{| β_{j, 0} (τ) |}^{ν}}\} = O_{p} (\sqrt{d_{n}}) .

(C9b) $λ_{n} / \sqrt{n} \to 0$, $\sqrt{n / d_{n}} {\tilde{β}}_{j} = O_{p} (1)$, and $\min (λ_{n} n^{(ν - 1) / 2} d_{n}^{- (1 + ν) / 2}, λ_{n} n^{γ (ν - 1) / 2} d_{n}^{- 1 + γ (1 - ν) / 2}) \to \infty$.

Then, we have

Consistency: $‖ \hat{β} (τ) - β_{0} (τ) ‖ = O_{p} (\sqrt{d_{n} / n})$.
Bi-level variable selection consistency: $P \{{\hat{β}}_{B_{2}} (τ) = 0\} \to 1$.
Asymptotic distribution: for fixed unknown $\{B_{1}, β_{B_{1}, 0}\}$,
$\sqrt{n} \{{\hat{β}}_{B_{1}} (τ) - β_{B_{1}, 0} (τ)\} \to_{d} N [0, H_{11} \{β_{0} (τ)\}^{- 1} Σ_{11} (τ) H_{11} \{β_{0} (τ)\}^{- 1}],$
where H₁₁{β₀(τ)} and Σ₁₁(τ) are the leading |B₁| × |B₁| submatrices of H{β₀(τ)} and Σ(τ), respectively.

The proof of Theorem 2.5 can be found in the online Supplementary Materials. (C8b) controls the magnitude of non-zero parameters and the number of non-zero parameters. It requires that the smallest magnitude of the non-zero parameters does not shrink towards zero too fast. (C9b) controls λ_n and ν as n → ∞ to obtain the oracle property. Theorem 2.5 provides the oracle property of the adaptive group bridge estimator. In particular, it shows that the adaptive group bridge consistently identifies not only non-zero group variables, but also non-zero within-group variables.

To obtain $\hat{β}$ , we minimize ${\tilde{W}}_{n} (b, θ, τ)$ of (3). Then, the optimization algorithm is as follows:

(1) Obtain a consistent estimator $\tilde{β} (τ)$ and an initial value β⁽⁰⁾(τ) from Peng and Fine [1] or the group bridge.
(2) Compute
$θ_{k}^{(i)} = c_{k} {(\frac{1 - γ}{ζ_{n} γ})}^{γ} {(\sum_{j \in A_{k}} \frac{| β_{j}^{(i)} (τ) |}{{| {\tilde{β}}_{j} (τ) |}^{ν}})}^{γ}, k = 1, \dots, K .$
(3) Obtain β⁽ⁱ⁺¹⁾(τ) by minimizing ${\tilde{W}}_{n} (b, θ^{(i)}, τ)$ with respect to b.
(4) Repeat (2)–(3) until ‖β⁽ⁱ⁺¹⁾(τ) − β⁽ⁱ⁾(τ)‖ < 10⁻⁴.
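The θ update and the resulting per-coefficient L₁ weights for the next minimization over b can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and toy values are hypothetical, indices are 0-based, c_k is taken as |A_k|^(1−γ), and the weighted sum is assumed to enter the objective in (3) linearly for fixed θ, so that the minimization over b is a weighted-L₁-penalized version of U_n. A group with θ_k = 0 receives an infinite weight, meaning its coefficients stay at zero.

```python
import math


def theta_update(beta, beta_tilde, groups, zeta, gamma, nu):
    """theta_k = c_k ((1-gamma)/(zeta*gamma))^gamma (sum_j |beta_j|/|beta_tilde_j|^nu)^gamma."""
    thetas = []
    for idx in groups:
        c_k = len(idx) ** (1.0 - gamma)
        s_k = sum(abs(beta[j]) / abs(beta_tilde[j]) ** nu for j in idx)
        thetas.append(c_k * ((1.0 - gamma) / (zeta * gamma)) ** gamma * s_k ** gamma)
    return thetas


def l1_weights(thetas, beta_tilde, groups, gamma, nu):
    """Weight on |b_j| implied by fixed theta: theta_k^(1-1/gamma) c_k^(1/gamma) / |beta_tilde_j|^nu."""
    w = [0.0] * len(beta_tilde)
    for theta_k, idx in zip(thetas, groups):
        c_k = len(idx) ** (1.0 - gamma)
        for j in idx:
            if theta_k == 0.0:
                w[j] = math.inf  # whole group is held at zero
            else:
                w[j] += (theta_k ** (1.0 - 1.0 / gamma) * c_k ** (1.0 / gamma)
                         / abs(beta_tilde[j]) ** nu)
    return w


# Toy iterate (hypothetical): group 1 active, group 2 already zeroed out.
beta = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
beta_tilde = [0.9, -1.1, 0.05, 0.02, -0.03, 0.01]
groups = [[0, 1, 2], [3, 4, 5]]
thetas = theta_update(beta, beta_tilde, groups, zeta=1.0, gamma=0.5, nu=1.0)
weights = l1_weights(thetas, beta_tilde, groups, gamma=0.5, nu=1.0)
```

In this toy case the third coefficient, whose initial estimate is small, receives a much larger weight than the first two, so it is pushed towards zero more strongly in the next minimization.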

The minimization in Step 3 can be implemented using R package quantreg [19]. To choose a tuning parameter ζ_n in (3), we propose the following BIC-type criterion motivated by Lee et al. [20] and Shows et al. [21]:

\frac{2}{n} U_{n} \{\hat{β} (τ), τ\} + C \log (d_{n}) p_{n} \frac{\log (n)}{2 n},

where p_n is the number of non-zero estimates given ζ_n and C is some positive number.
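A minimal sketch of evaluating this criterion and choosing ζ_n over a grid is given below. The function name, the zero tolerance `tol`, and the candidate values in the example are hypothetical; whether the intercept counts towards p_n is not stated in the text, so the caller decides which coefficients to pass.

```python
import math


def bic_criterion(u_value, beta_hat, n, d_n, c_const=1.5, tol=1e-8):
    """(2/n) U_n + C log(d_n) p_n log(n) / (2n), with p_n = number of non-zero estimates."""
    p_n = sum(1 for bj in beta_hat if abs(bj) > tol)
    return 2.0 * u_value / n + c_const * math.log(d_n) * p_n * math.log(n) / (2.0 * n)


# Hypothetical fits for two tuning values: (U_n value, coefficient estimates).
fits = {
    0.5: (100.0, [0.5, 0.0, 1.2]),
    2.0: (104.0, [0.4, 0.0, 0.0]),
}
scores = {zeta: bic_criterion(u, bh, n=400, d_n=27) for zeta, (u, bh) in fits.items()}
best_zeta = min(scores, key=scores.get)
```

The tuning value with the smallest criterion is retained.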

3. Simulation

We performed simulation studies under two group variable settings: i) group variables consisting of continuous variables; and ii) group variables consisting of continuous variables and categorical variables. Censoring times and event times were independently generated. Let $Z = {(1, {\tilde{Z}}^{T})}^{T}$, $β_{0}^{- 0} (τ) = \{β_{1, 0} (τ), \dots, β_{d_{n}, 0} (τ)\}^{T}$, and $ζ_{0}^{- 0} (τ) = \{ζ_{1, 0} (τ), \dots, ζ_{d_{n}, 0} (τ)\}^{T}$. Event times and cause of failure were generated as follows:

P (ε = 1) = p_{1},

P (T \leq t | ε = 1, \tilde{Z}) = Φ \{\log t - β_{0}^{- 0} {(τ)}^{T} \tilde{Z}\},

P (T \leq t | ε = 2, \tilde{Z}) = Φ \{\log t - ζ_{0}^{- 0} {(τ)}^{T} \tilde{Z}\},

\log Q_{1} (τ | Z) = Φ^{- 1} (\frac{τ}{p_{1}}) + β_{0}^{- 0} {(τ)}^{T} \tilde{Z},

G (t) = \exp (- λ_{c} α_{0}^{T} \tilde{Z} t) .

Thus, $β_{0} (τ) = \{Φ^{- 1} (τ / p_{1}), β_{0}^{- 0} {(τ)}^{T}\}^{T}$. We set $β_{0}^{- 0} (τ) = ζ_{0}^{- 0} (τ)$. Selecting non-zero β_j,0(τ) for j = 1, …, d_n was of interest in this simulation study. We selected p₁ and λ_c to generate 40% cause 1 events, 30% cause 2 events, and 30% censoring. Each simulation was run for 1000 iterations. The competing risks quantile regression of Peng and Fine [1] and the group bridge were used to estimate $\tilde{β}$. The adaptive group bridge with ν = 1 was compared to the group bridge. We evaluated the mean squared error, calculated by

MSE = \frac{1}{1000} \sum_{i = 1}^{1000} {‖ {\hat{β}}^{i, - 0} (τ) - β_{0}^{- 0} (τ) ‖}^{2},

where ${\hat{β}}^{i, - 0} (τ)$ is the estimator of $β_{0}^{- 0} (τ)$ at the ith iteration given τ. The proposed BIC-type criterion with C = 1.5 was used to select the tuning parameter. Two τ values were examined: τ = 0.1 and 0.25. We first considered Setting i) group variables consisting of continuous variables with non-covariate dependent censoring distribution, that is, α₀ = 0. We examined n = 400, 600, and 800. To generate $\tilde{Z}$ , three correlated continuous variables for each group were generated from N(0, Σ), where

Σ = \begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \end{pmatrix} .

Variables were assumed to be independent if they belonged to different groups. For n = 400, 600, and 800, there were 9, 10, and 11 groups, respectively. The true β₀(τ) for n = 400 was {β_1,0(τ), …, β_9,0(τ)}^T = (1, −1, 0, −1, 1, 0, 1, 0, 0)^T and {β_10,0(τ), …, β_27,0(τ)}^T = (0, …, 0)^T. For n = 600, we added {β_28,0(τ), β_29,0(τ), β_30,0(τ)}^T = (0,0,0)^T. For n = 800, we further added {β_31,0(τ), β_32,0(τ), β_33,0(τ)}^T = (0,0,0)^T. This setting allowed d_n to grow as n increased. The numbers of non-zero groups and non-zero individual variables of the underlying model were 3 and 5, respectively, for each n.
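The data-generating mechanism of Setting i) can be sketched for a single subject as below. Several choices are assumptions made only for illustration and are not taken from the paper: the values of `p1` and `lam_c` (the text only says they were tuned to reach the target event and censoring proportions), exponential censoring with rate `lam_c` for the covariate-independent case, and the shared-factor construction used to obtain within-group correlation 0.5.

```python
import math
import random
from statistics import NormalDist


def simulate_subject(beta, p1, lam_c, rng, group_size=3, rho=0.5):
    """Draw (X, delta, Z_tilde) for one subject under Setting i)."""
    z = []
    for _ in range(len(beta) // group_size):
        shared = rng.gauss(0.0, 1.0)  # common factor gives pairwise correlation rho
        z += [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(group_size)]
    eps = 1 if rng.random() < p1 else 2
    # Both causes share the same coefficients, so log T = beta'Z + N(0, 1).
    t = math.exp(sum(bj * zj for bj, zj in zip(beta, z)) + rng.gauss(0.0, 1.0))
    c = rng.expovariate(lam_c)  # ASSUMPTION: exponential censoring, rate lam_c
    x = min(t, c)
    delta = 1 if (t <= c and eps == 1) else 0
    return x, delta, z


def true_log_quantile(tau, p1, beta, z):
    """log Q_1(tau | Z) = Phi^{-1}(tau / p1) + beta'Z_tilde (requires tau < p1)."""
    return NormalDist().inv_cdf(tau / p1) + sum(bj * zj for bj, zj in zip(beta, z))


# Coefficients of the n = 400 design: 9 groups of 3 variables each.
beta_true = [1.0, -1.0, 0.0, -1.0, 1.0, 0.0, 1.0, 0.0, 0.0] + [0.0] * 18
rng = random.Random(0)
x_demo, delta_demo, z_demo = simulate_subject(beta_true, p1=0.57, lam_c=0.3, rng=rng)
```

Repeating the draw n times yields one simulated data set; in practice `p1` and `lam_c` would be calibrated by trial runs to hit the desired proportions.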

Table 1 summarizes the simulation results. “AGB-CQ”, “AGB-GB”, and “GB” indicate the adaptive group bridge with $\tilde{β} (τ)$ from Peng and Fine [1], the adaptive group bridge with $\tilde{β} (τ)$ from the group bridge, and the group bridge, respectively. “% Corr. Group” and “% Corr. Individual” represent the proportions of iterations in which the corresponding variable selection method correctly identified the non-zero group variables and non-zero individual variables of the underlying model, respectively. “Group Size” and “Model Size” are the mean numbers of groups and individual variables selected by each variable selection method, respectively. “MSER” is the ratio of the median MSE of each variable selection method to that of the oracle estimator. The adaptive group bridge and the group bridge identified the true non-zero and zero groups very well in group variable selection. The mean group sizes of the adaptive group bridge and the group bridge were very close to 3. However, the group bridge performed poorly in within-group variable selection, that is, individual variable selection. It over-identified individual variables as non-zero variables. On the other hand, the adaptive group bridge identified the true non-zero individual variables well. In addition, as n increased, the mean group sizes and the mean model sizes of the adaptive group bridge became closer to 3 and 5, respectively. The MSERs of the adaptive group bridge with $\tilde{β} (τ)$ from the group bridge were lower than those of the other methods. Furthermore, the MSERs of the adaptive group bridge generally got smaller as n increased. We also conducted a simulation under the same setting except that pairwise correlation between continuous variables was assumed to be 0.2 if they belonged to different groups. The results were similar to Table 1 and thus are not reported.
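The selection summaries reported in the tables can be computed per iteration as in the following sketch and then averaged over iterations. The function name and the tolerance are hypothetical, and "correct" is interpreted here as exact recovery of the set of non-zero groups or variables, which is an assumption about the paper's definition.

```python
def selection_summary(beta_hat, beta_true, groups, tol=1e-8):
    """Per-iteration selection metrics for one fitted coefficient vector."""
    selected = {j for j, bj in enumerate(beta_hat) if abs(bj) > tol}
    truth = {j for j, bj in enumerate(beta_true) if abs(bj) > tol}
    sel_groups = {k for k, idx in enumerate(groups) if any(j in selected for j in idx)}
    true_groups = {k for k, idx in enumerate(groups) if any(j in truth for j in idx)}
    return {
        "correct_group": sel_groups == true_groups,
        "correct_individual": selected == truth,
        "group_size": len(sel_groups),
        "model_size": len(selected),
    }


# Toy example with two groups of three variables (0-based indices).
groups = [[0, 1, 2], [3, 4, 5]]
beta_true = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
exact = selection_summary([0.9, -1.2, 0.0, 0.0, 0.0, 0.0], beta_true, groups)
over = selection_summary([0.9, -1.2, 0.1, 0.0, 0.0, 0.0], beta_true, groups)
```

The second fit selects the right group but one extra within-group variable, which is the pattern reported for the group bridge.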

Table 1.

Simulation results for group variables consisting of continuous variables with G(X).

τ	n	Method	% Corr. Group	% Corr. Individual	Group Size	Model Size	MSER
0.1	400	AGB-CQ	0.995	0.987	3.005	5.013	2.108
		AGB-GB	0.991	0.976	3.001	5.011	1.159
		GB	0.988	0.554	3.012	5.570	2.451
	600	AGB-CQ	0.996	0.996	3.004	5.005	2.281
		AGB-GB	0.996	0.992	2.999	5.000	1.056
		GB	0.994	0.622	3.008	5.472	2.491
	800	AGB-CQ	0.997	0.996	3.003	5.004	1.834
		AGB-GB	0.997	0.995	3.003	5.005	1.024
		GB	0.997	0.627	3.003	5.434	2.315
0.25	400	AGB-CQ	0.917	0.855	3.097	5.185	1.316
		AGB-GB	0.931	0.871	3.076	5.150	1.041
		GB	0.945	0.363	3.062	5.963	1.543
	600	AGB-CQ	0.936	0.884	3.073	5.141	1.293
		AGB-GB	0.948	0.892	3.054	5.115	1.077
		GB	0.960	0.358	3.041	5.977	1.583
	800	AGB-CQ	0.949	0.905	3.059	5.111	1.041
		AGB-GB	0.966	0.929	3.038	5.085	1.006
		GB	0.973	0.404	3.028	5.875	1.463


Next, we performed a simulation study for Setting ii) group variables consisting of continuous variables and categorical variables with non-covariate dependent censoring distribution. We examined 3 sample sizes: n = 600, 900, and 1200. For n = 600, there were 10 groups: 5 groups consisting of continuous variables (Groups 1 to 5) and 5 groups consisting of categorical variables (Groups 6 to 10). Groups 1 and 2 contained 6 continuous variables each and Groups 3 to 5 comprised 3 continuous variables each. The pairwise correlation among continuous variables within a group was 0.5. There was no correlation between continuous variables if they belonged to different groups. Groups 6 and 7 consisted of 7 categories each (that is, 6 indicator variables each) and Groups 8 to 10 had 4 categories each (that is, 3 indicator variables each). The reference group for each categorical variable was set to 0. Thus, there were 42 variables in total. The true β₀(τ) for n = 600 was {β_1,0(τ), …, β_6,0(τ)}^T = (1, −1, 0, …, 0)^T, {β_7,0(τ), …, β_12,0(τ)}^T = (0, …, 0)^T, {β_13,0(τ), β_14,0(τ), β_15,0(τ)}^T = (1,0,0)^T, {β_16,0(τ), …, β_21,0(τ)}^T = (0, …, 0)^T, {β_22,0(τ), …, β_27,0(τ)}^T = (1, −1, 0, …, 0)^T, {β_28,0(τ), …, β_33,0(τ)}^T = (0, …, 0)^T, {β_34,0(τ), β_35,0(τ), β_36,0(τ)}^T = (1,0,0)^T, and {β_37,0(τ), …, β_42,0(τ)}^T = (0, …, 0)^T. For n = 900, we added one more group consisting of 3 continuous variables with pairwise correlation 0.5 and {β_43,0(τ), β_44,0(τ), β_45,0(τ)}^T = (0,0,0)^T. For n = 1200, in addition to {β_43,0(τ), β_44,0(τ), β_45,0(τ)}^T, we further added a categorical variable having 4 categories, that is, 3 indicator variables: {β_46,0(τ), β_47,0(τ), β_48,0(τ)}^T = (0,0,0)^T. Thus, the numbers of non-zero groups and non-zero individual variables of the underlying model were 4 and 6, respectively, for each n.

Table 2 shows the simulation results. The adaptive group bridge identified the true non-zero and zero groups better than the group bridge when n = 600 and 900 for τ = 0.1, and n = 600 for τ = 0.25. When n = 1200, both methods selected non-zero group variables very well. The mean group sizes of the adaptive group bridge were very close to 4. The group bridge performed poorly in individual variable selection as in Setting i). On the other hand, the adaptive group bridge identified the true non-zero individual variables well. In addition, as n increased, the mean group sizes and the mean model sizes of the adaptive group bridge became closer to 4 and 6, respectively. The MSERs of the adaptive group bridge with $\tilde{β} (τ)$ from the group bridge were lower than those of the other methods. In addition, the MSERs of the adaptive group bridge generally got smaller as n increased.

Table 2.

Simulation results for group variables consisting of continuous and categorical variables with G(X).

τ	n	Method	% Corr. Group	% Corr. Individual	Group Size	Model Size	MSER
0.1	600	AGB-CQ	0.776	0.500	3.737	5.354	4.061
		AGB-GB	0.870	0.652	3.861	5.603	1.621
		GB	0.536	0.136	3.348	5.485	8.226
	900	AGB-CQ	0.952	0.809	3.955	5.835	3.216
		AGB-GB	0.989	0.913	3.989	5.934	1.344
		GB	0.830	0.306	3.798	6.386	4.387
	1200	AGB-CQ	0.990	0.910	3.998	5.940	3.283
		AGB-GB	0.999	0.973	4.001	5.989	1.248
		GB	0.955	0.383	3.955	6.704	4.147
0.25	600	AGB-CQ	0.933	0.647	4.015	6.097	1.925
		AGB-GB	0.955	0.768	4.023	6.098	1.421
		GB	0.881	0.187	3.897	7.187	2.660
	900	AGB-CQ	0.974	0.810	4.025	6.163	1.681
		AGB-GB	0.979	0.879	4.019	6.118	1.212
		GB	0.979	0.220	3.997	7.393	2.484
	1200	AGB-CQ	0.965	0.832	4.033	6.167	1.519
		AGB-GB	0.964	0.888	4.034	6.126	1.250
		GB	0.991	0.264	4.005	7.370	2.730


Last, we performed a simulation study for Setting ii) with covariate-dependent censoring distribution. We used the same β₀(τ) as in Setting ii) with non-covariate dependent censoring distribution. The true α₀ for $G (t | \tilde{Z})$ when n = 600 was (α_1,0, …, α_6,0)^T = (1,−1,0, …, 0)^T, (α_7,0, …, α_21,0)^T = (0, …, 0)^T, (α_22,0, …, α_27,0)^T = (1,−1, 0, …, 0)^T, and (α_28,0, …, α_42,0)^T = (0, …, 0)^T. For n = 900 and 1200, we added (α_43,0, α_44,0, α_45,0)^T = (0, 0, 0)^T and (α_46,0, α_47,0, α_48,0)^T = (0, 0, 0)^T, respectively. The Breslow-type estimator was used to estimate $G (t | \tilde{Z})$ . We selected p₁ and λ_c to generate 50% cause 1 events, 20% cause 2 events, and 30% censoring. Table 3 summarizes the simulation results. In general, the results were similar to Table 2. The adaptive group bridge performed better than the group bridge in terms of individual variable selection and MSER.

Table 3.

Simulation results for group variables consisting of continuous and categorical variables with $G (X | \tilde{Z})$

τ	n	Method	% Corr. Group	% Corr. Individual	Group Size	Model Size	MSER
0.1	600	AGB-CQ	0.769	0.547	3.725	5.393	3.096
		AGB-GB	0.862	0.687	3.850	5.615	1.625
		GB	0.533	0.155	3.372	5.538	7.692
	900	AGB-CQ	0.956	0.830	3.976	5.859	2.504
		AGB-GB	0.981	0.902	3.991	5.915	1.356
		GB	0.796	0.287	3.849	6.411	4.049
	1200	AGB-CQ	0.996	0.912	3.996	5.945	3.250
		AGB-GB	0.998	0.963	4.000	5.982	1.248
		GB	0.949	0.370	3.946	6.669	3.972
0.25	600	AGB-CQ	0.940	0.674	4.007	6.040	1.881
		AGB-GB	0.963	0.803	4.024	6.051	1.346
		GB	0.907	0.210	3.914	7.179	2.376
	900	AGB-CQ	0.966	0.854	4.031	6.099	1.680
		AGB-GB	0.967	0.891	4.031	6.093	1.206
		GB	0.885	0.261	4.088	7.363	2.350
	1200	AGB-CQ	0.980	0.888	4.020	6.099	1.537
		AGB-GB	0.986	0.920	4.014	6.078	1.177
		GB	0.996	0.296	4.000	7.202	2.386
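
The selection summaries reported in the tables can be computed from replicated estimates along the following lines. This is a sketch under stated assumptions: a group is counted as selected when at least one of its coefficients is non-zero, "% Corr. Group" and "% Corr. Individual" are read as the proportion of replications in which the selected groups or variables exactly match the true ones, and the sizes are averages over replications. The function name, the tolerance `tol`, and the toy inputs are hypothetical; MSER is not computed here.

```python
import numpy as np

def selection_summary(beta_hat, beta_true, groups, tol=1e-8):
    """Summarize variable selection over simulation replications.

    beta_hat  : estimates, shape (n_rep, d)
    beta_true : true coefficients, shape (d,)
    groups    : group label of each coefficient, shape (d,)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    groups = np.asarray(groups)
    sel = np.abs(beta_hat) > tol                           # selected variables
    true_sel = np.abs(np.asarray(beta_true, dtype=float)) > tol
    labels = np.unique(groups)
    # Group-level selection: any non-zero coefficient within the group.
    grp_sel = np.stack([sel[:, groups == g].any(axis=1) for g in labels], axis=1)
    true_grp = np.array([true_sel[groups == g].any() for g in labels])
    return {
        "corr_group": float((grp_sel == true_grp).all(axis=1).mean()),
        "corr_individual": float((sel == true_sel).all(axis=1).mean()),
        "group_size": float(grp_sel.sum(axis=1).mean()),
        "model_size": float(sel.sum(axis=1).mean()),
    }

# Two toy replications: the second picks one extra within-group variable.
summary = selection_summary(
    beta_hat=[[0.9, 0.0, 0.0, 0.0, 1.8, 2.1],
              [0.9, 0.1, 0.0, 0.0, 1.8, 2.1]],
    beta_true=[1.0, 0.0, 0.0, 0.0, 2.0, 2.0],
    groups=[0, 0, 1, 1, 2, 2],
)
print(summary)
```

In the toy example both replications recover the correct groups, but only the first recovers the correct individual variables, which mirrors the gap between the group-level and individual-level columns for the group bridge.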


4. Bone marrow transplant data example

The adaptive group bridge was applied to a bone marrow transplant data set. Verneris et al. [10] studied the outcomes of patients who received reduced-intensity conditioning allogeneic hematopoietic cell transplantation from 1999 to 2011. We considered 2011 patients with human leukocyte antigen fully-matched unrelated donors. Relapse was the outcome of interest for the analysis. Treatment-related mortality (TRM) was a competing risk. Relapse occurred in 40.5% of patients and TRM in 26.6%, and 32.9% were censored. In addition, 69%, 16%, and 8% of relapse events occurred within 6 months, between 6 and 12 months, and between 12 and 24 months, respectively. Thus, the distribution of relapse events was skewed. The overall relapse rate at 1 year was about 35%. The 13 binary or categorical variables that we considered for variable selection were disease type, recipient age, donor age, donor-recipient sex match, donor-recipient cytomegalovirus (CMV) match, ABO blood type match, donor parity, disease status at transplant, conditioning intensity, total body irradiation, graft type, graft-versus-host disease (GVHD) prophylaxis, and in-vivo T cell depletion. They consisted of 28 indicator variables. Based on a Cox proportional hazards model, the censoring distribution did not depend on any covariates.

We selected variables for the 0.35th competing risks quantile regression for relapse using the following three selection methods: the group bridge, the adaptive group bridge with $\tilde{β} (τ)$ from Peng and Fine [1], and the adaptive group bridge with $\tilde{β} (τ)$ from the group bridge. The coefficient of each reference group was set to zero. Table 4 shows the selected variables and their estimates. The group bridge selected disease status at transplant, CMV match, conditioning intensity, in-vivo T cell depletion, graft type, and GVHD prophylaxis. On the other hand, the adaptive group bridge with $\tilde{β} (τ)$ from Peng and Fine [1] and the adaptive group bridge with $\tilde{β} (τ)$ from the group bridge selected the same variables: disease status at transplant, CMV match, conditioning intensity, and in-vivo T cell depletion. The adaptive group bridge did not select graft type and GVHD prophylaxis, which is why all of their estimates are zero in Table 4. The competing risks quantile regression of Peng and Fine [1] was fitted using the variables selected by at least one of the three methods. “CQ” in Table 4 indicates the estimates and p-values from this competing risks quantile regression. These results suggest that all variables selected by the adaptive group bridge were significant. However, graft type and GVHD prophylaxis, which only the group bridge selected, did not appear to be significant based on their p-values.

Table 4.

Selected variables and estimates. “ref” means the reference group. “CQ” indicates the competing risks quantile regression.

Variable	Subcategory	AGB-CQ Est.	AGB-GB Est.	GB Est.	CQ Est.	CQ p-value
Disease status	Early (ref)	0	0	0	0
	Intermediate	0	0	0	−0.055	0.503
	Advanced	−0.891	−0.891	−0.859	−0.460	< 0.001
CMV match	+/+ (ref)	0	0	0	0
	+/−	−0.455	−0.445	−0.476	−0.363	0.018
	−/+	0	0	0	−0.034	0.902
	−/−	0	0	0	−0.027	0.774
	Missing	0	0	0	−0.054	0.939
Conditioning intensity	Reduced intensity (ref)	0	0	0	0
	Nonmyeloablative	−1.056	−1.060	−1.165	−0.536	< 0.001
In-vivo T cell depletion	No (ref)	0	0	0	0
	Yes	−0.488	−0.488	−0.466	−0.301	0.010
Graft type	Bone marrow (ref)	0	0	0	0
	Peripheral blood	0	0	0.181	0.130	0.137
GVHD prophylaxis	FK506 ± others (ref)	0	0	0	0
	Others	0	0	0.175	0.121	0.212


5. Conclusion

The group bridge and the adaptive group bridge were proposed to select variables for the competing risks quantile regression. Their oracle property was studied. In particular, the adaptive group bridge not only consistently identifies non-zero group variables, but also consistently selects non-zero within-group variables. We also proposed the BIC-type criterion to choose a tuning parameter. The proposed BIC-type criterion appears to work properly in the simulation study. The adaptive group bridge selected non-zero within-group variables more consistently than the group bridge in the simulation study. A bone marrow transplant example showed the usefulness of the adaptive group bridge.

The proposed method is limited to the case d_n < n. Developing a group variable selection method for the case d_n > n would be a crucial research problem. A two-step variable selection procedure may be developed for this: once we screen group variables in the first step, we may use the adaptive group bridge to obtain a more parsimonious list of non-zero variables in the second step. The theoretical justification of the proposed BIC-type criterion needs to be studied in the future.

Supplementary Material

Supp info

NIHMS939135-supplement-Supp_info.pdf^{(227.1KB, pdf)}

Acknowledgments

This work was supported in part by Institutional Research Grant #14-247-29 from the American Cancer Society and the MCW Cancer Center, and the US National Cancer Institute (U24CA076518). The authors would like to thank the Associate Editor and two anonymous reviewers for their helpful comments that significantly improved the manuscript.

Appendix

For $G (t | \tilde{Z})$ and the Breslow estimator, we assume as follows:

$\int_{0}^{L} λ_{0}^{G} (t) d t < \infty$ and P{Y_i(t) = 1} > 0 for t ∈ [0, L], i = 1, …, n, and $d_{n}^{4} / n \to 0$ as n → ∞.
Z_ij is bounded almost surely for all i,j and $α^{T} \tilde{Z}$ is bounded almost surely for any $\tilde{Z}$ and α ∈ ℬ, where ℬ is a neighborhood of α₀.
For d = 0, 1, 2, there exists a neighborhood ℬ of α₀ such that $s_{G}^{(d)} (α, t)$ are continuous functions and $\sup_{t \in (0, L), α \in \mathcal{B}} ‖ S_{G}^{(d)} (α, t) - s_{G}^{(d)} (α, t) ‖ \to 0$ in probability.
The matrix $A (α_{0}) = \int_{0}^{L} v_{G} (α_{0}, t) s_{G}^{(0)} (α_{0}, t) λ_{0}^{G} (t) d t$ is positive definite, where $v_{G} (α, t) = s_{G}^{(2)} (α, t) / s_{G}^{(0)} (α, t) - e_{G} {(α, t)}^{\otimes 2}$ and $e_{G} (α, t) = s_{G}^{(1)} (α, t) / s_{G}^{(0)} (α, t)$ .
For all α ∈ ℬ, t ∈ [0, L], $S_{G}^{(1)} (α, t) = \partial S_{G}^{(0)} (α, t) / \partial α$ , and $S_{G}^{(2)} (α, t) = \partial^{2} S_{G}^{(0)} (α, t) / (\partial α \partial α^{T})$ , where $S_{G}^{(d)} (α, t)$ , d = 0, 1, 2 are continuous functions of α ∈ ℬ uniformly in t ∈ [0, L] and are bounded on ℬ × [0, L], and $s_{G}^{(0)}$ is bounded away from zero on ℬ × [0, L].
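
For readers without the main text at hand, the conditions above are consistent with the usual Cox model notation for the censoring time. The display below is a sketch under that assumption and is not quoted from the paper; its Section 2 definitions take precedence. Here $Y_{i} (t)$ is taken to be the at-risk indicator of subject i, $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$, $a^{\otimes 2} = a a^{T}$, and $s_{G}^{(d)} (α, t)$ is read as the limit of $S_{G}^{(d)} (α, t)$.

$$
S_{G}^{(d)} (α, t) = \frac{1}{n} \sum_{i = 1}^{n} Y_{i} (t) \, \tilde{Z}_{i}^{\otimes d} \exp (α^{T} \tilde{Z}_{i}), \quad d = 0, 1, 2, \qquad G (t | \tilde{Z}) = \exp \left\{ - \exp (α_{0}^{T} \tilde{Z}) \int_{0}^{t} λ_{0}^{G} (u) \, d u \right\}
$$

Under this reading, $S_{G}^{(1)}$ and $S_{G}^{(2)}$ are the first and second derivatives of $S_{G}^{(0)}$ with respect to α, which matches the last condition.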

Footnotes

Supplementary Material

Additional supplementary material may be found in the online version of this article at the publisher's web site.

References

1. Peng L, Fine JP. Competing risks quantile regression. Journal of the American Statistical Association. 2009;104:1440–1453.
2. Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008;103:637–649.
3. Reich BJ, Smith LB. Bayesian quantile regression for censored data. Biometrics. 2013;69:651–660. doi: 10.1111/biom.12053.
4. Yin G, Zeng D, Li H. Power-transformed linear quantile regression with censored data. Journal of the American Statistical Association. 2008;103:1214–1224.
5. Yin G, Cai J. Quantile regression models with multivariate failure time data. Biometrics. 2005;61:151–161. doi: 10.1111/j.0006-341X.2005.030815.x.
6. Sun Y, Wang HJ, Gilbert PB. Quantile regression for competing risks data with missing cause of failure. Statistica Sinica. 2012;22:703–728. doi: 10.5705/ss.2010.093.
7. Lee M, Fine J. Inference for cumulative incidence quantiles via parametric and nonparametric approaches. Statistics in Medicine. 2011;30:3221–3235. doi: 10.1002/sim.4349.
8. Jiang R, Qian W, Zhou Z. Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Statistics & Probability Letters. 2012;82:308–317.
9. Wang HJ, Zhou J, Li Y. Variable selection for censored quantile regression. Statistica Sinica. 2013;23:145–167. doi: 10.5705/ss.2011.100.
10. Verneris MR, Lee SJ, Ahn KW, Wang HL, Battiwalla M, Inamoto Y, Munker R, Aljurf M, Saber W, Spellman S, et al. HLA-mismatch is associated with worse outcomes after unrelated donor reduced intensity transplantation: An analysis from the CIBMTR. Biology of Blood and Marrow Transplantation. 2015;21:1783–1789. doi: 10.1016/j.bbmt.2015.05.028.
11. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B. 2006;68:49–67.
12. Huang J, Ma S, Xie H, Zhang CH. A group bridge approach for variable selection. Biometrika. 2009;96:339–355. doi: 10.1093/biomet/asp020.
13. Zhou N, Zhu J. Group variable selection via a hierarchical lasso and its oracle property. Statistics and Its Interface. 2010;3:557–574.
14. Zhao W, Zhang R, Liu J. Sparse group variable selection based on quantile hierarchical lasso. Journal of Applied Statistics. 2014;41:1658–1677.
15. Fu Z, Parikh CR, Zhou B. Penalized variable selection in competing risks regression. Lifetime Data Analysis. 2016;23:353–376. doi: 10.1007/s10985-016-9362-3.
16. Breslow NE. Discussion of the paper by D. R. Cox. Journal of the Royal Statistical Society: Series B. 1972;34:216–217.
17. Huang J, Li L, Liu Y, Zhao X. Group selection in the Cox model with a diverging number of covariates. Statistica Sinica. 2014;24:1787–1810.
18. Cai J, Fan J, Li R, Zhou H. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–316. doi: 10.1093/biomet/92.2.303.
19. Koenker R. quantreg: Quantile Regression. 2016. URL https://CRAN.R-project.org/package=quantreg. R package version 5.26.
20. Lee ER, Noh H, Park BU. Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association. 2014;109:216–229.
21. Shows JH, Lu W, Zhang HH. Sparse estimation and inference for censored median regression. Journal of Statistical Planning and Inference. 2010;140:1903–1917. doi: 10.1016/j.jspi.2010.01.043.
