Bi-level variable selection for case-cohort studies with group variables

Soyoung Kim; Kwang Woo Ahn

doi:10.1177/0962280218803654

Author manuscript; available in PMC: 2019 Oct 1.

Published in final edited form as: Stat Methods Med Res. 2018 Oct 11;28(10-11):3404–3414. doi: 10.1177/0962280218803654


PMCID: PMC6748310 NIHMSID: NIHMS1046936 PMID: 30306838

Abstract

The case-cohort design is an economical approach to estimate the effect of risk factors on the survival outcome when collecting exposure information or covariates on all patients is expensive in a large cohort study. Variables often have group structure such as categorical variables and highly correlated continuous variables. The existing literature for case-cohort data is limited to identifying non-zero variables at individual level only. In this article, we propose a bi-level variable selection method to select non-zero group and within-group variables for case-cohort data when variables have group structure. The proposed method allows the number of variables to diverge as the sample size increases. The asymptotic properties of the estimator including bi-level variable selection consistency and the asymptotic normality are shown. We also conduct simulations to compare our proposed method with some existing methods and apply them to the Busselton Health data.

Keywords: Case-cohort design, efficiency, multiple diseases, survival analysis, variable selection

1. Introduction

The case-cohort design is widely used to estimate the effect of risk factors on survival outcome when measuring exposure information is costly. Case-cohort data consist of a random sample, called the subcohort, from the full cohort and all cases outside the subcohort. In other words, the expensive covariate information is collected from subjects being in the subcohort or having events of interest outside the subcohort. When studying multiple diseases of interest, the case-cohort design capable of using the same subcohort has been advocated.¹

Extensive work has been done on analyzing case-cohort data for survival outcomes. With a single disease, a pseudo-likelihood approach was proposed by Prentice¹ and Self and Prentice.² In order to improve efficiency, Barlow³ and Kulich and Lin⁴ proposed a robust estimator using a time-varying weight and a class of weighted estimation using all available information, respectively. When several case-cohort studies have already been conducted for multiple diseases, Kang and Cai⁵ developed a joint model with multivariate failure time. However, they did not use extra information from the other diseases when estimating the effect of risk factors for a disease of interest. Kim et al.⁶ developed a more efficient estimation method with a new weight to make full use of information from the other diseases.

In spite of the progress made in developing methods for analyzing case-cohort data, the study of variable selection under the case-cohort design has been limited. Recently, Ni et al.⁷ proposed a variable selection procedure using the smoothly clipped absolute deviation (SCAD) penalty to select non-zero individual variables for a single case-cohort study. In practice, variables often have group structure such as categorical variables and highly correlated continuous variables. For example, in the Busselton Health data from Cullen⁸ and Knuiman et al.,⁹ a case-cohort study was conducted to investigate the association between serum ferritin and stroke event. It includes categorical variables such as smoking status and categorized serum ferritin levels, and continuous variables. For a continuous variable x, investigators often examine the effects of x, ..., x^a on the outcome of interest, where a is a positive integer. In practice, individual variable selection methods are used to identify important variables among x, ..., x^a.⁷,¹⁰ However, because they are highly correlated, one may treat x, ..., x^a as a group and apply a bi-level variable selection technique to identify non-zero group and within-group variables. Thus, bi-level selection may be more efficient than individual variable selection in selecting non-zero variables.

Many methods have been developed for variable selection with group structure under the proportional hazards model. Ma et al.¹¹ and Kim et al.¹² proposed the supervised group lasso and the group lasso method, respectively, which are limited to group variable selection. Huang et al.¹³ studied the group bridge to select variables at bi-level, that is, at group and within-group levels. However, they showed the group variable selection consistency only, not the bi-level variable selection consistency. An alternative approach by Wang and Nan¹⁴ considered a hierarchically penalized proportional hazards regression for bi-level variable selection. For the subdistribution hazards model with group variables, Ahn et al.¹⁵ developed the adaptive group bridge that has the bi-level variable selection consistency. However, to the best of our knowledge, there is no literature on bi-level variable selection for survival outcome under the case-cohort design.

In this article, we propose the adaptive group bridge which is capable of identifying non-zero variables at group and within-group levels for the univariate proportional hazards model under the case-cohort design. To study a disease of interest, we consider two case-cohort designs: a single case-cohort study and multiple case-cohort studies. In contrast with a single case-cohort study, multiple case-cohort studies have extra information from the other diseases. We propose to use such extra information to improve variable selection accuracy. We show the bi-level variable selection consistency and the asymptotic normality of the proposed estimator while allowing the number of variables to diverge as the sample size increases. The proposed method is applied to the Busselton Health data.

2. Model selection with adaptive group bridge under case-cohort designs

2.1. Estimation for case-cohort designs

We define notations for multiple case-cohort studies for generality. A single case-cohort study can be handled as a special case. To elaborate multiple case-cohort studies, we consider multivariate outcomes one of which is the outcome of interest for the univariate proportional hazards model.

Suppose there are n independent subjects and K diseases in the full cohort. For subject i with disease k, let T_ik, C_ik, and X_ik = min(T_ik, C_ik) denote the failure time of interest, censoring time, and observed time, respectively. Let $Δ_{i k} = I (T_{i k} \leq C_{i k})$, $N_{i k} (t) = I (X_{i k} \leq t, Δ_{i k} = 1)$, and $Y_{i k} (t) = I (X_{i k} \geq t)$ be, respectively, a failure indicator, the counting process, and an at-risk indicator, where $I (\cdot)$ is an indicator function. Let $Z_{i k} (t) = \{Z_{i k 1} (t), \dots, Z_{i k d_{n}} (t)\}^{T}$ be a d_n × 1 possibly time-dependent covariate vector. Without loss of generality, we assume that Z_ik(t) are centered and standardized. We assume that T_ik is independent of C_ik given $Z_{i k} (\cdot)$. The study period is assumed to be [0, τ]. Consider the event-specific hazards model: the hazard function for subject i with disease k is assumed to be $h_{i k} \{t \mid Z_{i k} (t)\} = h_{0 k} (t) \exp \{β_{k}^{T} Z_{i k} (t)\}$, where h_0k(t) is an unspecified baseline hazard function and $β_{k} = (β_{1}, \dots, β_{d_{n}})^{T}$ is an unknown parameter vector of interest.¹⁶ Let $β_{0} = (β_{10}, \dots, β_{d_{n} 0})^{T}$ be the true parameter vector.

Under multiple case-cohort studies, covariate information is available for (i) a randomly selected subcohort from the full cohort and (ii) all cases from any causes outside the subcohort. The subcohort is shared by all outcomes. More specifically, a fixed number ñ of subjects is randomly selected from the full cohort for the shared subcohort. Let ξ_i be an indicator of subject i being selected into the subcohort. Each subject is selected with the same probability $\tilde{α} = \mathrm{pr} (ξ_{1} = 1) = \tilde{n} / n$. The data under multiple case-cohort studies consist of $\{X_{i k}, Δ_{i k}, ξ_{i}, Z_{i k} (t), 0 \leq t \leq X_{i k}\}$ when ξ_i = 1 or Δ_ik = 1 for k = 1,..., K and {X_ik, Δ_ik, ξ_i} when ξ_i = 0 and Δ_ik = 0 for k = 1,..., K.

Assume disease k is of interest. For disease k, the negative pseudo-partial likelihood is

{\tilde{l}}_{k} (β) = - \sum_{i = 1}^{n} \int_{0}^{τ} \left[ β_{k}^{T} Z_{i k} (t) - \log \sum_{j = 1}^{n} w_{j k} (t) Y_{j k} (t) \exp \{β_{k}^{T} Z_{j k} (t)\} \right] d N_{i k} (t)

(1)

where w_ik(t) is the time-varying weight function for case-cohort data. There are several weight functions.¹,³,⁶,¹⁷,¹⁸ In this paper, we consider two efficient time-varying weight functions. The first weight function, proposed by Kalbfleisch and Lawless¹⁷ for a single case-cohort study, has the following form

w_{i k, 1} (t) = Δ_{i k} + (1 - Δ_{i k}) ξ_{i} {\hat{α}}_{k}^{- 1} (t)

(2)

where ${\hat{α}}_{k} (t) = \sum_{i = 1}^{n} ξ_{i} Y_{i k} (t) (1 - Δ_{i k}) / \sum_{i = 1}^{n} Y_{i k} (t) (1 - Δ_{i k})$ is an estimator of the subcohort selection probability, namely the proportion of sampled subjects among subjects who do not have disease k and remain in the risk set at time t.
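As a minimal sketch of weight (2), assuming the data for a single disease are held in NumPy arrays (the function name `kl_weight` and its arguments are illustrative and do not come from the paper), the weight at a given time t could be computed as follows. Cases receive weight 1, sampled non-cases receive the inverse of the estimated selection probability, and unsampled non-cases receive weight 0.

```python
import numpy as np

def kl_weight(t, X, delta, xi):
    """Weight (2) at time t for one disease (illustrative helper).

    X: observed times, delta: failure indicators, xi: subcohort indicators.
    Returns the weight vector and the estimated selection probability.
    """
    X, delta, xi = np.asarray(X), np.asarray(delta), np.asarray(xi)
    # non-cases that are still in the risk set at time t
    at_risk_noncase = (X >= t) & (delta == 0)
    denom = at_risk_noncase.sum()
    # alpha_hat(t): fraction of those subjects sampled into the subcohort
    alpha_hat = xi[at_risk_noncase].sum() / denom if denom > 0 else np.nan
    w = np.where(delta == 1, 1.0, 0.0)          # cases always count fully
    if denom > 0 and alpha_hat > 0:
        w = w + (1 - delta) * xi / alpha_hat     # sampled non-cases are up-weighted
    return w, alpha_hat

# toy data: six subjects, two cases, three subcohort members
X = [1, 2, 3, 4, 5, 6]
delta = [1, 0, 0, 0, 0, 1]
xi = [0, 1, 0, 1, 0, 1]
w, alpha_hat = kl_weight(2.5, X, delta, xi)
```

In the pseudo-partial likelihood (1) the weight is always multiplied by the at-risk indicator, so a positive weight for a subject who has already left the risk set has no effect.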

The weight function $w_{i k, 1} (t)$ ignores the extra information from the other diseases under multiple case-cohort studies. To use all collected covariate information for subjects who have the other diseases outside the subcohort, Kim et al.⁶ proposed the following more efficient weight

w_{i k, 2} (t) = \left\{ 1 - \prod_{k = 1}^{K} (1 - Δ_{i k}) \right\} + \prod_{k = 1}^{K} (1 - Δ_{i k}) ξ_{i} {\tilde{α}}_{k}^{- 1} (t)

(3)

where ${\tilde{α}}_{k} (t) = \sum_{i = 1}^{n} ξ_{i} Y_{i k} (t) \prod_{k = 1}^{K} (1 - Δ_{i k}) / \sum_{i = 1}^{n} Y_{i k} (t) \prod_{k = 1}^{K} (1 - Δ_{i k})$ which is the proportion of sampled subjects among subjects who do not have any diseases and remain in the risk set at time t. The parameter estimator $\tilde{β}$ can be obtained by minimizing the negative pseudo-partial likelihood (1) using weights (2) or (3).
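The following sketch illustrates weight (3) under the same assumptions as above: arrays of observed times for the disease of interest, an n × K matrix of failure indicators for all diseases, and subcohort indicators. The name `efficient_weight` is hypothetical. The only change from weight (2) is that "case" now means a case of any of the K diseases, so subjects with another disease outside the subcohort also contribute with weight 1.

```python
import numpy as np

def efficient_weight(t, X_k, Delta, xi):
    """Weight (3) at time t for the disease with observed times X_k.

    Delta: (n, K) array of failure indicators for all K diseases.
    Returns the weight vector and the estimated selection probability.
    """
    X_k, Delta, xi = np.asarray(X_k), np.asarray(Delta), np.asarray(xi)
    disease_free = np.prod(1 - Delta, axis=1)        # 1 if no event of any type
    at_risk_free = (X_k >= t) & (disease_free == 1)
    denom = at_risk_free.sum()
    # alpha_tilde(t): sampled fraction among disease-free subjects at risk
    alpha_tilde = xi[at_risk_free].sum() / denom if denom > 0 else np.nan
    w = 1.0 - disease_free                           # any case gets weight 1
    if denom > 0 and alpha_tilde > 0:
        w = w + disease_free * xi / alpha_tilde      # sampled disease-free subjects
    return w, alpha_tilde

# toy data: subject 0 has disease 1, subject 1 has only disease 2
X_k = [1, 2, 3, 4, 5, 6]
Delta = [[1, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
xi = [0, 0, 1, 0, 1, 0]
w2, alpha_tilde = efficient_weight(2.5, X_k, Delta, xi)
```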

2.2. Adaptive group bridge

In this section, we propose the adaptive group bridge for bi-level variable selection with group variables. First, we define notations on group variables and their membership. Suppose that there are G groups and group memberships A₁,..., A_G are defined as subsets of {1,..., d_n}. The cardinality of a set A is denoted by |A|. Groups are allowed to overlap. We define the |A| × 1 vectors $β_{A} = (β_{m}, m \in A)^{T}$ and $β_{A 0} = (β_{m 0}, m \in A)^{T}$. For individual membership, we define B₁ and B₂ such that β_m ≠ 0 if m ∈ B₁ and β_m = 0 if m ∈ B₂. Without loss of generality, we assume $β_{A_{g}} \neq 0$ for g = 1,..., G₁ and $β_{A_{g}} = 0$ for g = G₁ + 1,..., G.

For data with group structure, we propose the following penalized pseudo-partial likelihood to obtain the estimator $\hat{β} .$

L_{n} (β) = {\tilde{l}}_{k} (β) + λ_{n} \sum_{g = 1}^{G} c_{g} {(\sum_{m \in A_{g}} \frac{| β_{m} |}{{| {\tilde{β}}_{m} |}^{v}})}^{γ}

(4)

where 0 < γ < 1, λ_n > 0, v > 0, the c_g’s are constants that adjust for different |A_g|’s, and $\tilde{β}$ is a consistent estimator of β₀. The estimator $\tilde{β}$ can be obtained from the pseudo-partial likelihood (1) using weights (2) or (3). In practice, $c_{g} \propto {| A_{g} |}^{1 - γ}$ is widely used.¹³,¹⁹ When v = 0, the penalty term in (4) is the group bridge penalty of Huang et al.¹³,¹⁹ Using ${\tilde{β}}_{m}$ as the weight is similar to the adaptive lasso of Zou.²⁰ Thus, we call our proposed penalty the adaptive group bridge penalty. In the case with γ = 1/2 and c_g = 1 for all g, the adaptive group bridge penalty is the same as the adaptive hierarchical penalty of Wang and Nan.¹⁴

Although Huang et al.¹³,¹⁹ proposed the group bridge for bi-level variable selection, they showed the group variable selection consistency only, not the bi-level variable selection consistency. Because the group bridge sets v to 0 in (4), it equally penalizes β_m’s for m ∈ A_g within group g. This may lead to inconsistent within-group variable selection. To overcome this limitation of the group bridge, the adaptive group bridge uses the weight ${\tilde{β}}_{m}$ in the penalty term like the adaptive lasso. When β_m0 = 0, ${\tilde{β}}_{m}$ is close to 0 for sufficiently large n. Thus, $1 / {| {\tilde{β}}_{m} |}^{v}$ assigns larger penalties to zero parameters. On the other hand, $1 / {| {\tilde{β}}_{m} |}^{v}$ converges to a non-zero constant when β_m0 ≠ 0. In this way, putting the weight ${\tilde{β}}_{m}$ into the penalty term enables the adaptive group bridge to identify non-zero variables at bi-level more consistently than the group bridge.
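To make the penalty in (4) concrete, the sketch below evaluates it for a given coefficient vector. It assumes c_g = |A_g|^(1 − γ) with proportionality constant 1, zero-based index lists for the (possibly overlapping) groups, and an initial estimate with no exactly zero entries; the function name `agb_penalty` is illustrative. Setting v = 0 gives the group bridge penalty.

```python
import numpy as np

def agb_penalty(beta, beta_tilde, groups, lam, gamma=0.5, v=1.0):
    """Adaptive group bridge penalty term of (4) (illustrative helper).

    groups: list of index lists A_g (zero-based, may overlap).
    Assumes c_g = |A_g|**(1 - gamma) and beta_tilde without exact zeros.
    """
    beta = np.asarray(beta, dtype=float)
    beta_tilde = np.asarray(beta_tilde, dtype=float)
    total = 0.0
    for A in groups:
        A = list(A)
        c_g = len(A) ** (1 - gamma)
        # adaptively weighted L1 norm within the group
        inner = np.sum(np.abs(beta[A]) / np.abs(beta_tilde[A]) ** v)
        total += c_g * inner ** gamma
    return lam * total

beta = [1.0, 0.0, 0.5]
beta_tilde = [1.0, 0.1, 0.5]
groups = [[0, 1], [1, 2]]            # the two groups share the second variable
pen_adaptive = agb_penalty(beta, beta_tilde, groups, lam=2.0, gamma=0.5, v=1.0)
pen_bridge = agb_penalty(beta, beta_tilde, groups, lam=2.0, gamma=0.5, v=0.0)
```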

2.3. Asymptotic properties

In this section, we study the asymptotic properties of the adaptive group bridge estimator $\hat{β}$ using weight (3). The asymptotic properties of the estimator using weight (2) can be similarly shown and thus their proofs are omitted in this article. We define a^⊗0 = 1, a^⊗1 = a, a^⊗2 = aa^T, and the following notations:

S_{k}^{(d)} (β, t) = \frac{1}{n} \sum_{i = 1}^{n} Y_{i k} (t) Z_{i k} {(t)}^{\otimes d} e^{β^{T} Z_{i k} (t)}, d = 0, 1, 2

{\tilde{S}}_{k}^{(d)} (β, t) = \frac{1}{n} \sum_{i = 1}^{n} w_{i k, 2} (t) Y_{i k} (t) Z_{i k} {(t)}^{\otimes d} e^{β^{T} Z_{i k} (t)}, d = 0, 1, 2

s_{k}^{(d)} (β, t) = E {S_{k}^{(d)} (β, t)}, d = 0, 1, 2, e_{k} (β, t) = s_{k}^{(1)} (β, t) / s_{k}^{(0)} (β, t)

v_{k} (β, t) = \frac{s_{k}^{(2)} (β, t) s_{k}^{(0)} (β, t) - s_{k}^{(1)} {(β, t)}^{\otimes 2}}{s_{k}^{(0)} {(β, t)}^{2}}

V_{k} (β, t) = \frac{{\tilde{S}}_{k}^{(2)} (β, t) {\tilde{S}}_{k}^{(0)} (β, t) - {\tilde{S}}_{k}^{(1)} {(β, t)}^{\otimes 2}}{{\tilde{S}}_{k}^{(0)} {(β, t)}^{2}}

Ω (β) = \int_{0}^{τ} v_{k} (β, t) s_{k}^{(0)} (β, t) h_{0 k} (t) d t

Γ (β) = \frac{1}{n} \mathrm{Var} \{\partial {\tilde{l}}_{k} (β) / \partial β\}

We make the following assumptions:

Condition 1. For all k, $\int_{0}^{τ} h_{0 k} (t) d t < \infty$ and $P \{Y_{i k} (t) = 1\} > 0$ for $t \in [0, τ]$, i = 1,..., n.
Condition 2. $| Z_{i j k} (0) | + \int_{0}^{τ} | d Z_{i j k} (t) | < D_{z} < \infty$, i = 1,..., n, j = 1,..., d_n, almost surely, where D_z is a constant.
Condition 3. For d = 0, 1, 2, there exists a neighborhood $B$ of $β_{0}$ such that $s_{k}^{(d)} (β, t)$ are continuous functions and $\sup_{t \in [0, τ], β \in B} {‖ S_{k}^{(d)} (β, t) - s_{k}^{(d)} (β, t) ‖}_{2} \to_{p} 0$, where $‖ a ‖_{p}$ denotes the L_p norm of a.
Condition 4. For all $β \in B$, $t \in [0, τ]$, and k = 1,..., K, $S_{k}^{(1)} (β, t) = \partial S_{k}^{(0)} (β, t) / \partial β$ and $S_{k}^{(2)} (β, t) = \partial^{2} S_{k}^{(0)} (β, t) / \partial β \partial β^{T}$, where $S_{k}^{(d)} (β, t)$ for d = 0, 1, 2 are continuous functions of $β \in B$ uniformly in t ∈ [0, τ] and are bounded on $B \times [0, τ]$; $s_{k}^{(0)}$ is bounded away from zero on $B \times [0, τ]$.
Condition 5. There exist constants C₁ and C₂ such that $0 < C_{1} < {eigen}_{\min} \{Ω (β_{0})\} \leq {eigen}_{\max} \{Ω (β_{0})\} < C_{2} < \infty$, where eigen_min{A} and eigen_max{A} are the minimal and maximal eigenvalues of a matrix A, respectively.
Condition 6. There exist constants C₃ and C₄ such that $0 < C_{3} < {eigen}_{\min} \{Ω (β_{0})\} \leq {eigen}_{\max} \{Ω (β_{0})\} < C_{4} < \infty$.
Condition 7. $\lim_{n \to \infty} \tilde{α} = α$, where $\tilde{α} = \tilde{n} / n$ and α is a positive constant.
Condition 8. For some v₁ and v₂ such that 0 < v₁ < 1, 0 < v₂, and v₂/(1 – v₁) < v, $\min_{j \in B_{1}} | β_{j 0} (τ) | = O_{p} \{(d_{n} / n)^{v_{1} / 2}\}$ and $\max_{g} | A_{g} \cap B_{1} | = O \{(n / d_{n})^{v_{2} / 2}\}$; we assume $\sum_{g = 1}^{G_{1}} c_{g} \{(\sum_{j \in A_{g} \cap B_{1}} {| β_{j 0} |}^{1 - v})^{γ - 1} \sum_{j \in A_{g} \cap B_{1}} 1 / {| β_{j 0} |}^{v}\} \leq M_{n}$ and $\sum_{g = 1}^{G_{1}} c_{g} \{(\sum_{j \in A_{g} \cap B_{1}} {| β_{j 0} |}^{1 - v})^{γ - 2} \sum_{j \in A_{g} \cap B_{1}} 1 / {| β_{j 0} |}^{2 v}\} \leq M_{n}$, where $M_{n} = O_{p} (1)$.
Condition 9. $λ_{n} / \sqrt{n} \to 0$, $\sqrt{n / d_{n}} {\tilde{β}}_{j} = O_{p} (1)$, and $\min (λ_{n} n^{(v - 1) / 2} d_{n}^{- (1 + v) / 2}, λ_{n} n^{γ (v - 1) / 2} d_{n}^{- 1 + γ (1 - v) / 2}) \to \infty$.

Conditions 1–6 are standard conditions for the proportional hazards model. They are similar to Conditions A1–A3 of Cai et al.¹⁰ and Conditions 1–4 of Ni et al.⁷ They guarantee the local asymptotic quadratic property of ${\tilde{l}}_{k} (β)$ and the existence of a local minimizer of $L_{n} (β)$. Condition 7 is a boundedness condition on the subcohort selection probability. Conditions 8–9 control λ_n, the number of variables within group, and the magnitude of the true non-zero parameters within non-zero groups as n → ∞. Conditions similar to Conditions 8–9 were used in Ahn et al.¹⁵

We have the following theorem.

Theorem 1. Under Conditions 1–9, we have

Consistency: if $d_{n}^{4} / n \to 0$, then ${‖ \hat{β} - β_{0} ‖}_{2} = O_{p} (\sqrt{d_{n} / n})$.
Bi-level variable selection consistency: $P ({\hat{β}}_{B_{2}} = 0) \to 1$.
Asymptotic distribution: if $d_{n}^{5} / n \to 0$, we have

n^{1 / 2} u^{T} Ω_{11}^{- 1 / 2} Γ_{11} ({\hat{β}}_{B_{1}} - β_{B_{1} 0}) \to_{d} N (0, 1)

where u is a $| B_{1} | \times 1$ constant vector with ||u|| = 1, and Ω₁₁ and Γ₁₁ are the leading $| B_{1} | \times | B_{1} |$ submatrices of Ω(β₀) and Γ(β₀), respectively.

The Supplemental Materials include the proof of Theorem 1 and the estimators of Ω(β₀) and Γ(β₀). Theorem 1 establishes the oracle property of the adaptive group bridge estimator. In particular, it shows that the adaptive group bridge consistently selects not only non-zero group variables, but also non-zero within-group variables. We have the following corollary:

Corollary 1. Let $\tilde{β}$ be the estimator based on the weighted pseudo-partial likelihood (1) using weights (2) or (3). Under Conditions 1–7 and $d_{n}^{4} / n \to 0$, we have ${‖ \tilde{β} - β_{0} ‖}_{2} = O_{p} (\sqrt{d_{n} / n})$.

2.4. Computation

Since it is difficult to directly minimize $L_{n} (β)$ with respect to β, we reformulate the minimization of $L_{n} (β)$ as the minimization of

Q (β, θ) = {\tilde{l}}_{k} (β) + \sum_{g = 1}^{G} θ_{g}^{1 - 1 / γ} c_{g}^{1 / γ} \sum_{m \in A_{g}} \frac{| β_{m} |}{{| {\tilde{β}}_{m} |}^{v}} + ζ_{n} \sum_{g = 1}^{G} θ_{g}

(5)

where $θ = (θ_{1}, \dots, θ_{G})^{T}$ and $λ_{n} = ζ_{n}^{1 - γ} γ^{- γ} {(1 - γ)}^{γ - 1}$. We have the following proposition:

Proposition 1. Assume that $λ_{n} = ζ_{n}^{1 - γ} γ^{- γ} {(1 - γ)}^{γ - 1}$. Then, $\hat{β}$ minimizes $L_{n} (β)$ if and only if $(\hat{β}, \hat{θ})$ minimizes $Q (β, θ)$, where $θ_{g} > 0$ and ${\hat{θ}}_{g} > 0$ for g = 1,..., G.

We can show Proposition 1 similarly to Huang et al.¹⁹ Thus, its proof is omitted.
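The link between (4) and (5) can be checked numerically one group at a time. In the sketch below (function names are illustrative), s stands for the weighted within-group sum Σ |β_m| / |β̃_m|^v. Minimizing the θ_g-dependent part of Q over θ_g > 0 has a closed-form solution, and plugging it back in reproduces λ_n c_g s^γ, the corresponding term of L_n.

```python
import numpy as np

def lam_from_zeta(zeta, gamma):
    """lambda_n implied by zeta_n, as stated in Proposition 1."""
    return zeta ** (1 - gamma) * gamma ** (-gamma) * (1 - gamma) ** (gamma - 1)

def theta_hat(s, c, zeta, gamma):
    """Closed-form minimizer over theta > 0 of q_group for fixed s."""
    return c * ((1 - gamma) / (zeta * gamma)) ** gamma * s ** gamma

def q_group(theta, s, c, zeta, gamma):
    """Part of Q in (5) that involves a single theta_g."""
    return theta ** (1 - 1 / gamma) * c ** (1 / gamma) * s + zeta * theta

zeta, gamma, c, s = 2.0, 0.5, 1.5, 0.7
th = theta_hat(s, c, zeta, gamma)
profiled = q_group(th, s, c, zeta, gamma)            # Q-part at the optimal theta
penalty = lam_from_zeta(zeta, gamma) * c * s ** gamma  # matching term of L_n
grid = np.linspace(0.5 * th, 1.5 * th, 201)          # nearby theta values
grid_values = q_group(grid, s, c, zeta, gamma)
```

The two quantities `profiled` and `penalty` agree up to rounding, and no grid value falls below `profiled`, which is what the proposition requires for each group.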

To approximate ${\tilde{l}}_{k} (β)$, we define $\nabla \tilde{l} (β) = \partial {\tilde{l}}_{k} (β) / \partial β$ and $\nabla^{2} \tilde{l} (β) = \partial^{2} {\tilde{l}}_{k} (β) / (\partial β \partial β^{T}) = X^{T} X$, where X can be obtained by the Cholesky decomposition, and $Y = {(X^{T})}^{- 1} \{\nabla^{2} \tilde{l} (β) β - \nabla \tilde{l} (β)\}$. To minimize $Q (β, θ)$, we minimize

\tilde{Q} (β, θ) = \frac{1}{2} {(Y - X β)}^{T} (Y - X β) + \sum_{g = 1}^{G} θ_{g}^{1 - 1 / γ} c_{g}^{1 / γ} \sum_{m \in A_{g}} \frac{| β_{m} |}{{| {\tilde{β}}_{m} |}^{v}} + ζ_{n} \sum_{g = 1}^{G} θ_{g}

(6)

Let $β^{(p)}$ denote the estimator at the pth step of the optimization. Then, the algorithm is as follows:

Step 1 Obtain $\tilde{β}$ by minimizing the negative pseudo-partial likelihood (1) and set it as the initial value $β^{(0)}$.

Step 2 Compute X and Y using β^(p) at the pth step.

Step 3 Compute

θ_{g}^{(p)} = c_{g} {(\frac{1 - γ}{ζ_{n} γ})}^{γ} {(\sum_{m \in A_{g}} \frac{| β_{m}^{(p)} |}{{| {\tilde{β}}_{m} |}^{v}})}^{γ}, g = 1, \dots, G

Step 4 After plugging $θ^{(p)} = {(θ_{1}^{(p)}, \dots, θ_{G}^{(p)})}^{T}$ from Step 3 into $\tilde{Q} (β, θ^{(p)}),$ minimize $\tilde{Q} (β, θ^{(p)})$ to obtain β^(p+1).

Step 5 Repeat Step 2 – Step 4 until ${‖ β^{(p + 1)} - β^{(p)} ‖}_{1} < 10^{- 4} .$
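The steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the gradient and Hessian of the loss are passed in as callables `grad` and `hess` (hypothetical names), Step 4 is solved by plain coordinate descent for the weighted lasso problem (6), c_g = |A_g|^(1 − γ), and the toy example replaces the pseudo-partial likelihood with a simple quadratic loss so that the block is self-contained.

```python
import numpy as np

def soft(z, k):
    """Soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - k, 0.0)

def agb_fit(grad, hess, beta_tilde, groups, zeta, gamma=0.5, v=1.0,
            tol=1e-4, max_iter=200):
    beta_tilde = np.asarray(beta_tilde, dtype=float)
    d = beta_tilde.size
    c = np.array([len(A) ** (1 - gamma) for A in groups])
    beta = beta_tilde.copy()                              # Step 1
    for _ in range(max_iter):
        # Step 2: Hessian = X^T X via Cholesky, Y = (X^T)^{-1}(H beta - gradient)
        H = hess(beta)
        L = np.linalg.cholesky(H)                         # H = L L^T, so X = L^T
        X = L.T
        Y = np.linalg.solve(L, H @ beta - grad(beta))
        # Step 3: closed-form theta_g
        s = np.array([np.sum(np.abs(beta[A]) / np.abs(beta_tilde[A]) ** v)
                      for A in groups])
        theta = c * ((1 - gamma) / (zeta * gamma)) ** gamma * s ** gamma
        # L1 weight for each coefficient; a group with theta_g = 0 stays at zero
        kappa = np.zeros(d)
        for g, A in enumerate(groups):
            coef = (np.inf if theta[g] == 0
                    else theta[g] ** (1 - 1 / gamma) * c[g] ** (1 / gamma))
            kappa[A] += coef / np.abs(beta_tilde[A]) ** v
        # Step 4: coordinate descent on 0.5*||Y - X b||^2 + sum_m kappa_m |b_m|
        XtX, XtY = X.T @ X, X.T @ Y
        new = beta.copy()
        for _ in range(100):
            for m in range(d):
                r = XtY[m] - XtX[m] @ new + XtX[m, m] * new[m]
                new[m] = soft(r, kappa[m]) / XtX[m, m]
        # Step 5: stop when the L1 change is small
        if np.sum(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta

# toy stand-in loss (NOT the pseudo-partial likelihood): 0.5 * n * ||beta - b||^2
n, b = 200.0, np.array([1.0, 0.05, 0.04, -0.03])
fit = agb_fit(lambda beta: n * (beta - b), lambda beta: n * np.eye(4),
              beta_tilde=b, groups=[[0, 1], [2, 3]], zeta=25.0)
```

In this toy run the second group is removed entirely and, within the first group, only the large coefficient is kept, which is the bi-level behavior the method is designed to produce.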

To choose a tuning parameter, we can use the generalized cross validation by following Huang et al.¹³:

\frac{{\tilde{l}}_{k} (\hat{β})}{n \{1 - \hat{d} (λ_{n}) / n\}^{2}}

where $\hat{d} (λ_{n})$ is the number of non-zero coefficients given λ_n. To estimate the covariance matrix of the non-zero components of $\hat{β}$, we can use a quadratic approximation as in Fan and Li²¹ and Huang et al.¹³ It can be estimated as follows:

\{\nabla^{2} \tilde{l} (\hat{β}) + Υ (\hat{β}, \hat{θ})\}^{- 1} \widehat{\mathrm{Cov}} \{\nabla \tilde{l} (\hat{β})\} \{\nabla^{2} \tilde{l} (\hat{β}) + Υ (\hat{β}, \hat{θ})\}^{- 1}

where

Υ (\hat{β}, \hat{θ}) = \mathrm{diag} \left\{ \sum_{g: A_{g} \ni m} {\hat{θ}}_{g}^{1 - 1 / γ} c_{g}^{1 / γ} \frac{I ({\hat{β}}_{m} \neq 0)}{| {\hat{β}}_{m} | {| {\tilde{β}}_{m} |}^{v}}, m = 1, \dots, d_{n} \right\}

{\hat{θ}}_{g} = c_{g} {(\frac{1 - γ}{ζ_{n} γ})}^{γ} {(\sum_{m \in A_{g}} \frac{| {\hat{β}}_{m} |}{{| {\tilde{β}}_{m} |}^{v}})}^{γ}

3. Simulation

We conducted simulations to evaluate the performance of the adaptive group bridge and compare it with the group bridge from Huang et al.¹³ and SCAD of Ni et al.⁷ under two case-cohort studies. The failure time for disease 1, T_i1, was generated based on the proportional hazards model. To generate failure time for disease 2, T_i2, we used the Clayton–Cuzick model²²

F (t_{1}, t_{2} | Z_{i 1}, Z_{i 2}) = \{S_{1} {(t_{1}; Z_{i 1})}^{- 1 / η} + S_{2} {(t_{2}; Z_{i 2})}^{- 1 / η} - 1\}^{- η}

where $Z_{i 1} = Z_{i 2} = Z_{i}$, $S_{k} (t; Z_{i}) = \mathrm{Pr} (T_{k} > t | Z_{i k}) = \exp \{- \int_{0}^{t} h_{0 k} (u) e^{β_{k}^{T} Z_{i k}} d u\}$ is the survival function, h_0k(t) and β_k (k = 1,2) are the baseline hazard function and the covariate effect for disease k, respectively, and η is the association parameter between the failure times of the two diseases. The relationship between Kendall’s tau τ_η and η is τ_η = 1/(2η + 1). A larger Kendall’s tau represents a higher correlation between T₁ and T₂. We set η = 4, for which Kendall’s tau is approximately 0.11. Disease 1, that is, k = 1, was of interest.
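One way to draw pairs of failure times from this model is sketched below. It is an assumption about how such data could be generated, not a description of the authors' code: the joint survival function above is a Clayton copula with parameter 1/η applied to the two marginal survival functions, so the second uniform variate can be drawn by conditional inversion, and constant baseline hazards make the marginal inversion explicit. The function names are illustrative.

```python
import numpy as np

def simulate_two_failure_times(Z, beta1, beta2, h01=2.0, h02=8.0, eta=4.0, rng=None):
    """Draw (T1, T2) with proportional hazards margins and Clayton dependence."""
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    a = 1.0 / eta                                   # Clayton copula parameter
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)
    # conditional inversion: u2 given u1 under the Clayton copula
    u2 = ((w ** (-a / (1 + a)) - 1) * u1 ** (-a) + 1) ** (-1 / a)
    # invert S_k(t) = exp{-h0k * exp(beta_k' Z) * t} at the uniform draws
    t1 = -np.log(u1) / (h01 * np.exp(Z @ beta1))
    t2 = -np.log(u2) / (h02 * np.exp(Z @ beta2))
    return t1, t2

def kendall_tau(x, y):
    """Sample Kendall's tau by direct pairwise comparison (fine for small n)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

# without covariate effects the sample tau should be near 1 / (2 * 4 + 1)
rng = np.random.default_rng(0)
Z = np.zeros((1000, 1))
t1, t2 = simulate_two_failure_times(Z, np.zeros(1), np.zeros(1), rng=rng)
tau = kendall_tau(t1, t2)
```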

We considered two censoring rates: 90% and 80%. Censoring times were generated independently from a uniform distribution. We assumed constant baseline hazards: h₀₁(t) = 2 and h₀₂(t) = 8. We examined two event rates for k = 1: 10% and 20%. The corresponding event rates for k = 2 were 20% and 40%, respectively. The control-to-case ratio was set to 1:2. Two sample sizes of the full cohort were considered: n = 750 and n = 1500. Let n_case be the expected number of cases given the censoring rate. For each setting, 500 replications were conducted.

To compare the performance of the three variable selection methods, we calculated the group correction rate and the individual correction rate representing the proportion that each variable selection method correctly identified the true non-zero groups and non-zero individual variables of the underlying model, respectively. We also calculated group size (GS) and model size (MS) defined as the average number of non-zero groups and non-zero individual variables selected by each method. The ratio of mean squared errors was defined as the ratio of the mean squared error for each variable selection relative to the mean squared error of the oracle estimator

\frac{\sum_{i = 1}^{500} {‖ {\hat{β}}^{i} - β_{0} ‖}_{2}^{2}}{\sum_{i = 1}^{500} {‖ {\hat{β}}_{Oracle}^{i} - β_{0} ‖}_{2}^{2}}

where ${\hat{β}}_{Oracle}^{i}$ is the oracle estimator of β₀ in the ith replication. The oracle estimator was obtained from the pseudo-partial likelihood (1) with either weight (2) or weight (3) assuming we already knew the true non-zero variables. Therefore, a ratio of mean squared errors closer to 1 indicates a better estimation of β₀.

In the first simulation, we examined group variables consisting of continuous variables. When the event rate was 10% with population size 750, there were five group variables with group sizes $(| A_{1} |, | A_{2} |, | A_{3} |, | A_{4} |, | A_{5} |)^{T} = (2, 3, 4, 5, 3)^{T}$. Groups 2 and 3 overlapped as follows

(β_{1}, \dots, β_{15}) = (\underbrace{1.1, 0.9}_{A_{1}}, \underbrace{- 1.2, 0, 0}_{A_{2}}, 0, 0, \underbrace{1.1, - 1, 0, 0.9, 0}_{A_{4}}, \underbrace{0, 0, 0}_{A_{5}})

where, given the group sizes above, A₃ consists of (β₄, β₅, β₆, β₇) and therefore shares β₄ and β₅ with A₂.

For the event rate 20% with population size 750 or the event rate 10% with population size 1500, one more zero group with size 2, (β₁₆, β₁₇)^T = (0,0)^T, was added. For the event rate 20% with population size 1500, an additional zero group with size 2, (β₁₈, β₁₉)^T = (0,0)^T, was added. By doing so, we allowed d_n to increase as n_case increased. All variables within each group were generated from the multivariate normal distribution with mean 0, variance 1, and correlation 0.5. All groups other than Groups 2 and 3 were assumed to be independent. Thus, the true GS and MS were 3 and 6, respectively. For each scenario, the efficient case-cohort weight (3) and the traditional case-cohort weight (2) were examined.
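The first design (d_n = 15) can be written down explicitly, which makes the stated true group size and model size easy to verify. The sketch below uses zero-based indices and assumes the group membership implied by the group sizes, with Groups 2 and 3 sharing two variables. It also assumes that the overlapping Groups 2 and 3 are generated together as one correlated block, which is one reading of the description and not a detail given in the text. The names `groups`, `blocks`, and `simulate_covariates` are illustrative.

```python
import numpy as np

beta0 = np.array([1.1, 0.9, -1.2, 0, 0, 0, 0, 1.1, -1, 0, 0.9, 0, 0, 0, 0])
# assumed membership: A1, A2, A3 (overlapping A2), A4, A5
groups = [[0, 1], [2, 3, 4], [3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14]]

true_gs = sum(bool(np.any(beta0[A] != 0)) for A in groups)  # non-zero groups
true_ms = int(np.count_nonzero(beta0))                      # non-zero variables

def simulate_covariates(n, blocks, rho=0.5, rng=None):
    """Normal covariates: unit variance, correlation rho within each block,
    independence between blocks."""
    rng = np.random.default_rng() if rng is None else rng
    d = sum(len(b) for b in blocks)
    Z = np.empty((n, d))
    for b in blocks:
        cov = np.full((len(b), len(b)), rho)
        np.fill_diagonal(cov, 1.0)
        Z[:, b] = rng.multivariate_normal(np.zeros(len(b)), cov, size=n)
    return Z

# Groups 2 and 3 are merged into a single correlated block (an assumption)
blocks = [[0, 1], [2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14]]
Z = simulate_covariates(750, blocks, rng=np.random.default_rng(1))
```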

Table 1 summarizes the results, reporting the group correction rate (GRC %), individual correction rate (IDC %), GS, MS, and ratio of mean squared errors (MSER). For all variable selection methods, higher event rates for disease 1 and larger sample sizes produced higher group correction and individual correction rates; GSs and MSs closer to 3 and 6, respectively; and MSERs closer to 1. The results show that (i) group correction and individual correction rates for the adaptive group bridge are generally higher than those of the group bridge and SCAD; (ii) the GSs and the MSs for the adaptive group bridge are generally closer to 3 and 6, respectively, compared to the other two methods. Moreover, all three variable selection methods using the efficient case-cohort weight (3) have better group correction and individual correction rates than those using the traditional case-cohort weight (2), in particular when the event rate for disease 2 is higher and the correlation between the two failure times is smaller. We also examined different event rates, different pairwise correlations between variables, and a large number of coefficients with a high pairwise correlation between variables. In all scenarios, the results were similar to those in Table 1. The detailed settings and results of some additional simulation studies are provided in the Supplemental Materials.

Table 1.

Simulation results for group variables consisting of continuous variables.

n	Weight	P(Δ₁, Δ₂)	d_n	Method	GRC%	IDC%	GS	MS	MSER

750	Traditional case-cohort weight	(0.1,–)	15	AGB	27.8	14.2	4.09	9.02	2.22
				GB	33.2	5.2	3.96	9.53	2.02
				SCAD	8.8	8.2	4.27	8.67	2.43
		(0.2,–)	17	AGB	73.4	59.6	3.32	6.73	1.46
				GB	73.2	21.8	3.32	7.61	1.72
				SCAD	38.8	38.2	3.68	7.05	1.69
	Efficient case-cohort weight	(0.1,0.2)	15	AGB	54.2	32.8	3.58	7.48	1.64
				GB	52.8	12.2	3.56	8.36	1.70
				SCAD	17.6	16.6	4.06	8.01	2.07
		(0.2,0.4)	17	AGB	83.0	71.2	3.20	6.48	1.36
				GB	79.4	23.4	3.23	7.35	1.72
				SCAD	47.4	46.4	3.56	6.79	1.50
1500	Traditional case-cohort weight	(0.1,–)	17	AGB	34.4	20.0	4.12	8.68	2.02
				GB	31.2	4.4	4.18	9.85	2.00
				SCAD	12.0	11.8	4.37	8.74	2.18
		(0.2,–)	19	AGB	83.8	76.0	3.22	6.45	1.30
				GB	73.0	18.8	3.33	7.56	1.70
				SCAD	66.0	65.6	3.39	6.57	1.39
	Efficient case-cohort weight	(0.1,0.2)	17	AGB	56.6	41.2	3.69	7.67	1.75
				GB	51.4	10.8	3.69	8.54	1.84
				SCAD	34.8	33.8	3.89	7.63	1.82
		(0.2,0.4)	19	AGB	92.8	85.2	3.10	6.22	1.24
				GB	82.8	23.4	3.19	7.28	1.74
				SCAD	75.4	74.8	3.26	6.34	1.23


GRC: group correction; IDC: individual correction; GS: group size; MS: model size; MSER: mean square error ratio; AGB: adaptive group bridge; GB: group bridge; SCAD: smoothly clipped absolute deviation.

In the second simulation, we considered group structure with continuous and categorical variables. For the 10% event rate for disease 1, there were five groups: two groups consisting of categorical variables (A₁, A₂) and three groups consisting of continuous variables (A₃, A₄, A₅). The group sizes were $(| A_{1} |, | A_{2} |, | A_{3} |, | A_{4} |, | A_{5} |) = (2, 3, 3, 4, 5)$, and the overlapping groups were A₃ and A₄, as follows

(β_{1}, \dots, β_{15}) = (\underbrace{1.1, 0.9}_{A_{1}}, \underbrace{0, 0, 0}_{A_{2}}, \underbrace{- 1.2, 0, 0}_{A_{3}}, 0, 0, \underbrace{1.1, - 1, 0, 0.9, 0}_{A_{5}})

where, given the group sizes above, A₄ consists of (β₇, β₈, β₉, β₁₀) and therefore shares β₇ and β₈ with A₃.

The categorical variables in the groups of sizes 2 and 3 were generated from variables with three and four categories, respectively. The reference groups were set to 0. The continuous variables were generated from a multivariate normal distribution with mean 0, within-group correlation 0.5, and between-group correlation 0. When the event rate for disease 1 was 20% with population size 750 or the event rate for disease 1 was 10% with population size 1500, we added one more group consisting of categorical variables with size 2, (β₁₆, β₁₇)^T = (0,0)^T. When the event rate for disease 1 was 20% with population size 1500, one more group consisting of categorical variables with size 2, (β₁₈, β₁₉)^T = (0,0)^T, was added. Table 2 summarizes the results. For the group/individual correction rates, GS, and MS, the adaptive group bridge generally outperformed the group bridge and SCAD; the efficient case-cohort weight (3) identified the true non-zero variables better than the traditional case-cohort weight (2) in all three methods. The MSER of the adaptive group bridge is no larger than that of the group bridge in all settings and is comparable with or less than that of SCAD.

Table 2.

Simulation results for group variables consisting of continuous and categorical variables.

n	Weight	P(Δ₁, Δ₂)	d_n	Method	GRC%	IDC%	GS	MS	MSER

750	Traditional case-cohort weight	(0.1,–)	15	AGB	25.2	9.8	4.08	8.88	2.06
				GB	26.8	2.0	3.95	9.46	2.06
				SCAD	10.6	8.6	4.31	9.00	2.20
		(0.2,–)	17	AGB	67.2	51.2	3.40	6.87	1.40
				GB	68.2	14.2	3.38	7.82	1.71
				SCAD	33.6	33.4	3.73	7.10	1.47
	Efficient case-cohort weight	(0.1,0.2)	15	AGB	46.0	24.0	3.69	7.75	1.65
				GB	48.8	6.0	3.61	8.52	1.76
				SCAD	17.0	15.6	4.07	8.25	1.89
		(0.2,0.4)	17	AGB	78.2	64.4	3.26	6.55	1.31
				GB	77.4	15.8	3.25	7.53	1.67
				SCAD	43.4	42.8	3.61	6.86	1.36
1500	Traditional case-cohort weight	(0.1,–)	17	AGB	25.6	12.2	4.40	9.34	1.87
				GB	21.2	1.4	4.36	10.18	1.91
				SCAD	15.0	14.6	4.45	8.90	1.88
		(0.2,–)	19	AGB	78.4	65.4	3.27	6.57	1.37
				GB	72.2	15.4	3.36	7.68	1.78
				SCAD	61.8	61.4	3.43	6.61	1.34
	Efficient case-cohort weight	(0.1,0.2)	17	AGB	46.4	28.4	3.79	7.84	1.53
				GB	46.4	5.2	3.79	8.71	1.71
				SCAD	30.0	29.4	3.98	7.76	1.53
		(0.2,0.4)	19	AGB	84.0	74.4	3.19	6.38	1.34
				GB	82.8	20.8	3.21	7.41	1.91
				SCAD	68.6	68.4	3.33	6.42	1.24


We also examined the performance of variable selection when there were continuous variables (Z_ij’s) and their squared variables (Z_ij²’s). The detailed settings and simulation results are presented in the Supplemental Materials. The results are similar to those in Table 1: the adaptive group bridge has better bi-level selection accuracy than the group bridge and SCAD, and its mean GSs and MSs are closer to their true values compared to the other two methods. This suggests that when some of the variables are highly collinear, bi-level selection may be beneficial even if individual variable selection is of interest.

4. Data analysis

We applied the proposed method to analyze the data from the Busselton Health Study.⁸,⁹ The Busselton Health Study was conducted in the south-west of Western Australia, and questionnaires were used every three years from 1966 to 1981 to collect general health information for adult participants. The main aims of this study were to evaluate the association between stroke and serum ferritin and to identify risk factors related to stroke. The population consisted of 1612 men and women aged 40–89 who participated in 1981 and were free of coronary heart disease or stroke at that time. The outcome of interest was the time to stroke event, defined as hospital admission, any procedure, or death from stroke, with follow-up until 31 December 1998. The time to stroke event was considered censored if subjects did not have an event by the end of the study or were lost to follow-up during the study period.

To reduce the cost and preserve the blood samples, a case-cohort study was conducted for stroke. In addition to the case-cohort sample for stroke, additional serum ferritin information was obtained from another case-cohort study for coronary heart disease. Under this design, serum ferritin was measured for all the subjects with coronary heart disease and/or stroke as well as those in the subcohort. The full cohort size and the subcohort size were 1210 and 450, respectively. There were 117 subjects and 55 subjects who had stroke events in the full cohort and the subcohort, respectively. Extra information on serum ferritin was available for 174 subjects who had only coronary heart disease outside the subcohort. The other risk factors included age, gender, body mass index (BMI), blood pressure treatment, cholesterol, triglycerides, hemoglobin and smoking status.
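The counts quoted above can be combined in a short calculation. This is only a sketch based on the numbers in the text: the variable names are illustrative, and the total assumes that the subcohort, the stroke cases outside the subcohort, and the 174 subjects with only coronary heart disease outside the subcohort do not overlap, as the description suggests.

```python
# Counts taken from the text of the data analysis section.
full_cohort = 1210
subcohort = 450
stroke_full = 117        # stroke events in the full cohort
stroke_sub = 55          # stroke events inside the subcohort
chd_only_outside = 174   # only coronary heart disease, outside the subcohort

# Stroke cases that enter the case-cohort sample from outside the subcohort.
stroke_outside = stroke_full - stroke_sub  # 62

# Subjects with serum ferritin under the stated no-overlap assumption.
ferritin_measured = subcohort + stroke_outside + chd_only_outside  # 686

# Crude subcohort sampling fraction (subcohort size over full cohort size).
sampling_fraction = subcohort / full_cohort

print(stroke_outside, ferritin_measured, round(sampling_fraction, 4))
```

Under these assumptions, covariate information would be needed for roughly 57% of the full cohort (686 of 1210) rather than for all subjects.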

To identify non-zero variables, we used the adaptive group bridge with weights (2) and (3) and compared the results with those from the group bridge and SCAD. For continuous variables such as age, BMI, and triglycerides, we considered the following hierarchical structure: (Group 1) each continuous variable (GS 1); (Group 2) each continuous variable and its squared value (GS 2). All variables were centered and standardized. Since triglycerides were severely skewed, they were log-transformed before standardization.
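The preprocessing described above can be illustrated with a small sketch. It is not the authors' code: the function names and the toy values are hypothetical, GS 1 is read here as each variable forming its own group without squared terms, and the order of operations for the squared term (square the standardized variable, then standardize the square) is an assumption, since the text does not specify it.

```python
import math

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def build_design(continuous, with_squares):
    """Return (columns, groups) for a dict of continuous covariates.

    with_squares=False: every variable is its own group (one reading of GS 1).
    with_squares=True: each variable is grouped with its square (GS 2).
    Assumption: the square is taken after standardizing the variable and is
    then standardized itself.
    """
    columns, groups = {}, {}
    for name, raw in continuous.items():
        z = standardize(raw)
        columns[name] = z
        groups[name] = [name]
        if with_squares:
            sq = name + "^2"
            columns[sq] = standardize([v * v for v in z])
            groups[name].append(sq)
    return columns, groups

# Toy values only; these are not Busselton data.
toy = {
    "age": [45.0, 52.0, 61.0, 70.0, 83.0],
    # Triglycerides are log-transformed before standardization.
    "log(TR)": [math.log(t) for t in [0.8, 1.1, 1.5, 2.9, 6.0]],
}
columns, groups = build_design(toy, with_squares=True)
print(groups)
```

The `groups` dictionary records which columns belong together, which is the information a bi-level penalty needs in addition to the design matrix itself.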

Table 3 reports estimated coefficients and their standard errors using the group bridge, adaptive group bridge, and SCAD. The results show that SCAD selected more variables than the group bridge and the adaptive group bridge regardless of the weights. In particular, the standard errors of serum ferritin levels, the square of log-transformed triglycerides, BMI, and diabetes treatment, which SCAD identified even with weight (3), were large relative to their estimates. When using the traditional weight (2), all three methods selected more variables than when using the efficient weight (3). More specifically, compared to the variables selected by the group bridge with weight (3), the group bridge with weight (2) selected four additional variables: ferritin tertile 2, log-transformed triglycerides, its quadratic term, and diabetes treatment. The standard errors of those four variables' estimates were large relative to the estimates. On the other hand, the adaptive group bridge with weight (2) additionally selected only diabetes treatment compared to the variables selected by the adaptive group bridge with weight (3). Both the group bridge and the adaptive group bridge using the efficient weight (3) selected the same three variables: age, blood pressure treatment, and sex. Therefore, increased age, blood pressure treatment, and female sex were associated with the stroke event.

Table 3.

Estimated coefficients and standard errors for the Busselton Health Study Data.

	Traditional case-cohort weight (2)			Efficient case-cohort weight (3)
Variable	AGB $\hat{\beta}$ (SE)	GB $\hat{\beta}$ (SE)	SCAD $\hat{\beta}$ (SE)	AGB $\hat{\beta}$ (SE)	GB $\hat{\beta}$ (SE)	SCAD $\hat{\beta}$ (SE)
Ferritin tertile (ref = 1)
Ferritin tertile 2	0 (–)	0 (–)	0.04 (0.15)	0 (–)	0 (–)	0.11 (0.15)
Ferritin tertile 3	0 (–)	0.09 (0.11)	0.14 (0.13)	0 (–)	0 (–)	0.11 (0.13)
Age	0.86 (0.13)	0.91 (0.13)	0.88 (0.13)	0.85 (0.13)	0.83 (0.12)	0.90 (0.13)
Age²	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)
BMI	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0.04 (0.13)
BMI²	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)
Cholesterol	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)
Cholesterol²	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)
log(TR)	0 (–)	−0.17 (0.25)	0 (–)	0 (–)	0 (–)	0 (–)
log(TR)²	0 (–)	0.21 (0.21)	0.03 (0.13)	0 (–)	0 (–)	0.05 (0.12)
Diabetes TRT	−0.17 (0.1)	−0.16 (0.1)	−0.13 (0.09)	0 (–)	0 (–)	−0.05 (0.08)
Blood Pressure TRT	0.28 (0.09)	0.3 (0.1)	0.31 (0.10)	0.26 (0.09)	0.25 (0.09)	0.30 (0.10)
Sex (1 = female)	−0.33 (0.11)	−0.31 (0.12)	−0.32 (0.12)	−0.27 (0.11)	−0.26 (0.11)	−0.25 (0.11)
Smoking (ref = Never)
Former	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)
Current	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)	0 (–)

TR: triglycerides; TRT: treatment; AGB: adaptive group bridge; GB: group bridge; SCAD: smoothly clipped absolute deviation; BMI: body mass index.

We compared the model errors of the three methods using a fivefold cross-validation evaluation. Following Huang et al.,¹³ we considered the following model error: $ME(\hat{\beta}) = E\{\exp(-\hat{\beta}^{T} Z) - \exp(-\beta_{0}^{T} Z)\}^{2}$. We estimated the model error for case-cohort data as follows:

$$\widehat{ME}(\hat{\beta}) = \frac{1}{n} \sum_{i=1}^{n} \{\Delta_{ik} + (1 - \Delta_{ik}) \xi_{i} \tilde{\alpha}\} \{\exp(-\hat{\beta}^{T} Z_{i}) - \exp(-\beta_{0}^{T} Z_{i})\}^{2}$$

For the traditional weight (2), the estimated model errors for the adaptive group bridge, the group bridge, and SCAD were 1.16, 2.35, and 11.05, respectively. When the efficient weight (3) was used, the estimated model errors for the three methods were smaller than those using the traditional weight (2): 0.18, 0.63, and 1.85 for the adaptive group bridge, the group bridge, and SCAD, respectively. For both weights, the adaptive group bridge had the smallest model error, which indicates that it predicted better than the other two methods.
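The weighted average in the estimated model error can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: `weights` is assumed to hold the bracketed case-cohort weight term of the estimator, already evaluated for each subject, and `beta_ref` stands in for $\beta_0$. How the reference coefficient is obtained in the data analysis is not described in this section, so both are treated as inputs, and the function name is hypothetical.

```python
import math

def estimated_model_error(weights, Z, beta_hat, beta_ref):
    """Weighted mean of squared differences between exp(-beta_hat'Z_i)
    and exp(-beta_ref'Z_i) over the n rows of Z.

    weights  : per-subject case-cohort weights (assumed precomputed)
    Z        : list of covariate vectors, one per subject
    beta_hat : estimated coefficients
    beta_ref : reference coefficients standing in for beta_0 (assumption)
    """
    n = len(Z)
    total = 0.0
    for w_i, z_i in zip(weights, Z):
        lin_hat = sum(b * z for b, z in zip(beta_hat, z_i))
        lin_ref = sum(b * z for b, z in zip(beta_ref, z_i))
        total += w_i * (math.exp(-lin_hat) - math.exp(-lin_ref)) ** 2
    return total / n

# Toy check: identical coefficient vectors give zero model error.
Z_toy = [[1.0, 0.5], [0.2, -1.0], [0.0, 0.3]]
w_toy = [1.0, 2.5, 2.5]
print(estimated_model_error(w_toy, Z_toy, [0.8, -0.3], [0.8, -0.3]))
```

In a fivefold cross-validation, one would presumably fit each method on four folds and evaluate this quantity on the held-out fold, then average over folds; the exact scheme used in the paper is not spelled out here.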

5. Discussion

We proposed the adaptive group bridge for case-cohort data and studied its asymptotic properties. The simulation studies and the Busselton Health data example showed that the adaptive group bridge was superior to the group bridge and SCAD in terms of variable selection and prediction. The objective function of the proposed penalized proportional hazards model with the adaptive group bridge is non-convex. To minimize the non-convex objective function, we proposed a coordinate descent algorithm using the quadratic approximation of the pseudo-likelihood function and the optimization scheme for the adaptive L₁ penalty.
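To give a sense of the kind of update involved, the following is a generic sketch of coordinate descent for a quadratic objective with coefficient-specific L₁ weights, $\frac{1}{2}\beta^{T} H \beta - g^{T}\beta + \sum_j \lambda_j |\beta_j|$. It is not the algorithm of this paper: `H`, `g`, and `lam` are hypothetical inputs standing for a positive-definite quadratic approximation and adaptive penalty weights, and the outer iterations that would update them are omitted.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator: sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def coordinate_descent_l1(H, g, lam, n_iter=500, tol=1e-12):
    """Minimize 0.5*b'Hb - g'b + sum_j lam[j]*|b_j| by cycling over coordinates.

    H   : symmetric positive-definite matrix as a list of rows
    g   : linear term
    lam : non-negative penalty weight for each coefficient
    """
    p = len(g)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual with the j-th coordinate left out.
            partial = g[j] - sum(H[j][k] * beta[k] for k in range(p) if k != j)
            new_bj = soft_threshold(partial, lam[j]) / H[j][j]
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta

# With an identity H, each coefficient is simply soft-thresholded.
print(coordinate_descent_l1([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], [1.0, 1.0]))
# -> [2.0, 0.0]
```

A coefficient whose partial residual falls below its own weight is set exactly to zero, which is how larger weights on weak coefficients can produce sparse solutions.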

The proposed method is limited to the case d_n < n. Studying bi-level selection when d_n > n would be an important future research problem. One way to deal with this problem is screening. A two-stage selection procedure may be considered: we screen group variables in the first stage and then obtain a parsimonious list of non-zero variables using the adaptive group bridge in the second stage.

Making inferences using only the selected variables may not be valid because the hypotheses are generated from the data rather than pre-specified. Recently, post-selection inference for the LASSO has received much attention.^23–25 Developing a method to conduct valid inference after model selection would be an important research problem.

In this paper, we have considered the event-specific model. The joint analysis should be conducted when an investigator compares the risk effects on different diseases. Developing a variable selection method for multivariate failure time models under multiple case-cohort studies would be another interesting future research problem.

Supplementary Material

Supplemental material

NIHMS1046936-supplement-Supplemental_material.pdf (468.2 KB, pdf)

Acknowledgements

We thank Professor Matthew Knuiman and the Busselton Population Medical Research Foundation for permission to use their data.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by Institutional Research Grant IRG #16–183-31 from the American Cancer Society and the MCW Cancer Center, and the United States National Cancer Institute (U24CA076518).

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

1. Prentice R. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 1986; 73: 1–11.
2. Self SG and Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Statist 1988; 34: 103–119.
3. Barlow W. Robust variance estimation for the case-cohort design. Biometrics 1994; 50: 1064–1072.
4. Kulich M and Lin DY. Improving the efficiency of relative-risk estimation in case-cohort study. J Am Statist Assoc 2004; 99: 832–844.
5. Kang S and Cai J. Marginal hazard model for case-cohort studies with multiple disease outcomes. Biometrika 2009; 96: 887–901.
6. Kim S, Cai J and Lu W. More efficient estimators for case-cohort studies. Biometrika 2013; 100: 695–708.
7. Ni A, Cai J and Zeng D. Variable selection for case-cohort studies with failure time outcome. Biometrika 2016; 103: 547–562.
8. Cullen KJ. Mass health examinations in the Busselton population, 1966 to 1970. Aust J Med 1972; 2: 714–718.
9. Knuiman MW, Divitini ML, Olynyk JK, et al. Serum ferritin and cardiovascular disease: a 17-year follow-up study in Busselton, Western Australia. Am J Epidemiol 2003; 158: 144–149.
10. Cai J, Fan J, Li R, et al. Variable selection for multivariate failure time data. Biometrika 2005; 92: 303–316.
11. Ma S, Song X and Huang J. Supervised group lasso with applications to microarray. BMC Bioinformatics 2007; 3: 60.
12. Kim J, Sohn I, Jung S, et al. Analysis of survival data with group lasso. Comm Statist Simulation Comput 2012; 41: 1593–1605.
13. Huang J, Li L, Liu Y, et al. Group selection in the Cox model with a diverging number of covariates. Statist Sin 2014; 24: 1787–1810.
14. Wang S and Nan B. Hierarchically penalized Cox regression with grouped variables. Biometrika 2009; 96: 307–322.
15. Ahn KW, Banerjee A, Sahr N, et al. Group and within-group variable selection for competing risks data. Lifetime Data Anal 2018; 24: 407–424.
16. Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B 1972; 34: 187–220.
17. Kalbfleisch JD and Lawless JF. Likelihood analysis of multistate models for disease incidence and mortality. Statist Med 1988; 7: 149–160.
18. Borgan O, Langholz B, Samuelsen SO, et al. Exposure stratified case-cohort designs. Lifetime Data Anal 2000; 6: 39–58.
19. Huang J, Ma S, Xie H, et al. A group bridge approach for variable selection. Biometrika 2009; 96: 339–355.
20. Zou H. The adaptive lasso and its oracle properties. J Am Statist Assoc 2006; 101: 1418–1429.
21. Fan J and Li R. Variable selection for Cox's proportional hazards model and frailty properties. J Am Statist Assoc 2002; 30: 74–99.
22. Clayton D and Cuzick J. Multivariate generalizations of the proportional hazards model (with discussion). J R Statist Soc A 1985; 148: 82–117.
23. Lee J, Sun D, Sun Y, et al. Exact post-selection inference, with application to the lasso. Ann Statist 2016; 44: 907–927.
24. Lockhart R, Taylor J, Tibshirani R, et al. A significance test for the lasso. Ann Statist 2014; 42: 413–468.
25. Tibshirani R, Taylor J, Lockhart R, et al. Exact post-selection inference for sequential regression procedures. J Am Statist Assoc 2016; 111: 600–620.


