Journal of Applied Statistics
2022 Dec 29; 51(4): 759–779. doi: 10.1080/02664763.2022.2161488

The sparse estimation of the semiparametric linear transformation model with dependent current status data

Lin Luo a, Jinzhao Yu b, Hui Zhao b
PMCID: PMC10896163  PMID: 38414802

Abstract

In this paper, we study sparse estimation under the semiparametric linear transformation model for current status data, also called type I interval-censored data. In this setting, the failure time of interest may depend on the censoring time, and the association parameter between them is left unspecified. To address this, we employ a copula model to describe the dependence between the two times and a two-stage estimation procedure to estimate both the association parameter and the regression parameters. In addition, we propose a penalized maximum likelihood estimation procedure based on broken adaptive ridge regression, with Bernstein polynomials used to approximate the nonparametric functions involved. The oracle property of the proposed method is established, and numerical studies suggest that the method works well in practical situations. Finally, the method is applied to the Alzheimer's disease study that motivated this investigation.

Keywords: Broken adaptive ridge regression, current status data, dependent censoring, linear transformation model, variable selection

1. Introduction

This paper discusses the sparse estimation of the semiparametric linear transformation model with dependent current status data, which occur in many fields including econometrics, epidemiology, demography, and tumorigenicity experiments [7,17,19,27]. By dependent current status data, we mean that the failure event of interest is assessed only once, at a censoring or observation time C, so that the occurrence time T is never observed exactly and is either left- or right-censored; that is, we only know whether T is less than or greater than C. In addition, T and C may be dependent or correlated with each other. One example of such data occurs in tumorigenicity experiments. In these studies, the tumor onset time is usually of interest but cannot be observed exactly, since the animals are commonly examined for the presence or absence of a tumor only at their death or sacrifice time. If the tumor is lethal or non-lethal, the tumor onset time and the death time can usually be treated as the same or as independent, respectively, in the analysis. On the other hand, it is known that most types of tumors are between lethal and non-lethal, and in this case the two times are clearly related, resulting in dependent current status data on the tumor onset time. In the following, we discuss regression analysis of dependent current status data, with emphasis on simultaneous estimation and covariate selection.

There is a great deal of literature on variable selection, especially for linear models and completely observed data. Tibshirani [20] proposed the least absolute shrinkage and selection operator (LASSO) procedure, which does not have the oracle property, tends to select too many small noise features, and is biased for large parameters. Fan and Li [5] developed the smoothly clipped absolute deviation (SCAD) penalty for variable selection and proved its oracle properties. Zou [32] suggested the adaptive LASSO (ALASSO) procedure, a weighted version of the LASSO. Lv and Fan [12] studied the smooth integration of counting and absolute deviation (SICA) penalty and proved its nonasymptotic property, also called the weak oracle property. Zhang [25] developed the minimax concave penalty (MCP) procedure, which provides a fast algorithm for nearly unbiased concave penalized selection. Dicker et al. [4] gave the seamless-L0 (SELO) penalty, a smooth function on [0, ∞) that closely mimics the L0 penalty.

In addition, there is also some literature on variable selection for survival models and censored data. For example, Tibshirani [21], Fan et al. [6], Zhang et al. [26] and Shi et al. [16] investigated the LASSO, SCAD, ALASSO and SICA penalty-based procedures, respectively, under the framework of the Cox PH model. Liu et al. [11] discussed the ALASSO in general transformation models. However, most of the existing methods for failure time data apply only to right-censored data, and little research to date has addressed variable selection for interval-censored or current status data. Recently, Scolas et al. [15] and Wu and Cook [23] considered interval-censored data arising from the Cox PH model. Sun et al. [18] extended the LASSO, SCAD and ALASSO to the semiparametric nonmixture cure model with interval-censored failure time data. Zhao et al. [29] and Li et al. [10] discussed simultaneous estimation and covariate selection for interval-censored data with a broken adaptive ridge (BAR) regression approach. However, all of the studies above assumed that the censoring mechanism is independent of the failure time of interest, and as mentioned above, this independence assumption may not be valid in many situations.

To deal with dependent censoring, recently, Ma et al. [13], Zhao et al. [28] and Xu et al. [24] employed the copula model [14,30] to describe the relationship between the failure time T of interest and the censoring time C in the contexts of the Cox PH model, the additive hazard model and the linear transformation model, respectively. However, these methods assumed that the association parameter between T and C is known, which is clearly not realistic in general. As pointed out by Ma et al. [13] and others, the resulting estimators of regression parameters can be sensitive to the assumed association parameter or be biased if the assumed association is misspecified.

In this paper, we focus on the sparse estimation of dependent current status data arising from the semiparametric linear transformation model, where a copula model is employed to describe the correlation between the failure time and the censoring time. In the proposed method, we allow the association parameter between T and C to be unspecified and propose a two-stage penalized estimation procedure based on BAR regression [3,29], which has both the oracle property and the grouping effect. The approach approximates L0-penalized regression by an iteratively reweighted L2-penalized algorithm and has the advantage of simultaneous variable selection and parameter estimation. Moreover, the BAR iterative algorithm is fast and converges to a unique global optimal solution.

The rest of this paper is organized as follows. In Section 2, we begin with introducing some notations and models that will be used throughout the paper. Furthermore, a sieve likelihood function based on Bernstein polynomials is presented. In Section 3, a two-stage estimation procedure, the oracle property and the iterative algorithm of regression parameters under the BAR penalty function are developed. In Section 4, we give some simulation study results to evaluate the performance of the proposed method. Moreover, we compare the BAR regression procedure with the methods that make use of other commonly used penalty functions. In Section 5, we apply the proposed method to an application that motivated this study. Some concluding remarks and discussion are given in Section 6.

2. Notation, model and likelihood function

Consider a failure time study that consists of n independent subjects. For subject i, let Ti denote the failure time of interest and Xi be a p-dimensional vector of covariates. To describe the covariate effects, we assume that Ti follows the linear transformation model specified by

h(T_i) = β^T X_i + ε_i, (1)

where h(·) is a completely unspecified, strictly increasing function, β is the p-dimensional vector of unknown regression parameters, and ε_i is a random error with a completely known distribution. Then it is easy to see that model (1) can be rewritten as

S(t ∣ X) = G{h(t) − β^T X}, (2)

where S(t ∣ X) denotes the survival function of T given X and G is the survival function of ε_i. Note that model (1) is very flexible and includes many popular models as special cases, which avoids possible model misspecification. For example, if ε_i follows the extreme value distribution, or G(t) = exp{−exp(t)}, model (1) is the proportional hazards (PH) model, while if ε_i follows the standard logistic distribution, or G(t) = {1 + exp(t)}^{−1}, model (1) becomes the proportional odds (PO) model.
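As an illustration, the two special cases of G have simple closed forms; the sketch below evaluates S(t ∣ X) under model (2), taking h(t) = log t as an illustrative choice (the one used later in the simulation studies):

```python
import math

def G(t, omega=0.0):
    """Survival function of the error term in model (1):
    omega = 0 gives the extreme-value error (PH model),
    omega = 1 gives the standard logistic error (PO model)."""
    if omega == 0:
        return math.exp(-math.exp(t))
    return (1.0 + omega * math.exp(t)) ** (-1.0 / omega)

def survival(t, x, beta, h=math.log, omega=0.0):
    """S(t | X) = G{h(t) - beta^T x} under model (2)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return G(h(t) - eta, omega)
```

For omega = 1 this reduces to G(t) = 1/(1 + e^t), the standard logistic survival function, so the PH and PO cases are covered by the same one-parameter family.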

Suppose that the observed data are given by {(C_i, δ_i = I(T_i ≤ C_i), X_i); i = 1, 2, …, n}, where C_i denotes the censoring or observation time, which may depend on T_i. Each subject is observed only once at C_i, and δ_i = 1 or δ_i = 0 corresponds to a left- or right-censored observation on the ith subject. In practice, the observation times C_i may depend on covariates too. For this, we assume that the conditional hazard function of C_i is given by a Cox PH model,

λ(t; X_i) = λ_0(t) exp{φ^T W_i}, (3)

where W_i is a d-dimensional (d < p) subvector of X_i, λ_0(t) is an unspecified baseline hazard function, and φ is the vector of unknown regression parameters.

For inference about model (1), let F_T and F_C denote the marginal distribution functions of the T_i's and C_i's given covariates, respectively, and let F be their joint distribution. Then it follows from Theorem 2.3.3 of Nelsen [14] that there exists a copula function M_ρ(μ, ν) defined on I² = [0, 1] × [0, 1], with M_ρ(μ, 0) = M_ρ(0, ν) = 0, M_ρ(μ, 1) = μ and M_ρ(1, ν) = ν, such that

F(t, c) = M_ρ(F_T(t), F_C(c)),

where the parameter ρ is often referred to as the association parameter representing the relationship between T and C. In this paper, ρ is allowed to be unknown and needs to be estimated. Furthermore, we have

m_ρ(F_T(t), F_C(c)) = P(T ≤ t ∣ C = c, X) = ∂M_ρ(μ, ν)/∂ν ∣_{μ = F_T(t), ν = F_C(c)}

by the conditional inversion idea [14], where m_ρ(F_T(t), F_C(c)) represents the conditional distribution function of T given C and X.
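For a concrete copula, the Farlie–Gumbel–Morgenstern (FGM) family used later in Section 4 gives closed forms for both M_ρ and its partial derivative m_ρ; a minimal sketch:

```python
def fgm(u, v, rho):
    """FGM copula M_rho(u, v), with -1 <= rho <= 1."""
    return u * v + rho * u * v * (1 - u) * (1 - v)

def fgm_conditional(u, v, rho):
    """m_rho(u, v) = dM_rho(u, v)/dv, the conditional cdf of T given C = c
    evaluated at u = F_T(t), v = F_C(c)."""
    return u + rho * u * (1 - u) * (1 - 2 * v)
```

The boundary conditions M_ρ(μ, 1) = μ, M_ρ(1, ν) = ν and M_ρ(μ, 0) = M_ρ(0, ν) = 0 hold by construction for every ρ in [−1, 1].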

Define Λ_1(t) = exp{h(t)}, Λ_2(c) = ∫_0^c λ_0(s) ds, θ = (β^T, ρ, Λ_1(·))^T and η = (φ^T, Λ_2(·))^T, where β is the regression parameter of interest. Let f_C denote the marginal density function of C given covariates. Then under the models above, we have

F_T(t) = 1 − G{log(Λ_1(t)) − β^T X},  F_C(c) = 1 − exp{−Λ_2(c) exp(φ^T W)},

and

f_C(c) = exp{−Λ_2(c) exp(φ^T W)} λ_0(c) exp(φ^T W).

Then given covariates X, the observed conditional likelihood function can be written as

L_n(θ, η) = ∏_{i=1}^n P(C_i = c_i, δ_i = 1 ∣ X_i)^{δ_i} P(C_i = c_i, δ_i = 0 ∣ X_i)^{1−δ_i}
 = ∏_{i=1}^n {P(T_i ≤ c_i ∣ C_i = c_i, X_i) f_C(c_i)}^{δ_i} {[1 − P(T_i ≤ c_i ∣ C_i = c_i, X_i)] f_C(c_i)}^{1−δ_i}
 = ∏_{i=1}^n {m_ρ(F_T(c_i), F_C(c_i))}^{δ_i} {1 − m_ρ(F_T(c_i), F_C(c_i))}^{1−δ_i} f_C(c_i). (4)
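To make (4) concrete, the sketch below evaluates one subject's log-likelihood contribution under illustrative stand-ins (not the paper's sieve estimates): Λ_1(t) = t (i.e. h(t) = log t), the PH error G(t) = exp{−exp(t)}, Λ_2(c) = c with φ = 0 (so F_C(c) = 1 − e^{−c} and f_C(c) = e^{−c}), and the FGM copula:

```python
import math

def loglik_contribution(c, delta, x, beta, rho):
    """One factor of (4) on the log scale:
    delta*log m + (1 - delta)*log(1 - m) + log f_C(c)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    # F_T(c) = 1 - G{log Lambda_1(c) - eta} with G(s) = exp(-exp(s))
    FT = 1.0 - math.exp(-c * math.exp(-eta))
    FC = 1.0 - math.exp(-c)            # exponential observation time
    # FGM conditional cdf m_rho(F_T(c), F_C(c))
    m = FT + rho * FT * (1.0 - FT) * (1.0 - 2.0 * FC)
    log_fC = -c                        # log f_C(c) for the exponential
    return (math.log(m) if delta == 1 else math.log(1.0 - m)) + log_fC
```

Setting rho = 0 recovers the independent-censoring likelihood, since m then reduces to F_T(c).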

In the next section, we will present a two-stage estimation algorithm to estimate θ and η, especially to obtain the sparse estimate of β.

3. Estimation and inference procedure

To estimate θ and η, it is clearly desirable to directly maximize the likelihood function L_n(θ, η). However, note that the estimation of η involves only model (3), and the observation times C_i are completely observed. It is therefore natural to estimate φ and Λ_2(t) by the maximum partial likelihood estimator φ̂ and the Breslow estimator Λ̂_2(t), respectively.

In this section, we propose the following two-stage estimation procedure.

The first stage is to estimate η based on model (3). That is, define φ^ to be the maximizer of the partial likelihood function

L(φ) = ∏_{i=1}^n e^{φ^T W_i} / Σ_{j: C_j ≥ C_i} e^{φ^T W_j},

and furthermore let

Λ̂_2(t) = Σ_{i: C_i ≤ t} 1 / Σ_{j: C_j ≥ C_i} e^{φ̂^T W_j}.
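Since every C_i is an exact observation (there is no censoring on the observation times), the Breslow step has a particularly simple form; a sketch, for a given first-stage estimate φ̂:

```python
import math

def breslow_lambda2(times, W, phi_hat):
    """Breslow estimator: Lambda_2_hat(t) = sum over {i: C_i <= t} of
    1 / sum_{j: C_j >= C_i} exp(phi_hat^T W_j)."""
    def risk_set_sum(c_i):
        return sum(math.exp(sum(p * w for p, w in zip(phi_hat, Wj)))
                   for c_j, Wj in zip(times, W) if c_j >= c_i)
    return lambda t: sum(1.0 / risk_set_sum(c_i) for c_i in times if c_i <= t)
```

With φ̂ = 0 this reduces to the familiar sum of 1/(number at risk) over observation times up to t.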

For the second stage, to estimate θ, one could employ a sieve conditional likelihood procedure, since Λ_1 is nonparametric and involves an infinite-dimensional parameter. Following Zhou et al. [31] and others, we use Bernstein polynomials [22] to approximate Λ_1. Specifically, define the sieve space

Θ_n = {θ_n = (β^T, ρ, Λ_1)^T : (β^T, ρ)^T ∈ B, Λ_1 ∈ M_n},

where B = {(β^T, ρ)^T ∈ R^{p+1} : ‖β‖ + |ρ| ≤ D} and M_n = {Λ_n = Σ_{k=0}^m ψ_k B_k(t, m, l, u) : Σ_{k=0}^m |ψ_k| ≤ M_n, 0 ≤ ψ_0 ≤ ψ_1 ≤ ⋯ ≤ ψ_m}, with

B_k(t, m, l, u) = (m choose k) ((t − l)/(u − l))^k (1 − (t − l)/(u − l))^{m−k},  k = 0, 1, …, m.

In the above, D is a positive constant and M_n is a class of nonnegative, nondecreasing functions over the interval [l, u] with 0 ≤ l < u < ∞, usually taken as the range of the observed data. B_k(t, m, l, u) is the Bernstein basis polynomial of degree m = o(n^v) for some v ∈ (0, 1). In practice, one can choose the value of m based on a model selection criterion, such as AIC or the Bayesian information criterion (BIC). In addition, by virtue of the nonnegativity and monotonicity of Λ_1, it is natural to reparameterize by letting ψ_0 = exp(ϕ_0) and ψ_k = Σ_{i=0}^k exp(ϕ_i) (k = 1, 2, …, m), which removes the constraint 0 ≤ ψ_0 ≤ ψ_1 ≤ ⋯ ≤ ψ_m.
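The basis and the reparameterized monotone sieve can be sketched directly; since the coefficients ψ_k are cumulative sums of exponentials, the resulting approximation of Λ_1 is nondecreasing by construction:

```python
from math import comb, exp

def bernstein_basis(t, k, m, l=0.0, u=1.0):
    """B_k(t, m, l, u) = C(m, k) s^k (1 - s)^(m - k), s = (t - l)/(u - l)."""
    s = (t - l) / (u - l)
    return comb(m, k) * s**k * (1 - s)**(m - k)

def lambda1_sieve(t, phi, l=0.0, u=1.0):
    """Sieve approximation of Lambda_1 with psi_k = sum_{i<=k} exp(phi_i),
    so the coefficient sequence (and hence the curve) is nondecreasing."""
    m = len(phi) - 1
    acc, psi = 0.0, []
    for ph in phi:
        acc += exp(ph)
        psi.append(acc)
    return sum(p * bernstein_basis(t, k, m, l, u) for k, p in enumerate(psi))
```

Because the basis polynomials sum to one for every t, a constant coefficient sequence yields a constant function, and nondecreasing coefficients yield a nondecreasing Λ_n.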

Denote ϕ = (ρ, ϕ_0, ϕ_1, …, ϕ_m)^T and let l_c(β, ϕ) = log L_n(β, ϕ ∣ η̂) be the conditional log-likelihood of θ given η̂ = (φ̂, Λ̂_2). In the following, we develop a penalized or regularized procedure for the sparse estimation of β based on the profile log-likelihood l_p(β) = max_ϕ l_c(β, ϕ), where the dimension of covariates, p, can diverge to infinity but p < n; to emphasize the dependence of p on n, we will write p_n for p.

For the simultaneous estimation and covariate selection for model (1), we consider the penalized profile likelihood function

l_pp(β ∣ β̌) = −2 l_p(β) + Σ_{j=1}^{p_n} P(|β_j|; λ_n), (5)

where the penalty function is P(|β_j|; λ_n) = λ_n β_j² / β̌_j² and λ_n is a tuning parameter. Here β̌ = (β̌_1, …, β̌_{p_n})^T is a consistent estimator of β without zero components. Dai et al. [3] and Zhao et al. [29] also discussed this penalty function in the contexts of the linear model and the PH model, respectively. The minimization of (5) can be regarded as an automatic implementation of best subset selection in an asymptotic sense, since the term β_j² / β̌_j² is expected to converge in probability to I(|β_j| ≠ 0) as n → ∞.

To obtain the sparse estimate of β, one could, of course, directly minimize l_pp(β ∣ β̌) in (5) by some numerical iterative algorithm. Here, we adopt a quadratic approximation algorithm, which is easier and computationally more efficient.

Firstly, define the gradient vector l̇_c(β ∣ ϕ) = ∂l_c(β, ϕ)/∂β and the Hessian matrix l̈_c(β ∣ ϕ) = ∂²l_c(β, ϕ)/∂β∂β^T. Suppose that (β̃, ϕ̃) satisfies l̇_c(β̃ ∣ ϕ̃) = 0. By the Cholesky decomposition, there exists a unique upper triangular matrix Z ∈ R^{p_n × p_n} such that −l̈_c(β ∣ ϕ̃) = Z^T Z. In addition, define the pseudo-response vector y = (Z^T)^{−1}[l̇_c(β ∣ ϕ̃) − l̈_c(β ∣ ϕ̃)β]. Then, by a second-order Taylor expansion within a small neighborhood of β̃, we have

l_p(β) ≈ c − (1/2) l̇_c(β ∣ ϕ̃)^T [−l̈_c(β ∣ ϕ̃)]^{−1} l̇_c(β ∣ ϕ̃),

and

‖y‖² = l̇_c(β ∣ ϕ̃)^T [−l̈_c(β ∣ ϕ̃)]^{−1} l̇_c(β ∣ ϕ̃),

where c is a constant and ‖·‖ denotes the Euclidean norm. Thus, minimizing (5) is asymptotically equivalent to minimizing the following penalized least-squares function:

‖y − Zβ‖² + λ_n Σ_{j=1}^{p_n} β_j² / β̌_j². (6)

Denote A_n = A_n(β) = Z^T Z and B_n = B_n(β) = Z^T y; then, by minimizing (6), we can derive the following iterative formula:

β̂^{(k+1)} = {A_n(β̂^{(k)}) + λ_n D(β̂^{(k)})}^{−1} B_n(β̂^{(k)}), (7)

where D(β̂^{(k)}) = diag((β̂_1^{(k)})^{−2}, (β̂_2^{(k)})^{−2}, …, (β̂_{p_n}^{(k)})^{−2}) is a p_n × p_n matrix. Note that the above iteration may sometimes suffer arithmetic overflow, and to address this issue, we rewrite (7) as

β̂^{(k+1)} = Γ(β̂^{(k)}) {Γ(β̂^{(k)}) A_n(β̂^{(k)}) Γ(β̂^{(k)}) + λ_n I_{p_n}}^{−1} Γ(β̂^{(k)}) B_n(β̂^{(k)}), (8)

where Γ(β̂^{(k)}) = diag(β̂_1^{(k)}, β̂_2^{(k)}, …, β̂_{p_n}^{(k)}). Now, for a given λ_n, we propose the following steps to solve the objective function (6).

  • Step 1.

    Choose an initial estimator β̂^{(0)} satisfying ‖β̂^{(0)} − β_0‖ = O_p((p_n/n)^{1/2}), together with an initial value ϕ̂^{(0)}.

  • Step 2.

    At the (k + 1)th step, update the estimate of β given (β̂^{(k)}, ϕ̂^{(k)}) by Equation (8).

  • Step 3.

    By solving l̇_c(ϕ ∣ β̂^{(k+1)}) = ∂l_c(β̂^{(k+1)}, ϕ)/∂ϕ = 0, obtain the updated estimate ϕ̂^{(k+1)}.

  • Step 4.

    Repeat Steps 2 and 3 until convergence is achieved; the detailed expressions of l̇_c(β̂^{(k)} ∣ ϕ̂^{(k)}), l̈_c(β̂^{(k)} ∣ ϕ̂^{(k)}) and l̇_c(ϕ ∣ β̂^{(k+1)}) are derived in Appendix (A1).
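The core BAR update (8) can be illustrated on the least-squares surrogate (6) alone: the sketch below keeps A_n and B_n fixed (the plain linear-model case discussed by Dai et al. [3]) rather than refreshing them from the profile likelihood at each step, and uses a ridge estimate as the initial value, as in the simulation studies:

```python
import numpy as np

def bar_iteration(Z, y, lam, max_iter=500, tol=1e-10):
    """Iterate (8): beta <- Gamma (Gamma A Gamma + lam I)^(-1) Gamma B,
    with Gamma = diag(beta), A = Z'Z, B = Z'y held fixed."""
    A, B = Z.T @ Z, Z.T @ y
    p = Z.shape[1]
    beta = np.linalg.solve(A + np.eye(p), B)   # ridge initial estimate
    for _ in range(max_iter):
        Gam = np.diag(beta)
        new = Gam @ np.linalg.solve(Gam @ A @ Gam + lam * np.eye(p), Gam @ B)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta
```

Components whose true value is zero shrink toward zero geometrically across iterations, while the nonzero components converge to nearly unbiased estimates, which is the simultaneous selection-and-estimation behavior described above.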

Let β̂ = lim_{k→∞} β̂^{(k)} and ϕ̂ = lim_{k→∞} ϕ̂^{(k)} denote the estimators of β and ϕ obtained above, which will be referred to as the BAR estimators. For the determination of the tuning parameter λ_n, we propose to use the widely used BIC and minimize

BIC(λ) = −2 l_p(β̂) + q_n log(n),

where q_n denotes the number of nonzero components of β̂.
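A sketch of this tuning step, assuming each candidate λ has already been fit (the container `fits` and its contents are hypothetical names, not from the paper):

```python
import math

def bic_select(fits, n, eps=1e-8):
    """Return the lambda minimizing BIC(lambda) = -2*l_p(beta_hat) + q_n*log(n),
    where fits maps lambda -> (profile loglik at beta_hat, beta_hat)."""
    def bic(lam):
        loglik, beta = fits[lam]
        q_n = sum(1 for b in beta if abs(b) > eps)
        return -2.0 * loglik + q_n * math.log(n)
    return min(fits, key=bic)
```

A larger λ that zeroes out extra components is preferred whenever its loss in log-likelihood is outweighed by the (log n)/2 saving per dropped parameter.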

Let β_0 = (β_{0,1}, β_{0,2}, …, β_{0,p_n})^T denote the true value of β; without loss of generality, assume that β_0 = (β_{01}^T, β_{02}^T)^T, where β_{01} consists of all q_n nonzero components and β_{02} of the remaining zero components. Correspondingly, the BAR estimator β̂ = (β̂_1^T, β̂_2^T)^T is partitioned in the same way. The theorem below gives the asymptotic properties of the proposed BAR estimator β̂.

Theorem 3.1 Oracle Property —

Assume that the regularity conditions C1–C8 given in Appendix (A2) hold. Then, with probability tending to 1, β̂_2 = 0, and β̂_1 exists and has the following properties:

  • (i)

    β̂_1 is the unique fixed point of the equation β_1 = {A_n^{(1)} + λ_n D_1(β_1)}^{−1} B_n^{(1)}, where D_1(β_1) = diag(β_1^{−2}, …, β_{q_n}^{−2}), A_n^{(1)} is the q_n × q_n leading submatrix of A_n, and B_n^{(1)} is the vector consisting of the first q_n components of B_n.

  • (ii)

    For any q_n-dimensional vector b satisfying ‖b‖_2 ≤ 1, n^{1/2} t^{−1} b^T(β̂_1 − β_{01}) converges in distribution to N(0, 1), where t² = b^T Σ b and Σ is defined in Appendix (A2).

The proof will be sketched in Appendix (A2).

4. Simulation studies

In this section, we conduct simulation studies to assess the finite-sample performance of the proposed BAR regression procedure and compare it with other variable selection methods: the LASSO penalty P(|β_j|; λ_n) = λ_n|β_j|; the ALASSO penalty P(|β_j|; λ_n) = λ_n ω_j|β_j|, with ω_j a weight; the SCAD penalty P(|β_j|; λ_n) = λ_n ∫_0^{|β_j|} min{1, (aλ_n − x)_+/((a − 1)λ_n)} dx, with a > 2; the SICA penalty P(|β_j|; λ_n) = λ_n(τ_0 + 1)|β_j|/(|β_j| + τ_0), with τ_0 > 0; the SELO penalty P(|β_j|; λ_n) = λ_n log(|β_j|/(|β_j| + τ_0) + 1), with τ_0 > 0; and the MCP penalty P(|β_j|; λ_n) = ∫_0^{|β_j|} (λ_n − x/τ_0)_+ dx, with τ_0 > 1.

First, we generated the covariates X from a multivariate normal distribution with mean zero, variance one, and correlation 0.1^{|i−j|} between X_i and X_j, i, j = 1, …, p. To describe the dependent censoring, we considered the Farlie–Gumbel–Morgenstern (FGM) copula model

M_ρ(μ, ν) = μν + ρμν(1 − μ)(1 − ν),  −1 ≤ ρ ≤ 1,

where ρ is the association parameter. It is well known that Kendall's τ is a commonly used global association measure, which is robust and invariant to monotone transformations. For the FGM copula model considered here, τ = P{(T_i − T_j)(C_i − C_j) > 0} − P{(T_i − T_j)(C_i − C_j) < 0} = 2ρ/9 is used to measure the correlation between the T_i's and C_i's.
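The identity τ = 2ρ/9 can be checked numerically from the general copula formula τ = 4 E[M_ρ(U, V)] − 1, integrating against the FGM copula density c(u, v) = 1 + ρ(1 − 2u)(1 − 2v) on a midpoint grid:

```python
def fgm_kendall_tau(rho, n=400):
    """Midpoint-rule approximation of
    tau = 4 * int int M_rho(u, v) c(u, v) du dv - 1 for the FGM copula."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            M = u * v + rho * u * v * (1 - u) * (1 - v)
            c = 1.0 + rho * (1 - 2 * u) * (1 - 2 * v)
            total += M * c
    return 4.0 * total * h * h - 1.0
```

For ρ = 0.9 this returns approximately 0.2 = 2(0.9)/9, and ρ = 0 (independence) gives τ = 0, confirming the restricted range |τ| ≤ 2/9 of the FGM family.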

For subject i, we generated the failure time T_i under model (1) with h(t) = log t and

G(t) = {1 + ω exp(t)}^{−1/ω} if ω > 0, and G(t) = exp{−exp(t)} if ω = 0,

where ω = 0 and ω = 1 correspond to the PH model and the PO model, respectively. The observation time C_i was then generated from its conditional distribution given T_i, with Λ_2(c_i) = c_i. More specifically, we sampled b_i from U(0, 1) and solved the equation

b_i = P(C ≤ c_i ∣ T = t_i, X_i) = ∂M_ρ(μ, ν)/∂μ ∣_{μ = F_T(t_i), ν = F_C(c_i)}

for C_i = c_i. Define the mean weighted squared error (MSE) to be (β̂ − β_0)^T E(XX^T)(β̂ − β_0). In all tables, we report the median of the MSE (MMSE), the standard deviation of the MSE (SD), the average number of nonzero estimates among the parameters whose true values are not zero (TP), and the average number of nonzero estimates among the parameters whose true values are zero (FP). TP and FP thus estimate the true- and false-positive probabilities, respectively. For the results here, we took the degree of the Bernstein polynomials to be m = [n^{1/4}] = 3, the largest integer smaller than n^{1/4}, and used the ridge regression estimate as the initial estimate for the proposed algorithm. The tuning parameters were selected by the BIC criterion. The results below are based on n = 200, 300 or 500 with 500 replications.
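For the FGM copula, this conditional inversion step has a closed form: with a = ρ(1 − 2μ), the equation b = ν + aν(1 − ν) is a quadratic in ν, so the data-generation equation above can be solved exactly. A sketch, under the simulation's Λ_2(c) = c with φ = 0 (so F_C(c) = 1 − e^{−c}):

```python
import math

def invert_fgm_conditional(b, mu, rho):
    """Solve b = dM_rho(mu, nu)/dmu = nu + rho*(1 - 2*mu)*nu*(1 - nu)
    for nu in (0, 1); the smaller-numerator root is the valid one."""
    a = rho * (1.0 - 2.0 * mu)
    if abs(a) < 1e-12:
        return b                      # independence-like case
    return ((1.0 + a) - math.sqrt((1.0 + a) ** 2 - 4.0 * a * b)) / (2.0 * a)

def observation_time(b, mu, rho):
    """Map the solved nu = F_C(c) back to c via F_C(c) = 1 - exp(-c)."""
    nu = invert_fgm_conditional(b, mu, rho)
    return -math.log(1.0 - nu)
```

In the simulation, b is a U(0, 1) draw and μ = F_T(t_i) is evaluated at the already-generated failure time, which induces the desired dependence between T_i and C_i.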

Firstly, we considered the proposed method in the situation where the association parameter between T and C is known. Table 1 presents the results on covariate selection under the PH model (ω = 0) and the BAR penalty function with p = 10 or 20. Here, we set β_j = 1 for the first four components of X and β_j = 0 for the other components. W is composed of the first 4 and the last 4 elements of X, i.e. d = 8, and we take φ_j = 0 or 0.1 for all components of W. In addition, we set τ = ±0.2, ±0.1, 0. The results given in Table 2 were obtained under the same setups but with the PO model (ω = 1). Following a reviewer's suggestion, Table 3 lists the results obtained after increasing the correlation between X_i and X_j to 0.5^{|i−j|}, i, j = 1, …, p, with φ_j = 0.1, j = 1, …, 8. These tables show that the BAR approach works well in all setups and that the proposed method performs better under the PH model than under the PO model. In all cases, its performance improves as the sample size increases.

Secondly, we compared the proposed BAR regression procedure with the LASSO, ALASSO, SCAD, SICA, SELO and MCP methods. Tables 4 and 5 present the results on covariate selection under these penalty functions, based on the same setups as Tables 1 and 2, but with p = 10, n = 300, ω = 0 or 1 and τ = ±0.1, 0. The results show that all methods perform well except the LASSO, which tends to select more noise variables than the others. Under the PH model, the BAR method gives the smallest MMSE and SD, while all methods perform similarly under the PO model. The BAR approach also generally yielded the smallest FP among the methods considered.

Table 1.

Results on covariate selection based on BAR with ω=0.

τ MMSE(SD) TP FP MMSE(SD) TP FP
n = 200 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.055(0.109) 4.000 0.122 0.076(0.098) 3.998 0.146
−0.1 0.061(0.104) 4.000 0.120 0.068(0.091) 4.000 0.124
0 0.050(0.128) 3.998 0.130 0.046(0.105) 4.000 0.148
0.1 0.067(0.113) 4.000 0.122 0.054(0.087) 3.998 0.130
0.2 0.066(0.117) 4.000 0.138 0.062(0.151) 3.700 0.166
n = 300 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.008(0.041) 4.000 0.020 0.046(0.038) 4.000 0.040
−0.1 0.048(0.034) 4.000 0.012 0.047(0.040) 4.000 0.028
0 0.049(0.047) 4.000 0.024 0.039(0.021) 4.000 0.004
0.1 0.023(0.035) 4.000 0.010 0.023(0.011) 4.000 0.080
0.2 0.028(0.049) 4.000 0.032 0.031(0.022) 4.000 0.014
n = 300 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.046(0.098) 4.000 0.668 0.079(0.045) 4.000 0.664
−0.1 0.069(0.071) 4.000 0.652 0.058(0.046) 4.000 0.634
0 0.078(0.064) 4.000 0.628 0.055(0.058) 4.000 0.686
0.1 0.046(0.084) 4.000 0.566 0.063(0.054) 4.000 0.660
0.2 0.055(0.053) 3.700 0.678 0.046(0.048) 4.000 0.606
n = 500 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.027(0.048) 4.000 0.150 0.046(0.045) 4.000 0.167
−0.1 0.048(0.050) 4.000 0.128 0.036(0.046) 4.000 0.128
0 0.043(0.021) 4.000 0.104 0.040(0.058) 4.000 0.256
0.1 0.046(0.051) 4.000 0.180 0.036(0.054) 4.000 0.130
0.2 0.034(0.042) 4.000 0.114 0.042(0.048) 4.000 0.106

Table 2.

Results on covariate selection based on BAR with ω=1.

τ MMSE(SD) TP FP MMSE(SD) TP FP
n = 200 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.219(0.630) 4.000 0.220 0.294(0.402) 3.990 0.240
−0.1 0.246(0.476) 3.990 0.220 0.242(0.369) 3.990 0.140
0 0.270(0.492) 4.000 0.190 0.266(0.413) 4.000 0.260
0.1 0.256(0.465) 4.000 0.210 0.258(0.403) 3.990 0.210
0.2 0.257(0.350) 4.000 0.220 0.238(0.302) 4.000 0.270
n = 300 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.176(0.240) 4.000 0.200 0.172(0.281) 4.000 0.210
−0.1 0.167(0.177) 4.000 0.180 0.163(0.245) 4.000 0.142
0 0.132(0.177) 4.000 0.130 0.180(0.206) 4.000 0.190
0.1 0.158(0.249) 4.000 0.160 0.204(0.216) 4.000 0.160
0.2 0.161(0.196) 4.000 0.250 0.147(0.183) 4.000 0.140
n = 300 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.319(0.314) 4.000 1.858 0.308(0.409) 4.000 1.900
−0.1 0.280(0.331) 4.000 1.756 0.324(0.393) 4.000 1.874
0 0.283(0.371) 4.000 1.900 0.317(0.326) 4.000 1.930
0.1 0.278(0.316) 4.000 1.842 0.301(0.345) 4.000 1.886
0.2 0.305(0.326) 4.000 1.884 0.308(0.344) 4.000 1.874
n = 500 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.199(0.185) 4.000 1.556 0.197(0.210) 4.000 1.588
−0.1 0.201(0.208) 4.000 1.512 0.216(0.200) 4.000 1.676
0 0.189(0.177) 4.000 1.572 0.226(0.220) 4.000 1.612
0.1 0.184(0.201) 4.000 1.414 0.201(0.244) 4.000 1.648
0.2 0.186(0.180) 4.000 1.468 0.193(0.197) 4.000 1.464

Table 3.

Results based on BAR with the correlation of the covariates X being 0.5^{|i−j|}.

  τ = −0.1 τ = 0.1
PH MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.362(0.311) 3.814 0.443 0.505(0.353) 3.867 0.167
n = 300 0.237(0.219) 3.929 0.171 0.306(0.364) 4.000 0.020
  p = 20 p = 20
n = 300 0.427(0.357) 3.733 1.350 0.569(0.355) 4.000 2.300
n = 500 0.279(0.169) 3.800 0.977 0.173(0.431) 4.000 1.830
  τ(unknown) τ=0.1(known)
PO MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.464(0.552) 3.800 0.667 0.354(0.554) 3.940 0.281
n = 300 0.437(0.296) 3.900 0.336 0.275(0.401) 4.000 0.261
  p = 20 p = 20
n = 300 0.755(0.458) 4.000 2.175 0.389(0.326) 4.000 2.367
n = 500 0.495(0.301) 4.000 2.200 0.255(0.266) 4.000 1.733

Table 4.

Results on covariate selection with ω=0, n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
τ = −0.1 φ = 0 φ ≠ 0
BAR 0.048(0.034) 4.000 0.012 0.047(0.040) 4.000 0.028
LASSO 0.112(0.101) 4.000 2.110 0.123(0.103) 4.000 1.670
ALASSO 0.123(0.099) 4.000 0.208 0.114(0.057) 4.000 0.128
SCAD 0.130(0.064) 4.000 0.154 0.100(0.041) 4.000 0.132
SICA 0.112(0.070) 4.000 0.128 0.132(0.052) 4.000 0.116
SELO 0.139(0.064) 4.000 0.188 0.112(0.063) 4.000 0.142
MCP 0.128(0.068) 4.000 0.160 0.145(0.064) 4.000 0.148
τ = 0 φ = 0 φ ≠ 0
BAR 0.049(0.047) 4.000 0.024 0.039(0.021) 4.000 0.004
LASSO 0.134(0.096) 4.000 2.460 0.162(0.095) 4.000 1.910
ALASSO 0.112(0.081) 4.000 0.126 0.112(0.049) 4.000 0.162
SCAD 0.125(0.068) 4.000 0.136 0.133(0.045) 4.000 0.178
SICA 0.113(0.079) 4.000 0.118 0.152(0.044) 4.000 0.112
SELO 0.112(0.073) 4.000 0.130 0.145(0.039) 4.000 0.152
MCP 0.136(0.063) 4.000 0.168 0.131(0.040) 4.000 0.146
τ = 0.1 φ = 0 φ ≠ 0
BAR 0.023(0.035) 4.000 0.010 0.023(0.011) 4.000 0.080
LASSO 0.198(0.186) 4.000 2.260 0.156(0.098) 4.000 2.430
ALASSO 0.129(0.087) 4.000 0.224 0.126(0.065) 4.000 0.110
SCAD 0.132(0.067) 4.000 0.174 0.138(0.034) 4.000 0.196
SICA 0.112(0.061) 4.000 0.126 0.124(0.081) 4.000 0.140
SELO 0.134(0.068) 4.000 0.104 0.142(0.038) 4.000 0.146
MCP 0.125(0.062) 4.000 0.186 0.122(0.031) 4.000 0.172

Table 5.

Results on covariate selection with ω=1, n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
τ = −0.1 φ = 0 φ ≠ 0
BAR 0.167(0.177) 4.000 0.180 0.163(0.245) 4.000 0.142
LASSO 0.230(0.137) 4.000 2.344 0.254(0.147) 4.000 2.318
ALASSO 0.260(0.247) 4.000 0.770 0.238(0.336) 4.000 0.680
SCAD 0.197(0.144) 4.000 0.890 0.215(0.152) 4.000 1.020
SICA 0.233(0.246) 4.000 1.480 0.230(0.279) 4.000 1.530
SELO 0.168(0.223) 4.000 0.300 0.188(0.210) 4.000 0.340
MCP 0.268(0.368) 4.000 1.590 0.265(0.319) 4.000 1.670
τ = 0 φ = 0 φ ≠ 0
BAR 0.132(0.177) 4.000 0.130 0.180(0.206) 4.000 0.190
LASSO 0.218(0.153) 4.000 2.170 0.231(0.154) 4.000 2.232
ALASSO 0.212(0.230) 4.000 0.650 0.234(0.335) 4.000 0.690
SCAD 0.222(0.163) 4.000 1.030 0.194(0.140) 4.000 1.050
SICA 0.172(0.209) 4.000 1.270 0.193(0.318) 4.000 1.250
SELO 0.173(0.256) 4.000 0.350 0.173(0.363) 4.000 0.380
MCP 0.252(0.257) 4.000 1.480 0.285(0.380) 4.000 1.600
τ = 0.1 φ = 0 φ ≠ 0
BAR 0.158(0.249) 4.000 0.160 0.204(0.216) 4.000 0.160
LASSO 0.206(0.121) 4.000 2.122 0.233(0.128) 4.000 2.256
ALASSO 0.198(0.162) 4.000 0.560 0.236(0.266) 4.000 0.750
SCAD 0.183(0.135) 4.000 0.890 0.185(0.143) 4.000 0.880
SICA 0.199(0.234) 4.000 1.520 0.220(0.270) 4.000 1.360
SELO 0.174(0.212) 4.000 0.360 0.168(0.245) 4.000 0.330
MCP 0.262(0.276) 4.000 1.480 0.267(0.225) 4.000 1.500

As pointed out by Ma et al. [13] and others, the estimators of the regression parameters can be sensitive to the assumed association parameter, or be biased and yield misleading results if the assumed association is misspecified. It is therefore of interest to compare the proposed method with an unknown association parameter to that with a known one. For the unknown case, we repeated the studies above and present the estimation results with the true value of τ being ±0.1. Table 6 presents the results on covariate selection under the BAR penalty function with p = 10 or 20, ω = 0 or 1, and φ_j = 0.1 (j = 1, …, 8). Table 7 lists the results based on the penalty functions given above but with n = 300, p = 10, ω = 0 or 1 and τ = 0.1. The two tables suggest that the method performs well in both situations. Although the results under the known association parameter are, as expected, slightly more efficient, the proposed procedure with the association parameter left unknown still performs well.

Table 6.

Comparison of the proposed method between the unknown and known τ.

  τ(unknown) τ=0.1(known)
PH MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.170(0.168) 4.000 1.286 0.054(0.087) 3.998 0.130
n = 300 0.120(0.150) 4.000 0.667 0.023(0.011) 4.000 0.080
  p = 20 p = 20
n = 300 0.107(0.151) 4.000 2.000 0.063(0.054) 4.000 0.660
n = 500 0.149(0.169) 4.000 1.200 0.036(0.054) 4.000 0.130
  τ(unknown) τ=0.1(known)
PO MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.518(0.758) 4.000 1.340 0.258(0.403) 3.990 0.210
n = 300 0.232(0.321) 4.000 0.460 0.204(0.216) 4.000 0.160
  p = 20 p = 20
n = 300 0.694(0.554) 4.000 2.604 0.301(0.345) 4.000 1.886
n = 500 0.284(0.230) 4.000 1.906 0.201(0.244) 4.000 1.648

Table 7.

Comparison of variable selection methods between the unknown and known τ with n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
PH τ(unknown) τ=0.1(known)
BAR 0.120(0.150) 4.000 0.667 0.047(0.040) 4.000 0.028
LASSO 0.286(0.188) 4.000 2.000 0.123(0.103) 4.000 1.670
ALASSO 0.118(0.167) 4.000 1.000 0.114(0.057) 4.000 0.128
SCAD 0.170(0.320) 4.000 1.333 0.100(0.041) 4.000 0.132
SICA 0.156(0.172) 4.000 1.250 0.132(0.052) 4.000 0.116
SELO 0.217(0.166) 4.000 1.333 0.112(0.063) 4.000 0.142
MCP 0.281(0.262) 4.000 1.100 0.145(0.064) 4.000 0.148
Method MMSE(SD) TP FP MMSE(SD) TP FP
PO τ(unknown) τ=0.1(known)
BAR 0.232(0.321) 4.000 0.460 0.204(0.216) 4.000 0.160
LASSO 0.382(0.246) 4.000 2.310 0.233(0.128) 4.000 2.256
ALASSO 0.233(0.206) 4.000 1.630 0.236(0.266) 4.000 0.750
SCAD 0.430(0.425) 4.000 1.210 0.185(0.143) 4.000 0.880
SICA 0.317(0.248) 4.000 1.571 0.220(0.270) 4.000 1.360
SELO 0.337(0.265) 4.000 0.640 0.168(0.245) 4.000 0.330
MCP 0.250(0.340) 4.000 1.520 0.267(0.225) 4.000 1.500

In the estimation procedures above, we assumed that the copula function is known. However, this may not hold in reality, so it is of interest to investigate the robustness of the method to the copula assumption. To examine this, we repeated the study that produced Table 6, except that the copula function is the Frank model

M_ρ(μ, ν) = log_ρ{1 + (ρ^μ − 1)(ρ^ν − 1)/(ρ − 1)},  ρ > 0, ρ ≠ 1.

That is, the data were generated under the Frank model, but the FGM model was (incorrectly) assumed to be true and used in the estimation procedure. Table 8 summarizes the results from these simulations. The results obtained under the misspecified model are similar to those obtained under the correct one; in other words, the proposed estimation approach appears to be robust with respect to the copula model assumption.

Table 8.

Results on covariate selection under misspecified copula model.

PH Misspecified copula model Correct copula model
  MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.065(0.055) 4.000 0.364 0.054(0.087) 3.998 0.130
n = 300 0.035(0.023) 4.000 0.133 0.023(0.011) 4.000 0.080
  p = 20 p = 20
n = 300 0.097(0.071) 3.967 0.764 0.063(0.054) 4.000 0.660
n = 500 0.066(0.058) 4.000 0.288 0.036(0.054) 4.000 0.130
PO Misspecified copula model Correct copula model
  MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.222(0.416) 4.000 0.067 0.258(0.403) 3.990 0.210
n = 300 0.251(0.297) 4.000 0.133 0.204(0.216) 4.000 0.160
  p = 20 p = 20
n = 300 0.331(0.409) 4.000 1.433 0.301(0.345) 4.000 1.886
n = 500 0.136(0.283) 4.000 1.576 0.201(0.244) 4.000 1.648

In addition, we also considered other setups, in particular several other values of m and other copula models, such as the Frank and Clayton models used in Xu et al. [24], and obtained similar results. In other words, the proposed estimator seems to be robust to the choice of m and of the copula model.

5. Analysis of the Alzheimer's disease study

In this section, we apply the method presented in the previous sections to data arising from the Alzheimer's Disease Neuroimaging Initiative (ADNI), a longitudinal multicenter study launched in 2003 by Principal Investigator Michael W. Weiner, MD, VA Medical Center and University of California-San Francisco. The primary goal of ADNI is to test whether serial magnetic resonance imaging (MRI), positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). In the study, participants were divided into three groups based on their cognitive status: cognitively normal, MCI and AD. We focus on the group of participants with MCI and on the time from the baseline visit date to AD conversion, the failure time of interest. Since the participants were only examined intermittently, the AD conversion cannot be observed exactly; it is only known whether it occurred before or after the most recent examination. In other words, we have current status data on the failure time of interest.

For the analysis below, we will consider 310 participants with 24 covariates: gender (Male), marital status (Married), age (AGE), education level (PTEDU), APOE4, AD assessment scale score of 11 items (ADAS11), AD assessment scale score of 13 items (ADAS13), AD assessment scale-delayed word recall score (ADASQ4), clinical dementia rating scale-sum of boxes (CDRSB), mini-mental state examination (MMSE), Rey auditory verbal learning test score of immediate recall (RAVLT.i), Rey auditory verbal learning test learning (RAVLT.l), Rey auditory verbal learning test forgetting (RAVLT.f), Rey auditory verbal learning test percent forgetting (RAVLT.pf), digit symbol substitution test score (DSSTS), trails B score (TBS), functional assessment questionnaire score (FAQ), MRI ventricles volume (MRIVV), MRI hippocampus volume (MRIHV), MRI whole brain volume (MRIWB), MRI entorhinal volume (MRIEV), MRI fusiform gyrus volume (MRIFGV), MRI volume of middle temporal gyrus (MRIMTG) and MRI intracerebral volume (ICV).

In the study, we are interested in identifying the covariates that have significant effects on the risk of developing AD. As in the simulation study, we employ the LASSO, ALASSO, SCAD, SELO, SICA, MCP and BAR penalty functions under model (1). Tables 9 and 10 give, under the PH and PO models respectively, the estimated effects and standard errors (in parentheses) of the selected covariates and of the unknown association parameter τ. The standard errors were obtained by the bootstrap procedure with 100 bootstrap samples.
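As a rough illustration of how such bootstrap standard errors can be obtained, the sketch below resamples subjects with replacement and takes the empirical standard deviation of the replicated estimates. Here `fit_fn` is a hypothetical placeholder for the full two-stage penalized fitting procedure (returning, say, τ and the selected coefficients), not the authors' actual code; the toy check uses a sample mean only.

```python
import numpy as np

def bootstrap_se(data, fit_fn, n_boot=100, seed=1):
    """Nonparametric bootstrap standard errors.

    fit_fn maps a resampled data set to a 1-d array of estimates; for the
    ADNI analysis it would stand in for the two-stage BAR fitting procedure.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        reps.append(fit_fn(data[idx]))
    return np.std(np.asarray(reps), axis=0, ddof=1)

# toy check on n = 310 subjects: the bootstrap SE of a sample mean
# should be close to the usual formula s / sqrt(n)
x = np.random.default_rng(0).normal(size=310).reshape(-1, 1)
se = bootstrap_se(x, lambda d: np.array([d.mean()]), n_boot=200)
```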

Table 9.

The sparse estimation results for the ADNI data under the PH model; '()' indicates that the corresponding covariate was not selected.

Covariate LASSO ALASSO SCAD SICA SELO MCP BAR
τ 0.227(0.181) 0.259(0.031) 0.258(0.035) 0.265(0.006) 0.246(0.038) 0.240(0.049) 0.242(0.046)
Male 0.191(0.188) 0.265(0.175) 0.249(0.176) 0.204(0.189) 0.193(0.166) 0.246(0.178) 0.258(0.189)
Married () 0.134(0.176) 0.143(0.169) () () 0.118(0.170) 0.146(0.177)
AGE () () () () () () ()
PTEDU () () () () () () ()
APOE4 () () () () 0.183(0.133) () ()
ADAS11 () () () () () () ()
ADAS13 () () () () () () ()
ADASQ4 () () () () () () ()
CDRSB () () () () () () ()
MMSE () () () () () () ()
RAVLT.i 0.325(0.191) 0.431(0.083) 0.438(0.087) 0.433(0.050) 0.435(0.056) 0.376(0.149) 0.381(0.158)
RAVLT.l () () () () () () ()
RAVLT.f () () () () () () ()
RAVLT.pf () 0.168(0.179) () 0.184(0.177) 0.161(0.162) () 0.134(0.175)
DSSTS () () () () () () ()
TBS () () () () () () ()
FAQ () () () () () () ()
MRIVV () () () () () () ()
MRIHV 0.106(0.161) () () () 0.122(0.150) () ()
MRIWB 0.347(0.154) 0.328(0.146) 0.315(0.148) 0.207(0.188) 0.280(0.155) 0.358(0.139) 0.351(0.176)
MRIEV () () () () () () ()
MRIFGV () () () () () () ()
MRIMTG 0.695(0.057) 0.715(0.038) 0.702(0.047) 0.709(0.049) 0.707(0.059) 0.710(0.049) 0.724(0.062)
ICV 0.145(0.181) 0.146(0.175) 0.111(0.164) 0.191(0.187) 0.279(0.150) 0.127(0.180) 0.187(0.183)

Table 10.

The sparse estimation results for the ADNI data under the PO model; '()' indicates that the corresponding covariate was not selected.

Covariate LASSO ALASSO SCAD SICA SELO MCP BAR
τ 0.245(0.005) 0.221(0.039) 0.212(0.043) 0.231(0.028) 0.169(0.031) 0.239(0.023) 0.230(0.036)
Male 0.527(0.098) 0.609(0.095) 0.476(0.170) 0.449(0.188) 0.507(0.108) 0.555(0.107) 0.614(0.106)
Married 0.105(0.168) 0.148(0.191) () () () 0.148(0.189) 0.194(0.218)
AGE () () () () () () ()
PTEDU () () () () () () ()
APOE4 0.351(0.059) 0.393(0.046) 0.400(0.050) 0.368(0.076) 0.431(0.058) 0.366(0.063) 0.389(0.057)
ADAS11 () () () () () () ()
ADAS13 () () () () () () ()
ADASQ4 () () () () () () ()
CDRSB () () () () () () ()
MMSE () () () () () () ()
RAVLT.i 0.739(0.065) 0.710(0.097) 0.692(0.092) 0.733(0.090) 0.663(0.058) 0.762(0.068) 0.728(0.094)
RAVLT.l () () () () () () ()
RAVLT.f () () () () () () ()
RAVLT.pf () () () () () () ()
DSSTS () 0.152(0.170) () () () 0.127(0.167) 0.111(0.160)
TBS () () () () () () ()
FAQ 0.441(0.045) 0.420(0.058) 0.396(0.075) 0.435(0.064) 0.396(0.073) 0.426(0.060) 0.412(0.086)
MRIVV () () () () () () ()
MRIHV 0.308(0.135) 0.382(0.124) 0.358(0.153) 0.378(0.120) 0.455(0.089) 0.345(0.120) 0.359(0.119)
MRIWB 0.687(0.099) 0.740(0.108) 0.708(0.098) 0.692(0.138) 0.774(0.113) 0.725(0.095) 0.739(0.105)
MRIEV 0.117(0.165) () () 0.148(0.173) () () ()
MRIFGV () () () () 0.167(0.177) () ()
MRIMTG 1.009(0.063) 1.025(0.069) 1.011(0.071) 1.027(0.084) 0.989(0.066) 1.029(0.065) 1.030(0.065)
ICV 0.243(0.190) 0.343(0.191) 0.297(0.181) 0.322(0.208) 0.315(0.214) 0.260(0.195) 0.316(0.187)

One can see from Table 9 that five covariates, Male, RAVLT.i, MRIWB, MRIMTG and ICV, are selected by all penalty procedures. Among them, RAVLT.i, MRIWB and MRIMTG appear to have significant effects on the AD conversion under the proposed method. Table 10 shows that, besides the five covariates above, the three covariates APOE4, FAQ and MRIHV are selected by all penalties. Among the eight selected covariates, only ICV shows no significant effect under the proposed method. With respect to the association between the AD conversion time and the observation time, the results under the FGM model suggest that the two times are significantly positively correlated. Under both models, the results are similar to those given by Li et al. [9].

6. Discussion and concluding remarks

In the preceding sections, we have discussed the sparse estimation for the semiparametric linear transformation models with dependent current status data. For inference, a two-stage estimation procedure based on the BAR penalized likelihood was proposed to estimate the association parameter in addition to the regression parameters, where a copula model was used to describe the relationship between the failure time and the censoring time. To approximate the unknown function $\Lambda_1(t)$, Wu and Cook [23] employed piecewise constant functions and Ma et al. [13] used I-spline basis functions. However, a drawback of the former is that a piecewise constant function is neither continuous nor differentiable, while the latter requires choosing both the order m and the interior knots. We therefore adopted the Bernstein polynomial approximation, which is continuous and has some nice properties including differentiability. Moreover, as Xu et al. [24] and Zhao et al. [29] pointed out, most inference procedures are insensitive to the value of m, and one can simply choose $m=[n^{1/4}]$ in practice. In addition, the oracle property of the resulting estimator was established, and the numerical studies suggested that the proposed method works well in practical situations.
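For concreteness, a minimal sketch of such a Bernstein polynomial sieve is given below. It is an illustration under assumed notation (interval endpoints l and u, unconstrained coefficients phi), not the authors' implementation: monotonicity of the approximated $\Lambda_{1n}$ is enforced by cumulative positive coefficients, and the degree is taken as $m=[n^{1/4}]$.

```python
import numpy as np
from math import comb

def bernstein_basis(t, m, l, u):
    """Degree-m Bernstein basis at t, after rescaling [l, u] to [0, 1]."""
    s = (np.asarray(t, dtype=float) - l) / (u - l)
    return np.stack([comb(m, k) * s**k * (1 - s)**(m - k)
                     for k in range(m + 1)], axis=-1)

def lambda1n(t, phi, l, u):
    """Sieve approximation of Lambda_1; cumulative positive coefficients
    make the resulting function nondecreasing in t."""
    m = len(phi) - 1
    coef = np.cumsum(np.exp(phi))          # 0 < c_0 < c_1 < ... < c_m
    return bernstein_basis(t, m, l, u) @ coef

n = 300
m = int(np.floor(n ** 0.25))               # the suggested choice m = [n^(1/4)]
t = np.linspace(0.0, 2.0, 50)
vals = lambda1n(t, np.zeros(m + 1), l=0.0, u=2.0)
```

With phi fixed, monotonicity holds by construction; in the actual estimation the coefficients would be updated jointly with the regression parameters.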

Note that a main advantage of this study is the flexibility of the proposed model, as it includes many popular models as special cases. Another advantage is that the proposed estimation procedure does not require specifying the association parameter beforehand, which is more practical. In general, the association parameter cannot be estimated without extra information; in the situation here, the extra information is provided by the estimation of the marginal distribution $F_C$ in the first stage, which can then be treated as known. However, one limitation of the proposed method is that it assumes that the underlying copula model is known. As some authors have pointed out (Zheng and Klein [30]; Ma et al. [13]), it is usually impossible to estimate the copula without strong assumptions. Nevertheless, the simulation study suggested that the presented method seems to be robust with respect to the underlying copula assumption.

Finally, several improvements and extensions can be explored in future studies. For example, it would be meaningful to generalize the proposed methods to the high-dimensional case or to dependent bivariate current status data [1]. Another possible direction for future research is to develop similar methods for dependent censoring under other models, such as the additive hazards model or cure models [18].

Acknowledgments

The data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this paper. A complete listing of ADNI investigators can be found at https://adni.loni.usc.edu/wp-content/uploads/how-to-apply/ADNI-Acknowledgement-List.pdf.

Appendices.

Appendix 1. Computation of the partial derivatives $\dot{l}_c(\beta\mid\phi)$, $\dot{l}_c(\phi\mid\beta)$ and $\ddot{l}_c(\beta\mid\phi)$.

To describe the dependent censoring, we consider the FGM model

$$M_\rho(\mu,\nu)=\mu\nu+\rho\,\mu\nu(1-\mu)(1-\nu),\qquad -1\le\rho\le 1.$$

Then it follows that

$$m_\rho(F_T(t),F_C(c))=\left.\frac{\partial M_\rho(\mu,\nu)}{\partial\nu}\right|_{\mu=F_T(t),\,\nu=F_C(c)}=F_T(t)\left[1+\rho\,(1-F_T(t))(1-2F_C(c))\right],$$

the likelihood function has the following form:

$$\begin{aligned}
L_n(\beta,\phi,\eta)&=\prod_{i=1}^n\{m_\rho(F_T(c_i),F_C(c_i))\}^{\delta_i}\{1-m_\rho(F_T(c_i),F_C(c_i))\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{F_T(c_i)[1+\rho(1-F_T(c_i))(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{(1-F_T(c_i))[1-\rho F_T(c_i)(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{(1-G\{\log(\Lambda_1(c_i))-\beta^TX_i\})[1+\rho G\{\log(\Lambda_1(c_i))-\beta^TX_i\}(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{G\{\log(\Lambda_1(c_i))-\beta^TX_i\}[1-\rho(1-G\{\log(\Lambda_1(c_i))-\beta^TX_i\})(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\})[1+\rho G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}[1-\rho(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\})(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i),
\end{aligned}$$

and the conditional log-likelihood function is

$$\begin{aligned}
l_c(\beta,\phi)=\log L_n(\beta,\phi\mid\hat\eta)=\sum_{i=1}^n\Big\{&\delta_i\log\big(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)\\
&+\delta_i\log\big[1+\rho G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}(1-2\hat F_C(c_i))\big]\\
&+(1-\delta_i)\log\big(G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)\\
&+(1-\delta_i)\log\big[1-\rho\big(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)(1-2\hat F_C(c_i))\big]+\log\hat f_C(c_i)\Big\}.
\end{aligned}$$
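As a quick numerical sanity check of the FGM quantities above (my own illustration, not part of the paper), the conditional distribution $m_\rho=\partial M_\rho/\partial\nu$ can be compared with a finite-difference derivative of the copula; the value $\rho=0.45$ corresponds to Kendall's $\tau=2\rho/9=0.1$ under the FGM model.

```python
import numpy as np

def fgm_copula(u, v, rho):
    """FGM copula M_rho(u, v); rho in [-1, 1]."""
    return u * v + rho * u * v * (1 - u) * (1 - v)

def fgm_dv(u, v, rho):
    """m_rho = dM_rho/dv = u * [1 + rho * (1 - u) * (1 - 2v)]."""
    return u * (1 + rho * (1 - u) * (1 - 2 * v))

u, v, rho = 0.3, 0.7, 0.45                 # rho = 0.45 <=> Kendall's tau = 0.1
eps = 1e-6
num = (fgm_copula(u, v + eps, rho) - fgm_copula(u, v - eps, rho)) / (2 * eps)
```

Since the FGM copula is quadratic in each argument, the central difference agrees with the analytic derivative up to rounding error, and the copula boundary condition M(u, 1) = u also holds.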

Moreover, writing $u_i=\log(\Lambda_{1n}(c_i))-\beta^TX_i$ and $\hat k_i=1-2\hat F_C(c_i)$ for brevity, we obtain the following partial derivatives:

$$\dot l_c(\beta\mid\phi)=\frac{\partial l_c(\beta,\phi)}{\partial\beta}=\sum_{i=1}^n G'(u_i)X_i\left\{\frac{\delta_i}{1-G(u_i)}-\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}-\frac{1-\delta_i}{G(u_i)}-\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\},$$

$$\begin{aligned}
\ddot l_c(\beta\mid\phi)=\frac{\partial^2 l_c(\beta,\phi)}{\partial\beta\,\partial\beta^T}=&-\sum_{i=1}^n G''(u_i)X_iX_i^T\left\{\frac{\delta_i}{1-G(u_i)}-\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}-\frac{1-\delta_i}{G(u_i)}-\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\}\\
&-\sum_{i=1}^n\{G'(u_i)\}^2X_iX_i^T\left\{\frac{\delta_i}{[1-G(u_i)]^2}+\frac{\delta_i\rho^2\hat k_i^2}{[1+\rho G(u_i)\hat k_i]^2}+\frac{1-\delta_i}{[G(u_i)]^2}+\frac{(1-\delta_i)\rho^2\hat k_i^2}{[1-\rho(1-G(u_i))\hat k_i]^2}\right\}
\end{aligned}$$

and

$$\dot l_c(\phi\mid\beta)=\frac{\partial l_c(\beta,\phi)}{\partial\phi}=\sum_{i=1}^n G'(u_i)\frac{\Lambda'_{1n}(c_i)}{\Lambda_{1n}(c_i)}\left\{-\frac{\delta_i}{1-G(u_i)}+\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}+\frac{1-\delta_i}{G(u_i)}+\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\},$$

where $G'(x)=dG(x)/dx$, $G''(x)=d^2G(x)/dx^2$ and $\Lambda'_{1n}(c_i)=\partial\Lambda_{1n}(c_i)/\partial\phi$.
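The score formula can likewise be checked numerically. The sketch below is my own illustration with assumed stand-ins: $G(x)=\exp(-e^x)$ as one concrete transformation, $\Lambda_{1n}(t)=t$ and $\hat F_C(t)=t/2$. It compares the analytic $\dot l_c(\beta\mid\phi)$ with a finite-difference gradient of $l_c$, dropping the $\log\hat f_C(c_i)$ terms since they do not involve $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta = rng.normal(scale=0.3, size=p)
c = rng.uniform(0.1, 1.9, size=n)          # observation times
delta = rng.integers(0, 2, size=n)         # current status indicators
rho = 0.45
Lam1 = lambda t: t                         # stand-in for Lambda_1n
FC = lambda t: t / 2.0                     # stand-in for F_C-hat, values in (0, 1)

def loglik(beta):
    """l_c without the log f_C(c_i) terms, which are free of beta."""
    G = np.exp(-np.exp(np.log(Lam1(c)) - X @ beta))
    k = 1 - 2 * FC(c)
    return np.sum(delta * (np.log(1 - G) + np.log1p(rho * G * k))
                  + (1 - delta) * (np.log(G) + np.log1p(-rho * (1 - G) * k)))

def score(beta):
    """Analytic score from the display above, for G(x) = exp(-exp(x))."""
    u = np.log(Lam1(c)) - X @ beta
    G = np.exp(-np.exp(u))
    Gp = -np.exp(u) * G                    # G'(u)
    k = 1 - 2 * FC(c)
    br = (delta / (1 - G) - delta * rho * k / (1 + rho * G * k)
          - (1 - delta) / G - (1 - delta) * rho * k / (1 - rho * (1 - G) * k))
    return (Gp * br) @ X

eps = 1e-6
num = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
                for e in np.eye(p)])
```

The two gradients should agree to finite-difference accuracy; this is only a check of the algebra above, not of the full estimation procedure.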

Appendix 2. Proof of Theorem 3.1

Before proving Theorem 3.1, we give some regularity conditions, notation and lemmas. For completeness, we first describe the asymptotic properties of $\hat\varphi$ and $\hat\Lambda_2$ in Lemma A.1 below.

Lemma A.1

Let $\hat\varphi$ and $\hat\Lambda_2$ be the estimators of $\varphi$ and $\Lambda_2$ defined above, respectively, and assume that the regularity conditions given on pages 174–176 of Kalbfleisch and Prentice [8] hold. Then $\hat\varphi$ and $\hat\Lambda_2$ are consistent and asymptotically normal.

For the proof of Lemma A.1, please refer to Section 8.3 of Kalbfleisch and Prentice [8]. To show the asymptotic properties of β^, in addition to the conditions needed in Lemma A.1, we also need the following regularity conditions.

Condition C1. $X$ has a bounded support in $\mathbb{R}^{p_n}$, and if there exist a constant $c_0$ and a vector $\xi$ such that $\xi^TZ=c_0$ almost surely, then $c_0=0$ and $\xi=0$.

Condition C2. $\mu_M(E)>0$ for any open set $E\subset I^2$, where $\mu_M$ denotes the probability measure corresponding to the copula function $M_\alpha$ given $X$.

Condition C3. The copula function $M(\cdot,\cdot)$ has bounded first-order partial derivatives, with $\partial M(u,v)/\partial u$ and $\partial M(u,v)/\partial v$ being Lipschitz.

Condition C4. The function $\Lambda_1$ is continuously differentiable up to order $r$ in $[l,u]$, $r>2$, and satisfies $c^{-1}<\Lambda_1(l)<\Lambda_1(u)<c$ for some positive constant $c$.

Condition C5. $\beta_0$ is an interior point of a compact set $B\subset\mathbb{R}^{p_n}$ and there exists a compact neighborhood $B_0$ of the true value $\beta_0$ such that

$$\sup_{\beta\in B_0}\left\|n^{-1}A_n(\beta)-I(\beta_0)\right\|\xrightarrow{a.s.}0,$$

where $I(\beta_0)$ is a positive-definite $p_n\times p_n$ matrix.

Condition C6. There exists a constant $a>1$ such that $a^{-1}<\lambda_{\min}(n^{-1}A_n)\le\lambda_{\max}(n^{-1}A_n)<a$ for sufficiently large $n$, where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ stand for the smallest and largest eigenvalues of a matrix.

Condition C7. There exist positive constants $a_0$ and $a_1$ such that $a_0\le|\beta_{0,j}|\le a_1$ for $1\le j\le q_n$, and, as $n\to\infty$, $p_nq_n/n\to0$, $\lambda_n/n\to0$, $\lambda_n(q_n/n)^{1/2}\to0$ and $\lambda_n^2(p_n/n)\to0$.

Condition C8. The initial estimator $\hat\beta^{(0)}$ satisfies $\|\hat\beta^{(0)}-\beta_0\|=O_p(\sqrt{p_n/n})$.

Note that these conditions are mild and usually satisfied in practical situations. Specifically, C1 is commonly used in the current status data literature ([28], Xu et al. [24]) to ensure the identifiability of the parameters and the uniform convergence. C2 and C3 are useful for the copula models. C4 and C5 are necessary for the existence and consistency of the sieve maximum likelihood estimator of $\Lambda_1(t)$. C6 assumes that $n^{-1}A_n$ is positive definite almost surely, with its eigenvalues, as well as the nonzero coefficients, bounded away from zero and infinity. C7 and C8 give some sufficient, but not necessary, conditions needed to prove the numerical convergence and asymptotic properties of the BAR estimator. Also, based on arguments similar to those for Theorem 3.1 in Cui et al. [2], one can take the initial value $\hat\beta^{(0)}$ to be the unpenalized estimate or the ridge regression estimate of $\beta$.

Next, define $\beta=(\alpha^T,\gamma^T)^T$, where $\alpha$ is a $q_n\times1$ vector and $\gamma$ is a $(p_n-q_n)\times1$ vector, and analogously, write $\hat\beta^{(k)}=(\hat\alpha^{(k)T},\hat\gamma^{(k)T})^T$ and

$$\begin{pmatrix}\alpha(\beta)\\ \gamma(\beta)\end{pmatrix}\equiv g(\beta)=(A_n+\lambda_nD(\beta))^{-1}B_n. \qquad (A1)$$

For simplicity, write $\alpha(\beta)$ and $\gamma(\beta)$ as $\alpha$ and $\gamma$ hereafter. We further partition $(n^{-1}A_n)^{-1}$ into

$$(n^{-1}A_n)^{-1}=\begin{pmatrix}A&B\\B^T&G\end{pmatrix},$$

where $A$ is a $q_n\times q_n$ matrix. Based on the nonsingularity of the matrix $A_n$, multiplying both sides of (A1) by $A_n^{-1}(A_n+\lambda_nD(\beta))$ and subtracting $\beta_0$, we have

$$\begin{pmatrix}\alpha-\beta_{01}\\ \gamma\end{pmatrix}+\frac{\lambda_n}{n}\begin{pmatrix}AD_1(\alpha)\alpha+BD_2(\gamma)\gamma\\ B^TD_1(\alpha)\alpha+GD_2(\gamma)\gamma\end{pmatrix}=\hat b-\beta_0, \qquad (A2)$$

where $\hat b=A_n^{-1}B_n$ is the ordinary least squares estimate, $D_1(\alpha)=\mathrm{diag}(\beta_1^{-2},\ldots,\beta_{q_n}^{-2})$ and $D_2(\gamma)=\mathrm{diag}(\beta_{q_n+1}^{-2},\ldots,\beta_{p_n}^{-2})$.
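The fixed-point mapping in (A1) is exactly what the BAR algorithm iterates. The toy sketch below is my own least-squares surrogate, with $A_n=X^TX$ and $B_n=X^Ty$ standing in for the quantities defined in the paper; it illustrates how the iteration $\beta^{(k+1)}=(A_n+\lambda_nD(\beta^{(k)}))^{-1}B_n$ drives null coefficients to zero while leaving the nonzero ones essentially unpenalized.

```python
import numpy as np

def bar_estimate(A, B, lam, beta0, n_iter=50):
    """Broken adaptive ridge iteration: beta <- (A + lam * D(beta))^{-1} B
    with D(beta) = diag(beta_j^{-2}), started from a ridge-type estimate."""
    beta = beta0.copy()
    for _ in range(n_iter):
        D = np.diag(1.0 / np.maximum(beta ** 2, 1e-12))  # cap to avoid overflow
        beta = np.linalg.solve(A + lam * D, B)
    return beta

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])
y = X @ true + 0.1 * rng.normal(size=n)
A, B = X.T @ X, X.T @ y                    # least-squares surrogates for A_n, B_n
init = np.linalg.solve(A + np.eye(p), B)   # ridge initial value beta^(0)
est = bar_estimate(A, B, lam=2.0, beta0=init)
```

The quadratic reweighting $\beta_j^{-2}$ is what produces the oracle-type behavior: once a coefficient is small, its ridge weight explodes and the coefficient collapses toward zero at the next iteration.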

To prove Theorem 3.1, we also need the following two lemmas.

Lemma A.2

Assume Conditions C1–C8 hold, and define $H_n\equiv\{\beta=(\alpha^T,\gamma^T)^T:\alpha\in[1/K_0,K_0]^{q_n},\ \|\gamma\|\le\delta_n\sqrt{p_n/n}\}$, where $K_0>1$ is a constant such that $\beta_{01}\in[1/K_0,K_0]^{q_n}$, $0<\delta_n\to\infty$ and $p_n\delta_n^2/\lambda_n\to0$ as $n\to\infty$. Then, with probability tending to 1, we have

  • (i)

    $\sup_{\beta\in H_n}\|\gamma(\beta)\|/\|\gamma\|<1/C_0$, for some constant $C_0>1$;

  • (ii)

    $g(\cdot)$ is a mapping from $H_n$ to itself.

Lemma A.3

Assume Conditions C1–C8 hold. Then, with probability tending to 1, the equation $\alpha=(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}$ has a unique fixed point $\hat\alpha$ in the domain $[1/K_0,K_0]^{q_n}$, where $A_n^{(1)}$ is the leading $q_n\times q_n$ submatrix of $A_n$ and $B_n^{(1)}$ consists of the first $q_n$ components of $B_n$.

The proofs of these two lemmas are similar to those of Lemmas A.1–A.2 in Zhao et al. [29] and are therefore omitted here.

Proof of Theorem 3.1 —

Firstly, by the definitions of $\hat\beta$ and $\hat\beta^{(k)}$, it follows from Lemma A.2(i) that

$$\hat\beta_2\equiv\lim_{k\to\infty}\hat\gamma^{(k)}=0 \qquad (A3)$$

holds with probability tending to 1.

Next, we prove part (i). Define

$$f(\alpha)=(f_1(\alpha),\ldots,f_{q_n}(\alpha))^T\equiv(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}, \qquad (A4)$$

where $\alpha=(\alpha_1,\ldots,\alpha_{q_n})^T$. Then it is sufficient to show that

$$\Pr\left(\lim_{k\to\infty}\|\hat\alpha^{(k)}-\hat\alpha\|=0\right)\to1, \qquad (A5)$$

where $\hat\alpha$ is the fixed point of $f(\alpha)$ defined in Lemma A.3. Define $\gamma/\|\gamma\|=0$ if $\gamma=0$. Note that for any $(\alpha^T,\gamma^T)^T\in H_n$, from (A2) we have

$$\lim_{\gamma\to0}\gamma(\alpha,\gamma)=0. \qquad (A6)$$

Combining (A3), (A4) and (A6), we have

$$\lim_{\gamma\to0}\alpha(\alpha,\gamma)=(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}=f(\alpha).$$

Then, as k, it follows that

$$\eta_k\equiv\sup_{\alpha\in[1/K_0,K_0]^{q_n}}\left\|f(\alpha)-\alpha(\alpha,\hat\gamma^{(k)})\right\|\to0. \qquad (A7)$$

This implies that, for any $\epsilon>0$, there exists $N>0$ such that $\eta_k<\epsilon$ when $k>N$. Since $f(\cdot)$ is a contraction mapping, it follows from Lemma A.3 that

$$\left\|f(\hat\alpha^{(k)})-\hat\alpha\right\|=\left\|f(\hat\alpha^{(k)})-f(\hat\alpha)\right\|\le\frac1c\left\|\hat\alpha^{(k)}-\hat\alpha\right\|, \qquad (A8)$$

where $c>1$.

Let $h_k=\|\hat\alpha^{(k)}-\hat\alpha\|$. Combining (A7) and (A8), it is easy to show that

$$\left\|\alpha(\hat\beta^{(k)})-\hat\alpha\right\|\le\left\|\alpha(\hat\beta^{(k)})-f(\hat\alpha^{(k)})\right\|+\left\|f(\hat\alpha^{(k)})-\hat\alpha\right\|\le\eta_k+\frac1c h_k,$$

so that $h_{k+1}\le\eta_k+h_k/c$, since $\alpha(\hat\beta^{(k)})=\hat\alpha^{(k+1)}$; a recursive calculation then gives $h_k\to0$ as $k\to\infty$. Thus, with probability tending to 1, we have

$$\|\hat\alpha^{(k)}-\hat\alpha\|\to0 \quad\text{as } k\to\infty.$$

Note that $\hat\beta_1\equiv\lim_{k\to\infty}\hat\alpha^{(k)}$. Therefore, we have shown that

$$\Pr\left(\lim_{k\to\infty}\|\hat\alpha^{(k)}-\hat\alpha\|=0\right)\to1, \qquad (A9)$$

which completes the proof of part (i).

Finally, to prove part (ii) of the theorem, write

$$\sqrt n\,t^{-1}b^T(\hat\alpha-\beta_{01})=T_1+T_2,$$

where

$$T_1=\sqrt n\,t^{-1}b^T\left[(A_n^{(1)}+\lambda_nD_1(\hat\alpha))^{-1}A_n^{(1)}-I_{q_n}\right]\beta_{01}$$

and

$$T_2=\sqrt n\,t^{-1}b^T(A_n^{(1)}+\lambda_nD_1(\hat\alpha))^{-1}(B_n^{(1)}-A_n^{(1)}\beta_{01}).$$

By the first-order resolvent expansion formula, we can derive

$$T_1=-\frac{\lambda_n}{\sqrt n}\,t^{-1}b^T(A_n^{(1)}/n)^{-1}D_1(\hat\alpha)\left(\frac1nA_n^{(1)}+\frac{\lambda_n}{n}D_1(\hat\alpha)\right)^{-1}\frac1nA_n^{(1)}\beta_{01}.$$

Hence, by Conditions C6 and C7, we have

$$T_1=O_p\left(\lambda_n\sqrt{q_n/n}\right)\to0.$$

Furthermore, applying the first-order resolvent expansion formula again, it can be shown that

$$T_2=t^{-1}b^T(A_n^{(1)}/n)^{-1}\frac{1}{\sqrt n}\left(B_n^{(1)}-A_n^{(1)}\beta_{01}\right)+o_p(1),$$

where $\frac{1}{\sqrt n}(B_n^{(1)}-A_n^{(1)}\beta_{01})=\frac{1}{\sqrt n}\dot l_n^{(1)}(\hat\beta\mid\hat\phi)+o_p(1)$, with $\dot l_n^{(1)}(\hat\beta\mid\hat\phi)$ denoting the first $q_n$ components of $\dot l_n(\hat\beta\mid\hat\phi)$. Let $I(\beta)=E[-\ddot l_n^{(1)}(\beta\mid\hat\phi)]$ be the Fisher information matrix, where $\ddot l_n^{(1)}(\beta\mid\phi)$ is the partial Hessian matrix with respect to $\beta$. Based on the asymptotic normality of $n^{-1/2}\dot l_n(\hat\beta\mid\hat\phi)$, we have

$$\sqrt n\,t^{-1}b^T(\hat\alpha-\beta_{01})\xrightarrow{d}N(0,1)$$

with $\Sigma=n(A_n^{(1)}(\beta_0))^{-1}I^{(1)}(\beta_0)(A_n^{(1)}(\beta_0))^{-1}$ and $t^2=b^T\Sigma b$, where $I^{(1)}(\beta_0)$ is the leading $q_n\times q_n$ submatrix of $I(\beta_0)$. Combining this with (A9), we conclude the proof of part (ii).

Funding Statement

The research of Zhao was partially supported by the National Natural Science Foundation of China grants 12171483 and 11861030.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Chen M.H., Tong X., and Sun J., A frailty model approach for regression analysis of multivariate current status data, Stat. Med. 28 (2009), pp. 3424–3436.
  • 2. Cui Q., Zhao H., and Sun J., A new copula model-based method for regression analysis of dependent current status data, Stat. Interface 11 (2018), pp. 463–471.
  • 3. Dai L., Chen K., Sun Z., Liu Z., and Li G., Broken adaptive ridge regression and its asymptotic properties, J. Multivar. Anal. 168 (2018), pp. 334–351.
  • 4. Dicker L., Huang B., and Lin X., Variable selection and estimation with the seamless-L0 penalty, Stat. Sin. 23 (2013), pp. 929–962.
  • 5. Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle property, J. Am. Stat. Assoc. 96 (2001), pp. 1348–1360.
  • 6. Fan J. and Li R., Variable selection for Cox's proportional hazards model and frailty model, Ann. Stat. 30 (2002), pp. 74–99.
  • 7. Huang J., Efficient estimation for the proportional hazards model with interval censoring, Ann. Stat. 24 (1996), pp. 540–568.
  • 8. Kalbfleisch J.D. and Prentice R.L., The Statistical Analysis of Failure Time Data, Wiley, New York, 2002.
  • 9. Li K., Chan W., Doody R.S., Quinn J., Luo S., and the Alzheimer's Disease Neuroimaging Initiative, Prediction of conversion to Alzheimer's disease with longitudinal measures and time-to-event data, J. Alzheimer's Dis. 58 (2017), pp. 361–371.
  • 10. Li S., Wu Q., and Sun J., Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease, Stat. Methods Med. Res. 29 (2020), pp. 2151–2166.
  • 11. Liu X. and Zeng D., Variable selection in semiparametric transformation models for right-censored data, Biometrika 100 (2013), pp. 859–876.
  • 12. Lv J. and Fan Y., A unified approach to model selection and sparse recovery using regularized least squares, Ann. Stat. 37 (2009), pp. 3498–3528.
  • 13. Ma L., Hu T., and Sun J., Sieve maximum likelihood regression analysis of dependent current status data, Biometrika 102 (2015), pp. 731–738.
  • 14. Nelsen R.B., An Introduction to Copulas, 2nd ed., Springer, New York, 2006.
  • 15. Scolas S., El Ghouch A., Legrand C., and Oulhaj A., Variable selection in a flexible parametric mixture cure model with interval-censored data, Stat. Med. 35 (2016), pp. 1210–1225.
  • 16. Shi Y., Cao Y., Jiao Y., and Liu Y., SICA for Cox's proportional hazards model with a diverging number of parameters, Acta Math. Sin. Engl. Ser. 30 (2014), pp. 887–902.
  • 17. Sun J., The Statistical Analysis of Interval-censored Failure Time Data, Springer, New York, 2006.
  • 18. Sun L., Li S., Wang L., and Song X., Variable selection in semiparametric nonmixture cure model with interval-censored failure time data: Application to the prostate cancer screening study, Stat. Med. 38 (2019), pp. 3026–3039.
  • 19. Titman A.C., A pool-adjacent-violators type algorithm for non-parametric estimation of current status data with dependent censoring, Lifetime Data Anal. 20 (2014), pp. 444–458.
  • 20. Tibshirani R., Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B 58 (1996), pp. 267–288.
  • 21. Tibshirani R., The Lasso method for variable selection in the Cox model, Stat. Med. 16 (1997), pp. 385–395.
  • 22. Wang J. and Ghosh S.K., Shape restricted nonparametric regression with Bernstein polynomials, Comput. Stat. Data Anal. 56 (2012), pp. 2729–2741.
  • 23. Wu Y. and Cook R., Penalized regression for interval-censored times of disease progression: Selection of HLA markers in psoriatic arthritis, Biometrics 71 (2015), pp. 782–791.
  • 24. Xu D., Zhao S., Hu T., Yu M., and Sun J., Regression analysis of informative current status data with the semiparametric linear transformation model, J. Appl. Stat. 46 (2019), pp. 187–202.
  • 25. Zhang C.H., Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), pp. 894–942.
  • 26. Zhang H. and Lu W.B., Adaptive Lasso for Cox's proportional hazards model, Biometrika 94 (2007), pp. 691–703.
  • 27. Zhang Z., Sun J., and Sun L., Statistical analysis of current status data with informative observation times, Stat. Med. 24 (2005), pp. 1399–1407.
  • 28. Zhao S., Hu T., Ma L., Wang P., and Sun J., Regression analysis of informative current status data with the additive hazards model, Lifetime Data Anal. 21 (2015), pp. 241–258.
  • 29. Zhao H., Wu Q., Li G., and Sun J., Simultaneous estimation and variable selection for interval censored data with broken adaptive ridge regression, J. Am. Stat. Assoc. 115 (2020), pp. 204–216.
  • 30. Zheng M. and Klein J.P., Estimates of marginal survival for dependent competing risk based on an assumed copula, Biometrika 82 (1995), pp. 127–138.
  • 31. Zhou Q., Hu T., and Sun J., A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data, J. Am. Stat. Assoc. 112 (2017), pp. 664–672.
  • 32. Zou H., The adaptive Lasso and its oracle properties, J. Am. Stat. Assoc. 101 (2006), pp. 1418–1429.
