Journal of Applied Statistics
2022 Dec 29; 51(4): 759–779. doi: 10.1080/02664763.2022.2161488

The sparse estimation of the semiparametric linear transformation model with dependent current status data

Lin Luo a, Jinzhao Yu b, Hui Zhao b
PMCID: PMC10896163  PMID: 38414802

Abstract

In this paper, we study sparse estimation under the semiparametric linear transformation model for current status data, also called type I interval-censored data. In this setting, the failure time of interest may depend on the censoring time, and the association parameter between them is left unspecified. To address this, we employ a copula model to describe the dependence between the two times and a two-stage estimation procedure to estimate both the association parameter and the regression parameters. In addition, we propose a penalized maximum likelihood estimation procedure based on broken adaptive ridge regression, with Bernstein polynomials used to approximate the nonparametric functions involved. The oracle property of the proposed method is established, and numerical studies suggest that the method works well in practical situations. Finally, the method is applied to the Alzheimer's disease study that motivated this investigation.

Keywords: Broken adaptive ridge regression, current status data, dependent censoring, linear transformation model, variable selection

1. Introduction

This paper discusses the sparse estimation of the semiparametric linear transformation model with dependent current status data, which occur in many fields including econometrics, epidemiology, demography, and tumorigenicity experiments [7,17,19,27]. By dependent current status data, we mean that the failure event of interest is assessed only once, at a censoring or observation time C, so that the occurrence time T is never observed exactly and is either left- or right-censored; that is, we only know whether T is less than or greater than C. In addition, T and C may be dependent or correlated with each other. One example of such data occurs in tumorigenicity experiments. In these studies, the tumor onset time is usually of interest but cannot be observed exactly, since the animals are commonly examined for the presence or absence of a tumor only at their death or sacrifice time. If the tumor is lethal or non-lethal, the tumor onset time and the death time can usually be treated as the same or as independent, respectively, in the analysis. On the other hand, it is known that most types of tumors are between lethal and non-lethal, and in this case the two times are clearly related, resulting in dependent current status data on the tumor onset time. In the following, we discuss regression analysis of dependent current status data, with emphasis on simultaneous estimation and covariate selection.

There is a great deal of literature on variable selection, especially for linear models and completely observed data. Tibshirani [20] proposed the least absolute shrinkage and selection operator (LASSO) procedure, which does not have the oracle property, tends to select too many small noise features, and is biased for large parameters. Fan and Li [5] developed the smoothly clipped absolute deviation (SCAD) penalty for variable selection and proved its oracle properties. Zou [32] suggested the adaptive LASSO (ALASSO) procedure, a weighted version of the LASSO. Lv and Fan [12] studied the smooth integration of counting and absolute deviation (SICA) penalty and proved its nonasymptotic property, also called the weak oracle property. Zhang [25] developed the minimax concave penalty (MCP) procedure, which provides a fast algorithm for nearly unbiased concave penalized selection. Dicker et al. [4] gave the seamless-L0 (SELO) penalty, a smooth function on [0, ∞) that closely mimics the L0 penalty.

In addition, there is also some literature on variable selection for survival models and censored data. For example, Tibshirani [21], Fan et al. [6], Zhang et al. [26] and Shi et al. [16] investigated the LASSO, SCAD, ALASSO and SICA penalty-based procedures, respectively, under the framework of the Cox PH model. Liu et al. [11] discussed the ALASSO in general transformation models. However, most of the existing methods for failure time data apply only to right-censored data, and little research to date has addressed variable selection for interval-censored or current status data. Recently, Scolas et al. [15] and Wu and Cook [23] considered interval-censored data arising from the Cox PH model. Sun et al. [18] extended the LASSO, SCAD and ALASSO to the semiparametric nonmixture cure model with interval-censored failure time data. Zhao et al. [29] and Li et al. [10] discussed simultaneous estimation and covariate selection for interval-censored data with a broken adaptive ridge (BAR) regression approach. However, all of the studies above assumed that the censoring mechanism is independent of the failure time of interest, and as mentioned above, this independence assumption may not be valid in many situations.

To deal with dependent censoring, recently, Ma et al. [13], Zhao et al. [28] and Xu et al. [24] employed the copula model [14,30] to describe the relationship between the failure time T of interest and the censoring time C in the contexts of the Cox PH model, the additive hazard model and the linear transformation model, respectively. However, these methods assumed that the association parameter between T and C is known, which is clearly not realistic in general. As pointed out by Ma et al. [13] and others, the resulting estimators of regression parameters can be sensitive to the assumed association parameter or be biased if the assumed association is misspecified.

In this paper, we focus on the sparse estimation of dependent current status data arising from the semiparametric linear transformation model, where a copula model is employed to describe the correlation between the failure time and the censoring time. In the proposed method, we allow the association parameter between T and C to be unspecified and propose a two-stage penalized estimation procedure based on BAR regression [3,29], which has both the oracle property and the grouping effect. The approach approximates L0-penalized regression by an iteratively reweighted L2-penalized algorithm and has the advantage of simultaneous variable selection and parameter estimation. Moreover, the BAR iterative algorithm is fast and converges to a unique global optimal solution.

The rest of this paper is organized as follows. In Section 2, we begin with introducing some notations and models that will be used throughout the paper. Furthermore, a sieve likelihood function based on Bernstein polynomials is presented. In Section 3, a two-stage estimation procedure, the oracle property and the iterative algorithm of regression parameters under the BAR penalty function are developed. In Section 4, we give some simulation study results to evaluate the performance of the proposed method. Moreover, we compare the BAR regression procedure with the methods that make use of other commonly used penalty functions. In Section 5, we apply the proposed method to an application that motivated this study. Some concluding remarks and discussion are given in Section 6.

2. Notation, model and likelihood function

Consider a failure time study that consists of n independent subjects. For subject i, let Ti denote the failure time of interest and Xi be a p-dimensional vector of covariates. To describe the covariate effects, we assume that Ti follows the linear transformation model specified by

h(T_i) = β^T X_i + ε_i, (1)

where h(·) is a completely unspecified, strictly increasing function, β is the p-dimensional vector of unknown regression parameters, and ε_i is a random error with a completely known distribution. Then it is easy to see that model (1) can be rewritten as

S(t ∣ X) = G{h(t) − β^T X}, (2)

where S(t ∣ X) denotes the survival function of T given X and G is the survival function of ε_i. Note that model (1) is very flexible and includes many popular models as special cases, which avoids possible model misspecification. For example, if ε_i follows the extreme value distribution, or G(t) = exp{−exp(t)}, model (1) is the proportional hazards (PH) model, while if ε_i follows the standard logistic distribution, or G(t) = {1 + exp(t)}^{−1}, model (1) becomes the proportional odds (PO) model.
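As an illustration, the two special cases of G have simple closed forms; the sketch below evaluates S(t ∣ X) under model (2), taking h(t) = log t as an illustrative choice (the one used later in the simulation studies):

```python
import math

def G(t, omega=0.0):
    """Survival function of the error term in model (1):
    omega = 0 gives the extreme-value error (PH model),
    omega = 1 gives the standard logistic error (PO model)."""
    if omega == 0:
        return math.exp(-math.exp(t))
    return (1.0 + omega * math.exp(t)) ** (-1.0 / omega)

def survival(t, x, beta, h=math.log, omega=0.0):
    """S(t | X) = G{h(t) - beta^T x} under model (2)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return G(h(t) - eta, omega)
```

For omega = 1 this reduces to G(t) = 1/(1 + e^t), the standard logistic survival function, so the PH and PO cases are covered by the same one-parameter family.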

Suppose that the observed data are given by {(C_i, δ_i = I(T_i ≤ C_i), X_i); i = 1, 2, …, n}, where C_i denotes the censoring or observation time, which may depend on T_i. Each subject is observed only once at C_i, and δ_i = 1 or δ_i = 0 corresponds to a left- or right-censored observation on the ith subject. In practice, the observation times C_i may depend on covariates too. For this, we assume that the conditional hazard function of C_i is given by a Cox PH model,

λ(t; X_i) = λ_0(t) exp{φ^T W_i}, (3)

where W_i is a d-dimensional (d < p) subvector of X_i, λ_0(t) is an unspecified baseline hazard function, and φ is the vector of unknown regression parameters.

For inference about model (1), let F_T and F_C denote the marginal distribution functions of the T_i's and C_i's given covariates, respectively, and let F be their joint distribution. Then it follows from Theorem 2.3.3 of Nelsen [14] that there exists a copula function M_ρ(μ, ν) defined on I² = [0, 1] × [0, 1], with M_ρ(μ, 0) = M_ρ(0, ν) = 0, M_ρ(μ, 1) = μ and M_ρ(1, ν) = ν, such that

F(t, c) = M_ρ(F_T(t), F_C(c)),

where the parameter ρ is often referred to as the association parameter representing the relationship between T and C. In this paper, ρ is allowed to be unknown and needs to be estimated. Furthermore, we have

m_ρ(F_T(t), F_C(c)) = P(T ≤ t ∣ C = c, X) = ∂M_ρ(μ, ν)/∂ν ∣_{μ = F_T(t), ν = F_C(c)}

by the conditional inversion idea [14], where m_ρ(F_T(t), F_C(c)) represents the conditional distribution function of T given C and X.
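For a concrete copula, the Farlie–Gumbel–Morgenstern (FGM) family used later in Section 4 gives closed forms for both M_ρ and its partial derivative m_ρ; a minimal sketch:

```python
def fgm(u, v, rho):
    """FGM copula M_rho(u, v), with -1 <= rho <= 1."""
    return u * v + rho * u * v * (1 - u) * (1 - v)

def fgm_conditional(u, v, rho):
    """m_rho(u, v) = dM_rho(u, v)/dv, the conditional cdf of T given C = c
    evaluated at u = F_T(t), v = F_C(c)."""
    return u + rho * u * (1 - u) * (1 - 2 * v)
```

The boundary conditions M_ρ(μ, 1) = μ, M_ρ(1, ν) = ν and M_ρ(μ, 0) = M_ρ(0, ν) = 0 hold by construction for every ρ in [−1, 1].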

Define Λ_1(t) = exp{h(t)}, Λ_2(c) = ∫_0^c λ_0(s) ds, θ = (β^T, ρ, Λ_1(·))^T and η = (φ^T, Λ_2(·))^T, where β is the regression parameter of interest. Let f_C denote the marginal density function of C given covariates. Then under the models above, we have

F_T(t) = 1 − G{log(Λ_1(t)) − β^T X},  F_C(c) = 1 − exp{−Λ_2(c) exp(φ^T W)},

and

f_C(c) = exp{−Λ_2(c) exp(φ^T W)} λ_0(c) exp(φ^T W).

Then given covariates X, the observed conditional likelihood function can be written as

L_n(θ, η) = ∏_{i=1}^n P(C_i = c_i, δ_i = 1 ∣ X_i)^{δ_i} P(C_i = c_i, δ_i = 0 ∣ X_i)^{1−δ_i}
 = ∏_{i=1}^n {P(T_i ≤ c_i ∣ C_i = c_i, X_i) f_C(c_i)}^{δ_i} {[1 − P(T_i ≤ c_i ∣ C_i = c_i, X_i)] f_C(c_i)}^{1−δ_i}
 = ∏_{i=1}^n {m_ρ(F_T(c_i), F_C(c_i))}^{δ_i} {1 − m_ρ(F_T(c_i), F_C(c_i))}^{1−δ_i} f_C(c_i). (4)
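To make (4) concrete, the sketch below evaluates one subject's log-likelihood contribution under illustrative stand-ins (not the paper's sieve estimates): Λ_1(t) = t (i.e. h(t) = log t), the PH error G(t) = exp{−exp(t)}, Λ_2(c) = c with φ = 0 (so F_C(c) = 1 − e^{−c} and f_C(c) = e^{−c}), and the FGM copula:

```python
import math

def loglik_contribution(c, delta, x, beta, rho):
    """One factor of (4) on the log scale:
    delta*log m + (1 - delta)*log(1 - m) + log f_C(c)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    # F_T(c) = 1 - G{log Lambda_1(c) - eta} with G(s) = exp(-exp(s))
    FT = 1.0 - math.exp(-c * math.exp(-eta))
    FC = 1.0 - math.exp(-c)            # exponential observation time
    # FGM conditional cdf m_rho(F_T(c), F_C(c))
    m = FT + rho * FT * (1.0 - FT) * (1.0 - 2.0 * FC)
    log_fC = -c                        # log f_C(c) for the exponential
    return (math.log(m) if delta == 1 else math.log(1.0 - m)) + log_fC
```

Setting rho = 0 recovers the independent-censoring likelihood, since m then reduces to F_T(c).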

In the next section, we will present a two-stage estimation algorithm to estimate θ and η, especially to obtain the sparse estimate of β.

3. Estimation and inference procedure

To estimate θ and η, it is clearly desirable to directly maximize the likelihood function L_n(θ, η). However, note that the estimation of η involves only model (3), and the observation times C_i are completely observed. It is therefore natural to estimate φ and Λ_2(t) by the maximum partial likelihood estimator φ̂ and the Breslow estimator Λ̂_2(t), respectively.

In this section, we propose the following two-stage estimation procedure.

The first stage is to estimate η based on model (3). That is, define φ^ to be the maximizer of the partial likelihood function

L(φ) = ∏_{i=1}^n e^{φ^T W_i} / Σ_{j: C_j ≥ C_i} e^{φ^T W_j},

and furthermore let

Λ̂_2(t) = Σ_{i: C_i ≤ t} 1 / Σ_{j: C_j ≥ C_i} e^{φ̂^T W_j}.
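Since every C_i is an exact observation (there is no censoring on the observation times), the Breslow step has a particularly simple form; a sketch, for a given first-stage estimate φ̂:

```python
import math

def breslow_lambda2(times, W, phi_hat):
    """Breslow estimator: Lambda_2_hat(t) = sum over {i: C_i <= t} of
    1 / sum_{j: C_j >= C_i} exp(phi_hat^T W_j)."""
    def risk_set_sum(c_i):
        return sum(math.exp(sum(p * w for p, w in zip(phi_hat, Wj)))
                   for c_j, Wj in zip(times, W) if c_j >= c_i)
    return lambda t: sum(1.0 / risk_set_sum(c_i) for c_i in times if c_i <= t)
```

With φ̂ = 0 this reduces to the familiar sum of 1/(number at risk) over observation times up to t.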

For the second stage, to estimate θ, one could employ a sieve conditional likelihood procedure, since Λ_1 is nonparametric and involves an infinite-dimensional parameter. Following Zhou et al. [31] and others, we use Bernstein polynomials [22] to approximate Λ_1. Specifically, define the sieve space

Θ_n = {θ_n = (β^T, ρ, Λ_1)^T : (β^T, ρ)^T ∈ B, Λ_1 ∈ M_n},

where B = {(β^T, ρ)^T ∈ R^{p+1} : ‖β‖ + |ρ| ≤ D} and M_n = {Λ_n = Σ_{k=0}^m ψ_k B_k(t, m, l, u) : Σ_{k=0}^m |ψ_k| ≤ M_n, 0 ≤ ψ_0 ≤ ψ_1 ≤ ⋯ ≤ ψ_m}, with

B_k(t, m, l, u) = (m choose k) ((t − l)/(u − l))^k (1 − (t − l)/(u − l))^{m−k},  k = 0, 1, …, m.

In the above, D is a positive constant and M_n is a class of nonnegative, nondecreasing functions over the interval [l, u] with 0 ≤ l < u < ∞, usually taken as the range of the observed data. B_k(t, m, l, u) is the Bernstein basis polynomial of degree m = o(n^v) for some v ∈ (0, 1). In practice, one can choose the value of m based on a model selection criterion, such as AIC or the Bayesian information criterion (BIC). In addition, by virtue of the nonnegativity and monotonicity of Λ_1, it is natural to reparameterize by letting ψ_0 = exp(ϕ_0) and ψ_k = Σ_{i=0}^k exp(ϕ_i) (k = 1, 2, …, m), which removes the constraint 0 ≤ ψ_0 ≤ ψ_1 ≤ ⋯ ≤ ψ_m.
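The basis and the reparameterized monotone sieve can be sketched directly; since the coefficients ψ_k are cumulative sums of exponentials, the resulting approximation of Λ_1 is nondecreasing by construction:

```python
from math import comb, exp

def bernstein_basis(t, k, m, l=0.0, u=1.0):
    """B_k(t, m, l, u) = C(m, k) s^k (1 - s)^(m - k), s = (t - l)/(u - l)."""
    s = (t - l) / (u - l)
    return comb(m, k) * s**k * (1 - s)**(m - k)

def lambda1_sieve(t, phi, l=0.0, u=1.0):
    """Sieve approximation of Lambda_1 with psi_k = sum_{i<=k} exp(phi_i),
    so the coefficient sequence (and hence the curve) is nondecreasing."""
    m = len(phi) - 1
    acc, psi = 0.0, []
    for ph in phi:
        acc += exp(ph)
        psi.append(acc)
    return sum(p * bernstein_basis(t, k, m, l, u) for k, p in enumerate(psi))
```

Because the basis polynomials sum to one for every t, a constant coefficient sequence yields a constant function, and nondecreasing coefficients yield a nondecreasing Λ_n.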

Denote ϕ = (ρ, ϕ_0, ϕ_1, …, ϕ_m)^T and let l_c(β, ϕ) = log L_n(β, ϕ ∣ η̂) be the conditional log-likelihood of θ given η̂ = (φ̂, Λ̂_2). In the following, we develop a penalized or regularized procedure for the sparse estimation of β based on the profile log-likelihood l_p(β) = max_ϕ l_c(β, ϕ), where the dimension of covariates, p, can diverge to infinity but p < n; to emphasize the dependence of p on n, we will write p_n for p.

For the simultaneous estimation and covariate selection for model (1), we consider the penalized profile likelihood function

l_pp(β ∣ β̌) = −2 l_p(β) + Σ_{j=1}^{p_n} P(|β_j|; λ_n), (5)

where the penalty function is P(|β_j|; λ_n) = λ_n β_j² / β̌_j² and λ_n is a tuning parameter. Here β̌ = (β̌_1, …, β̌_{p_n})^T is a consistent estimator of β without zero components. Dai et al. [3] and Zhao et al. [29] also discussed this penalty function in the contexts of the linear model and the PH model, respectively. The minimization of (5) can be regarded as an automatic implementation of best subset selection in an asymptotic sense, since the term β_j² / β̌_j² is expected to converge in probability to I(|β_j| ≠ 0) as n → ∞.

To obtain the sparse estimate of β, one could, of course, directly minimize l_pp(β ∣ β̌) in (5) by some numerical iterative algorithm. Here, we adopt a quadratic approximation algorithm, which is easier and computationally more efficient.

Firstly, define the gradient vector l̇_c(β ∣ ϕ) = ∂l_c(β, ϕ)/∂β and the Hessian matrix l̈_c(β ∣ ϕ) = ∂²l_c(β, ϕ)/∂β∂β^T. Suppose that (β̃, ϕ̃) satisfies l̇_c(β̃ ∣ ϕ̃) = 0. By the Cholesky decomposition, there exists a unique upper triangular matrix Z ∈ R^{p_n × p_n} such that −l̈_c(β ∣ ϕ̃) = Z^T Z. In addition, define the pseudo-response vector y = (Z^T)^{−1}[l̇_c(β ∣ ϕ̃) − l̈_c(β ∣ ϕ̃)β]. Then, by a second-order Taylor expansion within a small neighborhood of β̃, we have

l_p(β) ≈ c − (1/2) l̇_c(β ∣ ϕ̃)^T [−l̈_c(β ∣ ϕ̃)]^{−1} l̇_c(β ∣ ϕ̃),

and

‖y‖² = l̇_c(β ∣ ϕ̃)^T [−l̈_c(β ∣ ϕ̃)]^{−1} l̇_c(β ∣ ϕ̃),

where c is a constant and ‖·‖ denotes the Euclidean norm. Thus, minimizing (5) is asymptotically equivalent to minimizing the following penalized least-squares function:

‖y − Zβ‖² + λ_n Σ_{j=1}^{p_n} β_j² / β̌_j². (6)

Denote A_n = A_n(β) = Z^T Z and B_n = B_n(β) = Z^T y; then, by minimizing (6), we can derive the following iterative formula:

β̂^{(k+1)} = {A_n(β̂^{(k)}) + λ_n D(β̂^{(k)})}^{−1} B_n(β̂^{(k)}), (7)

where D(β̂^{(k)}) = diag((β̂_1^{(k)})^{−2}, (β̂_2^{(k)})^{−2}, …, (β̂_{p_n}^{(k)})^{−2}) is a p_n × p_n matrix. Note that the above iteration may sometimes suffer arithmetic overflow, and to address this issue, we rewrite (7) as

β̂^{(k+1)} = Γ(β̂^{(k)}) {Γ(β̂^{(k)}) A_n(β̂^{(k)}) Γ(β̂^{(k)}) + λ_n I_{p_n}}^{−1} Γ(β̂^{(k)}) B_n(β̂^{(k)}), (8)

where Γ(β̂^{(k)}) = diag(β̂_1^{(k)}, β̂_2^{(k)}, …, β̂_{p_n}^{(k)}). Now, for a given λ_n, we propose the following steps to solve the objective function (6).

  • Step 1.

    Choose an initial estimator β̂^{(0)} satisfying ‖β̂^{(0)} − β_0‖ = O_p((p_n/n)^{1/2}), together with an initial value ϕ̂^{(0)}.

  • Step 2.

    At the (k + 1)th step, update the estimate of β given (β̂^{(k)}, ϕ̂^{(k)}) by Equation (8).

  • Step 3.

    By solving l̇_c(ϕ ∣ β̂^{(k+1)}) = ∂l_c(β̂^{(k+1)}, ϕ)/∂ϕ = 0, obtain the updated estimate ϕ̂^{(k+1)}.

  • Step 4.

    Repeat Steps 2 and 3 until convergence is achieved; the detailed expressions of l̇_c(β̂^{(k)} ∣ ϕ̂^{(k)}), l̈_c(β̂^{(k)} ∣ ϕ̂^{(k)}) and l̇_c(ϕ ∣ β̂^{(k+1)}) are derived in Appendix (A1).
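The core BAR update (8) can be illustrated on the least-squares surrogate (6) alone: the sketch below keeps A_n and B_n fixed (the plain linear-model case discussed by Dai et al. [3]) rather than refreshing them from the profile likelihood at each step, and uses a ridge estimate as the initial value, as in the simulation studies:

```python
import numpy as np

def bar_iteration(Z, y, lam, max_iter=500, tol=1e-10):
    """Iterate (8): beta <- Gamma (Gamma A Gamma + lam I)^(-1) Gamma B,
    with Gamma = diag(beta), A = Z'Z, B = Z'y held fixed."""
    A, B = Z.T @ Z, Z.T @ y
    p = Z.shape[1]
    beta = np.linalg.solve(A + np.eye(p), B)   # ridge initial estimate
    for _ in range(max_iter):
        Gam = np.diag(beta)
        new = Gam @ np.linalg.solve(Gam @ A @ Gam + lam * np.eye(p), Gam @ B)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta
```

Components whose true value is zero shrink toward zero geometrically across iterations, while the nonzero components converge to nearly unbiased estimates, which is the simultaneous selection-and-estimation behavior described above.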

Let β̂ = lim_{k→∞} β̂^{(k)} and ϕ̂ = lim_{k→∞} ϕ̂^{(k)} denote the estimators of β and ϕ obtained above, which will be referred to as the BAR estimators. For the determination of the tuning parameter λ_n, we propose to use the widely used BIC and minimize

BIC(λ) = −2 l_p(β̂) + q_n log(n),

where q_n denotes the number of nonzero components of β̂.
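A sketch of this tuning step, assuming each candidate λ has already been fit (the container `fits` and its contents are hypothetical names, not from the paper):

```python
import math

def bic_select(fits, n, eps=1e-8):
    """Return the lambda minimizing BIC(lambda) = -2*l_p(beta_hat) + q_n*log(n),
    where fits maps lambda -> (profile loglik at beta_hat, beta_hat)."""
    def bic(lam):
        loglik, beta = fits[lam]
        q_n = sum(1 for b in beta if abs(b) > eps)
        return -2.0 * loglik + q_n * math.log(n)
    return min(fits, key=bic)
```

A larger λ that zeroes out extra components is preferred whenever its loss in log-likelihood is outweighed by the (log n)/2 saving per dropped parameter.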

Let β_0 = (β_{0,1}, β_{0,2}, …, β_{0,p_n})^T denote the true value of β; without loss of generality, assume that β_0 = (β_{01}^T, β_{02}^T)^T, where β_{01} consists of all q_n nonzero components and β_{02} of the remaining zero components. Correspondingly, the BAR estimator β̂ = (β̂_1^T, β̂_2^T)^T is partitioned in the same way. The theorem below gives the asymptotic properties of the proposed BAR estimator β̂.

Theorem 3.1 Oracle Property —

Assume that the regularity conditions C1–C8 given in Appendix (A2) hold. Then, with probability tending to 1, β̂_2 = 0, and β̂_1 exists and has the following properties:

  • (i)

    β̂_1 is the unique fixed point of the equation β_1 = {A_n^{(1)} + λ_n D_1(β_1)}^{−1} B_n^{(1)}, where D_1(β_1) = diag(β_1^{−2}, …, β_{q_n}^{−2}), A_n^{(1)} is the q_n × q_n leading submatrix of A_n, and B_n^{(1)} is the vector consisting of the first q_n components of B_n.

  • (ii)

    For any q_n-dimensional vector b satisfying ‖b‖_2 ≤ 1, n^{1/2} t^{−1} b^T(β̂_1 − β_{01}) converges in distribution to N(0, 1), where t² = b^T Σ b and Σ is defined in Appendix (A2).

The proof will be sketched in Appendix (A2).

4. Simulation studies

In this section, we conduct simulation studies to assess the finite-sample performance of the proposed BAR regression procedure and compare it with other variable selection methods: the LASSO penalty P(|β_j|; λ_n) = λ_n|β_j|; the ALASSO penalty P(|β_j|; λ_n) = λ_n ω_j|β_j|, with ω_j a weight; the SCAD penalty P(|β_j|; λ_n) = λ_n ∫_0^{|β_j|} min{1, (aλ_n − x)_+/((a − 1)λ_n)} dx, with a > 2; the SICA penalty P(|β_j|; λ_n) = λ_n(τ_0 + 1)|β_j|/(|β_j| + τ_0), with τ_0 > 0; the SELO penalty P(|β_j|; λ_n) = λ_n log(|β_j|/(|β_j| + τ_0) + 1), with τ_0 > 0; and the MCP penalty P(|β_j|; λ_n) = ∫_0^{|β_j|} (λ_n − x/τ_0)_+ dx, with τ_0 > 1.

First, we generated the covariates X from a multivariate normal distribution with mean zero, variance one, and correlation 0.1^{|i−j|} between X_i and X_j, i, j = 1, …, p. To describe the dependent censoring, we considered the Farlie–Gumbel–Morgenstern (FGM) copula model

M_ρ(μ, ν) = μν + ρμν(1 − μ)(1 − ν),  −1 ≤ ρ ≤ 1,

where ρ is the association parameter. It is well known that Kendall's τ is a commonly used global association measure, which is robust and invariant to monotone transformations. For the FGM copula model considered here, τ = P{(T_i − T_j)(C_i − C_j) > 0} − P{(T_i − T_j)(C_i − C_j) < 0} = 2ρ/9 is used to measure the correlation between the T_i's and C_i's.
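The identity τ = 2ρ/9 can be checked numerically from the general copula formula τ = 4 E[M_ρ(U, V)] − 1, integrating against the FGM copula density c(u, v) = 1 + ρ(1 − 2u)(1 − 2v) on a midpoint grid:

```python
def fgm_kendall_tau(rho, n=400):
    """Midpoint-rule approximation of
    tau = 4 * int int M_rho(u, v) c(u, v) du dv - 1 for the FGM copula."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            M = u * v + rho * u * v * (1 - u) * (1 - v)
            c = 1.0 + rho * (1 - 2 * u) * (1 - 2 * v)
            total += M * c
    return 4.0 * total * h * h - 1.0
```

For ρ = 0.9 this returns approximately 0.2 = 2(0.9)/9, and ρ = 0 (independence) gives τ = 0, confirming the restricted range |τ| ≤ 2/9 of the FGM family.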

For subject i, we generated the failure time T_i under model (1) with h(t) = log t and

G(t) = {1 + ω exp(t)}^{−1/ω} if ω > 0, and G(t) = exp{−exp(t)} if ω = 0,

where ω = 0 and ω = 1 correspond to the PH model and the PO model, respectively. The observation time C_i was then generated from its conditional distribution given T_i, with Λ_2(c_i) = c_i. More specifically, we sampled b_i from U(0, 1) and solved the equation

b_i = P(C ≤ c_i ∣ T = t_i, X_i) = ∂M_ρ(μ, ν)/∂μ ∣_{μ = F_T(t_i), ν = F_C(c_i)}

for C_i = c_i. Define the mean weighted squared error (MSE) to be (β̂ − β_0)^T E(XX^T)(β̂ − β_0). In all tables, we report the median of the MSE (MMSE), the standard deviation of the MSE (SD), the average number of nonzero estimates among the parameters whose true values are not zero (TP), and the average number of nonzero estimates among the parameters whose true values are zero (FP). TP and FP thus estimate the true- and false-positive probabilities, respectively. For the results here, we took the degree of the Bernstein polynomials to be m = [n^{1/4}] = 3, the largest integer smaller than n^{1/4}, and used the ridge regression estimate as the initial estimate for the proposed algorithm. The tuning parameters were selected by the BIC criterion. The results below are based on n = 200, 300 or 500 with 500 replications.
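For the FGM copula, this conditional inversion step has a closed form: with a = ρ(1 − 2μ), the equation b = ν + aν(1 − ν) is a quadratic in ν, so the data-generation equation above can be solved exactly. A sketch, under the simulation's Λ_2(c) = c with φ = 0 (so F_C(c) = 1 − e^{−c}):

```python
import math

def invert_fgm_conditional(b, mu, rho):
    """Solve b = dM_rho(mu, nu)/dmu = nu + rho*(1 - 2*mu)*nu*(1 - nu)
    for nu in (0, 1); the smaller-numerator root is the valid one."""
    a = rho * (1.0 - 2.0 * mu)
    if abs(a) < 1e-12:
        return b                      # independence-like case
    return ((1.0 + a) - math.sqrt((1.0 + a) ** 2 - 4.0 * a * b)) / (2.0 * a)

def observation_time(b, mu, rho):
    """Map the solved nu = F_C(c) back to c via F_C(c) = 1 - exp(-c)."""
    nu = invert_fgm_conditional(b, mu, rho)
    return -math.log(1.0 - nu)
```

In the simulation, b is a U(0, 1) draw and μ = F_T(t_i) is evaluated at the already-generated failure time, which induces the desired dependence between T_i and C_i.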

Firstly, we considered the proposed method in the situation where the association parameter between T and C is known. Table 1 presents the results on covariate selection under the PH model (ω = 0) and the BAR penalty function with p = 10 or 20. Here, we set β_j = 1 for the first four components of X and β_j = 0 for the other components. W is composed of the first 4 and the last 4 elements of X, i.e. d = 8, and we take φ_j = 0 or 0.1 for all components of W. In addition, we set τ = ±0.2, ±0.1, 0. The results given in Table 2 were obtained under the same setups but with the PO model (ω = 1). Following a reviewer's suggestion, Table 3 lists the results obtained after increasing the correlation between X_i and X_j to 0.5^{|i−j|}, i, j = 1, …, p, with φ_j = 0.1, j = 1, …, 8. These tables show that the BAR approach works well in all setups and that the proposed method performs better under the PH model than under the PO model. In all cases, its performance improves as the sample size increases.

Secondly, we compared the proposed BAR regression procedure with the LASSO, ALASSO, SCAD, SICA, SELO and MCP methods. Tables 4 and 5 present the results on covariate selection under these penalty functions, based on the same setups as Tables 1 and 2, but with p = 10, n = 300, ω = 0 or 1 and τ = ±0.1, 0. The results show that all methods perform well except the LASSO, which tends to select more noise variables than the others. Under the PH model, the BAR method gives the smallest MMSE and SD, while all methods perform similarly under the PO model. The BAR approach also generally yielded the smallest FP among the methods considered.

Table 1.

Results on covariate selection based on BAR with ω=0.

τ MMSE(SD) TP FP MMSE(SD) TP FP
n = 200 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.055(0.109) 4.000 0.122 0.076(0.098) 3.998 0.146
−0.1 0.061(0.104) 4.000 0.120 0.068(0.091) 4.000 0.124
0 0.050(0.128) 3.998 0.130 0.046(0.105) 4.000 0.148
0.1 0.067(0.113) 4.000 0.122 0.054(0.087) 3.998 0.130
0.2 0.066(0.117) 4.000 0.138 0.062(0.151) 3.700 0.166
n = 300 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.008(0.041) 4.000 0.020 0.046(0.038) 4.000 0.040
−0.1 0.048(0.034) 4.000 0.012 0.047(0.040) 4.000 0.028
0 0.049(0.047) 4.000 0.024 0.039(0.021) 4.000 0.004
0.1 0.023(0.035) 4.000 0.010 0.023(0.011) 4.000 0.080
0.2 0.028(0.049) 4.000 0.032 0.031(0.022) 4.000 0.014
n = 300 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.046(0.098) 4.000 0.668 0.079(0.045) 4.000 0.664
−0.1 0.069(0.071) 4.000 0.652 0.058(0.046) 4.000 0.634
0 0.078(0.064) 4.000 0.628 0.055(0.058) 4.000 0.686
0.1 0.046(0.084) 4.000 0.566 0.063(0.054) 4.000 0.660
0.2 0.055(0.053) 3.700 0.678 0.046(0.048) 4.000 0.606
n = 500 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.027(0.048) 4.000 0.150 0.046(0.045) 4.000 0.167
−0.1 0.048(0.050) 4.000 0.128 0.036(0.046) 4.000 0.128
0 0.043(0.021) 4.000 0.104 0.040(0.058) 4.000 0.256
0.1 0.046(0.051) 4.000 0.180 0.036(0.054) 4.000 0.130
0.2 0.034(0.042) 4.000 0.114 0.042(0.048) 4.000 0.106

Table 2.

Results on covariate selection based on BAR with ω=1.

τ MMSE(SD) TP FP MMSE(SD) TP FP
n = 200 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.219(0.630) 4.000 0.220 0.294(0.402) 3.990 0.240
−0.1 0.246(0.476) 3.990 0.220 0.242(0.369) 3.990 0.140
0 0.270(0.492) 4.000 0.190 0.266(0.413) 4.000 0.260
0.1 0.256(0.465) 4.000 0.210 0.258(0.403) 3.990 0.210
0.2 0.257(0.350) 4.000 0.220 0.238(0.302) 4.000 0.270
n = 300 p = 10 φ = 0   p = 10 φ ≠ 0  
−0.2 0.176(0.240) 4.000 0.200 0.172(0.281) 4.000 0.210
−0.1 0.167(0.177) 4.000 0.180 0.163(0.245) 4.000 0.142
0 0.132(0.177) 4.000 0.130 0.180(0.206) 4.000 0.190
0.1 0.158(0.249) 4.000 0.160 0.204(0.216) 4.000 0.160
0.2 0.161(0.196) 4.000 0.250 0.147(0.183) 4.000 0.140
n = 300 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.319(0.314) 4.000 1.858 0.308(0.409) 4.000 1.900
−0.1 0.280(0.331) 4.000 1.756 0.324(0.393) 4.000 1.874
0 0.283(0.371) 4.000 1.900 0.317(0.326) 4.000 1.930
0.1 0.278(0.316) 4.000 1.842 0.301(0.345) 4.000 1.886
0.2 0.305(0.326) 4.000 1.884 0.308(0.344) 4.000 1.874
n = 500 p = 20 φ = 0   p = 20 φ ≠ 0  
−0.2 0.199(0.185) 4.000 1.556 0.197(0.210) 4.000 1.588
−0.1 0.201(0.208) 4.000 1.512 0.216(0.200) 4.000 1.676
0 0.189(0.177) 4.000 1.572 0.226(0.220) 4.000 1.612
0.1 0.184(0.201) 4.000 1.414 0.201(0.244) 4.000 1.648
0.2 0.186(0.180) 4.000 1.468 0.193(0.197) 4.000 1.464

Table 3.

Results based on BAR with the correlation of the covariates X being 0.5^{|i−j|}.

  τ = −0.1 τ = 0.1
PH MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.362(0.311) 3.814 0.443 0.505(0.353) 3.867 0.167
n = 300 0.237(0.219) 3.929 0.171 0.306(0.364) 4.000 0.020
  p = 20 p = 20
n = 300 0.427(0.357) 3.733 1.350 0.569(0.355) 4.000 2.300
n = 500 0.279(0.169) 3.800 0.977 0.173(0.431) 4.000 1.830
  τ(unknown) τ=0.1(known)
PO MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.464(0.552) 3.800 0.667 0.354(0.554) 3.940 0.281
n = 300 0.437(0.296) 3.900 0.336 0.275(0.401) 4.000 0.261
  p = 20 p = 20
n = 300 0.755(0.458) 4.000 2.175 0.389(0.326) 4.000 2.367
n = 500 0.495(0.301) 4.000 2.200 0.255(0.266) 4.000 1.733

Table 4.

Results on covariate selection with ω=0, n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
τ = −0.1 φ = 0 φ ≠ 0
BAR 0.048(0.034) 4.000 0.012 0.047(0.040) 4.000 0.028
LASSO 0.112(0.101) 4.000 2.110 0.123(0.103) 4.000 1.670
ALASSO 0.123(0.099) 4.000 0.208 0.114(0.057) 4.000 0.128
SCAD 0.130(0.064) 4.000 0.154 0.100(0.041) 4.000 0.132
SICA 0.112(0.070) 4.000 0.128 0.132(0.052) 4.000 0.116
SELO 0.139(0.064) 4.000 0.188 0.112(0.063) 4.000 0.142
MCP 0.128(0.068) 4.000 0.160 0.145(0.064) 4.000 0.148
τ = 0 φ = 0 φ ≠ 0
BAR 0.049(0.047) 4.000 0.024 0.039(0.021) 4.000 0.004
LASSO 0.134(0.096) 4.000 2.460 0.162(0.095) 4.000 1.910
ALASSO 0.112(0.081) 4.000 0.126 0.112(0.049) 4.000 0.162
SCAD 0.125(0.068) 4.000 0.136 0.133(0.045) 4.000 0.178
SICA 0.113(0.079) 4.000 0.118 0.152(0.044) 4.000 0.112
SELO 0.112(0.073) 4.000 0.130 0.145(0.039) 4.000 0.152
MCP 0.136(0.063) 4.000 0.168 0.131(0.040) 4.000 0.146
τ = 0.1 φ = 0 φ ≠ 0
BAR 0.023(0.035) 4.000 0.010 0.023(0.011) 4.000 0.080
LASSO 0.198(0.186) 4.000 2.260 0.156(0.098) 4.000 2.430
ALASSO 0.129(0.087) 4.000 0.224 0.126(0.065) 4.000 0.110
SCAD 0.132(0.067) 4.000 0.174 0.138(0.034) 4.000 0.196
SICA 0.112(0.061) 4.000 0.126 0.124(0.081) 4.000 0.140
SELO 0.134(0.068) 4.000 0.104 0.142(0.038) 4.000 0.146
MCP 0.125(0.062) 4.000 0.186 0.122(0.031) 4.000 0.172

Table 5.

Results on covariate selection with ω=1, n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
τ = −0.1 φ = 0 φ ≠ 0
BAR 0.167(0.177) 4.000 0.180 0.163(0.245) 4.000 0.142
LASSO 0.230(0.137) 4.000 2.344 0.254(0.147) 4.000 2.318
ALASSO 0.260(0.247) 4.000 0.770 0.238(0.336) 4.000 0.680
SCAD 0.197(0.144) 4.000 0.890 0.215(0.152) 4.000 1.020
SICA 0.233(0.246) 4.000 1.480 0.230(0.279) 4.000 1.530
SELO 0.168(0.223) 4.000 0.300 0.188(0.210) 4.000 0.340
MCP 0.268(0.368) 4.000 1.590 0.265(0.319) 4.000 1.670
τ = 0 φ = 0 φ ≠ 0
BAR 0.132(0.177) 4.000 0.130 0.180(0.206) 4.000 0.190
LASSO 0.218(0.153) 4.000 2.170 0.231(0.154) 4.000 2.232
ALASSO 0.212(0.230) 4.000 0.650 0.234(0.335) 4.000 0.690
SCAD 0.222(0.163) 4.000 1.030 0.194(0.140) 4.000 1.050
SICA 0.172(0.209) 4.000 1.270 0.193(0.318) 4.000 1.250
SELO 0.173(0.256) 4.000 0.350 0.173(0.363) 4.000 0.380
MCP 0.252(0.257) 4.000 1.480 0.285(0.380) 4.000 1.600
τ = 0.1 φ = 0 φ ≠ 0
BAR 0.158(0.249) 4.000 0.160 0.204(0.216) 4.000 0.160
LASSO 0.206(0.121) 4.000 2.122 0.233(0.128) 4.000 2.256
ALASSO 0.198(0.162) 4.000 0.560 0.236(0.266) 4.000 0.750
SCAD 0.183(0.135) 4.000 0.890 0.185(0.143) 4.000 0.880
SICA 0.199(0.234) 4.000 1.520 0.220(0.270) 4.000 1.360
SELO 0.174(0.212) 4.000 0.360 0.168(0.245) 4.000 0.330
MCP 0.262(0.276) 4.000 1.480 0.267(0.225) 4.000 1.500

As pointed out by Ma et al. [13] and others, the estimators of the regression parameters can be sensitive to the assumed association parameter, or be biased and yield misleading results if the assumed association is misspecified. It is therefore of interest to compare the proposed method with an unknown association parameter to that with a known one. For the unknown case, we repeated the studies above and present the estimation results with the true value of τ being ±0.1. Table 6 presents the results on covariate selection under the BAR penalty function with p = 10 or 20, ω = 0 or 1, and φ_j = 0.1 (j = 1, …, 8). Table 7 lists the results based on the penalty functions given above but with n = 300, p = 10, ω = 0 or 1 and τ = 0.1. The two tables suggest that the method performs well in both situations. Although the results under the known association parameter are, as expected, slightly more efficient, the proposed procedure with the association parameter left unknown still performs well.

Table 6.

Comparison of the proposed method between the unknown and known τ.

  τ(unknown) τ=0.1(known)
PH MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.170(0.168) 4.000 1.286 0.054(0.087) 3.998 0.130
n = 300 0.120(0.150) 4.000 0.667 0.023(0.011) 4.000 0.080
  p = 20 p = 20
n = 300 0.107(0.151) 4.000 2.000 0.063(0.054) 4.000 0.660
n = 500 0.149(0.169) 4.000 1.200 0.036(0.054) 4.000 0.130
  τ(unknown) τ=0.1(known)
PO MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.518(0.758) 4.000 1.340 0.258(0.403) 3.990 0.210
n = 300 0.232(0.321) 4.000 0.460 0.204(0.216) 4.000 0.160
  p = 20 p = 20
n = 300 0.694(0.554) 4.000 2.604 0.301(0.345) 4.000 1.886
n = 500 0.284(0.230) 4.000 1.906 0.201(0.244) 4.000 1.648

Table 7.

Comparison of variable selection methods between the unknown and known τ with n = 300, p = 10.

Method MMSE(SD) TP FP MMSE(SD) TP FP
PH τ(unknown) τ=0.1(known)
BAR 0.120(0.150) 4.000 0.667 0.047(0.040) 4.000 0.028
LASSO 0.286(0.188) 4.000 2.000 0.123(0.103) 4.000 1.670
ALASSO 0.118(0.167) 4.000 1.000 0.114(0.057) 4.000 0.128
SCAD 0.170(0.320) 4.000 1.333 0.100(0.041) 4.000 0.132
SICA 0.156(0.172) 4.000 1.250 0.132(0.052) 4.000 0.116
SELO 0.217(0.166) 4.000 1.333 0.112(0.063) 4.000 0.142
MCP 0.281(0.262) 4.000 1.100 0.145(0.064) 4.000 0.148
Method MMSE(SD) TP FP MMSE(SD) TP FP
PO τ(unknown) τ=0.1(known)
BAR 0.232(0.321) 4.000 0.460 0.204(0.216) 4.000 0.160
LASSO 0.382(0.246) 4.000 2.310 0.233(0.128) 4.000 2.256
ALASSO 0.233(0.206) 4.000 1.630 0.236(0.266) 4.000 0.750
SCAD 0.430(0.425) 4.000 1.210 0.185(0.143) 4.000 0.880
SICA 0.317(0.248) 4.000 1.571 0.220(0.270) 4.000 1.360
SELO 0.337(0.265) 4.000 0.640 0.168(0.245) 4.000 0.330
MCP 0.250(0.340) 4.000 1.520 0.267(0.225) 4.000 1.500

In the estimation procedures above, we assumed that the copula function is known. However, this may not hold in reality, so it is of interest to investigate the robustness of the method to the copula assumption. To examine this, we repeated the study that produced Table 6, except that the copula function is the Frank model

M_ρ(μ, ν) = log_ρ{1 + (ρ^μ − 1)(ρ^ν − 1)/(ρ − 1)},  ρ > 0, ρ ≠ 1.

That is, the data were generated under the Frank model, but the FGM model was (incorrectly) assumed to be true and used in the estimation procedure. Table 8 summarizes the results from these simulations. The results obtained under the misspecified model are similar to those obtained under the correct one; in other words, the proposed estimation approach appears to be robust with respect to the copula model assumption.

Table 8.

Results on covariate selection under misspecified copula model.

PH Misspecified copula model Correct copula model
  MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.065(0.055) 4.000 0.364 0.054(0.087) 3.998 0.130
n = 300 0.035(0.023) 4.000 0.133 0.023(0.011) 4.000 0.080
  p = 20 p = 20
n = 300 0.097(0.071) 3.967 0.764 0.063(0.054) 4.000 0.660
n = 500 0.066(0.058) 4.000 0.288 0.036(0.054) 4.000 0.130
PO Misspecified copula model Correct copula model
  MMSE(SD) TP FP MMSE(SD) TP FP
  p = 10 p = 10
n = 200 0.222(0.416) 4.000 0.067 0.258(0.403) 3.990 0.210
n = 300 0.251(0.297) 4.000 0.133 0.204(0.216) 4.000 0.160
  p = 20 p = 20
n = 300 0.331(0.409) 4.000 1.433 0.301(0.345) 4.000 1.886
n = 500 0.136(0.283) 4.000 1.576 0.201(0.244) 4.000 1.648

In addition, we also considered other setups, in particular several other values of m and other copula models, such as the Frank and Clayton models used in Xu et al. [24], and obtained similar results. In other words, the proposed estimator seems to be robust to the choice of m and of the copula model.

5. Analysis of the Alzheimer's disease study

In this section, we apply the method presented in the previous sections to data arising from the Alzheimer's Disease Neuroimaging Initiative (ADNI), a longitudinal multicenter study launched in 2003 by Principal Investigator Michael W. Weiner, MD, VA Medical Center and University of California-San Francisco. The primary goal of ADNI is to test whether serial magnetic resonance imaging (MRI), positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). In the study, participants were divided into three groups based on their cognitive status: cognitively normal, MCI and AD. We focus on the group of participants with MCI and on the time from the baseline visit date to AD conversion, the failure time of interest. Since the participants were only examined intermittently, the AD conversion cannot be observed exactly; it is only known whether it occurred before or after the most recent examination. In other words, we have current status data on the failure time of interest.

For the analysis below, we will consider 310 participants with 24 covariates: gender (Male), marital status (Married), age (AGE), education level (PTEDU), APOE4, AD assessment scale score of 11 items (ADAS11), AD assessment scale score of 13 items (ADAS13), AD assessment scale-delayed word recall score (ADASQ4), clinical dementia rating scale-sum of boxes (CDRSB), mini-mental state examination (MMSE), Rey auditory verbal learning test score of immediate recall (RAVLT.i), Rey auditory verbal learning test learning (RAVLT.l), Rey auditory verbal learning test forgetting (RAVLT.f), Rey auditory verbal learning test percent forgetting (RAVLT.pf), digit symbol substitution test score (DSSTS), trails B score (TBS), functional assessment questionnaire score (FAQ), MRI ventricles volume (MRIVV), MRI hippocampus volume (MRIHV), MRI whole brain volume (MRIWB), MRI entorhinal volume (MRIEV), MRI fusiform gyrus volume (MRIFGV), MRI volume of middle temporal gyrus (MRIMTG) and MRI intracerebral volume (ICV).

In the study, we are interested in identifying the covariates that have significant effects on the risk of developing AD. As in the simulation study, we employ the LASSO, ALASSO, SCAD, SELO, SICA, MCP and BAR penalty functions under model (1). Tables 9 and 10 give, under the PH and PO models respectively, the estimated effects and standard errors (in parentheses) of the selected covariates and of the unknown association parameter τ. The standard errors were obtained by the bootstrap procedure with 100 bootstrap samples.
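As a rough illustration of how such bootstrap standard errors can be obtained, the sketch below resamples subjects with replacement and takes the empirical standard deviation of the replicated estimates. Here `fit_fn` is a hypothetical placeholder for the full two-stage penalized fitting procedure (returning, say, τ and the selected coefficients), not the authors' actual code; the toy check uses a sample mean only.

```python
import numpy as np

def bootstrap_se(data, fit_fn, n_boot=100, seed=1):
    """Nonparametric bootstrap standard errors.

    fit_fn maps a resampled data set to a 1-d array of estimates; for the
    ADNI analysis it would stand in for the two-stage BAR fitting procedure.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        reps.append(fit_fn(data[idx]))
    return np.std(np.asarray(reps), axis=0, ddof=1)

# toy check on n = 310 subjects: the bootstrap SE of a sample mean
# should be close to the usual formula s / sqrt(n)
x = np.random.default_rng(0).normal(size=310).reshape(-1, 1)
se = bootstrap_se(x, lambda d: np.array([d.mean()]), n_boot=200)
```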

Table 9.

The sparse estimation results for the ADNI data under the PH model; '()' indicates that the corresponding covariate was not selected.

Covariate LASSO ALASSO SCAD SICA SELO MCP BAR
τ 0.227(0.181) 0.259(0.031) 0.258(0.035) 0.265(0.006) 0.246(0.038) 0.240(0.049) 0.242(0.046)
Male 0.191(0.188) 0.265(0.175) 0.249(0.176) 0.204(0.189) 0.193(0.166) 0.246(0.178) 0.258(0.189)
Married () 0.134(0.176) 0.143(0.169) () () 0.118(0.170) 0.146(0.177)
AGE () () () () () () ()
PTEDU () () () () () () ()
APOE4 () () () () 0.183(0.133) () ()
ADAS11 () () () () () () ()
ADAS13 () () () () () () ()
ADASQ4 () () () () () () ()
CDRSB () () () () () () ()
MMSE () () () () () () ()
RAVLT.i 0.325(0.191) 0.431(0.083) 0.438(0.087) 0.433(0.050) 0.435(0.056) 0.376(0.149) 0.381(0.158)
RAVLT.l () () () () () () ()
RAVLT.f () () () () () () ()
RAVLT.pf () 0.168(0.179) () 0.184(0.177) 0.161(0.162) () 0.134(0.175)
DSSTS () () () () () () ()
TBS () () () () () () ()
FAQ () () () () () () ()
MRIVV () () () () () () ()
MRIHV 0.106(0.161) () () () 0.122(0.150) () ()
MRIWB 0.347(0.154) 0.328(0.146) 0.315(0.148) 0.207(0.188) 0.280(0.155) 0.358(0.139) 0.351(0.176)
MRIEV () () () () () () ()
MRIFGV () () () () () () ()
MRIMTG 0.695(0.057) 0.715(0.038) 0.702(0.047) 0.709(0.049) 0.707(0.059) 0.710(0.049) 0.724(0.062)
ICV 0.145(0.181) 0.146(0.175) 0.111(0.164) 0.191(0.187) 0.279(0.150) 0.127(0.180) 0.187(0.183)

Table 10.

The sparse estimation results for the ADNI data under the PO model; '()' indicates that the corresponding covariate was not selected.

Covariate LASSO ALASSO SCAD SICA SELO MCP BAR
τ 0.245(0.005) 0.221(0.039) 0.212(0.043) 0.231(0.028) 0.169(0.031) 0.239(0.023) 0.230(0.036)
Male 0.527(0.098) 0.609(0.095) 0.476(0.170) 0.449(0.188) 0.507(0.108) 0.555(0.107) 0.614(0.106)
Married 0.105(0.168) 0.148(0.191) () () () 0.148(0.189) 0.194(0.218)
AGE () () () () () () ()
PTEDU () () () () () () ()
APOE4 0.351(0.059) 0.393(0.046) 0.400(0.050) 0.368(0.076) 0.431(0.058) 0.366(0.063) 0.389(0.057)
ADAS11 () () () () () () ()
ADAS13 () () () () () () ()
ADASQ4 () () () () () () ()
CDRSB () () () () () () ()
MMSE () () () () () () ()
RAVLT.i 0.739(0.065) 0.710(0.097) 0.692(0.092) 0.733(0.090) 0.663(0.058) 0.762(0.068) 0.728(0.094)
RAVLT.l () () () () () () ()
RAVLT.f () () () () () () ()
RAVLT.pf () () () () () () ()
DSSTS () 0.152(0.170) () () () 0.127(0.167) 0.111(0.160)
TBS () () () () () () ()
FAQ 0.441(0.045) 0.420(0.058) 0.396(0.075) 0.435(0.064) 0.396(0.073) 0.426(0.060) 0.412(0.086)
MRIVV () () () () () () ()
MRIHV 0.308(0.135) 0.382(0.124) 0.358(0.153) 0.378(0.120) 0.455(0.089) 0.345(0.120) 0.359(0.119)
MRIWB 0.687(0.099) 0.740(0.108) 0.708(0.098) 0.692(0.138) 0.774(0.113) 0.725(0.095) 0.739(0.105)
MRIEV 0.117(0.165) () () 0.148(0.173) () () ()
MRIFGV () () () () 0.167(0.177) () ()
MRIMTG 1.009(0.063) 1.025(0.069) 1.011(0.071) 1.027(0.084) 0.989(0.066) 1.029(0.065) 1.030(0.065)
ICV 0.243(0.190) 0.343(0.191) 0.297(0.181) 0.322(0.208) 0.315(0.214) 0.260(0.195) 0.316(0.187)

One can see from Table 9 that five covariates, Male, RAVLT.i, MRIWB, MRIMTG and ICV, are selected by all penalty procedures. Among them, RAVLT.i, MRIWB and MRIMTG appear to have significant effects on the AD conversion under the proposed method. Table 10 shows that, besides the five covariates above, the three covariates APOE4, FAQ and MRIHV are selected by all penalties. Among the eight selected covariates, only ICV shows no significant effect under the proposed method. With respect to the association between the AD conversion time and the observation time, the results under the FGM model suggest that the two times are significantly positively correlated. Under both models, the results are similar to those given by Li et al. [9].

6. Discussion and concluding remarks

In the preceding sections, we have discussed the sparse estimation for the semiparametric linear transformation models with dependent current status data. For inference, a two-stage estimation procedure based on the BAR penalized likelihood was proposed to estimate the association parameter in addition to the regression parameters, where a copula model was used to describe the relationship between the failure time and the censoring time. To approximate the unknown function $\Lambda_1(t)$, Wu and Cook [23] employed piecewise constant functions and Ma et al. [13] used I-spline basis functions. However, a drawback of the former is that a piecewise constant function is neither continuous nor differentiable, while the latter requires choosing both the order m and the interior knots. We therefore adopted the Bernstein polynomial approximation, which is continuous and has some nice properties including differentiability. Moreover, as Xu et al. [24] and Zhao et al. [29] pointed out, most inference procedures are insensitive to the value of m, and one can simply choose $m=[n^{1/4}]$ in practice. In addition, the oracle property of the resulting estimator was established, and the numerical studies suggested that the proposed method works well in practical situations.
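For concreteness, a minimal sketch of such a Bernstein polynomial sieve is given below. It is an illustration under assumed notation (interval endpoints l and u, unconstrained coefficients phi), not the authors' implementation: monotonicity of the approximated $\Lambda_{1n}$ is enforced by cumulative positive coefficients, and the degree is taken as $m=[n^{1/4}]$.

```python
import numpy as np
from math import comb

def bernstein_basis(t, m, l, u):
    """Degree-m Bernstein basis at t, after rescaling [l, u] to [0, 1]."""
    s = (np.asarray(t, dtype=float) - l) / (u - l)
    return np.stack([comb(m, k) * s**k * (1 - s)**(m - k)
                     for k in range(m + 1)], axis=-1)

def lambda1n(t, phi, l, u):
    """Sieve approximation of Lambda_1; cumulative positive coefficients
    make the resulting function nondecreasing in t."""
    m = len(phi) - 1
    coef = np.cumsum(np.exp(phi))          # 0 < c_0 < c_1 < ... < c_m
    return bernstein_basis(t, m, l, u) @ coef

n = 300
m = int(np.floor(n ** 0.25))               # the suggested choice m = [n^(1/4)]
t = np.linspace(0.0, 2.0, 50)
vals = lambda1n(t, np.zeros(m + 1), l=0.0, u=2.0)
```

With phi fixed, monotonicity holds by construction; in the actual estimation the coefficients would be updated jointly with the regression parameters.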

Note that a main advantage of this study is the flexibility of the proposed model, as it includes many popular models as special cases. Another advantage is that the proposed estimation procedure does not require specifying the association parameter beforehand, which is more practical. In general, the association parameter cannot be estimated without extra information; in the situation here, the extra information is provided by the estimation of the marginal distribution $F_C$ in the first stage, which can then be treated as known. However, one limitation of the proposed method is that it assumes that the underlying copula model is known. As some authors have pointed out (Zheng and Klein [30]; Ma et al. [13]), it is usually impossible to estimate the copula without strong assumptions. Nevertheless, the simulation study suggested that the presented method seems to be robust with respect to the underlying copula assumption.

Finally, several improvements and extensions can be explored in future studies. For example, it would be meaningful to generalize the proposed methods to the high-dimensional case or to dependent bivariate current status data [1]. Another possible direction for future research is to develop similar methods for dependent censoring under other models, such as the additive hazards model or cure models [18].

Acknowledgments

The data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this paper. A complete listing of ADNI investigators can be found at https://adni.loni.usc.edu/wp-content/uploads/how-to-apply/ADNI-Acknowledgement-List.pdf.

Appendices.

Appendix 1. Computation of the partial derivatives $\dot{l}_c(\beta\mid\phi)$, $\dot{l}_c(\phi\mid\beta)$ and $\ddot{l}_c(\beta\mid\phi)$.

To describe the dependent censoring, we consider the FGM model

$$M_\rho(\mu,\nu)=\mu\nu+\rho\,\mu\nu(1-\mu)(1-\nu),\qquad -1\le\rho\le 1.$$

Then it follows that

$$m_\rho(F_T(t),F_C(c))=\left.\frac{\partial M_\rho(\mu,\nu)}{\partial\nu}\right|_{\mu=F_T(t),\,\nu=F_C(c)}=F_T(t)\left[1+\rho\,(1-F_T(t))(1-2F_C(c))\right],$$

the likelihood function has the following form:

$$\begin{aligned}
L_n(\beta,\phi,\eta)&=\prod_{i=1}^n\{m_\rho(F_T(c_i),F_C(c_i))\}^{\delta_i}\{1-m_\rho(F_T(c_i),F_C(c_i))\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{F_T(c_i)[1+\rho(1-F_T(c_i))(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{(1-F_T(c_i))[1-\rho F_T(c_i)(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{(1-G\{\log(\Lambda_1(c_i))-\beta^TX_i\})[1+\rho G\{\log(\Lambda_1(c_i))-\beta^TX_i\}(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{G\{\log(\Lambda_1(c_i))-\beta^TX_i\}[1-\rho(1-G\{\log(\Lambda_1(c_i))-\beta^TX_i\})(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i)\\
&=\prod_{i=1}^n\{(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\})[1+\rho G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}(1-2F_C(c_i))]\}^{\delta_i}\\
&\quad\times\{G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}[1-\rho(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\})(1-2F_C(c_i))]\}^{1-\delta_i}f_C(c_i),
\end{aligned}$$

and the conditional log-likelihood function is

$$\begin{aligned}
l_c(\beta,\phi)=\log L_n(\beta,\phi\mid\hat\eta)=\sum_{i=1}^n\Big\{&\delta_i\log\big(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)\\
&+\delta_i\log\big[1+\rho G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}(1-2\hat F_C(c_i))\big]\\
&+(1-\delta_i)\log\big(G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)\\
&+(1-\delta_i)\log\big[1-\rho\big(1-G\{\log(\Lambda_{1n}(c_i))-\beta^TX_i\}\big)(1-2\hat F_C(c_i))\big]+\log\hat f_C(c_i)\Big\}.
\end{aligned}$$
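As a quick numerical sanity check of the FGM quantities above (my own illustration, not part of the paper), the conditional distribution $m_\rho=\partial M_\rho/\partial\nu$ can be compared with a finite-difference derivative of the copula; the value $\rho=0.45$ corresponds to Kendall's $\tau=2\rho/9=0.1$ under the FGM model.

```python
import numpy as np

def fgm_copula(u, v, rho):
    """FGM copula M_rho(u, v); rho in [-1, 1]."""
    return u * v + rho * u * v * (1 - u) * (1 - v)

def fgm_dv(u, v, rho):
    """m_rho = dM_rho/dv = u * [1 + rho * (1 - u) * (1 - 2v)]."""
    return u * (1 + rho * (1 - u) * (1 - 2 * v))

u, v, rho = 0.3, 0.7, 0.45                 # rho = 0.45 <=> Kendall's tau = 0.1
eps = 1e-6
num = (fgm_copula(u, v + eps, rho) - fgm_copula(u, v - eps, rho)) / (2 * eps)
```

Since the FGM copula is quadratic in each argument, the central difference agrees with the analytic derivative up to rounding error, and the copula boundary condition M(u, 1) = u also holds.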

Moreover, writing $u_i=\log(\Lambda_{1n}(c_i))-\beta^TX_i$ and $\hat k_i=1-2\hat F_C(c_i)$ for brevity, we obtain the following partial derivatives:

$$\dot l_c(\beta\mid\phi)=\frac{\partial l_c(\beta,\phi)}{\partial\beta}=\sum_{i=1}^n G'(u_i)X_i\left\{\frac{\delta_i}{1-G(u_i)}-\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}-\frac{1-\delta_i}{G(u_i)}-\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\},$$

$$\begin{aligned}
\ddot l_c(\beta\mid\phi)=\frac{\partial^2 l_c(\beta,\phi)}{\partial\beta\,\partial\beta^T}=&-\sum_{i=1}^n G''(u_i)X_iX_i^T\left\{\frac{\delta_i}{1-G(u_i)}-\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}-\frac{1-\delta_i}{G(u_i)}-\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\}\\
&-\sum_{i=1}^n\{G'(u_i)\}^2X_iX_i^T\left\{\frac{\delta_i}{[1-G(u_i)]^2}+\frac{\delta_i\rho^2\hat k_i^2}{[1+\rho G(u_i)\hat k_i]^2}+\frac{1-\delta_i}{[G(u_i)]^2}+\frac{(1-\delta_i)\rho^2\hat k_i^2}{[1-\rho(1-G(u_i))\hat k_i]^2}\right\}
\end{aligned}$$

and

$$\dot l_c(\phi\mid\beta)=\frac{\partial l_c(\beta,\phi)}{\partial\phi}=\sum_{i=1}^n G'(u_i)\frac{\Lambda'_{1n}(c_i)}{\Lambda_{1n}(c_i)}\left\{-\frac{\delta_i}{1-G(u_i)}+\frac{\delta_i\rho\hat k_i}{1+\rho G(u_i)\hat k_i}+\frac{1-\delta_i}{G(u_i)}+\frac{(1-\delta_i)\rho\hat k_i}{1-\rho(1-G(u_i))\hat k_i}\right\},$$

where $G'(x)=dG(x)/dx$, $G''(x)=d^2G(x)/dx^2$ and $\Lambda'_{1n}(c_i)=\partial\Lambda_{1n}(c_i)/\partial\phi$.
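The score formula can likewise be checked numerically. The sketch below is my own illustration with assumed stand-ins: $G(x)=\exp(-e^x)$ as one concrete transformation, $\Lambda_{1n}(t)=t$ and $\hat F_C(t)=t/2$. It compares the analytic $\dot l_c(\beta\mid\phi)$ with a finite-difference gradient of $l_c$, dropping the $\log\hat f_C(c_i)$ terms since they do not involve $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta = rng.normal(scale=0.3, size=p)
c = rng.uniform(0.1, 1.9, size=n)          # observation times
delta = rng.integers(0, 2, size=n)         # current status indicators
rho = 0.45
Lam1 = lambda t: t                         # stand-in for Lambda_1n
FC = lambda t: t / 2.0                     # stand-in for F_C-hat, values in (0, 1)

def loglik(beta):
    """l_c without the log f_C(c_i) terms, which are free of beta."""
    G = np.exp(-np.exp(np.log(Lam1(c)) - X @ beta))
    k = 1 - 2 * FC(c)
    return np.sum(delta * (np.log(1 - G) + np.log1p(rho * G * k))
                  + (1 - delta) * (np.log(G) + np.log1p(-rho * (1 - G) * k)))

def score(beta):
    """Analytic score from the display above, for G(x) = exp(-exp(x))."""
    u = np.log(Lam1(c)) - X @ beta
    G = np.exp(-np.exp(u))
    Gp = -np.exp(u) * G                    # G'(u)
    k = 1 - 2 * FC(c)
    br = (delta / (1 - G) - delta * rho * k / (1 + rho * G * k)
          - (1 - delta) / G - (1 - delta) * rho * k / (1 - rho * (1 - G) * k))
    return (Gp * br) @ X

eps = 1e-6
num = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
                for e in np.eye(p)])
```

The two gradients should agree to finite-difference accuracy; this is only a check of the algebra above, not of the full estimation procedure.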

Appendix 2. Proof of Theorem 3.1

Before proving Theorem 3.1, we give some regularity conditions, notation and lemmas. For completeness, we first describe the asymptotic properties of $\hat\varphi$ and $\hat\Lambda_2$ in Lemma A.1 below.

Lemma A.1

Let $\hat\varphi$ and $\hat\Lambda_2$ be the estimators of $\varphi$ and $\Lambda_2$ defined above, respectively, and assume that the regularity conditions given on pages 174–176 of Kalbfleisch and Prentice [8] hold. Then $\hat\varphi$ and $\hat\Lambda_2$ are consistent and asymptotically normal.

For the proof of Lemma A.1, please refer to Section 8.3 of Kalbfleisch and Prentice [8]. To show the asymptotic properties of β^, in addition to the conditions needed in Lemma A.1, we also need the following regularity conditions.

Condition C1. $X$ has a bounded support in $\mathbb{R}^{p_n}$, and if there exist a constant $c_0$ and a vector $\xi$ such that $\xi^TZ=c_0$ almost surely, then $c_0=0$ and $\xi=0$.

Condition C2. $\mu_M(E)>0$ for any open set $E\subset I^2$, where $\mu_M$ denotes the probability measure corresponding to the copula function $M_\alpha$ given $X$.

Condition C3. The copula function $M(\cdot,\cdot)$ has bounded first-order partial derivatives, with $\partial M(u,v)/\partial u$ and $\partial M(u,v)/\partial v$ being Lipschitz.

Condition C4. The function $\Lambda_1$ is continuously differentiable up to order $r$ in $[l,u]$, $r>2$, and satisfies $c^{-1}<\Lambda_1(l)<\Lambda_1(u)<c$ for some positive constant $c$.

Condition C5. $\beta_0$ is an interior point of a compact set $B\subset\mathbb{R}^{p_n}$ and there exists a compact neighborhood $B_0$ of the true value $\beta_0$ such that

$$\sup_{\beta\in B_0}\left\|n^{-1}A_n(\beta)-I(\beta_0)\right\|\xrightarrow{a.s.}0,$$

where $I(\beta_0)$ is a positive-definite $p_n\times p_n$ matrix.

Condition C6. There exists a constant $a>1$ such that $a^{-1}<\lambda_{\min}(n^{-1}A_n)\le\lambda_{\max}(n^{-1}A_n)<a$ for sufficiently large $n$, where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ stand for the smallest and largest eigenvalues of a matrix.

Condition C7. There exist positive constants $a_0$ and $a_1$ such that $a_0\le|\beta_{0,j}|\le a_1$ for $1\le j\le q_n$, and, as $n\to\infty$, $p_nq_n/n\to0$, $\lambda_n/n\to0$, $\lambda_n(q_n/n)^{1/2}\to0$ and $\lambda_n^2(p_n/n)\to0$.

Condition C8. The initial estimator $\hat\beta^{(0)}$ satisfies $\|\hat\beta^{(0)}-\beta_0\|=O_p(\sqrt{p_n/n})$.

Note that these conditions are mild and usually satisfied in practical situations. Specifically, C1 is commonly used in the current status data literature ([28], Xu et al. [24]) to ensure the identifiability of the parameters and the uniform convergence. C2 and C3 are useful for the copula models. C4 and C5 are necessary for the existence and consistency of the sieve maximum likelihood estimator of $\Lambda_1(t)$. C6 assumes that $n^{-1}A_n$ is positive definite almost surely, with its eigenvalues, as well as the nonzero coefficients, bounded away from zero and infinity. C7 and C8 give some sufficient, but not necessary, conditions needed to prove the numerical convergence and asymptotic properties of the BAR estimator. Also, based on arguments similar to those for Theorem 3.1 in Cui et al. [2], one can take the initial value $\hat\beta^{(0)}$ to be the unpenalized estimate or the ridge regression estimate of $\beta$.

Next, define $\beta=(\alpha^T,\gamma^T)^T$, where $\alpha$ is a $q_n\times1$ vector and $\gamma$ is a $(p_n-q_n)\times1$ vector, and analogously, write $\hat\beta^{(k)}=(\hat\alpha^{(k)T},\hat\gamma^{(k)T})^T$ and

$$\begin{pmatrix}\alpha(\beta)\\ \gamma(\beta)\end{pmatrix}\equiv g(\beta)=(A_n+\lambda_nD(\beta))^{-1}B_n. \qquad (A1)$$

For simplicity, write $\alpha(\beta)$ and $\gamma(\beta)$ as $\alpha$ and $\gamma$ hereafter. We further partition $(n^{-1}A_n)^{-1}$ into

$$(n^{-1}A_n)^{-1}=\begin{pmatrix}A&B\\B^T&G\end{pmatrix},$$

where $A$ is a $q_n\times q_n$ matrix. Based on the nonsingularity of the matrix $A_n$, multiplying both sides of (A1) by $A_n^{-1}(A_n+\lambda_nD(\beta))$ and subtracting $\beta_0$, we have

$$\begin{pmatrix}\alpha-\beta_{01}\\ \gamma\end{pmatrix}+\frac{\lambda_n}{n}\begin{pmatrix}AD_1(\alpha)\alpha+BD_2(\gamma)\gamma\\ B^TD_1(\alpha)\alpha+GD_2(\gamma)\gamma\end{pmatrix}=\hat b-\beta_0, \qquad (A2)$$

where $\hat b=A_n^{-1}B_n$ is the ordinary least squares estimate, $D_1(\alpha)=\mathrm{diag}(\beta_1^{-2},\ldots,\beta_{q_n}^{-2})$ and $D_2(\gamma)=\mathrm{diag}(\beta_{q_n+1}^{-2},\ldots,\beta_{p_n}^{-2})$.
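The fixed-point mapping in (A1) is exactly what the BAR algorithm iterates. The toy sketch below is my own least-squares surrogate, with $A_n=X^TX$ and $B_n=X^Ty$ standing in for the quantities defined in the paper; it illustrates how the iteration $\beta^{(k+1)}=(A_n+\lambda_nD(\beta^{(k)}))^{-1}B_n$ drives null coefficients to zero while leaving the nonzero ones essentially unpenalized.

```python
import numpy as np

def bar_estimate(A, B, lam, beta0, n_iter=50):
    """Broken adaptive ridge iteration: beta <- (A + lam * D(beta))^{-1} B
    with D(beta) = diag(beta_j^{-2}), started from a ridge-type estimate."""
    beta = beta0.copy()
    for _ in range(n_iter):
        D = np.diag(1.0 / np.maximum(beta ** 2, 1e-12))  # cap to avoid overflow
        beta = np.linalg.solve(A + lam * D, B)
    return beta

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])
y = X @ true + 0.1 * rng.normal(size=n)
A, B = X.T @ X, X.T @ y                    # least-squares surrogates for A_n, B_n
init = np.linalg.solve(A + np.eye(p), B)   # ridge initial value beta^(0)
est = bar_estimate(A, B, lam=2.0, beta0=init)
```

The quadratic reweighting $\beta_j^{-2}$ is what produces the oracle-type behavior: once a coefficient is small, its ridge weight explodes and the coefficient collapses toward zero at the next iteration.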

To prove Theorem 3.1, we also need the following two lemmas.

Lemma A.2

Assume Conditions C1–C8 hold, and define $H_n\equiv\{\beta=(\alpha^T,\gamma^T)^T:\alpha\in[1/K_0,K_0]^{q_n},\ \|\gamma\|\le\delta_n\sqrt{p_n/n}\}$, where $K_0>1$ is a constant such that $\beta_{01}\in[1/K_0,K_0]^{q_n}$, $0<\delta_n\to\infty$ and $p_n\delta_n^2/\lambda_n\to0$ as $n\to\infty$. Then, with probability tending to 1, we have

  • (i)

    $\sup_{\beta\in H_n}\|\gamma(\beta)\|/\|\gamma\|<1/C_0$, for some constant $C_0>1$;

  • (ii)

    $g(\cdot)$ is a mapping from $H_n$ to itself.

Lemma A.3

Assume Conditions C1–C8 hold. Then, with probability tending to 1, the equation $\alpha=(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}$ has a unique fixed point $\hat\alpha$ in the domain $[1/K_0,K_0]^{q_n}$, where $A_n^{(1)}$ is the leading $q_n\times q_n$ submatrix of $A_n$ and $B_n^{(1)}$ consists of the first $q_n$ components of $B_n$.

The proofs of these two lemmas are similar to those of Lemmas A.1–A.2 in Zhao et al. [29] and are therefore omitted here.

Proof of Theorem 3.1 —

Firstly, by the definitions of $\hat\beta$ and $\hat\beta^{(k)}$, it follows from Lemma A.2(i) that

$$\hat\beta_2\equiv\lim_{k\to\infty}\hat\gamma^{(k)}=0 \qquad (A3)$$

holds with probability tending to 1.

Next, we prove part (i). Define

$$f(\alpha)=(f_1(\alpha),\ldots,f_{q_n}(\alpha))^T\equiv(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}, \qquad (A4)$$

where $\alpha=(\alpha_1,\ldots,\alpha_{q_n})^T$. Then it is sufficient to show that

$$\Pr\left(\lim_{k\to\infty}\|\hat\alpha^{(k)}-\hat\alpha\|=0\right)\to1, \qquad (A5)$$

where $\hat\alpha$ is the fixed point of $f(\alpha)$ defined in Lemma A.3. Define $\gamma/\|\gamma\|=0$ if $\gamma=0$. Note that for any $(\alpha^T,\gamma^T)^T\in H_n$, from (A2) we have

$$\lim_{\gamma\to0}\gamma(\alpha,\gamma)=0. \qquad (A6)$$

Combining (A3), (A4) and (A6), we have

$$\lim_{\gamma\to0}\alpha(\alpha,\gamma)=(A_n^{(1)}+\lambda_nD_1(\alpha))^{-1}B_n^{(1)}=f(\alpha).$$

Then, as k, it follows that

$$\eta_k\equiv\sup_{\alpha\in[1/K_0,K_0]^{q_n}}\left\|f(\alpha)-\alpha(\alpha,\hat\gamma^{(k)})\right\|\to0. \qquad (A7)$$

This implies that, for any $\epsilon>0$, there exists $N>0$ such that $\eta_k<\epsilon$ when $k>N$. Since $f(\cdot)$ is a contraction mapping, it follows from Lemma A.3 that

$$\left\|f(\hat\alpha^{(k)})-\hat\alpha\right\|=\left\|f(\hat\alpha^{(k)})-f(\hat\alpha)\right\|\le\frac1c\left\|\hat\alpha^{(k)}-\hat\alpha\right\|, \qquad (A8)$$

where $c>1$.

Let $h_k=\|\hat\alpha^{(k)}-\hat\alpha\|$. Combining (A7) and (A8), it is easy to show that

$$\left\|\alpha(\hat\beta^{(k)})-\hat\alpha\right\|\le\left\|\alpha(\hat\beta^{(k)})-f(\hat\alpha^{(k)})\right\|+\left\|f(\hat\alpha^{(k)})-\hat\alpha\right\|\le\eta_k+\frac1c h_k,$$

so that $h_{k+1}\le\eta_k+h_k/c$, since $\alpha(\hat\beta^{(k)})=\hat\alpha^{(k+1)}$; a recursive calculation then gives $h_k\to0$ as $k\to\infty$. Thus, with probability tending to 1, we have

$$\|\hat\alpha^{(k)}-\hat\alpha\|\to0 \quad\text{as } k\to\infty.$$

Note that $\hat\beta_1\equiv\lim_{k\to\infty}\hat\alpha^{(k)}$. Therefore, we have shown that

$$\Pr\left(\lim_{k\to\infty}\|\hat\alpha^{(k)}-\hat\alpha\|=0\right)\to1, \qquad (A9)$$

which completes the proof of part (i).

Finally, to prove part (ii) of the theorem, write

$$\sqrt n\,t^{-1}b^T(\hat\alpha-\beta_{01})=T_1+T_2,$$

where

$$T_1=\sqrt n\,t^{-1}b^T\left[(A_n^{(1)}+\lambda_nD_1(\hat\alpha))^{-1}A_n^{(1)}-I_{q_n}\right]\beta_{01}$$

and

$$T_2=\sqrt n\,t^{-1}b^T(A_n^{(1)}+\lambda_nD_1(\hat\alpha))^{-1}(B_n^{(1)}-A_n^{(1)}\beta_{01}).$$

By the first-order resolvent expansion formula, we can derive

$$T_1=-\frac{\lambda_n}{\sqrt n}\,t^{-1}b^T(A_n^{(1)}/n)^{-1}D_1(\hat\alpha)\left(\frac1nA_n^{(1)}+\frac{\lambda_n}{n}D_1(\hat\alpha)\right)^{-1}\frac1nA_n^{(1)}\beta_{01}.$$

Hence, by Conditions C6 and C7, we have

$$T_1=O_p\left(\lambda_n\sqrt{q_n/n}\right)\to0.$$

Furthermore, applying the first-order resolvent expansion formula again, it can be shown that

$$T_2=t^{-1}b^T(A_n^{(1)}/n)^{-1}\frac{1}{\sqrt n}\left(B_n^{(1)}-A_n^{(1)}\beta_{01}\right)+o_p(1),$$

where $\frac{1}{\sqrt n}(B_n^{(1)}-A_n^{(1)}\beta_{01})=\frac{1}{\sqrt n}\dot l_n^{(1)}(\hat\beta\mid\hat\phi)+o_p(1)$, with $\dot l_n^{(1)}(\hat\beta\mid\hat\phi)$ denoting the first $q_n$ components of $\dot l_n(\hat\beta\mid\hat\phi)$. Let $I(\beta)=E[-\ddot l_n^{(1)}(\beta\mid\hat\phi)]$ be the Fisher information matrix, where $\ddot l_n^{(1)}(\beta\mid\phi)$ is the partial Hessian matrix with respect to $\beta$. Based on the asymptotic normality of $n^{-1/2}\dot l_n(\hat\beta\mid\hat\phi)$, we have

$$\sqrt n\,t^{-1}b^T(\hat\alpha-\beta_{01})\xrightarrow{d}N(0,1)$$

with $\Sigma=n(A_n^{(1)}(\beta_0))^{-1}I^{(1)}(\beta_0)(A_n^{(1)}(\beta_0))^{-1}$ and $t^2=b^T\Sigma b$, where $I^{(1)}(\beta_0)$ is the leading $q_n\times q_n$ submatrix of $I(\beta_0)$. Combining this with (A9), we conclude the proof of part (ii).

Funding Statement

The research of Zhao was partially supported by the National Natural Science Foundation of China grants 12171483 and 11861030.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Chen M.H., Tong X., and Sun J., A frailty model approach for regression analysis of multivariate current status data, Stat. Med. 28 (2009), pp. 3424–3436.
  • 2. Cui Q., Zhao H., and Sun J., A new copula model-based method for regression analysis of dependent current status data, Stat. Interface 11 (2018), pp. 463–471.
  • 3. Dai L., Chen K., Sun Z., Liu Z., and Li G., Broken adaptive ridge regression and its asymptotic properties, J. Multivar. Anal. 168 (2018), pp. 334–351.
  • 4. Dicker L., Huang B., and Lin X., Variable selection and estimation with the seamless-L0 penalty, Stat. Sin. 23 (2013), pp. 929–962.
  • 5. Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle property, J. Am. Stat. Assoc. 96 (2001), pp. 1348–1360.
  • 6. Fan J. and Li R., Variable selection for Cox's proportional hazards model and frailty model, Ann. Stat. 30 (2002), pp. 74–99.
  • 7. Huang J., Efficient estimation for the proportional hazards model with interval censoring, Ann. Stat. 24 (1996), pp. 540–568.
  • 8. Kalbfleisch J.D. and Prentice R.L., The Statistical Analysis of Failure Time Data, Wiley, New York, 2002.
  • 9. Li K., Chan W., Doody R.S., Quinn J., Luo S., and the Alzheimer's Disease Neuroimaging Initiative, Prediction of conversion to Alzheimer's disease with longitudinal measures and time-to-event data, J. Alzheimer's Dis. 58 (2017), pp. 361–371.
  • 10. Li S., Wu Q., and Sun J., Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease, Stat. Methods Med. Res. 29 (2020), pp. 2151–2166.
  • 11. Liu X. and Zeng D., Variable selection in semiparametric transformation models for right-censored data, Biometrika 100 (2013), pp. 859–876.
  • 12. Lv J. and Fan Y., A unified approach to model selection and sparse recovery using regularized least squares, Ann. Stat. 37 (2009), pp. 3498–3528.
  • 13. Ma L., Hu T., and Sun J., Sieve maximum likelihood regression analysis of dependent current status data, Biometrika 102 (2015), pp. 731–738.
  • 14. Nelsen R.B., An Introduction to Copulas, 2nd ed., Springer, New York, 2006.
  • 15. Scolas S., El Ghouch A., Legrand C., and Oulhaj A., Variable selection in a flexible parametric mixture cure model with interval-censored data, Stat. Med. 35 (2016), pp. 1210–1225.
  • 16. Shi Y., Cao Y., Jiao Y., and Liu Y., SICA for Cox's proportional hazards model with a diverging number of parameters, Acta Math. Sin. Engl. Ser. 30 (2014), pp. 887–902.
  • 17. Sun J., The Statistical Analysis of Interval-censored Failure Time Data, Springer, New York, 2006.
  • 18. Sun L., Li S., Wang L., and Song X., Variable selection in semiparametric nonmixture cure model with interval-censored failure time data: Application to the prostate cancer screening study, Stat. Med. 38 (2019), pp. 3026–3039.
  • 19. Titman A.C., A pool-adjacent-violators type algorithm for non-parametric estimation of current status data with dependent censoring, Lifetime Data Anal. 20 (2014), pp. 444–458.
  • 20. Tibshirani R., Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B 58 (1996), pp. 267–288.
  • 21. Tibshirani R., The Lasso method for variable selection in the Cox model, Stat. Med. 16 (1997), pp. 385–395.
  • 22. Wang J. and Ghosh S.K., Shape restricted nonparametric regression with Bernstein polynomials, Comput. Stat. Data Anal. 56 (2012), pp. 2729–2741.
  • 23. Wu Y. and Cook R., Penalized regression for interval-censored times of disease progression: Selection of HLA markers in psoriatic arthritis, Biometrics 71 (2015), pp. 782–791.
  • 24. Xu D., Zhao S., Hu T., Yu M., and Sun J., Regression analysis of informative current status data with the semiparametric linear transformation model, J. Appl. Stat. 46 (2019), pp. 187–202.
  • 25. Zhang C.H., Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), pp. 894–942.
  • 26. Zhang H. and Lu W.B., Adaptive Lasso for Cox's proportional hazards model, Biometrika 94 (2007), pp. 691–703.
  • 27. Zhang Z., Sun J., and Sun L., Statistical analysis of current status data with informative observation times, Stat. Med. 24 (2005), pp. 1399–1407.
  • 28. Zhao S., Hu T., Ma L., Wang P., and Sun J., Regression analysis of informative current status data with the additive hazards model, Lifetime Data Anal. 21 (2015), pp. 241–258.
  • 29. Zhao H., Wu Q., Li G., and Sun J., Simultaneous estimation and variable selection for interval censored data with broken adaptive ridge regression, J. Am. Stat. Assoc. 115 (2020), pp. 204–216.
  • 30. Zheng M. and Klein J.P., Estimates of marginal survival for dependent competing risk based on an assumed copula, Biometrika 82 (1995), pp. 127–138.
  • 31. Zhou Q., Hu T., and Sun J., A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data, J. Am. Stat. Assoc. 112 (2017), pp. 664–672.
  • 32. Zou H., The adaptive Lasso and its oracle properties, J. Am. Stat. Assoc. 101 (2006), pp. 1418–1429.
