Abstract
This paper discusses regression analysis of failure time data arising from case-cohort periodic follow-up studies. One feature of such data that makes their analysis much more difficult is that they are usually interval-censored rather than right-censored. Although some methods have been developed for general failure time data, there does not seem to be an established procedure for the situation considered here. To address the problem, we present a semiparametric regularized procedure and develop a simple algorithm for its implementation. In addition, unlike some existing procedures for similar situations, the proposed procedure is shown to have the oracle property. An extensive simulation study is conducted and suggests that the presented approach works well for practical situations. The method is applied to an HIV vaccine trial that motivated this study.
Keywords: Interval censoring, Penalized maximum likelihood estimation, Proportional hazards model, Sieve approach, Variable selection
1. Introduction
The case-cohort design was proposed by Prentice (1986) and is commonly used as a means of cost reduction in collecting or measuring expensive covariates in large cohort studies. Among others, one area where the design is often used is epidemiological cohort studies, in which the outcomes of interest are often times to failure events such as AIDS, cancer, heart disease and HIV infection. Under the design, complete covariate information is obtained only for a random sub-cohort of the study subjects plus all of the cases, that is, the subjects who experience the event. In addition to the incompleteness of the covariate information, another feature of the failure time data arising from case-cohort periodic follow-up studies is that the observations are usually interval-censored rather than right-censored (Sun 2006). By interval-censored data, we mean that the failure time of interest is known or observed only to belong to an interval instead of being observed exactly. Most medical follow-up studies, such as clinical trials, yield such data, and interval-censored data include right-censored data as a special case. One real study that motivated this investigation is the HVTN 505 Trial, conducted to assess the efficacy of a DNA prime-recombinant adenovirus type 5 boost (DNA/rAd5) vaccine to prevent human immunodeficiency virus type 1 (HIV-1) infection as well as to explore or identify various biomarkers that are significantly related to HIV-1 infection. More details about the trial are given below. In the following, we will discuss regression analysis of such failure time data with the focus on variable or covariate selection.
The identification of relevant predictors, or variable selection, is a common problem in statistical analysis, and many methods have been developed for it, especially in the context of linear regression, such as forward selection, backward selection and best subset selection. Among them, the regularized or penalized estimation procedure, which optimizes an objective function with a penalty function, has recently become increasingly popular, and in particular, many penalty functions have been proposed. For example, one of the early works was that of Tibshirani (1996), who proposed the least absolute shrinkage and selection operator (LASSO) penalty for linear regression models, and Fan and Li (2001) developed the smoothly clipped absolute deviation (SCAD) penalty. Also Zou (2006) generalized the LASSO penalty to the adaptive LASSO (ALASSO) penalty, and Lv and Fan (2009) and Dicker et al. (2013) proposed the smooth integration of counting and absolute deviation (SICA) penalty and the seamless-L0 (SELO) penalty, respectively.
Many authors have investigated the variable or covariate selection problem in the failure time analysis context too (Cai et al. 2009; Fan and Li 2002; Huang and Ma 2010; Martinussen and Scheike 2009). For example, Tibshirani (1997), Fan and Li (2002) and Zhang and Lu (2007) discussed the generalizations of the LASSO, SCAD and ALASSO penalty-based procedures, respectively, to failure time data arising from the proportional hazards (PH) model, the most commonly used regression model for failure time data. More recently, Shi et al. (2014) extended the SICA penalty-based procedure to the same situation. In addition, there exist some methods in the literature for failure time data arising from other regression models such as the additive hazards model (Martinussen and Scheike 2009; Lin and Lv 2013) and the accelerated failure time model (Cai et al. 2009; Huang and Ma 2010). However, most of the existing methods for failure time data apply only to right-censored data, with the exception of the two parametric procedures given by Scolas et al. (2016) and Wu and Cook (2015), who considered interval-censored data arising from the PH model. In particular, the latter assumed that the baseline hazard function is a piecewise constant function, and one drawback of this is that the piecewise constant function is neither continuous nor differentiable. More importantly, there is no theoretical justification available for either procedure.
There exists a great deal of literature on the analysis of failure time data from the case-cohort design, but almost all of it is on right-censored data, except Zhou et al. (2017b), who discussed regression analysis of interval-censored data under the PH model. In particular, Ni et al. (2016) and Ni and Cai (2017) discussed the variable selection problem, but also with right-censored data. Although the number of covariates is often very large in these situations, there does not seem to exist an established procedure specifically developed for variable or covariate selection in the presence of interval censoring. Note that one main difference between right-censored and interval-censored data is that the latter have a much more complex structure. One way to see this is to note that for inference about the PH model based on right-censored data, a partial likelihood function can be derived and is commonly used. In particular, since it is free of the baseline hazard function, it also serves as the parametric objective function for all of the existing penalized variable selection procedures. In contrast, for interval-censored data, no similar partial likelihood function or parametric objective function is available for the development of a penalized procedure. Nevertheless, in the following, we will overcome this difficulty by using the sieve approach and develop a sieve penalized estimation procedure with the use of Bernstein polynomials.
To present the proposed variable selection approach, we will first, in Section 2, introduce some notation and assumptions that will be used throughout the paper and then describe the model and the basic ideas behind the method. In particular, a sieve space based on Bernstein polynomials is defined that naturally results in a parametric objective function for the variable selection procedure to be developed. The proposed sieve penalized estimation approach is given in Section 3, and for the implementation of the procedure, a coordinate descent algorithm is described, which is faster and more easily implemented than the EM algorithm given in Wu and Cook (2015). In addition, the asymptotic properties of the proposed procedure, including the oracle property, are established. The method applies to interval-censored data both in general and from case-cohort studies. Section 4 presents some results obtained from an extensive simulation study conducted for the assessment of the proposed method, and they suggest that the method works well for practical situations. In Section 5, the method is applied to the motivating HIV vaccine trial described above, and Section 6 contains some discussion and remarks. As will be seen, the proposed method is also valid for interval-censored data arising from regular failure time studies.
2. Notation, Assumptions and Review
Consider a cohort study that consists of n independent subjects and let Ti and Xi denote the failure time of interest and a p-dimensional vector of covariates associated with subject i, i = 1, … , n. To describe the covariate effect, we will assume that given Xi, Ti follows the PH model given by
$$\lambda(t \mid X_i) = \lambda_0(t) \exp(X_i'\beta), \qquad (1)$$
where λ0(t) denotes an unknown baseline hazard function and β is a vector of regression parameters. In the following, it will be supposed that the main goal is inference about β, with the focus on covariate selection or the identification of important covariates, such as the various biomarkers in the HVTN 505 Trial described above.
For each failure time Ti, we will assume that only an interval-censored observation is available and given by (Li, Ri, Δ1i, Δ2i) with Li ≤ Ri, where Li and Ri denote two random observation times, Δ1i = I(Ti ≤ Li) and Δ2i = I(Li < Ti ≤ Ri). It is apparent that Δ1i = 1 and Δ1i = Δ2i = 0 mean that one has the left- and right-censored observation on Ti, respectively. Under the case-cohort design, as described above, the covariates Xi’s are available only for the subjects from a sub-cohort and for those who have experienced the failure event of interest, meaning Δ1i = 1 or Δ2i = 1. Define δi = 1 if the covariate Xi is available or observed and 0 otherwise, i = 1, … , n. Then the observed data have the form
$$O = \{\, O_i = (L_i, R_i, \Delta_{1i}, \Delta_{2i}, \delta_i, \delta_i X_i),\ i = 1, \ldots, n \,\}.$$

If δi = 1 for all i, we have regular interval-censored data.
For estimation of the regression parameter β, as mentioned above, a partial likelihood function is available when one has right-censored data, but for the case of interval-censored data, one usually has to deal with or estimate both β and the baseline cumulative hazard function Λ0 together, which is clearly much more complicated. For this, one approach is to approximate Λ0 by some parametric functions such as Bernstein polynomials (Ma et al. 2015; Zhou et al. 2017a). More specifically, let Θ = B ⊗ ℳ denote the parameter space of θ = (β, Λ0), where B = {β ∈ ℝ^p : ∥β∥ ≤ M} with M being a positive constant and ℳ is the collection of all bounded, continuous, nondecreasing and nonnegative functions over the interval [c, u]. Here c and u are usually taken to be min(Li) and max(Ri), respectively. Also define the sieve space Θn = B ⊗ ℳn, where

$$\mathcal{M}_n = \Big\{\, \Lambda_{0n}(t) = \sum_{k=0}^{m} \phi_k \, B_k(t, m, c, u) : \ \sum_{k=0}^{m} |\phi_k| \le M_n, \ 0 \le \phi_0 \le \phi_1 \le \cdots \le \phi_m \,\Big\},$$

with Bk(t, m, c, u) denoting the Bernstein basis polynomial defined as

$$B_k(t, m, c, u) = \binom{m}{k} \Big( \frac{t - c}{u - c} \Big)^{k} \Big( 1 - \frac{t - c}{u - c} \Big)^{m - k}, \qquad k = 0, \ldots, m.$$

In the above, we define 0^0 = 1 as usual, Mn is a positive constant, and m, the degree of the Bernstein polynomials, satisfies m = o(n^ν) for some ν ∈ (0, 1). More discussion about m will be given below.
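To make the approximation concrete, the following R sketch (our illustration, not the authors' code; all function and argument names are ours) evaluates the Bernstein basis and the induced approximation Λ0n(t) = Σk ϕk Bk(t, m, c, u) for a given coefficient vector ϕ = (ϕ0, …, ϕm)′:

```r
# Bernstein basis polynomial B_k(t, m, c, u) on [c, u]; note that R's `^`
# already uses the convention 0^0 = 1.
bernstein_basis <- function(t, k, m, c, u) {
  s <- (t - c) / (u - c)                  # rescale t to [0, 1]
  choose(m, k) * s^k * (1 - s)^(m - k)
}

# Approximate cumulative baseline hazard Lambda_{0n}(t) = sum_k phi_k B_k(t, m, c, u),
# where phi is nonnegative and nondecreasing so that Lambda_{0n} is a valid
# cumulative hazard on [c, u].
Lambda0n <- function(t, phi, c, u) {
  m <- length(phi) - 1
  basis <- vapply(0:m, function(k) bernstein_basis(t, k, m, c, u),
                  numeric(length(t)))
  drop(basis %*% phi)
}
```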
Instead of the Bernstein polynomials, of course, one can employ other functions such as the piecewise constant functions used in Wu and Cook (2015). It is worth noting that although the two approximations may look similar, they are actually quite different. One major difference is that unlike the piecewise constant function, the Bernstein polynomial approximation is continuous and has some nice properties. More specifically, Bernstein polynomials have the optimal shape-preserving property among all approximation polynomials (Zhou et al., 2017a) and can naturally model the non-negativity and monotonicity of Λ0 with some simple restrictions that can be easily removed through reparameterization. In addition, they are easier to work with since they do not require the specification of interior knots. In consequence, as will be seen below, the various resulting functions have relatively simple forms and the resulting method can be easily implemented. In particular, a much faster and simpler algorithm can be developed for the implementation of the method than the EM algorithm developed in Wu and Cook (2015). The approximation also allows the establishment of the asymptotic properties of the resulting method, including the oracle property. In addition, the method developed below provides a better and more natural way than that in Wu and Cook (2015) for the estimation of a survival function, which is often of interest in medical studies among others.
For the selection of the sub-cohort, we will assume independent Bernoulli sampling with the selection probability ρ ∈ (0, 1). It follows that we have P(δi = 1) = Δ1i + Δ2i + (1 − Δ1i − Δ2i)ρ. Then under the independent censoring assumption, for estimation, Zhou et al. (2017b) suggested maximizing the inverse probability weighted log-likelihood function
$$l_n(\theta) = l_n(\beta, \Lambda_{0n}) = \sum_{i=1}^{n} \pi_i \log L_i(\beta, \Lambda_{0n}) \qquad (2)$$
over the sieve space Θn. Here πi = δi/{Δ1i + Δ2i + (1 − Δ1i − Δ2i)ρ},
$$L_i(\beta, \Lambda_{0n}) = \Big[ 1 - e^{-\Lambda_{0n}(L_i)\exp(X_i'\beta)} \Big]^{\Delta_{1i}} \Big[ e^{-\Lambda_{0n}(L_i)\exp(X_i'\beta)} - e^{-\Lambda_{0n}(R_i)\exp(X_i'\beta)} \Big]^{\Delta_{2i}} \Big[ e^{-\Lambda_{0n}(R_i)\exp(X_i'\beta)} \Big]^{1 - \Delta_{1i} - \Delta_{2i}}, \qquad (3)$$
and α = (α0, … , αm)′ with ϕ0 = e^α0 and ϕk = e^α0 + e^α1 + ⋯ + e^αk, 1 ≤ k ≤ m, the reparameterization of the parameters ϕj that removes the constraint 0 ≤ ϕ0 ≤ ϕ1 ≤ ⋯ ≤ ϕm. To use ln(θ), by following Zhou et al. (2017b), we will assume that ρ is known. Note that for the case of regular studies or interval-censored data where all covariates are observed, we have δi = πi = 1 for all i, and ln(θ) reduces to Σi log Li(β, Λ0n), the log-likelihood function based on the observed interval-censored data.
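As an illustration, a minimal R sketch of ln(β, Λ0n) in (2)–(3) is given below, under the reconstruction above and the cumulative-sum reparameterization ϕk = e^α0 + ⋯ + e^αk; it reuses Lambda0n() from the previous sketch, and all names are illustrative rather than the authors' code.

```r
# Inverse probability weighted log-likelihood l_n(beta, alpha) for
# case-cohort interval-censored data; delta indicates observed covariates.
loglik_ipw <- function(beta, alpha, X, L, R, d1, d2, delta, rho, c, u) {
  phi  <- cumsum(exp(alpha))                        # phi_k = exp(a_0) + ... + exp(a_k)
  pi_w <- delta / (d1 + d2 + (1 - d1 - d2) * rho)   # weights pi_i
  eta  <- exp(drop(X %*% beta))                     # exp(X_i' beta)
  SL   <- exp(-Lambda0n(L, phi, c, u) * eta)        # S(L_i | X_i)
  SR   <- exp(-Lambda0n(R, phi, c, u) * eta)        # S(R_i | X_i)
  li   <- ifelse(d1 == 1, log(1 - SL),              # left-censored:  T_i <= L_i
          ifelse(d2 == 1, log(SL - SR),             # interval:       L_i < T_i <= R_i
                 log(SR)))                          # right-censored: T_i > R_i
  sum(pi_w * li)
}
```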
3. Sieve Penalized Variable Selection Procedure
Now we discuss the variable or covariate selection as well as estimation of the parameters β and Λ0. For this, we propose to estimate β and Λ0 by the sieve penalized inverse probability weighted estimator (SPIPWE) defined as the values of θ that maximize
$$Q_n(\theta) = Q_n(\beta, \Lambda_{0n}) = l_n(\beta, \Lambda_{0n}) - n \sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \qquad (4)$$
over Θn, where pλ denotes a penalty function that depends on a tuning parameter λ > 0.
To determine the estimator defined above, in the following, we will employ several commonly used penalty functions. In particular, we will consider the LASSO and ALASSO penalty functions, with the latter defined as pλ(βj) = λ wj ∣βj∣, where wj is a weight for βj. By following the suggestion of Zou (2006), we will set the weights as wj = 1/∣β̃j∣, where β̃j is the unpenalized estimator of βj obtained by maximizing ln(β, Λ0n). By setting wj = 1 for all j, the function pλ(βj) above reduces to the LASSO penalty function. Another penalty function to be investigated here is the SCAD penalty function, which has the form
$$p_{\lambda}(\beta_j) = \lambda |\beta_j| \, I(|\beta_j| \le \lambda) + \frac{2a\lambda|\beta_j| - \beta_j^2 - \lambda^2}{2(a-1)} \, I(\lambda < |\beta_j| \le a\lambda) + \frac{(a+1)\lambda^2}{2} \, I(|\beta_j| > a\lambda), \qquad (5)$$
where the constant a is set to be 3.7 to follow the suggestion of Fan and Li (2001). In addition, we will study the SICA and SELO penalty functions. The former is defined as
$$p_{\lambda}(\beta_j) = \lambda \, \frac{(\gamma_1 + 1)\,|\beta_j|}{\gamma_1 + |\beta_j|}, \qquad (6)$$
with γ1 > 0 being a shape parameter, while the latter has the form
$$p_{\lambda}(\beta_j) = \frac{\lambda}{\log 2} \, \log\!\Big( \frac{|\beta_j|}{|\beta_j| + \gamma_2} + 1 \Big), \qquad (7)$$
with γ2 > 0 being another tuning parameter besides λ. In the numerical studies below, we will set γ1 = γ2 = 0.01 (Dicker et al. 2013).
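For concreteness, the five penalty functions can be coded as below; this is a sketch based on the standard forms in the papers cited above, not the authors' implementation.

```r
pen_lasso  <- function(b, lambda) lambda * abs(b)
pen_alasso <- function(b, lambda, w) lambda * w * abs(b)   # w_j = 1 / |tilde beta_j|
pen_scad   <- function(b, lambda, a = 3.7) {               # Fan and Li (2001)
  b <- abs(b)
  ifelse(b <= lambda, lambda * b,
  ifelse(b <= a * lambda, (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
         (a + 1) * lambda^2 / 2))
}
pen_sica <- function(b, lambda, gamma1 = 0.01)             # Lv and Fan (2009)
  lambda * (gamma1 + 1) * abs(b) / (gamma1 + abs(b))
pen_selo <- function(b, lambda, gamma2 = 0.01)             # Dicker et al. (2013)
  (lambda / log(2)) * log(abs(b) / (abs(b) + gamma2) + 1)
```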
To maximize Qn(θ) over Θn, or equivalently to obtain the SPIPWE (β̂n, Λ̂0n), we propose an iterative procedure that estimates β and α alternately. In particular, we will use the Nelder-Mead simplex algorithm to update the estimator of α given the current estimator of β, and then update the estimator of β by employing the coordinate descent algorithm while fixing α (Fu 1998; Fan and Lv 2011; Lin and Lv 2013). The specific steps can be described as follows.
Step 1. Choose the initial values for both β and α.
Step 2. Given the current estimate of β, update the estimate of α by using the Nelder-Mead simplex algorithm.
Step 3. Given the current estimate of α, update the estimate of β by using the coordinate descent algorithm. In particular, update each element of β by maximizing Qn(β, Λ0n) while holding the other elements of β fixed.
Step 4. Repeat steps 2 and 3 until convergence.
The main idea behind the coordinate descent algorithm described above is to conduct the univariate maximization for each element of β repeatedly, and for each univariate maximization, one can use the golden-section search algorithm (Kiefer 1953). To check convergence, a common way, which is used below, is to monitor the sum of the absolute differences between the current and updated estimates of the components of both β and α. Note that algorithms similar to the one described above have been developed and shown to have good convergence properties in the literature (Fu, 1998; Fan and Lv, 2011; Lin and Lv, 2013). The theorem given below will show that the proposed estimator exists, and the algorithm converged in our experience. Also in the numerical studies reported below, the algorithm seems to work well and we did not have any convergence issues with this criterion. For covariate selection, at convergence, we set the estimates of the components of β whose absolute values are less than a pre-specified threshold to zero. For the numerical studies reported below, we used the threshold of 10^-6 by following Wang et al. (2007) and 0 as the initial values for both β and α by following Fan and Lv (2011) and Lin and Lv (2013). Also for the numerical studies below, we implement the Nelder-Mead simplex algorithm by using the R function optim and employ the R function optimize for the implementation of the golden-section search algorithm.
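A schematic R version of Steps 1–4 is sketched below. It reuses the hypothetical loglik_ipw() and penalty functions from the earlier sketches, takes the data as a named list, and searches each coordinate over an arbitrary interval (−5, 5); it illustrates the flow of the algorithm only and is not the authors' program.

```r
# Penalized objective Q_n(beta, alpha) in (4); `pen` is one of the penalty sketches.
Qn <- function(beta, alpha, pen, lambda, data) {
  with(data, loglik_ipw(beta, alpha, X, L, R, d1, d2, delta, rho, c, u)) -
    nrow(data$X) * sum(pen(beta, lambda))
}

fit_spipwe <- function(data, pen, lambda, m, tol = 1e-4, maxit = 100) {
  p <- ncol(data$X)
  beta <- rep(0, p); alpha <- rep(0, m + 1)          # Step 1: zero initial values
  for (it in seq_len(maxit)) {
    old <- c(beta, alpha)
    # Step 2: update alpha by Nelder-Mead (fnscale = -1 makes optim maximize)
    alpha <- optim(alpha, function(a) Qn(beta, a, pen, lambda, data),
                   method = "Nelder-Mead", control = list(fnscale = -1))$par
    # Step 3: coordinate descent on beta, one golden-section search per component
    for (j in seq_len(p)) {
      beta[j] <- optimize(function(bj) { b <- beta; b[j] <- bj
                                         Qn(b, alpha, pen, lambda, data) },
                          interval = c(-5, 5), maximum = TRUE)$maximum
    }
    # Step 4: stop when the summed absolute change in (beta, alpha) is small
    if (sum(abs(c(beta, alpha) - old)) < tol) break
  }
  beta[abs(beta) < 1e-6] <- 0                        # threshold tiny estimates to zero
  list(beta = beta, alpha = alpha)
}
```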
To establish the asymptotic properties of the proposed estimators β̂n and Λ̂0n, we will focus on the ALASSO penalty function; some comments on this will be given below. Let θ0 = (β0, Λ00) denote the true value of θ = (β, Λ0) and assume that β0 = (β10′, β20′)′ = (β10′, 0′)′, where β10 consists of the s nonzero elements of β0 with s < p. For a vector v, let ∥v∥ denote its Euclidean norm and define the supremum norm ∥f∥∞ = sup_t ∣f(t)∣ for a function f. Also define the L2 norm ∥Λ1 − Λ2∥2 through ∥Λ1 − Λ2∥2² = ∫ {[Λ1(l) − Λ2(l)]² + [Λ1(r) − Λ2(r)]²} dF(l, r), where F(l, r) denotes the joint distribution function of the Li's and Ri's. Furthermore write λn = λ for sample size n and define the distance between θ1 = (β1, Λ1) ∈ Θ and θ2 = (β2, Λ2) ∈ Θ as

$$d(\theta_1, \theta_2) = \big\{ \|\beta_1 - \beta_2\|^2 + \|\Lambda_1 - \Lambda_2\|_2^2 \big\}^{1/2}.$$
The asymptotic properties of θ̂n = (β̂n, Λ̂0n) described below are with respect to n → ∞.

Theorem 1. Assume that Conditions (I)–(IV) given in the Appendix hold and λn → 0 as n → ∞. Then with probability tending to one, there exists a local maximizer θ̂n = (β̂n, Λ̂0n) of Qn(θ) over Θn such that

$$d(\hat{\theta}_n, \theta_0) \rightarrow 0 \quad \text{almost surely}$$

and

$$d(\hat{\theta}_n, \theta_0) = O_p\big( n^{-(1-\nu)/2} + n^{-r\nu/2} \big),$$

where ν ∈ (0, 1) is such that m = o(n^ν) and r is defined in Condition (IV).
Theorem 2. Assume that Conditions (I)–(IV) given in the Appendix hold with r > 2 and ν > 1/(2r). Also assume that √n λn → 0 and nλn → ∞. Then with probability tending to one, the maximizer θ̂n = (β̂n, Λ̂0n) in Theorem 1, with β̂n = (β̂1n′, β̂2n′)′, satisfies

(Sparsity) β̂2n = 0, and

(Asymptotic normality) √n (β̂1n − β10) → N(0, Σ) in distribution, where Σ is given in the Appendix.
The proofs of the results given above are sketched in the Appendix. The theorems tell us that the proposed estimator is consistent and that the covariate selection procedure has the oracle property (Fan and Li 2001). Also it will be seen below that β̂1n is efficient. Note that based on Theorem 1, the choice of ν = 1/(1 + r) yields the optimal rate of convergence n^{r/(2(1+r))}, which equals n^{1/3} when r = 2 and improves as r increases. Also note that the main idea behind the proof is similar to that for the corresponding results in other situations (Fan and Li, 2001; Zhang and Lu, 2007), but there exist some new challenges. One is that, unlike the case of right-censored data, the baseline function cannot be cancelled and thus has to be dealt with together with the regression parameters. Another is that although the penalized log-likelihood function is still concave, the traditional Taylor expansion-based approach such as that used in Zhang and Lu (2007) cannot be used here; instead, we adopt the empirical process theory to prove the strong consistency of the proposed estimator and derive its rate of convergence.
To implement the sieve penalized estimation procedure described above, it is apparent that we need to choose the tuning parameter λ as well as the degree m of the Bernstein polynomials. For this, we propose to use C-fold cross-validation. Specifically, let C be an integer and suppose that the observed data can be divided into C non-overlapping parts of approximately the same size. Also let ln^(c)(β, α) denote the inverse probability weighted log-likelihood function based on the cth part of the whole data set, and β̂^(−c) and α̂^(−c) the proposed SPIPWE of β and α, respectively, obtained based on the whole data without the cth part. For given λ and m, the cross-validation statistic can be defined as
$$CV(\lambda, m) = \sum_{c=1}^{C} l_n^{(c)}\big( \hat{\beta}^{(-c)}, \hat{\alpha}^{(-c)} \big), \qquad (8)$$
and one can choose the values of λ and m that maximize CV(λ, m). In practice, one may perform a grid search over possible ranges of λ and m and, furthermore, may fix m to be the closest integer to n^0.25 to focus on the selection of λ. For the inference about β10, one needs to estimate Σ in Theorem 2, and for this, one way is to employ the observed Fisher information matrix. An alternative, which is much simpler and will be used in the application below, is to apply the bootstrap procedure; more comments on this are given below.
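The cross-validation step in (8) can be sketched as follows, reusing the hypothetical fit_spipwe() and loglik_ipw() above; subset_data() is an assumed helper that extracts the indicated subjects from the data list.

```r
# C-fold cross-validation over a grid of lambda values for a fixed degree m.
cv_select <- function(data, pen, lambda_grid, m, C = 5) {
  n <- nrow(data$X)
  fold <- sample(rep(seq_len(C), length.out = n))    # random fold assignment
  cv <- sapply(lambda_grid, function(lambda) {
    sum(sapply(seq_len(C), function(k) {
      train <- subset_data(data, fold != k)          # assumed helper (hypothetical)
      test  <- subset_data(data, fold == k)
      fit <- fit_spipwe(train, pen, lambda, m)
      with(test, loglik_ipw(fit$beta, fit$alpha, X, L, R, d1, d2, delta, rho, c, u))
    }))
  })
  lambda_grid[which.max(cv)]                         # maximizer of CV(lambda, m)
}
```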
4. A Simulation Study
An extensive simulation study was conducted to assess the performance of the sieve penalized estimation procedure proposed in the previous sections. In the study, we considered the situation where there exist p = 100 covariates, with the first and last five elements of β set to 0.5 and the other elements set to 0. Furthermore the covariate vectors Xi were generated independently from MVNp(0, Ψ), where the (j, k) element of Ψ is Ψj,k = 0.5^∣j−k∣. Given the true failure time, by following Zhou et al. (2017b), the interval-censored data {Li, Ri, Δ1i, Δ2i : i = 1, … , n} were generated as follows. First we generated the total number of examination time points for each subject from the zero-truncated Poisson distribution with mean μ, and given the number of examination time points, the examination times were generated from the uniform distribution over (0, τ) with τ = 1. Then for subject i, if the failure had already occurred before the first examination time, we defined Ri = τ, Li to be the first examination time, and (Δ1i, Δ2i) = (1, 0); if the failure had not yet occurred at the last examination time, we defined Ri to be the last examination time, Li = 0, and (Δ1i, Δ2i) = (0, 0). Otherwise, we set (Δ1i, Δ2i) = (0, 1) and defined Li and Ri to be the largest examination time point before the true Ti and the smallest examination time point after Ti, respectively.
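The censoring mechanism just described can be coded as in the following sketch (our illustration; details not stated in the text, such as how ties with examination times are handled, are assumptions).

```r
# Generate one interval-censored observation (L, R, Delta1, Delta2) from a
# true failure time Ti, with examination times uniform on (0, tau).
gen_ic_obs <- function(Ti, mu = 10, tau = 1) {
  K <- 0
  while (K == 0) K <- rpois(1, mu)                  # zero-truncated Poisson
  exams <- sort(runif(K, 0, tau))
  if (Ti <= exams[1]) {                             # left-censored
    c(L = exams[1], R = tau, D1 = 1, D2 = 0)
  } else if (Ti > exams[K]) {                       # right-censored
    c(L = 0, R = exams[K], D1 = 0, D2 = 0)
  } else {                                          # interval-censored
    c(L = max(exams[exams < Ti]), R = min(exams[exams >= Ti]), D1 = 0, D2 = 1)
  }
}
```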
In the study, we considered the interval-censored data arising from both case-cohort studies and the general situation where all covariates are observed. For the former, we generated the failure times Ti from the exponential distribution with the hazard function λ(t∣Xi; β) = η exp(Xi′β). By setting η = 0.075 or 0.008, the right-censored percentage is approximately 80% or 95%, respectively. To generate the sub-cohort, by following Zhou et al. (2017b), we employed independent Bernoulli sampling with the selection probability ρ = 0.2. Table 1 presents the covariate selection results given by the proposed approach based on 100 replications with n = 1400 or 2000 and μ = 10. In the table, MMSE and SD represent the median and the standard deviation, respectively, of the MSE = (β̂ − β0)′ ΣX (β̂ − β0) over the 100 data sets, where ΣX denotes the covariance matrix of the covariates. The quantity TP denotes the averaged number of correctly selected covariates whose true coefficients are not 0, and FP the averaged number of incorrectly selected covariates whose true coefficients are 0. Here we considered all five penalty functions LASSO, ALASSO, SCAD, SICA and SELO described above, set m to be the closest integer to n^0.25, and used 5-fold cross-validation for the selection of λ based on the grid search. One can see that the proposed variable selection procedure with all five penalty functions seems to provide good and consistent performance, especially for the case with the 80% right-censored rate. Also as expected, the performance improved as the full cohort size n increased.
Table 1:
Results on the covariate selection based on case-cohort interval-censored data
| Penalty | MMSE (SD) | TP(10) | FP(90) | MMSE (SD) | TP(10) | FP(90) |
|---|---|---|---|---|---|---|
| n = 1400 | 80% right-censored | | | 95% right-censored | | |
| LASSO | 0.385 (0.094) | 10 | 13.11 | 0.620 (0.199) | 9.71 | 17.14 |
| ALASSO | 0.408 (0.134) | 9.89 | 1.11 | 0.806 (0.607) | 8.58 | 2.06 |
| SCAD | 0.247 (0.158) | 9.48 | 1.36 | 2.819 (2.771) | 6.48 | 9.62 |
| SICA | 0.154 (0.125) | 9.69 | 0.95 | 1.116 (0.508) | 6.62 | 1.6 |
| SELO | 0.152 (0.133) | 9.55 | 0.61 | 1.144 (0.404) | 6.47 | 1.33 |
| Oracle | 0.093 (0.053) | 10 | 0 | 0.344 (0.263) | 10 | 0 |
| n = 2000 | 80% right-censored | | | 95% right-censored | | |
| LASSO | 0.331 (0.084) | 10 | 10.49 | 0.455 (0.140) | 9.88 | 13.11 |
| ALASSO | 0.289 (0.096) | 9.99 | 0.48 | 0.441 (0.332) | 9.66 | 2.45 |
| SCAD | 0.114 (0.085) | 9.88 | 0.99 | 1.586 (0.814) | 7.19 | 7.21 |
| SICA | 0.081 (0.074) | 9.86 | 0.31 | 0.769 (0.311) | 7.57 | 1.69 |
| SELO | 0.082 (0.079) | 9.83 | 0.27 | 0.761 (0.327) | 7.8 | 1.95 |
| Oracle | 0.062 (0.039) | 10 | 0 | 0.226 (0.167) | 10 | 0 |
Simulation results with μ = 10, 100 continuous covariates. The selection probability for the sub-cohort is ρ = 0.2. MMSE (SD): median (standard deviation) of the MSEs of 100 replications. TP: averaged true positive selection rate. FP: averaged false positive selection rate.
Table 2 gives the covariate selection results obtained by the use of the proposed method for the case of general interval-censored data with n = 500, p = 100, μ = 10 or 20, and 100 replications. Here the covariates, the true failure times and the censoring intervals were generated in the same way as above except that for the true failure time, we took η = log(20), which gave a percentage of right-censored observations of about 30%. In the table, as above, we considered the same quantities MMSE, SD, TP and FP and the same five penalty functions LASSO, ALASSO, SCAD, SICA and SELO, set m = 5, the closest integer to n^0.25, and used 5-fold cross-validation for the selection of λ based on the grid search. In addition, for comparison, we also implemented the method proposed by Wu and Cook (2015), referred to as the WC method in the table, with the use of the LASSO and ALASSO penalty functions, a piecewise constant baseline hazard with four pieces, and the program provided in that paper. One can see from Table 2 that the proposed method seems to perform well no matter which penalty function was used, especially in terms of the quantity TP, the measure of true positive selection. Also in terms of TP, the proposed and WC methods gave similar performance, but the proposed method gave much smaller FP. More comments on the comparison are given below.
Table 2:
Results on the covariate selection based on general interval-censored data with continuous covariates
| Penalty | Method | MMSE (SD) | TP | FP |
|---|---|---|---|---|
| μ = 10 | | | | |
| LASSO | WC | 0.2679 (0.1783) | 10 | 18.24 |
| | Proposed | 0.3994 (0.1821) | 10 | 11.29 |
| ALASSO | WC | 0.0549 (0.0576) | 10 | 0.12 |
| | Proposed | 0.1322 (0.0861) | 10 | 0 |
| SCAD | Proposed | 0.0689 (0.1132) | 9.93 | 0.58 |
| SICA | Proposed | 0.0555 (0.0583) | 9.94 | 0.13 |
| SELO | Proposed | 0.0602 (0.0697) | 9.94 | 0.06 |
| μ = 20 | | | | |
| LASSO | WC | 0.3118 (0.1059) | 10 | 12.32 |
| | Proposed | 0.3969 (0.1549) | 10 | 9.72 |
| ALASSO | WC | 0.0501 (0.0408) | 10 | 0.09 |
| | Proposed | 0.2234 (0.0953) | 10 | 0.01 |
| SCAD | Proposed | 0.0646 (0.1553) | 9.93 | 0.35 |
| SICA | Proposed | 0.0522 (0.0573) | 9.98 | 0.03 |
| SELO | Proposed | 0.0516 (0.0545) | 9.98 | 0.04 |
Simulation results with n = 500, p = 100 and continuous covariates. MMSE (SD): median (standard deviation) of the MSEs of 100 replications. TP: averaged true positive selection rate. FP: averaged false positive selection rate.
Note that in the set-ups above, all covariates were continuous, and it is apparent that in practice, we may have both continuous and discrete covariates. To investigate such situations, we considered the case with all set-ups being the same as for Table 2 except that only the first fifty covariates were generated as above; the other fifty covariates were generated as binary variables taking values 0 and 1 with mean 0.2 and correlation 0.5^∣j1−j2∣ among them, 51 ≤ j1 < j2 ≤ 100. Here the two types of covariates were supposed to be independent of each other, and the obtained results are presented in Table 3. One can see that overall they give similar conclusions and again suggest that the proposed method performed well and consistently with respect to all penalty functions. In addition, we investigated different values of m to assess the robustness of the proposed method with respect to m, and the results suggest that it is robust.
Table 3:
Results on the covariate selection based on general interval-censored data with both continuous and discrete covariates
| Penalty | Method | MMSE (SD) | TP | FP |
|---|---|---|---|---|
| μ = 10 | | | | |
| LASSO | WC | 0.6026 (0.2873) | 9.88 | 13.31 |
| | Proposed | 1.2907 (0.3721) | 9.32 | 10.73 |
| ALASSO | WC | 0.4163 (0.4055) | 8.12 | 0.48 |
| | Proposed | 0.3647 (0.2353) | 9.93 | 0.03 |
| SCAD | Proposed | 0.4638 (0.2615) | 8.07 | 1.28 |
| SICA | Proposed | 0.4087 (0.2628) | 8.73 | 0.55 |
| SELO | Proposed | 0.4088 (0.2571) | 8.65 | 0.59 |
| μ = 20 | | | | |
| LASSO | WC | 0.5818 (0.2307) | 9.92 | 10.04 |
| | Proposed | 1.2136 (0.3322) | 9.45 | 9.73 |
| ALASSO | WC | 0.3315 (0.4237) | 8.69 | 0.43 |
| | Proposed | 0.2851 (0.1933) | 9.96 | 0.03 |
| SCAD | Proposed | 0.4432 (0.3052) | 7.92 | 1.61 |
| SICA | Proposed | 0.3624 (0.2309) | 8.74 | 0.43 |
| SELO | Proposed | 0.3664 (0.2494) | 8.74 | 0.54 |
Simulation results with n = 500, p = 100 and mixed covariates. MMSE (SD): median (standard deviation) of the MSEs of 100 replications. TP: averaged true positive selection rate. FP: averaged false positive selection rate.
On the comparison between the proposed method and the WC method, as seen from Table 2, the former always gave much smaller FP than the latter. With respect to MMSE and TP, the two actually gave similar performance as sometimes one yielded smaller values but in other situations the other did. In addition, as expected, the proposed method was much faster than the WC method. For example, on average, it took 620 seconds for the WC method to finish one replication for the results given in Table 2, while the proposed method only needed 96 seconds. In the study, we also considered some other set-ups including different values for n, p and ρ and all results indicated that the proposed method performed well and either better than or similarly to the WC method on the covariate selection. Furthermore on the comparison, following a reviewer’s suggestion, we obtained the average of the estimates of the cumulative baseline hazard function given by the two methods for the situation considered in Table 2 and present them in Figure 1. It is apparent that the proposed method gave a more natural and reasonable estimate than that of Wu and Cook (2015).
Figure 1:
The estimates of the baseline cumulative hazard function.
To assess the performance of the bootstrap procedure, for the results given in Table 1 with n = 1400 and the right-censoring rate being 80%, we calculated the averages of the estimated standard errors given by the bootstrap procedure (ESE) and the sample standard errors (SSE), and present them in Table 4 for the first and last five estimated regression parameters. They suggest that the bootstrap procedure seems to perform well for all penalty functions except that it could underestimate the standard error for the case with the ALASSO function.
Table 4:
Simulation results on the assessment of the bootstrap procedure
| Penalty | | β1 | β2 | β3 | β4 | β5 | β96 | β97 | β98 | β99 | β100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LASSO | ESE | 0.0847 | 0.0939 | 0.0974 | 0.0966 | 0.0805 | 0.0813 | 0.0959 | 0.0953 | 0.0924 | 0.0815 |
| | SSE | 0.0989 | 0.1087 | 0.1160 | 0.1152 | 0.0916 | 0.0992 | 0.1014 | 0.0892 | 0.1074 | 0.1038 |
| ALASSO | ESE | 0.0693 | 0.0824 | 0.0882 | 0.0887 | 0.0645 | 0.0662 | 0.0885 | 0.0887 | 0.0864 | 0.0681 |
| | SSE | 0.1390 | 0.1383 | 0.1560 | 0.1311 | 0.1312 | 0.1320 | 0.1257 | 0.1146 | 0.1344 | 0.1501 |
| SCAD | ESE | 0.1812 | 0.1938 | 0.1955 | 0.2000 | 0.1580 | 0.1737 | 0.1947 | 0.1929 | 0.1970 | 0.1658 |
| | SSE | 0.1208 | 0.1492 | 0.1878 | 0.1999 | 0.1845 | 0.1209 | 0.1634 | 0.1450 | 0.1788 | 0.2220 |
| SELO | ESE | 0.1632 | 0.1824 | 0.1850 | 0.1939 | 0.1423 | 0.1555 | 0.1862 | 0.1829 | 0.1891 | 0.1501 |
| | SSE | 0.1467 | 0.1798 | 0.1936 | 0.1956 | 0.1470 | 0.1233 | 0.1603 | 0.1457 | 0.2019 | 0.2105 |
| SICA | ESE | 0.1623 | 0.1824 | 0.1864 | 0.1918 | 0.1475 | 0.1573 | 0.1864 | 0.1829 | 0.1853 | 0.1504 |
| | SSE | 0.1384 | 0.1748 | 0.1740 | 0.1741 | 0.1489 | 0.1177 | 0.1434 | 0.1309 | 0.1678 | 0.1756 |
5. Analysis of the HVTN 505 Trial for HIV-1 Infection
The HVTN 505 Trial is a randomized, multi-site clinical trial in men and transgender women who have sex with men, conducted to assess the efficacy of the DNA/rAd5 vaccine for the prevention of HIV-1 infection (Fong et al. 2018; Hammer et al. 2013; Janes et al. 2017). In the study, each subject was randomly assigned to receive either the DNA/rAd5 vaccine or placebo. HIV-1 infection is deadly as it causes AIDS, for which there is no cure, and thus it is essential to develop a safe and effective vaccine for the prevention of the infection. The original primary analysis (Hammer et al. 2013) and the correlates analyses (Janes et al. 2017; Fong et al. 2018) studied the vaccine effect on, or marker associations with, the time to HIV-1 infection diagnosis using a Cox model via right-censored failure time methods. However, studying the alternative questions defined in terms of the time to true HIV-1 infection, which the previous analyses only addressed in an approximate fashion, is arguably more important. In the following, we directly address these questions based on the observed interval-censored data.
The original trial consisted of 2504 subjects randomly assigned to either the DNA/rAd5 vaccine (1253) or placebo (1251) group, and for each subject, information on four demographic covariates, age, race, BMI and behavioural risk, was collected. In addition, for all 25 HIV infection cases and 125 other subjects in the vaccine group, a number of T cell response biomarkers and antibody response biomarkers were measured. For the analysis below, by following Fong et al. (2018) and Janes et al. (2017), we will focus on these subjects for the determination of the important or relevant covariates or biomarkers for HIV-1 infection conditional on the four demographic covariates. That is, the covariates age, race, BMI and behavioural risk will always be included in the model.
First we will consider all available biomarkers, including 38 T cell response biomarkers and 6 primary and 27 secondary antibody response biomarkers. In addition, by following the suggestion of Fong et al. (2018), we include the 21 pairwise interactions between the T cell response biomarker Env CD8+ polyfunctionality score, referred to as Env CD8 Score in the table below, and the 6 primary antibody response biomarkers, which gives 96 covariates in total. Table 5 presents the covariates selected by the proposed estimation procedure with the use of the same penalty functions discussed in the simulation study. Here we considered 100 candidate values for the grid search of λ; the results given in the table correspond to m = 3, the closest integer to n^0.25, with λ chosen based on 5-fold cross-validation. The table includes the estimated covariate effects and standard errors (SE) given by the bootstrap procedure based on 100 bootstrap samples. One can see that among all of the biomarkers considered, three T cell response biomarkers, including Env CD8 Score, and eight antibody response biomarkers were identified that may be correlated with the HIV-1 infection. In addition, three pairwise interactions were selected, but none of the interactions had significant effects on the HIV-1 infection time.
Table 5:
Analysis results of the HVTN 505 Trial (with interactions)
| Covariate | LASSO | ALASSO | SCAD | SICA | SELO |
|---|---|---|---|---|---|
| age | −0.192(0.362) | −0.224(0.384) | −0.155(0.336) | −0.229(0.377) | −0.227(0.369) |
| race | −0.648(0.645) | −0.628(0.622) | −0.650(0.598) | −0.718(0.534) | −0.730(0.542) |
| BMI | −0.103(0.338) | −0.089(0.358) | −0.116(0.339) | −0.169(0.334) | −0.167(0.340) |
| behavioural risk | 1.171(0.756) | 1.246(0.718) | 1.116(0.724) | 1.125(0.795) | 1.126(0.846) |
| ANY.VRC.ENV.logpctpos | 0.031(0.145) | - | - | - | - |
| CMV.logpctpos | 0.105(0.215) | 0.284(0.331) | - | - | - |
| Env CD8 Score | −1.031(0.387) | −0.981(0.315) | −0.965(0.327) | −0.925(0.399) | −0.925(0.403) |
| IgG.AEA244V1V2Tags293F | −0.094(0.168) | - | - | - | - |
| IgA.BioV3B | 0.084(0.122) | 0.119(0.114) | - | - | - |
| IgG.Cconenv03140CF.avi | −0.256(0.203) | −0.406(0.283) | - | −0.433(0.058) | −0.440(0.080) |
| IgA.ConSgp140CFI | 0.123(0.138) | 0.258(0.230) | - | - | - |
| IgG.BioC4.427B | - | 0.044(0.281) | - | - | - |
| IgG.V2 | 0.375(0.303) | 0.282(0.346) | 0.285(0.322) | - | - |
| IgG.V3 | 0.076(0.371) | - | 0.156(0.384) | - | - |
| IgG.env | 0.029(0.370) | 0.166(0.359) | −0.200(0.355) | - | - |
| Env CD8 Score × IgG.V2 | 0.289(0.259) | 0.288(0.240) | 0.262(0.227) | - | - |
| Env CD8 Score × IgG.env | 0.457(0.330) | 0.376(0.391) | 0.390(0.307) | - | - |
| IgG.V2 × IgG.V3 | −0.185(0.165) | - | −0.046(0.105) | - | - |
age: scaled age; race: indicator of Caucasian; BMI: scaled body mass index; behavioural risk: baseline behavioral risk score; variables ending with logpctpos: Month 7 log of scaled net percentage of cells positive for IL2/IFN-gamma cytokine expression for various HIV-1 antigens with antigen names in the front; Env CD8 Score: Month 7 scaled CD8+ polyfunctionality score for the ANY VRC ENV antigen; variables starting with IgG/IgA: Month 7 IgG or IgA type antibody binding level measured by the binding antibody multiplex assay.
Based on the results above, we reanalyzed the data by considering only the 75 individual biomarkers without the interactions, and Table 6 gives the covariates selected by the proposed estimation procedure, again with the use of the same penalty functions discussed in the simulation study. It can be seen that only one antibody response biomarker, IgG.Cconenv03140CF.avi, was selected and seems to be significantly related to the HIV-1 infection risk. Among the T cell response biomarkers, although six were selected, only two of them, including Env CD8 Score, seem to have significant effects; furthermore, the other biomarker, ANY.VRC.ENV.logpctpos, was only selected by the LASSO procedure and is highly correlated with the biomarker Env CD8 Score. Due to this, and also the correlation between Env CD8 Score and the other selected T cell response biomarkers, the estimated Env CD8 Score effects given by the different procedures vary considerably. In addition, the results suggest that, among the demographic covariates, the behavioural risk also had a significant effect on the HIV-1 infection time. The results above on the individual biomarkers are similar to those given by Fong et al. (2018) and Janes et al. (2017), who carried out the covariate selection based on either univariate analyses or simplified multivariate analyses and suggested that Env CD8 Score is a significant biomarker. Specifically, Janes et al. (2017) performed multiple testing in assessing exploratory immune response variables; they calculated p-values and considered p < 0.20 to be statistically significant. Fong et al. (2018), in contrast, applied forward stepwise regression to build a logistic regression model, with the significance of each covariate tested using the p-value. However, unlike above, they also suggested that Env CD8 Score may have significant pairwise interactions with two other antibody response biomarkers.
Table 6:
Analysis results of the HVTN 505 Trial (without interactions)
| Covariate | LASSO | ALASSO | SCAD | SICA | SELO |
|---|---|---|---|---|---|
| age | −0.258(0.295) | −0.254(0.300) | −0.261(0.289) | −0.227(0.380) | −0.225(0.376) |
| race | −0.502(0.487) | −0.563(0.509) | −0.776(0.451) | −0.728(0.565) | −0.738(0.574) |
| BMI | −0.101(0.265) | −0.153(0.275) | −0.059(0.263) | −0.167(0.337) | −0.165(0.346) |
| behavioural risk | 1.019(0.603) | 1.111(0.619) | 1.333(0.590) | 1.121(0.781) | 1.120(0.832) |
| ANY.VRC.ENV.logpctpos | −0.225(0.114) | - | - | - | - |
| CMV.logpctpos | 0.063(0.131) | 0.122(0.176) | - | - | - |
| VRC.ENV.A.logpctpos | −0.067(0.177) | - | - | - | - |
| VRC.ENV.B.logpctpos | −0.091(0.189) | - | −0.064(0.166) | - | - |
| VRC.ENV.C.logpctpos | −0.249(0.198) | −0.254(0.182) | −0.256(0.187) | - | - |
| Env CD8 Score | −0.042(0.159) | −0.625(0.284) | −0.214(0.147) | −0.921(0.413) | −0.921(0.444) |
| IgG.Cconenv03140CF.avi | −0.266(0.219) | −0.233(0.190) | −0.261(0.207) | −0.434(0.089) | −0.440(0.104) |
age: scaled age; race: indicator of Caucasian; BMI: scaled body mass index; behavioural risk: baseline behavioral risk score; variables ending with logpctpos: Month 7 log of scaled net percentage of cells positive for IL2/IFN-gamma cytokine expression for various HIV-1 antigens with antigen names in the front; Env CD8 Score: Month 7 scaled CD8+ polyfunctionality score for the ANY VRC ENV antigen; IgG.Cconenv03140CF.avi: Month 7 IgG antibody binding level to the HIV-1 antigen Cconenv03140CF.avi measured by the binding antibody multiplex assay.
6. Discussion and Conclusion Remarks
This paper discussed the covariate selection problem when one faces interval-censored failure time data arising from case-cohort studies, and for it, a sieve penalized estimation procedure was developed. In the method, Bernstein polynomials were employed to approximate the unknown cumulative baseline hazard function, and the method allows the use of various commonly used penalty functions. In terms of the penalty function, one question of practical interest is which one should be used for a given data set. For the proposed method, the simulation study suggests that SICA and SELO share similar performance and that ALASSO, SICA and SELO may be better than LASSO and SCAD. On the other hand, there does not seem to be a general guideline on this in the literature, and a common approach in practice is to apply multiple penalty functions and select the variables that are chosen consistently across them. For the implementation of the proposed method, an iterative algorithm was presented with the use of the Nelder-Mead simplex algorithm and the coordinate descent algorithm. In addition, the asymptotic properties of the procedure were established, and the simulation study indicated that it works well for practical situations.
As mentioned above, the proposed variable selection procedure applies to interval-censored data arising from both case-cohort studies and general failure time studies. For the latter, Wu and Cook (2015) developed a similar method by using piecewise constant functions to approximate the unknown baseline hazard function. Although their method and the proposed one may look similar, they are actually quite different. One key difference is that compared to piecewise constant functions, Bernstein polynomials have much better properties, such as being continuous and differentiable, and can provide a much more natural approximation to the cumulative baseline hazard function. In consequence, this allows the development of a simpler, faster and more stable algorithm for the implementation of the proposed method than the EM algorithm given in Wu and Cook (2015). More specifically, for general interval-censored data, the proposed method directly maximizes the penalized log-likelihood function, while Wu and Cook's method needs to calculate the conditional expectation of the penalized log-likelihood function and maximize the penalized conditional expectation iteratively. Furthermore, the asymptotic properties of the proposed method can be established; in comparison, no theoretical justification was available for the method given in Wu and Cook (2015).
There exist several directions for future research. One is that the focus above has been on the situation where the number of covariates is smaller than the sample size, and it is apparent that it would be interesting to generalize the method to high-dimensional situations where the number of covariates is larger than the sample size. Note that although the latter is important, the situation discussed here is also quite important, as it occurs quite often in, for example, medical follow-up studies such as clinical trials, and there was no established procedure with theoretical justification available. For the generalization to the high-dimensional case, although the idea and the procedure discussed above are applicable and valid, the establishment of the corresponding theoretical justification may be difficult.
Another direction for further research concerns correlated subjects. In the previous sections, we have focused on data from independent subjects; in practice, however, one may sometimes face data from correlated subjects, such as clustered interval-censored data, and it is straightforward to generalize the idea discussed above to this latter situation. For example, consider a regular failure time study that consists of n clusters with Ji subjects in the ith cluster. Furthermore assume that the subjects within the same cluster may be correlated with each other but the subjects in different clusters are independent. Let Tij and Xij denote the failure time of interest and the vector of covariates associated with the jth subject in the ith cluster, respectively, and suppose that only an interval (Lij, Rij] is observed for Tij such that Tij ∈ (Lij, Rij]. For inference, a simple generalization of the Cox model above is to assume that the cumulative hazard function of Tij has the form
$$\Lambda(t \mid X_{ij}, \mu_i) = \mu_i \, \Lambda_0(t) \exp(X_{ij}'\beta) \qquad (9)$$
given the latent variables μi’s. Under the assumptions above, the resulting full likelihood function can be written as the following expectation
$$L(\beta, \Lambda_0) = \prod_{i=1}^{n} E\Big\{ \prod_{j=1}^{J_i} \Big[ \exp\{-\Lambda(L_{ij} \mid X_{ij}, \mu_i)\} - \exp\{-\Lambda(R_{ij} \mid X_{ij}, \mu_i)\} \Big] \Big\} \qquad (10)$$
with respect to the distribution of the μi's. Then a covariate selection procedure can be developed as above by replacing the likelihood function given in (2) with the likelihood function above.
Supplementary Material
Acknowledgements
The authors wish to thank the Editor, Professor Dankmar Böhning, the Associate Editor and two reviewers for their many comments and suggestions that greatly improved the paper. They also would like to thank the participants and investigators of HVTN 505, in particular Youyi Fong, Holly Janes, Georgia Tomaras, and Julie McElrath, for providing the marker data for the example. Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R37AI054165 and the U.S. Public Health Service Grant AI068635. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Appendix: Asymptotic Properties of the Proposed Estimators
In this Appendix, we will sketch the proof of the asymptotic properties of the proposed estimator given in the theorems above with the focus on the ALASSO penalty function. First, we will describe the regularity conditions that will be needed.
Condition (I). (i) B is a compact subset of ℝ^p and β0 is an interior point of B. (ii) E(XX′) is non-singular with X being bounded; that is, there exists x0 > 0 such that P(∥X∥ ≤ x0) = 1.
Condition (II). (i) There exists a positive number ς such that P(R − L ≥ ς) = 1. (ii) The union of the supports of L and R is contained in an interval [c, u], where 0 < c < u < ∞.
Condition (III). The conditional density g(l, r∣x) of (L, R) given X has bounded partial derivatives with respect to (l, r), and the bounds of these partial derivatives do not depend on (l, r, x).
Condition (IV). The function Λ0(·) is continuously differentiable up to order r in [c, u], with its derivatives bounded by some positive constant.
Next we apply the empirical process theory to prove Theorems 1 and 2. For the proofs, define P f = ∫ f(y) dP(y), the expectation of f(Y) taken under the distribution P, and Pn f = n^{-1} Σi f(Yi), the empirical process indexed by f(Y). Also let C denote a positive constant which may differ from place to place in the proofs.
Proof of Theorem 1.
In this section, we apply the empirical process theory to prove the strong consistency of θ̂n. For the proof, let Q(θ) = P l(θ) and M(θ) = −Q(θ), where l(θ) is the log-likelihood function based on a single observation.
Define Kϵ = {θ : d(θ, θ0) ≥ ϵ, θ ∈ Θn} for ϵ > 0 and
Consider the class of functions and based on Lemmas 1-2 in the Supplementary materials of Zhou et al.(2017a), we have the following facts:
The covering number of satisfies ;
ζ1n → 0 a.s.
Then one can show that
| (11) |
If , then
| (12) |
Denote , then from (12), we can prove δϵ > 0 via proof by contradiction. It follows from (11) and (12) that
with ζn = ζ1n + ζ2n and ζn ≥ δϵ, which yields . Based on the fact (ii) above and the strong law of large numbers, we have both ζ1n → 0 and ζ2n → 0 almost surely. Thus,
which implies that almost surely.
Next we discuss the convergence rate of θ̂n. Since Qn(θ) only penalizes the regression coefficients β and pλ(·) does not involve Λ, for any ζ > 0, we define the class
with . Following the calculation of Shen and Wong (1994) (page 597), we can establish that the entropy with bracketing
where N = m + 1, is the bracketing number of the class . Moreover, some algebraic calculations lead to for any . Under Conditions (I), (II) and (IV), we have that is uniformly bounded. Therefore it follows from Lemma 3.4.2 of van der Vaart and Wellner (1996) that
| (13) |
where .
Note that the right-hand side of (13) gives ϕn(ζ) = C(N1/2ζ + N/n1/2). Also it is easy to see that ϕn(ζ)/ζ decreases in ζ and
where rn = N^{−1/2} n^{1/2} = O(n^{(1−ν)/2}) with 0 < ν < 0.5, so that an application of Theorem 3.4.1 of van der Vaart and Wellner (1996) gives the n^{−(1−ν)/2} part of the rate. Also note that, by Theorem 1.6.2 of Lorentz (1986), if m = o(n^ν), there exist Bernstein polynomials Λ0n such that
$$\|\Lambda_{0n} - \Lambda_{00}\|_{\infty} = O(n^{-r\nu/2}). \qquad (14)$$
Thus, d(θ̂n, θ0) = Op(n^{−(1−ν)/2} + n^{−rν/2}).
This completes the proof.
Proof of Theorem 2.
(i) First we will show that β̂2n = 0 with probability tending to one. For this, it is sufficient to show that, for any θ = (β, Λ0) satisfying d(θ, θ0) = Op(n^{−(1−ν)/2} + n^{−rν/2}) and any j = s + 1, … , p, ∂Qn(β, Λ0)/∂βj and βj have different signs with probability tending to one.
For each β in a neighbourhood of β0, by Taylor expansion, we have that
where I1,2(β0, Λ00) is the leading p × p submatrix of Fisher information matrix. Then for j = s + 1, … , p, one can easily show that
Note that . It follows that
Since nλn → ∞, it is easily seen that for j = s + 1, … , p, ∂Qn(β, Λ0)/∂βj and βj have different signs when n is large enough. Thus, when Qn(β, Λ0) achieves its maximum, we must have βj = 0 for j = s + 1, … , p; that is, β̂2n = 0.
(ii) Note that β̂1n is the vector composed of the first s elements of the sieve MLE β̂n. To prove the asymptotic normality of β̂1n, it suffices to show the following two facts:
; and
.
To prove (a), considering being a -consistent maximizer of and using the Taylor expansion, we have
where η* lies between and . Since maximizes ln(β, Λ0) and , we have that
Also because η* → η0, we have in probability, where I1(η0) is the leading s × s submatrix of the matrix I(η0). Therefore, .
To prove fact (b), denote V as the linear span of Θ − θ0, where θ0 is the true value of the parameter θ. Let l(θ; X) be the weighted log-likelihood for a sample of size one and τn = n^{−(1−ν)/2} + n^{−rν/2}. For any θ ∈ Θ with ∥θ − θ0∥ = O(τn), define the first order directional derivative of l(θ; X) at the direction v ∈ V as
and the second order directional derivative as
Also define the Fisher inner product on the space V as
and the Fisher norm for v ∈ V as ∥v∥ = ⟨v, v⟩^{1/2}. Let V̄ be the closed linear span of V under the Fisher norm. Then (V̄, ∥ · ∥) is a Hilbert space.
Furthermore, define the smooth functional of θ as ψ(θ) = b′β, where b is any vector of p dimension with ∥b∥ ≤ 1. For any v ∈ V, we denote
Note that . It follows from the Riesz representation theorem that there exists such that for all and . Thus it follows from the Cramér-Wold device and the formula , we only need to show that
| (15) |
where θ̂n is the sieve inverse probability weighted estimator of θ. In fact, (15) can be proved using arguments similar to those of Theorem 1 of Shen (1997). For each component βj, j = 1, … , p, we denote by bj* the solution to inf_{bj} E{lβ · ej − lbj [bj]}², where lβ is the partial derivative of the log-likelihood with respect to β and ej is a p-dimensional vector of zeros except the j-th element equal to 1. Here lbj [bj] is the directional derivative with respect to Λ0 and can be calculated as the directional derivative defined at the beginning of the proof of fact (b). Now let b* = (b1*, … , bp*).
and , where , Sβ = lβ − lb* [b*] and I(β0) is the efficient Fisher information for β. Hence the semiparametric efficiency can be established by applying the result of Theorem 4 in Shen (1997).
Combining (a) and (b), we have √n (β̂1n − β10) → N(0, Σ) in distribution, where Σ is the leading s × s submatrix of [I(β0)]^{−1}. This completes the proof.
References
- [1]. Cai T, Huang J and Tian L (2009). Regularized estimation for the accelerated failure time model. Biometrics 65(2):394–404
- [2]. Chen X, Fan Y and Tsyrennikov V (2006). Efficient estimation of semiparametric multivariate copula models. Journal of the American Statistical Association 101(475):1228–1240
- [3]. Dicker L, Huang B and Lin X (2013). Variable selection and estimation with the seamless-L0 penalty. Statistica Sinica 23:929–962
- [4]. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle property. Journal of the American Statistical Association 96(456):1348–1360
- [5]. Fan J and Li R (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics 30(1):74–99
- [6]. Fan J and Lv J (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory 57(8):5467–5484
- [7]. Fong Y, Shen X, Ashley VC, Deal A, Seaton KE, Yu C, Grant SP, Ferrari G, deCamp AC, Bailer RT, Koup RA, Montefiori D, Haynes BF, Sarzotti-Kelsoe M, Graham BS, Carpp LN, Hammer SM, Sobieszczyk ME, Karuna S, Swann E, DeJesus E, Mulligan M, Frank I, Buchbinder S, Novak RM, McElrath MJ, Kalams S, Keefer M, Frahm NA, Janes HE, Gilbert PB and Tomaras GD (2018). Modification of the association between T-cell immune responses and human immunodeficiency virus type 1 infection risk by vaccine-induced antibody responses in the HVTN 505 Trial. The Journal of Infectious Diseases 217(8):1280–1288
- [8]. Fu W (1998). Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics 7(3):397–416
- [9]. Hammer SM, Sobieszczyk ME, Janes H, Karuna ST, Mulligan MJ, Grove D, Koblin BA, Buchbinder SP, Keefer MC, Tomaras GD, Frahm N, Hural J, Anude C, Graham BS, Enama ME, Adams E, Dejesus E, Novak RM, Frank I, Bentley C, Ramirez S, Fu R, Koup RA, Mascola JR, Nabel GJ, Montefiori DC, Kublin J, McElrath MJ, Corey L and Gilbert PB (2013). Efficacy trial of a DNA/rAd5 HIV-1 preventive vaccine. The New England Journal of Medicine 369:2083–2092
- [10]. Huang J and Ma S (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis 16(2):176–195
- [11]. Janes HE, Cohen KW, Frahm N, De Rosa SC, Sanchez B, Hural J, Magaret CA, Karuna S, Bentley C, Gottardo R, Finak G, Grove D, Shen M, Graham BS, Koup RA, Mulligan MJ, Koblin B, Buchbinder SP, Keefer MC, Adams E, Anude C, Corey L, Sobieszczyk M, Hammer SM, Gilbert PB and McElrath MJ (2017). Higher T-cell responses induced by DNA/rAd5 HIV-1 preventive vaccine are associated with lower HIV-1 infection risk in an efficacy trial. The Journal of Infectious Diseases 215(9):1376–1385
- [12]. Kiefer J (1953). Sequential minimax search for a maximum. Proceedings of the American Mathematical Society 4(3):502–506
- [13]. Lin W and Lv J (2013). High-dimensional sparse additive hazards regression. Journal of the American Statistical Association 108(501):247–264
- [14]. Lorentz GG (1986). Bernstein polynomials. Chelsea Publishing Co., New York
- [15]. Lv J and Fan Y (2009). A unified approach to model selection and sparse recovery using regularized least squares. The Annals of Statistics 37(6A):3498–3528
- [16]. Ma L, Hu T and Sun J (2015). Sieve maximum likelihood regression analysis of dependent current status data. Biometrika 102(3):731–738
- [17]. Martinussen T and Scheike T (2009). Covariate selection for the semiparametric additive risk model. Scandinavian Journal of Statistics 36(4):602–619
- [18]. Ni A and Cai J (2017). A regularized variable selection procedure in additive hazards model with stratified case-cohort design. Lifetime Data Analysis 24(3):443–463
- [19]. Ni A, Cai J and Zeng D (2016). Variable selection for case-cohort studies with failure time outcome. Biometrika 103(3):547–562
- [20]. Prentice RL (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11
- [21]. Scolas S, El Ghouch A, Legrand C and Oulhaj A (2016). Variable selection in a flexible parametric mixture cure model with interval-censored data. Statistics in Medicine 35(7):1210–1225
- [22]. Shen X (1997). On methods of sieves and penalization. The Annals of Statistics 25(6):2555–2591
- [23]. Shen X and Wong WH (1994). Convergence rate of sieve estimates. The Annals of Statistics 22(2):580–615
- [24]. Shi Y, Cao Y, Jiao Y and Liu Y (2014). SICA for Cox's proportional hazards model with a diverging number of parameters. Acta Mathematicae Applicatae Sinica, English Series 30(4):887–902
- [25]. Sun J (2006). The statistical analysis of interval-censored failure time data. Springer, New York
- [26]. Sun J, Feng Y and Zhao H (2015). Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model. Lifetime Data Analysis 21(1):138–155
- [27]. Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1):267–288
- [28]. Tibshirani R (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine 16(4):385–395
- [29]. van der Vaart AW and Wellner JA (1996). Weak convergence and empirical processes: with applications to statistics. Springer, New York
- [30]. Wang H, Li R and Tsai C (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568
- [31]. Wu Y and Cook R (2015). Penalized regression for interval-censored times of disease progression: selection of HLA markers in psoriatic arthritis. Biometrics 71(3):782–791
- [32]. Zhang H and Lu WB (2007). Adaptive lasso for Cox's proportional hazards model. Biometrika 94(3):1–13
- [33]. Zhou Q, Hu T and Sun J (2017a). A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. Journal of the American Statistical Association 112(518):664–672
- [34]. Zhou Q, Zhou H and Cai J (2017b). Case-cohort studies with interval-censored failure time data. Biometrika 104(1):17–29
- [35]. Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418–1429