Abstract
Recurrent event data are commonly encountered in biomedical studies. In many situations, they are subject to an informative terminal event, e.g., death. Joint modeling of recurrent and terminal events has attracted substantial recent research interest. Meanwhile, such data may involve a large number of covariates, and how to conduct variable selection for joint frailty proportional hazards models has become a challenge in practical data analysis. We tackle this issue on the basis of the "Minimum approximated Information Criterion" (MIC) method. The proposed method can be conveniently implemented in SAS Proc NLMIXED for commonly used frailty distributions. Its finite-sample behavior is evaluated through simulation studies. We apply the proposed method to model recurrent opportunistic diseases in the presence of death in an AIDS study.
Keywords: Frailty models, Informative censoring, Proportional hazards models, Recurrent event, Survival analysis, Variable selection
1. Introduction
Recurrent event data are frequently encountered in biomedical applications. Examples include repeated asthma attacks, recurrent hospitalizations, repeated lung infections in people with cystic fibrosis, and multiple opportunistic infections in AIDS studies. An important feature of recurrent event data is the correlation among event times within the same subject; ignoring it can lead to biased and inefficient estimates (Lawless and Nadeau, 1995; Kelly and Lim, 2000). Over the past decades, a great deal of work has been done on the analysis of recurrent event data (e.g., Pepe and Cai, 1993; Lin, Wei, and Ying, 1998; Lin et al., 2000; Zeng and Lin, 2007; Liu and Wu, 2011; Liu, Lu, and Zhang, 2014).
Recurrent event data are often subject to informative dropout or a dependent terminal event such as death, which has a non-negligible impact on the occurrences of the events (Ghosh and Lin, 2002; Cook and Lawless, 2007). Many authors have investigated the analysis of recurrent event data under such situations (Lancaster and Intrator, 1998; Wang, Qin, and Chiang, 2001; Huang and Wolfe, 2002; Huang and Wang, 2004; Liu, Wolfe, and Huang, 2004; Huang and Liu, 2007; Liu and Huang, 2008; Ye, Kalbfleisch, and Schaubel, 2007; Zeng and Lin, 2009; Liu et al., 2016).
Our motivating example is the Terry Beirn Community Programs for Clinical Research on AIDS (CPCRA) study (Abrams et al., 1994; Neaton et al., 1994). The study enrolled 467 patients infected with the human immunodeficiency virus (HIV). About half (230) were randomized to receive didanosine (ddI), and the others received zalcitabine (ddC). The goal of the study was to compare the effects of ddI and ddC. The median follow-up time was 13 months (range: 1–21 months). Patients might have experienced opportunistic diseases, which were defined in Table I of Neaton et al. (1994) and Table 3 of Abrams et al. (1994). By the end of the study, there were 172 confirmed or probable opportunistic diseases in the ddI group and 191 in the ddC group; each patient had between zero and five opportunistic diseases. In addition, 100 of the 230 ddI patients and 80 of the 237 ddC patients had died. Liu and Huang (2008) adopted a joint frailty proportional hazards model of recurrent and terminal events, which naturally incorporates the positive correlation between the recurrent opportunistic diseases and death; that is, a higher intensity of recurrent opportunistic diseases is associated with a higher mortality rate.
Table 1: Simulation results on variable selection over 500 replications: average numbers of correct and incorrect zero coefficient estimates, proportion of correctly selected models, and MSE.

| Baseline hazards | Frailty | Method | Correct zeros | Incorrect zeros | Correct model | MSE |
|---|---|---|---|---|---|---|
| h0(t) = 5, r0(t) = 8 | normal | Full | 0.000 | 0.000 | 0.000 | 0.139 |
| | | MIC | 7.998 | 0.000 | 0.998 | 0.061 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.061 |
| h0(t) = 5 + 0.2t, r0(t) = 8 + 0.2t | normal | Full | 0.000 | 0.000 | 0.000 | 0.138 |
| | | MIC | 7.994 | 0.000 | 0.994 | 0.066 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.065 |
| h0(t) = 5 + 0.5t², r0(t) = 8 + 0.5t² | normal | Full | 0.000 | 0.000 | 0.000 | 0.148 |
| | | MIC | 7.992 | 0.000 | 0.992 | 0.069 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.068 |
| h0(t) = 5, r0(t) = 8 | log-Gamma | Full | 0.000 | 0.000 | 0.000 | 0.172 |
| | | MIC | 7.996 | 0.000 | 0.996 | 0.057 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.056 |
| h0(t) = 5 + 0.2t, r0(t) = 8 + 0.2t | log-Gamma | Full | 0.000 | 0.000 | 0.000 | 0.148 |
| | | MIC | 7.996 | 0.000 | 0.996 | 0.060 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.059 |
| h0(t) = 5 + 0.5t², r0(t) = 8 + 0.5t² | log-Gamma | Full | 0.000 | 0.000 | 0.000 | 0.145 |
| | | MIC | 7.994 | 0.000 | 0.994 | 0.057 |
| | | Oracle | 8.000 | 0.000 | 1.000 | 0.057 |
Table 3: Parameter estimates (EST) and Wald z-statistics for Model (1) for the recurrent opportunistic diseases (normal frailty): MIC versus the full model.

| Name | EST (MIC) | z-statistic (MIC) | EST (Full) | z-statistic (Full) |
|---|---|---|---|---|
| treatment | −0.055 | −0.471 | −0.022 | −0.186 |
| stratum | 0 | — | −0.116 | −0.851 |
| age | −0.775 | −1.570 | −0.773 | −1.544 |
| gender | −0.632 | −2.413 | −0.540 | −2.021 |
| race | 0 | — | 0.244 | 1.713 |
| hx | 0 | — | 0.031 | 0.197 |
| prevOI | 0.356 | 1.781 | 0.246 | 1.294 |
| base Hb | −2.456 | −4.625 | −2.441 | −4.246 |
| base CD4 | −1.820 | −4.905 | −1.875 | −5.634 |
| base AZT | 0 | — | −0.017 | −0.115 |
| base TMS | −0.231 | −1.371 | −0.185 | −0.919 |
| base Dap | 0 | — | 0.054 | 0.245 |
| base Pent | 0 | — | −0.026 | −0.132 |
| hist PCP | 0 | — | 0.103 | 0.746 |
| hist Cand | 0 | — | −0.104 | −0.622 |
| hist Herp | 0 | — | −0.016 | −0.072 |
| ϕ | 0.321 | 3.558 | 0.303 | 3.153 |
In this study, a total of 16 covariates were examined, which implies over 30 regression coefficients in the full joint model. To improve model efficiency and interpretability, variable selection becomes necessary and critical. For this purpose, sparse estimation via the regularized or penalized log-likelihood (or estimating function) has attracted increasing interest. Commonly used penalty functions include the "least absolute shrinkage and selection operator" (LASSO) (Tibshirani, 1996), the "smoothly clipped absolute deviation" penalty (SCAD) (Fan and Li, 2001), the adaptive LASSO (ALASSO) (Zou, 2006), and the "minimax concave penalty" (MCP) (Zhang, 2010). There have been some developments of regularization approaches with these penalty functions for recurrent event times alone (Tong, Zhu, and Sun, 2009; Chen and Wang, 2013; Zhao et al., 2018). Other related work includes Cheng and Luo (2012) on variable selection for recurrent event data under informative censoring, and Wang et al. (2018) on variable selection for panel count data. However, none of these methods applies directly to the joint frailty model of recurrent and terminal events.
Parameters in the joint frailty model are estimated by maximizing the nonconcave marginal likelihood, which involves the estimation of nonparametric functions and complicated integration over the frailties. As a result, variable selection for this complex model through commonly used regularization approaches requires both sophisticated mathematical derivation and challenging implementation. To tackle this issue, in this article we adopt the "minimum approximated information criterion" (MIC) method (Su et al., 2016) to conduct variable selection for joint proportional hazards models of recurrent and terminal events. This method obtains sparse estimation by minimizing an approximated Bayesian information criterion (BIC). Through a reparameterization of the model, the objective function of the MIC method remains smooth, which allows for a feasible implementation in ready-to-use statistical software.
The rest of the article is organized as follows. Section 2 presents the joint frailty proportional hazards model and introduces the penalized likelihood estimation method. In Section 3, we report results from simulation studies for evaluating the proposed method. In Section 4, we apply the proposed method to a joint frailty model of recurrent opportunistic diseases and survival for HIV-infected patients from the CPCRA study. Some concluding remarks are made in Section 5. Details about the computational method are provided in the Appendix.
2. Model and Estimation
2.1. Joint frailty proportional hazards models
Suppose that there are n subjects in a study, each of whom may experience recurrences of the same type of event. Let Ti1 < Ti2 < … be the recurrent event times of subject i for i = 1, … , n. Also, let Ci denote the independent censoring time and Di the dependent terminal event (e.g., death) time. Define δi = I(Di ≤ Ci) as the terminal event indicator and Yi = min(Ci, Di) as the follow-up time. For each subject, there is a p-dimensional covariate vector Xi ∈ Rp associated with the fixed-effect regression parameters. In addition, a random-effect frailty term ui is shared by all the events of subject i, introducing their correlation. The frailty ui is often assumed to follow a log-Gamma (i.e., eui follows a Gamma distribution) or normal distribution (e.g., Liu and Huang, 2008). The observed data consist of {(Ti1, … , Ti,ni, δi, Yi, Xi) : i = 1, … , n}, where ni denotes the number of observed recurrences before Yi for subject i.
Define the hazard for the recurrent events by ri(t) and that for the terminal event by hi(t), respectively. Liu, Wolfe, and Huang (2004) proposed a joint frailty proportional hazards model for the recurrent events and survival:
(1) ri(t) = r0(t) exp(XiTβ1 + ui),

(2) hi(t) = h0(t) exp(XiTβ2 + γui),
where β1, β2 ∈ Rp are regression parameters and r0(t) and h0(t) are baseline hazards for the recurrent and terminal event processes, respectively. The correlation between the terminal event and the recurrent events is modeled via the common frailty ui, which may have different impacts on the two hazards administered through the parameter γ. The frailty ui is further assumed to have a density function fϕ(ui), with ϕ being the parameter.
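To make the roles of β1, β2, γ, and ui concrete, the two hazards can be evaluated directly once all quantities are fixed. The sketch below uses arbitrary illustrative values (constant baseline hazards and a two-dimensional covariate vector are assumptions for the example, not quantities from the data):

```python
import numpy as np

def recurrent_hazard(t, x, u, beta1, r0):
    """Recurrent-event hazard: r_i(t) = r0(t) * exp(x' beta1 + u_i)."""
    return r0(t) * np.exp(x @ beta1 + u)

def terminal_hazard(t, x, u, gamma, beta2, h0):
    """Terminal-event hazard: h_i(t) = h0(t) * exp(x' beta2 + gamma * u_i)."""
    return h0(t) * np.exp(x @ beta2 + gamma * u)

x = np.array([1.0, 0.5])       # hypothetical covariate values
beta1 = np.array([1.0, -1.0])  # hypothetical coefficients
beta2 = np.array([0.5, 0.0])
u, gamma = 0.3, 1.0            # shared frailty and its loading on the death hazard

r = recurrent_hazard(1.0, x, u, beta1, lambda t: 8.0)        # 8 * exp(0.8)
h = terminal_hazard(1.0, x, u, gamma, beta2, lambda t: 5.0)  # 5 * exp(0.8)
```

With γ > 0, a subject whose frailty ui is large has elevated hazards for both processes, which is exactly the positive association between recurrences and death described above.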
2.2. Estimation
Starting with the full likelihood (Liu, Wolfe, and Huang, 2004), it can be shown that the marginal likelihood associated with the joint frailty model is
(3) L = ∏i ∫ LiR LiD fϕ(ui) dui,

where

LiD = [h0(Yi) exp(XiTβ2 + γui)]δi exp{−H0(Yi) exp(XiTβ2 + γui)}

is the likelihood for Di; and

LiR = {∏k=1,…,ni r0(Tik) exp(XiTβ1 + ui)} exp{−R0(Yi) exp(XiTβ1 + ui)}

is the likelihood for the recurrent events of subject i, with H0(t) and R0(t) denoting the cumulative baseline hazards corresponding to h0(t) and r0(t), respectively.
Estimation of the joint frailty model is a difficult problem, since the marginal likelihood (3) involves the nonparametrically unknown baseline hazard functions and the integration over the frailties. To tackle these issues, piecewise constant functions are first used to approximate r0(t) and h0(t). Approximating the baseline hazards in frailty models with piecewise constant functions is common practice (Lawless and Zhan, 1998; Feng, Wolfe, and Port, 2005). Simulation studies showed satisfactory estimation results when the number of pieces is set at 5 or 10 (Liu and Huang, 2008, 2009; Liu et al., 2016). In this article, following Liu and Huang (2008), we first divide the observed terminal event times into 10 intervals by their deciles. Let tq be the q-th decile of the observed terminal event times for q = 0, 1, … , 10, with t0 = 0. Then a piecewise constant approximation of the baseline hazard h0(t) is given by

ĥ0(t) = Σq=1,…,10 hq I(tq−1 < t ≤ tq),

where hq > 0 for q = 1, … , 10 are constant parameters to be estimated. Similarly, we divide the observed recurrent event times into 10 intervals by their deciles {s1, s2, … , s10}, with s0 = 0, and obtain

r̂0(t) = Σq=1,…,10 rq I(sq−1 < t ≤ sq),

where rq > 0 for q = 1, … , 10.
Replacing {h0(t), r0(t)} with {ĥ0(t), r̂0(t)} and treating {hq, rq} as additional parameters yield the approximated marginal likelihood

(4) L̃(θ) = ∏i ∫ {∏k=1,…,ni r̂0(Tik) exp(XiTβ1 + ui)} exp{−R̂0(Yi) exp(XiTβ1 + ui)} [ĥ0(Yi) exp(XiTβ2 + γui)]δi exp{−Ĥ0(Yi) exp(XiTβ2 + γui)} fϕ(ui) dui,

where the notation θ = (β1T, β2T, γ, ϕ, hT, rT)T with h = (h1, … , h10)T and r = (r1, … , r10)T is introduced to denote all the parameters involved;

Ĥ0(Yi) = Σq=1,…,10 hq (min(Yi, tq) − tq−1) I(Yi > tq−1)

is an estimate of the cumulative baseline hazard for the terminal event at t = Yi; and

R̂0(Yi) = Σq=1,…,10 rq (min(Yi, sq) − sq−1) I(Yi > sq−1)

is an estimate of the cumulative baseline hazard for the recurrent events at t = Yi.
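The piecewise constant construction makes the cumulative baseline hazard easy to compute: it accumulates each piece's height times the length of that piece's overlap with (0, Yi]. A minimal sketch with hypothetical cut points and heights (not the deciles estimated from data):

```python
def cum_hazard_pc(y, cuts, heights):
    """Cumulative hazard at time y for a piecewise constant hazard that
    equals heights[q] on the interval (cuts[q], cuts[q+1]], with cuts[0] = 0."""
    total = 0.0
    for q in range(len(heights)):
        lo, hi = cuts[q], cuts[q + 1]
        if y <= lo:
            break  # later pieces start after y and contribute nothing
        total += heights[q] * (min(y, hi) - lo)
    return total

# two pieces: hazard 5 on (0, 1] and hazard 2 on (1, 2]
cuts, heights = [0.0, 1.0, 2.0], [5.0, 2.0]
print(cum_hazard_pc(1.5, cuts, heights))  # 5*1.0 + 2*0.5 = 6.0
```

In the paper's setting there would be 10 pieces per process, with the cut points placed at the observed deciles.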
Unlike in ordinary frailty models, the integral in (4) cannot be solved explicitly, and we resort to numerical integration. A commonly used tool to approximate the integration is the adaptive Gaussian quadrature. With this approach, the term contributed by each subject i in (4), i.e., the integral over the frailty ui, can be approximated by a weighted sum of the integrand evaluated at predetermined abscissas for ui. More details are given in the Appendix.
2.3. Variable Selection
When a large number of covariates are present, variable selection becomes critical in avoiding the curse of dimensionality, reducing overfitting, and improving model interpretation. For this purpose, penalized log-likelihood approaches are popular owing to their many advantages. Common penalty functions include LASSO (Tibshirani, 1996), ALASSO (Zou, 2006), SCAD (Fan and Li, 2001), and MCP (Zhang, 2010). However, to enforce sparsity, these penalty functions must have a singular point at zero (Fan and Li, 2001), making the penalized likelihood objective function nonsmooth. As a result, one needs to develop a new algorithm that combines (i) numerical integration techniques for approximating the observed likelihood with (ii) methods, such as coordinate descent, for solving the nonsmooth optimization problem. These issues render the computation a formidable task and the implementation challenging in statistical practice.
To tackle this challenge, we use the "Minimum approximated Information Criterion" (MIC) method (Su et al., 2016) to select important variables in our joint frailty model. The MIC method achieves sparse estimation by minimizing an approximated Bayesian information criterion (BIC): a hyperbolic tangent function is used to approximate the ℓ0 norm, and a reparameterization step is devised to enforce sparsity in the parameter estimates. Simulation studies in a series of papers (Su et al., 2018; Han et al., 2019) demonstrated its satisfactory performance, especially in dealing with complex models. On one hand, the MIC method is yet another penalized log-likelihood approach, in which the hyperbolic tangent function serves as the penalty function. On the other hand, it avoids the selection of tuning parameters by approximating BIC, while the reparameterization step leads to a smooth objective function. These features make the MIC method compatible with a wide range of numerical optimization algorithms and computationally efficient. In the following, we explain how the MIC method can be applied to our joint frailty model.
Consider the BIC associated with the joint frailty model:
(5) minθ −2 log L̃(θ) + log(n0) Σm=1,2 Σj=1,…,p ‖βmj‖0,

where the ℓ0 norm of βmj is ‖βmj‖0 = I{βmj ≠ 0}, with βmj being the jth element of βm for m = 1, 2 and j = 1, … , p, and n0 is the effective sample size that will be discussed later. The discrete nature of the ℓ0 norm makes problem (5) NP-hard, and its optimization becomes infeasible for even a moderately large p.

As a remedy, Su et al. (2016) proposed to replace the indicator function I{βmj ≠ 0} in (5) with the hyperbolic tangent function ω(βmj) = tanh(an βmj²), where an = Op(n). According to Su et al. (2016) and Han et al. (2019), ω(βmj) provides a good approximation to the ℓ0 norm of βmj.
However, since ω(βmj) is differentiable in βmj, solving the smooth optimization problem

minθ −2 log L̃(θ) + log(n0) Σm=1,2 Σj=1,…,p ω(βmj)

does not yield a sparse estimate. To address this issue, Su et al. (2016) introduced the reparameterization

(6) βmj = αmj ω(αmj) = αmj tanh(an αmj²),

where ω(αmj) can approximate ω(βmj) well (Han et al., 2019). Let α1 = (α11, … , α1p)T and α2 = (α21, … , α2p)T, and denote θ′ = (α1T, α2T, γ, ϕ, hT, rT)T. We can obtain the explicit expression of L̃ as a function of θ′ by replacing βmj with αmjω(αmj) in L̃(θ), denoted by L̃(θ′). The MIC method then minimizes the objective function

(7) Q(θ′) = −2 log L̃(θ′) + log(n0) Σm=1,2 Σj=1,…,p ω(αmj)

to obtain the estimate of θ′, denoted by θ̂′. Then βmj can be estimated via β̂mj = α̂mjω(α̂mj). The objective function Q(θ′) in (7) is differentiable with respect to θ′, which avoids the non-differentiability issue encountered by LASSO and other regularization methods. In addition, applying the chain rule and the differentiation of an inverse function, we obtain

(8) dω(αmj)/dβmj = {dω(αmj)/dαmj} (dβmj/dαmj)⁻¹, with dβmj/dαmj = ω(αmj) + αmj ω′(αmj).

From (8), it is easy to show that ω(αmj), viewed as a function of βmj, is not differentiable at βmj = 0, where dβmj/dαmj = 0 (see also Figure 2 of Su et al., 2016), which implies that the MIC method can provide a sparse estimate of β in theory (Fan and Li, 2001).
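The two ingredients of MIC, the tanh approximation ω and the reparameterization (6), can be sketched in a few lines. Here an = 125 corresponds to the choice an = n/4 with n = 500 used later in the paper; the evaluation points are arbitrary:

```python
import numpy as np

a_n = 125.0  # a_n = n/4 with n = 500

def w(z):
    """Hyperbolic tangent approximation to the l0 norm: w(z) ~ I{z != 0}."""
    return np.tanh(a_n * z * z)

def beta_of_alpha(alpha):
    """Reparameterization (6): beta = alpha * w(alpha)."""
    return alpha * w(alpha)

print(w(0.0), w(0.5))        # exactly 0 at zero, ~1 away from zero
print(beta_of_alpha(0.3))    # ~0.3: a clearly non-zero alpha passes through
print(beta_of_alpha(0.005))  # shrunk far below its input value
```

Because the map α ↦ αω(α) is nearly flat at the origin, small values of α are pushed to essentially zero β, which is what delivers sparsity once a small threshold is applied to the estimates.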
For the resulting estimator θ̂, we resort to the bootstrap method to estimate the standard error of each of its components. Throughout the paper, three hundred bootstrap samples are used for this purpose. The satisfactory performance of the bootstrap method for obtaining estimated standard errors has been documented by Han et al. (2019) in variable selection for sophisticated random effects models.
2.4. Implementation Issues
It is difficult to determine the exact value of the effective sample size n0 for our joint frailty proportional hazards model, owing to the complexity induced by both censoring and correlated recurrent event times. For simplicity, we set n0 = N, where N = Σi=1,…,n (ni + 1) is the total number of observed recurrent and terminal event times. In fact, according to Su et al. (2016), n0 only needs to be Op(n); this choice works well for real recurrent event data, since ni is often a small number. As for the choice of an, the MIC estimate is robust with respect to an according to Su et al. (2016); following Han et al. (2019), we set an = n/4 in our simulation studies and application. Similarly to the local quadratic approximation method for solving SCAD (Fan and Li, 2001), when the true value of βmj is 0, the MIC estimate β̂mj may not be exactly zero, but it is very close to zero. We therefore treat estimates whose absolute values fall below a threshold ϵ as zero; in our simulation studies and application, the threshold is set to ϵ = 10⁻⁴.
For software implementation, we choose SAS Proc NLMIXED, which accommodates user-defined log-likelihood functions with normal random effects. Minimizing the objective function (7) is equivalent to maximizing

ℓ(θ′) = log L̃(θ′) − {log(n0)/2} Σm=1,2 Σj=1,…,p ω(αmj),

where log L̃(θ′) = Σi=1,…,n log L̃i(θ′), with L̃i(θ′) denoting the contribution of subject i (with recurrent event times Tik, k = 1, … , ni) to the approximated marginal likelihood (4). With normal frailties, we only need to write the explicit expression of the conditional log-likelihood as SAS statements to obtain the estimate. On the other hand, if ui has a non-normal distribution (e.g., ui follows a log-Gamma or log-Weibull distribution), we need to reformulate the likelihood conditional on a non-normal frailty into one conditional on a normal frailty (Liu and Yu, 2008). That is, a general likelihood L = ∫ L(u) fθ(u) du can be rewritten as

L = ∫ L*(x) f0(x) dx,

where L* = L fθ(x)/f0(x) and f0(x) is the density function of a standard normal random variable. We can then write the explicit expression of log(L*) as SAS statements for maximization. The adaptive Gaussian quadrature method is chosen to approximate the integral over the frailties, and the number of quadrature points is set at 10.
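The reweighting identity L = ∫ L* f0 dx can be checked numerically. In the sketch below, the frailty is log-Gamma with eu ~ Gamma(2, 1) and the conditional likelihood is the toy choice L(u) = exp(−eu); both are illustrative assumptions, chosen only because the target integral then has the closed form 1/4:

```python
import numpy as np

def f0(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def ftheta(u):
    """log-Gamma frailty density: e^u ~ Gamma(2, 1) gives f(u) = exp(2u - e^u)."""
    return np.exp(2.0 * u - np.exp(u))

def Lcond(u):
    """Toy conditional likelihood (illustrative choice)."""
    return np.exp(-np.exp(u))

def trapezoid(y, x):
    """Simple trapezoidal rule, to stay independent of the NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

x = np.linspace(-15.0, 8.0, 400001)
direct = trapezoid(Lcond(x) * ftheta(x), x)  # integral of L * ftheta du
Lstar = Lcond(x) * ftheta(x) / f0(x)         # L* = L * ftheta / f0
reweighted = trapezoid(Lstar * f0(x), x)     # same integral, taken against f0
print(direct, reweighted)  # both ~ 0.25 (closed form: 1/4)
```

In NLMIXED one codes log L* directly as programming statements; the standard normal density f0 is then handled by the procedure's built-in normal random-effect machinery.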
For optimization, we choose the quasi-Newton algorithm with the default dual Broyden-Fletcher-Goldfarb-Shanno (dBFGS) update of the Cholesky factor of the Hessian matrix (SAS, 2018), which provides an appropriate balance between the speed and stability required for most nonlinear mixed model applications, as shown in our simulation studies and real data analysis.
3. Simulation
In simulation studies, data are generated from the following joint frailty proportional hazards model:
(9) ri(t) = r0(t) exp(XiTβ1 + ui),

(10) hi(t) = h0(t) exp(XiTβ2 + γui),

where the six covariates Xi = (Xi1, … , Xi6)T are simulated from MVN6(0, Σ) with Σk,k′ = 0.5^|k−k′|, and the regression parameters are set as β1 = (1.0, 0, 0, 0, 0, −1.0)T, β2 = (1.0, −0.5, 0, 0, 0, 0)T, and γ = 1.0.
We experiment with both normal (ui ~ N(0, ϕ)) and log-Gamma (eui ~ Gamma(1/ϕ, 1/ϕ)) frailties, setting ϕ = 1 in both cases. Three settings are considered for the baseline hazard functions. In Setting I, we generate data with constant baseline hazards h0(t) = 5 and r0(t) = 8, so the piecewise constant assumption holds exactly. Linear baseline hazards h0(t) = 5 + 0.2t and r0(t) = 8 + 0.2t are used in Setting II, while quadratic baseline hazards h0(t) = 5 + 0.5t² and r0(t) = 8 + 0.5t² are used in Setting III. Under Settings II and III, the piecewise constant baseline hazards can only approximate the true functions. The censoring time is generated from a uniform distribution on (0, 2). Again, the number of quadrature points is set to 10, and the quasi-Newton dBFGS algorithm is used for optimization. We set an = n/4 and n0 = N with N = Σi=1,…,n (ni + 1). For each dataset, we generate 300 bootstrap samples and obtain the estimated standard errors by the bootstrap method.
Now we explain how the recurrent event times are generated. For subject i, let T0 = 0 and suppose that Tl has been generated. It can then be shown that

F(Tl+1 ∣ Tl+1 > Tl) = 1 − exp[−{R0(Tl+1) − R0(Tl)} exp(XiTβ1 + ui)],

where F(Tl+1 ∣ Tl+1 > Tl) is the cumulative distribution function of Tl+1 given {Tl+1 > Tl}, and R0(t) is the cumulative baseline hazard for the recurrent events. We can then generate Tl+1 by the probability integral transform method.
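Under a constant baseline hazard, the conditional distribution above makes each gap Tl+1 − Tl exponential with rate r0 exp(XiTβ1 + ui), so the probability integral transform reduces to drawing exponential gaps. A minimal sketch (constant r0 = 8 as in Setting I; the linear-predictor value eta is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_recurrent_times(eta, follow_up, r0=8.0):
    """Draw recurrent event times on (0, follow_up] for one subject.
    With constant baseline hazard r0, inverting
    F(t_{l+1} | t_{l+1} > t_l) = 1 - exp(-r0 * (t_{l+1} - t_l) * exp(eta))
    gives i.i.d. exponential gaps with rate r0 * exp(eta)."""
    rate = r0 * np.exp(eta)
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)  # NumPy parameterizes by the scale
        if t > follow_up:
            return times
        times.append(t)

events = simulate_recurrent_times(eta=0.0, follow_up=1.0)
```

For the linear and quadratic baselines of Settings II and III, the same transform applies, but one must invert the cumulative baseline hazard R0, either in closed form or numerically.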
The results presented below are based on 500 replications with sample size n = 500, close to the sample size of n = 467 in the application. On average, each subject has 1.9 to 2.5 recurrent events across these settings (constant/linear/quadratic baseline hazards and normal/log-Gamma frailties), and the censoring rate ranges from 15% to 27%.
To assess the performance of the proposed method, Table 1 reports the average numbers of correct and incorrect zero coefficient estimates, the frequency of correct model selection, and the mean square error (MSE) over the 500 simulation runs. The MSE is given by

MSE = (1/500) Σk=1,…,500 ‖θ̂(k) − θ0‖²,

where θ0 is the true parameter vector and θ̂(k) is the estimate of θ from the kth simulated dataset. In Table 1, our method is also compared with the full model and the oracle model. The full model includes all the covariates, while the oracle model is the one in which we know a priori which coefficients are non-zero. Table 2 presents the averaged parameter estimates, the sample standard deviations (SD), the averages of the estimated standard errors (SEE), and the coverage probabilities (CP) of the 95% confidence intervals for the non-zero coefficients from our method.
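The MSE criterion is simply the average squared Euclidean distance between each replication's estimate and the truth; a two-replication toy computation (the numbers are made up for illustration):

```python
import numpy as np

theta0 = np.array([1.0, 0.0, -1.0])        # "true" parameter vector (toy)
estimates = np.array([[1.1, 0.0, -0.9],    # theta-hat from replication 1
                      [0.9, 0.1, -1.1]])   # theta-hat from replication 2

# average over replications of || theta-hat - theta0 ||^2
mse = np.mean(np.sum((estimates - theta0) ** 2, axis=1))
print(mse)  # (0.02 + 0.03) / 2 = 0.025
```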
Table 2: Simulation results for the non-zero coefficients: true values (True), averaged estimates (EST), sample standard deviations (SD), averages of estimated standard errors (SEE), and coverage probabilities (CP) of 95% confidence intervals.

| Baseline hazards | Frailty | Parameter | True | EST | SD | SEE | CP |
|---|---|---|---|---|---|---|---|
| h0(t) = 5, r0(t) = 8 | normal | β1,1 | 1.000 | 1.006 | 0.068 | 0.070 | 0.952 |
| | | β1,6 | −1.000 | −1.000 | 0.057 | 0.060 | 0.960 |
| | | β2,1 | 1.000 | 1.030 | 0.105 | 0.118 | 0.974 |
| | | β2,2 | −0.500 | −0.511 | 0.079 | 0.096 | 0.976 |
| | | γ | 1.000 | 1.049 | 0.125 | 0.146 | 0.986 |
| | | ϕ | 1.000 | 1.008 | 0.129 | 0.135 | 0.956 |
| h0(t) = 5 + 0.2t, r0(t) = 8 + 0.2t | normal | β1,1 | 1.000 | 1.004 | 0.066 | 0.070 | 0.956 |
| | | β1,6 | −1.000 | −0.999 | 0.060 | 0.060 | 0.946 |
| | | β2,1 | 1.000 | 1.034 | 0.103 | 0.117 | 0.984 |
| | | β2,2 | −0.500 | −0.519 | 0.079 | 0.095 | 0.980 |
| | | γ | 1.000 | 1.045 | 0.142 | 0.164 | 0.978 |
| | | ϕ | 1.000 | 1.013 | 0.130 | 0.138 | 0.936 |
| h0(t) = 5 + 0.5t², r0(t) = 8 + 0.5t² | normal | β1,1 | 1.000 | 1.007 | 0.068 | 0.070 | 0.942 |
| | | β1,6 | −1.000 | −1.001 | 0.058 | 0.060 | 0.952 |
| | | β2,1 | 1.000 | 1.038 | 0.111 | 0.139 | 0.970 |
| | | β2,2 | −0.500 | −0.518 | 0.079 | 0.096 | 0.972 |
| | | γ | 1.000 | 1.049 | 0.140 | 0.199 | 0.986 |
| | | ϕ | 1.000 | 1.004 | 0.134 | 0.134 | 0.956 |
| h0(t) = 5, r0(t) = 8 | log-Gamma | β1,1 | 1.000 | 1.003 | 0.075 | 0.074 | 0.946 |
| | | β1,6 | −1.000 | −1.007 | 0.061 | 0.064 | 0.966 |
| | | β2,1 | 1.000 | 1.030 | 0.107 | 0.119 | 0.962 |
| | | β2,2 | −0.500 | −0.515 | 0.080 | 0.092 | 0.982 |
| | | γ | 1.000 | 1.044 | 0.130 | 0.149 | 0.986 |
| | | ϕ | 1.000 | 0.985 | 0.097 | 0.100 | 0.954 |
From Table 1, it can be seen that, for our method, the average numbers of correct zero coefficients are very close to 8 and those of incorrect zero coefficients are 0. This indicates that our method can accurately discover the correct sparse representation of Models (9) and (10). In addition, our method outperforms the full model in terms of MSE and performs similarly to the oracle model. Table 2 shows that the estimates from our method are very close to the true values on average. Moreover, there is good agreement between the SD and SEE values, and the coverage probabilities of the 95% confidence intervals are reasonable, demonstrating the appropriateness of the bootstrap method. From Tables 1 and 2, we also see that all the performance measures under Settings II and III are comparable to those under Setting I, indicating that the piecewise constant approximation performs well. We also conducted extensive simulation studies under other settings. In particular, we demonstrate that our method can accommodate negative (e.g., γ = −1) and null (γ = 0) associations between the recurrent and terminal events, and that it is robust to misspecification of the frailty distribution. Due to space limitations, these additional results are presented in the supporting information.
4. Application
We apply the proposed method to the CPCRA data (Abrams et al., 1994; Neaton et al., 1994). This dataset was also analyzed in Liu and Huang (2008, 2009). The sixteen clinical covariates considered for the joint frailty model, as given by Equations (1) and (2), are: treatment (1 for the ddC group; 0 for the ddI group), stratum (baseline stratum: 1, azidothymidine intolerance; 0, azidothymidine failure), age, gender (1, female; 0, male), race (0, white; 1, others), hx (history of injecting drugs: 1, yes; 0, no), prevOI (previous opportunistic infection: 1, diagnosis of acquired immune deficiency syndrome at baseline; 0, otherwise), base Hb (baseline hemoglobin), base CD4 (baseline CD4), base AZT (baseline Zidovudine use: 1, yes; 0, no), base TMS (baseline Trimethoprim/Sulfamethoxazole use: 1, yes; 0, no), base Dap (baseline dapsone use: 1, yes; 0, no), base Pent (baseline pentamidine use: 1, yes; 0, no), hist PCP (history of Pneumocystis carinii pneumonia: 1, yes; 0, no), hist Cand (history of candidiasis: 1, yes; 0, no), and hist Herp (history of herpes simplex: 1, yes; 0, no). Our objective is to select relevant variables. Since we are interested in the treatment effect, we do not penalize the coefficients for treatment.
In this application, we compare the performance of the full model and the MIC method. We first assume that the frailty ui follows a normal distribution. Table 3 presents the parameter estimates and the resulting Wald z-statistics for testing H0 : βmj = 0 in Model (1) for the opportunistic diseases. The corresponding quantities in Model (2) for death are presented in Table 4.
Table 4: Parameter estimates (EST) and Wald z-statistics for Model (2) for death (normal frailty): MIC versus the full model.

| Name | EST (MIC) | z-statistic (MIC) | EST (Full) | z-statistic (Full) |
|---|---|---|---|---|
| treatment | −0.424 | −1.586 | −0.412 | −1.760 |
| stratum | 0 | — | −0.251 | −0.828 |
| age | 1.135 | 1.080 | 1.215 | 1.120 |
| gender | 0 | — | 0.154 | 0.298 |
| race | 0 | — | −0.029 | −0.092 |
| hx | 0 | — | −0.106 | −0.301 |
| prevOI | 1.296 | 2.795 | 1.128 | 2.149 |
| base Hb | −8.663 | −8.569 | −8.093 | −6.466 |
| base CD4 | −3.421 | −2.589 | −3.382 | −2.327 |
| base AZT | −0.483 | −1.556 | −0.536 | −1.654 |
| base TMS | 0 | — | −0.194 | −0.447 |
| base Dap | 0.453 | 1.457 | 0.307 | 0.705 |
| base Pent | 0 | — | −0.250 | −0.583 |
| hist PCP | −0.265 | −1.049 | −0.168 | −0.569 |
| hist Cand | 0 | — | 0.029 | 0.092 |
| hist Herp | 0.758 | 1.809 | 0.766 | 1.702 |
| γ | 2.656 | 3.956 | 2.624 | 3.510 |
We first observe that there is no significant difference between ddC and ddI in the hazard of recurrent opportunistic diseases. Although treatment is not significant in Model (2), the absolute value of its z-statistic is 1.586, which suggests, albeit weakly, that ddC may decrease the risk of death. The MIC method selects 7 variables for Model (1) and 9 variables for Model (2) in fitting the joint frailty model (not counting treatment). Specifically, women have a relatively low risk of recurrence, while gender is not a significant variable in Model (2). Moreover, base Hb is negatively associated with the hazards of both recurrent opportunistic diseases and death. There is a positive association between prevOI and the risk of death; although prevOI is not significant in Model (1), it is weakly positively associated with the hazard of recurrent opportunistic diseases. Also, base CD4 is negatively associated with the hazards of both recurrent opportunistic diseases and death. In addition, γ is significantly positive, which implies that a higher intensity of recurrent opportunistic diseases is associated with a higher death rate. This finding agrees with the result in Liu and Huang (2009). The other selected variables are not significant at α = 0.05, as the absolute values of their z-statistics are smaller than 1.96. In comparison with the full model, which cannot produce a sparse model, the MIC method identifies these important covariates and yields a more parsimonious model with better interpretability. Fitting results for the joint model with log-Gamma frailties are presented in the supporting information; they are largely similar to those with the normal frailties.
5. Discussion
In this article, we considered a computationally feasible variable selection method for the joint frailty proportional hazards model. The associated likelihood function is nonconcave, involving unspecified baseline hazards that cannot be canceled out and integrals with respect to frailties that cannot be explicitly integrated out. To tackle these issues, we adopted the MIC method for its smooth formulation. The unspecified baseline hazards were approximated by piecewise constant functions. The adaptive Gaussian quadrature was used to approximate the integration over the random effects, which can be conveniently implemented in SAS Proc NLMIXED. The simulation results showed that the proposed method works well for the situations considered. An application to the CPCRA data was provided to illustrate our method.
Our method can be extended in several directions. First, our approach can be applied to non-proportional hazards models, e.g., transformation models (Zeng and Lin, 2007, 2009) or accelerated failure time models. Second, our method can be adapted to model multiple outcomes, e.g., recurrent events of opportunistic disease, repeated measures of CD4, and death (Liu and Huang, 2009). Third, variable selection for zero-inflated (or cure) models can be considered in this framework (Liu, 2009; Liu et al., 2016).
To the best of our knowledge, our work is the first variable selection method for the joint frailty proportional hazards model in the literature. It is not trivial to extend other sparse estimation methods, such as LASSO or SCAD, to this sophisticated model. For this reason, we did not compare our method with other competing approaches. In an unpublished manuscript, we conducted simulation studies on variable selection for simple frailty models (Fan and Li, 2002; Liu and Huang, 2008), in which we compared the performance of the full model, the MIC method, LASSO, ALASSO, and the oracle model (results available upon request). We found that the MIC method outperforms LASSO and performs similarly to ALASSO, demonstrating its validity. In future work, we are also interested in conducting variable selection for joint frailty proportional hazards models via the regularized or penalized log-likelihood with a traditional penalty function.
The proposed method requires relatively large sample sizes to work well because of its inherent intricacy. As requested by one reviewer, we reran the experiments with a reduced sample size of n = 300. As expected, the performance of our method deteriorates moderately, especially in estimating the standard errors of γ̂ and ϕ̂. Although this sample size issue does not present a major problem in our application to the AIDS study, one needs to be cautious when applying our method to small datasets, owing to the extraordinary complexity of the joint frailty model. For the same reason, the theoretical study of the asymptotic properties of our sparse estimators is exceedingly difficult and warrants substantial future research effort, even though their finite-sample performance is highly promising, as demonstrated by our simulation studies.
Since γ plays a critical role in formulating the association between the recurrent and terminal events in the joint frailty model, it is not penalized in our sparse estimation procedure. Instead, we resort to the bootstrap method for statistical inference on γ: by examining whether the 95% confidence interval for γ includes 0, we can formally test H0 : γ = 0. If sparse estimation of γ is desired, one may consider minimizing the objective function

Q(θ″) = −2 log L̃(θ″) + log(n0) {Σm=1,2 Σj=1,…,p ω(αmj) + ω(αγ)},

where αγ satisfies γ = ω(αγ)αγ, θ″ = (α1T, α2T, αγ, ϕ, hT, rT)T, and L̃(θ″) is the explicit expression of L̃ as a function of θ″, obtained by replacing βmj and γ with αmjω(αmj) and ω(αγ)αγ in L̃(θ). Minimizing Q(θ″) is similar to minimizing Q(θ′), as described in Section 2.4. Then γ can be estimated via γ̂ = ω(α̂γ)α̂γ. The standard error of γ̂ can be obtained with the same bootstrap procedure and then used to construct the 95% confidence interval for γ.
Acknowledgements
The authors are grateful to the Co-Editor, the AE, and three anonymous reviewers for their very constructive comments and suggestions. Research reported in this publication was supported by NIH/NHLBI R01 HL13694, NIH/NCATS UL1 TR002345, the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015 and 11926341), and the Key Laboratory of RCSDS, CAS (No. 2008DP173182). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Appendix: Adaptive Gaussian Quadrature
The empirical Bayes estimate ûi of ui is defined as the value that minimizes ℓi(ui), the negative logarithm of the integrand of subject i's marginal likelihood contribution. Let D be the number of quadrature points and let zd and qd, d = 1, …, D, be the standard Gauss-Hermite abscissas and weights (Golub and Welsch, 1969). The adaptive Gaussian quadrature approximation to subject i's likelihood contribution is then
∫ exp{−ℓi(u)} du ≈ √2 H(Yi)^{−1/2} Σ_{d=1}^{D} qd exp(zd²) exp{−ℓi(ûi + √2 H(Yi)^{−1/2} zd)},
where H(Yi) is the second derivative of ℓi(ui) with respect to ui evaluated at ûi.
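This recipe can be illustrated numerically. The sketch below (in Python rather than the SAS Proc NLMIXED implementation used in the paper) finds the minimizer û by Newton's method with finite-difference derivatives, takes H as the numerical second derivative at û, and applies the recentered, rescaled Gauss-Hermite rule. The test integrand e^u φ(u), whose integral is e^{1/2}, is an illustrative assumption chosen because its value is known in closed form.

```python
import numpy as np

def adaptive_gh(neg_log_g, D=10, u0=0.0, h=1e-5):
    """Approximate the integral of exp(-neg_log_g(u)) over the real line."""
    # Step 1: empirical Bayes estimate u_hat = argmin of neg_log_g
    # (Newton's method with finite-difference first and second derivatives).
    u = u0
    for _ in range(50):
        g1 = (neg_log_g(u + h) - neg_log_g(u - h)) / (2 * h)
        g2 = (neg_log_g(u + h) - 2 * neg_log_g(u) + neg_log_g(u - h)) / h**2
        step = g1 / g2
        u -= step
        if abs(step) < 1e-10:
            break
    # Step 2: curvature H at the mode, and the rescaling factor H^{-1/2}
    H = (neg_log_g(u + h) - 2 * neg_log_g(u) + neg_log_g(u - h)) / h**2
    sigma = H ** -0.5
    # Step 3: recenter and rescale the standard Gauss-Hermite rule
    z, q = np.polynomial.hermite.hermgauss(D)
    nodes = u + np.sqrt(2.0) * sigma * z
    return np.sqrt(2.0) * sigma * np.sum(q * np.exp(z**2 - neg_log_g(nodes)))

# Example: integrand exp(u) * standard normal density; exact value is exp(1/2).
val = adaptive_gh(lambda u: -u + 0.5 * u**2 + 0.5 * np.log(2 * np.pi), D=5)
```

Because the rule is recentered at the mode and rescaled by the curvature, a handful of quadrature points typically suffices, which is what makes this approximation practical inside an iterative likelihood maximization.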
Supporting Information
Web Appendices and Tables referenced in Sections 3 and 4 are available with this paper.
Data Availability Statement
Restrictions apply to the availability of these data, which were used under license for this study. A subset of the data is available from the JM package in R. The code for implementing the proposed method is publicly available at https://github.com/joyfulstones/variable-selection-in-joint-frailty-models.
References
- [1].Abrams DI, Goldman AI, Launer C, Korvick JA, Neaton JD, Crane LR, et al. (1994). A comparative trial of didanosine or zalcitabine after treatment with zidovudine in patients with human immunodeficiency virus infection. New Engl J Med 330, 657–662.
- [2].Chen X and Wang Q (2013). Variable selection in the additive rate model for recurrent event data. Comput Stat Data An 57, 491–503.
- [3].Cheng X and Luo L (2012). Variable selection for recurrent event data with informative censoring. J Syst Sci Complex 25, 987–997.
- [4].Cook RJ and Lawless JF (2007). The Statistical Analysis of Recurrent Events. New York, NY: Springer.
- [5].Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96, 1348–1360.
- [6].Fan J and Li R (2002). Variable selection for Cox's proportional hazards model and frailty model. Ann Stat 30, 74–99.
- [7].Feng S, Wolfe RA, and Port PK (2005). Frailty survival model analysis of the National Deceased Donor Kidney Transplant Dataset using Poisson variance structures. J Am Stat Assoc 100, 728–735.
- [8].Ghosh D and Lin DY (2002). Marginal regression models for recurrent and terminal events. Stat Sinica 12, 663–688.
- [9].Golub GH and Welsch JH (1969). Calculation of Gauss quadrature rules. Math Comput 23, 221–230.
- [10].Han D, Liu L, Su X, Johnson B, and Sun L (2019). Variable selection for random effects two-part models. Stat Methods Med Res 29, 2697–2709.
- [11].Huang CY and Wang MC (2004). Joint modeling and estimation for recurrent event processes and failure time data. J Am Stat Assoc 99, 1153–1165.
- [12].Huang X and Liu L (2007). A joint frailty model for survival time and gap times between recurrent events. Biometrics 63, 389–397.
- [13].Huang X and Wolfe RA (2002). A frailty model for informative censoring. Biometrics 58, 510–520.
- [14].Kelly PJ and Lim LL (2000). Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med 19, 13–33.
- [15].Lancaster A and Intrator O (1998). Panel data with survival: hospitalization of HIV-positive patients. J Am Stat Assoc 93, 46–53.
- [16].Lawless JF and Nadeau C (1995). Some simple robust methods for the analysis of recurrent events. Technometrics 37, 158–168.
- [17].Lawless JF and Zhan M (1998). Analysis of interval-grouped recurrent event data using piecewise constant rate functions. Can J Stat 26, 549–565.
- [18].Lin DY, Wei LJ, Yang I, and Ying ZL (2000). Semiparametric regression for the mean and rate functions of recurrent events. J Roy Stat Soc B 62, 711–730.
- [19].Lin DY, Wei LJ, and Ying ZL (1998). Accelerated failure time models for counting processes. Biometrika 85, 605–618.
- [20].Liu B, Lu W, and Zhang J (2014). Accelerated intensity frailty model for recurrent events data. Biometrics 70, 579–587.
- [21].Liu L (2009). Joint modeling longitudinal semi-continuous data and survival, with application to longitudinal medical cost data. Stat Med 28, 972–986.
- [22].Liu L and Huang X (2008). The use of Gaussian quadrature for estimation in frailty proportional hazards models. Stat Med 27, 2665–2683.
- [23].Liu L and Huang X (2009). Joint analysis of correlated repeated measures and recurrent events processes in the presence of a dependent terminal event. J Roy Stat Soc C-App 58, 65–81.
- [24].Liu L, Huang X, Yaroshinsky A, and Cormier JN (2016). Joint frailty models for zero-inflated recurrent events in the presence of a terminal event. Biometrics 72, 204–214.
- [25].Liu L, Wolfe RA, and Huang X (2004). Shared frailty models for recurrent events and a terminal event. Biometrics 60, 747–756.
- [26].Liu L and Yu Z (2008). A likelihood reformulation method in non-normal random effects models. Stat Med 27, 3105–3124.
- [27].Liu Y and Wu Y (2011). Semiparametric additive intensity model with frailty for recurrent events. Acta Math Sin 27, 1831–1842.
- [28].Neaton JD, Wentworth DN, Rhame F, Hogan C, Abrams DI, and Deyton L (1994). Considerations in choice of a clinical endpoint for AIDS clinical trials. Stat Med 13, 2107–2125.
- [29].Pepe MS and Cai J (1993). Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. J Am Stat Assoc 88, 811–820.
- [30].SAS Institute Inc. (2018). SAS/STAT(R) 15.1 User's Guide. Cary, NC: SAS Institute Inc.
- [31].Su X, Fan J, Levine RA, Nunn ME, and Tsai CL (2018). Sparse estimation of generalized linear models (GLM) via approximated information criteria. Stat Sinica 28, 1561–1581.
- [32].Su X, Wijayasinghe CS, Fan J, and Zhang Y (2016). Sparse estimation of Cox proportional hazards models via approximated information criteria. Biometrics 72, 751–759.
- [33].Tibshirani R (1996). Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58, 267–288.
- [34].Tong X, Zhu L, and Sun J (2009). Variable selection for recurrent event data via nonconcave penalized estimating function. Lifetime Data Anal 15, 197–215.
- [35].Wang M, Qin J, and Chiang C (2001). Analyzing recurrent event data with informative censoring. J Am Stat Assoc 96, 1057–1065.
- [36].Wang W, Wu X, Zhao X, and Zhou X (2018). Robust variable selection of joint frailty model for panel count data. J Multivariate Anal 167, 60–78.
- [37].Ye Y, Kalbfleisch JD, and Schaubel DE (2007). Semiparametric analysis of correlated recurrent and terminal events. Biometrics 63, 78–87.
- [38].Zeng D and Lin DY (2007). Semiparametric transformation models with random effects for recurrent events. J Am Stat Assoc 102, 167–180.
- [39].Zeng D and Lin DY (2009). Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics 65, 746–752.
- [40].Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38, 894–942.
- [41].Zhao H, Sun D, Li G, and Sun J (2018). Variable selection for recurrent event data with broken adaptive ridge regression. Can J Stat 46, 416–428.
- [42].Zou H (2006). The adaptive LASSO and its oracle properties. J Am Stat Assoc 101, 1418–1429.