Abstract
In survival analysis, we may encounter the following three problems: nonlinear covariate effect, variable selection and measurement error. Existing studies only address one or two of these problems. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realized using an iterative algorithm. The performance of the proposed approach is assessed via simulation studies, and further illustrated by application to data from an AIDS clinical trial.
Keywords: Corrected score, conditional score, joint modeling, polynomial spline, survival, 62J07, 62G05, 62G20, 62N01
1. Introduction
In survival analysis, the Cox proportional hazards model has been extensively adopted. When the assumption of linear covariate effects is not sufficient, the partially linear (varying coefficient) proportional hazards model has been assumed. Estimation, inference, and applications of the partially linear proportional hazards model have been studied in the literature; see Cai et al. (2008) and Nan et al. (2005). Among all covariates collected, not all are necessarily associated with survival, creating a demand for variable selection. Multiple techniques have been proposed for variable selection (along with estimation), including penalization, Bayesian methods, boosting, thresholding, and others. Among them, penalization has drawn special attention because of its appealing methodological, theoretical, and computational properties. Penalized variable selection and estimation with the Cox model has been studied in Fan and Li (2002), Gui and Li (2005) and Zhang and Lu (2007). Yan and Huang (2012) studied penalized variable selection for the varying coefficient proportional hazards model. It is noted that, in most of the existing studies, linear covariate effects have been assumed. More remotely relevant to this study, penalized variable selection and estimation with linear covariate effects have been considered for other survival models, for example the accelerated failure time (AFT) model (Huang and Ma, 2010) and the additive risk model (Ma et al., 2006).
In most of the existing studies, including the aforementioned, it has been assumed that the covariate values have been measured without error. The measurement error problem has been examined in quite a few publications. Under the standard proportional hazards model, available approaches include the regression calibration (Prentice, 1982; Wang et al., 1997; Dafni and Tsiatis, 1998), likelihood-based approaches (Wulfsohn and Tsiatis, 1997; Faucett and Thomas, 1996; Henderson et al., 2000; Xu and Zeger, 2001; Song et al., 2002b), conditional score (Tsiatis and Davidian, 2001; Song et al., 2002a) and correction approaches (Huang and Wang, 2000), among others. When the more challenging partially linear covariate effects are present, existing studies include the local conditional score and corrected score approaches based on kernel smoothing (Song and Wang, 2008) and spline smoothing (Song and Wang, 2017).
In summary, existing approaches only address one or two of the following problems: nonlinear covariate effect, variable selection using penalization, and measurement error. However, it is not hard to imagine that all three problems can co-exist. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realized using an iterative algorithm. We note that, although the individual components of the proposed model/approach may have roots in the existing literature, their combination, which can tackle a practically important problem, has not been investigated in the existing studies. The increasing complexity brings significant methodological, computational, and theoretical challenges.
The paper is organized as follows. In Section 2, we give the model definition. In Section 3, we first review the spline-based conditional score and corrected score approaches in Section 3.1, then we develop the corresponding penalized approaches in Section 3.2. Section 3.3 presents the asymptotic properties of the estimators. We assess the performance of the approaches via simulations in Section 4. The approaches are applied to the ACTG 175 data in Section 5. Section 6 provides concluding remarks and a brief discussion.
2. Model definition
Let T denote the failure time and C denote the censoring time. The observed survival data are V = min(T, C) and Δ = I(T ≤ C), where I(·) is the indicator function. Let H = (H1,…, HK)⊤ denote K covariates. To deal with measurement error, one needs repeated error-prone measurements, a validation set, or instrumental variables. We focus on the case with replicated measurements of the error-prone covariates; the proposed approaches can be easily extended to the other cases. Suppose that the kth covariate Hk may be measured with error mk times, with $W_{k1},\ldots,W_{km_k}$ being the mk error-contaminated measurements. To ensure identifiability, for error-contaminated covariates, we assume that a subset of subjects have replicated observations, that is, mk > 1. For error-free covariates, mk = 1 and Wk = Hk. Let $W_k = (W_{k1},\ldots,W_{km_k})^\top$, $W = (W_1^\top,\ldots,W_K^\top)^\top$, and m = (m1,…,mK)⊤.
We assume the classical measurement error model
$$W_{kj} = H_k + e_{kj}, \quad j = 1,\ldots,m_k, \tag{1}$$
where the error ekj is normally distributed with mean zero and variance $\sigma_k^2$. For error-free covariates, ekj = 0. Let $e = (e_1^\top,\ldots,e_K^\top)^\top$, where $e_k = (e_{k1},\ldots,e_{km_k})^\top$. We assume that the errors are independent, and e is independent of (T, C) given H.
Suppose the first K1 covariates X have constant effects on survival and the last K2 covariates Z have possible time-varying effect on survival, that is, H = (X⊤, Z⊤)⊤, and K = K1 + K2. A partially time-varying coefficient proportional hazards model is assumed for the relationship between the hazard of failure and the covariates,
$$\lambda(u \mid H) = \lambda_0(u)\exp\{\beta_0^\top X + \alpha_0^\top(u) Z\}. \tag{2}$$
Here λ0(u) is an unspecified baseline hazard; β0 is a length-K1 vector of regression parameters and α0(u) is a length-K2 vector of smooth functions. Model (2) subsumes the standard proportional hazards model (K2 = 0) and the varying-coefficient model (K1 = 0). Throughout, censoring is assumed to be noninformative.
Suppose the observed data are independent and identically distributed samples of (V, Δ, W, m), which are denoted by {(Vi, Δi, Wi, mi) : i = 1,…,n}. We focus on estimating the regression parameters β0 and α0(u).
3. Approaches
3.1. Estimation
For now, we assume the error variances are known. Song and Wang (2017) proposed spline-based corrected score and conditional score approaches for time-dependent covariates measured with error, which can be readily adapted to the present setting as follows. Let α0k(u) be the kth component of α0(u). A B-spline basis expansion is used to approximate α0k(u):
$$\alpha_{0k}(u) \approx \sum_{\ell=1}^{L_k} \gamma_{0k\ell} B_{k\ell}(u) = B_k^\top(u)\gamma_{0k},$$
where $\{B_{k\ell}(u) : \ell = 1,\ldots,L_k\}$ is a set of basis functions, and Lk = nk + d + 1 is the number of basis functions used to approximate the function α0k(u), with nk being the number of interior knots and d the degree of the spline. The interior knots of the splines can be either equally spaced or placed on the sample quantiles of the observed failure times, so that there are about the same number of events between any two adjacent knots. In practice, if the failure events are sparse, we recommend the second approach to reduce the chance of singularities. With this approximation, model (2) can be written in the form of a standard proportional hazards model:
$$\lambda(u \mid H) = \lambda_0(u)\exp\{\theta_0^\top R(u)\}. \tag{3}$$
Here R(u) = (X⊤,Z⊤B(u))⊤ is the vector of “covariates”, where
$$B(u) = \mathrm{diag}\{B_1^\top(u),\ldots,B_{K_2}^\top(u)\}$$
is a K2 × L matrix with $B_k(u) = (B_{k1}(u),\ldots,B_{kL_k}(u))^\top$ for k = 1,…,K2, $L = \sum_{k=1}^{K_2} L_k$, and $\theta_0 = (\beta_0^\top, \gamma_0^\top)^\top$ with $\gamma_0 = (\gamma_{01}^\top,\ldots,\gamma_{0K_2}^\top)^\top$. The regression coefficient θ0 in (3) can be estimated by techniques that account for measurement error, such as the corrected score and conditional score approaches.
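As an illustration of the spline construction in this section, the following is a minimal sketch in plain Python that places interior knots at sample quantiles of the event times and evaluates a clamped B-spline basis by the Cox-de Boor recursion. The function names `quantile_knots` and `bspline_basis`, the clamped boundary knots, and the linear-interpolation quantile rule are assumptions made for the example and are not taken from the paper. The number of basis functions returned is nk + d + 1, matching the count Lk stated above.

```python
def quantile_knots(event_times, n_interior):
    """Interior knots at equally spaced sample quantiles of the event times."""
    ts = sorted(event_times)
    knots = []
    for j in range(1, n_interior + 1):
        # position of the j/(n_interior + 1) quantile, by linear interpolation
        pos = j / (n_interior + 1) * (len(ts) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ts) - 1)
        knots.append(ts[lo] + (pos - lo) * (ts[hi] - ts[lo]))
    return knots


def bspline_basis(u, interior, degree, lower, upper):
    """Evaluate all B-spline basis functions at u (Cox-de Boor recursion)."""
    # clamped knot vector: each boundary knot repeated degree + 1 times
    t = [lower] * (degree + 1) + list(interior) + [upper] * (degree + 1)
    n_basis = len(interior) + degree + 1
    # degree-0 functions are indicators of the half-open knot spans
    b = [1.0 if t[i] <= u < t[i + 1] else 0.0 for i in range(len(t) - 1)]
    if u == upper:
        # close the last non-empty span on the right
        b[n_basis - 1] = 1.0
    for p in range(1, degree + 1):
        nxt = []
        for i in range(len(t) - p - 1):
            left = right = 0.0
            if t[i + p] > t[i]:
                left = (u - t[i]) / (t[i + p] - t[i]) * b[i]
            if t[i + p + 1] > t[i + 1]:
                right = (t[i + p + 1] - u) / (t[i + p + 1] - t[i + 1]) * b[i + 1]
            nxt.append(left + right)
        b = nxt
    return b


# Quadratic spline with no interior knots on [0, 1]: three basis functions
print(bspline_basis(0.5, [], 2, 0.0, 1.0))
```

With no interior knots, the quadratic basis reduces to the Bernstein polynomials (1 − u)², 2u(1 − u) and u², which gives a quick sanity check of the recursion.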
Corrected score
The idea of the corrected score (correction) approach is to correct the bias of the naive estimating function that is obtained by replacing the true covariates with their sample means in the partial likelihood estimating function (Huang and Wang, 2000). Let g denote a scalar, vector or matrix which can be fixed or random, Yi(u) = I(Vi ≥ u) be the “at-risk” process, Ni(u) = I(Vi ≤ u, Δi = 1) be the counting process for the failure events, and η(u) = (β⊤, γ⊤ B⊤ (u))⊤. Let θ = (β⊤, γ⊤)⊤. The spline-based corrected score estimating equation can be written as
| (4) |
for a fixed time τ. Here , and with the kth component equal to ; is the variance of given Hi; and for a scalar, vector or matrix g, with
Here is the variance of given Hi, and with
with denoting a (K1 × K1) identity matrix and 0r×s an (r × s) zero matrix.
Conditional score
The conditional score approach treats the unobserved true covariates as nuisance parameters for which sufficient statistics may be derived, and a set of estimating equations based on conditioning on the sufficient statistics may be deduced that remove the dependence on the true covariates (Tsiatis and Davidian, 2001). The spline-based conditional score estimating equation can be written as
| (5) |
where is a “sufficient statistic” for Ri(u), and
When there is no measurement error (all ), it follows that , and , and thus both (4) and (5) reduce to the standard partial likelihood score estimating function for (3).
In practice, the error variances $\sigma_k^2$ are generally unknown. They can be estimated by the method of moments. The corrected score and conditional score estimates can then be obtained by substituting the estimates $\hat\sigma_k^2$ for $\sigma_k^2$ in (4) and (5).
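A method-of-moments estimate of an error variance from replicates can be sketched as follows. The sketch assumes the pooled within-subject form, in which squared deviations of each subject's replicates from that subject's mean are summed and divided by the total of (mi − 1) over subjects with replicates; the function name `error_variance_mom` and the input layout are hypothetical and chosen only for this example.

```python
def error_variance_mom(replicates):
    """Pooled within-subject variance estimate for one error-prone covariate.

    `replicates` is a list with one entry per subject; each entry is the list
    of that subject's replicate measurements (length 1 if not replicated).
    """
    sum_sq = 0.0
    dof = 0
    for w in replicates:
        m = len(w)
        if m < 2:
            continue  # subjects without replicates carry no information
        mean = sum(w) / m
        sum_sq += sum((x - mean) ** 2 for x in w)
        dof += m - 1
    if dof == 0:
        raise ValueError("at least one subject needs replicated measurements")
    return sum_sq / dof


# Three subjects: two with replicates, one with a single measurement
print(error_variance_mom([[1.0, 3.0], [2.0], [0.0, 0.0, 3.0]]))
```

Subjects with a single measurement are skipped, which is why a subset of subjects with more than one measurement is needed for the estimate to exist.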
3.2. Penalized variable selection
Assume there are multiple covariates. We are interested in estimating , when some components of η0(u) are zero and correspond to covariates that are not associated with the response. Without loss of generality, write where contains nonzero elements, contains zero elements, and . Write , where contains nonzero elements, contains zero elements and .
One popular technique for regression-based variable selection is the penalization, which can be applied to M-estimators, Z-estimators and U-estimators. Let for k = 1,…,K2. Let . For any vector function g on [0, τ], let ∥g∥2 = E{Δg⊤ (V)g(V)}, , . For any k = 1,…,K2 and ℓ, ℓ′ = 1,…,Lk, define the inner product with norm ∥Bkℓ∥2 = 〈Bkℓ, Bkℓ〉. For , let
We apply the penalization with respect to functions . It can be easily seen that is the derivative of the corrected log partial likelihood
| (6) |
which is also a function of . When applying penalization to , we maximize the following objective function
| (7) |
where as in Xue (2009) and pν(·) is some penalty function. Taking the derivative with respect to θ in (7), we obtain the penalized corrected score estimating equation
| (8) |
where is the derivative of pν(s) for s > 0 and . Estimating equation (8) can be rewritten as
where
Popular penalty functions include LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001). Since the LASSO lacks the oracle property, we focus on SCAD, which is defined by
with
Here the tuning parameter ν controls the variable selection, and the parameters in the spline functions control the smoothness of the estimated functions .
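For reference, the standard SCAD penalty of Fan and Li (2001) and its derivative can be written in a few lines. This is a sketch of the textbook definition; the default a = 3.7 is the value suggested by Fan and Li (2001) and is an assumption here, since the value used in this paper is not visible in the text above. The function names are hypothetical.

```python
def scad_derivative(s, nu, a=3.7):
    """Derivative of the SCAD penalty at s >= 0 with tuning parameter nu."""
    if s < 0:
        raise ValueError("s must be non-negative")
    if s <= nu:
        return nu  # LASSO-like constant slope near zero
    # slope decays linearly and vanishes for s >= a * nu
    return max(a * nu - s, 0.0) / (a - 1.0)


def scad_penalty(s, nu, a=3.7):
    """SCAD penalty at s >= 0, obtained by integrating scad_derivative."""
    if s < 0:
        raise ValueError("s must be non-negative")
    if s <= nu:
        return nu * s
    if s <= a * nu:
        return (2.0 * a * nu * s - s * s - nu * nu) / (2.0 * (a - 1.0))
    return (a + 1.0) * nu * nu / 2.0  # flat: large effects are not shrunk


print(scad_derivative(0.5, 1.0), scad_derivative(4.0, 1.0))
```

The zero derivative beyond a·ν is what leaves large coefficients essentially unpenalized, in contrast to the constant slope of the LASSO.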
We use the majorize-minorize (MM) algorithm to obtain the penalized corrected score estimator . Using local quadratic approximation (Fan and Li, 2001), the MM algorithm sets
at the (k + 1)th iteration for k ≥ 0, where θc(0) is the solution to , and
for a small number ε. We set ε = 10−3 in our numerical studies.
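The mechanics of a local quadratic approximation with the perturbation ε can be seen on a toy problem. The sketch below is not the estimating-equation update of this paper: it applies the same idea to a hypothetical one-dimensional penalized least squares criterion, 0.5(z − θ)² + pν(|θ|), where each iteration replaces the penalty by a quadratic with weight p′ν(|θ|)/(ε + |θ|). The function names, the toy criterion and the default a = 3.7 are assumptions made for illustration.

```python
def scad_derivative(s, nu, a=3.7):
    """Derivative of the standard SCAD penalty at s >= 0."""
    if s <= nu:
        return nu
    return max(a * nu - s, 0.0) / (a - 1.0)


def lqa_scad_1d(z, nu, eps=1e-3, a=3.7, max_iter=200, tol=1e-10):
    """Iterated local quadratic approximation for a scalar toy problem.

    Approximately minimizes 0.5 * (z - theta)**2 + p_nu(|theta|).
    """
    theta = z  # start from the unpenalized solution
    for _ in range(max_iter):
        # quadratic surrogate weight; eps keeps the ratio finite near zero
        weight = scad_derivative(abs(theta), nu, a) / (eps + abs(theta))
        new_theta = z / (1.0 + weight)  # minimizer of the surrogate
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta


print(lqa_scad_1d(5.0, 1.0))   # large signal: left unshrunk
print(lqa_scad_1d(0.1, 1.0))   # small signal: driven close to zero
```

Because of ε, small coefficients converge to values near zero rather than exactly zero, so in practice a threshold is typically applied to declare a coefficient zero.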
Similarly, we propose the penalized conditional score estimating equation
| (9) |
The estimator can be obtained using the MM algorithm through the iterations
3.3. Asymptotic properties
In this section, we derive the asymptotic properties of the proposed estimators. We first assume that the error variances $\sigma_k^2$ are known. To reduce the complexity of studying the asymptotics, we consider equally spaced knots and assume the numbers of the knots are all equal for k = 1,…,K2. Let h be the length of the subintervals between any two adjacent interior knots. The asymptotic properties of the penalized corrected score estimator, , are given in the following theorems with the proof outlined in the Appendix.
Theorem 3.1
Under Conditions (C1)–(C10) given in the Appendix, if the tuning parameters ν1 → 0, ν2 → 0 almost surely there exists a solution such that
Note that the estimator of α(u) is . Corresponding to the non-zero and zero coefficients, write and .
Theorem 3.2
Under Conditions (C1)–(C10) given in the Appendix, if ν1 → 0, ν2 → 0, ν1(nh)1/2 → ∞ and ν2(nh)1/2 → ∞, then with probability approaching 1, and .
Theorem 3.3
Under Conditions (C1)–(C10) given in the Appendix, if ν1 → 0, ν2 → 0, ν1(nh)1/2 → ∞, and ν2(nh)1/2 → ∞, then
where is given in (A.5) in the Appendix. In addition, is positive definite, where is the variance of the estimator when there is no measurement error.
Theorem 3.1 indicates the consistency of the penalized corrected score estimator. Theorems 3.2 and 3.3 indicate that the estimator has the “oracle” property (Donoho and Johnstone, 1998), that is, as n → ∞, the penalized corrected score estimator performs as well as if the correct submodel, which excludes the zero-effect covariates, were known.
When some error variances are greater than 0 and unknown, they can be estimated by the method of moments estimator (Song et al., 2002a):
This requires P(mik > 1) > 0 for error contaminated covariates (Condition (C11) in the Appendix). The corrected score and conditional score estimates can be obtained by substituting for in (8) and (9). It can be easily shown that is a root-n consistent estimator of . Replacing with does not affect the convergence rate of the penalized corrected score and conditional score estimators. The asymptotic normality of is given in the following theorem.
Theorem 3.4
Under Conditions (C1)–(C11) given in the Appendix, if ν1 → 0, ν2 → 0, ν1(nh)1/2 → ∞, and ν2(nh)1/2 → ∞, then
where is given in (A.8) in the Appendix. In addition, is positive definite.
Theorem 3.4 indicates that the estimator is less efficient when the error variances are estimated.
With similar arguments as those in Song and Wang (2017), we can show that the penalized conditional score estimator has the same asymptotic properties as the penalized corrected score estimator. The asymptotic distribution result enables us to construct confidence intervals for the coefficients simultaneously.
4. Simulation studies
We conducted simulation studies to evaluate the performance of the estimators. We considered the case of 15 covariates. The covariates are generated from multivariate normal distributions with common correlation ρ = 0, 0.25 or 0.5. Among them, four are measured with error with the corresponding coefficients β1 = 0, β2 = 0, β3 = −1, and α1(u) = 0.3log(u/5+1)−1.8, and 11 covariates are exactly measured with the corresponding coefficients βj = 0 for j = 4,…,12, β13 = −1, and α2(u) = 0. The variance of the error is equal to 0.25. Each error-contaminated covariate has two replicated observations. The baseline hazard is a constant λ0(u) = 0.0005. Censoring times were generated from an exponential distribution with mean 400. The censoring rates are between 38% and 41%.
We ran the simulations for n = 300, 500 and 1000. In each scenario, 500 Monte Carlo datasets were simulated. For each dataset, the coefficients were estimated using the following approaches: (i) the “ideal” approach where the true values of the covariates are used; (ii) the naive approach; (iii) the corrected score approach; (iv) the conditional score approach; (v) the penalized “ideal”, naive, corrected score and conditional score approaches.
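The covariate and censoring part of this design can be sketched as follows. Several details are assumptions made for the example and are not stated in the text: standard normal marginals, the one-factor construction used to obtain a common correlation, the convention that the first four generated covariates are the error-prone ones, and the function name `simulate_covariates`. Generation of the failure times under the time-varying coefficient is deliberately omitted.

```python
import math
import random


def simulate_covariates(n, rho, n_cov=15, n_error_prone=4,
                        err_var=0.25, n_rep=2, cens_mean=400.0, seed=0):
    """Draw true covariates H, replicate measurements W and censoring times C."""
    rng = random.Random(seed)
    H, W, C = [], [], []
    for _ in range(n):
        # one shared factor gives unit variances and common correlation rho
        shared = rng.gauss(0.0, 1.0)
        h = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(n_cov)]
        # classical additive error: replicate = true value + independent noise
        w = [[h[k] + rng.gauss(0.0, math.sqrt(err_var)) for _ in range(n_rep)]
             for k in range(n_error_prone)]
        H.append(h)
        W.append(w)
        C.append(rng.expovariate(1.0 / cens_mean))  # exponential, mean cens_mean
    return H, W, C


H, W, C = simulate_covariates(n=300, rho=0.25)
print(len(H), len(H[0]), len(W[0]), len(W[0][0]))
```

The one-factor construction is only one convenient way to obtain an exchangeable correlation structure; any multivariate normal sampler with the same covariance matrix would serve equally well.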
To reduce computational complexity, we considered ν1 = ν2 = ν in our numerical studies. This is justified by Theorems 3.1–3.3 in Section 3.3. We used quadratic splines with equally spaced knots. The penalty parameter ν and the number of knots were selected via a BIC type criterion, specifically, by minimizing −2Ln(θ) + np log(d), where np is the number of estimated nonzero parameters, d is the number of events, Ln(θ) is the log partial likelihood function for the “ideal” penalized and unpenalized approaches, the naive log partial likelihood function for the naive approaches, and the corrected log partial likelihood function for the corrected score and conditional score approaches. Our preliminary studies found that zero interior knots were selected for most of the datasets. Here we show the results with zero interior knots.
For the non-zero time-varying coefficients, we calculated the average of the mean absolute bias, , at equally spaced grids between 1 and 160, which are the 5th and 95th percentiles of the observed survival times, across the simulated datasets. Similarly, we calculated the average of the mean standard deviation, mean standard error and mean coverage probability of 95% Wald confidence intervals. For the constant non-zero coefficient, we gave the same statistics except replacing the mean absolute bias by the mean bias. The results for the nonzero coefficients are shown in Tables 1–3. Figure 1 shows the average of the estimates of α1 and the corresponding 95% point-wise confidence intervals. For all the estimators, the standard deviation tends to increase as ρ increases, and the penalized estimators are more efficient than the corresponding unpenalized estimators. The unpenalized and penalized naive approaches have large bias and poor coverage probabilities in the estimation of β3 and α1(u) in all the cases, and the coverage probabilities worsen as the sample size increases. The unpenalized corrected score and conditional score estimators also show relatively large bias when n = 300, and the coverage probability is somewhat below the nominal level, but their performance improves as the sample size increases. The corresponding penalized approaches not only reduce bias but also improve efficiency, especially when the correlation between the covariates is large. Penalization also improves the coverage probabilities of the conditional score approaches. Although the conditional score and the corrected score estimators have the same asymptotic distributions, the conditional score estimators have smaller bias and standard errors, which indicates that they have better finite sample performance. An intuitive explanation can be found in Song and Huang (2005) (Section 3.1).
Table 1.
Estimation of the nonzero coefficients when n = 300
| Correlation | Method | β3 Bias | β3 SD | β3 SE | β3 CP | β3 RE | β13 Bias | β13 SD | β13 SE | β13 CP | β13 RE | α1(u) Bias | α1(u) SD | α1(u) SE | α1(u) CP | α1(u) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corr = 0.00 | Ideal | −0.061 | 0.099 | 0.098 | 0.914 | | −0.059 | 0.102 | 0.098 | 0.900 | | 0.071 | 0.190 | 0.183 | 0.919 | |
| | Naive | 0.152 | 0.098 | 0.088 | 0.548 | | 0.045 | 0.101 | 0.094 | 0.894 | | 0.158 | 0.177 | 0.165 | 0.760 | |
| | Corr | −0.187 | 0.228 | 0.212 | 0.956 | | −0.151 | 0.173 | 0.152 | 0.889 | | 0.218 | 0.331 | 0.267 | 0.879 | |
| | Cond | −0.147 | 0.185 | 0.146 | 0.830 | | −0.121 | 0.149 | 0.125 | 0.836 | | 0.173 | 0.288 | 0.240 | 0.875 | |
| | P Ideal | −0.023 | 0.092 | 0.091 | 0.926 | 1.17 | −0.020 | 0.093 | 0.091 | 0.946 | 1.19 | 0.039 | 0.179 | 0.170 | 0.919 | 1.13 |
| | P Naive | 0.187 | 0.090 | 0.083 | 0.388 | 1.16 | 0.083 | 0.094 | 0.088 | 0.811 | 1.15 | 0.199 | 0.167 | 0.156 | 0.641 | 1.13 |
| | P Corr | −0.065 | 0.158 | 0.133 | 0.904 | 2.08 | −0.054 | 0.133 | 0.110 | 0.895 | 1.70 | 0.079 | 0.257 | 0.195 | 0.861 | 1.66 |
| | P Cond | −0.045 | 0.147 | 0.128 | 0.915 | 1.57 | −0.040 | 0.126 | 0.113 | 0.930 | 1.41 | 0.059 | 0.244 | 0.217 | 0.916 | 1.39 |
| Corr = 0.25 | Ideal | −0.049 | 0.108 | 0.108 | 0.942 | | −0.052 | 0.108 | 0.108 | 0.934 | | 0.055 | 0.221 | 0.210 | 0.929 | |
| | Naive | 0.175 | 0.100 | 0.095 | 0.530 | | 0.036 | 0.106 | 0.104 | 0.910 | | 0.191 | 0.197 | 0.187 | 0.721 | |
| | Corr | −0.184 | 0.210 | 0.246 | 0.977 | | −0.130 | 0.151 | 0.154 | 0.899 | | 0.204 | 0.354 | 0.303 | 0.894 | |
| | Cond | −0.141 | 0.188 | 0.162 | 0.890 | | −0.101 | 0.143 | 0.132 | 0.898 | | 0.160 | 0.328 | 0.280 | 0.905 | |
| | P Ideal | −0.012 | 0.097 | 0.095 | 0.938 | 1.24 | −0.012 | 0.096 | 0.095 | 0.934 | 1.26 | 0.045 | 0.205 | 0.191 | 0.921 | 1.17 |
| | P Naive | 0.183 | 0.092 | 0.086 | 0.457 | 1.19 | 0.047 | 0.097 | 0.094 | 0.895 | 1.20 | 0.207 | 0.183 | 0.172 | 0.647 | 1.15 |
| | P Corr | −0.054 | 0.153 | 0.139 | 0.947 | 1.88 | −0.038 | 0.125 | 0.111 | 0.912 | 1.46 | 0.062 | 0.281 | 0.214 | 0.867 | 1.59 |
| | P Cond | −0.034 | 0.145 | 0.131 | 0.934 | 1.67 | −0.025 | 0.121 | 0.113 | 0.936 | 1.40 | 0.050 | 0.271 | 0.244 | 0.924 | 1.46 |
| Corr = 0.50 | Ideal | −0.057 | 0.143 | 0.127 | 0.894 | | −0.061 | 0.140 | 0.128 | 0.904 | | 0.072 | 0.272 | 0.254 | 0.926 | |
| | Naive | 0.214 | 0.127 | 0.109 | 0.504 | | 0.013 | 0.139 | 0.125 | 0.922 | | 0.217 | 0.236 | 0.221 | 0.719 | |
| | Corr | −0.252 | 0.349 | 0.440 | 0.973 | | −0.172 | 0.229 | 0.239 | 0.924 | | 0.324 | 0.568 | 0.513 | 0.900 | |
| | Cond | −0.196 | 0.283 | 0.212 | 0.878 | | −0.128 | 0.201 | 0.163 | 0.868 | | 0.255 | 0.475 | 0.382 | 0.903 | |
| | P Ideal | −0.014 | 0.122 | 0.107 | 0.920 | 1.37 | −0.022 | 0.116 | 0.108 | 0.924 | 1.47 | 0.045 | 0.246 | 0.227 | 0.929 | 1.22 |
| | P Naive | 0.199 | 0.124 | 0.096 | 0.475 | 1.05 | −0.018 | 0.119 | 0.108 | 0.933 | 1.36 | 0.205 | 0.221 | 0.200 | 0.695 | 1.15 |
| | P Corr | −0.060 | 0.255 | 0.166 | 0.915 | 1.87 | −0.059 | 0.173 | 0.132 | 0.913 | 1.77 | 0.101 | 0.477 | 0.270 | 0.849 | 1.42 |
| | P Cond | −0.029 | 0.187 | 0.150 | 0.915 | 2.27 | −0.032 | 0.143 | 0.127 | 0.925 | 1.97 | 0.067 | 0.360 | 0.306 | 0.927 | 1.74 |
Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; NC: non-convergence rate (%).
Table 3.
Estimation of the nonzero coefficients when n = 1000
| Correlation | Method | β3 Bias | β3 SD | β3 SE | β3 CP | β3 RE | β13 Bias | β13 SD | β13 SE | β13 CP | β13 RE | α1(u) Bias | α1(u) SD | α1(u) SE | α1(u) CP | α1(u) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corr = 0.00 | Ideal | −0.018 | 0.051 | 0.051 | 0.950 | | −0.016 | 0.052 | 0.051 | 0.936 | | 0.037 | 0.091 | 0.094 | 0.924 | |
| | Naive | 0.188 | 0.048 | 0.045 | 0.022 | | 0.089 | 0.053 | 0.049 | 0.536 | | 0.206 | 0.087 | 0.085 | 0.270 | |
| | Corr | −0.052 | 0.080 | 0.081 | 0.928 | | −0.038 | 0.070 | 0.063 | 0.900 | | 0.055 | 0.128 | 0.112 | 0.883 | |
| | Cond | −0.045 | 0.079 | 0.071 | 0.886 | | −0.034 | 0.069 | 0.063 | 0.904 | | 0.049 | 0.126 | 0.119 | 0.906 | |
| | P Ideal | −0.007 | 0.050 | 0.049 | 0.950 | 1.05 | −0.006 | 0.051 | 0.049 | 0.944 | 1.04 | 0.037 | 0.090 | 0.092 | 0.921 | 1.03 |
| | P Naive | 0.197 | 0.047 | 0.046 | 0.014 | 1.05 | 0.100 | 0.052 | 0.049 | 0.474 | 1.02 | 0.219 | 0.085 | 0.085 | 0.238 | 1.04 |
| | P Corr | −0.024 | 0.076 | 0.070 | 0.912 | 1.13 | −0.017 | 0.068 | 0.060 | 0.904 | 1.05 | 0.040 | 0.123 | 0.103 | 0.872 | 1.08 |
| | P Cond | −0.019 | 0.074 | 0.070 | 0.938 | 1.12 | −0.013 | 0.067 | 0.062 | 0.924 | 1.05 | 0.038 | 0.122 | 0.118 | 0.922 | 1.08 |
| Corr = 0.25 | Ideal | −0.020 | 0.057 | 0.056 | 0.936 | | −0.013 | 0.061 | 0.056 | 0.936 | | 0.040 | 0.113 | 0.108 | 0.911 | |
| | Naive | 0.202 | 0.052 | 0.049 | 0.016 | | 0.078 | 0.060 | 0.054 | 0.654 | | 0.224 | 0.102 | 0.096 | 0.288 | |
| | Corr | −0.059 | 0.090 | 0.093 | 0.952 | | −0.035 | 0.079 | 0.067 | 0.890 | | 0.059 | 0.157 | 0.128 | 0.869 | |
| | Cond | −0.050 | 0.088 | 0.079 | 0.896 | | −0.029 | 0.078 | 0.067 | 0.900 | | 0.053 | 0.153 | 0.139 | 0.906 | |
| | P Ideal | −0.010 | 0.054 | 0.052 | 0.928 | 1.11 | −0.003 | 0.055 | 0.052 | 0.930 | 1.20 | 0.040 | 0.110 | 0.104 | 0.912 | 1.06 |
| | P Naive | 0.186 | 0.050 | 0.048 | 0.044 | 1.11 | 0.059 | 0.056 | 0.052 | 0.760 | 1.15 | 0.213 | 0.099 | 0.095 | 0.307 | 1.06 |
| | P Corr | −0.027 | 0.080 | 0.075 | 0.930 | 1.29 | −0.012 | 0.070 | 0.061 | 0.900 | 1.27 | 0.045 | 0.146 | 0.115 | 0.859 | 1.15 |
| | P Cond | −0.020 | 0.078 | 0.074 | 0.936 | 1.27 | −0.008 | 0.069 | 0.063 | 0.924 | 1.26 | 0.042 | 0.143 | 0.134 | 0.920 | 1.14 |
| Corr = 0.50 | Ideal | −0.017 | 0.065 | 0.065 | 0.948 | | −0.013 | 0.067 | 0.065 | 0.950 | | 0.050 | 0.130 | 0.130 | 0.922 | |
| | Naive | 0.247 | 0.062 | 0.056 | 0.022 | | 0.064 | 0.070 | 0.064 | 0.816 | | 0.257 | 0.113 | 0.112 | 0.297 | |
| | Corr | −0.068 | 0.119 | 0.133 | 0.978 | | −0.038 | 0.090 | 0.081 | 0.902 | | 0.088 | 0.197 | 0.165 | 0.865 | |
| | Cond | −0.056 | 0.115 | 0.097 | 0.888 | | −0.031 | 0.088 | 0.079 | 0.918 | | 0.077 | 0.192 | 0.176 | 0.908 | |
| | P Ideal | −0.006 | 0.058 | 0.058 | 0.946 | 1.28 | −0.002 | 0.060 | 0.058 | 0.950 | 1.25 | 0.050 | 0.124 | 0.123 | 0.915 | 1.10 |
| | P Naive | 0.197 | 0.058 | 0.053 | 0.070 | 1.16 | 0.001 | 0.063 | 0.059 | 0.940 | 1.26 | 0.216 | 0.109 | 0.109 | 0.358 | 1.07 |
| | P Corr | −0.023 | 0.096 | 0.086 | 0.924 | 1.55 | −0.009 | 0.075 | 0.069 | 0.924 | 1.45 | 0.057 | 0.175 | 0.134 | 0.844 | 1.27 |
| | P Cond | −0.016 | 0.094 | 0.083 | 0.922 | 1.50 | −0.006 | 0.074 | 0.069 | 0.932 | 1.42 | 0.052 | 0.172 | 0.163 | 0.923 | 1.25 |
Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; NC: non-convergence rate (%).
Figure 1.
Average estimates of α1(u) and the 95% pointwise confidence interval.
To evaluate the performance of the penalized approaches on variable selection, we calculated the percentage of correct selection of the model, and the percentage of times each covariate was included in the model. The results are shown in Table 4. All methods correctly select the covariates with non-zero coefficients in all cases, and the percentage of incorrectly selected covariates with zero coefficients tends to be higher if the coefficient is treated as time-varying in the model. The conditional score and the corrected score estimators perform better than the naive estimator. Of the two proposed methods, the penalized conditional score approach performs slightly better. All penalized approaches improve on variable selection as the sample size increases.
Table 4.
Average percentage (%) of correct selection of the model, and percentages of selection of individual covariates with each coefficient
Columns β3, β13 and α1 correspond to non-zero coefficients; columns β1 through α2 correspond to zero coefficients.

| n | Correlation | Method | Model | β3 | β13 | α1 | β1 | β2 | β4 | β5 | β6 | β7 | β8 | β9 | β10 | β11 | β12 | α2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n=300 | Corr = 0.00 | P Ideal | 65.8 | 100 | 100 | 100 | 1.0 | 2.2 | 2.2 | 2.8 | 2.8 | 2.8 | 3.4 | 3.0 | 3.4 | 2.6 | 2.0 | 16.5 |
| | | P Naive | 38.0 | 100 | 100 | 100 | 12.3 | 12.7 | 11.9 | 11.5 | 13.1 | 12.7 | 10.7 | 10.5 | 11.3 | 15.1 | 12.5 | 38.4 |
| | | P Corr | 35.8 | 100 | 100 | 100 | 7.7 | 7.3 | 6.6 | 4.7 | 7.3 | 6.7 | 5.6 | 8.1 | 6.9 | 6.9 | 5.8 | 30.8 |
| | | P Cond | 36.7 | 100 | 100 | 100 | 7.4 | 7.4 | 6.4 | 4.7 | 7.4 | 6.4 | 5.1 | 7.4 | 6.6 | 6.6 | 5.9 | 30.4 |
| | Corr = 0.25 | P Ideal | 74.3 | 100 | 100 | 100 | 1.4 | 2.2 | 1.4 | 2.4 | 2.4 | 1.6 | 1.0 | 1.2 | 2.0 | 1.8 | 1.8 | 15.0 |
| | | P Naive | 38.3 | 100 | 100 | 100 | 14.2 | 14.6 | 14.9 | 12.6 | 11.7 | 11.9 | 11.3 | 13.2 | 14.4 | 11.3 | 12.1 | 38.3 |
| | | P Corr | 45.9 | 100 | 100 | 100 | 6.0 | 7.5 | 4.3 | 4.9 | 4.3 | 6.2 | 4.1 | 3.6 | 5.8 | 3.8 | 6.2 | 28.6 |
| | | P Cond | 49.0 | 100 | 100 | 100 | 5.4 | 7.3 | 3.3 | 4.1 | 4.4 | 4.6 | 4.1 | 3.5 | 3.5 | 5.6 | 5.4 | 27.8 |
| | Corr = 0.50 | P Ideal | 82.2 | 100 | 100 | 100 | 0.6 | 1.0 | 0.0 | 0.8 | 0.4 | 0.8 | 0.8 | 0.8 | 0.2 | 0.8 | 0.6 | 13.8 |
| | | P Naive | 37.7 | 100 | 100 | 100 | 14.7 | 18.1 | 13.0 | 12.6 | 14.5 | 14.9 | 9.8 | 11.4 | 11.8 | 11.8 | 13.2 | 42.4 |
| | | P Corr | 61.3 | 100 | 100 | 100 | 5.6 | 4.6 | 3.6 | 3.6 | 3.1 | 4.4 | 3.6 | 3.6 | 4.6 | 4.9 | 3.8 | 25.6 |
| | | P Cond | 60.3 | 100 | 100 | 100 | 4.0 | 5.7 | 2.4 | 3.0 | 3.0 | 3.2 | 3.0 | 2.2 | 3.2 | 3.8 | 2.4 | 24.7 |
| n=500 | Corr = 0.00 | P Ideal | 90.8 | 100 | 100 | 100 | 0.8 | 0.4 | 0.8 | 0.4 | 0.6 | 0.2 | 0.2 | 0.0 | 0.2 | 0.0 | 0.2 | 5.8 |
| | | P Naive | 62.8 | 100 | 100 | 100 | 7.6 | 6.2 | 6.4 | 5.4 | 4.6 | 5.8 | 5.4 | 5.2 | 5.2 | 4.8 | 4.0 | 21.6 |
| | | P Corr | 72.3 | 100 | 100 | 100 | 2.2 | 2.4 | 1.4 | 1.8 | 1.0 | 1.4 | 3.8 | 1.4 | 2.0 | 1.4 | 1.4 | 12.6 |
| | | P Cond | 74.1 | 100 | 100 | 100 | 2.0 | 2.4 | 1.4 | 1.6 | 1.0 | 1.2 | 3.4 | 1.4 | 2.0 | 1.2 | 1.4 | 11.4 |
| | Corr = 0.25 | P Ideal | 89.8 | 100 | 100 | 100 | 0.2 | 0.2 | 0.4 | 0.8 | 0.4 | 0.6 | 0.2 | 0.8 | 0.0 | 0.2 | 0.0 | 7.2 |
| | | P Naive | 58.5 | 100 | 100 | 100 | 7.8 | 7.2 | 7.8 | 6.6 | 7.2 | 6.0 | 6.8 | 5.2 | 8.2 | 7.6 | 7.2 | 25.9 |
| | | P Corr | 73.0 | 100 | 100 | 100 | 1.8 | 1.6 | 1.6 | 2.0 | 1.0 | 2.2 | 2.2 | 1.8 | 1.4 | 1.4 | 1.4 | 15.9 |
| | | P Cond | 75.6 | 100 | 100 | 100 | 1.4 | 1.6 | 1.6 | 1.8 | 1.0 | 2.0 | 1.8 | 1.2 | 1.4 | 1.2 | 1.0 | 14.5 |
| | Corr = 0.50 | P Ideal | 97.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 2.6 |
| | | P Naive | 55.7 | 100 | 100 | 100 | 12.0 | 10.4 | 8.6 | 9.6 | 6.8 | 7.6 | 8.6 | 9.6 | 11.4 | 8.4 | 9.2 | 28.3 |
| | | P Corr | 86.9 | 100 | 100 | 100 | 0.6 | 0.6 | 1.0 | 0.2 | 0.6 | 0.4 | 0.4 | 0.2 | 0.6 | 1.0 | 0.2 | 10.1 |
| | | P Cond | 88.4 | 100 | 100 | 100 | 0.4 | 0.8 | 0.8 | 0.2 | 0.4 | 0.2 | 0.2 | 0.2 | 0.4 | 0.8 | 0.2 | 9.2 |
| n=1000 | Corr = 0.00 | P Ideal | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
| | | P Naive | 89.2 | 100 | 100 | 100 | 1.0 | 1.2 | 0.8 | 1.4 | 0.2 | 0.8 | 1.4 | 0.8 | 0.6 | 1.2 | 0.8 | 5.2 |
| | | P Corr | 96.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.0 | 2.4 |
| | | P Cond | 96.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.0 | 2.4 |
| | Corr = 0.25 | P Ideal | 99.4 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.6 |
| | | P Naive | 81.6 | 100 | 100 | 100 | 1.4 | 2.2 | 2.0 | 2.8 | 2.4 | 2.2 | 2.2 | 2.0 | 2.4 | 1.6 | 1.4 | 11.8 |
| | | P Corr | 95.8 | 100 | 100 | 100 | 0.2 | 0.4 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 3.4 |
| | | P Cond | 96.0 | 100 | 100 | 100 | 0.2 | 0.2 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 3.4 |
| | Corr = 0.50 | P Ideal | 99.8 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 |
| | | P Naive | 70.6 | 100 | 100 | 100 | 7.6 | 4.8 | 6.2 | 3.6 | 5.2 | 5.4 | 3.4 | 5.4 | 5.2 | 6.4 | 4.8 | 20.0 |
| | | P Corr | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
| | | P Cond | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
Corr, corrected score; Cond, conditional score; P, penalized.
5. Application
We applied the proposed approaches to the AIDS Clinical Trial (ACTG) 175 data. Access to the ACTG data is described at https://actgnetwork.org/clinical-trials/access-published-data. ACTG 175 is a randomized clinical trial to compare zidovudine alone, zidovudine plus didanosine, zidovudine plus zalcitabine, or didanosine alone in HIV-infected subjects on the basis of time to progression to AIDS or death (Hammer et al., 1996). Between December 1991 and October 1992, 2467 subjects were recruited and followed until November 1994. It is of interest to assess the effect of treatments on survival time adjusted for baseline covariates, including CD4 counts, antiretroviral history (naive or experienced), history of intravenous drug use (yes or no), Karnofsky score, homosexual activity (yes or no), age and gender. Our analysis included 2448 patients with observations on these variables. It is well known that CD4 measurements may be subject to substantial measurement error. In the ACTG 175 study, most subjects had replicated CD4 measurements before starting the treatments. The measurements between three weeks before randomization and one week after randomization were taken as replicates for baseline CD4 measurements. The logarithmic transformation was applied to CD4 counts to achieve approximately constant variance. The primary analysis found zidovudine alone to be inferior to the other three therapies; thus, further investigations focused on two treatment groups, zidovudine alone and the combination of the other three.
To determine if the coefficients are constant or time-varying, we used BIC to select among models with coefficients that are constant and quadratic splines with 0, 1, and 2 interior knots. Based on the selected model, history of intravenous drug use has a time-varying effect and the effects of treatment and other covariates are constant. We obtained the naive, conditional score, corrected score and the corresponding penalized estimates. For all these approaches, the BIC is smallest in the case of no interior knot for the time-varying coefficient. The estimated constant coefficients are shown in Table 5. All penalized approaches selected the covariates log(CD4), treatment, antiretroviral history, age, and Karnofsky score, and the estimates are significant, as are the unpenalized estimates. The penalized conditional score and corrected score estimates have smaller estimated standard errors than the corresponding unpenalized estimates, which may imply a possible efficiency gain. Homosexual activity, gender and history of intravenous drug use are not selected by the penalized methods. Based on the unpenalized approaches, homosexual activity and gender are insignificant, while history of intravenous drug use might have some effect at the beginning of the study, and the effect decayed and eventually disappeared around week 50 (Figure 2). The conditional score and corrected score estimates of treatment effects are larger in magnitude than the naive estimates.
Table 5.
Estimates (Standard Errors) of the constant coefficients in the ACTG 175 study.
| | Naive | Corr | Cond | P Naive | P Corr | P Cond |
|---|---|---|---|---|---|---|
| treatment | −0.405 (0.124) | −0.416 (0.135) | −0.416 (0.134) | −0.411 (0.130) | −0.423 (0.134) | −0.423 (0.132) |
| log(CD4) | −1.903 (0.191) | −2.217 (0.379) | −2.204 (0.223) | −1.909 (0.186) | −2.220 (0.336) | −2.207 (0.218) |
| antiretroviral experience | 0.293 (0.128) | 0.264 (0.140) | 0.265 (0.133) | 0.294 (0.130) | 0.265 (0.137) | 0.266 (0.132) |
| age | 0.021 (0.006) | 0.020 (0.008) | 0.021 (0.007) | 0.018 (0.006) | 0.018 (0.006) | 0.018 (0.006) |
| Karnofsky score | −0.036 (0.009) | −0.035 (0.008) | −0.035 (0.009) | −0.031 (0.008) | −0.029 (0.007) | −0.029 (0.008) |
| homosex | 0.133 (0.165) | 0.147 (0.175) | 0.147 (0.175) | 0 (–) | 0 (–) | 0 (–) |
| gender (male) | 0.116 (0.209) | 0.113 (0.218) | 0.113 (0.217) | 0 (–) | 0 (–) | 0 (–) |
Corr, corrected score; Cond, conditional score; P, penalized.
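To make the significance statements concrete, the following sketch computes Wald z statistics and two-sided normal p-values from the estimates and standard errors in the Naive column of Table 5. Treating significance as a Wald test at the 5% level is an assumption made for illustration; the helper name `wald` is hypothetical.

```python
import math

# (estimate, standard error) pairs copied from the Naive column of Table 5.
naive = {
    "treatment": (-0.405, 0.124),
    "log(CD4)": (-1.903, 0.191),
    "antiretroviral experience": (0.293, 0.128),
    "age": (0.021, 0.006),
    "Karnofsky score": (-0.036, 0.009),
    "homosex": (0.133, 0.165),
    "gender (male)": (0.116, 0.209),
}

def wald(estimate, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

results = {name: wald(est, se) for name, (est, se) in naive.items()}

# Covariates whose naive estimate is significant at the (assumed) 5% level.
significant = [name for name, (_, p) in results.items() if p < 0.05]
```

Under this assumption the five covariates retained by the penalized fits are the ones flagged, while homosexual activity and gender are not.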
Figure 2.
Estimate of the coefficient of history of intravenous drug use and the 95% pointwise confidence interval.
6. Discussion
We have proposed penalized variable selection approaches for partially linear proportional hazards models with covariate measurement error. The proposed approaches can be extended to include intermittently measured time-dependent covariates via joint modeling of the survival and longitudinal processes. The computation time usually increases with the number of covariates. Like other approaches for handling measurement error, the proposed approaches may break down when the measurement error is too large for a given sample size.
In this article, we assume that the number of covariates is finite. In our numerical studies, the dimension K is relatively low, which corresponds to many practical situations. In some recent studies, the ultra-high dimensional case with K diverging with n has been considered. We suspect that, with a diverging number of covariates, the proposed penalized variable selection approaches are still applicable. However, the data assumptions and proofs of variable selection properties usually differ substantially between a finite and a diverging number of covariates. Investigation of the proposed methodology with K → ∞ is highly nontrivial and will be pursued in future research.
To facilitate the development of the theory, in this paper we consider splines with quasi-uniform interior knots (Assumption (C7) in the Appendix). This assumption is the same as in Huang (1998) and Xue and Yang (2006). In our simulations and real data application, we used equally spaced knots and found that this method worked very well in all the examples. We have also conducted simulation studies using knots at equally spaced sample quantiles, and the results are very similar to those based on equally spaced knots.
In practice, if the failure events are sparse, we recommend that knots be placed at the sample quantiles of the failure times. In the Cox model literature, Nan et al. (2005) and Sleeper and Harrington (1990) also suggested this scheme, which is believed to reduce the chance of singularities compared with equally spaced knots. There are also methods involving adaptive knot selection (Stone et al., 1997; Miyata and Shen, 2012), at the expense of a larger computational burden. Developing an efficient and automatic criterion for knot selection is challenging in our model setting and warrants future study.
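The two knot placement schemes discussed above can be sketched as follows. The function names, the follow-up limit of 100, and the simulated failure times are assumptions for illustration only and do not reproduce the settings of the simulations or the ACTG 175 analysis.

```python
import numpy as np

def equally_spaced_knots(tau, n_interior):
    """Interior knots equally spaced on the open interval (0, tau)."""
    return np.linspace(0.0, tau, n_interior + 2)[1:-1]

def quantile_knots(event_times, n_interior):
    """Interior knots at equally spaced sample quantiles of failure times."""
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.quantile(np.asarray(event_times, dtype=float), probs)

# Hypothetical failure times concentrated early in follow-up.
rng = np.random.default_rng(0)
event_times = rng.exponential(scale=20.0, size=200)
event_times = event_times[event_times < 100.0]

k_eq = equally_spaced_knots(100.0, 2)   # ignores where events occur
k_q = quantile_knots(event_times, 2)    # follows the event distribution
```

When events cluster early, the quantile-based knots move toward the region that contains the data, so each spline basis function is supported by a comparable number of failures.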
Table 2.
Estimation of the nonzero coefficients when n = 500
| | | β3 | | | | | β13 | | | | | α1(u) | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Bias | SD | SE | CP | RE | Bias | SD | SE | CP | RE | Bias | SD | SE | CP | RE |
| Corr = 0.00 | Ideal | −0.028 | 0.078 | 0.073 | 0.930 | | −0.029 | 0.075 | 0.073 | 0.928 | | 0.046 | 0.143 | 0.137 | 0.910 | |
| | Naive | 0.180 | 0.069 | 0.065 | 0.228 | | 0.077 | 0.074 | 0.071 | 0.776 | | 0.185 | 0.131 | 0.124 | 0.559 | |
| | Corr | −0.088 | 0.123 | 0.125 | 0.954 | | −0.070 | 0.104 | 0.095 | 0.902 | | 0.114 | 0.207 | 0.169 | 0.871 | |
| | Cond | −0.073 | 0.116 | 0.103 | 0.886 | | −0.059 | 0.100 | 0.090 | 0.900 | | 0.098 | 0.197 | 0.173 | 0.892 | |
| | P Ideal | −0.007 | 0.074 | 0.070 | 0.942 | 1.10 | −0.007 | 0.072 | 0.070 | 0.944 | 1.09 | 0.035 | 0.138 | 0.131 | 0.914 | 1.08 |
| | P Naive | 0.199 | 0.067 | 0.064 | 0.144 | 1.06 | 0.098 | 0.072 | 0.069 | 0.692 | 1.06 | 0.207 | 0.127 | 0.120 | 0.467 | 1.00 |
| | P Corr | −0.032 | 0.111 | 0.099 | 0.928 | 1.22 | −0.025 | 0.096 | 0.084 | 0.918 | 1.17 | 0.051 | 0.189 | 0.147 | 0.863 | 1.20 |
| | P Cond | −0.020 | 0.107 | 0.098 | 0.934 | 1.19 | −0.017 | 0.093 | 0.087 | 0.938 | 1.14 | 0.043 | 0.183 | 0.166 | 0.923 | 1.12 |
| Corr = 0.25 | Ideal | −0.034 | 0.084 | 0.081 | 0.934 | | −0.024 | 0.081 | 0.080 | 0.936 | | 0.052 | 0.164 | 0.157 | 0.916 | |
| | Naive | 0.188 | 0.077 | 0.072 | 0.256 | | 0.064 | 0.083 | 0.078 | 0.826 | | 0.201 | 0.146 | 0.139 | 0.565 | |
| | Corr | −0.121 | 0.151 | 0.153 | 0.958 | | −0.076 | 0.124 | 0.104 | 0.880 | | 0.136 | 0.246 | 0.200 | 0.867 | |
| | Cond | −0.097 | 0.138 | 0.117 | 0.858 | | −0.060 | 0.116 | 0.098 | 0.878 | | 0.111 | 0.231 | 0.204 | 0.893 | |
| | P Ideal | −0.012 | 0.076 | 0.073 | 0.942 | 1.22 | −0.002 | 0.073 | 0.073 | 0.954 | 1.22 | 0.045 | 0.155 | 0.148 | 0.918 | 1.12 |
| | P Naive | 0.182 | 0.072 | 0.067 | 0.232 | 1.13 | 0.058 | 0.076 | 0.073 | 0.846 | 1.22 | 0.203 | 0.139 | 0.133 | 0.533 | 1.10 |
| | P Corr | −0.049 | 0.122 | 0.106 | 0.903 | 1.52 | −0.022 | 0.100 | 0.086 | 0.915 | 1.55 | 0.059 | 0.213 | 0.164 | 0.856 | 1.34 |
| | P Cond | −0.034 | 0.117 | 0.103 | 0.905 | 1.41 | −0.012 | 0.096 | 0.089 | 0.937 | 1.46 | 0.049 | 0.205 | 0.189 | 0.922 | 1.27 |
| Corr = 0.50 | Ideal | −0.043 | 0.096 | 0.095 | 0.924 | | −0.039 | 0.098 | 0.095 | 0.920 | | 0.057 | 0.197 | 0.190 | 0.926 | |
| | Naive | 0.223 | 0.084 | 0.082 | 0.238 | | 0.034 | 0.101 | 0.093 | 0.906 | | 0.240 | 0.168 | 0.164 | 0.536 | |
| | Corr | −0.160 | 0.196 | 0.235 | 0.984 | | −0.103 | 0.151 | 0.133 | 0.907 | | 0.189 | 0.366 | 0.283 | 0.879 | |
| | Cond | −0.121 | 0.160 | 0.146 | 0.894 | | −0.079 | 0.132 | 0.116 | 0.886 | | 0.145 | 0.301 | 0.264 | 0.912 | |
| | P Ideal | −0.018 | 0.082 | 0.083 | 0.944 | 1.38 | −0.014 | 0.085 | 0.083 | 0.936 | 1.32 | 0.048 | 0.182 | 0.174 | 0.923 | 1.18 |
| | P Naive | 0.185 | 0.077 | 0.075 | 0.315 | 1.20 | −0.012 | 0.089 | 0.084 | 0.942 | 1.29 | 0.210 | 0.158 | 0.153 | 0.567 | 1.13 |
| | P Corr | −0.055 | 0.146 | 0.125 | 0.948 | 1.81 | −0.033 | 0.117 | 0.101 | 0.912 | 1.84 | 0.072 | 0.279 | 0.193 | 0.855 | 1.71 |
| | P Cond | −0.037 | 0.126 | 0.117 | 0.938 | 1.62 | −0.022 | 0.105 | 0.099 | 0.934 | 1.57 | 0.057 | 0.254 | 0.232 | 0.935 | 1.41 |
Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; NC: non-convergence rate (%).
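The summary measures defined in the table note can be computed from simulation replicates as in the sketch below. The function name `summarize` and the four-replicate toy input are hypothetical; RE is omitted because its definition is not given in this part of the text.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, SD, SE and CP of one scalar parameter across replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    # A replicate "covers" when truth lies within est +/- z * se.
    covered = np.abs(est - truth) <= z * se
    return {
        "Bias": est.mean() - truth,   # average estimate minus true value
        "SD": est.std(ddof=1),        # empirical standard deviation
        "SE": se.mean(),              # average of estimated standard errors
        "CP": covered.mean(),         # empirical coverage of the 95% interval
    }

# Hypothetical toy input: four replicates of an estimate whose truth is 1.0.
summary = summarize([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```

Comparing SD with SE shows whether the estimated standard errors track the actual sampling variability, and CP far below 0.95 (as for the naive rows) signals bias or underestimated uncertainty.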
Acknowledgements
This research is supported in part by National Science Foundation grants DMS-1106816 (Song, Wang) and DMS-1542332 (Wang), and National Institutes of Health grants CA201207 (Song), HL121347 (Song), and CA204120 (Ma).
Appendix
A.1. Regularity conditions
Let be the space of functions that have r continuous derivatives for some r ≥ 2 and assume , . Let be the space of spline functions with knots sequence and order p on [0, τ]. Let . Let S(u, η)[g] = E{Sni(u, η)[g]}, where Sni(u, η)[1] = Yi(u) exp{η⊤(u)Hi(u)}. Similarly we denote and . For any matrix A, let ρmax(A) and ρmin(A) denote the maximum and minimum eigenvalues of A, and let A⊗k denote 1, A and AA⊤ respectively for k = 0, 1, 2. Define
Let be a neighborhood of η0. We assume the following regularity conditions.
-
(C1)
Pr(V ≥ τ) > 0.
-
(C2)
Pr(Δ = 1) > 0.
-
(C3)
There exist such that the density fV|Δ=1(x) of V satisfies that .
-
(C4)
-
(C5)
There exist such that uniformly for .
-
(C6)
ρmax(H⊗2) < ∞.
-
(C7)
The knot sequence is quasi-uniform. The number n_k of interior knots satisfies n^{1/(2r)} ≪ n_k ≪ n^{1/2−δ} for some 0 < δ < (r − 1)/(2r), where a_n ≪ b_n denotes that .
-
(C8)
E {H⊗2}2 < ∞, E [e⊤ e]2 < ∞, and .
-
(C9)
.
Conditions (C1) and (C2) are standard assumptions for proportional hazards models. Conditions (C8) and (C9) control the magnitude of the covariates, measurement error and baseline hazard, and are commonly used for joint models. Condition (C7) specifies the knot density for the spline approximation relative to the sample size. Condition (C3) ensures the equivalence of the norms ∥·∥ and ∥·∥2. Conditions (C4)–(C6) control the variation of the estimating functions around for u ∈ [0, τ]. Assumptions similar to (C3)–(C7) are usually adopted in the asymptotic theory of polynomial spline approximations (Xue et al., 2010).
We also need some assumptions about and .
-
(C10)
The number of nonzero components in the nonparametric part is fixed, and there is a constant cα > 0 such that . The nonzero coefficients in the linear part satisfy that .
Let . Let denote the vector of parameters for error variances. To be able to estimate ω, we make the following assumption:
-
(C11)
P(mik > 1) > 0 for k ∈ Q.
A.2. Proof of Theorem 1
For simplicity of notation, we assume that ν1 = ν2 = ν. Note that an estimator that maximizes (7) is a solution to (8). By Lemma B.5 in Song and Wang (2017), there exists satisfies that for . Let for and . Define an intermediate estimator of β that minimizes
where
First we show the consistency. Let , , , , , . Next define
Lemma A.1
Let . If θ ∈ Θ(C) and ν → 0, then
By the triangle inequality,
where is the kth element of . As θ ∈ Θ(C) we have
By Lemma B.5 in Song and Wang (2017),
for some constant Ck. Thus,
Since ∥η0k∥ > 0 for and ν → 0, ∥ηk∥ ≥ aν when n is large enough. By Lemmas B.3 and B.4 in Song and Wang (2017), ∥ηk∥2 ≥ aν when n is large enough. The result follows.
Lemma A.2
Under Conditions (C1)–(C10), one has .
Proof. We only need to show that for any ε > 0, there exists a positive constant C such that, as n → ∞,
| (A.1) |
where is the corrected log partial likelihood function given in (6). Note that
Since pν(θ) ≥ pν(0) = 0, we have
By a Taylor expansion, we have
| (A.2) |
where
and η* lies between and η = B*(u)θ.
From (A.5) and (A.10) in Song and Wang (2017), for θ ∈ ∂Θ(C), we have
By Lemma A.1, for almost surely. The result holds when C is large enough. □
Lemma A.3
Under Conditions (C1)–(C10), we have .
We only need to show that for any ε > 0, there exists a C such that
The arguments are similar to those for (A.1) and are thus omitted.
Theorem 3.1 follows from Lemmas A.2 and A.3.
A.3. Proof of Theorem 2
We first cite two lemmas, which correspond to Lemmas B.3 and B.4 in the Supplementary Material of Song and Wang (2017).
Lemma A.4
As n → ∞,
Lemma A.5
For any function , under Conditions (C2) and (C3), there exist constants 0 < c1 ≤ c2 < ∞ such that c1 ∥g∥2 ≤ ∥g∥n ≤ c2 ∥g∥2.
Note that , and
Then by Lemmas A.1, A.2 and A.3. This, together with Lemmas A.4 and A.5 implies and .
Let θ0 = (βs⊤, 0⊤, γs⊤, 0⊤)⊤, η0 = B*θ0, and define
It suffices to show that
Suppose θ ∈ Θ(A). Since pν(0) = 0, we have
where is the subvector of Uc(θ) composed of the elements, and is the subvector of Uc(θ) composed of the elements. With similar arguments as those for (A.5) and (A.10) in Song and Wang (2017), we have
Note that
Since , and ν(nh)^{1/2} → ∞, we have , and when n is large enough. Therefore,
As ν → 0, it follows that
This completes the proof.
A.4. Proof of Theorem 3
From Theorems 3.1 and 3.2, we have shown that there exists maximizing . It follows that also maximizes . Let . By Lemma A.1 and pν(0) = 0, for n large enough,
Let , contain the corresponding elements in . With similar arguments as those for Lemma B.12 in Song and Wang (2017), for any , we have
| (A.3) |
where
By similar arguments as those for Theorem 2 in Song and Wang (2017), it can be shown that has bounded eigenvalues and
| (A.4) |
From (A.6), the first K1 rows of equal [J1, J2] with
Thus, for any , replacing a by in (A.3) and letting
with being a matrix and an Ls × Ls matrix, we have
It can be easily seen that . We only need to show that
for some constant c3. This follows with similar arguments as those for Lemma A.5 in Song and Wang (2017). Therefore,
where
| (A.5) |
with
| (A.6) |
Finally, with similar arguments as those for Lemma B.14 in Song and Wang (2017), we can show that is positive definite.
A.5. Proof of Theorem 4
When ω is unknown, we will rewrite as , and modify the notation for the other functions similarly. A method of moments estimator of is
By the strong law of large numbers, under condition (C11), it can be easily shown that converges almost surely to . This, together with a Taylor expansion, implies that
With these facts, it can be shown that the results of Theorems 3.1 and 3.2 still hold. Then, using arguments similar to those for Lemma A.5 in Song and Wang (2017), we have
where
and is the sample variance of eikj, j = 1,…,mik. Then, with arguments similar to those for Theorem 3, it can be shown that
Both and Ψi(θ0, ω) have mean 0. By the law of iterated expectation, we have
Note that is a function of , which is independent of . Therefore . It follows that
| (A.7) |
Hence
where
| (A.8) |
It can be easily seen that is positive definite from (A.7).
References
- Cai J, Fan J, Jiang J, and Zhou H (2008), ‘Partially linear hazard regression with varying coefficients for multivariate survival data’, Journal of the Royal Statistical Society, Series B, 70, 141–158.
- Dafni UG and Tsiatis AA (1998), ‘Evaluating surrogate markers of clinical outcome measured with error’, Biometrics, 54, 1445–1462.
- Donoho DL and Johnstone IM (1998), ‘Minimax estimation via wavelet shrinkage’, The Annals of Statistics, 26, 879–921.
- Fan J and Li R (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association, 96, 1348–1360.
- Fan J and Li R (2002), ‘Variable selection for Cox’s proportional hazards model and frailty model’, The Annals of Statistics, 30, 74–99.
- Faucett CJ and Thomas DC (1996), ‘Simultaneously modeling censored survival data and repeatedly measured covariates: a Gibbs sampling approach’, Statistics in Medicine, 15, 1663–1685.
- Gui J and Li H (2005), ‘Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data’, Bioinformatics, 21, 3001–3008.
- Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, Hirsch MS, and Merigan TC (1996), ‘A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter’, New England Journal of Medicine, 335, 1081–1089.
- Henderson R, Diggle P, and Dobson A (2000), ‘Joint modeling of longitudinal measurements and event time data’, Biostatistics, 1, 465–480.
- Huang J (1998), ‘Projection estimation in multiple regression with application to functional ANOVA models’, The Annals of Statistics, 26, 242–272.
- Huang J and Ma S (2010), ‘Variable selection in the accelerated failure time model via the bridge method’, Lifetime Data Analysis, 16, 176–195.
- Huang Y and Wang CY (2000), ‘Cox regression with accurate covariates unascertainable: A nonparametric correction approach’, Journal of the American Statistical Association, 95, 1209–1219.
- Ma S, Kosorok M, and Fine J (2006), ‘Additive risk models for survival data with high-dimensional covariates’, Biometrics, 62, 202–210.
- Miyata S and Shen X (2012), ‘Adaptive free-knot splines’, Journal of Computational and Graphical Statistics, 12, 197–213.
- Nan B, Lin X, Lisabeth L, and Harlow S (2005), ‘A varying-coefficient Cox model for the effect of age at a marker event on age at menopause’, Biometrics, 61, 576–583.
- Prentice R (1982), ‘Covariate measurement errors and parameter estimates in a failure time regression model’, Biometrika, 69, 331–342.
- Sleeper LA and Harrington DP (1990), ‘Regression splines in the Cox model with application to covariate effects in liver disease’, Journal of the American Statistical Association, 85, 941–949.
- Song X, Davidian M, and Tsiatis AA (2002a), ‘An estimator for the proportional hazards model with multiple longitudinal covariates measured with error’, Biostatistics, 3, 511–528.
- Song X, Davidian M, and Tsiatis AA (2002b), ‘A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data’, Biometrics, 58, 742–753.
- Song X and Huang Y (2005), ‘On corrected score approach for proportional hazards model with covariate measurement error’, Biometrics, 61, 702–714.
- Song X and Wang CY (2008), ‘Semiparametric approaches for joint modeling of longitudinal and survival data with time varying coefficients’, Statistica Sinica, 27, 3178–3190.
- Song X and Wang L (2017), ‘Partially time-varying coefficient proportional hazards models with error prone time-dependent covariates: an application to the AIDS clinical trial group 175 data’, The Annals of Applied Statistics, 11, 274–296.
- Stone CJ, Hansen M, Kooperberg C, and Truong YK (1997), ‘Polynomial splines and their tensor products in extended linear modeling (with discussion)’, The Annals of Statistics, 25, 1371–1470.
- Tibshirani R (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society, Series B, 58, 267–288.
- Tsiatis AA and Davidian M (2001), ‘A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error’, Biometrika, 88, 447–458.
- Wang CY, Hsu L, Feng ZD, and Prentice RL (1997), ‘Regression calibration in failure time regression’, Biometrics, 53, 131–145.
- Wulfsohn MS and Tsiatis AA (1997), ‘A joint model for survival and longitudinal data measured with error’, Biometrics, 53, 330–339.
- Xu J and Zeger SL (2001), ‘Joint analysis of longitudinal data comprising repeated measures and times to events’, Applied Statistics, 50, 375–387.
- Xue L (2009), ‘Consistent variable selection in additive models’, Statistica Sinica, 19, 1281–1296.
- Xue L, Qu A, and Zhou J (2010), ‘Consistent model selection for marginal generalized additive model for correlated data’, Journal of the American Statistical Association, 105 (492), 1518–1530.
- Xue L and Yang L (2006), ‘Additive coefficient modeling via polynomial spline’, Statistica Sinica, 16, 1423–1446.
- Yan J and Huang J (2012), ‘Model selection for Cox models with time-varying coefficients’, Biometrics, 68, 419–428.
- Zhang H and Lu W (2007), ‘Adaptive lasso for Cox’s proportional hazards model’, Biometrika, 94, 691–703.


