Author manuscript; available in PMC: 2020 Oct 9.
Published in final edited form as: J Nonparametr Stat. 2018 Nov 14;31(1):196–220. doi: 10.1080/10485252.2018.1545903

Variable selection for partially linear proportional hazards model with covariate measurement error

Xiao Song a,*, Li Wang b, Shuangge Ma c, Hanwen Huang a
PMCID: PMC7546028  NIHMSID: NIHMS1512684  PMID: 33041606

Abstract

In survival analysis, we may encounter the following three problems: nonlinear covariate effect, variable selection and measurement error. Existing studies only address one or two of these problems. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realized using an iterative algorithm. The performance of the proposed approach is assessed via simulation studies, and further illustrated by application to data from an AIDS clinical trial.

Keywords: Corrected score, conditional score, joint modeling, polynomial spline, survival, 62J07, 62G05, 62G20, 62N01

1. Introduction

In survival analysis, the Cox proportional hazards model has been extensively adopted. When linear covariate effects are too restrictive, the partially linear (varying-coefficient) proportional hazards model has been used instead. Estimation, inference, and applications of the partially linear proportional hazards model have been studied in the literature; see, for example, Cai et al. (2008) and Nan et al. (2005). Not all collected covariates are necessarily associated with survival, which creates a demand for variable selection. Multiple techniques have been proposed for variable selection (along with estimation), including penalization, Bayesian, boosting, and thresholding methods, among others. Among them, penalization has drawn special attention because of its appealing methodological, theoretical, and computational properties. Penalized variable selection and estimation with the Cox model have been studied in Fan and Li (2002), Gui and Li (2005) and Zhang and Lu (2007). Yan and Huang (2012) studied penalized variable selection for the varying-coefficient proportional hazards model. Note that most existing studies have assumed linear covariate effects. Less directly relevant to this study, penalized variable selection and estimation with linear covariate effects have been considered for other survival models, for example the accelerated failure time (AFT) model (Huang and Ma, 2010) and the additive risk model (Ma et al., 2006).

In most of the existing studies, including the aforementioned, it has been assumed that the covariate values have been measured without error. The measurement error problem has been examined in quite a few publications. Under the standard proportional hazards model, available approaches include the regression calibration (Prentice, 1982; Wang et al., 1997; Dafni and Tsiatis, 1998), likelihood-based approaches (Wulfsohn and Tsiatis, 1997; Faucett and Thomas, 1996; Henderson et al., 2000; Xu and Zeger, 2001; Song et al., 2002b), conditional score (Tsiatis and Davidian, 2001; Song et al., 2002a) and correction approaches (Huang and Wang, 2000), among others. When the more challenging partially linear covariate effects are present, existing studies include the local conditional score and corrected score approaches based on kernel smoothing (Song and Wang, 2008) and spline smoothing (Song and Wang, 2017).

In summary, existing approaches address only one or two of the following problems: nonlinear covariate effects, variable selection using penalization, and measurement error. However, all three problems can co-exist in practice. The goal of this study is to fill this knowledge gap and develop a novel approach that addresses all three problems simultaneously. Specifically, a partially time-varying coefficient proportional hazards model is proposed to describe covariate effects more flexibly. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties, and it can be effectively realized using an iterative algorithm. We note that the individual components of the proposed model and approach may have roots in the existing literature; however, their combination, which tackles a practically important problem, has not been investigated in existing studies. The added complexity brings significant methodological, computational, and theoretical challenges.

The paper is organized as follows. Section 2 gives the model definition. In Section 3, we first review the spline-based conditional score and corrected score approaches (Section 3.1), then develop the corresponding penalized approaches (Section 3.2); Section 3.3 presents the asymptotic properties of the estimators. We assess the performance of the approaches via simulations in Section 4. The approaches are applied to the ACTG 175 data in Section 5. Section 6 provides concluding remarks and a brief discussion.

2. Model definition

Let $T$ denote the failure time and $C$ the censoring time. The observed survival data are $V = \min(T, C)$ and $\Delta = I(T \le C)$, where $I(\cdot)$ is the indicator function. Let $H = (H_1,\dots,H_K)^T$ denote $K$ covariates. Dealing with measurement error requires repeated error-prone measurements, a validation set, or instrumental variables. We focus on the case with replicate measurements of the error-prone covariates; the proposed approaches can be readily extended to the other cases. Suppose that the $k$th covariate $H_k$ may be measured with error $m_k$ times, with $W_k = (W_{k1},\dots,W_{km_k})^T$ being the $m_k$ error-contaminated measurements. To ensure identifiability, for error-contaminated covariates we assume that a subset of subjects have replicate observations, that is, $m_k > 1$. For error-free covariates, $m_k = 1$ and $W_k = H_k$. Let $W = (W_1^T,\dots,W_K^T)^T$ and $m = (m_1,\dots,m_K)^T$.

We assume the classical measurement error model

$$W_{kj} = H_k + e_{kj}, \quad j = 1,\dots,m_k,\; k = 1,\dots,K, \tag{1}$$

where the error $e_{kj}$ is normally distributed with mean zero and variance $\sigma_k^2$. For error-free covariates, $e_{kj} = 0$. Let $e = (e_1^T,\dots,e_K^T)^T$, where $e_k = (e_{k1},\dots,e_{km_k})^T$. We assume that the errors are independent, and that $e$ is independent of $(T, C)$ given $H$.
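Model (1) with replicates is straightforward to simulate, which is also a convenient way to check the subject-level means $\hat H_{ik}$ and the variances $\sigma_k^2/m_{ik}$ that the corrected and conditional scores rely on. The NumPy sketch below is illustrative only; the function name `replicate_means` and the array layout are hypothetical.

```python
import numpy as np

def replicate_means(H, sigma2, m, rng):
    """Simulate W_ikj = H_ik + e_ikj from model (1) and average the replicates.

    H      : (n, K) array of true covariates
    sigma2 : (K,) error variances (0 for error-free covariates)
    m      : (n, K) integer array of replicate counts m_ik
    Returns the (n, K) replicate means H_hat and the (n, K) diagonals
    sigma_k^2 / m_ik of Var(H_hat_i | H_i).
    """
    H = np.asarray(H, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n, K = H.shape
    H_hat = np.empty((n, K))
    for i in range(n):
        for k in range(K):
            # m_ik replicates with independent N(0, sigma_k^2) errors
            W = H[i, k] + rng.normal(0.0, np.sqrt(sigma2[k]), size=m[i, k])
            H_hat[i, k] = W.mean()
    Sigma_H_diag = sigma2 / m
    return H_hat, Sigma_H_diag

# Covariate 1 is error-prone with two replicates; covariate 2 is measured exactly
rng = np.random.default_rng(1)
n = 5000
H = rng.normal(size=(n, 2))
m = np.column_stack([np.full(n, 2), np.ones(n, dtype=int)])
H_hat, Sd = replicate_means(H, [0.25, 0.0], m, rng)
print(np.var(H_hat[:, 0] - H[:, 0]))   # close to 0.25 / 2
```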

Suppose the first $K_1$ covariates $X$ have constant effects on survival and the last $K_2$ covariates $Z$ have possibly time-varying effects, that is, $H = (X^T, Z^T)^T$ and $K = K_1 + K_2$. A partially time-varying coefficient proportional hazards model is assumed for the relationship between the hazard of failure and the covariates,

$$\lambda(u \mid H) = \lim_{du \to 0} du^{-1}\Pr(u \le T < u + du \mid T \ge u, H, C) = \lambda_0(u)\exp\{\beta_0^T X + \alpha_0^T(u) Z\}. \tag{2}$$

Here $\lambda_0(u)$ is an unspecified baseline hazard, $\beta_0$ is a length-$K_1$ vector of regression parameters, and $\alpha_0(u)$ is a length-$K_2$ vector of smooth functions. Model (2) subsumes the standard proportional hazards model ($K_2 = 0$) and the varying-coefficient model ($K_1 = 0$). Conditioning on $C$ in the definition of the hazard in (2) makes explicit the assumption that censoring is noninformative.

Suppose the observed data are independent and identically distributed copies of $(V, \Delta, W, m)$, denoted by $\{(V_i, \Delta_i, W_i, m_i): i = 1,\dots,n\}$. We focus on estimating the regression parameters $\beta_0$ and $\alpha_0(u)$.

3. Approaches

3.1. Estimation

For now, we assume the error variances $(\sigma_1^2,\dots,\sigma_K^2)$ are known. Song and Wang (2017) proposed spline-based corrected score and conditional score approaches when time-dependent covariates are measured with error; these can be adapted to the present setting as follows. Specifically, let $\alpha_{0k}(u)$ be the $k$th component of $\alpha_0(u)$. A B-spline basis expansion is used to approximate $\alpha_{0k}(u)$:

$$\alpha_{0k}(u) \approx \sum_{\ell=1}^{L_k}\gamma_{0k\ell}B_{k\ell}(u),$$

where $\{B_{k\ell}(u)\}_{\ell=1}^{L_k}$ is a set of basis functions, and $L_k = n_k + d + 1$ is the number of basis functions used to approximate $\alpha_{0k}(u)$, with $n_k$ the number of interior knots and $d$ the degree of the spline. The interior knots can be either equally spaced or placed at sample quantiles of the observed failure times, so that there are about the same number of events between any two adjacent knots. In practice, if failure events are sparse, we recommend the second option to reduce the chance of singularities. With this approximation, model (2) can be written approximately in the form of a standard proportional hazards model:

$$\lambda_i(u) \approx \lambda_0(u)\exp\{\theta_0^T R_i(u)\}. \tag{3}$$

Here $R(u) = (X^T, Z^T B(u))^T$ is the vector of “covariates”, where

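$$B(u) = \begin{pmatrix} B_{11}(u)\ \cdots\ B_{1L_1}(u) & 0\ \cdots\ 0 & \cdots & 0\ \cdots\ 0\\ 0\ \cdots\ 0 & B_{21}(u)\ \cdots\ B_{2L_2}(u) & \cdots & 0\ \cdots\ 0\\ \vdots & \vdots & \ddots & \vdots\\ 0\ \cdots\ 0 & 0\ \cdots\ 0 & \cdots & B_{K_2 1}(u)\ \cdots\ B_{K_2 L_{K_2}}(u)\end{pmatrix}$$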

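is a $K_2 \times L$ matrix with $L = \sum_{k=1}^{K_2} L_k$, $\gamma_{0k} = (\gamma_{0k1},\dots,\gamma_{0kL_k})^T$ for $k = 1,\dots,K_2$, $\gamma_0 = (\gamma_{01}^T,\dots,\gamma_{0K_2}^T)^T$, and $\theta_0 = (\beta_0^T, \gamma_0^T)^T$. The regression coefficient $\theta_0$ in (3) can be estimated by techniques that account for measurement error, such as the corrected score and conditional score approaches.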
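A small NumPy sketch of the spline construction: it evaluates a B-spline basis by the Cox–de Boor recursion with interior knots at sample quantiles of (simulated) event times, assembles the block matrix $B(u)$, and checks that $\theta^T R(u) = \beta^T X + \alpha^T(u) Z$ with $\alpha(u) = B(u)\gamma$. Clamped boundary knots, the helper names, and all numerical values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bspline_basis(u, interior_knots, degree, lower, upper):
    """Evaluate the L_k = n_k + d + 1 clamped B-spline basis functions at u with the
    Cox-de Boor recursion; returns shape (len(u), L_k). Rows sum to 1 on [lower, upper]."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = np.concatenate([np.repeat(lower, degree + 1), np.sort(interior_knots),
                        np.repeat(upper, degree + 1)])
    # Degree 0: indicators of the non-empty knot intervals [t_j, t_{j+1})
    B = np.zeros((u.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    B[u == upper, np.flatnonzero(t[:-1] < t[1:]).max()] = 1.0  # include the right end point
    # Raise the degree one step at a time
    for d in range(1, degree + 1):
        B_next = np.zeros((u.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            if t[j + d] > t[j]:
                B_next[:, j] += (u - t[j]) / (t[j + d] - t[j]) * B[:, j]
            if t[j + d + 1] > t[j + 1]:
                B_next[:, j] += (t[j + d + 1] - u) / (t[j + d + 1] - t[j + 1]) * B[:, j + 1]
        B = B_next
    return B

def block_basis(rows):
    """Place each row (B_k1(u), ..., B_kL_k(u)) in its own block of the K2 x L matrix B(u)."""
    B = np.zeros((len(rows), sum(len(r) for r in rows)))
    start = 0
    for k, r in enumerate(rows):
        B[k, start:start + len(r)] = r
        start += len(r)
    return B

# Interior knots at sample quantiles of hypothetical event times; quadratic splines (d = 2)
rng = np.random.default_rng(0)
event_times = rng.exponential(100.0, size=200)
upper = event_times.max()
n_k, d = 3, 2
knots = np.quantile(event_times, np.linspace(0.0, 1.0, n_k + 2)[1:-1])
grid = np.linspace(0.0, upper, 50)
basis_grid = bspline_basis(grid, knots, d, 0.0, upper)     # shape (50, n_k + d + 1)

# Working covariates R(u) = (X^T, Z^T B(u))^T at one time point, with K1 = 2, K2 = 2
b = bspline_basis(30.0, knots, d, 0.0, upper)[0]          # L_1 = L_2 = 6 here
B_u = block_basis([b, b])                                 # K2 x L = 2 x 12
x, z = np.array([1.0, -0.5]), np.array([2.0, 1.5])
beta, gamma = np.array([0.3, -1.0]), rng.normal(size=12)
R_u = np.concatenate([x, z @ B_u])                        # length K1 + L
theta = np.concatenate([beta, gamma])                     # theta = (beta^T, gamma^T)^T
alpha_u = B_u @ gamma                                     # alpha(u) = B(u) gamma
print(np.isclose(theta @ R_u, beta @ x + alpha_u @ z))    # True
```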

Corrected score

The idea of the corrected score (correction) approach is to correct the bias of the naive estimating function obtained by replacing the true covariates with their sample means in the partial likelihood estimating function (Huang and Wang, 2000). Let $g$ denote a scalar, vector or matrix, which can be fixed or random; let $Y_i(u) = I(V_i \ge u)$ be the “at-risk” process, $N_i(u) = I(V_i \le u, \Delta_i = 1)$ the counting process for failure events, and $\eta(u) = (\beta^T, \gamma^T B^T(u))^T$. Let $\theta = (\beta^T, \gamma^T)^T$. The spline-based corrected score estimating equation can be written as

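$$U_n^c(\theta) = \frac{1}{n}\sum_{i=1}^n \int_0^\tau \left\{\hat R_i(u) + \Sigma_{R_i}(u)\theta - \frac{S_n^c(u,\eta)[\hat R]}{S_n^c(u,\eta)[1]}\right\} dN_i(u) = 0 \tag{4}$$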

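for a fixed time $\tau$. Here $\hat R_i(u) = (\bar X_i^T, \bar Z_i^T B(u))^T$ and $\hat H_i = (\bar X_i^T, \bar Z_i^T)^T$, whose $k$th component is $\hat H_{ik} = m_{ik}^{-1}\sum_{j=1}^{m_{ik}} W_{ikj}$; $\Sigma_{R_i}(u)$ is the variance of $\hat R_i(u)$ given $H_i$; and for a scalar, vector or matrix $g$, $S_n^c(u,\eta)[g] = n^{-1}\sum_{i=1}^n S_{ni}^c(u,\eta)[g]$ with

$$S_{ni}^c(u,\eta)[g] = Y_i(u)\, g_i \exp\{\eta^T(u)\hat H_i - \eta^T(u)\Sigma_{H_i}\eta(u)/2\}.$$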

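Here $\Sigma_{H_i} = \mathrm{diag}(m_{i1}^{-1}\sigma_1^2,\dots,m_{iK}^{-1}\sigma_K^2)$ is the variance of $\hat H_i$ given $H_i$, and $\Sigma_{R_i}(u) = B^{*T}(u)\Sigma_{H_i}B^*(u)$ with

$$B^*(u) = \begin{pmatrix} I_{K_1\times K_1} & 0_{K_1\times L}\\ 0_{K_2\times K_1} & B(u)\end{pmatrix},$$

with $I_{K_1\times K_1}$ denoting the $(K_1 \times K_1)$ identity matrix and $0_{r\times s}$ an $(r \times s)$ zero matrix.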
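Two facts explain the form of $S_{ni}^c$: since $B^*(u)\theta = \eta(u)$, the quadratic form $\theta^T\Sigma_{R_i}(u)\theta$ equals $\eta^T(u)\Sigma_{H_i}\eta(u)$; and for normal errors, $E[\exp\{\eta^T\hat H_i - \eta^T\Sigma_{H_i}\eta/2\} \mid H_i] = \exp(\eta^T H_i)$, which is why the factor $\exp\{-\eta^T\Sigma_{H_i}\eta/2\}$ removes the bias of the naive exponential term. The NumPy sketch below checks both numerically; the helper names and all numbers are illustrative assumptions.

```python
import numpy as np

def b_star(B_u, K1):
    """B*(u) = [[I_{K1}, 0_{K1 x L}], [0_{K2 x K1}, B(u)]], so that B*(u) theta = eta(u)."""
    K2, L = B_u.shape
    top = np.hstack([np.eye(K1), np.zeros((K1, L))])
    bottom = np.hstack([np.zeros((K2, K1)), B_u])
    return np.vstack([top, bottom])

def sigma_R(B_u, K1, sigma2, m_i):
    """Return (Sigma_{R_i}(u), Sigma_{H_i}) with Sigma_{H_i} = diag(sigma_k^2 / m_ik)
    and Sigma_{R_i}(u) = B*(u)^T Sigma_{H_i} B*(u)."""
    Sigma_H = np.diag(np.asarray(sigma2, dtype=float) / np.asarray(m_i, dtype=float))
    Bs = b_star(B_u, K1)
    return Bs.T @ Sigma_H @ Bs, Sigma_H

# K1 = 2 constant-effect covariates (X1 error-prone, X2 exact); K2 = 1 error-prone Z, L = 3
B_u = np.array([[0.2, 0.5, 0.3]])
S_R, S_H = sigma_R(B_u, 2, sigma2=[0.25, 0.0, 0.25], m_i=[2, 1, 2])
theta = np.array([0.5, -1.0, 0.4, -0.2, 0.8])    # (beta_1, beta_2, gamma_11, gamma_12, gamma_13)
eta = b_star(B_u, 2) @ theta                      # (beta_1, beta_2, alpha_1(u)) = (0.5, -1.0, 0.22)
print(theta @ S_R @ theta, eta @ S_H @ eta)       # the two quadratic forms agree

# Monte Carlo check of E[exp{eta^T H_hat - eta^T Sigma_H eta / 2} | H] = exp(eta^T H)
rng = np.random.default_rng(0)
H = np.array([0.3, 1.0, -0.7])
H_hat = H + rng.normal(size=(200_000, 3)) * np.sqrt(np.diag(S_H))
corrected = np.exp(H_hat @ eta - eta @ S_H @ eta / 2.0)
print(corrected.mean() / np.exp(H @ eta))         # close to 1
```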

Conditional score

The conditional score approach treats the unobserved true covariates as nuisance parameters for which sufficient statistics can be derived; conditioning on these sufficient statistics yields estimating equations that no longer depend on the true covariates (Tsiatis and Davidian, 2001). The spline-based conditional score estimating equation can be written as

$$U_n^d(\theta) = n^{-1}\sum_{i=1}^n \int_0^\tau \left\{\hat R_i^*(u) - \frac{S_n^d(u,\eta)[\hat R^*]}{S_n^d(u,\eta)[1]}\right\} dN_i(u) = 0, \tag{5}$$

where $\hat R_i^*(u) = \hat R_i(u) + \Sigma_{R_i}(u)\theta\, dN_i(u)$ is a “sufficient statistic” for $R_i(u)$, and

$$S_n^d(u,\eta)[g] = n^{-1}\sum_{i=1}^n Y_i(u)\, g_i \exp\{\theta^T\hat R_i^*(u) - \theta^T\Sigma_{R_i}(u)\theta/2\}.$$

When there is no measurement error (all $\sigma_k^2 = 0$), $\Sigma_{R_i}(u) = 0$ and $\hat R_i^*(u) = \hat R_i(u) = R_i(u)$, so both (4) and (5) reduce to the standard partial likelihood score estimating function for (3).
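For intuition, the conditional score (5) can be evaluated directly in the special case with only constant coefficients ($K_2 = 0$, so $\hat R_i(u) = \hat H_i$). The NumPy sketch below is an illustration under stated assumptions, not the authors' code: it assumes no tied event times, takes $\tau$ beyond the largest observed time, and uses hypothetical function names. With $\Sigma_{R_i} = 0$ it should reproduce the ordinary Cox partial-likelihood score, which the last lines compare.

```python
import numpy as np

def conditional_score(theta, V, delta, R_hat, Sigma):
    """U_n^d(theta) from (5) for constant coefficients.

    V: (n,) observed times; delta: (n,) event indicators;
    R_hat: (n, p) replicate-mean covariates; Sigma: (n, p, p) matrices Sigma_{R_i}.
    Assumes no tied event times (dN_j(u) = 1 only for the failing subject)
    and integrates over all observed event times.
    """
    n, p = R_hat.shape
    quad = np.einsum("j,ijk,k->i", theta, Sigma, theta)   # theta^T Sigma_{R_i} theta
    U = np.zeros(p)
    for i in np.flatnonzero(delta):
        at_risk = V >= V[i]                               # Y_j(V_i)
        R_star = R_hat.copy()
        R_star[i] += Sigma[i] @ theta                     # sufficient statistic for the failing subject
        w = at_risk * np.exp(R_star @ theta - quad / 2.0)
        U += R_star[i] - (w @ R_star) / w.sum()
    return U / n

def cox_score(theta, V, delta, X):
    """Ordinary partial-likelihood score, for comparison."""
    U = np.zeros(X.shape[1])
    for i in np.flatnonzero(delta):
        w = (V >= V[i]) * np.exp(X @ theta)
        U += X[i] - (w @ X) / w.sum()
    return U / len(V)

rng = np.random.default_rng(2)
n, p = 100, 2
X = rng.normal(size=(n, p))
V = rng.exponential(1.0, size=n)
delta = rng.uniform(size=n) < 0.7
theta = np.array([0.5, -0.3])
U_d = conditional_score(theta, V, delta, X, np.zeros((n, p, p)))
U_cox = cox_score(theta, V, delta, X)
print(np.allclose(U_d, U_cox))   # no measurement error: the two scores coincide
```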

In practice, $(\sigma_1^2,\dots,\sigma_K^2)$ are generally unknown. They can be estimated by the method of moments. The corrected score and conditional score estimates can then be obtained by substituting $(\hat\sigma_1^2,\dots,\hat\sigma_K^2)$ for $(\sigma_1^2,\dots,\sigma_K^2)$ in $\Sigma_{R_i}(u)$ in (4) and (5).
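The method-of-moments estimator of $\sigma_k^2$ (written out in Section 3.3) pools within-subject squared deviations over subjects with more than one replicate. A minimal sketch, assuming the replicates of one covariate are stored as a list of per-subject arrays (a hypothetical layout) and that at least one subject has $m_{ik} > 1$:

```python
import numpy as np

def mom_error_variance(replicates):
    """sigma_k^2 hat = sum_i sum_j I(m_ik > 1)(W_ikj - Wbar_ik)^2 / sum_i I(m_ik > 1)(m_ik - 1).

    replicates: list with one 1-D array of measurements W_ik1, ..., W_ikm_ik per subject.
    """
    num, den = 0.0, 0
    for w in replicates:
        w = np.asarray(w, dtype=float)
        if w.size > 1:                       # only subjects with m_ik > 1 contribute
            num += np.sum((w - w.mean()) ** 2)
            den += w.size - 1
    return num / den

print(mom_error_variance([[1.0, 3.0], [2.0], [0.0, 0.0, 3.0]]))  # (2 + 6) / (1 + 2)

# Simulated check: sigma^2 = 0.25 and two replicates per subject
rng = np.random.default_rng(3)
H = rng.normal(size=2000)
reps = [h + rng.normal(0.0, 0.5, size=2) for h in H]
print(mom_error_variance(reps))  # close to 0.25
```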

3.2. Penalized variable selection

Assume there are multiple covariates. We are interested in estimating $\eta_0(u) = (\beta_0^T, \alpha_0^T(u))^T$ when some components of $\eta_0(u)$ are zero and correspond to covariates that are not associated with the response. Without loss of generality, write $\beta_0 = (\beta_{0s}^T, \beta_{0z}^T)^T$, where $\beta_{0s}$ contains $K_{1s}$ nonzero elements, $\beta_{0z}$ contains $K_{1z}$ zero elements, and $K_1 = K_{1s} + K_{1z}$. Similarly, write $\alpha_0(u) = (\alpha_{0s}^T(u), \alpha_{0z}^T(u))^T$, where $\alpha_{0s}(u)$ contains $K_{2s}$ nonzero elements, $\alpha_{0z}(u)$ contains $K_{2z}$ zero elements, and $K_2 = K_{2s} + K_{2z}$.

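Penalization is a popular technique for regression-based variable selection and can be applied to M-estimators, Z-estimators and U-estimators. Let $\mathcal G_k = \{g_k(u): g_k(u) = \sum_{\ell=1}^{L_k}\gamma_{k\ell}B_{k\ell}(u)\}$ for $k = 1,\dots,K_2$, and let $\mathcal G^* = \mathbb R^{K_1}\times\mathcal G_1\times\cdots\times\mathcal G_{K_2}$. For any vector function $g$ on $[0,\tau]$, let $\|g\|^2 = E\{g^T(V)g(V)\}$, $\|g\|_n^2 = n^{-1}\sum_{i=1}^n \Delta_i g^T(V_i)g(V_i)$, and $\|g\|_2^2 = \int g^T(u)g(u)\,du$. For any $k = 1,\dots,K_2$ and $\ell, \ell' = 1,\dots,L_k$, define the inner product $\langle B_{k\ell}, B_{k\ell'}\rangle = \int B_{k\ell}(u)B_{k\ell'}(u)\,du$ with norm $\|B_{k\ell}\|^2 = \langle B_{k\ell}, B_{k\ell}\rangle$. For $B_k = (B_{k1},\dots,B_{kL_k})^T$, let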

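$$\langle B_k, B_k\rangle = \begin{pmatrix}\langle B_{k1}, B_{k1}\rangle & \cdots & \langle B_{k1}, B_{kL_k}\rangle\\ \vdots & \ddots & \vdots\\ \langle B_{kL_k}, B_{k1}\rangle & \cdots & \langle B_{kL_k}, B_{kL_k}\rangle\end{pmatrix}.$$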
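The Gram matrix $\langle B_k, B_k\rangle$ and the function norm $\|\gamma_k^T B_k\|_2 = (\gamma_k^T\langle B_k, B_k\rangle\gamma_k)^{1/2}$ can be approximated by numerical quadrature. The sketch below is basis-agnostic (any callable returning basis values can be plugged in, including a B-spline basis); the function names, the trapezoid rule and the grid size are assumptions made for illustration. It is checked on the simple basis $(1, u)$ on $[0, 1]$, whose Gram matrix is known in closed form.

```python
import numpy as np

def gram_matrix(basis, lower, upper, n_grid=20001):
    """Approximate <B_k, B_k>[l, l'] = int B_kl(u) B_kl'(u) du on [lower, upper]
    with the trapezoid rule; basis(u) must return an array of shape (len(u), L_k)."""
    u = np.linspace(lower, upper, n_grid)
    Bu = basis(u)
    w = np.full(n_grid, (upper - lower) / (n_grid - 1))   # trapezoid weights
    w[0] /= 2.0
    w[-1] /= 2.0
    return Bu.T @ (w[:, None] * Bu)

def function_norm(gamma, G):
    """||gamma^T B_k||_2 = sqrt(gamma^T <B_k, B_k> gamma)."""
    return float(np.sqrt(gamma @ G @ gamma))

# Basis (1, u) on [0, 1]: exact Gram matrix [[1, 1/2], [1/2, 1/3]]
G = gram_matrix(lambda u: np.column_stack([np.ones_like(u), u]), 0.0, 1.0)
print(np.round(G, 6))
print(function_norm(np.array([1.0, 1.0]), G) ** 2)   # approximately int_0^1 (1 + u)^2 du = 7/3
```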

We apply the penalization with respect to functions $g = (g_1,\dots,g_K)^T \in \mathcal G^*$. It can be easily seen that $U_n^c(\theta)$ is the derivative of the corrected log partial likelihood

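$$L_n^c(\theta) = n^{-1}\sum_{i=1}^n \int_0^\tau \left\{\theta^T\hat R_i(u) + \frac{1}{2}\theta^T\Sigma_{R_i}(u)\theta - \log S_n^c(u,\eta)[1]\right\} dN_i(u), \tag{6}$$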

which is also a function of $\eta \in \mathcal G^*$. When applying penalization to $L_n^c(\theta)$, we maximize the following objective function

$$L_n^{cP}(\theta) = L_n^c(\theta) - \sum_{k=1}^{K_1} p_{\nu_1}(|\beta_k|) - \sum_{k=1}^{K_2} p_{\nu_2}(\|\gamma_k^T B_k\|), \tag{7}$$

where $\|\gamma_k^T B_k\|^2 = \gamma_k^T\langle B_k, B_k\rangle\gamma_k$ as in Xue (2009), and $p_\nu(\cdot)$ is a penalty function. Taking the derivative of (7) with respect to $\theta$, we obtain the penalized corrected score estimating equation

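$$U_n^c(\theta) - \sum_{k=1}^{K_1} p'_{\nu_1}(|\beta_k|)\frac{\beta_k}{|\beta_k|} - \sum_{k=1}^{K_2} p'_{\nu_2}(\|\gamma_k^T B_k\|)\,\|\gamma_k^T B_k\|^{-1}\langle B_k, B_k\rangle\gamma_k = 0, \tag{8}$$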

where $p'_\nu(s)$ is the derivative of $p_\nu(s)$ for $s > 0$, and $p'_\nu(0) = 0$. Estimating equation (8) can be rewritten as

$$U_n^{cP}(\theta) = U_n^c(\theta) - \Omega_\nu(\theta)\theta = 0,$$

where

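$$\Omega_\nu(\theta) = \mathrm{diag}\left\{\frac{p'_{\nu_1}(|\beta_1|)}{|\beta_1|},\dots,\frac{p'_{\nu_1}(|\beta_{K_1}|)}{|\beta_{K_1}|},\ \frac{p'_{\nu_2}(\|\gamma_1^T B_1\|)}{\|\gamma_1^T B_1\|}\langle B_1, B_1\rangle,\dots,\frac{p'_{\nu_2}(\|\gamma_{K_2}^T B_{K_2}\|)}{\|\gamma_{K_2}^T B_{K_2}\|}\langle B_{K_2}, B_{K_2}\rangle\right\}.$$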

Popular penalty functions include LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001). Since the LASSO lacks the oracle property, we focus on SCAD, which is defined by

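$$p_\nu(u) = \begin{cases} \nu u, & 0 \le u \le \nu,\\ -\dfrac{u^2 - 2a\nu u + \nu^2}{2(a-1)}, & \nu < u < a\nu,\\ \dfrac{(a+1)\nu^2}{2}, & u \ge a\nu,\end{cases}$$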

with

$$p'_\nu(u) = \nu\left\{I(u \le \nu) + \frac{(a\nu - u)_+}{(a-1)\nu}I(u > \nu)\right\}.$$

Here the tuning parameter $\nu$ controls variable selection, and the numbers of basis functions $L_1,\dots,L_{K_2}$ control the smoothness of the estimated functions $\hat\alpha_k(\cdot)$.
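The SCAD penalty and its derivative translate directly into code. A minimal NumPy sketch follows; the default $a = 3.7$ is the value recommended by Fan and Li (2001) and is an assumption here, since the choice of $a$ is not stated above.

```python
import numpy as np

def scad(u, nu, a=3.7):
    """SCAD penalty p_nu(u), evaluated at |u|."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(
        u <= nu,
        nu * u,
        np.where(u < a * nu,
                 -(u**2 - 2 * a * nu * u + nu**2) / (2 * (a - 1)),
                 (a + 1) * nu**2 / 2),
    )

def scad_deriv(u, nu, a=3.7):
    """p'_nu(u) = nu * {I(u <= nu) + (a*nu - u)_+ / ((a - 1) nu) * I(u > nu)}, at |u|."""
    u = np.abs(np.asarray(u, dtype=float))
    return nu * np.where(u <= nu, 1.0, np.maximum(a * nu - u, 0.0) / ((a - 1) * nu))

print(scad([0.5, 2.0, 10.0], nu=1.0))        # linear, quadratic and constant pieces
print(scad_deriv([0.5, 2.0, 10.0], nu=1.0))  # derivative vanishes beyond a * nu
```

The constant piece for large $|u|$ is what lets SCAD leave large coefficients essentially unpenalized, in contrast to the LASSO.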

We use the minorize-maximize (MM) algorithm to obtain the penalized corrected score estimator $\hat\theta^c$. Using the local quadratic approximation (Fan and Li, 2001), the MM algorithm sets

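$$\theta_c^{(k+1)} = \theta_c^{(k)} - \left\{\frac{\partial U_n^c(\theta_c^{(k)})}{\partial\theta^T} - \Omega_\nu^*(\theta_c^{(k)})\right\}^{-1} U_n^{cP}(\theta_c^{(k)})$$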

at the $(k+1)$th iteration for $k \ge 0$, where $\theta_c^{(0)}$ is the solution to $U_n^c(\theta) = 0$, and

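$$\Omega_\nu^*(\theta) = \mathrm{diag}\left\{\frac{p'_{\nu_1}(|\beta_1|)}{|\beta_1| + \varepsilon},\dots,\frac{p'_{\nu_1}(|\beta_{K_1}|)}{|\beta_{K_1}| + \varepsilon},\ \frac{p'_{\nu_2}(\|\gamma_1^T B_1\|)}{\|\gamma_1^T B_1\| + \varepsilon}\langle B_1, B_1\rangle,\dots,\frac{p'_{\nu_2}(\|\gamma_{K_2}^T B_{K_2}\|)}{\|\gamma_{K_2}^T B_{K_2}\| + \varepsilon}\langle B_{K_2}, B_{K_2}\rangle\right\}$$

for a small number $\varepsilon$. We set $\varepsilon = 10^{-3}$ in our numerical studies.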
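The update above can be sketched generically. In the NumPy illustration below (hypothetical names, not the authors' implementation), each penalized group is an index set with a Gram matrix: a scalar $\beta_k$ is a group of size one with Gram matrix $1$, and a spline block $\gamma_k$ would use $\langle B_k, B_k\rangle$. The estimating function and its Jacobian are passed in as callables. As an assumption of this sketch, the perturbed matrix $\Omega_\nu^*$ is used both in the Jacobian and in the penalized score, which avoids dividing by zero once a coefficient is shrunk to zero. The toy example uses a least-squares estimating function $X^T(y - X\theta)/n$ in place of $U_n^c$, only to show the shrinkage behaviour.

```python
import numpy as np

def scad_deriv(t, nu, a=3.7):
    """SCAD derivative p'_nu(t) for t >= 0."""
    return nu * (1.0 if t <= nu else max(a * nu - t, 0.0) / ((a - 1) * nu))

def omega_star(theta, groups, nu, eps=1e-3):
    """Block-diagonal Omega*_nu(theta): p'_nu(||theta_g||_G) / (||theta_g||_G + eps) * G per group."""
    Om = np.zeros((theta.size, theta.size))
    for idx, G in groups:
        g = theta[idx]
        norm = float(np.sqrt(g @ G @ g))
        Om[np.ix_(idx, idx)] = scad_deriv(norm, nu) / (norm + eps) * G
    return Om

def lqa_mm(score, jacobian, theta0, groups, nu, max_iter=200, tol=1e-8):
    """Iterate theta <- theta - {dU/dtheta^T - Omega*}^{-1} {U(theta) - Omega* theta}."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(max_iter):
        Om = omega_star(theta, groups, nu)
        step = np.linalg.solve(jacobian(theta) - Om, score(theta) - Om @ theta)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy example: a least-squares estimating function standing in for U_n^c(theta)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
y = X @ np.array([2.0, 0.0, 0.0, -1.5]) + rng.normal(size=n)
score = lambda th: X.T @ (y - X @ th) / n
jacobian = lambda th: -X.T @ X / n
theta0 = np.linalg.solve(X.T @ X, X.T @ y)       # solution of the unpenalized equation
groups = [([j], np.eye(1)) for j in range(4)]    # each coefficient is its own group
est = lqa_mm(score, jacobian, theta0, groups, nu=0.2)
print(np.round(est, 3))                          # zero coefficients shrink towards 0
```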

Similarly, we propose the penalized conditional score estimating equation

$$U_n^{dP}(\theta) = U_n^d(\theta) - \Omega_\nu(\theta)\theta = 0. \tag{9}$$

The estimator $\hat\theta^d$ can be obtained using the MM algorithm through the iterations

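$$\theta_d^{(k+1)} = \theta_d^{(k)} - \left\{\frac{\partial U_n^d(\theta_d^{(k)})}{\partial\theta^T} - \Omega_\nu^*(\theta_d^{(k)})\right\}^{-1} U_n^{dP}(\theta_d^{(k)}).$$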

3.3. Asymptotic properties

In this section, we derive the asymptotic properties of the proposed estimators. We first assume that $\sigma_k^2$ ($k = 1,\dots,K$) are known. To reduce the complexity of the asymptotic analysis, we consider equally spaced knots and assume that the numbers of knots are equal for $k = 1,\dots,K_2$. Let $h = n_k^{-1}$ be the length of the subintervals between any two adjacent interior knots. The asymptotic properties of the penalized corrected score estimator $\hat\theta^c = (\hat\beta^{cT}, \hat\gamma^{cT})^T$ are given in the following theorems, with proofs outlined in the Appendix.

Theorem 3.1

Under Conditions (C1)–(C10) given in the Appendix, if the tuning parameters $\nu_1 \to 0$ and $\nu_2 \to 0$, then almost surely there exists a solution $\hat\eta^c(u) = (\hat\beta^{cT}, \hat\gamma^{cT}B^T(u))^T$ such that

$$\|\hat\eta^c - \eta_0\| = O\{(nh)^{-1/2}\}.$$

Note that the estimator of $\alpha(u)$ is $B(u)\hat\gamma^c$. Corresponding to the nonzero and zero coefficients, write $\hat\beta^c = (\hat\beta_s^{cT}, \hat\beta_z^{cT})^T$ and $\hat\alpha^c = (\hat\alpha_s^{cT}, \hat\alpha_z^{cT})^T$.

Theorem 3.2

Under Conditions (C1)–(C10) given in the Appendix, if $\nu_1 \to 0$, $\nu_2 \to 0$, $\nu_1(nh)^{1/2} \to \infty$ and $\nu_2(nh)^{1/2} \to \infty$, then with probability approaching 1, $\hat\alpha_z^c = 0$ and $\hat\beta_z^c = 0$.

Theorem 3.3

Under Conditions (C1)–(C10) given in the Appendix, if $\nu_1 \to 0$, $\nu_2 \to 0$, $\nu_1(nh)^{1/2} \to \infty$, and $\nu_2(nh)^{1/2} \to \infty$, then

$$\sqrt{n}\,(\Sigma_{\beta_s}^c)^{-1/2}(\hat\beta_s^c - \beta_{0s}) \xrightarrow{d} N(0, I),$$

where $\Sigma_{\beta_s}^c$ is given in (A.5) in the Appendix. In addition, $\Sigma_{\beta_s}^c - \Sigma_{\beta_s}$ is positive definite, where $\Sigma_{\beta_s}$ is the variance of the estimator when there is no measurement error.

Theorem 3.1 establishes the consistency of the penalized corrected score estimator. Theorems 3.2 and 3.3 show that the estimator has the “oracle” property (Donoho and Johnstone, 1998): as $n \to \infty$, the penalized corrected score estimator performs as well as if the correct submodel excluding the zero-effect covariates were known in advance.

When some error variances $\sigma_k^2$ are positive and unknown, they can be estimated by the method-of-moments estimator $\hat\sigma_k^2$ (Song et al., 2002a):

$$\hat\sigma_k^2 = \frac{\sum_{i=1}^n\sum_{j=1}^{m_{ik}} I(m_{ik} > 1)(W_{ikj} - \bar W_{ik})^2}{\sum_{i=1}^n I(m_{ik} > 1)(m_{ik} - 1)}.$$

This requires $P(m_{ik} > 1) > 0$ for error-contaminated covariates (Condition (C11) in the Appendix). The corrected score and conditional score estimates can be obtained by substituting $\hat\sigma_k^2$ for $\sigma_k^2$ in (8) and (9). It can be easily shown that $\hat\sigma_k^2$ is a root-$n$ consistent estimator of $\sigma_k^2$, so replacing $\sigma_k^2$ with $\hat\sigma_k^2$ does not affect the convergence rates of the penalized corrected score and conditional score estimators. The asymptotic normality of $\hat\beta_s^c$ is given in the following theorem.

Theorem 3.4

Under Conditions (C1)–(C11) given in the Appendix, if $\nu_1 \to 0$, $\nu_2 \to 0$, $\nu_1(nh)^{1/2} \to \infty$, and $\nu_2(nh)^{1/2} \to \infty$, then

$$\sqrt{n}\,(\Sigma_{\beta_s}^{c*})^{-1/2}(\hat\beta_s^c - \beta_{0s}) \xrightarrow{d} N(0, I),$$

where $\Sigma_{\beta_s}^{c*}$ is given in (A.8) in the Appendix. In addition, $\Sigma_{\beta_s}^{c*} - \Sigma_{\beta_s}^c$ is positive definite.

Theorem 3.4 indicates that the estimator is less efficient when the error variances are estimated.

With similar arguments as those in Song and Wang (2017), we can show that the penalized conditional score estimator has the same asymptotic properties as the penalized corrected score estimator. The asymptotic distribution result enables us to construct confidence intervals for the coefficients simultaneously.

4. Simulation studies

We conducted simulation studies to evaluate the performance of the estimators. We considered 15 covariates generated from multivariate normal distributions with common correlation ρ = 0, 0.25 or 0.5. Four covariates are measured with error, with corresponding coefficients β1 = 0, β2 = 0, β3 = −1 and α1(u) = 0.3 log(u/5 + 1) − 1.8; the remaining 11 covariates are measured without error, with corresponding coefficients βj = 0 for j = 4,…,12, β13 = −1 and α2(u) = 0. The measurement error variance is 0.25, and each error-contaminated covariate has two replicate observations. The baseline hazard is constant, λ0(u) = 0.0005. Censoring times were generated from an exponential distribution with mean 400, giving censoring rates between 38% and 41%.
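Generating failure times under model (2) with a time-varying coefficient requires inverting a cumulative hazard that has no closed form. The sketch below does this numerically on a grid; it is one reasonable way to reproduce a design like the one above, not necessarily how the authors generated their data. The grid, the cap `t_max`, the helper names, and the standard-normal covariate margins in the usage lines are assumptions.

```python
import numpy as np

def alpha1(u):
    """Time-varying coefficient of the simulation design: 0.3 log(u/5 + 1) - 1.8."""
    return 0.3 * np.log(u / 5.0 + 1.0) - 1.8

def simulate_failure_times(lin_pred, z, alpha_fn, lam0, rng, t_max=5000.0, n_grid=5001):
    """Draw T_i with hazard lam0 * exp(lin_pred_i + alpha_fn(u) * z_i).

    Uses Lambda_i(T_i) = -log(U_i), U_i ~ Uniform(0, 1), and inverts the cumulative
    hazard (trapezoid rule on a grid) by linear interpolation. Draws whose target is
    not reached by t_max are returned as t_max.
    """
    grid = np.linspace(0.0, t_max, n_grid)
    target = -np.log(rng.uniform(size=len(z)))
    times = np.empty(len(z))
    for i in range(len(z)):
        haz = lam0 * np.exp(lin_pred[i] + alpha_fn(grid) * z[i])
        cumhaz = np.concatenate(([0.0], np.cumsum(0.5 * (haz[1:] + haz[:-1]) * np.diff(grid))))
        times[i] = t_max if target[i] >= cumhaz[-1] else np.interp(target[i], cumhaz, grid)
    return times

# Illustrative draw for the rho = 0 case; standard normal margins are an assumption
rng = np.random.default_rng(4)
n = 300
x3, x13, z1 = rng.normal(size=(3, n))                # covariates with nonzero effects
T = simulate_failure_times(-1.0 * x3 - 1.0 * x13, z1, alpha1, 0.0005, rng)
C = rng.exponential(400.0, size=n)                   # censoring times with mean 400
V, Delta = np.minimum(T, C), (T <= C).astype(int)    # observed data (V, Delta)
```

With censoring of mean 400, a time capped at `t_max` = 5000 is almost surely censored, so the cap has a negligible effect on the observed data in this sketch.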

We ran the simulations for n = 300, 500 and 1000. In each scenario, 500 Monte Carlo datasets were simulated. For each dataset, the coefficients were estimated using the following approaches: (i) the “ideal” approach, where the true covariate values are used; (ii) the naive approach; (iii) the corrected score approach; (iv) the conditional score approach; and (v) the penalized versions of the “ideal”, naive, corrected score and conditional score approaches.

To reduce computational complexity, we set ν1 = ν2 = ν in our numerical studies. This is justified by Theorems 3.1–3.3 in Section 3.3. We used quadratic splines with equally spaced knots. The penalty parameter ν and the number of knots were selected via a BIC-type criterion, namely by minimizing $-2L_n(\hat\theta) + n_p\log(d)$, where $n_p$ is the number of estimated nonzero parameters, $d$ is the number of events, and $L_n(\theta)$ is the log partial likelihood for the “ideal” penalized and unpenalized approaches, the naive log partial likelihood for the naive approaches, and the corrected log partial likelihood for the corrected score and conditional score approaches. Our preliminary studies found that zero interior knots were selected for most datasets, so we report the results with zero interior knots.
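The BIC-type criterion is simple to compute once each candidate fit reports its log partial likelihood (ideal, naive or corrected, as described above) and its number of nonzero parameters. A minimal sketch; the dictionary layout is hypothetical and the candidate values are arbitrary illustrative numbers, not results from this study.

```python
import math

def bic(loglik, n_nonzero, n_events):
    """-2 L_n(theta_hat) + n_p log(d), with d the number of observed events."""
    return -2.0 * loglik + n_nonzero * math.log(n_events)

# Hypothetical fits indexed by (nu, number of interior knots): (log-likelihood, n_p)
fits = {(0.05, 0): (-812.4, 6), (0.10, 0): (-814.0, 4), (0.10, 1): (-813.1, 6)}
n_events = 180
best = min(fits, key=lambda key: bic(*fits[key], n_events))
print(best)
```

Here the sparser fit wins because its small loss in log-likelihood is outweighed by the $\log(d)$ charge for two extra parameters.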

For the nonzero time-varying coefficient, we calculated the mean absolute bias, $\sum_{u=1}^{160}|\hat\alpha_1(u) - \alpha_1(u)|/160$, over an equally spaced grid between 1 and 160 (the 5th and 95th percentiles of the observed survival times), averaged across the simulated datasets. Similarly, we calculated the averages of the mean standard deviation, mean standard error and mean coverage probability of 95% Wald confidence intervals. For the constant nonzero coefficients, we report the same statistics except that the mean absolute bias is replaced by the mean bias. The results for the nonzero coefficients are shown in Tables 1–3. Figure 1 shows the average estimates of α1(u) and the corresponding 95% pointwise confidence intervals. For all estimators, the standard deviation tends to increase as ρ increases, and the penalized estimators are more efficient than the corresponding unpenalized estimators. The unpenalized and penalized naive approaches have large bias and poor coverage probabilities for β3 and α1(u) in all cases, and their coverage probabilities worsen as the sample size increases. The unpenalized corrected score and conditional score estimators also show relatively large bias when n = 300, with coverage probabilities somewhat below the nominal level, but their performance improves as the sample size increases. The corresponding penalized approaches not only reduce bias but also improve efficiency, especially when the correlation between covariates is large. Penalization also improves the coverage probabilities of the conditional score approach. Although the conditional score and corrected score estimators have the same asymptotic distribution, the conditional score estimators have smaller bias and standard errors, indicating better finite-sample performance. An intuitive explanation can be found in Song and Huang (2005, Section 3.1).
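The summary for α1(u) averages absolute errors over the grid u = 1, …, 160 and then over simulated datasets. A sketch, assuming the estimated curves are stored as rows of an array evaluated on that grid (a hypothetical layout); the two shifted curves below are illustrative inputs only.

```python
import numpy as np

def alpha1(u):
    """True time-varying coefficient of the simulation design."""
    return 0.3 * np.log(u / 5.0 + 1.0) - 1.8

def mean_absolute_bias(alpha_hat, alpha_true, grid):
    """Average over datasets of sum_u |alpha_hat(u) - alpha(u)| / len(grid).

    alpha_hat: (n_datasets, len(grid)) estimated curves evaluated on the grid."""
    per_dataset = np.abs(alpha_hat - alpha_true(grid)).mean(axis=1)
    return per_dataset.mean()

grid = np.arange(1, 161)                                    # u = 1, ..., 160
fake_estimates = alpha1(grid) + np.array([[0.1], [-0.3]])   # two shifted curves, illustration only
print(mean_absolute_bias(fake_estimates, alpha1, grid))     # approximately (0.1 + 0.3) / 2 = 0.2
```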

Table 1.

Estimation of the nonzero coefficients when n = 300

| ρ | Method | β3 Bias | β3 SD | β3 SE | β3 CP | β3 RE | β13 Bias | β13 SD | β13 SE | β13 CP | β13 RE | α1(u) Bias | α1(u) SD | α1(u) SE | α1(u) CP | α1(u) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | Ideal | −0.061 | 0.099 | 0.098 | 0.914 | | −0.059 | 0.102 | 0.098 | 0.900 | | 0.071 | 0.190 | 0.183 | 0.919 | |
| | Naive | 0.152 | 0.098 | 0.088 | 0.548 | | 0.045 | 0.101 | 0.094 | 0.894 | | 0.158 | 0.177 | 0.165 | 0.760 | |
| | Corr | −0.187 | 0.228 | 0.212 | 0.956 | | −0.151 | 0.173 | 0.152 | 0.889 | | 0.218 | 0.331 | 0.267 | 0.879 | |
| | Cond | −0.147 | 0.185 | 0.146 | 0.830 | | −0.121 | 0.149 | 0.125 | 0.836 | | 0.173 | 0.288 | 0.240 | 0.875 | |
| | P Ideal | −0.023 | 0.092 | 0.091 | 0.926 | 1.17 | −0.020 | 0.093 | 0.091 | 0.946 | 1.19 | 0.039 | 0.179 | 0.170 | 0.919 | 1.13 |
| | P Naive | 0.187 | 0.090 | 0.083 | 0.388 | 1.16 | 0.083 | 0.094 | 0.088 | 0.811 | 1.15 | 0.199 | 0.167 | 0.156 | 0.641 | 1.13 |
| | P Corr | −0.065 | 0.158 | 0.133 | 0.904 | 2.08 | −0.054 | 0.133 | 0.110 | 0.895 | 1.70 | 0.079 | 0.257 | 0.195 | 0.861 | 1.66 |
| | P Cond | −0.045 | 0.147 | 0.128 | 0.915 | 1.57 | −0.040 | 0.126 | 0.113 | 0.930 | 1.41 | 0.059 | 0.244 | 0.217 | 0.916 | 1.39 |
| 0.25 | Ideal | −0.049 | 0.108 | 0.108 | 0.942 | | −0.052 | 0.108 | 0.108 | 0.934 | | 0.055 | 0.221 | 0.210 | 0.929 | |
| | Naive | 0.175 | 0.100 | 0.095 | 0.530 | | 0.036 | 0.106 | 0.104 | 0.910 | | 0.191 | 0.197 | 0.187 | 0.721 | |
| | Corr | −0.184 | 0.210 | 0.246 | 0.977 | | −0.130 | 0.151 | 0.154 | 0.899 | | 0.204 | 0.354 | 0.303 | 0.894 | |
| | Cond | −0.141 | 0.188 | 0.162 | 0.890 | | −0.101 | 0.143 | 0.132 | 0.898 | | 0.160 | 0.328 | 0.280 | 0.905 | |
| | P Ideal | −0.012 | 0.097 | 0.095 | 0.938 | 1.24 | −0.012 | 0.096 | 0.095 | 0.934 | 1.26 | 0.045 | 0.205 | 0.191 | 0.921 | 1.17 |
| | P Naive | 0.183 | 0.092 | 0.086 | 0.457 | 1.19 | 0.047 | 0.097 | 0.094 | 0.895 | 1.20 | 0.207 | 0.183 | 0.172 | 0.647 | 1.15 |
| | P Corr | −0.054 | 0.153 | 0.139 | 0.947 | 1.88 | −0.038 | 0.125 | 0.111 | 0.912 | 1.46 | 0.062 | 0.281 | 0.214 | 0.867 | 1.59 |
| | P Cond | −0.034 | 0.145 | 0.131 | 0.934 | 1.67 | −0.025 | 0.121 | 0.113 | 0.936 | 1.40 | 0.050 | 0.271 | 0.244 | 0.924 | 1.46 |
| 0.50 | Ideal | −0.057 | 0.143 | 0.127 | 0.894 | | −0.061 | 0.140 | 0.128 | 0.904 | | 0.072 | 0.272 | 0.254 | 0.926 | |
| | Naive | 0.214 | 0.127 | 0.109 | 0.504 | | 0.013 | 0.139 | 0.125 | 0.922 | | 0.217 | 0.236 | 0.221 | 0.719 | |
| | Corr | −0.252 | 0.349 | 0.440 | 0.973 | | −0.172 | 0.229 | 0.239 | 0.924 | | 0.324 | 0.568 | 0.513 | 0.900 | |
| | Cond | −0.196 | 0.283 | 0.212 | 0.878 | | −0.128 | 0.201 | 0.163 | 0.868 | | 0.255 | 0.475 | 0.382 | 0.903 | |
| | P Ideal | −0.014 | 0.122 | 0.107 | 0.920 | 1.37 | −0.022 | 0.116 | 0.108 | 0.924 | 1.47 | 0.045 | 0.246 | 0.227 | 0.929 | 1.22 |
| | P Naive | 0.199 | 0.124 | 0.096 | 0.475 | 1.05 | −0.018 | 0.119 | 0.108 | 0.933 | 1.36 | 0.205 | 0.221 | 0.200 | 0.695 | 1.15 |
| | P Corr | −0.060 | 0.255 | 0.166 | 0.915 | 1.87 | −0.059 | 0.173 | 0.132 | 0.913 | 1.77 | 0.101 | 0.477 | 0.270 | 0.849 | 1.42 |
| | P Cond | −0.029 | 0.187 | 0.150 | 0.915 | 2.27 | −0.032 | 0.143 | 0.127 | 0.925 | 1.97 | 0.067 | 0.360 | 0.306 | 0.927 | 1.74 |

Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; RE, relative efficiency of the penalized estimator with respect to its unpenalized counterpart; NC: non-convergence rate (%).

Table 3.

Estimation of the nonzero coefficients when n = 1000

| ρ | Method | β3 Bias | β3 SD | β3 SE | β3 CP | β3 RE | β13 Bias | β13 SD | β13 SE | β13 CP | β13 RE | α1(u) Bias | α1(u) SD | α1(u) SE | α1(u) CP | α1(u) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | Ideal | −0.018 | 0.051 | 0.051 | 0.950 | | −0.016 | 0.052 | 0.051 | 0.936 | | 0.037 | 0.091 | 0.094 | 0.924 | |
| | Naive | 0.188 | 0.048 | 0.045 | 0.022 | | 0.089 | 0.053 | 0.049 | 0.536 | | 0.206 | 0.087 | 0.085 | 0.270 | |
| | Corr | −0.052 | 0.080 | 0.081 | 0.928 | | −0.038 | 0.070 | 0.063 | 0.900 | | 0.055 | 0.128 | 0.112 | 0.883 | |
| | Cond | −0.045 | 0.079 | 0.071 | 0.886 | | −0.034 | 0.069 | 0.063 | 0.904 | | 0.049 | 0.126 | 0.119 | 0.906 | |
| | P Ideal | −0.007 | 0.050 | 0.049 | 0.950 | 1.05 | −0.006 | 0.051 | 0.049 | 0.944 | 1.04 | 0.037 | 0.090 | 0.092 | 0.921 | 1.03 |
| | P Naive | 0.197 | 0.047 | 0.046 | 0.014 | 1.05 | 0.100 | 0.052 | 0.049 | 0.474 | 1.02 | 0.219 | 0.085 | 0.085 | 0.238 | 1.04 |
| | P Corr | −0.024 | 0.076 | 0.070 | 0.912 | 1.13 | −0.017 | 0.068 | 0.060 | 0.904 | 1.05 | 0.040 | 0.123 | 0.103 | 0.872 | 1.08 |
| | P Cond | −0.019 | 0.074 | 0.070 | 0.938 | 1.12 | −0.013 | 0.067 | 0.062 | 0.924 | 1.05 | 0.038 | 0.122 | 0.118 | 0.922 | 1.08 |
| 0.25 | Ideal | −0.020 | 0.057 | 0.056 | 0.936 | | −0.013 | 0.061 | 0.056 | 0.936 | | 0.040 | 0.113 | 0.108 | 0.911 | |
| | Naive | 0.202 | 0.052 | 0.049 | 0.016 | | 0.078 | 0.060 | 0.054 | 0.654 | | 0.224 | 0.102 | 0.096 | 0.288 | |
| | Corr | −0.059 | 0.090 | 0.093 | 0.952 | | −0.035 | 0.079 | 0.067 | 0.890 | | 0.059 | 0.157 | 0.128 | 0.869 | |
| | Cond | −0.050 | 0.088 | 0.079 | 0.896 | | −0.029 | 0.078 | 0.067 | 0.900 | | 0.053 | 0.153 | 0.139 | 0.906 | |
| | P Ideal | −0.010 | 0.054 | 0.052 | 0.928 | 1.11 | −0.003 | 0.055 | 0.052 | 0.930 | 1.20 | 0.040 | 0.110 | 0.104 | 0.912 | 1.06 |
| | P Naive | 0.186 | 0.050 | 0.048 | 0.044 | 1.11 | 0.059 | 0.056 | 0.052 | 0.760 | 1.15 | 0.213 | 0.099 | 0.095 | 0.307 | 1.06 |
| | P Corr | −0.027 | 0.080 | 0.075 | 0.930 | 1.29 | −0.012 | 0.070 | 0.061 | 0.900 | 1.27 | 0.045 | 0.146 | 0.115 | 0.859 | 1.15 |
| | P Cond | −0.020 | 0.078 | 0.074 | 0.936 | 1.27 | −0.008 | 0.069 | 0.063 | 0.924 | 1.26 | 0.042 | 0.143 | 0.134 | 0.920 | 1.14 |
| 0.50 | Ideal | −0.017 | 0.065 | 0.065 | 0.948 | | −0.013 | 0.067 | 0.065 | 0.950 | | 0.050 | 0.130 | 0.130 | 0.922 | |
| | Naive | 0.247 | 0.062 | 0.056 | 0.022 | | 0.064 | 0.070 | 0.064 | 0.816 | | 0.257 | 0.113 | 0.112 | 0.297 | |
| | Corr | −0.068 | 0.119 | 0.133 | 0.978 | | −0.038 | 0.090 | 0.081 | 0.902 | | 0.088 | 0.197 | 0.165 | 0.865 | |
| | Cond | −0.056 | 0.115 | 0.097 | 0.888 | | −0.031 | 0.088 | 0.079 | 0.918 | | 0.077 | 0.192 | 0.176 | 0.908 | |
| | P Ideal | −0.006 | 0.058 | 0.058 | 0.946 | 1.28 | −0.002 | 0.060 | 0.058 | 0.950 | 1.25 | 0.050 | 0.124 | 0.123 | 0.915 | 1.10 |
| | P Naive | 0.197 | 0.058 | 0.053 | 0.070 | 1.16 | 0.001 | 0.063 | 0.059 | 0.940 | 1.26 | 0.216 | 0.109 | 0.109 | 0.358 | 1.07 |
| | P Corr | −0.023 | 0.096 | 0.086 | 0.924 | 1.55 | −0.009 | 0.075 | 0.069 | 0.924 | 1.45 | 0.057 | 0.175 | 0.134 | 0.844 | 1.27 |
| | P Cond | −0.016 | 0.094 | 0.083 | 0.922 | 1.50 | −0.006 | 0.074 | 0.069 | 0.932 | 1.42 | 0.052 | 0.172 | 0.163 | 0.923 | 1.25 |

Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; RE, relative efficiency of the penalized estimator with respect to its unpenalized counterpart; NC: non-convergence rate (%).

Figure 1. Average estimates of α1(u) and the 95% pointwise confidence intervals.

To evaluate the variable selection performance of the penalized approaches, we calculated the percentage of datasets in which the correct model was selected and the percentage of datasets in which each covariate was included in the model. The results are shown in Table 4. All methods correctly select the covariates with nonzero coefficients in all cases, and the percentage of incorrectly selecting covariates with zero coefficients tends to be higher when the coefficient is treated as time-varying in the model. The conditional score and corrected score estimators perform better than the naive estimator, and of the two proposed methods, the penalized conditional score approach performs slightly better. All penalized approaches improve in variable selection as the sample size increases.
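Both summaries in Table 4 can be computed from a datasets-by-covariates matrix of selection indicators. A sketch with a small hypothetical indicator matrix (illustration only; the function name is hypothetical):

```python
import numpy as np

def selection_summary(selected, truly_nonzero):
    """selected: (n_datasets, p) boolean indicators that each covariate was kept.
    truly_nonzero: (p,) boolean indicators of the true model.
    Returns (% of datasets selecting exactly the true model, % selection of each covariate)."""
    selected = np.asarray(selected, dtype=bool)
    exact = np.all(selected == truly_nonzero, axis=1)
    return 100.0 * exact.mean(), 100.0 * selected.mean(axis=0)

truth = np.array([True, False, False])
sel = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]], dtype=bool)
model_pct, covariate_pct = selection_summary(sel, truth)
print(model_pct, covariate_pct)   # 50.0 and [100, 25, 25]
```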

Table 4.

Average percentage (%) of correct selection of the model, and percentages of selection of individual covariates with each coefficient

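Nonzero coefficients: β3, β13, α1(u). Zero coefficients: β1, β2, β4 to β12, α2(u).

| n | ρ | Method | Correct model | β3 | β13 | α1(u) | β1 | β2 | β4 | β5 | β6 | β7 | β8 | β9 | β10 | β11 | β12 | α2(u) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 300 | 0.00 | P Ideal | 65.8 | 100 | 100 | 100 | 1.0 | 2.2 | 2.2 | 2.8 | 2.8 | 2.8 | 3.4 | 3.0 | 3.4 | 2.6 | 2.0 | 16.5 |
| | | P Naive | 38.0 | 100 | 100 | 100 | 12.3 | 12.7 | 11.9 | 11.5 | 13.1 | 12.7 | 10.7 | 10.5 | 11.3 | 15.1 | 12.5 | 38.4 |
| | | P Corr | 35.8 | 100 | 100 | 100 | 7.7 | 7.3 | 6.6 | 4.7 | 7.3 | 6.7 | 5.6 | 8.1 | 6.9 | 6.9 | 5.8 | 30.8 |
| | | P Cond | 36.7 | 100 | 100 | 100 | 7.4 | 7.4 | 6.4 | 4.7 | 7.4 | 6.4 | 5.1 | 7.4 | 6.6 | 6.6 | 5.9 | 30.4 |
| | 0.25 | P Ideal | 74.3 | 100 | 100 | 100 | 1.4 | 2.2 | 1.4 | 2.4 | 2.4 | 1.6 | 1.0 | 1.2 | 2.0 | 1.8 | 1.8 | 15.0 |
| | | P Naive | 38.3 | 100 | 100 | 100 | 14.2 | 14.6 | 14.9 | 12.6 | 11.7 | 11.9 | 11.3 | 13.2 | 14.4 | 11.3 | 12.1 | 38.3 |
| | | P Corr | 45.9 | 100 | 100 | 100 | 6.0 | 7.5 | 4.3 | 4.9 | 4.3 | 6.2 | 4.1 | 3.6 | 5.8 | 3.8 | 6.2 | 28.6 |
| | | P Cond | 49.0 | 100 | 100 | 100 | 5.4 | 7.3 | 3.3 | 4.1 | 4.4 | 4.6 | 4.1 | 3.5 | 3.5 | 5.6 | 5.4 | 27.8 |
| | 0.50 | P Ideal | 82.2 | 100 | 100 | 100 | 0.6 | 1.0 | 0.0 | 0.8 | 0.4 | 0.8 | 0.8 | 0.8 | 0.2 | 0.8 | 0.6 | 13.8 |
| | | P Naive | 37.7 | 100 | 100 | 100 | 14.7 | 18.1 | 13.0 | 12.6 | 14.5 | 14.9 | 9.8 | 11.4 | 11.8 | 11.8 | 13.2 | 42.4 |
| | | P Corr | 61.3 | 100 | 100 | 100 | 5.6 | 4.6 | 3.6 | 3.6 | 3.1 | 4.4 | 3.6 | 3.6 | 4.6 | 4.9 | 3.8 | 25.6 |
| | | P Cond | 60.3 | 100 | 100 | 100 | 4.0 | 5.7 | 2.4 | 3.0 | 3.0 | 3.2 | 3.0 | 2.2 | 3.2 | 3.8 | 2.4 | 24.7 |
| 500 | 0.00 | P Ideal | 90.8 | 100 | 100 | 100 | 0.8 | 0.4 | 0.8 | 0.4 | 0.6 | 0.2 | 0.2 | 0.0 | 0.2 | 0.0 | 0.2 | 5.8 |
| | | P Naive | 62.8 | 100 | 100 | 100 | 7.6 | 6.2 | 6.4 | 5.4 | 4.6 | 5.8 | 5.4 | 5.2 | 5.2 | 4.8 | 4.0 | 21.6 |
| | | P Corr | 72.3 | 100 | 100 | 100 | 2.2 | 2.4 | 1.4 | 1.8 | 1.0 | 1.4 | 3.8 | 1.4 | 2.0 | 1.4 | 1.4 | 12.6 |
| | | P Cond | 74.1 | 100 | 100 | 100 | 2.0 | 2.4 | 1.4 | 1.6 | 1.0 | 1.2 | 3.4 | 1.4 | 2.0 | 1.2 | 1.4 | 11.4 |
| | 0.25 | P Ideal | 89.8 | 100 | 100 | 100 | 0.2 | 0.2 | 0.4 | 0.8 | 0.4 | 0.6 | 0.2 | 0.8 | 0.0 | 0.2 | 0.0 | 7.2 |
| | | P Naive | 58.5 | 100 | 100 | 100 | 7.8 | 7.2 | 7.8 | 6.6 | 7.2 | 6.0 | 6.8 | 5.2 | 8.2 | 7.6 | 7.2 | 25.9 |
| | | P Corr | 73.0 | 100 | 100 | 100 | 1.8 | 1.6 | 1.6 | 2.0 | 1.0 | 2.2 | 2.2 | 1.8 | 1.4 | 1.4 | 1.4 | 15.9 |
| | | P Cond | 75.6 | 100 | 100 | 100 | 1.4 | 1.6 | 1.6 | 1.8 | 1.0 | 2.0 | 1.8 | 1.2 | 1.4 | 1.2 | 1.0 | 14.5 |
| | 0.50 | P Ideal | 97.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 2.6 |
| | | P Naive | 55.7 | 100 | 100 | 100 | 12.0 | 10.4 | 8.6 | 9.6 | 6.8 | 7.6 | 8.6 | 9.6 | 11.4 | 8.4 | 9.2 | 28.3 |
| | | P Corr | 86.9 | 100 | 100 | 100 | 0.6 | 0.6 | 1.0 | 0.2 | 0.6 | 0.4 | 0.4 | 0.2 | 0.6 | 1.0 | 0.2 | 10.1 |
| | | P Cond | 88.4 | 100 | 100 | 100 | 0.4 | 0.8 | 0.8 | 0.2 | 0.4 | 0.2 | 0.2 | 0.2 | 0.4 | 0.8 | 0.2 | 9.2 |
| 1000 | 0.00 | P Ideal | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
| | | P Naive | 89.2 | 100 | 100 | 100 | 1.0 | 1.2 | 0.8 | 1.4 | 0.2 | 0.8 | 1.4 | 0.8 | 0.6 | 1.2 | 0.8 | 5.2 |
| | | P Corr | 96.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.0 | 2.4 |
| | | P Cond | 96.2 | 100 | 100 | 100 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.0 | 2.4 |
| | 0.25 | P Ideal | 99.4 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.6 |
| | | P Naive | 81.6 | 100 | 100 | 100 | 1.4 | 2.2 | 2.0 | 2.8 | 2.4 | 2.2 | 2.2 | 2.0 | 2.4 | 1.6 | 1.4 | 11.8 |
| | | P Corr | 95.8 | 100 | 100 | 100 | 0.2 | 0.4 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 3.4 |
| | | P Cond | 96.0 | 100 | 100 | 100 | 0.2 | 0.2 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 3.4 |
| | 0.50 | P Ideal | 99.8 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 |
| | | P Naive | 70.6 | 100 | 100 | 100 | 7.6 | 4.8 | 6.2 | 3.6 | 5.2 | 5.4 | 3.4 | 5.4 | 5.2 | 6.4 | 4.8 | 20.0 |
| | | P Corr | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
| | | P Cond | 99.6 | 100 | 100 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |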

Corr, corrected score; Cond, conditional score; P, penalized.

5. Application

We applied the proposed approaches to the AIDS Clinical Trials Group (ACTG) 175 data. Access to the ACTG data is described at https://actgnetwork.org/clinical-trials/access-published-data. ACTG 175 is a randomized clinical trial comparing zidovudine alone, zidovudine plus didanosine, zidovudine plus zalcitabine, and didanosine alone in HIV-infected subjects, based on time to progression to AIDS or death (Hammer et al., 1996). Between December 1991 and October 1992, 2467 subjects were recruited and followed until November 1994. It is of interest to assess the effect of treatment on survival time adjusted for baseline covariates, including CD4 count, antiretroviral history (naive or experienced), history of intravenous drug use (yes or no), Karnofsky score, homosexual activity (yes or no), age and gender. Our analysis included the 2448 patients with observations on these variables. CD4 measurements are well known to be subject to substantial measurement error. In the ACTG 175 study, most subjects had replicate CD4 measurements before starting treatment: measurements taken between three weeks before and one week after randomization were used as replicates of the baseline CD4 measurement. A logarithmic transformation was applied to CD4 counts to achieve approximately constant variance. The primary analysis found zidovudine alone to be inferior to the other three therapies; thus, further investigations focused on two treatment groups: zidovudine alone and the combination of the other three.

To determine whether the coefficients are constant or time-varying, we used BIC to select among models in which each coefficient is either constant or a quadratic spline with 0, 1 or 2 interior knots. In the selected model, history of intravenous drug use has a time-varying effect, while the effects of treatment and the other covariates are constant. We obtained the naive, conditional score and corrected score estimates and the corresponding penalized estimates. For all these approaches, BIC is smallest when the time-varying coefficient has no interior knot. The estimated constant coefficients are shown in Table 5. All penalized approaches selected log(CD4), treatment, antiretroviral history, age and Karnofsky score, and these estimates are significant, as are the corresponding unpenalized estimates. The penalized conditional score and corrected score estimates have smaller estimated standard errors than the corresponding unpenalized estimates, which may indicate an efficiency gain. Homosexual activity, gender and history of intravenous drug use are not selected by the penalized methods. Based on the unpenalized approaches, homosexual activity and gender are insignificant, while history of intravenous drug use may have some effect at the beginning of the study that decays and eventually disappears around week 50 (Figure 2). The conditional score and corrected score estimates of the treatment effect are larger in magnitude than the naive estimates.
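A minimal sketch of the BIC comparison, assuming the generic form BIC = -2ℓ + df·log(n), where ℓ is the maximized log partial likelihood. The exact BIC definition (for example, whether n counts subjects or events) is not given in this section, so treat that choice as an assumption. A quadratic spline with k interior knots contributes k + 3 B-spline coefficients, and parameters shared by all candidate models cancel in the comparison. The log-likelihood values in the usage lines are made up for illustration and are not ACTG 175 results.

```python
import math

def spline_df(n_interior_knots, degree=2):
    # number of B-spline coefficients = interior knots + order (degree + 1)
    return n_interior_knots + degree + 1

def bic(loglik, df, n):
    # generic BIC; the choice of n is an assumption (subjects vs. events)
    return -2.0 * loglik + df * math.log(n)

def select_by_bic(candidates, n):
    """candidates: dict mapping label -> (maximized log-likelihood, df)."""
    scores = {label: bic(ll, df, n) for label, (ll, df) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# hypothetical log-likelihoods, for illustration only
candidates = {
    "constant": (-1000.0, 1),
    "quadratic, 0 knots": (-992.0, spline_df(0)),
    "quadratic, 1 knot": (-991.5, spline_df(1)),
    "quadratic, 2 knots": (-991.0, spline_df(2)),
}
best, scores = select_by_bic(candidates, n=300)
```

Only differences in BIC matter, so df may count just the parameters of the coefficient whose form is being compared.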

Table 5.

Estimates (Standard Errors) of the constant coefficients in the ACTG 175 study.

Naive Corr Cond P Naive P Corr P Cond
treatment −0.405 (0.124) −0.416 (0.135) −0.416 (0.134) −0.411 (0.130) −0.423 (0.134) −0.423 (0.132)
log(CD4) −1.903 (0.191) −2.217 (0.379) −2.204 (0.223) −1.909 (0.186) −2.220 (0.336) −2.207 (0.218)
antiretroviral experience 0.293 (0.128) 0.264 (0.140) 0.265 (0.133) 0.294 (0.130) 0.265 (0.137) 0.266 (0.132)
age 0.021 (0.006) 0.020 (0.008) 0.021 (0.007) 0.018 (0.006) 0.018 (0.006) 0.018 (0.006)
Karnofsky score −0.036 (0.009) −0.035 (0.008) −0.035 (0.009) −0.031 (0.008) −0.029 (0.007) −0.029 (0.008)
homosexual activity 0.133 (0.165) 0.147 (0.175) 0.147 (0.175) 0 (–) 0 (–) 0 (–)
gender (male) 0.116 (0.209) 0.113 (0.218) 0.113 (0.217) 0 (–) 0 (–) 0 (–)

Corr, corrected score; Cond, conditional score; P, penalized.
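To make the significance statements concrete, a Wald statistic z = estimate/SE with a two-sided normal p-value can be computed directly from Table 5. This is a sketch under the usual large-sample normal approximation; the (estimate, SE) pairs below are copied from the Cond column of Table 5, and the helper name `wald` is ours.

```python
import math

def wald(est, se):
    """Wald z statistic and two-sided p-value under a normal approximation."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (estimate, SE) pairs from the "Cond" column of Table 5
cond = {
    "treatment": (-0.416, 0.134),
    "log(CD4)": (-2.204, 0.223),
    "homosexual activity": (0.147, 0.175),
}
results = {name: wald(*pair) for name, pair in cond.items()}
```

The same function applies to any column of Table 5; for the penalized columns, it reflects the reported standard errors of the selected covariates only.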

Figure 2.

Estimate of the coefficient of history of intravenous drug use and the 95% pointwise confidence interval.
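One way to construct a band like the one in Figure 2 uses the spline representation α(u) = γᵀB(u): the pointwise standard error is sqrt(B(u)ᵀVB(u)), where V is an estimated covariance matrix of γ̂. The sketch assumes the quadratic basis with no interior knots (the BIC-selected form), under which the B-splines on [0, τ] reduce to Bernstein polynomials. `gamma_hat`, `cov_gamma` and the numbers in the usage lines are hypothetical, not the fitted ACTG 175 values, and the 1.96 multiplier assumes a normal approximation.

```python
import numpy as np

def bernstein_quadratic(u, tau):
    # quadratic B-spline basis with no interior knots on [0, tau]
    t = np.asarray(u, dtype=float) / tau
    return np.column_stack([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])

def pointwise_band(u, tau, gamma_hat, cov_gamma, z=1.96):
    """Estimate and pointwise (1 - alpha) band for alpha(u) = gamma^T B(u)."""
    B = bernstein_quadratic(u, tau)                          # shape (len(u), 3)
    est = B @ gamma_hat                                      # alpha_hat(u)
    se = np.sqrt(np.einsum("ij,jk,ik->i", B, cov_gamma, B))  # sqrt(B^T V B)
    return est, est - z * se, est + z * se

# hypothetical inputs, for illustration only
u = np.array([0.0, 0.5, 1.0])
est, lo, hi = pointwise_band(u, 1.0, np.array([0.6, 0.1, -0.2]), 0.01 * np.eye(3))
```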

6. Discussion

We have proposed penalized variable selection approaches for partially linear proportional hazards models with covariate measurement error. The proposed approaches can be extended to include intermittently measured time-dependent covariates by jointly modeling the survival and longitudinal processes. The computation time generally increases with the number of covariates. Like other approaches for handling measurement error, the proposed approaches may break down when the measurement error is too large for a given sample size.

In this article, we assume that the number of covariates is finite. In our numerical studies, the dimension K is relatively low, which corresponds to many practical situations. Some recent studies have considered the ultra-high-dimensional case, in which K diverges with n. We conjecture that the proposed penalized variable selection approaches remain applicable with a diverging number of covariates. However, the data assumptions and the proofs of variable selection properties usually differ substantially between the finite and diverging cases, so investigating the proposed methodology with K → ∞ is highly nontrivial and will be pursued in future research.

To facilitate the theoretical development, we consider splines with quasi-uniform interior knots (Assumption (C7) in the Appendix), as in Huang (1998) and Xue and Yang (2006). In our simulation studies and real data application, we used equally spaced knots and found that this choice worked very well in all the examples. We also conducted simulation studies using knots at equally spaced sample quantiles, and the results were very similar to those based on equally spaced knots.

In practice, if failure events are sparse, we recommend placing knots at sample quantiles of the failure times. In the Cox model literature, Nan et al. (2005) and Sleeper and Harrington (1990) also suggested this scheme, which is believed to reduce the chance of singularities compared with equally spaced knots. There are also methods for adaptive knot selection (Stone et al., 1997; Miyata and Shen, 2012), at the expense of a larger computational burden. Developing an efficient and automatic criterion for knot selection is challenging in our model setting and warrants future study.
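The two knot-placement schemes discussed above can be sketched as follows. The helper names are hypothetical; quantiles are taken over observed failure times only (event indicator 1) with NumPy's default linear interpolation, which is one reasonable reading of "sample quantiles of the failure events" rather than the authors' exact rule.

```python
import numpy as np

def equally_spaced_knots(tau, n_interior):
    # interior knots that split [0, tau] into n_interior + 1 equal pieces
    return np.linspace(0.0, tau, n_interior + 2)[1:-1]

def failure_quantile_knots(times, events, n_interior):
    # interior knots at equally spaced sample quantiles of observed failure times
    fail = np.asarray(times, dtype=float)[np.asarray(events, dtype=bool)]
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.quantile(fail, probs)

# toy data: censored times (event = 0) are ignored when placing knots
times = [1, 2, 3, 4, 5, 10]
events = [1, 1, 0, 1, 1, 0]
```

When failures cluster early, the quantile rule puts more knots where the data carry information, which is the motivation given above for sparse events.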

Table 2.

Estimation of the nonzero coefficients when n = 500

Parameter: β3 | β13 | α1(u)
Columns under each parameter: Bias SD SE CP RE (RE is reported for the penalized methods only)
Corr = 0.00 Ideal −0.028 0.078 0.073 0.930 −0.029 0.075 0.073 0.928 0.046 0.143 0.137 0.910
Naive 0.180 0.069 0.065 0.228 0.077 0.074 0.071 0.776 0.185 0.131 0.124 0.559
Corr −0.088 0.123 0.125 0.954 −0.070 0.104 0.095 0.902 0.114 0.207 0.169 0.871
Cond −0.073 0.116 0.103 0.886 −0.059 0.100 0.090 0.900 0.098 0.197 0.173 0.892
P Ideal −0.007 0.074 0.070 0.942 1.10 −0.007 0.072 0.070 0.944 1.09 0.035 0.138 0.131 0.914 1.08
P Naive 0.199 0.067 0.064 0.144 1.06 0.098 0.072 0.069 0.692 1.06 0.207 0.127 0.120 0.467 1.00
P Corr −0.032 0.111 0.099 0.928 1.22 −0.025 0.096 0.084 0.918 1.17 0.051 0.189 0.147 0.863 1.20
P Cond −0.020 0.107 0.098 0.934 1.19 −0.017 0.093 0.087 0.938 1.14 0.043 0.183 0.166 0.923 1.12
Corr = 0.25 Ideal −0.034 0.084 0.081 0.934 −0.024 0.081 0.080 0.936 0.052 0.164 0.157 0.916
Naive 0.188 0.077 0.072 0.256 0.064 0.083 0.078 0.826 0.201 0.146 0.139 0.565
Corr −0.121 0.151 0.153 0.958 −0.076 0.124 0.104 0.880 0.136 0.246 0.200 0.867
Cond −0.097 0.138 0.117 0.858 −0.060 0.116 0.098 0.878 0.111 0.231 0.204 0.893
P Ideal −0.012 0.076 0.073 0.942 1.22 −0.002 0.073 0.073 0.954 1.22 0.045 0.155 0.148 0.918 1.12
P Naive 0.182 0.072 0.067 0.232 1.13 0.058 0.076 0.073 0.846 1.22 0.203 0.139 0.133 0.533 1.10
P Corr −0.049 0.122 0.106 0.903 1.52 −0.022 0.100 0.086 0.915 1.55 0.059 0.213 0.164 0.856 1.34
P Cond −0.034 0.117 0.103 0.905 1.41 −0.012 0.096 0.089 0.937 1.46 0.049 0.205 0.189 0.922 1.27
Corr = 0.50 Ideal −0.043 0.096 0.095 0.924 −0.039 0.098 0.095 0.920 0.057 0.197 0.190 0.926
Naive 0.223 0.084 0.082 0.238 0.034 0.101 0.093 0.906 0.240 0.168 0.164 0.536
Corr −0.160 0.196 0.235 0.984 −0.103 0.151 0.133 0.907 0.189 0.366 0.283 0.879
Cond −0.121 0.160 0.146 0.894 −0.079 0.132 0.116 0.886 0.145 0.301 0.264 0.912
P Ideal −0.018 0.082 0.083 0.944 1.38 −0.014 0.085 0.083 0.936 1.32 0.048 0.182 0.174 0.923 1.18
P Naive 0.185 0.077 0.075 0.315 1.20 −0.012 0.089 0.084 0.942 1.29 0.210 0.158 0.153 0.567 1.13
P Corr −0.055 0.146 0.125 0.948 1.81 −0.033 0.117 0.101 0.912 1.84 0.072 0.279 0.193 0.855 1.71
P Cond −0.037 0.126 0.117 0.938 1.62 −0.022 0.105 0.099 0.934 1.57 0.057 0.254 0.232 0.935 1.41

Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; NC: non-convergence rate (%).

Acknowledgements

This research was supported in part by National Science Foundation grants DMS-1106816 (Song, Wang) and DMS-1542332 (Wang), and National Institutes of Health grants CA201207 (Song), HL121347 (Song) and CA204120 (Ma).

Appendix

A.1. Regularity conditions

Let $C^{(r)}$ be the space of functions that have $r$ continuous derivatives for some $r \ge 2$, and assume $\alpha_{0k} \in C^{(r)}$, $k = 1, \ldots, K_{2s}$. Let $G_k$ be the space of spline functions of order $p$ on $[0, \tau]$ with knot sequence $\xi_k = \{0 = \xi_{k0} \le \xi_{k1} \le \cdots \le \xi_{kn_k} \le \xi_{k(n_k+1)} = \tau\}$. Let $G_n^{K_2} = G_1 \times G_2 \times \cdots \times G_{K_2}$. Let $S(u,\eta)[g] = E\{S_{ni}(u,\eta)[g]\}$, where $S_{ni}(u,\eta)[1] = Y_i(u)\exp\{\eta^\top(u)H_i(u)\}$. Similarly, we denote $S^c(u,\eta)[g] = E\{S^c_{ni}(u,\eta)[g]\}$ and $S^d(u,\eta)[g] = E\{S^d_{ni}(u,\eta)[g]\}$. For any matrix $A$, let $\rho_{\max}(A)$ and $\rho_{\min}(A)$ denote the maximum and minimum eigenvalues of $A$, and let $A^{\otimes k}$ denote $1$, $A$ and $AA^\top$ for $k = 0, 1, 2$, respectively. Define

$$\Gamma(u,\eta) = \frac{S(u,\eta)[H^{\otimes 2}]\,S(u,\eta)[1]}{S^2(u,\eta)[1]} - \frac{S^{\otimes 2}(u,\eta)[H]}{S^2(u,\eta)[1]}.$$
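Γ(u, η) is the covariance matrix of H under weights proportional to Y_i(u)exp{ηᵀ(u)H_i(u)}. The sketch below computes its empirical analogue at a single time point, assuming S_ni(u, η)[g] = Y_i(u)exp{ηᵀ(u)H_i(u)}g(H_i(u)) (only the g = 1 case is stated above) and error-free H; the function name is hypothetical.

```python
import numpy as np

def gamma_at_u(at_risk, H, eta):
    """Empirical version of Gamma(u, eta) at one fixed time u.

    at_risk: (n,) array of at-risk indicators Y_i(u)
    H:       (n, p) array of covariates H_i(u)
    eta:     (p,) coefficient vector eta(u)
    """
    w = at_risk * np.exp(H @ eta)                    # Y_i(u) exp{eta(u)^T H_i(u)}
    s0 = w.mean()                                    # S_n(u, eta)[1]
    s1 = (w[:, None] * H).mean(axis=0)               # S_n(u, eta)[H]
    s2 = np.einsum("i,ij,ik->jk", w, H, H) / len(w)  # S_n(u, eta)[H^{⊗2}]
    # same two-term form as the display above
    return s2 * s0 / s0**2 - np.outer(s1, s1) / s0**2

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 2))
Y = (rng.random(50) < 0.7).astype(float)
eta = np.array([0.3, -0.2])
G = gamma_at_u(Y, H, eta)
```

Written this way, Condition (C5) is a requirement that this weighted covariance stays uniformly well conditioned over u and over η near η0.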

Let $N(\eta_0) = \{\eta : \|\eta - \eta_0\| \le c\|\eta_0\|\}$ be a neighborhood of $\eta_0$. We assume the following regularity conditions.

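  • (C1)

    $\Pr(V \ge \tau) > 0$.

  • (C2)

    $\Pr(\Delta = 1) > 0$.

  • (C3)

    There exist $0 < c_{1f} \le c_{2f} < \infty$ such that the density $f_{V|\Delta=1}(x)$ of $V$ given $\Delta = 1$ satisfies $c_{1f} \le f_{V|\Delta=1}(x) \le c_{2f}$.

  • (C4)

    $\rho_{\max}[E(\Sigma_{H_i})] \le c_\Sigma < \infty$.

  • (C5)

    There exist $0 < c_{1\Gamma} \le c_{2\Gamma} < \infty$ such that $c_{1\Gamma} \le \inf_{u \in [0,\tau]}\rho_{\min}\{\Gamma(u,\eta)\} \le \sup_{u \in [0,\tau]}\rho_{\max}\{\Gamma(u,\eta)\} \le c_{2\Gamma}$ uniformly for $\eta \in N(\eta_0)$.

  • (C6)

    $\rho_{\max}(H^{\otimes 2}) < \infty$.

  • (C7)

    The knot sequence $\xi_k = \{0 = \xi_{k0} \le \xi_{k1} \le \cdots \le \xi_{kn_k} \le \xi_{k(n_k+1)} = \tau\}$ is quasi-uniform. The number $n_k$ of interior knots satisfies $n^{1/(2r)} \ll n_k \ll n^{1/2-\delta}$ for some $0 < \delta < (r-1)/(2r)$, where $a_n \ll b_n$ means that $\lim_{n\to\infty} a_n b_n^{-1} = 0$.

  • (C8)

    $E\{H^{\otimes 2}\}^2 < \infty$, $E[e^\top e]^2 < \infty$, and $\sup_{\eta \in N(\eta_0),\, u \in [0,\tau]}\max[E\{\exp(4\eta^\top(u)H)\}, E\{\exp(4\eta^\top(u)e)\}] < \infty$.

  • (C9)

    $\int_0^\tau \lambda_0^2(u)\,du < \infty$.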

Conditions (C1) and (C2) are standard assumptions for proportional hazards models. Conditions (C8) and (C9) control the magnitude of the covariates, the measurement error and the baseline hazard; such conditions are commonly used for joint models. Condition (C7) specifies how the knot density of the spline approximation grows with the sample size. Condition (C3) ensures the equivalence of the norms $\|\cdot\|$ and $\|\cdot\|_2$. Conditions (C4)–(C6) control the variation of the estimating functions for $\eta \in N(\eta_0)$ and $u \in [0, \tau]$. Assumptions similar to (C3)–(C7) are commonly adopted in asymptotic theory for polynomial spline approximations (Xue et al., 2010).
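For concreteness, the spline spaces G_k appearing in (C7) can be evaluated with the Cox-de Boor recursion. The sketch below assumes order p = degree + 1 (quadratic by default, matching the quadratic splines of Section 5) and boundary knots at 0 and τ repeated p times; it is an illustrative evaluator with a hypothetical name, not the authors' code. With equally spaced interior knots, as used in the numerical work, the knot sequence is quasi-uniform.

```python
import numpy as np

def bspline_basis(u, interior_knots, tau, degree=2):
    """Evaluate every B-spline basis function of the given degree on [0, tau].

    Cox-de Boor recursion with each boundary knot repeated degree + 1 times.
    Returns an array of shape (len(u), len(interior_knots) + degree + 1).
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = np.concatenate([np.zeros(degree + 1),
                        np.asarray(interior_knots, dtype=float),
                        np.full(degree + 1, float(tau))])
    # Degree 0: indicator of [t_j, t_{j+1}) for each nonempty knot interval.
    B = np.zeros((u.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    # Close the last nonempty interval so that u = tau is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[u == tau, last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        nxt = np.zeros((u.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                nxt[:, j] += (u - t[j]) / left * B[:, j]
            if right > 0:
                nxt[:, j] += (t[j + d + 1] - u) / right * B[:, j + 1]
        B = nxt
    return B

tau = 2.0
knots = np.linspace(0.0, tau, 5)[1:-1]   # three equally spaced interior knots
grid = np.linspace(0.0, tau, 41)
basis = bspline_basis(grid, knots, tau)
```

The rate in (C7) then governs how many interior knots such a basis should use as n grows.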

We also need some assumptions about $\beta_{0s}$ and $\alpha_{0s}$.

  • (C10)

    The number $K_{2s}$ of nonzero components in the nonparametric part is fixed, and there is a constant $c_\alpha > 0$ such that $\min_{1 \le k \le K_{2s}}\|\alpha_{0k}\| > c_\alpha$. The nonzero coefficients in the linear part satisfy $\min_{1 \le k \le K_{1s}}|\beta_{0k}|/\nu_1 \to \infty$.

Let $Q = \{k : \sigma_k^2 > 0,\ k = 1, \ldots, K\}$, and let $\omega = \{\sigma_k^2 : k \in Q\}$ denote the vector of error variance parameters. To be able to estimate $\omega$, we make the following assumption:

  • (C11)

    $P(m_{ik} > 1) > 0$ for $k \in Q$.

A.2. Proof of Theorem 1

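For simplicity of notation, we assume that $\nu_1 = \nu_2 = \nu$. Note that an estimator $\hat\eta^c(u)$ that maximizes (7) is a solution to (8). By Lemma B.5 in Song and Wang (2017), there exists $\tilde\alpha_k(u) \in G_n$ satisfying $\sup_{u \in [0,\tau]}|\tilde\alpha_k(u) - \alpha_{0k}(u)| = O(h^r)$ for $k = 1, \ldots, K_{2s}$. Let $\tilde\alpha_k = 0$ for $k = K_{2s}+1, \ldots, K_2$, and let $\tilde\alpha(u) = (\tilde\alpha_1, \ldots, \tilde\alpha_{K_2})^\top = \tilde\gamma^\top B(u)$. Define an intermediate estimator $\tilde\beta$ of $\beta$ that maximizes

$$L_n^{cP}(\beta, \tilde\gamma) = L_n^c(\beta, \tilde\gamma) - \sum_{k=1}^{K_1} p_{\nu_1}(|\beta_k|),$$

where

$$L_n^c(\beta, \tilde\gamma) = n^{-1}\sum_{i=1}^n \Delta_i\Big\{\beta^\top X_i + \tilde\gamma^\top(u)Z_i + \tfrac{1}{2}(\beta^\top, \tilde\gamma^\top)\,\Sigma_{R_i}(u)\,(\beta^\top, \tilde\gamma^\top)^\top - \log S_n\big(T_i; (\beta, \tilde\gamma)\big)\Big\}.$$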

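First we show consistency. Let $\tilde\theta = (\tilde\beta^\top, \tilde\gamma^\top)^\top$, $\tilde\theta_0 = (\beta_0^\top, \tilde\gamma^\top)^\top$, $\eta(u) = (\beta^\top, \gamma^\top B(u))^\top$, $\tilde\eta(u) = (\tilde\beta^\top, \tilde\gamma^\top B(u))^\top$, $\eta_0(u) = (\beta_0^\top, \alpha_0^\top(u))^\top$ and $\tilde\eta_0(u) = (\beta_0^\top, \tilde\gamma^\top B(u))^\top$. Next define

$$\Theta(C) = \{\theta = (\beta^\top, \gamma^\top)^\top : \|B^*(\theta - \tilde\theta_0)\| \le C(nh)^{-1/2}\}, \qquad \partial\Theta(C) = \{\theta = (\beta^\top, \gamma^\top)^\top : \|B^*(\theta - \tilde\theta_0)\| = C(nh)^{-1/2}\}.$$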

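Lemma A.1

Let $\eta(u) = (\beta^\top, \gamma^\top B(u))^\top$. If $\theta \in \Theta(C)$ and $\nu \to 0$, then

$$\Pr\Big[\theta \in \Theta(C),\ \bigcup_{\|\eta_{0k}\| \ne 0}\{p_\nu(\|\eta_k\|_2) \ne (a+1)\nu^2/2\}\Big] = o(1).$$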

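Proof. By the triangle inequality,

$$\|\eta_k\| \ge \|\tilde\eta_{0k}\| - \|[B^*(\theta - \tilde\theta_0)]_k\|, \qquad \|\tilde\eta_{0k}\| \ge \|\eta_{0k}\| - \|\eta_{0k} - \tilde\eta_{0k}\|,$$

where $[B^*(\theta - \tilde\theta_0)]_k$ is the $k$th element of $B^*(\theta - \tilde\theta_0)$. As $\theta \in \Theta(C)$, we have

$$\|[B^*(\theta - \tilde\theta_0)]_k\| \le \|B^*(\theta - \tilde\theta_0)\| \le C(nh)^{-1/2}.$$

By Lemma B.5 in Song and Wang (2017),

$$\|\eta_{0k} - \tilde\eta_{0k}\| \le C_k h^r$$

for some constant $C_k$. Thus,

$$\|\eta_k\| \ge \|\eta_{0k}\| - C(nh)^{-1/2} - C_k h^r = \|\eta_{0k}\| - o(1).$$

Since $\|\eta_{0k}\| > 0$ for $k = K_1+1, \ldots, K_1+K_{2s}$ and $\nu \to 0$, $\|\eta_k\| \ge a\nu$ when $n$ is large enough. By Lemmas B.3 and B.4 in Song and Wang (2017), $\|\eta_k\|_2 \ge a\nu$ when $n$ is large enough. The result follows.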
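The constants used here ((a+1)ν²/2 beyond aν) and in the proof of Theorem 2 (a derivative equal to ν near zero) match the SCAD penalty of Fan and Li (2001). Assuming that is the penalty p_ν, a scalar sketch follows; a = 3.7 is the value commonly recommended by Fan and Li and is an assumption here, as are the function names.

```python
def scad_penalty(theta, nu, a=3.7):
    """SCAD penalty p_nu(theta) for theta >= 0 (Fan and Li, 2001)."""
    if theta <= nu:
        return nu * theta                                  # lasso-like near zero
    if theta <= a * nu:
        return -(theta**2 - 2 * a * nu * theta + nu**2) / (2 * (a - 1))
    return (a + 1) * nu**2 / 2                             # constant beyond a * nu

def scad_derivative(theta, nu, a=3.7):
    """Derivative p'_nu(theta) for theta >= 0."""
    if theta <= nu:
        return nu
    if theta <= a * nu:
        return (a * nu - theta) / (a - 1)
    return 0.0
```

The flat tail is what Lemma A.1 exploits: once ‖η_k‖₂ ≥ aν, the penalty no longer depends on η_k, so large coefficients are left unpenalized at the margin.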

Lemma A.2

Under Conditions (C1)–(C10), one has $\|\hat\theta^c - \tilde\theta_0\| = O\{(nh)^{-1/2}\}$.

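Proof. We only need to show that for any $\varepsilon > 0$, there exists a positive constant $C$ such that, as $n \to \infty$,

$$\Pr\Big\{\sup_{\|B^*(\theta - \tilde\theta_0)\| = C(nh)^{-1/2}} L_n^{cP}(\theta) < L_n^{cP}(\tilde\theta_0)\Big\} > 1 - \varepsilon, \tag{A.1}$$

where $L_n^c(\cdot)$ is the corrected log partial likelihood function given in (6). Note that

$$L_n^{cP}(\theta) - L_n^{cP}(\tilde\theta_0) = L_n^c(\theta) - L_n^c(\tilde\theta_0) - \Big\{\sum_{k=1}^K p_\nu(\|\eta_k\|_2) - \sum_{k=1}^K p_\nu(\|\tilde\eta_{0k}\|_2)\Big\}.$$

Since $p_\nu(\theta) \ge p_\nu(0) = 0$, we have

$$L_n^{cP}(\theta) - L_n^{cP}(\tilde\theta_0) \le L_n^c(\theta) - L_n^c(\tilde\theta_0) - \sum_{\|\eta_{0k}\| \ne 0}\{p_\nu(\|\eta_k\|_2) - p_\nu(\|\tilde\eta_{0k}\|_2)\}.$$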

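By a Taylor expansion, we have

$$L_n^c(\theta) - L_n^c(\tilde\theta_0) = (\theta - \tilde\theta_0)^\top U_n^c(\tilde\theta_0) + \frac{1}{2}(\theta - \tilde\theta_0)^\top\frac{\partial U_n^c(\theta^*)}{\partial\theta^\top}(\theta - \tilde\theta_0), \tag{A.2}$$

where

$$U_n^c(\tilde\theta_0) = n^{-1}\sum_{i=1}^n\int_0^\tau\Big\{\hat R_i(u) + \Sigma_{R_i}(u)\tilde\theta_0 - \frac{S_n^c(u,\tilde\eta_0)[\hat R]}{S_n^c(u,\tilde\eta_0)[1]}\Big\}\,dN_i(u),$$

$$\frac{\partial U_n^c(\theta^*)}{\partial\theta^\top} = n^{-1}\sum_{i=1}^n\int_0^\tau\big\{\Sigma_{R_i}(u) - \Gamma^c_\#(u,\eta^*)\big\}\,dN_i(u),$$

$$\Gamma^c_\#(u,\eta) = \frac{S_n^c(u,\eta)[\hat R^{\otimes 2}]\,S_n^c(u,\eta)[1]}{\{S_n^c(u,\eta)[1]\}^2} - \frac{\{S_n^c(u,\eta)[\hat R]\}^{\otimes 2}}{\{S_n^c(u,\eta)[1]\}^2},$$

and $\theta^*$ lies between $\tilde\theta_0$ and $\theta$, so that $\eta^* = B^*(u)\theta^*$ lies between $\tilde\eta_0$ and $\eta = B^*(u)\theta$.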

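From (A.5) and (A.10) in Song and Wang (2017), for $\theta$ with $\|B^*(\theta - \tilde\theta_0)\| = C(nh)^{-1/2}$, we have

$$(\theta - \tilde\theta_0)^\top U_n^c(\tilde\theta_0) = C \times O_p\{(h^r + (nh)^{-1/2})(nh)^{-1/2}\},$$

$$-C^2(nh)^{-1}\{c_{2\Gamma} + o_p(1)\} \le (\theta - \tilde\theta_0)^\top\frac{\partial U_n^c(\theta^*)}{\partial\theta^\top}(\theta - \tilde\theta_0) \le -C^2(nh)^{-1}\{c_{1\Gamma} + o_p(1)\}.$$

By Lemma A.1, $p_\nu(\|\eta_k\|_2) = p_\nu(\|\tilde\eta_{0k}\|_2)$ for $\|\eta_{0k}\| \ne 0$ with probability tending to one. Hence (A.1) holds when $C$ is large enough. □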

Lemma A.3

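Under Conditions (C1)–(C10), we have $\|\tilde\beta - \beta_0\| = O\{(nh)^{-1/2}\}$.

Proof. We only need to show that for any $\varepsilon > 0$, there exists a $C$ such that

$$\Pr\Big\{\sup_{\|\beta - \beta_0\| = C(nh)^{-1/2}} L_n^{cP}(\beta, \tilde\gamma) < L_n^{cP}(\beta_0, \tilde\gamma)\Big\} > 1 - \varepsilon.$$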

The arguments are similar to those for (A.1) and are thus omitted.

Theorem 3.1 follows from Lemmas A.2 and A.3.

A.3. Proof of Theorem 2

We first cite two lemmas, which correspond to Lemmas B.3 and B.4 in the Supplementary Material of Song and Wang (2017).

Lemma A.4

As n → ∞,

$$A_n = \sup_{g \in G^K}\left|\frac{\|g\|_n^2}{\|g\|_2^2} - 1\right| = O_p\left(\sqrt{\frac{\log(n)}{nh}}\right).$$

Lemma A.5

For any function $g \in G^K$, under Conditions (C2) and (C3), there exist $0 < c_1 \le c_2 < \infty$ such that $c_1\|g\|_2 \le \|g\| \le c_2\|g\|_2$.

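Note that $\|\hat\eta^c - \eta_0\| \le \|\hat\eta^c - \tilde\eta\| + \|\tilde\eta - \eta_0\|$, and

$$\|\tilde\eta - \eta_0\|^2 = \|\tilde\beta - \beta_0\|^2 + \|\tilde\gamma^\top B - \alpha_0\|^2 = \|\tilde\beta - \beta_0\|^2 + \sum_{k=1}^K\|\tilde\alpha_k - \alpha_{0k}\|^2.$$

Then $\|\hat\eta^c - \eta_0\| = O\{(nh)^{-1/2}\}$ by Lemmas A.1, A.2 and A.3. This, together with Lemmas A.4 and A.5, implies $\|\hat\eta^c - \eta_0\|_n = O_p\{(nh)^{-1/2}\}$ and $\|\hat\eta^c - \eta_0\|_2 = O_p\{(nh)^{-1/2}\}$.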

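Let $\theta_0 = (\beta_s^\top, 0^\top, \gamma_s^\top, 0^\top)^\top$ and $\eta_0 = B^*\theta_0$, and define

$$\Theta(A) = \{\theta = (\beta_s^\top, \beta_z^\top, \gamma_s^\top, \gamma_z^\top)^\top : \|B^*(\theta - \theta_0)\| \le C(nh)^{-1/2}\}.$$

It suffices to show that

$$L_n^{cP}(\theta_0) = \max_{\theta \in \Theta(A)} L_n^{cP}(\theta).$$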

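Suppose $\theta \in \Theta(A)$. Since $p_\nu(0) = 0$, we have

$$\begin{aligned} L_n^{cP}(\theta) - L_n^{cP}(\theta_0) &= L_n^c(\theta) - L_n^c(\theta_0) - \sum_{\|\eta_{0k}\| = 0} p_\nu(\|\eta_k\|_2) \\ &= \beta_z^\top U^c_{n,\beta_z}(\theta_0) + \gamma_z^\top U^c_{n,\gamma_z}(\theta_0) + \frac{1}{2}\beta_z^\top\frac{\partial U^c_{n,\beta_z}(\theta_0^*)}{\partial\beta_z^\top}\beta_z + \frac{1}{2}\gamma_z^\top\frac{\partial U^c_{n,\gamma_z}(\theta_0^*)}{\partial\gamma_z^\top}\gamma_z - \sum_{\|\eta_{0k}\| = 0} p'_\nu(\|\eta_k^*\|_2)f_k(\theta^*)\eta_k, \end{aligned}$$

where $U^c_{n,\beta_z}(\theta)$ is the subvector of $U_n^c(\theta)$ composed of its $(K_{1s}+1)$th to $K_1$th elements, and $U^c_{n,\gamma_z}(\theta)$ is the subvector of $U_n^c(\theta)$ composed of its $(K_1 + \sum_{k=1}^{K_{2s}} L_k + 1)$th to $(K_1 + \sum_{k=1}^{K_2} L_k)$th elements. With arguments similar to those for (A.5) and (A.10) in Song and Wang (2017), we have

$$\beta_z^\top U^c_{n,\beta_z}(\theta_0) = C \times O_p\{h^r + (nh)^{-1/2}\}\|\beta_z\|, \qquad \gamma_z^\top U^c_{n,\gamma_z}(\theta_0) = C \times O_p\{h^r + (nh)^{-1/2}\}\|B\gamma_z\|,$$

$$-C^2\{c_{2\Gamma} + o_p(1)\}\|\beta_z\|^2 \le \beta_z^\top\frac{\partial U^c_{n,\beta_z}(\theta_0^*)}{\partial\beta_z^\top}\beta_z \le -C^2\{c_{1\Gamma} + o_p(1)\}\|\beta_z\|^2,$$

$$-C^2\{c_{2\Gamma} + o_p(1)\}\|B\gamma_z\|^2 \le \gamma_z^\top\frac{\partial U^c_{n,\gamma_z}(\theta_0^*)}{\partial\gamma_z^\top}\gamma_z \le -C^2\{c_{1\Gamma} + o_p(1)\}\|B\gamma_z\|^2.$$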

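Note that

$$\sum_{\|\eta_{0k}\| = 0} p'_\nu(\|\eta_k^*\|_2)f_k(\theta^*)\eta_k = \sum_j\frac{p'_\nu(|\beta_{jz}|)}{|\beta_{jz}|}\beta_{jz}^2 + \sum_j\frac{p'_\nu(\|B_j\gamma_{jz}\|_2)}{\|B_j\gamma_{jz}\|_2}\gamma_{jz}^\top\langle B_j, B_j\rangle\gamma_{jz}.$$

Since $\gamma_{jz}^\top\langle B_j, B_j\rangle\gamma_{jz} = \|B_j\gamma_{jz}\|_2^2$, $|\beta_{jz}| \le C(nh)^{-1/2}$, $\|B_j\gamma_{jz}\|_2 \le C(nh)^{-1/2}$ and $\nu(nh)^{1/2} \to \infty$, we have $p'_\nu(|\beta_{jz}|) = \nu$ and $p'_\nu(\|B_j\gamma_{jz}\|_2) = \nu$ when $n$ is large enough. Therefore,

$$\sum_j\frac{p'_\nu(|\beta_{jz}|)}{|\beta_{jz}|}\beta_{jz}^2 \ge \nu\|\beta_z\|, \qquad \sum_j\frac{p'_\nu(\|B_j\gamma_{jz}\|_2)}{\|B_j\gamma_{jz}\|_2}\gamma_{jz}^\top\langle B_j, B_j\rangle\gamma_{jz} \ge C\nu\|B\gamma_z\|_2.$$

Because $\nu(nh)^{1/2} \to \infty$, these penalty terms dominate the first-order terms above, and it follows that

$$L_n^{cP}(\theta) - L_n^{cP}(\theta_0) \le 0.$$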

This completes the proof.
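The argument above relies on the identity γᵀ⟨B_j, B_j⟩γ = ‖B_jγ‖₂², so the group norm of a spline coefficient block needs only the Gram matrix of the basis. The sketch assumes the L2 norm with Lebesgue measure on [0, τ] and a basis with no interior knots (Bernstein polynomials of degree d), for which the Gram matrix has the closed form ∫₀^τ b_i b_j du = τ·C(d,i)C(d,j)/{(2d+1)·C(2d,i+j)}. Both choices and the function names are assumptions for illustration.

```python
import numpy as np
from math import comb

def bernstein_gram(tau, degree=2):
    # exact <B_i, B_j> = integral over [0, tau] of B_i(u) B_j(u) du
    # for the Bernstein basis (B-splines with no interior knots)
    G = np.empty((degree + 1, degree + 1))
    for i in range(degree + 1):
        for j in range(degree + 1):
            G[i, j] = tau * comb(degree, i) * comb(degree, j) / (
                (2 * degree + 1) * comb(2 * degree, i + j))
    return G

def group_l2_norm(gamma, G):
    # ||B gamma||_2 = sqrt(gamma^T <B, B> gamma)
    gamma = np.asarray(gamma, dtype=float)
    return float(np.sqrt(gamma @ G @ gamma))
```

Because the basis sums to one, γ = (1, 1, 1) represents the constant function 1, whose norm on [0, τ] is √τ, which gives a quick check of the Gram matrix.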

A.4. Proof of Theorem 3

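By Theorems 3.1 and 3.2, there exists $\hat\theta = (\hat\beta_s^{c\top}, 0^\top, \hat\gamma_s^{c\top}, 0^\top)^\top$ maximizing $L_n^{cP}(\theta)$. It follows that $\hat\theta$ also maximizes $L_n^{cP}((\beta_s^\top, 0^\top, \gamma_s^\top, 0^\top)^\top)$. Let $\dot U_s^c(\theta) = \partial U_n^c(\theta)/\partial(\beta_s^\top, \gamma_s^\top)$. By Lemma A.1 and $p_\nu(0) = 0$, for $n$ large enough,

$$U^{cP}_{s,n}(\hat\theta) = U^c_{s,n}(\hat\theta), \quad \text{a.s.}$$

Let $\hat\theta_s = (\hat\beta_s^{c\top}, \hat\gamma_s^{c\top})^\top$, and let $\tilde\theta_{0s}$ contain the corresponding elements of $\tilde\theta_0$. With arguments similar to those for Lemma B.12 in Song and Wang (2017), for any $a \in \mathbb{R}^{K_{1s}+L_s}$, we have

$$a^\top n^{1/2}(\hat\theta_s - \tilde\theta_{0s}) = a^\top\{\dot U_s^c(\theta_0)\}^{-1}n^{-1/2}\sum_{i=1}^n U^c_{ni}(\eta_0) + o_p(1)\|a\|_2, \tag{A.3}$$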

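where

$$U^c_{ni}(\eta_0) = \int_0^\tau B^{*\top}(u)\Big\{\hat H_i + \Sigma_{H_i}\eta_0(u) - \frac{S^c(u,\eta_0)[\hat H]}{S^c(u,\eta_0)[1]}\Big\}dN_i(u) - \int_0^\tau B^{*\top}(u)\Big\{\hat H_i - \frac{S^c(u,\eta_0)[\hat H]}{S^c(u,\eta_0)[1]}\Big\}\frac{S^c_{ni}(u,\eta_0)[1]}{S^c(u,\eta_0)[1]}\,dE\{N(u)\}.$$

By arguments similar to those for Theorem 2 in Song and Wang (2017), it can be shown that $\{\dot U_s^c(\theta_0)\}^{-1}$ has bounded eigenvalues and

$$c_2^{-1}a^\top a \le a^\top\{\dot U_s^c(\theta_0)\}^{-1}a \le c_1^{-1}a^\top a. \tag{A.4}$$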

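From (A.6), the first $K_{1s}$ rows of $\{\dot U_s^c(\theta_0)\}^{-1}$ equal $[J_1, J_2]$ with

$$J_1 = \big(\dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}\big)^{-1}, \qquad J_2 = -\big(\dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}\big)^{-1}\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}.$$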
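J₁ and J₂ follow from the standard block-matrix inversion (Schur complement) identity. The sketch below checks numerically that [J₁, J₂] reproduces the leading rows of the inverse for a random symmetric positive definite matrix; note the minus sign in J₂ that the identity requires. The function name is hypothetical.

```python
import numpy as np

def leading_block_rows_of_inverse(M, k):
    """Return (J1, J2), the first k rows of inv(M), via the Schur complement."""
    A, Bm = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    D_inv = np.linalg.inv(D)
    J1 = np.linalg.inv(A - Bm @ D_inv @ C)   # (A - B D^{-1} C)^{-1}
    J2 = -J1 @ Bm @ D_inv                     # -(A - B D^{-1} C)^{-1} B D^{-1}
    return J1, J2

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 6))
M = X @ X.T + 6 * np.eye(6)   # symmetric positive definite test matrix
J1, J2 = leading_block_rows_of_inverse(M, 2)
```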

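Thus, for any $a_1 \in \mathbb{R}^{K_{1s}}$, replacing $a$ by $(a_1^\top, 0_{L_s}^\top)^\top$ in (A.3) and letting

$$\dot U_s^c(\theta_0) = E\Big\{\int_0^\tau B^{*\top}(u)\Gamma_s(u,\eta)B^*(u)\,dN_i(u)\Big\} = \begin{pmatrix}\dot U^c_{s\beta_s\beta_s} & \dot U^c_{s\beta_s\gamma_s}\\ \dot U^c_{s\gamma_s\beta_s} & \dot U^c_{s\gamma_s\gamma_s}\end{pmatrix},$$

with $\dot U^c_{s\beta_s\beta_s}$ a $K_{1s} \times K_{1s}$ matrix and $\dot U^c_{s\gamma_s\gamma_s}$ an $L_s \times L_s$ matrix, we have

$$a_1^\top\sqrt n(\hat\beta_s^c - \beta_{0s}) = a_1^\top\big(\dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}\big)^{-1}\big(I, -\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\big)\times\frac{1}{\sqrt n}\sum_{i=1}^n U^c_{ni}(\eta_0) + \|a_1\|_2\,o_p(1).$$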

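It is easy to see that $E(U^c_{ni}) = 0$. We only need to show that

$$\operatorname{var}\Big\{a_1^\top\big(\dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}\big)^{-1}\big(I, -\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\big)U^c_{ni}(\eta_0)\Big\} \ge c_3a_1^\top a_1$$

for some constant $c_3 > 0$. This follows from arguments similar to those for Lemma A.5 in Song and Wang (2017). Therefore,

$$\sqrt n\,\Sigma^{c\,-1/2}_{\beta_s}(\hat\beta_s^c - \beta_{0s}) \xrightarrow{d} N(0, I),$$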

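where

$$\Sigma^c_{\beta_s} = A_c^{-1}D_c(A_c^{-1})^\top, \tag{A.5}$$

with

$$A_c = \dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}, \qquad D_c = \big(I, -\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\big)E\{U^{c\,\otimes 2}_{ni}(\eta_0)\}\big(I, -\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\big)^\top. \tag{A.6}$$

Finally, with arguments similar to those for Lemma B.14 in Song and Wang (2017), we can show that $\Sigma^c_{\beta_s} - \Sigma_{\beta_s}$ is positive definite.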
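A plug-in version of the sandwich variance in (A.5)–(A.6) can be sketched as follows, assuming A_c is the Schur complement U̇ββ − U̇βγU̇γγ⁻¹U̇γβ (so that A_c⁻¹D_cA_c⁻ᵀ agrees with the expansion of √n(β̂ − β_{0s}) preceding it) and that per-subject score contributions U_ni and an estimate of U̇ are available. The names are hypothetical. As a sanity check, when the empirical E{U_ni^{⊗2}} equals U̇ the sandwich collapses to the leading block of U̇⁻¹.

```python
import numpy as np

def sandwich_beta_cov(Udot, scores, k):
    """Plug-in A^{-1} D A^{-T} for the first k (linear) coefficients.

    Udot:   (q, q) matrix partitioned as [[Ubb, Ubg], [Ugb, Ugg]], Ubb is k x k
    scores: (n, q) array whose rows are per-subject score contributions U_ni
    """
    Ubb, Ubg = Udot[:k, :k], Udot[:k, k:]
    Ugb, Ugg = Udot[k:, :k], Udot[k:, k:]
    Ugg_inv = np.linalg.inv(Ugg)
    A = Ubb - Ubg @ Ugg_inv @ Ugb                  # Schur complement A_c
    P = np.hstack([np.eye(k), -Ubg @ Ugg_inv])     # (I, -Ubg Ugg^{-1})
    E_UU = scores.T @ scores / scores.shape[0]     # empirical E{U_ni^{⊗2}}
    D = P @ E_UU @ P.T                             # D_c
    A_inv = np.linalg.inv(A)
    return A_inv @ D @ A_inv.T

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 5))
M = X @ X.T + 5 * np.eye(5)
L = np.linalg.cholesky(M)
scores = np.sqrt(5) * L.T     # five "subjects" whose outer products average to M
Sigma = sandwich_beta_cov(M, scores, 2)
```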

A.5. Proof of Theorem 4

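When $\omega$ is unknown, we rewrite $U_s^c(\theta)$ as $U_s^c(\theta, \omega)$ and modify the notation for the other functions similarly. A method of moments estimator of $\sigma_k^2$ ($k \in Q$) is

$$\hat\sigma_k^2 = \frac{\sum_{i=1}^n\sum_{j=1}^{m_{ik}} I(m_{ik} > 1)(W_{ikj} - \bar W_{ik})^2}{\sum_{i=1}^n I(m_{ik} > 1)(m_{ik} - 1)}.$$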
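The moment estimator above pools within-subject sums of squares over subjects with at least two replicates. A minimal sketch, assuming the replicates of one error-prone covariate k are stored as one array per subject (a hypothetical layout and function name):

```python
import numpy as np

def pooled_error_variance(replicates):
    """Method-of-moments estimate of sigma_k^2 from replicate measurements.

    replicates: list over subjects of 1-D arrays (W_ik1, ..., W_ik m_ik).
    Subjects with a single replicate contribute nothing, since I(m_ik > 1) = 0.
    """
    num, den = 0.0, 0
    for w in replicates:
        w = np.asarray(w, dtype=float)
        if w.size > 1:
            num += np.sum((w - w.mean()) ** 2)   # sum_j (W_ikj - Wbar_ik)^2
            den += w.size - 1                      # m_ik - 1
    return num / den

reps = [[1.0, 3.0], [2.0], [4.0, 5.0, 6.0]]
sigma2_hat = pooled_error_variance(reps)
```

Equivalently, it is a (m_ik − 1)-weighted average of the per-subject sample variances, which is the form used for Ψ_i below.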

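By the strong law of large numbers, under Condition (C11), it can easily be shown that $\hat\sigma_k^2$ converges almost surely to $\sigma_k^2$. This, together with a Taylor expansion, implies that

$$n^{1/2}(\hat\sigma_k^2 - \sigma_k^2) = n^{-1/2}\big[E\{I(m_{ik} > 1)(m_{ik} - 1)\}\big]^{-1}\sum_{i=1}^n I(m_{ik} > 1)\Big\{\sum_{j=1}^{m_{ik}}(W_{ikj} - \bar W_{ik})^2 - (m_{ik} - 1)\sigma_k^2\Big\} + o_p(1).$$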

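With these facts, it can be shown that the results of Theorems 3.1 and 3.2 still hold. Then, using arguments similar to those for Lemma A.5 in Song and Wang (2017), we have

$$a^\top n^{1/2}(\hat\theta_s - \tilde\theta_{0s}) = a^\top\{\dot U_s^c(\theta_0, \omega)\}^{-1}n^{-1/2}\sum_{i=1}^n\{U^c_{ni}(\eta_0, \omega) + \Psi_i(\theta_0, \omega)\} + o_p(1)\|a\|_2,$$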

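where

$$\begin{aligned}\Psi_i(\theta_0, \omega) &= \sum_{k \in Q}E\Big\{\frac{\partial U^c(\theta_0, \omega)}{\partial\sigma_k^2}\Big\}\big[E\{I(m_{ik} > 1)(m_{ik} - 1)\}\big]^{-1}I(m_{ik} > 1)\Big\{\sum_{j=1}^{m_{ik}}(W_{ikj} - \bar W_{ik})^2 - (m_{ik} - 1)\sigma_k^2\Big\}\\ &= \sum_{k \in Q}E\Big\{\frac{\partial U^c(\theta_0, \omega)}{\partial\sigma_k^2}\Big\}\big[E\{I(m_{ik} > 1)(m_{ik} - 1)\}\big]^{-1}I(m_{ik} > 1)(m_{ik} - 1)\{S^2_{eik} - \sigma_k^2\},\end{aligned}$$

and $S^2_{eik}$ is the sample variance of $e_{ikj}$, $j = 1, \ldots, m_{ik}$. Then, with arguments similar to those for Theorem 3, it can be shown that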

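$$a_1^\top\sqrt n(\hat\beta_s^c - \beta_{0s}) = a_1^\top\big(\dot U^c_{s\beta_s\beta_s} - \dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\dot U^c_{s\gamma_s\beta_s}\big)^{-1}\big(I, -\dot U^c_{s\beta_s\gamma_s}\dot U^{c\,-1}_{s\gamma_s\gamma_s}\big)\times\frac{1}{\sqrt n}\sum_{i=1}^n\{U^c_{ni}(\eta_0, \omega) + \Psi_i(\theta_0, \omega)\} + \|a_1\|_2\,o_p(1).$$

Both $U^c_{ni}(\eta_0, \omega)$ and $\Psi_i(\theta_0, \omega)$ have mean 0. By the law of iterated expectations, we have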

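$$E\{U^c_{ni}(\eta_0, \omega)\Psi_i^\top(\theta_0, \omega)\} = E\big[E\{U^c_{ni}(\eta_0, \omega)\Psi_i^\top(\theta_0, \omega) \mid H_i, e_i\}\big] = E\big[E\{U^c_{ni}(\eta_0) \mid H_i, e_i\}\Psi_i^\top(\theta_0, \omega)\big].$$

Note that $E\{U^c_{ni}(\eta_0) \mid H_i, e_i\}$ is a function of $(H_i, \bar e_i)$, which is independent of $S^2_{eik}$. Therefore $E\{U^c_{ni}(\eta_0, \omega)\Psi_i^\top(\theta_0, \omega)\} = 0$. It follows that

$$D_c^* = \operatorname{var}\{U^c_{ni}(\eta_0, \omega) + \Psi_i(\theta_0, \omega)\} = \operatorname{var}\{U^c_{ni}(\eta_0, \omega)\} + \operatorname{var}\{\Psi_i(\theta_0, \omega)\}. \tag{A.7}$$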

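Hence

$$\sqrt n\,\Sigma^{c*\,-1/2}_{\beta_s}(\hat\beta_s^c - \beta_{0s}) \xrightarrow{d} N(0, I),$$

where

$$\Sigma^{c*}_{\beta_s} = A_c^{-1}D_c^*(A_c^{-1})^\top. \tag{A.8}$$

It is easy to see from (A.7) that $\Sigma^{c*}_{\beta_s} - \Sigma^c_{\beta_s}$ is positive definite.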

References

  1. Cai J, Fan J, Jiang J, and Zhou H (2008), ‘Partially linear hazard regression with varying coefficients for multivariate survival data’, Journal of the Royal Statistical Society, Series B, 70, 141–158.
  2. Dafni UG and Tsiatis AA (1998), ‘Evaluating surrogate markers of clinical outcome measured with error’, Biometrics, 54, 1445–1462.
  3. Donoho DL and Johnstone IM (1998), ‘Minimax estimation via wavelet shrinkage’, The Annals of Statistics, 26, 879–921.
  4. Fan J and Li R (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association, 96, 1348–1360.
  5. Fan J and Li R (2002), ‘Variable selection for Cox’s proportional hazards model and frailty model’, The Annals of Statistics, 30, 74–99.
  6. Faucett CJ and Thomas DC (1996), ‘Simultaneously modeling censored survival data and repeatedly measured covariates: a Gibbs sampling approach’, Statistics in Medicine, 15, 1663–1685.
  7. Gui J and Li H (2005), ‘Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data’, Bioinformatics, 21, 3001–3008.
  8. Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, Hirsch MS, and Merigan TC (1996), ‘A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter’, New England Journal of Medicine, 335, 1081–1089.
  9. Henderson R, Diggle P, and Dobson A (2000), ‘Joint modeling of longitudinal measurements and event time data’, Biostatistics, 1, 465–480.
  10. Huang J (1998), ‘Projection estimation in multiple regression with application to functional ANOVA models’, The Annals of Statistics, 26, 242–272.
  11. Huang J and Ma S (2010), ‘Variable selection in the accelerated failure time model via the bridge method’, Lifetime Data Analysis, 16, 176–195.
  12. Huang Y and Wang CY (2000), ‘Cox regression with accurate covariates unascertainable: A nonparametric correction approach’, Journal of the American Statistical Association, 95, 1209–1219.
  13. Ma S, Kosorok M, and Fine J (2006), ‘Additive risk models for survival data with high-dimensional covariates’, Biometrics, 62, 202–210.
  14. Miyata S and Shen X (2012), ‘Adaptive free-knot splines’, Journal of Computational and Graphical Statistics, 12, 197–213.
  15. Nan B, Lin X, Lisabeth L, and Harlow S (2005), ‘A varying-coefficient Cox model for the effect of age at a marker event on age at menopause’, Biometrics, 61, 576–583.
  16. Prentice R (1982), ‘Covariate measurement errors and parameter estimates in a failure time regression model’, Biometrika, 69, 331–342.
  17. Sleeper LA and Harrington DP (1990), ‘Regression splines in the Cox model with application to covariate effects in liver disease’, Journal of the American Statistical Association, 85, 941–949.
  18. Song X, Davidian M, and Tsiatis AA (2002a), ‘An estimator for the proportional hazards model with multiple longitudinal covariates measured with error’, Biostatistics, 3, 511–528.
  19. Song X, Davidian M, and Tsiatis AA (2002b), ‘A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data’, Biometrics, 58, 742–753.
  20. Song X and Huang Y (2005), ‘On corrected score approach for proportional hazards model with covariate measurement error’, Biometrics, 61, 702–714.
  21. Song X and Wang CY (2008), ‘Semiparametric approaches for joint modeling of longitudinal and survival data with time varying coefficients’, Statistica Sinica, 27, 3178–3190.
  22. Song X and Wang L (2017), ‘Partially time-varying coefficient proportional hazards models with error prone time-dependent covariates: an application to the AIDS clinical trial group 175 data’, The Annals of Applied Statistics, 11, 274–296.
  23. Stone CJ, Hansen M, Kooperberg C, and Truong YK (1997), ‘Polynomial splines and their tensor products in extended linear modeling (with discussion)’, The Annals of Statistics, 25, 1371–1470.
  24. Tibshirani R (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society, Series B, 58, 267–288.
  25. Tsiatis AA and Davidian M (2001), ‘A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error’, Biometrika, 88, 447–458.
  26. Wang CY, Hsu L, Feng ZD, and Prentice RL (1997), ‘Regression calibration in failure time regression’, Biometrics, 53, 131–145.
  27. Wulfsohn MS and Tsiatis AA (1997), ‘A joint model for survival and longitudinal data measured with error’, Biometrics, 53, 330–339.
  28. Xu J and Zeger SL (2001), ‘Joint analysis of longitudinal data comprising repeated measures and times to events’, Applied Statistics, 50, 375–387.
  29. Xue L (2009), ‘Consistent variable selection in additive models’, Statistica Sinica, 19, 1281–1296.
  30. Xue L, Qu A, and Zhou J (2010), ‘Consistent model selection for marginal generalized additive model for correlated data’, Journal of the American Statistical Association, 105 (492), 1518–1530.
  31. Xue L and Yang L (2006), ‘Additive coefficient modeling via polynomial spline’, Statistica Sinica, 16, 1423–1446.
  32. Yan J and Huang J (2012), ‘Model selection for Cox models with time-varying coefficients’, Biometrics, 68, 419–428.
  33. Zhang H and Lu W (2007), ‘Adaptive lasso for Cox’s proportional hazards model’, Biometrika, 94, 691–703.
