Variable selection for partially linear proportional hazards model with covariate measurement error

Xiao Song; Li Wang; Shuangge Ma; Hanwen Huang

doi:10.1080/10485252.2018.1545903

Author manuscript; available in PMC: 2020 Oct 9.

Published in final edited form as: J Nonparametr Stat. 2018 Nov 14;31(1):196–220. doi: 10.1080/10485252.2018.1545903


PMCID: PMC7546028 NIHMSID: NIHMS1512684 PMID: 33041606

Abstract

In survival analysis, we may encounter the following three problems: nonlinear covariate effect, variable selection and measurement error. Existing studies only address one or two of these problems. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realized using an iterative algorithm. The performance of the proposed approach is assessed via simulation studies, and further illustrated by application to data from an AIDS clinical trial.

Keywords: Corrected score, conditional score, joint modeling, polynomial spline, survival, 62J07, 62G05, 62G20, 62N01

1. Introduction

In survival analysis, the Cox proportional hazards model has been extensively adopted. When the assumption of linear covariate effects is not sufficient, the partially linear (varying coefficient) proportional hazards model has been assumed. Estimation, inference, and application of the partially linear proportional hazards model have been studied in the literature; see, for example, Cai et al. (2008) and Nan et al. (2005). Among all covariates collected, not all are necessarily associated with survival, creating a demand for variable selection. Multiple techniques have been proposed for variable selection (along with estimation), including penalization, Bayesian methods, boosting, thresholding, and others. Among them, penalization has drawn special attention because of its appealing methodological, theoretical, and computational properties. Penalized variable selection and estimation with the Cox model has been studied in Fan and Li (2002), Gui and Li (2005) and Zhang and Lu (2007). Yan and Huang (2012) studied penalized variable selection for the varying coefficient proportional hazards model. It is noted that, in most of the existing studies, linear covariate effects have been assumed. More remotely relevant to this study, penalized variable selection and estimation with linear covariate effects have been considered for other survival models, for example the accelerated failure time (AFT) model (Huang and Ma, 2010) and the additive risk model (Ma et al., 2006).

In most of the existing studies, including the aforementioned, it has been assumed that the covariate values have been measured without error. The measurement error problem has been examined in quite a few publications. Under the standard proportional hazards model, available approaches include the regression calibration (Prentice, 1982; Wang et al., 1997; Dafni and Tsiatis, 1998), likelihood-based approaches (Wulfsohn and Tsiatis, 1997; Faucett and Thomas, 1996; Henderson et al., 2000; Xu and Zeger, 2001; Song et al., 2002b), conditional score (Tsiatis and Davidian, 2001; Song et al., 2002a) and correction approaches (Huang and Wang, 2000), among others. When the more challenging partially linear covariate effects are present, existing studies include the local conditional score and corrected score approaches based on kernel smoothing (Song and Wang, 2008) and spline smoothing (Song and Wang, 2017).

In summary, existing approaches only address one or two of the following problems: nonlinear covariate effect, variable selection using penalization, and measurement error. However, it is not hard to imagine that all three problems can co-exist. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularized estimation, a penalization approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realized using an iterative algorithm. We note that although the individual components of the proposed model/approach have roots in the existing literature, their “combination”, which can tackle a practically important problem, has not been investigated in the existing studies. The increased complexity brings significant methodological, computational, and theoretical challenges.

The paper is organized as follows. In Section 2, we give the model definition. In Section 3, we first review the spline-based conditional score and corrected score approaches in Section 3.1, and then develop the corresponding penalized approaches in Section 3.2. Section 3.3 presents the asymptotic properties of the estimators. We assess the performance of the approaches via simulations in Section 4. The approaches are applied to the ACTG 175 data in Section 5. Section 6 provides concluding remarks and a brief discussion.

2. Model definition

Let T denote the failure time and C denote the censoring time. The observed survival data are V = min(T, C) and Δ = I(T ≤ C), where I(·) is the indicator function. Let H = (H₁,…, H_K)^⊤ denote K covariates. To deal with measurement error, one needs repeated error-prone measurements, a validation set, or instrumental variables. We focus on the case with replicated measurements of the error-prone covariates; the proposed approaches can be easily extended to the other cases. Suppose that the kth covariate H_k may be measured with error m_k times with $W_{k} = {(W_{k 1}, \dots, W_{k m_{k}})}^{⊤}$ being the m_k error-contaminated measurements. To ensure identifiability, for error-contaminated covariates, we assume that a subset of subjects have replicated observations, that is, m_k > 1. For error-free covariates, m_k = 1 and W_k = H_k. Let $W = {(W_{1}^{⊤}, \dots, W_{K}^{⊤})}^{⊤}$ , and m = (m₁,…,m_K)^⊤.

We assume the classical measurement error model

W_{k j} = H_{k} + e_{k j}, j = 1, \dots, m_{k}, k = 1, \dots, K,

(1)

where the error e_kj is normally-distributed with mean zero and variance $σ_{k}^{2}$ . For error-free covariates, e_kj = 0. Let $e = {(e_{1}^{⊤}, \dots, e_{K}^{⊤})}^{⊤}$ , where $e_{k} = {(e_{k 1}, \dots, e_{k m_{k}})}^{⊤}$ . We assume that the errors are independent, and e is independent of (T, C) given H.

Suppose the first K₁ covariates X have constant effects on survival and the last K₂ covariates Z have possible time-varying effect on survival, that is, H = (X^⊤, Z^⊤)^⊤, and K = K₁ + K₂. A partially time-varying coefficient proportional hazards model is assumed for the relationship between the hazard of failure and the covariates,

λ (u | H) = \lim_{d u \to 0} d u^{- 1} \Pr (u \leq T < u + d u | T \geq u, H, C) = λ_{0} (u) \exp {β_{0}^{⊤} X + α_{0}^{⊤} (u) Z} .

(2)

Here λ₀(u) is an unspecified baseline hazard; β₀ is a length-K₁ vector of regression parameters and α₀(u) is a length-K₂ vector of smooth functions. Model (2) subsumes the standard proportional hazards model (K₂ = 0) and the varying-coefficient model (K₁ = 0). It makes explicit the assumption that censoring is noninformative.

Suppose the observed data are independent and identically distributed samples of (V, Δ, W, m), which are denoted by {(V_i, Δ_i, W_i, m_i) : i = 1,…,n}. We focus on estimating the regression parameters β₀ and α₀(u).

3. Approaches

3.1. Estimation

For now, we assume the error variances $(σ_{1}^{2}, \dots, σ_{K}^{2})$ are known. Song and Wang (2017) proposed spline-based corrected score and conditional score approaches for the case where time-dependent covariates are measured with error, which can be easily adapted to the present setting as follows. Specifically, let α_0k(u) be the kth component of α₀(u). A B-spline basis expansion is used to approximate α_0k(u):

α_{0 k} (u) \approx \sum_{ℓ = 1}^{L_{k}} γ_{0 k ℓ} B_{k ℓ} (u),

where $\{B_{k ℓ} (u)\}_{ℓ = 1}^{L_{k}}$ is a set of basis functions, and L_k = n_k + d + 1 is the number of basis functions used to approximate α_0k(u), with n_k being the number of interior knots and d the degree of the spline. The interior knots can be either equally spaced or placed at the sample quantiles of the observed failure times, so that there are about the same number of events between any two adjacent knots. In practice, if the failure events are sparse, we recommend the second approach to reduce the chance of singularities. With the approximation, model (2) can be written in the form of the standard proportional hazards model:

λ_{i} (u) \approx λ_{0} (u) \exp {θ_{0}^{⊤} R_{i} (u)} .

(3)

Here R(u) = (X^⊤,Z^⊤B(u))^⊤ is the vector of “covariates”, where

B (u) = [\begin{matrix} B_{11} (u) & \dots & B_{1 L_{1}} (u) & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & \dots & 0 & B_{21} (u) & \dots & B_{2 L_{2}} (u) & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ 0 & \dots & 0 & 0 & \dots & 0 & B_{K_{2} 1} (u) & \dots & B_{K_{2} L_{K_{2}}} (u) \end{matrix}]

is a K₂ × L matrix with $L = \sum_{k = 1}^{K_{2}} L_{k}$ , $γ_{0 k} = {(γ_{0 k 1}, \dots, γ_{0 k L_{k}})}^{⊤}$ for k = 1,…,K₂, $γ_{0} = {(γ_{01}^{⊤}, \dots, γ_{0 K_{2}}^{⊤})}^{⊤}$ , and $θ_{0} = {(β_{0}^{⊤}, γ_{0}^{⊤})}^{⊤}$ . The regression coefficient θ₀ in (3) can be estimated by techniques that account for measurement error, such as the corrected score and conditional score approaches.
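As a minimal sketch of the basis expansion above, the following pure-Python function evaluates a B-spline basis at a point by the Cox-de Boor recursion. The function name `bspline_basis`, the boundary arguments `lower` and `upper`, and the use of a clamped knot vector (boundary knots repeated d + 1 times) are illustrative assumptions rather than details taken from the paper. The sketch is only meant to show that n_k interior knots and degree d give L_k = n_k + d + 1 basis functions.

```python
def bspline_basis(u, interior_knots, degree, lower, upper):
    """Evaluate all B-spline basis functions at u on [lower, upper].

    Assumes a clamped knot vector: each boundary knot is repeated
    degree + 1 times (an illustrative choice, not taken from the paper).
    Returns a list of length len(interior_knots) + degree + 1.
    """
    knots = [lower] * (degree + 1) + list(interior_knots) + [upper] * (degree + 1)
    n_basis = len(interior_knots) + degree + 1

    # Degree-0 basis: indicator of the half-open knot interval containing u.
    basis = [0.0] * (len(knots) - 1)
    for j in range(len(knots) - 1):
        if knots[j] <= u < knots[j + 1]:
            basis[j] = 1.0
    if u == upper:
        # Close the last non-empty interval on the right.
        basis[n_basis - 1] = 1.0

    # Cox-de Boor recursion, raising the degree one step at a time.
    for p in range(1, degree + 1):
        new_basis = [0.0] * (len(knots) - 1 - p)
        for j in range(len(new_basis)):
            left = 0.0
            denom = knots[j + p] - knots[j]
            if denom > 0:
                left = (u - knots[j]) / denom * basis[j]
            right = 0.0
            denom = knots[j + p + 1] - knots[j + 1]
            if denom > 0:
                right = (knots[j + p + 1] - u) / denom * basis[j + 1]
            new_basis[j] = left + right
        basis = new_basis
    return basis
```

With no interior knots and d = 2 on [0, 1], the function returns the three quadratic Bernstein polynomials evaluated at u, and the values always sum to one on [lower, upper].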

Corrected score

The idea of the corrected score (correction) approach is to correct the bias of the naive estimating function that is obtained by replacing the true covariates with the sample means of their replicated measurements in the partial likelihood estimating function (Huang and Wang, 2000). Let g denote a scalar, vector or matrix which can be fixed or random, Y_i(u) = I(V_i ≥ u) be the “at-risk” process, N_i(u) = I(V_i ≤ u, Δ_i = 1) be the counting process for the failure events, and η(u) = (β^⊤, γ^⊤ B^⊤ (u))^⊤. Let θ = (β^⊤, γ^⊤)^⊤. The spline-based corrected score estimating equation can be written as

U_{n}^{c} (θ) = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{τ} {{\hat{R}}_{i} (u) + Σ_{R_{i}} (u) θ - \frac{S_{n}^{c} (u, η) [\hat{R}]}{S_{n}^{c} (u, η) [1]}} d N_{i} (u) = 0

(4)

for a fixed time τ. Here ${\hat{R}}_{i} (u) = {({\bar{X}}_{i}^{⊤}, {\bar{Z}}_{i}^{⊤} B (u))}^{⊤}$ , and ${\hat{H}}_{i} = {({\bar{X}}_{i}^{⊤}, {\bar{Z}}_{i}^{⊤})}^{⊤}$ with the kth component equal to ${\hat{H}}_{i k} = m_{i k}^{- 1} \sum_{j = 1}^{m_{i k}} W_{i k j}$ ; $Σ_{R_{i}} (u)$ is the variance of ${\hat{R}}_{i} (u)$ given H_i; and for a scalar, vector or matrix g, $S_{n}^{c} (u, η) [g] = n^{- 1} \sum_{i = 1}^{n} S_{n i}^{c} (u, η) [g]$ with

S_{n i}^{c} (u, η) [g] = Y_{i} (u) g_{i} \exp \{η^{⊤} (u) {\hat{H}}_{i} - η^{⊤} (u) Σ_{H_{i}} η (u) / 2\} .

Here $Σ_{H_{i}} = diag (m_{i 1}^{- 1} σ_{1}^{2}, \dots, m_{i K}^{- 1} σ_{K}^{2})$ is the variance of ${\hat{H}}_{i}$ given H_i, and $Σ_{R_{i}} (u) = B_{*}^{⊤} (u) Σ_{H_{i}} B_{*} (u)$ with

B_{*} (u) = \left(\begin{array}{cc} I_{K_{1} \times K_{1}} & 0_{K_{1} \times L} \\ 0_{K_{2} \times K_{1}} & B (u) \end{array}\right)

with $I_{K_{1} \times K_{1}}$ denoting a (K₁ × K₁) identity matrix and 0_r×s an (r × s) zero matrix.
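The role of the term subtracted in the exponent of the corrected weight can be checked numerically in a scalar special case. The sketch below is an illustration under stated assumptions, not the paper's implementation: it takes a single covariate with true value `h`, a scalar coefficient `eta`, normal errors with known variance `sigma2`, and `m` replicates (all names and default values are hypothetical). It compares the Monte Carlo mean of the naive weight exp(η W̄) and of the corrected weight exp(η W̄ − η² σ² / (2m)) with the target exp(η H).

```python
import math
import random


def corrected_weight(eta, w_bar, sigma2, m):
    """Scalar analogue of exp{eta * H_hat - eta * Sigma_H * eta / 2},
    where Sigma_H = sigma2 / m is the variance of the replicate mean."""
    return math.exp(eta * w_bar - eta * eta * sigma2 / (2.0 * m))


def monte_carlo_check(h=1.0, eta=0.8, sigma2=0.25, m=2, n_draws=100_000, seed=1):
    """Return (mean naive weight, mean corrected weight, target exp(eta * h))."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    naive_sum = 0.0
    corrected_sum = 0.0
    for _ in range(n_draws):
        # Mean of m replicated error-prone measurements W_j = h + e_j.
        w_bar = sum(h + rng.gauss(0.0, sd) for _ in range(m)) / m
        naive_sum += math.exp(eta * w_bar)
        corrected_sum += corrected_weight(eta, w_bar, sigma2, m)
    return naive_sum / n_draws, corrected_sum / n_draws, math.exp(eta * h)
```

Under normal errors the corrected weight has conditional expectation exp(η H) exactly, whereas the naive weight is inflated by the factor exp{η² σ² / (2m)}; calling `monte_carlo_check()` should show the corrected mean close to the target and the naive mean above it.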

Conditional score

The conditional score approach treats the unobserved true covariates as nuisance parameters for which sufficient statistics may be derived, and a set of estimating equations based on conditioning on the sufficient statistics may be deduced that remove the dependence on the true covariates (Tsiatis and Davidian, 2001). The spline-based conditional score estimating equation can be written as

U_{n}^{d} (θ) = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} {{\hat{R}}_{i}^{*} (u) - \frac{S_{n}^{d} (u, η) [{\hat{R}}_{i}^{*}]}{S_{n}^{d} (u, η) [1]}} d N_{i} (u) = 0,

(5)

where ${\hat{R}}_{i}^{*} (u) = {\hat{R}}_{i} (u) + Σ_{R_{i}} (u) θ d N_{i} (u)$ is a “sufficient statistic” for R_i(u), and

S_{n}^{d} (u, η) [g] = n^{- 1} \sum_{i = 1}^{n} Y_{i} (u) g_{i} \exp {θ^{⊤} {\hat{R}}_{i}^{*} (u) - θ^{⊤} Σ_{R_{i}} (u) θ / 2} .

When there is no measurement error (all $σ_{k}^{2} = 0$ ), it follows that $Σ_{R_{i}} (u) = 0$ , ${\hat{R}}_{i} (u) = R_{i} (u)$ and ${\hat{R}}_{i}^{*} (u) = R_{i} (u)$ , and thus both (4) and (5) reduce to the standard partial likelihood score estimating function for (3).

In practice, $(σ_{1}^{2}, \dots, σ_{K}^{2})$ are generally unknown. They can be estimated by the method of moments. The corrected score and conditional score estimates can be obtained by substituting $({\hat{σ}}_{1}^{2}, \dots, {\hat{σ}}_{K}^{2})$ for $(σ_{1}^{2}, \dots, σ_{K}^{2})$ in $Σ_{R_{i}} (u)$ in (4) and (5).

3.2. Penalized variable selection

Assume there are multiple covariates. We are interested in estimating $η_{0} (u) = {(β_{0}^{⊤}, α_{0}^{⊤} (u))}^{⊤}$ , when some components of η₀(u) are zero and correspond to covariates that are not associated with the response. Without loss of generality, write $β_{0} = {(β_{0}^{s ⊤}, β_{0}^{z ⊤})}^{⊤}$ where $β_{0}^{s}$ contains $K_{1}^{s}$ nonzero elements, $β_{0}^{z}$ contains $K_{1}^{z}$ zero elements, and $K_{1} = K_{1}^{s} + K_{1}^{z}$ . Write $α_{0} (u) = {(α_{0}^{s ⊤} (u), α_{0}^{z ⊤} (u))}^{⊤}$ , where $α_{0}^{s} (u)$ contains $K_{2}^{s}$ nonzero elements, $α_{0}^{z} (u)$ contains $K_{2}^{z}$ zero elements and $K_{2} = K_{2}^{s} + K_{2}^{z}$ .

One popular technique for regression-based variable selection is penalization, which can be applied to M-estimators, Z-estimators and U-estimators. Let $G_{k} = \{g_{k} (u) : g_{k} (u) = \sum_{ℓ = 1}^{L_{k}} γ_{k ℓ} B_{k ℓ} (u)\}$ for k = 1,…,K₂. Let $G * = R^{K_{1}} \times G_{1} \times \dots \times G_{K_{2}}$ . For any vector function g on [0, τ], let ∥g∥² = E{Δg^⊤ (V)g(V)}, $‖ g ‖_{n}^{2} = n^{- 1} \sum_{i = 1}^{n} Δ_{i} g^{⊤} (V_{i}) g (V_{i})$ , $‖ g ‖_{2}^{2} = \int g^{⊤} (u) g (u) d u$ . For any k = 1,…,K₂ and ℓ, ℓ′ = 1,…,L_k, define the inner product $〈 B_{k ℓ}, B_{k ℓ^{'}} 〉 = \int B_{k ℓ} (u) B_{k ℓ^{'}} (u) d u$ with norm ∥B_kℓ∥² = 〈B_kℓ, B_kℓ〉. For $B_{k} = {(B_{k 1}, \dots, B_{k L_{k}})}^{⊤}$ , let

〈 B_{k}, B_{k} 〉 = (\begin{matrix} 〈 B_{k 1}, B_{k 1} 〉 & \dots & 〈 B_{k 1}, B_{k L_{k}} 〉 \\ ⋮ & ⋱ & ⋮ \\ 〈 B_{k L_{k}}, B_{k 1} 〉 & \dots & 〈 B_{k L_{k}}, B_{k L_{k}} 〉 \end{matrix}) .

We apply the penalization with respect to functions $g = {(g_{1}, \dots, g_{K})}^{⊤} \in G *$ . It can be easily seen that $U_{n}^{c} (θ)$ is the derivative of the corrected log partial likelihood

L_{n}^{c} (θ) = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} \{θ^{⊤} {\hat{R}}_{i} (u) + \frac{1}{2} θ^{⊤} Σ_{R_{i}} (u) θ - \log S_{n}^{c} (u, η) [1]\} d N_{i} (u),

(6)

which is also a function of $η \in G *$ . When applying penalization to $L_{n}^{c} (θ)$ , we maximize the following objective function

L_{n}^{c P} (θ) = L_{n}^{c} (θ) - \sum_{k = 1}^{K_{1}} p_{ν_{1}} (| β_{k} |) - \sum_{k = 1}^{K_{2}} p_{ν_{2}} ({‖ γ_{k} ‖}_{B_{k}}),

(7)

where ${‖ γ_{k} ‖}_{B_{k}}^{2} = γ_{k}^{⊤} 〈 B_{k}, B_{k} 〉 γ_{k}$ as in Xue (2009) and p_ν(·) is some penalty function. Taking the derivative with respect to θ in (7), we obtain the penalized corrected score estimating equation

U_{n}^{c} (θ) - \sum_{k = 1}^{K_{1}} p_{ν_{1}}^{'} (| β_{k} |) β_{k} / | β_{k} | - \sum_{k = 1}^{K_{2}} p_{ν_{2}}^{'} ({‖ γ_{k} ‖}_{B_{k}}) {‖ γ_{k} ‖}_{B_{k}}^{- 1} 〈 B_{k}, B_{k} 〉 γ_{k} = 0,

(8)

where $p_{ν}^{'} (s)$ is the derivative of $p_{ν} (s)$ for s > 0 and $p_{ν}^{'} (0) = 0$ . Estimating equation (8) can be rewritten as

U_{n}^{c P} (θ) = U_{n}^{c} (θ) - Ω_{ν} (θ) θ = 0,

where

Ω_{ν} (θ) = diag {\frac{β_{1} p_{ν_{1}}^{'} (| β_{1} |)}{| β_{1} |}, \dots, \frac{β_{K_{1}} p_{ν_{1}}^{'} (| β_{K_{1}} |)}{| β_{K_{1}} |}, \frac{p_{ν_{2}}^{'} ({‖ γ_{1} ‖}_{B_{1}})}{{‖ γ_{1} ‖}_{B_{1}}} 〈 B_{1}, B_{1} 〉, \dots, \frac{p_{ν_{2}}^{'} ({‖ γ_{K_{2}} ‖}_{B_{K_{2}}})}{{‖ γ_{K_{2}} ‖}_{B_{K_{2}}}} 〈 B_{K_{2}}, B_{K_{2}} 〉} .

Popular penalty functions include LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001). Since the LASSO lacks the oracle property, we focus on SCAD, which is defined by

p_{ν} (u) = \begin{cases} ν u, & 0 \leq u \leq ν, \\ - \frac{u^{2} - 2 a ν u + ν^{2}}{2 (a - 1)}, & ν < u < a ν, \\ \frac{(a + 1) ν^{2}}{2}, & u \geq a ν, \end{cases}

with

p_{ν}^{'} (u) = ν {I (u \leq ν) + \frac{{(a ν - u)}_{+}}{(a - 1) ν} I (u > ν)} .

Here the tuning parameter ν controls the variable selection, and the parameters $L_{1}, \dots, L_{K_{2}}$ in the spline functions control the smoothness of the estimated functions ${\hat{α}}_{k} (\cdot)$ .
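The SCAD penalty and its derivative defined above translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the default a = 3.7 is the value suggested by Fan and Li (2001), assumed here because this section does not state the value of a that was used.

```python
def scad_penalty(u, nu, a=3.7):
    """SCAD penalty p_nu(u) for u >= 0 (piecewise definition)."""
    if u <= nu:
        return nu * u
    if u < a * nu:
        return -(u * u - 2.0 * a * nu * u + nu * nu) / (2.0 * (a - 1.0))
    return (a + 1.0) * nu * nu / 2.0


def scad_derivative(u, nu, a=3.7):
    """Derivative p'_nu(u) for u > 0, with the convention p'_nu(0) = 0."""
    if u == 0:
        return 0.0
    if u <= nu:
        return nu
    # Linearly decaying part on (nu, a * nu), zero beyond a * nu.
    return max(a * nu - u, 0.0) / (a - 1.0)
```

The penalty is continuous at u = ν and u = aν and is flat beyond aν, so large coefficients are not shrunk; this is the feature that distinguishes SCAD from the LASSO.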

We use the majorize-minorize (MM) algorithm to obtain the penalized corrected score estimator ${\hat{θ}}^{c}$ . Using local quadratic approximation (Fan and Li, 2001), the MM algorithm sets

θ^{c (k + 1)} = θ^{c (k)} - {\frac{\partial U_{n}^{c} (θ^{c (k)})}{\partial θ^{⊤}} - Ω_{ν}^{*} (θ^{c (k)})}^{- 1} U_{n}^{c P} (θ^{c (k)})

at the (k + 1)th iteration for k ≥ 0, where θ^c(0) is the solution to $U_{n}^{c} (θ) = 0$ , and

Ω_{ν}^{*} (θ) = diag {\frac{β_{1} p_{ν_{1}}^{'} (| β_{1} |)}{| β_{1} | + ε}, \dots, \frac{β_{K_{1}} p_{ν_{1}}^{'} (| β_{K_{1}} |)}{| β_{K_{1}} | + ε}, \frac{p_{ν_{2}}^{'} ({‖ γ_{1} ‖}_{B_{1}})}{{‖ γ_{1} ‖}_{B_{1}} + ε} 〈 B_{1}, B_{1} 〉, \dots, \frac{p_{ν_{2}}^{'} ({‖ γ_{K_{2}} ‖}_{B_{K_{2}}})}{{‖ γ_{K_{2}} ‖}_{B_{K_{2}}} + ε} 〈 B_{K_{2}}, B_{K_{2}} 〉}

for a small number ε. We set ε = 10⁻³ in our numerical studies.

Similarly, we propose the penalized conditional score estimating equation

U_{n}^{d P} (θ) = U_{n}^{d} (θ) - Ω_{ν} (θ) θ = 0 .

(9)

The estimator ${\hat{θ}}^{d}$ can be obtained using the MM algorithm through the iterations

θ^{d (k + 1)} = θ^{d (k)} - {\frac{\partial U_{n}^{d} (θ^{d (k)})}{\partial θ^{⊤}} - Ω_{ν}^{*} (θ^{d (k)})}^{- 1} U_{n}^{d P} (θ^{d (k)}) .
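The Newton-type updates above can be illustrated on a scalar toy problem. The sketch below is not the paper's estimator: it replaces the corrected or conditional score with the hypothetical score U(θ) = z − θ (whose unpenalized root is z), uses the SCAD derivative with an assumed a = 3.7, and applies the ε-perturbed weight p′(|θ|)/(|θ| + ε) in both the penalized score and its derivative so that no division by zero occurs. All function and argument names are illustrative.

```python
def scad_derivative(u, nu, a=3.7):
    """SCAD derivative p'_nu(u) for u > 0, with p'_nu(0) = 0 by convention."""
    if u == 0:
        return 0.0
    if u <= nu:
        return nu
    return max(a * nu - u, 0.0) / (a - 1.0)


def lqa_scalar(z, nu, eps=1e-3, n_iter=200):
    """Local quadratic approximation iteration for the toy score U(theta) = z - theta."""
    theta = z  # start from the unpenalized solution of U(theta) = 0
    for _ in range(n_iter):
        # Weight from the local quadratic approximation of the penalty.
        weight = scad_derivative(abs(theta), nu) / (abs(theta) + eps)
        penalized_score = (z - theta) - weight * theta
        penalized_slope = -1.0 - weight  # dU/dtheta minus the weight
        theta = theta - penalized_score / penalized_slope
    return theta
```

In this toy setting the iteration shrinks a small signal (z = 0.5 with ν = 1) to a value of order ε, leaves a large signal (z = 5) untouched, and moves an intermediate signal (z = 3) to approximately 4.4/1.7, the SCAD thresholding solution. In practice, coefficients whose final magnitude is of order ε would be set to zero.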

3.3. Asymptotic properties

In this section, we derive the asymptotic properties of the proposed estimators. We first assume $σ_{k}^{2} (k = 1, \dots, K)$ are known. To reduce the complexity of studying the asymptotics, we consider equally spaced knots and assume the numbers of knots are all equal for k = 1,…,K₂. Let $h = n_{k}^{- 1}$ be the length of the subintervals between any two adjacent interior knots. The asymptotic properties of the penalized corrected score estimator, ${\hat{θ}}^{c} = {({\hat{β}}^{c ⊤}, {\hat{γ}}^{c ⊤})}^{⊤}$ , are given in the following theorems with the proof outlined in the Appendix.

Theorem 3.1

Under Conditions (C1)–(C10) given in the Appendix, if the tuning parameters ν₁ → 0 and ν₂ → 0, then almost surely there exists a solution ${\hat{η}}^{c} (u) = {({\hat{β}}^{c ⊤}, {\hat{γ}}^{c ⊤} B^{⊤} (u))}^{⊤}$ such that

‖ {\hat{η}}^{c} - η_{0} ‖ = O {{(n h)}^{- 1 / 2}} .

Note that the estimator of α(u) is $B (u) {\hat{γ}}^{c}$ . Corresponding to the non-zero and zero coefficients, write ${\hat{β}}^{c} = {({\hat{β}}^{s c ⊤}, {\hat{β}}^{z c ⊤})}^{⊤}$ and ${\hat{α}}^{c} = {({\hat{α}}^{s c ⊤}, {\hat{α}}^{z c ⊤})}^{⊤}$ .

Theorem 3.2

Under Conditions (C1)–(C10) given in the Appendix, if ν₁ → 0, ν₂ → 0, ν₁(nh)^1/2 → ∞ and ν₂(nh)^1/2 → ∞, then with probability approaching 1, ${\hat{α}}^{z c} = 0$ and ${\hat{β}}^{z c} = 0$ .

Theorem 3.3

Under Conditions (C1)–(C10) given in the Appendix, if ν₁ → 0, ν₂ → 0, ν₁(nh)^1/2 → ∞, and ν₂(nh)^1/2 → ∞, then

\sqrt{n} {(Σ_{β^{s}}^{c})}^{- 1 / 2} ({\hat{β}}^{s c} - β_{0}^{s}) \overset{d}{\to} N (0, I),

where $Σ_{β^{s}}^{c}$ is given in (A.5) in the Appendix. In addition, $Σ_{β^{s}}^{c} - Σ_{β^{s}}$ is positive definite, where $Σ_{β^{s}}$ is the variance of the estimator when there is no measurement error.

Theorem 3.1 indicates the consistency of the penalized corrected score estimator. Theorems 3.2 and 3.3 indicate that the estimator has the “oracle” property (Donoho and Johnstone, 1998), that is, as n → ∞, the penalized corrected score estimator performs as well as if the correct submodel, which excludes the zero-effect covariates, were known.

When some of the error variances $σ_{k}^{2}$ are greater than 0 and unknown, they can be estimated by the method of moments estimator ${\hat{σ}}_{k}^{2}$ (Song et al., 2002a):

{\hat{σ}}_{k}^{2} = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i k}} I (m_{i k} > 1) {(W_{i k j} - {\bar{W}}_{i k})}^{2}}{\sum_{i = 1}^{n} I (m_{i k} > 1) (m_{i k} - 1)} .

This requires P(m_ik > 1) > 0 for error contaminated covariates (Condition (C11) in the Appendix). The corrected score and conditional score estimates can be obtained by substituting ${\hat{σ}}_{k}^{2}$ for $σ_{k}^{2}$ in (8) and (9). It can be easily shown that ${\hat{σ}}_{k}^{2}$ is a root-n consistent estimator of $σ_{k}^{2}$ . Replacing $σ_{k}^{2}$ with ${\hat{σ}}_{k}^{2}$ does not affect the convergence rate of the penalized corrected score and conditional score estimators. The asymptotic normality of ${\hat{β}}^{s c}$ is given in the following theorem.

Theorem 3.4

Under Conditions (C1)–(C11) given in the Appendix, if ν₁ → 0, ν₂ → 0, ν₁(nh)^1/2 → ∞, and ν₂(nh)^1/2 → ∞, then

\sqrt{n} {(Σ_{β^{s}}^{c *})}^{- 1 / 2} ({\hat{β}}^{s c} - β_{0}^{s}) \overset{d}{\to} N (0, I),

where $Σ_{β^{s}}^{c *}$ is given in (A.8) in the Appendix. In addition, $Σ_{β^{s}}^{c *} - Σ_{β^{s}}^{c}$ is positive definite.

Theorem 3.4 indicates that the estimator is less efficient when the error variances are estimated.
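The method of moments estimator of $σ_{k}^{2}$ given above pools within-subject squared deviations over the subjects that have replicates. A minimal sketch for a single covariate follows; the function name and the list-of-lists input layout (one inner list of replicated measurements per subject) are assumptions made for illustration.

```python
def moment_variance(replicates):
    """Method of moments estimate of the error variance for one covariate.

    `replicates` holds one list of measurements W_ik1, ..., W_ikm per subject.
    Subjects with a single measurement (m_ik = 1) contribute nothing.
    """
    numerator = 0.0
    denominator = 0
    for measurements in replicates:
        m = len(measurements)
        if m > 1:
            mean = sum(measurements) / m
            numerator += sum((w - mean) ** 2 for w in measurements)
            denominator += m - 1
    if denominator == 0:
        # No subject has replicates, so the variance is not identifiable.
        raise ValueError("at least one subject needs replicated measurements")
    return numerator / denominator
```

For example, with subjects measured as [1, 3], [2] and [0, 0, 3], the pooled sum of squares is 2 + 6 = 8 and the degrees of freedom are 1 + 2 = 3, so the estimate is 8/3.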

With similar arguments as those in Song and Wang (2017), we can show that the penalized conditional score estimator has the same asymptotic properties as the penalized corrected score estimator. The asymptotic distribution result enables us to construct confidence intervals for the coefficients simultaneously.

4. Simulation studies

We conducted simulation studies to evaluate the performance of the estimators. We considered the case with 15 covariates. The covariates were generated from multivariate normal distributions with common correlation ρ = 0, 0.25 or 0.5. Among them, four were measured with error, with corresponding coefficients β₁ = 0, β₂ = 0, β₃ = −1, and α₁(u) = 0.3log(u/5+1)−1.8, and 11 covariates were exactly measured, with corresponding coefficients β_j = 0 for j = 4,…,12, β₁₃ = −1, and α₂(u) = 0. The variance of the error was 0.25. Each error-contaminated covariate had two replicated observations. The baseline hazard was a constant λ₀(u) = 0.0005. The censoring times were generated from an exponential distribution with mean 400. The censoring rates were between 38% and 41%.

We ran the simulations for n = 300, 500 and 1000. In each scenario, 500 Monte Carlo datasets were simulated. For each dataset, the coefficients were estimated using the following approaches: (i) the “ideal” approach where the true values of the covariates are used; (ii) the naive approach; (iii) the corrected score approach; (iv) the conditional score approach; (v) the penalized “ideal”, naive, corrected score and conditional score approaches.

To reduce computational complexity, we considered ν₁ = ν₂ = ν in our numerical studies. This is justified by Theorems 3.1–3.3 in Section 3.3. We used quadratic splines with equally spaced knots. The penalty parameter ν and the number of knots were selected via a BIC type criterion, specifically, by minimizing −2L_n(θ) + n_p log(d), where n_p is the number of estimated nonzero parameters, d is the number of events, and L_n(θ) is the log partial likelihood function for the “ideal” penalized and unpenalized approaches, the naive log partial likelihood function for the naive approaches, and the corrected log partial likelihood function for the corrected score and conditional score approaches. Our preliminary studies found that zero interior knots were selected for most of the datasets. Here we show the results with zero interior knots.

For the non-zero time-varying coefficients, we calculated the average of the mean absolute bias, $\sum_{u = 1}^{160} | {\hat{α}}_{1} (u) - α_{1} (u) | / 160$ , over an equally spaced grid between 1 and 160, which are the 5th and 95th percentiles of the observed survival times, across the simulated datasets. Similarly, we calculated the average of the mean standard deviation, mean standard error and mean coverage probability of 95% Wald confidence intervals. For the constant non-zero coefficients, we report the same statistics, with the mean absolute bias replaced by the mean bias. The results for the nonzero coefficients are shown in Tables 1–3. Figure 1 shows the average of the estimates of α₁ and the corresponding 95% point-wise confidence intervals. For all the estimators, the standard deviation tends to increase as ρ increases, and the penalized estimators are more efficient than the corresponding unpenalized estimators. The unpenalized and penalized naive approaches have large bias and poor coverage probabilities in estimating β₃ and α₁(u) in all the cases, and the coverage probabilities worsen as the sample size increases. The unpenalized corrected score and conditional score estimators also show relatively large bias when n = 300, and the coverage probability is somewhat below the nominal level, but their performance improves as the sample size increases. The corresponding penalized approaches not only reduce bias but also improve efficiency, especially when the correlation between the covariates is large. Penalization also improves the coverage probabilities of the conditional score approaches. Although the conditional score and the corrected score estimators have the same asymptotic distributions, the conditional score estimators have smaller bias and standard errors, which indicates that they have better finite sample performance. An intuitive explanation can be found in Song and Huang (2005) (Section 3.1).

Table 1.

Estimation of the nonzero coefficients when n = 300

		β₃					β₁₃					α₁ (u)
		Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE
Corr = 0.00	Ideal	−0.061	0.099	0.098	0.914		−0.059	0.102	0.098	0.900		0.071	0.190	0.183	0.919
	Naive	0.152	0.098	0.088	0.548		0.045	0.101	0.094	0.894		0.158	0.177	0.165	0.760
	Corr	−0.187	0.228	0.212	0.956		−0.151	0.173	0.152	0.889		0.218	0.331	0.267	0.879
	Cond	−0.147	0.185	0.146	0.830		−0.121	0.149	0.125	0.836		0.173	0.288	0.240	0.875
	P Ideal	−0.023	0.092	0.091	0.926	1.17	−0.020	0.093	0.091	0.946	1.19	0.039	0.179	0.170	0.919	1.13
	P Naive	0.187	0.090	0.083	0.388	1.16	0.083	0.094	0.088	0.811	1.15	0.199	0.167	0.156	0.641	1.13
	P Corr	−0.065	0.158	0.133	0.904	2.08	−0.054	0.133	0.110	0.895	1.70	0.079	0.257	0.195	0.861	1.66
	P Cond	−0.045	0.147	0.128	0.915	1.57	−0.040	0.126	0.113	0.930	1.41	0.059	0.244	0.217	0.916	1.39
Corr = 0.25	Ideal	−0.049	0.108	0.108	0.942		−0.052	0.108	0.108	0.934		0.055	0.221	0.210	0.929
	Naive	0.175	0.100	0.095	0.530		0.036	0.106	0.104	0.910		0.191	0.197	0.187	0.721
	Corr	−0.184	0.210	0.246	0.977		−0.130	0.151	0.154	0.899		0.204	0.354	0.303	0.894
	Cond	−0.141	0.188	0.162	0.890		−0.101	0.143	0.132	0.898		0.160	0.328	0.280	0.905
	P Ideal	−0.012	0.097	0.095	0.938	1.24	−0.012	0.096	0.095	0.934	1.26	0.045	0.205	0.191	0.921	1.17
	P Naive	0.183	0.092	0.086	0.457	1.19	0.047	0.097	0.094	0.895	1.20	0.207	0.183	0.172	0.647	1.15
	P Corr	−0.054	0.153	0.139	0.947	1.88	−0.038	0.125	0.111	0.912	1.46	0.062	0.281	0.214	0.867	1.59
	P Cond	−0.034	0.145	0.131	0.934	1.67	−0.025	0.121	0.113	0.936	1.40	0.050	0.271	0.244	0.924	1.46
Corr = 0.50	Ideal	−0.057	0.143	0.127	0.894		−0.061	0.140	0.128	0.904		0.072	0.272	0.254	0.926
	Naive	0.214	0.127	0.109	0.504		0.013	0.139	0.125	0.922		0.217	0.236	0.221	0.719
	Corr	−0.252	0.349	0.440	0.973		−0.172	0.229	0.239	0.924		0.324	0.568	0.513	0.900
	Cond	−0.196	0.283	0.212	0.878		−0.128	0.201	0.163	0.868		0.255	0.475	0.382	0.903
	P Ideal	−0.014	0.122	0.107	0.920	1.37	−0.022	0.116	0.108	0.924	1.47	0.045	0.246	0.227	0.929	1.22
	P Naive	0.199	0.124	0.096	0.475	1.05	−0.018	0.119	0.108	0.933	1.36	0.205	0.221	0.200	0.695	1.15
	P Corr	−0.060	0.255	0.166	0.915	1.87	−0.059	0.173	0.132	0.913	1.77	0.101	0.477	0.270	0.849	1.42
	P Cond	−0.029	0.187	0.150	0.915	2.27	−0.032	0.143	0.127	0.925	1.97	0.067	0.360	0.306	0.927	1.74


Corr, corrected score; Cond, conditional score; P, penalized; SD, empirical standard deviation; SE, average of estimated standard errors; CP, empirical coverage probability of 95% confidence interval; NC: non-convergence rate (%).

Table 3.

Estimation of the nonzero coefficients when n = 1000

		β₃					β₁₃					α₁ (u)
		Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE
Corr = 0.00	Ideal	−0.018	0.051	0.051	0.950		−0.016	0.052	0.051	0.936		0.037	0.091	0.094	0.924
	Naive	0.188	0.048	0.045	0.022		0.089	0.053	0.049	0.536		0.206	0.087	0.085	0.270
	Corr	−0.052	0.080	0.081	0.928		−0.038	0.070	0.063	0.900		0.055	0.128	0.112	0.883
	Cond	−0.045	0.079	0.071	0.886		−0.034	0.069	0.063	0.904		0.049	0.126	0.119	0.906
	P Ideal	−0.007	0.050	0.049	0.950	1.05	−0.006	0.051	0.049	0.944	1.04	0.037	0.090	0.092	0.921	1.03
	P Naive	0.197	0.047	0.046	0.014	1.05	0.100	0.052	0.049	0.474	1.02	0.219	0.085	0.085	0.238	1.04
	P Corr	−0.024	0.076	0.070	0.912	1.13	−0.017	0.068	0.060	0.904	1.05	0.040	0.123	0.103	0.872	1.08
	P Cond	−0.019	0.074	0.070	0.938	1.12	−0.013	0.067	0.062	0.924	1.05	0.038	0.122	0.118	0.922	1.08
Corr = 0.25	Ideal	−0.020	0.057	0.056	0.936		−0.013	0.061	0.056	0.936		0.040	0.113	0.108	0.911
	Naive	0.202	0.052	0.049	0.016		0.078	0.060	0.054	0.654		0.224	0.102	0.096	0.288
	Corr	−0.059	0.090	0.093	0.952		−0.035	0.079	0.067	0.890		0.059	0.157	0.128	0.869
	Cond	−0.050	0.088	0.079	0.896		−0.029	0.078	0.067	0.900		0.053	0.153	0.139	0.906
	P Ideal	−0.010	0.054	0.052	0.928	1.11	−0.003	0.055	0.052	0.930	1.20	0.040	0.110	0.104	0.912	1.06
	P Naive	0.186	0.050	0.048	0.044	1.11	0.059	0.056	0.052	0.760	1.15	0.213	0.099	0.095	0.307	1.06
	P Corr	−0.027	0.080	0.075	0.930	1.29	−0.012	0.070	0.061	0.900	1.27	0.045	0.146	0.115	0.859	1.15
	P Cond	−0.020	0.078	0.074	0.936	1.27	−0.008	0.069	0.063	0.924	1.26	0.042	0.143	0.134	0.920	1.14
Corr = 0.50	Ideal	−0.017	0.065	0.065	0.948		−0.013	0.067	0.065	0.950		0.050	0.130	0.130	0.922
	Naive	0.247	0.062	0.056	0.022		0.064	0.070	0.064	0.816		0.257	0.113	0.112	0.297
	Corr	−0.068	0.119	0.133	0.978		−0.038	0.090	0.081	0.902		0.088	0.197	0.165	0.865
	Cond	−0.056	0.115	0.097	0.888		−0.031	0.088	0.079	0.918		0.077	0.192	0.176	0.908
	P Ideal	−0.006	0.058	0.058	0.946	1.28	−0.002	0.060	0.058	0.950	1.25	0.050	0.124	0.123	0.915	1.10
	P Naive	0.197	0.058	0.053	0.070	1.16	0.001	0.063	0.059	0.940	1.26	0.216	0.109	0.109	0.358	1.07
	P Corr	−0.023	0.096	0.086	0.924	1.55	−0.009	0.075	0.069	0.924	1.45	0.057	0.175	0.134	0.844	1.27
	P Cond	−0.016	0.094	0.083	0.922	1.50	−0.006	0.074	0.069	0.932	1.42	0.052	0.172	0.163	0.923	1.25


Figure 1. Average estimates of α₁(u) and the 95% pointwise confidence interval.

To evaluate the performance of the penalized approaches on variable selection, we calculated the percentage of datasets in which the correct model was selected, and the percentage of datasets in which each covariate was included in the model. The results are shown in Table 4. All methods correctly select the covariates with non-zero coefficients in all cases, and the percentage of incorrectly selecting a covariate with zero coefficient tends to be higher if the coefficient is treated as time-varying in the model. The conditional score and the corrected score estimators perform better than the naive estimator. Of the two proposed methods, the penalized conditional score approach performs slightly better. All penalized approaches improve on variable selection as the sample size increases.

Table 4.

Average percentage (%) of correct selection of the model, and percentages of selection of individual covariates with each coefficient

			Model	Non Zero Coef			Zero Coef
				β₃	β₁₃	α₁	β₁	β₂	β₄	β₅	β₆	β₇	β₈	β₉	β₁₀	β₁₁	β₁₂	α₂
n=300	Corr = 0.00	P Ideal	65.8	100	100	100	1.0	2.2	2.2	2.8	2.8	2.8	3.4	3.0	3.4	2.6	2.0	16.5
		P Naive	38.0	100	100	100	12.3	12.7	11.9	11.5	13.1	12.7	10.7	10.5	11.3	15.1	12.5	38.4
		P Corr	35.8	100	100	100	7.7	7.3	6.6	4.7	7.3	6.7	5.6	8.1	6.9	6.9	5.8	30.8
		P Cond	36.7	100	100	100	7.4	7.4	6.4	4.7	7.4	6.4	5.1	7.4	6.6	6.6	5.9	30.4
	Corr = 0.25	P Ideal	74.3	100	100	100	1.4	2.2	1.4	2.4	2.4	1.6	1.0	1.2	2.0	1.8	1.8	15.0
		P Naive	38.3	100	100	100	14.2	14.6	14.9	12.6	11.7	11.9	11.3	13.2	14.4	11.3	12.1	38.3
		P Corr	45.9	100	100	100	6.0	7.5	4.3	4.9	4.3	6.2	4.1	3.6	5.8	3.8	6.2	28.6
		P Cond	49.0	100	100	100	5.4	7.3	3.3	4.1	4.4	4.6	4.1	3.5	3.5	5.6	5.4	27.8
	Corr = 0.50	P Ideal	82.2	100	100	100	0.6	1.0	0.0	0.8	0.4	0.8	0.8	0.8	0.2	0.8	0.6	13.8
		P Naive	37.7	100	100	100	14.7	18.1	13.0	12.6	14.5	14.9	9.8	11.4	11.8	11.8	13.2	42.4
		P Corr	61.3	100	100	100	5.6	4.6	3.6	3.6	3.1	4.4	3.6	3.6	4.6	4.9	3.8	25.6
		P Cond	60.3	100	100	100	4.0	5.7	2.4	3.0	3.0	3.2	3.0	2.2	3.2	3.8	2.4	24.7
n=500	Corr = 0.00	P Ideal	90.8	100	100	100	0.8	0.4	0.8	0.4	0.6	0.2	0.2	0.0	0.2	0.0	0.2	5.8
		P Naive	62.8	100	100	100	7.6	6.2	6.4	5.4	4.6	5.8	5.4	5.2	5.2	4.8	4.0	21.6
		P Corr	72.3	100	100	100	2.2	2.4	1.4	1.8	1.0	1.4	3.8	1.4	2.0	1.4	1.4	12.6
		P Cond	74.1	100	100	100	2.0	2.4	1.4	1.6	1.0	1.2	3.4	1.4	2.0	1.2	1.4	11.4
	Corr = 0.25	P Ideal	89.8	100	100	100	0.2	0.2	0.4	0.8	0.4	0.6	0.2	0.8	0.0	0.2	0.0	7.2
		P Naive	58.5	100	100	100	7.8	7.2	7.8	6.6	7.2	6.0	6.8	5.2	8.2	7.6	7.2	25.9
		P Corr	73.0	100	100	100	1.8	1.6	1.6	2.0	1.0	2.2	2.2	1.8	1.4	1.4	1.4	15.9
		P Cond	75.6	100	100	100	1.4	1.6	1.6	1.8	1.0	2.0	1.8	1.2	1.4	1.2	1.0	14.5
	Corr = 0.50	P Ideal	97.2	100	100	100	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.4	0.0	2.6
		P Naive	55.7	100	100	100	12.0	10.4	8.6	9.6	6.8	7.6	8.6	9.6	11.4	8.4	9.2	28.3
		P Corr	86.9	100	100	100	0.6	0.6	1.0	0.2	0.6	0.4	0.4	0.2	0.6	1.0	0.2	10.1
		P Cond	88.4	100	100	100	0.4	0.8	0.8	0.2	0.4	0.2	0.2	0.2	0.4	0.8	0.2	9.2
n=1000	Corr = 0.00	P Ideal	99.6	100	100	100	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.4
		P Naive	89.2	100	100	100	1.0	1.2	0.8	1.4	0.2	0.8	1.4	0.8	0.6	1.2	0.8	5.2
		P Corr	96.2	100	100	100	0.0	0.0	0.2	0.0	0.0	0.2	0.2	0.4	0.2	0.2	0.0	2.4
		P Cond	96.2	100	100	100	0.0	0.0	0.2	0.0	0.0	0.2	0.2	0.4	0.2	0.2	0.0	2.4
	Corr = 0.25	P Ideal	99.4	100	100	100	0.0	0.0	0.0	0.0	0.2	0.0	0.0	0.0	0.0	0.0	0.0	0.6
		P Naive	81.6	100	100	100	1.4	2.2	2.0	2.8	2.4	2.2	2.2	2.0	2.4	1.6	1.4	11.8
		P Corr	95.8	100	100	100	0.2	0.4	0.0	0.0	0.0	0.2	0.0	0.2	0.0	0.0	0.0	3.4
		P Cond	96.0	100	100	100	0.2	0.2	0.0	0.0	0.0	0.2	0.0	0.2	0.0	0.0	0.0	3.4
	Corr = 0.50	P Ideal	99.8	100	100	100	0.0	0.0	0.0	0.0	0.2	0.0	0.0	0.0	0.0	0.0	0.0	0.2
		P Naive	70.6	100	100	100	7.6	4.8	6.2	3.6	5.2	5.4	3.4	5.4	5.2	6.4	4.8	20.0
		P Corr	99.6	100	100	100	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.4
		P Cond	99.6	100	100	100	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.4


Corr, corrected score; Cond, conditional score; P, penalized.
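The two summaries reported in Table 4 can be computed from per-replication selection indicators along the following lines. This is a minimal sketch: the function name `selection_summary` and the toy indicator matrix are illustrative assumptions and are not taken from the simulation study.

```python
import numpy as np

def selection_summary(selected, truth):
    """Summarize variable selection over simulation replications.

    selected: (n_rep, n_coef) boolean array, True if a coefficient was selected.
    truth:    (n_coef,) boolean array, True if the coefficient is truly non-zero.
    Returns the percentage of replications selecting exactly the true model and
    the percentage of replications in which each coefficient was selected.
    """
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    correct_model = np.all(selected == truth, axis=1)
    return 100.0 * correct_model.mean(), 100.0 * selected.mean(axis=0)

# Toy indicators (made up for illustration, not the paper's results).
truth = np.array([True, True, False, False])
selected = np.array([
    [True, True, False, False],
    [True, True, True, False],
    [True, True, False, False],
    [True, True, False, True],
])
model_pct, coef_pct = selection_summary(selected, truth)
print(model_pct, coef_pct)
```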

5. Application

We applied the proposed approaches to the AIDS Clinical Trial (ACTG) 175 data. Access to the ACTG data is described at https://actgnetwork.org/clinical-trials/access-published-data. ACTG 175 is a randomized clinical trial that compared zidovudine alone, zidovudine plus didanosine, zidovudine plus zalcitabine, and didanosine alone in HIV-infected subjects on the basis of time to progression to AIDS or death (Hammer et al., 1996). Between December 1991 and October 1992, 2467 subjects were recruited and followed until November 1994. It is of interest to assess the effect of treatments on survival time adjusted for baseline covariates, including CD4 counts, antiretroviral history (naive or experienced), history of intravenous drug use (yes or no), Karnofsky score, homosexual activity (yes or no), age and gender. Our analysis included 2448 patients with observations on these variables. It is well known that CD4 measurements may be subject to substantial measurement error. In the ACTG 175 study, most subjects had replicated CD4 measurements before starting the treatments. The measurements taken between three weeks before randomization and one week after randomization were treated as replicates of the baseline CD4 measurement. The logarithmic transformation was applied to CD4 counts to achieve approximately constant variance. The primary analysis found zidovudine alone to be inferior to the other three therapies; thus, further investigations focused on two treatment groups, zidovudine alone and the combination of the other three.

To determine whether the coefficients are constant or time-varying, we used BIC to select among models in which each coefficient is either constant or a quadratic spline with 0, 1, or 2 interior knots. Based on the selected model, history of intravenous drug use has a time-varying effect and the effects of treatment and other covariates are constant. We obtained the naive, conditional score, corrected score and the corresponding penalized estimates. For all these approaches, the BIC is smallest in the case of no interior knot for the time-varying coefficient. The estimated constant coefficients are shown in Table 5. All penalized approaches selected the covariates log(CD4), treatment, antiretroviral history, age, and Karnofsky score, and the estimates are significant, as are the unpenalized estimates. The penalized conditional score and corrected score estimates have smaller estimated standard errors than the corresponding unpenalized estimates, which may imply a possible efficiency gain. Homosexual activity, gender and history of intravenous drug use are not selected by the penalized methods. Based on the unpenalized approaches, homosexual activity and gender are insignificant, while history of intravenous drug use might have some effect at the beginning of the study, with the effect decaying and eventually disappearing around week 50 (Figure 2). The conditional score and corrected score estimates of treatment effects are larger in magnitude than the naive estimates.

Table 5.

Estimates (Standard Errors) of the constant coefficients in the ACTG 175 study.

	Naive	Corr	Cond	P Naive	P Corr	P Cond
treatment	−0.405 (0.124)	−0.416 (0.135)	−0.416 (0.134)	−0.411 (0.130)	−0.423 (0.134)	−0.423 (0.132)
log(CD4)	−1.903 (0.191)	−2.217 (0.379)	−2.204 (0.223)	−1.909 (0.186)	−2.220 (0.336)	−2.207 (0.218)
antiretroviral experience	0.293 (0.128)	0.264 (0.140)	0.265 (0.133)	0.294 (0.130)	0.265 (0.137)	0.266 (0.132)
age	0.021 (0.006)	0.020 (0.008)	0.021 (0.007)	0.018 (0.006)	0.018 (0.006)	0.018 (0.006)
Karnofsky score	−0.036 (0.009)	−0.035 (0.008)	−0.035 (0.009)	−0.031 (0.008)	−0.029 (0.007)	−0.029 (0.008)
homosex	0.133 (0.165)	0.147 (0.175)	0.147 (0.175)	0 (–)	0 (–)	0 (–)
gender (male)	0.116 (0.209)	0.113 (0.218)	0.113 (0.217)	0 (–)	0 (–)	0 (–)


Corr, corrected score; Cond, conditional score; P, penalized.
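As a rough check on the significance statements, Wald z-statistics can be formed from the estimates and standard errors in Table 5. The sketch below uses the Cond and P Cond columns only and assumes a two-sided normal reference distribution at the 0.05 level; the inputs are the rounded table values, so the resulting p-values are approximate and are not results reported by the authors.

```python
import math

# (estimate, standard error) pairs copied from Table 5.
cond = {
    "treatment": (-0.416, 0.134),
    "log(CD4)": (-2.204, 0.223),
    "antiretroviral experience": (0.265, 0.133),
    "age": (0.021, 0.007),
    "Karnofsky score": (-0.035, 0.009),
    "homosex": (0.147, 0.175),
    "gender (male)": (0.113, 0.217),
}
p_cond = {
    "treatment": (-0.423, 0.132),
    "log(CD4)": (-2.207, 0.218),
    "antiretroviral experience": (0.266, 0.132),
    "age": (0.018, 0.006),
    "Karnofsky score": (-0.029, 0.008),
}

def wald(est, se):
    """Return the Wald z-statistic and its two-sided normal p-value."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

results_cond = {name: wald(est, se) for name, (est, se) in cond.items()}
results_p_cond = {name: wald(est, se) for name, (est, se) in p_cond.items()}

for name, (z, p) in results_p_cond.items():
    print(f"P Cond  {name:28s} z = {z:7.2f}  p = {p:.4f}")
for name in ("homosex", "gender (male)"):
    z, p = results_cond[name]
    print(f"Cond    {name:28s} z = {z:7.2f}  p = {p:.4f}")
```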

Figure 2. — Estimate of the coefficient of history of intravenous drug use and the 95% pointwise confidence interval.

6. Discussion

We have proposed penalized variable selection approaches for partially linear proportional hazards models with covariate measurement error. The proposed approaches can be extended to include intermittently measured time-dependent covariates via joint modeling of the survival and longitudinal processes. The computation time usually increases with the number of covariates. Like other approaches for handling measurement error, the proposed approaches may break down when the measurement error is too large for a given sample size.

In this article, we assume that the number of covariates is finite. In our numerical studies, the dimension K is relatively low, which corresponds to many practical situations. In some recent studies, the ultra-high dimensional case with K diverging with n has been considered. We suspect that, with a diverging number of covariates, the proposed penalized variable selection approaches are still applicable. However, the data assumptions and proofs of variable selection properties with a finite number of covariates usually differ significantly from those with a diverging number. Investigation of the proposed methodology with K → ∞ is highly nontrivial and will be pursued in future research.

To facilitate the development of the theory, we consider splines with quasi-uniform interior knots (Assumption (C7) in the Appendix). This assumption is the same as in Huang (1998) and Xue and Yang (2006). In our simulations and real data application, we used equally spaced knots and found that this method worked very well in all the examples. We also conducted simulation studies using knots at equally spaced sample quantiles, and the results were very similar to those based on equally spaced knots.

In practice, if the failure events are sparse, we recommend that knots be placed at the sample quantiles of the failure times. In the Cox model literature, Nan et al. (2005) and Sleeper and Harrington (1990) also suggested this scheme, which is believed to reduce the chance of singularities relative to equally spaced knots. There are also methods involving adaptive knot selection (Stone et al., 1997; Miyata and Shen, 2012), at the expense of a larger computational burden. Developing an efficient and automatic criterion for knot selection is challenging in our model setting and warrants future study.
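The two knot placement schemes discussed above can be sketched as follows. The helper names `equally_spaced_knots` and `failure_quantile_knots` and the simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def equally_spaced_knots(tau, n_interior):
    """Interior knots equally spaced on the open interval (0, tau)."""
    return np.linspace(0.0, tau, n_interior + 2)[1:-1]

def failure_quantile_knots(time, event, n_interior):
    """Interior knots at equally spaced sample quantiles of observed failure times."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    failures = time[event == 1]
    probs = np.arange(1, n_interior + 1) / (n_interior + 1)
    return np.quantile(failures, probs)

# Simulated right-censored data, for illustration only.
rng = np.random.default_rng(0)
t = rng.exponential(scale=1.0, size=200)
c = rng.uniform(0.0, 3.0, size=200)
time = np.minimum(t, c)
event = (t <= c).astype(int)

print(equally_spaced_knots(3.0, 2))
print(failure_quantile_knots(time, event, 2))
```

When failures cluster early in follow-up, the quantile-based knots move toward the region where events occur, so each spline basis function is supported by a comparable number of events.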

Table 2.

Estimation of the nonzero coefficients when n = 500

		β₃					β₁₃					α₁ (u)
		Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE	Bias	SD	SE	CP	RE
Corr = 0.00	Ideal	−0.028	0.078	0.073	0.930		−0.029	0.075	0.073	0.928		0.046	0.143	0.137	0.910
	Naive	0.180	0.069	0.065	0.228		0.077	0.074	0.071	0.776		0.185	0.131	0.124	0.559
	Corr	−0.088	0.123	0.125	0.954		−0.070	0.104	0.095	0.902		0.114	0.207	0.169	0.871
	Cond	−0.073	0.116	0.103	0.886		−0.059	0.100	0.090	0.900		0.098	0.197	0.173	0.892
	P Ideal	−0.007	0.074	0.070	0.942	1.10	−0.007	0.072	0.070	0.944	1.09	0.035	0.138	0.131	0.914	1.08
	P Naive	0.199	0.067	0.064	0.144	1.06	0.098	0.072	0.069	0.692	1.06	0.207	0.127	0.120	0.467	1.00
	P Corr	−0.032	0.111	0.099	0.928	1.22	−0.025	0.096	0.084	0.918	1.17	0.051	0.189	0.147	0.863	1.20
	P Cond	−0.020	0.107	0.098	0.934	1.19	−0.017	0.093	0.087	0.938	1.14	0.043	0.183	0.166	0.923	1.12
Corr = 0.25	Ideal	−0.034	0.084	0.081	0.934		−0.024	0.081	0.080	0.936		0.052	0.164	0.157	0.916
	Naive	0.188	0.077	0.072	0.256		0.064	0.083	0.078	0.826		0.201	0.146	0.139	0.565
	Corr	−0.121	0.151	0.153	0.958		−0.076	0.124	0.104	0.880		0.136	0.246	0.200	0.867
	Cond	−0.097	0.138	0.117	0.858		−0.060	0.116	0.098	0.878		0.111	0.231	0.204	0.893
	P Ideal	−0.012	0.076	0.073	0.942	1.22	−0.002	0.073	0.073	0.954	1.22	0.045	0.155	0.148	0.918	1.12
	P Naive	0.182	0.072	0.067	0.232	1.13	0.058	0.076	0.073	0.846	1.22	0.203	0.139	0.133	0.533	1.10
	P Corr	−0.049	0.122	0.106	0.903	1.52	−0.022	0.100	0.086	0.915	1.55	0.059	0.213	0.164	0.856	1.34
	P Cond	−0.034	0.117	0.103	0.905	1.41	−0.012	0.096	0.089	0.937	1.46	0.049	0.205	0.189	0.922	1.27
Corr = 0.50	Ideal	−0.043	0.096	0.095	0.924		−0.039	0.098	0.095	0.920		0.057	0.197	0.190	0.926
	Naive	0.223	0.084	0.082	0.238		0.034	0.101	0.093	0.906		0.240	0.168	0.164	0.536
	Corr	−0.160	0.196	0.235	0.984		−0.103	0.151	0.133	0.907		0.189	0.366	0.283	0.879
	Cond	−0.121	0.160	0.146	0.894		−0.079	0.132	0.116	0.886		0.145	0.301	0.264	0.912
	P Ideal	−0.018	0.082	0.083	0.944	1.38	−0.014	0.085	0.083	0.936	1.32	0.048	0.182	0.174	0.923	1.18
	P Naive	0.185	0.077	0.075	0.315	1.20	−0.012	0.089	0.084	0.942	1.29	0.210	0.158	0.153	0.567	1.13
	P Corr	−0.055	0.146	0.125	0.948	1.81	−0.033	0.117	0.101	0.912	1.84	0.072	0.279	0.193	0.855	1.71
	P Cond	−0.037	0.126	0.117	0.938	1.62	−0.022	0.105	0.099	0.934	1.57	0.057	0.254	0.232	0.935	1.41


Acknowledgements

This research is supported in part by National Science Foundation grants DMS-1106816 (Song, Wang) and DMS-1542332 (Wang), and National Institutes of Health grants CA201207 (Song), HL121347 (Song), and CA204120 (Ma).

Appendix

A.1. Regularity conditions

Let $C^{(r)}$ be the space of functions that have r continuous derivatives for some r ≥ 2 and assume $α_{0 k} \in C^{(r)}$ , $k = 1, \dots, K_{2}^{s}$ . Let $G_{k}$ be the space of spline functions of order p on [0, τ] with knot sequence $ξ_{k} = \{0 = ξ_{k 0} \leq ξ_{k 1} \leq \dots \leq ξ_{k n_{k}} \leq ξ_{k (n_{k} + 1)} = τ\}$ . Let $G_{n}^{K_{2}} = G_{1} \otimes G_{2} \otimes \dots \otimes G_{K_{2}}$ . Let S(u, η)[g] = E{S_ni(u, η)[g]}, where S_ni(u, η)[1] = Y_i(u)exp{η^⊤(u)H_i(u)}. Similarly, we denote $S^{c} (u, η) [g] = E \{S_{n i}^{c} (u, η) [g]\}$ and $S^{d} (u, η) [g] = E \{S_{n i}^{d} (u, η) [g]\}$ . For any matrix A, let ρ_max(A) and ρ_min(A) denote the maximum and minimum eigenvalues of A, and let A^⊗k denote 1, A and AA^⊤ for k = 0, 1, 2, respectively. Define

Γ (u, η) = \frac{S (u, η) [H^{\otimes 2}] S (u, η) [1]}{S^{2} (u, η) [1]} - \frac{S^{\otimes 2} (u, η) [H]}{S^{2} (u, η) [1]} .

Let $N (η_{0}) = \{η : {‖ η - η_{0} ‖}_{\infty} \leq c_{η_{0}}\}$ be a neighborhood of η₀. We assume the following regularity conditions.

(C1)
Pr(V ≥ τ) > 0.
(C2)
Pr(Δ = 1) > 0.
(C3)
There exist $0 < c_{1}^{f} \leq c_{2}^{f} < \infty$ such that the density f_V|Δ=1(x) of V satisfies that $c_{1}^{f} \leq f_{V | Δ = 1} (x) \leq c_{2}^{f}$ .
(C4)
$ρ_{\max} [E (Σ_{H_{i}})] \leq c_{Σ} < \infty .$
(C5)
There exist $0 < c_{1}^{Γ} \leq c_{2}^{Γ} < \infty$ such that $c_{1}^{Γ} \leq \inf_{u \in [0, τ]} ρ_{\min} \{Γ (u, η)\} \leq \sup_{u \in [0, τ]} ρ_{\max} \{Γ (u, η)\} \leq c_{2}^{Γ}$ uniformly for $η \in N (η_{0})$ .
(C6)
ρ_max(H^⊗2) < ∞.
(C7)
The knot sequence $ξ_{k} = \{0 = ξ_{k 0} \leq ξ_{k 1} \leq \dots \leq ξ_{k n_{k}} \leq ξ_{k (n_{k} + 1)} = τ\}$ is quasi-uniform. The number n_k of interior knots satisfies $n^{1 / (2 r)} \ll n_{k} \ll n^{1 / 2 - δ}$ for some 0 < δ < (r − 1)/(2r), where a_n ≪ b_n denotes that $\lim_{n \to \infty} a_{n} b_{n}^{- 1} = 0$ .
(C8)
E {H^⊗2}² < ∞, E [e^⊤ e]² < ∞, and $\sup_{η \in N (η_{0}), u \in [0, τ]} \max [E {\exp (4 η^{⊤} (u) H)}, E {\exp (4 η^{⊤} (u) e)}] < \infty$ .
(C9)
$\int_{0}^{τ} λ_{0}^{2} (u) d u < \infty$ .

Conditions (C1) and (C2) are standard assumptions for proportional hazards models. Conditions (C8) and (C9) control the magnitude of the covariates, measurement error and baseline hazard, and are commonly used for joint models. Condition (C7) specifies the knot density for the spline approximation relative to the sample size. Condition (C3) ensures the equivalence of the norms ∥·∥ and ∥·∥₂. Conditions (C4)–(C6) control the variation of the estimating functions around $η \in N (η_{0})$ for u ∈ [0, τ]. Assumptions similar to (C3)–(C7) are usually adopted in the asymptotic theory of polynomial spline approximations (Xue et al., 2010).
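As a brief check on Condition (C7), the admissible range of δ is exactly what is needed for the window of knot rates to be nonempty. The following short derivation uses only the quantities appearing in (C7):

```latex
n^{1/(2r)} \ll n^{1/2 - \delta}
\iff \frac{1}{2r} < \frac{1}{2} - \delta
\iff \delta < \frac{1}{2} - \frac{1}{2r} = \frac{r - 1}{2r} .
```

For instance, with r = 2 the condition requires 0 < δ < 1/4, so n_k must grow faster than n^{1/4} but slower than n^{1/2 − δ}.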

We also need some assumptions about $β_{0}^{s}$ and $α_{0}^{s}$ .

(C10)
The number of nonzero components in the nonparametric part $K_{2}^{s}$ is fixed, and there is a constant c_α > 0 such that $\min_{1 \leq k \leq K_{2}^{s}} ‖ α_{0 k} ‖ > c_{α}$ . The nonzero coefficients in the linear part satisfy that $\min_{1 \leq k \leq K_{1}^{s}} ‖ β_{0 k} ‖ / ν_{1} \to \infty$ .

Let $Q = \{k : σ_{k}^{2} > 0, k = 1, \dots, K\}$ . Let $ω = \{σ_{k}^{2} : k \in Q\}$ denote the vector of parameters for error variances. To be able to estimate ω, we make the following assumption:

(C11)
P(m_ik > 1) > 0 for k ∈ Q.

A.2. Proof of Theorem 1

For simplicity of notation, we assume that ν₁ = ν₂ = ν. Note that an estimator ${\hat{η}}^{c} (u)$ that maximizes (7) is a solution to (8). By Lemma B.5 in Song and Wang (2017), there exists ${\tilde{α}}_{k} (u) \in G_{n}$ satisfying $\sup_{u \in [0, τ]} {‖ {\tilde{α}}_{k} - α_{0 k} ‖}_{\infty} = O (h^{r})$ for $k = 1, \dots, K_{2}^{s}$ . Let ${\tilde{α}}_{k} = 0$ for $k = K_{2}^{s} + 1, \dots, K_{2}$ and $\tilde{α} (u) = {({\tilde{α}}_{1}, \dots, {\tilde{α}}_{K_{2}})}^{⊤} = {\tilde{γ}}^{⊤} B (u)$ . Define an intermediate estimator $\tilde{β}$ of β that maximizes

L_{n}^{c P} (β, \tilde{γ}) = L_{n}^{c} (β, \tilde{γ}) - \sum_{k = 1}^{K_{1}} p_{ν_{1}} (| β_{k} |),

where

L_{n}^{c} (β, \tilde{γ}) = n^{- 1} \sum_{i = 1}^{n} Δ_{i} \{β^{⊤} X_{i} + {\tilde{γ}}^{⊤} (u) Z_{i} + \frac{1}{2} (β^{⊤}, {\tilde{γ}}^{⊤}) Σ_{R_{i}} (u) {(β^{⊤}, {\tilde{γ}}^{⊤})}^{⊤} - \log S_{n} (T_{i}; {(β^{⊤}, {\tilde{γ}}^{⊤})}^{⊤})\} .

First we show the consistency. Let $\tilde{θ} = ({\tilde{β}}^{⊤}, {\tilde{γ}}^{⊤})^{⊤}$ , ${\tilde{θ}}_{0} = (β_{0}^{⊤}, {\tilde{γ}}^{⊤})^{⊤}$ , $η (u) = {(β^{⊤}, γ^{⊤} B^{⊤} (u))}^{⊤}$ , $\tilde{η} (u) = ({\tilde{β}}^{⊤}, {\tilde{γ}}^{⊤} B^{⊤} (u))^{⊤}$ , $η_{0} (u) = {(β_{0}^{⊤}, α_{0}^{⊤} (u))}^{⊤}$ , ${\tilde{η}}_{0} (u) = {(β_{0}^{⊤}, {\tilde{γ}}^{⊤} B^{⊤} (u))}^{⊤}$ . Next define

Θ (C) = \{θ = {(β^{⊤}, γ^{⊤})}^{⊤} : ‖ B_{*} (θ - {\tilde{θ}}_{0}) ‖ \leq C {(n h)}^{- 1 / 2}\},

\partial Θ (C) = \{θ = {(β^{⊤}, γ^{⊤})}^{⊤} : ‖ B_{*} (θ - {\tilde{θ}}_{0}) ‖ = C {(n h)}^{- 1 / 2}\} .

Lemma A.1

Let $η (u) = {(β^{⊤}, γ^{⊤} B^{⊤} (u))}^{⊤}$ . If θ ∈ Θ(C) and ν → 0, then

\Pr [\cup_{θ \in Θ (C), ‖ η_{0 k} ‖ \neq 0} {p_{ν} ({‖ η_{k} ‖}_{2}) \neq (a + 1) ν^{2} / 2}] = o (1) .

Proof. By the triangle inequality,

‖ η_{k} ‖ \geq ‖ {\tilde{η}}_{0 k} ‖ - ‖ {[B_{*} (θ - {\tilde{θ}}_{0})]}_{k} ‖, ‖ {\tilde{η}}_{0 k} ‖ \geq ‖ η_{0 k} ‖ - ‖ η_{0 k} - {\tilde{η}}_{0 k} ‖,

where ${[B_{*} (θ - {\tilde{θ}}_{0})]}_{k}$ is the kth element of $[B_{*} (θ - {\tilde{θ}}_{0})]$ . As θ ∈ Θ(C) we have

‖ {[B_{*} (θ - {\tilde{θ}}_{0})]}_{k} ‖ \leq ‖ B_{*} (θ - {\tilde{θ}}_{0}) ‖ \leq C {(n h)}^{- 1 / 2} .

By Lemma B.5 in Song and Wang (2017),

‖ η_{0 k} - {\tilde{η}}_{0 k} ‖ \leq C_{k} h^{r}

for some constant C_k. Thus,

‖ η_{k} ‖ \geq ‖ η_{0 k} ‖ - C {(n h)}^{- 1 / 2} - C_{k} h^{r} = ‖ η_{0 k} ‖ - o (1) .

Since ∥η_0k∥ > 0 for $k = K_{1} + 1, \dots, K_{1} + K_{2}^{s}$ and ν → 0, ∥η_k∥ ≥ aν when n is large enough. By Lemmas B.3 and B.4 in Song and Wang (2017), ∥η_k∥₂ ≥ aν when n is large enough. The result follows.
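The plateau value (a + 1)ν²/2 in Lemma A.1 is the constant that the SCAD penalty of Fan and Li (2001) takes for arguments of at least aν. The sketch below assumes the penalty p_ν is SCAD with the commonly used default a = 3.7; the function names are illustrative. It shows the two properties used in the proofs: the penalty is flat beyond aν, and its derivative equals ν near zero.

```python
import numpy as np

def scad_penalty(t, nu, a=3.7):
    """SCAD penalty evaluated at |t| (assumed form of the penalty p_nu)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = nu * t
    quad = (2.0 * a * nu * t - t**2 - nu**2) / (2.0 * (a - 1.0))
    flat = (a + 1.0) * nu**2 / 2.0
    return np.where(t <= nu, linear, np.where(t <= a * nu, quad, flat))

def scad_derivative(t, nu, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    mid = np.maximum(a * nu - t, 0.0) / (a - 1.0)
    return np.where(t <= nu, nu, mid)

nu = 0.1
grid = np.array([0.0, 0.05, 0.1, 0.2, 0.37, 1.0])
print(scad_penalty(grid, nu))     # constant once the argument reaches a * nu
print(scad_derivative(grid, nu))  # equals nu on [0, nu], zero beyond a * nu
```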

Lemma A.2

Under Conditions (C1)–(C10), one has $‖ {\hat{θ}}^{c} - {\tilde{θ}}_{0} ‖ = O {{(n h)}^{- 1 / 2}}$ .

Proof. We only need to show that for any ε > 0, there exists a positive constant C such that, as n → ∞,

\Pr {\sup_{θ \in \partial Θ (C)} L_{n}^{c P} (θ) < L_{n}^{c P} ({\tilde{θ}}_{0})} > 1 - ε,

(A.1)

where $L_{n}^{c} (\cdot)$ is the corrected log partial likelihood function given in (6). Note that

L_{n}^{c P} (θ) - L_{n}^{c P} ({\tilde{θ}}_{0}) = L_{n}^{c} (θ) - L_{n}^{c} ({\tilde{θ}}_{0}) - \{\sum_{k = 1}^{K} p_{ν} ({‖ η_{k} ‖}_{2}) - \sum_{k = 1}^{K} p_{ν} ({‖ {\tilde{η}}_{0 k} ‖}_{2})\} .

Since p_ν(θ) ≥ p_ν(0) = 0, we have

L_{n}^{c P} (θ) - L_{n}^{c P} ({\tilde{θ}}_{0}) \leq L_{n}^{c} (θ) - L_{n}^{c} ({\tilde{θ}}_{0}) - \sum_{‖ η_{k} ‖ \neq 0} \{p_{ν} ({‖ η_{k} ‖}_{2}) - p_{ν} ({‖ {\tilde{η}}_{0 k} ‖}_{2})\} .

By a Taylor expansion, we have

L_{n}^{c} (θ) - L_{n}^{c} ({\tilde{θ}}_{0}) = {(θ - {\tilde{θ}}_{0})}^{⊤} U_{n}^{c} ({\tilde{θ}}_{0}) + {(θ - {\tilde{θ}}_{0})}^{⊤} \frac{\partial U_{n}^{c} (θ^{*})}{\partial θ^{⊤}} (θ - {\tilde{θ}}_{0}),

(A.2)

where

U_{n}^{c} ({\tilde{θ}}_{0}) = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} \{{\hat{R}}_{i} (u) + Σ_{R_{i}} (u) {\tilde{θ}}_{0} - \frac{S_{n}^{c} (u, {\tilde{η}}_{0}) [\hat{R}]}{S_{n}^{c} (u, {\tilde{η}}_{0}) [1]}\} d N_{i} (u),

\frac{\partial U_{n}^{c} (θ^{*})}{\partial θ^{⊤}} = n^{- 1} \sum_{i = 1}^{n} \int_{0}^{τ} \{Σ_{R_{i}} (u) - Γ_{#}^{c} (u, η^{*})\} d N_{i} (u),

Γ_{#}^{c} (u, η) = \frac{S_{n}^{c} (u, η) [{\hat{R}}^{\otimes 2}] S_{n}^{c} (u, η) [1]}{S_{n}^{c 2} (u, η) [1]} - \frac{S_{n}^{c \otimes 2} (u, η) [\hat{R}]}{S_{n}^{c 2} (u, η) [1]},

and η^* lies between ${\tilde{η}}_{0}$ and η = B_*(u)θ.

From (A.5) and (A.10) in Song and Wang (2017), for θ ∈ ∂Θ(C), we have

{(θ - {\tilde{θ}}_{0})}^{⊤} U_{n}^{c} ({\tilde{θ}}_{0}) = C \times O_{p} \{(h^{r} + {(n h)}^{- 1 / 2}) {(n h)}^{- 1 / 2}\}

and

- C^{2} {(n h)}^{- 1} \{c_{2}^{Γ} + o_{p} (1)\} \leq {(θ - {\tilde{θ}}_{0})}^{⊤} \frac{\partial U_{n}^{c} (θ^{*})}{\partial θ^{⊤}} (θ - {\tilde{θ}}_{0}) \leq - C^{2} {(n h)}^{- 1} \{c_{1}^{Γ} + o_{p} (1)\} .

By Lemma A.1, $p_{ν} ({‖ η_{k} ‖}_{2}) = p_{ν} ({‖ {\tilde{η}}_{0 k} ‖}_{2})$ for $‖ η_{k} ‖ \neq 0$ almost surely. The result holds when C is large enough. □

Lemma A.3

Under Conditions (C1)–(C10), we have $‖ \tilde{β} - β_{0} ‖ = O {{(n h)}^{- 1 / 2}}$ .

We only need to show that for any ε > 0, there exists a C such that

\Pr {\sup_{‖ β - β_{0} ‖ = C {(n h)}^{- 1 / 2}} L_{n}^{c P} (β, \tilde{γ}) < L_{n}^{c P} (β_{0}, \tilde{γ})} > 1 - ε .

The arguments are similar to those for (A.1) and are thus omitted.

Theorem 3.1 follows from Lemmas A.2 and A.3.

A.3. Proof of Theorem 2

We first cite two lemmas, which correspond to Lemmas B.3 and B.4 in the Supplementary Material of Song and Wang (2017).

Lemma A.4

As n → ∞,

A_{n} = \sup_{g \in G^{K}} | \frac{‖ g ‖_{n}^{2}}{‖ g ‖^{2}} - 1 | = O_{p} (\sqrt{\frac{\log (n)}{n h}}) .

Lemma A.5

For any function $g \in G^{K}$ , under Conditions (C2) and (C3), there exist constants 0 < c₁ ≤ c₂ < ∞ such that c₁ ∥g∥₂ ≤ ∥g∥_n ≤ c₂ ∥g∥₂.

Note that $‖ {\hat{η}}^{c} - η_{0} ‖ \leq ‖ {\hat{η}}^{c} - \tilde{η} ‖ + ‖ \tilde{η} - η_{0} ‖$ , and

{‖ \tilde{η} - η_{0} ‖}^{2} = {‖ \tilde{β} - β_{0} ‖}^{2} + {‖ B \tilde{γ} - α_{0} ‖}^{2} = {‖ \tilde{β} - β_{0} ‖}^{2} + \sum_{k = 1}^{K} {‖ {\tilde{α}}_{k} - α_{0 k} ‖}^{2} .

Then $‖ {\tilde{η}}^{c} - η_{0} ‖ = O {{(n h)}^{- 1 / 2}}$ by Lemmas A.1, A.2 and A.3. This, together with Lemmas A.4 and A.5, implies ${‖ {\tilde{η}}^{c} - η_{0} ‖}_{2} = O_{p} {{(n h)}^{- 1 / 2}}$ and ${‖ {\hat{η}}^{c} - η_{0} ‖}_{2} = O_{p} {{(n h)}^{- 1 / 2}}$ .

Let θ⁰ = (β^s⊤, 0^⊤, γ^s⊤, 0^⊤)^⊤, η⁰ = B_*θ⁰, and define

Θ (A) = \{θ = {(β^{s ⊤}, 0^{⊤}, γ^{s ⊤}, 0^{⊤})}^{⊤} : ‖ B_{*} (θ - θ^{0}) ‖ \leq C {(n h)}^{- 1 / 2}\} .

It suffices to show that

L_{n}^{c P} (θ^{0}) = \max_{θ \in Θ (A)} L_{n}^{c P} (θ) .

Suppose θ ∈ Θ(A). Since p_ν(0) = 0, we have

L_{n}^{c P} (θ^{⊤}) - L_{n}^{c P} (θ^{0}) = L_{n}^{c} (θ^{⊤}) - L_{n}^{c} (θ^{0}) - \sum_{‖ η_{k}^{0} ‖ = 0} p_{ν} ({‖ η_{k} ‖}_{2})

= β^{z ⊤} U_{n, β^{z}}^{c} (θ_{0}^{*}) + γ^{z ⊤} U_{n, γ^{z}}^{c} (θ_{0}^{*}) + β^{z ⊤} \frac{\partial U_{n, γ^{z}}^{c} (θ_{0}^{*})}{\partial β_{z}^{⊤}} β^{z} + γ^{z ⊤} \frac{\partial U_{n, β^{z}}^{c} (θ_{0}^{*})}{\partial γ_{z}^{⊤}} γ^{z} - \{\sum_{‖ η_{k}^{0} ‖ = 0} p_{ν}^{'} ({‖ η_{k}^{*} ‖}_{2}) f_{k} (θ^{*}) η_{k}\},

where $U_{n, β^{z}}^{c} (θ)$ is the subvector of U^c(θ) composed of the $(K_{1}^{z} + 1, \dots, K_{1})$ elements, and $U_{n, γ^{z}}^{c} (θ)$ is the subvector of U^c(θ) composed of the $(K_{1} + \sum_{k = 1}^{K_{2}^{s}} L_{k}, \dots, K_{1} + \sum_{k = 1}^{K_{2}} L_{k})$ elements. With similar arguments as those for (A.5) and (A.10) in Song and Wang (2017), we have

β^{z ⊤} U_{n, β^{z}}^{c} ({\tilde{θ}}_{0}) = C \times O_{p} {(h^{r} + {(n h)}^{- 1 / 2})} ‖ β^{z} ‖, γ^{z^{⊤}} U_{n, γ^{z}}^{c} (θ_{0}^{*}) = C \times O_{p} {(h^{r} + {(n h)}^{- 1 / 2})} ‖ B γ^{z} ‖,

- C^{2} \{c_{2}^{Γ} + o_{p} (1)\} {‖ β^{z} ‖}^{2} \leq β^{z ⊤} \frac{\partial U_{n, γ^{z}}^{c} (θ_{0}^{*})}{\partial β_{z}^{⊤}} β^{z} \leq - C^{2} \{c_{1}^{Γ} + o_{p} (1)\} {‖ β^{z} ‖}^{2},

- C^{2} \{c_{2}^{Γ} + o_{p} (1)\} {‖ B γ^{z} ‖}^{2} \leq γ^{z ⊤} \frac{\partial U_{n, β^{z}}^{c} (θ_{0}^{*})}{\partial γ_{z}^{⊤}} γ^{z} \leq - C^{2} \{c_{1}^{Γ} + o_{p} (1)\} {‖ B γ^{z} ‖}^{2} .

Note that

\sum_{‖ η_{k}^{0} ‖ = 0} p_{ν}^{'} ({‖ η_{k}^{*} ‖}_{2}) f_{k} (θ^{*}) η_{k} = \sum_{j} \frac{β_{j}^{z} β_{j}^{z} p_{ν}^{'} (| β_{j}^{z} |)}{{| β_{j}^{z} |}_{2}} + \sum_{j} \frac{p_{ν}^{'} ({‖ γ_{j}^{z} ‖}_{2})}{{‖ γ_{j}^{z} ‖}_{2}} γ_{j}^{z ⊤} 〈 B_{j}, B_{j} 〉 γ_{j}^{z} .

Since $γ_{j}^{z ⊤} 〈 B_{j}, B_{j} 〉 γ_{j}^{z} ≃ {‖ B_{j} γ^{z} ‖}^{2}$ , $| β_{j}^{z} | \leq C {(n h)}^{- 1 / 2}$ , ${‖ B_{j} γ_{j}^{z} ‖}_{2} \leq C {(n h)}^{- 1 / 2}$ and $ν {(n h)}^{1 / 2} \to \infty$ , we have $p_{ν}^{'} (| β_{j}^{z} |) = ν$ and $p_{ν}^{'} ({‖ γ_{j}^{z} ‖}_{2}) = ν$ when n is large enough. Therefore,

\frac{β_{j}^{z} β_{j}^{z} p_{ν}^{'} (| β_{j}^{z} |)}{{| β_{j}^{z} |}_{2}} \leq ν ‖ β^{z} ‖, \frac{p_{ν}^{'} ({‖ γ_{j}^{z} ‖}_{2})}{{‖ γ_{j}^{z} ‖}_{2}} γ_{j}^{z ⊤} 〈 B_{j}, B_{j} 〉 γ_{j}^{z} \leq C ν {‖ B_{j} γ^{z} ‖}^{2} .

As ν → 0, it follows that

L_{n}^{c P} (θ^{⊤}) - L_{n}^{c P} (θ^{0}) \leq 0.

This completes the proof.

A.4. Proof of Theorem 3

From Theorems 3.1 and 3.2, we have shown that there exists $\hat{θ} = {({\hat{β}}^{s c ⊤}, 0^{⊤}, {\hat{γ}}^{s c ⊤}, 0^{⊤})}^{⊤}$ maximizing $L_{n}^{c P} (θ)$ . It follows that $\hat{θ}$ also maximizes $L_{n}^{c P} ({(β^{s ⊤}, 0^{⊤}, γ^{s ⊤}, 0^{⊤})}^{⊤})$ . Let ${\dot{U}}_{s}^{c} (θ) = \partial U_{n}^{c} (θ) / \partial {(β^{s ⊤}, γ^{s ⊤})}^{⊤}$ . By Lemma A.1 and p_ν(0) = 0, for n large enough,

U_{s, n}^{c P} (\hat{θ}) = U_{s, n}^{c} (\hat{θ}), a . s .

Let ${\hat{θ}}^{s} = {({\hat{β}}^{s c ⊤}, {\hat{γ}}^{s c ⊤})}^{⊤}$ , and let ${\tilde{θ}}_{0}^{s}$ contain the corresponding elements of ${\tilde{θ}}_{0}$ . With arguments similar to those for Lemma B.12 in Song and Wang (2017), for any $a \in R^{K_{1}^{s} + L_{s}}$ , we have

a^{⊤} n^{1 / 2} ({\hat{θ}}^{s} - {\tilde{θ}}_{0}^{s}) = a^{⊤} \{- {\dot{U}}_{s}^{c} (θ_{0})\}^{- 1} n^{- 1 / 2} \sum_{i = 1}^{n} U_{n i}^{c} (η_{0}) + o_{p} (1) ‖ a ‖_{2},

(A.3)

where

U_{n i}^{c} (η_{0}) = \int_{0}^{τ} B_{*}^{⊤} (u) \{{\hat{H}}_{i} + Σ_{H_{i}} η_{0} (u) - \frac{S^{c} (u, η_{0}) [\hat{H}]}{S^{c} (u, η_{0}) [1]}\} d N_{i} (u) - \int_{0}^{τ} B_{*}^{⊤} (u) \{{\hat{H}}_{i} - \frac{S^{c} (u, η_{0}) [\hat{H}]}{S^{c} (u, η_{0}) [1]}\} \frac{S_{n i}^{c} (u, η_{0}) [1]}{S^{c} (u, η_{0}) [1]} d E N (u) .

By similar arguments as those for Theorem 2 in Song and Wang (2017), it can be shown that ${{\dot{U}}_{s}^{c} (θ_{0})}^{- 1}$ has bounded eigenvalues and

c_{2}^{- 1} a^{⊤} a \leq a^{⊤} {{\dot{U}}_{s}^{c} (θ_{0})}^{- 1} a \leq c_{1}^{- 1} a^{⊤} a .

(A.4)

From (A.6), the first K₁ rows of ${{\dot{U}}_{s}^{c} (θ_{0})}^{- 1}$ equal [J₁, J₂] with

J_{1} = {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1}, J_{2} = - {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1} {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} .
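The expressions for J₁ and J₂ are the standard partitioned-inverse (Schur complement) formulas. The following numerical sketch checks them on an arbitrary symmetric positive definite matrix standing in for the partitioned derivative matrix; the block sizes `k` and `m` and the random matrix are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 2, 3  # sizes of the beta block and the gamma block (arbitrary)

# Symmetric positive definite stand-in for the partitioned matrix.
M = rng.normal(size=(k + m, k + m))
U = M @ M.T + (k + m) * np.eye(k + m)

U_bb, U_bg = U[:k, :k], U[:k, k:]
U_gb, U_gg = U[k:, :k], U[k:, k:]

U_gg_inv = np.linalg.inv(U_gg)
J1 = np.linalg.inv(U_bb - U_bg @ U_gg_inv @ U_gb)  # inverse Schur complement
J2 = -J1 @ U_bg @ U_gg_inv

# The first k rows of the full inverse should equal [J1, J2].
first_rows = np.linalg.inv(U)[:k, :]
print(np.allclose(first_rows, np.hstack([J1, J2])))
```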

Thus, for any $a_{1} \in R^{K_{1}^{s}}$ , replacing a by ${(a_{1}^{⊤}, 0_{K_{1} - K_{1}^{s} + L}^{⊤})}^{⊤}$ in (A.3) and letting

{\dot{U}}_{s}^{c} (θ_{0}) = E \{\int_{0}^{τ} B_{*}^{⊤} (u) Γ_{s} (u, η) B_{*} (u) d N_{i} (u)\} = \begin{pmatrix} {\dot{U}}_{s β^{s} β^{s}}^{c} & {\dot{U}}_{s β^{s} γ^{s}}^{c} \\ {\dot{U}}_{s γ^{s} β^{s}}^{c} & {\dot{U}}_{s γ^{s} γ^{s}}^{c} \end{pmatrix}

with ${\dot{U}}_{s β^{s} β^{s}}^{c}$ being a $K_{1}^{s} \times K_{1}^{s}$ matrix and ${\dot{U}}_{s γ^{s} γ^{s}}^{c}$ an L_s × L_s matrix, we have

a_{1}^{⊤} \sqrt{n} ({\hat{β}}^{s c} - β_{0}) = a_{1}^{⊤} {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1} (I, - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1}) \times \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} U_{n i}^{c} (η_{0}) + {‖ a_{1} ‖}_{2} o_{p} (1) .

It can be easily seen that $E (U_{n i}^{c}) = 0$ . We only need to show that

\mathrm{var} \{a_{1}^{⊤} {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1} (I, - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1}) U_{n i}^{c} (η_{0})\} \leq c_{3} a_{1}^{⊤} a_{1}

for some constant c₃. This follows with similar arguments as those for Lemma A.5 in Song and Wang (2017). Therefore,

\sqrt{n} Σ_{β^{s}}^{c - 1 / 2} ({\hat{β}}^{s c} - β_{0}) \overset{d}{\to} N (0, I),

where

Σ_{β^{s}}^{c} = A^{c - 1} D^{c} {(A^{c - 1})}^{⊤},

(A.5)

with

A^{c} = {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1},

D^{c} = (I, - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1}) E \{U_{n i}^{c 2} (η_{0})\} {(I, - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1})}^{⊤} .

(A.6)

Finally, with similar arguments as those for Lemma B.14 in Song and Wang (2017), we can show that $Σ_{β^{s}}^{c} - Σ_{β^{s}}$ is positive definite.

A.5. Proof of Theorem 4

When ω is unknown, we will rewrite $U_{s}^{c} (θ)$ as $U_{s}^{c} (θ, ω)$ , and modify the notation for the other functions similarly. A method of moments estimator of $σ_{k}^{2} (k \in Q)$ is

{\hat{σ}}_{k}^{2} = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i k}} I (m_{i k} > 1) {(W_{i k j} - {\bar{W}}_{i k})}^{2}}{\sum_{i = 1}^{n} I (m_{i k} > 1) (m_{i k} - 1)} .

By the strong law of large numbers, under Condition (C11), it can be easily shown that ${\hat{σ}}_{k}^{2}$ converges almost surely to $σ_{k}^{2}$ . This, together with a Taylor expansion, implies that

n^{1 / 2} ({\hat{σ}}_{k}^{2} - σ_{k}^{2}) = n^{- 1 / 2} {[E \{I (m_{i k} > 1) (m_{i k} - 1)\}]}^{- 1} \times \sum_{i = 1}^{n} I (m_{i k} > 1) \{\sum_{j = 1}^{m_{i k}} {(W_{i k j} - {\bar{W}}_{i k})}^{2} - (m_{i k} - 1) σ_{k}^{2}\} + o_{p} (1) .
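The method of moments estimator above pools within-subject sums of squares over the subjects that have more than one replicate. A minimal sketch for a single covariate k follows; the function name `mom_error_variance` and the toy replicates are illustrative assumptions.

```python
import numpy as np

def mom_error_variance(replicates):
    """Pooled within-subject variance estimate of the measurement error.

    replicates: list of 1-D sequences; entry i holds the replicate measurements
    W_{ikj}, j = 1, ..., m_ik, of one error-prone covariate for subject i.
    Subjects with a single measurement do not contribute.
    """
    num = 0.0
    den = 0
    for w in replicates:
        w = np.asarray(w, dtype=float)
        m = w.size
        if m > 1:
            num += np.sum((w - w.mean()) ** 2)
            den += m - 1
    if den == 0:
        raise ValueError("no subject has more than one replicate")
    return num / den

# Toy replicates: subject 2 has a single measurement and is skipped.
sigma2_hat = mom_error_variance([[1.0, 3.0], [2.0], [0.0, 0.0, 3.0]])
print(sigma2_hat)
```

The `ValueError` branch corresponds to a sample in which no subject has replicates, which is the situation Condition (C11) rules out in probability.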

With these facts, it can be shown that the results of Theorems 3.1 and 3.2 still hold. Then, using arguments similar to those for Lemma A.5 in Song and Wang (2017), we have

a^{⊤} n^{1 / 2} (\hat{θ} - {\tilde{θ}}_{0}) = a^{⊤} {- {\dot{U}}_{s}^{c} (θ_{0}, ω)}^{- 1} n^{- 1 / 2} \sum_{i = 1}^{n} Ψ_{i} (θ_{0}, ω) + o_{p} (1) ‖ a ‖_{2},

where

Ψ_{i} (θ_{0}, ω) = \sum_{k \in Q} E \{\frac{\partial U^{c} (θ_{0}, ω)}{\partial σ_{k}^{2}}\} {[E \{I (m_{i k} > 1) (m_{i k} - 1)\}]}^{- 1} I (m_{i k} > 1) \{\sum_{j = 1}^{m_{i k}} {(W_{i k j} - {\bar{W}}_{i k})}^{2} - (m_{i k} - 1) σ_{k}^{2}\}

= \sum_{k \in Q} E \{\frac{\partial U^{c} (θ_{0}, ω)}{\partial σ_{k}^{2}}\} {[E \{I (m_{i k} > 1) (m_{i k} - 1)\}]}^{- 1} I (m_{i k} > 1) (m_{i k} - 1) \{S_{e_{i k}}^{2} - σ_{k}^{2}\},

and $S_{e_{i k}}^{2}$ is the sample variance of e_ikj, j = 1,…,m_ik. Then, with arguments similar to those for Theorem 3, it can be shown that

a_{1}^{⊤} \sqrt{n} ({\hat{β}}^{s c} - β_{0}) = a_{1}^{⊤} {({\dot{U}}_{s β^{s} β^{s}}^{c} - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1} {\dot{U}}_{s γ^{s} β^{s}}^{c})}^{- 1} (I, - {\dot{U}}_{s β^{s} γ^{s}}^{c} {\dot{U}}_{s γ^{s} γ^{s}}^{c - 1}) \times \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \{U_{n i}^{c} (η_{0}, ω) + Ψ_{i} (θ_{0}, ω)\} + {‖ a_{1} ‖}_{2} o_{p} (1) .

Both $U_{n i}^{c} (η_{0}, ω)$ and Ψ_i(θ₀, ω) have mean 0. By the law of iterated expectation, we have

E \{U_{n i}^{c} (η_{0}, ω) Ψ_{i} (θ_{0}, ω)\} = E [E \{Ψ_{i} (θ_{0}, ω) U_{n i}^{c} (η_{0}, ω) | H_{i}, e_{i}\}] = E [Ψ_{i} (θ_{0}, ω) E \{U_{n i}^{c} (η_{0}, ω) | H_{i}, e_{i}\}] .

Note that $E \{U_{n i}^{c} (η_{0}, ω) | H_{i}, e_{i}\}$ is a function of $(H_{i}, {\bar{e}}_{i})$ , which is independent of $S_{e_{i k}}^{2}$ . Therefore $E \{U_{n i}^{c} (η_{0}, ω) Ψ_{i} (θ_{0}, ω)\} = 0$ . It follows that

D^{c *} = \mathrm{var} \{U_{n i}^{c} (η_{0}, ω) + Ψ_{i} (θ_{0}, ω)\} = \mathrm{var} \{U_{n i}^{c} (η_{0}, ω)\} + \mathrm{var} \{Ψ_{i} (θ_{0}, ω)\} .

(A.7)

Hence

\sqrt{n} Σ_{β^{s}}^{c * - 1 / 2} ({\hat{β}}^{s c} - β_{0}) \overset{d}{\to} N (0, I),

where

Σ_{β^{s}}^{c *} = A^{c - 1} D^{c *} {(A^{c - 1})}^{⊤} .

(A.8)

It can be easily seen that $Σ_{β^{s}}^{c *} - Σ_{β^{s}}^{c}$ is positive definite from (A.7).

References

Cai J, Fan J, Jiang J, and Zhou H (2008), ‘Partially linear hazard regression with varying coefficients for multivariate survival data’, Journal of the Royal Statistical Society, Series B, 70, 141–158.
Dafni UG and Tsiatis AA (1998), ‘Evaluating surrogate markers of clinical outcome measured with error’, Biometrics, 54, 1445–1462.
Donoho DL and Johnstone IM (1998), ‘Minimax estimation via wavelet shrinkage’, The Annals of Statistics, 26, 879–921.
Fan J and Li R (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association, 96, 1348–1360.
Fan J and Li R (2002), ‘Variable selection for Cox’s proportional hazards model and frailty model’, The Annals of Statistics, 30, 74–99.
Faucett CJ and Thomas DC (1996), ‘Simultaneously modeling censored survival data and repeatedly measured covariates: a Gibbs sampling approach’, Statistics in Medicine, 15, 1663–1685.
Gui J and Li H (2005), ‘Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data’, Bioinformatics, 21, 3001–3008.
Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, Hirsch MS, and Merigan TC (1996), ‘A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter’, New England Journal of Medicine, 335, 1081–1089.
Henderson R, Diggle P, and Dobson A (2000), ‘Joint modeling of longitudinal measurements and event time data’, Biostatistics, 4, 465–480.
Huang J (1998), ‘Projection estimation in multiple regression with application to functional ANOVA models’, The Annals of Statistics, 26, 242–272.
Huang J and Ma S (2010), ‘Variable selection in the accelerated failure time model via the bridge method’, Lifetime Data Analysis, 16, 176–195.
Huang Y and Wang CY (2000), ‘Cox regression with accurate covariates unascertainable: A nonparametric correction approach’, Journal of the American Statistical Association, 95, 1209–1219.
Ma S, Kosorok M, and Fine J (2006), ‘Additive risk models for survival data with high-dimensional covariates’, Biometrics, 62, 202–210.
Miyata S and Shen X (2012), ‘Adaptive free-knot splines’, Journal of Computational and Graphical Statistics, 12, 197–213.
Nan B, Lin X, Lisabeth L, and Harlow S (2005), ‘A varying-coefficient Cox model for the effect of age at a marker event on age at menopause’, Biometrics, 61, 576–583.
Prentice R (1982), ‘Covariate measurement errors and parameter estimates in a failure time regression model’, Biometrika, 69, 331–342.
Sleeper LA and Harrington DP (1990), ‘Regression splines in the Cox model with application to covariate effects in liver disease’, Journal of the American Statistical Association, 85, 941–949.
Song X, Davidian M, and Tsiatis AA (2002a), ‘An estimator for the proportional hazards model with multiple longitudinal covariates measured with error’, Biostatistics, 3, 511–528.
Song X, Davidian M, and Tsiatis AA (2002b), ‘A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data’, Biometrics, 58, 742–753.
Song X and Huang Y (2005), ‘On corrected score approach for proportional hazards model with covariate measurement error’, Biometrics, 61, 702–714.
Song X and Wang CY (2008), ‘Semiparametric approaches for joint modeling of longitudinal and survival data with time varying coefficients’, Statistica Sinica, 27, 3178–3190.
Song X and Wang L (2017), ‘Partially time-varying coefficient proportional hazards models with error prone time-dependent covariates: an application to the AIDS Clinical Trial Group 175 data’, The Annals of Applied Statistics, 11, 274–296.
Stone CJ, Hansen M, Kooperberg C, and Truong YK (1997), ‘Polynomial splines and their tensor products in extended linear modeling (with discussion)’, The Annals of Statistics, 25, 1371–1470.
Tibshirani R (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society, Series B, 58, 172–183.
Tsiatis AA and Davidian M (2001), ‘A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error’, Biometrika, 88, 447–458.
Wang CY, Hsu L, Feng ZD, and Prentice RL (1997), ‘Regression calibration in failure time regression’, Biometrics, 53, 131–145.
Wulfsohn MS and Tsiatis AA (1997), ‘A joint model for survival and longitudinal data measured with error’, Biometrics, 53, 330–339.
Xu J and Zeger SL (2001), ‘Joint analysis of longitudinal data comprising repeated measures and times to events’, Applied Statistics, 50, 375–387.
Xue L (2009), ‘Consistent variable selection in additive models’, Statistica Sinica, 19, 1281–1296.
Xue L, Qu A, and Zhou J (2010), ‘Consistent model selection for marginal generalized additive model for correlated data’, Journal of the American Statistical Association, 105 (492), 1518–1530.
Xue L and Yang L (2006), ‘Additive coefficient modeling via polynomial spline’, Statistica Sinica, 16, 1423–1446.
Yan J and Huang J (2012), ‘Model selection for Cox models with time-varying coefficients’, Biometrics, 68, 419–428.
Zhang H and Lu W (2007), ‘Adaptive lasso for Cox’s proportional hazards model’, Biometrika, 94, 691–703.
