Semiparametric Regression Estimation for Recurrent Event Data with Errors in Covariates under Informative Censoring

Hsiang Yu; Yu-Jen Cheng; Ching-Yun Wang

doi:10.1515/ijb-2016-0001

Author manuscript; available in PMC: 2017 Nov 1.

Published in final edited form as: Int J Biostat. 2016 Nov 1;12(2):/j/ijb.2016.12.issue-2/ijb-2016-0001/ijb-2016-0001.xml. doi: 10.1515/ijb-2016-0001


PMCID: PMC5490505 NIHMSID: NIHMS870416 PMID: 27497870

SUMMARY

Recurrent event data arise frequently in longitudinal follow-up studies, and evaluating covariate effects on the rates of occurrence of such events is commonly of interest. Examples include repeated hospitalizations, recurrent infections in HIV patients, and tumor recurrences. In this article, we consider semiparametric regression methods for the occurrence rate function of recurrent events when some covariates may be measured with errors. In contrast to existing works, the conventional assumption of independent censoring is violated in our setting because the recurrent event process is interrupted by correlated events, a situation called informative drop-out. To accommodate both informative censoring and measurement error, the occurrence of recurrent events is modeled using an unspecified frailty distribution, together with a classical measurement error model. We propose two corrected approaches based on different ideas and show that they yield numerically identical estimates of the regression parameters. The asymptotic properties of the proposed estimators are established, and their finite sample performance is examined via simulations. The proposed methods are applied to the Nutritional Prevention of Cancer trial to assess the effect of the plasma selenium treatment on the recurrence of squamous cell carcinoma.

Keywords: Informative censoring, Measurement error, Surrogate covariate, Recurrent event data

1 Introduction

In many longitudinal follow-up studies, recurrent event data are collected when subjects experience an event multiple times. For example, patients with superficial bladder cancer may experience tumor recurrence many times; patients with cystic fibrosis may experience repeated lung exacerbations; and patients with chronic granulomatous disease may experience repeated pyogenic infections (Morgan, Butler, Johnson, Colin, FitzSimmons, Geller, Konstan, Light, Rabin, Regelmann et al., 1999, Fleming and Harrington, 1991). Models for recurrent event data can be categorized into two different classes: time-to-event or gap time models. In time-to-event models, interest focuses on the occurrence rate of an event over time (Lawless, Hu, and Cao, 1995, Hu and Lawless, 1996, Hu, Lagakos, and Lockhart, 2009). In gap time models, interest lies in the gap time between two consecutive events (Lin, Sun, and Ying, 1999).

In this study, we focus on time-to-event models, which may be constructed based on an intensity function (Prentice, Williams, and Peterson, 1981) or a rate function (Hu and Lagakos, 2007, Hu et al., 2009). The intensity function uniquely determines the probability structure of the recurrent event process; however, it requires correctly specifying how the occurrence of an event depends on the prior event history. The rate function, on the other hand, allows arbitrary dependence among the recurrent events and directly describes the occurrence rate without conditioning on the prior event history. Because our primary goal is to assess the average effects of treatments or risk factors, we are mainly interested in inference on the rate function. Lawless and Nadeau (1995) estimated the cumulative rate function nonparametrically and applied their approach to industrial warranty data. Hu and Lagakos (2007) proposed a nonparametric method to study the rate function of the viral load changing process in HIV-infected patients. Nevertheless, all of the above approaches assume non-informative censoring, that is, that the observation mechanism is independent of the recurrent event process. In practice, this assumption is frequently violated, for example, when the recurrent event process is interrupted by terminal events that are related to the recurrent events. A potential remedy is a frailty model, which allows dependence between the recurrent event process and the informative drop-out through a non-negative frailty variable. Typically, the distribution of the frailty variable is assumed to be known (Lancaster and Intrator, 1998), and a likelihood-based approach (Nielsen, Gill, Andersen, and Sørensen, 1992) is then used. More recently, Kalbfleisch, Schaubel, Ye, and Gong (2013) proposed a weighted estimating equation approach, with the weight specified by a gamma frailty distribution.
However, the frailty distribution is generally difficult to verify because the frailty variable is unobservable. To avoid specifying the frailty distribution, Wang, Qin, and Chiang (2001) and Wang and Huang (2014) considered a conditional likelihood approach, in which the unobserved frailty variables are “conditioned away” in the proposed estimating equations.

The aforementioned approaches, however, require that the covariates be measured correctly. In many epidemiologic and medical studies, covariates may be subject to measurement error. For example, the baseline plasma selenium level is an important predictor of the occurrence of skin cancers in the Nutritional Prevention of Cancer (NPC) trial (Clark, Combs, Turnbull, Slate, Chalker, Chow, Davis, Glover, Graham, Gross et al., 1996). However, the true plasma selenium level can never be measured exactly because of intrinsic biological variability or limited instrumental precision; instead, the observed values are contaminated by measurement errors. The most convenient approach, referred to as the naive approach, treats the observed covariates as the true covariates in the standard estimating procedure. However, the resulting naive estimator is generally inconsistent (Carroll, Ruppert, Stefanski, and Crainiceanu, 2006, Chapter 3). In survival and longitudinal data analysis, extensive research has addressed measurement error problems. For Cox regression, Prentice (1982) proposed a likelihood approach under normal measurement error and rare disease assumptions. Wang, Hsu, Feng, and Prentice (1997) applied regression calibration to the partial score function and investigated the performance of the regression calibration estimator through simulation studies, whereas Nakamura (1992) constructed unbiased estimating equations based on the concept of corrected scores. For nonlinear mixed models, Wu (2002), Liu and Wu (2007), and Wu, Liu, and Hu (2010) proposed estimating approaches for longitudinal response data with error-prone covariates, which can also account for censoring in the response and missing data. In recurrent event analysis, however, measurement error problems have received little attention.
Under a normal measurement error assumption, Jiang, Turnbull, and Clark (1999) proposed a moment corrected method to adjust for the bias of the naive estimator under a semiparametric model. However, their approach requires not only non-informative censoring but also a censoring distribution that is independent of the covariates.

The present study is motivated by the NPC trial, which aimed to assess the efficacy of oral selenium supplementation in preventing skin cancers such as squamous cell carcinoma (SCC). The trial began in 1983 and enrolled approximately 1300 patients with a history of dermatologic cancer, who were randomly assigned to the placebo group or the treatment group, with nearly half of the patients in each group. Patients in the treatment arm were to take a 200 μg selenium supplement per day. During the study period, patients could experience SCC events repeatedly. Each new SCC was diagnosed and recorded by certified physicians, and the medical records were reviewed by the clinical coordinators at each semi-annual visit or annual contact, or upon self-report, to ensure the completeness of the data. At randomization, many prognostic risk factors for SCC were recorded, including the baseline plasma selenium level. As mentioned, the plasma selenium level may be measured with error. In the original study, Clark et al. (1996) did not account for measurement error and found a nonsignificant negative effect of plasma selenium on the development of SCC. This result contradicted previous studies, which showed a strong association between the plasma selenium level and several types of cancer. Subsequently, many studies examined the effect of the plasma selenium level on the recurrence of SCC under the independent censoring assumption, some of which also accounted for measurement error (Jiang et al., 1999). However, we found a negative relationship between the censoring time and the SCC occurrence rate, which implies that the independent censoring assumption is not satisfied. Therefore, the existing methods are not appropriate for the NPC trial data.

This paper is organized as follows. In Section 2, statistical models for recurrent events and measurement errors are given. In Section 3, we propose a regression calibration method and a moment corrected method to correct for measurement errors in the presence of informative censoring. Simulation results are given in Section 4 to investigate the finite sample performance of the proposed methods. In Section 5, we apply the proposed methods to the NPC trial data to evaluate the effect of selenium on the recurrence of SCC. We conclude with a discussion in Section 6. The regularity conditions and technical proofs are provided in the appendices and the Supplementary Information.

2 Model illustration

2.1 Recurrent event model

Assume that there are n independent individuals in the cohort. Let subscript i be the index for a subject, where i = 1, …, n. For the ith subject, let N_i(t) denote the number of recurrent events occurring up to t within a fixed time period [0,τ], where the recurrent event process could be observed beyond τ. Let Z_i be a q × 1 vector of covariates that is precisely measured and X_i be a p × 1 vector of covariates that can be measured with errors. Let ℰ denote expectation over the samples, ν_i be the unobserved frailty variable with mean ℰ (ν_i | X_i,Z_i) = μ_ν, which does not depend on (X_i,Z_i), and C_i be the informative censoring time. Suppose that conditional on (ν_i,X_i,Z_i), N_i(t) follows a Poisson process with a multiplicative intensity function

λ (t ∣ ν_{i}, X_{i}, Z_{i}) = ν_{i} λ_{0} (t) e^{β_{X}^{'} X_{i} + β_{Z}^{'} Z_{i}}

(1)

where λ₀(t) is a baseline function and ${(β_{X}^{'}, β_{Z}^{'})}^{'}$ is a vector of regression parameters. Note that when ν is given, model (1) is also a rate function due to the assumption of the Poisson process. In general, regression parameters can be estimated either by a likelihood-based approach or by solving a set of unbiased estimating equations. If the distribution of ν is assumed and the true covariates are observed, then the standard procedure of the likelihood-based approaches can be conducted by integrating out ν (Cook and Lawless, 2007, Chapter 3). There are several popular choices for the frailty distribution, such as gamma, log-normal, and positive stable distribution. Balakrishnan and Peng (2006) advocated using the generalized gamma distribution as the frailty distribution because it includes many distributions (e.g., Weibull, log-normal, gamma, positive stable distribution) as special cases. Recently, Mazroui, Mathoulin-Pelissier, Soubeyran, and Rondeau (2012) and Zeng, Ibrahim, Chen, Hu, and Jia (2014) proposed a joint frailty model with two independent frailty variables to distinguish the dependence within the recurrent events and the association between the recurrent event process and terminal events. However, the determination of the frailty distribution usually depends on computational convenience instead of on biological reasons or data characteristics. Balakrishnan and Peng (2006) noted that an inappropriate frailty distribution may result in a large bias in the estimation.

Alternatively, we can construct a set of unbiased estimating equations based on the cumulative rate function. According to model (1), the cumulative rate function up to time t is

ℰ (N_{i} (t) ∣ X_{i}, Z_{i}) = ℰ (ℰ (N_{i} (t) ∣ ν_{i}, X_{i}, Z_{i}) ∣ X_{i}, Z_{i}) = Λ_{0} (t) e^{α_{0} + β_{X}^{'} X_{i} + β_{Z}^{'} Z_{i}}, \forall t \in [0, τ]

(2)

where $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) d u$ and α₀ = log(μ_ν). An advantage of estimating equations over a likelihood-based approach is that they avoid the risk of misspecifying the frailty distribution. However, solving estimating equations based on (2) requires that Λ₀(t) be known and that the true covariates be observed. Since neither requirement holds in practice, in this article we consider the recurrent event process with an unspecified frailty distribution and an unknown Λ₀(t).
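Equation (2) implies that the marginal mean of N_i(t) depends on the frailty only through its mean μ_ν. The following Python/NumPy sketch is an illustration under assumed values (a fixed Λ₀(t) = 2, μ_ν = 1, and the two frailty laws shown are arbitrary choices, not taken from the paper): it checks by Monte Carlo that N_i(t)/{Λ₀(t)e^{β_X′X_i + β_Z′Z_i}} averages to μ_ν = e^{α₀} under both a gamma and a log-normal frailty with the same mean.

```python
import numpy as np

rng = np.random.default_rng(2016)

# Illustrative values only (assumptions, not from the paper).
n = 200_000
beta_x, beta_z = np.log(1.5), np.log(1.5)
Lambda0_t = 2.0                     # Lambda_0(t) at one fixed time t
# alpha_0 = log(mu_nu) = 0 because both frailties below have mean 1.

x = rng.normal(0.0, np.sqrt(1 / 3), n)
z = rng.binomial(1, 0.5, n)
scale = Lambda0_t * np.exp(beta_x * x + beta_z * z)   # Lambda_0(t) exp(beta_X'X + beta_Z'Z)


def mean_ratio(nu):
    """Average of N(t) / scale; by (2) it should be close to mu_nu = 1."""
    counts = rng.poisson(nu * scale)                  # N(t) | nu, X, Z ~ Poisson
    return np.mean(counts / scale)


nu_gamma = rng.gamma(shape=2.0, scale=0.5, size=n)             # mean 2 * 0.5 = 1
nu_lognormal = rng.lognormal(mean=-0.18, sigma=0.6, size=n)   # mean exp(-0.18 + 0.36 / 2) = 1

r_gamma = mean_ratio(nu_gamma)
r_lognormal = mean_ratio(nu_lognormal)
print(r_gamma, r_lognormal)   # both near 1 despite different frailty distributions
```

The two frailty laws have different shapes but the same mean, which is all that (2) uses; this is why estimating equations built on (2) do not require the frailty distribution.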

2.2 Measurement error model

For subject i, let W_{i j} be the jth replicated surrogate measurement of the true covariate vector X_i and k_i be the number of replicates of W_i. Assume that the surrogate measurement satisfies the classical measurement error model

W_{i j} = X_{i} + U_{i j}, i = 1, \dots, n, j = 1, \dots, k_{i},

where U_{i j} are random errors with mean zero and covariance matrix Σ_U. Suppose that U_{i j} are independent of (ν_i,X_i,Z_i) and C_i, which implies that the measurement errors are non-differential; in other words, W_i provides no additional information about the event process once the true covariate X_i is given (Carroll et al., 2006, Chapter 2). Let μ_s and Σ_s be the mean and covariance matrix of a random vector s, Σ_sh be the covariance matrix between two random vectors s and h, and γ = (μ_X,μ_Z,Σ_U,Σ_X,Σ_Z,Σ_XZ) be the parameter of the distribution of X given (W̄,Z). We assume that X given (W̄,Z) follows a multivariate normal distribution with mean

ℰ (X ∣ \bar{W}, Z, γ) = μ_{X} + \begin{pmatrix} Σ_{X} & Σ_{X Z} \end{pmatrix} \begin{pmatrix} Σ_{X} + Σ_{U} / k & Σ_{X Z} \\ Σ_{Z X} & Σ_{Z} \end{pmatrix}^{- 1} \begin{pmatrix} \bar{W} - μ_{X} \\ Z - μ_{Z} \end{pmatrix}

and variance

Σ (γ) = Σ_{X} - \begin{pmatrix} Σ_{X} & Σ_{X Z} \end{pmatrix} \begin{pmatrix} Σ_{X} + Σ_{U} / k & Σ_{X Z} \\ Σ_{Z X} & Σ_{Z} \end{pmatrix}^{- 1} \begin{pmatrix} Σ_{X} \\ Σ_{Z X} \end{pmatrix} .

As in Carroll et al. (2006, Chapter 4), the formula given above is the best linear approximation of ℰ (X | W̄,Z,γ), and it can also be applied when Z is discrete.
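A minimal NumPy sketch of the conditional mean and variance formulas above, assuming all components of γ are already known (in practice they are replaced by the moment estimates from Appendix A). The function name calibration_moments is hypothetical, and the scalar example values are illustrative.

```python
import numpy as np


def calibration_moments(w_bar, z, k, mu_x, mu_z, sigma_u, sigma_x, sigma_z, sigma_xz):
    """Conditional mean and variance of X given (W_bar, Z) from the Section 2.2 formulas.

    Vectors are 1-D arrays (w_bar, mu_x: length p; z, mu_z: length q);
    sigma_u, sigma_x: p x p; sigma_z: q x q; sigma_xz: p x q; k: number of replicates.
    """
    cov_x_obs = np.hstack([sigma_x, sigma_xz])                 # (Sigma_X  Sigma_XZ), p x (p+q)
    var_obs = np.block([[sigma_x + sigma_u / k, sigma_xz],
                        [sigma_xz.T, sigma_z]])                 # Var(W_bar, Z), (p+q) x (p+q)
    # gain = cov_x_obs @ inv(var_obs), computed with a linear solve (var_obs is symmetric)
    gain = np.linalg.solve(var_obs, cov_x_obs.T).T
    resid = np.concatenate([w_bar - mu_x, z - mu_z])
    cond_mean = mu_x + gain @ resid
    cond_var = sigma_x - gain @ cov_x_obs.T
    return cond_mean, cond_var


# Scalar example with X and Z uncorrelated and RR = 0.5 (illustrative values).
m, v = calibration_moments(
    w_bar=np.array([0.6]), z=np.array([1.0]), k=1,
    mu_x=np.array([0.0]), mu_z=np.array([0.5]),
    sigma_u=np.array([[1 / 3]]), sigma_x=np.array([[1 / 3]]),
    sigma_z=np.array([[0.25]]), sigma_xz=np.array([[0.0]]),
)
print(m, v)   # conditional mean 0.3 and variance 1/6: W_bar is shrunk by a factor of 0.5
```

With Σ_XZ = 0 the formula reduces to the familiar shrinkage μ_X + σ_X²/(σ_X² + σ_U²/k)(W̄ − μ_X), which the example reproduces.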

3 Correction for errors-in-variables

Assume that the observed data {(C_i, (T_{i1}, …, T_{im_i}), {W_{i1}, …, W_{ik_i}}, Z_i), i = 1, …, n} are independent and identically distributed (iid), where T_{ij} denotes the jth observed event time for j = 1, …, m_i, and m_i denotes the number of recurrent events that occurred before C_i. We assume that C is conditionally independent of the recurrent event process N(t) given (ν,X,Z), so that the dependence between censoring and the recurrent events is carried entirely by the frailty ν and the covariates. Then, by (2), we have

ℰ (N (C) Λ_{0}^{- 1} (C) ∣ X, Z) = ℰ (ℰ (N (C) Λ_{0}^{- 1} (C) ∣ ν, C, X, Z) ∣ X, Z) = e^{α_{0} + β_{X}^{'} X + β_{Z}^{'} Z} .
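The identity above holds even though C depends on the frailty, because the ratio is formed subject by subject after conditioning on (ν, C). The Monte Carlo sketch below illustrates this under assumed settings that are not from the paper: Λ₀(t) = t, a gamma frailty with mean 1 (so α₀ = 0), a single covariate X, and a censoring time that shrinks as ν grows. The per-subject average of N(C)Λ₀⁻¹(C)e^{−β_X X} stays near 1, whereas pooling counts and exposure across subjects does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 200_000, 10.0
beta_x = np.log(1.5)
# Assumptions for illustration: Lambda_0(t) = t and a gamma frailty with mu_nu = 1 (alpha_0 = 0).
x = rng.normal(0.0, np.sqrt(1 / 3), n)
nu = rng.gamma(shape=2.0, scale=0.5, size=n)
c = np.minimum(1.0 + rng.exponential(scale=3.0 / nu), tau)   # frail subjects drop out earlier
m = rng.poisson(nu * c * np.exp(beta_x * x))                  # N(C) | nu, C, X ~ Poisson(nu Lambda_0(C) e^{beta_x x})

target = np.exp(beta_x * x)                  # e^{alpha_0 + beta_X x} with alpha_0 = 0
per_subject = np.mean(m / c / target)        # average of N(C) Lambda_0^{-1}(C) / target: close to 1
pooled = np.mean(m) / np.mean(c * target)    # ratio of means: roughly 0.77, distorted by informative C
print(per_subject, pooled)
```

The pooled ratio is pulled below 1 because subjects with large ν contribute many events over short follow-up; the per-subject ratio is unaffected, which is what makes estimating equations built on N(C)Λ₀⁻¹(C) robust to informative drop-out.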

If Λ₀(t) and X were known, the estimating equations $\sum_{i = 1}^{n} {(1, X_{i}^{'}, Z_{i}^{'})}^{'} \{m_{i} Λ_{0}^{- 1} (C_{i}) - e^{α_{0} + β_{X}^{'} X_{i} + β_{Z}^{'} Z_{i}}\} = 0$ for (α₀,β_X,β_Z) would be unbiased. In practice, they cannot be implemented because X_i is unobserved and Λ₀(t) is unknown. To deal with the unknown function Λ₀(t), we start with the conditional likelihood of (T_{i1}, …, T_{im_i}) given (C_i,ν_i,m_i,X_i,Z_i). Under the Poisson process assumption, the m_i event times are distributed as the order statistics of m_i iid random variables with truncated density λ₀(t)I(0 ≤ t ≤ C_i)/Λ₀(C_i), so this conditional likelihood is proportional to $\prod_{j = 1}^{m_{i}} λ_{0} (T_{i j}) / Λ_{0} (C_{i})$. Define a rescaled baseline function ϕ(t) ≡ λ₀(t)/Λ₀(τ) and $Φ (t) = \int_{0}^{t} ϕ (u) d u = Λ_{0} (t) / Λ_{0} (τ)$ for t ∈ [0,τ], so that Φ(τ) = 1. The conditional likelihood $\prod_{i = 1}^{n} P (T_{i 1}, \dots, T_{i m_{i}} ∣ C_{i}, ν_{i}, m_{i}, X_{i}, Z_{i})$ is then proportional to $\prod_{i = 1}^{n} \prod_{j = 1}^{m_{i}} \{ϕ (T_{i j}) / Φ (C_{i})\}$, which involves neither the frailty nor the covariates. As noted by Wang et al. (2001), this conditional likelihood has the same form as the nonparametric likelihood for right-truncated data. Thus, Φ(t) can be consistently estimated by the product-limit estimator

\hat{Φ} (t) = \prod_{T_{(l)} > t} (1 - \frac{n_{(l)}}{N_{(l)}}),

where {T_{(l)}} are the ordered distinct values of {T_{ij}}_{i=1,…,n; j=1,…,m_i}, n_{(l)} is the number of events that occurred at T_{(l)}, and N_{(l)} is the number of events that satisfy T_{ij} ≤ T_{(l)} ≤ C_i. Note that the nonparametric estimation of Φ does not require any information on the covariates or the unobserved frailty variable. Hence, Φ̂(t) is a consistent estimator even if X is measured with errors or the frailty distribution is unspecified.
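The product-limit estimator can be sketched directly from its definition. In this hypothetical helper phi_hat, each event time T_ij is paired with its subject's C_i so that the risk-set count N_(l) can be evaluated; the toy data are made up for illustration.

```python
import numpy as np


def phi_hat(t, event_times, censor_times):
    """Product-limit estimator of Phi(t) for right-truncated event times.

    event_times: list with one sequence of event times T_i1, ..., T_im_i per subject
    censor_times: sequence of censoring times C_i (same order as event_times)
    """
    T = np.concatenate([np.asarray(e, dtype=float) for e in event_times])
    C = np.concatenate([np.full(len(e), c, dtype=float)
                        for e, c in zip(event_times, censor_times)])   # C_i repeated per event
    result = 1.0
    for s in np.unique(T[T > t]):              # distinct event times T_(l) > t
        n_l = np.sum(T == s)                   # n_(l): events at T_(l)
        N_l = np.sum((T <= s) & (s <= C))      # N_(l): events with T_ij <= T_(l) <= C_i
        result *= 1.0 - n_l / N_l
    return float(result)


# Toy data: subject 1 (C = 5) has events at 1 and 3; subject 2 (C = 2) has one event at 2.
events = [[1.0, 3.0], [2.0]]
cens = [5.0, 2.0]
print([phi_hat(t, events, cens) for t in (0.5, 1.5, 2.5, 3.0)])   # [0.0, 0.25, 0.5, 1.0]
```

As expected, Φ̂ is 0 below the smallest event time and equals 1 at and beyond the largest one, matching Φ(0) = 0 and Φ(τ) = 1.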

For identifiability, let μ_ν = 1 without loss of generality. The expected number of events before C, divided by the rescaled baseline function evaluated at C, is

E (N (C) Φ^{- 1} (C) ∣ X, Z) = E (E (N (C) Φ^{- 1} (C) ∣ C, ν, X, Z) ∣ X, Z) = e^{β_{0} + β_{X}^{'} X + β_{Z}^{'} Z},

where β₀ = log(Λ₀(τ)). With the above equation, we can construct the unbiased estimating equations by using Φ(t) instead of the unknown Λ₀(t). After replacing the unknown X with the average of the replicates ${\bar{W}}_{i} = \sum_{j = 1}^{k_{i}} W_{i j} / k_{i}$ , we can obtain the naive estimating equations

U_{N} (b) = n^{- 1} \sum_{i = 1}^{n} \begin{pmatrix} 1 \\ {\bar{W}}_{i} \\ Z_{i} \end{pmatrix} \{m_{i} {\hat{Φ}}^{- 1} (C_{i}) - e^{b_{0} + b_{X}^{'} {\bar{W}}_{i} + b_{Z}^{'} Z_{i}}\} = 0.

(3)

Then, the naive estimator ${\hat{β}}_{N} = {({\hat{β}}_{N, 0}, {\hat{β}}_{N, X}^{'}, {\hat{β}}_{N, Z}^{'})}^{'}$ is obtained by solving equation (3) and Λ₀(t) can be estimated by ${\hat{Λ}}_{0}^{N} (t) = \hat{Φ} (t) exp ({\hat{β}}_{N, 0})$ . Due to the measurement errors, it can be shown that β̂_N does not converge to the true parameter $β = {(β_{0}, β_{X}^{'}, β_{Z}^{'})}^{'}$ . Based on (3), we develop a regression calibration method and a moment corrected method to adjust for the measurement errors in the following subsections.
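Equation (3) has the form of a log-linear (Poisson-type) score with working responses m_i Φ̂⁻¹(C_i), so one natural way to solve it is Newton-Raphson. The sketch below assumes that choice of solver (no solver is specified here), uses a hypothetical function solve_naive, and checks it on synthetic Poisson data rather than on recurrent event data; given the root b̂, the baseline estimate would be Λ̂₀^N(t) = Φ̂(t)exp(b̂₀).

```python
import numpy as np


def solve_naive(y, design, n_iter=50, tol=1e-10):
    """Newton-Raphson root of n^{-1} sum_i d_i {y_i - exp(d_i' b)} = 0, as in equation (3).

    y: working responses m_i / Phi_hat(C_i); design: rows d_i = (1, W_bar_i', Z_i').
    """
    b = np.zeros(design.shape[1])
    b[0] = np.log(np.mean(y))                                # intercept-only starting value
    for _ in range(n_iter):
        mu = np.exp(design @ b)
        score = design.T @ (y - mu) / len(y)
        info = (design * mu[:, None]).T @ design / len(y)    # negative Jacobian of the score
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


# Illustrative check on synthetic data (not the paper's simulation design).
rng = np.random.default_rng(3)
n = 2000
design = np.column_stack([np.ones(n), rng.normal(0, np.sqrt(1 / 3), n), rng.binomial(1, 0.5, n)])
true_b = np.array([0.5, np.log(1.5), np.log(1.5)])
y = rng.poisson(np.exp(design @ true_b)).astype(float)
b_hat = solve_naive(y, design)
score_at_root = design.T @ (y - np.exp(design @ b_hat)) / n
print(b_hat, score_at_root)
```

Because the score is that of a log-linear model, the Jacobian is negative definite whenever the design has full column rank, which keeps Newton steps well defined.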

3.1 Regression calibration approach

The regression calibration (RC) method is based on the assumption that the induced model of the response conditional on (W̄,Z) can be well approximated by the underlying model with X replaced by the conditional mean ℰ(X | W̄,Z). The RC estimator is obtained by treating ℰ(X | W̄,Z) as the true covariate X in the standard estimating procedure (Carroll et al., 2006, Chapter 4). Although the RC method generally leads to inconsistent estimation in nonlinear models, it remains valuable because of its computational efficiency and its limited bias under some conditions (Carroll et al., 2006, Prentice, 1982).

Under our framework, the RC method substitutes W̄ with ℰ(X | W̄,Z,γ) in equation (3). If the measurement error covariance matrix Σ_U is known, the other components of γ can be estimated from the observed data without replicates; otherwise, replicates are needed to estimate Σ_U (Wang et al., 1997, Wang, 1999, Carroll et al., 2006). By the method of moments, the estimator γ̂ of γ is obtained by solving $n^{- 1} \sum_{i = 1}^{n} Ψ_{i} (γ) = 0$, where Ψ_i(γ) is given in Appendix A. Then, the RC estimator ${\hat{β}}_{R} = {({\hat{β}}_{R, 0}, {\hat{β}}_{R, X}^{'}, {\hat{β}}_{R, Z}^{'})}^{'}$ is obtained by solving the equations

U_{R} (b) = n^{- 1} \sum_{i = 1}^{n} \begin{pmatrix} 1 \\ ℰ (X_{i} ∣ {\bar{W}}_{i}, Z_{i}, \hat{γ}) \\ Z_{i} \end{pmatrix} \{m_{i} {\hat{Φ}}^{- 1} (C_{i}) - e^{b_{0} + b_{X}^{'} ℰ (X_{i} ∣ {\bar{W}}_{i}, Z_{i}, \hat{γ}) + b_{Z}^{'} Z_{i}}\} = 0.

(4)

Under the normality assumption of Section 2.2 and the non-differential error assumption, the conditional expectation of mΦ⁻¹(C) given the observed covariates (W̄,Z) is $exp (β_{0} + β_{X}^{'} Σ (γ) β_{X} / 2 + β_{X}^{'} ℰ (X ∣ \bar{W}, Z, γ) + β_{Z}^{'} Z)$, which has the same form as the model used in (4) except for the constant β_X′Σ(γ)β_X/2 in the intercept. Thus, the RC estimator β̂_R converges to the limit $β_{R} = {(β_{0} + β_{X}^{'} Σ (γ) β_{X} / 2, β_{X}^{'}, β_{Z}^{'})}^{'}$. That is, the RC estimator is consistent for the regression coefficients but not for the intercept.
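The conditional expectation stated above follows in two steps, sketched here under the assumptions already made (non-differential error, μ_ν = 1, and X given (W̄, Z) normal with mean ℰ(X | W̄, Z, γ) and variance Σ(γ)): first condition on (X, Z), then apply the normal moment generating function ℰ(e^{a′X}) = exp(a′m + a′Σa/2).

```latex
\begin{aligned}
ℰ\{m Φ^{-1}(C) \mid \bar{W}, Z\}
 &= ℰ\big[ℰ\{m Φ^{-1}(C) \mid X, Z\} \mid \bar{W}, Z\big]
  = e^{β_{0} + β_{Z}^{'} Z}\, ℰ\big(e^{β_{X}^{'} X} \mid \bar{W}, Z\big) \\
 &= \exp\{β_{0} + β_{X}^{'} Σ(γ) β_{X}/2 + β_{X}^{'} ℰ(X \mid \bar{W}, Z, γ) + β_{Z}^{'} Z\}.
\end{aligned}
```

Since Σ(γ) does not depend on (W̄, Z), the extra term β_X′Σ(γ)β_X/2 is a constant that the intercept of (4) absorbs, which is exactly the shift in β_R.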

Note that β̂_{R,0} converges to $β_{0} + β_{X}^{'} Σ (γ) β_{X} / 2$. Let Σ̂ be the estimator of Σ(γ), calculated as $\hat{Σ}_{X} - (\hat{Σ}_{X} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X}) {(\hat{Σ}_{\bar{W}} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X})}^{- 1} (\hat{Σ}_{X} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X}) - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X}$, where $\hat{Σ}_{\bar{W}} = \hat{Σ}_{X} + \hat{Σ}_{U} \sum_{i = 1}^{n} {(n k_{i})}^{- 1}$. The RC estimator of Λ₀(t) can then be adjusted as ${\hat{Λ}}_{0}^{R} (t) = \hat{Φ} (t) exp ({\hat{β}}_{R, 0} - {\hat{β}}_{R, X}^{'} \hat{Σ} {\hat{β}}_{R, X} / 2)$, which converges to Λ₀(t). In the Supplementary Information, we show that $\sqrt{n} ({\hat{β}}_{R} - β_{R})$ is asymptotically normally distributed with mean zero and covariance matrix A⁻¹Σ_g(A⁻¹)′, where A and Σ_g are defined in Proposition 1 in Appendix A. The covariance matrix estimator of the RC estimator is given in Appendix B.

3.2 Moment corrected approach

The moment corrected (MC) method is motivated by the bias-correction method proposed by Stefanski (1985). Under the classical measurement error model, Stefanski (1985) showed that the naive estimator converges to a limit that is a function of the true parameter and the error variance. Accordingly, the bias of the naive estimator can be corrected based on the relationship between the limit of the naive estimator and the true parameter.

Based on this idea, we show that the naive estimator β̂_N converges to a limit $β_{N} = {(β_{N, 0}, β_{N, X}^{'}, β_{N, Z}^{'})}^{'}$ , which satisfies

E [\begin{pmatrix} 1 \\ \bar{W} \\ Z \end{pmatrix} \{E (m Φ^{- 1} (C) ∣ \bar{W}, Z) - e^{β_{N, 0} + β_{N, X}^{'} \bar{W} + β_{N, Z}^{'} Z}\}] = 0.

(5)

In the Supplementary Information, we show that the root of (5) is unique. As described in Section 2.2, we assume that X given (W̄,Z) follows a multivariate normal distribution. For convenience of derivation, we reparametrize the conditional mean as ℰ(X | W̄,Z,γ) = η₀ + η_W W̄ + η_Z Z, where η₀ = (I_p − η_W)μ_X − η_Zμ_Z, $η_{W} = (Σ_{X} - Σ_{X Z} Σ_{Z}^{- 1} Σ_{Z X}) {(Σ_{\bar{W}} - Σ_{X Z} Σ_{Z}^{- 1} Σ_{Z X})}^{- 1}$, $η_{Z} = \{I_{p} - (Σ_{X} - Σ_{X Z} Σ_{Z}^{- 1} Σ_{Z X}) {(Σ_{\bar{W}} - Σ_{X Z} Σ_{Z}^{- 1} Σ_{Z X})}^{- 1}\} Σ_{X Z} Σ_{Z}^{- 1}$, and I_p denotes the identity matrix of size p. Based on the non-differential error assumption, it follows that $E (m Φ^{- 1} (C) ∣ \bar{W}, Z) = E (E (m Φ^{- 1} (C) ∣ X, Z) ∣ \bar{W}, Z) = exp (β_{0} + β_{X}^{'} E (X ∣ \bar{W}, Z, γ) + β_{X}^{'} Σ (γ) β_{X} / 2 + β_{Z}^{'} Z)$. Substituting the reparametrized conditional mean, the choice $β_{N, 0} = β_{0} + β_{X}^{'} η_{0} + β_{X}^{'} Σ (γ) β_{X} / 2$, $β_{N, X} = η_{W}^{'} β_{X}$, and $β_{N, Z} = β_{Z} + η_{Z}^{'} β_{X}$ makes the integrand of (5) vanish for every (W̄,Z); by uniqueness, this is the root β_N of (5). Specifically, β_N = D(β, η) is a one-to-one function of the true parameter $β = {(β_{0}, β_{X}^{'}, β_{Z}^{'})}^{'}$ when the nuisance parameter η = (η₀,η_W,η_Z) is given and η_W is invertible. Therefore, substituting the estimates β̂_N and η̂ into the inverse function D⁻¹ yields the moment corrected estimator

{\hat{β}}_{M} = D^{- 1} ({\hat{β}}_{N}, \hat{η}) = \begin{pmatrix} {\hat{β}}_{N, 0} - {\hat{β}}_{N, X}^{'} {\hat{η}}_{W}^{- 1} {\hat{η}}_{0} - {\hat{β}}_{N, X}^{'} {\hat{η}}_{W}^{- 1} \hat{Σ} {({\hat{η}}_{W}^{'})}^{- 1} {\hat{β}}_{N, X} / 2 \\ {({\hat{η}}_{W}^{'})}^{- 1} {\hat{β}}_{N, X} \\ {\hat{β}}_{N, Z} - {\hat{η}}_{Z}^{'} {({\hat{η}}_{W}^{'})}^{- 1} {\hat{β}}_{N, X} \end{pmatrix},

where β̂_M = (β̂_{M,0}, β̂_{M,X}′, β̂_{M,Z}′)′, η̂₀ = (I_p − η̂_W)μ̂_X − η̂_Zμ̂_Z, ${\hat{η}}_{W} = (\hat{Σ}_{X} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X}) {(\hat{Σ}_{\bar{W}} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X})}^{- 1}$, and ${\hat{η}}_{Z} = \{I_{p} - (\hat{Σ}_{X} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X}) {(\hat{Σ}_{\bar{W}} - \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1} \hat{Σ}_{Z X})}^{- 1}\} \hat{Σ}_{X Z} \hat{Σ}_{Z}^{- 1}$. Because β̂_{M,0} is consistent for the true intercept β₀, Λ₀(t) can also be consistently estimated by ${\hat{Λ}}_{0}^{M} (t) = \hat{Φ} (t) exp ({\hat{β}}_{M, 0})$. In summary, the estimating procedure of the MC method is as follows:

Step 1. Solve equation (3) and $\sum_{i = 1}^{n} Ψ_{i} (γ) = 0$ (given in Appendix A) to obtain β̂_N and γ̂.
Step 2. Plug β̂_N and η̂ = η(γ̂) into the function D⁻¹ to obtain the MC estimator β̂_M = D⁻¹(β̂_N, η̂).
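Step 2 amounts to evaluating D⁻¹. A small NumPy sketch of the maps D and D⁻¹ for general p and q follows; the function names and the numerical values of η and Σ are hypothetical, and the round trip D⁻¹{D(β, η), η} = β serves as a sanity check of the inversion formula.

```python
import numpy as np


def D(beta0, beta_x, beta_z, eta0, eta_w, eta_z, sigma):
    """Limit beta_N of the naive estimator as a function of beta (Section 3.2)."""
    bn0 = beta0 + beta_x @ eta0 + beta_x @ sigma @ beta_x / 2
    bnx = eta_w.T @ beta_x
    bnz = beta_z + eta_z.T @ beta_x
    return bn0, bnx, bnz


def D_inv(bn0, bnx, bnz, eta0, eta_w, eta_z, sigma):
    """Moment corrected estimator: invert D for fixed nuisance parameters."""
    bx = np.linalg.solve(eta_w.T, bnx)               # (eta_W')^{-1} beta_{N,X}
    b0 = bn0 - bx @ eta0 - bx @ sigma @ bx / 2       # uses beta_X' = beta_{N,X}' eta_W^{-1}
    bz = bnz - eta_z.T @ bx
    return b0, bx, bz


# Round trip with p = 2 error-prone and q = 1 exact covariates (illustrative values).
eta0 = np.array([0.1, -0.2])
eta_w = np.array([[0.8, 0.1], [0.05, 0.7]])
eta_z = np.array([[0.3], [-0.1]])                    # p x q
sigma = np.array([[0.10, 0.02], [0.02, 0.15]])
beta = (0.4, np.array([np.log(1.5), np.log(3.0)]), np.array([np.log(1.5)]))

recovered = D_inv(*D(*beta, eta0, eta_w, eta_z, sigma), eta0, eta_w, eta_z, sigma)
print(recovered)
```

Using a linear solve instead of forming (η̂_W′)⁻¹ explicitly is a numerical-stability choice; it computes the same quantity.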

In the Supplementary Information, we show that $\sqrt{n} ({\hat{β}}_{M} - β)$ is asymptotically normally distributed with mean zero and covariance matrix B⁻¹Σ_h(B⁻¹)′, where B and Σ_h are defined in Proposition 2 in Appendix A. The covariance matrix estimator of the MC estimator is given in Appendix C.

An important feature of the MC estimator is that it is numerically identical to the RC estimator for the regression parameters ${(β_{X}^{'}, β_{Z}^{'})}^{'}$, but not for the intercept β₀. That is, the two procedures yield exactly the same estimates of the regression coefficients. The proof that β̂_{M,X} = β̂_{R,X} and β̂_{M,Z} = β̂_{R,Z} is provided in Appendix D.
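The equality has a simple algebraic source: when the calibrated covariate η̂₀ + η̂_W W̄ + η̂_Z Z uses the same coefficients for all subjects, the RC design matrix is an invertible affine transformation of the naive one, so (3) and (4) produce the same fitted values and the slopes are related exactly by D⁻¹. The sketch below illustrates this numerically and is not the proof in Appendix D. It assumes a single replicate per subject, a known σ_U², scalar X and Z, and Poisson counts standing in for m_i Φ̂⁻¹(C_i); solve_ee is a hypothetical Newton-Raphson helper.

```python
import numpy as np


def solve_ee(y, G, n_iter=100):
    """Newton-Raphson root of sum_i g_i {y_i - exp(g_i' b)} = 0."""
    b = np.zeros(G.shape[1])
    b[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(G @ b)
        step = np.linalg.solve((G * mu[:, None]).T @ G, G.T @ (y - mu))
        b = b + step
        if np.abs(step).max() < 1e-12:
            break
    return b


rng = np.random.default_rng(7)
n, s2x, s2u = 1000, 1 / 3, 1 / 3          # one replicate per subject; sigma_U^2 treated as known
x = rng.normal(0.0, np.sqrt(s2x), n)
z = rng.binomial(1, 0.5, n).astype(float)
w = x + rng.normal(0.0, np.sqrt(s2u), n)
y = rng.poisson(np.exp(0.5 + np.log(1.5) * (x + z))).astype(float)   # stand-in for m_i / Phi_hat(C_i)

# Moment estimates of the calibration coefficients (scalar X and Z).
S = np.cov(np.vstack([w, z]))              # [[Var W, Cov(W, Z)], [Cov(Z, W), Var Z]]
sxz, sz = S[0, 1], S[1, 1]                 # Cov(X, Z) = Cov(W, Z) because U is independent of (X, Z)
a = (S[0, 0] - s2u) - sxz**2 / sz          # Sigma_X - Sigma_XZ Sigma_Z^{-1} Sigma_ZX
eta_w = a / (S[0, 0] - sxz**2 / sz)
eta_z = (1.0 - eta_w) * sxz / sz
eta0 = (1.0 - eta_w) * w.mean() - eta_z * z.mean()

ones = np.ones(n)
b_naive = solve_ee(y, np.column_stack([ones, w, z]))                          # naive fit, as in (3)
b_rc = solve_ee(y, np.column_stack([ones, eta0 + eta_w * w + eta_z * z, z]))  # RC fit, as in (4)

bx_mc = b_naive[1] / eta_w                 # MC slopes from D^{-1} with p = q = 1
bz_mc = b_naive[2] - eta_z * bx_mc
print(b_rc[1:], bx_mc, bz_mc)              # RC and MC slopes agree to solver precision
```

Only the intercepts differ: the RC intercept still contains β_X′Σ(γ)β_X/2, which the MC step removes.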

4 Simulation study

In this section, we compare the performance of the RC and MC methods with that of the naive approach via simulation studies. The corrected partial likelihood (CPL) approach proposed by Jiang et al. (1999) is also included for comparison; the CPL estimator takes measurement error into account but assumes non-informative and covariate-independent censoring.

We consider a regression model with a continuous covariate X and a binary covariate Z. Let $X ~ N (0, σ_{X}^{2} = 1 / 3)$ be an unobserved error-prone covariate and Z ~ Bernoulli(0.5) be a precisely recorded random treatment assignment. For subject i, we generate k_i repeated surrogates W_{ij} = X_i + U_{ij} for X_i, where k_i follows a discrete uniform distribution on {1, 2, 3, 4} and $U_{i j} ~ N (0, σ_{U}^{2})$. With the repeated surrogates, we estimate the nuisance parameter γ by solving $\sum_{i = 1}^{n} Ψ_{i} (γ) / n = 0$, where Ψ_i is given in Appendix A. We conduct the simulations with reliability ratios (RR) $σ_{X}^{2} / (σ_{X}^{2} + σ_{U}^{2}) = 0.8$ and 0.5. The reliability ratio represents the magnitude of the error contamination; a lower RR indicates heavier contamination. We generate $ν_{i}^{*}$ from a mixture model, in which $ν_{i}^{*}$ follows a uniform distribution on (0.5, 1.5) when Z_i = 0 and a uniform distribution on (1.5, 4) otherwise. The frailty variable is then $ν_{i} = exp (- Z_{i} log (2.75)) ν_{i}^{*}$, so that ℰ(ν_i | X_i, Z_i) = 1 in both treatment groups. Given (ν_i,X_i,Z_i), the recurrent event process {N_i(t)} is generated with intensity function λ(t | ν_i,X_i,Z_i) = ν_iλ₀(t)exp(β_XX_i + β_ZZ_i), where λ₀(t) = (t − 6)³/360 + 0.6, t ∈ [0,τ], and τ = 10. We consider two sets of regression coefficients, (β_X,β_Z) = (log(1.5), log(1.5)) and (β_X,β_Z) = (log(3), log(1.5)), in each scenario. The first two scenarios differ in the censoring mechanism. In Scenario 1, the censoring time C depends on W: if W_{i1} > 0, C_i is generated from an exponential distribution with mean $10 ν_{i}^{- 1}$ and truncated at τ = 10; otherwise, C_i is generated from an exponential distribution with mean $0.5 ν_{i}^{- 1}$ and truncated at τ = 10. In Scenario 2, the censoring time C depends on X:
C_i is generated from the same mixture of exponential distributions as in Scenario 1, with W_{i1} replaced by X_i. Next, we investigate the sensitivity of the proposed methods to the conditional normality assumption imposed on X. In Scenario 3, X is uniformly distributed over the interval ( $- \sqrt{3 σ_{X}^{2}}, \sqrt{3 σ_{X}^{2}}$ ), which has variance σ_X², and Z is allowed to be correlated with X: let Z^* = X + ε, where $ε ~ N (0, σ_{X}^{2})$, and set Z = 1 if Z^* ≤ 0 and Z = 0 otherwise. The other variables are generated in the same manner as in Scenario 2. A non-normal measurement error case is considered in Scenario 4, where U is generated from a skew normal distribution with mean 0, variance $σ_{U}^{2}$, and skewness parameter α = −2, and X from $N (0, σ_{X}^{2} = 1 / 3)$. The remaining variables are generated in the same manner as in Scenario 3. A total of 200 replicates with sample sizes n = 300 and n = 600 are generated for each simulation configuration. In the tables, BIAS denotes the average bias, ASE the average of the estimated standard errors, ESD the empirical standard deviation, and CP and CL the coverage probability and average length, respectively, of the 95% confidence interval over the 200 runs. The standard errors of the proposed estimators are the square roots of the diagonal elements of the sandwich variance estimators given in Appendices B and C.
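The data-generating process of Scenario 1 can be sketched as follows. Assumptions not stated in the text: "truncated at τ" is read as C_i = min(C_i*, τ); event times are drawn by rejection sampling from the density λ₀(t)/Λ₀(C_i) on [0, C_i], using the fact that λ₀ is nonnegative and increasing on [0, 10]; and Λ₀ is the closed-form integral of λ₀. The function name simulate_scenario1 is hypothetical, and this is not the authors' simulation code.

```python
import numpy as np

TAU = 10.0


def lambda0(t):
    """Baseline rate (t - 6)^3 / 360 + 0.6, nonnegative and increasing on [0, 10]."""
    return (t - 6.0) ** 3 / 360.0 + 0.6


def Lambda0(t):
    """Cumulative baseline: integral of lambda0 from 0 to t."""
    return ((t - 6.0) ** 4 - 1296.0) / 1440.0 + 0.6 * t


def simulate_scenario1(n, beta_x, beta_z, rr, rng):
    s2x = 1.0 / 3.0
    s2u = s2x * (1.0 - rr) / rr                          # solves RR = s2x / (s2x + s2u)
    x = rng.normal(0.0, np.sqrt(s2x), n)
    z = rng.binomial(1, 0.5, n)
    k = rng.integers(1, 5, n)                            # k_i uniform on {1, 2, 3, 4}
    w = [x[i] + rng.normal(0.0, np.sqrt(s2u), k[i]) for i in range(n)]
    nu_star = np.where(z == 0, rng.uniform(0.5, 1.5, n), rng.uniform(1.5, 4.0, n))
    nu = np.exp(-z * np.log(2.75)) * nu_star             # E(nu | X, Z) = 1
    w1_pos = np.array([wi[0] > 0 for wi in w])
    c = np.minimum(rng.exponential(np.where(w1_pos, 10.0, 0.5) / nu), TAU)
    lam_max = lambda0(TAU)                               # envelope for rejection sampling
    events = []
    for i in range(n):
        rate = nu[i] * np.exp(beta_x * x[i] + beta_z * z[i])
        m = rng.poisson(rate * max(Lambda0(c[i]), 0.0))  # N_i(C_i) given (nu_i, X_i, Z_i, C_i)
        times = []
        while len(times) < m:                            # draw from lambda0 / Lambda0(C_i) on [0, C_i]
            t = rng.uniform(0.0, c[i])
            if rng.uniform(0.0, lam_max) <= lambda0(t):
                times.append(t)
        events.append(np.sort(np.array(times)))
    return x, z, w, c, events


x, z, w, c, events = simulate_scenario1(300, np.log(1.5), np.log(1.5), 0.8, np.random.default_rng(0))
```

Drawing the count first and then the event times given the count uses the standard Poisson process property that underlies the conditional likelihood in Section 3; the max(·, 0) guard only protects against floating-point round-off in Λ₀ near t = 0.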

The results of Scenarios 1 to 4 are reported in Tables 1 to 4. In general, the naive estimator of β_X is biased and has low coverage probabilities, as shown in all tables. This is due to the well-known attenuation effect. The bias is most severe when β_X is large and RR is low. In Scenarios 1 and 2, the naive estimator of β_Z is not affected by the measurement errors because X and Z are generated independently. When X and Z are correlated (Tables 3 and 4), the naive estimator of β_Z is also biased. Moreover, the simulation results confirm the numerical equivalence of the RC and MC estimators: their BIAS and ESD values coincide, and the small differences in ASE and CL reflect their different variance estimators.

Table 1.

Censoring time depends on W; X follows a normal distribution, and X and Z are independent.

		n = 300				n = 600
		Naive	RC	MC	CPL	Naive	RC	MC	CPL
		(β_X,β_Z) = (log (1.5), log (1.5)); RR=0.8
β_X	BIAS ×10³	−83	−1	−1	24	−71	13	13	39
	ASE ×10³	137	172	172	127	96	120	120	87
	ESD ×10³	133	167	167	120	94	117	117	86
	CP	0.93	0.97	0.97	0.96	0.91	0.94	0.94	0.91
	CL ×10³	537	675	676	497	375	470	470	343
β_Z	BIAS ×10³	−2	−2	−2	13	−4	−4	−4	5
	ASE ×10³	164	164	164	106	114	114	114	74
	ESD ×10³	157	157	157	102	120	120	120	75
	CP	0.97	0.96	0.96	0.96	0.96	0.96	0.96	0.95
	CL ×10³	643	644	644	416	446	447	447	292

		(β_X,β_Z) = (log (1.5), log (1.5)); RR=0.5
β_X	BIAS ×10³	−216	−20	−20	−89	−198	11	11	49
	ASE ×10³	102	209	209	150	78	156	156	117
	ESD ×10³	103	211	211	139	69	142	142	115
	CP	0.43	0.92	0.92	0.89	0.23	0.96	0.96	0.925
	CL ×10³	424	868	869	650	304	612	613	460
β_Z	BIAS ×10³	5	7	7	14	7	8	8	12
	ASE ×10³	162	163	164	108	117	117	117	77
	ESD ×10³	168	169	169	107	115	117	117	79
	CP	0.94	0.94	0.94	0.94	0.96	0.96	0.96	0.93
	CL ×10³	650	655	655	427	458	460	460	300

		(β_X,β_Z) = (log (3), log (1.5)); RR=0.8
β_X	BIAS ×10³	−241	−24	−24	3	−214	7	7	23
	ASE ×10³	130	163	163	136	92	115	115	100
	ESD ×10³	129	161	161	139	94	119	119	99
	CP	0.53	0.95	0.95	0.94	0.34	0.96	0.96	0.96
	CL ×10³	508	639	639	532	360	451	451	392
β_Z	BIAS ×10³	22	25	25	10	8	9	9	2
	ASE ×10³	146	146	146	110	103	103	103	79
	ESD ×10³	147	148	148	116	99	102	102	79
	CP	0.95	0.95	0.95	0.94	0.95	0.95	0.95	0.94
	CL ×10³	573	574	574	433	402	403	403	311

		(β_X,β_Z) = (log (3), log (1.5)); RR=0.5
β_X	BIAS ×10³	−539	34	34	67	−551	−4	−4	12
	ASE ×10³	106	221	222	226	75	154	155	158
	ESD ×10³	111	228	228	234	76	146	146	151
	CP	0.00	0.94	0.94	0.95	0.00	0.96	0.96	0.96
	CL ×10³	417	867	868	887	295	606	606	619
β_Z	BIAS ×10³	7	8	8	15	−7	−8	−8	2
	ASE ×10³	155	159	159	130	110	112	112	95
	ESD ×10³	157	163	163	138	119	121	121	92
	CP	0.95	0.94	0.94	0.94	0.94	0.94	0.94	0.95
	CL ×10³	607	622	623	511	430	438	439	371


Note: BIAS denotes the average of β̂ − β; ASE denotes the average estimated standard error; ESD denotes the empirical standard deviation of the estimates; CP denotes the coverage probability of the Wald 95% confidence interval; and CL denotes the average length of the Wald 95% confidence interval. All averages are taken over 200 simulated data sets.
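Each summary in the note can be computed directly from the per-replicate estimates and standard errors. The following minimal sketch assumes Wald intervals β̂ ± 1.96·SE and an ESD computed with the n − 1 divisor. The function name is hypothetical, and the tables report BIAS, ASE, ESD and CL multiplied by 10³.

```python
import numpy as np

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal

def summarize_replicates(estimates, std_errors, true_value):
    """BIAS, ASE, ESD, CP and CL over simulated data sets (unscaled)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - Z_975 * se, est + Z_975 * se   # Wald 95% intervals
    return {
        "BIAS": est.mean() - true_value,                 # average of beta_hat - beta
        "ASE": se.mean(),                                # average estimated SE
        "ESD": est.std(ddof=1),                          # empirical SD of estimates
        "CP": np.mean((lower <= true_value) & (true_value <= upper)),
        "CL": np.mean(upper - lower),                    # average interval length
    }
```

Because each Wald interval has length 2 × 1.96 × SE, CL should be about 3.92 × ASE. For example, 3.92 × 137 ≈ 537 for the naive β_X entry in the first panel of Table 1. This makes the relation a quick consistency check on the tables.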

Table 4.

Censoring time depends on X; U follows a skew normal distribution, and X and Z are correlated.

		n = 300				n = 600
		Naive	RC	MC	CPL	Naive	RC	MC	CPL
		(β_X, β_Z) = (log (1.5), log (1.5)); RR=0.8
β_X	BIAS ×10³	−118	−10	−10	−43	−99	16	16	−44
	ASE ×10³	150	207	207	127	104	142	142	91
	ESD ×10³	150	208	208	118	103	142	142	84
	CP	0.89	0.95	0.95	0.95	0.84	0.95	0.95	0.94
	CL ×10³	589	811	811	499	406	558	559	357
β_Z	BIAS ×10³	97	26	26	84	50	−25	−25	75
	ASE ×10³	198	217	217	130	138	150	150	89
	ESD ×10³	197	217	217	124	133	146	146	89
	CP	0.89	0.95	0.95	0.90	0.94	0.94	0.94	0.88
	CL ×10³	777	851	851	511	541	588	588	349

		(β_X, β_Z) = (log (1.5), log (1.5)); RR=0.5
β_X	BIAS ×10³	−244	5	5	−109	−231	32	32	−115
	ASE ×10³	111	285	285	155	77	195	195	110
	ESD ×10³	114	296	296	137	79	202	202	103
	CP	0.42	0.94	0.94	0.92	0.16	0.93	0.94	0.82
	CL ×10³	437	1115	1118	606	302	763	764	432
β_Z	BIAS ×10³	179	16	16	149	134	−36	−36	139
	ASE ×10³	191	254	255	130	133	174	174	88
	ESD ×10³	187	252	252	120	128	174	174	88
	CP	0.84	0.94	0.94	0.83	0.85	0.95	0.95	0.68
	CL ×10³	747	997	998	508	522	681	682	346

		(β_X, β_Z) = (log (3), log (1.5)); RR=0.8
β_X	BIAS ×10³	−271	40	40	−115	−261	51	51	−121
	ASE ×10³	137	189	190	142	96	133	133	100
	ESD ×10³	144	195	195	144	92	129	129	93
	CP	0.50	0.93	0.92	0.87	0.21	0.94	0.94	0.78
	CL ×10³	536	742	743	558	377	522	522	394
β_Z	BIAS ×10³	157	−46	−46	148	173	−29	−29	166
	ASE ×10³	185	201	201	136	131	142	142	95
	ESD ×10³	172	188	188	122	131	144	144	93
	CP	0.89	0.96	0.96	0.83	0.74	0.95	0.95	0.59
	CL ×10³	725	790	790	534	512	556	556	373

		(β_X, β_Z) = (log (3), log (1.5)); RR=0.5
β_X	BIAS ×10³	−623	98	98	−295	−615	99	99	−273
	ASE ×10³	109	290	291	202	79	215	216	148
	ESD ×10³	112	305	305	172	88	220	220	150
	CP	0.00	0.95	0.95	0.70	0.00	0.93	0.93	0.48
	CL ×10³	428	1139	1142	791	309	805	808	582
β_Z	BIAS ×10³	419	−45	−45	342	411	−56	−56	335
	ASE ×10³	185	255	255	147	130	179	180	103
	ESD ×10³	189	269	269	140	128	185	185	88
	CP	0.39	0.93	0.93	0.36	0.11	0.93	0.93	0.08
	CL ×10³	725	998	1000	576	509	703	705	405


Table 3.

Censoring time depends on X; X follows a uniform distribution, and X and Z are correlated.

		n = 300				n = 600
		Naive	RC	MC	CPL	Naive	RC	MC	CPL
		(β_X,β_Z) = (log (1.5), log (1.5)); RR=0.8
β_X	BIAS ×10³	−114	6	6	−71	−119	−2	−2	−78
	ASE ×10³	151	213	213	132	108	152	152	93
	ESD ×10³	163	230	230	142	104	147	147	85
	CP	0.88	0.94	0.94	0.88	0.83	0.95	0.95	0.91
	CL ×10³	591	834	834	519	422	595	595	363
β_Z	BIAS ×10³	98	12	12	81	81	−3	−3	88
	ASE ×10³	200	223	223	138	141	157	157	94
	ESD ×10³	216	239	239	138	139	157	157	94
	CP	0.92	0.93	0.93	0.92	0.91	0.96	0.96	0.84
	CL ×10³	785	875	875	541	552	616	616	370

		(β_X,β_Z) = (log (1.5), log (1.5)); RR=0.5
β_X	BIAS ×10³	−253	2	2	−170	−244	24	24	−152
	ASE ×10³	109	297	298	152	77	207	207	107
	ESD ×10³	125	340	340	153	74	207	207	100
	CP	0.37	0.93	0.93	0.76	0.10	0.95	0.95	0.69
	CL ×10³	426	1166	1168	595	302	811	811	419
β_Z	BIAS ×10³	178	−7	−7	178	180	−13	−13	153
	ASE ×10³	195	275	275	134	137	190	190	95
	ESD ×10³	208	303	303	122	134	186	186	91
	CP	0.85	0.92	0.92	0.75	0.76	0.94	0.94	0.59
	CL ×10³	766	1077	1077	524	537	744	744	372

		(β_X,β_Z) = (log (3), log (1.5)); RR=0.8
β_X	BIAS ×10³	−351	−42	−42	−306	−364	−66	−66	−324
	ASE ×10³	138	197	197	141	98	139	139	100
	ESD ×10³	141	199	199	140	96	133	133	87
	CP	0.25	0.93	0.93	0.42	0.05	0.95	0.95	0.08
	CL ×10³	542	771	772	552	385	544	545	391
β_Z	BIAS ×10³	237	15	15	209	240	26	26	215
	ASE ×10³	189	209	209	148	134	147	147	101
	ESD ×10³	215	235	235	142	138	149	149	102
	CP	0.75	0.93	0.93	0.70	0.58	0.94	0.94	0.46
	CL ×10³	743	819	819	581	527	577	577	398

		(β_X,β_Z) = (log (3), log (1.5)); RR=0.5
β_X	BIAS ×10³	−722	−89	−89	−548	−715	−85	−85	−554
	ASE ×10³	96	273	274	169	67	187	187	121
	ESD ×10³	87	254	254	159	69	191	191	117
	CP	0.00	0.94	0.93	0.11	0.00	0.91	0.92	0.01
	CL ×10³	377	1071	1073	664	263	732	733	476
β_Z	BIAS ×10³	470	11	11	349	487	35	35	368
	ASE ×10³	188	261	261	162	133	180	180	115
	ESD ×10³	190	248	248	138	150	193	193	107
	CP	0.31	0.95	0.96	0.40	0.05	0.94	0.94	0.10
	CL ×10³	738	1021	1021	636	522	707	705	449


Table 1 shows that when C depends on W, the CPL estimator has negligible biases and coverage probabilities close to 95%. However, when C depends on X (Table 2), the coverage probabilities of the CPL estimator for β_X decline dramatically because of substantial bias, and the bias becomes more serious as β_X increases and RR decreases. In Table 3, where X follows a uniform distribution, the CPL estimator has large biases and low coverage probabilities, especially when β_X is large and RR is low. In contrast, the proposed methods perform well, with limited biases and coverage probabilities between 0.91 and 0.96. Table 4 shows that the proposed estimators still perform well in terms of bias and coverage probability compared with the CPL estimator. However, when the sample size increases to n = 2000, the coverage probabilities of the 95% confidence intervals for the proposed estimators may fall below 90%.

Table 2.

Censoring time depends on X; X follows a normal distribution, and X and Z are independent.

		n = 300				n = 600
		Naive	RC	MC	CPL	Naive	RC	MC	CPL
		(β_X, β_Z) = (log (1.5), log (1.5)); RR=0.8
β_X	BIAS ×10³	−80	2	3	−14	−102	−25	−20	−30
	ASE ×10³	133	167	167	118	96	120	120	84
	ESD ×10³	136	171	171	131	93	118	117	87
	CP	0.93	0.94	0.95	0.92	0.73	0.97	0.97	0.90
	CL ×10³	523	655	653	462	375	470	471	331
β_Z	BIAS ×10³	16	15	15	13	−20	−19	−21	−6
	ASE ×10³	161	161	161	105	115	115	115	75
	ESD ×10³	161	162	161	118	109	108	110	74
	CP	0.95	0.95	0.95	0.89	0.97	0.97	0.97	0.97
	CL ×10³	632	632	632	412	451	451	451	294

		(β_X, β_Z) = (log (1.5), log (1.5)); RR=0.5
β_X	BIAS ×10³	−216	−20	−20	−89	−198	11	11	49
	ASE ×10³	102	209	209	150	78	156	156	117
	ESD ×10³	103	211	211	139	69	142	142	115
	CP	0.43	0.92	0.92	0.89	0.23	0.96	0.96	0.925
	CL ×10³	401	821	821	587	289	586	586	409
β_Z	BIAS ×10³	5	7	7	14	7	8	8	12
	ASE ×10³	162	163	164	108	117	117	117	77
	ESD ×10³	168	169	169	107	115	117	117	79
	CP	0.94	0.94	0.94	0.94	0.96	0.96	0.96	0.93
	CL ×10³	523	655	653	462	375	470	471	331

		(β_X, β_Z) = (log (3), log (1.5)); RR=0.8
β_X	BIAS ×10³	−225	−8	−8	−91	−216	4	4	−95
	ASE ×10³	126	159	159	133	88	111	111	94
	ESD ×10³	124	157	157	141	84	106	106	89
	CP	0.58	0.96	0.96	0.87	0.33	0.95	0.95	0.83
	CL ×10³	496	622	622	520	346	434	434	368
β_Z	BIAS ×10³	3	4	4	7	3	3	3	4
	ASE ×10³	143	144	144	109	101	102	102	78
	ESD ×10³	147	146	146	109	98	97	97	82
	CP	0.95	0.94	0.94	0.94	0.98	0.98	0.98	0.93
	CL ×10³	562	563	563	429	398	398	398	305

		(β_X, β_Z) = (log (3), log (1.5)); RR=0.5
β_X	BIAS ×10³	−558	−6	−6	−238	−552	−1	−1	−229
	ASE ×10³	99	207	208	195	70	144	144	139
	ESD ×10³	100	202	202	186	68	147	147	133
	CP	0.00	0.97	0.97	0.74	0.00	0.95	0.95	0.59
	CL ×10³	389	812	814	763	273	565	565	546
β_Z	BIAS ×10³	6	6	6	11	−10	−12	−12	1
	ASE ×10³	151	154	154	125	106	108	108	88
	ESD ×10³	154	158	158	115	110	115	115	95
	CP	0.93	0.95	0.95	0.96	0.97	0.94	0.94	0.92
	CL ×10³	591	603	604	489	416	424	424	345


To summarize, the simulation study shows that the proposed methods effectively correct the bias due to measurement error, even when the conditional normality assumption on X is violated. In contrast, the CPL estimator is biased when C depends on X and is sensitive to the distributional assumption imposed on X, and the naive approach generally has a serious bias problem. We note that the proposed estimators are not consistent in Scenarios 3 and 4 because the normality assumption on X given (W, Z) is violated. Hence, the coverage probabilities of the corresponding 95% confidence intervals may fall below 90% when the sample size is large (such as n = 2000), especially under a skewed measurement error distribution.

5 Data analysis

In this section, we apply the proposed methods to the NPC trial dataset to assess the effect of plasma selenium treatment on SCC recurrences. This randomized, double-blind clinical trial recruited 1312 patients with a history of skin cancer: 653 in the treatment group and 659 in the placebo group. Follow-up lasted up to 12 years.

Many important risk factors for SCC were recorded at baseline, notably the plasma selenium level. As mentioned, the plasma selenium level is measured with error due to instrument error or temporary biological fluctuation. Some patients in the placebo group had more than one plasma selenium measurement, and these can be treated as replicates. However, patients in the treatment group had only one baseline plasma selenium measurement, because their successive measurements cannot represent baseline values. Each new SCC was diagnosed and recorded during follow-up, so the times of SCC occurrences are available.

In this analysis, we consider two covariates: the baseline plasma selenium measurement and the treatment assignment indicator. The latter is the primary covariate of interest. The former is an important adjustment variable that is contaminated with measurement error. Let X be the logarithm of the baseline plasma selenium value (abbreviated as log(selenium)) and Z be the treatment assignment indicator. We assume that SCC recurrences follow a non-homogeneous Poisson process with intensity function $λ(t \mid ν, X, Z) = ν λ_{0}(t) \exp(β_{X} X + β_{Z} Z)$. Here, the frailty variable ν accounts for the correlation among SCC recurrences and between the SCC event process and the informative censoring time. X is independent of Z because the NPC trial is randomized. We assume that X given W is normally distributed. Using the replicate data, the conditional variance of X given W is estimated by ${\hat{σ}}^{2} = {\hat{σ}}_{U}^{2} {\hat{σ}}_{X}^{2} / {\hat{σ}}_{W}^{2} = {0.156}^{2} \cdot {0.133}^{2} / {0.205}^{2} = {0.101}^{2}$.
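The arithmetic of the variance estimate can be reproduced directly. This sketch assumes a single measurement W = X + U with X and U normal and independent, so that σ_W² = σ_X² + σ_U². The reported values satisfy this, since 0.133² + 0.156² = 0.205². The function name is hypothetical. The function also returns the implied reliability ratio σ_X²/σ_W².

```python
import math

def conditional_variance(sigma_u, sigma_x):
    """Var(X | W) for W = X + U with X, U independent normals (one measurement)."""
    sigma_w2 = sigma_x**2 + sigma_u**2           # Var(W) under classical error
    rr = sigma_x**2 / sigma_w2                   # reliability ratio Var(X)/Var(W)
    return sigma_u**2 * sigma_x**2 / sigma_w2, math.sqrt(sigma_w2), rr

var_xw, sigma_w, rr = conditional_variance(sigma_u=0.156, sigma_x=0.133)
print(round(math.sqrt(var_xw), 3), round(sigma_w, 3), round(rr, 2))
```

The script prints `0.101 0.205 0.42`, so the conditional standard deviation matches the value above. Under these assumptions, about 42% of the variance of a single log(selenium) measurement reflects true between-subject variation.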

To check the distributional assumptions imposed on the covariates, we use a subset of 292 placebo-group patients with 10 or more selenium measurements. Because these patients have many replicates, the average of their replicates should be very close to the true plasma selenium level. For the ith patient in this subset, we estimate X_i by ${\hat{X}}_{i} = \sum_{j = 1}^{k_{i}} W_{i j} / k_{i}$ and U_i by Û_i = W_i₁ − X̂_i. Figure 1 shows the histograms of X̂ and Û, which suggest that X and U are each marginally normally distributed. Moreover, the correlation between X̂ and Û is −0.069 (P-value = 0.234). If X and U are jointly normal, zero correlation is equivalent to independence, so this non-significant correlation is consistent with independence between X and U. Hence, the conditional normality assumption on X appears appropriate for the NPC dataset.

Figure 1. Histograms of the estimated true covariate {X̂_i} and the estimated error terms {Û_i}, using 292 placebo-group patients with more than 10 plasma selenium measurements.
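The replicate diagnostic described before Figure 1 needs only a few lines of code. The sketch uses simulated, hypothetical replicates, not the NPC measurements. The subject count, replicate numbers and standard deviations are illustrative choices, and the function name is hypothetical. Because X̂_i and Û_i are built from the same replicates, X̂_i = X_i + Ū_i and Û_i = U_i1 − Ū_i. When X and U are independent, Cov(X̂_i, Û_i) = Cov(Ū_i, U_i1) − Var(Ū_i) = σ_U²/k_i − σ_U²/k_i = 0, so independence predicts a sample correlation near zero.

```python
import numpy as np

def replicate_diagnostics(replicates):
    """Return X_hat_i (replicate mean), U_hat_i = W_i1 - X_hat_i and corr(X_hat, U_hat).

    replicates: list of 1-D arrays, one per subject; element 0 is W_i1.
    """
    x_hat = np.array([np.mean(w) for w in replicates])
    u_hat = np.array([w[0] for w in replicates]) - x_hat
    r = np.corrcoef(x_hat, u_hat)[0, 1]
    return x_hat, u_hat, r

# Hypothetical data (not the NPC measurements): 292 subjects, 10 to 15 replicates each
rng = np.random.default_rng(2016)
x_true = rng.normal(0.0, 0.133, 292)
reps = [x + rng.normal(0.0, 0.156, rng.integers(10, 16)) for x in x_true]
x_hat, u_hat, r = replicate_diagnostics(reps)
print(f"corr(X_hat, U_hat) = {r:.3f}")
```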

Patients in the trial were scheduled for periodic dermatologic examinations. Define the censoring time C as the time from randomization to the last examination, and let τ = 149.5 (months) be the maximum of the C's. Previous recurrent event analyses of the NPC data (Clark et al., 1996; Jiang et al., 1999) assumed non-informative censoring, which may be inappropriate. Figure 2 shows the weighted average number of SCC recurrences versus time for subjects in four selected risk sets (t₁ = 54.9, t₂ = 86.3, t₃ = 115.5, t₄ = 135.2). The number of SCC recurrences by time t for subject i is N_i(t ∧ C_i), where a ∧ b = min(a, b). If the censoring time were independent of the SCC recurrences, all four curves would be expected to lie close to one another. However, subjects who stayed in the trial longer (censoring times after 115.5 and 135.2 months) tended to have fewer SCC recurrences in the early and middle stages. This suggests that the independent censoring assumption is violated and that methods accommodating informative censoring, such as those proposed here, are needed.

Figure 2. Weighted average of SCC recurrences versus time (months since randomization) for subjects in the four selected risk sets (t₁ = 54.9, t₂ = 86.3, t₃ = 115.5, t₄ = 135.2). The weighted average for subjects in the rth risk set at time t is $\sum_{i = 1}^{n} N_{i} (t \land C_{i}) I (C_{i} > t_{r}) / \sum_{i = 1}^{n} I (C_{i} > t_{r})$, for 0 ≤ t ≤ τ = 149.5 and r = 1, 2, 3, 4.
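The curves in Figure 2 can be computed directly from the caption's formula. The following minimal sketch runs on hypothetical toy data, and the function and argument names are illustrative. It assumes at least one subject satisfies C_i > t_r, since otherwise the denominator is zero.

```python
import numpy as np

def risk_set_average(event_times, cens_times, t_grid, t_r):
    """Average of N_i(t ∧ C_i) over subjects with C_i > t_r, for each t in t_grid.

    event_times: list of 1-D arrays of recurrent event times, one per subject
    cens_times:  censoring times C_i
    """
    cens = np.asarray(cens_times, dtype=float)
    in_set = cens > t_r                                    # I(C_i > t_r)
    curve = []
    for t in t_grid:
        counts = [np.sum(np.asarray(ev) <= min(t, c))      # N_i(t ∧ C_i)
                  for ev, c in zip(event_times, cens)]
        curve.append(np.dot(in_set, counts) / in_set.sum())
    return np.array(curve)

events = [np.array([1.0, 3.0]), np.array([2.0]), np.array([])]
print(risk_set_average(events, [5.0, 2.5, 10.0], [2.0, 4.0], t_r=3.0))
```

Under independent censoring, the curves for different t_r should roughly coincide over the early period. Systematically lower curves for larger t_r are the pattern described above.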

After excluding 55 patients without any examination or SCC event records and 2 without baseline plasma selenium measurements, we included 1255 patients in the analysis to fit the semiparametric model for SCC recurrences. Among these patients, 473 had at least one SCC occurrence. The fitted model is summarized in Table 5. Because the RC and MC estimates are identical, only the RC estimate is shown. The treatment effect estimates from all approaches are positive but statistically non-significant; that is, selenium supplementation shows no significant effect in preventing SCC recurrence. This result is consistent with previous studies (Clark et al., 1996; Jiang et al., 1999). Attenuation is also visible in the naive estimate for log(selenium). The RC and MC methods give an adjusted estimate of −1.502, compared with −0.555 for the naive method. This adjusted estimate has Z = −1.902, which is borderline significant (two-sided P ≈ 0.057, just above the 5% level). It suggests that patients with higher baseline plasma selenium levels tend to have fewer SCC recurrences.

Table 5.

Regression analysis of the SCC recurrences in the NPC trial

		Naive	RC	CPL
log(Selenium)	EST	−0.555	−1.502	−1.109
	SE	0.292	0.790	0.842
	Z-value	−1.897	−1.902	−1.317

Treatment	EST	0.185	0.223	0.125
	SE	0.140	0.141	0.125
	Z-value	1.317	1.581	1.002


Note: EST denotes the estimate, SE denotes the standard error which is estimated by the square root of the asymptotic variance estimator. The MC estimates are identical to the RC estimates.
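The Z-values in Table 5 are Wald statistics, EST/SE. The following sketch recomputes them from the rounded table entries and adds two-sided normal P-values. Because the inputs are rounded, small discrepancies in the third decimal are expected. The function name is hypothetical.

```python
import math

def wald_test(est, se):
    """Wald statistic and two-sided normal P-value, 2 * (1 - Phi(|z|))."""
    z = est / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

table5 = {
    "log(Selenium), Naive": (-0.555, 0.292),
    "log(Selenium), RC":    (-1.502, 0.790),
    "log(Selenium), CPL":   (-1.109, 0.842),
    "Treatment, Naive":     (0.185, 0.140),
    "Treatment, RC":        (0.223, 0.141),
    "Treatment, CPL":       (0.125, 0.125),
}
for name, (est, se) in table5.items():
    z, p = wald_test(est, se)
    print(f"{name:22s} Z = {z:6.3f}  two-sided P = {p:.3f}")
```

For the RC estimate of log(selenium), Z ≈ −1.90, which gives a two-sided P-value of about 0.057. This is just above the conventional 0.05 threshold.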

6 Discussion

To identify population-level risk factors in recurrent event analysis, inference on the rate function is commonly preferred. Existing methods rely on either accurately measured covariates or independent censoring, and these assumptions may not always be realistic. In this article, we consider statistical methods for recurrent event data with both measurement error and informative censoring. When the error-prone covariates are conditionally normally distributed, the proposed estimators are consistent. The estimating procedure requires no additional assumptions about the frailty distribution or the censoring time. The numerical results show two things. First, the naive method, which ignores measurement errors in the covariates, leads to severely biased estimators. Second, the CPL method depends strongly on independence between the covariates and the censoring time. In contrast, the proposed methods correct for measurement error effectively and give confidence intervals with close-to-nominal coverage in the simulation scenarios considered.

The corrected methods in this paper are developed under parametric distributions for the covariates and measurement errors. In the NPC data example, these distributional assumptions can be checked using the many available replicates. In practice, however, there may not be enough information to validate them. To relax such assumptions, a nonparametric correction method similar to that of Huang and Wang (2000) for Cox regression with measurement error might be developed. However, extending nonparametric correction to the regression analysis of recurrent event data is not straightforward and warrants future research. The idea of measurement error correction can also be applied to panel count data, in which the numbers of events are observed only at several random times.

Supplementary Material

Supplemental Material

NIHMS870416-supplement-Supplemental_Material.pdf (45.2 KB, PDF)

Acknowledgments

We thank the editor and referees for their very helpful comments and suggestions that greatly improved the paper. This research was partially supported by Taiwan Ministry of Science and Technology MOST 104-2118-M-007-002 (Cheng and Yu), National Institutes of Health grants CA53996, ES017030, HL121347, and MH105857 (Wang), and a travel award from the Mathematics Research Promotion Center of National Science Council of Taiwan (Wang).

Appendices

A Asymptotic properties

Let ℬ_R and ℬ be compact neighborhoods of β_R and β, respectively, where β_R and β are the roots of the limits of the RC and MC estimating equations. In addition, denote $\mathcal{W}_{i} = (1, \bar{W}_{i}', Z_{i}')'$ and $\mathcal{X}_{i} = (1, E(X_{i} \mid \bar{W}_{i}, Z_{i}, γ)', Z_{i}')'$. To prove the asymptotic properties of the proposed estimators, we impose the following regularity conditions:

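(a1) Λ₀(τ) > 0;
(a2) Pr(C ≥ τ, ν > 0) > 0;
(a3) G(u) ≡ ℰ[νI(C ≥ u)] is a continuous function for u ∈ [0, τ];
(a4) $E\{\sup_{b \in \mathcal{B}} \mathcal{W}\mathcal{W}' \exp(D'(b, η)\mathcal{W})\}$ and $E\{\sup_{b \in \mathcal{B}_{R}} \mathcal{X}\mathcal{X}' \exp(b'\mathcal{X})\}$ are bounded. Moreover, $E\{\mathcal{W}\mathcal{W}' \exp(D'(β, η)\mathcal{W})\}$ and $E\{\mathcal{X}\mathcal{X}' \exp(β_{R}'\mathcal{X})\}$ are non-singular.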

Note that condition (a4) can be satisfied under the normality assumption imposed on the covariates.

Define Q₁(t) ≡ G(t)Λ₀(t), $Q_{2} (t) \equiv \int_{0}^{t} G (u) d Λ_{0} (u)$ . Under conditions (a1) through (a3), Wang et al. (2001) showed that

\hat{Φ}(t) - Φ(t) = \frac{1}{n} \sum_{i=1}^{n} Φ(t) d_{i}(t) + o_{p}(n^{-1/2}), \quad \inf\{s : Λ_{0}(s) > 0\} < t \leq τ,

(6)

where $d_{i}(t) \equiv \sum_{j=1}^{m_{i}} \left\{ \int_{t}^{τ} \frac{I(T_{ij} \leq u \leq C_{i})}{Q_{1}^{2}(u)} \, dQ_{2}(u) - \frac{I(t < T_{ij} \leq τ)}{Q_{1}(T_{ij})} \right\}$ are iid terms with zero expectation. By the central limit theorem, $\sqrt{n}(\hat{Φ}(t) - Φ(t))$ converges in distribution to a normal distribution with mean zero and variance $Φ^{2}(t) E[d_{i}^{2}(t)]$.

By using the method of moments, the nuisance parameter estimator γ̂ is obtained by solving

n^{-1} \sum_{i=1}^{n} Ψ_{i}(γ) = n^{-1} \sum_{i=1}^{n} \begin{pmatrix} k_{i}(\bar{W}_{i} - μ_{X}) \\ Z_{i} - μ_{Z} \\ \sum_{j=1}^{k_{i}} (W_{ij} - \bar{W}_{i})(W_{ij} - \bar{W}_{i})' - (k_{i} - 1)\Sigma_{U} \\ k_{i}(\bar{W}_{i} - μ_{X})(\bar{W}_{i} - μ_{X})' - \Sigma_{U} - k_{i}\Sigma_{X} \\ (Z_{i} - μ_{Z})(Z_{i} - μ_{Z})' - \Sigma_{Z} \\ (\bar{W}_{i} - μ_{X})(Z_{i} - μ_{Z})' - \Sigma_{XZ} \end{pmatrix} = 0,

where the Ψ_i(γ) are iid terms. Standard M-estimation arguments (Huber, 2009) show that γ̂ converges in probability to γ. Let R ≡ ℰ{−∂Ψ_i(γ)/∂γ′}, where R is non-singular under condition (a4). Then, by a Taylor expansion,

\hat{γ} - γ = R^{- 1} n^{- 1} \sum_{i = 1}^{n} Ψ_{i} (γ) + o_{p} (n^{- 1 / 2}) .

(7)

By the central limit theorem, $\sqrt{n}(\hat{γ} - γ)$ converges in distribution to a normal distribution with mean zero and covariance matrix R⁻¹ℰ{Ψ_i(γ)Ψ_i(γ)′}(R⁻¹)′.
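For a single error-prone covariate and a single Z (p = q = 1), the moment equations for γ have closed-form roots, taken row by row. The fitted γ̂ then gives the calibration function E(X | W̄, Z) that enters 𝒳_i. The sketch below assumes that (X, Z) is jointly normal with U independent, so that E(X | W̄_i, Z_i) is the usual best linear predictor given Var(W̄_i) = Σ_X + Σ_U/k_i. The function names are hypothetical, and the code illustrates the equations rather than reproducing the authors' code.

```python
import numpy as np

def moment_estimates(replicates, z):
    """Roots of the moment equations for gamma with scalar X and Z (p = q = 1).

    replicates: list of 1-D arrays (W_i1, ..., W_ik_i); at least one k_i >= 2
    z:          1-D array of Z_i
    Returns means, variances and a covariance (not standard deviations).
    """
    z = np.asarray(z, dtype=float)
    k = np.array([len(w) for w in replicates], dtype=float)
    wbar = np.array([np.mean(w) for w in replicates])
    n = len(k)
    mu_x = np.sum(k * wbar) / np.sum(k)                                 # row 1
    mu_z = np.mean(z)                                                   # row 2
    within = sum(np.sum((np.asarray(w) - np.mean(w)) ** 2) for w in replicates)
    var_u = within / np.sum(k - 1.0)                                    # row 3
    var_x = (np.sum(k * (wbar - mu_x) ** 2) - n * var_u) / np.sum(k)    # row 4
    var_z = np.mean((z - mu_z) ** 2)                                    # row 5
    cov_xz = np.mean((wbar - mu_x) * (z - mu_z))                        # row 6
    return dict(mu_x=mu_x, mu_z=mu_z, var_u=var_u, var_x=var_x,
                var_z=var_z, cov_xz=cov_xz)

def calibrated_x(wbar, k, z, g):
    """E(X | Wbar_i, Z_i), assuming (X, Z, U) jointly normal with U independent."""
    c = np.array([g["var_x"], g["cov_xz"]])                  # Cov(X, (Wbar_i, Z_i))
    out = np.empty(len(wbar))
    for i in range(len(wbar)):
        v = np.array([[g["var_x"] + g["var_u"] / k[i], g["cov_xz"]],
                      [g["cov_xz"], g["var_z"]]])            # Var((Wbar_i, Z_i))
        dev = np.array([wbar[i] - g["mu_x"], z[i] - g["mu_z"]])
        out[i] = g["mu_x"] + c @ np.linalg.solve(v, dev)
    return out
```

When k_i varies across subjects, the calibration coefficients also vary with k_i, which is why the code solves a 2 × 2 system for each subject.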

Given the consistency of γ̂ and of Φ̂(t) for all t ∈ [0, τ], we can prove the following propositions, whose proofs are given in the Supplementary Information. Let V denote the joint distribution of (W, Z, m, C), and define Π ≡ ∂𝒳/∂γ′ and Γ ≡ ∂D/∂γ′. Let

\begin{array}{l} g_{i} = \mathcal{X}_{i} \left\{ \frac{m_{i}}{Φ(C_{i})} - e^{β_{R}' \mathcal{X}_{i}} \right\} - \int \frac{\mathcal{X} m \, d_{i}(C)}{Φ(C)} \, dV \\ \quad + \int \left\{ \left( \frac{m}{Φ(C)} - e^{β_{R}' \mathcal{X}} \right) I_{1+p+q} - e^{β_{R}' \mathcal{X}} (β_{R}' \otimes \mathcal{X}) \right\} Π \, dV \, R^{-1} Ψ_{i}(γ), \end{array}

and

h_{i} = \mathcal{W}_{i} \left\{ \frac{m_{i}}{Φ(C_{i})} - e^{D'(β, η) \mathcal{W}_{i}} \right\} - \int \frac{m \mathcal{W} d_{i}(C)}{Φ(C)} \, dV - \left\{ \int \mathcal{W} \mathcal{W}' e^{D'(β, η) \mathcal{W}} \, dV \right\} Γ R^{-1} Ψ_{i}(γ).

Proposition 1

Under conditions (a1) through (a4), β̂_R converges in probability to β_R. Furthermore, $\sqrt{n}(\hat{β}_{R} - β_{R})$ is asymptotically normal with mean zero and covariance matrix A⁻¹Σ_g(A⁻¹)′, where $A = E(-\partial g_{i}/\partial β_{R}')$ and $\Sigma_{g} = E(g_{i} g_{i}')$.

Proposition 2

Under conditions (a1) through (a4), β̂_M converges in probability to β. Furthermore, $\sqrt{n}(\hat{β}_{M} - β)$ is asymptotically normal with mean zero and covariance matrix B⁻¹Σ_h(B⁻¹)′, where B = ℰ(−∂h_i/∂β′) and $\Sigma_{h} = E(h_{i} h_{i}')$.

B Covariance estimation of RC

To estimate the covariance matrix of the RC estimator, we first describe estimators of the covariance of $\sqrt{n}(\hat{γ} - γ)$ and of the variance of $\sqrt{n}(\hat{Φ}(t) - Φ(t))$, t ∈ [0, τ].

Let $R_{n} = -n^{-1} \sum_{i=1}^{n} \partial Ψ_{i}(γ)/\partial γ' |_{γ = \hat{γ}}$, which estimates R, and $\hat{Π}_{i} = \partial \mathcal{X}_{i}/\partial γ' |_{γ = \hat{γ}}$. The covariance matrix of $\sqrt{n}(\hat{γ} - γ)$ can be estimated by $R_{n}^{-1} n^{-1} \sum_{i=1}^{n} Ψ_{i}(\hat{γ}) Ψ_{i}(\hat{γ})' (R_{n}^{-1})'$. Define $\hat{Q}_{1}(u) = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m_{i}} I(T_{ij} \leq u \leq C_{i})$, $d\hat{Q}_{2}(u) = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m_{i}} I(T_{ij} = u)$, and

\hat{d_{i}} (t) = \sum_{j = 1}^{m_{i}} [\sum_{T_{(l)} \in [t, τ]} \frac{I (T_{i j} \leq T_{(l)} \leq C_{i}) d {\hat{Q}}_{2} (T_{(l)})}{{\hat{Q}}_{1} {(T_{(l)})}^{2}} - \frac{I (t < T_{i j} \leq τ)}{{\hat{Q}}_{1} (T_{i j})}],

where the $T_{(l)}$ are the ordered distinct values of $\{T_{ij}\}_{i=1,\dots,n;\, j=1,\dots,m_{i}}$. Following Wang et al. (2001), the variance of $\sqrt{n}(\hat{Φ}(t) - Φ(t))$ can be consistently estimated by $\hat{Φ}^{2}(t) n^{-1} \sum_{i=1}^{n} \hat{d}_{i}^{2}(t)$.
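Φ̂(t), d̂_i(t) and the variance estimator can be coded directly from Q̂₁ and dQ̂₂. This sketch assumes that Φ̂ is the product-limit estimator of Wang et al. (2001), Φ̂(t) = ∏_{t < T_(l) ≤ τ}{1 − dQ̂₂(T_(l))/Q̂₁(T_(l))}. It also takes τ to be the largest censoring time and assumes all event times precede censoring. It is a brute-force illustration on hypothetical toy data, not an efficient implementation.

```python
import numpy as np

def phi_hat_and_variance(event_times, cens_times, t):
    """Phi_hat(t), d_hat_i(t) and Phi_hat(t)^2 * n^{-1} sum_i d_hat_i(t)^2."""
    events = [np.asarray(e, dtype=float) for e in event_times]
    cens = np.asarray(cens_times, dtype=float)
    n, tau = len(cens), cens.max()
    all_t = np.concatenate(events)
    grid = np.unique(all_t)                                   # ordered distinct T_(l)
    # Q1_hat(u) = n^{-1} sum_i sum_j I(T_ij <= u <= C_i)
    q1 = np.array([sum(np.sum(e <= u) * (u <= c) for e, c in zip(events, cens))
                   for u in grid]) / n
    # dQ2_hat(u) = n^{-1} sum_i sum_j I(T_ij = u)
    dq2 = np.array([np.sum(all_t == u) for u in grid]) / n
    after_t = (grid > t) & (grid <= tau)
    phi = np.prod(1.0 - dq2[after_t] / q1[after_t])           # product-limit form

    window = (grid >= t) & (grid <= tau)                      # T_(l) in [t, tau]
    d_hat = np.zeros(n)
    for i, (e, c) in enumerate(zip(events, cens)):
        for tij in e:
            at_risk = window & (tij <= grid) & (grid <= c)    # I(T_ij <= T_(l) <= C_i)
            d_hat[i] += np.sum(dq2[at_risk] / q1[at_risk] ** 2)
            if t < tij <= tau:
                d_hat[i] -= 1.0 / q1[np.searchsorted(grid, tij)]
    return phi, d_hat, phi**2 * np.mean(d_hat**2)

phi, d_hat, var_est = phi_hat_and_variance([[1.0, 4.0], [2.0], [3.0, 5.0]],
                                           [6.0, 3.0, 7.0], t=2.5)
print(phi, d_hat.sum(), var_est)
```

For this toy data, Φ̂(2.5) = (1 − 1/3)(1 − 1/3)(1 − 1/4) = 1/3. Because 2.5 is not an event time, the d̂_i(2.5) sum to zero, which matches the zero-mean property of d_i(t).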

Let ⊗ denote the Kronecker product and I_a the a × a identity matrix. Let $\hat{\mathcal{X}}_{i} = (1, E(X_{i} \mid \bar{W}_{i}, Z_{i}, \hat{γ})', Z_{i}')'$. Finally, the covariance matrix of $\sqrt{n}(\hat{β}_{R} - β_{R})$ can be consistently estimated by $A_{n}^{-1} \hat{\Sigma}_{g} (A_{n}^{-1})'$, where $A_{n} = n^{-1} \sum_{i=1}^{n} \hat{\mathcal{X}}_{i} \hat{\mathcal{X}}_{i}' e^{\hat{β}_{R}' \hat{\mathcal{X}}_{i}}$ and $\hat{\Sigma}_{g} = n^{-1} \sum_{i=1}^{n} \hat{g}_{i} \hat{g}_{i}'$, with

\begin{array}{l} \hat{g}_{i} = \hat{\mathcal{X}}_{i} \left\{ \frac{m_{i}}{\hat{Φ}(C_{i})} - e^{\hat{β}_{R}' \hat{\mathcal{X}}_{i}} \right\} - n^{-1} \sum_{j=1}^{n} \frac{\hat{\mathcal{X}}_{j} m_{j} \hat{d}_{i}(C_{j})}{\hat{Φ}(C_{j})} \\ \quad + n^{-1} \sum_{j=1}^{n} \left\{ \left( \frac{m_{j}}{\hat{Φ}(C_{j})} - e^{\hat{β}_{R}' \hat{\mathcal{X}}_{j}} \right) I_{1+p+q} - e^{\hat{β}_{R}' \hat{\mathcal{X}}_{j}} (\hat{β}_{R}' \otimes \hat{\mathcal{X}}_{j}) \right\} \hat{Π}_{j} R_{n}^{-1} Ψ_{i}(\hat{γ}). \end{array}
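Once A_n and the ĝ_i are available, the sandwich step is generic. The sketch below takes both as given inputs, since computing ĝ_i requires the full RC machinery above. It shows only how the covariance of √n(β̂_R − β_R) is assembled and converted into standard errors for β̂_R by dividing by n. The function name is hypothetical, and the same function applies to the MC estimator with B_n and ĥ_i in place of A_n and ĝ_i.

```python
import numpy as np

def sandwich_covariance(a_n, g_hat):
    """A_n^{-1} Sigma_g (A_n^{-1})' with Sigma_g = n^{-1} sum_i g_i g_i'.

    a_n:   (d, d) estimated derivative matrix
    g_hat: (n, d) array whose i-th row is g_hat_i
    Returns the covariance estimate for sqrt(n)(beta_hat - beta) and the
    standard errors of beta_hat itself.
    """
    g_hat = np.asarray(g_hat, dtype=float)
    n = g_hat.shape[0]
    sigma_g = g_hat.T @ g_hat / n
    a_inv = np.linalg.inv(a_n)
    cov_root_n = a_inv @ sigma_g @ a_inv.T
    se = np.sqrt(np.diag(cov_root_n) / n)   # square root of the asymptotic variance / n
    return cov_root_n, se
```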

C Covariance estimation of MC

Let $\hat{D} = D(\hat{β}_{M}, \hat{η})$ and $\hat{Γ} = \partial D/\partial γ' |_{γ = \hat{γ}}$. The covariance matrix of $\sqrt{n}(\hat{β}_{M} - β)$ can be consistently estimated by $B_{n}^{-1} \hat{\Sigma}_{h} (B_{n}^{-1})'$, where

B_{n} = n^{-1} \sum_{i=1}^{n} \mathcal{W}_{i} \{ \mathcal{W}_{i}' \otimes \frac{\partial D(β, \hat{η})}{\partial β'} |_{β = \hat{β}_{M}} \} e^{\hat{D}' \mathcal{W}_{i}},

and $\hat{\Sigma}_{h} = n^{-1} \sum_{i=1}^{n} \hat{h}_{i} \hat{h}_{i}'$, with

\hat{h}_{i} = \mathcal{W}_{i} \left\{ \frac{m_{i}}{\hat{Φ}(C_{i})} - e^{\hat{D}' \mathcal{W}_{i}} \right\} - n^{-1} \sum_{j=1}^{n} \frac{m_{j} \mathcal{W}_{j} \hat{d}_{i}(C_{j})}{\hat{Φ}(C_{j})} - \left\{ n^{-1} \sum_{j=1}^{n} \mathcal{W}_{j} \mathcal{W}_{j}' e^{\hat{D}' \mathcal{W}_{j}} \right\} \hat{Γ} R_{n}^{-1} Ψ_{i}(\hat{γ}).

D Proof of RC = MC for regression parameters

Recall that $E(X_{i} \mid \bar{W}_{i}, Z_{i}, γ) = η_{0} + η_{W} \bar{W}_{i} + η_{Z} Z_{i}$, where η₀, η_W and η_Z are functions of γ. Let $0_{r \times s}$ be an r × s matrix of zeros. With simple algebra, we can write 𝒳_i = H𝒲_i for all i = 1, …, n, where

H = (\begin{matrix} 1 & 0_{1 \times p} & 0_{1 \times q} \\ η_{0} & η_{W} & η_{Z} \\ 0_{q \times 1} & 0_{q \times p} & I_{q} \end{matrix}) .

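Because H is the same for all i = 1, …, n, for any fixed γ, substitute 𝒳_i = H𝒲_i into equation (4) and premultiply by H⁻¹. H⁻¹ exists provided η_W is non-singular, since det H = det η_W. Equation (4) can then be written as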

n^{-1} \sum_{i=1}^{n} \mathcal{W}_{i} \left\{ m_{i} \hat{Φ}^{-1}(C_{i}) - e^{(H' \hat{β}_{R})' \mathcal{W}_{i}} \right\} = 0.

(8)

Recall that b̂_N is the unique root of the naive estimating equation

n^{-1} \sum_{i=1}^{n} \mathcal{W}_{i} \left\{ m_{i} \hat{Φ}^{-1}(C_{i}) - e^{b' \mathcal{W}_{i}} \right\} = 0.

Equation (8) has the same form as the equation above, with b = H′β̂_R. By uniqueness of the root, H′β̂_R = b̂_N. Moreover, by definition, b̂_N = D(b̂_M, γ) for any fixed γ. Therefore, we have

\begin{pmatrix} \hat{β}_{R,0} + η_{0} \hat{β}_{R,X} \\ η_{W}' \hat{β}_{R,X} \\ η_{Z}' \hat{β}_{R,X} + \hat{β}_{R,Z} \end{pmatrix} = H' \hat{β}_{R} = D(\hat{b}_{M}, γ) = \begin{pmatrix} \hat{β}_{M,0} + η_{0} \hat{β}_{M,X} + \frac{1}{2} \hat{β}_{M,X}' \Sigma \hat{β}_{M,X} \\ η_{W}' \hat{β}_{M,X} \\ η_{Z}' \hat{β}_{M,X} + \hat{β}_{M,Z} \end{pmatrix},

for any fixed γ. Provided η_W is non-singular, the second component gives $\hat{β}_{R,X} = \hat{β}_{M,X}$. The third component then gives $\hat{β}_{R,Z} = \hat{β}_{M,Z}$, and the first gives $\hat{β}_{R,0} = \hat{β}_{M,0} + \frac{1}{2} \hat{β}_{M,X}' \Sigma \hat{β}_{M,X}$. Hence, the proof is complete.
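The key step, H′β̂_R = b̂_N, is easy to confirm numerically. The sketch below solves the naive equation on 𝒲_i and the RC equation on 𝒳_i = H𝒲_i with the same Newton-Raphson routine. It uses simulated values in place of m_i/Φ̂(C_i) and hypothetical values of η₀, η_W and η_Z, with scalar X and Z (p = q = 1). As in the proof, H is taken to be the same for every subject. All names are illustrative.

```python
import numpy as np

def solve_exp_equation(design, y, n_iter=50):
    """Newton-Raphson root of n^{-1} sum_i v_i {y_i - exp(b'v_i)} = 0 (rows v_i of design)."""
    b = np.zeros(design.shape[1])
    b[0] = np.log(np.mean(y))                                # intercept-only starting value
    for _ in range(n_iter):
        mu = np.exp(design @ b)
        score = design.T @ (y - mu) / len(y)
        info = (design * mu[:, None]).T @ design / len(y)    # minus the Jacobian
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < 1e-12:
            break
    return b

rng = np.random.default_rng(7)
n = 500
wbar, z = rng.normal(0.0, 1.0, n), rng.binomial(1, 0.5, n)
W = np.column_stack([np.ones(n), wbar, z])                   # rows (1, Wbar_i, Z_i)
# hypothetical stand-in for m_i / Phi_hat(C_i)
y = rng.poisson(np.exp(0.2 + 0.3 * wbar + 0.4 * z)) / rng.uniform(0.5, 1.0, n)

eta0, eta_w, eta_z = 0.1, 0.6, 0.2                           # hypothetical calibration coefficients
H = np.array([[1.0, 0.0, 0.0],
              [eta0, eta_w, eta_z],
              [0.0, 0.0, 1.0]])
X = W @ H.T                                                  # rows (H W_i)' = (1, E(X_i | Wbar_i, Z_i), Z_i)
b_naive = solve_exp_equation(W, y)
beta_rc = solve_exp_equation(X, y)
print("H' beta_R equals b_N:", np.allclose(H.T @ beta_rc, b_naive))
```

If both solves converge, the check holds up to numerical tolerance for any invertible H. This invariance is what the proof uses. It also shows directly that β̂_{R,X} = b̂_{N,W}/η_W here.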

Footnotes

Supplementary Materials

The Supplementary Information referenced in this paper is available at the website of the Journal.

References

Balakrishnan N, Peng Y. Generalized gamma frailty model. Statistics in Medicine. 2006;25:2797–2816. doi: 10.1002/sim.2375.
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective. London: Chapman & Hall; 2006.
Clark LC, Combs GF, Turnbull BW, Slate EH, Chalker DK, Chow J, Davis LS, Glover RA, Graham GF, Gross EG, et al. Effects of selenium supplementation for cancer prevention in patients with carcinoma of the skin: a randomized controlled trial. Journal of the American Medical Association. 1996;276:1957–1963.
Cook RJ, Lawless JF. The statistical analysis of recurrent events. New York: Springer; 2007.
Fleming TR, Harrington DP. Counting processes and survival analysis. New York: John Wiley & Sons; 1991.
Hu XJ, Lagakos SW. Nonparametric estimation of the mean function of a stochastic process with missing observations. Lifetime Data Analysis. 2007;13:51–73. doi: 10.1007/s10985-006-9030-0.
Hu XJ, Lagakos SW, Lockhart RA. Generalized least squares estimation of the mean function of a counting process based on panel counts. Statistica Sinica. 2009;19:561–580.
Hu XJ, Lawless JF. Estimation of rate and mean functions from truncated recurrent event data. Journal of the American Statistical Association. 1996;91:300–310.
Huang Y, Wang CY. Cox regression with accurate covariates unascertainable: a nonparametric-correction approach. Journal of the American Statistical Association. 2000;95:1209–1219.
Huber PJ. Robust statistics. New Jersey: John Wiley & Sons; 2009.
Jiang W, Turnbull BW, Clark LC. Semiparametric regression models for repeated events with random effects and measurement error. Journal of the American Statistical Association. 1999;94:111–124.
Kalbfleisch JD, Schaubel DE, Ye Y, Gong Q. An estimating function approach to the analysis of recurrent and terminal events. Biometrics. 2013;69:366–374. doi: 10.1111/biom.12025.
Lancaster T, Intrator O. Panel data with survival: hospitalization of HIV-positive patients. Journal of the American Statistical Association. 1998;93:46–53.
Lawless JF, Hu J, Cao J. Methods for the estimation of failure distributions and rates from automobile warranty data. Lifetime Data Analysis. 1995;1:227–240. doi: 10.1007/BF00985758.
Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–168.
Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika. 1999;86:59–70.
Liu W, Wu L. Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics. 2007;63:342–350. doi: 10.1111/j.1541-0420.2006.00687.x.
Mazroui Y, Mathoulin-Pelissier S, Soubeyran P, Rondeau V. General joint frailty model for recurrent event data with a dependent terminal event: application to follicular lymphoma data. Statistics in Medicine. 2012;31:1162–1176. doi: 10.1002/sim.4479.
Morgan WJ, Butler SM, Johnson CA, Colin AA, FitzSimmons SC, Geller DE, Konstan MW, Light MJ, Rabin HR, Regelmann WE, et al. Epidemiologic study of cystic fibrosis: design and implementation of a prospective, multicenter, observational study of patients with cystic fibrosis in the US and Canada. Pediatric Pulmonology. 1999;28:231–241. doi: 10.1002/(sici)1099-0496(199910)28:4<231::aid-ppul1>3.0.co;2-2.
Nakamura T. Proportional hazards model with covariates subject to measurement error. Biometrics. 1992;48:829–838.
Nielsen GG, Gill RD, Andersen PK, Sørensen TI. A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics. 1992;19:25–43.
Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69:331–342.
Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981;68:373–379.
Stefanski LA. The effects of measurement error on parameter estimation. Biometrika. 1985;72:583–592.
Wang CY. Robust sandwich covariance estimation for regression calibration estimator in Cox regression with measurement error. Statistics and Probability Letters. 1999;45:371–378.
Wang CY, Hsu L, Feng ZD, Prentice RL. Regression calibration in failure time regression. Biometrics. 1997;53:131–145.
Wang MC, Huang CY. Statistical inference methods for recurrent event processes with shape and size parameters. Biometrika. 2014;101:553–566. doi: 10.1093/biomet/asu016.
Wang MC, Qin J, Chiang CT. Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association. 2001;96:1057–1065. doi: 10.1198/016214501753209031.
Wu L. A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association. 2002;97:955–964.
Wu L, Liu W, Hu XJ. Joint inference on HIV viral dynamics and immune suppression in presence of measurement errors. Biometrics. 2010;66:327–335. doi: 10.1111/j.1541-0420.2009.01308.x.
Zeng D, Ibrahim J, Chen M, Hu K, Jia C. Multivariate recurrent events in the presence of multivariate informative censoring with applications to bleeding and transfusion events in myelodysplastic syndrome. Journal of Biopharmaceutical Statistics. 2014;24:429–442. doi: 10.1080/10543406.2013.860159.

Supplementary Materials

Supplemental Material: NIHMS870416-supplement-Supplemental_Material.pdf (45.2 KB, PDF)
