Abstract
We consider the proportional hazards model in which the covariates include the discretized categories of a continuous time-dependent exposure variable measured with error. Naively ignoring the measurement error in the analysis may cause biased estimation and erroneous inference. Although various approaches have been proposed to deal with measurement error when the hazard depends linearly on the time-dependent variable, it has not yet been investigated how to correct when the hazard depends on the discretized categories of the time-dependent variable. To fill this gap in the literature, we propose a smoothed corrected score approach based on approximation of the discretized categories after smoothing the indicator function. The consistency and asymptotic normality of the proposed estimator are established. The observation times of the time-dependent variable are allowed to be informative. For comparison, we also extend to this setting two approximate approaches, the regression calibration and the risk-set regression calibration. The methods are assessed by simulation studies and by application to data from an HIV clinical trial.
Keywords: corrected score, regression calibration, smoothing, survival
1 | INTRODUCTION
In biomedical studies, it is often of interest to characterize the relationship between a time-to-event outcome and some covariates. Time-dependent exposure variables are usually collected intermittently over time, and the observations may fluctuate due to biological variation or inaccurate measurement, which induces the issue of measurement error. An example is the AIDS Clinical Trials Group (ACTG) 175 study, which aimed to compare four antiretroviral therapies in HIV-infected subjects (Hammer et al., 1996). During the study, 2467 subjects were recruited between December 1991 and October 1992 and followed until November 1994. CD4 count, as a reflection of immune status, was scheduled to be measured for each participant about every 12 weeks after randomization. A CD4 count below 350 cells/mm3 was used in the past as a threshold to guide whether to start antiretroviral therapy for HIV-infected subjects (World Health Organization, 2009). This threshold was also used as a cut-off point in assessing survival time (May et al., 2014). It is of interest to investigate whether I(CD4 count ≤ 350 cells/mm3) (I(·) is the indicator function) could be used as a surrogate marker for time to AIDS. One complication is that observations of CD4 count are subject to substantial biological variation and measurement error.
It is well known that measurement error in continuous covariates may cause bias (Carroll et al., 2006). Various approaches have been proposed to deal with measurement error for continuous time-dependent covariates, usually based on joint modeling of the observed longitudinal data and survival data. These include the regression calibration (RC) (Prentice, 1982; Tsiatis et al., 1995; Liao et al., 2011), likelihood-based approaches (Wulfsohn and Tsiatis, 1997; Song, 2002a; Xu et al., 2014), conditional score (Tsiatis and Davidian, 2001; Song, 2002b), and corrected score (Wang, 2006) approaches. The latter two approaches are more flexible without distributional assumptions on the underlying true covariates, and much simpler in implementation. Song (2017) proposed an improved corrected score approach that is more efficient and also allows the observation times to be informative; that is, the observation times may depend on the underlying longitudinal process or the observed survival time. For example, patients with more severe disease status may appear more often for hospital check-ups in observational studies or have more missing observations during follow-ups in clinical trials. In the ACTG 175 study, some patients had missing CD4 measurements before the event or censoring time. The missing rate seems significantly associated with features of the CD4 count trajectory (Song, 2017), which indicates that the observation times of the CD4 count may be informative.
Misclassification of discrete covariates may cause bias as well (Gustafson, 2004). When the discretized categories of a continuous exposure variable are used as covariates, such as the categories induced by dichotomizing CD4 counts, the measurement error in the exposure variable may lead to misclassification of the discretized covariates and consequently biased estimation of the covariate effects (Flegal et al., 1991). When the error-prone exposure variable is not time-dependent, approaches have been proposed to deal with misclassification of the corresponding discretized categories under the framework of linear regression (Gustafson and Le, 2002; Natarajan, 2009), logistic regression (Dalen et al., 2009), generalized linear regression (Wang et al., 2016), and Cox regression (Seguin et al., 2014). However, to the best of our knowledge, this issue has not yet been investigated when the exposure variable is time-dependent.
To fill this gap, we first adopt a linearization-based RC approach through joint modeling of the longitudinal exposure process and the survival time. We further improve the approach by calibration within each risk set. In addition, we propose a smoothed corrected score (SCS) approach. This approach inherits the advantage of the existing corrected score approach in its simplicity in implementation and robustness to deviation from normal error, and it works reasonably well even if the observation times of the time-dependent exposure variable are informative.
The idea of approximating indicator functions by smooth functions has been considered in the context of the binary response model (Horowitz, 1992) and rank-based approaches (Ma and Huang, 2005; Heller, 2007; Song, 2007), when the covariates are not measured with error. A common purpose of smoothing in these studies is to overcome the computational difficulty caused by the discontinuity of the indicator function. This is different from our idea of smoothing in this study: to obtain approximately normal independent variables conditional on the true covariates so that the corrected score method can be applied. Our approach is novel in this aspect.
The paper is organized as follows. In Section 2, we give the definition of the model. We propose the RC approaches in Section 3 and the SCS approach in Section 4 for dichotomized covariates. We extend these approaches to multiple discretized categories in Section 5. The finite sample performance of the estimators is assessed by simulation studies and illustrated by an application in Section 6. We conclude with discussion in Section 7. The technical details of the asymptotic results are given in the Web Appendix.
2 | MODEL DEFINITION
For subject i = 1, …, n, let Ti denote the survival time and Ci the censoring time. The observed survival data are Vi = min(Ti, Ci) and Δi = I(Ti ≤ Ci); these and all other variables are independent across i. For simplicity, we consider a single continuous time-dependent exposure variable Xi(t); the extension to multiple time-dependent exposure variables is straightforward. Longitudinal measurements of Xi(t) are taken at times ti1, …, timi with observed values Wi1, …, Wimi. The discretized categories of Xi(t) are the covariates of interest. Let Zi denote p time-independent covariates.
Assume that the longitudinal exposure process follows the linear mixed effects model
Wij = fT(tij)αi + eij, | (1) |
where f(t) is a known continuous q-dimensional function, αi is a q-dimensional random effect, and j = 1, …, mi. The inherent longitudinal trajectory is denoted by fT(t)αi, which may represent a polynomial or a spline. For example, when f(t) = (1, t)T and αi = (αi0, αi1)T, fT(t)αi denotes a linear trajectory. No distributional assumption is placed on the random effect αi. The errors eij have mean zero and variance σ2 and are independent across time; they represent within-subject biological fluctuation and measurement error. Let Wi = (Wi1, …, Wimi)T and ei = (ei1, …, eimi)T. We assume ei is independent of αi, Zi, Ti and Ci. This implies the surrogacy assumption that (Ti, Ci) is independent of Wi given αi, Zi.
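As a concrete illustration, the per-subject least-squares estimate of the random effect αi under the linear trajectory f(t) = (1, t)T can be sketched as follows (a minimal sketch, not the paper's code; the function name is ours):

```python
import numpy as np

def ls_random_effect(t_obs, w_obs):
    """Least-squares estimate of the random effect alpha_i for one subject
    under model (1) with linear trajectory f(t) = (1, t)^T."""
    F = np.column_stack([np.ones(len(t_obs)), t_obs])  # rows are f(t_ij)^T
    alpha_hat, *_ = np.linalg.lstsq(F, np.asarray(w_obs), rcond=None)
    return alpha_hat  # (intercept, slope)

# noiseless check: exact recovery of the trajectory parameters
t = np.array([0.0, 6.0, 12.0, 24.0])
w = 2.6 - 0.002 * t
print(ls_random_effect(t, w))  # close to [2.6, -0.002]
```

With noisy Wij the same call returns the estimate α̂i used throughout Sections 3 and 4.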
Suppose the survival time depends on discretized categories of Xi(t) and Zi. Since discretized categories may be represented by binary variables, for simplicity of presentation, we first consider a dichotomous variable I(Xi(t) > x0) with a cut-off point x0; an extension to multiple discretized categories is described in Section 5. Specifically, assume the proportional hazards model
λ(t | I(Xi(t) > x0), Zi) = λ0(t) exp{β0I(Xi(t) > x0) + γ0TZi}, | (2) |
where λ(t|·) denotes the hazard of failure at time t conditional on ·, λ0(t) is an unspecified baseline hazard function, and β0 and γ0 are the regression coefficients. We assume that the survival time Ti is independent of the censoring time Ci given I(Xi(t) > x0) and Zi. We focus on estimating the regression coefficients θ0 = (β0, γ0T)T.
The joint models (1) and (2) can be used to assess surrogate markers in clinical trials. According to Prentice (1989), a surrogate marker should satisfy two conditions: (i) the marker should be prognostic for the clinical outcome; (ii) the risk of progression given the marker should be independent of treatment. Suppose Xi(t) is a candidate surrogate marker and Zi is the indicator of an effective treatment; effectiveness of the treatment can be evaluated by including Zi only in model (2). Prentice's first condition would be indicated by β0 ≠ 0 when Xi(t) only is included in model (2). Prentice's second condition can be assessed by including both Xi(t) and Zi in model (2), where γ0 = 0 would suggest that the treatment effect is mediated through Xi(t).
3 | LINEARIZATION-BASED RC METHODS
Seguin et al. (2014) used a linearization-based RC for misclassified dichotomized covariates induced by error-contaminated time-independent covariates. Here we extend this approach to misclassified dichotomized covariates induced by time-dependent covariates. A sketch of the derivation is outlined here, with the details given in Web Appendix S.1.5.
Let X̂i(t) = fT(t)α̂i, where α̂i is the least squares estimate of αi based on the longitudinal observations of subject i. Under model (2), it can be shown that
Adopting the idea of RC (Prentice, 1982), when the event is rare, we may approximate by and substitute an estimator for . Following Wang et al. (2016), at time t, we approximate I(x(t) > x0) by the straight line that passes through ({x0 + μX(t) − 2σX(t)}/2, 0) and ({x0 + μX(t) + 2σX(t)}/2, 1), where μX(t) = E{X(t)} and σX(t) is the standard deviation of X(t) (Figure 1). Under this approximation, with arguments similar to those in Wang et al. (2016), we have
and subsequently,
where with , and ξ0(t) and are functions of t defined in Web Appendix S.1.5. Therefore, conditional on and Zi, the hazard function can be approximated as follows:
Based on this approximation, the RC estimator can be obtained by replacing I(Xi(t) > x0) by in the partial likelihood estimating equations. Here is an estimator of obtained by substituting an estimator for σ2 (e.g., the method of moments estimator), and is an estimator of , which can be obtained by replacing var(αi) by its estimate
where . Thus the RC is simple to implement.
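To make the linearization concrete, the straight-line approximation of the indicator in Figure 1 can be sketched as below; the clipping to [0, 1] outside the two anchor points and the function name are our additions, not from the paper:

```python
import numpy as np

def ramp_indicator(x, x0, mu_x, sigma_x):
    """Straight-line approximation of I(x > x0) through the points
    ((x0 + mu_x - 2*sigma_x)/2, 0) and ((x0 + mu_x + 2*sigma_x)/2, 1),
    clipped to [0, 1] for illustration."""
    lo = (x0 + mu_x - 2.0 * sigma_x) / 2.0
    hi = (x0 + mu_x + 2.0 * sigma_x) / 2.0
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

# the line crosses 1/2 at the midpoint (x0 + mu_x)/2
print(ramp_indicator(2.55, 2.5, 2.6, 0.1))  # 0.5
```

The slope of the line is 1/(2σX(t)), so a larger spread of X(t) gives a gentler ramp.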
FIGURE 1.

Approximation of an indicator function I(x > x0) by a straight line. This figure appears in color in the electronic version of this article, and any mention of color refers to that version
It is known in the literature that the RC can be improved by calibration within each risk set (Xie et al., 2001). Adopting the same idea, a risk-set regression calibration (RRC) estimator can be obtained by estimating within each risk set. Specifically, at time t, we may obtain an estimator of by replacing var(αi) by its estimate within the risk set
where . The computation can be intensive since needs to be calculated at each failure time. To simplify the calculation, a usual strategy (Tsiatis and Davidian, 2001) is to estimate var(αi) at a set of given times 0 = t0 < t1 < … < tM, and estimate by substituting for var(αi), where tj ≤ t < tj+1 and tM+1 = ∞.
We calculate the approximate standard errors based on the partial likelihood information matrix without adjustment for the estimation of the and . As remarked by several authors (Tsiatis et al., 1995; Dafni and Tsiatis, 1998), adjustment for standard errors of the RC estimators seems unnecessary in simulation studies; moreover, such adjustment may be hard to implement.
4 | SMOOTHED CORRECTED SCORE ESTIMATOR
For now, we assume that the error variance σ2 is known. We would like to apply the corrected score approach (Wang, 2006) to estimate the regression coefficients. However, I(Xi(t) > x0) is a noncontinuous indicator function and the correction term cannot be readily obtained. To tackle this difficulty, we propose to approximate the indicator function I(u > 0) by a continuous differentiable function Kn(u) = K(u/hn), where K(u) is a distribution function, and hn is a tuning parameter; when hn goes to zero, Kn(u) converges to I(u > 0) (note that K(·) denotes a distribution function rather than a density function, differently from notation commonly used in the kernel smoothing literature). Then the hazard function can be approximated as follows:
λ(t | Xi(t), Zi) ≈ λ0(t) exp{β0Kn(Xi(t) − x0) + γ0TZi}. | (3) |
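For instance, with the logistic distribution function (the choice used later in the simulations), the smoothed indicator can be sketched as follows (a toy illustration; the numerical values are ours):

```python
import numpy as np

def K_logistic(u):
    """Logistic distribution function, a convenient choice for K."""
    return 1.0 / (1.0 + np.exp(-u))

def smoothed_indicator(x, x0, hn):
    """K_n(x - x0) = K((x - x0)/hn), which tends to I(x > x0) as hn -> 0."""
    return K_logistic((x - x0) / hn)

# shrinking hn sharpens the approximation around the cutoff
x, x0 = 2.70, 2.544  # e.g. a log10 CD4 value vs log10(350) ~ 2.544
for hn in (0.5, 0.1, 0.01):
    print(hn, smoothed_indicator(x, x0, hn))
```

Unlike the indicator, Kn is continuously differentiable, which is what the corrected score correction below exploits.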
Suppose is the smoothed “ideal” partial likelihood estimator based on the smoothed hazard function (3) if Xi(t) is observed, which is the solution to
evaluated at time τ, where is the at risk process, is the event counting process, and for r = 0, 1, 2, with
Here for a vector a, a⊗r = 1, a, aaT for r = 0, 1, 2, respectively.
Lemma 1.
Under the assumptions C1–C6 given in Web Appendix S.1.1, as n → ∞, if hn → 0, then (i) almost surely; (ii) , where I is the identity matrix and D and Σ are given in Web Appendix S.1.2; D = O(hn).
Since Xi(t) is not observed, a naïve approach is to replace Xi(t) by X̂i(t). Note that, given Xi(t), X̂i(t) is approximately normal with mean Xi(t) and variance σX̂i2(t) = σ2fT(t){Σj=1mi f(tij)fT(tij)}−1f(t). By the delta method, Kn(X̂i(t) − x0) is approximately normally distributed with mean Kn(Xi(t) − x0) and variance
{K′((Xi(t) − x0)/hn)/hn}2 σX̂i2(t),
where K′(u) is the derivative of K(u). Then we may apply the improved corrected score approach (Song, 2017). Compared to the corrected score approach (Wang, 2006), this approach obtains the estimate α̂i based on all longitudinal observations for the ith subject rather than only the observations by time t, and consequently improves the efficiency. In addition, Wang (2006) derived and corrected the bias of the naive estimating function assuming the observation times are independent of (αi, Vi, Zi). Observing that the naive estimating function contains four empirical processes, Song (2017) applied the correction to each empirical process separately conditional on (αi, Vi, Zi), without the independence requirement on the observation times. Thus the method allows the observation times to depend on (αi, Vi, Zi). Applying this approach as if Kn(X̂i(t) − x0) were normally distributed and substituting it for the error-prone covariate in equation (4.3) of Song (2017), we obtain the following SCS estimating equation:
| (4) |
where Yi(t) = I(Vi ≥ t, mi ≥ q) and Ni(t) = I(Vi ≤ t, Δi = 1, mi ≥ q), which differ from the at-risk and counting processes above by the multiplier I(mi ≥ q), reflecting that at least q observations are required to obtain the estimate α̂i; for , and
Let denote the SCS estimator. To derive the asymptotic properties, we assume that there exists Mn > 0 such that converges to a normal distribution. Although this basically assumes that mi is large for all i, the SCS works well in our simulation studies even if mi is small.
Proposition 1.
Under the assumptions C3–C12 given in Web Appendix S.1.1, as n → ∞, if hn → 0 and , then (i) almost surely; (ii) , where D* and Σ* are given in Web Appendix S.1.3; for any arbitrarily small δ > 0 if the error is normal.
The variance of can be estimated by n−1A−1B{A−1}T, where , ,
| (5) |
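The sandwich form of this variance estimate can be sketched generically as follows (a schematic, with A and B standing for the estimated matrices defined above; the function name is ours):

```python
import numpy as np

def sandwich_variance(A, B, n):
    """Sandwich variance estimate n^{-1} A^{-1} B (A^{-1})^T for an
    estimator defined by estimating equations, with A the derivative
    matrix and B the variance of the estimating function."""
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / n
```

With A equal to the identity this reduces to B/n, the familiar variance of an average of independent terms.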
Remark.
The asymptotic coverage probability of the Wald confidence interval achieves the nominal level when the asymptotic bias is o(n−1/2). As shown in Web Appendix S.1.4, when the error is normal, the naive estimator is consistent and asymptotically normal if Mn → ∞. The asymptotic bias of the naive estimator is . When , the asymptotic bias of the SCS estimator is . This is comparable to the naive estimator since δ can be arbitrarily small. Our numerical studies indicate that the SCS works better than the naive estimator when mi is much smaller than n, as usually encountered in practice.
In practice, the error variance σ2 is usually unknown. It can be estimated by the method of moments (Tsiatis and Davidian, 2001), which requires mi > q for a subset of subjects. The SCS estimate can be obtained by replacing σ2 by in (4). Stacking the estimating equations of and together, denote the set of estimating equations by . The asymptotic variance of can be estimated by , where , , and , where is the same as ϕ with σ2 replaced by .
It is important to select an appropriate tuning parameter hn for smoothing-based approaches. Cross-validation may be used to select hn. We may use as the objective function the smoothed corrected log partial likelihood function
following Song and Wang (2017), where the corrected log partial likelihood function was used to select smoothing parameters for the time-varying coefficient proportional hazards model.
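A generic fivefold cross-validation loop for choosing hn could be organized as below; the `objective` callback is a placeholder standing in for the smoothed corrected log partial likelihood, and all names are ours, not the paper's:

```python
import numpy as np

def select_hn_cv(h_grid, n, objective, n_folds=5, seed=0):
    """Generic K-fold cross-validation for the tuning parameter hn.
    `objective(h, train_idx, test_idx)` is a user-supplied callback meant
    to fit with bandwidth h on train_idx and return a goodness-of-fit
    score (e.g. a corrected log partial likelihood) on test_idx."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = []
    for h in h_grid:
        score = 0.0
        for k in range(n_folds):
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k])
            score += objective(h, train_idx, folds[k])
        scores.append(score)
    return h_grid[int(np.argmax(scores))]

# toy check: an objective peaked at h = 0.2 (it ignores the data split)
print(select_hn_cv([0.05, 0.1, 0.2, 0.4], n=50,
                   objective=lambda h, tr, te: -(h - 0.2) ** 2))  # -> 0.2
```

The grid of candidate bandwidths and the number of folds are tuning choices of the analyst.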
5 | EXTENSION TO MULTIPLE DISCRETIZED CATEGORIES
The approaches can be extended to multiple discretized categories. Specifically, with a sequence of cutoff points x1 < x2 < … < xK, using I(X ≤ x1) as the reference group, we may include K dummy variables I(xk < X ≤ xk+1) (k = 1, …, K, with xK+1 = ∞) in the proportional hazards model. In parallel to the derivation in the dichotomized case, we consider an alternative parameterization such that
where with for k = 1, …, K. Note that , and K = 1 corresponds to the dichotomized case. The RC and RRC estimators can be obtained in the same way by replacing by in the partial likelihood estimating equation as described in Section 3. Letting , the SCS estimating equation can be written as
where for r = 0, 1, with
The asymptotic variance of the SCS estimator can be calculated the same way as described in Section 4 with replaced by in (5).
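Construction of the K dummy variables from the cutoff points can be sketched as follows (the function name is ours):

```python
import numpy as np

def category_dummies(x, cutoffs):
    """Dummy variables I(x_k < x <= x_{k+1}) for k = 1, ..., K with
    x_{K+1} = infinity; I(x <= x_1) is the reference group (all zeros)."""
    cuts = list(cutoffs) + [np.inf]
    return [float(cuts[k] < x <= cuts[k + 1]) for k in range(len(cutoffs))]

print(category_dummies(2.7, [2.0, 2.5, 3.0]))  # -> [0.0, 1.0, 0.0]
print(category_dummies(1.5, [2.0, 2.5, 3.0]))  # reference group: all zeros
```

With K = 1 this reduces to the single dichotomized covariate of Sections 3 and 4.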
6 | NUMERICAL STUDIES
6.1 | Simulation studies
We conducted simulation studies to evaluate the performance of the proposed approaches. Note that the naive, RC, RRC, and SCS approaches include only subjects with at least q observations (mi ≥ q); the average numbers of observations per subject reported below are for those with mi ≥ q. We first considered the case of a single time-dependent covariate Xi(t) = αi0 + αi1t with β0 = −1, where (αi0, αi1) were jointly normal with means (2.627, −0.0019), variances (0.02408, 0.000014), and covariance −0.00008. A similar covariate process was used by Song (2002b), mimicking the log10-transformed CD4 profile in a real HIV data set. The cutoff point was x0 = log10(350). The longitudinal observations of X(t) were scheduled to be measured at t = 0, 6, 12, …, 84. The censoring time was generated from an exponential distribution with mean 185, truncated at 84. We considered two baseline hazard models: (1) λ0(t) = 0.02I(t ≥ 12) + 0.005I(t ≥ 60); (2) λ0(t) = 0.0055I(t ≥ 12) + 0.0025I(t ≥ 60), referred to as the larger λ0(t) and smaller λ0(t) henceforth. The corresponding censoring rates were 50% and 80%, and the average numbers of observations per subject were (SD = 4.5) and 11.2 (SD = 4.5), respectively, where SD denotes the average standard deviation across the simulated data sets.
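The longitudinal part of this data-generating mechanism can be sketched as follows (the survival and censoring steps are omitted; the function name is ours, the parameter values are those stated above):

```python
import numpy as np

def simulate_subject(rng, sigma2=0.01):
    """Longitudinal data for one subject under case 1 of Section 6.1:
    X_i(t) = a_i0 + a_i1 * t with (a_i0, a_i1) bivariate normal, observed
    with independent mean-zero normal error at t = 0, 6, ..., 84."""
    mean = np.array([2.627, -0.0019])
    cov = np.array([[0.02408, -0.00008],
                    [-0.00008, 0.000014]])
    a0, a1 = rng.multivariate_normal(mean, cov)
    t = np.arange(0.0, 85.0, 6.0)            # 15 scheduled visit times
    x_true = a0 + a1 * t                     # inherent trajectory
    w_obs = x_true + rng.normal(0.0, np.sqrt(sigma2), size=t.size)
    return t, x_true, w_obs

t, x_true, w_obs = simulate_subject(np.random.default_rng(1))
```

Informative observation times can be mimicked by deleting visits with probabilities depending on (αi, Vi, Zi).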
For each scenario, 1000 simulated data sets were generated. We estimated the regression coefficient using five methods: (1) the “ideal” approach assuming the true value of Xi(t) is known; (2) the naive approach; (3) the RC approach; (4) the RRC approach; and (5) the SCS method. For each method, we calculated the empirical bias, the standard deviation of the estimates across the simulated data sets, the average of the estimated standard errors, and the empirical coverage probability of the 95% Wald confidence interval. For the RRC approach, var(αi) was estimated at the 0th, 20th, 40th, 60th, and 80th percentiles of V; our preliminary studies indicate that this works reasonably well.
We used the logistic distribution function to approximate the indicator function. To speed up the calculation, we set the tuning parameter
| (6) |
where and c is a constant. We set hn proportional to so that the estimator is invariant to scale transformations of X. Since the “ideal” estimator is optimal when there is no measurement error, we set hn proportional to . This also makes hn proportional to . The factor is included to ensure that hn is in the same units as X. In our simulation studies, c = 0.18 works reasonably well.
In case 1, the error was generated from a normal distribution with mean 0 and variance σ2 = 0.01 or 0.02. The results for n = 500 and 1000 are shown in Table 1. The naive estimator is biased and its coverage probabilities are well below the nominal level. The RC estimator has smaller bias and better coverage than the naive estimator, and the bias is smaller under the higher censoring rate. The RRC estimator outperforms the naive and RC estimators but still shows some bias and under-coverage. The SCS estimator has bias close to that of the “ideal” estimator, coverage probabilities close to the nominal level, and empirical standard deviations smaller than those of the RC and RRC.
TABLE 1.
Simulation results for normal error
| | | | LH (50% censoring) | | | | SH (80% censoring) | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| n | σ2 | Method | Bias | SD | SE | CP | Bias | SD | SE | CP |
| n = 500 | | Ideal | −0.000 | 0.132 | 0.133 | 0.955 | 0.008 | 0.228 | 0.221 | 0.949 |
| | 0.01 | Naive | 0.206 | 0.131 | 0.130 | 0.640 | 0.218 | 0.219 | 0.212 | 0.794 |
| | | RC | −0.176 | 0.163 | 0.160 | 0.809 | −0.129 | 0.274 | 0.263 | 0.923 |
| | | RRC | −0.100 | 0.162 | 0.161 | 0.912 | −0.089 | 0.273 | 0.264 | 0.934 |
| | | SCS | −0.017 | 0.149 | 0.147 | 0.945 | 0.035 | 0.242 | 0.237 | 0.953 |
| | 0.02 | Naive | 0.264 | 0.133 | 0.130 | 0.462 | 0.281 | 0.214 | 0.211 | 0.699 |
| | | RC | −0.255 | 0.183 | 0.160 | 0.646 | −0.177 | 0.279 | 0.261 | 0.892 |
| | | RRC | −0.087 | 0.164 | 0.161 | 0.909 | −0.078 | 0.270 | 0.262 | 0.937 |
| | | SCS | −0.039 | 0.154 | 0.150 | 0.935 | 0.028 | 0.248 | 0.244 | 0.954 |
| n = 1000 | | Ideal | 0.000 | 0.096 | 0.094 | 0.948 | 0.002 | 0.155 | 0.156 | 0.947 |
| | 0.01 | Naive | 0.210 | 0.091 | 0.092 | 0.381 | 0.212 | 0.150 | 0.150 | 0.694 |
| | | RC | −0.165 | 0.114 | 0.113 | 0.688 | −0.133 | 0.187 | 0.186 | 0.902 |
| | | RRC | −0.093 | 0.113 | 0.114 | 0.875 | −0.095 | 0.188 | 0.187 | 0.921 |
| | | SCS | −0.010 | 0.104 | 0.104 | 0.945 | 0.025 | 0.166 | 0.167 | 0.953 |
| | 0.02 | Naive | 0.270 | 0.091 | 0.092 | 0.162 | 0.277 | 0.146 | 0.149 | 0.541 |
| | | RC | −0.232 | 0.127 | 0.113 | 0.466 | −0.172 | 0.191 | 0.185 | 0.846 |
| | | RRC | −0.079 | 0.113 | 0.113 | 0.894 | −0.080 | 0.184 | 0.186 | 0.932 |
| | | SCS | −0.034 | 0.107 | 0.106 | 0.931 | 0.020 | 0.171 | 0.172 | 0.950 |
Note: λ(t|I(Xi(t) > x0)) = λ0(t) exp{β0I(Xi(t) > x0)} with β0 = −1, Xi(t) = αi0 + αi1t measured at t = 0, 6, 12, …, 84, x0 = log10(350). LH, larger λ0(t) = 0.02I(t ≥ 12) + 0.005I(t ≥ 60); SH, smaller λ0(t) = 0.0055I(t ≥ 12) + 0.0025I(t ≥ 60). SD, empirical standard deviation of the estimates across simulated data sets; SE, average of estimated standard errors; CP, coverage probability of the 95% Wald confidence interval.
Case 2 is the same as case 1 except that the error was generated from nonnormal distributions with mean zero and variance σ2. We considered four nonnormal distributions: (i) a scaled t-distribution with 4 degrees of freedom, mimicking the error distribution in the ACTG 175 study; (ii) a uniform distribution; (iii) a skewed bimodal normal mixture; (iv) a symmetric bimodal normal mixture. The mixtures in (iii) and (iv) are of two normal distributions with means μ1 = (1 − pm)aσN and μ2 = −pmaσN and common variance σN2, where pm is the mixing proportion and aσN is the distance between the means of the two components; pm = 0.3 and a = 3 for (iii), and pm = 0.5 and a = 10 for (iv). The results for n = 1000 are given in Table 2 and mostly follow a pattern similar to that observed in case 1. Although the RC estimator works better than the naive estimator in most cases, it may have larger bias when the censoring rate is 50% and the error variance is 0.02. The results for n = 500 are given in Web Table S1.
TABLE 2.
Simulation results for nonnormal error
| | | | LH (50% censoring) | | | | SH (80% censoring) | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Error distribution | σ2 | Method | Bias | SD | SE | CP | Bias | SD | SE | CP |
| | | Ideal | 0.000 | 0.096 | 0.094 | 0.948 | 0.002 | 0.155 | 0.156 | 0.947 |
| t | 0.01 | Naive | 0.197 | 0.091 | 0.092 | 0.416 | 0.202 | 0.154 | 0.151 | 0.722 |
| | | RC | −0.187 | 0.117 | 0.114 | 0.621 | −0.149 | 0.192 | 0.187 | 0.890 |
| | | RRC | −0.109 | 0.113 | 0.114 | 0.847 | −0.107 | 0.192 | 0.187 | 0.908 |
| | | SCS | −0.015 | 0.098 | 0.098 | 0.945 | 0.013 | 0.170 | 0.167 | 0.947 |
| | 0.02 | Naive | 0.253 | 0.092 | 0.092 | 0.204 | 0.263 | 0.149 | 0.149 | 0.555 |
| | | RC | −0.274 | 0.146 | 0.114 | 0.368 | −0.207 | 0.203 | 0.186 | 0.782 |
| | | RRC | −0.099 | 0.115 | 0.114 | 0.863 | −0.098 | 0.188 | 0.186 | 0.916 |
| | | SCS | −0.032 | 0.097 | 0.098 | 0.933 | 0.003 | 0.173 | 0.171 | 0.946 |
| uniform | 0.01 | Naive | 0.217 | 0.091 | 0.092 | 0.332 | 0.220 | 0.148 | 0.150 | 0.676 |
| | | RC | −0.156 | 0.113 | 0.113 | 0.728 | −0.123 | 0.184 | 0.186 | 0.916 |
| | | RRC | −0.085 | 0.113 | 0.114 | 0.886 | −0.085 | 0.185 | 0.186 | 0.934 |
| | | SCS | 0.007 | 0.099 | 0.098 | 0.945 | 0.033 | 0.165 | 0.167 | 0.947 |
| | 0.02 | Naive | 0.276 | 0.092 | 0.092 | 0.145 | 0.286 | 0.147 | 0.149 | 0.510 |
| | | RC | −0.222 | 0.125 | 0.113 | 0.487 | −0.158 | 0.190 | 0.184 | 0.868 |
| | | RRC | −0.073 | 0.114 | 0.113 | 0.897 | −0.070 | 0.185 | 0.185 | 0.934 |
| | | SCS | −0.007 | 0.100 | 0.098 | 0.949 | 0.030 | 0.172 | 0.171 | 0.946 |
| skewed normal mixture | 0.01 | Naive | 0.204 | 0.092 | 0.092 | 0.408 | 0.213 | 0.151 | 0.150 | 0.685 |
| | | RC | −0.171 | 0.116 | 0.113 | 0.660 | −0.131 | 0.187 | 0.186 | 0.898 |
| | | RRC | −0.100 | 0.115 | 0.114 | 0.862 | −0.093 | 0.188 | 0.187 | 0.930 |
| | | SCS | −0.008 | 0.099 | 0.098 | 0.941 | 0.026 | 0.168 | 0.167 | 0.945 |
| | 0.02 | Naive | 0.255 | 0.093 | 0.092 | 0.224 | 0.273 | 0.150 | 0.149 | 0.541 |
| | | RC | −0.247 | 0.131 | 0.113 | 0.434 | −0.175 | 0.191 | 0.185 | 0.846 |
| | | RRC | −0.095 | 0.116 | 0.114 | 0.858 | −0.084 | 0.187 | 0.186 | 0.930 |
| | | SCS | −0.030 | 0.099 | 0.098 | 0.937 | 0.015 | 0.171 | 0.171 | 0.952 |
| symmetric normal mixture | 0.01 | Naive | 0.221 | 0.091 | 0.092 | 0.319 | 0.227 | 0.149 | 0.150 | 0.647 |
| | | RC | −0.152 | 0.114 | 0.113 | 0.739 | −0.114 | 0.185 | 0.186 | 0.922 |
| | | RRC | −0.081 | 0.113 | 0.114 | 0.893 | −0.077 | 0.185 | 0.186 | 0.933 |
| | | SCS | 0.011 | 0.098 | 0.098 | 0.955 | 0.042 | 0.165 | 0.167 | 0.947 |
| | 0.02 | Naive | 0.280 | 0.091 | 0.092 | 0.133 | 0.294 | 0.147 | 0.148 | 0.489 |
| | | RC | −0.215 | 0.123 | 0.113 | 0.512 | −0.149 | 0.186 | 0.184 | 0.877 |
| | | RRC | −0.067 | 0.113 | 0.113 | 0.912 | −0.061 | 0.183 | 0.185 | 0.938 |
| | | SCS | −0.003 | 0.098 | 0.099 | 0.952 | 0.040 | 0.170 | 0.171 | 0.952 |
Note: λ(t|I(Xi(t) > x0)) = λ0(t) exp{β0I(Xi(t) > x0)} with β0 = −1, x0 = log10(350), Xi(t) = αi0 + αi1t measured at t = 0, 6, 12, …, 84, n = 1000. LH, larger λ0(t) = 0.02I(t ≥ 12) + 0.005I(t ≥ 60); SH, smaller λ0(t) = 0.0055I(t ≥ 12) + 0.0025I(t ≥ 60). SD, empirical standard deviation of the estimates across simulated data sets; SE, average of estimated standard errors; CP, coverage probability of the 95% Wald confidence interval.
In case 3, we added to case 1 a treatment indicator Z, generated from a Bernoulli distribution with probability 0.5. The covariate X(t) has mean intercept 2.627 but different mean slopes for the two treatment arms: −0.035 when Z = 0 and 0.015 when Z = 1. The true coefficient is γ0 = −0.5. The censoring rates were 59% and 84%, and the average numbers of observations per subject were and 11.5 (SD = 4.4), respectively, corresponding to the larger and smaller λ0(t). The results for n = 1000 are shown in Table 3. The naive estimator shows obvious bias in estimating both β0 and γ0. The RC and RRC reduce the bias, but the bias remains large for β0. The SCS works well for estimating both β0 and γ0.
TABLE 3.
Simulation results with an extra treatment indicator
| | | | LH (59% censoring) | | | | SH (84% censoring) | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | σ2 | Method | Bias | SD | SE | CP | Bias | SD | SE | CP |
| β0 | 0.01 | Ideal | −0.005 | 0.111 | 0.108 | 0.944 | 0.001 | 0.187 | 0.183 | 0.936 |
| | | Naive | 0.212 | 0.109 | 0.106 | 0.492 | 0.228 | 0.183 | 0.178 | 0.740 |
| | | RC | −0.179 | 0.134 | 0.130 | 0.718 | −0.130 | 0.224 | 0.219 | 0.911 |
| | | RRC | −0.107 | 0.134 | 0.131 | 0.868 | −0.091 | 0.225 | 0.220 | 0.927 |
| | | SCS | −0.009 | 0.123 | 0.117 | 0.932 | 0.039 | 0.203 | 0.195 | 0.946 |
| | 0.02 | Naive | 0.270 | 0.108 | 0.106 | 0.278 | 0.292 | 0.184 | 0.177 | 0.613 |
| | | RC | −0.257 | 0.146 | 0.130 | 0.499 | −0.184 | 0.231 | 0.218 | 0.861 |
| | | RRC | −0.104 | 0.132 | 0.131 | 0.882 | −0.089 | 0.225 | 0.219 | 0.926 |
| | | SCS | −0.031 | 0.126 | 0.119 | 0.928 | 0.032 | 0.209 | 0.200 | 0.941 |
| γ0 | | Ideal | −0.011 | 0.111 | 0.109 | 0.948 | −0.008 | 0.178 | 0.181 | 0.955 |
| | 0.01 | Naive | −0.087 | 0.111 | 0.109 | 0.881 | −0.092 | 0.181 | 0.183 | 0.922 |
| | | RC | −0.041 | 0.111 | 0.109 | 0.933 | −0.050 | 0.180 | 0.182 | 0.952 |
| | | RRC | −0.056 | 0.111 | 0.109 | 0.928 | −0.058 | 0.180 | 0.182 | 0.947 |
| | | SCS | −0.018 | 0.113 | 0.109 | 0.941 | −0.025 | 0.183 | 0.182 | 0.957 |
| | 0.02 | Naive | −0.109 | 0.111 | 0.109 | 0.818 | −0.118 | 0.182 | 0.183 | 0.903 |
| | | RC | −0.031 | 0.112 | 0.109 | 0.940 | −0.044 | 0.180 | 0.182 | 0.957 |
| | | RRC | −0.063 | 0.111 | 0.109 | 0.917 | −0.064 | 0.181 | 0.182 | 0.951 |
| | | SCS | −0.017 | 0.114 | 0.109 | 0.935 | −0.027 | 0.184 | 0.183 | 0.952 |
Note: λ(t|I(Xi(t) > x0)) = λ0(t) exp{β0I(Xi(t) > x0) + γ0Zi} with β0 = −1, γ0 = −0.5, Xi(t) = αi0 + αi1t measured at t = 0, 6, 12, …, 84, normal error, x0 = log10(350), n = 1000. LH, larger λ0(t) = 0.02I(t ≥ 12) + 0.005I(t ≥ 60); SH, smaller λ0(t) = 0.0055I(t ≥ 12) + 0.0025I(t ≥ 60). SD, empirical standard deviation of the estimates across simulated data sets; SE, average of estimated standard errors; CP, coverage probability of the 95% Wald confidence interval.
We also conducted simulations in settings with different β0, with fewer observations of X(t), with informative observation times, or with trichotomized categories of X(t) (see Web Appendix S.2). Overall, the simulation evidence suggests that the SCS works better than the naive, RC, and RRC methods. We also assessed the performance of the SCS with hn selected by fivefold cross-validation for a small number (100) of simulated data sets under cases 1 and 2 with n = 1000 (Web Table S6). It works reasonably well compared to the results based on (6).
6.2 | Application
We applied the proposed approaches to the ACTG 175 data. We are interested in evaluating I(CD4 count ≤ 350 cells/mm3) as a potential surrogate marker. A total of 308 events were observed during the study. The log10 transformation was applied to the CD4 count to achieve an approximately constant error variance. Figure 2(A) presents the log10-transformed CD4 trajectories for 10 randomly selected subjects, which show an initial increase, with a peak around week 12, followed by an approximately linear decline. Because only nine events occurred by week 12, for simplicity, we considered the data after week 12 and assumed that X(u) = α0 + α1u represents the inherent log10 CD4 count at time u. The CD4 observations before week 12 were excluded from the analysis. The analysis included 2186 subjects with at least two CD4 observations after week 12, with an average of 9.4 (SD = 3.2) observations per subject. The residual plot from the least squares fits shows that the error variance is approximately constant after the log10 transformation (Figure 2B), and the corresponding normal and Student’s t Q–Q plots indicate that the error distribution may be short-tailed compared to the normal but close to a scaled t-distribution with 4 degrees of freedom (Figure 2C and D). The estimated error variance is 0.011, about 40% of the estimated baseline CD4 variance. The primary analysis found zidovudine alone to be inferior to the other three therapies; thus, further investigations focused on two treatment groups, zidovudine alone and the combination of the other three. Let Z = I(treatment ≠ zidovudine alone) be the treatment indicator. The estimate for Z = 0 and (2.581, −0.0021) for Z = 1.
FIGURE 2.

(A) Trajectories of log10(CD4) for 10 randomly selected subjects; (B) residual plot; (C) normal Q–Q plot, plotting empirical quantiles of residuals versus theoretical quantiles of the standard normal distribution; (D) Student's t Q–Q plot, plotting empirical quantiles of residuals versus theoretical quantiles of the t distribution with 4 degrees of freedom. The reference lines in (C) and (D) are obtained from robust linear regression of the empirical quantiles on the theoretical quantiles. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
To assess I(CD4 count ≤ 350 cells/mm3) or equivalently I(Xi(t) > log10 350) as a surrogate marker, we consider three proportional hazards models: (1) a hazard model with the covariate treatment Z only; (2) a hazard model with I(Xi(t) > log10 350) only; (3) a hazard model with both Z and I(Xi(t) > log10 350). Model (1) includes only an error-free covariate Z and was fitted via the standard partial likelihood approach. Models (2) and (3) include a dichotomized covariate of Xi(t) and were fitted using the naive, RC, RRC, and SCS approaches.
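For the RC-type fits of Models (2) and (3), the unobservable indicator is replaced by its conditional expectation given the observed covariate history; under a normal working model for Xi(t) given that history, the expectation has the closed form Φ((x̂ − c)/σ̂). The sketch below illustrates this calibration with hypothetical calibrated means and standard deviations (the actual conditional moments are estimated from the longitudinal data).

```python
import math

def calibrated_indicator(xhat, c, sd):
    """E[I(X > c) | history] under X | history ~ Normal(xhat, sd^2):
    the Phi((xhat - c) / sd) plug-in used in place of the unobservable indicator."""
    return 0.5 * (1.0 + math.erf((xhat - c) / (sd * math.sqrt(2.0))))

c = math.log10(350)  # cut-off on the log10 scale
# hypothetical calibrated predictions at three subject-times
for xhat, sd in [(c + 0.3, 0.10), (c, 0.10), (c - 0.3, 0.10)]:
    print(round(calibrated_indicator(xhat, c, sd), 3))
```

Unlike the naive indicator of the noisy observation, this plug-in takes values strictly between 0 and 1, shrinking toward 1/2 when the calibrated mean is close to the cut-off relative to its uncertainty.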
The results are shown in Table 4. Treatment alone [Model (1)] shows a significant effect. The results from the naive, SCS, RC, and RRC approaches also show that I(Xi(t) > log10 350) alone [Model (2)] has a significant effect on survival time, indicating that CD4 count ≤ 350 cells/mm3 is associated with a larger hazard and is prognostic for time to AIDS or death. Model (3) shows that the treatment effect is no longer significant after adjusting for I(Xi(t) > log10 350), which implies that the treatment effect is mediated through whether CD4 count ≤ 350 cells/mm3 and hence is consistent with Prentice's criteria for surrogate markers. Compared to the naive estimates, the SCS estimates are larger in magnitude for β0 and smaller for γ0, which might reflect correction of the bias caused by the measurement error. The RC and RRC estimates are larger in magnitude than the corresponding SCS estimates, with standard errors that are either larger or comparable.
TABLE 4.
Analysis of the ACTG 175 data
| Model | Method | β0 Est | β0 SE | exp(β0) Est | exp(β0) CI | γ0 Est | γ0 SE | exp(γ0) Est | exp(γ0) CI |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | – | – | – | – | – | −0.363 | 0.131 | 0.696 | (0.538, 0.900) |
| Model 2 | Naive | −1.978 | 0.215 | 0.138 | (0.091, 0.211) | – | – | – | – |
| | RC | −2.864 | 0.293 | 0.057 | (0.032, 0.101) | – | – | – | – |
| | RRC | −2.828 | 0.290 | 0.059 | (0.033, 0.104) | – | – | – | – |
| | SCS | −2.673 | 0.265 | 0.069 | (0.041, 0.116) | – | – | – | – |
| Model 3 | Naive | −1.958 | 0.216 | 0.141 | (0.092, 0.215) | −0.196 | 0.133 | 0.822 | (0.633, 1.067) |
| | RC | −2.705 | 0.282 | 0.067 | (0.038, 0.116) | −0.189 | 0.133 | 0.827 | (0.637, 1.074) |
| | RRC | −2.696 | 0.282 | 0.067 | (0.039, 0.117) | −0.190 | 0.133 | 0.827 | (0.637, 1.074) |
| | SCS | −2.670 | 0.268 | 0.069 | (0.041, 0.117) | −0.157 | 0.133 | 0.855 | (0.659, 1.109) |

Abbreviations: CI, 95% confidence interval; Est, estimate; SE, estimated standard error.
7 |. DISCUSSION
We propose an SCS approach for the proportional hazards model with time-dependent discretized covariates. We also extend the RC and RRC to time-dependent discretized covariates. Our numerical studies indicate that the SCS works better than the RC and RRC. We focus on the case where the cut-off points are known. In some situations, the cut-off points need to be estimated (e.g., quantiles); that case is of interest but beyond the scope of this paper.
The proposed approaches can be extended to more flexible models, such as the time-varying coefficient model and the partially time-varying coefficient model, using techniques similar to those in Song and Wang (2008) and Song and Wang (2017).
Supplementary Material
ACKNOWLEDGMENTS
This research was partially supported by NIH grants R43GM134768 and R44GM100573 (Chao, Wang and Song), CA239168 (Wang and Song), CA235122 and S10OD028685 (Wang), and NSF grant DMS-1916411 (Song).
Funding information
National Science Foundation, Grant/Award Number: 1916411; National Institutes of Health, Grant/Award Numbers: CA235122, CA239168, R43GM134768, R44GM100573, S10OD028685
Footnotes
SUPPORTING INFORMATION
Web Appendices and Tables referenced in Sections 3, 4, and 6.1, along with the implemented C++ code and demonstration simulation data, are available with this paper at the Biometrics website on Wiley Online Library.
DATA AVAILABILITY STATEMENT
The ACTG 175 data used in Section 6.2 are available on request from AIDS Clinical Trials Group (https://actgnetwork.org/).
REFERENCES
- Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006) Measurement error in nonlinear models. New York, NY: Chapman and Hall/CRC.
- Dafni UG & Tsiatis AA (1998) Evaluating surrogate markers of clinical outcome measured with error. Biometrics, 54, 1445–1462.
- Dalen I, Buonaccorsi JP, Sexton JA, Laake P & Thoresen M (2009) Correction for misclassification of a categorized exposure in binary regression using replication data. Statistics in Medicine, 28, 3386–3410.
- Flegal KM, Keyl PM & Nieto FJ (1991) Differential misclassification arising from nondifferential errors in exposure measurement. American Journal of Epidemiology, 134, 1233–1244.
- Gustafson P (2004) Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. New York, NY: Chapman and Hall.
- Gustafson P & Le DN (2002) Comparing the effects of continuous and discrete covariate mismeasurement, with emphasis on the dichotomization of mismeasured predictors. Biometrics, 58, 878–887.
- Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH et al. (1996) A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335, 1081–1089.
- Heller G (2007) Smoothed rank regression with censored data. Journal of the American Statistical Association, 102, 552–559.
- Horowitz JL (1992) A smoothed maximum score estimator for the binary response model. Econometrica, 60, 505–531.
- Liao X, Zucker DM, Li Y & Spiegelman D (2011) Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics, 67, 50–58.
- Ma S & Huang J (2005) Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics, 21, 4356–4362.
- May MT, Gompels M, Delpech V, Porter K, Orkin C, Kegg S et al. (2014) Impact on life expectancy of HIV-1 positive individuals of CD4+ cell count and viral load response to antiretroviral therapy. AIDS, 28, 1193–1202.
- Natarajan L (2009) Regression calibration for dichotomized mismeasured predictors. International Journal of Biostatistics, 5(1), 1143.
- Prentice R (1982) Covariate measurement errors and parameter estimates in a failure time regression model. Biometrika, 69, 331–342.
- Prentice R (1989) Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine, 8, 431–440.
- Seguin RA, Buchner D, Lui J, Messina C, Manson J, Moreland L et al. (2014) Sedentary behavior and mortality in older women: the Women's Health Initiative observational and extension studies. American Journal of Preventive Medicine, 46, 122–135.
- Song X (2017) An improved corrected score estimator for the proportional hazards model with time-dependent covariates measured with error at informative observation times. Statistica Sinica, 27, 1037–1057.
- Song X, Davidian M & Tsiatis AA (2002a) A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics, 58, 742–753.
- Song X, Davidian M & Tsiatis AA (2002b) An estimator for the proportional hazards model with multiple longitudinal covariates measured with error. Biostatistics, 3, 511–528.
- Song X, Ma S, Huang J & Zhou XH (2007) A semiparametric approach for the nonparametric transformation survival model with multiple covariates. Biostatistics, 8, 197–211.
- Song X & Wang CY (2008) Semiparametric approaches for joint modeling of longitudinal and survival data with time varying coefficients. Statistica Sinica, 27, 3178–3190.
- Song X & Wang L (2017) Partially time-varying coefficient proportional hazards models with error-prone time-dependent covariates - an application to the AIDS Clinical Trials Group 175 data. Annals of Applied Statistics, 11, 274–296.
- Tsiatis AA & Davidian M (2001) A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika, 88, 447–458.
- Tsiatis AA, DeGruttola V & Wulfsohn MS (1995) Modeling the relationship of survival to longitudinal data measured with error: applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90, 27–37.
- Wang CY (2006) Corrected score estimator for joint modeling of longitudinal and failure time data. Statistica Sinica, 16, 235–253.
- Wang CY, Tapsoba JD, Duggan C, Campbell K & McTiernan A (2016) Methods to adjust for misclassification in the quantiles for the generalized linear model with measurement error in continuous exposures. Statistics in Medicine, 35, 1676–1688.
- Wulfsohn MS & Tsiatis AA (1997) A joint model for survival and longitudinal data measured with error. Biometrics, 53, 330–339.
- World Health Organization (2009) Rapid advice: antiretroviral therapy for HIV infection in adults and adolescents. Available at: https://apps.who.int/iris/handle/10665/107280 (last accessed on 11/08/2021).
- Xie SX, Wang CY & Prentice RL (2001) A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society, Series B, 63, 855–870.
- Xu C, Baines PD & Wang JL (2014) Standard error estimation using the EM algorithm for the joint modeling of survival and longitudinal data. Biostatistics, 15, 731–744.