Author manuscript; available in PMC: 2022 Oct 15.
Published in final edited form as: Stat Med. 2021 Jun 22;40(23):5006–5024. doi: 10.1002/sim.9108

An approximate quasi-likelihood approach for error-prone failure time outcomes and exposures

Lillian A Boe 1, Lesley F Tinker 2, Pamela A Shaw 1
PMCID: PMC8963256  NIHMSID: NIHMS1763004  PMID: 34519082

Abstract

Measurement error arises commonly in clinical research settings that rely on data from electronic health records or large observational cohorts. In particular, self-reported outcomes are typical in cohort studies for chronic diseases such as diabetes in order to avoid the burden of expensive diagnostic tests. Dietary intake, which is also commonly collected by self-report and subject to measurement error, is a major factor linked to diabetes and other chronic diseases. These errors can bias exposure-disease associations and ultimately mislead clinical decision-making. We have extended an existing semiparametric likelihood-based method for handling error-prone, discrete failure time outcomes to also address covariate error. We conduct an extensive numerical study to compare the proposed method to the naive approach that ignores measurement error in terms of bias and efficiency in the estimation of the regression parameter of interest. In all settings considered, the proposed method showed minimal bias and maintained coverage probability, thus outperforming the naive analysis, which showed extreme bias and low coverage. This method is applied to data from the Women’s Health Initiative to assess the association between energy and protein intake and the risk of incident diabetes mellitus. Our results show that correcting for errors in both the self-reported outcome and dietary exposures leads to considerably different hazard ratio estimates than those from analyses that ignore measurement error, which demonstrates the importance of correcting for both outcome and covariate error.

Keywords: Cox model, measurement error, misclassification, proportional hazards, regression calibration, survival analysis

1 |. INTRODUCTION

Chronic diseases are often recorded primarily by self-reported diagnosis in large observational cohort studies. For example, in comparison to reference (gold) standard measures for detecting diabetes, such as fasting glucose and hemoglobin A1c (HbA1c), self-reported diabetes status is inexpensive and easily attainable. However, not all people who are diagnosed with diabetes or other conditions will self-report that they have the disease. Reasons for failing to report having a chronic condition include failure to be diagnosed, lack of understanding about the disease, and a belief that the disease has gone away if it is being properly treated.1,2 Conversely, a positive disease status is occasionally reported when the disease is not actually present.3,4 Dietary intake, which is also commonly recorded by self-report, is thought to play a crucial role in determining the risk of chronic diseases such as diabetes and cardiovascular disease. In nutritional epidemiology, estimates of diet-disease associations can be distorted due to measurement error in both self-reported dietary exposures and disease outcomes. A new analytic approach is needed to properly relate error-prone exposures with error-prone disease outcomes of interest. In this article, we have extended an existing semiparametric model for handling failure time outcomes assessed through interval-censored, error-prone measures to also address measurement error in the exposure variable.

There is ample literature available on methods for adjusting analyses with error-prone exposures in the case of time-to-event outcomes.5 In existing epidemiological analyses, regression calibration is one of the most popular methods for addressing covariate measurement error.6 This method relies on building a calibration model that relates the expected value of the unobserved true exposure to the observed data. Prentice7 introduced the method for time-to-event outcomes. Rosner et al8,9 considered it for logistic regression, where a single or multiple covariates were error-prone. In nonlinear models, such as Cox and logistic regression, regression calibration is considered a quasi-likelihood approach as it is generally only an approximate correction,10 but it has been observed to do well for modest β and low event rates.5,7 The popularity of this approach likely has to do with the intuitive appeal of the method and the ease of implementation. The method proposed in this article uses regression calibration in order to develop an estimator that will correct for both covariate and outcome error.

Compared with methods for addressing covariate error, there has been notably less investigation into methods that correct for errors that occur in the time-to-event outcomes themselves. In epidemiologic cohort studies, the time-to-event of interest is often ascertained through periodic follow-up, thus resulting in data captured in fixed intervals. Thus, methods that address errors in the event indicator at each interval are of particular interest. Balasubramanian and Lagakos11 developed estimation methods for the distribution of the time-to-event that consider various periods of exposure and diagnostic tests with different levels of accuracy. Meier et al12 presented an adjusted proportional hazards model for estimating hazard ratios (HR) in discrete time survival analysis when the outcome is measured with error. Magaret13 considered methods that adjusted the proportional hazards model to incorporate data from validation subsets for the case where the sensitivity and the specificity of the diagnostic tests are unknown. All of this existing work assumes that the covariates included in the time-to-event analyses are error-free, which is often untrue with clinical data.

This article specifically builds on the work of Gu et al,14 which introduced a semiparametric likelihood-based approach for estimating the association of covariates with an error-prone discrete failure time outcome. Motivated by an example from the Women’s Health Initiative (WHI), we extend this method to incorporate a regression calibration fix that additionally adjusts for covariate measurement error and also allows for strata-specific baseline hazards. Our method can be applied to a study cohort that has collected follow-up data on an error-prone disease status variable at two or more distinct visit times and has information available at baseline on specific covariates of interest. In the presence of covariate measurement error, the proposed method can be considered when there are data that inform the measurement error model. We must assume that (1) information is available regarding the sensitivity and specificity of the outcome measure, and (2) a second measure of the error-prone covariate(s) is available on at least a subset of subjects.

Section 2 introduces the theoretical development of the method by providing notation, constructing the likelihood function and discussing the proposed adjustment method that corrects for outcome and covariate error. Next, we examine the numerical performance of the proposed method with a simulation study in Section 3. In Section 4, we apply the proposed method to evaluate the association between dietary energy, protein, and protein density intake and incident diabetes in a subset of women enrolled in the WHI. Finally, we highlight the important findings of this work and discuss potential extensions in Section 5.

2 |. METHODS

2.1 |. Notation and time-to-event model

Let Ti be the unobserved time-to-event of interest for subjects i = 1, …, N. Consider a study with periodic follow-up where each subject may have a slightly different visit schedule or missed visits. Define τ1, …, τJ as the distinct possible visit times among all N subjects. Denote τ0 = 0 and τJ+1 = ∞. We assume that the time to the event of interest is continuous, but follow-up occurs at discrete visit times. The follow-up time period can then be divided into J + 1 disjoint intervals, listed as follows: [τ0, τ1), [τ1, τ2), … [τJ, τJ+1). Assume that all subjects in the study are event-free at time τ0. Later, we will relax this assumption. Let ni be the number of visits for the ith subject, which we assume is random. In our motivating data example, each subject self-reports his or her disease status at each visit time, potentially with error, up until the first positive. Our method can also be applied to the more general setting for error-prone outcomes in which follow-up continues beyond the first positive. Define Yi and ti as the random 1 × ni vector of error-prone outcomes and corresponding vector of visit times for subject i. Specifically, define Yij as 1 if the jth error-prone outcome for ith subject is positive, and 0 otherwise. Then, the joint probability of the observed data for the ith subject is:

f(Y_i, t_i, n_i) = Σ_{j=1}^{J+1} θ_j Pr(Y_i, t_i, n_i | τ_{j−1} < T_i ≤ τ_j),   (1)

where θ_j = Pr(τ_{j−1} < T_i ≤ τ_j).

We make the additional assumption that, conditional on the true event time T_i, the n_i error-prone outcomes are independent, that is, Pr(Y_i | T_i, t_i) = Π_{l=1}^{n_i} Pr(Y_{il} | T_i, t_{il}). Thus, other observed error-prone outcomes do not provide additional information about a specific error-prone outcome beyond what is already given by the true time of event. Following the notation and logic of Balasubramanian and Lagakos,11 it can be shown that for the case of a prespecified visit schedule, the likelihood becomes:

f(Y_i, t_i, n_i) = Σ_{j=1}^{J+1} θ_j [ Π_{l=1}^{n_i} Pr(Y_{il} | τ_{j−1} < T_i ≤ τ_j, t_l) ] = Σ_{j=1}^{J+1} θ_j C_{ij},   (2)

where C_{ij} = Π_{l=1}^{n_i} Pr(Y_{il} | τ_{j−1} < T_i ≤ τ_j, t_l). In Section S2 of the Supplementary Materials, we show how Equation (2) becomes the expression for a subject’s likelihood contribution.

For ease of presentation, we calculate C_{ij} for the case of no missed visits, but the formula can be easily adapted to accommodate missed visits by summing up the θ_j for the (τ_{j−1}, τ_j] that define each subject’s observational interval. We assume constant and known sensitivity (Se) and specificity (Sp); namely, Se = Pr(Y_{il} = 1 | τ_{j−1} < T_i ≤ τ_j, t_l ≥ τ_j) and Sp = Pr(Y_{il} = 0 | τ_{j−1} < T_i ≤ τ_j, t_l ≤ τ_{j−1}). Then, the C_{ij} terms take the following form:

C_{i1} = Se^{Σ_{j=1}^{n_i} Y_{ij}} (1 − Se)^{Σ_{j=1}^{n_i} (1 − Y_{ij})},
C_{i2} = Sp^{1 − Y_{i1}} (1 − Sp)^{Y_{i1}} Se^{Σ_{j=2}^{n_i} Y_{ij}} (1 − Se)^{Σ_{j=2}^{n_i} (1 − Y_{ij})},
⋮
C_{i(J+1)} = Sp^{Σ_{j=1}^{n_i} (1 − Y_{ij})} (1 − Sp)^{Σ_{j=1}^{n_i} Y_{ij}}.
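For concreteness, the C_{ij} terms above can be computed directly from a subject's observed outcome vector: visits before the candidate event interval have true status negative, and visits at or after it have true status positive. The following is an illustrative Python sketch (the code accompanying the paper is in R, and the function name here is our own):

```python
import numpy as np

def c_terms(y, se, sp):
    """C_{i1}, ..., C_{i(J+1)} for one subject with no missed visits.

    y  : 0/1 vector of error-prone outcomes at visit times tau_1, ..., tau_n.
    se : sensitivity, Pr(Y = 1 | event occurred by the visit time).
    sp : specificity, Pr(Y = 0 | event has not occurred by the visit time).
    """
    y = np.asarray(y)
    n = len(y)
    c = np.empty(n + 1)
    for j in range(1, n + 2):  # candidate event interval (tau_{j-1}, tau_j]
        before = y[: j - 1]    # visits at t_l <= tau_{j-1}: true status negative
        after = y[j - 1:]      # visits at t_l >= tau_j: true status positive
        c[j - 1] = (np.prod(sp ** (1 - before) * (1 - sp) ** before)
                    * np.prod(se ** after * (1 - se) ** (1 - after)))
    return c
```

With Se = Sp = 1 the vector reduces to an indicator of the single interval consistent with the observed outcomes.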

Now suppose we have the proportional hazards model S(t) = S_0(t)^{exp(x^T β_X + z^T β_Z)}. We assume that one or more covariates are recorded with error. Define X_i* as a p-dimensional vector of covariates of interest that may be observed with error, and X_i as the corresponding p-dimensional vector of unobserved true exposure variables. We describe the error structure of the observed error-prone covariate X* in Section 2.2.2. Let Z_i be a q-dimensional vector of precisely measured covariates (ie, error-free) that may be correlated with X_i. Define β = (β_X, β_Z)^T. The likelihood can be rewritten in terms of the baseline survival probabilities S = (S_1, S_2, …, S_{J+1})^T defined by the random variable T_0 with survival function S_0(t), where S_j = Pr(T_0 > τ_{j−1}). One then has 1 = S_1 > S_2 > … > S_{J+1} > 0 and S_j = Σ_{h=j}^{J+1} θ_h. We can define a linear (J + 1) × (J + 1) transformation matrix M such that θ = MS. Finally, define the N × (J + 1) matrix D = CM, where C_{N×(J+1)} consists of the C_{ij} elements defined above. Following Gu et al,14 the log-likelihood can be rewritten as:

l(S, β) = Σ_{i=1}^{N} log( Σ_{j=1}^{J+1} D_{ij} S_j^{exp(x_i^T β_X + z_i^T β_Z)} ).   (3)

The Dij components of the log-likelihood consist of elements of the matrix D defined above and are functions of the observed data, (Xi, Zi, Yi, ti), as well as Se and Sp. One can apply the usual maximum likelihood approach to solve for the unknown parameters βX, βZ, S2, … ,SJ+1. The covariance matrix can be found by inverting the Hessian matrix. Note that the model above introduced by Gu et al14 is considered semiparametric because we do not make any assumptions about the form of the baseline survival probabilities, Sj, for j = 1, …, J + 1.
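As an illustrative sketch of this maximization, the log-likelihood (3) can be evaluated with the baseline survival probabilities reparametrized so that the ordering constraint 1 = S_1 > S_2 > … > S_{J+1} > 0 holds automatically. This is a hypothetical Python version under the no-missed-visits setting (the paper's own implementation is in R; function and parameter names are ours):

```python
import numpy as np

def neg_loglik(params, C, X, p):
    """Negative log-likelihood (3), no missed visits.

    params : p regression coefficients beta, followed by J parameters gamma
             encoding S_{j+1} = S_j * sigmoid(gamma_j), so the survival
             probabilities are automatically decreasing and in (0, 1).
    C      : N x (J+1) matrix of the C_{ij} terms.
    X      : N x p design matrix of covariates.
    """
    beta, gam = params[:p], params[p:]
    s = np.concatenate(([1.0], np.cumprod(1.0 / (1.0 + np.exp(-gam)))))
    # D = CM: with theta_j = S_j - S_{j+1} (and S_{J+2} = 0), the matrix
    # product reduces to D_{ij} = C_{ij} - C_{i,j-1}, taking C_{i0} = 0.
    D = C - np.hstack([np.zeros((C.shape[0], 1)), C[:, :-1]])
    expo = np.exp(X @ beta)
    lik = (D * s[None, :] ** expo[:, None]).sum(axis=1)
    return -np.sum(np.log(lik))
```

This objective can then be passed to a generic optimizer (eg, scipy.optimize.minimize), with the covariance matrix obtained by inverting the numerical Hessian at the optimum, as described above.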

2.2 |. Proposed method for outcome and covariate error

We now extend the above method that corrects for outcome error in the discrete proportional hazards model to also adjust for covariate error by adopting a regression calibration type approach. In this section, we describe the regression calibration approach for covariate measurement error, present our proposed method to adjust for covariate and outcome error, extend our method to accommodate a baseline hazard that varies across strata, and extend the method to handle false negatives that are mistakenly included in the analysis.

2.2.1 |. Regression calibration for covariate error

Regression calibration is an approach to correcting biases in regression parameters when exposure variables are recorded with error, in which a calibration equation for the unobserved exposure X is estimated. Namely, one builds a model for E(X|X*, Z), where X* is the error-prone observation or surrogate for X while Z are the other precisely observed covariates in the outcome model (3). Regression calibration may be used when X* follows the classical measurement error model or when X* has both systematic and classical random error. These error settings will be explained in further detail in the subsequent section. Rosner et al8 introduced a post hoc calibration fix in the logistic regression setting when there is measurement error in a single covariate of interest and Rosner et al9 extended the method to handle multiple error-prone covariates in logistic regression. In each of these approaches, the calibration equation is used to correct the naive parameter estimates that are obtained from first fitting the outcome regression that ignores the measurement error. An asymptotic formula for the variance that incorporates the uncertainty of the calibration equation is derived using the Delta method. We will employ a similar post hoc calibration fix-up for the estimator that first corrects for outcome measurement error. We further justify why this post hoc correction approach is expected to work well in our discrete-time proportional hazards setting at the end of this section.

2.2.2 |. Proposed approach for outcome and covariate error

Recall that Xi is a p-dimensional vector of true, unobserved covariates, while Zi is a q-dimensional vector of observed, precisely measured covariates possibly correlated with Xi. Instead of observing Xi, we assume an error-prone Xi* is observed, where Xi* is assumed to be linearly related with Xi and possibly other covariates Zi. This error model has been commonly applied in many settings, including nutritional epidemiology.5,15 The regression calibration model then takes the following form:

X_i = δ^{(0)} + δ^{(1)} X_i* + δ^{(2)} Z_i + U_i,   (4)

where U_i is a random, mean 0 error term, which is independent of X_i* and Z_i. Equation (4) directly implies that our observed, error-prone variable X_i* follows the linear measurement error model, that is, X_i* = α^{(0)} + α^{(1)} X_i + α^{(2)} Z_i + e_i, where the random error e_i is independent of X_i and Z_i.15 Note that we also assume nondifferential error, that is, the distribution of T conditional on (X, X*, Z) is equal to the distribution of T conditional on (X, Z). The model parameters in Equation (4) are identifiable if we have a calibration subset available in which we observe the error-prone measure X_i*, as well as a measure X_i** that is unbiased for the true X_i and follows the classical measurement error model:

X_i** = X_i + ϵ_i,   (5)

where ϵ_i is random, mean 0 error that is independent of X_i. X** is often referred to as an imperfect reference or alloyed gold standard.16,17 Note that the ϵ_i are assumed to be independent of all variables in the outcome model (3). Observing the exact true exposure X_i in the ancillary data is a special case of observing X_i** in which the error variance is 0; the subset is then typically called a validation subset. A special case of the linear measurement error model occurs when α^{(0)} = α^{(2)} = 0 and α^{(1)} = 1, so that the observed error-prone measurement X_i* has classical measurement error. In this scenario, we can estimate the parameters of the calibration model by assuming that we observe replicates of X*. Ancillary data of this type are typically referred to as a reliability subset.
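In a calibration subset, the calibration coefficients are estimated by ordinary least squares, regressing the reference measure X** on an intercept, the error-prone measure X*, and the precisely measured covariates Z. A minimal Python sketch (the function name is ours, not from the paper's R code):

```python
import numpy as np

def fit_calibration(x_ref, x_star, Z):
    """Least-squares fit of the calibration regression: X** on (1, X*, Z).

    Returns (delta0, delta1, delta2), the estimated intercept, coefficient
    on the error-prone measure, and coefficients on the other covariates.
    """
    A = np.column_stack([np.ones(len(x_star)), x_star, Z])
    coef, *_ = np.linalg.lstsq(A, x_ref, rcond=None)
    return coef[0], coef[1], coef[2:]
```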

When a calibration or validation subset is available, one can adopt a regression calibration type approach to further correct the regression coefficients for error in the exposure variable. In the case of a calibration subset, we regress Xi** on the error-prone exposure, Xi*, and other covariates of interest Zi to fit the model:

X_i** = δ^{(0)} + δ^{(1)} X_i* + δ^{(2)} Z_i + V_i,   (6)

where V_i is a random, mean 0 error term. Note that the model in Equation (6) differs from that in Equation (4) only in that the error term V_i incorporates the extra variability introduced by the error term in X_i**. Estimates of the coefficients from fitting this linear regression can then be used to correct the β coefficients from the time-to-event model. Following the approach of Rosner et al,9 the corrected β can be found by solving:

β̂ = β̂* Δ̂^{−1},   (7)

where β̂* is the partially “naive” regression coefficient vector obtained from the time-to-event model ignoring the error in X*, and Δ̂, the estimated multivariate correction factor, is defined as:

Δ̂ = [ δ̂^{(1)}_{p×p}   δ̂^{(2)}_{p×q} ]
    [ 0_{q×p}         I_{q×q}       ].   (8)

The variance-covariance matrix Σ for β̂ is calculated using the multivariate delta method. We assume that β̂* and Δ̂ are independent; this holds exactly if the calibration subset is an independent group of individuals from the main study (ie, the main study data and the calibration subset are either independent datasets or mutually exclusive subsets of the same set of data) and holds approximately if the number of subjects in the calibration subset, n_c, is a small percentage of the main study sample size, N.8 Under this independence assumption, we can apply the same formulas as Rosner et al,9 and therefore the (j_1, j_2)th element of Σ̂ for β̂ is

Σ̂_β(j_1, j_2) ≈ (Â^T Σ̂_{β*} Â)_{j_1, j_2} + β̂*^T Σ̂_{A, j_1, j_2} β̂*,   (9)

where Â = Δ̂^{−1}, Σ̂_{β*} is the estimated variance-covariance matrix of β̂*, and Σ̂_{A, j_1, j_2} is described below. Note that Σ̂_{β*} can be estimated from the model introduced above that adjusts only for outcome error. The element Σ̂_β(j_1, j_2) is essentially a sum of two pieces: the first can be viewed as the contribution of the uncertainty in estimating β*, and the second as the contribution of the uncertainty in the calibration coefficients. Following Rosner et al,9 the (i_1, i_2)th element of Σ̂_{A, j_1, j_2}, for i_1, i_2, j_1, j_2 = 1, …, w (w = p + q), is

Σ̂_{A, j_1, j_2}(i_1, i_2) ≈ Σ_{r=1}^{w} Σ_{s=1}^{w} Σ_{t=1}^{w} Σ_{u=1}^{w} Â_{i_1 r} Â_{s j_1} Â_{i_2 t} Â_{u j_2} Cov(Δ̂_{rs}, Δ̂_{tu}).   (10)

In the simple linear regression case, the post hoc correction presented in Equation (7) reduces to the following familiar form: β̂ = β̂*/δ̂, where β̂* is the estimate for β obtained from the “naive” regression using X_i* that ignores the error in the covariate of interest, and δ̂ is the estimate of the attenuation coefficient from the simple linear regression correction. Similarly, the variance estimator for this correction is easily calculated using the univariate delta method as var(β̂) = (1/δ̂²) var(β̂*) + (β̂*²/δ̂⁴) var(δ̂).
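Both the multivariate correction of Equations (7)-(8) and the univariate special case amount to a few matrix operations. A Python sketch, with function names of our own choosing (the multivariate delta-method variance of Equations (9)-(10) is omitted for brevity):

```python
import numpy as np

def correct_beta(beta_star, delta1, delta2):
    """Post hoc correction of Eq. (7): beta_hat = beta_star * Delta^{-1},
    with Delta the (p+q) x (p+q) block matrix of Eq. (8)."""
    p, q = delta1.shape[0], delta2.shape[1]
    Delta = np.block([[delta1, delta2],
                      [np.zeros((q, p)), np.eye(q)]])
    return beta_star @ np.linalg.inv(Delta)

def univariate_correction(beta_star, var_beta_star, delta1, var_delta1):
    """Single error-prone covariate: beta_hat = beta_star / delta1,
    with the univariate delta-method variance given in the text."""
    beta = beta_star / delta1
    var = var_beta_star / delta1**2 + beta_star**2 * var_delta1 / delta1**4
    return beta, var
```

In the p = q = 1 case, correct_beta divides the coefficient on the error-prone covariate by the attenuation coefficient and absorbs the corresponding shift into the coefficient on Z, matching the block structure of Δ̂.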

Rosner et al9 justified this proposed correction for logistic regression for small β. One can use a Taylor series approximation to show when this method can be expected to work similarly for the Cox proportional hazards model. Specifically, Green and Symons18 used a linear Taylor series expansion to illustrate the approximate mathematical equivalence between the logistic regression model and the Cox proportional hazards model when the event of interest is rare, the follow-up time is short, and the baseline hazard in the Cox model is constant. The post hoc regression parameter correction developed for logistic regression is expected to do similarly well for the Cox proportional hazards model for settings that uphold these assumptions. We explore this further with a numerical study. In Section S3 of the Supplementary Materials, we establish the asymptotic properties of our estimator. Computational details and R code for implementing the proposed method are presented in Section S1 of the Supplementary Materials. The R code used to implement all simulations is available on GitHub at https://github.com/lboe23/Outcome-Error-RC.

2.2.3 |. Strata-specific baseline hazards

For a continuous failure time outcome, the proportional hazards model takes the familiar form S(t) = S_0(t)^{exp(x^T β_X + z^T β_Z)}. Under this assumption, the baseline survival function S_0(t) and baseline hazard function λ_0(t) are shared by all subjects in the data. Oftentimes, however, this assumption is invalid and we expect baseline survival to differ across groups defined by one or more covariates. To address the issue of nonproportional hazards, we let the survival function for a subject from stratum k be S_k(t) = S_{0k}(t)^{exp(x^T β_X + z^T β_Z)}, k = 1, …, K, where S_{0k}(t) is the baseline survival for all individuals in stratum k.

In a discrete proportional hazards model that incorporates stratification, we allow strata-specific versions of the baseline survival function introduced in Section 2.1, such that Sk = (S1k, S2k, …, S(J+1)k)T. We can accordingly modify the log-likelihood function from Equation (3) to allow for stratification on one or more predictors. As in the continuous time setting, the stratified log-likelihood for all N subjects is a simple sum of the log-likelihood for each stratum. Now, in our discrete failure time setting, the log-likelihood function for the Nk subjects in stratum k is given by:

l_k(S_k, β) = Σ_{i=1}^{N_k} log( Σ_{j=1}^{J+1} D_{ij} S_{jk}^{exp(x_i^T β_X + z_i^T β_Z)} ).   (11)

Correspondingly, the log-likelihood for all N subjects is calculated as follows:

l(S_1, …, S_K, β) = Σ_{k=1}^{K} [ Σ_{i=1}^{N_k} log( Σ_{j=1}^{J+1} D_{ij} S_{jk}^{exp(x_i^T β_X + z_i^T β_Z)} ) ].   (12)

Using this likelihood, we can solve for the unknown parameters βX, βZ, S2k, …, S(J+1)k, k = 1, …, K and compute the estimated covariance matrix as described in Section 2.1. Although the baseline survival functions are different for each stratum, the coefficients βX and βZ are assumed to be uniform across all strata. Note, in the setting without misclassification in the event indicator, strata should be chosen such that each stratum contains subjects with the event of interest, as a stratum with no events does not contribute any information to the analysis.19 However, with a sensitivity less than 1, events and nonevents of a stratum both contribute to the likelihood (12). Under this model, we can apply the same post hoc fix introduced in Section 2.2.2 to also correct the estimated coefficients for exposure error.

2.2.4 |. Adjusting for false negatives at baseline

The proposed method can be modified to handle the case in which individuals with a false negative test at baseline are erroneously included in the analysis. This simple extension of the method applies to scenarios in which subjects are only included in the study if they report being event-free at baseline. This extension is motivated by the analysis approach of Tinker et al,20 which excluded anyone with a positive self-report at baseline. To allow for a nonzero probability of a false negative test at baseline, we now assume S_1 < 1.

Let Ri and Ei be the observed error-prone event status at baseline and the unobserved true event status at baseline, respectively. Consider all subjects in the study that have a negative error-prone outcome at baseline, that is, Ri = 0, and are therefore included in the analysis population. Define η as the negative predictive value, or the probability that a subject with a negative error-prone outcome is truly disease-free, that is, η = Pr(Ei = 0|Ri = 0), which we assume is constant across all N subjects. Further assume all subjects with a negative error-prone outcome who are truly disease-free constitute a random sample of all subjects who are truly disease-free at baseline, so that Pr(Yi, ti, ni|Ei = 0, Ri = 0) = Pr(Yi, ti, ni|Ei = 0). Then, the likelihood function for subject i can be expressed as follows:

f(Y_i, t_i, n_i) = Pr(Y_i, t_i, n_i | R_i = 0)
               = η Pr(Y_i, t_i, n_i | E_i = 0, R_i = 0) + (1 − η) Pr(Y_i, t_i, n_i | E_i = 1, R_i = 0)
               = η Σ_{j=1}^{J+1} D_{ij} S_j^{exp(x_i^T β_X + z_i^T β_Z)} + (1 − η) D_{i1} S_1^{exp(x_i^T β_X + z_i^T β_Z)}.

Thus, the log-likelihood for all N subjects is

l(S, β) = Σ_{i=1}^{N} log( D_{i1} S_1^{exp(x_i^T β_X + z_i^T β_Z)} + η Σ_{j=2}^{J+1} D_{ij} S_j^{exp(x_i^T β_X + z_i^T β_Z)} ).   (13)
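The adjusted log-likelihood (13) remains a sum of simple per-subject terms. A brief Python sketch, assuming the matrix of D_{ij} terms and the baseline survival probabilities are already in hand (function and argument names are ours):

```python
import numpy as np

def loglik_baseline_fn(D, s, eta, lin_pred):
    """Log-likelihood (13) for subjects enrolled after a negative baseline
    screen, with negative predictive value eta and S_1 < 1 allowed.

    D        : N x (J+1) matrix of D_{ij} terms.
    s        : baseline survival probabilities (S_1, ..., S_{J+1}).
    lin_pred : linear predictor x_i^T beta_X + z_i^T beta_Z per subject.
    """
    Spow = s[None, :] ** np.exp(lin_pred)[:, None]
    # First-interval term enters at full weight; later intervals are
    # down-weighted by eta, matching Eq. (13).
    contrib = D[:, 0] * Spow[:, 0] + eta * (D[:, 1:] * Spow[:, 1:]).sum(axis=1)
    return np.log(contrib).sum()
```

Setting eta = 1 recovers the per-subject contributions of the unadjusted log-likelihood (3).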

3 |. NUMERICAL STUDY

We examine the numerical performance of our proposed estimator using a simulation study. We compare our estimator to the results from the “true” model, in which a discrete proportional hazards model is fit with the true (error-free) event time and covariate values, and the “naive” model, which fits the same model with the error-prone outcome and covariate. In all simulations, we assume a single error-prone covariate of interest. We assume that there are two precisely measured covariates, which are moderately correlated with the error-prone variable. Our results show how our estimator performs under different levels of outcome sensitivity and specificity, error variance in the covariate, sample size, and censoring rates. We present percent biases, average standard errors (ASE), empirical standard errors (ESE), and 95% coverage probabilities (CP) across these various settings. Mean percent bias is calculated as (β̂ − β)/β × 100, where β is the target regression parameter of interest. The ASE is defined as the mean of the estimated standard errors from the model, while the ESE is the empirical SD of the estimated coefficients across simulations. In addition, we present type I error results for βX1 = 0 and α = 0.05, where βX1 is the regression parameter corresponding to the error-prone covariate.

3.1 |. Simulation setup

We present results from 1000 simulations run in R version 3.5.2.21 The three covariates, X1, Z1, and Z2 were generated from a multivariate normal distribution, all with mean 0 and a covariance matrix with all diagonal elements equal to 1 and all off-diagonal elements equal to 0.3. We generated our error-prone covariate X1* using the linear measurement error model, X1* = α0 + α1X1 + α2Z1 + α3Z2 + e, with α0 = 1, α1 = 0.8, α2 = 0.3, and α3 = 0.5. We assumed e ~ N(0, σ2) and considered σ2 values of 0.59 and 1.72, which correspond to estimated δ(1) values of approximately 0.60 and 0.30, respectively.

Later, we assess how our method performs when the error is not normally distributed, but instead follows either the mixture e ~ 0.4 N(0, 1) + 0.6 N(2, 1.5) or a t distribution with 4 degrees of freedom (df). For all simulations, there are N = 1000 subjects in the main study data. We assume our calibration subset is a random sample of nC = 500 subjects from the main study. The measure approximating X1 in the calibration subset, X1**, is generated to follow the classical measurement error model from Equation (5), where ϵ ~ N(0, 0.06).

We considered typical settings for which regression calibration has been observed to perform well, including a moderate βX1 and a higher censoring rate (CR).22 The true log HR were selected to be βX1 = log(1.5), βZ1 = log(0.7), and βZ2 = log(1.3). Later, we set βX1 = log(3) to assess how the method performs under a more extreme regression coefficient corresponding to the error-prone covariate. The true time-to-event was generated from a continuous time exponential distribution. To mimic the settings of real data, we considered a follow-up schedule with four possible visit times. To obtain an average true CR of approximately 0.90, we set the visit times to be {2,5,7,8} with baseline hazard rates of 0.012 and 0.008 for βX1 = log(1.5) and βX1 = log(3), respectively. Fixing the visit times at {1, 3, 4, 6} and baseline hazard rates at 0.094 and 0.076 for βX1 = log(1.5) and βX1 = log(3), respectively, leads to an average true CR of approximately 0.55. Note that the visit times are not required to be equally spaced. Figure S1 in the Supplementary Materials depicts the estimated nonparametric maximum likelihood estimators of the survival distribution for the true and error-prone outcomes under the two CRs for βX1 = log(1.5) for a single simulated dataset.

To assess how our method performs when the baseline hazard varies across strata, we simulate four approximately equal sized strata. For test times at {2, 5, 7, 8}, we let the four baseline hazard rates be 0.008, 0.010, 0.011, and 0.019, which resulted in an overall CR of approximately 90%. Similarly, to obtain an overall CR of approximately 55%, the baseline hazard rates for each stratum were fixed at 0.090, 0.080, 0.075, and 0.131 for visit times at {1, 3, 4, 6}.

To capture the interval in which each simulated event occurred, we created an indicator for whether or not the current visit time was greater than the actual event time. This indicator variable was “corrupted” using sensitivity and specificity values in order to create the error-prone vector of outcomes, Yi. To mimic a diagnostic test with different levels of accuracy, we considered the case where sensitivity = 0.90 while specificity = 0.80, and sensitivity = 0.80 while specificity = 0.90. Later, we assess the performance of the proposed method when a baseline negative predictive value (η) less than 1 is incorporated into the analyses to adjust for erroneously included false negative participants. We vary η between 0.98 and 0.90. To simulate this scenario, we set the true time-to-event equal to 0 for a fixed proportion, 1 − η, of subjects included in the data. This represents an event time prior to the start of the study. In addition, we show that the proposed method can handle different visit structures by allowing each visit to be subject to a constant, independent probability of missingness, which mimics the missing completely at random setting for missing data. To simulate this, we create a binary variable indicating whether the jth visit is missing for each subject using a fixed probability PMiss of either 0.10 or 0.40. We further assess the method under parameters that mimic the structure of the WHI data example, with N = 65 000, nC = 500, Se = 0.61 and Sp = 0.995, η = 0.96, and a CR of 95% for the error-prone discrete failure time. To simulate self-reported outcomes in the WHI data, we stopped visit times for each subject after the first positive error-prone outcome. Regression coefficients for the discrete time Cox proportional hazards model and the true data can be estimated by fitting a generalized linear model assuming the binomial outcome and complementary log-log link.23
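The outcome-corruption step described above amounts to independent Bernoulli flips of the true event-status indicator at each visit. A Python sketch of this step (function name ours; the simulation code released with the paper is in R):

```python
import numpy as np

def corrupt_outcomes(event_time, visit_times, se, sp, rng):
    """Turn a true event time into an error-prone 0/1 outcome vector.

    True status at each visit is 1 if the visit time exceeds the event time;
    true positives are then reported with probability se, and true negatives
    are falsely reported positive with probability 1 - sp.
    """
    true_status = (np.asarray(visit_times) > event_time).astype(int)
    u = rng.uniform(size=true_status.size)
    return np.where(true_status == 1,
                    (u < se).astype(int),
                    (u < 1 - sp).astype(int))
```

With Se = Sp = 1 the returned vector equals the true status indicators, which is the check used in the "true" model comparison.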

3.2 |. Simulation results

Tables 1, 2, 3, and 4 present estimates of mean percent bias, ASE, ESE, and 95% CP across the various settings described above. For Table 1, we consider the case where βX1 = log(1.5). Overall, we see that the proposed method improves over naive analyses in bias and in the nominal coverage of 95% confidence intervals (CI). In fact, across the various settings considered, the percent bias of our parameters of interest never exceeds 5%. In addition, we maintain nominal coverage for a 95% CI. Furthermore, our ASEs closely resemble the ESEs, demonstrating that our SE estimates also performed well. By contrast, for the analyses that ignore measurement error, estimates of βX1, βZ1, and βZ2 have bias as high as −96.33% and attain very little coverage. Table S1 in the Supplementary Materials further shows results for the method that corrects for covariate error only and the method that corrects for outcome error only under these same simulation settings. Regression parameters for the method correcting only for covariate error have absolute mean percent biases ranging from 47.05 to 82.31, while the method correcting only for outcome error has bias ranging from 5.840 to 70.28. Unsurprisingly, the proposed method greatly improves over all three alternative approaches, each of which ignores some or all of the measurement error.

TABLE 1.

The mean percent (%) biases, average standard errors (ASE), empirical standard errors (ESE), and coverage probabilities (CP) are given for 1000 simulated datasets for the proposed method and naive method with βX1 = log(1.5), βZ1 = log(0.7), and βZ2 = log(1.3); e is normally distributed with mean zero

Se^a = 0.80, Sp^b = 0.90
Proposed
Naive
δ^(1) c CR d β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 1.616 0.200 0.204 0.950 −88.03 0.046 0.046 0.000
β Z1 −1.094 0.143 0.142 0.945 −79.22 0.057 0.058 0.002
β Z2 −3.731 0.143 0.143 0.945 −84.07 0.057 0.054 0.021
0.55 β X1 −1.231 0.093 0.094 0.949 −68.11 0.038 0.038 0.000
β Z1 −1.055 0.067 0.066 0.958 −43.46 0.047 0.046 0.079
β Z2 −3.018 0.066 0.065 0.957 −53.48 0.046 0.045 0.133

0.30 0.90 β X1 1.840 0.283 0.286 0.954 −93.88 0.033 0.033 0.000
β Z1 −1.233 0.151 0.151 0.947 −82.46 0.054 0.055 0.001
β Z2 −4.212 0.151 0.150 0.945 −79.74 0.054 0.052 0.025
0.55 β X1 −2.246 0.131 0.133 0.940 −84.02 0.027 0.027 0.000
β Z1 −1.967 0.071 0.069 0.951 −52.48 0.045 0.044 0.008
β Z2 −3.899 0.070 0.068 0.956 −42.08 0.045 0.044 0.306

Se = 0.90, Sp = 0.80
Proposed
Naive
δ^(1) CR β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 0.391 0.210 0.209 0.957 −93.08 0.037 0.037 0.000
β Z1 −3.692 0.150 0.153 0.942 −91.96 0.046 0.045 0.001
0.55 β X1 −1.246 0.094 0.093 0.960 −77.95 0.034 0.035 0.000
β Z1 −1.188 0.068 0.067 0.951 −61.05 0.042 0.042 0.001
β Z2 −3.502 0.067 0.066 0.953 −68.46 0.042 0.042 0.014

0.30 0.90 β X1 0.665 0.296 0.291 0.967 −96.33 0.026 0.026 0.000
β Z1 −0.963 0.158 0.160 0.951 −90.21 0.044 0.044 0.000
β Z2 −4.214 0.158 0.160 0.947 −89.56 0.044 0.043 0.001
0.55 β X1 −2.034 0.133 0.130 0.964 −88.87 0.024 0.024 0.000
β Z1 −1.994 0.072 0.070 0.950 −67.22 0.040 0.040 0.000
β Z2 −4.420 0.071 0.069 0.959 −60.63 0.040 0.040 0.029

Se = 1, Sp = 1
Truth
CR β % Bias ASE ESE CP

0.90 β X1 0.163 0.108 0.109 0.944
β Z1 0.205 0.107 0.107 0.953
β Z2 −0.586 0.107 0.109 0.949

0.55 β X1 0.639 0.052 0.052 0.948
β Z1 0.345 0.052 0.051 0.949
β Z2 −0.383 0.052 0.052 0.952
a Se = Sensitivity.
b Sp = Specificity.
c δ^(1) = Estimate of attenuation coefficient.
d CR = True censoring rate.

TABLE 2.

The mean percent (%) biases, average standard errors (ASE), empirical standard errors (ESE), and coverage probabilities (CP) are given for 1000 simulated datasets for the proposed method and naive method with βX1 = log(3), βZ1 = log(0.7), and βZ2 = log(1.3); e is normally distributed with mean zero

Se^a = 0.80, Sp^b = 0.90
Proposed
Naive
δ^(1) c CR d β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 −3.442 0.211 0.213 0.946 −88.61 0.047 0.048 0.000
β Z1 −6.773 0.146 0.145 0.941 −78.03 0.057 0.059 0.002
β Z2 −9.280 0.145 0.142 0.948 −88.07 0.057 0.054 0.012
0.55 β X1 −12.71 0.111 0.101 0.752 −72.45 0.040 0.038 0.000
β Z1 −12.57 0.075 0.068 0.916 −45.77 0.047 0.047 0.078
β Z2 −14.42 0.075 0.066 0.952 −67.90 0.047 0.045 0.032

0.30 0.90 β X1 −4.532 0.296 0.295 0.951 −94.26 0.033 0.033 0.000
β Z1 −8.063 0.156 0.152 0.944 −86.52 0.054 0.056 0.000
β Z2 −11.64 0.155 0.149 0.951 −77.03 0.054 0.052 0.025
0.55 β X1 −16.88 0.154 0.137 0.766 −86.56 0.028 0.027 0.000
β Z1 −16.75 0.080 0.071 0.899 −67.26 0.045 0.045 0.000
β Z2 −18.69 0.080 0.069 0.956 −42.46 0.045 0.044 0.299

Se = 0.90, Sp = 0.80
Proposed
Naive
δ^(1) CR β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 −3.581 0.220 0.221 0.945 −93.60 0.038 0.039 0.000
β Z1 −6.164 0.152 0.153 0.936 −87.76 0.046 0.047 0.000
β Z2 −8.663 0.151 0.150 0.954 −94.13 0.046 0.045 0.000
0.55 β X1 −12.65 0.112 0.103 0.764 −80.88 0.035 0.035 0.000
β Z1 −12.64 0.076 0.071 0.915 −62.26 0.042 0.043 0.001
β Z2 −14.65 0.076 0.068 0.939 −78.22 0.042 0.042 0.001

0.30 0.90 β X1 −4.585 0.309 0.298 0.963 −96.73 0.027 0.027 0.000
β Z1 −7.171 0.163 0.159 0.947 −92.46 0.044 0.045 0.000
β Z2 −11.06 0.162 0.157 0.954 −88.01 0.044 0.043 0.000
0.55 β X1 −16.67 0.156 0.138 0.772 −90.62 0.025 0.024 0.000
β Z1 −16.65 0.082 0.073 0.901 −77.08 0.040 0.041 0.000
β Z2 −18.86 0.081 0.070 0.943 −60.56 0.040 0.040 0.025

Se = 1, Sp = 1
Truth
CR β % Bias ASE ESE CP

0.90 β X1 0.565 0.115 0.116 0.951
β Z1 −0.222 0.108 0.108 0.949
β Z2 −0.347 0.108 0.110 0.948

0.55 β X1 0.605 0.063 0.064 0.944
β Z1 0.264 0.054 0.054 0.952
β Z2 −0.162 0.054 0.052 0.955
a Se = Sensitivity.
b Sp = Specificity.
c δ^(1) = Estimate of attenuation coefficient.
d CR = True censoring rate.

TABLE 3.

The mean percent (%) biases, average standard errors (ASE), empirical standard errors (ESE), and coverage probabilities (CP) are given for 1000 simulated datasets for the proposed method and naive method with βX1 = log(1.5), βZ1 = log(0.7), and βZ2 = log(1.3); e is distributed as either a t with 4 df or as .4N(0,1)+.6N(2,1.5)

Se^a = 0.80, Sp^b = 0.90
Proposed
Naive
e c CR d β % Bias ASE ESE CP % Bias ASE ESE CP

t e 0.90 β X1 −2.238 0.291 0.300 0.953 −94.50 0.031 0.031 0.000
β Z1 0.622 0.152 0.157 0.951 −81.80 0.054 0.054 0.000
β Z2 2.529 0.152 0.148 0.957 −76.70 0.054 0.053 0.033
0.55 β X1 −0.646 0.140 0.153 0.940 −85.54 0.025 0.026 0.000
β Z1 −2.362 0.072 0.072 0.950 −53.37 0.045 0.044 0.013
β Z2 −3.303 0.071 0.072 0.950 −39.35 0.044 0.044 0.335

mix f 0.90 β X1 0.394 0.335 0.330 0.955 −95.64 0.027 0.028 0.000
β Z1 −1.015 0.158 0.158 0.953 −83.29 0.054 0.055 0.000
β Z2 −0.800 0.156 0.152 0.962 −75.25 0.054 0.056 0.055
0.55 β X1 −1.081 0.156 0.151 0.958 −88.74 0.022 0.022 0.000
β Z1 −2.415 0.074 0.070 0.958 −55.28 0.044 0.045 0.010
β Z2 −2.083 0.073 0.070 0.964 −36.40 0.044 0.045 0.419

Se = 0.90, Sp = 0.80
Proposed
Naive
e CR β % Bias ASE ESE CP % Bias ASE ESE CP

t 0.90 β X1 −3.792 0.305 0.316 0.942 −96.91 0.025 0.025 0.000
β Z1 1.848 0.160 0.165 0.948 −89.40 0.044 0.044 0.000
β Z2 3.386 0.159 0.158 0.959 −86.97 0.044 0.044 0.000
0.55 β X1 −1.119 0.141 0.159 0.933 −90.02 0.023 0.024 0.000
β Z1 −1.666 0.073 0.073 0.940 −67.86 0.040 0.040 0.000
β Z2 −3.048 0.072 0.074 0.944 −58.35 0.040 0.040 0.031

mix 0.90 β X1 −0.975 0.350 0.346 0.952 −97.65 0.022 0.024 0.000
β Z1 −1.354 0.166 0.162 0.961 −90.94 0.043 0.044 0.000
β Z2 0.585 0.164 0.160 0.955 −86.57 0.043 0.045 0.000
0.55 β X1 −1.904 0.159 0.155 0.955 −92.26 0.020 0.021 0.000
β Z1 −2.590 0.075 0.072 0.954 −69.29 0.040 0.040 0.000
β Z2 −1.350 0.074 0.069 0.967 −56.67 0.040 0.039 0.036

Se = 1, Sp = 1
Truth
CR β % Bias ASE ESE CP

0.90 β X1 0.002 0.108 0.106 0.959
β Z1 0.034 0.108 0.109 0.951
β Z2 1.032 0.107 0.106 0.961

0.55 β X1 0.395 0.053 0.052 0.952
β Z1 −0.462 0.052 0.052 0.948
β Z2 −0.300 0.052 0.050 0.954
a Se = Sensitivity.
b Sp = Specificity.
c e refers to the distribution of the error.
d CR = True censoring rate.
e t with 4 df.
f Mixture of two normals, that is, .4N(0,1)+.6N(2,1.5).

TABLE 4.

The mean percent (%) biases, average standard errors (ASE), empirical standard errors (ESE), and coverage probabilities (CP) are given for 1000 simulated datasets for the proposed method and naive method, when both allow for strata-specific baseline hazards

Se^a = 0.80, Sp^b = 0.90
Proposed
Naive
δ^(1) c CR d β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 1.893 0.202 0.199 0.961 −88.44 0.047 0.046 0.000
β Z1 3.249 0.145 0.148 0.954 −78.36 0.057 0.058 0.001
β Z2 −0.263 0.144 0.151 0.946 −81.41 0.057 0.058 0.044
0.55 β X1 −0.489 0.094 0.089 0.965 −68.31 0.038 0.038 0.000
β Z1 0.001 0.068 0.066 0.960 −42.99 0.047 0.047 0.095
β Z2 −0.885 0.067 0.066 0.958 −52.09 0.047 0.048 0.172

0.30 0.90 β X1 1.036 0.286 0.280 0.959 −94.20 0.033 0.033 0.000
β Z1 2.777 0.153 0.154 0.956 −81.55 0.055 0.056 0.000
β Z2 −0.353 0.152 0.159 0.944 −77.15 0.055 0.056 0.046
0.55 β X1 −1.095 0.133 0.126 0.962 −84.08 0.027 0.027 0.000
β Z1 −0.866 0.071 0.070 0.964 −51.93 0.045 0.044 0.015
β Z2 −1.897 0.071 0.069 0.960 −40.78 0.045 0.047 0.337

Se = 0.90, Sp = 0.80
Proposed
Naive
δ^(1) CR β % Bias ASE ESE CP % Bias ASE ESE CP

0.60 0.90 β X1 0.986 0.214 0.217 0.949 −93.42 0.037 0.038 0.000
β Z1 3.516 0.153 0.159 0.948 −87.94 0.046 0.048 0.000
β Z2 0.025 0.151 0.162 0.945 −89.97 0.046 0.047 0.002
0.55 β X1 −0.488 0.096 0.092 0.958 −78.24 0.034 0.034 0.000
β Z1 −0.100 0.069 0.068 0.961 −60.97 0.042 0.043 0.000
β Z2 −0.953 0.068 0.067 0.957 −67.67 0.042 0.042 0.020

0.30 0.90 β X1 −0.310 0.301 0.303 0.951 −96.68 0.027 0.027 0.000
β Z1 2.982 0.161 0.167 0.952 −89.72 0.044 0.046 0.000
β Z2 0.278 0.160 0.170 0.941 −87.53 0.044 0.045 0.002
0.55 β X1 −1.167 0.135 0.130 0.955 −89.05 0.024 0.024 0.000
β Z1 −0.943 0.073 0.072 0.962 −67.08 0.040 0.041 0.000
β Z2 −1.921 0.072 0.071 0.958 −59.89 0.040 0.041 0.039

Se = 1, Sp = 1
Truth
CR β % Bias ASE ESE CP

0.90 β X1 1.652 0.108 0.106 0.955
β Z1 2.270 0.108 0.110 0.949
β Z2 0.080 0.108 0.112 0.949

0.55 β X1 1.252 0.053 0.052 0.961
β Z1 1.153 0.053 0.052 0.961
β Z2 0.223 0.052 0.053 0.937

Note: We assume four equally sized strata. Let βX1 = log(1.5), βZ1 = log(0.7), and βZ2 = log(1.3); e is normally distributed with mean zero.

a Se = Sensitivity.
b Sp = Specificity.
c δ^(1) = Estimate of attenuation coefficient.
d CR = True censoring rate.

In Table 2, we set βX1 = log(3). The method still performs reasonably well when the CR is high (CR = 0.90), as absolute percent bias stays below 12% and nominal coverage is maintained. However, when the CR decreases to 0.55, we begin to see an increase in bias and a steep decrease in coverage, particularly for βX1. This is unsurprising, as regression calibration is known to break down with a larger β coefficient and a higher event rate.22 We observe that even in the most challenging scenarios for the proposed method, that is, a more extreme βX1, less censoring, and more covariate measurement error, the percent attenuation bias (coverage) was 17% (77%) compared with 91% (0%) for the naive analysis.

In Table 3, we examine the relative performance of our proposed method when the error in X* no longer follows a normal distribution. Here, we let the error in X* follow either a t distribution with 4 df or a mixture of two normals, as described in the simulation setup. On average, we observe δ^(1) = 0.27 when the error in X* follows the t distribution and δ^(1) = 0.21 when the error follows the mixture distribution, which reflects substantial error in our simulated covariate of interest in all scenarios. Since the applied regression calibration method uses a first-order approximation to estimate E(X|X*, Z), we expect the proposed method to perform best when the error in X* is normally distributed. Thus, it is unsurprising that the mean percent bias for the proposed method is a bit higher for βX1 under these settings, particularly when the error follows a t distribution. Nonetheless, absolute percent bias stays under 4% in all scenarios, and most intervals still come very close to achieving the nominal 95% CP. Our proposed approach still outperforms the naive method, which again shows severe bias of up to −97.65% and poor coverage.

Table 4 shows the performance of the proposed method alongside the naive method in terms of mean percent bias, ASE, ESE, and 95% CP when both approaches allow for stratification. In this table, we revert to letting the error in X* follow a normal distribution and set βX1 = log(1.5), and we assume four equally sized strata. Similar to what we observed in Table 1, the method performs well in terms of bias and coverage. Absolute bias for βX1, the coefficient of the error-prone covariate, ranges from 0.310% to 1.893% and is therefore quite low in all scenarios. The SE estimator works well, as indicated by the attainment of nominal coverage. Again, we see extremely high bias for the naive approach, ranging from −68.31% to −96.68% for βX1.

Type I error results for the coefficient corresponding to the error-prone covariate are presented in Table 5. Type I error values ranged from 0.039 to 0.058 across different values of Se, Sp, δ^(1), and CR. With 1000 simulations, a 95% CI based on the true error rate α = 0.05 is (0.036, 0.064). All calculated error rates in Table 5 are within simulation error of the truth, indicating that type I error is preserved by the proposed method in all settings.
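The simulation-error interval quoted above is the usual normal-approximation binomial confidence interval for a rejection rate of α = 0.05 over 1000 replicates; a quick check (our illustration, not the authors' code) reproduces it:

```python
import math

def binomial_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p estimated from n trials."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

lo, hi = binomial_ci(0.05, 1000)  # -> roughly (0.036, 0.064) to three decimals
```

Any empirical type I error rate falling inside this interval is consistent with the nominal 0.05 level up to Monte Carlo error.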

TABLE 5.

Type I error results for βX1 = 0 are given for 1000 simulated datasets for the proposed method

Se^a Sp^b δ^(1)c CR^d Type I Error
0.80 0.90 0.30 0.55 0.048
0.90 0.042
0.60 0.55 0.058
0.90 0.042

0.90 0.80 0.30 0.55 0.043
0.90 0.039
0.60 0.55 0.049
0.90 0.044

Note: Let βX1 = log(1.5), βZ1 = log(0.7), and βZ2 = log(1.3); e is normally distributed with mean zero.

a Se = Sensitivity.
b Sp = Specificity.
c δ^(1) = Estimate of attenuation coefficient.
d CR = True censoring rate.

Table S2 of the Supplementary Materials demonstrates the performance of the proposed method, now including adjustment for an imperfect baseline negative predictive value. Under different levels of covariate error and changes to the sensitivity and specificity, the bias of our parameters remains under 6% and nominal coverage for a 95% CI is maintained, illustrating that the method performs well. We observe that the performance of the proposed method surpasses that of the naive method, which shows excessive bias in the parameters of interest, ranging from −79.01% to −97.33%.

In Table S3 of the Supplementary Materials, we show that the proposed method can accommodate missed visits. Our approach performs well in all scenarios, maintaining an absolute mean percent bias of under 4.353% when we let each visit be subject to either 10% or 40% missingness. When there are missed visits, the proposed method outperforms the naive method, which shows extreme mean percent bias of up to −96.85%.

Finally, we present results for the simulations that mimic the structure of the WHI data in Table S4 of the Supplementary Materials. We see that the proposed method works well under measurement error settings similar to that of the WHI, maintaining an absolute percent bias of under 0.8% for all scenarios. Again, the proposed method outperforms the naive method, in which we see absolute percent bias as high as 89.53% for the regression parameter of interest and 0% CP for many scenarios. Similarly, the methods that correct for covariate error only and outcome error only both show extreme bias and inadequate coverage under these settings.

4 |. WHI EXAMPLE

4.1 |. WHI study

The WHI is a collection of studies launched in 1993 that together investigated the major causes of morbidity and mortality in US postmenopausal women.24 We seek to examine the association between energy, protein, and protein density (percentage of energy from protein) intakes and the risk of diabetes when all three exposures as well as diabetes status are self-reported and subject to error.14,25 We analyze data on postmenopausal women aged 50 to 79 who participated in either the comparison arm of the dietary modification trial (DM-C) or the observational study (OS) and who had an average follow-up of approximately 9 years.26,27 Neither women from the DM-C nor the OS received study interventions. The WHI also included the nutritional biomarker study, which collected objective recovery biomarkers for energy and protein intake, thought to have only classical measurement error, on a subset of participants (nC = 544). These biomarkers were previously used to develop calibration equations for the self-reported intakes of energy, protein, and protein density.25 Using these calibration equations, Tinker et al20 reported incident diabetes HR in this cohort for energy, protein, and protein density that were corrected for the error in self-reported dietary exposures. Self-reported diabetes in the WHI has been reported to be subject to error.28 We apply our proposed method to correct for error in both the exposures and the diabetes failure time outcome. Our goal was to answer a similar research question as Tinker et al,20 but using our method that additionally adjusts for error in the diabetes outcome. We adopted the same exclusion criteria as Tinker et al20 in order to arrive at our final analytic dataset of 65 358 participants. In short, these criteria attempt to align the characteristics of the DM-C and OS cohorts and exclude those with missing data or who reported diabetes at baseline.
Baseline was defined as the time of the first self-reported dietary assessment postenrollment, year 1 for the DM-C and year 3 for the OS. Further details are provided in Section S4 of the Supplementary Materials.

We started with the previously developed calibration equations for dietary energy, protein, and protein density from Neuhouser et al,25 which we call our “base” calibrations. Body mass index (BMI), age, race-ethnicity, income, and physical activity were included in the energy calibration model; BMI, age, race-ethnicity, income, and education for protein; and BMI, age, and smoking status for protein density. To avoid bias, regression calibration requires the calibration model to include the same covariates as the outcome model.9,29 We only considered the form of regression calibration in which the variables in the calibration and outcome models are exactly aligned. Thus, we extended each base calibration to include all predictors from our outcome model. Specifically, education, hypertension, and alcohol use were added to all calibrations. For each of the three nutrients, the calibration equation was fit by regressing the biomarker value (X**) on the corresponding self-reported value and participant characteristics, as described above.
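As a rough illustration of this calibration step (not the authors' implementation; the function names and the synthetic data below are ours), the biomarker value can be regressed on the self-reported value and covariates by ordinary least squares within the biomarker subset, and the fitted model then used to predict calibrated intakes for the full cohort:

```python
import numpy as np

def fit_calibration(biomarker, self_report, covariates):
    """OLS calibration model: regress the biomarker measure (X**) on the
    self-reported value (X*) and participant characteristics (Z)."""
    design = np.column_stack([np.ones(len(biomarker)), self_report, covariates])
    coef, *_ = np.linalg.lstsq(design, biomarker, rcond=None)
    return coef

def calibrate(coef, self_report, covariates):
    """Calibrated intake E(X | X*, Z): predictions from the fitted model,
    computed for everyone, not just the biomarker subset."""
    design = np.column_stack([np.ones(len(self_report)), self_report, covariates])
    return design @ coef

# Synthetic illustration: biomarker = 1 + 0.5 * self-report + 0.2 * covariate.
rng = np.random.default_rng(1)
sr, z = rng.normal(size=400), rng.normal(size=400)
coef = fit_calibration(1.0 + 0.5 * sr + 0.2 * z, sr, z)
```

In the WHI analysis the outcome-model covariates enter the design matrix alongside the self-report, mirroring the alignment requirement described above.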

In the WHI, prevalent diabetes was recorded via a self-reported questionnaire at baseline. We consider data from 8 years of annual follow-up visits in our analyses. Only the censored event-time was recorded in continuous time in our analytical dataset. Thus, we discretized the available data by dividing the follow-up time into nine possible intervals. Then, for all 65 358 women in our analytic cohort, we considered the time at which the first occurrence of self-reported diabetes or censoring time was recorded and assumed that the occurrence of the censored self-reported outcome happened in the annual interval that the event time fell into. We note that in other settings our method could accommodate an increase in the number of time intervals if follow-up occurred more frequently than once a year (eg, a biannual visit structure).
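Under the annual-visit assumption above, mapping a continuous censored event time to one of the nine intervals can be sketched as follows (an illustrative helper of our own, with times measured in years from baseline; an event at time t is assigned to the interval (k − 1, k] that contains it):

```python
import numpy as np

def discretize(times, n_intervals=9, width=1.0):
    """Map continuous follow-up times (years) to 1-based interval indices.

    An event at time t is assigned to the interval (k-1, k] containing it,
    capped at `n_intervals` (here, 8 years of annual follow-up plus a final
    interval absorbing any later times).
    """
    idx = np.ceil(np.asarray(times, dtype=float) / width).astype(int)
    return np.clip(idx, 1, n_intervals)

discretize([0.4, 1.0, 2.7, 8.9])  # -> array([1, 1, 3, 9])
```

A biannual visit structure would correspond to `width=0.5` with the number of intervals doubled accordingly.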

Self-reported diabetes in the WHI was previously reported to have a sensitivity of 0.61, specificity of 0.995, and a baseline negative predictive value of 0.96.14 We incorporated these values into our analyses. We also considered a sensitivity analysis in which we examined the results for a negative predictive value of 1 and explored cohort-specific values of sensitivity and specificity. All diabetes risk models were adjusted for standard risk factors, which were also included in the calibration equations. In addition, we stratified our discrete proportional hazards models on age in 10-year categories and on DM-C or OS membership to better approximate previous analyses. Because BMI may be only a mediator for energy intake or may possibly also be an independent risk factor, it is not clear whether adjusting for BMI in our diabetes risk model is appropriate, due to the challenge of overcontrolled or undercontrolled models discussed in Tinker et al.20 Thus, we ran each outcome model with and without BMI.

To fit the naive model, we used the binomial generalized linear model with the complementary log-log link. To fit the model corrected for covariate error only, we used this same approach, then adopted the post hoc matrix correction and corresponding variance adjustment described in the body of this article. We applied our proposed approach to correct for error in both the self-reported diabetes outcome and dietary exposures. In all models, we used log values of dietary energy, protein, and protein density. We present HR and 95% CI associated with a 20% increase in consumption.
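Because the exposures enter the model as log values, the HR for a 20% increase in consumption is exp(β log 1.2), with the CI scaled the same way. A small helper (ours, not the authors' code; the β and SE in the example are arbitrary illustrative values):

```python
import math

def hr_for_pct_increase(beta, se, pct=0.20, z=1.96):
    """Hazard ratio and 95% CI for a (1 + pct)-fold multiplicative increase
    in an exposure that enters the model on the log scale:
    HR = exp(beta * log(1 + pct))."""
    scale = math.log(1.0 + pct)
    hr = math.exp(beta * scale)
    ci = (math.exp((beta - z * se) * scale),
          math.exp((beta + z * se) * scale))
    return hr, ci

hr, ci = hr_for_pct_increase(beta=0.55, se=0.20)  # illustrative inputs
```

This rescaling is why HR for a 20% increase sit much closer to 1 than the raw per-log-unit coefficients would suggest.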

4.2 |. Results

Incident diabetes was reported in 3053 (4.7%) of the 65 358 participants of the analytic cohort. Table 6 shows the results for the three different analysis approaches. In the BMI-adjusted analysis, the HR (95% CI) for a 20% increase in energy intake was 0.822 (0.512, 1.318) for the proposed approach compared with 1.041 (0.758, 1.429) for the covariate-error adjusted method and 1.002 (0.986, 1.018) for the naive approach. Note, however, that incident diabetes is not significantly associated with increasing energy intake in any of these three models. Without BMI in the outcome model, the proposed method estimated a HR of 1.189 (0.836, 1.692) for a 20% increase in energy intake, compared with 1.421 (1.043, 1.938) for the covariate-error adjusted method and 1.024 (1.008, 1.040) for the naive method. In this case, adjusting for error in the self-reported outcome led to qualitatively different results in that the HR was about 20% smaller and no longer significant.

TABLE 6.

Hazard ratio (HR) and 95% confidence interval (CI) estimates of incident diabetes for a 20% increase in consumption of energy (kcal/d), protein (g/d), and protein density (% energy from protein/d) based on the naive method ignoring error in the outcome and covariate, the regression calibration method that corrects for covariate error only, and the proposed method

Modela Method HR (95% CI)
Adjusted for BMIb Not adjusted for BMI
Energy (kcal/d) Naive 1.002 (0.986, 1.018) 1.024 (1.008, 1.040)
Regression calibration 1.041 (0.758, 1.429) 1.421 (1.043, 1.938)
Proposed 0.822 (0.512, 1.318) 1.189 (0.836, 1.692)

Protein (g/d) Naive 1.024 (1.010, 1.039) 1.051 (1.035, 1.066)
Regression calibration 1.121 (1.036, 1.213) 1.231 (1.130, 1.342)
Proposed 1.077 (0.978, 1.186) 1.241 (1.114, 1.384)

Protein Density Naive 1.100 (1.064, 1.137) 1.128 (1.091, 1.167)
Regression calibration 1.243 (1.125, 1.374) 1.325 (1.181, 1.486)
Proposed 1.266 (1.115, 1.436) 1.327 (1.183, 1.490)

Note: Here, sensitivity = 0.61, specificity = 0.995, and negative predictive value = 0.96.

a Each model is adjusted for potential confounders and is stratified on age (10-year categories) and dietary modification trial or observational study cohort membership.
b BMI = Body mass index (kg/m2).

When we apply the proposed method, a 20% increase in protein intake is associated with a HR of 1.077 (0.978, 1.186), compared with a HR of 1.121 (1.036, 1.213) for the covariate-error adjusted method and 1.024 (1.010, 1.039) for the naive approach. When we do not adjust for BMI, all three approaches result in HR that are significantly associated with an increase in protein consumption. For protein density, whether or not we adjust for BMI, all three approaches show that a 20% increase in intake is positively associated with risk of diabetes. When we adjust for BMI, the HR estimated by the proposed method, 1.266 (1.115, 1.436), is fairly similar to the HR estimated by the method that adjusts for covariate error only, 1.243 (1.125, 1.374), and somewhat higher than the HR estimated by the naive method, 1.100 (1.064, 1.137). We note that some of our HR differ from the results reported by Tinker et al.20 We believe this is due to a few discrepancies in the analytical dataset and model, discussed further in Section S4 of the Supplementary Materials.

In Table S5 of the Supplementary Materials, we present a WHI data analysis results table that ignores the issue of an imperfect baseline self-report and assumes the negative predictive value is 1. For energy and protein density, assuming baseline self-reports are perfect does not qualitatively change our results. However, for protein, the HR (95% CI) estimated by the proposed method is 1.077 (0.978, 1.186) when the negative predictive value is set to 0.96, but changes to 1.107 (1.025, 1.195) when the negative predictive value is set to 1. Here, we see that because our estimate is so close to a boundary, incorporating the uncertainty at baseline into our analyses does slightly change our results.

Since we analyzed data on participants from two different cohorts, the WHI DM-C trial and the WHI OS, we investigated how cohort-specific sensitivity and specificity might impact our HR estimates. We used a weighted-average approach to select sensitivity and specificity values for the DM-C and OS trials such that the overall values worked out to be 0.61 and 0.995, respectively. One might hypothesize that the clinical trial (WHI DM-C) recorded data with higher accuracy than the larger OS, though in our analysis we also consider the possibility that sensitivity and specificity are higher for the OS. Table S6 in the Supplementary Materials presents the results of this analysis. We observe that implementing slightly variable cohort-specific sensitivity and specificity values was not enough to qualitatively impact our conclusions regarding the significance of the association between an increase in intake of dietary energy, protein, or protein density with the risk of diabetes.
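The weighted-average constraint can be illustrated with a toy calculation; the cohort weights and the DM-C sensitivity below are hypothetical values of our own choosing, selected only so that the overall sensitivity recovers 0.61:

```python
def overall_rate(values, weights):
    """Weighted average of cohort-specific rates (sensitivity or specificity);
    weights must sum to 1. The weighting scheme here is an illustrative assumption."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical split: if the DM-C carries 30% of the weight with Se = 0.66,
# the OS sensitivity solving 0.30 * 0.66 + 0.70 * se_os = 0.61 is about 0.589.
se_os = (0.61 - 0.30 * 0.66) / 0.70
```

Specificity values can be chosen the same way so that their weighted average recovers 0.995.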

5 |. DISCUSSION

In settings such as large epidemiological studies, where outcomes or complex exposures are often collected by self-report, both the exposure and outcome of interest can be subject to measurement error. This was observed in our data example from the WHI, but has also been observed in other cohorts that rely on routinely collected electronic health records.30,31 This article presents a method to accommodate errors in continuous covariates and a discrete failure time outcome when the sensitivity and specificity of the error-prone outcome are known; when error rates are unknown, our method can be used as a sensitivity analysis with hypothesized values. The proposed method can be applied when, for a subset, there is either a gold standard measure of the exposure or a second measure with independent, unbiased (classical) measurement error available. For the WHI, the calibration subset containing the variable with classical measurement error was sampled after baseline, with the assumption that the measurement error model did not change over time.

We studied the relative performance of the proposed method under various settings of sensitivity, specificity, error variance of the exposure, and CR, including settings where ignoring the measurement error led to extreme bias in the regression parameters of interest. In all settings studied, our method led to nearly unbiased estimates of the regression parameters, maintaining bias of less than approximately 19% for nonzero regression parameters and generally much less bias when the underlying log-hazard parameter β was of moderate size (eg, log(1.5)). Furthermore, our variance estimator performed favorably, as evidenced by the CP and by ASEs that closely resembled ESEs. Our variance estimator assumes approximate independence of β^* and δ^. While we have not verified independence of these components for all settings, even in our settings where the calibration subset was 50% of the cohort, we observed no appreciable correlation between these estimates (data not shown). If there is concern that this approximate independence does not hold, one could instead consider a bootstrap approach for variance estimation. For our simulations where βX1 = 0, we observed that type I error rates were preserved. Our adjustment for covariate error relied on a regression calibration type adjustment. As expected from previous literature, this method performs best when the regression parameter corresponding to the error-prone covariate is of modest size, the error in the covariate is normally distributed, and the CR is high (ie, the event of interest is rare). Our method shows more appreciable bias when the regression parameter is large, for example, βX1 = log(3), especially for a lower CR. The method proved fairly robust to the changes in the error distribution of X that we studied; for more extreme deviations from normality, this may no longer be true. Our method also performs favorably after stratifying on one or more covariates.
Finally, the proposed method works well under simulation parameters that mimic the structure of the WHI data. In all scenarios explored, the proposed method substantially outperformed the naive method, which repeatedly showed severe bias and minimal coverage. For settings different from those studied, one might consider conducting additional numerical studies.

The method introduced in this article is applied to data from 65 358 postmenopausal women enrolled in the WHI to assess the association between energy, protein, and protein density intake and the risk of incident diabetes, adjusting for error in self-reported exposures and outcome. HR obtained for all exposures were considerably different than those from the naive analyses ignoring the error in both diabetes status and dietary intake and those that only adjusted for error in dietary intake. In some cases, our proposed method led to qualitatively different conclusions in that the parameter of interest was no longer statistically significant. For the case of nondifferential outcome error, this stems largely from the increased uncertainty in the results coming from the uncertain outcomes. These conclusions demonstrate the importance of adjusting for errors in both outcomes and covariates.

Our proposed method offers a practical approach to estimating the association between a covariate and a discrete time-to-event outcome when both are recorded with error. A limitation of our approach stems from the curse of dimensionality that can accompany discrete data in settings where the visit times are irregular, which can cause the number of parameters to grow with the number of subjects in the data. It is unrealistic to assume that in a real data setting all subjects' visit times fall on the same schedule (eg, exactly annually). Thus, we must make a compromise depending on how many parameters the data can stably support. Ultimately, the data should help inform a reasonable decision regarding the number of intervals to consider for analyses of this type. Sensitivity analyses can also be conducted to examine whether the number or choice of discrete time intervals affects study estimates. In many cohort studies with long-term follow-up like the WHI, there is a specified visit schedule in the study protocol. If all subjects adhere to this schedule with little variation, this naturally leads to the discrete-time framework with a common set of possible visit times across all individuals. Frequently in these studies, including the WHI, the observed visit schedule varies across subjects. To apply the proposed method in our WHI example, we made some simplifying assumptions. Since our analytical dataset included only the amount of time that elapsed between enrollment and the first occurrence of self-reported diabetes or censoring, recorded on a continuous timescale, we rounded the censored event-time to the nearest annual visit date and assumed the outcome or censoring event occurred sometime between that visit and the prior annual visit. If data are available on the timing of all visits, the likelihood could be adapted to allow for longer intervals between visits for some individuals (ie, missed visits).

We note that for the case of self-reported data, we assume that each subject is followed up until the first positive, as it is not expected that a new diagnosis would be subsequently recorded. This assumption corresponds to the applied setting in which self-reported disease incidence stops after the first positive report. However, the model by Gu et al14 and thus the proposed approach do allow for a more flexible framework and can accommodate repeated testing. As an example, this approach can be applied to a dataset containing repeat blood test results, such as those used in monitoring for cancer relapse.

A potential limitation of our work is the reliance of the proposed method on the assumption that given the true disease status at each visit, the error-prone outcomes are independent. In the WHI data, we assume the self-reported outcomes are far enough apart that there are a number of random processes affecting a subject’s knowledge and interpretation of the outcomes questionnaire that make this independence assumption reasonable; however, this assumption may not always be realistic, particularly for the case of self-reported data. We note that our method is applicable more generally to settings where the error-prone outcome of interest is not self-reported, but derived say from an objective biomarker for which this assumption may be more reasonable. In future work, one might consider a similar framework to the one proposed which relaxes this assumption by positing a more complex error model for the outcome of interest, such as one with sensitivity and specificity potentially dependent on covariates or previous responses.

The increasing reliance of clinical research on self-administered questionnaires or administrative databases in epidemiological studies has led to more attention being given to methods to correct for measurement error. Gu et al14 conducted a sensitivity analysis to show how changes in sensitivity, specificity, and negative predictive value shifted the estimated HR of statin use on the risk of incident diabetes in data from the WHI. The results showed that the estimated HR is highly sensitive to changes in specificity and modestly sensitive to changes in sensitivity and negative predictive value. This analysis helps illustrate the importance of having accurate values of sensitivity and specificity in the proposed method. Our sensitivity analysis showed that while varying sensitivity and specificity by cohort did not qualitatively change the results in our particular example, the HR estimates are much more vulnerable to changes in specificity when the event of interest is as rare as it is in the WHI data (diabetes incidence = 4.7%). Thus, we emphasize the importance of employing correct values of sensitivity and specificity, especially when they might vary by some demographic factor or group membership.
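The outsized role of specificity for a rare outcome follows from a simple Bayes' rule calculation: when most subjects are disease free, even a small false positive rate generates false positives on a scale comparable to the true positives. The sketch below illustrates this with the positive predictive value of a single report; the sensitivity and specificity values in the usage note are arbitrary illustrations, not the values used in the WHI analysis:

```python
def ppv(se, sp, prevalence):
    """Positive predictive value of a single error-prone report via
    Bayes' rule. Illustrates why a rare outcome magnifies the impact of
    imperfect specificity: false positives arise from the large
    disease-free group, true positives from the small diseased group."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# With incidence near 4.7%, as in the WHI data, lowering specificity
# from 0.995 to 0.95 (sensitivity held at 0.8) drops the PPV from
# roughly 0.89 to roughly 0.44 -- the false positives begin to swamp
# the true positives.
```

The same mechanism drives the vulnerability of the HR estimates noted above: a small shift in specificity changes the composition of the apparent cases far more than an equal shift in sensitivity does.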

This article explored the incorporation of the negative predictive value into the analyses to handle misclassification at baseline. Evidence suggests that some women in the WHI who provided a negative self-report of diabetes at baseline were actually diabetic. A question of interest is whether mistakenly excluding women who were false positives can induce bias. It has been previously reported that when all potential confounders are adjusted for in the outcome model and the missing at random assumption is satisfied, missing data should not cause bias.32 Furthermore, given that the positive predictive value is assumed to be quite high in the motivating data example, we did not explore the issue further in this article. This matter related to exclusion criteria may be more relevant in other cohorts, particularly if the reason for exclusion is related to some unobserved characteristic.

A worthwhile extension of this work might consider incorporating covariate-specific or even subject-specific sensitivity and specificity, particularly when these values are no longer assumed to be known constants and need to be estimated along with the outcome model parameters. Such an extension would require a validation or calibration subset to also contain information on the measurement error structure of the self-reported outcome. When the outcome is rare, such a cohort can be difficult to construct prospectively as validation subsets are generally of fairly modest size due to cost. Efficient choices of a validation sampling design and development of analysis methods that provide consistent estimates of the target parameter are two important areas of future research.

Supplementary Material

supplemental material

ACKNOWLEDGEMENTS

This work was supported in part by NIH grant R01-AI131771 and PCORI grant R-1609-36207. The authors would like to thank the investigators of the Women’s Health Initiative (WHI) for the use of their data. A short list of WHI investigators can be found here: https://www.whi.org/researchers. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C.

Funding information

National Institutes of Health, Grant/Award Number: R01 AI131771; Patient-Centered Outcomes Research Institute, Grant/Award Number: R160936207

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The data used in this article can be obtained through submission and approval of an article proposal to the Women’s Health Initiative Publications and Presentations Committee, as described on the WHI website.33 For more details, see https://www.whi.org/researchers/data.

REFERENCES

1. Centers for Disease Control and Prevention. National Diabetes Statistics Report, 2017. Vol 20. Atlanta, GA: Centers for Disease Control and Prevention, US Department of Health and Human Services; 2017:1–20.
2. Shah BR, Manuel DG. Self-reported diabetes is associated with self-management behaviour: a cohort study. BMC Health Serv Res. 2008;8(1):142.
3. Ning M, Zhang Q, Yang M. Comparison of self-reported and biomedical data on hypertension and diabetes: findings from the China Health and Retirement Longitudinal Study (CHARLS). BMJ Open. 2016;6(1):e009836.
4. Schneider AL, Pankow JS, Heiss G, Selvin E. Validity and reliability of self-reported diabetes in the Atherosclerosis Risk in Communities Study. Am J Epidemiol. 2012;176(8):738–743.
5. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. Boca Raton, FL: Chapman & Hall/CRC Press; 2006.
6. Shaw PA, Deffner V, Keogh RH, et al. Epidemiologic analyses with error-prone exposures: review of current practice and recommendations. Ann Epidemiol. 2018;27(11):821–828.
7. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69(2):331–342.
8. Rosner B, Willett W, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med. 1989;8(9):1051–1069.
9. Rosner B, Spiegelman D, Willett W. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132(4):734–745.
10. Buonaccorsi JP. Measurement Error: Models, Methods, and Applications. Boca Raton, FL: Chapman & Hall/CRC Press; 2010.
11. Balasubramanian R, Lagakos SW. Estimation of a failure time distribution based on imperfect diagnostic tests. Biometrika. 2003;90(1):171–182.
12. Meier AS, Richardson BA, Hughes JP. Discrete proportional hazards models for mismeasured outcomes. Biometrics. 2003;59(4):947–954.
13. Magaret AS. Incorporating validation subsets into discrete proportional hazards models for mismeasured outcomes. Stat Med. 2008;27(26):5456–5470.
14. Gu X, Ma Y, Balasubramanian R. Semiparametric time to event models in the presence of error-prone, self-reported outcomes–with application to the Women’s Health Initiative. Ann Appl Stat. 2015;9(2):714.
15. Keogh RH, Shaw PA, Gustafson P, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1–basic theory and simple methods of adjustment. Stat Med. 2020;39(16):2197–2231.
16. Shaw PA, Gustafson P, Carroll RJ, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 2–more complex methods of adjustment and advanced topics. Stat Med. 2020;39(16):2232–2263.
17. Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an “alloyed gold standard”. Am J Epidemiol. 1997;145(2):184–196.
18. Green MS, Symons MJ. A comparison of the logistic risk function and the proportional hazards model in prospective epidemiologic studies. J Chronic Dis. 1983;36(10):715–723.
19. Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. New York, NY: Springer; 2015.
20. Tinker LF, Sarto GE, Howard BV, et al. Biomarker-calibrated dietary energy and protein intake associations with diabetes risk among postmenopausal women from the Women’s Health Initiative. Am J Clin Nutr. 2011;94(6):1600–1606.
21. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018. https://www.R-project.org/.
22. Shaw PA, Prentice RL. Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics. 2012;68(2):397–407.
23. Hashimoto EM, Ortega EM, Paula GA, Barreto ML. Regression models for grouped survival data: estimation and sensitivity analysis. Comput Stat Data Anal. 2011;55(2):993–1007.
24. The Women’s Health Initiative Study Group. Design of the Women’s Health Initiative clinical trial and observational study. Control Clin Trials. 1998;19(1):61–109.
25. Neuhouser ML, Tinker L, Shaw PA, et al. Use of recovery biomarkers to calibrate nutrient consumption self-reports in the Women’s Health Initiative. Am J Epidemiol. 2008;167(10):1247–1259.
26. Ritenbaugh C, Patterson RE, Chlebowski RT, et al. The Women’s Health Initiative Dietary Modification trial: overview and baseline characteristics of participants. Ann Epidemiol. 2003;13(9):S87–S97.
27. Langer RD, White E, Lewis CE, Kotchen JM, Hendrix SL, Trevisan M. The Women’s Health Initiative Observational Study: baseline characteristics of participants and reliability of baseline measures. Ann Epidemiol. 2003;13(9):S107–S121.
28. Margolis KL, Qi L, Brzyski R, et al. Validity of diabetes self-reports in the Women’s Health Initiative: comparison with medication inventories and fasting glucose measurements. Clin Trials. 2008;5(3):240–247.
29. Kipnis V, Midthune D, Buckman DW, et al. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65(4):1003–1010.
30. Shepherd BE, Yu C. Accounting for data errors discovered from an audit in multiple linear regression. Biometrics. 2011;67(3):1083–1091.
31. Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and regression calibration: methods to address bias from correlated covariate and time-to-event error; 2019:1–49.
32. Groenwold RH, Donders ART, Roes KC, Harrell FE Jr, Moons KG. Dealing with missing outcome data in randomized trials and observational studies. Am J Epidemiol. 2011;175(3):210–217.
33. Women’s Health Initiative. Women’s Health Initiative investigator datasets; 2019. https://www.whi.org/researchers/data/Pages/Home.aspx.