Abstract
Measurement error arises through a variety of mechanisms. A rich literature exists on the bias introduced by covariate measurement error and on methods of analysis to address this bias. By comparison, less attention has been given to errors in outcome assessment and nonclassical covariate measurement error. We consider an extension of the regression calibration method to settings with errors in a continuous outcome, where the errors may be correlated with prognostic covariates or with covariate measurement error. This method adjusts for the measurement error in the data and can be applied with either a validation subset, on which the true data are also observed (eg, a study audit), or a reliability subset, where a second observation of the error prone measurements is available. For each case, we provide conditions under which the proposed method is identifiable and leads to consistent estimates of the regression parameters. When the second measurement on the reliability subset has no error or classical unbiased measurement error, the proposed method is consistent even when the primary outcome and exposures of interest are subject to both systematic and random error. We examine the performance of the method with simulations for a variety of measurement error scenarios and sizes of the reliability subset. We illustrate the method’s application using data from the Women’s Health Initiative Dietary Modification Trial.
Keywords: bias, linear regression, measurement error, nutrition assessment, nutritional epidemiology, regression calibration
1 |. INTRODUCTION
Measurement error arises in many biomedical settings. For example, data can be recorded with error in large clinical databases. Errors are also common in exposures that are primarily measured by self-report, such as dietary intakes. These errors can occur in the outcome or covariates, or both. Furthermore, errors in the outcome may be correlated with important prognostic variables or with errors in covariates. While much attention in the literature has focused on the setting of classical covariate measurement error, less attention has been paid to errors in the outcome. One reason for this is that for independent, mean zero random error in the outcome, the usual linear regression parameter estimates are unbiased; however, if the error in the outcome is related to prognostic variables or to the errors in the covariates, then the estimated regression coefficients will be biased. Some work has been done in the setting of covariate-independent error in the response for the generalized linear model,1 and several authors have considered methods to address misclassification in the response.2–6 Our interest is the linear regression setting where there may be error in both the response and covariates, and where the error in the response is correlated with either the error in a covariate or with other covariates in the regression model.
One setting where errors are common in both outcome and exposures is large clinical studies, particularly those using data primarily collected for nonresearch purposes (eg, administrative databases and electronic health records). Certain types of records (eg, records from a particular study site) are often more likely to have errors, records with errors often tend to have errors across multiple variables, and the magnitude of these errors may be correlated. When a covariate alone is subject to random error, its regression coefficient will be attenuated; however, with correlated errors in the outcome and exposure covariates, ignored errors in the data could lead to bias in any direction in estimates of regression parameters. In some studies, data validation or audits are performed in subsets of records for quality control purposes. Although often discarded, findings from audit data can be used to adjust analyses for errors which remain in the unaudited data. We will consider a measurement error correction method for linear regression that produces consistent estimates in the presence of correlated errors using audit/validation data, where it is assumed the true error-free values are observed on the audit/validation subset. Shepherd and Yu7 proposed a moment correction method for linear regression for this setting and applied their method to data from an HIV observational cohort study. Shepherd et al8 considered similar methods in the context of audited randomized clinical trials. We consider an extension of the popular regression calibration method to this setting.
We also address a setting not considered by previous work to address correlated errors in outcomes and covariates, one where no validation subset is available, but where a second, objective measurement, that is, with unbiased classical measurement error, for the error prone exposures and outcome is available on a subset. In epidemiology, there are many examples of exposures and outcomes that are self-reported, and subject to both random and systematic error, but for which an objective measurement can also be obtained by a more expensive or more invasive procedure. One such setting comes from nutritional epidemiology, where patterns of dietary consumption are of interest; namely how one dietary attribute, such as energy intake, is associated with other dietary intakes, all of which are measured by self-reported questionnaire data. A common instrument to measure dietary intake, the food frequency questionnaire (FFQ), has been shown to have both systematic and random error;9 and errors in outcomes and exposures of interest assessed by the FFQ are likely correlated. Social pressure to lead a healthy lifestyle could correlate the systematic measurement error across self-reported outcomes and exposures. For some nutrients, there are objective biomarkers, known as recovery markers, which capture actual intake up to mean zero, random error. Two such examples are the doubly labeled water recovery marker for total energy consumption10 and a 24-hour urinary nitrogen recovery marker for protein intake.11 Due to expense and participant burden, a large prospective cohort study generally could only include these recovery markers on a small subset. We show that our methods can also be applied in this setting, where the primary exposure and outcome measures are subject to errors which are correlated and may consist of both systematic bias and random error and where an objective measure for these variables is available in a subcohort.
In this article, we develop an extension of regression calibration that yields consistent parameter estimates in linear regression in the presence of systematic and random measurement error in the outcome and covariates, as well as potentially correlated errors between the outcome and covariates. Regression calibration, introduced by Prentice,12 has become a popular method for addressing covariate measurement error, likely because of its easy implementation and its good numerical performance in a broad range of settings.13 Many applications of this method assume classical, unbiased measurement error in the covariate only, and to date this method has not addressed correlated errors between outcomes and exposures. Correlated errors in the outcome and exposure have been considered for linear regression in settings where the comparison of two different assays for a given compound is of interest. These methods frequently focus on the best measure of agreement for the assays or the calibration of one assay using another, but do not address covariate-dependent errors.14,15 Keogh et al16 considered the case where there was differential linear outcome error dependent only on an error-free binary treatment group indicator. We consider the more general case where the correlated errors in X and Y are mean zero, and also where, as in the self-reported diet example, errors in both the outcome and covariates could have bias dependent on subject characteristics. We provide assumptions under which repeat error-prone measures on a subset are sufficient for the proposed method, so that the true outcomes and covariates never need to be observed. We also consider the case where the true data are observed in a subset.
We examine the numerical performance of the proposed method for a variety of measurement error scenarios and compare its performance to the naive solution that ignores the error. We then illustrate the method with a data example using nutritional assessment data from the Women’s Health Initiative Dietary Modification Trial.17
2 |. MEASUREMENT ERROR MODEL
Let Yi be a continuous outcome, Xi be a p × 1 vector of covariates that may be observed with error, and Zi be a q × 1 vector of accurately observed covariates, for i = 1, …, N iid observations. Let X⋆i and Y⋆i be error prone versions of Xi and Yi, respectively. Assume (Yi, Xi, Zi) follows the linear model
$$ Y_i = \beta_0 + \beta_x^T X_i + \beta_z^T Z_i + \epsilon_i, \qquad (1) $$
where ϵi is mean zero random error that is independent of all other random variables in Equation (1). Instead of observing (Xi, Zi, Yi), (X⋆i, Zi, Y⋆i) are observed on a cohort of N iid individuals, and a random subset of n individuals have been selected to have a second measure of Xi and Yi. We consider three cases: (1) where the second observation is a repeat observation of (X⋆ij, Y⋆ij), where j = 1, …, ki; (2) where the second measure is the true (Xi, Yi); and (3) where the second measure is a different error prone measure of (Xi, Yi), namely (XBi, YBi), which is an objective biomarker measure whose errors are mean zero and independent of all other variables. Let Vi be the indicator that individual i is selected to have a second measure of (Xi, Yi). In the measurement error setting, this subset is referred to as a reliability subset when a second error prone measure is observed and a validation subset when the truth is observed. We will refer to the group of individuals for which Vi = 1 as the reliability subset in case 1, as the validation or audit subset in case 2, and as the biomarker subset in case 3. We next define the notation and measurement error model for each of these cases.
2.1 |. Case 1: Reliability subset
Define
$$ X^\star_{ij} = X_i + T_{ij}, \qquad Y^\star_{ij} = Y_i + Q_{ij}, \qquad (2) $$
where i = 1, …, N; j = 1, …, ki; and ki = 2 for individuals where Vi = 1 and 1 otherwise. (This could be easily extended to ki > 2 for some individuals.) Suppose (Tij, Qij) are random error terms that are independent and identically distributed across individuals and have mean zero. We allow ΣTQ ≡ cov(Tij, Qij) to be nonzero, but we assume that the error terms in repeat observations of X⋆ and Y⋆ are independent, namely cov(Tij, Tij′) = cov(Qij, Qij′) = cov(Tij, Qij′) = 0 for j ≠ j′. In this case, where only repeat measures of (X⋆, Y⋆) are available, we must also assume that the error terms Ti and Qi are independent of (Xi, Yi, Zi) for the necessary parameters for E(X | X⋆, Z) and E(Y | X⋆, Y⋆, Z) to be identifiable (see Appendix). For cases 2 and 3, we can relax this assumption.
2.2 |. Case 2: Validation subset
Here we assume the same additive error model as in case 1, only that ki = 1 for all subjects, and for the subset of n individuals where Vi = 1, we assume the true covariate and outcome (Xi, Yi) are also observed. When discussing this case, we will drop the second subscript j to emphasize that there are no repeat observations. Since X and Y are observed in a subset of individuals, we can also allow for a more general error model, including differential error. That is, we can allow (Ti, Qi) to have nonzero mean and dependence on (Xi, Yi, Zi) without losing identifiability.
A motivating setting for the validation subset case is that of the data audit of clinical studies discussed by Shepherd and Yu.7 In this setting, it is assumed that a subset of individuals is selected for a data audit that will ascertain the true outcome and exposure. These authors considered a mixture model for the errors (Ti, Qi) with a point mass at 0. We consider this model as a special case of Equation (2).
2.3 |. Case 3: Biomarker subset
Motivated by our data example, we consider a more general error model than case 1 that allows the errors (Ti, Qi), in addition to being potentially correlated, to have nonzero mean due to potential scale and location bias in (X⋆i, Y⋆i). We assume the general measurement error model for the setting of nutritional intake data,18 which allows for linear covariate-dependent systematic biases and which has also been applied to physical activity data.19,20
We assume that ki = 1 for all subjects and that the error for the primary measures may have systematic bias that is a linear function of Zi. Specifically, for (Ti, Qi) defined by Equation (2), let
$$ T_i = \theta_{0x} + \theta_{zx}^T Z_i + e_{xi}, \qquad Q_i = \theta_{0y} + \theta_{zy}^T Z_i + e_{yi}, \qquad (3) $$
where (exi, eyi) are potentially correlated mean zero random error terms that are independent of the other terms in the model. For a random subset of n individuals (Vi = 1), we assume the objective biomarker measurements (XBi, YBi) are also available, which obey a classical error model. Namely,
$$ X^B_i = X_i + \eta_i, \qquad Y^B_i = Y_i + \nu_i, \qquad (4) $$
where ηi and νi are mean zero random errors that are mutually independent as well as independent of all terms in Equation (1) and the error terms in Equation (3). Because the errors (η, ν) are independent of (X, Y, Z), this case will maintain some of the same flexibility as case 2, in that we can allow cov(X, T) and cov(Y, Q) to be nonzero and still have all parameters identifiable on the biomarker subset. Consequently, our proposed methods will be able to correct for a more general measurement error model for the primary measures (X⋆, Y⋆) observed on the whole cohort even when the true (X, Y) are never observed.
3 |. PROPOSED METHOD
Following the general approach of regression calibration, we model the expected value of the unobserved data as a function of the observed data and use these quantities to estimate the regression parameters of interest. Using the additivity of the error, one has

$$
\begin{aligned}
E(Y^\star \mid X^\star, Z) &= E(Y \mid X^\star, Z) + E(Q \mid X^\star, Z) \\
&= E\{E(Y \mid X, X^\star, Z) \mid X^\star, Z\} + E(Q \mid X^\star, Z) \\
&= \beta_0 + \beta_x^T E(X \mid X^\star, Z) + \beta_z^T Z + E(Q \mid X^\star, Z).
\end{aligned}
$$
The second equality holds by applying the law of iterated expectation to the first term. The third equality holds because, by assumption, X⋆ provides no further information about Y than is contained in (X, Z). Equivalently, E(Y | X⋆, Z) = β0 + βxᵀ E(X | X⋆, Z) + βzᵀ Z, and since E{E(Y | X⋆, Y⋆, Z) | X⋆, Z} = E(Y | X⋆, Z), this suggests that one could regress Ê(Y | X⋆, Y⋆, Z) on (Ê(X | X⋆, Z), Z) and get a consistent estimate for β = (β0, βx, βz), where Ê denotes an estimate of the expectation. If the error term in Y⋆ is independent of that in X⋆, or when there is no measurement error in the observed outcome variable, one can perform the regression of Y⋆ on (Ê(X | X⋆, Z), Z) instead of (X, Z), which is the usual regression calibration approach,12 to obtain a consistent estimator of β = (β0, βx, βz). Estimation of Ê(Y | X⋆, Y⋆, Z) and Ê(X | X⋆, Z) is described below for each case.
For an estimator of E(Y | X⋆, Y⋆, Z), one can consider the following first-order approximation
$$
E(Y \mid X^\star, Y^\star, Z) \approx \mu_Y + \begin{pmatrix} \Sigma_{YX^\star} & \Sigma_{YY^\star} & \Sigma_{YZ} \end{pmatrix}
\begin{pmatrix} \Sigma_{X^\star} & \Sigma_{X^\star Y^\star} & \Sigma_{X^\star Z} \\ \Sigma_{Y^\star X^\star} & \Sigma_{Y^\star} & \Sigma_{Y^\star Z} \\ \Sigma_{Z X^\star} & \Sigma_{Z Y^\star} & \Sigma_{Z} \end{pmatrix}^{-1}
\begin{pmatrix} X^\star - \mu_{X^\star} \\ Y^\star - \mu_{Y^\star} \\ Z - \mu_{Z} \end{pmatrix}, \qquad (5)
$$
where Σab ≡ Cov(a, b) and Σa ≡ Var(a). Similarly, we approximate E[X|X⋆, Z] as
$$
E(X \mid X^\star, Z) \approx \mu_X + \begin{pmatrix} \Sigma_{XX^\star} & \Sigma_{XZ} \end{pmatrix}
\begin{pmatrix} \Sigma_{X^\star} & \Sigma_{X^\star Z} \\ \Sigma_{Z X^\star} & \Sigma_{Z} \end{pmatrix}^{-1}
\begin{pmatrix} X^\star - \mu_{X^\star} \\ Z - \mu_{Z} \end{pmatrix}. \qquad (6)
$$
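To make the calibration-and-regress step concrete, below is a minimal R sketch for case 1 with scalar X and Z. It assumes the moment estimates have already been assembled in a list mom (with the error variance components estimated as in Section 3.1); the list fields and function name are illustrative and not part of the authors' software.

```r
calibrate_and_fit <- function(xstar, ystar, z, mom) {
  ## Moments of the unobserved (X, Y) implied by the additive error model (Section 3.1)
  mu_x <- mom$mu_xstar                     # E(X)  = E(X*)
  mu_y <- mom$mu_ystar                     # E(Y)  = E(Y*)
  s_x  <- mom$var_xstar - mom$var_T        # Var(X) = Var(X*) - Var(T)
  s_y  <- mom$var_ystar - mom$var_Q        # Var(Y) = Var(Y*) - Var(Q)
  s_xy <- mom$cov_xystar - mom$cov_TQ      # Cov(X, Y) = Cov(X*, Y*) - Cov(T, Q)
  s_xz <- mom$cov_xstar_z                  # Cov(X, Z) = Cov(X*, Z)
  s_yz <- mom$cov_ystar_z                  # Cov(Y, Z) = Cov(Y*, Z)

  ## First-order approximation (5): calibrated outcome, an estimate of E(Y | X*, Y*, Z)
  SA    <- rbind(c(mom$var_xstar,   mom$cov_xystar,  mom$cov_xstar_z),
                 c(mom$cov_xystar,  mom$var_ystar,   mom$cov_ystar_z),
                 c(mom$cov_xstar_z, mom$cov_ystar_z, mom$var_z))
  covYA <- c(s_xy, s_y, s_yz)              # Cov(Y, (X*, Y*, Z))
  Ac    <- cbind(xstar - mom$mu_xstar, ystar - mom$mu_ystar, z - mom$mu_z)
  yhat  <- as.vector(mu_y + Ac %*% solve(SA, covYA))

  ## First-order approximation (6): calibrated covariate, an estimate of E(X | X*, Z)
  SB    <- rbind(c(mom$var_xstar,   mom$cov_xstar_z),
                 c(mom$cov_xstar_z, mom$var_z))
  covXB <- c(s_x, s_xz)                    # Cov(X, (X*, Z))
  Bc    <- cbind(xstar - mom$mu_xstar, z - mom$mu_z)
  xhat  <- as.vector(mu_x + Bc %*% solve(SB, covXB))

  lm(yhat ~ xhat + z)                      # point estimates of (beta0, betax, betaz)
}
```

In practice one would stack these calculations with the moment estimation into a single set of estimating equations, as in the Appendix, so that the sandwich variance accounts for the estimated nuisance parameters.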
This general approach is tailored to each of the cases discussed in Section 2. In each case, a sandwich estimator of the variance is derived using the stacked estimating equation approach outlined by Stefanski and Boos.21 The Appendix provides more details. Standard errors for the proposed method can also be calculated using the bootstrap. For the bootstrap variance estimator, the bootstrap sampling is stratified on the subset membership status (ie, reliability, validation, or biomarker subset). For nonsubset members, the observed (X⋆i, Zi, Y⋆i) is resampled. For subset members, these data are resampled together with the additional information observed on those individuals, that is, (Xi, Yi) in the case of the validation sample.
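A hedged sketch of the stratified bootstrap just described, assuming a data frame with one row per subject, a subset indicator V, and a user-supplied fitting function (all names illustrative):

```r
boot_stratified <- function(dat, fit_fun, B = 500) {
  idx_sub  <- which(dat$V == 1)   # reliability/validation/biomarker subset members
  idx_main <- which(dat$V == 0)   # remaining cohort members
  ests <- replicate(B, {
    # Resample within each stratum; subset rows carry their extra measurements along
    boot_dat <- rbind(dat[sample(idx_sub,  replace = TRUE), ],
                      dat[sample(idx_main, replace = TRUE), ])
    coef(fit_fun(boot_dat))
  })
  apply(ests, 1, sd)              # bootstrap standard error for each coefficient
}
```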
3.1 |. Case 1
We first consider the reliability subset model shown in Equation (2). In this case, the errors (Tij, Qij) are independent of (X, Y, Z) and centered at 0. One can use the following equalities to obtain the moments involving random variables not directly observed: μX = μX⋆, μY = μY⋆, ΣX = ΣX⋆ − ΣT, ΣY = ΣY⋆ − ΣQ, ΣXY = ΣX⋆Y⋆ − ΣTQ, ΣXZ = ΣX⋆Z, and ΣYZ = ΣY⋆Z. Estimates of (ΣT, ΣQ, ΣTQ) can be obtained using the reliability subset, and estimates of the moments of the observed data (X⋆, Y⋆, Z) can be obtained from the entire study cohort.
For case 1, we have assumed subjects in the reliability subset have two measures of (X⋆, Y⋆), and those not in the reliability subset have only one measure. In the above formulas one can think of X⋆ as a vector of two observations or a single observation as appropriate. The Appendix provides an explicit estimating equation for each of the nuisance parameters, which can also be used to provide a sandwich estimator of the variance for the regression parameters of interest using the stacked estimating approach outlined by Stefanski and Boos.21
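As an illustration, the error variance components in case 1 can be estimated from the within-person differences of the two replicates on the reliability subset. This is one simple moment estimator, not necessarily identical to the estimating equations in the Appendix, and the argument names are illustrative.

```r
# xstar1/xstar2 and ystar1/ystar2 are the two replicates for subjects with V = 1
estimate_error_moments <- function(xstar1, xstar2, ystar1, ystar2) {
  dx <- xstar1 - xstar2           # = T_i1 - T_i2, so Var(dx) = 2 * Sigma_T
  dy <- ystar1 - ystar2           # = Q_i1 - Q_i2, so Var(dy) = 2 * Sigma_Q
  list(var_T  = var(dx) / 2,
       var_Q  = var(dy) / 2,
       cov_TQ = cov(dx, dy) / 2)  # Cov(dx, dy) = 2 * Sigma_TQ
}
```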
3.2 |. Case 2
Case 2, the validation subset, also uses Equations (5) and (6) for estimation of E(X | X⋆, Z) and E(Y | X⋆, Y⋆, Z). For case 2, all necessary moments in Equations (5) and (6) can be directly estimated, since both (X, Y) and (X⋆, Y⋆) are observed on the validation subset. This allows for a more general error model to be identifiable with the data. For example, T and Q could be allowed to be correlated with Z. In this case, we need to estimate moments such as ΣXX⋆ and ΣXZ from the validation subset. The stacked estimating equations used for the M-estimator and the sandwich estimator of the variance for this case are also provided in the Appendix.
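For example, under the notation of the earlier sketches (hypothetical column names, with dat holding the full cohort), the case 2 moments involving the true (X, Y) can be computed directly on the validation records:

```r
# Case 2: moments involving the true (X, Y) come from the validation subset,
# while moments of (X*, Y*, Z) come from the full cohort
val <- subset(dat, V == 1)
mom_val <- list(
  mu_X        = mean(val$X),
  mu_Y        = mean(val$Y),
  cov_X_Xstar = cov(val$X, val$Xstar),   # may differ from Var(X) if cov(X, T) != 0
  cov_X_Z     = cov(val$X, val$Z),
  cov_Y_Ystar = cov(val$Y, val$Ystar),
  cov_Y_Z     = cov(val$Y, val$Z))
mom_cohort <- cov(dat[, c("Xstar", "Ystar", "Z")])   # observed-data covariance matrix
```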
3.3 |. Case 3
For case 3, the biomarker subset, we consider an error model that allows for systematic and correlated error, but for which the true (X, Y) can never be observed. From Equation (3), one has

$$ E(Y^\star \mid X^\star, Z) = \beta_0 + \beta_x^T E(X \mid X^\star, Z) + \beta_z^T Z + c^\star(X^\star, Z), $$
where c⋆(X⋆, Z) ≡ E(Qi | X⋆i, Zi), which has a different functional form than in cases 1 and 2. One can once again estimate β by regressing Ŷ ≡ Y⋆ − ĉ⋆ on (Ê(X | X⋆, Z), Z). In this case, the parameters necessary to calculate ĉ⋆ are estimated by regressing Y⋆ − YB on (Z, X⋆) in the biomarker subset, and one could also choose to estimate E(X|X⋆, Z) with a similar regression approach, instead of the moment based approach in Equations (5) and (6). Because the biomarker XB only involves classical measurement error, E(XB|X⋆, Z) = E(X|X⋆, Z) and the nuisance parameters necessary for Ê(X | X⋆, Z) can be estimated by the regression

$$ X^B_i = \gamma_0 + \gamma_x^T X^\star_i + \gamma_z^T Z_i + e^B_i. $$
The stacked estimating equations for the sandwich estimator of the variance for case 3 are provided in the Appendix. As shown in the Appendix, parameters for the proposed method will be identifiable for error structures with E[XB|X⋆, Z] and c⋆ linear in (X⋆, Z), using biomarker subset data whose error terms are independent as described in Section 2.3.
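A brief sketch of the case 3 fitting steps just described, using hypothetical column names (Xstar, Ystar, XB, YB, Z) in a data frame dat with biomarker subset indicator V:

```r
bm    <- subset(dat, V == 1)                            # biomarker subset
fit_c <- lm(I(Ystar - YB) ~ Xstar + Z, data = bm)       # estimates c*(X*, Z)
fit_x <- lm(XB ~ Xstar + Z, data = bm)                  # estimates E(X | X*, Z)

dat$Yhat <- dat$Ystar - predict(fit_c, newdata = dat)   # Y* minus estimated c*
dat$Xhat <- predict(fit_x, newdata = dat)               # calibrated X
fit_beta <- lm(Yhat ~ Xhat + Z, data = dat)             # proposed estimate of beta
```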
4 |. SIMULATION STUDY
We examine the finite sample performance of the proposed regression calibration method using numerical simulation. In this section we focus on case 1, assuming a reliability subset is available. We examine a limited set of simulations for case 2, given the similarity with case 1. Numerical simulations for case 3 are presented in the next section as part of the data example. For the different parameter scenarios for the error and outcome regression models, we compare the numerical performance of the proposed method with that of standard linear regression using the true (X, Y) (True method) and the error prone observed data (X⋆, Y⋆) without correction (Naive method). For all scenarios, we summarize the %bias, empirical standard error (SE), average standard error from the sandwich estimator (ASE), and the coverage probability for the normal 95% confidence intervals (CP) based on the sandwich variance estimator across 1000 Monte Carlo simulations.
We assume the linear regression model Y = β0 + βxX + βzZ + ϵ, where μX = 0, σX = 1, and σϵ = 5, where σX is the standard deviation for X and similarly for ϵ. For Scenario 1, we simulate the simple linear regression model (βz = 0), with β0 = 2 and let βx ∈ {1, 5}. For Scenario 2, we consider the ANCOVA model and additionally study the effect of measurement error on a precisely observed covariate Z, setting (β0, βx, βz) = (2, 1, −1), σz = 1, and ρxz = cor(X, Z) ∈ {0, 0.5}. For the measurement error parameters, we consider σT ∈ {0.5, 1}, assume σQ = σT, and allow the correlation of the errors to vary, with ρ ≡ cor(T, Q) ∈ {−0.5, −0.25, 0, 0.25, 0.5}. Note the size of the error was chosen to represent moderate error, where the variance of the error is 25% of the variance of the true exposure X, and large error, where the error variance is equal to the variance of X. For Scenarios 1 and 2, we consider normally distributed data with a total cohort size of N = 400, and we assume we have a reliability subset of size n for which a second measure of (X⋆, Y⋆) is available. We vary the size of this reliability subset, with n ∈ {25, 50, 100, 200, 400}, in order to understand how the absolute size of the reliability subset affects the stability of the moment estimates and, consequently, the relative performance of the proposed method over the naive method. We then consider simulations with N = 1000, as well as simulations with non-Gaussian error distributions.
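For concreteness, one way to generate a single Scenario 1 data set with a reliability subset is sketched below (function and argument names are illustrative):

```r
library(MASS)   # for mvrnorm
gen_scenario1 <- function(N = 400, n = 100, beta0 = 2, betax = 1,
                          sigma_T = 1, rho = 0.5, sigma_eps = 5) {
  X   <- rnorm(N, 0, 1)
  Y   <- beta0 + betax * X + rnorm(N, 0, sigma_eps)
  Sig <- sigma_T^2 * matrix(c(1, rho, rho, 1), 2, 2)   # Var(T) = Var(Q), cor(T, Q) = rho
  E1  <- mvrnorm(N, c(0, 0), Sig)                      # errors for the first measure
  E2  <- mvrnorm(N, c(0, 0), Sig)                      # errors for the repeat measure
  V   <- as.integer(seq_len(N) %in% sample.int(N, n))  # reliability subset indicator
  data.frame(X = X, Y = Y, V = V,
             Xstar1 = X + E1[, 1], Ystar1 = Y + E1[, 2],
             Xstar2 = ifelse(V == 1, X + E2[, 1], NA),
             Ystar2 = ifelse(V == 1, Y + E2[, 2], NA))
}
```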
Table 1 presents the results for Scenario 1, with βx = 1. For all the parameter configurations in this table, there is appreciable bias in the Naive estimates, that is, in the estimates based on the error prone (X⋆, Y⋆) with no adjustment for the measurement error. Note that even when ρ = 0, namely uncorrelated errors in the response Y⋆ and the exposure X⋆, there is still attenuation bias in the regression coefficients due to the error in X⋆. This is evident from the moment formulas for the regression coefficients for Y regressed on X⋆ and Z, which will not be the same as for the regression on X and Z.8 It is also clear from expanding out those formulas for the naive regression that the error in X⋆ causes a scale bias in the coefficient for X that depends on the covariance of X, Z, and X⋆. The correlated error from a nonzero ρ creates an extra additive error term for the coefficient of X that is proportional to ΣTQ and whose sign matches the sign of ρ. Consequently, the positive ρ counteracted the attenuation bias from the error in X⋆, and the negative ρ further increased the attenuation. Coverage is generally poor for the Naive estimator, with worse performance for the larger measurement error and for the low to negative correlations between T and Q. The relative performance of the proposed method depended on the size of the reliability subset. Some small sample bias was present for n = 25, but for n = 50 and larger there was a notable improvement in all scenarios, with bias generally between 1% and 5% and diminishing to less than 1% for larger n. The mean squared error (MSE) for the proposed method was generally competitive with or smaller than that for the Naive estimator for n = 50 and a marked improvement for larger n, with one exception: for the smaller measurement error variance and high positive error correlation, the Naive estimator maintained the smallest MSE for all n. The ASE compared well with the empirical SE and provided good nominal coverage. The CP for the proposed method was generally in the range of 93% to 95% and comparable with the CP of 93.8% for the True estimator, that is, from the regression using the true data (X, Y).
TABLE 1.
For 1000 simulated data sets of size N = 400, the mean percent (%) bias, empirical standard error (SE), average estimated standard error (ASE), mean squared error (MSE), and 95% coverage probability (CP) are given for βx. Results are provided for linear regression using the true data (X, Y) (TRUE), the error prone data (X⋆, Y⋆) (NAIVE), and the proposed estimator using a reliability subset (denoted by the size of the reliability subset, n = 25, 50, 100, 200, 400). Data are generated according to Scenario 1 in Section 4, with βx = 1
| σ²T = 1 | σ²T = 0.25 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | %Bias | SE | ASE | MSE | CP | %Bias | SE | ASE | MSE | CP |
| 0.5 | TRUE | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −25.58 | 0.19 | 0.18 | 0.10 | 0.699 | −10.93 | 0.23 | 0.23 | 0.07 | 0.909 | |
| 25 | 2.56 | 0.79 | 0.98 | 0.62 | 0.958 | −1.03 | 0.31 | 0.29 | 0.09 | 0.934 | |
| 50 | 0.79 | 0.44 | 0.42 | 0.20 | 0.949 | −1.11 | 0.31 | 0.29 | 0.10 | 0.928 | |
| 100 | −0.85 | 0.41 | 0.38 | 0.17 | 0.939 | −1.40 | 0.31 | 0.29 | 0.10 | 0.928 | |
| 200 | −0.52 | 0.37 | 0.35 | 0.13 | 0.945 | −1.04 | 0.30 | 0.28 | 0.09 | 0.934 | |
| 400 | −0.61 | 0.32 | 0.31 | 0.11 | 0.939 | −0.93 | 0.28 | 0.26 | 0.08 | 0.937 | |
| 0.25 | TRUE | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −38.08 | 0.19 | 0.18 | 0.18 | 0.455 | −15.92 | 0.24 | 0.23 | 0.08 | 0.879 | |
| 25 | 4.05 | 1.14 | 1.43 | 1.30 | 0.951 | −0.89 | 0.31 | 0.29 | 0.10 | 0.930 | |
| 50 | 1.95 | 0.47 | 0.45 | 0.22 | 0.946 | −1.01 | 0.31 | 0.29 | 0.10 | 0.929 | |
| 100 | −0.27 | 0.42 | 0.39 | 0.17 | 0.943 | −1.32 | 0.31 | 0.29 | 0.10 | 0.934 | |
| 200 | −0.27 | 0.37 | 0.36 | 0.14 | 0.950 | −1.00 | 0.30 | 0.29 | 0.09 | 0.934 | |
| 400 | −0.49 | 0.33 | 0.31 | 0.11 | 0.933 | −0.90 | 0.28 | 0.26 | 0.08 | 0.938 | |
| 0 | TRUE | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −50.58 | 0.19 | 0.18 | 0.29 | 0.217 | −20.92 | 0.24 | 0.23 | 0.10 | 0.833 | |
| 25 | 8.99 | 1.07 | 0.97 | 1.14 | 0.945 | −0.74 | 0.32 | 0.30 | 0.10 | 0.929 | |
| 50 | 3.20 | 0.50 | 0.48 | 0.25 | 0.943 | −0.91 | 0.31 | 0.29 | 0.10 | 0.931 | |
| 100 | 0.35 | 0.43 | 0.40 | 0.18 | 0.946 | −1.23 | 0.31 | 0.29 | 0.10 | 0.933 | |
| 200 | −0.01 | 0.38 | 0.36 | 0.14 | 0.940 | −0.95 | 0.30 | 0.29 | 0.09 | 0.933 | |
| 400 | −0.37 | 0.33 | 0.32 | 0.11 | 0.934 | −0.88 | 0.28 | 0.26 | 0.08 | 0.940 | |
| −0.25 | TRUE | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −63.09 | 0.19 | 0.18 | 0.43 | 0.073 | −25.92 | 0.24 | 0.23 | 0.12 | 0.775 | |
| 25 | 18.31 | 2.47 | 2.38 | 6.12 | 0.939 | −0.58 | 0.32 | 0.30 | 0.10 | 0.931 | |
| 50 | 4.58 | 0.54 | 0.51 | 0.29 | 0.940 | −0.80 | 0.32 | 0.30 | 0.10 | 0.932 | |
| 100 | 1.02 | 0.44 | 0.42 | 0.19 | 0.950 | −1.14 | 0.32 | 0.29 | 0.10 | 0.933 | |
| 200 | 0.26 | 0.38 | 0.37 | 0.15 | 0.944 | −0.91 | 0.30 | 0.29 | 0.09 | 0.936 | |
| 400 | −0.25 | 0.33 | 0.32 | 0.11 | 0.934 | −0.86 | 0.28 | 0.27 | 0.08 | 0.942 | |
| −0.5 | TRUE | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 | −1.10 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −75.6 | 0.19 | 0.18 | 0.61 | 0.020 | −30.92 | 0.24 | 0.23 | 0.15 | 0.714 | |
| 25 | 9.03 | 2.92 | 3.10 | 8.53 | 0.928 | −0.43 | 0.33 | 0.31 | 0.11 | 0.934 | |
| 50 | 6.12 | 0.59 | 0.55 | 0.35 | 0.940 | −0.69 | 0.32 | 0.30 | 0.10 | 0.939 | |
| 100 | 1.72 | 0.45 | 0.43 | 0.21 | 0.947 | −1.04 | 0.32 | 0.30 | 0.10 | 0.935 | |
| 200 | 0.53 | 0.39 | 0.38 | 0.15 | 0.947 | −0.87 | 0.30 | 0.29 | 0.09 | 0.938 | |
| 400 | −0.14 | 0.33 | 0.32 | 0.11 | 0.936 | −0.84 | 0.28 | 0.27 | 0.08 | 0.942 | |
Table 2 presents results for the same set of parameters, except now letting βx = 5. The regression calibration estimator had little small sample bias and was comparatively unaffected by the size of β. The coverage probability was much poorer for the Naive estimator, whereas the proposed method maintained good coverage and the smallest MSE for all scenarios with n > 50; for the smaller error variance it also had the smallest MSE for n = 25.
TABLE 2.
For 1000 simulated data sets of size N = 400, the mean percent (%) bias, empirical standard error (SE), average estimated standard error (ASE), mean squared error (MSE), and 95% coverage probability (CP) are given for βx. Results are provided for linear regression using the true data (X, Y) (TRUE), the error prone data (X⋆, Y⋆) (NAIVE), and the proposed estimator using a reliability subset (denoted by the size of the reliability subset, n = 25, 50, 100, 200, 400). Data are generated according to Scenario 1 in Section 4, with βx = 5
| σ²T = 1 | σ²T = 0.25 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | %Bias | SE | ASE | MSE | CP | %Bias | SE | ASE | MSE | CP |
| 0.5 | TRUE | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −45.06 | 0.21 | 0.21 | 5.12 | 0.000 | −18.19 | 0.25 | 0.24 | 0.89 | 0.034 | |
| 25 | 7.58 | 7.25 | 7.82 | 52.64 | 0.869 | −0.16 | 0.44 | 0.43 | 0.20 | 0.939 | |
| 50 | 2.73 | 1.14 | 1.08 | 1.33 | 0.901 | −0.22 | 0.39 | 0.37 | 0.15 | 0.939 | |
| 100 | 0.70 | 0.75 | 0.72 | 0.56 | 0.939 | −0.34 | 0.36 | 0.34 | 0.13 | 0.933 | |
| 200 | 0.59 | 0.55 | 0.54 | 0.31 | 0.950 | −0.15 | 0.33 | 0.31 | 0.11 | 0.937 | |
| 400 | 0.23 | 0.44 | 0.42 | 0.19 | 0.932 | −0.15 | 0.30 | 0.28 | 0.09 | 0.937 | |
| 0.25 | TRUE | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −47.55 | 0.22 | 0.22 | 5.70 | 0.000 | −19.19 | 0.25 | 0.24 | 0.98 | 0.026 | |
| 25 | 3.21 | 9.36 | 10.12 | 87.57 | 0.865 | −0.09 | 0.46 | 0.45 | 0.21 | 0.937 | |
| 50 | 3.12 | 1.20 | 1.14 | 1.47 | 0.906 | −0.16 | 0.40 | 0.38 | 0.16 | 0.936 | |
| 100 | 0.86 | 0.78 | 0.75 | 0.61 | 0.938 | −0.3 | 0.36 | 0.34 | 0.13 | 0.937 | |
| 200 | 0.63 | 0.57 | 0.56 | 0.33 | 0.948 | −0.14 | 0.33 | 0.32 | 0.11 | 0.939 | |
| 400 | 0.23 | 0.45 | 0.43 | 0.20 | 0.934 | −0.15 | 0.30 | 0.28 | 0.09 | 0.933 | |
| 0 | TRUE | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −50.06 | 0.22 | 0.22 | 6.31 | 0.000 | −20.19 | 0.25 | 0.25 | 1.08 | 0.021 | |
| 25 | 7.38 | 4.99 | 3.90 | 25.08 | 0.870 | −0.01 | 0.48 | 0.46 | 0.23 | 0.934 | |
| 50 | 3.56 | 1.28 | 1.21 | 1.66 | 0.909 | −0.10 | 0.41 | 0.39 | 0.17 | 0.927 | |
| 100 | 1.07 | 0.81 | 0.78 | 0.66 | 0.935 | −0.26 | 0.37 | 0.35 | 0.13 | 0.937 | |
| 200 | 0.68 | 0.59 | 0.58 | 0.35 | 0.949 | −0.13 | 0.33 | 0.32 | 0.11 | 0.941 | |
| 400 | 0.23 | 0.46 | 0.44 | 0.21 | 0.933 | −0.15 | 0.30 | 0.28 | 0.09 | 0.936 | |
| −0.25 | TRUE | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −52.57 | 0.23 | 0.22 | 6.96 | 0.000 | −21.19 | 0.26 | 0.25 | 1.19 | 0.014 | |
| 25 | 12.37 | 8.12 | 7.76 | 66.31 | 0.869 | 0.08 | 0.50 | 0.48 | 0.25 | 0.931 | |
| 50 | 4.09 | 1.37 | 1.28 | 1.93 | 0.910 | −0.04 | 0.42 | 0.40 | 0.18 | 0.929 | |
| 100 | 1.31 | 0.85 | 0.82 | 0.72 | 0.940 | −0.21 | 0.37 | 0.35 | 0.14 | 0.933 | |
| 200 | 0.74 | 0.61 | 0.59 | 0.37 | 0.946 | −0.12 | 0.34 | 0.32 | 0.11 | 0.942 | |
| 400 | 0.24 | 0.47 | 0.45 | 0.22 | 0.937 | −0.15 | 0.30 | 0.29 | 0.09 | 0.940 | |
| −0.5 | TRUE | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 | −0.22 | 0.26 | 0.25 | 0.07 | 0.938 |
| NAIVE | −55.09 | 0.23 | 0.23 | 7.64 | 0.000 | −22.2 | 0.26 | 0.25 | 1.30 | 0.010 | |
| 25 | 5.36 | 10.10 | 10.34 | 102.1 | 0.861 | 0.17 | 0.52 | 0.49 | 0.27 | 0.931 | |
| 50 | 4.71 | 1.49 | 1.36 | 2.29 | 0.907 | 0.03 | 0.43 | 0.41 | 0.19 | 0.928 | |
| 100 | 1.60 | 0.88 | 0.85 | 0.79 | 0.941 | −0.15 | 0.38 | 0.36 | 0.14 | 0.932 | |
| 200 | 0.81 | 0.63 | 0.61 | 0.39 | 0.939 | −0.10 | 0.34 | 0.32 | 0.12 | 0.941 | |
| 400 | 0.26 | 0.47 | 0.46 | 0.22 | 0.940 | −0.14 | 0.30 | 0.29 | 0.09 | 0.938 | |
In Table 3, we show the results for the ANCOVA model for a similar set of measurement error parameters as in Tables 1 and 2, fixing βx = 1 and focusing on positive correlation between T and Q. As expected,8 the Naive estimator for βz is unaffected by the measurement error in X⋆ for the scenarios where ρxz = 0. For the other scenarios there is bias in the Naive estimator and lower than nominal 95% coverage for both βz and βx. The regression calibration estimates perform well across the different parameter choices for the measurement error model, with good coverage and with the small sample bias diminishing with increasing size of the reliability subset. It is notable that in these simulations, when X is correlated with Z (ρxz = 0.5) and the error correlation ρ is large, the MSE for the regression calibration estimator is larger than that for the Naive estimator except with the larger reliability subsets. This is due to the increased uncertainty in the regression calibration estimates for β. Supplementary Table S1 in the Web Appendix examines scenarios similar to Table 3, only with smaller measurement error variance parameters. We see that for the Naive estimator, there is less bias and smaller MSE for both βx and βz with the smaller measurement error variance; however, the Naive estimator has low coverage due to its bias for both βx and βz, whereas the proposed method again maintains close to 95% coverage in all scenarios. Supplementary Table S2 presents results for cohort size N = 1000 and reliability subset sizes n = 50, 100, 200, 500, 1000. The performance of the RC estimator improves with the larger reliability sizes, as expected. The coverage of the Naive estimator gets worse, as this estimator becomes more certain about the wrong thing. Supplementary Table S3 considers the effects of negative correlation between the error terms T in X⋆ and Q in Y⋆, allowing ρ < 0. Patterns in this case are similar to Table 3. The Naive estimator had poor coverage, whereas the proposed method maintained good coverage and generally a similar or better MSE for reliability subsets of size 50 or larger, with percent bias diminishing as n increased. Supplementary Table S4 considers the relative performance for larger values of β. Similar to the results in Table 2, the Naive estimator had larger bias and lower coverage, whereas the proposed method maintained good coverage and, for scenarios with n > 25, generally had competitive or better MSE. The proposed method had larger MSE and an inflated estimate of SE for the small reliability subset size n = 25.
TABLE 3.
For 1000 simulated data sets of size N = 400, the mean percent (%) bias, empirical standard error (SE), average estimated standard error (ASE), mean squared error (MSE), and 95% coverage probability (CP) are given for (βx, βz). Results are provided for linear regression using the true data (X, Z, Y) (TRUE), (X⋆, Z, Y⋆) (NAIVE), and the proposed estimator with a reliability subset (Method denoted by size of reliability subset n = 25, 50, 100, 200, 400). Data are generated according to Scenario 2 in Section 4, with βx = 1, βz = −1, and σ²T = σ²Q = 1
| βx | βz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ρxz | ρ | Method | %Bias | SE | ASE | MSE | CP | %Bias | SE | ASE | MSE | CP |
| 0.5 | 0.5 | TRUE | −1.05 | 0.29 | 0.29 | 0.08 | 0.955 | 0.39 | 0.29 | 0.29 | 0.08 | 0.949 |
| NAIVE | −29.39 | 0.19 | 0.19 | 0.12 | 0.686 | 14.61 | 0.27 | 0.27 | 0.09 | 0.913 | ||
| 25 | 1.81 | 3.04 | 6.51 | 9.25 | 0.970 | −1.19 | 1.36 | 3.00 | 1.85 | 0.967 | ||
| 50 | 2.90 | 0.63 | 0.63 | 0.40 | 0.967 | −1.59 | 0.41 | 0.42 | 0.17 | 0.960 | ||
| 100 | 1.11 | 0.50 | 0.49 | 0.25 | 0.959 | −0.84 | 0.37 | 0.37 | 0.14 | 0.945 | ||
| 200 | −0.01 | 0.44 | 0.44 | 0.20 | 0.945 | 0.02 | 0.35 | 0.35 | 0.12 | 0.945 | ||
| 400 | −0.41 | 0.38 | 0.38 | 0.15 | 0.952 | 0.01 | 0.32 | 0.32 | 0.10 | 0.955 | ||
| 0.25 | TRUE | −1.05 | 0.29 | 0.29 | 0.08 | 0.955 | 0.39 | 0.29 | 0.29 | 0.08 | 0.949 | |
| NAIVE | −43.59 | 0.20 | 0.19 | 0.23 | 0.393 | 21.73 | 0.27 | 0.27 | 0.12 | 0.886 | ||
| 25 | 5.93 | 1.56 | 2.94 | 2.43 | 0.959 | −3.35 | 0.79 | 1.45 | 0.63 | 0.962 | ||
| 50 | 5.02 | 0.64 | 0.64 | 0.41 | 0.962 | −2.64 | 0.41 | 0.43 | 0.17 | 0.960 | ||
| 100 | 2.33 | 0.53 | 0.51 | 0.28 | 0.957 | −1.43 | 0.38 | 0.38 | 0.15 | 0.945 | ||
| 200 | 0.47 | 0.45 | 0.45 | 0.20 | 0.947 | −0.21 | 0.35 | 0.35 | 0.12 | 0.942 | ||
| 400 | −0.22 | 0.38 | 0.39 | 0.15 | 0.948 | −0.08 | 0.32 | 0.32 | 0.10 | 0.955 | ||
| 0 | TRUE | −1.05 | 0.29 | 0.29 | 0.08 | 0.955 | 0.39 | 0.29 | 0.29 | 0.08 | 0.949 | |
| NAIVE | −57.78 | 0.20 | 0.19 | 0.37 | 0.139 | 28.85 | 0.27 | 0.28 | 0.16 | 0.817 | ||
| 25 | 25.41 | 3.61 | 6.84 | 13.07 | 0.946 | −13.09 | 1.92 | 3.66 | 3.70 | 0.961 | ||
| 50 | 7.48 | 0.68 | 0.68 | 0.47 | 0.955 | −3.87 | 0.43 | 0.45 | 0.19 | 0.961 | ||
| 100 | 3.47 | 0.55 | 0.53 | 0.30 | 0.961 | −1.99 | 0.39 | 0.38 | 0.15 | 0.944 | ||
| 200 | 0.93 | 0.46 | 0.46 | 0.21 | 0.946 | −0.42 | 0.36 | 0.35 | 0.13 | 0.942 | ||
| 400 | −0.04 | 0.39 | 0.39 | 0.15 | 0.947 | −0.17 | 0.32 | 0.32 | 0.10 | 0.954 | ||
| 0 | 0.5 | TRUE | −0.91 | 0.24 | 0.25 | 0.06 | 0.955 | 0.10 | 0.25 | 0.25 | 0.06 | 0.947 |
| NAIVE | −25.76 | 0.18 | 0.18 | 0.10 | 0.728 | 0.14 | 0.25 | 0.26 | 0.06 | 0.950 | ||
| 25 | 3.35 | 0.59 | 0.59 | 0.35 | 0.964 | 0.18 | 0.26 | 0.27 | 0.07 | 0.946 | ||
| 50 | 3.00 | 0.83 | 0.81 | 0.69 | 0.960 | 0.01 | 0.26 | 0.28 | 0.07 | 0.943 | ||
| 100 | 0.34 | 0.39 | 0.38 | 0.15 | 0.952 | −0.08 | 0.27 | 0.27 | 0.07 | 0.946 | ||
| 200 | −0.11 | 0.36 | 0.35 | 0.13 | 0.941 | 0.17 | 0.27 | 0.27 | 0.07 | 0.943 | ||
| 400 | −0.47 | 0.31 | 0.31 | 0.09 | 0.949 | 0.03 | 0.25 | 0.25 | 0.06 | 0.952 | ||
| 0.25 | TRUE | −0.91 | 0.24 | 0.25 | 0.06 | 0.955 | 0.10 | 0.25 | 0.25 | 0.06 | 0.947 | |
| NAIVE | −38.18 | 0.18 | 0.18 | 0.18 | 0.450 | 0.14 | 0.25 | 0.26 | 0.06 | 0.950 | ||
| 25 | 10.71 | 1.53 | 1.86 | 2.34 | 0.956 | 0.42 | 0.27 | 0.31 | 0.07 | 0.948 | ||
| 50 | −2.17 | 1.42 | 1.50 | 2.01 | 0.956 | 0.22 | 0.27 | 0.30 | 0.07 | 0.946 | ||
| 100 | 1.01 | 0.40 | 0.39 | 0.16 | 0.953 | −0.07 | 0.27 | 0.27 | 0.07 | 0.944 | ||
| 200 | 0.19 | 0.36 | 0.36 | 0.13 | 0.942 | 0.16 | 0.27 | 0.27 | 0.07 | 0.943 | ||
| 400 | −0.35 | 0.31 | 0.31 | 0.10 | 0.950 | 0.02 | 0.25 | 0.25 | 0.06 | 0.952 | ||
| 0 | TRUE | −0.91 | 0.24 | 0.25 | 0.06 | 0.955 | 0.10 | 0.25 | 0.25 | 0.06 | 0.947 | |
| NAIVE | −50.59 | 0.18 | 0.18 | 0.29 | 0.201 | 0.14 | 0.26 | 0.26 | 0.07 | 0.950 | ||
| 25 | 4.43 | 2.82 | 4.47 | 7.93 | 0.944 | 0.24 | 0.29 | 0.42 | 0.09 | 0.947 | ||
| 50 | 0.78 | 0.97 | 0.81 | 0.95 | 0.953 | 0.19 | 0.27 | 0.28 | 0.07 | 0.950 | ||
| 100 | 1.66 | 0.41 | 0.41 | 0.17 | 0.948 | −0.06 | 0.27 | 0.27 | 0.07 | 0.946 | ||
| 200 | 0.48 | 0.37 | 0.37 | 0.13 | 0.946 | 0.16 | 0.27 | 0.27 | 0.07 | 0.945 | ||
| 400 | −0.24 | 0.31 | 0.32 | 0.10 | 0.953 | 0.01 | 0.25 | 0.25 | 0.06 | 0.953 | ||
For Scenario 3, we simulated nonnormal distributions for the error and covariates. We consider a simulation similar to Scenario 1, letting the error terms T and Q follow nonnormal distributions, considering both a mixture of two normals and a log-normal distribution. Supplementary Figure S1 shows these distributions. Simulation results are presented in Supplementary Table S5. The proposed method still provides estimates with diminishing small sample bias for both distributions as n increases, but for the log-normal errors, performance is noticeably better for the larger reliability subsets. For the highly skewed log-normal distribution, estimates for the scenarios with reliability subsets smaller than 200 have some small-sample bias. These were generally caused by a few extreme values across the simulations, and the median estimates were much closer to being unbiased (data not shown).
Simulations for case 2 are provided in Supplementary Tables S6a and S6b. In addition to the True and Naive methods, the Proposed results are compared with those from the related moment-based correction method of Shepherd and Yu.7 Parameter values for the simulations for this scenario are chosen from those explored by Shepherd and Yu for direct comparison. The proposed regression calibration and the method of Shepherd and Yu7 are both moment correction estimators and are asymptotically equivalent for the linear error models we consider in cases 1 and 2. For the case of simple univariate regression, they can be made equivalent by choosing the same estimators of the necessary moments (Table S6a). For multivariate regression, the two methods differ in finite samples but had comparable performance (Table S6b).
Shepherd and Yu7 only considered the setting where a validation subset was available; however, we extend their method to the setting of case 1, where the true values are never observed on anyone, by using the moment estimators provided in the Appendix for the necessary nuisance parameters. Simulation results comparing regression calibration with the moment correction estimates for case 1 are shown in Supplementary Table S7. The two methods again generally perform well for n > 50 and provide similar estimates, with low small sample bias and very comparable MSE across the scenarios. There was some instability for n = 25, where the proposed method appeared to have a slight advantage in MSE over the moment correction method.
In the next section, we examine the performance of regression calibration for case 3, where the measurements X⋆ and Y⋆ have systematic bias as well as correlated errors, after considering a motivating data example.
5 |. NUTRITIONAL EPIDEMIOLOGY DATA EXAMPLE
Tools for dietary assessment of usual intake rely primarily on self-reported data from instruments like the food frequency questionnaire (FFQ). The FFQ consists of a list of foods and a frequency of response for how often each food was consumed over a specified period, such as the last 3 months, and is translated into specific intakes using a nutrient database. The FFQ has been shown for many nutrients of interest, such as energy and protein, to contain systematic error associated with subject characteristics (body mass index [BMI], gender, age, etc), as well as within person variability. Despite these known measurement error problems, the FFQ is the most common diet instrument in large cohort studies9 because of its low cost. Measurement error for different nutrient intakes assessed with the same FFQ are likely to be correlated. We consider this problem examining the association between protein density and total caloric intake. We consider data from the usual diet (control) arm of the Women’s Health Initiative (WHI) Dietary Modification (DM) Trial, which measured dietary intake on 29 294 women using an FFQ. Because the baseline FFQ was used to determine eligibility in the DM trial by requiring a minimum of 32% estimated calories from fat, the data for this analysis were from 1 year after enrollment, at which time another FFQ was obtained. The DM trial also included a Nutritional Biomarker Substudy (NBS, N = 544). The NBS collected self-reported intake along with several objective biomarkers on randomly selected weight-stable women, including doubly labeled water and urinary nitrogen, with repeat measures on a subset (n = 110). These biomarkers are considered unbiased for short-term usual intake of energy and protein, respectively. See WHIscience.org for information regarding obtaining WHI study data. Results of the NBS study are reported by Neuhouser et al.22 They found BMI to be a strong determinant of subject-specific bias, with underreporting of energy and protein intake increasing with increasing BMI.
We consider the regression of log-transformed energy (Y) on log-transformed protein density (X) and BMI (Z). Here we assume we have the measurement error structure of case 3 in Section 2.3, where the main outcome and exposure of interest have both systematic bias and potentially correlated measurement error (the FFQ on the DM cohort), along with an objective marker with independent classical measurement error for both of these variables on a subset (the energy and protein biomarkers in the NBS subcohort). We compare the regression coefficients and normal 95% confidence intervals for log(protein density) and BMI, βx and βz respectively, for regressions using the naive approach (based on error-prone self-reported FFQ data only) and the proposed regression calibration method. In the naive analysis using data from the entire DM cohort, both coefficients were highly significant (P < .001 for log(protein density) and P < .001 for BMI). Applying the proposed method to calibrate the FFQ data on the large cohort, the corresponding P-values were .43 and < .001. The coefficient for protein (X) appears deattenuated but is no longer significant due to a large increase in the standard errors. The coefficient of BMI is also larger compared with the analysis based on self-report, suggesting that BMI-related bias in the self-reported FFQ data was biasing both coefficients of the regression.
Since the biomarkers for protein and energy obey the classical measurement error model, we could obtain an alternative consistent estimator of β using a complete case analysis on the NBS cohort with covariate-only regression calibration to adjust for the error in the protein biomarker, with no adjustment needed for the energy biomarker as the outcome. Standard errors are estimated using the bootstrap method. Using regression calibration on the biomarker subset, the P-values were .07 for log(protein density) and < .001 for BMI. In this case, because the biomarkers were a much more precise and accurate measure of diet,22 the regression using only these data showed a stronger association despite being based on a much smaller subset. One could attempt to improve efficiency by using a technique from the survey literature, called raking or survey calibration, to combine the information from the two estimators.23,24 Using the approach of Lumley et al,23 the proposed estimating equation based on the corrected FFQ data is used to create auxiliary variables, known on the whole cohort, that may be used to augment the efficiency of the consistent regression calibration estimating equation on the biomarker subset. This approach preserves the consistency of the complete case estimator. The survey package in R was used to implement the raking procedure.25,26 The resulting raking estimator (P = .07 for log(protein density) and P < .001 for BMI) had a similar strength of association to the complete case estimator, confirming that, for this example, the error-prone FFQ provided very little additional information regarding the association between the two dietary intakes. An R code file demonstrating the implemented estimators on a similar, simulated data set is provided on GitHub (https://github.com/PamelaShaw/AuditRC/).
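For readers interested in the raking step, the following is a rough sketch of how it can be set up with the survey package. It assumes the influence-function auxiliary variables from the corrected-FFQ fit have already been computed for every cohort member; the data frame whi and all column names (in_nbs, inf_x, inf_z, YB, XBhat, BMI) are hypothetical, and details will differ from the analysis code on GitHub.

```r
library(survey)
# Two-phase design: phase 2 is the NBS biomarker subset (in_nbs is a logical indicator)
des <- twophase(id = list(~1, ~1), subset = ~in_nbs, data = whi)
# Rake the phase-2 weights to the cohort totals of the auxiliary (influence) variables
cal <- calibrate(des, formula = ~ inf_x + inf_z, phase = 2, calfun = "raking")
# Complete-case analysis on the biomarker subset, using the calibrated weights
fit_rake <- svyglm(YB ~ XBhat + BMI, design = cal)
summary(fit_rake)
```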
To better understand the performance of the proposed method under the measurement error structure seen in the WHI example, we built a simulation model based on the WHI data. We generated 1000 simulated WHI DM cohorts (N = 25000), that each included a biomarker subset (n = 500) of which 20% (n = 100) had repeat measures. The Supplementary Materials Web Appendix provides further details of the parameters used for this simulation and how they were identified from the WHI data. In short, for the main cohort and biomarker subset, a regression model for energy as a function of protein density and BMI, as well as measurement error parameters for the observed self-reported FFQ and biomarker versions of protein density and energy, were fit using the observed data. Correlation between the errors of the self-reported protein density and energy are identifiable using the biomarker data. A linear model for the dependence of the error on BMI was assumed and fit to the data. The resulting deattenuated strength of association between the true X and Y was of similar magnitude for this simulation and the WHI data. The simulation also reproduced the large attenuation seen in the BMI coefficient in the naive analysis of the self-reported data.
Table 4 compares the estimates of β for the regression of log energy (Y) on log protein density (X) and BMI (Z) for regressions based on: the true (X, Y), the error-prone self-reported estimates (X⋆, Y⋆), and the proposed regression calibration method incorporating the biomarker observations (XB, YB). We also consider the results using just the regression of YB on (XB, Z) on the biomarker subset, one estimate ignoring and one accommodating the classical measurement error in XB. In this case, since the error in both biomarker observations for Y and X follows the classical measurement error model, the error in YB can be ignored and the usual calibration of only X provides a consistent estimate of β in the biomarker subset. The error in the biomarkers for these nutrients is generally considered smaller than that for self-report, so this analysis addresses the question of whether, despite the smaller sample, it may perform better in terms of MSE relative to the proposed and naive analyses. Standard errors for the regression calibration methods were obtained via the bootstrap with 500 bootstrap samples.
TABLE 4.
For 1000 simulated data sets, the mean percent (%) bias, average estimated standard error (ASE), empirical standard error (SE), mean squared error (MSE), and coverage probability for the 95% confidence intervals (CP) are given for (βx, βz), the linear regression coefficients for protein density (log-scale) and BMI, using the unadjusted self-reported data on the large cohort (Naive), the proposed regression calibration method (Proposed), the unadjusted biomarker values on the subset (BM Naive), and a complete case regression calibration method that adjusts for the classical measurement error in the biomarker for log-protein density on the biomarker subset (BM RC). The True method gives the results based on (X, Y, Z), that is, (log-energy, log-protein density, BMI), measured without error
| βx | βz |
|---|---|---|---|---|---|---|---|---|---|---|
| Method | %Bias | SE | ASE | 100*MSE | CP | %Bias | SE | ASE | 100*MSE | CP |
| True | 0.0410 | 0.0062 | 0.0064 | 0.0039 | 0.957 | −0.0106 | 0.0002 | 0.0002 | 0.0000 | 0.951 |
| Naive | 16.2380 | 0.0099 | 0.0099 | 0.1069 | 0.114 | −80.1324 | 0.0004 | 0.0004 | 0.0110 | 0.000 |
| Proposed | −1.2704 | 0.0853 | 0.0806 | 0.7278 | 0.934 | 0.2963 | 0.0025 | 0.0025 | 0.0006 | 0.944 |
| BM Naive | −45.8489 | 0.0363 | 0.0360 | 0.9069 | 0.326 | −0.2303 | 0.0018 | 0.0018 | 0.0003 | 0.959 |
| BM RC | 3.0050 | 0.0749 | 0.0860 | 0.5642 | 0.936 | −0.3457 | 0.0019 | 0.0019 | 0.0003 | 0.945 |
The simulation shows the proposed method was close to unbiased, with a mean bias of −1.3% for βx and 0.30% for βz, but it also had large MSE, particularly for βx, due to the amount of uncertainty around the slope parameters. The bootstrapped SE were in good agreement with the empirical standard errors. The relative performance of the methods for the simulated analyses in Table 4, in terms of the size of the standard errors for the β estimates across methods, is similar to that seen in the data analysis. Overall, these simulations show that for the general measurement error model discussed in case 3, the proposed method had good coverage and produced estimates with small bias. They also provide additional evidence that the magnitude of the error seen in the WHI nutrient data could have led to confidence intervals that included zero despite a true underlying relationship between the nutrients of interest. The naive analysis (ignoring measurement error) had large bias in the regression coefficient for the precisely observed BMI and overall poor coverage for both slope parameters. The analysis using regression calibration for covariate measurement error only in the biomarker subset had smaller mean squared error than the proposed method calibrating both exposure and outcome implemented without raking. We note that in a simulation where the random variance in the biomarker measure of X was increased to 0.30, which decreased the reliability coefficient from 0.53 to 0.30, the proposed method performed better than the estimator based on the biomarker data alone (data not shown). Thus, the value of using the self-report on the larger cohort depended on the accuracy of that instrument relative to the biomarker. A similar phenomenon was observed by Keogh et al.16 In this example, due to the ease of implementation, we used bootstrapped standard errors. We also examined the relative performance of our method when using the bootstrap vs the sandwich estimate of the variance more generally. We considered the same scenario as shown in the upper left quadrant of Table 3, where sample sizes were smaller and where we might expect more differences between the performance of the two standard error estimation approaches. Results are shown in Supplementary Table S8. Generally the bootstrap performed well and matched the empirical SE for audit subset sizes of 200 or larger. For smaller audit subsample sizes, particularly n = 25 or 50, the bootstrap would occasionally produce a spuriously large standard error estimate, leading to some instability. We note that for the data example simulations shown in Table 4, where both the cohort and the subsample were large, the bootstrap SE agreed well with the empirical SE. The sandwich estimate performed well across all scenarios studied.
6 |. DISCUSSION
In this article we have described a regression calibration approach to account for correlated measurement error in both the outcome and the exposure. Our approach is flexible, simple to implement, and can be applied using reliability or validation samples. Here, we applied the methods assuming a randomly selected validation or reliability subset. The methods are also straightforward to apply with a design-based sample, along with inverse probability weighting to adjust the moment or estimating equations that rely on the subsample. Regression calibration methods are popular in practice, and this extension allows their application to an important problem. Although methods for accounting for measurement error have been widely studied, there has been less focus on correlated measurement error in the outcome and exposure. Correlated measurement error is not uncommon; we have seen it in audits of observational data and in nutritional epidemiology settings.
As with all measurement error methods, there are variance/bias trade-offs with using our new method. With small validation subsets, the reduction in bias may not outweigh the increased variance in estimates, and in some scenarios, naive estimates may have lower mean-squared error than corrected estimates even with large sample sizes. In applied settings, raking can be used as a method that could potentially improve efficiency by combining the proposed estimator with a consistent estimator on the validation/biomarker subset; though in large error settings, as seen in the WHI data example, raking may not provide appreciable gains in efficiency over the complete case analysis of the validation data alone. The efficiency gain from raking comes from the correlation of the auxiliary variable used in raking with the influence function from the model fit with the error-free data.23 This correlation can be low in cases where there is a large amount of measurement error. In the examples studied, the size of the measurement error relative to the variance of the true outcome and covariate subject to error, the correlation between the errors in the outcome and exposure, and the size of the validation subset all affected the relative performance of the methods discussed. Simulation can be used as a tool to better understand how the anticipated measurement error structure for a given setting could affect precision and the relative performance of the proposed method for different sizes of the validation subset.
For settings where a validation subset is available, a very general error model can be assumed for the data since complete data on the validation subset can be used to estimate the nuisance parameters; though here we must also assume covariates that determine the biased components of the error are observed and modeled correctly. For settings where error-free covariates cannot be observed, the methods can still be applied so long as a second error prone measurement whose errors are independent of the first is available. It is interesting to note that the errors in observed values (X⋆, Y⋆) need not be unbiased (mean zero) so long as a second measurement with independent and unbiased errors is available on at least a subset. When only repeat measures of the same error-prone instrument are observed, the nuisance parameters necessary for the proposed method are only identifiable if the errors are independent of the true values.
Development of the proposed methods for other types of outcomes is an important direction for future research. More work for this setting is also needed on methods to determine a sufficient size of the validation/reliability subset, as well as how to select individuals into the reliability/validation subset for more efficient validation substudy designs. Efficiency will be key for these methods to be applied to practically sized validation/reliability subsets without undoing the gains in bias-correction by increasing the variability in the parameter estimates.
Supplementary Material
ACKNOWLEDGEMENTS
The authors would like to thank the investigators of the Women’s Health Initiative for the use of their data. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C. The work of Dr Shaw and Dr Shepherd was supported in part by NIH grant R01-AI131771 and PCORI grant R-1609-36207. Dr Shepherd was also supported by NIH Grants R01AI093234 and U01AI069923. The statements in this manuscript are solely the responsibility of the authors and do not necessarily represent the views of PCORI, its Board of Governors or Methodology Committee.
Funding information
National Institute of Allergy and Infectious Diseases, Grant/Award Number: R01-AI131771; Patient-Centered Outcomes Research Institute, Grant/Award Number: R-1609-36207
APPENDIX. STANDARD ERROR ESTIMATION
Standard errors of the proposed estimator can be computed by applying the M-estimation technique and obtaining the sandwich variance estimate.21 A vector ψ(θ) of stacked estimating equations is formed for the parameter vector θ, which includes both the parameters of interest (β) and the nuisance parameters from the measurement error model. The estimate θ̂ can be obtained by solving the equations $\sum_{i=1}^{N} \psi_i(\hat\theta) = 0$, and a sandwich estimator for the variance of θ̂ is given by $A(\hat\theta)^{-1} B(\hat\theta) \{A(\hat\theta)^{-1}\}^T / N$, where

$$ A(\hat\theta) = -\frac{1}{N}\sum_{i=1}^{N} \frac{\partial \psi_i(\theta)}{\partial \theta^T}\bigg|_{\theta=\hat\theta}, \qquad B(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N} \psi_i(\hat\theta)\,\psi_i(\hat\theta)^T. $$
This technique incorporates the extra variance in the estimate of β due to the uncertainty in the nuisance parameters. The first derivative of ψi(θ) can be computed either directly, with the assistance of software (eg, MATLAB), or by numerical differentiation. The variance of the estimate of β corresponds to the corresponding submatrix of the resulting sandwich variance matrix.
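As an illustration of the numerical-differentiation route, a generic sandwich variance can be computed as below, where psi_fun is a user-supplied function returning the N × p matrix of per-subject estimating functions (a sketch; names are illustrative):

```r
library(numDeriv)
sandwich_var <- function(theta_hat, psi_fun, data) {
  N   <- nrow(data)
  Psi <- psi_fun(theta_hat, data)                                # N x p matrix
  B   <- crossprod(Psi) / N                                      # average of psi_i psi_i^T
  A   <- -jacobian(function(th) colMeans(psi_fun(th, data)),     # average of d psi_i / d theta^T
                   theta_hat)
  solve(A) %*% B %*% t(solve(A)) / N                             # estimated Var(theta_hat)
}
```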
In this section we provide the estimating equation ψ(θ) for each of the cases considered in Section 3. In the expressions below, the half-vectorization operator vech(A) is used to represent a symmetric matrix A as a vector by stacking the columns of the lower triangular portion of the matrix one below the other. In this manner, we create a vector of unique elements for the nuisance parameters in the model that are variance/covariance matrices.
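For concreteness, a one-line R version of this operator (illustrative, not the authors' code) is:

```r
# Half-vectorization: stack the columns of the lower triangle (including the
# diagonal) of a symmetric matrix into a single vector.
vech <- function(A) A[lower.tri(A, diag = TRUE)]

vech(matrix(c(4, 1, 1, 9), 2, 2))  # returns c(4, 1, 9)
```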
For case 1, the reliability subset, one has the parameter vector . From these parameters one can also derive other parameters specified in the calibration equations; namely , , , and . We define
and the and in the first two equations of ψ(θ) are functions of θ, as follows. Subjects will have a different estimate of and , depending on whether they are in the reliability subset. We also take advantage of the parameter equalities implied by the measurement error assumptions, as described above.
For subjects with only one measure,
For subjects with reliability measure,
Note, by assumption, one has , which equals 0 for j ≠ j′.
For case 2, the validation subset, the parameter vector also includes the additional parameters ΣTZ, and a different M-estimation vector that allows for a more general covariance structure between (X⋆, Z) and . In this case, T and are allowed to be correlated with Z. We also use the equality . In this case, we define and
Here, , where is provided by Equation (5) and is defined as in Equation (6).
For case 3, the parameters needed to estimate and c⋆ can be estimated by standard linear regression on the biomarker subset, and standard errors for the proposed method in the text (Sections 3 and 4) were calculated using the bootstrap. We provide stacked estimating equations for the sandwich variance estimate, whose performance was compared with the bootstrap variance estimator in Supplementary Table S8.
We estimate with this regression on the biomarker subset . We estimate , where c⋆ is obtained from the regression . We define the parameter vector and
SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.
DATA AVAILABILITY STATEMENT
See WHIscience.org for information regarding obtaining WHI study data.
REFERENCES
1. Buonaccorsi JP. Measurement error in the response in the general linear model. J Am Stat Assoc. 1996;91(434):633–642.
2. Pepe MS. Inference using surrogate outcome data and a validation sample. Biometrika. 1992;79(2):355–365.
3. Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol. 1997;146(2):195–203.
4. Meier AS, Richardson BA, Hughes JP. Discrete proportional hazards models for mismeasured outcomes. Biometrics. 2003;59(4):947–954.
5. Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: the misclassification SIMEX. Biometrics. 2006;62(1):85–96.
6. Edwards JK, Cole SR, Troester MA, Richardson DB. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data. Am J Epidemiol. 2013;177(9):904–912.
7. Shepherd BE, Yu C. Accounting for data errors discovered from an audit in multiple linear regression. Biometrics. 2011;67(3):1083–1091.
8. Shepherd BE, Shaw PA, Dodd LE. Using audit information to adjust parameter estimates for data errors in clinical trials. Clin Trials. 2012;9(6):721–729.
9. Willett W. Nutritional Epidemiology. 2nd ed. New York, NY: Oxford University Press; 1998.
10. Schoeller DA. Measurement of energy expenditure in free-living humans by using doubly labeled water. J Nutr. 1988;118:1278–1289.
11. Bingham SA, Cummings JH. Urine nitrogen as an independent validatory measure of dietary intake: a study of nitrogen balance in individuals consuming their normal diet. Am J Clin Nutr. 1985;42:1276–1289.
12. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69:331–342.
13. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. London, UK: Chapman & Hall; 2006.
14. Francq BG, Govaerts BB. Hyperbolic confidence bands of errors-in-variables regression lines applied to method comparison studies. J de la Société Française de Statistique. 2014;155(1):23–45.
15. Therneau T. Total least squares: Deming, Theil-Sen, and Passing-Bablock regression. R package vignette; 2018. https://cran.r-project.org/web/packages/deming/vignettes/deming.pdf.
16. Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI, Freedman LS. Statistical issues related to dietary intake as the response variable in intervention trials. Stat Med. 2016;35(25):4493–4508.
17. Women’s Health Initiative Study Group. Design of the Women’s Health Initiative clinical trial and observational study. Control Clin Trials. 1998;19(1):61–109.
18. Prentice RL, Tinker LF, Huang Y, Neuhouser ML. Calibration of self-reported dietary measures using biomarkers: an approach to enhancing nutritional epidemiology reliability. Curr Atheroscler Rep. 2013;15(9):1–8.
19. Shaw PA, McMurray R, Butte N, et al. Calibration of activity-related energy expenditure in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). J Sci Med Sport. 2019;22(3):300–306.
20. Neuhouser ML, Di C, Tinker LF, et al. Physical activity assessment: biomarkers and self-report of activity-related energy expenditure in the WHI. Am J Epidemiol. 2013;177(6):576–585.
21. Stefanski LA, Boos DD. The calculus of M-estimation. Am Stat. 2002;56(1):29–38.
22. Neuhouser ML, Tinker L, Shaw PA, et al. Use of recovery biomarkers to calibrate nutrient consumption self-reports in the Women’s Health Initiative. Am J Epidemiol. 2008;167(10):1247–1259.
23. Lumley T, Shaw PA, Dai JY. Connections between survey calibration estimators and semiparametric models for incomplete data. Int Stat Rev. 2011;79(2):200–220.
24. Deville J-C, Särndal C-E, Sautory O. Generalized raking procedures in survey sampling. J Am Stat Assoc. 1993;88(423):1013–1020.
25. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
26. Lumley T. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: John Wiley & Sons; 2011.
