Abstract
Systematic measurement error in self-reported data creates important challenges in association studies between dietary intakes and chronic disease risks, especially when multiple dietary components are studied jointly. The joint regression calibration method has been developed for measurement error correction when objectively measured biomarkers are available for all dietary components of interest. Unfortunately, objectively measured biomarkers are only available for very few dietary components, which limits the application of the joint regression calibration method. Recently, for single dietary components, controlled feeding studies have been performed to develop new biomarkers for many more dietary components. However, it is unclear whether the biomarkers separately developed for single dietary components are valid for joint calibration. In this paper, we show that biomarkers developed for single dietary components cannot be used for joint regression calibration. We propose new methods to utilize controlled feeding studies to develop valid biomarkers for joint regression calibration to estimate the association between multiple dietary components simultaneously with the disease of interest. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulations are performed to study the finite sample performance of the proposed estimators. We apply our methods to examine the joint effects of sodium and potassium intakes on cardiovascular disease incidence using the Women’s Health Initiative cohort data. We identify positive associations between sodium intake and cardiovascular diseases as well as negative associations between potassium intake and cardiovascular disease.
Keywords and phrases: Measurement Error, Regression Calibration, Feeding Study, Biomarker, Cardiovascular Disease
1. Introduction
It is important to understand how dietary and physical activity patterns influence our health. New studies on associations between dietary components and risks of chronic diseases appear frequently, and useful information is being discovered. For example, a positive association between obesity and cancer risk is well established (Adams et al., 2006). However, to prevent and control chronic diseases, more detailed information is needed on how key energy balance factors are associated with the risk factors for major chronic diseases (World Cancer Research Fund/American Institute for Cancer Research (WCRF/AICR), 2007). The working mechanisms of these energy balance factors are complicated, requiring us to jointly study the associations between multiple dietary components and disease risks. Such associations are not easy to establish, and a major challenge arises from bias in dietary assessment, which is known to be difficult to handle (Paeratakul et al., 1998). There is strong evidence (Prentice et al., 2011) that the misreporting of dietary energy intake is related to individual characteristics (for example, body mass index (BMI)). Such systematic measurement errors lead to estimation bias that cannot be automatically corrected (Carroll et al., 2006). Measurement error correction is also more challenging when dietary components are modeled jointly in studies of their relationships with chronic disease.
Measurement error correction has been studied for years with substantial impacts on nutritional studies (Freedman et al., 2011), and various approaches for addressing such measurement error have been proposed (Wang et al., 1997; Zucker, 2005; Song and Huang, 2005; Yan and Yi, 2015; Li and Ryan, 2006; Huang and Wang, 2000; Hu and Lin, 2002; Bartlett and Keogh, 2018). Among these methods, the regression calibration method allows one to handle the covariate-dependent measurement error. It has also been proposed to correct measurement error in covariates (e.g., Rosner, Spiegelman and Willett 1990), and has the advantage of easy implementation. Some previous works on the Women’s Health Initiative (WHI) (Shaw and Prentice, 2012; Prentice, 1982; Prentice et al., 2011; Zheng et al., 2014) have shown the validity and value of using joint regression calibration approaches to tackle the measurement error issue when objective measurements are available to be used as biomarkers of all modeled dietary intakes. These biomarkers are used to build calibration equations for self-reported measurements of the exposures of interest. Calibrated intake estimates are then used to estimate associations between dietary exposures and the risks of various diseases.
There remains an important research gap in building satisfactory calibrated estimates for many nutritional (and physical activity) variables with single objective measurements. Therefore, regression models with multiple predictors have been developed from the feeding study to obtain calibrated estimates. For a single dietary component (or energy balance factors or physical activity), a regression-based biomarker has been established (Zheng et al., 2022) for the WHI Nutrition and Physical Activity Assessment Study (NPAAS). To correct the systematic measurement errors in the self-reported food frequency questionnaire (FFQ) data from the full cohort (with 161,808 subjects), blood and urine measurements were collected for a subgroup (450 subjects) of the cohort (Prentice et al., 2011). Besides, a feeding study (NPAAS-FS) was performed on another smaller subgroup (153 subjects) where both blood and urine measurements and the assessed dietary intake information were collected (Lampe et al., 2017). This controlled feeding study adopted a novel design. Instead of feeding all women with one of several specific diets, each woman was provided food and drink that mimicked her habitual diet as described by her 4-day food record (4FDR) with adjustment based on individual discussion with the study dietitian (Lampe et al., 2017). This new feeding study design can potentially improve the accuracy of capturing the measurement error in the FFQ. However, under this experimental design, there are some challenges in building the regression calibration method. Specifically, the classical measurement error assumption will generally be violated by this feeding study-based biomarker development procedure, which regresses the consumed nutrient on blood and urine measurements and personal characteristics, because the residual of the regression model is independent of the predicted value, rather than the true value. 
Ignoring this violation of the classical measurement error assumption causes bias in the subsequent estimates of the calibrated dietary intake and the diet-disease association due to Berkson-type errors (Prentice et al., 2021). Zheng et al. (2022) developed new calibration methods to allow for Berkson-type errors in the biomarkers for association studies on single nutritional variables. With this method, under rare disease settings, consistent estimators for the disease associations of a single dietary component and valid confidence intervals (CI) for disease association parameters are established.
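The point about the regression residual can be illustrated with a small numerical sketch. It is not the authors' procedure: the variable names (`consumed`, `marker`), the single-predictor model, and the unit-variance normal errors are all assumptions made only for illustration. After regressing the consumed intake on a measurement, the residual is uncorrelated with the predicted value (Berkson-type behavior) but clearly correlated with the consumed intake itself, so the classical assumption does not hold.

```python
import random

def ols_simple(y, x):
    """Least-squares intercept and slope for y regressed on a single x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(0)
n = 5000
# Hypothetical toy data: consumed intake and one noisy measurement of it.
consumed = [rng.gauss(0, 1) for _ in range(n)]
marker = [x + rng.gauss(0, 1) for x in consumed]

# Biomarker development step: regress consumed intake on the measurement.
b0, b1 = ols_simple(consumed, marker)
predicted = [b0 + b1 * w for w in marker]
residual = [x - p for x, p in zip(consumed, predicted)]

corr_with_predicted = corr(residual, predicted)  # zero up to rounding (least squares)
corr_with_consumed = corr(residual, consumed)    # clearly positive in this toy setup
```

In this toy setup the second correlation is far from zero, which is the feature that a calibration method built on the classical error assumption would miss.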
When studying the association between multiple nutritional variables and chronic disease risk jointly, Prentice et al. (2022) developed calibration equations for measurement error correction when objective biomarkers are available for all the nutritional components. It remains an open question on how to deal with the measurement error problem when such objective biomarkers are not available. Can the individually developed regression-based biomarkers and corresponding calibrated estimates still be collectively used in disease association studies? Is there a need for new biomarker development methods for joint regression calibration? To answer these questions, in this paper, we develop simultaneous regression-based biomarkers for a set of nutritional variables to form new joint regression calibration methods, and numerically compare their performance with a naïve method using the individually developed biomarkers. We show asymptotic consistency and construct valid confidence intervals for association estimation using the newly proposed joint regression calibration methods.
To further evaluate the performance of the new methods and demonstrate their real data applications, we apply the new methods to evaluate the association between dietary sodium and potassium intakes and cardiovascular disease (CVD) risks. The association between sodium and potassium intake and CVD risks has been evaluated in a recent WHI study using the regression calibration method where each nutrient intake was calibrated by a single measurement biomarker (from a single 24-hour urine collection) (Prentice et al., 2017, 2013; Huang et al., 2014). A previous study suggested positive associations between CVD and sodium-to-potassium ratio (Cook et al., 2009). However, a single measurement is suboptimal for the regression calibration approach, as it has a low correlation with the true dietary intakes (Cook et al., 2009). Prentice et al. (2017) also showed that the sodium-potassium ratio is positively associated with cardiovascular disease incidence. Instead of treating the sodium-potassium ratio as a single exposure, the sodium and potassium exposures have also been studied separately. Prentice et al. (2017) showed that sodium and potassium were respectively positively and inversely associated with most cardiovascular disease outcomes (except for hemorrhagic stroke). Recent studies (Mizéhoun-Adissoda et al., 2017; Xi et al., 2015; Glatz et al., 2017) showed systolic and diastolic blood pressure to be positively associated with sodium and negatively associated with potassium. A J-shaped association was found between major cardiovascular events and sodium, while no significant association was found with potassium. O’Donnell et al. (2014) found that individuals with 3–6 g of sodium excretion per day have a reduced risk of CVD. A lower risk of hypertension has been identified with a higher level of potassium intake and a lower level of sodium intake in many observational studies and randomized trials (Cook et al., 2009; Geleijnse, Kok and Grobbee, 2004).
Long-term potassium substitution for sodium or sodium intake reduction may also lead to lower risks of CVD. Other than the sodium-to-potassium ratio, the joint effect of sodium and potassium on CVD is also of particular interest. In this paper, we apply our simultaneous regression calibration methods to understand the joint effect of sodium and potassium intakes on the risks of CVD. We also compare this approach with the naïve method, where the univariate biomarkers developed separately are used for the calibration.
2. Framework and Notation
We study the association between a -dimensional exposure variable (for example the log-transformed dietary sodium and dietary potassium intakes) and the time to develop a certain chronic disease . We further consider potential confounding variables, which we call the personal characteristics . We use a Cox proportional hazard model to model the hazard rate of the response:
| (1) |
where . is the parameter of interest, is the parameter for , and is a ‘baseline’ hazard function. From the data, instead of observing , we only observe the self-reported dietary intakes , which may be biased from in a manner that depends on the personal characteristics:
| (2) |
where is an unknown parameter and is a vector of random error that is independent of and . The main purpose of the regression calibration equation development is to estimate the relationship between and . In the WHI feeding study, we provide the subjects’ diets using standardized food, in a manner that mimics the subjects’ usual diet, while using dietary components having well-characterized nutrient content (Lampe et al., 2017). We denote true unobservable dietary intake within this two-week feeding period as
| (3) |
Equation 3 assumes that the short term consumed intake is centered around the long-term intake with random variation . This assumption is satisfied under the design of this feeding study. When deciding on the menus of the feeding study, the intake selection was designed to mimic the long-term diets of the participants. The observed short-term dietary intake in the feeding study, which may suffer from inaccuracies in nutrient databases used to translate food intakes into nutrient intakes, can be modeled as
| (4) |
To predict the dietary intakes, we assume that a vector of biomarker measurements, denoted as , follows a parametric model:
| (5) |
where is an unknown parameter and is independent of , , , , and . Figure 1 displays the study design, including the feeding study (Sample 1) for biomarker development, the biomarker substudy (Sample 2) for calibration equation development, and the full cohort (Sample 3) for disease association analyses.
Fig 1: a. Flow chart of the process from biomarker construction to association estimation; b. Directed acyclic graph for the dependence of the variables; c. Measurement availability in the datasets. In the plots, we use to denote the (unobservable) true long-term dietary intake, the personal characteristics, the biomarker from objectively measured blood and urine measurements, the self-reported dietary intake, the (unobservable) true short-term dietary intake, the observed short-term dietary intake, (, ) the time to the event/censoring and indicator of event/censoring.
When a self-reported intake is also available for the feeding study samples, the bias of self-reported dietary intake can be calibrated directly (see Sections 3.2 and 3.3). However, to obtain the concurrent self-reported dietary intake , one would need a long-term feeding study in which individuals report their provided dietary intakes over the preceding months (e.g., 3 months). Instead, the self-reported intake value in NPAAS-FS was obtained just prior to the feeding period, so it might have measurement errors that are strongly correlated with and cannot serve as presented in equation 2. Therefore, in our motivating NPAAS-FS application, is not available, and our best choice is to use the self-reported intake collected at a different time (at baseline for Sample 3 in our example) as a substitute for for Sample 1. This baseline self-reported intake has been successfully used in studies for multiple dietary components (e.g., protein and carbohydrate) (Prentice et al., 2021). However, there is a time interval between the data collection of this baseline self-reported intake and the measurement time for (, , , ) in Sample 1. There is thus a concern that the distribution of self-reported intake conditional on , , , in Sample 1 might differ from the distribution of conditional on , , , in Samples 2 and 3 for some dietary components. Hence, we consider as unavailable in NPAAS-FS. Alternatively, a vector of biomarkers , composed of objectively measured blood and urine measurements, can be used to bridge between in the feeding study sample and from another larger sample (the calibration sample). We consider the blood and urine measurements to be affected by the short-term diet while the self-reported questionnaire data are directly affected by the long-term diet .
Estimation of the association between and is a 3-stage procedure using non-overlapping samples. The three stages are: 1. the biomarker construction stage, 2. the calibration stage, and 3. the association assessment stage (Figure 1 (b)). For each stage, a different sample is used. We denote the sample size of stage as . Specifically, for Stage 1, we have sample size and for each individual in this sample, we have data (, , ) and in some cases we have available; for Stage 2, we have sample size and for each individual in this sample, we have (, , ); for Stage 3, we have sample size and for each individual in this sample, we have (, ) and the composite outcome (, ) where is the time to disease occurrence and is a potential censoring time. We use the traditional counting process notation, ) and , where is the indicator function.
The estimation procedure is as follows. In Stage 1, we use data from the biomarker construction stage to establish the biomarker. This model can be built by regressing the consumed dietary intakes on one of the following:
(i) blood/urine measurements and personal characteristics ;
(ii) blood/urine measurements , self-reported dietary intake , and personal characteristics ;
(iii) self-reported dietary intake and personal characteristics .
For (ii) and (iii), we can perform regression of each on the corresponding alone, or perform multiple regression using all .
In Stage 2, using data from the calibration stage, we build a calibration equation using the self-reported dietary intake and the personal characteristics to predict the true dietary intake if (i) or (ii) is used in Stage 1. If (iii) is used in Stage 1, then the calibration equation is already established in Stage 1 and thus Stage 2 can be omitted. One caveat is that developing biomarkers using (i) and (ii) would not satisfy the classical measurement error assumption. Therefore, new methods are needed to account for this property. When building the calibration equation, we can perform the regression of each with only or perform multivariate regression using all .
In Stage 3, we only have information on the self-reported dietary intake , the personal characteristics , and the composite survival outcome (, ). We use the calibration equation developed in Stage 2 to calibrate the self-reported dietary intake for the large cohort and perform disease association analyses.
In summary, the simultaneous regression calibration procedure has three stages: biomarker construction, calibration, and estimation. In Stage 1, the relationship between the true dietary intake and biomarker is established. If self-reported dietary intakes are not available, option (i) can be used. If is available in stage 1, whether or not is also available, relationships between and can be directly established with option (iii). If both and are available for Stage 1, one of the options from (i), (ii), and (iii) can be used. As discussed, (i) might lead to Berkson type error (Carroll, Ruppert and Stefanski, 1995) and (iii) might have low efficiency. For Stage 2, we developed bias correction methods to account for the bias introduced by the Berkson type error. For Stage 3, we can use a multivariate approach to jointly study the associations between multiple dietary components and the disease risks. Figure 1(c) shows the variable availability among the three samples.
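The first two stages summarized above can be sketched in a few lines for the naïve multivariate version of option (i), without any bias correction. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the toy data generator, all coefficients, the sample sizes 150, 450, and 2000, and the names `z`, `x`, `w`, `q`, `v` (long-term intake, consumed intake, blood/urine measurements, self-report, personal characteristic) are hypothetical. Only the Python standard library is used.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(y, rows):
    """Least-squares coefficients (intercept first) for y regressed on rows."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coef, rows):
    """Fitted values from intercept-first coefficients."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

rng = random.Random(1)

def draw(n):
    """Toy data (all coefficients hypothetical): two intakes, one characteristic."""
    out = []
    for _ in range(n):
        v = rng.gauss(0, 1)
        z = [0.5 * v + rng.gauss(0, 1), -0.3 * v + rng.gauss(0, 1)]  # long-term intake
        x = [zk + rng.gauss(0, 0.3) for zk in z]                     # consumed intake
        w = [xk + rng.gauss(0, 0.5) for xk in x]                     # blood/urine measures
        q = [0.6 * zk + 0.4 * v + rng.gauss(0, 0.8) for zk in z]     # self-report
        out.append({"v": v, "z": z, "x": x, "w": w, "q": q})
    return out

feeding, substudy, cohort = draw(150), draw(450), draw(2000)

# Stage 1, option (i): regress each consumed intake on all measurements and v.
feed_wv = [d["w"] + [d["v"]] for d in feeding]
stage1 = [ols([d["x"][k] for d in feeding], feed_wv) for k in range(2)]

# Stage 2: regress each predicted biomarker on all self-reports and v.
sub_wv = [d["w"] + [d["v"]] for d in substudy]
sub_qv = [d["q"] + [d["v"]] for d in substudy]
stage2 = [ols(predict(stage1[k], sub_wv), sub_qv) for k in range(2)]

# Input to Stage 3: calibrated intakes for every cohort member.
cohort_qv = [d["q"] + [d["v"]] for d in cohort]
calibrated = [predict(stage2[k], cohort_qv) for k in range(2)]
```

The lists in `calibrated` would then enter the Stage 3 disease model together with the personal characteristics. Using all measurements and all self-reports in each regression is what makes this the multivariate rather than the univariate variant.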
3. Methods
In this section, we discuss methods constructed with different options from the 3-stage framework developed in the previous section. With multiple exposures, a matrix form for the variance of consumed dietary intake, , is considered. In this section, we assume is known. In the real data analysis, when is not available, we vary the parameters to perform sensitivity analysis.
3.1. Method 1: The three-step approach with bias correction
Method 1 adopts option (i) in Stage 1 and corrects for the bias in Stage 2.
Multivariate approach for Method 1
We begin with a naïve method without bias correction. In Stage 1, we perform a linear regression among subjects in the biomarker discovery sample of the consumed diet () on blood and urine measurements () as well as subject characteristics () to obtain:
In Stage 2, we compute for to predict the long-term dietary intake () among the calibration samples and run a regression of ‘s on self-reported food frequency questionnaire data () and subject characteristics () to build the calibration equation:
Finally, we predict the exposure by for in the association samples. Then a Cox model of (, ) on and is fitted to estimate the association parameter by solving the following score equation:
| (6) |
where is a pre-specified large number and we assume .
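As an illustration of how a Cox score equation of this kind can be solved numerically, the sketch below applies Newton-Raphson to the standard partial-likelihood score. It is a generic textbook form under stated assumptions, not necessarily the exact equation (6): it assumes no tied event times, ignores the truncation at a pre-specified large follow-up time, and uses hypothetical toy data (constant baseline hazard 0.1, coefficients 0.5 and −0.5, uniform censoring). In the application, the covariate rows would hold the calibrated exposures and the personal characteristics.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def cox_newton(times, events, covs, iters=15):
    """Newton-Raphson for the Cox partial-likelihood score (no tied event times)."""
    p = len(covs[0])
    order = sorted(range(len(times)), key=lambda i: -times[i])  # latest time first
    beta = [0.0] * p
    for _ in range(iters):
        s0, s1 = 0.0, [0.0] * p
        s2 = [[0.0] * p for _ in range(p)]
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for i in order:  # the risk set grows as we move back in time
            x = covs[i]
            r = math.exp(sum(b * v for b, v in zip(beta, x)))
            s0 += r
            for a in range(p):
                s1[a] += r * x[a]
                for c in range(p):
                    s2[a][c] += r * x[a] * x[c]
            if events[i]:
                for a in range(p):
                    score[a] += x[a] - s1[a] / s0
                    for c in range(p):
                        info[a][c] += s2[a][c] / s0 - s1[a] * s1[c] / s0 ** 2
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Hypothetical toy data to exercise the solver.
rng = random.Random(2)
true_beta = [0.5, -0.5]
times, events, covs = [], [], []
for _ in range(2000):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    rate = 0.1 * math.exp(sum(b * v for b, v in zip(true_beta, x)))
    t = -math.log(1.0 - rng.random()) / rate  # exponential event time
    c = rng.uniform(0, 10)                    # toy censoring time
    times.append(min(t, c))
    events.append(t <= c)
    covs.append(x)

beta_hat = cox_newton(times, events, covs)
```

A plain Newton iteration is used here for brevity; production software typically adds step-halving and explicit tie handling.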
One problem of this naïve approach is the bias introduced by the Berkson type error from Step 2. We show in Appendix A.2 (Theorem S1) that for multiple exposures, where the bias factor () is defined as:
Such will lead to bias in the estimation of association parameter, and if we further assume , we show in Appendix A.2 (Theorem S1) that the estimator as , with and under rare disease approximation.
To account for the in the estimation, we propose a bias-corrected estimator using the multivariate approach where,
is an estimated version of the . Then we can further obtain the bias-corrected estimators , and . Then we can estimate by solving the estimating equation (6) replacing by .
Univariate approach for Method 1
Instead of performing multivariate analysis, it might be attractive to perform a linear regression of each dietary exposure on and separately in the first step, with or without additional bias factor adjustment, because this would allow biomarkers to be developed once and then shared across studies of different combinations of dietary intakes. We therefore also study the performance of such an approach. Each element in can be estimated as below:
for . Then we can derive , that is,
for in the full cohort.
Similar to in the multivariate case, we can fit element-wise using:
where denote the element along the diagonal of . Then the bias-corrected estimators can be calculated as , and . Then we can estimate by solving the estimating equation (6) replacing by
Method 1 does not require the self-reported dietary intake data () in the feeding study, where we have multiple exposures. Next, we propose two methods that assume the availability of self-reported data in the feeding study and the association between the self-reported and the actual dietary intake to be the same among all studies.
3.2. Method 2: Three-step with self-reported data
Method 2 uses option (ii) in Stage 1 and the bias correction is not needed.
Multivariate approach for Method 2
When the self-reported data are available from the feeding study and we believe that the distribution of (, ) is the same between the controlled feeding study and the cohort, then the bias in the naïve estimator can be corrected simply by including in the biomarker development equation, because the inclusion of guarantees that .
Based on this discovery, in the Stage 1 regression model, we add the self-reported food frequency questionnaire data (). To be more specific, for the first step, we regress on , , and to build the biomarker, and then use , and to predict in the second step. Mathematically, we have:
for , and then,
and for . We obtain by solving the estimating equation (6) replacing by .
Univariate approach for Method 2
The univariate approach for Method 2 is similar to the multivariate version. In Step 1, we add as a predictor in the linear model for to build the predicting equation for the kth biomarker. Mathematically, we have
Then in Stage 2 we have:
Finally, we predict the exposure by:
for in the full cohort, and obtain by solving estimating equation (6) replacing by .
3.3. Method 3: Two-step direct estimation
Method 3 adopts option (iii) in Stage 1 and skips Stage 2.
Multivariate approach for Method 3
When is available from the feeding study, there is an option to skip Stage 2 and directly build the estimating equation by regressing on and in Stage 1 and apply it to Stage 3.
With Method 3, we do not need the calibration samples for Stage 2. Instead, we directly build the calibration equation using the feeding study in Stage 1 by regressing on and and use the calibration equation to predict and perform a Cox regression of (, ) on and in the full cohort to estimate the association parameter based on different types of outcomes. Mathematically, we have and for in the full cohort and by solving estimating equation (6) replacing by .
Univariate approach for Method 3
In the univariate approach, we perform a linear regression of each exposure on , separately to build biomarkers. We have and then predict for . Then let , for in the full cohort. Finally, we can obtain by solving estimating equation (6) replacing by .
As a remark, Method 3 depends on the availability of the concurrent self-reported dietary intake in Sample 1. However, as discussed in Section 2, an appropriate is rarely available. Another concern is that the sample size of a feeding study is typically limited, leading to unsatisfactory efficiency for disease association estimates, even in cases where is univariate (Huang et al., 2022).
4. Asymptotics
In this section, we show that under our model framework (1)-(5) and the rare disease assumption:
| (7) |
, , , are consistent. Notice that our model framework satisfies the linearity assumption, normality assumption, and proportional hazard assumption which are required for the original regression calibration method (Prentice, 1982). We also show their asymptotic distributions in the following theorem. The proof can be found in Appendix A.1.
Later in Section 5, we will see from numerical studies that the naïve estimator and the estimators from all the univariate approaches are not consistent. In practice, violation of the rare disease assumption also leads to estimation bias in our proposed multivariate estimators (, , ), but its magnitude is much smaller than for the other estimators (, , , , ) (as shown in the simulation section).
Theorem 1. With and , for , we have
where can be consistently estimated by with the detailed expressions defined in the proofs in Appendix A.
Here the does not depend on , though its corresponding estimate can depend on .
5. Simulation Studies
We perform simulations to study the finite sample behavior of our proposed estimators. We generate data from the following models:
and the event time is sampled based on the hazard function
where , , , are bivariate () and is a single covariate in our simulation. Hence,
and . The error terms, , and are independently sampled from multivariate normal distributions with mean zero and covariance matrices of , and ; we simulate from multivariate normal distribution as below:
Censoring time is sampled from a mixture of Uniform(0, 10) and a point mass at 10 with equal probability. We vary the parameters to form eight representative settings. First, in Settings 1–4, we let by setting . In Settings 5–8, and are not independent conditioning on by setting . For Settings 1, 3, 5, and 7, we simulate strong biomarkers for the dietary intakes, and for Settings 2, 4, 6, and 8, we simulate weak biomarkers. The strength of the self-reported data is held fixed within each consecutive pair of settings (1–2, 3–4, 5–6, 7–8) and varies between pairs. More detailed information on the explained variation of the true () and consumed () dietary intakes by the biomarkers () and the FFQ information () can be found in Table S1. Detailed parameter settings are presented in Appendix B.1. We fix the sample size at , , for all settings. We compare our proposed multivariate Methods 1–3 with the Naïve Method estimator and the univariate methods as described in Section 3.
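The censoring mechanism described above is simple to reproduce. The sketch below draws censoring times from the equal-probability mixture of Uniform(0, 10) and a point mass at 10, and pairs them with event times to form the observed time and event indicator. The constant baseline hazard of 0.01 and the zero linear predictor are assumptions made only to keep the example self-contained; they are not the paper's simulation parameters.

```python
import math
import random

def draw_censoring(rng):
    """Mixture: point mass at 10 with probability 1/2, else Uniform(0, 10)."""
    return 10.0 if rng.random() < 0.5 else rng.uniform(0.0, 10.0)

def draw_event_time(rng, linear_predictor, baseline=0.01):
    """Exponential event time; the constant baseline hazard is an assumption."""
    rate = baseline * math.exp(linear_predictor)
    return -math.log(1.0 - rng.random()) / rate

def observe(rng, linear_predictor):
    """Return (observed time, event indicator) for one subject."""
    t = draw_event_time(rng, linear_predictor)
    c = draw_censoring(rng)
    return min(t, c), int(t <= c)

rng = random.Random(3)
sample = [observe(rng, 0.0) for _ in range(5000)]
event_fraction = sum(d for _, d in sample) / len(sample)
```

With these hypothetical values most subjects are censored, which is the regime in which the rare disease approximation is meant to apply.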
The bias, empirical standard deviation (SD), and coverage rate (CR) of the nominal 95% confidence interval from 1000 simulations are calculated, and the results are summarized in Table 1. The results using the multivariate approach are displayed in the left panel in all tables. The Naïve Method shows substantial bias in most cases. Its SDs are much larger than those of our proposed methods, and both over-coverage and under-coverage occur. With Methods 1–3, the biases are greatly reduced, especially when biomarkers are relatively strong (Settings 1, 3, 5, 7), the SDs are smaller than those of the Naïve Method, and the coverage rates are around the nominal value of 95%. The performance of the three proposed methods differs somewhat across settings. Specifically, with relatively weak biomarkers and FFQ information (Settings 4 and 8), Method 1 has slightly larger biases and SDs. The larger SD from Method 1 in these settings is due to several simulation replicates where is close to 0. This is in general not the case in real applications. As long as is not very small, or the estimated variance is not very large, Method 1 provides good results that are valid under weak assumptions. Methods 2 and 3 provide consistently good estimates of the association parameters with good CRs. However, Method 3 is less efficient than Methods 1 and 2 when the FFQ signal is weak but the biomarker is strong (Settings 3 and 7).
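For readers who want to reproduce summaries of this kind, the three reported quantities can be computed from the replicate-level estimates and standard errors as in the following sketch. The function name `summarize`, the Wald-type interval with critical value 1.96, and the toy replicates are assumptions for illustration; the paper's intervals are based on its own variance estimators.

```python
import random
import statistics

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, empirical SD, and coverage rate of Wald-type 95% intervals."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)
    covered = [abs(e - truth) <= z * s for e, s in zip(estimates, std_errors)]
    return bias, sd, sum(covered) / len(covered)

# Hypothetical replicates: an unbiased estimator with a correctly estimated SE.
rng = random.Random(5)
truth = 0.5
estimates = [rng.gauss(truth, 0.15) for _ in range(1000)]
std_errors = [0.15] * 1000

bias, sd, cr = summarize(estimates, std_errors, truth)
```

An estimator with small SD but non-negligible bias produces a coverage rate well below 0.95 under this calculation, which is the pattern seen for the univariate approaches.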
Table 1.
Simulation results under the multivariate settings (parameters can be found in Appendix B.1) comparing multivariate and univariate approaches.
| Setting | Method | Multivariate Bias | Multivariate SD | Multivariate CR | Univariate Bias | Univariate SD | Univariate CR |
|---|---|---|---|---|---|---|---|
| 1 | Naïve | 0.04, 0.10 | 0.264, 0.236 | 0.96, 0.93 | 0.33, 0.19 | 0.224, 0.184 | 0.68, 0.82 |
| 1 | 1 | 0.00, 0.00 | 0.155, 0.150 | 0.95, 0.95 | 0.00, −0.05 | 0.123, 0.127 | 0.95, 0.93 |
| 1 | 2 | 0.00, 0.00 | 0.162, 0.152 | 0.95, 0.95 | 0.10, 0.00 | 0.155, 0.142 | 0.90, 0.95 |
| 1 | 3 | 0.00, 0.00 | 0.168, 0.161 | 0.96, 0.94 | 0.10, 0.00 | 0.159, 0.147 | 0.91, 0.95 |
| 2 | Naïve | −0.47, 0.78 | 1.658, 1.628 | 1.00, 1.00 | 0.77, 0.98 | 0.379, 0.415 | 0.47, 0.34 |
| 2 | 1 | 0.01, −0.01 | 0.230, 0.195 | 0.99, 0.99 | −0.05, −0.10 | 0.115, 0.127 | 0.93, 0.87 |
| 2 | 2 | 0.00, 0.00 | 0.164, 0.154 | 0.95, 0.95 | 0.10, 0.00 | 0.158, 0.146 | 0.90, 0.95 |
| 2 | 3 | 0.00, 0.00 | 0.168, 0.161 | 0.96, 0.94 | 0.10, 0.00 | 0.159, 0.147 | 0.90, 0.95 |
| 3 | Naïve | 0.04, 0.11 | 0.457, 0.438 | 0.98, 0.98 | 0.44, 0.22 | 0.267, 0.254 | 0.62, 0.86 |
| 3 | 1 | 0.00, 0.00 | 0.251, 0.264 | 0.98, 0.98 | 0.06, −0.03 | 0.147, 0.176 | 0.93, 0.94 |
| 3 | 2 | 0.00, 0.00 | 0.270, 0.276 | 0.97, 0.96 | 0.20, 0.05 | 0.198, 0.213 | 0.83, 0.94 |
| 3 | 3 | 0.00, 0.01 | 0.367, 0.391 | 0.98, 0.97 | 0.20, 0.06 | 0.209, 0.227 | 0.84, 0.94 |
| 4 | Naïve | −0.93, 1.28 | 6.682, 7.059 | 1.00, 1.00 | 0.92, 0.99 | 0.446, 0.536 | 0.46, 0.54 |
| 4 | 1 | −0.05, 0.05 | 0.818, 0.948 | 1.00, 1.00 | 0.00, −0.10 | 0.135, 0.165 | 0.95, 0.90 |
| 4 | 2 | 0.00, 0.01 | 0.290, 0.297 | 0.97, 0.97 | 0.20, 0.06 | 0.204, 0.222 | 0.83, 0.94 |
| 4 | 3 | 0.00, 0.01 | 0.367, 0.391 | 0.98, 0.97 | 0.20, 0.06 | 0.209, 0.227 | 0.84, 0.94 |
| 5 | Naïve | 0.07, 0.12 | 0.197, 0.164 | 0.95, 0.89 | 0.25, 0.14 | 0.232, 0.185 | 0.82, 0.87 |
| 5 | 1 | 0.00, 0.00 | 0.132, 0.123 | 0.96, 0.95 | −0.09, −0.11 | 0.110, 0.120 | 0.85, 0.86 |
| 5 | 2 | 0.00, 0.00 | 0.136, 0.123 | 0.95, 0.95 | −0.02, −0.07 | 0.137, 0.131 | 0.95, 0.91 |
| 5 | 3 | 0.00, 0.00 | 0.139, 0.130 | 0.96, 0.93 | −0.02, −0.07 | 0.137, 0.135 | 0.94, 0.91 |
| 6 | Naïve | −0.74, 1.47 | 0.875, 1.090 | 0.99, 0.99 | 1.73, 2.58 | 0.912, 1.122 | 0.52, 0.36 |
| 6 | 1 | 0.01, −0.01 | 0.152, 0.142 | 0.99, 0.99 | −0.17, −0.19 | 0.100, 0.135 | 0.58, 0.71 |
| 6 | 2 | 0.00, 0.00 | 0.137, 0.125 | 0.95, 0.95 | −0.02, −0.07 | 0.138, 0.134 | 0.95, 0.91 |
| 6 | 3 | 0.00, 0.00 | 0.139, 0.130 | 0.96, 0.93 | −0.02, −0.07 | 0.137, 0.135 | 0.95, 0.94 |
| 7 | Naïve | 0.07, 0.13 | 0.376, 0.333 | 0.97, 0.96 | 0.41, 0.19 | 0.350, 0.315 | 0.77, 0.91 |
| 7 | 1 | 0.00, 0.00 | 0.233, 0.232 | 0.98, 0.97 | −0.02, −0.08 | 0.166, 0.204 | 0.94, 0.93 |
| 7 | 2 | 0.00, 0.01 | 0.251, 0.247 | 0.97, 0.97 | 0.15, 0.02 | 0.255, 0.264 | 0.90, 0.95 |
| 7 | 3 | 0.01, 0.01 | 0.269, 0.270 | 0.98, 0.95 | 0.16, 0.03 | 0.261, 0.279 | 0.89, 0.95 |
| 8 | Naïve | −1.83, 2.95 | 38.318, 50.535 | 1.00, 1.00 | 1.99, 2.44 | 1.210, 1.561 | 0.64, 0.66 |
| 8 | 1 | −0.07, 0.03 | 2.385, 2.328 | 1.00, 1.00 | −0.14, −0.21 | 0.132, 0.196 | 0.82, 0.82 |
| 8 | 2 | 0.00, 0.01 | 0.259, 0.255 | 0.97, 0.97 | 0.16, 0.02 | 0.259, 0.276 | 0.91, 0.96 |
| 8 | 3 | 0.01, 0.01 | 0.269, 0.270 | 0.98, 0.95 | 0.16, 0.03 | 0.261, 0.279 | 0.88, 0.95 |
The univariate approach (right panel) does not perform as well as the multivariate approach. For Methods 1–3, its biases are higher than those from the multivariate approach in most settings. The SDs from the univariate approach are smaller than those from the multivariate approach in most cases. Together, the larger biases and smaller SDs lead to unsatisfactory CRs for the 95% confidence intervals. Additional simulation results under the univariate type data generation mechanism (parameters given in Appendix B.2) can be found in Appendix B.3.
6. Data Analysis
In this section, we demonstrate the application of our proposed methods using the WHI NPAAS feeding study (NPAAS-FS; Years 2010–2014), the NPAAS biomarker study (Years 2006–2009), and the full WHI cohort data (Years 1993–1998). The average gap between WHI enrollment and the NPAAS-FS is 17 years. We consider several CVD outcomes: total coronary heart disease (CHD), coronary death, total stroke, total CVD, and heart failure. The incidence of these CVD events ranges from 1% to 10%, which suggests that the rare disease assumption is not severely violated. Follow-up began at the time of FFQ measurement (the year-1 visit in the dietary modification control arm (DM-C) and enrollment in the observational study (OS)) and continued until the earliest of the specific CVD outcome under analysis, death, loss to follow-up, or September 30, 2010. We formed the exposure variables in two ways: as the log-transformed sodium and potassium intakes in milligrams per day (mg/day), or as the log-transformed ratios of sodium and potassium intakes to calories in milligrams per kcal (mg/kcal). The sodium (Na) and potassium (K) biomarkers were based on 24-hour urine sodium and potassium analyses performed by ion-selective electrode (Korzun and Miller, 1987). The units of the biomarkers and the FFQ information match the units used for the exposures. On the two scales, the correlations between log-transformed Na and K are 0.21 (mg/day) and 0.10 (mg/kcal) in the consumed diet, 0.47 (mg/day) and 0.41 (mg/kcal) in the urine measurements, and 0.80 (mg/day) and 0.10 (mg/kcal) in the self-reported diet. The correlations between the consumed diet and the self-reported diet for Na (mg/day), K (mg/day), Na (mg/kcal), and K (mg/kcal) are 0.17, 0.41, 0.12, and 0.30, respectively, which are relatively weak.
The correlations between the consumed diet and the urine measurements for Na (mg/day), K (mg/day), Na (mg/kcal), and K (mg/kcal) are 0.57, 0.57, 0.44, and 0.54, respectively, which are relatively strong. Covariates considered in our analysis include age, BMI, race/ethnicity, education level, family history of CVD, smoking status (current smoker, previous smoker, or non-smoker, as well as the number of cigarettes per day and smoking years), blood pressure (systolic and diastolic), previous CVD, treated diabetes, treated hypertension, use of cholesterol-lowering drugs (statin and acetylsalicylic acid), self-reported physical activity, use of hormone therapy, and trial status (observational study and arms of the clinical trial). For Stages 1 and 2, only covariates with enough variation are selected, whereas for Stage 3 all potential confounding variables are adjusted for. Specifically, the Stage 3 analysis uses a Cox model that adjusts for age, race/ethnicity, education level, family history of CVD, smoking status, blood pressure, previous CVD, treated diabetes, treated hypertension, use of cholesterol-lowering drugs, self-reported physical activity, and use of hormone therapy, and is stratified by 5-year age categories and trial status. As discussed in Prentice et al. (2017), to prevent over-adjustment we excluded BMI from the adjustment variables in our main analysis and further adjusted for BMI in a sensitivity analysis. Multivariate and univariate analyses are performed for both exposure options.
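As a quick arithmetic check of the 1% to 10% incidence range quoted above, the following sketch divides the event counts by the cohort size, both taken from the headers of Tables 2 and 3. The variable names are illustrative only.

```python
# Cohort size and event counts as reported in Tables 2 and 3
n_total = 105_571
events = {
    "CHD": 4_214,
    "Coronary death": 1_468,
    "Stroke": 3_469,
    "Total CVD": 9_902,
    "Heart failure": 2_078,
}

# Crude cumulative incidence proportion for each outcome
incidence = {outcome: count / n_total for outcome, count in events.items()}

for outcome, proportion in incidence.items():
    print(f"{outcome}: {100 * proportion:.1f}%")
```

These are crude proportions over the whole follow-up period; they ignore censoring and are meant only to illustrate the scale of the event rates.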
With multiple exposures, the adjusted bias factors are estimated by
where can be treated as a sensitivity parameter under multivariate analysis. The most conservative estimate on , a zero matrix, is used to illustrate the potential bias. The adjusted bias factor under univariate analysis is estimated by where , 2 denotes BF for the sodium and potassium exposures.
The estimated hazard ratios (HRs) for the CVD outcomes associated with a 20% increase in sodium or potassium intake from the multivariate analyses are shown in Tables 2 (mg/kcal) and 3 (mg/day), respectively. The two measurement scales (mg/kcal and mg/day) provide slightly different interpretations, but the results are consistent with each other. Overall, our findings are consistent with the results in Prentice et al. (2017). Specifically, the analyses using Method 2 in both Tables 2 and 3 show that increased sodium intake is significantly positively associated with the risks of CHD, coronary death, total CVD, and heart failure, and that increased potassium intake is significantly negatively associated with the risks of CHD, stroke, and total CVD. Comparing the methods, the estimated effects without measurement error correction (No correction) have the smallest magnitudes (HRs closest to 1), and their CIs are much narrower than those from Methods 1–3. For most of the outcomes, the Naïve Method exaggerates the effects (HRs furthest from 1). The estimated HRs using Methods 2 and 3 are similar. The estimated effects using Method 1 are in the same direction as those using Methods 2 and 3. For all the CVD outcomes except stroke, the magnitude of the estimated sodium effect is smaller using Method 1. Compared with the Naïve Method, the proposed Methods 1–3 have narrower confidence intervals, leading to more significant associations.
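Because the exposures enter the model on the log scale, an HR for a 20% increase in intake can be obtained from a log-hazard coefficient as exp(β · ln 1.2). The sketch below illustrates this conversion under the assumption of a natural-log-transformed exposure and a Wald-type interval; the function names and the coefficient and standard error values are hypothetical and are not estimates from the paper.

```python
import math

def hr_for_percent_increase(beta, se, pct=20.0, z=1.96):
    """HR and Wald-type CI for a pct% increase in a natural-log-transformed exposure."""
    step = math.log(1.0 + pct / 100.0)  # change in log-exposure for a pct% increase
    hr = math.exp(beta * step)
    lower = math.exp((beta - z * se) * step)
    upper = math.exp((beta + z * se) * step)
    return hr, lower, upper

def beta_from_hr(hr, pct=20.0):
    """Inverse conversion: log-hazard coefficient implied by an HR per pct% increase."""
    return math.log(hr) / math.log(1.0 + pct / 100.0)

# Hypothetical coefficient and standard error, for illustration only
beta_hypothetical, se_hypothetical = 0.70, 0.18
hr, ci_low, ci_high = hr_for_percent_increase(beta_hypothetical, se_hypothetical)
print(f"HR per 20% increase: {hr:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

The same conversion applies on either scale (mg/day or mg/kcal), since both exposure options are log-transformed.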
Table 2:
Hazard ratio (HR) and 95% confidence interval (CI) of various cardiovascular disease outcomes (coronary heart disease (CHD), coronary death, stroke, total cardiovascular disease (CVD) and heart failure) associated with 20% increase in sodium and potassium (mg/kcal) from multivariate methods.
| Outcome (sample size N=105,571) | Exposure | Naïve Method HR | Naïve Method 95% CI | Method 1 HR | Method 1 95% CI | Method 2 HR | Method 2 95% CI | Method 3 HR | Method 3 95% CI | No correction HR | No correction 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CHD (event n=4,214) | Sodium | 1.14 | (0.95,1.36) | 1.06 | (1.02,1.10) | 1.14 | (1.07,1.22) | 1.13 | (1.03,1.23) | 1.01 | (0.98,1.04) |
| | Potassium | 0.72 | (0.52,1.01) | 0.90 | (0.84,0.97) | 0.89 | (0.84,0.95) | 0.88 | (0.82,0.95) | 0.95 | (0.92,0.97) |
| Coronary death (event n=1,468) | Sodium | 1.35 | (1.11,1.63) | 1.09 | (1.03,1.14) | 1.27 | (1.11,1.46) | 1.24 | (1.10,1.40) | 1.04 | (0.99,1.10) |
| | Potassium | 1.00 | (0.63,1.58) | 0.96 | (0.87,1.06) | 0.95 | (0.86,1.05) | 0.94 | (0.83,1.06) | 0.96 | (0.93,1.00) |
| Stroke (event n=3,469) | Sodium | 0.95 | (0.78,1.15) | 1.01 | (0.97,1.05) | 0.99 | (0.92,1.07) | 1.01 | (0.94,1.08) | 0.98 | (0.95,1.02) |
| | Potassium | 0.72 | (0.49,1.07) | 0.92 | (0.85,1.00) | 0.94 | (0.88,0.99) | 0.93 | (0.88,0.99) | 0.97 | (0.95,1.00) |
| Total CVD (event n=9,902) | Sodium | 1.11 | (0.98,1.27) | 1.05 | (1.01,1.09) | 1.12 | (1.05,1.19) | 1.10 | (1.05,1.16) | 1.01 | (0.99,1.03) |
| | Potassium | 0.78 | (0.60,1.02) | 0.92 | (0.87,0.97) | 0.93 | (0.89,0.98) | 0.93 | (0.88,0.98) | 0.96 | (0.95,0.98) |
| Heart Failure (event n=2,078) | Sodium | 1.75 | (1.29,2.36) | 1.13 | (1.04,1.23) | 1.46 | (1.28,1.67) | 1.43 | (1.20,1.70) | 1.02 | (0.98,1.07) |
| | Potassium | 1.44 | (0.76,2.73) | 1.01 | (0.87,1.18) | 0.96 | (0.82,1.13) | 0.94 | (0.76,1.17) | 0.97 | (0.94,1.00) |
Table 3:
Hazard ratio (HR) and 95% confidence interval (CI) of various cardiovascular disease outcomes (coronary heart disease (CHD), coronary death, stroke, total cardiovascular disease (CVD) and heart failure) associated with 20% increase in sodium and potassium (mg/day) from multivariate methods.
| Outcome (sample size N=105,571) | Exposure | Naïve Method HR | Naïve Method 95% CI | Method 1 HR | Method 1 95% CI | Method 2 HR | Method 2 95% CI | Method 3 HR | Method 3 95% CI | No correction HR | No correction 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CHD (event n=4,214) | Sodium | 1.19 | (1.11,1.29) | 1.07 | (1.03,1.12) | 1.15 | (1.09,1.21) | 1.16 | (1.10,1.23) | 1.04 | (1.02,1.06) |
| | Potassium | 0.81 | (0.69,0.95) | 0.93 | (0.88,0.98) | 0.92 | (0.85,1.00) | 0.94 | (0.86,1.00) | 0.95 | (0.93,0.97) |
| Coronary death (event n=1,468) | Sodium | 1.27 | (1.14,1.42) | 1.10 | (1.05,1.16) | 1.22 | (1.10,1.34) | 1.23 | (1.11,1.37) | 1.03 | (1.00,1.07) |
| | Potassium | 0.81 | (0.59,1.12) | 0.93 | (0.84,1.03) | 0.92 | (0.81,1.05) | 0.93 | (0.82,1.05) | 0.95 | (0.91,0.98) |
| Stroke (event n=3,469) | Sodium | 1.07 | (0.98,1.17) | 1.03 | (0.99,1.07) | 1.01 | (0.96,1.07) | 1.04 | (0.97,1.11) | 1.01 | (0.98,1.03) |
| | Potassium | 0.75 | (0.52,1.09) | 0.91 | (0.81,1.02) | 0.91 | (0.85,0.98) | 0.93 | (0.88,0.98) | 0.97 | (0.95,0.99) |
| Total CVD (event n=9,902) | Sodium | 1.16 | (1.09,1.24) | 1.06 | (1.03,1.10) | 1.12 | (1.06,1.18) | 1.12 | (1.06,1.18) | 1.02 | (1.01,1.04) |
| | Potassium | 0.80 | (0.65,0.99) | 0.93 | (0.88,0.98) | 0.93 | (0.88,0.98) | 0.94 | (0.90,0.98) | 0.96 | (0.95,0.98) |
| Heart Failure (event n=2,078) | Sodium | 1.30 | (1.13,1.50) | 1.11 | (1.05,1.17) | 1.37 | (1.21,1.56) | 1.43 | (1.22,1.68) | 1.03 | (1.00,1.06) |
| | Potassium | 1.37 | (0.89,2.09) | 1.10 | (0.97,1.26) | 1.08 | (0.92,1.28) | 1.07 | (0.92,1.25) | 0.98 | (0.95,1.01) |
Tables S3 and S4 in Appendix C show the estimated hazard ratios from the univariate analyses for a 20% increase in sodium and potassium measured in mg/kcal and mg/day, respectively. For the univariate analysis, we use the univariate approach for the error correction step as described in Section 3, whereas the association step remains the same as in the multivariate analysis. Compared with the multivariate analyses, the point estimates of the effects from the univariate analyses are slightly larger (higher estimated HRs for sodium and lower estimated HRs for potassium), with wider confidence intervals. This may be due to the strong correlation among the estimated dietary intakes. When the correlation between Na and K intakes is moderate, the results from the univariate and multivariate analyses are similar (see Tables 2 and S3), whereas when the correlation is high, the differences in the results are more pronounced (see Tables 3 and S4). In our simulations, the univariate approach tends to generate biased point estimates. To summarize, compared with the univariate approach, the multivariate approach provides more efficient results, with narrower confidence intervals, in estimating the associations between CVD outcomes and sodium and potassium intakes.
As commented in Prentice et al. (2017), there are concerns about over-adjustment when including BMI as a confounding variable in similar studies. As a sensitivity analysis, we show the multivariate analysis results when further including BMI as a confounding variable (Tables S5 and S6). The point estimates of the HRs are slightly smaller and the confidence intervals are wider, suggesting that the inclusion of BMI may affect the inference.
7. Discussion
In this paper, we develop methods that utilize feeding studies to calibrate the measurement errors in self-reported dietary intakes and to estimate the diet-disease association. One limitation of our proposed methods is that they require a series of parametric and semiparametric model assumptions, such as linearity, normality, and proportional hazards. Relaxing these assumptions is a worthwhile direction for future research.
We construct valid biomarkers for regression calibration with multiple exposures, and we compare the performance of the multivariate and univariate analyses in controlling the estimation bias. Although the multivariate approach is more complex to implement, it performs much better. Our simulation studies show that the univariate approach does not perform well in general; the bias is well controlled only when the bivariate long-term dietary intakes are independent conditional on the personal characteristics. When the multiple exposures are correlated conditional on the personal characteristics, the univariate approach can lead to large biases even under the univariate type data generation mechanism (see details in Appendix B.2). The multivariate approach (Methods 1–3) produces estimators with small biases and narrow confidence intervals with good CR in most settings. However, in some cases the univariate approach has moderate bias and substantially lower SD, leading to a smaller mean square error (MSE) than the multivariate approach. Hence the bias-variance trade-off needs to be considered in real applications. In general, the multivariate approach yields consistent and more robust estimates of the association parameters than the univariate approach. The multivariate approach is recommended, especially when the multiple dietary intakes are correlated conditional on personal characteristics. On the other hand, the univariate analysis can be considered when the long-term dietary intakes are independent conditional on personal characteristics.
Among the multivariate methods, the estimation of association parameters using Method 1 can be affected by weak biomarker information. For Settings 2, 4, 6, and 8 in our simulation study, a few of the estimated SE values are very large. Such issues can be resolved by increasing the sample sizes in the biomarker construction and calibration building steps. The Naïve Method does not control bias well in most settings and should not be used. Methods 2 and 3 provide consistent estimates and are more efficient when the FFQ information is strongly associated with long-term dietary intakes. In reality, however, the association between the FFQ data and the true dietary intakes may be much weaker than in such settings, in which case Method 1 can give better efficiency. In summary, Method 3 is straightforward but requires a strong dietary instrument and a large sample size. Method 2 uses biomarker information efficiently, and is stable with consistent dietary habits. In the absence of suitable dietary instruments, Method 1 can be considered. Method 1 is superior with strong biomarkers and weak dietary instruments, while Method 3 is better in the reverse situation.
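The contrast between joint and separate calibration when exposures are correlated can be illustrated with a deliberately simplified simulation. The sketch below is not the paper's Methods 1–3: it assumes a linear outcome model instead of a Cox model, classical additive error in the self-reports, and a calibration subsample in which the true intakes are observed (standing in for valid biomarkers). All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_cal = 200_000, 20_000            # hypothetical cohort and calibration sizes
beta_true = np.array([0.5, -0.5])     # hypothetical association parameters

# Correlated true intakes Z; self-reports Q carry independent additive error
cov_z = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), cov_z, size=n)
Q = Z + rng.normal(0.0, 1.0, size=(n, 2))
Y = Z @ beta_true + rng.normal(0.0, 1.0, size=n)

def ols(X, y):
    """Least-squares coefficients with an intercept in the first position."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

cal = slice(0, n_cal)  # calibration subsample where Z is assumed observed

# Joint calibration: each intake is predicted from both self-reports
coef_joint = ols(Q[cal], Z[cal])                      # shape (3, 2)
Z_hat_joint = np.column_stack([np.ones(n), Q]) @ coef_joint

# Univariate calibration: each intake is predicted from its own self-report only
Z_hat_uni = np.empty_like(Z)
for j in range(2):
    c = ols(Q[cal, j:j + 1], Z[cal, j])
    Z_hat_uni[:, j] = c[0] + c[1] * Q[:, j]

# Association step: regress the outcome on the calibrated intakes
beta_joint = ols(Z_hat_joint, Y)[1:]
beta_uni = ols(Z_hat_uni, Y)[1:]
print("joint calibration:     ", beta_joint)
print("univariate calibration:", beta_uni)
```

In this setup the jointly calibrated estimates stay close to the hypothetical true values, whereas the separately calibrated ones are noticeably attenuated, because the conditional mean of each intake depends on both self-reports once the intakes are correlated.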
In order to derive asymptotic SE for Method 1 under multivariate analysis, the Delta method was used to approximate the . The Bootstrap approach provides better finite sample performance in estimating than the asymptotic variance computed using the Delta method and thus we use the Bootstrap method in our implementation.
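The comparison between a Delta-method approximation and a Bootstrap standard error can be sketched on a generic nonlinear statistic. The example below uses a ratio of two sample means purely for illustration; it is not the quantity estimated in Method 1, and the data, sample size, and number of bootstrap replicates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # hypothetical sample size
x = rng.normal(5.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 1.0, n)

def ratio(x, y):
    """Nonlinear statistic used for illustration: ratio of sample means."""
    return y.mean() / x.mean()

def delta_se(x, y):
    """First-order Delta-method SE of mean(y) / mean(x)."""
    mx, my = x.mean(), y.mean()
    cov = np.cov(x, y, ddof=1)                 # 2x2 sample covariance of (x, y)
    grad = np.array([-my / mx**2, 1.0 / mx])   # gradient w.r.t. (mean x, mean y)
    return float(np.sqrt(grad @ cov @ grad / len(x)))

def bootstrap_se(x, y, n_boot=2000, rng=rng):
    """Nonparametric bootstrap SE: resample (x, y) pairs with replacement."""
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))
        stats[b] = ratio(x[idx], y[idx])
    return float(stats.std(ddof=1))

se_delta = delta_se(x, y)
se_boot = bootstrap_se(x, y)
print(f"Delta-method SE: {se_delta:.4f}, bootstrap SE: {se_boot:.4f}")
```

For a smooth statistic and a moderately large sample such as this one, the two standard errors are expected to be close; differences tend to matter more in small samples or for less well-behaved statistics.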
Supplementary Material
Acknowledgements
This work was supported in part by grant R01 CA119171 from the U.S. National Cancer Institute and R01 GM106177 from the National Institute of General Medical Sciences.
The WHI programs are funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C.
The authors acknowledge the following investigators in the Women’s Health Initiative (WHI) Program: Program Office: Jacques E. Rossouw, Shari Ludlam, Dale Burwen, Joan McGowan, Leslie Ford, and Nancy Geller, National Heart, Lung, and Blood Institute, Bethesda, Maryland; Clinical Coordinating Center, Women’s Health Initiative Clinical Coordinating Center: Garnet L. Anderson, Ross L. Prentice, Andrea Z. LaCroix, and Charles L. Kooperberg, Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington; Investigators and Academic Centers: JoAnn E. Manson, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts; Barbara V. Howard, MedStar Health Research Institute/Howard University, Washington, DC; Marcia L. Stefanick, Stanford Prevention Research Center, Stanford, California; Rebecca Jackson, The Ohio State University, Columbus, Ohio; Cynthia A. Thomson, University of Arizona, Tucson/Phoenix, Arizona; Jean Wactawski-Wende, University at Buffalo, Buffalo, New York; Marian C. Limacher, University of Florida, Gainesville/Jacksonville, Florida; Robert M. Wallace, University of Iowa, Iowa City/Davenport, Iowa; Lewis H. Kuller, University of Pittsburgh, Pittsburgh, Pennsylvania; and Sally A. Shumaker, Wake Forest University School of Medicine, Winston-Salem, North Carolina; Women’s Health Initiative Memory Study: Sally A. Shumaker, Wake Forest University School of Medicine, Winston-Salem, North Carolina. For a list of all the investigators who have contributed to WHI science, please visit: https://www.whi.org/researchers/SitePages/WHI%20Investigators.aspx.
Decisions concerning study design, data collection and analysis, interpretation of the results, the preparation of the manuscript, and the decision to submit the manuscript for publication resided with committees that comprised WHI investigators and included National Heart, Lung, and Blood Institute representatives. The contents of the paper are solely the responsibility of the authors.
REFERENCES
- ADAMS KF, SCHATZKIN A, HARRIS TB, KIPNIS V, MORRIS T and BALLARD-BARBASH R (2006). Overweight, obesity and mortality in a large prospective cohort of persons 50 to 71 years old. New England Journal of Medicine 355 763–778.
- BARTLETT JW and KEOGH RH (2018). Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration. Statistical Methods in Medical Research 27 1695–1708.
- CARROLL RJ, RUPPERT D and STEFANSKI LA (1995). Measurement Error in Nonlinear Models. Chapman and Hall, London.
- CARROLL RJ, RUPPERT D, STEFANSKI LA and CRAINICEANU CM (2006). Measurement error in nonlinear models: a modern perspective. CRC Press, US.
- COOK NR, OBARZANEK E, CUTLER JA, BURING JE, REXRODE KM, KUMANYIKA SK, APPEL LJ and WHELTON PK (2009). Joint effects of sodium and potassium intake on subsequent cardiovascular disease: the Trials of Hypertension Prevention follow-up study. Archives of Internal Medicine 169 32–40.
- WORLD CANCER RESEARCH FUND/AMERICAN INSTITUTE FOR CANCER RESEARCH (WCRF/AICR) (2007). Food, Nutrition and the Prevention of Cancer: A Global Perspective. Washington, DC: American Institute for Cancer Research.
- FREEDMAN LS, SCHATZKIN A, MIDTHUNE D and KIPNIS V (2011). Dealing with dietary measurement error in nutritional cohort studies. Journal of the National Cancer Institute 103 1086–1092.
- GELEIJNSE JM, KOK FJ and GROBBEE DE (2004). Impact of dietary and lifestyle factors on the prevalence of hypertension in Western populations. The European Journal of Public Health 14 235–239.
- GLATZ N, CHAPPUIS A, CONEN D, ERNE P, PÉCHÈRE-BERTSCHI A, GUESSOUS I, F OGNA V, GABUTTI L, MUGGLI F, GALLINO A et al. (2017). Associations of sodium, potassium and protein intake with blood pressure and hypertension in Switzerland. Swiss Medical Weekly 147 w14411.
- HU C and LIN DY (2002). Cox regression with covariate measurement error. Scandinavian Journal of Statistics 29 637–655.
- HUANG Y and WANG CY (2000). Cox regression with accurate covariate unascertainable: A nonparametric-correction approach. Journal of the American Statistical Association 95 1209–1219.
- HUANG Y, VAN HORN L, TINKER LF, NEUHOUSER ML, CARBONE L, MOSSAVAR-RAHMANI Y, THOMAS F and PRENTICE RL (2014). Measurement error corrected sodium and potassium intake estimation using 24-hour urinary excretion. Hypertension 63 238–244.
- HUANG Y, ZHENG C, TINKER L, NEUHOUSER M and PRENTICE R (2022). Biomarker-based methods and study designs to calibrate dietary intake for assessing diet-disease associations. Journal of Nutrition 152 899–906.
- KORZUN W and MILLER W (1987). Sodium and potassium. In Methods in Clinical Chemistry (Pesce A and L K, eds.) p. 86. CV Mosby, St. Louis MO.
- LAMPE JW, HUANG Y, NEUHOUSER ML, TINKER LF, SONG X, SCHOELLER DA, KIM S, RAFTERY D, DI C, ZHENG C, SCHWARZ Y, HORN LV, THOMSON CA, MOSSAVAR-RAHMANI Y, BERESFORD SAA and PRENTICE RL (2017). Dietary biomarker evaluation in a controlled feeding study in women from the Women’s Health Initiative cohort. The American Journal of Clinical Nutrition 105 466–475.
- LI Y and RYAN L (2006). Inference on survival data with covariate measurement error-An imputation approach. Scandinavian Journal of Statistics 33 169–190.
- MIZÉHOUN-ADISSODA C, HOUINATO D, HOUEHANOU C, CHIANEA T, DALMAY F, BIGOT A, ABOYANS V, PREUX P-M, BOVET P and DESPORT J-C (2017). Dietary sodium and potassium intakes: Data from urban and rural areas. Nutrition 33 35–41.
- O’DONNELL M, MENTE A, RANGARAJAN S, MCQUEEN MJ, WANG X, LIU L, YAN H, LEE SF, MONY P, DEVANATH A et al. (2014). Urinary sodium and potassium excretion, mortality, and cardiovascular events. New England Journal of Medicine 371 612–623.
- PAERATAKUL S, POPKIN BM, KOHLMEIER L, HERTZ-PICCIOTTO I, GUO X and EDWARDS LJ (1998). Measurement error in dietary data: Implications for the epidemiologic study of the diet-disease relationship. European Journal of Clinical Nutrition 52 722–727.
- PRENTICE RL (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69 331–342.
- PRENTICE RL, MOSSAVAR-RAHMANI Y, HUANG Y, HORN LV, BERESFORD SAA, CAAN B, TINKER L, SCHOELLER D, BINGHAM S, EATON CB, THOMSON C, JOHNSON KC, OCKENE J, SARTO G, HEISS G and NEUHOUSER ML (2011). Evaluation and comparison of food records, recalls, and frequencies for energy and protein assessment by using recovery biomarkers. American Journal of Epidemiology 174 591–603.
- PRENTICE RL, TINKER LF, HUANG Y and NEUHOUSER ML (2013). Calibration of self-reported dietary measures using biomarkers: an approach to enhancing nutritional epidemiology reliability. Current Atherosclerosis Reports 15 353.
- PRENTICE RL, HUANG Y, NEUHOUSER ML, MANSON JE, MOSSAVAR-RAHMANI Y, THOMAS F, TINKER LF, ALLISON M, JOHNSON KC, WASSERTHEIL-SMOLLER S, SETH A, ROSSOUW JE, SHIKANY J, CARBONE LD, MARTIN LW, STEFANICK M, HARING B and HORN LV (2017). Associations of biomarker-calibrated sodium and potassium intakes with cardiovascular disease risk among postmenopausal women. American Journal of Epidemiology 186 1035–1043.
- PRENTICE R, PETTINGER M, NEUHOUSER M, RAFTERY D, ZHENG C, GOWDA N, HUANG Y, TINKER L, HOWARD B, MANSON J, WALLACE R, MOSSAVAR-RAHMANI Y, JOHNSON K and LAMPE J (2021). Biomarker-calibrated macronutrient intake and chronic disease risk among postmenopausal women. Journal of Nutrition 151 2330–2341.
- PRENTICE RL, ARAGAKI AK, VAN HORN L, THOMSON CA, TINKER LF, MANSON JE, MOSSAVAR-RAHMANI Y, HUANG Y, ZHENG C, BERESFORD SAA, WALLACE R, ANDERSON GL, LAMPE JW and NEUHOUSER ML (2022). Mortality Associated with Healthy Eating Index Components and an Empirical-Scores Healthy Eating Index in a Cohort of Postmenopausal Women. The Journal of Nutrition 152 2493–2504.
- ROSNER B, SPIEGELMAN D and WILLETT WC (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 132 734–745.
- SHAW PA and PRENTICE RL (2012). Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics 68 397–407.
- SONG X and HUANG X (2005). On corrected score approach for proportional hazards model with covariate measurement error. Biometrics 61 702–714.
- WANG CY, HSU L, FENG ZD and PRENTICE RL (1997). Regression calibration in failure time regression. Biometrics 53 131–145.
- XI L, HAO Y-C, LIU J, WANG W, WANG M, LI G-Q, QI Y, ZHAO F, XIE W-X, LI Y et al. (2015). Associations between serum potassium and sodium levels and risk of hypertension: a community-based cohort study. Journal of Geriatric Cardiology: JGC 12 119–126.
- YAN Y and YI GY (2015). A corrected profile likelihood method for survival data with covariate measurement error under the Cox model. The Canadian Journal of Statistics 43 454–480.
- ZHENG C, BERESFORD SAA, HORN LV, TINKER LF, THOMSON CA, NEUHOUSER ML, DI C, MANSON JE, MOSSAVAR-RAHMANI Y, SEGUIN R, MANINI T, LACROIX AZ and PRENTICE RL (2014). Simultaneous association of total energy consumption and activity-related energy expenditure with cardiovascular disease, cancer, and diabetes risk among postmenopausal women. American Journal of Epidemiology 180 526–535.
- ZHENG C, ZHANG Y, HUANG Y and PRENTICE R (2022). Using controlled feeding study for biomarker development in regression calibration for disease association estimation. Statistics in Biosciences epub ahead.
- ZUCKER DM (2005). A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. Journal of the American Statistical Association 100 1264–1277.
