Author manuscript; available in PMC: 2011 Feb 14.
Published in final edited form as: Stat Med. 2008 Aug 15;27(18):3466–3489. doi: 10.1002/sim.3238

Measurement error correction for nutritional exposures with correlated measurement error: Use of the method of triads in a longitudinal setting

Bernard Rosner 1,*, Karin B Michels 2,3, Ya-Hua Chen 1, Nicholas E Day 4
PMCID: PMC3038790  NIHMSID: NIHMS66646  PMID: 18416440

SUMMARY

Nutritional exposures are often measured with considerable error in commonly used surrogate instruments such as the food frequency questionnaire (FFQ) (denoted by Qi for the ith subject). The error can be both systematic and random. The diet record (DR) denoted by Ri for the ith subject is considered an alloyed gold standard. However, some authors have reported both systematic and random errors with this instrument as well.

One goal in measurement error research is to estimate the regression coefficient of Ti (true intake for the ith subject) on Qi, denoted by λTQ. If the systematic errors in Qi and Ri (denoted by qi and ri) are uncorrelated, then λTQ can be estimated without bias by λRQ, the coefficient obtained by regressing Ri on Qi. However, if Corr(qi, ri) > 0, then λRQ > λTQ.

In this paper, we propose a method for indirectly estimating λTQ even in the presence of correlated systematic error based on a longitudinal design where Qi (surrogate measure of dietary intake), Ri (a reference measure of dietary intake), and Mi (a biomarker) are available on the same subjects at 2 time points. In addition, between-person variation in mean levels of Mi among people with the same dietary intake is also accounted for. The methodology is illustrated for dietary vitamin C intake based on longitudinal data from 323 subjects in the European Prospective Investigation of Cancer (EPIC)-Norfolk study who provided two measures of dietary vitamin C intake from the FFQ (Qi) and a 7-day DR (Ri) and plasma vitamin C (Mi) 4 years apart.

Keywords: measurement error, longitudinal data, correlated error, biomarkers

1. INTRODUCTION

The diet record (DR) has often been used as a reference instrument to validate other surrogate instruments of nutritional intake such as the food frequency questionnaire (FFQ) [1]. The FFQ is known to have both random and systematic components of measurement error [2]. The reference instrument (e.g. DR, 24-h recall) may also have both systematic and random errors, although it is generally acknowledged that on average over a large number of people, the reference instrument provides an unbiased estimate of the population mean of true intake. Plummer and Clayton [3] consider the following model:

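$$
\begin{aligned}
Q_{ij} &= \alpha_q + T_i + e_{Qij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J \\
R_{ij} &= T_i + e_{Rij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J
\end{aligned}
\tag{1}
$$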

where Qij(Rij) are the FFQ (DR) intakes for the i th subject at time j, Ti is the true intake, Corr(eQi1, eQi2) = ρq, Corr(eRi1, eRi2) = ρr, Corr(eQij, eRij) = ρqr, and Corr(eQij, eRil) = 0 for j ≠ l = 1, … , J. In this model, αq represents systematic error in the FFQ, eQij represents random error in the FFQ, and eRij represents random error in the DR.

This model is identifiable and allows for a shift-bias term (αq) for the FFQ. However, it does not allow for a scale-bias term where the degree of bias in the FFQ is a function of Ti. Plummer and Clayton [4] have extended the model in (1) by the use of scale-bias coefficients (βq, βr) for nutrient intake and the use of biomarker measurements (Mij):

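$$
\begin{aligned}
Q_{ij} &= \alpha_q + \beta_q T_i + e_{Qij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J \\
R_{ij} &= \alpha_r + \beta_r T_i + e_{Rij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J \\
M_{ij} &= \alpha_m + T_i + e_{Mij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J
\end{aligned}
\tag{2}
$$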

where Mij is the biomarker for the ith subject at the jth time period, Corr(eQij, eQil) = ρQ,jl ≠ 0, Corr(eRij, eRil) = ρR,jl ≠ 0, Corr(eQij, eRij) = ρQR,j ≠ 0, and Corr(eQij, eRil) = ρQR,jl ≠ 0 for j ≠ l; Corr(eQij, eMij) = Corr(eRij, eMij) = 0; and Corr(eQij, eMil) = Corr(eRij, eMil) = 0 for j ≠ l. This type of model might be appropriate for a recovery biomarker such as urine nitrogen, but may not be appropriate for a biomarker such as plasma vitamin C because of the absence of a scale-bias term for the regression of Mij on Ti.

Kaaks et al. [5] have considered a slightly different measurement error model, allowing for a scale-bias factor for the biomarker measurement (M), but no scale-bias factor for the reference instrument:

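$$
\begin{aligned}
Q_i &= \alpha_q + \beta_q T_i + e_{Qi}, \quad i = 1,\ldots,N \\
R_i &= T_i + e_{Ri}, \quad i = 1,\ldots,N \\
M_i &= \alpha_m + \beta_m T_i + e_{Mi}, \quad i = 1,\ldots,N
\end{aligned}
\tag{3}
$$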

where Corr(eQi, eRi) = Corr(eQi, eMi) = Corr(eRi, eMi) = 0.

Based on (3) and using structural equation methods, Ocke and Kaaks [6] proposed the method of triads estimator:

$$
\hat{\rho}_{TQ}^{2} = \frac{\widehat{\mathrm{Corr}}(Q_i, R_i)\,\widehat{\mathrm{Corr}}(Q_i, M_i)}{\widehat{\mathrm{Corr}}(M_i, R_i)}
$$
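As a small illustration of the estimator above, the following sketch computes the method-of-triads validity coefficient from three pairwise correlations. The function name `triads_validity` and the truncation of estimates above 1 are assumptions made for this example, not part of the cited method. The sample call uses the raw baseline correlations from Table I and is valid only under the uncorrelated-error assumptions of model (3).

```python
import math


def triads_validity(r_qr: float, r_qm: float, r_mr: float) -> float:
    """Method-of-triads estimate of Corr(T, Q) from three pairwise correlations.

    r_qr: Corr(Q, R), r_qm: Corr(Q, M), r_mr: Corr(M, R).
    """
    if r_mr == 0:
        raise ValueError("Corr(M, R) must be non-zero")
    rho_sq = r_qr * r_qm / r_mr
    if rho_sq < 0:
        raise ValueError("negative squared correlation: estimate undefined")
    # Assumption for this sketch: estimates above 1, which can occur in
    # finite samples, are truncated at 1.
    return min(1.0, math.sqrt(rho_sq))


# Raw baseline correlations from Table I: DR vs FFQ, plasma vs FFQ, plasma vs DR.
rho_tq = triads_validity(0.45, 0.25, 0.40)
```
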

However, this estimator may not be valid if there is correlated error between the surrogate (Q) and the reference (R) instruments. Hence, Subar et al. [7] consider the following model:

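$$
\begin{aligned}
Q_{ij} &= \mu_{qj} + \alpha_q + q_i + \beta_q T_i + e_{Qij} \\
R_{ij} &= \mu_{rj} + \alpha_r + r_i + \beta_r T_i + e_{Rij} \\
M_{ij} &= \mu_{mj} + T_i + e_{Mij}
\end{aligned}
\tag{4}
$$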

where αq and βq are the shift- and scale-bias factors for the surrogate (FFQ) and αr and βr are the shift- and scale-bias factors for the reference measure (e.g. DR). qi and ri are systematic errors for the surrogate and reference measures, respectively, eQij and eRij are random errors, and ∑j μqj = ∑j μrj = 0. This model is similar to the Plummer and Clayton [4] model in equation (2), except that the systematic and random errors in Q and R are more explicitly defined. The objective is to obtain the regression coefficient of Ti on Qij which can be expressed in the form: λTQ = Cov(Qij, Ti)/Var(Qij). Provided that (a) the systematic errors in the FFQ (qi) and DR (ri) are independent, (b) the scale bias for the reference instrument (βr) is 1, and (c) the random errors (eQij, eRij) are independent, it can be shown from (4) that Cov (Qij, Rij) = Cov (Qij, Ti); thus,

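$$
\lambda_{TQ} = \frac{\mathrm{Cov}(Q_{ij}, T_i)}{\mathrm{Var}(Q_{ij})} = \frac{\mathrm{Cov}(Q_{ij}, R_{ij})}{\mathrm{Var}(Q_{ij})} = \lambda_{RQ}
$$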

Thus, the reference instrument can then be used to correct for measurement error based on the regression calibration approach [2, 8].

However, it is possible that there is correlated error between the surrogate and the reference instruments or Cov(qi, ri) > 0, or that βr ≠ 1 whereby λRQ ≠ λTQ. The disparity between λRQ and λTQ can be large if Corr(qi, ri) is non-trivial [9].
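To see where the disparity comes from, the two covariances can be expanded directly from model (4). This is a sketch under added assumptions: Ti is taken to be independent of qi, ri, eQij, and eRij, and the systematic errors are taken to be independent of the random errors. The visit effects μqj and μrj are constants and drop out.

$$
\begin{aligned}
\mathrm{Cov}(Q_{ij}, R_{ij}) &= \beta_q \beta_r \mathrm{Var}(T_i) + \mathrm{Cov}(q_i, r_i) + \mathrm{Cov}(e_{Qij}, e_{Rij}) \\
\mathrm{Cov}(Q_{ij}, T_i) &= \beta_q \mathrm{Var}(T_i) \\
\lambda_{RQ} - \lambda_{TQ} &= \frac{\beta_q (\beta_r - 1) \mathrm{Var}(T_i) + \mathrm{Cov}(q_i, r_i) + \mathrm{Cov}(e_{Qij}, e_{Rij})}{\mathrm{Var}(Q_{ij})}
\end{aligned}
$$

Each of conditions (a)–(c) removes one term of the numerator: (b) removes the first, (a) the second, and (c) the third. When all three hold, λRQ = λTQ.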

Spiegelman et al. [10] have also considered a biomarker-based model of the form:

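$$
\begin{aligned}
Q_{ij} &= \alpha_q + q_i + \beta_q T_i + e_{Qij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J_Q \ge 2 \\
R_{ij} &= r_i + T_i + e_{Rij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J_R \ge 2 \\
M_{ij} &= \alpha_m + \beta_m T_i + e_{Mij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J_M \ge 2
\end{aligned}
\tag{5}
$$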

The authors propose a method of moments approach, whereby an unbiased estimate of λTQ can be obtained even if there is correlated error, if replicate measures of Qij, Rij, and Mij are available.

This model differs from equations (2) and (4) in that (a) the reference measure is assumed to have no shift or scale bias at the population level and (b) the biomarker does have possible shift-bias (αm) and scale-bias (βm) factors. This model may be more appropriate than (2) or (4) for an imperfect concentration biomarker (e.g. plasma vitamin C).

Fraser et al. [11] consider a two biomarker model of the form:

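$$
\begin{aligned}
Q_i &= \alpha_q + \beta_q T_i + e_{Qi} \\
M_{i1} &= \alpha_{m1} + \beta_{m1} T_i + e_{M1i} \\
M_{i2} &= \alpha_{m2} + \beta_{m2} T_i + e_{M2i}
\end{aligned}
\tag{6}
$$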

where Corr(eQi, eM1i) = Corr(eQi, eM2i) = Corr(eM1i, eM2i) = 0.

The parameters in this model are identifiable but only under the assumption that the errors in the two biomarkers (eM1i, eM2i) are uncorrelated, which may not be generally true if there is between-person variation and covariation in mean levels of biomarker measurements among people with the same dietary intake. In addition, a model with Ri, substituted for Qi, is also considered, with similar assumptions.

Spiegelman et al. [10] also consider a design with an unreplicated biomarker (M) and an additional instrumental variable (V) of the form:

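$$
\begin{aligned}
Q_{ij} &= \alpha_q + q_i + \beta_q T_i + e_{Qij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J_Q \ge 2 \\
R_{ij} &= r_i + T_i + e_{Rij}, \quad i = 1,\ldots,N,\ j = 1,\ldots,J_R \ge 2 \\
M_i &= \alpha_M + \beta_m T_i + e_{Mi}, \quad i = 1,\ldots,N \\
V_i &= \alpha_V + \beta_{\upsilon} T_i + e_{Vi}, \quad i = 1,\ldots,N
\end{aligned}
\tag{7}
$$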

where Corr(eMi, eVi) = 0. This model extends the work of Fraser et al. [11] by allowing for the surrogate (Q), reference measure (R), a biomarker (M), and an instrumental variable (V) in the same model. However, the model in (7) is not uniquely identifiable if there is only a single biomarker and a single instrumental variable (V), but becomes identifiable if there are replicate measures available for both the biomarker (M) and the instrumental variable (V).

In general, there are some potential limitations to the biomarker-based models in equations (4)–(7). First, there is the issue of the specificity of biomarker measurements for the exposure of interest. Second, even if specificity of the biomarker is assumed, there may be metabolic differences among people (e.g. some subjects may have systematically different metabolic absorption rates); hence, there may be systematic error in a biomarker (mi), which is likely to be uncorrelated with either qi or ri in equation (4) or (5). Third, if the time periods in equation (2), (4), or (5) are proximate to each other (e.g. months apart), then it is reasonable to assume that Ti (true intake) would be the same for a given subject at each time point. However, surrogate instruments are often administered at widely separated time periods (e.g. years apart), in which case Ti may change over time. In this paper, we focus our attention on the single biomarker case and generalize equations (2), (4), and (5) to allow for (a) possible systematic error (henceforth referred to as between-person variation) in biomarker measurements and (b) variation in true intake over time by using longitudinal data on Q, R, and M. All parameters in this model are estimable, and standard errors and confidence limits are available in closed form. We then apply these methods to dietary vitamin C intake from the EPIC study to assess whether correlated error has a substantial impact on regression calibration.

2. METHODS

We will first consider the case where there are no additional covariates that affect nutrient intake or biomarker measurements for a particular nutrient.

2.1. No additional covariates that affect nutrient intake or associated biomarkers

We consider an extension of the model in Plummer and Clayton [4], Kaaks et al. [5], and Spiegelman et al. [10] of the form:

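$$
\begin{aligned}
Q_{ij} &= \alpha_{qj} + q_i + \beta_q T_{ij} + e_{Qij}, \quad i = 1,\ldots,n,\ j = 1,2 \\
R_{ij} &= r_i + T_{ij} + e_{Rij}, \quad i = 1,\ldots,n,\ j = 1,2 \\
M_{ij} &= \alpha_{mj} + m_i + \beta_m T_{ij} + e_{Mij}, \quad i = 1,\ldots,n,\ j = 1,2
\end{aligned}
\tag{8}
$$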

where Tij is the true intake for the ith subject at the jth time point and qi, ri, and mi are random effects for the surrogate instrument (Q), reference instrument (R), and biomarker (M), which are distributed as N(0, σq²), N(0, σr²), and N(0, σm²), respectively. We assume that Corr(qi, mi) = Corr(ri, mi) = 0, but Corr(qi, ri) is not necessarily 0. Also, the random errors for Q, R, and M, denoted by eQij, eRij, and eMij, are distributed as N(0, σeq²), N(0, σer²), and N(0, σem²), respectively, and are mutually independent of each other as well as of qi, ri, and mi. Thus, the random errors in Q, R, and M are assumed to be independent both within a given visit and across visits. This may not hold if an additional covariate (e.g. body mass index (BMI)) is related to reported surrogate intake (Q) even conditional on true intake. This issue is considered further in Section 2.3. The random variable mi represents between-person variation in mean levels of the biomarker; its variance (σm²) measures variation in the biomarker among people with the same dietary intake Tij. Finally, we assume that Var(Ti1) = Var(Ti2) and denote this common cross-sectional variance by Var(Tij), but allow E(Ti1) and E(Ti2) to be free parameters, denoted by μT1 and μT2, respectively.

Fitting this model requires longitudinal data over a comparable time period for the surrogate instrument, reference instrument, and biomarker. We note that change scores for Q, R, and M are of the form:

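$$
\begin{aligned}
Q_{di} &\equiv Q_{i2} - Q_{i1} = \alpha_{q2} - \alpha_{q1} + \beta_q (T_{i2} - T_{i1}) + e_{Qi2} - e_{Qi1} \equiv \alpha_{q2} - \alpha_{q1} + \beta_q T_{di} + e_{Qi2} - e_{Qi1} \\
R_{di} &\equiv R_{i2} - R_{i1} = T_{di} + e_{Ri2} - e_{Ri1} \\
M_{di} &\equiv M_{i2} - M_{i1} = \alpha_{m2} - \alpha_{m1} + \beta_m T_{di} + e_{Mi2} - e_{Mi1}
\end{aligned}
\tag{9}
$$

where Tdi = Ti2 − Ti1 is the change in true intake for the ith subject, i = 1, … , n.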

None of the change scores contain the random effects in (8). Our goal is to estimate the measurement error correction factor, which is obtained from the regression coefficient of Tij on Qij of the form:

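$$
T_{ij} = \alpha_{TQ} + \lambda_{TQ} Q_{ij} + e^{*}_{ij}
$$

Thus, λTQ = Cov(Qij, Tij)/Var(Qij) = βq Var(Tij)/Var(Qij) = (βq/βm)[βm Var(Tij)]/Var(Qij). It can be shown that the maximum likelihood estimator (MLE) of βm Var(Tij) is

$$
\hat{\beta}_m \widehat{\mathrm{Var}}(T_{ij}) = \hat{\beta}_m \left[\widehat{\mathrm{Var}}(T_{i1}) + \widehat{\mathrm{Var}}(T_{i2})\right]/2 = \left[\widehat{\mathrm{Cov}}(M_{i1}, R_{i1}) + \widehat{\mathrm{Cov}}(M_{i2}, R_{i2})\right]/2 \equiv \widehat{\mathrm{Cov}}(M_{ij}, R_{ij})
$$

Furthermore, the MLE of βq/βm can be obtained from (9) by

$$
\hat{\beta}_q / \hat{\beta}_m = \widehat{\mathrm{Cov}}(Q_{di}, R_{di}) / \widehat{\mathrm{Cov}}(M_{di}, R_{di})
$$

If we denote $[\widehat{\mathrm{Var}}(Q_{i1}) + \widehat{\mathrm{Var}}(Q_{i2})]/2$ by $\widehat{\mathrm{Var}}(Q_{ij})$, it follows that the MLE of λTQ is given by

$$
\hat{\lambda}_{TQ} = \frac{\widehat{\mathrm{Cov}}(Q_{di}, R_{di})\,\widehat{\mathrm{Cov}}(M_{ij}, R_{ij})}{\widehat{\mathrm{Cov}}(M_{di}, R_{di})\,\widehat{\mathrm{Var}}(Q_{ij})}
\tag{10}
$$

The standard regression calibration factor based on the reference instrument alone is obtained from the regression coefficient of Rij on Qij, given by $\hat{\lambda}_{RQ} = \widehat{\mathrm{Cov}}(Q_{ij}, R_{ij}) / \widehat{\mathrm{Var}}(Q_{ij})$. Based on equation (8), we obtain

$$
\lambda_{RQ} = \frac{\beta_q \mathrm{Var}(T_{ij}) + \mathrm{Cov}(q_i, r_i)}{\mathrm{Var}(Q_{ij})}
\tag{11}
$$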

If there is correlated error, then Cov(qi, ri) > 0 and λRQ > λTQ.
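The contrast between equations (10) and (11) can be sketched numerically. The code below is a minimal illustration using only the Python standard library: `lambda_tq` follows equation (10), `lambda_rq` averages the two cross-sectional estimates, and `simulate` draws data from model (8). The function names, the averaging of λ̂RQ over visits, and every parameter value in `simulate` are assumptions chosen for this example, not estimates from the study.

```python
import random


def cov(x, y):
    """Sample covariance with divisor n (the MLE convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def diff(x1, x2):
    """Change score: visit 2 minus visit 1."""
    return [b - a for a, b in zip(x1, x2)]


def lambda_tq(Q1, Q2, R1, R2, M1, M2):
    """Modified calibration factor, equation (10)."""
    Qd, Rd, Md = diff(Q1, Q2), diff(R1, R2), diff(M1, M2)
    cov_mr = (cov(M1, R1) + cov(M2, R2)) / 2   # estimates beta_m * Var(T)
    var_q = (cov(Q1, Q1) + cov(Q2, Q2)) / 2    # pooled Var(Q)
    return cov(Qd, Rd) * cov_mr / (cov(Md, Rd) * var_q)


def lambda_rq(Q1, Q2, R1, R2):
    """Standard calibration factor, averaged over the two visits."""
    cov_qr = (cov(Q1, R1) + cov(Q2, R2)) / 2
    var_q = (cov(Q1, Q1) + cov(Q2, Q2)) / 2
    return cov_qr / var_q


def simulate(n, seed=0, beta_q=0.7, beta_m=0.5, rho_qr=0.6):
    """Draw two-visit data from model (8); all values are illustrative."""
    rng = random.Random(seed)
    g = rng.gauss
    cols = {k: [] for k in ("Q1", "Q2", "R1", "R2", "M1", "M2")}
    for _ in range(n):
        a = g(0, 20)                                 # stable part of true intake
        T = (90 + a + g(0, 20), 95 + a + g(0, 20))   # Var(T) = 800 at each visit
        u = g(0, 1)                                  # shared source of systematic error
        q = 30 * (rho_qr ** 0.5 * u + (1 - rho_qr) ** 0.5 * g(0, 1))
        r = 25 * (rho_qr ** 0.5 * u + (1 - rho_qr) ** 0.5 * g(0, 1))
        m = g(0, 10)                                 # between-person biomarker variation
        for j, t in enumerate(T, start=1):
            cols[f"Q{j}"].append(60 + q + beta_q * t + g(0, 30))
            cols[f"R{j}"].append(r + t + g(0, 20))
            cols[f"M{j}"].append(15 + m + beta_m * t + g(0, 10))
    return cols


data = simulate(20000, seed=1)
tq = lambda_tq(**data)
rq = lambda_rq(data["Q1"], data["Q2"], data["R1"], data["R2"])
```

With these illustrative settings the true values are λTQ = 0.7 × 800/2192 and λRQ = (0.7 × 800 + 0.6 × 30 × 25)/2192, so in large samples `rq` should exceed `tq`, as equation (11) implies.
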

We now consider confidence limits for λTQ. We have found in simulation studies based on the model in equation (8) that the sampling distribution of λ̂TQ is positively skewed, especially if n is small. Hence, in Appendix A we use the delta method to obtain a closed-form expression for Var[ln(λ̂TQ)]. A 100 per cent × (1−α) confidence interval (CI) for λTQ is then given by [exp(c1), exp(c2)], where

$$
(c_1, c_2) = \ln(\hat{\lambda}_{TQ}) \pm z_{1-\alpha/2} \left\{\mathrm{Var}[\ln(\hat{\lambda}_{TQ})]\right\}^{1/2}
\tag{12}
$$

Var[ln(λ̂TQ)] is obtained from equation (A2) and z1−α/2 is the upper α/2 percentile of an N(0,1) distribution.
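The interval in equation (12) is straightforward to compute once Var[ln(λ̂TQ)] is available. The sketch below takes that variance as an input, because the closed-form expression from Appendix A is not reproduced in this section. In the sample call the variance is approximated by (s.e./estimate)², the first-order delta-method value; this approximation and the function name `log_scale_ci` are assumptions of the example.

```python
import math
from statistics import NormalDist


def log_scale_ci(estimate: float, var_log: float, alpha: float = 0.05):
    """Confidence interval built on the log scale and back-transformed.

    estimate: point estimate (must be positive)
    var_log:  variance of ln(estimate)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_log)
    centre = math.log(estimate)
    return math.exp(centre - half_width), math.exp(centre + half_width)


# Covariate-adjusted raw intake in Table IV: 0.181 with s.e. 0.075.
# Assumption: Var[ln(estimate)] is approximated by (s.e. / estimate) ** 2.
lower, upper = log_scale_ci(0.181, (0.075 / 0.181) ** 2)
```

Under this approximation the limits fall close to the interval (0.081, 0.406) reported in Table IV for the same estimate. The interval is asymmetric about the point estimate, which matches the positive skew noted above.
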

2.2. Variance decomposition

Based on the model in (8), the variance of Qij can be separated into the following independent components:

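$$
\mathrm{Var}(Q_{ij}) = \mathrm{Var}(q_i) + \beta_q^2 \mathrm{Var}(T_{ij}) + \mathrm{Var}(e_{Qij})
\tag{13}
$$

where βq² Var(Tij) represents variation in Q attributable to true intake, Var(qi) represents variation due to systematic error, and Var(eQij) represents variation due to random error. A similar decomposition can be performed for variations in the DR (Rij) and the biomarker (Mij), respectively. To facilitate this decomposition, one can derive maximum likelihood estimates (MLEs) of all the parameters in the model. For this purpose, we let $\tilde{\nu}_i = (\bar{Q}_i, \bar{R}_i, \bar{M}_i)$ and $\tilde{w}_i = (Q_{di}, R_{di}, M_{di})$, i = 1, … , n, and define

$$
A = \widehat{\Sigma}(\tilde{\nu}), \qquad B = \widehat{\Sigma}(\tilde{w})
$$

where

$$
\begin{aligned}
A_{11} &= \sum_{i=1}^{n} (\bar{Q}_i - \bar{\bar{Q}})^2 / n, &
A_{12} &= \sum_{i=1}^{n} (\bar{Q}_i - \bar{\bar{Q}})(\bar{R}_i - \bar{\bar{R}}) / n, &
A_{13} &= \sum_{i=1}^{n} (\bar{Q}_i - \bar{\bar{Q}})(\bar{M}_i - \bar{\bar{M}}) / n \\
A_{22} &= \sum_{i=1}^{n} (\bar{R}_i - \bar{\bar{R}})^2 / n, &
A_{23} &= \sum_{i=1}^{n} (\bar{R}_i - \bar{\bar{R}})(\bar{M}_i - \bar{\bar{M}}) / n, &
A_{33} &= \sum_{i=1}^{n} (\bar{M}_i - \bar{\bar{M}})^2 / n
\end{aligned}
$$

$\bar{Q}_i = (Q_{i1} + Q_{i2})/2$, $\bar{R}_i = (R_{i1} + R_{i2})/2$, $\bar{M}_i = (M_{i1} + M_{i2})/2$, $\bar{\bar{Q}} = \sum_{i=1}^{n} \bar{Q}_i / n$, $\bar{\bar{R}} = \sum_{i=1}^{n} \bar{R}_i / n$, $\bar{\bar{M}} = \sum_{i=1}^{n} \bar{M}_i / n$, and $A_{lk} = A_{kl}$ for all k, l = 1, 2, 3. The elements of B are defined similarly based on Qdi, Rdi, and Mdi, i = 1, … , n, with $\bar{Q}_d = \sum_{i=1}^{n} Q_{di} / n$, $\bar{R}_d = \sum_{i=1}^{n} R_{di} / n$, and $\bar{M}_d = \sum_{i=1}^{n} M_{di} / n$. We also let Si = (Ti1 + Ti2)/2 and define $\sigma_S^2 = \mathrm{Var}(S_i)$ and $\sigma_D^2 = \mathrm{Var}(T_{i2} - T_{i1}) \equiv \mathrm{Var}(T_{di})$. It can be shown that the MLEs of the variance–covariance parameters of (8) exist in closed form and are given in Appendix B.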
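The matrices A and B are the divisor-n covariance matrices of the subject-level means and change scores. A minimal sketch of their computation follows; the helper names `cov_matrix` and `mean_and_change_covariances` are hypothetical, and the closed-form MLEs of Appendix B that take A and B as inputs are not reproduced here.

```python
def cov_matrix(rows):
    """Divisor-n covariance matrix of a list of equal-length tuples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [
        [sum((r[k] - means[k]) * (r[l] - means[l]) for r in rows) / n
         for l in range(p)]
        for k in range(p)
    ]


def mean_and_change_covariances(Q1, Q2, R1, R2, M1, M2):
    """Return (A, B): covariances of two-visit means and of change scores."""
    visits = list(zip(Q1, Q2, R1, R2, M1, M2))
    # Subject-level means over the two visits, in the order (Q, R, M).
    v = [((q1 + q2) / 2, (r1 + r2) / 2, (m1 + m2) / 2)
         for q1, q2, r1, r2, m1, m2 in visits]
    # Subject-level change scores, visit 2 minus visit 1.
    w = [(q2 - q1, r2 - r1, m2 - m1)
         for q1, q2, r1, r2, m1, m2 in visits]
    return cov_matrix(v), cov_matrix(w)


# Tiny made-up data set with three subjects, for illustration only.
A, B = mean_and_change_covariances(
    [1, 2, 3], [2, 4, 6], [1, 1, 2], [1, 3, 2], [0, 1, 0], [2, 1, 4]
)
```
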

2.3. Additional covariates affecting nutrient intake and/or associated biomarkers

It is often the case that biomarker measurements Mij will be affected by covariates other than true intake (Tij) of the nutrient under study. For example, BMI and cigarette smoking may influence the metabolism and absorption of many nutrients. In addition, true dietary intake (Tij) as well as recording of diet using a surrogate instrument (Qij) may also be influenced by other covariates. Let Zijk be the value of the kth covariate measured on the ith subject at time j ; k = 1, … , K. Thus, we consider an extension of (8), which is given by

$$
\begin{aligned}
Q_{ij} &= \alpha_{qj} + q_i + \beta_q T_{ij} + \gamma_q' Z_{ij} + e_{Qij}, \quad i = 1,\ldots,n,\ j = 1,2 \\
R_{ij} &= r_i + T_{ij} + e_{Rij}, \quad i = 1,\ldots,n,\ j = 1,2 \\
M_{ij} &= \alpha_{mj} + m_i + \beta_m T_{ij} + \gamma_m' Z_{ij} + e_{Mij}, \quad i = 1,\ldots,n,\ j = 1,2 \\
T_{ij} &= \alpha_{Tj} + \delta' Z_{ij} + e_{Tij}, \quad i = 1,\ldots,n,\ j = 1,2
\end{aligned}
\tag{14}
$$

where $Z_{ij}' = (Z_{ij1}, \ldots, Z_{ijK})$, $\gamma_q' = (\gamma_{q1}, \ldots, \gamma_{qK})$, $\gamma_m' = (\gamma_{m1}, \ldots, \gamma_{mK})$, and δ′ = (δ1, … , δK) are 1 × K vectors; eQij ~ N(0, σeq²), eRij ~ N(0, σer²), eMij ~ N(0, σem²), eTij ~ N(0, σT²); eQij, eRij, eMij, and eTij are independent; qi, ri, and mi are independent of both Tij and Zij as well as eQij, eRij, eMij, and eTij; qi ~ N(0, σq²), ri ~ N(0, σr²), mi ~ N(0, σm²); and qi and ri are each independent of mi; however, qi and ri may be dependent. Note that qi, ri, and mi in (14) represent random effects conditional on both Tij and Zij and, hence, have a different interpretation than in (8). For example, if Zij = BMI, then qi, ri, and mi are conditional on BMI, making the assumption of independence between, say, qi and Zij more reasonable.

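We wish to estimate λTQ|Z = βq Var(Tij|Zij)/Var(Qij|Zij). Based on (14), we can express

$$
\begin{aligned}
R_{ij} &= \alpha_{Tj} + r_i + \delta' Z_{ij} + e_{Tij} + e_{Rij} \\
M_{ij} &= \alpha_{mj}^{*} + m_i + (\beta_m \delta + \gamma_m)' Z_{ij} + \beta_m e_{Tij} + e_{Mij}
\end{aligned}
$$

where $\alpha_{mj}^{*} = \alpha_{mj} + \beta_m \alpha_{Tj}$. If we let

$$
\begin{aligned}
R_{ij}^{*} &\equiv R_{ij} - \delta' Z_{ij} = \alpha_{Tj} + r_i + e_{Tij} + e_{Rij} \\
M_{ij}^{*} &\equiv M_{ij} - (\beta_m \delta + \gamma_m)' Z_{ij} = \alpha_{mj}^{*} + m_i + \beta_m e_{Tij} + e_{Mij}
\end{aligned}
\tag{15}
$$

then because ri, mi, and Zij are mutually independent, Rij* and Mij* can be interpreted as residuals of Rij and Mij, respectively, on Zij. It follows from (15) that

$$
\mathrm{Cov}(M_{ij}^{*}, R_{ij}^{*}) = \beta_m \mathrm{Var}(T_{ij} \mid Z_{ij}) \equiv \beta_m \sigma_{T|Z}^2
\tag{16}
$$

and thus, $\widehat{\mathrm{Cov}}(M_{ij}^{*}, R_{ij}^{*})$ is the MLE of $\beta_m \sigma_{T|Z}^2$. Similarly, from (14), we define

$$
Q_{ij}^{*} \equiv Q_{ij} - (\beta_q \delta + \gamma_q)' Z_{ij} = \alpha_{qj}^{*} + q_i + \beta_q e_{Tij} + e_{Qij}
$$

where $\alpha_{qj}^{*} = \alpha_{qj} + \beta_q \alpha_{Tj}$ and interpret Qij* as the residual of Qij on Zij. We now consider the difference scores: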

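$$
\begin{aligned}
Q_{di}^{*} &\equiv Q_{i2}^{*} - Q_{i1}^{*} = (\alpha_{q2}^{*} - \alpha_{q1}^{*}) + \beta_q (e_{Ti2} - e_{Ti1}) + (e_{Qi2} - e_{Qi1}) \\
R_{di}^{*} &\equiv R_{i2}^{*} - R_{i1}^{*} = (\alpha_{T2} - \alpha_{T1}) + (e_{Ti2} - e_{Ti1}) + (e_{Ri2} - e_{Ri1}) \\
M_{di}^{*} &\equiv M_{i2}^{*} - M_{i1}^{*} = (\alpha_{m2}^{*} - \alpha_{m1}^{*}) + \beta_m (e_{Ti2} - e_{Ti1}) + (e_{Mi2} - e_{Mi1})
\end{aligned}
\tag{17}
$$

From (17) it follows that $\mathrm{Cov}(Q_{di}^{*}, R_{di}^{*}) = \beta_q \mathrm{Var}(e_{Ti2} - e_{Ti1})$ and $\mathrm{Cov}(M_{di}^{*}, R_{di}^{*}) = \beta_m \mathrm{Var}(e_{Ti2} - e_{Ti1})$, and thus the MLE for βq/βm is given by

$$
\hat{\beta}_q / \hat{\beta}_m = \widehat{\mathrm{Cov}}(Q_{di}^{*}, R_{di}^{*}) / \widehat{\mathrm{Cov}}(M_{di}^{*}, R_{di}^{*})
\tag{18}
$$

Therefore, from (16) and (18) we have that the MLE for $\beta_q \sigma_{T|Z}^2$ is

$$
\hat{\beta}_q \hat{\sigma}_{T|Z}^2 = \widehat{\mathrm{Cov}}(M_{ij}^{*}, R_{ij}^{*})\,\widehat{\mathrm{Cov}}(Q_{di}^{*}, R_{di}^{*}) / \widehat{\mathrm{Cov}}(M_{di}^{*}, R_{di}^{*})
$$

Finally, we estimate λTQ|Z by

$$
\hat{\lambda}_{TQ|Z} = \frac{\widehat{\mathrm{Cov}}(T_{ij}, Q_{ij} \mid Z_{ij})}{\widehat{\mathrm{Var}}(Q_{ij} \mid Z_{ij})} = \frac{\widehat{\mathrm{Cov}}(M_{ij}^{*}, R_{ij}^{*})\,\widehat{\mathrm{Cov}}(Q_{di}^{*}, R_{di}^{*})}{\widehat{\mathrm{Cov}}(M_{di}^{*}, R_{di}^{*})\,\widehat{\mathrm{Var}}(Q_{ij}^{*})}
\tag{19}
$$

which can be compared with $\hat{\lambda}_{RQ|Z} = \widehat{\mathrm{Cov}}(Q_{ij}^{*}, R_{ij}^{*}) / \widehat{\mathrm{Var}}(Q_{ij}^{*})$. To obtain confidence limits for λTQ|Z, we use the same approach as in Appendix A and equation (12), replacing Qij, Rij, and Mij by Qij*, Rij*, and Mij*, respectively.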
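Equation (19) amounts to applying equation (10) to covariate-adjusted residuals. The sketch below illustrates this for a single covariate (K = 1). As a simplification it removes the covariate with a pooled within-visit least-squares slope rather than the mixed effects regression used in the example of Section 3; that substitution, the restriction to one covariate, and the names `residualize` and `lambda_tq_given_z` are assumptions of this example.

```python
def cov(x, y):
    """Sample covariance with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def residualize(y1, y2, z1, z2):
    """Remove one covariate Z from Y at both visits.

    The slope is the least-squares coefficient of Y on Z with a separate
    intercept per visit. Intercepts are not subtracted because covariances
    do not depend on location.
    """
    slope = (cov(y1, z1) + cov(y2, z2)) / (cov(z1, z1) + cov(z2, z2))
    return ([y - slope * z for y, z in zip(y1, z1)],
            [y - slope * z for y, z in zip(y2, z2)])


def lambda_tq_given_z(Q1, Q2, R1, R2, M1, M2, Z1, Z2):
    """Covariate-adjusted calibration factor in the spirit of equation (19)."""
    Q1s, Q2s = residualize(Q1, Q2, Z1, Z2)
    R1s, R2s = residualize(R1, R2, Z1, Z2)
    M1s, M2s = residualize(M1, M2, Z1, Z2)
    # Change scores of the residuals, visit 2 minus visit 1.
    Qd = [b - a for a, b in zip(Q1s, Q2s)]
    Rd = [b - a for a, b in zip(R1s, R2s)]
    Md = [b - a for a, b in zip(M1s, M2s)]
    cov_mr = (cov(M1s, R1s) + cov(M2s, R2s)) / 2
    var_q = (cov(Q1s, Q1s) + cov(Q2s, Q2s)) / 2
    return cov(Qd, Rd) * cov_mr / (cov(Md, Rd) * var_q)
```

With several covariates, the single slope would be replaced by a multiple regression of each measure on all covariates; the remaining steps are unchanged.
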

2.4. Assessment of covariate effects on the systematic components of dietary and plasma measurement errors

It is also of interest to estimate γ̰q and γ̰m. γ̰q represents the effect of Zij on Qij conditional on true intake Tij. Hence, γ̰q allows us to evaluate whether covariates Zij are associated with systematic components of dietary (Qij) measurement error. γ̰m has a similar interpretation regarding the effects of covariates Zij on Mij (biomarker) conditional on Tij. If we refer to (14), we see that

$$
\begin{aligned}
QR_{ij} &\equiv Q_{ij} - \beta_q R_{ij} = \alpha_{qj} + (q_i - \beta_q r_i) + \gamma_q' Z_{ij} + (e_{Qij} - \beta_q e_{Rij}) \\
MR_{ij} &\equiv M_{ij} - \beta_m R_{ij} = \alpha_{mj} + (m_i - \beta_m r_i) + \gamma_m' Z_{ij} + (e_{Mij} - \beta_m e_{Rij})
\end{aligned}
\tag{20}
$$

where βq and βm are estimated from (14) (see Appendix B). Hence, we can estimate γ̰q and γ̰m by running mixed effects regression models of QRij on Zij and MRij on Zij, respectively.
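The step from (14) to (20) is a substitution that cancels true intake. Written out for the surrogate, with βq treated as known for the purpose of this sketch:

$$
\begin{aligned}
Q_{ij} - \beta_q R_{ij}
&= \left(\alpha_{qj} + q_i + \beta_q T_{ij} + \gamma_q' Z_{ij} + e_{Qij}\right) - \beta_q \left(r_i + T_{ij} + e_{Rij}\right) \\
&= \alpha_{qj} + (q_i - \beta_q r_i) + \gamma_q' Z_{ij} + (e_{Qij} - \beta_q e_{Rij})
\end{aligned}
$$

The term βq Tij cancels, so the only systematic dependence on Zij that remains is γq′Zij. The subject-level term (qi − βq ri) is constant across visits, which is why a mixed effects model with a subject random effect is the natural choice. The same argument with βm in place of βq gives the expression for MRij.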

3. EXAMPLE

Applying the methods in this paper requires longitudinal data on intake obtained from a surrogate instrument, intake obtained from a reference instrument, and a biomarker over a sufficiently long period of time where non-trivial changes in dietary intake are possible. For this purpose, we use data from the EPIC study, a multi-center cohort study on diet and cancer conducted in 28 regional centers located in 10 Western European countries with varying dietary habits and cancer risk [12]. For 328 participants of the EPIC-Norfolk study, one of the two U.K.-based centers, data were available on dietary vitamin C assessed by both FFQ and a 7-day DR with plasma vitamin C as a biomarker. These data were available at both baseline and 4 years of follow up. We note that DR intake was obtained at the time of the blood draw, whereas FFQ intake pertains to intake during the previous year. There were five participants with outlying values for either plasma vitamin C (n = 3) or reported FFQ intake (n = 2) at one visit in the absence of outlying values at the other visit who were excluded from the analysis [13]. Previous analyses from the EPIC-Norfolk study have looked at the relationship between plasma vitamin C and dietary vitamin C assessed by FFQ and DR [14]. In this paper, we use the longitudinal data from the remaining 323 participants to estimate the parameters in (8). Descriptive statistics of the demographic variables, nutrient intake, and plasma levels at each time point are provided in Table I.

Table I.

Descriptive statistics for vitamin C intake and plasma vitamin C, EPIC-Norfolk study, n=323.

Baseline Year 4 Difference*
Total caloric intake (kcal)
  FFQ (mean±s.d.) 2033.6±509.9 1980.7±520.6 −52.9±490.6
  DR (mean±s.d.) 1755.1±394.8 1857.7±429.8 102.6±327.7
Dietary vitamin C intake (mg/day)
  FFQ (raw) (mean±s.d.) 135.5±57.4 137.1±63.2 1.6±55.6
  DR (raw) (mean±s.d.) 90.6±50.1 94.8±51.1 4.2±44.4
    Correlation (DR vs FFQ) 0.45 0.51 0.16§
  FFQ (cal.-adj.) (mean±s.d.) 134.4±54.5 135.7±58.7 1.3±50.7
  DR (cal.-adj.) (mean±s.d.) 90.6±50.2 94.6±52.0 4.1±46.2
    Correlation (DR vs FFQ) 0.47 0.57 0.22§
Plasma vitamin C (µmol/L) 57.7±21.2 64.8±23.2 7.2±21.5
  Correlation (vs FFQ, raw) 0.25 0.24 0.11
  Correlation (vs DR, raw) 0.40 0.36 0.28
  Correlation (vs FFQ, cal.-adj.) 0.25 0.27 0.11
  Correlation (vs DR, cal.-adj.) 0.40 0.34 0.27
Age (mean±s.d.) 69.0±2.9 73.3±3.0
Gender
  Male 80 (25 per cent)
  Female 243 (75 per cent)
Height (cm) (mean±s.d.) 162.9±8.1 162.2±8.2
BMI (kg/m2) (mean±s.d.) 26.2±3.3 26.7±3.6
Smoking Status
  Current 17 (5 per cent) 12 (4 per cent)
  Past 127 (39 per cent) 132 (41 per cent)
  Never 179 (56 per cent) 179 (55 per cent)
Vitamin C supplement use
  Yes 42 (13 per cent)
  No 281 (87 per cent)
* Year 4 minus baseline.

FFQ, food frequency questionnaire; DR, diet record.

Exclusive of vitamin supplements.

§ Correlation between change in DR intake (year 4 minus baseline) and change in FFQ intake (year 4 minus baseline).

Correlation between change in dietary intake (year 4 minus baseline) and change in plasma vitamin C (year 4 minus baseline).

At baseline, the mean age of the study population included in this analysis was 69 years and 75 per cent of the subjects were women. About 5 per cent of the subjects were current smokers and 13 per cent were vitamin C supplement users. We see that dietary vitamin C intake reported on the FFQ was about 50 per cent higher than the DR at both baseline and year 4. Reported intake on the FFQ was relatively constant over 4 years. Reported DR intake increased slightly and measured plasma vitamin C levels increased moderately over 4 years. Cross-sectional correlations between calorie-adjusted DR and FFQ vitamin C nutrient intake ranged from 0.47 to 0.57; correlations between plasma vitamin C and calorie-adjusted nutrient intake from either instrument ranged from 0.25 to 0.40. Correlations between change in calorie-adjusted FFQ and DR intake were substantially lower (ρ = 0.22) than cross-sectional correlations. Correlations between change in calorie-adjusted Vitamin C intake and change in plasma vitamin C were also weak, but were slightly stronger for DR intake (ρ = 0.27) than for FFQ intake (ρ = 0.11).

A number of covariates may potentially be related to either dietary vitamin C intake or plasma vitamin C, some of which may change over time. Hence, we ran the following mixed effects regression model with, for example, FFQ vitamin C intake (Qij) as the response variable, where Qi1,Qi2 = FFQ vitamin C intake for the ith subject at baseline and year 4, respectively, treating the subject as a random effect and age, gender, height, BMI, smoking status, and vitamin C supplement use as fixed effects and using a compound symmetry correlation structure:

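$$
\begin{aligned}
Q_{ij} = {} & \alpha + \beta_1\,\text{age}_{ij} + \beta_2\,\text{male gender}_{i} + \beta_3\,\text{height}_{ij} + \beta_4\,\text{BMI}_{ij} + \beta_5\,\text{current smoking}_{ij} \\
& + \beta_6\,\text{ex-smoking}_{ij} + \beta_7\,\text{vit. C supplement use}_{ij} + \beta_8\,\text{visit}_{j} + e_{ij}, \quad i = 1,\ldots,323,\ j = 1,2
\end{aligned}
\tag{21}
$$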

and obtained residuals of Qij from equation (21); similar analyses were performed for DR vitamin C intake (Ri1, Ri2) and plasma vitamin C (Mi1, Mi2). For dietary vitamin C, analyses were performed for both raw and calorie-adjusted intakes. Calorie-adjusted FFQ vitamin C intake scores for males were obtained from

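$$
Q_{ij,\text{cal.-adj}} = \exp\left\{\ln(Q_{ij}) - \theta_{Q,j}\left[\ln(C_{ij}) - \mathrm{mean}\left[\ln(C_{ij}),\ i = 1,\ldots,80\right]\right]\right\}, \quad i = 1,\ldots,80,\ j = 1,2
$$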

where without loss of generality we assume that the first 80 subjects are males, Cij is the total caloric intake for the ith male at time j, and θQ, j is the regression coefficient of ln(Qij) on ln(Cij) based on the sample of 80 males. Similar formulas were used for females and for DR intake for both males and females. The results are given in Table II.
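The calorie adjustment above can be sketched as a single function applied to one sex at one visit. The function name `calorie_adjust` is hypothetical, and the sample values are made up; intakes and calories must be strictly positive for the logarithms to exist.

```python
import math


def calorie_adjust(intake, calories):
    """Log-log residual calorie adjustment for one sex at one visit.

    Regresses ln(intake) on ln(calories) by least squares, then removes
    the fitted calorie effect relative to the group mean of ln(calories).
    """
    log_q = [math.log(q) for q in intake]
    log_c = [math.log(c) for c in calories]
    n = len(log_q)
    mean_q, mean_c = sum(log_q) / n, sum(log_c) / n
    # Least-squares slope of ln(Q) on ln(C): the theta in the formula above.
    theta = (sum((c - mean_c) * (q - mean_q) for c, q in zip(log_c, log_q))
             / sum((c - mean_c) ** 2 for c in log_c))
    return [math.exp(q - theta * (c - mean_c)) for q, c in zip(log_q, log_c)]


# Made-up example: intake exactly proportional to calories, so all
# calorie-related variation is removed and the adjusted values coincide.
adjusted = calorie_adjust([100, 200, 400], [1000, 2000, 4000])
```

In the study this step is applied separately to males and females, to each visit, and to both the FFQ and the DR.
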

Table II.

Mixed effects regression of dietary vitamin C intake and plasma vitamin C, respectively, on other covariates, EPIC-Norfolk study, n=323.*

Outcomes (one Beta±s.e. and p-Value pair each): FFQ, raw; FFQ, cal.-adj.; DR, raw; DR, cal.-adj.; Plasma vitamin C
Variable Beta±s.e. p-Value Beta±s.e. p-Value Beta±s.e. p-Value Beta±s.e. p-Value Beta±s.e. p-Value
Constant −4.8±117.7 −9.6±110.3 −24.5±100.0 −13.7±100.4 127.0±38.3
Age (yrs) 1.10±1.01 0.27 1.02±0.94 0.28 0.02±0.85 0.98 0.15±0.86 0.86 −0.93±0.33 0.005
Male gender (1=yes/0=no) −15.5±10.0 0.12 −15.0±9.4 0.11 −12.0±8.6 0.16 −9.8±8.6 0.26 −14.1±3.2 <0.001
Height (cm) 0.15±0.51 0.77 0.20±0.48 0.68 0.76±0.44 0.085 0.59±0.44 0.18 0.10±0.17 0.53
BMI (kg/m2) 1.78±0.82 0.032 1.85±0.77 0.017 −0.13±0.70 0.85 0.10±0.70 0.89 −0.73±0.27 0.008
Smoking status
  Current −9.6±13.3 0.47 −15.1±12.4 0.23 −28.7±11.2 0.010 −30.4±11.3 0.007 −21.0±4.4 <0.001
  Past −6.6±6.7 0.33 −9.1±6.3 0.15 −4.0±5.7 0.49 −4.5±5.7 0.43 −0.9±2.2 0.67
Vitamin C supplement use 4.7±8.7 0.59 2.4±8.2 0.77 1.4±7.5 0.85 2.6±7.5 0.73 14.7±2.8 <0.001
Visit (1=visit2/0=visit1) −4.1±5.3 0.45 −4.1±4.9 0.41 4.3±4.4 0.33 3.3±4.5 0.46 11.3±1.8 <0.001
Correlation between repeated measures 0.56 0.58 0.61 0.58 0.43
* Based on PROC MIXED of SAS.

Using a compound symmetry correlation structure.

Based on Table II, we see that the BMI was significantly associated with calorie-adjusted FFQ vitamin C intake (Beta = 1.85±0.77, p = 0.017) with heavier subjects reporting higher levels of intake. However, no association was found for DR intake. Current smoking was inversely associated with calorie-adjusted DR intake with current smokers reporting lower levels of intake (Beta = −30.4±11.3, p = 0.007). Associations were strongest for plasma vitamin C. Plasma vitamin C was positively associated with vitamin C supplement use (Beta = 14.7±2.8, p<0.001) and inversely associated with age (Beta = −0.93±0.33, p = 0.005), male gender (Beta = −14.1±3.2, p<0.001), BMI (Beta = −0.73±0.27, p = 0.008), and current smoking (Beta = −21.0±4.4, p<0.001). After controlling for the risk factors in Table II, there was a moderate intraclass correlation between repeated measures of calorie-adjusted dietary intake (ICC = 0.58) and plasma vitamin C (ICC = 0.43).

We now fit the model in equation (14) by obtaining the maximum likelihood estimates of parameters after adjusting for the covariates in Table II. Separate analyses were performed for both raw and calorie-adjusted vitamin C intakes. Also, based on equation (13), we decomposed the variance of FFQ vitamin C intake (Var(Qij)) into components of variation due to systematic error (Var(qi)), true dietary intake (βq2Var(Tij)), and random error (Var(eQij)). This decomposition was performed for both unadjusted and covariate-adjusted analyses. A similar decomposition was used for DR and biomarker measurements. The results are given in Table III.

Table III.

Variance component estimates based on reported vitamin C intake and plasma vitamin C, EPIC-Norfolk study, n=323.*

Column groups: Raw intake (first two columns); Calorie-adjusted intake (last two columns)
Source of variation Unadjusted (per cent) Covariate-adjusted (per cent) Unadjusted (per cent) Covariate-adjusted (per cent)
Food frequency questionnaire (Qij) 3634 3526 3199 3069
Systematic error 1848 (51) 1718 (49) 1652 (52) 1486 (48)
True intake   473 (13)   460 (13)   580 (18)   560 (18)
Random error 1313 (36) 1348 (38)   967 (30) 1023 (33)
Diet record (Rij) 2552 2492 2601 2542
Systematic error 1128 (44)   998 (40) 1045 (40)   904 (36)
True intake   853 (33)   884 (35) 1076 (41) 1092 (43)
Random error   571 (22)   610 (24)   480 (18)   546 (21)
Plasma vitamin C (Mij)§   492   401   492   401
Between-person variation   154 (31)     77 (19)   188 (38)   104 (26)
True intake   209 (42)   164 (41)   163 (33)   131 (33)
Random error   129 (26)   160 (40)   141 (29)   166 (41)
* With adjustment for the covariates in Table II.

FFQ: variation due to systematic error, Var(qi); variation due to true intake, βq² Var(Tij); variation due to random error, Var(eQij).

DR: variation due to systematic error, Var(ri); variation due to true intake, Var(Tij); variation due to random error, Var(eRij).

§ Plasma vitamin C: between-person variation, Var(mi); variation due to true intake, βm² Var(Tij); variation due to random error, Var(eMij).

We see that for covariate- and calorie-adjusted FFQ intake, 48 per cent of the total variation is due to systematic error, 33 per cent is due to random error, and only 18 per cent is attributable to true dietary intake. For covariate- and calorie-adjusted DR intake, systematic error accounted for 36 per cent, random error for 21 per cent, and true dietary intake for 43 per cent of total variation. For plasma vitamin C, between-person variation accounted for 26 per cent of total variation, 41 per cent of the total variation was due to random error, and 33 per cent to variation in true dietary intake. Hence, the DR was most reflective of true intake among these three indices. For both raw and calorie-adjusted intakes, covariate-adjustment resulted in reduced variation due to systematic error and increased variation due to random error.
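The percentages quoted above follow from equation (13) by dividing each component by their sum. The sketch below reproduces them for covariate- and calorie-adjusted FFQ intake using the values in Tables III and IV; the function name `variance_shares` is hypothetical.

```python
def variance_shares(systematic, true_intake, random_error):
    """Percentage of total variance for each component of equation (13)."""
    parts = {
        "systematic": systematic,
        "true_intake": true_intake,
        "random_error": random_error,
    }
    total = sum(parts.values())
    return {name: round(100 * value / total) for name, value in parts.items()}


# Covariate- and calorie-adjusted FFQ components from Table III.
ffq = variance_shares(1486, 560, 1023)

# Consistency check against Table IV: the true-intake component is
# beta_q**2 * Var(T), using beta_q = 0.716 and Var(T) = 1092.
ffq_true_component = 0.716 ** 2 * 1092
```

Because each share is rounded separately, the three percentages need not sum to exactly 100.
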

Estimates and standard errors for all the parameters in equations (8) and (14) are given in Table IV. We also computed the standard (λRQ) and modified (λTQ) regression calibration factors (equations (10), (11), and (19)), for both raw and calorie-adjusted nutrient intakes, with and without adjusting for the other covariates in Table II.

Table IV.

Parameter estimates from models in equations (8) and (14), EPIC-Norfolk study, n=323.

Column groups: Raw vitamin C intake (Unadjusted*, Covariate-adjusted); Calorie-adjusted vitamin C intake (Unadjusted*, Covariate-adjusted)
Parameter type Parameter Independent variable Est.±s.e. p-Value Est.±s.e. p-Value Est.±s.e. p-Value Est.±s.e. p-Value
Intercept μT1 90.6±2.8 90.6±2.8 90.6±2.8 90.6±2.8
μT2 94.8±2.8 94.8±2.8 94.6±2.9 94.6±2.9
αq1 68.0±17.4 27.0±98.6 67.9±19.2 14.3±89.8
αq2 66.5±18.2 20.3±100.9 66.2±20.0 8.2±91.9
αm1 12.9±20.2 140.8±44.3 22.4±14.5 134.9±39.9
αm2 17.9±21.2 150.5±45.4 28.0±15.1 145.3±40.9
Regression βq True vit. C intake 0.745±0.189 <0.001 0.721±0.167 <0.001 0.734±0.210 <0.001 0.716±0.181 <0.001
βm True vit. C intake 0.495±0.223 0.026 0.431±0.181 0.017 0.389±0.160 0.015 0.346±0.133 0.009
γ̰q Age (yrs) 0.99±0.84 0.24 0.81±0.77 0.29
Male gender (1=yes/0=no) −6.9±8.4 0.41 −8.0±7.6 0.29
Height (cm) −0.41±0.43 0.34 −0.25±0.39 0.52
BMI (kg/m2) 1.67±0.70 0.018 1.68±0.64 0.009
Current smoking 4.8±11.5 0.68 −0.6±10.5 0.95
Past smoking −2.8±5.6 0.61 −4.8±5.1 0.34
Vitamin C supplement use 3.8±7.2 0.60 0.7±6.6 0.92
γ̰m Age (yrs) −0.96±0.38 0.011 −1.01±0.34 0.003
Male gender (1=yes/0=no) −8.9±3.8 0.018 −10.8±3.4 0.002
Height (cm) −0.22±0.19 0.25 −0.10±0.17 0.56
BMI (kg/m2) −0.74±0.31 0.019 −0.81±0.28 0.005
Current smoking −8.3±5.1 0.11 −10.4±4.6 0.025
Past smoking 0.7±2.5 0.77 0.6±2.3 0.78
Vitamin C supplement use 14.1±3.3 <0.001 13.9±2.9 <0.001
Variance component σS² 647±301 700±309 783±318 834±331
σD² 822±429 736±351 1168±570 1032±462
σT² 853±369 884±356 1076±427 1092±404
σq² 1848±428 1718±380 1652±387 1486±332
σr² 1127±454 998±431 1045±453 904±424
σm² 154±123 77±88 188±90 104±66
σeq² 1313±226 1348±214 967±216 1023±185
σer² 571±230 610±193 480±286 546±235
σem² 129±73 160±50 141±64 166±45
Correlation ρqr 0.61±0.10 0.62±0.10 0.62±0.12 0.61±0.11
ρT 0.52±0.20 0.58±0.17 0.46±0.23 0.53±0.20
Deattenuation factor λRQ 0.404±0.041 0.403±0.041 0.472±0.043 0.471±0.044
(0.324, 0.484) (0.322, 0.483) (0.388, 0.556) (0.385, 0.556)
λTQ 0.175±0.077 0.181±0.075 0.247±0.095 0.255±0.094
(0.073, 0.416)§ (0.081, 0.406)§ (0.116, 0.526)§ (0.124, 0.525)§
*

Based on Equation (8).

Based on Equation (14) after adjusting for the covariates in Table II.

95 per cent CI for λRQ.

§

95 per cent CI for λTQ based on equation (12).

We see that with standard regression calibration, based on raw intake after adjusting for the covariates in Table II, the standard deattenuation factor (λRQ) is 0.403±0.041, 95 per cent CI = (0.322, 0.483). However, upon accounting for possibly correlated error between the FFQ and the DR, the modified deattenuation factor (λTQ) is 0.181±0.075, 95 per cent CI = (0.081, 0.406), which is more extreme than with standard regression calibration. For example, if the uncorrected RR for an exposure of interest is 1.2, the deattenuated RR estimate would be 1.2^(1/0.403) = 1.6 with standard regression calibration and 1.2^(1/0.181) = 2.7 after correction for correlated error with modified regression calibration, which is a substantial difference. The estimated correlation between the systematic error for FFQ and DR intake (ρqr) was 0.62.

After adjusting for calories, both the standard and the modified regression calibration factors increased: λRQ = 0.471±0.044, 95 per cent CI = (0.385, 0.556); λTQ = 0.255±0.094, 95 per cent CI = (0.124, 0.525). The corrected RR estimates corresponding to an uncorrected RR of 1.2 were 1.2^(1/0.471) = 1.5 with standard regression calibration and 1.2^(1/0.255) = 2.0 with modified regression calibration, still a substantial difference. The degree of correlated error remained about the same after caloric adjustment (ρqr = 0.61). Also, both the modified regression calibration factor (λTQ) and the estimated degree of correlated error (ρqr) remained about the same for unadjusted and covariate-adjusted analyses.
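The deattenuation arithmetic in the two preceding paragraphs can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the function name `deattenuate_rr` and the dictionary labels are hypothetical, and the calibration factors are simply the covariate-adjusted values quoted in the text. The corrected RR is the uncorrected RR raised to the power 1/λ, which is equivalent to dividing the log RR by λ.

```python
import math


def deattenuate_rr(rr_uncorrected: float, calibration_factor: float) -> float:
    """Return RR ** (1 / lambda), i.e. exp(ln(RR) / lambda)."""
    if rr_uncorrected <= 0 or calibration_factor <= 0:
        raise ValueError("RR and calibration factor must be positive")
    return math.exp(math.log(rr_uncorrected) / calibration_factor)


# Covariate-adjusted calibration factors quoted in the text for vitamin C intake.
factors = {
    "raw, standard (lambda_RQ)": 0.403,
    "raw, modified (lambda_TQ)": 0.181,
    "calorie-adjusted, standard (lambda_RQ)": 0.471,
    "calorie-adjusted, modified (lambda_TQ)": 0.255,
}

# Corrected RR for an uncorrected RR of 1.2 under each calibration factor.
corrected = {name: deattenuate_rr(1.2, lam) for name, lam in factors.items()}
for name, rr in corrected.items():
    print(f"{name}: corrected RR = {rr:.1f}")
```

Because the calibration factor enters as an exponent, a modest change in λ (for example, from 0.403 to 0.181) produces a large change in the corrected RR.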

We also estimated γ̰q and γ̰m in (14) by using the methods in equation (20) for both raw and calorie-adjusted vitamin C intakes. We see that for calorie-adjusted intake, there was a significant association between BMI and FFQ vitamin C intake even after controlling for true intake (γq = 1.68±0.64, p = 0.009). This implies that heavier people tend to systematically report higher levels of FFQ vitamin C intake than lighter people conditional on true intake. No other covariates were significantly associated with FFQ reported intake conditional on true intake. Regarding plasma vitamin C, there were significant effects of age (γm = −1.01±0.34, p = 0.003), male gender (γm = −10.8±3.4, p = 0.002), BMI (γm = −0.81±0.28, p = 0.005), current smoking (γm = −10.4±4.6, p = 0.025), and vitamin C supplement use (γm = 13.9±2.9, p<0.001). Hence, older individuals, males, heavier individuals, and current smokers had lower levels of plasma vitamin C, whereas vitamin C supplement users had higher levels of plasma vitamin C, conditional on true intake. Results were similar when raw intake was used instead of calorie-adjusted intake.

4. SIMULATION STUDY

We performed simulation studies to assess the bias and coverage probability of our estimator λTQ as given in equations (10) and (12). In addition, we computed the C statistic given by

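$$C=\left\{\sum_{i=1}^{4000}\left[\hat\lambda_{TQ}^{(i)}-\bar\lambda_{TQ}\right]^2/3999\right\}\Big/\left\{\sum_{i=1}^{4000}\widehat{\mathrm{Var}}\left(\hat\lambda_{TQ}^{(i)}\right)/4000\right\}$$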

to assess the validity of the variance estimate of λ̂TQ given in equation (A2). We chose sample sizes of 100 and 350, where the latter sample size approximately mimics the sample size used in our example. For each of the 36 parameter combinations varying ρT, ρqr, and λTQ, we performed 4000 simulations. The detailed simulation study design is given as follows for each of i = 1, … , n subjects:

  1. We generated $q_i$ from an $N(0,\sigma_q^2)$ distribution.

  2. We generated $r_i \mid q_i$ from an $N[\rho_{qr}q_i,\ \sigma_r^2(1-\rho_{qr}^2)]$ distribution.

  3. We generated $m_i$ from an $N(0,\sigma_m^2)$ distribution.

  4. We generated $(T_{i1}, T_{i2})$ from an $N(\mu_T, \Sigma_T)$ distribution, where $\mu_T = (\mu_{T1}, \mu_{T2})$, $\Sigma_{T,11}=\Sigma_{T,22}=\sigma_T^2$, and $\Sigma_{T,12}=\Sigma_{T,21}=\rho_T\sigma_T^2$.

  5. We generated $Q_{ij}$ from an $N(\alpha_{qj}+q_i+\beta_q T_{ij},\ \sigma_{eq}^2)$ distribution; j = 1, 2.

  6. We generated $R_{ij}$ from an $N(r_i+T_{ij},\ \sigma_{er}^2)$ distribution; j = 1, 2.

  7. We generated $M_{ij}$ from an $N(\alpha_{mj}+m_i+\beta_m T_{ij},\ \sigma_{em}^2)$ distribution; j = 1, 2.

  8. We then computed λ̂TQ from equation (10).

  9. Furthermore, we computed the 95 per cent CI for λTQ based on equation (12) and obtained the estimated coverage probability given by the proportion of 95 per cent CIs which included the true value of λTQ.

  10. Finally, we used the C statistic to compare the empirical variance of λ̂TQ over 4000 simulations for each combination of parameters with the theoretical variance of λ̂TQ, given by the average over 4000 simulations of $\widehat{\mathrm{Var}}(\hat\lambda_{TQ}^{(i)})=\hat\lambda_{TQ}^2\,\mathrm{Var}[\ln(\hat\lambda_{TQ})]$, with $\mathrm{Var}[\ln(\hat\lambda_{TQ})]$ as in equation (A2).

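The simulation strategy in steps 1–10 was based on the following parameter values: αq1 = 0, αq2 = 1, βq = βm = 1, μT1 = 100, μT2 = 110, αm1 = 0, αm2 = 1, $\sigma_q^2=\sigma_r^2=\sigma_m^2=\sigma_{eq}^2=\sigma_{er}^2=\sigma_{em}^2=1$, ρT = (0.2,0.5,0.8), ρqr = (0,0.3,0.6,0.9), λTQ = (1/3,2/3,9/10), $\sigma_T^2=2\lambda_{TQ}/(1-\lambda_{TQ})$, and n = (100,350), with 4000 simulations run for each parameter combination. The expression for $\sigma_T^2$ follows because, with these values, $\lambda_{TQ}=\beta_q\sigma_T^2/\mathrm{Var}(Q_{ij})=\sigma_T^2/(\sigma_T^2+2)$. The results are shown in Table V.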
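Steps 1–8 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' SAS implementation: equation (10) is not reproduced in this section, so the estimator is taken to be B12·C23/(B23·C11), the form implied by the Appendix A expression for Var[ln(λ̂TQ)]; the function names `simulate_subjects` and `lambda_tq_hat`, the seed, and the choice of 200 replicates (rather than 4000) are hypothetical. The divisor used in the sample covariances cancels in the ratio.

```python
import math
import random


def cov(x, y):
    """Sample covariance of two equal-length lists (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def simulate_subjects(n, lam_tq, rho_t, rho_qr, rng):
    """Steps 1-7 with unit error variances and beta_q = beta_m = 1.

    Returns Q, R, M, each a pair of lists (visit 1, visit 2).
    """
    sigma_t = math.sqrt(2 * lam_tq / (1 - lam_tq))  # sigma_T^2 = 2*lambda/(1 - lambda)
    mu_t, alpha_q, alpha_m = (100.0, 110.0), (0.0, 1.0), (0.0, 1.0)
    Q, R, M = ([], []), ([], []), ([], [])
    for _ in range(n):
        q = rng.gauss(0.0, 1.0)                                   # step 1
        r = rng.gauss(rho_qr * q, math.sqrt(1.0 - rho_qr ** 2))   # step 2
        m = rng.gauss(0.0, 1.0)                                   # step 3
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)         # step 4
        t = (mu_t[0] + sigma_t * z1,
             mu_t[1] + sigma_t * (rho_t * z1 + math.sqrt(1.0 - rho_t ** 2) * z2))
        for j in range(2):
            Q[j].append(rng.gauss(alpha_q[j] + q + t[j], 1.0))    # step 5
            R[j].append(rng.gauss(r + t[j], 1.0))                 # step 6
            M[j].append(rng.gauss(alpha_m[j] + m + t[j], 1.0))    # step 7
    return Q, R, M


def lambda_tq_hat(Q, R, M):
    """Step 8, read here as B12 * C23 / (B23 * C11) in the Appendix A notation."""
    qd = [b - a for a, b in zip(Q[0], Q[1])]
    rd = [b - a for a, b in zip(R[0], R[1])]
    md = [b - a for a, b in zip(M[0], M[1])]
    b12, b23 = cov(qd, rd), cov(rd, md)
    c23 = (cov(R[0], M[0]) + cov(R[1], M[1])) / 2
    c11 = (cov(Q[0], Q[0]) + cov(Q[1], Q[1])) / 2
    return b12 * c23 / (b23 * c11)


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
true_lambda = 2 / 3
estimates = [
    lambda_tq_hat(*simulate_subjects(350, true_lambda, 0.5, 0.6, rng))
    for _ in range(200)  # far fewer replicates than the 4000 used in the paper
]
mean_estimate = sum(estimates) / len(estimates)
print(f"mean of lambda_TQ estimates: {mean_estimate:.3f} (true value {true_lambda:.3f})")
```

Note that ρqr enters only through the person-level errors qi and ri, which cancel in the within-person differences; this is why the estimator remains approximately unbiased as ρqr varies.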

Table V.

Simulation results for modified regression calibration approach, 4000 simulations per design.

(a) n=350

| λTQ | ρT | ρqr=0: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.3: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.6: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.9: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.2 | 0.900±0.029 (0.798,1.005) | 1.03 (4000) | 94.3 | 0.900±0.027 (0.804,1.001) | 1.02 (4000) | 94.6 | 0.900±0.025 (0.810,0.998) | 1.01 (4000) | 94.9 | 0.900±0.023 (0.817,0.991) | 1.00 (4000) | 94.9 |
| | 0.5 | 0.900±0.034 (0.786,1.025) | 1.04 (4000) | 94.2 | 0.900±0.032 (0.792,1.023) | 1.04 (4000) | 94.3 | 0.900±0.031 (0.799,1.019) | 1.03 (4000) | 94.3 | 0.900±0.029 (0.807,1.012) | 1.02 (4000) | 94.7 |
| | 0.8 | 0.900±0.049 (0.728,1.080) | 1.06 (4000) | 94.2 | 0.900±0.048 (0.732,1.072) | 1.05 (4000) | 94.2 | 0.900±0.046 (0.738,1.065) | 1.05 (4000) | 94.3 | 0.900±0.045 (0.746,1.065) | 1.05 (4000) | 94.5 |
| 0.667 | 0.2 | 0.667±0.046 (0.515,0.865) | 1.03 (4000) | 94.5 | 0.667±0.044 (0.522,0.862) | 1.02 (4000) | 94.6 | 0.667±0.042 (0.529,0.855) | 1.01 (4000) | 94.5 | 0.667±0.040 (0.536,0.842) | 1.00 (4000) | 94.9 |
| | 0.5 | 0.668±0.056 (0.483,0.920) | 1.03 (4000) | 94.3 | 0.668±0.054 (0.498,0.917) | 1.03 (4000) | 94.4 | 0.668±0.052 (0.513,0.910) | 1.02 (4000) | 94.5 | 0.668±0.050 (0.513,0.896) | 1.01 (4000) | 94.7 |
| | 0.8 | 0.671±0.095 (0.390,1.084) | 1.03 (4000) | 95.0 | 0.671±0.093 (0.395,1.078) | 1.02 (4000) | 94.8 | 0.671±0.092 (0.399,1.089) | 1.02 (4000) | 95.0 | 0.671±0.090 (0.399,1.401) | 1.01 (4000) | 95.0 |
| 0.333 | 0.2 | 0.335±0.054 (0.160,0.577) | 1.01 (4000) | 95.1 | 0.335±0.053 (0.177,0.575) | 1.00 (4000) | 95.3 | 0.335±0.052 (0.185,0.567) | 0.99 (4000) | 95.5 | 0.335±0.052 (0.190,0.550) | 0.99 (4000) | 95.6 |
| | 0.5 | 0.338±0.072 (0.125,0.699) | 0.98 (4000) | 95.2 | 0.338±0.071 (0.139,0.688) | 0.98 (4000) | 95.5 | 0.338±0.070 (0.156,0.691) | 0.98 (4000) | 95.5 | 0.338±0.069 (0.150,0.708) | 0.97 (4000) | 95.8 |
| | 0.8* | 0.360±0.150 (0.111,0.986) | 0.55 (3844) | 99.0 | 0.359±0.149 (0.111,0.992) | 0.55 (3845) | 99.1 | 0.360±0.148 (0.111,0.999) | 0.54 (3840) | 99.0 | 0.360±0.148 (0.111,0.989) | 0.54 (3839) | 98.9 |

(b) n=100

| λTQ | ρT | ρqr=0: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.3: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.6: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) | ρqr=0.9: Mean±s.d. (range) | C (NSIM) | Coverage (per cent) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.2 | 0.901±0.053 (0.699,1.093) | 0.98 (4000) | 94.0 | 0.901±0.050 (0.709,1.080) | 0.97 (4000) | 94.0 | 0.901±0.047 (0.720,1.062) | 0.97 (4000) | 93.8 | 0.901±0.044 (0.736,1.044) | 0.95 (4000) | 94.1 |
| | 0.5 | 0.901±0.063 (0.679,1.159) | 1.01 (4000) | 93.9 | 0.901±0.060 (0.689,1.144) | 1.01 (4000) | 93.9 | 0.901±0.057 (0.701,1.124) | 1.01 (4000) | 94.0 | 0.901±0.054 (0.708,1.095) | 1.00 (4000) | 94.2 |
| | 0.8 | 0.903±0.091 (0.615,1.306) | 1.03 (4000) | 94.0 | 0.903±0.089 (0.621,1.289) | 1.03 (4000) | 93.9 | 0.902±0.086 (0.631,1.278) | 1.03 (4000) | 94.1 | 0.902±0.083 (0.630,1.275) | 1.03 (4000) | 94.4 |
| 0.667 | 0.2 | 0.669±0.086 (0.394,1.044) | 1.03 (4000) | 94.2 | 0.669±0.083 (0.396,1.022) | 1.03 (4000) | 94.2 | 0.669±0.080 (0.394,0.987) | 1.03 (4000) | 94.2 | 0.668±0.076 (0.381,0.991) | 1.03 (4000) | 94.1 |
| | 0.5 | 0.671±0.107 (0.348,1.206) | 1.03 (4000) | 94.1 | 0.671±0.104 (0.338,1.178) | 1.03 (4000) | 94.3 | 0.670±0.100 (0.324,1.135) | 1.03 (4000) | 94.4 | 0.670±0.096 (0.309,1.083) | 1.03 (4000) | 94.4 |
| | 0.8 | 0.685±0.190 (0.088,2.084) | 0.96 (4000) | 95.5 | 0.684±0.188 (0.085,2.094) | 0.96 (4000) | 95.5 | 0.683±0.185 (0.082,2.107) | 0.95 (4000) | 95.9 | 0.683±0.182 (0.078,2.125) | 0.95 (4000) | 95.9 |
| 0.333 | 0.2 | 0.340±0.108 (0.025,0.969) | 0.99 (4000) | 94.7 | 0.340±0.107 (0.022,1.050) | 1.00 (4000) | 94.9 | 0.339±0.106 (0.020,1.125) | 1.01 (4000) | 95.1 | 0.339±0.104 (0.017,1.186) | 1.00 (4000) | 94.8 |
| | 0.5 | 0.355±0.182 (0.022,5.035) | 0.15 (3999) | 96.4 | 0.354±0.184 (0.022,5.447) | 0.13 (3999) | 96.6 | 0.354±0.186 (0.022,5.830) | 0.12 (3999) | 96.7 | 0.354±0.188 (0.015,6.16) | 0.11 (3999) | 96.8 |
| | 0.8* | 0.382±0.207 (0.111,0.999) | 0.26 (2917) | 99.4 | 0.381±0.204 (0.111,0.999) | 0.26 (2904) | 99.8 | 0.382±0.204 (0.112,0.999) | 0.26 (2908) | 99.5 | 0.382±0.204 (0.111,0.999) | 0.24 (2913) | 99.6 |

* Restricted to λ̂TQ ≥ 1/9 and λ̂TQ ≤ 1.0.

In the case of n = 350 (Table V(a)), for 32 of the 36 designs (first eight rows of Table V(a)), the bias is minimal. The C statistic ranges from 0.97 to 1.06 and the coverage probability ranges from 94.2 to 95.8 per cent, compared with a nominal level of 95 per cent. The one exception is the case where λTQ = 1/3 and ρT = 0.8 (9th row of Table V(a)), where both the point estimate λ̂TQ and its associated variance Var(λ̂TQ) become large if ρ̂T is close to 1. This results in a slightly biased estimate of λTQ (mean λ̂TQ from 0.359 to 0.360, compared with a true value of 0.333) and wide confidence limits (coverage probability from 98.9 to 99.1 per cent). To reduce variation, we restricted the range of λ̂TQ to the interval (1/9, 1.0), which was satisfied in 96 per cent of simulations. This reduced the problem but did not eliminate it. It is likely that a larger sample size for a validation study is needed to accurately estimate λTQ in this particular setting; alternatively, one can bootstrap instead of using the large sample confidence limits in equation (12). In our example, λ̂TQ was 0.25 and ρ̂T was 0.53, which is less extreme than the above aberrant situation.

In the case of n = 100 (Table V(b)), the coverage probability ranges from 93.8 to 95.9 per cent and the C statistic ranges from 0.95 to 1.03 in the first 7 rows of the table. The procedure behaves badly in the extreme case where λTQ = 0.333 and ρT = 0.5–0.8, with coverage probabilities that are too large. The number of simulations for particular parameter combinations is sometimes <4000 due to negative variance estimates for log λTQ in equation (A2) for some simulated samples, particularly for n = 100.
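The two summary measures reported in Table V, the C statistic and the coverage probability, are simple to compute once the per-replicate estimates, variance estimates, and confidence intervals are available. The sketch below is illustrative only; the function names `c_statistic` and `coverage_percent` and the four-replicate toy inputs are hypothetical and are not taken from the simulation study.

```python
def c_statistic(estimates, variance_estimates):
    """Empirical variance of the estimates (divisor N - 1) divided by the
    average of the per-replicate theoretical variance estimates."""
    n_sim = len(estimates)
    mean_est = sum(estimates) / n_sim
    empirical_var = sum((e - mean_est) ** 2 for e in estimates) / (n_sim - 1)
    mean_theoretical_var = sum(variance_estimates) / n_sim
    return empirical_var / mean_theoretical_var


def coverage_percent(intervals, true_value):
    """Percentage of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return 100.0 * hits / len(intervals)


# Toy inputs (hypothetical): four replicates with a true value of 2/3.
toy_estimates = [0.60, 0.70, 0.65, 0.75]
toy_variances = [0.004, 0.004, 0.004, 0.004]
toy_intervals = [(0.5, 0.8), (0.7, 0.9), (0.4, 0.6), (0.6, 0.7)]

c_demo = c_statistic(toy_estimates, toy_variances)
coverage_demo = coverage_percent(toy_intervals, 2 / 3)
print(f"C = {c_demo:.2f}, coverage = {coverage_demo:.1f} per cent")
```

A value of C near 1 indicates that the large-sample variance formula tracks the empirical variability of the estimator; values well below 1, as in the last rows of Table V, indicate that the formula overstates it.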

5. DISCUSSION

We have presented an extension of the standard regression calibration model that allows for the presence of correlated error between a surrogate instrument (Q) and a gold standard instrument (R). Fitting this model requires longitudinal data for Q, R, and a biomarker (M) over a comparable time period t that is sufficiently long so that a meaningful change in dietary intake is possible, which is correlated, albeit imperfectly, with a change in the associated biomarker. A notable feature of this approach is that possible between-person variation in the biomarker (mi) among people with the same dietary intake is accounted for, but is assumed to be uncorrelated with the systematic error in Q(qi) and R(ri). In addition, true intake (Tij) for individual subjects is allowed to change over time. Furthermore, since changes in other covariates (Z) may influence changes in Q, R, and M, an extension of the approach is presented, which allows one to control for changes in one or more covariates (Z). Maximum likelihood estimates of model parameters can be obtained with standard software. A formula for the standard error of the modified regression calibration factor (λTQ) is given in Appendix A (SAS macro available at the following website http://www.geocities.com/bernardrosner/Channing.html, which provides estimates and standard errors of all model parameters).

We applied these methods to the assessment of measurement error in dietary vitamin C intake among 323 subjects in the EPIC-Norfolk study, who provided dietary vitamin C intake data from both the FFQ and a 7-day DR as well as a plasma vitamin C sample on two occasions 4 years apart. Results from these analyses revealed substantial correlated error between the FFQ and the DR (ρqr ≅ 0.61). Thus, with an uncorrected calorie-adjusted RR of 1.2, we obtain a measurement error corrected RR of 1.5 and 2.0 using the standard and modified regression calibration approaches, a substantial difference. We also performed an extensive simulation study, which indicated that for most parameter combinations, the estimator of λTQ in equation (10) and the corresponding large sample confidence limits in equation (12) performed well based on validation study sample sizes of 350 and 100 subjects. For some extreme designs, coverage probabilities were sometimes slightly larger than 0.95, resulting in somewhat conservative inferences. In this simulation study, the proportions of variation due to random error in the FFQ (Var(eQij)/Var(Qij)), DR (Var(eRij)/Var(Rij)), and biomarker (Var(eMij)/Var(Mij)) were all fixed at 1/3. These proportions were similar to the observed proportions in the EPIC-Norfolk study data (0.33, 0.21, and 0.41, respectively) based on calorie-adjusted intake. Additional simulations could be performed with varying proportions due to random error to assess the quality of the estimator λ̂TQ under different conditions.

In the EPIC-Norfolk data set, plasma vitamin C was much more highly correlated with calorie-adjusted vitamin C intake from the DR than with calorie-adjusted vitamin C from the FFQ, both cross-sectionally and longitudinally. However, the FFQ estimates average intake over the past year, whereas the DR estimates intake over 1 week. Since the plasma vitamin C was obtained at about the same time as the DR, this may explain why it was more closely correlated with the DR than with the FFQ. We also looked at the correlation between plasma vitamin C at baseline vs each of the calorie-adjusted FFQ intakes at year 4 (ρ = 0.19) and calorie-adjusted DR intakes at year 4 (ρ = 0.26) (data not shown). The difference between these correlations appears narrower than the corresponding baseline cross-sectional correlations (FFQ baseline intake vs plasma vitamin C at baseline, ρ = 0.25; DR baseline intake vs plasma vitamin C at baseline, ρ = 0.40), reflecting the point that the FFQ estimates intake over a longer period of time and suggesting that the FFQ and DR may have similar validity as measures of long-term intake. Because the DR and biomarker were collected in close proximity both at baseline and at year 4, this would also tend to overstate the validity of change in vitamin C intake assessed by DR relative to change assessed by FFQ.

An assumption of the model in equations (8) and (14) is that the random effects qi, ri, and mi remain the same over time for each individual. Hence, errors in the estimates of changes in Q and R are assumed to be independent conditional on the change in true intake (equation (9)) and also change in other covariates (equation (17)). This assumption could be examined if independent information were available on one of the parameters, for example, βm, from a separate calibration experiment. This assumption is more likely to hold if the time interval between repeat measurements is short, but sufficiently long, so that true change in diet is possible. Of course, other covariates (Zij) may also change over time and may be associated with qi, ri, and mi in equation (8). However, the ability to control for change in (Zij) (equation (14)) makes the interpretation of qi, ri, and mi to be conditional on (Zij) and makes the assumption of homogeneity over time more reasonable.

In addition, we assume that Var(Tij) remains constant over time, while allowing E(Tij) to vary. If the former assumption is relaxed, one obtains separate regression calibration factors at visits 1 and 2 (λTQ,1, λTQ,2), which can be estimated by

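$$
\begin{aligned}
\hat\lambda_{TQ,1} &= \widehat{\mathrm{Cov}}(M_{i1},R_{i1})\,\widehat{\mathrm{Cov}}(Q_{di},R_{di})\big/\big[\widehat{\mathrm{Cov}}(M_{di},R_{di})\,\widehat{\mathrm{Var}}(Q_{i1})\big]\\
\hat\lambda_{TQ,2} &= \widehat{\mathrm{Cov}}(M_{i2},R_{i2})\,\widehat{\mathrm{Cov}}(Q_{di},R_{di})\big/\big[\widehat{\mathrm{Cov}}(M_{di},R_{di})\,\widehat{\mathrm{Var}}(Q_{i2})\big]
\end{aligned}
$$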

The delta method can also be used to obtain confidence limits for λTQ,1 and λTQ,2 using methods similar to those given in Appendix A. A possible future extension might test the homogeneity of λTQ at different visits. In the EPIC data set, Var(Qij), Var(Rij), and Var(Mij) remained relatively constant over time (Table I). Furthermore, based on the EPIC data, we have λ̂TQ,1 = 0.281±0.103 (95 per cent CI = 0.137, 0.578) and λ̂TQ,2 = 0.225±0.076 (95 per cent CI = 0.116, 0.436) for calorie-adjusted intake, indicating relative homogeneity of λTQ over the two visits. Finally, previous literature should be explored to ensure that all relevant confounders are included in Zij in (14).
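The visit-specific estimators above differ from the pooled estimator only in using the covariance of M and R, and the variance of Q, at a single visit. A minimal sketch follows, assuming the data are held as pairs of per-visit lists; the function name `visit_specific_lambda` and the simulated data set (in which Var(Tij) is deliberately set to 4 at visit 1 and 9 at visit 2) are hypothetical and serve only to show the two factors diverging when the variance of true intake changes.

```python
import math
import random


def cov(x, y):
    """Sample covariance of two equal-length lists (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def visit_specific_lambda(Q, R, M):
    """Return (lambda_TQ_1, lambda_TQ_2); Q, R, M are pairs of per-visit lists."""
    qd = [b - a for a, b in zip(Q[0], Q[1])]
    rd = [b - a for a, b in zip(R[0], R[1])]
    md = [b - a for a, b in zip(M[0], M[1])]
    ratio = cov(qd, rd) / cov(md, rd)  # shared by both visits
    return tuple(ratio * cov(M[j], R[j]) / cov(Q[j], Q[j]) for j in range(2))


# Hypothetical data: beta_q = beta_m = 1, unit error variances, Corr(T1, T2) = 0.5.
rng = random.Random(1)
Q, R, M = ([], []), ([], []), ([], [])
for _ in range(4000):
    q = rng.gauss(0, 1)
    r = 0.6 * q + rng.gauss(0, 0.8)  # person-level errors with Corr(q, r) = 0.6
    m = rng.gauss(0, 1)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    t = (100 + 2 * z1, 110 + 3 * (0.5 * z1 + math.sqrt(0.75) * z2))
    for j in range(2):
        Q[j].append(q + t[j] + rng.gauss(0, 1))
        R[j].append(r + t[j] + rng.gauss(0, 1))
        M[j].append(m + t[j] + rng.gauss(0, 1))

lam1, lam2 = visit_specific_lambda(Q, R, M)
print(f"lambda_TQ,1 = {lam1:.3f}, lambda_TQ,2 = {lam2:.3f}")
```

Under these assumed values the population factors are 4/6 at visit 1 and 9/11 at visit 2, so the two estimates should differ noticeably, in contrast to the near-homogeneous EPIC estimates quoted above.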

The traditional goal of regression calibration is to obtain the regression coefficient of true intake (Tij) on surrogate intake (Qij) based on corresponding dietary assessments at one point in time. The example we used should be interpreted as providing estimates when true intake is conceptually relatively short term and biomarkers and nutrient intake are assessed at approximately the same time. However, since cumulative intake over long periods of time is likely to be more strongly associated with some diseases of interest, we should also consider the regression coefficient of μTi on Qij, where μTi is the true intake for subject i over a long period of time. Estimating this regression coefficient requires either more than two repeated measures or making some assumptions regarding the time series structure of true intake [i.e. Corr(Ti1, Ti2), |Ti1Ti2| = t]. If one assumes a first-order autoregressive model for Tij, one can extend equation (8) to estimate this long-term regression coefficient. Similarly, since an average of several FFQs over a long period of time is likely to provide a closer approximation to true intake than a single FFQ, one can also consider Corr(μQi, μ Ti), where μQi is the average FFQ intake over long periods of time. These extensions to measurement error correction of long-term intake are a subject for future work.

ACKNOWLEDGEMENTS

We acknowledge the support of the National Cancer Institute CA50597 in performing this work.

APPENDIX A: CONFIDENCE LIMITS FOR THE ALTERNATIVE REGRESSION CALIBRATION FACTOR (λTQ) AND THE REGRESSION COEFFICIENTS βq AND βm

Since the sampling distribution of λ̂TQ is likely to be skewed in small samples, we will consider

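$$\mathrm{Var}[\ln(\hat\lambda_{TQ})]=\mathrm{Var}\big\{\ln[\widehat{\mathrm{Cov}}(Q_{di},R_{di})]+\ln[\widehat{\mathrm{Cov}}(M_{ij},R_{ij})]-\ln[\widehat{\mathrm{Cov}}(M_{di},R_{di})]-\ln[\widehat{\mathrm{Var}}(Q_{ij})]\big\} \tag{A1}$$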

It will be advantageous for the evaluation of equation (A1) as well as the estimation of variances for the other parameters in Appendix B to define Xijk as the value of the kth variable at time j for the ith subject, where i = 1, … , n, j = 1, 2, and k = 1,2,3 denote Q, R, and M, respectively, and let

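$$\bar X_{ik}=(X_{i1k}+X_{i2k})/2,\quad X_{idk}=X_{i2k}-X_{i1k},\quad i=1,\ldots,n,\quad k=1,\ldots,3$$

$$\bar X_k=\sum_{i=1}^n \bar X_{ik}/n,\ k=1,\ldots,3;\quad \bar X_{dk}=\sum_{i=1}^n X_{idk}/n,\ k=1,\ldots,3;\quad \bar X_{jk}=\sum_{i=1}^n X_{ijk}/n,\ j=1,2,\ k=1,\ldots,3.$$

We also define

$$A_{kl}=\mathrm{Cov}(\bar X_{ik},\bar X_{il}),\quad B_{kl}=\mathrm{Cov}(X_{idk},X_{idl})$$

and

$$C_{kl}=[\mathrm{Cov}(X_{i1k},X_{i1l})+\mathrm{Cov}(X_{i2k},X_{i2l})]/2,\quad k,l=1,\ldots,3$$

We will see that variances of λ̂TQ as well as all the parameters in Appendix B can be expressed in terms of $\mathrm{Cov}(A_{k_1l_1},A_{k_2l_2})$, $\mathrm{Cov}(B_{k_1l_1},B_{k_2l_2})$, $\mathrm{Cov}(C_{k_1l_1},C_{k_2l_2})$, and $\mathrm{Cov}(A_{k_1l_1},C_{k_2l_2})$, $k_1, k_2, l_1, l_2 = 1,\ldots,3$. We note that $\mathrm{Cov}(A_{k_1l_1},B_{k_2l_2})=0$ and $\mathrm{Cov}(B_{k_1l_1},C_{k_2l_2})=0$ for $k_1, k_2, l_1, l_2 = 1,\ldots,3$.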

We then can express equation (A1) in the form: Var[ln(λ̂TQ)] = Var[ln(B12) + ln(C23) − ln(B23) − ln(C11)] which upon using the delta method is given by

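$$\mathrm{Var}[\ln(\hat\lambda_{TQ})]=\frac{\mathrm{Var}(B_{12})}{B_{12}^2}+\frac{\mathrm{Var}(C_{23})}{C_{23}^2}+\frac{\mathrm{Var}(B_{23})}{B_{23}^2}+\frac{\mathrm{Var}(C_{11})}{C_{11}^2}-\frac{2\,\mathrm{Cov}(B_{12},B_{23})}{B_{12}B_{23}}-\frac{2\,\mathrm{Cov}(C_{11},C_{23})}{C_{11}C_{23}} \tag{A2}$$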

We have upon some algebra that

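$$
\begin{aligned}
\mathrm{Cov}(A_{k_1l_1},A_{k_2l_2}) &= \Big[\sum_{i=1}^n(\bar X_{ik_1}-\bar X_{k_1})(\bar X_{ik_2}-\bar X_{k_2})(\bar X_{il_1}-\bar X_{l_1})(\bar X_{il_2}-\bar X_{l_2})/n-A_{k_1l_1}A_{k_2l_2}\Big]\Big/n\\
\mathrm{Cov}(B_{k_1l_1},B_{k_2l_2}) &= \Big[\sum_{i=1}^n(X_{idk_1}-\bar X_{dk_1})(X_{idk_2}-\bar X_{dk_2})(X_{idl_1}-\bar X_{dl_1})(X_{idl_2}-\bar X_{dl_2})/n-B_{k_1l_1}B_{k_2l_2}\Big]\Big/n\\
\mathrm{Cov}(C_{k_1l_1},C_{k_2l_2}) &= \Big[\sum_{i=1}^n\Big\{\sum_{j=1}^2(X_{ijk_1}-\bar X_{jk_1})(X_{ijl_1}-\bar X_{jl_1})\Big\}\Big\{\sum_{j=1}^2(X_{ijk_2}-\bar X_{jk_2})(X_{ijl_2}-\bar X_{jl_2})\Big\}/(4n)-C_{k_1l_1}C_{k_2l_2}\Big]\Big/n\\
\mathrm{Cov}(A_{k_1l_1},C_{k_2l_2}) &= \Big[\sum_{i=1}^n(\bar X_{ik_1}-\bar X_{k_1})(\bar X_{il_1}-\bar X_{l_1})\Big\{\sum_{j=1}^2(X_{ijk_2}-\bar X_{jk_2})(X_{ijl_2}-\bar X_{jl_2})\Big\}/(2n)-A_{k_1l_1}C_{k_2l_2}\Big]\Big/n
\end{aligned}
\tag{A3}
$$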

Upon combining (A2) and (A3) we obtain Var[ln(λ̂TQ)]. To obtain a 100 per cent × (1 − α) CI for λTQ, we compute [exp(c1), exp(c2)], where $(c_1, c_2)=\ln(\hat\lambda_{TQ})\pm z_{1-\alpha/2}\{\mathrm{Var}[\ln(\hat\lambda_{TQ})]\}^{1/2}$ and $z_p$ = pth percentile of a standard normal distribution.
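Equations (A2) and (A3) and the log-scale confidence limits can be sketched in standard-library Python as follows. This is a hedged illustration rather than the authors' SAS macro: the function names are hypothetical, the estimator is written as B12·C23/(B23·C11) as in the expression preceding (A2), and the simulated data set is an arbitrary example. Each covariance of sample moments in (A3) has the same pattern, namely the mean of the product of the per-subject terms minus the product of their means, divided by n, so a single helper covers all cases needed for (A2).

```python
import math
import random
from statistics import NormalDist


def _centered(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _mean(x):
    return sum(x) / len(x)


def _cov_moments(p, q):
    """(A3) pattern: [sum_i p_i q_i / n - P * Q] / n, with P, Q the means of p, q."""
    return (_mean([a * b for a, b in zip(p, q)]) - _mean(p) * _mean(q)) / len(p)


def ci_from_log_variance(lam_hat, var_ln, alpha=0.05):
    """[exp(c1), exp(c2)] with (c1, c2) = ln(lam_hat) -/+ z * sqrt(var_ln)."""
    if var_ln <= 0:
        raise ValueError("non-positive variance estimate for ln(lambda_TQ)")
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_ln)
    return lam_hat * math.exp(-half_width), lam_hat * math.exp(half_width)


def lambda_tq_with_var(Q, R, M):
    """Return (lambda_hat, Var[ln lambda_hat]) following (A2) and (A3)."""
    qd = _centered([b - a for a, b in zip(Q[0], Q[1])])
    rd = _centered([b - a for a, b in zip(R[0], R[1])])
    md = _centered([b - a for a, b in zip(M[0], M[1])])
    qc, rc, mc = ([_centered(v) for v in X] for X in (Q, R, M))
    # Per-subject terms whose means are B12, B23, C23 and C11.
    b12 = [a * b for a, b in zip(qd, rd)]
    b23 = [a * b for a, b in zip(rd, md)]
    c23 = [(r1 * m1 + r2 * m2) / 2 for r1, m1, r2, m2 in zip(rc[0], mc[0], rc[1], mc[1])]
    c11 = [(q1 * q1 + q2 * q2) / 2 for q1, q2 in zip(qc[0], qc[1])]
    B12, B23, C23, C11 = (_mean(v) for v in (b12, b23, c23, c11))
    var_ln = (
        _cov_moments(b12, b12) / B12 ** 2 + _cov_moments(c23, c23) / C23 ** 2
        + _cov_moments(b23, b23) / B23 ** 2 + _cov_moments(c11, c11) / C11 ** 2
        - 2 * _cov_moments(b12, b23) / (B12 * B23)
        - 2 * _cov_moments(c11, c23) / (C11 * C23)
    )
    return B12 * C23 / (B23 * C11), var_ln


# Check against the reported raw, covariate-adjusted result (0.181 +/- 0.075),
# using Var[ln(lambda)] = Var(lambda) / lambda^2 from step 10 of the simulation design.
table_ci = ci_from_log_variance(0.181, (0.075 / 0.181) ** 2)

# Hypothetical simulated data: unit error variances, Var(T) = 4, Corr(T1, T2) = 0.5.
rng = random.Random(0)
Q, R, M = ([], []), ([], []), ([], [])
for _ in range(350):
    q, r, m = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    t1 = rng.gauss(100, 2)
    t2 = 110 + 0.5 * (t1 - 100) + rng.gauss(0, math.sqrt(3))
    for j, t in enumerate((t1, t2)):
        Q[j].append(q + t + rng.gauss(0, 1))
        R[j].append(r + t + rng.gauss(0, 1))
        M[j].append(m + t + rng.gauss(0, 1))

lam_hat, var_ln = lambda_tq_with_var(Q, R, M)
sim_ci = ci_from_log_variance(lam_hat, var_ln)
print(f"table check: ({table_ci[0]:.3f}, {table_ci[1]:.3f})")
print(f"simulated: lambda = {lam_hat:.3f}, CI = ({sim_ci[0]:.3f}, {sim_ci[1]:.3f})")
```

The interval recomputed from the rounded table values should agree with the reported (0.081, 0.406) up to rounding of the inputs. The explicit check for a non-positive variance mirrors the simulation study, where replicates with negative variance estimates for ln(λ̂TQ) were excluded.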

In addition, we can obtain standard errors and CIs for each of the estimated parameters in Appendix B using the delta method as follows:

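$$\mathrm{Var}(\hat\sigma_S^2)=\hat\sigma_S^4\left[\frac{4\,\mathrm{Var}(A_{23})}{A_{23}^2}+\frac{\mathrm{Var}(B_{12})}{B_{12}^2}+\frac{\mathrm{Var}(A_{13})}{A_{13}^2}+\frac{\mathrm{Var}(B_{23})}{B_{23}^2}-\frac{4\,\mathrm{Cov}(A_{13},A_{23})}{A_{13}A_{23}}-\frac{2\,\mathrm{Cov}(B_{12},B_{23})}{B_{12}B_{23}}\right] \tag{A4}$$

$$\mathrm{Var}(\hat\sigma_D^2)=\hat\sigma_D^4\left[\frac{\mathrm{Var}(B_{12})}{B_{12}^2}+\frac{\mathrm{Var}(B_{23})}{B_{23}^2}+\frac{\mathrm{Var}(B_{13})}{B_{13}^2}+\frac{2\,\mathrm{Cov}(B_{12},B_{23})}{B_{12}B_{23}}-\frac{2\,\mathrm{Cov}(B_{12},B_{13})}{B_{12}B_{13}}-\frac{2\,\mathrm{Cov}(B_{13},B_{23})}{B_{13}B_{23}}\right] \tag{A5}$$

$$\mathrm{Var}(\hat\sigma_T^2)=\mathrm{Var}(\hat\sigma_S^2)+\mathrm{Var}(\hat\sigma_D^2)/16 \tag{A6}$$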

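For the remaining variance estimates, it will be useful to assess $\mathrm{Cov}(A_{kl},\sigma_T^2)$, $\mathrm{Cov}(B_{kl},\sigma_T^2)$, and $\mathrm{Cov}(C_{kl},\sigma_T^2)$. We have upon using the delta method that

$$\mathrm{Cov}(A_{kl},\sigma_T^2)=\mathrm{Cov}(A_{kl},\sigma_S^2)=\sigma_S^2\left[2\,\mathrm{Cov}(A_{kl},A_{23})/A_{23}-\mathrm{Cov}(A_{kl},A_{13})/A_{13}\right] \tag{A7}$$

$$\mathrm{Cov}(B_{kl},\sigma_T^2)=\mathrm{Cov}(B_{kl},\sigma_S^2)+\mathrm{Cov}(B_{kl},\sigma_D^2)/4 \tag{A7a}$$

where

$$\mathrm{Cov}(B_{kl},\sigma_S^2)=\sigma_S^2\left[\mathrm{Cov}(B_{kl},B_{12})/B_{12}-\mathrm{Cov}(B_{kl},B_{23})/B_{23}\right] \tag{A7b}$$

$$\mathrm{Cov}(B_{kl},\sigma_D^2)=\sigma_D^2\left[\mathrm{Cov}(B_{kl},B_{12})/B_{12}+\mathrm{Cov}(B_{kl},B_{23})/B_{23}-\mathrm{Cov}(B_{kl},B_{13})/B_{13}\right] \tag{A7c}$$

$$\mathrm{Cov}(C_{kl},\sigma_T^2)=\mathrm{Cov}(C_{kl},\sigma_S^2)=\sigma_S^2\left[2\,\mathrm{Cov}(C_{kl},A_{23})/A_{23}-\mathrm{Cov}(C_{kl},A_{13})/A_{13}\right] \tag{A7d}$$

$$\mathrm{Cov}(C_{kl},\sigma_D^2)=\mathrm{Cov}(A_{kl},\sigma_D^2)=0$$

$$\mathrm{Var}(\hat\beta_q)=\hat\beta_q^2\left[\frac{\mathrm{Var}(B_{12})}{B_{12}^2}+\frac{\mathrm{Var}(C_{23})}{C_{23}^2}+\frac{\mathrm{Var}(B_{23})}{B_{23}^2}+\frac{\mathrm{Var}(\hat\sigma_T^2)}{\hat\sigma_T^4}-\frac{2\,\mathrm{Cov}(B_{12},B_{23})}{B_{12}B_{23}}-\frac{2\,\mathrm{Cov}(B_{12},\hat\sigma_T^2)}{B_{12}\hat\sigma_T^2}-\frac{2\,\mathrm{Cov}(C_{23},\hat\sigma_T^2)}{C_{23}\hat\sigma_T^2}+\frac{2\,\mathrm{Cov}(B_{23},\hat\sigma_T^2)}{B_{23}\hat\sigma_T^2}\right] \tag{A8}$$

$$\mathrm{Var}(\hat\beta_m)=\hat\beta_m^2\left[\frac{\mathrm{Var}(C_{23})}{C_{23}^2}+\frac{\mathrm{Var}(\hat\sigma_T^2)}{\hat\sigma_T^4}-\frac{2\,\mathrm{Cov}(C_{23},\hat\sigma_T^2)}{C_{23}\hat\sigma_T^2}\right] \tag{A9}$$

where $\mathrm{Var}(\hat\sigma_T^2)$ and $\mathrm{Cov}(C_{23},\hat\sigma_T^2)$ are given in (A6) and (A7d), respectively.

The variance of the remaining estimated parameters in Appendix B can also be obtained using the delta method, similar to equations (A4)–(A9).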

APPENDIX B: MLES OF THE PARAMETERS FOR THE MODEL IN EQUATIONS (8) AND (14)

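$$
\begin{aligned}
\hat\sigma_S^2 &= A_{23}^2B_{12}/(A_{13}B_{23})\\
\hat\sigma_D^2 &= B_{12}B_{23}/B_{13}\\
\hat\sigma_T^2 &= \widehat{\mathrm{Var}}(T_{ij})=\hat\sigma_S^2+\hat\sigma_D^2/4\\
\hat\beta_q &= B_{12}\,\widehat{\mathrm{Cov}}(M_{ij},R_{ij})/(B_{23}\hat\sigma_T^2)\\
\hat\beta_m &= \widehat{\mathrm{Cov}}(M_{ij},R_{ij})/\hat\sigma_T^2\\
\hat\sigma_{eq}^2 &= (B_{11}-\hat\beta_q^2\hat\sigma_D^2)/2\\
\hat\sigma_{er}^2 &= (B_{22}-\hat\sigma_D^2)/2\\
\hat\sigma_{em}^2 &= (B_{33}-\hat\beta_m^2\hat\sigma_D^2)/2\\
\hat\sigma_q^2 &= A_{11}-\hat\beta_q^2\hat\sigma_S^2-\hat\sigma_{eq}^2/2\\
\hat\sigma_r^2 &= A_{22}-\hat\sigma_S^2-\hat\sigma_{er}^2/2\\
\hat\sigma_m^2 &= A_{33}-\hat\beta_m^2\hat\sigma_S^2-\hat\sigma_{em}^2/2\\
\hat\rho_{qr} &= (A_{12}-\hat\beta_q\hat\sigma_S^2)/(\hat\sigma_q\hat\sigma_r)\\
\hat\rho_T &= \widehat{\mathrm{Cov}}(T_{i1},T_{i2})/\hat\sigma_T^2=[\widehat{\mathrm{Cov}}(R_{i1},R_{i2})-\hat\sigma_r^2]/\hat\sigma_T^2=(A_{22}-B_{22}/4-\hat\sigma_r^2)/\hat\sigma_T^2
\end{aligned}
$$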

Furthermore, the MLEs of the mean parameters in (8) are given by

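$$
\begin{aligned}
\hat\mu_{Tj} &= \sum_{i=1}^n R_{ij}/n, & j&=1,2\\
\hat\alpha_{qj} &= \sum_{i=1}^n Q_{ij}/n-\hat\beta_q\hat\mu_{Tj}, & j&=1,2\\
\hat\alpha_{mj} &= \sum_{i=1}^n M_{ij}/n-\hat\beta_m\hat\mu_{Tj}, & j&=1,2
\end{aligned}
$$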

The parameters αqj and αmj in equation (14) can be estimated by the intercept terms in the QRij and MRij mixed effects models in equation (20). The parameters $\sigma_S^2,\ldots,\rho_T$ in equation (14) can be estimated by substituting residuals of Q on Z, R on Z, and M on Z, respectively, for Q, R, and M and using the above expressions. The parameter μTj is estimated in the same way in equations (8) and (14). The parameters γ̰q and γ̰m are estimated from the mixed effects regression models in equation (20).
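The closed-form estimators for the variance and regression parameters in this appendix translate directly into code once the moment matrices A and B and the pooled covariance of M and R (C23 in the Appendix A notation) are available. The sketch below is illustrative only: the function name `triad_mles`, the dictionary layout keyed by (k, l) with 1 = Q, 2 = R, 3 = M, and the round-number "true" values are hypothetical. The self-check builds population moments from those values, assuming that σS² and σD² are the variances of the two-visit mean and of the change in true intake, which is an interpretation consistent with σT² = σS² + σD²/4 rather than a definition quoted from this section.

```python
import math


def triad_mles(A, B, C23):
    """Appendix B estimators; A and B are dicts keyed by (k, l), k <= l."""
    s2_S = A[2, 3] ** 2 * B[1, 2] / (A[1, 3] * B[2, 3])
    s2_D = B[1, 2] * B[2, 3] / B[1, 3]
    s2_T = s2_S + s2_D / 4
    beta_q = B[1, 2] * C23 / (B[2, 3] * s2_T)
    beta_m = C23 / s2_T
    s2_eq = (B[1, 1] - beta_q ** 2 * s2_D) / 2
    s2_er = (B[2, 2] - s2_D) / 2
    s2_em = (B[3, 3] - beta_m ** 2 * s2_D) / 2
    s2_q = A[1, 1] - beta_q ** 2 * s2_S - s2_eq / 2
    s2_r = A[2, 2] - s2_S - s2_er / 2
    s2_m = A[3, 3] - beta_m ** 2 * s2_S - s2_em / 2
    rho_qr = (A[1, 2] - beta_q * s2_S) / math.sqrt(s2_q * s2_r)
    rho_T = (A[2, 2] - B[2, 2] / 4 - s2_r) / s2_T
    return dict(s2_S=s2_S, s2_D=s2_D, s2_T=s2_T, beta_q=beta_q, beta_m=beta_m,
                s2_eq=s2_eq, s2_er=s2_er, s2_em=s2_em, s2_q=s2_q, s2_r=s2_r,
                s2_m=s2_m, rho_qr=rho_qr, rho_T=rho_T)


# Hypothetical "true" values (round numbers, for a self-check only).
true = dict(beta_q=0.7, beta_m=0.4, s2_T=1000.0, rho_T=0.5, s2_q=1600.0,
            s2_r=1000.0, s2_m=100.0, rho_qr=0.6, s2_eq=1000.0, s2_er=500.0,
            s2_em=150.0)
bq, bm = true["beta_q"], true["beta_m"]
s2_S = true["s2_T"] * (1 + true["rho_T"]) / 2  # assumed: Var of two-visit mean of T
s2_D = 2 * true["s2_T"] * (1 - true["rho_T"])  # assumed: Var of change in T

# Population moments implied by the model for subject means (A) and changes (B).
A = {
    (1, 1): true["s2_q"] + bq ** 2 * s2_S + true["s2_eq"] / 2,
    (2, 2): true["s2_r"] + s2_S + true["s2_er"] / 2,
    (3, 3): true["s2_m"] + bm ** 2 * s2_S + true["s2_em"] / 2,
    (1, 2): true["rho_qr"] * math.sqrt(true["s2_q"] * true["s2_r"]) + bq * s2_S,
    (1, 3): bq * bm * s2_S,
    (2, 3): bm * s2_S,
}
B = {
    (1, 1): bq ** 2 * s2_D + 2 * true["s2_eq"],
    (2, 2): s2_D + 2 * true["s2_er"],
    (3, 3): bm ** 2 * s2_D + 2 * true["s2_em"],
    (1, 2): bq * s2_D,
    (1, 3): bq * bm * s2_D,
    (2, 3): bm * s2_D,
}
C23 = bm * true["s2_T"]  # pooled Cov(M, R) at a single visit

est = triad_mles(A, B, C23)
for name, value in true.items():
    print(f"{name}: true = {value}, recovered = {est[name]:.6g}")
```

Feeding exact population moments through the formulas returns the generating values, which is a convenient way to check an implementation before applying it to sample moments.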

