Measurement error correction for nutritional exposures with correlated measurement error: Use of the method of triads in a longitudinal setting

Bernard Rosner; Karin B Michels; Ya-Hua Chen; Nicholas E Day

doi:10.1002/sim.3238

Author manuscript; available in PMC: 2011 Feb 14.

Published in final edited form as: Stat Med. 2008 Aug 15;27(18):3466–3489. doi: 10.1002/sim.3238

PMCID: PMC3038790 NIHMSID: NIHMS66646 PMID: 18416440

SUMMARY

Nutritional exposures are often measured with considerable error in commonly used surrogate instruments such as the food frequency questionnaire (FFQ) (denoted by Q_i for the ith subject). The error can be both systematic and random. The diet record (DR), denoted by R_i for the ith subject, is considered an alloyed gold standard. However, some authors have reported both systematic and random errors with this instrument as well.

One goal in measurement error research is to estimate the regression coefficient of T_i (true intake for the ith subject) on Q_i, denoted by λ_TQ. If the systematic errors in Q_i and R_i (denoted by q_i and r_i) are uncorrelated, then λ_RQ, obtained by regressing R_i on Q_i, is an unbiased estimate of λ_TQ. However, if Corr(q_i, r_i) > 0, then λ_RQ > λ_TQ.

In this paper, we propose a method for indirectly estimating λ_TQ, even in the presence of correlated systematic error, based on a longitudinal design in which Q_i (surrogate measure of dietary intake), R_i (a reference measure of dietary intake), and M_i (a biomarker) are available on the same subjects at two time points. The method also accounts for between-person variation in mean levels of M_i among people with the same dietary intake. The methodology is illustrated for dietary vitamin C intake based on longitudinal data from 323 subjects in the European Prospective Investigation of Cancer (EPIC)-Norfolk study who provided two measures of dietary vitamin C intake from the FFQ (Q_i) and a 7-day DR (R_i), together with plasma vitamin C (M_i), 4 years apart.

Keywords: measurement error, longitudinal data, correlated error, biomarkers

1. INTRODUCTION

The diet record (DR) has often been used as a reference instrument to validate other surrogate instruments of nutritional intake such as the food frequency questionnaire (FFQ) [1]. The FFQ is known to have both random and systematic components of measurement error [2]. The reference instrument (e.g. DR, 24-h recall) may also have both systematic and random errors, although it is generally acknowledged that on average over a large number of people, the reference instrument provides an unbiased estimate of the population mean of true intake. Plummer and Clayton [3] consider the following model:

\begin{matrix} Q_{ij} = α_{q} + T_{i} + e_{Qij}, i = 1, \dots, N, j = 1, \dots, J \\ R_{ij} = T_{i} + e_{Rij}, i = 1, \dots, N, j = 1, \dots, J \end{matrix}

(1)

where Q_ij(R_ij) are the FFQ (DR) intakes for the i th subject at time j, T_i is the true intake, Corr(e_Qi1, e_Qi2) = ρ_q, Corr(e_Ri1, e_Ri2) = ρ_r, Corr(e_Qij, e_Rij) = ρ_qr, and Corr(e_Qij, e_Ril) = 0 for j ≠ l = 1, … , J. In this model, α_q represents systematic error in the FFQ, e_Qij represents random error in the FFQ, and e_Rij represents random error in the DR.

This model is identifiable and allows for a shift-bias term (α_q) for the FFQ. However, it does not allow for a scale-bias term where the degree of bias in the FFQ is a function of T_i. Plummer and Clayton [4] have extended the model in (1) by the use of scale-bias coefficients (β_q, β_r) for nutrient intake and the use of biomarker measurements (M_ij):

\begin{matrix} Q_{ij} = α_{q} + β_{q} T_{i} + e_{Qij}, i = 1, \dots, N, j = 1, \dots, J \\ R_{ij} = α_{r} + β_{r} T_{i} + e_{Rij}, i = 1, \dots, N, j = 1, \dots, J \\ M_{ij} = α_{m} + T_{i} + e_{Mij}, i = 1, \dots, N, j = 1, \dots, J \end{matrix}

(2)

where M_ij is the biomarker for the ith subject at the jth time period; Corr(e_Qij, e_Qil) = ρ_{Q, jl} ≠ 0, Corr(e_Rij, e_Ril) = ρ_{R, jl} ≠ 0, Corr(e_Qij, e_Rij) = ρ_{QR, j} ≠ 0, and Corr(e_Qij, e_Ril) = ρ_{QR, jl} ≠ 0 for j ≠ l; Corr(e_Qij, e_Mij) = Corr(e_Rij, e_Mij) = 0; and Corr(e_Qij, e_Mil) = Corr(e_Rij, e_Mil) = 0 for j ≠ l. This type of model might be appropriate for a recovery biomarker such as urine nitrogen, but may not be appropriate for a biomarker such as plasma vitamin C because of the absence of a scale-bias term for the regression of M_ij on T_i.

Kaaks et al. [5] have considered a slightly different measurement error model, allowing for a scale-bias factor for the biomarker measurement (M), but no scale-bias factor for the reference instrument:

\begin{matrix} Q_{i} = α_{q} + β_{q} T_{i} + e_{Qi}, i = 1, \dots, N \\ R_{i} = T_{i} + e_{Ri}, i = 1, \dots, N \\ M_{i} = α_{m} + β_{m} T_{i} + e_{Mi}, i = 1, \dots, N \end{matrix}

(3)

where Corr(e_Qi, e_Ri) = Corr(e_Qi, e_Mi) = Corr(e_Ri, e_Mi) = 0.

Based on (3) and using structural equation methods, Ocke and Kaaks [6] proposed the method of triads estimator:

{\hat{ρ}}_{TQ}^{2} = \hat{Corr} (Q_{i}, R_{i}) \hat{Corr} (Q_{i}, M_{i}) / \hat{Corr} (M_{i}, R_{i})
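The triad estimator needs only the three pairwise sample correlations. The sketch below is a minimal illustration, assuming NumPy and data simulated from model (3) with mutually independent errors (the setting in which the estimator is valid); the function name `triads_validity` and all simulation parameters are hypothetical. In finite samples the ratio can fall outside [0, 1] (a so-called Heywood case), so the square root is taken on a clipped value while the raw ratio is returned unchanged.

```python
import numpy as np


def triads_validity(Q, R, M):
    """Method-of-triads estimate of rho_TQ^2 (and rho_TQ) from 1-D arrays Q, R, M."""
    c = np.corrcoef(np.vstack([Q, R, M]))  # rows are variables: Q, R, M
    r_qr, r_qm, r_rm = c[0, 1], c[0, 2], c[1, 2]
    rho2 = r_qr * r_qm / r_rm
    # Clip only for the square root; rho2 itself is returned unclipped.
    rho = float(np.sqrt(np.clip(rho2, 0.0, 1.0)))
    return float(rho2), rho


# Hypothetical data from model (3): independent errors, Var(T) = 1.
rng = np.random.default_rng(1)
n = 100_000
T = rng.normal(size=n)
Q = 0.5 + 1.0 * T + rng.normal(size=n)                  # true rho_TQ^2 = 1 / (1 + 1) = 0.5
R = T + np.sqrt(0.5) * rng.normal(size=n)
M = 1.0 + 0.5 * T + np.sqrt(0.5) * rng.normal(size=n)

rho2_hat, rho_hat = triads_validity(Q, R, M)
print(round(rho2_hat, 3), round(rho_hat, 3))            # should be near 0.5 and 0.707
```

If e_Qi and e_Ri were positively correlated in this simulation, Corr(Q, R) would be inflated and the same function would overstate ρ_TQ, which is the limitation raised next.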

However, this estimator may not be valid if there is correlated error between the surrogate (Q) and the reference (R) instruments. Hence, Subar et al. [7] consider the following model:

\begin{matrix} Q_{ij} = μ_{qj} + α_{q} + q_{i} + β_{q} T_{i} + e_{Qij} \\ R_{ij} = μ_{rj} + α_{r} + r_{i} + β_{r} T_{i} + e_{Rij} \\ M_{ij} = μ_{mj} + T_{i} + e_{Mij} \end{matrix}

(4)

where α_q and β_q are the shift- and scale-bias factors for the surrogate (FFQ) and α_r and β_r are the shift- and scale-bias factors for the reference measure (e.g. DR). q_i and r_i are systematic errors for the surrogate and reference measures, respectively, e_Qij and e_Rij are random errors, and ∑_j μ_qj = ∑_j μ_rj = 0. This model is similar to the Plummer and Clayton [4] model in equation (2), except that the systematic and random errors in Q and R are more explicitly defined. The objective is to obtain the regression coefficient of T_i on Q_ij which can be expressed in the form: λ_TQ = Cov(Q_ij, T_i)/Var(Q_ij). Provided that (a) the systematic errors in the FFQ (q_i) and DR (r_i) are independent, (b) the scale bias for the reference instrument (β_r) is 1, and (c) the random errors (e_Qij, e_Rij) are independent, it can be shown from (4) that Cov (Q_ij, R_ij) = Cov (Q_ij, T_i); thus,

λ_{TQ} = Cov (Q_{ij}, T_{i}) / Var (Q_{ij}) = Cov (Q_{ij}, R_{ij}) / Var (Q_{ij}) = λ_{RQ}

Under these conditions, the reference instrument can be used to correct for measurement error based on the regression calibration approach [2, 8].
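For a single exposure, one common form of the regression calibration correction divides the observed (uncorrected) regression coefficient by the calibration factor λ and obtains a standard error by the delta method. The sketch below assumes that the main-study coefficient and λ̂ come from independent samples, so their covariance is taken as zero; the function name, argument names, and the numbers in the usage line are hypothetical.

```python
import math


def regression_calibration(beta_obs, se_beta_obs, lam, se_lam):
    """Corrected coefficient beta_obs / lam and its delta-method standard error.

    Assumes beta_obs (main study) and lam (validation study) are independent.
    """
    beta_corr = beta_obs / lam
    var_corr = se_beta_obs**2 / lam**2 + beta_obs**2 * se_lam**2 / lam**4
    return beta_corr, math.sqrt(var_corr)


# Hypothetical inputs: observed coefficient per unit of Q and a calibration factor from a validation study.
beta, se = regression_calibration(beta_obs=0.20, se_beta_obs=0.05, lam=0.50, se_lam=0.10)
print(beta, round(se, 4))  # 0.4 0.1281
```

The second variance term shows why an imprecise λ̂ widens the corrected interval, and why a biased λ (for example λ_RQ in place of λ_TQ) biases the corrected coefficient directly.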

However, there may be correlated error between the surrogate and the reference instruments (i.e. Cov(q_i, r_i) > 0), or β_r may differ from 1; in either case λ_RQ ≠ λ_TQ. The disparity between λ_RQ and λ_TQ can be large if Corr(q_i, r_i) is non-trivial [9].

Spiegelman et al. [10] have also considered a biomarker-based model of the form:

\begin{matrix} Q_{ij} = α_{q} + q_{i} + β_{q} T_{i} + e_{Qij}, i = 1, \dots, N, j = 1, \dots, J_{Q} \geq 2 \\ R_{ij} = r_{i} + T_{i} + e_{Rij}, i = 1, \dots, N, j = 1, \dots, J_{R} \geq 2 \\ M_{ij} = α_{m} + β_{m} T_{i} + e_{Mij}, i = 1, \dots, N, j = 1, \dots, J_{M} \geq 2 \end{matrix}

(5)

The authors propose a method-of-moments approach that yields an unbiased estimate of λ_TQ even in the presence of correlated error, provided that replicate measures of Q_ij, R_ij, and M_ij are available.

This model differs from equations (2) and (4) in that (a) the reference measure is assumed to have no shift or scale bias at the population level and (b) the biomarker does have possible shift-bias (α_m) and scale-bias (β_m) factors. This model may be more appropriate than (2) or (4) for an imperfect concentration biomarker (e.g. plasma vitamin C).

Fraser et al. [11] consider a two biomarker model of the form:

\begin{matrix} Q_{i} = α_{q} + β_{q} T_{i} + e_{Qi} \\ M_{i 1} = α_{m 1} + β_{m 1} T_{i} + e_{M 1 i} \\ M_{i 2} = α_{m 2} + β_{m 2} T_{i} + e_{M 2 i} \end{matrix}

(6)

where Corr(e_Qi, e_M1i) = Corr(e_Qi, e_M2i) = Corr(e_M1i, e_M2i) = 0.

The parameters in this model are identifiable, but only under the assumption that the errors in the two biomarkers (e_M1i, e_M2i) are uncorrelated, which may not hold if there is between-person variation and covariation in mean levels of biomarker measurements among people with the same dietary intake. A model with R_i substituted for Q_i is also considered, under similar assumptions.

Spiegelman et al. [10] also consider a design with an unreplicated biomarker (M) and an additional instrumental variable (V) of the form:

\begin{matrix} Q_{ij} = α_{q} + q_{i} + β_{q} T_{i} + e_{Qij}, i = 1, \dots, N, j = 1, \dots, J_{Q} \geq 2 \\ R_{ij} = r_{i} + T_{i} + e_{Rij}, i = 1, \dots, N, j = 1, \dots, J_{R} \geq 2 \\ M_{i} = α_{M} + β_{m} T_{i} + e_{Mi}, i = 1, \dots, N \\ V_{i} = α_{V} + β_{υ} T_{i} + e_{Vi}, i = 1, \dots, N \end{matrix}

(7)

where Corr(e_Mi, e_Vi) = 0. This model extends the work of Fraser et al. [11] by allowing for the surrogate (Q), reference measure (R), a biomarker (M), and an instrumental variable (V) in the same model. However, the model in (7) is not uniquely identifiable if there is only a single biomarker and a single instrumental variable (V), but becomes identifiable if there are replicate measures available for both the biomarker (M) and the instrumental variable (V).

In general, the biomarker-based models in equations (4)–(7) have several potential limitations. First, the biomarker may not be specific to the exposure of interest. Second, even if the biomarker is specific, there may be metabolic differences among people (e.g. some subjects may have systematically different absorption rates); hence, there may be systematic error in a biomarker (m_i), which is likely to be uncorrelated with q_i and r_i in equation (4) or (5). Third, if the time periods in equation (2), (4), or (5) are close to each other (e.g. months apart), it is reasonable to assume that true intake T_i is the same for a given subject at each time point. However, surrogate instruments are often administered at widely separated time periods (e.g. years apart), in which case T_i may change over time. In this paper, we focus on the single-biomarker case and, using longitudinal data on Q, R, and M, generalize equations (2), (4), and (5) to allow for (a) possible systematic error in biomarker measurements (henceforth referred to as between-person variation) and (b) variation in true intake over time. All parameters in this model are estimable, and closed-form standard errors and confidence limits are available. We then apply these methods to dietary vitamin C intake from the EPIC study to assess whether correlated error has a substantial impact on regression calibration.

2. METHODS

We will first consider the case where there are no additional covariates that affect nutrient intake or biomarker measurements for a particular nutrient.

2.1. No additional covariates that affect nutrient intake or associated biomarkers

We consider an extension of the model in Plummer and Clayton [4], Kaaks et al. [5], and Spiegelman et al. [10] of the form:

\begin{matrix} Q_{ij} = α_{qj} + q_{i} + β_{q} T_{ij} + e_{Qij}, i = 1, \dots, n, j = 1, 2 \\ R_{ij} = r_{i} + T_{ij} + e_{Rij}, i = 1, \dots, n, j = 1, 2 \\ M_{ij} = α_{mj} + m_{i} + β_{m} T_{ij} + e_{Mij}, i = 1, \dots, n, j = 1, 2 \end{matrix}

(8)

where T_ij is the true intake for the ith subject at the jth time point and q_i, r_i, and m_i are random effects for the surrogate instrument (Q), reference instrument (R), and biomarker (M), distributed as $N (0, σ_{q}^{2})$, $N (0, σ_{r}^{2})$, and $N (0, σ_{m}^{2})$, respectively. We assume that Corr(q_i, m_i) = Corr(r_i, m_i) = 0, but Corr(q_i, r_i) is not necessarily 0. The random errors for Q, R, and M, denoted by e_Qij, e_Rij, and e_Mij, are distributed as $N (0, σ_{eq}^{2})$, $N (0, σ_{er}^{2})$, and $N (0, σ_{em}^{2})$, respectively; they are mutually independent and are also independent of q_i, r_i, and m_i. Thus, the random errors in Q, R, and M are assumed to be independent both within a given visit and across visits. This may not hold if an additional covariate (e.g. body mass index (BMI)) is related to reported surrogate intake (Q) even conditional on true intake; this issue is considered further in Section 2.3. The random variable m_i represents between-person variation in mean levels of the biomarker; its variance $σ_{m}^{2}$ measures variation in the biomarker among people with the same dietary intake T_ij. Finally, we assume that Var(T_i1) = Var(T_i2) and denote this common cross-sectional variance by Var(T_ij), but allow E(T_i1) and E(T_i2) to be free parameters, denoted by μ_T1 and μ_T2, respectively.

Fitting this model requires longitudinal data over a comparable time period for the surrogate instrument, reference instrument, and biomarker. We note that change scores for Q, R, and M are of the form:

\begin{matrix} Q_{di} \equiv Q_{i 2} - Q_{i 1} = α_{q 2} - α_{q 1} + β_{q} (T_{i 2} - T_{i 1}) + e_{Qi 2} - e_{Qi 1} \equiv α_{q 2} - α_{q 1} + β_{q} T_{di} + e_{Qi 2} - e_{Qi 1} \\ R_{di} \equiv R_{i 2} - R_{i 1} = T_{di} + e_{Ri 2} - e_{Ri 1} \\ M_{di} \equiv M_{i 2} - M_{i 1} = α_{m 2} - α_{m 1} + β_{m} T_{di} + e_{Mi 2} - e_{Mi 1} \end{matrix}

(9)

where T_di = T_i2 − T_i1 is the change in true intake for the ith subject, i = 1, … , n.

None of the change scores contains the random effects q_i, r_i, or m_i in (8). Our goal is to estimate the measurement error correction factor, which is the regression coefficient λ_TQ from the regression of T_ij on Q_ij:

T_{ij} = α_{TQ} + λ_{TQ} Q_{ij} + e_{ij}^{*}

Thus, λ_TQ = Cov(Q_ij, T_ij)/Var(Q_ij) = β_q Var(T_ij)/Var(Q_ij) = (β_q/β_m)[β_m Var(T_ij)]/Var(Q_ij). It can be shown that the maximum likelihood estimator (MLE) of β_m Var(T_ij) is given by ${\hat{β}}_{m} \hat{Var} (T_{ij}) = {\hat{β}}_{m} [\hat{Var} (T_{i 1}) + \hat{Var} (T_{i 2})] / 2 = [\hat{Cov} (M_{i 1}, R_{i 1}) + \hat{Cov} (M_{i 2}, R_{i 2})] / 2 \equiv \hat{Cov} (M_{ij}, R_{ij})$. Furthermore, the MLE of β_q/β_m can be obtained from (9) by

{\hat{β}}_{q} / {\hat{β}}_{m} = \hat{Cov} (Q_{di}, R_{di}) / \hat{Cov} (M_{di}, R_{di})

If we denote $[\hat{Var} (Q_{i 1}) + \hat{Var} (Q_{i 2})] / 2$ by $\hat{Var} (Q_{ij})$, it follows that the MLE of λ_TQ is given by

{\hat{λ}}_{TQ} = \hat{Cov} (Q_{di}, R_{di}) \hat{Cov} (M_{ij}, R_{ij}) / [\hat{Cov} (M_{di}, R_{di}) \hat{Var} (Q_{ij})]

(10)
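Equation (10) uses only sample moments, so it is straightforward to compute. The sketch below is one possible implementation, assuming NumPy, data stored as n × 2 arrays (column j = visit j), and divisor-n moments; the function names and the simulation settings (including Cov(q_i, r_i) = 0.5) are hypothetical. For comparison it also computes the naive λ̂_RQ, which absorbs the correlated systematic error.

```python
import numpy as np


def cov0(x, y):
    """Sample covariance with divisor n."""
    return float(np.mean((x - x.mean()) * (y - y.mean())))


def visit_avg(f, X, Y):
    """Average of a cross-sectional moment f over the two visits."""
    return (f(X[:, 0], Y[:, 0]) + f(X[:, 1], Y[:, 1])) / 2


def lambda_tq_hat(Q, R, M):
    """Equation (10); Q, R, M are (n, 2) arrays."""
    Qd, Rd, Md = (X[:, 1] - X[:, 0] for X in (Q, R, M))
    ratio = cov0(Qd, Rd) / cov0(Md, Rd)     # estimates beta_q / beta_m
    bm_varT = visit_avg(cov0, M, R)         # estimates beta_m Var(T_ij)
    return ratio * bm_varT / visit_avg(cov0, Q, Q)


def lambda_rq_hat(Q, R):
    """Naive calibration factor: Cov(Q_ij, R_ij) / Var(Q_ij)."""
    return visit_avg(cov0, Q, R) / visit_avg(cov0, Q, Q)


# Hypothetical simulation from model (8).
rng = np.random.default_rng(0)
n = 200_000
T = np.array([10.0, 10.5]) + rng.normal(size=(n, 1)) + rng.normal(size=(n, 2))  # Var(T_ij) = 2
z1, z2 = rng.normal(size=n), rng.normal(size=n)
q, r = z1, 0.5 * z1 + np.sqrt(0.75) * z2            # Var = 1 each, Cov(q_i, r_i) = 0.5
m = np.sqrt(0.5) * rng.normal(size=n)               # independent of q_i and r_i
Q = np.array([1.0, 0.8]) + q[:, None] + 0.6 * T + rng.normal(size=(n, 2))
R = r[:, None] + T + np.sqrt(0.5) * rng.normal(size=(n, 2))
M = np.array([2.0, 2.5]) + m[:, None] + 0.8 * T + np.sqrt(0.5) * rng.normal(size=(n, 2))

lam_tq, lam_rq = lambda_tq_hat(Q, R, M), lambda_rq_hat(Q, R)
print(round(lam_tq, 3), round(lam_rq, 3))
# Targets: lambda_TQ = 0.6 * 2 / 2.72 (about 0.441); lambda_RQ = 1.7 / 2.72 = 0.625
```

The random effects q_i, r_i, and m_i drop out of the change scores, so the ratio of change-score covariances is unaffected by Cov(q_i, r_i), whereas the naive factor is not.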

The standard regression calibration factor based on the reference instrument alone is obtained from the regression coefficient of R_ij on Q_ij given by ${\hat{λ}}_{RQ} = \hat{Cov} (Q_{ij}, R_{ij}) / \hat{Var} (Q_{ij})$ . Based on equation (8), we obtain

λ_{RQ} = [β_{q} Var (T_{ij}) + Cov (q_{i}, r_{i})] / Var (Q_{ij})

(11)

If the systematic errors in Q and R are positively correlated, i.e. Cov(q_i, r_i) > 0, then λ_RQ > λ_TQ.
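Comparing equation (11) with λ_TQ = β_q Var(T_ij)/Var(Q_ij) shows that, at the population level, the bias of the naive factor is exactly Cov(q_i, r_i)/Var(Q_ij), where Var(Q_ij) follows the decomposition given in equation (13) of Section 2.2. A small sketch of this arithmetic, with hypothetical parameter values and a hypothetical function name:

```python
def population_lambdas(beta_q, var_T, var_q, var_eQ, cov_qr):
    """Population lambda_TQ and lambda_RQ under model (8)."""
    var_Q = var_q + beta_q**2 * var_T + var_eQ          # decomposition of Var(Q_ij), equation (13)
    lam_tq = beta_q * var_T / var_Q
    lam_rq = (beta_q * var_T + cov_qr) / var_Q          # equation (11)
    return lam_tq, lam_rq, var_Q


lam_tq, lam_rq, var_Q = population_lambdas(beta_q=0.6, var_T=2.0, var_q=1.0, var_eQ=1.0, cov_qr=0.5)
bias = lam_rq - lam_tq   # equals cov_qr / var_Q = 0.5 / 2.72
```

With cov_qr = 0 the two factors coincide, which is the uncorrelated-error case in which λ_RQ can be used directly.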

We now consider confidence limits for λ_TQ. We have found in simulation studies based on the model in equation (8) that the sampling distribution of λ̂_TQ is positively skewed, especially if n is small. Hence, in Appendix A we use the delta method to obtain a closed-form expression for Var[ln(λ̂_TQ)]. A 100 per cent × (1−α) confidence interval (CI) for λ_TQ is then given by [exp(c₁), exp(c₂)], where

(c_{1}, c_{2}) = ln ({\hat{λ}}_{TQ}) \pm z_{1 - α / 2} {Var [ln ({\hat{λ}}_{TQ})]}^{1 / 2}

(12)

Var[ln(λ̂_TQ)] is obtained from equation (A2), and z_{1−α/2} is the 100(1 − α/2)th percentile (i.e. the upper α/2 point) of an N(0,1) distribution.
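Given λ̂_TQ and an estimate of Var[ln(λ̂_TQ)], equation (12) is a short calculation. The sketch below assumes the variance has already been obtained from equation (A2) of Appendix A, which is not reproduced here; the function name and the input values in the usage line are hypothetical. Because the interval is symmetric on the log scale, it is asymmetric around λ̂_TQ, consistent with the positive skew noted above.

```python
import math
from statistics import NormalDist


def log_scale_ci(lam_hat, var_log_lam, alpha=0.05):
    """Equation (12): exponentiate ln(lam_hat) +/- z_{1-alpha/2} * sqrt(Var[ln(lam_hat)])."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_log_lam)
    c1 = math.log(lam_hat) - half_width
    c2 = math.log(lam_hat) + half_width
    return math.exp(c1), math.exp(c2)


# Hypothetical inputs: lambda_hat = 0.40 with Var[ln(lambda_hat)] = 0.04.
lo, hi = log_scale_ci(0.40, 0.04)
print(round(lo, 3), round(hi, 3))
```

The upper limit lies farther from 0.40 than the lower limit, and the product of the two limits equals λ̂_TQ².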

2.2. Variance decomposition

Based on the model in (8), the variance of Q_ij can be separated into the following independent components:

Var (Q_{ij}) = Var (q_{i}) + β_{q}^{2} Var (T_{ij}) + Var (e_{Qij})

(13)

where $β_{q}^{2} Var (T_{ij})$ represents variation in Q attributable to true intake, Var(q_i) represents variation due to systematic error, and Var(e_Qij) represents variation due to random error. A similar decomposition can be performed for variations in the DR (R_ij) and the biomarker (M_ij), respectively. To facilitate this decomposition, one can derive maximum likelihood estimates (MLEs) of all the parameters in the model. For this purpose, we let ν̰_i = (Q̅_i, R̅_i, M̅_i), w̰_i = (Q_di, R_di, M_di), i = 1, … ,n and define

A = \hat{Σ} (\underset{\sim}{ν}), B = \hat{Σ} (\underset{\sim}{w})

where

\begin{matrix} A_{11} = \sum_{i = 1}^{n} {(\bar{Q_{i}} - \bar{\bar{Q}})}^{2} / n, & A_{12} = \sum_{i = 1}^{n} (\bar{Q_{i}} - \bar{\bar{Q}}) (\bar{R_{i}} - \bar{\bar{R}}) / n, & A_{13} = \sum_{i = 1}^{n} (\bar{Q_{i}} - \bar{\bar{Q}}) (\bar{M_{i}} - \bar{\bar{M}}) / n \\ A_{22} = \sum_{i = 1}^{n} {(\bar{R_{i}} - \bar{\bar{R}})}^{2} / n, & A_{23} = \sum_{i = 1}^{n} (\bar{R_{i}} - \bar{\bar{R}}) (\bar{M_{i}} - \bar{\bar{M}}) / n, & A_{33} = \sum_{i = 1}^{n} {(\bar{M_{i}} - \bar{\bar{M}})}^{2} / n \end{matrix}

Q̅_i = (Q_i1 + Q_i2)/2, R̅_i = (R_i1 + R_i2)/2, M̅_i = (M_i1 + M_i2)/2, $\bar{\bar{Q}} = \sum_{i = 1}^{n} {\bar{Q}}_{i} / n$, $\bar{\bar{R}} = \sum_{i = 1}^{n} {\bar{R}}_{i} / n$, $\bar{\bar{M}} = \sum_{i = 1}^{n} {\bar{M}}_{i} / n$, and A_lk = A_kl for all k, l = 1, 2, 3. The elements of B are defined similarly based on Q_di, R_di, and M_di, i = 1, … , n, with means ${\bar{Q}}_{d} = \sum_{i = 1}^{n} Q_{di} / n$, ${\bar{R}}_{d} = \sum_{i = 1}^{n} R_{di} / n$, and ${\bar{M}}_{d} = \sum_{i = 1}^{n} M_{di} / n$. We also let S_i = (T_i1 + T_i2)/2 and define $σ_{S}^{2} = Var (S_{i})$ and $σ_{D}^{2} = Var (T_{i 2} - T_{i 1}) \equiv Var (T_{di})$. It can be shown that the MLEs of the variance–covariance parameters of (8) exist in closed form; they are given in Appendix B.
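The matrices A and B are divisor-n covariance matrices of the subject means and of the change scores, so NumPy's `np.cov` with `rowvar=False, bias=True` produces them directly. A minimal sketch, assuming the data are held as n × 2 arrays (column j = visit j); the function name and the random test data are hypothetical:

```python
import numpy as np


def moment_matrices(Q, R, M):
    """Return (A, B): 3 x 3 divisor-n covariance matrices of
    v_i = (Qbar_i, Rbar_i, Mbar_i) and w_i = (Q_di, R_di, M_di)."""
    V = np.column_stack([X.mean(axis=1) for X in (Q, R, M)])     # subject means
    W = np.column_stack([X[:, 1] - X[:, 0] for X in (Q, R, M)])  # change scores
    A = np.cov(V, rowvar=False, bias=True)
    B = np.cov(W, rowvar=False, bias=True)
    return A, B


rng = np.random.default_rng(4)
Q, R, M = (rng.normal(size=(50, 2)) for _ in range(3))
A, B = moment_matrices(Q, R, M)

# A_12 written out exactly as in its definition, for comparison with A[0, 1]
Qbar, Rbar = Q.mean(axis=1), R.mean(axis=1)
A12 = np.sum((Qbar - Qbar.mean()) * (Rbar - Rbar.mean())) / len(Qbar)
```

Setting `bias=True` is what gives the divisor n used in the definitions of A and B; the default (`bias=False`) would divide by n − 1.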

2.3. Additional covariates affecting nutrient intake and/or associated biomarkers

Biomarker measurements M_ij are often affected by covariates other than true intake (T_ij) of the nutrient under study. For example, BMI and cigarette smoking may influence the metabolism and absorption of many nutrients. In addition, true dietary intake (T_ij) and intake reported on a surrogate instrument (Q_ij) may be influenced by other covariates. Let Z_ijk be the value of the kth covariate measured on the ith subject at time j, k = 1, … , K. We therefore consider the following extension of (8):

\begin{matrix} Q_{ij} = α_{qj} + q_{i} + β_{q} T_{ij} + γ_{q}^{'} Z_{ij} + e_{Qij}, i = 1, \dots, n, j = 1, 2 \\ R_{ij} = r_{i} + T_{ij} + e_{Rij}, i = 1, \dots, n, j = 1, 2 \\ M_{ij} = α_{mj} + m_{i} + β_{m} T_{ij} + γ_{m}^{'} Z_{ij} + e_{Mij}, i = 1, \dots, n, j = 1, 2 \\ T_{ij} = α_{Tj} + δ' Z_{ij} + e_{Tij}, i = 1, \dots, n, j = 1, 2 \end{matrix}

(14)

where $Z_{ij}^{'} = (Z_{ij 1}, \dots, Z_{ijK})$, $γ_{q}^{'} = (γ_{q 1}, \dots, γ_{qK})$, $γ_{m}^{'} = (γ_{m 1}, \dots, γ_{mK})$, and δ′ = (δ₁, … , δ_K) are 1 × K vectors; $e_{Qij} \sim N (0, σ_{eq}^{2})$, $e_{Rij} \sim N (0, σ_{er}^{2})$, $e_{Mij} \sim N (0, σ_{em}^{2})$, and $e_{Tij} \sim N (0, σ_{T}^{2})$; e_Qij, e_Rij, e_Mij, and e_Tij are independent; q_i, r_i, and m_i are independent of T_ij and Z_ij as well as of e_Qij, e_Rij, e_Mij, and e_Tij; $q_{i} \sim N (0, σ_{q}^{2})$, $r_{i} \sim N (0, σ_{r}^{2})$, and $m_{i} \sim N (0, σ_{m}^{2})$; and q_i and r_i are each independent of m_i, although q_i and r_i may be dependent. Note that q_i, r_i, and m_i in (14) represent random effects conditional on both T_ij and Z_ij and hence have a different interpretation than in (8). For example, if Z_ij = BMI, then q_i, r_i, and m_i are conditional on BMI, which makes the assumption of independence between, say, q_i and Z_ij more reasonable.

We wish to estimate λ_TQ|Z = β_q Var(T_ij|Z_ij)/Var(Q_ij|Z_ij). Based on (14), we can express

\begin{matrix} R_{ij} = α_{Tj} + r_{i} + δ' Z_{ij} + e_{Tij} + e_{Rij} \\ M_{ij} = α_{mj}^{*} + m_{i} + (β_{m} δ' + γ_{m}^{'}) Z_{ij} + β_{m} e_{Tij} + e_{Mij} \end{matrix}

where $α_{mj}^{*} = α_{mj} + β_{m} α_{Tj}$ . If we let

\begin{matrix} R_{ij}^{*} \equiv R_{ij} - δ' Z_{ij} = α_{Tj} + r_{i} + e_{Tij} + e_{Rij} \\ M_{ij}^{*} \equiv M_{ij} - (β_{m} δ' + γ_{m}^{'}) Z_{ij} = α_{mj}^{*} + m_{i} + β_{m} e_{Tij} + e_{Mij} \end{matrix}

(15)

then, because r_i, m_i, and Z_ij are mutually independent, $R_{ij}^{*}$ and $M_{ij}^{*}$ can be interpreted as the residuals of R_ij and M_ij, respectively, on Z_ij. It follows from (15) that

Cov (M_{ij}^{*}, R_{ij}^{*}) = β_{m} Var (T_{ij} | Z_{ij}) = β_{m} σ_{T | Z}^{2}

(16)

and thus, $\hat{Cov} (M_{ij}^{*}, R_{ij}^{*})$ is the MLE of $β_{m} σ_{T | Z}^{2}$ . Similarly, from (14), we define

Q_{ij}^{*} \equiv Q_{ij} - (β_{q} δ' + γ_{q}^{'}) Z_{ij} = α_{qj}^{*} + q_{i} + β_{q} e_{Tij} + e_{Qij}

where $α_{qj}^{*} = α_{qj} + β_{q} α_{Tj}$ and interpret $Q_{ij}^{*}$ as the residual of Q_ij on Z_ij. We now consider the difference scores:

\begin{matrix} Q_{di}^{*} \equiv Q_{i 2}^{*} - Q_{i 1}^{*} = (α_{q 2}^{*} - α_{q 1}^{*}) + β_{q} (e_{Ti 2} - e_{Ti 1}) + (e_{Qi 2} - e_{Qi 1}) \\ R_{di}^{*} \equiv R_{i 2}^{*} - R_{i 1}^{*} = (α_{T 2} - α_{T 1}) + (e_{Ti 2} - e_{Ti 1}) + (e_{Ri 2} - e_{Ri 1}) \\ M_{di}^{*} \equiv M_{i 2}^{*} - M_{i 1}^{*} = (α_{m 2}^{*} - α_{m 1}^{*}) + β_{m} (e_{Ti 2} - e_{Ti 1}) + (e_{Mi 2} - e_{Mi 1}) \end{matrix}

(17)

From (17) it follows that $Cov (Q_{di}^{*}, R_{di}^{*}) = β_{q} Var (e_{Ti 2} - e_{Ti 1}), Cov (M_{di}^{*}, R_{di}^{*}) = β_{m} Var (e_{Ti 2} - e_{Ti 1})$ , and thus the MLE for β_q/β_m is given by

{\hat{β}}_{q} / {\hat{β}}_{m} = \hat{Cov} (Q_{di}^{*}, R_{di}^{*}) / \hat{Cov} (M_{di}^{*}, R_{di}^{*})

(18)

Therefore, from (16) and (18) we have that the MLE for $β_{q} σ_{T | Z}^{2}$ is

{\hat{β}}_{q} {\hat{σ}}_{T | Z}^{2} = \hat{Cov} (M_{ij}^{*}, R_{ij}^{*}) \hat{Cov} (Q_{di}^{*}, R_{di}^{*}) / \hat{Cov} (M_{di}^{*}, R_{di}^{*})

Finally, we estimate λ_TQ|Z by

{\hat{λ}}_{TQ | Z} = \hat{Cov} (T_{ij}, Q_{ij} | Z_{ij}) / \hat{Var} (Q_{ij} | Z_{ij}) = \hat{Cov} (M_{ij}^{*}, R_{ij}^{*}) \hat{Cov} (Q_{di}^{*}, R_{di}^{*}) / [\hat{Cov} (M_{di}^{*}, R_{di}^{*}) \hat{Var} (Q_{ij}^{*})]

(19)

which can be compared with ${\hat{λ}}_{RQ | Z} = \hat{Cov} (Q_{ij}^{*}, R_{ij}^{*}) / \hat{Var} (Q_{ij}^{*})$. To obtain confidence limits for λ_TQ|Z, we use the same approach as in Appendix A and equation (12), replacing Q_ij, R_ij, and M_ij by $Q_{ij}^{*}$, $R_{ij}^{*}$, and $M_{ij}^{*}$, respectively.
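Equation (19) amounts to applying equation (10) to covariate residuals. The sketch below assumes NumPy and obtains Q*_ij, R*_ij, and M*_ij by pooled ordinary least squares on visit-specific intercepts and Z_ij; this is a simplification of the mixed-model residuals from equation (21) used in Section 3 (both estimate the covariate coefficients consistently under (14), but they are not numerically identical). Function names and simulation settings are hypothetical.

```python
import numpy as np


def cov0(x, y):
    """Sample covariance with divisor n."""
    return float(np.mean((x - x.mean()) * (y - y.mean())))


def residualize(Y, Z):
    """Pooled OLS of Y_ij on visit-specific intercepts and Z_ij.
    Y: (n, 2) array; Z: (n, 2, K) array. Returns (n, 2) residuals."""
    n, J, K = Z.shape
    X = np.hstack([np.kron(np.ones((n, 1)), np.eye(J)), Z.reshape(n * J, K)])
    y = Y.reshape(n * J)                        # row order (i, j) matches X
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ coef).reshape(n, J)


def lambda_tq_given_z(Q, R, M, Z):
    """Equation (19), computed from covariate residuals Q*, R*, M*."""
    Qs, Rs, Ms = (residualize(X, Z) for X in (Q, R, M))
    Qd, Rd, Md = (X[:, 1] - X[:, 0] for X in (Qs, Rs, Ms))
    cov_mr = np.mean([cov0(Ms[:, j], Rs[:, j]) for j in range(2)])
    var_q = np.mean([cov0(Qs[:, j], Qs[:, j]) for j in range(2)])
    return cov0(Qd, Rd) * cov_mr / (cov0(Md, Rd) * var_q)


# Hypothetical simulation from model (14) with K = 2 covariates.
rng = np.random.default_rng(2)
n = 200_000
Z = rng.normal(size=(n, 2, 2))
delta, gam_q, gam_m = np.array([0.5, -0.3]), np.array([0.4, 0.0]), np.array([0.0, -0.5])
T = np.array([10.0, 10.5]) + Z @ delta + rng.normal(size=(n, 2))   # sigma^2_{T|Z} = 1
z1, z2 = rng.normal(size=n), rng.normal(size=n)
q, r = z1, 0.5 * z1 + np.sqrt(0.75) * z2                            # Cov(q_i, r_i) = 0.5
m = np.sqrt(0.5) * rng.normal(size=n)
Q = 1.0 + q[:, None] + 0.6 * T + Z @ gam_q + rng.normal(size=(n, 2))
R = r[:, None] + T + np.sqrt(0.5) * rng.normal(size=(n, 2))
M = 2.0 + m[:, None] + 0.8 * T + Z @ gam_m + np.sqrt(0.5) * rng.normal(size=(n, 2))

lam = lambda_tq_given_z(Q, R, M, Z)
# Target: beta_q * sigma^2_{T|Z} / Var(Q_ij | Z_ij) = 0.6 / (1 + 0.36 + 1), about 0.254
```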

2.4. Assessment of covariate effects on the systematic components of dietary and plasma measurement errors

It is also of interest to estimate γ̰_q and γ̰_m. The vector γ̰_q represents the effect of Z_ij on Q_ij conditional on true intake T_ij; hence, γ̰_q allows us to evaluate whether covariates Z̰_ij are associated with the systematic component of dietary (Q_ij) measurement error. γ̰_m has a similar interpretation for the effects of covariates Z̰_ij on the biomarker M_ij conditional on T_ij. From (14), we see that

\begin{matrix} {QR}_{ij} \equiv Q_{ij} - β_{q} R_{ij} = α_{qj} + (q_{i} - β_{q} r_{i}) + γ_{q}^{'} Z_{ij} + (e_{Qij} - β_{q} e_{Rij}) \\ {MR}_{ij} \equiv M_{ij} - β_{m} R_{ij} = α_{mj} + (m_{i} - β_{m} r_{i}) + γ_{m}^{'} Z_{ij} + (e_{Mij} - β_{m} e_{Rij}) \end{matrix}

(20)

where β_q and β_m are estimated from (14) (see Appendix B). Hence, we can estimate γ̰_q and γ̰_m by running mixed effects regression models of QR_ij on Z_ij and MR_ij on Z_ij, respectively.
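Because T_ij cancels in QR_ij and MR_ij, their covariate coefficients can be estimated by regression on Z_ij. The sketch below treats β_q and β_m as known (in practice they would be the estimates from Appendix B) and, as a simplification, uses pooled ordinary least squares with visit-specific intercepts instead of a mixed effects model; the point estimates are consistent under (20), but valid standard errors would need to account for the within-subject correlation induced by q_i, r_i, and m_i. The function name and simulation values are hypothetical.

```python
import numpy as np


def covariate_effects(Y, R, beta, Z):
    """Regress Y_ij - beta * R_ij on visit intercepts and Z_ij (pooled OLS);
    return the K coefficients of Z. Y, R: (n, 2); Z: (n, 2, K)."""
    n, J, K = Z.shape
    X = np.hstack([np.kron(np.ones((n, 1)), np.eye(J)), Z.reshape(n * J, K)])
    y = (Y - beta * R).reshape(n * J)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[J:]


# Hypothetical simulation from model (14): beta_q = 0.6, beta_m = 0.8.
rng = np.random.default_rng(5)
n = 50_000
Z = rng.normal(size=(n, 2, 2))
T = 10.0 + Z @ np.array([0.5, -0.3]) + rng.normal(size=(n, 2))
z1, z2 = rng.normal(size=n), rng.normal(size=n)
q, r, m = z1, 0.5 * z1 + np.sqrt(0.75) * z2, np.sqrt(0.5) * rng.normal(size=n)
Q = 1.0 + q[:, None] + 0.6 * T + Z @ np.array([0.4, 0.0]) + rng.normal(size=(n, 2))
R = r[:, None] + T + np.sqrt(0.5) * rng.normal(size=(n, 2))
M = 2.0 + m[:, None] + 0.8 * T + Z @ np.array([0.0, -0.5]) + np.sqrt(0.5) * rng.normal(size=(n, 2))

gamma_q_hat = covariate_effects(Q, R, 0.6, Z)   # target (0.4, 0.0)
gamma_m_hat = covariate_effects(M, R, 0.8, Z)   # target (0.0, -0.5)
```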

3. EXAMPLE

Applying the methods in this paper requires longitudinal data on intake from a surrogate instrument, intake from a reference instrument, and a biomarker over a period long enough for non-trivial changes in dietary intake to be possible. For this purpose, we use data from the EPIC study, a multi-center cohort study on diet and cancer conducted in 28 regional centers located in 10 Western European countries with varying dietary habits and cancer risk [12]. For 328 participants of the EPIC-Norfolk study, one of the two U.K.-based centers, data on dietary vitamin C assessed by both FFQ and a 7-day DR, together with plasma vitamin C as a biomarker, were available at both baseline and 4 years of follow-up. We note that DR intake was obtained at the time of the blood draw, whereas FFQ intake pertains to intake during the previous year. Five participants who had an outlying value for either plasma vitamin C (n = 3) or reported FFQ intake (n = 2) at one visit, but not at the other visit, were excluded from the analysis [13]. Previous analyses from the EPIC-Norfolk study have examined the relationship between plasma vitamin C and dietary vitamin C assessed by FFQ and DR [14]. In this paper, we use the longitudinal data from the remaining 323 participants to estimate the parameters in (8). Descriptive statistics of the demographic variables, nutrient intake, and plasma levels at each time point are provided in Table I.

Table I.

Descriptive statistics for vitamin C intake and plasma vitamin C, EPIC-Norfolk study, n=323.

	Baseline	Year 4	Difference^*
Total caloric intake (kcal)
FFQ^† (mean±s.d.)	2033.6±509.9	1980.7±520.6	−52.9±490.6
DR^† (mean±s.d.)	1755.1±394.8	1857.7±429.8	102.6±327.7
Dietary vitamin C intake (mg/day)^‡
FFQ (raw) (mean±s.d.)	135.5±57.4	137.1±63.2	1.6±55.6
DR (raw) (mean±s.d.)	90.6±50.1	94.8±51.1	4.2±44.4
Correlation (DR vs FFQ)	0.45	0.51	0.16^§
FFQ (cal.-adj.) (mean±s.d.)	134.4±54.5	135.7±58.7	1.3±50.7
DR (cal.-adj.) (mean±s.d.)	90.6±50.2	94.6±52.0	4.1±46.2
Correlation (DR vs FFQ)	0.47	0.57	0.22^§
Plasma vitamin C (µmol/L)	57.7±21.2	64.8±23.2	7.2±21.5
Correlation (vs FFQ, raw)	0.25	0.24	0.11^¶
Correlation (vs DR, raw)	0.40	0.36	0.28^¶
Correlation (vs FFQ, cal.-adj.)	0.25	0.27	0.11^¶
Correlation (vs DR, cal.-adj.)	0.40	0.34	0.27^¶
Age (mean±s.d.)	69.0±2.9	73.3±3.0
Gender
Male	80 (25 per cent)
Female	243 (75 per cent)
Height (cm) (mean±s.d.)	162.9±8.1	162.2±8.2
BMI (kg/m²) (mean±s.d.)	26.2±3.3	26.7±3.6
Smoking Status
Current	17 (5 per cent)	12 (4 per cent)
Past	127 (39 per cent)	132 (41 per cent)
Never	179 (56 per cent)	179 (55 per cent)
Vitamin C supplement use
Yes	42 (13 per cent)
No	281 (87 per cent)

^* Year 4 minus baseline.
^† FFQ, food frequency questionnaire; DR, diet record.
^‡ Exclusive of vitamin supplements.
^§ Correlation between change in DR intake (year 4 minus baseline) and change in FFQ intake (year 4 minus baseline).
^¶ Correlation between change in dietary intake (year 4 minus baseline) and change in plasma vitamin C (year 4 minus baseline).

At baseline, the mean age of the study population included in this analysis was 69 years and 75 per cent of the subjects were women. About 5 per cent of the subjects were current smokers and 13 per cent were vitamin C supplement users. Dietary vitamin C intake reported on the FFQ was about 50 per cent higher than that reported on the DR at both baseline and year 4. Reported FFQ intake was relatively constant over 4 years, whereas reported DR intake increased slightly and measured plasma vitamin C levels increased moderately. Cross-sectional correlations between calorie-adjusted DR and FFQ vitamin C intake ranged from 0.47 to 0.57; correlations between plasma vitamin C and calorie-adjusted intake from either instrument ranged from 0.25 to 0.40. The correlation between changes in calorie-adjusted FFQ and DR intake (ρ = 0.22) was substantially lower than the cross-sectional correlations. Correlations between change in calorie-adjusted vitamin C intake and change in plasma vitamin C were also weak, but were slightly stronger for DR intake (ρ = 0.27) than for FFQ intake (ρ = 0.11).

A number of covariates, some of which may change over time, may be related to dietary vitamin C intake or plasma vitamin C. Hence, we fit the following mixed effects regression model with, for example, FFQ vitamin C intake (Q_ij) as the response variable, where Q_i1 and Q_i2 are the FFQ vitamin C intakes for the ith subject at baseline and year 4, respectively. Subject was treated as a random effect; age, gender, height, BMI, smoking status, vitamin C supplement use, and visit were treated as fixed effects; and a compound symmetry correlation structure was used:

Q_{ij} = α + β_{1} \text{age}_{ij} + β_{2} \text{male gender}_{i} + β_{3} \text{height}_{ij} + β_{4} \text{BMI}_{ij} + β_{5} \text{current smoking}_{ij} + β_{6} \text{ex-smoking}_{ij} + β_{7} \text{vit. C supplement use}_{ij} + β_{8} \text{visit}_{j} + e_{ij}, i = 1, \dots, 323, j = 1, 2

(21)

and obtained residuals of Q_ij from equation (21); similar analyses were performed for DR vitamin C intake (R_i1, R_i2) and plasma vitamin C (M_i1, M_i2). For dietary vitamin C, analyses were performed for both raw and calorie-adjusted intakes. Calorie-adjusted FFQ vitamin C intake scores for males were obtained from

Q_{ij, \text{cal.-adj}} = \exp \{ \ln (Q_{ij}) - θ_{Q, j} [\ln (C_{ij}) - \frac{1}{80} \sum_{k = 1}^{80} \ln (C_{kj})] \}, i = 1, \dots, 80, j = 1, 2

where without loss of generality we assume that the first 80 subjects are males, C_ij is the total caloric intake for the ith male at time j, and θ_{Q, j} is the regression coefficient of ln(Q_ij) on ln(C_ij) based on the sample of 80 males. Similar formulas were used for females and for DR intake for both males and females. The results are given in Table II.
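The calorie adjustment above is applied separately within each sex group, visit, and instrument. A minimal sketch for one such group, assuming NumPy; θ is computed as the least-squares slope of ln(Q) on ln(C) within the group, and the function name and simulated intakes are hypothetical. By construction, the adjusted log intakes have zero sample slope on ln(C) and the same mean as ln(Q).

```python
import numpy as np


def calorie_adjust(Q, C):
    """Calorie-adjust intakes Q given total calories C (1-D arrays, one sex group at one visit).
    Returns (adjusted intakes, theta)."""
    lq, lc = np.log(Q), np.log(C)
    lc_c = lc - lc.mean()
    theta = np.dot(lc_c, lq - lq.mean()) / np.dot(lc_c, lc_c)   # slope of ln Q on ln C
    return np.exp(lq - theta * lc_c), theta


# Hypothetical group of 80 subjects at one visit.
rng = np.random.default_rng(3)
C = rng.lognormal(mean=np.log(2000.0), sigma=0.25, size=80)            # kcal/day
Q = np.exp(-1.0 + 0.8 * np.log(C) + rng.normal(scale=0.4, size=80))   # mg/day
Q_adj, theta = calorie_adjust(Q, C)
```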

Table II.

Mixed effects regression of dietary vitamin C intake and plasma vitamin C, respectively, on other covariates, EPIC-Norfolk study, n=323.^*

	FFQ, raw		FFQ, cal.-adj.		DR, raw		DR, cal.-adj.		Plasma Vitamin C

Variable	Beta±s.e.	p-Value	Beta±s.e.	p-Value	Beta±s.e.	p-Value	Beta±s.e.	p-Value	Beta±s.e.	p-Value
Constant	−4.8±117.7		−9.6±110.3		−24.5±100.0		−13.7±100.4		127.0±38.3
Age (yrs)	1.10±1.01	0.27	1.02±0.94	0.28	0.02±0.85	0.98	0.15±0.86	0.86	−0.93±0.33	0.005
Male gender (1=yes/0=no)	−15.5±10.0	0.12	−15.0±9.4	0.11	−12.0±8.6	0.16	−9.8±8.6	0.26	−14.1±3.2	<0.001
Height (cm)	0.15±0.51	0.77	0.20±0.48	0.68	0.76±0.44	0.085	0.59±0.44	0.18	0.10±0.17	0.53
BMI (kg/m²)	1.78±0.82	0.032	1.85±0.77	0.017	−0.13±0.70	0.85	0.10±0.70	0.89	−0.73±0.27	0.008
Smoking status
Current	−9.6±13.3	0.47	−15.1±12.4	0.23	−28.7±11.2	0.010	−30.4±11.3	0.007	−21.0±4.4	<0.001
Past	−6.6±6.7	0.33	−9.1±6.3	0.15	−4.0±5.7	0.49	−4.5±5.7	0.43	−0.9±2.2	0.67
Vitamin C supplement use	4.7±8.7	0.59	2.4±8.2	0.77	1.4±7.5	0.85	2.6±7.5	0.73	14.7±2.8	<0.001
Visit (1=visit2/0=visit1)	−4.1±5.3	0.45	−4.1±4.9	0.41	4.3±4.4	0.33	3.3±4.5	0.46	11.3±1.8	<0.001
Correlation between repeated measures^†	0.56		0.58		0.61		0.58		0.43

^* Based on PROC MIXED of SAS.
^† Using a compound symmetry correlation structure.

Table II shows that BMI was significantly associated with calorie-adjusted FFQ vitamin C intake (Beta = 1.85±0.77, p = 0.017), with heavier subjects reporting higher levels of intake; however, no such association was found for DR intake. Current smoking was inversely associated with calorie-adjusted DR intake, with current smokers reporting lower levels of intake (Beta = −30.4±11.3, p = 0.007). Associations were strongest for plasma vitamin C, which was positively associated with vitamin C supplement use (Beta = 14.7±2.8, p<0.001) and inversely associated with age (Beta = −0.93±0.33, p = 0.005), male gender (Beta = −14.1±3.2, p<0.001), BMI (Beta = −0.73±0.27, p = 0.008), and current smoking (Beta = −21.0±4.4, p<0.001). After controlling for the risk factors in Table II, the intraclass correlation between repeated measures was moderate for both calorie-adjusted dietary intake (ICC = 0.58) and plasma vitamin C (ICC = 0.43).

We now fit the model in equation (14) by obtaining the maximum likelihood estimates of parameters after adjusting for the covariates in Table II. Separate analyses were performed for both raw and calorie-adjusted vitamin C intakes. Also, based on equation (13), we decomposed the variance of FFQ vitamin C intake (Var(Q_ij)) into components of variation due to systematic error (Var(q_i)), true dietary intake $(β_{q}^{2} Var (T_{ij}))$ , and random error (Var(e_Qij)). This decomposition was performed for both unadjusted and covariate-adjusted analyses. A similar decomposition was used for DR and biomarker measurements. The results are given in Table III.

Table III.

Variance component estimates based on reported vitamin C intake and plasma vitamin C, EPIC-Norfolk study, n=323.^*

	Raw intake		Calorie-adjusted intake

Source of variation	Unadjusted (per cent)	Covariate-adjusted (per cent)	Unadjusted (per cent)	Covariate-adjusted (per cent)
Food frequency questionnaire (Q_ij)^†	3634	3526	3199	3069
Systematic error	1848 (51)	1718 (49)	1652 (52)	1486 (48)
True intake	473 (13)	460 (13)	580 (18)	560 (18)
Random error	1313 (36)	1348 (38)	967 (30)	1023 (33)
Diet record (R_ij)^‡	2552	2492	2601	2542
Systematic error	1128 (44)	998 (40)	1045 (40)	904 (36)
True intake	853 (33)	884 (35)	1076 (41)	1092 (43)
Random error	571 (22)	610 (24)	480 (18)	546 (21)
Plasma vitamin C (M_ij)^§	492	401	492	401
Between-person variation	154 (31)	77 (19)	188 (38)	104 (26)
True intake	209 (42)	164 (41)	163 (33)	131 (33)
Random error	129 (26)	160 (40)	141 (29)	166 (41)


^* With adjustment for the covariates in Table II.

^† FFQ: variation due to systematic error, Var(q_i); variation due to true intake, $β_{q}^{2} Var (T_{ij})$; variation due to random error, Var(e_Qij).

^‡ DR: variation due to systematic error, Var(r_i); variation due to true intake, Var(T_ij); variation due to random error, Var(e_Rij).

^§ Plasma vitamin C: between-person variation, Var(m_i); variation due to true intake, $β_{m}^{2} Var (T_{ij})$; variation due to random error, Var(e_Mij).

We see that for covariate- and calorie-adjusted FFQ intake, 48 per cent of the total variation is due to systematic error, 33 per cent to random error, and only 18 per cent to true dietary intake. For covariate- and calorie-adjusted DR intake, systematic error accounted for 36 per cent, random error for 21 per cent, and true dietary intake for 43 per cent of the total variation. For plasma vitamin C, between-person variation accounted for 26 per cent, random error for 41 per cent, and true dietary intake for 33 per cent of the total variation. Hence, of these three indices, the DR was the most reflective of true intake. For both raw and calorie-adjusted intakes, covariate adjustment reduced the variation due to systematic error (between-person variation for plasma vitamin C) and increased the variation due to random error.
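As a check, the covariate- and calorie-adjusted column of Table III can be recomputed from the corresponding Table IV estimates. This is a sketch that assumes, as in the Table III footnotes, that each total variance is the sum of a person-specific term, a true-intake term (β²σ_T² for the FFQ and the biomarker, σ_T² for the DR), and a random-error term; the variable and function names are illustrative only.

```python
# Recompute the covariate- and calorie-adjusted column of Table III from the
# corresponding Table IV estimates. Names are illustrative, not from the paper.

def decompose(person_specific, true_part, random_error):
    """Return the total variance and each component's share in per cent."""
    total = person_specific + true_part + random_error
    shares = [100.0 * v / total for v in (person_specific, true_part, random_error)]
    return total, shares

sigma2_T = 1092.0                                        # Var(T_ij)
beta_q, beta_m = 0.716, 0.346                            # FFQ and biomarker slopes
sigma2_q, sigma2_r, sigma2_m = 1486.0, 904.0, 104.0      # Var(q_i), Var(r_i), Var(m_i)
sigma2_eq, sigma2_er, sigma2_em = 1023.0, 546.0, 166.0   # within-person random error

ffq = decompose(sigma2_q, beta_q ** 2 * sigma2_T, sigma2_eq)
dr = decompose(sigma2_r, sigma2_T, sigma2_er)            # DR slope fixed at 1
plasma = decompose(sigma2_m, beta_m ** 2 * sigma2_T, sigma2_em)

for name, (total, shares) in (("FFQ", ffq), ("DR", dr), ("Plasma", plasma)):
    print(name, round(total), [round(s) for s in shares])
# FFQ 3069 [48, 18, 33]
# DR 2542 [36, 43, 21]
# Plasma 401 [26, 33, 41]
```

Small discrepancies against Table III elsewhere (for example 1127 versus 1128 for Var(r_i)) are rounding in the published estimates.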

Estimates and standard errors for all the parameters in equations (8) and (14) are given in Table IV. We also computed the standard (λ_RQ) and modified (λ_TQ) regression calibration factors (equations (10), (11), and (19)), for both raw and calorie-adjusted nutrient intakes, with and without adjusting for the other covariates in Table II.

Table IV.

Parameter estimates from models in equations (8) and (14), EPIC-Norfolk study, n=323.

			Raw vitamin C intake				Calorie-adjusted vitamin C intake

			Unadjusted^*		Covariate-adjusted^†		Unadjusted^*		Covariate-adjusted^†

Parameter type	Parameter	Independent variable	Est.±s.e.	p-Value	Est.±s.e.	p-Value	Est.±s.e.	p-Value	Est.±s.e.	p-Value
Intercept	μ_T1	—	90.6±2.8	—	90.6±2.8	—	90.6±2.8	—	90.6±2.8	—
	μ_T2	—	94.8±2.8	—	94.8±2.8	—	94.6±2.9	—	94.6±2.9	—
	α_q1	—	68.0±17.4	—	27.0±98.6	—	67.9±19.2	—	14.3±89.8	—
	α_q2	—	66.5±18.2	—	20.3±100.9	—	66.2±20.0	—	8.2±91.9	—
	α_m1	—	12.9±20.2	—	140.8±44.3	—	22.4±14.5	—	134.9±39.9	—
	α_m2	—	17.9±21.2	—	150.5±45.4	—	28.0±15.1	—	145.3±40.9	—
Regression	β_q	True vit. C intake	0.745±0.189	<0.001	0.721±0.167	<0.001	0.734±0.210	<0.001	0.716±0.181	<0.001
	β_m	True vit. C intake	0.495±0.223	0.026	0.431±0.181	0.017	0.389±0.160	0.015	0.346±0.133	0.009
	γ̰_q	Age (yrs)	—	—	0.99±0.84	0.24	—	—	0.81±0.77	0.29
		Male gender (1=yes/0=no)	—	—	−6.9±8.4	0.41	—	—	−8.0±7.6	0.29
		Height (cm)	—	—	−0.41±0.43	0.34	—	—	−0.25±0.39	0.52
		BMI (kg/m²)	—	—	1.67±0.70	0.018	—	—	1.68±0.64	0.009
		Current smoking	—	—	4.8±11.5	0.68	—	—	−0.6±10.5	0.95
		Past smoking	—	—	−2.8±5.6	0.61	—	—	−4.8±5.1	0.34
		Vitamin C supplement use	—	—	3.8±7.2	0.60	—	—	0.7±6.6	0.92
	γ̰_m	Age (yrs)	—	—	−0.96±0.38	0.011	—	—	−1.01±0.34	0.003
		Male gender (1=yes/0=no)	—	—	−8.9±3.8	0.018	—	—	−10.8±3.4	0.002
		Height (cm)	—	—	−0.22±0.19	0.25	—	—	−0.10±0.17	0.56
		BMI (kg/m²)	—	—	−0.74±0.31	0.019	—	—	−0.81±0.28	0.005
		Current smoking	—	—	−8.3±5.1	0.11	—	—	−10.4±4.6	0.025
		Past smoking	—	—	0.7±2.5	0.77	—	—	0.6±2.3	0.78
		Vitamin C supplement use	—	—	14.1±3.3	<0.001	—	—	13.9±2.9	<0.001
Variance component	σ_S²	—	647±301	—	700±309	—	783±318	—	834±331	—
	σ_D²	—	822±429	—	736±351	—	1168±570	—	1032±462	—
	σ_T²	—	853±369	—	884±356	—	1076±427	—	1092±404	—
	σ_q²	—	1848±428	—	1718±380	—	1652±387	—	1486±332	—
	σ_r²	—	1127±454	—	998±431	—	1045±453	—	904±424	—
	σ_m²	—	154±123	—	77±88	—	188±90	—	104±66	—
	σ_eq²	—	1313±226	—	1348±214	—	967±216	—	1023±185	—
	σ_er²	—	571±230	—	610±193	—	480±286	—	546±235	—
	σ_em²	—	129±73	—	160±50	—	141±64	—	166±45	—
Correlation	ρ_qr	—	0.61±0.10	—	0.62±0.10	—	0.62±0.12	—	0.61±0.11	—
	ρ_T	—	0.52±0.20	—	0.58±0.17	—	0.46±0.23	—	0.53±0.20	—
Deattenuation factor	λ_RQ	—	0.404±0.041	—	0.403±0.041	—	0.472±0.043	—	0.471±0.044	—
			(0.324, 0.484)^‡		(0.322, 0.483)^‡		(0.388, 0.556)^‡		(0.385, 0.556)^‡
	λ_TQ	—	0.175±0.077	—	0.181±0.075	—	0.247±0.095	—	0.255±0.094	—
			(0.073, 0.416)^§		(0.081, 0.406)^§		(0.116, 0.526)^§		(0.124, 0.525)^§


^* Based on equation (8).

^† Based on equation (14) after adjusting for the covariates in Table II.

^‡ 95 per cent CI for λ_RQ.

^§ 95 per cent CI for λ_TQ based on equation (12).

With standard regression calibration based on raw intake, after adjusting for the covariates in Table II, the standard deattenuation factor (λ_RQ) is 0.403±0.041 (95 per cent CI = 0.322, 0.483). However, upon accounting for possibly correlated error between the FFQ and the DR, the modified deattenuation factor (λ_TQ) is 0.181±0.075 (95 per cent CI = 0.081, 0.406), which is much further from 1 and therefore implies a larger correction. For example, if the uncorrected RR for an exposure of interest is 1.2, the deattenuated RR estimate would be 1.2^(1/0.403) = 1.6 with standard regression calibration and 1.2^(1/0.181) = 2.7 after correction for correlated error with modified regression calibration, a substantial difference. The estimated correlation between the systematic errors in FFQ and DR intake (ρ_qr) was 0.62.

After adjusting for calories, both the standard and the modified regression calibration factors increased: λ_RQ = 0.471±0.044 (95 per cent CI = 0.385, 0.556) and λ_TQ = 0.255±0.094 (95 per cent CI = 0.124, 0.525). The corrected RR estimates corresponding to an uncorrected RR of 1.2 were 1.2^(1/0.471) = 1.5 with standard regression calibration and 1.2^(1/0.255) = 2.0 with modified regression calibration, still a substantial difference. The degree of correlated error remained about the same after caloric adjustment (ρ_qr = 0.61). Both the modified regression calibration factor (λ_TQ) and the estimated degree of correlated error (ρ_qr) were also about the same in the unadjusted and covariate-adjusted analyses.
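The arithmetic in the two preceding paragraphs can be reproduced in a few lines. This sketch assumes that the corrected log relative risk is the observed log RR divided by the calibration factor (so the corrected RR is RR^(1/λ)), and that the CI for λ is formed on the log scale with SE[ln(λ̂)] ≈ s.e.(λ̂)/λ̂, in line with Appendix A; the function names are hypothetical.

```python
import math

def deattenuate_rr(rr_observed, lam):
    """Corrected RR = RR_obs ** (1 / lam): the log RR is divided by lam."""
    return rr_observed ** (1.0 / lam)

def log_scale_ci(lam_hat, se_lam, z=1.96):
    """CI for lambda formed on the log scale.

    Uses SE[ln(lam_hat)] ~= se_lam / lam_hat (delta method), then
    exponentiates ln(lam_hat) -/+ z * SE.
    """
    se_log = se_lam / lam_hat
    centre = math.log(lam_hat)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)

for label, lam in (("standard, raw", 0.403), ("modified, raw", 0.181),
                   ("standard, calorie-adjusted", 0.471),
                   ("modified, calorie-adjusted", 0.255)):
    print(f"{label}: corrected RR = {deattenuate_rr(1.2, lam):.1f}")
# corrected RRs: 1.6, 2.7, 1.5, 2.0

print(log_scale_ci(0.255, 0.094))  # approximately (0.124, 0.525)
```

Because λ enters as an exponent 1/λ, a factor of 0.181 rather than 0.403 more than doubles the excess log risk after correction.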

We also estimated γ̰_q and γ̰_m in (14) by using the methods in equation (20) for both raw and calorie-adjusted vitamin C intakes. We see that for calorie-adjusted intake, there was a significant association between BMI and FFQ vitamin C intake even after controlling for true intake (γ_q = 1.68±0.64, p = 0.009). This implies that heavier people tend to systematically report higher levels of FFQ vitamin C intake than lighter people conditional on true intake. No other covariates were significantly associated with FFQ reported intake conditional on true intake. Regarding plasma vitamin C, there were significant effects of age (γ_m = −1.01±0.34, p = 0.003), male gender (γ_m = −10.8±3.4, p = 0.002), BMI (γ_m = −0.81±0.28, p = 0.005), current smoking (γ_m = −10.4±4.6, p = 0.025), and vitamin C supplement use (γ_m = 13.9±2.9, p<0.001). Hence, older individuals, males, heavier individuals, and current smokers had lower levels of plasma vitamin C, whereas vitamin C supplement users had higher levels of plasma vitamin C, conditional on true intake. Results were similar when raw intake was used instead of calorie-adjusted intake.

4. SIMULATION STUDY

We performed simulation studies to assess the bias and coverage probability of our estimator λ̂_TQ and its confidence limits, as given in equations (10) and (12). In addition, we computed the C statistic, given by

C = \frac{\sum_{i = 1}^{4000} [{\hat{λ}}_{TQ}^{(i)} - {\bar{λ}}_{TQ}]^{2} / 3999}{\sum_{i = 1}^{4000} \hat{Var} ({\hat{λ}}_{TQ}^{(i)}) / 4000}

where λ̄_TQ is the mean of the 4000 estimates, to assess the validity of the variance estimate of λ̂_TQ given in equation (A2); values of C near 1 indicate that the estimated variance agrees with the empirical variance. We chose sample sizes of 100 and 350, the latter approximately mimicking the sample size in our example (n = 323). For each of the 36 parameter combinations of ρ_T, ρ_qr, and λ_TQ, we performed 4000 simulations. The detailed simulation design is as follows for each of i = 1, … , n subjects:

1. We generated q_i from an $N (0, σ_{q}^{2})$ distribution.
2. We generated r_i|q_i from an $N [ρ_{qr} q_{i}, σ_{r}^{2} (1 - ρ_{qr}^{2})]$ distribution (this gives Corr(q_i, r_i) = ρ_qr because σ_q² = σ_r² in all designs considered).
3. We generated m_i from an $N (0, σ_{m}^{2})$ distribution.
4. We generated (T_i1, T_i2) from an N(μ_T, Σ_T) distribution where μ_T = (μ_T1, μ_T2), $Σ_{T, 11} = Σ_{T, 22} = σ_{T}^{2}, Σ_{T, 12} = Σ_{T, 21} = ρ_{T} σ_{T}^{2}$.
5. We generated Q_ij from an $N (α_{qj} + q_{i} + β_{q} T_{ij}, σ_{eq}^{2})$ distribution; j = 1, 2.
6. We generated R_ij from an $N (r_{i} + T_{ij}, σ_{er}^{2})$ distribution; j = 1, 2.
7. We generated M_ij from an $N (α_{mj} + m_{i} + β_{m} T_{ij}, σ_{em}^{2})$ distribution; j = 1, 2.
8. We then computed λ̂_TQ from equation (10).
9. Furthermore, we computed the 95 per cent CI for λ_TQ based on equation (12) and obtained the estimated coverage probability as the proportion of 95 per cent CIs that included the true value of λ_TQ.
10. Finally, we used the C statistic to compare the empirical variance of λ̂_TQ over 4000 simulations for each combination of parameters with the theoretical variance of λ̂_TQ given by the average of $\hat{Var} ({\hat{λ}}_{TQ}^{(i)}) = {\hat{λ}}_{TQ}^{2} Var [ln ({\hat{λ}}_{TQ})]$ in equation (A2) over 4000 simulations.
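The C statistic used in step 10 is a simple ratio once the replicate estimates and their variance estimates are in hand. A minimal helper, assuming both are available as lists (the function name and toy numbers are hypothetical, not taken from Table V):

```python
from statistics import fmean, variance

def c_statistic(estimates, variance_estimates):
    """C = empirical variance of lambda-hat / mean estimated variance.

    The numerator uses divisor N - 1 (3999 for 4000 replicates) and the
    denominator averages the per-replicate variance estimates (divisor N).
    """
    if len(estimates) != len(variance_estimates) or len(estimates) < 2:
        raise ValueError("need two equal-length lists with at least 2 replicates")
    return variance(estimates) / fmean(variance_estimates)

# Toy numbers only (four replicates)
toy_est = [0.30, 0.35, 0.40, 0.33]
toy_var = [0.0012] * 4
print(round(c_statistic(toy_est, toy_var), 3))  # 1.472
```

A value above 1, as here, would mean the variance formula understates the replicate-to-replicate spread; values well below 1 correspond to the conservative intervals seen in the extreme designs of Table V.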

The simulation strategy in steps 1–10 was based on the following parameter values: α_q1 = 0, α_q2 = 1, β_q = β_m = 1, μ_T1 = 100, μ_T2 = 110, α_m1 = 0, α_m2 = 1, $σ_{q}^{2} = σ_{r}^{2} = σ_{m}^{2} = σ_{eq}^{2} = σ_{er}^{2} = σ_{em}^{2} = 1$, ρ_T = (0.2, 0.5, 0.8), ρ_qr = (0, 0.3, 0.6, 0.9), λ_TQ = (1/3, 2/3, 9/10) with $σ_{T}^{2} = 2 λ_{TQ} / (1 - λ_{TQ})$, and n = (100, 350), with 4000 simulations run for each parameter combination. With these values, Var(Q_ij) = σ_q² + β_q²σ_T² + σ_eq² = 2 + σ_T², so that λ_TQ = β_qσ_T²/Var(Q_ij) = σ_T²/(2 + σ_T²) equals the stated value of λ_TQ. The results are shown in Table V.
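Steps 1 to 8 for a single design can be sketched as follows. This is an illustration, not the authors' code: it uses only the Python standard library, draws r_i from the conditional distribution in step 2 (valid here because σ_q² = σ_r² = 1), and computes λ̂_TQ in the product form B₁₂C₂₃/(B₂₃C₁₁) that appears in (A1) and (A2) of Appendix A, with covariances using divisor n. It runs 200 rather than 4000 replicates and omits the CI and C-statistic steps.

```python
import random
from statistics import fmean

def cov(x, y):
    """Covariance with divisor n, as in the moment estimators of Appendix A."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def one_replicate(n, lam, rho_t, rho_qr, rng):
    """Steps 1-8 for one simulated study; returns lambda-hat_TQ."""
    s_t = (2 * lam / (1 - lam)) ** 0.5        # sigma_T, from sigma_T^2 = 2*lam/(1-lam)
    mu_t, alpha = (100.0, 110.0), (0.0, 1.0)  # mu_Tj, and alpha_qj = alpha_mj
    Q, R, M = ([], []), ([], []), ([], [])
    for _ in range(n):
        q = rng.gauss(0, 1)                                          # step 1
        r = rho_qr * q + (1 - rho_qr ** 2) ** 0.5 * rng.gauss(0, 1)  # step 2
        m = rng.gauss(0, 1)                                          # step 3
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)                    # step 4
        t = (mu_t[0] + s_t * z1,
             mu_t[1] + s_t * (rho_t * z1 + (1 - rho_t ** 2) ** 0.5 * z2))
        for j in range(2):                                           # steps 5-7
            Q[j].append(alpha[j] + q + t[j] + rng.gauss(0, 1))       # beta_q = 1
            R[j].append(r + t[j] + rng.gauss(0, 1))
            M[j].append(alpha[j] + m + t[j] + rng.gauss(0, 1))       # beta_m = 1
    # Step 8: lambda-hat_TQ = B12 * C23 / (B23 * C11)
    qd = [b - a for a, b in zip(*Q)]
    rd = [b - a for a, b in zip(*R)]
    md = [b - a for a, b in zip(*M)]
    c23 = (cov(M[0], R[0]) + cov(M[1], R[1])) / 2
    c11 = (cov(Q[0], Q[0]) + cov(Q[1], Q[1])) / 2
    return cov(qd, rd) * c23 / (cov(md, rd) * c11)

rng = random.Random(2008)
sims = [one_replicate(350, 2 / 3, 0.5, 0.6, rng) for _ in range(200)]
print(round(fmean(sims), 3))  # should be near the true value 0.667
```

The systematic errors q_i, r_i, and m_i cancel in the within-person differences, which is why the correlation ρ_qr does not bias this estimator.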

Table V.

Simulation results for modified regression calibration approach, 4000 simulations per design.

		ρ_qr

		0			0.3			0.6			0.9

λ_TQ	ρ_T	Mean±s.d. (range)	C (N_SIM)	Coverage (per cent)	Mean±s.d. (range)	C (N_SIM)	Coverage (per cent)	Mean±s.d. (range)	C (N_SIM)	Coverage (per cent)	Mean±s.d. (range)	C (N_SIM)	Coverage (per cent)
(a) n=350
0.9	0.2	0.900±0.029	1.03	94.3	0.900±0.027	1.02	94.6	0.900±0.025	1.01	94.9	0.900±0.023	1.00	94.9
		(0.798,1.005)	(4000)		(0.804,1.001)	(4000)		(0.810,0.998)	(4000)		(0.817,0.991)	(4000)
	0.5	0.900±0.034	1.04	94.2	0.900±0.032	1.04	94.3	0.900±0.031	1.03	94.3	0.900±0.029	1.02	94.7
		(0.786,1.025)	(4000)		(0.792,1.023)	(4000)		(0.799,1.019)	(4000)		(0.807,1.012)	(4000)
	0.8	0.900±0.049	1.06	94.2	0.900±0.048	1.05	94.2	0.900±0.046	1.05	94.3	0.900±0.045	1.05	94.5
		(0.728,1.080)	(4000)		(0.732,1.072)	(4000)		(0.738,1.065)	(4000)		(0.746,1.065)	(4000)
0.667	0.2	0.667±0.046	1.03	94.5	0.667±0.044	1.02	94.6	0.667±0.042	1.01	94.5	0.667±0.040	1.00	94.9
		(0.515,0.865)	(4000)		(0.522,0.862)	(4000)		(0.529,0.855)	(4000)		(0.536,0.842)	(4000)
	0.5	0.668±0.056	1.03	94.3	0.668±0.054	1.03	94.4	0.668±0.052	1.02	94.5	0.668±0.050	1.01	94.7
		(0.483,0.920)	(4000)		(0.498,0.917)	(4000)		(0.513,0.910)	(4000)		(0.513,0.896)	(4000)
	0.8	0.671±0.095	1.03	95.0	0.671±0.093	1.02	94.8	0.671±0.092	1.02	95.0	0.671±0.090	1.01	95.0
		(0.390,1.084)	(4000)		(0.395,1.078)	(4000)		(0.399,1.089)	(4000)		(0.399,1.401)	(4000)
0.333	0.2	0.335±0.054	1.01	95.1	0.335±0.053	1.00	95.3	0.335±0.052	0.99	95.5	0.335±0.052	0.99	95.6
		(0.160,0.577)	(4000)		(0.177,0.575)	(4000)		(0.185,0.567)	(4000)		(0.190,0.550)	(4000)
	0.5	0.338±0.072	0.98	95.2	0.338±0.071	0.98	95.5	0.338±0.070	0.98	95.5	0.338±0.069	0.97	95.8
		(0.125,0.699)	(4000)		(0.139,0.688)	(4000)		(0.156,0.691)	(4000)		(0.150,0.708)	(4000)
	0.8^*	0.360±0.150	0.55	99.0	0.359±0.149	0.55	99.1	0.360±0.148	0.54	99.0	0.360±0.148	0.54	98.9
		(0.111,0.986)	(3844)		(0.111,0.992)	(3845)		(0.111,0.999)	(3840)		(0.111,0.989)	(3839)
(b) n=100
0.9	0.2	0.901±0.053	0.98	94.0	0.901±0.050	0.97	94.0	0.901±0.047	0.97	93.8	0.901±0.044	0.95	94.1
		(0.699,1.093)	(4000)		(0.709,1.080)	(4000)		(0.720,1.062)	(4000)		(0.736,1.044)	(4000)
	0.5	0.901±0.063	1.01	93.9	0.901±0.060	1.01	93.9	0.901±0.057	1.01	94.0	0.901±0.054	1.00	94.2
		(0.679,1.159)	(4000)		(0.689,1.144)	(4000)		(0.701,1.124)	(4000)		(0.708,1.095)	(4000)
	0.8	0.903±0.091	1.03	94.0	0.903±0.089	1.03	93.9	0.902±0.086	1.03	94.1	0.902±0.083	1.03	94.4
		(0.615,1.306)	(4000)		(0.621,1.289)	(4000)		(0.631,1.278)	(4000)		(0.630,1.275)	(4000)
0.667	0.2	0.669±0.086	1.03	94.2	0.669±0.083	1.03	94.2	0.669±0.080	1.03	94.2	0.668±0.076	1.03	94.1
		(0.394,1.044)	(4000)		(0.396,1.022)	(4000)		(0.394,0.987)	(4000)		(0.381,0.991)	(4000)
	0.5	0.671±0.107	1.03	94.1	0.671±0.104	1.03	94.3	0.670±0.100	1.03	94.4	0.670±0.096	1.03	94.4
		(0.348,1.206)	(4000)		(0.338,1.178)	(4000)		(0.324,1.135)	(4000)		(0.309,1.083)	(4000)
	0.8	0.685±0.190	0.96	95.5	0.684±0.188	0.96	95.5	0.683±0.185	0.95	95.9	0.683±0.182	0.95	95.9
		(0.088,2.084)	(4000)		(0.085,2.094)	(4000)		(0.082,2.107)	(4000)		(0.078,2.125)	(4000)
0.333	0.2	0.340±0.108	0.99	94.7	0.340±0.107	1.00	94.9	0.339±0.106	1.01	95.1	0.339±0.104	1.00	94.8
		(0.025,0.969)	(4000)		(0.022,1.050)	(4000)		(0.020,1.125)	(4000)		(0.017,1.186)	(4000)
	0.5	0.355±0.182	0.15	96.4	0.354±0.184	0.13	96.6	0.354±0.186	0.12	96.7	0.354±0.188	0.11	96.8
		(0.022,5.035)	(3999)		(0.022,5.447)	(3999)		(0.022,5.830)	(3999)		(0.015,6.16)	(3999)
	0.8^*	0.382±0.207	0.26	99.4	0.381±0.204	0.26	99.8	0.382±0.204	0.26	99.5	0.382±0.204	0.24	99.6
		(0.111,0.999)	(2917)		(0.111,0.999)	(2904)		(0.112,0.999)	(2908)		(0.111,0.999)	(2913)


^* Restricted to $\frac{1}{9} \leq {\hat{λ}}_{TQ} \leq 1.0$.

In the case of n = 350 (Table V(a)), the bias is minimal for 32 of the 36 designs (the first eight rows of Table V(a)). For these designs, the C statistic ranges from 0.97 to 1.06 and the coverage probability from 94.2 to 95.8 per cent, compared with the nominal level of 95 per cent. The one exception is the case where λ_TQ = 1/3 and ρ_T = 0.8 (ninth row of Table V(a)), where both the point estimate λ̂_TQ and its estimated variance Var(λ̂_TQ) become large if ρ̂_T is close to 1. This results in a slightly upward-biased estimate of λ_TQ (mean 0.359 to 0.360 versus the true value 0.333) and wide confidence limits (coverage probability 98.9 to 99.1 per cent; C statistic 0.54 to 0.55). To reduce variation, we restricted λ̂_TQ to the interval $[\frac{1}{9}, 1.0]$, which was satisfied in 96 per cent of simulations. This reduced the problem but did not eliminate it. A larger validation study is likely needed to estimate λ_TQ accurately in this setting; alternatively, one can use the bootstrap instead of the large-sample confidence limits in equation (12). In our example, λ̂_TQ was 0.25 and ρ̂_T was 0.53, which is less extreme than this aberrant situation.

In the case of n = 100 (Table V(b)), the coverage probability ranges from 93.8 to 95.9 per cent and the C statistic from 0.95 to 1.03 in the first seven rows of the table. The procedure behaves badly in the extreme cases where λ_TQ = 0.333 and ρ_T = 0.5 or 0.8, with coverage probabilities that are too large (96.4 to 99.8 per cent) and C statistics well below 1 (0.11 to 0.26). The number of simulations for some parameter combinations is less than 4000 because of negative variance estimates for ln λ̂_TQ in equation (A2) in some simulated samples (particularly for n = 100) and, in the rows marked ^*, because of the restriction on the range of λ̂_TQ.

5. DISCUSSION

We have presented an extension of the standard regression calibration model that allows for the presence of correlated error between a surrogate instrument (Q) and a gold standard instrument (R). Fitting this model requires longitudinal data for Q, R, and a biomarker (M) over a comparable time period t that is long enough for a meaningful change in dietary intake to be possible, a change that is correlated, albeit imperfectly, with a change in the associated biomarker. A notable feature of this approach is that possible between-person variation in the biomarker (m_i) among people with the same dietary intake is accounted for, although it is assumed to be uncorrelated with the systematic errors in Q (q_i) and R (r_i). In addition, true intake (T_ij) for individual subjects is allowed to change over time. Furthermore, since changes in other covariates (Z) may influence changes in Q, R, and M, an extension of the approach is presented that allows one to control for changes in one or more covariates (Z). Maximum likelihood estimates of the model parameters can be obtained with standard software. A formula for the standard error of the modified regression calibration factor (λ_TQ) is given in Appendix A (a SAS macro that provides estimates and standard errors of all model parameters is available at http://www.geocities.com/bernardrosner/Channing.html).

We applied these methods to the assessment of measurement error in dietary vitamin C intake among 323 subjects in the EPIC-Norfolk study, who provided dietary vitamin C intake data from both the FFQ and a 7-day DR, as well as a plasma vitamin C sample, on two occasions 4 years apart. These analyses revealed substantial correlated error between the FFQ and the DR (ρ_qr ≅ 0.61). Thus, for an uncorrected calorie-adjusted RR of 1.2, we obtain measurement-error-corrected RRs of 1.5 and 2.0 with the standard and modified regression calibration approaches, respectively, a substantial difference. We also performed an extensive simulation study, which indicated that for most parameter combinations, the estimator of λ_TQ in equation (10) and the corresponding large-sample confidence limits in equation (12) performed well for validation study sample sizes of 350 and 100 subjects. For some extreme designs, coverage probabilities exceeded 0.95 (up to 0.998), resulting in conservative inferences. In this simulation study, the proportions of variation due to random error in the FFQ (Var(e_Qij)/Var(Q_ij)), DR (Var(e_Rij)/Var(R_ij)), and biomarker (Var(e_Mij)/Var(M_ij)) were equal to one another, at (1 − λ_TQ)/2, which is $\frac{1}{3}$ for λ_TQ = 1/3. This value was similar to the observed proportions in the EPIC-Norfolk study data (0.33, 0.21, and 0.41, respectively) based on calorie-adjusted intake. Additional simulations could be performed with varying proportions due to random error to assess the quality of the estimator λ̂_TQ under different conditions.

In the EPIC-Norfolk data set, plasma vitamin C was much more highly correlated with calorie-adjusted vitamin C intake from the DR than with calorie-adjusted vitamin C intake from the FFQ, both cross-sectionally and longitudinally. However, the FFQ estimates average intake over the past year, whereas the DR estimates intake over 1 week. Since plasma vitamin C was measured at about the same time as the DR, this may explain why it was more closely correlated with the DR than with the FFQ. We also looked at the correlation between plasma vitamin C at baseline and calorie-adjusted intake at year 4 from the FFQ (ρ = 0.19) and from the DR (ρ = 0.26) (data not shown). The difference between these correlations appears narrower than that between the corresponding baseline cross-sectional correlations (FFQ baseline intake vs plasma vitamin C at baseline, ρ = 0.25; DR baseline intake vs plasma vitamin C at baseline, ρ = 0.40), reflecting the point that the FFQ estimates intake over a longer period of time and suggesting that the FFQ and DR may have similar validity as measures of long-term intake. Because the DR and biomarker were collected in close proximity both at baseline and at year 4, this proximity would also tend to overstate the validity of change in vitamin C intake assessed by the DR relative to change assessed by the FFQ.

An assumption of the model in equations (8) and (14) is that the random effects q_i, r_i, and m_i remain the same over time for each individual. Hence, errors in the estimates of changes in Q and R are assumed to be independent conditional on the change in true intake (equation (9)) and on changes in other covariates (equation (17)). This assumption could be examined if independent information were available on one of the parameters, for example β_m, from a separate calibration experiment. It is more likely to hold if the time interval between repeat measurements is short, yet long enough for true change in diet to be possible. Of course, other covariates (Z_ij) may also change over time and may be associated with q_i, r_i, and m_i in equation (8). However, the ability to control for change in Z_ij (equation (14)) makes q_i, r_i, and m_i interpretable conditional on Z_ij and makes the assumption of homogeneity over time more reasonable.

In addition, we assume that Var(T_ij) remains constant over time, while allowing E(T_ij) to vary. If the former assumption is relaxed, one obtains separate regression calibration factors at visits 1 and 2 (λ_TQ,1, λ_TQ,2), which can be estimated by

\begin{matrix} {\hat{λ}}_{TQ, 1} = \hat{Cov} (M_{i 1}, R_{i 1}) \hat{Cov} (Q_{di}, R_{di}) / [\hat{Cov} (M_{di}, R_{di}) \hat{Var} (Q_{i 1})] \\ {\hat{λ}}_{TQ, 2} = \hat{Cov} (M_{i 2}, R_{i 2}) \hat{Cov} (Q_{di}, R_{di}) / [\hat{Cov} (M_{di}, R_{di}) \hat{Var} (Q_{i 2})] \end{matrix}

The delta method can also be used to obtain confidence limits for λ_TQ,1 and λ_TQ,2, using methods similar to those given in Appendix A. A possible future extension might test the homogeneity of λ_TQ across visits. In the EPIC data set, Var(Q_ij), Var(R_ij), and Var(M_ij) remained relatively constant over time (Table I). Furthermore, based on the EPIC data, λ̂_TQ,1 = 0.281±0.103 (95 per cent CI = 0.137, 0.578) and λ̂_TQ,2 = 0.225±0.076 (95 per cent CI = 0.116, 0.436) for calorie-adjusted intake, indicating relative homogeneity of λ_TQ over the two visits. Finally, previous literature should be consulted to ensure that all relevant confounders are included in Z_ij in equation (14).
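The visit-specific estimators λ̂_TQ,1 and λ̂_TQ,2 displayed above are simple functions of sample covariances. A Python sketch, assuming each instrument's data are stored as a pair of equal-length lists (visit 1, visit 2) and using divisor n; the demonstration data are simulated under hypothetical parameter values in which Var(T_ij) is constant over visits, so both factors should be close to 4/6:

```python
import random
from statistics import fmean

def cov(x, y):
    """Covariance with divisor n."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def visit_specific_lambdas(Q, R, M):
    """(lambda-hat_TQ,1, lambda-hat_TQ,2); Q, R, M are (visit-1, visit-2) lists."""
    qd = [b - a for a, b in zip(*Q)]
    rd = [b - a for a, b in zip(*R)]
    md = [b - a for a, b in zip(*M)]
    slope_ratio = cov(qd, rd) / cov(md, rd)   # estimates beta_q / beta_m
    return tuple(cov(M[j], R[j]) * slope_ratio / cov(Q[j], Q[j]) for j in (0, 1))

def demo_data(n, rng):
    """Hypothetical data: beta_q = beta_m = 1, Var(T_ij) = 4 at both visits,
    rho_T = 0.5, rho_qr = 0.6, unit error variances, so lambda_TQ = 4/6."""
    Q, R, M = ([], []), ([], []), ([], [])
    for _ in range(n):
        q = rng.gauss(0, 1)
        r = 0.6 * q + 0.8 * rng.gauss(0, 1)
        m = rng.gauss(0, 1)
        t1 = rng.gauss(0, 2)
        t2 = 0.5 * t1 + rng.gauss(0, 3 ** 0.5)
        for j, t in enumerate((t1, t2)):
            Q[j].append(q + t + rng.gauss(0, 1))
            R[j].append(r + t + rng.gauss(0, 1))
            M[j].append(m + t + rng.gauss(0, 1))
    return Q, R, M

Q, R, M = demo_data(4000, random.Random(11))
lam1, lam2 = visit_specific_lambdas(Q, R, M)
print(round(lam1, 3), round(lam2, 3))  # both should be near 0.667
```

The ratio Ĉov(Q_di, R_di)/Ĉov(M_di, R_di) is shared by both visits; only the visit-specific Ĉov(M_ij, R_ij) and V̂ar(Q_ij) differ, so the two factors diverge only when the per-visit variances do.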

The traditional goal of regression calibration is to obtain the regression coefficient of true intake (T_ij) on surrogate intake (Q_ij) based on corresponding dietary assessments at one point in time. Our example should be interpreted as providing estimates when true intake is conceptually relatively short term and biomarkers and nutrient intake are assessed at approximately the same time. However, since cumulative intake over long periods of time is likely to be more strongly associated with some diseases of interest, we should also consider the regression coefficient of μ_Ti on Q_ij, where μ_Ti is the true intake for subject i over a long period of time. Estimating this regression coefficient requires either more than two repeated measures or some assumptions regarding the time series structure of true intake [i.e. Corr(T_i1, T_i2) as a function of the time interval t between the two assessments]. If one assumes a first-order autoregressive model for T_ij, one can extend equation (8) to estimate this long-term regression coefficient. Similarly, since an average of several FFQs over a long period of time is likely to provide a closer approximation to true intake than a single FFQ, one can also consider Corr(μ_Qi, μ_Ti), where μ_Qi is the average FFQ intake over long periods of time. These extensions to measurement error correction of long-term intake are a subject for future work.

ACKNOWLEDGEMENTS

We acknowledge the support of the National Cancer Institute CA50597 in performing this work.

APPENDIX A: CONFIDENCE LIMITS FOR THE ALTERNATIVE REGRESSION CALIBRATION FACTOR (λ_TQ) AND THE REGRESSION COEFFICIENTS β_q AND β_m

Since the sampling distribution of λ̂_TQ is likely to be skewed in small samples, we work on the log scale and consider

Var [ln ({\hat{λ}}_{TQ})] = Var \{ln [\hat{Cov} (Q_{di}, R_{di})] + ln [\hat{Cov} (M_{ij}, R_{ij})] - ln [\hat{Cov} (M_{di}, R_{di})] - ln [\hat{Var} (Q_{ij})]\}

(A1)

It will be advantageous for the evaluation of equation (A1) as well as the estimation of variances for the other parameters in Appendix B to define X_ijk as the value of the kth variable at time j for the ith subject, where i = 1, … , n, j = 1, 2, and k = 1,2,3 denote Q, R, and M, respectively, and let

{\bar{X}}_{ik} = (X_{i 1 k} + X_{i 2 k}) / 2, X_{idk} = X_{i 2 k} - X_{i 1 k}, i = 1, \dots, n, k = 1, \dots, 3

${\bar{X}}_{k} = \sum_{i = 1}^{n} {\bar{X}}_{ik} / n, k = 1, \dots, 3, {\bar{X}}_{dk} = \sum_{i = 1}^{n} X_{idk} / n, k = 1, \dots, 3, {\bar{X}}_{jk} = \sum_{i = 1}^{n} X_{ijk} / n, j = 1, 2, k = 1, \dots, 3$ . We also define

A_{kl} = Cov ({\bar{X}}_{ik}, {\bar{X}}_{il}), B_{kl} = Cov (X_{idk}, X_{idl})

and

C_{kl} = [Cov (X_{i 1 k}, X_{i 1 l}) + Cov (X_{i 2 k}, X_{i 2 l})] / 2, k, l = 1, \dots, 3

We will see that variances of λ̂_TQ as well as all the parameters in Appendix B can be expressed in terms of Cov(A_k₁l₁, A_k₂l₂), Cov(B_k₁l₁, B_k₂l₂), Cov(C_k₁l₁, C_k₂l₂), and Cov(A_k₁l₁, C_k₂l₂), k₁, k₂, l₁, l₂ = 1, … , 3. We note that Cov(A_k₁l₁, B_k₂l₂) = 0, k₁, k₂, l₁, l₂ = 1, … ,3, and Cov(B_k₁l₁, C_k₂l₂) = 0, k₁, k₂, l₁, l₂ = 1, … , 3.

We can then express equation (A1) as Var[ln(λ̂_TQ)] = Var[ln(B₁₂) + ln(C₂₃) − ln(B₂₃) − ln(C₁₁)], which, using the delta method and the fact that Cov(B_k₁l₁, C_k₂l₂) = 0, is given by

Var [ln ({\hat{λ}}_{TQ})] = Var (B_{12}) / B_{12}^{2} + Var (C_{23}) / C_{23}^{2} + Var (B_{23}) / B_{23}^{2} + Var (C_{11}) / C_{11}^{2} - 2 Cov (B_{12}, B_{23}) / (B_{12} B_{23}) - 2 Cov (C_{11}, C_{23}) / (C_{11} C_{23})

(A2)

We have upon some algebra that

\begin{matrix} Cov (A_{k_{1} l_{1}}, A_{k_{2} l_{2}}) = [\sum_{i = 1}^{n} ({\bar{X}}_{{ik}_{1}} - {\bar{X}}_{k_{1}}) ({\bar{X}}_{{ik}_{2}} - {\bar{X}}_{k_{2}}) ({\bar{X}}_{{il}_{1}} - {\bar{X}}_{l_{1}}) ({\bar{X}}_{{il}_{2}} - {\bar{X}}_{l_{2}}) / n - A_{k_{1} l_{1}} A_{k_{2} l_{2}}] / n \\ Cov (B_{k_{1} l_{1}}, B_{k_{2} l_{2}}) = [\sum_{i = 1}^{n} (X_{{idk}_{1}} - {\bar{X}}_{{dk}_{1}}) (X_{{idk}_{2}} - {\bar{X}}_{{dk}_{2}}) (X_{{idl}_{1}} - {\bar{X}}_{{dl}_{1}}) (X_{{idl}_{2}} - {\bar{X}}_{{dl}_{2}}) / n - B_{k_{1} l_{1}} B_{k_{2} l_{2}}] / n \\ Cov (C_{k_{1} l_{1}}, C_{k_{2} l_{2}}) = [\sum_{i = 1}^{n} \{\sum_{j = 1}^{2} (X_{{ijk}_{1}} - {\bar{X}}_{{jk}_{1}}) (X_{{ijl}_{1}} - {\bar{X}}_{{jl}_{1}})\} \{\sum_{j = 1}^{2} (X_{{ijk}_{2}} - {\bar{X}}_{{jk}_{2}}) (X_{{ijl}_{2}} - {\bar{X}}_{{jl}_{2}})\} / (4 n) - C_{k_{1} l_{1}} C_{k_{2} l_{2}}] / n \\ Cov (A_{k_{1} l_{1}}, C_{k_{2} l_{2}}) = [\sum_{i = 1}^{n} ({\bar{X}}_{{ik}_{1}} - {\bar{X}}_{k_{1}}) ({\bar{X}}_{{il}_{1}} - {\bar{X}}_{l_{1}}) \{\sum_{j = 1}^{2} (X_{{ijk}_{2}} - {\bar{X}}_{{jk}_{2}}) (X_{{ijl}_{2}} - {\bar{X}}_{{jl}_{2}})\} / (2 n) - A_{k_{1} l_{1}} C_{k_{2} l_{2}}] / n \end{matrix}

(A3)

Upon combining (A2) and (A3) we obtain Var[ln(λ̂_TQ)]. To obtain a 100 per cent × (1 − α) CI for λ_TQ, we compute [exp(c₁), exp(c₂)], where (c₁, c₂) = ln(λ̂_TQ) ± z_{1 − α/2}{Var[ln(λ̂_TQ)]}^{1/2} and z_p is the pth percentile of the standard normal distribution.
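Equations (A2) and (A3) translate directly into code. The sketch below is one possible Python implementation, not the authors' SAS macro: it follows (A2) in setting the Cov(B, C) terms to 0, uses divisor n throughout, and is applied to hypothetical simulated data with true λ_TQ = 2/3. It raises an error rather than returning an interval when λ̂_TQ or the estimated Var[ln(λ̂_TQ)] is not positive, the situation noted for some small simulated samples in Section 4.

```python
import math
import random
from statistics import fmean

def centred(values):
    """Subtract the sample mean."""
    m = fmean(values)
    return [v - m for v in values]

def lambda_tq_with_ci(Q, R, M, z=1.96):
    """lambda-hat_TQ = B12*C23/(B23*C11) with a log-scale CI from (A2)-(A3).

    Q, R, M are (visit-1 list, visit-2 list); indices 0, 1, 2 stand for the
    paper's k = 1 (Q), 2 (R), 3 (M).
    """
    X = (Q, R, M)
    n = len(Q[0])
    d = [centred([b - a for a, b in zip(*x)]) for x in X]   # X_idk - mean over i
    e = [[centred(x[j]) for j in (0, 1)] for x in X]        # X_ijk - mean over i

    def b_terms(k, l):  # per-subject terms averaging to B_kl
        return [d[k][i] * d[l][i] for i in range(n)]

    def c_terms(k, l):  # per-subject terms averaging to C_kl
        return [(e[k][0][i] * e[l][0][i] + e[k][1][i] * e[l][1][i]) / 2
                for i in range(n)]

    def cov_means(u, v):  # (A3): [sum_i u_i * v_i / n - mean(u) * mean(v)] / n
        return (fmean([a * b for a, b in zip(u, v)]) - fmean(u) * fmean(v)) / n

    b12, b23 = b_terms(0, 1), b_terms(1, 2)
    c23, c11 = c_terms(1, 2), c_terms(0, 0)
    B12, B23, C23, C11 = fmean(b12), fmean(b23), fmean(c23), fmean(c11)
    lam = B12 * C23 / (B23 * C11)
    var_log = (cov_means(b12, b12) / B12 ** 2 + cov_means(c23, c23) / C23 ** 2
               + cov_means(b23, b23) / B23 ** 2 + cov_means(c11, c11) / C11 ** 2
               - 2 * cov_means(b12, b23) / (B12 * B23)
               - 2 * cov_means(c11, c23) / (C11 * C23))       # equation (A2)
    if lam <= 0 or var_log <= 0:
        raise ValueError("log-scale interval undefined for this sample")
    half = z * math.sqrt(var_log)
    return lam, math.exp(math.log(lam) - half), math.exp(math.log(lam) + half)

def demo_data(n, rng):
    """Hypothetical data from model (8): beta_q = beta_m = 1, Var(T_ij) = 4,
    rho_T = 0.5, rho_qr = 0.6, unit error variances, so lambda_TQ = 4/6."""
    Q, R, M = ([], []), ([], []), ([], [])
    for _ in range(n):
        q = rng.gauss(0, 1)
        r = 0.6 * q + 0.8 * rng.gauss(0, 1)
        m = rng.gauss(0, 1)
        t1 = rng.gauss(0, 2)
        t2 = 0.5 * t1 + rng.gauss(0, 3 ** 0.5)
        for j, t in enumerate((t1, t2)):
            Q[j].append(q + t + rng.gauss(0, 1))
            R[j].append(r + t + rng.gauss(0, 1))
            M[j].append(m + t + rng.gauss(0, 1))
    return Q, R, M

lam, lo, hi = lambda_tq_with_ci(*demo_data(1000, random.Random(5)))
print(round(lam, 3), round(lo, 3), round(hi, 3))
```

Each covariance in (A3) is computed here as the sample covariance of per-subject contributions divided by n, which is what the bracketed expressions in (A3) reduce to.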

In addition, we can obtain standard errors and CIs for each of the estimated parameters in Appendix B using the delta method as follows:

Var ({\hat{σ}}_{S}^{2}) = {\hat{σ}}_{S}^{4} [4 Var (A_{23}) / A_{23}^{2} + Var (B_{12}) / B_{12}^{2} + Var (A_{13}) / A_{13}^{2} + Var (B_{23}) / B_{23}^{2} - 4 Cov (A_{13}, A_{23}) / (A_{13} A_{23}) - 2 Cov (B_{12}, B_{23}) / (B_{12} B_{23})]

(A4)

Var ({\hat{σ}}_{D}^{2}) = {\hat{σ}}_{D}^{4} [Var (B_{12}) / B_{12}^{2} + Var (B_{23}) / B_{23}^{2} + Var (B_{13}) / B_{13}^{2} + 2 Cov (B_{12}, B_{23}) / (B_{12} B_{23}) - 2 Cov (B_{12}, B_{13}) / (B_{12} B_{13}) - 2 Cov (B_{13}, B_{23}) / (B_{13} B_{23})]

(A5)

Var ({\hat{σ}}_{T}^{2}) = Var ({\hat{σ}}_{S}^{2}) + Var ({\hat{σ}}_{D}^{2}) / 16

(A6)

For the remaining variance estimates, it will be useful to assess $Cov (A_{kl}, σ_{T}^{2})$, $Cov (B_{kl}, σ_{T}^{2})$, and $Cov (C_{kl}, σ_{T}^{2})$. Using the delta method, we have

Cov (A_{kl}, σ_{T}^{2}) = Cov (A_{kl}, σ_{S}^{2}) = σ_{S}^{2} [2 Cov (A_{kl}, A_{23}) / A_{23} - Cov (A_{kl}, A_{13}) / A_{13}]

(A7)

Cov (B_{kl}, σ_{T}^{2}) = Cov (B_{kl}, σ_{S}^{2}) + Cov (B_{kl}, σ_{D}^{2}) / 4

(A7a)

where

Cov (B_{kl}, σ_{S}^{2}) = σ_{S}^{2} [Cov (B_{kl}, B_{12}) / B_{12} - Cov (B_{kl}, B_{23}) / B_{23}]

(A7b)

Cov (B_{kl}, σ_{D}^{2}) = σ_{D}^{2} [Cov (B_{kl}, B_{12}) / B_{12} + Cov (B_{kl}, B_{23}) / B_{23} - Cov (B_{kl}, B_{13}) / B_{13}]

(A7c)

Cov (C_{kl}, σ_{T}^{2}) = Cov (C_{kl}, σ_{S}^{2}) = σ_{S}^{2} [2 Cov (C_{kl}, A_{23}) / A_{23} - Cov (C_{kl}, A_{13}) / A_{13}]

(A7d)

\begin{matrix} Cov (C_{kl}, σ_{D}^{2}) & = Cov (A_{kl}, σ_{D}^{2}) = 0 \\ Var ({\hat{β}}_{q}) & = {\hat{β}}_{q}^{2} [Var (B_{12}) / B_{12}^{2} + Var (C_{23}) / C_{23}^{2} + Var (B_{23}) / B_{23}^{2} + Var ({\hat{σ}}_{T}^{2}) / {\hat{σ}}_{T}^{4} - 2 Cov (B_{12}, B_{23}) / (B_{12} B_{23}) - 2 Cov (B_{12}, {\hat{σ}}_{T}^{2}) / (B_{12} {\hat{σ}}_{T}^{2}) - 2 Cov (C_{23}, {\hat{σ}}_{T}^{2}) / (C_{23} {\hat{σ}}_{T}^{2}) + 2 Cov (B_{23}, {\hat{σ}}_{T}^{2}) / (B_{23} {\hat{σ}}_{T}^{2})] \end{matrix}

(A8)

Var ({\hat{β}}_{m}) = {\hat{β}}_{m}^{2} [Var (C_{23}) / C_{23}^{2} + Var ({\hat{σ}}_{T}^{2}) / {\hat{σ}}_{T}^{4} - 2 Cov (C_{23}, {\hat{σ}}_{T}^{2}) / (C_{23} {\hat{σ}}_{T}^{2})]

(A9)

where $Var ({\hat{σ}}_{T}^{2})$ and $Cov (C_{23}, {\hat{σ}}_{T}^{2})$ are given in (A6) and (A7d), respectively.

The variance of the remaining estimated parameters in Appendix B can also be obtained using the delta method similar to equations (A4)–(A9).

APPENDIX B: MLES OF THE PARAMETERS FOR THE MODEL IN EQUATIONS (8) AND (14)

{\hat{σ}}_{S}^{2} = A_{23}^{2} B_{12} / (A_{13} B_{23})

{\hat{σ}}_{D}^{2} = B_{12} B_{23} / B_{13}

{\hat{σ}}_{T}^{2} = \hat{Var} (T_{ij}) = {\hat{σ}}_{S}^{2} + {\hat{σ}}_{D}^{2} / 4

{\hat{β}}_{q} = B_{12} \hat{Cov} (M_{ij}, R_{ij}) / (B_{23} {\hat{σ}}_{T}^{2})

{\hat{β}}_{m} = \hat{Cov} (M_{ij}, R_{ij}) / {\hat{σ}}_{T}^{2}

{\hat{σ}}_{eq}^{2} = (B_{11} - {\hat{β}}_{q}^{2} {\hat{σ}}_{D}^{2}) / 2

{\hat{σ}}_{er}^{2} = (B_{22} - {\hat{σ}}_{D}^{2}) / 2

{\hat{σ}}_{em}^{2} = (B_{33} - {\hat{β}}_{m}^{2} {\hat{σ}}_{D}^{2}) / 2

{\hat{σ}}_{q}^{2} = A_{11} - {\hat{β}}_{q}^{2} {\hat{σ}}_{S}^{2} - {\hat{σ}}_{eq}^{2} / 2

{\hat{σ}}_{r}^{2} = A_{22} - {\hat{σ}}_{S}^{2} - {\hat{σ}}_{er}^{2} / 2

{\hat{σ}}_{m}^{2} = A_{33} - {\hat{β}}_{m}^{2} {\hat{σ}}_{S}^{2} - {\hat{σ}}_{em}^{2} / 2

{\hat{ρ}}_{qr} = (A_{12} - {\hat{β}}_{q} {\hat{σ}}_{S}^{2}) / ({\hat{σ}}_{q} {\hat{σ}}_{r})

{\hat{ρ}}_{T} \equiv \hat{Cov} (T_{i 1}, T_{i 2}) / {\hat{σ}}_{T}^{2} = [\hat{Cov} (R_{i 1}, R_{i 2}) - {\hat{σ}}_{r}^{2}] / {\hat{σ}}_{T}^{2} = (A_{22} - B_{22} / 4 - {\hat{σ}}_{r}^{2}) / {\hat{σ}}_{T}^{2}

Furthermore, the MLEs of the mean parameters in (8) are given by

\begin{matrix} {\hat{μ}}_{Tj} = \sum_{i = 1}^{n} R_{ij} / n, j = 1, 2 \\ {\hat{α}}_{qj} = \sum_{i = 1}^{n} Q_{ij} / n - {\hat{β}}_{q} {\hat{μ}}_{Tj}, j = 1, 2 \\ {\hat{α}}_{mj} = \sum_{i = 1}^{n} M_{ij} / n - {\hat{β}}_{m} {\hat{μ}}_{Tj}, j = 1, 2 \end{matrix}

The parameters α_qj and α_mj in equation (14) can be estimated by the intercept terms in the QR_ij and MR_ij mixed effects models in equation (20). The parameters $σ_{S}^{2}, \dots, ρ_{T}$ in equation (14) can be estimated by substituting the residuals of Q on Z, R on Z, and M on Z for Q, R, and M, respectively, in the above expressions. The parameter μ_Tj is estimated in the same way in equations (8) and (14). The parameters γ̰_q and γ̰_m are estimated from the mixed effects regression models in equation (20).
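The closed-form expressions above can be collected into one function. This is a Python sketch under stated assumptions, not the authors' SAS macro: covariances use divisor n, Ĉov(M_ij, R_ij) is read as the visit-averaged covariance C₂₃ (as in (A1) and (A2)), λ̂_TQ is added as β̂_q σ̂_T²/C₁₁, and the demonstration data are simulated from model (8) with hypothetical values (σ_T² = 4, ρ_T = 0.5, ρ_qr = 0.6, unit error variances, β_q = β_m = 1). For equation (14), Q, R, and M would first be replaced by their residuals on Z.

```python
import random
from statistics import fmean

def cov(x, y):
    """Covariance with divisor n."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def moments(Q, R, M):
    """A_kl, B_kl, C_kl of Appendix A as 3x3 lists (index 0 = Q, 1 = R, 2 = M)."""
    X = (Q, R, M)
    avg = [[(a + b) / 2 for a, b in zip(*x)] for x in X]   # X-bar_ik
    dif = [[b - a for a, b in zip(*x)] for x in X]         # X_idk
    A = [[cov(avg[k], avg[l]) for l in range(3)] for k in range(3)]
    B = [[cov(dif[k], dif[l]) for l in range(3)] for k in range(3)]
    C = [[(cov(X[k][0], X[l][0]) + cov(X[k][1], X[l][1])) / 2 for l in range(3)]
         for k in range(3)]
    return A, B, C

def appendix_b_estimates(Q, R, M):
    """Closed-form estimates following the Appendix B expressions."""
    A, B, C = moments(Q, R, M)
    s2_S = A[1][2] ** 2 * B[0][1] / (A[0][2] * B[1][2])
    s2_D = B[0][1] * B[1][2] / B[0][2]
    s2_T = s2_S + s2_D / 4
    beta_q = B[0][1] * C[1][2] / (B[1][2] * s2_T)   # Cov-hat(M_ij, R_ij) read as C_23
    beta_m = C[1][2] / s2_T
    s2_eq = (B[0][0] - beta_q ** 2 * s2_D) / 2
    s2_er = (B[1][1] - s2_D) / 2
    s2_em = (B[2][2] - beta_m ** 2 * s2_D) / 2
    s2_q = A[0][0] - beta_q ** 2 * s2_S - s2_eq / 2
    s2_r = A[1][1] - s2_S - s2_er / 2
    s2_m = A[2][2] - beta_m ** 2 * s2_S - s2_em / 2
    # rho_qr needs positive variance estimates; report NaN otherwise
    rho_qr = ((A[0][1] - beta_q * s2_S) / (s2_q * s2_r) ** 0.5
              if s2_q > 0 and s2_r > 0 else float("nan"))
    rho_T = (A[1][1] - B[1][1] / 4 - s2_r) / s2_T
    return {"sigma2_S": s2_S, "sigma2_D": s2_D, "sigma2_T": s2_T,
            "beta_q": beta_q, "beta_m": beta_m, "sigma2_eq": s2_eq,
            "sigma2_er": s2_er, "sigma2_em": s2_em, "sigma2_q": s2_q,
            "sigma2_r": s2_r, "sigma2_m": s2_m, "rho_qr": rho_qr,
            "rho_T": rho_T, "lambda_TQ": beta_q * s2_T / C[0][0]}

def demo_data(n, rng):
    """Hypothetical data from model (8) with the values in the lead-in."""
    Q, R, M = ([], []), ([], []), ([], [])
    for _ in range(n):
        q = rng.gauss(0, 1)
        r = 0.6 * q + 0.8 * rng.gauss(0, 1)
        m = rng.gauss(0, 1)
        t1 = rng.gauss(0, 2)
        t2 = 0.5 * t1 + rng.gauss(0, 3 ** 0.5)
        for j, t in enumerate((t1, t2)):
            Q[j].append(q + t + rng.gauss(0, 1))
            R[j].append(r + t + rng.gauss(0, 1))
            M[j].append(m + t + rng.gauss(0, 1))
    return Q, R, M

Q, R, M = demo_data(4000, random.Random(7))
est = appendix_b_estimates(Q, R, M)
print({name: round(value, 2) for name, value in est.items()})
```

Under these demonstration values the targets are σ_S² = 3, σ_D² = 4, σ_T² = 4, ρ_T = 0.5, ρ_qr = 0.6, and λ_TQ = 2/3; because several variance components are obtained by subtraction, they can be noticeably noisier than λ̂_TQ in moderate samples.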

REFERENCES

1. Willett WC, Sampson L, Stampfer MJ, Rosner B, Bain C, Witschi J, Hennekens CH, Speizer FE. Reproducibility and validity of a semi-quantitative food frequency questionnaire. American Journal of Epidemiology. 1985;122:51–65. doi: 10.1093/oxfordjournals.aje.a114086.
2. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine. 1989;8:1051–1069. doi: 10.1002/sim.4780080905.
3. Plummer M, Clayton D. Measurement error in dietary assessment: an investigation using covariance structure models. Part I. Statistics in Medicine. 1993;12:925–935. doi: 10.1002/sim.4780121004.
4. Plummer M, Clayton D. Measurement error in dietary assessment: an investigation using covariance structure models. Part II. Statistics in Medicine. 1993;12:937–948. doi: 10.1002/sim.4780121005.
5. Kaaks R, Riboli E, Esteve J, Van Kappel A, Van Staveren W. Estimating the accuracy of dietary questionnaire assessments: validation in terms of structural equation models. Statistics in Medicine. 1994;13:127–142. doi: 10.1002/sim.4780130204.
6. Ocke MC, Kaaks RJ. Biochemical markers as additional measurements in dietary validity studies: application of the method of triads with examples from the European Prospective Investigation into Cancer and Nutrition. American Journal of Clinical Nutrition. 1997;65:1240S–1245S. doi: 10.1093/ajcn/65.4.1240S.
7. Subar AF, Kipnis V, Troiano RP, Midthune D, Schoeller DA, Bingham S, Sharbaugh CO, Trabulsi J, Runswick S, Ballard-Barbash R, Sunshine J, Schatzkin A. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. American Journal of Epidemiology. 2003;158(1):1–13. doi: 10.1093/aje/kwg092.
8. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology. 1990;132:134–145. doi: 10.1093/oxfordjournals.aje.a115715.
9. Kipnis V, Carroll RJ, Freedman LS, Li L. Implications of a new dietary measurement error model for estimation and relative risk: application to four calibration studies. American Journal of Epidemiology. 1999;150(6):642–651. doi: 10.1093/oxfordjournals.aje.a010063.
10. Spiegelman D, Zhao B, Kim J. Correlated errors in biased surrogates: study designs and methods for measurement error correction. Statistics in Medicine. 2005;24(11):1657–1682. doi: 10.1002/sim.2055.
11. Fraser GE, Butler TL, Shavlik DJ. Correlation between estimated and true dietary intakes: using two instrumental variables. Annals of Epidemiology. 2005;15:509–518. doi: 10.1016/j.annepidem.2004.12.012.
12. Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, Fahey M, Charrondière UR, Hémon B, Casagrande C, Vignat J, Overvad K, Tjønneland A, Clavel-Chapelon F, Thiébaut A, Wahrendorf J, Boeing H, Trichopoulos D, Trichopoulou A, Vineis P, Palli D, Bueno-de-Mesquita HB, Peeters PHM, Lund E, Engeset D, González CA, Barricarte A, Berglund G, Hallmans G, Day NE, Key TJ, Kaaks R, Saracci R. European prospective investigation into cancer and nutrition (EPIC): study population and data collection. Public Health Nutrition. 2002;5:1113–1124. doi: 10.1079/PHN2002394.
13. Rosner B. Percentage points for a generalized ESD many-outlier procedure. Technometrics. 1983;25(2):165–172.
14. Michels KB, Bingham SA, Luben R, Welch AA, Day NE. The effect of correlated measurement error in multivariate models of diet. American Journal of Epidemiology. 2004;160:59–67. doi: 10.1093/aje/kwh169.

APPENDIX A: CONFIDENCE LIMITS FOR THE ALTERNATIVE REGRESSION CALIBRATION FACTOR (λ_TQ) AND THE REGRESSION COEFFICIENTS β_q AND β_m