Author manuscript; available in PMC: 2008 Jun 26.
Published in final edited form as: J Am Stat Assoc. 2007;102(479):856–866. doi: 10.1198/016214506000000771

Structured measurement error in nutritional epidemiology: applications in the Pregnancy, Infection, and Nutrition (PIN) Study

Brent A Johnson 1, Amy H Herring 1, Joseph G Ibrahim 1, Anna Maria Siega-Riz 2
PMCID: PMC2440718  NIHMSID: NIHMS38386  PMID: 18584067

Abstract

Preterm birth, defined as delivery before 37 completed weeks’ gestation, is a leading cause of infant morbidity and mortality. Identifying factors related to preterm delivery is an important goal of public health professionals who wish to identify etiologic pathways to target for prevention. Validation studies are often conducted in nutritional epidemiology in order to study measurement error in instruments that are generally less invasive or less expensive than “gold standard” instruments. Data from such studies are then used in adjusting estimates based on the full study sample. However, measurement error in nutritional epidemiology has recently been shown to be complicated by correlated error structures in the study-wide and validation instruments. Investigators of a study of preterm birth and dietary intake designed a validation study to assess measurement error in a food frequency questionnaire (FFQ) administered during pregnancy, with the secondary goal of assessing whether a single administration of the FFQ could be used to describe intake over the relatively short pregnancy period, in which energy intake typically increases. Here, we describe a likelihood-based method via Markov Chain Monte Carlo to estimate the regression coefficients in a generalized linear model relating preterm birth to covariates, where one of the covariates is measured with error and the multivariate measurement error model has correlated errors among contemporaneous instruments (i.e. FFQs, 24-hour recalls, and/or biomarkers). Because of constraints on the covariance parameters in our likelihood, identifiability for all the variance and covariance parameters is not guaranteed and, therefore, we derive the necessary and sufficient conditions to identify the variance and covariance parameters under our measurement error model and assumptions.
We investigate the sensitivity of our likelihood-based model to distributional assumptions placed on the true folate intake by employing semi-parametric Bayesian methods through the mixture of Dirichlet process priors framework. We exemplify our methods in a recent prospective cohort study of risk factors for preterm birth. We use long-term folate as our error-prone predictor of interest, the food-frequency questionnaire (FFQ) and 24-hour recall as two biased instruments, and serum folate biomarker as the unbiased instrument. We found that folate intake, as measured by the FFQ, led to a conservative estimate of the estimated odds ratio of preterm birth (0.76) when compared to the odds ratio estimate from our likelihood-based approach, which adjusts for the measurement error (0.63). We found that our parametric model led to similar conclusions to the semi-parametric Bayesian model.

Keywords: Adaptive-Rejection Sampling, Dirichlet process prior, MCMC, Semiparametric Bayes

1 Introduction

Measurement error is a common and well-known challenge in nutritional epidemiology. One only has to glance at a recent issue of any one of the leading epidemiological journals to attest to this fact and also to verify that there still are many unresolved questions. One of the more intriguing recent developments in nutritional epidemiology concerns the fitness and applicability of traditional error models used to assess the validity and generalizability of estimated risks obtained from studies using the food frequency questionnaire (FFQ).

Despite many documented pitfalls (Block, 2001; Byers, 2001; Willett, 2001) including systematic biases, and within and between subject variability, the FFQ is a common dietary instrument due to its ease of administration and economy in large nutritional studies. Naive regression methods that use the error-prone FFQ in place of the true long-term dietary intake often attenuate the regression coefficients toward zero [although the result is not true in general nonlinear models (Fuller, 1987; Carroll, Ruppert, and Stefanski, 1995)]. While several statistical methods have been proposed for the analysis of data where covariates are measured with error, regression calibration (Stefanski and Carroll, 1985) seems to be the default method in nutrition (Willett, 1998). The method is popular because it may be implemented using standard software assuming one has a reliable calibration model (Spiegelman et al., 2001; Spiegelman et al., 2005). In addition, much money and energy have been spent on validation studies over the past several decades; therefore, bias and variance parameters relating the FFQ to the true, long-term dietary intake can be estimated with some degree of precision. A related problem to the one considered here is the error in covariate misclassification (cf. Morrissey and Spiegelman, 1999; Holcroft and Spiegelman, 1999; Spiegelman et al., 2001; Zucker and Spiegelman, 2004).

The traditional statistical analysis and inference proceeds by first regressing the outcome on the FFQ to obtain a naive estimate of the regression coefficient. Then, we regress a reference instrument (that is, an unbiased measure for the true dietary intake) on the FFQ to estimate the attenuation factor. It can be shown that dividing the naive estimated regression coefficient by the estimated attenuation factor leads to a corrected estimate of the desired regression coefficient, that is, one obtained if we could have regressed the outcome on the true long-term dietary intake (Carroll et al., 1995; Kipnis et al., 2001). If the systematic bias or the correlated errors in the FFQ or 24-hr recall are ignored, then the attenuation factor will be biased, and subsequently, the “corrected” regression coefficient estimate will no longer be reliable. While primary interest often lies in estimating this true regression coefficient, epidemiologists are also quite interested in the estimated attenuation factor. Because the power of the study to detect a significant effect is a function of the attenuation factor, epidemiologists use this fact to make post hoc calculations to determine whether a null finding appears to, in fact, be the case or whether it seems to be a result of low power.
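The two-regression correction described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the PIN analysis: a continuous outcome with a linear model (for a logistic model the correction is only approximate), a scalar intake, and made-up bias and variance values. The function names `ols_slope` and `attenuation_demo` are hypothetical.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from the least-squares regression of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def attenuation_demo(n=20000, beta=1.0, seed=0):
    """Return (naive slope, attenuation factor, corrected slope) for simulated data."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=n)                             # latent true intake
    q = 0.5 + 0.6 * t + rng.normal(scale=0.8, size=n)  # biased FFQ (hypothetical values)
    m = t + rng.normal(scale=0.5, size=n)              # unbiased reference instrument
    y = beta * t + rng.normal(size=n)                  # continuous outcome (assumption)

    naive = ols_slope(q, y)   # outcome regressed on the FFQ
    lam = ols_slope(q, m)     # reference instrument regressed on the FFQ
    return naive, lam, naive / lam


naive, lam, corrected = attenuation_demo()
print(f"naive={naive:.3f}  attenuation={lam:.3f}  corrected={corrected:.3f}")
```

With these made-up values the FFQ has unit variance and covariance 0.6 with the true intake, so both the naive slope and the attenuation factor should be near 0.6 and their ratio near the true coefficient of 1. The sketch relies on the reference and FFQ errors being independent; correlated errors would bias the estimated attenuation factor, which is the concern raised in the text.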

Our method uses models which allow for correlation in the errors for contemporaneous instruments as suggested in the literature (Kaaks et al, 1994; Kipnis et al., 2001; Kipnis et al., 2003). Our point and interval estimation method is different from that considered in Kipnis et al. (2001, 2003) in that we use a likelihood-based approach (also called a structural measurement error model), whereas Kipnis et al. (2001) estimates the attenuation coefficient first and then appropriately scales the naive regression coefficient estimate to obtain the corrected coefficient estimate. Recently, Spiegelman et al. (2005) considered a joint model for all the parameters in disease (or outcome) model and measurement error model (as we do below in Section 3) by “stacking” the estimating equations for all the unknown parameters from both the disease and calibration model and forming an M-estimator (cf. Stefanski and Boos, 2002). Again, this regression calibration approach is different from our likelihood-based approach. We subsequently extend our likelihood-based model through the mixture of Dirichlet processes (MDP) methodology to avoid placing strict parametric assumptions on the latent true dietary intake variables. The remainder of this article is organized as follows: Section 2 describes the Pregnancy, Infection, and Nutrition (PIN) study from which the data are acquired and scientific questions of interest; Section 3 describes our statistical model and notation; Section 4 gives an overview of the joint full conditional distribution; Section 5 summarizes a small simulation study; Section 6 summarizes the results of our analysis and we end with a short discussion on the implications of our findings in Section 7.

2 The PIN Study Data

The PIN study was a prospective cohort study of risk factors for preterm birth (Savitz et al., 1999). Recruitment occurred between 24–29 weeks’ gestation, and several questionnaires, including an FFQ to assess dietary intake in the second trimester, were administered at this time as described in Savitz et al. (1999) and Siega-Riz et al. (2002). The outcome of interest, preterm birth, was defined as delivery before 37 completed weeks of gestation. Siega-Riz et al. (2004) examined the relationship between maternal folate status and preterm birth, reporting increased risks of preterm birth among women with mean daily folate intake less than 500 μg and among women with serum folate levels less than 16.3 ng/mL. A variety of folate exposure variables, including mean daily dietary intake from the FFQ and two biomarkers, serum and red blood cell folate, were used in separate analyses, with all results reported.

To address FFQ measurement issues, the investigators conducted a validation substudy to determine whether dietary intake changed over the course of pregnancy and to quantify measurement error in the FFQ. Women in the validation study were enrolled in the first trimester and were asked to complete three FFQs over the course of pregnancy, with each FFQ reflecting intake over the past trimester. The purpose of the longitudinal component of the validation substudy was to determine whether one FFQ measurement during the second trimester of pregnancy would be sufficient to characterize intake throughout pregnancy. In addition, three daily in-depth diet interviews (also called “24-hour recalls”) were collected proximal to each FFQ, providing a maximum of 12 measurements over three time points. The replicate dietary records were collected in order to help quantify measurement errors in each FFQ.

Finally, we make two additional points regarding the PIN study data. First, one serum folate biomarker was collected on every woman in the study, that is, both in the main study as well as in the substudy. This feature of the PIN study is not common among dietary studies, where a “typical” study collects biomarkers only on women in the validation substudy. However, we found that the additional biomarker information compensated for a lack of information in the validation substudy (i.e. missing FFQs, 24-hour recalls, or biomarkers). Second, the PIN study collected serum and red-blood cell folate biomarkers, which we use as our reference instruments in our analyses. As pointed out by a referee, these biomarkers are measures of folate concentration and not folate intake. Better measures of the latter are replicate urinary nitrogen or doubly-labeled water measurements, neither of which were collected in the PIN study. This important point does not change the validity of the methods or analyses but does have a significant impact on the interpretation of the analysis results and their generalizability to other studies.

3 Model and Notation

In this section, we describe the proposed model and inference used in many nutritional studies. The outcome is often modeled in two stages, where the first stage models the response as a function of predictors, both latent and observed, and the second stage specifies a measurement error model for the error-prone covariables. Let $Y_i$, $i = 1, \ldots, m$, be an outcome of interest belonging to the exponential family of distributions (McCullagh and Nelder, 1983, p.28). In the PIN study, $Y_i$ will be the binary outcome preterm birth, where $Y_i = 1$ if a woman delivered preterm and zero otherwise. Define $T_i$ as a $p_T \times 1$ vector of error-prone covariates assumed to be related to the outcome of interest (for example, $T_i$ may refer to the true long-term dietary intake of several nutrients of interest or may refer to a vector of true dietary intakes for a single nutrient over different trimesters); and $Z_i$ is a $p_Z \times 1$ vector of other covariates assumed to be “error-free.” The outcome is related to the covariates through the following model:

$$g\{\theta_i(\eta)\} = \eta_0 + \eta_T' T_i + \eta_Z' Z_i, \tag{1}$$

where $g(\cdot)$ is a known link function, $E Y_i = \theta_i(\eta)$, and $\eta = (\eta_0, \eta_T', \eta_Z')'$. The two primary instruments used in nutrition studies are the FFQ and 24-hour recall, which we denote by $Q_{ijl_1}$, $l_1 = 1, \ldots, k_{ij}^Q$, and $F_{ijl_2}$, $l_2 = 1, \ldots, k_{ij}^F$, respectively. In general, it will be convenient to let $Q_i$ denote the $k_{i1} \times 1$ vector of all the FFQs for the $i$-th subject, where $k_{i1} = \sum_j k_{ij}^Q$, and similarly, let $F_i$ be the $k_{i2} \times 1$ vector of the 24-hour recalls, $k_{i2} = \sum_j k_{ij}^F$. As discussed above, evidence suggests the following measurement error model (Kipnis et al., 2001; Kipnis et al., 2003; Spiegelman et al., 2005) relating the observed instruments to the true dietary intake, $T_i$:

$$Q_{ijl_1} = \mu_Q + \alpha_j^Q + \beta_Q T_{ij} + b_{i1} + U^Q_{ijl_1} \tag{2}$$
$$F_{ijl_2} = \mu_F + \alpha_j^F + \beta_F T_{ij} + b_{i2} + U^F_{ijl_2}, \tag{3}$$

where $\mu_Q$, $\mu_F$ are means for the FFQ and 24-hr recalls, respectively, $(\alpha_1^Q, \ldots, \alpha_3^Q, \alpha_1^F, \ldots, \alpha_3^F)$ are trimester-level fixed effects, $(b_{i1}, b_{i2})$ are mean zero random effects describing subject-specific biases, $(\beta_Q, \beta_F)$ describe the systematic bias of the instruments, and $(U^Q_{ijl_1}, U^F_{ij'l_2})$ is a bivariate perturbation vector assumed to have mean zero, variance $(\sigma_Q^2, \sigma_F^2)$, respectively, and covariance $\rho_j \sigma_Q \sigma_F$ when $j = j'$ and zero otherwise. To identify trimester-level effects and systematic bias in the FFQ and 24-hour recall, it is necessary to have one instrument which is unbiased for true dietary intake $T_i$. Let the serum folate biomarker $M_{ijl_3}$, $l_3 = 1, \ldots, k_{ij}^M$, which is obtained from a blood draw taken close in time to the FFQ administration, be such an instrument which is assumed to follow the model (Kipnis et al., 2001; Kipnis et al., 2003; Spiegelman et al., 2005)

$$M_{ijl_3} = T_{ij} + b_{i3} + U^M_{ijl_3}, \tag{4}$$

where, again, $b_{i3}$ is a mean zero random effect and $U^M_{ijl_3}$ an independent, instrument-specific measurement error with variance $\sigma_M^2$. Again, recent research in nutritional epidemiology (Kipnis, 2001) suggests that it may be prudent to consider models where $\mathrm{corr}(U^F_{ijl_2}, U^M_{ij'l_3}) \neq 0$, $j = j'$, and similarly for the FFQ. The resulting model is heavily parameterized and the identifiability of all parameters will only be satisfied with sufficiently rich data, e.g. replicate FFQs, 24-hour recalls, and biomarkers in a validation substudy. Because such data may not be observed in any one data set, one must reduce the complexity of the measurement error model (2)–(4) through simplification or a priori knowledge of some parameters to identify the remaining unknown parameters. In the following paragraph, we discuss details of the PIN study data and its consequences on our measurement error model; we compare our model to one used in a recent analysis of the Medical Research Council (MRC) study data (Kipnis et al., 2001).

In the PIN study, women had at most one FFQ per trimester $j$ ($k_{ij}^Q \le 1$) and at most three 24-hour recalls ($k_{ij}^F \le 3$) per trimester. Only one biomarker was collected throughout the study period. In contrast, the MRC study collected one FFQ throughout their study period, but collected eight biomarkers (two per season) and four 24-hour recalls (one per season). For our analysis of the PIN data below, we set $T_{ij} = T_i$, $j = 1, 2, 3$, in (1)–(3), thereby modeling a single error-prone random variable $T_i$, and use the classical measurement error model for the biomarker in (4),

$$M_i = T_i + U_{i3}, \tag{5}$$

with $\sigma_M^2$ assumed to be known. Because replicate biomarkers are collected for every season of the MRC study, Kipnis et al. (2001) need not simplify the error model in the biomarker (4) as we have done for the PIN study. However, because only one FFQ is observed in the MRC study, identifiability for all the parameters in model (2) becomes problematic. For example, it is not possible to identify $\mathrm{var}(b_{i1})$ and $\sigma_Q^2$ separately from one FFQ per subject without additional assumptions. Despite our model simplifications, we use general notation following models (2)–(4) as our methods and subsequent analyses are germane to other measurement error problems with similar data.
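To see why a single FFQ cannot separate the two variance components, consider a short worked calculation. It is a sketch under assumptions made here for illustration: a scalar intake with $T_{ij} = T_i$ and variance written as $\sigma_T^2$, one FFQ per subject, the simplified biomarker model with one biomarker per subject, and mutual independence of $T_i$, the random effect, and the measurement errors.

```latex
\begin{aligned}
Q_i &= \mu_Q + \beta_Q T_i + b_{i1} + U_i^Q, \\
\operatorname{var}(Q_i) &= \beta_Q^2 \sigma_T^2 + \operatorname{var}(b_{i1}) + \sigma_Q^2, \\
\operatorname{cov}(Q_i, M_i) &= \beta_Q \sigma_T^2 .
\end{aligned}
```

Under these assumptions the two components enter the observable second moments only through their sum, so any split with the same total fits equally well. With replicate FFQs across trimesters and errors independent across trimesters, the covariance between two FFQs from the same subject is $\beta_Q^2 \sigma_T^2 + \operatorname{var}(b_{i1})$, which is what allows the subject-level variance to be separated from $\sigma_Q^2$.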

To write the likelihood for the observed data, it is convenient to introduce some new notation and assumptions. Let $W_i = (Q_i', F_i', M_i')'$ be the $k_i \times 1$ vector of all the instruments, where $M_i$ is a $k_{i3} \times 1$ vector ($k_{i3} = \sum_j k_{ij}^M$) of unbiased reference instruments for the $i$-th subject and $k_i = k_{i1} + k_{i2} + k_{i3}$. Here, we also assume that the random effect vector $b_i = (b_{i1}, b_{i2}, b_{i3})$ is normally distributed with mean zero and covariance matrix $D$ and the measurement error vector $U_i = (U_i^{Q\prime}, U_i^{F\prime}, U_i^{M\prime})'$ is normally distributed with mean zero and covariance matrix $\Sigma$. The likelihood function of the observed data conditional on $Z_i$ is $\prod_i L_i(Y_i, W_i | Z_i)$, where

$$L_i(Y_i, W_i | Z_i) = \int L_i(Y_i | T_i, Z_i)\, L_i(W_i | T_i, Z_i)\, L_i(T_i | Z_i)\, dT_i, \tag{6}$$

Li(Ti|Zi) is the likelihood of the true dietary intake vector Ti (e.g. Gaussian), Li(Wi|Ti, Zi) is the error distribution conditional on Zi, and Li(Yi|Ti, Zi) is the probability density function from the exponential family with the systematic and random components and link function given in (1). We will assume that Ui is independent of Zi and therefore replace Li(Wi|Ti, Zi) with Li(Wi|Ti), which is a multivariate normal distribution defined by the models in (2), (3), and (4). This assumption seems tenable in many applications but would not be reasonable if, for example, the mother’s height or weight were somehow related to the error in the instrument. If such an assumption were unjustified, a more complicated error model could be included without any additional difficulty. A detailed description of Li(Wi|Ti) is given in the next subsection.
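The integral in (6) can be made concrete with a small numerical example. The sketch below uses Gauss–Hermite quadrature rather than the MCMC approach of this article, and it assumes a deliberately reduced setting: a scalar Gaussian $T_i$, a logit link with no $Z_i$, and a single biomarker following the classical error model as the only instrument. The function names and argument names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def marginal_likelihood(y, m_obs, eta0, eta_t, mu_t, var_t, var_m, n_nodes=40):
    """Approximate the integral over T_i in (6) by Gauss-Hermite quadrature.

    Assumes scalar T_i ~ N(mu_t, var_t), a Bernoulli outcome with logit link,
    and one biomarker M_i ~ N(T_i, var_m) as the only instrument.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    t = mu_t + np.sqrt(2.0 * var_t) * nodes           # change of variable to N(mu_t, var_t)
    theta = 1.0 / (1.0 + np.exp(-(eta0 + eta_t * t)))
    outcome = theta ** y * (1.0 - theta) ** (1 - y)   # L_i(Y_i | T_i)
    error = normal_pdf(m_obs, t, var_m)               # L_i(W_i | T_i)
    return float(np.sum(weights * outcome * error) / np.sqrt(np.pi))


lik = marginal_likelihood(y=1, m_obs=0.4, eta0=0.3, eta_t=0.8,
                          mu_t=0.0, var_t=1.0, var_m=2.0)
print(lik)
```

A useful sanity check in this reduced setting is that the values for `y=0` and `y=1` sum to the marginal density of the biomarker, a normal density with mean `mu_t` and variance `var_t + var_m`.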

3.1 Measurement Error Model

We first consider a simplified version of the model in (6), motivated by data from the PIN study described in Section 2. For simplicity, let j = 1, …, 3 and Tij = Ti for all j. Conditional on the random effects bi and true dietary folate intake Ti, we have

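$$\begin{pmatrix} Q_i \\ F_i \\ M_i \end{pmatrix} \sim N_{k_i}\left\{ X_i \begin{pmatrix} \mu \\ \alpha \end{pmatrix} + A_i T_i \begin{pmatrix} \beta \\ 1 \end{pmatrix} + R_i b_i,\ \Sigma_i \right\},$$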

where $\mu = (\mu_Q, \mu_F)'$, $\alpha = (\alpha_1^Q, \ldots, \alpha_3^Q, \alpha_1^F, \ldots, \alpha_3^F)'$, $\beta = (\beta_Q, \beta_F)'$ and $X_i$, $A_i$, and $R_i$ are fixed design matrices linking the instruments/biomarkers to the calibration parameters and random effects, respectively. To continue this illustration, we make another common assumption and subsequent simplification in the measurement error model. In particular, one typically assumes that the measurement errors in the biomarkers for the $i$-th subject are independent of the measurement errors in the FFQ and 24-hr recalls. This assumption seems tenable in the PIN data as the FFQ and 24-hour recalls are both self-reported while the biomarkers are laboratory-measured with no a priori knowledge of FFQ or 24-hour recall. If we partition $\Sigma_i$ into $\Sigma_{11i}$, $\Sigma_{12i}$, $\Sigma_{21i}$, and $\Sigma_{22i}$ where $\Sigma_{11i}$ corresponds to the covariance matrix for the FFQs and 24-hr recalls, $\Sigma_{12i} = \Sigma_{21i}'$ corresponds to the covariance between instruments and biomarkers, and $\Sigma_{22i}$ is the covariance matrix of the biomarkers, then the conditional independence assumption implies $\Sigma_{12i} = \Sigma_{21i}' = 0$. From here, it is useful to treat the biomarkers separately in the model as well as in the likelihood (6). Now, we focus on the error calibration model for the FFQ and 24-hr recall only. Hence, rewrite this portion of the model as

$$\begin{pmatrix} Q_i \\ F_i \end{pmatrix} \sim N(H_i \gamma + R_i b_i,\ \Sigma_{11i}), \tag{7}$$

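where $\gamma = (\gamma^{Q\prime}, \gamma^{F\prime})'$, $\gamma^r = (\mu_r, \alpha_1^r, \alpha_2^r, \alpha_3^r, \beta_r)'$ for $r = Q, F$, where $Q$ is short-hand for FFQ and $F$ denotes 24-hour recall. Because $H_i$ is not full rank, it is necessary to constrain some of the parameters to achieve estimability of $\gamma$. We constrain the first trimester-level effect $\alpha_1^r = 0$, $r = Q, F$, which implies $\gamma^r = (\gamma_1^r, \gamma_2^r, \gamma_3^r, \gamma_T^r)'$ has the following interpretations: $\gamma_1^r = \mu_r + \alpha_1^r$, $\gamma_2^r = \alpha_2^r - \alpha_1^r$ and $\gamma_3^r = \alpha_3^r - \alpha_1^r$ for $r = Q, F$. For consistency, we label $\gamma_T^r = \beta_r$. In (7), we also have that $H_i$ is block diagonal, i.e. $H_i = \mathrm{diag}\{H_i^Q, H_i^F\}$, where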

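$$H_i^Q = \left( B_i^Q \mid 1_{k_{i1}} \mid T_i 1_{k_{i1}} \right)$$

and $B_i^Q = \mathrm{diag}\{1_{k_{i1}^Q}, 1_{k_{i2}^Q}, 1_{k_{i3}^Q}\}$. $H_i^F$ is defined similarly. With the $Q_i$ and $F_i$ organized as in (7), $\Sigma_{11i}$ (as a function of its parameters) may be written as

$$\Sigma_{11i}(\sigma, \rho) = G_i^{1/2}(\sigma)\, \Gamma_i(\rho)\, G_i^{1/2}(\sigma), \tag{8}$$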

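where $\sigma = (\sigma_Q, \sigma_F)'$, $\rho = (\rho_1, \rho_2, \rho_3)'$ for the three trimester correlation parameters, $G_i(\sigma) = \mathrm{diag}\{\sigma_Q^2 I_{k_{i1}}, \sigma_F^2 I_{k_{i2}}\}$ and $\Gamma_i(\rho)$ is a symmetric $(k_{i1} + k_{i2}) \times (k_{i1} + k_{i2})$ correlation matrix. Assuming a single FFQ in each of three trimesters (i.e. $k_{i1}^Q = k_{i2}^Q = k_{i3}^Q = 1$) and three 24-hr recalls at each of three trimesters (i.e. $k_{i1}^F = k_{i2}^F = k_{i3}^F = 3$), $\Gamma_i(\rho)$ is a correlation matrix with the following structure: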

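$$\Gamma_i(\rho) = \begin{pmatrix} I_3 & \begin{matrix} \rho_1 1_3' & 0' & 0' \\ 0' & \rho_2 1_3' & 0' \\ 0' & 0' & \rho_3 1_3' \end{matrix} \\ \begin{matrix} \rho_1 1_3 & 0 & 0 \\ 0 & \rho_2 1_3 & 0 \\ 0 & 0 & \rho_3 1_3 \end{matrix} & I_9 \end{pmatrix}, \tag{9}$$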

where $1_r$ is a column vector of ones of length $r$ and the 0s are vectors with the appropriate implied dimensions. Note that if we have two or more replicate FFQs and 24-hour recalls at each time, then $1$ (and analogously the 0s) will no longer refer to vectors, but to matrices of ones (or zeros). So far, we have placed no restrictions on $\rho$. We discuss three correlation models of interest and subsequent restrictions on $\Sigma_{11i}(\sigma, \rho)$ in Subsection 3.2.
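The block structure in (8) and (9) can be assembled numerically as follows. This is a minimal sketch, assuming the ordering used in (7) (all FFQs first, then all 24-hour recalls, trimester by trimester) and a constant number of replicates per trimester; the function names `gamma_matrix` and `sigma11` are hypothetical.

```python
import numpy as np


def gamma_matrix(rho, n_q=1, n_f=3):
    """Correlation matrix with FFQ/recall errors correlated only within a trimester."""
    n_trimesters = len(rho)
    # Cross-correlation block: rows index FFQs, columns index 24-hour recalls.
    cross = np.zeros((n_trimesters * n_q, n_trimesters * n_f))
    for j, r in enumerate(rho):
        cross[j * n_q:(j + 1) * n_q, j * n_f:(j + 1) * n_f] = r
    top = np.hstack([np.eye(n_trimesters * n_q), cross])
    bottom = np.hstack([cross.T, np.eye(n_trimesters * n_f)])
    return np.vstack([top, bottom])


def sigma11(sigma_q, sigma_f, rho, n_q=1, n_f=3):
    """Covariance matrix G^{1/2} Gamma G^{1/2}, with standard deviations as inputs."""
    n_trimesters = len(rho)
    g_half = np.diag(np.r_[np.full(n_trimesters * n_q, sigma_q),
                           np.full(n_trimesters * n_f, sigma_f)])
    return g_half @ gamma_matrix(rho, n_q, n_f) @ g_half


print(gamma_matrix([0.1, 0.2, 0.3]).shape)
```

With one FFQ and three recalls in each of three trimesters the result is a 12 × 12 matrix whose first three rows and columns correspond to the FFQs.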

3.2 Correlation Models and Their Implied Constraints

In this subsection, we focus on three correlation models of interest and derive the conditions on ρ that lead to the positive definiteness of Σ11i(σ, ρ) and, therefore, ultimately to model identifiability.

The three correlation models (CM) of interest can be summarized as the following: for every l1, l2,

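$$\text{CM1}: \quad \mathrm{corr}(U^Q_{ijl_1}, U^F_{ij'l_2}) = 0 \quad \text{for every } j, j' \tag{10}$$
$$\text{CM2}: \quad \mathrm{corr}(U^Q_{ijl_1}, U^F_{ij'l_2}) = \begin{cases} \rho & \text{if } j = j' \\ 0 & \text{otherwise} \end{cases} \tag{11}$$
$$\text{CM3}: \quad \mathrm{corr}(U^Q_{ijl_1}, U^F_{ij'l_2}) = \begin{cases} \rho_j & \text{if } j = j' \\ 0 & \text{otherwise,} \end{cases} \tag{12}$$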

for subject i at time j. In words, correlation model one (CM1) in (10) assumes that measurement errors between FFQs and 24-hr recalls are mutually independent, while CM2 and CM3 assume correlated errors. CM3 assumes measurement errors for different instruments are correlated differently for each measurement time while CM2 assumes the correlation remains the same over time. Both CM2 and CM3 are expected to reflect better the errors in contemporaneous instruments observed in nutritional epidemiological studies (Kipnis et al., 2001; Subar et al., 2002; Carroll, 2003; Carroll et al., 2004).

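Now, we turn our attention to the positive definiteness of $\Sigma_{11i}$. By definition, $\Sigma_{11i}$ is positive definite when the quadratic form $\lambda' \Sigma_{11i} \lambda > 0$ for every $\lambda \neq 0$; because $\Sigma_{11i}$ is a covariance matrix, this is equivalent to $\lambda' \Sigma_{11i} \lambda = 0$ if and only if $\lambda = 0$. We use a corollary which allows us to check the positivity of the determinants of all the leading principal minors, or analogously, to check that the eigenvalues are all positive (Searle, 1971).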

Assuming that there are J measurement times and a constant number of replicate FFQs and 24-hr recalls across trimesters, nQ and nF, respectively, the general form of the determinant of Σ11i is

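$$|\Sigma_{11i}| = \sigma_F^{2Jn_F}\, \sigma_Q^{2Jn_Q} \prod_{j=1}^{J} \left(1 - n_Q n_F \rho_j^2\right), \tag{13}$$

and the unique eigenvalues of $\Sigma_{11i}$ are $\sigma_Q^2$, $\sigma_F^2$, and

$$\tfrac{1}{2}\left\{ \sigma_Q^2 + \sigma_F^2 \pm \left( \sigma_Q^4 + \sigma_F^4 - 2\sigma_Q^2\sigma_F^2 + 4 n_F n_Q \rho_j^2 \sigma_Q^2 \sigma_F^2 \right)^{1/2} \right\},$$

for $j = 1, \ldots, J$. It is straightforward to verify that the product of the eigenvalues is indeed the determinant by including the missing replicate eigenvalues, i.e. $J - 1$ repeats of $\sigma_Q^2$ and $\sigma_F^2$. Now, through some straightforward algebra, it is easy to see that the condition that will ensure the positivity of the eigenvalues is

$$|\rho_j| < (n_F n_Q)^{-1/2}, \quad j = 1, \ldots, J. \tag{14}$$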

The condition in (14) is necessary and sufficient for the positive definiteness of Σ11i. Furthermore, any prior distribution placed on ρ must have support (14). Note that neither models (11) nor (12) will be able to detect/estimate correlation parameters that are extreme in either direction.
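The algebra behind (14) is short: the smaller eigenvalue of each trimester pair is positive exactly when $(\sigma_Q^2 + \sigma_F^2)^2$ exceeds the term under the square root, and the difference of the two is $4\sigma_Q^2\sigma_F^2(1 - n_Q n_F \rho_j^2)$. The determinant formula (13) and the bound (14) can also be checked numerically. The sketch below does this for a few arbitrary parameter values; the function names and the chosen values are hypothetical and not taken from the PIN analysis.

```python
import numpy as np


def build_sigma11(sig_q2, sig_f2, rho, n_q, n_f):
    """Covariance of FFQ and recall errors; FFQs are ordered first, then recalls."""
    n_times = len(rho)
    cross = np.zeros((n_times * n_q, n_times * n_f))
    for j, r in enumerate(rho):
        cross[j * n_q:(j + 1) * n_q, j * n_f:(j + 1) * n_f] = r
    corr = np.block([[np.eye(n_times * n_q), cross],
                     [cross.T, np.eye(n_times * n_f)]])
    sd = np.sqrt(np.r_[np.full(n_times * n_q, sig_q2),
                       np.full(n_times * n_f, sig_f2)])
    return corr * np.outer(sd, sd)


def det_formula(sig_q2, sig_f2, rho, n_q, n_f):
    """Closed-form determinant from equation (13)."""
    n_times = len(rho)
    rho = np.asarray(rho, dtype=float)
    return (sig_f2 ** (n_times * n_f) * sig_q2 ** (n_times * n_q)
            * np.prod(1.0 - n_q * n_f * rho ** 2))


def is_positive_definite(mat):
    """True when the smallest eigenvalue of a symmetric matrix is positive."""
    return bool(np.linalg.eigvalsh(mat).min() > 0.0)


bound = (1 * 3) ** -0.5  # bound in (14) for one FFQ and three recalls per trimester
for rho in ([0.1, 0.3, -0.2], [0.1, 0.6, 0.0]):
    s = build_sigma11(1.5, 0.8, rho, n_q=1, n_f=3)
    inside = max(abs(r) for r in rho) < bound
    print(rho, "inside bound:", inside, "positive definite:", is_positive_definite(s))
```

In each case the positive-definiteness check should agree with whether every correlation lies inside the bound.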

4 Prior and Posterior Distributions

In this section, we discuss the prior specification for all the parameters in our above models and the resulting posterior distributions to be used in a Gibbs (Geman and Geman, 1984) or Metropolis-Hastings (Metropolis and Ulam, 1949; Metropolis et al., 1953; Hastings, 1970) sampling algorithm. For now, assume that T1, …, Tm are independent and identically distributed random vectors from the distribution FT with mean μT and variance ΣT. Define Li(Yi|Ti, Zi; η) in (6) as the i-th contribution to the conditional likelihood given Ti arising from (1), e.g. for a Bernoulli response and a logit link function

$$\log L_i(Y_i | T_i, Z_i; \eta) = Y_i(\eta_0 + \eta_T' T_i + \eta_Z' Z_i) + \log\{1 - \theta_i(\eta)\},$$

where $\theta_i(\eta)$ was defined in (1). For simplicity, we assume normal prior distributions on the mean parameters $\eta$, the systematic bias parameters $\gamma$ in (7), and mean of the latent dietary intake random variables $\mu_T$ from $L_i(T_i|Z_i)$ in (6), i.e. $\eta \sim N(\eta_0, V_{0,\eta})$ in (1), $\gamma \sim N(\gamma_0, V_{0,\gamma})$ in (7), and $\mu_T \sim N(\mu_{T,0}, V_{0,\mu_T})$ in $L_i(T_i|Z_i)$, and conjugate Wishart priors on $D^{-1}$ in (7) and $\Sigma_T^{-1}$ in $L_i(T_i|Z_i)$ in (6), $D^{-1} \sim W_q(\nu_D, C_D)$ and $\Sigma_T^{-1} \sim W(\nu_T, C_T)$, respectively. Note that while it is common to assume inverse Gamma priors for $\sigma$, this will not necessarily imply a conjugate prior distribution because of the correlation parameters $\rho$ in $\Sigma_{11i}$. Since our constraints on the correlation parameters do not depend on $\sigma$, we may factor our joint prior $\pi(\sigma, \rho)$ into the product $\pi(\sigma)\pi(\rho)$. Because there are typically more replicate FFQs and 24-hour recalls than biomarkers, we assume flat priors for $\sigma_Q^2$ and $\sigma_F^2$ but an Inverse-Gamma (IG) prior for $\sigma_M^2$. Define our prior on $\sigma$ as

$$\pi(\sigma) = \sigma_Q^{-2}\, \sigma_F^{-2}\, e^{-1/(b_M \sigma_M)} / \sigma_M^{a_M + 1},$$

where $a_M$, $b_M$ are specified hyperparameters. For $\rho$, we specify a uniform prior with support given by the parameter constraints given in Subsection 3.2, i.e. $\pi(\rho) \propto 1$ with $|\rho_j| < (n_F n_Q)^{-1/2}$, $j = 1, \ldots, J$. Finally, here, we also assume $T_i$ is normally distributed with mean $\mu_T$ and covariance matrix $\Sigma_T$. Given $\pi(\sigma)$ and $\pi(\rho)$, and prior variances $V_{0,\eta}$, $V_{0,\gamma}$, and $V_{0,\mu_T}$, the joint posterior of the parameters is given by

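$$\begin{aligned}
p(\eta, \gamma, b, T, \sigma, \rho, D, \mu_T, \Sigma_T \mid Y, W) \propto{} & |\Sigma_W|^{-1/2}\, |D^{-1}|^{(\nu_D + m - q - 1)/2}\, |\Sigma_T^{-1}|^{(\nu_T + m - p_T - 1)/2} \\
& \times \exp\Big[ \sum_{i=1}^{m} \Big\{ \log L_i(\eta) - \tfrac{1}{2}(W_i - H\gamma)' \Sigma_i^{-1} (W_i - H\gamma) - \tfrac{1}{2} b_i' D^{-1} b_i - \tfrac{1}{2}(T_i - \mu_T)' \Sigma_T^{-1} (T_i - \mu_T) \Big\} \\
& \quad - \tfrac{1}{2}(\eta - \eta_0)' V_{0,\eta}^{-1} (\eta - \eta_0) - \tfrac{1}{2}(\gamma - \gamma_0)' V_{0,\gamma}^{-1} (\gamma - \gamma_0) \\
& \quad - \tfrac{1}{2}(\mu_T - \mu_{T,0})' V_{0,\mu_T}^{-1} (\mu_T - \mu_{T,0}) - \tfrac{1}{2}\mathrm{tr}(C_D^{-1} D^{-1}) - \tfrac{1}{2}\mathrm{tr}(C_T^{-1} \Sigma_T^{-1}) \Big]\, \pi(\sigma)\pi(\rho),
\end{aligned} \tag{15}$$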

where ΣW = diag{Σ1,…, Σm}. Additional details for the full conditional distributions are given in the Appendix.

4.1 Relaxing Distributional Assumptions on Ti

In measurement error problems, $T_i$ is a latent random vector with distribution $F_T$. The Bayesian paradigm offers a convenient method for handling latent variables and other incomplete data problems by sampling the latent variable from its full conditional distribution. When $F_T$ is parametric, e.g. Gaussian, then the full posterior is given by (15). However, this distributional assumption is difficult to check and a more flexible model is often desirable. One method is to use a scale mixture of normals for $T_i$. Towards this goal, suppose that we start with a univariate Gaussian distribution with mean $\mu_T$ and variance $\sigma_T^2$. Then, we may write

$$T_i = \mu_T + \varepsilon_i$$

where $\varepsilon_i \sim N(0, \sigma_T^2)$. A straightforward extension of this model is to assume $\varepsilon_i \sim N(0, \lambda_i \sigma_T^2)$ where the $\lambda_i$ are subject-specific latent variables and assumed to have Gamma distributions. A second method makes even fewer assumptions about the distribution function $F_T$, requiring only that $F_T$ be a proper distribution function. We employ the mixture of Dirichlet process (MDP) methodology based on a Polya urn scheme (Antoniak, 1974; Escobar, 1994; MacEachern, 1994). In addition to using the Dirichlet process prior for parameters, the MDP methodology has been successfully applied to other missing data problems, such as random effects in mixed models (Kleinman and Ibrahim, 1998; Brown and Ibrahim, 2003). Less work has been done using the MDP prior in measurement error models. Two exceptions are Mallick et al. (2002) and Müller and Roeder (1997), the latter of which describes an application of the MDP prior methodology to case-control studies. There are at least two differences worth noting between our application here and the one presented in Müller and Roeder (1997). First, there is the fundamental difference in design between the retrospective and prospective study design, where the case-control design has the additional complexity derived from conditioning on the prevalence of cases in the sample, that is, conditioning on Σi Yi = 1 (cf. Breslow and Day, 1980). Second, our two applications are different in that our model incorporates multiple validation instruments with correlated errors. We expect that in a case-control study with multivariate instruments as in our application presented here, a combined model using ideas presented here and in Müller and Roeder (1997) could be applied. Below, we describe how to apply the mixture of Dirichlet process methodology to our measurement error problem.

Assume the random vectors Ti are drawn from an arbitrary distribution FT, where FT has a Dirichlet process prior, denoted by FT ~ DP (ξF0), F0 ~ N (μT, ΣT ), and ξ is an unknown scalar confidence parameter. Suppressing parameters other than the error-prone covariate Ti, the full conditional distributions for {Ti, i = 1, …, m} are given by (See Kleinman and Ibrahim, 1998)

$$[T_i \mid \{T_k, k \neq i\}, Y_i, W_i] \propto q_0\, L_i(Y_i, W_i | T_i, Z_i)\, f_0(T_i | Z_i) + \sum_{k \neq i} q_k\, \delta(dT_i | T_k), \tag{16}$$

where f0(Ti|Zi) ≡ Li(Ti|Zi), and Li(Yi, Wi|Ti, Zi) was defined in (6). Recall, Li(Yi, Wi|Ti, Zi) factors into the product Li(Yi|Ti, Zi)Li(Wi|Ti, Zi) by the nondifferential measurement error assumption. Also, {q0, qk, k = 1, …, m} are unnormalized selection probabilities where

$$q_0 \propto \xi \int L_i(Y_i, W_i | T_i, Z_i)\, f_0(T_i | Z_i)\, dT_i \tag{17}$$

and $q_k \propto L_i(Y_i, W_i | T_k^*, Z_i)$ and $T_k^*$ are the unique atoms of $f_0(T|Z)$. Because (17) does not, in general, have a closed form solution, numerical integration is typically needed. However, it would be possible to find a closed form solution if, for example, $L_i(Y_i, W_i | T_i, Z_i)$ and $F_0$ were both multivariate normal. At the next stage, we sample the unique vector $T_j^*$ from its full conditional distribution $p(T_j^* | D_{obs}, \text{rest})$, where $D_{obs}$ denotes the observed data, and rest is short-hand for all remaining parameters. For a fixed confidence parameter $\xi$, the full conditional distribution of $T_j^*$ is defined as

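$$p(T_j^* | D_{obs}, \text{rest}) \propto \exp\Big[ \sum_{i \in S_j} \Big\{ Y_i\, g(\theta_i) + \log(1 - \theta_i) - \tfrac{1}{2}(W_i - H_i\gamma - R_i b_i)' \Sigma_i^{-1} (W_i - H_i\gamma - R_i b_i) \Big\} - \tfrac{1}{2}(T_j^* - \mu_T)' \Sigma_T^{-1} (T_j^* - \mu_T) \Big] \tag{18}$$

for $S_j = \{i \mid T_i = T_j^*\}$.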
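The Polya urn step in (16) and (17) is easiest to see in the closed-form case mentioned above. The following sketch is a deliberately reduced version and not the sampler used in this article: it assumes a scalar $T_i$, a Gaussian base measure, and a single Gaussian biomarker as the only data, ignoring the outcome, the FFQs and the 24-hour recalls, so that $q_0$ is available without numerical integration. The function and argument names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def polya_urn_sweep(t, m_obs, xi, mu_t, var_t, var_m, rng):
    """One pass over subjects: each T_i is redrawn or set equal to another T_k."""
    t = np.array(t, dtype=float)
    n = len(t)
    post_var = 1.0 / (1.0 / var_t + 1.0 / var_m)
    for i in range(n):
        others = np.delete(t, i)
        # q0: xi times the marginal density of M_i under the base measure.
        q0 = xi * normal_pdf(m_obs[i], mu_t, var_t + var_m)
        # qk: likelihood of M_i evaluated at each of the other current values.
        qk = normal_pdf(m_obs[i], others, var_m)
        probs = np.r_[q0, qk]
        probs = probs / probs.sum()
        pick = rng.choice(n, p=probs)
        if pick == 0:
            # Fresh draw from the posterior of T_i under the base measure.
            post_mean = post_var * (mu_t / var_t + m_obs[i] / var_m)
            t[i] = rng.normal(post_mean, np.sqrt(post_var))
        else:
            t[i] = others[pick - 1]
    return t


rng = np.random.default_rng(0)
m_obs = rng.normal(size=30)
t_new = polya_urn_sweep(m_obs.copy(), m_obs, xi=1.0, mu_t=0.0, var_t=1.0, var_m=0.5, rng=rng)
print("number of distinct values:", len(np.unique(t_new)))
```

In a full sampler this sweep would be followed by resampling the distinct values from (18); that step is omitted here.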

Define I* as the number of unique clusters of T*, I* ≤ m. Then, the confidence parameter ξ influences the tendency of the MCMC algorithm to favor large or small I*, with large ξ implying large I*. In this paper, we initially use a two-stage data augmentation algorithm to sample ξ (Tanner and Wong, 1987) and then conduct sensitivity studies where ξ is fixed. Assume ξ has a Gamma prior with shape r and rate λ, i.e. ξ ~ Gamma(r, λ) with Eξ = r/λ. At the first stage, the augmentation algorithm samples a latent variable c conditional on the current value of ξ and I*, that is, [c|ξ, I*] ~ Beta(ξ + 1, I*). Next, we sample the confidence parameter ξ from the mixture of two Gamma distributions given the latent variable c and I*, i.e.

$$[\xi | c, I^*] \sim \pi_c\, \text{Gamma}(r + I^*, \lambda - \log(c)) + (1 - \pi_c)\, \text{Gamma}(r + I^* - 1, \lambda - \log(c))$$

where πc = z/(z + 1) and z = (r + I* − 1)/[I*{λ − log(c)}]. Some care is needed in choosing the prior parameters (r, λ) as this strongly influences the tendency of the algorithm to favor the base measure F0 or collapse on relatively few clusters. We use the following two priors, Gamma(1, 1) and Gamma(0.01, 0.01), to check the sensitivity of parameter estimates due to the choice of prior on ξ. Both priors have mean one, but the latter prior has variance 100 and therefore puts mass on both large and small values of ξ.
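The two-stage update for ξ can be written in a few lines. The sketch below transcribes the two draws as they are stated in the text, with the Gamma distributions parameterized by shape and rate; the function name `update_xi` and its arguments are hypothetical, and the clipping of `c` is a numerical safeguard added here.

```python
import numpy as np


def update_xi(xi, n_clusters, r, lam, rng):
    """One data-augmentation update of the confidence parameter xi."""
    # Stage 1: latent variable c | xi, I* ~ Beta(xi + 1, I*).
    c = rng.beta(xi + 1.0, n_clusters)
    c = max(c, np.finfo(float).tiny)     # guard against log(0)
    rate = lam - np.log(c)
    # Stage 2: mixture of two Gamma distributions with weight pi_c = z / (z + 1).
    z = (r + n_clusters - 1.0) / (n_clusters * rate)
    pi_c = z / (z + 1.0)
    shape = r + n_clusters if rng.random() < pi_c else r + n_clusters - 1.0
    return rng.gamma(shape, 1.0 / rate)  # NumPy uses a scale parameter, so pass 1/rate


rng = np.random.default_rng(0)
xi = 1.0
for _ in range(5):
    xi = update_xi(xi, n_clusters=10, r=1.0, lam=1.0, rng=rng)
print(xi)
```

In a full sampler the number of clusters would change between iterations; it is held fixed at 10 here only to keep the example short.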

5 Simulation Studies

Here, we present a small simulation study to provide some empirical validity that the parameters in the complex measurement error model (2)–(3) are estimable. The structure of our simulation study mimics the PIN study data and, hence, we use the simplified biomarker model (5) with $\sigma_M^2$ known. The details of our simulation study follow below.

We begin by simulating Ti as iid standard Gaussian random variates, i = 1, …, 75 and independently generating subject-specific biases bi from a bivariate Gaussian distribution with mean zero and covariance matrix D. Then, for each subject i, and visit j = 1, 2, 3, we generate a vector of instruments that satisfy the models:

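$$Q_{ij} = \gamma_1^Q + \gamma_2^Q + \gamma_3^Q + \gamma_T^Q T_i + b_i^Q + U_{ij}^Q,$$
$$F_{ijl} = \gamma_1^F + \gamma_2^F + \gamma_3^F + \gamma_T^F T_i + b_i^F + U_{ijl}^F, \quad l = 1, 2, 3,$$

where $\mathrm{corr}(U_{ij}^Q, U_{ijl}^F) = \rho_j$, $l = 1, 2, 3$, and $\rho_2 = 0.25$ but $\rho_1 = \rho_3 = 0$. Finally, we independently simulate one unbiased biomarker $M_i$ as Gaussian with mean $T_i$ and variance $\sigma_M^2 = 0.3$. The specific values for the remaining parameters are given in Table 1.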
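A data set of this form can be generated as follows. This is a sketch rather than the authors' simulation code. It uses the "Truth" column of Table 1 and assumes one particular reading of the trimester terms, following the interpretation of the parameters in Section 3.1: the mean shift is $\gamma_1$ in the first trimester and $\gamma_1 + \gamma_j$ in trimester $j = 2, 3$. The function name `simulate_pin_like` is hypothetical.

```python
import numpy as np

# "Truth" column of Table 1.
GAMMA_Q = {"g1": 3.00, "g2": -0.75, "g3": 0.75, "gT": 0.50}
GAMMA_F = {"g1": 1.00, "g2": -0.25, "g3": 0.25, "gT": 0.90}
D = np.array([[1.25, 0.25], [0.25, 2.25]])
SIG_Q2, SIG_F2, SIG_M2 = 1.0, 1.0, 0.3
RHO = (0.0, 0.25, 0.0)


def simulate_pin_like(m=75, seed=0):
    """Return FFQs (m x 3), recalls (m x 3 x 3) and one biomarker per subject."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=m)                               # true intake, standard Gaussian
    b = rng.multivariate_normal(np.zeros(2), D, size=m)  # subject-specific biases
    q = np.empty((m, 3))
    f = np.empty((m, 3, 3))
    for j in range(3):
        # Assumed reading: trimester 1 mean is g1, trimester j > 1 mean is g1 + gj.
        shift_q = GAMMA_Q["g1"] + (0.0 if j == 0 else GAMMA_Q[f"g{j + 1}"])
        shift_f = GAMMA_F["g1"] + (0.0 if j == 0 else GAMMA_F[f"g{j + 1}"])
        # Errors (U_Q, U_F1, U_F2, U_F3): the FFQ error is correlated with each
        # recall error in the same trimester; recall errors are uncorrelated.
        cov = np.diag([SIG_Q2, SIG_F2, SIG_F2, SIG_F2])
        cov[0, 1:] = cov[1:, 0] = RHO[j] * np.sqrt(SIG_Q2 * SIG_F2)
        u = rng.multivariate_normal(np.zeros(4), cov, size=m)
        q[:, j] = shift_q + GAMMA_Q["gT"] * t + b[:, 0] + u[:, 0]
        f[:, j, :] = shift_f + GAMMA_F["gT"] * t[:, None] + b[:, [1]] + u[:, 1:]
    biomarker = t + rng.normal(scale=np.sqrt(SIG_M2), size=m)
    return q, f, biomarker


q, f, biomarker = simulate_pin_like()
print(q.shape, f.shape, biomarker.shape)
```

Fitting the model to such data would still require the MCMC sampler of Section 4, which is not sketched here.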

Table 1.

Summary of posterior means and credible sets over 500 Monte Carlo data sets, where Mean represents the Monte Carlo average posterior mean, SD represents the Monte Carlo standard deviation of posterior means, and Coverage indicates the proportion of datasets in which a 95% credible set includes the true value. γ are systematic bias parameters in the MEM and the remaining parameters are covariance parameters in the MEM. For each dataset, we drew 2000 samples from our joint posterior and treated the first 1000 as burn-in.

Parameter Truth Mean SD Coverage
γ1Q 3.00 2.99 0.18 0.93
γ2Q −0.75 −0.75 0.17 0.95
γ3Q 0.75 0.75 0.17 0.93
γTQ 0.50 0.52 0.19 0.95
γ1F 1.00 1.01 0.21 0.93
γ2F −0.25 −0.25 0.09 0.95
γ3F 0.25 0.24 0.10 0.94
γTF 0.90 0.94 0.24 0.91
D11 1.25 1.26 0.29 0.95
D12 0.25 0.24 0.27 0.94
D22 2.25 2.31 0.48 0.95
σQ2 1.00 1.04 0.12 0.95
σF2 1.00 1.01 0.06 0.94
ρ1 0.00 0.00 0.07 0.97
ρ2 0.25 0.24 0.08 0.94
ρ3 0.00 0.00 0.06 0.95
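
The Mean, SD, and Coverage columns of Table 1 can be computed from the per-data-set posterior draws roughly as follows. This is a sketch under assumptions: the 95% credible sets are taken to be equal-tailed (the text does not say which kind was used), and the function name and array layout are hypothetical.

```python
import numpy as np

def summarize_replicates(draws, truth, burn_in=1000):
    """Summarize one parameter across Monte Carlo data sets.

    draws : array of shape (n_datasets, n_samples) of posterior draws
    truth : true parameter value used to simulate the data
    """
    kept = draws[:, burn_in:]                      # discard burn-in draws
    post_means = kept.mean(axis=1)                 # one posterior mean per data set
    # ASSUMPTION: equal-tailed 95% credible set from the 2.5% and 97.5% quantiles
    lower, upper = np.percentile(kept, [2.5, 97.5], axis=1)
    coverage = np.mean((lower <= truth) & (truth <= upper))
    return post_means.mean(), post_means.std(ddof=1), coverage
```

With 500 data sets and 2000 draws each, `draws` would have shape (500, 2000) and the default `burn_in=1000` matches the caption of Table 1.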

In conclusion, we have not proven formally that the parameters in our measurement error model (2)–(3) are identified. At the same time, our simulation studies suggest that one can estimate all parameters in our measurement error model and draw correct inference from the posterior distribution using the correct likelihood specification and noninformative prior distributions.

6 Analysis of the PIN Study Data

For purposes of discussion, we split the data into two groups: women who were included in a substudy and women not in the substudy. In addition to the single FFQ, main study participants also provided serum folate measures, which were incorporated into the measurement error model in the analysis. Women in the substudy provided additional dietary information that other women were not requested to give, ideally providing three FFQs and nine 24-hour recalls (one FFQ and three 24-hour recalls per trimester for all three trimesters) during the pregnancy. For convenience, we split the i-th contribution to the likelihood (6) into two pieces through the use of indicator functions, I(·). Suppressing the parameters arising from Ti, we have:

$$L_i(Y_i, W_i) = \{L_{i,\mathrm{sub}}(W_i; \gamma, D, \sigma, \rho)\}^{I(S_i = 1)}\,\{L_{i,\mathrm{nsub}}(Y_i, W_i; \eta, \gamma_{1Q}, \sigma_F)\}^{I(S_i = 0)},$$

where Si equals one if the i-th woman belongs to the substudy and zero otherwise. Therefore, the posteriors for γ and σ will have different contributions from women in the substudy versus those not in the substudy. Of course, the posterior for Ti depends on substudy status as well as each step in the MDP implementation.

Our analysis uses 172 women from the substudy who had at least one of the nine 24-hour recalls and 1679 women in the main study. Due to the rigorous protocol of the substudy, women did not provide all 12 dietary measures. The 1679 women in the main study were chosen to have complete data for preterm birth, the three “error-free” covariables in the outcome model — height, body mass index (BMI), and dietary caloric intake (also called “energy” in our analyses below) as measured in the FFQ — and serum folate. The overall preterm birth rate in the combined data was 12.7% (236/1851). Two covariables, BMI and dietary caloric intake, were transformed using the natural logarithm. All three covariables were standardized by their sample means and standard deviations (2.6, 0.24, 0.47, respectively) and all are assumed to be error-free. With additional information on the variability in the measurements in these variables, it would be possible to relax this assumption as well. This investigation is, however, beyond the scope of this manuscript and beyond the data available to the authors. The sample variance of the unbiased serum folate biomarker is 0.40.

While non-substudy women were chosen to have complete data, the same criterion was not used to select women in the substudy because of frequent non-response. As we see in Table 2, while many women provided one 24-hour recall at each trimester (82%, 73%, and 67% at visits 1, 2, and 3, respectively), fewer provided all three 24-hour recalls for any given trimester due to the rigorous protocol. Rather than remove women with missing observations, we assumed the missing values were missing at random and used our model and MCMC methods to sample them (see Little and Rubin (2002) for a review of Gibbs sampling for missing data problems). A similar strategy was employed for missing biomarkers (only 25 biomarkers were observed from the 172 substudy women).

Table 2.

Sample mean and standard deviation for two biased measures of folate intake — dietary folate (FFQ) and 24-hour recall — from 172 women in the PIN substudy. Both FFQ and 24-hour recall measurements are reported on the log-scale with the 24-hour recall attempted three times per trimester and FFQ attempted once per trimester.

Instrument Trimester Rep N Mean SD
FFQ 1 1 97 5.92 0.46
FFQ 2 1 134 6.00 0.35
FFQ 3 1 72 6.00 0.42
24-hour recall 1 1 141 5.62 0.62
24-hour recall 1 2 104 5.61 0.68
24-hour recall 1 3 16 5.18 0.43
24-hour recall 2 1 125 5.72 0.55
24-hour recall 2 2 95 5.84 0.48
24-hour recall 2 3 5 5.55 0.58
24-hour recall 3 1 116 5.87 0.56
24-hour recall 3 2 87 5.91 0.51
24-hour recall 3 3 2 6.14 0.22

We summarize the mean parameters from the outcome model (η) and the systematic bias parameters (γ) in Table 3 and the variance parameters ($\sigma_Q^2$, $\sigma_F^2$, D, ρj) in Table 4. In Table 3, we include one column of “naive” parameter estimates, which are calculated by fitting two independent regression models with complete data: first, the logistic regression model in (1) with the true folate intake replaced by the serum folate biomarker to obtain η̂naive, and second, a linear regression of substudy FFQs and 24-hour recalls on serum folate biomarkers assuming model (2)–(3) under CM1 (ρj = 0) and no subject-specific biases (D = 0). Table 3 also summarizes the parameter estimates under CM1 for folate intake following a normal distribution and for our MDP model with ξ = 0.83, reflecting little confidence in the normality assumption. Interestingly, the protective folate effect from the naive analysis appears even stronger after adjusting for measurement error. Also, there appears to be an inverse intra-individual relationship between the FFQ and 24-hour recall (D12 = −0.22), which suggests that women who respond conservatively on the FFQ tend to respond liberally on the 24-hour recall and vice versa. In the validation study, there is some evidence that folate consumption, as measured by the 24-hour recalls, increases throughout pregnancy, though there appears to be no monotone increase when evaluating folate consumption as measured by the FFQ. Though the cost may be prohibitive, future validation studies in pregnancy might consider including serial biomarkers to help determine whether there are substantial pregnancy-related dietary changes throughout the nine-month period that would necessitate serial dietary assessments in studies of nutrition during pregnancy. In addition to the parameters in Table 3, we also estimated the odds ratio for an “IQR-increase” in folate, that is, an increase from the 25-th percentile to the 75-th percentile of the folate sample distribution. Hence, we estimated a 27% reduction in the odds [OR=0.73, (0.59–0.91)] of preterm birth for an IQR-increase in latent folate given BMI, mother’s height, and energy level. Mother’s height and BMI are important predictors of preterm birth both before and after adjusting for measurement error in the folate variable.
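
As a small worked illustration of the “IQR-increase” odds ratio: under the logistic outcome model, an increase of one IQR in latent folate multiplies the odds by exp(ηT × IQR). The IQR itself is not reported in the text, so the sketch below only back-solves the IQR that would be implied by the reported OR and a given ηT. The function names are hypothetical, and the calculation is approximate because the reported OR is presumably a posterior summary rather than a plug-in value.

```python
import math

def iqr_odds_ratio(eta_t, iqr):
    """Odds ratio for an increase of one IQR in the covariate."""
    return math.exp(eta_t * iqr)

def implied_iqr(odds_ratio, eta_t):
    """IQR that reproduces a given odds ratio for a given log-odds slope."""
    return math.log(odds_ratio) / eta_t

# Reported values: OR = 0.73; eta_T = -0.48 (MDP) or -0.46 (Normal) in Table 3
iqr_mdp = implied_iqr(0.73, -0.48)     # roughly 0.66 on the standardized scale
iqr_norm = implied_iqr(0.73, -0.46)    # roughly 0.68
reduction = 1.0 - iqr_odds_ratio(-0.48, iqr_mdp)  # 0.27, i.e. a 27% reduction
```

The same two functions apply to the interval endpoints, so an interval for ηT maps directly to an interval for the odds ratio.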

Table 4.

Summary of variance component estimates (with posterior standard deviations in parentheses) from MCMC analyses of the PIN study using a normal prior distribution on true folate concentration. Correlation models (CM1–CM3) refer to different assumptions about the correlation among errors of contemporaneous instruments and are described in Section 3.

Parameter CM1* CM2 CM3
D11 0.54 (0.09) 0.51 (0.09) 0.46 (0.08)
D12 −0.21 (0.03) −0.23 (0.04) −0.24 (0.05)
D22 0.08 (0.01) 0.10 (0.02) 0.13 (0.03)
σQ2 0.43 (0.02) 0.43 (0.02) 0.43 (0.02)
σF2 1.19 (0.05) 1.19 (0.05) 1.18 (0.05)
ρ1 0* −0.02 (0.02) −0.11 (0.04)
ρ2 0* −0.02 (0.02) 0.03 (0.05)
ρ3 0* −0.02 (0.02) 0.02 (0.04)
* Model 1 (CM1) sets ρj = 0, j = 1, 2, 3.

Table 3.

Analysis results from the PIN study making parametric assumptions about true folate and a common correlation parameter among contemporaneous instruments. The “naive” analysis refers to two independent, complete-case analyses which replace the true folate random variable with the serum folate biomarker. γ refers to systematic bias parameters in the measurement error model. Posterior means from 6000 Gibbs samples, with the first 4500 treated as burn-in, are reported with posterior standard deviations in parentheses.

Parameter Naive Normal MDP (ξ = 0.83)
Intercept (η0) −1.95 (0.08) −1.99 (0.08) −1.99 (0.08)
Folate (ηT ) −0.27 (0.11) −0.46 (0.15) −0.48 (0.14)
Height (ηZ1) −0.07 (0.03) −0.08 (0.03) −0.08 (0.03)
BMI (ηZ2) 0.68 (0.30) 0.65 (0.31) 0.63 (0.31)
Energy (ηZ3) −0.01 (0.15) −0.01 (0.16) −0.01 (0.15)
γ1Q 5.90 (0.16) 5.97 (0.02) 5.97 (0.02)
γ2Q 0.18 (0.19) 0.11 (0.07) 0.10 (0.07)
γ3Q 0.20 (0.23) −0.34 (0.08) −0.35 (0.08)
γTQ −0.13 (0.13) 0.13 (0.03) 0.13 (0.03)
γ1F 5.27 (0.09) 5.31 (0.07) 5.34 (0.06)
γ2F 0.25 (0.12) 0.31 (0.09) 0.31 (0.09)
γ3F 0.59 (0.13) 0.56 (0.09) 0.55 (0.09)
γTF −0.01 (0.08) −0.02 (0.05) −0.02 (0.04)

The proposed measurement error model (2)–(4) is richly parameterized, and our analyses did not find substantial differences among parameter estimates in models of increasing complexity. Hence, it may be preferable to select the most parsimonious model and eliminate unnecessary complexity in the measurement error model. To facilitate model comparisons, we use the deviance information criterion (DIC; Spiegelhalter et al., 2002) and compare the correlation models (CM1–CM3) in addition to one simpler model, “CM1 + {D = 0}”, which allows for no subject-specific biases in the FFQ or 24-hour recall. Our results are displayed in Table 5 using the following additional notation: Δ̄ is the posterior mean of minus twice the log likelihood, Δ* is minus twice the log likelihood evaluated at the posterior means of all parameters, pδ = Δ̄ − Δ* is the effective number of parameters, and DIC = Δ̄ + pδ. We immediately notice that pδ is strikingly large, again emphasizing the large number of unknown variables in our model. Recall that each latent folate variable Ti is regarded as an unknown variable in addition to all missing FFQs, 24-hour recalls, and biomarkers in the validation substudy. Our model comparison suggests that a model with no correlation among contemporaneous measurements and no subject-specific biases is the best model. The effective number of parameters in this simple model is approximately 160 fewer than in model CM1 due to the latent subject-specific biases b1i, b2i, which are absent when D = 0. However, when we believe that D ≠ 0 and focus only on CM1–CM3, we find that CM2 is the best model among the three, which suggests that a model that considers non-zero correlations among contemporaneous instruments is useful.

Table 5.

Model comparison using deviance information criterion (DIC). {D = 0} implies D11 = D12 = D22 = 0 which implies no subject-specific biases (no heterogeneity) in the FFQ or 24-hour recall. Δ̄ is the posterior mean of minus twice the log likelihood, pδ is the effective number of parameters, Δ* is minus twice the log likelihood evaluated at the posterior means of the parameters, and DIC = Δ̄ + pδ.

Model Δ̄ pδ Δ* DIC
CM1 + {D = 0} 8841.5 2893.3 5948.2 11734.8
CM1 8836.9 3053.2 5783.7 11890.1
CM2 8834.0 3052.2 5781.9 11886.3
CM3 8836.6 3055.1 5781.5 11891.7
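
The entries of Table 5 can be checked with a few lines of arithmetic. The sketch below recomputes pδ as Δ̄ minus Δ* and DIC as Δ̄ plus pδ from the tabulated Δ̄ and Δ* values; the dictionary and function names are hypothetical, and recomputed values can differ from the table by 0.1 because the inputs are already rounded.

```python
def dic_summary(mean_deviance, deviance_at_mean):
    """Return (effective number of parameters, DIC)."""
    p_delta = mean_deviance - deviance_at_mean
    return p_delta, mean_deviance + p_delta

# (mean deviance, deviance at posterior means) as reported in Table 5
table5 = {
    "CM1 + {D = 0}": (8841.5, 5948.2),
    "CM1": (8836.9, 5783.7),
    "CM2": (8834.0, 5781.9),
    "CM3": (8836.6, 5781.5),
}

results = {name: dic_summary(*vals) for name, vals in table5.items()}
best_overall = min(results, key=lambda k: results[k][1])
best_with_d = min(("CM1", "CM2", "CM3"), key=lambda k: results[k][1])

for name, (p_delta, dic) in results.items():
    print(f"{name:14s} p_delta = {p_delta:7.1f}  DIC = {dic:8.1f}")
print("smallest DIC overall:", best_overall)
print("smallest DIC among CM1-CM3:", best_with_d)
```

Smaller DIC is preferred, which is why the comparison singles out “CM1 + {D = 0}” overall and CM2 among the models with subject-specific biases.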

In Tables 3–4, we presented parameter estimates which we claim are relatively insensitive to the confidence parameter ξ. To investigate this claim further, we ran more than 60 MDP analyses of the PIN study data with confidence parameters ranging from 0.01 to 10,000. We found that posterior means and standard deviations from an MDP analysis using confidence parameters greater than 50 did not change significantly. In Figure 1, we plot the number of unique clusters of T, i.e. I*, and the posterior means of five folate-related parameters as a function of the confidence parameter ξ and then fit a cubic spline to the points to illustrate the average trend. Our empirical findings suggest that the parameter estimates do not change significantly once the number of unique clusters of T gets beyond 120 or so, on average, as we see in Figure 1(a). In panels (b)–(f) of Figure 1, we graph the posterior means of the five parameters most significantly impacted by choosing ξ sufficiently small. We note that all five parameters tend to decrease as ξ approaches zero. For example, the posterior mean of ηT is approximately −0.48 when ξ is small but −0.46 for large ξ, the latter of which corresponds to the normality assumption in Table 3. At the same time, we emphasize a word of caution when drawing conclusions from these figures, as the variability in posterior means cannot be ignored, particularly for small ξ. Moreover, the average change in posterior means from ξ ≈ 0.05 to ξ ≈ 50 may be extremely small, e.g. less than 0.01 for γTQ and less than 0.005 for γTF.
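
For intuition about why small ξ collapses T onto few clusters, the prior expected number of clusters under a Dirichlet process with m observations is the sum over i = 1, …, m of ξ/(ξ + i − 1), a standard result (Antoniak, 1974). The sketch below evaluates this prior quantity only. It is not the posterior I* plotted in Figure 1(a), and taking m = 1851 (all women in the combined data) is an assumption made for illustration.

```python
import numpy as np

def prior_expected_clusters(xi, m):
    """Prior mean number of clusters for a Dirichlet process with parameter xi."""
    i = np.arange(m)                    # i = 0, ..., m - 1
    return float(np.sum(xi / (xi + i)))

# ASSUMPTION: m = 1851 latent folate values, one per woman
for xi in (0.01, 0.83, 50.0, 10000.0):
    print(f"xi = {xi:8.2f}  prior E[I*] = {prior_expected_clusters(xi, 1851):8.1f}")
```

The expectation increases monotonically in ξ, from just above 1 for ξ near zero toward m as ξ grows, which matches the qualitative role of ξ as a confidence parameter.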

Figure 1.

The effect of the confidence parameter ξ on latent folate concentration parameters in an MDP analysis of the PIN study data. I* is the number of unique values of T; μT and σT are the mean and standard deviation of T, respectively; ηT is the folate effect on pre-term birth; γTQ and γTF are the systematic biases in the FFQ and 24-hour recall, respectively.

7 Discussion

We have presented a Bayesian semi-parametric method to estimate parameters from a generalized linear measurement error model with a structured measurement error model, and applied the method to an analysis of the PIN data. Our first method assumes that true long-term folate is normally distributed, while the second method, which uses a mixture of Dirichlet process (MDP) prior framework, does not. We found results based on a naive model, which replaces true long-term folate by the observed serum folate, to be somewhat conservative when compared to results based on our calibrated analysis. Furthermore, the results presented in Tables 3–4 appeared to be insensitive to the normality assumption on folate intake when compared to those from the MDP analysis for modest values of ξ.

In general, there has been mixed evidence in the literature about whether the instruments under- or over-estimate intake. In the past, FFQs have been shown both to under-estimate intake with respect to food records (Brown et al., 1996) and to over-estimate intake relative to food records (Suitor et al., 1989; Greeley et al., 1992; Forsythe and Gage, 1994; Erkkola et al., 2001; Robinson et al., 1996). The PIN raw data show some evidence of underestimation of dietary intake in the FFQ versus 24-hour recalls in the second and third trimesters, but this was not significant using tests of means. Food records themselves tend to underestimate dietary intake compared to the true gold standard, doubly-labeled water (Goldberg et al., 1993), under certain weight-stable conditions. As one anonymous referee pointed out, even doubly-labeled water may have additional measurement error associated with it, although we expect that error to be much smaller than the error associated with either the FFQ or 24-hour recall.

The PIN study data is unique among nutritional epidemiology studies of dietary intake for many reasons, one of which is the collection of an FFQ and biomarker in the main study. Typically, a study will collect the FFQ in the main study and then conduct a validation substudy to determine the relationship between the FFQ and biomarker. As suggested by an anonymous referee, it would be interesting to see how our analytic results changed once we removed the biomarker in the main study. We conducted these analyses, including the model comparison in Table 5, and found that our results are sensitive to the removal of this data. First, the measurement error model is too complex for the observed validation data in the PIN substudy. In addition to removing the correlation parameters (ρj) and subject-specific biases (i.e. D = 0), a substantial simplification of the trimester-level means (αjQ, αjF) in (2)–(3) would be necessary. Second, the estimated FFQ-biomarker association using only the PIN substudy is too weak and, hence, after removing the biomarker in the main study, the posterior means of η in the outcome model (1) look more like the “naive” estimates than calibrated or corrected estimates. Thus, the parameter estimates presented above do require the biomarker in the main study in an analysis of the PIN study data. In general, however, we conjecture that all parameters in the measurement error model (2)–(4) are estimable given sufficient data in the validation substudy. Therefore, our models and methods are not limited to studies which collect biomarkers in the main study.

Our analysis used serum folate biomarkers as unbiased measures of folate concentration. For the PIN study data, serum biomarkers were analyzed in one of four batches with over 80% of the sample analyzed in the first batch (specifically, the sample proportions were approximately 0.87, 0.05, 0.06, 0.02, for batches one through four, respectively). Siega-Riz et al. (2004) found that batch differences were non-negligible and should be included in analyses using the serum biomarkers. Our analyses used the first batch as the reference group and placed vague, normal prior distributions on the remaining three batch effects. This additional caveat adds nothing novel to the overall measurement error model (2)–(4) and was easily incorporated into our Bayesian framework in Section 4. Finally, while the serum folate biomarker is believed to be free from systematic biases, it is not without drawbacks, which involve individual-specific factors such as personal rates of metabolism. In an ideal experiment, one would use an objective biomarker, like doubly-labeled water, rather than serum folate. Doubly-labeled water is a measure of energy expenditure and intake (under certain weight-stable conditions) and often regarded among the “best” biomarkers; however, it is not a true biomarker for any particular nutrient.

Acknowledgments

We thank the associate editor and two anonymous referees for helpful comments and suggestions which led to a much improved manuscript. The research of the first author was supported in part by grants from the National Institute of Environmental Health Sciences (P30ES10126, T32ES007018). Dr. Herring’s research was supported in part by grants from the National Institute of Child Health and Human Development (1R03HD045780, HD37584, HD39373) and NIEHS (P30ES10126). The PIN study was funded by grants from NICHD (HD28684, HD05798, DK55865, AG09525), UNC Clinical Nutrition Research Center (DK56350), UNC General Clinical Research Resources (RR00046), and funds from the Wake Area Health Education Center in Raleigh, NC. Dr. Ibrahim’s research was partially supported by NIH grants #CA 70101, #CA 69222, #GM 070335, and #AI 060373.

Appendix 1: Full conditional distributions

Let Dobs denote the observed data and rest be short-hand for all remaining parameters. Recall that Yi is a Bernoulli outcome with canonical link function so that

$$\log L_i(Y_i \mid T_i, Z_i; \eta) = Y_i(\eta_0 + \eta_T T_i + \eta_Z' Z_i) + \log\{1 - \theta_i(\eta)\},$$

where $\theta(u) = 1/(1 + e^{-u})$.
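
A numerically stable way to evaluate this log-likelihood contribution is sketched below, taking θi to be the success probability under the logistic link, so that log{1 − θi} equals −log(1 + exp(linear predictor)). The function name and argument layout are hypothetical.

```python
import numpy as np

def log_lik_outcome(y, t, z, eta0, eta_t, eta_z):
    """Bernoulli log-likelihood contributions under the canonical (logit) link.

    y : (n,) array of 0/1 outcomes
    t : (n,) array of the error-prone covariate (latent T_i)
    z : (n, p) array of error-free covariates
    """
    lin = eta0 + eta_t * t + z @ eta_z
    # log(1 - theta) = -log(1 + exp(lin)); logaddexp avoids overflow
    return y * lin - np.logaddexp(0.0, lin)

# Usage with the posterior means from the Normal column of Table 3
y = np.array([0.0, 1.0])
t = np.array([0.5, -1.0])
z = np.zeros((2, 3))                    # standardized covariates at their means
ll = log_lik_outcome(y, t, z, -1.99, -0.46, np.array([-0.08, 0.65, -0.01]))
```

Summing `ll` over subjects gives the log-likelihood term that appears in the full conditionals for η and for each Ti.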

1. Sample [η|rest, Dobs] from p(η|rest, Dobs) using Adaptive Rejection Sampling (Gilks and Wild, 1992) where,

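$$p(\eta \mid \text{rest}, D_{obs}) \propto \exp\left\{\sum_{i=1}^{m} \log L_i(\eta) - \tfrac{1}{2}(\eta - \eta_0)' V_{0,\eta}^{-1} (\eta - \eta_0)\right\}.$$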

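2. Sample [γ|rest, Dobs] from $N\{\Lambda_\gamma \hat{\gamma} + (I - \Lambda_\gamma)\gamma_0,\ \Lambda_\gamma (H'H)^{-1}\}$, where $\Lambda_\gamma = (H'H + V_0)^{-1} H'H$ and $\hat{\gamma} = (H'H)^{-1} H'(W - Rb)$.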

3. Let b = (b1, …, bm)′. Sample [b|rest, Dobs] from $N(\Lambda_b \hat{b},\ \Sigma_W^{-1} \Lambda_b (R'R)^{-1})$, where $\Lambda_b = (R'R + I_m \otimes D^{-1})^{-1} R'R$, ⊗ is the Kronecker product, and $\hat{b} = (R'R)^{-1} R'(W - H\gamma)$.

4. Sample $[D^{-1} \mid \text{rest}, D_{obs}] \sim W_q\big(m + \nu_D,\ (C_D^{-1} + (\sum_{i=1}^{m} b_i b_i')^{-1})\big)$.

5. Sample (σ, ρ) from p(σ, ρ|rest, Dobs) where

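$$p(\sigma, \rho \mid \text{rest}, D_{obs}) \propto |\Sigma_W|^{-1/2} \exp\left\{-\tfrac{1}{2}(W - H\gamma - Rb)' \Sigma_W^{-1} (W - H\gamma - Rb)\right\} \pi(\sigma)\,\pi(\rho).$$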

6. Sample the error-prone covariate [Ti|rest, Dobs] from p(Ti|rest, Dobs) for i = 1, …, m where

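$$p(T_i \mid \text{rest}, D_{obs}) \propto \exp\left\{\log L_i(\eta) - \tfrac{1}{2}(W_i - H_i\gamma - R_i b_i)' \Sigma_{W_i}^{-1} (W_i - H_i\gamma - R_i b_i) - \tfrac{1}{2}(T_i - \mu_T)' \Sigma_T^{-1} (T_i - \mu_T)\right\}.$$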

7. Sample the missing FFQs and 24-hour recalls in the substudy assuming the observations are missing at random, leading to $[W_i^{miss} \mid \text{rest}, D_{obs}] \sim N(H_i\gamma + R_i b_i,\ \Sigma_{W_i})$.

8. Sample the missing biomarkers from $[M_i^{miss} \mid \text{rest}, D_{obs}] \sim N(T_i + b_{i3},\ \sigma_M^2)$, assuming missing observations are missing at random.

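9. Sample $[\mu_T \mid \text{rest}, D_{obs}] \sim N(\Lambda_T \bar{T} + (I - \Lambda_T)\mu_{0,T},\ m^{-1}\Lambda_T\Sigma_T)$, where $\Lambda_T = V_{0,T}(m^{-1}\Sigma_T + V_{0,T})^{-1}$ and $\bar{T} = m^{-1}\sum_{i=1}^{m} T_i$.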

10. Sample $[\Sigma_T \mid \text{rest}, D_{obs}] \sim W_{p_T}\big(m + \nu_T,\ C_T + (\sum_{i=1}^{m} T_i T_i')^{-1}\big)$.

For the MDP implementation, substitute all of Subsection 4.1 for Step 6.
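
As an illustration of one Gibbs step, Step 9 can be read as the usual conjugate normal update for the mean of the latent covariate. The sketch below covers only the scalar case, where the shrinkage weight is the prior variance divided by the sum of the prior variance and the sampling variance of the mean. The function name and arguments are hypothetical, and this is a simplified reading rather than the authors' code.

```python
import numpy as np

def sample_mu_T(T, sigma2_T, mu0, v0, rng):
    """Draw mu_T from its full conditional in the scalar case.

    T        : (m,) current values of the latent covariate
    sigma2_T : current variance of T
    mu0, v0  : prior mean and prior variance of mu_T
    """
    m = T.size
    lam = v0 / (sigma2_T / m + v0)             # shrinkage weight in (0, 1)
    mean = lam * T.mean() + (1.0 - lam) * mu0  # weighted average of data and prior
    var = lam * sigma2_T / m                   # posterior variance
    return rng.normal(mean, np.sqrt(var))

# Usage: with a vague prior (large v0) the draw concentrates near the sample mean
rng = np.random.default_rng(1)
T = rng.standard_normal(200)
mu_draw = sample_mu_T(T, sigma2_T=1.0, mu0=0.0, v0=1e6, rng=rng)
```

A tight prior (small v0) pulls the draw toward mu0 instead, which is the usual precision-weighted compromise.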

References

  1. Antoniak CE. Mixtures of Dirichlet processes with applications to non-parametric problems. Annals of Statistics. 1974;2:1152–1174.
  2. Block G. Invited Commentary: Another Perspective on Food Frequency Questionnaires. American Journal of Epidemiology. 2001;154:1103–1104. doi: 10.1093/aje/154.12.1103.
  3. Breslow NE, Day NE. Statistical Methods in Cancer Research. Lyon: International Agency for Research on Cancer; 1980.
  4. Brown ER, Ibrahim JG. A Bayesian Semiparametric Joint Hierarchical Model for Longitudinal and Survival Data. Biometrics. 2003;59:221–228. doi: 10.1111/1541-0420.00028.
  5. Brown JE, Buzzard IM, Jacobs DR, Hannan PJ, Kushi LH, Barosso GM, Schmid LA. A food frequency questionnaire can detect pregnancy-related changes in diet. Journal of the American Dietetic Association. 1996;96:262–266. doi: 10.1016/S0002-8223(96)00078-8.
  6. Byers T. Food Frequency Dietary Assessment: How Bad is Good Enough? American Journal of Epidemiology. 2001;154:1087–1088. doi: 10.1093/aje/154.12.1087.
  7. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman & Hall/CRC; 1996.
  8. Carroll RJ. Variances are not always nuisance parameters. Biometrics. 2003;59:211–220. doi: 10.1111/1541-0420.t01-1-00027.
  9. Carroll RJ, Ruppert D, Crainiceanu C, Tosteson T, Karagas M. Nonlinear and nonparametric regression and instrumental variables. Journal of the American Statistical Association. 2004;99:736–750.
  10. Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Boca Raton: Chapman & Hall/CRC; 1995.
  11. Erkkola M, Karppinen M, Javanainen J, Rasanen L, Knip M, Virtanen SM. Validity and reproducibility of a food frequency questionnaire for pregnant Finnish women. American Journal of Epidemiology. 2001;154:466–476. doi: 10.1093/aje/154.5.466.
  12. Escobar MD. Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association. 1994;89:268–277.
  13. Geman S, Geman D. Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images. IEEE Trans on Pattern Analysis and Machine Intelligence. 1984;6:721–741. doi: 10.1109/tpami.1984.4767596.
  14. Goldberg GR, Prentice AM, Coward WA, Davies HL, Murgatroyd PR, Wensing C, Black AE, Harding M, Sawyer M. Longitudinal assessment of energy expenditure in pregnancy by the doubly labeled water method. American Journal of Clinical Nutrition. 1993;57:494–505. doi: 10.1093/ajcn/57.4.494.
  15. Greeley S, Storbakken L, Magel R. Use of a modified food frequency questionnaire during pregnancy. Journal of the American College of Nutrition. 1992;11:728–734. doi: 10.1080/07315724.1992.10718274.
  16. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.
  17. Holcroft C, Spiegelman D. Design of validation studies for estimating the odds ratio of exposure-disease relationships when exposure is misclassified. Biometrics. 1999;55:1193–1201. doi: 10.1111/j.0006-341x.1999.01193.x.
  18. Kaaks R, Riboli E, Esteve J, Van Kappel A, Van Staveren W. Estimating the accuracy of dietary questionnaire assessments: validation in terms of structural equation models. Statistics in Medicine. 13:127–142. doi: 10.1002/sim.4780130204.
  19. Kipnis V, Midthune D, Freedman LS, Bingham S, Schatzkin A, Subar A, Carroll RJ. Empirical evidence of correlated biases in dietary assessment instruments and its implications. American Journal of Epidemiology. 2001;153:394–403. doi: 10.1093/aje/153.4.394.
  20. Kipnis V, Subar A, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: Results of the OPEN biomarker study. American Journal of Epidemiology. 2003;158:14–21. doi: 10.1093/aje/kwg091.
  21. Kleinman KP, Ibrahim JG. A Semi-Parametric Bayesian Approach to Generalized Linear Mixed Models. Statistics in Medicine. 1998;17:2579–2596. doi: 10.1002/(sici)1097-0258(19981130)17:22<2579::aid-sim948>3.0.co;2-p.
  22. Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley; 1987.
  23. MacEachern SN. Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics: Simulation and Computation. 1994;23:727–741.
  24. Mallick B, Hoffman FO, Carroll RJ. Semiparametric regression modeling with mixtures of Berkson and classical error, with application to fallout from the Nevada Test Site. Biometrics. 2002;58:13–20. doi: 10.1111/j.0006-341x.2002.00013.x.
  25. McCullagh P, Nelder JA. Generalized Linear Models. London: Chapman & Hall/CRC; 1983.
  26. Metropolis N, Ulam S. The Monte Carlo method. Journal of the American Statistical Association. 1949;44:335–341. doi: 10.1080/01621459.1949.10483310.
  27. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. Journal of Chemical Physics. 1953;21:1087–1092.
  28. Morrissey MJ, Spiegelman D. Matrix methods for estimating odds ratios with misclassified exposure data: Extensions and comparisons. Biometrics. 1999;55:338–344. doi: 10.1111/j.0006-341x.1999.00338.x.
  29. Robinson S, Godfrey K, Osmond C, Cox V, Barker D. Evaluation of a food frequency questionnaire used to assess nutrient intakes in pregnant women. European Journal of Clinical Nutrition. 1996;50:302–308.
  30. Savitz DA, Dole N, Williams J, Thorp JM, McDonald T, Carter CA, Eucker B. Study design and determinants of participation in an epidemiologic study of preterm delivery. Paediatric and Perinatal Epidemiology. 1999;13:114–125. doi: 10.1046/j.1365-3016.1999.00156.x.
  31. Savitz DA, Dole N, Terry JW, Zhou H, Thorp JM. Smoking and pregnancy outcome among African-American and white women in central North Carolina. Epidemiology. 2001;12:636–642. doi: 10.1097/00001648-200111000-00010.
  32. Savitz DA, Henderson L, Dole N, Herring A, Wilkins DG, Rollins D, Thorp JM. Indicators of cocaine exposure and preterm birth. Obstetrics and Gynecology. 2002;99:458–465. doi: 10.1016/s0029-7844(01)01735-5.
  33. Searle SR. Linear Models. New York: John Wiley & Sons, Inc; 1971.
  34. Siega-Riz AM, Savitz DA, Zeisel SH, Thorp JM, Herring AH. Second trimester folate status and preterm birth. American Journal of Obstetrics and Gynecology. 2004;191:1851–1857. doi: 10.1016/j.ajog.2004.07.076.
  35. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). J Roy Statist Soc, Ser B. 2002;64:583–639.
  36. Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration for logistic regression in main study/internal validation study designs. Statistics in Medicine. 2001;20:139–161. doi: 10.1002/1097-0258(20010115)20:1<139::aid-sim644>3.0.co;2-k.
  37. Spiegelman D, Rosner B, Logan R. Estimation and inference for binary data with covariate measurement error and misclassification for main study/validation study designs. Journal of the American Statistical Association. 2000;95:51–61.
  38. Spiegelman D, Zhao B, Kim J. Correlated errors in biased surrogates: study designs and methods for measurement error correction. Statistics in Medicine. 2004. doi: 10.1002/sim.2055. In press.
  39. Stefanski LA, Boos DD. The calculus of M-estimation. The American Statistician. 2002;56:29–38.
  40. Stefanski LA, Carroll RJ. Covariate measurement error in logistic regression. Annals of Statistics. 1985;13:1335–1351.
  41. Subar AF, Kipnis V, Troiano RP, Midthune D, Schoeller DA, Bingham S, Sharbaugh CO, Trabulsi J, Runswick S, Ballard-Barbash R, Sunshine J, Schatzkin A. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN Study. American Journal of Epidemiology. 2003;158:1–13. doi: 10.1093/aje/kwg092.
  42. Suitor CJ, Gardner J, Willett WC. A comparison of food frequency and diet recall methods in studies of nutrient intake of low-income pregnant women. Journal of the American Dietetic Association. 1989;89:1786–1794.
  43. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association. 1987;82:528–550.
  44. United States Department of Agriculture Food & Nutrition Service. WIC Dietary Assessment Validation Study Executive Summary. Freeman Sullivan and Company; San Francisco, CA 94105: 1994.
  45. Willett W. Nutritional Epidemiology. New York: Oxford University Press; 1998.
  46. Willett W. Invited Commentary: A Further Look at Dietary Questionnaire Validation. American Journal of Epidemiology. 2001;154:1100–1102. doi: 10.1093/aje/154.12.1100.
