Author manuscript; available in PMC: 2023 Jun 15.
Published in final edited form as: Stat Biosci. 2022 Jul 12;15(1):57–113. doi: 10.1007/s12561-022-09349-3

Using Controlled Feeding Study for Biomarker Development in Regression Calibration for Disease Association Estimation

Cheng Zheng 1, Yiwen Zhang 2, Ying Huang 3, Ross Prentice 4
PMCID: PMC10270384  NIHMSID: NIHMS1888533  PMID: 37324058

Abstract

Correction for systematic measurement error in self-reported data is an important challenge in association studies of dietary intake and chronic disease risk. The regression calibration method has been used for this purpose when an objectively measured biomarker is available. However, a big limitation of the regression calibration method is that biomarkers have only been developed for a few dietary components. We propose new methods to use controlled feeding studies to develop valid biomarkers for many more dietary components and to estimate the diet disease associations. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulation is performed to study the finite sample performance of the proposed estimators. We applied our method to examine the associations between the sodium/potassium intake ratio and cardiovascular disease incidence using the Women’s Health Initiative cohort data. We discovered positive associations between sodium/potassium ratio and the risks of coronary heart disease, nonfatal myocardial infarction, coronary death, ischemic stroke, and total cardiovascular disease.

Keywords: Measurement Error, Regression Calibration, Feeding Study, Biomarker, Cardiovascular Disease

1. Introduction

There is an urgent need to obtain reliable information on dietary patterns that can reduce the risks of various chronic diseases, such as cancer, cardiovascular disease (CVD) and diabetes. Although a positive association between obesity and cancer risk is well established (Adams et al., 2006), epidemiological studies have not shown convincing evidence that key energy balance factors, including total energy intake, are risk factors for major chronic diseases (WCRF/AICR, 2007). A likely cause of this apparent discrepancy is bias in dietary assessment, which is known to be challenging to deal with; see, e.g., Paeratakul et al. (1998). There is strong evidence (Prentice et al., 2011) that the misreporting of dietary energy intake is related to individual characteristics (for example, body mass index (BMI)). Such systematic measurement error leads to bias that cannot be automatically corrected (Carroll et al., 2006). The challenge of measurement error and its impact on nutritional studies have been reviewed by Freedman et al. (2011). Methods to deal with dietary measurement error in nutritional studies have been studied for years, and various approaches have been proposed (Bartlett & Keogh, 2018, Hu & Lin, 2002, Huang & Wang, 2000, Li & Ryan, 2006, Song & Huang, 2005, Wang et al., 1997, Yan & Yi, 2015, Zucker, 2005). Among these methods, we focus on the regression calibration method, which allows us to handle covariate-dependent measurement error. Regression calibration has been proposed to correct measurement error in covariates (Prentice, 1982, Rosner et al., 1990) and has the advantage of being easy to implement.
Previous methodological work, and its applications to the Women’s Health Initiative (WHI) (Prentice, 1982, Prentice et al., 2011, Shaw & Prentice, 2012, Zheng et al., 2014), has shown the validity and value of using (joint) regression calibration approaches to tackle this issue when objective measurements are available to be used as biomarkers of dietary intakes. These biomarkers are used to build calibration equations for self-reported measurements of the exposures of interest. Calibrated intake estimates can then be used to estimate associations between these dietary exposures and the risks of various diseases.

There remains an important research gap in building satisfactory biomarkers for many nutritional (and physical activity) variables using only univariate objective measurements. Therefore, regression models have been used to build biomarkers with multiple predictor variables. For example in the WHI Nutrition and Physical Activity Assessment Study (NPAAS), to correct systematic measurement error in the self-reported food frequency questionnaire (FFQ) data from the full cohort (with 161,808 subjects), blood and urine measurements were collected for a subgroup (450 subjects) of the cohort (Prentice et al., 2011). Besides, a feeding study was performed on another smaller subgroup (153 subjects) where both blood and urine measurements and the provided dietary intake information were collected (Lampe et al., 2017). This controlled feeding study used a novel design. Rather than feed all women the same standard diets, each woman was provided food that mimicked her habitual diet as described by her 4-day food record (4FDR) with adjustment based on individual discussion with the study dietitian (Lampe et al., 2017).

One might consider a natural strategy of developing calibration equations using biomarkers developed from the feeding study: First, with the 153 subjects from the feeding study, one builds a model to predict the provided dietary intakes using the blood and urine measurements along with some personal characteristics of the subjects (Lampe et al., 2017). Second, the ‘true’ dietary intake is estimated for the 450 subjects using their blood and urine measurements and the prediction model from the first step, and a calibration equation for the self-reported dietary intakes can then be established. Previously developed regression calibration methods (e.g. Zheng et al. (2014)) used an externally developed biomarker, which plausibly satisfies the classical measurement error assumption. The classical measurement error assumption requires the measurement error to be independent of the true value. This assumption, however, will generally be violated by this feeding-study-based biomarker development procedure because the residual of the regression model is independent of the predicted value rather than the true value. Ignoring this violation of the classical measurement error assumption causes bias in the subsequent estimates of the calibrated dietary intake and the diet-disease association. To tackle this issue, in this project, we aim to develop new calibration methods that allow for Berkson-type errors (i.e., an error that is independent of the error-prone variable rather than the error-free variable (Carroll et al., 2006)) in the biomarkers. Using our new methods, we establish consistent estimators for diet-disease associations; we incorporate variation in the biomarker construction step and the calibration step when estimating asymptotic distributions. From these, one can build valid confidence intervals for disease association parameters.

To evaluate the performance of the new methods and demonstrate their real data application, we applied the new methods to evaluate the association between the dietary sodium to potassium ratio and CVD risk. This association has been evaluated in a recent WHI study using a regression calibration method where the nutrient intake was calibrated by a single measurement biomarker (from a single 24-hour urine collection) (Huang et al., 2013, Prentice et al., 2017). In a different cohort, the large international Prospective Urban Rural Epidemiology (PURE) Study measured sodium and potassium intakes using fasting morning spot urine (O’Donnell et al., 2014). These previous studies suggested positive associations between CVD and sodium to potassium ratio. However, the single measure biomarker is suboptimal for the performance of the regression calibration approach, as it has a lower than desirable correlation with the true dietary intakes in the WHI context. The data from NPAAS provide a golden opportunity to build potentially stronger biomarkers for sodium and potassium intakes into disease association studies using novel regression methods.

2. Framework and Notation

We are interested in studying the association between a specific dietary intake $Z \in \mathbb{R}$ (for example the (log-transformed) dietary sodium-potassium ratio) and the time, $T$, to development of a certain chronic disease. We further consider some potential confounding variables, which we call personal characteristics $V \in \mathbb{R}^q$, where $q$ is the number of covariates. We use a Cox model to model the hazard of the response:

$$\lambda(t \mid Z, V, Q) = \lambda(t \mid Z, V) = \lambda_0(t)\exp\{(Z, V^\top)\theta\}, \qquad (1)$$

where $\theta = (\theta_z, \theta_v^\top)^\top \in \mathbb{R}^{1+q}$, $\theta_z$ is the parameter of interest, and $\lambda_0(t)$ is a ‘baseline’ hazard function. However, instead of observing $Z$, we only collect data on the self-reported dietary intake $Q \in \mathbb{R}$, which may be biased from $Z$ in a manner that depends on personal characteristics:

$$Q = (1, Z, V^\top)a + \epsilon_q, \qquad (2)$$

where $a \in \mathbb{R}^{2+q}$ is an unknown parameter vector and $\epsilon_q$ is a mean 0 random error that is independent of $Z$ and $V$.

The main purpose of the regression calibration equation development is to discover the relationship between $Q$ and $Z$. In the WHI feeding study, we provide the subjects’ diets using standardized food, in a manner that mimics the subjects’ usual diet, while using dietary components having well-characterised nutrient content (Lampe et al., 2017). We denote the true unobservable dietary intake within this two-week feeding period as $X = Z + \epsilon_x$, where $\epsilon_x \sim N(0, \sigma_x^2)$ is independent of $Z$ and $V$. A subtle issue with the feeding study relates to measurement error induced by food packaging. For example, a bag of chips that is labeled with 100 calories in energy could actually have 101 calories. Therefore, the observed short-term dietary intake $\tilde{X}$ in the feeding study can be modeled as $\tilde{X} = X + \tilde{\epsilon}_x$, where $\tilde{\epsilon}_x \sim N(0, \tilde{\sigma}_x^2)$ is independent of $\epsilon_x$, $Z$ and $V$. Figure 1 displays the study design, including the feeding study (Sample 1) for biomarker development, the biomarker substudy (Sample 2) for calibration equation development, and the full cohort (Sample 3) for disease association analyses. When a self-reported intake $Q$ is also available for the feeding study samples, the bias of self-reported dietary intake can be calibrated directly (see Section 3.4). However, the concurrent self-reported dietary intake $Q$ is typically unavailable in Sample 1. To obtain it, one would need a long-term feeding study where individuals report their provided dietary intakes over preceding months (e.g. 3 months). In addition, the $Q$ value obtained just prior to the feeding period in NPAAS-FS is not collected at the same time as the biomarker $W$, and it might be inappropriately strongly correlated with $\tilde{X}$. Therefore, in practice, our best choice is to use the baseline $Q$ collected at a different time (at baseline for Sample 3 in our example) for Sample 1. This baseline $Q$ has been successfully used in studies for multiple dietary components (e.g. protein and carbohydrate) (Prentice et al., 2021a,b).
However, there is a time interval between the data collection of this baseline $Q$ and the measurement time for $(X, V, W, Z)$ in Sample 1. Therefore, there is a concern that the conditional distribution $(Q \mid \tilde{X}, V, W, Z)$ in Sample 1 might be different from that in Samples 2 and 3 for some dietary components. In this case, we treat $Q$ as unavailable in Sample 1. Even when $Q$ is available, the sample size of a feeding study is typically quite limited, possibly leading to unsatisfactory efficiency for disease association estimates. Alternatively, a biomarker $W \in \mathbb{R}^p$, which is composed of $p$ objectively measured blood and urine measurements, can be used to bridge between $\tilde{X}$ in the feeding study sample and $Q$ from another larger sample. We assume that the blood and urine measurements $W$ are affected by the short-term diet $X$, while the self-reported questionnaire data are directly affected by the long-term diet $Z$. We assume $W$ to follow a parametric model:

$$W = (1, X, V^\top)B + \epsilon_w,$$

where $B \in \mathbb{R}^{(2+q) \times p}$ is an unknown parameter and $\epsilon_w \sim N(0, \sigma_w^2 I_p)$ is independent of $\epsilon_x$, $\tilde{\epsilon}_x$, $\epsilon_q$, $Z$, $V$ and $B$.

Fig. 1:

a. The flow chart of the whole process from biomarker construction to association estimation; b. Directed acyclic graph for the variables; c. Measurements between datasets. In the plots, we use $Z$ to denote the (unobservable) true long-term dietary intake, $V$ the personal characteristics, $W$ the biomarker from objective blood and urine measurements, $Q$ the self-reported dietary intake, $X$ the (unobservable) true short-term dietary intake, $\tilde{X}$ the observed short-term dietary intake, and $(T, \Delta)$ the time to event/censoring and the indicator of event/censoring.

Estimation of the association between $Z$ and $T$ is composed of 3 stages with non-overlapping samples from the same underlying population: namely, 1. the biomarker construction stage, 2. the calibration stage, and 3. the association assessment stage (Figure 1(b)). For each stage, a different sample is used. We denote the sample size of stage $k$ as $n_k$. For stage 1, we have $n_1$ samples and for every individual $i$, we have data $(\tilde{X}_i, W_i, V_i)$ and possibly $Q_i$ available; for stage 2, we have $n_2$ samples and for every individual $i$, we have $(Q_i, W_i, V_i)$; for stage 3, we have $n_3$ samples and for every individual $i$, we have $(Q_i, V_i)$ and the composite outcome $(T_i^* = T_i \wedge C_i, \Delta_i = I(T_i \le C_i))$, where $T_i$ is the time to disease occurrence and $C_i$ is a potential censoring time.

The estimation procedure is as follows. In stage 1, we use data from the biomarker construction stage to establish the biomarker. This model can be built by regressing the observed short term dietary intakes X˜ on one of the following:

  (i) blood/urine measurements W and personal characteristics V;

  (ii) blood/urine measurements W, self-reported dietary intake Q, and personal characteristics V;

  (iii) self-reported dietary intake Q and personal characteristics V.

Remark 1

As mentioned above, self-reported dietary intake $Q$ might be considered to be unavailable in stage 1. In this case, we treat $Q$ in stage 1 as unavailable and adopt option (i) in stage 1. When $Q$ is available in stage 1, option (ii) potentially improves the estimation of $\tilde{X}$ in stage 1. When the biomarker $W$ is unavailable, option (iii) directly models $\tilde{X}$ based on $Q$ and $V$; the performance will potentially be affected by the limited sample size $n_1$.

In Stage 2, using data from the calibration stage, we build a calibration equation using the self-reported log-transformed dietary intake $Q$ and the personal characteristics $V$ to predict the true dietary intake $X$ if (i) or (ii) is used in Stage 1. If (iii) is used in Stage 1, then the calibration equation has already been established in Stage 1 and thus Stage 2 is omitted. One caveat is that developing the biomarker using (i) leads to a Berkson-type error that affects the regression calibration, as will be shown later. To see this, note that asymptotically $\hat{X}$ converges to

$$E[X \mid W, V] = X - \epsilon_x' = X + \epsilon_x^*,$$

where, under a normality assumption for $X$, $\epsilon_x^* = -\epsilon_x'$ is independent of $E[X \mid W, V]$ but not independent of $X$. Therefore $\epsilon_x^*$ is a Berkson-type error instead of a classical measurement error, and new methods are needed to account for this property.
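The distinction between Berkson-type and classical error can be checked numerically. The following is a minimal sketch under assumed parameter values (a single biomarker, one covariate, and coefficients chosen only for illustration; the function name `berkson_check` is hypothetical). It fits the option (i) regression and reports how the error of the fitted biomarker correlates with the fitted value and with the true short-term intake.

```python
import numpy as np

def berkson_check(n=20000, seed=0):
    """Return corr(error, fitted biomarker) and corr(error, true X)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)                        # personal characteristic V
    z = 0.3 * v + rng.normal(scale=0.9, size=n)   # long-term intake Z (illustrative)
    x = z + rng.normal(scale=0.2, size=n)         # short-term intake X = Z + eps_x
    x_tilde = x + rng.normal(scale=0.5, size=n)   # observed intake X~ = X + eps~_x
    w = 1.0 + x + 0.5 * v + rng.normal(size=n)    # one biomarker W (illustrative)

    # Stage 1, option (i): regress observed intake on (1, W, V).
    design = np.column_stack([np.ones(n), w, v])
    beta = np.linalg.lstsq(design, x_tilde, rcond=None)[0]
    x_hat = design @ beta

    err = x_hat - x                               # error of the fitted biomarker
    corr_with_fit = np.corrcoef(err, x_hat)[0, 1]
    corr_with_truth = np.corrcoef(err, x)[0, 1]
    return corr_with_fit, corr_with_truth

corr_with_fit, corr_with_truth = berkson_check()
```

Under these assumptions the first correlation should be close to zero and the second clearly negative, which is the Berkson pattern described above rather than the classical one.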

In Stage 3, we only have information on the self-reported dietary intake Q, the personal characteristics V, and the composite survival outcome (T*,Δ). We use the calibration equation developed in Stage 2 to calibrate the self-reported dietary intake for the large cohort and perform disease association analyses.

Figure 1(c) shows the variable availabilities among the three samples. When Q is not available in the stage 1 sample, option (i) can be considered for this stage.

Remark 2

In Stage 1, we consider both scenarios where $Q$ is available in Sample 1 and $Q$ is not available in Sample 1 (in the sense that the joint distribution of $(Q, \tilde{X}, V, W)$ differs between Stage 1 and the other two stages). When $Q$ is not available in Sample 1, although the joint distribution of $(Q, \tilde{X}, V, W)$ is not nonparametrically identifiable, under our model assumption the first order moment $E(\tilde{X} \mid Q, V, W)$ is identifiable, which enables us to obtain the consistent estimator of $\tilde{X}$ when using the regression calibration approach under the rare event assumption. More details are discussed in Section 7.

3. Method

In this section, we first introduce the traditional regression calibration method (Method 1) and then our three newly proposed methods (Methods 2–4). We have $n = n_1 + n_2 + n_3$ subjects $S_1, \ldots, S_n$, with $S_{1:n_1}$ being the biomarker construction sample; $S_{n_1+1:n_1+n_2}$ the calibration sample; and $S_{n_1+n_2+1:n}$ the association assessment sample.

3.1. Method 1: The naïve three-step approach

We first consider the naïve three-step approach.

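Step 1: With the stage 1 sample, regress the consumed diet ($\tilde{X}$) on the blood and urine measurements ($W$) and the personal characteristics ($V$) to obtain

$$\hat{\beta}_1 = \left\{\sum_{i=1}^{n_1} (1, W_i^\top, V_i^\top)^\top (1, W_i^\top, V_i^\top)\right\}^{-1} \sum_{i=1}^{n_1} (1, W_i^\top, V_i^\top)^\top \tilde{X}_i.$$

Step 2: For the stage 2 sample, compute $\hat{X}_{1i} = (1, W_i^\top, V_i^\top)\hat{\beta}_1$ for $i = n_1+1, \ldots, n_1+n_2$ as a predictor for $X$. Then regress $\hat{X}_1$ on $Q$ and $V$ to obtain the calibration equation parameter estimate

$$\hat{\gamma}_1 = \left\{\sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top (1, Q_i, V_i^\top)\right\}^{-1} \sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top \hat{X}_{1i}.$$

Step 3: We first estimate $Z$ in the stage 3 sample with $\hat{Z}_{1i} = (1, Q_i, V_i^\top)\hat{\gamma}_1$ for $i = n_1+n_2+1, \ldots, n_1+n_2+n_3$. Then we estimate the association of $Z$ with the time-to-event endpoint $(T^*, \Delta)$ by solving the score equation for the Cox model:

$$0 = \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} \int_0^\tau \left\{ \begin{pmatrix} \hat{Z}_{1i} \\ V_i \end{pmatrix} - \frac{\sum_j Y_j(t)\exp\{(\hat{Z}_{1j}, V_j^\top)\theta\} \begin{pmatrix} \hat{Z}_{1j} \\ V_j \end{pmatrix}}{\sum_k Y_k(t)\exp\{(\hat{Z}_{1k}, V_k^\top)\theta\}} \right\} dN_i(t),$$

where $\tau$ is a pre-specified large number and we assume $P(C > \tau) > 0$, $N_i(t) = I(\Delta_i = 1, T_i^* \le t)$ and $Y_i(t) = I(T_i^* \ge t)$. In application, $\tau$ is typically defined to be the largest follow-up time in the (Step 3) sample.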
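The three steps above can be sketched in code. Steps 1 and 2 are ordinary least squares fits; Step 3 solves the Cox score equation, done here with a plain Newton-Raphson iteration. This is an illustrative sketch, not the authors' implementation: the helper names (`design`, `ols`, `cox_newton`, `naive_three_step`) and the dictionary layout of the three samples are assumptions, and the solver assumes no tied event times.

```python
import numpy as np

def design(*columns):
    """Stack an intercept with the given 1-D or 2-D covariate arrays."""
    return np.column_stack([np.ones(len(columns[0]))] + list(columns))

def ols(design_matrix, y):
    """Least-squares coefficients for y regressed on the design matrix."""
    return np.linalg.lstsq(design_matrix, y, rcond=None)[0]

def cox_newton(time, event, covariates, n_iter=25):
    """Solve the Cox partial-likelihood score equation by Newton-Raphson.

    Sorting by decreasing follow-up time makes each risk set a prefix of
    the sorted sample, so cumulative sums give the risk-set totals.
    """
    order = np.argsort(-time)
    d = event[order].astype(float)
    x = covariates[order]
    theta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        r = np.exp(x @ theta)
        s0 = np.cumsum(r)
        s1 = np.cumsum(r[:, None] * x, axis=0)
        s2 = np.cumsum(r[:, None, None] * x[:, :, None] * x[:, None, :], axis=0)
        xbar = s1 / s0[:, None]                       # risk-set weighted mean
        score = (d[:, None] * (x - xbar)).sum(axis=0)
        second = s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = (d[:, None, None] * second).sum(axis=0)
        theta = theta + np.linalg.solve(info, score)
    return theta

def naive_three_step(s1, s2, s3):
    """Method 1. Each sample is a dict of arrays (hypothetical layout)."""
    beta1 = ols(design(s1["w"], s1["v"]), s1["x_tilde"])       # Step 1
    x_hat1 = design(s2["w"], s2["v"]) @ beta1                  # Step 2
    gamma1 = ols(design(s2["q"], s2["v"]), x_hat1)
    z_hat1 = design(s3["q"], s3["v"]) @ gamma1                 # Step 3
    return cox_newton(s3["time"], s3["event"],
                      np.column_stack([z_hat1, s3["v"]]))
```

The returned vector holds the coefficient for the calibrated intake first, followed by the coefficients for the personal characteristics. In practice an established survival package would normally be used for the Cox fit; the hand-written solver is included only to keep the sketch self-contained.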

Remark 3

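We show in Appendix A.1 that

$$E(\hat{Z}_1 \mid Q, V) = BF \times E(Z \mid Q, V) + (1 - BF) \times E(Z \mid V) \ne E(Z \mid Q, V),$$

where the bias factor (BF) is given by

$$BF = 1 - \frac{\mathrm{Var}(X \mid V, W)}{\mathrm{Var}(X \mid V)} = R^2_{1 \mid V}.$$

Such a BF leads to bias in the estimation of the association parameter $\theta$. Upon further assuming $E(X \mid V) = (1, V^\top)\delta$, where $\delta \in \mathbb{R}^{1+q}$ is an unknown parameter, we show in Appendix A.1 and A.2 that the estimator $\hat{\theta}_1 \to \theta_1^*$ as $n \to \infty$, with $\theta_{1z}^* \approx BF^{-1} \times \theta_z$ and $\theta_{1v}^* \approx \theta_v - \frac{1 - BF}{BF}\theta_{1z}$. The approximation error is ignorable only under the rare disease assumption, i.e., $P(T < t \mid Z, V) \to 0$ as $n \to \infty$ for all levels of $Z$ and $V$ and $t \in [0, \tau]$. Here we assume $BF > 0$. When $BF = 0$, the variance of $\hat{\theta}_1$ will be infinite because $\theta_z$ is not identifiable. By the definition of BF, when $BF = 0$, the biomarker $W$ contains no information on $X$ given $V$; therefore we are not able to separate the effects of $X$ and $V$.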
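The attenuation by BF in Remark 3 can be illustrated with a small simulation. The sketch below is an illustration under assumed parameter values (a scalar biomarker and covariate, with coefficients chosen for convenience; the function name `q_slope_ratio` is hypothetical). It compares the $Q$-coefficient of the naive calibration equation with the $Q$-coefficient of $E(Z \mid Q, V)$, which can be computed here only because $Z$ is known in simulated data, and returns their ratio next to the analytic BF.

```python
import numpy as np

def q_slope_ratio(n=50000, b1=1.0, sigma_w=1.0, rho=0.3, seed=0):
    """Ratio of naive to target Q-slopes, and the analytic bias factor."""
    rng = np.random.default_rng(seed)
    sigma_x, sigma_xt, sigma_q = 0.2, 0.5, 1.0   # sigma_q is an assumed value

    def draw(m):
        cov = [[1.0 - sigma_x**2, rho], [rho, 1.0]]
        z, v = rng.multivariate_normal([0.0, 0.0], cov, size=m).T
        x = z + rng.normal(scale=sigma_x, size=m)
        x_tilde = x + rng.normal(scale=sigma_xt, size=m)
        w = b1 * x + v + rng.normal(scale=sigma_w, size=m)
        q = z + 0.5 * v + rng.normal(scale=sigma_q, size=m)
        return z, v, x_tilde, w, q

    def ols(cols, y):
        dm = np.column_stack([np.ones(len(y))] + cols)
        return np.linalg.lstsq(dm, y, rcond=None)[0]

    _, v1, xt1, w1, _ = draw(n)                  # stage 1 sample
    z2, v2, _, w2, q2 = draw(n)                  # stage 2 sample
    beta1 = ols([w1, v1], xt1)                   # Step 1 of Method 1
    x_hat1 = np.column_stack([np.ones(n), w2, v2]) @ beta1
    naive_slope = ols([q2, v2], x_hat1)[1]       # Q-coefficient of gamma_1
    target_slope = ols([q2, v2], z2)[1]          # Q-coefficient of E(Z | Q, V)

    var_x_given_v = 1.0 - rho**2                 # Var(X) = 1, Cov(X, V) = rho
    bf = b1**2 * var_x_given_v / (b1**2 * var_x_given_v + sigma_w**2)
    return naive_slope / target_slope, bf

ratio, bf = q_slope_ratio()
```

With a large sample the two returned values should nearly coincide, which is the sense in which the naive calibrated intake is shrunk toward $E(Z \mid V)$ by the factor BF.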

3.2. Method 2: Three-step with Bias Correction

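As shown for Method 1, we have bias in $\hat{Z}_1$ when using $\hat{X}_1$ in Step 2. Therefore, we propose a bias-corrected estimator $\hat{X}_{2i} = \widehat{BF}^{-1}\hat{X}_{1i}$, where

$$\widehat{BF} = \hat{R}^2_{1 \mid V} = 1 - \frac{\widehat{\mathrm{Var}}(\tilde{X} \mid V, W) - \tilde{\sigma}_x^2}{\widehat{\mathrm{Var}}(\tilde{X} \mid V) - \tilde{\sigma}_x^2}$$

is the estimated BF. Here, we assume that $\tilde{\sigma}_x^2$ is known. In many real applications, $\tilde{\sigma}_x^2$ can be estimated with a separate experiment, as we discuss in Section 7. When $\tilde{\sigma}_x^2$ is not available, we may vary this parameter to perform sensitivity analysis.

Then we have

$$\hat{\gamma}_2 = \left\{\sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top (1, Q_i, V_i^\top)\right\}^{-1} \sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top \hat{X}_{2i}.$$

In step 3, we predict the exposure by $\hat{Z}_{2i} = (1, Q_i, V_i^\top)\hat{\gamma}_2$ for $i = n_1+n_2+1, \ldots, n_1+n_2+n_3$ and obtain the estimator $\hat{\theta}_2$ by solving the following estimating equations:

$$0 = \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} \int_0^\tau \left\{ \begin{pmatrix} \hat{Z}_{2i} \\ V_i \end{pmatrix} - \frac{\sum_j Y_j(t)\exp\{(\hat{Z}_{2j}, V_j^\top)\theta\} \begin{pmatrix} \hat{Z}_{2j} \\ V_j \end{pmatrix}}{\sum_k Y_k(t)\exp\{(\hat{Z}_{2k}, V_k^\top)\theta\}} \right\} dN_i(t).$$

Alternatively, instead of correcting $\hat{X}$, which allows estimation of both $\hat{\theta}_{2z}$ and $\hat{\theta}_v$, we can also compute $\hat{\theta}_{2z}$ as $\hat{\theta}_{1z} \times \widehat{BF}$.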
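A minimal sketch of the bias-factor estimate used by Method 2 follows. It assumes that the two conditional variances are estimated by the residual variances of linear regressions of $\tilde{X}$ on $(1, W, V)$ and on $(1, V)$, and that $\tilde{\sigma}_x^2$ is supplied by the user; the function names are hypothetical.

```python
import numpy as np

def residual_variance(design_matrix, y):
    """Unbiased residual variance from a least-squares fit."""
    coef = np.linalg.lstsq(design_matrix, y, rcond=None)[0]
    resid = y - design_matrix @ coef
    return resid @ resid / (len(y) - design_matrix.shape[1])

def estimate_bf(x_tilde, w, v, sigma_xt_sq):
    """Estimated BF = 1 - (Var(X~|V,W) - s2) / (Var(X~|V) - s2)."""
    ones = np.ones(len(x_tilde))
    var_given_vw = residual_variance(np.column_stack([ones, w, v]), x_tilde)
    var_given_v = residual_variance(np.column_stack([ones, v]), x_tilde)
    return 1.0 - (var_given_vw - sigma_xt_sq) / (var_given_v - sigma_xt_sq)

def bias_corrected_prediction(x_hat_1, bf_hat):
    """Method 2 correction: divide the naive prediction by the estimated BF."""
    return np.asarray(x_hat_1, dtype=float) / bf_hat
```

Because the estimate appears in a denominator, the correction becomes unstable when the estimated BF is near zero, so it is worth inspecting the estimate before applying it. Varying `sigma_xt_sq` over a plausible range gives the sensitivity analysis mentioned above.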

Remark 4

This method does not require the self-reported dietary intake data (Q) to be collected in the feeding study. As a remark, even if the self-reported data is available in the feeding study (Stage 1), the association between the self-reported and the actual dietary intake in the feeding study might be different from that association from the cohort for reasons such as the modification of the dietary pattern during the controlled feeding study, or the potential change in dietary preference in the period of the feeding study. Method 2 is robust to such an association difference since we have not directly included Q in Stage 1 for biomarker construction. However, when Q is available among Stage 1 samples and its association with Z remains unchanged among all stages, we can use its information to improve efficiency. We will next propose two methods that require the availability of the self-reported dietary intake in the feeding study and assume the association between the self-reported dietary intake and the actual dietary intake to be the same among all three samples.

3.3. Method 3: Three-step with Q available in stage 1

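When the self-reported dietary intake $Q$ is available from the feeding study and the distribution of $(Q \mid Z, V)$ is the same between the controlled feeding study and the cohort, then the bias in the naïve estimator can be corrected simply by including $Q$ in the biomarker development equation, because the inclusion of $Q$ guarantees that

$$E[\hat{Z} \mid Q, V] = E[E[Z \mid Q, V, W] \mid Q, V] = E[Z \mid Q, V].$$

The three steps of the first method remain the same, but in the first step regression model, the self-reported dietary intake ($Q$) is added. That is, for the first step, we regress $\tilde{X}$ on $W$, $V$ and $Q$ to build the biomarker, and then use $W$, $V$ and $Q$ to predict $Z$ in the second step. Specifically, we have

$$\hat{\beta}_3 = \left\{\sum_{i=1}^{n_1} (1, W_i^\top, Q_i, V_i^\top)^\top (1, W_i^\top, Q_i, V_i^\top)\right\}^{-1} \sum_{i=1}^{n_1} (1, W_i^\top, Q_i, V_i^\top)^\top \tilde{X}_i,$$

$\hat{X}_{3i} = (1, W_i^\top, Q_i, V_i^\top)\hat{\beta}_3$ for $i = n_1+1, \ldots, n_1+n_2$, and then

$$\hat{\gamma}_3 = \left\{\sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top (1, Q_i, V_i^\top)\right\}^{-1} \sum_{i=n_1+1}^{n_1+n_2} (1, Q_i, V_i^\top)^\top \hat{X}_{3i},$$

and $\hat{Z}_{3i} = (1, Q_i, V_i^\top)\hat{\gamma}_3$ for $i = n_1+n_2+1, \ldots, n_1+n_2+n_3$. We obtain $\hat{\theta}_3$ by solving the following estimating equations:

$$0 = \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} \int_0^\tau \left\{ \begin{pmatrix} \hat{Z}_{3i} \\ V_i \end{pmatrix} - \frac{\sum_j Y_j(t)\exp\{(\hat{Z}_{3j}, V_j^\top)\theta\} \begin{pmatrix} \hat{Z}_{3j} \\ V_j \end{pmatrix}}{\sum_k Y_k(t)\exp\{(\hat{Z}_{3k}, V_k^\top)\theta\}} \right\} dN_i(t).$$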

3.4. Method 4: Direct Estimation

When $Q$ is available from the feeding study, another possibility is to ignore the second dataset (or not require a second dataset to be available): we build the calibration equation directly in the feeding study by regressing $\tilde{X}$ on $Q$ and $V$, and apply it in the third step. All other steps remain the same as in the third method. That is, we use the calibration equation to predict $Z$ in the full cohort and then fit the Cox model for the time-to-event outcome $(T^*, \Delta)$ on the predicted $Z$ and $V$ to estimate the association parameter. In other words, we have

$$\hat{\gamma}_4 = \left\{\sum_{i=1}^{n_1} (1, Q_i, V_i^\top)^\top (1, Q_i, V_i^\top)\right\}^{-1} \sum_{i=1}^{n_1} (1, Q_i, V_i^\top)^\top \tilde{X}_i,$$

with $\hat{Z}_{4i} = (1, Q_i, V_i^\top)\hat{\gamma}_4$ for $i = n_1+n_2+1, \ldots, n_1+n_2+n_3$, and we obtain $\hat{\theta}_4$ by solving the following estimating equations:

$$0 = \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} \int_0^\tau \left\{ \begin{pmatrix} \hat{Z}_{4i} \\ V_i \end{pmatrix} - \frac{\sum_j Y_j(t)\exp\{(\hat{Z}_{4j}, V_j^\top)\theta\} \begin{pmatrix} \hat{Z}_{4j} \\ V_j \end{pmatrix}}{\sum_k Y_k(t)\exp\{(\hat{Z}_{4k}, V_k^\top)\theta\}} \right\} dN_i(t).$$
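Methods 3 and 4 differ from Method 1 only in how the calibration coefficients are obtained. The following sketch computes both sets of coefficients; the helper names and the dictionary layout of the samples are assumptions made for illustration, and $Q$ in the stage 1 sample is assumed to follow the same model as in the cohort.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients with an intercept prepended."""
    dm = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(dm, y, rcond=None)[0]

def predict(columns, coef):
    """Linear prediction with an intercept prepended."""
    dm = np.column_stack([np.ones(len(columns[0]))] + list(columns))
    return dm @ coef

def gamma_method3(s1, s2):
    """Method 3: biomarker built from (W, Q, V), calibrated in stage 2."""
    beta3 = ols([s1["w"], s1["q"], s1["v"]], s1["x_tilde"])
    x_hat3 = predict([s2["w"], s2["q"], s2["v"]], beta3)
    return ols([s2["q"], s2["v"]], x_hat3)

def gamma_method4(s1):
    """Method 4: calibration equation fitted directly in the feeding study."""
    return ols([s1["q"], s1["v"]], s1["x_tilde"])

def calibrated_intake(s3, gamma):
    """Predicted intake for the stage 3 cohort."""
    return predict([s3["q"], s3["v"]], gamma)
```

Both coefficient vectors target the same quantity, the regression of $Z$ on $(1, Q, V)$. Method 3 additionally uses the information in $W$ from the larger stage 2 sample, while Method 4 relies on the $n_1$ feeding study participants alone.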

4. Asymptotics

In this section, we derive the asymptotic distributions of the four estimators and show that Method 1 tends to give biased results while the other three methods provide consistent estimators under the rare disease assumption ($P(T < t \mid Z, V) \to 0$ as $n \to \infty$ for $t \in [0, \tau]$).

In practice, violation of the rare disease assumption will lead to estimation bias in Methods 2–4 (i.e., $\theta_2^*$, $\theta_3^*$, $\theta_4^*$ as shown in the theorem below can be different from $\theta$), but the biases from Methods 2–4 are usually much smaller than that of Method 1 according to our numerical study results. Here we summarize the asymptotic results in the following theorem, with the proofs given in Appendix A.2.

Theorem 1:

With $n_3/n_2 \to C_2$ and $n_2/n_1 \to C_1$, for $k = 1, 2, 3, 4$, we have

$$\sqrt{n_3}\,(\hat{\theta}_k - \theta_k^*) \xrightarrow{d} N(0, \Sigma_{\theta_k}),$$

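where $\Sigma_{\theta_k} = I_{\theta_k}^{-1}\left(I_{\theta_k} + C_2 I_{\gamma_k}\Sigma_{\gamma_k}I_{\gamma_k}^\top\right)I_{\theta_k}^{-1}$ can be consistently estimated by $\hat{\Sigma}_{\theta_k} = \hat{I}_{\theta_k}^{-1}\left(\hat{I}_{\theta_k} + \frac{n_3}{n_2}\hat{I}_{\gamma_k}\hat{\Sigma}_{\gamma_k}\hat{I}_{\gamma_k}^\top\right)\hat{I}_{\theta_k}^{-1}$, with the detailed expressions defined in the proofs in Appendix A. Here the $I_{\theta_k}$ component is the same for $k = 2, 3, 4$, though its corresponding estimate $\hat{I}_{\theta_k}$ can be different.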

Intuitively, Method 4 should be less efficient than Methods 2 and 3, as it ignores the information contained in $W$. Heuristically, to compare the efficiency of the estimators from Method 3 and Method 4, we can show that

$$\frac{E[\mathrm{Var}(\hat{Z}_3 \mid Q, V)]}{E[\mathrm{Var}(\hat{Z}_4 \mid Q, V)]} = \frac{n_1}{n_2} + \left(1 - \frac{n_1}{n_2}\right)\left(1 - R^2_{(X,W) \mid Q,V}\right), \qquad (3)$$

which suggests that the key quantity to characterise the usefulness of the biomarker $W$ is the partial coefficient of multiple determination between $X$ and $W$ conditional on $Q$ and $V$, $R^2_{(X,W) \mid Q,V}$. The closer the value of $R^2_{(X,W) \mid Q,V}$ is to 0, the ‘weaker’ the biomarker is; the closer it is to 1, the ‘stronger’ the biomarker is. Consider two extreme examples: (1) when $R^2_{(X,W) \mid Q,V} = 0$, the relative efficiency is 1; Method 3 does not have an efficiency gain compared with Method 4, and the biomarker is completely useless. (2) when $R^2_{(X,W) \mid Q,V} = 1$, the relative efficiency is $n_1/n_2$. This is as if we have observed all $X$ information in the NPAAS dataset, and the efficiency gain is proportional to the sample size gain. The ratio of the asymptotic variances of $\hat{\theta}_3$ and $\hat{\theta}_4$ is always less than or equal to 1, which indicates a potential loss of efficiency from ignoring the second dataset. Also, Method 2 should be less efficient than Method 3 when $Q$ is available in the first stage sample, as it ignores the information on $Q$ in the first sample.
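Equation (3) is simple enough to evaluate directly. The helper below is a small illustrative sketch (the function name `variance_ratio` is hypothetical); the sample sizes in the example are the NPAAS feeding study and biomarker substudy sizes quoted in the Introduction, while the partial $R^2$ value of 0.5 is an arbitrary illustration.

```python
def variance_ratio(n1, n2, partial_r2):
    """Equation (3): E[Var(Z3_hat | Q, V)] / E[Var(Z4_hat | Q, V)]."""
    if not 0.0 <= partial_r2 <= 1.0:
        raise ValueError("partial_r2 must lie in [0, 1]")
    share = n1 / n2
    return share + (1.0 - share) * (1.0 - partial_r2)

# Illustrative evaluation with n1 = 153 and n2 = 450.
useless_biomarker = variance_ratio(153, 450, 0.0)   # biomarker adds nothing
perfect_biomarker = variance_ratio(153, 450, 1.0)   # equals n1 / n2
moderate_biomarker = variance_ratio(153, 450, 0.5)
```

The two extreme cases reproduce the limits discussed above, and intermediate values interpolate linearly in the partial $R^2$.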

5. Simulation Studies

We performed simulations to study the finite sample behavior of our proposed estimators. We generate exposure and covariates from the following models:

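$$\begin{pmatrix} Z \\ V \end{pmatrix} \sim N\left(0, \begin{pmatrix} 1 - \sigma_x^2 & \rho \\ \rho & 1 \end{pmatrix}\right), \quad W = b_0 + b_1 X + b_2 V + \epsilon_w, \quad X = Z + \epsilon_x, \quad \tilde{X} = X + \tilde{\epsilon}_x, \quad Q = a_0 + a_1 Z + a_2 V + \epsilon_q,$$

where $\epsilon_w$, $\epsilon_x$, $\tilde{\epsilon}_x$ and $\epsilon_q$ are independently sampled from normal distributions with mean zero and with respective standard deviations $\sigma_w$, $\sigma_x$, $\tilde{\sigma}_x$ and $\sigma_q$. The event time is generated from the Cox regression model:

$$\lambda(t \mid Z, X, V, W, Q) = \lambda(t \mid Z, V) = \lambda_0(t)\exp(\theta_z Z + \theta_v V).$$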

Three different censoring distributions are considered: (1) Mixed censoring: the censoring time is sampled from a mixture of Unif(0,10) and a point mass at 10 with equal probability; (2) Uniform censoring: the censoring time is sampled from Unif(0,10); (3) Exponential censoring: the censoring time is sampled from an exponential distribution with mean 10. We simulated with three sample sizes, $(n_1, n_2, n_3) = (150, 300, 5150)$, $(300, 600, 10300)$ and $(600, 1200, 20600)$. We set $b_0 = 5$, $b_2 = 1$, $\sigma_x = 0.2$, $\tilde{\sigma}_x = 0.5$, $\theta_z = 0.4$, $\theta_v = 0.6$ and $\sigma_w = 1$, and considered three baseline hazards: $\lambda_0(t) = 0.002$, $\lambda_0(t) = 0.004$ and $\lambda_0(t) = 0.004t$. Then we changed the values of $b_1$, $\rho$, $a_1$, $a_2$ and $\sigma_q$ to vary the values of the coefficients of multiple determination ($R^2$s), and the partial $R^2$s conditional on $V$, that quantify the strength of $W$ and $Q$ in predicting $Z$. The detailed mathematical definitions and values of these $R^2$s under 6 different settings can be found in Appendix B. The $R^2$s for the regressions in steps 1 and 2 in all settings are also listed in Appendix B. For settings 1–3, we have weak $Q$ effects ($R^2_{ZQ \mid V} = 0.13$), with the effect of $W$, $R^2_{ZW \mid V}$, decreasing from 0.49 to 0.28. For settings 4–6, we have a stronger $Q$ effect ($R^2_{ZQ \mid V} = 0.2$), with the effect of $W$, $R^2_{ZW \mid V}$, decreasing from 0.53 to 0.19. The regression parameters and baseline hazards are chosen to illustrate the performance under the event rate of our real data application; the event rate ranges from 2% to 24% as listed in Appendix B.
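A data generator along the lines of this design could look as follows. This is a sketch only: the values of $\rho$, $b_1$, $a_0$, $a_1$, $a_2$ and $\sigma_q$ vary by setting and are listed in Appendix B, so the defaults below are placeholder assumptions, and only the constant baseline hazards are covered. The function name `simulate_cohort` is hypothetical.

```python
import numpy as np

def simulate_cohort(n, censoring="mixed", lam0=0.002, rho=0.3, b1=1.0,
                    a=(0.0, 1.0, 0.5), sigma_q=1.0, seed=0):
    """Draw one sample of size n with a constant baseline hazard lam0."""
    rng = np.random.default_rng(seed)
    # Values as printed in the text.
    sigma_x, sigma_xt, sigma_w = 0.2, 0.5, 1.0
    b0, b2, theta_z, theta_v = 5.0, 1.0, 0.4, 0.6

    cov = [[1.0 - sigma_x**2, rho], [rho, 1.0]]
    z, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x = z + rng.normal(scale=sigma_x, size=n)
    x_tilde = x + rng.normal(scale=sigma_xt, size=n)
    w = b0 + b1 * x + b2 * v + rng.normal(scale=sigma_w, size=n)
    q = a[0] + a[1] * z + a[2] * v + rng.normal(scale=sigma_q, size=n)

    # A constant baseline hazard gives exponential event times.
    t = rng.exponential(1.0 / (lam0 * np.exp(theta_z * z + theta_v * v)))

    if censoring == "uniform":
        c = rng.uniform(0.0, 10.0, size=n)
    elif censoring == "exponential":
        c = rng.exponential(10.0, size=n)
    elif censoring == "mixed":
        # Equal-probability mixture of Unif(0, 10) and a point mass at 10.
        c = np.where(rng.random(n) < 0.5,
                     rng.uniform(0.0, 10.0, size=n), 10.0)
    else:
        raise ValueError("censoring must be 'mixed', 'uniform' or 'exponential'")

    return {"z": z, "v": v, "x_tilde": x_tilde, "w": w, "q": q,
            "time": np.minimum(t, c), "event": (t <= c).astype(int)}
```

Calling the function three times with different seeds and sizes $(n_1, n_2, n_3)$ gives the three non-overlapping samples; in a real analysis only the variables available at each stage would be retained.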

Tables 1, 2 and 3 summarize the simulation results for all six settings. The bias, mean estimated standard error (SE), empirical standard deviation (SD), and coverage rate (CR) of the 95% nominal confidence interval for $\theta_z$ for all 4 methods with the three sample sizes from 1000 simulations are listed. The results show that Method 1 tends to be biased even when the $R^2$ is high, while our proposed Methods 2–4 performed uniformly better across all settings, with much smaller bias and reduced SE. When the sample size is small, i.e., $(n_1, n_2, n_3) = (150, 300, 5150)$, the CR for Method 1 seems to be acceptable, but this is because the SE for Method 1 is also large under this small sample size; when the sample size is doubled to $(n_1, n_2, n_3) = (300, 600, 10300)$, the SE becomes smaller and the bias dominates the error. Among the three newly proposed methods, Method 3 performs better than Method 4 under larger sample sizes ($(n_1, n_2, n_3) = (300, 600, 10300)$ and $(600, 1200, 20600)$); this observation is in accordance with our theoretical result for the asymptotic distributions. Method 2 has the best performance when $W$ has a strong effect and $Q$ has a weak effect (Setting 1). When the biomarker has a weak effect (Settings 3 and 6) and the sample size for the feeding study $n_1$ is small, Method 2 may have relatively large bias and variance because the estimated BF is close to 0. The performance of Method 3 and Method 4 is less sensitive to the strength of $W$, and for the small sample size, $(n_1, n_2, n_3) = (150, 300, 5150)$, the performance of Method 4 is good. Specifically, when the effect of $Q$ is weak (Settings 1–3), Method 4 outperforms Method 3 with smaller variance.
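The four summary measures reported in the tables can be computed from the replicate estimates as sketched below. The function name `summarize` is hypothetical, and the sketch assumes Wald-type 95% intervals (estimate ± 1.96 × estimated SE), which is a common convention but is not confirmed by the text.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z_crit=1.96):
    """Bias, mean estimated SE, empirical SD and coverage rate."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower = est - z_crit * se
    upper = est + z_crit * se
    covered = (lower <= truth) & (truth <= upper)
    return {
        "bias": est.mean() - truth,     # mean estimate minus true value
        "SE": se.mean(),                # mean of the estimated standard errors
        "SD": est.std(ddof=1),          # empirical SD across replicates
        "CR": covered.mean(),           # share of intervals covering the truth
    }
```

A well-calibrated variance estimator should give SE close to SD, and a CR near 0.95 additionally requires the bias to be small relative to the SE.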

Table 1:

Simulation results comparing different methods’ performance in terms of bias, mean estimated standard error (SE), empirical standard deviation (SD) and coverage rate (CR) for 95% confidence interval when λ0(t)=0.002.

                              (n1,n2,n3)=(150,300,5150)     (n1,n2,n3)=(300,600,10300)    (n1,n2,n3)=(600,1200,20600)
Censoring    Setting  Method  Bias  SE    SD    CR          Bias  SE    SD    CR          Bias  SE    SD    CR
Mixed        1        1       0.48  0.56  0.56  0.97        0.40  0.36  0.37  0.87        0.39  0.34  0.33  0.82
Mixed        1        2       0.05  0.29  0.29  0.97        0.01  0.19  0.19  0.96        0.01  0.18  0.17  0.96
Mixed        1        3       0.07  0.47  0.65  0.97        0.01  0.19  0.19  0.97        0.01  0.17  0.17  0.96
Mixed        1        4       0.05  0.30  0.30  0.98        0.02  0.19  0.20  0.97        0.01  0.18  0.18  0.96
Mixed        2        1       0.67  0.72  0.73  0.97        0.56  0.45  0.46  0.84        0.55  0.41  0.40  0.77
Mixed        2        2       0.06  0.31  0.31  0.97        0.01  0.19  0.20  0.97        0.01  0.18  0.18  0.97
Mixed        2        3       0.07  0.63  0.84  0.97        0.01  0.19  0.19  0.97        0.01  0.18  0.17  0.96
Mixed        2        4       0.05  0.30  0.30  0.98        0.02  0.19  0.20  0.97        0.01  0.18  0.18  0.96
Mixed        3        1       1.38  1.82  1.97  0.98        1.09  0.74  0.76  0.79        1.05  0.64  0.64  0.67
Mixed        3        2       0.10  0.50  0.55  0.96        0.02  0.21  0.21  0.96        0.02  0.19  0.18  0.96
Mixed        3        3       0.07  0.55  0.73  0.97        0.01  0.19  0.20  0.97        0.01  0.18  0.17  0.96
Mixed        3        4       0.05  0.30  0.30  0.98        0.02  0.19  0.20  0.97        0.01  0.18  0.18  0.96
Mixed        4        1       0.38  0.35  0.34  0.91        0.35  0.24  0.24  0.75        0.35  0.22  0.22  0.67
Mixed        4        2       0.03  0.19  0.19  0.96        0.01  0.13  0.13  0.96        0.01  0.12  0.12  0.96
Mixed        4        3       0.02  0.19  0.19  0.97        0.00  0.13  0.13  0.95        0.01  0.12  0.12  0.95
Mixed        4        4       0.02  0.19  0.19  0.97        0.01  0.13  0.14  0.95        0.01  0.12  0.12  0.96
Mixed        5        1       0.73  0.55  0.55  0.87        0.66  0.36  0.37  0.58        0.66  0.32  0.32  0.46
Mixed        5        2       0.03  0.21  0.22  0.96        0.01  0.14  0.14  0.96        0.01  0.13  0.13  0.96
Mixed        5        3       0.02  0.19  0.20  0.97        0.00  0.13  0.13  0.96        0.01  0.12  0.12  0.95
Mixed        5        4       0.02  0.19  0.19  0.97        0.01  0.13  0.14  0.95        0.01  0.12  0.12  0.96
Mixed        6        1       2.17  2.31  2.87  0.91        1.77  0.86  0.90  0.42        1.71  0.71  0.71  0.24
Mixed        6        2       0.09  0.45  0.60  0.94        0.02  0.17  0.17  0.96        0.02  0.14  0.14  0.96
Mixed        6        3       0.02  0.19  0.20  0.97        0.01  0.13  0.13  0.96        0.01  0.12  0.12  0.95
Mixed        6        4       0.02  0.19  0.19  0.97        0.01  0.13  0.14  0.95        0.01  0.12  0.12  0.96
Uniform      1        1       0.46  0.66  0.67  0.97        0.41  0.43  0.45  0.89        0.41  0.41  0.40  0.85
Uniform      1        2       0.04  0.34  0.34  0.97        0.02  0.22  0.23  0.96        0.02  0.21  0.21  0.97
Uniform      1        3       0.07  0.59  0.87  0.98        0.02  0.22  0.23  0.96        0.02  0.21  0.21  0.95
Uniform      1        4       0.04  0.35  0.36  0.97        0.02  0.23  0.24  0.96        0.02  0.21  0.21  0.95
Uniform      2        1       0.65  0.84  0.86  0.97        0.58  0.53  0.55  0.87        0.57  0.49  0.49  0.82
Uniform      2        2       0.05  0.36  0.37  0.97        0.02  0.23  0.23  0.96        0.02  0.21  0.21  0.96
Uniform      2        3       0.08  0.80  1.14  0.97        0.02  0.22  0.23  0.96        0.02  0.21  0.21  0.95
Uniform      2        4       0.04  0.35  0.36  0.97        0.02  0.23  0.24  0.96        0.02  0.21  0.21  0.95
Uniform      3        1       1.32  1.95  2.10  0.98        1.11  0.86  0.90  0.84        1.07  0.76  0.76  0.76
Uniform      3        2       0.08  0.55  0.60  0.96        0.03  0.24  0.25  0.96        0.02  0.22  0.22  0.97
Uniform      3        3       0.07  0.69  0.99  0.98        0.02  0.22  0.23  0.95        0.02  0.21  0.21  0.95
Uniform      3        4       0.04  0.35  0.36  0.97        0.02  0.23  0.24  0.96        0.02  0.21  0.21  0.95
Uniform      4        1       0.37  0.42  0.40  0.94        0.35  0.28  0.28  0.82        0.35  0.27  0.27  0.77
Uniform      4        2       0.02  0.23  0.22  0.97        0.01  0.15  0.15  0.97        0.01  0.15  0.15  0.96
Uniform      4        3       0.02  0.22  0.23  0.97        0.00  0.15  0.15  0.96        0.01  0.14  0.14  0.95
Uniform      4        4       0.02  0.22  0.22  0.97        0.01  0.15  0.16  0.96        0.01  0.15  0.15  0.96
Uniform      5        1       0.72  0.63  0.63  0.92        0.66  0.42  0.42  0.70        0.66  0.38  0.38  0.60
Uniform      5        2       0.03  0.24  0.24  0.96        0.01  0.16  0.16  0.97        0.01  0.15  0.15  0.96
Uniform      5        3       0.02  0.22  0.23  0.97        0.00  0.15  0.15  0.96        0.01  0.14  0.14  0.96
Uniform      5        4       0.02  0.22  0.22  0.97        0.01  0.15  0.16  0.96        0.01  0.15  0.15  0.96
Uniform      6        1       2.12  2.34  2.66  0.94        1.77  0.97  1.00  0.59        1.71  0.82  0.82  0.43
Uniform      6        2       0.08  0.46  0.54  0.94        0.02  0.19  0.18  0.96        0.02  0.16  0.16  0.97
Uniform      6        3       0.02  0.23  0.23  0.97        0.01  0.15  0.15  0.96        0.01  0.15  0.15  0.96
Uniform      6        4       0.02  0.22  0.22  0.97        0.01  0.15  0.16  0.96        0.01  0.15  0.15  0.96
Exponential  1        1       0.44  0.51  0.52  0.95        0.41  0.33  0.35  0.83        0.39  0.30  0.30  0.79
Exponential  1        2       0.03  0.26  0.27  0.97        0.01  0.17  0.18  0.96        0.01  0.16  0.16  0.96
Exponential  1        3       0.05  0.44  0.63  0.96        0.01  0.17  0.18  0.96        0.01  0.16  0.16  0.97
Exponential  1        4       0.03  0.27  0.27  0.96        0.02  0.18  0.19  0.95        0.01  0.16  0.16  0.96
Exponential  2        1       0.63  0.66  0.68  0.95        0.57  0.41  0.44  0.80        0.54  0.37  0.37  0.72
Exponential  2        2       0.04  0.28  0.29  0.96        0.02  0.18  0.18  0.96        0.01  0.16  0.16  0.97
Exponential  2        3       0.06  0.60  0.82  0.96        0.01  0.17  0.18  0.96        0.01  0.16  0.16  0.96
Exponential  2        4       0.03  0.27  0.27  0.96        0.02  0.18  0.19  0.95        0.01  0.16  0.16  0.96
Exponential  3        1       1.29  1.66  1.79  0.97        1.10  0.69  0.74  0.73        1.03  0.59  0.59  0.61
Exponential  3        2       0.07  0.46  0.51  0.95        0.02  0.19  0.20  0.95        0.01  0.17  0.17  0.97
Exponential  3        3       0.05  0.52  0.71  0.96        0.02  0.17  0.18  0.95        0.01  0.16  0.16  0.96
Exponential  3        4       0.03  0.27  0.27  0.96        0.02  0.18  0.19  0.95        0.01  0.16  0.16  0.96
Exponential  4        1       0.37  0.32  0.31  0.91        0.34  0.22  0.23  0.69        0.34  0.20  0.20  0.63
Exponential  4        2       0.02  0.18  0.17  0.97        0.00  0.12  0.12  0.96        0.00  0.11  0.11  0.96
Exponential  4        3       0.02  0.17  0.17  0.97        0.00  0.11  0.12  0.96        0.00  0.11  0.11  0.95
Exponential  4        4       0.02  0.17  0.17  0.97        0.01  0.12  0.12  0.95        0.00  0.11  0.11  0.95
Exponential  5        1       0.72  0.51  0.51  0.85        0.66  0.33  0.35  0.49        0.64  0.29  0.29  0.39
Exponential  5        2       0.03  0.19  0.20  0.97        0.01  0.13  0.13  0.96        0.00  0.11  0.11  0.96
Exponential  5        3       0.02  0.17  0.18  0.97        0.00  0.12  0.12  0.95        0.00  0.11  0.11  0.95
Exponential  5        4       0.02  0.17  0.17  0.97        0.01  0.12  0.12  0.95        0.00  0.11  0.11  0.95
Exponential  6        1       2.11  2.05  2.38  0.88        1.76  0.81  0.87  0.32        1.67  0.65  0.65  0.15
Exponential  6        2       0.08  0.40  0.49  0.93        0.02  0.16  0.16  0.94        0.01  0.13  0.13  0.96
Exponential  6        3       0.02  0.17  0.18  0.96        0.00  0.12  0.12  0.95        0.00  0.11  0.11  0.95
Exponential  6        4       0.02  0.17  0.17  0.97        0.01  0.12  0.12  0.95        0.00  0.11  0.11  0.95

Table 2:

Simulation results comparing different methods’ performance in terms of bias, mean estimated standard error (SE), empirical standard deviation (SD) and coverage rate (CR) for 95% confidence interval when λ0(t)=0.004.

Censoring Setting Method (n1,n2,n3)=(150,300,5150) (n1,n2,n3)=(300,600,10300) (n1,n2,n3)=(600,1200,20600)

Bias SE SD CR Bias SE SD CR Bias SE SD CR

Mixed 1 1 0.44 0.41 0.43 0.97 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
3 0.04 0.37 0.54 0.95 0.01 0.14 0.14 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

2 1 0.62 0.54 0.59 0.93 0.56 0.33 0.36 0.68 0.38 0.24 0.24 0.66
2 0.04 0.23 0.25 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
3 0.05 0.50 0.71 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

3 1 1.29 1.64 2.00 0.94 1.08 0.57 0.62 0.55 1.03 0.47 0.47 0.36
2 0.07 0.44 0.51 0.94 0.02 0.16 0.17 0.94 0.01 0.14 0.14 0.96
3 0.05 0.43 0.62 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

4 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
3 0.01 0.14 0.14 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.08 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

5 1 0.70 0.42 0.45 0.71 0.66 0.27 0.30 0.23 0.64 0.23 0.23 0.14
2 0.02 0.16 0.18 0.94 0.01 0.10 0.11 0.95 0.00 0.09 0.09 0.95
3 0.01 0.14 0.14 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.08 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

6 1 2.09 1.91 2.33 0.75 1.77 0.72 0.79 0.07 1.67 0.54 0.54 0.02
2 0.08 0.38 0.49 0.92 0.02 0.14 0.14 0.94 0.01 0.11 0.11 0.95
3 0.01 0.14 0.15 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.09 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

Uniform 1 1 0.47 0.51 0.51 0.96 0.41 0.33 0.35 0.83 0.39 0.30 0.29 0.77
2 0.05 0.26 0.26 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.15 0.96
3 0.06 0.44 0.64 0.97 0.02 0.17 0.18 0.95 0.01 0.16 0.15 0.96
4 0.05 0.27 0.27 0.97 0.02 0.18 0.19 0.96 0.01 0.16 0.16 0.96

2 1 0.66 0.66 0.68 0.95 0.57 0.40 0.43 0.79 0.55 0.36 0.36 0.69
2 0.05 0.28 0.29 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.16 0.96
3 0.07 0.60 0.83 0.97 0.02 0.17 0.18 0.95 0.01 0.16 0.15 0.96
4 0.05 0.27 0.27 0.97 0.02 0.18 0.19 0.96 0.01 0.16 0.16 0.96

3 1 1.36 1.87 2.17 0.96 1.10 0.68 0.72 0.72 1.04 0.58 0.57 0.57
2 0.09 0.51 0.59 0.95 0.02 0.19 0.20 0.96 0.02 0.17 0.17 0.96
3 0.07 0.52 0.72 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.15 0.96
4 0.05 0.27 0.27 0.97 0.02 0.18 0.19 0.96 0.01 0.16 0.16 0.96

4 1 0.37 0.32 0.31 0.91 0.35 0.21 0.22 0.68 0.35 0.20 0.20 0.59
2 0.02 0.17 0.17 0.96 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96
3 0.01 0.17 0.17 0.97 0.00 0.11 0.12 0.95 0.01 0.11 0.11 0.96
4 0.01 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96

5 1 0.71 0.50 0.51 0.85 0.66 0.33 0.34 0.46 0.66 0.29 0.29 0.36
2 0.02 0.19 0.20 0.96 0.01 0.13 0.13 0.95 0.01 0.11 0.11 0.96
3 0.01 0.17 0.18 0.97 0.01 0.11 0.12 0.95 0.01 0.11 0.11 0.96
4 0.01 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96

6 1 2.11 2.18 2.75 0.89 1.77 0.80 0.85 0.29 1.70 0.64 0.64 0.15
2 0.08 0.43 0.57 0.94 0.02 0.15 0.15 0.95 0.02 0.13 0.13 0.96
3 0.01 0.17 0.18 0.97 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96
4 0.01 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96

Exponential 1 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
3 0.04 0.37 0.54 0.95 0.01 0.14 0.14 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

2 1 0.62 0.54 0.59 0.93 0.56 0.33 0.36 0.68 0.54 0.29 0.29 0.55
2 0.04 0.23 0.25 0.95 0.01 0.14 0.15 0.94 0.01 0.13 0.13 0.96
3 0.05 0.50 0.71 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

3 1 1.29 1.64 2.00 0.94 1.08 0.57 0.62 0.55 1.03 0.47 0.47 0.36
2 0.07 0.44 0.51 0.94 0.02 0.16 0.17 0.94 0.01 0.14 0.14 0.96
3 0.05 0.43 0.62 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.97
4 0.03 0.22 0.23 0.94 0.01 0.14 0.16 0.94 0.01 0.12 0.13 0.96

4 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
3 0.01 0.14 0.14 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.08 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

5 1 0.70 0.42 0.45 0.71 0.66 0.27 0.30 0.23 0.64 0.23 0.23 0.14
2 0.02 0.16 0.18 0.94 0.01 0.10 0.11 0.95 0.00 0.09 0.09 0.95
3 0.01 0.14 0.14 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.08 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

6 1 2.09 1.91 2.33 0.75 1.77 0.72 0.79 0.07 1.67 0.54 0.54 0.02
2 0.08 0.38 0.49 0.92 0.02 0.14 0.14 0.94 0.01 0.11 0.11 0.95
3 0.01 0.14 0.15 0.95 0.00 0.09 0.10 0.95 0.00 0.08 0.09 0.95
4 0.01 0.14 0.14 0.95 0.01 0.10 0.10 0.94 0.00 0.08 0.09 0.95

Table 3:

Simulation results comparing different methods’ performance in terms of bias, mean estimated standard error (SE), empirical standard deviation (SD) and coverage rate (CR) for 95% confidence interval when λ0(t)=0.0004t.

Censoring Setting Method (n1,n2,n3)=(150,300,5150) (n1,n2,n3)=(300,600,10300) (n1,n2,n3)=(600,1200,20600)

Bias SE SD CR Bias SE SD CR Bias SE SD CR

Mixed 1 1 0.44 0.51 0.52 0.95 0.41 0.33 0.35 0.83 0.38 0.23 0.23 0.63
2 0.03 0.26 0.27 0.97 0.01 0.17 0.18 0.96 0.00 0.12 0.12 0.95
3 0.05 0.44 0.63 0.96 0.01 0.17 0.18 0.96 0.00 0.12 0.12 0.95
4 0.03 0.27 0.27 0.96 0.02 0.19 0.20 0.97 0.01 0.18 0.18 0.95

2 1 0.63 0.66 0.68 0.95 0.57 0.41 0.44 0.80 0.53 0.28 0.28 0.51
2 0.04 0.28 0.29 0.96 0.02 0.18 0.18 0.96 0.01 0.12 0.12 0.95
3 0.06 0.60 0.82 0.96 0.01 0.17 0.18 0.96 0.00 0.12 0.12 0.95
4 0.03 0.27 0.27 0.96 0.02 0.18 0.19 0.95 0.00 0.12 0.12 0.95

3 1 1.29 1.66 1.79 0.97 1.10 0.69 0.74 0.73 1.02 0.45 0.46 0.32
2 0.07 0.46 0.51 0.95 0.02 0.19 0.20 0.95 0.01 0.13 0.13 0.96
3 0.06 0.52 0.71 0.96 0.02 0.17 0.18 0.95 0.00 0.12 0.12 0.95
4 0.03 0.27 0.27 0.96 0.02 0.18 0.19 0.95 0.00 0.12 0.12 0.95

4 1 0.37 0.32 0.31 0.91 0.34 0.22 0.23 0.69 0.33 0.15 0.15 0.39
2 0.02 0.18 0.17 0.97 0.00 0.12 0.12 0.96 0.00 0.08 0.08 0.95
3 0.02 0.17 0.17 0.97 0.00 0.11 0.12 0.96 0.00 0.08 0.08 0.95
4 0.02 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.00 0.08 0.08 0.95

5 1 0.72 0.51 0.51 0.85 0.66 0.33 0.35 0.49 0.63 0.22 0.23 0.14
2 0.03 0.19 0.20 0.97 0.01 0.13 0.13 0.96 0.00 0.09 0.09 0.95
3 0.02 0.17 0.18 0.97 0.00 0.12 0.12 0.95 0.00 0.08 0.08 0.95
4 0.02 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.00 0.08 0.08 0.95

6 1 2.11 2.05 2.38 0.88 1.76 0.81 0.87 0.32 1.65 0.53 0.54 0.02
2 0.08 0.40 0.49 0.93 0.02 0.16 0.16 0.94 0.00 0.10 0.10 0.96
3 0.02 0.18 0.18 0.96 0.00 0.12 0.12 0.95 0.00 0.08 0.08 0.95
4 0.02 0.17 0.17 0.97 0.01 0.12 0.12 0.95 0.00 0.08 0.08 0.95

Uniform 1 1 0.42 0.33 0.36 0.92 0.38 0.20 0.21 0.59 0.37 0.17 0.17 0.42
2 0.02 0.17 0.19 0.93 0.00 0.11 0.11 0.95 0.00 0.09 0.09 0.93
3 0.04 0.36 0.65 0.94 0.09 0.10 0.11 0.95 0.00 0.09 0.09 0.94
4 0.02 0.18 0.18 0.94 0.01 0.11 0.12 0.93 0.00 0.09 0.09 0.94

2 1 0.60 0.44 0.52 0.90 0.54 0.26 0.26 0.43 0.52 0.21 0.22 0.25
2 0.03 0.19 0.23 0.93 0.00 0.11 0.11 0.95 0.00 0.09 0.09 0.93
3 0.05 0.53 0.87 0.94 0.00 0.11 0.11 0.95 0.00 0.09 0.09 0.94
4 0.02 0.18 0.18 0.94 0.01 0.11 0.12 0.93 0.00 0.09 0.09 0.94

3 1 1.24 1.41 1.74 0.90 1.04 0.47 0.49 0.23 1.00 0.36 0.37 0.10
2 0.06 0.39 0.50 0.92 0.01 0.13 0.13 0.93 0.00 0.10 0.11 0.93
3 0.04 0.44 0.74 0.94 0.00 0.11 0.11 0.94 0.00 0.09 0.09 0.94
4 0.02 0.18 0.18 0.94 0.01 0.11 0.12 0.93 0.00 0.09 0.09 0.94

4 1 0.34 0.20 0.21 0.73 0.31 0.14 0.14 0.31 0.31 0.11 0.11 0.14
2 0.00 0.11 0.12 0.93 −0.01 0.07 0.07 0.91 −0.01 0.06 0.06 0.93
3 0.00 0.10 0.12 0.94 −0.01 0.07 0.07 0.92 −0.01 0.06 0.06 0.93
4 0.00 0.11 0.11 0.92 −0.01 0.07 0.07 0.93 −0.01 0.06 0.06 0.93

5 1 0.67 0.35 0.37 0.52 0.62 0.22 0.23 0.04 0.61 0.17 0.18 0.01
2 0.01 0.13 0.15 0.93 −0.01 0.08 0.08 0.91 −0.01 0.07 0.07 0.93
3 0.00 0.11 0.12 0.94 −0.01 0.07 0.07 0.92 −0.01 0.06 0.06 0.93
4 0.00 0.11 0.11 0.92 −0.01 0.07 0.07 0.93 −0.01 0.06 0.06 0.93

6 1 1.99 1.70 2.09 0.55 1.67 0.62 0.66 0.01 1.60 0.44 0.45 0.00
2 0.06 0.34 0.45 0.92 0.00 0.12 0.12 0.91 0.00 0.09 0.09 0.93
3 0.00 0.11 0.12 0.93 −0.01 0.07 0.07 0.92 −0.01 0.06 0.06 0.93
4 0.00 0.11 0.11 0.92 −0.01 0.07 0.07 0.93 −0.01 0.06 0.06 0.93

Exponential 1 1 0.43 0.44 0.48 0.95 0.38 0.28 0.28 0.79 0.37 0.19 0.19 0.49
2 0.03 0.22 0.25 0.95 0.00 0.14 0.15 0.95 0.00 0.10 0.10 0.95
3 0.05 0.45 0.78 0.95 0.00 0.14 0.15 0.95 0.00 0.10 0.10 0.93
4 0.02 0.23 0.24 0.95 0.01 0.15 0.16 0.94 0.00 0.10 0.10 0.95

2 1 0.62 0.57 0.65 0.93 0.54 0.34 0.35 0.72 0.52 0.23 0.24 0.36
2 0.04 0.25 0.28 0.95 0.01 0.15 0.15 0.94 0.00 0.10 0.10 0.95
3 0.06 0.65 1.04 0.95 0.00 0.14 0.15 0.95 0.00 0.10 0.10 0.93
4 0.02 0.23 0.24 0.95 0.01 0.15 0.16 0.94 0.00 0.10 0.10 0.95

3 1 1.29 1.62 2.00 0.95 1.05 0.59 0.59 0.61 1.01 0.39 0.40 0.16
2 0.07 0.45 0.57 0.93 0.01 0.17 0.17 0.94 0.00 0.11 0.12 0.95
3 0.05 0.54 0.88 0.95 0.00 0.14 0.15 0.95 0.00 0.10 0.10 0.94
4 0.03 0.23 0.24 0.95 0.01 0.15 0.16 0.94 0.00 0.10 0.10 0.95

4 1 0.36 0.27 0.28 0.86 0.33 0.18 0.19 0.59 0.32 0.13 0.13 0.22
2 0.01 0.15 0.15 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.93
3 0.01 0.14 0.16 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.94
4 0.01 0.15 0.15 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.97 0.94

5 1 0.69 0.44 0.46 0.76 0.64 0.28 0.29 0.29 0.62 0.19 0.20 0.04
2 0.02 0.17 0.18 0.94 0.00 0.11 0.11 0.94 0.00 0.07 0.08 0.94
3 0.01 0.15 0.16 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.95
4 0.01 0.15 0.15 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.94

6 1 2.05 1.92 2.33 0.80 1.73 0.73 0.76 0.12 1.63 0.47 0.49 0.00
2 0.07 0.38 0.51 0.92 0.01 0.14 0.14 0.94 0.00 0.09 0.10 0.94
3 0.01 0.15 0.16 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.95
4 0.01 0.15 0.15 0.95 0.00 0.10 0.10 0.95 −0.01 0.07 0.07 0.94

To evaluate how the bias relates to $R_1^2$ and to $R_{1v}^2$, we vary the parameter $\rho$ from 0 to 0.6 in setting 1, with $\sigma_w = 1$ and $\sigma_w = 1.7$, respectively, keeping all other parameters unchanged. Figure 2 shows the estimated bias for method 1 with respect to the squared multiple correlation coefficient from the first step ($R_1^2$) and the squared multiple partial correlation coefficient given covariate $V$ from the first step ($R_{1v}^2$). The results show that the bias is not a monotonically decreasing function of $R_1^2$, but is a decreasing function of $R_{1v}^2$, which is consistent with our theoretical derivation. Figure 2 suggests that the requirement $R^2 > 0.36$ (Lampe et al., 2017) is insufficient as a single criterion to decide whether a biomarker is useful. In particular, the partial $R^2$ after adjusting for the personal characteristic effect is an important factor influencing the bias of the current biomarker-based regression calibration for the association study. Figure 3 shows the relationship between the SD of the BF-corrected estimator of the association parameter and $R_1^2$ and $R_{1v}^2$. Based on Figure 3, the SD is a decreasing function of $R_{1v}^2$ rather than $R_1^2$, which indicates that the partial $R^2$, instead of $R^2$, is an essential factor affecting the SD. Therefore, to evaluate whether the calibration equation is useful, we should consider both the partial $R^2$ and the $R^2$. Similar patterns for the other two estimators that use self-reported dietary data from the feeding study (method 3 and method 4) are shown in Figure 4.
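The distinction between the first-step $R^2$ and the partial $R^2$ given $V$ can be illustrated with a minimal sketch. The simulated data, coefficients, and function names below are hypothetical and are not the paper's simulation settings; the sketch only shows how the two quantities are computed from ordinary least squares fits and how a high $R^2$ can coexist with a much lower partial $R^2$ when $V$ explains most of the variation.

```python
import numpy as np

def residual_ss(y, predictors):
    """Residual sum of squares from an OLS fit of y on an intercept plus predictors."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def r2_and_partial_r2(x_tilde, w, v):
    """Return (R^2 of x_tilde on W and V, partial R^2 of W given V)."""
    tss = residual_ss(x_tilde, [])          # intercept-only fit
    rss_v = residual_ss(x_tilde, [v])       # V only
    rss_wv = residual_ss(x_tilde, [w, v])   # W and V
    r2 = 1.0 - rss_wv / tss
    partial_r2 = 1.0 - rss_wv / rss_v
    return r2, partial_r2

# Hypothetical data: V drives much of the intake, W is a noisy biomarker.
rng = np.random.default_rng(0)
n = 5000
v = rng.normal(size=n)
x_tilde = 0.8 * v + rng.normal(size=n)       # assessed intake (illustrative)
w = x_tilde + 1.7 * rng.normal(size=n)       # biomarker with sigma_w = 1.7

r2, partial_r2 = r2_and_partial_r2(x_tilde, w, v)
print(f"R^2 = {r2:.3f}, partial R^2 given V = {partial_r2:.3f}")
```

Because the residual sum of squares from the $V$-only fit can never exceed the total sum of squares, the partial $R^2$ computed this way is never larger than the full $R^2$.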

Fig. 2: Bias from method 1 in relation to $R_1^2$ and $R_{1v}^2$

Fig. 3: SD of method 1 and method 2 in relation to $R_1^2$ and $R_{1v}^2$

Fig. 4: SD of method 3 and method 4 in relation to $R_1^2$ and $R_{1v}^2$

Additional simulations in Appendix B indicate that method 2 is robust to violations of the assumption that the relationship between the self-reported dietary intake and the short-term dietary intake is the same in sample 1 and sample 2. They also show that, with the bias correction, the threshold on the partial $R^2$ ($R_{1v}^2$) need not be too stringent.

6. Real Data Example

We illustrate our methods with the WHI NPAAS feeding study (n=153), the NPAAS biomarker study (n=450) and the full WHI cohort data. We treat log-transformed self-reported sodium and potassium intake from the FFQ as Q. Variables including age, BMI, race/ethnicity, education level, self-reported physical activity, and smoking status are used as V; the 24-hour urine sodium and potassium measurements are used as W; and the disease outcomes are several CVD categories, including total coronary heart disease (CHD) and its myocardial infarction (MI) and coronary death components, total stroke and its hemorrhagic and ischemic components, total CVD comprised of CHD and stroke, coronary artery bypass grafting (CABG) and percutaneous coronary intervention (PCI), total CVD that also includes CABG and PCI, and heart failure. The prevalence of CVD events ranges from 1% to 10% (Prentice et al., 2017), which suggests that the rare disease assumption is not severely violated. Follow-up began at the time of FFQ measurement (year-1 visit in DM-C and at enrollment in OS) and continued until the earliest of the specific CVD outcome under analysis, death, loss to follow-up, or September 30, 2010. In our analysis, the hazard rates were modeled as implicitly conditioning on the continued survival of the study subject. This means that death is not a source of censoring in our formulation. Rather, death simply limits the follow-up period over which hazard rate information is obtained for the study subject. This is distinct from regarding death as censoring non-fatal outcomes, which a competing risk formulation would do. Using the log-transformed self-reported sodium-potassium ratio as a single predictor for the ‘true’ log-transformed dietary sodium-potassium ratio, we obtained $R^2 = 0.36$; a higher $R^2 = 0.38$ can be obtained by using the self-reported log sodium and log potassium as separate predictors. Therefore, for the analysis, we used these two self-reported measurements as predictors for the ‘true’ log-transformed dietary sodium-potassium ratio. The further inclusion of personal characteristics increased the $R^2$ to 0.45, with a partial $R^2$ conditional on personal characteristics equal to 0.37. We tested the normality of the log-transformed self-reported sodium and potassium intake (Q), the log-transformed 24-hour urine sodium and potassium measurements (W), and the log-transformed assessed sodium/potassium ratio ($\tilde{X}$). There is no evidence that any of the normality assumptions are violated (p>0.1).

For the feeding study, there were moderate measurement errors in the assessed consumed dietary data, so we differentiated the short-term dietary intake $X$ from the observed consumed dietary intake $\tilde{X}$ and adjusted the bias factor estimate by

$$\widehat{BF} = 1 - \frac{\widehat{\mathrm{Var}}(\tilde{X} \mid W, V) - \hat{\tilde{\sigma}}_x^2}{\widehat{\mathrm{Var}}(\tilde{X} \mid V) - \hat{\tilde{\sigma}}_x^2},$$

where $\hat{\tilde{\sigma}}_x^2$ was treated as a sensitivity parameter, since we do not have lab replication data for nutrition evaluation to accurately estimate the food packaging error and the nutrient composition database error. We set $\hat{\tilde{\sigma}}_x^2$ at several different levels to illustrate the potential bias.
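As a rough numerical illustration of this sensitivity analysis, the sketch below evaluates the adjusted bias factor, $1 - (\mathrm{Var}(\tilde{X} \mid W, V) - \tilde{\sigma}_x^2) / (\mathrm{Var}(\tilde{X} \mid V) - \tilde{\sigma}_x^2)$, over a grid of noise variances. The function name, the normalization $\mathrm{Var}(\tilde{X} \mid V) = 1$, and the convention of expressing the noise variance as a share of $\mathrm{Var}(\tilde{X} \mid V)$ are assumptions made for illustration; the paper's exact definition of its error levels may differ, so the printed values are not meant to reproduce its results.

```python
def bias_factor(var_x_given_wv, var_x_given_v, noise_var=0.0):
    """Adjusted bias factor: 1 - (Var(X~|W,V) - s2) / (Var(X~|V) - s2).

    noise_var (s2) is the assumed variance of the error in the assessed
    consumed diet; it is a sensitivity parameter, not an estimate.
    """
    signal_given_wv = var_x_given_wv - noise_var
    signal_given_v = var_x_given_v - noise_var
    if signal_given_v <= 0 or signal_given_wv < 0:
        raise ValueError("noise_var is too large for the supplied variances")
    return 1.0 - signal_given_wv / signal_given_v

# Illustrative inputs: Var(X~|V) normalized to 1 and a partial R^2 of 0.37,
# so Var(X~|W,V) = 0.63 (hypothetical normalization).
var_given_v = 1.0
var_given_wv = 0.63

for noise_share in (0.0, 0.2, 0.4, 0.5):
    bf = bias_factor(var_given_wv, var_given_v, noise_share * var_given_v)
    print(f"noise share {noise_share:.1f}: BF = {bf:.3f}")
```

With no noise the bias factor reduces to the partial $R^2$, and it grows as more of the residual variation is attributed to noise in the assessed diet.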

The estimated hazard ratio (HR) for a 20% increase in dietary intake is shown in Table 4. Under the assumption that there was no measurement error in the consumed dietary data in the controlled feeding study, we estimate the bias factor to be about 0.37. This leads to the most conservative (biased toward the null) HR estimate. The HR estimated from method 3 (three-step with FFQ approach) is about the same as the estimate using method 2 with a BF around 0.78, which is equivalent to the case where approximately 50% of the variation in the estimated consumed diet data is from ‘noise’. This noise level is about the same across all disease outcomes and is consistent with our estimation of total energy expenditure (Zheng et al., 2014).
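Because the exposure enters the hazard model on the log scale, a 20% increase in intake corresponds to adding $\log(1.2)$ to the covariate, so the HR is $\exp\{\beta \log(1.2)\} = 1.2^{\beta}$. The sketch below shows this conversion together with a Wald-type interval; the coefficient and standard error are hypothetical values chosen for illustration, not estimates from the WHI analysis.

```python
import math

def hr_for_relative_increase(beta, se, increase=0.20, z=1.959964):
    """HR and Wald 95% CI for a relative increase in a log-transformed exposure.

    beta and se are the log-hazard coefficient and its standard error for the
    log-transformed exposure; increase=0.20 means a 20% higher intake.
    """
    delta = math.log(1.0 + increase)           # shift on the log scale
    hr = math.exp(beta * delta)
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return hr, lower, upper

# Hypothetical coefficient and standard error (illustration only).
hr, lower, upper = hr_for_relative_increase(beta=0.8, se=0.3)
print(f"HR = {hr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```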

Table 4: Association between a 20% increase in sodium/potassium ratio and various cardiovascular diseases.

| Outcome | Method 1 HR | Method 1 95% CI | Method 3 HR | Method 3 95% CI | Method 4 HR | Method 4 95% CI |
|---|---|---|---|---|---|---|
| CHD | 1.20 | (1.06, 1.36) | 1.16 | (1.04, 1.28) | 1.21 | (1.03, 1.43) |
| Nonfatal MI | 1.22 | (1.04, 1.42) | 1.16 | (1.04, 1.30) | 1.19 | (1.01, 1.41) |
| Coronary death | 1.21 | (1.07, 1.38) | 1.17 | (1.03, 1.33) | 1.27 | (0.98, 1.64) |
| Stroke | 1.11 | (1.02, 1.21) | 1.10 | (1.00, 1.20) | 1.17 | (1.02, 1.34) |
| Ischemic Stroke | 1.18 | (1.06, 1.31) | 1.15 | (1.02, 1.29) | 1.24 | (1.05, 1.46) |
| Hemorrhagic Stroke | 0.86 | (0.67, 1.10) | 0.90 | (0.73, 1.11) | 0.92 | (0.63, 1.35) |
| Total CVD | 1.15 | (1.08, 1.22) | 1.12 | (1.03, 1.21) | 1.17 | (1.06, 1.29) |
| Revascularization | 1.21 | (1.06, 1.37) | 1.15 | (1.04, 1.28) | 1.18 | (1.01, 1.39) |
| Non-Revascularization | 1.15 | (1.05, 1.26) | 1.12 | (1.03, 1.22) | 1.18 | (1.04, 1.34) |
| Heart Failure | 1.06 | (0.94, 1.19) | 1.03 | (0.94, 1.14) | 1.00 | (0.84, 1.18) |

| Outcome | Method 2 (no Error) HR | Method 2 (no Error) 95% CI | Method 2 (40% Error) HR | Method 2 (40% Error) 95% CI | Method 2 (50% Error) HR | Method 2 (50% Error) 95% CI |
|---|---|---|---|---|---|---|
| CHD | 1.07 | (1.02, 1.12) | 1.13 | (1.04, 1.23) | 1.16 | (1.04, 1.30) |
| Nonfatal MI | 1.08 | (1.03, 1.13) | 1.14 | (1.03, 1.25) | 1.17 | (1.03, 1.34) |
| Coronary death | 1.07 | (1.02, 1.14) | 1.14 | (1.02, 1.26) | 1.17 | (1.03, 1.34) |
| Stroke | 1.04 | (1.00, 1.08) | 1.07 | (1.00, 1.15) | 1.09 | (1.00, 1.18) |
| Ischemic Stroke | 1.06 | (1.01, 1.11) | 1.11 | (1.02, 1.21) | 1.14 | (1.03, 1.27) |
| Hemorrhagic Stroke | 0.94 | (0.86, 1.03) | 0.90 | (0.77, 1.06) | 0.88 | (0.72, 1.08) |
| Total CVD | 1.05 | (1.02, 1.09) | 1.10 | (1.03, 1.16) | 1.12 | (1.04, 1.21) |
| Revascularization | 1.07 | (1.02, 1.12) | 1.13 | (1.03, 1.25) | 1.17 | (1.03, 1.32) |
| Non-Revascularization | 1.05 | (1.02, 1.09) | 1.10 | (1.03, 1.17) | 1.12 | (1.03, 1.22) |
| Heart Failure | 1.02 | (0.98, 1.07) | 1.04 | (0.96, 1.12) | 1.05 | (0.94, 1.16) |

Compared with the results of Prentice et al. (2017), we reach the same findings: the sodium/potassium ratio is positively associated with the risk of CHD, nonfatal myocardial infarction, coronary death, ischemic stroke, total CVD, coronary revascularization, and non-revascularization, and negatively associated with hemorrhagic stroke, but the conservative lower bounds of these effects are much smaller than previously presented. Method 3 gives a slightly wider confidence interval than Prentice et al. (2017), but the difference in the point estimate is modest. Method 2 with a 50% error assumption gives a point estimate and a confidence interval similar to those from method 3, which is consistent with what we observed in the simulations. For method 2 with the no-error assumption, since the BF is over-estimated, the confidence interval is narrower than that of method 3 because of the shrinkage effect. However, since the no-error assumption is implausible, it leads to results biased towards the null, and the comparison of efficiency is not of much interest. Method 4 has a wider confidence interval than method 3, indicating that the urine biomarker is ‘strong’ and provides useful independent information beyond the self-reported data.

Estimation procedures for non-linear hazard ratios have not been fully developed for the regression calibration procedure (Prentice & Huang, 2018). However, a test for non-linearity, carried out by including a quadratic term in calibrated intake in the hazard ratio model and testing for a zero coefficient, does not show significant nonlinear effects (Prentice et al., 2017).

7. Discussion

In this work, we showed both theoretically and numerically that the naive regression calibration method (Method 1) leads to biased estimation of the association between the exposure and the outcome, even when the $R^2$ from the first-step regression is large. In addition, $R^2$ itself is not a sufficient criterion for the evaluation of biomarkers in general; the partial $R^2$ also needs to be taken into consideration. Moreover, the set of variables V to condition on should not include potential mediators.

We proposed three new regression calibration methods that provide consistent estimation of the association under the rare disease assumption. The newly proposed methods have significantly improved performance over Method 1. We performed extensive simulations to compare the performance of the different estimators and to provide guidance on future study designs.

In practice, Method 1 should not be used on its own because of its potentially large bias, unless the biomarker is very strong. The other three methods each have advantages and disadvantages. Method 4 is the simplest in design and requires the fewest assumptions. However, it depends on the availability of a strong dietary instrument and a large number of subjects in the feeding study to accurately characterize the association between the dietary instrument and the true dietary intake. Method 3 is another simple three-step approach. From both the theoretical results and the numerical studies, Method 3 uniformly outperforms Method 4 when the sample size is large. It uses the biomarker information efficiently and is robust to measurement error in the assessed diet from the controlled feeding study. Method 3 works well under the assumption that the association between the dietary instrument and the true dietary intake is the same in the cohort and in the controlled feeding study subgroup. In this case, we prefer Method 3 for studies of dietary components whose intakes do not change dramatically over time. Method 3 has been used successfully in several recent studies (Prentice et al., 2021a,b).

When such dietary instruments are not available, we need to consider Method 2. Method 2 does not use dietary instrument information in the biomarker development stage; it depends only on the objective measurements. Therefore, as long as the model is correctly specified, the same biomarker can be used to build calibration equations for dietary instruments that are available for the cohort but are not necessarily measured in the controlled feeding study. Comparing Method 2 and Method 3, when we have a relatively strong biomarker and weak dietary instruments, Method 2 performs better than Method 3, whereas when we have a relatively weak biomarker and a strong dietary instrument, Method 3 performs better.

One limitation of Method 2 is that it requires knowledge of the variance of the noise in the assessed diet, $\tilde{\sigma}_x^2$. The major portion of this variation might come from inaccuracy in the records of the nutritional database (a bag of chips labeled with 100 cal might be 90 cal or 110 cal). Ideally, if this variation information could be added to the nutritional database, the problem would be solved. When producing the standardized food, we recommend analyzing multiple samples of each type of food used in the feeding study during the nutritional analysis stage and reporting the standard error for each food type. When $\tilde{\sigma}_x^2$ is not available, a sensitivity analysis can be done. Alternatively, when more than one self-reported measurement exists (denoted as Q1 and Q2, for example) and only one of them (e.g. Q1) is available in the first-stage sample, we can estimate $\tilde{\sigma}_x$ by comparing the estimators $\hat{\theta}_3$ and $\hat{\theta}_1$ using Q1 only, and then apply Method 2 with Q2 or with both Q1 and Q2, as shown in Appendix B.

As is usual in Cox regression models without measurement error in the predictors, if Z and V are highly correlated, then we will face variance inflation and will not be able to estimate the coefficient of Z precisely without an extremely large sample size. The potential collinearity between W and V will not in itself cause a problem, since we are only interested in predicting $E[Z \mid W, V]$. However, if $E[Z \mid W, V]$ and V are highly correlated (i.e., the biomarker cannot provide enough additional information about the true diet given the adjusted confounding variables), then we will face the variance inflation issue.
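One rough way to gauge this issue is the variance inflation factor from linear regression, $1/(1 - R^2)$, where $R^2$ comes from regressing the calibrated exposure on V. This diagnostic is borrowed from ordinary linear models and is used here only as an analogue for the Cox setting; it is not part of the proposed methods. The simulated data, noise levels, and function name below are hypothetical.

```python
import numpy as np

def variance_inflation(z_hat, v):
    """VIF of a calibrated exposure z_hat with respect to covariates v.

    Computed as 1 / (1 - R^2), with R^2 from an OLS fit of z_hat on an
    intercept and v (v may be a vector or a 2-D array of covariates).
    """
    design = np.column_stack([np.ones(len(z_hat)), v])
    coef, *_ = np.linalg.lstsq(design, z_hat, rcond=None)
    resid = z_hat - design @ coef
    centered = z_hat - z_hat.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)

# Hypothetical calibrated exposures: one where the biomarker adds little
# beyond V, and one where it adds substantial independent information.
rng = np.random.default_rng(1)
n = 2000
v = rng.normal(size=n)
z_hat_weak = v + 0.1 * rng.normal(size=n)     # almost a function of V
z_hat_strong = v + 1.0 * rng.normal(size=n)   # biomarker adds information

vif_weak = variance_inflation(z_hat_weak, v)
vif_strong = variance_inflation(z_hat_strong, v)
print(f"VIF (weak biomarker) = {vif_weak:.1f}, VIF (strong biomarker) = {vif_strong:.1f}")
```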

Here we emphasize that our proposed approach is different from simple imputation (SI), and that SI is not suitable for our measurement error correction problem. First, for Steps 1 and 2 of the biomarker-calibrated approach, we do not use SI to generate one or more observations from the conditional distribution of the unobserved dietary variable given the available data. Instead, we form a good approximation to the expectation of the unobserved ‘true’ intake given the available data. Also, when calculating the asymptotic variance, unlike SI, which only uses the variance estimated from the last step, our proposed regression calibration method also incorporates the variation in the regression models for Steps 1 and 2. Hence, for SI to attain efficiency similar to that of the regression calibration approach, a very large number of imputations would be needed, which is not practical. Under the classical measurement error assumption and the rare event assumption, it has been shown in previous literature (Prentice, 1982) that a two-step regression calibration approach leads to a consistent estimator. By carefully selecting the set of variables that can be included in the regression at each step, we extended the existing literature and showed that our proposed three-step estimator is also consistent under similar assumptions.

Multiple imputation is not suitable for the measurement error correction problem in general, since the true dietary intakes Z and X are never observed. The measurement error contained in $\tilde{X}$ leads to biased estimates under multiple imputation. In addition, because n3 is much larger than n2 and n1, the multiple imputation algorithm is quite inefficient both statistically and computationally for our application.

To evaluate the generalizability of the developed biomarker and calibration equation, it will be of interest to validate and/or apply them in other nutritional studies, such as Observing Protein and Energy Nutrition (OPEN) (Subar et al., 2003) and the Nurses’ Health Study (NHS) (Belanger et al., 1978).

Table 7:

Simulation results comparing different methods’ performance when the correlation structures between the short term dietary intake and true dietary intakes of the controlled feeding study and the full cohort are different when λ0(t)=0.004.

Censor Setting Diff Method (n1,n2,n3)=(150,300,5150) (n1,n2,n3)=(300,600,10300) (n1,n2,n3)=(600,1200,20600)

Bias SE SD CR Bias SE SD CR Bias SE SD CR

Mix 1 0.1 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.1 2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.1 3 0.02 0.81 1.03 0.96 0.03 0.14 0.16 0.95 0.02 0.13 0.13 0.97
0.1 4 0.08 0.27 0.28 0.96 0.06 0.17 0.20 0.96 0.05 0.14 0.14 0.97

0.3 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.3 2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.3 3 0.01 6.7 2.74 0.97 0.07 0.17 0.18 0.98 0.07 0.15 0.14 0.97
0.3 4 0.23 0.54 0.56 0.98 0.17 0.26 0.32 0.99 0.16 0.19 0.20 0.97

0.5 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.5 2 0.03 0.21 0.29 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.5 3 0.22 2.27 1.38 0.98 0.14 0.21 0.24 0.99 0.13 0.17 0.17 0.96
0.5 4 0.56 10.97 3.62 0.99 0.39 13.57 5.49 1.00 0.40 0.36 0.41 0.96

4 0.1 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.1 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
0.1 3 0.03 0.14 0.15 0.96 0.02 0.10 0.10 0.96 0.02 0.09 0.09 0.95
0.1 4 0.04 0.16 0.17 0.96 0.04 0.11 0.11 0.96 0.03 0.09 0.10 0.96

0.3 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.3 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
0.3 3 0.07 0.17 0.19 0.97 0.05 0.11 0.12 0.96 0.05 0.10 0.10 0.95
0.3 4 0.16 0.28 0.37 0.99 0.13 0.15 0.17 0.98 0.13 0.12 0.13 0.91

0.5 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.5 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.96 0.00 0.08 0.09 0.95
0.5 3 0.14 0.26 0.39 0.98 0.11 0.13 0.14 0.97 0.10 0.11 0.11 0.92
0.5 4 0.46 1.43 1.43 1.00 −0.20 6.72 21.5 0.98 0.31 0.20 0.21 0.79

Unif 1 0.1 1 0.47 0.51 0.51 0.96 0.41 0.33 0.35 0.83 0.39 0.30 0.29 0.77
0.1 2 0.05 0.26 0.26 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.15 0.96
0.1 3 0.03 0.96 1.20 0.98 0.03 0.18 0.19 0.96 0.03 0.16 0.16 0.96
0.1 4 0.09 0.32 0.33 0.98 0.06 0.20 0.22 0.96 0.05 0.17 0.17 0.97

0.3 1 0.47 0.51 0.51 0.96 0.41 0.33 0.35 0.83 0.39 0.30 0.29 0.77
0.3 2 0.05 0.26 0.26 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.15 0.96
0.3 3 0.03 7.56 3.08 0.98 0.08 0.21 0.22 0.97 0.07 0.18 0.18 0.97
0.3 4 0.25 0.62 0.65 0.99 0.19 0.30 0.34 0.98 0.16 0.23 0.23 0.97

0.5 1 0.78 0.51 0.51 0.94 0.41 0.33 0.35 0.83 0.39 0.30 0.29 0.77
0.5 2 0.05 0.26 0.26 0.97 0.02 0.17 0.18 0.96 0.01 0.16 0.15 0.96
0.5 3 0.29 3.73 2.37 0.99 0.15 0.25 0.27 0.98 0.13 0.21 0.21 0.96
0.5 4 0.57 12.47 4.62 0.99 0.40 14.07 5.68 0.99 0.40 0.40 0.43 0.97

4 0.1 1 0.37 0.32 0.31 0.91 0.35 0.21 0.22 0.68 0.35 0.20 0.20 0.59
0.1 2 0.02 0.17 0.17 0.96 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96
0.1 3 0.03 0.18 0.19 0.97 0.02 0.12 0.12 0.95 0.02 0.11 0.11 0.96
0.1 4 0.05 0.19 0.20 0.98 0.04 0.13 0.14 0.96 0.04 0.12 0.12 0.96

0.3 1 0.37 0.32 0.31 0.91 0.35 0.21 0.22 0.68 0.35 0.20 0.20 0.59
0.3 2 0.02 0.17 0.17 0.96 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96
0.3 3 0.07 0.21 0.24 0.98 0.06 0.13 0.14 0.96 0.06 0.12 0.12 0.95
0.3 4 0.17 0.34 0.46 0.99 0.14 0.18 0.19 0.97 0.13 0.15 0.15 0.92

0.5 1 0.37 0.32 0.31 0.91 0.35 0.21 0.22 0.68 0.35 0.20 0.20 0.59
0.5 2 0.02 0.17 0.17 0.96 0.01 0.12 0.12 0.95 0.01 0.11 0.11 0.96
0.5 3 0.14 0.31 0.49 0.99 0.11 0.16 0.17 0.96 0.11 0.14 0.14 0.93
0.5 4 0.45 1.39 1.34 1.00 −0.09 4.27 16.53 0.98 0.32 0.24 0.24 0.85

Exp 1 0.1 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.1 2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.1 3 0.02 0.81 1.03 0.96 0.03 0.14 0.16 0.95 0.02 0.13 0.13 0.97
0.1 4 0.08 0.27 0.28 0.96 0.06 0.17 0.20 0.96 0.05 0.14 0.14 0.97

0.3 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.3 2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.3 3 0.01 6.7 2.74 0.97 0.07 0.17 0.18 0.98 0.07 0.15 0.14 0.97
0.3 4 0.23 0.54 0.56 0.98 0.17 0.26 0.32 0.99 0.16 0.19 0.20 0.97

0.5 1 0.44 0.41 0.43 0.94 0.40 0.26 0.29 0.74 0.38 0.24 0.24 0.66
0.5 2 0.03 0.21 0.22 0.95 0.01 0.14 0.15 0.94 0.01 0.12 0.12 0.96
0.5 3 0.22 2.27 1.38 0.98 0.14 0.21 0.24 0.99 0.13 0.17 0.17 0.96
0.5 4 0.56 10.97 3.62 0.99 0.39 13.57 5.49 1.00 0.40 0.36 0.41 0.96

4 0.1 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.1 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
0.1 3 0.03 0.14 0.15 0.96 0.02 0.10 0.10 0.96 0.02 0.09 0.09 0.95
0.1 4 0.04 0.16 0.17 0.96 0.04 0.11 0.11 0.96 0.03 0.09 0.10 0.96

0.3 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.3 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
0.3 3 0.07 0.17 0.19 0.97 0.06 0.11 0.12 0.96 0.05 0.10 0.10 0.95
0.3 4 0.16 0.28 0.37 0.99 0.14 0.15 0.17 0.98 0.13 0.12 0.13 0.91

0.5 1 0.37 0.26 0.27 0.82 0.35 0.17 0.19 0.52 0.34 0.15 0.15 0.39
0.5 2 0.02 0.14 0.15 0.95 0.00 0.10 0.10 0.95 0.00 0.08 0.09 0.95
0.5 3 0.14 0.26 0.39 0.98 0.11 0.13 0.14 0.97 0.10 0.11 0.11 0.92
0.5 4 0.46 1.43 1.43 1.00 −0.20 6.72 21.5 0.98 0.31 0.20 0.21 0.79

Acknowledgements

This work was partially supported by grant R01 CA119171 from the U.S. National Cancer Institute and R01 GM106177 and U54 GM115458 from the National Institute of General Medical Sciences. The WHI programs are funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts, HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C.

The authors acknowledge the following investigators in the Women’s Health Initiative (WHI) Program: Program Office: Jacques E. Rossouw, Shari Ludlam, Dale Burwen, Joan McGowan, Leslie Ford, and Nancy Geller, National Heart, Lung, and Blood Institute, Bethesda, Maryland; Clinical Coordinating Center, Women’s Health Initiative Clinical Coordinating Center: Garnet L. Anderson, Ross L. Prentice, Andrea Z. LaCroix, and Charles L. Kooperberg, Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington; Investigators and Academic Centers: JoAnn E. Manson, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts; Barbara V. Howard, MedStar Health Research Institute/Howard University, Washington, DC; Marcia L. Stefanick, Stanford Prevention Research Center, Stanford, California; Rebecca Jackson, The Ohio State University, Columbus, Ohio; Cynthia A. Thomson, University of Arizona, Tucson/Phoenix, Arizona; Jean Wactawski-Wende, University at Buffalo, Buffalo, New York; Marian C. Limacher, University of Florida, Gainesville/Jacksonville, Florida; Robert M. Wallace, University of Iowa, Iowa City/Davenport, Iowa; Lewis H. Kuller, University of Pittsburgh, Pittsburgh, Pennsylvania; and Sally A. Shumaker, Wake Forest University School of Medicine, Winston-Salem, North Carolina; Women’s Health Initiative Memory Study: Sally A. Shumaker, Wake Forest University School of Medicine, Winston-Salem, North Carolina. For a list of all the investigators who have contributed to WHI science, please visit: https://www.whi.org/researchers/SitePages/WHI

Decisions concerning study design, data collection and analysis, interpretation of the results, the preparation of the manuscript, and the decision to submit the manuscript for publication resided with committees that comprised WHI investigators and included National Heart, Lung, and Blood Institute representatives. The contents of the paper are solely the responsibility of the authors.

Funding

The research is partially supported by grant R01 CA119171 from the U.S. National Cancer Institute, U54 GM115458 and R01 GM106177 from the National Institute of General Medical Sciences.

Appendix A: Technical Details

A.1: Bias Factor Derivation

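In this subsection, we derive the asymptotic bias of $\theta$ for method 1 under the rare disease assumption by checking how $E(\hat X\mid Q,V)$ is biased away from $E(X\mid Q,V)$. For the first-step regression model of $\tilde X$ on $(W,V)$, we have

$$\hat X=E(\tilde X\mid W,V)=E(X\mid W,V)=E(X\mid V)+\{W-E(W\mid V)\}\Sigma_{WW|V}^{-1}\Sigma_{XW|V},$$

where $\Sigma_{WW|V}=\mathrm{Var}(W\mid V)$ and $\Sigma_{XW|V}=\mathrm{Cov}(X,W\mid V)$ are conditional variance-covariance matrices. Plugging this into the second-step regression model, we have

$$\begin{aligned}E(\hat X\mid Q,V)&=E\{E(\hat X\mid X,Q,V)\mid Q,V\}=E\{E(\hat X\mid X,V)\mid Q,V\}\\&=E\left(E\left[E(X\mid V)+\{W-E(W\mid V)\}\Sigma_{WW|V}^{-1}\Sigma_{XW|V}\mid X,V\right]\mid Q,V\right)\\&=E\left[E(X\mid V)+\{E(W\mid X,V)-E(W\mid V)\}\Sigma_{WW|V}^{-1}\Sigma_{XW|V}\mid Q,V\right]\\&=E\left[E(X\mid V)+\{X-E(X\mid V)\}\Sigma_{XX|V}^{-1}\Sigma_{XW|V}\Sigma_{WW|V}^{-1}\Sigma_{XW|V}\mid Q,V\right]\\&=E\left[E(X\mid V)+\{X-E(X\mid V)\}\rho_V\mid Q,V\right]\\&=\rho_VE(X\mid Q,V)+(1-\rho_V)E(X\mid V).\end{aligned}$$

When the bias factor $\rho_V$ is constant over $V$, we simply denote it as $\rho$ and we have $\rho\theta_z^*=\theta_z$, or $\theta_z^*=\rho^{-1}\theta_z$, with appropriate adjustment for $V$. Explicitly, we have

$$\rho=1-\Sigma_{XX|V}^{-1}\left(\Sigma_{XX|V}-\Sigma_{XW|V}\Sigma_{WW|V}^{-1}\Sigma_{XW|V}\right)=1-\frac{\mathrm{Var}(X\mid W,V)}{\mathrm{Var}(X\mid V)}=R^2_{X,W|V}.$$

If we further have that $E(X\mid V)=V\delta$ is a linear function of $V$, then $(1-\rho)\delta\theta_z^*+\theta_v^*=\theta_v$, or $\theta_v^*=\theta_v-\frac{(1-\rho)\delta}{\rho}\theta_z$.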
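The attenuation relation derived in this subsection can be checked numerically. The sketch below is only an illustration under assumptions that are not taken from the paper's simulation design: jointly normal scalar variables, a hypothetical helper `ols`, and arbitrary coefficient values. It verifies that the coefficient of $Q$ when regressing $\hat X$ on $(Q,V)$ is approximately $\rho$ times the coefficient of $Q$ when regressing $X$ on $(Q,V)$.

```python
import numpy as np

def ols(y, *cols):
    """Return least-squares coefficients (intercept first) and residuals."""
    design = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

# Illustrative data-generating values (assumptions, not the paper's settings).
rng = np.random.default_rng(0)
n = 200_000
V = rng.normal(size=n)                               # personal characteristic
X = 0.6 * V + rng.normal(size=n)                     # true intake
W = 1.3 * X + rng.normal(size=n)                     # biomarker, W independent of Q given (X, V)
Q = 4.0 + 1.5 * X + rng.normal(scale=3.0, size=n)    # self-report

# First step: X_hat = fitted value of X on (W, V).
_, res_xwv = ols(X, W, V)
X_hat = X - res_xwv

# Bias factor rho = 1 - Var(X | W, V) / Var(X | V), from residual variances.
_, res_xv = ols(X, V)
rho = 1.0 - res_xwv.var() / res_xv.var()

# Second step: compare the Q-slopes of X_hat and X given (Q, V).
slope_true = ols(X, Q, V)[0][1]
slope_hat = ols(X_hat, Q, V)[0][1]
ratio = slope_hat / slope_true

print(f"rho = {rho:.3f}, slope ratio = {ratio:.3f}")
```

Under these assumptions the two printed numbers should agree up to sampling error, which is the content of $E(\hat X\mid Q,V)=\rho E(X\mid Q,V)+(1-\rho)E(X\mid V)$ restricted to the $Q$ direction.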

A.2: Proof of Theorem 1

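Before proving Theorem 1, we first prove the lemma below, which gives the asymptotic distribution for the third-step regression given $\hat\gamma$ from the second step.

Lemma 1

Assume $n_3/n_2\to C_2<\infty$ and $\theta^*$ is the unique solution to $E\,U(\theta,\gamma^*)=0$ for an estimating function $U(\theta,\gamma)$. If $\hat\theta$ solves the estimating equation $0=n_3^{-1}\sum_{i=1}^{n_3}U_i(\theta,\hat\gamma)$, where $\sqrt{n_2}(\hat\gamma-\gamma^*)\to N(0,\Sigma_\gamma)$ and $\hat\gamma$ is independent of $U_i(\theta,\gamma)$, then we have $\sqrt{n_3}(\hat\theta-\theta^*)\to N(0,\Sigma_\theta)$, where

$$\Sigma_\theta=I_\theta^{-1}\left(J_\theta+C_2I_\gamma\Sigma_\gamma I_\gamma^{\top}\right)I_\theta^{-\top},$$

where $I_\theta=-E\left\{\frac{\partial U(\theta,\gamma)}{\partial\theta}\right\}\Big|_{\theta^*,\gamma^*}$, $I_\gamma=-E\left\{\frac{\partial U(\theta,\gamma)}{\partial\gamma}\right\}\Big|_{\theta^*,\gamma^*}$ and $J_\theta=\mathrm{Var}(U(\theta^*,\gamma^*))$. This variance can be consistently estimated by

$$\hat\Sigma_\theta=\hat I_\theta^{-1}\left(\hat J_\theta+\frac{n_3}{n_2}\hat I_\gamma\hat\Sigma_\gamma\hat I_\gamma^{\top}\right)\hat I_\theta^{-\top},$$

where $\hat I_\theta=-n_3^{-1}\sum_{i=1}^{n_3}\frac{\partial U_i(\theta,\gamma)}{\partial\theta}\Big|_{\hat\theta,\hat\gamma}$, $\hat I_\gamma=-n_3^{-1}\sum_{i=1}^{n_3}\frac{\partial U_i(\theta,\gamma)}{\partial\gamma}\Big|_{\hat\theta,\hat\gamma}$, $\hat J_\theta=n_3^{-1}\sum_{i=1}^{n_3}U_i(\hat\theta,\hat\gamma)U_i(\hat\theta,\hat\gamma)^{\top}$ and $\hat\Sigma_\gamma$ is a consistent estimator of $\Sigma_\gamma$.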

Proof

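We can derive the asymptotic distribution of $\hat\theta$ as below:

$$0=\sum_{i=1}^{n_3}U_i(\hat\theta,\hat\gamma)=\sum_{i=1}^{n_3}\left\{U_i(\theta^*,\gamma^*)+\frac{\partial U_i(\theta,\gamma)}{\partial\theta}\Big|_{\theta^*,\gamma^*}(\hat\theta-\theta^*)+\frac{\partial U_i(\theta,\gamma)}{\partial\gamma}\Big|_{\theta^*,\gamma^*}(\hat\gamma-\gamma^*)\right\}+o(\|\hat\theta-\theta^*\|,\|\hat\gamma-\gamma^*\|).$$

By the uniform law of large numbers (ULLN), we have

$$-n_3^{-1}\sum_{i=1}^{n_3}\frac{\partial U_i(\theta,\gamma)}{\partial\theta}\Big|_{\theta^*,\gamma^*}=-E\left\{\frac{\partial U(\theta,\gamma)}{\partial\theta}\right\}\Big|_{\theta^*,\gamma^*}+o_p(1)=I_\theta+o_p(1).$$

Similarly, we have

$$-n_3^{-1}\sum_{i=1}^{n_3}\frac{\partial U_i(\theta,\gamma)}{\partial\gamma}\Big|_{\theta^*,\gamma^*}=I_\gamma+o_p(1).$$

Plugging these into the Taylor expansion and noticing that $E\,U(\theta^*,\gamma^*)=0$, we have

$$0=n_3^{-1}\sum_{i=1}^{n_3}\{U_i(\theta^*,\gamma^*)-E\,U(\theta^*,\gamma^*)\}-I_\theta(\hat\theta-\theta^*)-I_\gamma(\hat\gamma-\gamma^*)+o(\|\hat\theta-\theta^*\|,\|\hat\gamma-\gamma^*\|)+o_p(1)\cdot\|\hat\gamma-\gamma^*\|+o_p(1)\cdot\|\hat\theta-\theta^*\|.$$

So we have

$$\sqrt{n_3}(\hat\theta-\theta^*)=I_\theta^{-1}\left\{n_3^{-1/2}\sum_{i=1}^{n_3}U_i(\theta^*,\gamma^*)-I_\gamma\sqrt{n_3}(\hat\gamma-\gamma^*)\right\}+o_p(1).$$

By the central limit theorem (CLT), we have

$$n_3^{-1/2}\sum_{i=1}^{n_3}\{U_i(\theta^*,\gamma^*)-E\,U(\theta^*,\gamma^*)\}\to_dN(0,J_\theta).$$

Because $\hat\gamma$ is independent of the $U_i$ and $\sqrt{n_3}(\hat\gamma-\gamma^*)=\sqrt{n_3/n_2}\,\sqrt{n_2}(\hat\gamma-\gamma^*)\to N(0,C_2\Sigma_\gamma)$, the two terms in braces are asymptotically independent, so we have

$$\Sigma_\theta=I_\theta^{-1}\left(J_\theta+C_2I_\gamma\Sigma_\gamma I_\gamma^{\top}\right)I_\theta^{-\top}.$$

By the ULLN and the continuous mapping theorem, we have $\hat I_\theta=I_\theta+o_p(1)$, $\hat I_\gamma=I_\gamma+o_p(1)$ and $\hat J_\theta=J_\theta+o_p(1)$. By the assumptions $\hat\Sigma_\gamma=\Sigma_\gamma+o_p(1)$ and $n_3/n_2\to C_2<\infty$, using the continuous mapping theorem, we have $\hat\Sigma_\theta=\Sigma_\theta+o_p(1)$.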
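As a rough illustration of how the plug-in variance in Lemma 1 could be assembled once its ingredients are available, the sketch below combines them with NumPy. The function name `two_stage_sandwich` and the numerical inputs are hypothetical and chosen only for the example; they are not estimates from the paper.

```python
import numpy as np

def two_stage_sandwich(I_theta, J_theta, I_gamma, Sigma_gamma, n3, n2):
    """Plug-in form I_theta^{-1} (J_theta + (n3/n2) I_gamma Sigma_gamma I_gamma^T) I_theta^{-T}.

    I_theta, J_theta: (p, p) arrays; I_gamma: (p, q) array; Sigma_gamma: (q, q) array.
    """
    I_theta = np.atleast_2d(np.asarray(I_theta, dtype=float))
    J_theta = np.atleast_2d(np.asarray(J_theta, dtype=float))
    I_gamma = np.atleast_2d(np.asarray(I_gamma, dtype=float))
    Sigma_gamma = np.atleast_2d(np.asarray(Sigma_gamma, dtype=float))
    # Extra variability inherited from the estimated calibration coefficients.
    middle = J_theta + (n3 / n2) * I_gamma @ Sigma_gamma @ I_gamma.T
    I_inv = np.linalg.inv(I_theta)
    return I_inv @ middle @ I_inv.T

# Hypothetical scalar-theta example with a two-dimensional gamma.
corrected = two_stage_sandwich(
    I_theta=[[2.0]], J_theta=[[2.0]],
    I_gamma=[[1.0, 0.5]], Sigma_gamma=np.diag([0.4, 0.2]),
    n3=1000, n2=100,
)
naive = two_stage_sandwich([[2.0]], [[2.0]], [[0.0, 0.0]], np.diag([0.4, 0.2]), 1000, 100)
print(corrected[0, 0], naive[0, 0])
```

Setting `I_gamma` to zero recovers the usual single-stage variance, so the difference between the two printed values shows how much the uncertainty in $\hat\gamma$ contributes in this made-up example.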

Now we provide the detailed forms of the quantities involved in a Cox regression model.

Lemma 2

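Assume $n_3/n_2\to C_2<\infty$ and $\theta^*$ is the unique solution to

$$0=U(\theta,\gamma^*)=E\int_0^\tau\left[\begin{pmatrix}Z^*\\V\end{pmatrix}-\frac{E\left[\begin{pmatrix}Z^*\\V\end{pmatrix}Y(t)\exp\{(Z^*,V)\theta\}\right]}{E\left[Y(t)\exp\{(Z^*,V)\theta\}\right]}\right]dN(t),$$

where $Z^*=X\gamma^*$. If $\hat\theta$ solves the estimating equation

$$0=n_3^{-1}\sum_{i=1}^{n_3}\int_0^\tau\left[\begin{pmatrix}\hat Z_i\\V_i\end{pmatrix}-\sum_{j=1}^{n_3}\frac{Y_j(t)\exp\{(\hat Z_j,V_j)\theta\}}{\sum_{k=1}^{n_3}Y_k(t)\exp\{(\hat Z_k,V_k)\theta\}}\begin{pmatrix}\hat Z_j\\V_j\end{pmatrix}\right]dN_i(t),$$

where $\hat Z_i=X_i\hat\gamma$ and $\sqrt{n_2}(\hat\gamma-\gamma^*)\to N(0,\Sigma_\gamma)$, then we have $\sqrt{n_3}(\hat\theta-\theta^*)\to N(0,\Sigma_\theta)$, where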

Σθ=Iθ1Jθ+C2IγΣγIγIθ,Iθ=EYi(t)exp(Zi*,Vi)θZi*Vi2Jθ=E0τZi*Vis(1)(θ,t)s(0)(θ,t)2dNi(t)Iγ=E0τXi0A(θ,t)s(0)(θ,t)+s(1)(θ,t)G(θ,t)s(0)(θ,t)2dNi(t).

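where $a^{\otimes2}=aa^{\top}$ and

$$A(\theta,t)=E\left[Y_i(t)\begin{pmatrix}1+\theta_ZZ_i^*\\\theta_ZV_i\end{pmatrix}\exp\{(Z_i^*,V_i)\theta\}X_i\right],\quad G(\theta,t)=E\left[Y_i(t)\exp\{(Z_i^*,V_i)\theta\}\theta_ZX_i\right],$$

$$s^{(0)}(\theta,t)=E\left[Y_i(t)\exp\{(Z_i^*,V_i)\theta\}\right],\quad s^{(1)}(\theta,t)=E\left[Y_i(t)\exp\{(Z_i^*,V_i)\theta\}\begin{pmatrix}Z_i^*\\V_i\end{pmatrix}\right].$$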

This variance can be consistently estimated by

Σˆθ=Iˆθ1Jˆθ+n3n2IˆγΣˆγIˆγIˆθ,

where

Iˆθ=n31i=1n3Yi(t)exp(Zˆi,Vi)θˆZˆiVi2Jˆθ=n31i=1n3ΔiZˆiVijYjTiexp(Zˆj,Vj)θˆkYkTiexp(Zˆk,Vk)θˆ(ZˆjVj)2Iˆγ=n31i=1n3ΔiXi0Aˆ(θˆ,Ti)s(0)(θˆ,Ti)+sˆ(1)(θˆ,Ti)Gˆ(θˆ,Ti)sˆ(0)(θˆ,Ti)2Aˆ(θ,t)=n31i=1n3Yit1+θZZˆiθZViexp(Zˆi,Vi)θXi,Gˆ(θ,t)=n31i=1n3Yi(t)exp(Zˆi,Vi)θθZXi,

Σˆγ is a consistent estimator of Σγ and n3/n2C2.

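Proof Denote $U_i(\theta,\gamma)=\int_0^\tau\left[\begin{pmatrix}Z_i\\V_i\end{pmatrix}-\frac{E\left[Y_j(t)\exp\{(Z_j,V_j)\theta\}\begin{pmatrix}Z_j\\V_j\end{pmatrix}\right]}{E\left[Y_k(t)\exp\{(Z_k,V_k)\theta\}\right]}\right]dN_i(t)$, where $Z_i=X_i\gamma$. Then, by the definitions of $I_\theta$, $I_\gamma$ and $J_\theta$ in Lemma 1, we can compute the forms of these terms as stated above. The convergence of $\hat\theta$ and $\hat\gamma$, together with the ULLN and the continuous mapping theorem, ensures that

$$\hat A(\hat\theta,t)=A(\theta^*,t)+o_p(1),\quad\hat G(\hat\theta,t)=G(\theta^*,t)+o_p(1),\quad\hat s^{(0)}(\hat\theta,t)=s^{(0)}(\theta^*,t)+o_p(1),\quad\hat s^{(1)}(\hat\theta,t)=s^{(1)}(\theta^*,t)+o_p(1),$$

which leads to the convergence $\hat I_\theta=I_\theta+o_p(1)$, $\hat I_\gamma=I_\gamma+o_p(1)$ and $\hat J_\theta=J_\theta+o_p(1)$. Applying Lemma 1, we obtain the asymptotic distribution of $\tilde\theta$, the solution of the equation $0=n_3^{-1}\sum_{i=1}^{n_3}U_i(\theta,\hat\gamma)$. Now we just need to show that $\tilde\theta$ and $\hat\theta$ are asymptotically equivalent, which is guaranteed by applying the ULLN to get:

$$n_3^{-1}\sum_{j=1}^{n_3}Y_j(t)\exp\{(\hat Z_j,V_j)\theta\}=s^{(0)}(\theta,t)+o_p(1),\quad n_3^{-1}\sum_{j=1}^{n_3}Y_j(t)\exp\{(\hat Z_j,V_j)\theta\}\begin{pmatrix}\hat Z_j\\V_j\end{pmatrix}=s^{(1)}(\theta,t)+o_p(1).$$

Here we note that for methods 2–4, $Z^*$ has the same expression $E(Z\mid Q,V)$, and thus the $I_\theta$ are the same, although their estimated versions $\hat I_\theta$ are different.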

Now we handle the estimation of γ in the following lemma.

Lemma 3

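Assume $n_2/n_1\to C_1<\infty$ and $\gamma^*$ is the unique solution to $E\,U(\gamma,\beta^*)=0$ for an estimating function $U_i(\gamma,\beta)=(Q_i,V_i)^{\top}\hat X_i-(Q_i,V_i)^{\top}(Q_i,V_i)\gamma$. If $\hat\gamma$ solves the estimating equation $0=n_2^{-1}\sum_{i=1}^{n_2}U_i(\gamma,\hat\beta)$, where $\sqrt{n_1}(\hat\beta-\beta^*)\to N(0,\Sigma_\beta)$, then we have $\sqrt{n_2}(\hat\gamma-\gamma^*)\to N(0,\Sigma_\gamma)$, where

$$\Sigma_\gamma=I_\gamma^{-1}\left(I_\gamma+C_1I_\beta\Sigma_\beta I_\beta^{\top}\right)I_\gamma^{-\top},$$

where $I_\gamma=-E\left\{\frac{\partial U(\gamma,\beta)}{\partial\gamma}\right\}\Big|_{\gamma^*,\beta^*}=E[(Q_i,V_i)^{\top}(Q_i,V_i)]$ and $I_\beta=-E\left\{\frac{\partial U(\gamma,\beta)}{\partial\beta}\right\}\Big|_{\gamma^*,\beta^*}$. This variance can be consistently estimated by

$$\hat\Sigma_\gamma=\hat I_\gamma^{-1}\left(\hat I_\gamma+\frac{n_2}{n_1}\hat I_\beta\hat\Sigma_\beta\hat I_\beta^{\top}\right)\hat I_\gamma^{-\top},$$

where $\hat I_\gamma=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma,\beta)}{\partial\gamma}\Big|_{\hat\gamma,\hat\beta}$, $\hat I_\beta=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma,\beta)}{\partial\beta}\Big|_{\hat\gamma,\hat\beta}$ and $\hat\Sigma_\beta$ is a consistent estimator of $\Sigma_\beta$.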

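Proof We can derive the asymptotic distribution of $\hat\gamma$ as below:

$$0=\sum_{i=1}^{n_2}U_i(\hat\gamma,\hat\beta)=\sum_{i=1}^{n_2}\left\{U_i(\gamma^*,\beta^*)+\frac{\partial U_i}{\partial\gamma}\Big|_{\gamma^*,\beta^*}(\hat\gamma-\gamma^*)+\frac{\partial U_i}{\partial\beta}\Big|_{\gamma^*,\beta^*}(\hat\beta-\beta^*)\right\}+o(\|\hat\gamma-\gamma^*\|,\|\hat\beta-\beta^*\|).$$

By the ULLN,

$$-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma,\beta)}{\partial\gamma}\Big|_{\hat\gamma,\hat\beta}=-E\left\{\frac{\partial U}{\partial\gamma}\right\}\Big|_{\hat\gamma,\hat\beta}+o_p(1)$$

and by the continuous mapping theorem, we have

$$-E\left\{\frac{\partial U}{\partial\gamma}\right\}\Big|_{\hat\gamma,\hat\beta}=-E\left\{\frac{\partial U}{\partial\gamma}\right\}\Big|_{\gamma^*,\beta^*}+o_p(1).$$

So we have

$$-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma,\beta)}{\partial\gamma}\Big|_{\hat\gamma,\hat\beta}=I_\gamma+o_p(1).$$

Similarly, we have

$$-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma,\beta)}{\partial\beta}\Big|_{\hat\gamma,\hat\beta}=I_\beta+o_p(1).$$

Plugging these into the Taylor expansion and noticing that $E\,U(\gamma^*,\beta^*)=0$, we have

$$0=n_2^{-1}\sum_{i=1}^{n_2}\{U_i(\gamma^*,\beta^*)-E\,U(\gamma^*,\beta^*)\}-I_\gamma(\hat\gamma-\gamma^*)-I_\beta(\hat\beta-\beta^*)+o(\|\hat\gamma-\gamma^*\|,\|\hat\beta-\beta^*\|)+o_p(1)\cdot\|\hat\beta-\beta^*\|+o_p(1)\cdot\|\hat\gamma-\gamma^*\|.$$

So we have

$$\sqrt{n_2}(\hat\gamma-\gamma^*)=I_\gamma^{-1}\left\{n_2^{-1/2}\sum_{i=1}^{n_2}U_i(\gamma^*,\beta^*)-I_\beta\sqrt{n_2}(\hat\beta-\beta^*)\right\}+o_p(1).$$

By the CLT, we have

$$n_2^{-1/2}\sum_{i=1}^{n_2}\{U_i(\gamma^*,\beta^*)-E\,U(\gamma^*,\beta^*)\}\to_dN(0,\mathrm{Var}(U_i(\gamma^*,\beta^*))).$$

So we have

$$\Sigma_\gamma=I_\gamma^{-1}\left\{\mathrm{Var}(U(\gamma^*,\beta^*))+C_1I_\beta\Sigma_\beta I_\beta^{\top}\right\}I_\gamma^{-\top}.$$

As we have shown, $\hat I_\gamma=I_\gamma+o_p(1)$ and $\hat I_\beta=I_\beta+o_p(1)$. By the assumptions $\hat\Sigma_\beta=\Sigma_\beta+o_p(1)$ and $n_2/n_1\to C_1<\infty$, using the continuous mapping theorem, we have $\hat\Sigma_\gamma=\Sigma_\gamma+o_p(1)$.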

Applying the general form to each specific regression method, we can obtain the asymptotic results we need.

Lemma 4

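Assume $n_2/n_1\to C_1<\infty$ and $\gamma_1^*$ is the unique solution to $E\,U(\gamma_1,\beta_1^*)=0$ for an estimating function $U(\gamma_1,\beta_1)$. If $\hat\gamma_1$ solves the estimating equation $0=n_2^{-1}\sum_{i=1}^{n_2}U_i(\gamma_1,\hat\beta_1)$, where $\sqrt{n_1}(\hat\beta_1-\beta_1^*)\to N(0,\Sigma_{\beta_1})$, then we have $\sqrt{n_2}(\hat\gamma_1-\gamma_1^*)\to N(0,\Sigma_{\gamma_1})$, where

$$\Sigma_{\gamma_1}=I_{\gamma_1}^{-1}\left(I_{\gamma_1}+C_1I_{\beta_1}\Sigma_{\beta_1}I_{\beta_1}^{\top}\right)I_{\gamma_1}^{-\top},$$

where $I_{\gamma_1}=-E\left\{\frac{\partial U(\gamma_1,\beta_1)}{\partial\gamma_1}\right\}\Big|_{\gamma_1^*,\beta_1^*}$ and $I_{\beta_1}=-E\left\{\frac{\partial U(\gamma_1,\beta_1)}{\partial\beta_1}\right\}\Big|_{\gamma_1^*,\beta_1^*}$. This variance can be consistently estimated by

$$\hat\Sigma_{\gamma_1}=\hat I_{\gamma_1}^{-1}\left(\hat I_{\gamma_1}+\frac{n_2}{n_1}\hat I_{\beta_1}\hat\Sigma_{\beta_1}\hat I_{\beta_1}^{\top}\right)\hat I_{\gamma_1}^{-\top},$$

where $\hat I_{\gamma_1}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_1,\beta_1)}{\partial\gamma_1}\Big|_{\hat\gamma_1,\hat\beta_1}$, $\hat I_{\beta_1}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_1,\beta_1)}{\partial\beta_1}\Big|_{\hat\gamma_1,\hat\beta_1}$ and $\hat\Sigma_{\beta_1}$ is a consistent estimator of $\Sigma_{\beta_1}$.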

Proof

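We need to derive the asymptotic distribution of $\hat\beta_1$ and then apply Lemma 3. The asymptotic distribution of $\hat\beta_1$ can be derived as below:

$$0=\sum_{i=1}^{n_1}U_i(\hat\beta_1)=\sum_{i=1}^{n_1}\left\{U_i(\beta_1^*)+\frac{\partial U_i}{\partial\beta_1}\Big|_{\beta_1^*}(\hat\beta_1-\beta_1^*)\right\}+o(\|\hat\beta_1-\beta_1^*\|),$$

where $U_i(\beta_1^*)=(1,W_i,V_i)^{\top}X_i^*-(1,W_i,V_i)^{\top}(1,W_i,V_i)\beta_1^*$.

By the ULLN,

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_1)}{\partial\beta_1}\Big|_{\hat\beta_1}=-E\left\{\frac{\partial U}{\partial\beta_1}\right\}\Big|_{\hat\beta_1}+o_p(1)$$

and by the continuous mapping theorem, we have

$$-E\left\{\frac{\partial U}{\partial\beta_1}\right\}\Big|_{\hat\beta_1}=-E\left\{\frac{\partial U}{\partial\beta_1}\right\}\Big|_{\beta_1^*}+o_p(1).$$

So we have

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_1)}{\partial\beta_1}\Big|_{\hat\beta_1}=I_{\beta_1}+o_p(1).$$

Plugging this into the Taylor expansion and noticing that $E\,U(\beta_1^*)=0$, we have

$$0=n_1^{-1}\sum_{i=1}^{n_1}\{U_i(\beta_1^*)-E\,U(\beta_1^*)\}-I_{\beta_1}(\hat\beta_1-\beta_1^*)+o_p(1)\cdot\|\hat\beta_1-\beta_1^*\|.$$

So we have

$$\sqrt{n_1}(\hat\beta_1-\beta_1^*)=I_{\beta_1}^{-1}n_1^{-1/2}\sum_{i=1}^{n_1}U_i(\beta_1^*)+o_p(1).$$

By the CLT, we have

$$n_1^{-1/2}\sum_{i=1}^{n_1}\{U_i(\beta_1^*)-E\,U(\beta_1^*)\}\to_dN(0,\mathrm{Var}(U_i(\beta_1^*))).$$

So we have

$$\Sigma_{\beta_1}=I_{\beta_1}^{-1}\mathrm{Var}\{U(\beta_1^*)\}I_{\beta_1}^{-\top}.$$

By assumption, $\hat\Sigma_{\beta_1}=\Sigma_{\beta_1}+o_p(1)$, that is, $\hat\Sigma_{\beta_1}$ is a consistent estimator of $\Sigma_{\beta_1}$.

Then by applying Lemma 3, we have

$$\hat\Sigma_{\gamma_1}=\hat I_{\gamma_1}^{-1}\left(\hat I_{\gamma_1}+\frac{n_2}{n_1}\hat I_{\beta_1}\hat\Sigma_{\beta_1}\hat I_{\beta_1}^{\top}\right)\hat I_{\gamma_1}^{-\top}.$$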

Lemma 5

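Assume $n_2/n_1\to C_1<\infty$ and $\gamma_2^*$ is the unique solution to $E\,U(\gamma_2,\beta_2^*)=0$ for an estimating function $U(\gamma_2,\beta_2)$. If $\hat\gamma_2$ solves the estimating equation $0=n_2^{-1}\sum_{i=1}^{n_2}U_i(\gamma_2,\hat\beta_2)$, where $\sqrt{n_1}(\hat\beta_2-\beta_2^*)\to N(0,\Sigma_{\beta_2})$, then we have $\sqrt{n_2}(\hat\gamma_2-\gamma_2^*)\to N(0,\Sigma_{\gamma_2})$, where

$$\Sigma_{\gamma_2}=I_{\gamma_2}^{-1}\left(I_{\gamma_2}+C_1I_{\beta_2}\Sigma_{\beta_2}I_{\beta_2}^{\top}\right)I_{\gamma_2}^{-\top},$$

where $I_{\gamma_2}=-E\left\{\frac{\partial U(\gamma_2,\beta_2)}{\partial\gamma_2}\right\}\Big|_{\gamma_2^*,\beta_2^*}$ and $I_{\beta_2}=-E\left\{\frac{\partial U(\gamma_2,\beta_2)}{\partial\beta_2}\right\}\Big|_{\gamma_2^*,\beta_2^*}$. This variance can be consistently estimated by

$$\hat\Sigma_{\gamma_2}=\hat I_{\gamma_2}^{-1}\left(\hat I_{\gamma_2}+\frac{n_2}{n_1}\hat I_{\beta_2}\hat\Sigma_{\beta_2}\hat I_{\beta_2}^{\top}\right)\hat I_{\gamma_2}^{-\top},$$

where $\hat I_{\gamma_2}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_2,\beta_2)}{\partial\gamma_2}\Big|_{\hat\gamma_2,\hat\beta_2}$, $\hat I_{\beta_2}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_2,\beta_2)}{\partial\beta_2}\Big|_{\hat\gamma_2,\hat\beta_2}$ and $\hat\Sigma_{\beta_2}$ is a consistent estimator of $\Sigma_{\beta_2}$.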

Proof

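Define $\beta_2=\beta_1/BF$.

First note that,

$$\mathrm{Var}(\tilde X\mid V,W)=\Omega_1=\frac{1}{n_1}\sum_i\{\tilde X_i-(1,W_i,V_i)\beta_1\}^2,\quad\mathrm{Var}(\tilde X\mid V)=\Omega_2=\frac{1}{n_1}\sum_i\{\tilde X_i-(1,V_i)\beta_t\}^2,$$

where we let $\Omega_{1i}=\{\tilde X_i-(1,W_i,V_i)\beta_1\}^2$ and $\Omega_{2i}=\{\tilde X_i-(1,V_i)\beta_t\}^2$.

Second, the estimating equations considered are

$$U_{121i}=(1,W_i,V_i)^{\top}\Omega_1^{-1}\{\tilde X_i-(1,W_i,V_i)\beta_1\},\quad U_{122i}=(1,V_i)^{\top}\Omega_2^{-1}\{\tilde X_i-(1,V_i)\beta_t\},$$

$$U_{123i}=\{\tilde X_i-(1,W_i,V_i)\beta\}^2-\Omega_1,\quad U_{124i}=\{\tilde X_i-(1,V_i)\beta_t\}^2-\Omega_2.$$

Third, we can derive the asymptotic normal distribution for $\beta$, $\beta_t$, $\Omega_1$ and $\Omega_2$ as

$$\sqrt{n_1}\left\{\begin{pmatrix}\hat\beta_1\\\hat\beta_t\\\hat\Omega_1\\\hat\Omega_2\end{pmatrix}-\begin{pmatrix}\beta\\\beta_t\\\Omega_1\\\Omega_2\end{pmatrix}\right\}\to N(0,I^{-1}JI^{-\top}),$$

where $J$ is the variance-covariance matrix of the above four estimating equations and $I$ is a matrix composed of the expectations of the derivatives of each estimating equation with respect to $\beta$, $\beta_t$, $\Omega_1$ and $\Omega_2$, respectively. Specifically,

$$I=\begin{pmatrix}\frac{1}{n_1}\sum_iX_i^{\top}X_i&0&0&0\\0&\frac{1}{n_1}\sum_iX_{ti}^{\top}X_{ti}&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix},$$

where $X_i=(1,W_i,V_i)$ and $X_{ti}=(1,V_i)$.

Fourth, the joint asymptotic normal distribution for $\hat\beta_1$ and $\widehat{BF}$ can be derived using the delta method:

$$\sqrt{n_1}\left\{\begin{pmatrix}\hat\beta\\\widehat{BF}\end{pmatrix}-\begin{pmatrix}\beta\\BF\end{pmatrix}\right\}\to N(0,CI^{-1}JI^{-\top}C^{\top}),$$

where $C$ is a matrix derived by taking the derivatives of $\beta$ and $BF$ with respect to $\beta$, $\beta_t$, $\Omega_1$ and $\Omega_2$, respectively. For example,

$$\frac{\partial BF}{\partial\beta}=0,\quad\frac{\partial BF}{\partial\beta_t}=0,\quad\frac{\partial BF}{\partial\Omega_1}=-\frac{1}{\Omega_2-\tilde\sigma_x^2},\quad\frac{\partial BF}{\partial\Omega_2}=\frac{\Omega_1-\tilde\sigma_x^2}{(\Omega_2-\tilde\sigma_x^2)^2}.$$

Fifth, the asymptotic distribution of $\hat\beta_2=\hat\beta_1/\widehat{BF}$ can be derived using the delta method:

$$\sqrt{n_1}(\hat\beta_2-\beta_2)\to N(0,C'(CI^{-1}JI^{-\top}C^{\top})C'^{\top}),$$

where $C'$ is a matrix derived by taking the derivatives of $\beta_2$ with respect to $\beta$ and $BF$, respectively. That is,

$$\frac{\partial\beta_2}{\partial\beta}=\frac{1}{BF},\quad\frac{\partial\beta_2}{\partial BF}=-\frac{\beta}{BF^2}.$$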
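The delta-method step for the ratio of a coefficient to the bias factor can be illustrated numerically. The following sketch assumes a scalar coefficient and a known joint covariance for the pair (coefficient, bias factor); the function name `ratio_delta_variance` and the input values are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def ratio_delta_variance(beta, bf, cov):
    """First-order (delta-method) variance of beta / bf.

    cov is the 2 x 2 covariance matrix of the estimators of (beta, bf).
    The gradient of beta / bf is (1 / bf, -beta / bf**2).
    """
    grad = np.array([1.0 / bf, -beta / bf**2])
    cov = np.asarray(cov, dtype=float)
    return float(grad @ cov @ grad)

# Hypothetical inputs: beta = 2, BF = 0.5, independent estimation errors.
var_beta2 = ratio_delta_variance(2.0, 0.5, np.diag([0.04, 0.01]))
print(var_beta2)
```

In this made-up example most of the variance of the ratio comes from the uncertainty in the bias factor, because the derivative with respect to the denominator is large when the bias factor is small.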

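Then we have the estimating equation for $\beta_2$ as below:

$$0=\sum_{i=1}^{n_1}U_i(\hat\beta_2)=\sum_{i=1}^{n_1}\left\{U_i(\beta_2^*)+\frac{\partial U_i}{\partial\beta_2}\Big|_{\beta_2^*}(\hat\beta_2-\beta_2^*)\right\}+o(\|\hat\beta_2-\beta_2^*\|).$$

By the ULLN,

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_2)}{\partial\beta_2}\Big|_{\hat\beta_2}=-E\left\{\frac{\partial U}{\partial\beta_2}\right\}\Big|_{\hat\beta_2}+o_p(1)$$

and by the continuous mapping theorem, we have

$$-E\left\{\frac{\partial U}{\partial\beta_2}\right\}\Big|_{\hat\beta_2}=-E\left\{\frac{\partial U}{\partial\beta_2}\right\}\Big|_{\beta_2^*}+o_p(1).$$

So we have

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_2)}{\partial\beta_2}\Big|_{\hat\beta_2}=I_{\beta_2}+o_p(1).$$

Plugging this into the Taylor expansion and noticing that $E\,U(\beta_2^*)=0$, we have

$$0=n_1^{-1}\sum_{i=1}^{n_1}\{U_i(\beta_2^*)-E\,U(\beta_2^*)\}-I_{\beta_2}(\hat\beta_2-\beta_2^*)+o_p(1)\cdot\|\hat\beta_2-\beta_2^*\|.$$

So we have

$$\sqrt{n_1}(\hat\beta_2-\beta_2^*)=I_{\beta_2}^{-1}n_1^{-1/2}\sum_{i=1}^{n_1}U_i(\beta_2^*)+o_p(1).$$

By the CLT, we have

$$n_1^{-1/2}\sum_{i=1}^{n_1}\{U_i(\beta_2^*)-E\,U(\beta_2^*)\}\to_dN(0,\mathrm{Var}(U_i(\beta_2^*))).$$

So we have

$$\Sigma_{\beta_2}=I_{\beta_2}^{-1}\mathrm{Var}\{U(\beta_2^*)\}I_{\beta_2}^{-\top}.$$

By assumption, $\hat\Sigma_{\beta_2}=\Sigma_{\beta_2}+o_p(1)$, that is, $\hat\Sigma_{\beta_2}$ is a consistent estimator of $\Sigma_{\beta_2}$.

Then by applying Lemma 3, we have

$$\hat\Sigma_{\gamma_2}=\hat I_{\gamma_2}^{-1}\left(\hat I_{\gamma_2}+\frac{n_2}{n_1}\hat I_{\beta_2}\hat\Sigma_{\beta_2}\hat I_{\beta_2}^{\top}\right)\hat I_{\gamma_2}^{-\top}.$$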

Lemma 6

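Assume $n_2/n_1\to C_1<\infty$ and $\gamma_3^*$ is the unique solution to $E\,U(\gamma_3,\beta_3^*)=0$ for an estimating function $U(\gamma_3,\beta_3)$. If $\hat\gamma_3$ solves the estimating equation $0=n_2^{-1}\sum_{i=1}^{n_2}U_i(\gamma_3,\hat\beta_3)$, where $\sqrt{n_1}(\hat\beta_3-\beta_3^*)\to N(0,\Sigma_{\beta_3})$, then we have $\sqrt{n_2}(\hat\gamma_3-\gamma_3^*)\to N(0,\Sigma_{\gamma_3})$, where

$$\Sigma_{\gamma_3}=I_{\gamma_3}^{-1}\left(I_{\gamma_3}+C_1I_{\beta_3}\Sigma_{\beta_3}I_{\beta_3}^{\top}\right)I_{\gamma_3}^{-\top},$$

where $I_{\gamma_3}=-E\left\{\frac{\partial U(\gamma_3,\beta_3)}{\partial\gamma_3}\right\}\Big|_{\gamma_3^*,\beta_3^*}$ and $I_{\beta_3}=-E\left\{\frac{\partial U(\gamma_3,\beta_3)}{\partial\beta_3}\right\}\Big|_{\gamma_3^*,\beta_3^*}$. This variance can be consistently estimated by

$$\hat\Sigma_{\gamma_3}=\hat I_{\gamma_3}^{-1}\left(\hat I_{\gamma_3}+\frac{n_2}{n_1}\hat I_{\beta_3}\hat\Sigma_{\beta_3}\hat I_{\beta_3}^{\top}\right)\hat I_{\gamma_3}^{-\top},$$

where $\hat I_{\gamma_3}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_3,\beta_3)}{\partial\gamma_3}\Big|_{\hat\gamma_3,\hat\beta_3}$, $\hat I_{\beta_3}=-n_2^{-1}\sum_{i=1}^{n_2}\frac{\partial U_i(\gamma_3,\beta_3)}{\partial\beta_3}\Big|_{\hat\gamma_3,\hat\beta_3}$ and $\hat\Sigma_{\beta_3}$ is a consistent estimator of $\Sigma_{\beta_3}$.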

Proof

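We can derive the asymptotic distribution of $\hat\beta_3$ as below:

$$0=\sum_{i=1}^{n_1}U_i(\hat\beta_3)=\sum_{i=1}^{n_1}\left\{U_i(\beta_3^*)+\frac{\partial U_i}{\partial\beta_3}\Big|_{\beta_3^*}(\hat\beta_3-\beta_3^*)\right\}+o(\|\hat\beta_3-\beta_3^*\|),$$

where $U_i(\beta_3^*)=(1,W_i,V_i,Q_i)^{\top}\tilde X_i-(1,W_i,V_i,Q_i)^{\top}(1,W_i,V_i,Q_i)\beta_3^*$.

By the ULLN,

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_3)}{\partial\beta_3}\Big|_{\hat\beta_3}=-E\left\{\frac{\partial U}{\partial\beta_3}\right\}\Big|_{\hat\beta_3}+o_p(1)$$

and by the continuous mapping theorem, we have

$$-E\left\{\frac{\partial U}{\partial\beta_3}\right\}\Big|_{\hat\beta_3}=-E\left\{\frac{\partial U}{\partial\beta_3}\right\}\Big|_{\beta_3^*}+o_p(1).$$

So we have

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\beta_3)}{\partial\beta_3}\Big|_{\hat\beta_3}=I_{\beta_3}+o_p(1).$$

Plugging this into the Taylor expansion and noticing that $E\,U(\beta_3^*)=0$, we have

$$0=n_1^{-1}\sum_{i=1}^{n_1}\{U_i(\beta_3^*)-E\,U(\beta_3^*)\}-I_{\beta_3}(\hat\beta_3-\beta_3^*)+o_p(1)\cdot\|\hat\beta_3-\beta_3^*\|.$$

So we have

$$\sqrt{n_1}(\hat\beta_3-\beta_3^*)=I_{\beta_3}^{-1}n_1^{-1/2}\sum_{i=1}^{n_1}U_i(\beta_3^*)+o_p(1).$$

By the CLT, we have

$$\Sigma_{\beta_3}=I_{\beta_3}^{-1}\mathrm{Var}\{U(\beta_3^*)\}I_{\beta_3}^{-\top}.$$

By assumption, $\hat\Sigma_{\beta_3}=\Sigma_{\beta_3}+o_p(1)$, that is, $\hat\Sigma_{\beta_3}$ is a consistent estimator of $\Sigma_{\beta_3}$.

By applying Lemma 3, we have

$$\hat\Sigma_{\gamma_3}=\hat I_{\gamma_3}^{-1}\left(\hat I_{\gamma_3}+\frac{n_2}{n_1}\hat I_{\beta_3}\hat\Sigma_{\beta_3}\hat I_{\beta_3}^{\top}\right)\hat I_{\gamma_3}^{-\top}.$$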

Lemma 7

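Assume $\gamma_4^*$ is the unique solution to $E\,U(\gamma_4)=0$ for an estimating function $U(\gamma_4)$. Solving the estimating equation $0=n_1^{-1}\sum_{i=1}^{n_1}U_i(\hat\gamma_4)$, we have $\sqrt{n_1}(\hat\gamma_4-\gamma_4^*)\to N(0,\Sigma_{\gamma_4})$, where

$$\Sigma_{\gamma_4}=I_{\gamma_4}^{-1}\mathrm{Var}\{U(\gamma_4)\}I_{\gamma_4}^{-\top},$$

where $I_{\gamma_4}=-E\left\{\frac{\partial U(\gamma_4)}{\partial\gamma_4}\right\}\Big|_{\gamma_4^*}$. This variance can be consistently estimated by

$$\hat\Sigma_{\gamma_4}=\hat I_{\gamma_4}^{-1}\widehat{\mathrm{Var}}\{U(\gamma_4)\}\hat I_{\gamma_4}^{-\top},$$

where $\hat I_{\gamma_4}=-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\gamma_4)}{\partial\gamma_4}\Big|_{\hat\gamma_4}$ and $\hat\Sigma_{\gamma_4}$ is a consistent estimator of $\Sigma_{\gamma_4}$.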

Proof

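We derive the asymptotic distribution of $\hat\gamma_4$ for method 4 directly:

$$0=\sum_{i=1}^{n_1}U_i(\hat\gamma_4)=\sum_{i=1}^{n_1}\left\{U_i(\gamma_4^*)+\frac{\partial U_i}{\partial\gamma_4}\Big|_{\gamma_4^*}(\hat\gamma_4-\gamma_4^*)\right\}+o(\|\hat\gamma_4-\gamma_4^*\|),$$

where $U_i(\gamma_4^*)=(1,V_i,Q_i)^{\top}\tilde X_i-(1,V_i,Q_i)^{\top}(1,V_i,Q_i)\gamma_4^*$. By the ULLN,

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\gamma_4)}{\partial\gamma_4}\Big|_{\hat\gamma_4}=-E\left\{\frac{\partial U}{\partial\gamma_4}\right\}\Big|_{\hat\gamma_4}+o_p(1)$$

and by the continuous mapping theorem, we have

$$-E\left\{\frac{\partial U}{\partial\gamma_4}\right\}\Big|_{\hat\gamma_4}=-E\left\{\frac{\partial U}{\partial\gamma_4}\right\}\Big|_{\gamma_4^*}+o_p(1).$$

So we have

$$-n_1^{-1}\sum_{i=1}^{n_1}\frac{\partial U_i(\gamma_4)}{\partial\gamma_4}\Big|_{\hat\gamma_4}=I_{\gamma_4}+o_p(1).$$

Plugging this into the Taylor expansion and noticing that $E\,U(\gamma_4^*)=0$, we have

$$0=n_1^{-1}\sum_{i=1}^{n_1}\{U_i(\gamma_4^*)-E\,U(\gamma_4^*)\}-I_{\gamma_4}(\hat\gamma_4-\gamma_4^*)+o_p(1)\cdot\|\hat\gamma_4-\gamma_4^*\|.$$

So we have

$$\sqrt{n_1}(\hat\gamma_4-\gamma_4^*)=I_{\gamma_4}^{-1}n_1^{-1/2}\sum_{i=1}^{n_1}U_i(\gamma_4^*)+o_p(1).$$

By the CLT, we have

$$n_1^{-1/2}\sum_{i=1}^{n_1}\{U_i(\gamma_4^*)-E\,U(\gamma_4^*)\}\to_dN(0,\mathrm{Var}(U_i(\gamma_4^*))).$$

So we have

$$\Sigma_{\gamma_4}=I_{\gamma_4}^{-1}\mathrm{Var}\{U(\gamma_4)\}I_{\gamma_4}^{-\top}.$$

By assumption, $\hat\Sigma_{\gamma_4}=\Sigma_{\gamma_4}+o_p(1)$, that is, $\hat\Sigma_{\gamma_4}$ is a consistent estimator of $\Sigma_{\gamma_4}$.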

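Combining the different steps, we obtain the final asymptotic result.

Theorem 1

Asymptotic distributions for methods 1–4 under the Cox model.

With $n_3/n_2\to C_2$ and $n_2/n_1\to C_1$, we have $\sqrt{n_3}(\hat\theta_1-\theta_1^*)\to N(0,\Sigma_{\theta_1})$, where $\Sigma_{\theta_1}=I_{\theta_1}^{-1}(I_{\theta_1}+C_2I_{\gamma_1}\Sigma_{\gamma_1}I_{\gamma_1}^{\top})I_{\theta_1}^{-\top}$ can be consistently estimated by $\hat\Sigma_{\theta_1}=\hat I_{\theta_1}^{-1}(\hat I_{\theta_1}+\frac{n_3}{n_2}\hat I_{\gamma_1}\hat\Sigma_{\gamma_1}\hat I_{\gamma_1}^{\top})\hat I_{\theta_1}^{-\top}$ for method 1.

With $n_3/n_2\to C_2$ and $n_2/n_1\to C_1$, we have $\sqrt{n_3}(\hat\theta_2-\theta_2^*)\to N(0,\Sigma_{\theta_2})$, where $\Sigma_{\theta_2}=I_{\theta_2}^{-1}(I_{\theta_2}+C_2I_{\gamma_2}\Sigma_{\gamma_2}I_{\gamma_2}^{\top})I_{\theta_2}^{-\top}$ can be consistently estimated by $\hat\Sigma_{\theta_2}=\hat I_{\theta_2}^{-1}(\hat I_{\theta_2}+\frac{n_3}{n_2}\hat I_{\gamma_2}\hat\Sigma_{\gamma_2}\hat I_{\gamma_2}^{\top})\hat I_{\theta_2}^{-\top}$ for method 2.

With $n_3/n_2\to C_2$ and $n_2/n_1\to C_1$, we have $\sqrt{n_3}(\hat\theta_3-\theta_3^*)\to N(0,\Sigma_{\theta_3})$, where $\Sigma_{\theta_3}=I_{\theta_3}^{-1}(I_{\theta_3}+C_2I_{\gamma_3}\Sigma_{\gamma_3}I_{\gamma_3}^{\top})I_{\theta_3}^{-\top}$ can be consistently estimated by $\hat\Sigma_{\theta_3}=\hat I_{\theta_3}^{-1}(\hat I_{\theta_3}+\frac{n_3}{n_2}\hat I_{\gamma_3}\hat\Sigma_{\gamma_3}\hat I_{\gamma_3}^{\top})\hat I_{\theta_3}^{-\top}$ for method 3.

With $n_3/n_2\to C_2$ and $n_2/n_1\to C_1$, we have $\sqrt{n_3}(\hat\theta_4-\theta_4^*)\to N(0,\Sigma_{\theta_4})$, where $\Sigma_{\theta_4}=I_{\theta_4}^{-1}(I_{\theta_4}+C_2I_{\gamma_4}\Sigma_{\gamma_4}I_{\gamma_4}^{\top})I_{\theta_4}^{-\top}$ can be consistently estimated by $\hat\Sigma_{\theta_4}=\hat I_{\theta_4}^{-1}(\hat I_{\theta_4}+\frac{n_3}{n_2}\hat I_{\gamma_4}\hat\Sigma_{\gamma_4}\hat I_{\gamma_4}^{\top})\hat I_{\theta_4}^{-\top}$ for method 4.

Proof By applying Lemma 2 and Lemma 4, the asymptotic variance $\Sigma_{\theta_1}$ can be estimated by $\hat\Sigma_{\theta_1}=\hat I_{\theta_1}^{-1}(\hat I_{\theta_1}+\frac{n_3}{n_2}\hat I_{\gamma_1}\hat\Sigma_{\gamma_1}\hat I_{\gamma_1}^{\top})\hat I_{\theta_1}^{-\top}$. By applying Lemma 2 and Lemma 5, the asymptotic variance $\Sigma_{\theta_2}$ can be estimated by $\hat\Sigma_{\theta_2}=\hat I_{\theta_2}^{-1}(\hat I_{\theta_2}+\frac{n_3}{n_2}\hat I_{\gamma_2}\hat\Sigma_{\gamma_2}\hat I_{\gamma_2}^{\top})\hat I_{\theta_2}^{-\top}$. By applying Lemma 2 and Lemma 6, the asymptotic variance $\Sigma_{\theta_3}$ can be estimated by $\hat\Sigma_{\theta_3}=\hat I_{\theta_3}^{-1}(\hat I_{\theta_3}+\frac{n_3}{n_2}\hat I_{\gamma_3}\hat\Sigma_{\gamma_3}\hat I_{\gamma_3}^{\top})\hat I_{\theta_3}^{-\top}$. By applying Lemma 2 and Lemma 7, the asymptotic variance $\Sigma_{\theta_4}$ can be estimated by $\hat\Sigma_{\theta_4}=\hat I_{\theta_4}^{-1}(\hat I_{\theta_4}+\frac{n_3}{n_2}\hat I_{\gamma_4}\hat\Sigma_{\gamma_4}\hat I_{\gamma_4}^{\top})\hat I_{\theta_4}^{-\top}$.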

A.3: Efficiency Comparison

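In this part, we heuristically compare the efficiency of method 3 and method 4 by comparing the average variation in the estimation of $Z$. For simplicity, we assume without loss of generality that all variables are centered, and we only compare the efficiency in estimating $\hat Z$ because the variance of $\hat\theta$ is a monotone function of the variance of $\hat Z$. We compare the expected variance under a fixed design. For method 4,

$$E[\mathrm{Var}(\hat Z_4\mid Q,V)]=E\left[n_1^{-1}(1,Q,V)\{(1,Q,V)^{\top}(1,Q,V)\}^{-1}(1,Q,V)^{\top}\mathrm{Var}(X\mid Q,V)\right],$$

and for method 3,

$$\begin{aligned}E[\mathrm{Var}(\hat Z_3\mid Q,V)]&=E\Big[n_2^{-1}(1,Q,V)\{(1,Q,V)^{\top}(1,Q,V)\}^{-1}(1,Q,V)^{\top}\mathrm{Var}(X\mid Q,V)\\&\qquad+(n_1^{-1}-n_2^{-1})(1,Q,V)\{(1,Q,V)^{\top}(1,Q,V)\}^{-1}(1,Q,V)^{\top}\mathrm{Var}(X\mid Q,V,W)\Big]\\&=E\left[n_1^{-1}(1,Q,V)\{(1,Q,V)^{\top}(1,Q,V)\}^{-1}(1,Q,V)^{\top}\mathrm{Var}(X\mid Q,V)\right]\times\left\{\frac{n_1}{n_2}+\left(1-\frac{n_1}{n_2}\right)\left(1-R^2_{(X,W)|Q,V}\right)\right\}.\end{aligned}$$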
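The multiplicative factor relating the two expected variances is easy to evaluate for given sample sizes. The short sketch below computes it; the function name `method3_variance_factor` and the input values are hypothetical, and the partial $R^2$ is treated as a known constant purely for illustration.

```python
def method3_variance_factor(n1, n2, partial_r2):
    """Factor n1/n2 + (1 - n1/n2) * (1 - partial_r2) from the efficiency comparison.

    A value below 1 means the expected variance for method 3 is smaller than
    the expected variance for method 4 in this heuristic comparison.
    """
    if not 0 < n1 <= n2:
        raise ValueError("expected 0 < n1 <= n2")
    if not 0.0 <= partial_r2 <= 1.0:
        raise ValueError("partial_r2 must lie in [0, 1]")
    frac = n1 / n2
    return frac + (1.0 - frac) * (1.0 - partial_r2)

# Hypothetical inputs: n1 = 150, n2 = 300 and a partial R^2 of 0.4.
factor = method3_variance_factor(150, 300, 0.4)
print(factor)
```

The factor equals 1 when the biomarker adds nothing beyond $(Q,V)$ or when $n_1=n_2$, and it decreases toward $n_1/n_2$ as the partial $R^2$ approaches 1.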

Appendix B: Details of Simulation Settings and Additional Results

B.1: Details of Simulation Settings

For each setting, we evaluate the strength of the biomarker and of the self-reported data, as well as the observed prediction strength at each stage. Specifically, we consider the following quantities: the coefficient and partial coefficient of multiple determination on long-term dietary intake by the biomarker, $R^2_{ZWV}=1-\frac{\mathrm{Var}(Z\mid W,V)}{\mathrm{Var}(Z)}$ and $R^2_{ZW|V}=1-\frac{\mathrm{Var}(Z\mid W,V)}{\mathrm{Var}(Z\mid V)}$; the coefficient and partial coefficient of multiple determination on long-term dietary intake by self-reported data, $R^2_{ZQV}=1-\frac{\mathrm{Var}(Z\mid Q,V)}{\mathrm{Var}(Z)}$ and $R^2_{ZQ|V}=1-\frac{\mathrm{Var}(Z\mid Q,V)}{\mathrm{Var}(Z\mid V)}$; the coefficient and partial coefficient of multiple determination on long-term dietary intake by self-reported data and the biomarker, $R^2_{ZQWV}=1-\frac{\mathrm{Var}(Z\mid W,Q,V)}{\mathrm{Var}(Z)}$ and $R^2_{ZQW|V}=1-\frac{\mathrm{Var}(Z\mid W,Q,V)}{\mathrm{Var}(Z\mid V)}$; the coefficient and partial coefficient of multiple determination on consumed dietary intake by the biomarker, $R^2_{\tilde XWV}=1-\frac{\mathrm{Var}(\tilde X\mid W,V)}{\mathrm{Var}(\tilde X)}$ and $R^2_{\tilde XW|V}=1-\frac{\mathrm{Var}(\tilde X\mid W,V)}{\mathrm{Var}(\tilde X\mid V)}$; the coefficient and partial coefficient of multiple determination on consumed dietary intake by the biomarker and self-reported data, $R^2_{\tilde XWQV}=1-\frac{\mathrm{Var}(\tilde X\mid W,Q,V)}{\mathrm{Var}(\tilde X)}$ and $R^2_{\tilde XWQ|V}=1-\frac{\mathrm{Var}(\tilde X\mid W,Q,V)}{\mathrm{Var}(\tilde X\mid V)}$; and the coefficient and partial coefficient of multiple determination on estimated dietary intake with Method 2 by self-reported data, $R^2_{\hat X_2QV}=1-\frac{\mathrm{Var}(\hat X_2\mid Q,V)}{\mathrm{Var}(\hat X_2)}$ and $R^2_{\hat X_2Q|V}=1-\frac{\mathrm{Var}(\hat X_2\mid Q,V)}{\mathrm{Var}(\hat X_2\mid V)}$.
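To make the distinction between the coefficient and the partial coefficient of multiple determination concrete, the sketch below estimates both from simulated data using residual variances from linear regressions. The helper names and the data-generating values are assumptions made for illustration and do not reproduce the simulation settings of this appendix.

```python
import numpy as np

def residual_variance(y, *covariates):
    """Variance of the residuals from a least-squares fit of y on an intercept and covariates."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.var(y - design @ coef)

def r2_and_partial_r2(y, predictors, adjusters):
    """Return (R^2, partial R^2) of y on predictors, with adjusters in both models.

    R^2         = 1 - Var(y | predictors, adjusters) / Var(y)
    partial R^2 = 1 - Var(y | predictors, adjusters) / Var(y | adjusters)
    """
    var_full = residual_variance(y, *predictors, *adjusters)
    var_adjusted = residual_variance(y, *adjusters)
    return 1.0 - var_full / np.var(y), 1.0 - var_full / var_adjusted

# Illustrative data (assumed values): V is a covariate, Z the intake, W the biomarker.
rng = np.random.default_rng(1)
n = 100_000
V = rng.normal(size=n)
Z = 0.6 * V + rng.normal(size=n)
W = 1.3 * Z + rng.normal(size=n)

r2, partial_r2 = r2_and_partial_r2(Z, [W], [V])
print(f"R^2 = {r2:.3f}, partial R^2 = {partial_r2:.3f}")
```

Because conditioning on $V$ can only reduce the denominator, the partial coefficient is never larger than the unadjusted one, and the two coincide when the adjuster explains none of the variance of the outcome.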

In settings 1, 2 and 3, we fix the effect on $Q$ by setting $a_0=4$, $a_1=1.5$ and $\sigma_q=3$. In settings 4, 5 and 6, we set $a_0=0.4$, $a_1=2$ and $\sigma_q=4$. In addition, we decrease the coefficient of $X$ on $W$ from 1.3 to 0.8 in the first three settings, while we decrease it from 1.1 to 0.5 in the last three settings. Table 5 displays all the types of $R^2$ mentioned above for all six settings. To be more specific, by fixing the strength of the self-reported data and the correlation between true dietary intake and personal characteristic, we gradually decrease the strength of the biomarker in the first three settings. In the last three settings, the correlation between true dietary intake and personal characteristic is set to 0. The strength of the self-reported data is again fixed, but at a different level from the first three settings, and we again gradually decrease the strength of the biomarker. Below is the list of the six settings with varying parameters.

Table 5:

List of $R^2$ values and censoring rates (CR) under mixed censoring (Mix), uniform censoring (Unif) and exponential censoring (Exp) among different measurements

Quantity Setting 1 Setting 2 Setting 3 Setting 4 Setting 5 Setting 6
$R^2_{ZWV}$ 0.68 0.64 0.55 0.53 0.38 0.19
$R^2_{ZW|V}$ 0.49 0.41 0.28 0.53 0.38 0.19
$R^2_{ZQV}$ 0.46 0.46 0.46 0.20 0.20 0.20
$R^2_{ZQ|V}$ 0.13 0.13 0.13 0.20 0.20 0.20
$R^2_{ZQWV}$ 0.71 0.67 0.60 0.58 0.46 0.33
$R^2_{ZQW|V}$ 0.53 0.46 0.35 0.58 0.46 0.33
$R^2_{\tilde XWV}$ 0.56 0.52 0.44 0.44 0.32 0.16
$R^2_{\tilde XW|V}$ 0.38 0.32 0.21 0.44 0.32 0.16
$R^2_{\tilde XWQV}$ 0.58 0.54 0.48 0.48 0.38 0.26
$R^2_{\tilde XWQ|V}$ 0.40 0.35 0.26 0.48 0.38 0.26
$R^2_{\hat X_2QV}$ 0.59 0.92 0.98 0.11 0.08 0.04
$R^2_{\hat X_2Q|V}$ 0.07 0.06 0.04 0.11 0.08 0.04
CR ($\lambda_0(t)=0.002$, Mix) 94% 94% 94% 95% 95% 95%
CR ($\lambda_0(t)=0.002$, Unif) 97% 97% 97% 97% 97% 97%
CR ($\lambda_0(t)=0.002$, Exp) 94% 94% 94% 95% 95% 95%
CR ($\lambda_0(t)=0.004$, Mix) 89% 89% 89% 90% 90% 90%
CR ($\lambda_0(t)=0.004$, Unif) 94% 94% 94% 95% 95% 95%
CR ($\lambda_0(t)=0.004$, Exp) 89% 89% 89% 90% 90% 90%
CR ($\lambda_0(t)=0.0004t$, Mix) 94% 94% 94% 95% 95% 95%
CR ($\lambda_0(t)=0.0004t$, Unif) 91% 91% 91% 92% 92% 92%
CR ($\lambda_0(t)=0.0004t$, Exp) 76% 76% 76% 77% 77% 77%

b1=1.3,ρ=0.6,a0=4,a1=1.5,σq=3 (Setting 1);

b1=1.1,ρ=0.6,a0=4,a1=1.5,σq=3 (Setting 2);

b1=0.8,ρ=0.6,a0=4,a1=1.5,σq=3 (Setting 3);

b1=1.1,ρ=0,a0=0.4,a1=2,σq=4 (Setting 4);

b1=1.1,ρ=0,a0=0.4,a1=2,σq=4 (Setting 5);

b1=0.5,ρ=0,a0=0.4,a1=2,σq=4 (Setting 6); The values of R2s are listed in table 5.

B.2: Additional Simulation Results

To better compare the robustness of these methods, we further conduct simulations under the setting where the relationship between the self-reported dietary intake and the short-term dietary intake differs before and after the controlled feeding study. Table 6 summarizes the simulation results when the correlation structures between the FFQ and the true dietary intakes differ between the controlled feeding study and the full cohort, under settings 1 and 4. With unequal correlations between the controlled feeding study and the full cohort, method 2 with the BF added is more robust in controlling bias than method 3 and method 4. As the difference in the association between $Q$ and $Z$ increases from 10% to 50%, the performance of both method 3 and method 4 worsens (larger bias), while the bias of the estimator from method 2 remains small in most cases. In addition, in most settings, we observe that method 2 has a smaller SD than methods 3–4. The performance of method 2 is adequate even when the partial $R^2$s (i.e., Setting 3: $R^2_{\tilde XW|V}=0.21$; Setting 6: $R^2_{\tilde XW|V}=0.16$) from the biomarker construction step and the calibration equation building step are both low, which suggests that in real applications of method 2, one may not need to be too stringent on the threshold for the partial $R^2$.

Table 6:

Simulation results comparing different methods’ performance when the correlation structures between the short term dietary intake and true dietary intakes of the controlled feeding study and the full cohort are different when λ0(t)=0.002.

Censor Setting Diff Method (n1,n2,n3)=(150,300,5150) (n1,n2,n3)=(300,600,10300) (n1,n2,n3)=(600,1200,20600)

Bias SE SD CR Bias SE SD CR Bias SE SD CR

Mix 1 0.1 1 0.48 0.56 0.56 0.97 0.40 0.36 0.37 0.87 0.39 0.34 0.33 0.82
0.1 2 0.05 0.29 0.29 0.97 0.01 0.19 0.19 0.96 0.01 0.18 0.17 0.96
0.1 3 0.04 0.99 1.2 0.97 0.03 0.20 0.20 0.97 0.03 0.18 0.18 0.97
0.1 4 0.10 0.35 0.36 0.98 0.06 0.22 0.24 0.97 0.05 0.20 0.20 0.97

0.3 1 0.48 0.56 0.56 0.97 0.40 0.36 0.37 0.87 0.39 0.34 0.33 0.82
0.3 2 0.05 0.29 0.29 0.97 0.01 0.19 0.19 0.96 0.01 0.18 0.17 0.96
0.3 3 0.01 9.06 3.7 0.99 0.08 0.23 0.24 0.98 0.07 0.21 0.20 0.97
0.3 4 0.25 0.66 0.68 0.99 0.18 0.33 0.37 0.99 0.16 0.26 0.26 0.97

0.5 1 0.48 0.56 0.56 0.97 0.40 0.36 0.37 0.87 0.39 0.34 0.33 0.82
0.5 2 0.05 0.29 0.29 0.97 0.01 0.19 0.19 0.96 0.01 0.18 0.17 0.96
0.5 3 0.28 3.03 1.9 0.99 0.15 0.27 0.29 0.98 0.13 0.24 0.23 0.96
0.5 4 0.60 12.14 4.63 1.00 0.37 17.9 7.29 1.00 0.40 0.43 0.47 0.97

4 0.1 1 0.38 0.35 0.34 0.91 0.35 0.24 0.24 0.75 0.35 0.22 0.22 0.67
0.1 2 0.03 0.19 0.19 0.96 0.01 0.13 0.13 0.96 0.01 0.12 0.12 0.96
0.1 3 0.04 0.20 0.21 0.98 0.02 0.13 0.14 0.96 0.02 0.12 0.12 0.96
0.1 4 0.06 0.22 0.22 0.98 0.04 0.14 0.15 0.97 0.04 0.13 0.13 0.95

0.3 1 0.38 0.35 0.34 0.91 0.35 0.24 0.24 0.75 0.35 0.22 0.22 0.67
0.3 2 0.03 0.19 0.19 0.96 0.01 0.13 0.13 0.96 0.01 0.12 0.12 0.96
0.3 3 0.08 0.23 0.27 0.99 0.06 0.15 0.15 0.97 0.06 0.14 0.14 0.95
0.3 4 0.18 0.37 0.52 1.00 0.14 0.19 0.20 0.97 0.14 0.17 0.17 0.93

0.5 1 0.38 0.35 0.34 0.91 0.35 0.24 0.24 0.75 0.35 0.22 0.22 0.67
0.5 2 0.03 0.19 0.19 0.96 0.01 0.13 0.13 0.96 0.01 0.12 0.12 0.96
0.5 3 0.16 0.35 0.57 0.99 0.11 0.17 0.18 0.97 0.11 0.16 0.16 0.93
0.5 4 0.47 1.43 1.36 1.00 −0.26 7.08 23.52 0.98 0.33 0.26 0.26 0.87

Unif 1 0.1 1 0.46 0.66 0.67 0.97 0.41 0.43 0.45 0.89 0.41 0.41 0.40 0.85
0.1 2 0.04 0.34 0.34 0.97 0.02 0.22 0.23 0.96 0.02 0.21 0.21 0.97
0.1 3 0.02 1.29 1.63 0.98 0.04 0.23 0.24 0.96 0.04 0.22 0.22 0.96
0.1 4 0.09 0.41 0.43 0.98 0.07 0.26 0.28 0.97 0.06 0.23 0.23 0.96

0.3 1 0.46 0.66 0.67 0.97 0.41 0.43 0.45 0.89 0.41 0.41 0.40 0.85
0.3 2 0.04 0.34 0.34 0.97 0.02 0.22 0.23 0.96 0.02 0.21 0.21 0.97
0.3 3 0.01 8.55 3.47 0.98 0.08 0.27 0.28 0.97 0.08 0.25 0.24 0.96
0.3 4 0.24 0.74 0.79 0.99 0.19 0.37 0.42 0.98 0.17 0.30 0.30 0.96

0.5 1 0.46 0.66 0.67 0.97 0.41 0.43 0.45 0.89 0.41 0.41 0.40 0.85
0.5 2 0.04 0.34 0.34 0.97 0.02 0.22 0.23 0.96 0.02 0.21 0.21 0.97
0.5 3 0.28 3.3 2.05 0.99 0.15 0.32 0.34 0.97 0.14 0.28 0.28 0.96
0.5 4 0.64 10.89 4.01 0.99 0.31 23.52 9.65 0.99 0.42 0.50 0.52 0.97

4 0.1 1 0.37 0.42 0.40 0.94 0.41 0.43 0.45 0.89 0.35 0.27 0.27 0.77
0.1 2 0.02 0.23 0.22 0.97 0.01 0.15 0.15 0.97 0.01 0.15 0.15 0.96
0.1 3 0.03 0.23 0.24 0.97 0.02 0.16 0.16 0.96 0.02 0.15 0.15 0.95
0.1 4 0.05 0.25 0.25 0.97 0.04 0.17 0.17 0.97 0.04 0.16 0.16 0.95

0.3 1 0.37 0.42 0.40 0.94 0.41 0.43 0.45 0.89 0.35 0.27 0.27 0.77
0.3 2 0.02 0.23 0.22 0.97 0.01 0.15 0.15 0.97 0.01 0.15 0.15 0.96
0.3 3 0.08 0.27 0.31 0.98 0.06 0.17 0.17 0.96 0.06 0.16 0.16 0.95
0.3 4 0.18 0.42 0.60 0.99 0.14 0.22 0.23 0.96 0.13 0.20 0.20 0.94

0.5 1 0.37 0.42 0.40 0.94 0.41 0.43 0.45 0.89 0.35 0.27 0.27 0.77
0.5 2 0.02 0.23 0.22 0.97 0.01 0.15 0.15 0.97 0.01 0.15 0.15 0.96
0.5 3 0.15 0.40 0.66 0.98 0.11 0.20 0.20 0.97 0.11 0.19 0.19 0.94
0.5 4 0.44 1.16 0.99 0.99 −0.23 4.78 21.23 0.98 0.32 0.29 0.30 0.91

Exp 1 0.1 1 0.44 0.51 0.52 0.95 0.41 0.33 0.35 0.83 0.39 0.30 0.30 0.79
0.1 2 0.03 0.26 0.27 0.97 0.01 0.17 0.18 0.96 0.01 0.16 0.16 0.96
0.1 3 0.02 0.95 1.18 0.97 0.03 0.18 0.19 0.97 0.02 0.17 0.16 0.96
0.1 4 0.08 0.32 0.33 0.97 0.06 0.21 0.24 0.96 0.05 0.18 0.18 0.97

0.3 1 0.44 0.51 0.52 0.97 0.41 0.33 0.35 0.83 0.39 0.30 0.30 0.79
0.3 2 0.03 0.26 0.27 0.97 0.01 0.17 0.18 0.96 0.01 0.16 0.16 0.96
0.3 3 −0.03 10.60 4.36 0.98 0.08 0.21 0.23 0.97 0.07 0.19 0.18 0.97
0.3 4 0.03 0.61 0.63 0.98 0.18 0.31 0.37 0.99 0.16 0.24 0.25 0.97

0.5 1 0.44 0.51 0.52 0.97 0.41 0.33 0.35 0.83 0.39 0.30 0.30 0.79
0.5 2 0.03 0.26 0.27 0.97 0.01 0.17 0.18 0.96 0.01 0.16 0.16 0.96
0.5 3 0.23 2.42 1.47 0.98 0.15 0.26 0.29 0.98 0.13 0.22 0.21 0.96
0.5 4 0.63 9.78 3.56 0.99 0.42 13.46 5.43 1.00 0.40 0.42 0.49 0.98

4 0.1 1 0.37 0.32 0.31 0.91 0.34 0.22 0.23 0.69 0.34 0.20 0.20 0.63
0.1 2 0.02 0.18 0.17 0.97 0.00 0.12 0.12 0.96 0.00 0.11 0.11 0.96
0.1 3 0.03 0.18 0.19 0.97 0.02 0.12 0.12 0.96 0.02 0.11 0.11 0.95
0.1 4 0.05 0.20 0.20 0.97 0.04 0.13 0.14 0.97 0.03 0.12 0.12 0.95

0.3 1 0.37 0.32 0.31 0.91 0.34 0.22 0.23 0.69 0.34 0.20 0.20 0.63
0.3 2 0.02 0.18 0.17 0.97 0.00 0.12 0.12 0.96 0.00 0.11 0.11 0.96
0.3 3 0.08 0.21 0.25 0.97 0.06 0.14 0.14 0.97 0.05 0.12 0.12 0.94
0.3 4 0.17 0.35 0.50 0.99 0.14 0.18 0.18 0.98 0.12 0.15 0.15 0.93

0.5 1 0.37 0.32 0.31 0.91 0.34 0.22 0.23 0.69 0.34 0.20 0.20 0.63
0.5 2 0.02 0.18 0.17 0.97 0.00 0.12 0.12 0.96 0.00 0.11 0.11 0.96
0.5 3 0.15 0.33 0.55 0.98 0.11 0.16 0.17 0.97 0.10 0.14 0.14 0.93
0.5 4 0.46 1.53 1.53 1.00 −0.26 7.08 23.52 0.98 0.31 0.24 0.24 0.88

B.3: Estimation of $\tilde\sigma_x$

Here we discuss an alternative method to estimate $\tilde\sigma_x$ when more than one self-reported measurement exists (denoted as $Q_1$ and $Q_2$, for example) and only one of them is available in the first-stage sample. In this scenario, method 3 is limited to the information in $Q_1$ and thus might be less efficient than method 2 using $Q_2$ when $Q_2$ has a stronger association with $Z$ than $Q_1$. Specifically, we can first obtain the estimators $\hat\theta_1$ and $\hat\theta_3$ using $Q_1$ only and then solve the equation $\hat\theta_1=\hat\theta_3/BF$ to obtain $\hat{\tilde\sigma}_x$ from

$$\hat{\tilde\sigma}_x=\frac{\hat\theta_1}{\hat\theta_3}\left\{\mathrm{Var}(\tilde X\mid W,V)-\mathrm{Var}(\tilde X\mid V)\left(1-\frac{\hat\theta_3}{\hat\theta_1}\right)\right\}.$$

Then we can rerun method 2 with $Q_2$ and $\hat{\tilde\sigma}_x$ to get a better estimate $\tilde\theta_2$. The bootstrap can be used to handle the additional variation due to the estimation of $\hat{\tilde\sigma}_x$.
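The closed-form solution above can be checked with a small round-trip calculation. The sketch below rests on an assumption that is inferred rather than quoted: that the bias factor has the form $BF=1-(\Omega_1-s)/(\Omega_2-s)$, where $\Omega_1=\mathrm{Var}(\tilde X\mid W,V)$, $\Omega_2=\mathrm{Var}(\tilde X\mid V)$ and $s$ stands for the noise term estimated in this subsection, treated here as a variance-scale quantity. The function names and numbers are hypothetical.

```python
def solve_noise_term(theta1_hat, theta3_hat, omega1, omega2):
    """Noise term solving theta1_hat = theta3_hat / BF under the assumed form of BF."""
    ratio = theta3_hat / theta1_hat          # this ratio plays the role of BF
    return (omega1 - omega2 * (1.0 - ratio)) / ratio

def assumed_bias_factor(omega1, omega2, noise):
    """Assumed form BF = 1 - (omega1 - noise) / (omega2 - noise); an inference, not a quoted definition."""
    return 1.0 - (omega1 - noise) / (omega2 - noise)

# Hypothetical inputs: start from a known noise term, build the implied estimates,
# then recover the noise term from them.
omega1, omega2, true_noise = 0.5, 1.0, 0.2
bf = assumed_bias_factor(omega1, omega2, true_noise)   # 0.625
theta1_hat = 0.8
theta3_hat = bf * theta1_hat                            # consistent with theta1 = theta3 / BF
recovered = solve_noise_term(theta1_hat, theta3_hat, omega1, omega2)
print(bf, recovered)
```

In practice the inputs would be estimates, so the recovered value is itself noisy, which is why a resampling scheme such as the bootstrap is a natural way to carry that uncertainty forward.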

To illustrate this, we perform additional simulations under Settings 1 and 4. We modify the $\sigma_q$s for $Q_1$ and $Q_2$ while keeping the other parameters unchanged. Table 8 displays the bias and SD, together with the corresponding $R^2_{ZQV}$ and partial $R^2_{ZQ|V}$, for methods 2 and 3 under different $\sigma_q$. Based on Table 8, we can see that Method 2 using $Q_2$ and $\hat{\tilde\sigma}_x$ is more efficient (smaller SD) than Method 3 in estimating the association parameter when the strength of $Q_2$ is large enough.

Table 8:

Simulation results comparing efficiency between method 2 and method 3 with σ˜^x used for method 2 when λ0(t)=0.002.

(n1,n2,n3)=(150,300,5150) (n1,n2,n3)=(300,600,10300)

Setting (σq1,σq2) Method RZQV2 RZQ|V2 Bias SD Bias SD

1 (2,3) 2 0.54 0.26 0.00 0.141 0.01 0.121
3 0.46 0.13 0.06 0.229 0.01 0.132

4 (2,5,4) 2 0.38 0.38 −0.01 0.092 0.00 0.079
3 0.19 0.19 0.02 0.115 0.00 0.089

Footnotes

Competing interests

All authors declare that there are no relevant financial or non-financial competing interests to report.

Contributor Information

Cheng Zheng, Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, 68198, USA.

Yiwen Zhang, Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, 53205, USA.

Ying Huang, Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Ross Prentice, Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Availability of data and material

The data that support the findings of this study are available from the Women’s Health Initiative (WHI) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request in a collaborative mode and with permission of Women’s Health Initiative (WHI).

References

  1. Adams KF, Schatzkin A, Harris TB, Kipnis V, Morris T, & Ballard-Barbash R 2006. Overweight, obesity and mortality in a large prospective cohort of persons 50 to 71 years old. New England Journal of Medicine, 355, 763–778. [DOI] [PubMed] [Google Scholar]
  2. Bartlett JW, & Keogh RH 2018. Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration. Statistical Methods in Medical Research, 27, 1695–1708. [DOI] [PubMed] [Google Scholar]
  3. Belanger CF, Hennekens CH, Rosner B, & Speizer FE 1978. The nurses’ health study. American Journal of Nursing, 78, 1039–1040. [PubMed] [Google Scholar]
  4. Carroll RJ, Ruppert D, Stefanski LA, & Crainiceanu CM 2006. Measurement error in nonlinear models: a modern perspective. US: CRC Press. [Google Scholar]
  5. Freedman LS, Schatzkin A, Midthune D, & Kipnis V 2011. Dealing with dietary measurement error in nutritional cohort studies. Journal of the National Cancer Institute, 103, 1086–1092. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Hu C, & Lin DY 2002. Cox regression with covariate measurement error. Scandinavian Journal of Statistics, 29, 637–655. [Google Scholar]
  7. Huang Y, & Wang CY 2000. Cox regression with accurate covariate unascertainable: A nonparametric-correction approach. Journal of the American Statistical Association, 45, 1209–1219. [Google Scholar]
  8. Huang Y, Horn L. Van, Tinker LF, Neuhouser ML, Carbone L, Mossavar-Rahmani Y, Thomas F, & Prentice RL 2013. Measurement error corrected sodium and potassium intake estimation using 24-hour urinary excretion. Hypertension, 63, 238–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Lampe JW, Huang Y, Neuhouser ML, Tinker LF, Song X, Schoeller DA, Kim S, Raftery D, Di C, Zheng C, Schwarz Y, Horn L. Van, Thomson CA, Mossavar-Rahmani Y, Beresford SAA, & Prentice RL 2017. Dietary biomarker evaluation in a controlled feeding study in women from the Women’s Health Initiative cohort. The American Journal of Clinical Nutrition, 105, 466–475. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Li Y, & Ryan L 2006. Inference on survival data with covariate measurement error-An imputationapproach. Scandinavian Journal of Statistics, 33, 169–190. [Google Scholar]
  11. O’Donnell M, Mente A, Rangarajan S, McQueen MJ, Wang X, Liu L, Yan H, Lee SF, Mony P, Devanath A, Rosengren A, Lopez-Jaramillo P, Diaz R, Avezum A, Lanas F, Yusoff K, Iqbal R, Ilow R, Mohammadifard N, Gulec S, Yusufali AH, Kruger L, Yusuf R, Chifamba J, Kabali C, Dagenais G, Lear SA, Teo K, & Yusuf S 2014. Urinary sodium and potassium excretion, mortality, and cardiovascular events. New England Journal of Medicine, 37, 612–623. [DOI] [PubMed] [Google Scholar]
  12. Paeratakul S, Popkin BM, Kohlmeier L, Hertz-Picciotto I, Guo X, & Edwards LJ 1998. Measurement error in dietary data: Implications for the epidemiologic study of the diet-disease relationship. European Journal of Clinical Nutrition, 52, 722–727.
  13. Prentice RL 1982. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69, 331–342.
  14. Prentice RL, & Huang Y 2018. Nutritional epidemiology methods and related statistical challenges and opportunities. Statistical Theory and Related Fields, 154, 2152–2164.
  15. Prentice RL, Mossavar-Rahmani Y, Huang Y, Van Horn L, Beresford SAA, Caan B, Tinker L, Schoeller D, Bingham S, Eaton CB, Thomson C, Johnson KC, Ockene J, Sarto G, Heiss G, & Neuhouser ML 2011. Evaluation and comparison of food records, recalls, and frequencies for energy and protein assessment by using recovery biomarkers. American Journal of Epidemiology, 174, 591–603.
  16. Prentice RL, Huang Y, Neuhouser ML, Manson JE, Mossavar-Rahmani Y, Thomas F, Tinker LF, Allison M, Johnson KC, Wassertheil-Smoller S, Seth A, Rossouw JE, Shikany J, Carbone LD, Martin LW, Stefanick M, Haring B, & Van Horn L 2017. Associations of biomarker-calibrated sodium and potassium intakes with cardiovascular disease risk among postmenopausal women. American Journal of Epidemiology, 186, 1035–1043.
  17. Prentice RL, Pettinger M, Neuhouser ML, Raftery D, Zheng C, Gowda GAN, Huang Y, Tinker LF, Howard BV, Manson JE, Wallace R, Mossavar-Rahmani Y, Johnson KC, & Lampe JW 2021a. Biomarker-calibrated macronutrient intake and chronic disease risk among postmenopausal women. The Journal of Nutrition, 151(8), 2330–2341.
  18. Prentice RL, Pettinger M, Neuhouser ML, Raftery D, Zheng C, Gowda GAN, Huang Y, Tinker LF, Howard BV, Manson JE, Wallace R, Mossavar-Rahmani Y, Johnson KC, & Lampe JW 2021b. Four-day food record macronutrient intake, with and without biomarker calibration, and chronic disease risk in postmenopausal women. American Journal of Epidemiology, Accepted.
  19. Rosner B, Spiegelman D, & Willett WC 1990. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology, 132, 734–745.
  20. Shaw PA, & Prentice RL 2012. Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics, 68, 397–407.
  21. Song X, & Huang X 2005. On corrected score approach for proportional hazards model with covariate-measurement error. Biometrics, 61, 702–714.
  22. Subar AF, Kipnis V, Troiano RP, Midthune D, Schoeller DA, Bingham S, Sharbaugh CO, Trabulsi J, Runswick S, Ballard-Barbash R, Sunshine J, & Schatzkin A 2003. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. American Journal of Epidemiology, 158, 1–13.
  23. Wang CY, Hsu L, Feng ZD, & Prentice RL 1997. Regression calibration in failure time regression. Biometrics, 53, 131–145.
  24. WCRF/AICR. 2007. Food, Nutrition and the Prevention of Cancer: A Global Perspective. Washington, DC: American Institute for Cancer Research.
  25. Yan Y, & Yi GY 2015. A corrected profile likelihood method for survival data with covariate measurement error under the Cox model. The Canadian Journal of Statistics, 43, 454–480.
  26. Zheng C, Beresford SAA, Van Horn L, Tinker LF, Thomson CA, Neuhouser ML, Di C, Manson JE, Mossavar-Rahmani Y, Seguin R, Manini T, LaCroix AZ, & Prentice RL 2014. Simultaneous association of total energy consumption and activity-related energy expenditure with cardiovascular disease, cancer, and diabetes risk among postmenopausal women. American Journal of Epidemiology, 180, 526–535.
  27. Zucker DM 2005. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. Journal of the American Statistical Association, 100, 1264–1277.

Data Availability Statement

The data that support the findings of this study are available from the Women’s Health Initiative (WHI), but restrictions apply to their availability: they were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request, in a collaborative mode and with the permission of the WHI.
