Abstract
Purpose:
A substantial proportion of global deaths is attributed to unhealthy diet, which can be assessed at baseline or longitudinally. We demonstrated how to simultaneously correct for random measurement error, correlations, and skewness in the estimation of associations between dietary intake and all-cause mortality.
Methods:
We applied a multivariate joint model (MJM) that simultaneously corrected for random measurement error, skewness, and correlation among longitudinally measured intake levels of cholesterol, total fat, dietary fiber, and energy when estimating their associations with all-cause mortality, using data from the US National Health and Nutrition Examination Survey linked to the National Death Index mortality data. We compared the MJM with the mean method, which assessed intake levels as the mean of a person’s intake.
Results:
The estimates from the MJM were larger in magnitude than those from the mean method. For instance, the magnitude of the logarithm of the hazard ratio (log HR) for dietary fiber intake increased by about 14 times (from −0.04 to −0.60) with the MJM method. This translated into a relative hazard of death of 0.55 (95% credible interval, CI: 0.45, 0.65) with the MJM and 0.96 (95% CI: 0.95, 0.97) with the mean method.
Conclusions:
MJM adjusts for random measurement error and flexibly addresses correlations and skewness among longitudinal measures of dietary intake when estimating their associations with death.
Keywords: Attenuation, Bayesian analysis, joint model, NHANES, usual intake
Introduction
A substantial proportion of global deaths from all causes and diseases is attributed to lifestyle risk factors such as unhealthy diet and unhealthy body weight [1]. Long-term intake of poor diets is linked with increased risk of mortality [2-5]. Additionally, obesity and advanced age are linked with increased mortality [6, 7]. In contrast, healthy dietary intakes, physical activity, and healthy body weights are associated with lower risk of premature deaths [8, 9]. These risk factors can be assessed either at baseline or longitudinally in epidemiologic studies. Dietary intake variables are often correlated, skewed, and measured with error in longitudinal studies [10]. For instance, an individual who consumes a large amount of dietary fat is more likely to have a high energy intake, resulting in positively correlated dietary variables. Further, the distributions of such dietary intake variables are often asymmetric with a long right tail (right skewed). Dietary intake is usually assessed using dietary recall instruments, such as the 24-hour recall, that are prone to recall bias, resulting in measurement error in dietary intake values. Despite the availability of repeated observations of individuals, observed error-prone baseline values are often used to estimate baseline effects [11]. Random errors in the measured variables attenuate their associations with the outcome of interest [12, 13]. Failing to adjust for such measurement errors and not handling the correlation and skewness correctly may lead to biased estimation of associations [11]. Therefore, statistical methods that can model multiple longitudinal risk factors and mortality simultaneously are critical in epidemiologic studies.
To correctly estimate the model parameters, it is important to have knowledge of the risk factor history for all individuals [14]. Joint modeling (JM) approaches are increasingly used in biomedical research to analyze longitudinal data of a risk factor and survival data of an event simultaneously [15-21]. The interdependency between the longitudinal sub-model and the survival sub-model of the joint model can be incorporated through a set of shared/correlated random effects [21]. With the joint model, an underlying or true value of the longitudinal risk factor specified through the longitudinal sub-model is used in the survival sub-model instead of the observed error-prone value of the risk factor. Note that a longitudinal risk factor is the outcome in the longitudinal sub-model, but its true value is a predictor in the survival sub-model, thus linking the two sub-models. While the standard joint model estimates the association between a survival outcome and a single longitudinal risk factor, there are more often multiple risk factors predictive of the event of interest in epidemiologic studies [22]. Extending the standard joint model to a multivariate setting that handles multiple interdependent risk factors allows for the incorporation of more information, accounts for correlation and measurement error, and ensures better understanding of the underlying nature of the event dynamics [22]. The JM can further be extended to handle skewness and to accommodate common distributions from the exponential family, such as Gamma, Binomial and Poisson, for multiple longitudinal outcomes, and left, right and interval censoring types for survival outcomes including competing risks [16].
This work employed the US National Health and Nutrition Examination Survey (NHANES) data linked to the National Death Index (NDI) mortality public-use data [23, 24]. NHANES includes a standardized physical examination, laboratory tests and questionnaires covering various health related topics [25, 26]. The National Death Index is a centralized database of death records in the United States [23, 24].
Currently, there is limited research on the application of multivariate joint modeling (MJM) approaches in epidemiologic association studies subject to measurement error, skewness, and correlations. First, we demonstrated the use of MJM to simultaneously correct for random measurement error and to handle skewness in correlated dietary intake variables while estimating the associations of four longitudinal dietary intakes (cholesterol, total fat, dietary fiber, and energy) and a set of baseline factors (diabetes, sex, and age) with all-cause mortality among NHANES participants who had mortality data in the NDI database. We compared this method to the mean method, which estimates the underlying intake distribution with a person’s mean intake over the follow-up period. Second, we extended the work of Crowther et al. [11] through an application of MJM to estimate baseline effects of multiple longitudinally measured dietary intakes on mortality and compared this approach to the standard method. The MJM approach adjusts for random measurement errors and incorporates repeated measures of the dietary intakes while estimating their baseline associations with the hazard of all-cause mortality. With the standard method, the observed error-prone dietary intake measurements are used to estimate their baseline effects on all-cause mortality.
Material and methods
NHANES-NDI mortality linked data
NHANES is a program of the National Center for Health Statistics (NCHS) of the US Centers for Disease Control and Prevention that began in the 1960s and has conducted a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements [25, 26]. NHANES examines a nationally representative sample of approximately 5000 persons each year. These persons are located in counties across the country, 15 of which are visited each year, and the data are released in 2-year cycles. NHANES includes a standardized physical examination, laboratory tests, and questionnaires covering health-related topics [23]. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. It consists of an interview in the household, followed by an examination in a mobile examination center (MEC). The NHANES sample is representative of the noninstitutionalized civilian US population. Currently, NHANES is linked to the NDI mortality data through December 31, 2019. The National Death Index is a centralized database containing death record information that can be used to identify the deceased, the cause of death, and the manner of death. NHANES participants were considered eligible for mortality follow-up if they provided sufficient information at the time of the interview or MEC follow-up. For this study, we used NHANES data collected from 2009 to 2016 linked to NDI public-use mortality data through December 2019 for eligible participants aged 18 years and over at the time of the interview [23]. The public-use mortality data include limited information on decedents for confidentiality reasons.
Study variables
For illustration, we used a subset of the NHANES study variables in the public-use mortality linked data. Among the variables considered were age at the screening interview, race-ethnicity (coded as non-Hispanic black vs others), sex, Body Mass Index (BMI, calculated from weight and height as kg/m²), and the interview cycle (2009-2010, 2011-2012, 2013-2014, and 2015-2016). Further, participants were asked whether they had been diagnosed with diabetes, among other health conditions, with yes/no responses. We chose these illustrative study variables given their well-established link with mortality. For instance, greater risk of all-cause and cause-specific mortality was associated with diabetes and obesity, among other risk factors, in a meta-analysis of approximately 2 million individuals in 48 independent cohort studies [27]. Further, overweight and obesity are linked with increased risk of all-cause mortality [2, 6, 28, 29]. The effects of these risk factors on mortality usually differ by sex, age, and race/ethnicity [30, 31]. For the time-to-event outcome, we used the all-cause mortality status (dead/alive) variable and the duration of follow-up. The duration of follow-up was based on person-months of follow-up from the date of the interview/MEC. For those who were deceased, the person-month variable was calculated from the date of the interview to the date of death. For those who were alive, the person-month variable was calculated from the date of the interview to the end of the follow-up period, which was December 31, 2019.
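The follow-up rule described above can be sketched in a few lines. This is a minimal illustration rather than the NCHS algorithm: the function name `follow_up`, the example dates, and the conversion of days to months by dividing by 30.4375 are assumptions introduced here; only the end of follow-up (December 31, 2019) comes from the text.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2019, 12, 31)  # end of mortality follow-up stated in the text
DAYS_PER_MONTH = 30.4375               # assumption: average month length (365.25 / 12)


def follow_up(interview, death=None, end=END_OF_FOLLOW_UP):
    """Return (person_months, status); status is 1 for deceased, 0 for alive (censored)."""
    if death is not None and death <= end:
        stop, status = death, 1
    else:
        stop, status = end, 0
    months = (stop - interview).days / DAYS_PER_MONTH
    return months, status


# Hypothetical participants, for illustration only.
months_dead, status_dead = follow_up(date(2012, 1, 1), death=date(2013, 1, 1))
months_alive, status_alive = follow_up(date(2016, 12, 31))
print(round(months_dead, 1), status_dead)
print(round(months_alive, 1), status_alive)
```

A participant with no recorded death by the end of follow-up is treated as right-censored, which matches the dead/alive status variable used for the time-to-event outcome.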
In addition, two measurements of dietary intake were collected at different time points, each covering foods consumed during the 24-hour period prior to the interview, to estimate intakes of energy, nutrients, and other food components. The first dietary recall interview was conducted in person in the MEC, whereas the second was conducted by telephone 3 to 10 days later. For this study, we used two measurements per person of the 24-hour recall (24HR) intake data that were collected for cholesterol (mg), total fat (gm), dietary fiber (gm), and energy (kcal) intakes. We chose these dietary intake variables given research interest in their associations with mortality [32-39]. For instance, intake of dietary fiber is linked with reduced risk of death [8, 34, 40].
We assumed the four dietary intake variables (hereafter referred to as longitudinal outcomes) to be measured with random errors that are uncorrelated with the unobserved true intakes. The random error reflects the variation in intake from day to day within a person. We let $y_{ik}(t)$ denote the measured intake value for the $i$-th person of the $k$-th longitudinal outcome ($k = 1, \ldots, K$) at time point $t$; $\varepsilon_{ik}(t)$ denotes the random measurement error term; $\eta_{ik}(t)$ corresponds to the longitudinal trajectory representing the unobserved true intake of the $k$-th longitudinal outcome, where $i = 1, \ldots, n$ and $t \ge 0$. Notably, in epidemiologic studies, interest is usually in the effect of long-term rather than daily dietary intake on a health outcome [41]. Usually, the mean of a large number of daily intakes for a person is assumed to be an unbiased estimator of the true long-term intake [41]. This estimator is defined statistically as

$$E\{y_{ik}(t) \mid i\} = \eta_{ik}(t). \tag{1}$$
A simple approach to estimating true intake is to use the mean of several days of daily intakes per person. Although intuitively appealing, this simple approach might lead to biased estimation of the underlying distribution, because the presence of the day-to-day variability can inflate the variance of the distribution of individual means [41]. Consequently, any assessment based on the distribution of the mean of multiple measurements will be biased. That bias can be reduced by greatly increasing the number of measurements per person. This approach, however, might be impractical in terms of cost and respondent burden. Notably, dietary intake variables considered in this study and in other studies, are usually correlated and characterized by skewed (asymmetric) distributions with heterogeneous between-person variability in intakes [41, 43, 44]. Normality is not often attained by logarithmic transformation of daily intake data [41]. Therefore, it is important to apply a statistical method that removes the day-to-day variance in estimating the underlying distribution.
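The variance inflation described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the study: true intakes and day-to-day errors are drawn from normal distributions (real intakes are skewed), and the standard deviations, sample size, and function names are hypothetical. Under the additive error model, the variance of an n-day person mean is the between-person variance plus the within-person variance divided by n.

```python
import random
import statistics


def variance_of_person_means(var_between, var_within, n_days):
    """Theoretical variance of n-day person means under y = true + error."""
    return var_between + var_within / n_days


def simulate_person_means(n_people, n_days, sd_between, sd_within, seed=1):
    """Simulate true intakes and n-day means; return their empirical variances."""
    rng = random.Random(seed)
    true_intakes, person_means = [], []
    for _ in range(n_people):
        true = rng.gauss(0.0, sd_between)
        days = [true + rng.gauss(0.0, sd_within) for _ in range(n_days)]
        true_intakes.append(true)
        person_means.append(sum(days) / n_days)
    return statistics.pvariance(true_intakes), statistics.pvariance(person_means)


# Hypothetical setting: within-person variance four times the between-person variance.
theory = variance_of_person_means(var_between=1.0, var_within=4.0, n_days=2)
var_true, var_means = simulate_person_means(20000, 2, sd_between=1.0, sd_within=2.0)
print(theory, round(var_true, 2), round(var_means, 2))
```

With two recalls per person, the distribution of person means in this setting is roughly three times as variable as the distribution of true intakes, which is why the mean of a few days overstates the spread of long-term intake.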
Multivariate joint model (MJM) method
The MJM framework involves specification of sub-models for the longitudinal outcomes and the survival outcome. The longitudinal sub-model is combined with the survival sub-model by estimating the two sub-models jointly. In estimating the MJM, the true values estimated from the longitudinal sub-model are used to predict the event status in the survival sub-model [45]. For the longitudinal sub-model, we used a multivariate generalized linear mixed effects model (GLMM). The multivariate GLMM accommodates multiple longitudinal outcomes with different distributional characteristics, such as the right skewness exhibited in dietary intakes. Specifically, the underlying distribution of the $k$-th longitudinal outcome at time $t$ is defined as the conditional distribution of $y_{ik}(t)$ given a vector of random effects $b_{ik}$, assumed to be a member of the exponential family [16, 17]:

$$g_k\left[E\{y_{ik}(t) \mid b_{ik}\}\right] = \eta_{ik}(t) = \beta_{0k} + b_{0ik} + x_{ik}^{\top}(t)\,\beta_k + z_{ik}^{\top}(t)\,b_{ik}, \tag{2}$$

where $g_k(\cdot)$ is a one-to-one monotonic link function, $\beta_{0k}$ is the $k$-th element of a vector of fixed intercept terms and $b_{0ik}$ is the $k$-th element of a vector of random intercepts, $x_{ik}(t)$ is a covariate vector of fixed effects at time $t$ with parameter vector $\beta_k$, and $z_{ik}(t)$ is a design matrix of random effects with parameter vector $b_{ik}$. Note that the dimensionality of $x_{ik}(t)$ and $z_{ik}(t)$ can differ among the multiple longitudinal outcomes, and the covariates can either be fixed or time varying. The multivariate GLMM accounts for the correlation between the longitudinal outcomes by assuming a multivariate normal distribution for the random effects as $b_i = (b_{i1}^{\top}, \ldots, b_{iK}^{\top})^{\top} \sim N(0, D)$, $i = 1, \ldots, n$, where $D$ is the variance-covariance matrix of the random effects for the $i$-th person, with the diagonal elements as variances and the off-diagonal elements as covariances.
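Because the correlation between longitudinal outcomes is carried by the off-diagonal elements of the random-effects variance-covariance matrix, it can help to see how such a matrix maps to correlations. The sketch below is illustrative only: the 2 × 2 matrix `D_example` holds hypothetical values, not estimates from this study, and `cov_to_corr` is a helper name introduced here.

```python
import math


def cov_to_corr(cov):
    """Convert a variance-covariance matrix (list of lists) to a correlation matrix."""
    size = len(cov)
    sd = [math.sqrt(cov[k][k]) for k in range(size)]
    return [[cov[r][c] / (sd[r] * sd[c]) for c in range(size)] for r in range(size)]


# Hypothetical covariance matrix of random intercepts for two outcomes.
D_example = [[4.0, 1.2],
             [1.2, 1.0]]
corr = cov_to_corr(D_example)
print(corr)
```

Each off-diagonal correlation is the covariance divided by the product of the two standard deviations, so a positive covariance between two random intercepts means that persons with above-average intake of one nutrient tend to have above-average intake of the other.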
For the survival sub-model, we considered a proportional hazards (PH) model. We defined the survival data by letting $T_i = \min(T_i^{*}, C_i)$ denote the observed time of death due to all causes, i.e., the minimum of the true event time $T_i^{*}$ and the right-censoring time $C_i$, and we denoted the event indicator by $\delta_i = I(T_i^{*} \le C_i)$, with $\delta_i = 1$ for death and $\delta_i = 0$ for censoring; $I(\cdot)$ is an indicator function. We included estimated true long-term intake values of the longitudinal outcomes (cholesterol, total fat, dietary fiber, and energy) and a set of baseline covariates (diabetes, sex, and age) as predictors in the proportional hazards sub-model as

$$h_i(t) = h_0(t)\exp\left\{\gamma^{\top} w_i + \sum_{k=1}^{K} \alpha_k\,\eta_{ik}(t)\right\}, \tag{3}$$

where $h_0(t)$ is a baseline hazard function, $w_i$ is a vector of baseline covariates with a vector of logarithms of hazard ratios (logHR) $\gamma$, and $\eta_{ik}(t)$ is the underlying (true) value of the $k$-th longitudinal outcome from the longitudinal sub-model. The parameter $\alpha_k$ quantifies the association of the underlying $k$-th longitudinal outcome with the risk of an event [16]. Specifically, $\exp(\alpha_k)$ denotes the relative increase in the hazard for an event at time $t$ per unit increase in $\eta_{ik}(t)$ at the same time point. The longitudinal predictors in model (3) are linked through the correlated random effects in the multivariate GLMM longitudinal sub-model. Shared covariates in $x_{ik}(t)$ and $w_i$ in the two sub-models further induce correlation between the longitudinal outcomes and baseline factors. The sub-models (2) and (3) are estimated jointly. Note that the MJM allows for additional association structures, especially when dealing with time-dependent covariates [16].
The mean method
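We used the mean for each longitudinal outcome per person as an estimate of the true long-term intake to associate the longitudinal outcome with the risk of an event in the PH model as

$$h_i(t) = h_0(t)\exp\left\{\gamma^{\top} w_i + \sum_{k=1}^{K} \alpha_k\,\bar{y}_{ik}\right\}, \tag{4}$$

where $\bar{y}_{ik} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ik}(t_{ij})$ is the mean of the $n_i$ observed intake values of the $k$-th longitudinal outcome for the $i$-th person, measured at times $t_{i1}, \ldots, t_{in_i}$. This method is hereafter referred to as the mean method. Despite its simplicity, the mean method does not adjust for person-specific characteristics and might fail to adjust for the day-to-day intake variation; it is used here expressly for comparison with the MJM method.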
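To see why a mean of a few recalls can still attenuate associations, consider the classical additive error model as a simplified sketch. The symbols $\sigma_b^2$ (between-person variance of true intake), $\sigma_w^2$ (within-person day-to-day variance), $n$ (number of recalls per person), and $\lambda$ (attenuation factor) are introduced here for illustration. The result is exact for linear regression with a single error-prone covariate and should be read only as an approximation for a proportional hazards model with several correlated covariates.

```latex
\bar{y}_{i} = \eta_{i} + \bar{\varepsilon}_{i}, \qquad
\operatorname{Var}(\bar{y}_{i}) = \sigma_b^2 + \frac{\sigma_w^2}{n}, \qquad
\lambda = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2 / n}, \qquad
\alpha_{\text{mean}} \approx \lambda \, \alpha_{\text{true}} .
```

For example, with $n = 2$ recalls and a hypothetical within-person variance four times the between-person variance, $\lambda = 1/3$, so the estimated coefficient would be about one third of its true value. The factor approaches 1 only as $n$ grows large.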
MJM to estimate baseline associations
To estimate the baseline effects of the longitudinal outcomes using the MJM framework with the repeated observations, we linked person-specific baseline values for the longitudinal outcomes to the hazard directly through the intercept association structure as

$$h_i(t) = h_0(t)\exp\left\{\gamma^{\top} w_i + \sum_{k=1}^{K} \alpha_k\,(\beta_{0k} + b_{0ik})\right\}, \tag{5}$$

where $\alpha_k$ now quantifies the strength of the association between the person-specific baseline values for the $k$-th longitudinal outcome, as estimated by the multivariate GLMM longitudinal sub-model, and the time-to-event; $\beta_{0k} + b_{0ik}$ is the person-specific intercept (fixed plus random) of the $k$-th longitudinal outcome, $k = 1, \ldots, K$. With this approach, the risk of event depends directly on the subject-specific value of the longitudinal outcome at time $t = 0$ [17]. This approach is an extension of Crowther et al. [11] to estimate baseline effects of multiple correlated longitudinal outcomes.
The standard method to estimate baseline associations
We further compared the MJM approach of estimating baseline effects with the standard method. With the standard method, the observed baseline values of the longitudinal outcomes were used as predictors in the PH model to estimate the baseline effects

$$h_i(t) = h_0(t)\exp\left\{\gamma^{\top} w_i + \sum_{k=1}^{K} \alpha_k\,y_{ik}(0)\right\}, \tag{6}$$

where $y_{ik}(0)$ is the observed baseline value of the $k$-th longitudinal outcome and $\alpha_k$ is the logHR for a unit increase in the observed baseline values. This method does not account for the measurement error in $y_{ik}(0)$. The four fitted models are presented schematically in Figure 1.
Figure 1.
Summary of the fitted survival models. The dotted lines denote correlated predictor variables. MJM denotes Multivariate Joint Model. Only MJM accounts for the correlation between the predictor variables.
Statistical analysis
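We used a subset of the NHANES-NDI mortality linked data for respondents who were eligible for the mortality follow-up and who had complete data for the study variables considered above. We included only respondents who reported consumption in both 24-hour dietary recalls for the four dietary intake variables. For each respondent, we used the two repeated measurements for the dietary intake variables (longitudinal outcomes). With the resulting subset data, the fitted joint model components were specified as follows:

$$\log E\{y_{ik}(t) \mid b_{0ik}\} = \eta_{ik}(t) = \beta_{0k} + b_{0ik} + \beta_{1k}\,\text{female}_i + \beta_{2k}\,\text{BMI}_i + \beta_{3k}\,\text{age}_i + \beta_{4k}\,\text{black}_i + \beta_{5k}\,\text{time} + \beta_{6k}^{\top}\,\text{wave}_i, \quad k = 1, \ldots, 4,$$

where time denotes the measurement number for the longitudinal outcomes, black denotes non-Hispanic black race-ethnicity, and wave refers to the interview cycle. We used the same set of covariates for all longitudinal outcomes and assumed a random intercept only model, because we had only two time points, such that $b_i = (b_{0i1}, \ldots, b_{0i4})^{\top} \sim N(0, D)$, where $D$ is a 4 × 4 covariance matrix of random intercepts; $h_i(t) = h_0(t)\exp\{\gamma_1\,\text{diabetes}_i + \gamma_2\,\text{female}_i + \gamma_3\,\text{age}_i + \sum_{k=1}^{4} \alpha_k\,\eta_{ik}(t)\}$; $\delta_i = 1$ if death occurred within the follow-up period and $\delta_i = 0$ otherwise.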
For the longitudinal outcomes in the multivariate GLMM sub-model, we assumed a Gamma distribution with a log link function for the skewed dietary intake variables (cholesterol, dietary fiber, total fat, and energy). The advantage of the Gamma model is that it allows for estimation of the true intake distribution without having to transform the intake data and accounts for heteroscedasticity, which is common in dietary intake distributions [46]. BMI and age at screening were standardized before being entered into the multivariate GLMM sub-model to improve model convergence. A variable is standardized by subtracting its sample mean and dividing by its standard deviation. We assessed the proportional hazards assumption in the survival sub-model using a Chi-square test and graphically based on scaled Schoenfeld residuals [47]. The method correlates the corresponding set of scaled Schoenfeld residuals for each predictor variable with time to test for independence between residuals and time. Additionally, it performs a global test for the multivariable Cox model.
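The way a Gamma model accommodates heteroscedasticity can be shown with its mean-variance relation: for a Gamma variable with mean μ and shape a, the variance is μ²/a, so the spread grows with the mean while the coefficient of variation stays constant. The sketch below is illustrative only; the means, the shape value, and the function names are hypothetical and unrelated to the fitted models.

```python
import random
import statistics


def gamma_variance(mean, shape):
    """Variance of a Gamma distribution parameterized by its mean and shape."""
    return mean ** 2 / shape


def simulate_gamma(mean, shape, n, seed=2):
    """Draw n right-skewed, strictly positive values with the requested mean."""
    rng = random.Random(seed)
    scale = mean / shape  # Gamma mean equals shape * scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]


# Hypothetical low and high consumers sharing the same shape parameter.
low_var = gamma_variance(mean=10.0, shape=4.0)
high_var = gamma_variance(mean=20.0, shape=4.0)
draws = simulate_gamma(mean=10.0, shape=4.0, n=50000)
print(low_var, high_var)
print(round(statistics.fmean(draws), 2), round(statistics.pvariance(draws), 2))
```

Doubling the mean quadruples the variance in this parameterization, which is the kind of mean-dependent spread that an untransformed normal model would miss.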
Model fitting
We fit the models with the Bayesian approach, using a Markov chain Monte Carlo (MCMC) algorithm to sample from the posterior conditional distributions of the model parameters. The estimation of the MJM was performed using the JMbayes2 package in R software version 4.0.5 [48]. The maximum likelihood estimates were used as the starting values for the MCMC simulation of the posterior samples. In estimating the MJM parameters, we summarized results from 3 chains, each consisting of 5000 MCMC iterations after discarding 2000 burn-in samples per chain, with a thinning interval of 5. This resulted in 1000 posterior samples per chain used for inference. We used diffuse normal priors for the fixed effect parameters and an inverse Wishart prior for the variance-covariance matrix, as specified in the JMbayes2 package. Convergence was assessed by inspecting the trace plots and tested formally using the Gelman-Rubin statistic. A multivariate potential scale reduction factor (PSRF) ≈ 1.0 for all model parameters, together with quick mixing of the MCMC chains, was construed as good model convergence. The simplistic comparison survival models were fit in SAS version 9.4 (SAS Institute, Cary, NC, USA) using the PHREG procedure. With the PHREG procedure, we used the BAYES statement to perform Bayesian survival analysis [49]. The results were summarized with posterior means, 95% credible intervals, and standard deviations.
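The convergence check can be made concrete with a small sketch of the potential scale reduction factor. Note the assumptions: this is the basic univariate Gelman-Rubin statistic without the degrees-of-freedom correction, whereas the analysis above reports a multivariate PSRF; the simulated chains and the function names are hypothetical and stand in for real posterior draws.

```python
import math
import random
import statistics


def retained_draws(post_burn_in_iterations, thin):
    """Number of draws kept per chain after thinning."""
    return post_burn_in_iterations // thin


def potential_scale_reduction(chains):
    """Basic univariate Gelman-Rubin PSRF for equal-length chains of one parameter."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    within = statistics.fmean(statistics.variance(c) for c in chains)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
n_keep = retained_draws(5000, 5)  # settings stated in the text: 1000 draws per chain
# Three chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(n_keep)] for _ in range(3)]
# Three chains stuck around different values (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(n_keep)] for shift in (0.0, 2.0, 4.0)]
psrf_mixed = potential_scale_reduction(mixed)
psrf_stuck = potential_scale_reduction(stuck)
print(n_keep, round(psrf_mixed, 3), round(psrf_stuck, 3))
```

When the chains agree, the between-chain variance is negligible relative to the within-chain variance and the PSRF sits close to 1; chains that explore different regions push it well above 1.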
Results
Of the 24 433 NHANES respondents interviewed from 2009 to 2016 who were eligible for mortality follow-up, we used data for 14 859 (60.8%) respondents who had complete data on variables used in this study and had reported consumption in both 24-hour dietary recalls. Of these respondents with complete data, 1205 (8.1%) were deceased by December 2019. The Schoenfeld residuals analysis for the proportional hazards assumption in the MJM resulted in non-significant values for both univariable and global Chi-square tests (p-value > 0.05). The results for the longitudinal outcomes from the multivariate GLMM sub-model are presented in Table 1. Cholesterol intake was positively associated with BMI and being a non-Hispanic black but was negatively associated with age, being a female and measurement time. Dietary fiber intake was positively associated with age, BMI and being a female but was negatively associated with the wave. Total fat intake was positively associated with BMI and being a non-Hispanic black but was negatively associated with being a female, age, and measurement time. Energy intake was positively associated with age but was negatively associated with being a female, BMI, being non-Hispanic black and measurement time. We observed high positive correlations between random effects for dietary fiber, total fat, and energy intakes within a person. The estimated correlations between random intercept terms for dietary intake variables were: between total fat and dietary fiber, between dietary fiber and energy, between total fat and energy. The correlations for other pairwise combinations of intake variables were less than 0.1. Additionally, we observed the highest between-person variability in total fat intake (), followed by scaled energy intake (), dietary fiber intake () and cholesterol intake ().
Table 1.
Parameter estimates for the multiple longitudinal outcomes from the multivariate GLMM sub-model of the multivariate joint model using NHANES data collected from 2009 to 2016 linked to NDI public-use mortality data through December 2019
| Parameter | Cholesterol (mg): Mean (95% CI) | Cholesterol: StDev | Dietary fiber (gm): Mean (95% CI) | Dietary fiber: StDev | Total fat (gm): Mean (95% CI) | Total fat: StDev | Energy (kcal/100): Mean (95% CI) | Energy: StDev |
|---|---|---|---|---|---|---|---|---|
| Intercept | 5.821 (5.763, 5.876) | 0.029 | 1.603 (1.542, 1.663) | 0.031 | 2.675 (2.577, 2.768) | 0.048 | 1.801 (1.613, 1.987) | 0.093 |
| Female | −0.299 (−0.339, −0.258) | 0.021 | 0.733 (0.667, 0.804) | 0.036 | −0.476 (−0.632, −0.314) | 0.083 | −0.336 (−0.490, −0.189) | 0.078 |
| BMI (kg/m2) | 0.025 (0.003, 0.047) | 0.011 | 0.304 (0.221, 0.384) | 0.041 | 1.175 (1.079, 1.273) | 0.049 | −0.189 (−0.282, −0.099) | 0.047 |
| Age (years) | −0.069 (−0.091, −0.049) | 0.011 | 0.117 (0.024, 0.213) | 0.049 | −0.359 (−0.513, −0.202) | 0.079 | 0.726 (0.555, 0.889) | 0.088 |
| Non-Hispanic black | 0.041 (0.002, 0.079) | 0.020 | −0.095 (−0.212, 0.019) | 0.059 | 0.476 (0.351, 0.601) | 0.065 | −0.275 (−0.423, −0.115) | 0.078 |
| Measurement time | −0.034 (−0.057, −0.010) | 0.012 | −0.008 (−0.023, 0.007) | 0.008 | −0.088 (−0.104, −0.073) | 0.008 | −0.077 (−0.089, −0.063) | 0.006 |
| Wave 2011-2012ᵃ | −0.024 (−0.049, 0.001) | 0.013 | −0.141 (−0.280, 0.001) | 0.070 | −0.402 (−0.581, −0.217) | 0.092 | −0.273 (−0.386, −0.157) | 0.058 |
| Wave 2013-2014 | −0.011 (−0.042, 0.021) | 0.016 | −0.156 (−0.205, −0.108) | 0.025 | −0.015 (−0.163, 0.134) | 0.076 | 0.150 (−0.029, 0.328) | 0.094 |
| Wave 2015-2016 | 0.014 (−0.021, 0.051) | 0.019 | −0.174 (−0.321, −0.027) | 0.077 | −0.235 (−0.329, −0.143) | 0.047 | −0.311 (−0.474, −0.154) | 0.082 |
| sigma | 2.237 (2.186, 2.291) | 0.028 | 4.279 (3.981, 4.507) | 0.130 | 4.134 (3.495, 4.610) | 0.292 | 5.747 (4.273, 7.137) | 0.758 |
Mean, posterior mean coefficient estimate; StDev, standard deviation; CI, credible interval; sigma, dispersion parameter.
ᵃ Reference wave is 2009-2010.
The distributions of daily dietary intakes and estimated underlying distributions are presented in Figure 2. The right-skewness of dietary intake distribution is evidenced for all four dietary intake variables. We further observed differences in the variances of daily intakes relative to the estimated underlying intake distributions using the MJM, suggesting a substantial measurement error (day to day variation) in daily intakes. The daily intake distributions were flatter and more spread to the right, whereas the estimated underlying intake distributions using the MJM were steeper and narrower. The MJM approach removed the day-to-day variability in intake. The intake distributions estimated using the average of the two repeated measurements per person (mean method) were more similar to the observed daily intake distributions than the estimated underlying distribution from the MJM, suggesting that the mean method did not adjust for the large variability observed in daily measures of intake.
Figure 2.
Density plots for the observed daily intakes and estimated long-term average intake for the longitudinal outcomes using the average of two 24-HR intake (mean estimate) and multivariate joint model (MJM estimate) using NHANES data collected from 2009 to 2016 linked to NDI public-use mortality data through December 2019.
The effect (logHR) estimates from the MJM and the mean methods are presented in Table 2. The joint modeling estimation approach evidently resulted in stronger associations than the mean method for the dietary intake variables. With joint modeling, the log HR per gram increase in dietary fiber intake increased by approximately 14 times (from logHR = −0.040 to −0.603). With the dietary fiber intake example, we estimated the relative hazard of death due to all causes at a given time per gram increase in dietary fiber intake as 0.547 (95% CI: 0.450, 0.647) using the MJM method and 0.961 (95% CI: 0.951, 0.970) using the mean method. Therefore, using the mean intake to estimate the association of dietary intake with all-cause mortality resulted in severely attenuated associations of the time-varying explanatory variables. Standard deviations for the logHR estimates from MJM were greater than those from the mean method due to the uncertainty in estimating true intake distributions using the MJM method. Interestingly, the effect estimates for the error-free baseline covariates (diabetes, sex, and age) were sensitive to the estimation method for the underlying dietary intake distributions. The effect estimates for the baseline covariates (diabetes, female, and age at recruitment) from the mean method were inflated relative to the estimates from the MJM. For instance, the effect estimate of female gender was inflated by 45% (from −0.287 to −0.416) as estimated using the mean method relative to MJM.
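The conversions quoted in this paragraph can be reproduced directly from the Table 2 entries. The sketch below only re-does that arithmetic; the helper names `hazard_ratio` and `relative_change` are introduced here for illustration, and the inputs are the reported logHR values for dietary fiber and female sex.

```python
import math


def hazard_ratio(log_hr, lower, upper):
    """Exponentiate a logHR and its interval limits, rounded to three decimals."""
    return tuple(round(math.exp(value), 3) for value in (log_hr, lower, upper))


def relative_change(reference, estimate):
    """Change in an estimate relative to a reference value."""
    return (estimate - reference) / reference


# Dietary fiber, MJM (Table 2): logHR and 95% credible interval limits.
mjm_fiber = hazard_ratio(-0.603, -0.799, -0.436)
# Dietary fiber, mean method (Table 2): point estimate only.
mean_fiber = round(math.exp(-0.040), 3)
# Relative change in the fiber logHR from the mean method to the MJM.
fiber_change = relative_change(-0.040, -0.603)
# Relative inflation of the female logHR from the MJM to the mean method.
female_inflation = relative_change(-0.287, -0.416)

print(mjm_fiber, mean_fiber)
print(round(fiber_change, 1), round(100 * female_inflation))
```

The relative change of about 14 for dietary fiber corresponds to the "approximately 14 times" increase in the logHR, and the 45% figure for female sex follows from the same formula with the MJM estimate as the reference.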
Table 2.
Effect estimates from multivariate joint modeling (MJM) and from a multivariable Cox PH model with mean values for the longitudinal outcomes (the mean method) using NHANES data collected from 2009 to 2016 linked to NDI public-use mortality data through December 2019
| Effects | MJM: Mean logHR | MJM: StDev | MJM: 95% CI | Mean method: Mean logHR | Mean method: StDev | Mean method: 95% CI |
|---|---|---|---|---|---|---|
| Cholesterol (mg) | 0.050 | 0.193 | −0.418, 0.376 | 0.0001 | 0.0002 | −0.0004, 0.0005 |
| Dietary fiber (gm) | −0.603 | 0.091 | −0.799, −0.436 | −0.040 | 0.005 | −0.050, −0.031 |
| Total fat (gm) | 0.060 | 0.088 | −0.112, 0.216 | −0.002 | 0.002 | −0.006, 0.002 |
| Energy (kcal/100) | 0.208 | 0.112 | −0.017, 0.428 | 0.034 | 0.010 | 0.015, 0.054 |
| Diabetes | 0.391 | 0.088 | 0.219, 0.558 | 0.411 | 0.065 | 0.275, 0.531 |
| Female | −0.287 | 0.081 | −0.452, −0.119 | −0.416 | 0.063 | −0.539, −0.296 |
| Age (years) | 1.574 | 0.054 | 1.471, 1.683 | 1.589 | 0.046 | 1.499, 1.677 |
Mean, posterior mean estimate; StDev, Standard deviation; logHR, logarithm of hazard ratio; CI, Credible Interval.
A comparison of the estimates for baseline effects of the longitudinal dietary intakes from the standard method, which uses the observed baseline values, with the estimates from the MJM, which uses the intercept association structure, is presented in Table 3. The log HR estimates from the standard method are severely attenuated. For instance, we estimated the relative hazard of death at baseline for dietary fiber intake as 0.79 (95% CI: 0.73, 0.85) using the MJM method and 0.97 (95% CI: 0.96, 0.98) using the standard method. The standard deviations for logHR estimates for either longitudinal or baseline effects of dietary intake variables from the MJM are larger than those from the simplistic models (Tables 2 and 3). For instance, the standard deviation for the longitudinal effect estimate of dietary fiber intake was estimated as 0.091 from the MJM and 0.005 from the mean method. Consequently, we observed wider 95% CIs for the estimates from the MJM than from the simplistic models.
Table 3.
Effect estimates for baseline factors from the standard method that uses the observed baseline values and multivariate joint model (MJM) that uses the intercept association structure using NHANES data collected from 2009 to 2016 linked to NDI public-use mortality data through December 2019
| Effects | MJM: Mean logHR | MJM: StDev | MJM: 95% CI | Standard method: Mean logHR | Standard method: StDev | Standard method: 95% CI |
|---|---|---|---|---|---|---|
| Cholesterol | 0.329 | 0.188 | −0.011, 0.706 | 0.00003 | 0.0002 | −0.0003, 0.0004 |
| Dietary fiber | −0.237 | 0.037 | −0.309, −0.166 | −0.028 | 0.004 | −0.035, −0.020 |
| Total fat | 0.099 | 0.149 | −0.197, 0.377 | −0.003 | 0.002 | −0.006, 0.000 |
| Energy (kcal/100) | −0.103 | 0.171 | −0.420, 0.238 | 0.0323 | 0.008 | 0.018, 0.048 |
| Diabetes | 0.299 | 0.065 | 0.170, 0.425 | 0.422 | 0.065 | 0.300, 0.552 |
| Female | −0.430 | 0.180 | −0.755, −0.065 | −0.407 | 0.062 | −0.529, −0.288 |
| Age | 1.470 | 0.182 | 1.126, 1.830 | 1.591 | 0.045 | 1.504, 1.679 |
Mean, posterior mean estimate; StDev, Standard deviation; logHR, logarithm of hazard ratio; CI, Credible Interval.
Discussion
We demonstrated the use of MJM in an epidemiologic study of dietary intakes and other risk factors of death. The MJM allows for the inclusion of multivariate longitudinal outcomes and adjusts for the random measurement error characterized by day-to-day variations. We exemplified the method using NHANES data linked to the NDI mortality data. We used a flexible Gamma distribution in the multivariate GLMM sub-model to handle skewness in dietary intake distributions. The MJM method adjusted for measurement errors in the longitudinally measured dietary intakes, handled the correlations and right-skewness in their distributions and incorporated person-characteristics, such as age, sex, and BMI, in estimating their true longitudinal profiles. The method further improved the strength of associations of longitudinal outcomes with the hazard of death. However, MJM estimated the effects with less precision (wide 95% CI) due to the uncertainty in estimating the underlying intake distribution using the multivariate GLMM sub-model. In contrast, estimating intake distribution using the mean of two days of intake failed to correct for the within-person variability and resulted in severely attenuated associations.
Currently, a wealth of longitudinal data is becoming available in epidemiologic studies, even when the main interest is in studying baseline effects of risk factors. We showed that by incorporating repeated measures of longitudinal outcomes within a unified joint modeling framework, we corrected for the random error in the measured baseline intake variables, leading to less attenuated associations with death. In contrast, the standard method, which uses the observed baseline values of the longitudinal dietary intake variables to estimate their baseline effects and therefore fails to correct for measurement error, resulted in severe underestimation of the associations. Importantly, measurement error in the longitudinal outcomes affected the effect estimates of other baseline covariates that were assumed to be error free. For example, the logHR estimate for the effect of baseline diabetes on the hazard of death changed by 41% (from 0.299 to 0.422) when measurement error in the baseline dietary intake measurements was ignored. This suggests a contamination effect of measurement error, whereby the effect estimate of a perfectly measured covariate is affected by measurement error in an imperfectly measured covariate.
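The 41% figure quoted above is a relative change on the logHR scale, taking the MJM estimate as the reference. A minimal check, with an illustrative helper name that is not part of any package:

```python
import math


def relative_change(reference, comparison):
    """Relative change of `comparison` with respect to `reference`."""
    return (comparison - reference) / abs(reference)


log_hr_mjm = 0.299   # diabetes logHR with measurement error corrected (MJM)
log_hr_mean = 0.422  # diabetes logHR with measurement error ignored (mean method)

change = relative_change(log_hr_mjm, log_hr_mean)
print(f"relative change in logHR: {100 * change:.0f}%")  # prints 41%

# The same two estimates on the hazard ratio scale.
print(f"HR (MJM):  {math.exp(log_hr_mjm):.3f}")
print(f"HR (mean): {math.exp(log_hr_mean):.3f}")
```

Note that the percentage depends on the scale: the relative change in the hazard ratios themselves is smaller than the relative change in the logHRs.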
The attenuation effects of random measurement error shown in this study are in line with previous studies [13, 14, 50, 51]. For instance, in a simulation study, Crowther et al. [11] showed that ignoring measurement error led to marked underestimation of the baseline effect of blood pressure, and that a joint modeling framework reduced this bias. Campbell et al. [12] and Pepe et al. [52] found severe underestimation of the association between the longitudinal and survival outcomes when using a time-varying covariate measured with error. Similarly, estimating the intake distribution with the mean of a few days of intake is prone to bias [41].
A strength of the MJM approach is that it estimates the complete trajectory, which allows estimation of the effect of the evolution of the longitudinal outcomes, often of interest in epidemiologic studies. The random effects in the multivariate GLMM further account for the correlation between repeated measurements from the same person. A further advantage is that the MJM can be fit using standard statistical software, for instance, the JMbayes2 package in R. The main limitation of the MJM approach is its complexity: the many parameters can make model fitting computationally intensive. Nonetheless, the MCMC algorithm employed here to fit the MJM handled the computation efficiently by sampling from the posterior distributions. Beyond what is covered here, the MJM can be used to estimate death probabilities dynamically, whereby an individual’s estimated future risk of an event is updated as each new risk factor value is obtained [15, 19, 45]. Thus, the MJM method can be used for individualized predictions of an event.
In this study, we used the association structure shown in equation (3) to link the current level of each longitudinal outcome with the risk of death. However, this parameterization might not correctly capture the relation between the two processes, prompting extensions to other functional forms, for instance, interaction effects, slope effects, lagged effects, and area under the curve [16]. We used a linear time effect because we had only two repeated measurements per person for the longitudinal outcomes; in longitudinal studies with more repeated measurements per person, an investigator may apply flexible functional forms, such as splines, to model nonlinearity [17]. Note that we modeled the death event under the proportional hazards assumption. In some cases, however, this assumption might not hold, requiring an alternative modeling framework for the event time, such as the accelerated failure time model. In the presence of multiple events, the MJM can be extended to accommodate multiple event times, such as competing risks and multi-state processes [45]. We acknowledge that there may be other approaches for investigating longitudinal associations when using MJM, and that modeling longitudinal predictors may introduce structural equation paths creating potential for colliders. The model can, therefore, be extended to handle such causal relationships.
Notably, NHANES employs a complex, multistage probability sampling design. Currently, the MJM as implemented in the JMbayes2 package does not account for the survey design features. Therefore, these results serve solely to illustrate the utility of the MJM in correcting for measurement error and handling skewness and correlation, rather than to generalize the study results to the NHANES population. In addition, the public-use mortality data were partially subjected to data perturbation for confidentiality reasons and may not reflect the true mortality details for some study participants.
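The alternative functional forms mentioned in the Discussion (current value, lagged value, slope, and area under the curve) can be made concrete for a subject-specific linear trajectory. The sketch below is only an illustration on the linear predictor scale, with a hypothetical intercept `b0` and slope `b1`; it is not the JMbayes2 implementation, and the function names are chosen here for readability.

```python
def current_value(b0, b1, t):
    """Trajectory level at time t for a linear trajectory b0 + b1 * t."""
    return b0 + b1 * t


def lagged_value(b0, b1, t, lag):
    """Trajectory level `lag` time units before t (held at baseline if t < lag)."""
    return current_value(b0, b1, max(t - lag, 0.0))


def slope(b0, b1, t):
    """Rate of change at time t; constant for a linear trajectory."""
    return b1


def area_under_curve(b0, b1, t):
    """Integral of the trajectory from 0 to t (cumulative exposure)."""
    return b0 * t + 0.5 * b1 * t ** 2


# Hypothetical subject: level 2.0 at baseline, increasing by 0.5 per unit of time.
b0, b1, t = 2.0, 0.5, 4.0
print("current value:", current_value(b0, b1, t))
print("value lagged by 1:", lagged_value(b0, b1, t, lag=1.0))
print("slope:", slope(b0, b1, t))
print("area under curve:", area_under_curve(b0, b1, t))
```

Each quantity would enter the hazard sub-model with its own association parameter, so the choice among them changes what a one-unit increase means: a higher current level, a faster rate of change, or a larger cumulative exposure.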
In conclusion, the MJM provides an efficient modeling approach to correct for random measurement error in multiple correlated longitudinal outcomes with skewed distributions when estimating their associations with the risk of an event. Because the method can be implemented in standard statistical software, it is readily available to researchers interested in reducing bias in the estimated associations between time-varying explanatory variables and survival outcomes.
References
- 1. Lee I, Kim S, and Kang H, Lifestyle Risk Factors and All-Cause and Cardiovascular Disease Mortality: Data from the Korean Longitudinal Study of Aging. Int J Environ Res Public Health, 2019. 16(17).
- 2. Ford DW, et al., Body mass index, poor diet quality, and health-related quality of life are associated with mortality in rural older adults. J Nutr Gerontol Geriatr, 2014. 33(1): p. 23–34.
- 3. Rawat R, McCoy SI, and Kadiyala S, Poor diet quality is associated with low CD4 count and anemia and predicts mortality among antiretroviral therapy-naive HIV-positive adults in Uganda. J Acquir Immune Defic Syndr, 2013. 62(2): p. 246–53.
- 4. Weiss A, et al., Serum total cholesterol: a mortality predictor in elderly hospitalized patients. Clin Nutr, 2013. 32(4): p. 533–7.
- 5. Satoh M, et al., A Combination of Blood Pressure and Total Cholesterol Increases the Lifetime Risk of Coronary Heart Disease Mortality: EPOCH-JAPAN. J Atheroscler Thromb, 2021. 28(1): p. 6–24.
- 6. Park Y, et al., Body mass index and mortality in non-Hispanic black adults in the NIH-AARP Diet and Health Study. PLoS One, 2012. 7(11): p. e50091.
- 7. Gulin T, et al., Advanced Age, High beta-CTX Levels, and Impaired Renal Function are Independent Risk Factors for All-Cause One-Year Mortality in Hip Fracture Patients. Calcif Tissue Int, 2016. 98(1): p. 67–75.
- 8. Butler LM, Kan H, and London SJ, Dietary fiber prevents both morbidity and mortality from respiratory disease. Arch Intern Med, 2011. 171(12): p. 1123.
- 9. Ford ES, et al., Low-risk lifestyle behaviors and all-cause mortality: findings from the National Health and Nutrition Examination Survey III Mortality Study. Am J Public Health, 2011. 101(10): p. 1922–9.
- 10. Agogo GO, A zero-augmented generalized gamma regression calibration to adjust for covariate measurement error: A case of an episodically consumed dietary intake. Biom J, 2017. 59(1): p. 94–109.
- 11. Crowther MJ, Lambert PC, and Abrams KR, Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach. BMC Med Res Methodol, 2013. 13: p. 146.
- 12. Campbell KR, et al., Comparison of a time-varying covariate model and a joint model of time-to-event outcomes in the presence of measurement error and interval censoring: application to kidney transplantation. BMC Med Res Methodol, 2019. 19(1): p. 130.
- 13. Freedman LS, et al., Dealing with dietary measurement error in nutritional cohort studies. J Natl Cancer Inst, 2011. 103(14): p. 1086–92.
- 14. Wulfsohn MS and Tsiatis AA, A joint model for survival and longitudinal data measured with error. Biometrics, 1997. 53(1): p. 330–9.
- 15. Rizopoulos D, Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics, 2011. 67(3): p. 819–29.
- 16. Rizopoulos D, Joint models for longitudinal and time-to-event data: With applications in R. 2012: CRC press.
- 17. Rizopoulos D and Ghosh P, A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med, 2011. 30(12): p. 1366–80.
- 18. Rizopoulos D and Lesaffre E, Introduction to the special issue on joint modelling techniques. Stat Methods Med Res, 2014. 23(1): p. 3–10.
- 19. Rizopoulos D, Molenberghs G, and Lesaffre E, Dynamic predictions with time-dependent covariates in survival analysis using joint modeling and landmarking. Biom J, 2017. 59(6): p. 1261–1276.
- 20. Rizopoulos D, et al., Personalized screening intervals for biomarkers using joint models for longitudinal and survival data. Biostatistics, 2016. 17(1): p. 149–64.
- 21. Long JD and Mills JA, Joint modeling of multivariate longitudinal data and survival data in several observational studies of Huntington's disease. BMC Med Res Methodol, 2018. 18(1): p. 138.
- 22. Mauff K, et al., Extension of the association structure in joint models to include weighted cumulative effects. Stat Med, 2017. 36(23): p. 3746–3759.
- 23. Mirel LB, et al., Comparative Analysis of the National Health Interview Survey Public-use and Restricted-use Linked Mortality Files. Natl Health Stat Report, 2020(143): p. 1–32.
- 24. NCHS, D.L.T. Linked mortality data. 2022. 27/July/2022 [cited 2022 20/09/2022]; NCHS has linked data from various surveys with death certificate records from the National Death Index (NDI)]. Available from: https://www.cdc.gov/nchs/data-linkage/mortality.htm.
- 25. Curtin LR, et al., National Health and Nutrition Examination Survey: sample design, 2007-2010. Vital Health Stat 2, 2013(160): p. 1–23.
- 26. Curtin LR, et al., The National Health and Nutrition Examination Survey: Sample Design, 1999-2006. Vital Health Stat 2, 2012(155): p. 1–39.
- 27. Stringhini S, et al., Socioeconomic status and the 25 x 25 risk factors as determinants of premature mortality: a multicohort study and meta-analysis of 1.7 million men and women. Lancet, 2017. 389(10075): p. 1229–1237.
- 28. Lv YB, et al., Association of Body Mass Index With Disability in Activities of Daily Living Among Chinese Adults 80 Years of Age or Older. JAMA Netw Open, 2018. 1(5): p. e181915.
- 29. Global, B.M.I.M.C., et al., Body-mass index and all-cause mortality: individual-participant-data meta-analysis of 239 prospective studies in four continents. Lancet, 2016. 388(10046): p. 776–86.
- 30. Hajjar I, et al., Racial Disparity in Cognitive and Functional Disability in Hypertension and All-Cause Mortality. Am J Hypertens, 2016. 29(2): p. 185–93.
- 31. Matoba N and Collins JW Jr., Racial disparity in infant mortality. Semin Perinatol, 2017. 41(6): p. 354–359.
- 32. Kim Y and Je Y, Dietary fiber intake and total mortality: a meta-analysis of prospective cohort studies. Am J Epidemiol, 2014. 180(6): p. 565–73.
- 33. Kwon YJ, et al., Association of Dietary Fiber Intake with All-Cause Mortality and Cardiovascular Disease Mortality: A 10-Year Prospective Cohort Study. Nutrients, 2022. 14(15).
- 34. Park Y, et al., Dietary fiber intake and mortality in the NIH-AARP diet and health study. Arch Intern Med, 2011. 171(12): p. 1061–8.
- 35. Kwon YJ, et al., Differential relationship between dietary fat and cholesterol on total mortality in Korean population cohorts. J Intern Med, 2021. 290(4): p. 866–877.
- 36. Yi SW, Yi JJ, and Ohrr H, Total cholesterol and all-cause mortality by sex and age: a prospective cohort study among 12.8 million adults. Sci Rep, 2019. 9(1): p. 1596.
- 37. Lee PH and Chan CW, Energy intake, energy required and mortality in an older population. Public Health Nutr, 2016. 19(17): p. 3178–3184.
- 38. Leosdottir M, et al., The association between total energy intake and early mortality: data from the Malmo Diet and Cancer Study. J Intern Med, 2004. 256(6): p. 499–509.
- 39. Nagai M, et al., Association of Total Energy Intake with 29-Year Mortality in the Japanese: NIPPON DATA80. J Atheroscler Thromb, 2016. 23(3): p. 339–54.
- 40. Yang Y, et al., Association between dietary fiber and lower risk of all-cause mortality: a meta-analysis of cohort studies. Am J Epidemiol, 2015. 181(2): p. 83–91.
- 41. Carriquiry AL, Estimation of usual intake distributions of nutrients and foods. J Nutr, 2003. 133(2): p. 601S–8S.
- 42. Basiotis PP, et al., Number of days of food intake records required to estimate individual and group nutrient intakes with defined confidence. J Nutr, 1987. 117(9): p. 1638–41.
- 43. Agogo GO, et al., Evaluation of a two-part regression calibration to adjust for dietary exposure measurement error in the Cox proportional hazards model: A simulation study. Biom J, 2016. 58(4): p. 766–82.
- 44. Tooze J, et al., A new method for estimating the usual intake of episodically consumed foods with application to their distribution. Journal of American Diet Association, 2006: p. 1575–1587.
- 45. Baart SJ, et al., Joint Modeling of Longitudinal Markers and Time-to-Event Outcomes: An Application and Tutorial in Patients After Surgical Repair of Transposition of the Great Arteries. Circ Cardiovasc Qual Outcomes, 2021. 14(11): p. e007593.
- 46. Agogo GO, et al., Use of two-part regression calibration model to correct for measurement error in episodically consumed foods in a single-replicate study design: EPIC case study. PLoS One, 2014. 9(11): p. e113160.
- 47. Schoenfeld D, Partial residuals for the proportional hazards regression model. Biometrika, 1982. 69(1): p. 239–241.
- 48. Rizopoulos D, JMbayes2: Extended Joint Models for Longitudinal and Time-to-Event Data. 2022. 09-September-2022 [cited 2022 20-09-2022]. Available from: https://cran.r-project.org/web/packages/JMbayes2/index.html.
- 49. SAS. PHREG Procedure: Bayesian Analysis of the Cox Model. 2020. 28/October/2022 [cited 2022 26/9/2022]; Available from: https://documentation.sas.com/doc/en/statug/15.2/statug_phreg_examples13.htm.
- 50. Carroll RJ, et al., Measurement error in nonlinear models: a modern perspective. 2006: Chapman and Hall/CRC.
- 51. Rosner B and Gore R, Measurement error correction in nutritional epidemiology based on individual foods, with application to the relation of diet to breast cancer. Am J Epidemiol, 2001. 154(9): p. 827–35.
- 52. Pepe MS, Self SG, and Prentice RL, Further results on covariate measurement errors in cohort studies with time to response data. Stat Med, 1989. 8(9): p. 1167–78; discussion 1179.


