International Journal of Epidemiology. 2013 Dec 17;42(6):1795–1810. doi: 10.1093/ije/dyt208

Systematic evaluation of environmental and behavioural factors associated with all-cause mortality in the United States National Health and Nutrition Examination Survey

Chirag J Patel 1, David H Rehkopf 2, John T Leppert 3, Walter M Bortz 4, Mark R Cullen 2, Glenn M Chertow 4, John PA Ioannidis 1,*
PMCID: PMC3887569  PMID: 24345851

Abstract

Background Environmental and behavioural factors are thought to contribute to all-cause mortality. Here, we develop a method to systematically screen and validate the potential independent contributions to all-cause mortality of 249 environmental and behavioural factors in the National Health and Nutrition Examination Survey (NHANES).

Methods We used Cox proportional hazards regression to associate 249 factors with all-cause mortality while adjusting for sociodemographic factors on data in the 1999–2000 and 2001–02 surveys (median 5.5 follow-up years). We controlled for multiple comparisons with the false discovery rate (FDR) and validated significant findings in the 2003–04 survey (median 2.8 follow-up years). We selected 249 factors from a set of all possible factors based on their presence in both the 1999–2002 and 2003–04 surveys and linkage with at least 20 deceased participants. We evaluated the correlation pattern of validated factors and built a multivariable model to identify their independent contribution to mortality.

Results We identified seven environmental and behavioural factors associated with all-cause mortality, including serum and urinary cadmium, serum lycopene levels, smoking (3-level factor) and physical activity. In a multivariable model, only physical activity, past smoking, smoking in participant’s home and lycopene were independently associated with mortality. These three factors explained 2.1% of the variance of all-cause mortality after adjusting for demographic and socio-economic factors.

Conclusions Our association study suggests that, of the set of 249 factors in NHANES, physical activity, smoking, serum lycopene and serum/urinary cadmium are associated with all-cause mortality as identified in previous studies and after controlling for multiple hypotheses and validation in an independent survey. Whereas other NHANES factors may be associated with mortality, they may require larger cohorts with longer time of follow-up to detect. It is possible to use a systematic association study to prioritize risk factors for further investigation.

Keywords: All-cause mortality, exposure, behaviour, environment-wide association study

Introduction

Identification of environmental and behavioural factors associated with mortality is critical for public health and preventive care. Many of these factors may be possible to modify, as opposed to genetic and demographic factors (age, sex, race/ethnicity) that are impossible to change and socio-economic factors (e.g. income, education and occupation) that are very difficult to change. McGinnis, Foege, Mokdad et al. identify behavioural and environmental risk factors as ‘actual causes of deaths in the United States’, requiring as much attention and response as standard proximate clinical conditions.1,2 One way to ascertain and compare environmental and behavioural risks for mortality is to integrate data from national health surveys linked with mortality registries.2,3 There is a large body of literature on studies that try to identify environmental factors and behaviours that may increase or decrease death risk. However, these studies typically assess and report one or a few factors at a time, and may lack systematic validation in independent datasets. Modern humans are now exposed to a complex array of environmental and behavioural factors4,5 and in theory many behaviours may entail health risks and benefits. However, there is a lack of analytic strategies that aim to decipher concurrently how multiple environmental and behavioural factors are associated with mortality. Further, potential environmental exposure and behavioural risk may be modified or determined by demographic attributes, such as sex, race/ethnicity and socio-economic status. Lack of standardization in the analysis may lead to inflated or spurious irreproducible effects.6,7 This is in contrast to current-day genome-wide association studies (GWAS), a systematic analytic strategy to correlate millions of common genetic factors with disease traits.8 These investigations have resulted in a robust literature of genetic findings in contrast to environmentally- or behaviourally-based investigations.8

We have recently developed methods for environment-wide association study (EWAS), aiming to search for and validate environmental factors associated with disease and disease-related phenotypes.9–11 Here, we extend this methodology to systematically evaluate the associations of 249 environmental and behavioural factors, including blood and urine biomarkers of exposure (e.g. pollutants and nutrients) and behavioural factors (e.g. physical activity, smoking and alcohol consumption), with all-cause mortality. We analyze the association of the 249 factors with all-cause mortality using information collected from participants of the 1999–2002 United States National Health and Nutrition Examination Survey (NHANES) with linked mortality information ascertained by the National Death Index (NDI) in 2006. We subsequently validate findings in an independent survey, 2003–04 NHANES. Last, we evaluate the correlation pattern between tentatively validated factors and identify those that have independent effects on all-cause mortality and how these interplay with demographic and socio-economic attributes.

Methods

NHANES 1999–2000, 2001–02 and 2003–04

We downloaded NHANES laboratory, questionnaire and National Death Index (NDI) linked mortality data for the 1999–2000, 2001–02 and 2003–04 surveys. Mortality information was collected from the date of the survey participation through 31 December 2006 and ascertained via a probabilistic match between NHANES and NDI death certificate information. The NDI matches individuals on personal and demographic criteria, such as social security number and date of birth, and its performance has been described elsewhere (e.g. ref 12). Overall, 9555, 11 021, and 10 100 participants were followed in the 1999–2000, 2001–02 and 2003–04 surveys, respectively, with 611, 470 and 276 assumed death events, respectively. We used the 1999–2000 and 2001–02 surveys to scan for factors associated with all-cause mortality (‘training’ dataset) and reserved the 2003–04 survey (‘testing’ dataset) to replicate findings from the training set.

Factors such as age, sex, race/ethnicity, educational attainment, occupation and income are hypothesized to be associated with both mortality and environmental/behavioural factors, and we estimated their association with mortality.13 Further, these sociodemographic factors may also confound associations of environmental/behavioural factors with death. In NHANES, race/ethnicity was coded as Non-Hispanic White (‘White’), Mexican American (‘Mexican’), Non-Hispanic Black (‘Black’), Other Hispanic and Other. We coded educational attainment as less than high school, high school equivalent and greater than high school education. We estimated socio-economic status (SES) as the categorical quintile of income/poverty index as previously described.9,10 We estimated occupation in categories corresponding to white-collar and professional (reference group), white-collar and semi-routine (e.g. technicians), blue-collar and high-skill (e.g. mechanics, construction trades and military) and blue-collar and semi-routine (e.g. personal services, farmworkers) as previously described.14

Figure 1 depicts our procedure. We assessed a total of 249 environmental and behavioural factors (Table 1 and Supplementary Table S1, available as Supplementary data at IJE online). These factors were either (i) information on behaviours, such as self-reported dietary intake (from a food frequency questionnaire), self-reported alcohol consumption, self-reported smoking, body mass index (BMI) from a physical examination or self-reported physical activity; or (ii) physical/chemical biomarkers of external exposures measured in serum or urine, such as blood lead concentration. Table 1 shows examples of factors and Supplementary Table S1 provides a listing of all factors. There were a total of 416, 467 and 574 factors in the 1999–2000, 2001–02 and 2003–04 surveys, respectively. From these factors, we identified a total of 347 that were present in all three surveys. Of these 347 factors, we found 249 that could be linked with at least 20 deceased participants in the training (1999–2000 and 2001–02 surveys) and testing (2003–04) datasets independently (Figure 1A, B).
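The selection rule described above (present in all three surveys, and linked with at least 20 deceased participants in the training and testing datasets independently) can be sketched as follows. This is a minimal illustration and not the code used for the study; the function name, the factor names and the toy counts are hypothetical.

```python
def select_factors(factors_by_survey, deaths_train, deaths_test, min_deaths=20):
    """Keep factors measured in every survey and linked to enough deaths.

    factors_by_survey: dict mapping survey label -> set of factor names.
    deaths_train / deaths_test: dict mapping factor name -> number of
    deceased participants with a measurement in that dataset.
    """
    # Factors present in all surveys (set intersection).
    common = set.intersection(*factors_by_survey.values())
    # The death-count minimum must hold in training AND testing separately.
    return sorted(
        f for f in common
        if deaths_train.get(f, 0) >= min_deaths
        and deaths_test.get(f, 0) >= min_deaths
    )


# Toy input with hypothetical factor names (not NHANES variable codes).
surveys = {
    "1999-2000": {"serum_cadmium", "serum_lycopene", "urinary_x"},
    "2001-02": {"serum_cadmium", "serum_lycopene", "urinary_x", "new_assay"},
    "2003-04": {"serum_cadmium", "urinary_x", "new_assay"},
}
train_deaths = {"serum_cadmium": 591, "urinary_x": 12}
test_deaths = {"serum_cadmium": 188, "urinary_x": 25}

selected = select_factors(surveys, train_deaths, test_deaths)
print(selected)
```

In this toy input, `urinary_x` is present in all three surveys but is dropped because it is linked with fewer than 20 deaths in the training data.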

Figure 1.

Methodology to scan for environmental and behavioural factors associated with mortality. (A) Summary of environmental and behavioural variables in three independent NHANES surveys (1999–2000, 2001–02, 2003–04). (B) Training (combined 1999–2000 and 2001–02 surveys) and testing survey information. (C) Associating each of the 249 variables with all-cause mortality (SES, socio-economic status estimate, quintile of income/poverty ratio). (D) Empirical false discovery rate (FDR) estimation in training surveys. (E) Proportional hazards assumption verification. (F) Tentative validation (P <0.05 in testing surveys). (G) Estimation of variance explained by tentatively validated factors with independent contribution and interaction with demographic variables.

Table 1.

Number and examples of environmental and behavioural factors

Factor category | No. | Examples
Behavioural factors:
    Alcohol use | 3 | Drink five per day (yes/no)?; quantity drinks per day (ordinal)
    Personal smoking | 19 | Current or past smoker (referent: no smoking); smoke cigars 20 times in life (yes/no)?
    Family smoking | 4 | Does anyone smoke in the home?; total cigarette smokers in home (ordinal)
    Cotinine | 1 | Serum levels of nicotine metabolite (log and per 1 SD)
    Physical activity | 1 | Health.gov guideline activity levels (ordinal)
    Social support | 3 | Anyone to help (yes/no)?; first-degree support (yes/no)?
    Street drug use | 1 | Ever used cocaine or street drugs (yes/no)?
    Body mass index | 4 | <18.5 kg/m2, or ≥25 and <30 kg/m2, or ≥30 and <35 kg/m2, or ≥35 kg/m2 (referent: ≥18.5 and <25 kg/m2)
    Food nutrient recall | 58 | Dietary nutrient intake levels derived from food frequency questionnaire (FFQ) (continuous and adjusted for caloric intake)
Environmental factors (serum- and urine-based):
    Bacterial infection | 2 | MRSA 1 present (yes/no); S. aureus present (yes/no)
    Viral infection | 5 | Hepatitis B antibody (yes/no); Hepatitis A antibody (yes/no)
    Dialkyl | 6 | Urinary dimethylphosphate (log per 1 SD)
    Dioxins | 7 | 2,3,7,8-tetrachlorodibenzodioxin (log and per 1 SD)
    Furans | 10 | 2,3,7,8-tetrachlorodibenzofuran (log and per 1 SD)
    Heavy metals | 15 | Urinary cadmium (log and per 1 SD); serum cadmium (log and per 1 SD)
    Hydrocarbons | 21 | Urinary 1-hydroxyfluorene (log and per 1 SD)
    Nutrients and minerals | 15 | Serum folate (log and per 1 SD); serum vitamin D (log and per 1 SD)
    Polychlorinated biphenyls | 34 | Serum polychlorinated biphenyl (PCB) 170 (log and per 1 SD)
    Pesticides | 22 | Serum heptachlor epoxide (log and per 1 SD)
    Phthalates | 12 | Urinary mono-n-butyl phthalate (log and per 1 SD)
    Phytoestrogens | 6 | Urinary enterolactone (log and per 1 SD)
Total | 249 |

Behavioural factors included three surveying alcohol consumption, one on ‘street drug’ use, 58 factors on food and nutrient consumption, 23 on smoking-related behaviours [e.g. ‘current or past smoker? (versus never smoker)’, ‘does anyone in your household smoke (yes/no)?’], one on physical activity and three on social support (e.g. ‘have anyone to help?’, ‘how many close friends do you have?’). We describe these variables below. First, the three factors on alcohol consumption included five or more drinks per day, number of drinks per day in the last month [z-standardized (divided by the population standard deviation to facilitate comparison of effects) ordinal factor] and total number of days drinking per year (z-standardized ordinal factor).

The 23 smoking factors included four regarding family smoking behaviour and 19 on personal smoking behaviour. The four family smoking behaviour factors included any smokers in the household (referent group: no smokers in household), total number of cigarette smokers in the household (z-standardized ordinal factor) and the total number of cigarettes smoked at home (z-standardized ordinal factor). The 19 factors regarding personal behaviour included a categorical factor on current or past smoking (analyzed as a two-level categorical factor with never smoking as a referent) and four on ever-used cigars, chewing tobacco, snuff and pipes (referent group: never smoked the item). Specifically for current and past smokers, factors included the number of cigarettes smoked just before quitting (z-standardized ordinal factor), how many years smoked (z-standardized ordinal factor), number of cigarettes currently smoking (z-standardized ordinal factor), the average number of cigarettes smoked per day in the past month (z-standardized ordinal factor) and an estimated nicotine, tar and carbon monoxide content of smoked item (z-standardized ordinal factors). Other factors for current smokers included years since started smoking (z-standardized ordinal variable).

Physical activity was estimated by deriving metabolic equivalents for self-reported leisure and normal-time activities15 and treated as an ordinal factor based on Health.gov physical activity guideline categories for no aerobic activity, low activity (medium intensity activity greater than baseline but fewer than 150 min/week), moderate activity (150 to 300 medium intensity min/week) and high activity (>300 min medium intensive activity per week or >150 min high intensity per week) as previously described.10,16
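As an illustration of the ordinal coding, the sketch below maps weekly minutes of activity to the four levels described above. It is not the study's MET-based derivation. In particular, counting one minute of high-intensity activity as two minutes of medium-intensity activity is an assumption made here so that the two 'high' criteria (>300 medium-intensity minutes or >150 high-intensity minutes) collapse into one rule; the function name is hypothetical.

```python
def activity_level(moderate_min_per_week, vigorous_min_per_week=0.0):
    """Return (ordinal rank, label) for a week of self-reported activity."""
    # ASSUMPTION (not stated in the text): 1 high-intensity minute is
    # treated as 2 medium-intensity minutes.
    equivalent = moderate_min_per_week + 2.0 * vigorous_min_per_week
    if equivalent <= 0:
        return 0, "none"
    if equivalent < 150:
        return 1, "low"        # above baseline but fewer than 150 min/week
    if equivalent <= 300:
        return 2, "moderate"   # 150 to 300 min/week
    return 3, "high"           # more than 300 equivalent min/week


for minutes in [(0, 0), (90, 0), (200, 0), (0, 160)]:
    print(minutes, activity_level(*minutes))
```

The ordinal rank (0 to 3) is the quantity that would enter a regression as a trend term, so a hazard ratio for this factor is read per one-level increase.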

The 58 self-reported food and nutrient consumption factors were determined from one in-person 24-h interview (1999–2000, 2001–02) or two 24-h (2003–04) in-person and telephone interviews using the United States Department of Agriculture and Department of Health and Human Services food recall questionnaires.17–20 These food and nutrient consumption factors were linearly adjusted by total caloric intake and z-standardized.

We considered BMI as another behavioural categorical factor. We divided BMI into five categories as previously described,21 including <18.5 kg/m2, ≥18.5 and <25 kg/m2, ≥25 and <30 kg/m2, ≥30 and <35 kg/m2, and ≥35 kg/m2. The ≥18.5 and <25 kg/m2 category was the reference group, so BMI contributed four factors, one for each of the other categories (Table 1).
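The BMI coding can be sketched as a category lookup plus one indicator per non-reference category. The cut-points are those given in the text; the function names and category labels are illustrative only.

```python
REFERENCE = "18.5 to <25"


def bmi_category(bmi):
    """Assign a BMI value (kg/m2) to one of the five categories."""
    if bmi < 18.5:
        return "<18.5"
    if bmi < 25:
        return REFERENCE
    if bmi < 30:
        return "25 to <30"
    if bmi < 35:
        return "30 to <35"
    return ">=35"


def bmi_indicators(bmi):
    """Four 0/1 indicators, one per non-reference category."""
    category = bmi_category(bmi)
    levels = ["<18.5", "25 to <30", "30 to <35", ">=35"]
    return {level: int(category == level) for level in levels}


print(bmi_category(27.3), bmi_indicators(27.3))
print(bmi_category(22.0), bmi_indicators(22.0))
```

A participant in the reference category has all four indicators equal to zero, which is why a five-category variable yields four hazard ratios.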

The remaining 156 factors were serum or urine-based measures of environmental exposure, including infectious agents, environmental chemicals and nutrients. Broadly, these included a serum marker of nicotine metabolism (cotinine), dialkyl phosphates (n = 6), dioxins (n = 7 markers), furans (n = 10), heavy metals (n = 15), hydrocarbons (n = 21), nutrients (n = 15), polychlorinated biphenyls (n = 34), pesticides (n = 22), phthalates (n = 12), oestrogenic compounds (n = 6), bacterial (n = 2) and viral organisms (n = 5). With the exception of assays detecting infectious agents (which were positive/negative assays), factors were continuous in scale. Continuous biomarker factors that had a right-skewed distribution were log-transformed and z-standardized as previously described.9,10
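The transformation of a right-skewed biomarker can be sketched as below. This is a minimal unweighted illustration: the function name and the example concentrations are hypothetical, the natural logarithm is an arbitrary choice (the z-scores are the same for any log base), and the survey weights used in the study are ignored.

```python
import math
import statistics


def log_z_standardize(values):
    """Log-transform positive values, then centre and scale to unit SD."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    logged = [math.log(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD of the logged values
    if sd == 0:
        raise ValueError("all values are identical; SD is zero")
    return [(x - mean) / sd for x in logged]


# Hypothetical right-skewed concentrations (arbitrary units).
concentrations = [0.2, 0.3, 0.5, 0.9, 4.0]
z = log_z_standardize(concentrations)
print([round(v, 2) for v in z])
```

After this step, a hazard ratio for the factor is interpreted per 1 SD change in the logged exposure, which makes effect sizes comparable across biomarkers measured in different units.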

Different measures of environmental and behavioural factors had different numbers of eligible participants for mortality follow-up assessment (Figure 1B). In the training surveys (1999–2002), there were 330–6008 eligible participants (with 26–655 death events). For the replication survey (2003–04), there were 177–3258 eligible participants (with 20–202 deaths) (Supplementary Table S1, available as Supplementary data at IJE online). We used the R survival and survey libraries for all analyses and accounted for clusters (pseudo-strata and pseudo-sampling units) and participant weights to accommodate the complex sampling of the data.22,23 Estimates were verified with STATA.24

Systematic scan of environmental and behavioural factors associated with all-cause mortality

We associated each of the 249 factors with all-cause mortality serially using proportional hazards (Cox) regression, while adjusting for the sociodemographic attributes described above, including age, sex, an estimate of SES (categorical quintiles of poverty to income ratio), educational attainment, occupation and race/ethnicity in the training surveys, the 1999–2002 NHANES (the ‘training’ step, Figure 1C). We used the FDR to correct for multiple hypotheses as described previously9–11 (Figure 1D). The FDR is the estimated proportion of false discoveries among the total number of discoveries made at a given significance level. We used a permutation simulation method to estimate the numerator, the number of false positives incurred at a significance threshold, as documented earlier.9,11,25 Specifically, to estimate the expected number of false positives, we permuted the censorship and follow-up time variables within each stratum of the survey; in other words, participants were randomly assigned mortality status. Then, we re-ran survival analyses for each of the 249 factors. We repeated this process 100 times to attain a distribution of P-values drawn from the null distribution. The permutation method accounts for the correlation amongst factors.26 We set an FDR threshold of 5% to identify findings in the training step for validation in the testing survey. For each factor that passed the FDR threshold in the training step, we assessed violation of proportional hazards by examining interaction between the factor and follow-up time.
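The permutation-based FDR estimate can be sketched in two pieces: shuffling the outcome within strata, and dividing the average number of null P-values below a threshold by the number of observed P-values below it. The sketch assumes the P-values have already been produced by a survival model fitted elsewhere; the function names, toy outcomes and toy P-values are hypothetical, and returning 0 when there are no discoveries is a convention chosen here.

```python
import random


def permute_within_strata(outcomes, strata, rng):
    """Shuffle (follow-up time, event) pairs among participants of a stratum."""
    permuted = list(outcomes)
    by_stratum = {}
    for idx, stratum in enumerate(strata):
        by_stratum.setdefault(stratum, []).append(idx)
    for indices in by_stratum.values():
        shuffled = [outcomes[i] for i in indices]
        rng.shuffle(shuffled)
        for i, value in zip(indices, shuffled):
            permuted[i] = value
    return permuted


def empirical_fdr(observed_p, null_p_runs, threshold):
    """Expected false positives per permutation run / observed discoveries."""
    discoveries = sum(p <= threshold for p in observed_p)
    if discoveries == 0:
        return 0.0  # convention: no discoveries, no false discoveries
    false_hits = sum(sum(p <= threshold for p in run) for run in null_p_runs)
    expected_false = false_hits / len(null_p_runs)
    return min(1.0, expected_false / discoveries)


# Toy outcomes: (months of follow-up, 1 = died) with a stratum label each.
outcomes = [(66, 0), (12, 1), (70, 0), (30, 1), (65, 0), (8, 1)]
strata = ["s1", "s1", "s1", "s2", "s2", "s2"]
shuffled = permute_within_strata(outcomes, strata, random.Random(0))

# Toy P-values: one observed scan and four permuted (null) scans.
observed_p = [0.0001, 0.0002, 0.03, 0.4, 0.8]
null_p_runs = [
    [0.0002, 0.2, 0.5, 0.7, 0.9],
    [0.1, 0.3, 0.5, 0.6, 0.95],
    [0.05, 0.2, 0.4, 0.6, 0.8],
    [0.01, 0.3, 0.5, 0.7, 0.9],
]
fdr = empirical_fdr(observed_p, null_p_runs, threshold=0.0003)
print(fdr)
```

Because time and event status are moved together and only within a stratum, each permuted dataset keeps the number of deaths per stratum and the correlation among the factors, while breaking any link between factors and mortality.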

We deemed a factor tentatively validated if it had achieved FDR <5% significance in the training scan (1999–2002 surveys) and achieved nominal statistical significance in the test (2003–04) survey (P-value <0.05, Figure 1D–F). For validated findings, we computed an overall adjusted hazard ratio (referred to as ‘overall HR’) by combining both the training and testing survey datasets (Figure 1F). We verified whether the validated factors violated the proportional hazards assumption by checking their interaction with follow-up time. We did not have evidence that any of these factors significantly violated the assumption (P > 0.05).

We assessed the non-parametric correlations among factors that had an FDR < 5% in the training step, specifically bi-serial correlations between binary factors and Spearman correlations when considering quantitative factors. We visualized these pairwise correlations in a heat map and arranged the factors using a hierarchical clustering algorithm27 as previously described.11
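For two quantitative factors, the Spearman correlation is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below is a plain unweighted implementation for illustration; it does not reproduce the survey-weighted analysis or the hierarchical clustering step, and the function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements: monotone but non-linear relationship.
serum = [0.1, 0.4, 0.9, 2.5, 7.0]
urine = [1.0, 1.1, 1.5, 3.0, 20.0]
print(spearman(serum, urine))
```

Computing this for every pair of factors gives the symmetric matrix that a heat map displays.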

We computed the power for detection of factors at the P-value corresponding to FDR <5% (equivalent to P = 0.0003) for sample sizes corresponding to each factor tested at a range of adjusted HRs (1.1, 1.3, 1.5, 1.7 and 1.9) with the powerSurvEpi R library.28 Specifically, this library implements methods that take into account the correlation among the factor and adjustment co-variates,29,30 sample size and number of death events to estimate power at a given P-value threshold and HR. We then estimated how many factors we would detect if every one of the 249 were associated with all-cause mortality for FDR <5% (P <0.0003) and each HR above by totalling the power estimations for each factor tested (Supplementary Table S2, available as Supplementary data at IJE online). At HRs of 1.1, 1.3, 1.5, 1.7 and 1.9, we estimated we would find 7 out of 249 (3%), 120/249 (49%), 194/249 (79%), 221/249 (89%) and 233/249 (94%), respectively, if all 249 factors were associated with all-cause mortality. We concluded we were adequately powered to detect modest and large associations (HR >1.3 or HR <0.8), but not weak associations with all-cause mortality.
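Totalling the power estimates works because each factor's power is its probability of detection, so the sum is the expected number of detections if every factor were truly associated. A minimal sketch, with hypothetical per-factor power values rather than the values in Supplementary Table S2:

```python
def expected_discoveries(power_by_factor):
    """Expected number of detected factors if all are truly associated."""
    return sum(power_by_factor.values())


# Hypothetical power values at one hazard ratio and one P-value threshold.
power_at_hr = {"factor_a": 0.10, "factor_b": 0.50, "factor_c": 0.90}

expected = expected_discoveries(power_at_hr)
share = expected / len(power_at_hr)
print(f"expected detections: {expected:.1f} of {len(power_at_hr)} ({share:.0%})")
```

Factors measured in few participants contribute little to the total, which is why the expected count stays well below the number of factors at small hazard ratios.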

Interaction checks with two lowest SES categories, male sex and Non-Hispanic Black race/ethnicity

For tentatively validated factors, we aimed to assess their interaction with demographic and socio-economic characteristics associated with risk for all-cause mortality, namely male sex, two lowest SES quintiles and Non-Hispanic Black race/ethnicity in the combined cohort (training and testing cohorts, Figure 1G). Specifically, we modelled the interactions among each of the validated findings and the three demographic factors with a multiplicative term in the Cox proportional hazards model while controlling for the remaining demographic co-variates above (age, sex, education, quintile of SES, occupation and race/ethnicity). As one example, the interaction between serum cadmium exposure (‘X’) and male sex would have been modelled as: log(HR) = β1 * X + β2 * male + β3 * X * male + other adjustment covariates (age, race/ethnicity, education, SES, occupation). We assessed whether inclusion of the interaction term (β3) was significant at the Bonferroni level of significance after considering 7 times 3 interaction tests (P < 0.05/21 = 0.002).
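To make the interaction model concrete, the sketch below evaluates the three-term part of the log hazard ratio and the Bonferroni threshold for 7 factors by 3 modifiers. The coefficient values are hypothetical and the other adjustment covariates are omitted; only the structure follows the text.

```python
import math


def log_hazard_ratio(x, male, beta_x, beta_male, beta_interaction):
    """Exposure, sex and their product; other covariates omitted."""
    return beta_x * x + beta_male * male + beta_interaction * x * male


def bonferroni_threshold(alpha, n_factors, n_modifiers):
    """Per-test significance level for n_factors x n_modifiers tests."""
    return alpha / (n_factors * n_modifiers)


# Hypothetical coefficients for illustration only.
b_x, b_male, b_int = 0.30, 0.50, 0.10

# Hazard ratio per 1 SD of exposure, separately for women and men.
hr_women = math.exp(
    log_hazard_ratio(1, 0, b_x, b_male, b_int)
    - log_hazard_ratio(0, 0, b_x, b_male, b_int)
)
hr_men = math.exp(
    log_hazard_ratio(1, 1, b_x, b_male, b_int)
    - log_hazard_ratio(0, 1, b_x, b_male, b_int)
)
threshold = bonferroni_threshold(0.05, n_factors=7, n_modifiers=3)
print(round(hr_women, 3), round(hr_men, 3), round(threshold, 4))
```

The per-SD hazard ratio is exp(β1) in women and exp(β1 + β3) in men, so a β3 of zero means the exposure has the same relative effect in both sexes.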

Variance explained of validated factors

To estimate the additive effects and overall variance explained by identified factors, we built three multivariable models that included tentatively validated factors (Figure 1G). The first models contained tentatively validated factors in addition to age, sex, quintiles of SES, education and occupation as defined above. The third model contained tentatively validated factors in addition to age, sex and race/ethnicity but excluded socio-economic factors. We hypothesized that the socio-economic factors may influence some of the environmental and behavioural factors. Under this hypothesis, the strength of the associations of the environmental/behavioural factors might be stronger in a model without socio-economic co-variates (SES, education and occupation) versus models with socio-economic factors. We computed the Nagelkerke R2 to estimate the variance explained for each model and, in addition, ascribed solely to the environmental and behavioural factors. We computed standard errors around the Nagelkerke R2 with a bootstrapping procedure that accommodated stratified data.31
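The Nagelkerke R2 rescales the likelihood-based (Cox and Snell) R2 by its maximum attainable value. The sketch below computes it from log-likelihoods and takes the share ascribed to the environmental and behavioural factors as the difference between a full model and a covariate-only model. That differencing is an assumption about how the ascribed share is obtained, the use of the number of participants as n is a further assumption, and the log-likelihood values are hypothetical.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R2 from null and fitted (partial) log-likelihoods."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))
    max_attainable = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell / max_attainable


def incremental_r2(loglik_null, loglik_base, loglik_full, n):
    """R2 gained by adding factors to a base (covariate-only) model."""
    return (nagelkerke_r2(loglik_null, loglik_full, n)
            - nagelkerke_r2(loglik_null, loglik_base, n))


# Hypothetical log-likelihoods: null, demographics only, demographics + factors.
ll_null, ll_base, ll_full, n = -1000.0, -950.0, -940.0, 2000

r2_full = nagelkerke_r2(ll_null, ll_full, n)
r2_added = incremental_r2(ll_null, ll_base, ll_full, n)
print(round(r2_full, 4), round(r2_added, 4))
```

Repeating the calculation on bootstrap resamples drawn within strata would give a standard error for each R2.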

Results

Baseline characteristics of deceased and surviving participants in NHANES 99–02 and NHANES 03–04

There were a total of 6008 eligible participants for study in NHANES 1999–2002 with a median time to follow-up of 66 months. As expected, we found important associations among demographic characteristics and mortality, including older age [adjusted hazard ratio (HR) = 2.2 (2.1, 2.4) for a 10-year increase], male sex [HR = 1.7 (1.4, 2.1)], and non-Hispanic Black race/ethnicity [HR = 1.4 (1.1, 1.8) relative to non-Hispanic Whites]. We also observed higher risk depending on SES (as defined by quintile of income-poverty ratio). Individuals at the two lowest SES standings had greater than 2-fold risk for death [HR = 2.3 (1.5, 3.6) and 2.4 (1.7, 3.4) for first and second quintiles, respectively] versus the highest SES (Table 2). Supplementary Table S1 (available as Supplementary data at IJE online) shows factors that differed among alive and deceased participants.

Table 2.

Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 1999–2002 ‘training’ samples

Attribute | Survivors, n = 5353 (a) | Deceased, n = 655 (a) | Age-adjusted HR (b) | Demographic-adjusted HR (b)
Age | 43.45 (0.34) | 68.25 (0.83) | 2.32 [2.15,2.49] | 2.24 [2.07,2.43]
Male | 47.9 (0.6) | 52.9 (2.4) | 1.56 [1.26,1.93] | 1.72 [1.38,2.13]
Race (%):
    Non-Hispanic Black | 10.4 (1.2) | 11.4 (1.4) | 1.66 [1.33,2.08] | 1.40 [1.09,1.81]
    Mexican American | 7.2 (0.9) | 3.6 (0.8) | 1.15 [0.80,1.64] | 0.86 [0.60,1.23]
    Other | 4.7 (0.6) | 1.6 (0.6) | 0.57 [0.30,1.08] | 0.54 [0.30,0.98]
    Other Hispanic | 6.8 (1.7) | 5.6 (2.2) | 1.18 [0.77,1.80] | 0.92 [0.59,1.43]
    Non-Hispanic White | 70.8 (1.8) | 77.7 (2.4) | ref | ref
Education (%):
    <High school | 20.8 (0.9) | 37.5 (2.4) | 1.42 [1.14,1.77] | 1.24 [0.98,1.57]
    High school | 25.7 (1.0) | 29.2 (1.9) | 1.65 [1.33,2.05] | 1.23 [0.98,1.54]
    >High school | 53.5 (1.5) | 33.2 (2.5) | ref | ref
Income (quintile of income/poverty) (%):
    Quintile 1 | 16.9 (0.9) | 19.2 (2.0) | 2.39 [1.70,3.38] | 2.32 [1.51,3.57]
    Quintile 2 | 18.5 (1.1) | 33.5 (2.8) | 2.47 [1.74,3.50] | 2.41 [1.69,3.44]
    Quintile 3 | 19.9 (0.7) | 22.0 (2.3) | 1.89 [1.29,2.75] | 1.76 [1.20,2.57]
    Quintile 4 | 19.6 (0.6) | 13.8 (1.7) | 1.68 [1.08,2.59] | 1.60 [1.03,2.48]
    Quintile 5 | 25.1 (1.4) | 11.4 (2.0) | ref | ref
Occupation:
    Blue-collar semi | 38.1 (1.0) | 39.1 (2.8) | 1.58 [1.25,1.99] | 1.18 [0.87,1.59]
    Blue-collar high | 10.3 (0.7) | 14.6 (1.8) | 1.73 [1.22,2.43] | 1.10 [0.78,1.55]
    Never worked | 2.6 (0.2) | 2.8 (0.7) | 0.78 [0.40,1.50] | 0.76 [0.37,1.58]
    White-collar semi | 20.9 (0.8) | 19.0 (1.4) | 0.96 [0.77,1.19] | 0.98 [0.74,1.30]
    White-collar high | 23.5 (0.8) | 19.1 (1.8) | ref | ref

Semi, semi-routine; high, high skill; ref, referent.

(a) Unweighted sample size.

(b) HR adjusted for all other demographic and socio-economic factors.

NHANES 2003–04 was used for validation. There were a total of 3262 eligible participants for study in NHANES 2003–04 with a median follow-up of 34 months. We observed similar trends in NHANES 2003–04 (Table 3). Participants in NHANES 2003–04 had a higher mean survivor age of 44.5 years and a higher mean deceased age of 71.4 years. The adjusted HR for a 10-year increase in age was 2.6 [95% CI: (2.3, 3.0)] versus 2.2 in NHANES 1999–2002. We observed double the risk for men [adjusted HR = 2.0 (1.5, 2.7)].

Table 3.

Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 2003–2004 ‘testing’ samples

Attribute | Survivors, n = 3059 (a) | Deceased, n = 203 (a) | Age-adjusted HR (b) | Demographics-adjusted HR (b)
Age | 44.49 (0.55) | 71.42 (1.37) | 2.60 [2.28,2.96] | 2.28 [1.97,2.64]
Male | 48.0 (0.7) | 56.7 (3.5) | 1.81 [1.40,2.36] | 2.28 [1.55,3.35]
Race (%):
    Non-Hispanic Black | 11.3 (1.8) | 12.4 (3.3) | 1.54 [1.11,2.12] | 1.35 [0.98,1.86]
    Mexican American | 8.0 (2.0) | 4.3 (2.7) | 1.21 [0.76,1.92] | 0.78 [0.48,1.29]
    Other | 5.2 (0.7) | 4.8 (2.1) | 1.53 [0.62,3.74] | 1.48 [0.50,4.38]
    Other Hispanic | 3.7 (0.7) | 3.2 (1.9) | 1.69 [0.58,4.87] | 1.68 [0.59,4.82]
    Non-Hispanic White | 71.8 (3.4) | 75.3 (3.8) | ref | ref
Education (%):
    <High school | 18.5 (1.2) | 33.7 (4.6) | 1.31 [0.96,1.80] | 1.05 [0.67,1.65]
    High school | 27.0 (1.0) | 25.7 (4.8) | 1.00 [0.56,1.77] | 0.97 [0.53,1.76]
    >High school | 54.6 (1.2) | 40.6 (5.1) | ref | ref
Income (quintile of income/poverty) (%):
    Quintile 1 | 16.9 (1.5) | 22.2 (4.0) | 2.33 [1.24,4.36] | 2.05 [1.04,4.04]
    Quintile 2 | 19.7 (0.9) | 31.9 (3.9) | 1.62 [0.79,3.30] | 1.31 [0.59,2.92]
    Quintile 3 | 19.6 (1.0) | 21.6 (5.2) | 1.38 [0.59,3.21] | 1.23 [0.52,2.91]
    Quintile 4 | 22.0 (1.2) | 12.5 (2.9) | 0.85 [0.51,1.43] | 0.72 [0.37,1.39]
    Quintile 5 | 21.9 (1.8) | 11.9 (3.2) | ref | ref
Occupation:
    Blue-collar semi | 39.0 (2.0) | 27.0 (5.4) | 0.87 [0.52,1.44] | 0.80 [0.47,1.38]
    Blue-collar high | 11.8 (1.2) | 23.4 (3.8) | 1.89 [1.25,2.86] | 1.40 [0.78,2.49]
    Never worked | 2.2 (0.2) | 5.9 (1.6) | 1.37 [0.69,2.71] | 1.61 [0.73,3.54]
    White-collar semi | 23.1 (1.2) | 17.9 (3.0) | 0.71 [0.48,1.07] | 0.94 [0.55,1.58]
    White-collar high | 22.0 (1.4) | 26.0 (3.8) | ref | ref

Semi, semi-routine; high, high skill; ref, referent.

(a) Unweighted sample size.

(b) HR adjusted for all other demographic and socio-economic factors.

Limited cause of death information was available for deceased participants and was coded as International Classification of Diseases version 10 (ICD10) codes. The Centers for Disease Control and Prevention (CDC)/National Center for Health Statistics (NCHS) binned ICD10 codes into 113 groups. The top five causes of death for participants in the 1999–2002 surveys included the groups ‘other forms of ischaemic heart disease’ (ICD10 codes I20, I25.1–I25.9, 10% of the deceased population), ‘cerebrovascular diseases’ (ICD10: I60–I69, 8% of deceased participants), ‘other diseases’ (more than ten ICD10 groups, 7% of deceased participants), ‘malignant neoplasms of trachea, bronchus and lung’ (ICD10: C33–C34, 7%), and ‘acute myocardial infarction’ (ICD10: I21–I22, 6% of deceased participants). The top five causes of death for deceased participants in the 2003–04 survey were similar and included ‘other forms of ischaemic heart disease’ (12% of deceased participants), ‘malignant neoplasms of trachea, bronchus and lung’ (10% of deceased participants), ‘other diseases’ (8%), ‘acute myocardial infarction’ (8%) and ‘cerebrovascular diseases’ (6%).

Systematic scan of environmental and behavioural factors associated with all-cause mortality

We associated each of the 249 environmental and behavioural factors (self-reported or biomarkers of exposure) with all-cause mortality in turn, adjusting for age, sex, race/ethnicity, SES, occupation and educational attainment in the NHANES 1999–2002 surveys (the ‘training’ dataset). Figure 2 shows the results, visualizing the adjusted hazard ratio versus the P-value of the association. Adjusted HR denotes risk for all-cause mortality per 1 SD for continuous factors or per incremental change for ordinal values. For categorical or binary factors, adjusted hazard ratios denote risk relative to the referent category (‘negative’ for an exposure).

Figure 2.

Volcano plot of 249 environmental and behavioural factor associations with all-cause mortality in training step (all black points). Red horizontal line denotes FDR-adjusted level of statistical significance (FDR = 5%, P-value = 0.0003). Red points show the standard demographic and socio-economic factors considered for adjustments. For SES: SES_0: 1st quintile of SES, SES_1: 2nd quintile of SES, SES_2: 3rd quintile of SES, SES_3: 4th quintile of SES; SES HRs are relative to the highest quintile of SES. For education: education_hs: high school education, education_less_hs: less than high school education; education HRs are relative to greater than high-school education. For occupation: occupation_blue_semi: semi blue-collar, occupation_blue_high: high blue-collar, occupation_white_semi: semi white-collar, occupation_never: never worked. Filled black markers denote validated factors. –log10(P-value) for physical activity and age are annotated in parentheses, since they are extreme. Y-axis is discontinuous to accommodate higher –log10(P-values) for physical activity and age.

We found 7 (out of 249) factors at FDR <5% in the training surveys (1999–2002 NHANES) and were able to tentatively validate all 7 factors in the test survey (P <0.05 in 2003–04 NHANES) (Table 4, Figure 2). The strongest association was with physical activity, analyzed as an ordinal factor (representing the trend across no, low, moderate and high activity as defined by Health.gov categories), with an adjusted HR of 0.72 for the trend [CI: (0.66, 0.79), P-value = 4 × 10−12] in the training surveys (Figure 2) and an adjusted HR of 0.63 (P = 1 × 10−10) in the test survey (Table 4). We also estimated the adjusted HR of each physical activity level relative to other categories. In the combined surveys, the adjusted HR for low activity relative to zero activity was 0.60 [95% CI: (0.47, 0.73), P = 3 × 10−6]. The adjusted HR for moderate activity versus low activity was 0.58 [95% CI: (0.41, 0.82), P = 3 × 10−3], and that for high activity versus moderate activity was not significant, with an adjusted HR of 1.2 [95% CI: (0.80, 1.7), P = 0.39]. We had evidence for multiple associations of environmental and behavioural factors with all-cause mortality, as seen in the deviation from the uniform distribution of P-values (Supplementary Figure S1, available as Supplementary data at IJE online).

Table 4.

Tentatively validated factors. Training denotes estimate from training survey, NHANES 1999–2002. Testing denotes estimates from testing survey, NHANES 2003–04. Combined denotes estimate from combining training and testing surveys

Column groups: Training survey (1999–2002): n, events, adjusted HR [95% CI], P-value, FDR. Testing survey (2003–04): n, events, adjusted HR [95% CI], P-value. Combined surveys (1999–2004): adjusted HR [95% CI], P-value.

Description | Training n | Training events | Training adjusted HR [95% CI] | Training P-value | Training FDR | Testing n | Testing events | Testing adjusted HR [95% CI] | Testing P-value | Combined adjusted HR [95% CI] | Combined P-value
Current/past smoker (vs never smoker):
    Past smoker | 5409 | 652 | 1.50 [1.23,1.83] | 7.8 × 10−5 | 1.68 × 10−2 | 2911 | 201 | 1.66 [1.10,2.52] | 1.65 × 10−2 | 1.53 [1.27,1.83] | 5.31 × 10−6
    Current smoker | 5409 | 652 | 2.00 [1.38,2.89] | 2.3 × 10−4 | 2.80 × 10−2 | 2911 | 201 | 3.17 [1.90,5.28] | 9.51 × 10−6 | 2.20 [1.61,3.00] | 6.59 × 10−7
Cadmium (1 SD log) | 5722 | 591 | 1.37 [1.19,1.57] | 1.2 × 10−5 | 9.33 × 10−3 | 3120 | 188 | 1.63 [1.34,1.97] | 5.66 × 10−7 | 1.45 [1.28,1.65] | 3.97 × 10−9
Trans-lycopene (1 SD log) | 3096 | 262 | 0.81 [0.72,0.91] | 2.9 × 10−4 | 3.20 × 10−2 | 3054 | 179 | 0.79 [0.73,0.86] | 1.45 × 10−7 | 0.80 [0.74,0.86] | 1.20 × 10−9
Physical activity (MET-based rank) | 5534 | 619 | 0.72 [0.66,0.79] | 4.0 × 10−12 | <0.001 | 2989 | 191 | 0.63 [0.54,0.72] | 1.27 × 10−10 | 0.71 [0.66,0.77] | 4.47 × 10−18
Does anyone smoke in home? | 6008 | 655 | 2.00 [1.55,2.58] | 1.1 × 10−7 | 3.5 × 10−3 | 3258 | 202 | 1.88 [1.28,2.76] | 1.18 × 10−3 | 1.99 [1.59,2.48] | 1.14 × 10−9
Cadmium, urine (1 SD log) | 1783 | 186 | 1.62 [1.28,2.04] | 5.7 × 10−5 | 1.58 × 10−2 | 1079 | 59 | 2.03 [1.47,2.80] | 1.66 × 10−5 | 1.66 [1.35,2.04] | 1.14 × 10−6

All estimates are adjusted by age, sex, race, SES, education and occupation. n and number of events are unweighted.

FDR, false discovery rate; MET, Metabolic equivalent.

Three self-reported smoking factors were associated with mortality. Two were levels of the categorical smoking factor: past and current smoking (versus never smoking). The adjusted HR for past smoking was 1.5 [95% CI: (1.2, 1.8), P = 8 × 10−5 in training surveys] and for current smoking was 2.0 [95% CI: (1.4, 2.9), P = 2 × 10−4]. The third self-reported smoking factor was anyone smoking in the participant’s home [adjusted HR: 2.0, 95% CI: (1.6, 2.6), P = 1 × 10−7 in the training surveys]. We observed larger estimates in the test survey for past and current smoking. For example, the adjusted HRs for past smokers and current smokers versus never smokers were 1.7 and 3.2, respectively (P = 0.02 and 1 × 10−5; Table 4).

We found that urine and serum cadmium levels were associated with mortality. Serum cadmium had an adjusted HR of 1.4 for a 1-SD change in logged exposure value [95% CI: (1.2, 1.6), P = 1 × 10−5] and for urinary cadmium the adjusted HR was 1.6 [95% CI: (1.3, 2.0), P-value = 6 × 10−5]. Adjusted HRs in the test survey were higher [1.6 (P = 6 × 10−7) and 2.0 (P = 2 × 10−5), respectively].

We also found a serum nutrient marker associated with all-cause mortality. The serum carotenoid trans-lycopene was negatively associated with all-cause mortality (Figure 2; Table 4). Specifically, trans-lycopene had an adjusted HR of 0.8 for a 1-SD change in logged level; that is, higher levels of trans-lycopene were associated with a 20% decreased risk for mortality (Table 4). The adjusted HR in the test survey was similar.
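Because the biomarker hazard ratios are reported per 1-SD change in the logged exposure, they convert directly into a percent change in hazard. The short sketch below shows that arithmetic using the training-survey HRs from Table 4; the function name and dictionary layout are illustrative choices, not part of the original analysis.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio (HR 0.81 -> about -19%)."""
    return (hazard_ratio - 1.0) * 100.0


# Adjusted HRs per 1-SD change in logged level, training surveys (Table 4).
training_hr = {
    "trans-lycopene": 0.81,
    "serum cadmium": 1.37,
    "urinary cadmium": 1.62,
}

for name, hr in training_hr.items():
    print(f"{name}: HR {hr:.2f} -> {percent_change_in_hazard(hr):+.0f}% hazard per 1 SD")
```

An HR below 1 gives a negative percent change (lower hazard), and an HR above 1 gives a positive one, so the lycopene and cadmium estimates point in opposite directions.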

Several factors had higher HRs (>2) but did not have FDR <5%, including hepatitis C antibody and hepatitis B surface antigen. Antibodies to hepatitis C had an adjusted HR of 2.7 in the training surveys [95% CI: (1.4, 5.0), P = 0.002, FDR = 10%] and 2.2 in the combined surveys [95% CI: (1.2, 3.9), P = 0.009]. Hepatitis B surface antigen presence had an adjusted HR of 2.6 in the training surveys [95% CI: (0.9, 7.3), P = 0.08, FDR >30%] and 2.1 in the combined surveys [95% CI: (0.8, 6.0), P = 0.1].

Interaction checks with two lowest SES categories, male sex and Non-Hispanic Black race/ethnicity

We estimated whether the seven validated factors interacted with three demographic categories (total of 21 tests of interaction). We could not conclude that any of these demographic factors modified associations for all-cause mortality after consideration of multiple hypotheses (P >0.05 for all 21 interaction tests).

Correlation pattern between putative risk factors

We assessed the correlations among the environmental and behavioural factors with FDR <5% (n = 7) and the adjustment covariates (n = 21) and observed many modest correlations among the 378 pairwise correlations that were calculated; 210 of the 378 correlations were significant (Bonferroni-adjusted P < 0.05). The 5th to 95th percentile range of the absolute value of ρ was 0.005 to 0.30 (Figure 3) and the correlations that were significant had absolute values ranging from 0.04 to 0.62.
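The number of pairwise correlations follows from the number of variables: 7 factors plus 21 adjustment covariates give 28 variables, and every unordered pair is tested once. The sketch below reproduces that count and the per-test threshold that a Bonferroni-adjusted P < 0.05 implies; the variable names are illustrative.

```python
from math import comb

n_factors = 7      # factors with FDR < 5% in the training surveys
n_covariates = 21  # adjustment covariates, as stated in the text
n_variables = n_factors + n_covariates

# Every unordered pair of variables is correlated once: 28 choose 2.
n_pairs = comb(n_variables, 2)

# A Bonferroni-adjusted P < 0.05 is equivalent to a raw P below 0.05 / n_pairs.
bonferroni_threshold = 0.05 / n_pairs

print(n_pairs)
print(f"{bonferroni_threshold:.2e}")
```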

Figure 3.

Pairwise correlations of factors with FDR <5% in the training set and of the standard demographic and socio-economic factors used for adjustments

There were significant correlations between similar factors belonging to the same group, such as smoking- and cadmium-related factors. For example, the correlation between serum and urinary cadmium levels was 0.45 (adjusted P <1 × 10−12). Self-reported anybody smoking at home was significantly positively correlated with current smoking status (ρ = 0.6, adjusted P <1 × 10−12) and negatively correlated with past smoking status (ρ = −0.2, adjusted P <1 × 10−12).

We observed correlations between smoking-related behaviours, physical activity, levels of cadmium and levels of trans-lycopene. First, smoking behaviour was significantly correlated with cadmium biomarker levels. Specifically, current smoking was correlated with both serum and urine cadmium levels (ρ = 0.52 and 0.21, respectively, adjusted P < 1 × 10−12). Physical activity was modestly correlated with trans-lycopene (ρ = 0.2, adjusted P < 1 × 10−12). Urine cadmium was modestly correlated with past smoking (ρ = 0.1, adjusted P = 1 × 10−5). On the other hand, trans-lycopene was modestly but significantly negatively correlated with serum and urine cadmium (ρ = −0.13 and −0.16, adjusted P < 1 × 10−12 and P = 8 × 10−11, respectively).

Moreover, there were modest correlations between the tentatively validated factors and demographic and socio-economic factors (absolute ρ ≥ 0.1 and adjusted P <0.01). First, physical activity was positively correlated with above high school education and 5th quintile of SES (ρ = 0.2 and 0.2, respectively, adjusted P <1 × 10−12) and negatively correlated with age (ρ = −0.14, adjusted P <1 × 10−12). Trans-lycopene was inversely correlated with age (ρ = −0.3, adjusted P <1 × 10−12) and, to a lesser degree, with less than high school education (ρ = −0.13, P <1 × 10−12). Serum and urinary cadmium were directly correlated with age (ρ = 0.24 and 0.34, respectively, adjusted P <1 × 10−12). Serum cadmium was additionally correlated with less than high school education (ρ = 0.13, adjusted P <1 × 10−12) and urinary cadmium was correlated with Non-Hispanic Black race/ethnicity (ρ = 0.16, adjusted P <1 × 10−12).

Smoking-related factors also exhibited correlations with demographic factors. Self-reported current smoking was correlated with male sex (ρ = 0.1, adjusted P <1 × 10−12) and inversely correlated with age (ρ = −0.15, adjusted P <1 × 10−12). Current smoking was also correlated with first quintile of SES (ρ = 0.12, adjusted P <1 × 10−12). Similarly, anyone smoking at home was correlated with first quintile of SES (ρ = 0.11, adjusted P <1 × 10−12) and with Non-Hispanic Black race/ethnicity (ρ = 0.11, adjusted P <1 × 10−12). Past smoking was strongly correlated with age (ρ = 0.26, adjusted P <1 × 10−12) and male sex (ρ = 0.14, adjusted P <1 × 10−12).

Multivariable models and variance explained by tentative validated factors

We built three multivariable models to estimate the variance explained by the tentatively validated factors. We opted to remove urinary cadmium from consideration in these models due to extensive missing information (only 1694 participants with 134 death events versus 5155 participants with 416 events). In the first, we entered five of the seven tentatively validated factors [serum cadmium, physical activity, anyone smoked in home, current smokers/past smokers (versus never smokers)] while adjusting for demographic covariates (Table 5) for participants from both the training and testing surveys (Model A). The second model was similar to the first, containing six of seven validated factors including trans-lycopene (Model B). The third multivariate model contained six of seven validated factors but omitted socio-economic factors, such as SES, education and occupation (Model C). The total number of participants available in the combined testing and training surveys in Model A was 7381 (733 deaths). The total number of participants available for Models B and C was 5155 (416 deaths).

Table 5.

Multivariable model coefficients

| Factor | Model A multivariate HR [95% CI] | Model A P-value | Model B multivariate HR [95% CI] | Model B P-value | Model C multivariate HR [95% CI] | Model C P-value |
| Past smoker (vs. never smoker) | 1.36 [1.12,1.64] | 1.49x10−3 | 1.27 [0.94,1.71] | 0.013 | 1.28 [0.94,1.73] | 0.113 |
| Current smoker (vs. never smoker) | 1.1 [0.72,1.67] | 0.655 | 0.91 [0.49,1.71] | 0.775 | 0.92 [0.5,1.7] | 0.80 |
| Serum cadmium (1 SD of log) | 1.24 [1.11,1.4] | 2.88x10−4 | 1.24 [1.03,1.49] | 0.021 | 1.26 [1.04,1.52] | 0.017 |
| Total physical activity (MET-based rank) | 0.73 [0.68,0.79] | 3.06x10−14 | 0.67 [0.6,0.74] | 9.99x10−16 | 0.65 [0.59,0.72] | 1.11x10−16 |
| Does anyone smoke in home? | 1.67 [1.24,2.24] | 7.30x10−4 | 1.69 [1.17,2.44] | 4.85x10−3 | 1.76 [1.23,2.51] | 1.98x10−3 |
| Trans-lycopene (1 SD of log) | . | . | 0.84 [0.77,0.91] | 3.02x10−5 | 0.84 [0.77,0.91] | 4.61x10−5 |
| n (number of events) | 7,381 (733) | | 5,155 (416) | | 5,155 (416) | |
| Nagelkerke R2 | 0.144 [0.127, 0.156]a | | 0.132 [0.111, 0.1438]a | | 0.11 [0.102, 0.129]a | |
| Nagelkerke R2 (full-reduced) | 0.016 [0.009, 0.022]a | | 0.021 [0.012, 0.028]a | | 0.023 [0.015,0.030]a | |
Model A variables: Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, age, sex, race/ethnicity, SES, education, occupation
Model B variables: Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene, age, sex, race/ethnicity, SES, education, occupation
Model C variables: Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene, age, sex, race/ethnicity

MET, Metabolic equivalent.

aCI computed by bootstrap.

The total variance explained by models A, B and C was 14.4, 13.2 and 11%, respectively. The variance explained by the tentatively validated environmental and behavioural factors in these models was 1.6, 2.1 and 2.3%, respectively. Thus, models not including trans-lycopene but built on more complete data were not inferior to models including trans-lycopene. Moreover, the model that did not consider socio-economic factors had a modestly lower R2 than the one that did (11 versus 13%). The contribution of environmental and behavioural factors was similar in models B and C. On the other hand, current smoking (P >0.6 in Models A–C, Table 5) and past smoking (Models B–C, Table 5) lost nominal significance (P >0.05) in multivariable models, indicating the correlative nature of the tentatively validated factors.
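The 'full-reduced' quantity in Table 5 is the difference between the Nagelkerke R2 of a model with all variables and that of a model with the adjustment covariates only. The sketch below illustrates the calculation under stated assumptions: it uses the usual likelihood-based definition of Nagelkerke R2, takes n to be the number of participants, and uses hypothetical log-likelihood values that are NOT taken from the paper. How the authors computed R2 for their survey-weighted Cox models is not described in this section.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R2 from the null and fitted (partial) log-likelihoods."""
    # Cox-Snell R2: 1 - exp(-LR / n), where LR = 2 * (loglik_model - loglik_null).
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Maximum attainable Cox-Snell R2, used to rescale onto [0, 1].
    max_r2 = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_r2


# Hypothetical log-likelihoods (illustration only, not from the paper).
n_participants = 5155
loglik_null = -3400.0     # no covariates
loglik_reduced = -3120.0  # adjustment covariates only
loglik_full = -3060.0     # adjustment covariates plus validated factors

r2_reduced = nagelkerke_r2(loglik_null, loglik_reduced, n_participants)
r2_full = nagelkerke_r2(loglik_null, loglik_full, n_participants)

# Variance attributed to the added factors: full minus reduced.
r2_added = r2_full - r2_reduced
print(f"reduced={r2_reduced:.3f} full={r2_full:.3f} added={r2_added:.3f}")
```

Reporting the difference rather than the full-model R2 isolates what the environmental and behavioural factors add beyond the demographic and socio-economic covariates.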

Discussion

Out of the 249 tested environmental and behavioural factors, we found that only physical activity, smoking and cadmium levels have consistent evidence for strong and validated associations with all-cause mortality. Some other factors might have been missed due to limited power. This suggests that the study of putative environmental and behavioural risk factors that regulate all-cause mortality at the general population level will require very large studies and careful validation. Given the small effect sizes of validated factors, continuing to perform modest-size studies with selective reporting of a few putative risk factors is unlikely to yield reliable and conclusive answers. Tentatively validated factors accounted for approximately 2% of the risk variance when demographic factors were accounted for, and this decreased little when socio-economic factors (income, education and occupation) were also accounted for. This suggests that little of the impact of these two modifiable behaviours out of 249 examined is explained by the few measured socio-economic forces that possibly influence physical inactivity or smoking.

Whereas we did not subject the socio-economic factors to validation, the volcano plot (Figure 2) shows that descriptively the two lowest quintiles of income were strongly associated with mortality, with a hazard ratio larger than any of the individual environmental and behavioural factors tested. This relation specific to the two lowest levels of income and mortality is consistent with prior work done on an earlier wave of NHANES14 and thus should be further investigated, given the out-of-sample replication and strength of association we observe.

Reassuringly, we were able to elicit well-known associations between smoking and physical activity and all-cause mortality. Our estimates of increased risk with current and past smoking are very similar to those of a recent meta-analysis where relative risks were 1.83 for current smokers and 1.34 for past smokers.32 Our protective estimates from physical activity are also similar to those identified by a recent large meta-analysis of 80 cohorts.33 Physical activity34 and current smoking35,36 are associated with average increase of 3–4 and decrease of 10 years in life expectancy, respectively, and physical inactivity and smoking are thought to be each responsible for approximately five million deaths worldwide. Our multivariable analysis also reiterates the combined effect of behavioural/environmental risk factors on mortality.37

Nutrition and nutrition quality have also been connected with mortality risk. We found a marker for a carotenoid nutrient (trans-lycopene) associated with all-cause mortality. Several observational studies have found correlations between carotenoid levels and mortality among the elderly, for example among women38 and among Italian individuals.39 In NHANES III, Shardell and colleagues observed a modest decrease in mortality for 2nd and 3rd quartiles of lycopene.40 However, interventions focusing on carotenoid-related nutrients have not shown any benefits in clinical trials for prevention of chronic disease and cause-specific death (e.g. ref. 41). Further, a randomized trial of a ‘tomato-rich diet’ containing high amounts of lycopene failed to change chronic disease risk profiles of 255 UK-based participants.42 Some investigations have suggested harm.43 Therefore, trans-lycopene may be a surrogate marker of other ‘healthy’ behaviours and possibly of a ‘healthy diet’ profile. It is unclear which measured or unmeasured correlate of trans-lycopene levels may be responsible for the association with mortality risk. Further, what exactly constitutes a ‘healthy diet’ is currently very difficult to define, in contrast to earlier claims.44 We have previously documented a large correlation matrix of environmental factors,10 and further studies should investigate how nutritional and other environmental and behavioural factors relate to one another8 to potentially trace sources of bias and harness confounding.

We found that urinary and serum cadmium were also associated with all-cause mortality. Tellez-Plaza et al. have reported similar results in these participants for blood and urinary cadmium on both all-cause and cardiovascular-related mortality, while adjusting for many other cardiovascular-related risk factors including smoking, cholesterol, blood pressure and medication use.45 Further studies will need to evaluate the relationship of behaviours that lead to cadmium exposure and all-cause or cause-specific mortality. For example, serum cadmium is postulated to indicate current exposure whereas urinary cadmium may reflect total body burden of cadmium, but urinary cadmium is reflective of serum levels.46 Serum cadmium levels increase as humans age, and sources include ambient air pollution (through fossil fuel combustion), diet and smoking.46 Cadmium levels were significantly correlated with smoking and age; however, the association of death risk with serum levels of cadmium was significant in the multivariable models even after smoking had been accounted for.

Our analysis on all-cause mortality has several limitations. First, to consider multiple factors in systematic and standard fashion, we had to make assumptions about what covariates to adjust for in our initial scan and replication procedure. Investigators may consider a different set of adjusting covariates specific for each factor; however, it is unclear how to attain a ‘standard’ set of covariates. We focused on a set of demographic factors (age, sex and race/ethnicity) that are impossible to modify, and on a set of socio-economic factors (income, education, occupation) that are very difficult for individuals to modify (although they are amenable to social and other multi-level interventions). Second, not all participants in NHANES have available measurements on the entire set of all factors assayed; thus, it is not possible to subject the scan to the same number of participants for each environmental and behavioural factor considered. This type of non-random missing information may lead to biased findings, as the sub-samples may not be representative of the larger sample. Although we did not detect any differences in population demographics of sub-samples, we acknowledge that missing data could have led to loss of power. Third, along these lines, our power calculations based on the available data with non-missing information suggest that we had power to detect modest relative risks, but many small effects could easily have been missed. The need to understand small effects requires a recalibration of our thinking about risk-factor epidemiology, with emphasis on very large studies and careful replication. For small effects, differentiating noise from genuine signals is difficult. Fourth, residual confounding is always possible in any observational associations, even those that are seemingly consistent and validated in different datasets.

Fifth, our data had a relatively short follow-up in both the training (median 6 years) and testing (median 3 years) surveys and lacked repeated measurements of factors to assess the longer-term risk of these factors on mortality through time. We emphasize that in an investigation of non-institutionalized people in the general population, many environmental and behavioural factors will require longer exposure and follow-up times to detect associations with mortality. Whereas factors found through this study have strong evidence for association with all-cause mortality, we cannot rule out important factors not identified by these methods or these data. Further, the deceased participants considered here are older individuals (68 and 73 years mean age of deceased participants in the 1999–2002 and 2003–04 surveys, respectively), many of whose cause of death included chronic cardiovascular-related disease, such as heart disease.

Relatedly, many environmental factors, such as infectious agents, are only applicable to a small subset of the population, have lower prevalence and/or play a role in cause-specific mortality such as cancer. Therefore, a systematic scan of factors in a general population will be underpowered to detect these putative associations in the context of all-cause mortality. Specifically, the top causes of death in the general population included cardiovascular-related diseases (e.g. ischaemic heart disease, stroke and myocardial infarction) and the findings may only be pertinent to aetiologies of these diseases. On the other hand, factors with larger effect sizes but higher FDR (outliers on the volcano plot) can be noted for further investigation. For example, outliers in this study included hepatitis B and C, factors that may cause liver cancer.47 We emphasize that factors that are not top findings in such a scan may still play a large role in mortality risk, albeit in smaller sub-populations.

Sixth, we cannot claim that our systematic scan of environmental and behavioural factors in NHANES covers the entire space of the ‘exposome’.48 The CDC and NCHS have selected an array of behavioural and environmental factors based on their prevalence, measurement feasibility and hypothesized influence on population health. Furthermore, unlike static genetic factors, there is heterogeneity in exposure or self-reported factor ascertainment, and exposures/behavioural factors will follow unique temporal patterns throughout an individual’s lifetime.49 For example, factors such as pollutants (e.g. polychlorinated biphenyls) are lipophilic, persist in fatty tissue and accrue in tissue over time.10 Other factors, such as bisphenol A, are metabolized rapidly and are short-lived, so a single measurement is informative only if individuals are assumed to be continuously exposed to the factor (e.g. ref. 50). Further, the relationships between the biomarkers and actual exposure are also difficult to surmise due to issues of sample timing and differential elimination. Self-reported dietary factors derived from a single point in time can be error-prone51,52 and there are documented examples of lack of concordance with objective indicators of intake.53,54 As a result of lack of comprehensive measures and heterogeneity, our systematic scan will have missed other candidate factors putatively associated with mortality risk.

Acknowledging these caveats, we have shown a generalized and systematic approach to identify strong and validated correlates of all-cause mortality and prioritize hypotheses regarding the association between environmental and behavioural factors and mortality. Instead of focusing on a few putative risk factors at a time, our approach gives a wider perspective about the strength of the evidence (or lack thereof) and the impact of a wide array of putative risk factors that may be possible to modify.

Supplementary Data

Supplementary data are available at IJE online.

Funding

This work was supported by the National Heart, Lung, and Blood Institute [T32 HL007034] to C.J.P., and the National Institute of Diabetes and Digestive and Kidney Diseases [K24 DK085446] to G.M.C. and [K23 DK089086] to J.T.L.

Conflict of interest: None declared.

KEY MESSAGES.

  • Identification of environmental and behavioural factors associated with mortality is critical for public health and preventive care. However, there are few investigations that systematically search for associations between environmental and behavioural factors and all-cause mortality.

  • Here, we systematically associate 249 environmental and behavioural factors, such as urinary or serum markers of environmental exposure and self-reported nutrients, with time-to-death, and were able to tentatively validate seven factors robustly associated with mortality.

  • Instead of focusing on a few risk factors at a time, our approach gives a wider perspective about the strength of the evidence (or lack thereof) and the impact of a wide array of risk factors that may be possible to modify.

References

  • 1. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA. 1993;270:2207–12.
  • 2. Mokdad AH, Marks JS, Stroup DF, Gerberding JL. Actual causes of death in the United States, 2000. JAMA. 2004;291:1238–45. doi: 10.1001/jama.291.10.1238.
  • 3. Danaei G, Ding EL, Mozaffarian D, et al. The preventable causes of death in the United States: comparative risk assessment of dietary, lifestyle, and metabolic risk factors. PLoS Med. 2009;6:e1000058. doi: 10.1371/journal.pmed.1000058.
  • 4. Wild CP. The exposome: from concept to utility. Int J Epidemiol. 2012;41:24–32. doi: 10.1093/ije/dyr236.
  • 5. Schwartz D, Collins F. Medicine. Environmental biology and human disease. Science. 2007;316:695–96. doi: 10.1126/science.1141331.
  • 6. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19:640–48. doi: 10.1097/EDE.0b013e31818131e7.
  • 7. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
  • 8. Ioannidis J, Loy EY, Poulton R, Chia KS. Researching Genetic Versus Nongenetic Determinants of Disease: A Comparison and Proposed Unification. Sci Transl Med. 2009;1:8. doi: 10.1126/scitranslmed.3000247.
  • 9. Patel CJ, Bhattacharya J, Butte AJ. An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS One. 2010;5:e10746. doi: 10.1371/journal.pone.0010746.
  • 10. Patel CJ, Cullen MR, Ioannidis JP, Butte AJ. Systematic evaluation of environmental factors: persistent pollutants and nutrients correlated with serum lipid levels. Int J Epidemiol. 2012;41:828–43. doi: 10.1093/ije/dys003.
  • 11. Tzoulaki I, Patel CJ, Okamura T, et al. A nutrient-wide association study on blood pressure. Circulation. 2012;126:2456–64. doi: 10.1161/CIRCULATIONAHA.112.114058.
  • 12. Fillenbaum GG, Burchett BM, Blazer DG. Identifying a national death index match. Am J Epidemiol. 2009;170:515–18. doi: 10.1093/aje/kwp155.
  • 13. Adler NE, Rehkopf DH. U.S. disparities in health: descriptions, causes, and mechanisms. Annu Rev Public Health. 2008;29:235–52. doi: 10.1146/annurev.publhealth.29.020907.090852.
  • 14. Rehkopf DH, Berkman LF, Coull B, Krieger N. The non-linear risk of mortality by income level in a healthy population: US National Health and Nutrition Examination Survey mortality follow-up cohort, 1988-2001. BMC Public Health. 2008;8:383. doi: 10.1186/1471-2458-8-383.
  • 15. Ainsworth BE, Haskell WL, Whitt MC, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc. 2000;32(Suppl 9):S498–504. doi: 10.1097/00005768-200009001-00009.
  • 16. US Department of Health and Human Services. 2008 Physical Activity Guidelines for Americans. 2008. Available from: http://www.health.gov/paguidelines/pdf/paguide.pdf (15 January 2013, date last accessed)
  • 17. Blanton CA, Moshfegh AJ, Baer DJ, Kretsch MJ. The USDA Automated Multiple-Pass Method accurately estimates group total energy and nutrient intake. J Nutr. 2006;136:2594–99. doi: 10.1093/jn/136.10.2594.
  • 18. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2003-2004. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2003-2004/DR1TOT_C.XPT (15 January 2013, date last accessed)
  • 19. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2001-2002. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2001-2002/DRXTOT_B.XPT (15 January 2013, date last accessed)
  • 20. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 1999-2000. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/1999-2000/DRXTOT.XPT (15 January 2013, date last accessed)
  • 21. Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight, and obesity. JAMA. 2007;298:2028–37. doi: 10.1001/jama.298.17.2028.
  • 22. Therneau T. A Package for Survival Analysis in S. R package version 2.36-14. 2012.
  • 23. Lumley T. Survey: Analysis of Complex Survey Samples. R package version 3.14; 2009.
  • 24. StataCorp. Stata Statistical Software: Release 10. 10th edn. College Station, TX: StataCorp LP; 2007.
  • 25. Patel CJ, Chen R, Kodama K, Ioannidis JP, Butte AJ. Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus. Hum Genet. 2013;132:495–509. doi: 10.1007/s00439-012-1258-z.
  • 26. Efron B. Large-Scale Inference. Cambridge, UK: Cambridge University Press; 2010.
  • 27. Gordon A. Classification. 2nd edn. Boca Raton, FL: Chapman and Hall/CRC; 1999.
  • 28. Qiu W, Chavarro J, Lazarus R, Rosner B, Ma J. powerSurvEpi: Power and sample size calculation for survival analysis of epidemiological studies. R package version 0.0.6; 2012.
  • 29. Hsieh FY, Lavori PW. Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Control Clin Trials. 2000;21:552–60. doi: 10.1016/s0197-2456(00)00104-5.
  • 30. Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39:499–503.
  • 31. Davison A, Hinkley D. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press; 1997.
  • 32. Gellert C, Schottker B, Brenner H. Smoking and all-cause mortality in older people: systematic review and meta-analysis. Arch Intern Med. 2012;172:837–44. doi: 10.1001/archinternmed.2012.1397.
  • 33. Samitz G, Egger M, Zwahlen M. Domains of physical activity and all-cause mortality: systematic review and dose-response meta-analysis of cohort studies. Int J Epidemiol. 2011;40:1382–400. doi: 10.1093/ije/dyr112.
  • 34. Moore SC, Patel AV, Matthews CE, et al. Leisure time physical activity of moderate to vigorous intensity and mortality: a large pooled cohort analysis. PLoS Med. 2012;9:e1001335. doi: 10.1371/journal.pmed.1001335.
  • 35. Jha P, Ramasundarahettige C, Landsman V, et al. 21st-century hazards of smoking and benefits of cessation in the United States. N Engl J Med. 2013;368:341–50. doi: 10.1056/NEJMsa1211128.
  • 36. Pirie K, Peto R, Reeves GK, Green J, Beral V. The 21st century hazards of smoking and benefits of stopping: a prospective study of one million women in the UK. Lancet. 2013;381:133–41. doi: 10.1016/S0140-6736(12)61720-6.
  • 37. Loef M, Walach H. The combined effects of healthy lifestyle behaviours on all cause mortality: a systematic review and meta-analysis. Prev Med. 2012;5:163–70. doi: 10.1016/j.ypmed.2012.06.017.
  • 38. Nicklett EJ, Semba RD, Xue QL, et al. Fruit and vegetable intake, physical activity, and mortality in older community-dwelling women. J Am Geriatr Soc. 2012;60:862–68. doi: 10.1111/j.1532-5415.2012.03924.x.
  • 39. Lauretani F, Semba RD, Dayhoff-Brannigan M, et al. Low total plasma carotenoids are independent predictors of mortality among older persons: the InCHIANTI study. Eur J Nutr. 2008;47:335–40. doi: 10.1007/s00394-008-0732-9.
  • 40. Shardell MD, Alley DE, Hicks GE, et al. Low-serum carotenoid concentrations and carotenoid interactions predict mortality in US adults: the Third National Health and Nutrition Examination Survey. Nutr Res. 2011;31:178–89. doi: 10.1016/j.nutres.2011.03.003.
  • 41. MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomised placebo-controlled trial. Lancet. 2002;360:23–33. doi: 10.1016/S0140-6736(02)09328-5.
  • 42. Thies F, Masson LF, Rudd A, et al. Effect of a tomato-rich diet on markers of cardiovascular disease risk in moderately overweight, disease-free, middle-aged adults: a randomized controlled trial. Am J Clin Nutr. 2012;95:1013–22. doi: 10.3945/ajcn.111.026286.
  • 43. Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C. Antioxidant supplements for prevention of mortality in healthy participants and patients with various diseases. Cochrane Database Syst Rev. 2012;3:CD007176. doi: 10.1002/14651858.CD007176.pub2.
  • 44. Hu FB, Willett WC. Optimal Diets for Prevention of Coronary Heart Disease. JAMA. 2002;288:2569–78. doi: 10.1001/jama.288.20.2569.
  • 45. Tellez-Plaza M, Navas-Acien A, Menke A, Crainiceanu CM, Pastor-Barriuso R, Guallar E. Cadmium exposure and all-cause and cardiovascular mortality in the U.S. general population. Environ Health Perspect. 2012;120:1017–22. doi: 10.1289/ehp.1104352.
  • 46.Centers for Disease Control and Prevention, Agency for Toxic Substances and Disease Registry. ATSDR – Toxicological Profile: Cadmium. 2012. http://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=48&tid=15 (15 January 2013, date last accessed) [Google Scholar]
  • 47.Altekruse SF, McGlynn KA, Reichman ME. Hepatocellular carcinoma incidence, mortality, and survival trends in the United States from 1975 to 2005. J Clin Oncol. 2009;27:1485–91. doi: 10.1200/JCO.2008.20.7753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Rappaport SM, Smith MT. Environment and Disease Risks. Science. 2010;330:460–61. doi: 10.1126/science.1192603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Athersuch TJ. The role of metabolomics in characterizing the human exposome. Bioanalysis. 2012;4:2207–12. doi: 10.4155/bio.12.211. [DOI] [PubMed] [Google Scholar]
  • 50.Calafat AM, Ye X, Wong LY, Reidy JA, Needham LL. Exposure of the U.S. population to bisphenol A and 4-tertiary-octylphenol: 2003-2004. Environ Health Perspect. 2008;116:39–44. doi: 10.1289/ehp.10753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Briefel RR, Flegal KM, Winn DM, Loria CM, Johnson CL, Sempos CT. Assessing the nation's diet: limitations of the food frequency questionnaire. J Am Diet Assoc. 1992;92:959–62. [PubMed] [Google Scholar]
  • 52.Byers T. Food frequency dietary assessment: how bad is good enough? Am J Epidemiol. 2001;154:1087–88. doi: 10.1093/aje/154.12.1087. [DOI] [PubMed] [Google Scholar]
  • 53.Brown D. Do food frequency questionnaires have too many limitations? J Am Diet Assoc. 2006;106:1541–42. doi: 10.1016/j.jada.2006.07.020. [DOI] [PubMed] [Google Scholar]
  • 54.Schatzkin A, Kipnis V, Carroll RJ, et al. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. Int J Epidemiol. 2003;32:1054–62. doi: 10.1093/ije/dyg264. [DOI] [PubMed] [Google Scholar]

Articles from International Journal of Epidemiology are provided here courtesy of Oxford University Press