Background:
Several studies have confirmed associations between air pollution and overall mortality, but it is unclear to what extent these associations reflect causal relationships. Moreover, few studies to our knowledge have accounted for complex mixtures of air pollution. In this study, we evaluate the causal effects of a mixture of air pollutants on overall mortality in a large, prospective cohort of Dutch individuals.
Methods:
We evaluated 86,882 individuals from the LIFEWORK study, assessing overall mortality between 2013 and 2017 through national registry linkage. We predicted outdoor concentration of five air pollutants (PM2.5, PM10, NO2, PM2.5 absorbance, and oxidative potential) with land-use regression. We used logistic regression and mixture modeling (weighted quantile sum and boosted regression tree models) to identify potential confounders, assess pollutants' relevance in the mixture–outcome association, and investigate interactions and nonlinearities. Based on these results, we built a multivariate generalized propensity score model to estimate the causal effects of pollutant mixtures.
Results:
Regression model results were influenced by multicollinearity. Weighted quantile sum and boosted regression tree models indicated that all components contributed to a positive linear association with the outcome, with PM2.5 being the most relevant contributor. In the multivariate propensity score model, PM2.5 (OR=1.18, 95% CI: 1.08–1.29) and PM10 (OR=1.02, 95% CI: 0.91–1.14) were associated with increased odds of mortality per interquartile range increase.
Conclusion:
Using novel methods for causal inference and mixture modeling in a large prospective cohort, this study strengthened the causal interpretation of air pollution effects on overall mortality, emphasizing the primary role of PM2.5 within the pollutant mixture.
Keywords: Air pollution, Mortality, Mixture, Interaction, Machine learning, Causal methods, Propensity score
Exposure to air pollution has been found to be associated with higher mortality rates in several studies over the last decades,1–3 and associations have been reported even at low levels of exposure.2,4–7 However, to improve our understanding of these associations and to facilitate the development of better-targeted public health regulations and interventions, it is important to determine to what extent these associations reflect causal relationships.8
When evaluating the health effects of environmental exposures such as air pollutants, it is important to account for the co-occurrence of multiple environmental constituents, present in the real world as a complex mixture.9 To evaluate the causal effects of air pollution on health, it is thus critical that studies account for this complex nature of exposure. This approach would allow identifying relevant contributors within the mixture as well as detecting potential interactions between pollutants. Several analytical methods have been proposed to deal with statistical challenges inherent to mixtures, such as co-exposure confounding, high correlation, and interaction between components of the mixture.10–12 Furthermore, regulatory policies are still mostly designed to regulate one pollutant or one source at a time, whereas more complex evaluations regarding causality may possibly lead to a more targeted regulatory policy.8 As such, there is a need to improve our understanding of the causal effects of environmental mixtures evaluated as a complex exposure situation of high-dimensional data.13,14
In this study, we investigated the effects of a mixture of five pollutants on overall mortality in a large population-based cohort of Dutch individuals in which air pollution exposure has been assessed through state-of-the-art methodologies. We adopted a pluralistic approach, exploring the pollutant mixture with targeted methods for high-dimensional exposures (boosted regression tree and weighted quantile sum models) and investigating the causal relationships between multiple pollutants and mortality with novel extensions of propensity score approaches.
MATERIAL AND METHODS
Study Participants and Outcome Definition
We used data from the LIFEWORK study, a large prospective cohort consortium comprising nearly 90,000 participants aged 18+ living in the Netherlands. LIFEWORK was designed as a federated study resulting from the integration of three existing Dutch cohorts: the Nightingale study, initiated in 2011 and the largest contributor to the LIFEWORK study (68%), the Occupational and Environmental Health Cohort Study (AMIGO) (17%) started in 2011, and the European Prospective Investigation into Cancer and Nutrition in the Netherlands (EPIC-NL) (15%), established between 1993 and 1997. Data were collected from each cohort between 2011 and 2012 (baseline questionnaires for AMIGO and Nightingale, follow-up questionnaire for EPIC-NL) and pooled to set up the LIFEWORK cohort, setting the baseline at January 1, 2013. The rationale, study design, and participant recruitment in LIFEWORK were discussed in detail elsewhere.15–18 The contributing subcohorts were approved by the local research ethics review committee or institutional review board (AMIGO and EPIC-NL Prospect by the committee at the University Medical Center Utrecht; EPIC-NL MORGEN by the committee at TNO Nutrition and Food Research; and Nightingale by the committee at the Netherlands Cancer Institute), and participants signed an informed consent form for each subcohort prior to enrolment.
From the original 88,466 LIFEWORK participants, we excluded 683 individuals with missing exposure information (their residential address either was incomplete; fell in the sea, river, or another watercourse; or at least one predictor for the land-use regression models was missing), 378 with reported emigration during the study, and 523 with no informed consent to link to the Municipal Personal Records Database (GBA). The GBA is a centralized automated population registration system that holds information on residence (home address) and date of death of people who reside in the Netherlands as well as personal data on migration. After exclusions, the total population evaluated in this study consisted of 86,882 individuals.
The outcome of interest was all-cause mortality, assessed by ascertaining vital status from the Dutch Central Bureau of Statistics (CBS) and date of death over a 5-year follow-up period (1 January 2013 to 31 December 2017) via data linkage to the GBA.
Exposure Assessment
We evaluated air pollution as a mixture of five components: particulate matter with aerodynamic diameter less than 2.5 μm (PM2.5), particulate matter with aerodynamic diameter less than 10 μm (PM10), a marker of diesel exhaust particulate (PM2.5 absorbance), nitrogen dioxide (NO2), and the oxidative potential estimated in PM2.5 by dithiothreitol.
Land-use regression models were fitted to estimate outdoor concentrations of air pollutants at the home address for each participant, combining monitoring of air pollution at different locations and predictor variables obtained from spatial data.19 Model development has been described in detail elsewhere.6 Briefly, we developed land-use regression models based upon annual average concentrations of PM2.5, PM2.5 absorbance, PM10 and NO2 measured between October 2008 and April 2011 during three 14-day periods to account for seasonal variation. We conducted measurements in 20 European study areas at 20–40 sites for PM and at 40–80 sites per area for NO2. The annual average ambient pollutant concentrations were estimated at addresses of study participants at baseline using as predictor variables data on traffic intensity, household density, land use, and other study-area variables such as altitude and distance to the sea. The median model explained variance (R²) ranged from 71% (PM2.5) to 89% (PM2.5 absorbance).5,20 Oxidative potential concentration was estimated based on a sampling period of three 2-week PM measurements carried out at 40 sites spread over the Netherlands and Belgium between February 2009 and February 2010, taking into account temporal variability. Land-use regression models for oxidative potential were estimated at participants’ addresses at baseline and achieved an R² value of 60%.21
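As a rough illustration of the land-use regression idea described above, the following Python sketch fits an ordinary least squares model of measured concentrations on spatial predictors and reports the explained variance (R²). It is a minimal sketch under stated assumptions: the function names `fit_lur` and `predict_lur` are hypothetical, the predictors are generic numeric columns, and the predictor selection and validation steps that a real land-use regression model involves are omitted.

```python
import numpy as np

def fit_lur(predictors, concentrations):
    """Fit an OLS model; return coefficients (intercept first) and R^2.

    predictors: (n_sites, n_predictors) array of spatial variables
    concentrations: (n_sites,) array of measured annual averages
    """
    predictors = np.asarray(predictors, dtype=float)
    concentrations = np.asarray(concentrations, dtype=float)
    design = np.column_stack([np.ones(len(predictors)), predictors])
    beta, *_ = np.linalg.lstsq(design, concentrations, rcond=None)
    fitted = design @ beta
    ss_res = np.sum((concentrations - fitted) ** 2)
    ss_tot = np.sum((concentrations - concentrations.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot

def predict_lur(beta, predictors):
    """Predict concentrations at new locations (e.g. home addresses)."""
    predictors = np.asarray(predictors, dtype=float)
    design = np.column_stack([np.ones(len(predictors)), predictors])
    return design @ beta
```

In this sketch, the model would be fitted on monitoring sites and then applied to the predictor values at each participant's address.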
Covariates
We selected potential confounders of the associations between air pollution and overall mortality a priori based on results from preliminary studies.5,6,20 These potential confounders included age, sex, body mass index [BMI, weight (kg)/height (m)²], cardiovascular disease (CVD) diagnosis, chronic obstructive pulmonary disease (COPD) diagnosis, cancer diagnosis, smoking status (never, former, current), highest level of education attained (low, intermediate, high), the estimated monthly household income of the neighborhood based on income data provided by CBS in 2012 (www.cbs.nl), and the normalized difference vegetation index, which quantifies vegetation density around each participant’s address based on Landsat 8 satellite images taken in 2008.22
Statistical Analysis
Descriptive statistics of the study population were evaluated overall and by levels of air pollution exposure. As the interest of this analysis was in pollutant mixtures, we identified profiles of pollutant mixture exposure through K-means cluster analysis. We evaluated the correlation between pollution components by calculating Spearman’s rank correlation coefficients.
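The two descriptive steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the Spearman computation assumes no tied values, and the k-means routine standardizes the exposures and uses a deterministic farthest-first initialization, both of which are assumptions made here for simplicity.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank correlations between the columns of X (assumes no ties)."""
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm on standardized data.

    Returns cluster labels and centroids (on the standardized scale).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each exposure (assumption)
    # Farthest-first initialization: start from the first row, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(Z[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids
```

With a matrix holding one column per pollutant, `kmeans(X, 3)` would return one exposure-profile label per participant; choosing the number of clusters is a separate step that this sketch does not cover.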
We first evaluated the association between air pollution constituents and overall mortality with classical regression models, both independently (one model for each mixture component) and with mutual adjustment for pollutants in the same statistical model. In the primary analysis, mutual adjustment was performed by considering the full set of components available in the LIFEWORK cohort. Overall mortality was evaluated as a binary outcome (dead/alive) with logistic regression, estimating odds ratios (ORs) for mortality, as well as with Poisson and Cox models to account for the duration of follow-up and for possible changes in event rates over time. A sensitivity analysis was conducted using multiple imputation by chained equations (MICE) to impute missing values in the exposures.23 Age, sex, BMI, smoking, and CVD diagnosis were specified as predictors in the algorithm for each incomplete exposure variable. An additional sensitivity analysis was performed by excluding individuals with baseline CVD diagnosis (angina, heart attack, transient ischemic attack, stroke, other heart conditions, defined according to ICD-9 and ICD-10), COPD, and cancer diagnosis. Last, we conducted a secondary analysis on overall mortality and a subset of components (NO2, PM2.5, PM10) representing a group of already regulated pollutants based on existing legislation.1
We used multiple regression models to identify confounders of the association to be evaluated in causal models. Specifically, we first evaluated a fully adjusted multiple regression model by adjusting for all covariates presented in the previous section and then removed those covariates whose exclusion did not change any exposure coefficient by more than 10%. To assess the impact of multicollinearity on multiple regression estimates, we calculated variance inflation factors (VIFs).
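The two diagnostics in this paragraph can be illustrated with a short Python sketch. It assumes the standard definition of the VIF for predictor j, namely 1/(1 − R²), where R² comes from regressing predictor j on the other predictors. It also assumes that the 10% change is measured relative to the coefficient from the fully adjusted model. The function names are hypothetical and the source does not specify either detail.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of
    column j on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def is_confounder(coef_full, coef_reduced, threshold=0.10):
    """Change-in-estimate rule.

    coef_full: exposure coefficients from the fully adjusted model
    coef_reduced: the same coefficients after dropping one covariate
    Returns True if any coefficient changes by more than `threshold`
    relative to the fully adjusted value (assumed reference).
    """
    coef_full = np.asarray(coef_full, dtype=float)
    coef_reduced = np.asarray(coef_reduced, dtype=float)
    rel_change = np.abs(coef_reduced - coef_full) / np.abs(coef_full)
    return bool(np.any(rel_change > threshold))
```

A covariate for which `is_confounder` returns `False` would be a candidate for removal under the rule described above.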
To address issues of multicollinearity and to identify pollution constituents from clusters of correlated exposures that should be included in the causal analysis, we used weighted quantile sum and boosted regression tree models. In brief, these methods are techniques used in mixture modeling to identify the relative contribution of several exposures in the overall effect between the mixture and the outcome of interest, while accounting for high correlation structures.24,25 While both correlation analysis and multivariable regression can inform on the levels of correlation, neither of them can detect which covariates within the mixture are driving the associations, and to what extent. A weighted quantile sum summarizes the mixtures with a single index estimated as a weighted linear combination of the exposures and allows identifying the relative contribution of each mixture constituent. This technique makes the assumptions of linear associations on the quantile scale and of unidirectionality (all exposures-outcome associations are either positive or negative), but directly provides an estimate of the relative percent contribution of each exposure within the mixture.24 Boosted regression tree, on the other hand, is a machine learning technique based on tree modeling that does not provide any estimate of exposures contribution but allows ranking their relative importance while relaxing assumptions of unidirectionality and linearity, strengthening the interpretation of the results from the weighted quantile sum. In addition, boosted regression tree provides a qualitative assessment of interactions' importance (through the use of the measure called H-statistics), which can be used as an exploratory tool to detect two-way or higher-order interactions that should be incorporated in subsequent analyses.25,26
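To make the weighted quantile sum index concrete: for participant i it is the weighted sum of that participant's quantile scores across exposures, with non-negative weights that sum to 1. The Python sketch below only shows how the index is formed for a given set of weights. Estimating the weights, which the method does as part of fitting the outcome model, is not shown. Quartile scoring and the function names are assumptions made for illustration.

```python
import numpy as np

def quantile_scores(X, n_quantiles=4):
    """Score each column of X as 0..n_quantiles-1 using its own empirical quantiles."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(X.shape, dtype=int)
    probs = np.linspace(0, 1, n_quantiles + 1)[1:-1]  # interior cut points
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores

def wqs_index(X, weights, n_quantiles=4):
    """Weighted quantile sum index: one value per row of X."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return quantile_scores(X, n_quantiles) @ w
```

Because the weights sum to 1, each weight can be read as the relative contribution of one exposure to the index, which is the quantity reported for PM2.5 and the other components in this study.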
To estimate the causal effects of pollutant mixture on overall mortality we used propensity score methods, building the propensity scores from the set of confounders identified in the regression modeling.27 Propensity score methods achieve balance across a set of confounders thus reducing the confounding effect in the exposure–outcome relation. To evaluate pollutants as continuous exposures, we used the generalized propensity scores extension, which handles single continuous exposures given a set of confounders,28,29 under the assumption that exposures follow a normal distribution. We first used generalized propensity scores to generate weights for each continuous exposure separately.30 Next, to account for the mixture nature of air pollution, we used the multivariate generalized propensity score, a novel extension of the generalized propensity score for multiple simultaneous continuous exposures implemented in the R package mvGPS.31 Multivariate generalized propensity score has the advantage over generalized propensity score of simultaneously estimating weights for multivariate continuous exposures that are constructed as the ratio of the marginal density of the exposures to the conditional density.31 Specifically, the multivariate generalized propensity score generates stabilized inverse probability of treatment weights (IPTWs) assuming a multivariate normal distribution for the simultaneous exposures. These weights have been shown to balance confounders and provide unbiased exposure–response estimates.32 To optimize propensity score weights and avoid possible effects due to extreme weights, the procedure allows trimming both the upper and lower bounds of the weights’ distribution.33 We conducted the main analysis using the recommended weights threshold at the 99th percentile,31 and evaluated other thresholds (0.97, 0.95) in sensitivity analyses. All analyses were conducted with the R statistical software, version 4.0.4. 
Computing code for all analyses presented is publicly available at https://github.com/andreabellavia/causalpm. The repository also presents different approaches to handling categorical confounders, an option that is not automated in the current version of the mvGPS package (1.2.1) and requires additional coding. All exposures were evaluated as continuous variables, and results indicate changes per interquartile range width (IQRw) increase in mean air pollution exposure.
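The weighting idea behind the generalized propensity score can be sketched in Python as the ratio of a marginal normal density of the exposures to their conditional normal density given the confounders. This is a simplified illustration and not the mvGPS implementation: it fits a single multivariate linear model of all exposures on one shared set of confounders, it interprets trimming as capping weights at the chosen percentile, and all function names are hypothetical.

```python
import numpy as np

def mvn_logpdf(resid, cov):
    """Row-wise log density of zero-mean multivariate normal residuals."""
    m = resid.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", resid @ np.linalg.inv(cov), resid)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + maha)

def stabilized_weights(D, C):
    """Stabilized weights f(D) / f(D | C) under (multivariate) normality.

    D: (n,) or (n, m) continuous exposures
    C: (n,) or (n, p) numeric confounders
    """
    D = np.asarray(D, dtype=float).reshape(len(D), -1)
    C = np.asarray(C, dtype=float).reshape(len(C), -1)
    design = np.column_stack([np.ones(len(D)), C])
    beta, *_ = np.linalg.lstsq(design, D, rcond=None)
    resid_cond = D - design @ beta       # residuals given confounders
    resid_marg = D - D.mean(axis=0)      # residuals around the marginal mean
    cov_cond = np.atleast_2d(np.cov(resid_cond, rowvar=False))
    cov_marg = np.atleast_2d(np.cov(resid_marg, rowvar=False))
    return np.exp(mvn_logpdf(resid_marg, cov_marg) - mvn_logpdf(resid_cond, cov_cond))

def trim_weights(w, q=0.99, both_tails=False):
    """Cap weights at the q-th quantile (and at the (1-q)-th if both_tails)."""
    w = np.asarray(w, dtype=float)
    upper = np.quantile(w, q)
    lower = np.quantile(w, 1 - q) if both_tails else w.min()
    return np.clip(w, lower, upper)
```

In an analysis of this kind, the trimmed weights would then be supplied to a weighted outcome model, for example a weighted logistic regression of mortality on the exposures.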
RESULTS
Baseline characteristics of the study population, overall and by levels of air pollution exposure, are presented in Table 1. K-means clustering identified three groups as the optimal characterization of the mixture, with the clusters summarizing levels of low, moderate, and high exposure to air pollution. Individuals with higher levels of exposure were on average older, lived in areas with lower normalized difference vegetation index, and were more likely to be smokers. The Figure presents the correlation structure between air pollution constituents at baseline, while eTable 1; http://links.lww.com/EDE/B920 provides the distribution of each pollutant at baseline. All mixture components were highly positively correlated with each other.
TABLE 1.
Characteristic | Low Exposure (N=34,018) | Moderate Exposure (N=37,853) | High Exposure (N=15,011) | Overall (N=86,882) |
---|---|---|---|---|
No. of participants (%) | | | | |
AMIGO | 22 | 15 | 10 | 17 |
EPIC-NL | 9 | 16 | 28 | 15 |
Nightingale | 69 | 69 | 62 | 68 |
Age (years) | | | | |
Mean (SD) | 48.8 (11.6) | 50.5 (12.9) | 52.2 (14.3) | 50.2 (12.7) |
Sex (%) | | | | |
Male | 12 | 9 | 9 | 11 |
Female | 88 | 91 | 91 | 89 |
Highest level of education attainedᶜ (%) | | | | |
Low | 11 | 14 | 18 | 14 |
Intermediate | 48 | 43 | 35 | 44 |
High | 41 | 43 | 47 | 42 |
Missing | 0.2 | 0.2 | 0.3 | 0.2 |
Smoking status (%) | | | | |
Never | 48 | 47 | 44 | 46 |
Former | 40 | 40 | 40 | 40 |
Current | 11 | 12 | 14 | 13 |
Missing | 0.6 | 1.0 | 1.7 | 1.0 |
Body mass index (kg/m²) | | | | |
Mean (SD) | 25.2 (4.16) | 25.3 (4.30) | 25.2 (4.41) | 25.3 (4.26) |
Missing (%) | 0.4 | 0.6 | 0.9 | 0.6 |
CVD diagnosis at baseline (%) | | | | |
Negative | 93 | 93 | 90 | 92 |
Positive | 7 | 7 | 10 | 8 |
COPD diagnosis at baseline (%) | | | | |
Negative | 98 | 97 | 96 | 97 |
Positive | 2 | 3 | 4 | 3 |
Cancer diagnosis at baseline (%) | | | | |
Negative | 98 | 97 | 95 | 97 |
Positive | 2 | 3 | 5 | 3 |
Estimated monthly household income | | | | |
Mean (SD) | 2,590 (764) | 2,800 (890) | 2,870 (1,000) | 2,730 (873) |
Missing (%) | 5.1 | 3.4 | 3.3 | 4.0 |
Normalized difference vegetation index | | | | |
Mean (SD) | 0.571 (0.0844) | 0.503 (0.0796) | 0.448 (0.0864) | 0.520 (0.0943) |
Missing (%) | 2.6 | 1.2 | 0.9 | 1.7 |
NO2 (μg/m³) | | | | |
Mean (SD) | 17.7 (2.55) | 24.9 (2.25) | 33.7 (4.30) | 23.6 (6.34) |
PM2.5 (μg/m³) | | | | |
Mean (SD) | 16.3 (0.721) | 16.7 (0.559) | 17.0 (0.707) | 16.6 (0.704) |
PM2.5 absorbance (10⁻⁵ m⁻¹) | | | | |
Mean (SD) | 1.09 (0.132) | 1.29 (0.125) | 1.57 (0.227) | 1.26 (0.225) |
PM10 (μg/m³) | | | | |
Mean (SD) | 24.1 (0.381) | 24.7 (0.643) | 26.4 (1.42) | 24.8 (1.12) |
Oxidative potential (nmol DTT/min/m³) | | | | |
Mean (SD) | 1.06 (0.208) | 1.22 (0.165) | 1.32 (0.119) | 1.17 (0.202) |
ᵃAir pollution levels were estimated at baseline based on annual average concentrations measured between October 2008 and April 2011 (NO2, PM2.5, PM2.5 absorbance, PM10) and between February 2009 and February 2010 (oxidative potential).
ᵇLow, moderate, and high levels of exposure were derived with cluster analysis.
ᶜLow: primary school, lower vocational training or lower secondary education; intermediate: intermediate vocational education or intermediate/higher secondary education; high: higher vocational education or university degree.
ᵈHousehold income was estimated based on participants’ baseline postal code. Each postal code was linked to income data from Statistics Netherlands for December 2012.
CVD, cardiovascular disease; COPD, chronic obstructive pulmonary disease; SD, standard deviation.
During 5 years of follow-up, we observed 1071 deaths (1.2%). Results from logistic regression models are reported in Table 2 and eTable 2; http://links.lww.com/EDE/B920. Out of all potential confounders evaluated in fully adjusted models, only age, sex, BMI, smoking, and baseline CVD diagnosis met the criteria for confounding to be selected for inclusion in the final model (referred to, in tables, as minimally adjusted model). When mutually adjusting the full set of air pollution constituents in the same statistical model, both PM2.5 and PM10 were associated with higher odds of mortality (respectively, OR=1.17, 95% CI: 0.99–1.37; OR=1.21, 95% CI: 1.03–1.42), even though VIFs for these coefficients were relatively high (Table 2). PM2.5 absorbance was associated with a reduction in the odds of mortality, but the extremely high VIF associated with this coefficient suggests that this result might be due to (multi)collinearity. Results from the multivariable logistic regression model using MICE to impute the missing exposures showed no discrepancies from findings on complete cases (eTable 3; http://links.lww.com/EDE/B920). When mutually adjusting the models for a subset of air pollution constituents represented by NO2, PM2.5, and PM10, both PM2.5 (OR=1.03, 95% CI: 0.94–1.14) and PM10 (OR=1.06, 95% CI: 0.95–1.17) showed a positive, albeit much weaker, association with overall mortality (eTable 4; http://links.lww.com/EDE/B920). We observed negligible differences when excluding individuals with baseline CVD, and when using Poisson (data not shown) or Cox models (eTable 5; http://links.lww.com/EDE/B920). We, therefore, chose to only present results from logistic regression, as this allows a direct comparison with the statistical methods we used in our study to explore causal relationships, for which time-to-event models are not currently available.
TABLE 2.
Multivariable Model With Minimal Adjustmentᵃ
Constituent | OR | 95% CI | VIF |
---|---|---|---|
NO2 | 0.98 | (0.82–1.18) | 5.11 |
PM2.5 | 1.17 | (0.99–1.37) | 4.03 |
PM2.5 absorbance | 0.74 | (0.55–0.98) | 18.60 |
PM10 | 1.21 | (1.03–1.42) | 7.22 |
Oxidative potential | 1.07 | (0.96–1.19) | 1.58 |
ᵃAdjusted for age, sex, BMI, smoking, and CVD diagnosis.
BMI, body mass index; CVD, cardiovascular disease; OR, odds ratio; CI, confidence interval; VIF, variance inflation factor.
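To help read the VIF column, the following worked calculation assumes the standard definition of the VIF in terms of the R² obtained when one predictor is regressed on all the others. The source does not state how the VIFs were computed, so this is an interpretive aid and not a description of the authors' procedure.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}

% PM2.5 absorbance (VIF = 18.60):
R^2 = 1 - \frac{1}{18.60} \approx 0.946,
\qquad \sqrt{18.60} \approx 4.31

% Oxidative potential (VIF = 1.58):
R^2 = 1 - \frac{1}{1.58} \approx 0.367
```

Under this definition, about 95% of the variance in PM2.5 absorbance would be explained by the other model terms, and the standard error of its coefficient would be roughly 4.3 times larger than it would be without that correlation. Oxidative potential is far less affected.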
To evaluate the mixture of pollutants while accounting for the strong correlations, we estimated the relative contribution of each exposure in the mixture–outcome association with boosted regression tree and weighted quantile sum models. In the boosted regression tree model, which provides a nonparametric estimation that accounts for nonlinearities and interactions, all measures of H-statistics were consistently low, indicating a negligible impact of interactions in the mixture–outcome association (eFigure 1; http://links.lww.com/EDE/B920), and confirmed that exposure–response relationships were mostly linear and positive or null for all mixture components (data not shown). As such, weighted quantile sum assumptions were met, and this method could be used to provide an accurate estimate of the relative importance of the mixture components. Estimates of weighted quantile sum weights, presented in eFigure 2; http://links.lww.com/EDE/B920, show a prominent role of PM2.5 in the association, greatly surpassing the contribution of PM10 and other components of the mixture. Moreover, the negligible weight associated with PM2.5 absorbance indicates that the negative association observed in multiple regression for that variable is likely due to (multi)collinearity. The association between the overall mixture and mortality, estimated by the weighted quantile sum index, was negligible in our population (β=0.01, 95% CI: −0.03 to 0.04) (eFigure 3; http://links.lww.com/EDE/B920).
Based on results from multiple regression and mixture modeling, we built propensity score models using the minimal set of confounders (age, sex, BMI, smoking, CVD diagnosis), and all exposures were included in the models as continuous covariates, thus evaluating their linear effect on the outcome. Furthermore, based on results from boosted regression tree and weighted quantile sum models, we excluded PM2.5 absorbance from the analysis to limit the impact of multicollinearity on the results.
Table 3 presents results from the univariate and multivariate generalized propensity score models, with the recommended weights trimming at 0.99. All exposures met the normality distribution assumption required by these techniques. PM2.5 was associated with increased odds of mortality (OR=1.18, 95% CI: 1.08–1.29). PM10 was also associated with increased odds of mortality, even though the coefficient was attenuated (OR=1.02, 95% CI: 0.91–1.14) as compared to those from the multiple regression model. Results that considered alternative trimming are shown in eTable 6 (http://links.lww.com/EDE/B920) and indicate no discrepancies with the main finding.
TABLE 3.
Constituent | GPS OR | GPS 95% CI | mvGPS OR | mvGPS 95% CI |
---|---|---|---|---|
NO2 | 1.10 | (1.01–1.19) | 1.13 | (0.97–1.31) |
PM2.5 | 1.11 | (1.03–1.20) | 1.18 | (1.08–1.29) |
PM10 | 1.08 | (1.02–1.15) | 1.02 | (0.91–1.14) |
Oxidative potential | 1.09 | (1.00–1.19) | 0.97 | (0.89–1.06) |
Trimming 0.99.
PS based on age, sex, BMI, smoking, CVD diagnosis.
BMI, body mass index; CVD, cardiovascular disease; OR, odds ratio; CI, confidence interval; GPS, generalized propensity score; mvGPS, multivariate generalized propensity score.
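The odds ratios in this study are expressed per interquartile range width increase in exposure. As a small illustration of how a per-unit log-odds coefficient translates into such an estimate, the Python sketch below rescales a coefficient and its Wald confidence limits by a chosen increment. The function name and its arguments are hypothetical, the 1.96 multiplier assumes a normal approximation, and the increment is assumed to be positive.

```python
import math

def or_per_increment(beta, se, increment, z=1.96):
    """Odds ratio and Wald confidence interval for an `increment`-unit increase.

    beta: log-odds coefficient per one unit of exposure
    se: standard error of beta
    increment: positive exposure contrast, e.g. the interquartile range width
    """
    estimate = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return estimate, lower, upper
```

Rescaling by the interquartile range width puts pollutants measured on very different scales on a comparable footing, which is why it is a common reporting choice for mixtures.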
DISCUSSION
In this study, conducted on a large sample of individuals from the Dutch general population, we observed positive associations between air pollution mixtures and all-cause mortality, with PM2.5 being the main driver of the associations. Through the application of causal modeling approaches for environmental mixtures, we strengthened the causal interpretation of these findings, observing a strong effect of PM2.5 and a moderate effect of PM10.
Our findings are in line with results from previous studies.7,34,35 Such comparisons should take into account that the Netherlands is characterized by relatively homogeneous geographic conditions, owing to its small land area and high population density compared with other regions of the world. In this regard, a recent systematic review supporting the derivation of updated World Health Organization (WHO) guidelines on PM exposure and mortality highlighted the importance of considering the heterogeneity of study location and population characteristics, as well as the level and composition of PM, among other factors, when interpreting and comparing results from different studies.36
The potential harmful effects of air pollution on overall mortality have been the primary focus of extensive research over the last decades.1–4 Associations have been repeatedly observed all over the world, and recent studies have also suggested that associations might follow linear relationships where even low levels of pollution might be harmful to health.2,7,37 Nevertheless, several research gaps in air pollution epidemiology remain to be addressed. First, air pollution is a complex exposure that should be characterized as a mixture, with different components and constituents possibly operating through either similar or different biologic pathways in the human body.38–43 Extensive work has been devoted to the development of high-resolution concentration surfaces of the different components and constituents of the complex ambient air pollution exposure.24–31 Epidemiologic studies, however, mostly evaluate air pollution components one by one, and switching the focus to air pollution as an environmental mixture has been advocated.44 Second, to improve our understanding of the mechanisms through which air pollution operates and to allow the development of more stringent public health regulations and interventions, it is important to determine to what extent these associations reflect causal relationships.8 Methods to address causality in observational studies are widely available,45,46 and several reports have discussed the application of these techniques in air pollution epidemiology.13,14 It is also desirable that such causal modeling approaches account for the complex nature of air pollution as a mixture.13,14
To the best of our knowledge, this study is one of the first attempts to assess the causal effects of a mixture of air pollutants in a large population-based study. Our results confirm previous findings observed in this and other cohorts, showing a positive linear association between pollution components such as PM2.5 and PM10 and overall mortality. In addition, by jointly evaluating several components in the same statistical framework, we observed that PM2.5 seems to be the strongest predictor of overall mortality and that interactive mechanisms were not influential in our cohort. The possible mechanisms through which PM2.5 operates are increased systemic inflammation and oxidative stress, increased blood pressure, and reduced lung function, thus resulting in a greater risk of cardiovascular and respiratory morbidity.37 Results are consistent across the different methods applied, with the largest effect on overall mortality obtained for PM2.5 using the multivariate generalized propensity score. This method possibly provides, on theoretical grounds, more robust estimates compared to both the univariable and multivariable logistic regression, and the univariate generalized propensity score. However, because few studies have previously applied this extension of the propensity score in epidemiologic settings, our findings cannot be directly compared with those obtained in other cohorts, and this result must be interpreted with caution. The 2019 Integrated Science Assessment (ISA) released by the US Environmental Protection Agency (EPA) rated the association between PM10 and natural-cause mortality as suggestive,47 contrary to PM2.5, which was already fully recognized as harmful to human health. Our results, by distinguishing the roles of PM10 and PM2.5 and showing the prominent role of the latter in our study population, can inform future public health policies.
This study has several strengths. First, it is one of the first studies to evaluate the causal effects of air pollution while jointly evaluating several pollutant components as an environmental mixture. Specifically, we used a recent extension of the generalized propensity score, the multivariate generalized propensity score approach, which, to our knowledge, has never been used before in environmental epidemiology. Although it assumes that all evaluated exposures are normally distributed, the multivariate score improves on other approaches in several respects. To begin with, the propensity score is a balancing score, which means that conditioning on the propensity score (for example, via regression adjustment) implies that individuals within the same stratum of the propensity score should have the same distribution of observed characteristics, regardless of their level of treatment.28,29 Thanks to this balancing property, the propensity score removes confounding by the measured covariates and returns valid estimates by balancing the covariates that predict the probability of exposure.27 Moreover, the multivariate generalized propensity score approach can simultaneously estimate propensity score weights for all exposures, thus achieving superior balance compared to univariate alternatives. In addition, through the multivariate score, it is possible to specify a different set of confounders for each exposure of interest, reflecting the many real-world settings in which the confounders differ across exposure variables. Finally, the option to trim extreme weights at a particular percentile and the wide range of metrics that can be used to select and compare different propensity score approaches make the multivariate generalized propensity score well suited to obtaining more robust estimates of the joint effect of multiple continuous exposures on health outcomes, confirming and possibly strengthening results obtained with more traditional methods.
We recommend that future studies validate our results in other cohorts with this or alternative causal modeling techniques. Second, we used a pluralistic approach integrating several statistical methods for causal inference and environmental mixtures.46 To identify relevant predictors within the air pollution mixture, we used two statistical methods, namely weighted quantile sum and boosted regression tree, that allow ranking the importance of exposures in the overall mixture–outcome association, thus indicating which regression results might be biased due to the high correlation. In this study, multiple regression results were influenced by (multi)collinearity due to the high correlation structure, particularly for PM2.5 absorbance, which was shown to be mostly irrelevant in the mixture–outcome association once the high correlation was accounted for. Third, we used data from a large population of Dutch individuals with a prospective design and a high-resolution assessment of air pollution components, all elements that further enhance the robustness of our results and the causal interpretation of these findings.
A limitation of this study is the relatively short duration of follow-up, which did not allow us to thoroughly evaluate how the effects of air pollution may change over time. Future studies with longer follow-up should replicate these analyses and evaluate overall mortality as a time-to-event outcome for those statistical techniques where this extension is available. Moreover, no information was available on air pollution levels other than those modeled at the participants’ home addresses, which precluded quantifying exposure in places where participants may have spent part of their day or while they were moving from one place to another. Furthermore, information on emigration time was not available for the majority of participants who emigrated during follow-up, so these individuals had to be excluded from the analysis. In addition, although several sociodemographic covariates were available and could be investigated as potential confounders of the associations, we cannot exclude residual confounding by variables that were not available in this study. Exposures were derived using land-use regression models, which might introduce additional complexity because shared predictors may lead to stronger correlations between exposures than those existing in the real world.48 In large cohorts such as ours, it is usually difficult or impossible to directly measure the different pollutants for each participant because of the logistical complexity and high costs involved, and it is therefore common to rely on exposure modeling. This is consistent with the WHO, which describes exposure modeling as a logical or empirical construct that allows the estimation of individual or population exposure parameters from available input data.49
Finally, in this first attempt to evaluate the causal effects of an air pollution mixture, we focused only on the five major components of air pollution that had been assessed in this cohort. Future studies within LIFEWORK should consider a finer characterization of pollution, once available, by integrating additional components into the models, such as ultrafine particles, black carbon, and PM elemental constituents. Future studies could also expand the analyses to include additional environmental risk factors (water pollution, noise, electromagnetic fields) and relevant conditions, such as lung cancer or respiratory diseases, making use of the statistical methods we proposed in our study to account for complex interrelations between risk factors in real-life settings. These results should also encourage quantitative researchers to study and develop novel methods that could improve our understanding of the causal effects of complex mixtures of environmental pollutants.
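The concern noted above, that land-use regression models built on shared predictors can make modeled exposures more strongly correlated than the true exposures, can be illustrated with a small simulation. This is a toy sketch only: the predictor names (`traffic`, `population`), coefficients, sample sizes, and noise levels are hypothetical and are not taken from the LIFEWORK exposure models.

```python
import numpy as np

rng = np.random.default_rng(1)
n_addresses, n_sites = 10000, 100

# Hypothetical land-use predictors shared by both pollutant models.
traffic = rng.normal(size=n_addresses)
population = rng.normal(size=n_addresses)
X = np.column_stack([np.ones(n_addresses), traffic, population])

# "True" exposures: driven by the shared predictors plus independent
# local variation that the land-use predictors do not capture.
true_a = 1.0 * traffic + 0.5 * population + rng.normal(size=n_addresses)
true_b = 0.8 * traffic + 0.7 * population + rng.normal(size=n_addresses)

# Fit one regression per pollutant on a small set of monitoring sites.
sites = rng.choice(n_addresses, size=n_sites, replace=False)
coef_a, *_ = np.linalg.lstsq(X[sites], true_a[sites], rcond=None)
coef_b, *_ = np.linalg.lstsq(X[sites], true_b[sites], rcond=None)

# Predict both pollutants at every address from the same predictors.
pred_a = X @ coef_a
pred_b = X @ coef_b

corr_true = np.corrcoef(true_a, true_b)[0, 1]
corr_pred = np.corrcoef(pred_a, pred_b)[0, 1]
print("correlation between true exposures:   ", corr_true)
print("correlation between modeled exposures:", corr_pred)
```

Because the modeled exposures are both linear combinations of the same predictors and omit the independent local variation, their correlation exceeds that of the simulated true exposures, which is one way multicollinearity among modeled pollutants can be amplified.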
In conclusion, this study strengthened the causal interpretation of air pollution effects on mortality while also accounting for the complex nature of the exposure as an environmental mixture. We encourage air pollution researchers to further study the causal effects of air pollution mixtures, both to continue improving scientific knowledge of the relationship between air pollution and health outcomes and to help governmental bodies better target regulations by identifying the strongest contributor(s) to overall mortality within a complex mixture.
ACKNOWLEDGMENTS
The authors are greatly indebted to all LIFEWORK participants. They are also grateful to Inka Pieterson at IRAS for the data management.
Footnotes
We acknowledge financial support from the EXPANSE (EC H2020, grant agreement No 874627) and EXPOSOME-NL. EXPOSOME-NL is funded through the Gravitation program of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant number 024.004.017).
The authors report no conflicts of interest.
Data sharing statement: Researchers interested in collaboration are invited to propose occupational and environmental research based on the data available within LIFEWORK or to submit a request for additional data collection. Requests can be submitted to R.C.H. Vermeulen (r.c.h.vermeulen@uu.nl) and will be reviewed by the LIFEWORK scientific board.
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).
REFERENCES
- 1. World Health Organization. Ambient (Outdoor) Air Quality and Health. Fact Sheet No. 313. Geneva: World Health Organization. 2015.
- 2. Di Q, Wang Y, Zanobetti A, et al. Air pollution and mortality in the Medicare population. N Engl J Med. 2017;376:2513–2522.
- 3. Brook RD, Rajagopalan S, Pope CA, 3rd, et al.; American Heart Association Council on Epidemiology and Prevention, Council on the Kidney in Cardiovascular Disease, and Council on Nutrition, Physical Activity and Metabolism. Particulate matter air pollution and cardiovascular disease: An update to the scientific statement from the American Heart Association. Circulation. 2010;121:2331–2378.
- 4. Wei Y, Yazdi MD, Di Q, et al. Emulating causal dose-response relations between air pollutants and mortality in the Medicare population. Environ Health. 2021;20:53.
- 5. Beelen R, Hoek G, Vienneau D, et al. Development of NO2 and NOx land use regression models for estimating air pollution exposure in 36 study areas in Europe – The ESCAPE project. Atmos Environ. 2013;72:10–23.
- 6. Chen J, Rodopoulou S, de Hoogh K, et al. Long-term exposure to fine particle elemental components and natural and cause-specific mortality-a pooled analysis of eight European cohorts within the ELAPSE Project. Environ Health Perspect. 2021;129:47009.
- 7. Strak M, Weinmayr G, Rodopoulou S, et al. Long term exposure to low level air pollution and mortality in eight European cohorts within the ELAPSE project: pooled analysis. BMJ. 2021;374:n1904.
- 8. Health Effects Institute (HEI). Strategic plan for understanding the health effects of air pollution. 2020–2025. First Draft May 2019. Available online: https://www.healtheffects.org/sites/default/files/First-Draft-HEI-StrategicPlan2020-2025.pdf. 2019.
- 9. Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology. 2010;21:187–194.
- 10. Taylor KW, Joubert BR, Braun JM, et al. Statistical approaches for assessing health effects of environmental chemical mixtures in epidemiology: lessons from an innovative workshop. Environ Health Perspect. 2016;124:A227–A229.
- 11. Billionnet C, Sherrill D, Annesi-Maesano I; GERIE study. Estimating the health effects of exposure to multi-pollutant mixture. Ann Epidemiol. 2012;22:126–141.
- 12. Stafoggia M, Breitner S, Hampel R, Basagaña X. Statistical approaches to address multi-pollutant mixtures and multiple exposures: the state of the science. Curr Environ Health Rep. 2017;4:481–490.
- 13. Dominici F, Zigler C. Best practices for gauging evidence of causality in air pollution epidemiology. Am J Epidemiol. 2017;186:1303–1309.
- 14. Carone M, Dominici F, Sheppard L. In pursuit of evidence in air pollution epidemiology: the role of causally driven data science. Epidemiology. 2020;31:1–6.
- 15. Beulens JW, Monninkhof EM, Verschuren WM, et al. Cohort profile: the EPIC-NL study. Int J Epidemiol. 2010;39:1170–1178.
- 16. Pijpe A, Slottje P, van Pelt C, et al. The Nightingale study: rationale, study design and baseline characteristics of a prospective cohort study on shift work and breast cancer risk among nurses. BMC Cancer. 2014;14:47.
- 17. Slottje P, Yzermans CJ, Korevaar JC, Hooiveld M, Vermeulen RC. The population-based Occupational and Environmental Health Prospective Cohort Study (AMIGO) in the Netherlands. BMJ Open. 2014;4:e005858.
- 18. Reedijk M, Lenters V, Slottje P, et al. Cohort profile: LIFEWORK, a prospective cohort study on occupational and environmental risk factors and health in the Netherlands. BMJ Open. 2018;8:e018504.
- 19. Hoek G, Beelen R, de Hoogh K, et al. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008;42:7561–7578.
- 20. Eeftens M, Beelen R, de Hoogh K, et al. Development of land use regression models for PM(2.5), PM(2.5) absorbance, PM(10) and PM(coarse) in 20 European study areas; results of the ESCAPE project. Environ Sci Technol. 2012;46:11195–11205.
- 21. Yang A, Wang M, Eeftens M, et al. Spatial variation and land use regression modeling of the oxidative potential of fine particles. Environ Health Perspect. 2015;123:1187–1192.
- 22. Rhew IC, Vander Stoep A, Kearney A, Smith NL, Dunbar MD. Validation of the normalized difference vegetation index as a measure of neighborhood greenness. Ann Epidemiol. 2011;21:946–952.
- 23. van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R. J Stat Soft. 2011;45:1–67.
- 24. Carrico C, Gennings C, Wheeler DC, Factor-Litvak P. Characterization of weighted quantile sum regression for highly correlated data in a risk analysis setting. J Agric Biol Environ Stat. 2015;20:100–120.
- 25. Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees. Environ Health. 2014;13:57.
- 26. Bellavia A, Dickerson AS, Rotem RS, Hansen J, Gredal O, Weisskopf MG. Joint and interactive effects between health comorbidities and environmental exposures in predicting amyotrophic lateral sclerosis. Int J Hyg Environ Health. 2021;231:113655.
- 27. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
- 28. Imai K, van Dyk DA. Causal inference with general treatment regimes. J Am Stat Assoc. 2004;99:854–866.
- 29. Hirano K, Imbens GW. The Propensity Score with Continuous Treatments. In Wiley Series in Probability and Statistics (eds. Gelman A, Meng X.-L.) 73–84 (John Wiley & Sons, Ltd, 2005). doi: 10.1002/0470090456.ch7.
- 30. Greifer N. WeightIt: weighting for covariate balance in observational studies. R package version 0.1.0. 2017.
- 31. Williams JR, Crespi CM. Causal inference for multiple continuous exposures via the multivariate generalized propensity score. arXiv preprint arXiv:2008.13767. 2020.
- 32. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560.
- 33. Lee BK, Lessler J, Stuart EA. Weight trimming and propensity score weighting. PLoS One. 2011;6:e18174.
- 34. Pinault L, Tjepkema M, Crouse DL, et al. Risk estimates of mortality attributed to low concentrations of ambient fine particulate matter in the Canadian community health survey cohort. Environ Health. 2016;15:18.
- 35. Cohen AJ, Brauer M, Burnett R, et al. Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the Global Burden of Diseases Study 2015. Lancet. 2017;389:1907–1918.
- 36. Chen J, Hoek G. Long-term exposure to PM and all-cause and cause-specific mortality: a systematic review and meta-analysis. Environ Int. 2020;143:105974.
- 37. Shi L, Zanobetti A, Kloog I, et al. Low-Concentration PM2.5 and mortality: estimating acute and chronic effects in a population-based study. Environ Health Perspect. 2016;124:46–52.
- 38. Pearce JL, Waller LA, Mulholland JA, et al. Exploring associations between multipollutant day types and asthma morbidity: epidemiologic applications of self-organizing map ambient air quality classifications. Environ Health. 2015;14:55.
- 39. Austin E, Coull B, Thomas D, Koutrakis P. A framework for identifying distinct multipollutant profiles in air pollution data. Environ Int. 2012;45:112–121.
- 40. Gass K, Klein M, Chang HH, Flanders WD, Strickland MJ. Classification and regression trees for epidemiologic research: an air pollution example. Environ Health. 2014;13:17.
- 41. Pearce JL, Waller LA, Chang HH, et al. Using self-organizing maps to develop ambient air quality classifications: a time series example. Environ Health. 2014;13:56.
- 42. Winquist A, Kirrane E, Klein M, et al. Joint effects of ambient air pollutants on pediatric asthma emergency department visits in Atlanta, 1998-2004. Epidemiology. 2014;25:666–673.
- 43. Zanobetti A, Austin E, Coull BA, Schwartz J, Koutrakis P. Health effects of multi-pollutant profiles. Environ Int. 2014;71:13–19.
- 44. Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology. 2010;21:187–194.
- 45. Rothman KJ, Greenland S. Causation and causal inference in epidemiology. Am J Public Health. 2005;95 Suppl 1:S144–S150.
- 46. Vandenbroucke JP, Broadbent A, Pearce N. Causality and causal inference in epidemiology: the need for a pluralistic approach. Int J Epidemiol. 2016;45:1776–1786.
- 47. US EPA. Integrated science assessment for particulate matter (final report). December 2009.
- 48. Szpiro AA, Paciorek CJ. Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics. 2013;24:501–517.
- 49. World Health Organization. Regional Office for Europe. Methods of assessing risk to health from exposure to hazards released from waste landfills. https://apps.who.int/iris/handle/10665/108362. 2000.