Abstract
Objective
To demonstrate an application of Bayesian model averaging (BMA) with generalised additive mixed models (GAMM) and provide a novel modelling technique to assess the association between inhalable coarse particles (PM10) and respiratory mortality in time-series studies.
Design
A time-series study using regional death registry data from 2009 to 2010.
Setting
8 districts in a large metropolitan area in Northern China.
Participants
9559 permanent residents of the 8 districts who died of respiratory diseases between 2009 and 2010.
Main outcome measures
Per cent increase in daily respiratory mortality rate (MR) per interquartile range (IQR) increase of PM10 concentration and corresponding 95% confidence interval (CI) in single-pollutant and multipollutant (including NOx, CO) models.
Results
The Bayesian model averaged GAMM (GAMM+BMA) and the optimal GAMM of PM10, multipollutants and principal components (PCs) of multipollutants showed comparable results for the effect of PM10 on daily respiratory MR, that is, one IQR increase in PM10 concentration corresponded to 1.38% vs 1.39%, 1.81% vs 1.83% and 0.87% vs 0.88% increase, respectively, in daily respiratory MR. However, GAMM+BMA gave slightly but noticeably wider CIs for the single-pollutant model (−1.09 to 4.28 vs −1.08 to 3.93) and the PCs-based model (−2.23 to 4.07 vs −2.03 to 3.88). The CIs of the multipollutant model from the two methods were similar, that is, −1.12 to 4.85 versus −1.11 to 4.83.
Conclusions
The BMA method may represent a useful tool for modelling uncertainty in time-series studies when evaluating the effect of air pollution on fatal health outcomes.
Keywords: Bayesian model averaging, Generalized additive mixed model, PM<sub>10</sub>, Respiratory mortality, Time-series study, Model uncertainty
Strengths and limitations of this study.
Provides a novel modelling technique that accounts for the model uncertainty arising from knot selection in conventional GAMM when assessing the association between air pollutants and adverse health outcomes.
Provides robust effect estimates from time-series studies of PM10 and fatal health outcomes.
Uncertainty from variable selection and other sources was not investigated.
No lag effect of PM10 on respiratory mortality was examined.
Introduction
Numerous time-series studies have indicated a positive association of ambient inhalable coarse particles, including particulate matter with diameters >2.5 µm and <10 µm (PM2.5−10) and PM10, with daily respiratory death counts.1–6 In time-series studies, one major methodological concern is potential confounding by factors that vary on timescales similar to those of the pollutant concentrations or the outcome. Although time-series studies have substantially strengthened the evidence base for the adverse health effect of PM10, methodological development of time-series studies to better adjust for confounding is still fully justified.7 The impact of potential confounders, for example, weather and time, on the association of PM10 with mortality or other health outcomes can be non-linear and may vary with season. Thus, a wide variety of approaches have been developed and applied in modelling and estimating the non-linear functions of continuous confounders in recent years. Prominent examples are smoothing splines,8 penalised basis splines,9 adaptive regression splines10 11 and local polynomials.12 These methods allow for greater flexibility in data modelling, because they relax the linearity assumption traditionally required in standard parametric methods.
The generalised additive mixed model (GAMM), an extension of the generalised additive model (GAM), has become a widely used method for evaluating the short-term effects of air pollution. Its ability to accommodate serial correlation and spatial designs makes it popular in environmental epidemiological studies.13 However, there are often several competing models to select from (ie, models using different combinations of confounders), the process of selecting the best model varies between analyses, and it is difficult to evaluate the final selection against the alternatives. Model uncertainty can be substantial, as different model selections might lead to largely different conclusions, but the classical approach of conditioning on a single presumed model often ignores or underestimates such uncertainty.14
One common way to address the problem is to conduct a ‘sensitivity analysis’ using a range of plausible models to investigate the robustness of the estimates.15 However, this does not incorporate model uncertainty into the effect estimates because it still requires selection of a single final model. To address this, researchers have proposed the Bayesian model averaging (BMA) method for assessing the triggering effect of air pollution on mortality.16 BMA is a technique designed to account for the uncertainty of the model selection process. By averaging over many competing models, BMA incorporates model uncertainty into parameter estimation and prediction. BMA has been applied successfully in many statistical model classes, including linear regression, generalised linear models, Cox regression models and discrete graphical models, in all cases improving predictive performance.17
Our study re-examined the association between the concentration of PM10 (including PM2.5−10) and the daily respiratory mortality rate (MR) from a time-series study design.1 The goal of this paper is to demonstrate application of BMA within the GAMM frame and provide novel modelling techniques for time-series studies. We first demonstrated the application of the BMA method in GAMM. Second, we compared the estimates of three modelling techniques, that is, generalised linear mixed model (GLMM), optimal GAMM and Bayesian model averaged GAMM (GAMM+BMA). The study was approved by the Institutional Review Board of Basic Medical Sciences, Chinese Academy of Medical Sciences, China.
Materials and methods
The data used in this study included the number of daily respiratory deaths, air quality and meteorological conditions from 1 January 2009 to 31 December 2010 in the eight districts of a metropolitan area in Northern China that had air quality monitoring stations. The geographic location of the eight districts was published elsewhere.1 Respiratory mortality data were obtained from the regional Causes of Death Registry (CDR). All deaths in the CDR were coded according to the 10th revision of the International Classification of Diseases (ICD-10). The data collection was described in detail elsewhere.1 18 In this study, ICD-10 codes J00–J98 were used to identify deaths due to respiratory diseases. In total, 10.38 million permanent residents and 9559 respiratory deaths were included in the study. Air quality data included the concentrations of PM10, nitrogen oxides (NOx) and carbon monoxide (CO). The daily concentrations of these pollutants were calculated as the average of 24 hourly measurements. To adjust for the effect of weather conditions, data on meteorological conditions, including mean daily temperature, relative humidity, wind speed and barometric pressure during the study period, were obtained from the local meteorological administration. Previous studies conducted in the same area mostly used citywide average pollutant concentrations,19 20 whereas our study used station-specific pollutant concentrations measured at 11 stations in the study area. The spatial distribution of these monitoring stations over the districts was described elsewhere.1
The daily numbers of respiratory deaths in the eight districts were assumed to follow a quasi-Poisson distribution, which accounts for overdispersion by relaxing the assumption that the variance equals the mean.21 Given the non-linear relationship of the daily number of respiratory deaths to calendar day, temperature and barometric pressure, we used GAMM to account for these non-parametric components and the district-level random effect. Natural splines were used to fit the non-linear trend of the mortality, adjusting for potential confounders, that is, meteorological conditions and day of the week (DOW). The full single-pollutant GAMM included PM10, relative humidity, wind speed, DOW, smoothing functions for calendar day, temperature and barometric pressure, as well as a random effect of districts, and can be expressed as:
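$$\log E(y_{i,t}) = \beta_0 + \beta\,\mathrm{PM}_{10} + \beta_1\,\text{humidity} + \beta_2\,\text{wind speed} + \mathrm{DOW} + s(\text{day}, n_1) + s(\text{temperature}, n_2) + s(\text{barometric pressure}, n_3) + Z_i\,\mathrm{District}_i \tag{1}$$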
where E(yi,t) is the expected number of deaths in district i on the t-th day, DOW is a dummy variable for day of the week, Districti is a dummy variable for the eight districts and Zi is a random intercept for district i. The s(.) terms are smoothing functions realised by natural cubic splines with n1 knots per year to adjust for the long-term temporal trend, n2 knots for temperature and n3 knots for barometric pressure.22 Although a natural cubic spline offers less flexibility at the limits, where the second derivatives are constrained to zero, this constraint avoids the inflated variance that an unconstrained cubic spline shows near the limits.23 We used the annual average population size of each district as the offset in the Poisson regression model.24 We also used a multipollutant model that included NOx and CO to adjust for potential confounding from other pollutants. The optimal GAMM, with the most appropriate number of knots for calendar day, temperature and barometric pressure, was determined by minimising the Akaike information criterion (AIC).25
However, because we usually have rather limited knowledge about seasonal or longer time trend in the mortality time series, knot selection in GAMM might be complicated and oversmoothing or undersmoothing the series may potentially attenuate a true pollution effect.7 In such a situation, the BMA method provides a plausible solution to incorporate the model uncertainty derived from the knot selection. The basic idea behind BMA is summarised as follows.17
Considering model Mk having the structure given by equation 1, if β is the coefficient of interest, then its posterior distribution given data D is
$$\mathrm{pr}(\beta \mid D) = \sum_{k=1}^{K} \mathrm{pr}(\beta \mid M_k, D)\,\mathrm{pr}(M_k \mid D) \tag{2}$$
This is an average of the posterior distributions under each of the GAMM models considered, weighted by their posterior model probability. In equation (2), M1, …, MK are the models considered. The posterior probability for the model Mk is given by
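$$\mathrm{pr}(M_k \mid D) = \frac{\mathrm{pr}(D \mid M_k)\,\mathrm{pr}(M_k)}{\sum_{l=1}^{K} \mathrm{pr}(D \mid M_l)\,\mathrm{pr}(M_l)} \tag{3}$$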
where Mk is one of the potential underlying models for data D with a prior probability pr(Mk) that it is true, and
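$$\mathrm{pr}(D \mid M_k) = \int \mathrm{pr}(D \mid \beta_k, M_k)\,\mathrm{pr}(\beta_k \mid M_k)\,\mathrm{d}\beta_k \tag{4}$$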
is the integrated likelihood of model Mk. In equation (4), βk is the coefficient of model Mk, pr(βk|Mk) is the prior density of βk under model Mk, pr(D|βk, Mk) is the likelihood and all probabilities are implicitly conditional on all models being considered.
The posterior mean and variance of β are defined as:
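$$E[\beta \mid D] = \sum_{k=1}^{K} \hat{\beta}_k\,\mathrm{pr}(M_k \mid D) \tag{5}$$
and
$$\mathrm{Var}[\beta \mid D] = \sum_{k=1}^{K} \left(\mathrm{Var}[\beta \mid D, M_k] + \hat{\beta}_k^{2}\right)\mathrm{pr}(M_k \mid D) - E[\beta \mid D]^{2} \tag{6}$$
where $\hat{\beta}_k = E[\beta \mid D, M_k]$.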
The 95% Bayesian credible interval (CI) of β is
$$E[\beta \mid D] \pm 1.96\sqrt{\mathrm{Var}[\beta \mid D]} \tag{7}$$
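Equations 5 to 7 can be illustrated with a minimal Python sketch. The function name `bma_summary` and the three-model inputs are hypothetical and are not taken from the study; the sketch only assumes that each model supplies a coefficient estimate, its variance (squared SE) and a posterior model probability.

```python
import math

def bma_summary(estimates, variances, weights):
    """Combine per-model results following equations 5 to 7.

    estimates: per-model coefficient estimates (beta_hat_k)
    variances: per-model variances of the coefficient (squared SEs)
    weights:   posterior model probabilities pr(M_k | D), summing to 1
    """
    if not math.isclose(sum(weights), 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError("posterior model probabilities must sum to 1")
    # Equation 5: weighted average of the per-model estimates.
    mean = sum(w * b for w, b in zip(weights, estimates))
    # Equation 6: weighted second moment minus the squared posterior mean.
    second_moment = sum(
        w * (v + b * b) for w, v, b in zip(weights, variances, estimates)
    )
    var = second_moment - mean * mean
    # Equation 7: normal-approximation 95% interval.
    half_width = 1.96 * math.sqrt(var)
    return mean, var, (mean - half_width, mean + half_width)

# Hypothetical three-model example (not the study data).
mean, var, interval = bma_summary(
    estimates=[0.10, 0.20, 0.30],
    variances=[0.01, 0.01, 0.01],
    weights=[0.25, 0.50, 0.25],
)
```

In this toy example every model has a within-model variance of 0.01, yet the averaged variance is larger because equation 6 also counts the spread of the estimates between models.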
The posterior probability pr(Mk|D) in the above formula for each model was estimated by:26
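$$\mathrm{pr}(M_k \mid D) \approx \frac{\exp\left\{-\tfrac{1}{2}\left(\mathrm{BIC}_k - \overline{\mathrm{BIC}}\right)\right\}\mathrm{pr}(M_k)}{\sum_{l=1}^{K} \exp\left\{-\tfrac{1}{2}\left(\mathrm{BIC}_l - \overline{\mathrm{BIC}}\right)\right\}\mathrm{pr}(M_l)} \tag{8}$$
where $\mathrm{BIC}_k$ is the Bayesian information criterion (BIC) of model k, which exacts a penalty according to the number of terms in the model, and $\overline{\mathrm{BIC}}$ is the average of $\mathrm{BIC}_l$ (l=1,…, K). $\hat{\beta}_k$ in equation 5 and $\mathrm{Var}[\beta \mid D, M_k]$ in equation 6 denote the posterior mean and variance of the parameter of interest; the estimates obtained from existing functions for fitting GAMMs may not correspond exactly to this posterior mean and variance. In addition, we use BIC in equation 8 to approximate Bayes factors. Thus, the method presented here is an empirical approximation to a fully Bayesian form of model averaging, bridging classical frequentist and Bayesian estimation methods.27 28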
In our study, the prior probability pr(Mk) was assumed to be from the uniform distribution, thus:
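$$\mathrm{pr}(M_k) = \frac{1}{K}, \qquad \mathrm{pr}(M_k \mid D) \approx \frac{\exp\left\{-\tfrac{1}{2}\left(\mathrm{BIC}_k - \overline{\mathrm{BIC}}\right)\right\}}{\sum_{l=1}^{K} \exp\left\{-\tfrac{1}{2}\left(\mathrm{BIC}_l - \overline{\mathrm{BIC}}\right)\right\}} \tag{9}$$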
According to equation 5, the posterior mean is averaged over all models, with weights that depend on the degree to which the data support each model. The weight given by equation 8 incorporates the BIC, which penalises the dimensionality of the model. Therefore, the best-fitting models are weighted heavily in equation 5 through equation 8, which in turn also favours parsimonious models.
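The BIC-based weighting can be sketched in a few lines of Python. The function name `bic_weights` is ours and is not part of the R code used in the study. The inputs are the BIC values printed in table 4, and the only assumption is that each of the nine distinct values is shared by the three barometric-pressure knot choices, as the table shows.

```python
import math

def bic_weights(bics, priors=None):
    """Posterior model probabilities from BIC values (BIC approximation)."""
    k = len(bics)
    if priors is None:
        priors = [1.0 / k] * k  # uniform prior over the K models
    centre = sum(bics) / k      # mean BIC; the constant cancels in the ratio
    raw = [math.exp(-0.5 * (b - centre)) * p for b, p in zip(bics, priors)]
    total = sum(raw)
    return [r / total for r in raw]

# BIC values as printed in table 4: nine distinct values, each repeated for
# the three barometric-pressure knot choices (27 models in total).
distinct_bics = [15230.75, 15230.51, 15230.16, 15230.39, 15230.21,
                 15229.85, 15230.46, 15230.27, 15229.91]
bics = [b for b in distinct_bics for _ in range(3)]
weights = bic_weights(bics)
```

Centring on the mean BIC leaves the weights unchanged but keeps the exponentials close to 1, which avoids numerical underflow when BIC values are in the tens of thousands.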
To consider correlations between PM10 and CO and NOx, we introduced principal components (PCs) derived from principal component analysis (PCA) into the multipollutant models to exclude the impacts of collinearity between the three pollutants.18 We then transformed the regression coefficients of the PCs back to the regression coefficients of the original pollutants.
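The back-transformation from PC coefficients to pollutant coefficients can be sketched as follows. The text does not spell out the formula, so this is an assumption: pollutants are standardised before PCA, and each pollutant's coefficient is the loading-weighted sum of the PC coefficients divided by that pollutant's SD. All numbers and the function name `back_transform` are hypothetical.

```python
def back_transform(loadings, pc_coefs, sds):
    """Map coefficients of principal components back to the original pollutants.

    loadings[j][m]: loading of pollutant j on component m
    pc_coefs[m]:    regression coefficient of component m
    sds[j]:         standard deviation used to standardise pollutant j
    """
    return [
        sum(load * coef for load, coef in zip(row, pc_coefs)) / sd
        for row, sd in zip(loadings, sds)
    ]

# Hypothetical values for three pollutants and two retained components.
loadings = [[0.50, 0.85], [0.62, -0.35], [0.60, -0.40]]
pc_coefs = [0.020, -0.005]
means = [110.0, 65.0, 1.3]
sds = [60.0, 40.0, 0.8]

betas = back_transform(loadings, pc_coefs, sds)

# Check on one hypothetical day: both parameterisations contribute the same
# amount to the linear predictor.
x = [150.0, 80.0, 1.9]
z = [(xi - m) / s for xi, m, s in zip(x, means, sds)]
scores = [sum(zj * row[m] for zj, row in zip(z, loadings)) for m in range(2)]
via_pcs = sum(coef * score for coef, score in zip(pc_coefs, scores))
via_betas = sum(b * (xi - m) for b, xi, m in zip(betas, x, means))
```

The check at the end confirms that the PC-based and pollutant-based forms of the linear predictor agree, which is the property the back-transformation relies on.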
All analyses in our study were conducted in the statistical software Stata (V.14.1; StataCorp LP, College Station, Texas, USA) and using R software (V.3.2.3) packages ‘mgcv’, ‘splines’ and ‘lme4’. Since <2% of the observations in the data set were incomplete, the listwise deletion method was used to handle missing values.
Results
The daily respiratory mortality rate (per 100 000 persons) and PM10, NOx and CO concentrations of the eight districts are shown in table 1. The highest daily mortality rates were found in districts 3 (median=0.24; IQR=0.24) and 8 (median=0.25; IQR=0.25). During the 2-year study period, the annual median concentrations for PM10, NOx and CO were 106.0 μg/m3, 61.0 μg/m3 and 1.20 mg/m3, respectively. The annual median concentrations of PM10 and NOx were above the limits of Class II of the National Ambient Air Quality Standards of China (70 μg/m3 for PM10 and 50 μg/m3 for NOx), but that for CO was below the national limit (4 mg/m3).29
Table 1.
Daily respiratory mortality rate and PM10 concentrations by districts in the study area, 2009–2010
Districts | Population (in 1000) | Mortality rate (1/100 000 persons), median | Mortality rate (1/100 000 persons), P25–P75 | PM10 (μg/m3), median | PM10 (μg/m3), P25–P75 | NOx (μg/m3), median | NOx (μg/m3), P25–P75 | CO (mg/m3), median | CO (mg/m3), P25–P75 |
---|---|---|---|---|---|---|---|---|---|
District 1 | 896 | 0.11 | 0–0.22 | 94.0 | 57–138 | 52.0 | 33–78 | 1.20 | 0.8–1.7 |
District 2 | 3001 | 0.10 | 0.06–0.13 | 106.5 | 67–151 | 72.0 | 50.5–109.5 | 1.30 | 0.85–1.9 |
District 3 | 851 | 0.24 | 0.12–0.35 | 110.3 | 73.5–159 | 70.5 | 50.5–107.5 | 1.38 | 1.0–2.1 |
District 4 | 2814 | 0.07 | 0.04–0.14 | 112.0 | 71–154 | 79.0 | 52–116 | 1.20 | 0.8–2.0 |
District 5 | 316 | 0.00 | 0–0.32 | 82.5 | 49–124 | 33.0 | 23–53 | 1.00 | 0.6–1.4 |
District 6 | 546 | 0.18 | 0–0.18 | 129.0 | 83–174 | 60.0 | 44–88 | 1.40 | 1.0–2.0 |
District 7 | 736 | 0.00 | 0–0.14 | 108.5 | 66–154 | 52.0 | 37–75 | 0.90 | 0.6–1.4 |
District 8 | 1218 | 0.25 | 0.08–0.33 | 105.5 | 68.5–150.5 | 73.0 | 53–107.5 | 1.35 | 0.95–2.0 |
Total | 10 378 | 0.11 | 0–0.22 | 106.0 | 66–150 | 61.0 | 41–93 | 1.20 | 0.8–1.8 |
P25, the 25th percentile; P75, the 75th percentile.
The meteorological conditions of the same period in the study area are shown in table 2. The mean temperature was 13.0°C ranging from −12.5°C to 34.5°C, the mean relative humidity was 51.0% ranging from 13.0% to 92.0% and the mean barometric pressure was 101.2 kPa ranging from 99.0 to 103.7 kPa, a characteristic of a typical subhumid warm continental monsoon climate.
Table 2.
Meteorological conditions in the study area 2009–2010
Variable | Mean | SD | Min | Q1 | Median | Q3 | Max |
---|---|---|---|---|---|---|---|
Air temperature (°C) | 13.0 | 11.7 | −12.5 | 1.7 | 14.7 | 24.3 | 34.5 |
Wind speed (m/s) | 2.2 | 1.0 | 0.5 | 1.5 | 2.1 | 2.7 | 6.4 |
Relative humidity (%) | 51.0 | 19.2 | 13.0 | 35.0 | 52.0 | 67.0 | 92.0 |
Barometric pressure (kPa) | 101.2 | 1.0 | 99.0 | 100.4 | 101.1 | 102.0 | 103.7 |
The pairwise Pearson's correlation coefficients between pollutants and meteorological conditions are shown in table 3. We observed strong linear correlation between temperature and barometric pressure (r=−0.83, p<0.001). To control for the collinearity, we included temperature, relative humidity and wind speed but not barometric pressure in the conventional GLMM analysis.
Table 3.
Pairwise Pearson correlation coefficients between pollutants and meteorological conditions
Variable | PM10 | NOx | CO | Temperature | Barometric pressure | Relative humidity |
---|---|---|---|---|---|---|
NOx | 0.4780* | | | | | |
CO | 0.5532* | 0.8210* | | | | |
Temperature | −0.0157* | −0.3206* | −0.2939* | | | |
Barometric pressure | −0.1845* | 0.1535* | 0.0892* | −0.8266* | | |
Relative humidity | 0.2178* | 0.1699* | 0.3215* | 0.3258* | −0.3121* | |
Wind speed | −0.1413* | −0.4626* | −0.4800* | −0.0668* | 0.0509* | −0.4859* |
*p<0.05.
The observed and predicted (based on quasi-Poisson distribution) numbers of daily respiratory deaths of the eight districts between 2009 and 2010 corresponded well, with no zero-inflation observed, supporting the distributional assumptions in the GAMM analysis (figure 1). There was a clear temporal trend in the daily number of respiratory deaths (figure 2A), and the relationships of the daily number of respiratory deaths with temperature and barometric pressure were non-linear (figure 2C, D), supporting the choice of including them as smoothing functions in GAMM. There was a moderate linear relationship of daily average relative humidity to daily number of respiratory deaths (figure 2E), and we therefore included daily average relative humidity as a linear component in the model. However, we did not observe a clear linear or non-linear relationship between daily number of respiratory deaths and wind speed (figure 2F). As a result, we included wind speed as a linear component first and as a smoothing function later in a sensitivity analysis.
Figure 1.
Observed proportion and predicted probability based on Poisson distribution of number of daily respiratory deaths in the study area between 2009 and 2010.
Figure 2.
Relationship between number of daily respiratory deaths and (A) days; (B) PM10 concentrations and (C–F) meteorological conditions. Lowess, locally weighted scatterplot smoothing.
We also examined the seasonality of respiratory mortality using autocorrelation function (ACF). The slow decrease in autocorrelation from lag 1 to lag 50 shown in the correlogram (figure 3A) indicates that there is some temporal trend in the mortality series. We removed this trend by regressing the mortality against a smoothing function of time. The correlogram of the residuals after removing the seasonality (figure 3B) shows substantially less autocorrelation. Thus, season is an important factor related strongly to PM10, meteorological conditions and mortality as shown in figures 2 and 3.
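The ACF check described above can be sketched in Python. The series is simulated and hypothetical, and a centred moving average is used only as a simple stand-in for the smoothing function of time that the study regressed mortality on.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]

def moving_average_residuals(series, half_window):
    """Subtract a centred moving average (stand-in for a smooth of time)."""
    out = []
    for t in range(half_window, len(series) - half_window):
        window = series[t - half_window : t + half_window + 1]
        out.append(series[t] - sum(window) / len(window))
    return out

# Hypothetical daily series: a slow trend plus independent noise.
rng = random.Random(1)
raw = [5 + 0.1 * t + rng.uniform(-1, 1) for t in range(200)]
resid = moving_average_residuals(raw, half_window=7)

raw_acf = acf(raw, 10)      # decays slowly because of the trend
resid_acf = acf(resid, 10)  # close to zero beyond lag 0 once the trend is removed
```

A slowly decaying correlogram for the raw series and a nearly flat one for the residuals is the same pattern reported for panels A and B of figure 3.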
Figure 3.
ACF for respiratory mortality for (A) raw data and (B) residuals after removing seasonality. ACF, autocorrelation functions.
There were moderate to strong correlations between PM10 and CO as well as NOx (table 3), and we introduced the first and the second PCs derived from PCA into the multipollutant models. They accounted for 94.22% of variation of the three pollutants.
For GAMM analyses, we used different combinations of numbers of knots (from 3 or 4 to 24 knots for each smoothing function of calendar day, temperature and barometric pressure) and show the results of the combinations with relatively large posterior probabilities around the best model, that is, 12, 14 and 16 knots for calendar day; 5, 6 and 7 knots for temperature; and 4, 5 and 6 knots for barometric pressure. Knot combinations with convergence problems or extremely small posterior probabilities were excluded. The estimated coefficients, their corresponding SEs as well as the AICs and BICs of the 27 considered versions of the single-pollutant (ie, PM10) GAMM are shown in table 4. The estimated regression coefficients of PM10 changed little with different numbers of knots for barometric pressure but increased with the number of knots for temperature. However, when the number of knots for calendar day changed from 12 to 14 and from 14 to 16, the regression coefficients of PM10 showed a slight U-shape (table 4).
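The 27 knot combinations can be enumerated with a short Python sketch; the variable names are ours. It assumes the model numbering of table 4, in which the number of knots for barometric pressure varies fastest.

```python
from itertools import product

day_knots = (12, 14, 16)       # knots for calendar day (D)
temperature_knots = (5, 6, 7)  # knots for temperature (T)
pressure_knots = (4, 5, 6)     # knots for barometric pressure (P)

# product() varies the last factor fastest, matching the model order in table 4.
grid = list(product(day_knots, temperature_knots, pressure_knots))
models = {number: knots for number, knots in enumerate(grid, start=1)}
```

With this numbering, `models[16]` is the combination (14, 7, 4).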
Table 4.
Coefficients of PM10 of GAMMs for single-pollutant with different knots
Model | Number of knots (D, T, P) | Coefficient ($\hat{\beta}$) | SE | AIC | BIC | Posterior probability |
---|---|---|---|---|---|---|
1 | 12, 5, 4 | 0.0001609643 | 0.0001499442 | 15063.93 | 15230.75 | 0.028997 |
2 | 12, 5, 5 | 0.0001609382 | 0.0001499458 | 15063.93 | 15230.75 | 0.028997 |
3 | 12, 5, 6 | 0.0001609357 | 0.0001499460 | 15063.93 | 15230.75 | 0.028997 |
4 | 12, 6, 4 | 0.0001611031 | 0.0001497939 | 15063.70 | 15230.51 | 0.032694 |
5 | 12, 6, 5 | 0.0001611031 | 0.0001497939 | 15063.70 | 15230.51 | 0.032694 |
6 | 12, 6, 6 | 0.0001611030 | 0.0001497939 | 15063.70 | 15230.51 | 0.032694 |
7 | 12, 7, 4 | 0.0001649193 | 0.0001497870 | 15063.35 | 15230.16 | 0.038947 |
8 | 12, 7, 5 | 0.0001649172 | 0.0001497871 | 15063.35 | 15230.16 | 0.038947 |
9 | 12, 7, 6 | 0.0001649211 | 0.0001497870 | 15063.35 | 15230.16 | 0.038947 |
10 | 14, 5, 4 | 0.0001592525 | 0.0001503313 | 15063.58 | 15230.39 | 0.034716 |
11 | 14, 5, 5 | 0.0001592566 | 0.0001503310 | 15063.58 | 15230.39 | 0.034716 |
12 | 14, 5, 6 | 0.0001592535 | 0.0001503312 | 15063.58 | 15230.39 | 0.034716 |
13 | 14, 6, 4 | 0.0001607594 | 0.0001500942 | 15063.40 | 15230.21 | 0.037985 |
14 | 14, 6, 5 | 0.0001607662 | 0.0001500939 | 15063.40 | 15230.21 | 0.037985 |
15 | 14, 6, 6 | 0.0001607661 | 0.0001500939 | 15063.40 | 15230.21 | 0.037985 |
16 | 14, 7, 4 | 0.0001646505 | 0.0001500780 | 15063.04 | 15229.85 | 0.045477 |
17 | 14, 7, 5 | 0.0001646507 | 0.0001500780 | 15063.04 | 15229.85 | 0.045477 |
18 | 14, 7, 6 | 0.0001646507 | 0.0001500780 | 15063.04 | 15229.85 | 0.045477 |
19 | 16, 5, 4 | 0.0001633165 | 0.0001502163 | 15063.64 | 15230.46 | 0.033522 |
20 | 16, 5, 5 | 0.0001633165 | 0.0001502164 | 15063.65 | 15230.46 | 0.033522 |
21 | 16, 5, 6 | 0.0001633165 | 0.0001502164 | 15063.65 | 15230.46 | 0.033522 |
22 | 16, 6, 4 | 0.0001650914 | 0.0001499675 | 15063.46 | 15230.27 | 0.036863 |
23 | 16, 6, 5 | 0.0001650913 | 0.0001499675 | 15063.46 | 15230.27 | 0.036863 |
24 | 16, 6, 6 | 0.0001650912 | 0.0001499675 | 15063.46 | 15230.27 | 0.036863 |
25 | 16, 7, 4 | 0.0001690050 | 0.0001499511 | 15063.10 | 15229.91 | 0.044133 |
26 | 16, 7, 5 | 0.0001690088 | 0.0001499509 | 15063.10 | 15229.91 | 0.044133 |
27 | 16, 7, 6 | 0.0001690044 | 0.0001499512 | 15063.10 | 15229.91 | 0.044133 |
AIC, Akaike information criterion; BIC, Bayesian information criterion; D, day; GAMMs, generalised additive mixed models; P, barometric pressure; T, temperature.
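As a worked illustration of equation 5, the following Python sketch averages the PM10 coefficients of table 4 using the printed posterior probabilities. The values are copied from the table. Dividing by the sum of the weights is our own adjustment to absorb rounding in the printed probabilities.

```python
# (coefficient of PM10, posterior probability) for models 1 to 27 of table 4.
table4 = [
    (0.0001609643, 0.028997), (0.0001609382, 0.028997), (0.0001609357, 0.028997),
    (0.0001611031, 0.032694), (0.0001611031, 0.032694), (0.0001611030, 0.032694),
    (0.0001649193, 0.038947), (0.0001649172, 0.038947), (0.0001649211, 0.038947),
    (0.0001592525, 0.034716), (0.0001592566, 0.034716), (0.0001592535, 0.034716),
    (0.0001607594, 0.037985), (0.0001607662, 0.037985), (0.0001607661, 0.037985),
    (0.0001646505, 0.045477), (0.0001646507, 0.045477), (0.0001646507, 0.045477),
    (0.0001633165, 0.033522), (0.0001633165, 0.033522), (0.0001633165, 0.033522),
    (0.0001650914, 0.036863), (0.0001650913, 0.036863), (0.0001650912, 0.036863),
    (0.0001690050, 0.044133), (0.0001690088, 0.044133), (0.0001690044, 0.044133),
]

total_weight = sum(w for _, w in table4)
# Equation 5: posterior-probability-weighted average of the coefficients.
beta_bma = sum(b * w for b, w in table4) / total_weight
# First model with the highest posterior probability (model 16 in table 4).
best = max(table4, key=lambda row: row[1])
```

The averaged coefficient lies close to the coefficient of the highest-probability model, which is consistent with the similar point estimates of the optimal GAMM and GAMM+BMA.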
Estimated increases in respiratory MR for the single-pollutant, multipollutant and PCA-based multipollutant models are presented in figure 4. The same knot combination for calendar day, temperature and barometric pressure was optimal across the single-pollutant and the multipollutant models. The results of the GLMMs, optimal GAMMs and GAMM+BMA are presented in table 5. Only the GLMM of the single-pollutant model showed a statistically significant association between PM10 and daily number of respiratory deaths; it also gave the largest effect of PM10, a 3.07% (95% CI 0.91% to 5.27%) increase in daily respiratory MR per IQR increase in PM10 concentration. GAMM+BMA and the optimal GAMM of the single-pollutant, multipollutant and PCA-based multipollutant models showed comparable results for the effect of PM10 on daily respiratory MR, that is, one IQR increase in PM10 concentration corresponded to 1.38% vs 1.39%, 1.81% vs 1.83% and 0.87% vs 0.88% increase, respectively, in daily respiratory MR (table 5). However, by incorporating the uncertainty in knot selection, GAMM+BMA gave slightly but noticeably wider CIs for the single-pollutant model (−1.09 to 4.28 vs −1.08 to 3.93) and the PCA-based model (−2.23 to 4.07 vs −2.03 to 3.88). The CIs of the multipollutant model from the two methods were similar, that is, −1.12 to 4.85 versus −1.11 to 4.83. These results indicate that BMA provides inference about parameters that takes account of different knot selection strategies, which can be an important source of uncertainty in additive model analysis. In our example, single model-based CIs tended to be narrower, which might increase the probability of false-positive findings.
Figure 4.
Estimated per cent increase in daily respiratory deaths per IQR increase in PM10 concentration in GLMM, optimal GAMM, GAMMs with different knots in day, temperature and pressure (indicated by D, T and P) and GAMM+BMA for single pollutant, multiple pollutants and PCA. BMA, Bayesian model averaging; GAMM, generalised additive mixed model; GLMM, generalised linear mixed model; PCA, principal component analysis.
Table 5.
Per cent increase in daily respiratory MR associated with an IQR increase in PM10 concentration from GLMM, optimal GAMM and GAMM+BMA
Model | Single-pollutant, per cent | Single-pollutant, 95% CI | Multipollutant, per cent | Multipollutant, 95% CI | Multipollutant (PCA), per cent | Multipollutant (PCA), 95% CI |
---|---|---|---|---|---|---|
GLMM | 3.07 | (0.91 to 5.27) | 1.94 | (−0.80 to 4.75) | 1.47 | (−1.17 to 4.17) |
Optimal GAMM* | 1.39 | (−1.08 to 3.93) | 1.83 | (−1.11 to 4.83) | 0.88 | (−2.03 to 3.88) |
GAMM+BMA | 1.38 | (−1.09 to 4.28) | 1.81 | (−1.12 to 4.85) | 0.87 | (−2.23 to 4.07) |
*Knots for days, temperature and barometric pressure are 14, 7 and 4, respectively.
BMA, Bayesian model averaging; GAMM, generalised additive mixed model; GLMM, generalised linear mixed model; MR, mortality rate; PCA, principal component analysis.
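The conversion from a log-scale coefficient to the per cent increase per IQR can be sketched in Python. The function name `percent_increase` is ours. The IQR of 84 μg/m3 is an assumption taken from the Total row of table 1 (P75 − P25 = 150 − 66), and the coefficient and SE are those of model 16 in table 4.

```python
import math

def percent_increase(beta, se, iqr, z=1.96):
    """Per cent change in the mortality rate per IQR increase, with 95% CI."""
    def to_percent(log_rate_ratio):
        return 100.0 * (math.exp(log_rate_ratio) - 1.0)

    centre = beta * iqr
    half_width = z * se * iqr
    return (
        to_percent(centre),
        to_percent(centre - half_width),
        to_percent(centre + half_width),
    )

# Model 16 of table 4 (knots 14, 7, 4). The IQR of 84 is an assumption based on
# the Total row of table 1, not a value stated explicitly in the text.
estimate, lower, upper = percent_increase(0.0001646505, 0.0001500780, iqr=84)
```

Under these assumptions the result rounds to 1.39 (−1.08 to 3.93), which agrees with the optimal GAMM entry for the single-pollutant model in table 5.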
In addition, the effect of the first PC in GAMM and GAMM+BMA was statistically significant (data not shown), potentially indicating a joint effect of PM10, NOx and CO on respiratory mortality. For sensitivity analysis, including wind speed as a linear component or a smoothing function in GAMM changed the results little (data not shown).
Discussion
Yang et al1 and Zhang et al30 had examined the triggering effect of PM10 on respiratory mortality in the same area previously. However, the studies were based on GAM. Although GAM is a powerful method for modelling non-linear effects of continuous covariates in regression models with non-Gaussian response, it cannot account for the between-cluster heterogeneity and within-cluster correlation of the pollutant concentrations. In recent years, smoothing based mixed model, that is, GAMM and its extensions have gained popularity in part for its ability to account for such limitations.13 31–33 Using a subset of multisite time-series data from Yang et al's1 study, we reinvestigated the associations between short-term exposure to ambient inhalable coarse particles PM10 and daily number of deaths from respiratory diseases. By adding the random effect of districts to the additive predictors, we used GAMM to provide a unified likelihood framework for non-parametric regression for potentially correlated exposures.13
Standard statistical methods for estimating the association between air pollutants and adverse health outcomes often fail to incorporate the model uncertainty in effect estimates. For example, knot selection for splines used in GAMM is critical, because it determines the degree of smoothness in the smoothing function of time as well as the amount of residual temporal variation in mortality. In studies using GAMM to examine effect of pollutants, we usually have rather limited knowledge about the complexity of the seasonal and long-term trends in the mortality time series or in the pollution time series. Although there are often biological or mechanistic information that is applied, current approaches in choosing the number and location of knots are mainly data-driven34 35 and are based on prior knowledge of the timescales where confounding is more likely to occur. More importantly, the single model selection at the end that ignores the entire process of getting to the final model and the failure to account for this uncertainty might bias our judgment.
Thus, there is concern about underestimated uncertainty in model selection in time-series air pollution studies. If a single ‘best’ model was used, the variance estimates for its coefficients will not fully reflect their true uncertainties.36 A coherent and conceptually simple way to take into account model uncertainty when making reference is BMA. In theory, BMA provides better average predictive performance than any single model, and this theoretical result has now been supported in practice in a range of applications involving different model classes and types of data.17 While BMA is an attractive solution to the problem of model uncertainty, it is not yet part of the standard data analysis tools in epidemiological studies due to several practical difficulties, including impractical exhaustive summation of posterior distribution, the computational difficulty of integrals and the challenge of specifying prior distribution over competing models. In addition, owing to lack of official solution for GAMM in common commercial statistical software, model fit criteria such as the BIC are not handily available,26 making it difficult to construct the model weight to weigh estimated coefficients of the individual models. Although Whitney and Ngo26 demonstrated the application of BMA in the GAM frame, they only examined the piece-wise linear relationship between air pollutants and mortality, essentially limiting their algorithm to a GLMM.
In our study, we demonstrated the feasibility of implementing BMA in GAMM frame in R software environment. For single-pollutant and multipollutant models, our optimal GAMM and GAMM+BMA gave equivalent point estimates for effect of PM10 on respiratory MR. Compared with previously published studies, our GAMM+BMA method for single-pollutant gave comparable results (per cent increases ranging from 0.87 to 1.38 vs 1.01 to 2.071 18 30 37 38). However, our multipollutant models showed smaller effect of PM10, which was consistent with previous findings suggesting that the effect of PM10 in multipollutant models was about two to three times smaller1 18 or slightly reversed.30 Although the relative increase of MR is rather small, taking into account the huge population (>10 million) in the study area, it still raises a severe public health challenge.
However, our main interest was not the estimated coefficients but the uncertainty of the estimation from different modelling strategies. The GAMM+BMA gave slightly but noticeable wider confidence even when only considering the model uncertainty from the knot. The averaged estimate across a set of potential valid models would derive more robust interval estimation for reducing the type I error. Future studies may use our technique to build a model-averaged dose–response relationship between PM10 and daily mortality to account for model uncertainty with respect to location and number of knots. Our method allows for a fully parametric characterisation of the effect of air pollutants on adverse health outcomes as well as adjusts for the non-parametric effects of other covariates.
There are also limitations in our study. First, we only considered the uncertainty from the knots selection but not from the selection of covariates and confounders, which is another major source of the uncertainty in estimating the effect of PM10. However, this limitation can be easily overcome by including different covariates and smoothing functions in the GAMM. Second, GAMM sometimes demonstrates problems in convergence and SE estimation, which might result in the failure of coefficient estimation. The data with many zeros can cause such problems in a model with a log link, because a mean of zero corresponds to an infinite range of linear predictor values. Fundamentally, it is due to lack of identifiability, and it is the case in our study. However, the problem can be addressed using stricter convergence criteria and panelised splines.39 Third, in our study, we did not investigate the lag effect of PM10 on respiratory mortality, because our interest lies on the uncertainty of the estimated association but not the association itself. However, previous studies in the same area indicated that the strongest effect of air pollutants on respiratory mortality was in day 0 (lag 0) and day 1 (lag 1), and the strongest cumulative effect was in 2-day moving average of day 0 and day 1.18 We incorporated lag 1 of PM10 concentration in the single-pollutant and multipollutant models as a sensitivity analysis and estimated effects of lag 0 PM10 on respiratory MR increase per IQR increase in lag 0 PM10 reduced to 0.79 (−1.91 to 3.55) and 1.20 (95% CI −1.93 to 4.42), respectively. Similar results were found for GAMM+BAM. Thus, the effects of PM10 need to be investigated in further GAMM+BAM with different lag structures.
In our study, we selected a uniform distribution as the prior distribution. In Bayesian statistics, the choice of the prior distribution is often controversial, and different rules for selecting priors have been suggested in the literature.40 The most intuitive solution is to use the uniform distribution as a non-informative prior when no information is available, although in most cases it does not integrate to 1 (this does not pose a major problem for Bayesian analyses). In view of the large sample size in our study, the data will dominate the posterior distribution (they will overwhelm the prior), so the choice of prior distribution should have little effect on our parameter estimates. In fact, previous studies did inform us about knot and variable selection, but we deliberately used a non-informative prior in the current study in order to incorporate as much uncertainty as possible (including inappropriate models). In future studies, we would like to use informative prior distributions to narrow the range of included models and to examine the performance of different prior distributions.
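One way to read the role of the uniform prior, assuming it is placed over a finite set of K candidate models M_1, ..., M_K (the text does not spell this out, and the symbols here are introduced for illustration), is through Bayes' theorem for the posterior model probabilities:

```latex
P(M_k \mid D)
  = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{j=1}^{K} P(D \mid M_j)\, P(M_j)}
  = \frac{P(D \mid M_k)}{\sum_{j=1}^{K} P(D \mid M_j)}
  \qquad \text{when } P(M_k) = \tfrac{1}{K} \text{ for all } k .
```

Under this reading the prior cancels, so the weight given to each model is determined by the data alone. An informative prior would instead raise or lower P(M_k) for particular knot configurations and so narrow the set of models that receive appreciable weight.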
In conclusion, there is increasing interest in the use of GAMM to investigate the association between short-term exposure to PM10 and adverse health outcomes in time-series studies. Epidemiological studies use different modelling strategies to adjust for confounding, making it difficult to compare results across studies. Furthermore, the uncertainty of model selection has rarely been considered or quantified in these studies. Using BMA within the GAMM framework is a promising approach for investigating the association of air pollution with adverse health outcomes using time-series data, as well as for other applications. Naturally, BMA tends to produce larger SEs than models that ignore model uncertainty. However, the conclusions of BMA are more robust than those derived from analyses that depend on a single selected model. Future work should aim to extend the model averaging over more extensive families of models, apply the method to other sources of heterogeneity (such as confounder selection), develop a corresponding R package for widespread use and explore the implications of using the resulting effect estimates and their uncertainty in policy-making and decision-making.
Footnotes
Contributors: XF designed the study, did data analysis, interpreted the results and drafted the article; RL prepared the data, drafted the article and revised it critically; HK revised the article critically for important intellectual content; MB provided statistical consultation and revised the article critically; FF and YC are the guarantors of the study, and they monitored the study implementation and revised the article critically for important intellectual content. All authors contributed to further drafts and approved the version to be published.
Funding: The authors gratefully acknowledge research grants to YC from the Junior Faculty Research Grants (C62412022) of the Institute of Environmental Medicine, Karolinska Institutet, and from the fund for PhD research (KID-funds) and travel (KI-foundations and funds) of Karolinska Institutet, Sweden, and research grants to RL from the Public Welfare Research Program of National Health and Family Planning Commission of China (201402022) and from the Opening Project of Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3).
Disclaimer: The funding sponsors had no role in the design of the study, in the collection, analysis or interpretation of data, in the writing of the manuscript or in the decision to publish the research results.
Competing interests: None declared.
Ethics approval: Institutional Review Board of Basic Medical Sciences, Chinese Academy of Medical Science, China.
Provenance and peer review: Not commissioned; externally peer reviewed.
Data sharing statement: No additional data are available.
References
- 1. Yang Y, Cao Y, Li W et al. Multi-site time series analysis of acute effects of multiple air pollutants on respiratory mortality: a population-based study in Beijing, China. Sci Total Environ 2015;508:178–87. 10.1016/j.scitotenv.2014.11.070
- 2. Shang Y, Sun Z, Cao J et al. Systematic review of Chinese studies of short-term exposure to air pollution and daily mortality. Environ Int 2013;54:100–11. 10.1016/j.envint.2013.01.010
- 3. Téllez-Rojo MM, Romieu I, Ruiz-Velasco S et al. Daily respiratory mortality and PM10 pollution in Mexico City: importance of considering place of death. Eur Respir J 2000;16:391–6. 10.1034/j.1399-3003.2000.016003391.x
- 4. Ostro BD, Hurley S, Lipsett MJ. Air pollution and daily mortality in the Coachella Valley, California: a study of PM10 dominated by coarse particles. Environ Res 1999;81:231–8. 10.1006/enrs.1999.3978
- 5. Analitis A, Katsouyanni K, Dimakopoulou K et al. Short-term effects of ambient particles on cardiovascular and respiratory mortality. Epidemiology 2006;17:230–3. 10.1097/01.ede.0000199439.57655.6b
- 6. Katsouyanni K, Touloumi G, Spix C et al. Short-term effects of ambient sulphur dioxide and particulate matter on mortality in 12 European cities: results from time series data from the APHEA project. Air Pollution and Health: a European Approach. BMJ 1997;314:1658–63.
- 7. Peng RD, Dominici F, Louis TA. Model choice in time series studies of air pollution and mortality. J Roy Stat Soc Ser A Stat Soc 2006;169:179–98. 10.1111/j.1467-985X.2006.00410.x
- 8. Dominici F, McDermott A, Zeger SL et al. On the use of generalized additive models in time-series studies of air pollution and health. Am J Epidemiol 2002;156:193–203. 10.1093/aje/kwf062
- 9. von Klot S, Peters A, Aalto P et al. Ambient air pollution is associated with increased risk of hospital cardiac readmissions of myocardial infarction survivors in five European cities. Circulation 2005;112:3073–9. 10.1161/CIRCULATIONAHA.105.548743
- 10. Terzi Y, Cengiz MA. Using of generalized additive model for model selection in multiple Poisson regression for air pollution data. Sci Res Essays 2009;4:867–71.
- 11. Duarte BPM, Saraiva PM. Hybrid models combining mechanistic models with adaptive regression splines and local stepwise regression. Ind Eng Chem Res 2003;42:99–107. 10.1021/ie0107744
- 12. Jerrett M, Arain A, Kanaroglou P et al. A review and evaluation of intraurban air pollution exposure models. J Expo Anal Environ Epidemiol 2005;15:185–204. 10.1038/sj.jea.7500388
- 13. Lin X, Zhang D. Inference in generalized additive mixed models by using smoothing splines. J Roy Stat Soc Ser B Stat Methodol 1999;61:381–400. 10.1111/1467-9868.00183
- 14. Raftery AE, Gneiting T, Balabdaoui F et al. Using Bayesian model averaging to calibrate forecast ensembles. Mon Weather Rev 2005;133:1155–74. 10.1175/MWR2906.1
- 15. Saltelli A, Ratto M, Andres T et al. Global sensitivity analysis: the primer. West Sussex: John Wiley & Sons, 2008.
- 16. Koop G, Tole L. Measuring the health effects of air pollution: to what extent can we really say that people are dying from bad air? J Environ Econ Manag 2004;47:30–54. 10.1016/S0095-0696(03)00075-5
- 17. Hoeting JA, Madigan D, Raftery AE et al. Bayesian model averaging: a tutorial. Stat Sci 1999;14:382–401. 10.1214/ss/1009212519
- 18. Yang Y, Li R, Li W et al. The association between ambient air pollution and daily mortality in Beijing after the 2008 olympics: a time series study. PLoS One 2013;8:e76759. 10.1371/journal.pone.0076759
- 19. Zhang F, Wang W, Lv J et al. Time-series studies on air pollution and daily outpatient visits for allergic rhinitis in Beijing, China. Sci Total Environ 2011;409:2486–92. 10.1016/j.scitotenv.2011.04.007
- 20. Zhang F, Xu J, Zhang Z et al. Ambient air quality and the effects of air pollutants on otolaryngology in Beijing. Environ Monit Assess 2015;187:495. 10.1007/s10661-015-4711-3
- 21. Ver Hoef JM, Boveng PL. Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology 2007;88:2766–72. 10.1890/07-0043.1
- 22. Dominici F, Daniels M, Zeger SL et al. Air pollution and mortality: estimating regional and national dose–response relationships. J Am Stat Assoc 2002;97:100–11. 10.1198/016214502753479266
- 23. Jbilou J, El Adlouni S. Generalized additive models in environmental health: a literature review. INTECH Open Access Publisher, 2012.
- 24. Koken PJ, Piver WT, Ye F et al. Temperature, air pollution, and hospitalization for cardiovascular diseases among elderly people in Denver. Environ Health Perspect 2003;111:1312–17. 10.1289/ehp.5957
- 25. Yang L, Qin G, Zhao N et al. Using a generalized additive model with autoregressive terms to study the effects of daily temperature on mortality. BMC Med Res Methodol 2012;12:165. 10.1186/1471-2288-12-165
- 26. Whitney M, Ngo L. Bayesian model averaging using SAS software. SUGI 2004;29:9–12.
- 27. Clyde M. Model averaging. New York: Wiley-Interscience, 2003.
- 28. Clyde M, George EI. Model uncertainty. Stat Sci 2004;19:81–94.
- 29. State Environmental Protection Agency of China. China national ambient air quality standard (GB 3095-2012). Beijing: State Environmental Protection Agency of China, 2012.
- 30. Zhang F, Li L, Krafft T et al. Study on the association between ambient air pollution and daily cardiovascular and respiratory mortality in an urban district of Beijing. Int J Environ Res Public Health 2011;8:2109–23. 10.3390/ijerph8062109
- 31. Xu M, Guo Y, Zhang Y et al. Spatiotemporal analysis of particulate air pollution and ischemic heart disease mortality in Beijing, China. Environ Health 2014;13:109. 10.1186/1476-069X-13-109
- 32. Sun Z, Mukherjee B, Brook RD et al. Air-Pollution and Cardiometabolic Diseases (AIRCMD): a prospective study investigating the impact of air pollution exposure and propensity for type II diabetes. Sci Total Environ 2013;448:72–8. 10.1016/j.scitotenv.2012.10.087
- 33. Puett RC, Hart JE, Yanosky JD et al. Chronic fine and coarse particulate exposure, mortality, and coronary heart disease in the Nurses’ Health Study. Environ Health Perspect 2009;117:1697–701. 10.1289/ehp.0900572
- 34. He X, Shen L, Shen Z. A data-adaptive knot selection scheme for fitting splines. IEEE Signal Proc Lett 2001;8:137–9. 10.1109/97.917695
- 35. Miyata S, Shen X. Free-knot splines and adaptive knot selection. J Jpn Stat Soc 2005;35:303–24. 10.14490/jjss.35.303
- 36. Viallefont V, Raftery AE, Richardson S. Variable selection and Bayesian model averaging in case-control studies. Stat Med 2001;20:3215–30. 10.1002/sim.976
- 37. Zhou M, He G, Liu Y et al. The associations between ambient air pollution and adult respiratory mortality in 32 major Chinese cities, 2006–2010. Environ Res 2015;137:278–86. 10.1016/j.envres.2014.12.016
- 38. Zhu R, Chen Y, Wu S et al. The relationship between particulate matter (PM10) and hospitalizations and mortality of chronic obstructive pulmonary disease: a meta-analysis. J Chron Obstruct Pulmon Dis 2013;10:307–15. 10.3109/15412555.2012.744962
- 39. Health Effects Institute. Revised analyses of time-series studies of air pollution and health: special report. Boston: Health Effects Institute, 2003.
- 40. Kass RE, Wasserman L. The selection of prior distributions by formal rules. J Am Stat Assoc 1996;91:1343–70. 10.1080/01621459.1996.10477003