Abstract
Many countries have enforced social distancing to stop the spread of COVID-19. Within countries, although the measures taken by governments are similar, the incidence rate varies among areas (e.g., counties, cities). One potential explanation is that people in some areas are more vulnerable to the coronavirus disease because their health has been worsened by long-term exposure to poor air quality. In this study, we investigate whether long-term exposure to air pollution increases the risk of COVID-19 infection in Germany. The results show that nitrogen dioxide (NO2) is significantly associated with COVID-19 incidence, with a 1 μg m⁻³ increase in long-term exposure to NO2 increasing the COVID-19 incidence rate by 5.58% (95% credible interval [CI]: 3.35%, 7.86%). This result is consistent across various models. The analyses can be reproduced and updated routinely using public data sources and shared R code.
Keywords: COVID-19, Air pollution, Health impacts, Kriging, INLA
1. Introduction
COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is currently widespread. It is much more dangerous than seasonal flu due to its high infection and death rates. As of 20th October, 2020, it had led to over 40.5 million cases and 1,120,000 deaths worldwide. In Germany, the total number of confirmed cases up to 20th October, 2020 had risen to 377,000, with more than 9800 deaths. A recent study by Wu et al. (2020) investigated the impact of long-term average exposure to fine particulate matter (PM2.5) on the risk of COVID-19 deaths in the United States and found that an increase of 1 μg/m³ in PM2.5 was associated with an 8% (95% confidence interval: 2%, 15%) increase in the COVID-19 death rate. Ogen (2020), studying 66 administrative regions in Italy, Spain, France and Germany, reported that most fatal COVID-19 cases occurred in the regions with the highest NO2 concentrations. These results suggest that high levels of air pollution may be an important contributor to COVID-19 infections or deaths.
1.1. Literature review on pollution impacts
The existing body of research on the impacts of air pollution on human health has linked PM2.5 and NO2 exposure to health damage, particularly respiratory and lung diseases, which could make people more vulnerable to contracting COVID-19. The main source of NO2 resulting from human activities is the combustion of fossil fuels (coal, gas and oil), especially fuel used in cars. Exposure to high levels of NO2 can cause inflammation of the airways. Long-term exposure may affect lung function and respiratory symptoms. For example, research by Bowatte et al. (2017) indicates that long-term exposure to NO2 was associated with increased risk of respiratory diseases, while Lee et al. (2009) show that long-term exposure to NO2 was significantly associated with respiratory hospital admissions in Edinburgh and Glasgow, UK. Similarly, Schikowski et al. (2005) suggest that long-term exposure to air pollution from NO2 and living near a major road might increase the risk of developing chronic obstructive pulmonary disease (COPD) and can have a detrimental effect on lung function.
On the other hand, particulate matter (both PM10 and PM2.5) is made up of a wide range of materials and arises from both human-made sources (such as stationary fuel combustion and transport) and natural sources (such as sea spray and Saharan sand dust). Exposure to particulate matter is associated with respiratory and cardiovascular illness and mortality as well as other adverse health effects. Since particulate matter can be inhaled into the thoracic region of the respiratory tract, it is plausible that the relationship is causal. Examples include Lee et al. (2009) and Lee (2012), where the authors found that long-term exposure to PM10 was significantly associated with respiratory hospital admissions. Reviews by the Committee on the Medical Effects of Air Pollutants (COMEAP, 2010) have suggested that exposure to PM2.5 has a stronger association with the observed adverse health effects because these finer particles can travel deeper into the lungs.
1.2. Literature review on statistical models
A spatial ecological design can be used to estimate the impacts of air pollution on health by comparing geographical contrasts in air pollution and infection risk across contiguous small areas (Huang et al., 2018, Napier et al., 2018, Rushworth et al., 2014). In such studies, the outcome data are counts of disease cases occurring in each areal unit while the pollution concentrations in each areal unit are typically estimated by applying Kriging (see Diggle and Ribeiro, 2007), to data from a sparse monitoring network, or by computing averages over modelled concentrations (grid level) from an atmospheric dispersion model (Wu et al., 2020, Maheswaran et al., 2006, Lee et al., 2009, Warren et al., 2012), or by combining both to obtain a better prediction (Huang et al., 2018, Vinikoor-Imler et al., 2014, Sacks et al., 2014). The downside of these studies is that the inference is a population level association rather than an individual-level causal relationship, and wrongly assuming the two are the same is known as ecological bias (Arbia, 1988, Wakefield and Salway, 2001). Such bias is due in part to within-population variation in pollution exposures and disease incidence, because one does not know whether, within a population, it is the same individuals that exhibit disease and have the highest air pollution exposures (Lee et al., 2020). The simulation study from Lee et al. (2020) also suggests that the estimates of the aggregated model from individual levels almost always exhibit less variation than those from the ecological model.
Another challenge in air pollution health effect studies is how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects (Huang et al., 2018, Blair et al., 2007). Specifically, the areal level pollution predictions produced from the pollution data are uncertain as they are only estimates of the true concentrations. The disadvantage of using a point estimate is that one may overstate the certainty about the connection between the outcome and the covariate. A number of approaches have been proposed to incorporate pollution uncertainties and measurement errors when modelling health outcomes (e.g., Huang et al., 2018, Lee et al., 2017, Blangiardo et al., 2016, Gryparis et al., 2009).
In this study, we investigate whether long-term average exposure to air pollution increases the risk of COVID-19 infection in Germany using a spatial ecological design. Specifically, in order to reduce the potential ecological bias, we better estimate the true areal pollution concentrations by first applying Kriging to pollution monitoring data to obtain predictions on a fine grid where population density data are available, and then estimating the areal pollution concentrations by taking a spatially population-weighted average of the gridded predictions lying within a specific county. This is likely to improve the estimate of people's actual exposure in counties where most residents live in rural areas while pollution in the urban areas is much worse than in the rural areas. Lee et al. (2017) showed that treating the posterior predictive pollution distribution as a prior in the disease model produced similar results to ignoring the uncertainty, except for PM10. Blangiardo et al. (2016) also found that incorporating pollution uncertainty, by generating multiple sets of estimated exposure and fitting the disease model separately for each set before combining the estimated health effects, did not change the substantive conclusions. We therefore do not address exposure uncertainty in this study. Instead, we incorporate the reliability of the gridded pollution predictions while aggregating them spatially; details can be found in Section 3.2.
The remainder of this paper is organized as follows. The data and its exploratory analysis are presented in Section 2, while the statistical methodology is outlined in Section 3. The results of the study are reported in Section 4, and the key conclusions are presented in Section 5.
2. Study region
2.1. Data description
The study region is Germany, which has a population of around 83 million people and 401 counties (administrative districts), of which 294 are rural and 107 are urban. A map of these counties is shown in Fig. 1, with boundaries obtained from Germany’s Federal Agency for Cartography and Geodesy (BKG, 2020).
Fig. 1.
Pollution stations, population density, log of COVID-19 SIR and population-weighted NO2 (μg m⁻³) by county in Germany.
The data sets used in this study include COVID-19 cases, pollution concentrations, temperature and population data. The accumulated COVID-19 cases used in this study were collected up to 13th September, 2020 at the county level. Both the pollution and temperature data are averages for the years 2016–2018 (representing long-term exposure) from monitoring sites, which are converted to the county level by applying the spatial modelling and prediction method described in Section 3 to obtain the spatially population-weighted representative values for each county. The pollutants considered in this study include the common pollutants PM2.5, PM10, NO2 and SO2, and four poisonous pollutants: benzene, arsenic, cadmium and nickel. These pollutants could have potential harmful health effects, such as damage to the lungs and nasal cavity, reduced lung function, chronic bronchitis and cancers of the bladder and lungs (Yu et al., 2003, Smith, 2010, Järup et al., 1998, Das et al., 2008).
The population data contain fine gridded population densities, the population by sex and age at the federal state level, and county level population data. The fine gridded population density data are used for calculating population-weighted county level exposure (see Section 3 for details), while the latter two are used to calculate the expected number of cases in each county. Specifically, we denote $Y_i$ as the reported number of COVID-19 cases for county $i$, and calculate the expected number of cases in each county by $E_i = \frac{P_i}{P_{s(i)}} \sum_j r_j P_{s(i),j}$, where $P_i$ is the population in county $i$, $r_j$ is the national incidence rate in sex–age group $j$, and $P_{s(i),j}$ denotes the population in group $j$ of the state $s(i)$ which contains county $i$, with $P_{s(i)} = \sum_j P_{s(i),j}$. The latter part in the equation, $\sum_j r_j P_{s(i),j}$, is the expected number of cases in state $s(i)$. The standardized incidence ratio (SIR), given by $\text{SIR}_i = Y_i / E_i$, measures the risk of disease, and an SIR of 1.1 indicates a 10% increased risk of disease compared to that expected. A spatial map of the natural logarithm of SIR for COVID-19 (the scale on which it will be modelled) as of 13th September, 2020 can be seen in Fig. 1(c), showing a wide variation in SIRs across the counties in Germany, with the majority of the high-risk counties in the southern and northwestern parts of Germany.
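As a minimal sketch of the expected-count and SIR calculation described above, the following Python snippet allocates a state's expected cases to a county in proportion to its population share. The function names, group labels and all numbers are illustrative assumptions, not the study's data or code.

```python
def expected_cases(county_pop, state_pop_by_group, national_rate_by_group):
    """Expected cases for one county: its population share of the
    expected cases in the state that contains it."""
    state_total = sum(state_pop_by_group.values())
    # Expected cases in the state: national group rates applied to state group sizes.
    state_expected = sum(
        national_rate_by_group[g] * n for g, n in state_pop_by_group.items()
    )
    return county_pop / state_total * state_expected


def sir(observed, expected):
    """Standardized incidence ratio: observed over expected cases."""
    return observed / expected


# Hypothetical state with four sex-age groups (made-up values).
state_pop = {"F_0-59": 400_000, "F_60+": 100_000, "M_0-59": 420_000, "M_60+": 80_000}
rates = {"F_0-59": 0.002, "F_60+": 0.004, "M_0-59": 0.002, "M_60+": 0.005}

e_county = expected_cases(50_000, state_pop, rates)
sir_county = sir(134, e_county)
print(round(e_county, 2), round(sir_county, 3))
```

An SIR above 1 in this toy county would be read, as in the text, as more cases than expected from its size and the state's sex–age structure.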
2.2. Data sources
The COVID-19 cases by county and the population by sex and age at the federal state level in Germany are publicly available from Kaggle (Heads or Tails, 2020). The COVID-19 cases and deaths are updated daily, with the earliest recorded cases dating from 24th January, 2020. The COVID-19 data were originally collected by the Robert Koch Institute; more details can be found in Heads or Tails (2020). The county level population data are freely available from The City Population (2019). Both population data sets reflect the (most recent available) estimates on 2018-12-31. The fine gridded population density data are freely available from DIVA-GIS (2020), and are shown in Fig. 1(b).
Air quality data are obtained from the Air Quality e-Reporting provided by the European Environment Agency (EEA, 2020). The monitoring stations are shown in Fig. 1(a) and tend to be dense where the population density is high (see Fig. 1(b)). The temperature data are downloaded from the European Climate Assessment & Dataset (ECAD, 2020).
3. Method
The observed and expected case counts for each areal unit are used to calculate the standardized incidence ratio, with $\text{SIR}_i = Y_i / E_i$, where $\text{SIR}_i > 1$ represents areas with elevated levels of disease risk, while $\text{SIR}_i < 1$ corresponds to comparatively healthy areas. Elevated risks are likely to happen by chance if $E_i$ is small, which can occur if the disease in question is rare and/or the population at risk is small (Lee, 2011). To overcome this problem, Poisson log-linear spatial models are typically used for the analysis (Elliott et al., 2000, Banerjee et al., 2004, Lawson, 2008), where the linear predictor includes pollutant concentrations and potential confounders. These known covariates are augmented by a set of random effects to capture the residual spatial autocorrelation after the covariate effects have been accounted for. The random effects borrow strength from values in neighbouring areas, which reduces the variance of the estimated risk and the likelihood of excess estimated risk occurring by chance.
These random effects are commonly modelled by the class of conditional autoregressive (CAR) prior distributions, which are a type of Markov random field model (see Rue and Held, 2005). The spatial correlation between the random effects is determined by a binary neighbourhood matrix W. Based on this neighbourhood matrix, the most common models for the random effects include intrinsic autoregressive model (Besag et al., 1991), convolution model (Besag et al., 1991), as well as those proposed by Cressie (1993) and Leroux et al. (1999). These CAR models differ by holding different assumptions about how the random effects depend on each other across space.
3.1. Pollution model
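For simplicity, in this study we use a univariate model for each pollutant, since the number of monitoring stations (709) is large enough to produce predictions with modest standard errors. We treat the underlying pollution levels in Germany as a spatial Gaussian process $S(x)$ with mean $\mu$, variance $\sigma^2$ and correlation function $\rho(u; \phi)$ with parameter $\phi$, where $u = \lVert x - x' \rVert$ denotes the Euclidean distance between $x$ and $x'$. Denote the observed pollution data as $Y = (Y_1, \dots, Y_n)^\top$, and write $S = (S(x_1), \dots, S(x_n))^\top$ for the unobserved values of the signal at the sampling locations $x_1, \dots, x_n$. The pollution model is assumed as
$$Y = S + Z, \qquad Z \sim \mathrm{N}(0, \tau^2 I), \tag{1}$$
where $Z$ is uncorrelated with $S$, and $I$ is the identity matrix of size $n$. $S$ is multivariate Gaussian with mean vector $\mu \mathbf{1}$, where $\mathbf{1}$ denotes a vector each of whose elements is 1, and variance matrix $\sigma^2 R$, where $R$ is the $n$ by $n$ matrix with elements $R_{ij} = \rho(\lVert x_i - x_j \rVert; \phi)$. Similarly, $Y$ is multivariate Gaussian
$$Y \sim \mathrm{N}(\mu \mathbf{1}, \sigma^2 V), \qquad V = R + \nu^2 I, \tag{2}$$
where $\nu^2 = \tau^2 / \sigma^2$ is the noise-to-signal variance ratio.
The log-likelihood corresponding to (1) is
$$L(\mu, \sigma^2, \nu^2, \phi) = -\frac{1}{2} \left\{ n \log(2\pi) + \log \lvert \sigma^2 V \rvert + (Y - \mu \mathbf{1})^\top (\sigma^2 V)^{-1} (Y - \mu \mathbf{1}) \right\}. \tag{3}$$
Given $V$, the maximum likelihood estimates (MLEs) of $\mu$ and $\sigma^2$ are given by
$$\hat{\mu}(V) = (\mathbf{1}^\top V^{-1} \mathbf{1})^{-1} \mathbf{1}^\top V^{-1} Y, \qquad \hat{\sigma}^2(V) = n^{-1} \{Y - \hat{\mu}(V) \mathbf{1}\}^\top V^{-1} \{Y - \hat{\mu}(V) \mathbf{1}\}. \tag{4}$$
By substituting $\hat{\mu}(V)$ and $\hat{\sigma}^2(V)$ into the log-likelihood function, we have
$$L_0(\nu^2, \phi) = -\frac{1}{2} \left\{ n \log(2\pi) + n \log \hat{\sigma}^2(V) + \log \lvert V \rvert + n \right\}, \tag{5}$$
which can be optimized numerically with respect to $\nu^2$ and $\phi$, followed by back substitution to obtain $\hat{\mu}$ and $\hat{\sigma}^2$. This is achieved by the function likfit() in the geoR package by providing initial values for the covariance parameters (Diggle and Ribeiro, 2007).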
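The profiling step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the geoR implementation: it assumes an exponential correlation function exp(-u/phi) (the source does not name the correlation family), replaces the numerical optimizer with a coarse grid search, and uses made-up coordinates and observations. All function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def solve_chol(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def profile_loglik(coords, y, nu2, phi):
    """Concentrated log-likelihood in (nu2, phi), plus the plug-in mu and sigma2."""
    n = len(y)
    # V = R + nu2 * I, with R from the ASSUMED exponential correlation.
    v = [[math.exp(-math.dist(coords[i], coords[j]) / phi) + (nu2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = cholesky(v)
    vinv_one = solve_chol(l, [1.0] * n)
    vinv_y = solve_chol(l, y)
    mu = sum(vinv_y) / sum(vinv_one)            # (1'V^-1 1)^-1 1'V^-1 y
    resid = [yi - mu for yi in y]
    sigma2 = sum(r * w for r, w in zip(resid, solve_chol(l, resid))) / n
    logdet_v = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    ll = -0.5 * (n * math.log(2 * math.pi) + n * math.log(sigma2) + logdet_v + n)
    return ll, mu, sigma2


# Made-up monitoring sites and pollutant readings.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
y = [20.1, 22.4, 19.8, 23.0, 25.2, 18.9]

# Coarse grid search standing in for numerical optimization.
grid = [(nu2, phi) for nu2 in (0.1, 0.5, 1.0, 2.0) for phi in (0.25, 0.5, 1.0, 2.0)]
best = max(grid, key=lambda p: profile_loglik(coords, y, *p)[0])
ll_hat, mu_hat, sigma2_hat = profile_loglik(coords, y, *best)
print(best, round(mu_hat, 3), round(sigma2_hat, 3))
```

Profiling out the mean and variance in closed form leaves only two parameters for the search, which is why the text describes optimizing over the noise-to-signal ratio and the correlation parameter alone.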
3.2. Population-weighted exposure
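The areal pollution exposure is estimated by aggregating the gridded predictions weighted by population density and by the precision of the predictions. For a new location $x_0$, the Kriging formula of Diggle and Ribeiro (2007) is used to obtain its prediction by plugging in the resulting estimates $(\hat{\mu}, \hat{\sigma}^2, \hat{\nu}^2, \hat{\phi})$, which is
$$\hat{S}(x_0) = \hat{\mu} + r^\top \hat{V}^{-1} (Y - \hat{\mu} \mathbf{1}), \tag{6}$$
where $r = (\rho(\lVert x_0 - x_1 \rVert; \hat{\phi}), \dots, \rho(\lVert x_0 - x_n \rVert; \hat{\phi}))^\top$ and $\hat{V} = \hat{R} + \hat{\nu}^2 I$, with $\hat{R}$ the matrix of correlations between the sampling locations evaluated at $\hat{\phi}$. The corresponding prediction variance is $v(x_0) = \hat{\sigma}^2 (1 - r^\top \hat{V}^{-1} r)$, based on which we have the inverse variance for the prediction, $w(x_0) = 1 / v(x_0)$. The higher $w(x_0)$ is, the better the quality of the prediction, and we give more weight to the most reliable pollution values while aggregating them (see Sanchez-Meca and Marín-Martínez, 1998).
After obtaining pollution predictions at the centres of all grid cells where the population density data are available (see Fig. 1(b)) using (6), and denoting the population density at location $x$ as $p(x)$, for a specific county $i$, the spatially representative pollution concentration is estimated by
$$\hat{z}_i = \frac{\sum_{x \in A_i} p(x)\, w(x)\, \hat{S}(x)}{\sum_{x \in A_i} p(x)\, w(x)}, \tag{7}$$
where $A_i$ represents county $i$. Therefore, $\hat{z}_i$ is a spatial metric of pollution concentrations weighted by population density and also by the inverse of their Kriged variances.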
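The weighting scheme above can be illustrated with a short Python sketch: each grid cell in a county contributes its Kriged prediction with weight equal to population density divided by prediction variance. The function name and the three-cell county are hypothetical, and the numbers are made up for illustration.

```python
def county_exposure(pred, pred_var, pop_density):
    """Average of gridded predictions weighted by population density
    and by the inverse of the Kriging prediction variance."""
    weights = [p / v for p, v in zip(pop_density, pred_var)]
    return sum(w * s for w, s in zip(weights, pred)) / sum(weights)


# Hypothetical county: one dense urban cell and two sparse rural cells.
pred = [30.0, 18.0, 15.0]        # predicted concentration per grid cell
pred_var = [4.0, 4.0, 16.0]      # prediction variance per grid cell
pop_density = [2000.0, 200.0, 50.0]

weighted = county_exposure(pred, pred_var, pop_density)
unweighted = sum(pred) / len(pred)
print(round(weighted, 2), round(unweighted, 2))
```

In this toy county the weighted value sits close to the urban prediction, because most people live in the urban cell and the least certain rural prediction is down-weighted, whereas the plain average treats all cells equally.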
3.3. COVID-19 incidence model
Recall that the outcome data are counts of the cases occurring in each county in Germany, and that the observed and expected numbers of COVID-19 cases for county $i$ are denoted as $Y_i$ and $E_i$, respectively. The model for COVID-19 incidence (health model) is a Poisson log-linear model (see Shaddick and Zidek, 2015), given by
$$Y_i \mid E_i, \theta_i \sim \text{Poisson}(E_i \theta_i), \qquad \log(\theta_i) = \mathbf{x}_i^\top \beta + \psi_i, \tag{8}$$
where the relative risk of disease in county $i$ is denoted by $\theta_i$, and is modelled on the log scale by covariates $\mathbf{x}_i$ and a spatial random effect $\psi_i$. The covariates comprise an intercept, pollutants, temperature and areal population density, which is the areal population divided by its area (referred to as popDensity). The regression parameters $\beta$ are assigned weakly informative zero-mean Gaussian priors with a diagonal variance matrix.
The spatial random effect, $\psi_i$, is included to allow for any residual spatial autocorrelation remaining in the disease counts after the covariate effects have been accounted for, and is modelled by
$$\psi = (\psi_1, \dots, \psi_N)^\top \sim \mathrm{N}\left(0, \sigma_\psi^2\, Q(W, \lambda)^{-1}\right), \tag{9}$$
where $Q(W, \lambda) = \lambda \{\mathrm{diag}(W \mathbf{1}) - W\} + (1 - \lambda) I$. Spatial autocorrelation is induced into the random effects by the precision matrix $Q(W, \lambda)$, which corresponds to the CAR model proposed by Leroux et al. (1999). The spatial dependence in the data is captured by an $N \times N$ neighbourhood matrix $W$, where $N$ is the number of counties, whose $ij$th element equals 1 if areas $i$ and $j$ share a common border and is zero otherwise. The level of spatial autocorrelation in the random effects is controlled by $\lambda$. Finally, weakly informative hyperpriors are specified for the parameters by
| (10) |
The prior distribution of the spatial dependence parameter is fairly non-informative, as it is roughly uniformly distributed within [0,1]. The prior distribution of the variance parameter allows small values, which are what we expect for the variation of the relative risk on the log scale. The COVID-19 incidence models are implemented in INLA (Rue et al., 2009), a computationally efficient and powerful method for fitting Bayesian models that is available through an increasingly popular R package. For details on how to fit spatial and spatio-temporal models with R-INLA, refer to Blangiardo et al. (2013).
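To make the role of the neighbourhood matrix and the spatial dependence parameter concrete, the sketch below builds the precision matrix of the Leroux et al. (1999) CAR prior in its standard form, lam * (diag(W 1) - W) + (1 - lam) * I, for a toy map. This is a plain-Python illustration, not R-INLA code; the function name, the symbol `lam` and the four-area map are assumptions made for the example.

```python
def leroux_precision(w, lam):
    """Leroux CAR precision matrix (up to the variance scale) for a binary
    neighbourhood matrix w and spatial dependence parameter lam in [0, 1]."""
    n = len(w)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Number of neighbours scaled by lam, plus the independent part.
                q[i][j] = lam * sum(w[i]) + (1.0 - lam)
            else:
                q[i][j] = -lam * w[i][j]
    return q


# Hypothetical map: four areas in a row, each bordering only its adjacent areas.
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

q_strong = leroux_precision(w_line, 0.8)   # strong spatial dependence
q_indep = leroux_precision(w_line, 0.0)    # reduces to the identity matrix
for row in q_strong:
    print([round(v, 2) for v in row])
```

Setting the dependence parameter to 0 gives independent random effects, while a value of 1 gives the intrinsic autoregressive model, which is why this single parameter spans the non-spatial and Besag-type models compared later in the paper.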
4. Results
4.1. Exposure estimation
Table 1 presents the parameter estimates obtained by applying the model in (1) to each pollutant separately. The main message from the table is that the Akaike information criterion (AIC; Akaike, 1973) values from the proposed spatial pollution model are all well below those from a non-spatial model that does not incorporate the spatial component in (2). This suggests that spatial structure is an important component of the pollution models. The main outputs from the fitted pollution models are the population-weighted county level exposures for each pollutant. The population-weighted NO2 exposures are shown in Fig. 1(d), suggesting that the western part of Germany is much more exposed to NO2. A summary of the estimated population-weighted county level exposure is presented in Table 2, while scatterplots of the natural logarithm of COVID-19 SIR against the population-weighted NO2 and PM2.5 are displayed in the upper part of Fig. 2. The former seems to indicate a linear relationship between NO2 and log COVID-19 risk.
Table 1.
Parameter estimation from the spatial pollution model.
| Variable | | | | | AIC | Non-spatial AIC |
|---|---|---|---|---|---|---|
| NO2 | 20.181 | 60.532 | 107.134 | 0.537 | 4501.620 | 4644.213 |
| PM2.5 | 10.395 | 1.156 | 1.525 | 0.736 | 741.625 | 771.677 |
| PM10 | 17.483 | 4.375 | 10.265 | 0.554 | 2138.916 | 2178.230 |
| SO2 | 1.507 | 0.751 | 0.115 | 0.681 | 301.668 | 365.301 |
| Benzene | 0.846 | 0.022 | 0.093 | 1.584 | 96.896 | 107.443 |
| Arsenic | 0.446 | 0.014 | 0.031 | 1.473 | −71.610 | −54.280 |
| Cadmium | 0.110 | 0.001 | 0.002 | 1.159 | −532.236 | −499.639 |
| Nickel | 1.468 | 0.532 | 1.239 | 0.922 | 603.834 | 631.438 |
| Temperature | 9.810 | 1.527 | 0.245 | 0.857 | 1783.719 | 2175.323 |
Table 2.
Population-weighted county level exposure summary, with units μg m⁻³ for NO2, PM2.5, PM10, SO2 and benzene; ng m⁻³ for arsenic, cadmium and nickel; and °C for temperature.
| Variable | Min | Quantile25 | Median | Mean | Quantile75 | Max |
|---|---|---|---|---|---|---|
| NO2 | 12.57 | 18.79 | 21.62 | 23.03 | 26.86 | 36.54 |
| PM2.5 | 8.64 | 9.98 | 10.52 | 10.48 | 10.94 | 12.21 |
| PM10 | 14.48 | 16.84 | 17.74 | 17.74 | 18.57 | 21.07 |
| SO2 | 0.69 | 1.21 | 1.51 | 1.66 | 1.95 | 4.23 |
| Benzene | 0.69 | 0.81 | 0.85 | 0.88 | 0.96 | 1.15 |
| Arsenic | 0.32 | 0.40 | 0.44 | 0.45 | 0.50 | 0.67 |
| Cadmium | 0.08 | 0.10 | 0.10 | 0.11 | 0.13 | 0.20 |
| Nickel | 0.97 | 1.32 | 1.44 | 1.63 | 1.85 | 3.22 |
| Temperature | 6.69 | 9.61 | 10.15 | 10.09 | 10.54 | 11.84 |
Fig. 2.
Upper: scatterplots of log COVID-19 SIR against NO2 (μg m⁻³) and PM2.5 (μg m⁻³); Middle: the Moran’s I test and the empirical semi-variogram of the residuals from the non-spatial health model (circles), with 95% Monte Carlo simulation envelopes (dashed lines); Bottom: posterior (solid line) and prior (dashed line) plots for the spatial dependence parameter and the variance parameter from health model (8).
4.2. Model validation
Before presenting the estimated effects of environmental factors on COVID-19, we assess the necessity of including spatial autocorrelation via the random effects model (9). In order to do so, we fitted a simplified version of model (8) without the spatial random effects term. The residuals from this model show substantial spatial autocorrelation, with a significant Moran’s I statistic (Moran, 1950) shown in Fig. 2(c). The empirical semi-variogram of the residuals, illustrated in Fig. 2(d), shows several points lying outside the 95% Monte Carlo simulation envelopes. This suggests that some spatial autocorrelation remains in the residuals, and thus that including the spatial random effect model (9) is necessary.
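The residual check described above rests on Moran's I, which can be sketched in a few lines of Python together with a simple permutation test. This is an illustration only: the function names, the row-of-areas neighbourhood structure and the residual values are made up, and the permutation test is one common way to assess significance rather than necessarily the procedure used in the study.

```python
import random


def line_adjacency(n):
    """Binary neighbourhood matrix for n areas arranged in a row (toy map)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def morans_i(x, w):
    """Moran's I statistic for values x and binary neighbourhood matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return n / s0 * num / den


def permutation_pvalue(x, w, n_perm=999, seed=1):
    """One-sided p-value: share of random relabellings with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Made-up residuals that trend smoothly across six neighbouring areas.
residuals = [1.2, 0.9, 0.4, -0.3, -0.8, -1.4]
w6 = line_adjacency(6)
i_obs = morans_i(residuals, w6)
p_val = permutation_pvalue(residuals, w6)
print(round(i_obs, 3), p_val)
```

Values of I well above zero with a small p-value indicate that neighbouring residuals are more alike than expected under spatial independence, which is the pattern the text reports for the non-spatial health model.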
4.3. Effects of pollution on COVID-19
In this section we present the air pollution health effects, which are the main results in this study. For comparison purposes, we show both the results from our employed health model with the Leroux et al. (1999) CAR model to account for spatially correlated residuals, and the results from other commonly used CAR models: the “Besag” intrinsic autoregressive model proposed by Besag et al. (1991), the “BYM” convolution model (also proposed by Besag et al., 1991), and the non-spatial model (referred to as “IID”). In addition, as PM10 and PM2.5 are highly correlated (with correlation coefficient 0.75), we run two health models separately, with each model including either PM10 or PM2.5 to avoid collinearity. The results from having PM2.5 in the model are presented in Table 3, while those from having PM10 are presented in the Appendix in Table 4.
Table 3.
Posterior medians and 95% CI for the percentage increase in relative risk from a one-unit increase in each covariate, and the WAIC from fitting various health models (with PM2.5), including the employed Leroux model and the commonly used BYM, Besag and IID models. ‘Pr’ is the posterior probability that the covariate increases the relative risk.
| Covariate | Leroux Est | Leroux CI | Leroux Pr | BYM Est | BYM CI | BYM Pr | Besag Est | Besag CI | Besag Pr | IID Est | IID CI | IID Pr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NO2 | 5.58 | ( 3.35, 7.86) | [1.00] | 5.21 | ( 2.98, 7.50) | [1.00] | 5.36 | ( 3.06, 7.71) | [1.00] | 5.60 | ( 3.94, 7.29) | [1.00] |
| PM2.5 | 4.59 | (−12.57, 24.79) | [0.69] | 1.62 | (−15.76, 22.51) | [0.57] | 0.45 | (−17.49, 22.27) | [0.52] | 8.04 | ( −2.70, 19.94) | [0.93] |
| SO2 | 15.83 | ( −1.42, 35.45) | [0.96] | 6.29 | ( −9.20, 24.36) | [0.78] | 5.45 | (−10.56, 24.31) | [0.74] | 39.51 | ( 26.44, 53.94) | [1.00] |
| Temperature | −11.72 | (−20.84, −1.46) | [0.01] | −8.52 | (−18.01, 2.05) | [0.05] | −8.48 | (−18.23, 2.41) | [0.06] | −18.12 | (−24.55, −11.16) | [0.00] |
| Benzene | −1.21 | (−19.21, 20.12) | [0.45] | −2.75 | (−23.00, 22.75) | [0.41] | −3.45 | (−24.59, 23.58) | [0.39] | 9.32 | ( −1.58, 21.41) | [0.95] |
| Arsenic | −16.72 | (−31.45, 1.68) | [0.04] | −9.34 | (−26.32, 11.47) | [0.17] | −10.27 | (−27.89, 11.64) | [0.16] | −22.38 | (−30.83, −12.91) | [0.00] |
| Cadmium | 16.44 | ( −5.67, 44.53) | [0.92] | 23.93 | ( −1.12, 55.49) | [0.97] | 27.08 | ( 0.21, 61.15) | [0.98] | 6.86 | ( −5.49, 20.80) | [0.86] |
| Nickel | −1.35 | (−13.13, 12.03) | [0.42] | −1.41 | (−13.44, 12.26) | [0.41] | −1.56 | (−14.22, 12.95) | [0.41] | −1.47 | ( −8.90, 6.57) | [0.35] |
| popDensity | −2.12 | ( −7.34, 3.39) | [0.22] | −2.23 | ( −7.37, 3.19) | [0.20] | −1.83 | ( −6.98, 3.61) | [0.25] | −6.15 | (−11.35, −0.65) | [0.01] |
| WAIC | 3814.63 | | | 3814.8 | | | 3816.19 | | | 3815.01 | | |
Table 4.
Posterior medians and 95% CI for the percentage increase in relative risk from a one-unit increase in each covariate, and the WAIC from fitting various health models (with PM10), including the employed Leroux model and the commonly used BYM, Besag and IID models. ‘Pr’ is the posterior probability that the covariate increases the relative risk.
| Covariate | Leroux Est | Leroux CI | Leroux Pr | BYM Est | BYM CI | BYM Pr | Besag Est | Besag CI | Besag Pr | IID Est | IID CI | IID Pr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NO2 | 6.06 | ( 3.50, 8.67) | [1.00] | 5.51 | ( 2.91, 8.20) | [1.00] | 5.57 | ( 2.94, 8.26) | [1.00] | 6.93 | ( 5.02, 8.88) | [1.00] |
| PM10 | −2.98 | (−12.49, 7.63) | [0.28] | −1.43 | (−11.56, 9.81) | [0.40] | −1.61 | (−11.78, 9.72) | [0.38] | −7.88 | (−14.13, −1.17) | [0.01] |
| SO2 | 18.82 | ( 0.90, 39.05) | [0.98] | 6.40 | ( −9.64, 25.21) | [0.77] | 6.18 | ( −9.94, 25.18) | [0.76] | 50.66 | ( 37.18, 65.45) | [1.00] |
| Temperature | −10.89 | (−20.34, −0.24) | [0.02] | −8.09 | (−18.09, 3.13) | [0.07] | −8.05 | (−18.12, 3.24) | [0.08] | −16.21 | (−22.97, −8.87) | [0.00] |
| Benzene | −0.80 | (−18.69, 20.39) | [0.47] | −3.00 | (−23.67, 23.13) | [0.40] | −3.27 | (−24.07, 23.22) | [0.39] | 8.46 | ( −2.34, 20.43) | [0.94] |
| Arsenic | −13.74 | (−29.20, 5.35) | [0.07] | −9.18 | (−26.99, 12.85) | [0.19] | −9.42 | (−27.34, 12.88) | [0.19] | −12.82 | (−22.89, −1.44) | [0.01] |
| Cadmium | 13.52 | ( −8.03, 41.20) | [0.88] | 25.46 | ( −0.87, 59.05) | [0.97] | 26.31 | ( −0.47, 60.27) | [0.97] | −1.37 | (−12.55, 11.21) | [0.41] |
| Nickel | −0.70 | (−12.51, 12.67) | [0.46] | −1.32 | (−13.82, 12.95) | [0.42] | −1.37 | (−14.00, 13.11) | [0.42] | −1.17 | ( −8.59, 6.85) | [0.38] |
| popDensity | −2.08 | ( −7.30, 3.43) | [0.22] | −1.92 | ( −7.09, 3.52) | [0.24] | −1.82 | ( −6.98, 3.61) | [0.25] | −5.89 | (−11.08, −0.41) | [0.02] |
| WAIC | 3814.55 | | | 3815.64 | | | 3816.13 | | | 3814.58 | | |
The bottom of Fig. 2 shows both the prior and posterior distributions of the spatial dependence parameter and the variance parameter from the Leroux model with PM2.5, suggesting that both are well estimated from the data. Fig. 2(e) shows that the estimate of the spatial dependence parameter is around 0.8, which indicates high spatial autocorrelation in the disease data after the covariate effects have been accounted for, justifying the use of the spatial random effects model. Similarly, Fig. 2(f) shows that the estimate of the spatial variance parameter is around 0.75. The predicted COVID-19 risk presented in Fig. 3(a) shows that the majority of the high-risk counties are in the southeastern part of Germany. Fig. 3(b) shows that these areas have high probabilities of excess risk.
Fig. 3.
Posterior means of the relative risk, $E(\theta_i \mid Y)$, and probabilities of 50% excess risk, $\Pr(\theta_i > 1.5 \mid Y)$.
The main results are presented in Table 3, including the posterior medians and 95% credible intervals of the percentage increase in relative risk from a one-unit increase in each covariate, and the widely applicable information criterion (WAIC; Watanabe, 2010) from fitting various health models, including the employed Leroux model and the commonly used BYM, Besag and IID models. Table 3 shows that the WAIC values from the different models are similar, with that from the employed Leroux model being slightly lower (better). The results from the Leroux model show that NO2 is significantly (at 0.05 level) associated with the COVID-19 risk, with a 1 μg m⁻³ increase in long-term exposure to NO2 increasing the COVID-19 incidence rate by 5.58% (95% CI: 3.35%, 7.86%). This statistically significant association between NO2 and COVID-19 incidence is consistent across the various health models, including the BYM, Besag and IID models (and also those in Table 4, where the health model has PM10 rather than PM2.5), which enhances the plausibility of the results.
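Because the health model is log-linear, a percentage increase per unit of exposure corresponds to a coefficient on the log scale, and effects for larger exposure contrasts compound multiplicatively. The sketch below shows that conversion using the reported 5.58% figure; the function names are hypothetical, and the extrapolation to a 5-unit contrast is an illustration that assumes the log-linear form holds over that range, not a result from the study.

```python
import math


def pct_increase(beta, delta=1.0):
    """Percentage change in relative risk for a delta-unit increase,
    given a log-scale regression coefficient beta."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


def log_coef_from_pct(pct):
    """Log-scale coefficient implied by a percentage increase per unit."""
    return math.log1p(pct / 100.0)


# Reported effect: 5.58% per 1 unit of long-term NO2 exposure.
beta_no2 = log_coef_from_pct(5.58)

one_unit = pct_increase(beta_no2)         # recovers the reported figure
five_units = pct_increase(beta_no2, 5.0)  # compounding, not 5 * 5.58
print(round(beta_no2, 4), round(one_unit, 2), round(five_units, 1))
```

The 5-unit figure comes out somewhat above five times the 1-unit figure, which reflects the multiplicative scale on which relative risk is modelled.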
Areal population density does not have a significant association with COVID-19, while temperature displays a negative association (at 0.05 level) with COVID-19 incidence. As shown in Fig. 1(c), COVID-19 risk is generally higher in the south, which has lower long-term temperatures compared to the north (see MOW, 2020). This is consistent with the negative association between temperature and COVID-19 incidence rates. No substantial associations (at 0.05 level) were found between COVID-19 incidence and the other pollutants, including PM2.5, SO2, benzene, arsenic, cadmium and nickel. Note that SO2 is just at the border of having a significant association with COVID-19 incidence, since the posterior probability that it increases the relative risk is 0.96 (see Table 3). SO2 is significantly associated with COVID-19 incidence in the model with PM10 rather than PM2.5 (see the Leroux model results in Table 4).
5. Discussion
Ogen (2020) states that poisoning our environment means poisoning our own body, and when it experiences chronic respiratory stress its ability to defend itself from infections is limited. Existing research has linked exposure to pollutants (e.g., PM2.5 and NO2) to health damage, particularly respiratory and lung diseases, which could make people more vulnerable to contracting COVID-19. This study uses a spatial ecological design to estimate the impacts of air pollution on COVID-19 infection in Germany by comparing geographical contrasts in air pollution and infection risk across contiguous small areas, where we use a population-weighted method to better estimate individuals’ air pollution exposure. The results show that long-term exposure to NO2 is significantly associated with the COVID-19 incidence rate in Germany, with a 1 μg m⁻³ increase in long-term exposure to NO2 increasing the COVID-19 incidence rate by 5.58% (95% CI: 3.35%, 7.86%). No substantial associations were found between the COVID-19 incidence rate and the other pollutants, including PM2.5, PM10, SO2, benzene, arsenic, cadmium and nickel. Temperature and population density are adjusted for in the model, and spatial random effects are also included to capture the residual spatial autocorrelation after the covariate effects have been accounted for.
For comparison purposes, we compared our results to those from other commonly used CAR models, including the intrinsic autoregressive model (Besag et al., 1991) and the convolution model (Besag et al., 1991), and from a non-spatial model. In addition, as PM10 and PM2.5 are highly correlated, we ran two health models separately, with each model including either PM10 or PM2.5 to avoid collinearity. We found that the statistically significant associations between NO2 and COVID-19 are consistent across these various health models, which enhances the plausibility of the results.
Several limitations of this pilot study need to be acknowledged. First, due to data availability, no socioeconomic or health care related covariates were included in the health model; including them would allow sensitivity analyses and help test the robustness of the findings. However, our health model does include a spatial random effects term to allow for any residual spatial autocorrelation after accounting for the known covariates, and the main findings for NO2 are adjusted for a set of other pollutants, temperature and population density. Another limitation is the lack of COVID-19 testing numbers, since the number of confirmed cases (positive tests) in a county depends on the total number of tests conducted in that county. A limitation with regard to estimating pollution exposures is that the current model is univariate, which potentially loses some power by not borrowing strength over correlated pollutants as would occur in a multivariate pollution model.
Finally, COVID-19 deaths, not only infections, should be focused when (or if) more deaths occur in the future. Such studies will help us better understand COVID-19, and also help the global communities and health organizations stay informed and make data driven decisions.
Acknowledgements
We would like to thank the editors and two anonymous referees for their insightful and constructive comments, which greatly improved the presentation of the article. Professor Brown is funded by the Natural Sciences and Engineering Research Council of Canada.
Appendix.
The results from fitting the COVID-19 incidence models that include PM10 are shown in Table 4. The data and R code used in this study are shared on GitHub (https://github.com/hgw0610209/Germany-covid-paper).
References
- Akaike H. Selected Papers of Hirotugu Akaike. Springer New York; New York, NY: 1973. Information theory and an extension of the maximum likelihood principle; pp. 199–213.
- Arbia G. Springer; 1988. Spatial Data Configuration in Statistical Analysis of Regional Economic and Related Problems.
- Banerjee S., Carlin B.P., Gelfand A.E. first ed. Chapman and Hall/CRC Press; 2004. Hierarchical Modeling and Analysis of Spatial Data.
- Besag J., York J., Mollie A. Bayesian image restoration with two applications in spatial statistics. Ann. Inst. Stat. Math. 1991;43:1–59.
- BKG J. 2020. Federal agency for cartography and geodesy: Digitale geodaten. https://gdz.bkg.bund.de/index.php/default/digitale-geodaten.html.
- Blair A., Stewart P., Lubin J.H., Forastiere F. Methodological issues regarding confounding and exposure misclassification in epidemiological studies of occupational exposures. Amer. J. Ind. Med. 2007;50(3):199–207. doi: 10.1002/ajim.20281.
- Blangiardo M., Cameletti M., Baio G., Rue H. Spatial and spatio-temporal models with R-INLA. Spatial Spatio-temporal Epidemiol. 2013;4:33–49. doi: 10.1016/j.sste.2012.12.001.
- Blangiardo M., Finazzi F., Cameletti M. Two-stage Bayesian model to evaluate the effect of air pollution on chronic respiratory diseases using drug prescriptions. Spatial Spatio-temporal Epidemiol. 2016;18:1–12. doi: 10.1016/j.sste.2016.03.001. Environmental Exposure and Health.
- Bowatte G., Erbas B., Lodge C.J., Knibbs L.D., Gurrin L.C., Marks G.B., Thomas P.S., Johns D.P., Giles G.G., Hui J., Dennekamp M., Perret J.L., Abramson M.J., Walters E.H., Matheson M.C., Dharmage S.C. Traffic-related air pollution exposure over a 5-year period is associated with increased risk of asthma and poor lung function in middle age. Eur. Respir. J. 2017;50(4) doi: 10.1183/13993003.02357-2016.
- COMEAP G. Crown; 2010. The Mortality Effects of Long-Term Exposure to Particulate Air Pollution in the United Kingdom.
- Cressie N. Wiley; New York: 1993. Statistics for Spatial Data.
- Das K., Das S., Dhundasi S. Nickel, its adverse health effects & oxidative stress. Indian J. Med. Res. 2008;128(4):412–425.
- Diggle P., Ribeiro P. Model-Based Geostatistics. Springer; 2007. (Springer Series in Statistics).
- DIVA-GIS P. 2020. DIVA-GIS: Free spatial data. https://www.diva-gis.org/gdata.
- ECAD P. 2020. The European climate assessment & dataset. https://www.ecad.eu/
- EEA P. 2020. The European environment agency: Air quality e-reporting. https://www.eea.europa.eu/
- Elliott P., Wakefield J., Best N., Briggs D. first ed. Oxford University Press; 2000. Spatial Epidemiology: Methods and Applications.
- Gryparis A., Paciorek C., Zeka A., Schwartz J., Coull B. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics. 2009;10(2):258–274. doi: 10.1093/biostatistics/kxn033.
- Heads or Tails A. 2020. COVID-19 tracking Germany, version 156. https://www.kaggle.com/headsortails/covid19-tracking-germany/version/156.
- Huang G., Lee D., Scott E.M. Multivariate space-time modelling of multiple air pollutants and their health effects accounting for exposure uncertainty. Stat. Med. 2018;37(7):1134–1148. doi: 10.1002/sim.7570.
- Järup L., Berglund M., Elinder C.G., Nordberg G., Vanter M. Health effects of cadmium exposure – a review of the literature and a risk estimate. Scand. J. Work Environ. Health. 1998;24:1–51.
- Lawson A.B. first ed. Chapman and Hall/CRC Press; 2008. Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology.
- Lee D. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spatial Spatio-temporal Epidemiol. 2011;2(2):79–89. doi: 10.1016/j.sste.2011.03.001.
- Lee D. Using spline models to estimate the varying health risks from air pollution across Scotland. Stat. Med. 2012;31(27):3366–3378. doi: 10.1002/sim.5420.
- Lee D., Ferguson C., Mitchell R. Air pollution and health in Scotland: a multicity study. Biostatistics. 2009;10(3):409–423. doi: 10.1093/biostatistics/kxp010.
- Lee D., Mukhopadhyay S., Rushworth A., Sahu S.K. A rigorous statistical framework for spatio-temporal pollution prediction and estimation of its long-term impact on health. Biostatistics. 2017;18(2):370–385. doi: 10.1093/biostatistics/kxw048.
- Lee D., Robertson C., Ramsay C., Pyper K. Quantifying the impact of the modifiable areal unit problem when estimating the health effects of air pollution. Environmetrics. 2020
- Leroux B., Lei X., Breslow N. Springer-Verlag; New York: 1999. Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence; pp. 135–178.
- Maheswaran R., Haining R., Pearson T., Law J., Brindley P., Best N. Outdoor NOx and stroke mortality adjusting for small area level smoking prevalence using a Bayesian approach. Stat. Methods Med. Res. 2006;15:499–516. doi: 10.1177/0962280206071644.
- Moran P. Notes on continuous stochastic phenomena. Biometrika. 1950;37:17–23.
- MOW P. 2020. Germany Weather map. https://www.mapsofworld.com/germany/thematic-maps/germany-temperature-map.html.
- Napier G., Lee D., Robertson C., Lawson A. A Bayesian space-time model for clustering areal units based on their disease trends. Biostatistics. 2018 doi: 10.1093/biostatistics/kxy024.
- Ogen Y. Assessing nitrogen dioxide (NO2) levels as a contributing factor to coronavirus (COVID-19) fatality. Sci. Total Environ. 2020;726 doi: 10.1016/j.scitotenv.2020.138605.
- Rue H., Held L. Chapman and Hall/CRC; New York: 2005. Gaussian Markov Random Fields: Theory and Applications.
- Rue H., Martino S., Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 2009;71(2):319–392.
- Rushworth A., Lee D., Mitchell R. A spatio-temporal model for estimating the long-term effects of air pollution on respiratory hospital admissions in Greater London. Spatial Spatio-temporal Epidemiol. 2014;10:29–38. doi: 10.1016/j.sste.2014.05.001.
- Sacks J.D., Rappold A.G., Allen Davis Jr. J., Richardson D.B., Waller A.E., Luben T.J. Influence of urbanicity and county characteristics on the association between ozone and asthma emergency department visits in North Carolina. Environ. Health Perspect. 2014;122(5):506–512. doi: 10.1289/ehp.1306940.
- Sanchez-Meca J., Marín-Martínez F. Weighting by inverse variance or by sample size in meta-analysis: A simulation study. Educat. Psychol. Measur. 1998;58(2):211–220.
- Schikowski T., Sugiri D., Ranft U., Gehring U., Heinrich J., Wichmann H.-E., Kraemer U. Long-term air pollution exposure and living close to busy roads are associated with COPD in women. Respir. Res. 2005;6:152. doi: 10.1186/1465-9921-6-152.
- Shaddick G., Zidek J. Spatio-Temporal Methods in Environmental Epidemiology. Chapman & Hall; UK United Kingdom: 2015. (CRC Texts in Statistical Science).
- Smith M.T. Advances in understanding benzene health effects and susceptibility. Annu. Rev. Public Health. 2010;31(1):133–148. doi: 10.1146/annurev.publhealth.012809.103646.
- The City Population M.T. 2019. GERMANY: Administrative division: States and counties. https://www.citypopulation.de/en/germany/admin/
- Vinikoor-Imler L.C., Davis J.A., Meyer R.E., Messer L.C., Luben T.J. Associations between prenatal exposure to air pollution, small for gestational age, and term low birthweight in a state-wide birth cohort. Environ. Res. 2014;132:132–139. doi: 10.1016/j.envres.2014.03.040.
- Wakefield J., Salway R. A statistical framework for ecological and aggregate studies. J. R. Stat. Soc. A. 2001;164(1):119–137.
- Warren J., Fuentes M., Herring A., Langlois P. Bayesian Spatial-temporal model for cardiac congenital anomalies and ambient air pollution risk assessment. Environmetrics. 2012;23(8):673–684. doi: 10.1002/env.2174.
- Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 2010;11:3571–3594.
- Wu X., Nethery R.C., Sabath B.M., Braun D., Dominici F. Exposure to air pollution and COVID-19 mortality in the United States. medRxiv. 2020 doi: 10.1126/sciadv.abd4049. https://www.medrxiv.org/content/10.1101/2020.04.05.20054502v2.
- Yu W.H., Harvey C.M., Harvey C.F. Arsenic in groundwater in Bangladesh: A geostatistical and epidemiological framework for evaluating health effects and potential remedies. Water Resour. Res. 2003;39(6)



