2022 Aug 27;34(1):79. doi: 10.1186/s12302-022-00657-5

Association between short-term exposure to air pollution and COVID-19 mortality in all German districts: the importance of confounders

Gregor Miller 1, Annette Menzel 2, Donna P Ankerst 1,2
PMCID: PMC9418649  PMID: 36062033

Abstract

Background

The focus of many studies is to estimate the effect of risk factors on outcomes, yet results may depend on which other risk factors or potential confounders are included in a statistical model. For complex and unexplored systems, such as the spread of COVID-19, where a priori knowledge of potential confounders is lacking, data-driven empirical variable selection methods are often the primary option. Published studies often lack a sensitivity analysis of how results depend on the choice of confounders in the model. This study showed variability in associations of short-term air pollution with COVID-19 mortality in Germany under multiple approaches to accounting for confounders in statistical models.

Methods

Associations between air pollution variables PM2.5, PM10, CO, NO, NO2, and O3 and cumulative COVID-19 deaths in 400 German districts were assessed via negative binomial models for two time periods, March 2020–February 2021 and March 2021–February 2022. Prevalent methods for adjustment of confounders were identified after a literature search, including change-in-estimate and information criteria approaches. The methods were compared to assess the impact on the association estimates of air pollution and COVID-19 mortality considering 37 potential confounders.

Results

Univariate analyses showed significant negative associations with COVID-19 mortality for CO, NO, and NO2, and positive associations, at least for the first time period, for O3 and PM2.5. However, these associations became non-significant when other risk factors were accounted for in the model, in particular after adjustment for mobility, political orientation, and age. Model estimates from most selection methods were similar to models including all risk factors.

Conclusion

Results highlight the importance of adequately accounting for high-impact confounders when analyzing associations of air pollution with COVID-19 and show that comparing multiple selection approaches can be helpful. This study showed how model selection can be performed using different methods in the context of high-dimensional and correlated covariates when important confounders are not known a priori. Apparent associations between air pollution and COVID-19 mortality failed to reach significance when leading selection methods were used.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12302-022-00657-5.

Keywords: Variable selection, COVID-19, Air quality, Pollution, Change-in-estimate, LASSO, AIC, BIC, Cross-sectional

Background

Given the worldwide impact of COVID-19 on all sectors of society, a growing number of studies have sought to identify the risk factors shaping the spread and severity of the disease. Studies based on aggregated and individual-level data have identified a multitude of clinical and demographic risk factors, including age [1–4], gender [2–5], ethnicity [5], income [5, 6], education [5], mobility [7], obesity [4, 8–10], hypertension [3, 4, 10–12], cardiovascular disease [2, 11, 12], respiratory disease [4, 12], pneumonia [1, 2, 4], history of cancer [10, 13], diabetes [3, 4, 10–12], and chronic kidney disease [4, 14, 15]. The large number of potential confounders in COVID-19 studies and the heterogeneity of approaches to adjust for them call for strategies to assess the robustness of results. Air quality is one of the factors hypothesized to play a role; however, the results of previous studies are heterogeneous.

To illustrate these issues, this study focuses on the relationship between air quality, one of the pressing environmental concerns, and COVID-19-related morbidity and mortality. Pollution is the largest environmental cause of death, responsible for 16% of all deaths worldwide [16]. Exposure to air pollution increases the risk of hospital admission for cardiovascular and respiratory diseases [17] and increases overall mortality [18]. The predominant effect of COVID-19 on the upper and lower respiratory tract [19] can be expected to be compounded by the additional effects of air pollution and of individual risk factors, such as smoking and a history of cardiovascular disease, leading to multiple pathways that affect patient outcomes. For example, pollution could affect susceptibility to COVID-19 by increasing hypertension and cardiovascular disease [20] or by weakening the host defenses of the respiratory system [21, 22]. Airborne particles might also serve as carriers for pathogens, thereby facilitating the dominant airborne transmission route [23, 24]. Multiple studies have analyzed aspects of the association between air quality and COVID-19 outcomes, including infections and mortality (Table 1). Some studies did not account for any confounders, others only for a small fixed set. Studies adjusting for a wider range of confounders more often failed to find significant associations.

Table 1.

Overview of selected publications studying associations between air quality and COVID-19 statistics

| Study | Approach | Result | Area | Time |
| --- | --- | --- | --- | --- |
| Ogen [61] | Categorized NO2 measurements were compared | The results indicated a strong association between high values of the pollutant and high fatality cases | 66 administrative regions in Italy, Spain, France, and Germany | January to February 2020 |
| Bashir et al. [62] | The individual correlations between risk factors and new infections, total infections, and mortality were measured on a daily basis using Kendall and Spearman rank correlation. It is not clear which measurement was used to determine air quality | Besides temperature, air quality was significantly correlated with the COVID-19 metrics | New York City, USA | March to April 2020 |
| Accarino et al. [63] | The Spearman correlation between PM2.5, PM10, and NO2 and the COVID-19 incidence and mortality rates was measured | Significant associations were found for all of them | 107 Italian territorial areas | February and March 2020 |
| Zhu et al. [64] | Daily infections, meteorological variables, and air pollution concentrations for PM2.5, PM10, SO2, CO, NO2, and O3 were collected. Generalized additive models were used to estimate the associations between lagged, moving-average concentrations of air pollutants and daily infections | Significant positive associations for PM2.5, PM10, CO, NO2, and O3 and a negative association for SO2 were shown | 120 Chinese cities | January to February 2020 |
| Stieb et al. [41] | A negative binomial model was used to measure the association between PM2.5 from 2000 to 2016 and infection counts. The Akaike information criterion was used to some extent to select from the socio-demographic, health, time since peak incidence, and temperature variables | The multivariate model did not show a significant association for PM2.5 | 111 Canadian regions | Up to May 13, 2020 |
| Wu et al. [65] | Negative binomial mixed models were used to regress the mortality rate on PM2.5 and 20 other confounders. Particulate matter concentrations between 2000 and 2016 were considered | A notable association was found for PM2.5, population density, days since first reported case, household income, percent of owner-occupied housing, high school education, age, and percent of Black residents | 3089 US counties | Up to June 18, 2020 |
| Rodriguez-Villamizar et al. [42] | A negative binomial hurdle model was used to analyze the effect of PM2.5 measured between 2014 and 2018 on COVID-19 mortality, including socio-demographic, socio-economic, and health confounders | PM2.5 did not show a significant association with mortality | 772 Colombian municipalities | Up to July 17, 2020 |
| Adhikari et al. [43] | A negative binomial regression was applied to time-series data. Besides daily PM2.5 and ozone, meteorological confounders were included | Ozone was significantly associated with daily infections but not with deaths | Queens County, New York, USA | March to April 2020 |
| Borro et al. [66] | Simple linear regressions were performed for cumulative COVID-19 incidence, mortality rate, and case-fatality rate with PM2.5 as predictor | Significant associations were found for all three metrics | 110 Italian provinces | February to March 2020 |
| Travaglio et al. [44] | Negative binomial models were used to measure the association between PM2.5, PM10, NO, NO2, and O3 and COVID-19 cases and deaths. Population density, average age, and mean earnings were included as confounders. Pre-pandemic air quality data were aggregated over one and five years | Both COVID-19 metrics showed significant associations with the air quality risk factors | England at regional and sub-regional level | February to May 2020 |
| Tieskens et al. [67] | The incidence in five distinct time periods was analyzed via mixed-effects Poisson regression. Besides PM2.5, 19 other socio-demographic, occupational, and mobility variables were incorporated. Variables were selected by excluding covariates with a variance inflation factor higher than 2.5 in the regression of the first time period | PM2.5 was not selected, yet almost all selected socio-demographic and economic variables showed strongly varying associations across the time periods | 351 cities in Massachusetts, USA | March to October 2020 |
| Liang et al. [45] | Zero-inflated negative binomial models were used to determine the association between NO2, PM2.5, and O3 and case-fatality and mortality rates. Air quality measurements between 2010 and 2016 were considered. The models also included socio-demographic, socio-economic, health, and mobility variables | For NO2, a positive association with the COVID-19 metrics was found | 3122 US counties | January to July 2020 |

Targeted association analyses, such as the analysis of air pollution and COVID-19 outcomes here, aim both to estimate independent effect sizes accurately and to determine statistical significance. Estimating the independent effect of a risk factor of primary interest requires adjustment for all potential confounders, many of which may be correlated with one another. Although the liberal use of data-driven variable selection methods to control for confounding has been criticized [25–27], such methods remain in widespread use. Among articles in four major epidemiological journals in 2015, half used prior knowledge or causal graphs to select variables, 12% a change-in-estimate approach, 9% stepwise methods, 5% univariate analyses, 2% other methods, and 37% did not report their methods in detail [28]. Within individual studies, the robustness of the primary association analyses to the choice of confounders is typically not assessed, although sensitivity to this choice has been demonstrated even with small numbers of confounders [29]. The objective of this study was to compare the associations of air pollution with COVID-19 mortality obtained in a high-dimensional setting when leading epidemiological methods for confounder selection are applied.

Methods

The outcomes of interest were cumulative COVID-19 mortality counts for the 400 districts in Germany in two time periods, March 2020–February 2021 and March 2021–February 2022, extracted from the Robert Koch-Institut [30] (dl-de/by-2-0 [31]) (Additional file 1: Figure S1). During the analyzed time frame, the local health departments of two districts, Eisenach and Wartburgkreis, merged, so their COVID-19 counts were reported jointly. The two time periods were chosen to reflect an initial phase, in which lockdowns led to decreased mobility and pollution, and a later re-opening phase with increased levels. An advantage of a single-country analysis is that data availability and measurement processes are standardized, at least to a certain degree, which is especially relevant given the international and temporal differences in reporting COVID-19 statistics [32, 33]. Furthermore, Germany, the most populous member state of the European Union, provides extensive data on potential confounders at high spatial resolution. COVID-19 death counts were used as the outcome because infection counts are subject to considerable and fluctuating under-ascertainment. Although mortality data are also undercounted and vary over time, they are considered more complete than infection data [32, 33].

Risk factors

For the association analyses with cumulative COVID-19 counts, 43 risk factors were assembled for the 400 German districts over the two time periods (Additional file 1: Table S1). Daily CO, NO, NO2, O3, PM10, and PM2.5 estimates were extracted from the ENSEMBLE dataset of the Copernicus Atmosphere Monitoring Service; they refer to surface concentrations at noon on a 0.1° horizontal grid covering all of Germany [34]. The ENSEMBLE dataset combines the output of nine numerical air quality models and thereby achieves a higher degree of robustness than any individual model. Daily district-level estimates were obtained from the extracted grid polygons as a weighted mean, with each polygon weighted by the share of the district area it covered. For each of the two time periods and each district, the daily values were then averaged for inclusion as risk factors in the models.¹
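The aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each district's daily grid data are available as (value, overlap_area) pairs, where overlap_area is the part of the district covered by one grid polygon, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (pollutant value, area of the district covered by this grid polygon)
Cell = Tuple[float, float]


def district_daily_mean(cells: List[Cell]) -> float:
    """Area-weighted mean of the grid values overlapping one district on one day."""
    total_area = sum(area for _, area in cells)
    return sum(value * area for value, area in cells) / total_area


def period_average(cells_by_day: Dict[str, List[Cell]]) -> float:
    """Average the daily district means over all days of an analysis period."""
    return mean(district_daily_mean(cells) for cells in cells_by_day.values())


# Hypothetical district covered 75% by one grid polygon and 25% by another.
day_1 = [(10.0, 3.0), (20.0, 1.0)]  # (10*3 + 20*1) / 4 = 12.5
day_2 = [(14.0, 1.0)]               # a single polygon gives 14.0
print(period_average({"2020-03-01": day_1, "2020-03-02": day_2}))  # 13.25
```

Weighting by overlap area means a polygon that only clips the edge of a district contributes little, which is the behavior the text describes.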

Socio-demographic, health infrastructure, political, educational, and socio-economic variables were extracted from the German Federal and State Statistical Offices [35, 36] (license: dl-de/by-2-0 [31], tables: 12411-0015, 11111-0002, 12411-0018, 12521-0040, 12521-0041, 12531-0040, AI014-1, AI014-2, AI003-2, AI005, AI-N-01-2, AI-N-10, AI-S-01, AI007-1). Political variables referred to the 2017 federal election; gross domestic product, disposable income, and employee distribution to 2018; and education level, socio-demographic variables, proportion of settlement and traffic area, and health infrastructure to 2019, except for hospital bed density (2017). Geographic data on district area were acquired from the OpenDataLab [37] (Geodatenzentrum © GeoBasis-DE/BKG 2018 (VG250 31.12., Data changed)).

Daily mobility data were extracted from the Google Community Mobility Report [38] and were only available at the state level for the 16 German states. The mobility data quantified the change in the number of visits to, and length of stay at, certain places, including groceries and pharmacies, parks, residences, retail and recreational areas, transit stations, and workplaces, relative to a reference period between January 3 and February 6, 2020. Daily values were averaged over the respective time periods. Flu and vaccination data were extracted from the Robert Koch-Institut [39, 40] (dl-de/by-2-0 [31]). The mean of the reported yearly flu incidences between 2017 and 2019 was calculated for each district. Daily vaccination rates were defined as the number of people who had received full vaccination in the district of vaccination divided by the population of that district. Vaccination rates at the end of each period were used for analysis.

Statistical methods

Two-sample t-tests were used to assess differences in risk factors between the two time periods, with two-sided tests at the 0.05 level considered statistically significant. Correlations between risk factors were assessed with Spearman's rank correlation, and the corresponding p-values were approximated using the t-distribution. Negative binomial regression was used for the univariate and multivariate association analyses of risk factors with cumulative COVID-19 mortality counts, with the logarithm of the population size as offset. Negative binomial regression extends Poisson regression with a dispersion parameter that accommodates overdispersion (variance exceeding the mean) and is hence commonly used in COVID-19 studies [41–45]. The exponentiated coefficient estimates of the negative binomial model are called incidence rate ratios (IRR).
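For reference, the model and the correlation test described above can be written out as below. This is a sketch under stated assumptions: the variance form is the NB2 parameterization used by `glm.nb` in the MASS package, which the study cites, while the symbols are introduced here for illustration: $Y_i$ is the cumulative death count in district $i$, $P_i$ its population, $x_i$ the pollutant, $z_{ik}$ the other covariates, $r_s$ the Spearman coefficient, and $n = 400$ districts.

```latex
\begin{aligned}
Y_i &\sim \mathrm{NB}(\mu_i, \theta), \qquad \operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta},\\
\log \mu_i &= \log P_i + \beta_0 + \beta x_i + \sum_{k} \gamma_k z_{ik},\\
\mathrm{IRR} &= e^{\beta},\\
t &= r_s \sqrt{\frac{n - 2}{1 - r_s^2}} \sim t_{n-2} \quad \text{under } H_0\colon \rho_s = 0 .
\end{aligned}
```

As $\theta \to \infty$ the variance reduces to $\mu_i$ and the model becomes Poisson. Because of the offset $\log P_i$, the model describes the mortality rate $\mu_i / P_i$, so the IRR is the multiplicative change in that rate per one-unit increase in $x_i$ with the other covariates held fixed.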

Because some of the air pollution variables (CO, NO, NO2, O3, PM10, and PM2.5) are highly correlated, each was analyzed separately. A literature search identified leading methods for variable selection, which were investigated as part of a sensitivity analysis [28, 46, 47]. Additionally, basic and full models were analyzed, including either only the air pollution variable under consideration or all other risk factors as well. Selection methods were applied such that the respective air pollution variable, the target parameter, was always included in the model. This distinguishes the approach of this paper from applications concerned only with prediction or equally interested in all risk factor effects.

Variable selection and model fitting, including the basic and full models, were repeated on 100 bootstrap samples, and confidence intervals for the coefficient estimates and for the number of included covariates were obtained from the bootstrap quantiles [48]. All calculations were implemented in R version 4.1.2 [49] using the packages MASS [50], furrr [51], mpath [52], Hmisc [53], and lmtest [54].
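A minimal sketch of the percentile bootstrap described above, written in Python for illustration rather than the R code used in the study. `estimator` is a hypothetical callback standing in for the whole pipeline (variable selection plus model fit) that returns one number, such as the pollutant coefficient or the number of selected covariates; rerunning it on every resample is what carries the selection uncertainty into the interval. The quantile rule (linear interpolation, R's default type 7) is an assumption, since the study does not state which rule was used.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics (R type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_percentile_ci(
    rows: Sequence,
    estimator: Callable[[List], float],
    n_boot: int = 100,
    level: float = 0.95,
    seed: int = 1,
) -> Tuple[float, float, float]:
    """Return (median, lower, upper) of the bootstrap distribution of `estimator`.

    Each replicate resamples the rows (here: districts) with replacement and
    reruns the full estimator, including any variable selection inside it.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        estimator([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    return (
        quantile(estimates, 0.5),
        quantile(estimates, alpha),
        quantile(estimates, 1.0 - alpha),
    )


# Toy usage: the "estimator" is simply the mean of 20 hypothetical values.
data = [float(i) for i in range(20)]
print(bootstrap_percentile_ci(data, statistics.mean))
```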

Selection methods

The traditional stepwise selection method based on significance uses p-values to decide whether a covariate should be included in the model; in this study, the criterion p < 0.05 was used. In the forward approach, the starting model is the basic model, which includes only the current air pollution covariate. At each iteration, every remaining candidate variable is added to the current model one at a time, each resulting model is compared with the current model via the likelihood ratio test, and the model with the smallest p-value is selected. The process is repeated until all new candidate models have p ≥ 0.05 or all potential covariates are included. In the backward variant, the full model is the starting model, and at each step the variable whose exclusion yields the largest p-value is discarded. The process stops when all new candidate models have p < 0.05 or only the air pollution covariate remains. A drawback of the significance approach is that it can only determine whether a risk factor is relevant given the other risk factors already in the model.
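The forward variant can be sketched as below. `loglik` is a hypothetical callback returning the maximized log-likelihood of a model with a given set of variables (in the study, a negative binomial fit). Each covariate is assumed to add exactly one parameter, so the likelihood ratio statistic $x$ has one degree of freedom and its p-value is $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. Only the forward direction is shown; the toy log-likelihood is invented for illustration.

```python
import math
from typing import Callable, FrozenSet, Iterable

LogLik = Callable[[FrozenSet[str]], float]


def lrt_pvalue_1df(ll_small: float, ll_big: float) -> float:
    """p-value of a 1-df likelihood ratio test: P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    stat = max(0.0, 2.0 * (ll_big - ll_small))
    return math.erfc(math.sqrt(stat / 2.0))


def forward_select_lrt(
    target: str, candidates: Iterable[str], loglik: LogLik, alpha: float = 0.05
) -> FrozenSet[str]:
    """Forward selection by LRT p-value; the target pollutant is always kept."""
    selected = frozenset([target])
    remaining = set(candidates) - selected
    while remaining:
        ll_current = loglik(selected)
        best_var, best_p = None, 1.0
        for var in sorted(remaining):  # sorted for deterministic tie-breaking
            p = lrt_pvalue_1df(ll_current, loglik(selected | {var}))
            if p < best_p:
                best_var, best_p = var, p
        if best_var is None or best_p >= alpha:
            break  # no remaining candidate improves the model significantly
        selected = selected | {best_var}
        remaining.remove(best_var)
    return selected


# Toy log-likelihood: each covariate adds a fixed (hypothetical) gain.
gains = {"age": 10.0, "mobility": 3.0, "noise": 0.5}
toy_loglik = lambda model: -100.0 + sum(gains.get(v, 0.0) for v in model)
print(sorted(forward_select_lrt("NO2", gains, toy_loglik)))  # ['NO2', 'age', 'mobility']
```

Here "noise" raises the log-likelihood by 0.5 (LR statistic 1, p ≈ 0.32), so the search stops before adding it.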

The Akaike (AIC) and Bayesian information criteria (BIC) were also used, again starting from the basic or full model for the forward or backward specification, respectively. In this case, the model with the smallest AIC or BIC value was selected. With these criteria, both inclusion and exclusion of covariates could be considered at each step when comparing models, regardless of the initial model. In general, the BIC penalizes additional covariates more severely and therefore favors smaller models. Information criteria allow the user to sort through very large numbers of models while remaining computationally efficient. However, like all stepwise approaches, they do not guarantee stable results: small changes in the data may lead to very different selections.
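The bidirectional search with AIC or BIC can be sketched in the same style, again with a hypothetical `loglik` callback; it is an illustration of a criterion-driven stepwise search, not a reproduction of the study's R code. The parameter count assumes an intercept plus the negative binomial dispersion parameter in every model (`extra_params=2`). Such shared constants do not change which model wins, whereas the per-covariate penalty (2 for AIC, ln n for BIC) does.

```python
import math
from typing import Callable, FrozenSet, Iterable, Optional


def stepwise_ic(
    target: str,
    candidates: Iterable[str],
    loglik: Callable[[FrozenSet[str]], float],
    n_obs: int,
    start: Optional[Iterable[str]] = None,
    kind: str = "AIC",
    extra_params: int = 2,
) -> FrozenSet[str]:
    """Bidirectional stepwise search minimising AIC (2k - 2ll) or BIC (k ln n - 2ll).

    Every step scores all single additions and all single removals (the target
    is never removed), so the search can move in both directions from any start.
    """
    penalty = 2.0 if kind == "AIC" else math.log(n_obs)
    candidates = list(candidates)

    def score(model: FrozenSet[str]) -> float:
        k = len(model) + extra_params
        return penalty * k - 2.0 * loglik(model)

    current = frozenset(start or []) | {target}
    best = score(current)
    while True:
        moves = [current | {v} for v in candidates if v not in current]
        moves += [current - {v} for v in current if v != target]
        if not moves:
            return current
        # Ties are broken by the sorted variable names for reproducibility.
        step = min(moves, key=lambda m: (score(m), sorted(m)))
        step_score = score(step)
        if step_score >= best:
            return current  # no single move lowers the criterion
        current, best = step, step_score


gains = {"age": 10.0, "mobility": 2.0, "noise": 0.5}
toy_loglik = lambda model: -100.0 + sum(gains.get(v, 0.0) for v in model)
print(sorted(stepwise_ic("NO2", gains, toy_loglik, n_obs=400, kind="AIC")))  # ['NO2', 'age', 'mobility']
print(sorted(stepwise_ic("NO2", gains, toy_loglik, n_obs=400, kind="BIC")))  # ['NO2', 'age']
```

With 400 observations the BIC charges ln 400 ≈ 5.99 per covariate instead of 2, so the hypothetical "mobility" term (log-likelihood gain 2, i.e. a drop of 4 in −2ℓ) survives under AIC but not under BIC, illustrating why BIC favors smaller models.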

In the change-in-estimate (CIE) method, the selection criterion is based on the change in the coefficient estimate of the target parameter, in our case the air pollution variable. The method is implemented in many different variants. In the predominant variant [55], a full model is fitted that includes the target parameter and all possible confounders. Confounders are then removed one at a time until no confounder can be removed without changing the target parameter estimate by more than a threshold relative to the estimate of the initial model. The change-in-estimate is defined as:

$$\Delta\mathrm{CE} = \frac{\mathrm{CE}_i - \mathrm{CE}_0}{\mathrm{CE}_0},$$

where $\mathrm{CE}_i$ is the target parameter estimate of the considered model with one of the confounders removed and $\mathrm{CE}_0$ is the estimate of the initial model. In this backward variant, the confounder whose removal leads to the smallest change is removed, as long as that change is smaller than ten percent in magnitude. In the forward alternative, the initial model is the basic model without confounders, and the confounder leading to the largest change is added as long as the change-in-estimate exceeds ten percent. A variant in which the change-in-estimate is computed relative to the estimate of the model from the previous step, rather than of the initial model, was also considered. The CIE approach offers an intuitive way to exclude and include risk factors based directly on changes in the coefficient estimate; however, the choice of the decision threshold may be even more arbitrary than in other methods.
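The backward CIE rule, including the previous-step variant, can be sketched as below. `estimate` is a hypothetical callback that refits the model on a set of variables and returns the target coefficient. Comparing the relative change by its absolute value is an assumption (the text only states a ten percent threshold), as is the alphabetical tie-breaking. The forward variant would mirror this loop, adding the confounder with the largest change while it exceeds the threshold.

```python
from typing import Callable, FrozenSet, Iterable


def cie_backward(
    target: str,
    confounders: Iterable[str],
    estimate: Callable[[FrozenSet[str]], float],
    threshold: float = 0.10,
    relative_to: str = "initial",
) -> FrozenSet[str]:
    """Backward change-in-estimate selection; the target is always kept.

    relative_to="initial" compares every candidate model with CE_0 of the full
    model; relative_to="previous" compares with the model of the previous step.
    """
    model = frozenset(confounders) | {target}
    ce_ref = estimate(model)  # CE_0; must be non-zero for the relative change
    while len(model) > 1:
        options = []
        for conf in sorted(model - {target}):
            reduced = model - {conf}
            ce_i = estimate(reduced)
            delta = abs((ce_i - ce_ref) / ce_ref)  # |Delta CE| for dropping conf
            options.append((delta, conf, reduced, ce_i))
        delta, _, reduced, ce_i = min(options)  # ties: alphabetical by name
        if delta >= threshold:
            break  # every removal would move the estimate too much
        model = reduced
        if relative_to == "previous":
            ce_ref = ce_i
    return model


# Toy estimate: each confounder shifts the (hypothetical) target coefficient.
shifts = {"age": 0.06, "mobility": 0.06}
toy_estimate = lambda model: 1.0 + sum(shifts.get(v, 0.0) for v in sorted(model))
print(sorted(cie_backward("NO2", shifts, toy_estimate)))  # ['NO2', 'mobility']
print(sorted(cie_backward("NO2", shifts, toy_estimate, relative_to="previous")))  # ['NO2']
```

The toy case shows how the two variants can disagree: each removal changes the estimate by only about 5 to 6% relative to the previous step, but removing both confounders changes it by about 11% relative to the initial model.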

Finally, a selection approach that is usually not presented among the traditional selection methods but has been used in various studies was also implemented [56, 57]. In this approach, the least absolute shrinkage and selection operator (LASSO) is used to select the relevant variables. LASSO is a shrinkage estimator that penalizes the likelihood, thereby shrinking some coefficient estimates to exactly zero. Because the penalized coefficient estimates are biased, the variables with non-zero coefficients are selected and the model is refitted with them to obtain interpretable coefficient estimates. Cross-validation was used to set the penalty hyperparameter of the procedure, and no penalty was applied to the air pollution variable to guarantee that it stayed in the model. The shrinkage approach of LASSO allows a more robust selection of risk factors than the other methods; however, it precludes direct interpretation of the penalized coefficient estimates.
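To make the select-then-refit idea concrete, the sketch below swaps in the simplest case, a least-squares LASSO fitted by cyclic coordinate descent, in place of the penalized negative binomial likelihood the study fitted with mpath. The penalty λ is fixed by hand rather than chosen by cross-validation, and all names are hypothetical. A penalty factor of zero on the pollutant column mirrors how the air pollution variable was kept in the model, and the refit with λ = 0 removes the shrinkage from the selected coefficients.

```python
from typing import Dict, List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    penalty_factors: Sequence[float],
    n_iter: int = 500,
    tol: float = 1e-12,
) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    Assumes centred y and columns (no intercept) and no all-zero column.
    A penalty factor w_j = 0 leaves coefficient j unpenalised, so it stays in.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            max_step = max(max_step, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_step < tol:
            break
    return b


def lasso_then_refit(X, y, lam, penalty_factors) -> Dict[int, float]:
    """Select columns with non-zero LASSO coefficients, then refit them unpenalised."""
    b = lasso_cd(X, y, lam, penalty_factors)
    selected = [j for j, bj in enumerate(b) if bj != 0.0]
    X_sel = [[row[j] for j in selected] for row in X]
    refit = lasso_cd(X_sel, y, 0.0, [0.0] * len(selected))  # lam = 0: least squares
    return dict(zip(selected, refit))


# Column 0 plays the pollutant (penalty factor 0), column 1 a confounder.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.0, -1.0, 1.0, -3.0]  # exactly 2*x0 + 1*x1
print(lasso_cd(X, y, 0.5, [0.0, 1.0]))          # [2.0, 0.5]: confounder shrunk
print(lasso_then_refit(X, y, 0.5, [0.0, 1.0]))  # {0: 2.0, 1: 1.0}: shrinkage undone
print(lasso_then_refit(X, y, 2.0, [0.0, 1.0]))  # {0: 2.0}: confounder dropped
```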

Results

Before the model selection approaches are compared to evaluate the effects of air pollution on COVID-19 mortality, the data are first explored visually and quantitatively. Comparisons between the first and second year after the start of the COVID-19 pandemic (Additional file 1: Table S1) showed that all air pollution variables increased significantly except for ozone, which decreased significantly (all p < 0.001). NO, NO2, PM2.5, and PM10 more than doubled in the second period. Visits to grocery stores and pharmacies increased relative to the reference period in the first year (median of district values: 16.5%), but this increase shrank in the second year (7.0%, p < 0.001). Activity in parks remained at an elevated level (57.0% and 58.5%), while activity at transit stations and workplaces decreased in the first period (−13.5% and −2.0%) and then dropped even further (−31.0% and −26.0%, both p < 0.001). While the number of infections increased from the first to the second period (27.9 to 151.1 infections per 1000 inhabitants, p < 0.001), the number of deaths decreased (86.6 to 51.5 deaths per 100 000 inhabitants, p < 0.001).

High correlations between some of the air pollution variables indicated that their associations with mortality needed to be estimated separately (Additional file 1: Figure S2). NO2 and NO (Spearman rank correlation coefficient: 0.92) as well as PM2.5 and PM10 (0.91) were highly correlated. Other covariates also showed high correlations that could lead to multicollinearity. For example, transit station mobility was highly correlated with activity in retail and recreation (0.90) and at workplaces (0.89), while residential and workplace mobility were negatively correlated (−0.94). Other examples of significant correlations were between population density and the proportion of urban area in a district (0.95), between the proportions of males and females aged at least 75 years (0.91), and between the proportions of people working in the service sector and in manufacturing (−0.99). All of these examples had p-values smaller than 0.0001.
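One way to quantify the multicollinearity mentioned above is the variance inflation factor (VIF), a standard diagnostic that this study does not itself report. For covariate $j$, $R_j^2$ is the coefficient of determination from regressing it on the other covariates; with only two covariates, $R_j^2$ equals the squared Pearson correlation. As a rough illustration that treats the reported Spearman coefficients as if they were Pearson correlations:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
r = 0.92:\ \mathrm{VIF} = \frac{1}{1 - 0.8464} \approx 6.5, \qquad
r = -0.99:\ \mathrm{VIF} = \frac{1}{1 - 0.9801} \approx 50.3
```

Under this approximation, entering NO and NO2 in the same model would inflate the variance of each coefficient estimate roughly 6.5-fold, which illustrates why the pollutants were analyzed one at a time.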

The univariate analyses showed that O3 had a positive association with COVID-19 mortality in both time periods (IRR first period: 1.02, p < 0.001; IRR second period: 1.01, p: 0.031) (Additional file 1: Table S2). Another significant positive association was found for PM2.5 in the first period (IRR: 1.07, p: 0.009), which, however, lost significance in the second period (p: 0.4). Significant negative associations were estimated for NO, NO2, and CO in the first period (IRR: 0.90, 0.95, 1.00; p: 0.013, 0.002, 0.022), and these remained stable in the second time period.
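As a worked reading of these numbers: because $\mathrm{IRR} = e^{\beta}$, effects of larger changes multiply, so the first-period PM2.5 estimate compounds as shown below. The unit of $\Delta x$ is whatever unit PM2.5 entered the model in, which this section does not state; the 10-unit step is only an illustration.

```latex
\mathrm{IRR} = e^{\hat\beta} = 1.07 \;\Rightarrow\; \hat\beta = \ln 1.07 \approx 0.0677,
\qquad
\mathrm{IRR}_{\Delta x} = e^{\hat\beta\,\Delta x} = 1.07^{\Delta x},
\qquad
1.07^{10} \approx 1.97
```

By the same logic, an IRR reported as 1.00, as for CO, can still differ significantly from 1: the figure is rounded and refers to a single unit of a covariate that may vary over many units.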

Many of the other covariates also showed significant associations with mortality (Additional file 1: Table S2). Generally, indicators associated with increased mortality included a higher proportion of older people, fewer foreigners, less education, more mobility at workplaces, transit stations, and retail and recreation rather than residences, more votes for parties at the outer ends of the political spectrum, a higher proportion of people working in manufacturing and construction, and fewer health personnel per person needing inpatient care. Many of the associations remained similar between the two time periods; however, some changed, such as that of the vaccination rate, which was positive in the first period (IRR: 1.23, p: 0.8), when barely anyone had been fully vaccinated, and negative one year later, although still not quite significant (IRR: 0.83, p: 0.07).

Comparison of the multivariate model selection algorithms

Coefficient estimates for all selection methods are shown in Fig. 1. The often significant association of air pollution with mortality diminished when further variables were included, generally independently of the selection method. NO2 is one of the clearest cases: estimates that were significant in the univariate analysis were no longer significant in the multivariate models. From the first to the second time period, the coefficient estimates decreased somewhat for O3 and PM2.5; otherwise, the estimates for the pollution variables were very close between the time periods, even though the variable selection methods were run independently and the effects of the other covariates changed in various ways. Another important result was that, for most selection methods, the coefficient estimates were nearly identical to those of the full model. LASSO selection, the only non-standard method, led to larger deviations and sometimes did not converge properly. The homogeneity among the other selection methods, however, did not extend to the number of selected covariates. In addition, multivariate analyses including two additional risk factors, temperature and precipitation, were performed for all air quality metrics and all selection methods except LASSO; they yielded similar results and were therefore not considered further.

Fig. 1.

Coefficient estimates of bootstrapped variable selection processes for air pollution covariates with 95% quantile intervals from bootstrap samples. Generally, higher mortality rates and larger dispersion in the first period led to wider quantile intervals than in the second period

The number of selected covariates can be seen in Fig. 2. The BIC forward and LASSO selection methods led to the smallest number of covariates, but also showed larger differences to the full model. Almost all CIE methods had very large variances in the number of covariates, with the total CIE forward and significance backward consistently picking all covariates. Generally, the number of selected covariates was very consistent between the pollution variables. The most consistently selected covariates were the population proportion of females at least 75 years of age, the proportion of votes for the right-wing party AfD, and the activity in groceries and pharmacies, independent of the considered air pollution variable (Fig. 3).

Fig. 2.

Median number of selected confounders after variable selection process with 95% quantiles from bootstrap samples

Fig. 3.

Selection frequency of confounders by variable selection method, aggregated over both analyzed time periods and excluding the univariate and full models. For example, for CO, the proportion of females aged 75 or older was selected in 83% of the models, with 8.7% coming from the significance forward selection models

As an example, the confidence intervals of the NO2 coefficient extracted as quantiles from the bootstrap estimates were also compared with those calculated analytically in a single selection run for the entire data set (Additional file 1: Table S3). The confidence intervals were extremely similar. The number of selected covariates in the single run was also very close to the median of the bootstrap results.

Conclusion

While previous studies have investigated the impact of air pollution on COVID-19 mortality over very short time frames, often with limited sets of confounders, and reached differing conclusions, this is the first study to consider the association over two years while incorporating high-dimensional confounders, and to propose a sensitivity analysis comparing the effect of commonly proposed variable selection methods. Univariate analyses of one air pollution risk factor at a time yielded many significant results, with some pollution variables even showing negative associations with COVID-19 mortality; these associations failed to reach significance after adjustment for confounders by nearly all methods. One reason could be that other risk factors, such as mobility, also drive air pollution, leading to surrogacy effects. The traditional variable selection methods provided similar results, and bootstrap confidence intervals were close to those of a single selection run. If the main exposure is considerably correlated with other risk factors, the resulting multicollinearity needs to be considered and quantified. Where possible, separate analyses should be considered, as in this study, where separate models were fitted for each of the pollution variables. The analyses here demonstrate the importance of assessing how sensitive targeted risk factor-outcome associations are to the choice of method for confounder adjustment.

Previous cross-sectional studies on air pollution and COVID-19 have a number of limitations, such as ignoring time differences in the introduction of the virus, confounding due to aggregation of the data at a crude level [58], and omitted confounders. These vulnerabilities were avoided or at least mitigated in this study by using data at the highest available spatial resolution and by selecting likely confounders. In this study, a single country was analyzed over a long time span starting after the introduction of the virus, whereas many early studies considered only the first two or three months. The use of aggregated rather than individual-level data leads to a loss of specificity and precision in risk factor-outcome associations. However, area-specific analyses are crucial for informing policy decisions and are more feasible in the presence of large numbers of confounders, which could not all be easily obtained for large numbers of individuals. Another limitation is that the observed air pollution levels may be too low to have a measurable effect on the severity of COVID-19, compared, for example, with the highly industrialized regions of Lombardy, Veneto, and Emilia-Romagna, where the initial surge of infections and deaths in Italy was most severe [59].

Further studies are required to identify and gauge associations of risk factors with the spread of COVID-19. Moreover, the necessary data need to be available and standardized across countries. For example, the place of residence of vaccinated people, not only the place of vaccination, would need to be known; a standardized and reliable database of interventions with high spatial resolution is needed; and more reliable COVID-19 counts are crucial. This study focused on reported mortality, but when available, excess mortality at an appropriate resolution should be considered as a potentially more reliable mortality measure to compare with reported deaths [60]. Sensitivity analyses comparable to those performed here should be conducted in other COVID-19 association studies to assess the robustness of targeted risk factor effects on outcomes, thus avoiding unnecessary or misguided public health actions based on spurious results.

Supplementary Information

12302_2022_657_MOESM1_ESM.docx (938.9KB, docx)

Additional file 1: Figure S1. Cumulative mortality rate and average NO2 (µg m⁻³) in the 400 German districts over the full study period, March 2020 – February 2022. Figure S2. Correlation plot of risk factors across German districts, aggregated over March 2020 – February 2022; black borders indicate p < 0.0005. Table S1. Risk factors and outcomes for the first time period (March 2020 – February 2021) and the second time period (March 2021 – February 2022). Table S2. Univariate associations of variables with COVID-19 mortality for the first (March 2020 – February 2021) and second (March 2021 – February 2022) time periods. Table S3. Comparison for NO2 of the bootstrapped selection process, with confidence intervals derived from bootstrap quantiles, and a single selection run on the full dataset.
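Table S3 uses confidence intervals derived from bootstrap quantiles. The general percentile-bootstrap idea can be sketched as follows. This is not the authors' code. The `statistic` argument is a hypothetical stand-in: in a setup like Table S3, it would presumably wrap the full variable-selection and model-fitting step for the NO2 coefficient. Here the sample mean of invented values is used instead.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Empirical alpha/2 and 1 - alpha/2 quantiles (simple index rule, no interpolation).
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical per-district values; `mean` stands in for a selection-and-fit pipeline.
values = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
ci = bootstrap_percentile_ci(values, mean)
print(ci)
```

Rerunning the whole selection on every resample, rather than only refitting a fixed model, lets the interval reflect the uncertainty introduced by the selection step itself. That is the kind of contrast Table S3 draws with a single selection on the full dataset.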

Acknowledgements

Not applicable.

Abbreviations

AIC: Akaike information criterion
BIC: Bayesian information criterion
CI: Confidence interval
CIE: Change-in-estimate
IQR: Interquartile range
IRR: Incidence rate ratio
LASSO: Least absolute shrinkage and selection operator
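Two of the abbreviated quantities, IRR and CIE, can be made concrete with a short sketch. It assumes a log-link count model, as with the negative binomial models in this study. Under that model an exposure coefficient β gives an IRR of exp(β × IQR) for an IQR increase in exposure. The 10% threshold and the use of the coefficient scale for the relative change are common conventions assumed here; some implementations measure the change on the ratio scale instead. The coefficient values are hypothetical, not results of this study.

```python
import math


def irr_per_iqr(beta, iqr):
    """IRR for an IQR increase in exposure, given a log-link coefficient beta."""
    return math.exp(beta * iqr)


def change_in_estimate(beta_crude, beta_adjusted):
    """Relative change of the exposure coefficient after adding a covariate."""
    if beta_crude == 0:
        raise ValueError("relative change is undefined for a zero crude coefficient")
    return abs(beta_adjusted - beta_crude) / abs(beta_crude)


def keep_as_confounder(beta_crude, beta_adjusted, threshold=0.10):
    """CIE rule (assumed 10% cut-off): keep the covariate if it shifts the estimate enough."""
    return change_in_estimate(beta_crude, beta_adjusted) > threshold


# Hypothetical NO2 coefficients (log scale, per µg m^-3) and IQR.
beta_crude, beta_adj, iqr = -0.05, -0.01, 10.0
print(round(irr_per_iqr(beta_crude, iqr), 3))              # 0.607
print(round(change_in_estimate(beta_crude, beta_adj), 2))  # 0.8
print(keep_as_confounder(beta_crude, beta_adj))            # True
```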

Author contributions

GM, DPA, and AM contributed to the conception of the work and revised the manuscript. GM and DPA analyzed and interpreted the data. GM drafted the manuscript. All authors read and approved the final manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Availability of data and materials

All data were acquired from publicly available databases.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

1

Neither the European Commission nor the European Centre for Medium-Range Weather Forecasts is responsible for any use that may be made of the information or data this publication contains.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN. Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med. 2021;4:1–10. doi: 10.1038/s41746-021-00383-x.
2. Redondo-Bravo L, Sierra Moros MJ, Martínez Sánchez EV, Lorusso N, Carmona Ubago A, Gallardo García V, et al. The first wave of the COVID-19 pandemic in Spain: characterisation of cases and risk factors for severe outcomes, as at 27 April 2020. Euro Surveill. 2020;25:2001431. doi: 10.2807/1560-7917.ES.2020.25.21.1900364.
3. Li J, Huang DQ, Zou B, Yang H, Hui WZ, Rui F, et al. Epidemiology of COVID-19: A systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes. J Med Virol. 2021;93:1449–1458. doi: 10.1002/jmv.26424.
4. Parra-Bracamonte GM, Lopez-Villalobos N, Parra-Bracamonte FE. Clinical characteristics and risk factors for mortality of patients with COVID-19 in a large data set from Mexico. Ann Epidemiol. 2020;52:93–98.e2. doi: 10.1016/j.annepidem.2020.08.005.
5. Drefahl S, Wallace M, Mussino E, Aradhya S, Kolk M, Brandén M, et al. A population-based cohort study of socio-demographic risk factors for COVID-19 deaths in Sweden. Nat Commun. 2020;11:5097. doi: 10.1038/s41467-020-18926-3.
6. Baena-Díez JM, Barroso M, Cordeiro-Coelho SI, Díaz JL, Grau M. Impact of COVID-19 outbreak by income: hitting hardest the most deprived. J Public Health (Oxf). 2020;9:136. doi: 10.1093/pubmed/fdaa136.
7. Kephart JL, Delclòs-Alió X, Rodríguez DA, Sarmiento OL, Barrientos-Gutiérrez T, Ramirez-Zea M, et al. The effect of population mobility on COVID-19 incidence in 314 Latin American cities: a longitudinal ecological study with mobile phone location data. Lancet Digit Health. 2021;3:e716–e722. doi: 10.1016/S2589-7500(21)00174-6.
8. Kwok S, Adam S, Ho JH, Iqbal Z, Turkington P, Razvi S, et al. Obesity: A critical risk factor in the COVID-19 pandemic. Clin Obes. 2020;10:e12403. doi: 10.1111/cob.12403.
9. Malik VS, Ravindra K, Attri SV, Bhadada SK, Singh M. Higher body mass index is an important risk factor in COVID-19 patients: a systematic review and meta-analysis. Environ Sci Pollut Res. 2020;27:42115–42123. doi: 10.1007/s11356-020-10132-4.
10. Kirillov Y, Timofeev S, Avdalyan A, Nikolenko VN, Gridin L, Sinelnikov MY. Analysis of Risk Factors in COVID-19 Adult Mortality in Russia. J Prim Care Community Health. 2021;12:21501327211008050. doi: 10.1177/21501327211008050.
11. Bae S, Kim SR, Kim M-N, Shim WJ, Park S-M. Impact of cardiovascular disease and risk factors on fatal outcomes in patients with COVID-19 according to age: a systematic review and meta-analysis. Heart. 2021;107:373–380. doi: 10.1136/heartjnl-2020-317901.
12. Zheng Z, Peng F, Xu B, Zhao J, Liu H, Peng J, et al. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J Infect. 2020;81:e16–25. doi: 10.1016/j.jinf.2020.04.021.
13. Meng Y, Lu W, Guo E, Liu J, Yang B, Wu P, et al. Cancer history is an independent risk factor for mortality in hospitalized COVID-19 patients: a propensity score-matched analysis. J Hematol Oncol. 2020;13:75. doi: 10.1186/s13045-020-00907-0.
14. Ozturk S, Turgutalp K, Arici M, Odabas AR, Altiparmak MR, Aydin Z, et al. Mortality analysis of COVID-19 infection in chronic kidney disease, haemodialysis and renal transplant patients compared with patients without kidney disease: a nationwide analysis from Turkey. Nephrol Dial Transplant. 2020;35:2083–2095. doi: 10.1093/ndt/gfaa271.
15. Cai R, Zhang J, Zhu Y, Liu L, Liu Y, He Q. Mortality in chronic kidney disease patients with COVID-19: a systematic review and meta-analysis. Int Urol Nephrol. 2021;53:1623–1629. doi: 10.1007/s11255-020-02740-3.
16. Landrigan PJ, Fuller R, Acosta NJR, Adeyi O, Arnold R, Basu N, et al. The Lancet Commission on pollution and health. Lancet. 2018;391:462–512. doi: 10.1016/S0140-6736(17)32345-0.
17. Dominici F, Peng RD, Bell ML, Pham L, McDermott A, Zeger SL, et al. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. JAMA. 2006;295:1127–1134. doi: 10.1001/jama.295.10.1127.
18. Faustini A, Rapp R, Forastiere F. Nitrogen dioxide and mortality: review and meta-analysis of long-term studies. Eur Respir J. 2014;44:744–753. doi: 10.1183/09031936.00114713.
19. Harrison AG, Lin T, Wang P. Mechanisms of SARS-CoV-2 Transmission and Pathogenesis. Trends Immunol. 2020;41:1100–1115. doi: 10.1016/j.it.2020.10.004.
20. Meo SA, Suraya F. Effect of environmental air pollution on cardiovascular diseases. Eur Rev Med Pharmacol Sci. 2015;19:4890–4897.
21. Yang L, Li C, Tang X. The Impact of PM2.5 on the Host Defense of Respiratory System. Front Cell Dev Biol. 2020;8:89. doi: 10.3389/fcell.2020.00089.
22. Cao Y, Chen M, Dong D, Xie S, Liu M. Environmental pollutants damage airway epithelial cell cilia: Implications for the prevention of obstructive lung diseases. Thorac Cancer. 2020;11:505–510. doi: 10.1111/1759-7714.13323.
23. Zhang R, Li Y, Zhang AL, Wang Y, Molina MJ. Identifying airborne transmission as the dominant route for the spread of COVID-19. Proc Natl Acad Sci U S A. 2020;117:14857–14863. doi: 10.1073/pnas.2009637117.
24. Setti L, Passarini F, De Gennaro G, Barbieri P, Perrone MG, Borelli M, et al. SARS-Cov-2RNA found on particulate matter of Bergamo in Northern Italy: First evidence. Environ Res. 2020;188:109754. doi: 10.1016/j.envres.2020.109754.
25. Harrell FE. Multivariable Modeling Strategies. In: Harrell FE Jr, editor. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Cham: Springer International Publishing; 2015. pp. 63–102.
26. Steyerberg EW. Selection of main effects. In: Steyerberg EW, editor. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009. pp. 191–211.
27. Chatfield C. Model Uncertainty, Data Mining and Statistical Inference. J R Stat Soc A Stat Soc. 1995;158:419–444. doi: 10.2307/2983440.
28. Talbot D, Massamba VK. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. Eur J Epidemiol. 2019;34:725–730. doi: 10.1007/s10654-019-00529-y.
29. Dominici F, Greenstone M, Sunstein CR. Particulate Matter Matters. Science. 2014;344:257–259. doi: 10.1126/science.1247348.
30. Robert Koch-Institut. RKI_COVID19 - Übersicht. https://www.arcgis.com/home/item.html?id=f10774f1c63e40168479a1feb6c7ca74. Accessed 7 Mar 2022.
31. GovData. DL-DE->BY-2.0. https://www.govdata.de/dl-de/by-2-0. Accessed 23 Mar 2022.
32. Russell TW, Golding N, Hellewell J, Abbott S, Wright L, Pearson CAB, et al. Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections. BMC Med. 2020;18:332. doi: 10.1186/s12916-020-01790-9.
33. Whittaker C, Walker PGT, Alhaffar M, Hamlet A, Djaafara BA, Ghani A, et al. Under-reporting of deaths limits our understanding of true burden of covid-19. BMJ. 2021;375:n2239. doi: 10.1136/bmj.n2239.
34. METEO FRANCE, Institut national de l’environnement industriel et des risques (Ineris), Aarhus University, Norwegian Meteorological Institute (MET Norway), Jülich Institut für Energie- und Klimaforschung (IEK), Institute of Environmental Protection – National Research Institute (IEP-NRI), Koninklijk Nederlands Meteorologisch Instituut (KNMI), Nederlandse Organisatie voor toegepast-natuurwetenschappelijk onderzoek (TNO), Swedish Meteorological and Hydrological Institute (SMHI) and Finnish Meteorological Institute (FMI). CAMS European air quality forecasts, ENSEMBLE data. 2020. https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-europe-air-quality-forecasts?tab=overview. Accessed 7 Mar 2022.
35. Statistisches Bundesamt Deutschland. GENESIS-Online. 2022. https://www-genesis.destatis.de/genesis/online. Accessed 29 Apr 2022.
36. Statistische Ämter des Bundes und der Länder. Regionaldatenbank Deutschland. 2022. https://www.regionalstatistik.de/genesis/online/. Accessed 29 Apr 2022.
37. GeoJSON Utilities. http://opendatalab.de/projects/geojson-utilities/. Accessed 2 Jun 2020.
38. Google LLC. COVID-19 Community Mobility Report. https://www.google.com/covid19/mobility?hl=de. Accessed 7 Mar 2022.
39. Robert Koch-Institut. SurvStat@RKI 2.0. 2021. https://survstat.rki.de/. Accessed 10 Dec 2021.
40. Robert Koch-Institut F 33. COVID-19-Impfungen in Deutschland. 2021.
41. Stieb DM, Evans GJ, To TM, Brook JR, Burnett RT. An ecological analysis of long-term exposure to PM2.5 and incidence of COVID-19 in Canadian health regions. Environ Res. 2020;191:110052. doi: 10.1016/j.envres.2020.110052.
42. Rodriguez-Villamizar LA, Belalcázar-Ceron LC, Fernández-Niño JA, Marín-Pineda DM, Rojas-Sánchez OA, Acuña-Merchán LA, et al. Air pollution, sociodemographic and health conditions effects on COVID-19 mortality in Colombia: An ecological study. Sci Total Environ. 2021;756:144020. doi: 10.1016/j.scitotenv.2020.144020.
43. Adhikari A, Yin J. Short-Term Effects of Ambient Ozone, PM2.5, and Meteorological Factors on COVID-19 Confirmed Cases and Deaths in Queens, New York. Int J Environ Res Public Health. 2020;17:4047. doi: 10.3390/ijerph17114047.
44. Travaglio M, Yu Y, Popovic R, Selley L, Leal NS, Martins LM. Links between air pollution and COVID-19 in England. Environ Pollut. 2021;268:115859. doi: 10.1016/j.envpol.2020.115859.
45. Liang D, Shi L, Zhao J, Liu P, Sarnat JA, Gao S, et al. Urban Air Pollution May Enhance COVID-19 Case-Fatality and Mortality Rates in the United States. Innovation (N Y). 2020;1:100047. doi: 10.1016/j.xinn.2020.100047.
46. Heinze G, Wallisch C, Dunkler D. Variable selection – A review and recommendations for the practicing statistician. Biom J. 2018;60:431–449. doi: 10.1002/bimj.201700067.
47. Greenland S, Daniel R, Pearce N. Outcome modelling strategies in epidemiology: traditional methods and basic alternatives. Int J Epidemiol. 2016;45:565–575. doi: 10.1093/ije/dyw040.
48. Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KGM. Internal and external validation of predictive models: A simulation study of bias and precision in small samples. J Clin Epidemiol. 2003;56:441–447. doi: 10.1016/S0895-4356(03)00047-7.
49. R Core Team. R: A language and environment for statistical computing. 2021.
50. Venables WN, Ripley BD. Modern applied statistics with S. 4th ed. New York: Springer; 2002.
51. Vaughan D, Dancho M. furrr: Apply Mapping Functions in Parallel using Futures. 2021.
52. Wang Z. mpath: Regularized Linear Models. 2021.
53. Harrell F. Hmisc: Harrell Miscellaneous. 2021.
54. Zeileis A, Hothorn T. Diagnostic Checking in Regression Relationships. R News. 2002;2:7–10.
55. Weng H-Y, Hsueh Y-H, Messam LLM, Hertz-Picciotto I. Methods of Covariate Selection: Directed Acyclic Graphs and the Change-in-Estimate Procedure. Am J Epidemiol. 2009;169:1182–1190. doi: 10.1093/aje/kwp035.
56. Meintrup D, Borgmann S, Seidl K, Stecher M, Jakob CEM, Pilgram L, et al. Specific Risk Factors for Fatal Outcome in Critically Ill COVID-19 Patients: Results from a European Multicenter Study. J Clin Med. 2021;10:3855. doi: 10.3390/jcm10173855.
57. Nomura S, Eguchi A, Yoneoka D, Kawashima T, Tanoue Y, Murakami M, et al. Reasons for being unsure or unwilling regarding intention to take COVID-19 vaccine among Japanese people: A large cross-sectional national survey. Lancet Reg Health West Pac. 2021;14:100223. doi: 10.1016/j.lanwpc.2021.100223.
58. Heederik DJJ, Smit LAM, Vermeulen RCH. Go slow to go fast: a plea for sustained scientific rigour in air pollution research during the COVID-19 pandemic. Eur Respir J. 2020;56:2001361. doi: 10.1183/13993003.01361-2020.
59. Filippini T, Rothman KJ, Goffi A, Ferrari F, Maffeis G, Orsini N, et al. Satellite-detected tropospheric nitrogen dioxide and spread of SARS-CoV-2 infection in Northern Italy. Sci Total Environ. 2020;739:140278. doi: 10.1016/j.scitotenv.2020.140278.
60. Karlinsky A, Kobak D. Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset. eLife. 2021;10:e69336.
61. Ogen Y. Assessing nitrogen dioxide (NO2) levels as a contributing factor to coronavirus (COVID-19) fatality. Sci Total Environ. 2020;726:138605. doi: 10.1016/j.scitotenv.2020.138605.
62. Bashir MF, Ma B, Bilal, Komal B, Bashir MA, Tan D, et al. Correlation between climate indicators and COVID-19 pandemic in New York, USA. Sci Total Environ. 2020;728:138835.
63. Accarino G, Lorenzetti S, Aloisio G. Assessing correlations between short-term exposure to atmospheric pollutants and COVID-19 spread in all Italian territorial areas. Environ Pollut. 2021;268:115714. doi: 10.1016/j.envpol.2020.115714.
64. Zhu Y, Xie J, Huang F, Cao L. Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China. Sci Total Environ. 2020;727:138704. doi: 10.1016/j.scitotenv.2020.138704.
65. Wu X, Nethery RC, Sabath MB, Braun D, Dominici F. Air pollution and COVID-19 mortality in the United States: Strengths and limitations of an ecological regression analysis. Sci Adv. 2020;6:eabd4049.
66. Borro M, Di Girolamo P, Gentile G, De Luca O, Preissner R, Marcolongo A, et al. Evidence-Based Considerations Exploring Relations between SARS-CoV-2 Pandemic and Air Pollution: Involvement of PM2.5-Mediated Up-Regulation of the Viral Receptor ACE-2. Int J Environ Res Public Health. 2020;17:5573.
67. Tieskens KF, Patil P, Levy JI, Brochu P, Lane KJ, Fabian MP, et al. Time-varying associations between COVID-19 case incidence and community-level sociodemographic, occupational, environmental, and mobility risk factors in Massachusetts. BMC Infect Dis. 2021;21:686. doi: 10.1186/s12879-021-06389-w.



Articles from Environmental Sciences Europe are provided here courtesy of Nature Publishing Group
