Author manuscript; available in PMC: 2018 Jun 20.
Published in final edited form as: Environ Sci Technol. 2017 Jun 5;51(12):6957–6964. doi: 10.1021/acs.est.7b01148

A Global Land Use Regression Model for Nitrogen Dioxide Air Pollution

Andrew Larkin a,*, Jeffrey A Geddes b, Randall V Martin c,d, Qingyang Xiao e, Yang Liu e, Julian D Marshall f, Michael Brauer g, Perry Hystad a
PMCID: PMC5565206  NIHMSID: NIHMS892820  PMID: 28520422

Abstract

Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the global distribution of NO2 exposure and associated impacts on global health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO2) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n=10,000) demonstrated robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R2 within 2%) but not for Africa and Oceania (adjusted R2 within 11%) where NO2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO2 concentrations. Variable contributions differed between continental regions but major roads within 100m and satellite-derived NO2 were consistently the strongest predictors. The resulting model will be made available and can be used for global risk assessments and health studies, particularly in countries without existing NO2 monitoring data or models.

INTRODUCTION

Outdoor air pollution is a source of concern for global human health. The most recent version of the global burden of disease estimated that ambient fine particulate matter less than 2.5 microns (PM2.5) contributes to 4.2 million annual deaths and ozone an additional 254,000 deaths.1 More than 50% of the disease burden from air pollution is in the rapidly developing countries of China and India, where air pollution concentrations are high and populations large.2 In 2015, the World Health Assembly identified air pollution as the “…world’s largest single environmental health risk”, and called for additional efforts to monitor and evaluate the impacts of air pollution on health.3

To fully evaluate ambient air pollution impacts on human health, population exposure estimates should extend beyond PM2.5 and ozone to better represent additional common exposures, such as traffic air pollution, for which nitrogen dioxide (NO2) is a common marker4. A growing body of evidence links traffic-related pollution to myriad acute and chronic adverse health outcomes, including increased odds of incident asthma in children5, decreased lung function in children6 and adults7, and positive association with lung cancer in adults8.

The global distribution of NO2 exposure and concomitant impacts on global health is still largely uncertain, in part because of challenges in estimating global NO2 concentrations. NO2 air monitor networks are sparse or non-existent in many low-income countries. Where they do exist, they generally do not capture the important spatial gradients needed to understand exposures, e.g., NO2 concentrations near major roads and highways (100–400m9). Capturing fine-scale NO2 gradients is important for exposure assessments, as within-city variation is more strongly associated with multiple non-accidental causes of mortality than between-city variation in annual NO2 concentrations10. Similarly, NO2 estimates derived from moderate resolution remote sensing products (those at ~10km x 10km resolution) do not capture fine-scale NO2 gradients. Land use regression (LUR) models can predict NO2 concentrations across large spatial extents, and have been created for large geographic areas, including the continental United States,11–13 Canada,14 Europe,15 and Australia.16 These models are built from NO2 monitor data and combine satellite-based NO2 estimates with land use characteristics and roadway information to predict NO2 concentrations at fine spatial scales (30m–500m). In 2011, Novotny and associates11 demonstrated that globally available land classification datasets can be used in place of high-resolution country-level equivalents with no loss of predictive power in their LUR model for the continental US. The LUR approach is therefore an excellent candidate for developing high-resolution NO2 estimates at a global extent.

Here, we present the development of the first global NO2 LUR model (for 2011) based on annual NO2 measurement data (n=5,220) compiled for 58 countries and available global predictor datasets. A global model of NO2 will inform global risk assessments in terms of estimates of NO2 exposure and associated health burden as well as provide standardized NO2 estimates for multi-country studies and NO2 estimates for health studies in developing countries where detailed city-specific or country-specific models do not exist.

METHODS

NO2 Air Pollution Monitoring Data

Annual NO2 air monitor measurements were collected from a wide range of environmental and regulatory agency websites (Supplemental Excel File). Air monitors from the US, Canada, Europe, China, and Japan were restricted to air monitors with greater than 75% hourly coverage. In other countries, percent coverage was not provided or the temporal unit used to derive percent coverage (daily, hourly, or monthly) was unknown. For those countries, we therefore further restricted monitor selection to monitors with annual standard deviation less than 25 ppb, at least two years of annual measurements, year-round coverage (greater than 75% coverage or a positive indicator of complete coverage for each month), and latitude and longitude coordinates with four or more decimal places of precision (i.e., to within 12 meters). The full database of collected NO2 air monitor measurements is available at http://health.oregonstate.edu/labs/spatial-health/resources/. To match air monitor measurements with satellite-based surface NO2 estimates (described below), a mean annual NO2 concentration was calculated for each monitor using up to three annual measurements closest to the year 2011.
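The per-monitor averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `mean_closest_years`, the dictionary input format, and the tie-breaking rule (prefer the earlier year when two years are equally far from 2011) are all assumptions made for the example.

```python
def mean_closest_years(annual_means, target_year=2011, max_years=3):
    """Average up to `max_years` annual means whose years lie closest to `target_year`.

    annual_means: dict mapping year (int) -> annual mean NO2 (ppb).
    Ties in distance are broken by preferring the earlier year (an assumption;
    the text does not state how ties were handled).
    """
    if not annual_means:
        raise ValueError("monitor has no annual measurements")
    # Rank years by distance from the target year, then by the year itself.
    ranked = sorted(annual_means, key=lambda year: (abs(year - target_year), year))
    chosen = ranked[:max_years]
    return sum(annual_means[year] for year in chosen) / len(chosen)


# Hypothetical monitor with four annual means (values are made up).
monitor = {2009: 14.0, 2011: 12.0, 2012: 11.0, 2014: 9.0}
print(mean_closest_years(monitor))  # averages the 2011, 2012 and 2009 values
```

A monitor with fewer than three annual values is simply averaged over whatever years it has.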

Predictor Variables

Predictor variables included satellite NO2 estimates and land use related variables. All predictor variables and corresponding input data sources are listed in Table S1. Variables consisted of either estimates at the exact air monitor location (point) or an average of a variable within a radius around the air monitor location (buffer). First, satellite-based estimates of surface NO2 concentrations from 2010–2012 were applied to each monitor. Briefly, tropospheric NO2 column retrievals from the SCIAMACHY and GOME-2 instruments were combined with output from the global GEOS-Chem model to produce gridded NO2 surface estimates at ~10km x 10km resolution.17 The three-year averages were based on daily overpass data after excluding pixels contaminated by cloud (cloud radiance fraction > 0.5) and snow (estimated using snow cover from the National Ice Center’s Interactive Snow and Ice Mapping System). Potential sampling biases in the annual means were accounted for by applying a GEOS-Chem model correction for the missing days.

For each land use characteristic evaluated as a buffer, multiple buffer variables were created, ranging from 100m to 50km in radius (buffer distances are listed in Table S1). Land use characteristics in the dataset include normalized difference vegetation index (NDVI), tree cover, impervious surface area, population density, major and minor road length, length of major roads upwind from air monitors, power plant CO2 emissions, active fires and distance to coast. Major roads upwind from air monitors consists of the average length of major roads upwind from an air monitor station in each year (Figure S1). Buffer variable and point estimates were calculated using Python v. 2.7 18 scripts written for automated analysis in ArcGIS v. 10.3.1.19 Annual distributions of wind direction from the National Centers for Environmental Prediction Climate Forecast System20 were calculated using a Python script written for automated analysis in Google Earth Engine21 (Python scripts are available at https://github.com/larkinandy/LUR-NO2-Model).
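To make the point versus buffer distinction concrete, the sketch below averages sample values that fall within a given radius of a monitor. It is only an illustration under simplifying assumptions: the authors computed buffers in ArcGIS, whereas this example uses a spherical-Earth haversine distance on a hypothetical list of (latitude, longitude, value) samples, and the names `haversine_m` and `buffer_mean` are invented for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_mean(samples, monitor_lat, monitor_lon, radius_m):
    """Mean of sample values within `radius_m` of the monitor, or None if no sample falls inside.

    samples: iterable of (lat, lon, value) tuples, e.g. NDVI pixel centers.
    """
    inside = [
        value
        for lat, lon, value in samples
        if haversine_m(monitor_lat, monitor_lon, lat, lon) <= radius_m
    ]
    return sum(inside) / len(inside) if inside else None


# Hypothetical NDVI samples near a monitor located at (0, 0).
ndvi_samples = [(0.0, 0.0, 0.2), (0.0, 0.001, 0.4), (0.0, 0.1, 0.9)]
for radius in (50, 200, 50000):
    print(radius, buffer_mean(ndvi_samples, 0.0, 0.0, radius))
```

Repeating the call over a list of radii yields the family of buffer variables (100m to 50km) from which the variable selection step later chooses.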

Statistical Analysis

LUR models were developed using Lasso variable selection (glmnet package,16,22 in RStudio, v. 3.2.2 23). Lasso regression was successfully utilized by Knibbs et al. (2014) to create their Australian NO2 land use regression model. Parameters for Lasso variable selection include standardizing independent variables (standardize = TRUE), selecting variables to minimize mean-square error (type.measure = ‘mse’), and forcing the direction of variable coefficients to conform to a priori hypotheses (e.g., increases in major roads and tree cover are associated with increases and decreases in NO2 concentrations, respectively) (lower.lim = 0). The lasso model whose lambda gives a cross-validation score within one standard error of the minimum cross-validation score was selected as the model of choice to favor model simplification and inference over model prediction (s = lambda.1se). To reduce multicollinearity, when several buffer sizes of the same land use characteristic were selected, larger buffers were dropped if their radii were within three times the radius of a smaller retained buffer. For example, if major roads variables with 100m, 200m, and 400m buffer sizes were all selected by lasso regression, only the 100m and 400m variables would be included in the regression model. Finally, variables were included in the final model if they were statistically significant; increased adjusted R2, either globally or in one or more continental regions, by 1 percent or more; and exhibited variance inflation factors less than 5 for at least one region and less than 10 for all regions.
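The buffer-reduction rule can be illustrated with a short sketch. The function name `prune_buffers` is hypothetical, and two details are assumptions because the text does not settle them: each candidate radius is compared against the most recently retained smaller radius, and a radius exactly three times the retained one counts as "within" and is dropped.

```python
def prune_buffers(selected_radii_m, ratio=3.0):
    """Keep the smallest buffer, then only buffers more than `ratio` times the last kept radius.

    selected_radii_m: buffer radii (meters) selected by lasso for one land use characteristic.
    """
    kept = []
    for radius in sorted(selected_radii_m):
        # Retain the radius only if it is not within `ratio` times the last retained radius.
        if not kept or radius > ratio * kept[-1]:
            kept.append(radius)
    return kept


print(prune_buffers([100, 200, 400]))  # [100, 400], matching the example in the text
```

With the 100m, 200m, and 400m example from the text, 200m is dropped (it is within 300m) and 400m is retained.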

Model performance was evaluated by calculating root mean squared error (RMSE), mean absolute error (MAE), R-squared (R2), adjusted R-squared (Adj R2), mean percent bias (MB), and mean absolute percent bias (MAB) for the entire global dataset as well as within each continental region. Leave 10% out cross-validation was performed, in which 10% of the monitors from each continental region were randomly sampled into a testing dataset, with the remaining 90% from each region combined to create the model training dataset. Cross-validation was repeated in a bootstrap fashion 10,000 times to generate cross-validation estimates of RMSE, MAE, and R2 both globally and within each continental region.
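The evaluation metrics and the region-stratified 10% hold-out can be sketched as below. This is a minimal, standard-library illustration rather than the authors' R code. The function names are hypothetical, and the percent-bias definitions are an assumption: MB is taken as the mean of 100 × (predicted − observed) / observed and MAB as the mean of its absolute value, with zero-valued observations skipped because the ratio is undefined there.

```python
import math
import random


def regression_metrics(observed, predicted, n_predictors):
    """Return RMSE, MAE, R2, adjusted R2, mean percent bias and mean absolute percent bias."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    # Percent bias is undefined for zero observations, so those pairs are skipped (assumption).
    pct = [100.0 * (p - o) / o for o, p in zip(observed, predicted) if o != 0]
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": r2,
        "adj_r2": adj_r2,
        "mb": sum(pct) / len(pct),
        "mab": sum(abs(v) for v in pct) / len(pct),
    }


def stratified_holdout(regions, test_fraction=0.10, rng=None):
    """Split monitor indices into (train, test), sampling `test_fraction` within each region."""
    rng = rng or random.Random()
    by_region = {}
    for index, region in enumerate(regions):
        by_region.setdefault(region, []).append(index)
    test = []
    for indices in by_region.values():
        k = max(1, round(test_fraction * len(indices)))  # at least one test monitor per region
        test.extend(rng.sample(indices, k))
    test_set = set(test)
    train = [i for i in range(len(regions)) if i not in test_set]
    return train, sorted(test)


# Toy example: 20 monitors in region "A" and 10 in region "B".
train_idx, test_idx = stratified_holdout(["A"] * 20 + ["B"] * 10, rng=random.Random(0))
print(len(train_idx), len(test_idx))
print(regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0], n_predictors=1))
```

Repeating `stratified_holdout` many times (10,000 in the analysis described above), refitting on the training indices and scoring the test indices each time, gives the distribution of cross-validated metrics.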

Several sensitivity analyses were performed to evaluate the robustness of our global model. Continental LUR models were created for each region and compared to the previously published LUR models for the continental US, Canada, Europe, and Australia. Continent specific models were also created from the residuals of the global model to identify variables excluded from the global model that may be important in capturing regional variation. For a comparison of the global model, regional model and residual model methodologies, see Figure S2. To test model sensitivity and overfitting of vegetation levels, we performed two t-tests comparing residuals in the bottom (NDVI < 0.28) and top (NDVI > 0.57) decile of average vegetative cover within 10km. The first t-test used satellite-based predictions, while the second t-test used the developed global model predictions.
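The vegetation comparison can be sketched as a two-sample test on residuals grouped by NDVI. The text does not say which t-test variant was used, so this example assumes Welch's unequal-variance form, and it reports only the test statistic and degrees of freedom (a p-value would require a t-distribution routine that the Python standard library does not provide). The function names and the toy residuals are hypothetical; the NDVI cut-offs are the decile thresholds quoted above.

```python
import math
import statistics


def split_by_ndvi(residuals, ndvi, low=0.28, high=0.57):
    """Return residuals for low-vegetation (NDVI < low) and high-vegetation (NDVI > high) monitors."""
    low_group = [r for r, v in zip(residuals, ndvi) if v < low]
    high_group = [r for r, v in zip(residuals, ndvi) if v > high]
    return low_group, high_group


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Made-up residuals (ppb) and NDVI values for eight monitors.
residuals = [9.0, 7.5, 8.0, 1.0, 0.5, -0.5, 2.0, 6.0]
ndvi = [0.10, 0.20, 0.25, 0.60, 0.70, 0.65, 0.40, 0.27]
low_veg, high_veg = split_by_ndvi(residuals, ndvi)
print(welch_t(low_veg, high_veg))
```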

The R scripts used to create the LUR models, evaluate model performance, and perform sensitivity analyses are available at https://github.com/larkinandy/LUR-NO2-Model.

RESULTS

Global NO2 Database

The distribution of NO2 air monitor measurements that passed selection criteria is shown in Figure 1, and the corresponding summary statistics, stratified by continental region, are shown in Table 1. Histograms of annual air monitor concentrations for each region are shown in Figure S3. Measurements were collected from 6,761 unique air monitors, 5,220 of which (77%) met selection criteria. Air monitor coverage is greatest in Europe and Asia, and sparse in Africa and Oceania. The global median year of air monitor measurements in this database is 2013, with median year of annual measurements by continent ranging from 2011 to 2013.5. Annual NO2 concentrations range from 0 to 59 ppb, with a mean annual air monitor concentration of 11.5 ppb. Mean concentrations are greatest in Asia (14.1 ppb) and North America (13.1 ppb), and lowest in Africa (7.3 ppb) and Oceania (6.7 ppb). Regional standard deviations in air monitor averages range from 4.5 (Oceania) to 8.2 (North America), with a global average of 7.5 ppb.

Figure 1. Global Distribution of NO2 Air Monitor Locations

Table 1. NO2 air monitor summary statistics, stratified by region.

| Region | Median Year | Monitors (n) | Min NO2 (ppb) | Max NO2 (ppb) | Mean NO2 (ppb) | Std Dev NO2 (ppb) | 25th perc* | 50th perc* | 75th perc* | 90th perc* |
|---|---|---|---|---|---|---|---|---|---|---|
| N America | 2011 | 731 | 0 | 44 | 13.1 | 8.2 | 6.7 | 11.3 | 17.3 | 24.3 |
| S America | 2011 | 105 | 1 | 35 | 12.7 | 7.6 | 7.0 | 10.5 | 18.6 | 23.2 |
| Europe | 2012 | 2351 | 0 | 47 | 11.8 | 6.8 | 7.0 | 11.0 | 15.5 | 21.5 |
| Africa | 2013.5 | 63 | 2 | 19 | 7.3 | 3.8 | 4.5 | 7.0 | 8.8 | 13.0 |
| Asia | 2012 | 1886 | 1 | 59 | 14.1 | 7.7 | 8.5 | 13.0 | 18.3 | 58.7 |
| Oceania | 2011 | 84 | 1 | 23 | 6.7 | 4.5 | 3.4 | 6.0 | 9.3 | 12.7 |
| Global | 2013 | 5220 | 0 | 59 | 11.5 | 7.5 | 7.3 | 11.4 | 16.7 | 23.0 |

* perc – percentile.

NO2 LUR Model

The final LUR model performance is shown below in Table 2. Global model predictions are shown in Figure 2, and predicted vs. observed air monitor measurements are shown in Figures 3 (global) and S4 (by region). Final model variables are summarized in Table 3. Globally, the NO2 model explains 54% of annual NO2 variation, with MAE of 3.7 ppb and MAB of 44%. Model predictions are positively biased (25%) with positive and negative bias in general at air monitor locations with annual concentrations below 10ppb and above 40ppb, respectively (Figure 3). Regionally, adjusted R2 ranges from 0.31 (Africa) to 0.63 (South America), MAE ranges from 2.3 (Africa) to 4.4 ppb (North America), and MAB ranges from 34% (Asia) to 74% (North America). In general, model performance in each region is positively associated with regional NO2 standard deviation but not sample size (Table 1). Global distribution of model residuals is shown in Figure S5. Residuals are greatest in North America and smallest in Oceania.

Table 2. NO2 model training performance.

| Region | RMSE* (ppb) | MAE** (ppb) | R2 | Adj R2 | MB*** (%) | MAB**** (%) |
|---|---|---|---|---|---|---|
| N America | 5.7 | 4.4 | 0.52 | 0.52 | 52 | 74 |
| S America | 4.4 | 3.1 | 0.67 | 0.63 | 29 | 44 |
| Europe | 4.8 | 3.5 | 0.52 | 0.52 | 24 | 43 |
| Africa | 2.9 | 2.3 | 0.42 | 0.31 | 20 | 41 |
| Asia | 5.3 | 3.7 | 0.52 | 0.51 | 16 | 34 |
| Oceania | 3.2 | 2.4 | 0.51 | 0.44 | 30 | 63 |
| Global | 5.0 | 3.7 | 0.54 | 0.54 | 25 | 44 |

* RMSE – root mean square error.
** MAE – mean absolute error.
*** MB – mean percent bias.
**** MAB – mean absolute percent bias.

Figure 2. Global NO2 model predictions for the year 2011. Insets of select cities for each continental region demonstrate within-city variation of model predictions.

Figure 3. Predicted vs observed mean annual NO2 concentrations. Values are moderately correlated with a positive mean bias.

Table 3. Global NO2 LUR model structure

| Variable | Units | IQR | Buffer Radius (km) | B | Std Err | Global %R2 reduction* | Regional %R2 reduction** | Global p-value | Regional p-value |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | ppb | NA | NA | 8.370 | 0.701 | NA | NA | <0.01 | NA |
| N America Intercept | ppb | NA | NA | 2.985 | 0.611 | NA | NA | <0.01 | NA |
| S America Intercept | ppb | NA | NA | 1.977 | 0.754 | NA | NA | 0.01 | NA |
| Europe Intercept | ppb | NA | NA | 1.274 | 0.584 | NA | NA | 0.03 | NA |
| Asia Intercept | ppb | NA | NA | 2.345 | 0.592 | NA | NA | <0.01 | NA |
| Major Roads | km | 0.18 | 0.1 | 9.241 | 0.410 | 9.1 | 13.5 | <0.01 | <0.01 |
| Satellite-Based NO2 | ppb | 2.97 | NA | 0.832 | 0.038 | 8.8 | 19.5 | <0.01 | <0.01 |
| Population Density | persons/km | 2.09 | 3.5 | 0.231 | 0.032 | 3.3 | 3.3 | <0.01 | <0.01 |
| Water Body | % | 33 | 50 | −3.883 | 0.394 | 1.9 | 12.3 | <0.01 | <0.01 |
| Major Roads | km | 27.08 | 2.5 | 0.040 | 0.015 | 1.4 | 8.2 | <0.01 | <0.01 |
| NDVI*** | Normalized | 0.17 | 0.2 | −8.290 | 1.287 | 0.8 | 11.9 | <0.01 | <0.01 |
| Tree Cover | % | 10.05 | 1.5 | −0.023 | 0.006 | 0.3 | 7.6 | <0.01 | <0.01 |
| ISA**** | % | 33.96 | 1.5 | 0.028 | 0.008 | 0.2 | 2.4 | <0.01 | <0.01 |
| ISA**** | % | 25.05 | 7 | 0.029 | 0.010 | 0.1 | 2.9 | 0.01 | <0.01 |
| NDVI*** | Normalized | 0.15 | 1.2 | −1.600 | 1.524 | 0.02 | 11.5 | 0.29 | <0.01 |

* Global reduction in explained variance after removing variable from the model.
** Maximum reduction in explained variance in a given region after removing variable from the model. The Africa intercept was not significant and therefore not included in the final model. Oceania served as the reference group for regional intercepts. Variables are listed in order of global %R2 reduction.
*** NDVI – Normalized Difference Vegetation Index.
**** ISA – Impervious Surface Area. The global model includes two variables for NDVI and ISA, with different buffer distances for each variable. See Figures S1 and S6 and Table S1 for more information about model variables. Regional R2 and p-values are based on regional subsets of the global training dataset.

Variables with negative coefficients include NDVI, tree cover, and water body. Percent water body contributes the most to predicting lower concentrations in the model, with an estimated 0.39 ppb decrease for each 10% increase in water body coverage within 50km. Variables with positive coefficients include satellite-based NO2, impervious surface area, population density, and length of major roads. The strongest positive predictors are satellite-based NO2 and major road length within 100m. For every 0.1-km increase in major road length within 0.1km, predicted NO2 concentrations increase by 0.92 ppb. Some variables explained more than 1% of NO2 variation only in continental datasets. For example, NDVI within 200m contributes only 0.81% of the variation in the global dataset; however, removing NDVI 200m from the model significantly reduces percent variance explained in specific regions (e.g., in Oceania by 11.9%). The percent R2 reduction for all model variables by continental region is shown in Figure S6.
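As a rough illustration of how the coefficients in Table 3 combine into a prediction, the sketch below evaluates the linear predictor for one location. The coefficient values are copied from Table 3; everything else is an assumption made for the example: the dictionary keys are hypothetical names, Africa and Oceania are given a zero regional offset (Oceania is the reference group and the Africa intercept was not retained), and water body coverage is entered as a fraction between 0 and 1 so that the effect matches the 0.39 ppb per 10% stated in the text.

```python
GLOBAL_INTERCEPT = 8.370  # ppb, Table 3

# Regional intercept adjustments (ppb); zero for the reference group and for Africa.
REGION_OFFSET = {
    "N America": 2.985,
    "S America": 1.977,
    "Europe": 1.274,
    "Asia": 2.345,
    "Africa": 0.0,
    "Oceania": 0.0,
}

# Hypothetical predictor names mapped to the B column of Table 3.
COEFFICIENTS = {
    "major_roads_100m_km": 9.241,
    "satellite_no2_ppb": 0.832,
    "population_density_3500m": 0.231,
    "water_body_50km_fraction": -3.883,  # assumes coverage is given as a 0-1 fraction
    "major_roads_2500m_km": 0.040,
    "ndvi_200m": -8.290,
    "tree_cover_1500m_pct": -0.023,
    "isa_1500m_pct": 0.028,
    "isa_7000m_pct": 0.029,
    "ndvi_1200m": -1.600,
}


def predict_no2(region, predictors):
    """Linear prediction (ppb) from the regional intercept and the supplied predictor values."""
    total = GLOBAL_INTERCEPT + REGION_OFFSET[region]
    for name, value in predictors.items():
        total += COEFFICIENTS[name] * value
    return total


base = predict_no2("N America", {"satellite_no2_ppb": 5.0})
more_road = predict_no2("N America", {"satellite_no2_ppb": 5.0, "major_roads_100m_km": 0.1})
print(round(more_road - base, 2))  # effect of 0.1 km of major road within 100 m
```

Predictors omitted from the dictionary are treated as zero, which is convenient for isolating the contribution of a single variable, as in the transect comparison that follows.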

The applied model predictions for New York City, USA, and for Delhi, India, are shown in Figure 4. Individual variable contributions toward model predictions across a transect of both cities are shown in Figure S7. Major roads 100m, NDVI 200m, and population density 3500m were strong predictors for NO2 in both New York and Delhi. In New York, the strongest predictor was satellite NO2, while in Delhi the strongest predictor was population density.

Figure 4. Predicted annual NO2 concentrations in New York City, USA (top left) and Delhi, India (bottom left). Green lines correspond to model transects, with model predictions along the transect (moving from southwest to northeast) shown on the top right for New York City and bottom right for Delhi.

Sensitivity Analyses

Results of the bootstrap 10% cross-validation are shown in Table S2. Globally, MAE is 0.1 ppb greater and R2 is 1% smaller compared to models trained with the entire dataset (Table 2). Regionally, MAE and R2 are 0.3 ppb greater and 4% lower, respectively, for Africa and 0.2 ppb greater and 18% lower, respectively, for Oceania. For all other regions, MAE and R2 are within 0.2 ppb and 5% of model training, respectively. Model performance is robust with respect to monitor sampling selection for South America, North America, Europe and Asia, but not for Africa and Oceania, likely due to small sample size and sparse spatial coverage.

Results of our regional model sensitivity analysis are shown below in Table 4. The R2 for all regional models was slightly higher than for the global model, except for Africa. Performance of the residual models is also provided in Table S3. With the exception of North America, Africa, and Oceania, residual models improve R2 by less than 2% compared to the global model. In addition, most of the variables selected by the residual models were 50km in buffer size, except for North America where the model included minor roads within a 50km buffer. Compared with global satellite estimates alone, the global LUR model has a lower RMSE (5.0 vs. 6.6 ppb) and a greater Adj R2 (0.54 vs. 0.22).

Table 4. Models created from regional partitions of the global dataset. Regional model performance of all models within their respective extents is greater compared to the global model, although the difference in performance varies.

| Region | RMSE* (ppb) | MAE** (ppb) | R2 | Adj R2 | MB*** (%) | MAB**** (%) |
|---|---|---|---|---|---|---|
| N America | 5.0 | 3.8 | 0.64 | 0.63 | 31 | 52 |
| S America | 3.5 | 2.6 | 0.79 | 0.77 | 20 | 36 |
| Europe | 4.5 | 3.3 | 0.57 | 0.57 | 20 | 38 |
| Africa | 2.9 | 2.4 | 0.41 | 0.38 | 21 | 43 |
| Asia | 4.9 | 3.5 | 0.59 | 0.58 | 16 | 33 |
| Oceania | 3.2 | 2.3 | 0.49 | 0.46 | 38 | 62 |

* RMSE – root mean square error.
** MAE – mean absolute error.
*** MB – mean percent bias.
**** MAB – mean absolute percent bias.

In our comparison of satellite and LUR model performance in areas with low and high vegetation, residuals from satellite estimates are significantly greater for areas with low vegetative cover compared to regions with high vegetative cover (p<0.001, 95% CI 7.4:9.2 ppb). Residuals from the developed global land use regression model, however, do not significantly differ between regions with high and low vegetation (p-value = 0.52, 95% CI −0.9:0.5 ppb). These results demonstrate the utility of and provide justification for the use of land use characteristics for improving satellite-based NO2 predictions.

DISCUSSION

Using 5,220 monitors from 58 countries, we developed a global LUR model that captured a large proportion of the NO2 variation. Importantly, this model captured both between and within-city spatial variability in NO2, representing fine-scale variation that is difficult to achieve using satellite-based estimates alone. The performance of this global model also aligns with existing country and regional LUR models. The global model developed here can be used to estimate the magnitude and spatial distribution of global NO2 concentrations and resulting health burden as well as to be applied to health studies in countries where NO2 data and models are not available.

Globally, the NO2 model explains 54% of annual NO2 variation with an RMSE of 5 ppb. The global model performed similarly in all regions with a range of R2 from 0.42 (Africa) to 0.67 (South America). We built a parsimonious model using Lasso regression with parameters that restricted variable selection to correspond to hypothesized effect directions and by limiting inclusion of the same variable at slightly different buffer sizes. This resulted in a model with 10 predictor variables. Some of these variables had limited global but significant regional associations. For example, while population density explained only 1% of global NO2 variation, it explained 3% of variation in Asia. An advantage of a parsimonious model is that it can identify specific associations between variables and NO2. For example, in our final model, NO2 increased by 0.92 ppb for every 100 meters of additional road length within 31,400 square meters of area (circular area with a 100m radius), after adjusting for multiple factors, including satellite-based and regional intercept adjustments for background NO2 levels. A fully unconstrained predictive global model results in model overfitting and contradictory interactions among variables (e.g., models with positive and negative predictors of the same variable); accordingly we focus instead on the constrained model results.
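The two figures quoted in the road-length example follow from simple arithmetic. As a worked check, using the 100m buffer radius and the major roads coefficient of 9.241 ppb per km from Table 3 (the symbols A, r, β and ΔL are introduced here only for illustration):

```latex
\begin{align*}
A &= \pi r^{2} = \pi \,(100\ \mathrm{m})^{2} \approx 31{,}416\ \mathrm{m}^{2} \approx 31{,}400\ \mathrm{m}^{2} \\
\Delta \mathrm{NO_2} &= \beta_{\text{major roads, 100 m}} \times \Delta L
  = 9.241\ \tfrac{\mathrm{ppb}}{\mathrm{km}} \times 0.1\ \mathrm{km} \approx 0.92\ \mathrm{ppb}
\end{align*}
```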

Sensitivity analyses using regional models built using residuals from the global models demonstrated that there is limited additional predictive power to be gained by regionally optimizing the variables included in our global model. Except for North America, variables selected by the regional residual models were 50km in buffer size, suggesting that in general residual models are capturing regional adjustments rather than fine-scale adjustments to NO2 concentrations. This was surprising as we had hypothesized that regional adjustments would capture different traffic levels and vehicle emissions differences (e.g. coefficients for major road variables would be larger in Asia, Africa and South America compared to North America and Europe where there are newer vehicles and more stringent fuel and emission standards). Nevertheless, future gains in global LUR modeling may involve adding additional variables, such as traffic counts, vehicle fleet composition, emission standards, and point source emission estimates, which can capture different dynamics of NO2 concentrations beyond the land use variables included in our model.

While there are no published global NO2 LUR models to compare our results to, there are several continental models. In comparison, the MAE and Adj R2 of the continental United States model developed by Novotny et al. were 2.4 ppb and 0.78, respectively, compared with our North American model values of 4.4 ppb and 0.52 and our global model values of 3.7 ppb and 0.54. Similarly, MAE and Adj R2 of the Australia regional model developed by Knibbs et al. (2014) were 1.4 ppb and 0.81, respectively, compared with our Oceania model values of 2.4 ppb and 0.52. In Western Europe, the MAE and Adj R2 were 8.8 μg/m3 and 0.56, respectively, compared with our Europe model values of 3.3 ppb and 0.57. Except for Western Europe, Adj R2 values were greater and MAE values were lower in the existing referenced models than in our regional models. Adj R2 for our European regional model is within 1% of the existing Western European reference models.15,24 MAE is likewise similar between our regional European model and the reference model (3.3 ppb and 8.8 μg/m3, respectively).15 It is noteworthy that the European model is the only reference model to include more than one country within its spatial extent. For the other existing models, there are several potential reasons for our lower model performance, including the greater spatial extent across multiple countries, air monitor data spanning multiple years, mismatch between NO2 satellite surface year and air monitor year, fewer air monitor measurements (for Australia), gradual reduction in NO2 levels in North America (~5% annually15), and additional variables such as sun intensity and year indicator in the Australian model. To test the impact of these potential factors we performed additional sensitivity analyses to match our modelling approach as closely as possible to the existing NO2 LUR models summarized above (see Supplemental Text). These sensitivity analyses suggest that discrepancies between regional performances and reference models are largely attributable to the factors described above and that our modelling approach is valid for capturing NO2 variation globally and regionally.

Our global NO2 LUR model has several limitations. One major limitation is the global representation of available NO2 monitoring data. Most data came from North America, Europe, Japan and China. While we were able to obtain some data from South American and South Asian countries, limited monitoring data were available for Africa. Given the lack of monitoring data in many countries, we chose to include monitors that did not have enough documentation regarding temporal coverage (i.e., 75% hourly coverage throughout the year). Despite this relaxation of air monitor quality control requirements, some air monitor data from several countries, including Ecuador, Russia, India, and China, were removed due to uncertainty in measurement quality. We anticipate that in upcoming years, as monitoring efforts continue and expand globally, it will be easier to enforce selection criteria without sacrificing representation from specific countries. Cross-validation shows that our model is robust to air monitor selection for North America, Europe, Asia, and, to a lesser extent, South America, but not for Africa and Oceania. Additional monitoring data are needed in these regions, particularly in Africa where rapid urbanization is occurring.24–26

A second limitation is that our approach did not include data on vehicle fleet composition, emission standards or traffic counts. Surprisingly, continental models of residuals did not change the variable selection or coefficients significantly. Future modelling could apply country-specific adjustments for the small number of countries with sufficient monitoring coverage. Third, we chose not to include an adjustment factor for multiple years due to simultaneous decreasing and increasing trends in NO2 in different regions of the world. We also chose to include monitor measurements from more recent years (up through 2015), as limiting the monitor dataset to monitors matching the temporal coverage of satellite-based NO2 surface estimates (up through 2011) would exclude most air monitor measurements we collected from developing countries. We anticipate that this limitation can be addressed by adding a time series component, along with concomitant updates to satellite-based surface estimates and multiple years of measurements in developing countries, which could significantly improve global estimates. Updated satellite estimates in conjunction with several more years of collected global air monitor measurements would allow for spatiotemporal modeling, in a similar fashion to the continental United States LUR NO2 model developed by Bechle et al.13

The regional MABs in our bootstrap analysis range from 31.8% to 74.3%. By documenting regional differences in error and bias, regional differences in model performance can be considered when performing inter-regional comparisons in air quality, exposure, and related burdens. Regional models with better performance are better suited than the global model presented here for studies in which the study area is within the regional model extent.

In conclusion, we have created and demonstrated the robustness of the first global NO2 LUR model, which captures the important fine-scale spatial variability of NO2 air pollution. Globally, the model predicts 54% of annual NO2 variation (Adj R2 = 0.54), with continental R2 ranging from 0.42 to 0.67. Additional air monitor coverage in Africa, Oceania and, to a lesser extent, South America will increase confidence in model predictions for these sparsely covered regions. The NO2 LUR model developed here can be used to estimate the magnitude and spatial distribution of global NO2 exposures and resulting health burden as well as to be applied to health studies in countries without extensive monitoring networks. We are currently running this model globally (at a 100m resolution) to assign to population locations to estimate global exposure to NO2 and resulting health burden. Once completed we will make the global NO2 estimates available at http://health.oregonstate.edu/labs/spatial-health.

Supplementary Material

Figure S1. Method for calculating length of roads upwind from air monitor locations.

Figure S2. Comparison of developed models.

Figure S3. Distribution of mean NO2 by continental region.

Figure S4. Predicted vs. observed NO2 concentrations, by continental region.

Figure S5. Model residuals by continental region.

Figure S6. Percent reduction in R2 for each model input variable both globally and by continental region.

Figure S7. Variable contributions towards NO2 predictions for New York City, USA and Delhi, India transects.

Figure S8. Boundaries used to define continental regions

Table S1. Predictor variable data sources and characteristics.

Table S2. Model performance in bootstrap 10% cross-validation (n=10,000).

Table S3. Performance of residual models.

Supplemental Text. Additional model sensitivity analyses.

Supplemental Excel File. List of air monitor data sources and corresponding urls.

Acknowledgments

The authors are grateful to Brittany Heller for collecting much of the NO2 air monitor datasets. The authors would also like to acknowledge regulatory agencies across the globe for providing publicly available air monitor measurements and quality control data. This research was supported by the Office of the Director, National Institutes of Health under Award Number DP5OD019850. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work of Y. Liu and Q. Xiao was partially supported by the NASA Applied Sciences Program (Grant # NNX11AI53G and NNX16AQ28G, PI: Liu).

References

  • 1. Forouzanfar MH, Afshin A, Alexander LT, Anderson HR, Bhutta ZA, Biryukov S, Brauer M, Burnett R, Cercy K, Charlson FJ, et al. Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2015. Lancet. 2016;388:1659–1674. doi: 10.1016/S0140-6736(16)31679-8.
  • 2. Brauer M, Freedman G, Frostad J, Van Donkelaar A, Martin RV, Dentener F, van Dingenen R, Estep K, Amini H, Apte JS, et al. Ambient air pollution exposure estimation for the global burden of disease 2013. Environ Sci Technol. 2015;50(1):79–88. doi: 10.1021/acs.est.5b03709.
  • 3. Wellenius G, Schwartz J, Mittleman M. Health and the environment: addressing the health impact of air pollution. Sixty-Eighth World Health Assem Agenda Item. 14:A68.
  • 4. Beckerman B, Jerrett M, Brook JR, Verma DK, Arian MA, Finkelstein MM. Correlation of nitrogen dioxide with other traffic pollutants near a major expressway. Atmos Environ. 2008;42(2):275–290.
  • 5. Khreis H, Kelly C, Tate J, Parslow R, Lucas K, Nieuwenhuijsen M. Exposure to traffic-related air pollution and risk of development of childhood asthma: A systematic review and meta-analysis. Environ Int. 2017;100:1–31. doi: 10.1016/j.envint.2016.11.012.
  • 6. Gehring U, Gruzieva O, Agius RM, Beelen R, Custovic A, Cyrys J, Eeftens M, Flexeder C, Fuertes E, Heinrich J, et al. Air pollution exposure and lung function in children: the ESCAPE project. Environ Health Perspect Online. 2013;121(11–12):1357. doi: 10.1289/ehp.1306770.
  • 7. Rice MB, Ljungman PL, Wilker EH, Gold DR, Schwartz JD, Koutrakis P, Washko GR, O’Connor GT, Mittleman MA. Short-term exposure to air pollution and lung function in the Framingham Heart Study. Am J Respir Crit Care Med. 2013;188(11):1351–1357. doi: 10.1164/rccm.201308-1414OC.
  • 8. Hamra GB, Laden F, Cohen AJ, Raaschou-Nielsen O, Brauer M, Loomis D. Environ Health Perspect. 2015;123(11):1107–1112. doi: 10.1289/ehp.1408882.
  • 9. Karner AA, Eisinger DS, Niemeier DA. Near-roadway air quality: synthesizing the findings from real-world data. Environ Sci Technol. 2010;44(14):5334–5344. doi: 10.1021/es100008x.
  • 10. Crouse DL, Peters PA, Villeneuve PJ, Proux MO, Shin HH, Goldberg MS, Johnson M, Wheeler AJ, Allen RW, Atari DO, Jerrett M, Brauer M, Brook JR, Cakmak S, Burnett RT. Within- and between-city contrasts in nitrogen dioxide and mortality in 10 Canadian cities; a subset of the Canadian Census Health and Environment Cohort (CanCHEC). J Expo Sci Environ Epidemiol. 2015;25(5):482–489. doi: 10.1038/jes.2014.89.
  • 11. Novotny EV, Bechle MJ, Millet DB, Marshall JD. National satellite-based land-use regression: NO2 in the United States. Environ Sci Technol. 2011;45(10):4407–4414. doi: 10.1021/es103578x.
  • 12. Young MT, Bechle MJ, Sampson PD, Szpiro AA, Marshall JD, Sheppard L, Kaufman JD. Satellite-based NO2 and model validation in a national prediction model based on universal Kriging and land-use regression. Environ Sci Technol. 2016;50(7):3686–3694. doi: 10.1021/acs.est.5b05099.
  • 13. Bechle MJ, Millet DB, Marshall JD. National Spatiotemporal Exposure Surface for NO2: Monthly Scaling of a Satellite-Derived Land-Use Regression, 2000–2010. Environ Sci Technol. 2015;49(20):12297–12305. doi: 10.1021/acs.est.5b02882.
  • 14. Hystad P, Setton E, Cervantes A, Poplawski K, Deschenes S, Brauer M, van Donkelaar A, Lamsal L, Martin R, Jerrett M, et al. Creating national air pollution models for population exposure assessment in Canada. Environ Health Perspect. 2011;119(8):1123. doi: 10.1289/ehp.1002976.
  • 15. Vienneau D, de Hoogh K, Bechle MJ, Beelen R, van Donkelaar A, Martin RV, Millet DB, Hoek G, Marshall JD. Western European land use regression incorporating satellite- and ground-based measurements of NO2 and PM10. Environ Sci Technol. 2013;47(23):13555–13564. doi: 10.1021/es403089q.
  • 16. Knibbs LD, Hewson MG, Bechle MJ, Marshall JD, Barnett AG. A national satellite-based land-use regression model for air pollution exposure assessment in Australia. Environ Res. 2014;135:204–211. doi: 10.1016/j.envres.2014.09.011.
  • 17. Geddes JA, Martin RV, Boys BL, van Donkelaar A. Long-term trends worldwide in ambient NO2 concentrations inferred from satellite observations. Environ Health Perspect Online. 2016;124(3):281. doi: 10.1289/ehp.1409567.
  • 18. van Rossum G, Drake FL Jr. Extending and embedding Python, Release 2.7. Python Softw Found Wolfeboro Falls. 2010.
  • 19. ArcGIS 10.3.1. Environmental Systems Research Institute, Inc; Redlands: 2015.
  • 20. Saha S, Moorthi S, Pan H-L, Wu X, Wang J, Nadiga S, Tripp P, Kistler R, Woollen J, Behringer D, et al. The NCEP climate forecast system reanalysis. Bull Am Meteorol Soc. 2010;91(8):1015–1057.
  • 21. Moore RT, Hansen MC. Google Earth Engine: a new cloud-computing platform for global-scale earth observation data and analysis. AGU Fall Meeting Abstracts; 2011; p. 2.
  • 22. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1.
  • 23. RStudio. RStudio: integrated development environment for R. RStudio Inc; Boston Mass: 2012.
  • 24. de Hoogh K, Gulliver J, van Donkelaar A, Martin RV, Marshall JD, Bechle MJ, Cesaroni G, Pradas MC, Dedele A, Eeftens M, et al. Development of West-European PM2.5 and NO2 land use regression models incorporating satellite-derived and chemical transport modelling data. Environ Res. 2016;151:1–10. doi: 10.1016/j.envres.2016.07.005.
  • 25. Henderson JV, Storeygard A, Deichmann U. Has climate change driven urbanization in Africa? J Dev Econ. 2017;124:60–82. doi: 10.1016/j.jdeveco.2016.09.001.
  • 26. Parnell S, Walawege R. Sub-Saharan African urbanisation and global environmental change. Glob Environ Change. 2011;21:S12–S20.
  • 27. Gerland P, Raftery AE, Ševčíková H, Li N, Gu D, Spoorenberg T, Alkema L, Fosdick BK, Chunn J, Lalic N, et al. World population stabilization unlikely this century. Science. 2014;346(6206):234–237. doi: 10.1126/science.1257469.
