A Global Land Use Regression Model for Nitrogen Dioxide Air Pollution

Andrew Larkin; Jeffrey A Geddes; Randall V Martin; Qingyang Xiao; Yang Liu; Julian D Marshall; Michael Brauer; Perry Hystad

doi:10.1021/acs.est.7b01148

Author manuscript; available in PMC: 2018 Jun 20.

Published in final edited form as: Environ Sci Technol. 2017 Jun 5;51(12):6957–6964. doi: 10.1021/acs.est.7b01148

PMCID: PMC5565206 NIHMSID: NIHMS892820 PMID: 28520422

Abstract

Nitrogen dioxide (NO₂) is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the global distribution of NO₂ exposure and its associated impacts on global health remain largely uncertain. To advance global exposure estimates, we created a global NO₂ land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO₂ variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n=10,000) demonstrated robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R2 within 2%) but not for Africa and Oceania (adjusted R2 within 11%), where NO₂ monitoring data are sparse. The final model included 10 variables that captured both between- and within-city spatial gradients in NO₂ concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO₂ were consistently the strongest predictors. The resulting model will be made available and can be used for global risk assessments and health studies, particularly in countries without existing NO₂ monitoring data or models.

INTRODUCTION

Outdoor air pollution is a source of concern for global human health. The most recent Global Burden of Disease study estimated that ambient fine particulate matter less than 2.5 microns in diameter (PM2.5) contributes to 4.2 million deaths annually, and ozone to an additional 254,000 deaths.¹ More than 50% of the disease burden from air pollution occurs in the rapidly developing countries of China and India, where air pollution concentrations are high and populations are large.² In 2015, the World Health Assembly identified air pollution as the “…world’s largest single environmental health risk” and called for additional efforts to monitor and evaluate the impacts of air pollution on health.³

To fully evaluate ambient air pollution impacts on human health, population exposure estimates should extend beyond PM2.5 and ozone to better represent additional common exposures, such as traffic-related air pollution, for which nitrogen dioxide (NO₂) is a common marker.⁴ A growing body of evidence links traffic-related pollution to myriad acute and chronic adverse health outcomes, including increased odds of incident asthma in children,⁵ decreased lung function in children⁶ and adults,⁷ and lung cancer in adults.⁸

The global distribution of NO₂ exposure and concomitant impacts on global health remains largely uncertain, in part because of challenges in estimating global NO₂ concentrations. NO₂ air monitor networks are sparse or non-existent in many low-income countries. Where they do exist, they generally do not capture the spatial gradients needed to understand exposures, e.g., NO₂ concentrations near major roads and highways (100–400 m⁹). Capturing fine-scale NO₂ gradients is important for exposure assessment, as within-city variation in annual NO₂ concentrations is more strongly associated with multiple non-accidental causes of mortality than between-city variation.¹⁰ Similarly, NO₂ estimates derived from moderate-resolution remote sensing products (~10 km × 10 km) do not capture fine-scale NO₂ gradients. Land use regression (LUR) models can predict NO₂ concentrations across large spatial extents and have been created for large geographic areas, including the continental United States,¹¹⁻¹³ Canada,¹⁴ Europe,¹⁵ and Australia.¹⁶ These models are built from NO₂ monitor data and combine satellite-based NO₂ estimates with land use characteristics and roadway information to predict NO₂ concentrations at fine spatial scales (30–500 m). In 2011, Novotny and colleagues¹¹ demonstrated that globally available land classification datasets could replace high-resolution country-level equivalents with no loss of predictive power in their LUR model for the continental US. The LUR approach is therefore an excellent candidate for developing high-resolution NO₂ estimates at a global extent.

Here, we present the development of the first global NO₂ LUR model (for 2011), based on annual NO₂ measurement data (n=5,220) compiled for 58 countries and on globally available predictor datasets. A global NO₂ model will inform global risk assessments by providing estimates of NO₂ exposure and associated health burden, supply standardized NO₂ estimates for multi-country studies, and provide NO₂ estimates for health studies in developing countries where detailed city-specific or country-specific models do not exist.

METHODS

NO₂ Air Pollution Monitoring Data

Annual NO₂ air monitor measurements were collected from a wide range of environmental and regulatory agency websites (Supplemental Excel File). Air monitors from the US, Canada, Europe, China, and Japan were restricted to those with greater than 75% hourly coverage. In other countries, percent coverage was not provided, or the temporal unit used to derive percent coverage (hourly, daily, or monthly) was unknown. For those countries, we therefore restricted selection to monitors with an annual standard deviation less than 25 ppb, at least two years of annual measurements, year-round coverage (greater than 75% coverage or a positive indicator of complete coverage for each month), and latitude/longitude coordinates with four or more decimal places of precision (i.e., to within 12 meters). The full database of collected NO₂ air monitor measurements is available at http://health.oregonstate.edu/labs/spatial-health/resources/. To match air monitor measurements with satellite-based surface NO₂ estimates (described below), a mean annual NO₂ concentration was calculated for each monitor from up to three annual measurements closest to the year 2011.
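The relaxed selection rules and the 2011-centred averaging described above can be expressed compactly. The sketch below is a minimal illustration under stated assumptions, not the authors' processing code: the `Monitor` record and its field names are hypothetical, coordinates are kept as text so that trailing zeros count toward precision, and when two years are equally distant from 2011 the earlier year is taken first (the text does not specify a tie-break).

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict

TARGET_YEAR = 2011  # year of the satellite-based surface NO2 estimates


@dataclass
class Monitor:
    """Hypothetical monitor record; field names are illustrative only."""
    lat: str  # kept as text so that trailing zeros (precision) are preserved
    lon: str
    annual_sd_ppb: float  # standard deviation of the monitor's annual record
    year_round: bool  # >75% coverage or complete coverage flagged for each month
    annual_means: Dict[int, float] = field(default_factory=dict)  # {year: ppb}


def decimal_places(coord: str) -> int:
    """Count digits after the decimal point in a coordinate string."""
    return len(coord.split(".", 1)[1]) if "." in coord else 0


def passes_relaxed_qc(m: Monitor) -> bool:
    """Selection rules for monitors without documented percent coverage."""
    return (
        m.annual_sd_ppb < 25
        and len(m.annual_means) >= 2
        and m.year_round
        and decimal_places(m.lat) >= 4
        and decimal_places(m.lon) >= 4
    )


def average_near_target(annual_means: Dict[int, float], k: int = 3) -> float:
    """Mean of up to k annual values closest to TARGET_YEAR.

    Equally distant years are ordered earliest first (an assumption).
    """
    years = sorted(annual_means, key=lambda y: (abs(y - TARGET_YEAR), y))[:k]
    return mean(annual_means[y] for y in years)


if __name__ == "__main__":
    m = Monitor("40.71280", "-74.00600", 6.2, True,
                {2009: 20.0, 2010: 18.0, 2012: 16.0, 2014: 12.0})
    print(passes_relaxed_qc(m))  # True
    print(average_near_target(m.annual_means))  # uses 2010, 2012, 2009 -> 18.0
```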

Predictor Variables

Predictor variables included satellite NO₂ estimates and land use variables. All predictor variables and corresponding input data sources are listed in Table S1. Variables consisted of either estimates at the exact air monitor location (point) or the average of a variable within a radius around the air monitor location (buffer). First, satellite-based estimates of surface NO₂ concentrations for 2010–2012 were assigned to each monitor. Briefly, tropospheric NO₂ column retrievals from the SCIAMACHY and GOME-2 instruments were combined with output from the global GEOS-Chem model to produce gridded surface NO₂ estimates at ~10 km × 10 km resolution.¹⁷ The three-year averages were based on daily overpass data after excluding pixels contaminated by cloud (cloud radiance fraction > 0.5) or snow (identified using snow cover from the National Ice Center’s Interactive Snow and Ice Mapping System). Potential sampling biases in the annual means were accounted for by applying a GEOS-Chem model correction for the missing days.
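The cloud and snow screening can be illustrated for a single grid cell. This sketch assumes the daily surface-NO₂ values, cloud radiance fractions, and snow flags are already aligned by day; it does not reproduce the satellite retrieval or the GEOS-Chem correction for missing days, and the function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

CLOUD_RADIANCE_FRACTION_MAX = 0.5  # pixels above this are treated as cloudy


def screened_mean(
    daily_no2: Sequence[Optional[float]],
    cloud_fraction: Sequence[float],
    snow_covered: Sequence[bool],
) -> Optional[float]:
    """Average daily surface-NO2 values for one grid cell.

    Days with no retrieval (None), cloud radiance fraction > 0.5, or snow
    cover are dropped. Returns None when no day survives screening. No
    correction for the dropped days is applied here.
    """
    kept = [
        value
        for value, cloud, snow in zip(daily_no2, cloud_fraction, snow_covered)
        if value is not None and cloud <= CLOUD_RADIANCE_FRACTION_MAX and not snow
    ]
    return mean(kept) if kept else None


if __name__ == "__main__":
    print(screened_mean([10.0, 20.0, None, 14.0],
                        [0.1, 0.6, 0.2, 0.5],
                        [False, False, False, False]))  # 12.0
```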

For each land use characteristic evaluated as a buffer, multiple buffer variables were created, with radii ranging from 100 m to 50 km (buffer distances are listed in Table S1). Land use characteristics in the dataset include normalized difference vegetation index (NDVI), tree cover, impervious surface area, population density, major and minor road length, length of major roads upwind from air monitors, power plant CO₂ emissions, active fires, and distance to coast. The upwind major roads variable is the average length of major roads upwind from an air monitor station in each year (Figure S1). Buffer variables and point estimates were calculated using Python v. 2.7¹⁸ scripts written for automated analysis in ArcGIS v. 10.3.1.¹⁹ Annual distributions of wind direction from the National Centers for Environmental Prediction Climate Forecast System²⁰ were calculated using a Python script written for automated analysis in Google Earth Engine²¹ (Python scripts are available at https://github.com/larkinandy/LUR-NO₂-Model).
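The core buffer operation, averaging a gridded land use layer within several radii of a monitor, can be sketched in pure Python. This is an illustrative stand-in for the ArcGIS scripts, not a reproduction of them: it assumes a projected grid with square cells, places the monitor at the centre of one cell, and counts a cell as inside the buffer when its centre lies within the radius (partial cells are not area-weighted). The function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def buffer_means(
    grid: List[List[float]],
    cell_size_m: float,
    row: int,
    col: int,
    radii_m: Sequence[float],
) -> Dict[float, float]:
    """Mean of the grid cells whose centres lie within each radius of the
    monitor, which is assumed to sit at the centre of cell (row, col)."""
    n_rows, n_cols = len(grid), len(grid[0])
    means: Dict[float, float] = {}
    for radius in radii_m:
        reach = int(radius // cell_size_m)  # cells to search in each direction
        values = [
            grid[i][j]
            for i in range(max(0, row - reach), min(n_rows, row + reach + 1))
            for j in range(max(0, col - reach), min(n_cols, col + reach + 1))
            if math.hypot((i - row) * cell_size_m, (j - col) * cell_size_m) <= radius
        ]
        means[radius] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    # 3 x 3 grid of 100 m cells with a single nonzero cell at the centre.
    toy = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    print(buffer_means(toy, 100.0, 1, 1, [50, 100, 150]))
    # {50: 9.0, 100: 1.8, 150: 1.0}
```

Computing several radii in one pass mirrors the practice of generating many candidate buffer variables per characteristic and letting variable selection choose among them.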

Statistical Analysis

LUR models were developed using Lasso variable selection (glmnet package¹⁶,²² in RStudio, v. 3.2.2²³). Lasso regression was previously used successfully by Knibbs et al. (2014) to create their Australian NO₂ land use regression model. Parameters for Lasso variable selection included standardizing independent variables (standardize = TRUE), selecting variables to minimize mean squared error (type.measure = "mse"), and constraining the direction of variable coefficients to conform to a priori hypotheses (e.g., increases in major roads and tree cover are associated with increases and decreases in NO₂ concentrations, respectively) (lower.limits = 0). The lasso model whose lambda gave a cross-validation error within one standard error of the minimum cross-validation error was selected, to favor model simplicity and inference over prediction (s = "lambda.1se"). To reduce multicollinearity, when several buffer sizes of the same land use characteristic were selected, only the smallest buffer was retained if the radii of the larger buffers were within three times the radius of the smaller buffer. For example, if major roads variables with 100 m, 200 m, and 400 m buffers were all selected by lasso regression, only the 100 m and 400 m variables would be included in the regression model. Finally, variables were included in the final model if they were statistically significant, increased adjusted R2 by 1 percent or more either globally or in one or more continental regions, and exhibited variance inflation factors less than 5 in at least one region and less than 10 in all regions.
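Two of the selection rules above are simple enough to state as code. The sketch below is illustrative rather than the authors' R scripts: `cv.glmnet` already reports `lambda.1se`, and the re-implementation here only makes the rule explicit. The text does not say how a radius of exactly three times a smaller one is handled, or whether each larger buffer is compared with the smallest buffer or with the last one retained; this sketch assumes comparison with the last retained buffer and drops exact ties. Function names are hypothetical.

```python
from typing import List, Sequence


def prune_buffers(radii: Sequence[float], ratio: float = 3.0) -> List[float]:
    """Keep the smallest selected buffer; drop any larger buffer whose radius
    is within `ratio` times the last retained radius (compared with <=)."""
    kept: List[float] = []
    for r in sorted(radii):
        if not kept or r > ratio * kept[-1]:
            kept.append(r)
    return kept


def lambda_1se(
    lambdas: Sequence[float],
    cv_mean: Sequence[float],
    cv_se: Sequence[float],
) -> float:
    """One-standard-error rule: the largest (most regularised) lambda whose
    mean cross-validation error is within one standard error of the minimum."""
    i_min = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


if __name__ == "__main__":
    print(prune_buffers([100, 200, 400]))  # [100, 400], the example in the text
    print(lambda_1se([1.0, 0.5, 0.1, 0.01],
                     [10.0, 8.2, 8.0, 8.1],
                     [0.3, 0.3, 0.3, 0.3]))  # 0.5
```

Choosing the larger lambda within one standard error trades a small, statistically indistinguishable increase in cross-validation error for a sparser model, which is the stated reason for preferring `lambda.1se`.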

Model performance was evaluated by calculating root mean squared error (RMSE), mean absolute error (MAE), R-squared (R2), adjusted R-squared (Adj R2), mean percent bias (MB), and mean absolute percent bias (MAB) for the entire global dataset as well as within each continental region. Leave-10%-out cross-validation was performed, in which 10% of the monitors from each continental region were randomly sampled into a testing dataset and the remaining 90% from each region were combined to create the training dataset. Cross-validation was repeated 10,000 times in a bootstrap fashion to generate cross-validation estimates of RMSE, MAE, and R2, both globally and within each continental region.
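The performance metrics and the region-stratified 10% hold-out can be sketched as below. This is illustrative, not the authors' R code, and the function names are hypothetical. Several details are assumptions: percent biases are computed as 100·(predicted − observed)/observed and skip monitors with an observed value of zero (Table 1 includes 0 ppb monitors), R2 is computed as 1 − SS_res/SS_tot, and each region contributes at least one test monitor.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def performance(obs: Sequence[float], pred: Sequence[float], n_predictors: int) -> Dict[str, float]:
    """RMSE, MAE, R2, adjusted R2, mean percent bias (MB) and mean absolute
    percent bias (MAB). Percent biases skip monitors with obs == 0 (assumption)."""
    n = len(obs)
    resid = [p - o for o, p in zip(obs, pred)]
    obs_mean = sum(obs) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    r2 = 1 - ss_res / ss_tot
    pct = [100 * (p - o) / o for o, p in zip(obs, pred) if o != 0]
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in resid) / n,
        "R2": r2,
        "AdjR2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
        "MB": sum(pct) / len(pct),
        "MAB": sum(abs(x) for x in pct) / len(pct),
    }


def stratified_split(
    regions: Sequence[str], test_frac: float = 0.1, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[int]]:
    """Randomly assign test_frac of each region's monitors to a test set;
    the remaining monitors from all regions form the training set."""
    rng = rng or random.Random()
    by_region: Dict[str, List[int]] = defaultdict(list)
    for idx, region in enumerate(regions):
        by_region[region].append(idx)
    test: List[int] = []
    for idxs in by_region.values():
        k = max(1, round(test_frac * len(idxs)))  # at least one test monitor per region
        test.extend(rng.sample(idxs, k))
    test_set = set(test)
    train = [i for i in range(len(regions)) if i not in test_set]
    return train, sorted(test)
```

Repeating `stratified_split` 10,000 times, refitting the model on each training set, and evaluating `performance` on each test set would yield bootstrap estimates of the kind described above; the refitting step is omitted here.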

Several sensitivity analyses were performed to evaluate the robustness of our global model. Continental LUR models were created for each region and compared with previously published LUR models for the continental US, Canada, Europe, and Australia. Continent-specific models were also created from the residuals of the global model to identify variables excluded from the global model that may be important for capturing regional variation. For a comparison of the global, regional, and residual model methodologies, see Figure S2. To test model sensitivity to vegetation levels and potential overfitting, we performed two t-tests comparing residuals in the bottom (NDVI < 0.28) and top (NDVI > 0.57) deciles of average vegetative cover within 10 km. The first t-test used residuals from the satellite-based estimates, while the second used residuals from the global LUR model predictions.
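The vegetation comparison can be sketched as follows. The NDVI cut-offs (0.28 and 0.57) come from the text; the function names are hypothetical, and because the text does not say which form of t-test was used, Welch's unequal-variance test is swapped in here for illustration. Only the statistic and degrees of freedom are computed; a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_by_ndvi(
    ndvi: Sequence[float], residuals: Sequence[float], low: float = 0.28, high: float = 0.57
) -> Tuple[List[float], List[float]]:
    """Residuals in the bottom (NDVI < low) and top (NDVI > high) deciles."""
    low_res = [r for v, r in zip(ndvi, residuals) if v < low]
    high_res = [r for v, r in zip(ndvi, residuals) if v > high]
    return low_res, high_res


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    low, high = split_by_ndvi([0.1, 0.4, 0.6, 0.2, 0.7, 0.05],
                              [1.0, 9.0, 4.0, 2.0, 5.0, 3.0])
    print(low, high)  # [1.0, 2.0, 3.0] [4.0, 5.0]
    print(welch_t(low, high))
```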

The R scripts used to create the LUR models, evaluate model performance, and perform sensitivity analyses are available at https://github.com/larkinandy/LUR-NO₂-Model.

RESULTS

Global NO₂ Database

The distribution of NO₂ air monitors that passed selection criteria is shown in Figure 1, and the corresponding summary statistics, stratified by continental region, are shown in Table 1. Histograms of annual air monitor concentrations for each region are shown in Figure S3. Measurements were collected from 6,761 unique air monitors, 5,220 of which (77%) met selection criteria. Air monitor coverage is greatest in Europe and Asia and sparse in Africa and Oceania. The global median year of air monitor measurements in this database is 2013, with the median year of annual measurements by continent ranging from 2011 to 2013.5. Annual NO₂ concentrations range from 0 to 59 ppb, with a mean annual air monitor concentration of 11.5 ppb. Mean concentrations are greatest in Asia (14.1 ppb) and North America (13.1 ppb) and lowest in Africa (7.3 ppb) and Oceania (6.7 ppb). Regional standard deviations of air monitor averages range from 3.8 ppb (Africa) to 8.2 ppb (North America), with a global standard deviation of 7.5 ppb.

Figure 1. Global distribution of NO₂ air monitor locations.

Table 1.

NO₂ air monitor summary statistics, stratified by region.

Region	Median Year	Monitors (n)	Min NO₂ (ppb)	Max NO₂ (ppb)	Mean NO₂ (ppb)	Std Dev NO₂ (ppb)	25th perc* (ppb)	50th perc* (ppb)	75th perc* (ppb)	90th perc* (ppb)
N America	2011	731	0	44	13.1	8.2	6.7	11.3	17.3	24.3
S America	2011	105	1	35	12.7	7.6	7.0	10.5	18.6	23.2
Europe	2012	2351	0	47	11.8	6.8	7.0	11.0	15.5	21.5
Africa	2013.5	63	2	19	7.3	3.8	4.5	7.0	8.8	13.0
Asia	2012	1886	1	59	14.1	7.7	8.5	13.0	18.3	58.7
Oceania	2011	84	1	23	6.7	4.5	3.4	6.0	9.3	12.7
Global	2013	5220	0	59	11.5	7.5	7.3	11.4	16.7	23.0


* perc – percentile.

NO₂ LUR Model

The final LUR model performance is shown in Table 2. Global model predictions are shown in Figure 2, and predicted vs. observed air monitor measurements are shown in Figures 3 (global) and S4 (by region). Final model variables are summarized in Table 3. Globally, the NO₂ model explains 54% of annual NO₂ variation, with an MAE of 3.7 ppb and an MAB of 44%. Model predictions are positively biased overall (MB = 25%); bias is generally positive at air monitor locations with annual concentrations below 10 ppb and negative at locations with concentrations above 40 ppb (Figure 3). Regionally, adjusted R2 ranges from 0.31 (Africa) to 0.63 (South America), MAE from 2.3 ppb (Africa) to 4.4 ppb (North America), and MAB from 34% (Asia) to 74% (North America). In general, model performance in each region is positively associated with regional NO₂ standard deviation but not with sample size (Table 1). The global distribution of model residuals is shown in Figure S5. Residuals are greatest in North America and smallest in Oceania.

Table 2.

NO₂ model training performance.

Region	RMSE* (ppb)	MAE** (ppb)	R2	Adj R2	MB*** (%)	MAB**** (%)
N America	5.7	4.4	0.52	0.52	52	74
S America	4.4	3.1	0.67	0.63	29	44
Europe	4.8	3.5	0.52	0.52	24	43
Africa	2.9	2.3	0.42	0.31	20	41
Asia	5.3	3.7	0.52	0.51	16	34
Oceania	3.2	2.4	0.51	0.44	30	63
Global	5.0	3.7	0.54	0.54	25	44


* RMSE – root mean square error.

** MAE – mean absolute error.

*** MB – mean percent bias.

**** MAB – mean absolute percent bias.

Figure 2. Global NO₂ model predictions for the year 2011. Insets of select cities for each continental region demonstrate within-city variation of model predictions.

Figure 3. Predicted vs. observed mean annual NO₂ concentrations. Values are moderately correlated, with a positive mean bias.

Table 3.

Global NO₂ LUR model structure

Variable	Units	IQR	Buffer Radius (km)	B	Std Err	Global %R2 Reduction*	Regional %R2 Reduction**	Global p-value	Regional p-value
Intercept	ppb	NA	NA	8.370	0.701	NA	NA	<0.01	NA
N America Intercept	ppb	NA	NA	2.985	0.611	NA	NA	<0.01	NA
S America Intercept	ppb	NA	NA	1.977	0.754	NA	NA	0.01	NA
Europe Intercept	ppb	NA	NA	1.274	0.584	NA	NA	0.03	NA
Asia Intercept	ppb	NA	NA	2.345	0.592	NA	NA	<0.01	NA
Major Roads	km	0.18	0.1	9.241	0.410	9.1	13.5	<0.01	<0.01
Satellite-Based NO₂	ppb	2.97	NA	0.832	0.038	8.8	19.5	<0.01	<0.01
Population Density	persons/km²	2.09	3.5	0.231	0.032	3.3	3.3	<0.01	<0.01
Water Body	%	33	50	−3.883	0.394	1.9	12.3	<0.01	<0.01
Major Roads	km	27.08	2.5	0.040	0.015	1.4	8.2	<0.01	<0.01
NDVI***	Normalized	0.17	0.2	−8.290	1.287	0.8	11.9	<0.01	<0.01
Tree Cover	%	10.05	1.5	−0.023	0.006	0.3	7.6	<0.01	<0.01
ISA****	%	33.96	1.5	0.028	0.008	0.2	2.4	<0.01	<0.01
ISA****	%	25.05	7	0.029	0.010	0.1	2.9	0.01	<0.01
NDVI***	Normalized	0.15	1.2	−1.600	1.524	0.02	11.5	0.29	<0.01


* Global reduction in explained variance after removing the variable from the model.

** Maximum reduction in explained variance in a given region after removing the variable from the model. The Africa intercept was not significant and therefore not included in the final model. Oceania served as the reference group for regional intercepts. Variables are listed in order of global %R2 reduction.

*** NDVI – Normalized Difference Vegetation Index.

**** ISA – Impervious Surface Area. The global model includes two variables each for NDVI and ISA, with different buffer distances. See Figures S1 and S6 and Table S1 for more information about model variables. Regional R2 and p-values are based on regional subsets of the global training dataset.
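The additive structure of Table 3 can be written out as a short prediction function. The sketch below is illustrative only: the dictionary keys are hypothetical labels rather than the authors' variable names, coefficients are transcribed from Table 3, Africa and Oceania receive no regional offset (Oceania is the reference group and the Africa intercept was dropped), and inputs must already be in the units and transformations defined in Table S1, which are not reproduced here.

```python
from typing import Dict

# Coefficients (B) transcribed from Table 3; predictor keys are hypothetical labels.
INTERCEPT = 8.370
REGION_INTERCEPTS = {
    "N America": 2.985,
    "S America": 1.977,
    "Europe": 1.274,
    "Asia": 2.345,
    "Africa": 0.0,   # intercept not significant, not in the final model
    "Oceania": 0.0,  # reference group
}
COEFFICIENTS = {
    "major_roads_100m_km": 9.241,
    "satellite_no2_ppb": 0.832,
    "population_density_3500m": 0.231,
    "water_body_50km": -3.883,
    "major_roads_2500m_km": 0.040,
    "ndvi_200m": -8.290,
    "tree_cover_1500m_pct": -0.023,
    "isa_1500m_pct": 0.028,
    "isa_7000m_pct": 0.029,
    "ndvi_1200m": -1.600,
}


def predict_no2(region: str, predictors: Dict[str, float]) -> float:
    """Linear LUR prediction in ppb; a missing predictor raises KeyError."""
    total = INTERCEPT + REGION_INTERCEPTS[region]
    for name, beta in COEFFICIENTS.items():
        total += beta * predictors[name]
    return total
```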

Variables with negative coefficients include NDVI, tree cover, and water body. Percent water body contributes the most to predicting lower concentrations in the model, with an estimated 0.39 ppb decrease for each 10% increase in water body coverage within 50 km. Variables with positive coefficients include satellite-based NO₂, impervious surface area, population density, and length of major roads. The strongest positive predictors are satellite-based NO₂ and major road length within 100 m. For every 0.1 km increase in major road length within 100 m, predicted NO₂ concentrations increase by 0.92 ppb. Some variables explained more than 1% of NO₂ variation only in continental datasets. For example, NDVI within 200 m contributes only 0.81% of the variation in the global dataset; however, removing NDVI 200 m from the model significantly reduces the percent variance explained in specific regions (e.g., by 11.9% in Oceania). The percent R2 reduction for all model variables by continental region is shown in Figure S6.
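Because the model is additive and linear, each coefficient (B) in Table 3 translates directly into a change in predicted concentration, ΔNO₂ = B·Δx, with all other predictors held fixed. The worked values below reproduce the two effect sizes quoted above. The water-body line treats that variable as a fraction between 0 and 1, which is what the quoted 0.39 ppb figure implies; this scaling is inferred from the quoted figure and should be checked against Table S1.

```latex
\begin{aligned}
\Delta \mathrm{NO_2} &= B \, \Delta x \\
\text{Major roads (100 m buffer):}\quad 9.241~\tfrac{\text{ppb}}{\text{km}} \times 0.1~\text{km} &\approx 0.92~\text{ppb} \\
\text{Water body (50 km buffer):}\quad -3.883~\text{ppb} \times 0.10 &\approx -0.39~\text{ppb}
\end{aligned}
```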

The applied model predictions for New York City, USA, and Delhi, India, are shown in Figure 4. Individual variable contributions to model predictions along a transect through each city are shown in Figure S7. Major roads within 100 m, NDVI within 200 m, and population density within 3,500 m were strong predictors of NO₂ in both New York and Delhi. In New York the strongest predictor was satellite NO₂, while in Delhi it was population density.

Figure 4. Predicted annual NO₂ concentrations in New York City, USA (top left) and Delhi, India (bottom left). Green lines correspond to model transects, with model predictions along each transect (moving from southwest to northeast) shown at top right for New York City and bottom right for Delhi.

Sensitivity Analyses

Results of the bootstrap 10% cross-validation are shown in Table S2. Globally, MAE is 0.1 ppb greater and R2 is 1% smaller than for the model trained on the entire dataset (Table 2). Regionally, MAE is 0.3 ppb greater and R2 is 4% lower for Africa, and MAE is 0.2 ppb greater and R2 is 18% lower for Oceania. For all other regions, MAE and R2 are within 0.2 ppb and 5% of model training values, respectively. Model performance is robust with respect to monitor selection for South America, North America, Europe, and Asia, but not for Africa and Oceania, likely due to small sample sizes and sparse spatial coverage.

Results of our regional model sensitivity analysis are shown in Table 4. The R2 for all regional models was slightly higher than for the global model, except for Africa. Performance of the residual models is provided in Table S3. With the exception of North America, Africa, and Oceania, residual models improve R2 by less than 2% compared with the global model. In addition, most of the variables selected by the residual models had 50 km buffers, except for North America, where the model included minor roads within a 50 km buffer. Compared with global satellite estimates alone, the global LUR model has a lower RMSE (5.0 vs. 6.6 ppb) and a greater Adj R2 (0.54 vs. 0.22).

Table 4.

Models created from regional partitions of the global dataset. Within their respective extents, regional models generally perform better than the global model, although the difference in performance varies.

Region	RMSE* (ppb)	MAE** (ppb)	R2	Adj R2	MB*** (%)	MAB**** (%)
N America	5.0	3.8	0.64	0.63	31	52
S America	3.5	2.6	0.79	0.77	20	36
Europe	4.5	3.3	0.57	0.57	20	38
Africa	2.9	2.4	0.41	0.38	21	43
Asia	4.9	3.5	0.59	0.58	16	33
Oceania	3.2	2.3	0.49	0.46	38	62


* RMSE – root mean square error.

** MAE – mean absolute error.

*** MB – mean percent bias.

**** MAB – mean absolute percent bias.

In our comparison of satellite and LUR model performance in areas with low and high vegetation, residuals from satellite estimates are significantly greater in areas with low vegetative cover than in areas with high vegetative cover (p < 0.001; 95% CI 7.4 to 9.2 ppb). Residuals from the global land use regression model, however, do not differ significantly between areas with high and low vegetation (p = 0.52; 95% CI −0.9 to 0.5 ppb). These results support the use of land use characteristics to improve satellite-based NO₂ predictions.

DISCUSSION

Using 5,220 monitors from 58 countries, we developed a global LUR model that captured 54% of global NO₂ variation. Importantly, this model captured both between- and within-city spatial variability in NO₂, representing fine-scale variation that is difficult to resolve using satellite-based estimates alone. The performance of this global model also aligns with existing country and regional LUR models. The global model developed here can be used to estimate the magnitude and spatial distribution of global NO₂ concentrations and the resulting health burden, and can be applied in health studies in countries where NO₂ data and models are not available.

Globally, the NO₂ model explains 54% of annual NO₂ variation, with an RMSE of 5 ppb. The global model performed similarly across regions, with R2 ranging from 0.42 (Africa) to 0.67 (South America). We built a parsimonious model using Lasso regression, with parameters that restricted variable selection to hypothesized effect directions and limited inclusion of the same variable at similar buffer sizes. This resulted in a model with 10 predictor variables. Some of these variables had limited global but significant regional associations. For example, while population density explained only 1% of global NO₂ variation, it explains 3% of variation in Asia. An advantage of a parsimonious model is that it can identify specific associations between variables and NO₂. For example, in our final model, NO₂ increased by 0.92 ppb for every 100 meters of additional major road length within 31,400 square meters of area (a circle with a 100 m radius), after adjusting for multiple factors, including satellite-based NO₂ and regional intercepts that account for background NO₂ levels. A fully unconstrained predictive global model results in overfitting and contradictory interactions among variables (e.g., positive and negative coefficients for the same variable); we therefore focus on the constrained model results.

Sensitivity analyses using regional models built from the residuals of the global model demonstrated that there is limited additional predictive power to be gained by regionally optimizing the variables included in our global model. Except for North America, variables selected by the regional residual models had 50 km buffers, suggesting that residual models generally capture regional adjustments rather than fine-scale adjustments to NO₂ concentrations. This was surprising, as we had hypothesized that regional adjustments would capture differences in traffic levels and vehicle emissions (e.g., coefficients for major road variables would be larger in Asia, Africa, and South America than in North America and Europe, where vehicles are newer and fuel and emission standards are more stringent). Nevertheless, future gains in global LUR modeling may come from adding variables such as traffic counts, vehicle fleet composition, emission standards, and point source emission estimates, which can capture dynamics of NO₂ concentrations beyond the land use variables included in our model.

While there are no published global NO₂ LUR models with which to compare our results, there are several continental models. The MAE and Adj R2 of the continental United States model developed by Novotny et al. were 2.4 ppb and 0.78, respectively, compared with our North American model values of 4.4 ppb and 0.52 and our global model values of 3.7 ppb and 0.54. Similarly, the MAE and Adj R2 of the Australian model developed by Knibbs et al. (2014) were 1.4 ppb and 0.81, respectively, compared with our Oceania model values of 2.4 ppb and 0.52. For Western Europe, the reference model MAE and Adj R2 were 8.8 μg/m³ and 0.56, respectively, compared with our Europe model values of 3.3 ppb and 0.57. Except for Western Europe, Adj R2 values were greater and MAEs lower in the existing reference models than in our regional models. Adj R2 for our European regional model is within 1% of the existing Western European reference models.¹⁵,²⁴ MAE is likewise similar between our regional European model and the reference model (3.3 ppb and 8.8 μg/m³, respectively).¹⁵ Notably, the European model is the only reference model to include more than one country within its spatial extent. For the other existing models, there are several potential reasons for our lower model performance, including the greater spatial extent across multiple countries, air monitor data spanning multiple years, mismatch between the satellite-based NO₂ surface estimate year and the air monitor year, fewer air monitor measurements (for Australia), the gradual reduction in NO₂ levels in North America (~5% annually¹⁵), and additional variables such as sun intensity and a year indicator in the Australian model. To test the impact of these potential factors, we performed additional sensitivity analyses that matched our modelling approach as closely as possible to the existing NO₂ LUR models summarized above (see Supplemental Text). These sensitivity analyses suggest that discrepancies between our regional performance and the reference models are largely attributable to the factors described above and that our modelling approach is valid for capturing NO₂ variation globally and regionally.
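Because the Western European reference MAE is reported in μg/m³ while ours is in ppb, converting between the two units makes the comparison explicit. The sketch below uses the ideal-gas relation at 25 °C and 1 atm; the temperature and pressure underlying the reference model's units are not stated here, so these conditions are an assumption, and other reference conditions shift the result slightly. Under this assumption, 8.8 μg/m³ corresponds to roughly 4.7 ppb.

```python
NO2_MOLAR_MASS = 46.0055  # g/mol
MOLAR_VOLUME_25C = 24.45  # L/mol, ideal gas at 25 °C and 1 atm (assumed conditions)


def ugm3_to_ppb(conc_ugm3: float, molar_volume: float = MOLAR_VOLUME_25C) -> float:
    """Convert an NO2 mass concentration (µg/m³) to a mixing ratio (ppb)."""
    return conc_ugm3 * molar_volume / NO2_MOLAR_MASS


def ppb_to_ugm3(conc_ppb: float, molar_volume: float = MOLAR_VOLUME_25C) -> float:
    """Convert an NO2 mixing ratio (ppb) to a mass concentration (µg/m³)."""
    return conc_ppb * NO2_MOLAR_MASS / molar_volume


if __name__ == "__main__":
    print(round(ugm3_to_ppb(8.8), 2))  # 4.68
```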

Our global NO₂ LUR model has several limitations. One major limitation is the global representativeness of available NO₂ monitoring data. Most data came from North America, Europe, Japan, and China. While we obtained some data from South American and South Asian countries, limited monitoring data were available for Africa. Given the lack of monitoring data in many countries, we chose to include monitors without sufficient documentation of temporal coverage (i.e., 75% hourly coverage throughout the year). Despite this relaxation of air monitor quality control requirements, some air monitor data from several countries, including Ecuador, Russia, India, and China, were removed due to uncertainty in measurement quality. We anticipate that in upcoming years, as monitoring efforts continue and expand globally, it will be easier to enforce stricter selection criteria without sacrificing representation from specific countries. Cross-validation shows that our model is robust to air monitor selection for North America, Europe, Asia, and, to a lesser extent, South America, but not for Africa and Oceania. Additional monitoring data are needed in these regions, particularly in Africa, where rapid urbanization is occurring.²⁴⁻²⁶

A second limitation is that our approach did not include data on vehicle fleet composition, emission standards, or traffic counts. Surprisingly, continental models of residuals did not significantly change the variable selection or coefficients. Future modelling could apply country-specific adjustments for the small number of countries with sufficient monitoring coverage. Third, we chose not to include an adjustment factor for multiple years because NO₂ is decreasing in some regions of the world and increasing in others. We also chose to include monitor measurements from more recent years (up through 2015), as limiting the monitor dataset to the temporal coverage of the satellite-based NO₂ surface estimates (up through 2011) would have excluded most of the air monitor measurements we collected from developing countries. We anticipate that this limitation can be addressed by adding a time series component, along with concomitant updates to satellite-based surface estimates and multiple years of measurements in developing countries, which could significantly improve global estimates. Updated satellite estimates, combined with several more years of global air monitor measurements, would allow for spatiotemporal modeling similar to the NO₂ LUR model developed for the continental United States by Bechle et al.¹³

The regional MABs in our bootstrap analysis range from 31.8% to 74.3%. Documenting regional differences in error and bias allows these differences in model performance to be considered when making inter-regional comparisons of air quality, exposure, and related burdens. Regional models with better performance are better suited than the global model presented here for studies whose study area lies within the regional model extent.

In conclusion, we have created and demonstrated the robustness of the first global NO₂ LUR model, which captures the important fine-scale spatial variability of NO₂ air pollution. Globally, the model explains 54% of annual NO₂ variation (Adj R2 = 0.54), with continental R2 ranging from 0.42 to 0.67. Additional air monitor coverage in Africa, Oceania, and, to a lesser extent, South America will increase confidence in model predictions for these sparsely covered regions. The NO₂ LUR model developed here can be used to estimate the magnitude and spatial distribution of global NO₂ exposures and the resulting health burden, and can be applied in health studies in countries without extensive monitoring networks. We are currently running this model globally at 100 m resolution and assigning the predictions to population locations to estimate global exposure to NO₂ and the resulting health burden. Once completed, we will make the global NO₂ estimates available at http://health.oregonstate.edu/labs/spatial-health.

Supplementary Material

Supplement

Figure S1. Method for calculating length of roads upwind from air monitor locations.

Figure S2. Comparison of developed models.

Figure S3. Distribution of mean NO₂ by continental region.

Figure S4. Predicted vs. observed NO₂ concentrations, by continental region.

Figure S5. Model residuals by continental region.

Figure S6. Percent reduction in R2 for each model input variable both globally and by continental region.

Figure S7. Variable contributions towards NO₂ predictions for New York City, USA and Delhi, India transects.

Figure S8. Boundaries used to define continental regions.

Table S1. Predictor variable data sources and characteristics.

Table S2. Model performance in bootstrap 10% cross-validation (n=10,000).

Table S3. Performance of residual models.

Supplemental Text. Additional model sensitivity analyses.

Supplemental Excel File. List of air monitor data sources and corresponding URLs.

NIHMS892820-supplement-Supplement.docx (3.2 MB, docx)

Acknowledgments

The authors are grateful to Brittany Heller for collecting many of the NO₂ air monitor datasets. The authors would also like to acknowledge regulatory agencies across the globe for providing publicly available air monitor measurements and quality control data. This research was supported by the Office of the Director, National Institutes of Health under Award Number DP5OD019850. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work of Y. Liu and Q. Xiao was partially supported by the NASA Applied Sciences Program (Grants NNX11AI53G and NNX16AQ28G; PI: Liu).

References

1. Forouzanfar MH, Afshin A, Alexander LT, Anderson HR, Bhutta ZA, Biryukov S, Brauer M, Burnett R, Cercy K, Charlson FJ, et al. Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2015. Lancet. 2016;388:1659–1674. doi: 10.1016/S0140-6736(16)31679-8.
2. Brauer M, Freedman G, Frostad J, Van Donkelaar A, Martin RV, Dentener F, van Dingenen R, Estep K, Amini H, Apte JS, et al. Ambient air pollution exposure estimation for the global burden of disease 2013. Environ Sci Technol. 2015;50(1):79–88. doi: 10.1021/acs.est.5b03709.
3. Wellenius G, Schwartz J, Mittleman M. Health and the environment: addressing the health impact of air pollution. Sixty-Eighth World Health Assem Agenda Item. 14:A68.
4. Beckerman B, Jerrett M, Brook JR, Verma DK, Arian MA, Finkelstein MM. Correlation of nitrogen dioxide with other traffic pollutants near a major expressway. Atmos Environ. 2008;42(2):275–290.
5. Khreis H, Kelly C, Tate J, Parslow R, Lucas K, Nieuwenhuijsen M. Exposure to traffic-related air pollution and risk of development of childhood asthma: A systematic review and meta-analysis. Environ Int. 2017;100:1–31. doi: 10.1016/j.envint.2016.11.012.
6. Gehring U, Gruzieva O, Agius RM, Beelen R, Custovic A, Cyrys J, Eeftens M, Flexeder C, Fuertes E, Heinrich J, et al. Air pollution exposure and lung function in children: the ESCAPE project. Environ Health Perspect Online. 2013;121(11–12):1357. doi: 10.1289/ehp.1306770.
7. Rice MB, Ljungman PL, Wilker EH, Gold DR, Schwartz JD, Koutrakis P, Washko GR, O’Connor GT, Mittleman MA. Short-term exposure to air pollution and lung function in the Framingham Heart Study. Am J Respir Crit Care Med. 2013;188(11):1351–1357. doi: 10.1164/rccm.201308-1414OC.
8. Hamra GB, Laden F, Cohen AJ, Raaschou-Nielsen O, Brauer M, Loomis D. Environ Health Perspect. 2015;123(11):1107–1112. doi: 10.1289/ehp.1408882.
9. Karner AA, Eisinger DS, Niemeier DA. Near-roadway air quality: synthesizing the findings from real-world data. Environ Sci Technol. 2010;44(14):5334–5344. doi: 10.1021/es100008x.
10. Crouse DL, Peters PA, Villeneuve PJ, Proux MO, Shin HH, Goldberg MS, Johnson M, Wheeler AJ, Allen RW, Atari DO, Jerrett M, Brauer M, Brook JR, Cakmak S, Burnett RT. Within- and between-city contrasts in nitrogen dioxide and mortality in 10 Canadian cities; a subset of the Canadian Census Health and Environment Cohort (CanCHEC). J Exp Sci Environ Epidemiol. 2015;25(5):482–489. doi: 10.1038/jes.2014.89.
11. Novotny EV, Bechle MJ, Millet DB, Marshall JD. National satellite-based land-use regression: NO2 in the United States. Environ Sci Technol. 2011;45(10):4407–4414. doi: 10.1021/es103578x.
12. Young MT, Bechle MJ, Sampson PD, Szpiro AA, Marshall JD, Sheppard L, Kaufman JD. Satellite-based NO2 and model validation in a national prediction model based on universal Kriging and land-use regression. Environ Sci Technol. 2016;50(7):3686–3694. doi: 10.1021/acs.est.5b05099.
13. Bechle MJ, Millet DB, Marshall JD. National Spatiotemporal Exposure Surface for NO2: Monthly Scaling of a Satellite-Derived Land-Use Regression, 2000–2010. Environ Sci Technol. 2015;49(20):12297–12305. doi: 10.1021/acs.est.5b02882.
14. Hystad P, Setton E, Cervantes A, Poplawski K, Deschenes S, Brauer M, van Donkelaar A, Lamsal L, Martin R, Jerrett M, et al. Creating national air pollution models for population exposure assessment in Canada. Environ Health Perspect. 2011;119(8):1123. doi: 10.1289/ehp.1002976.
15. Vienneau D, de Hoogh K, Bechle MJ, Beelen R, van Donkelaar A, Martin RV, Millet DB, Hoek G, Marshall JD. Western European land use regression incorporating satellite- and ground-based measurements of NO2 and PM10. Environ Sci Technol. 2013;47(23):13555–13564. doi: 10.1021/es403089q.
16. Knibbs LD, Hewson MG, Bechle MJ, Marshall JD, Barnett AG. A national satellite-based land-use regression model for air pollution exposure assessment in Australia. Environ Res. 2014;135:204–211. doi: 10.1016/j.envres.2014.09.011.
17. Geddes JA, Martin RV, Boys BL, van Donkelaar A. Long-term trends worldwide in ambient NO2 concentrations inferred from satellite observations. Environ Health Perspect Online. 2016;124(3):281. doi: 10.1289/ehp.1409567.
18. van Rossum G, Drake FL Jr. Extending and embedding Python, Release 2.7. Python Softw Found Wolfeboro Falls. 2010.
19. ArcGIS 10.3.1. Environmental Systems Research Institute, Inc; Redlands: 2015.
20. Saha S, Moorthi S, Pan H-L, Wu X, Wang J, Nadiga S, Tripp P, Kistler R, Woollen J, Behringer D, et al. The NCEP climate forecast system reanalysis. Bull Am Meteorol Soc. 2010;91(8):1015–1057.
21. Moore RT, Hansen MC. Google Earth Engine: a new cloud-computing platform for global-scale earth observation data and analysis. AGU Fall Meeting Abstracts; 2011; p. 2.
22. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1.
23. Studio R. RStudio: integrated development environment for R. RStudio Inc; Boston Mass: 2012.
24. de Hoogh K, Gulliver J, van Donkelaar A, Martin RV, Marshall JD, Bechle MJ, Cesaroni G, Pradas MC, Dedele A, Eeftens M, et al. Development of West-European PM2.5 and NO2 land use regression models incorporating satellite-derived and chemical transport modelling data. Environ Res. 2016;151:1–10. doi: 10.1016/j.envres.2016.07.005.
25. Henderson JV, Storeygard A, Deichmann U. Has climate change driven urbanization in Africa? J Dev Econ. 2017;124:60–82. doi: 10.1016/j.jdeveco.2016.09.001.
26. Parnell S, Walawege R. Sub-Saharan African urbanisation and global environmental change. Glob Environ Change. 2011;21:S12–S20.
27. Gerland P, Raftery AE, Ševčíková H, Li N, Gu D, Spoorenberg T, Alkema L, Fosdick BK, Chunn J, Lalic N, et al. World population stabilization unlikely this century. Science. 2014;346(6206):234–237. doi: 10.1126/science.1257469.
