Author manuscript; available in PMC: 2016 Dec 1.
Published in final edited form as: J Expo Sci Environ Epidemiol. 2015 Jun 17;26(4):377–384. doi: 10.1038/jes.2015.41

Spatiotemporal prediction of fine particulate matter using high resolution satellite images in the southeastern U.S 2003–2011

Mihye Lee 1, Itai Kloog 2, Alexandra Chudnovsky 3, Alexei Lyapustin 4, Yujie Wang 5, Steven Melly 6, Brent Coull 7, Petros Koutrakis 1, Joel Schwartz 1
PMCID: PMC4760903  NIHMSID: NIHMS754654  PMID: 26082149

Abstract

Numerous studies have demonstrated that fine particulate matter (PM2.5, particles smaller than 2.5 μm in aerodynamic diameter) is associated with adverse health outcomes. However, using PM2.5 measurements from ground monitoring stations to assess personal exposure induces measurement error. Land use regression provides spatially resolved predictions, but land use terms do not vary temporally. Meanwhile, the advent of satellite-retrieved aerosol optical depth (AOD) products has made it possible to predict the spatial and temporal patterns of PM2.5 exposures.

In this paper, we used AOD data together with other PM2.5 predictors, such as meteorological variables, land use regression, and spatial smoothing, to predict daily concentrations of PM2.5 at a 1 km2 resolution across the southeastern United States, including the seven states of Georgia, North Carolina, South Carolina, Alabama, Tennessee, Mississippi, and Florida, for the years 2003 through 2011. We divided the study area into 3 regions and applied separate mixed-effect models to calibrate AOD using ground PM2.5 measurements and other spatiotemporal predictors.

Using 10-fold cross-validation, we obtained out-of-sample R2 values of 0.77, 0.81, and 0.70, with root mean squared prediction errors (RMSPE) of 2.89, 2.51, and 2.82 μg/m3, for regions 1, 2, and 3, respectively. The slopes of the relationships between predicted PM2.5 and held-out measurements were approximately 1, indicating no bias between the observed and modeled PM2.5 concentrations.

Predictions can be used in epidemiological studies investigating the effects of both acute and chronic exposures to PM2.5. Our model results will also extend the existing studies on PM2.5 which have mostly focused on urban areas due to the paucity of monitors in rural areas.

INTRODUCTION

Since the Six Cities study1, which showed a strong linear relationship between PM2.5 and mortality across cities that differed in pollution level, a body of literature has reported effects of PM2.5 on mortality and morbidity2–4. In many of those studies, PM2.5 exposures were assessed using concentration data obtained at a central monitoring site located in a jurisdiction or within a specified distance. However, this approach introduces information bias, which attenuates the estimated magnitude of air pollution effects or increases the variance of the estimates5–7. Many studies have attempted to address this issue by producing PM2.5 concentrations for locations distant from the monitors8–10. These include predicting PM2.5 levels using regression models based on geographic covariates, such as land use regression, or geostatistical interpolation methods such as kriging8, 11, 12. However, predictions from a land use regression are limited to long-term exposures for studies of chronic health effects, since the geographic covariates are mostly not time varying13. Moreover, if the amount of pollution due to a geographic predictor, e.g. traffic density, changes over time because of control technology, this is not easily incorporated into land use regression. Geostatistical methods also have limitations because of the low density of monitoring stations, which renders the results unreliable, especially in rural areas. Meanwhile, the aerosol optical depth (AOD) values from the Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite instrument provide daily measurements for the entire earth. AOD is a measure of particles in a column of air and is related to PM2.5 concentrations14. With the advent of a new processing algorithm called Multi-Angle Implementation of Atmospheric Correction (MAIAC)15, the spatial resolution of AOD has improved from 10×10 km to 1×1 km.
Since the relationship between the AOD measurement and PM2.5 is affected by various factors that vary daily, such as the optical properties of particulates, mixing height, and humidity, we used a mixed-effect model with daily random slopes for daily calibration rather than a general regression. This provides better predictive performance than in other studies that used satellite imagery for PM2.5 prediction without daily calibration16.

In this paper, we used AOD satellite data and predictors such as meteorological variables, land use regression, and spatial smoothing to predict the daily concentration of PM2.5 at a 1 km2 resolution across the southeastern United States, including the seven states of Georgia, North Carolina, South Carolina, Alabama, Tennessee, Mississippi, and Florida, for the years 2003 through 2011.

DATA

Ground particulate matter measurements

We obtained PM2.5 mass concentration data from Federal Reference Method (FRM) monitors operated by the U.S. Environmental Protection Agency (EPA) and monitors with a Teflon filter in the Interagency Monitoring of Protected Visual Environments (IMPROVE) program for a total of 257 monitoring sites.

Aerosol optical depth data

The MAIAC data were obtained from the National Aeronautics and Space Administration (NASA) at a resolution of 1 km2. AOD data were delivered in tiles, the spatial unit of MODIS imagery, each covering an area of 10×10 degrees at the equator. Our study used tiles h00v03, h01v02, h01v03, h01v04, h02v02, and h02v03. The data include the latitude and longitude in the WGS84 coordinate system, the corresponding AOD values, and a quality flag. We deleted AOD values higher than 1.5, which likely reflect cloud contamination, and AOD values over water bodies, since water reflects light and affects the reliability of AOD readings. Each PM2.5 measurement was assigned the AOD value closest in distance within a 1 km buffer.
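The screening and nearest-pixel assignment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (the paper used MATLAB and ArcGIS for this step). The function names, the tuple layout of a pixel, the spherical-Earth radius, and the example coordinates are assumptions made for the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (spherical approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_aod(monitor, pixels, max_km=1.0, max_aod=1.5):
    """Return the AOD of the closest usable pixel within max_km of the monitor, or None.

    monitor: (lat, lon)
    pixels:  iterable of (lat, lon, aod, over_water) tuples (hypothetical layout)
    Pixels over water or with AOD above max_aod are dropped, as in the text.
    """
    best = None  # (distance_km, aod) of the closest usable pixel so far
    for lat, lon, aod, over_water in pixels:
        if over_water or aod > max_aod:
            continue
        d = haversine_km(monitor[0], monitor[1], lat, lon)
        if d <= max_km and (best is None or d < best[0]):
            best = (d, aod)
    return None if best is None else best[1]


# Hypothetical monitor and same-day AOD pixels
monitor = (33.750, -84.390)
pixels = [
    (33.752, -84.390, 0.20, False),  # about 0.22 km away, usable
    (33.751, -84.390, 1.80, False),  # closer, but AOD > 1.5 is screened out
    (33.7505, -84.390, 0.30, True),  # over water, screened out
    (33.755, -84.390, 0.40, False),  # usable but farther away
    (34.000, -84.390, 0.10, False),  # outside the 1 km buffer
]
matched = nearest_aod(monitor, pixels)
```

A monitor with no usable pixel inside the buffer gets `None`, which corresponds to a day with missing AOD for that site.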

To compare the new MAIAC data at a 1 × 1 km resolution with the existing data at a 10 × 10 km resolution, we used the existing AOD data that we had retained. For the years 2000–2010, MODIS level 2 files from the Earth Observing System (EOS) Terra satellite were used to extract AOD values at a 10 km × 10 km resolution.

Meteorological data

We downloaded weather data from the National Climatic Data Center (NCDC, 2010) website. Weather variables include temperature, relative humidity, wind speed, visibility, and sea level pressure in the form of daily means. A total of 144 weather stations were used, and we assigned to each location the readings from the closest weather station on a specific date.

Normalized difference vegetation index

NASA provides normalized difference vegetation index (NDVI) data from the MODIS sensor. We aggregated NDVI measurements to a 1 km grid and a one month average. Specifically we used the Terra satellite product ID of MOD13A3.

Height of Planetary boundary layer

We obtained the daily height of planetary boundary layer (PBL) from the National Oceanic and Atmospheric Administration (NOAA) Reanalysis Data. The pixel resolution of PBL data was 32×32 km on a daily basis. To represent the daily PBL height, the 24-hr mean was used.

Land use variables

Emissions of PM2.5, PM10, and NOx from point sources, and county-level area emissions, were downloaded from the National Emission Inventory (NEI) data for 2005 from the website of the Environmental Protection Agency (EPA 2005 NEI). To produce the percentage of urban land for each satellite grid cell at 1 km2 resolution, we used the National Land Cover Database for 2011 (NLCD 2011) at 30 meter resolution17. We reclassified land cover codes 22 (Developed, Low Intensity), 23 (Developed, Medium Intensity), and 24 (Developed, High Intensity) to 1 as urban cells and assigned 0 to the rest of the codes. The mean of the binary values was calculated for each 1 km grid cell. For the location of geographical predictors such as roads, major buildings, ports, airports, and water bodies, spatial data from ESRI Data & Maps 2004 were used (ArcGIS® and ArcMap by Esri, Copyright © Esri).
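The urban-fraction calculation above amounts to a mean of 0/1 flags over the 30 m NLCD cells falling inside one 1 km grid cell. A minimal sketch follows; the function name and the toy input are assumptions, and how the 30 m cells were assigned to each 1 km cell is not specified in the text.

```python
URBAN_CODES = {22, 23, 24}  # NLCD developed low, medium, and high intensity


def urban_fraction(nlcd_block):
    """Mean of the 0/1 urban indicator over the NLCD cells of one 1 km grid cell.

    nlcd_block: list of rows of NLCD land cover codes (hypothetical layout).
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    flags = [1 if code in URBAN_CODES else 0 for row in nlcd_block for code in row]
    if not flags:
        raise ValueError("empty NLCD block")
    return sum(flags) / len(flags)


# Toy 2 x 2 block: two developed cells, one forest (41), one open water (11)
example = urban_fraction([[22, 41], [24, 11]])
```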

METHOD

Data preparation

For each day, we assigned to each PM monitor the closest AOD reading within a 1 km buffer. We confined our analysis to PM2.5 values less than 80 μg/m3 to eliminate influential outliers (25 observations among the total of 260,476 PM2.5 measurements over 9 years). We also restricted our analysis to cells with a population of at least 10, since the southeastern U.S. includes sparsely populated areas. Observations with AOD > 0.5 and PM2.5 < 10 μg/m3 were removed because such AOD values are likely due to cloud contamination. Observations with AOD < 0.15 and PM2.5 > 25 μg/m3 were removed because it is likely that on those days a low PBL moved particles closer to ground level, weakening the relationship between AOD and ground-level PM2.5 measurements.
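The screening rules above can be written as a single predicate. This is a sketch only; the function and argument names are assumptions, and the thresholds are the ones stated in the text.

```python
def keep_observation(pm25, aod, population):
    """Return True if a matched PM2.5/AOD record passes the screening rules in the text.

    pm25:       PM2.5 in ug/m3
    aod:        matched AOD (dimensionless)
    population: population of the grid cell
    """
    if pm25 >= 80:                 # analysis confined to PM2.5 less than 80 ug/m3
        return False
    if population < 10:            # cells with fewer than 10 people are excluded
        return False
    if aod > 0.5 and pm25 < 10:    # high AOD with low PM2.5: likely cloud contamination
        return False
    if aod < 0.15 and pm25 > 25:   # low AOD with high PM2.5: likely low-PBL days
        return False
    return True


# Hypothetical records: (pm25, aod, population)
records = [(12.0, 0.20, 500), (8.0, 0.60, 500), (30.0, 0.10, 500), (85.0, 0.30, 500), (12.0, 0.20, 5)]
kept = [r for r in records if keep_observation(*r)]
```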

The aim of our model is high-performance prediction, not inference about the association between an exposure and an outcome as in epidemiological studies. Hence, our strategy was to eliminate observations with residuals over 10 μg/m3, as these were too likely to distort our predictions for most observations, and to choose a model by maximizing the cross-validated (CV) R2. AOD values are not missing at random (for example, more are missing in the winter), which can distort the predictions. Thus, we used inverse probability weighting to account for this selection bias. Finally, the calibration between AOD and PM2.5 can vary spatially and daily. The daily variation is due to changes in particle size distribution, color, and vertical profile, and we addressed it by daily calibration and by including PBL data, using mixed-effect models with random intercepts and slopes for day. To account for spatial differences in these daily slopes, we nested them within sub-regions, and to account for more permanent differences between locations, we included land use terms in our model. Specifically, we fitted the following model:

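$$E(\mathrm{PM}_{2.5,ij}) = (\beta_0 + b_{0j} + b_{0jk}) + (\beta_1 + b_{1j} + b_{2jk})\,\mathrm{AOD}_{ij} + (\beta_2 + b_{2j})\,\mathrm{temp}_{ij} + \sum_{m=1}^{7}\beta_{1m}X_{1mij} + \sum_{n=1}^{15}\beta_{2n}X_{2ni} + \beta_{25}\,\mathrm{AOD}_{ij}\times\mathrm{PBL}_{ij} \tag{1}$$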

where PM2.5ij is the PM2.5 measurement at monitoring site i on day j. β0 is the fixed-effect intercept (the population intercept) and b0j is the overall random intercept, which varies by day. b0jk is the random intercept for day nested in each sub-region. Similarly, β1 is the fixed-effect slope of AOD, b1j is the overall random slope of AOD for the day, and b2jk is the random slope for each day nested in each sub-region. AODij is the AOD measurement within 1 km of monitoring site i on day j. β2 and b2j represent the fixed-effect and random-effect slopes of temperature, respectively. tempij is the temperature measured by the weather monitor closest to site i on day j. β1m are the fixed-effect slopes of the spatiotemporal variables. X1mij is the matrix of the mth spatiotemporal covariate at site i on day j other than temperature and consists of 7 variables: dew point temperature, sea level pressure, visibility, wind speed, absolute humidity, NDVI in the corresponding month, and PBL. β2n are the fixed-effect slopes of the spatial variables. X2ni is the matrix of 15 spatial covariates for the ith site, which includes the percent urbanicity, elevation, the density of major roads, population within 10 km diameter, PM2.5 emissions at county level, PM2.5 emissions from point sources, PM10 emissions from point sources, NOx emissions from point sources, canopy surface in 2001, distance to the closest A1 road, distance to the closest airport, distance to the closest port, distance to the closest railroad, distance to the closest road, and distance to the closest major building. Observations with residuals over 10 μg/m3 were revisited, and we assessed their validity by comparing them with PM2.5 readings from the surrounding monitors and from the previous and following days. If we determined them to be erroneous, we assigned the readings from the closest monitoring station within 15 km.

Model

Because of the vast study area, a single model was not able to achieve the best prediction performance. The southeastern U.S. consists of areas with different topography, climate (tropical in Florida), and geographic features such as swamps and forests. Therefore, we split the study area into three regions, fitted a separate model for each region, and implemented nested random coefficients for sub-regions within each region (Figure 1). Region 1 consists of Tennessee and parts of Mississippi, Alabama, and Georgia. Region 2 covers North Carolina and parts of South Carolina and Georgia. Lastly, region 3 covers Florida and parts of Mississippi, Alabama, Georgia, and South Carolina.

Figure 1. Study area and the locations of PM2.5 monitoring stations

AOD cannot always be retrieved because of factors such as cloud or snow cover. We hypothesized that cloud formation and snow cover are affected by weather conditions, including temperature, wind speed, and sea level pressure, as well as by elevation and season. Therefore, to adjust for the non-random missingness of AOD, we modeled inverse probability weights (IPW) and applied them to the first stage models. Specifically, we fitted the following logistic model for the missingness of AOD measurements.

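$$E(\operatorname{logit}(p)) = \beta_0 + \beta_1\,\mathrm{temp}_{ij} + \beta_2\,\mathrm{WS}_{ij} + \beta_3\,\mathrm{SLP}_{ij} + \beta_4\,\mathrm{elev}_{i} + \beta_5\,\mathrm{mon}_{j}, \tag{2}$$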

where tempij is the temperature of cell i on day j, WSij is the wind speed of cell i on day j, SLPij is the sea level pressure of cell i on day j, elevi is the elevation of cell i, and monj is the month in which day j falls.

Using the predicted probability p of the outcome (missing or not), we computed the inverse probability as 1/p. Next, we normalized the IPW values by dividing them by their mean. These were applied to the subsequent models as weights.
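The weighting step can be illustrated with a short sketch. It assumes that the probabilities have already been obtained from the logistic model in Equation 2 and that each p is the fitted probability of the outcome actually observed for that record; the function name and the example probabilities are hypothetical.

```python
def normalized_ipw(probabilities):
    """Inverse probability weights divided by their mean, so the weights average to 1."""
    if any(p <= 0 or p > 1 for p in probabilities):
        raise ValueError("probabilities must lie in (0, 1]")
    raw = [1.0 / p for p in probabilities]   # inverse probability, 1/p
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]       # normalization by the mean


# Hypothetical fitted probabilities for four records
weights = normalized_ipw([0.5, 0.25, 0.5, 0.25])
```

Records that the missingness model considers unlikely receive larger weights, and normalizing by the mean keeps the total weight equal to the number of records.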

Each of the models corresponding to the three regions was evaluated using 10-fold cross-validation to avoid over-fitting. Whereas other similar studies performed record-based cross-validation, we conducted site-based cross-validation, since we believed that cross-validation by monitoring station more appropriately assesses the capability of the models to predict spatial variability. First, we made a randomly ordered list of monitoring stations in each region. The station list was then split into 10 subsets. In turn, 90% of the monitoring stations were used to fit the model and the remaining 10% of stations were used to test the model performance. This procedure was repeated 10 times for each region, so that each subset was held out once. The site-based 10-fold cross-validated R2, rather than the in-sample R2, was used for finalizing the models as well as for assessing model performance and avoiding over-fitting. As a result, we selected the region-specific models given in Equations 3 to 5 on the basis of the highest R2 from the 10-fold cross-validation.
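The site-based splitting described above can be sketched as follows. The function names, the record layout, and the fixed random seed are assumptions made for illustration; the model fitting itself is omitted.

```python
import random


def site_folds(site_ids, n_folds=10, seed=0):
    """Shuffle the unique monitoring sites and deal them into n_folds subsets."""
    sites = sorted(set(site_ids))
    rng = random.Random(seed)       # fixed seed only so the sketch is reproducible
    rng.shuffle(sites)              # randomly ordered list of stations
    return [sites[k::n_folds] for k in range(n_folds)]


def split_records(records, held_out_sites):
    """Split records so that every record of a held-out site goes to the test set."""
    held = set(held_out_sites)
    train = [r for r in records if r["site"] not in held]
    test = [r for r in records if r["site"] in held]
    return train, test


# Hypothetical data: 25 sites with 3 daily records each
sites = ["s%02d" % i for i in range(25)]
records = [{"site": s, "pm25": 10.0 + d} for s in sites for d in range(3)]

folds = site_folds(sites)
for held_out in folds:
    train, test = split_records(records, held_out)
    # fit the calibration model on `train` and predict for `test` here
```

Because whole stations are held out, no station contributes to both fitting and testing within a fold, which is what distinguishes this from record-based cross-validation.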

In region 1, we fitted the following model for each year with the IPW:

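$$E(\mathrm{PM}_{2.5,ij}) = (\beta_0 + b_{0j} + b_{0jk}) + (\beta_1 + b_{1j} + b_{1jk})\,\mathrm{AOD}_{ij} + \beta_2\,\mathrm{temp}_{ij} + \beta_3\,\mathrm{dewp}_{ij} + \beta_4\,\mathrm{slp}_{ij} + \beta_5\,\mathrm{wdsp}_{ij} + \beta_6\,\mathrm{visib}_{ij} + \beta_7\,\mathrm{ah}_{ij} + \beta_8\,\mathrm{NDVI} + \beta_9\,\mathrm{elev}_{i} + \beta_{10}\,\mathrm{pbl}_{ij} + \beta_{11}\,\mathrm{urb}_{i} + \beta_{12}\,\mathrm{emission}_{i} + \beta_{13}\,\mathrm{PM10}_{i} + \beta_{14}\,\mathrm{NOX}_{i} \tag{3}$$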

where PM2.5ij is the PM2.5 measurement at monitoring site i on day j. β0 denotes the fixed-effect intercept (population intercept) and b0j is the random intercept, which varies from one day to another. b0jk is the random intercept for day nested in each sub-region. Similarly, β1 is the fixed-effect slope of AOD, b1j is the random slope of AOD for each day, and b1jk is the random slope for each day nested in each sub-region. AOD is the AOD measurement within 1 km of monitoring site i on day j. temp, dewp, slp, wdsp, and visib are, respectively, the temperature, the dew point, the sea level pressure in millibars, the wind speed in knots, and the visibility in miles, each measured by the weather monitor closest to site i on day j. ah is the absolute humidity and NDVI is the NDVI in the corresponding month. elev is the elevation of site i. pbl is the height of the planetary boundary layer at site i on day j. urb is the percentage of urbanicity at site i. emission, PM10, and NOX are the annual emissions in tons of PM2.5, PM10, and NOx, respectively, from the closest point source, such as an industrial factory.

In region 2, we fitted the following model for each year with the IPW:

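$$E(\mathrm{PM}_{2.5,ij}) = (\beta_0 + b_{0j} + b_{0jk}) + (\beta_1 + b_{1j} + b_{1jk})\,\mathrm{AOD}_{ij} + \beta_2\,\mathrm{temp}_{ij} + \beta_3\,\mathrm{dewp}_{ij} + \beta_4\,\mathrm{slp}_{ij} + \beta_5\,\mathrm{wdsp}_{ij} + \beta_6\,\mathrm{visib}_{ij} + \beta_7\,\mathrm{ah}_{ij} + \beta_8\,\mathrm{NDVI} + \beta_9\,\mathrm{elev}_{i} + \beta_{10}\,\mathrm{pbl}_{ij} + \beta_{11}\,\mathrm{urb}_{i} + \beta_{12}\,\mathrm{emission}_{i} \tag{4}$$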

For the third region, we fitted the following model for each year with the IPW:

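$$E(\mathrm{PM}_{2.5,ij}) = (\beta_0 + b_{0j} + b_{0jk}) + (\beta_1 + b_{1j} + b_{1jk})\,\mathrm{AOD}_{ij} + \beta_2\,\mathrm{temp}_{ij} + \beta_3\,\mathrm{dewp}_{ij} + \beta_4\,\mathrm{slp}_{ij} + \beta_5\,\mathrm{wdsp}_{ij} + \beta_6\,\mathrm{visib}_{ij} + \beta_7\,\mathrm{ah}_{ij} \tag{5}$$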

Besides the overall R2 from the 10-fold cross-validation, we estimated a spatial R2 by regressing the annual mean of observed PM2.5 at each site against the annual mean of predicted PM2.5 at the same site. To assess the precision of the predictions, the root mean squared prediction error (RMSPE) was computed as the square root of the mean of the squared prediction residuals. A temporal R2 was calculated by regressing the difference between the observed PM2.5 on a specific day and the annual mean for that site against the equivalent difference for the predicted values from the model.
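The three performance measures above can be sketched in a few lines. The sketch assumes that R2 is taken from a simple linear regression with an intercept, in which case it equals the squared Pearson correlation; the function names and the (site, observed, predicted) record layout are hypothetical.

```python
import math
from collections import defaultdict


def r_squared(obs, pred):
    """R2 of a simple linear regression of obs on pred (squared Pearson correlation)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    return sxy * sxy / (sxx * syy)


def rmspe(obs, pred):
    """Square root of the mean squared prediction residual."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def spatial_temporal_r2(records):
    """records: iterable of (site, observed, predicted) for one year.

    Spatial R2 compares per-site annual means; temporal R2 compares the
    daily deviations from each site's own annual mean.
    """
    by_site = defaultdict(list)
    for site, o, p in records:
        by_site[site].append((o, p))
    mean_obs, mean_pred, dev_obs, dev_pred = [], [], [], []
    for pairs in by_site.values():
        mo = sum(o for o, _ in pairs) / len(pairs)
        mp = sum(p for _, p in pairs) / len(pairs)
        mean_obs.append(mo)
        mean_pred.append(mp)
        dev_obs.extend(o - mo for o, _ in pairs)
        dev_pred.extend(p - mp for _, p in pairs)
    return r_squared(mean_obs, mean_pred), r_squared(dev_obs, dev_pred)


# Hypothetical held-out records in which predictions match observations exactly
perfect = [("a", 10.0, 10.0), ("a", 12.0, 12.0), ("b", 20.0, 20.0), ("b", 24.0, 24.0)]
spatial_r2, temporal_r2 = spatial_temporal_r2(perfect)
```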

Once we had finalized the calibration models for the three regions as above, we predicted PM2.5 levels for grid cells and days with available AOD, using the fitted coefficients for AOD and the other temporal and spatial variables.

For the areas and days with AOD missing, we interpolated those cells using the surrounding cells that had AOD values and thus had predictions in the second stage. Specifically, we applied the following model with the IPW.

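$$(\mathrm{PredPM}_{2.5,ij}) = (\beta_0 + b_{0j} + b_{0jk}) + s(\mathrm{lat}_i, \mathrm{long}_i) + (\beta_1 + b_{1ik})\,\mathrm{MPM}_{ij} + \beta_2\,\mathrm{bimon}_{ij} + \beta_3\,\mathrm{pbl}_{ij} + \beta_4\,\mathrm{ah\_gm3}_{ij} + \beta_5\,\mathrm{elev}_{i} + \beta_6\,\mathrm{MPM}_{ij}\times\mathrm{bimon}_{ij} + \beta_7\,\mathrm{MPM}_{ij}\times\mathrm{pbl}_{ij}, \tag{6}$$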

where PredPM2.5ij is the PM2.5 level predicted in stage 2 at grid cell i on day j. lati and longi are the latitude and longitude coordinates of cell i, respectively, and s() is a thin plate spline smooth function. MPMij is the mean PM2.5 measured at monitoring stations within a 100 km buffer of cell i on day j.

Since the purpose of the analysis of the 10 km data was to compare the performance of the two datasets, we conducted the first stage model only. For a fair comparison, we applied the same procedures as above, with the same model, variables, calibration, and IPW.

As for software, MATLAB 2014b was used to extract the AOD readings from the raw satellite images in HDF format, and ArcGIS Desktop 10.2.2 was used along with Python scripting for data preparation. Models were implemented in R 3.02 and SAS 9.3 (Statistical Analysis System).

RESULTS

A total of 257 monitoring stations were used for the study. Figure 1 shows the study area and the locations of PM2.5 monitors. The study area, outlined by the thick boundary line, covers most of the seven states except for a small area of western Mississippi that falls outside the spatial domain of the AOD tiles. The numbers from 1 to 3 in large bold font indicate the study regions. Region 1 mainly consists of Tennessee and the upper parts of Mississippi, Alabama, and Georgia, and contains 61 monitoring stations (0.0003 monitors/km2). Region 2 includes most of North Carolina and major parts of South Carolina and Georgia, with 88 monitors. Region 2 has the highest density of PM monitoring stations (0.00038 monitors/km2). Region 3 covers the southernmost part, including Florida and the southern parts of Mississippi, Alabama, Georgia, and South Carolina. Although region 3 has the largest number of monitors (108), its vast area means that its PM monitoring stations are the most sparsely distributed among the three regions (0.00026 monitors/km2).

Table 1 shows the descriptive statistics for PM2.5 measurements from monitoring stations and AOD measurements from the MAIAC algorithm in the southeastern U.S. by year from 2003 to 2011. The annual average of PM2.5 decreased from 12.2 μg/m3 in 2003 to 9.8 μg/m3 in 2011, after peaking at 13.1 μg/m3 in 2005. The standard deviation also decreased, from 6.5 to 5.3 μg/m3. The mean AOD readings were on the order of 0.20 (dimensionless) over the 9 years.

Table 1.

Descriptive statistics of PM2.5 (μg/m3) and MAIAC AOD

Year Mean PM (S.D.) Mean AOD (S.D.)
2003 12.2 (6.5) 0.18 (0.18)
2004 12.6 (6.6) 0.18 (0.17)
2005 13.1 (7.3) 0.20 (0.19)
2006 12.6 (6.6) 0.20 (0.19)
2007 12.4 (7.5) 0.21 (0.21)
2008 10.8 (5.6) 0.18 (0.16)
2009 9.4 (4.6) 0.17 (0.15)
2010 10.2 (4.9) 0.17 (0.15)
2011 9.8 (5.3) 0.20 (0.18)

S.D., standard deviation

Figure 2 shows the spatial distribution of PM2.5 concentrations in the study area, represented by the average PM2.5 level at each monitor during the study period (2003–2011). Monitoring stations in big cities such as Atlanta, Nashville, Charlotte, and Birmingham recorded the highest average PM2.5 levels. Monitors at intersections of major highways also showed high PM2.5 levels. Among the seven study states, Florida showed the lowest PM2.5 level.

Figure 2. Spatial distribution of PM2.5 concentrations between 2003 and 2011

Our model showed a highly significant association between PM2.5 and AOD after controlling for other covariates and spatiotemporal predictors. Table 2 presents results from the stage 1 model, in which AOD and the other spatiotemporal predictors were calibrated separately for each year and region. The R2 values are from the 10-fold cross-validation based on sampling monitors rather than individual observations. The predictive power of the models differed by region. Region 2 showed the highest overall R2 of 0.81, with year-to-year variation ranging from 0.78 in 2008 to 0.85 in 2007. Region 3 showed the lowest performance, with an average cross-validated R2 of 0.70 (a minimum of 0.63 in 2011 and a maximum of 0.75 in 2003 and 2005). For region 1, the average cross-validated R2 was 0.77, ranging from 0.65 in 2010 to 0.83 in 2005. The slopes of observed versus modeled PM2.5 were close to 1 for all the regions, suggesting good agreement between the model results and actual measurements and thus low bias. Region 2 exhibited the lowest average root mean squared prediction error (RMSPE) of 2.51 μg/m3, followed by region 3 with 2.82 μg/m3 and region 1 with 2.89 μg/m3. The RMSPE for the spatial component was much lower, at 0.82 μg/m3 in region 2. In general, the models performed better temporally than spatially. The temporal R2 values were higher than the spatial ones except for region 3. The mean temporal R2 was 0.80, 0.82, and 0.69 for regions 1, 2, and 3, respectively. The mean spatial R2 was 0.69, 0.63, and 0.76 for regions 1, 2, and 3, respectively.

Table 2.

Result of site-based 10-fold cross-validation from stage 1 model using 1 km2 data

Year Region R2 (CV) Slope (CV) RMSPE (μg/m3) Spatial R2 Temporal R2 Spatial RMSPE
2003 1 0.72 0.93 3.51 0.50 0.78 1.86
2003 2 0.83 0.98 2.67 0.59 0.84 1.03
2003 3 0.75 1.01 2.62 0.81 0.74 0.93
2004 1 0.79 0.97 2.92 0.94 0.80 1.07
2004 2 0.80 0.99 2.77 0.52 0.81 0.79
2004 3 0.74 0.99 2.83 0.77 0.74 0.86
2005 1 0.83 0.99 3.23 0.86 0.84 1.12
2005 2 0.80 0.97 3.12 0.81 0.81 0.93
2005 3 0.75 0.99 3.10 0.73 0.75 1.19
2006 1 0.80 0.98 2.99 0.53 0.83 1.26
2006 2 0.84 0.99 2.70 0.70 0.85 0.86
2006 3 0.74 1.00 2.69 0.67 0.75 1.15
2007 1 0.79 0.98 3.19 0.67 0.82 1.34
2007 2 0.85 0.99 2.54 0.59 0.86 0.84
2007 3 0.70 1.02 3.29 0.77 0.69 1.25
2008 1 0.78 0.99 2.71 0.74 0.80 0.99
2008 2 0.78 0.98 2.48 0.60 0.79 0.79
2008 3 0.69 1.00 2.74 0.85 0.65 0.99
2009 1 0.76 0.98 2.30 0.81 0.78 0.83
2009 2 0.78 0.99 2.05 0.81 0.79 0.78
2009 3 0.66 1.02 2.60 0.80 0.64 0.87
2010 1 0.65 0.95 2.80 0.33 0.71 1.33
2010 2 0.80 0.99 2.09 0.46 0.81 0.68
2010 3 0.66 1.00 2.51 0.69 0.66 1.11
2011 1 0.79 0.98 2.40 0.80 0.80 0.86
2011 2 0.78 0.98 2.21 0.55 0.79 0.69
2011 3 0.63 0.99 2.97 0.75 0.61 0.98

Mean 1 0.77 0.97 2.89 0.69 0.80 1.18
Mean 2 0.81 0.99 2.51 0.63 0.82 0.82
Mean 3 0.70 1.00 2.82 0.76 0.69 1.04

The final predictions from the stage 3 model gave very similar results (Table 3). The third column gives the R2 for agreement with the stage 2 predictions (predictions for the grid cells and days for which AOD readings were available), and the last column gives the R2 for the comparison with actual PM2.5 observations. The final predictions showed high predictive power, with mean R2 against observations ranging from 0.86 (region 3) to 0.89 (region 2).

Table 3.

R2 from stage 3 model

Year Region R2 (stage 2 predictions) R2 (observed PM2.5)
2003 1 0.83 0.90
2003 2 0.86 0.91
2003 3 0.61 0.85
2004 1 0.83 0.88
2004 2 0.84 0.90
2004 3 0.64 0.85
2005 1 0.83 0.91
2005 2 0.84 0.90
2005 3 0.65 0.87
2006 1 0.86 0.89
2006 2 0.87 0.91
2006 3 0.59 0.86
2007 1 0.83 0.90
2007 2 0.84 0.91
2007 3 0.62 0.88
2008 1 0.83 0.87
2008 2 0.82 0.88
2008 3 0.65 0.90
2009 1 0.81 0.86
2009 2 0.80 0.86
2009 3 0.61 0.83
2010 1 0.75 0.83
2010 2 0.81 0.89
2010 3 0.60 0.85
2011 1 0.85 0.89
2011 2 0.81 0.88
2011 3 0.61 0.87

Mean 1 0.82 0.88
Mean 2 0.83 0.89
Mean 3 0.62 0.86

To represent the predictions graphically, Figure 3 displays the predicted annual average for 2003, which reveals higher PM2.5 levels along highways and in the main cities. The spatial pattern of the predictions matches well with that of the measured PM2.5 shown in Figure 2. There were no systematic spatial patterns in the residuals during the study period (Figure 4).

Figure 3. Predicted PM2.5 level in 2003

Figure 4. Residual map

Compared with the existing AOD data at a 10 × 10 km resolution, the MAIAC data at a 1 × 1 km resolution showed better performance (Table 4). Except for a slight decrease in the mean 10-fold cross-validated R2 for Region 1, from 0.78 to 0.77, the MAIAC data showed higher R2 values. In particular, the performance in Region 3 improved markedly, from 0.62 to 0.70. The new data also had lower errors than the existing data: RMSPE values decreased from 3.27 to 2.89 μg/m3 for Region 1, from 2.90 to 2.51 μg/m3 for Region 2, and from 3.64 to 2.82 μg/m3 for Region 3. Other indicators such as the spatial R2 and temporal R2 also improved when using the 1 km AOD data.

Table 4.

Result of site-based 10-fold cross-validation from stage 1 model using 10 km2 data

Year Region R2 (CV) Slope (CV) RMSPE (μg/m3) Spatial R2 Temporal R2 Spatial RMSPE
2000 1 0.84 0.99 3.76 0.64 0.85 1.09
2000 2 0.80 0.98 3.40 0.54 0.82 1.42
2000 3 0.72 1.00 4.09 0.64 0.74 1.73
2001 1 0.78 0.98 3.68 0.43 0.80 1.55
2001 2 0.79 0.99 3.18 0.52 0.80 1.14
2001 3 0.65 0.98 3.74 0.57 0.69 1.59
2002 1 0.79 0.97 3.60 0.62 0.80 1.15
2002 2 0.77 0.99 3.14 0.36 0.79 1.14
2002 3 0.63 0.96 3.72 0.58 0.63 1.40
2003 1 0.77 0.99 3.30 0.25 0.79 1.29
2003 2 0.84 0.99 2.79 0.30 0.86 1.03
2003 3 0.60 0.96 3.56 0.52 0.61 1.54
2004 1 0.75 0.97 3.30 0.36 0.77 1.25
2004 2 0.78 0.99 3.15 0.47 0.79 0.93
2004 3 0.69 0.97 3.60 0.63 0.71 1.40
2005 1 0.82 0.98 3.59 0.38 0.84 1.33
2005 2 0.81 0.99 3.20 0.58 0.83 1.06
2005 3 0.68 0.99 3.90 0.62 0.71 1.69
2006 1 0.79 1.00 3.36 0.60 0.80 1.13
2006 2 0.82 0.99 3.03 0.46 0.83 1.08
2006 3 0.62 0.97 3.45 0.55 0.64 1.60
2007 1 0.77 0.97 3.71 0.64 0.78 1.22
2007 2 0.82 0.99 2.88 0.66 0.83 0.73
2007 3 0.59 0.98 4.31 0.57 0.59 1.67
2008 1 0.77 0.98 2.79 0.46 0.79 0.99
2008 2 0.77 0.98 2.60 0.61 0.78 0.86
2008 3 0.58 0.96 3.36 0.59 0.57 1.42
2009 1 0.77 0.99 2.33 0.63 0.78 0.80
2009 2 0.75 0.99 2.15 0.78 0.77 0.76
2009 3 0.49 1.00 3.25 0.64 0.48 1.34
2010 1 0.70 0.99 2.57 0.20 0.73 0.96
2010 2 0.75 1.00 2.38 0.53 0.76 0.77
2010 3 0.55 0.99 3.02 0.68 0.54 1.38

Mean 1 0.78 0.98 3.27 0.47 0.79 1.16
Mean 2 0.79 0.99 2.90 0.53 0.81 0.99
Mean 3 0.62 0.98 3.64 0.60 0.63 1.52

DISCUSSION

In this paper, we predicted PM2.5 levels across the southeastern U.S. at a 1 km resolution using MODIS satellite imagery processed with the newly developed MAIAC algorithm. Compared with the AOD data at a 10 km resolution, the MAIAC data at a 1 km resolution showed better performance. Furthermore, the higher resolution enables more precise PM2.5 exposure assessment at a finer scale, such as the street-level address.

These results will enable epidemiological studies to evaluate the association between PM2.5 and its health effects with reduced exposure measurement error. We also anticipate that studies in the southeastern U.S., which were formerly restricted to urban areas because of the distance to monitoring stations, may extend to rural areas. Considering that PM2.5 measurements are not always made daily, our model fills the temporal gaps using the daily satellite imagery and a smoothing technique, in addition to providing spatial predictions. This approach enables epidemiological studies to examine both acute and chronic effects.

Model performance varied by region. Region 2, mainly covering North Carolina, showed the highest performance (mean cross-validated R2 of 0.81), and region 3, covering the southernmost part, including Florida, had the lowest performance (0.70). One possible explanation is that the spatial density of monitoring stations affects the model. Region 2 has the most monitoring stations relative to its area, whereas region 3 has few monitoring stations for its extensive area. This appeared to affect the results by providing fewer pairs to fit the model. Another explanation may be that region 2 is more urbanized than region 3, with more land use factors that could be taken into account. This is consistent with our experience during the analysis: the calibration model selected on the basis of the highest R2 for region 2 has more land use variables than that for region 3. Lastly, the quality of AOD from the MODIS instrument and the MAIAC algorithm should be considered. Visual analysis (data not shown) by AOD swath revealed that the performance of AOD differed by tile of satellite imagery. Tile h01v02, which covers North Carolina, showed the best performance, whereas tiles around Alabama (h00v03 and h01v03) showed the poorest performance. To improve model performance, AOD products from other algorithms could be incorporated, such as AOD data at 10 km resolution from the Deep Blue algorithm18, which is used for bright surfaces. More studies are needed to determine which factors play a role in the prediction of PM2.5 using satellite imagery and to further improve the performance.

Compared with existing studies of a similar area19–21, our study shows higher R2 and lower errors. After predicting 2003 PM2.5 levels at a 10 km resolution for a similar area21, Hu et al.19 examined the feasibility of the 1 km resolution MAIAC AOD data by comparing them with the 10 km data. In their study, the performance of the MAIAC AOD data was comparable to, but slightly lower than, that of the existing MODIS data. Our study demonstrated that the MAIAC AOD can outperform the existing 10 km data when various modeling approaches are combined with the advantage of the higher resolution. Their study reported an R2 of 0.64 and an RMSPE of 3.93 μg/m3 for the MAIAC data in stage 1. In our model, the lowest R2 in 2003 was 0.72, with an RMSPE of 3.51 μg/m3. Recently, they extended their study period for the same area20 from the single year of 2003 to the multiple years from 2001 to 2010. Our study area covers vast additional areas in the southeastern U.S. by adding Florida, Mississippi, and the complete parts of other states. Adopting different approaches from theirs, our study shows higher R2 values and lower RMSPE: the overall mean 10-fold cross-validated R2 was 0.76 compared with 0.72 in the existing study, and the mean RMSPE was 2.74 compared with 3.72 μg/m3. Considering that our study area includes the southernmost areas such as Florida, which showed markedly lower performance, and that we applied site-based cross-validation rather than observation-based cross-validation, which produces higher R2, the actual improvement is expected to be larger.

In conclusion, we have demonstrated that using satellite imagery and land use variables in a mixed-effect model produces reliable predictions of daily PM2.5 for a large area of the southeastern United States. By incorporating land use terms and spatial smoothing, our models achieved higher cross-validated R2 and lower RMSPE than previous studies of the region. Our model results can therefore be used in epidemiological studies of the effects of PM2.5, allowing both acute and chronic exposures to be assessed and opening the way to new applications. They will also extend existing PM2.5 studies, which have mainly been restricted to urban areas because of the lack of monitors elsewhere, into areas that have not previously been studied, such as rural areas.

Acknowledgments

This publication was made possible by USEPA grant RD 83479801. Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the USEPA. Further, USEPA does not endorse the purchase of any commercial products or services mentioned in the publication.

References

  • 1. Dockery DW, Pope CA, Xu X, et al. An Association between Air Pollution and Mortality in Six U.S. Cities. N Engl J Med. 1993;329(24):1753–1759. doi: 10.1056/NEJM199312093292401.
  • 2. Pope CA 3rd. Epidemiology of fine particulate air pollution and human health: biologic mechanisms and who’s at risk? Environ Health Perspect. 2000;108(Suppl 4):713–723. doi: 10.1289/ehp.108-1637679.
  • 3. Pope CA 3rd, Burnett RT, Thun MJ, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA. 2002;287(9):1132–1141. doi: 10.1001/jama.287.9.1132.
  • 4. Barnett AG, Williams GM, Schwartz J, et al. The effects of air pollution on hospitalizations for cardiovascular disease in elderly people in Australian and New Zealand cities. Environ Health Perspect. 2006;114(7):1018–1023. doi: 10.1289/ehp.8674.
  • 5. Rhomberg LR, Chandalia JK, Long CM, Goodman JE. Measurement error in environmental epidemiology and the shape of exposure-response curves. Crit Rev Toxicol. 2011;41(8):651–671. doi: 10.3109/10408444.2011.563420.
  • 6. Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998;55(10):651–656. doi: 10.1136/oem.55.10.651.
  • 7. Goldman GT, Mulholland JA, Russell AG, et al. Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ Health. 2011;10:61. doi: 10.1186/1476-069X-10-61.
  • 8. Ryan PH, LeMasters GK. A review of land-use regression models for characterizing intraurban air pollution exposure. Inhal Toxicol. 2007;19(Suppl 1):127–133. doi: 10.1080/08958370701495998.
  • 9. de Hoogh K, Wang M, Adam M, et al. Development of Land Use Regression Models for Particle Composition in Twenty Study Areas in Europe. Environ Sci Technol. 2013;47(11):5778–5786. doi: 10.1021/es400156t.
  • 10. Beckerman BS, Jerrett M, Martin RV, van Donkelaar A, Ross Z, Burnett RT. Application of the deletion/substitution/addition algorithm to selecting land use regression models for interpolating air pollution measurements in California. Atmos Environ. 2013;77:172–177. doi: 10.1016/j.atmosenv.2013.04.024.
  • 11. Wang R, Henderson SB, Sbihi H, Allen RW, Brauer M. Temporal stability of land use regression models for traffic-related air pollution. Atmos Environ. 2013;64:312–319. doi: 10.1016/j.atmosenv.2012.09.056.
  • 12. Whitworth KW, Symanski E, Lai D, Coker AL. Kriged and modeled ambient air levels of benzene in an urban environment: an exposure assessment study. Environ Health. 2011;10:21. doi: 10.1186/1476-069X-10-21.
  • 13. Kloog I, Koutrakis P, Coull BA, Lee HJ, Schwartz J. Assessing temporally and spatially resolved PM2.5 exposures for epidemiological studies using satellite aerosol optical depth measurements. Atmos Environ. 2011;45(35):6267–6275. doi: 10.1016/j.atmosenv.2011.08.066.
  • 14. Alston EJ, Sokolik IN, Kalashnikova OV. Characterization of atmospheric aerosol in the US Southeast from ground- and space-based measurements over the past decade. Atmospheric Measurement Techniques. 2012;5(7):1667–1682. doi: 10.5194/amt-5-1667-2012.
  • 15. Lyapustin A, Wang Y, Laszlo I, et al. Multiangle implementation of atmospheric correction (MAIAC): 2. Aerosol algorithm. Journal of Geophysical Research: Atmospheres. 2011;116(D3):D03211. doi: 10.1029/2010JD014986.
  • 16. Lee HJ, Liu Y, Coull BA, Schwartz J, Koutrakis P. A novel calibration approach of MODIS AOD data to predict PM2.5 concentrations. Atmospheric Chemistry and Physics. 2011;11(15):7991–8002. doi: 10.5194/acp-11-7991-2011.
  • 17. Jin S, Yang L, Danielson P, Homer C, Fry J, Xian G. A comprehensive change detection method for updating the National Land Cover Database to circa 2011. Remote Sens Environ. 2013;132:159–175. doi: 10.1016/j.rse.2013.01.012.
  • 18. Li X, Xia X, Wang S, Mao J, Liu Y. Validation of MODIS and Deep Blue aerosol optical depth retrievals in an arid/semi-arid region of northwest China. Particuology. 2012;10(1):132–139. doi: 10.1016/j.partic.2011.08.002.
  • 19. Hu X, Waller LA, Lyapustin A, et al. Estimating ground-level PM2.5 concentrations in the Southeastern United States using MAIAC AOD retrievals and a two-stage model. Remote Sens Environ. 2014;140:220–232. doi: 10.1016/j.rse.2013.08.032.
  • 20. Hu X, Waller LA, Lyapustin A, Wang Y, Liu Y. 10-year spatial and temporal trends of PM2.5 concentrations in the southeastern US estimated using high-resolution satellite data. Atmospheric Chemistry and Physics. 2014;14(12):6301–6314. doi: 10.5194/acp-14-6301-2014.
  • 21. Hu X, Waller LA, Al-Hamdan MZ, et al. Estimating ground-level PM2.5 concentrations in the southeastern U.S. using geographically weighted regression. Environ Res. 2013;121:1–10. doi: 10.1016/j.envres.2012.11.003.
