Abstract
Satellite-based PM2.5 monitoring has the potential to complement ground PM2.5 monitoring networks, especially for regions with sparsely distributed monitors. Satellite remote sensing provides data on aerosol optical depth (AOD), which reflects particle abundance in the atmospheric column. Thus AOD has been used in statistical models to predict ground-level PM2.5 concentrations. However, previous studies have shown that AOD may not be a strong predictor of PM2.5 ground levels. Another shortcoming of remote sensing is the large number of non-retrieval days (i.e., days without satellite data available) due to clouds and snow- and ice-cover.
In this paper we propose statistical approaches to overcome these two shortcomings, thereby making satellite imagery a viable method to estimate PM2.5 concentrations. First, we render AOD a robust predictor of PM2.5 mass concentration by introducing an AOD daily calibration approach based on a mixed effects model. Second, we develop models that combine AOD and ground monitoring data to predict PM2.5 concentrations during non-retrieval days. A key feature of this approach is that we develop these prediction models separately for groups of days defined by the observed amount of spatial heterogeneity in concentrations across the study region. We then applied these methodologies to examine the spatial and temporal patterns of daily PM2.5 concentrations for both retrieval days (i.e., days with satellite data available) and non-retrieval days in the New England region of the U.S. during the period 2000-2008. Overall, for the years 2000-2008, our statistical models predicted surface PM2.5 concentrations with reasonably high R2 (0.83) and low percent mean relative error (3.5%). In addition, the spatial distribution of the estimated PM2.5 levels in the study domain clearly highlighted densely populated and high-traffic areas. The method we have developed demonstrates that remote sensing can have a tremendous impact on the fields of environmental monitoring and human exposure assessment.
Keywords: MODIS, aerosol optical depth, AOD, PM2.5, mixed effects model, cluster analysis, PM2.5 prediction model
1. Introduction
Particle pollution has been recognized as a significant concern related to human health and global climate change in many parts of the world (Brunekreef and Holgate, 2002; Ramanathan et al. 2001). Airborne particulate matter with aerodynamic diameter ≤2.5 μm (PM2.5) is a mixture of pollutants including sulfate, nitrate, ammonium, organic compounds, elemental carbon, metal oxides, and dust or soil particles (U.S. EPA 2004). Numerous studies have shown that ambient PM2.5 concentrations are associated with adverse health effects such as increased mortality and morbidity, aggravated respiratory and cardiovascular symptoms, and lower birth weight (Bell et al. 2007, 2010a; Franklin et al. 2007; Gent et al. 2003, 2009). In most epidemiological studies, subject-specific PM2.5 exposures are assessed by measuring ambient PM2.5 concentrations at one or more outdoor monitoring sites. However, spatially sparse PM2.5 monitoring networks may limit our ability to accurately assess human exposures to PM2.5, since concentrations measured at an outdoor site become less representative of the subjects' exposures as the distance from the monitor increases (Bell et al. 2010b; Lee et al. 2011a). In time-series analyses PM2.5 exposures should be highly correlated with ambient PM2.5 concentrations. In cross-sectional studies long-term PM2.5 exposures should be assessed with great accuracy (Bell et al. 2010b; Ito et al. 2004; Pinto et al. 2004). Furthermore, in the interest of reducing cost, many PM2.5 monitoring sites do not operate daily; sampling frequencies vary by site and include every day, every third day, and every sixth day. Thus epidemiological studies are often compromised by the lack of continuous measurements. 
In short, due to their spatial and temporal limitations, current PM2.5 monitoring networks cannot provide sufficient data to fully assess PM2.5 human exposures for health effect studies and are hindered in their ability to help answer some key scientific questions, such as the effect of cumulative exposures over several days.
Satellite remote sensing provides data on aerosol optical depth (AOD), a measure of light extinction by atmospheric aerosols (i.e., light scattering and absorption). AOD values reflect particle abundance in the atmospheric column, and thus they have been used in statistical models to predict ground-level PM2.5 concentrations (Engel-Cox et al. 2004; Liu et al. 2005, 2007a, 2007b, 2007c, 2009; Schaap et al. 2009). Satellite-based PM2.5 monitoring has been considered as a complement to ground PM2.5 monitoring networks, especially for regions with a limited number of PM2.5 monitors. However, most previous studies have reported that AOD has a low to moderate PM2.5 predictive ability (i.e., coefficient of determination R2 < 0.60) (Hoff and Christopher, 2009), which may not be sufficient for health effect studies. In a recent paper we introduced a new daily calibration technique for Moderate Resolution Imaging Spectroradiometer (MODIS) AOD to accurately predict ground PM2.5 concentrations (Lee et al. 2011b).
AOD values cannot be retrieved on days with clouds, high surface reflectance due to snow- and ice-cover, or retrieval errors. As a result, AOD data are not available for a large fraction of days (non-retrieval days) and thus PM2.5 concentration predictions are not always possible.
Due to the low to moderate PM2.5 predictability of AOD measurements and the large number of non-retrieval days, satellite remote sensing has played a limited role in the field of particle exposure assessment. As mentioned above, we have already addressed the low predictability issue by introducing the daily AOD calibration approach. The challenge of the infrequent satellite measurements will be addressed in this paper. Specifically, we have developed a statistical model to predict daily PM2.5 concentrations for both retrieval and non-retrieval days in the region of New England, U.S. for the years 2000-2008. This is of paramount importance in our efforts to enhance spatial and temporal coverage of PM2.5 concentration estimates, leading to more reliable environmental impact assessment, exposure assessment and health effect studies. Together these efforts render satellite remote sensing a powerful tool in the fields of environmental monitoring and human exposure assessment.
2. Methods
2.1. PM2.5 measurements
PM2.5 ambient air samples were collected at 69 U.S. Environmental Protection Agency (EPA) PM2.5 monitoring sites in Connecticut (CT), Massachusetts (MA), Rhode Island (RI), Southern Maine (ME), New Hampshire (NH), and Vermont (VT) for the years 2000-2008 (Fig. 1). At the 69 monitoring sites, 24-hr integrated PM2.5 filter samples were collected with varying frequencies, including every day, every third day, and every sixth day, as per EPA's monitoring program design. Not all monitoring sites operated for the entire nine years; thus the number of monitoring sites varied by year.
Fig. 1.
Location of 69 EPA PM2.5 monitoring sites in the study region. This study region is covered by 582 grid cells (10×10 km2).
2.2. Satellite data
We obtained MODIS AOD data (Collection 5; Level 2 aerosol product) from the National Aeronautics and Space Administration (NASA) Earth Observing System (EOS) satellites, Terra and Aqua, over the New England region for the years 2000-2008. The MODIS AOD measurements have relatively fine spatial (10 × 10 km2 grid) and temporal (every one to two days) resolution, which makes them appropriate for daily air quality monitoring (Al-Saadi et al. 2005). The over-land retrieval algorithm of Collection 5 primarily uses three wavelength channels of 0.47, 0.66, and 2.12 μm (Levy et al. 2007), finally reporting AOD values at the wavelength of 0.55 μm, with the expected uncertainty of ΔAOD=±0.05±0.15×AOD (Levy et al. 2010; Remer et al. 2008). The Terra and Aqua satellites cross the equator at approximately 10:30 am (descending orbit) and 1:30 pm (ascending orbit) local sun time, respectively, providing aerosol information at two different times per day with a scanning swath of 2,330 km (cross-track) by 10 km (along-track at nadir). Details about the MODIS AOD retrieval algorithm can be found in Levy et al. (2007, 2009). The daily average of the Terra and Aqua AOD values is most likely to reflect the aerosol loading on a given day, but the two values are not always both retrieved on the same day. Because of diurnal variations (Green et al. 2009) and potential calibration differences between the two satellite sensors, it would not be appropriate to use the two-sensor average on some days and a single-sensor value on others, depending on data availability. Moreover, the Terra satellite was launched in December 1999, while the Aqua satellite was launched in May 2002; thus only Terra measurements are available for the years 2000-2002. Consequently, we primarily used the Terra AOD measurements. The missing Terra AOD values were estimated from the Aqua AOD values, if they were available, using an adjustment factor. 
This factor was equal to the ratio of the average Terra AOD to Aqua AOD for those days when both the Terra and Aqua measurements were available. We created 582 grid cells (10×10 km2) covering the New England region in ArcGIS (Version 9.3; ESRI), and all the subsequent analyses were based on the grid cells.
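The gap-filling rule described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function names are hypothetical, missing retrievals are represented as `None`, and the factor is taken as the ratio of the two means over days with both retrievals (the text does not say whether it was computed per grid cell or over the whole region).

```python
from statistics import mean

def adjustment_factor(terra, aqua):
    """Ratio of mean Terra AOD to mean Aqua AOD over days when both were retrieved."""
    both = [(t, a) for t, a in zip(terra, aqua) if t is not None and a is not None]
    if not both:
        raise ValueError("no days with both Terra and Aqua retrievals")
    return mean(t for t, _ in both) / mean(a for _, a in both)

def fill_terra(terra, aqua, factor):
    """Keep Terra where available; otherwise scale Aqua; otherwise leave the day missing."""
    filled = []
    for t, a in zip(terra, aqua):
        if t is not None:
            filled.append(t)
        elif a is not None:
            filled.append(a * factor)
        else:
            filled.append(None)  # non-retrieval day for both sensors
    return filled

# Made-up daily AOD values for one grid cell (None = no retrieval).
terra = [0.20, None, 0.10, None]
aqua = [0.10, 0.30, 0.05, None]
factor = adjustment_factor(terra, aqua)
print(round(factor, 3), fill_terra(terra, aqua, factor))
```

Days on which neither sensor retrieved a value stay missing; those are the non-retrieval days handled later by the cluster-specific model.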
2.3. Statistical model
PM2.5 prediction for retrieval days
We have previously introduced a daily AOD calibration method that renders AOD a robust predictor of surface-level PM2.5 concentrations. This calibration method assumes minimal spatial variability of time-varying parameters influencing PM2.5-AOD relationships on a given day (Lee et al. 2011b). The proposed calibration approach relied on daily ground PM2.5 concentrations measured at multiple sites. Daily PM2.5-AOD relationships were derived using a mixed effects model with random intercepts and slopes (Fitzmaurice et al. 2004) as follows:
$$PM_{ij} = (\alpha + u_j) + (\beta + v_j) \times AOD_{ij} + s_i + \varepsilon_{ij}, \qquad (u_j, v_j) \sim N[(0, 0), \Sigma] \tag{1}$$
where PMij is the PM2.5 concentration at a spatial site i on a day j; AODij is the AOD value in the grid cell corresponding to site i on a day j; α and uj are the fixed and random intercepts, respectively; β and vj are the fixed and random slopes, respectively; si is the random intercept of site i; εij ~ N(0, σ2) is the error term at site i on a day j; and Σ is the variance-covariance matrix for the day-specific random effects. The fixed effects represent the average intercept and PM2.5-AOD slope, and the random effects explain the daily-varying relationships. Due to the large number of sampling days it was not possible to run the mixed effects model for the entire study period. Therefore, we split the whole dataset (Years 2000-2008) into 5 subsets (Years 2000-2001, 2002-2003, 2004-2005, 2006-2007, and 2008). To build the mixed effects model, we matched the PM2.5 concentrations measured at each monitoring site with the AOD values obtained for the corresponding grid cell. Before running the model, we removed the sampling days with only one PM2.5-AOD pair, since a slope cannot be determined from a single pair. In addition, we excluded those days when the root mean squared error (RMSE) between the measured and predicted PM2.5 concentrations was greater than 5 μg/m3 or when the estimated AOD slope was negative. These days were not considered reliable for calibrating AOD data due to PM2.5 instrumental measurement errors, cloud-contaminated AOD, or other potential errors. The number of PM2.5-AOD pairs for each data subset was 1,299 (2000-2001), 1,801 (2002-2003), 1,680 (2004-2005), 1,745 (2006-2007), and 972 (2008). We assessed the model performance using the coefficient of determination (R2) and percent mean relative errors (% MRE) between the measured and predicted PM2.5 concentrations. 
We obtained the measured PM2.5 concentrations from 69 EPA monitoring sites and the predicted values from our model estimations in the grid cells corresponding to the respective monitoring sites. The % MRE is calculated as [|mean predicted PM2.5 – mean measured PM2.5| / (mean measured PM2.5)] × 100. The R2 values show how well the measured and predicted PM2.5 concentrations are correlated, while the % MRE values present systematic differences between those concentration levels. Together, the values of R2 and % MRE are indicative of the ability of our modeling approach to produce reliable PM2.5 estimates for both time-series and cross-sectional health effect studies.
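As a concrete illustration of the two performance measures, the sketch below computes % MRE exactly as defined above and R2 as the squared Pearson correlation between measured and predicted values. The latter is an assumption on our part about how R2 was obtained (it equals the R2 of a simple linear regression between the two series); the function names are illustrative.

```python
from statistics import mean

def percent_mre(measured, predicted):
    """|mean predicted - mean measured| / mean measured x 100."""
    m = mean(measured)
    return abs(mean(predicted) - m) / m * 100.0

def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    mx, my = mean(measured), mean(predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)

# Made-up concentrations (ug/m3): predictions track the measurements
# perfectly in time but are biased low by 0.5 ug/m3.
measured = [10.0, 12.0, 14.0]
predicted = [9.5, 11.5, 13.5]
print(round(r_squared(measured, predicted), 3), round(percent_mre(measured, predicted), 2))
```

The example shows why both measures are reported: a constant offset leaves R2 at 1 but produces a non-zero % MRE, so R2 speaks to temporal tracking and % MRE to systematic bias in the mean level.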
We validated the mixed effects model using a cross-validation (CV) method, which checked for potential over-fitting. First, we randomly separated the entire dataset into 10 subsets, each encompassing approximately 10% of the data. Each 10% subset was in turn withheld from the dataset, and the remaining 90% of the data was used to fit the model. The fitted model was then applied to predict PM2.5 concentrations for the withheld 10%. This process was repeated for each of the 10 subsets, and the predicted PM2.5 concentrations were compared to the measured PM2.5 concentrations using R2 and % MRE values.
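The cross-validation loop can be sketched as follows. This is a generic 10-fold scheme under stated assumptions, not the authors' implementation: records are split at random without regard to day or site, and `fit` and `predict` are placeholder callables standing in for the mixed effects model (the example uses a trivial mean predictor purely to make the sketch runnable).

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, fit, predict, k=10, seed=0):
    """Return out-of-fold predictions, one per record, in the original order."""
    preds = [None] * len(records)
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        model = fit(train)                      # fit on the remaining ~90%
        for i in held_out:
            preds[i] = predict(model, records[i])  # predict the withheld ~10%
    return preds

# Made-up (AOD, PM2.5) pairs and a stand-in "model" that predicts the training mean.
records = [(0.05 * i, 8.0 + 0.5 * i) for i in range(20)]
preds = cross_validate(
    records,
    fit=lambda train: sum(pm for _, pm in train) / len(train),
    predict=lambda model, record: model,
)
print(len(preds), all(p is not None for p in preds))
```

The out-of-fold predictions can then be compared with the measurements using the same R2 and % MRE measures as for the full model.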
PM2.5 prediction for non-retrieval days
The spatial patterns of the observed daily PM2.5 concentrations vary due to changes in meteorology and source emissions, which influence the impact of local and regional sources in the study region. A K-means cluster analysis was performed in the R software to identify groups of days with similar PM2.5 concentration spatial patterns. This analysis was based on the PM2.5 concentrations measured at the sampling sites within the study region and included the entire dataset (i.e., retrieval and non-retrieval days for the 9-year period). The K-means method partitions observations into K different subsets (i.e., clusters), yielding a solution that minimizes the within-cluster variance and maximizes the between-cluster variance. That is, the cluster analysis determines groups of days exhibiting similar spatial concentration patterns.
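To make the clustering step concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) applied to days, where each day is a vector of site values. It is illustrative only: the analysis in the paper was run in R, and the initialization used here (the first K days as starting centers) is an arbitrary choice made to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    centers = [list(p) for p in points[:k]]  # illustrative initialization
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Four made-up days, each described by values at two sites (ug/m3).
days = [[2.0, -2.0], [-2.0, 2.0], [1.8, -1.8], [-2.2, 2.2]]
labels, centers = k_means(days, k=2)
print(labels)
```

Days 1 and 3 share one spatial pattern (first site high, second low) and days 2 and 4 share the opposite pattern, so they end up in separate clusters.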
As mentioned above, the number of available PM2.5 monitoring sites varied by day due to differences in sampling frequencies, site operating period, and missing data. Due to sampling design considerations, a large number of sites (>35 out of 69) operated every third day. Sites sampling every third day all followed the same schedule, so their measurements always fell on the same days. Thus the cluster analysis was applied to the every-third-day data, which made it possible to obtain reliable day-specific PM2.5 spatial patterns. Furthermore, the cluster analysis was performed on PM2.5 concentration differences, obtained by subtracting the daily regional PM2.5 concentration from the respective site PM2.5 concentrations. On a given day the regional PM2.5 concentration was calculated by averaging the daily PM2.5 concentrations measured at all available monitoring sites. Since cluster assignment was done for every third day, the same cluster classification was applied to the two adjacent days, assuming an identical spatial pattern for three consecutive days. Finally, an important feature of the proposed approach is that the cluster analysis is independent of satellite data retrieval because it was based on PM2.5 ground measurements.
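The two preprocessing steps in this paragraph can be sketched as follows. The function names are hypothetical, and the sketch assumes that "the adjacent two days" means the day before and the day after each every-third-day sampling day, so that each day inherits the cluster of the nearest sampling day.

```python
def regional_deviations(site_values):
    """Return the day's regional mean and each available site's deviation from it."""
    available = {s: v for s, v in site_values.items() if v is not None}
    regional = sum(available.values()) / len(available)
    return regional, {s: v - regional for s, v in available.items()}

def propagate_labels(labels_by_day, n_days):
    """Assign every day the cluster label of the nearest sampled day."""
    sampled = sorted(labels_by_day)
    return [
        labels_by_day[min(sampled, key=lambda d: abs(d - day))]
        for day in range(n_days)
    ]

# One made-up day: two sites reporting, one missing.
regional, deviations = regional_deviations({"site_a": 12.0, "site_b": 8.0, "site_c": None})
print(regional, deviations)

# Clusters assigned on days 1, 4 and 7 of a 9-day window (0-indexed).
print(propagate_labels({1: "A", 4: "B", 7: "A"}, n_days=9))
```

Clustering on deviations rather than raw concentrations groups days by the shape of the spatial pattern rather than by how polluted the region was overall.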
Following the mixed effects model and the cluster analysis, PM2.5 concentrations in each of the grid cells were predicted for days when no AOD values were available. Toward this end, a cluster-specific PM2.5 prediction model was developed which used a generalized additive model (GAM) (Hastie and Tibshirani, 1990) as follows:
$$PM_{\mathrm{predicted},\,ij} = \alpha + \beta \times PM_{\mathrm{regional},\,j} + s(\mathrm{latitude}, \mathrm{longitude})_i + \varepsilon_{ij} \tag{2}$$
where PMpredicted ij is the AOD-derived PM2.5 concentration at a spatial site i on a day j (from equation (1)); PMregional j is the regional PM2.5 concentration on a day j; α and β are here the cluster-specific intercept and the slope for the regional PM2.5 concentration (distinct from the fixed effects in equation (1)); s(latitude, longitude)i is a smooth function of location (latitude and longitude) for site i; and εij is the error term at site i on a day j. We modeled this smooth function as a thin plate spline as implemented in the R software package. As shown by equation (2), the predicted PM2.5 concentrations from the mixed effects model were used as the dependent variable. For each cluster, these PM2.5 concentration values were regressed on the regional PM2.5 levels and the spatial smooth function of latitude and longitude. As a result, a predicted spatial surface of PM2.5 concentrations could be generated for each group of days defined by the clustering algorithm. For each of the non-retrieval days, PM2.5 concentrations in each grid cell were estimated by assuming that each cluster had a single PM2.5 spatial surface and that the regional PM2.5 concentrations reflected the temporal variability of PM2.5 over the study domain. As mentioned above, each cluster includes both retrieval and non-retrieval days because the cluster analysis was independent of AOD data retrieval. The model performance for non-retrieval days was also examined by comparing the measured and predicted PM2.5 concentrations using R2 and % MRE estimates. In order to compare the GAM-predicted PM2.5 concentrations to the measured ones at the monitoring sites, we adjusted for site bias using the random site estimates from the mixed effects model. The statistical approach to predict PM2.5 concentrations within the study domain for non-retrieval days is summarized in Fig. 2.
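Once a cluster-specific model has been fitted, predicting a non-retrieval day reduces to combining that day's regional PM2.5 concentration with the cluster's spatial surface. The sketch below shows only this prediction step, under assumptions: the `ClusterModel` container, its field names, and the numbers are hypothetical, and the spline fitting itself (done in R in the paper) is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class ClusterModel:
    """Hypothetical container for one fitted cluster-specific model."""
    intercept: float       # fitted intercept
    regional_slope: float  # fitted coefficient on the regional PM2.5 level
    surface: dict          # grid cell id -> smooth term s(lat, lon) at that cell

def predict_non_retrieval_day(model, regional_pm):
    """Predict PM2.5 in every grid cell from the day's regional concentration."""
    base = model.intercept + model.regional_slope * regional_pm
    return {cell: base + offset for cell, offset in model.surface.items()}

# Made-up fitted values for one cluster and two grid cells (ug/m3).
model = ClusterModel(
    intercept=0.5,
    regional_slope=0.95,
    surface={"cell_001": 1.2, "cell_002": -0.8},
)
print(predict_non_retrieval_day(model, regional_pm=10.0))
```

The spatial term is fixed within a cluster while the regional term changes daily, which is how a single surface per cluster can still yield day-specific predictions.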
Fig. 2.
Flowchart summarizing PM2.5 prediction for non-retrieval days.
2.4. Spatial variability of PM2.5 levels
We predicted concentrations for all retrieval and non-retrieval days during the 9-year period. Subsequently, we estimated the 9-year average PM2.5 concentrations for each of the grid cells in the study domain. We also present the PM2.5 concentration maps for each of the identified clusters. For the concentration maps, we split the distributions into 6 equally-sized bins due to the log-normally distributed PM2.5 levels.
3. Results and Discussion
3.1. Descriptive statistics
The mean (SE) PM2.5 concentrations measured at the 69 EPA monitoring sites varied from 7.96 (0.32) μg/m3 in Lebanon, NH (Site ID: 33-09-0010) to 16.38 (0.22) μg/m3 in New Haven, CT (Site ID: 09-09-0018). The overall mean PM2.5 concentration across the monitoring sites was 11.07 μg/m3 (SD=1.62 μg/m3). The PM2.5 concentrations measured at the spatial sites were not based on the same number of sampling days due to differences in sampling frequencies, site operation periods, or missing data; thus the reported PM2.5 levels may not be directly comparable. Mean (SE) daily AOD values for the grid cells covering the New England region varied from 0.06 (0.01) to 0.30 (0.01). On average, 627 AOD values per grid cell were retrieved, which corresponds to 19.1% of the study period.
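The retrieval fraction quoted above can be checked with a short calculation, assuming the study period comprises every calendar day from 1 January 2000 through 31 December 2008 (our assumption; the exact start and end dates are not stated).

```python
from datetime import date

# Number of calendar days in the assumed study period, inclusive of both ends.
n_days = (date(2008, 12, 31) - date(2000, 1, 1)).days + 1
mean_retrievals_per_cell = 627

fraction = mean_retrievals_per_cell / n_days
print(f"{n_days} days, {100 * fraction:.1f}% retrieved")
# prints "3288 days, 19.1% retrieved"
```

In other words, roughly four out of every five days in a typical grid cell are non-retrieval days.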
3.2. Model prediction
The mixed effects model generated 994 daily PM2.5-AOD relationships for the years 2000-2008. The number of relationships did not vary much by year, ranging from 94 in 2000 to 131 in 2007 and 2008. However, the number of relationships varied considerably by season, with summer being the highest (N=329) followed by fall (N=324), spring (N=253), and winter (N=88). The low number of retrieval days in winter was due to the larger number of days with clouds or snow in this season. Note that the number of PM2.5 ground measurements is generally constant throughout the year. The fixed effects of intercepts and slopes (AOD) for each of the 5 data subsets (2000-2001, 2002-2003, 2004-2005, 2006-2007, and 2008) were statistically significant (p<0.05), and the random effects of intercepts and slopes varied substantially by day. The daily intercepts and slopes (the mean of fixed plus random effect estimates) varied by season: 8.43 (SD=3.98), 7.98 (SD=3.86), 11.02 (SD=5.52), and 8.99 (SD=4.50) for intercepts; 8.18 (SD=4.12), 7.22 (SD=4.18), 9.25 (SD=5.31), and 8.49 (SD=4.63) for slopes in winter, spring, summer, and fall, respectively. The random site estimates for densely populated and high-traffic areas were positive, indicating the necessity of including the site term in the mixed effects model to adjust for site bias. The cluster analysis of the entire dataset for the years 2000-2008 yielded 9 different clusters. The clusters consisted of 1,404 (42.7%), 678 (20.6%), 441 (13.4%), 189 (5.8%), 162 (4.9%), 132 (4.0%), 105 (3.2%), 96 (2.9%), and 81 (2.5%) days, respectively. In all of the cluster-specific GAMs, the regional PM2.5 concentration and the spatial smooth function of coordinates were statistically significant (p<0.05).
The results of the mixed effects model used to estimate PM2.5 concentrations for retrieval days are shown in Fig. 3. The model explained 93% of the variability in the measured PM2.5 concentrations obtained at the 69 monitoring sites (R2=0.93). There was good agreement between the measured and predicted PM2.5 concentrations [slope=1.02 (SE=0.003) and intercept=-0.20 (SE=0.043)]. In addition, the cross-validation (CV) mixed effects model explained 88% of the variability in the observed PM2.5 concentrations (R2=0.88) with a slope of 1.00 (SE=0.004) and an intercept of 0.02 (SE=0.054). This suggests that AOD can be a robust predictor of PM2.5 in the mixed effects model. Moreover, the similar performance of these two simple linear regressions between the measured and predicted PM2.5 concentrations suggested that excessive over-fitting did not occur and that the mixed effects model can be reliably applied to any grid cell in the study region.
Fig. 3.
Model performance for retrieval days for the years 2000-2008 (Unit: μg/m3): (A) Mixed effects model and (B) CV mixed effects model. The green solid line presents the regression line, and the red dashed line displays the 1:1 line indicating perfect agreement.
The site-specific PM2.5 predictability was examined by estimating the R2 and % MRE values for each of the 69 monitoring sites for both retrieval and non-retrieval days (Fig. 4). Each monitoring site had the measured and predicted PM2.5 concentrations, and the comparison between the measured and predicted PM2.5 levels showed the R2 and % MRE for each site. These 69 R2 values and 69 % MRE values were represented in each box plot of Fig. 4. For retrieval days, the average R2 and % MRE for 69 monitoring sites were 0.90 (SD=0.06) and 1.5 (SD=1.8) %, respectively. For the non-retrieval days, the average R2 and % MRE values were 0.80 (SD=0.10) and 6.1 (SD=4.4) %, respectively. Therefore, models for both the retrieval and non-retrieval days predicted PM2.5 concentrations accurately with reasonably high R2 and low % MRE values for most spatial sites. As expected, model performance was slightly better for retrieval days. In the past, low predictive power and a large fraction of non-retrieval days have limited the application of satellite-based PM2.5 exposure assessment to epidemiological studies.
Fig. 4.
Site-specific PM2.5 predictability using R2 and % MRE for the years 2000-2008: (A) Retrieval days and (B) Non-retrieval days. The R2 and % MRE values were based on the comparison between the measured and predicted PM2.5 concentrations by site. Each box plot represents 69 monitoring sites.
Model performance tests were conducted by year and season (Table 1). Overall, our statistical models predicted surface PM2.5 concentrations with high R2 (0.83) and low % MRE (3.5%) values. Yearly analysis showed consistently high R2 values ranging from 0.73 in 2000 to 0.87 in 2006 and 2007, and low % MRE values varying from 2.3% in 2003 to 4.8% in 2002. Moreover, seasonal comparisons show that daily ground PM2.5 concentrations can be reliably estimated for all four seasons: R2 ranged from 0.75 in winter to 0.87 in summer and % MRE varied from 2.1% in spring to 5.0% in winter. The PM2.5 predictive ability in winter was lower than in other seasons, although the model performance was still reasonable. This may be explained by the higher proportion of non-retrieval days during the winter.
Table 1.
Overall, yearly, and seasonal comparisons between the measured and predicted PM2.5 concentrations. The measured and predicted PM2.5 concentrations and bias are in the unit of μg/m3.
| | N | PM2.5 measured (μg/m3) | PM2.5 predicted (μg/m3) | Bias (μg/m3) | % MRE | R2 |
|---|---|---|---|---|---|---|
| Overall | | | | | | |
| 2000-2008 | 53,035 | 11.24 | 10.85 | 0.39 | 3.5 | 0.83 |
| Year | | | | | | |
| 2000 | 5,567 | 11.81 | 11.41 | 0.40 | 3.4 | 0.73 |
| 2001 | 5,876 | 12.42 | 11.97 | 0.45 | 3.6 | 0.81 |
| 2002 | 6,606 | 11.86 | 11.29 | 0.57 | 4.8 | 0.85 |
| 2003 | 5,878 | 11.70 | 11.43 | 0.27 | 2.3 | 0.82 |
| 2004 | 5,977 | 11.17 | 10.80 | 0.37 | 3.3 | 0.82 |
| 2005 | 5,153 | 11.60 | 11.14 | 0.45 | 3.9 | 0.81 |
| 2006 | 5,491 | 10.22 | 9.88 | 0.34 | 3.3 | 0.87 |
| 2007 | 6,142 | 10.50 | 10.16 | 0.33 | 3.2 | 0.87 |
| 2008 | 6,345 | 9.96 | 9.65 | 0.31 | 3.1 | 0.81 |
| Season | | | | | | |
| Winter | 12,478 | 12.18 | 11.57 | 0.61 | 5.0 | 0.75 |
| Spring | 13,349 | 9.33 | 9.13 | 0.20 | 2.1 | 0.78 |
| Summer | 13,648 | 13.69 | 13.21 | 0.48 | 3.5 | 0.87 |
| Fall | 13,560 | 9.80 | 9.52 | 0.28 | 2.9 | 0.81 |
Our study suggests that when satellite data are appropriately modeled they can be used to predict PM2.5 exposures for epidemiological studies based on temporal (e.g., time-series and case-crossover studies) and spatial (e.g., cross-sectional studies) variation in pollutant concentrations. Time-series and case-crossover studies investigate associations between day-to-day variations in exposures and health outcomes. Therefore, the high correlations between the measured and predicted PM2.5 concentrations (assessed by Pearson correlations or R2) indicate that the satellite-based PM2.5 predictions can provide reliable exposure estimates for longitudinal health effect studies. Furthermore, cross-sectional studies examine associations between health effects and long-term PM2.5 exposures across communities. The very good agreement between the site measured and predicted mean PM2.5 concentrations, as assessed by % MRE, suggests that satellite remote sensing can enhance our ability to assess PM2.5 exposures for the cross-sectional studies. Often PM2.5 concentrations vary spatially (Hoek et al. 2002; Kim et al. 2005). Therefore, using one or a limited number of outdoor monitors may not be sufficient to produce accurate human exposure assessments. This exposure error can result in the underestimation of health risks associated with PM2.5 exposures (Jerrett et al. 2005; Thomas et al. 1993; Zeger et al. 2000). Use of MODIS AOD data helps to capture some or most of the spatial heterogeneity thus reducing exposure error.
3.3. Spatial patterns of PM2.5 concentrations
The spatial distribution of the predicted 9-year average PM2.5 concentrations is shown in Fig. 5. In the figure, each grid cell value represents the average of daily predicted PM2.5 concentrations for 9 years. Our prediction models estimated daily PM2.5 concentrations for the entire study period (2000-2008) including both retrieval and non-retrieval days. Thus the number of the predicted PM2.5 concentrations in each grid cell is identical, and the reported grid cell concentrations can be compared directly. The estimated average PM2.5 concentrations during the period of 2000-2008 ranged from 10.25 to 11.44 μg/m3. Note that on a given day concentration ranges were larger. Higher concentrations were predicted for densely populated and high traffic areas (e.g., Bridgeport, Hartford, and New Haven, CT, Boston and Springfield, MA, and Providence, RI) and high point emission source areas (e.g., power plants located in the coastal cities, Somerset and Salem, MA) (U.S. EPA, 2008). Previous studies have shown that the Northeastern cities of the U.S. are mostly impacted by the regionally transported PM2.5 pollution (Lee et al. 2011a; Liu et al. 2003). As a result, average PM2.5 concentrations tend to exhibit low spatial variability throughout the study region (Burton et al. 1996; Suh et al. 1997).
Fig. 5.
Spatial distribution of the 9-year average predicted PM2.5 concentrations.
PM2.5 concentration spatial patterns may vary daily depending upon the prevailing meteorological conditions and the location and characteristics of the impacting PM and gaseous pollutant sources (Seinfeld and Pandis 2006). The New England area is the receptor of pollution mostly transported by northwestern, western, and southwestern winds. In addition, this region is impacted by emissions produced within the metropolitan New York area. Therefore, the spatial patterns and composition of particles in New England depend on many time-varying parameters. However, it is possible to distinguish discrete spatial patterns that may reflect certain synoptic conditions. Using cluster analysis we were able to identify 9 distinct spatial patterns (clusters) for the entire 9-year study period. The 9 concentration maps are shown in Fig. A1. In the figure, each concentration map characterizes a different spatial pattern of PM2.5 in terms of spatial gradients and PM2.5 levels. The spatial gradients clearly displayed groups of days (i.e., clusters) influenced by pollution transported from the metropolitan New York area and Canada. The average PM2.5 concentration levels and the range of the PM2.5 levels (i.e., the highest minus lowest PM2.5 concentrations) varied by cluster. This may be due to PM2.5 source locations and emission rates, local meteorology (i.e., prevailing winds and stability), and the relative contributions of transported and local pollution over the region. The 9 distinct cluster-specific concentration maps, as shown in Fig. A1, provide evidence that the cluster analysis successfully captured and represented the heterogeneous PM2.5 spatial patterns in the study region. Moreover, these spatial patterns help us qualitatively examine the characteristics of particle pollution (e.g., days that are strongly influenced by transported pollution).
4. Conclusions
We have introduced a new approach that uses satellite AOD data to predict the spatial and temporal patterns of PM2.5 levels in New England. Our method is based on the daily calibration of AOD measurements using ground-level PM2.5 concentrations, which was accomplished using a mixed effects model. These calibrations are necessary, since the relationship between AOD and PM2.5 concentrations depends on many time-varying parameters such as particle concentration vertical profile, particle composition, and relative humidity among others. Daily calibration renders AOD a better predictor of PM2.5, and it represents a significant improvement over previous studies, which assume a constant relationship between the two parameters.
Furthermore, we have proposed a new method to predict PM2.5 concentrations during non-retrieval days, which are quite frequent in New England due to clouds and snow. The cluster analysis, which was based on the analysis of ground PM2.5 measurements obtained at a large number of sites in New England, identified 9 distinct spatial patterns of PM2.5. Each of the spatial patterns determined a single PM2.5 spatial surface, generating 9 cluster surfaces of PM2.5 over the study period. The cluster-specific analysis allowed accurate prediction of all the missing PM2.5 concentrations in each grid cell. The cluster analysis is crucial in predicting ground PM2.5 concentrations for non-retrieval days, while overcoming the limitation of satellite data availability.
Overall, we have demonstrated the tremendous potential of satellite AOD data to accurately predict exposures to PM2.5. These data are necessary for both short- and long-term PM2.5 epidemiological studies. As satellite remote sensing improves, data with finer spatial and temporal resolutions will become available, leading to more accurate PM2.5 exposure estimates. For MODIS, AOD data with a spatial resolution of 3 km are expected in the near future. This improvement will enhance our ability to assess daily subject-specific PM2.5 exposures, since data with finer spatial resolution may further reduce exposure measurement errors.
Supplementary Material
Fig. A1. Spatial distributions of the predicted PM2.5 concentrations for 9 clusters.
Highlights.
- Satellite-based PM2.5 prediction has the potential to monitor PM2.5 air quality.
- We use an AOD daily calibration approach to predict PM2.5 for retrieval days.
- The amount of PM2.5 spatial heterogeneity can be observed.
- These enable us to develop PM2.5 prediction models for non-retrieval days.
Acknowledgments
The authors thank the Harvard-EPA Clean Air Research Center and the Yale Center for Perinatal, Pediatric and Environmental Epidemiology. This publication was made possible by USEPA grant RD 83479801. Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the USEPA. Further, USEPA does not endorse the purchase of any commercial products or services mentioned in the publication. This publication was also supported by NIEHS grants R01ES016317 and R01ES019587.
Footnotes
Conflict of Interest: The authors declare that they have no actual or potential conflict of interest.
References
- Al-Saadi J, Szykman J, Pierce RB, Kittaka C, Neil D, Chu DA, Remer L, Gumley L, Prins E, Weinstock L, MacDonald C, Wayland R, Dimmick F, Fishman J. Improving national air quality forecasts with satellite aerosol observations. Bull Amer Meteor Soc. 2005;86:1249–1261. [Google Scholar]
- Bell ML, Ebisu K, Belanger K. Ambient air pollution and low birth weight in Connecticut and Massachusetts. Environ Health Perspect. 2007;115:1118–1124. doi: 10.1289/ehp.9759. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bell ML, Belanger K, Ebisu K, Gent JF, Lee HJ, Koutrakis P, Leaderer BP. Prenatal exposure to fine particulate matter and birth weight variations by particulate constituents and sources. Epidemiology. 2010a;21:884–891. doi: 10.1097/EDE.0b013e3181f2f405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bell ML, Ebisu K, Peng RD. Community-level spatial heterogeneity of chemical constituent levels of fine particulates and implications for epidemiological research. J Expo Sci Environ Epidemiol. 2010b doi: 10.1038/jes.2010.24. doi:10.1038/jes.2010.24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brunekreef B, Holgate ST. Air pollution and health. Lancet. 2002;360:1233–1242. doi: 10.1016/S0140-6736(02)11274-8. [DOI] [PubMed] [Google Scholar]
- Burton RM, Suh HH, Koutrakis P. Spatial variation in particulate concentrations within metropolitan Philadelphia. Environ Sci Technol. 1996;30:400–407. [Google Scholar]
- Engel-Cox JA, Holloman CH, Coutant BW, Hoff RM. Qualitative and quantitative evaluation of MODIS satellite sensor data for regional and urban scale air quality. Atmos Environ. 2004;38:2495–2509. [Google Scholar]
- Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. Wiley & Sons; New York: 2004. [Google Scholar]
- Franklin M, Zeka A, Schwartz J. Association between PM2.5 and all-cause and specific-cause mortality in 27 US communities. J Expo Sci Environ Epidemiol. 2007;17:279–287. doi: 10.1038/sj.jes.7500530. [DOI] [PubMed] [Google Scholar]
- Gent JF, Triche EW, Holford TR, Belanger K, Bracken MB, Beckett WS, Leaderer BP. Association of low-level ozone and fine particles with respiratory symptoms in children with asthma. JAMA. 2003;290:1859–1867. doi: 10.1001/jama.290.14.1859. [DOI] [PubMed] [Google Scholar]
- Gent JF, Koutrakis P, Belanger K, Triche E, Holford TR, Bracken MB, Leaderer BP. Symptoms and medication use in children with asthma and traffic-related sources of fine particle pollution. Environ Health Perspect. 2009;117:1168–1174. doi: 10.1289/ehp.0800335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Green M, Kondragunta S, Ciren P, Xu CY. Comparison of GOES and MODIS aerosol optical depth (AOD) to aerosol robotic network (AERONET) AOD and IMPROVE PM2.5 mass at Bondville, Illinois. J Air Waste Manag Assoc. 2009;59:1082–1091. doi: 10.3155/1047-3289.59.9.1082. [DOI] [PubMed] [Google Scholar]
- Hastie TJ, Tibshirani RJ. Generalized additive models. Chapman & Hall; New York: 1990. [Google Scholar]
- Hoff RM, Christopher SA. Remote sensing of particulate pollution from space: Have we reached the promised land? J Air Waste Manag Assoc. 2009;59:645–675. [PubMed] [Google Scholar]
- Hoek G, Meliefste K, Cyrys J, Lewne M, Bellander T, Brauer M, Fischer P, Gehring U, Heinrich J, van Vliet P, Brunekreef B. Spatial variability of fine particle concentrations in three European areas. Atmos Environ. 2002;36:4077–4088. [Google Scholar]
- Ito K, Xue N, Thurston G. Spatial variation of PM2.5 chemical species and source-apportioned mass concentrations in New York City. Atmos Environ. 2004;38:5269–5282. [Google Scholar]
- Jerrett M, Burnett RT, Ma RJ, Pope CA, Krewski D, Newbold KB, Thurston G, Shi YL, Finkelstein N, Calle EE, Thun MJ. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology. 2005;16:727–736. doi: 10.1097/01.ede.0000181630.15826.7d. [DOI] [PubMed] [Google Scholar]
- Kim E, Hopke PK, Pinto JP, Wilson WE. Spatial variability of fine particle mass, components, and source contributions during the regional air pollution study in St. Louis. Environ Sci Technol. 2005;39:4172–4179. doi: 10.1021/es049824x. [DOI] [PubMed] [Google Scholar]
- Lee HJ, Gent JF, Leaderer BP, Koutrakis P. Spatial and temporal variability of fine particle composition and source types in five cities of Connecticut and Massachusetts. Sci Total Environ. 2011a;409:2133–2142. doi: 10.1016/j.scitotenv.2011.02.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee HJ, Liu Y, Coull BA, Schwartz J, Koutrakis P. A novel calibration approach of MODIS AOD data to predict PM2.5 concentrations. Atmos Chem Phys. 2011b;11:7991–8002. doi:10.5194/acp-11-7991-2011. [Google Scholar]
- Levy RC, Remer LA, Mattoo S, Vermote EF, Kaufman YJ. Second-generation operational algorithm: Retrieval of aerosol properties over land from inversion of Moderate Resolution Imaging Spectroradiometer spectral reflectance. J Geophys Res. 2007;112:D13211. doi:10.1029/2006JD007811. [Google Scholar]
- Levy RC, Remer LA, Tanre D, Mattoo S, Kaufman YJ. Algorithm for remote sensing of tropospheric aerosol over dark targets from MODIS: Collections 005 and 051: Revision 2; February 2009. MODIS Algorithm Theoretical Basis Document. 2009 [Google Scholar]
- Levy RC, Remer LA, Kleidman RG, Mattoo S, Ichoku C, Kahn R, Eck TF. Global evaluation of the Collection 5 MODIS dark-target aerosol products over land. Atmos Chem Phys. 2010;10:10399–10420. [Google Scholar]
- Liu W, Hopke PK, Han YJ, Yi SM, Holsen TM, Cybart S, Kozlowski K, Milligan M. Application of receptor modeling to atmospheric constituents at Potsdam and Stockton, NY. Atmos Environ. 2003;37:4997–5007. [Google Scholar]
- Liu Y, Sarnat JA, Kilaru V, Jacob DJ, Koutrakis P. Estimating ground-level PM2.5 in the eastern United States using satellite remote sensing. Environ Sci Technol. 2005;39:3269–3278. doi: 10.1021/es049352m. [DOI] [PubMed] [Google Scholar]
- Liu Y, Franklin M, Kahn R, Koutrakis P. Using aerosol optical thickness to predict ground-level PM2.5 concentrations in the St. Louis area: A comparison between MISR and MODIS. Remote Sens Environ. 2007a;107:33–44. [Google Scholar]
- Liu Y, Koutrakis P, Kahn R. Estimating fine particulate matter component concentrations and size distributions using satellite-retrieved fractional aerosol optical depth: Part 1 - Method development. J Air Waste Manag Assoc. 2007b;57:1351–1359. doi: 10.3155/1047-3289.57.11.1351. [DOI] [PubMed] [Google Scholar]
- Liu Y, Koutrakis P, Kahn R, Turquety S, Yantosca RM. Estimating fine particulate matter component concentrations and size distributions using satellite-retrieved fractional aerosol optical depth: Part 2 - A case study. J Air Waste Manag Assoc. 2007c;57:1360–1369. doi: 10.3155/1047-3289.57.11.1360. [DOI] [PubMed] [Google Scholar]
- Liu Y, Paciorek CJ, Koutrakis P. Estimating regional spatial and temporal variability of PM2.5 concentrations using satellite data, meteorology, and land use information. Environ Health Perspect. 2009;117:886–892. doi: 10.1289/ehp.0800123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pinto JP, Lefohn AS, Shadwick DS. Spatial variability of PM2.5 in urban areas in the United States. J Air Waste Manag Assoc. 2004;54:440–449. doi: 10.1080/10473289.2004.10470919. [DOI] [PubMed] [Google Scholar]
- Ramanathan V, Crutzen PJ, Kiehl JT, Rosenfeld D. Aerosols, climate, and the hydrological cycle. Science. 2001;294:2119–2124. doi: 10.1126/science.1064034. [DOI] [PubMed] [Google Scholar]
- Remer LA, Kleidman RG, Levy RC, Kaufman YJ, Tanre D, Mattoo S, Martins JV, Ichoku C, Koren I, Yu H, Holben BN. Global aerosol climatology from the MODIS satellite sensors. J Geophys Res. 2008;113:D14S07. doi:10.1029/2007JD009661. [Google Scholar]
- Schaap M, Apituley A, Timmermans RMA, Koelemeijer RBA, de Leeuw G. Exploring the relation between aerosol optical depth and PM2.5 at Cabauw, the Netherlands. Atmos Chem Phys. 2009;9:909–925. [Google Scholar]
- Seinfeld JH, Pandis SN. Atmospheric chemistry and physics: From air pollution to climate change. John Wiley & Sons; New York: 2006. [Google Scholar]
- Suh HH, Nishioka Y, Allen GA, Koutrakis P, Burton RM. The metropolitan acid aerosol characterization study: Results from the summer 1994 Washington, D.C. field study. Environ Health Perspect. 1997;105:826–834. doi: 10.1289/ehp.97105826. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thomas D, Stram D, Dwyer J. Exposure measurement error: Influence on exposure-disease relationships and methods of correction. Annu Rev Publ Health. 1993;14:69–93. doi: 10.1146/annurev.pu.14.050193.000441. [DOI] [PubMed] [Google Scholar]
- U.S. Environmental Protection Agency (U.S. EPA) [15 March 2011];Air quality criteria for particulate matter; 2004. Available: http://cfpub2.epa.gov/ncea/cfm/recordisplay.cfm?deid=87903.
- U.S. Environmental Protection Agency (U.S. EPA) [14 January 2011];National Emissions Inventory (NEI); 2008. Available: http://www.epa.gov/ttn/chief/eiinformation.html.
- Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, Cohen A. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000;108:419–426. doi: 10.1289/ehp.00108419. [DOI] [PMC free article] [PubMed] [Google Scholar]