Abstract
Background
Development of exposure metrics that identify contrasts in multipollutant air quality across space is needed to better understand multipollutant geographies and health effects from air pollution.
Objective
Our aim is to improve understanding of: 1) long-term spatial distributions of multiple pollutants across urban environments; and 2) demographic characteristics of populations residing within areas that experience differing long-term air quality in order to assist in the development of future epidemiologic studies.
Methods
Data available for this study included seven years of spatiotemporally resolved concentrations for ten ambient air pollutants across the Atlanta metropolitan area. To analyze, we first distinguish the long-term behavior of air pollution at each grid location (n=253) by calculating study period means for each pollutant (n=10). Then, we apply the self-organizing map (SOM) technique to derive patterns in the multipollutant combinations observed among grid cells, i.e., multipollutant spatial types (MSTs), project results onto an ‘organized map’, and classify each grid cell under its most similar MST. Finally, we geographically map grid cell classifications to delineate regions of similar multipollutant characteristics and characterize associated population demographics using geographic information systems.
Results
We found that six MSTs describe well the nature of multipollutant combinations experienced at locations in our study area. MST profiles highlighted a range of combinations, from locations experiencing generally clean air quality (all pollutants low) to locations experiencing conditions that were relatively dirty (high long-term concentrations of several pollutants). Mapping the spatial distribution of MSTs revealed strong within-class contiguity and highlighted that downtown areas were dominated by primary pollution and that suburban areas experienced relatively higher levels of secondary pollution. Demographics show that the largest proportion of the overall population of metro Atlanta resided in downtown locations experiencing relatively high levels of primary pollution. Moreover, higher proportions of nonwhites and children in poverty resided in these areas when compared to populations that resided in suburban areas exhibiting relatively lower pollution, moderate secondary pollution, or relatively high SO2.
Conclusion
Placing multipollutant air quality within a geographic regionalization problem reveals the nature and spatial distribution of differential pollutant combinations across urban environments and provides helpful insights for identifying spatial exposure and demographic contrasts for future health studies.
Keywords: Air pollution, classification, cluster analysis, kohonen map, geographic information systems (GIS), multipollutant, mixture
INTRODUCTION
Air quality within urban environments involves a mixture of gaseous and particulate concentrations that are affected by a variety of emission sources, local topographies, and meteorological conditions. As such, complex spatial patterning can occur in urban air quality making the variability of such phenomena difficult to characterize as different pollutants often exhibit differential spatial patterns (e.g., ozone vs. nitrogen dioxides). This is a concern for health scientists in the field of air pollution epidemiology who need to identify appropriate spatial contrasts in their exposure assessments of air pollution (Marshall, Nethery et al. 2008, Hajat, Diez-Roux et al. 2013). Such challenges, in part, have led investigators performing chronic exposure studies to typically focus on one pollutant at a time (Hoek, Krishnan et al. 2013); however, it is well understood that intercorrelations among various pollutants can be problematic for statistical models designed to estimate individual pollutant risk (Tolbert, Klein et al. 2007, Jerrett, Burnett et al. 2013). Therefore, investigations reporting associations between long-term exposure to air pollution and adverse health generally acknowledge that reported associations are likely the result of a pollutant mixture, not the sole effect of the proxy pollutant (Pope, Burnett et al. 2004, Lee, Ferguson et al. 2009, Hoek, Krishnan et al. 2013).
In order to improve our understanding of the health effects of long-term exposure to multiple pollutants it is necessary to examine the entire mix of pollutants (Dominici, Peng et al. 2010, Vedal and Kaufman 2011, Levy, Mihele et al. 2014). However, expanding chronic exposure studies of air pollution to incorporate information on multiple pollutants is expected to be challenging for at least two reasons: 1) measuring/modeling the joint spatial distribution of multiple air pollutants is difficult (Jerrett, Arain et al. 2005, Marshall, Nethery et al. 2008, Riley, Banks et al. 2014, Sororian, Holmes et al. 2014), and 2) characterizing the spatial distribution of multipollutant exposure is complex (Oakes, Baxter et al. 2014). To further complicate matters, different subgroups within the populations at risk (e.g., those with low socioeconomic status (SES)) may be more intensely exposed to air pollution than others, a situation that may confound estimated associations between air pollution and health (Laurent, Bard et al. 2007, Yanosky, Schwartz et al. 2008, Hajat, Diez-Roux et al. 2013).
Given such challenges, development of approaches that can be useful for investigating the health effects of complex multipollutant exposures is highly desired (Dominici, Peng et al. 2010). Recently, many techniques have been presented for characterizing multipollutant exposure (Oakes, Baxter et al. 2014); however, very few have been applied in spatial settings (Molitor, Su et al. 2011, Austin, Coull et al. 2013). Although limited, findings from these studies have noted significant spatial variation in multipollutant exposures within and across cities in the US. Therefore, it is clear that more studies are needed to better understand spatial variation of complex exposures as well as heterogeneity in exposure to populations at risk.
In the present study, we use Atlanta, Georgia, as a case study to illustrate a methodological approach for characterizing long-term trends in population exposure to multiple pollutants. Atlanta’s air quality issues are well known and several studies have documented associations with health outcomes including asthma, cardiorespiratory morbidity, and preterm births (Alhanti, Chang et al. 2015, Chang, Warren et al. 2015, Pearce, Waller et al. 2015, Winquist, Schauer et al. 2015). Moreover, a novel set of spatially and temporally resolved multipollutant data is available for the region (Sororian, Holmes et al. 2014) that will allow us to more closely examine air pollution exposure across a unique and diverse population (Pooley 2015). Our general objective is to determine whether and to what extent long-term patterns in multipollutant combinations and populations at risk systematically map onto one another in the Atlanta region. We aim to achieve our objective by addressing the following questions of interest:
1) What types of long-term multipollutant combinations occur at locations within our study?
2) What is the spatial distribution of types of multipollutant combinations across our study region?
3) What demographics are associated with areas differentiated by types of multipollutant combinations?
In answering these questions we hope to improve future epidemiologic studies by increasing our understanding of: 1) the long-term geographic patterns of multipollutant air quality across our study region; and 2) the demographic makeup of populations residing in areas that experience distinct long-term multipollutant exposure.
METHODS
The principal focus of our approach is to identify geographic locations in our study area with similar long-term multipollutant characteristics in order to better understand local, long-term population exposure to ambient multipollutant mixtures. This is achieved in four stages: 1) divide the study area into grid cells, within which it is assumed the spatial distribution of pollution is relatively homogeneous, 2) define a number of multipollutant spatial types that describe the nature of the pollutant attributes of the grid cells, 3) characterize multipollutant geographies by mapping grid assignments to multipollutant spatial types in the study area, and 4) describe the demographic characteristics of the populations residing in locations corresponding to areas defined by the multipollutant spatial types.
Multipollutant Air Quality Data Acquisition
Available data for this study included seven years (2002 to 2008) of spatially and temporally resolved air pollution concentrations at a twelve kilometer gridded spatial resolution for ten ambient air pollutants obtained for a 31,285 km2 study area encompassing Atlanta, Georgia (Sororian, Holmes et al. 2014). This area contained 253 grid cells (Figure 1). In brief, data at each grid cell are daily concentration estimates obtained from calibrating gridded output from the Community Multi-scale Air Quality (CMAQ) model against measurements from monitoring sites in the study area – a.k.a. ‘fusion’ data (Sororian, Holmes et al. 2014). Pollutants available included 1-hr maximum carbon monoxide (CO) in ppm, 1-hr maximum nitrogen dioxide (NO2) and nitrogen oxides (NOx) in ppb, 8-hr maximum ozone (O3) in ppb, 1-hr maximum sulfur dioxide (SO2) in ppb, and five 24-hr average PM2.5 components in µg/m3: elemental carbon (EC), organic carbon (OC), nitrate (NO3), ammonium (NH4), and sulfate (SO4). See Table 1 for summary statistics of these data.
Figure 1.
Map of study area illustrating air quality modeling fusion grid and population density. Note the white area in the south central portion reflects the location of Hartsfield-Jackson International Airport (no residents).
Table 1.
Summary statistics for air pollution within our study area during the years 2002 to 2008. (SD: standard deviation; CV: SD/mean; n=646,415 grid cell-days)
| Pollutant | Mean | SD | Min | Max | Units | Daily Temporal Metric | CV |
|---|---|---|---|---|---|---|---|
| CO | 0.39 | 0.15 | 0.25 | 1.01 | ppm | 1 hr max | 0.38 |
| NO2 | 10.44 | 6.40 | 3.55 | 35.42 | ppb | 1 hr max | 0.61 |
| NOX | 0.020 | 0.015 | 0.006 | 0.091 | ppm | 1 hr max | 0.76 |
| O3 | 0.043 | 0.001 | 0.038 | 0.045 | ppm | 8 hr max | 0.02 |
| SO2 | 8.10 | 2.90 | 4.35 | 28.50 | ppb | 1 hr max | 0.36 |
| EC | 0.69 | 0.24 | 0.33 | 1.69 | ug/m3 | 24 hr avg | 0.35 |
| OC | 2.62 | 0.29 | 1.84 | 3.48 | ug/m3 | 24 hr avg | 0.11 |
| NH4 | 1.34 | 0.13 | 1.06 | 1.65 | ug/m3 | 24 hr avg | 0.10 |
| NO3 | 0.60 | 0.08 | 0.43 | 0.78 | ug/m3 | 24 hr avg | 0.14 |
| SO4 | 4.30 | 0.22 | 3.87 | 5.13 | ug/m3 | 24 hr avg | 0.05 |
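The CV column of Table 1 follows directly from the definition in the caption (CV = SD/mean). As a quick check, the sketch below recomputes it for four rows using the rounded means and SDs printed in the table; because the inputs are already rounded, small differences from the published CV can occur for other rows.

```python
# Rounded (mean, SD) pairs copied from Table 1.
table1 = {
    "CO": (0.39, 0.15),
    "NO2": (10.44, 6.40),
    "SO2": (8.10, 2.90),
    "O3": (0.043, 0.001),
}

def coefficient_of_variation(mean, sd):
    """Return SD/mean, the CV definition given in the Table 1 caption."""
    return sd / mean

cv = {name: round(coefficient_of_variation(m, s), 2) for name, (m, s) in table1.items()}
print(cv)
```

The very small CV for O3 relative to NO2 already hints at the limited spatial contrast for ozone discussed later in the paper.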
Identify Spatial Profiles that Define Multipollutant Spatial Types (MSTs)
To address our first question, we apply an unsupervised learning tool known as the self-organizing map (SOM) to identify the types of multipollutant combinations that occur among the grid cells in our study area (Kohonen 2001). SOM uses an optimized clustering procedure to identify data-driven profiles that are used to formulate categories and then projects resulting profiles onto a spatially organized array – the ‘map’. We find the SOM algorithm to be appealing for air pollution mixture studies as it has the additional benefit of using the ‘map’ for visualization, a feature we find particularly useful when trying to understand relationships between profiles.
SOM Algorithm
In order to apply SOM two components must be specified by the user - the input data matrix and the output map (Figure 2). Here, the input matrix is our multipollutant data set, Z:
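$$Z = \begin{bmatrix} z_{11} & \cdots & z_{1p} \\ \vdots & \ddots & \vdots \\ z_{n1} & \cdots & z_{np} \end{bmatrix} \qquad \text{(Eq. 1)}$$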
where n denotes the number of grid cell locations and p the number of pollutants. Each grid cell is represented by a row Zi within Z. The output collection of nodes (i.e., multipollutant profiles) is the “map”, M:
$$M = \{\, m_{xy} : x = 1, \ldots, X;\ y = 1, \ldots, Y \,\} \qquad \text{(Eq. 2)}$$
with each profile m represented as a node at location (x, y) on the map (Figure 2). Note X×Y determines the number of nodes (i.e., number of profiles) and the arrangement (e.g., 1D or 2D) of M. The shape of M is most commonly rectangular but can be other variations (e.g., hexagonal). Each node m is associated with a profile defined as the vector
$$w_m = [\mu_{m1}, \ldots, \mu_{mp}] \qquad \text{(Eq. 3)}$$
where μ are ‘learned’ coefficient values corresponding to the pollutant concentration values that characterize profile m.
Figure 2.
An illustration of the Self-Organizing Map (SOM) unsupervised learning procedure using a 6×5 SOM grid (i.e., 30 profiles) example. Note: Nodes represent locations of resulting profiles on the map.
Operationally, SOM implements the following steps. First, given M, map initialization occurs with each m being assigned a preliminary wm from a random selection of Zi’s. Then, iterative learning begins where, for each iteration t, the algorithm randomly chooses a grid cell’s profile Zi from Z and computes a measure of (dis)similarity (in our case the Euclidean distance) between Zi and each wm. Next, SOM provisionally assigns a best matching node m*(t), whose wm* is most similar to Zi. Finally, class profile development occurs via the Kohonen learning process:
$$w_m(t+1) = w_m(t) + \alpha(t)\, N_{m^*i}(t)\, [\bar{Z}(t) - w_m(t)] \qquad \text{(Eq. 4)}$$
where α is the learning rate, Nm*i is a neighborhood function that spatially constrains the neighborhood of m* on M, and Z̄ is the mean of pollutant values for grid cells provisionally assigned to the nodes within the neighborhood set. The learning rate controls the magnitude of updating that occurs at iteration t. The neighborhood function, which activates all nodes up to a certain distance on M from m*, forces similarity between neighboring nodes on M. Equation (4) updates coefficients within a neighborhood of m*, where the impact of the neighborhood decreases over iterations.
SOM performance is dependent on both α and N, and thus mappings are sensitive to these parameters. Therefore, in an effort to provide guidance, we note that α typically starts as a small number and is specified to decrease monotonically (e.g., 0.05 to 0.01) as iterations increase. Similarly, the range of N starts large (e.g., 2/3 map size) and decreases to 1.0 over a predetermined termination period (e.g., 1/3 of iterations), after which fine adjustment of the map occurs.
Training continues for the number of user-defined iterations. Kohonen recommends the number of steps be at least 500 times the number of nodes on the map. Once training is complete, results include final coefficient values for each node’s wm, classification assignments for each grid cell Zi, and coordinates of nodes on M. The final step is to visualize the class profiles by plotting the map. For additional details regarding SOM, please refer to the book of Kohonen (2001).
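The training steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in this study. It makes several assumptions that should be read as such: it uses the standard online Kohonen update, which moves nodes toward the single sampled observation rather than toward the neighborhood mean Z̄; it uses a simple "bubble" neighborhood; it decreases α and the radius linearly; and it takes "map size" to mean the longest side of the map. The function names `train_som`, `best_matching_node`, and `classify` are hypothetical.

```python
import math
import random

def best_matching_node(z, weights):
    """Index of the node whose profile is closest to z (Euclidean distance)."""
    return min(range(len(weights)), key=lambda k: math.dist(z, weights[k]))

def train_som(Z, X, Y, n_iter, alpha_start=0.05, alpha_end=0.01, seed=0):
    """Train an X-by-Y SOM on the rows of Z; return (nodes, weights)."""
    rng = random.Random(seed)
    nodes = [(x, y) for y in range(1, Y + 1) for x in range(1, X + 1)]
    # Initialization: each node starts from a randomly selected row of Z.
    weights = [list(rng.choice(Z)) for _ in nodes]
    radius_start = (2.0 / 3.0) * max(X, Y)  # "2/3 map size" (assumed: longest side)
    shrink_steps = max(n_iter // 3, 1)      # radius reaches 1.0 after 1/3 of iterations
    for t in range(n_iter):
        # Assumed linear decrease of the learning rate from alpha_start to alpha_end.
        alpha = alpha_start + (alpha_end - alpha_start) * t / max(n_iter - 1, 1)
        if t < shrink_steps:
            radius = max(radius_start + (1.0 - radius_start) * t / shrink_steps, 1.0)
        else:
            radius = 1.0
        z = rng.choice(Z)  # randomly chosen grid cell profile
        bx, by = nodes[best_matching_node(z, weights)]
        for k, (x, y) in enumerate(nodes):
            # Bubble neighborhood: nodes strictly inside the radius are updated,
            # so at radius 1.0 only the best matching node itself moves.
            if math.hypot(x - bx, y - by) < radius:
                weights[k] = [w + alpha * (zj - w) for w, zj in zip(weights[k], z)]
    return nodes, weights

def classify(Z, weights):
    """Assign each row of Z to its best matching node."""
    return [best_matching_node(z, weights) for z in Z]

# Toy data: two well-separated groups of "grid cells" with p = 2 pollutants.
Z = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
nodes, weights = train_som(Z, X=2, Y=1, n_iter=500 * 2)  # 500 steps per node
labels = classify(Z, weights)
print(labels)
```

Following the guidance of at least 500 steps per node, a 3×2 map such as the one used later in this paper would call for at least 3,000 iterations.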
SOM Implementation
Application of SOM requires three steps. First, we calculate long-term means for each pollutant at each CMAQ grid cell for the years 2002 to 2008. Next, we standardize the long-term averages of each pollutant by subtracting the overall mean across grid cells and dividing by the standard deviation, which removes the absolute differences between variable magnitudes of different pollutants yet retains ratios between variable amplitudes. We then determine an appropriate number of spatial profiles by assessing (1) the grouping structure of our data, (2) the information retained by resulting classifications, and (3) the area covered by each category, in order to better understand the number of people potentially exposed. We evaluate grouping structure using principal component analysis (PCA), information retention using regression models where SOM classifications are assessed as a categorical predictor for each pollutant in the profile, and potential area size through evaluation of grid cell class assignments. This information was then used collectively to determine the number of profiles for the SOM algorithm.
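Two of the computations in this paragraph are easy to state concretely. The sketch below is illustrative only: it standardizes one pollutant's long-term means across grid cells and computes the adjusted R2 of a linear model whose only predictor is class membership, in which case the fitted values are simply the class means. The use of the sample standard deviation and the function names `standardize` and `adjusted_r2` are assumptions, and the example values are invented toy numbers, not study data.

```python
from statistics import mean, stdev

def standardize(values):
    """Z-score one pollutant's long-term means across grid cells."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def adjusted_r2(values, classes):
    """Adjusted R^2 for a model with class membership as the only
    (categorical) predictor; fitted values are the class means."""
    n = len(values)
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(c, []).append(v)
    k = len(groups)  # fitted parameters: one mean per class
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_resid = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return 1.0 - (ss_resid / (n - k)) / (ss_total / (n - 1))

# Invented toy example: six grid cells, two classes.
values = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
classes = [0, 0, 0, 1, 1, 1]
z = standardize(values)
r2 = adjusted_r2(values, classes)
print(round(r2, 3))
```

A pollutant whose class means differ strongly relative to its within-class spread yields an adjusted R2 near 1, which is the sense in which a classification "retains information" about that pollutant.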
Once the number of profiles was determined, SOM was applied to our entire data set using parameters described in (Pearce, Waller et al. 2014). Resulting spatial profiles are referred to as multipollutant spatial types (MSTs) and are referenced on the ‘map’ using SOM [x,y] coordinates. It is important to note that our SOM is not a geographic map but rather a projection of resulting profiles onto a two-dimensional grid where locational proximity reflects profile similarity. In short, SOM aims to preserve the topology of the original multidimensional data space, a feature that results in neighboring profiles being more similar and distant profiles being more dissimilar. To enhance interpretation, MST profiles are visualized using barplots with mean centered concentrations on a percentage scale.
Geographic Distribution of Multipollutant Spatial Types
We visualize the spatial distribution of multipollutant combinations across our study area (to address our second question) using a color-coded map that differentiates locations based on their assigned MST. The result distinguishes area boundaries among grid cells based on each cell’s long-term air quality and serves to identify regions defined by MSTs. The map was spatially referenced using North American Datum 1983 and projected using the Georgia Statewide Lambert Conformal Conic system.
Population Characteristics of Multipollutant Spatial Types
To address our third question, we obtained population data from the US Census Topologically Integrated Geographic Encoding and Referencing (TIGER) products that provide geographic boundary data merged with 2010 Census data and 2008 to 2012 5-year estimates from the American Community Survey (United States Census Bureau 2010). Data were collected at the census tract level and variables of interest included total population, child subpopulation (aged < 18yrs), nonwhite subpopulation, and the percent of children (aged < 18yrs) living in poverty. Poverty statistics presented in the ACS rely on a set of money income thresholds that vary by family size and composition. If the family’s income is less than the federal poverty threshold, then the family and all included individuals are considered to be in poverty. The poverty thresholds do not vary geographically and are updated annually to allow for changes in the cost of living. For more detail see: “How Poverty is Calculated in the ACS” (US Census Bureau 2015).
We then used geographic information systems to match census tracts to the grid cell (Figure 1) in which their geographic centroids were located. Once matched, we calculate aggregate summaries for demographics under each MST category in order to get region specific population summaries.
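The centroid-matching and aggregation step can be illustrated without GIS software if one assumes a regular 12 km grid in projected coordinates. The sketch below is a simplification under that assumption; the grid origin, the grid dimensions, the tract records, and the names `cell_index` and `summarize_by_mst` are all hypothetical and serve only to show the logic.

```python
from collections import defaultdict

CELL_KM = 12.0  # grid resolution stated in the text

def cell_index(x_km, y_km, x0_km, y0_km, n_cols, n_rows):
    """Return (col, row) of the grid cell containing a tract centroid, or None
    if the centroid lies outside the grid. (x0_km, y0_km) is the lower-left
    corner of the grid in projected coordinates."""
    col = int((x_km - x0_km) // CELL_KM)
    row = int((y_km - y0_km) // CELL_KM)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return col, row
    return None

def summarize_by_mst(tracts, cell_to_mst, x0_km, y0_km, n_cols, n_rows):
    """Sum tract-level counts over all tracts whose centroid falls in each MST."""
    keys = ("population", "children", "nonwhite", "children_in_poverty")
    totals = defaultdict(lambda: dict.fromkeys(keys, 0))
    for tract in tracts:
        cell = cell_index(tract["x_km"], tract["y_km"], x0_km, y0_km, n_cols, n_rows)
        if cell is None or cell not in cell_to_mst:
            continue  # centroid outside the grid or cell without an MST
        for key in keys:
            totals[cell_to_mst[cell]][key] += tract[key]
    return dict(totals)

# Hypothetical tracts on a 2 x 1 grid whose lower-left corner is at (0, 0) km.
tracts = [
    {"x_km": 3.0, "y_km": 4.0, "population": 100, "children": 30,
     "nonwhite": 40, "children_in_poverty": 5},
    {"x_km": 11.9, "y_km": 0.5, "population": 50, "children": 10,
     "nonwhite": 20, "children_in_poverty": 2},
    {"x_km": 13.0, "y_km": 2.0, "population": 200, "children": 60,
     "nonwhite": 120, "children_in_poverty": 15},
    {"x_km": -1.0, "y_km": 2.0, "population": 999, "children": 999,
     "nonwhite": 999, "children_in_poverty": 999},  # outside the grid
]
cell_to_mst = {(0, 0): "[2,2]", (1, 0): "[3,2]"}
summary = summarize_by_mst(tracts, cell_to_mst, 0.0, 0.0, n_cols=2, n_rows=1)
print(summary)
```

Assigning a whole tract to the cell containing its centroid is a simplification: a tract that straddles a cell boundary contributes all of its population to one cell.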
RESULTS
Selecting the number of multipollutant spatial profiles
PCA projections suggest at least five primary modes of variation in our data: 1) CO, NO2, NOX, and EC; 2) NO3 and NH4; 3) SO4 and OC; 4) SO2; and 5) O3 (Figure 3). These can generally be described as the building blocks of air quality in Atlanta and likely correspond to variation driven by traffic related pollutants (1), secondary inorganic aerosols (2), secondary organic aerosols and sulfate (3), sulfur dioxide emissions (4), and region-wide ozone levels (5).
Figure 3.
Graphical and statistical evaluation measures used to guide number of spatial profiles for self-organizing map (SOM) analysis. Panel (a) presents a principal components analysis projection of our multipollutant data. The grey points represent the scores for each location along the first two principal components and the dark arrows indicate the corresponding loading vectors for each pollutant. Panel (b) displays the distribution of frequency assignments for each spatial type. Grey points reflect observed frequencies and trend line reflects the mean. Panel (c) displays the distribution of adjusted R2 values from simple linear regression models fit to each pollutant using multipollutant spatial types as a categorical predictor in the model. Each pollutant has a unique symbol and the trend line reflects the mean R2.
Frequency counts of grid cells assigned to each spatial type illustrate an anticipated reduction in the sample size as class number increases (i.e., fewer grid cells in each class when there are more classes) (Figure 3b). We prefer our SOM analysis to provide categorizations that will be useful for further analysis and thus we have added a reference line of 10%, which shows when our classifications capture ‘rare’ spatial profiles. For example, we see that a SOM classification with eight profiles identifies three spatial types that were observed in less than 10% of the locations.
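The 10% reference line amounts to a simple frequency check on the class assignments. A minimal sketch follows; the function name `rare_classes` and the toy assignments are hypothetical.

```python
from collections import Counter

def rare_classes(assignments, threshold=0.10):
    """Return the classes observed in less than `threshold` of locations."""
    counts = Counter(assignments)
    n = len(assignments)
    return sorted(c for c, k in counts.items() if k / n < threshold)

# Toy example: 100 locations, one class covering only 5% of them.
assignments = ["a"] * 50 + ["b"] * 45 + ["c"] * 5
rare = rare_classes(assignments)
print(rare)
```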
Results from using SOM classes as categorical predictors of individual pollutant variance show a strong relationship between the number of classes and the explanatory power of the SOM classification (Figure 3c). The ability of the SOM classification to predict long-term averages differs among the pollutants evaluated, with NOX and NO2 generally being explained well and SO2 and O3 being explained poorly.
In combination, these results display aspects of the underlying variance structure in the data and illustrate how different partitions of the data can be used to capture features of interest for exposure characterization. For this study, we determined that a partition of the data into six multipollutant categories was appropriate as it reasonably captures the variation of our pollutants (Figure 3c) and has the benefit of sample sizes that identify both typical and rarer combinations in the data (Figure 3b).
Spatial Profiles for Multipollutant Spatial Types
To begin, we present a 3×2 SOM characterizing ambient air quality using six categories of locations reflecting the range of multipollutant combinations modeled at locations in our study area. Each category defines a spatial profile describing a multipollutant spatial type (MST) and is referenced using SOM [x,y] coordinates. Furthermore, relative concentrations of pollutants for the MST profiles are visualized using barplots with mean centered values on a percentage scale (Figure 4) and actual concentrations are presented in Table 2.
Figure 4.
A 3×2 SOM describing long-term air quality across Atlanta, GA, during 2002 to 2008 with six multipollutant spatial types (MSTs). For each spatial type, barplots reflect the mean (±SD) pollutant concentration of grid cells assigned to each of the MSTs. The y-axis of each plot has been centered to the overall mean across all grid cells for each individual pollutant and is expressed on a percentage scale. SOM coordinates are in brackets [x,y] and the relative frequencies and within-class sample sizes are in the upper right hand corner of each panel.
Table 2.
Air pollutant and geographic summary statistics (mean (SD)) for grid cells assigned to each multipollutant spatial type.
| SOM [X,Y] | Area (km2) | Area % | CO | NO2 | NOX | O3 | SO2 | EC | OC | NH4 | NO3 | SO4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [1,1] | 2284.1 | 7.7 | 0.27 (0.01) | 5.42 (2.48) | 0.01 (0) | 0.04 (0) | 6.37 (1.17) | 0.47 (0.07) | 2.34 (0.25) | 1.2 (0.09) | 0.52 (0.06) | 4.05 (0.09) |
| [1,2] | 1774.1 | 6.0 | 0.4 (0.06) | 11.7 (2.92) | 0.02 (0.01) | 0.04 (0) | 14.85 (4.66) | 0.76 (0.12) | 2.76 (0.15) | 1.4 (0.1) | 0.6 (0.05) | 4.73 (0.15) |
| [2,1] | 11680.3 | 39.2 | 0.34 (0.05) | 8.44 (2.26) | 0.01 (0) | 0.04 (0) | 8.55 (1.67) | 0.63 (0.1) | 2.66 (0.19) | 1.27 (0.08) | 0.55 (0.05) | 4.31 (0.12) |
| [2,2] | 4316.3 | 14.5 | 0.61 (0.09) | 20.39 (3.19) | 0.04 (0.01) | 0.04 (0) | 9.22 (1.7) | 1.06 (0.12) | 2.97 (0.11) | 1.39 (0.06) | 0.62 (0.04) | 4.46 (0.13) |
| [3,1] | 8247.6 | 27.7 | 0.36 (0.05) | 8.77 (2.49) | 0.02 (0) | 0.04 (0) | 6.09 (1.03) | 0.61 (0.11) | 2.5 (0.2) | 1.48 (0.07) | 0.7 (0.05) | 4.23 (0.13) |
| [3,2] | 1479.9 | 5.0 | 0.86 (0.11) | 30.44 (2.71) | 0.07 (0.01) | 0.04 (0) | 11.69 (1.8) | 1.39 (0.13) | 3.18 (0.16) | 1.47 (0.05) | 0.65 (0.02) | 4.63 (0.13) |
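To connect the concentrations in Table 2 with the percentage-scale barplots of Figure 4, the sketch below expresses a class mean as a percent deviation from the overall mean. The formula 100 × (class mean − overall mean) / overall mean is an assumption about how "mean centered values on a percentage scale" were computed, and the inputs are the rounded values printed in Tables 1 and 2, so results are approximate.

```python
# Overall long-term means (Table 1) and MST [3,2] class means (Table 2).
overall = {"CO": 0.39, "NO2": 10.44, "SO2": 8.10, "EC": 0.69, "SO4": 4.30}
mst_32 = {"CO": 0.86, "NO2": 30.44, "SO2": 11.69, "EC": 1.39, "SO4": 4.63}

def percent_deviation(class_mean, overall_mean):
    """Class mean expressed as a percent difference from the overall mean."""
    return 100.0 * (class_mean - overall_mean) / overall_mean

profile = {p: round(percent_deviation(mst_32[p], overall[p]), 1) for p in overall}
print(profile)
```

Under this assumed formula, the primary pollutants in MST [3,2] (CO, NO2, EC) sit roughly 100% to 190% above the overall mean, while SO4 is less than 10% above it.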
The most typical grid cells in our study area are characterized by the MST profiles on the bottom row of Figure 4. The most common, MST [2,1], identified that 33% of the grid cells in our study area experienced below average concentrations for all pollutants. The second most frequent, MST [3,1], identifies that 26% of locations experienced conditions with above average NH4 and NO3 in combination with below average concentrations for all other pollutants. MST [1,1] captures conditions that experienced well below average concentrations for all pollutants and covered 19% of locations in our study area.
In the upper row of the map we find spatial types that were less common and indicative of grid cells experiencing higher levels of long-term average pollution. In the upper left, MST [1,2] covers 6% of grid cells in the study region that experienced the highest long-term concentrations of SO2 in conjunction with slightly above average concentrations for all other pollutants except O3 and NO3. MST [2,2] and MST [3,2] covered a combined 16% of locations with similar profiles exhibiting higher than average concentrations for all pollutants (in particular primary pollutants) except O3, which was slightly below average. However, MST [3,2] presents concentrations that are 1.2 times higher overall than MST [2,2].
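The text does not state how the overall factor of 1.2 between MST [3,2] and MST [2,2] was obtained. One plausible reading, offered here only as an assumption, is the average of the per-pollutant ratios of class means. The sketch below applies that reading to the rounded values in Table 2; the heavily rounded NOX and O3 entries make the result approximate.

```python
# Class means from Table 2 (rounded as published), in the table's column order.
pollutants = ["CO", "NO2", "NOX", "O3", "SO2", "EC", "OC", "NH4", "NO3", "SO4"]
mst_22 = [0.61, 20.39, 0.04, 0.04, 9.22, 1.06, 2.97, 1.39, 0.62, 4.46]
mst_32 = [0.86, 30.44, 0.07, 0.04, 11.69, 1.39, 3.18, 1.47, 0.65, 4.63]

# Ratio of MST [3,2] to MST [2,2] for each pollutant, then the simple average.
ratios = {p: hi / lo for p, lo, hi in zip(pollutants, mst_22, mst_32)}
mean_ratio = sum(ratios.values()) / len(ratios)
print(round(mean_ratio, 2))
```

The per-pollutant ratios are far from uniform: the primary pollutants (CO, NO2, NOX, EC) are about 1.3 to 1.75 times higher in MST [3,2], whereas the secondary PM2.5 components differ by less than 10%.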
The identified spatial profiles captured a range of combinations present in the data for locations in our study area, from conditions where all pollutants measured relatively low to conditions with high concentrations of secondary or primary pollutants or both.
Geographic Distribution of Multipollutant Spatial Types
Mapping the locational assignments of each MST illustrates an approach for characterizing the spatial distribution of the types of ambient air quality mixtures found in our study area (Figure 5). Results reveal strong contiguity of the classification assignments and indicate a tendency for multipollutant combinations to regionalize across the study area. Central locations representing the urban core in our study area were assigned to MST [3,2] and [2,2], indicating that 20% of the study area experienced relatively high levels of primary pollution. Given the proximity to major interstate highways, these conditions are likely reflective of areas that experienced high traffic volume. Moving away from downtown we can see that 28% of the study region, primarily in the upper northeastern corner, is dominated by above average long-term levels of NH4 and NO3 (MST [3,1]). Moving to the west of downtown we see a small collection of disjointed areas dominated by MST [1,2], a profile of high SO2. In Atlanta, monitored SO2 values are often associated with plume touch-downs from the coal fired power plant to the west of the study area so smaller geographic concentrations of high long-term pollutants located to the west of the city are consistent with these findings. The locations surrounding these high SO2 areas and to the south of downtown are assigned to MST [2,1], a relatively low long-term pollution profile. Finally, the outer boundaries in the north and south-southwest of our study region are assigned to the low pollution profile MST [1,1]. In sum, spatial distributions suggest that downtown areas are more consistently dominated by increased primary pollution and outer suburban areas of the study area experienced higher levels of secondary pollution (Figure 5), a finding broadly consistent with other research findings from Atlanta (Wade, Mulholland et al. 2006).
Figure 5.
Map illustrating the spatial distribution of our SOM-based multipollutant spatial types (MSTs) of ambient air quality across the Atlanta Metropolitan Area. MST [X,Y] coordinates correspond to the profile labels in Figure 4.
Population Characteristics of Multipollutant Spatial Types
Demographic summaries indicate that the largest proportion of the population (Table 3) resides in locations with air quality defined by MST [2,2], suggesting that a substantial segment of the Atlanta population experienced relatively high long-term levels of primary pollution (CO, NO2, NOX, and EC) during the study period. The second most populated air quality region is MST [3,2], which is a high pollution region that encompasses the downtown area. Population demographics (percent children, percent nonwhite, and percent of children below poverty level) in each MST follow trends in total population (as expected). However, considering the composition of the population in each MST, MSTs [2,2] and [3,2] have higher population densities and higher proportions of nonwhite residents (47% and 61%, respectively), and MST [3,2] has the highest proportion of children living in poverty (6% of total population; 24% of child population) compared to other MSTs (<=5% of total population; <=17% of child population).
Table 3.
Summary statistics for select variables from the 2010 US Census and 2008–2012 American Community Survey data.
| SOM [X,Y] | Area (km^2) | Proportion of Total Area (%) | Population Count | Proportion of Total Population (%) | Population Density (person/km^2) | Child Population Count | Proportion of Children (%) | Child Density (person/km^2) | Nonwhite Population Count | Proportion of Nonwhite (%) | Nonwhite Density (person/km^2) | Children in Poverty Count | Proportion of Children in Poverty (%) | Poverty Density (person/km^2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [1,1] | 2284.1 | 7.7 | 58349 | 1.0 | 25.5 | 15208 | 26.1 | 6.7 | 7404 | 12.7 | 3.2 | 2632 | 17.3 | 1.2 |
| [1,2] | 1774.1 | 6.0 | 267352 | 4.5 | 150.7 | 82890 | 31.0 | 46.7 | 65755 | 24.6 | 37.1 | 11010 | 13.3 | 6.2 |
| [2,1] | 11680.3 | 39.2 | 787416 | 13.4 | 67.4 | 231871 | 29.4 | 19.9 | 208541 | 26.5 | 17.9 | 36554 | 15.8 | 3.1 |
| [2,2] | 4316.3 | 14.5 | 2397102 | 40.7 | 555.4 | 733347 | 30.6 | 169.9 | 1134710 | 47.3 | 262.9 | 85435 | 11.7 | 19.8 |
| [3,1] | 8247.6 | 27.7 | 924671 | 15.7 | 112.1 | 277629 | 30.0 | 33.7 | 177944 | 19.2 | 21.6 | 40106 | 14.4 | 4.9 |
| [3,2] | 1479.9 | 5.0 | 1452472 | 24.7 | 981.5 | 375722 | 25.9 | 253.9 | 882255 | 60.7 | 596.2 | 90105 | 24.0 | 60.9 |
| Overall | 29782.4 | 100.0 | 5887362 | 100 | 197.7 | 1716667 | 29.2 | 57.6 | 2476609 | 42.1 | 83.2 | 265842 | 15.5 | 8.9 |
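The derived columns of Table 3 follow from the counts and areas by simple arithmetic. As a worked check, the sketch below recomputes the MST [3,2] row. Note that the proportion of children in poverty is taken relative to the child population, while the 6% figure quoted in the text is relative to the total population.

```python
# MST [3,2] row of Table 3.
area_km2 = 1479.9
population = 1452472
children = 375722
children_in_poverty = 90105

density = population / area_km2                               # persons per km^2
pct_children = 100.0 * children / population                  # % of total population
pct_child_poverty = 100.0 * children_in_poverty / children    # % of child population
pct_total_poverty = 100.0 * children_in_poverty / population  # % of total population

print(round(density, 1), round(pct_children, 1),
      round(pct_child_poverty, 1), round(pct_total_poverty, 1))
```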
DISCUSSION
In this study, we deconstructed complex air pollution data into six multipollutant spatial types (MSTs) that represent long-term patterns in air quality at locations across our study area. Overall, the identified MSTs captured a range of air quality scenarios across our locations as profiles included conditions dominated by: relatively low levels of pollution, relatively high concentrations of single pollutants, and relatively high levels of multiple pollutants (Figure 4). We found that the spatial contrasts were most evident for primary pollutants – in particular oxides of nitrogen (NOX and NO2) and to a somewhat lesser degree CO, EC, and SO2. These results agree well with other spatial studies of ambient air pollution in Atlanta and those that have relied on oxides of nitrogen to capture spatial variation in ambient air pollution (Briggs, Collins et al. 1997, Wade, Mulholland et al. 2006). Our results also identified very little spatial variation for secondary pollutants, i.e., O3, OC, NH4, NO3, and SO4. Such results indicate that primary pollutants (particularly traffic related) may be most useful in identifying spatial contrasts for long-term health effects studies of multipollutant mixtures. It is important to note that the data used in this study are the result of a data fusion between ambient air monitoring data and CMAQ model estimates and thus MST profiles reflect a blend of observations and expected concentrations based on the geographic distribution of emissions and meteorology in the region (Sororian, Holmes et al. 2014).
Mapping the geographic assignments of the MSTs reveals strong patterns in the spatial distribution of multipollutant air quality and resulted in the identification of clearly delineated multipollutant regions in our study area (Figure 5). With these results we found a general pattern of air quality slowly shifting from locations dominated by higher concentrations of primary pollutants to locations dominated by secondary pollution as one moves further away from downtown; however, certain areas outside the downtown area did experience elevated long-term NH4 and NO3 or SO2. As such, it is clear that the strongest multipollutant contrasts are found between central urban locations and peripheral suburban areas.
Analysis of the demographics associated with our MST regions showed that the largest proportion of the population, along with the largest proportions of our subpopulations of interest (children, nonwhites, and children in poverty), resided in locations where air quality was generally dominated by relatively high long-term levels of primary pollution. This finding agrees well with other studies showing that higher exposure to air pollution occurs in communities with higher proportions of residents in poverty and of minority residents (Molitor, Su et al. 2011, Hajat, Diez-Roux et al. 2013).
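The demographic comparison described above amounts to summing population counts within each MST region and expressing them as shares. The sketch below illustrates that arithmetic only. The records, the field layout, and the function name `population_share_by_class` are hypothetical, and the counts are invented for illustration; they are not results from this study.

```python
from collections import defaultdict

# Hypothetical records: (MST label, total population, nonwhite population).
# These numbers are invented for illustration and are not study data.
cells = [
    ("[1,1]", 1000, 300),
    ("[1,1]", 500, 100),
    ("[3,2]", 4000, 2400),
    ("[2,2]", 2500, 1000),
]


def population_share_by_class(records):
    """Return {label: (share of total population, nonwhite fraction within class)}."""
    totals = defaultdict(lambda: [0, 0])
    for label, population, nonwhite in records:
        totals[label][0] += population
        totals[label][1] += nonwhite
    grand_total = sum(population for population, _ in totals.values())
    return {
        label: (
            population / grand_total,
            nonwhite / population if population else 0.0,
        )
        for label, (population, nonwhite) in totals.items()
    }


shares = population_share_by_class(cells)
for label, (pop_share, nonwhite_fraction) in sorted(shares.items()):
    print(label, round(pop_share, 3), round(nonwhite_fraction, 3))
```

In practice the counts would come from census geographies intersected with the grid cells in a GIS; that overlay step is outside the scope of this sketch.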
We also illustrate how an unsupervised learning tool (SOM) can be paired with geographic information systems to identify regions experiencing similar multipollutant air quality within an intraurban environment from complex data. The SOM approach has the attractive ability to deconstruct complex data into an interpretable collection of categories that can be visualized on an array revealing associations (the SOM ‘map’) and in a geographic context (the traditional map) to promote further understanding of interclass relationships across the study area. For example, looking at the organization of profiles on our SOM (Figure 4), we are able to generally infer that residents assigned to MST [3,2] experience air quality similar to residents of MST [2,2] but very different air quality than residents assigned to MST [1,1]. This is because proximity reflects similarity of profiles on the SOM.
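To make the idea that proximity on the SOM reflects similarity more concrete, the following is a minimal sketch of an online SOM with a Gaussian neighborhood, written in plain Python. It is not the implementation used in this study. The 3 x 2 grid is an assumption based on the six MSTs and on coordinates such as [3,2] mentioned above, the reading of those coordinates as (x, y) is also assumed, and the data, learning-rate schedule, and function names are all hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, profile):
    """Return the grid coordinate of the node whose profile is closest."""
    return min(weights, key=lambda node: squared_distance(weights[node], profile))


def grid_distance(a, b):
    """Distance between two nodes on the SOM grid (not in pollutant space)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def train_som(data, n_x=3, n_y=2, n_iter=3000, seed=0):
    """Train a small rectangular SOM; returns {(x, y): profile vector}."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(1, n_x + 1) for y in range(1, n_y + 1)]
    # Initialise each node with a copy of a randomly chosen observation.
    weights = {node: list(rng.choice(data)) for node in nodes}
    for t in range(n_iter):
        frac = t / n_iter
        rate = 0.5 * (1.0 - frac) + 0.01    # learning rate decays linearly
        radius = 1.5 * (1.0 - frac) + 0.3   # neighborhood width shrinks
        profile = rng.choice(data)
        bmu = best_matching_unit(weights, profile)
        for node, w in weights.items():
            # Nodes near the winner on the grid are pulled more strongly,
            # which is what makes neighboring nodes end up with similar profiles.
            pull = rate * math.exp(-grid_distance(node, bmu) ** 2 / (2.0 * radius ** 2))
            for k in range(len(w)):
                w[k] += pull * (profile[k] - w[k])
    return weights


# Synthetic, standardized "long-term mean" profiles (4 pollutants, 60 locations).
data_rng = random.Random(1)
centres = [[-1.0] * 4, [0.0] * 4, [1.5] * 4]
data = [[c + data_rng.gauss(0.0, 0.2) for c in data_rng.choice(centres)]
        for _ in range(60)]

weights = train_som(data)
# Classify each location under its most similar node (its "MST").
assignments = [best_matching_unit(weights, profile) for profile in data]
```

Under these assumptions, `grid_distance((3, 2), (2, 2))` is smaller than `grid_distance((3, 2), (1, 1))`, which mirrors the reading of Figure 4 given above: adjacent nodes are expected to hold more similar profiles than nodes at opposite corners.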
A limitation of the approach presented here is our use of mean values at each location to represent local long-term pollution over the study period. Alternative measures, such as the maximum or variance, may be more useful in identifying locations that experience the most extreme or the most variable conditions. Another potential shortcoming is our inclusion of pollutants that demonstrate limited spatial variation across our study area. For example, ozone concentrations were found to be quite similar across our MSTs, and thus it is likely that O3 played a limited role in the formation of our spatial profiles. Nevertheless, we chose to include O3 in this study to assess the approach and to identify which pollutants in our available data would be most appropriate for developing spatial profiles; O3 is also an important health-relevant component of the air quality mixture in the Atlanta area (Strickland, Darrow et al. 2010, Pearce, Waller et al. 2015, Winquist, Schauer et al. 2015). Another potential limitation of the work was the 12 km spatial resolution of the data. While this improves spatial coverage considerably over air monitoring network data, it will be interesting to explore finer-scale resolutions for identifying intraurban spatial contrasts as such data become more widely available. Finally, our choice of six multipollutant spatial types was somewhat subjective; nevertheless, optimal statistical methods (e.g., clustering statistics) for identifying groups in data may not be optimal for defining pollutant-health associations.
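The point about alternative summary measures can be illustrated with a small sketch. The two series below are invented so that both locations share the same mean while differing in maximum and variance; the location names and values are hypothetical and are not drawn from the study data.

```python
from statistics import mean, pvariance

# Hypothetical concentration series for two locations (arbitrary units).
daily = {
    "cell_A": [10.0, 12.0, 11.0, 50.0],  # mostly low, with one extreme value
    "cell_B": [20.0, 21.0, 20.0, 22.0],  # consistently moderate
}


def summarize(series):
    """Return the mean, maximum, and population variance of one series."""
    return {
        "mean": mean(series),
        "max": max(series),
        "variance": pvariance(series),
    }


summaries = {cell: summarize(values) for cell, values in daily.items()}
for cell, stats in summaries.items():
    print(cell, stats)
```

A classification built only on the means would treat these two locations as identical, whereas one built on the maximum or the variance would separate them.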
The natural next step from this work would be to apply SOM to generate a multipollutant exposure metric for a long-term health effects study of multiple air pollutants. Beyond application in a health study, several areas in the development of our spatial exposure metrics could be refined. One area of interest is the evaluation of the importance of geographic scale for spatial studies of multiple pollutants. For example, a multipollutant metric developed for a study of the southeastern US might include different pollutants from one developed for a study of downtown Atlanta, owing to differences in the size of the study domains and in the manner in which pollutants vary within them. Another area of interest, which is currently under investigation in time-series mixture studies (Bobb, Valeri et al. 2014), involves variable selection with a goal of including only pollutants with reasonable spatial variability or pollutants with strong health associations. Another area of continued work involves the use of demographic data to guide the formation of multipollutant regions that minimize potential confounding. For example, the associations between poverty and poorer air quality seen here suggest that separation of an air pollution effect from a poverty effect in an epidemiological study may be difficult.
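One very simple way to screen pollutants for spatial variability, offered here only as an illustration, is to compute the coefficient of variation of the long-term means across locations and keep pollutants above a threshold. This is a screening heuristic, not the variable selection method of the cited work. The values, the 0.10 threshold, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical long-term means at four locations for two pollutants.
long_term_means = {
    "NOx": [10.0, 20.0, 40.0, 60.0],  # strong spatial contrast
    "O3": [40.0, 41.0, 42.0, 41.0],   # nearly uniform across locations
}


def spatial_cv(values):
    """Coefficient of variation across locations (std. dev. divided by mean)."""
    return pstdev(values) / mean(values)


def select_pollutants(means_by_pollutant, min_cv=0.10):
    """Keep pollutants whose spatial CV meets the (assumed) threshold."""
    return [
        name
        for name, values in means_by_pollutant.items()
        if spatial_cv(values) >= min_cv
    ]


selected = select_pollutants(long_term_means)
print(selected)
```

A screen of this kind addresses only spatial variability; it does not account for the strength of health associations, which would require outcome data.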
CONCLUSION
The method presented in this paper can be used both to elucidate the manner in which combinations of pollutants vary across geographic space and to explore associations with populations at risk of exposure. This approach can be useful for multiple purposes, including the development of epidemiologic studies of the long-term health effects of air pollutant mixtures.
Highlights
- This study applies the self-organizing map technique and geographic information systems to improve understanding of the spatial behavior of multiple pollutants and their relationships with populations at risk.
- Results reveal that a broad range of long-term conditions occurred across our study area, from relatively ‘clean’ locations to relatively ‘dirty’ areas.
- The spatial distribution of multipollutant behaviors identified distinct contrasts in the types of air quality experienced between downtown residents and suburban residents.
- Demographic characteristics showed that a substantial portion of the overall population, as well as of nonwhites and children in poverty, experiences relatively high levels of primary pollutants.
- This work lays a foundation for future epidemiologic studies of chronic health effects of multipollutant mixtures.
Acknowledgments
This publication was made possible, in part, by US Environmental Protection Agency grant R834799. USEPA does not endorse the purchase of any commercial products or services mentioned in the publication. Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number K99/R00ES023475. The content is solely the responsibility of the authors and does not necessarily represent the official views of the USEPA or NIH. We thank Anda Olsen for her comments on the writing, and the research team at the Southeastern Center for Air Pollution Epidemiology (SCAPE: http://www.scape.gatech.edu/) for their comments and reviews of this paper.
REFERENCES
- Alhanti BA, Chang HH, Winquist A, Mulholland JA, Darrow LA, Sarnat SE. Ambient air pollution and emergency department visits for asthma: a multi-city assessment of effect modification by age. Journal of Exposure Science and Environmental Epidemiology. 2015 doi: 10.1038/jes.2015.57.
- Austin E, Coull BA, Zanobetti A, Koutrakis P. A framework to spatially cluster air pollution monitoring sites in US based on the PM 2.5 composition. Environment international. 2013;59:244–254. doi: 10.1016/j.envint.2013.06.003.
- Bobb JF, Valeri L, Henn BC, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics. 2014 doi: 10.1093/biostatistics/kxu058.
- Briggs DJ, Collins S, Elliott P, Fischer P, Kingham S, Lebret E, Pryl K, van Reeuwijk H, Smallbone K, Van Der Veen A. Mapping urban air pollution using GIS: a regression-based approach. International Journal of Geographical Information Science. 1997;11(7):699–718.
- Chang HH, Warren JL, Darrow LA, Reich BJ, Waller LA. Assessment of critical exposure and outcome windows in time-to-event analysis with application to air pollution and preterm birth study. Biostatistics. 2015 doi: 10.1093/biostatistics/kxu060.
- Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from air pollution: shifting from a single-pollutant to a multi-pollutant approach. Epidemiology (Cambridge, Mass.) 2010;21(2):187. doi: 10.1097/EDE.0b013e3181cc86e8.
- Hajat A, Diez-Roux AV, Adar SD, Auchincloss AH, Lovasi GS, O’Neill MS, Sheppard L, Kaufman JD. Air pollution and individual and neighborhood socioeconomic status: evidence from the Multi-Ethnic Study of Atherosclerosis (MESA). Environmental health perspectives. 2013;121(11–12):1325. doi: 10.1289/ehp.1206337.
- Hoek G, Krishnan RM, Beelen R, Peters A, Ostro B, Brunekreef B, Kaufman JD. Long-term air pollution exposure and cardio-respiratory mortality: a review. Environ Health. 2013;12(1):43. doi: 10.1186/1476-069X-12-43.
- Jerrett M, Arain A, Kanaroglou P, Beckerman B, Potoglou D, Sahsuvaroglu T, Morrison J, Giovis C. A review and evaluation of intraurban air pollution exposure models. Journal of Exposure Science and Environmental Epidemiology. 2005;15(2):185–204. doi: 10.1038/sj.jea.7500388.
- Jerrett M, Burnett RT, Beckerman BS, Turner MC, Krewski D, Thurston G, Martin RV, van Donkelaar A, Hughes E, Shi Y. Spatial analysis of air pollution and mortality in California. American journal of respiratory and critical care medicine. 2013;188(5):593–599. doi: 10.1164/rccm.201303-0609OC.
- Kohonen T. Self-organizing maps. Berlin; New York: Springer; 2001.
- Laurent O, Bard D, Filleul L, Segala C. Effect of socioeconomic status on the relationship between atmospheric pollution and mortality. Journal of epidemiology and community health. 2007;61(8):665–675. doi: 10.1136/jech.2006.053611.
- Lee D, Ferguson C, Mitchell R. Air pollution and health in Scotland: a multicity study. Biostatistics. 2009 doi: 10.1093/biostatistics/kxp010.
- Levy I, Mihele C, Lu G, Narayan J, Brook JR. Evaluating multipollutant exposure and urban air quality: pollutant interrelationships, neighborhood variability, and nitrogen dioxide as a proxy pollutant. Environmental health perspectives. 2014;122(1):65. doi: 10.1289/ehp.1306518.
- Marshall JD, Nethery E, Brauer M. Within-urban variability in ambient air pollution: comparison of estimation methods. Atmospheric Environment. 2008;42(6):1359–1369.
- Molitor J, Su JG, Molitor NT, Rubio VG, Richardson S, Hastie D, Morello-Frosch R, Jerrett M. Identifying Vulnerable Populations through an Examination of the Association Between Multipollutant Profiles and Poverty. Environmental Science & Technology. 2011;45(18):7754–7760. doi: 10.1021/es104017x.
- Oakes M, Baxter L, Long TC. Evaluating the application of multipollutant exposure metrics in air pollution health studies. Environment international. 2014;69:90–99. doi: 10.1016/j.envint.2014.03.030.
- Pearce JL, Waller LA, Chang HH, Klein M, Mulholland JA, Sarnat JA, Sarnat SE, Strickland MJ, Tolbert PE. Using self-organizing maps to develop ambient air quality classifications: a time series example. Environmental Health. 2014;13(1):56. doi: 10.1186/1476-069X-13-56.
- Pearce JL, Waller LA, Mulholland JA, Sarnat SE, Strickland MJ, Chang HH, Tolbert PE. Exploring associations between multipollutant day types and asthma morbidity: epidemiologic applications of self-organizing map ambient air quality classifications. Environmental Health. 2015;14(1):55. doi: 10.1186/s12940-015-0041-8.
- Pooley K. Segregation's New Geography: The Atlanta Metro Region, Race, and the Declining Prospects for Upward Mobility. 2015.
- Pope CA, Burnett RT, Thurston GD, Thun MJ, Calle EE, Krewski D, Godleski JJ. Cardiovascular mortality and long-term exposure to particulate air pollution epidemiological evidence of general pathophysiological pathways of disease. Circulation. 2004;109(1):71–77. doi: 10.1161/01.CIR.0000108927.80044.7F.
- Riley EA, Banks L, Fintzi J, Gould TR, Hartin K, Schaal L, Davey M, Sheppard L, Larson T, Yost MG. Multi-pollutant mobile platform measurements of air pollutants adjacent to a major roadway. Atmospheric Environment. 2014;98:492–499. doi: 10.1016/j.atmosenv.2014.09.018.
- Sororian SA, Holmes HA, Friberg M, Ivey C, Hu Y, Mulholland JA, Russell AG, Strickland MJ. Air Pollution Modeling and its Application XXIII. Springer; 2014. Temporally and Spatially Resolved Air Pollution in Georgia Using Fused Ambient Monitor Data and Chemical Transport Model Results; pp. 301–306.
- Strickland MJ, Darrow LA, Klein M, Flanders WD, Sarnat JA, Waller LA, Sarnat SE, Mulholland JA, Tolbert PE. Short-term associations between ambient air pollutants and pediatric asthma emergency department visits. American journal of respiratory and critical care medicine. 2010;182(3):307–316. doi: 10.1164/rccm.200908-1201OC.
- Tolbert PE, Klein M, Peel JL, Sarnat SE, Sarnat JA. Multipollutant modeling issues in a study of ambient air quality and emergency department visits in Atlanta. Journal of Exposure Science and Environmental Epidemiology. 2007;17:S29–S35. doi: 10.1038/sj.jes.7500625.
- United States Census Bureau. TIGER/Line with Selected Demographic and Economic Data. 2010.
- US Census Bureau. How the Census Bureau Measures Poverty. 2015. Retrieved October 29, 2015, from https://www.census.gov/hhes/www/poverty/about/overview/measure.html.
- Vedal S, Kaufman JD. What does multi-pollutant air pollution research mean? American journal of respiratory and critical care medicine. 2011;183(1):4–6. doi: 10.1164/rccm.201009-1520ED.
- Wade KS, Mulholland JA, Marmur A, Russell AG, Hartsell B, Edgerton E, Klein M, Waller L, Peel JL, Tolbert PE. Effects of instrument precision and spatial variability on the assessment of the temporal variation of ambient air pollution in Atlanta, Georgia. Journal of the Air & Waste Management Association. 2006;56(6):876–888. doi: 10.1080/10473289.2006.10464499.
- Winquist A, Schauer JJ, Turner JR, Klein M, Sarnat SE. Impact of ambient fine particulate matter carbon measurement methods on observed associations with acute cardiorespiratory morbidity. Journal of Exposure Science and Environmental Epidemiology. 2015;25(2):215–221. doi: 10.1038/jes.2014.55.
- Yanosky JD, Schwartz J, Suh HH. Associations between measures of socioeconomic position and chronic nitrogen dioxide exposure in Worcester, Massachusetts. Journal of Toxicology and Environmental Health, Part A. 2008;71(24):1593–1602. doi: 10.1080/15287390802414307.





