Author manuscript; available in PMC: 2017 May 17.
Published in final edited form as: Environ Sci Technol. 2016 Apr 26;50(10):5111–5118. doi: 10.1021/acs.est.5b06001

Combining Land-Use Regression and Chemical Transport Modeling in a Spatio-temporal Geostatistical Model for Ozone and PM2.5

Meng Wang 1,*, Paul D Sampson 2, Jianlin Hu 3, Michael Kleeman 4, Joshua P Keller 5, Casey Olives 1, Adam A Szpiro 5, Sverre Vedal 1, Joel D Kaufman 1
PMCID: PMC5096654  NIHMSID: NIHMS825846  PMID: 27074524

Abstract

Assessments of long-term air pollution exposure in population studies have commonly employed land use regression (LUR) or chemical transport modeling (CTM) techniques. Attempts to incorporate both approaches in one modeling framework are challenging. We present a novel geostatistical modeling framework, incorporating CTM predictions into a spatio-temporal LUR model with spatial smoothing to estimate spatio-temporal variability of ozone (O3) and particulate matter with diameter less than 2.5 μm (PM2.5) from 2000 to 2008 in the Los Angeles Basin. The observations include over nine years’ data from more than 20 routine monitoring sites and specific monitoring data at over 100 locations to provide more comprehensive spatial coverage of air pollutants. Our composite modeling approach outperforms separate CTM and LUR models in terms of root mean square error (RMSE) assessed by 10-fold cross-validation in both temporal and spatial dimensions, with larger improvement in the accuracy of predictions for O3 (RMSE [ppb] for CTM: 6.6, LUR: 4.6, composite: 3.6) than for PM2.5 (RMSE [μg/m3] CTM: 13.7, LUR: 3.2, composite: 3.1). Our study highlights the opportunity for future exposure assessment to make use of readily available spatio-temporal modeling methods and auxiliary gridded data that takes chemical reaction processes into account to improve the accuracy of predictions in a single spatio-temporal modeling framework.


1. INTRODUCTION

Long-term exposure to air pollution has been associated with adverse health outcomes.1 There is currently increased interest in estimating health effects with fine-scale estimates of exposure at participant locations in large populations, taking into account intra-urban differences in air pollution levels,2–4 because exposure assignment at a community spatial scale can bias health effect estimates.5, 6 Air pollution varies at multiple spatio-temporal scales. Owing to the complexity of intra-urban sources, concentration gradients and atmospheric processes, simple spatial interpolation techniques (e.g., inverse distance weighting and ordinary kriging) that rely on a limited number of monitoring sites cannot accurately capture fine-scale spatio-temporal patterns of air pollutants. More advanced exposure estimation techniques include land use regression (LUR) and chemical transport models (CTMs), which rest on distinct methodological principles. LUR modeling uses statistical methods to combine air pollution measurements with data from geographic information systems (GIS) to explain spatial variation in concentrations.7 An LUR model is well suited to characterizing small-scale spatial variability of air pollutants, reflecting, for example, roadside dispersion profiles, but its performance is largely limited by the number and spatial distribution of sampling sites.8, 9 A CTM relies on deterministic equations and uses data on emissions, meteorological conditions and topography to dynamically simulate the physico-chemical processes of pollutant transport and atmospheric chemistry, and thereby estimates outdoor air pollution concentrations.10 CTMs have been increasingly used to predict regional distributions of population exposures over large spatial domains and to estimate long-term historical exposure.11–13 However, the relatively coarse spatial resolution of a CTM (≥ 4 km) and imperfect emissions information restrict its ability to characterize air pollution concentrations at very local scales (i.e., meters) for exposure assessment.10

Several studies have attempted to improve model performance by combining monitoring data with CTM14 or LUR15 predictions, with some success. However, few studies have integrated all data sources in one framework; an exception is a Bayesian Maximum Entropy spatial composite model reported for Barcelona.16 On the one hand, the limitations of LUR models, which often do not incorporate chemical and meteorological processes, could be offset by introducing CTM predictions. On the other hand, the relatively low spatial resolution of CTM predictions may be enhanced by the finer-resolution contributions of LUR models. A composite approach combining LUR and CTM in one framework could therefore be advantageous.

We present a composite geostatistical modeling framework, combining a spatio-temporal LUR (ST-LUR) model with CTM predictions to estimate spatio-temporal variability of ozone (O3) and particulate matter with diameter less than 2.5 μm (PM2.5) from 2000 to 2008 in the Los Angeles Basin, California. The performance of the composite model is evaluated and compared with the performances of the individual CTM and ST-LUR models.

2. METHODS

2.1 Monitoring Data

The O3 and PM2.5 monitoring data used to fit our spatio-temporal LUR models include continuous long-term measurements from the routine monitoring sites of the U.S. Environmental Protection Agency (EPA) in the South Coast Air Quality Management District (SCAQMD), and spatially dense (but temporally sparse) monitoring data collected by the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air).17, 18 Details of the monitoring data have been reported elsewhere.19, 20 In brief, we selected 37 O3 and 22 PM2.5 routine monitoring sites within two 75 km buffers surrounding the Los Angeles metropolitan center and a nearby site in Riverside County (Figure S1). Daily averages of O3 and PM2.5 at the routine monitoring sites were aggregated to a two-week time scale to mitigate the influence of daily meteorology and to match the temporal scale of the MESA Air sampling design. To better capture small-scale spatial variation, we measured O3 and PM2.5 concentrations at 117 and 113 participants' addresses, respectively (home sites, Figure S1), with 1–3 two-week samples per site, and at 2 O3 and 7 PM2.5 fixed sites operated continuously for one and four years, respectively. The home sites were densely distributed in urban areas of Los Angeles and Riverside. We used monitoring data from 2000 to 2008, the period for which CTM predictions are available. PM2.5 observations were log-transformed because of their skewed distribution.
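As an illustration only, the aggregation of daily site averages to two-week periods and the log transformation of PM2.5 can be sketched in Python. The sketch assumes a list of (date, value) records for a single site, periods anchored at an arbitrary start date, and a hypothetical completeness rule (`min_days`); none of these choices is taken from the study, and the natural logarithm is assumed because the log base is not stated.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def two_week_averages(daily, start, min_days=7):
    """Average daily values into consecutive 14-day periods beginning at `start`.

    daily: iterable of (datetime.date, float or None); None values are skipped.
    min_days: hypothetical completeness rule; periods with fewer valid days are dropped.
    Returns {period_start_date: mean_value}.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is None:
            continue
        index = (day - start).days // 14  # which two-week period the day falls in
        if index < 0:
            continue  # ignore days before the first period
        buckets[index].append(value)
    return {
        start + timedelta(days=14 * k): sum(vals) / len(vals)
        for k, vals in sorted(buckets.items())
        if len(vals) >= min_days
    }


def log_transform(averages):
    """Natural-log transform of positive two-week PM2.5 averages (reduces right skew)."""
    return {k: math.log(v) for k, v in averages.items() if v > 0}


# Usage: 28 days of synthetic PM2.5 values starting 2000-01-03.
start = date(2000, 1, 3)
daily = [(start + timedelta(days=d), 10.0 if d < 14 else 20.0) for d in range(28)]
avg = two_week_averages(daily, start)
logged = log_transform(avg)
```

Anchoring the periods to a fixed start date keeps the routine-site averages aligned with a common two-week calendar, which is what allows them to be compared with two-week samplers.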

2.2 Chemical Transport Modeling Data

We employed the UC Davis-California Institute of Technology (UCD-CIT) air quality model to simulate daily O3 (8-hour maximum) and daily PM2.5 (24-hour average) concentrations over the South Coast Air Basin from 2000 to 2008 (Figure 1).21 The UCD-CIT model is a 3D Eulerian source-oriented chemical transport model that includes a complete description of atmospheric transport, deposition, chemical reactions and gas-particle transfer. The model was configured with a nested domain at 4×4 km2 spatial resolution and 16 vertical layers up to a height of 5 km above ground level. Further details of the standard algorithms of the UCD-CIT airshed model and its evolution have been presented previously.22–24 The Weather Research and Forecasting model (WRF v3.1.1)25 was used to simulate historical meteorology over the 9-year period. Standard emission inventories for anthropogenic sources (i.e., point sources, stationary area sources and mobile sources) were obtained from the California Air Resources Board (CARB).26 Hourly gridded gas and particulate emissions were generated using an updated version of the emission model described by Kleeman and Cass.27 Additional emission sources included biogenic emissions from the Biogenic Emissions Inventory System v3.14,28 in-line sea-salt emissions described by de Leeuw et al.,29 and wildfire and open burning emissions from the Fire Inventory from NCAR (FINN).30

Figure 1.


Long-term average of (a) O3 (daily 8-hour maximum) and (b) PM2.5 (daily 24-hour average) concentrations estimated by CTM with 4×4 km resolution and distribution of monitoring sites.

The UCD-CIT predicted concentrations of O3 and PM2.5 were evaluated against ambient measurements at all available locations and times. Daily maximum O3 predictions agree well with measurements across the entire modeling domain, with an overall mean fractional bias of less than 0.1 and a mean fractional error of less than 0.2. PM2.5 total mass predictions also meet the model performance criteria, with a mean fractional bias within 0.2 and a mean fractional error of 0.5, although PM2.5 sulfate and nitrate were under-predicted, with mean fractional biases below −0.3.21 The quality of these predictions reflects the accuracy of emissions inventories that have been refined over three decades in California, the development of reactive chemical transport models that include important aerosol transformation mechanisms, and the development of prognostic meteorological models that allow long simulations of historical meteorology.
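The mean fractional bias (MFB) and mean fractional error (MFE) quoted above are commonly defined by normalizing each prediction-observation difference by the mean of the pair. Assuming those standard definitions (the exact formulas used in the UCD-CIT evaluation are not given here), a minimal sketch is:

```python
def mean_fractional_bias(pred, obs):
    """MFB = (1/N) * sum((P - O) / ((P + O) / 2)); bounded between -2 and +2."""
    pairs = list(zip(pred, obs))
    return sum((p - o) / ((p + o) / 2.0) for p, o in pairs) / len(pairs)


def mean_fractional_error(pred, obs):
    """MFE = (1/N) * sum(|P - O| / ((P + O) / 2)); bounded between 0 and 2."""
    pairs = list(zip(pred, obs))
    return sum(abs(p - o) / ((p + o) / 2.0) for p, o in pairs) / len(pairs)


# Usage with two hypothetical prediction/observation pairs (ppb).
mfb = mean_fractional_bias([44.0, 36.0], [40.0, 40.0])
mfe = mean_fractional_error([44.0, 36.0], [40.0, 40.0])
```

Because each term is bounded, a single large over-prediction cannot dominate either summary, which is one reason these normalized metrics are popular for evaluating CTM output against monitors with very different concentration levels.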

To make the CTM 8-hour maximum O3 predictions more comparable with the 2-week average measurement data, we applied a two-stage calibration approach to convert O3 8-hour maximum concentrations to a daily average for further aggregation. This is described in detail in the section below on the spatio-temporal model development. The CTM predictions of daily PM2.5 and calibrated 8-hour maximum O3 concentrations were subsequently averaged to a two-week time scale to match the format of the MESA Air monitoring data.

2.3 Geographic covariates

More than 180 geographic covariates were used to develop the ST-LUR and composite models. The variables covered a wide range of geographic features, such as road networks (e.g., distances to nearby major roads and, within buffers, lengths of roads and truck routes and counts of intersections), industrial and port emissions, population density, land use (e.g., commercial space) and land cover (e.g., green space).19 We also incorporated as covariates annual average emissions of NOx, SO2, CO, PM2.5 and PM10 from specific sources, obtained from the U.S. EPA Emission Inventory Group, and a long-term average of California Line-source dispersion model (Caline3QHCR) predictions derived from primary mobile-source emissions, as these are potentially related to O3 and PM2.5 formation and destruction.31 The Caline3QHCR model accounts for distance, traffic volume, meteorology and diurnal traffic patterns in the study region. Details of the geographic variables and their selection criteria are given in Table S1 and in a previous publication.19

2.4 Spatio-temporal model development

We developed a hierarchical modeling strategy to accommodate the specific features of our data described above. Technical details of the implementation, including the model structures and principles,32–35 and recent applications of the model to PM2.5 and O3 without CTM predictions, have been published.19, 20 The model comprises a spatio-temporal mean (trend) model and spatio-temporal residuals:

$$C(s,t) = \mu(s,t) + \nu(s,t) \qquad (1)$$

where C(s,t) denotes the two-week average concentration of PM2.5 or O3 at location s and time t, and μ(s,t) represents the spatio-temporal mean surface. The term ν(s,t) represents zero-mean spatio-temporal residual variation, which has a spatial correlation structure but is assumed here to be independent across two-week time points. The mean μ(s,t) is further decomposed into a long-term average β0(s) at location s, a linear combination of temporal trend basis functions fi(t) with spatially varying coefficient fields βi(s), and a spatio-temporal term M(s,t) representing the CTM predictions, with coefficient γ:

$$\mu(s,t) = \beta_0(s) + \sum_{i=1}^{m} \beta_i(s)\, f_i(t) + \gamma\, M(s,t) \qquad (2)$$

We refer to this as the composite model and call the model without CTM predictions (equivalently, γ=0) the ST-LUR model.
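To make the structure of Eq. (2) concrete, the following NumPy sketch evaluates the mean surface μ(s,t) for S locations and T two-week periods, given already-estimated components. The array layout and the function name `composite_mean` are illustrative assumptions, not the SpatioTemporal package's interface; setting `gamma = 0` reproduces the ST-LUR mean.

```python
import numpy as np


def composite_mean(beta0, betas, trends, ctm, gamma):
    """mu(s,t) = beta0(s) + sum_i beta_i(s) f_i(t) + gamma * M(s,t), as an (S, T) array.

    beta0:  (S,)    long-term mean at each location
    betas:  (S, m)  spatially varying coefficients beta_i(s)
    trends: (T, m)  temporal basis functions f_i(t)
    ctm:    (S, T)  CTM term M(s,t) on the same two-week time axis
    gamma:  float   CTM coefficient (gamma = 0 gives the ST-LUR mean)
    """
    beta0 = np.asarray(beta0, dtype=float)
    betas = np.asarray(betas, dtype=float)
    trends = np.asarray(trends, dtype=float)
    ctm = np.asarray(ctm, dtype=float)
    # betas @ trends.T gives sum_i beta_i(s) f_i(t) for every (s, t) pair.
    return beta0[:, None] + betas @ trends.T + gamma * ctm


# Two locations, three two-week periods, one time trend.
mu = composite_mean(beta0=[30.0, 40.0], betas=[[2.0], [1.0]],
                    trends=[[1.0], [0.0], [-1.0]],
                    ctm=np.full((2, 3), 10.0), gamma=0.5)
# rows: [37, 35, 33] and [46, 45, 44]
```

Adding a draw of the residual ν(s,t) to this array would give a simulated C(s,t) under Eq. (1).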

We derived the time trends fi(t) from the routine monitoring and MESA Air fixed sites, whose long, continuous records characterize the temporal structure across the entire study region. Specifically, the trends were computed from a singular value decomposition (SVD) of the space-time data matrix for the routine and fixed sites using the SpatioTemporal R package,31–34 with missing or incomplete monitoring data handled by an iterative EM (Expectation-Maximization) procedure described in detail by Sampson et al.34 The number of time trends m was selected by leave-one-site-out cross-validation (LOOCV) of the built-in SVD function in R; the number of trends giving the lowest mean square error and Akaike information criterion (AIC) was chosen.
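A simplified sketch of SVD-based trend extraction with iterative imputation of missing entries follows. It assumes a (time × site) matrix with NaN for missing two-week values and fills the gaps by repeatedly replacing them with a rank-m reconstruction, an EM-style scheme in the spirit of, but not identical to, the procedure of Sampson et al. The function name `svd_time_trends` is hypothetical, and the sketch neither smooths the resulting vectors nor selects m by cross-validation.

```python
import numpy as np


def svd_time_trends(X, m, n_iter=200, tol=1e-8):
    """Return (trends, filled) from a (T x n_sites) matrix X that may contain NaNs.

    Missing cells start at their column (site) means and are then repeatedly replaced
    by a rank-m SVD reconstruction until the imputed values stop changing. `trends`
    holds the m leading left singular vectors (shape T x m), each rescaled to unit
    standard deviation; their signs are arbitrary.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        centre = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - centre, full_matrices=False)
        recon = centre + (U[:, :m] * s[:m]) @ Vt[:m]
        update = np.where(missing, recon, X)  # observed cells are never altered
        converged = np.max(np.abs(update - filled)) < tol
        filled = update
        if converged:
            break
    centre = filled.mean(axis=0)
    U, _, _ = np.linalg.svd(filled - centre, full_matrices=False)
    trends = U[:, :m] / U[:, :m].std(axis=0)
    return trends, filled


# Usage: 52 two-week periods at 6 sites sharing one seasonal pattern, ~10% missing.
rng = np.random.default_rng(0)
t = np.arange(52)
X_demo = 20 + np.outer(np.sin(2 * np.pi * t / 26), rng.uniform(1, 3, 6))
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan
trends_demo, filled_demo = svd_time_trends(X_demo, m=1)
```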

The spatially varying long-term average β0(s) and time-trend coefficients βi(s) are modeled in a universal kriging framework with an LUR mean and either an exponential or an independent covariance structure.36 Each LUR mean comprises component scores, estimated by Partial Least Squares (PLS) at the routine and fixed sites, that are linear combinations of the geographic covariates.37 The number of PLS scores was selected based on the LOOCV R2 of the built-in PLS function in R, aiming for the most efficient description of spatial variation in β0(s) and βi(s) while minimizing the risk of overfitting. The spatio-temporal term M(s,t) in the composite model represents two-week averages of the 24-hour average CTM predicted concentrations at each location s. These were calculated at monitor and subject locations as distance-weighted averages of the CTM predictions at the four surrounding grid cell centers. To match the daily 8-hour maximum CTM O3 predictions to the two-week averages of 24-hour average O3 concentrations from MESA Air and routine monitoring, we conducted a two-stage calibration. In the first stage, we regressed the two-week average monitoring observations on the CTM 8-hour maximum predictions M0(s,t) at each routine monitoring and fixed site, allowing the slope (α1) and intercept (α0) to vary from site to site:
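The interpolation of gridded CTM output to monitor and subject locations can be sketched as an inverse-distance-weighted average of the four surrounding 4 km cell centers. The source states only that a distance-weighted average was used, so the inverse-distance form, the exponent `power`, the grid layout and the function name `four_cell_idw` are all assumptions for illustration.

```python
import numpy as np


def four_cell_idw(x, y, grid, x0, y0, cell=4000.0, power=1.0):
    """Inverse-distance-weighted average of the four CTM cell centres surrounding (x, y).

    grid: 2-D array indexed [row, col]; cell (r, c) has centre
          (x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell). Coordinates in metres.
    power: hypothetical IDW exponent (the source only says 'distance-weighted').
    """
    grid = np.asarray(grid, dtype=float)
    # Index of the cell centre immediately below and to the left of the point.
    c = int(np.floor((x - x0) / cell - 0.5))
    r = int(np.floor((y - y0) / cell - 0.5))
    # Clamp so that both c, c + 1 and r, r + 1 are valid indices.
    c = min(max(c, 0), grid.shape[1] - 2)
    r = min(max(r, 0), grid.shape[0] - 2)
    num = den = 0.0
    for rr in (r, r + 1):
        for cc in (c, c + 1):
            cx = x0 + (cc + 0.5) * cell
            cy = y0 + (rr + 0.5) * cell
            d = np.hypot(x - cx, y - cy)
            if d == 0.0:
                return float(grid[rr, cc])  # point sits exactly on a cell centre
            w = d ** -power
            num += w * grid[rr, cc]
            den += w
    return float(num / den)


# Usage: a 2 x 2 block of cells with centres at 2 km and 6 km.
block = [[10.0, 20.0], [30.0, 40.0]]
value = four_cell_idw(3000.0, 2000.0, block, x0=0.0, y0=0.0)
```

A bilinear interpolation would be an equally plausible reading of "distance-weighted"; the inverse-distance form is used here only because it is short.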

$$M(s,t) = \hat{\alpha}_1(s)\, M_0(s,t) + \hat{\alpha}_0(s) \qquad (3)$$

In the second stage, the site-specific slopes and intercepts were regressed on geographic covariates using PLS.18,19 This PLS model enabled prediction of the calibration coefficients at home sites and subject locations. To avoid overfitting, we selected two PLS scores each for the slope and the intercept, based on LOOCV R2 at the routine monitoring and fixed sites and on the correlation between the calibrated CTM daily predictions and the observations at the home sites.
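The two-stage calibration can be sketched as follows: ordinary least squares per site for Eq. (3), then a regression of the site-specific coefficients on geographic covariates to predict them at unmonitored locations. The study used PLS with two scores in the second stage; plain least squares is swapped in here only to keep the example dependency-free, and all function names are hypothetical.

```python
import numpy as np


def stage1_site_fits(obs, ctm):
    """Per-site OLS of two-week observations on CTM predictions (Eq. 3).

    obs, ctm: lists of 1-D arrays, one pair per site.
    Returns arrays (alpha1, alpha0) of site-specific slopes and intercepts.
    """
    slopes, intercepts = [], []
    for y, m0 in zip(obs, ctm):
        A = np.column_stack([np.asarray(m0, float), np.ones(len(m0))])
        (a1, a0), *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
        slopes.append(a1)
        intercepts.append(a0)
    return np.array(slopes), np.array(intercepts)


def stage2_fit(coefs, covariates):
    """Regress site coefficients on covariates (least squares standing in for PLS)."""
    Z = np.column_stack([np.ones(len(coefs)), covariates])
    theta, *_ = np.linalg.lstsq(Z, coefs, rcond=None)
    return theta


def calibrate(m0_new, covariates_new, theta1, theta0):
    """Apply Eq. (3) at a new location using its predicted slope and intercept."""
    z = np.concatenate([[1.0], np.atleast_1d(covariates_new)])
    return (z @ theta1) * np.asarray(m0_new, float) + z @ theta0
```

In use, `stage1_site_fits` would be run on the routine and fixed sites, `stage2_fit` once for the slopes and once for the intercepts, and `calibrate` at each home site or subject location.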

Once the time trends fi(t), the PLS scores for β0(s) and βi(s), and the calibration slopes (α1) and intercepts (α0) had been computed, the regression coefficients and covariance parameters were estimated by maximum likelihood using the SpatioTemporal package in R version 3.1.1.33
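For a single spatially varying field, such as β0(s) observed at the sites, the maximum-likelihood step can be illustrated with the profile negative log-likelihood of a universal kriging model with exponential covariance, in which the regression coefficients are replaced by their generalized least squares estimates. This is a simplified, hypothetical sketch: the SpatioTemporal package estimates all fields and the residual ν(s,t) jointly, and in practice the covariance parameters would be handed to a numerical optimizer rather than a small grid search.

```python
import numpy as np


def exp_cov(coords, sill, range_, nugget):
    """Exponential covariance sill * exp(-d / range_) plus a nugget on the diagonal."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_) + nugget * np.eye(len(coords))


def profile_nll(y, X, coords, sill, range_, nugget):
    """Negative log-likelihood of y ~ N(X beta, Sigma) with beta set to its GLS estimate."""
    S = exp_cov(coords, sill, range_, nugget)
    L = np.linalg.cholesky(S)
    Li_X = np.linalg.solve(L, X)  # whitened design: L^{-1} X
    Li_y = np.linalg.solve(L, y)  # whitened response: L^{-1} y
    beta, *_ = np.linalg.lstsq(Li_X, Li_y, rcond=None)  # GLS estimate of beta
    r = Li_y - Li_X @ beta  # whitened residuals
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    nll = 0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ r)
    return nll, beta


# Usage: simulate 40 sites with an intercept plus one LUR score, then
# pick the best range parameter from a coarse grid (sill and nugget held fixed).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(40, 2))
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(
    np.zeros(40), exp_cov(coords, 1.0, 2.0, 0.1))
best_range = min([0.5, 1.0, 2.0, 4.0],
                 key=lambda r: profile_nll(y, X, coords, 1.0, r, 0.1)[0])
```

With an independent covariance (zero sill), the GLS step reduces to ordinary least squares, which corresponds to the "No" smoothing option in Table 1.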

2.5 Model validation and comparison

We compared the composite model with the ST-LUR model without CTM predictions and with calibrated CTM predictions alone. We used cross-validation (CV) to evaluate model performance, holding the pre-computed time trends and PLS scores fixed. For each assessment, we computed the root mean square error (RMSE) and CV R2 (defined as the squared correlation between predictions and observations) to quantify the accuracy and precision of each model.
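The two performance metrics can be written in a few lines. CV R2 is defined here, as in the text, as a squared correlation rather than as 1 − SSE/SST, so it is insensitive to additive or multiplicative bias; reporting RMSE alongside it captures that bias. The function names are illustrative.

```python
import numpy as np


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def cv_r2(pred, obs):
    """CV R2 as defined in the text: squared Pearson correlation of predictions and observations."""
    return float(np.corrcoef(pred, obs)[0, 1] ** 2)
```

For example, predictions that are uniformly 5 units too high have an R2 of 1 but an RMSE of 5.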

We used ten-fold CV, successively leaving out one-tenth of the monitoring sites, to validate predictions at the routine monitoring and fixed site locations as well as at the home site locations; the latter were intended to reflect spatial contrasts of O3 and PM2.5 at the places of most interest. For the routine monitoring and fixed sites, we evaluated performance in representing temporal variability using the across-site median of the site-specific squared correlation coefficients (R2) computed at the two-week scale over the entire study period. We similarly scored performance in representing spatial variability using the across-year median of the R2 computed on annual averages at the routine monitoring and fixed sites. The spatial patterns of the RMSE distribution were assessed visually on maps, and we identified geographical features associated with differences in estimation error between models. In addition, we separately examined the predictive ability of the models in the cold and warm seasons at the routine monitoring and fixed sites.
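The site-grouped ten-fold scheme and the median summaries can be sketched as below. Assigning whole sites to folds ensures that a held-out site contributes no data to the fit that predicts it. `median_group_r2` takes a grouping vector: grouping two-week records by site gives the temporal summary, and grouping annual averages by year gives the spatial summary. The random fold assignment and the function names are assumptions, not the study's code.

```python
import numpy as np


def site_folds(site_ids, k=10, seed=0):
    """Assign every unique site to one of k folds; all records of a site share a fold."""
    sites = np.unique(site_ids)
    shuffled = np.random.default_rng(seed).permutation(sites)
    fold_of = {s: i % k for i, s in enumerate(shuffled)}
    return np.array([fold_of[s] for s in site_ids])


def median_group_r2(groups, pred, obs):
    """Median over groups of the squared correlation between predictions and observations.

    groups = site id  -> temporal summary (two-week series at each site)
    groups = year     -> spatial summary (annual averages across sites in each year)
    Groups with fewer than three records are skipped.
    """
    groups, pred, obs = (np.asarray(a) for a in (groups, pred, obs))
    r2 = []
    for g in np.unique(groups):
        mask = groups == g
        if mask.sum() > 2:
            r2.append(np.corrcoef(pred[mask], obs[mask])[0, 1] ** 2)
    return float(np.median(r2))
```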

3. RESULTS

Figure 1 shows the spatial pattern of study-period average O3 (daily maximum 8-hour average) and PM2.5 concentrations from the CTM before temporal calibration. The estimated values varied substantially over the spatial domain. The CTM predicted lower O3 concentrations in urban areas such as downtown Los Angeles than in rural and mountain areas, a pattern generally opposite to that of the PM2.5 predictions.

Table 1 summarizes the O3 and PM2.5 ST-LUR and composite model specifications. Both models included two time trends, with the first time trend explaining most of the temporal variability. Three and two PLS scores were selected for the long-term mean β0(s) in the O3 and PM2.5 models, respectively, and in both models these coefficients were spatially smoothed using an exponential covariance model. For the time-trend coefficients, one PLS score was selected for each O3 trend, and one and two PLS scores for the first and second PM2.5 trends, respectively, without further spatial smoothing. In general, long-term mean O3 was positively associated with elevation and green space (e.g., grass and shrub) and negatively associated with features indicating primary emissions (e.g., traffic and anthropogenic factors, including population and impervious surfaces) (see Figure S2, which characterizes the PLS scores in terms of the GIS covariates). This agrees with the known atmospheric processes of ozone formation. For instance, ozone is usually scavenged by primary NO emissions where NOx/VOC ratios are high, such as in urban core areas with dense traffic, and is produced from biogenic VOCs emitted by plants and from NOx in rural areas or in locations downwind of major urban areas.15 Moreover, predictor variables such as population density and impervious surface do not merely represent pollution from anthropogenic activities such as traffic, wood burning and home heating, but also reflect urban-rural differences in concentration. In contrast, long-term mean PM2.5 concentrations were negatively associated with green space and positively associated with the primary emission indicators.

Table 1.

Spatio-temporal model specifications for O3 and PM2.5

Pollutant | Routine monitoring sites | Fixed sites | Home sites | Time trends | PLS scores¹ | Spatial smoothing², long-term mean (β0) | Spatial smoothing², time trends (βi)
O3 | 37 | 2 | 117 | 2 | (3, 1, 1) | Yes | No
PM2.5 | 22 | 7 | 113 | 2 | (2, 1, 2) | Yes | No

¹ Number of PLS scores for each trend (long-term mean, 1st time trend, 2nd time trend).

² Yes: exponential covariance structure; No: independent covariance structure.

Table 2 presents the performance of the three modeling approaches for O3 and PM2.5 at the routine monitoring and MESA Air fixed sites and at the home locations. The O3 CTM for 8-hour maximum concentrations, calibrated to the daily average level, performed moderately well at both the home sites (RMSE: 6.55 ppb; R2: 0.60) and the routine monitoring and fixed sites (RMSE: 8.83 ppb; R2: 0.56), with better prediction ability in time than in space. The PM2.5 CTM, however, performed poorly in representing the temporal variability of total PM2.5 mass at this two-week time scale (RMSE: 9.58 μg/m3; R2: 0.28), but showed good spatial predictive performance (RMSE: 7.11 μg/m3; R2: 0.59). A previous evaluation of the CTM identified an over-estimation of dust emissions that led to over-estimated PM2.5 mass concentrations and damped relative temporal variability. Predicted PM2.5 mass concentrations at other major California cities agree well with measurements, as do predicted PM2.5 components (elemental carbon, organic carbon, nitrate, etc.) at all California cities, including Los Angeles.21

Table 2.

Model performances at the home and routine monitoring (RM)/Fixed sites.

RMSE (R2) | O3 CTM | O3 ST-LUR | O3 Composite | PM2.5 CTM | PM2.5 ST-LUR | PM2.5 Composite
Home: overall¹ | 6.55 (0.60) | 4.56 (0.78) | 3.62 (0.84) | 13.73 (0.32) | 3.20 (0.78) | 3.10 (0.80)
RM/Fixed: overall² | 8.83 (0.56) | 5.64 (0.86) | 4.65 (0.87) | 10.15 (0.36) | 3.41 (0.81) | 3.35 (0.82)
RM/Fixed: spatial³ | 7.35 (0.42) | 4.88 (0.75) | 3.57 (0.78) | 7.11 (0.59) | 1.31 (0.89) | 1.17 (0.93)
RM/Fixed: temporal⁴ | 6.78 (0.67) | 4.83 (0.89) | 3.80 (0.91) | 9.58 (0.28) | 3.26 (0.82) | 3.19 (0.82)

RMSE is in ppb for O3 and μg/m3 for PM2.5.

¹ Home: home sites. Ten-fold CV (note: CTM predictions are not cross-validated).

² RM/Fixed: routine monitoring and fixed sites. Ten-fold CV (note: O3 predictions are cross-validated in the calibration of the PLS regression; PM2.5 CTM predictions are not cross-validated).

³ Median RMSE and R2 across years, based on annual averages at each routine monitoring and fixed site; reflects the spatial prediction ability of the models.

⁴ Median RMSE and R2 across individual sites, based on predictions and observations at two-week time points over the entire study period.

The ST-LUR model performed similarly well for O3 and PM2.5, with better predictive performance in terms of R2 and RMSE (Table 2) and less overall bias than the CTM (Figure 2). The ST-LUR model tended to overestimate O3 concentrations at home sites in urban areas, where O3 is scavenged by NO emitted by motor vehicles, while it underestimated O3 concentrations in background areas where routine monitoring sites are located (Figure S1). For PM2.5, the bias of the ST-LUR predictions was small (close to zero) at all sites.

Figure 2.


Distribution of prediction errors (predicted value minus observed value) at the home and routine monitoring sites for O3 and PM2.5 using the three modeling approaches.

The composite spatio-temporal models incorporating CTM predictions outperformed the CTM and ST-LUR models alone, with no indication of bias (Figure 2). Including CTM predictions in the framework substantially improved the prediction accuracy for O3 in terms of RMSE (a 21% reduction in RMSE at home sites), but provided only a modest improvement in precision expressed as R2 (an increase of 1 to 6 percentage points) (Table 2). A map of the temporal RMSEs generated by the CTM, ST-LUR and composite models for O3 at the routine monitoring and fixed sites showed that the RMSE of the composite model was lower than those of the individual CTM and LUR models (Figure S3). Comparing the RMSEs of the modeling approaches by routine monitoring site type (i.e., traffic, urban background and rural background sites) shows that the composite model tended to predict O3 concentrations more accurately than the other models, especially at rural background sites (Table S3).

Adding the CTM predictions to the spatio-temporal model improved the PM2.5 model only slightly, with a 1 to 3 percent reduction in RMSE and an increase in R2 of at most 2 percentage points for the overall and temporal comparisons; the gain was larger for spatial variability (RMSE reduced from 1.31 to 1.17 μg/m3) (Table 2). Differences in RMSE between the two PM2.5 models did not correlate with geographical features, as might be expected given the relatively small benefit of the CTM predictions of PM2.5. Again, this result stems largely from the unrealistic dust emissions contributing to CTM PM2.5 mass concentrations in Los Angeles.
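The relative RMSE changes quoted in this section follow directly from Table 2. The short calculation below, using values transcribed from Table 2, gives about 20.6% for O3 at the home sites (the 21% cited) and 1.8% to 3.1% for the overall and temporal PM2.5 comparisons; the spatial PM2.5 row (about 10.7%) is the exception, consistent with the larger spatial gains reported for both pollutants.

```python
# RMSE values transcribed from Table 2 as (ST-LUR, composite).
rmse_table = {
    "O3": {"Home": (4.56, 3.62), "RM/Fixed overall": (5.64, 4.65),
           "RM/Fixed spatial": (4.88, 3.57), "RM/Fixed temporal": (4.83, 3.80)},
    "PM2.5": {"Home": (3.20, 3.10), "RM/Fixed overall": (3.41, 3.35),
              "RM/Fixed spatial": (1.31, 1.17), "RM/Fixed temporal": (3.26, 3.19)},
}


def pct_reduction(before, after):
    """Percent reduction in RMSE when moving from the ST-LUR to the composite model."""
    return 100.0 * (before - after) / before


for pollutant, rows in rmse_table.items():
    for row, (lur, comp) in rows.items():
        print(f"{pollutant:6s} {row:18s} {pct_reduction(lur, comp):5.1f}%")
```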

In a sensitivity analysis, directly incorporating the original O3 CTM predictions (uncalibrated daily 8-hour maximum values) resulted in less accurate predictions than the model with site-specific temporal calibration to the daily average (Table S2).

For both pollutants, the improvement from incorporating CTM predictions appeared to be greater for representing spatial than temporal variability (Table 2) and greater in the warm season than in the cold season (Table S4). For all modeling approaches, the O3 models consistently predicted better in the cold season than in the warm season (Table S4). The PM2.5 composite model performed only slightly better than the LUR model in the warm season (Table S4).

Figure 3 shows the difference in estimated long-term average O3 and PM2.5 concentrations between models (C_composite − C_ST-LUR) in the Los Angeles basin. Prediction maps of the composite models for O3 and PM2.5 are shown in Figure S1. O3 concentration estimates from the composite model were substantially higher in rural areas and lower in urban areas than those from the ST-LUR model, and PM2.5 concentration estimates were slightly higher across the Los Angeles area, except in mountainous areas.

Figure 3.


Difference in long-term predictions between the composite spatio-temporal model and the ST-LUR model for (a) O3 and (b) PM2.5 in the Los Angeles basin.

4. DISCUSSION

We developed a composite approach to incorporate CTM predictions into the estimation of small-scale spatio-temporal variability of O3 and PM2.5 in the Los Angeles basin. Our findings suggest that incorporating UCD-CIT chemical transport model (CTM) predictions into our previously described spatio-temporal land-use regression based geostatistical framework (ST-LUR) improves estimation of O3 concentrations but adds relatively little value to prediction of PM2.5 mass concentrations.

Our study has three strengths over previous studies that incorporated multiple modeling approaches in one framework.14, 15, 38–40 First, we included intensive monitoring data from multiple sources (N=156 for O3; N=142 for PM2.5) to characterize the spatial variability of air pollutants rather than relying solely on regulatory monitoring sites. Second, our UCD-CIT CTM predictions were unique in their long time span (2000–2008) and fine spatial resolution (4×4 km2). Third, our hierarchical modeling framework took full advantage of the monitoring and prediction data. This enables us to provide more accurate and highly resolved O3 and PM2.5 concentrations at intra-urban sites than most composite models, which focus on large-scale air pollution prediction.

4.1 O3 model

Integration of a CTM (4×4 km) with LUR to improve the spatial predictability of estimates at the intra-urban scale, primarily for nitrogen dioxide (NO2), was demonstrated in Barcelona, Spain, using the Bayesian Maximum Entropy (BME) approach.16 Jointly accounting for global-scale variability in concentrations from the CTM output and citywide-scale variability through the LUR model output effectively increased the estimation accuracy of NO2 predictions (>30%) compared with conventional approaches such as LUR or CTM alone. This is consistent with the general finding of our study that integrating the two modeling approaches was more accurate than either model alone, especially for O3. Since O3 is a reactive gas that rapidly reacts with nitrogen oxides in urban air, it is not surprising that the O3 model is improved by CTM predictions, as was the NO2 model in the Barcelona study.16 Although the improvement from integrating the CTM in our O3 model is smaller than that for the NO2 model in Barcelona (>20% vs >30%), it is worth noting that our ST-LUR and CTM models have taken spatial residuals into account through smoothing (universal kriging), whereas the reported NO2 models from Barcelona16 did not. Previous studies of O3 exposure have shown considerable improvement from spatial smoothing in LUR and CTM models.14, 15 A recent report of a Canadian national O3 LUR model, which treated dispersion-modeled O3 concentrations as a GIS-derived variable, explained 56% of the spatio-temporal variability in O3 concentrations.39 However, the large spatial domain and very coarse resolution of the dispersion model outputs (21×21 km) make those results less comparable with ours.

There are at least two ways in which the CTM output contributed to reducing the estimation error in our models. First, our CTM captures the temporal and spatial variability in concentrations across monitoring sites well, especially for O3. The CTM predictions may have compensated for the relatively limited temporal variability in our home measurements and therefore increased the statistical power for predictions. Second, the CTM incorporates meteorology and emissions information to simulate interactions between air constituents over a large spatial domain. For instance, temperature, an important weather variable affecting oxidant photochemistry and surface O3 concentrations,41 is not included in the LUR model but is well represented in the CTM. This is reflected in the larger decrease in RMSE of the composite O3 model in the warm season (23%) than in the cold season (18%). Incorporating CTM output improved the predictive ability of the O3 model in suburban areas with more green space, in areas of higher elevation and in urban areas with more traffic and population, features typical of locations with sparse monitoring data; this suggests that exposure estimates at cohort addresses far from monitoring sites may be improved by incorporating CTM predictions. This may be explained by the fact that the CTM explicitly accounts for atmospheric chemical processes, long-range transport, the role of biogenic volatile organic compounds as O3 precursors, and the effects of complex terrain and meteorology. Moreover, in urban areas the urban background concentrations estimated by the CTM, reflecting the effect of local emissions on O3 destruction over a relatively large spatial domain, also improved the accuracy of estimates at the home monitoring locations.42

A map of predicted O3 concentrations from the composite model shows a wider range of concentrations than that from the ST-LUR model, with higher concentrations in rural areas and lower concentrations in urban areas (Figure 3). This suggests that our composite model better captures real-world spatial contrasts in O3 concentrations and could thus reduce exposure misclassification in epidemiological studies.

4.2 PM2.5 model

Our composite model incorporating CTM output only marginally improved the prediction ability for PM2.5. This may be attributable to the relatively short averaging time (two weeks) of the PM2.5 CTM inputs. Previous examination of PM2.5 in California suggested better predictability of the PM2.5 CTM over longer periods, because the influence of extreme events and short-term variability diminishes as the averaging period lengthens.21, 43 Consistent with this observation, we found better spatial agreement between annual average PM2.5 observations and predictions across the routine monitoring sites (R2=0.59, RMSE=7.11 μg/m3) than when using two-week average data, and the composite model showed its largest improvement in spatial accuracy. Moreover, the PM2.5 predictions contained some compensating errors due to imperfect emission inventories and incomplete formation pathways for nitrate, sulfate and secondary organic aerosols. Primary particles associated with dust emissions were over-predicted while secondary particles were under-predicted, resulting in slightly over-predicted PM2.5 concentrations from the CTM and damped time trends in Los Angeles.21 Figure S3 indicates that the CTM PM2.5 mass predictions generally had larger RMSE in urban areas (where most monitoring sites were located) than in suburban and rural areas (where fewer monitoring sites were located), whereas the CTM O3 predictions showed a generally opposite pattern. More accurate CTM predictions for individual chemical species, such as PM2.5 EC and OC, would likely contribute more to the performance of the composite modeling approach in Los Angeles than the current total-mass predictions.

Composite modeling approaches have also recently been used to incorporate satellite-based remote sensing data into ground-level exposure predictions.38, 40, 44, 45 Although this approach may better capture the complex spatio-temporal variability of ambient PM2.5 at the regional or national scale, we suspect that satellite data would add little to our model at small scales, given their relatively coarse resolution (10×10 km2) and the already excellent performance of our PM2.5 ST-LUR model, which is based on a comprehensive set of monitoring sites and a highly explanatory set of geographic covariates. The composite model predicted higher PM2.5 concentrations than the ST-LUR model over a large part of our study area, which may be driven by the CTM over-predictions (Figure 3).
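The resolution argument can be illustrated with a toy calculation. The sketch below builds a hypothetical 40×40 km PM2.5 field at 1-km resolution, consisting of a smooth regional gradient, a few localized hotspots, and small-scale noise (all values synthetic). It block-averages the field to 4-km cells, the CTM grid resolution used in this study, and to 10-km cells, the satellite resolution cited above. For each resolution it then measures how far each 1-km value lies from the value of the coarse cell that contains it. The field, the hotspot locations, and the helper names `block_average` and `assign_to_fine` are illustrative assumptions, not part of the study's method.

```python
import numpy as np


def block_average(field: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    n_rows, n_cols = field.shape
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid size must be divisible by the aggregation factor")
    blocks = field.reshape(n_rows // factor, factor, n_cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def assign_to_fine(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Give every fine cell the value of the coarse cell that contains it."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


# Hypothetical 1-km PM2.5 field (ug/m3) over a 40 x 40 km domain
rng = np.random.default_rng(0)
y, x = np.mgrid[0:40, 0:40]
field = 10.0 + 0.1 * x  # smooth regional gradient
for cy, cx in [(8, 30), (22, 12), (33, 25)]:  # hypothetical localized hotspots
    field = field + 6.0 * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 1.5 ** 2))
field = field + rng.normal(0.0, 0.5, size=field.shape)  # small-scale variability

results = {}
for km in (1, 4, 10):
    coarse = block_average(field, km)
    assigned = assign_to_fine(coarse, km)
    # Exposure error: RMSE between each 1-km value and its coarse-cell value
    rmse = float(np.sqrt(np.mean((field - assigned) ** 2)))
    results[km] = (float(coarse.max()), float(coarse.std()), rmse)
    print(f"{km:>2} km cells: max={coarse.max():.1f}, spatial SD={coarse.std():.2f}, "
          f"RMSE vs 1-km values={rmse:.2f}")
```

Because each coarse-cell value is the mean of the 1-km values it covers, the peak concentration and the spatial standard deviation can only stay the same or shrink relative to the 1-km field. Hotspots are smeared across their containing cells, which is the kind of sub-grid exposure error that finer grids are expected to reduce.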

4.3 Limitations

Our study has a few limitations. The two-week time frame of our monitoring data limited the ability of our O3 model to predict daily maximum 8-hour values, which are regulated by the U.S. EPA. However, two-week averages of daily mean and of daily maximum 8-hour ozone observations were generally highly correlated across the routine monitoring sites (R2=0.81 on average), so this may not be a serious limitation. Furthermore, the 4×4 km2 spatial resolution of the CTM output is still relatively coarse. As exposure methods evolve, higher-resolution (e.g., 1×1 km2) data from CTMs and satellites should become available and may further reduce exposure measurement error.46, 47
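The comparison between the two ozone metrics can be sketched as follows. From hypothetical hourly ozone data (ppb) at a single site, the code computes the daily mean and a simplified daily maximum 8-hour average (MDA8), averages each over two-week periods, and reports the squared correlation between the two series. The seasonal and diurnal cycles are assumptions. The MDA8 here uses only the 8-hour windows that fall within each calendar day; the EPA regulatory definition differs in details such as the eligible window start hours and data-completeness rules.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
n_days = 364  # 26 two-week periods

# Hypothetical hourly ozone (ppb): seasonal cycle x afternoon-peaking diurnal cycle + noise
day = np.arange(n_days)[:, None]
hour = np.arange(24)[None, :]
seasonal = 40.0 + 15.0 * np.sin(2 * np.pi * day / 365.0)
diurnal = 1.0 + 0.5 * np.sin(2 * np.pi * (hour - 9) / 24.0)  # peaks at 15:00
hourly = np.clip(seasonal * diurnal + rng.normal(0.0, 5.0, size=(n_days, 24)), 0.0, None)


def daily_mean(hourly: np.ndarray) -> np.ndarray:
    """24-hour mean for each day (rows = days, columns = hours)."""
    return hourly.mean(axis=1)


def daily_max_8h(hourly: np.ndarray) -> np.ndarray:
    """Simplified MDA8: max of the 17 8-hour running means that fit within each day."""
    windows = sliding_window_view(hourly, 8, axis=1)  # shape (n_days, 17, 8)
    return windows.mean(axis=2).max(axis=1)


def two_week_means(daily: np.ndarray, period: int = 14) -> np.ndarray:
    """Average consecutive non-overlapping blocks of `period` days."""
    n = len(daily) // period * period
    return daily[:n].reshape(-1, period).mean(axis=1)


mean_2wk = two_week_means(daily_mean(hourly))
mda8_2wk = two_week_means(daily_max_8h(hourly))
r2 = float(np.corrcoef(mean_2wk, mda8_2wk)[0, 1] ** 2)
print(f"{len(mean_2wk)} two-week periods, R2 = {r2:.2f}")
```

In this toy setup both two-week series are dominated by the shared seasonal cycle, which is why they correlate strongly even though MDA8 is always at least as large as the daily mean.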

In summary, we demonstrate that integrating output from a chemical transport model into a well-developed, land use regression-based geostatistical framework can increase prediction accuracy at the intra-urban scale. The improvement was greater for O3 than for PM2.5 mass in Los Angeles, but these findings may differ in other study regions. Our study highlights opportunities to improve future exposure assessment by combining multiple data sources in a fused modeling framework to increase the accuracy of predictions.

Supplementary Material

Acknowledgments

This publication was developed under U.S. Environmental Protection Agency STAR research assistance agreements, RD831697 (MESA Air) and RD833741 (MESA Coarse) awarded to the University of Washington, and award R83386401 to the University of California, Davis. It was also supported by the University of Washington Center for Clean Air Research (Environmental Protection Agency RD83479601-01) and by the National Institute of Environmental Health Sciences of the National Institutes of Health under award number T32ES015459. It has not been formally reviewed by the EPA. The views expressed in this document are solely those of the authors and the EPA does not endorse any products or commercial services mentioned in this publication.

Footnotes

SUPPORTING INFORMATION

Tables S1–S4, Figures S1–S3, and details of the model structure, geographical covariates, and sensitivity analyses of model performance by season and site type are provided in the Supporting Information.

References

  • 1. Hoek G, Krishnan RM, Beelen R, Peters A, Ostro B, Brunekreef B, Kaufman JD. Long-term air pollution exposure and cardio-respiratory mortality: a review. Environ Health-Glob. 2013;12. doi: 10.1186/1476-069X-12-43.
  • 2. Brauer M, Lencar C, Tamburic L, Koehoorn M, Demers P, Karr C. A cohort study of traffic-related air pollution impacts on birth outcomes. Environ Health Persp. 2008;116(5):680–686. doi: 10.1289/ehp.10952.
  • 3. Molter A, Agius RM, de Vocht F, Lindley S, Gerrard W, Lowe L, Belgrave D, Custovic A, Simpson A. Long-term Exposure to PM10 and NO2 in Association with Lung Volume and Airway Resistance in the MAAS Birth Cohort. Environ Health Persp. 2013;121(10):1232–1238. doi: 10.1289/ehp.1205961.
  • 4. Gehring U, Gruzieva O, Agius RM, Beelen R, Custovic A, Cyrys J, Eeftens M, Flexeder C, Fuertes E, Heinrich J, Hoffmann B, de Jongste JC, Kerkhof M, Klumper C, Korek M, Molter A, Schultz ES, Simpson A, Sugiri D, Svartengren M, von Berg A, Wijga AH, Pershagen G, Brunekreef B. Air Pollution Exposure and Lung Function in Children: The ESCAPE Project. Environ Health Persp. 2013;121(11–12):1357–1364. doi: 10.1289/ehp.1306770.
  • 5. Jerrett M, Burnett RT, Ma RJ, Pope CA, Krewski D, Newbold KB, Thurston G, Shi YL, Finkelstein N, Calle EE, Thun MJ. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology. 2005;16(6):727–736. doi: 10.1097/01.ede.0000181630.15826.7d.
  • 6. Miller KA, Siscovick DS, Sheppard L, Shepherd K, Sullivan JH, Anderson GL, Kaufman JD. Long-term exposure to air pollution and incidence of cardiovascular events in women. New Engl J Med. 2007;356(5):447–458. doi: 10.1056/NEJMoa054409.
  • 7. Hoek G, Beelen R, de Hoogh K, Vienneau D, Gulliver J, Fischer P, Briggs D. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008;42(33):7561–7578.
  • 8. Basagana X, Rivera M, Aguilera I, Agis D, Bouso L, Elosua R, Foraster M, de Nazelle A, Nieuwenhuijsen M, Vila J, Kunzli N. Effect of the number of measurement sites on land use regression models in estimating local air pollution. Atmos Environ. 2012;54:634–642.
  • 9. Wang M, Beelen R, Eeftens M, Meliefste K, Hoek G, Brunekreef B. Systematic Evaluation of Land Use Regression Models for NO2. Environ Sci Technol. 2012;46(8):4481–4489. doi: 10.1021/es204183v.
  • 10. Jerrett M, Arain A, Kanaroglou P, Beckerman B, Potoglou D, Sahsuvaroglu T, Morrison J, Giovis C. A review and evaluation of intraurban air pollution exposure models. J Expo Anal Env Epid. 2005;15(2):185–204. doi: 10.1038/sj.jea.7500388.
  • 11. Kirrane EF, Bowman C, Davis JA, Hoppin JA, Blair A, Chen HL, Patel MM, Sandler DP, Tanner CM, Vinikoor-Imler L, Ward MH, Luben TJ, Kamel F. Associations of Ozone and PM2.5 Concentrations With Parkinson’s Disease Among Participants in the Agricultural Health Study. J Occup Environ Med. 2015;57(5):509–517. doi: 10.1097/JOM.0000000000000451.
  • 12. Sacks JD, Rappold AG, Davis JA, Richardson DB, Waller AE, Luben TJ. Influence of Urbanicity and County Characteristics on the Association between Ozone and Asthma Emergency Department Visits in North Carolina. Environ Health Persp. 2014;122(5):506–512. doi: 10.1289/ehp.1306940.
  • 13. Ostro B, Hu JL, Goldberg D, Reynolds P, Hertz A, Bernstein L, Kleeman MJ. Associations of Mortality with Long-Term Exposures to Fine and Ultrafine Particles, Species and Sources: Results from the California Teachers Study Cohort. Environ Health Persp. 2015;123(6):549–556. doi: 10.1289/ehp.1408565.
  • 14. De Nazelle A, Arunachalam S, Serre ML. Bayesian Maximum Entropy Integration of Ozone Observations and Model Predictions: An Application for Attainment Demonstration in North Carolina. Environ Sci Technol. 2010;44(15):5707–5713. doi: 10.1021/es100228w.
  • 15. Adam-Poupart A, Brand A, Fournier M, Jerrett M, Smargiassi A. Spatiotemporal Modeling of Ozone Levels in Quebec (Canada): A Comparison of Kriging, Land-Use Regression (LUR), and Combined Bayesian Maximum Entropy-LUR Approaches. Environ Health Persp. 2014;122(9):970–976. doi: 10.1289/ehp.1306566.
  • 16. Akita Y, Baldasano JM, Beelen R, Cirach M, de Hoogh K, Hoek G, Nieuwenhuijsen M, Serre ML, de Nazelle A. Large Scale Air Pollution Estimation Method Combining Land Use Regression and Chemical Transport Modeling in a Geostatistical Framework. Environ Sci Technol. 2014;48(8):4452–4459. doi: 10.1021/es405390e.
  • 17. Zhang K, Larson TV, Gassett A, Szpiro AA, Daviglus M, Burke GL, Kaufman JD, Adar SD. Characterizing Spatial Patterns of Airborne Coarse Particulate (PM10–2.5) Mass and Chemical Components in Three Cities: The Multi-Ethnic Study of Atherosclerosis. Environ Health Persp. 2014;122(8):823–830. doi: 10.1289/ehp.1307287.
  • 18. Cohen MA, Adar SD, Allen RW, Avol E, Curl CL, Gould T, Hardie D, Ho A, Kinney P, Larson TV, Sampson P, Sheppard L, Stukovsky KD, Swan SS, Liu LJS, Kaufman JD. Approach to Estimating Participant Pollutant Exposures in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Environ Sci Technol. 2009;43(13):4687–4693. doi: 10.1021/es8030837.
  • 19. Keller JP, Olives C, Kim SY, Sheppard L, Sampson PD, Szpiro AA, Oron AP, Lindstrom J, Vedal S, Kaufman JD. A Unified Spatiotemporal Modeling Approach for Predicting Concentrations of Multiple Air Pollutants in the Multi-Ethnic Study of Atherosclerosis and Air Pollution. Environ Health Persp. 2015;123(4):301–309. doi: 10.1289/ehp.1408145.
  • 20. Wang M, Keller JP, Adar SD, Kim SY, Larson TV, Olives C, Sampson PD, Sheppard L, Szpiro AA, Vedal S, Kaufman JD. Development of long-term spatiotemporal models for ambient ozone in six metropolitan regions of the United States: the MESA Air study. Under review. 2015. doi: 10.1016/j.atmosenv.2015.10.042.
  • 21. Hu J, Zhang H, Ying Q, Chen SH, Vandenberghe F, Kleeman MJ. Long-term particulate matter modeling for health effect studies in California – Part 1: Model performance on temporal and spatial variations. Atmos Chem Phys. 2015;15(6):3445–3461.
  • 22. Kleeman MJ, Cass GR. A 3D Eulerian source-oriented model for an externally mixed aerosol. Environ Sci Technol. 2001;35(24):4834–4848. doi: 10.1021/es010886m.
  • 23. Kleeman MJ, Cass GR, Eldering A. Modeling the airborne particle complex as a source-oriented external mixture. J Geophys Res-Atmos. 1997;102(D17):21355–21372.
  • 24. Rasmussen DJ, Hu JL, Mahmud A, Kleeman MJ. The Ozone-Climate Penalty: Past, Present, and Future. Environ Sci Technol. 2013;47(24):14258–14266. doi: 10.1021/es403446m.
  • 25. Wei WDM, Dudhia J, Lin H, Michalakes J, Rizvi S, Zhang X. The Advanced Research WRF (ARW) Version 3 Modeling System User’s Guide. 2010.
  • 26. Ying Q, Lu J, Kleeman M. Modeling air quality during the California Regional PM10/PM2.5 Air Quality Study (CRPAQS) using the UCD/CIT source-oriented air quality model – Part III. Regional source apportionment of secondary and total airborne particulate matter. Atmos Environ. 2009;43(2):419–430.
  • 27. Kleeman MJ, Cass GR. Source contributions to the size and composition distribution of urban particulate air pollution. Atmos Environ. 1998;32(16):2803–2816.
  • 28. Vukovich JPT. The Implementation of BEIS3 Within the SMOKE Modeling Framework, MCNC-Environmental Modeling Center, Research Triangle Park and National Oceanic and Atmospheric Administration. 11th International Emission Inventory Conference – “Emission Inventories – Partnering for the Future”; Atlanta, GA. 15–18 April, 2002.
  • 29. de Leeuw G, Neele FP, Hill M, Smith MH, Vignali E. Production of sea spray aerosol in the surf zone. J Geophys Res-Atmos. 2000;105(D24):29397–29409.
  • 30. Hodzic A, Madronich S, Bohn B, Massie S, Menut L, Wiedinmyer C. Wildfire particulate matter in Europe during summer 2003: meso-scale modeling of smoke emissions, transport and radiative effects. Atmos Chem Phys. 2007;7(15):4043–4064.
  • 31. Eckhoff PBT. Addendum to the user’s guide to CAL3QHC version 2.0 (CAL3QHCR user’s guide). 1995.
  • 32. Lindstrom J, Szpiro AA, Sampson PD, Oron AP, Richards M, Larson TV, Sheppard L. A Flexible Spatio-Temporal Model for Air Pollution with Spatial and Spatio-Temporal Covariates. Environ Ecol Stat. 2014;21(3):411–433. doi: 10.1007/s10651-013-0261-4.
  • 33. Lindström JSA, Sampson P, Bergen S, Oron A. SpatioTemporal: Spatio-Temporal Model Estimation. R Package Version 111. 2012.
  • 34. Sampson PD, Szpiro AA, Sheppard L, Lindstrom J, Kaufman JD. Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data. Atmos Environ. 2011;45(36):6593–6606.
  • 35. Szpiro AA, Sampson PD, Sheppard L, Lumley T, Adar SD, Kaufman JD. Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies. Environmetrics. 2010;21(6):606–631. doi: 10.1002/env.1014.
  • 36. Cressie N. Statistics for Spatial Data, Revised Edition. John Wiley & Sons; 1993.
  • 37. Mevik BH, Wehrens R. The pls package: Principal component and partial least squares regression in R. J Stat Softw. 2007;18(2):1–23.
  • 38. Reid CE, Jerrett M, Petersen ML, Pfister GG, Morefield PE, Tager IB, Raffuse SM, Balmes JR. Spatiotemporal Prediction of Fine Particulate Matter During the 2008 Northern California Wildfires Using Machine Learning. Environ Sci Technol. 2015;49(6):3887–3896. doi: 10.1021/es505846r.
  • 39. Hystad P, Demers PA, Johnson KC, Brook J, van Donkelaar A, Lamsal L, Martin R, Brauer M. Spatiotemporal air pollution exposure assessment for a Canadian population-based lung cancer case-control study. Environ Health-Glob. 2012;11. doi: 10.1186/1476-069X-11-22.
  • 40. Beckerman BS, Jerrett M, Serre M, Martin RV, Lee SJ, van Donkelaar A, Ross Z, Su J, Burnett RT. A Hybrid Approach to Estimating National Scale Spatiotemporal Variability of PM2.5 in the Contiguous United States. Environ Sci Technol. 2013;47(13):7233–7241. doi: 10.1021/es400039u.
  • 41. Millstein DE, Harley RA. Impact of climate change on photochemical air pollution in Southern California. Atmos Chem Phys. 2009;9(11):3745–3754.
  • 42. Hu JL, Howard CJ, Mitloehner F, Green PG, Kleeman MJ. Mobile Source and Livestock Feed Contributions to Regional Ozone Formation in Central California. Environ Sci Technol. 2012;46(5):2781–2789. doi: 10.1021/es203369p.
  • 43. Hu JL, Zhang HL, Chen SH, Wiedinmyer C, Vandenberghe F, Ying Q, Kleeman MJ. Predicting Primary PM2.5 and PM0.1 Trace Composition for Epidemiological Studies in California. Environ Sci Technol. 2014;48(9):4971–4979. doi: 10.1021/es404809j.
  • 44. Kloog I, Nordio F, Coull BA, Schwartz J. Incorporating Local Land Use Regression And Satellite Aerosol Optical Depth In A Hybrid Model Of Spatiotemporal PM2.5 Exposures In The Mid-Atlantic States. Environ Sci Technol. 2012;46(21):11913–11921. doi: 10.1021/es302673e.
  • 45. Liu Y, Paciorek CJ, Koutrakis P. Estimating Regional Spatial and Temporal Variability of PM2.5 Concentrations Using Satellite Data, Meteorology, and Land Use Information. Environ Health Persp. 2009;117(6):886–892. doi: 10.1289/ehp.0800123.
  • 46. Hu XF, Waller LA, Lyapustin A, Wang YJ, Al-Hamdan MZ, Crosson WL, Estes MG, Estes SM, Quattrochi DA, Puttaswamy SJ, Liu Y. Estimating ground-level PM2.5 concentrations in the Southeastern United States using MAIAC AOD retrievals and a two-stage model. Remote Sens Environ. 2014;140:220–232.
  • 47. Ching J, Herwehe J, Swall J. On joint deterministic grid modeling and sub-grid variability conceptual framework for model evaluation. Atmos Environ. 2006;40(26):4935–4945.
