Abstract
Mobile monitoring is increasingly employed to measure fine spatial-scale variation in air pollutant concentrations. However, mobile measurement campaigns are typically conducted over periods much shorter than the decadal periods used for modeling chronic exposure for use in air pollution epidemiology. Using the regions of Los Angeles and Baltimore and the time period from 2005 to 2014 as our modeling domain, we investigate whether including mobile or stationary passive sampling device (PSD) monitoring data collected over a single two-week period in one or two seasons in a unified spatio-temporal air pollution model can improve model performance in predicting NO2 and NOx concentrations throughout the 9-year study period beyond what is possible using only routine monitoring data. In this initial study, we use data from mobile measurement campaigns conducted contemporaneously with deployments of stationary PSDs, and only use mobile data collected within 300 m of a stationary PSD location for inclusion in the model. We find that including either mobile or PSD data substantially improves model performance for pollutants and locations where model performance was initially the worst (with the most-improved R2 changing from 0.40 to 0.82), but does not meaningfully change performance in cases where performance was already very good. Results indicate that in many cases additional spatial information from mobile monitoring and passive sampling is a potentially cost-efficient way of improving exposure predictions at both two-week and decadal averaging periods, especially for predictions located closer to features, such as roadways, targeted by the short-term mobile monitoring campaign.
Keywords: Air pollution monitoring, Mobile monitoring, Passive sampling, Exposure modeling, Spatio-temporal modeling
INTRODUCTION
Air pollution concentrations can vary widely within a city. This variation may be due to factors that reflect local urban features, such as the location and magnitude of local emission sources, meteorology, physical features (e.g., buildings, elevated highways and urban canyons) and ventilation1. Among the top twenty most populous cities in the U.S., routine U.S. EPA air pollutant monitoring utilizes on average three to seven monitoring sites per city, depending on the pollutant. This density of monitoring may not accurately describe this spatial variability, especially for traffic-related air pollution.
Both short-term supplemental mobile monitoring campaigns and passive sampling campaigns are useful tools for ascertaining fine-scale variation in air pollutant concentrations, with mobile monitoring being increasingly employed for this purpose2–10 and being used to quantify spatial variability in longer-term average pollution concentrations11–14. Thus researchers have adopted mobile monitoring to sample urban microenvironments1,11,15 as a supplement to routine stationary monitoring. In addition to the advantages of high spatial resolution and cost-effectiveness, mobile monitoring can also be performed using a single measurement platform, such as a modified vehicle that deploys multiple state-of-the-art air quality measurement instruments.
However, the use of mobile monitoring has been limited partly by the sheer number of person-hours required to conduct the sampling. This makes it difficult to adequately capture temporal trends or temporal variations in concentrations compared to stationary monitoring approaches. One solution to this challenge is to aggregate mobile data over longer time periods. Levy et al. found that average pollutant concentrations from their mobile data collected during three-week campaigns in three seasons were within 25% of actual annual averages for NO2, NOx, CO and O3 and within 30% of actual annual average PM2.5 at fixed monitoring sites.11 Riley et al. and Tessum et al. reported strong to moderate correlations between co-located and concurrent mobile and fixed-site passive sampler measurements of NOx and NO2 in both heating and non-heating seasons on a two-week timescale.12,13 These studies indicate that appropriately aggregated mobile monitoring data collected over the course of several weeks can provide useful information regarding spatial patterns in air pollutant concentrations. As mobile measurements can be taken at a large number of locations, this raises the prospect of developing stable surfaces of urban air pollution concentrations with high spatial resolution, allowing more accurate exposure estimates for epidemiologic studies.
A unified and flexible spatio-temporal air pollution modeling approach (the ST model) developed for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) provides accurate fine-scale predictions for multiple air pollutants.16–19 This modeling approach employs land use regression in estimating spatially-varying long-term averages (i.e., over several years), spatially-varying seasonal and long-term trends, and spatially correlated but temporally independent residuals in a universal kriging framework to develop space-time fields of air pollutant concentrations at fine spatial and temporal scales. The ST model has previously used stationary monitoring data from the MESA Air monitoring campaign along with data from the EPA Air Quality System (AQS). However, the model is also able to accommodate irregular space-time monitoring data of the type generated during a mobile monitoring campaign when averaged to the same time scale. Training and testing the ST model with the addition of mobile and/or passive sampler monitoring data is a critical step in determining the utility of mobile monitoring and passive sampling in generating accurate and more cost-efficient fine-scale air pollution concentration predictions. In this study, we evaluate the contributions of short-term mobile and passive monitoring platforms to long-term air pollution concentration predictions when their data are added to routine ambient air monitoring data (from AQS and several fixed monitoring sites operated by MESA Air) within the ST model framework.
METHODS
Overview
We developed independent spatio-temporal models of NOx and NO2 in Los Angeles and NOx in the Baltimore region for the period from June 2005 through December 2014 using NO2 or NOx measurement data and spatial covariates. During model development, we added data from a supplemental monitoring campaign conducted in 2012 and 2013 to the data previously used to model these pollutants for the MESA Air study. (Details are described by Keller et al.19 and in the Supplementary Information. We did not model NO2 concentrations in the Baltimore region owing to inconsistencies in our mobile instrument measurements.) In this study, along with AQS and MESA fixed site data, either mobile monitoring data or passive sampling device (PSD) data sampled at the same locations as the mobile monitoring data were incorporated into the spatio-temporal model to assess whether the addition of mobile monitoring data or PSD data can improve model performance. To evaluate the added benefit of using mobile monitoring data and PSD data during model training, we compared models created using three combinations of measurement data as three scenarios: 1) AQS + fixed site data, 2) AQS + fixed site + mobile data, and 3) AQS + fixed site + PSD data. MESA home data (see Monitoring Data, below) were not included in the model training but were used to validate the model predictions.
Monitoring Data
The spatial domains of this study are the metropolitan regions of Los Angeles and Baltimore. We defined the metropolitan region as an area with an approximately 80 km radius of each metropolitan center. This definition allowed us to include more than 10 AQS monitoring sites in each study domain. We aggregated all measurements to two-week intervals since many MESA Air supplementary monitoring instruments only collected data at the two-week time scale. Previous analysis has supported the appropriateness of this level of temporal aggregation for use in predicting various chronic health effects.19
AQS and MESA Fixed-Site Air Monitoring
Daily NO2 and NOx concentration measurements from 1st June 2005 through 31st December 2014 from the U.S. EPA Air Quality System (AQS; https://aqs.epa.gov/api) were aggregated into 2-week averages. The Los Angeles and Baltimore study domains contained 21 and 11 AQS sites, respectively, as shown in Figure 1. AQS monitors that had less than two years of data or had partial temporal coverage (i.e., operated only in the summer) were excluded from this analysis.
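The aggregation of daily values into two-week averages can be illustrated with a minimal sketch. It assumes that the 14-day bins are aligned to the first day of the modeling period and that missing daily values are simply skipped; neither choice is specified above, and the function name and toy values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

STUDY_START = date(2005, 6, 1)  # first day of the modeling period
PERIOD_DAYS = 14                # two-week averaging window

def two_week_averages(daily):
    """Average (date, concentration) pairs within consecutive 14-day bins.

    Assumption: bins are counted from STUDY_START and missing values
    (None) are skipped; no completeness criterion is applied.
    """
    bins = defaultdict(list)
    for day, value in daily:
        if value is None:
            continue
        index = (day - STUDY_START).days // PERIOD_DAYS
        bins[index].append(value)
    return {index: mean(values) for index, values in sorted(bins.items())}

# Hypothetical daily concentrations (ppb) at one site
daily = [
    (date(2005, 6, 1), 20.0),
    (date(2005, 6, 14), 30.0),   # last day of the first two-week bin
    (date(2005, 6, 15), 40.0),   # first day of the second bin
    (date(2005, 6, 16), None),   # missing value, ignored
]
averages = two_week_averages(daily)
```

In practice a minimum number of valid days per bin would probably be required before an average is accepted; that threshold is left out here because it is not stated in the text.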
To better capture the within-region variability of pollutant concentrations, MESA Air conducted a supplementary fixed-site monitoring campaign to collect more spatially rich data in addition to the AQS sites. Two-week average measurements of NO2 and NOx were repeatedly collected using Ogawa passive samplers20 from July 2005 through August 2009 at five sites in the Baltimore region and at seven sites in the Los Angeles region, with averages of approximately 86 and 77 measurements per site in Los Angeles and Baltimore, respectively. MESA Air also collected two-week average measurements of NO2 and NOx at a subset of cohort participant residences in each metropolitan region. Those home outdoor sites were visited on a rotating basis with one to three visits in different seasons from May 2006 through May 2008 in the Los Angeles region and from June 2006 through July 2008 in the Baltimore region. MESA Air monitoring site selection, data collection, and data quality control have been described in detail elsewhere.19,20
Mobile and Fixed-Site Short-Term Air Monitoring
Details of the short-term mobile monitoring and short-term fixed-site passive sampling device (PSD) monitoring campaigns in Baltimore and Los Angeles are described by Riley et al. and Tessum et al.12,13; the monitoring campaign instrumentation is also summarized in the Supplemental Material. Briefly, 43 intersections were selected throughout each city as sampling sites, targeting intersections with a combination of residential streets and mixed traffic composition roadways. PSD monitoring campaigns were aligned with the mobile monitoring at these sampling sites and conducted at the same time. The Baltimore monitoring campaigns were conducted during the winter (February 12th – February 22nd) and summer (June 18th – June 27th) of 2012. The Los Angeles monitoring campaigns were conducted during the spring (March 9th – March 26th) and summer (June 14th – July 1st) of 2013.
During each PSD monitoring campaign, single two-week integrated NOx and NO2 concentrations were measured using Ogawa samplers (Ogawa & Company, USA, Inc., Pompano Beach, FL) at 43 sampling sites. These passive samplers were positioned on utility poles which were within a distance of approximately 2–8 m from the corresponding intersections.
During each mobile monitoring campaign, measurements were taken in the immediate area surrounding the 43 intersections (approximately 300-meter radius) to create corresponding “fuzzy points.” The 300-meter radius was chosen based on Tessum et al., who showed that a 300-meter radius buffer includes a sufficient number of measurements and provides the best correlations between collocated and concurrent mobile and PSD measurements among several radii tested.13 On any given mobile monitoring day, NOx and NO2 measurements were sampled through a vehicle-roof-mounted inlet made of Teflon, and were taken between the hours of 14:00 and 19:00. Each sampling site was repeatedly visited from different directions by a gasoline-powered minivan (a 2012 model for the spring campaign and a 2013 model for the summer campaign) that was driven at an average speed of 20–30 km hr−1. During the two-week monitoring period, each intersection was visited a total of 3–5 times for approximately 8–10 minutes each time. Measurements were recorded as 10-second averages. Data synchronization and logging were performed in the LabVIEW environment (with LabVIEW 2010 with DAQmx 9.4 and NI serial 3.8 instrument drivers). The unit of analysis for each pollutant was the median of these measurements during the two-week time periods at each sampling site.
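The reduction of mobile measurements to one value per “fuzzy point” can be sketched as follows. This is an illustration only, assuming great-circle (haversine) distances and a simple radius filter; the coordinates, readings, and function names are hypothetical and are not taken from the campaign data processing.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in meters
BUFFER_RADIUS_M = 300.0        # buffer radius used for the fuzzy points

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def fuzzy_point_median(site, readings, radius_m=BUFFER_RADIUS_M):
    """Median of all readings within radius_m of the site.

    site is (lat, lon); readings is an iterable of (lat, lon, value).
    Returns None when no reading falls inside the buffer.
    """
    nearby = [
        value
        for lat, lon, value in readings
        if haversine_m(site[0], site[1], lat, lon) <= radius_m
    ]
    return median(nearby) if nearby else None

# Hypothetical intersection and 10-second NOx readings (ppb)
site = (34.05, -118.25)
readings = [
    (34.050, -118.25, 30.0),   # at the intersection
    (34.051, -118.25, 50.0),   # about 111 m away
    (34.052, -118.25, 40.0),   # about 222 m away
    (34.060, -118.25, 500.0),  # about 1.1 km away, excluded
]
site_value = fuzzy_point_median(site, readings)
```

Using the median rather than the mean limits the influence of brief exhaust plumes encountered while driving, which is one plausible reason for that choice of summary statistic.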
A summary of all monitor sites and measurements by pollutant, region and site type is provided in Table 1. The location of each sampling site is shown in Figure 1.
Table 1.
Pollutant | Region | Site type* | Number of sites | Total observations | Minimum observations per site | Maximum observations per site
---|---|---|---|---|---|---
NO2 | Los Angeles, CA | AQS | 21 | 5049 | 202 | 250
NO2 | Los Angeles, CA | MESA fixed | 7 | 599 | 81 | 90
NO2 | Los Angeles, CA | MESA home | 120 | 216 | 1 | 2
NO2 | Los Angeles, CA | Mobile | 43 | 43 | 1 | 1
NO2 | Los Angeles, CA | PSD | 43 | 43 | 1 | 1
NOx | Los Angeles, CA | AQS | 21 | 5055 | 109 | 111
NOx | Los Angeles, CA | MESA fixed | 7 | 599 | 81 | 90
NOx | Los Angeles, CA | MESA home | 120 | 217 | 1 | 2
NOx | Los Angeles, CA | Mobile | 43 | 86 | 2 | 2
NOx | Los Angeles, CA | PSD | 43 | 86 | 2 | 2
NOx | Baltimore, MD | AQS | 11 | 2256 | 147 | 251
NOx | Baltimore, MD | MESA fixed | 5 | 386 | 27 | 98
NOx | Baltimore, MD | MESA home | 85 | 171 | 1 | 3
NOx | Baltimore, MD | Mobile | 43 | 86 | 2 | 2
NOx | Baltimore, MD | PSD | 42 | 84 | 1 | 2
*Site type abbreviations: AQS, U.S. Environmental Protection Agency (EPA) Air Quality System sites; MESA fixed, MESA Air fixed monitoring sites; MESA home, MESA Air home monitoring sites; Mobile, mobile monitoring sites (a center of “fuzzy points” collocated with passive sampling device [PSD] sites); PSD, passive sampling device sites.
Spatio-temporal Modeling Process
Our modeling approach combines the measurements described above with geographic information about the study domains to make predictions of concentrations at times and places for which we do not have measurements. See the Supplementary Information for additional details.
We generate spatio-temporal pollutant concentration surfaces using the ST modeling framework16–19 as implemented in the R language21 (SpatioTemporal package version 1.1.9, R Core Team). Briefly, the model is based on temporal trends, captured as smoothed empirical orthogonal functions estimated from the data, with spatiotemporal variation explained by spatial/geographic and spatiotemporal covariates. The model represents the spatio-temporal concentration surface as a series of temporal trends that vary in space. The model can be written as:
$$C(s,t) = \beta_0(s) + \sum_{i=1}^{m} \beta_i(s)\, f_i(t) + v(s,t) \tag{1}$$
where C(s,t) represents the log-transformed 2-week average pollutant concentration at location s and time t, β0(s) is the long-term mean at location s, fi(t) are the m smooth time trends, βi(s) are spatially varying coefficients for the time trends, and v(s,t) is the spatio-temporal residual variation.
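As a purely numerical illustration of the additive structure described above (long-term mean, plus trend terms weighted by spatially varying coefficients, plus a residual), the following sketch evaluates the model at one location and time. All coefficient and trend values are hypothetical, and the function is not part of the SpatioTemporal package.

```python
from math import exp

def log_concentration(beta0, betas, trends_at_t, residual=0.0):
    """Evaluate beta0(s) + sum_i beta_i(s) * f_i(t) + v(s,t) on the log scale."""
    if len(betas) != len(trends_at_t):
        raise ValueError("one coefficient is needed per time trend")
    trend_part = sum(b * f for b, f in zip(betas, trends_at_t))
    return beta0 + trend_part + residual

# Hypothetical values for one site s and one two-week period t
beta0 = 3.0               # long-term mean of the log concentration
betas = [0.5, -0.2]       # coefficients for two time trends
trends_at_t = [1.0, 0.5]  # values of the smooth trends f_1(t), f_2(t)

log_c = log_concentration(beta0, betas, trends_at_t)
concentration = exp(log_c)  # back-transform from the log scale
```

Because the model is fitted to log-transformed concentrations, predictions have to be exponentiated before they can be compared with measurements in ppb.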
To configure the model, we first estimated temporal trends using AQS and MESA fixed site measurements, and imputed missing measurement values using an expectation maximization-like algorithm17. The time trends were then smoothed using splines with degrees of freedom, df, selected as we describe below. We next estimated long-term average and time trend coefficients, β0(s) and βi(s), respectively, represented as spatial random fields with a spatially varying mean, using geographic covariates at location s and employing a covariance function. We assume the spatiotemporal residuals v(s,t) to be correlated in space and independent in time. The model implicitly accounts for the effects of meteorology through the trend functions and spatially correlated residuals.
More than 300 geographic variables were compiled for use in our model, including proximity variables (e.g., distance to nearest major road) and buffer variables (e.g., total road length within a buffer area of various radii). Variables were selected for inclusion separately for each pollutant and region following the approach described by Keller et al.19 This approach removes geographic variables containing highly influential values or limited variability. Approximately 200 geographic variables remained after applying the selection criteria. An example of selected geographic variables for the NOx model in Los Angeles is shown in Table S1.
In model development, we do not directly use all ~200 geographic covariates shown in Table S1, nor do we select individual geographic covariates for inclusion in the model. Instead, we use partial least squares (PLS) regression to reduce the dimensionality of the covariates and then use the PLS components in model fitting. PLS regression generates linear combinations or scores that maximize the covariance between the dependent variable and the independent components and has been successfully used in spatial22 and spatio-temporal modeling19; typically, three or fewer components are needed to capture the multidimensional information and avoid overfitting. PLS scores at AQS and fixed sites were obtained by regressing the time series of observations, C(s,t), on the smoothed time trends using ordinary least squares regression. PLS scores at mobile/PSD monitoring and MESA home sites were calculated using scaled loadings of all the geographic variables at those locations and the score definitions determined from the regression at AQS and fixed sites. PLS regression was performed using the pls package in R.23
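The idea behind a PLS score can be illustrated with a minimal sketch of the first component only. It assumes centered (not variance-scaled) covariates and a single response per site; the study itself used the pls package in R, so this pure-Python version and its toy data are hypothetical and serve only to show how a covariance-maximizing linear combination is formed.

```python
from math import sqrt

def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def covariance(a, b):
    """Population covariance of two equal-length lists."""
    ac, bc = center(a), center(b)
    return sum(x * y for x, y in zip(ac, bc)) / len(a)

def first_pls_component(rows, response):
    """Return (weights, scores) of the first PLS component.

    rows holds one list of covariates per site; response holds one value
    per site. The unit-length weight vector is proportional to X'y, which
    maximizes the covariance between the scores and the response.
    """
    n_sites, n_covariates = len(rows), len(rows[0])
    columns = [center([row[j] for row in rows]) for j in range(n_covariates)]
    resp = center(response)
    weights = [sum(x * y for x, y in zip(col, resp)) for col in columns]
    norm = sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("covariates are uncorrelated with the response")
    weights = [w / norm for w in weights]
    scores = [
        sum(weights[j] * columns[j][i] for j in range(n_covariates))
        for i in range(n_sites)
    ]
    return weights, scores

# Hypothetical toy data: two geographic covariates at four sites
covariates = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
site_response = [1.1, 1.9, 3.2, 3.8]
weights, scores = first_pls_component(covariates, site_response)
```

Scores at locations without long time series (such as mobile, PSD, or home sites) would then be obtained by applying the same weights to the covariates at those locations, which mirrors the two-step procedure described above.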
Evaluation of Model Performance
Model hyperparameters were chosen independently for each pollutant in each metropolitan region using leave-one-out cross-validation (LOOCV). Detailed information regarding hyperparameters is provided in the Supplementary Information. To select the model with the best performance, we performed cross-validation using AQS and fixed site measurements for each location in each region based on two-week measurements, withholding all measurements for a single site and predicting concentrations at that site using a model generated based on data at all other locations. We report root mean-squared error (RMSE) as an absolute measure of model performance and cross-validation R2 (denoted by R2CV) as a relative measure of performance.
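The two kinds of R2 reported in this study can be contrasted with a short sketch. It assumes that the cross-validated R2 is computed as max(0, 1 − MSE/Var(observations)), which measures agreement with the 1–1 line, and that the regression-based R2 is the squared Pearson correlation; the exact formulas are not given in this section, so both definitions should be read as assumptions, and the data are hypothetical.

```python
from math import sqrt

def rmse(obs, pred):
    """Root mean-squared error between observations and predictions."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2_one_to_one(obs, pred):
    """MSE-based R2: 1 - MSE / Var(obs), truncated at zero (assumed definition)."""
    obs_mean = sum(obs) / len(obs)
    variance = sum((o - obs_mean) ** 2 for o in obs) / len(obs)
    return max(0.0, 1.0 - rmse(obs, pred) ** 2 / variance)

def r2_regression(obs, pred):
    """Regression-based R2: squared Pearson correlation."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)

# Hypothetical held-out observations and predictions with a constant bias
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [15.0, 25.0, 35.0, 45.0]

error = rmse(observed, predicted)
r2_cv = r2_one_to_one(observed, predicted)
r2_reg = r2_regression(observed, predicted)
```

In this toy case the predictions are perfectly correlated with the observations but biased by 5 ppb, so the regression-based R2 is 1 while the MSE-based R2 is lower. Under these assumed definitions, a gap between the two metrics points to systematic bias rather than scatter.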
We additionally evaluated the pure out-of-sample performance of the final models by predicting concentrations at the MESA home sites, which were not used in model generation and thus provide an independent and robust test of performance. However, because there were only a few measurements at each home site, model performance was only evaluated at the two-week time scale. Evaluating model performance with external measurements provides an additional test of the model’s ability to provide predictions beyond the training locations.
RESULTS
Observations of pollutant concentration
Figure 2 shows concentrations of pollutants observed at each type of monitoring site for the entire study duration. NO2 and NOx concentrations sampled at MESA fixed site and home site were similar to ones sampled at AQS sites in both regions during the same sampling period, while mobile and PSD measurements were systematically higher than AQS measurements, especially for NOx measurements. This suggests, as expected, that higher NOx levels are seen at sites close to certain traffic related air pollution sources, especially near intersections. NOx measurements from all types of monitoring sites in Los Angeles are higher than those in Baltimore (Table S2).
Table 2 provides a comparison of the mobile and PSD measurements for each metropolitan region and pollutant. In general, mobile measurements of NO2 and NOx in Los Angeles were systematically lower than PSD measurements, while mobile measurements of NOx in Baltimore were systematically higher than PSD measurements. Mobile measurements of both NO2 and NOx had smaller variance across locations than PSD measurements in the study regions. Seasonal differences were also observed between PSD and mobile NOx measurements: summer mobile measurements were systematically higher than PSD measurements; winter mobile measurements were systematically lower than PSD measurements. It is not clear what factor was responsible for the seasonal differences. Any or all of multiple factors such as seasonal traffic pattern changes, presence of heating sources, different time of day sampling schemes, weather conditions and changes in atmospheric chemistry could have played a role. NO2 measurements were only conducted in the summer, with mobile measurements being systematically lower than PSD measurements and with smaller variance.
Table 2.
Pollutant | Region | Site type | Winter/Spring n | Winter/Spring mean ± SD | Summer n | Summer mean ± SD
---|---|---|---|---|---|---
NO2 (ppb) | Los Angeles, CA | Mobile | - | - | 43 | 12.3 ± 3.22
NO2 (ppb) | Los Angeles, CA | PSD | - | - | 43 | 14.9 ± 3.37
NOx (ppb) | Los Angeles, CA | Mobile | 43 | 37.7 ± 7.56 | 43 | 25.4 ± 3.32
NOx (ppb) | Los Angeles, CA | PSD | 43 | 48.6 ± 6.93 | 43 | 25.1 ± 4.58
NOx (ppb) | Baltimore, MD | Mobile | 42 | 35.2 ± 10.0 | 42 | 24.5 ± 6.30
NOx (ppb) | Baltimore, MD | PSD | 42 | 40.2 ± 11.2 | 42 | 12.2 ± 5.67
Moderate to high correlations (R2=0.57 to 0.75) were present between mobile and PSD measurements of NO2 and NOx, as shown in Figure 3. The correlation between NOx measurements in Los Angeles was stronger than that in Baltimore.
Model structure
Table 3 shows the selected hyperparameters for each metropolitan region and pollutant. In general, the selected Los Angeles NO2 and NOx models were more highly parameterized than the Baltimore NOx model: the selected Los Angeles models had two time trends, while the selected Baltimore NOx model had only one. All selected models included spatial smoothing in both the long-term average [β0(s)] and the time trend coefficients [βi(s)]. In addition, the Los Angeles NO2 model had three PLS scores per time trend, while both the Los Angeles and Baltimore NOx models had two. The larger hyperparameter values in Los Angeles indicate that more complex models were selected to account for variability in the data.
Table 3.
Pollutant | Location | No. of time trends | No. of PLS scores | df/year in time trend | Spatial smoothing: long-term average (β0) | Spatial smoothing: time trend coefficients (βi)
---|---|---|---|---|---|---
NO2 | Los Angeles, CA | 2 | 3 | 4 | Yes | Yes
NOx | Los Angeles, CA | 2 | 2 | 8 | Yes | Yes
NOx | Baltimore, MD | 1 | 2 | 4 | Yes | Yes
Model Results
Table 4 shows MESA home site evaluation results for all models, evaluated at the two-week time scale and classified by pollutant and region. Adding either PSD or mobile data improved home-site prediction performance for NO2 and NOx models in Los Angeles relative to models that included only AQS and fixed monitoring site measurements. Substantial improvement was seen in the Los Angeles NO2 model, but no substantial change was observed in the Los Angeles NOx model. However, the NOx model in Baltimore shows no improvement with addition of PSD data, and even shows some decrease in prediction accuracy at home sites with the addition of mobile data. In general, NO2 and NOx models with PSD data consistently provided more accurate predictions than the comparable models with mobile data.
Table 4.
Pollutant | Region | Model scenario | Two-week RMSE | Two-week R2CV* | Two-week R2CV reg*
---|---|---|---|---|---
NO2 | Los Angeles, CA | AQS+Fixed Site | 6.00 | 0.48 | 0.57
NO2 | Los Angeles, CA | AQS+Fixed Site+Mobile | 4.33 | 0.73 | 0.76
NO2 | Los Angeles, CA | AQS+Fixed Site+PSD | 4.27 | 0.74 | 0.77
NOx | Los Angeles, CA | AQS+Fixed Site | 11.0 | 0.82 | 0.83
NOx | Los Angeles, CA | AQS+Fixed Site+Mobile | 10.6 | 0.83 | 0.84
NOx | Los Angeles, CA | AQS+Fixed Site+PSD | 10.2 | 0.84 | 0.85
NOx | Baltimore, MD | AQS+Fixed Site | 4.97 | 0.90 | 0.90
NOx | Baltimore, MD | AQS+Fixed Site+Mobile | 6.14 | 0.84 | 0.87
NOx | Baltimore, MD | AQS+Fixed Site+PSD | 5.08 | 0.89 | 0.90
*R2CV provides a measure of fit to the 1–1 line, in contrast to the typical regression-based R2 (R2CV reg).
Scatter plots of two-week average home-site predictions and observations for each pollutant and region are shown in Supplemental Material Figures S2, S5 and S8. The benefits of adding mobile or PSD data to the model for NO2 and NOx in Los Angeles are most apparent at high-concentration locations (Figures S2 and S5). However, this is not the case when adding mobile data to the NOx model in Baltimore (Figure S8) where adding mobile data results in under-prediction at high concentration locations, while adding PSD data results in over-prediction at high-concentration locations.
Table 5 provides cross-validation metrics for all models at AQS and fixed sites for both two-week and long-term (entire study period) averages. In general, models with added PSD data consistently showed improved prediction accuracy at AQS and fixed sites at both short- and long-term time scales. Models with added mobile data did not substantially impact performance (either positively or negatively) when initial performance (i.e., models with no added data) is very good (i.e. models of NO2 and NOx in Los Angeles). However, models with added mobile data showed more substantial improvements in prediction accuracy for both two-week and long-term averages where model performance was initially not very good (i.e. models of NOx in Baltimore) as compared to models with added PSD data (R2cv 0.79 vs 0.69 and R2cv 0.82 vs 0.58, respectively).
Table 5.
Pollutant | Region | Model scenario | Two-week RMSE | Two-week R2CV | Two-week R2CV reg | Long-term RMSE | Long-term R2CV | Long-term R2CV reg
---|---|---|---|---|---|---|---|---
NO2 | Los Angeles, CA | AQS+Fixed Site | 3.56 | 0.81 | 0.82 | 2.21 | 0.83 | 0.85
NO2 | Los Angeles, CA | AQS+Fixed Site+Mobile | 3.63 | 0.81 | 0.81 | 2.26 | 0.82 | 0.83
NO2 | Los Angeles, CA | AQS+Fixed Site+PSD | 3.51 | 0.82 | 0.82 | 2.04 | 0.85 | 0.85
NOx | Los Angeles, CA | AQS+Fixed Site | 9.65 | 0.84 | 0.84 | 5.34 | 0.85 | 0.87
NOx | Los Angeles, CA | AQS+Fixed Site+Mobile | 9.24 | 0.85 | 0.85 | 4.79 | 0.88 | 0.88
NOx | Los Angeles, CA | AQS+Fixed Site+PSD | 9.04 | 0.86 | 0.86 | 4.30 | 0.90 | 0.90
NOx | Baltimore, MD | AQS+Fixed Site | 9.61 | 0.61 | 0.75 | 7.88 | 0.40 | 0.82
NOx | Baltimore, MD | AQS+Fixed Site+Mobile | 7.07 | 0.79 | 0.79 | 4.28 | 0.82 | 0.85
NOx | Baltimore, MD | AQS+Fixed Site+PSD | 8.56 | 0.69 | 0.78 | 6.64 | 0.58 | 0.86
Figures S3–4, S6–7, and S9–10 contain scatter plots of predictions and observations for both two-week and long-term averages at AQS and fixed sites. As with predictions of home-site concentrations, the improvement in predicting both short- and long-term AQS and fixed site NO2 concentrations in Los Angeles when adding mobile and PSD data is mainly caused by more accurate predictions at high concentrations (Figures S3 and S4). When mobile and PSD data are added, the models predicting NOx at AQS and fixed sites in Los Angeles show a pattern of improvement for short-term predictions similar to that of the NO2 model in Los Angeles (Figure S7), but for long-term predictions accuracy is improved at all concentrations (Figure S6). The improvement in predicting both short- and long-term AQS and fixed site NOx in Baltimore when adding data was also due to more accurate predictions at high concentrations (Figures S9 and S10). This improvement was particularly notable with the addition of mobile data.
DISCUSSION
This study advances the literature by incorporating short-term mobile and PSD monitoring data targeting traffic intersections into spatio-temporal models to investigate the potential of improving short and long-term model predictions. In recent years, studies have adopted mobile monitoring data for use in LUR-based models for predicting short-term pollutant concentrations; however, to the best of our knowledge this is the first study to incorporate short-term mobile monitoring data for predicting long-term average concentrations. The unique flexible framework of this ST model allows for spatially and temporally unbalanced monitoring data to make this innovation possible. In addition, all models included spatial smoothing via universal kriging of the long-term average (columns 6 and 7 in Table 3), showing that while PLS scores derived from geographic covariates explained much of the spatial variation in the data, borrowing strength across observations nearby in space results in improvement in those models. Additional spatial information from mobile/PSD monitoring data benefited models by better capturing spatial variation in the pollutant concentrations.
Prediction accuracy of most models was excellent (R2CV > 0.8 as shown in Tables 4 and 5, and estimated NRMSE < 0.33, where NRMSE is RMSE normalized by the mean) in both regions for both pollutants when only using regulatory (i.e., AQS) and MESA fixed site monitoring data with long time series, leaving limited potential for further improvement. However, we find that the addition of short-term mobile or PSD measurements, especially the latter, nevertheless generally increases model prediction performance; substantial improvements are observed in cases where model performance was initially not very good (R2 < 0.65) but not in cases where performance was already very good (R2 > 0.80). Because mobile and PSD measurements are taken on and adjacent to roadways, they are especially well suited to capturing the higher concentrations of traffic-related NO2 and NOx (Figure 2), resulting in improved model performance in predicting at higher-concentration sites closer to roadways (Figures S2–S10). One possible explanation for the improvement in predicting home-site NOx concentrations in Los Angeles, but not in Baltimore, is that more home sites in Los Angeles were close to roadways (26% of those sites in Los Angeles were within 100 m of a roadway, compared with 20% in Baltimore; Figure S11) and that home-site NOx concentrations in Los Angeles were higher than those in Baltimore (median concentration: 38 ppb vs. 15 ppb). For model performance at AQS and fixed sites, larger improvements generally were seen among NOx models with mobile and PSD data in Baltimore (26% and 11% improvement in RMSE, respectively) than in the corresponding NOx models in Los Angeles (4% and 6% improvement in RMSE, respectively). This may also be related to the AQS and fixed site monitors in Baltimore being closer to roadways than those in Los Angeles (31% of the sites in Baltimore were within 50 m of a roadway, compared with 7% of those in Los Angeles).
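The percentage improvements in RMSE quoted above can be reproduced from the two-week RMSE values in Table 5. The short sketch below shows the arithmetic; the dictionary layout and function name are illustrative choices, while the RMSE values are those reported in Table 5.

```python
def percent_improvement(baseline, updated):
    """Relative reduction in RMSE, in percent, compared with the baseline model."""
    return 100.0 * (baseline - updated) / baseline

# Two-week RMSE at AQS and fixed sites for the NOx models (Table 5)
two_week_rmse = {
    "Baltimore": {"base": 9.61, "mobile": 7.07, "psd": 8.56},
    "Los Angeles": {"base": 9.65, "mobile": 9.24, "psd": 9.04},
}

improvements = {
    region: {
        scenario: round(percent_improvement(values["base"], values[scenario]))
        for scenario in ("mobile", "psd")
    }
    for region, values in two_week_rmse.items()
}
```

Rounded to whole percentages, this gives 26% (mobile) and 11% (PSD) for Baltimore and 4% (mobile) and 6% (PSD) for Los Angeles, matching the figures in the text.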
Although the addition of mobile and PSD data to the models resulted in improved pollutant concentration prediction performance, there are at least two additional issues that need to be addressed. First, we found unexpectedly poor performance for the NO2 model using AQS and fixed site data only for home site predictions in Los Angeles compared to other model scenarios. The NO2 models in Los Angeles in general had poorer predictions at home sites compared to NOx models in Los Angeles and NOx models in Baltimore. Part of the explanation for the poor performance of those NO2 models was the presence of a single outlier site, as shown in all three scatter plots in Figure S2. When this site was excluded, the R2CV values increased 6–7% and RMSEs decreased 8%−18%. The high NO2 concentration at this particular residential site was not reflected in a high NOx concentration at the co-located and concurrent sampling period. Second, we noticed the addition of mobile and PSD data worsened R2 and RMSE in two cases (Baltimore NOx models in Table 4 and LA NO2 models in Table 5). It is possible that the added data decreased model overfitting in the original models, thereby resulting in apparently worsened performance measures. Additional research is required to explore this possibility.
Mobile and PSD data were collected at the same locations during the same two-week time periods. Although models using added mobile monitoring data show significant improvement in scenarios where model performance was initially not very good without added data, the improvement is less consistent across scenarios than for models using PSD data. This is likely because mobile measurements were taken during afternoon hours on only 3–5 days and therefore are less representative of two-week averages than are the PSD measurements, which were collected over the entire two-week period. The mobile measurement values also represent the median of all measurements within 300 m of each intersection, while the PSDs were deployed at a fixed location by the intersection. Although the mobile data added less value than did the PSD data, mobile data nevertheless did improve some models and therefore may be of use in exposure studies that do not have the resources to deploy PSD monitors, in studies that focus on traffic-related air pollution exposure, or in studies measuring pollutants for which low-cost sensors are not yet available. These results also suggest that mobile data have the potential to improve model performance even more if they can be sampled to better represent longer time periods, such as the two-week periods used here.
The design of the mobile monitoring campaign in this study was not specifically tailored for air pollution modeling, and therefore had several limitations. First, our mobile monitoring only sampled in the afternoon hours and only on weekdays, a period typically characterized by a higher mixing height and therefore better mixing. This limits the ability of these mobile measurements to serve as surrogates for two-week averages.13 For example, morning hours tend to have the highest traffic-related emissions and relatively low atmospheric mixing depths, and weekends tend to have the worst traffic congestion in certain seasons. Second, each monitoring site in this study was visited only 3–5 times; more visits would likely yield a more representative average.24 Third, we had 59 and 71 sites in the greater Baltimore and Los Angeles areas, respectively, with only 50 and 56 sites within the respective city limits. Recent work, however, has suggested that more than 200 sites are needed to accurately represent intra-urban concentration variability.25 A larger number of monitoring sites may be especially beneficial in a large urban area such as Los Angeles. Mobile monitoring, by its nature, can provide large spatial coverage. However, in this study we focused on establishing a method to aggregate and combine mobile monitoring data into an air pollution exposure model. Therefore, mobile monitoring data were only aggregated into 43 locations for comparison with the sites where PSD monitoring data were available. Future research could leverage the full mobile dataset to increase spatial coverage by including more on-road locations, and could increase the number of repeat visits at those locations to obtain more representative average measurements.
This study demonstrated the feasibility and value of adding short-term mobile monitoring and passive sampling data to long-term fixed site monitoring data in generating spatio-temporal model predictions of both short- and long-term air pollutant concentrations. When aiming to improve exposure predictions for use in epidemiological studies, adding short-term supplemental monitoring data is a potentially useful way of improving fine-scale spatial coverage over both shorter- and longer-term time frames. Mobile monitoring designs may be particularly effective for modeling traffic-related pollutants that cannot be sampled with low-cost sensors.
Supplementary Material
ACKNOWLEDGMENTS
This publication was made possible by USEPA grants (RD831697 and RD83479601-0). Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the USEPA. Further, the USEPA does not endorse the purchase of any commercial products or services mentioned in the publication. Additional support was provided by the National Institute of Environmental Health Sciences-sponsored University of Washington Biostatistics, Epidemiologic and Bioinformatic Training in Environmental Health (BEBTEH) Training Grant (T32ES015459), the National Institute of Environmental Health Sciences-sponsored University of Washington Interdisciplinary Center for Exposures, Diseases, Genomics & Environment Grant (P30ES007033), and the National Institute of Environmental Health Sciences and National Institute on Aging-sponsored Adult Changes in Thought Air Pollution Study (R01ES026187). This publication’s contents are solely the responsibility of the authors and do not necessarily represent the official views of the sponsoring agencies.
References
- 1. Fujita EM; Campbell DE; Arnott WP; Johnson T; Ollison W. Concentrations of mobile source air pollutants in urban microenvironments. Journal of the Air & Waste Management Association 2014, 64 (7), 743–758.
- 2. Mercer LD; Szpiro AA; Sheppard L; Lindström J; Adar SD; Allen RW; Avol EL; Oron AP; Larson T; Liu L-S; Kaufman JD. Comparing universal kriging and land-use regression for predicting concentrations of gaseous oxides of nitrogen (NOx) for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Atmos. Environ. 2011, 45, 4412–4420.
- 3. Westerdahl D; Fruin S; Sax T; Fine PM; Sioutas C. Mobile platform measurements of ultrafine particles and associated pollutant concentrations on freeways and residential streets in Los Angeles. Atmos. Environ. 2005, 39 (20), 3597–3610.
- 4. Westerdahl D; Fruin S; Fine PL; Sioutas C. The Los Angeles International Airport as a source of ultrafine particles and other pollutants to nearby communities. Atmos. Environ. 2008, 42 (13), 3143–3155.
- 5. Riley EA; Banks L; Fintzi J; Gould TR; Hartin K; Schaal L; Davey M; Sheppard L; Larson T; Yost MG; Simpson CD. Multi-pollutant mobile platform measurements of air pollutants adjacent to a major roadway. Atmos. Environ. 2014, 98, 492–499.
- 6. Van den Bossche J; Peters J; Verwaeren J; Botteldooren D; Theunis J; De Baets B. Mobile monitoring for mapping spatial variation in urban air quality: development and validation of a methodology based on an extensive dataset. Atmos. Environ. 2015, 105, 148–161.
- 7. Montagne DR; Hoek G; Klompmaker JO; Wang M; Meliefste K; Brunekreef B. Land use regression models for ultrafine particles and black carbon based on short-term monitoring predict past spatial variation. Environ. Sci. Technol. 2015, 49 (14), 8712–8720.
- 8. Larson T; Gould T; Riley EA; Austin E; Fintzi J; Sheppard L; Yost M; Simpson C. Ambient air quality measurements from a continuously moving mobile platform: Estimation of area-wide, fuel-based, mobile source emission factors using absolute principal component scores. Atmos. Environ. 2017, 152, 201–211.
- 9. Kerckhoffs J; Hoek G; Vlaanderen J; van Nunen E; Messier K; Brunekreef B; Gulliver J; Vermeulen R. Robustness of intra urban land-use regression models for ultrafine particles and black carbon based on mobile monitoring. Environmental Research 2017, 159, 500–508.
- 10. Hoek G. Methods for assessing long-term exposures to outdoor air pollutants. Curr. Envir. Health Rpt. 2017, 4 (4), 450–462.
- 11. Levy I; Mihele C; Lu G; Narayan J; Hilker N; Brook JR. Elucidating multipollutant exposure across a complex metropolitan area by systematic deployment of a mobile laboratory. Atmospheric Chemistry and Physics 2014, 14, 7173–7193.
- 12. Riley E; Schaal L; Sasakura M; Crampton R; Gould T; Hartin K; Sheppard L; Larson T; Simpson C; Yost M. Correlations between short-term mobile monitoring and long-term passive sampler measurements of traffic-related air pollution. Atmos. Environ. 2016, 132, 229–239.
- 13. Tessum MW; Larson T; Gould TR; Simpson CD; Yost MG; Vedal S. Mobile and fixed-site measurements to identify spatial distributions of traffic-related pollution sources in Los Angeles. Environ. Sci. Technol. 2018, 52 (5), 2844–2853.
- 14. Chambliss S; Preble CV; Caubel JJ; Cados T; Messier KP; Alvarez RA; LaFranchi B; Lunden MM; Marshall JD; Szpiro AA; Kirchstetter TW; Apte JS. Comparison of Mobile and Fixed-Site Black Carbon Measurements for High-Resolution Urban Pollution Mapping. Environ. Sci. Technol. 2020, DOI: 10.1021/acs.est.0c01409.
- 15. Pirjola L; Lahde T; Niemi J; Kousa A; Ronkko T; Karjalainen P; Keskinen J; Frey A; Hillamo R. Spatial and temporal characterization of traffic emissions in urban microenvironments with a mobile laboratory. Atmos. Environ. 2012, 63, 156–167.
- 16. Szpiro AA; Sampson PD; Sheppard L; Lumley T; Adar SD; Kaufman JD. Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies. Environmetrics 2010, 21, 606–631.
- 17. Sampson PD; Szpiro AA; Sheppard L; Lindström J; Kaufman JD. Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data. Atmos. Environ. 2011, 45, 6593–6606.
- 18. Lindström J; Szpiro AA; Sampson PD; Oron AP; Richards M; Larson TV; Sheppard L. A flexible spatio-temporal model for air pollution with spatiotemporal covariates. Environ. Ecol. Stat. 2014, 21, 411–433.
- 19. Keller JP; Olives C; Kim SY; Sheppard L; Sampson PD; Szpiro AA; Oron AP; Lindstrom J; Vedal S; Kaufman JD. A unified spatiotemporal modeling approach for predicting concentrations of multiple air pollutants in the multi-ethnic study of atherosclerosis and air pollution. Environ. Health Perspect. 2015, 123 (4), 301–309.
- 20. Cohen MA; Adar SD; Allen RW; Avol E; Curl CL; Gould T; Hardie D; Ho A; Kinney P; Larson TV; Sampson P; Sheppard L; Stukovsky KD; Swan SS; Liu LJS; Kaufman JD. Approach to Estimating Participant Pollutant Exposures in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Environ. Sci. Technol. 2009, 43, 4687–4693.
- 21. Lindström J; Szpiro AA; Sampson PD; Bergen S; Oron AP. SpatioTemporal: Spatio-Temporal Model Estimation. R package version 1.1.9. 2018. https://CRAN.R-project.org/package=SpatioTemporal.
- 22. Sampson PD; Richards M; Szpiro AA; Bergen S; Sheppard L; Larson TV; Kaufman JD. A regionalized national universal kriging model using partial least squares regression for estimating annual PM2.5 concentrations in epidemiology. Atmos. Environ. 2013, 75, 383–392.
- 23. Mevik BH; Wehrens R; Liland KH. PLS: Partial Least Squares and Principal Component regression. R Package Version 2.3-0. 2011. http://cran.r-project.org/web/packages/pls/index.html.
- 24. Apte JS; Messier KP; Gani S; Brauer M; Kirchstetter TW; Lunden MM; Marshall JD; Portier CJ; Vermeulen RCH; Hamburg SP. High-Resolution Air Pollution Mapping with Google Street View Cars: Exploiting Big Data. Environ. Sci. Technol. 2017, 51, 6999–7008.
- 25. Hatzopoulou M; Valois MF; Levy I; Mihele C; Lu G; Bagg S; Minet L; Brook J. Robustness of Land-Use Regression Models Developed from Mobile Air Pollutant Measurements. Environ. Sci. Technol. 2017, 51, 3938–3947.