Improved Estimation of Trends in U.S. Ozone Concentrations Adjusted for Interannual Variability in Meteorological Conditions

Benjamin Wells; Pat Dolwick; Brian Eder; Mark Evangelista; Kristen Foley; Elizabeth Mannshardt; Chris Misenis; Anthony Weishampel

doi:10.1016/j.atmosenv.2021.118234

Author manuscript; available in PMC: 2022 Mar 1.

Published in final edited form as: Atmos Environ (1994). 2021 Mar 1;248:118234. doi: 10.1016/j.atmosenv.2021.118234

PMCID: PMC7995240 NIHMSID: NIHMS1680892 PMID: 33776540

Abstract

Daily maximum 8-hour average (MDA8) ozone (O₃) concentrations are well known to be influenced by local meteorological conditions, which vary across both daily and seasonal temporal scales. Previous studies have adjusted long-term trends in O₃ concentrations for meteorological effects using various statistical and mathematical methods in order to get a better estimate of the long-term changes in O₃ concentrations due to changes in precursor emissions such as nitrogen oxides (NO_X) and volatile organic compounds (VOCs). In this work, the authors present improvements to the current method used by the United States Environmental Protection Agency (US EPA) to adjust O₃ trends for meteorological influences by making refinements to the input data sources and by allowing the underlying statistical model to vary locally using a variable selection procedure. The current method is also expanded by using a quantile regression model to adjust trends in the 90th and 98th percentiles of the distribution of MDA8 O₃ concentrations, allowing for a better understanding of the effects of local meteorology on peak O₃ levels in addition to seasonal average concentrations. The revised method is used to adjust trends in the May to September mean, 90th percentile, and 98th percentile MDA8 O₃ concentrations at over 700 monitoring sites in the U.S. for years 2000 to 2016. The utilization of variable selection and quantile regression allows for a more in-depth understanding of how weather conditions affect O₃ levels in the U.S. This represents a fundamental advancement in our ability to understand how interannual variability in weather conditions in the U.S. may impact attainment of the O₃ National Ambient Air Quality Standards (NAAQS).

Keywords: ozone, meteorology, trends, statistics, variable selection, quantile regression

1. Introduction

In the Earth’s troposphere, ozone (O₃) is a secondary atmospheric pollutant formed through photochemical reactions between precursor gases such as nitrogen oxides (NOx) and volatile organic compounds (VOC). O₃ is known to cause respiratory inflammation in humans and is one of six common air pollutants for which National Ambient Air Quality Standards (NAAQS) are set by the United States Environmental Protection Agency (US EPA) as mandated by the Clean Air Act. O₃ formation requires sunlight, and the reaction rate generally increases with warmer temperatures. Additionally, variations in local meteorological conditions related to humidity, transport and atmospheric stability can play an important role in determining O₃ concentrations.

Adjustment of long-term trends in O₃ concentrations for the influence of meteorological variability has been a subject of scientific study since the early 1990s (Thompson et al., 2000). Early methods typically employed well-known statistical approaches such as linear and nonlinear regression using a small set of observed meteorological variables (Cox and Chu, 1993; Bloomfield et al., 1996). Over the next decade, more sophisticated approaches such as Kolmogorov-Zurbenko (KZ) filters (Milanchus et al., 1998), Bayesian techniques (Huang and Smith, 1999), neural networks (Lu and Chang, 2005), and Principal Component Analysis (PCA; Kovač-Andrić et al., 2009) were employed. In recent years, these approaches have been extended further to include the latest statistical and numerical methods such as quantile regression (Porter et al., 2015), machine learning algorithms (Grange et al., 2018), and convergent cross mapping (Chen et al., 2019). However, most of these newer methods continue to focus on the relationship between O₃ and only a small number of meteorological variables, often covering only a small geographic region. These methods may not be more broadly applicable over large geographic regions such as the contiguous U.S., where different meteorological factors may drive O₃ formation in different locations. Most newer studies also continue to focus on seasonal mean O₃ concentrations, which precludes their use in assessing the effects of meteorology on days with higher concentrations that are most relevant to the form of the O₃ NAAQS.

The study by Porter et al. (2015) addresses these concerns by including a large number of meteorological predictor variables at a large number of O₃ monitoring stations across the contiguous U.S. The authors also use quantile regression with a variable selection component to identify important meteorological predictors that vary both across geographic regions and across various points in the distribution of MDA8 O₃ concentrations. Using gridded meteorological reanalysis data at 32 km spatial resolution, the authors make many useful observations about the relative importance of various meteorological predictors across the distribution of MDA8 O₃ concentrations for 10 broad U.S. geographic regions. Here, the authors extend the work of Porter et al. (2015) by incorporating observational meteorology data and by providing meteorologically adjusted trends in both seasonal average and peak MDA8 O₃ concentrations.

The US EPA uses a statistical model to adjust trends in the seasonal average of the daily maximum 8-hour (MDA8) O₃ concentrations for weather-related variability in order to provide a more accurate assessment of the underlying trend in O₃ concentrations caused by changes in precursor emissions (https://www.epa.gov/air-trends/trends-ozone-adjusted-weather-conditions). The current technical approach for performing these adjustments is based on a publication by Camalier et al. (2007). Briefly, this approach employs a Generalized Linear Model (GLM) to establish statistical relationships between seasonal mean O₃ concentrations in an urban area and meteorological parameters either measured or calculated from data at a nearby weather station. The GLM is then used to adjust the observed O₃ trend by estimating the seasonal mean O₃ concentrations that would result in the absence of interannual meteorological variability, or in other words, the trend that would result if meteorological conditions each year were similar to the long-term average. Here, the work of Camalier et al. (2007) is extended to also consider peak concentrations, which better align with the form of the O₃ NAAQS.

In this work, the authors explore potential refinements to the existing method presented in Camalier et al. (2007) for adjusting O₃ trends for meteorological effects, which are listed below.

In urban areas, the existing approach adjusts the trend based on the maximum MDA8 O₃ concentration observed across the area on each day of the season, defined as May to September. In large urban areas, such as Los Angeles, CA, both O₃ concentrations and weather conditions may vary significantly, and the existing approach precludes exploring how meteorology may impact trends in O₃ observed at individual monitoring sites within the area. Therefore, the adjustment of O₃ trends at individual monitoring sites is explored in order to better understand how local weather conditions may impact O₃ concentrations across an urban area.
The existing approach matches O₃ concentration data with meteorological data by pairing O₃ monitoring sites in an urban area with observations from a nearby weather station, usually an airport or Air Force base. However, these pairings were created manually and sometimes involve distances of 50 km or more between the O₃ and meteorological measurement stations, especially in large urban areas. In addition, the existing approach requires data coverage for the entire trend period at both the weather station and the O₃ monitoring sites in order to fit the GLM. This work explores alternative approaches such as the use of gridded meteorological data products and the use of Kriging techniques to estimate meteorological conditions at the O₃ monitoring site locations.
In the existing approach, the GLM is fit using a constant set of 9 meteorological variables (the 8 variables listed in Table 2 of Camalier et al. (2007), plus Julian Day) for each urban area. These variables were included in the model based on the frequency of statistical significance in a set of 39 eastern U.S. urban areas. The existing approach does not consider that meteorological conditions contributing to or inhibiting O₃ formation may be different in the western U.S., nor does it acknowledge that meteorological conditions affecting MDA8 O₃ concentrations may vary by location. An alternative approach explored here identifies the set of meteorological variables for each O₃ monitoring site which are most strongly associated with the observed MDA8 O₃ concentrations.
The existing approach adjusts the trend in the annual May to September mean MDA8 O₃ concentration. While understanding the effects of meteorology on seasonal mean concentrations can be useful, the form of the O₃ NAAQS is based on the annual 4th highest MDA8 O₃ concentration. Thus, policy decisions are primarily focused on achieving reductions in peak O₃ concentrations. However, studies have shown that reductions in peak O₃ concentrations achieved by reducing precursor emissions often do not translate into reductions in seasonal mean O₃ concentrations, especially in urban areas (Lefohn et al., 2017; Simon et al., 2015). Thus, this work newly explores the effects of meteorology on peak O₃ concentrations by developing quantile regression methods for adjusting trends in the 90th and 98th percentiles of the distribution of May to September MDA8 O₃ concentrations.

An incremental testing approach is applied in this study: starting from the existing approach, one aspect of the input data or statistical method is altered, and the model is re-fit. The resulting model is then compared against the previous fit. If the change improves the model fit or does not degrade performance by including a technical improvement, the alteration is kept and used as the baseline in the next set of comparisons. In this way, each component of the model that is changed can be evaluated without confounding influences from other components.

2. Input Data

The statistical model used to adjust MDA8 O₃ trends for meteorological influences requires four types of input data: O₃ concentration data, surface meteorological data, above ground atmospheric data, and atmospheric transport data. The sections below describe the sources and processing for each data type. The input data sources for the existing approach are described in Camalier et al. (2007).

a. O₃ Concentration Data

MDA8 O₃ concentration data for over 1,900 monitoring stations which operated in the U.S. between 2000 and 2016 were retrieved from the US EPA’s Air Quality System (AQS, https://www.epa.gov/aqs/) database. Monitoring sites in the contiguous U.S. which reported MDA8 O₃ concentrations for at least 100 days in the months of May through September in each year from 2000 to 2016 were included in the analysis. A total of 702 monitoring sites in 47 states met these requirements.
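The completeness screen described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `qualifying_sites` and the input layout (one `(site_id, date)` pair per valid MDA8 value) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

def qualifying_sites(records, first_year=2000, last_year=2016, min_days=100):
    """Return sites with at least `min_days` May-September MDA8 values in every year.

    records: iterable of (site_id, date) pairs, one per valid daily MDA8 value
    (a hypothetical input layout chosen for this sketch).
    """
    days = defaultdict(set)  # (site, year) -> distinct May-September dates
    for site, day in records:
        if 5 <= day.month <= 9 and first_year <= day.year <= last_year:
            days[(site, day.year)].add(day)
    sites = {site for site, _ in days}
    years = range(first_year, last_year + 1)
    # A site qualifies only if every year in the trend period meets the threshold.
    return sorted(
        s for s in sites
        if all(len(days.get((s, y), ())) >= min_days for y in years)
    )
```

A site that falls short in even one year is excluded, which matches the requirement that coverage hold in each year of the trend period.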

b. Surface Meteorological Data

Hourly surface measurement data, including temperature, dew point temperature, atmospheric pressure, wind speed and direction, cloud cover, and precipitation were downloaded from the Integrated Surface Database (ISD, ftp://ftp.ncdc.noaa.gov/pub/data/noaa/) maintained by the National Oceanic and Atmospheric Administration (NOAA). The surface dataset consisted of measurements from over 4,500 stations in the U.S. which reported data between 2000 and 2016. The number of stations reporting data in individual years varied, ranging from 925 in 2001 to 2,811 in 2013. The hourly measurements were then used to calculate 8 daily parameters. The hourly variables are listed in Table 1 and daily parameters are listed in Table 2.

Table 1.

Raw data inputs included in meteorological adjustment dataset

Parameter Name	Parameter Description	Units	Data Source, Frequency
T	Temperature	°C	ISD, hourly
DPT	Dew point temperature	°C	ISD, hourly
SLP	Sea level pressure	mb	ISD, hourly
WD	Wind direction	degrees	ISD, hourly
WS	Wind speed	m/s	ISD, hourly
CC	Cloud cover	oktas	ISD, hourly
PCP	Precipitation	mm	ISD, hourly
APCP	Accumulated total precipitation	mm	NARR, 3-hourly
AIR.2m	Air temperature at 2m above ground level	°C	NARR, 3-hourly
DSWRF	Downward shortwave radiation flux	W/m²	NARR, 3-hourly
DPT.2m	Dew point temperature at 2m above ground level	°C	NARR, 3-hourly
HPBL	Planetary boundary layer height	m	NARR, 3-hourly
RHUM.2m	Relative humidity at 2m above ground level	%	NARR, 3-hourly
TCDC	Total cloud cover	%	NARR, 3-hourly
UWND.10m	U component of wind speed at 10m above ground level	m/s	NARR, 3-hourly
VWND.10m	V component of wind speed at 10m above ground level	m/s	NARR, 3-hourly
AIR.925	Air temperature at 925 mb	°C	NARR, 3-hourly
AIR.850	Air temperature at 850 mb	°C	NARR, 3-hourly
AIR.700	Air temperature at 700 mb	°C	NARR, 3-hourly
AIR.500	Air temperature at 500 mb	°C	NARR, 3-hourly
LAT.HH	Latitude of backward trajectory at HH hours (HH=0–24)	degrees	HYSPLIT, hourly
LON.HH	Longitude of backward trajectory at HH hours (HH=0–24)	degrees	HYSPLIT, hourly
HGT.HH	Height of backward trajectory at HH hours (HH=0–24)	m	HYSPLIT, hourly


Table 2.

Daily data inputs included in meteorological adjustment dataset for variable selection

Parameter Name	Parameter Description	Units
TMAX	Daily maximum surface temperature	°C
DPTMID	Mid-day (10 AM - 4 PM LST) average dewpoint temperature	°C
RHMID	Mid-day (10 AM - 4 PM LST) average relative humidity	%
WDAM	Morning (7–10 AM LST) average wind direction	degrees
WDPM	Afternoon (1–4 PM LST) average wind direction	degrees
WSAM	Morning (7–10 AM LST) average wind speed	m/s
WSPM	Afternoon (1–4 PM LST) average wind speed	m/s
CCDAY	Daytime (7 AM - 7 PM LST) average cloud cover	oktas
DT925	NARR surface - 925 mb temperature difference at 2100 UTC	°C
DT850	NARR surface - 850 mb temperature difference at 2100 UTC	°C
DT700	NARR surface - 700 mb temperature difference at 2100 UTC	°C
DT500	NARR surface - 500 mb temperature difference at 2100 UTC	°C
DEVT925	NARR 925 mb temperature anomaly at 2100 UTC	°C
DEVT850	NARR 850 mb temperature anomaly at 2100 UTC	°C
DEVT700	NARR 700 mb temperature anomaly at 2100 UTC	°C
DEVT500	NARR 500 mb temperature anomaly at 2100 UTC	°C
HPBLMAX	Daily maximum NARR planetary boundary layer height	m
SOLRAD	Daily sum of NARR downward shortwave radiation flux	W/m²
TDIR12	12-hour transport direction starting at 2100 UTC	degrees
TDIR24	24-hour transport direction starting at 2100 UTC	degrees
TDIS12	12-hour transport distance starting at 2100 UTC	km
TDIS24	24-hour transport distance starting at 2100 UTC	km
LOD	Length of daylight (time from sunrise to sunset)	minutes
JDAY	Julian day (1 = 1-Jan, 2 = 2-Jan, …, 365/366=31-Dec)	none


The values of the daily surface meteorological parameters were estimated at each of the 702 O₃ monitoring sites. This was accomplished by first fitting a thin plate spline to the observations, then evaluating the thin plate spline at the O₃ monitoring site locations (Green and Silverman, 1994). The spline fitting and evaluation were implemented using the Tps function in the fields package (Nychka et al., 2017) of the R statistical computing environment (R Core Team, 2019).

c. Above Ground Atmospheric Data

The North American Regional Reanalysis (NARR) gridded dataset maintained by NOAA contains dozens of meteorological variables at the surface, subsurface, and various above ground atmospheric levels covering the North American continent at 32 km × 32 km spatial resolution and 3-hourly (or 8x daily) temporal resolution. These data are assimilated from surface stations, radiosondes, satellites, aircraft and other observational sources to produce a long-term picture of weather over North America from 1979 to present (Mesinger et al., 2006).

NARR 3-hourly data for land cells covering the contiguous U.S. were downloaded from a NOAA website (ftp://ftp.cdc.noaa.gov/Datasets/NARR/) for 9 surface variables and 4 above ground atmospheric variables for years 2000 to 2016. The 3-hourly data were used to calculate 10 daily parameters, which were then paired with the MDA8 O₃ concentrations by retrieving the values from the grid cells containing the O₃ monitoring sites. Long-term monthly means based on years 1979–2000 were also downloaded for use in some calculations. The 3-hourly variables are listed in Table 1 and the daily parameters are listed in Table 2.
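As a rough illustration of how 3-hourly values might be reduced to the daily parameters in Table 2, the sketch below derives HPBLMAX, SOLRAD, and DT850 for one grid cell. The record layout, the function name `daily_narr_parameters`, grouping by UTC calendar date, and the use of AIR.2m as the "surface" temperature are all assumptions for this example; the text does not specify these details.

```python
from collections import defaultdict

def daily_narr_parameters(records):
    """Reduce 3-hourly NARR records for one grid cell to daily parameters.

    records: iterable of dicts with keys 'date' (ISO string), 'hour_utc',
    'HPBL', 'DSWRF', 'AIR.2m', 'AIR.850' (hypothetical layout for this sketch).
    """
    by_day = defaultdict(list)
    for r in records:
        by_day[r["date"]].append(r)  # assumption: days are split by UTC date

    out = {}
    for day, recs in sorted(by_day.items()):
        # Record valid at 2100 UTC, if present for this day.
        at_21 = next((r for r in recs if r["hour_utc"] == 21), None)
        out[day] = {
            "HPBLMAX": max(r["HPBL"] for r in recs),   # daily maximum PBL height
            "SOLRAD": sum(r["DSWRF"] for r in recs),   # daily sum of shortwave flux
            # Surface minus 850 mb temperature at 2100 UTC (AIR.2m assumed as surface).
            "DT850": None if at_21 is None else at_21["AIR.2m"] - at_21["AIR.850"],
        }
    return out
```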

d. Atmospheric Transport Data

The Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model developed by NOAA (Draxler and Hess, 1998) is one of the most widely used models for atmospheric trajectory and dispersion calculations. The model source code was downloaded (https://www.ready.noaa.gov/HYSPLIT.php) and used to run 24-hour backwards trajectories (i.e., trajectories estimating the location of the air parcel currently at the O₃ monitoring site 24 hours in the past) at a starting height of 300 m above ground level from the 702 O₃ monitoring sites for each hour in years 2000 to 2016. The trajectories starting at 2100 Universal Coordinated Time (UTC; 2100 UTC corresponds to mid-afternoon in the U.S.) were used to calculate a daily transport distance (i.e., the distance between the starting and ending points of the trajectory, in km) and transport direction (i.e., the angle between the starting and ending points of the trajectory, in degrees clockwise from due north) for each O₃ monitoring site (see Table 2).
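The transport distance and direction can be illustrated from the two trajectory endpoints alone. The sketch below assumes a spherical Earth, the haversine formula for distance, and the initial great-circle bearing measured from the monitoring site toward the 12- or 24-hour endpoint; the text does not state which formulas or direction convention were used, so these choices and the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def transport_distance_km(lat0, lon0, lat1, lon1):
    """Great-circle (haversine) distance between two points, in km."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dphi = p1 - p0
    dlmb = math.radians(lon1 - lon0)
    a = math.sin(dphi / 2) ** 2 + math.cos(p0) * math.cos(p1) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def transport_direction_deg(lat0, lon0, lat1, lon1):
    """Initial bearing from point 0 toward point 1, degrees clockwise from north."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dlmb = math.radians(lon1 - lon0)
    y = math.sin(dlmb) * math.cos(p1)
    x = math.cos(p0) * math.sin(p1) - math.sin(p0) * math.cos(p1) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0
```

Here point 0 would be the monitoring site (LAT.00, LON.00 in the notation of Table 1) and point 1 the trajectory position at 12 or 24 hours.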

A sensitivity analysis was performed to assess the viability of using the NARR data in place of both the surface measurements and the Integrated Global Radiosonde Archive (IGRA) measurement data used in the existing approach. Three datasets were compared for the set of 112 urban areas and 49 rural monitoring sites used by the EPA in their meteorologically adjusted trends for 2000–2016. The first dataset used the input data based on Camalier et al. (2007), the second dataset used the closest NARR equivalent for each variable, and the third “hybrid” dataset used the Camalier et al. (2007) input data for the surface variables and the NARR data for the upper-air variables.

This analysis showed that the NARR data resulted in a poorer model fit compared to the Camalier et al. (2007) data, with a mean decrease of 0.08 in Pearson R-squared (R²). However, the hybrid dataset produced a model fit similar to Camalier et al. (2007) and virtually no change in the magnitude of the adjustments to the trend. The improved spatial and temporal resolution and the elimination of missing values in the data were considered a technical improvement; therefore, the NARR data were used in place of the IGRA measurements for the upper-air variables going forward. Further details and results from this analysis are presented in the Supplemental Information.

A second sensitivity analysis was performed to assess how the use of Kriging to estimate the values of the meteorological parameters at the O₃ monitoring sites compared to the existing approach. The “paired” dataset from Camalier et al. (2007) where MDA8 O₃ concentrations were paired with surface data from a nearby weather station was compared to an “interpolated” dataset where meteorological variables were estimated at each O₃ monitoring site using thin plate splines. In addition, transport variables were determined using HYSPLIT backward trajectories originating from the weather station in the paired dataset and originating from the O₃ monitoring sites in the interpolated dataset. This comparison was performed for a set of six urban areas (with trends fit for all O₃ monitoring sites in each area) and nine rural O₃ monitoring sites chosen for geographic diversity.

The results from this analysis showed that Kriging resulted in a slight net improvement in model fit (mean increase of 0.03 in R²) compared to the existing method. Rural and suburban O₃ monitoring sites that were farther from the nearest weather stations tended to have the largest improvements in model fit, while urban O₃ monitoring sites that were closer to the weather stations tended to have little change in model fit. The relative importance of each parameter varied by location; however, parameters that vary on smaller spatial scales, such as wind speed, generally tended to be better predictors using the interpolated dataset. While the urban trends fit using individual O₃ monitoring sites were generally lower than the trends fit using the area-wide highest MDA8 O₃ concentration on each day, as expected, the shapes of the trend lines tended to be similar. Therefore, the interpolated dataset described above was used as input to the statistical model fit at each O₃ monitoring site going forward. Further details and results from this sensitivity analysis are presented in the Supplemental Information.

3. Statistical Methods

The statistical model used in Camalier et al. (2007) takes the form of a Generalized Linear Model (GLM) which can be expressed by the equation:

$$g(\mu_i) = \alpha_0 + \sum_{j=1}^{3} \sum_{k=1}^{9} \beta_{j,k} \, f_j(x_{i,k}) + \sum_{p=1}^{N} \gamma_p \, Y_{i,p} + \sum_{d=1}^{7} \delta_d \, W_{i,d}$$

Equation 1

The term $g(\mu_i)$ represents the link function $g$ (McCullagh and Nelder, 1989) for the expected MDA8 O₃, $\mu_i$, on the $i$th day. A diagnostic evaluation showed that the log link function continues to be the most appropriate choice for the 2000 to 2016 trend period. The term $\alpha_0$ represents the overall mean response. The functions $f_j(x_{i,k})$ represent the $j$th order natural spline function for the $k$th meteorological variable on the $i$th day, and the terms $\beta_{j,k}$ represent the effects based on each combination of function and variable. The natural spline functions allow for both linear and nonlinear effects for each meteorological variable. The terms $\gamma_p$ represent the effects for the $p$th year, where $Y_{i,p} = \begin{cases} 1, & \text{day } i \text{ in year } p \\ 0, & \text{otherwise} \end{cases}$, and finally the terms $\delta_d$ represent the effects for the $d$th weekday (1=Sunday, 2=Monday, …, 7=Saturday), where $W_{i,d} = \begin{cases} 1, & \text{day } i \text{ is weekday } d \\ 0, & \text{otherwise} \end{cases}$.

Using the resulting fit from the GLM above, the adjusted seasonal mean MDA8 O₃ trend can be calculated as follows:

$$F(Y_p) = \exp\left(\hat{\alpha}_0 + \hat{\gamma}_p - \frac{1}{N} \sum_{i=1}^{N} \hat{\gamma}_i\right), \quad i = 1, 2, \dots, N$$

Equation 2

where $F(Y_p)$ is the adjusted seasonal mean MDA8 O₃ value for year $p$, $\hat{\alpha}_0$ and $\hat{\gamma}_p$ are the effects estimates for $\alpha_0$ and $\gamma_p$, respectively, and $N$ is the number of years in the trend period. The mean of the $\hat{\gamma}_p$ terms is subtracted so that the adjusted values are centered on the long-term average rather than the first year.
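Equation 2 is simple to evaluate once the intercept and year effects have been estimated. The sketch below is only a direct transcription of the formula; the function name and the example coefficient values are hypothetical and do not come from any fitted model.

```python
import math

def adjusted_seasonal_means(alpha0_hat, gamma_hat):
    """Evaluate Equation 2 for every year in the trend period.

    alpha0_hat: estimated intercept on the log scale.
    gamma_hat:  list of estimated year effects, one per year.
    """
    mean_gamma = sum(gamma_hat) / len(gamma_hat)
    # exp() inverts the log link, returning values in concentration units.
    return [math.exp(alpha0_hat + g - mean_gamma) for g in gamma_hat]

# Hypothetical values: intercept of log(50 ppb) and three year effects.
example = adjusted_seasonal_means(math.log(50.0), [0.1, 0.0, -0.1])
```

Because the year effects are centered before exponentiating, the mean of the log-scale adjusted values equals the estimated intercept.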

Starting with the initial set of 24 daily meteorological variables listed in Table 2, an automated variable selection procedure was employed to determine the meteorological variables most likely to affect MDA8 O₃ concentrations at each monitoring station. The procedure is a form of forward selection: starting with a model that includes only the intercept and year terms, each of the 24 variables in Table 2 is added to the model, and the variable which has the best predictive power in terms of the lowest Akaike Information Criterion (AIC; Akaike, 1974) is selected. Next, each of the remaining variables is in turn added to the model, and the resulting two-variable model with the lowest AIC is selected, and so on. The selection process is ended when ten meteorological variables are included in the model, or when none of the remaining variables reduce the AIC.
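The forward selection loop described above can be sketched as follows. To keep the example self-contained, model fitting is abstracted behind a caller-supplied function `aic(terms)`, which is hypothetical: in practice it would fit the model with the listed terms and return its AIC. The function and argument names are illustrative and are not taken from the authors' code.

```python
def forward_select(candidates, base_terms, aic, max_vars=10):
    """Greedy forward selection on AIC.

    candidates: names of candidate meteorological variables.
    base_terms: terms always in the model (e.g., intercept and year).
    aic:        hypothetical callable mapping a list of terms to the fitted AIC.
    """
    selected = []
    remaining = list(candidates)
    best_aic = aic(base_terms)
    while remaining and len(selected) < max_vars:
        # Try adding each remaining variable to the current model.
        scores = {v: aic(base_terms + selected + [v]) for v in remaining}
        best_var = min(scores, key=scores.get)
        if scores[best_var] >= best_aic:
            break  # no remaining variable reduces the AIC
        selected.append(best_var)
        remaining.remove(best_var)
        best_aic = scores[best_var]
    return selected
```

The loop stops at whichever condition is met first: `max_vars` variables selected, or no candidate that lowers the AIC.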

Two customized features were added to the variable selection process described above to safeguard against overfitting and multicollinearity. First, similar terms were excluded from the list of candidate variables in Table 2. Specifically, any time a variable from one of the sets {DT500, DT700, DT850, DT925}, {DEVT925, DEVT850, DEVT700, DEVT500}, {TDIR12, TDIR24} or {TDIS12, TDIS24} was selected, the other candidate variable(s) in that set were removed from further consideration. Second, each time a new variable was selected, the Pearson correlation between that variable and each of the remaining candidate variables was calculated, and any candidate variables whose correlation exceeded 0.8 were removed from further consideration. Additionally, the day of week term was not included in the model as it was deemed that this variable was intended to represent weekday-weekend effects, which are driven by changes in precursor emissions and not meteorology. Finally, the directional variables WDAM, WDPM, TDIR12, and TDIR24, which were treated as continuous variables in Camalier et al. (2007), were treated as factor variables with eight levels representing the cardinal and ordinal directions (i.e., N, NE, E, SE, S, SW, W, NW). This was implemented by dividing the 360-degree circle into eight equal portions (e.g., angles between 22.5 and 67.5 degrees clockwise from due north were assigned the NE factor).
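The two safeguards and the directional binning can be illustrated with a short sketch. Several details are assumptions made for the example: the sector boundaries are treated as closed on the lower edge, the correlation test uses the absolute value of the Pearson coefficient, and the pruning helper expects numeric columns only. All function names are hypothetical.

```python
import math

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def direction_sector(angle_deg):
    """Map degrees clockwise from north to one of eight 45-degree sectors."""
    return SECTORS[int(((angle_deg % 360.0) + 22.5) // 45.0) % 8]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Sets of similar terms from which at most one variable may be selected.
EXCLUSIVE_SETS = [
    {"DT500", "DT700", "DT850", "DT925"},
    {"DEVT925", "DEVT850", "DEVT700", "DEVT500"},
    {"TDIR12", "TDIR24"},
    {"TDIS12", "TDIS24"},
]

def prune_candidates(chosen, remaining, data, threshold=0.8):
    """Drop candidates sharing a set with `chosen` or too correlated with it.

    data: dict mapping variable name to a list of numeric daily values.
    """
    blocked = set()
    for group in EXCLUSIVE_SETS:
        if chosen in group:
            blocked |= group
    kept = []
    for v in remaining:
        if v in blocked:
            continue
        # Assumption: compare the absolute correlation with the threshold.
        if abs(pearson(data[chosen], data[v])) > threshold:
            continue
        kept.append(v)
    return kept
```

For example, `direction_sector(45.0)` falls in the NE sector, and angles just below 360 degrees wrap around to N.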

The model described above was extended to adjust trends in peak MDA8 O₃ levels using a quantile regression model (Koenker, 2005). For a given quantile, τ, the regression model takes a form similar to Equation 1, with the adjusted trend calculated as in Equation 2. The variable selection procedure described above was applied to the input data for each of the 702 O₃ monitoring sites using quantile regression for τ = 0.5, 0.9 and 0.98, corresponding to the median, 90th and 98th percentiles of the annual May to September distribution of MDA8 O₃ concentrations, respectively. The case of τ = 0.98 is of particular interest since it corresponds to approximately the annual 4th highest MDA8 O₃ concentration (based on a May to September O₃ season), which is the form of the O₃ NAAQS. The case of τ = 0.5 is intended as a sensitivity analysis: assuming the distribution of MDA8 O₃ concentrations is approximately lognormal, log-linear models for the mean and median MDA8 O₃ concentrations should produce similar results. The quantile regression model fitting was implemented using the quantreg package in the statistical software R (Koenker, 2019).
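The link between τ = 0.98 and the 4th highest value can be checked numerically. Quantile regression minimizes the check (pinball) loss; with no predictors, the minimizer is a sample quantile. The sketch below applies this to a toy season of distinct daily values. The data are hypothetical and the helper names are illustrative, and the example is an intercept-only fit, not the full model.

```python
from datetime import date

def pinball_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant_quantile(values, tau):
    """Sample value minimizing total pinball loss (intercept-only quantile fit)."""
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))

# Number of days from May 1 through September 30.
season_days = (date(2016, 10, 1) - date(2016, 5, 1)).days

# Hypothetical season with one distinct value per day.
toy_mda8 = list(range(1, season_days + 1))
q98 = best_constant_quantile(toy_mda8, 0.98)

# Position of the fitted quantile when values are ranked from the top.
rank_from_top = sorted(toy_mda8, reverse=True).index(q98) + 1
```

With a 153-day season, about 2% of days (roughly three) lie above the 98th percentile, so the fitted value in this toy case is the 4th highest.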

4. Results and Discussion

Figure 1 shows a map of the R² statistics based on the fitted GLMs using the variable selection procedure described above and a map comparing those values with the R² statistics based on the model parameterization in Camalier et al. (2007) for each of the 702 O₃ monitoring sites. About 85% of the O₃ monitoring sites had R² statistics between 0.6 and 0.8, with a median R² value of 0.71. The model fits were generally good in the eastern U.S.; however, isolated instances where the R² statistics were as low as 0.4 occurred at some rural sites in the western U.S. Nationally, the median R² statistic increased by 0.02 using the variable selection procedure, with the largest increases occurring at sites near the Gulf of Mexico and urban areas in the western U.S. Overall, more than 95% of the O₃ monitoring sites had an increase in R² using the variable selection procedure. The large decreases in the R² statistics at some sites near the Los Angeles, CA metropolitan area were due to the removal of the day of week term.

Figure 2 shows a comparison of the adjusted seasonal mean O₃ trends using GLMs fit using the variable selection procedure and those using the model parameterization in Camalier et al. (2007). As would be expected, the adjusted trends are less variable than the observed trends. The adjusted trends resulting from both methods are generally similar, with the adjusted values based on variable selection being 0.2 to 0.6 ppb higher between 2005 and 2012 and 0.5 to 1.2 ppb lower from 2013 to 2016. A site-level comparison of the adjusted trends (see Section S3 in the Supplemental Information) revealed clear spatial patterns in the differences between the adjusted values produced by the variable selection procedure and Camalier et al. (2007). However, it remains unclear which meteorological factor or factors contribute to these patterns. About 97% of the adjusted values at the individual O₃ monitoring sites differed by less than 2 ppb, with about 80% differing by less than 1 ppb. Overall, this indicates a reasonable level of agreement between the two models.

Figure 3 shows the national mean of the observed and adjusted site-level median, 90th percentile, and 98th percentile MDA8 O₃ concentrations using the fits from the quantile regression models, along with the seasonal mean MDA8 O₃ trend based on the GLMs. In general, the trends in median and mean MDA8 O₃ are very similar, which indicates good agreement between the quantile regression models and the GLMs. It is noteworthy that the mean and median MDA8 O₃ trends converge in the most recent years, which is expected as substantial reductions in peak O₃ concentrations have occurred. Overall, the shape of the 90th and 98th percentile trends and the magnitude of the adjustments are similar to those for the mean and median. However, some differences are apparent, most notably the transition from slight upward adjustments in the mean and median trends to downward adjustments in the 90th and 98th percentile trends in 2002, and the transition from slight downward adjustments in the median and mean to upward adjustments in the 90th and 98th percentiles in 2013 and 2014.

Figure 4 shows maps of the magnitude of the adjustments (i.e., the observed values minus adjusted values) to the seasonal mean and 98th percentile values at individual O₃ monitoring sites in 2014. The maps indicate that, in terms of the seasonal mean, meteorological conditions were near normal for O₃ formation over much of the U.S. On the other hand, the adjustments to the 98th percentile concentrations were much larger in magnitude, with a much larger spatial gradient. Meteorological conditions were less favorable for peak O₃ concentrations than normal in the northeastern U.S., the Midwest, Texas and central California, as indicated by the negative magnitude of the adjustments in those areas. Meteorological conditions were more favorable for peak O₃ concentrations than normal in parts of the southeastern U.S., coastal California, and near Las Vegas, NV, as indicated by the positive magnitude of the adjustments in those areas. In general, the magnitude of the adjustments tends to be larger for peak O₃ concentrations than for median or mean concentrations. The spatial gradients in the adjustments also tend to be larger for peak O₃ concentrations, indicating that peak concentrations are more affected than mean and median concentrations by localized meteorological processes such as wind and cloud cover.

Variable selection was an additional aspect of the alternative method to explore possible meteorological contributors beyond those considered in Camalier et al. (2007). Figure 5 shows the number of times each of the 24 meteorological variables in the initial set was included in the GLMs and the quantile regression models for the 98th percentile, and the order in which each variable was selected (1st to 10th). This figure shows that mid-day relative humidity and the daily maximum temperature were the most important predictors of MDA8 O₃. Together, these two variables were selected first by both the GLM and the quantile regression model at more than half of the O₃ monitoring sites. This is consistent with the results found in Camalier et al. (2007). The Julian day term, which serves as a surrogate for seasonal changes in meteorology, was selected for the majority of sites in both models, as were mid-day dew point temperature and daytime cloud cover, which were not included in the Camalier et al. (2007) model. The GLM selected fewer than ten terms for inclusion in the model at only four O₃ monitoring sites, while the quantile regression model for the 98th percentile selected ten terms at about 90% of the O₃ monitoring sites.

Figure 5. — Bar charts showing the number of sites where each meteorological variable was selected for the GLM (top panel) and the quantile regression model for the 98^th percentile (bottom panel), with colors indicating the order in which each variable was selected.
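The selection-order tally summarized in Figure 5 can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes forward selection scored by AIC, substitutes ordinary least squares for the GLM, and uses hypothetical variable names. Only the cap of ten terms comes from the text.

```python
from collections import Counter

import numpy as np


def aic_ols(X, y, cols):
    """AIC (up to a constant) of a least-squares fit on the given columns plus intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k


def forward_select(X, y, names, max_terms=10):
    """Add one variable at a time while AIC improves; return names in selection order."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic_ols(X, y, selected)  # intercept-only model
    while remaining and len(selected) < max_terms:
        aic, j = min((aic_ols(X, y, selected + [j]), j) for j in remaining)
        if aic >= best:
            break  # no candidate lowers the AIC any further
        best = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected]


def tally_order(site_selections):
    """Count how often each variable was chosen at each rank across sites."""
    counts = Counter()
    for chosen in site_selections:
        for rank, name in enumerate(chosen, start=1):
            counts[(name, rank)] += 1
    return counts
```

Running `forward_select` once per monitoring site and passing the resulting lists to `tally_order` gives the (variable, rank) counts that a stacked bar chart of this kind would display.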

Figure 6 shows maps of the variable selection order at each O₃ monitoring site using the GLMs for daily maximum temperature and mid-day relative humidity. The daily maximum temperature was most useful as a predictor of MDA8 O₃ in the northeastern U.S., the Midwest, and central California, while the mid-day relative humidity was most useful as a predictor of MDA8 O₃ in the southeastern U.S., the northwestern U.S., and coastal California. These results demonstrate a high level of geographic coherence in the variables selected by the models and are generally in agreement with the results presented in Camalier et al. (2007). The variable selection order in the quantile regression models was generally similar to that of the GLMs, with a slight tendency toward favoring more localized meteorological conditions such as cloud cover, wind speed and direction, and transport.
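Quantile regression (Koenker, 2005) replaces the squared-error criterion with an asymmetric check loss, which is why it can target the 98^th percentile rather than the mean. A minimal sketch of that loss follows, using only NumPy. The function names are illustrative, and the intercept-only search is a deliberate simplification of a full regression fit.

```python
import numpy as np


def pinball_loss(y, pred, tau):
    """Mean check loss: under-predictions weigh tau, over-predictions weigh 1 - tau."""
    resid = np.asarray(y, dtype=float) - pred
    return float(np.mean(np.maximum(tau * resid, (tau - 1.0) * resid)))


def best_constant(y, tau):
    """Search the data values for the constant that minimizes the check loss.

    For an intercept-only model, the minimizer is a sample tau-quantile.
    """
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y, c, tau) for c in y]
    return float(y[int(np.argmin(losses))])
```

With tau = 0.98, an under-prediction is penalized 49 times as heavily as an over-prediction of the same size, which pulls the fit toward the upper tail of the distribution.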

Additional results, including regional trends, maps showing the spatial distribution of the selection order for each meteorological variable and model, and maps showing the magnitude of the adjustments for each year and model are included in the Supplemental Information.

5. Conclusions and Future Applications

The technical refinements to the EPA’s current approach for adjusting O₃ trends for interannual variability in meteorological conditions presented here allow for a more in-depth understanding of how weather conditions affect O₃ levels in the U.S., which in turn may better inform air quality policy and decision making. The refinements to the input data sources allow the statistical models to be fit using estimates of meteorological conditions specific to each individual O₃ monitoring site, rather than aggregated over urban areas, which can span up to 100 km. In combination with the use of variable selection to choose the best meteorological predictors of MDA8 O₃ concentrations unique to each location, these refinements can improve our understanding of how weather conditions impact O₃ levels both regionally and at individual monitoring sites within the same geographic region. The ability to adjust trends in peak MDA8 O₃ concentrations via the implementation of quantile regression methods represents a fundamental advancement in our ability to understand how interannual variability in weather conditions may impact attainment of the O₃ NAAQS. In addition, these trends may help inform air quality modelers as to the overall representativeness of the meteorological conditions which contribute to peak O₃ levels in a particular year.

One application that could follow from this approach is an attempt to further our understanding of how long-term changes in temperature and other meteorological conditions have impacted trends in mean and peak O₃ concentrations and how these impacts could affect attainment of the O₃ NAAQS in the future. Previous studies have shown that long-term changes in meteorological conditions can contribute to higher O₃ concentrations (Bloomer et al., 2009; Weaver et al., 2009). One could envision a study where adjusted trends in seasonal mean and peak O₃ concentrations are assessed over a longer period (e.g., 30 years or more), thus allowing for the estimation of the impacts of long-term changes in meteorological conditions on these trends. These estimates could then be applied to modeled projections of O₃ levels to assess the likelihood of attainment under future meteorological conditions.

Another potential future application is the extension of this approach to adjust trends in other pollutants, particularly fine particulate matter (PM_2.5), for variability in weather conditions. Meteorological adjustment of PM_2.5 trends would likely present additional challenges, particularly in understanding how different components of PM_2.5 such as sulfates and nitrates are uniquely affected by weather conditions. However, these efforts would offer direct benefits for understanding how meteorology could affect PM_2.5 levels relative to the form of the NAAQS, which are based on the annual mean and the 98^th percentile.

Supplementary Material

Supplement1: NIHMS1680892-supplement-Supplement1.pdf (18.9 MB, pdf)

Acknowledgements:

The authors would like to thank Marshall Furman and Emili Moan from North Carolina State University; and Brian Timin, Norm Possiel, Chris Nolte, Tanya Spero, Kiran Alapaty, Elizabeth Naess, James Hemby and Ravi Srivastava from the U.S. Environmental Protection Agency for their contributions to the development and review of this article.

Footnotes

Disclaimer: Although this article has been reviewed by the U.S. EPA and approved for publication, it does not necessarily reflect the U.S. EPA’s policies or views.

References

Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19 (6), 716–723.
Bloomer BJ, Stehr JW, Piety CA, Salawitch RJ, Dickerson RR (2009). Observed relationships of ozone air pollution with temperature and emissions. Geophysical Research Letters, 36 (9).
Bloomfield P, Royle J, Steinberg L, Yang Q (1996). Accounting for meteorological effects in measuring urban ozone levels and trends. Atmospheric Environment, 30, 3067–3077.
Camalier L, Cox W, Dolwick P (2007). The effects of meteorology on ozone in urban areas and their use in assessing ozone trends. Atmospheric Environment, 41, 7127–7137.
Chen Z, Zhuang Y, Xie X, Chen D, Cheng N, Yang L, Li R (2019). Understanding long-term variations of meteorological influences on ground ozone concentrations in Beijing during 2006–2016. Environmental Pollution, 245, 29–37.
Cox W, Chu S (1993). Meteorologically adjusted ozone trends in urban areas: a probabilistic approach. Atmospheric Environment, 27B, 425–434.
Draxler RR, Hess GD (1998). An overview of the HYSPLIT_4 modeling system for trajectories, dispersion, and deposition. Australian Meteorological Magazine, 47, 295–308.
Grange SK, Carslaw DC, Lewis AC, Boleti E, Hueglin C (2018). Random forest meteorological normalisation models for Swiss PM₁₀ trend analysis. Atmospheric Chemistry and Physics, 18, 6223–6239.
Green PJ, Silverman BW (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. United Kingdom: Taylor & Francis.
Huang L, Smith RL (1999). Meteorologically-dependent trends in urban ozone. Environmetrics, 10, 103–118.
Koenker R (2005). Quantile Regression. United Kingdom: Cambridge University Press.
Koenker R (2019). quantreg: Quantile Regression. R package version 5.54. https://CRAN.R-project.org/package=quantreg
Kovač-Andrić E, Brana J, Gvozdić V (2009). Impact of meteorological factors on ozone concentrations modelled by time series analysis and multivariate statistical methods. Ecological Informatics, 4, 117–122.
Lefohn A, Malley C, Simon H, Wells B, Xu X, Zhang L, Tao W (2017). Responses of human health and vegetation exposure metrics to changes in ozone concentration distributions in the European Union, United States, and China. Atmospheric Environment, 152, 123–145.
Lu H, Chang T (2005). Meteorologically adjusted trends of daily maximum ozone concentrations in Taipei, Taiwan. Atmospheric Environment, 39, 6491–6501.
McCullagh P, Nelder JA (1989). Generalized Linear Models, 2nd Edition. United Kingdom: Taylor & Francis.
Mesinger F, Dimego G, Kalnay E, Mitchell K, Shafran PC, Ebisuzaki W, Jovic D, Woollen J, Rogers E, Berbery EH, Ek M, Yun F, Grumbine R, Higgins W, Hong L, Ying L, Manikin G, Parrish D, Wei S (2006). A long-term, consistent, high-resolution climate dataset for the North American domain, as a major improvement upon the earlier global reanalysis datasets in both resolution and accuracy presented. Bulletin of the American Meteorological Society, 87, 342–360.
Milanchus M, Rao ST, Zurbenko I (1998). Evaluating the effectiveness of ozone management efforts in the presence of meteorological variability. Journal of the Air & Waste Management Association, 48, 201–215.
Nychka D, Furrer R, Paige J, Sain S (2017). fields: Tools for spatial data. R package version 10.0. doi:10.5065/D6W957CT
Porter WC, Heald CL, Cooley D, Russell B (2015). Investigating the observed sensitivities of air quality extremes to meteorological drivers via quantile regression. Atmospheric Chemistry and Physics, 15, 10349–10366.
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Simon H, Reff A, Wells B, Xing J, Frank N (2015). Ozone trends across the United States over a period of decreasing NOx and VOC emissions. Environmental Science & Technology, 49, 186–195.
Thompson ML, Reynolds J, Cox LH, Guttorp P, Sampson PD (2001). A review of statistical methods for the meteorological adjustment of tropospheric ozone. Atmospheric Environment, 35, 617–630.
Weaver CP, et al. (2009). A preliminary synthesis of modeled climate change impacts on U.S. regional ozone concentrations. Bulletin of the American Meteorological Society, 90, 1843–1864.


