Author manuscript; available in PMC: 2022 Mar 1.
Published in final edited form as: Atmos Environ (1994). 2021 Mar 1;248:118234. doi: 10.1016/j.atmosenv.2021.118234

Improved Estimation of Trends in U.S. Ozone Concentrations Adjusted for Interannual Variability in Meteorological Conditions

Benjamin Wells 1,*, Pat Dolwick 1, Brian Eder 1, Mark Evangelista 1, Kristen Foley 1, Elizabeth Mannshardt 1, Chris Misenis 1, Anthony Weishampel 2
PMCID: PMC7995240  NIHMSID: NIHMS1680892  PMID: 33776540

Abstract

Daily maximum 8-hour average (MDA8) ozone (O3) concentrations are well known to be influenced by local meteorological conditions, which vary across both daily and seasonal temporal scales. Previous studies have adjusted long-term trends in O3 concentrations for meteorological effects using various statistical and mathematical methods in order to get a better estimate of the long-term changes in O3 concentrations due to changes in precursor emissions such as nitrogen oxides (NOx) and volatile organic compounds (VOCs). In this work, the authors present improvements to the current method used by the United States Environmental Protection Agency (US EPA) to adjust O3 trends for meteorological influences by making refinements to the input data sources and by allowing the underlying statistical model to vary locally using a variable selection procedure. The current method is also expanded by using a quantile regression model to adjust trends in the 90th and 98th percentiles of the distribution of MDA8 O3 concentrations, allowing for a better understanding of the effects of local meteorology on peak O3 levels in addition to seasonal average concentrations. The revised method is used to adjust trends in the May to September mean, 90th percentile, and 98th percentile MDA8 O3 concentrations at over 700 monitoring sites in the U.S. for years 2000 to 2016. The use of variable selection and quantile regression allows for a more in-depth understanding of how weather conditions affect O3 levels in the U.S. This represents a fundamental advancement in our ability to understand how interannual variability in weather conditions in the U.S. may impact attainment of the O3 National Ambient Air Quality Standards (NAAQS).

Keywords: ozone, meteorology, trends, statistics, variable selection, quantile regression

1. Introduction

In the Earth’s troposphere, ozone (O3) is a secondary atmospheric pollutant formed through photochemical reactions between precursor gases such as nitrogen oxides (NOx) and volatile organic compounds (VOC). O3 is known to cause respiratory inflammation in humans and is one of six common air pollutants for which National Ambient Air Quality Standards (NAAQS) are set by the United States Environmental Protection Agency (US EPA) as mandated by the Clean Air Act. O3 formation requires sunlight, and the reaction rate generally increases with warmer temperatures. Additionally, variations in local meteorological conditions related to humidity, transport and atmospheric stability can play an important role in determining O3 concentrations.

Adjustment of long-term trends in O3 concentrations for the influence of meteorological variability has been a subject of scientific study since the early 1990s (Thompson et al., 2000). Early methods typically employed well-known statistical approaches such as linear and nonlinear regression using a small set of observed meteorological variables (Cox and Chu, 1993; Bloomfield et al., 1996). Over the next decade, more sophisticated approaches such as Kolmogorov-Zurbenko (KZ) filters (Milanchus et al., 1998), Bayesian techniques (Huang and Smith, 1999), neural networks (Lu and Chang, 2005), and Principal Component Analysis (PCA; Kovač-Andrić et al., 2009) were employed. In recent years, these approaches have been extended further to include the latest statistical and numerical methods such as quantile regression (Porter et al., 2015), machine learning algorithms (Grange et al., 2018), and convergent cross mapping (Chen et al., 2019). However, most of these newer methods continue to focus on the relationship between O3 and only a small number of meteorological variables, often covering only a small geographic region. These methods may not be more broadly applicable over large geographic regions such as the contiguous U.S., where different meteorological factors may drive O3 formation in different locations. Most newer studies also continue to focus on seasonal mean O3 concentrations, which precludes their use in assessing the effects of meteorology on days with higher concentrations that are most relevant to the form of the O3 NAAQS.

The study by Porter et al. (2015) addresses these concerns by including a large number of meteorological predictor variables at a large number of O3 monitoring stations across the contiguous U.S. The authors also use quantile regression with a variable selection component to identify important meteorological predictors that vary both across geographic regions and across various points in the distribution of MDA8 O3 concentrations. Using gridded meteorological reanalysis data at 32 km spatial resolution, the authors make many useful observations about the relative importance of various meteorological predictors across the distribution of MDA8 O3 concentrations for 10 broad U.S. geographic regions. Here, the authors extend the work of Porter et al. (2015) by incorporating observational meteorology data and by providing meteorologically adjusted trends in both seasonal average and peak MDA8 O3 concentrations.

The US EPA uses a statistical model to adjust trends in the seasonal average of the daily maximum 8-hour (MDA8) O3 concentrations for weather-related variability in order to provide a more accurate assessment of the underlying trend in O3 concentrations caused by changes in precursor emissions (https://www.epa.gov/air-trends/trends-ozone-adjusted-weather-conditions). The current technical approach for performing these adjustments is based on a publication by Camalier et al. (2007). Briefly, this approach employs a Generalized Linear Model (GLM) to establish statistical relationships between seasonal mean O3 concentrations in an urban area and meteorological parameters either measured or calculated from data at a nearby weather station. The GLM is then used to adjust the observed O3 trend by estimating the seasonal mean O3 concentrations that would result in the absence of interannual meteorological variability, or in other words, the trend that would result if meteorological conditions each year were similar to the long-term average. Here, the work of Camalier et al. (2007) is extended to also consider peak concentrations, which better align with the form of the O3 NAAQS.

In this work, the authors explore potential refinements to the existing method presented in Camalier et al. (2007) for adjusting O3 trends for meteorological effects, which are listed below.

  1. In urban areas, the existing approach adjusts the trend based on the maximum MDA8 O3 concentration observed across the area on each day of the season, defined as May to September. In large urban areas, such as Los Angeles, CA, both O3 concentrations and weather conditions may vary significantly, and the existing approach precludes exploring how meteorology may impact trends in O3 observed at individual monitoring sites within the area. Therefore, the adjustment of O3 trends at individual monitoring sites is explored in order to better understand how local weather conditions may impact O3 concentrations across an urban area.

  2. The existing approach matches O3 concentration data with meteorological data by pairing O3 monitoring sites in an urban area with observations from a nearby weather station, usually an airport or Air Force base. However, these pairings were created manually and sometimes involve distances of 50 km or more between the O3 and meteorological measurement stations, especially in large urban areas. In addition, the existing approach requires data coverage for the entire trend period at both the weather station and the O3 monitoring sites in order to fit the GLM. This work explores alternative approaches such as the use of gridded meteorological data products and the use of Kriging techniques to estimate meteorological conditions at the O3 monitoring site locations.

  3. In the existing approach, the GLM is fit using a constant set of 9 meteorological variables (the 8 variables listed in Table 2 of Camalier et al. (2007), plus Julian Day) for each urban area. These variables were included in the model based on the frequency of statistical significance in a set of 39 eastern U.S. urban areas. The existing approach does not consider that meteorological conditions contributing to or inhibiting O3 formation may be different in the western U.S., nor does it acknowledge that meteorological conditions affecting MDA8 O3 concentrations may vary by location. An alternative approach explored here identifies the set of meteorological variables for each O3 monitoring site which are most strongly associated with the observed MDA8 O3 concentrations.

  4. The existing approach adjusts the trend in the annual May to September mean MDA8 O3 concentration. While understanding the effects of meteorology on seasonal mean concentrations can be useful, the form of the O3 NAAQS is based on the annual 4th highest MDA8 O3 concentration. Thus, policy decisions are primarily focused on achieving reductions in peak O3 concentrations. However, studies have shown that reductions in peak O3 concentrations achieved by reducing precursor emissions often do not translate into reductions in seasonal mean O3 concentrations, especially in urban areas (Lefohn et al., 2017; Simon et al., 2015). Thus, this work newly explores the effects of meteorology on peak O3 concentrations by developing quantile regression methods for adjusting trends in the 90th and 98th percentiles of the distribution of May to September MDA8 O3 concentrations.

An incremental testing approach is applied in this study: starting from the existing approach, one aspect of the input data or statistical method is altered, and the model is re-fit. The resulting model is then compared against the previous fit. If the change improves the model fit or does not degrade performance by including a technical improvement, the alteration is kept and used as the baseline in the next set of comparisons. In this way, each component of the model that is changed can be evaluated without confounding influences from other components.

2. Input Data

The statistical model used to adjust MDA8 O3 trends for meteorological influences requires four types of input data: O3 concentration data, surface meteorological data, above ground atmospheric data, and atmospheric transport data. The sections below describe the sources and processing for each data type. The input data sources for the existing approach are described in Camalier et al. (2007).

a. O3 Concentration Data

MDA8 O3 concentration data for over 1,900 monitoring stations which operated in the U.S. between 2000 and 2016 were retrieved from the US EPA’s Air Quality System (AQS, https://www.epa.gov/aqs/) database. Monitoring sites in the contiguous U.S. which reported MDA8 O3 concentrations for at least 100 days in the months of May through September in each year from 2000 to 2016 were included in the analysis. A total of 702 monitoring sites in 47 states met these requirements.
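The completeness screen described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the `(site_id, date, mda8)` record layout and the function name `complete_sites` are hypothetical and do not reflect the AQS export format.

```python
from collections import defaultdict
from datetime import date

YEARS = range(2000, 2017)          # trend period 2000-2016
SEASON_MONTHS = {5, 6, 7, 8, 9}    # May through September
MIN_DAYS = 100                     # minimum days with an MDA8 value per season


def complete_sites(records):
    """Return site IDs with at least MIN_DAYS seasonal values in every year.

    `records` is an iterable of (site_id, date, mda8) tuples (hypothetical layout).
    """
    days = defaultdict(set)  # (site_id, year) -> dates with a reported value
    for site_id, day, mda8 in records:
        if mda8 is not None and day.month in SEASON_MONTHS and day.year in YEARS:
            days[(site_id, day.year)].add(day)
    sites = {site_id for site_id, _ in days}
    return {
        s for s in sites
        if all(len(days.get((s, y), ())) >= MIN_DAYS for y in YEARS)
    }
```

A site is dropped as soon as a single year in the trend period falls short of the 100-day requirement, which is how the text above reads; any additional screening applied in the study is not represented here.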

b. Surface Meteorological Data

Hourly surface measurement data, including temperature, dew point temperature, atmospheric pressure, wind speed and direction, cloud cover, and precipitation were downloaded from the Integrated Surface Database (ISD, ftp://ftp.ncdc.noaa.gov/pub/data/noaa/) maintained by the National Oceanic and Atmospheric Administration (NOAA). The surface dataset consisted of measurements from over 4,500 stations in the U.S. which reported data between 2000 and 2016. The number of stations reporting data in individual years varied, ranging from 925 in 2001 to 2,811 in 2013. The hourly measurements were then used to calculate 8 daily parameters. The hourly variables are listed in Table 1 and daily parameters are listed in Table 2.

Table 1.

Raw data inputs included in meteorological adjustment dataset

| Parameter Name | Parameter Description | Units | Data Source, Frequency |
|---|---|---|---|
| T | Temperature | °C | ISD, hourly |
| DPT | Dew point temperature | °C | ISD, hourly |
| SLP | Sea level pressure | mb | ISD, hourly |
| WD | Wind direction | degrees | ISD, hourly |
| WS | Wind speed | m/s | ISD, hourly |
| CC | Cloud cover | oktas | ISD, hourly |
| PCP | Precipitation | mm | ISD, hourly |
| APCP | Accumulated total precipitation | mm | NARR, 3-hourly |
| AIR.2m | Air temperature at 2m above ground level | °C | NARR, 3-hourly |
| DSWRF | Downward shortwave radiation flux | W/m2 | NARR, 3-hourly |
| DPT.2m | Dew point temperature at 2m above ground level | °C | NARR, 3-hourly |
| HPBL | Planetary boundary layer height | m | NARR, 3-hourly |
| RHUM.2m | Relative humidity at 2m above ground level | % | NARR, 3-hourly |
| TCDC | Total cloud cover | % | NARR, 3-hourly |
| UWND.10m | U component of wind speed at 10m above ground level | m/s | NARR, 3-hourly |
| VWND.10m | V component of wind speed at 10m above ground level | m/s | NARR, 3-hourly |
| AIR.925 | Air temperature at 925 mb | °C | NARR, 3-hourly |
| AIR.850 | Air temperature at 850 mb | °C | NARR, 3-hourly |
| AIR.700 | Air temperature at 700 mb | °C | NARR, 3-hourly |
| AIR.500 | Air temperature at 500 mb | °C | NARR, 3-hourly |
| LAT.HH | Latitude of backward trajectory at HH hours (HH=0–24) | degrees | HYSPLIT, hourly |
| LON.HH | Longitude of backward trajectory at HH hours (HH=0–24) | degrees | HYSPLIT, hourly |
| HGT.HH | Height of backward trajectory at HH hours (HH=0–24) | m | HYSPLIT, hourly |

Table 2.

Daily data inputs included in meteorological adjustment dataset for variable selection

| Parameter Name | Parameter Description | Units |
|---|---|---|
| TMAX | Daily maximum surface temperature | °C |
| DPTMID | Mid-day (10 AM - 4 PM LST) average dewpoint temperature | °C |
| RHMID | Mid-day (10 AM - 4 PM LST) average relative humidity | % |
| WDAM | Morning (7–10 AM LST) average wind direction | degrees |
| WDPM | Afternoon (1–4 PM LST) average wind direction | degrees |
| WSAM | Morning (7–10 AM LST) average wind speed | m/s |
| WSPM | Afternoon (1–4 PM LST) average wind speed | m/s |
| CCDAY | Daytime (7 AM - 7 PM LST) average cloud cover | oktas |
| DT925 | NARR surface - 925 mb temperature difference at 2100 UTC | °C |
| DT850 | NARR surface - 850 mb temperature difference at 2100 UTC | °C |
| DT700 | NARR surface - 700 mb temperature difference at 2100 UTC | °C |
| DT500 | NARR surface - 500 mb temperature difference at 2100 UTC | °C |
| DEVT925 | NARR 925 mb temperature anomaly at 2100 UTC | °C |
| DEVT850 | NARR 850 mb temperature anomaly at 2100 UTC | °C |
| DEVT700 | NARR 700 mb temperature anomaly at 2100 UTC | °C |
| DEVT500 | NARR 500 mb temperature anomaly at 2100 UTC | °C |
| HPBLMAX | Daily maximum NARR planetary boundary layer height | m |
| SOLRAD | Daily sum of NARR downward shortwave radiation flux | W/m2 |
| TDIR12 | 12-hour transport direction starting at 2100 UTC | degrees |
| TDIR24 | 24-hour transport direction starting at 2100 UTC | degrees |
| TDIS12 | 12-hour transport distance starting at 2100 UTC | km |
| TDIS24 | 24-hour transport distance starting at 2100 UTC | km |
| LOD | Length of daylight (time from sunrise to sunset) | minutes |
| JDAY | Julian day (1 = 1-Jan, 2 = 2-Jan, …, 365/366 = 31-Dec) | none |

The values of the daily surface meteorological parameters were estimated at each of the 702 O3 monitoring sites. This was accomplished by first fitting a thin plate spline to the observations, then evaluating the thin plate spline at the O3 monitoring site locations (Green and Silverman, 1994). The spline fitting and evaluation was implemented using the Tps function in the fields package (Nychka et al., 2017) of the R statistical computing environment (R Core Team, 2019).
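To illustrate the fit-then-evaluate step, the sketch below implements an exactly interpolating two-dimensional thin plate spline in plain Python. It is not the fields implementation: the study used the R function Tps, which, as far as the authors of this sketch are aware, normally estimates a smoothing parameter rather than interpolating exactly. The function names `fit_tps` and `eval_tps` are hypothetical, and station coordinates are treated as planar, which is a simplifying assumption.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _phi(r):
    """Thin plate spline radial basis in two dimensions: r^2 * log(r)."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def fit_tps(points, values):
    """Fit an interpolating thin plate spline to values observed at (x, y) points."""
    n = len(points)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = _phi(math.hypot(xi - xj, yi - yj))
        # Linear polynomial part (1, x, y) and its side conditions.
        for k, p in enumerate((1.0, xi, yi)):
            a[i][n + k] = p
            a[n + k][i] = p
    coef = _solve(a, list(values) + [0.0, 0.0, 0.0])
    return list(points), coef[:n], coef[n:]


def eval_tps(model, x, y):
    """Evaluate a fitted spline at a new location, e.g. an O3 monitoring site."""
    pts, weights, (c0, cx, cy) = model
    radial = sum(w * _phi(math.hypot(x - px, y - py))
                 for w, (px, py) in zip(weights, pts))
    return c0 + cx * x + cy * y + radial
```

In use, `fit_tps` would be called once per day and parameter with the weather station coordinates and observations, and `eval_tps` would then be called at each monitoring site location. The station locations must not all lie on one line, otherwise the linear system is singular.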

c. Above Ground Atmospheric Data

The North American Regional Reanalysis (NARR) gridded dataset maintained by NOAA contains dozens of meteorological variables at the surface, subsurface, and various above ground atmospheric levels covering the North American continent at 32 km × 32 km spatial resolution and 3-hourly (or 8x daily) temporal resolution. These data are assimilated from surface stations, radiosondes, satellites, aircraft and other observational sources to produce a long-term picture of weather over North America from 1979 to present (Mesinger et al., 2006).

NARR 3-hourly data for land cells covering the contiguous U.S. were downloaded from a NOAA website (ftp://ftp.cdc.noaa.gov/Datasets/NARR/) for 9 surface variables and 4 above ground atmospheric variables for years 2000 to 2016. The 3-hourly data were used to calculate 10 daily parameters, which were then paired with the MDA8 O3 concentrations by retrieving the values from the grid cells containing the O3 monitoring sites. Long-term monthly means based on years 1979–2000 were also downloaded for use in some calculations. The 3-hourly variables are listed in Table 1 and the daily parameters are listed in Table 2.

d. Atmospheric Transport Data

The Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model developed by NOAA (Draxler and Hess, 1998) is one of the most widely used models for atmospheric trajectory and dispersion calculations. The model source code was downloaded (https://www.ready.noaa.gov/HYSPLIT.php) and used to run 24-hour backward trajectories (i.e., trajectories estimating the location, 24 hours in the past, of the air parcel currently at the O3 monitoring site) at a starting height of 300 m above ground level from the 702 O3 monitoring sites for each hour in years 2000 to 2016. The trajectories starting at 2100 Coordinated Universal Time (UTC; 2100 UTC corresponds to mid-afternoon in the U.S.) were used to calculate a daily transport distance (i.e., the distance between the starting and ending points of the trajectory, in km) and transport direction (i.e., the angle between the starting and ending points of the trajectory, in degrees clockwise from due north) for each O3 monitoring site (see Table 2).
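The transport distance and direction can be computed from the trajectory endpoints roughly as sketched below. The text does not state which distance formula was used or which endpoint serves as the reference for the angle, so the haversine great-circle distance, the spherical Earth radius, and the convention of measuring the bearing from the monitoring site toward the upwind endpoint are all assumptions; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, assumed)


def transport_distance_km(lat0, lon0, lat1, lon1):
    """Great-circle (haversine) distance in km between two lat/lon points."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dphi = p1 - p0
    dlam = math.radians(lon1 - lon0)
    h = math.sin(dphi / 2) ** 2 + math.cos(p0) * math.cos(p1) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def transport_direction_deg(lat0, lon0, lat1, lon1):
    """Initial bearing from point 0 to point 1, in degrees clockwise from due north."""
    p0, p1 = math.radians(lat0), math.radians(lat1)
    dlam = math.radians(lon1 - lon0)
    east = math.sin(dlam) * math.cos(p1)
    north = math.cos(p0) * math.sin(p1) - math.sin(p0) * math.cos(p1) * math.cos(dlam)
    return math.degrees(math.atan2(east, north)) % 360.0
```

Here point 0 would be the monitoring site (the trajectory position at hour 0 in Table 1) and point 1 the trajectory position at hour 12 or 24, giving the TDIS and TDIR parameters in Table 2.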

A sensitivity analysis was performed to assess the viability of using the NARR data in place of both the surface measurements and the Integrated Global Radiosonde Archive (IGRA) measurement data used in the existing approach. Three datasets were compared for the set of 112 urban areas and 49 rural monitoring sites used by the EPA in their meteorologically adjusted trends for 2000–2016. The first dataset used the input data based on Camalier et al. (2007), the second dataset used the closest NARR equivalent for each variable, and the third “hybrid” dataset used the Camalier et al. (2007) input data for the surface variables and the NARR data for the upper-air variables.

This analysis showed that the NARR data resulted in a poorer model fit compared to the Camalier et al. (2007) data, with a mean decrease of 0.08 in Pearson R-squared (R2). However, the hybrid dataset produced a model fit similar to Camalier et al. (2007) and virtually no change in the magnitude of the adjustments to the trend. The improved spatial and temporal resolution and the elimination of missing values in the data were considered a technical improvement; therefore, the NARR data were used in place of the IGRA measurements for the upper-air variables going forward. Further details and results from this analysis are presented in the Supplemental Information.

A second sensitivity analysis was performed to assess how the use of Kriging to estimate the values of the meteorological parameters at the O3 monitoring sites compared to the existing approach. The “paired” dataset from Camalier et al. (2007) where MDA8 O3 concentrations were paired with surface data from a nearby weather station was compared to an “interpolated” dataset where meteorological variables were estimated at each O3 monitoring site using thin plate splines. In addition, transport variables were determined using HYSPLIT backward trajectories originating from the weather station in the paired dataset and originating from the O3 monitoring sites in the interpolated dataset. This comparison was performed for a set of six urban areas (with trends fit for all O3 monitoring sites in each area) and nine rural O3 monitoring sites chosen for geographic diversity.

The results from this analysis showed that Kriging resulted in a slight net improvement in model fit (mean increase of 0.03 in R2) compared to the existing method. Rural and suburban O3 monitoring sites that were farther from the nearest weather stations tended to have the largest improvements in model fit, while urban O3 monitoring sites that were closer to the weather stations tended to have little change in model fit. The relative importance of each parameter varied by location; however, parameters that tend to vary on smaller spatial scales, such as wind speed, were generally better predictors when using the interpolated dataset. While the urban trends fit using individual O3 monitoring sites were generally lower than the trends fit using the area-wide highest MDA8 O3 concentration on each day, as expected, the shapes of the trend lines tended to be similar. Therefore, the interpolated dataset described above was used as input to the statistical model fit at each O3 monitoring site going forward. Further details and results from this sensitivity analysis are presented in the Supplemental Information.

3. Statistical Methods

The statistical model used in Camalier et al. (2007) takes the form of a Generalized Linear Model (GLM) which can be expressed by the equation:

$$g(\mu_i) = \alpha_0 + \sum_{j=1}^{3}\sum_{k=1}^{9} \beta_{j,k}\, f_j(x_{i,k}) + \sum_{p=1}^{N} \gamma_p\, Y_{i,p} + \sum_{d=1}^{7} \delta_d\, W_{i,d} \qquad \text{(Equation 1)}$$

The term $g(\mu_i)$ represents the link function $g$ (McCullagh and Nelder, 1989) for the expected MDA8 O3, $\mu_i$, on the $i$th day. A diagnostic evaluation showed that the log link function continues to be the most appropriate choice for the 2000 to 2016 trend period. The term $\alpha_0$ represents the overall mean response. The functions $f_j(x_{i,k})$ represent the $j$th order natural spline function for the $k$th meteorological variable on the $i$th day, and the terms $\beta_{j,k}$ represent the effects based on each combination of function and variable. The natural spline functions allow for both linear and nonlinear effects for each meteorological variable. The terms $\gamma_p$ represent the effects for the $p$th year, where $Y_{i,p} = 1$ if day $i$ is in year $p$ and $0$ otherwise, and finally the terms $\delta_d$ represent the effects for the $d$th weekday (1=Sunday, 2=Monday, …, 7=Saturday), where $W_{i,d} = 1$ if day $i$ is weekday $d$ and $0$ otherwise.

Using the resulting fit from the GLM above, the adjusted seasonal mean MDA8 O3 trend can be calculated as follows:

$$F(Y_p) = \exp\left(\hat{\alpha}_0 + \hat{\gamma}_p - \frac{1}{N}\sum_{i=1}^{N}\hat{\gamma}_i\right), \quad i = 1, 2, \ldots, N \qquad \text{(Equation 2)}$$

where $F(Y_p)$ is the adjusted seasonal mean MDA8 O3 value for year $p$, $\hat{\alpha}_0$ and $\hat{\gamma}_p$ are the effects estimates for $\alpha_0$ and $\gamma_p$, respectively, and $N$ is the number of years in the trend period. The mean of the $\hat{\gamma}_p$ terms is subtracted so that the adjusted values are centered on the long-term average rather than the first year.
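As a small numerical illustration of Equation 2, the sketch below computes the adjusted values from a set of fitted coefficients. The function name `adjusted_trend` and the example coefficient values are hypothetical; only the formula itself comes from the text.

```python
import math


def adjusted_trend(alpha0_hat, gamma_hat):
    """Adjusted seasonal mean MDA8 O3 for each year, following Equation 2.

    alpha0_hat: estimated intercept on the log scale.
    gamma_hat:  estimated year effects, one per year in the trend period.
    """
    mean_gamma = sum(gamma_hat) / len(gamma_hat)
    # Centering on the mean year effect references the long-term average
    # instead of whichever year serves as the model's baseline level.
    return [math.exp(alpha0_hat + g - mean_gamma) for g in gamma_hat]


# Made-up coefficients for a three-year period (illustration only).
example = adjusted_trend(math.log(50.0), [0.0, -0.02, -0.04])
```

Because the year effects are centered before exponentiating, the geometric mean of the adjusted values equals $\exp(\hat{\alpha}_0)$, which is 50 ppb in this made-up example.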

Starting with the initial set of 24 daily meteorological variables listed in Table 2, an automated variable selection procedure was employed to determine the meteorological variables most likely to affect MDA8 O3 concentrations at each monitoring station. The procedure is a form of forward selection: starting with a model that includes only the intercept and year terms, each of the 24 variables in Table 2 is added to the model, and the variable which has the best predictive power in terms of the lowest Akaike Information Criterion (AIC; Akaike, 1974) is selected. Next, each of the remaining variables is in turn added to the model, and the resulting two-variable model with the lowest AIC is selected, and so on. The selection process is ended when ten meteorological variables are included in the model, or when none of the remaining variables reduce the AIC.
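The forward selection loop can be sketched as follows. Model fitting is abstracted behind a hypothetical callback `aic_of`, which is assumed to fit the model containing the intercept, the year terms, and the listed meteorological variables and to return its AIC; the function name `forward_select` is likewise illustrative. Comparing the first candidate against the intercept-and-year baseline is an assumption, since the text does not say whether the first variable is always retained.

```python
def forward_select(candidates, aic_of, max_terms=10):
    """Greedy forward selection of predictor variables by lowest AIC.

    candidates: iterable of candidate variable names (e.g. those in Table 2).
    aic_of:     hypothetical callback mapping a list of selected variable
                names to the AIC of the corresponding fitted model.
    max_terms:  stop once this many variables have been selected.
    """
    selected = []
    remaining = list(candidates)
    best_aic = aic_of(selected)  # baseline: intercept and year terms only
    while remaining and len(selected) < max_terms:
        scores = {v: aic_of(selected + [v]) for v in remaining}
        best_var = min(scores, key=scores.get)
        if scores[best_var] >= best_aic:
            break  # none of the remaining variables reduces the AIC
        selected.append(best_var)
        remaining.remove(best_var)
        best_aic = scores[best_var]
    return selected
```

The returned list preserves the order of selection, which is the quantity summarized later in the selection-order figures.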

Two customized features were added to the variable selection process described above to safeguard against overfitting and multicollinearity. First, similar terms were excluded from the list of candidate variables in Table 2. Specifically, any time a variable from one of the sets {DT500, DT700, DT850, DT925}, {DEVT925, DEVT850, DEVT700, DEVT500}, {TDIR12, TDIR24} or {TDIS12, TDIS24} was selected, the other candidate variable(s) in that set were removed from further consideration. Second, each time a new variable was selected, the Pearson correlation between that variable and each of the remaining candidate variables was calculated, and any candidate variables whose correlation exceeded 0.8 were removed from further consideration. Additionally, the day of week term was not included in the model as it was deemed that this variable was intended to represent weekday-weekend effects, which are driven by changes in precursor emissions and not meteorology. Finally, the directional variables WDAM, WDPM, TDIR12, and TDIR24, which were treated as continuous variables in Camalier et al. (2007), were treated as factor variables with eight levels representing the cardinal and ordinal directions (i.e., N, NE, E, SE, S, SW, W, NW). This was implemented by dividing the 360-degree circle into eight equal portions (e.g., angles between 22.5 and 67.5 degrees clockwise from due north were assigned the NE factor).
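The two safeguards and the directional binning can be sketched as below. Several details are assumptions rather than statements from the text: the correlation screen is applied to the absolute value of the Pearson correlation, all screened variables are treated as numeric columns, and an angle falling exactly on a sector boundary is assigned to the sector that begins there. The names `direction_factor`, `pearson`, and `prune_candidates` are hypothetical.

```python
import math

SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Sets of similar terms: once one member is selected, the others are dropped.
GROUPS = (
    {"DT500", "DT700", "DT850", "DT925"},
    {"DEVT925", "DEVT850", "DEVT700", "DEVT500"},
    {"TDIR12", "TDIR24"},
    {"TDIS12", "TDIS24"},
)


def direction_factor(degrees):
    """Map an angle in degrees clockwise from north to one of eight 45-degree sectors."""
    return SECTORS[int(((degrees % 360.0) + 22.5) // 45.0) % 8]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_candidates(selected, remaining, data, threshold=0.8):
    """Drop candidates grouped with, or highly correlated with, the selected variable.

    data maps variable names to numeric columns (hypothetical layout).
    """
    same_group = set().union(*(g for g in GROUPS if selected in g))
    return [
        v for v in remaining
        if v not in same_group
        and abs(pearson(data[selected], data[v])) <= threshold
    ]
```

In a full selection loop, `prune_candidates` would be called each time a variable is added, so that the next round only considers the surviving candidates.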

The model described above was extended to adjust trends in peak MDA8 O3 levels using a quantile regression model (Koenker, 2005). For a given quantile, τ, the regression model takes a form similar to Equation 1, with the adjusted trend calculated as in Equation 2. The variable selection procedure described above was applied to the input data for each of the 702 O3 monitoring sites using quantile regression for τ = 0.5, 0.9 and 0.98, corresponding to the median, 90th and 98th percentiles of the annual May to September distribution of MDA8 O3 concentrations, respectively. The case of τ = 0.98 is of particular interest since it corresponds to approximately the annual 4th highest MDA8 O3 concentration (based on a May to September O3 season), which is the form of the O3 NAAQS. The case of τ = 0.5 is intended as a sensitivity analysis: assuming the distribution of MDA8 O3 concentrations is approximately lognormal, log-linear models for the mean and median MDA8 O3 concentrations should produce similar results. The quantile regression model fitting was implemented using the quantreg package in the statistical software R (Koenker, 2019).
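The correspondence between τ = 0.98 and the annual 4th highest value can be checked with a short calculation, sketched below together with the check (pinball) loss that quantile regression minimizes. The nearest-rank quantile definition is an assumption chosen for illustration; the quantreg package may use a different convention for sample quantiles, and the function names are hypothetical.

```python
import math


def nearest_rank_quantile(values, tau):
    """Empirical tau-quantile using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))  # 1-based rank from the bottom
    return ordered[rank - 1]


def pinball_loss(residual, tau):
    """Check loss for one residual (observed minus fitted) at quantile tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


# May through September contains 31 + 30 + 31 + 31 + 30 days.
season_days = 31 + 30 + 31 + 31 + 30
# Position of the 98th percentile counted from the highest value downward.
rank_from_top = season_days - math.ceil(0.98 * season_days) + 1
```

With a complete 153-day season, `rank_from_top` evaluates to 4, which is why the 98th percentile approximates the annual 4th highest MDA8 value. The asymmetry of `pinball_loss` at τ = 0.98 penalizes under-prediction far more heavily than over-prediction, which pulls the fitted values toward the upper tail of the distribution.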

4. Results and Discussion

Figure 1 shows a map of the R2 statistics based on the fitted GLMs using the variable selection procedure described above and a map comparing those values with the R2 statistics based on the model parameterization in Camalier et al. (2007) for each of the 702 O3 monitoring sites. About 85% of the O3 monitoring sites had R2 statistics between 0.6 and 0.8, with a median R2 value of 0.71. The model fits were generally good in the eastern U.S.; however, isolated instances where the R2 statistics were as low as 0.4 occurred at some rural sites in the western U.S. Nationally, the median R2 statistic increased by 0.02 using the variable selection procedure, with the largest increases occurring at sites near the Gulf of Mexico and urban areas in the western U.S. Overall, more than 95% of the O3 monitoring sites had an increase in R2 using the variable selection procedure. The large decreases in the R2 statistics at some sites near the Los Angeles, CA metropolitan area were due to the removal of the day of week term.

Figure 1.

Map showing Coefficient of Determination (R2) statistics based on the GLM fitted at each O3 monitoring site using variable selection (top panel), and a map showing the difference in R2 statistics between variable selection and Camalier et al. (2007) (bottom panel).

Figure 2 shows a comparison of the adjusted seasonal mean O3 trends using GLMs fit using the variable selection procedure and those using the model parameterization in Camalier et al. (2007). As would be expected, the adjusted trends are less variable than the observed trends. The adjusted trends resulting from both methods are generally similar, with the adjusted values based on variable selection being 0.2 to 0.6 ppb higher between 2005 and 2012 and 0.5 to 1.2 ppb lower from 2013 to 2016. A site-level comparison of the adjusted trends (see Section S3 in the Supplemental Information) revealed clear spatial patterns in the differences between the adjusted values produced by the variable selection procedure and Camalier et al. (2007). However, it remains unclear which meteorological factor or factors contribute to these patterns. About 97% of the adjusted values at the individual O3 monitoring sites differed by less than 2 ppb, with about 80% differing by less than 1 ppb. Overall, this indicates a reasonable level of agreement between the two models.

Figure 2.

Comparison of adjusted May to September mean MDA8 O3 trends using variable selection and Camalier et al. (2007) (top panel); and distribution of site-level differences in the adjusted O3 trends using variable selection and Camalier et al. (2007) (bottom panel).

Figure 3 shows the national mean of the observed and adjusted site-level median, 90th percentile, and 98th percentile MDA8 O3 concentrations using the fits from the quantile regression models, along with the seasonal mean MDA8 O3 trend based on the GLMs. In general, the trends in median and mean MDA8 O3 are very similar, which indicates good agreement between the quantile regression models and the GLMs. It is noteworthy that the mean and median MDA8 O3 trends converge in the most recent years, which is expected as substantial reductions in peak O3 concentrations have occurred. Overall, the shape of the 90th and 98th percentile trends and the magnitude of the adjustments are similar to those for the mean and median. However, some differences are apparent, most notably the transition from slight upward adjustments in the mean and median trends to downward adjustments in the 90th and 98th percentile trends in 2002, and the transition from slight downward adjustments in the median and mean to upward adjustments in the 90th and 98th percentiles in 2013 and 2014.

Figure 3.

Observed (dashed lines) and adjusted (solid lines) trends in May to September mean (black), median (green), 90th percentile (blue) and 98th percentile (red) MDA8 O3 concentrations.

Figure 4 shows maps of the magnitude of the adjustments (i.e., the observed values minus adjusted values) to the seasonal mean and 98th percentile values at individual O3 monitoring sites in 2014. The maps indicate that, for the seasonal mean, meteorological conditions were near normal in terms of being conducive to O3 formation over much of the U.S. On the other hand, the adjustments to the 98th percentile concentrations were much larger in magnitude, with a much larger spatial gradient. Meteorological conditions were less favorable for peak O3 concentrations than normal in the northeastern U.S., the Midwest, Texas and central California, as indicated by the negative adjustments in those areas. Meteorological conditions were more favorable for peak O3 concentrations than normal in parts of the southeastern U.S., coastal California, and near Las Vegas, NV, as indicated by the positive adjustments in those areas. In general, the magnitude of the adjustments tends to be larger for peak O3 concentrations than for median or mean concentrations. The spatial gradients in the adjustments also tend to be larger for peak O3 concentrations, indicating that peak concentrations are more affected than mean and median concentrations by localized meteorological processes such as wind and cloud cover.

Figure 4.

Map showing the magnitude of the adjustments to the May to September mean (top panel) and 98th percentile (bottom panel) MDA8 O3 at individual monitoring sites in 2014.

Variable selection was added to the alternative method to explore possible meteorological contributors beyond those considered in Camalier et al. (2007). Figure 5 shows the number of times each of the 24 meteorological variables in the initial set was included in the GLMs and the quantile regression models for the 98th percentile, and the order in which each variable was selected (1st – 10th). This figure shows that mid-day relative humidity and the daily maximum temperature were the most important predictors of MDA8 O3. Together, these two variables were selected first by both the GLM and the quantile regression model at more than half of the O3 monitoring sites. This is consistent with the results found in Camalier et al. (2007). The Julian day term, which serves as a surrogate for seasonal changes in meteorology, was selected for the majority of sites in both models, as were mid-day dew point temperature and daytime cloud cover, which were not included in the Camalier et al. (2007) model. The GLM selected fewer than ten terms for inclusion in the model at only four O3 monitoring sites, while the quantile regression model for the 98th percentile selected ten terms at about 90% of the O3 monitoring sites.

Figure 5.

Bar charts showing the number of sites where each meteorological variable was selected for the GLM (top panel) and the quantile regression model for the 98th percentile (bottom panel), with colors indicating the order in which each variable was selected.
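
The counts summarized in these bar charts can be thought of as a tally over per-site selection results. The sketch below is illustrative only: the site names, variable names (`tmax`, `rh_midday`, and so on), and selection orders are hypothetical, and the function `tally_selection_order` is not part of the authors' code.

```python
from collections import Counter, defaultdict

MAX_TERMS = 10  # the text reports selection order from 1st to 10th

# Hypothetical per-site results: variables listed in the order they were selected.
selected_by_site = {
    "site_A": ["tmax", "rh_midday", "julian_day"],
    "site_B": ["rh_midday", "tmax", "cloud_cover"],
    "site_C": ["tmax", "julian_day"],
}


def tally_selection_order(selections, max_terms=MAX_TERMS):
    """Return {variable: Counter({rank: number_of_sites})} with ranks starting at 1."""
    counts = defaultdict(Counter)
    for ordered_vars in selections.values():
        for rank, var in enumerate(ordered_vars[:max_terms], start=1):
            counts[var][rank] += 1
    return counts


counts = tally_selection_order(selected_by_site)
# Total number of sites at which each variable was selected (bar height).
totals = {var: sum(by_rank.values()) for var, by_rank in counts.items()}
for var in sorted(totals, key=totals.get, reverse=True):
    print(var, totals[var], dict(sorted(counts[var].items())))
```

Each variable's total corresponds to the height of its bar, and the per-rank counts correspond to the colored segments within the bar.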

Figure 6 shows maps of the order in which daily maximum temperature and mid-day relative humidity were selected by the GLMs at each O3 monitoring site. The daily maximum temperature was most useful as a predictor of MDA8 O3 in the northeastern U.S., the Midwest, and central California, while the mid-day relative humidity was most useful as a predictor of MDA8 O3 in the southeastern U.S., the northwestern U.S., and coastal California. These results demonstrate a high level of geographic coherence in the variables selected by the models and are generally in agreement with the results presented in Camalier et al. (2007). The variable selection order in the quantile regression models was generally similar to that of the GLMs, with a slight tendency to favor more localized meteorological conditions such as cloud cover, wind speed and direction, and transport.

Figure 6.

Maps showing the variable selection order (1st – 10th) for each O3 monitoring site for daily maximum temperature (top panel) and mid-day relative humidity (bottom panel).

Additional results, including regional trends, maps showing the spatial distribution of the selection order for each meteorological variable and model, and maps showing the magnitude of the adjustments for each year and model are included in the Supplemental Information.

5. Conclusions and Future Applications

The technical refinements to the EPA’s current approach for adjusting O3 trends for interannual variability in meteorological conditions presented here allow for a more in-depth understanding of how weather conditions affect O3 levels in the U.S., which in turn may better inform air quality policy and decision making. The refinements to the input data sources allow the statistical models to be fit using estimates of meteorological conditions specific to each individual O3 monitoring site, rather than conditions aggregated over urban areas, which can span up to 100 km. In combination with the use of variable selection to choose the best meteorological predictors of MDA8 O3 concentrations unique to each location, these refinements can improve our understanding of how weather conditions affect O3 levels both regionally and at individual monitoring sites within the same geographic region. The ability to adjust trends in peak MDA8 O3 concentrations through quantile regression methods represents a fundamental advancement in our ability to understand how interannual variability in weather conditions may affect attainment of the O3 NAAQS. In addition, these trends may help inform air quality modelers as to the overall representativeness of the meteorological conditions that contribute to peak O3 levels in a particular year.
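
To make the role of quantile regression concrete, the sketch below shows the check (pinball) loss that quantile regression minimizes (Koenker, 2005), applied to the simplest possible model: a single constant. The data are synthetic and the function names are hypothetical. This is an illustration of the loss function only, not the authors' model, which includes meteorological predictors.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual = observed - predicted, with 0 < tau < 1."""
    return residual * tau if residual >= 0 else residual * (tau - 1.0)


def mean_check_loss(y, prediction, tau):
    """Average check loss of a constant prediction over all observations."""
    return sum(check_loss(v - prediction, tau) for v in y) / len(y)


def best_constant(y, tau):
    """Constant minimizing the mean check loss; a minimizer always lies on a data value."""
    return min(sorted(set(y)), key=lambda c: mean_check_loss(y, c, tau))


# Synthetic data (the values 1..101); not O3 measurements.
y = [float(v) for v in range(1, 102)]
q50 = best_constant(y, 0.50)
q90 = best_constant(y, 0.90)
q98 = best_constant(y, 0.98)
print(q50, q90, q98)
```

At tau = 0.98, an under-prediction costs 49 times as much as an over-prediction of the same size, so the fitted value is pulled toward the upper tail of the distribution. This is why a quantile regression model can target peak concentrations while a model for the mean cannot.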

One application that could follow from this approach is an attempt to further our understanding of how long-term changes in temperature and other meteorological conditions have affected trends in mean and peak O3 concentrations, and how these effects could influence attainment of the O3 NAAQS in the future. Previous studies have shown that long-term changes in meteorological conditions can contribute to higher O3 concentrations (Bloomer et al., 2009; Weaver et al., 2009). One could envision a study where adjusted trends in seasonal mean and peak O3 concentrations are assessed over a longer period (e.g., 30 years or more), thus allowing for the estimation of the impacts of long-term changes in meteorological conditions on these trends. These estimates could then be applied to modeled projections of O3 levels to assess the likelihood of attainment under future meteorological conditions.

Another potential future application is the extension of this approach to adjust trends in other pollutants, particularly fine particulate matter (PM2.5), for variability in weather conditions. Meteorological adjustment of PM2.5 trends would likely present additional challenges, particularly in understanding how different components of PM2.5 such as sulfates and nitrates are uniquely affected by weather conditions. However, these efforts would offer direct benefits for understanding how meteorology could affect PM2.5 levels relative to the forms of the NAAQS, which are based on the annual mean and the 98th percentile.

Supplementary Material

Supplement1

Acknowledgements:

The authors would like to thank Marshall Furman and Emili Moan from North Carolina State University; and Brian Timin, Norm Possiel, Chris Nolte, Tanya Spero, Kiran Alapaty, Elizabeth Naess, James Hemby and Ravi Srivastava from the U.S. Environmental Protection Agency for their contributions to the development and review of this article.

Footnotes

Disclaimer: Although this article has been reviewed by the U.S. EPA and approved for publication, it does not necessarily reflect the U.S. EPA’s policies or views.

References

  1. Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19 (6): 716–723.
  2. Bloomer BJ, Stehr JW, Piety CA, Salawitch RJ, Dickerson RR (2009). Observed relationships of ozone air pollution with temperature and emissions. Geophysical Research Letters, 36 (9).
  3. Bloomfield P, Royle J, Steinberg L, Yang Q (1996). Accounting for meteorological effects in measuring urban ozone levels and trends. Atmospheric Environment, 30, 3067–3077.
  4. Camalier L, Cox W, Dolwick P (2007). The effects of meteorology on ozone in urban areas and their use in assessing ozone trends. Atmospheric Environment, 41, 7127–7137.
  5. Chen Z, Zhuang Y, Xie X, Chen D, Cheng N, Yang L, Li R (2019). Understanding long-term variations of meteorological influences on ground ozone concentrations in Beijing during 2006–2016. Environmental Pollution, 245, 29–37.
  6. Cox W, Chu S (1993). Meteorologically adjusted ozone trends in urban areas: a probabilistic approach. Atmospheric Environment, 27B, 425–434.
  7. Draxler RR, Hess GD (1998). An overview of the HYSPLIT_4 modeling system for trajectories, dispersion, and deposition. Australian Meteorological Magazine, 47, 295–308.
  8. Grange SK, Carslaw DC, Lewis AC, Boleti E, Hueglin C (2018). Random forest meteorological normalisation models for Swiss PM10 trend analysis. Atmospheric Chemistry and Physics, 18, 6223–6239.
  9. Green PJ, Silverman BW (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. United Kingdom: Taylor & Francis.
  10. Huang L, Smith RL (1999). Meteorologically-dependent trends in urban ozone. Environmetrics, 10, 103–118.
  11. Koenker R (2005). Quantile Regression. United Kingdom: Cambridge University Press.
  12. Koenker R (2019). quantreg: Quantile Regression. R package version 5.54. https://CRAN.R-project.org/package=quantreg
  13. Kovač-Andrić E, Brana J, Gvozdić V (2009). Impact of meteorological factors on ozone concentrations modelled by time series analysis and multivariate statistical methods. Ecological Informatics, 4, 117–122.
  14. Lefohn A, Malley C, Simon H, Wells B, Xu X, Zhang L, Tao W (2017). Responses of human health and vegetation exposure metrics to changes in ozone concentration distributions in the European Union, United States, and China. Atmospheric Environment, 152, 123–145.
  15. Lu H, Chang T (2005). Meteorologically adjusted trends of daily maximum ozone concentrations in Taipei, Taiwan. Atmospheric Environment, 39, 6491–6501.
  16. McCullagh P, Nelder JA (1989). Generalized Linear Models, 2nd Edition. United Kingdom: Taylor & Francis.
  17. Mesinger F, Dimego G, Kalnay E, Mitchell K, Shafran PC, Ebisuzaki W, Jovic D, Woollen J, Rogers E, Berbery EH, Ek M, Yun F, Grumbine R, Higgins W, Hong L, Ying L, Manikin G, Parrish D, Wei S (2006). A long-term, consistent, high-resolution climate dataset for the North American domain, as a major improvement upon the earlier global reanalysis datasets in both resolution and accuracy presented. Bulletin of the American Meteorological Society, 87, 342–360.
  18. Milanchus M, Rao ST, Zurbenko I (1998). Evaluating the effectiveness of ozone management efforts in the presence of meteorological variability. Journal of the Air & Waste Management Association, 48, 201–215.
  19. Nychka D, Furrer R, Paige J, Sain S (2017). fields: Tools for spatial data. R package version 10.0. doi: 10.5065/D6W957CT.
  20. Porter WC, Heald CL, Cooley D, Russell B (2015). Investigating the observed sensitivities of air quality extremes to meteorological drivers via quantile regression. Atmospheric Chemistry and Physics, 15, 10349–10366.
  21. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  22. Simon H, Reff A, Wells B, Xing J, Frank N (2015). Ozone trends across the United States over a period of decreasing NOx and VOC emissions. Environmental Science & Technology, 49, 186–195.
  23. Thompson ML, Reynolds J, Cox LH, Guttorp P, Sampson PD (2001). A review of statistical methods for the meteorological adjustment of tropospheric ozone. Atmospheric Environment, 35, 617–630.
  24. Weaver CP, et al. (2009). A preliminary synthesis of modeled climate change impacts on U.S. regional ozone concentrations. Bulletin of the American Meteorological Society, 90, 1843–1864.
