Abstract
Investigating the health impacts of wildfire smoke requires data on people’s exposure to fine particulate matter (PM2.5) across space and time. In recent years, it has become common to use machine learning models to fill gaps in monitoring data. However, it remains unclear how well these models capture spikes in PM2.5 during and across wildfire events. Here, we evaluate the accuracy of two high-coverage, high-resolution machine learning-derived PM2.5 data sets created by Di et al. (2021) and Reid et al. (2021). In general, the Reid estimates are more accurate than the Di estimates when compared to independent validation data from mobile smoke monitors deployed by the US Forest Service. However, both models tend to severely under-predict PM2.5 on high-pollution days. Our findings complement other recent studies calling for increased air pollution monitoring in the western US and support the inclusion of wildfire-specific monitoring observations and predictor variables in model-based estimates of PM2.5. Lastly, we call for more rigorous error quantification of machine learning-derived exposure data sets, with special attention to extreme events.
Keywords: Air Pollution, Wildfires, Exposure Assessment, Environmental Health, Data Science, Validation
Introduction
Motivation: Health Impacts of Wildfire Smoke
Increases in PM2.5 (fine particulate matter) from wildfires could undermine the strides the United States (US) has made in reducing non-smoke-related PM2.5 in recent decades.1,2 There is robust evidence of the severe health consequences of wildfire-related PM2.5 exposure, ranging from premature mortality to asthma exacerbation.3–5 However, the evidence for impacts of wildfire smoke on some health endpoints, such as cardiovascular disease, remains mixed.5–7 Many have argued that resolving these inconsistent findings, and studying the cumulative health impacts of wildfire smoke, requires smoke exposure assessment over large time periods and regions, enabling investigation across fire events.
While wildfire smoke often exposes people to a large collection of harmful air pollutants (including inorganic gases such as carbon monoxide, ozone, and NOx; hydrocarbons; and more8), the evidence to date points to PM2.5 as the pollutant of greatest concern for human health. Accordingly, epidemiologic studies evaluating the health impacts of wildfire smoke most commonly rely on estimates of ambient PM2.5 concentrations for exposure assessment. Ground measurements of PM2.5 from federal reference method (FRM) or federal equivalent method (FEM) monitors, used primarily for regulation of air pollution from point and mobile sources, are considered the gold standard for measuring PM2.5 concentrations. However, because these monitor observations are spatially and temporally sparse, models that estimate continuous surfaces of PM2.5, which can be easily integrated into epidemiologic analyses, are commonly utilized. A variety of modeling approaches are used for exposure estimation, including geostatistical interpolation9 and machine learning algorithms, which can incorporate diverse information sources such as remotely-sensed data products, chemical transport model output, land cover/usage variables characterizing human activity, and measurements from lower-cost, lower-accuracy monitors and sensors.10–13 Numerous model-based high-resolution PM2.5 products exist and are used widely in air pollution (including wildfire smoke) epidemiology.5,6,14
Wildfire smoke epidemiologic studies have shown that the choice of model-based PM2.5 estimates employed for exposure assessment can have a large impact on epidemiologic findings.15 For instance, a study of the 2012 Washington wildfires16 found that the association between PM2.5 and chronic obstructive pulmonary disease differed by how smoke exposure was estimated, varying both in direction (increased vs. decreased risk) and in whether the result was statistically significant. A study of the 2017 California wildfires17 also found that failing to account for the uncertainty in the PM2.5 exposure estimates resulted in an underestimation of the uncertainty in the association with health outcomes. However, to our knowledge, no studies have yet evaluated or compared the accuracy of high-coverage (and high-resolution) PM2.5 products for epidemiologic exposure assessment during wildfire smoke episodes. More broadly, while there has been an influx of machine learning-derived air pollution exposure datasets in recent years, most studies do not quantify uncertainty in these estimates beyond cross-validation on their training sets. Assessing whether these models perform well under a variety of scenarios is a critical first step in developing strategies to improve exposure assessment and health analyses. Our study aims to help fill this gap in the literature, as well as to explore how and why health scientists might choose exposure estimates from one model over another for a particular purpose.
Overview of PM2.5 Validation and Comparison Methods
A 2019 review evaluated several publicly-available PM2.5 exposure products (overall, not specifically targeting wildfire smoke) and observed differences between estimates, likely due to the different input data and modeling techniques used to generate each product as well as the different spatio-temporal resolutions.18 Of relevance to our study, the review noted that PM2.5 in rural areas may often be overestimated because FRM/FEM monitors deployed by the Environmental Protection Agency (henceforth, "EPA monitors") are mostly located in urban areas, which often have more PM2.5 from industrial and traffic emissions. By contrast, during wildfires, the concentrations of PM2.5 are often higher in rural areas, necessitating supplemental monitoring in those areas to capture high PM2.5 levels.18 In general, the review advocated that (a) more inter-comparison and validation of PM2.5 datasets are needed to inform health research and public advocacy, (b) inter-comparison studies should attempt to identify methodological factors contributing to the comparison results, and (c) anticipated use(s) of the exposure data by health researchers and advocates should guide model evaluations. Our study addresses this call.
In the context of both creating and validating wildfire-specific PM2.5 estimates, several studies have demonstrated the importance of incorporating PM2.5 measurements from networks of temporary monitors deployed explicitly to measure wildfire smoke exposures (more details are in the Methods section). A 2020 study estimating smoke concentrations during the October 2017 wildfires in California19 observed that including PM2.5 observations from these temporary monitors in addition to observations from EPA monitors in model training increased the R2 by 36%. The temporary monitoring data can also be used as independent validation data, as was done in a 2021 investigation of the 2017 Northern California wildfires.20 Although their best model achieved R2 = 0.90 against observations from the permanent monitor locations used in model training, its highest R2 against the temporary monitoring data (an independent validation set) was 0.25. In addition to validating the estimates from one model, having an independent source of validation data is critical for comparing the accuracies of different sets of model-based estimates.
In this study, we evaluate and compare the accuracy of two publicly-available, high-coverage and high-resolution sets of daily PM2.5 estimates spanning the western US, with an evaluation emphasis on accuracy during wildfire smoke episodes. We utilize independent validation data from temporary mobile ground monitors (deployed by the US Forest Service to capture wildfire smoke exposures) to assess accuracy of the modeled estimates across both time and space in the western US. Our findings may influence researchers’ selection of a PM2.5 exposure dataset to study health impacts of wildfire smoke and inform the creation of future exposure datasets that aim to target and/or effectively estimate PM2.5 concentrations during wildfire events.
Materials and Methods
PM2.5 Estimates Being Evaluated and Compared
For this analysis, we compared two existing publicly-available, high-resolution and high-coverage daily PM2.5 products, which were both created using machine learning models and which we anticipate researchers using to study the health impacts of wildfire smoke, or of air pollution more generally in the western US. County-level averages of both products, as well as the differences between them, are shown in Figure 1. Differences in seasonal averages are in the Supporting Information (SI), Figure S1. Both Figures 1 and S1 exhibit clear regional and seasonal differences between the estimates from the two models.
Figure 1:
Maps of county-level averages (2008-2016) for both sets of model-based estimates over the western US. The difference Reid - Di is also shown.
The first set of PM2.5 estimates evaluated here was created by Di et al. (2021); henceforth, the Di estimates.21 These daily estimates are available on a 1 km × 1 km grid across the continental US from 2000-2016, although their 2019 paper only covers accuracy metrics through 2015. Di et al. used a geographically-weighted GAM (generalized additive model) ensemble of random forest, gradient boosting, and neural network algorithms. (Details of the PM2.5 data and covariates used to train this model can be found in the 2019 paper.12) This approach yielded a 10-fold cross-validation R2 of 0.77 for the mountain west, 0.80 for the states on the Pacific coast, and 0.86 for the US overall. The authors noted that the model "demonstrated good performance up to 60 μg/m3".12 They also capped their PM2.5 estimated concentrations at 200 μg/m3 due to prediction instability for higher concentrations. We note that although the Di dataset was not explicitly designed for wildfire PM2.5 exposure assessment, these estimates have been widely used in national air pollution epidemiology studies, and as previously stated, smoke from wildfires is a large source of PM2.5 in the western US, making estimation of wildfire smoke concentrations an important part of general air pollution exposure assessment.
The second dataset was created by Reid et al. (2021); henceforth, the Reid estimates.10 These daily PM2.5 concentration estimates, which were created with wildfire smoke epidemiology in mind as a possible use case, are available from 2008-2018 at the centroid of every Census tract, ZIP code, and county in the western US (Arizona, California, Colorado, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming). Reid et al. used a generalized linear model ensemble of random forest and gradient boosting tree algorithms, and achieved a 10-fold cross-validation R2 of 0.73 for a 2008-2016 model and 0.72 for a 2008-2018 model. The reason for having two separate models is that one key variable, input from a chemical transport model called CMAQ (also used in the Di model), was not available past 2016 at the time of model fitting. The 2008-2018 Reid model did not use CMAQ. Unless otherwise specified, the 2008-2016 Reid estimates analyzed in this paper are from the with-CMAQ model.
One probable reason that the Di R2 values are higher than those of Reid is that the monitoring dataset used by Di et al. contained less extreme PM2.5 observations. Due to increasing wildfires in the western US, more recent years have seen more extreme PM2.5 concentrations from wildfire smoke; this increased variability is harder to predict accurately with statistical models.22 Indeed, both Di et al. and Reid et al. observed decreasing R2 values over time when they calculated this metric by year. In addition to covering later (more wildfire-heavy) years, the Reid models also incorporated a wider range of monitoring observations, including some from monitors deployed near wildfires for the purpose of capturing smoke exposures.
Validation Data
We evaluated the Di and Reid model estimates using measurements from mobile monitors deployed by the AirFire research team, which is part of the US Forest Service (USFS). The AirFire monitoring data are in the AIRSIS and Western Regional Climate Center (WRCC) archives,23,24 which can be obtained via the R package PWFSLSmoke.25
When USFS air resource advisors are assigned to wildfires in the western US (typically for large, long-duration fires and fires on federal lands), they deploy mobile monitors when and where there is concern about smoke affecting people.26,27 Specifically, the AirFire mobile monitors are normally placed downwind of the fires, close to where people live, at the judgment of the air resource advisor. The monitors are removed once the fire has been contained and smoke concerns have subsided, though not necessarily immediately, so the AirFire dataset includes some observations with relatively low levels of PM2.5.
Some of these data were used to train the Reid model. Only a fraction was used because, as of data retrieval for that project (in February 2018), not all of the data were discoverable or accessible from the archive websites, and the PWFSLSmoke package was nascent. Thus, for the current study’s validation set, we removed all observations overlapping with the Reid training data (based on latitude, longitude, and date), which excluded some observations from the AIRSIS dataset in addition to WRCC. A map showing the locations of the monitors in the final validation set (and the number of days each collected data) is in the SI (Figure S2).
Data Processing
To compare the Di estimates and Reid estimates to the smoke monitor validation data, we identified the nearest prediction point for each dataset to each smoke monitor location based on latitude and longitude. For the Reid data, this was the nearest Census tract centroid; for the Di data, this was the nearest 1 km x 1 km grid point. We also aggregated the smoke monitor data into 24-hour averages. For any cases where there were multiple Reid estimates for the same location-day (an artifact from their data merging/imputation procedure), we took the mean of these estimates. The main validation/comparison dataset used in this paper spans 2008-2016, but we also compared the Reid data in 2017-2018 (created with the no-CMAQ model) to the smoke monitor data from those years for additional insight, as there were some very large wildfires during those two years.
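The nearest-point matching step can be sketched as a nearest-neighbor search over great-circle distances. This is an illustrative Python sketch only (the original processing was presumably done in R); `haversine_km` and `nearest_prediction_point` are our own names, and a brute-force scan like this would be replaced by a spatial index at the scale of the actual datasets:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_prediction_point(monitor_lat, monitor_lon, prediction_points):
    """Return the (lat, lon) prediction point closest to a smoke monitor.

    For the Reid data the candidate points would be Census tract centroids;
    for the Di data, the 1 km x 1 km grid points."""
    return min(
        prediction_points,
        key=lambda p: haversine_km(monitor_lat, monitor_lon, p[0], p[1]),
    )
```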
To refine the smoke monitor dataset, we removed smoke monitor observations (24-hour averages) above 1,000 μg/m3 (n = 37, or 0.1%) to avoid values that could be due to sensor errors. We then removed all smoke monitor observations that overlapped with the training data used by Reid et al., both (a) identical observations (n = 3,377, or 7.6%, between 2008-2018) and (b) observations within 50 meters and within one week of an observation included in the Reid training set (n = 11,886, or 26.7%, between 2008-2018).
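The 50-meter/one-week overlap filter can be sketched as follows; `overlaps_training` is a hypothetical name, and the linear scan over training observations is for illustration only:

```python
import math
from datetime import date

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius, m
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def overlaps_training(obs, training_obs, max_dist_m=50.0, max_days=7):
    """True if a validation observation (lat, lon, date) lies within
    max_dist_m meters and max_days days of any training observation."""
    lat, lon, day = obs
    return any(
        abs((day - t_day).days) <= max_days
        and haversine_m(lat, lon, t_lat, t_lon) <= max_dist_m
        for t_lat, t_lon, t_day in training_obs
    )
```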
In the main 2008-2016 dataset, there were 35 location-days each from Reid and Di that were missing PM2.5 estimates. We removed these from the analysis due to differential missingness: the smoke monitor observations on location-days missing from the Di dataset were substantially higher than those missing from the Reid dataset (the means of the smoke validation data associated with Di’s and Reid’s missing observations were 34μg/m3 and 6μg/m3 respectively).
Finally, we extracted population density at each of the smoke monitor locations (by matching on Census tract) from the 2010-2014 American Community Survey using the R packages tidycensus28 and tigris.29 This whole procedure left 19,850 location-days across the western US (with observations available in all western states except Nevada) between 2008-2016.
Analysis
Our validation set (from the mobile smoke monitors) contained many low PM2.5 values in addition to very high values, often because the monitors were left in place for a while after fires were contained and smoke subsided. Thus, in this analysis we computed accuracy metrics not only for the overall validation dataset but also for subsets of the data in which the PM2.5 concentrations exceeded certain values, corresponding to thresholds of the US Air Quality Index (AQI). The AQI has six levels ("Good", "Moderate", "Unhealthy for Sensitive Groups", "Unhealthy", "Very Unhealthy", and "Hazardous").30 The PM2.5 breakpoints at which the latter five classifications begin are 12.1, 35.5, 55.5, 150.5, and 250.5 μg/m3, respectively.31 In this study, we computed accuracy metrics for the subset of location-days with smoke monitor PM2.5 ≥ 12.1 μg/m3 (referred to as Medium) and the subset with smoke monitor PM2.5 ≥ 35.5 μg/m3 (referred to as High). Isolating accuracy within these subsets facilitates our discussion of exposure to wildfire smoke, as we assume that (at least) the High level of PM2.5 measured by the smoke monitors characterizes wildfire smoke days, and the Medium level may as well.
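The AQI classification and subsetting described above follow directly from the listed breakpoints. A minimal Python sketch (function names are our own):

```python
import bisect

# PM2.5 breakpoints (ug/m3) at which each AQI level above "Good" begins,
# matching the thresholds given in the text.
AQI_BREAKS = [12.1, 35.5, 55.5, 150.5, 250.5]
AQI_LEVELS = ["Good", "Moderate", "Unhealthy for Sensitive Groups",
              "Unhealthy", "Very Unhealthy", "Hazardous"]

def aqi_level(pm25):
    """Map a 24-hour mean PM2.5 concentration to its AQI level."""
    return AQI_LEVELS[bisect.bisect_right(AQI_BREAKS, pm25)]

def is_medium(pm25):
    """Location-day qualifies for the 'Medium' subset (>= 12.1 ug/m3)."""
    return pm25 >= 12.1

def is_high(pm25):
    """Location-day qualifies for the 'High' subset (>= 35.5 ug/m3)."""
    return pm25 >= 35.5
```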
In addition to summary statistics for each of these subsets and for each state in the western US, we compared the Di and Reid estimates to the smoke monitor observations using the following metrics: mean bias, normalized mean bias, median absolute difference, median ratio, mean ratio, root-mean-square difference (RMSD), squared correlation (R2), spatial RMSD, temporal RMSD, spatial correlation, and temporal correlation. Formulas for all of these metrics, which have been used in similar studies,20,32,33 are in the SI. Briefly, the spatial metrics are calculated after averaging across all days at each location, and they indicate how accurately each set of estimates captures the spatial patterns in wildfire-related PM2.5. The temporal metrics are calculated after averaging across all locations on each day, and they indicate how accurately the estimates capture the temporal patterns of wildfire-related PM2.5.
Because environmental health studies are concerned with population exposure, the main analyses in this paper are weighted by the population density of the census tracts in which the smoke monitors were located. Unless otherwise stated, we henceforth refer to population density-weighted metrics. As sensitivity analyses we include metrics unweighted by population density in the SI.
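A minimal sketch of the population-density weighting, using hypothetical names and standard weighted forms of three of the metrics (the exact formulas used are in the SI); the weights would be the population densities of the Census tracts containing each smoke monitor:

```python
import math

def weighted_metrics(estimates, observations, weights):
    """Population-density-weighted mean bias, normalized mean bias, and RMSD."""
    sw = sum(weights)
    diffs = [e - o for e, o in zip(estimates, observations)]
    mean_bias = sum(w * d for w, d in zip(weights, diffs)) / sw
    norm_mean_bias = (sum(w * d for w, d in zip(weights, diffs))
                      / sum(w * o for w, o in zip(weights, observations)))
    rmsd = math.sqrt(sum(w * d * d for w, d in zip(weights, diffs)) / sw)
    return mean_bias, norm_mean_bias, rmsd
```

Setting all weights equal recovers the unweighted metrics reported in the SI sensitivity analyses.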
To investigate the possibility of differential model performance in the warm season (when the majority of wildfires occur in the western US), we calculated the pairwise mean bias, median ratio, RMSD, R2, and both spatial and temporal correlation between the Reid estimates, Di estimates, and smoke monitor observations, stratified by whether the observation was from the warm season or not, which we defined as the months May through October.
To gain additional insights, we calculated the main metrics for the Reid estimates compared with the smoke monitor observations from 2017-2018 – years for which there are no Di estimates (n = 9,662). Lastly, we calculated these metrics for both the Di estimates and Reid estimates compared with the EPA monitor observations (2008-2016), on which both models were trained (n = 927,414). This allows for comparison of how well the models perform in general over the western US (not just focusing on wildfires).
Finally, we investigated the accuracy of both models’ classifications of PM2.5 according to the AQI. This was motivated by the possibility that health researchers might be inclined to use categorical smoke exposures to address the problem of large-magnitude model-based estimation errors at high concentrations of PM2.5. Indeed, use of both binary7,34 and multicategory35,36 exposures, including AQI classifications,37 is not unusual in the wildfire smoke epidemiology literature. Another motivation for our AQI-based analysis is that the AQI is now often used to trigger interventions to protect the public from wildfire smoke.38 As part of this analysis, we compared the AQI classifications of the Di and Reid model estimates to the AQI classifications of the smoke monitor observations. We also calculated the overall rates of misclassification, of under- and over-classification (e.g., the observed AQI was "Moderate" and the model predicted "Unhealthy for Sensitive Groups", an over-classification), and of misclassification by more than one AQI level (e.g., the observed AQI was "Unhealthy for Sensitive Groups" and the model estimated "Good").
In addition to using all six AQI classes, we also investigated using a binary exposure (non-smoke = "Good" or "Moderate" vs. smoke = "Unhealthy for Sensitive Groups" or above). We calculated a confusion matrix as well as sensitivity, specificity, positive predictive value, and negative predictive value for each dataset compared to the smoke monitor observations.
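The binary classification metrics follow from the confusion-matrix counts; a minimal sketch with our own function name:

```python
def binary_smoke_metrics(estimated_smoke, observed_smoke):
    """Sensitivity, specificity, PPV, and NPV for binary smoke-day labels
    (True = 'Unhealthy for Sensitive Groups' or above)."""
    pairs = list(zip(estimated_smoke, observed_smoke))
    tp = sum(e and o for e, o in pairs)          # true positives
    tn = sum(not e and not o for e, o in pairs)  # true negatives
    fp = sum(e and not o for e, o in pairs)      # false positives
    fn = sum(not e and o for e, o in pairs)      # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```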
Results
Summary Statistics
Table 1 shows that the smoke monitor observations not only tend to have higher PM2.5 values than either set of model estimates, but also higher standard deviations, at each of the three levels of PM2.5. These patterns become more pronounced as the level of PM2.5 increases. Between the two models, the Reid estimates have higher summary statistics than the Di estimates, putting their values closer to those of the smoke monitor observations (the exception being the first quartile of overall PM2.5).
Table 1:
Summary statistics (all in μg/m3) of the smoke monitor observations, Di estimates, and Reid estimates in the merged validation set, subset by level of PM2.5 and weighted by population density.
| Metric | Monitor (Overall) | Di (Overall) | Reid (Overall) | Monitor (Medium) | Di (Medium) | Reid (Medium) | Monitor (High) | Di (High) | Reid (High) |
|---|---|---|---|---|---|---|---|---|---|
| Minimum | 0.00 | 0.00 | 0.00 | 12.12 | 0.00 | 0.63 | 35.50 | 0.00 | 0.63 |
| 1st Quartile | 4.38 | 3.92 | 6.36 | 14.58 | 6.81 | 10.89 | 45.38 | 6.49 | 18.41 |
| Median | 7.58 | 6.34 | 8.51 | 18.54 | 11.43 | 14.68 | 65.46 | 10.92 | 27.67 |
| Mean | 12.11 | 7.92 | 10.82 | 29.50 | 12.44 | 17.84 | 95.10 | 17.33 | 33.86 |
| 3rd Quartile | 12.42 | 10.28 | 12.49 | 26.04 | 15.34 | 19.14 | 69.92 | 23.16 | 36.89 |
| Maximum | 928.04 | 190.71 | 227.11 | 928.04 | 190.71 | 227.11 | 928.04 | 190.71 | 227.11 |
| Standard Deviation | 30.03 | 6.42 | 9.79 | 55.34 | 9.16 | 15.59 | 129.44 | 17.97 | 32.14 |

Subset sizes: Overall N = 19,850; Medium N = 5,432; High N = 1,034.
When we consider the mean and standard deviation of the smoke monitor observations and model estimates stratified by state (Table S1), we see a pattern similar to Table 1: the mean smoke monitor observations tend to be higher than the mean estimates from either model (universally so for medium and high levels of PM2.5), as do the standard deviations. For the most part, the mean Reid estimates are substantially higher than the mean Di estimates. The only cases in which the Di mean exceeds the Reid mean by more than 1 μg/m3 are Montana’s high PM2.5 (Di = 48 μg/m3 vs. Reid = 41 μg/m3) and overall PM2.5 (Di = 10 μg/m3 vs. Reid = 8 μg/m3) and Wyoming’s high PM2.5 (Di = 29 μg/m3 vs. Reid = 26 μg/m3).
Scatterplots by Year and Season
Figure 2 shows pairwise comparisons of the Di, Reid, and smoke monitor data, with points colored by year in the left column and by season in the right column. We first note that both the Di and Reid model estimates have a large number of under-predictions at high values of smoke monitor PM2.5. The Di estimates are almost entirely under-predictions for smoke monitor values > 125μg/m3 and the Reid estimates are almost entirely under-predictions for smoke monitor values > 200μg/m3. These under-predictions contribute to the Deming lines of best fit (shown in green) having flatter slopes than the y=x lines (shown in black). Comparing the slopes of the green and black lines, the larger slope on the Reid vs. Monitor plots (0.21) compared to the Di vs. Monitor plots (0.09) indicates that the Reid estimates are better capturing the variation in the smoke monitor observations, particularly at higher concentrations. However, we also see from the Di vs. Reid plots (slope of best fit line = 0.44) that the two models’ estimates are closer to each other than either of them is to the smoke monitor observations.
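Unlike ordinary least squares, the Deming line of best fit treats both axes as measured with error. As a point of reference, the slope can be computed from the sample variances and covariance; this is a sketch assuming equal error variances in the two variables (an error-variance ratio of one), and `deming_slope` is our own name:

```python
import math

def deming_slope(x, y):
    """Slope of the Deming line of best fit, assuming equal measurement
    error variance in x and y (error-variance ratio delta = 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)        # variance of x
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)        # variance of y
    sxy = sum((xi - mx) * (yi - my)
              for xi, yi in zip(x, y)) / (n - 1)           # covariance
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
```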
Figure 2:
Pairwise comparisons of the Di estimates, Reid estimates, and the smoke monitor observations (all in μg/m3), with points colored by year and by season. (We define Warm Season as May through October.) Slopes of the best-fitting Deming lines are shown for each.
The scatterplots colored by year of observation illustrate how more extreme smoke monitor observations have been collected in more recent years. The scatterplots colored by season (May-October compared to November-April) offer a couple of insights. Unsurprisingly, the majority of observations in the smoke monitor dataset are from the warm months. Also, the observations from the cold months tend to be clustered closer to the origin, except for a string of high concentrations that both models appear to struggle to capture – though the Reid model identifies slightly more of this variability.
Comparison Metrics Results
The mean bias, normalized mean bias, and median ratio values comparing the Di and Reid estimates to the smoke monitor observations (in Table 2) confirm that both models tend to underestimate PM2.5 during smoke events. For instance, for high PM2.5, the mean bias is −77.8 μg/m3 for the Di estimates and −61.2 μg/m3 for the Reid estimates. The one glaring exception is the mean ratio for overall PM2.5, which is substantially greater than 1 for both models (7.6 for Di and 13.4 for Reid). This metric is heavily influenced by instances in which the monitor observation is near zero and the models predict low-to-medium concentrations, yielding very high ratios. The Reid metric in particular is affected by 18 instances in which the monitor observations are in their lowest quartile (< 4 μg/m3) and the model estimates are high (> 35.5 μg/m3). More details on these large over-predictions are in the SI. By contrast, the Di estimates do not exhibit this issue with large over-predictions when the smoke monitor observations are near zero.
Table 2:
Metrics (weighted by population density) comparing the Di estimates and Reid estimates to the smoke monitor observations (2008-2016), stratified by PM2.5 level. All metrics except for median ratio, mean ratio, R2, and spatial/temporal correlation are in μg/m3.
| Metric (Optimal Value) | Di vs. Monitor (Overall) | Reid vs. Monitor (Overall) | Di vs. Monitor (Medium) | Reid vs. Monitor (Medium) | Di vs. Monitor (High) | Reid vs. Monitor (High) |
|---|---|---|---|---|---|---|
| Mean Bias (0) | −4.19 | −1.29 | −17.06 | −11.66 | −77.77 | −61.24 |
| Normalized Mean Bias (0) | −0.35 | −0.11 | −0.58 | −0.40 | −0.82 | −0.64 |
| Median Absolute Difference (0) | 3.01 | 2.66 | 8.44 | 6.17 | 48.48 | 35.03 |
| Median Ratio (1) | 0.85 | 1.15 | 0.55 | 0.73 | 0.14 | 0.46 |
| Mean Ratio (1) | 7.63 | 13.38 | 0.60 | 0.79 | 0.28 | 0.50 |
| RMSD (0) | 29.28 | 27.83 | 57.18 | 54.07 | 150.31 | 142.44 |
| R2 (1) | 0.07 | 0.15 | 0.02 | 0.09 | 0.00 | 0.01 |
| Spatial RMSD (0) | 74.74 | 71.49 | 85.52 | 81.37 | 154.77 | 148.82 |
| Temporal RMSD (0) | 19.18 | 18.51 | 69.61 | 68.42 | 153.94 | 148.75 |
| Spatial Correlation (1) | 0.35 | 0.39 | 0.23 | 0.33 | 0.00 | 0.15 |
| Temporal Correlation (1) | 0.27 | 0.34 | 0.04 | 0.10 | −0.09 | −0.04 |

Subset sizes: Overall N = 19,850; Medium N = 5,432; High N = 1,034.
Across all metrics, both models’ performance deteriorates (even in normalized terms) as PM2.5 increases. The large RMSD values and small R2 values indicate that both models struggle to capture high values of PM2.5, which in this validation set are primarily due to wildfire smoke. Indeed, at high PM2.5 concentrations, the estimates from both models are effectively uncorrelated with the smoke monitor observations. Compared to the smoke monitor observations, the only metric in Table 2 for which the Di estimates clearly outperform the Reid estimates is the overall mean ratio (which, as mentioned above, is heavily impacted by discrepancies at lower levels of PM2.5 as measured by the smoke monitors).
A result that is harder to interpret is that for high PM2.5, both models’ temporal correlations are negative (−0.09 for the Di estimates and −0.04 for the Reid estimates), which indicates that higher monitor observations are linearly associated with lower model estimates. Negative temporal correlation could be due to the time it takes for wildfire smoke to travel away from a fire area and be measured by EPA monitors in other areas, the data from which constituted the majority of the training sets for the Di and Reid models. However, these temporal correlation values have relatively small magnitudes, so we caution against over-interpreting these results.
Figure 3 shows pairwise normalized mean bias, median ratio, RMSD, correlation, and spatial and temporal correlations for the Di and Reid estimates and smoke monitor observations, stratified by level of monitor PM2.5 and warm/cold season. Note that better agreement for each pair is indicated by normalized mean bias and RMSD values closer to zero, and by median ratio and (regular, spatial, and temporal) correlation values closer to one. Clearly, there are more discrepancies between the three datasets as the level of PM2.5 increases and as we move from warm to cold season. The generally worse accuracy of the model estimates in the cold season may have implications for studies investigating differential health impacts of wild and prescribed fires, because prescribed fires mostly occur, and increasingly so, during what we define as the cold season.39,40 Worse estimation accuracy for smoke during the cold season may also have implications for studies investigating health interaction effects between temperature and smoke (or PM2.5 in general), for which there is already some evidence.7,41
Figure 3:
Pairwise normalized mean bias, median ratio, RMSD, R2, and spatial and temporal correlations for the smoke monitor data, Di estimates and Reid estimates, stratified by PM2.5 level and whether the month was between May and October (which we define as the warm season in the western US) and weighted by population density. The plot for normalized mean bias does not include metrics for Di vs Reid to avoid having to use one of the sets of model estimates for normalization. Sample sizes for each subset are shown at the top.
The yellow bars, comparing the Di and Reid estimates, show by far the highest agreement (across levels of PM2.5 and warm/cold season), confirming that the two sets of model estimates tend to be more similar to each other than either is to the smoke monitor data. However, the median ratio for overall PM2.5 indicates closer agreement between the model estimates and the smoke monitor observations than between the two models.
With the exception of temporal correlation for overall PM2.5 in the warm season, the only subset-metrics for which the Di estimates outperform the Reid estimates (compared with the smoke monitor observations) occur in the cold season: Di’s median ratio for overall PM2.5 is closer to one, and both the temporal correlation for high PM2.5 and the spatial correlations across all levels of PM2.5 are less negative than their Reid counterparts. As noted previously, negative temporal correlation could be due to the time it takes for wildfire smoke to travel away from a fire area and be measured by EPA monitors in other areas. Spatially, negative correlation could be due to the aforementioned issue of most EPA monitors being placed in more urban areas with higher background PM2.5 while most wildfires and smoke monitors are located in more rural areas. While this is an interesting result, keep in mind that our sample size for high PM2.5 in the cold season is relatively small (n = 89). In any case, it appears that the Reid estimates struggle more to capture the spatio-temporal variability of wildfire smoke exposures in the cold season, but generally capture more of this variability than the Di estimates in the warm season, across levels of PM2.5.
We present and discuss the results of the AQI classification and binary smoke / nonsmoke classification analyses in the Supplemental Notes (in the SI). The results of the sensitivity analyses (metrics unweighted by population density, comparing the Reid estimates in 2017-2018 with the smoke monitor validation data, and comparing the Reid and Di estimates with the EPA monitor observations from 2008-2016) are also in the Supplemental Notes.
Discussion
In this project, we evaluated and compared the modeled daily PM2.5 concentrations over the western US in 2008-2016 from two publicly-available high-resolution datasets, with an eye towards their suitability for studying the health impacts of wildfire smoke.
Using independent validation data from mobile monitors deployed for the specific purpose of measuring wildfire smoke concentrations near where people live, we found that across levels of PM2.5, the smoke monitor validation observations were higher than the estimates from either model. This under-prediction worsens as the level of PM2.5 increases (even on a normalized scale). The relatively poor performance of both models against the smoke validation data (overall, Di RMSD = 29.3 μg/m3 and R2 = 0.07; Reid RMSD = 27.8 μg/m3 and R2 = 0.15) indicates that relying on estimates from machine learning models of air pollution concentrations to analyze the health impacts of wildfire smoke could introduce substantial exposure measurement error. Of course, using only monitoring data would likely result either in a much smaller sample size for health analyses (if one considered only people living in the immediate vicinity of a monitor) or in more exposure measurement error than the modeled estimates,42,43 especially for areas without a monitor close by.
Our analysis also highlights the importance of using independent, smoke-specific ground observations for evaluating model estimates: the cross-validation R2 values reported in the Di and Reid papers were between four and eleven times higher than the R2 values we obtained against the smoke monitor observations. This discrepancy aligns with others’ findings.20
In general, our comparison metrics show that the Reid estimates outperform the Di estimates when estimating PM2.5 as measured by the smoke monitors, and this is true across levels of PM2.5. However, the Di estimates perform slightly better than the Reid estimates at capturing spatio-temporal variability during the cold season, when more prescribed fires occur. That said, both models perform much worse during the cold season than during the warm season, likely in part because there are far fewer wildfires (and thus we have far less smoke monitor data) from those months.
For context, when compared with observations from the EPA monitors (on which both models were trained), the Reid estimates capture more large-scale variability in western US PM2.5 than the Di estimates. However, a few metrics suggest that the Di model is capturing a bit more localized variability, likely from non-wildfire (i.e. urban / industrial) sources of PM2.5, while not performing as well on the large-scale variability, which is often influenced by wildfire smoke.
We conclude that if researchers were to use one of these datasets for investigating wildfire smoke exposure and health impacts over the western US, the Reid estimates would be preferable to the Di estimates. However, both sets of estimates have their drawbacks. Any researchers interested in using estimates from these datasets (or similar datasets) must address the challenge of modeled data substantially underestimating peak PM2.5 events. Without adjustment, it is likely that under-estimation of these high PM2.5 concentrations would bias health effects towards the null.44
An approach that we imagine health researchers might consider to address the issue of large-magnitude errors is aggregating PM2.5 estimates into categorical exposures. To investigate this possibility, we computed classification accuracies under both the six-level AQI and a binary smoke/non-smoke classification. Overall, the Reid and Di models misclassify the AQI (as measured by the smoke monitors) 20.7% and 23.2% of the time, respectively. However, the Di estimates exhibit more under-classification of the AQI, whereas the Reid estimates exhibit more over-classification. It is unclear whether these two cases (over- and under-classification) might differentially bias health effect estimates based on a categorical PM2.5 exposure. For the binary exposure, both models misclassify only 3.4% of the time and identify non-smoke days over 95% of the time. However, the Reid model identifies a higher fraction of the true smoke days (sensitivity), while when the Di model classifies a day as smoky, that classification is more likely to be correct (positive predictive value).
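The binary classification summaries referenced here (misclassification rate, sensitivity, positive predictive value) follow from a standard confusion matrix, as in this sketch; the function and variable names are hypothetical:

```python
def binary_classification_metrics(obs_smoke, pred_smoke):
    """Confusion-matrix summaries for binary smoke / non-smoke labels.

    obs_smoke and pred_smoke are aligned sequences of booleans
    (True = smoke day). Names and structure are illustrative.
    """
    tp = sum(o and p for o, p in zip(obs_smoke, pred_smoke))           # true positives
    tn = sum((not o) and (not p) for o, p in zip(obs_smoke, pred_smoke))
    fp = sum((not o) and p for o, p in zip(obs_smoke, pred_smoke))     # false alarms
    fn = sum(o and (not p) for o, p in zip(obs_smoke, pred_smoke))     # missed smoke days
    n = tp + tn + fp + fn
    return {
        "misclassification": (fp + fn) / n,
        "sensitivity": tp / (tp + fn),   # fraction of true smoke days identified
        "specificity": tn / (tn + fp),   # fraction of non-smoke days identified
        "ppv": tp / (tp + fp),           # P(truly smoky | classified as smoky)
    }
```

In this framing, the Reid model's advantage is a higher sensitivity, while the Di model's advantage is a higher positive predictive value; which matters more depends on whether a health analysis is more harmed by missed smoke days or by false alarms.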
One of the likely reasons that the Reid estimates are able to capture higher wildfire smoke concentrations is the inclusion of additional wildfire-specific monitoring observations in their training data. Our use of the mobile smoke monitors for independent validation of both models further illustrates the value of such data. As presented in the Supplemental Notes, the Reid estimates from 2017-2018 perform much better than those from 2008-2016 (compared to the smoke monitor observations). We infer that this is because collection of these wildfire-specific data has increased over time: before we processed the data (e.g., to remove overlap with the Reid training set), the number of mobile smoke monitor measurements grew approximately linearly between 2008 and 2018, with an average increase of 736 daily observations per year. We recommend broader use of such data in future wildfire smoke exposure modeling and validation analyses. In addition to helping estimate high concentrations, this may help to address the issue of negative temporal correlation (which both the Di and Reid estimates exhibited for high levels of PM2.5 observed by the smoke monitors).
Our conclusion that more western wildfire-specific monitoring is needed to capture population exposure to wildfire smoke is in agreement with the findings of two studies published in 2022, which used very different methodologies – one used mode decomposition on the Di estimates45 and the other used chemical transport modeling under past and future climate conditions.46 Both of these studies focused on the network of EPA monitors (and our study additionally used the USFS mobile smoke monitors), but another possibility for future years is including data from low-cost air quality sensors, which are much less accurate than EPA monitors and the mobile smoke monitors, but can provide more high-resolution spatial and temporal information on PM2.5.11 During the 2021 fire season, the USFS deployed low-cost sensors to fill a shortage in the smoke monitors.26 EPA researchers have also developed a smoke correction for PurpleAir sensors,47 a common brand of low-cost PM2.5 sensors that many people have purchased for use in their homes and communities. However, including lower-accuracy air quality observations from these low-cost sensors can introduce new challenges.11
In addition to training on wildfire smoke-specific monitor data, other aspects of the Reid model that may have helped it better capture high PM2.5 from wildfire events are its use of a satellite product identifying active fire points and its inclusion of spatio-temporal indicator variables for states and subregions of the western US, which may have helped optimize the model for this region. The Di model, in addition to covering the entire US, incorporated more spatial and temporal smoothing. Specifically, it used spatially lagged monitored PM2.5, as well as one-day and three-day lags of those spatially lagged terms, as predictor variables in its final model. While this technique may be beneficial for estimation of PM2.5 overall, it might also smooth out some of the extreme, short-term spikes due to wildfires. In the future, we suggest that researchers modeling the effects of wildfire smoke (or other regionally-clustered phenomena) consider adding more event- and/or region-specific information. Users of model-based air pollution estimates should also be mindful of prediction caps (e.g., the Di estimates were capped at 200 μg/m3 whereas the Reid estimates were not) or other means of "taming" large estimates, which can be inappropriate in the wildfire smoke context.
A drawback of region-specific modeling is that wildfire smoke can travel large distances. As suggested by a 2021 study,48 diluted smoke concentrations in more populated areas (i.e. in the eastern US) may increase attributable mortality and asthma morbidity from wildfire smoke relative to higher smoke concentrations in less populated areas (i.e. much of the western US). While the Di estimates cover the continental US, the Reid estimates only cover the western US.
There is also a difference in spatial resolution between the two datasets: the Di estimates are available at every 1 km x 1 km grid cell, while the Reid estimates are available at the centroid of every county, ZIP code, and Census tract. Given that most health data is aggregated to one of these geographical units and many of the covariates used to train the models were coarser than 1 km x 1 km, this difference may not greatly affect health analyses. On the other hand, when researchers have address information or are linking in other gridded data, the Di estimates may be preferred. In any case, the intended application of the PM2.5 estimates should inform their use, in epidemiologic studies or otherwise.
Another important aspect of both the Reid and Di models is that they estimated total PM2.5, not smoke-specific PM2.5 (total minus background PM2.5). To investigate exposure to and health impacts of wildfire smoke, researchers must decide how to classify "smoke days" and "non-smoke days". This is also necessary for groups that choose to model the counterfactual background PM2.5 so that they can calculate the amount that is wildfire-specific. And, from a health impacts standpoint (e.g., for estimation of concentration-response functions), people experience total PM2.5, so any health impacts are influenced not just by the wildfire-specific PM2.5 but by total PM2.5. In the future, using component-specific PM2.5 estimates may help improve the characterization of wildfire smoke estimates and the subsequent calculation of health effects.49
A final note about both models is that although we can estimate predictive accuracy using cross-validation and comparison with independent and/or smoke-specific monitoring data, these machine learning approaches do not allow for quantifying the uncertainty of individual predictions, which might then be propagated into subsequent estimation of health effects (i.e., health impact analyses or risk assessments) using an existing concentration-response function. Future air pollution modelers might consider a method such as Bayesian Additive Regression Trees (BART), which, in addition to being a flexible ensemble modeling approach similar to those used in many air pollution predictive models, allows for natural estimation of predictive uncertainties using Bayesian modeling techniques.50 On the other hand, several health impact assessments (including one wildfire-specific study) have observed that the magnitude of mortality and morbidity estimates depends much more on the uncertainty of the concentration-response function being used than on the choice of PM2.5 estimates to which that function is applied.17,33 This is reassuring if one is applying a robust concentration-response function to a noisier exposure dataset, but problematic if one is attempting to develop a wildfire smoke-specific concentration-response function from exposure estimates produced by models that struggle to predict high concentrations of PM2.5, as this will likely lead to more uncertainty in the resulting function.
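To illustrate what per-prediction uncertainty quantification can look like, the sketch below uses a bootstrap ensemble of simple linear fits to produce a predictive interval. BART instead draws such intervals from a Bayesian posterior over sums of trees, so this is only a conceptual analogue, and all names here are ours:

```python
import random

def bootstrap_prediction_interval(xs, ys, x_new, n_boot=2000, seed=0):
    """Per-prediction uncertainty via a bootstrap ensemble of simple
    linear fits (illustrative only; BART obtains analogous intervals
    from a Bayesian posterior rather than by resampling).
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Resample training pairs with replacement and refit
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        mx = sum(bx) / n
        my = sum(by) / n
        sxx = sum((x - mx) ** 2 for x in bx)
        if sxx == 0:          # degenerate resample; skip it
            continue
        slope = sum((x - mx) * (y - my) for x, y in zip(bx, by)) / sxx
        intercept = my - slope * mx
        preds.append(intercept + slope * x_new)
    preds.sort()
    lo = preds[int(0.025 * len(preds))]   # empirical 95% interval
    hi = preds[int(0.975 * len(preds))]
    return sum(preds) / len(preds), (lo, hi)
```

The key output is a distribution of predictions at each location-day rather than a single number, which is exactly what downstream health impact analyses would need in order to propagate exposure uncertainty.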
The primary limitation of our main analysis is that the smoke monitor observations were not evenly spread across our study region in either space or time, as illustrated by Figure S2. For instance, there were zero observations available from Nevada, and 2017-2018 alone contained half as many observations as the entire nine-year 2008-2016 period. This could reflect more routine air quality monitoring, or simply more monitoring during the heavy wildfire events in the western US, which have been increasing.2 The heterogeneous distribution of the smoke monitor observations made some of the metrics we used, such as spatial and temporal correlation/RMSD, more difficult to interpret. For instance, since most of the smoke monitors were collecting data during the summer months, spatial correlation in this analysis likely includes some temporal information. Another limitation is that the warm and cold seasons (at least as they relate to wild and prescribed fires) are not exactly the same for all the states in the western US,40 so our definitions of the warm season as May-October and the cold season as November-April may have obscured some of the seasonal heterogeneity across states.
Supplementary Material
Figure S1: Maps of the seasonal differences between county-level averages of the Di and Reid estimates (2008-2016)
Figure S2: Map of all monitoring observations in the validation set, colored by the number of days they were deployed
Table S1: Mean (standard deviation) for each data source in the validation set, by state and level of PM2.5
Table S2: Comparison metrics (like Table 2) unweighted by population density
Table S3: Frequency tables of each dataset’s classification of the AQI in the validation set
Table S4: AQI classification metrics for the Di and Reid estimates compared with the smoke monitor observations
Table S5: Frequency tables of each dataset’s binary classification of AQI ("Non-smoke" vs. "Smoke") in the validation set
Table S6: Summary statistics for the Di and Reid estimates’ binary classification of AQI ("Non-smoke" vs. "Smoke") compared with the smoke monitor observations
Table S7: Comparison metrics (like Table 2) for the validation observations and the Reid estimates without CMAQ, split into 2017-2018 and 2008-2016
Table S8: Comparison metrics (like Table 2) for both sets of model estimates and the EPA monitor observations
Synopsis:
Airborne particulate matter data spanning years and large areas are needed to study the health impacts of wildfire smoke. We examine how well existing large datasets capture smoke exposures.
Acknowledgement
The authors thank:
The National Studies on Air Pollution and Health (NSAPH) research lab for their support of this project.
Pete Lahm of the Interagency Wildland Fire Air Quality Response Program for answering all of our questions about the mobile smoke monitors.
The AirFire Research Team, specifically Amy Marsha, for helping us obtain the AIRSIS and WRCC data.
Funding
NIH grant 5T32ES007142 (EMC)
NIH grant K01ES032458 (RCN)
Harvard Climate Change Solutions Fund (RCN)
Footnotes
Conflicts of Interest
Disclaimer: CER and EMC were the primary creators of the Reid estimates. However, DB and RCN also have close ties to the research group that created the Di estimates. Otherwise, we have no conflicts of interest to declare.
Code and Data Availability
Code used to process and analyze the data and visualize the results can be found at https://github.com/haohaojy/wildfire-PM2.5-comparison. The processed datasets used in this analysis are available on Harvard Dataverse: https://doi.org/10.7910/DVN/MBAVER. The raw datasets (prior to our processing) are also all publicly available.
References
- (1).Burke M; Driscoll A; Heft-Neal S; Xue J; Burney J; Wara M The changing risk and burden of wildfire in the United States. Proceedings of the National Academy of Sciences 2021, 118, e2011048118, DOI: 10.1073/pnas.2011048118.
- (2).O’Dell K; Ford B; Fischer EV; Pierce JR Contribution of Wildland-Fire Smoke to US PM2.5 and Its Influence on Recent Trends. Environmental Science & Technology 2019, 53, 1797–1804, DOI: 10.1021/acs.est.8b05430.
- (3).Neumann JE; Amend M; Anenberg S; Kinney PL; Sarofim M; Martinich J; Lukens J; Xu J-W; Roman H Estimating PM2.5 mortality and morbidity associated with future wildfire emissions in the western US. Environmental Research Letters 2021, 16, 035019, DOI: 10.1088/1748-9326/abe82b.
- (4).Liu JC; Mickley LJ; Sulprizio MP; Dominici F; Yue X; Ebisu K; Anderson GB; Khan RFA; Bravo MA; Bell ML Particulate air pollution from wildfires in the Western US under climate change. Climatic Change 2016, 138, 655–666, DOI: 10.1007/s10584-016-1762-6.
- (5).Reid CE; Maestas MM Wildfire smoke exposure under climate change: impact on respiratory health of affected communities. Current Opinion in Pulmonary Medicine 2019, 25, 179–187, DOI: 10.1097/MCP.0000000000000552.
- (6).Chen H; Samet JM; Bromberg PA; Tong H Cardiovascular health impacts of wildfire smoke exposure. Particle and Fibre Toxicology 2021, 18, 2, DOI: 10.1186/s12989-020-00394-8.
- (7).Reid CE; Brauer M; Johnston FH; Jerrett M; Balmes JR; Elliott CT Critical Review of Health Impacts of Wildfire Smoke Exposure. Environmental Health Perspectives 2016, 124, 1334–1343, DOI: 10.1289/ehp.1409277.
- (8).Naeher LP; Brauer M; Lipsett M; Zelikoff JT; Simpson CD; Koenig JQ; Smith KR Woodsmoke Health Effects: A Review. Inhalation Toxicology 2007, 19, 67–106, DOI: 10.1080/08958370600985875.
- (9).Lassman W; Ford B; Gan RW; Pfister G; Magzamen S; Fischer EV; Pierce JR Spatial and temporal estimates of population exposure to wildfire smoke during the Washington state 2012 wildfire season using blended model, satellite, and in situ data: Multimethod Estimates of Smoke Exposure. GeoHealth 2017, 1, 106–121, DOI: 10.1002/2017GH000049.
- (10).Reid C; Considine EM; Maestas MM; Li G Daily PM2.5 concentration estimates by county, ZIP code, and census tract in 11 western states 2008-2018. Scientific Data 2021, 8, DOI: 10.1038/s41597-021-00891-1.
- (11).Bi J; Wildani A; Chang HH; Liu Y Incorporating Low-Cost Sensor Measurements into High-Resolution PM2.5 Modeling at a Large Spatial Scale. Environmental Science & Technology 2020, 54, 2152–2162, DOI: 10.1021/acs.est.9b06046.
- (12).Di Q; Amini H; Shi L; Kloog I; Silvern R; Kelly J; Sabath MB; Choirat C; Koutrakis P; Lyapustin A; Wang Y; Mickley LJ; Schwartz J An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environment International 2019, 130, 104909, DOI: 10.1016/j.envint.2019.104909.
- (13).Reid CE; Jerrett M; Petersen ML; Pfister GG; Morefield PE; Tager IB; Raffuse SM; Balmes JR Spatiotemporal Prediction of Fine Particulate Matter During the 2008 Northern California Wildfires Using Machine Learning. Environmental Science & Technology 2015, 49, 3887–3896, DOI: 10.1021/es505846r.
- (14).Di Q; Dai L; Wang Y; Zanobetti A; Choirat C; Schwartz JD; Dominici F Association of Short-Term Exposure to Air Pollution with Mortality in Older Adults. JAMA 2017, 318, 2446–2456, DOI: 10.1001/jama.2017.17923.
- (15).Jiang X; Eum Y; Yoo E-H The impact of fire-specific PM2.5 calibration on health effect analyses. Science of The Total Environment 2023, 857, 159548, DOI: 10.1016/j.scitotenv.2022.159548.
- (16).Gan RW; Ford B; Lassman W; Pfister G; Vaidyanathan A; Fischer E; Volckens J; Pierce JR; Magzamen S Comparison of wildfire smoke estimation methods and associations with cardiopulmonary-related hospital admissions: Estimates of Smoke and Health Outcomes. GeoHealth 2017, 1, 122–136, DOI: 10.1002/2017GH000073.
- (17).Cleland SE; Serre ML; Rappold AG; West JJ Estimating the Acute Health Impacts of Fire-Originated PM2.5 Exposure During the 2017 California Wildfires: Sensitivity to Choices of Inputs. GeoHealth 2021, 5, DOI: 10.1029/2021GH000414.
- (18).Diao M; Holloway T; Choi S; O’Neill SM; Al-Hamdan MZ; Van Donkelaar A; Martin RV; Jin X; Fiore AM; Henze DK; Lacey F; Kinney PL; Freedman F; Larkin NK; Zou Y; Kelly JT; Vaidyanathan A Methods, availability, and applications of PM2.5 exposure estimates derived from ground measurements, satellite, and atmospheric models. Journal of the Air & Waste Management Association 2019, 69, 1391–1414, DOI: 10.1080/10962247.2019.1668498.
- (19).Cleland SE; West JJ; Jia Y; Reid S; Raffuse S; O’Neill S; Serre ML Estimating Wildfire Smoke Concentrations during the October 2017 California Fires through BME Space/Time Data Fusion of Observed, Modeled, and Satellite-Derived PM2.5. Environmental Science & Technology 2020, 54, 13439–13447, DOI: 10.1021/acs.est.0c03761.
- (20).O’Neill SM; Diao M; Raffuse S; Al-Hamdan M; Barik M; Jia Y; Reid S; Zou Y; Tong D; West JJ; Wilkins J; Marsha A; Freedman F; Vargo J; Larkin NK; Alvarado E; Loesche P A multi-analysis approach for estimating regional health impacts from the 2017 Northern California wildfires. Journal of the Air & Waste Management Association 2021, 71, 791–814, DOI: 10.1080/10962247.2021.1891994.
- (21).Di Q; Wei Y; Shtein A; Hultquist C; Xing X; Amini H; Shi L; Kloog I; Silvern R; Kelly J; Sabath MB; Choirat C; Koutrakis P; Lyapustin A; Wang Y; Mickley LJ; Schwartz J Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-km Grids, v1 (2000-2016). https://beta.sedac.ciesin.columbia.edu/data/set/aqdh-pm2-5-concentrations-contiguous-us-1-km-2000-2016, 2021; Last accessed on 11/01/21.
- (22).Qi D; Majda AJ Using machine learning to predict extreme events in complex systems. Proceedings of the National Academy of Sciences 2020, 117, 52–59, DOI: 10.1073/pnas.1917285117.
- (23).AIRSIS: Interagency Real-Time Smoke Monitoring. https://app.airsis.com/USFS/, Last accessed on 01/03/23.
- (24).Desert Research Institute, W. R. C. C. Fire Cache Smoke Monitor Archive. https://wrcc.dri.edu/cgi-bin/smoke.pl, Last accessed on 01/03/23.
- (25).Callahan J, et al. Mazama Science, PWFSLSmoke package, version 1.2.117. 2021.
- (26).Lahm P; Considine E Phone inquiry about Desert Research Institute data and other mobile monitors. 2021.
- (27).U.S. Environmental Protection Agency, Comparative Assessment of the Impacts of Prescribed Fire Versus Wildfire (CAIF): A Case Study in the Western U.S. https://cfpub.epa.gov/ncea/risk/recordisplay.cfm?deid=352824, 2021; Last accessed on 01/03/23.
- (28).Walker K; Herman M tidycensus: Load US Census Boundary and Attribute Data as ’tidyverse’ and ’sf’-Ready Data Frames. 2022; R package version 1.2.2.
- (29).Walker K tigris: Load Census TIGER/Line Shapefiles. 2022; R package version 1.6.1.
- (30).U.S. Environmental Protection Agency, Air Quality Index (AQI) Basics. https://www.airnow.gov/aqi/aqi-basics/, Last accessed on 01/03/23.
- (31).The AQI Equation. https://forum.airnowtech.org/t/the-aqi-equation/169, Last accessed on 01/03/23.
- (32).Ye X; Arab P; Ahmadov R; James E; Grell GA; Pierce B; Kumar A; Makar P; Chen J; Davignon D; Carmichael GR; Ferrada G; McQueen J; Huang J; Kumar R; Emmons L; Herron-Thorpe FL; Parrington M; Engelen R; Peuch V-H; da Silva A; Soja A; Gargulinski E; Wiggins E; Hair JW; Fenn M; Shingler T; Kondragunta S; Lyapustin A; Wang Y; Holben B; Giles DM; Saide PE Evaluation and intercomparison of wildfire smoke forecasts from multiple modeling systems for the 2019 Williams Flats fire. Atmospheric Chemistry and Physics 2021, 21, 14427–14469, DOI: 10.5194/acp-21-14427-2021.
- (33).Jin X; Fiore AM; Civerolo K; Bi J; Liu Y; van Donkelaar A; Martin RV; Al-Hamdan M; Zhang Y; Insaf TZ; Kioumourtzoglou M-A; He MZ; Kinney PL Comparison of multiple PM2.5 exposure products for estimating health benefits of emission controls over New York State, USA. Environmental Research Letters 2019, 14, 084023, DOI: 10.1088/1748-9326/ab2dcb.
- (34).Liu JC; Wilson A; Mickley LJ; Ebisu K; Sulprizio MP; Wang Y; Peng RD; Yue X; Dominici F; Bell ML Who Among the Elderly Is Most Vulnerable to Exposure to and Health Risks of Fine Particulate Matter From Wildfire Smoke? American Journal of Epidemiology 2017, 186, 730–735, DOI: 10.1093/aje/kwx141.
- (35).Jones CG; Rappold AG; Vargo J; Cascio WE; Kharrazi M; McNally B; Hoshiko S; with the CARES Surveillance Group, Out-of-Hospital Cardiac Arrests and Wildfire-Related Particulate Matter During 2015–2017 California Wildfires. Journal of the American Heart Association 2020, 9, e014125, DOI: 10.1161/JAHA.119.014125.
- (36).Alman BL; Pfister G; Hao H; Stowell J; Hu X; Liu Y; Strickland MJ The association of wildfire smoke with respiratory and cardiovascular emergency department visits in Colorado in 2012: a case crossover study. Environmental Health 2016, 15, 64, DOI: 10.1186/s12940-016-0146-8.
- (37).Hutchinson JA; Vargo J; Milet M; French NHF; Billmire M; Johnson J; Hoshiko S The San Diego 2007 wildfires and Medi-Cal emergency department presentations, inpatient hospitalizations, and outpatient visits: An observational study of smoke exposure periods and a bidirectional case-crossover analysis. PLOS Medicine 2018, 15, e1002601, DOI: 10.1371/journal.pmed.1002601.
- (38).Holm SM; Miller MD; Balmes JR Health effects of wildfire smoke in children and public health tools: a narrative review. Journal of Exposure Science & Environmental Epidemiology 2021, 31, 1–20, DOI: 10.1038/s41370-020-00267-4.
- (39).Baijnath-Rodino JA; Li S; Martinez A; Kumar M; Quinn-Davidson LN; York RA; Banerjee T Historical seasonal changes in prescribed burn windows in California. Science of The Total Environment 2022, 836, 155723, DOI: 10.1016/j.scitotenv.2022.155723.
- (40).Ryan KC; Knapp EE; Varner JM Prescribed fire in North American forests and woodlands: history, current practice, and challenges. Frontiers in Ecology and the Environment 2013, 11, e15–e24, DOI: 10.1890/120329.
- (41).Anenberg SC; Haines S; Wang E; Nassikas N; Kinney PL Synergistic health effects of air pollution, temperature, and pollen exposure: a systematic review of epidemiological evidence. Environmental Health 2020, 19, 130, DOI: 10.1186/s12940-020-00681-z.
- (42).Keller JP; Peng RD Error in estimating area-level air pollution exposures for epidemiology. Environmetrics 2019, 30, e2573, DOI: 10.1002/env.2573.
- (43).Ebelt ST; D’Souza RR; Yu H; Scovronick N; Moss S; Chang HH Monitoring vs. modeled exposure data in time-series studies of ambient air pollution and acute health outcomes. Journal of Exposure Science & Environmental Epidemiology 2022, DOI: 10.1038/s41370-022-00446-5.
- (44).Wei Y; Qiu X; Yazdi MD; Shtein A; Shi L; Yang J; Peralta AA; Coull BA; Schwartz JD The Impact of Exposure Measurement Error on the Estimated Concentration-Response Relationship between Long-Term Exposure to PM2.5 and Mortality. Environmental Health Perspectives 2022, 130, 077006, DOI: 10.1289/EHP10389.
- (45).Kelp MM; Lin S; Kutz JN; Mickley LJ A new approach for determining optimal placement of PM2.5 air quality sensors: case study for the contiguous United States. Environmental Research Letters 2022, 17, 034034, DOI: 10.1088/1748-9326/ac548f.
- (46).Marlier ME; Brenner KI; Liu JC; Mickley LJ; Raby S; James E; Ahmadov R; Riden H Exposure of agricultural workers in California to wildfire smoke under past and future climate conditions. Environmental Research Letters 2022, 17, 094045, DOI: 10.1088/1748-9326/ac8c58.
- (47).Barkjohn K; Holder A; Clements A; Frederick S; Evans R Sensor Data Cleaning and Correction: Application on the AirNow Fire and Smoke Map. https://cfpub.epa.gov/si/si_public_record_Report.cfm?dirEntryId=353088&Lab=CEMM, 2021; Last accessed on 08/09/22.
- (48).O’Dell K; Bilsback K; Ford B; Martenies SE; Magzamen S; Fischer EV; Pierce JR Estimated Mortality and Morbidity Attributable to Smoke Plumes in the United States: Not Just a Western US Problem. GeoHealth 2021, 5, DOI: 10.1029/2021GH000457.
- (49).Stowell JD; Bi J; Al-Hamdan MZ; Lee HJ; Lee S-M; Freedman F; Kinney PL Estimating PM2.5 in Southern California using satellite data: factors that affect model performance. Environmental Research Letters 2020, 15, 094004, DOI: 10.1088/1748-9326/ab9334.
- (50).Zhang T; Geng G; Liu Y; Chang HH Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM2.5 Components. Atmosphere 2020, 11, 1233, DOI: 10.3390/atmos11111233.