AMIA Summits on Translational Science Proceedings. 2023 Jun 16;2023:467–476.

Linking Ambient NO2 Pollution Measures with Electronic Health Record Data to Study Asthma Exacerbations

Alana Schreibman 1, Sherrie Xie 1, Rebecca A Hubbard 1, Blanca E Himes 1
PMCID: PMC10283087  PMID: 37350870

Abstract

Electronic health record (EHR)-derived data can be linked to geospatially distributed socioeconomic and environmental factors to conduct large-scale epidemiologic studies. Ambient NO2 is a known environmental risk factor for asthma. However, health exposure studies often rely on data from geographically sparse regulatory monitors that may not reflect true individual exposure. We contrasted use of interpolated NO2 regulatory monitor data with raw satellite measurements and satellite-derived ground estimates, building on previous work which has computed improved exposure estimates from remotely sensed data. Raw satellite and satellite-derived ground measurements captured spatial variation missed by interpolated ground monitor measurements. Multivariable analyses comparing these three NO2 measurement approaches (interpolated monitor, raw satellite, and satellite-derived) revealed a positive relationship between exposure and asthma exacerbations for both satellite measurements. Exposure-outcome relationships using the interpolated monitor NO2 were inconsistent with known relationships to asthma, suggesting that interpolated monitor data might yield misleading results in small region studies.

Introduction

Electronic health records (EHRs) have become invaluable to biomedical research since their use in clinical settings became nearly ubiquitous in the last decade1, due in part to their ability to provide low-cost, rich data for large patient populations at a scale that is inaccessible for traditional epidemiologic studies2. However, although EHRs capture clinical and demographic information on large and diverse patient populations, they lack important socioeconomic and environmental data that influence health. We and others have shown how EHR data can be useful in environmental and social epidemiology studies when its scope is expanded with external data on the physical, built, and social environment via linkage with patient addresses or other geographic information3–8. Air pollution, the largest environmental contributor to all-cause mortality, is estimated to account for one out of every nine deaths worldwide9 and is a prominent risk factor whose integration with EHR data can improve the conduct of research studies related to many health outcomes.

Health studies that include pollution data often rely on measures of six criteria pollutants obtained by the United States Environmental Protection Agency (EPA) at thousands of fixed, quality-controlled, regulatory monitors distributed throughout the country. Because these monitors are geographically sparse relative to the spatial variation desired for most local studies, subjects are matched to measures taken by the nearest monitor or assigned a value computed with a spatial smoothing method such as inverse-distance weighting interpolation (IDW), techniques that imprecisely reflect true exposure and introduce measurement error10–12. To overcome barriers related to sparse geographic coverage of regulatory monitors in a region, alternatives such as the use of low-cost, portable pollution sensors13 and measures captured by instruments deployed with satellites14,15 have been pursued. Although using wearable sensors makes possible the fine tracking of exposure throughout an individual’s day, this approach is difficult to implement at the scale needed for most EHR-based studies.

Nitrogen dioxide (NO2) is a gaseous toxicant that has been associated with respiratory disease, cardiovascular disease, and lung cancer16. Because it is a component of traffic-related air pollution (TRAP) with a short atmospheric lifetime, NO2 exhibits high intra-urban variability and is suitable for assessing fine-scale health risk. EPA regulatory monitors collect ground data on this criteria pollutant, offering valuable information on its changes over time and permitting comparison of its levels across broad geographic areas. The TROPOspheric Monitoring Instrument (TROPOMI) aboard the Sentinel-5 Precursor satellite has provided an alternative source of NO2 measures with unprecedented spatial resolution since its 2017 launch17. Whereas EPA in situ NO2 measurements are collected in parts per billion (ppb), TROPOMI measures NO2 vertical column density (VCD), that is, the number of NO2 molecules per unit area between Earth’s surface and the tropopause. Prior work has shown that TROPOMI NO2 VCD strongly correlates with EPA ground measurements18,19 and that this correlation can be improved further by using chemical transport models to estimate satellite-derived ground levels of NO220,21.

Asthma, a chronic disease characterized by inflammation and narrowing of the airways that affects 21 million adults in the United States22, is known to worsen with NO2 exposure23. Asthma management focuses on controlling symptoms and preventing exacerbations, episodes of worsening symptoms that require treatment with systemic steroids, via avoidance of patient-specific triggers and treatment with drugs24. Due to a complex combination of biological and social factors25, however, exacerbations remain common in some patients and are major contributors to asthma-related morbidity and mortality26,27. Efforts to reduce the impact of exacerbations using EHR-derived data include creating predictive models of asthma exacerbations28,29, deriving a computable asthma severity phenotype30, assessing the geospatial distribution of asthma risk factors6–8, and better understanding disease associations in real-life populations31–33. In addition to the contribution by pollution, exacerbation risk differs by race and ethnicity and has been linked to indicators of socioeconomic disadvantage, including low income34, housing instability35, and high crime rates36. Here, we sought to determine whether EPA monitor, raw satellite, or satellite-derived NO2 pollution measures were most suitable for the study of asthma exacerbation risk in Philadelphia.

Methods

Study population.

De-identified EHR data corresponding to Penn Medicine patient encounters from January 1, 2015 to April 8, 2021 was obtained from Penn Data Store (PDS), a clinical data warehouse, and utilized to identify subjects for a retrospective cohort analysis. Patient-level information included age at first encounter, sex, race and ethnicity, health insurance type, and geocodes corresponding to most recent patient address. Encounter-level information included dates, respiratory medication prescriptions, height, weight, smoking status, and all ICD-10 codes associated with the encounter. A years-followed variable for each patient was calculated by subtracting the date of the first encounter from that of the last. Patient age was grouped into four categories: 18-34, 35-54, 55-74, and 75+ years. Health insurance type was grouped as private, Medicare, and Medicaid, and patients with multiple health insurance types listed were assigned their most frequent type. Body mass index (BMI) was computed for each encounter and categorized as: not overweight or obese (<25.0 kg/m2), overweight (25.0 to <30.0 kg/m2), class 1 obese (30.0 to <35.0 kg/m2), class 2 obese (35.0 to <40.0 kg/m2), and class 3 obese (≥40.0 kg/m2). Height and weight entries outside of the four to seven foot and 80 to 700 pound ranges, respectively, were excluded to reduce bias from entry errors. Per-patient BMI was calculated as the mean of all available values. Smoking status was assigned based on frequency and recency of encounter-level responses, and was grouped into current, never, and history. Passive and past smoking were combined under history because of the small size of the passive level. Age, sex, race and ethnicity, BMI, health insurance type, and smoking status are hereinafter referred to as “EHR-derived variables.”
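The BMI handling described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: it assumes heights are recorded in inches and weights in pounds, treats the plausibility bounds as inclusive, and uses hypothetical function names. The category labels follow Table 1.

```python
from statistics import mean

LB_TO_KG = 0.45359237   # exact pound-to-kilogram conversion factor
IN_TO_M = 0.0254        # exact inch-to-metre conversion factor


def bmi_from_imperial(height_in, weight_lb):
    """Return BMI in kg/m^2, or None when an entry is outside the plausibility window.

    Assumption: bounds of 4 to 7 feet (48 to 84 inches) and 80 to 700 pounds
    are inclusive; the paper does not say how boundary values were handled.
    """
    if not (48 <= height_in <= 84):
        return None
    if not (80 <= weight_lb <= 700):
        return None
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2


def bmi_category(bmi):
    """Map a BMI value to the five categories used in the study."""
    if bmi < 25.0:
        return "not overweight or obese"
    if bmi < 30.0:
        return "overweight"
    if bmi < 35.0:
        return "class 1 obese"
    if bmi < 40.0:
        return "class 2 obese"
    return "class 3 obese"


def patient_bmi_category(encounters):
    """Average the plausible encounter-level BMIs for one patient, then categorize.

    encounters: iterable of (height_in, weight_lb) pairs (hypothetical layout).
    Returns None when no plausible measurement exists, which would make the
    patient's BMI covariate missing.
    """
    values = [b for b in (bmi_from_imperial(h, w) for h, w in encounters) if b is not None]
    if not values:
        return None
    return bmi_category(mean(values))
```

Averaging after filtering means a single implausible entry is dropped rather than distorting the per-patient mean.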

Inclusion criteria for the study were: (1) at least 18 years old at first encounter, (2) at least one instance of an asthma ICD-10 code (i.e., J45) in any encounter type, (3) at least one short-acting beta2-agonist (SABA) prescription, in accordance with United States national asthma management guidelines recommending SABA treatment for managing both intermittent and persistent asthma in individuals 12 and over37,38, (4) at least one year followed between January 2017 and December 2020, and (5) residence within the city of Philadelphia to maximize the likelihood that Penn Medicine was the main point of care for asthma. Exclusion criteria were: (1) presence of ICD-10 codes for chronic obstructive pulmonary disease (i.e., J41, J42, J43, J44) or cystic fibrosis (i.e., E84), and (2) incomplete covariates (i.e., missing BMI, insurance type, smoking status, or geocode).

Asthma exacerbations.

We defined our study period as January 1, 2017 to March 17, 2020, a time period that ends on the date when COVID-19 restrictions were put in place in Philadelphia, to minimize the effects of the pandemic on healthcare utilization. We identified asthma exacerbations occurring during this period as encounters that had (1) a primary asthma ICD-10 code (i.e., J45), and (2) at least one oral corticosteroid (OCS) prescription, criteria derived from national asthma management guidelines recommending OCS as part of the clinical course for mild, moderate, and severe exacerbations37,38. For encounters without primary asthma ICD-10 codes, nonprimary codes were used to determine exacerbations. Total exacerbations per patient over this study period were categorized as: 0, 1, 2-3, and 4+.
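A minimal Python sketch of this exacerbation definition follows. The encounter layout (a dict with `date`, `primary_icd10`, `other_icd10`, and `ocs_prescribed` keys) is hypothetical, the study end date is assumed to be inclusive, and the fallback to nonprimary codes is interpreted here as applying only when no primary diagnosis is recorded; the paper does not spell out these details.

```python
from datetime import date

STUDY_START = date(2017, 1, 1)
STUDY_END = date(2020, 3, 17)   # assumption: the end date itself is included


def has_asthma_code(encounter):
    """Check for an asthma (J45) code, preferring the primary diagnosis.

    Assumption: nonprimary codes are consulted only when the encounter has
    no primary diagnosis recorded.
    """
    primary = encounter.get("primary_icd10")
    if primary is not None:
        return primary.startswith("J45")
    return any(code.startswith("J45") for code in encounter.get("other_icd10", []))


def is_exacerbation(encounter):
    """An encounter counts when it is in the study period, is coded for asthma,
    and includes at least one oral corticosteroid prescription."""
    in_period = STUDY_START <= encounter["date"] <= STUDY_END
    return in_period and has_asthma_code(encounter) and bool(encounter["ocs_prescribed"])


def exacerbation_level(encounters):
    """Collapse a patient's exacerbation count into the four outcome levels."""
    count = sum(1 for e in encounters if is_exacerbation(e))
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    if count <= 3:
        return "2-3"
    return "4+"
```

The returned labels are the ordered outcome levels used later in the proportional odds models.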

Area Deprivation Index (ADI).

Socioeconomic disadvantage for each patient was summarized as the ADI, a single validated score that is based on American Community Survey data on education, income, housing, poverty, and employment to measure neighborhood disadvantage39. The 2018 ADI values for Pennsylvania were extracted from the Neighborhood Atlas on the census block level and converted to a raster40,41. ADI was assigned to each patient using bilinear interpolation from the values of the four nearest cells to their geocode using the R package terra42 and then grouped into four categories: 0-25, 25-50, 50-75, and 75-100.
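To make the bilinear interpolation step concrete, here is a small pure-Python sketch. It is not the terra implementation the authors used: it assumes a raster stored as a list of rows, with coordinates expressed in cell-center units (column `x`, row `y`), and ignores map projections and missing cells.

```python
import math


def bilinear(grid, x, y):
    """Interpolate a raster at a fractional position from its four nearest cells.

    grid: list of rows of numbers, at least 2 x 2 (assumed, no missing values).
    x, y: column and row position in cell-center units, so (0, 0) is the
          center of the first cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("grid must be at least 2 x 2")
    if not (0 <= x <= n_cols - 1 and 0 <= y <= n_rows - 1):
        raise ValueError("point lies outside the cell centers of the grid")

    # Index of the upper-left cell of the surrounding 2 x 2 block.
    x0 = min(int(math.floor(x)), n_cols - 2)
    y0 = min(int(math.floor(y)), n_rows - 2)
    tx, ty = x - x0, y - y0   # fractional offsets within the block

    # Interpolate along columns in the two rows, then between the rows.
    top = (1 - tx) * grid[y0][x0] + tx * grid[y0][x0 + 1]
    bottom = (1 - tx) * grid[y0 + 1][x0] + tx * grid[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom
```

A patient geocode that falls exactly between four cells receives the mean of those cells, while one sitting on a cell center receives that cell's value unchanged.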

Ambient NO2 Measures.

Three rasters of NO2 measures were obtained:

  1. Interpolated EPA NO2. This measure was derived from 2017-2019 averaged AirData annual concentrations by regulatory monitors in the southeastern Pennsylvania region43, including four NO2 monitors in Philadelphia44. Inverse distance weighting (IDW) was used to create a grid of interpolated values for the entire region at ~385m2 resolution. The grid was then rasterized and clipped to the Philadelphia bounding box region.

  2. Raw satellite NO2. This measure was derived from TROPOMI satellite observations on Google Earth Engine45, a cloud-based platform designed to make remote sensing analysis widely accessible. Observations for the Philadelphia bounding box region were downloaded using the rgee package46 with a resolution of ~1km2. Only 2019-averaged data was downloaded as 2019 is the first year for which Google Earth Engine had complete observations. Pixels with quality assurance values less than 0.75 were excluded.

  3. Satellite-derived ground NO2. This measure was downloaded from a resource reported in Cooper et al.47 and averaged over 2017-2019. The NO2 column densities were based on both TROPOMI and the Ozone Monitoring Instrument, a satellite instrument with more years of historical data but lower resolution21. Column densities were then converted to ground estimates using GEOS-Chem, a chemical transport model. The downloaded rasters had a resolution of ~1km2.
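The inverse distance weighting used for the first measure can be illustrated with a short Python sketch. The power parameter of 2 and the use of planar coordinates are assumptions made for illustration; the paper does not report which settings were used.

```python
import math


def idw(monitors, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    monitors: list of (x, y, value) tuples in a planar coordinate system
              (assumed; real monitor locations would first be projected).
    power:    distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mx, my, value in monitors:
        distance = math.hypot(x - mx, y - my)
        if distance == 0.0:
            # The query point coincides with a monitor: return its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Because every estimate is a weighted average of the monitor values, the surface can never exceed the highest monitor reading or fall below the lowest one. With only a handful of monitors in a city, a local peak located away from all of them cannot appear in the interpolated grid.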

These measures were assigned to each patient using bilinear interpolation from the values of the four cells nearest to their geocode using the R package terra42.

Comparison of NO2 Measures.

Monthly 2019 raw satellite NO2 measures were compared to monthly 2019 interpolated EPA data at the geocoordinates of three EPA regulatory monitors (i.e., Camden Spruce Street, Car-Barn Montgomery I-76, and Torresdale Station). A linear regression model was fitted for raw satellite NO2 versus interpolated EPA NO2, the Pearson correlation coefficient r was estimated, and points were joined with a linear smooth. Scatter plots of raw and interpolated EPA NO2 over 2019 at the coordinates of each regulatory monitor were smoothed using local polynomial regression.
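The correlation and linear fit described here reduce to a few sums. The sketch below is a plain-Python illustration, assuming the interpolated EPA values are passed as `x` and the raw satellite values as `y`; the function name is hypothetical and no study data are included.

```python
from math import sqrt


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for a least-squares line of y on x.

    x, y: equal-length sequences of paired monthly values.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
    slope = sxy / sxx                  # least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```

Since the two measures are in different units (ppb versus column density), the slope depends on those units, whereas r does not.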

Statistical analyses.

Chi-squared tests were used in bivariate analysis to assess relationships between patient characteristics (i.e., EHR-derived covariates and ADI) and asthma exacerbation level during the study period. Proportional odds logistic regression models were used in multivariable analysis to compute adjusted odds of exacerbation. All multivariable models included EHR-derived covariates and ADI as independent predictors. One NO2-unadjusted model contained no additional predictors. Three NO2-adjusted models included one of the three NO2 measurements of interest as an additional predictor. Ethnicity was not included in multivariable analysis since only 5.7% of patients identified as Hispanic. Stratified binomial models were used to test the proportional odds assumption that the relationship between all outcome levels is similar for each predictor. Statistical analyses were conducted using the R MASS package48.
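The proportional odds assumption can be made concrete with a small numerical sketch. It uses the parameterization logit P(Y ≤ k) = cutpoint_k − η, which is the form documented for `polr` in the R MASS package. The cutpoints and the coefficient below are hypothetical values chosen for illustration, not estimates from the study.

```python
from math import exp, log


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(cutpoints, eta):
    """Probabilities of each ordered outcome level given linear predictor eta.

    Model: logit P(Y <= k) = cutpoints[k] - eta, with increasing cutpoints.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


def odds_of_exceeding(probabilities, k):
    """Odds that the outcome falls in a level above index k."""
    upper = sum(probabilities[k + 1:])
    return upper / (1.0 - upper)


# Hypothetical cutpoints separating the levels 0 | 1 | 2-3 | 4+.
CUTPOINTS = [1.6, 2.9, 4.1]
# Hypothetical coefficient for a one-unit exposure increase (OR = 1.14).
BETA = log(1.14)

baseline = category_probabilities(CUTPOINTS, eta=0.0)
exposed = category_probabilities(CUTPOINTS, eta=BETA)

# Under proportional odds, this ratio is the same at every split point.
odds_ratios = [
    odds_of_exceeding(exposed, k) / odds_of_exceeding(baseline, k)
    for k in range(len(CUTPOINTS))
]
```

Each entry of `odds_ratios` equals exp(BETA), which is why a single OR per predictor summarizes the association across all exacerbation levels. The stratified binomial models mentioned above check whether that equality is plausible in the data.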

Geospatial analyses of asthma exacerbations.

Hotspots for risk of asthma exacerbation were identified using previously described methods and the MapGAM R package6,49. Briefly, generalized additive models (GAMs) were used to estimate local odds of exacerbation, which was releveled into a binary outcome variable, where controls were patients with no exacerbations and cases were patients with one or more exacerbations. The log odds of this outcome were estimated as a function of location while simultaneously adjusting for covariates. EHR-derived variables (excluding ethnicity), and ADI were included as predictors in four GAMs that differed as follows: one did not include additional variables, and three were adjusted using one NO2 measurement type each. A global test of the null hypothesis that odds of exacerbation were not spatially correlated was performed by permuting the assignment of cases and controls over patient geocodes 1000 times. Log odds were converted to local odds ratios (ORs) using the odds of exacerbation for the whole cohort as a reference. A distribution of log odds at each point across all permutations was used to define “hotspots” and “coldspots”: points ranking in the upper 2.5% were termed “hotspots”, and the lower 2.5% were “coldspots.”
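The permutation-based labeling of hotspots and coldspots can be sketched as follows. This is a simplified illustration of the ranking step only, not the MapGAM implementation: it assumes the log odds at a grid point have already been computed for the observed data and for each permuted data set, and the handling of ties and boundary ranks is an assumption.

```python
def permutation_rank(observed, permuted):
    """Fraction of permuted values that fall strictly below the observed value."""
    if not permuted:
        raise ValueError("at least one permuted value is required")
    return sum(1 for value in permuted if value < observed) / len(permuted)


def classify_point(observed, permuted, tail=0.025):
    """Label one grid point using the permutation distribution of its log odds.

    observed: log odds at the point from the real case/control assignment.
    permuted: log odds at the same point from each permuted assignment.
    tail:     proportion in each tail; 0.025 matches the 2.5% cutoffs described.
    """
    rank = permutation_rank(observed, permuted)
    if rank >= 1.0 - tail:
        return "hotspot"
    if rank <= tail:
        return "coldspot"
    return "neither"
```

With 1000 permutations, a point is labeled a hotspot under this sketch when at least 975 of the permuted log odds fall below the observed value.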

Results

Subject characteristics.

Our study population consisted of 16,744 people with asthma, 13,924 with no exacerbations during the study period and 2,820 with at least one exacerbation. Age, race, ethnicity, BMI, health insurance type, and ADI were associated with exacerbations according to bivariate analyses (p < 0.001) (Table 1). These relationships were consistent with known asthma disparities50: 65% of patients with no exacerbations were Black compared to 81% of patients with 4+ exacerbations, and 39% of patients with no exacerbations had Medicaid insurance compared to 53% of patients with 4+ exacerbations.

Table 1.

Patient characteristics by exacerbation count levels.

| Characteristic | Overall, N = 16,744¹ | 0 exacerbations, N = 13,924¹ | 1 exacerbation, N = 1,918¹ | 2-3 exacerbations, N = 639¹ | 4+ exacerbations, N = 263¹ | p-value² |
| --- | --- | --- | --- | --- | --- | --- |
| Patient Age | | | | | | <0.001 |
| 18-34 | 6,727 (40%) | 5,739 (41%) | 700 (36%) | 195 (31%) | 93 (35%) | |
| 35-54 | 5,871 (35%) | 4,752 (34%) | 744 (39%) | 275 (43%) | 100 (38%) | |
| 55-74 | 3,526 (21%) | 2,928 (21%) | 402 (21%) | 132 (21%) | 64 (24%) | |
| 75+ | 620 (3.7%) | 505 (3.6%) | 72 (3.8%) | 37 (5.8%) | 6 (2.3%) | |
| Sex | | | | | | 0.001 |
| Male | 3,893 (23%) | 3,264 (23%) | 412 (21%) | 134 (21%) | 83 (32%) | |
| Female | 12,851 (77%) | 10,660 (77%) | 1,506 (79%) | 505 (79%) | 180 (68%) | |
| Race | | | | | | <0.001 |
| White | 4,558 (27%) | 3,977 (29%) | 422 (22%) | 117 (18%) | 42 (16%) | |
| API | 450 (2.7%) | 386 (2.8%) | 43 (2.2%) | 18 (2.8%) | 3 (1.1%) | |
| Black | 11,131 (66%) | 9,042 (65%) | 1,395 (73%) | 480 (75%) | 214 (81%) | |
| Other/Unknown | 605 (3.6%) | 519 (3.7%) | 58 (3.0%) | 24 (3.8%) | 4 (1.5%) | |
| Ethnicity | | | | | | <0.001 |
| Non-Hispanic | 15,793 (94%) | 13,097 (94%) | 1,818 (95%) | 624 (98%) | 254 (97%) | |
| Hispanic | 951 (5.7%) | 827 (5.9%) | 100 (5.2%) | 15 (2.3%) | 9 (3.4%) | |
| BMI (kg/m2) | | | | | | <0.001 |
| Not overweight or obese | 3,526 (21%) | 3,034 (22%) | 360 (19%) | 88 (14%) | 44 (17%) | |
| Overweight | 4,232 (25%) | 3,538 (25%) | 479 (25%) | 149 (23%) | 66 (25%) | |
| Class 1 obese | 3,529 (21%) | 2,929 (21%) | 398 (21%) | 147 (23%) | 55 (21%) | |
| Class 2 obese | 2,488 (15%) | 2,051 (15%) | 301 (16%) | 100 (16%) | 36 (14%) | |
| Class 3 obese | 2,969 (18%) | 2,372 (17%) | 380 (20%) | 155 (24%) | 62 (24%) | |
| Health Insurance Type | | | | | | <0.001 |
| Private | 7,941 (47%) | 6,695 (48%) | 858 (45%) | 291 (46%) | 97 (37%) | |
| Medicaid | 6,645 (40%) | 5,443 (39%) | 807 (42%) | 255 (40%) | 140 (53%) | |
| Medicare | 2,158 (13%) | 1,786 (13%) | 253 (13%) | 93 (15%) | 26 (9.9%) | |
| Smoking History | | | | | | 0.004 |
| Never | 10,211 (61%) | 8,539 (61%) | 1,161 (61%) | 372 (58%) | 139 (53%) | |
| Current | 2,294 (14%) | 1,862 (13%) | 299 (16%) | 96 (15%) | 37 (14%) | |
| History | 4,239 (25%) | 3,523 (25%) | 458 (24%) | 171 (27%) | 87 (33%) | |
| ADI | | | | | | <0.001 |
| 1-24 | 2,437 (15%) | 2,100 (15%) | 244 (13%) | 70 (11%) | 23 (8.7%) | |
| 25-49 | 2,820 (17%) | 2,415 (17%) | 284 (15%) | 84 (13%) | 37 (14%) | |
| 50-74 | 3,207 (19%) | 2,650 (19%) | 361 (19%) | 137 (21%) | 59 (22%) | |
| 75-100 | 8,280 (49%) | 6,759 (49%) | 1,029 (54%) | 348 (54%) | 144 (55%) | |

¹ n (%). ² Pearson’s Chi-squared test.
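As a worked example of the Pearson's Chi-squared test behind the p-values in Table 1, the sketch below recomputes the statistic for the sex rows using the counts printed in the table. It is an independent illustration in plain Python rather than the authors' R code. The closed-form tail probability is specific to three degrees of freedom, which is what a 2 x 4 table yields.

```python
from math import erfc, exp, pi, sqrt


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


# Counts from Table 1: rows are Male, Female; columns are 0, 1, 2-3, 4+ exacerbations.
sex_by_level = [
    [3264, 412, 134, 83],
    [10660, 1506, 505, 180],
]

statistic, dof = chi_squared(sex_by_level)
p_value = chi2_sf_3dof(statistic)
```

The resulting `p_value` should land close to the 0.001 reported for sex in Table 1; most of the statistic comes from the higher share of male patients in the 4+ column.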

Comparison of ambient NO2 measures.

At the location of EPA regulatory monitors, EPA ground monitor measurements and raw satellite NO2 exhibited high correlation (r = 0.77) (Figure 1A). Descriptive spatio-temporal plots comparing EPA ground monitor measurements and raw satellite NO2 at three Philadelphia monitor locations indicated that raw TROPOMI observations were sensitive to fluctuations in ground-level NO2 and that both exhibited similar annual variability, validating raw satellite measurements for use in further analysis to represent exposures at the ground level (Figure 1B-D). However, the correlation between interpolated EPA and raw satellite measurements decayed as the distance from a monitor site increased. Interpolated EPA NO2 peaked in Northeast Philadelphia (Figure 2A), whereas raw satellite NO2 concentrations were highest in parts of West, South, and Southeast Philadelphia (Figure 2B). Satellite-derived NO2 concentrations closely matched raw satellite measurements but displayed more overall heterogeneity (Figure 2C).

Figure 1.

Comparison of raw satellite and regulatory monitor NO2 measures. 2019 monthly averaged TROPOMI observations were compared to 2019 monthly averaged EPA measurements at three Philadelphia regulatory monitors. TROPOMI monthly averages were extracted from the pixel in which each monitor was located. A) Scatter plot for all three Philadelphia monitors combined with a linear smooth (r = 0.77). Descriptive time plots for monitors at B) Camden Spruce Street, C) Car-Barn Montgomery I-76, and D) Torresdale Station.

Figure 2.

The geospatial distribution of the three NO2 measurement types considered. A) 2017-2019 EPA AirData computed using IDW interpolation at ~385m2 resolution. B) 2019 raw TROPOMI Google Earth Engine data at ~1km2 resolution. C) 2017-2019 ground-level data derived from TROPOMI and OMI satellites at ~1km2 resolution, as presented in Cooper et al. (2021).

Factors associated with asthma exacerbations.

In each multivariable model, all age categories (p<0.001), Black race (p<0.001), and class 3 obesity (p<0.001) were significantly associated with being in a higher exacerbation category (Table 2). Sex, health insurance type, smoking history, and ADI were not significant in any model. Addition of each of the NO2 measurements to the model resulted in a significant association with exacerbations, but raw satellite and satellite-derived NO2 conferred an increased risk (OR = 1.08, 95% CI [1.02, 1.15] and OR = 1.14, 95% CI [1.05, 1.24], respectively) while interpolated EPA NO2 was associated with decreased risk (OR = 0.94, 95% CI [0.91, 0.97]).
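The reported ORs and confidence intervals can be related back to the significance levels with a quick back-calculation. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log-odds scale. This is an approximation for illustration, since the paper does not state how the intervals were computed, and the inputs are already rounded to two decimals.

```python
from math import erfc, log, sqrt

Z_95 = 1.959964   # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, lower, upper):
    """Recover coefficient, standard error, z statistic, and two-sided p-value.

    Assumption: the 95% CI is exp(beta -/+ Z_95 * se), i.e. a Wald interval.
    """
    beta = log(odds_ratio)
    se = (log(upper) - log(lower)) / (2.0 * Z_95)
    z = beta / se
    p_value = erfc(abs(z) / sqrt(2.0))   # two-sided normal tail probability
    return beta, se, z, p_value


# ORs and 95% CIs as reported in the text for the three NO2 measures.
reported = {
    "interpolated EPA": (0.94, 0.91, 0.97),
    "raw satellite": (1.08, 1.02, 1.15),
    "satellite-derived": (1.14, 1.05, 1.24),
}

results = {name: wald_from_ci(*values) for name, values in reported.items()}
```

Under this approximation, the negative coefficient for the interpolated EPA measure has the largest absolute z statistic of the three, which illustrates that a tight interval does not by itself indicate a plausible direction of effect.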

Table 2.

Factors associated with asthma exacerbations. ORs were computed from proportional odds models with exacerbation counts as the outcome. All models included EHR-derived covariates and ADI as independent variables. NO2-adjusted models additionally included EPA interpolated, raw satellite, or satellite-derived NO2 measures. Adjusted ORs and 95% confidence intervals (CIs) are shown. ***p<0.001, **p<0.01, *p<0.05

| Characteristic | NO2-Unadjusted Model | Model with EPA-Interpolated NO2 | Model with Raw Satellite NO2 | Model with Satellite-Derived NO2 |
| --- | --- | --- | --- | --- |
| Patient Age | | | | |
| 18-34 | Reference | Reference | Reference | Reference |
| 35-54 | 1.36 (1.24, 1.5)*** | 1.36 (1.24, 1.5)*** | 1.37 (1.24, 1.51)*** | 1.37 (1.24, 1.51)*** |
| 55-74 | 1.24 (1.1, 1.4)*** | 1.23 (1.09, 1.39)*** | 1.25 (1.11, 1.42)*** | 1.25 (1.11, 1.42)*** |
| 75+ | 1.58 (1.22, 2.03)*** | 1.54 (1.2, 1.99)*** | 1.59 (1.23, 2.05)*** | 1.59 (1.24, 2.05)*** |
| Sex | | | | |
| Male | Reference | Reference | Reference | Reference |
| Female | 0.99 (0.9, 1.1) | 1 (0.9, 1.1) | 1 (0.9, 1.1) | 1 (0.9, 1.1) |
| Race | | | | |
| White | Reference | Reference | Reference | Reference |
| API | 1.18 (0.89, 1.56) | 1.15 (0.87, 1.52) | 1.17 (0.89, 1.55) | 1.17 (0.88, 1.55) |
| Black | 1.47 (1.3, 1.65)*** | 1.42 (1.25, 1.6)*** | 1.5 (1.33, 1.7)*** | 1.52 (1.34, 1.71)*** |
| Other/Unknown | 1.12 (0.87, 1.43) | 1.11 (0.87, 1.42) | 1.15 (0.9, 1.47) | 1.15 (0.9, 1.48) |
| BMI (kg/m2) | | | | |
| Not Overweight or Obese | Reference | Reference | Reference | Reference |
| Overweight | 1.14 (1, 1.29)* | 1.14 (1.01, 1.3)* | 1.14 (1.01, 1.3)* | 1.14 (1.01, 1.3)* |
| Class 1 Obese | 1.12 (0.98, 1.28) | 1.13 (0.99, 1.3) | 1.13 (0.99, 1.29) | 1.13 (0.99, 1.29) |
| Class 2 Obese | 1.13 (0.98, 1.31) | 1.14 (0.98, 1.32) | 1.13 (0.98, 1.31) | 1.14 (0.98, 1.31) |
| Class 3 Obese | 1.32 (1.15, 1.52)*** | 1.33 (1.16, 1.53)*** | 1.32 (1.15, 1.52)*** | 1.32 (1.15, 1.52)*** |
| Health Insurance Type | | | | |
| Private | Reference | Reference | Reference | Reference |
| Medicaid | 1.05 (0.95, 1.15) | 1.05 (0.95, 1.16) | 1.03 (0.94, 1.14) | 1.03 (0.93, 1.14) |
| Medicare | 0.92 (0.79, 1.07) | 0.92 (0.79, 1.08) | 0.91 (0.78, 1.06) | 0.91 (0.78, 1.06) |
| Smoking History | | | | |
| Never | Reference | Reference | Reference | Reference |
| Current | 1.07 (0.95, 1.21) | 1.06 (0.93, 1.2) | 1.06 (0.94, 1.2) | 1.06 (0.94, 1.2) |
| History | 1 (0.9, 1.1) | 0.99 (0.9, 1.1) | 1 (0.9, 1.1) | 0.99 (0.9, 1.1) |
| ADI | | | | |
| 1-24 | Reference | Reference | Reference | Reference |
| 25-49 | 0.95 (0.81, 1.12) | 0.97 (0.83, 1.14) | 1 (0.85, 1.18) | 1.02 (0.86, 1.2) |
| 50-74 | 1.05 (0.9, 1.24) | 1.11 (0.94, 1.3) | 1.12 (0.95, 1.32) | 1.15 (0.97, 1.36) |
| 75-100 | 1.04 (0.89, 1.21) | 1.08 (0.93, 1.26) | 1.09 (0.93, 1.28) | 1.13 (0.96, 1.33) |
| Interpolated EPA NO2 | | 0.94 (0.91, 0.97)*** | | |
| Raw Satellite NO2 | | | 1.08 (1.02, 1.15)* | |
| Satellite-Derived NO2 | | | | 1.14 (1.05, 1.24)** |

Geospatial distribution of asthma exacerbation risk.

GAM analysis adjusting for all EHR-derived covariates and ADI resulted in a significant global test statistic (p < 0.001) rejecting the null hypothesis of no spatial trend in asthma exacerbations (Figure 3A). The ORs for this GAM showed a single hotspot capturing parts of West Philadelphia and Southeast Philadelphia (Figure 3A). Three additional NO2-adjusted GAMs including all EHR-derived covariates, ADI, and one of the three NO2 measurements of interest also resulted in significant global test statistics (p < 0.001), indicating that spatial correlation persists after adjusting for NO2. However, the hotspots and coldspots for each model differed from each other and from those of the model without NO2. The GAM that included EPA interpolated NO2 had less spatial variation and the size of hotspots diminished substantially (Figure 3B). The coefficient for NO2 was negative, indicating an inverse association between exacerbations and exposure. The GAM that included raw satellite NO2 had a single hotspot nearly identical to that of the GAM that did not include an NO2 term (Figure 3C), suggesting that the spatial variation was unexplained by the inclusion of raw satellite NO2. The GAM that included satellite-derived NO2 had similar hotspots as those of the GAM that did not include NO2 but with markedly reduced size (Figure 3D), indicating that this measure of NO2 partially explained the spatial variation in exacerbations.

Figure 3.

Spatial odds ratios (ORs) of exacerbation before and after integrating NO2 data. Significant hotspots and coldspots (p < 0.001) are indicated by black contour lines. A) NO2-unadjusted model included EHR-derived covariates and ADI. NO2-adjusted models included all covariates and B) interpolated EPA NO2, C) raw TROPOMI NO2, and D) satellite-derived ground NO2.

Discussion

The characteristics of people with more asthma exacerbations in the adjusted proportional odds models (Black race and class 3 obesity) are known asthma exacerbation risk factors6. In contrast, we did not find that sex, health insurance type, smoking history, or ADI were significant predictors of asthma exacerbations in our adjusted models, despite reports suggesting that they play a role6 but consistent with our previous observations using EHR data32. The geospatial distribution of asthma exacerbations based on EHR data and ADI was found to be heterogeneous, with hotspots in West and South Philadelphia, regions of the city with a high proportion of residents who are Black and/or of low socioeconomic status.

Our results comparing ambient NO2 measurement types in Philadelphia found high correlation across time in measures taken at the location of EPA regulatory monitors, suggesting that EPA and satellite measures are consistent. However, review of interpolated EPA data versus that provided by satellite measures showed that regional variation was inadequately captured by the EPA measures. For example, examination of Figure 2 shows that EPA monitors failed to detect the largest NO2 hotspot in the city near Southeast Philadelphia. Further, the statistically significant inverse relationship observed between interpolated EPA NO2 exposure and asthma exacerbations in the proportional odds and spatial GAM models raises questions about the adequacy of interpolated EPA data to predict health risk because the inverse relationship is inconsistent with known associations between NO2 and asthma risk. Associations between the raw satellite NO2 and satellite-derived ground NO2 measures with asthma exacerbations, on the other hand, were consistent with the known relationship between NO2 and exacerbation risk.

Comparison of raw satellite data versus satellite-derived ground NO2 levels showed that although the two measures were similar, the ground level estimates were more heterogeneous. However, we are limited in our ability to compare these two satellite measurements because they were based on different averaging periods (2019 for the raw satellite and 2017-2019 for the satellite-derived estimates) due to the limited availability of the relatively new TROPOMI instrument. The satellite-derived ground NO2 estimates introduced by work in Cooper et al. were originally validated using annual averages at 3,977 monitor sites worldwide and found to be consistent with in situ observations (r = 0.71)21. These measures provide improved exposure estimates because they address the limitations of both regulatory monitor and raw satellite measurements: poor spatial resolution and limited ability to relate vertical column density (VCD) to real in situ exposures, respectively. In our association models, satellite-derived NO2 had a larger odds ratio than the raw satellite NO2 (1.14 versus 1.08, respectively) for contributing to asthma exacerbations. In the spatial models, inclusion of satellite-derived NO2 reduced the size and effect of exacerbation hotspots, whereas inclusion of the raw satellite NO2 increased the size of the exacerbation hotspot while increasing its odds ratio slightly. As more TROPOMI measures become available, researchers studying time periods 2019 or later will be able to easily download and average multiple years of data. This will allow for detailed comparisons between raw satellite observations and satellite-derived estimates for use in exposure assessment studies. In addition, as other satellite-based measures of pollution besides NO2 become available, these will permit the comparison of contributions of other pollutants.

Our study is subject to limitations, including some related to use of EHR data: missingness, entry error, and phenotype misclassification resulting from the criteria used to classify asthma and asthma exacerbations. Penn Medicine is not the sole provider of care in Philadelphia, and hence, our patient density is greatest near Penn Medicine facilities. Although the GAM accounts for spatial variation in observation density, residual confounding may still be present given the uneven distribution of patients in our study area. Finally, the geocodes we used were the most current at the time of the data pull, but they may not have represented place of residence during the entire study period, nor does the location of residence enable a full assessment of environmental exposures. In future studies, our approach can be extended to include pediatric patients and to consider changes during the COVID-19 pandemic.

Conclusion

By comparing three publicly available ambient NO2 measurement types, we found that satellite data captured more spatial variability than EPA regulatory monitor data for local health studies, and that inclusion of ground-level satellite estimates of NO2 into multivariable models of asthma exacerbations revealed a significant association between NO2 exposure and asthma exacerbations in Philadelphia. In contrast, use of spatially smoothed EPA regulatory monitor data might yield misleading results when used in studies of small regions. Our findings demonstrate the potential of satellite-based pollution measurements to gain insights on local disease patterns, especially as these data become more widely available.

Acknowledgements

We would like to thank Selah Lynch from the University of Pennsylvania Penn Data Store for extracting the EHR data used for this project. Research reported in this publication was supported by the National Institutes of Health National Heart, Lung, and Blood Institute under Award Numbers R01HL162354 and R01HL133433 and the National Institute of Environmental Health Sciences under Award Numbers P30ES013508 and R25ES021649. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

  • 1.Henry J, Pylypchuk Y, Searcy T, Patel V. Adoption of electronic health record systems among U.S. non-federal acute care hospitals: 2008-2015 [Internet] Office of the National Coordinator for Health Information Technology. 2016 May [cited 2022 Jul 27]. (ONC Data Brief). Report No.: 35. Available from: https://www.healthit.gov/data/data-briefs/adoption-electronic-health-record-systems-among-us-non-federal-acute-care-1.
  • 2.Mooney SJ, Westreich DJ, El-Sayed AM. Commentary: Epidemiology in the era of big data. Epidemiology. 2015 May;26(3):390–4. doi: 10.1097/EDE.0000000000000274.
  • 3.Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: A review of methods and applications. Annu Rev Public Health. 2016 Mar;37:61–81. doi: 10.1146/annurev-publhealth-032315-021353.
  • 4.Gabert R, Thomson B, Gakidou E, Roth G. Identifying high-risk neighborhoods using electronic medical records: A population-based approach for targeting diabetes prevention and treatment interventions. PLoS ONE. 2016 Jul 27;11(7):e0159227. doi: 10.1371/journal.pone.0159227.
  • 5.Al Sallakh MA, Vasileiou E, Rodgers SE, Lyons RA, Sheikh A, Davies GA. Defining asthma and assessing asthma outcomes using electronic health record data: A systematic scoping review. Eur Respir J. 2017 Jun;49:1700204. doi: 10.1183/13993003.00204-2017.
  • 6.Xie S, Greenblatt R, Levy MZ, Himes BE. Enhancing electronic health record data with geospatial information. AMIA Summits Transl Sci Proc. 2017 Jul 26;2017:123–32.
  • 7.Xie S, Himes BE. Approaches to link geospatially varying social, economic, and environmental factors with electronic health record data to better understand asthma exacerbations. AMIA Annu Symp Proc. 2018 Dec 5;2018:1561–70.
  • 8.Bozigar M, Connolly CL, Legler A, et al. In-home environmental exposures predicted from geospatial characteristics of the built environment and electronic health records of children with asthma. Ann Epidemiol. 2022 Sep;73:38–47. doi: 10.1016/j.annepidem.2022.06.034.
  • 9.World Health Organization. Ambient air pollution: A global assessment of exposure and burden of disease [Internet] World Health Organization. 2016. [cited 2022 Aug 1]. Available from: https://apps.who.int/iris/handle/10665/250141.
  • 10.Kioumourtzoglou MA, Spiegelman D, Szpiro AA, et al. Exposure measurement error in PM2.5 health effects studies: A pooled analysis of eight personal exposure validation studies. Environ Health. 2014 Jan 13;13(1):2. doi: 10.1186/1476-069X-13-2.
  • 11.Liao X, Zhou X, Wang M, Hart JE, Laden F, Spiegelman D. Survival analysis with functions of mismeasured covariate histories: The case of chronic air pollution exposure in relation to mortality in the nurses’ health study. J R Stat Soc Ser C Appl Stat. 2018;67(2):307–27. doi: 10.1111/rssc.12229.
  • 12.Greenblatt RE, Himes BE. Facilitating inclusion of geocoded pollution data into health studies. AMIA Summits Transl Sci Proc. 2019 May 6;2019:553–61.
  • 13.Christie C, Xie S, Diwadkar AR, Greenblatt RE, Rizaldi A, Himes BE. Consolidated environmental and social data facilitates neighborhood-level health studies in Philadelphia. AMIA Annu Symp Proc. 2022 Feb 21;2021:305–13.
  • 14.Kloog I, Melly SJ, Ridgway WL, Coull BA, Schwartz J. Using new satellite based exposure methods to study the association between pregnancy PM2.5 exposure, premature birth and birth weight in Massachusetts. Environ Health. 2012 Jun 18;11:40. doi: 10.1186/1476-069X-11-40.
  • 15.Hoek G. Methods for assessing long-term exposures to outdoor air pollutants. Curr Environ Health Rep. 2017 Dec 1;4(4):450–62. doi: 10.1007/s40572-017-0169-5.
  • 16.Atkinson RW, Butland BK, Anderson HR, Maynard RL. Long-term concentrations of nitrogen dioxide and mortality: A meta-analysis of cohort studies. Epidemiology. 2018 Jul;29(4):460–72. doi: 10.1097/EDE.0000000000000847.
  • 17.Veefkind JP, Aben I, McMullan K, et al. TROPOMI on the ESA Sentinel-5 Precursor: A GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications. Remote Sens Environ. 2012 May 15;120:70–83.
  • 18.Griffin D, Zhao X, McLinden CA, et al. High-resolution mapping of nitrogen dioxide with TROPOMI: first results and validation over the Canadian oil sands. Geophys Res Lett. 2018 Dec 28;46(2):1049–60. doi: 10.1029/2018GL081095.
  • 19.Ialongo I, Virta H, Eskes H, Hovila J, Douros J. Comparison of TROPOMI/Sentinel-5 Precursor NO2 observations with ground-based measurements in Helsinki. Atmospheric Meas Tech. 2020 Jan 16;13(1):205–18.
  • 20.Di Q, Amini H, Shi L, et al. Assessing NO2 concentration and model uncertainty with high spatiotemporal resolution across the contiguous United States using ensemble model averaging. Environ Sci Technol. 2020;54(3):1372–84. doi: 10.1021/acs.est.9b03358.
  • 21.Cooper MJ, Martin RV, Hammer MS, et al. Global fine-scale changes in ambient NO2 during COVID-19 lockdowns. Nature. 2022 Jan;601(7893):380–7. doi: 10.1038/s41586-021-04229-0.
  • 22.Centers for Disease Control and Prevention. 2020 National Health Interview Survey (NHIS) data [Internet] [cited 2022 Jul 25]. Available from: https://www.cdc.gov/asthma/nhis/2020/data.htm.
  • 23.Dominici F, Zanobetti A, Schwartz J, Braun D, Sabath B, Wu X. Assessing adverse health effects of long-term exposure to low levels of ambient air pollution: Implementation of causal inference methods [Internet] Health Effects Institute. 2022 [cited 2022 Jul 27]. Report No.: 211. Available from: https://www.healtheffects.org/publication/assessing-adverse-health-effects-long-term-exposure-low-levels-ambient-air-pollution-0.
  • 24.Bateman ED, Hurd SS, Barnes PJ, et al. Global strategy for asthma management and prevention: GINA executive summary. Eur Respir J. 2008 Jan;31(1):143–78. doi: 10.1183/09031936.00138707.
  • 25.Braido F. Failure in asthma control: Reasons and consequences. Scientifica. 2013 Dec 18;2013:e549252. doi: 10.1155/2013/549252.
  • 26.Krishnan V, Diette GB, Rand CS, et al. Mortality in patients hospitalized for asthma exacerbations in the United States. Am J Respir Crit Care Med. 2006;174(6):633–8. doi: 10.1164/rccm.200601-007OC.
  • 27.Sykes A, Johnston SL. Etiology of asthma exacerbations. J Allergy Clin Immunol. 2008 Oct;122(4):685–8. doi: 10.1016/j.jaci.2008.08.017.
  • 28.Eggleston EM, Weitzman ER. Innovative uses of electronic health records and social media for public health surveillance. Curr Diab Rep. 2014 Mar;14(3):468. doi: 10.1007/s11892-013-0468-7.
  • 29.Martin A, Bauer V, Datta A, et al. Development and validation of an asthma exacerbation prediction model using electronic health record (EHR) data. J Asthma. 2020;57(12):1339–46. doi: 10.1080/02770903.2019.1648505.
  • 30.Peer K, Adams WG, Legler A, et al. Developing and evaluating a pediatric asthma severity computable phenotype derived from electronic health records. J Allergy Clin Immunol. 2021 Jun 1;147(6):2162–70. doi: 10.1016/j.jaci.2020.11.045.
  • 31.Tarabichi Y, Goyden J, Liu R, Lewis S, Sudano J, Kaelber DC. A step closer to nationwide electronic health record–based chronic disease surveillance: Characterizing asthma prevalence and emergency department utilization from 100 million patient records through a novel multisite collaboration. J Am Med Inform Assoc JAMIA. 2020 Jan;27(1):127–35. doi: 10.1093/jamia/ocz172.
  • 32.Greenblatt RE, Zhao EJ, Henrickson SE, Apter AJ, Hubbard RA, Himes BE. Factors associated with exacerbations among adults with asthma according to electronic health record data. Asthma Res Pract. 2019;5:1. doi: 10.1186/s40733-019-0048-y.
  • 33.Klompas M, Cocoros NM, Menchaca JT, et al. State and local chronic disease surveillance using electronic health record systems. Am J Public Health. 2017 Sep 1;107(9):1406–12. doi: 10.2105/AJPH.2017.303874.
  • 34.Cardet JC, Louisias M, King TS, et al. Income is an independent risk factor for worse asthma outcomes. J Allergy Clin Immunol. 2017 May 20;141(2):754–760.e3. doi: 10.1016/j.jaci.2017.04.036.
  • 35.Bryant-Stephens TC, Strane D, Robinson EK, Bhambhani S, Kenyon CC. Housing and asthma disparities. J Allergy Clin Immunol. 2021 Nov;148(5):1121–9. doi: 10.1016/j.jaci.2021.09.023.
  • 36.Wright RJ, Steinbach SF. Violence: An unrecognized environmental exposure that may contribute to greater asthma morbidity in high risk inner-city populations. Environ Health Perspect. 2001 Oct;109(10):1085–9. doi: 10.1289/ehp.011091085.
  • 37.Cloutier MM, Baptist AP, Blake KV, et al. 2020 focused updates to the Asthma Management Guidelines: A report from the National Asthma Education and Prevention Program Coordinating Committee Expert Panel Working Group. J Allergy Clin Immunol. 2020 Dec 1;146(6):1217–70. doi: 10.1016/j.jaci.2020.10.003.
  • 38.Expert Panel Report 3 (EPR-3): Guidelines for the diagnosis and management of asthma--Summary Report 2007. J Allergy Clin Immunol. 2007 Nov;120(5):S94–138. doi: 10.1016/j.jaci.2007.09.043.
  • 39.Singh GK. Area deprivation and widening inequalities in US mortality, 1969–1998. Am J Public Health. 2003 Jul;93(7):1137–43. doi: 10.2105/ajph.93.7.1137.
  • 40.University of Wisconsin School of Medicine Public Health. 2018 Area Deprivation Index v3.1. Available from: https://www.neighborhoodatlas.medicine.wisc.edu/. Accessed July 2022.
  • 41.Kind AJH, Buckingham WR. Making neighborhood-disadvantage metrics accessible — the neighborhood atlas. N Engl J Med. 2018 Jun 28;378(26):2456–8. doi: 10.1056/NEJMp1802313.
  • 42.Hijmans RJ, Bivand R, Forner K, Ooms J, Pebesma E, Sumner MD. Spatial data science with R and “terra” [Internet] 2022 Available from: https://rspatial.org/terra/
  • 43.United States Environmental Protection Agency. Pre-generated data files [Internet] United States Environmental Protection Agency. [cited 2022 Jul 1]. Available from: https://aqs.epa.gov/aqsweb/airdata/download_files.html.
  • 44.City of Philadelphia Department of Public Health Air Management Services. 2021-2022 Air Monitoring Network Plan. 2021.
  • 45.Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens Environ. 2017 Dec 1;202:18–27.
  • 46.Aybar C, Wu Q, Bautista L, Yali R, Barja A. rgee: An R package for interacting with Google Earth Engine. J Open Source Softw. 2020;5(51):2272.
  • 47.Cooper M. Satellite-derived ground level NO2 concentrations, 2005-2019 (Version v1) [Data set] [Internet] Zenodo. 2022. doi: 10.5281/zenodo.5424752.
  • 48.Ripley B, Venables B, Bates DM, Hornik K, Gebhardt A, Firth D. MASS: support functions and datasets for Venables and Ripley’s MASS [Internet] 2022 [cited 2022 Jul 20]. Available from: https://CRAN.R-project.org/package=MASS.
  • 49.Bai L, Bartell S, Bliss R, Vieira V. Mapping smoothed effect estimates from individual-level spatial data [Internet] 2022 Available from: https://search.r-project.org/CRAN/refmans/MapGAM/html/MapGAM-package.html.
  • 50.Moorman JE, Akinbami LJ, Bailey CM, et al. National surveillance of asthma: United States, 2001-2010. Vital Health Stat 3. 2012 Nov;35:1–58.

Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of American Medical Informatics Association