Abstract
Despite a multi-decade decrease in cardiovascular disease, geographic disparities have widened, with excess mortality concentrated within the United States (U.S.) South. Petroleum production and refining, a major contributor to climate change, is concentrated within the U.S. South and emits multiple classes of atherogenic pollutants. We investigated whether residential exposure to oil refineries could explain variation in self-reported coronary heart disease (CHD) prevalence in 2018 among adults in seven southern states where the majority of oil refinery activity occurs (Alabama, Mississippi, Louisiana, Arkansas, Texas, New Mexico, and Oklahoma). We examined the census tract-level association between oil refineries and CHD prevalence. We used a double matching method to adjust for measured and unmeasured spatial confounders: one-to-n distance matching and one-to-one generalized propensity score matching. Exposure metrics were constructed based on proximity to refineries, activities of refineries, and wind speed/direction. For all census tracts within 10km of refineries, self-reported CHD prevalence ranged from 1.2% to 17.6%. Compared to census tracts located at ≥5km and <10km, a one standard deviation increase in exposure within 5km of refineries was associated with a 0.33 (95% confidence interval: 0.04, 0.63) percentage point increase in the prevalence. A total of 1119.0 (123.5, 2114.2) prevalent cases or 1.6% (0.2, 3.1) of CHD prevalence in areas within 5km from refineries were potentially explained by exposure to oil refineries. At the census tract level, the prevalence of CHD explained by exposure to oil refineries ranged from 0.02% (0.00, 0.05) to 47.4% (5.2, 89.5). Thus, although we cannot rule out potential confounding by other personal risk factors, CHD prevalence was found to be higher in populations living nearer to oil refineries, which may suggest that exposure to oil refineries can increase CHD risk, warranting further investigation.
Keywords: Oil refineries, Oil Industry, Cardiovascular diseases, Epidemiology, Environmental Health, Small area variation, Climate change
1. Introduction
Despite a sustained decline in cardiovascular disease over the past several decades, geographic disparities have widened as excess mortality has shifted from the United States Northeast and Mid-Atlantic to the South (Casper et al. 2016; Singh et al. 2015). As geographic disparities have increased, the rate of decline in cardiovascular mortality has slowed and may be reversing in some geographic regions. Indeed, the rates of decline in cardiovascular mortality (9–50%) within southern counties have lagged their geographic counterparts (64–83%) between the late-twentieth and early-twenty-first centuries (Casper et al. 2016).
The mechanisms underlying geographic disparities in cardiovascular outcomes are not fully understood. Although traditional vascular risk factors and socioeconomic conditions explain much of cardiovascular disease prevalence, these characteristics do not fully explain geographic variability (Gebreab et al. 2015; Glynn et al. 2021). Within the United States (U.S.) South, Mississippi has experienced a disproportionate share of age-adjusted mortality due to heart failure and has had among the slowest rates of decline in all-cause cardiovascular mortality (Casper et al. 2016; Glynn et al. 2021). The spatial patterns of cardiovascular outcomes in the U.S. South resemble those observed for stroke outcomes, often referred to as the Stroke Belt, which are likewise not fully elucidated (Cushman et al. 2008; Glymour et al. 2007; Howard 1999; Howard and Howard 2020; Howard et al. 2016; Karp et al. 2016; Liao et al. 2009). Within the South, sizable disparities in county-level cardiovascular mortality have prompted recommendations for small-area surveillance of cardiovascular disease to elucidate the potential explanatory mechanisms (Casper et al. 2016).
A body of evidence shows that environmental exposures such as particulate air pollution have strong sociodemographic and geographic patterning (Bell and Ebisu 2012; Bravo et al. 2016; Hajat et al. 2013; Jbaily et al. 2022; Jones et al. 2014; Liu et al. 2021) and that they are associated with increased cardiovascular risk (Brook et al. 2010; Rajagopalan et al. 2018; Rajagopalan and Landrigan 2021). Together this evidence suggests that environmental exposures may contribute to geographic disparities in cardiovascular outcomes.
To our knowledge, residential exposure to oil refineries has not previously been evaluated as a contextual risk factor for coronary heart disease (CHD). Roughly two-thirds of petroleum production and refining in the U.S. occurs within the third Petroleum Administration for Defense District (PADD-3). PADD-3 consists of six southern states (Alabama, Arkansas, Louisiana, Mississippi, New Mexico, and Texas) (US EIA 2023). While oil refining is the second-highest-ranked sector in the U.S. in greenhouse gas emissions per facility (1.15 million metric tons of carbon dioxide equivalent for the year 2020) (US EPA 2021), oil refineries also emit multiple airborne pollutants that have been implicated in cardiovascular pathogenesis, including sulfur dioxide (SO2), particulate matter (PM), nitrogen oxides (NOx), volatile organic compounds (VOCs), and polycyclic aromatic hydrocarbons (PAH) (Adebiyi 2022). Byproducts of petroleum production and refining may also contaminate adjacent soil and potable water in residential areas (Damian 2013; Lynch et al. 2004), which may independently increase the risk of cardiovascular diseases. Thus, a mixture of environmental hazards from oil refineries may be related to CHD pathogenesis.
Therefore, we evaluated whether residential exposure to oil refineries could explain geographic variation in CHD within the U.S. South.
2. Materials and Methods
Health outcome
Our study population includes adults (≥18 years) living in seven states (Alabama, Mississippi, Louisiana, Arkansas, Texas, New Mexico, Oklahoma), where CHD is disproportionately concentrated (Casper et al. 2016; Singh et al. 2015), residing within approximately 10km of refineries. We restricted our study population to residents of the census tracts within approximately 10km of refineries because residents living in close proximity to refineries may be exposed to refinery-related pollution. Additionally, we needed to account for unmeasured spatial confounders, as outlined in the Statistical analysis subsection. The health outcome of interest in this study is census tract-level self-reported CHD prevalence for the year 2018. We obtained this from the Centers for Disease Control and Prevention (CDC)’s Population Level Analysis and Community Estimates (PLACES) (Greenlund et al. 2022). Self-reported CHD prevalence was defined as the percentage of respondents aged ≥18 years who reported ever having been told by a doctor, nurse, or other health professional that they have had angina or CHD. The CDC PLACES estimates are small-area estimates constructed using statistical approaches and the Behavioral Risk Factor Surveillance System (BRFSS), which samples from a noninstitutionalized population. BRFSS is a telephone-based survey, and samples were collected separately by each state. Except for Guam, Puerto Rico, and the U.S. Virgin Islands, the sampling design was a disproportionate stratified sample design, which used random digit dialed probability samples for those aged 18 years or older. The population-weighted average of the prevalence in the seven states was 7.0%, while that of the prevalence in the U.S. was 6.4%.
Exposure classification
Exposure of interest 1: the proximity to refineries
The geocoded locations of 59 oil refineries for the period 2015–2017 were obtained from the U.S. Energy Information Administration (EIA). Because residents living nearer to oil refineries may be exposed to higher levels of environmental pollution, we considered distance from refineries to be an important determinant of exposure. We assumed that proximity to refineries might be related to many plausible exposure pathways (Figure 1) (Adebiyi 2022; Damian 2013; Lynch et al. 2004), chiefly airborne pollutant mixtures.
Figure 1.

A conceptual framework for the potential relationship between oil refineries and cardiovascular risk (US EIA 2023; US EPA 2015; Garcia-Gonzales et al. 2019; CEIP 2023). The hypotheses of this study are presented as dashed lines. Elements considered in this study are presented as red bold text in the boxes.
We calculated the Euclidean distance to the closest oil refinery from the centroid of each census tract. We dichotomized the proximity of refineries using three a priori selected distance thresholds (<2.5km, 2.5–5km, or <5km) from the centroid of a census tract to an oil refinery. We defined refinery-exposed census tracts as those with at least one refinery within a radius of <2.5km or <5km from their centroid. Controls were defined as census tracts located at ≥5km and <10km (approximately 0.1 degree) from a refinery; tracts ≥10km from a refinery were not included in the analysis. The three categorizations (<2.5km, 2.5–5km, or <5km) of exposed census tracts were selected a priori, considering that air pollutants decay with distance from smokestacks of oil refineries (Chen et al. 2012) and the conventions applied in our previous study (Kim et al. 2022). Although some air pollutants may travel farther than 5km, we did not consider census tracts located ≥5km from refineries as exposed, due to the need to adjust for unmeasured spatial confounding. The use of census tracts located at ≥5km and <10km from a refinery as the control group and the exclusion of tracts 10km or farther from a refinery enabled us to adjust for unmeasured spatial confounding (Kim and Bell 2022; Kim et al. 2022). Thus, there was a trade-off between increasing the distance thresholds used to define exposure and choosing appropriate control groups to adjust for unmeasured spatial confounding.
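The distance-based grouping described above can be illustrated with a minimal sketch. It assumes tract centroids and refinery locations are already expressed in a projected planar coordinate system in kilometers; the function names, group labels, and example coordinates are hypothetical and are not taken from the study data. The <5km exposed group corresponds to the union of the first two labels.

```python
import math

def nearest_distance_km(tract_xy, refineries_xy):
    """Euclidean distance (km) from a tract centroid to the closest refinery."""
    return min(math.dist(tract_xy, r) for r in refineries_xy)

def classify_tract(distance_km):
    """Map a nearest-refinery distance to the groups described in the text."""
    if distance_km < 2.5:
        return "exposed_<2.5km"
    if distance_km < 5.0:
        return "exposed_2.5-5km"
    if distance_km < 10.0:
        return "control_5-10km"
    return "excluded"  # tracts at 10km or farther are not analyzed

# Hypothetical planar coordinates in km (assumption: already projected).
refineries = [(0.0, 0.0), (20.0, 0.0)]
tracts = {"A": (1.0, 1.0), "B": (3.0, 4.0), "C": (26.0, 0.0), "D": (10.0, 40.0)}

groups = {
    name: classify_tract(nearest_distance_km(xy, refineries))
    for name, xy in tracts.items()
}
print(groups)
```

Tract B sits exactly 5km from a refinery and therefore falls in the control group, which matches the ≥5km lower bound used for controls.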
Exposure of interest 2: wind-integrated inverse squared distance-weighted sum of actual petroleum production
In addition to proximity, we considered the productivity of a refinery as a second exposure domain because higher levels of petroleum production may correspond to a higher burden of polluting byproducts. Also, if census tracts are situated in close proximity to refineries in the downwind position, residents in those census tracts may have a higher likelihood of exposure to air pollutants emitted from those refineries. Additionally, even in the downwind position, higher wind speeds may reduce the likelihood of people being exposed to high levels of air pollutants compared to lower wind speeds, as pollutants can more easily disperse with stronger winds.
We obtained the average petroleum production capacities of refineries for the years 2015–2017. We defined production capacity using the reported atmospheric distillation capacity of each facility. We obtained average annual petroleum production for each PADD corresponding to states within our study area (PADDs 2 and 3). We estimated actual petroleum production (APP) for each refinery by allocating each PADD's annual production to its refineries in proportion to their share of the total production capacity within that PADD, using facility-reported atmospheric distillation capacity to estimate each share. We obtained ERA5 monthly averaged wind data on a 0.25-degree grid for the years 2015–2017. We calculated the average of the 10m u and v components of the wind. The u component is the eastward (zonal) wind and the v component is the northward (meridional) wind. We propose a novel continuous exposure metric, a wind-integrated inverse squared distance-weighted sum of APPs for each census tract as follows:
Xc denotes the wind-integrated inverse squared distance-weighted sum of APP at a census tract c. du,c,r and dv,c,r denote the u and v components of the distance vector between the centroid of census tract c and a refinery r, respectively. Rb is the set of refineries within a pre-specified buffer distance, b, from the centroid of census tract c; b was 2.5km or 5km. WDu,c,r and WDv,c,r are binary indicators of the u and v components of the wind direction between c and r. If the average of a given component of the wind direction over the period 2015–2017 is downwind, WD=1; otherwise WD=0. If both components are upwind, the wind-direction term becomes the inverse squared distance. If at least one component of the wind is downwind, the value of this term increases, meaning that people may be more likely to be exposed to the pollution byproducts of the refinery r.
The wind-speed term reflects that air pollutants emitted from the refinery r may disperse less at lower wind speeds, implying that people may be more likely to be exposed to air pollutants generated from that refinery. WSu,c,r and WSv,c,r are the u and v components of the wind speed between c and r. This term increases as wind speed decreases. Even in the downwind position, high wind speeds can easily disperse pollution, causing it to linger for only a short period of time, so the term decreases as wind speed increases.
The product of these two terms would equal the inverse squared distance factored by only if the average wind is upwind, which is . If the wind is downwind (at least one component) and wind speed is low, the product may be higher than what would be if it was . This implies that people may be more exposed to higher levels of air pollutants emitted from a refinery during downwind conditions with low wind speeds than they would be during upwind conditions. Since wind varies daily, and monthly, we assume that people may still be exposed to air pollutants even if the average wind over 2015–2017 is upwind, but their likelihood of exposure to high levels of air pollutants may be elevated only if the distance between c and r is short. This corresponds to a higher value of .
We considered different values of the two scale factors, δ and γ: 0.5, 1, and 1.5. Across the combinations of these values, the resulting Xc values were highly correlated (correlation of approximately 0.99). Therefore, we set both δ and γ to 1.
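As a rough illustration of two building blocks described in this subsection, the sketch below allocates PADD-level production to refineries by capacity share and then forms a plain inverse squared distance-weighted sum of APP within a buffer. It is not the full Xc: the wind-direction and wind-speed factors and the scale factors δ and γ are deliberately left out. All function names, units, and numbers are hypothetical, and the sketch assumes that the capacity table lists every refinery in each PADD.

```python
def estimate_app(capacity, padd_of, padd_production):
    """Allocate each PADD's production to refineries by capacity share."""
    total_capacity = {}
    for refinery, cap in capacity.items():
        padd = padd_of[refinery]
        total_capacity[padd] = total_capacity.get(padd, 0.0) + cap
    return {
        refinery: padd_production[padd_of[refinery]] * cap / total_capacity[padd_of[refinery]]
        for refinery, cap in capacity.items()
    }

def inverse_sq_distance_sum(tract_xy, refinery_xy, app, buffer_km):
    """Sum APP / d^2 over refineries closer than buffer_km (wind terms omitted)."""
    total = 0.0
    for refinery, (x, y) in refinery_xy.items():
        d_u = tract_xy[0] - x  # east-west component of the distance vector
        d_v = tract_xy[1] - y  # north-south component of the distance vector
        d2 = d_u ** 2 + d_v ** 2
        # Assumption: a refinery located exactly at the centroid is skipped
        # to avoid division by zero.
        if 0.0 < d2 < buffer_km ** 2:
            total += app[refinery] / d2
    return total

# Hypothetical inputs (arbitrary units, planar coordinates in km).
capacity = {"r1": 100.0, "r2": 300.0, "r3": 50.0}
padd_of = {"r1": 3, "r2": 3, "r3": 2}
padd_production = {3: 800.0, 2: 120.0}
refinery_xy = {"r1": (1.0, 0.0), "r2": (0.0, 2.0), "r3": (30.0, 0.0)}

app = estimate_app(capacity, padd_of, padd_production)
x_no_wind = inverse_sq_distance_sum((0.0, 0.0), refinery_xy, app, buffer_km=5.0)
print(app, x_no_wind)
```

In this toy example, r3 lies outside the 5km buffer and contributes nothing, while r1 and r2 contribute in proportion to their APP and inversely to their squared distance.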
Potential confounders
We obtained information on potentially confounding sociodemographic factors at the census tract level from the American Community Survey (ACS, 2014–2018), including age group, sex, race, and ethnicity, as well as indicators of household and community socioeconomic conditions (percentage of the population living below the federal poverty line, with less than high-school educational attainment, and median household income). We obtained health and behavioral vascular risk factors, including the prevalence of tobacco use, hypertension, hypercholesterolemia, diabetes mellitus, and obesity, from CDC PLACES (Greenlund et al. 2022).
Statistical analysis
We conducted census tract-level cross-sectional analyses.
Binary exposure variable analysis
For the proximity to refineries, analyses were performed separately for the three dichotomous, distance-based classifications of exposed census tracts (<2.5km, 2.5–5km, and <5km). Controls were defined as census tracts located at ≥5km and <10km from a refinery. The use of these controls was based on our previous studies to adjust for unmeasured spatial confounding (Kim and Bell 2022; Kim et al. 2022). The choice of 10km was informed by Figure S1, which shows semi-variograms for the spatial correlation of CHD prevalence before and after adjustment for several risk factors (i.e., sociodemographic variables and smoking). As noted in the Introduction, the spatial patterns of cardiovascular outcomes in the U.S. South are not fully elucidated. Spatial patterns in CHD prevalence remained in our study population after these risk factors were controlled for, implying potential unmeasured spatial confounding.
Specifically, we conducted one-to-n distance-matching with replacement. This technique matched one exposed census tract to one or more control census tract(s), depending on the Euclidean distance between the exposed and control census tract(s). Matching was performed only if census tracts were within 10km of each other. After this one-to-n distance-matching, we conducted one-to-one nearest neighbor matching with replacement by propensity score (PS). PS models were fit using generalized additive models (GAM). When fitting PS models, we included several covariates to adjust for measured confounding. Covariates were manually selected by checking whether the absolute standardized mean difference (SMD) was ≥0.25 (Stuart et al. 2013). Additionally, a spatial smoother (i.e., thin-plate spline) was added to augment adjustment for unmeasured spatial confounding in addition to one-to-n distance-matching. The two rounds of matching produced one-to-one matched pairs of exposed and control (unexposed) census tracts.
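The two rounds of matching can be sketched as follows. This is a simplified illustration rather than the CGPSspatialmatch implementation: it assumes propensity scores have already been estimated (the GAM fit is not shown), uses planar coordinates in km, and all identifiers and values are hypothetical.

```python
import math

def distance_match(exposed, controls, max_km=10.0):
    """One-to-n step: list, for each exposed tract, the controls within max_km."""
    strata = {}
    for e_id, e in exposed.items():
        nearby = [
            c_id for c_id, c in controls.items()
            if math.dist(e["xy"], c["xy"]) <= max_km
        ]
        if nearby:  # exposed tracts without any nearby control are dropped
            strata[e_id] = nearby
    return strata

def ps_match(exposed, controls, strata):
    """One-to-one step: within each stratum, keep the control with the closest PS.

    Matching is with replacement, so a control may serve several exposed tracts.
    """
    return {
        e_id: min(candidates, key=lambda c_id: abs(controls[c_id]["ps"] - exposed[e_id]["ps"]))
        for e_id, candidates in strata.items()
    }

# Hypothetical tracts with precomputed propensity scores.
exposed = {"E1": {"xy": (0.0, 0.0), "ps": 0.60}, "E2": {"xy": (50.0, 0.0), "ps": 0.30}}
controls = {
    "C1": {"xy": (6.0, 0.0), "ps": 0.20},
    "C2": {"xy": (0.0, 7.0), "ps": 0.55},
    "C3": {"xy": (52.0, 6.0), "ps": 0.90},
    "C4": {"xy": (100.0, 100.0), "ps": 0.30},
}

strata = distance_match(exposed, controls)
pairs = ps_match(exposed, controls, strata)
print(strata, pairs)
```

Restricting candidates to nearby controls before comparing propensity scores is what lets the distance step address spatial confounding that the covariates in the PS model do not capture.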
After matching was completed, we performed linear regression to evaluate the association between proximity to refineries as a binary exposure variable and CHD prevalence. Dummy variables that indicate each of the distance-matched strata were added for matched analysis. We conducted bootstrapping with a distance- and PS-matched dataset to estimate a standard error of the association estimate.
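A minimal sketch of the regression and bootstrap steps is given below. It relies on the fact that the ordinary least squares slope from a model with one dummy per stratum equals the slope obtained after demeaning exposure and outcome within each stratum. Resampling whole strata is an assumption made here for illustration; the text does not specify the resampling unit. The data are hypothetical.

```python
import random
import statistics

def within_stratum_slope(strata):
    """OLS slope of outcome on exposure with one dummy variable per stratum."""
    num = 0.0
    den = 0.0
    for rows in strata:
        x_bar = statistics.fmean(x for x, _ in rows)
        y_bar = statistics.fmean(y for _, y in rows)
        num += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        den += sum((x - x_bar) ** 2 for x, _ in rows)
    return num / den

def bootstrap_se(strata, n_boot=500, seed=0):
    """Standard error from resampling strata with replacement (assumed unit)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(strata) for _ in strata]
        estimates.append(within_stratum_slope(sample))
    return statistics.stdev(estimates)

# Hypothetical matched pairs: (exposure indicator, CHD prevalence in %).
strata = [
    [(1, 8.4), (0, 8.0)],
    [(1, 7.9), (0, 7.7)],
    [(1, 9.0), (0, 8.9)],
    [(1, 6.5), (0, 6.6)],
]

slope = within_stratum_slope(strata)
se = bootstrap_se(strata)
print(slope, se)
```

With a binary exposure and one exposed and one control tract per stratum, the slope reduces to the mean within-pair difference in prevalence.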
Continuous exposure variable analysis
In addition to a binary exposure variable, we conducted separate analyses with a continuous variable, Xc. We conducted separate analyses for each of Xc within 2.5km and within 5km. Similar to analyses with a binary exposure variable, we conducted one-to-n distance-matching with replacement. One exposed census tract was matched to one or more controls (i.e., census tracts located at both ≥5km and <10km from a refinery).
We then conducted one-to-one nearest neighbor matching with replacement (NNWR) by generalized propensity score (GPS). The GPS was estimated for Xc=w (w>0) as a continuous exposure variable. The GPS estimation accounted for two dimensions of Xc, which has a bimodal distribution: 1) a binary dimension (i.e., whether there is a refinery within a prespecified buffer distance); and 2) a continuous dimension (i.e., the quantity of petroleum production by refineries). We estimated the GPS for Xc conditional on the proximity to refineries as the binary exposure variable, referred to as the conditional GPS (CGPS) (Kim and Bell 2022), and then multiplied the CGPS by the above-described PS (see the Binary exposure variable analysis subsection). CGPS models were fit using generalized additive models (GAM). When fitting CGPS models, we included several covariates to adjust for measured confounding. Covariates were manually selected by checking whether the absolute SMD was ≥0.25 and whether their absolute correlation with Xc was ≥0.25 in the exposed census tracts (Kim and Bell 2022). Additionally, a spatial smoother (i.e., thin-plate spline) was added to augment adjustment for unmeasured spatial confounding in addition to one-to-n distance-matching. We estimated the GPS for Xc=w>0 in each distance-matched stratum.
After matching was completed, we performed linear regression to evaluate the association between Xc and CHD prevalence. Dummy variables that indicate each of the distance- and GPS-matched strata were added for this matched analysis. To estimate a standard error of the association estimate, we conducted bootstrapping with the distance- and GPS-matched dataset.
Others
We quantified the extent to which exposure to oil refineries might explain geographic variation in CHD prevalence. We estimated the number of cases potentially explained by exposure to oil refineries from the regression coefficient estimate, which indicates the percentage point increase in self-reported CHD prevalence (%) per one-unit increase in Xc, together with Xc and Popc, the population aged 18 years or older at census tract c. We estimated the percentage of cases potentially explained by exposure to oil refineries analogously, additionally using Prevc, the CHD prevalence (%) at census tract c.
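One plausible reading of these quantities is sketched below. The functional forms are assumptions inferred from the variable definitions given in the text (coefficient, Xc, Popc, Prevc) and should not be read as the exact expressions used in the analysis; function names and numbers are hypothetical. The sketch treats explained cases as the coefficient times Xc, converted from percentage points to a proportion and multiplied by the adult population, and the explained percentage as those cases divided by all prevalent cases.

```python
def explained_cases(beta, tracts):
    """Assumed form: sum over tracts of beta * X_c * Pop_c / 100."""
    return sum(beta * t["x"] * t["pop"] / 100.0 for t in tracts)

def explained_percent(beta, tracts):
    """Assumed form: explained cases as a percentage of all prevalent cases."""
    total_cases = sum(t["prev"] * t["pop"] / 100.0 for t in tracts)
    return 100.0 * explained_cases(beta, tracts) / total_cases

# Hypothetical exposed tracts: exposure X_c, adult population, CHD prevalence (%).
tracts = [
    {"x": 1.0, "pop": 2000, "prev": 8.0},
    {"x": 0.2, "pop": 1000, "prev": 5.0},
]
beta = 0.5  # hypothetical percentage point increase per unit of X_c

cases = explained_cases(beta, tracts)
percent = explained_percent(beta, tracts)
print(round(cases, 1), round(percent, 2))
```

Applied to a single tract, the second function reduces to 100 × beta × Xc / Prevc, which is consistent with the explained percentage being largest where exposure is high and baseline prevalence is low.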
As sensitivity analyses, we used one-to-one nearest neighbor matching without replacement (NNWoR) and one-to-one nearest neighbor caliper matching with/without replacement (NNCW(o)R) instead of NNWR. We tested non-linear associations using a thin-plate spline in GAM.
Figures S2 and S3 show SMDs for 18 potential confounders in our main analysis and sensitivity analyses (Stuart et al. 2013).
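For readers unfamiliar with the balance diagnostic, a standardized mean difference can be computed as in the sketch below. The pooled standard deviation in the denominator is an assumption; other conventions, such as using the standard deviation of the exposed group only, are also common, and the text does not state which was used. The covariate values are hypothetical.

```python
import statistics

def standardized_mean_difference(exposed, control):
    """Difference in means divided by the pooled standard deviation (assumed)."""
    pooled_sd = ((statistics.variance(exposed) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.fmean(exposed) - statistics.fmean(control)) / pooled_sd

# Hypothetical tract-level covariate, e.g. % with less than high-school education.
exposed_values = [25.0, 27.0, 23.0]
control_values = [20.0, 22.0, 18.0]

smd = standardized_mean_difference(exposed_values, control_values)
imbalanced = abs(smd) >= 0.25  # threshold used in the text for covariate selection
print(smd, imbalanced)
```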
We conducted all statistical analyses using R software 3.5.3 with the mgcv and CGPSspatialmatch packages (Kim and Bell 2022).
3. Results
Table 1 presents descriptive statistics of potential confounders and self-reported CHD prevalence by proximity to refineries (<2.5km, 2.5–5km, and <5km as exposed groups and 5–10km as the control group). The percentages of the population with low socioeconomic status and of Hispanic ethnicity were higher in census tracts nearer to refineries. The percentages of non-Hispanic Black and non-Hispanic White populations were lower in census tracts nearer to refineries. Census tracts nearer to refineries had a higher prevalence of current smoking and obesity. The average of self-reported CHD prevalence (one standard deviation (SD)) was 8.0% (1.8%), 8.0% (1.9%), 8.0% (2.2%), and 7.8% (2.2%) for census tracts within 2.5km, between 2.5km and 5km, within 5km, and between 5km and 10km from refineries, respectively. For all census tracts within 10km from refineries, self-reported CHD prevalence ranged from 1.2% to 23.1% and its SD was 2.3%.
Table 1.
Distribution of census tract-level coronary heart disease prevalence and potential confounders by proximity to oil refineries.
| | <2.5kmᵃ | P-valueᵇ | 2.5–5kmᵃ | P-valueᵇ | <5km | P-valueᵇ | 5–10kmᵃ |
|---|---|---|---|---|---|---|---|
| Number of census tractsᶜ | 79 | | 237 | | 314 | | 541 |
| Sex, % | |||||||
| Males | 48.4 (4.6) | 0.344 | 48.1 (5.5) | 0.470 | 48.2 (5.3) | 0.310 | 47.8 (5.9) |
| Females | 51.6 (4.6) | 0.694 | 51.9 (5.5) | 0.470 | 51.8 (5.3) | 0.310 | 52.2 (5.9) |
| Age, % | |||||||
| 18–19 years | 4.0 (2.7) | 0.694 | 3.6 (3.2) | 0.582 | 3.7 (3.1) | 0.776 | 3.8 (5.4) |
| 20–24 years | 10.1 (3.2) | 0.772 | 10.1 (4.8) | 0.611 | 10.1 (4.5) | 0.572 | 10.3 (6.4) |
| 25–44 years | 37.4 (6.5) | 0.122 | 35.8 (6.5) | 0.826 | 36.2 (6.5) | 0.613 | 36.0 (7.9) |
| 45–64 years | 32.1 (4.4) | 0.610 | 32.8 (5.4) | 0.023 | 32.6 (5.2) | 0.032 | 31.7 (6.6) |
| 65–84 years | 14.4 (4.9) | 0.028 | 15.6 (5.3) | 0.513 | 15.3 (5.3) | 0.134 | 15.9 (5.7) |
| 85+ years | 2.0 (1.5) | 0.114 | 2.1 (1.9) | 0.104 | 2.1 (1.8) | 0.041 | 2.4 (2.0) |
| Race/ethnicity, % | |||||||
| Hispanic | 39.2 (36.0) | 0.124 | 35.7 (35.8) | 0.273 | 36.6 (35.8) | 0.120 | 32.7 (34.7) |
| Non-Hispanic White | 28.6 (26.6) | 0.260 | 31.2 (28.1) | 0.573 | 30.6 (27.7) | 0.342 | 32.5 (28.8) |
| Non-Hispanic Black | 28.5 (32.6) | 0.655 | 29.0 (33.0) | 0.615 | 28.8 (32.9) | 0.546 | 30.2 (32.3) |
| Socioeconomic conditions | |||||||
| Educational attainment of less than high school, % | 25.5 (10.7) | <0.001 | 22.4 (12.8) | 0.007 | 23.2 (12.4) | <0.001 | 19.8 (11.9) |
| Under the federal poverty line, % | 22.1 (9.9) | 0.224 | 22.4 (10.1) | 0.886 | 23.2 (10.1) | 0.676 | 19.8 (11.7) |
| Have health insurance, % | 73.1 (11.9) | <0.001 | 76.7 (11.8) | 0.020 | 75.8 (11.9) | <0.001 | 78.7 (10.9) |
| Median Household Income, $ | 39882.7 (15095.5) | 0.074 | 44153.7 (22190.2) | 0.935 | 43079.1 (20696.7) | 0.508 | 44022.3 (19710.5) |
| Cardiovascular risk factors, % | |||||||
| Current smoking | 23.2 (5.6) | <0.001 | 21.1 (4.9) | 0.090 | 21.7 (5.2) | 0.001 | 20.4 (5.3) |
| Obesity | 41.6 (7.2) | 0.021 | 40.8 (6.8) | 0.037 | 41.0 (6.9) | 0.007 | 39.7 (6.6) |
| High cholesterol | 35.8 (2.8) | 0.395 | 36.1 (3.4) | 0.031 | 36.0 (3.2) | 0.028 | 35.4 (4.1) |
| Diabetes | 15.9 (4.3) | 0.376 | 15.8 (4.7) | 0.286 | 15.8 (4.6) | 0.202 | 15.4 (5.1) |
| High blood pressure | 38.3 (7.8) | 0.7995 | 38.8 (8.2) | 0.467 | 38.7 (8.1) | 0.544 | 38.3 (8.6) |
| Coronary heart disease prevalence, % | 8.0 (1.8) | 0.462 | 8.0 (1.9) | 0.180 | 8.0 (1.9) | 0.148 | 7.8 (2.2) |
ᵃ Mean (standard deviation).
ᵇ P-value for the difference with respect to 5–10km.
ᶜ Distributions of variables are shown regardless of whether they were used in matched analyses.
Distributions of variables used in matched analyses are presented in Figures S2 and S3 as standardized mean differences.
Table 2 presents the associations between proximity to refineries and self-reported CHD prevalence. We did not find statistically significant associations for three distance-based, binary exposure thresholds (<2.5km, 2.5km–5km, and <5km). The tested associations were also nonsignificant in sensitivity analyses, except for one association (Table S1). In contrast, we found a statistically significant association between Xc for 2.5km and for 5km and self-reported CHD prevalence. A one-SD increase in Xc for 2.5km and 5km was associated with a 0.54 (95% confidence interval: 0.01, 1.06) and a 0.33 (0.04, 0.63) percentage point increase in CHD prevalence, respectively. Sensitivity analyses demonstrated consistent results, with point estimates ranging from 0.17 (−0.21, 0.54) to 0.34 (−0.22, 0.91) (Table S2).
Table 2.
Association between proximity to oil refineries and self-reported coronary heart disease prevalence
| Exposureᵃ | Percentage point increase (95% confidence interval)ᵇ |
|---|---|
| Proximity to refineries | |
| <2.5km | 0.04 (−0.51, 0.59) |
| 2.5–5km | 0.12 (−0.19, 0.43) |
| <5km | 0.07 (−0.16, 0.29) |
| The weighted average of actual petroleum production (Xc) | |
| <2.5km | 0.54 (0.01, 1.06) |
| <5km | 0.33 (0.04, 0.63) |
ᵃ Controls were defined as census tracts located at ≥5km and <10km from a refinery.
ᵇ For proximity to refineries, the increase is relative to the CHD prevalence (%) of controls (binary variable). For Xc, the increase is per one standard deviation (SD) increase in Xc relative to the CHD prevalence (%) of controls (Xc=0).
Table 3 presents the fraction of self-reported CHD prevalence that is explained by Xc within 5km, by state. Across all seven states combined, the percentage of CHD prevalence potentially explained by exposure to refineries was 1.6% (0.2, 3.1), and the number of cases potentially explained was 1119.0 (123.5, 2114.2). This percentage differed by state: 3.0% (0.3, 5.7) in Arkansas, 2.5% (0.3, 4.8) in Mississippi, 2.0% (0.2, 3.7) in Texas, 1.5% (0.2, 2.8) in Louisiana, 0.5% (0.1, 0.9) in New Mexico, 0.4% (0.0, 0.7) in Oklahoma, and 0.2% (0.0, 0.3) in Alabama. The number of cases explained by exposure to oil refineries was highest in Texas (791.7 (87.5, 1495.8)), followed by 240.1 (26.5, 453.7) in Louisiana, 41.8 (4.6, 78.9) in Arkansas, 25.4 (2.8, 48.0) in Oklahoma, 12.9 (1.4, 24.4) in Mississippi, 4.1 (0.4, 7.7) in Alabama, and 3.0 (0.3, 5.7) in New Mexico. For a total of 316 census tracts within 5km from refineries, CHD prevalence explained by Xc ranged from 0.02% (0.00, 0.05) to 47.4% (5.2, 89.5) (Figure 2).
Table 3.
Self-reported coronary heart disease prevalence and number of cases that are explained by exposure to oil refineries within 5km distance from refineries by seven states
| State | % of prevalence explained (95% confidence interval) | Prevalent cases explained (95% confidence interval) |
|---|---|---|
| Alabama | 0.2 (0.0, 0.3) | 4.1 (0.4, 7.7) |
| Arkansas | 3.0 (0.3, 5.7) | 41.8 (4.6, 78.9) |
| Louisiana | 1.5 (0.2, 2.8) | 240.1 (26.5, 453.7) |
| Mississippi | 2.5 (0.3, 4.8) | 12.9 (1.4, 24.4) |
| New Mexico | 0.5 (0.1, 0.9) | 3.0 (0.3, 5.7) |
| Oklahoma | 0.4 (0.0, 0.7) | 25.4 (2.8, 48.0) |
| Texas | 2.0 (0.2, 3.7) | 791.7 (87.5, 1495.8) |
| All seven states | 1.6 (0.2, 3.1) | 1119.0 (123.5, 2114.2) |
Figure 2.

The percentage of coronary heart disease prevalence that is explained by exposure to oil refineries. Each point indicates the percentage at an exposed census tract within a state. Each horizontal line indicates its 95% confidence interval. Census tracts were sorted by the percentage explained for each state (vertical axis)
4. Discussion
The U.S. Environmental Protection Agency (EPA) regulates several classes of air pollutants emitted from oil refineries and other byproducts that may pollute surrounding soil and water under the Clean Air Act, the Clean Water Act, and the Petroleum Refining Effluent Guidelines and Standards. Our findings suggest that pollution mixtures from oil refineries may affect cardiovascular health and may contribute to the disproportionate concentration and small-area variation of CHD in the southern U.S. Petroleum production and refining may become a stronger explanatory factor of geographic differences in CHD prevalence when the petroleum production and refining level is higher and the level of the other contributing factors to the prevalence is lower.
Our exposure metric reflected the number, location, and capacity of oil refineries, as well as wind. Similar approaches have been used in studies of the association between oil and gas wells and various health outcomes (McKenzie et al. 2014; McKenzie et al. 2017; Rasmussen et al. 2016). Such approaches can provide insight regarding the contribution of pollution point sources to regional and small-area variation in CHD. In addition, we proposed integrating wind speeds and directions into the exposure metric, which may represent potential exposure to pollution emitted from refineries more accurately than previous metrics based solely on the number, location, and capacity of emission sources. Although our exposure metric has the advantage of accommodating geographic heterogeneity in exposure mechanisms and the composition of pollutant mixtures, it is not designed to isolate the health effects of single pollutant classes or discern among the various pathways (e.g., air, water) through which these activities impact health. To inform environmental policy, future research should disentangle the contribution of each pollutant class within byproducts of petroleum production and refining to CHD risk.
We found a significant positive association between residential exposure to oil refineries and CHD prevalence using the wind-integrated inverse squared distance-weighted sum of APP (Xc), which was robust to different analytical choices, as confirmed by sensitivity analyses. We did not find consistent and significant associations using solely distance-based binary exposure definitions. This suggests that Xc, which considers the count, proximity, and expected productivity of refineries as well as wind speeds and directions, may be a better surrogate for exposure to pollutant mixtures than distance alone. Nonetheless, the degree of individual exposure to pollutants depends on multiple factors that could not be considered within this study. These include the source and quality of fuels input into the oil refining process, pollution control measures, water treatment systems, chemical compositions of crude oils, and other meteorological factors, among others. Nevertheless, inverse squared distance-weighted averages of APP for 2.5km and 5km, which are similar to Xc, were positively associated with measured levels of SO2 in another study (Kim et al. 2022). SO2 is a reasonable proxy indicator for local concentrations of byproducts because crude petroleum contains a large amount of sulfur and SO2 may be emitted together with other pollutants from refineries, including fine particles, black carbon, VOCs, NOx, and PAH (Adebiyi 2022). SO2 emissions are also a relatively unique signature of exposure to oil refineries because multiple decades of regulation under the Clean Air Act have reduced SO2 emissions from other sources, including automotive transport. Moreover, one of the EPA’s recommended regulatory dispersion models also indicated that SO2 and VOCs generated from oil refineries in Texas City can travel sufficient distances to contribute to pollutant exposure within surrounding communities (Chen et al. 2012).
The potential contribution of exposure to oil refineries to CHD prevalence in the exposed census tracts varied by state and census tract. This variation suggests that petroleum production and refining may become a stronger explanatory factor of geographic differences in CHD prevalence when the petroleum production and refining level is higher and the level of the other contributing factors to the prevalence is lower.
We note several limitations. First, our study is ecological and therefore cannot establish temporal or causal relationships, despite our use of a causal inference method ((G)PS matching). Second, we used CHD prevalence. The prevalence of self-reported CHD (i.e., the proportion of respondents who reported ever having been told by a doctor, nurse, or other health professional that they had angina or CHD) depends, by definition, on both incidence and case-fatality. Our results therefore do not indicate the association with incidence or with case-fatality separately. Third, CHD prevalence and health and behavioral risk factors were based on self-report and could not be triangulated or adjudicated using other data sources. Our source of health information, moreover, required us to dichotomize vascular risk factors that may have a dose-response relationship with CHD outcomes, such as tobacco use and obesity. In addition, the CDC PLACES estimates are census tract-level (small-area) estimates derived from statistical approaches and are therefore inherently uncertain. Fourth, we were unable to account for the route(s), magnitude, or duration of individual exposures or to model complex mixtures of environmental pollutants across air, noise, soil, and water. Our exposure metric could not capture variation in major determinants of emissions, such as complex refining processes, pollution control measures, the quality of crude oil inputs, and temporal fluctuations in operating and idle capacity. Exposure to air pollution generated from refineries is also determined by multiple factors, including smokestack attributes (e.g., height, diameter), topography, and meteorology, none of which were fully considered in our study. Fifth, it was not feasible to evaluate potential occupational exposures with our data. Sixth, we did not investigate buffer distances other than 2.5km and 5km. We categorized census tracts located at >5km and ≤10km from refineries as the control group.
There are competing considerations: expanding the exposed group to include long-distance impacts of a refinery versus choosing appropriate control groups to adjust for unmeasured spatial confounding. If we expand the exposed group, the analysis is less likely to achieve adjustment for unmeasured spatial confounding because control census tracts farther from a refinery are less similar to the exposed group with respect to unmeasured spatial confounders (Figure S1). If we narrow the exposed group, the analysis is more likely to achieve adjustment for unmeasured spatial confounding. However, this may result in underestimation if census tracts categorized as the control group were in fact exposed to oil refineries. Unmeasured spatial confounding is a concern in our study population because entrenched cardiovascular risks have not been fully explained by many known risk factors (Gebreab et al. 2015; Glynn et al. 2021). We adjusted for this confounding by distance matching and (G)PS matching integrated with spatial smoothers. Seventh, because the distance between census tracts and refineries was determined using the geographic centroid of each census tract and the point coordinates of oil refineries, the exposure status of some residents of census tracts with a large spatial footprint might have been misclassified. Such census tracts are often characterized by uneven population dispersion due to topographic features, such as mountainous terrain.
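The centroid-based misclassification described in the seventh limitation can be illustrated with a short sketch. It assumes great-circle (haversine) distance and the exposed (≤5km) and control (>5km and ≤10km) bands described above; the function names and coordinates are hypothetical, and this is not the study's geoprocessing code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_point(point, refineries, exposed_km=5.0, control_km=10.0):
    """Label a (lat, lon) point by its distance to the nearest refinery."""
    nearest = min(haversine_km(*point, *refinery) for refinery in refineries)
    if nearest <= exposed_km:
        return "exposed"
    if nearest <= control_km:
        return "control"
    return "excluded"


# Hypothetical coordinates: one refinery and one large census tract.
refineries = [(29.70, -95.00)]
tract_centroid = (29.77, -95.00)  # about 7.8km from the refinery
resident_home = (29.73, -95.00)   # same tract, about 3.3km from the refinery

print(classify_point(tract_centroid, refineries))  # control
print(classify_point(resident_home, refineries))   # exposed
```

Here the tract would be assigned to the control group on the basis of its centroid even though a resident near its edge lives within 5km of the refinery, which would tend to bias the estimated association toward the null.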
5. Conclusion
We evaluated the association between local exposure to oil refineries and small-area variation in the prevalence of CHD. Our cross-sectional ecological analysis, based on a causal inference method that adjusts for measured confounders and unmeasured spatial confounders, suggests a potential relationship between residential exposure to oil refineries and CHD prevalence. Our research suggests that the oil refining sector, a major contributor to climate change, may also contribute to a higher burden of cardiovascular disease in the southern U.S. Longitudinal studies with more detailed exposure assessments and incidence-based health outcomes, including analyses of specific pollutants and their putative contributions to atherogenesis, will strengthen the evidence on the relationship between oil refineries and CHD risk. This new area of research will improve understanding of potentially modifiable exposures that may contribute to the disproportionate concentration of cardiovascular diseases in the southern U.S. and their small-area variation.
Supplementary Material
Highlights.
Association of oil refineries with coronary heart disease (CHD) was examined.
We found a positive association in seven states of the southern U.S.
The prevalence of CHD explained ranged from 0.02% to 47.4%.
Residents living nearer to oil refineries may have a higher risk of CHD.
Oil refineries may explain small area variation in the risk of CHD.
Acknowledgement
Funding
Dr. Honghyok Kim and Dr. Michelle Bell were supported by Assistance Agreement No. RD835871 awarded by the U.S. Environmental Protection Agency to Yale University and were also supported by the National Institute On Minority Health And Health Disparities of the National Institutes of Health under Award Number R01MD012769. Dr. Honghyok Kim was further supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1A6A3A14039711). Dr. Natalia Festa was supported by the VA Office of Academic Affiliations through the VA/National Clinician Scholars Program and Yale University, by the Yale National Clinician Scholars Program, and by CTSA Grant Number TL1 TR001864 from the National Center for Advancing Translational Science (NCATS), a component of the National Institutes of Health (NIH). Dr. Festa is currently supported by T32AG019134. Dr. Gill is supported by the Yale Claude D. Pepper Older Americans Independence Center (P30AG021342). This document has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The contents of this manuscript do not represent the views of the U.S. Department of Veterans Affairs or the U.S. Government.
Declaration of interests
Dae Cheol Kim reports that financial support was provided by Hyundai Oilbank. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Abbreviations
- APP: Actual petroleum production
- ACS: American Community Survey
- CDC: Centers for Disease Control and Prevention
- CDC PLACES: CDC Population Level Analysis and Community EStimates
- CHD: Coronary Heart Disease
- CGPS: Conditional Generalized Propensity Score
- EPA: Environmental Protection Agency
- GAM: Generalized Additive Model
- GPS: Generalized Propensity Score
- NNWR: One-to-one Nearest Neighbor matching With Replacement
- NNWoR: One-to-one Nearest Neighbor matching Without Replacement
- NNCWR: One-to-one Nearest Neighbor Caliper matching With Replacement
- NNCWoR: One-to-one Nearest Neighbor Caliper matching Without Replacement
- NOx: Nitrogen oxides
- O3: Ozone
- PAH: Polycyclic Aromatic Hydrocarbon
- PM: Particulate Matter
- PS: Propensity Score
- SD: Standard Deviation
- SMD: Standardized Mean Difference
- SO2: Sulfur dioxide
- U.S.: United States
- VOCs: Volatile Organic Compounds
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflict of interest
Mr. Dae Cheol Kim is a full-time employee of Hyundai Oilbank. His contribution was made solely as part of his program at the Graduate School of Public Health at Seoul National University, which is not related to the company. The other authors declare no conflicts of interest.
Data sharing statement
Census tract-level variables are publicly available from the United States Census Bureau (https://data.census.gov) and the CDC PLACES website (https://www.cdc.gov/places/index.html). Information on oil refineries is available from the United States Energy Information Administration (https://www.eia.gov/petroleum/refinerycapacity/). The data that support the findings of this study are openly available at the following URL: https://github.com/HonghyokKim/CHD_PPR.
References
- Adebiyi FM. 2022. Air quality and management in petroleum refining industry: A review. Environmental Chemistry and Ecotoxicology.
- Bell ML, Ebisu K. 2012. Environmental inequality in exposures to airborne particulate matter components in the United States. Environmental Health Perspectives 120:1699–1704.
- Bravo MA, Anthopolos R, Bell ML, Miranda ML. 2016. Racial isolation and exposure to airborne particulate matter and ozone in understudied US populations: Environmental justice applications of downscaled numerical model output. Environment International 92:247–255.
- Brook RD, Rajagopalan S, Pope CA III, Brook JR, Bhatnagar A, Diez-Roux AV, et al. 2010. Particulate matter air pollution and cardiovascular disease: An update to the scientific statement from the American Heart Association. Circulation 121:2331–2378.
- Carnegie Endowment for International Peace (CEIP). 2023. Oil-climate index. http://oci.carnegieendowment.org/#supply-chain (Accessed on Aug. 10, 2023)
- Casper M, Kramer MR, Quick H, Schieb LJ, Vaughan AS, Greer S. 2016. Changes in the geographic patterns of heart disease mortality in the United States: 1973 to 2010. Circulation 133:1171–1180.
- Chen JA, Zapata AR, Sutherland AJ, Molmen DR, Chow BS, Wu LE, et al. 2012. Sulfur dioxide and volatile organic compound exposure to a community in Texas City, Texas evaluated using AERMOD and empirical monitoring data. American Journal of Environmental Science 8:622–632.
- Cushman M, Cantrell RA, McClure LA, Howard G, Prineas RJ, Moy CS, et al. 2008. Estimated 10-year stroke risk by region and race in the United States. Annals of Neurology: Official Journal of the American Neurological Association and the Child Neurology Society 64:507–513.
- Damian C. 2013. Environmental pollution in the petroleum refining industry. Ovidius University Annals of Chemistry 24:109–114.
- Garcia-Gonzales DA, Shonkoff SB, Hays J, Jerrett M. 2019. Hazardous air pollutants associated with upstream oil and natural gas development: A critical synthesis of current peer-reviewed literature. Annual Review of Public Health 40:283–304.
- Gebreab SY, Davis SK, Symanzik J, Mensah GA, Gibbons GH, Diez-Roux AV. 2015. Geographic variations in cardiovascular health in the United States: Contributions of state- and individual-level factors. Journal of the American Heart Association 4:e001673.
- Glymour MM, Avendano M, Berkman LF. 2007. Is the 'stroke belt' worn from childhood? Risk of first stroke and state of residence in childhood and adulthood. Stroke 38:2415–2421.
- Glynn PA, Molsberry R, Harrington K, Shah NS, Petito LC, Yancy CW, et al. 2021. Geographic variation in trends and disparities in heart failure mortality in the United States, 1999 to 2017. Journal of the American Heart Association 10:e020541.
- Greenlund KJ, Lu H, Wang Y, Matthews KA, LeClercq JM, Lee B, et al. 2022. Peer reviewed: PLACES: Local data for better health. Preventing Chronic Disease 19.
- Hajat A, Diez-Roux AV, Adar SD, Auchincloss AH, Lovasi GS, O'Neill MS, et al. 2013. Air pollution and individual and neighborhood socioeconomic status: Evidence from the Multi-Ethnic Study of Atherosclerosis (MESA). Environmental Health Perspectives 121:1325–1333.
- Howard G. 1999. Why do we have a stroke belt in the southeastern United States? A review of unlikely and uninvestigated potential causes. The American Journal of the Medical Sciences 317:160–167.
- Howard G, Howard VJ. 2020. Twenty years of progress toward understanding the stroke belt. Stroke 51:742–750.
- Howard VJ, McClure LA, Kleindorfer DO, Cunningham SA, Thrift AG, Roux AVD, et al. 2016. Neighborhood socioeconomic index and stroke incidence in a national cohort of blacks and whites. Neurology 87:2340–2347.
- Jbaily A, Zhou X, Liu J, Lee T-H, Kamareddine L, Verguet S, et al. 2022. Air pollution exposure disparities across US population and income groups. Nature 601:228–233.
- Jones MR, Diez-Roux AV, Hajat A, Kershaw KN, O'Neill MS, Guallar E, et al. 2014. Race/ethnicity, residential segregation, and exposure to ambient air pollution: The Multi-Ethnic Study of Atherosclerosis (MESA). American Journal of Public Health 104:2130–2137.
- Karp DN, Wolff CS, Wiebe DJ, Branas CC, Carr BG, Mullen MT. 2016. Reassessing the stroke belt: Using small area spatial statistics to identify clusters of high stroke mortality in the United States. Stroke 47:1939–1942.
- Kim H, Bell M. 2022. Adjustment for unmeasured spatial confounding in settings of continuous exposure conditional on the binary exposure status: Conditional generalized propensity score-based spatial matching. arXiv preprint arXiv:2202.00814.
- Kim H, Festa N, Burrows K, Kim DC, Gill TM, Bell ML. 2022. Residential exposure to petroleum refining and stroke in the southern United States. Environmental Research Letters 17:094018.
- Liao Y, Greenlund KJ, Croft JB, Keenan NL, Giles WH. 2009. Factors explaining excess stroke prevalence in the US stroke belt. Stroke 40:3336–3341.
- Liu J, Clark LP, Bechle MJ, Hajat A, Kim S-Y, Robinson AL, et al. 2021. Disparities in air pollution exposure in the United States by race/ethnicity and income, 1990–2010. Environmental Health Perspectives 129:127005.
- Lynch MJ, Stretesky PB, Burns RG. 2004. Determinants of environmental law violation fines against petroleum refineries: Race, ethnicity, income, and aggregation effects. Society and Natural Resources 17:333–347.
- McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. 2014. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environmental Health Perspectives 122:412–417.
- McKenzie LM, Allshouse WB, Byers TE, Bedrick EJ, Serdar B, Adgate JL. 2017. Childhood hematologic cancer and residential proximity to oil and gas development. PloS One 12:e0170423.
- Rajagopalan S, Al-Kindi SG, Brook RD. 2018. Air pollution and cardiovascular disease: JACC state-of-the-art review. Journal of the American College of Cardiology 72:2054–2070.
- Rajagopalan S, Landrigan PJ. 2021. Pollution and the heart. New England Journal of Medicine 385:1881–1892.
- Rasmussen SG, Ogburn EL, McCormack M, Casey JA, Bandeen-Roche K, Mercer DG, et al. 2016. Association between unconventional natural gas development in the Marcellus Shale and asthma exacerbations. JAMA Internal Medicine 176:1334–1343.
- Singh GK, Azuine RE, Siahpush M, Williams SD. 2015. Widening geographical disparities in cardiovascular disease mortality in the United States, 1969–2011. International Journal of MCH and AIDS 3:134.
- Stuart EA, Lee BK, Leacy FP. 2013. Prognostic score–based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. Journal of Clinical Epidemiology 66:S84–S90. e81.
- United States Energy Information Administration (US EIA). 2023. Drilling productivity report. https://www.eia.gov/petroleum/drilling/ (Accessed on Aug. 10, 2023)
- United States Environmental Protection Agency (US EPA). 2015. AP 42, fifth edition, volume I chapter 5: Petroleum industry. https://www.epa.gov/sites/production/files/2020-09/documents/5.1_petroleum_refining.pdf (Accessed on Aug. 10, 2023)
- United States Environmental Protection Agency (US EPA). 2021. Refineries sector 2020. https://www.epa.gov/ghgreporting/ghgrp-refineries-sector-profile (Accessed on Aug. 10, 2023)
