Chemosphere. Author manuscript; available in PMC: 2025 Jun 1.
Published in final edited form as: Chemosphere. 2025 Apr 8;378:144390. doi: 10.1016/j.chemosphere.2025.144390

The association between long-term exposure to PM2.5 constituents and ischemic stroke in the New York City metropolitan area

Helena Krasnov a,*, Kshitij Sachdev b, Pablo Knobel a, Elena Colicino a, Maayan Yitshak-Sade a
PMCID: PMC12117512  NIHMSID: NIHMS2080748  PMID: 40203750

Abstract

Numerous studies have linked fine particulate matter (PM2.5) to ischemic stroke. However, only a few have investigated the differential associations with specific PM2.5 components and sources. We used electronic health records (EHR) from the Mount Sinai Health System in the New York City metropolitan area during 2011–2019 to assess the associations of PM2.5 components and sources with ischemic stroke. We used mixed-effect Poisson survival regressions to assess the single-exposure associations with the chemical components, and multivariable regression to assess the simultaneous associations with source-apportioned PM2.5 exposures estimated using non-negative matrix factorization. We then assessed the sensitivity of our results to different specifications of EHR data continuity: (1) using a less strict definition of the censoring year, and (2) adjusting the models for the EHR data continuity index, a validated algorithm that measures EHR data continuity based on indicators of primary care service utilization. We observed higher risks of ischemic stroke (rate ratio [95 % confidence interval] per interquartile range increase) associated with higher exposure to nickel (1.080 [1.045; 1.116]), vanadium (1.070 [1.033; 1.109]), zinc (1.076 [1.031; 1.122]), and nitrate (1.084 [1.039; 1.132]). In the multivariable model, we found a higher risk of ischemic stroke associated with exposure to oil combustion-sourced PM2.5 (1.061 [1.012; 1.113]). The results remained consistent under different model specifications accounting for EHR data continuity. In conclusion, we found an increased risk of ischemic stroke associated with specific PM2.5 components and sources, and these findings were robust to different specifications of EHR data continuity. Our findings can inform policy and interventions aimed at reducing cardiovascular disease burden.

Keywords: Fine particulate matter, Ischemic stroke, Electronic health records, PM2.5 components, Data continuity, Air pollution


1. Introduction

Fine particulate matter (PM2.5) is a major contributor to morbidity and mortality, with over 4.2 million premature deaths worldwide attributed to PM2.5 exposure each year (Adamkiewicz et al., 2020; GRFC, 2018). Several studies have found higher risks of stroke associated with short- and long-term exposure to PM2.5 (Yitshak et al., 2015; Yitshak-Sade et al., 2017, 2018), but knowledge gaps persist. First, most studies evaluated exposure by particle size without considering its composition (Kulick et al., 2023). The few studies that investigated component-specific associations show mixed results, reporting higher risks of stroke associated with different chemical components, including sulfate (Liu et al., 2021), nitrate (Liu et al., 2021), organic and elemental carbon (Peng et al., 2009), lead, copper, and zinc (Zhang et al., 2022a), and arsenic and iron (Wang et al., 2019). Therefore, further research is needed to evaluate how stroke effects differ by particle composition. Second, most studies of PM2.5 composition obtained component measurements from ground monitoring sites (Thurston et al., 2016). Exposure assessment at a finer spatial resolution can substantially reduce exposure measurement error and improve our ability to estimate unbiased exposure-response associations (Thomas et al., 1993).

Finally, very few studies have investigated the simultaneous associations of multiple PM2.5 components or sources with ischemic stroke risk (Ma et al., 2024a; Liu et al., 2023; de Bont et al., 2023). The single-pollutant approach may underestimate or overestimate the effect of each exposure because it does not account for additive, amplifying, or confounding relationships among exposures. Exposure mixture approaches within the exposome framework (Wild, 2005) overcome this limitation by accounting for the health effects of concurrent, and often correlated, exposures.

Many large-scale studies assessing the relationship between PM2.5 and stroke rely on electronic health records (EHR). The main advantage of EHR is the comprehensive sociodemographic and health information available for large and representative populations. However, if a patient seeks health care outside the network captured by a particular EHR system, gaps in data continuity may arise. Events occurring outside the network may then be misclassified as non-events, biasing the estimated associations (Lin et al., 2018).

In this study, we investigate the association between PM2.5 chemical components and ischemic stroke using innovative machine-learning satellite-based exposure models (Just et al., 2022) and comprehensive EHR data from the Mount Sinai Health System (MSHS). Further, we use an exposomic approach to simultaneously assess the ischemic stroke risk associated with source-apportioned PM2.5 exposures. Finally, we compare different specifications of EHR data continuity and assess how they affect the observed associations between the exposures and ischemic stroke.

2. Methods

2.1. Study population

This study was approved by the Institutional Review Board of Mount Sinai (STUDY 22–01400), and a waiver of informed consent was granted.

We included all individuals aged 20 years and older who had an MSHS primary care physician during the observation period (2011–2019). MSHS is the largest healthcare provider in the New York City (NYC) metropolitan area. The EHR data include inpatient and outpatient records from five hospitals serving the population of the five boroughs of NYC and neighboring counties (Mount Sinai Hospital, Mount Sinai Queens, Mount Sinai West, Mount Sinai Morningside, and Mount Sinai Brooklyn). To minimize outcome misclassification, we restricted the population to individuals residing in NYC (i.e., Bronx, Kings, New York, Queens, and Richmond counties), Nassau County, and Westchester County. We further restricted the study population to individuals with an MSHS primary care physician to reduce the possibility of outcome misclassification due to limited access to out-of-network health information. Finally, we excluded individuals without a valid residential address (0.5 %). Each patient entered the cohort in the first year with any documented MSHS encounter and was followed annually until death, censoring, or the first occurrence of ischemic stroke.

Individuals were considered censored if there were two consecutive years without any recorded use of MSHS services; the censoring year was set as the last year in which service use was documented before that gap. This approach reduces the likelihood of incorrectly keeping under observation individuals who no longer use MSHS services. However, it may prematurely remove patients with low healthcare utilization. Therefore, in a secondary analysis assessing the sensitivity of the results to cohort and data-continuity specifications, we adopted a less stringent approach: rather than censoring at the first two-year gap, we defined the censoring year as the last year with any evidence of healthcare service utilization.
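The two censoring definitions can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each person is represented only by the set of calendar years with any recorded MSHS encounter, it ignores death and stroke (which end follow-up separately), and the function name `follow_up_years` is hypothetical. How follow-up near the end of the study period was handled is not described, so the sketch simply stops at the last qualifying encounter year.

```python
def follow_up_years(encounter_years, rule="two_year_gap"):
    """Return the calendar years a person contributes under a censoring rule.

    encounter_years: iterable of years with any recorded encounter (hypothetical input).
    rule: "two_year_gap" censors at the last encounter before the first run of
          two or more consecutive years without encounters; "last_encounter"
          follows the person until their last recorded encounter.
    """
    years = sorted(set(encounter_years))
    if rule == "last_encounter":
        end = years[-1]
    elif rule == "two_year_gap":
        end = years[0]
        for year in years[1:]:
            # a jump of 3+ years means at least two empty years in between
            if year - end >= 3:
                break
            end = year
    else:
        raise ValueError(f"unknown rule: {rule}")
    # person-years run from cohort entry (first encounter) to the censoring year
    return list(range(years[0], end + 1))


# Encounters in 2011, 2012, then nothing until 2015
print(follow_up_years([2011, 2012, 2015, 2016]))                        # [2011, 2012]
print(follow_up_years([2011, 2012, 2015, 2016], rule="last_encounter"))  # 2011..2016
```

Under this sketch the "last encounter" rule always yields at least as many person-years as the "two-year gap" rule, which is why it is the less stringent of the two.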

2.2. Exposures

We obtained annual PM2.5 chemical component estimates from models that predict mean pollutant concentrations in each 50 m grid cell in urban areas across the U.S. (Amini et al., 2022). The analyzed components were bromine (Br), calcium (Ca), copper (Cu), iron (Fe), potassium (K), nickel (Ni), lead (Pb), silicon (Si), vanadium (V), zinc (Zn), elemental carbon (EC), ammonium (NH4), nitrate (NO3), organic carbon (OC), and sulfate (SO4). The models incorporate measurements from air pollution and weather monitoring sites and over 160 predictor variables, including land use information, meteorological covariates, and satellite observations. Predictions from three machine learning algorithms were combined using super-learning and ensemble weighted-averaging models. The R2 values in an unseen test set were above 0.90 on average. Specific performance metrics and further details are available elsewhere (Amini et al., 2022). We linked the exposures to participants based on proximity to their geocoded residential addresses reported in the EHR.
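Linking grid-cell estimates to residences by proximity can be sketched as a nearest-centroid lookup. The function `link_nearest_cell` and the example values are hypothetical, coordinates are assumed to be in a projected system in meters, and the brute-force distance matrix is only practical for small inputs; for millions of 50 m cells a spatial index such as `scipy.spatial.cKDTree` would be the usual choice.

```python
import numpy as np


def link_nearest_cell(cell_xy, cell_values, address_xy):
    """Assign each address the value of its nearest grid-cell centroid.

    Coordinates are assumed to be projected (meters). Brute force: builds an
    (n_addresses x n_cells) distance matrix, so it suits small inputs only.
    """
    cell_xy = np.asarray(cell_xy, dtype=float)
    address_xy = np.asarray(address_xy, dtype=float)
    # squared Euclidean distance between every address and every centroid
    d2 = ((address_xy[:, None, :] - cell_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.asarray(cell_values)[nearest]


# Four 50 m cells (centroids) with hypothetical annual Ni values in ng/m3
cells = [(25, 25), (75, 25), (25, 75), (75, 75)]
ni = [1.2, 1.4, 1.3, 1.6]
linked = link_nearest_cell(cells, ni, [(70, 30), (10, 90)])
print(linked)  # [1.4 1.3]
```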

Additionally, to allow simultaneous investigation of source-apportioned PM2.5 exposures, we used non-negative matrix factorization (NMF) (Yan et al., 2019) to attribute the PM2.5 component annual means to source categories (Knobel et al., 2023). NMF is well suited to reducing the dimension of a matrix of PM2.5 chemical component concentrations because its non-negativity constraint yields factors that can be interpreted as additive, non-negative source contributions (Knobel et al., 2023). This method was previously used to identify the spatial distribution of PM2.5 sources across the US (Ma et al., 2024a; Knobel et al., 2023).

First, we aggregated the 50 m resolution annual PM2.5 component levels to census tract-level means for all tracts within the Tri-State area (i.e., New York, New Jersey, and Connecticut). This aggregation made the computation feasible while preserving high spatial resolution. NH4+ was not included in the NMF model because it is present mainly as ammonium sulfate and ammonium nitrate and is therefore better captured by sulfate and nitrate, which were included. We then applied the NMF model, testing 4 to 6 source categories with 100 runs each to obtain the best factorization fit. We selected 5 categories based on the inflection point of the residual sum of squares curve, visual inspection of the mixture coefficient matrix, and the correlations between the categories.
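The factorization and model-selection steps can be sketched with NMF fit by multiplicative updates (the Lee and Seung algorithm for the squared-error objective), keeping the best of several random restarts for each candidate number of factors. This is an illustrative stand-in under stated assumptions, not the software used in the study: `V` is assumed to be a non-negative tract-year × component matrix with the 14 components already on comparable scales, the data below are synthetic, and all function names are hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative V (n x m) as W (n x k) @ H (k x m), minimizing the
    residual sum of squares (RSS) with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps  # factor contributions per tract-year
    H = rng.random((k, m)) + eps  # component loadings per factor
    for _ in range(n_iter):
        # multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    rss = float(((V - W @ H) ** 2).sum())
    return W, H, rss


def best_of_restarts(V, k, n_runs=100, **kwargs):
    """Run NMF from n_runs random starts and keep the lowest-RSS solution."""
    runs = (nmf_multiplicative(V, k, seed=s, **kwargs) for s in range(n_runs))
    return min(runs, key=lambda result: result[2])


def rss_by_k(V, ks=(4, 5, 6), n_runs=100, **kwargs):
    """Best RSS for each candidate number of source factors."""
    return {k: best_of_restarts(V, k, n_runs=n_runs, **kwargs)[2] for k in ks}


# Synthetic example: 200 tract-years, 14 components, 5 underlying sources
rng = np.random.default_rng(42)
V = rng.random((200, 5)) @ rng.random((5, 14))
curve = rss_by_k(V, n_runs=5, n_iter=300)  # few runs here to keep it fast
W, H, rss = best_of_restarts(V, 5, n_runs=5, n_iter=300)
```

The `curve` dictionary is what one would plot against k to look for an inflection point, and each row of `H` shows which components load most heavily on a factor, which is the basis for labeling sources.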

Finally, we rescaled the source factors to obtain source-apportioned PM2.5 exposures. We calculated the total PM2.5 mixture level by summing the concentrations of the 14 PM2.5 chemical components for each tract in each year. We then regressed the total PM2.5 mixture concentration on the source-factor concentrations using linear regression to obtain a coefficient for each source factor, and converted the source-factor concentration in each tract-year into μg/m3 by multiplying it by the corresponding coefficient (Feng et al., 2024).
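A minimal sketch of the rescaling step, assuming `W` holds the tract-year factor contributions from the NMF and `total_mass` the summed component concentrations. Whether the original regression included an intercept is not stated, so the no-intercept choice, the function name, and the synthetic data are assumptions.

```python
import numpy as np


def rescale_source_factors(W, total_mass):
    """Regress total mixture mass on factor contributions (no intercept, an
    assumption) and scale each factor column by its fitted coefficient."""
    W = np.asarray(W, dtype=float)
    coef, *_ = np.linalg.lstsq(W, np.asarray(total_mass, dtype=float), rcond=None)
    return W * coef, coef


# Synthetic check: 3 factors whose true mass-scaling coefficients are known
rng = np.random.default_rng(0)
W = rng.random((100, 3))
true_coef = np.array([0.5, 1.2, 2.0])
total = W @ true_coef
scaled, coef = rescale_source_factors(W, total)
```

Without an intercept, the rescaled factors in each tract-year sum to the fitted total mass, so each column can be read directly as a source-specific concentration.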

2.3. Covariates and outcome

We extracted clinical and sociodemographic information from the EHR, including age, sex (male and female), race (White, Black, Asian, and other/unknown), body mass index (BMI), smoking status (ever smoking: yes/no), and primary admission diagnosis codes from the International Classification of Diseases, 10th Revision (ICD-10). We identified hospital admissions with ischemic stroke (ICD-10 I63) coded as the primary admission cause. Additionally, we obtained census tract-level covariates from the U.S. Census (characteristic CDSh), including median household income, percentage of the population under the poverty line (annual income < $11,484), percentage of the population with less than a high school education, and percentage of the Black population.

In our secondary analysis comparing different specifications of EHR data continuity, we adjusted our models for the EHR data continuity index. This validated algorithm measures patient data continuity based on indicators related to primary care service utilization. The prediction model for the index has been described in detail by Lin et al. (2018) and Merola et al. (2022), who showed that better data continuity mitigates misclassification bias and improves internal validity. The index is calculated for each person-year based on the following: repeated encounters with the same physician, a general medical exam, BMI records, pap smear or mammogram records, colon cancer screening records, vaccination encounters, age, sex, race, number of diagnoses recorded, number of physician office visits, and number of distinct drugs recorded.

2.4. Statistical analysis

2.4.1. Single- and multi-exposure models

To investigate the associations between individual PM2.5 component exposures and ischemic stroke, we used single-pollutant mixed-effect Poisson survival regressions with the Andersen-Gill formulation (Whitehead, 1980; Sade et al., 2023; Yitshak-Sade et al., 2019). Because covariates vary over time within each subject, we fit the proportional hazards model using its equivalent Poisson regression formulation. The models were adjusted for sociodemographic characteristics (age, sex, race, and smoking status) and census tract-level socioeconomic variables (percentage below the poverty line, percentage of Black population, percentage with no high school diploma, and median household income). We initially tested for potential non-linear associations using penalized spline functions for the exposures, employing B-splines with a maximum number of evenly spaced knots. Linear or nearly linear associations were observed for all components except Br (Supplementary Figure 1); therefore, a penalized spline was used to model the association with Br.
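The person-year structure behind the Poisson formulation can be sketched as one row per person-year, with the outcome equal to 1 in the year of a first ischemic stroke and 0 otherwise, and a log-linear Poisson model fit to those rows. The sketch below fits only the fixed-effects part by iteratively reweighted least squares (IRLS); it omits the random effects of the study's mixed-effect model, the function name `poisson_irls` is hypothetical, and the data are simulated.

```python
import numpy as np


def poisson_irls(X, y, offset=None, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares.

    Column 0 of X is assumed to be an intercept. Returns (beta, standard errors).
    """
    n, p = X.shape
    # offset = log(person-time); zero when every row is exactly one person-year
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())  # start at the overall log rate
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu  # working response
        XtW = X.T * mu                     # canonical log link: weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta + offset)
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))


# Simulated person-years: intercept plus one exposure with a true log-RR of 0.2
rng = np.random.default_rng(1)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([-3.0, 0.2]))).astype(float)
beta, se = poisson_irls(X, y)
```

With rare events such as a first stroke in a given year, the Poisson likelihood on these rows closely approximates the discrete-time hazard, which is what makes the Poisson fit a practical substitute for a proportional hazards model with time-varying covariates.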

To investigate the simultaneous associations between the five source-factor categories and ischemic stroke, we used a multivariable mixed-effect Poisson survival regression adjusted for the same set of confounders. Based on visual inspection showing approximately linear exposure-response curves (Supplementary Figure 2), we used linear terms for all five source-specific exposures. Because moderate to high correlations remained between the factors, we also fit single-source models as a sensitivity analysis to ensure that inference was not driven by multicollinearity. Results are presented as rate ratios (RR) and 95 % confidence intervals (CI) per interquartile range (IQR) increase in exposure. For the single-pollutant models, we applied a False Discovery Rate (FDR) correction to adjust for multiple comparisons. We consider the models from this analysis the basic models and refer to them as “two-year gap” models.
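Because the models are log-linear, a coefficient β per unit of exposure translates into a rate ratio per IQR increase as exp(β × IQR), with 95 % confidence limits exp((β ± 1.96 × SE) × IQR). The helper below is a hypothetical illustration; the example back-calculates a per-ng/m3 coefficient from the reported Ni estimate (RR 1.080 per IQR of 0.32 ng/m3) purely to show the arithmetic, and the standard error is a made-up value.

```python
import math


def rr_per_iqr(beta, se, iqr, z=1.959964):
    """Rate ratio and 95% CI for an IQR increase, given a per-unit log-RR."""
    return (
        math.exp(beta * iqr),
        math.exp((beta - z * se) * iqr),
        math.exp((beta + z * se) * iqr),
    )


# Per-ng/m3 coefficient implied by RR 1.080 per 0.32 ng/m3 (illustration only)
beta_ni = math.log(1.080) / 0.32
rr, lower, upper = rr_per_iqr(beta_ni, se=0.05, iqr=0.32)  # se is hypothetical
```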

2.4.2. Comparing methods of cohort specification and EHR data continuity

We compared findings across different person-year inclusion criteria and methods of controlling for EHR data continuity to evaluate the robustness of the observed associations. First, we repeated the “two-year gap” models with additional adjustment for the EHR data continuity index; we refer to these as the “extended two-year gap” models. Second, rather than censoring subjects after two consecutive years without any recorded use of MSHS services, we used a less stringent approach and defined the censoring year as the last year with any evidence of healthcare service utilization. We then repeated the single- and multi-pollutant models described above; we refer to these as “last encounter” models. Finally, we repeated the “last encounter” models with additional adjustment for the EHR data continuity index; we refer to these as “extended last encounter” models.

3. Results

We included 855,147 person-years from 299,923 people, of whom 0.78 % (n = 2351) had an ischemic stroke during the observation period. The mean age at the first year of follow-up was about 46 years; 62 % were women, 45 % identified as White, and almost 11 % identified as Black in both the “two-year gap” and “last encounter” cohorts (Table 1).

Table 1.

Descriptive statistics of the study population (N = 299,923) for the study period (2011–2019).

Variable “Two-year gap” cohort, Mean (SD)/N (%) “Last encounter” cohort, Mean (SD)/N (%)

Age at Cohort Entry 45.97 (17.46) 45.93 (17.47)
Race
White 135,164 (45.07 %) 135,321 (45.06 %)
Black 32,477 (10.83 %) 32,557 (10.84 %)
Other 22,474 (7.49 %) 22,486 (7.49 %)
Unknown/other 109,808 (36.61 %) 109,971 (36.62 %)
Body Mass Index (BMI) 26.56 (5.78) 26.56 (5.87)
Sex
Female 187,200 (62.42 %) 187,442 (62.41 %)
Male 112,723 (37.58 %) 112,893 (37.59 %)
Ever smoking 93,717 (31.20 %) 93,645 (31.22 %)
Ischemic stroke 2351 (0.78 %) 2351 (0.78 %)
Census tract-level covariates
 Proportion Black population 0.11 (0.04) 0.11 (0.04)
 Proportion under the poverty line 0.13 (0.07) 0.13 (0.07)
 Proportion with no high school education 0.04 (0.01) 0.04 (0.01)
 Median household income ($) 92,053.34 (16,909.84) 92,039.59 (16,912.64)
EHR data continuity index 0.87 (0.28) 0.81 (0.34)

EHR = Electronic Health Records; SD = Standard Deviation.

The summary statistics of the PM2.5 chemical components and source-apportioned PM2.5 levels evaluated in this study are presented in Table 2. The correlations between the component exposures and sources are presented in Supplementary Figure 3. The strongest correlations were observed between SO4 and Pb (r = 0.87), NH4 and Pb (r = 0.87), Ca and Fe (r = 0.73), Fe and Zn (r = 0.72), and Fe and EC (r = 0.74). Among the source-apportioned exposures, we observed moderate negative correlations between metal industry and biomass burning (r = −0.57) and between metal industry and oil combustion (r = −0.52), and a high correlation between metal industry and other industrial sources (r = 0.85). The correlations among the other sources were low.

Table 2.

Summary statistics of main exposures derived from the PM2.5 prediction model.

Component Mean (SD) IQR

Br (ng/m3) 2.05 (1.01) 1.09
Ca (ng/m3) 42.44 (8.67) 12.12
Cu (ng/m3) 5.88 (1.64) 2.82
Fe (ng/m3) 92.30 (18.78) 23.00
K (ng/m3) 44.12 (4.58) 7.09
Ni (ng/m3) 1.40 (0.33) 0.32
Pb (ng/m3) 4.23 (1.69) 3.29
Si (ng/m3) 59.69 (11.58) 14.20
V (ng/m3) 0.58 (0.42) 0.30
Zn (ng/m3) 16.54 (4.19) 4.99
EC (μg/m3) 1.00 (0.26) 0.33
NH4 (μg/m3) 0.44 (0.20) 0.28
NO3 (μg/m3) 1.18 (0.25) 0.25
OC (μg/m3) 2.01 (0.29) 0.42
SO4 (μg/m3) 1.10 (0.35) 0.44
Source apportioned PM2.5 (μg/m3)
Biomass Burning 0.94 (0.53) 2.18
Oil Combustion 0.40 (0.31) 2.23
Metal Industry 0.69 (0.37) 0.89
Other Industry 0.39 (0.41) 0.74
Motor Vehicle & Resuspension of Dust 1.90 (0.53) 3.47

Br - Bromine; Ca - Calcium; Cu - Copper; Fe - Iron; K - Potassium; Ni - Nickel; Pb - Lead; Si - Silicon; V - Vanadium; Zn - Zinc; EC - Elemental Carbon; NH4 - Ammonium; NO3 - Nitrate; OC - Organic Carbon; SO4 - Sulfate.

PM2.5 = Fine Particulate Matter; IQR = Interquartile Range; SD = Standard Deviation.

The contribution of the chemical components to each source-factor is shown in Supplementary Fig. 4. We labeled the 1st source as biomass burning based on high loading of K and Br; the 2nd as oil combustion based on high loading of V and Ni; the 3rd as metal-industry based on high loading of Pb; the 4th as other-industrial based on high loading of SO4; and the 5th as motor vehicle and resuspension of dust based on high loadings of EC, Fe, Cu, Ca, and Zn (Nan et al., 2023; Masri et al., 2015).

In the single-pollutant component models, we observed higher risks of ischemic stroke associated with an IQR increase in Ni (RR 1.080, 95 % CI 1.045; 1.116), V (RR 1.070, 95 % CI 1.033; 1.109), Zn (RR 1.076, 95 % CI 1.031; 1.122), and NO3 (RR 1.084, 95 % CI 1.039; 1.132) exposures. We also observed inverse (apparently protective) associations with K (RR 0.858, 95 % CI 0.788; 0.935), SO4 (RR 0.820, 95 % CI 0.681; 0.987), and Br. The association with Br was nonlinear: using 2.8 ng/m3 (the 75th percentile) as the reference, we observed an RR of 0.625 (95 % CI 0.556; 0.702) for an IQR increase in exposure, and the inverse associations were stronger at other values of the distribution (Table 3).

Table 3.

Single-pollutant Poisson regression results for the “two-year gap” model.

Component Rate Ratio (95 % Confidence Interval) P-Value FDR-adjusted P-Value

Br – 75th percentile 0.625 (0.556; 0.702) <0.001 0.005
Ca 1.009 (0.953; 1.068) 0.751 0.818
Cu 1.075 (0.975; 1.185) 0.144 0.262
Fe 1.045 (1.000; 1.092) 0.043 0.098
K 0.858 (0.788; 0.935) <0.001 0.002
Ni 1.080 (1.045; 1.116) <0.001 <0.001
Pb 0.803 (0.571; 1.131) 0.215 0.268
V 1.070 (1.033; 1.109) <0.001 <0.001
Si 0.991 (0.934; 1.050) 0.762 0.789
Zn 1.076 (1.031; 1.122) <0.001 0.004
EC 1.040 (0.995; 1.086) 0.070 0.140
NH4 1.087 (0.959; 1.233) 0.187 0.262
NO3 1.084 (1.039; 1.132) <0.001 <0.001
OC 1.028 (0.958; 1.103) 0.438 0.414
SO4 0.820 (0.681; 0.987) 0.004 0.094

We used single pollutant mixed-effect Poisson survival regressions to investigate the association between each PM2.5 component and ischemic stroke. The models were adjusted for sociodemographic characteristics (age, sex, race, and smoking status), and census tract-level socioeconomic variables (percentage below the poverty line, percentage of Black population, percentage with no high school diploma, and median household income).

We applied a False Discovery Rate (FDR) correction to adjust for multiple comparisons.

Br - Bromine; Ca - Calcium; Cu - Copper; Fe - Iron; K - Potassium; Ni - Nickel; Pb - Lead; Si - Silicon; V - Vanadium; Zn - Zinc; EC - Elemental Carbon; NH4 - Ammonium; NO3 - Nitrate; OC - Organic Carbon; SO4 – Sulfate; PM2.5 = fine particulate matter.
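The manuscript does not state which FDR procedure was used. The Benjamini-Hochberg step-up procedure is a common choice and is sketched below as an assumption, for illustration only; it is not claimed to reproduce the adjusted values in Table 3.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, scaling by m / rank and
    # carrying the running minimum so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```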

In the multivariable analysis assessing the simultaneous associations with source-apportioned PM2.5 exposures, we found a higher risk of ischemic stroke associated with increased exposure to oil combustion sources (RR 1.061, 95 % CI 1.012; 1.113). The RR for the association with biomass burning was 0.750 (95 % CI 0.608; 0.923). The associations with the other source-apportioned exposures were imprecise, with 95 % CIs that included the null value of 1 (Table 4). Given the moderate to high correlations that remained between the oil and industrial sources, we additionally fit single-exposure models, including each source-apportioned exposure separately. This sensitivity analysis showed consistent associations, and the inference remained the same (Supplementary Table 1).

Table 4.

Multi-pollutant Poisson regression results for the “two-year gap” model.

Source apportioned PM2.5 (μg/m3) Rate Ratio (95 % Confidence Interval) P-Value

Biomass Burning 0.750 (0.608; 0.923) 0.006
Oil Combustion 1.061 (1.012; 1.113) 0.013
Metal Industry 0.625 (0.369; 1.059) 0.080
Other Industry 0.911 (0.802; 1.034) 0.150
Motor Vehicle & Resuspension of Dust 0.965 (0.917; 1.014) 0.164

To investigate the simultaneous associations between the five source-factor categories and ischemic stroke, we used a multivariable mixed-effect Poisson survival regression. The model was adjusted for sociodemographic characteristics (age, sex, race, and smoking status) and census tract-level socioeconomic variables (percentage below the poverty line, percentage of Black population, percentage with no high school diploma, and median household income).

We compared different methods of accounting for data continuity to evaluate the sensitivity of the associations to different data and model specifications. In the “two-year gap” cohort, the mean EHR data continuity index was 0.87; 88 % of the person-years included at least one encounter, with an annual average of 2.59 encounters, and 87 % of the person-years had an index value higher than 0.6, the threshold above which EHR data continuity is considered high (Merola et al., 2022). In the “last encounter” cohort, the mean EHR data continuity index was 0.81; 88 % of the person-years included at least one encounter, with an annual average of 2.56 encounters, and 83 % of the person-years had an index value higher than 0.6.

The secondary analysis comparing different definitions of the censoring year showed very similar results. For example, the RR for ischemic stroke associated with an IQR increase in Ni was 1.080 (95 % CI 1.045; 1.116) in the “two-year gap” model and 1.084 (95 % CI 1.049; 1.119) in the “last encounter” model. The RR for an IQR increase in V was 1.070 (95 % CI 1.033; 1.109) in the “two-year gap” model and 1.068 (95 % CI 1.031; 1.106) in the “last encounter” model. The RR for an IQR increase in oil combustion sources was 1.061 (95 % CI 1.012; 1.113) in the “two-year gap” model and 1.060 (95 % CI 1.012; 1.110) in the “last encounter” model. The effect estimates remained similar after additional adjustment for the EHR data continuity index (Fig. 1; Supplementary Table 2).

Fig. 1.


The association between PM2.5 components and sources and ischemic stroke comparing the different methods of cohort specification and control for EHR data continuity.

PM2.5 = Fine Particulate Matter; RR = Rate Ratio; CI = Confidence Interval; Br - Bromine; Ca - Calcium; Cu - Copper; Fe - Iron; K - Potassium; Ni - Nickel; Pb - Lead; Si - Silicon; V - Vanadium; Zn - Zinc; EC - Elemental Carbon; NH4 - Ammonium; NO3 - Nitrate; OC - Organic Carbon; SO4 - Sulfate.

We used single-pollutant mixed-effect Poisson survival regressions to investigate the association between each PM2.5 component and ischemic stroke, and a multivariable model to investigate the simultaneous associations between the five source-apportioned PM2.5 exposures and ischemic stroke. The models were adjusted for sociodemographic characteristics (age, sex, race, and smoking status) and census tract-level socioeconomic variables (percentage below the poverty line, percentage of Black population, percentage with no high school diploma, and median household income).

In the “two-year gap” and “extended two-year gap” models, people were considered censored if at least two consecutive following years showed no indication of Mount Sinai Health System (MSHS) service use. In the “last encounter” and “extended last encounter” models, people were considered censored after the last year with any recorded encounter.

4. Discussion

Our study shows differential risks of ischemic stroke associated with specific PM2.5 components and sources in a large metropolitan area with diverse pollution sources. We found higher risks associated with Ni, V, Zn, and NO3. We also found a higher risk of ischemic stroke associated with exposure to oil combustion-sourced PM2.5, to which V and Ni were the largest contributing components. Furthermore, we show that these associations are not sensitive to different specifications of EHR data continuity.

Numerous epidemiological studies have investigated the effects of PM2.5 on ischemic stroke incidence (Alexeeff et al., 2021; Yuan et al., 2019; O’Donnell et al., 2011). PM2.5 exposure can affect cerebrovascular health by generating reactive oxygen species and increasing oxidative stress (Schlesinger et al., 2006). A recent meta-analysis found a 14 % increase in stroke risk and a 15 % increase in stroke mortality associated with long-term PM2.5 exposure (Fu et al., 2019). Accumulating evidence shows that different PM2.5 components have differential cardiovascular health effects (Ma et al., 2024a; Tian et al., 2024). This has significant implications for large metropolitan areas such as NYC, where dense populations and high building volumes amplify residents’ exposure to a diverse mixture of PM2.5 components. Most current studies, however, have evaluated exposure by particle size without considering its composition (Yitshak et al., 2015; Yitshak-Sade et al., 2018; Kulick et al., 2023).

In this study, we evaluated the differential effects of PM2.5 components and source-apportioned PM2.5 exposures. We found significant associations with V and Ni exposure, which are known tracers of oil combustion emissions (Volkov et al., 2022; Tsygankova et al., 2011). We also found oil combustion-sourced PM2.5 exposure to be significantly associated with increased stroke risk while accounting for contributions from other sources, underscoring the robustness of these findings. Oil combustion-related pollution is known to be associated with increased cardiovascular disease risk (Lewtas, 2007; Zhang et al., 2022b, 2022c). A recent national study in the U.S. found oil-sourced PM2.5 exposure to be associated with an increased risk of death from atherosclerotic cardiovascular disease, with especially pronounced effects in the Northeastern region (Ma et al., 2024b).

We also found elevated ischemic stroke risks associated with Zn and NO3. However, the lack of association with the sources known to emit these components in our multivariable model suggests that these associations may be confounded by other concurrent exposures. That said, in agreement with our findings, NO3 was previously found to be associated with increased risk of stroke mortality (Lin et al., 2016) and non-accidental total mortality (Tuomisto et al., 2008). Zn, a component of the vehicle and suspended dust-related sources, has also been reported to be associated with ischemic stroke in Shanghai (Jeong et al., 2022), California, and Denmark (Haddad et al., 2023). Further studies are therefore needed to explore the role of these pollutants in the context of the exposome.

Contrary to our hypothesis and existing knowledge, we observed lower risks associated with Br, K, SO4, and biomass burning pollution, which are likely attributable to bias or error. Similarly, a 2024 national study found a rate ratio of 0.996 (95 % CI 0.990; 1.002) for the association between biomass and coal burning PM2.5 and atherosclerotic cardiovascular disease mortality in the northeastern U.S. However, in western regions, coal and biomass burning-sourced PM2.5 was significantly associated with increased risks. These regional differences may reflect variations in population characteristics and in environmental and behavioral factors, which are also related to exposure to these components and could introduce residual confounding and bias (Ma et al., 2024a). The relatively large protective effect sizes observed for these components, compared with the more modest effect estimates for components associated with increased stroke risk, further support the hypothesis that these protective associations are likely biased.

An important finding of our study is the robustness of the PM2.5 and stroke associations to different specifications of EHR data continuity. Many large-scale studies of air pollution and stroke use EHR to identify cases. Although EHR gather comprehensive sociodemographic and health information, results may be affected by data completeness and continuity. A major limitation of EHR-based studies is the risk of outcome misclassification due to incomplete health information. Investigators often classify individuals without a record of a given condition as “not having the condition”, potentially misclassifying key variables.

Several researchers have explored misclassification in EHR data (Hubbard et al., 2015; Beesley and Mukherjee, 2022) and reported on the importance of correctly estimating disease incidence and prevalence in the studied population. For EHR-derived conditions, diagnoses may be missed for patients with shorter follow-up periods, gaps in follow-up, and fewer documented visits (Phelan et al., 2017; Goldstein et al., 2016). To address this issue, we evaluated the sensitivity of our associations to different specifications of person-year inclusion criteria and data continuity. The associations did not depend on the method used to define the censoring year, suggesting that the stricter “two-year gap” approach can be used without inducing bias due to the exclusion of individuals with lower healthcare utilization. Further, in the “two-year gap” models, adjustment for the EHR data continuity index did not change the results, suggesting that the “two-year gap” censoring definition is sufficient to account for confounding by differential data continuity.

Our study has several limitations. First, the PM2.5 composition in NYC may not be generalizable to other urban and rural areas. However, our findings remain relevant to the millions of people who reside in NYC and comparable urban centers worldwide. Second, although we used a comprehensive modeling approach to estimate PM2.5 components, measurement error may still be present; this error, however, is minimized by the high spatiotemporal resolution of our exposures. Third, although our NMF model successfully reduced the data dimensionality, allowing us to incorporate the various PM2.5 components in one model, the complexity of separating air pollution sources within a dense urban environment meant that the factorization did not entirely remove the correlation between them. To address this limitation, we added single-exposure models. Although some estimates shifted due to confounding, the moderate to high correlation did not appear to introduce bias or affect the overall inference. Finally, we focused our analysis on the MSHS patient population and do not have information on patients admitted to other healthcare facilities in our study area; however, MSHS is one of the largest healthcare providers in the area and covers a large patient population. Like any EHR-based research, this analysis has inherent limitations. Specifically, stroke was identified using ICD-10 codes, potentially missing cases that were incorrectly coded. Furthermore, although we incorporated individual- and area-level confounders in our models, residual confounding may still be present.

5. Conclusion

In conclusion, we found differential stroke risks associated with specific PM2.5 components and sources in a large metropolitan area. We found an increased risk of ischemic stroke associated with oil combustion-sourced PM2.5 and with known tracers of this emission source (i.e., V and Ni). Exposure to Zn and NO3 was possibly associated with increased stroke risk as well. Our findings can inform policy and interventions aimed at reducing cardiovascular disease burden.

Highlights

  • PM2.5 components (Ni, V, Zn, and NO3) are associated with elevated risks for ischemic stroke.

  • Oil combustion sourced PM2.5 is associated with elevated risk for ischemic stroke.

  • These associations were robust to different specifications of health data continuity.

Acknowledgments

Our co-author, Dr. Heresh Amini, died prematurely on July 12, 2024, after a battle with cancer. He contributed greatly to this manuscript, which, sadly, he did not live to see published. We dedicate this work to his academic excellence and endless optimism, honoring his contributions and memory.

This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Award (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.

Abbreviations and acronyms

PM2.5: Fine Particulate Matter

EHR: Electronic Health Records

ICD-10: International Classification of Diseases, 10th Revision

MSHS: Mount Sinai Health System

NYC: New York City

BIC: Bayesian Information Criterion

RR: Rate Ratio

BMI: Body Mass Index

IQR: Interquartile Range

NMF: Non-negative Matrix Factorization

Footnotes

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Helena Krasnov: Writing – original draft, Formal analysis. Kshitij Sachdev: Writing – review & editing. Pablo Knobel: Writing – review & editing. Elena Colicino: Writing – review & editing. Maayan Yitshak-Sade: Writing – review & editing, Methodology, Conceptualization.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.chemosphere.2025.144390.

Data availability

The data that has been used is confidential.

References

  1. Adamkiewicz G, Liddie J, Gaffin JM, 2020. The respiratory risks of ambient/outdoor air pollution. Clin. Chest Med. 41 (4), 809–824.
  2. Alexeeff SE, Liao NS, Liu X, Van Den Eeden SK, Sidney S, 2021. Long-term PM2.5 exposure and risks of ischemic heart disease and stroke events: review and meta-analysis. J. Am. Heart Assoc. 10 (1), e016890.
  3. Amini H, Danesh-Yazdi M, Di Q, et al., 2022. Hyperlocal Super-learned PM2.5 Components Across the Contiguous US.
  4. Beesley LJ, Mukherjee B, 2022. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 78 (1), 214–226.
  5. Charcteristic CDSh. https://www.nyc.gov/site/dep/environment/air-pollution-regulations.page.
  6. de Bont J, Pickford R, Åström C, et al., 2023. Mixtures of long-term exposure to ambient air pollution, built environment and temperature and stroke incidence across Europe. Environ. Int. 179, 108136.
  7. Feng Y, Castro E, Wei Y, et al., 2024. Long-term exposure to ambient PM2.5, particulate constituents and hospital admissions from non-respiratory infection. Nat. Commun. 15 (1), 1518. 10.1038/s41467-024-45776-0.
  8. Fu P, Guo X, Cheung FMH, Yung KKL, 2019. The association between PM2.5 exposure and neurological disorders: a systematic review and meta-analysis. Sci. Total Environ. 655, 1240–1248. 10.1016/j.scitotenv.2018.11.218.
  9. Goldstein BA, Bhavsar NA, Phelan M, Pencina MJ, 2016. Controlling for informed presence bias due to the number of health encounters in an electronic health record. Am. J. Epidemiol. 184 (11), 847–855.
  10. GRFC, 2018. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the global burden of disease study 2017. Lancet 392 (10159), 1923.
  11. Haddad P, Joss MK, Weuve J, et al., 2023. Long-term exposure to traffic-related air pollution and stroke: a systematic review and meta-analysis. Int. J. Hyg. Environ. Health 247, 114079.
  12. Hubbard RA, Benjamin-Johnson R, Onega T, Smith-Bindman R, Zhu W, Fenton JJ, 2015. Classification accuracy of claims-based methods for identifying providers failing to meet performance targets. Stat. Med. 34 (1), 93–105.
  13. Jeong H, Ryu J-S, Ra K, 2022. Characteristics of potentially toxic elements and multi-isotope signatures (Cu, Zn, Pb) in non-exhaust traffic emission sources. Environ. Pollut. 292, 118339.
  14. Just A, Arfer K, Rush J, Lyapustin A, Kloog I, 2022. XIS-PM2.5: a Daily Spatiotemporal machine-learning Model for PM2.5 in the Contiguous United States. Authorea Preprints.
  15. Knobel P, Hwang I, Castro E, et al., 2023. Socioeconomic and racial disparities in source-apportioned PM2.5 levels across urban areas in the contiguous US, 2010. Atmos. Environ. 303, 119753.
  16. Kulick ER, Kaufman JD, Sack C, 2023. Ambient air pollution and stroke: an updated review. Stroke 54 (3), 882–893. 10.1161/strokeaha.122.035498.
  17. Lewtas J, 2007. Air pollution combustion emissions: characterization of causative agents and mechanisms associated with cancer, reproductive, and cardiovascular effects. Mutat. Res. Rev. Mutat. Res. 636 (1–3), 95–133.
  18. Lin H, Tao J, Du Y, et al., 2016. Differentiating the effects of characteristics of PM pollution on mortality from ischemic and hemorrhagic strokes. Int. J. Hyg. Environ. Health 219 (2), 204–211.
  19. Lin KJ, Glynn RJ, Singer DE, Murphy SN, Lii J, Schneeweiss S, 2018. Out-of-system care and recording of patient characteristics critical for comparative effectiveness research. Epidemiology 29 (3), 356–363.
  20. Liu L, Zhang Y, Yang Z, Luo S, Zhang Y, 2021. Long-term exposure to fine particulate constituents and cardiovascular diseases in Chinese adults. J. Hazard. Mater. 416, 126051. 10.1016/j.jhazmat.2021.126051.
  21. Liu T, Jiang Y, Hu J, et al., 2023. Joint associations of short-term exposure to ambient air pollutants with hospital admission of ischemic stroke. Epidemiology 34 (2), 282–292.
  22. Ma T, Knobel P, Hadley M, et al., 2024a. Source-specific PM2.5 and atherosclerotic cardiovascular disease mortality. NEJM Evid. 3 (12), EVIDoa2400182. 10.1056/EVIDoa2400182.
  23. Ma T, Knobel P, Hadley M, et al., 2024b. PM2.5 components mixture and atherosclerotic cardiovascular disease mortality: a national analysis of Medicare enrollees. medRxiv, 2024.03.23.24304739.
  24. Masri S, Kang CM, Koutrakis P, 2015. Composition and sources of fine and coarse particles collected during 2002–2010 in Boston, MA. J. Air Waste Manag. Assoc. 65 (3), 287–297. 10.1080/10962247.2014.982307.
  25. Merola D, Schneeweiss S, Jin Y, Lii J, Lin KJ, 2022. Advancing an algorithm for the identification of patients with high data continuity in electronic health records. Clin. Epidemiol. 1339–1349.
  26. Nan N, Yan Z, Zhang Y, Chen R, Qin G, Sang N, 2023. Overview of PM2.5 and health outcomes: focusing on components, sources, and pollutant mixture co-exposure. Chemosphere 323, 138181. 10.1016/j.chemosphere.2023.138181.
  27. O’Donnell MJ, Fang J, Mittleman MA, Kapral MK, Wellenius GA, Network, 2011. Fine particulate air pollution (PM2.5) and the risk of acute ischemic stroke. Epidemiology 22 (3), 422–431.
  28. Peng RD, Bell ML, Geyh AS, et al., 2009. Emergency admissions for cardiovascular and respiratory diseases and the chemical composition of fine particle air pollution. Environ. Health Perspect. 117 (6), 957–963.
  29. Phelan M, Bhavsar NA, Goldstein BA, 2017. Illustrating informed presence bias in electronic health records data: how patient interactions with a health system can impact inference. EGEMs 5 (1).
  30. Sade MY, Shi L, Colicino E, et al., 2023. Long-term air pollution exposure and diabetes risk in American older adults: a national secondary data-based cohort study. Environ. Pollut. 320, 121056.
  31. Schlesinger R, Kunzli N, Hidy G, Gotschi T, Jerrett M, 2006. The health relevance of ambient particulate matter characteristics: coherence of toxicological and epidemiological inferences. Inhal. Toxicol. 18 (2), 95–125.
  32. Thomas D, Stram D, Dwyer J, 1993. Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu. Rev. Publ. Health 14 (1), 69–93.
  33. Thurston GD, Burnett RT, Turner MC, et al., 2016. Ischemic heart disease mortality and long-term exposure to source-related components of US fine particle air pollution. Environ. Health Perspect. 124 (6), 785–794.
  34. Tian Y, Ma Y, Wu J, et al., 2024. Ambient PM2.5 chemical composition and cardiovascular disease hospitalizations in China. Environ. Sci. Technol. 58 (37), 16327–16335.
  35. Tsygankova M, Bukin V, Lysakova E, Smirnova A, Reznik A, 2011. The recovery of vanadium from ash obtained during the combustion of fuel oil at thermal power stations. Russ. J. Non-Ferrous Metals 52, 19–23.
  36. Tuomisto JT, Wilson A, Evans JS, Tainio M, 2008. Uncertainty in mortality response to airborne fine particulate matter: combining European air pollution experts. Reliab. Eng. Syst. Saf. 93 (5), 732–744.
  37. Volkov A, Kologrieva U, Stulov P, 2022. Study of forms of compounds of vanadium and other elements in samples of pyrometallurgical enrichment of ash from burning oil combustion at thermal power plants. Materials 15 (23), 8596.
  38. Wang W, Liu C, Ying Z, et al., 2019. Particulate air pollution and ischemic stroke hospitalization: how the associations vary by constituents in Shanghai, China. Sci. Total Environ. 695, 133780. 10.1016/j.scitotenv.2019.133780.
  39. Whitehead J, 1980. Fitting Cox’s regression model to survival data using GLIM. J. Roy. Stat. Soc. C Appl. Stat. 29 (3), 268–275.
  40. Wild CP, 2005. Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol. Biomarkers Prev. 14 (8), 1847–1850.
  41. Yan M, Yang X, Hang W, Xia Y, 2019. Determining the number of factors for non-negative matrix and its application in source apportionment of air pollution in Singapore. Stoch. Environ. Res. Risk Assess. 33, 1175–1186.
  42. Yitshak Sade M, Novack V, Ifergane G, Horev A, Kloog I, 2015. Air pollution and ischemic stroke among young adults. Stroke 46 (12), 3348–3353. 10.1161/strokeaha.115.010992.
  43. Yitshak-Sade M, Kloog I, Novack V, 2017. Do air pollution and neighborhood greenness exposures improve the predicted cardiovascular risk? Environ. Int. 107, 147–153. 10.1016/j.envint.2017.07.011.
  44. Yitshak-Sade M, Bobb JF, Schwartz JD, Kloog I, Zanobetti A, 2018. The association between short and long-term exposure to PM2.5 and temperature and hospital admissions in New England and the synergistic effect of the short-term exposures. Sci. Total Environ. 639, 868–875. 10.1016/j.scitotenv.2018.05.181.
  45. Yitshak-Sade M, Blomberg AJ, Zanobetti A, et al., 2019. County-level radon exposure and all-cause mortality risk among Medicare beneficiaries. Environ. Int. 130, 104865.
  46. Yuan S, Wang J, Jiang Q, et al., 2019. Long-term exposure to PM2.5 and stroke: a systematic review and meta-analysis of cohort studies. Environ. Res. 177, 108587.
  47. Zhang Y, He Q, Zhang Y, Xue X, Kan H, Wang X, 2022a. Differential associations of particle size ranges and constituents with stroke emergency-room visits in Shanghai, China. Ecotoxicol. Environ. Saf. 232, 113237. 10.1016/j.ecoenv.2022.113237.
  48. Zhang Y, Li W, Jiang N, et al., 2022b. Associations between short-term exposure of PM2.5 constituents and hospital admissions of cardiovascular diseases among 18 major Chinese cities. Ecotoxicol. Environ. Saf. 246, 114149.
  49. Zhang Y, He Q, Zhang Y, Xue X, Kan H, Wang X, 2022c. Differential associations of particle size ranges and constituents with stroke emergency-room visits in Shanghai, China. Ecotoxicol. Environ. Saf. 232, 113237.
