Author manuscript; available in PMC: 2024 Jan 1.
Published in final edited form as: Environ Res. 2022 Nov 11;216(Pt 4):114792. doi: 10.1016/j.envres.2022.114792

Measurement Error Correction for Ambient PM2.5 Exposure Using Stratified Regression Calibration: Effects on All-Cause Mortality

Yijing Feng 1, Yaguang Wei 2, Brent A Coull 3, Joel D Schwartz 1,2
PMCID: PMC9729458  NIHMSID: NIHMS1850811  PMID: 36375508

Abstract

Background

Previous studies on the impact of measurement error for PM2.5 were mostly simulation studies, did not control for other pollutants, or used a single regression calibration model to correct for measurement error. However, the relationship between actual and error-prone PM2.5 concentration may vary by time and region. We aim to correct the measurement error of PM2.5 predictions using stratified regression calibration and investigate how the measurement error biases the association between PM2.5 and mortality in the Medicare Cohort.

Methods

The “gold-standard” measurements of PM2.5 were defined as daily monitoring data. We regressed daily monitored PM2.5 on modeled PM2.5 using simple linear regression within strata of season, elevation, census division and time period. Calibrated PM2.5 was calculated with the stratum-specific calibration parameters β0 (intercept) and β1 (slope) for each stratum and aggregated to the annual level. Associations of calibrated and error-prone annual PM2.5 with all-cause mortality among Medicare beneficiaries were estimated with quasi-Poisson regression models.

Results

Across 208 strata, the medians of β0 and β1 were 0.62 (25% 0.20, 75% 1.06) and 0.93 (25% 0.87, 75% 0.99). From calibrated and error-prone PM2.5 data, we estimated that each 10 μg/m3 increase in PM2.5 was associated with 4.9% (95% CI 4.6-5.2) and 4.6% (95% CI 4.4-4.9) increases, respectively, in the mortality rate among Medicare beneficiaries, conditional on confounders.

Conclusions

Regression calibration parameters of PM2.5 varied by time and region. Using error-prone measures of PM2.5 underestimated the association between PM2.5 and all-cause mortality. Modern exposure models produce relatively small bias.

Keywords: Measurement error, Regression calibration, Air pollution, PM2.5, Mortality

1. Introduction

Measurement error in exposure has long been a concern in epidemiological research on air pollution. Many previous studies of measurement error in air pollution were simulation studies that evaluated the influence of measurement error under different assumptions about the structure of the error 1–7. Few studies in the last ten years have evaluated the impact of measurement error on the estimated association between PM2.5 and mortality using real-world data 8,9. A recent review suggested that, most of the time, measurement error yields an underestimate of the effect and inflates its standard error 10, and a recent study extends that conclusion to nonlinear and threshold relationships 11. However, existing studies usually used a single model to correct for measurement error in air pollution and did not control for other air pollutants 8,12–14. In addition, modeling studies suggested that the performance of air pollution prediction models varied by time and region 15,16. Therefore, correcting measurement error of predicted air pollution exposures using a single regression calibration model might not be adequate. Moreover, the error-prone measurements of air pollution exposure in previous studies were mostly predicted by traditional methods such as kriging models and land use regression models 5,8,17,18. Less is known about the measurement error in air pollution data predicted by ensemble machine learning models, which have demonstrated improved predictive performance and fine spatiotemporal resolution.

In this study, we aim to correct the measurement error of PM2.5 using stratified regression calibration models, to evaluate the difference in regression calibration models across time and region, and to investigate how the measurement error from ensemble machine learning models biases the association between PM2.5 and mortality.

2. Methods

2.1. Mortality data

Mortality data for Medicare beneficiaries were obtained from the Medicare beneficiary denominator file from the Centers for Medicare and Medicaid Services. Beneficiaries entered the open cohort on January 1st, 2000 or the first January 1st after enrollment into Medicare, whichever came later. All included beneficiaries were followed until death or Dec 31st, 2016, whichever came first. Information on age, sex, race/ethnicity, Medicaid eligibility, ZIP code of residence and date of death was extracted for each included beneficiary. Age, ZIP code, and Medicaid eligibility were updated annually. Beneficiary data were aggregated by ZIP code and year. We excluded ZIP code-year records with fewer than 100 Medicare beneficiaries.

2.2. PM2.5 monitoring data

Details of the PM2.5 monitoring data have been described elsewhere 15. Briefly, the monitoring data were obtained from several sources, including the Air Quality System (AQS) of the Environmental Protection Agency (EPA), the Clean Air Status and Trends Network (CASTNET), the Interagency Monitoring of Protected Visual Environments (IMPROVE) and other regional or local datasets, covering 2,156 sites across the US. Most of the monitoring sites were located in the Eastern US, on the Western coast, and in urban areas, with fewer sites in rural areas and mountainous regions. Not all monitors operated continuously; some operated every 3 or 6 days. We obtained or aggregated the PM2.5 data into 24-hour averages over the study period (Jan 1st, 2000 – Dec 31st, 2016). In this study, the monitoring data were treated as the “gold standard” of PM2.5 measurement.

2.3. Modeled PM2.5 data

The error-prone PM2.5 measurements were a set of exposure estimates calculated using validated ensemble models at the 1 km × 1 km grid cell level across the US 15. Briefly, the daily average concentration of PM2.5 was estimated by combining predictions from random forest, gradient boosting, and neural network models in a geographically weighted regression. Predictors in the models included aerosol optical depth, meteorology data, chemical transport model simulations and land-use data. The overall cross-validated R² of the model was 0.860, with a spatial R² of 0.894 and a temporal R² of 0.847. The R² varied over time (0.751-0.902), census division (0.769-0.904) and season (0.825-0.901) 15.

Daily PM2.5 at each grid cell was aggregated to the ZIP code level 19. At each monitoring site, the estimated PM2.5 concentration was predicted using a model trained without the PM2.5 monitoring data from that specific site. For each year between 2000 and 2016, we calculated the ZIP code-level annual and seasonal PM2.5 concentrations. Seasonal concentrations were calculated by averaging a ZIP code’s daily concentrations across the whole season (spring was defined as March 1st – May 31st; summer as June 1st – August 31st; fall as September 1st – November 30th; winter as December 1st – the last day of February). Annual concentration was calculated by averaging the daily concentrations over an entire calendar year.

2.3. Estimating calibrated PM2.5 concentration at each ZIP code

At the location of each monitoring site, we extracted the monitored PM2.5 data for the available days during the study period and the modeled PM2.5 concentrations for the corresponding days. We then stratified these data by census division (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain and Pacific), season, elevation (above or below the 75th percentile), and time period (2000-2005, 2006-2010 and 2011-2016). This gave 208 strata in total: 9 × 4 × 2 × 3 = 216 possible combinations, minus 8 because the high-elevation area of the East South Central division was stratified by season only, and not by time period, owing to its limited observations. We fit a stratum-specific calibration model that calibrates the modeled exposures to the monitored data from each stratum, which allows a different calibration intercept and slope for each stratum. Because we conducted regression calibrations separately within each stratum, we were not able to obtain a single estimate of the uncertainty arising from the regression calibration and directly apply that to the health effect estimates. Therefore, we used the bootstrap to account for the uncertainty from regression calibration in the health effect estimates. Specifically, we used a non-parametric bootstrap to obtain estimates of the regression calibration parameters (stratum-specific β0 and β1):

  1. Within a specific stratum defined by census division, elevation and time period, we randomly generated 1,000 bootstrap samples of monitors. The number of sampled monitors was the same as the number of monitors within the stratum (some monitors could be sampled more than once)

  2. We extracted the observations from the sampled monitors within the stratum defined by census division, elevation and time period and generated a new dataset. If a monitor was sampled more than once within the bootstrap sample, we would include multiple copies of observations from that monitor in the new dataset (if a monitor was sampled three times within the bootstrap sample, each observation from that monitor would be replicated three times in the new dataset).

  3. The new dataset was then stratified by season. Within each stratum j (j = 1, 2, …, 208) and bootstrap sample k (k = 1, 2, …, 1,000), we regressed the monitored PM2.5 on the modeled PM2.5 using simple linear regression:

    $$E(\mathrm{PM}_{2.5,\,\mathrm{monitoring}}) = \beta_{0jk} + \beta_{1jk}\,\mathrm{PM}_{2.5,\,\mathrm{modeled}}$$

  4. $\hat{\beta}_{0jk}$ and $\hat{\beta}_{1jk}$ were extracted for each stratum and bootstrap sample.

Because the high elevation area in East South Central had only one monitor, we could not generate bootstrap samples of monitors. Instead, we generated 1,000 bootstrap samples of observations within each season and estimated the stratum-specific $\hat{\beta}_0$ and $\hat{\beta}_1$.
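The monitor-level bootstrap in steps 1-4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code (their analysis was run in R): the function names `fit_line` and `bootstrap_calibration` and the record layout are hypothetical, and the sketch covers one census division × elevation × time period cell at a time.

```python
import random
from collections import defaultdict

def fit_line(pairs):
    """Ordinary least squares for y = b0 + b1 * x; pairs are (x, y) tuples."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def bootstrap_calibration(records, n_boot=1000, seed=0):
    """records: (monitor_id, season, modeled, monitored) tuples for one
    division x elevation x period cell (hypothetical layout).
    Returns {season: [(b0, b1), ...]} with one pair per bootstrap sample."""
    rng = random.Random(seed)
    by_monitor = defaultdict(list)
    for monitor_id, season, modeled, monitored in records:
        by_monitor[monitor_id].append((season, modeled, monitored))
    monitors = sorted(by_monitor)
    draws = defaultdict(list)
    for _ in range(n_boot):
        # Step 1: resample monitors with replacement, same count as observed.
        sampled = rng.choices(monitors, k=len(monitors))
        # Step 2: a monitor drawn r times contributes r copies of its rows.
        by_season = defaultdict(list)
        for monitor_id in sampled:
            for season, modeled, monitored in by_monitor[monitor_id]:
                by_season[season].append((modeled, monitored))
        # Steps 3-4: season-specific regression of monitored on modeled.
        for season, pairs in by_season.items():
            draws[season].append(fit_line(pairs))
    return draws
```

Resampling whole monitors, and not individual days, keeps each monitor's repeated measurements together, which is what lets the bootstrap reflect within-monitor correlation.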

We estimated the stratum-specific regression calibration parameters $\hat{\beta}_{0j}$ and $\hat{\beta}_{1j}$ and their uncertainty from the mean and variance of $\hat{\beta}_{0jk}$ and $\hat{\beta}_{1jk}$ across the bootstrap samples. Cochran’s Q test was used to test the heterogeneity of the regression calibration parameters ($\hat{\beta}_{0j}$ and $\hat{\beta}_{1j}$) across strata.
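As a rough illustration of this step, the sketch below summarizes bootstrap draws for one parameter and computes Cochran's Q across strata. The paper does not state how Q was weighted; the usual inverse-variance form is assumed here, and the function names are hypothetical. Under homogeneity, Q is compared with a chi-square distribution on (number of strata − 1) degrees of freedom.

```python
from statistics import mean, variance

def summarize(draws):
    """Point estimate and variance of one parameter from its bootstrap draws."""
    return mean(draws), variance(draws)

def cochran_q(estimates, variances):
    """Cochran's Q with inverse-variance weights (assumed weighting).
    Returns (Q, degrees of freedom)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, len(estimates) - 1
```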

The modeled seasonal PM2.5 concentrations for each ZIP code in the contiguous U.S. from 2000 to 2016 were also stratified by census division, season, elevation and time period in the same way as the monitoring sites. For each stratum, we estimated 1,000 sets of calibrated PM2.5 concentrations based on the $\hat{\beta}_{0jk}$ and $\hat{\beta}_{1jk}$ of the corresponding stratum:

$$\mathrm{PM}_{2.5,\,\mathrm{calibrated}} = \hat{\beta}_{0jk} + \hat{\beta}_{1jk}\,\mathrm{PM}_{2.5,\,\mathrm{modeled}}$$

In this way, we obtained 1,000 estimates of the calibrated PM2.5 concentration for each ZIP code and season from 2000 to 2016.

Because we were interested in the effect of long-term PM2.5 exposure, we calculated the annual average calibrated PM2.5 concentration at each ZIP code by averaging the concentrations across the four seasons of the corresponding year. This yielded 1,000 calibrated annual PM2.5 concentrations for each ZIP code and each year of the study period.
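The calibration and annual aggregation described above amount to the following sketch for a single ZIP code-year. The function names and the dictionary layout are illustrative assumptions; each element of `draws` stands for one bootstrap set of season-specific parameters for the ZIP code's stratum.

```python
def calibrate(modeled, b0, b1):
    """Apply one stratum-specific calibration line to a modeled concentration."""
    return b0 + b1 * modeled

def annual_calibrated(seasonal_modeled, params):
    """seasonal_modeled: {season: modeled seasonal mean} for one ZIP code-year.
    params: {season: (b0, b1)} for that ZIP code's stratum in one bootstrap draw.
    Returns the mean of the calibrated seasonal concentrations."""
    values = [calibrate(x, *params[season]) for season, x in seasonal_modeled.items()]
    return sum(values) / len(values)

def calibrated_annual_draws(seasonal_modeled, draws):
    """One calibrated annual value per bootstrap parameter set in draws."""
    return [annual_calibrated(seasonal_modeled, params) for params in draws]
```

Because the calibration is applied season by season before averaging, the annual value can differ from what a single annual calibration line would give whenever the seasonal parameters differ.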

2.4. Covariates

Data on annual ozone and nitrogen dioxide were predicted at the 1 km × 1 km level using well-validated ensemble models 15,20,21 and aggregated to the ZIP code level in the same way as the modeled PM2.5 data. Demographic information, including age, sex, race, and Medicaid eligibility, was obtained from the Medicare denominator file. Socioeconomic status (SES) data at each ZIP code, including the percentage of the population living in poverty, the percentage with less than a high school education, and the percentage on public assistance, were linearly extrapolated from the 2000 and 2010 US census and obtained annually from the American Community Survey for 2011-2016. Data on access to health care, including the average annual percentage of Medicare enrollees having at least one ambulatory visit to a primary care clinician and the distance to the nearest hospital, were obtained or calculated from data on the Dartmouth Health Atlas website. Population density data were obtained from the NASA Socioeconomic Data and Applications Center’s annual mean 30 second population density data at the 1 km × 1 km level and interpolated and aggregated to the annual ZIP code level. Meteorology data, including the ZIP code-level summer and winter means of daily temperature, were obtained and calculated from NASA Daymet data. Calendar year was also included as a covariate to account for potential temporal trends.

2.5. Health outcome models

Calibrated and uncalibrated (modeled) PM2.5 were matched to the mortality data by ZIP code and year. We first estimated the association between calibrated PM2.5 and all-cause mortality by running a quasi-Poisson regression model with each of the 1,000 sets of calibrated ZIP code-level annual PM2.5 (k = 1, 2, …, 1,000):

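$$\log\big(E(\text{death counts} \mid C,\ \mathrm{PM}_{2.5,\,\mathrm{calibrated}},\ \text{person counts})\big) = \bar{\theta}_{0k} + \bar{\theta}_{1k}\,\mathrm{PM}_{2.5,\,\mathrm{calibrated}} + \bar{\theta}_{2k}\,C + \log(\text{person counts})$$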

Death counts and person counts denote the number of deaths among Medicare beneficiaries and the total number of Medicare beneficiaries within a specific ZIP code and year. C denotes the covariates, including demographics, SES, access to healthcare, population density, meteorology, ozone, NO2 and calendar year. For demographic information, we aggregated individual data to the ZIP code level and adjusted for the percentage of beneficiaries who were male, the percentage aged between 65 and 74, the percentage aged between 75 and 84, the percentage who were white, and the percentage who were eligible for Medicaid. From each model, we obtained the point estimate $\bar{\theta}_{1k}$ and its estimated variance $\mathrm{Var}(\bar{\theta}_{1k})$. The point estimate of $\bar{\theta}_1$ was the mean of the 1,000 $\bar{\theta}_{1k}$. The uncertainty in $\bar{\theta}_1$ arose from two sources: (1) the regression calibration and (2) the health outcome model. Because all 1,000 sets of calibrated PM2.5 were matched to the same mortality data, we estimated the variance of $\bar{\theta}_1$ using Rubin’s rule:

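$$\mathrm{Var}(\bar{\theta}_1) = \frac{1}{m}\sum_{k=1}^{m}\mathrm{Var}(\bar{\theta}_{1k}) + \left(1+\frac{1}{m}\right)\times\left(\frac{1}{m-1}\sum_{k=1}^{m}\left(\bar{\theta}_{1k}-\bar{\theta}_1\right)^2\right)$$

where m was the number of bootstrap samples, which was 1,000 in this study.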
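A small sketch of this pooling step, assuming the per-sample point estimates and variances are already available as two lists. The function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates with Rubin's rule.
    estimates: point estimate from each bootstrap sample.
    variances: model-based variance of each point estimate.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    within = mean(variances)        # average health-model variance
    between = variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return mean(estimates), total
```

The within term carries the uncertainty of the health outcome model, and the between term carries the uncertainty propagated from the regression calibration.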

The association between uncalibrated PM2.5 and all-cause mortality was estimated with a single quasi-Poisson regression model adjusted for the same covariates described above.
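For readers less familiar with the quasi-Poisson family, the block below states the standard variance assumption behind it. The symbols $\mu_i$, $\phi$, n (number of ZIP code-year records) and p (number of regression coefficients) are notation introduced here for illustration; the paper does not report the estimated dispersion.

```latex
\mathrm{Var}(Y_i) = \phi\,\mu_i,
\qquad
\hat{\phi} = \frac{1}{n-p}\sum_{i=1}^{n}\frac{(y_i-\hat{\mu}_i)^2}{\hat{\mu}_i}
```

The coefficient estimates are the same as those from an ordinary Poisson fit; only the standard errors change, being multiplied by $\sqrt{\hat{\phi}}$ to allow for overdispersed death counts.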

2.6. Low-level analysis

We restricted our analysis to ZIP code areas with calibrated PM2.5 concentrations lower than 10 μg/m3 to investigate the association between calibrated PM2.5 and all-cause mortality at concentrations well below the current national standard. We also conducted the same analysis for the uncalibrated PM2.5 concentration. A previous publication indicated that there was little difference in the exposure error above or below 10 μg/m3 22.

All analyses were conducted using R software, version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria).

3. Results

3.1. ZIP code-level characteristics

Mortality data from 28,631 ZIP codes for 2000-2016 were included in the analysis. In total, 30,344,997 beneficiaries died during the study period. Across all ZIP code-years, the median number of beneficiaries per ZIP code was 580; the median percentage of Medicare beneficiaries aged over 75 was 50%; the median percentage of male beneficiaries was 44%; and the median percentage of white beneficiaries was 96%. SES, access to healthcare, temperature and other air pollution levels are summarized in Table 1.

Table 1.

Characteristics of the included ZIP codes between 2000-2016.

| Characteristic | Overall |
| --- | --- |
| ZIP code-years | 506,897 |
| Number of beneficiaries | 580.00 [230.00, 1859.00] |
| Population density | 53.98 [14.18, 536.46] |
| Characteristics of Medicare beneficiaries (median percentage [IQR]) | |
| Aged between 65-74 | 50 [46, 55] |
| Aged between 75-84 | 35 [32, 38] |
| Male | 44 [41, 47] |
| White | 96 [86, 99] |
| Black | 1 [0, 6] |
| Eligible for Medicaid | 10 [6, 16] |
| ZIP code-level socioeconomic status (median percentage [IQR]) | |
| Education below high school | 34 [27, 39] |
| Poverty | 12 [8, 18] |
| Population on public assistance | 2 [1, 3] |
| Access to healthcare | |
| Percent of beneficiaries with an ambulatory visit | 80 [77, 83] |
| Distance to the nearest hospital (km) | 8.81 [3.06, 17.16] |
| Meteorology and other pollutants | |
| Mean temperature in summer (°C) | 23.17 [20.96, 25.95] |
| Mean temperature in winter (°C) | 1.66 [−2.50, 7.47] |
| NO2 (ppb) | 15.5 [11.0, 22.3] |
| Ozone (ppb) | 38.9 [37.0, 41.0] |

3.2. Regression calibration

In total, 2,774,423 observations of PM2.5 monitoring data from 2,156 sites were used to develop the regression calibration models. Within each stratum, the median number of observations was 4,028 and the median number of monitoring sites was 25. Across the whole study period, the median monitored PM2.5 was 8.40 μg/m3 (25% 5.20, 75% 13.20). Cochran’s Q test suggested that both β0 (p<0.001) and β1 (p<0.001) were heterogeneous across the 208 strata. The calibration parameters for each stratum are shown in Figures 1–3 and Figures 4–6. Across the 208 strata, the median β0 was 0.58 (25% 0.11, 75% 1.17) and the median β1 was 0.93 (25% 0.86, 75% 0.99). The lowest β0 was −2.61, estimated from the high elevation area of the Middle Atlantic region in summer during 2011-2016, while the highest was 5.30, estimated from the high elevation area of the South Atlantic region in summer during 2000-2005. The lowest β1 was 0.08, estimated from the high elevation area of New England in winter during 2011-2016, while the highest was 1.33, estimated from the high elevation area of the Middle Atlantic region in summer during 2011-2016.

Figure 1.

The distribution of calibration parameter β1 across strata by census division, season and elevation between 2000-2005.

From the stratified regression calibration, we estimated the relationship between monitoring PM2.5 and predicted PM2.5 level. β1 is the slope of the regression calibration model. This figure shows the median and 95% confidence interval of β1 from different strata defined by elevation, season and census division.

Figure 3.

The distribution of calibration parameter β1 across strata by census division, season and elevation between 2011-2016.

From the stratified regression calibration, we estimated the relationship between monitoring PM2.5 and predicted PM2.5 level. β1 is the slope of the regression calibration model. This figure shows the median and 95% confidence interval of β1 from different strata defined by elevation, season and census division.

Figure 4.

The distribution of calibration parameter β0 across strata by census division, season and elevation between 2000-2005.

From the stratified regression calibration, we estimated the relationship between monitoring PM2.5 and predicted PM2.5 level. β0 is the intercept of the regression calibration model. This figure shows the median and 95% confidence interval of β0 from different strata defined by elevation, season and census division.

Figure 6.

The distribution of calibration parameter β0 across strata by census division, season and elevation between 2011-2016.

From the stratified regression calibration, we estimated the relationship between monitoring PM2.5 and predicted PM2.5 level. β0 is the intercept of the regression calibration model. This figure shows the median and 95% confidence interval of β0 from different strata defined by elevation, season and census division.

For areas with low elevation, the median β0 was 0.65 (25% 0.26, 75% 0.98) and the median β1 was 0.93 (25% 0.89, 75% 0.96), while the corresponding values for areas with high elevation were 0.54 (25% 0.04, 75% 1.15) and 0.93 (25% 0.83, 75% 1.01). The distribution of the regression calibration parameters did not vary much by season. The median β0 values were 0.53 (25% 0.11, 75% 1.04), 0.56 (25% 0.20, 75% 0.93), 0.84 (25% 0.35, 75% 1.28) and 0.54 (25% 0.13, 75% 0.88), and the median β1 values were 0.95 (25% 0.88, 75% 0.98), 0.93 (25% 0.86, 75% 0.98), 0.92 (25% 0.87, 75% 0.97) and 0.93 (25% 0.87, 75% 0.97), for summer, fall, winter, and spring respectively. The regression calibration parameters varied across time periods. The median β0 values were 0.76, 0.25 and 0.75 during 2000-2005, 2006-2010 and 2011-2016, while the median β1 values were 0.93, 0.97 and 0.89 respectively. The regression calibration parameters also varied by census division. The lowest median β1 was 0.88, estimated from the West South Central region, while the highest was 0.99, from the Middle Atlantic region. The lowest median β0 was 0.18, estimated from the Middle Atlantic region, while the highest was 1.15, estimated from the West South Central region.

3.3. Modeled and calibrated PM2.5

During the study period, the median modeled PM2.5 across the 28,631 ZIP codes was 9.67 μg/m3 (25% 7.69, 75% 11.81). For the calibrated PM2.5, the mean of the medians across the 1,000 calibration sets was 9.62 μg/m3, while the means of the 25th and 75th percentiles were 7.61 μg/m3 and 11.82 μg/m3 respectively. Annual PM2.5 gradually decreased in the US over the study period. In 2000, the median modeled PM2.5 was 12.3 μg/m3 and the mean of the median calibrated PM2.5 across the 1,000 calibration sets was also 12.3 μg/m3; in 2016, the corresponding values were 7.2 and 7.3 μg/m3.

3.4. Health outcome regression

We obtained estimates of $\bar{\theta}_{1k}$ and $\mathrm{Var}(\bar{\theta}_{1k})$ from the 1,000 sets of calibrated annual ZIP code-level PM2.5. The mean of $\bar{\theta}_{1k}$ was 0.048, the variance of the 1,000 point estimates was $9.47\times10^{-7}$ (standard deviation = $9.73\times10^{-4}$) and the mean of $\mathrm{Var}(\bar{\theta}_{1k})$ was $1.18\times10^{-6}$ (mean standard error = $1.1\times10^{-3}$). Therefore, from the calibrated PM2.5 data, we estimated that each 10 μg/m3 increase in PM2.5 was associated with a 4.9% (95% CI 4.6-5.2) increase in the estimated rate of death among Medicare beneficiaries after adjusting for demographics, SES, access to healthcare, population density, meteorology covariates, ozone, NO2 and calendar year (Table 2).
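As a check on how the two variance components combine, the arithmetic below applies Rubin's rule to the values reported above. It assumes that the coefficient is expressed per 10 μg/m3 and that the interval is a normal-approximation (Wald) interval; neither is stated explicitly in the text, though both reproduce the reported 4.9% (4.6-5.2).

```latex
\begin{aligned}
\mathrm{Var}(\bar{\theta}_1) &= 1.18\times10^{-6} + \left(1+\tfrac{1}{1000}\right)\times 9.47\times10^{-7} \approx 2.13\times10^{-6} \\
\mathrm{SE}(\bar{\theta}_1) &= \sqrt{2.13\times10^{-6}} \approx 1.46\times10^{-3} \\
\bar{\theta}_1 \pm 1.96\,\mathrm{SE} &\approx (0.0451,\ 0.0509) \\
\left(e^{0.0451},\ e^{0.048},\ e^{0.0509}\right) &\approx (1.046,\ 1.049,\ 1.052)
\end{aligned}
```

The between-sample term contributes a little under half of the total variance, so the calibration uncertainty is comparable in size to that of the health outcome model.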

Table 2.

Association between calibrated and modeled PM2.5 concentration and all-cause mortality among Medicare beneficiaries between 2000-2016.

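| Area | Exposure | Mortality rate ratio a (95% CI) | Percent change in effect estimate |
| --- | --- | --- | --- |
| The whole US | Modeled PM2.5 | 1.046 (1.044, 1.048) | |
| The whole US | Calibrated PM2.5 | 1.049 (1.046, 1.052) | 7% |
| Areas with annual PM2.5 ≤10 μg/m3 | Modeled PM2.5 | 1.073 (1.068, 1.078) | |
| Areas with annual PM2.5 ≤10 μg/m3 | Calibrated PM2.5 | 1.081 (1.074, 1.089) | 11% |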
a. All the mortality rate ratios were adjusted for the percentage of beneficiaries who were male, the percentage aged between 65 and 74, the percentage aged between 75 and 84, the percentage who were white, and the percentage who were eligible for Medicaid; the percentage of the population living in poverty, the percentage with less than a high school education and the percentage on public assistance; the average annual percentage of Medicare enrollees having at least one ambulatory visit to a primary care clinician and the distance to the nearest hospital; population density; the summer and winter means of daily temperature at the ZIP code level; and calendar year.

Using the modeled PM2.5 data, we estimated that each 10 μg/m3 increase in annual ZIP code-level PM2.5 was associated with a 4.6% (95% CI 4.4-4.9) increase in the estimated rate of death among Medicare beneficiaries after adjusting for the covariates mentioned above. This is about a 7% downward bias due to measurement error.

When we restricted the analysis to ZIP codes with PM2.5 ≤10 μg/m3, each 10 μg/m3 increase in the calibrated annual ZIP code-level PM2.5 was associated with an 8.1% (95% CI 7.4-8.9) increase in the estimated rate of death among Medicare beneficiaries, while each 10 μg/m3 increase in the modeled annual ZIP code-level PM2.5 was associated with a 7.3% (95% CI 6.8-7.8) increase, or about an 11% downward bias. Both results were adjusted for demographics, SES, access to healthcare, population density, meteorology covariates, ozone, NO2 and calendar year.
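The paper does not spell out how the percent change in the effect estimate was computed. One reading that reproduces the reported 7% and 11% from the rate ratios in Table 2 is the relative change in the excess rate (rate ratio minus one); the helper below is a hypothetical illustration of that reading.

```python
def percent_change(rr_modeled, rr_calibrated):
    """Relative change (%) in the excess mortality rate when moving from
    the modeled to the calibrated exposure. Assumed formula, not the paper's."""
    excess_modeled = rr_modeled - 1.0
    excess_calibrated = rr_calibrated - 1.0
    return 100.0 * (excess_calibrated - excess_modeled) / excess_modeled

whole_us = percent_change(1.046, 1.049)    # about 6.5, reported as 7%
low_level = percent_change(1.073, 1.081)   # about 11.0, reported as 11%
```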

4. Discussion

In this study, we allowed regression calibration models to vary by strata of census division, elevation, season and time period. Heterogeneity in the regression calibration parameters was observed across strata. Overall, using error-prone measures of PM2.5 estimated by ensemble machine learning models underestimated the association between PM2.5 and all-cause mortality among the Medicare population by about 7%. Both the effect estimate and the bias due to measurement error were larger at lower (< 10 μg/m3) concentrations.

Methods including risk set regression calibration, regression calibration using instrumental variables, simulation extrapolation (SIMEX), and non-parametric and parametric bootstrap have been employed to correct for measurement error of air pollution in previous studies 8,23–26. Our study adds to the existing literature a more flexible method for addressing measurement error in modeled air pollution data, and it controls for multiple pollutants.

Previous exposure modeling studies suggested that the performance of air pollution prediction models could vary by location, season, and time 15,16. In another study, Kioumourtzoglou et al. 27 estimated calibration coefficients using a mixed effect model and observed heterogeneity in the coefficients between cities. We used different regression calibration models by stratum, so the relationship between the “gold standard” and its error-prone measurements could vary spatially and temporally. We observed that the calibration coefficients did vary across strata. Specifically, in areas with low elevation the slopes between measured and modeled PM2.5 were mostly close to one, suggesting that effect estimates in the health outcome models would be subject to only small bias in these areas. However, for areas with high elevation, the corresponding slopes had larger variation and were further from one, especially in the New England, East North Central, and West South Central regions. Therefore, effect estimates in areas with high elevation using predicted PM2.5 may be subject to larger bias. The higher uncertainty in the calibration coefficients within high elevation areas could be due to sparser monitoring data (compared with low elevation areas) or to poorer performance of some of the predictors in those areas. Di et al. 15 found that the performance of their prediction models was lower in high elevation areas, which could explain why the slopes between measured and modeled PM2.5 were further from one in these areas. We also observed that the regression calibration parameters varied by time. The slopes for 2011-2016 were lower than the slopes in the earlier years. The difference could be due to changes in PM2.5 levels and in the availability and scope of the predictors over time.
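The link between a calibration slope near one and a small bias can be made explicit with the usual regression calibration approximation. The sketch below is a heuristic for a single stratum: X denotes the monitored concentration, W the modeled concentration, and $\theta_1^{*}$ the coefficient obtained when W is used in place of X. This notation is introduced here, and the argument ignores covariates, the variation in slopes across strata, and the aggregation to annual averages.

```latex
\begin{aligned}
E(X \mid W) &= \beta_0 + \beta_1 W \\
\log E(Y \mid W) &\approx \theta_0 + \theta_1\,E(X \mid W) = (\theta_0 + \theta_1\beta_0) + \theta_1\beta_1\,W \\
\theta_1^{*} &\approx \beta_1\,\theta_1
\end{aligned}
```

Under these simplifications, a slope of 0.93 (the median across strata) would correspond to $\theta_1/\theta_1^{*} \approx 1/0.93 \approx 1.075$, which is of the same order as the roughly 7% difference between the calibrated and uncalibrated estimates.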

Several previous studies used regression calibration to correct for measurement error in PM2.5. Hart et al. 8 found that each 10 μg/m3 increase in measurement error-adjusted PM2.5 was associated with a 27% increase in the hazard of death, while the corresponding figure for error-prone PM2.5 was 20%, which translates into a 35% change in the health effect estimate. In another study investigating the association between PM2.5 and lung cancer 14, the authors found that correcting for measurement error could increase the effect estimate by 10-38%. We observed a 7% increase in the effect estimate after correcting for measurement error, which was lower than in the previous studies. One potential reason is that the error-prone PM2.5 in our study was estimated from ensemble machine learning models, which yielded smaller bias than traditional modeling methods 15,28, as reported by Di and coworkers. In addition, the gold standard measures in the previous studies were personal exposure instead of ambient exposure, and therefore the predicted ambient PM2.5 concentration would be a worse proxy for the “gold standard” measurements in those studies.

We chose neighborhood ambient concentrations as our gold standard for several reasons. First, it is outdoor concentrations, and not personal exposures, that are monitored and regulated by governments. Hence the policy-relevant question is what effect that exposure metric has on health. Second, as noted by Webster and Weisskopf, personal exposures are correlated with a wide range of personal behaviors which may themselves be risk factors for health 29,30. Hence studies using personal exposure would need to control for them as confounders, although they are rarely measured and hard to measure. For example, driving a car increases personal exposure, but also increases stress. However, that stress is associated with the exposure inside the car, not the ambient exposure. Hence, they argue that the ambient exposure is less likely to be confounded and can act as an instrumental variable for the personal exposure. Using ambient exposure also means that it is neighborhood and not individual SES that is the potential confounder. Air pollution is associated with SES because neighborhoods with low SES tend to be closer to pollution sources than neighborhoods with high SES. However, an individual with low SES who happens to live in a high SES neighborhood would receive the lower exposure of the neighborhood. This simplifies control for confounding since neighborhood-level factors are easier to obtain.

A number of simulation studies have investigated the impact of concentration estimation and measurement error structure on the health effect estimate. Szpiro et al. 31 found that more accurate exposure prediction from land use models does not necessarily lead to improved health effect estimates when using spatially misaligned data. However, in another study, Samoli et al. 6 found that using a complex hybrid prediction model that incorporated LUR, a dispersion model and machine learning methods yielded the least bias in estimating the long-term health effect of PM2.5, which accords with our finding that the bias in the effect estimate was smaller in our study than in previous studies. In addition, a simulation study conducted by Gryparis et al. 32 suggested that out-of-sample regression calibration could correct the measurement error of spatially misaligned data effectively and efficiently, which supports our choice of measurement error correction method. Although measurement error in air pollution biased the health effect estimate toward the null under most circumstances, previous simulation studies observed that the effect estimate could be biased away from the null in specific scenarios. Goldman et al. 3 found that, while additive Berkson error would not bias the health effect estimate, Berkson error on the log scale (i.e. the multiplicative scale) led to an overestimate of the effect of air pollution exposure on CVD in a time-series analysis. Butland et al. 1 observed that when there was high correlation and a low ratio of the variance of the error-prone exposure to that of the “true” air pollution concentration, the health effect estimate could be biased away from the null by more than 25%. However, for more common scenarios she found that the health effect estimates were biased downward. Our results suggest that these extreme scenarios might not reflect real-world conditions.

One key strength of our study was that the regression calibration models for measurement error correction varied by time and region. In addition, the abundance of PM2.5 monitoring data allowed us to build regression calibration models for a large number of strata, whereas some other studies have been restricted to calibrations using dozens to a few hundred gold standard observations. Therefore, our analysis allowed for flexibility in modeling the relationship between monitored and error-prone measurements of PM2.5, and did not assume the measurement error to be constant spatially and temporally.

This study also has several limitations. First, the calibrated PM2.5 was calculated at the ZIP code level, while the regression calibration model was estimated at the coordinate level. However, given that each ZIP code area lies within a single census division and elevation usually varies little within a ZIP code, we believe the results would not be substantially affected by within-ZIP code variation. Second, the gold standard measurement of PM2.5 in this study was monitoring data rather than personal exposure from ambient sources, so the calibrated PM2.5 may differ from each person's actual exposure. However, as noted earlier, the ambient air pollution level is more relevant to public regulation and less susceptible to confounding, and we believe the tradeoff is worthwhile. Third, the PM2.5 level in our study was not population-weighted. However, weighting the exposure by population size could introduce additional uncertainty (due to error-prone estimates of grid cell populations), which may lead to larger measurement error. Fourth, due to the limited number of monitors and observations, we were not able to sample the monitors and run separate regression calibrations for different time periods in the high-elevation area of the East South Central region. However, the limited number of monitors suggests that only a few people live in the high-elevation area of this region, so the results of the health outcome analysis would not be substantially affected. Moreover, for the analysis below 10 μg/m3, we directly applied the original regression calibration models, which might not fully capture the relationship between monitored and modeled PM2.5 at those concentrations. Lastly, all of the confounders included in the health outcome model were calculated at the ZIP code level, which could leave residual confounding. Again, the epidemiological results depend on the assumption of no unmeasured confounding.

5. Conclusions

In this study, we used different regression calibration models across strata formed by cross-classifying census division, elevation and season to correct for measurement error in modeled PM2.5. We observed that the regression calibration parameters varied by stratum. Future regression calibration studies of PM2.5 should consider allowing a more flexible relationship between the true and error-prone concentrations of the exposure. Using the error-prone measure of PM2.5 underestimated the association between PM2.5 and all-cause mortality. However, modern exposure models of PM2.5 produce relatively small bias in the resulting mortality effect estimates.
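To put the size of this bias in perspective, the two estimates reported in the abstract (4.9% and 4.6% increases in the mortality rate per 10 μg/m3, from calibrated and error-prone PM2.5 respectively) can be compared on the log rate ratio scale. The short calculation below is only a back-of-the-envelope sketch: the variable names are ours, and the comparison ignores the confidence intervals around both estimates.

```python
import math

# Percent increases in mortality rate per 10 ug/m3, taken from the abstract.
pct_calibrated = 4.9
pct_error_prone = 4.6

# Convert each percent increase to a log rate ratio (the regression coefficient scale).
log_rr_calibrated = math.log(1 + pct_calibrated / 100)
log_rr_error_prone = math.log(1 + pct_error_prone / 100)

# Ratio of the error-prone coefficient to the calibrated coefficient.
ratio = log_rr_error_prone / log_rr_calibrated
print(f"error-prone / calibrated coefficient: {ratio:.3f}")
```

On this scale the error-prone coefficient is about 6% smaller than the calibrated one.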

Figure 2.

The distribution of calibration parameter β1 across strata by census division, season and elevation from 2006 to 2010.

From the stratified regression calibration, we estimated the relationship between monitored PM2.5 and predicted PM2.5 level. β1 is the slope of the regression calibration model. This figure shows the median and 95% confidence interval of β1 from different strata defined by elevation, season and census division.

Figure 5.

The distribution of calibration parameter β0 across strata by census division, season and elevation from 2006 to 2010.

From the stratified regression calibration, we estimated the relationship between monitored PM2.5 and predicted PM2.5 level. β0 is the intercept of the regression calibration model. This figure shows the median and 95% confidence interval of β0 from different strata defined by elevation, season and census division.

  • The relationship between monitored PM2.5 (true PM2.5 level) and the predicted PM2.5 level varies substantially by time and region.

  • We propose a stratified regression calibration method for correcting measurement error in air pollution, which allows a flexible relationship between the true air pollution level and its error-prone measure.

  • Measurement error in PM2.5 levels predicted by advanced ensemble machine learning models leads to only a small bias in the estimated association between PM2.5 and all-cause mortality.

Funding

This work was supported by the National Institutes of Health [R01ES032418 and ES000002].

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • 1. Butland BK, Samoli E, Atkinson RW, Barratt B, Katsouyanni K. Measurement error in a multi-level analysis of air pollution and health: a simulation study. Environ Health 2019;18(1):13.
  • 2. Evangelopoulos D, Katsouyanni K, Schwartz J, Walton H. Quantifying the short-term effects of air pollution on health in the presence of exposure measurement error: a simulation study of multi-pollutant model results. Environ Health 2021;20(1):94.
  • 3. Goldman GT, Mulholland JA, Russell AG, Strickland MJ, Klein M, Waller LA, Tolbert PE. Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ Health 2011;10:61.
  • 4. Kim SY, Sheppard L, Kim H. Health effects of long-term air pollution: influence of exposure prediction methods. Epidemiology 2009;20(3):442–50.
  • 5. Basagana X, Aguilera I, Rivera M, Agis D, Foraster M, Marrugat J, Elosua R, Kunzli N. Measurement error in epidemiologic studies of air pollution based on land-use regression models. Am J Epidemiol 2013;178(8):1342–6.
  • 6. Samoli E, Butland BK, Rodopoulou S, Atkinson RW, Barratt B, Beevers SD, Beddows A, Dimakopoulou K, Schwartz JD, Yazdi MD, Katsouyanni K. The impact of measurement error in modeled ambient particles exposures on health effect estimates in multilevel analysis: A simulation study. Environ Epidemiol 2020;4(3):e094.
  • 7. Strickland MJ, Gass KM, Goldman GT, Mulholland JA. Effects of ambient air pollution measurement error on health effect estimates in time-series studies: a simulation-based analysis. J Expo Sci Environ Epidemiol 2015;25(2):160–6.
  • 8. Hart JE, Liao X, Hong B, Puett RC, Yanosky JD, Suh H, Kioumourtzoglou MA, Spiegelman D, Laden F. The association of long-term exposure to PM2.5 on all-cause mortality in the Nurses’ Health Study and the impact of measurement-error correction. Environ Health 2015;14:38.
  • 9. Wu X, Braun D, Kioumourtzoglou MA, Choirat C, Di Q, Dominici F. Causal Inference in the Context of an Error Prone Exposure: Air Pollution and Mortality. Ann Appl Stat 2019;13(1):520–547.
  • 10. Richmond-Bryant J, Long TC. Influence of exposure measurement errors on results from epidemiologic studies of different designs. J Expo Sci Environ Epidemiol 2020;30(3):420–429.
  • 11. Wei Y, Qiu X, Danesh Yazdi M, Shtein A, Yang J, Peralta A, Coull B, Schwartz J. The impact of exposure measurement error on the estimated concentration-response relationship between long-term exposure to PM2.5 and Mortality. Environ Health Perspect 2022;130(7).
  • 12. Strand M, Vedal S, Rodes C, Dutton SJ, Gelfand EW, Rabinovitch N. Estimating effects of ambient PM(2.5) exposure on health using PM(2.5) component measurements and regression calibration. J Expo Sci Environ Epidemiol 2006;16(1):30–8.
  • 13. Bateson TF, Wright JM. Regression calibration for classical exposure measurement error in environmental epidemiology studies using multiple local surrogate exposures. Am J Epidemiol 2010;172(3):344–52.
  • 14. Hart JE, Spiegelman D, Beelen R, Hoek G, Brunekreef B, Schouten LJ, van den Brandt P. Long-Term Ambient Residential Traffic-Related Exposures and Measurement Error-Adjusted Risk of Incident Lung Cancer in the Netherlands Cohort Study on Diet and Cancer. Environ Health Perspect 2015;123(9):860–6.
  • 15. Di Q, Amini H, Shi L, Kloog I, Silvern R, Kelly J, Sabath MB, Choirat C, Koutrakis P, Lyapustin A, Wang Y, Mickley LJ, Schwartz J. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environ Int 2019;130:104909.
  • 16. Fang X, Zou B, Liu X, Sternberg T, Zhai L. Satellite-based ground PM2.5 estimation using timely structure adaptive modeling. Remote Sensing of Environment 2016;186:152–163.
  • 17. Sellier Y, Galineau J, Hulin A, Caini F, Marquis N, Navel V, Bottagisi S, Giorgis-Allemand L, Jacquier C, Slama R, Lepeule J, Group EM-CCS. Health effects of ambient air pollution: do different methods for estimating exposure lead to different results? Environ Int 2014;66:165–73.
  • 18. Alexeeff SE, Schwartz J, Kloog I, Chudnovsky A, Koutrakis P, Coull BA. Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: insights into spatial variability using high-resolution satellite data. J Expo Sci Environ Epidemiol 2015;25(2):138–44.
  • 19. Wei Y, Wang Y, Wu X, Di Q, Shi L, Koutrakis P, Zanobetti A, Dominici F, Schwartz JD. Causal Effects of Air Pollution on Mortality Rate in Massachusetts. Am J Epidemiol 2020;189(11):1316–1323.
  • 20. Di Q, Amini H, Shi L, Kloog I, Silvern R, Kelly J, Sabath MB, Choirat C, Koutrakis P, Lyapustin A, Wang Y, Mickley LJ, Schwartz J. Assessing NO2 Concentration and Model Uncertainty with High Spatiotemporal Resolution across the Contiguous United States Using Ensemble Model Averaging. Environ Sci Technol 2020;54(3):1372–1384.
  • 21. Requia WJ, Di Q, Silvern R, Kelly JT, Koutrakis P, Mickley LJ, Sulprizio MP, Amini H, Shi L, Schwartz J. An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States. Environ Sci Technol 2020;54(18):11037–11047.
  • 22. Schwartz JD, Yitshak-Sade M, Zanobetti A, Di Q, Requia WJ, Dominici F, Mittleman MA. A self-controlled approach to survival analysis, with application to air pollution and mortality. Environ Int 2021;157.
  • 23. Szpiro AA, Sheppard L, Lumley T. Efficient measurement error correction with spatially misaligned data. Biostatistics 2011;12(4):610–23.
  • 24. Strand M, Sillau S, Grunwald GK, Rabinovitch N. Regression calibration with instrumental variables for longitudinal models with interaction terms, and application to air pollution studies. Environmetrics 2015;26(6):393–405.
  • 25. Alexeeff SE, Carroll RJ, Coull B. Spatial measurement error and correction by spatial SIMEX in linear regression models when using predicted air pollution exposures. Biostatistics 2016;17(2):377–89.
  • 26. Bergen S, Sheppard L, Kaufman JD, Szpiro AA. Multipollutant measurement error in air pollution epidemiology studies arising from predicting exposures with penalized regression splines. J R Stat Soc Ser C Appl Stat 2016;65(5):731–753.
  • 27. Kioumourtzoglou MA, Spiegelman D, Szpiro AA, Sheppard L, Kaufman JD, Yanosky JD, Williams R, Laden F, Hong B, Suh H. Exposure measurement error in PM2.5 health effects studies: a pooled analysis of eight personal exposure validation studies. Environ Health 2014;13(1):2.
  • 28. Yu W, Li S, Ye T, Xu R, Song J, Guo Y. Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations. Environ Health Perspect 2022;130(3):37004.
  • 29. Weisskopf M, Webster T. Trade-offs of Personal Versus More Proxy Exposure Measures in Environmental Epidemiology. Epidemiology 2017;28:635–43.
  • 30. Yazdi MD, Wang Y, Di Q, Requia WJ, Wei Y, Shi L, Sabath MB, Dominici F, Coull B, Evans JS, Koutrakis P, Schwartz JD. Long-term effect of exposure to lower concentrations of air pollution on mortality among US Medicare participants and vulnerable subgroups: a doubly-robust approach. Lancet Planet Health 2021;5(10):e689–e697.
  • 31. Szpiro AA, Paciorek CJ, Sheppard L. Does more accurate exposure prediction necessarily improve health effect estimates? Epidemiology 2011;22(5):680–5.
  • 32. Gryparis A, Paciorek CJ, Zeka A, Schwartz J, Coull BA. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics 2009;10(2):258–74.
