Author manuscript; available in PMC: 2009 Nov 2.
Published in final edited form as: Air Qual Atmos Health. 2009 Mar 1;2(1):47–55. doi: 10.1007/s11869-009-0033-3

Statistical issues in health impact assessment at the state and local levels

Montserrat Fuentes 1
PMCID: PMC2771419  NIHMSID: NIHMS146987  PMID: 19888447

Abstract

In this work we discuss the uncertainty in estimating the human health risk due to exposure to air pollution, including personal and population average exposure error, epidemiological designs and methods of analysis. Different epidemiological models may lead to very different conclusions for the same set of data. Thus, evaluation of the assumptions made and sensitivity analysis are necessary.

Short-term health impact indicators may be calculated using concentration-response (C-R) functions. We discuss different methods to combine C-R function estimates from a given locale and time period with the larger body of evidence from other locales and periods and with the literature. A shrunken method is recommended to combine C-R function estimates from multiple locales. This shrunken estimate includes information from the overall and the local estimates, and thus it characterizes the estimated excess of risk due to heterogeneity between the different locations.

1 Introduction

This manuscript is part of a workshop on methodologies for environmental public health tracking of air pollution effects. This workshop was sponsored and organized by the Health Effects Institute (HEI), the U.S. Centers for Disease Control and Prevention (CDC), and the U.S. Environmental Protection Agency (EPA). The workshop was held in Baltimore, MD, on January 15 and 16, 2008. The overall goal of the workshop was to produce a set of recommendations for analyzing linked air quality and health data to estimate and track over time air pollution health impact indicators, for use at the U.S. state and sub-state levels. This manuscript focuses on air pollution acute effects, presenting the methodology for health impact assessment at the local levels, and it is intended for a broad audience.

In this manuscript we discuss relevant statistical issues in establishing the impact on human health of exposure to ozone, particulate matter and other pollutants at the state and local levels. A typical analysis consists of two stages, (1) exposure assessment and (2) epidemiological analysis relating exposure to the health outcome.

We start with the exposure assessment in Section 2. In this section we discuss different approaches to estimate pollution exposure, including the use of monitoring data, spatial statistical interpolation methods, air quality numerical models, satellite data and probabilistic exposure models. We discuss advantages and limitations of each of the approaches, and we end this section with a discussion of uncertainty in the exposure assessment. Exposure assessment is an important part of health risk assessment for air pollution: the health impact of a given exposure on a population is investigated by applying previously derived health effect model estimates to a population with a given exposure distribution.

In Sections 3 and 4 we discuss health outcome analyses. In Section 3, we introduce two complementary statistical methods to study the association between air pollution exposure and a health outcome: a time-series based approach and a case-crossover design, which are equivalent approaches under some assumptions. We present uncertainty analysis for both frameworks.

In Section 4, we introduce different approaches for local concentration-response function analysis: local regression analysis, adjusted estimates using external C-R functions, shrunken approaches, and full Bayesian methods. In Section 5, we discuss uncertainty and sensitivity analysis for the C-R function. In Section 6, we present a case study.

2 Exposure assessment

Epidemiologic studies typically assess the health impacts of particulate matter and ozone using ambient concentrations measured at a centrally-located monitoring site, or at several sites located across the study area, to reflect exposures for their study population. The ability of these ambient concentrations to reflect actual pollution exposures for the study population generally depends on several factors, including the spatial distribution of the ambient air pollutants, the time-activity patterns and housing characteristics for the study community.

One method to link personal exposure to ambient levels, and thus to the association between air pollution and the health endpoints, is to model exposure by simulating the movement of individuals through time and space and estimate their exposure to a given pollutant in indoor, outdoor, and vehicular microenvironments. The exposure model developed by the U.S. Environmental Protection Agency (EPA) to estimate human population exposure to particulate matter is called Stochastic Human Exposure and Dose Simulation (SHEDS-PM) (Burke, 2005) and the stochastic model for ozone is called Air Pollutants Exposure (APEX). They are both probabilistic models designed to account for the numerous sources of variability that affect people’s exposures, including human activity. Daily activity patterns for individuals in a study area, an input to APEX and SHEDS, are obtained from detailed diaries that are compiled in the Consolidated Human Activity Database (CHAD) (McCurdy et al., 2000; EPA, 2002). Although SHEDS and APEX can be valuable tools, human exposure simulation models introduce their own uncertainties, and such models need to be further evaluated and their uncertainties characterized.

Most of the previous analyses of particulate matter (PM) health effects have been conducted in urban areas; very little is known about rural PM-related health effects. One reason for this is that monitoring data are sparse across space and time for rural areas. For ozone, we lack information for the winter months, since most monitoring stations only operate from May to September. Thus, EPA, in collaboration with the Centers for Disease Control and Prevention (CDC) and three state public health agencies (New York, Maine, and Wisconsin), is working on the Public Health Air Surveillance Evaluation (PHASE) project to identify different spatial-temporal interpolation tools that can be used to generate daily surrogate measures of exposure to ambient air pollution and relate those measures to available public health data. As part of the PHASE project, EPA is using statistical techniques (e.g. kriging, see Cressie, 1993) to interpolate monitoring data at locations and times for which we do not have observations. EPA is also supplementing monitoring data with satellite data and atmospheric deterministic models (e.g., Community Multiscale Air Quality (CMAQ) models). These models, run by EPA, provide hourly air pollution concentrations and fluxes at regular grids in the U.S. CMAQ uses as inputs meteorological data, emissions data and boundary values of air pollution (Binkowski and Roselle, 2003; Byun and Schere, 2006). These air quality numerical models provide areal pollution estimates, rather than spatial point estimates. Thus, we have a change of support problem (see e.g. Gotway and Young, 2002), since monitoring data and numerical models do not have the same spatial resolution. EPA in the PHASE project has adopted a hierarchical Bayesian (HB) spatial-temporal model to fuse monitoring data with CMAQ, using sound statistical principles (McMillan et al., 2007).
The Bayesian approach provides a natural framework for combining data (see Fuentes and Raftery, 2005), and it relies on prior distributions for different parameters in the statistical model. The prior distributions could be space dependent and also substance dependent. Consequently, this framework needs to be used with caution when applied to different geographic domains and different air pollutants. The potential bias in the pollution estimates as a result of the change of support problem is not taken into account in the PHASE project due to the computational burden. For a description of the problems that arise when combining two data sources with different support we refer to Gotway and Young (2002). This might not cause a significant impact on the estimated exposure when the air quality numerical models are run at a high spatial resolution (e.g. grid cells of 4km × 4km). However, when CMAQ is run at a coarse resolution (e.g. grid cells of 36km × 36km), the change of support problem could result in biased exposure estimates.

The final product of the HB approach adopted in the PHASE project is a joint distribution of the concentrations of pollution across space and time. Since this distribution is likely to be non-Normal, just the mean of the distribution at each location and time is not necessarily a good summary. Alternative summaries should be considered, such as different percentiles. Ideally, one would like to work with simulated values from the distribution rather than just a summary of the distribution, because that way we could characterize the uncertainty in the exposure when conducting the risk assessment. This will be discussed in Section 4.
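The point about summarizing a non-Normal distribution can be illustrated with a minimal sketch. The draws below are hypothetical: a lognormal distribution is assumed only because it is right-skewed, and it is not the PHASE model or its output. The sketch shows how the mean and selected percentiles can be reported side by side for a set of simulated exposure values.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical simulated concentrations at one location and day.
# The lognormal shape is an assumption chosen only for its right skew.
draws = [random.lognormvariate(2.5, 0.8) for _ in range(5000)]

mean = statistics.fmean(draws)

# quantiles(n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles.
cuts = statistics.quantiles(draws, n=20)
p05, p50, p95 = cuts[0], cuts[9], cuts[18]

print(f"mean={mean:.1f}  median={p50:.1f}  5th={p05:.1f}  95th={p95:.1f}")
```

For a skewed distribution such as this one the mean sits above the median, which is why a single mean value can be a poor summary of the exposure distribution.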

2.1 Uncertainty in the exposure assessment

The use of statistical models (e.g. kriging), air quality numerical models (e.g. CMAQ), or exposure models (APEX, SHEDS) to help in characterizing exposure to ozone and particulate matter adds more sources of uncertainty to the human health risk assessment estimates, because these models have their own uncertainties. However, the air quality models can be a valuable and powerful tool to extend the concentration-response (C-R) function analysis to the national level and also for addressing gaps if not enough monitoring data are available. The air quality models, based on the dynamics and mechanics of atmospheric processes, typically provide information at higher temporal and spatial resolution than data from observational networks. Errors and biases in these deterministic models are inevitable due to simplified or neglected physical processes or mathematical approximations used in the physical parameterization. The exposure models can be considered a powerful tool for characterizing the exposures of the study population by taking into account human activities. The different sources of error and uncertainties in the exposure models (SHEDS, APEX) result from variability not modeled or modeled incorrectly, erroneous or uncertain inputs, errors in coding, simplifications of physical, chemical and biological processes to form the conceptual models, and flaws in the conceptual model. In particular, the uncertainty in the estimation of ambient air quality will be propagated by APEX and SHEDS. The APEX and SHEDS output could be also very sensitive to the uncertainty in the prior distributions used in the microenvironmental models. Evaluation of these air quality and exposure models would help to quantify and characterize the different sources of errors in the models.

Reich, Fuentes and Burke (2009) compare mortality risk estimates obtained under different exposure metrics, in particular using SHEDS versus just monitoring data to characterize fine particulate matter (PM) exposure in El Fresno, CA for years 2001 and 2002. The estimated risk parameter was not very different when using SHEDS versus monitoring data, but the 95% confidence intervals for the estimated risk in El Fresno were widened by using the exposure model (SHEDS), since SHEDS helps to characterize the heterogeneity in the population under consideration. Choi, Fuentes and Reich (2009) show how using CMAQ data combined with monitoring data to characterize fine PM exposure helps to reduce the amount of uncertainty in the estimated risk of mortality due to fine PM exposure. Their study shows that the health effects in some areas were not significant when using only monitoring data, but then appeared to be significant when adding CMAQ as an additional source of information to characterize the exposure.

In some cases, presenting results from a small number of model scenarios would provide an adequate uncertainty analysis for the air quality and exposure models (e.g. when insufficient information is available). In most situations, probabilistic methods would be necessary to characterize properly at least some uncertainties, and also to communicate clearly the overall uncertainties. Although a full Bayesian analysis that incorporates all sources of information may be desirable in principle, in practice, it will be necessary to make strategic choices about which sources of uncertainty justify such treatment and which sources are better handled through less formal means, such as consideration of how model outputs might change as some of the inputs vary through a range of plausible values.

These different sources of uncertainty in the estimated exposure due to the use of different interpolation techniques need to be taken into account when estimating the C-R function. When using a Bayesian approach to estimate exposure (e.g. HB-PHASE approach), the uncertainty in the exposure to some degree is characterized by the joint distribution of the exposure values. To the extent that is computationally feasible, the risk assessment should be conducted using the joint distribution of the exposure values rather than just means from that distribution.

2.1.1 Sensitivity analysis

Sensitivity analysis should be conducted to understand the impact of the uncertainty in the exposure estimates on the health risk assessment, since it could result in over or under estimation of the risk.

Sensitivity calculations help to understand the sensitivity of results to some model assumptions. In particular, it is important to examine sensitivity to the structure of the spatial smoothing (kriging) and to how it is implemented, by comparing different covariance functions in the spatial smoothing techniques fitted using a plug-in method, empirical Bayes, or a fully Bayesian approach (e.g. Gryparis et al., 2008). Sensitivity analysis should be conducted when using CMAQ/APEX/SHEDS models, to understand how results might depend on some of the inputs and parameterizations of these models.

3 Estimation of health effects

Time series analysis is a commonly-used technique for assessing the association between counts of health events over time and exposure to ambient air pollution. The case-crossover design is an alternative method, that uses cases only, and compares exposures just prior to the event times to exposures at comparable control, or referent times, in order to assess the effect of short-term exposure on the risk of a rare event (see Janes et al., 2004). Each technique has advantages and disadvantages (see Fung et al., 2003). The PHASE team has selected case-crossover rather than time-series analysis due to the shorter learning curve (easier to use), and because within one analysis the method can accommodate many time-series. It is important to keep in mind that the case-crossover design is equivalent to a Poisson regression analysis except that confounding is controlled for by design (matching) instead of in the regression model. Restricting referents to the same day of week and season as the index time can control for these confounding effects by design. Accurate estimates can be achieved with both methods. However, both methods require some decisions to be made by the researcher during the course of the analysis.

In modelling time series of adverse health outcomes and air pollution exposure, it is important to model the strong temporal trends present in the data due to seasonality, influenza, weather and calendar events. Recently, rigorous statistical time series modelling approaches have been used to better control for these potential confounders. Furthermore, sophisticated analytical techniques have been introduced to adjust for seasonal trends in the data, culminating in the introduction of the generalized additive model (GAM). Although temporal trends can be explicitly included in the model, nonparametric local smoothing methods (LOESS) based on the GAM were widely used to take into account such trends in the analysis. Dominici et al. (2002b) suggested another approach using parametric natural cubic splines in the GAM model instead of the LOESS. One of the main limitations of this type of time series modelling approach is that it is necessary to choose the time span in the LOESS smoothing process, or the degrees of freedom of the cubic splines, and the results can be very sensitive to how that is done (e.g., Peng et al, 2006).

The case-crossover design compares exposures at the time of the event (e.g. hospital admission) with one or more periods when the event is not triggered. Cases serve as their own controls. The excess risk is then evaluated using a pair-matched design and conditional logistic regression analysis. Proper selection of referents is crucial with air pollution exposures, because of the seasonality and long-term time trend. Careful referent selection is important to control for time-varying confounders, and to ensure that the distribution of exposure is constant across referent times, which is the main assumption of this method. The referent strategy is also important for a more basic reason: the estimating equations are biased when referents are not chosen a priori and are functions of the observed event times. This type of bias is called overlap bias. Different strategies, such as full stratum bidirectional referent selection (choosing referents both before and after the index time) (Navidi, 1998), have been proposed to reduce bias. However, they do not control for confounding by design.
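As a hedged illustration of referent selection, the sketch below implements one commonly used scheme, often called time-stratified selection: referents are all other days in the same calendar month that fall on the same day of the week as the index day. This particular scheme, the function name and the example date are assumptions made for illustration; the text above does not prescribe them. The referent set is fixed by the calendar stratum rather than by the observed event time, and matching on weekday and month controls for day of week and season by design.

```python
from datetime import date, timedelta

def time_stratified_referents(index_day):
    """Return all days in the same calendar month and on the same weekday
    as index_day, excluding index_day itself."""
    referents = []
    day = date(index_day.year, index_day.month, 1)
    while day.month == index_day.month:
        if day.weekday() == index_day.weekday() and day != index_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents

# Hypothetical event date, for illustration only.
event = date(2001, 7, 18)
referents = time_stratified_referents(event)
print(referents)
```

Exposure on the event day would then be compared with exposure on each referent day in a conditional logistic regression.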

3.1 Sensitivity analysis

For any study of the association between air pollution and adverse health outcomes based on a Poisson time series or a case-crossover design, it is important to verify the model assumptions and to evaluate the model performance. Thus, there is a need to assess the performance of the different variations of time series and case-crossover procedures used to establish associations between air pollution and human health. Sensitivity analysis of the time series procedure to the statistical representation of the confounding effects needs to be conducted, since a poor representation could lead to significant bias in the estimation of the health effects. In particular, the sensitivity of the results to the co-pollutants introduced in the model, to the time span used in the LOESS smoothing process, and to the degrees of freedom chosen for the cubic splines needs to be determined. For case-crossover studies using bi-directional control selection, sensitivity analysis regarding the choice of time interval needs to be conducted.

4 Estimation of the C-R function

Short-term health impact indicators can be calculated using concentration-response (C-R) functions. A C-R function summarizes the associations between various measures of air pollution and the health outcome. Local C-R functions can be obtained from case-crossover or time series analysis using local information. However, since there are usually limited data for each location, pooling information across similar regions may improve local C-R estimates. A local analysis ignores information from other locations and periods, and could result in a less accurate estimate of the local C-R function. There is a precedent for the use of methods that combine a local C-R function analysis with C-R functions from other locations and times, for example, Post et al. 2001, Trete et al., 2005, Dominici et al. 2002a, and Fuentes et al. 2006. In this section we discuss these different approaches to estimating local C-R functions. We start with simple local regression approaches, then introduce external C-R functions, then shrunken estimates (empirical Bayes), and finally full Bayesian approaches. The degree of statistical training required and the computational challenges increase as we move along this list from local regression to the Bayesian approaches. While Bayesian approaches are recommended because they better characterize different sources of uncertainty, the choice of method will depend on the resources available. The purpose of this section is to highlight the advantages and limitations of each approach.

The C-R function assumed in most epidemiological studies on health effects of particulate matter (PM), ozone and other ambient pollutants is exponential: $y = B e^{\beta x}$, where $x$ is the exposure level, $y$ is the incidence of mortality (or other adverse health outcome) at level $x$, $\beta$ is the coefficient of the environmental stressor, and $B$ is the incidence at $x = 0$ (when there is no exposure). In these epidemiological models at the local or state level, we assume that the counts of the health outcome come from a Poisson process. Thus, we have,

$$\ln(E(y_{tc})) = \beta_c P_{tc} + \eta_c X_{tc} \tag{1}$$

where $E(y_{tc})$ represents the mean counts of the health outcome in the subdomain $c$ on day $t$, $P_{tc}$ are the daily levels of the environmental stressors at location $c$ and day $t$, and $\beta_c$ is the parameter to be estimated, which is the coefficient multiplying the environmental stressor. The log relative risk (RR) parameter is usually defined as $\beta_c \times 10^3$. $X_{tc}$ is the vector of the confounding factors (e.g. seasonality, weather variables, influenza and calendar events) and $\eta_c$ is the corresponding vector of coefficients. The confounder term in this model is often replaced with a smooth function of the covariates (e.g. splines).
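A short worked sketch may help connect the coefficient in model (1) to a percent change in risk. Under the log-linear model, a rise of $\Delta$ units in the stressor multiplies the expected count by $e^{\beta_c \Delta}$, so the percent increase is $100(e^{\beta_c \Delta} - 1)$; for small $\beta_c$ and $\Delta = 10$ this is approximately $10^3 \beta_c$. The coefficient value and function name below are hypothetical.

```python
import math

def percent_increase(beta, delta=10.0):
    """Percent change in expected counts for a delta-unit rise in exposure
    under the log-linear model ln E(y) = beta * P + (confounder terms)."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

beta = 0.72e-3  # hypothetical coefficient per unit of pollutant

exact = percent_increase(beta)   # 100 * (exp(10 * beta) - 1)
approx = 1e3 * beta              # small-beta approximation

print(round(exact, 3), round(approx, 3))
```

The two numbers agree to about two decimal places here, which is why $10^3 \beta_c$ is often read directly as a percent increase per 10 units of the pollutant.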

Local estimates

Local estimates of βc can be obtained at each location c separately, using a regression technique applied to model (1). Local regression would allow for more local covariate control. However, the evidence across different locations is ignored.

Adjusted estimates (external C-R function)

Local estimates (i.e., from multiple locations) can be combined using a random effects model, by regressing the local estimates against potential effect modifiers that vary across locations. This is done to gain precision in estimating the C-R function and to understand variability. The model assumptions are:

$$\hat\beta_c \sim N(\mu_c, S_{W,c}^2), \qquad \mu_c \sim N(\alpha Z_c, \sigma_B^2).$$

If we ignore the potential variability within location $c$ of the effect modifiers $\alpha Z_c$, we have

$$\hat\beta_c \sim N(\alpha Z_c, S_{W,c}^2 + \sigma_B^2),$$

where $\hat\beta_c$ is the estimated effect of $P$ in location $c$, $S_{W,c}^2$ is the estimated within-location variance for location $c$, and $\sigma_B^2$ is the between-location variance. $\hat\beta_c$ and $S_{W,c}^2$ are obtained from the local regression analysis. The between-location variance, $\sigma_B^2$, is usually estimated with the maximum likelihood estimate, using an iterative approach.

The random-effects pooled estimate is a weighted average of the location-specific $\hat\beta_c$. The weights involve both the sampling error (the within-location variability) and the estimate of $\sigma_B^2$, the variance of the underlying distribution of $\mu_c$ (the between-location variability).
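The weighted average described above can be sketched as follows for the simplest case with no effect modifiers (a common mean across locations). Two assumptions are made here: the weights take the standard inverse-variance form $1/(S_{W,c}^2 + \sigma_B^2)$, and $\sigma_B^2$ is supplied as a known number rather than estimated iteratively by maximum likelihood. The inputs and function name are hypothetical.

```python
def random_effects_pool(estimates, within_vars, sigma_b2):
    """Weighted average of local estimates with weights
    1 / (within-location variance + between-location variance).
    Returns the pooled estimate and its standard error."""
    weights = [1.0 / (s2 + sigma_b2) for s2 in within_vars]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, (1.0 / total) ** 0.5

# Hypothetical local estimates and within-location variances.
beta_hat = [1.2, 0.4, -0.3]
s2_within = [0.5, 0.2, 1.0]

pooled, se = random_effects_pool(beta_hat, s2_within, sigma_b2=0.1)
print(round(pooled, 3), round(se, 3))
```

As $\sigma_B^2$ grows, the weights become more nearly equal, so imprecise locations are down-weighted less than they would be in a fixed-effects average.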

Shrunken estimates

An alternative to the local estimates and to the overall (pooled random effects) estimate is obtained using the local shrunken estimates. The model assumptions are:

$$\hat\beta_c \sim N(\mu_c, S_{W,c}^2), \qquad \mu_c \sim N(\tilde\beta, \sigma_B^2) \tag{2}$$

where $S_{W,c}^2$ is the estimated within-location variance, obtained in a first-stage local analysis as the squared standard error (SE) from the local regression model, $\hat\beta_c$ is the maximum likelihood (ML) estimate from the local regression, $\tilde\beta$ is the overall pooled estimate, and $\sigma_B^2$ is the between-location variance (treated as known, and obtained in a first-stage analysis using a maximum-likelihood approach).

Then, we can obtain the following conditional distribution:

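$$\mu_c \mid \hat\beta_c, \tilde\beta, S_{W,c}^2, \sigma_B^2 \sim N\left( \frac{S_{W,c}^2}{S_{W,c}^2 + \sigma_B^2}\,\tilde\beta + \frac{\sigma_B^2}{S_{W,c}^2 + \sigma_B^2}\,\hat\beta_c,\ \frac{S_{W,c}^2\,\sigma_B^2}{S_{W,c}^2 + \sigma_B^2} \right),$$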

this is called the posterior probability distribution of $\mu_c$. The mean of this posterior distribution is also called the shrunken estimate of $\beta_c$. The variance of the shrunken estimate is $\frac{S_{W,c}^2 \sigma_B^2}{S_{W,c}^2 + \sigma_B^2}$, which is clearly smaller than $S_{W,c}^2$, the variance of our local regression estimate, because by introducing the spatial information we are able to reduce the variability of our risk estimate. This shrunken estimate includes information from the overall and the local estimates, and thus it characterizes the estimated excess of risk due to heterogeneity between the different locations. In the presence of heterogeneity, location-specific estimates vary around the overall effect estimate for two reasons: a) the true heterogeneity in the estimates, and b) additional stochastic error. A location-specific estimate reflects the first source of variation but not the second one. The use of shrunken estimates allows reduction of the stochastic variability of the local estimates. This shrunken method is an empirical Bayesian method, because $\hat\beta_c$, $\tilde\beta$, and the within and between variance parameters are treated as known, and therefore the uncertainty about these parameters is not taken into account in the analysis. This could lead to underestimation of the variance associated with the log relative risk parameter.
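The posterior mean and variance above reduce to a few lines of arithmetic, sketched below. The local estimate and SE for Syracuse and the pooled estimate are taken from Table 1 of the case study; the between-location variance of 0.10 is a hypothetical value chosen for illustration, since the text does not report it. The function name is also an assumption.

```python
def shrink(beta_local, s2_within, beta_pooled, sigma_b2):
    """Posterior mean and variance of mu_c under model (2),
    treating all four inputs as known (empirical Bayes)."""
    denom = s2_within + sigma_b2
    mean = (s2_within / denom) * beta_pooled + (sigma_b2 / denom) * beta_local
    var = s2_within * sigma_b2 / denom
    return mean, var

# Syracuse local estimate (3.18) and SE (1.56), and pooled estimate (0.72),
# are from Table 1. sigma_b2 = 0.10 is hypothetical, not a reported value.
mean, var = shrink(beta_local=3.18, s2_within=1.56 ** 2,
                   beta_pooled=0.72, sigma_b2=0.10)
print(round(mean, 2), round(var ** 0.5, 2))
```

Because the within-location variance is much larger than the assumed between-location variance, most of the weight goes to the pooled estimate and the local value is pulled strongly toward it. The posterior variance is always smaller than both $S_{W,c}^2$ and $\sigma_B^2$.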

Effect modifiers (external C-R function), $\alpha Z_c$, could also be easily introduced in this empirical Bayes framework, by replacing $\tilde\beta$ in our model with $\alpha Z_c$.

Full Bayesian approach

A full Bayesian approach is an extension of the shrunken method that characterizes the uncertainty in the pooled estimate, $\tilde\beta$, and the within-location estimate, $\hat\beta_c$, when obtaining the final estimate of the effect of the environmental stressor at a given location. Thus, rather than treating $\tilde\beta$ and $\sigma_B^2$ as known, they are modelled as random effects that are jointly estimated at all locations. This would be just a one-way random effects model, which is easy to fit.

A Bayesian multi-stage framework would allow the characterization of the spatial dependency structure of the relative risk parameter, by treating βc as a spatial stochastic process (Fuentes et al, 2006). Lee and Shaddick (2007) smoothed the risk across time. However, this spatial/temporal analysis is usually highly dimensional, and the computational demand of a full Bayesian approach can be extremely laborious. The computation is often simplified by using empirical Bayes alternatives, such as the shrunken estimate.

5 Uncertainty in the C-R function

Concentration-response functions, estimated by epidemiological models, play a crucial role in the estimation of the risk associated with different pollutants. Uncertainty in the C-R function may impact conclusions. As described in the previous section, some of the formal approaches for uncertainty analysis in epidemiological models include Bayesian analysis and Monte Carlo analysis.

To deal with epidemiological model uncertainty, it is possible to compare alternative models without combining them, to weight predictions of alternative models (e.g. probability trees), and/or to use meta-models that degenerate into alternative models. For comparison of different models to estimate the C-R function, we recommend using statistical information criteria, which have traditionally played an important role in model selection. The basic principle of model selection using information criteria is to select statistical models that simplify the description of the data and model. Specifically, information methods emphasize minimizing the amount of information required to express the data and the model. This results in the selection of models that are the most parsimonious or efficient representations of observed phenomena. Some of the commonly used information criteria are: AIC (Akaike information criterion, Akaike, 1973, 1978), BIC (Bayesian information criterion, also known as the Schwarz criterion, Schwarz, 1978), RIC (Risk inflation criterion, Foster and George, 1994), and the deviance information criterion (DIC), which is a generalization of the AIC and BIC. The DIC is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. These criteria make it possible to describe the level of uncertainty due to model selection, and can be used to combine inferences by averaging over a wider class of models (meta-analysis) using readily available summary statistics from standard model fitting programs.
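As a small illustration of how two of these criteria are computed, the sketch below uses the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k \ln n - 2\ln L$, where $k$ is the number of estimated parameters, $n$ the number of observations and $\ln L$ the maximized log-likelihood. The model names, log-likelihoods and parameter counts are hypothetical and do not come from any analysis in this paper.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
n_obs = 730
fits = {"spline_df4": (-1502.3, 12), "spline_df7": (-1498.9, 21)}

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()}
for name, (a, b) in scores.items():
    print(name, round(a, 1), round(b, 1))
```

Smaller values are preferred under both criteria. In this made-up example the richer model improves the log-likelihood only slightly, so both criteria favor the more parsimonious one, with BIC penalizing the extra parameters more heavily.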

There are also uncertainties associated with the estimate of the environmental stressor, and reliability of the limited ambient monitoring data in reflecting actual exposures (as discussed in the exposure assessment section). Because the uncertainties propagate to the epidemiological model, a full characterization of uncertainties in the exposure assessment is needed. The ability to quantify and propagate uncertainty is still an area in development. Using a hierarchical framework would help quantify uncertainties; the fitting can be done stage-by-stage, taking the interim posteriors from one stage as the priors for the next. Within each stage a fully Bayesian approach can be used to get the interim posterior distributions. As the implementation is based on the sequential version of the Bayes theorem, the corresponding model uncertainties will be captured at the final stage of the hierarchical model. The HB-PHASE framework to obtain exposures fits naturally within this multi-stage approach, by treating the exposure distributions obtained from the HB approach as priors in the next stage, in which we estimate the RR. However, this can be computationally demanding. Uncertainty analysis has certainly developed further and faster than our ability to use the results in decision-making. Effective uncertainty communication requires a high level of interaction with the relevant decision makers to ensure that they have the necessary information about the nature and sources of uncertainty and their consequences.

5.1 Sensitivity analysis

Sensitivity analyses need to be conducted to understand how results vary with the assumed shape of the concentration-response function and other model assumptions, since these could lead to biased results. In particular, sensitivity should be assessed with respect to the role of confounders, demographic factors, co-pollutants, and the structure of the cessation lag, as well as the sensitivity of the premature mortality estimate (or other endpoints) to the presence of a potential threshold.

6 Case Study

The NMMAPS (National Morbidity, Mortality, and Air Pollution Study) data are publicly available, and they contain mortality, weather, and air pollution data for 108 cities across the United States for years 1987-2000. The NMMAPS data are available through the internet-based health and air pollution surveillance system (iHAPSS). iHAPSS is developed and maintained by the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health.

Using the NMMAPS data we estimate the association between particulate matter, PM10 (particles with a diameter of 10 micrometers or less), and death due to cardiovascular diseases. In this application we work with 11 cities located in the northeastern U.S., and we compare the four different methods proposed in this paper to obtain local estimates: the local method, the adjusted method, the shrunken approach, and the fully Bayesian approach. The health end point is cardiovascular mortality. We also present pooled estimates for the overall effect using each one of the 4 methods, obtained as a weighted average of the local estimates:

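$$\widehat{PA} = \frac{\sum_c \hat\beta_c / S_c^2}{\sum_c 1 / S_c^2},$$

with associated standard error (SE)

$$SE(\widehat{PA}) = \sqrt{\frac{1}{\sum_c 1 / S_c^2}},$$

where $\hat\beta_c$ is the local estimate for each city and $S_c^2$ is the corresponding variance.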

First, we introduce our Poisson regression model (NMMAPS model, Peng et al., 2004):

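$$
\begin{aligned}
Y_t &\sim \text{Poisson}(\mu_t) \\
\log \mu_t &= \gamma_1 \text{DOW}_t + \gamma_2 \text{AgeCat} + \gamma_3\, s(\text{temp}_t, df = 6) + \gamma_4\, s(\text{temp}_{t,1\text{-}3}, df = 6) \\
&\quad + \gamma_5\, s(\text{dewpt}_t, df = 3) + \gamma_6\, s(\text{dewpt}_{t,1\text{-}3}, df = 3) \\
&\quad + \gamma_7\, s(t, df = 7 \times \#\text{years}) + \gamma_8\, s(t, df = 0.15 \times 7 \times \#\text{years}) + \beta\, \text{PM}_t \\
\text{Var}(Y_t) &= \phi \mu_t
\end{aligned} \tag{3}
$$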

where $Y_t$ is the number of cardiovascular deaths on day $t$; $\phi$, $\beta$ and $\gamma_i$ for $i = 1, \ldots, 8$ are unknown parameters; $\text{DOW}_t$ is the day of week for day $t$; and AgeCat is an age indicator. The age categories used are ≥ 75, 65-74, and < 65 years old. $\text{temp}_t$ is the average temperature on day $t$, $\text{temp}_{t,1\text{-}3}$ is the running mean of the temperature for the previous 3 days, and $\text{PM}_t$ is the PM10 level for day $t$. The variables $\text{dewpt}_t$ and $\text{dewpt}_{t,1\text{-}3}$ are the current day and running mean of dewpoint temperature. Each of the temperature and dewpoint temperature variables, as well as time, is related to mortality via a smooth function $s()$. While there are many choices for smooth functions, the smooth functions used in this study are natural splines. The smoothness of each function $s()$ is controlled through the degrees of freedom (df) given to it. The degrees of freedom are fixed at 6 df for the temperature functions and 3 df for the dewpoint temperature functions. The degrees of freedom for time depend on the number of years of data being used, and are adjusted for the presence of missing data. The smooth function of time has 7 df per year, and there is also an additional smooth function per age category that has 0.15 × 7 df per year. These smooth functions of time are important to control for seasonal factors, long-term mortality trends, and possible age-specific trends. The df in this application are the same as in NMMAPS. We define $\hat\beta_c$ as the effect for city $c$ with associated variance $S_{W,c}^2$. We can think of $S_{W,c}^2$ as the within-city variation.

In Table 1 we present the estimated risk of mortality and its corresponding standard error (SE) using each of the four proposed methods. The local analysis corresponds to a separate Poisson regression for each city. The three methods that pool information across cities (adjusted, shrunken, fully Bayesian) refine the local estimates; the shrunken and fully Bayesian estimates in particular have markedly smaller standard errors. In this dataset we do not have external information for each city, so the “adjusted” estimates are the same for every city, although their standard errors differ.

Table 1. Effect of the four methods presented in this paper on the estimates of the local effect $\hat{\beta}_c$. The reported estimates (Est.) are the health effect times $10^3$, $10^3 \hat{\beta}_c$, corresponding to the percent increase in mortality per 10-unit increase in PM10.

City                     Local      Adjusted      Shrunken    Full Bayes
                   Est.     SE   Est.     SE   Est.     SE   Est.     SE
Syracuse           3.18   1.56   0.72   1.60   0.82   0.31   1.41   0.98
Boston             2.50   1.27   0.72   1.31   0.83   0.31   1.35   0.88
Providence         2.03   1.23   0.72   1.27   0.80   0.31   1.21   0.83
Jersey City        1.25   0.75   0.72   0.81   0.80   0.29   1.03   0.60
Baltimore          0.40   0.62   0.72   0.69   0.66   0.28   0.55   0.52
Newark             0.23   1.05   0.72   1.09   0.68   0.30   0.56   0.72
Philadelphia       0.09   0.66   0.72   0.73   0.61   0.28   0.38   0.56
Washington         0.01   1.53   0.72   1.56   0.69   0.31   0.58   0.87
Kingston          −1.20   2.19   0.72   2.21   0.68   0.31   0.45   1.02
Arlington         −1.60   5.96   0.72   5.97   0.72   0.31   0.71   1.17
Richmond          −2.24   2.74   0.72   2.76   0.68   0.31   0.42   1.09

Pooled Estimate    0.72   0.32   0.72   0.31   0.72   0.32   0.79   0.50

Table 1 illustrates the main conclusion of this paper: the shrunken estimate borrows information from both the overall and the local estimates, and thereby reduces the stochastic variability of the local estimates. As a result, some cities that did not show a significant health effect under the local analysis alone appear to have a significant effect under the shrunken method. A fully Bayesian approach also characterizes the uncertainty in $\hat{\beta}$ and $\sigma_B^2$, so it gives larger SEs than the empirical Bayesian approach (shrunken method).
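The change in significance can be illustrated directly from Table 1. The sketch below flags a city as significant when the absolute ratio of estimate to SE exceeds 1.96. That two-sided 5% normal (Wald) criterion is an assumption made here for illustration; the paper does not state which criterion it uses, and the inputs are the rounded published values.

```python
Z_CRIT = 1.96  # assumed two-sided 5% normal critical value (not from the paper)

# (local estimate, local SE, shrunken estimate, shrunken SE), from Table 1.
table1 = {
    "Syracuse": (3.18, 1.56, 0.82, 0.31),
    "Boston": (2.50, 1.27, 0.83, 0.31),
    "Providence": (2.03, 1.23, 0.80, 0.31),
    "Jersey City": (1.25, 0.75, 0.80, 0.29),
    "Baltimore": (0.40, 0.62, 0.66, 0.28),
    "Newark": (0.23, 1.05, 0.68, 0.30),
    "Philadelphia": (0.09, 0.66, 0.61, 0.28),
    "Washington": (0.01, 1.53, 0.69, 0.31),
    "Kingston": (-1.20, 2.19, 0.68, 0.31),
    "Arlington": (-1.60, 5.96, 0.72, 0.31),
    "Richmond": (-2.24, 2.74, 0.68, 0.31),
}


def is_significant(estimate, se, z_crit=Z_CRIT):
    """Wald-type check: |estimate / SE| exceeds the critical value."""
    return abs(estimate / se) > z_crit


local_sig = [c for c, (b, s, _, _) in table1.items() if is_significant(b, s)]
shrunk_sig = [c for c, (_, _, b, s) in table1.items() if is_significant(b, s)]

print("significant (local):   ", local_sig)
print("significant (shrunken):", len(shrunk_sig), "of", len(table1))
```

Under this assumed criterion only Syracuse and Boston are flagged in the local analysis, whereas every city is flagged under the shrunken method, because the shrunken SEs are roughly 0.3 for all cities.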

Acknowledgements

The author thanks the National Science Foundation (Fuentes DMS-0706731, DMS-0353029), the Environmental Protection Agency (Fuentes, R833863), and the National Institutes of Health (Fuentes, 5R01ES014843-02) for partial support of this work. The author would also like to thank the associate editor and two reviewers for their very helpful feedback, and to acknowledge Eric Kalendra, graduate student at NCSU, for providing the results in Table 1.

References

  1. Akaike H. 2nd International Symposium on Information Theory. Akademiai Kiado; Budapest: 1973. Information theory and an extension of the maximum likelihood principle; pp. 267–281.
  2. Akaike H. A new look at the Bayes procedure. Biometrika. 1978;65:53–59.
  3. Binkowski FS, Roselle SJ. Models-3 community multiscale air quality (CMAQ) model aerosol component, 1. Model description. J. Geophys. Res. 2003;108:4183. doi: 10.1029/2001JD001409.
  4. Burke J. EPA Stochastic Human Exposure and Dose Simulation for Particulate Matter (SHEDS-PM). User Guide. 2005.
  5. Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. Springer-Verlag; 2002.
  6. Byun DW, Schere KL. Review of the governing equations, computational algorithms and other components of the Models-3 Community Multiscale Air Quality (CMAQ) Modeling System. Applied Mechanics Reviews. 2006;59:51–77.
  7. Choi J, Fuentes M, Reich B. Spatial-temporal association between fine particulate matter and daily mortality. Journal of Computational Statistics and Data Analysis. 2009. In press. doi: 10.1016/j.csda.2008.05.018.
  8. Cressie NA. Statistics for Spatial Data. Revised Edition. Wiley; New York: 1993.
  9. Dominici F, Daniels M, Zeger SL, Samet JM. Air Pollution and Mortality: Estimating Regional and National Dose-Response Relationships. Journal of the American Statistical Association. 2002a;97:100–111.
  10. Dominici F, McDermott A, Zeger SL, Samet JM. On the use of generalized additive models in time series of air pollution and health. American Journal of Epidemiology. 2002b;156:193–203. doi: 10.1093/aje/kwf062.
  11. ENVIRON. CAMx User’s Guide. ENVIRON International Corporation; Novato, CA: Sep, 2006. www.camx.com; www.environcorp.com.
  12. Foster DP, George EI. The risk inflation criterion for multiple regression. Annals of Statistics. 1994;22:1947–75.
  13. Fuentes M, Raftery AE. Model Evaluation and Spatial Interpolation by Bayesian Combination of Observations with Outputs from Numerical Models. Biometrics. 2005;61:36–45. doi: 10.1111/j.0006-341X.2005.030821.x.
  14. Fuentes M, Song H, Ghosh SK, Holland DM, Davis JM. Spatial association between speciated fine particles and mortality. Biometrics. 2006;62:855–863. doi: 10.1111/j.1541-0420.2006.00526.x.
  15. Fung KY, Krewski D, Chen Y, Burnett R, Cakmak S. Comparison of time series and case-crossover analyses of air pollution and hospital admission data. International Journal of Epidemiology. 2003;32:1064–1070. doi: 10.1093/ije/dyg246.
  16. Gotway CA, Young LJ. Combining incompatible spatial data. Journal of the American Statistical Association. 2002;97:632–648.
  17. Gryparis A, Paciorek CJ, Zeka A, Schwartz J, Coull BA. Measurement error caused by spatial misalignment in environmental epidemiology. 2009. Tech. report at the Department of Biostatistics, Harvard University.
  18. Janes H, Sheppard L, Lumley T. Overlap bias in the case-crossover design, with application to air pollution exposures. The Berkeley Electronic Press; 2004. (UW Biostatistics Working Paper Series). University of Washington, paper # 213.
  19. Lee D, Shaddick G. Time-Varying Coefficient Models for the Analysis of Air Pollution and Health Outcome Data. Biometrics. 2007. doi: 10.1111/j.1541-0420.2007.00776.x.
  20. McCurdy T, Glen G, Smith L, Lakkadi Y. The National Exposure Research Laboratory’s Consolidated Human Activity Database. J. Exposure Anal. Environ. Epidemiol. 2000;10:566–578. doi: 10.1038/sj.jea.7500114.
  21. McMillan NJ, Holland DM, Morara M. Combining numerical model output and particulate data using Bayesian space-time modelling. 2007. Tech. Report at U.S. EPA (RTP, NC).
  22. Navidi N. Bidirectional case-crossover designs for exposures with time trends. Biometrics. 1998;54:569–605.
  23. Peng RD, Welty LJ, McDermott AM. The National Morbidity, Mortality, and Air Pollution Study Database in R. Johns Hopkins University, Dept. of Biostatistics Working Papers. 2004. Paper #44.
  24. Peng RD, Dominici F, Louis T. Model Choice in Multi-Site Time Series Studies of Air Pollution and Mortality. Journal of the Royal Statistical Society, Series A. 2006;169(Part 2):179–203.
  25. Post E, Hoaglin D, Deck L, Larntz K. An empirical Bayes approach to estimating the relation of mortality to exposure to particulate matter. Risk Analysis. 2001;21:837–842. doi: 10.1111/0272-4332.215155.
  26. Reich B, Fuentes M, Burke J. Analysis of the effects of ultrafine particulate matter while accounting for human exposure. Environmetrics. 2009. In press. doi: 10.1002/env.915.
  27. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
  28. Le Tertre A, Schwartz J, Touloumi G. Empirical Bayes and adjusted estimates approach to estimating the relation of mortality to exposure of PM10. Risk Analysis. 2005;25:711–718. doi: 10.1111/j.1539-6924.2005.00606.x.
  29. U.S. Environmental Protection Agency. Consolidated Human Activities Database (CHAD) Users Guide. 2002. Database and documentation available at: http://www.epa.gov/chadnet1/
