Author manuscript; available in PMC: 2015 Mar 1.
Published in final edited form as: Biometrics. 2013 Dec 10;70(1):164–174. doi: 10.1111/biom.12125

Estimating Acute Air Pollution Health Effects from Cohort Study Data

Adam A Szpiro 1,*, Lianne Sheppard 2, Sara D Adar 3, Joel D Kaufman 4
PMCID: PMC4080094  NIHMSID: NIHMS540694  PMID: 24571570

Summary

Traditional studies of short-term air pollution health effects use time series data, while cohort studies generally focus on long-term effects. There is increasing interest in exploiting individual level cohort data to assess short-term health effects in order to understand the mechanisms and time scales of action. We extend semiparametric regression methods used to adjust for unmeasured confounding in time series studies to the cohort setting. Time series methods are not directly applicable since cohort data are typically collected over a prespecified time period and include exposure measurements on days without health observations. Therefore, long-time asymptotics are not appropriate, and it is possible to improve efficiency by exploiting the additional exposure data. We show that the flexibility of the semiparametric adjustment model should match the complexity of the trend in the health outcome, in contrast to the time series setting where it suffices to match temporal structure in the exposure. We also demonstrate that pre-adjusting exposures concurrent with the health endpoints using trends in the complete exposure time series results in unbiased health effect estimation and can improve efficiency without additional confounding adjustment. A recently published article found evidence of an association between short-term exposure to ambient fine particulate matter (PM2.5) and retinal arteriolar diameter as measured by retinal photography in the Multi-Ethnic Study of Atherosclerosis (MESA). We reanalyze the data from this article in order to compare the methods described here, and we evaluate our methods in a simulation study based on the MESA data.

Keywords: Air pollution, Environmental epidemiology, Generalized least squares, Mixed models, Semiparametric regression, Time series, Unmeasured confounding.

1. Introduction

Epidemiologic evidence demonstrates an association between exposure to fine particulate matter (PM2.5) air pollution and adverse health effects. Since air pollution is a modifiable risk factor, it is important to accurately estimate the magnitude of health effects and understand their mechanisms and time scales (Brook et al., 2010). The Environmental Protection Agency (EPA) has a legislative mandate to set standards for short-term and long-term air pollution levels to protect human health. Epidemiologic evidence plays a central role in establishing the scientific basis for these regulations (Environmental Protection Agency, 2006).

Short-term air pollution exposure on a time scale of hours or days is most likely associated with acute or transient health outcomes. A traditional approach to assessing the acute impact of short-term exposure uses population outcomes such as hospitalization or mortality rates in time series studies (Schwartz, 1994; Sheppard et al., 1999; Samet et al., 2000; Dominici, McDermott, and Hastie, 2004). Other designs used for this purpose include case-crossover studies (Janes, Sheppard, and Lumley, 2005), which are closely related to time series methods, and panel studies in which a small cohort of individuals are followed longitudinally (Dominici, Sheppard, and Clyde, 2003; Janes, Sheppard, and Shepherd, 2008).

Air pollution cohort studies have focused primarily on the cross-sectional effect of long-term air pollution exposure on chronic health outcomes (Dockery et al., 1993; Pope et al., 2002; Miller et al., 2007). Long-term exposure could refer to a subject's entire lifetime or to a period on the order of a year or more. There is growing interest in exploiting cohort data to estimate associations between short-term exposure and acute health effects in order to better understand the biological mechanisms by which air pollution causes disease.

We consider the recently published analysis by Adar et al. (2010) of the association between PM2.5 exposure and retinal microvasculature as a marker of subclinical cardiovascular disease in the Multi-Ethnic Study of Atherosclerosis (MESA). While Adar et al. (2010) evaluate both chronic and acute effects on retinal arteriolar and vascular outcomes, we focus on the acute association with daily air pollution exposure. The dominant PM2.5 variability is temporal rather than spatial, so we follow the approach in Adar et al. (2010) and treat exposure as a spatially homogeneous time series within metropolitan areas.

The exposure and the outcome can have seasonal and meteorological trends, so we need to control for shared sources of temporal variability to estimate unconfounded associations with air pollution. A methodology developed for time series studies is to include semiparametric spline terms in a regression model to adjust for temporal confounding. For this approach to be effective, we need to ensure that the spline terms contain sufficient degrees of freedom (df) to fully adjust for the temporal structure. A number of methods have been proposed for selecting df in time series studies (Dominici et al., 2004; Peng, Dominici, and Louis, 2006), but the existing literature does not address the implications for cohort study data.

The first objective of this article is to adapt the semiparametric regression methodology to cross-sectional cohort studies. The theory does not carry over directly for a number of reasons: (i) the relevant asymptotics are different, as in a cohort study we are concerned with large n asymptotics corresponding to a large number of subjects, whereas in time series studies the interest is in large T asymptotics corresponding to long study time periods; (ii) there can be multiple or no health observations on a given day, in contrast to a time series study where a single population-level health outcome is available on each day in a given geographic region; (iii) different assumptions about sources of randomness in the exposure may be appropriate for the two study designs; (iv) inter-subject variability makes it more difficult to accurately identify the seasonal and meteorological trends in cohort health data than in time series data; and (v) we need to be concerned with subject-specific covariates in cohort data such as blood pressure that could have their own temporal trends.

Our second objective is to propose a more efficient alternative to semiparametric regression. Since cohort study data often include air pollution measurements on days without health outcomes, semiparametric regression does not utilize all of the available exposure data. An alternative is to pre-adjust the exposure for temporal variability due to seasonality or meteorology and then use this modified exposure to estimate an unconfounded effect by ordinary least squares (OLS) or generalized least squares (GLS), without further adjustment in the disease model. Similar ideas have been considered for time series studies, but it is not clear that there is an advantage in that setting since the conventional approach already utilizes all of the available exposure data (Fung et al., 2003).

We summarize the data and findings from Adar et al. (2010) in Section 2, and we introduce notation and describe our statistical framework in Section 3. In Section 4, we formalize the semiparametric regression methodology for cohort studies and discuss the required number of df to obtain unbiased effect estimates and valid standard errors. In Section 5 we describe the pre-adjustment methodology. We illustrate our findings with a simulation study in Section 6 and reanalyze the retinal arteriolar data from MESA in Section 7. We conclude in Section 8 with a discussion, including guidance on when pre-adjustment followed by OLS or GLS is preferable to semiparametric regression.

2. Retinal Arteriolar Diameter and Air Pollution

A recently published analysis of the MESA cohort found evidence of an association between decreased retinal arteriolar diameter and elevated exposure to PM2.5 air pollution on the previous day (Adar et al., 2010). As discussed by Adar et al. (2010), previous studies have found that changes in the microvasculature, including retinal arteriolar diameter, are associated with increased risk of myocardial infarction, stroke, and cardiovascular mortality, independent of other traditional risk factors. Therefore, these findings provide support for the hypothesis that reported associations between air pollution and the development and exacerbation of clinical cardiovascular disease are related to microvascular phenomena.

MESA is a prospective cohort study designed to examine the progression of subclinical cardiovascular disease (CVD). It enrolled 6814 men and women 45–84 years of age who were free of clinical CVD at entry from six U.S. communities in Baltimore, Chicago, Los Angeles, New York, Minneapolis-St. Paul, and Winston-Salem. Details of the sampling, recruitment, and data collection are described by Bild et al. (2002). The MESA cohort provides an excellent infrastructure for assessing the relationship between air pollution and various indicators of cardiovascular disease, particularly within the framework of MESA Air, an ancillary study to MESA funded by the EPA that includes collection of additional air quality monitoring and health endpoint data (Kaufman et al., 2012).

Retinal arteriolar diameter, a marker of microvascular phenomena, was measured in MESA participants by retinal photography. Retinal photography was performed during the second MESA examination between August 2002 and January 2004. A total of 6176 individuals had retinal photographs taken, and 4607 subjects had complete data for inclusion in this analysis. Retinal arteriolar diameters within an area equal to 0.5–1 disc diameters from the optic disc margin are summarized as central retinal arteriolar equivalents (CRAE).

Air pollution exposures on the day prior to retinal photography were assigned based on the area-wide average concentrations from EPA Air Quality System (AQS) monitoring stations with complete time series during the period of interest. In light of the complex topography in the Los Angeles basin, the analysis incorporated four sub-regions: coastal Los Angeles, downtown Los Angeles, Riverside, and the area between Los Angeles and Riverside, giving a total of nine separate regions in our analysis. The data in Figure 2 show clear evidence of temporal trends in meteorology and PM2.5 concentrations. Inter-subject variability makes it difficult to determine if there are temporal trends in CRAE measurements, but as noted by Adar et al. (2010) there is scientific reason to believe such trends are present. In a multivariate linear model with a full suite of subject-specific covariates and semiparametric adjustment for season with 12 df per year in each region and for meteorology with 6 df in each region, Adar et al. (2010) found a change in CRAE of –0.4 μm (95% CI –0.8 to 0.1) per 10 μg/m3 increase in the previous day's PM2.5 concentration.

Figure 2.


Data from the study of the association between short-term air pollution exposure and retinal arteriolar diameter in six MESA cities (plus four meteorology zones in Los Angeles). For each plot, time series are shown of temperature, relative humidity, and PM2.5 concentration in the first three rows. The fourth row shows all available measurements of central retinal arteriolar equivalents (CRAE) on each day.

The analytic approach used by Adar et al. (2010) was chosen to be consistent with standard practice in air pollution time series studies. The present work was motivated by a desire to (i) improve precision by more fully utilizing exposure data on days when no health outcome measurements were available and (ii) determine if alternative criteria for selecting df were either necessary to ensure unbiasedness or preferable to increase precision. In Section 7, we will reanalyze this dataset using the methods proposed here.

3. Statistical Framework

3.1. Overview of Model

Consider a cohort study with continuous health outcomes yi, exposures xi, and subject-specific covariates zi for subjects i = 1, . . . , n, measured at follow-up times ti. We assume the ti can take values in {1, . . . , T} and note that the exposure is observable at every time in {1, . . . , T}, while the outcome and subject-specific covariates are only observed on days that study subjects have clinical follow-up. We refer to the units of time as days, although other timescales can also be considered. We focus on the asymptotic properties of estimators for large n, keeping T fixed, since the number of subjects is the natural asymptotic scaling for a cohort study.

In Section 3.2, we describe a model for the random follow-up times ti. In Sections 3.3 and 3.4, respectively, we describe models for the xi and zi conditional on the ti. Finally, in Section 3.5 we describe a model for the yi conditional on the xi, zi, and ti.

3.2. Follow-Up Time

In many observational studies, including MESA, follow-up times are determined by clinic visit dates. We assume clinics make an effort to schedule multiple appointments on a subset of the available dates, resulting in clusters of subjects with the same follow-up times, as we see in the MESA data. It is impossible to know the exact procedure by which this occurs, so we adopt the following model. Assume the study participants are divided into clusters of varying sizes such that the cluster visit days are chosen independently of each other, and each subject is pre-determined to be in a particular cluster. Notice that under this model, there is no way of knowing from the data whether individuals with clinical follow-up on the same day are part of a shared cluster. Our analyses assume that they are, which can lead to slightly conservative inference since the number of independent clusters is underestimated.
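The clustered follow-up model can be illustrated with a small simulation. This is a minimal sketch, not the study's scheduling procedure: the function names are hypothetical, cluster visit days are assumed uniform on {1, . . . , T} (the assumption adopted later in Section 5), and cluster sizes of 1–10 are an illustrative choice. It also shows why treating same-day subjects as one cluster can only merge true clusters, never split them.

```python
import numpy as np

def simulate_followup_days(cluster_sizes, T, rng):
    """Hypothetical sketch of the follow-up-time model.

    Each cluster's visit day is drawn independently and uniformly from
    {1, ..., T}; every subject in the cluster shares that day.
    Returns one follow-up day per subject.
    """
    cluster_days = rng.integers(1, T + 1, size=len(cluster_sizes))
    # Repeat each cluster's day once for every member of that cluster.
    return np.repeat(cluster_days, cluster_sizes)

def clusters_from_days(t):
    """Working assumption used in the analyses: all subjects seen on the
    same day form one cluster. Returns integer cluster labels 0, 1, ..."""
    _, labels = np.unique(t, return_inverse=True)
    return labels.ravel()

rng = np.random.default_rng(0)
sizes = rng.integers(1, 11, size=300)      # illustrative cluster sizes 1-10
t = simulate_followup_days(sizes, T=546, rng=rng)
labels = clusters_from_days(t)
n_true, n_observed = len(sizes), labels.max() + 1
print(len(t), n_true, n_observed)          # n_observed <= n_true
```

With these settings several true clusters usually land on the same day, so the observed number of clusters falls below 300. This undercount is why the same-day assumption tends to give conservative inference.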

We have also evaluated the performance of our estimation methodology in simulations with alternative clustering mechanisms and with no clustering (i.e., independent follow-up times). The results are similar for different clustering mechanisms and the differences between methodologies are less pronounced when there is no clustering since there are fewer days without health outcomes (not shown).

3.3. Exposure Model

We assume there is a shared time series of exposures x(·) defined on t ∈ {1, . . . , T } such that conditional on ti we can write xi = x(ti) and

$$x(t_i) = g(t_i) + \eta(t_i), \tag{1}$$

where g(·) is a smooth function of time and η(·) is residual temporal variation.

An important question is whether to regard the function η(·) as deterministic or random variation around the temporal trend g(·). While it is conventional to regard η(·) as stochastic, say with the η(t) for t ∈ {1, . . . , T} i.i.d. normal with mean zero and variance ση² (Dominici et al., 2004), it is not clear what stochastic data-generating mechanism could underlie such a construction in a cohort study. Furthermore, even if it is appropriate to regard η(·) as stochastic, the assumption that there is no autocorrelation may be problematic.

We believe it is most natural to regard η(·) as deterministic and assume the sources of randomness in hypothetical repeated experiments are the choice of subjects in the cohort, their disease states, and the follow-up days ti on which their disease states are measured. In what follows, we consider the implications of treating η(·) as either deterministic or stochastic (i.i.d. normal as described above).

3.4. Subject-Specific Covariates

Some subject-specific covariates will have temporal structure (e.g., blood pressure) while others will be independent of time (e.g., height). To accommodate both types of covariate, we decompose the subject-specific covariates as zi = w(ti) + ζi, where w(·) is the temporal trend component and the ζi are independent of time and have mean zero. Notice that unlike our model for air pollution exposure, subject-specific covariates are not purely a function of time and we always model the residual term ζi as i.i.d. normal, with the stochasticity derived from random selection of subjects from the superpopulation.

3.5. Disease Model

Finally, we assume a linear disease model

$$y_i = x_i\beta_x + z_i\beta_z + f(t_i) + \varepsilon_i, \tag{2}$$

where βx is the parameter of interest, f(·) is a smooth function of time, and the εi are i.i.d. normal random variables with mean zero and variance σε². Our objective is to derive efficient and unbiased estimates of βx. We omit dependence on meteorology to simplify notation, but no substantive changes are required to include this in the analysis.

We observe each of the yi and zi and the corresponding follow-up times ti. We also assume we are able to measure the shared exposure time series x(·) without error, so that we know xi = x(ti). The challenge in estimating βx is to control for temporal confounding that manifests itself as a correlation between f (·) and the exposure time series x(·). If we observed f (ti) for each ti we could easily adjust for temporal confounding by including it in the regression model. Since we do not observe the f (ti), we need to assume a flexible structure for f (·) and exploit this structure to adjust for temporal confounding.

3.6. Regression Splines

We extend the framework in Dominici et al. (2004) and assume that f(·), g(·), and w(·) can be represented by regression splines with m1, m2, and m3 df, respectively. There is always some error in assuming that a smooth function can be fully represented by a particular regression spline basis, but if we allow sufficient df this error is relatively small. Let h1(·), h2(·), . . . be a possibly infinite sequence of orthogonal regression spline basis functions, and for any positive integer m let Hm(·) = (h1(·), . . . , hm(·)) be the vector-valued function comprising the first m basis functions. We can write f(·) = Hm1(·)γm1 and g(·) = Hm2(·)αm2 for some m1 × 1 and m2 × 1 vectors of coefficients γm1 and αm2, respectively, and w(·) = Hm3(·)δm3 for some m3 × r matrix of coefficients δm3.
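One way to obtain an orthonormal regression spline basis on the daily grid is sketched below. It rests on assumptions of its own: the function name `spline_basis` is hypothetical, it uses a cubic truncated-power basis with equally spaced interior knots, and it orthonormalizes the columns with a QR decomposition so that they are orthogonal over days 1, . . . , T. Unlike the idealized nested sequence h1(·), h2(·), . . ., bases built this way for different df are not nested, because the knot locations change with df.

```python
import numpy as np

def spline_basis(T, df):
    """Hypothetical sketch: orthonormal cubic regression spline basis on days 1..T.

    Builds a truncated-power basis (1, s, s^2, s^3 and df - 4 knot terms at
    equally spaced interior quantiles), then orthonormalizes the columns by QR
    so that H^T H = I on the daily grid. Row d-1 of the T x df result is H_m(d).
    """
    if df < 4:
        raise ValueError("a cubic basis needs at least 4 df")
    t = np.arange(1, T + 1, dtype=float)
    s = (t - t.mean()) / t.std()            # rescale for numerical stability
    cols = [np.ones_like(s), s, s**2, s**3]
    knots = np.quantile(s, np.linspace(0, 1, df - 2)[1:-1])
    cols += [np.clip(s - k, 0, None) ** 3 for k in knots]
    X = np.column_stack(cols)
    Q, _ = np.linalg.qr(X)                  # Gram-Schmidt in matrix form
    return Q

H = spline_basis(546, 12)                   # e.g. 12 df over 546 days
print(H.shape)
```

Because the intercept column is part of the span, any constant trend is captured exactly, which matters when the basis is used without a separate intercept.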

In practice we do not know how many df are needed to adequately describe f(·), g(·), or w(·). It is tempting to estimate these quantities from the data using a method such as generalized cross-validation or the Akaike Information Criterion (AIC). These methods have been applied for estimating the degree of smoothness in time series datasets where the residual variability is relatively small (Peng et al., 2006). However, such methods favor parsimony and may underestimate the required number of df if the smooth trends are difficult to identify in the data. In addition, a data-driven approach such as this requires using some or all of the data twice, making it difficult to estimate valid standard errors.

We prefer to determine m1, m2, and m3 based on scientific judgment about the degree of variability in the seasonal and meteorological trends in the outcome, exposure, and subject-specific covariates (Schwartz, 2006) and to assess the sensitivity of our findings by fitting the model with additional df (Peng et al., 2006). We assume we can estimate m1, m2, and m3 well based on scientific considerations, or at least that we have valid lower bounds for these quantities. Finally, to simplify the exposition we assume m3 ≤ min(m1, m2). The arguments that follow can be adapted easily to situations where m3 > min(m1, m2).

4. Semiparametric Regression Model

The first approach to adjusting for temporal confounding is semiparametric regression. As adapted by Dominici et al. (2004) for time-series studies, the semiparametric regression methodology is to estimate βx by OLS from the model

$$y_i = x_i\beta_x + z_i\beta_z + H_m(t_i)\gamma_m + \tilde\varepsilon_i \tag{3}$$

for some value of m. If m < m1 then ε̃i may not be identical to εi. Assuming the degrees of smoothness of f(t) and g(t) are known, two natural choices are to take m = m1 or m = m2, which correspond to including sufficient df in the disease model to account for the trend in the health outcome or the exposure, respectively. Dominici et al. (2004) demonstrate for time series studies that it is sufficient to take m = min(m1, m2). We generalize their development to cohort studies and explain why in this setting it is preferable to choose m ≥ m1.
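Estimating βx from (3) amounts to one OLS fit with the spline terms evaluated at each subject's follow-up day. The sketch below assumes a hypothetical function name `semipar_ols` and illustrative simulated data; for brevity it substitutes an orthonormalized low-order Fourier basis for the regression splines, which is a stand-in, not the basis used in the paper.

```python
import numpy as np

def semipar_ols(y, x, Z, t, H):
    """Estimate beta_x by OLS from model (3).

    y, x : length-n outcomes and exposures x_i = x(t_i)
    Z    : n x r subject-specific covariates (r may be 0)
    t    : length-n follow-up days, integers in 1..T
    H    : T x m matrix whose row d-1 is H_m(d); it should span the
           constant function, since no separate intercept is added
    Returns (beta_x_hat, classical standard error).
    """
    D = np.column_stack([x, Z, H[t - 1]])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    n, p = D.shape
    sigma2 = resid @ resid / (n - p)            # classical residual variance
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return coef[0], np.sqrt(cov[0, 0])

# Illustrative data (all settings hypothetical): a seasonal outcome trend
# that lies in the span of H, and an exposure sharing that seasonality.
rng = np.random.default_rng(1)
T, n = 365, 2000
days = np.arange(1, T + 1)
F = np.column_stack([np.ones(T)] + [fn(2 * np.pi * k * days / T)
                                    for k in (1, 2, 3) for fn in (np.sin, np.cos)])
H, _ = np.linalg.qr(F)                          # m = 7 orthonormal columns
f_trend = 3 * np.sin(2 * np.pi * days / T)
x_series = np.sin(2 * np.pi * days / T) + rng.normal(0, 0.7, T)
t = rng.integers(1, T + 1, size=n)
y = 0.5 * x_series[t - 1] + f_trend[t - 1] + rng.normal(0, 1, n)
beta_hat, se = semipar_ols(y, x_series[t - 1], np.empty((n, 0)), t, H)
print(beta_hat, se)
```

Here m is at least as large as the number of basis functions needed for the outcome trend, so the fit corresponds to the m ≥ m1 case, and the estimate should be close to the simulated value 0.5.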

4.1. Sufficient Degrees of Freedom to Account for the Trend in the Outcome (m = m1)

The analysis is straightforward if we set m = m1, since the model in (3) fully adjusts for f(·) = Hm1(·)γm1. We can rely on fixed covariate regression results, conditioning first on the ti and on η(·) if it is random, to conclude that there is no bias in estimating βx and that classical standard error estimates are valid.

4.2. Sufficient Degrees of Freedom to Account for the Trend in the Exposure (m = m2)

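Suppose we set m = m2 in a scenario where m1 is greater than m2 (the results from Section 4.1 are applicable if m2 is greater than or equal to m1). Define $\gamma_{m_1\setminus m_2} = (\gamma_{m_2+1}, \ldots, \gamma_{m_1})^T$ and the vector-valued function $H_{m_1\setminus m_2}(\cdot) = (h_{m_2+1}(\cdot), \ldots, h_{m_1}(\cdot))$. For fixed T and conditional on the ti, we define the n × (m1 − m2) matrix $\mathbf{H}_{m_1\setminus m_2} = (H_{m_1\setminus m_2}(t_1)^T, \ldots, H_{m_1\setminus m_2}(t_n)^T)^T$ and denote by $\bar{\mathbf{H}}_{m_1\setminus m_2}$ the corresponding matrix in the special case of n = T and ti = i, corresponding to exactly one observation per day. We similarly define $\mathbf{H}_{m_2}$ and $\bar{\mathbf{H}}_{m_2}$. Considering a fixed sequence of η(ti) (conditionally, if η(·) is random), we define $\boldsymbol\eta = (\eta(t_1), \ldots, \eta(t_n))^T$ and $\bar{\boldsymbol\eta} = (\eta(1), \ldots, \eta(T))^T$. We can now adapt an argument from Dominici et al. (2004) to show that as n → ∞,

$$E(\hat\beta_x - \beta_x) \xrightarrow{\text{a.s.}} \frac{\bar{\boldsymbol\eta}^T \bar{\mathbf{H}}_{m_1\setminus m_2}\gamma_{m_1\setminus m_2}}{\bar{\boldsymbol\eta}^T\left(I - \bar{\mathbf{H}}_{m_2}(\bar{\mathbf{H}}_{m_2}^T\bar{\mathbf{H}}_{m_2})^{-1}\bar{\mathbf{H}}_{m_2}^T\right)\bar{\boldsymbol\eta}}. \tag{4}$$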

See Web Appendix A for details.

The right-hand side of (4) is non-zero for a general deterministic function η(·), so β̂x is asymptotically biased. Symmetry implies the right-hand side of (4) has zero expectation if the η(t) for t ∈ {1, . . . , T} are i.i.d. normal with mean zero, in which case we conclude β̂x is asymptotically unbiased for large n. In an asymptotic analysis appropriate for time series studies but not cohort studies, Dominici et al. (2004) show the right-hand side of (4) converges to zero as the study duration, T, converges to infinity, even for fixed η(·).
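The contrast between a fixed and a random η(·) can be checked numerically by evaluating the right-hand side of (4) directly. This is an illustrative sketch under stated assumptions. The function name `asymptotic_bias` is hypothetical. A 13-column orthonormalized Fourier basis stands in for the splines, so m1 = 13 here. The values m2 = 5 and omitted coefficients equal to 2 are arbitrary choices.

```python
import numpy as np

def asymptotic_bias(eta_bar, H_full, m2, gamma_extra):
    """Evaluate the right-hand side of (4) for one residual exposure series.

    eta_bar     : length-T vector (eta(1), ..., eta(T))
    H_full      : T x m1 orthonormal basis on days 1..T; the first m2 columns
                  stand in for H_bar_{m2}, the remaining ones for the basis
                  functions h_{m2+1}, ..., h_{m1} that m = m2 leaves out
    gamma_extra : outcome-trend coefficients on those omitted functions
    """
    H2, H_extra = H_full[:, :m2], H_full[:, m2:]
    # Residual-maker matrix I - H2 (H2^T H2)^{-1} H2^T.
    M = np.eye(len(eta_bar)) - H2 @ np.linalg.solve(H2.T @ H2, H2.T)
    return (eta_bar @ H_extra @ gamma_extra) / (eta_bar @ M @ eta_bar)

T, m1, m2 = 546, 13, 5
days = np.arange(1, T + 1)
F = np.column_stack([np.ones(T)] + [fn(2 * np.pi * k * days / 365)
                                    for k in range(1, 7) for fn in (np.sin, np.cos)])
H_full, _ = np.linalg.qr(F)                  # 13 orthonormal columns
gamma_extra = np.full(m1 - m2, 2.0)
rng = np.random.default_rng(2)

# A fixed eta that happens to share structure with an omitted basis
# function: this bias does not shrink as n grows.
eta_fixed = 0.5 * np.sqrt(T) * H_full[:, m2] + rng.normal(0, 0.7, T)
bias_fixed = asymptotic_bias(eta_fixed, H_full, m2, gamma_extra)

# Averaging over i.i.d. mean-zero normal eta: eta and -eta give biases of
# opposite sign and equal size, so the average is near zero.
draws = [asymptotic_bias(rng.normal(0, 0.7, T), H_full, m2, gamma_extra)
         for _ in range(1000)]
bias_random = np.mean(draws)
print(round(bias_fixed, 3), round(bias_random, 4))
```

The first number is a bias that a single realization of η(·) carries into every replicate. The second is near zero only because it averages over realizations, which is the distinction drawn above.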

Standard error estimation is also an open problem for m = m2. Classical fixed covariate regression results do not apply since the bias is only eliminated by averaging over realizations of η(·), and random covariate regression methods with robust “sandwich” standard errors (White, 1980) do not apply since the shared random function η(·) induces correlation across all study subjects. We recommend selecting m based on m1, as in Section 4.1.

5. Pre-Adjusting the Exposure

We consider an alternative to semiparametric regression that can be more efficient if fitting (3) with sufficiently large m requires too many df relative to the available health data. The idea is to remove the temporal trend from the exposure time series and then estimate βx without further concern for confounding. This is particularly appealing when there are many days with exposure data on which there is no health data. We assume in this section that clinic visits are equally likely on each day in {1, . . . , T }, with obvious modifications for other follow-up day probability distributions.

We estimate g(·) by ĝ(·) = Hm(·)α̂m, where α̂m is the OLS estimate from fitting

$$x(\cdot) = H_m(\cdot)\alpha_m + \tilde\eta(\cdot)$$

based on the data {x(1), . . . , x(T)} and {Hm(1), . . . , Hm(T)}. If m < m2 then η̃(·) may not be identical to η(·). We define η̂(·) = x(·) − ĝ(·) to be the pre-adjusted exposure from which the estimated trend is removed, and we estimate βx by OLS from

$$y_i = \hat\eta(t_i)\beta_x + \hat g(t_i)\breve\beta_x + z_i\beta_z + \left(f(t_i) + \varepsilon_i\right), \tag{5}$$

regarding f(ti) + εi as the unobserved random noise. Equation (2) implies that (5) holds with β̆x = βx, but we estimate these quantities separately and only interpret β̂x, since we will show that for sufficiently large m it is not confounded by f(·).

For any two random functions of time φ(·) and ψ(·), we define stochastic orthogonality on {1, . . . , T} by $E\sum_{t=1}^{T}\phi(t)\psi(t) = 0$. A straightforward extension of Lemma 1 in White (1980) shows that β̂x estimated from (5) is strongly consistent for βx if η̂(·) is stochastically orthogonal to ĝ(·), f(·), and each element wk(·) of w(·) for k = 1, . . . , r.

It is always true that η̂(·) is stochastically orthogonal to ĝ(·), since $\sum_{t=1}^{T}\hat\eta(t)\hat g(t) = 0$ holds by construction. In the next two subsections, we give conditions on m to guarantee stochastic orthogonality with f(·) and the wk(·), and we discuss calculation of standard errors.
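The orthogonality that holds "by construction" follows from a projection identity. In this short derivation the symbols $\bar{\mathbf H}_m$ (the T × m matrix with rows Hm(1), . . . , Hm(T)), $\bar{\mathbf x} = (x(1), \ldots, x(T))^T$, $\bar{\mathbf h}_k = (h_k(1), \ldots, h_k(T))^T$ and $P$ are our own labels, introduced only for illustration.

```latex
\begin{aligned}
P &= \bar{\mathbf H}_m(\bar{\mathbf H}_m^T\bar{\mathbf H}_m)^{-1}\bar{\mathbf H}_m^T,
\qquad \hat{\mathbf g} = P\,\bar{\mathbf x},
\qquad \hat{\boldsymbol\eta} = (I - P)\,\bar{\mathbf x},\\
\sum_{t=1}^{T} \hat\eta(t)\,\hat g(t)
  &= \bar{\mathbf x}^T (I - P)^T P\,\bar{\mathbf x}
   = \bar{\mathbf x}^T (P - P^2)\,\bar{\mathbf x} = 0,\\
\sum_{t=1}^{T} \hat\eta(t)\,h_k(t)
  &= \bar{\mathbf x}^T (I - P)\,\bar{\mathbf h}_k = 0
  \qquad (k \le m,\ \text{since } P\,\bar{\mathbf h}_k = \bar{\mathbf h}_k).
\end{aligned}
```

Both lines use only that P is symmetric and idempotent. The second line is the fact invoked when m = m1 in Section 5.1: every basis function of f(·) is annihilated by I − P.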

5.1. Sufficient Degrees of Freedom to Account for the Trend in the Outcome (m = m1)

If we set m = m1, then by construction $\sum_{t=1}^{T}\hat\eta(t)h_k(t) = 0$ for k = 1, . . . , m1, from which it follows that η̂(·) is stochastically orthogonal to f(·), regardless of whether η(·) is fixed or random. Unlike the semiparametric regression model in Section 4.1, however, this is not sufficient to guarantee strong consistency of β̂x, due to the inclusion of time-varying subject-specific covariates zi in the model. We have assumed m1 ≥ m3, so similar logic guarantees that η̂(·) is stochastically orthogonal to the wk(·). If there is reason to believe that m3 > m1, then m should be chosen to be at least as large as m3. The required orthogonality conditions hold for fixed η(·), or conditionally if η(·) is random, so GEE standard errors (Liang and Zeger, 1986) can account for clusters of subjects with follow-ups on the same day.
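For a linear model with identity link, GEE with an independence working correlation gives the OLS point estimate together with a cluster-robust sandwich variance. The sketch below computes that variance directly, with clusters defined by shared follow-up day. Some details are assumptions: whether this simple form (no small-sample correction) matches the software behind the reported results, the function name `cluster_robust_se`, and the simulated settings.

```python
import numpy as np

def cluster_robust_se(D, y, clusters):
    """OLS fit with a cluster-robust ("sandwich") variance.

    D        : n x p design matrix
    y        : length-n outcomes
    clusters : length-n labels; here, subjects sharing a follow-up day
    Returns (coef, se) without any small-sample correction.
    """
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    bread = np.linalg.inv(D.T @ D)
    meat = np.zeros((D.shape[1], D.shape[1]))
    for c in np.unique(clusters):
        idx = clusters == c
        s = D[idx].T @ resid[idx]          # score contribution of cluster c
        meat += np.outer(s, s)
    cov = bread @ meat @ bread
    return coef, np.sqrt(np.diag(cov))

# Illustrative data: exposure and part of the residual are shared by all
# subjects seen on the same day (hypothetical settings).
rng = np.random.default_rng(3)
n, T = 3000, 546
t = rng.integers(1, T + 1, size=n)
day_effect = rng.normal(0, 1.0, T)         # e.g. an unadjusted shared trend
x = rng.normal(0, 1, T)[t - 1]
y = 0.5 * x + day_effect[t - 1] + rng.normal(0, 1, n)
D = np.column_stack([np.ones(n), x])
coef, se_cluster = cluster_robust_se(D, y, t)
coef_ind, se_hc0 = cluster_robust_se(D, y, np.arange(n))   # singleton clusters
print(se_cluster[1], se_hc0[1])
```

With singleton clusters the formula reduces to the usual heteroskedasticity-robust (HC0) variance. With same-day clusters it is larger here because both the exposure and part of the residual are shared within a day.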

5.2. Sufficient Degrees of Freedom to Account for the Trend in the Exposure (m = m2)

Suppose now that we set m = m2 in a scenario where m1 > m2. We cannot rely on the arguments from Section 5.1 for a fixed η(·) to conclude that β̂x is strongly consistent for βx. However, if we assume the η(t) for t ∈ {1, . . . , T} are i.i.d. random variables with mean zero and are independent of the ti, εi, and ζi, then it follows immediately that E η̂(t) = 0 for each t = 1, . . . , T and that η̂(·) is stochastically orthogonal to f(·) and the wk(·), so there is no asymptotic bias. However, similar to Section 4.2, it is not clear how to calculate standard errors in this setting because the estimated subject exposures η̂(ti) in (5) are all correlated with each other due to their shared dependence on η(·). Therefore, we recommend choosing m based on the smoothness of temporal trends in the outcome, as in Section 5.1.

5.3. GLS to Improve Efficiency

Comparing (3) and (5), we see a tradeoff in efficiency between semiparametric regression and pre-adjusting the exposure. If we pre-adjust the exposure, we can estimate β̂x from a model with m fewer degrees of freedom. However, this comes at the cost of adding f(·) to the unmeasured residual in the disease model, suggesting that it is better to use semiparametric regression when f(·) is large enough to be a precision variable. We now describe a strategy for improving the efficiency of estimates based on a pre-adjusted exposure in such situations.

When we estimate βx from (5) by OLS there is a loss of efficiency from weighting the contrasts between all subjects equally, especially if the shared component of the residual f (·) is relatively large. It is preferable to assign larger weights to contrasts for pairs of subjects i and j such that f (ti) and f (tj) are similar, which is analogous to assigning larger weights to within-subject contrasts in a crossover experiment (Diggle et al., 2002, p. 63). One way of achieving this in our setting is by GLS estimation with a suitably chosen reweighting matrix.

Since we do not explicitly estimate the spline coefficients for f (·), we cannot immediately calculate optimal weights. However, we can construct approximate weights based on an estimate of the average magnitude of the γk, which is obtained by regarding the γk as random effects in a linear mixed effects model. We emphasize that the γk are fixed in a given geographic region, so we do not posit a random data-generating mechanism, but we can still formally derive a mixed effects model by regarding the γk as exchangeable (Gelman, 2005; Hoff, 2009; Hodges and Reich, 2010). It may be that some of the γk represent seasonality and others represent meteorology, in which case we allow different random effect variances for the distinct groups of exchangeable coefficients.

Take m = m1 as in Section 5.1 and consider the formal mixed model

$$y_i = \hat\eta(t_i)\beta_x + \hat g(t_i)\breve\beta_x + z_i\beta_z + H_m(t_i)\gamma_m + \varepsilon_i, \tag{6}$$

where unlike our treatment of (3) we regard γm as a random effect. If all components of γm are exchangeable (e.g., if they are all coefficients for temporal spline functions), the random effect model has a diagonal homoscedastic covariance matrix. If there are multiple groups of exchangeable coefficients in γm we allow separate homoscedastic diagonal covariance matrices for each group, with a separate variance estimate for each group.

Let W⁻¹ be the marginal covariance matrix for the yi based on estimated variances of γm and the εi. This can be obtained by restricted maximum likelihood (REML) using standard software such as the nlme package (Pinheiro et al., 2010) in R (R Development Core Team, 2010). We obtain β̂x by fitting (5) using GLS with weight matrix W. GLS is more efficient than OLS because it takes advantage of the shared residual structure between observations.
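A minimal sketch of the GLS step for a single group of exchangeable coefficients follows. It takes the variance components as known, whereas in the text they are estimated by REML, and that step is not reproduced here. The function name `pre_gls` and the simulated settings are hypothetical. Solving linear systems with W⁻¹ avoids forming W explicitly; for large n a Woodbury-type identity would be cheaper.

```python
import numpy as np

def pre_gls(y, D, Hi, sigma2_gamma, sigma2_eps):
    """GLS estimate of the coefficients in (5) with weight matrix W, where
    W^{-1} = sigma2_gamma * Hi Hi^T + sigma2_eps * I is the marginal
    covariance implied by the formal mixed model (6) with one exchangeable
    group of spline coefficients.

    D  : n x p design for (5), e.g. columns eta_hat(t_i), g_hat(t_i), z_i
    Hi : n x m matrix with rows H_m(t_i)
    """
    n = len(y)
    V = sigma2_gamma * Hi @ Hi.T + sigma2_eps * np.eye(n)   # W^{-1}
    VinvD = np.linalg.solve(V, D)                 # W D without inverting V
    Vinvy = np.linalg.solve(V, y)                 # W y
    return np.linalg.solve(D.T @ VinvD, D.T @ Vinvy)

# Illustrative data (hypothetical settings).
rng = np.random.default_rng(4)
n, T, m = 400, 120, 5
days = np.arange(1, T + 1)
Hfull, _ = np.linalg.qr(np.column_stack([(days / T) ** k for k in range(m)]))
t = rng.integers(1, T + 1, size=n)
Hi = Hfull[t - 1]
eta = rng.normal(size=T)
D = np.column_stack([eta[t - 1], np.ones(n)])
y = 0.5 * D[:, 0] + Hi @ rng.normal(0, 3, m) + rng.normal(size=n)
beta_gls = pre_gls(y, D, Hi, sigma2_gamma=9.0, sigma2_eps=1.0)
print(beta_gls)
```

Setting sigma2_gamma to zero makes W proportional to the identity, so the estimate reduces to OLS. This is a convenient sanity check on an implementation.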

Point estimates from GLS are identical to those from directly fitting the mixed model in (6), so little additional programming is required. However, standard errors based on treating the fixed γk as if they were stochastic cannot be assumed to be valid, and in some of our simulations they underestimate the variability of β̂x (not shown). Therefore, we emphasize the GLS interpretation and further discuss standard error estimation in Web Appendix B.

6. Simulations

We simulate data according to (1) and (2) with βx = 0.5 and no subject-specific covariates. The time period is T = 546 days from September 2002 through February 2004 in six regions denoted by R = 0, . . . , 5. We consider cohorts with 1527 clusters (as in MESA) or 300 clusters (smaller cohort) of average size 3 (range 1–10). The cluster sizes are based on the distinct groups of MESA subjects with follow-ups on the same date.

The temporal trend for the exposure is

$$g(t) = 0.87\sin\left(\frac{2\pi}{365}(t + 60R)\right),$$

which corresponds to a seasonal annual pattern with a different phase in each region. The outcome trend has the inverse seasonal structure plus additional finer scale structure

$$f(t) = \alpha\left[-0.35\sin\left(\frac{2\pi}{365}(t + 60R)\right) - 0.48\sin\left(\frac{8\pi}{365}(t - 60R)\right)\right].$$

We set the outcome trend multiplier α = 1, 10, 20, 30, with α = 1 corresponding to the magnitude of seasonal variation observed in the MESA data. Using B-splines, visual inspection shows that 7 df per year in each region are adequate to accurately model g(·) (m2 = 63) while 14 df per year in each region are required for f (·) (m1 = 126) (not shown).
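The df totals follow from the study length. The arithmetic below assumes the per-year df are scaled by the 546/365 ≈ 1.5 years of follow-up in each of the six regions and rounded:

```latex
\frac{546}{365} \approx 1.496 \text{ years}, \qquad
m_2 \approx 7 \times 1.496 \times 6 \approx 62.8 \;\Rightarrow\; 63, \qquad
m_1 \approx 14 \times 1.496 \times 6 \approx 125.7 \;\Rightarrow\; 126.
```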

The residuals in the health model are independent Gaussian random variables with σε² = 124. The non-smooth part of the exposure η(·) is generated as independent Gaussian random variables with ση² = 0.48. The values of ση² and σε² are based on corresponding residual variances in our analysis of the MESA data. We primarily report results based on a single realization of η(·) across Monte Carlo simulations, which we have argued in Section 3.3 is more scientifically plausible than the alternative scenario of independent realizations in each Monte Carlo simulation. The results with random η(·) are similar to what we report below, except that there is no bias even if we only use 7 df per year for seasonal adjustment.

Example realizations of simulated data are shown in Figure 1. Simulation results with 5000 Monte Carlo simulations in each scenario are reported in Tables 1 (7 df per year) and 2 (14 df per year). We report relative bias, the observed standard deviation (SD) of β̂x, the mean estimated standard error (SE) of β̂x, and coverage of 95% confidence intervals (CIs). No adjustment is denoted by NO-ADJ, semiparametric adjustment is denoted by SEMIPAR, and pre-adjustment followed by OLS and GLS are denoted by PRE-OLS and PRE-GLS, respectively. For NO-ADJ and SEMIPAR we use classical standard error estimates, and for PRE-OLS and PRE-GLS we use GEE standard error estimates. We also consider, as a variant, using GEE standard error estimates with SEMIPAR.
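The summary statistics in the tables can be computed from the Monte Carlo output as sketched below. Two details are assumptions because the text does not spell them out: relative bias is taken as (mean estimate − βx)/βx, and coverage uses Wald intervals of the form estimate ± 1.96 SE. The function name `summarize` is hypothetical, and the parenthesized numbers beside relative bias in the tables are not reproduced.

```python
import numpy as np

def summarize(estimates, ses, beta_true):
    """Monte Carlo summaries for one method and scenario.

    estimates, ses : beta_x_hat and its estimated SE from each replicate
    beta_true      : the simulated value of beta_x
    """
    estimates, ses = np.asarray(estimates), np.asarray(ses)
    lower, upper = estimates - 1.96 * ses, estimates + 1.96 * ses
    return {
        "rel_bias": (estimates.mean() - beta_true) / beta_true,  # assumed definition
        "sd": estimates.std(ddof=1),          # observed SD of the estimates
        "mean_se": ses.mean(),                # E(SE) column
        "coverage": np.mean((lower <= beta_true) & (beta_true <= upper)),
    }

# Usage with hypothetical replicate output.
rng = np.random.default_rng(6)
est = rng.normal(0.5, 0.1, size=5000)
print(summarize(est, np.full(5000, 0.1), beta_true=0.5))
```

Comparing "sd" with "mean_se" is what reveals invalid standard errors in the tables: when E(SE) falls well below the observed SD, coverage drops below 95%.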

Figure 1.


Simulated exposure and outcome data for a single region from a cohort with 1527 clusters across six regions (total of 781 subjects in this region). The gray dots are simulated observations, and the black curves are the assumed underlying trends.

Table 1.

Simulation results with 7 degrees of freedom per year, based on 5000 Monte Carlo realizations for each scenario

Columns 2 to 5: small study population (300 clusters); columns 6 to 9: large study population (1527 clusters).

| Method | Rel. bias (MC SE) | SD | E(SE) | 95% CI coverage | Rel. bias (MC SE) | SD | E(SE) | 95% CI coverage |
|---|---|---|---|---|---|---|---|---|
| Outcome trend multiplier α = 1 | | | | | | | | |
| NO-ADJ | 0.37 (0.012) | 0.41 | 0.40 | 92% | 0.36 (0.005) | 0.17 | 0.18 | 83% |
| SEMIPAR (7 df/year) | 0.01 (0.018) | 0.65 | 0.63 | 95% | 0.00 (0.007) | 0.25 | 0.25 | 95% |
| SEMIPAR + GEE (7 df/year) | 0.01 (0.018) | 0.65 | 0.55 | 91% | 0.00 (0.007) | 0.25 | 0.24 | 94% |
| PRE-OLS (7 df/year) | 0.00 (0.016) | 0.56 | 0.54 | 94% | 0.00 (0.007) | 0.24 | 0.24 | 95% |
| PRE-GLS (7 df/year) | 0.00 (0.016) | 0.56 | 0.54 | 94% | 0.00 (0.007) | 0.24 | 0.24 | 95% |
| Outcome trend multiplier α = 10 | | | | | | | | |
| NO-ADJ | 3.58 (0.014) | 0.49 | 0.42 | 2% | 3.58 (0.006) | 0.21 | 0.19 | 0% |
| SEMIPAR (7 df/year) | −0.04 (0.021) | 0.73 | 0.65 | 92% | −0.05 (0.008) | 0.28 | 0.26 | 92% |
| SEMIPAR + GEE (7 df/year) | −0.04 (0.021) | 0.73 | 0.62 | 91% | −0.05 (0.008) | 0.28 | 0.28 | 94% |
| PRE-OLS (7 df/year) | −0.07 (0.019) | 0.66 | 0.64 | 94% | −0.06 (0.008) | 0.28 | 0.29 | 96% |
| PRE-GLS (7 df/year) | −0.07 (0.019) | 0.66 | 0.63 | 94% | −0.06 (0.008) | 0.28 | 0.29 | 96% |
| Outcome trend multiplier α = 20 | | | | | | | | |
| NO-ADJ | 7.15 (0.019) | 0.67 | 0.48 | 0% | 7.15 (0.008) | 0.29 | 0.21 | 0% |
| SEMIPAR (7 df/year) | −0.09 (0.026) | 0.93 | 0.70 | 86% | −0.11 (0.010) | 0.36 | 0.28 | 87% |
| SEMIPAR + GEE (7 df/year) | −0.09 (0.026) | 0.93 | 6.50 | 91% | −0.11 (0.010) | 0.36 | 0.38 | 96% |
| PRE-OLS (7 df/year) | −0.15 (0.025) | 0.88 | 0.87 | 94% | −0.11 (0.011) | 0.37 | 0.42 | 97% |
| PRE-GLS (7 df/year) | −0.14 (0.025) | 0.88 | 0.85 | 94% | −0.11 (0.010) | 0.36 | 0.39 | 96% |
| Outcome trend multiplier α = 30 | | | | | | | | |
| NO-ADJ | 10.71 (0.025) | 0.90 | 0.58 | 0% | 10.73 (0.011) | 0.39 | 0.25 | 0% |
| SEMIPAR (7 df/year) | −0.14 (0.034) | 1.20 | 0.79 | 80% | −0.16 (0.013) | 0.47 | 0.32 | 81% |
| SEMIPAR + GEE (7 df/year) | −0.14 (0.034) | 1.20 | 1.02 | 91% | −0.16 (0.013) | 0.47 | 0.51 | 96% |
| PRE-OLS (7 df/year) | −0.23 (0.033) | 1.16 | 1.16 | 94% | −0.17 (0.014) | 0.49 | 0.57 | 97% |
| PRE-GLS (7 df/year) | −0.21 (0.033) | 1.17 | 1.10 | 93% | −0.17 (0.013) | 0.47 | 0.51 | 96% |

The exposure deviations $\eta_t$ are fixed across all Monte Carlo realizations. For each simulation scenario and estimation method, we report the mean relative bias in estimating $\beta_x = -0.5$ with its Monte Carlo standard error in parentheses, the empirical standard deviation of $\hat\beta_x$ (SD), the mean estimated standard error (E(SE)), and the coverage of 95% Wald confidence intervals. No adjustment is denoted by NO-ADJ, semiparametric adjustment by SEMIPAR, and pre-adjustment followed by OLS or GLS by PRE-OLS and PRE-GLS, respectively. SEMIPAR + GEE refers to the variant that uses GEE standard error estimates with SEMIPAR estimation.

6.1. Bias and Confidence Interval Coverage

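There is substantial bias in $\hat\beta_x$ when no seasonal adjustment is made, and it grows with α (the NO-ADJ relative bias rises from about 0.37 at α = 1 to about 10.7 at α = 30). The bias is essentially eliminated, to within Monte Carlo error, by SEMIPAR, PRE-OLS, and PRE-GLS when we use 14 df per year in each region, the number of df required to account for seasonality in the outcome. Some residual bias remains if we use only 7 df per year, especially for larger values of α (relative bias between −0.14 and −0.23 at α = 30).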
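A short omitted-variable calculation suggests why the NO-ADJ bias grows in proportion to α. Assume, for this sketch, a health model $y_i = \beta_0 + \beta_x x_i + f(t_i) + \varepsilon_i$ with independent mean-zero errors, and that the exposures $x_i$ (trend plus the fixed realization of η) and the exam dates $t_i$ are held fixed across Monte Carlo realizations. Regressing y on x alone then gives

$$E\big[\hat\beta_x^{\text{NO-ADJ}}\big] = \beta_x + \frac{\sum_i (x_i - \bar x)\, f(t_i)}{\sum_i (x_i - \bar x)^2}, \qquad \text{relative bias} = \frac{1}{\beta_x}\,\frac{\sum_i (x_i - \bar x)\, f(t_i)}{\sum_i (x_i - \bar x)^2} \;\propto\; \alpha,$$

because f is α times a fixed function of time. This is consistent with the near-linear growth of the NO-ADJ rows in Tables 1 and 2 (0.37, 3.58, 7.15, and 10.71 for α = 1, 10, 20, and 30 with 300 clusters).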

All three adjustment approaches have good inferential properties when we use 14 df/year. The mean estimated SEs are close to the observed SDs, and we see nearly nominal coverage for 95% CIs. SE estimates are slightly conservative for PRE-OLS with 1527 clusters. This may be attributed to our data-based determination of which subjects to include in a cluster.

The SE estimates from SEMIPAR with 7 df/year are too small for larger values of α, resulting in undercoverage of 95% CIs (80% to 87% for α = 20 and 30). GEE SEs improve the results, but bias remains, and numerical instability is a concern because the number of independent clusters is small relative to the df in the SEMIPAR model. In particular, standard software fails to calculate GEE standard errors in a small number of our 5000 realizations (7 df/year: 1 with 300 clusters and 0 with 1527 clusters; 14 df/year: 81 with 300 clusters and 1 with 1527 clusters). We therefore report SEMIPAR + GEE results only for 7 df/year, excluding the one problematic realization. The SE estimates from PRE-OLS and PRE-GLS remain accurate with 7 df/year and, despite the residual bias, 95% CI coverage is close to nominal.
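To make the contrast between classical and GEE standard errors concrete, here is a minimal sketch of cluster-robust (sandwich) standard errors for an OLS slope, the independence-working-correlation special case of the GEE variance estimator (Liang and Zeger, 1986; White, 1980). It handles a single exposure plus intercept in pure Python, applies no small-sample correction, and is not the software used for the simulations; the function names are illustrative. The "meat" matrix is a sum of one rank-one term per cluster, so when the number of clusters is small relative to the number of regression coefficients it is poorly estimated or singular, which is the instability described above; this two-coefficient toy does not exhibit that problem.

```python
import math
from collections import defaultdict


def ols_with_inverse(x, y):
    """OLS of y on [1, x]; returns intercept, slope and (X'X)^{-1} as a 2x2 list."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    return b0, b1, inv


def slope_standard_errors(x, y, clusters):
    """Slope estimate with classical and cluster-robust (sandwich) standard errors."""
    b0, b1, inv = ols_with_inverse(x, y)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes independent errors with constant variance.
    sigma2 = sum(e * e for e in resid) / (len(x) - 2)
    classical = math.sqrt(sigma2 * inv[1][1])
    # Per-cluster score sums X_c' e_c; errors may be correlated within a cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, e, c in zip(x, resid, clusters):
        scores[c][0] += e
        scores[c][1] += xi * e
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)] for i in range(2)]
    # Sandwich V = (X'X)^{-1} meat (X'X)^{-1}; only the slope entry V[1][1] is needed.
    v11 = sum(inv[1][i] * meat[i][j] * inv[j][1] for i in range(2) for j in range(2))
    return b1, classical, math.sqrt(v11)


# Usage with toy data: five subjects, each its own cluster.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.5, 2.0, 4.5, 4.0]
b, se_classical, se_robust = slope_standard_errors(x, y, clusters=range(5))

# Duplicate every record: copies that share a cluster carry no new information...
x2, y2 = x + x, y + y
b_dup, _, se_robust_dup = slope_standard_errors(x2, y2, clusters=list(range(5)) * 2)
# ...but treating the copies as independent understates the standard error.
_, _, se_naive_dup = slope_standard_errors(x2, y2, clusters=range(10))
```

In this toy example the clustered sandwich SE for the duplicated data equals the original one, while the naive version is smaller by a factor of √2, which is the kind of understatement that cluster-robust errors are meant to guard against.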

6.2. Relative Efficiency

We focus on simulations with 300 clusters in which we use 14 df per year for adjustment (the 300-cluster columns of Table 2). Similar patterns in relative efficiency are evident when we simulate a larger cohort with 1527 clusters, although the differences are considerably smaller. Because of the residual bias with 7 df per year, relative efficiency is of less interest in that setting, but the general patterns are similar, if somewhat less pronounced.

Table 2.

Simulation results with 14 degrees of freedom per year, based on 5000 Monte Carlo realizations for each scenario

Columns 2 to 5: small study population (300 clusters); columns 6 to 9: large study population (1527 clusters).

| Method | Rel. bias (MC SE) | SD | E(SE) | 95% CI coverage | Rel. bias (MC SE) | SD | E(SE) | 95% CI coverage |
|---|---|---|---|---|---|---|---|---|
| Outcome trend multiplier α = 1 | | | | | | | | |
| NO-ADJ | 0.37 (0.012) | 0.41 | 0.40 | 92% | 0.36 (0.005) | 0.17 | 0.18 | 83% |
| SEMIPAR (14 df/year) | 0.01 (0.022) | 0.78 | 0.77 | 94% | 0.00 (0.007) | 0.26 | 0.26 | 95% |
| PRE-OLS (14 df/year) | 0.01 (0.016) | 0.56 | 0.54 | 94% | 0.00 (0.007) | 0.24 | 0.24 | 95% |
| PRE-GLS (14 df/year) | 0.01 (0.016) | 0.56 | 0.54 | 94% | 0.00 (0.007) | 0.24 | 0.24 | 95% |
| Outcome trend multiplier α = 10 | | | | | | | | |
| NO-ADJ | 3.58 (0.014) | 0.49 | 0.42 | 2% | 3.58 (0.006) | 0.21 | 0.19 | 0% |
| SEMIPAR (14 df/year) | 0.01 (0.022) | 0.78 | 0.77 | 94% | 0.01 (0.007) | 0.26 | 0.26 | 95% |
| PRE-OLS (14 df/year) | −0.01 (0.019) | 0.66 | 0.64 | 94% | 0.00 (0.008) | 0.28 | 0.30 | 96% |
| PRE-GLS (14 df/year) | 0.00 (0.019) | 0.66 | 0.63 | 94% | 0.00 (0.007) | 0.25 | 0.25 | 95% |
| Outcome trend multiplier α = 20 | | | | | | | | |
| NO-ADJ | 7.15 (0.019) | 0.67 | 0.48 | 0% | 7.15 (0.008) | 0.29 | 0.21 | 0% |
| SEMIPAR (14 df/year) | 0.01 (0.022) | 0.78 | 0.77 | 94% | 0.01 (0.007) | 0.26 | 0.26 | 95% |
| PRE-OLS (14 df/year) | 0.04 (0.025) | 0.90 | 0.88 | 94% | 0.00 (0.011) | 0.38 | 0.43 | 97% |
| PRE-GLS (14 df/year) | 0.00 (0.021) | 0.74 | 0.69 | 94% | 0.01 (0.007) | 0.26 | 0.26 | 95% |
| Outcome trend multiplier α = 30 | | | | | | | | |
| NO-ADJ | 10.71 (0.025) | 0.90 | 0.58 | 0% | 10.73 (0.011) | 0.39 | 0.25 | 0% |
| SEMIPAR (14 df/year) | 0.02 (0.022) | 0.78 | 0.77 | 94% | 0.01 (0.007) | 0.26 | 0.26 | 95% |
| PRE-OLS (14 df/year) | 0.06 (0.034) | 1.18 | 1.18 | 94% | 0.00 (0.014) | 0.50 | 0.58 | 98% |
| PRE-GLS (14 df/year) | 0.01 (0.021) | 0.75 | 0.71 | 93% | 0.01 (0.007) | 0.26 | 0.26 | 95% |

The exposure deviations $\eta_t$ are fixed across all Monte Carlo realizations. For each simulation scenario and estimation method, we report the mean relative bias in estimating $\beta_x = -0.5$ with its Monte Carlo standard error in parentheses, the empirical standard deviation of $\hat\beta_x$ (SD), the mean estimated standard error (E(SE)), and the coverage of 95% Wald confidence intervals. No adjustment is denoted by NO-ADJ, semiparametric adjustment by SEMIPAR, and pre-adjustment followed by OLS or GLS by PRE-OLS and PRE-GLS, respectively.

We first consider scenarios with relatively small magnitudes of trend in the outcome. With α = 1, the SD of $\hat\beta_x$ is 0.78 using SEMIPAR and 0.56 using PRE-OLS and PRE-GLS, which implies that the relative efficiency (ratio of variances) of SEMIPAR compared to either PRE-OLS or PRE-GLS is 0.52. Similarly, with α = 10, the relative efficiency of SEMIPAR compared to either PRE-OLS or PRE-GLS is 0.72. Consistent with our expectations, we gain efficiency by pre-adjusting the exposure, and there is no benefit from using GLS rather than OLS, since the trend is not an important precision variable.

Turning now to scenarios with larger magnitudes of trend in the outcome, PRE-GLS is consistently the most efficient analysis. Relative to PRE-GLS, the efficiency of PRE-OLS is 0.68 for α = 20 and 0.40 for α = 30, and the efficiency of SEMIPAR is 0.90 for α = 20 and 0.92 for α = 30. Thus, consistent with our expectations, ignoring the trend as a precision variable makes PRE-OLS less efficient than either SEMIPAR or PRE-GLS. Furthermore, PRE-GLS is slightly more efficient than SEMIPAR when it is important to take advantage of the structure in the trend.
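Relative efficiency here is a ratio of variances, i.e., the squared ratio of empirical SDs. A short sketch computing it for each method against the most efficient method in each scenario, using the SDs for 300 clusters and 14 df/year from Table 2; the function and variable names are illustrative.

```python
def relative_efficiency(sd_method, sd_reference):
    """Efficiency of a method relative to a reference: Var(reference) / Var(method)."""
    return (sd_reference / sd_method) ** 2


# Empirical SDs of beta_hat from Table 2 (300 clusters, 14 df/year), keyed by alpha.
EMPIRICAL_SD = {
    1: {"SEMIPAR": 0.78, "PRE-OLS": 0.56, "PRE-GLS": 0.56},
    10: {"SEMIPAR": 0.78, "PRE-OLS": 0.66, "PRE-GLS": 0.66},
    20: {"SEMIPAR": 0.78, "PRE-OLS": 0.90, "PRE-GLS": 0.74},
    30: {"SEMIPAR": 0.78, "PRE-OLS": 1.18, "PRE-GLS": 0.75},
}

# Efficiency of each method relative to the most efficient method in that scenario.
re_table = {}
for alpha, sds in EMPIRICAL_SD.items():
    best = min(sds.values())
    re_table[alpha] = {m: round(relative_efficiency(sd, best), 2) for m, sd in sds.items()}
    print(f"alpha = {alpha}: {re_table[alpha]}")
```

Because the tabulated SDs are rounded to two decimals, efficiencies computed this way can differ in the second decimal place from values computed on unrounded simulation output.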

7. Application to Retinal Arteriolar Data

We reanalyze the data from Adar et al. (2010) to compare the impact of different temporal adjustment methods. We adjust for the full set of subject-specific covariates and use SEMIPAR, PRE-OLS, and PRE-GLS to adjust for calendar date, temperature, and relative humidity, each modeled with separate B-splines interacted with region. We also include a day-of-week term interacted with region. We vary the df in each region from 0 to 20 per year for calendar date (seasonality) and from 0 to 9 for the meteorology variables. For PRE-GLS, the spline terms for calendar date, temperature, and relative humidity are treated as independent random effects in the mixed model formulation.

Figure 3a suggests that results are minimally sensitive to the method of adjustment for temporal confounding. This is consistent with Figure 2, since the trend in the outcome appears small compared to the overall variability, similar to α = 1 in our simulations. Closer examination of the results in Figure 3a, however, reveals efficiency gains. If we follow Adar et al. (2010) and use 12 df per year for calendar time and 6 df for meteorology in each region, the relative efficiency of SEMIPAR compared to PRE-OLS or PRE-GLS is 0.76. The 95% confidence interval for SEMIPAR crosses zero, while the confidence intervals for PRE-OLS and PRE-GLS do not, so our proposed methodology yields a statistically significant association whereas SEMIPAR does not. Of course, great care is needed in interpreting such marginally significant findings. To further illustrate the efficiency gains, Figure 3b shows results for a randomly selected subset of 1000 MESA subjects. With the same df as above, the relative efficiency of SEMIPAR compared to PRE-OLS or PRE-GLS is 0.69.
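Since confidence interval width is proportional to the standard error, a relative efficiency RE (a ratio of variances) corresponds to a ratio of interval widths of $1/\sqrt{\mathrm{RE}}$. A rough worked conversion of the two relative efficiencies quoted in this section, treating the estimated standard errors as exact:

$$\frac{\text{CI width}_{\text{SEMIPAR}}}{\text{CI width}_{\text{PRE}}} = \frac{1}{\sqrt{\mathrm{RE}}}: \qquad \frac{1}{\sqrt{0.76}} \approx 1.15 \ \ (\text{full cohort}), \qquad \frac{1}{\sqrt{0.69}} \approx 1.20 \ \ (\text{1000-subject subset}).$$

That is, the SEMIPAR intervals are roughly 15% and 20% wider than those obtained after pre-adjustment.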

Figure 3.


Estimated increase in CRAE (μm) corresponding to a 10 μg/m³ increase in daily PM2.5 concentration. Semiparametric regression and detrended exposure with and without shrinkage are employed with varying degrees of freedom to control for meteorology (separate splines for relative humidity and temperature in each zone) and calendar time (separate splines in each city). Solid lines are point estimates and dashed lines are 95% confidence intervals. Semiparametric adjustment is denoted by SEMIPAR, and pre-adjustment followed by OLS or GLS is denoted by PRE-OLS and PRE-GLS, respectively. Point estimates for PRE-OLS and PRE-GLS are indistinguishable. (a) Full MESA cohort (4,607 subjects). (b) Randomly selected subset of the MESA cohort (1,000 subjects).

8. Discussion

The need to adjust for temporal confounding in estimating acute air pollution effects is well known, especially in time series studies. Extending time series methods to air pollution cohort data requires some care because of differences in data availability and in plausible assumptions about randomness. A noteworthy difference is that air pollution cohort studies often include exposure data on days with no outcome data. We demonstrate that pre-adjusting the exposure, rather than fitting a semiparametric regression model, can increase efficiency by using these additional exposure data. The approach can be improved further by estimating the health effect parameter of interest using GLS with a weight matrix determined by a formal random effects model.
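The core of pre-adjustment can be sketched in a few lines. This is a minimal single-region illustration under stated assumptions, not the authors' implementation: a Fourier (sine/cosine) basis with a 365-day period stands in for the region-specific B-splines, there are no subject-level covariates, only PRE-OLS is shown (the GLS step is omitted), and all function names and data are hypothetical. The point it shows is that the exposure trend is estimated from every day with an exposure measurement, while the residual exposure is used only on days with outcome data.

```python
import math
import random


def trend_basis(t, n_harmonics, period=365.0):
    """Intercept plus sine/cosine pairs; a stand-in for the paper's B-spline basis."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * t / period
        row += [math.sin(w), math.cos(w)]
    return row


def solve_linear(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def pre_adjust(days, exposure, n_harmonics):
    """Regress exposure on the trend basis using ALL exposure days; return residuals by day."""
    rows = [trend_basis(t, n_harmonics) for t in days]
    p = len(rows[0])
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    btx = [sum(r[i] * x for r, x in zip(rows, exposure)) for i in range(p)]
    coef = solve_linear(btb, btx)
    return {t: x - sum(c * v for c, v in zip(coef, r)) for t, x, r in zip(days, exposure, rows)}


def pre_ols(exam_days, outcomes, residual_exposure):
    """PRE-OLS: OLS slope (with intercept) of outcome on pre-adjusted exposure at exam days."""
    xs = [residual_exposure[t] for t in exam_days]
    mx = sum(xs) / len(xs)
    my = sum(outcomes) / len(outcomes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, outcomes))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: two years of daily exposure, outcomes observed on only 150 days.
rng = random.Random(0)
days = list(range(730))
exposure = [1.0 + 0.8 * math.sin(2 * math.pi * t / 365) + rng.gauss(0.0, 0.7) for t in days]
residual = pre_adjust(days, exposure, n_harmonics=3)
exam_days = sorted(rng.sample(days, 150))
outcomes = [-0.5 * exposure[t] + 2.0 * math.cos(2 * math.pi * t / 365) + rng.gauss(0.0, 1.0)
            for t in exam_days]
beta_hat = pre_ols(exam_days, outcomes, residual)
```

Over the full set of exposure days the residual exposure is orthogonal to the basis, so any part of the outcome trend lying in the basis span drops out of the PRE-OLS slope when outcomes are available every day; with outcomes on only a subset of days that orthogonality holds only approximately on those days.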

Our simulation studies suggest that the advantage of pre-adjustment is most pronounced when two conditions are met: (i) the cohort is relatively small (in particular, smaller than MESA), so that there are no health outcome data on most days in the study period, and (ii) the magnitude of the temporal trend in the outcome is small compared to the overall variability. When the trend in the outcome is larger, the temporal adjustment terms act as precision variables in the semiparametric model, so pre-adjustment followed by OLS, which does not include them in the health model, is less efficient. However, even in that situation pre-adjustment with GLS is at least as efficient as the standard semiparametric approach. We therefore recommend pre-adjustment followed by GLS in smaller cohort studies where there is reason to be concerned about the number of degrees of freedom required to adjust robustly for temporal confounding.

Our development emphasizes the importance of adjusting for temporal confounding with a sufficiently rich model to account for temporal trends in the outcome, and ideally any subject-specific covariates with temporal structure. Inter-subject variability presents a challenge for estimating the required model richness from cohort data, so it is even more important than in time series studies to rely on prior scientific knowledge and to conduct sensitivity analyses with different levels of temporal adjustment. While we have focused on cross-sectional cohort studies, similar issues can arise if longitudinal cohort data or certain types of panel study data are used to study acute air pollution health effects (Dominici et al., 2003).

Hodges and Reich (2010) point out the danger of introducing bias by using a formal mixed effects model as a device for smoothing when there is no random effect in the data-generating mechanism. Our GLS approach might appear to carry the same risk, but it does not, because by construction the pre-adjusted exposure is orthogonal to the random effect basis functions and to the other covariates in the model in (6). If we were instead to use shrinkage or penalization directly in the semiparametric regression model in (3), bias of the kind described by Hodges and Reich (2010) could arise.

A recent article proposed Bayesian adjustment for confounding (BAC), a form of model averaging, to parsimoniously adjust for confounding in time series studies (Wang, Parmigiani, and Dominici, 2012). A salient feature of BAC is joint estimation of the exposure and disease models, in contrast to our two-stage approach. The confounding adjustment in BAC is approximate and is most appropriate when there are not sufficient data to support a complete confounder adjustment, whereas our methods efficiently use all of the available data to fully adjust for confounding, assuming the data are sufficient.

In our example from MESA, exposure is defined as the concentration on the day before measurement of the retinal arteriolar diameter. Other exposure windows have been considered, including same-day exposure and averages over several preceding days (Dominici et al., 2006). It is straightforward to adapt our discussion to any such pre-specified exposure lag or averaging period. The unconstrained distributed lag model (Schwartz, 2000), which provides a more flexible framework for combining exposures from multiple days, presents additional complications, however, because the exposure is then multivariate. Further research is needed to determine how the pre-adjustment methodology can be adapted to this setting.
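A pre-specified lag or averaging window reduces each subject's exposure to a single number computed from the daily series, which is why such windows fit the framework directly. A hypothetical helper, assuming days are integer indices and the daily regional series is a dict from day to concentration:

```python
def exposure_metric(daily, exam_day, lags=(1,)):
    """Mean daily exposure over the given lags before exam_day; None if any day is missing.

    lags=(1,) is previous-day exposure (as in the MESA example), lags=(0,) is
    same-day exposure, and lags=(0, 1, 2) is a three-day moving average.
    """
    values = [daily.get(exam_day - k) for k in lags]
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


daily = {0: 10.0, 1: 20.0, 2: 30.0}
previous_day = exposure_metric(daily, exam_day=2)
three_day_mean = exposure_metric(daily, exam_day=2, lags=(0, 1, 2))
```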

We have treated short-term air pollution as spatially fixed within regions. This is reasonable given that there is much more temporal than spatial variability, but ignoring the spatial variability does induce some exposure misclassification. The primary consequence is a loss of statistical power to detect small effects (Zeger et al., 2000). In principle, it is possible to adapt a spatio-temporal prediction model such as the one described by Szpiro et al. (2010) to produce daily exposure predictions at subject locations. However, this has the potential to introduce additional exposure misclassification (Szpiro and Paciorek, 2013), so inference about short-term air pollution effects may not be improved.

Supplementary Material

Supplementary Appendix

Acknowledgements

This research was funded by NIH/NIEHS through 5P50ES015915 and by the United States EPA through R831697 and RD-83479601. It has not been subjected to the EPA's required peer and policy review and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred.

Footnotes

9. Supplementary Material

Web Appendices referenced in Sections 4.2 and 5.3 and example code are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Adar SD, Klein R, Klein BEK, Szpiro AA, Cotch MF, Wong TY, O’Neill MS, Shrager S, Barr RG, Siscovick D, Daviglus ML, Sampson PD, Kaufman JD. Air pollution and the human microvasculature in vivo assessed via retinal imaging: The Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). PLoS Medicine. 2010;7:1–11. doi: 10.1371/journal.pmed.1000372.
  2. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, Greenland P, Jacobs DR, Kronmal R, Liu K, Nelson JC, O’Leary D, Saad MF, Shea S, Szklo M, Tracy RP. Multi-ethnic study of atherosclerosis: Objectives and design. American Journal of Epidemiology. 2002;156:871. doi: 10.1093/aje/kwf113.
  3. Brook RD, Rajagopalan S, Pope CA, Brook JR, Bhatnagar A, Diez-Roux AV, Holguin F, Hong Y, Luepker RV, Mittleman MA, Peters A, Siscovick D, Smith SC, Whitsel L, Kaufman JD; the American Heart Association Council on Epidemiology and Prevention, Council on the Kidney in Cardiovascular Disease, and Council on Nutrition, Physical Activity and Metabolism. Particulate matter air pollution and cardiovascular disease: An update to the scientific statement from the American Heart Association. Circulation. 2010;121:2331–2378. doi: 10.1161/CIR.0b013e3181dbece1.
  4. Diggle P, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. Oxford University Press; Oxford: 2002.
  5. Dockery DW, Pope CA, Xu X, Spangler JD, Ware JH, Fay ME, Ferris BG, Speizer FE. An association between air pollution and mortality in six cities. New England Journal of Medicine. 1993;329:1753–1759. doi: 10.1056/NEJM199312093292401.
  6. Dominici F, Sheppard L, Clyde M. Health effects of air pollution: A statistical review. International Statistical Review. 2003;71:243–276.
  7. Dominici F, McDermott A, Hastie TJ. Improved semiparametric time series models of air pollution and mortality. Journal of the American Statistical Association. 2004;99:938–948.
  8. Dominici F, Peng RD, Bell ML, Pham L, McDermott A, Zeger SL, Samet JM. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. Journal of the American Medical Association. 2006;295:1127–1134. doi: 10.1001/jama.295.10.1127.
  9. Environmental Protection Agency. National Ambient Air Quality Standards. 2006.
  10. Fung KY, Krewski D, Chen Y, Burnett R, Cakmak S. Comparison of time series and case-crossover analyses of air pollution and hospital admission data. International Journal of Epidemiology. 2003;32:1064–1070. doi: 10.1093/ije/dyg246.
  11. Gelman A. Analysis of variance: Why it is more important than ever. The Annals of Statistics. 2005;33:1–31.
  12. Hodges J, Reich B. Adding spatially-correlated errors can mess up the fixed effect you love. The American Statistician. 2010;64:325–334.
  13. Hoff P. A First Course in Bayesian Statistical Methods. Springer; Dordrecht: 2009.
  14. Janes H, Sheppard L, Lumley T. Overlap bias in the case-crossover design with applications to air pollution exposures. Statistics in Medicine. 2005;24:285–300. doi: 10.1002/sim.1889.
  15. Janes H, Sheppard L, Shepherd K. Statistical analysis of air pollution panel studies: an illustration. Annals of Epidemiology. 2008;18:792–802. doi: 10.1016/j.annepidem.2008.06.004.
  16. Kaufman JD, Adar SD, Allen R, Barr RG, Budoff M, Burke G, Casillas A, Cohen M, Curl C, Daviglus M, Diez-Roux A, Jacobs D, Kronmal R, Larson R, Liu L-J, Lumley T, Navas-Acien A, O’Leary D, Rotter J, Sampson PD, Sheppard L, Siscovick D, Stein J, Szpiro AA. Prospective study of particulate air pollution exposures, subclinical atherosclerosis, and clinical cardiovascular disease: The Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). American Journal of Epidemiology. 2012;176:825–837. doi: 10.1093/aje/kws169.
  17. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  18. Miller KA, Siscovick DS, Sheppard L, Shepherd K, Sullivan JH, Anderson GL, Kaufman JD. Long-term exposure to air pollution and incidence of cardiovascular events in women. New England Journal of Medicine. 2007;356:447–458. doi: 10.1056/NEJMoa054409.
  19. Peng RD, Dominici F, Louis TA. Model choice in time series studies of air pollution and mortality. Journal of the Royal Statistical Society, Series A. 2006;169:179–198.
  20. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-97. 2010.
  21. Pope CA, Burnett RT, Thun MJ, Calle EE, Ito K, Krewski D, Thurston GD. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. Journal of the American Medical Association. 2002;287:1132–1141. doi: 10.1001/jama.287.9.1132.
  22. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2010. ISBN 3-900051-07-0.
  23. Samet JM, Dominici F, Curriero F, Coursac I, Zeger SL. Particulate air pollution and mortality: Findings from 20 US cities. New England Journal of Medicine. 2000;343:1742–1749. doi: 10.1056/NEJM200012143432401.
  24. Schwartz J. Nonparametric smoothing in the analysis of air pollution and respiratory illness. Canadian Journal of Statistics. 1994;22:471–487.
  25. Schwartz J. The distributed lag between air pollution and daily deaths. Epidemiology. 2000;11:320. doi: 10.1097/00001648-200005000-00016.
  26. Schwartz J. Comment on: Model choice in time series studies of air pollution and mortality. Journal of the Royal Statistical Society, Series A. 2006;169:198–200.
  27. Sheppard L, Levy D, Norris G, Larson TV, Koenig JQ. Effects of ambient air pollution on nonelderly asthma hospital admissions in Seattle, Washington, 1987–1994. Epidemiology. 1999;10:23–30.
  28. Szpiro A, Sampson PD, Sheppard L, Lumley T, Adar SD, Kaufman JD. Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies. Environmetrics. 2010;21:606–631. doi: 10.1002/env.1014.
  29. Szpiro AA, Paciorek CJ. Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics. In press. doi: 10.1002/env.2233.
  30. Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68:661–686. doi: 10.1111/j.1541-0420.2011.01731.x.
  31. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838.
  32. Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, Cohen A. Exposure measurement error in time-series studies of air pollution: Concepts and consequences. Environmental Health Perspectives. 2000;108:419–423. doi: 10.1289/ehp.00108419.
