Published in final edited form as: Stat Sci. 2010 Feb;25(1):107–125. doi: 10.1214/10-STS326

The importance of scale for spatial-confounding bias and precision of spatial regression estimators

Christopher J Paciorek 1

Abstract

Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.

Key words and phrases: epidemiology, identifiability, mixed model, penalized likelihood, random effects, spatial correlation, splines

1. INTRODUCTION

Spatial confounding is likely present in many of the applied contexts in which residuals are spatially correlated, particularly in public health and social science. Consider the motivating example of the health effects of exposure to (spatially varying) air pollution, an important public health issue. Many variables that explain variability in the response, including potential confounding variables that may be correlated with exposure, also vary spatially. For example, large-scale regional patterns in air pollution may be correlated with regional patterns in diet, income and other risk factors for a health outcome of interest. Small-scale patterns in air pollution from local sources may be correlated with risk factors as well, for example if lower-income people live nearer to busy roads or industrial sources. If confounding variables are not measured, it will be difficult to distinguish the effect of air pollution from residual spatial variation in the health outcome. I use the term spatial confounding to characterize this situation. Researchers have modeled the spatial structure in the outcome with the apparent goal of reducing confounding bias (e.g., Clayton, Bernardinelli and Montomoli, 1993; Pope et al., 2002; Cakmak et al., 2003; Biggeri et al., 2005). However, the statistical mechanism for reducing bias does not appear to be well understood nor investigated rigorously in the statistical or applied literature.

To consider the problem formally, start with simple linear regression with spatial structure:

Y_i = \beta_0 + \beta_x X_i + e_i, \qquad i = 1, \ldots, n, \qquad \mathbf{e} \sim N(0, \Sigma),     (1)

where each outcome, Yi, is associated with a spatial location, si ∈ ℜ2. Xi is the corresponding value of a univariate regressor of interest, which may also vary spatially, in which case we would represent Xi as X(si). e = (e1, …, en)T is the vector of errors, whose covariance matrix, Σ, captures any residual spatial correlation, as well as independent variation. The regression coefficients, β = {β0, βx}, are unknown, and estimation of βx is of primary interest. Spatial statistics and regression texts note that the ordinary least squares (OLS) estimator for βx in this setting is unbiased but inefficient, and the usual OLS variance estimator is incorrect. Assuming known Σ, the generalized least squares (GLS) estimator is the most efficient estimator. However, little appears to be known about how the spatial scales of the residual variability and of X affect inference. Spatial structure in X is very common in applications and complicates the problem because X and the residual spatial structure compete to explain the variability in the response (Waller and Gotway, 2004). Furthermore, it would not be surprising if the spatial correlation in the residuals were caused by an unmeasured spatially varying confounder; I next introduce another representation of (1) to enable exploration of confounding. Motivated by the air pollution example, I will refer to X as the ‘exposure’.

One can obtain the basic spatial regression model (1) using a simple mixed model,

Y_i = \beta_0 + \beta_x X(s_i) + g(s_i) + \varepsilon_i,     (2)

with random effects, g = (g(s_1), …, g(s_n))^T, and white noise errors, ε_i ~ iid N(0, τ²). Suppose the random effects are spatially correlated, with g ~ N(0, σ_g² R(θ_g)), where R(θ_g) is a spatial correlation matrix parameterized by θ_g, a spatial range parameter, and σ_g² is the variance of the random effects. Marginalizing over g gives the marginal likelihood,

Y = (Y_1, \ldots, Y_n)^T \sim N\!\left(\beta_0 \mathbf{1} + \beta_x X, \; \sigma_g^2 R(\theta_g) + \tau^2 I\right),     (3)

where 1 is an n-vector of ones, I is the identity matrix, and X = (X_1, …, X_n)^T. Here Σ in (1) is explicitly decomposed into spatial and non-spatial components. An alternative formulation would specify the unknown spatial function, g(s), as a penalized spline, where a penalty parameter plays the role of {θ_g, σ_g²} in the marginal likelihood in penalizing complexity of the spatial structure. The exposure may itself be spatially correlated. For example, if X(s) is a Gaussian process, then X ~ N(0, σ_x² R(θ_x)), with parameters analogous to those for g. To demonstrate processes operating at different spatial scales, Fig. 1 shows simulated spatial surfaces as one varies the spatial range parameter, θ, in a Gaussian process model.

Fig 1.

Gaussian process realizations using the Matérn covariance (see Section 2.2) for three values of θ, with (a) high-frequency, small (fine)-scale variability when θ = 0.1, (b) moderate scale variability when θ = 0.5, and (c) low-frequency, large-scale variability when θ = 0.9.

The spatial statistics literature assumes that the error, e_i in (1), is independent of the covariate(s) (Cressie, 1993; Waller and Gotway, 2004), with little or no discussion of the possibility that the error involves variation from unmeasured confounders. Henceforth I will refer to the errors as residuals because of the common use of the term ‘spatial residual’ to refer to unexplained spatial variability. To explore the possibility of confounding, let’s consider g ≡ β_z Z to be induced as the effect, β_z, of an unmeasured variable, Z, on the outcome. Z = (Z(s_1), …, Z(s_n))^T may also be spatially correlated, e.g., Z ~ N(0, σ_z² R(θ_z)), such that σ_z² = σ_g²/β_z², where θ_z is again a spatial range parameter. If Z and X are dependent, then Z is an unmeasured spatial confounder. Derivation of the marginal likelihood should be done by integrating over the (unknown) conditional distribution of Z given X, whereas the integration leading to (1) ignores the dependence. Note that if X and Z are considered fixed, then association between X and g ≡ β_z Z is known as concurvity (Buja, Hastie and Tibshirani, 1989; Ramsay, Burnett and Krewski, 2003).

In the applied literature, practitioners often recognize the need to consider residual spatial structure in the outcome, with language of ‘control’ or ‘accounting’ for autocorrelation, and they fit models (such as kriging or spatial random effects) that implicitly assume independence of the residual and the exposure (Burnett et al., 2001; Cakmak et al., 2003; Cho, 2003; Burden et al., 2005; Augustin et al., 2007; Molitor et al., 2007; Cerdá et al., 2009; Lee, Ferguson and Mitchell, 2009). With the recent exception of Hodges and Reich (2010), formal statements of the goals and properties of fitting such spatial models are generally absent. However, much of the interest appears to lie in using the spatial residual structure to try to account for spatial confounding, with the implicit assumption that such models reduce or eliminate confounding bias (e.g., Clayton, Bernardinelli and Montomoli, 1993; Pope et al., 2002; Cakmak et al., 2003; Richardson, 2003; Biggeri et al., 2005). One approach is to explicitly consider the spatial scales involved, hoping that accounting for variation at a relatively large spatial scale allows for identification of the parameter of interest based on exposure heterogeneity at a smaller spatial scale (e.g., Burnett et al., 2001; Cakmak et al., 2003; Zeger et al., 2007). This smaller scale variation may be less prone to confounding in a given application. However, this consideration of spatial scale is often not explicit, and effects of scale on bias reduction, while sometimes hinted at, have not been developed formally.

In the analogous context of time series modeling of air pollution, Dominici, McDermott and Hastie (2004) attempt to attribute all the temporally correlated variability in the outcome to the residual term in order to identify the effect of exposure based on the temporally uncorrelated (and presumably unconfounded) heterogeneity in the exposure. Dominici, McDermott and Hastie (2004) provide no guidance in the scenario that the exposure cannot be decomposed into autocorrelated and uncorrelated components. This issue also applies to the approach of Lombardía and Sperlich (2007), who filter out the dependence between fixed and random effects. In the spatial setting, in which measurements cannot be made at all locations, accurate estimation of the uncorrelated component, if such a component even exists, is rare: consider atmospheric phenomena such as temperature and air pollution. A common situation in which fine-scale heterogeneity is not resolved involves prediction of spatially varying exposure values using averages of nearby measurements or spatial smoothing techniques. Hence I seek to address the problem when all of the measured components of variation in exposure are spatial.

In this paper I address estimation in simple regression models with spatial residual structure. I focus on the properties of penalized models, using a simple mixed model fit by GLS, equivalent to universal kriging, to analyze the effects of spatially correlated residual structure on fixed effect estimators. Section 2 focuses on bias from spatial confounding. I report analytic results when the full covariance structure is known and supporting simulations when the covariance (or the amount of smoothing in penalized spline models) is estimated from the data. I assess the use of sensitivity analysis approaches based on spline models that explicitly consider the bias-variance tradeoff involved in choosing the spatial scale at which to model the residual variation. Section 3 focuses on precision of estimators when there is no association between exposure and residual (no spatial confounding). I close with a case study of the effects of air pollution on birthweight (Section 4).

2. SPATIAL CONFOUNDING AND BIAS

2.1 Identifiability

A key consideration in the basic model (2) is identifiability of βx and g(s). A closely-related question is how the estimation procedure attributes variability between the exposure and the spatial residual term (the random effects). In the simple linear model, attribution of variability to the covariates rather than the error term is favored because this allows the estimate of the error variance to decrease, with the normalizing constant of the likelihood favoring smaller error variance. In the spatial model, if the spatial term, g, is unconstrained, then βxX and g are not identifiable in the likelihood: one could remove the covariate from the model and redefine g*(s) ≡ βxX(s) + g(s) with no change in the likelihood. Identifiability comes through constraints on g, either by (1) penalizing lack of smoothness in g(s), (2) considering g to be a random effects term, or (3) having a prior on g. These approaches give higher penalized likelihood, marginal likelihood, or posterior density, respectively, when variability is attributed to the unpenalized fixed effects term rather than to the spatial term. In the spatial confounding context this dynamic causes bias in estimation of βx, for example as seen in the simulations of Peng, Dominici and Louis (2006). An alternative constraint is to represent g in a reduced dimension basis, say as a regression spline. In this case the model is identifiable if there is a component of variability in X that cannot be explained by the spline structure, i.e., if X is not perfectly collinear with the columns of the chosen basis matrix.

2.2 Analytic Framework

To consider bias from unmeasured spatially varying confounders, take the following model as the data-generating mechanism,

Y_i \sim N\!\left(\beta_0 + \beta_x X(s_i) + \beta_z Z(s_i), \; \tau^2\right),     (4)

with the notation as in Section 1. For each location, s, suppose the correlation of X(s) and Z(s) over repeated sampling at the location is ρ ≠ 0, so that Z is a confounder. Suppose further that Z is not observed and that one models the residual spatial structure in the outcome through spatially correlated random effects, g ~ N(0, σ_g² R(θ_g)) as in (2). Finally, suppose that one ignores the correlation between g ≡ β_z Z and X and integrates over the marginal distribution for g, giving (3). Equivalently, Y_i = β_0 + β_x X(s_i) + ε_i^*, where ε_i^* ≡ g(s_i) + ε_i. The induced correlation between X and ε^* violates the usual regression assumption that the error is independent of the covariate, leading to bias. From the random effects perspective, we have (incorrectly) assumed that the random effects are independent of the covariate, a key (but often unstated) assumption of mixed effects models (Breslow and Clayton, 1993; Diggle et al., 2002, p. 170).

The treatment of X(s) and Z(s) as random naturally induces spatial structure. However, in a given dataset the most plausible repeated sampling framework may suggest that X and Z reflect spatial structure that does not arise from a stochastic data generating process. Rather, one might consider X(s) and Z(s) to be fixed unknown functions, particularly when X and Z vary at large scales, which mimics the partial spline/partial linear setting. This also is consistent with the treatment of large-scale variation in the mean term in traditional kriging. Consider the case when there is concurvity between the two fixed functions, reflected in a non-zero empirical correlation, ρ̂, between X and g ≡ β_z Z as calculated over the collection of locations (e.g., the concurvity in the simulations of Ramsay, Burnett and Krewski (2003); Peng, Dominici and Louis (2006); He, Mazumdar and Arena (2006)). In the partial linear/partial spline setting it is well known that such association between the exposure and the nonparametric smooth term causes bias (Rice, 1986, eq. 28; Speckman, 1988). In any real dataset, the orthogonality needed for ρ̂ ≈ 0 seems particularly unlikely if both X and Z vary at large scale relative to the size of the domain (though ρ̂ < 0 may be as much a possibility as ρ̂ > 0).

The stochastic generative model is still useful under this framework of fixed functions because realizations of X(s) and Z(s) give plausible values for X and Z that could arise in real applications for which there is no reasonable stochastic mechanism. I choose to treat X and Z stochastically, and I use ρ to quantify explicitly the strength of association between the residual spatial variation and the exposure. This approach allows for some simple, useful analytic results and is further justified in that the variation that an unmeasured Z induces in Y is necessarily treated stochastically as part of the residual in actual applications. In some cases I report results conditional on X, and in others I also average over the stochastic variability in X and over variability in the spatial locations of the observations.

Since Z represents an unmeasured confounder, I assess the inferential properties of fitting a regression model by maximizing the marginal likelihood (3) using GLS, thereby ignoring correlation between the residual and the exposure. I assess bias as a function of the spatial scales of X(s) and Z(s), which I suppose to be generated as Gaussian processes with Matérn spatial correlation function,

R(d; \theta, \nu) = \frac{1}{\Gamma(\nu)\, 2^{\nu - 1}} \left( \frac{2 \sqrt{\nu}\, d}{\theta} \right)^{\nu} K_{\nu}\!\left( \frac{2 \sqrt{\nu}\, d}{\theta} \right),

where d is the Euclidean distance between two locations, θ is the spatial range parameter, and K_ν(·) is the modified Bessel function of the second kind, whose order is the smoothness parameter, ν. I fix ν = 2, which gives continuous and differentiable Gaussian process realizations. This reflects an assumption of some smoothness in the spatial processes under consideration, but I also consider results based on an exponential correlation function (i.e., ν = 0.5). The model (3) is equivalent to both a mixed model and a universal kriging model if one knows the variance and spatial dependence parameters. Furthermore, given the extensive use of penalized splines in applications, and the connection between penalized splines and mixed models (Ruppert, Wand and Carroll, 2003), I also consider the use of a penalized spline to represent g(s).
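For concreteness, the following R sketch (not the paper's code; the function name and settings are illustrative) implements the Matérn correlation as parameterized above and draws one Gaussian process realization of the kind shown in Fig. 1.

## A sketch: Matern correlation as parameterized above, and one GP realization (cf. Fig. 1).
matern_cor <- function(d, theta, nu = 2) {
  u <- sqrt(2 * nu) * d / theta
  out <- (u^nu) * besselK(u, nu) / (gamma(nu) * 2^(nu - 1))
  out[d == 0] <- 1                       # limiting value at distance zero
  out
}

set.seed(1)
n <- 100
locs <- cbind(runif(n), runif(n))        # locations on the unit square
D <- as.matrix(dist(locs))               # Euclidean distance matrix
R <- matern_cor(D, theta = 0.5)          # moderate-scale variability (cf. Fig. 1b)
g <- drop(crossprod(chol(R + 1e-8 * diag(n)), rnorm(n)))   # GP realization with correlation R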

In the non-spatial context, one would generally try to adjust for confounding by including relevant covariates as fixed effects; in the spatial context one could include spatial regression spline terms. The basic question that I explore in the remainder of Section 2 is the extent to which inclusion of a spatial random effect term or a penalized spline can adjust for unmeasured spatial confounding, given that these approaches do not involve a projection in the way that a regression spline does. The random effects and penalized spline approaches do estimate the residual spatial variation based on a bias-variance tradeoff (e.g., Claeskens, Krivobokova and Opsomer, 2009), and the penalized spline is a regression spline in the limit as the penalty goes to zero. So it seems plausible that these approaches may reduce bias by at least partially adjusting for the unmeasured spatial confounder. I will show that the spatial scales involved are critical.

2.3 Bias with known parameters

This section considers bias when I suppose that the variance parameters are known and only the regression coefficients, β0 and βx, are unknown. The initial results concern the situation when the exposure, X, and the unmeasured confounder, Z, vary at the same spatial scale. I then assess what happens when X varies at two scales and one is the same scale as the single-scale confounder. Finally, I consider the possibility that there is additional variability in the outcome at another scale, but uncorrelated with X.

To start, suppose that X(s) and Z(s) share the same spatial correlation range, θ, but may have different marginal variances, namely, X ~ N(μ_x 1, σ_x² R(θ)) and Z ~ N(μ_z 1, σ_z² R(θ)), with Cov(X, Z) = ρ σ_x σ_z R(θ). Straightforward conditional normal calculations give

E(\hat{\beta}_x \mid X) = \beta_x + \left[ (\mathcal{X}^T \Sigma^{-1} \mathcal{X})^{-1} \mathcal{X}^T \Sigma^{-1} E(Z \mid X)\, \beta_z \right]_2
= \beta_x + \left[ (\mathcal{X}^T \Sigma^{-1} \mathcal{X})^{-1} \mathcal{X}^T \Sigma^{-1} \mathcal{X} \left( \mu_z \beta_z - \rho \tfrac{\sigma_z}{\sigma_x} \beta_z \mu_x, \;\; \rho \tfrac{\sigma_z}{\sigma_x} \beta_z \right)^T \right]_2
= \beta_x + \rho \tfrac{\sigma_z}{\sigma_x} \beta_z,     (5)

where 𝒳 = [1 X], [·]_2 indicates the second element of the 2-vector, and Σ = σ_g² R(θ) + τ² I. The resulting bias, ρ(σ_z/σ_x)β_z, is the same as if X and Z were not spatially structured and is also equal to the bias under OLS. This demonstrates that we have not adjusted for confounding at all by fitting the model that includes spatial structure. As with OLS, the model attributes as much of the variability as possible to the exposure, rather than to the spatially correlated residual term, including all of the variability in Z that is related to X. If ρ = 0, the bias is zero in (5). This occurs because we average over stochastic variability in Z, so any non-orthogonality between X and Z in individual realizations contributes to variance rather than bias. This contrasts with the bias terms in Rice (1986) and Dominici, McDermott and Hastie (2004), which are caused by non-orthogonality of the fine-scale variation in X and the nonparametric component of the model, since neither is treated stochastically.
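The same-scale result (5) is easy to check by simulation. The following R sketch (hypothetical settings; the exponential correlation, the ν = 0.5 Matérn, stands in for brevity) generates correlated X and Z, fits the GLS estimator with the variance parameters treated as known, and compares the Monte Carlo average bias to ρ(σ_z/σ_x)β_z.

## A sketch: Monte Carlo check of the bias in (5) with known variance parameters.
set.seed(2)
n <- 100; nsim <- 500
betax <- 0.5; betaz <- 1; sigx <- 1; sigz <- 1; rho <- 0.3; tau2 <- 4; theta <- 0.5

bias <- replicate(nsim, {
  locs <- cbind(runif(n), runif(n))
  R <- exp(-as.matrix(dist(locs)) / theta)            # exponential correlation (Matern, nu = 0.5)
  L <- t(chol(R + 1e-8 * diag(n)))
  x <- sigx * drop(L %*% rnorm(n))
  ## Z correlated with X at the same spatial scale: Cov(X, Z) = rho * sigx * sigz * R
  z <- rho * (sigz / sigx) * x + sqrt(1 - rho^2) * sigz * drop(L %*% rnorm(n))
  y <- betax * x + betaz * z + rnorm(n, sd = sqrt(tau2))
  ## GLS with the known covariance sigma_g^2 R + tau^2 I, where sigma_g^2 = betaz^2 * sigz^2
  Sigma <- betaz^2 * sigz^2 * R + tau2 * diag(n)
  Xmat <- cbind(1, x)
  bhat <- solve(t(Xmat) %*% solve(Sigma, Xmat), t(Xmat) %*% solve(Sigma, y))
  bhat[2] - betax
})
mean(bias)                      # Monte Carlo average bias
rho * sigz / sigx * betaz       # analytic bias from (5)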

Next I keep the same data-generating and model-fitting framework, but explore the situation in which the exposure varies at two scales. I suppose that X(s) is a multi-scale process and introduce correlation between Z(s) and one of the components of X(s). Let X = X_c + X_u be decomposed into a component, X_c, that is at the same scale as the confounder, Z, and a component at a different scale, X_u, which is independent of X_c and Z. Specifically, take Cov(X) = σ_c² R(θ_c) + σ_u² R(θ_u), Cov(Z) = σ_z² R(θ_c), and Cov(X, Z) = Cov(X_c, Z) = ρ σ_c σ_z R(θ_c). After some straightforward algebra and matrix manipulations, we have

E(\hat{\beta}_x \mid X) = \beta_x + \left[ (\mathcal{X}^T \Sigma^{-1} \mathcal{X})^{-1} \mathcal{X}^T \Sigma^{-1} E(Z \mid X)\, \beta_z \right]_2 = \beta_x + k(X)\, \rho \tfrac{\sigma_z}{\sigma_c} \beta_z,     (6)

where

k(X) \equiv \left[ (\mathcal{X}^T \tilde{\Sigma}^{-1} \mathcal{X})^{-1} \mathcal{X}^T \tilde{\Sigma}^{-1} M (X - \mu_x \mathbf{1}) \right]_2 p_c,
\qquad \tilde{\Sigma} \equiv \frac{\beta_z^2 \sigma_z^2 R(\theta_c) + \tau^2 I}{\beta_z^2 \sigma_z^2 + \tau^2} = (1 - p_z) I + p_z R(\theta_c),
\qquad M \equiv \left( p_c I + (1 - p_c) R(\theta_u) R(\theta_c)^{-1} \right)^{-1},

and p_z ≡ β_z² σ_z² / (β_z² σ_z² + τ²). We see that the bias term is proportional to that in the single-scale setting, multiplied by an additional term, k(X), that modulates the bias. k(X) necessarily includes an extra multiplicative factor, p_c ≡ σ_c² / (σ_c² + σ_u²), that quantifies the magnitude of the confounded component of X relative to the total variation in X. While the term k(X) is complicated, we can explore its dependence on the spatial scales (θ_c and θ_u) and the magnitudes of the variance component ratios (p_z and p_c) to see how the bias compares to the same-scale setting. In the following results I average over the variability in X.
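To make this averaging concrete, here is a minimal R sketch, with illustrative parameter values rather than the paper's exact settings, that approximates Ê_X k(X) for one pair (θ_c, θ_u) by simulating the two-scale exposure and the confounder, computing the known-covariance GLS estimate, and dividing the average bias by ρ(σ_z/σ_c)β_z. It omits the sample-variance standardization described after Fig. 2 and uses the exponential correlation for brevity, so the values are only indicative.

## A sketch: Monte Carlo approximation of E_X k(X) in the two-scale setting.
set.seed(3)
n <- 100; nsim <- 500
betax <- 0.5; betaz <- 1; sigc <- 1; sigu <- 1; sigz <- 1; rho <- 0.3; tau2 <- 4
theta_c <- 0.9; theta_u <- 0.1          # large-scale confounding, small-scale unconfounded variation

bias <- replicate(nsim, {
  locs <- cbind(runif(n), runif(n))
  D <- as.matrix(dist(locs))
  Lc <- t(chol(exp(-D / theta_c) + 1e-8 * diag(n)))
  Lu <- t(chol(exp(-D / theta_u) + 1e-8 * diag(n)))
  xc <- sigc * drop(Lc %*% rnorm(n))            # confounded component of X
  xu <- sigu * drop(Lu %*% rnorm(n))            # unconfounded component of X
  z  <- rho * (sigz / sigc) * xc + sqrt(1 - rho^2) * sigz * drop(Lc %*% rnorm(n))
  x  <- xc + xu
  y  <- betax * x + betaz * z + rnorm(n, sd = sqrt(tau2))
  Sigma <- betaz^2 * sigz^2 * exp(-D / theta_c) + tau2 * diag(n)  # model's residual covariance
  Xmat <- cbind(1, x)
  bhat <- solve(t(Xmat) %*% solve(Sigma, Xmat), t(Xmat) %*% solve(Sigma, y))
  bhat[2] - betax
})
mean(bias) / (rho * sigz / sigc * betaz)   # approximates E_X k(X); compare to p_c on the diagonal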

For a grid of n = 100 locations on the unit square, Fig. 2 shows the average of k(X) over 1000 simulations as a function of θc and θu, for combinations of pc and pz, where the empirical average approximates the expectation with respect to the distribution of X. There is a simple pattern to the bias modification relative to the same-scale setting. For θc = θu (the diagonal elements on the 1:1 line), we do not need simulation: EXk(X) = pc, which is equivalent to the same-scale result (5), after accounting for the proportion of variability in X that is confounded, pc. Note that if one estimates the analogous bias to (6) for OLS applied to spatial data, it is nearly constant regardless of the spatial scales (ÊXk(X) ≈ pc; not shown). Only when θu < θc, and particularly when θu ≪ θc, do we see less bias than in the same-scale setting, with clear potential for bias reduction from modeling the residual spatial variation in the outcome (recall Fig. 1 to interpret the values of θ). Above the diagonal, for θu > θc, ÊXk(X) > pc, indicating more bias when the scale of confounding is smaller than the scale of unconfounded variability. This situation may be of limited practical interest, because it’s not clear that there are real applications in which the unconfounded variation in the exposure occurs at larger scales than the confounded variation. However, it does show that there are circumstances in which bias is larger than under OLS, a point also made in Hodges and Reich (2010). Note that the patterns in Fig. 2 are qualitatively similar regardless of the values of pc and pz. Quantitatively, for larger values of pc, corresponding to a larger proportion of the variation in the exposure being confounded variation, bias is larger. For larger values of pz, corresponding to a larger proportion of the residual variation being the contribution of the confounder, the effects of the spatial scales are more distinct. The results highlight that inclusion of the spatial residual does not give unbiased estimates and bias is substantial in many scenarios even when the covariance parameters are known. Results are very similar when I sample locations uniformly on the unit square or in a clustered fashion (using a Poisson cluster process).

Fig 2.

The expected value of the bias modification term, Ê_X k(X), as a function of the spatial scales of confounded (θc) and unconfounded (θu) variability for a selection of values of pz and pc. k(X) quantifies the amount of bias relative to the bias in the same-scale setting or with non-spatial confounding (ρσ_zβ_z/σ_x). Along the diagonal (θc = θu), E_X k(X) = pc, which is equivalent to no bias reduction. Values near zero indicate substantial bias reduction.

The results in Fig. 2 correct for the complication that the sample variance of spatial process values (calculated over the domain) decreases as θ increases. This occurs because the sample variance over the domain in a single spatial replicate underestimates population variability; see Fig. 1b–c for examples. I want to have fixed ratios of average sample variances, p_z ≡ β_z² E_Z s_z² / (β_z² E_Z s_z² + τ²) ∈ {0.1, 0.5, 0.9} and p_c ≡ E_{X_c} s_c² / (E_{X_c} s_c² + E_{X_u} s_u²) ∈ {0.1, 0.5, 0.9}, for all values of θ_c and θ_u, thereby avoiding the introduction of artifacts caused solely by having ratios of sample variances change with the spatial ranges. Here s_z², s_c², and s_u² are the sample variances of Z, X_c, and X_u, respectively. To achieve this I generate X_c ~ N(0, d_c² σ_c² R(θ_c)) and X_u ~ N(0, d_u² σ_u² R(θ_u)) and modify the calculation of k(X) in (6) accordingly. d_c and d_u are functions of θ_c and θ_u, respectively, that are chosen such that E_{X_c} s_c²(θ_c) ≈ σ_c² and E_{X_u} s_u²(θ_u) ≈ σ_u², where s_c²(θ_c) is the sample variance of X_c for a given realization under θ_c and analogously for s_u²(θ_u). The expectations are taken with respect to the distribution of the subscripted random vector. These manipulations allow me to present bias for scenarios that correspond to specific ratios of average sample variability of X_c, X_u, Z, and ε over the spatial domain.

To have only a single scale of residual spatial variability is not very realistic. Therefore, I carried out an additional simulation study with residual spatial variability in the outcome that is independent of the exposure and at a smaller scale than the scale of Z(s). I suppose that the data-generating model is

Y = \beta_0 \mathbf{1} + \beta_x X + \beta_z Z + h + \varepsilon,     (7)

and that h ~ N(0, σ_h² R(θ_u)), independent of X, Z, and ε, with all of the other details as before. Under this data-generating model and again supposing that all variance parameters are known, simulation estimates of E_X k(X) indicate that bias is somewhat smaller than that seen in Fig. 2 for θ_c > θ_u and somewhat larger for θ_c < θ_u (not shown). Note that if the additional small-scale variability is correlated with the exposure, then one is back in the situation of having common scales for the exposure and the confounder, which is considered at the beginning of this section.

2.4 Bias and precision with estimated parameters

To generalize the results of Section 2.3, which supposed known variance and spatial dependence parameters, I set up a simulation study to assess the impact of estimating those parameters. In addition to maximum likelihood estimation of a mixed effects/kriging model based on the marginal likelihood (3), I consider the use of penalized likelihood to fit the model (2) with a penalized thin plate spline spatial term for g(s). I implemented the penalized spline using gam() in R, which uses generalized cross-validation (GCV) for data-driven smoothing parameter estimation (Wood, 2006). For the core simulations, I set the following parameter values, σ_u² = σ_c² = β_z²σ_z² = 1, τ² = 4, β_x = 0.5, ρ = 0.3, and sample 100 spatial locations uniformly from the unit square. For a range of values of θ_c and θ_u, I simulate 2000 datasets for each pair {θ_c, θ_u}. For each simulated dataset, I generate new spatial locations and new values of X and Z; I then generate Y using (4). Again we have to account for the reduced empirical spatial variability as θ increases; these simulations have effective values of p_c = 0.5 and p_z = 0.2.
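A single replicate of this simulation might look like the following R sketch, with illustrative settings and the exponential correlation standing in for the Matérn with ν = 2; the kriging/mixed-model fit is written here as a direct numerical maximization of the marginal likelihood (3) rather than as a call to any particular package routine.

## A sketch: one simulation replicate with estimated variance/penalty parameters.
library(mgcv)

set.seed(4)
n <- 100; betax <- 0.5; betaz <- 1; rho <- 0.3; tau2 <- 4
sigc <- sigu <- sigz <- 1; theta_c <- 0.9; theta_u <- 0.1

locs <- cbind(runif(n), runif(n)); D <- as.matrix(dist(locs))
Lc <- t(chol(exp(-D / theta_c) + 1e-8 * diag(n)))
Lu <- t(chol(exp(-D / theta_u) + 1e-8 * diag(n)))
xc <- sigc * drop(Lc %*% rnorm(n)); xu <- sigu * drop(Lu %*% rnorm(n))
z  <- rho * (sigz / sigc) * xc + sqrt(1 - rho^2) * sigz * drop(Lc %*% rnorm(n))
dat <- data.frame(x = xc + xu, sx = locs[, 1], sy = locs[, 2])
dat$y <- betax * dat$x + betaz * z + rnorm(n, sd = sqrt(tau2))

## OLS, ignoring the residual spatial structure
coef(lm(y ~ x, data = dat))["x"]

## Penalized thin plate spline spatial term, smoothing chosen by GCV
coef(gam(y ~ x + s(sx, sy, k = 50), data = dat, method = "GCV.Cp"))["x"]

## Mixed/kriging model: maximize the marginal likelihood (3) numerically over
## (log sigma_g^2, log tau^2, log theta_g), plugging in the GLS coefficients
Xmat <- cbind(1, dat$x)
negll <- function(par) {
  S <- exp(par[1]) * exp(-D / exp(par[3])) + exp(par[2]) * diag(n)
  beta <- solve(t(Xmat) %*% solve(S, Xmat), t(Xmat) %*% solve(S, dat$y))
  r <- dat$y - Xmat %*% beta
  sum(log(diag(chol(S)))) + 0.5 * drop(t(r) %*% solve(S, r))
}
opt <- optim(c(0, 0, log(0.3)), negll)
Shat <- exp(opt$par[1]) * exp(-D / exp(opt$par[3])) + exp(opt$par[2]) * diag(n)
solve(t(Xmat) %*% solve(Shat, Xmat), t(Xmat) %*% solve(Shat, dat$y))[2]  # GLS estimate of beta_x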

With regard to bias, the simulation results for the mixed/kriging model (Fig. 3b) reasonably match the theoretical values with known variance parameters (Fig. 3a). However, when θu ≪ θc, the bias is generally larger than with known variance parameters, because the fitted model sometimes estimates little or no spatial structure in the residuals, pushing bias results toward the larger bias seen under OLS (Fig. 3d). Results for the penalized spline model (Fig. 3c) show smaller bias for θu < θc than the mixed model, presumably caused by the difference between estimating the amount of smoothing by GCV compared to maximum likelihood. In either case, spatial scales are critical, and bias is smaller than with OLS only when the scale of confounding is larger than the scale of the unconfounded variability. Additional simulations indicate that as the correlation of confounder and exposure increases, or the magnitude of variation in the confounder increases, or the effect size decreases, relative bias increases (not shown). In such scenarios, substantial bias reduction occurs only for very small spatial scales in the exposure and large scales of confounding.

Fig 3.

Relative bias, (Ê(β̂x) − βx)/βx, as a function of the spatial scales of confounded (θc) and unconfounded variability (θu): (a) theoretical bias for the mixed/kriging model with known variance parameters, and (b, c, d) simulated bias with estimated variance/penalty parameters for (b) the mixed model, (c) a penalized spline model, and (d) OLS.

Fig. 4 compares the mixed model with the penalized spline in the context of a bias-variance tradeoff. There is a substantial bias-variance tradeoff, with the smaller bias of the penalized spline model (for θu < θc) trading off for increased variance. The result is increased mean squared error (MSE) in β̂x, except when θu is very small. Both model variance estimates (third column) understate the variability in the coefficient estimates (second column), with particular underestimation of uncertainty and low coverage for the mixed/kriging model, and with lower coverage as one moves away from the region of θc ≈ θu. Of course the bias causes much of the poor coverage.

Fig 4.

Simulation results for (top row) mixed model/kriging fit and (bottom row) penalized spline model. Each plot shows results as a function of the spatial scales of the confounded (θc) and unconfounded variability (θu), with MSE (first column), variance of the estimates over the simulations (second column), average squared standard error (third column) and coverage (fourth column).

Fitting the mixed/kriging model by restricted maximum likelihood (REML) rather than maximum likelihood produces moderate improvement in coverage, with the average variance estimate more similar to the variance of the estimated coefficients. Using ν = 0.5 (i.e., an exponential spatial correlation function) in the fitting rather than the true ν = 2 has little effect on results. However, when I generate the unconfounded variability, Xu, based on ν = 0.5, bias is substantially smaller than the core results (particularly note that there is reduced bias relative to OLS when θc = θu), apparently because the non-differentiable sample paths of processes with exponential covariance play the role of very fine-scale, unconfounded variability. There is little change in results when using spatial locations simulated using a Poisson cluster process with an average of seven children per cluster and cluster kernel standard deviation of 0.03. Finally, simulations with ρ = 0, i.e., no confounding, indicate no bias for either model, as expected.

Our bias results when ρ ≠ 0 are analogous to the bias seen with penalized spline models in He, Mazumdar and Arena (2006) and Peng, Dominici and Louis (2006). There, concurvity (i.e., ρ̂ ≠ 0) between the smooth temporal term (analogous to our spatial residual) and the exposure emerged from the fixed basis coefficients chosen based on empirical data examples (R. Peng, personal communication, He, Mazumdar and Arena, 2006). Similar results are seen in the spatial settings of Ramsay, Burnett and Krewski (2003).

The presence of small-scale independent variation in the residual (7) reduces bias for θu < θc (not shown), relative to the results presented above. This occurs through an increase in the number of degrees of freedom estimated from the data to capture residual variability, i.e., undersmoothing with respect to the variation at the θc scale, analogous to undersmoothing in the partial spline setting (Rice, 1986; Speckman, 1988). This scenario seems quite likely in applications: if there is large-scale residual spatial structure, there is likely to be finer-scale structure as well. Thus, analyses that attempt to best fit the data may in the process reduce bias from confounding at the larger scales.

2.5 The bias-variance tradeoff

We have seen that even when all covariance parameters are known and the scale of confounding is much larger than the scale of unconfounded variability in X, bias remains, albeit at a much reduced level. In principle, if the structure at the confounded scale could be exactly fit using a set of basis functions, such as a regression spline (e.g., Dominici, McDermott and Hastie, 2004), then the exposure effect estimate would be unbiased, as in any multiple regression. The partial residual kernel smoothing approach of Speckman (1988) reduces bias in similar fashion, albeit without using a projection, through the technique of twicing. However, in a real application, one has to choose the basis functions, and if the basis functions don’t fully explain the confounded, large-scale variability, even with a basis of seemingly sufficient dimension, this will induce a bias. One could instead consider a penalized spline approach with penalty parameter chosen in advance to give the desired effective degrees of freedom (edf). For fixed edf, since the penalized spline smoother is not a true projection (Speckman, 1988; Peng, Dominici and Louis, 2006), one would expect the penalized spline approach to have more bias than the regression spline approach. Heuristically, bias in this approach occurs because the estimated spatial term does not fully explain the confounded component of the variability in the outcome, causing a bias analogous to that seen in the partial spline setting (Rice, 1986; Speckman, 1988). However, we would expect the penalized spline to be less sensitive to the exact form of the basis functions and number and placement of knots, as is seen in the example (Section 4). Furthermore, one can always undersmooth to reduce the bias, following the recommendation in the partial spline literature (Rice, 1986; Speckman, 1988). Thus, using a penalized spline seems reasonable, albeit without the clean interpretation of a projection. I show below that simulations comparing regression spline and penalized spline models support these theoretical results from the literature, in the spatial context considered here, with the regression spline having reduced bias and increased variance relative to penalized modeling.

The primary issue in an application is choosing the amount of smoothing to reduce bias, since inference about βx is the goal rather than best fitting the data. Data-driven smoothing might reduce bias (if there is small-scale residual correlation) or might have little effect on bias (if the data suggest only large-scale residual correlation). Thus the reduction in bias will depend on the scales involved and the actual amount of smoothing done, and the analysis will reveal little about the sensitivity of estimation to scale. Instead, one could explicitly assess the bias-variance tradeoff by varying the amount of smoothing and assessing the sensitivity of the exposure effect inference. One approach is a spatial analogue to the sensitivity analysis approaches of Peng, Dominici and Louis (2006): fit a model with spatial basis functions and vary the edf (e.g., Zeger et al., 2007). Plotting β̂x and uncertainty intervals as a function of edf (or some other metric) provides an assessment of the robustness of results to potential spatial confounding at various scales. If one is concerned about confounding at a particular scale, then one can report the results for an edf that would undersmooth with respect to that scale to reduce bias, accepting the tradeoff of increased uncertainty.
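In mgcv, such a sensitivity analysis can be sketched as below, assuming a data frame dat with outcome y, exposure x, and coordinates sx and sy (these names, the basis dimensions, and the smoothing parameters are illustrative): s(..., fx = TRUE) gives an unpenalized regression spline whose edf is set by the basis dimension, while fixing sp gives a penalized spline with the amount of smoothing held constant.

## A sketch: exposure effect sensitivity to the flexibility of the spatial term.
library(mgcv)

## Regression spline: unpenalized thin plate basis (edf set by the basis dimension)
rs <- lapply(c(5, 15, 30), function(k) {
  fit <- gam(y ~ x + s(sx, sy, k = k, fx = TRUE), data = dat)
  c(edf = sum(fit$edf) - 2, summary(fit)$p.table["x", c("Estimate", "Std. Error")])
})

## Penalized spline: large basis, edf controlled by fixing the smoothing parameter
ps <- lapply(c(10, 1, 0.1, 0.01), function(sp) {
  fit <- gam(y ~ x + s(sx, sy, k = 60), data = dat, sp = sp)
  c(edf = sum(fit$edf) - 2, summary(fit)$p.table["x", c("Estimate", "Std. Error")])
})

round(do.call(rbind, c(rs, ps)), 3)
## Plotting Estimate +/- 1.96 * Std. Error against the spatial edf displays the tradeoff.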

Motivated by this analysis strategy, I set up a simulation under the settings of Section 2.4, using a regression spline (i.e., unpenalized fixed effects) and varying the edf by changing the dimension of the basis in gam() in R. Fig. 5a shows relative bias as a function of the spatial scales involved. As before, I focus on the results below the 1:1 diagonal (θc = θu) as this is the scenario of practical interest. By choosing a large number of edf, one can decrease bias more effectively than when estimating the amount of smoothing from the data (i.e., Fig. 3c). However, with moderate and large scale variability, the variance of the estimates in this fixed effects model increases dramatically (Fig. 5b). This causes a concordant increase in the MSE (not shown), highlighting the bias-variance tradeoff. In contrast, using a penalized spline with fixed edf (fixing the smoothing parameter in gam() in R) shows much more stable results. As expected, for a given edf bias is not reduced as much as with a regression spline (Fig. 5a), but there is much less variability (Fig. 5b).

Fig 5.

Simulation results for relative bias (a) and variance (b) of β̂x as a function of the spatial scales of confounded (θc) and unconfounded (θu) variability for regression splines (top rows) and penalized splines (bottom rows) with 5, 15 and 30 edf, where the edf are pre-specified, rather than estimated based on the data.

A diagnostic approach to understanding whether the residual may include variation from an unmeasured confounder is to assess the correlation between the residual and the exposure. Not knowing βx, one might use a variety of plausible values of βx to estimate g and then calculate the correlation with X (and potentially with filtered versions of X that exclude small-scale variation).
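A minimal version of this diagnostic, reusing the hypothetical data frame dat from the earlier sketches, might look like:

## A sketch: correlation between the estimated spatial residual and the exposure,
## over a grid of plausible values of beta_x.
library(mgcv)

sapply(c(0, 0.25, 0.5, 0.75), function(bx) {
  dat$y_minus <- dat$y - bx * dat$x                              # outcome minus assumed beta_x * X
  ghat <- fitted(gam(y_minus ~ s(sx, sy, k = 60), data = dat))   # estimated spatial surface g-hat
  cor(ghat, dat$x)                                               # large values suggest possible confounding
})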

2.6 Accounting for residual spatial correlation

If one accounts for large-scale variation as a means of reducing bias, there may still be small-scale residual variation, such as fine-scale correlation in health outcomes related to residential sorting. As I have shown, one can reduce potential confounding bias from this fine-scale variation through explicit spatial modeling only if there is variability in the exposure at an even smaller scale. If there is not, then one is effectively assuming that the fine-scale variation is uncorrelated with the exposure. Given this assumption, one may need to account for the fine-scale residual spatial variation so that uncertainty estimation for βx is not compromised (but note the results of Section 3.3). One possibility would be to use an analysis robust to misspecification of the residual variance, for example using an estimating equation with uncertainty based on the sandwich estimator, with regression spline terms in the mean to account for large-scale spatial confounding bias. Alternatively, one could fit a penalized model with the amount of smoothing determined from the data. This has two effects. First, it naturally accounts for the effect of the spatial structure on uncertainty estimation. Second, in the presence of small-scale residual variability, the model will naturally undersmooth with respect to large-scale variability that may cause confounding, thereby reducing bias from confounding at the larger scale, as discussed previously.

3. SPATIALLY CORRELATED RESIDUALS AND PRECISION

In this section I suppose that the residual and the exposure are independent (ρ = 0 in the framework of Section 2), which results in unbiased estimation of βx. I consider effects of spatial scale on the following questions about efficiency of estimators for βx (henceforth simply β) and quantification of uncertainty:

  1. Given a fixed amount of residual variation, how is efficiency affected by the proportion of that variation that is spatial?

  2. What is the magnitude of the improvement in efficiency when accounting for residual spatial variation, relative to OLS?

  3. If one uses the naive estimator for the variance of the OLS estimator, β̂OLS, what is the magnitude of the error in uncertainty estimation compared to the correct variance estimator for β̂OLS?

The first question does not appear to have been raised in the literature. With regard to the second, while we know that GLS is the most efficient estimator when the residuals are correlated, here I investigate the magnitude of this efficiency advantage as a function of the spatial scales involved. Regarding the third, the conventional wisdom in the statistical and applied literature appears to be that not accounting for spatial structure leads to underestimation of uncertainty (e.g., Legendre, 1993; Burnett et al., 2001; Schabenberger and Gotway, 2005, p. 324). However, I have not seen a formal quantification of this underestimation for a regression coefficient, in contrast to our understanding of the potentially severe underestimation of uncertainty for the mean of a spatial process (Cressie, 1993, Sec. 1.3; Schabenberger and Gotway, 2005, Sec. 1.5).

Note that there are three variance estimators (i.e., estimators for the sampling variability of the estimated regression coefficient) under consideration here: the true GLS variance estimator, and the true and naive OLS variance estimators. When ρ = 0, OLS is unbiased, so it makes sense to consider OLS for estimation, provided we adjust the usual OLS variance estimator to account for the residual spatial correlation. While actual applications will likely involve more complicated modeling, consideration of these questions in this simple setting, and with known variance components, helps to understand the basic issues.

3.1 Relationship between spatial scale and GLS efficiency

Given a fixed amount of residual variation, how is efficiency affected by the proportion of variation that is spatial? I quantify efficiency in terms of precision rather than variance as this allows for closed form derivations.

Lemma 3.1

Consider the model (3) and suppose that all parameters are known except β0 and β ≡ βx. The expectation of the precision of β̂GLS, with respect to the sampling distribution of X, is

E_X\!\left( \mathrm{Var}(\hat{\beta}_{\mathrm{GLS}})^{-1} \right) = \frac{\sigma_x^2}{\tau^2 + \sigma_g^2} \left( \mathrm{tr}\{\tilde{\Sigma}^{-1} R(\theta_x)\} - \frac{\mathbf{1}^T \tilde{\Sigma}^{-1} R(\theta_x) \tilde{\Sigma}^{-1} \mathbf{1}}{\mathbf{1}^T \tilde{\Sigma}^{-1} \mathbf{1}} \right),     (8)

where Σ̃ ≡ (1 − p_g)I + p_g R(θ_g), p_g ≡ σ_g²/(σ_g² + τ²), and the remaining notation follows that in previous sections. See the Appendix for the proof.

Note that the term in parentheses is an effective sample size, analogous to n − 1 in the non-spatial problem. Here the adjustment is for spatial structure in residual and exposure, with the second component in the parentheses analogous to the degree of freedom lost for estimating a mean.

Fig. 6a shows Monte Carlo estimates of the expected precision as a function of θx and θg, averaging (8) over 500 sets of n = 100 locations simulated uniformly on the unit square. I report the expected precision divided by a baseline of σ_x²(n − 1)/(τ² + σ_g²), which is the expected precision in the non-spatial setting, supposing that the total residual variation, τ² + σ_g², remains constant. Compared to the non-spatial setting, lower precision occurs unless the exposure varies at small spatial scale. When the exposure varies at small spatial scale and the residual at larger spatial scale, precision can be substantially greater than in the non-spatial setting. The model is able to account for part of the residual variance through the spatial structure, as if the spatial structure were an additional covariate to which variation in the response is attributed. GLS implicitly estimates the process, g in (2), that gives rise to the marginalized model (3). This reduces the remaining ‘unexplained’ residual variability and thereby improves efficiency relative to having independent errors but equivalent overall residual variability. In contrast, when X varies only at large spatial scales, then efficiency decreases because of difficulty in distinguishing βX(s) from g(s). Results are similar using points on a regular grid or clustered based on a Poisson cluster process.
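The following R sketch carries out a Monte Carlo calculation of this kind directly from the definition of the GLS precision (so it does not depend on the particular form of (8)), with illustrative parameter values and the exponential correlation for brevity; values will therefore not match Fig. 6a exactly.

## A sketch: expected GLS precision for beta_x, relative to the non-spatial baseline.
set.seed(5)
n <- 100; nsim <- 200
sigx2 <- 1; sigg2 <- 2; tau2 <- 2; theta_x <- 0.1; theta_g <- 0.9

prec <- replicate(nsim, {
  locs <- cbind(runif(n), runif(n)); D <- as.matrix(dist(locs))
  x <- drop(t(chol(sigx2 * exp(-D / theta_x) + 1e-8 * diag(n))) %*% rnorm(n))
  Sigma <- sigg2 * exp(-D / theta_g) + tau2 * diag(n)
  Xmat <- cbind(1, x)
  1 / solve(t(Xmat) %*% solve(Sigma, Xmat))[2, 2]   # GLS precision for beta_x, given X
})
log(mean(prec) / (sigx2 * (n - 1) / (tau2 + sigg2)))  # log precision relative to the non-spatial baseline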

Fig 6.

Efficiency and precision results for three values of p_g = σ_g²/(σ_g² + τ²) (columns) as a function of the spatial scales of the residual (θg) and the exposure (θx). (a) The log of the expected precision of the GLS estimator (8), relative to the expected precision in the non-spatial setting with equivalent total residual variation. (b) Relative efficiency of GLS and OLS estimation, quantified as the log of the expected ratio of GLS to OLS precision. (c) The log of the expected ratio of the correct and naive OLS variance estimators (9). The results are based on 500 simulations for each set of parameter values, with a Matérn correlation with ν = 2 and 100 locations sampled uniformly over the unit square.

3.2 Efficiency of GLS and OLS estimators

Here I consider how spatial scale affects the relative efficiency of spatial and non-spatial estimators, comparing the precisions of the OLS and GLS estimators. Since the true OLS variance, [(𝒳^T𝒳)^{-1}(𝒳^T(τ²I + σ_g²R(θ_g))𝒳)(𝒳^T𝒳)^{-1}]_{2,2}, is a complicated function, it is difficult to derive closed form expressions for efficiency relative to the GLS estimator. Instead I conduct a small simulation study. For a regular grid of values of θx and θg, I carry out 500 simulations for each pair of values, with n = 100 observations whose spatial locations are drawn uniformly over the unit square domain. Note that I consider the ratio of the GLS precision to the OLS precision, so the values of σ_x² and σ_g² + τ² cancel out of the ratio and do not affect the results.

Fig. 6b shows the Monte Carlo estimates of the expected relative precision, as a function of the spatial scales, θg and θx, and the proportion of the residual variability that is spatial. When little of the residual variability is spatial (pg = 0.1), there is little gain in precision, as expected. When more is spatial, the gains in precision are small when g varies at a small scale, but substantial when g varies at a large scale. Unfortunately, this is also precisely the case in which one would be concerned about spatial confounding. If we suppose that the large-scale structure in the residual has been controlled for in an effort to reduce the potential for bias, then with the remaining residual variability being fine scale, there is limited gain in precision regardless of the spatial scale of the exposure. With locations on a regular grid, the gains in precision are slightly less for small values of θg, while with Poisson cluster process sampling, the gains are somewhat larger for small values of θg. See also Dow, Burton and White (1982) for similar simulation results when a Markov random field structure induces the correlation.

3.3 Underestimation of uncertainty by the naive OLS variance estimator

Applied analyses often ignore residual spatial correlation, raising the question of how strongly uncertainty estimates are affected. One can express the ratio of the true OLS variance to the incorrect naive OLS variance as follows. First define W ≡ (X − X̄1)/s, where s² ≡ (1/n) Σ_i (X_i − X̄)². After expressing β̂_x = [(𝒳^T 𝒳)^{-1} 𝒳^T Y]_2 = (X̃^T X̃)^{-1} X̃^T Y, where X̃ ≡ X − X̄1, we have

\frac{\mathrm{Var}_{\mathrm{true}}(\hat{\beta}_x)}{\mathrm{Var}_{\mathrm{naive}}(\hat{\beta}_x)} = \frac{(\sigma_g^2 + \tau^2)^{-1} (\tilde{X}^T \tilde{X})}{(\sigma_g^2 + \tau^2)^{-1} (\tilde{X}^T \tilde{X}) (\tilde{X}^T \tilde{\Sigma} \tilde{X})^{-1} (\tilde{X}^T \tilde{X})} = \frac{\tilde{X}^T \tilde{\Sigma} \tilde{X}}{\tilde{X}^T \tilde{X}} = \frac{1}{n} W^T \tilde{\Sigma} W.

Averaging over the sampling distribution of X, we have

E_X\!\left( \frac{1}{n} W^T \tilde{\Sigma} W \right) = \frac{1}{n} \mathrm{tr}\!\left( \tilde{\Sigma}\, \mathrm{Cov}(W) \right).     (9)

So for Σ̃ ≈ I or Cov(W) ≈ I, i.e. when either θg or θx is close to zero, we expect the ratio to be near one. Note also that with spatial correlation functions that are non-negative, the only negative contribution to the ratio can be from negative covariances induced by standardizing X. Such negative covariances should diminish as the sample size increases, so we expect the ratio to generally be no smaller than one, indicating that the naive variance does underestimate uncertainty. Finally, the largest values of the ratio would occur with large positive correlations in corresponding elements of Σ̃ and Cov(W), which is to be expected when both g and X show large-scale variation.

Fig. 6c supports these heuristic results, showing the average ratio of variances in simulations, where the simulations are conducted as in Section 3.2. The ratio is close to one when either of the spatial terms has fine-scale variability and far from one when both have large-scale behavior. This result is similar to that of Bivand (1980) for inference about a correlation coefficient and to (Johnston and DiNardo, 1997, p. 178) under serial autocorrelation in a regression setting. As expected, when the proportion of residual variability is smaller (moving from the bottom left to bottom right panels), the expected ratio gets closer to one. This indicates that when non-spatial variation dominates the residual and the spatial structure in the residual or exposure is not too large in scale, the naive variance estimator may be reasonable. A lack of large-scale residual structure might result from having accounted for large-scale variation in attempting to reduce spatial confounding bias. Results with gridded locations show ratios slightly closer to one, and with clustered locations, ratios further from one. Note that the uncertainty estimate in any given naive analysis may be larger than when fitting a spatial model because the more sophisticated model both corrects the variance estimate, which increases the estimated uncertainty, and uses a more efficient estimator, which decreases the fundamental uncertainty.

Simple simulations with spatial ranges and sampling designs specific to an analysis could be easily carried out for further guidance in a given setting, allowing one to assess whether ignoring the spatial structure has substantial impact on uncertainty estimation. Accounting for small-scale spatial correlation requires estimation of the spatial structure and is often computationally burdensome, so an assumption of independence can have an important practical benefit. Of course in some analyses, any underestimation of variability may be cause for concern, in which case use of the naive variance estimator would not be tenable.
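For example, a design-specific check along these lines might look like the following R sketch, with illustrative spatial ranges and variance split and the exponential correlation for brevity; one would substitute locations, ranges, and variance components mimicking the analysis at hand. Averaging the ratio over several simulated exposures, or plugging in the observed exposure, gives a design-level summary.

## A sketch: ratio of the true to the naive OLS variance of the exposure coefficient.
set.seed(6)
n <- 100
locs <- cbind(runif(n), runif(n)); D <- as.matrix(dist(locs))
theta_x <- 0.5; theta_g <- 0.5; sigg2 <- 1; tau2 <- 1

x  <- drop(t(chol(exp(-D / theta_x) + 1e-8 * diag(n))) %*% rnorm(n))
xt <- x - mean(x)                                  # centered exposure
V  <- sigg2 * exp(-D / theta_g) + tau2 * diag(n)   # assumed residual covariance
var_true  <- drop(t(xt) %*% V %*% xt) / sum(xt^2)^2
var_naive <- (sigg2 + tau2) / sum(xt^2)
var_true / var_naive                               # values above one indicate naive underestimation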

4. CASE STUDY: BIRTHWEIGHT AND AIR POLLUTION

Chronic health effects of ambient air pollution in developed countries involve small relative risks, but are of considerable public health importance because of widespread exposure. Epidemiologic studies attempt to estimate a small effect from data with high levels of variability and stronger effects from other covariates, including potential confounders such as socioeconomic status, so spatial confounding bias is of critical concern.

I reanalyze data on the association between ambient air pollution (estimates of black carbon, a component of particulate matter) and birthweight in eastern Massachusetts (Zeka, Melly and Schwartz, 2008; Gryparis et al., 2009). These analyses found significant negative effects of traffic proxy variables and black carbon, respectively, on birthweight. Gryparis et al. (2009) used several methods to try to account for effects of measurement error in the predicted black carbon concentrations, which are based on a regression model that accounts for spatial and temporal structure and key covariates.

I follow these analyses in using an extensive set of covariates to try to account for potential confounding. I use smooth terms for mother’s age, gestational age, and mother’s cigarette use, to account for nonlinearities, a linear term for census tract income, and categorical variables for the following: presence of a health condition of the mother, previous preterm birth, previous large birth, sex of baby, year of birth, index of prenatal care, and maternal education. The exposure of interest is estimated nine-month average black carbon concentration at the geocoded address of the mother, based on a black carbon prediction model (Gryparis et al., 2007). Following Gryparis et al. (2009), for simplicity I exclude the 13,347 observations with any missing covariate values, giving 205,713 births.

In Gryparis et al. (2009) we found no evidence of residual spatial correlation based on a spatial semivariogram. Further analysis here indicates that there is significant residual spatial variation but that non-spatial variation overwhelms the magnitude of this variation. Fig. 7a is a semivariogram showing no evidence of spatial structure, while a spatial smooth of model residuals (Fig. 7b) indicates clear spatial structure. While individual non-spatial variability amongst babies swamps the spatial variation (hence the flat semivariogram), it is large relative to the estimated pollution effect (note the surface values in the range of −40 to 40, for comparison with effect estimates in Fig. 8). Thus, if the residual spatial variation is caused by spatially varying confounders, it could bias estimation of the pollution effect.

Fig 7.

(a) Semivariogram of full model residuals, with the first point representing births to mothers living at the same location. (b) Spatial smooth of residuals with town boundaries in grey. The spatial smooth, with 129 edf chosen by GCV, is highly significant.

Fig 8.

For the model with the full set of covariates (a) and the reduced set of covariates (b), black carbon effect estimates and 95% confidence intervals based on different specifications for the spatial term in an additive model: black pluses indicate the model with no spatial term, green dots the model with the edf chosen by GCV, and black (regression spline) and red (penalized spline) dots indicate results when fixing the degrees of freedom at a set of discrete values. The solid lines through the points and the corresponding dashed lines are obtained by connecting the effect estimates and confidence interval bounds across the discrete set.

To include a spatial term in models of birthweight, I consider a regression spline, an unpenalized approach, and a penalized spline, both with edf chosen in advance (see Section 2.5), as well as a penalized spline with data-driven smoothing parameter estimation based on GCV, all implemented in gam() in R, using the thin plate spline basis. Note that the thin plate regression spline approach implemented in gam() should minimize sensitivity to knot placement (Wood, 2006).
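In mgcv syntax, the model described above corresponds to something like the sketch below; the variable names (bwt, bc, mage, and so on) are placeholders rather than the study's actual variable names, and the basis dimension for the spatial term is illustrative.

## A sketch of the additive model described above (placeholder variable names; not the study data).
library(mgcv)

fit <- gam(bwt ~ bc                                     # estimated black carbon exposure
             + s(mage) + s(gestage) + s(cigs)           # smooth terms for nonlinear covariates
             + tract_income                             # linear term for census tract income
             + health_cond + prev_preterm + prev_large  # categorical covariates
             + sex + birth_year + prenatal_care + educ
             + s(xcoord, ycoord, k = 200),              # thin plate spatial term
           data = births, method = "GCV.Cp")
summary(fit)$p.table["bc", ]                            # black carbon effect and its standard error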

I first add a spatial term to the model with the full set of covariates to assess whether some of the estimated effect may be biased by spatial confounding. Fig. 8a shows how the estimated effect of black carbon varies with the edf and the spatial smoothing approach. The estimate attenuates somewhat as more edf are used to account for the spatial structure. For the penalized spline, as more than about 10 edf are used, the upper confidence limit exceeds zero, and for larger edf, the upper limit increases further. GCV chooses 157 edf, indicating fairly small-scale spatial structure in the data. For context note that with 129 edf in Fig. 7b we see spatial features at the scale of individual towns. While the regression spline approach implemented here avoids having to choose the knots, the empirical results are still very sensitive to edf, in contrast to the stability of the penalized spline solution as the edf varies. For both penalized and regression splines, there is a clear bias-variance tradeoff, with increasing variance as the number of edf increases. However, for this problem with a very large sample size, the confidence intervals do not increase drastically, nor is there much difference in the uncertainty between the regression and penalized spline approaches. The spatial confounding assessment suggests that while we have somewhat reduced confidence in the black carbon effect, the effect estimate is reasonably stable even when using a spatial term with a large number of degrees of freedom.

Next I consider what might have happened if most of the covariates (particularly the ones related to socioeconomic status) were not measured, potentially inducing serious confounding. Fig. 8b indicates that without any spatial term in the model, the effect estimate is −23.0 with a 95% confidence interval of (−26.8, −19.2), indicating a much more substantial effect of black carbon than the fully adjusted model. As soon as one accounts for spatial structure, even with a small number of edf, the estimate attenuates, approaching the fully adjusted estimate, with the upper confidence limit rising above zero. The reduced model appears to suffer from serious confounding, with the estimated pollution effect apparently driven by large-scale association of pollution and birthweight. The spatial analysis is able to account for much of this apparent confounding, substituting for a rich set of covariates.

Ideally one would fit a model that accounts for fine-scale spatial structure to improve one’s confidence in the uncertainty estimation. However, with 205,713 observations, this is a computational challenge that I do not take up here. Given the results in Section 3.3 that indicate that large-scale structure causes most of the variance underestimation, one can hope that the uncertainty at the larger values for the spatial edf in Fig. 8 may reasonably approximate the true uncertainty.

5. DISCUSSION

Considerations of scale are critical in spatial regression problems. Standard spatial regression models, which use spatial random effects, kriging specifications, or a penalized spline to represent the spatial structure, are penalized models with an inherent bias-variance tradeoff in estimating the smooth function. Under unmeasured spatial confounding, that bias carries over into the estimated coefficient for the exposure of interest, but its magnitude depends on the spatial scales involved. Including a spatial residual term accounts for spatial correlation, in the sense of reducing bias from unmeasured spatial confounders, only when there is unconfounded variability in the exposure at a scale smaller than the scale of the confounding. If the variation in exposure occurs solely at large scales, there is little opportunity to reduce spatial confounding bias; with a component of small-scale exposure variability, large-scale spatial confounding bias can be reduced substantially. Accounting for large-scale residual correlation is also important for improving the precision of regression estimators and for correctly estimating uncertainty. In contrast, when residual correlation occurs at small scales, there is little opportunity to reduce spatial confounding bias at those scales or to improve estimator precision; however, under the assumption of no small-scale confounding, fitting such residual structure can reduce bias from larger-scale confounding by inducing undersmoothing with respect to the large-scale structure. While the results here are limited to the simple setting of linear regression and additive models with a single covariate and a single unmeasured confounder, I expect the qualitative principles to hold in more complicated settings, with no reason to believe that the bias results would improve in such models.
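The central scale argument can be seen in a toy one-dimensional simulation of my own construction (not taken from the paper's simulation study); all names and numbers below are purely illustrative.

```r
## Unmeasured confounder varies smoothly over "space"; the exposure equals
## that large-scale surface plus independent small-scale variation.
library(mgcv)
set.seed(1)
n    <- 500
loc  <- runif(n)                     # one-dimensional locations for simplicity
conf <- sin(2 * pi * loc)            # unmeasured, large-scale confounder
x    <- conf + rnorm(n)              # exposure: confounded large scale + clean small scale
y    <- 1 * x + 2 * conf + rnorm(n)  # true exposure effect is 1

coef(lm(y ~ x))["x"]                              # biased well above 1
coef(gam(y ~ x + s(loc, k = 30, fx = TRUE)))["x"] # close to 1: the spline absorbs
                                                  # the large-scale confounding
## If x were a smooth function of loc alone (no small-scale component), the
## spline could not separate exposure from confounder and the bias would remain.
```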

Sensitivity analyses that show the bias-variance tradeoff as a function of the scale at which the spatial residual structure is modeled (Peng, Dominici and Louis, 2006; Zeger et al., 2007) offer one approach that frames the issue of bias in terms of the spatial scales involved. In choosing a spline formulation for such an analysis, a regression spline has an appealing interpretation and should in theory yield less bias in estimating the effect of interest, but a penalized spline with fixed effective degrees of freedom may give more stable results. Of course, the sensitivity analysis approach does not answer the question of how to obtain a single estimate of the effect of interest. One might also consider an approach similar to that of Beelen et al. (2007) and explicitly decompose the exposure into multiple scales, including exposure at each scale as a separate covariate and focusing causal interpretation on the effect estimates for the smaller scales (e.g., Janes, Dominici and Zeger, 2007); a sketch of one such decomposition appears below. Lu and Zeger (2007) use matching estimators for each pair of observations and examine how effect estimates vary with the spatial lag between pairs as a form of sensitivity analysis. Note that estimating equation approaches are not capable of reducing bias from unmeasured spatial confounding, because the marginal variance is assumed to be unrelated to the exposure and variation is not attributed to a spatial term.
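One possible implementation of such an explicit scale decomposition, in the spirit of Beelen et al. (2007) but not their method, is sketched below; the variable names continue the hypothetical example used above, and the basis dimension controls where the split between "large" and "small" scales falls, so in practice one would examine several values.

```r
## Decompose the exposure into a large-scale component (a smooth over space)
## and a small-scale remainder, then enter both as separate covariates.
library(mgcv)
bc_smooth <- gam(bc ~ s(x, y, bs = "tp", k = 50), data = dat)
dat$bc_L  <- fitted(bc_smooth)       # large-scale exposure
dat$bc_S  <- dat$bc - dat$bc_L       # small-scale exposure

## Causal interpretation would focus on the coefficient of bc_S, which relies
## on exposure contrasts below the scale at which confounding is suspected.
fit_dec <- gam(bwt ~ bc_L + bc_S, data = dat)   # measured covariates omitted
summary(fit_dec)$p.table
```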

From the econometric perspective, spatial confounding bias might be seen as a type of endogeneity bias, with exposure the endogenous variable and the unconfounded component of exposure, or some proxy for it, an exogenous variable. Since the unconfounded component is not measured directly, some sort of scale decomposition appears necessary. Standard endogenous variable techniques such as two-stage least squares and instrumental variable methods (Johnston and DiNardo, 1997) do not appear directly useful but do share commonalities with approaches mentioned above.

Others have noted the identifiability problems in spatial models, with effect estimates sensitive to the inclusion of a spatial residual term when the covariates vary spatially (Breslow and Clayton, 1993; Clayton, Bernardinelli and Montomoli, 1993; Burden et al., 2005; Lawson, 2006, p. 187; Augustin et al., 2007; Wakefield, 2007). A different methodologic perspective than that presented here has been taken by Reich, Hodges and Zadnik (2006) and Houseman, Coull and Shine (2006), who estimate the effect of the exposure, X, by forcing the spatial residual to be orthogonal to X, thereby attributing as much variability as possible to X (a sketch of this construction is given after this paragraph). This approach makes the very strong assumption of no confounding in order to avoid overadjustment bias from accidentally absorbing some of the effect of the covariate into the residual. Note that the residuals and covariates are not orthogonal under GLS estimation (Schabenberger and Gotway, 2005, p. 349). Gustafson and Greenland (2006) confront a similar problem of modeling systematic residual confounding in a context with identifiability problems, finding that imposing structure through a prior distribution in a nonidentified model can help account for a portion of the confounding, improving the bias and precision of estimators.
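In matrix form, and in my notation rather than that of Reich, Hodges and Zadnik (2006), the orthogonalization idea can be sketched as follows: with $W=(\mathbf{1},X)$ the design matrix and $u$ the vector of spatial random effects, the spatial term is replaced by its projection onto the orthogonal complement of the columns of $W$,

\[
Y = W\beta + \bigl(I - W(W^{T}W)^{-1}W^{T}\bigr)u + e,
\]

so that, by construction, the spatial term can explain none of the variation in $Y$ lying in the column space of $W$, and the point estimate of $\beta_x$ essentially coincides with that from the non-spatial fit.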

Note that measurement error in the exposure is of critical concern, because reducing bias relies on estimating variability in exposure at scales smaller than that of the confounding. In many contexts, measurement error becomes an increasing concern at small scales because of limitations in measurement resources. In contrast, large-scale exposure variation may be well estimated using spatial smoothing and regression models, thereby inducing Berkson-type error through what is effectively regression calibration (Gryparis et al., 2009). To the extent that reducing confounding bias forces one to rely on exposure estimates more likely contaminated by classical measurement error, one may find oneself reducing bias from confounding only to increase it from measurement error. To the extent that the small-scale variation is affected by Berkson error, relying on it would increase variance but not incur bias.

Finally, note that in many settings one has aggregated exposure and outcome data, which limits one's ability to identify exposure effects from fine-scale variation because the aggregation eliminates that variation (e.g., Janes, Dominici and Zeger, 2007). This suggests that accounting for spatial confounding with areal data, for which researchers often use standard conditional autoregressive models, is likely to be ineffective when aggregating over large areal units, consistent with the bias seen in Richardson (2003). In work concurrent with that presented here, Hodges and Reich (2010) have investigated bias in the areal setting under a variety of perspectives on the spatial random effects, also making the case for the approach taken in Reich, Hodges and Zadnik (2006).

Acknowledgments

The author thanks Louise Ryan and Francesca Dominici for feedback and encouragement, Andy Houseman, Eric Tchetgen, and Brent Coull for comments, Ben Armstrong and John Rice for thought-provoking discussions, Joel Schwartz for access to the birthweight data, Alexandros Gryparis and Steve Melly for assistance with the birthweight data, and Brent Coull for funding through NIEHS R01 grant ES01244. This work was also funded by NIEHS Center grant ES000002 and NCI P01 grant CA134294-01.

APPENDIX A: PROOF OF LEMMA 3.1

Proof

From the definition of the GLS estimator, we have

\[
\mathrm{Var}(\hat{\beta}_{GLS})
= \left[\mathbf{X}^{T}\Sigma^{-1}\mathbf{X}\right]^{-1}_{2,2}
= \frac{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}}
       {\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}\,X^{T}\Sigma^{-1}X
        - X^{T}\Sigma^{-1}\mathbf{1}\,\mathbf{1}^{T}\Sigma^{-1}X}.
\]
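Spelling out the matrix algebra, with $\mathbf{X}=(\mathbf{1},X)$ denoting the design matrix (notation added here for clarity), the cross-product matrix is $2\times 2$ and the second equality is the (2,2) element of its inverse:

\[
\mathbf{X}^{T}\Sigma^{-1}\mathbf{X}
= \begin{pmatrix}
\mathbf{1}^{T}\Sigma^{-1}\mathbf{1} & \mathbf{1}^{T}\Sigma^{-1}X\\
X^{T}\Sigma^{-1}\mathbf{1} & X^{T}\Sigma^{-1}X
\end{pmatrix},
\qquad
\left[\mathbf{X}^{T}\Sigma^{-1}\mathbf{X}\right]^{-1}_{2,2}
= \frac{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}}
       {\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}\,X^{T}\Sigma^{-1}X - \bigl(\mathbf{1}^{T}\Sigma^{-1}X\bigr)^{2}}.
\]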

Using the definitions of Σ̃ and p_g, and taking the reciprocal, we have

\[
\mathrm{Prec}(\hat{\beta}_{GLS})
= \frac{1}{\sigma_g^{2}+\tau^{2}}
  \left(X^{T}\tilde{\Sigma}^{-1}X
        - \frac{X^{T}\tilde{\Sigma}^{-1}\mathbf{1}\,\mathbf{1}^{T}\tilde{\Sigma}^{-1}X}
               {\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}}\right).
\]

Conclude by taking the expectation with respect to the sampling distribution of X, using the expectation of a quadratic form, and rearranging the matrices inside the second trace to give a scalar:

\[
\begin{aligned}
E_{X}\bigl(\mathrm{Prec}(\hat{\beta}_{GLS})\bigr)
&= \frac{1}{\sigma_g^{2}+\tau^{2}}
   \left(\sigma_x^{2}\,\mathrm{tr}\bigl(\tilde{\Sigma}^{-1}R(\theta_x)\bigr)
         - \sigma_x^{2}\,
           \frac{\mathrm{tr}\bigl(\tilde{\Sigma}^{-1}\mathbf{1}\mathbf{1}^{T}\tilde{\Sigma}^{-1}R(\theta_x)\bigr)}
                {\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}}
         + \mu_x^{2}\,\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}
         - \frac{\mu_x^{2}\,\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}\,\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}}
                {\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}}\right)\\
&= \frac{\sigma_x^{2}}{\sigma_g^{2}+\tau^{2}}
   \left(\mathrm{tr}\bigl(\tilde{\Sigma}^{-1}R(\theta_x)\bigr)
         - \frac{\mathbf{1}^{T}\tilde{\Sigma}^{-1}R(\theta_x)\tilde{\Sigma}^{-1}\mathbf{1}}
                {\mathbf{1}^{T}\tilde{\Sigma}^{-1}\mathbf{1}}\right).
\end{aligned}
\]
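The quadratic-form expectation used in the final step, E(X^T A X) = σ_x² tr(A R(θ_x)) + μ_x² 1^T A 1 for X with mean μ_x 1 and covariance σ_x² R(θ_x), is easy to verify numerically. The sketch below is my own check, using an arbitrary exponential correlation matrix and illustrative parameter values, with A standing in for Σ̃^{-1}.

```r
## Monte Carlo check of E(X' A X) = sigma_x^2 * tr(A R) + mu_x^2 * 1' A 1.
library(MASS)
set.seed(2)
n  <- 50
d  <- as.matrix(dist(runif(n)))   # arbitrary one-dimensional locations
R  <- exp(-d / 0.3)               # exponential correlation (illustrative)
A  <- solve(R + 0.5 * diag(n))    # any symmetric matrix; stands in for the inverse of Sigma-tilde
mu_x <- 2; sigma_x <- 1.5
X  <- mvrnorm(5000, mu = rep(mu_x, n), Sigma = sigma_x^2 * R)

mean(rowSums((X %*% A) * X))                      # simulated E(X' A X)
sigma_x^2 * sum(diag(A %*% R)) + mu_x^2 * sum(A)  # analytic value
```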

References

1. Augustin N, Lang S, Musio M, Von Wilpert K. A spatial model for the needle losses of pine-trees in the forests of Baden-Wurttemberg: an application of Bayesian structured additive regression. Journal of the Royal Statistical Society, Series C (Applied Statistics). 2007;56:29–50.
2. Beelen R, Hoek G, Fischer P, Brandt P, Brunekreef B. Estimated long-term outdoor air pollution concentrations in a cohort study. Atmospheric Environment. 2007;41:1343–1358.
3. Biggeri A, Bonannini M, Catelan D, Divino F, Dreassi E, Lagazio C. Bayesian ecological regression with latent factors: Atmospheric pollutants, emissions, and mortality for lung cancer. Environmental and Ecological Statistics. 2005;12:397–409.
4. Bivand R. A Monte Carlo study of correlation coefficient estimation with spatially autocorrelated observations. Quaestiones Geographicae. 1980;6:5–10.
5. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
6. Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models. The Annals of Statistics. 1989;17:453–510.
7. Burden S, Guha S, Morgan G, Ryan L, Sparks R, Young L. Spatio-temporal analysis of acute admissions for ischemic heart disease in NSW, Australia. Environmental and Ecological Statistics. 2005;12:427–448.
8. Burnett R, Ma R, Jerrett M, Goldberg M, Cakmak S, Pope C III, Krewski D. The spatial association between community air pollution and mortality: A new method of analyzing correlated geographic cohort data. Environmental Health Perspectives. 2001;109:375–380. doi: 10.1289/ehp.01109s3375.
9. Cakmak S, Burnett R, Jerrett M, Goldberg M, Pope C III, Ma R, Gultekin T, Thun M, Krewski D. Spatial regression models for large-cohort studies linking community air pollution and health. Journal of Toxicology and Environmental Health, Part A. 2003;66:1811–1823. doi: 10.1080/15287390306444.
10. Cerdá M, Tracy M, Messner S, Vlahov D, Tardiff K, Galea S. Misdemeanor policing, physical disorder, and gun-related homicide: A spatial analytic test of "broken-windows" theory. Epidemiology. 2009;20:533–541. doi: 10.1097/EDE.0b013e3181a48a99.
11. Cho W. Contagion effects and ethnic contribution networks. American Journal of Political Science. 2003;47:368–387.
12. Claeskens G, Krivobokova T, Opsomer J. Asymptotic properties of penalized spline estimators. Biometrika. 2009;96:529–544.
13. Clayton D, Bernardinelli L, Montomoli C. Spatial correlation in ecological analysis. International Journal of Epidemiology. 1993;22:1193–1202. doi: 10.1093/ije/22.6.1193.
14. Cressie N. Statistics for Spatial Data. Rev. ed. Wiley-Interscience; New York: 1993.
15. Diggle P, Heagerty PJ, Liang K-Y, Zeger S. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; Oxford: 2002.
16. Dominici F, McDermott A, Hastie T. Improved semiparametric time series models of air pollution and mortality. Journal of the American Statistical Association. 2004;99:938–949.
17. Dow M, Burton M, White D. Network autocorrelation: A simulation study of a foundational problem in regression and survey research. Social Networks. 1982;4:169–200.
18. Gryparis A, Coull B, Schwartz J, Suh H. Latent variable semiparametric regression models for spatio-temporal modeling of mobile source pollution in the greater Boston area. Journal of the Royal Statistical Society, Series C. 2007;56:183–209.
19. Gryparis A, Paciorek C, Zeka A, Schwartz J, Coull B. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics. 2009;10:258–274. doi: 10.1093/biostatistics/kxn033.
20. Gustafson P, Greenland S. The performance of random coefficient regression in accounting for residual confounding. Biometrics. 2006;62:760–768. doi: 10.1111/j.1541-0420.2005.00510.x.
21. He S, Mazumdar S, Arena V. A comparative study of the use of GAM and GLM in air pollution research. Environmetrics. 2006;17:81–93.
22. Hodges J, Reich B. Adding spatially-correlated errors can mess up the fixed effect you love. Technical Report No. 2010-002. University of Minnesota Division of Biostatistics; 2010. Available at http://www.biostat.umn.edu/ftp/pub/2010/rr2010-002.pdf.
23. Houseman E, Coull B, Shine J. A nonstationary negative binomial time series with time-dependent covariates: enterococcus counts in Boston harbor. Journal of the American Statistical Association. 2006;101:1365–1376.
24. Janes H, Dominici F, Zeger S. Trends in air pollution and mortality: An approach to the assessment of unmeasured confounding. Epidemiology. 2007;18:416–423. doi: 10.1097/EDE.0b013e31806462e9.
25. Johnston J, DiNardo J. Econometric Methods. 4th ed. McGraw-Hill; New York: 1997.
26. Lawson A. Statistical Methods in Spatial Epidemiology. 2nd ed. John Wiley & Sons; New York: 2006.
27. Lee D, Ferguson C, Mitchell R. Air pollution and health in Scotland: a multicity study. Biostatistics. 2009;10:409–423. doi: 10.1093/biostatistics/kxp010.
28. Legendre P. Spatial autocorrelation: Trouble or new paradigm? Ecology. 1993;74:1659–1673.
29. Lombardía MJ, Sperlich S. Multi-level regression between fixed effects and mixed effects models. Technical Report. Georg-August-Universität Göttingen; 2007. Available at http://www.zfs.uni-goettingen.de/index.php?id=54.
30. Lu Y, Zeger S. Decomposition of regression estimators to explore the influence of "unmeasured" time-varying confounders. Technical Report No. 159. Johns Hopkins University Department of Biostatistics; 2007. Available at http://www.bepress.com/jhubiostat/paper159.
31. Molitor J, Jerrett M, Chang C, et al. Assessing uncertainty in spatial exposure models for air pollution health effects assessment. Environmental Health Perspectives. 2007;115:1147–1153. doi: 10.1289/ehp.9849.
32. Peng R, Dominici F, Louis T. Model choice in time series studies of air pollution and mortality. Journal of the Royal Statistical Society, Series A. 2006;169:179–203.
33. Pope C III, Burnett R, Thun M, Calle E, Krewski D, Ito K, Thurston G. Lung cancer, cardiopulmonary mortality and long-term exposure to fine particulate air pollution. Journal of the American Medical Association. 2002;287:1132–1141. doi: 10.1001/jama.287.9.1132.
34. Ramsay T, Burnett R, Krewski D. Exploring bias in a generalized additive model for spatial air pollution data. Environmental Health Perspectives. 2003;111:1283–1288. doi: 10.1289/ehp.6047.
35. Reich B, Hodges J, Zadnik V. Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics. 2006;62:1197–1206. doi: 10.1111/j.1541-0420.2006.00617.x.
36. Rice J. Convergence rates for partially splined models. Statistics and Probability Letters. 1986;4:203–208.
37. Richardson S. Spatial models in epidemiological applications. In: Green P, Hjort N, Richardson S, editors. Highly Structured Stochastic Systems. Oxford University Press; 2003. pp. 237–259.
38. Ruppert D, Wand M, Carroll R. Semiparametric Regression. Cambridge University Press; Cambridge, U.K.: 2003.
39. Schabenberger O, Gotway C. Statistical Methods for Spatial Data Analysis. Chapman & Hall; Boca Raton: 2005.
40. Speckman P. Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B. 1988;50:413–436.
41. Wakefield J. Disease mapping and spatial regression with count data. Biostatistics. 2007;8:158–183. doi: 10.1093/biostatistics/kxl008.
42. Waller L, Gotway C. Applied Spatial Statistics for Public Health Data. Wiley; Hoboken, New Jersey: 2004.
43. Wood S. Generalized Additive Models: An Introduction with R. Chapman & Hall; Boca Raton: 2006.
44. Zeger S, Dominici F, McDermott A, Samet J. Mortality in the Medicare population and chronic exposure to fine particulate air pollution. Technical Report No. 133. Johns Hopkins University Department of Biostatistics; 2007. Available at http://www.bepress.com/jhubiostat/paper133.
45. Zeka A, Melly S, Schwartz J. The effects of socioeconomic status and indices of physical environment on reduced birth weight and preterm births in eastern Massachusetts. Environmental Health. 2008;7:60. doi: 10.1186/1476-069X-7-60.
