Summary
This article addresses the asymptotic performance of popular spatial regression estimators of the linear effect of an exposure on an outcome under spatial confounding, the presence of an unmeasured spatially structured variable influencing both the exposure and the outcome. We first show that the estimators from ordinary least squares and restricted spatial regression are asymptotically biased under spatial confounding. We then prove a novel result on the infill consistency of the generalized least squares estimator using a working covariance matrix from a Matérn or squared exponential kernel, in the presence of spatial confounding. The result holds under very mild assumptions, accommodating any exposure with some nonspatial variation, any spatially continuous fixed confounder function, and non-Gaussian errors in both the exposure and the outcome. Finally, we prove that spatial estimators from generalized least squares, Gaussian process regression and spline models that are consistent under confounding by a fixed function will also be consistent under endogeneity or confounding by a random function, i.e., a stochastic process. We conclude that, contrary to some claims in the literature on spatial confounding, traditional spatial estimators are capable of estimating linear exposure effects under spatial confounding as long as there is some noise in the exposure. We support our theoretical arguments with simulation studies.
Some key words: Causal inference, Gaussian process, Generalized least squares, Spatial confounding, Spatial statistics
1. Introduction
Spatial confounding commonly refers to the phenomenon in which an unmeasured variable that is spatially structured, i.e., a continuous or smooth function of space, influences both the outcome and the exposure variables. Understanding the impact of spatial confounding on popular estimators of the exposure effect in geospatial models has become a very popular topic in spatial and environmental statistics. However, the literature has offered conflicting conclusions and confused accounts of the origins of, and solutions to, spatial confounding, without much asymptotic theory to support the claims. In this article we establish consistency, or lack thereof, of popular estimators based on spatial regression models. We focus on the ordinary least squares (OLS) estimator, restricted spatial regression estimator and generalized least squares (GLS) estimator using Gaussian process covariance, but we also discuss spline regression and Bayesian Gaussian process regression. Our main result establishes the consistency of GLS under very general forms of spatial confounding in fixed-domain, i.e., infill, asymptotics, as long as there is some noise, i.e., nonspatial variation, in the exposure process.
One of the earliest discussions of spatial confounding is found in Clayton et al. (1993), which notes that introducing a spatial error term into a linear model can induce large changes in coefficient estimates relative to a nonspatial model. Clayton et al. (1993) recommended inclusion of such a term anyway since unmeasured confounding is often a strong threat to study validity. Paciorek (2010) defined spatial confounding as the presence of an unmeasured spatially varying confounder that is correlated with the exposure, derived the finite-sample bias of the GLS estimator, and studied it analytically and via extensive empirical studies, concluding that traditional spatial estimators such as GLS and splines are preferable to OLS estimators in most realistic scenarios of spatial confounding.
In contrast, Reich et al. (2006) and Wakefield (2006) viewed the shift in effect estimates due to adding a spatial error to a linear model as problematic. Hodges & Reich (2010) introduced restricted spatial regression as a workaround, restricting the spatial random function to be geometrically orthogonal to covariates. While they acknowledged that the existence of an unmeasured spatial confounder implies bias in this estimate, they claimed that traditional spatial models cannot adjust for such bias either. Some later work has built on restricted spatial regression while maintaining its essential features (Hughes & Haran, 2013; Prates et al., 2019; Azevedo et al., 2023). However, restricted spatial regression has come under criticism recently. Hanks et al. (2015), Khan & Calder (2022) and Zimmerman & Ver Hoef (2022) found that uncertainty quantification from restricted spatial regression can be severely anticonservative. While this line of work makes a convincing case for not using restricted spatial regression because of its second-order properties, i.e., variance, it does not consider data generation processes with spatial confounding and thus does not address the original claim that the restricted spatial regression estimator is likely to be no more biased than estimators from traditional spatial regression models under spatial confounding.
Some studies have considered data generation processes with explicit spatial confounding to derive analytical expressions of the biases and mean squared errors of spatial estimators such as GLS (Paciorek, 2010; Page et al., 2017; Schnell & Papadogeorgou, 2020; Nobre et al., 2021; Khan & Berrett, 2023; Narcisi et al., 2024). These finite-sample bias expressions have sometimes led to claims that the GLS estimator is biased. For example, Page et al. (2017) notes the equivalence of spatial confounding and endogeneity and argues that the GLS estimator is biased under endogeneity. Schnell & Papadogeorgou (2020) points to the finite-sample bias of the GLS estimator and claims that common spatial estimators do not mitigate bias under spatial confounding. However, in the presence of unmeasured spatial confounding, no estimator of the exposure effect will be unbiased in finite samples unless one makes very strong assumptions on the confounder. An important factor in the evaluation of an estimator is asymptotic consistency, which complements analytical or empirical studies of the finite-sample bias under various data generation scenarios and sample sizes. In this regard, Khan & Berrett (2023) provides a comprehensive study of nonspatial or unadjusted, i.e., OLS, and spatial, i.e., GLS, estimators under data generation processes with explicit spatial confounding, expanding on the previous work of Paciorek (2010). Khan & Berrett (2023) derived analytical expressions for the finite-sample bias of OLS and GLS estimators and showed that in most scenarios of spatial confounding, spatial estimators such as GLS will generally reduce bias relative to the OLS estimator. They also presented numerical results from a suite of different data generation scenarios, providing additional empirical evidence in favour of using the GLS estimator under spatial confounding. 
However, to the best of our knowledge, none of these studies have investigated how the GLS or OLS estimators will behave asymptotically under spatial confounding. This is the knowledge gap that we fill here.
For our study, we also make a clear distinction between the data generation process and the analysis models used to derive estimators of the exposure effect. The data generation process for the outcome is specified as the widely studied and used spatial partially linear model. We consider two scenarios: one where the unmeasured confounder is a fixed spatial function and one where it is a random spatial function, i.e., a stochastic process on the spatial domain. We show how spatial confounding can be defined from first principles of causal inference and establish conditions for identifiability of the linear exposure effect for data generated from the partially linear model. We then present a direct first-order argument against estimators such as the unadjusted OLS estimator and the restricted spatial regression estimator by quantifying their nonvanishing asymptotic error under spatial confounding.
Our main result establishes the consistency of the GLS estimator under spatial confounding as long as there is some nonspatial variation in the exposure. The result holds under very general conditions: any random sampling scheme of locations in a fixed domain, i.e., infill asymptotics; spatial confounding by any continuous fixed function of space; errors for both the outcome and the exposure deviating from normality; and the Gaussian process covariance function used in GLS being any universal kernel, e.g., Matérn or squared exponential, with any choice of kernel parameters. The proof uses Mercer representation of the reproducing kernel Hilbert space of Gaussian processes with a universal kernel to show that the GLS procedure essentially prewhitens the outcome and the exposure, removing the spatial confounder and allowing recovery of the exposure effect from the nonspatial part of the exposure. Similar ideas of prewhitening have been more explicitly exploited in joint outcome and exposure models to mitigate spatial confounding (Thaden & Kneib, 2018; Dupont et al., 2022). We show that the traditional GLS estimator achieves the same goal without explicitly specifying the exposure model. The result thus dispels claims, in an asymptotic sense, that GLS does not mitigate bias under spatial confounding (see, e.g., Hodges & Reich, 2010; Page et al., 2017; Schnell & Papadogeorgou, 2020). Our result provides asymptotic confirmation for the empirical and analytical finite-sample conclusions of Paciorek (2010) and Khan & Berrett (2023) that in the presence of some nonspatial variation in the exposure, traditional spatial estimators can be used even in the presence of spatial confounding.
Last, we consider spatial confounding by a random function of space. This is a form of endogeneity, as omitting the spatial random function, i.e., a random effect, would result in endogenous error terms that are correlated with the exposure. We show that GLS is also consistent under very general conditions of endogeneity, not requiring any distributional assumptions, such as stationarity or Gaussianity, on the confounding stochastic process beyond having continuous sample paths. We present a general result on when the consistency of an estimator under spatial confounding by a fixed function will imply consistency under endogeneity. This result is broadly applicable to other estimators such as splines and Gaussian processes, showing that existing results on their consistency for the partially linear model under spatial confounding by a fixed function, e.g., Rice (1986) and Dupont et al. (2022) for splines and Yang et al. (2015) for Gaussian processes, also imply consistency of these estimators under spatial endogeneity. This result contradicts the perception that estimators from exogenous analysis models such as the GLS or the Gaussian process regression estimator, where error processes of the outcome are modelled as being independent of the exposure, cannot account for endogeneity (see, e.g., Bell et al., 2018). This seemingly paradoxical result on the consistency of exogenous estimators under endogeneity is due to the spatially smooth nature of the confounder, demonstrating the benefit of spatial confounding as opposed to an unstructured or clustered confounder. We corroborate the theoretical findings with synthetic experiments.
2. Data generation process
2.1. Outcome model
Throughout, we assume the following data generation process. Let $\mathcal{D} \subset \mathbb{R}^d$ be a convex and compact spatial domain. The locations $S_1, \ldots, S_n \in \mathcal{D}$ are independent and identically distributed, sampled randomly from a sampling density $f_S$ with support on the whole of $\mathcal{D}$. Our asymptotic results will keep the domain fixed, i.e., we consider infill asymptotics with random sampling of an increasing number of locations. For $i = 1, \ldots, n$, let $Y_i = Y(S_i)$ be a univariate outcome and let $X_i = X(S_i)$ be a $p$-dimensional exposure observed at $S_i$. Let $Y(\cdot)$ denote the process $\{Y(s) : s \in \mathcal{D}\}$ and $Y$ the matrix or vector formed by vertically stacking $Y_1, \ldots, Y_n$. We use the same notation for all other fixed functions or random functions, i.e., stochastic processes, on $\mathcal{D}$, e.g., $X$ and $g$. The data generation process for the outcome process is
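$$Y_i = X_i^{\top}\beta + g(S_i) + \epsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$
where $\beta$ is the linear exposure effect, $g$ is the spatial function, and $\epsilon_i$ is independent and identically distributed with $E(\epsilon_i) = 0$ and $\mathrm{var}(\epsilon_i) < \infty$.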
This partially linear model features extensively in spatial analysis and the study of spatial confounding, both as the data generation model and as the analysis model. The spatial function $g$ can be either a fixed function or a random function; the empirical properties of a single draw cannot determine which is the case. Some of the literature on spatial confounding decides on the nature of the true $g$ based on the analysis model used. For example, if $g$ is modelled using splines or basis functions and estimated using penalized regression, the true $g$ is also assumed to be fixed (Rice, 1986; Dupont et al., 2022). When $g$ is modelled as a random function, typically as a Gaussian process, the true function $g$ is also assumed to be a random function (Hodges & Reich, 2010; Khan & Calder, 2022; Zimmerman & Ver Hoef, 2022). This latter practice can be problematic: treating the true $g$ as a Gaussian process, which is traditionally modelled to be independent of the exposure $X$, ignores their possible true correlation in the data generation process and has led to researchers overlooking the possibility of bias of certain estimators under spatial confounding. We correct these misapprehensions in § 3.2.
It is therefore important to disentangle the data generation process from the analysis model. Regardless of the analysis model used, we view it as preferable to treat the true in the data generation process as a fixed function of space, as we are typically interested in inference about a population that would share the realization of that generated the data. Assuming to be a fixed function in the data generation process is perfectly compatible with an analysis model that models as a random function such as a Gaussian process or a spline or basis function expansion with priors for the coefficients; this is analogous to a Bayesian model that treats a fixed parameter as random during the analysis. Occasionally, there can be compelling reasons to consider a random function in the data generation process, such as a desire to generalize findings to future, independent generations of the data. In such cases, studying the bias of an estimator requires explicit consideration of the correlation between this random function and the exposure, i.e., endogeneity, as done in Paciorek (2010), Page et al. (2017), Schnell & Papadogeorgou (2020), Nobre et al. (2021), Khan & Berrett (2023) and Narcisi et al. (2024). Ignoring this correlation results in erroneous conclusions of unbiasedness. We consider the data generation process with confounding by a random spatial function in § 3.4.
2.2. Spatial confounding
Spatial confounding occurs in (1) when the exposure function $X$ is correlated with the fixed spatial function $g$ in the outcome, i.e.,
$$\mathrm{cov}\{X(S), g(S)\} \neq 0. \qquad (2)$$
Condition (2), established in the Supplementary Material using potential outcomes, is meaningful even without the causal inference formalism used to derive it. If $g$ is a fixed function, then the correlation in (2) accounts for randomness in $g(S)$ from the random sampling of the location $S$, i.e., $S$ drawn from the sampling density. If $g$ is a random function, then randomness in $g(S)$ comes from both $g$ and $S$. As $X$ is $p$-dimensional, for $p > 1$, (2) is a vector equation, i.e., there is a nonzero correlation between $g(S)$ and at least one component of $X(S)$.
2.3. Identifiability and exposure model
In (1), since both and possibly the exposure vary with space, additional assumptions are required on the exposure process to ensure that the parameters in (1) are identified, in the sense that only one pair is compatible with infinite data generated from (1). For restricted spatial regression, Hodges & Reich (2010) restricted to the orthogonal complement of , ensuring identifiability. However, as we see from (2), assuming implies a data generation process without any spatial confounding. Below we provide a necessary and sufficient criterion for identifiability in the partially linear model that accommodates confounding. All proofs are given in the Supplementary Material.
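Proposition 1. Let $\mathcal{F}$ be a class of functions from the spatial domain $\mathcal{D}$ to $\mathbb{R}$ that is closed under pairwise addition and scalar multiplication. Let $x$ denote a draw, i.e., realization, of $X$. Then, given $X = x$, the partially linear model (1) has a unique solution in $\mathbb{R}^p \times \mathcal{F}$, in the sense that there is only one pair $(\beta, g) \in \mathbb{R}^p \times \mathcal{F}$ for which the data have the distribution given in (1) for all $s \in \mathcal{D}$, if and only if $c^{\top}x \notin \mathcal{F}$ for all nonzero $c \in \mathbb{R}^p$.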
The partially linear model is thus identified when the confounder belongs to the class and the exposure , or in the multivariate case any linear combination of its components, does not belong to . As function classes are typically defined by smoothness, e.g., Hölder smoothness, and smoother classes are nested inside less smooth ones, this implies that the exposure needs to have a nonsmooth component relative to . The identifiability result thus agrees with earlier literature suggesting the need for the exposure to vary on a smaller spatial scale than the confounder to identify the exposure effect. Spatial scale has been defined and interpreted in different ways in the literature and is often tied to specific models. Paciorek (2010) and Khan & Berrett (2023) define it via the range parameters of the Gaussian processes assumed to model the exposure and the confounder. Schnell & Papadogeorgou (2020) define it through the parameters of the conditional autoregressive models for and . Guan et al. (2022) and Keller & Szpiro (2020) define spatial scales in terms of the coefficients of Fourier or other basis expansions of the two functions. More generally, Schnell & Papadogeorgou (2020) and Gilbert et al. (2024) discuss how differences in the spatial scales of the exposure and confounder are related to the positivity assumption in causal inference. Proposition 1 does not use such specific distributional assumptions and defines spatial scale through the smoothness of the function class for the confounder . If , then it varies at finer scales than . For Proposition 1, the exposure need not be restricted to the orthogonal complement of , as in restricted spatial regression, but can be correlated with , thereby accommodating spatial confounding.
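The necessity of the condition in Proposition 1 can be seen from a short calculation. This is a sketch rather than the proof in the Supplementary Material, and the symbols are assumed here for illustration: $\beta$ is the exposure effect, $g$ the confounder, $x$ a realization of the exposure, $\mathcal{F}$ the function class and $c$ a nonzero coefficient vector.

```latex
% Suppose c^T x belongs to F for some nonzero c. Then, at every location s,
\begin{align*}
x(s)^{\top}\beta + g(s)
  &= x(s)^{\top}(\beta + c) + \{g(s) - c^{\top}x(s)\} \\
  &= x(s)^{\top}\tilde\beta + \tilde g(s),
\end{align*}
% where \tilde\beta = \beta + c \neq \beta, and \tilde g = g - c^{\top}x lies in F
% because F is closed under addition and scalar multiplication.
```

Two distinct pairs, $(\beta, g)$ and $(\tilde\beta, \tilde g)$, then produce the same mean function, so the exposure effect is not identified. Conversely, if two distinct pairs fit the same mean function, then $(\beta - \tilde\beta)^{\top}x = \tilde g - g$ belongs to $\mathcal{F}$, which the condition rules out.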
Proposition 1 motivates consideration of the following data generation process for the exposure:
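$$X_i = h(S_i) + \eta_i, \qquad i = 1, \ldots, n, \qquad (3)$$
where $h = (h_1, \ldots, h_p)^{\top}$, with each $h_j$ being some fixed function of space, and where $\eta_i$ is independent and identically distributed with $E(\eta_i) = 0$ and $\mathrm{var}(\eta_i)$ strictly positive definite.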
Yang et al. (2015) assumes a similar data generation process to (3) for the exposure in their study of Bayesian Gaussian process estimators for the partially linear model (1). The data generation process is quite general, allowing to have both a spatial component , correlation of which with effectuates spatial confounding, and a nonspatial noise component with zero mean and a strictly positive-definite variance; a nonzero mean of can be absorbed into as a constant function of space, i.e., an intercept. The nonspatial noise component precludes the scenario that any linear combination is degenerate for any nonzero , ensuring that sample paths, i.e., draws, of the independent and identically distributed error process are almost surely of zero smoothness. This, in turn, guarantees that no linear combination belongs to any smooth function class considered for , implying identifiability from Proposition 1.
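The data generation process in (1) and (3) can be sketched in a few lines. This is a minimal illustration rather than the simulation design of the Supplementary Material. The one-dimensional domain $[0, 1]$, the confounder `sin(2*pi*s)`, the spatial exposure component `g + s`, the uniform (non-Gaussian) noise and the function name `simulate` are all assumptions made for this example.

```python
import numpy as np

def simulate(n, beta=1.0, sd_eps=0.5, sd_eta=0.5, seed=0):
    """Draw n observations from a one-dimensional toy version of (1) and (3)."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, 1.0, size=n)      # i.i.d. locations on the fixed domain [0, 1]
    g = np.sin(2.0 * np.pi * s)            # hypothetical continuous confounder
    h = g + s                              # spatial part of the exposure, correlated with g
    # Uniform noise on [-sqrt(3), sqrt(3)] has unit variance; it is non-Gaussian on purpose.
    half_width = np.sqrt(3.0)
    eta = sd_eta * rng.uniform(-half_width, half_width, size=n)
    eps = sd_eps * rng.uniform(-half_width, half_width, size=n)
    x = h + eta                            # exposure model (3): spatial part plus noise
    y = beta * x + g + eps                 # outcome model (1): partially linear
    return s, x, y, g

s, x, y, g = simulate(2000)
print("sample correlation between exposure and confounder:", np.corrcoef(x, g)[0, 1])
```

The printed correlation is clearly nonzero, so this toy process exhibits spatial confounding in the sense of (2), while the noise `eta` supplies the nonspatial variation needed for identifiability.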
3. Analysis models and the consistency of their estimators
3.1. Analysis models
We consider various analysis models for estimating the effect of the exposure on the outcome in the partially linear model (1). Akin to Khan & Berrett (2023), we distinguish between the data generation process, specified in (1) and (3), and the analysis models used because, crucially, they need not coincide: in some cases an analysis model may result in consistent estimation even if it makes modelling assumptions that conflict with the data generation process. Recognizing this distinction is crucial to unravelling the confusion around spatial confounding. The OLS estimator of $\beta$ is $\hat\beta_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}Y$. It arises from an analysis model that assumes that $g$ is absent from (1) and is thus often referred to as the unadjusted estimator, as it does not attempt to adjust for spatial confounding. Common estimators that adjust for a spatial component in the outcome model include the estimator of $\beta$ from a spline regression (Rice, 1986), which models $g$ as a fixed function of space represented using splines. Alternatively, $g$ is also commonly modelled as a draw from a random function, i.e., a spatial stochastic process endowed with a Gaussian process prior with zero mean and some covariance function $K$. The Gaussian process regression analysis model can be summarized as a hierarchical spatial mixed-effects linear model:
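$$Y_i = X_i^{\top}\beta + g(S_i) + \epsilon_i, \qquad g \sim \mathrm{GP}(0, K), \qquad \epsilon_i \sim N(0, \tau^2) \ \text{independently}. \qquad (4)$$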
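If $\beta$ is also assigned a prior, then Bayesian Markov chain Monte Carlo algorithms are typically used to estimate $\beta$ and $g$. Alternatively, the GLS estimator arises as the maximum likelihood estimator of $\beta$ from the marginal model obtained from integrating over the Gaussian process prior of $g$ in (4), i.e.,
$$\hat\beta_{\mathrm{GLS}} = (X^{\top}\Sigma^{-1}X)^{-1}X^{\top}\Sigma^{-1}Y, \qquad (5)$$
where $\Sigma$ denotes the working covariance matrix of $Y$ implied by (4).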
We re-emphasize that (4) is an analysis model here used to derive the GLS estimator. It is not suitable for being a data generation process as the random function , modelled to be independent of the exposure , precludes any mechanism of spatial confounding. The restricted spatial regression analysis model is similar to the mixed-effects model (4), with the additional restriction that is geometrically orthogonal to or, in practice, where . This geometric orthogonality is different from prior specification of statistical independence between and as in (4), which does not guarantee geometric orthogonality between sample paths of and the posterior estimate of . The hard constraint of explicit orthogonality imposed by restricted spatial regression results in the same maximum likelihood estimator for as OLS, that is, .
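The two estimators can be written compactly in code. The sketch below assumes one-dimensional locations and a Matérn kernel with smoothness 3/2, chosen because it has a simple closed form. The function names `matern32_kernel`, `ols_estimate` and `gls_estimate` and the default parameter values are hypothetical, introduced only for this example.

```python
import numpy as np

def matern32_kernel(s, variance=1.0, length_scale=0.2):
    """Matérn covariance with smoothness 3/2 for one-dimensional locations s."""
    a = np.sqrt(3.0) * np.abs(s[:, None] - s[None, :]) / length_scale
    return variance * (1.0 + a) * np.exp(-a)

def ols_estimate(X, y):
    """Unadjusted estimator (X'X)^{-1} X'y; X has shape (n, p)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def gls_estimate(s, X, y, nugget=1.0, **kernel_args):
    """GLS estimator (X' S^{-1} X)^{-1} X' S^{-1} y with S = K + nugget * I."""
    Sigma = matern32_kernel(s, **kernel_args) + nugget * np.eye(len(s))
    Sigma_inv_X = np.linalg.solve(Sigma, X)   # S^{-1} X without forming S^{-1} explicitly
    # Sigma is symmetric, so Sigma_inv_X.T @ y equals X' S^{-1} y.
    return np.linalg.solve(X.T @ Sigma_inv_X, Sigma_inv_X.T @ y)

# Small usage example with an unconfounded two-dimensional exposure.
rng = np.random.default_rng(1)
s = rng.uniform(size=50)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(50)
print("OLS:", ols_estimate(X, y))
print("GLS:", gls_estimate(s, X, y))
```

Setting the kernel variance to zero makes the working covariance proportional to the identity, in which case the GLS estimate reduces to the OLS estimate. This is a convenient sanity check on any implementation.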
3.2. Bias of OLS and restricted spatial regression under spatial confounding
We quantify the asymptotic bias of the unadjusted OLS estimator and, equivalently, the restricted spatial regression estimator under spatial confounding.
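Proposition 2. Consider independent and identically distributed locations $S_1, \ldots, S_n$. Suppose that $Y_i$ are generated from (1) with a continuous $g$ and $X_i$ are generated from (3) with a continuous $h$ such that there is spatial confounding as defined in (2). Then the unadjusted OLS estimator or restricted spatial regression estimator satisfies $\hat\beta_{\mathrm{OLS}} \to \beta + b$ in probability, where $b = [E\{X(S)X(S)^{\top}\}]^{-1}E\{X(S)g(S)\}$.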
Proposition 2 quantifies the bias of the unadjusted OLS estimator or restricted spatial regression estimator under very general conditions of spatial confounding, requiring no assumptions on the functions and beyond continuity and satisfying the spatial confounding condition, and no assumptions on the error processes of the exposure and outcome beyond finite second-order moments.
We give the intuition of the proof here. If $g$ were a known function, then the adjusted OLS regression of $Y$ onto $X$ with the offset $g$ would yield an unbiased estimator of $\beta$. We call this the oracle estimator, $\hat\beta_{\mathrm{oracle}} = (X^{\top}X)^{-1}X^{\top}(Y - g)$. Then
$$\hat\beta_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}Y = \hat\beta_{\mathrm{oracle}} + (X^{\top}X)^{-1}X^{\top}g.$$
Since $\hat\beta_{\mathrm{oracle}}$ is unbiased and consistent, as it is an OLS estimate from a correctly specified model, we can immediately see that the nonspatial model is biased under spatial confounding, i.e., if $X$ and $g$ are correlated, with error $(X^{\top}X)^{-1}X^{\top}g$. The asymptotic error given in Proposition 2 is the population limit of this error and is nonzero if there is spatial confounding.
This first-order bias of the OLS or restricted spatial regression estimator is unsurprising and stems from the well-known issue of omitting a confounding variable in a regression. Recent criticisms of the restricted spatial regression have mostly focused on its second-order properties, showing that the variance of the restricted spatial regression estimator is anticonservative (Khan & Calder, 2022; Zimmerman & Ver Hoef, 2022). These studies assumed the mixed-effects analysis model (4) as the true data generation process, in which case there is no spatial confounding, as is assumed to be independent of , and thus the estimator from restricted spatial regression is unbiased. We provide a direct first-order bias result for the OLS or restricted spatial regression estimator under any scenario of spatial confounding. This highlights the importance of separating the data generation process from the analysis model.
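The nonvanishing error of the unadjusted estimator is easy to check numerically. The sketch below assumes a scalar exposure on $[0, 1]$ with hypothetical choices for the confounder (`g`) and the spatial part of the exposure (`h`). It compares the OLS error in one large sample with the population limit of $(X^{\top}X)^{-1}X^{\top}g$, computed here by the law of large numbers as $E\{X(S)g(S)\}/E\{X(S)^2\}$ with numerical integration over a grid.

```python
import numpy as np

def g(s):
    """Hypothetical continuous confounder."""
    return np.sin(2.0 * np.pi * s)

def h(s):
    """Hypothetical spatial part of the exposure, correlated with g."""
    return g(s) + s

beta, sd_eta, sd_eps = 1.0, 0.5, 0.5

# Population limit of (X'X)^{-1} X'g for a scalar exposure, using midpoint-rule
# integration over uniform locations on [0, 1]; the noise adds sd_eta^2 to E[X^2].
grid = (np.arange(100_000) + 0.5) / 100_000
limit_error = np.mean(h(grid) * g(grid)) / (np.mean(h(grid) ** 2) + sd_eta ** 2)

# One large simulated data set from the toy versions of (1) and (3).
rng = np.random.default_rng(0)
n = 20_000
s = rng.uniform(size=n)
x = h(s) + sd_eta * rng.standard_normal(n)
y = beta * x + g(s) + sd_eps * rng.standard_normal(n)
ols_error = (x @ y) / (x @ x) - beta

print(f"OLS error in sample: {ols_error:.3f}")
print(f"population limit:    {limit_error:.3f}")
```

In this toy setting the limit is far from zero, so adding more locations does not help the unadjusted estimator.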
3.3. Consistency of GLS under spatial confounding
The omitted-variable bias of the OLS or restricted spatial regression estimator under spatial confounding is unsurprising. Similar logic has been used to argue that a spatial GLS estimator (5) cannot adjust for confounding either (e.g., Hodges & Reich, 2010). This notion likely arises from the fact that the GLS estimator is derived from the misspecified model (4): it marginalizes over a spatial random function with a Gaussian process prior assumed to be independent of the exposure , completely ignoring their correlation in the data generation process. We now state our main result, which is contrary to this common belief and intuition: consistency of the GLS estimator under spatial confounding.
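Theorem 1. Consider independent and identically distributed locations $S_1, \ldots, S_n$, $Y_i$ generated from (1) with a continuous $g$, and $X_i$ generated from (3) with a continuous $h_j$ for $j = 1, \ldots, p$. Consider the GLS estimate $\hat\beta_{\mathrm{GLS}}$ in (5) using a working covariance matrix $\Sigma = K + \tau^2 I$ where $K = \{K(S_i, S_j)\}_{i,j=1}^{n}$, $\tau^2 > 0$ and $K(\cdot, \cdot)$ is a stationary Matérn kernel with any fixed set of parameters: variance $\sigma^2$, spatial decay $\phi$ and smoothness $\nu$. Then $\hat\beta_{\mathrm{GLS}} \to \beta$ in probability.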
Theorem 1 establishes the consistency of the spatial GLS estimator under spatial confounding using very mild assumptions. There is no assumption on the smoothness of $g$ and $h$ beyond continuity. There is also no restriction on the extent of spatial confounding, which comes from the possible correlation of $g$ with $h$, the spatial part of $X$. For example, extreme cases of confounding with some or all components $h_j$ being exactly equal to $g$ are accommodated. The only distributional assumptions on the error terms $\epsilon$ and $\eta$ are their having zero mean and finite, and for $\eta$ nonzero, variance. There is no assumption on the shape of these error distributions, such as normality, even though the GLS estimator is derived by marginalizing the mixed-effects analysis model (4) with normal errors. The Gaussian process kernel used to construct the working covariance matrix can be any Matérn or, for $\nu \to \infty$, squared exponential kernel, without any restriction on the parameter values used.
Two ingredients are crucial for the consistency result to hold. First, the nugget $\tau^2$ used in the working covariance matrix $\Sigma$, which can be different from the true error variance $\mathrm{var}(\epsilon_i)$, needs to be strictly positive even if there is no true nugget, i.e., $\mathrm{var}(\epsilon_i) = 0$. Including a positive nugget in the working covariance matrix ensures that the eigenvalues of $\Sigma^{-1}$ are bounded and $\Sigma^{-1/2}$ is well behaved. This is not an assumption on the data generation process, as the user can always force the nugget in the working covariance matrix to be nonzero.
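The role of the nugget can be seen directly from the spectrum of the working covariance matrix. The sketch below assumes one-dimensional locations, a Matérn kernel with smoothness 3/2 and a nugget of 0.1, all hypothetical choices. Without a nugget the smallest eigenvalue of the kernel matrix collapses towards zero as locations fill the domain, so its inverse has unbounded eigenvalues. With a nugget, every eigenvalue of the working covariance is at least the nugget, so the eigenvalues of its inverse are at most one over the nugget.

```python
import numpy as np

def matern32_kernel(s, variance=1.0, length_scale=0.2):
    """Matérn covariance with smoothness 3/2 for one-dimensional locations."""
    a = np.sqrt(3.0) * np.abs(s[:, None] - s[None, :]) / length_scale
    return variance * (1.0 + a) * np.exp(-a)

rng = np.random.default_rng(0)
nugget = 0.1
smallest = []  # (n, smallest eigenvalue of K, smallest eigenvalue of K + nugget * I)
for n in (50, 200, 800):
    s = rng.uniform(size=n)
    K = matern32_kernel(s)
    min_K = np.linalg.eigvalsh(K)[0]                          # eigvalsh sorts ascending
    min_Sigma = np.linalg.eigvalsh(K + nugget * np.eye(n))[0]
    smallest.append((n, min_K, min_Sigma))
    print(f"n={n:4d}  min eig of K={min_K:.2e}  min eig of Sigma={min_Sigma:.3f}  "
          f"bound on max eig of Sigma^-1={1.0 / nugget:.1f}")
```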
Second, and key to the GLS result, is that the exposure must have some nonspatial variation $\eta$, i.e., $\mathrm{var}(\eta_i)$ must be strictly positive definite, as specified in (3). A similar assumption of additive noise in the exposure was imposed by Thaden & Kneib (2018) and Dupont et al. (2022), who also required this noise to be Gaussian, which we do not do here. Additive noise in the exposure ensures identifiability; see the discussion in § 2.3. We now provide some intuition as to why the GLS is consistent when this holds. The GLS estimator (5) can be viewed as an OLS estimator based on $Y^* = \Sigma^{-1/2}Y$ and $X^* = \Sigma^{-1/2}X$. From the data generation process (1) and (3), we have $Y^* = X^*\beta + g^* + \epsilon^*$ and $X^* = h^* + \eta^*$, where $g^* = \Sigma^{-1/2}g$ and $h^*$, $\eta^*$ and $\epsilon^*$ are defined analogously. Crudely, a continuous function $g$ can be expanded approximately in terms of eigenfunctions of a Matérn kernel $K$, which is a universal kernel, with larger weights assigned to the ‘flatter’ or ‘low-frequency’ eigenfunctions, corresponding to the larger eigenvalues of $K$ and thus of $\Sigma$. The term $\|g^*\|$ is approximately the norm of the weights scaled down by the square roots of the respective eigenvalues and will be small as the larger weights, corresponding to larger eigenvalues, will be scaled down more. A similar argument holds for $h^*$. Relatively, $\eta^*$ will be much larger as realizations of the noise process will be extremely nonsmooth with high probability, giving large weights to even ‘high-frequency’ eigenfunctions with smaller eigenvalues. So $g^*$ and $h^*$, involving the spatially smooth functions, are small in magnitude relative to $\eta^*$, which involves the noise in the exposure. We then have
$$X^* \approx \eta^*, \qquad Y^* \approx \eta^*\beta + \epsilon^*.$$
As the outcome noise $\epsilon^*$ is uncorrelated with the exposure noise $\eta^*$, the GLS estimator between $Y$ and $X$, which is the OLS estimator between $Y^*$ and $X^*$, is consistent for $\beta$. The linear transformation using $\Sigma^{-1/2}$ therefore approximately results in a residualization or prewhitening of $Y$ and $X$, removing their spatial components. Regressing the residual part of the outcome on the residual noise component of the exposure is enough to identify $\beta$ as long as there is a nonzero noise, i.e., nonspatial, component in the exposure, which is ensured by assuming $\mathrm{var}(\eta_i)$ to be strictly positive definite. Residualization has been explicitly used in Thaden & Kneib (2018) and Dupont et al. (2022) to develop two-stage models that can identify the exposure effect under spatial confounding. Our result shows that the GLS carries out the residualization without needing a two-stage analysis model. The formal proof of the GLS consistency result, given in the Supplementary Material, is based on the Mercer representation of the reproducing kernel Hilbert space of a Gaussian process with a universal kernel $K$.
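The contrast between the two estimators can be illustrated with a small infill experiment. This is a minimal sketch under assumed settings, not the numerical study of the Supplementary Material: a scalar exposure on $[0, 1]$, the confounder `sin(2*pi*s)`, Gaussian noise, a Matérn kernel with smoothness 3/2, a working nugget of 1 and a single replicate per sample size. The function name `estimation_errors` is hypothetical.

```python
import numpy as np

def matern32_kernel(s, variance=1.0, length_scale=0.2):
    """Matérn covariance with smoothness 3/2 for one-dimensional locations."""
    a = np.sqrt(3.0) * np.abs(s[:, None] - s[None, :]) / length_scale
    return variance * (1.0 + a) * np.exp(-a)

def estimation_errors(n, beta=1.0, nugget=1.0, seed=0):
    """Return (OLS error, GLS error) for one simulated data set of size n."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(size=n)                        # infill: more points in the same domain
    g = np.sin(2.0 * np.pi * s)                    # hypothetical confounder
    x = g + s + 0.5 * rng.standard_normal(n)       # exposure: spatial part plus noise
    y = beta * x + g + 0.5 * rng.standard_normal(n)
    Sigma = matern32_kernel(s) + nugget * np.eye(n)
    Sigma_inv_x = np.linalg.solve(Sigma, x)
    beta_gls = (Sigma_inv_x @ y) / (Sigma_inv_x @ x)   # scalar version of (5)
    beta_ols = (x @ y) / (x @ x)
    return beta_ols - beta, beta_gls - beta

results = {n: estimation_errors(n) for n in (100, 400, 1600)}
for n, (ols_err, gls_err) in results.items():
    print(f"n={n:5d}  OLS error={ols_err:+.3f}  GLS error={gls_err:+.3f}")
```

The working kernel and nugget are deliberately not matched to the data generation process, since the consistency result does not require them to be.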
The inclusion of noise in the exposure generation and inclusion of a positive nugget in the working covariance matrix are essential to the consistency result. Bolin & Wallin (2024) showed that the exposure effect cannot be consistently identified under spatial confounding if the outcome depends on a smoothed version of the exposure. Even if there is no confounding, Wang et al. (2020) showed that the GLS estimator is not consistent if the exposure is smooth and there is no nugget in . Thus our result on consistency does not contradict these inconsistency results, which consider smooth exposures and/or the lack of a nugget in the working covariance matrix.
The consistency of GLS under spatial confounding may seem counterintuitive, as reflected by the literature reviewed in § 1. This is because the Gaussian process regression model (4) from which GLS is derived is misspecified, assigning a Gaussian process prior for the confounder that is independent of the exposure, thereby apparently ignoring the correlation between the two. Indeed, the finite-sample bias of the GLS estimator, as derived in Paciorek (2010), Page et al. (2017), Schnell & Papadogeorgou (2020), Nobre et al. (2021) and Khan & Berrett (2023), stems from this misspecification. The bias is where , whose expectation conditional on will generally be nonzero if and are correlated. However, the asymptotic limit of has not been studied previously and Theorem 1 shows that this term vanishes asymptotically even under spatial confounding, leading to consistency of the GLS estimator and dispelling the myth that it cannot be used under spatial confounding. Our GLS consistency result aligns with the consistency of the estimator for derived from Bayesian implementation of the Gaussian process regression (4) (Yang et al., 2015). The consistency of GLS for a fixed function has also been shown previously by He & Severini (2016), but that work did not connect to the literature on spatial confounding, and the result relies on technical assumptions that are hard to verify or interpret. Our result is also in accordance with the empirical and finite-sample analytical studies of Paciorek (2010) and Khan & Berrett (2023), which demonstrated the reduction in bias from GLS estimators when the exposure varies at finer spatial scales than the confounder. The bias of OLS and the consistency of the GLS estimator are confirmed via numerical experiments in the Supplementary Material. 
An approximate expression for the finite-sample bias of the GLS estimator based on the reproducing kernel Hilbert space representation is also derived and empirically verified in the Supplementary Material. The numerical experiments reveal that interval estimates for GLS can offer poor coverage. This is not surprising as the interval estimates are derived from the analysis model (4), which is clearly misspecified under confounding, and the estimates may be affected by so-called regularization bias, as discussed in the Supplementary Material. So while Theorem 1 shows that the GLS point estimate is robust against model misspecification, this may not be true of the second-order properties of the estimator.
3.4. Consistency of spatial estimators under endogeneity
We now consider a data generation process where the spatial functions in (1) and (3) are random functions. Often the confounder function and the noise in the outcome are grouped together to represent the total error process for the outcome. If the confounder function is correlated with the spatial component of the exposure, then this total error is correlated with the exposure, leading to endogeneity. We now establish that the GLS estimator is also consistent under such endogeneity.
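Theorem 2. Consider the data generation process (1) and (3) where the spatial functions jointly form a multivariate random function, i.e., stochastic process, on the spatial domain that has almost surely continuous sample paths and is independent of the error terms in the exposure and the outcome. Then, for the same choice of working covariance matrix as in Theorem 1, the GLS estimator is consistent for the exposure effect.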
The result is very general, imposing no distributional assumptions on the multivariate random function, i.e., stochastic process, beyond its having continuous sample paths. There is also no restriction on the nature of the correlation between the confounder and the spatial component of the exposure, thereby allowing for many different scenarios of spatial confounding. For example, the random function can be any multivariate Gaussian process with any multivariate covariance kernel that leads to continuous sample paths. This encompasses all multivariate Matérn Gaussian processes, including the extreme case of confounding where the confounder is a univariate Matérn Gaussian process that fully determines the spatial component of some or all of the exposure variables. Theorem 2 also accommodates stochastic processes that are non-Gaussian and nonstationary, demonstrating the robustness of the consistency result for GLS with respect to a wide variety of data generation mechanisms. For both Theorems 1 and 2, the choice of the working covariance matrix is not tied to the true data generation process, so it need not equal any covariance matrix arising from that process. The only requirement is that it be formed from a universal kernel, with any valid choice of parameters, together with a positive nugget. In the Supplementary Material we empirically confirm the robustness of the GLS consistency under confounding against covariance misspecification.
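The robustness described above can be probed with a sketch in which the confounder is a random, non-Gaussian function and the working covariance is unrelated to the process that generated the data. Everything specific here is an assumption made for illustration and is not taken from the article: the random cosine series used as the confounder, the uniform errors, the two working kernels with their length scales, and the helper names. The exposure is set to the confounder plus noise, i.e., the extreme case in which the confounder fully determines the spatial component of the exposure.

```python
import math
import random


def sq_exp(d, ell=0.2):
    """Squared exponential kernel with unit variance."""
    return math.exp(-0.5 * (d / ell) ** 2)


def matern32(d, ell=0.3):
    """Matern kernel with smoothness 3/2 and unit variance."""
    a = math.sqrt(3.0) * d / ell
    return (1.0 + a) * math.exp(-a)


def cholesky(A):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) z = b by forward and then backward substitution."""
    n = len(b)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def random_confounder(s, rng):
    """One continuous sample path of a non-Gaussian process (random cosine series)."""
    coef = [rng.uniform(-3.0, 3.0) / k for k in (1, 2, 3)]
    return [sum(c * math.cos(math.pi * k * si) for k, c in zip((1, 2, 3), coef))
            for si in s]


def mean_abs_errors(n=100, reps=20, beta=1.0, nugget=1.0, seed=0):
    """Mean absolute error of OLS and of GLS under two working kernels."""
    rng = random.Random(seed)
    s = [(i + 0.5) / n for i in range(n)]
    # Factor each working covariance once; neither matches the true process.
    factors = {}
    for name, kern in (("sq_exp", sq_exp), ("matern32", matern32)):
        Sigma = [[kern(abs(si - sj)) + (nugget if i == j else 0.0)
                  for j, sj in enumerate(s)] for i, si in enumerate(s)]
        factors[name] = cholesky(Sigma)
    err = {"ols": 0.0, "sq_exp": 0.0, "matern32": 0.0}
    half = math.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has variance 1
    for _ in range(reps):
        g = random_confounder(s, rng)  # a new random function each replicate
        x = [gi + rng.uniform(-half, half) for gi in g]  # non-Gaussian exposure noise
        y = [beta * xi + gi + 0.5 * rng.uniform(-half, half)
             for xi, gi in zip(x, g)]
        ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        err["ols"] += abs(ols - beta) / reps
        for name, L in factors.items():
            w = chol_solve(L, x)  # w = Sigma^{-1} x
            gls = sum(a * b for a, b in zip(w, y)) / sum(a * b for a, b in zip(w, x))
            err[name] += abs(gls - beta) / reps
    return err


if __name__ == "__main__":
    for name, value in mean_abs_errors().items():
        print(f"{name:9s} mean absolute error = {value:.3f}")
```

Averaging over replicates mimics the marginal statement of Theorem 2, in which a new confounder is drawn each time. Varying `n`, the kernels or the nugget gives a rough sense of how sensitive the finite-sample error is to the working covariance.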
This result, which extends the consistency of GLS from fixed to random spatial functions in the data generation process, relies on a general result, Proposition S1 in the Supplementary Material, about when consistency conditional on a given realization of a random function, i.e., a fixed function, implies marginal consistency accounting for the distribution of the random function. This proposition also applies to other estimators, such as splines or Gaussian process regression, whose consistency for the partially linear model under spatial confounding has been established for a fixed confounder function (Rice, 1986; Yang et al., 2015), proving that they will also be consistent, under corresponding assumptions, if the confounder function is random. Thus, when confounding is by a random spatial function, these estimators, which are based on exogenous analysis models and ignore the correlation between the exposure and the outcome errors, can control for endogeneity. This result, seemingly at odds with the conventional wisdom about endogeneity, holds owing to the smoothness of the unmeasured confounder, as opposed to a nonsmooth or discrete unmeasured confounder, and demonstrates the benefits of spatially continuous confounding as opposed to unstructured or grouped confounding. We confirm the result via simulations in the Supplementary Material.
4. Discussion
Throughout this study we have separated the data generation process and the analysis model used to derive an estimate of the exposure effect. We define spatial confounding from the principles of causal inference via potential outcomes. We emphasize that the presence or absence of spatial confounding is determined by the data generation process, whereas the analysis model plays a role in determining whether an estimator derived from it can adjust for unmeasured spatial confounding. An unadjusted OLS estimator derived from an analysis model that regresses the outcome on the exposure alone will generally not be able to adjust for omitted-confounder bias. Similarly, the restricted spatial regression estimator is derived from an analysis model that places a hard constraint on the spatial variable in the outcome regression to be geometrically orthogonal to the exposure, thereby precluding the possibility that it could be a confounder and thus failing to adjust for it. The Gaussian process regression model from which the GLS estimator is derived is also misspecified a priori, as the Gaussian process prior for the spatial variable is modelled to be independent of, i.e., statistically orthogonal to, the exposure. However, this prior specification is only a soft constraint, and when endowing the Gaussian process with a universal kernel such as the Matérn kernel, the prior gives positive mass to neighbourhoods of any continuous function, including functions that are empirically correlated with the sample paths of the smooth part of the exposure. This enables the GLS estimator to adjust for an unmeasured smooth confounder given enough data and some noise in the exposure. So a misspecified analysis model can still adjust for spatial confounding.
We confirm the consistency result via simulation experiments in the Supplementary Material. Our theoretical results also agree with the conclusions of Khan & Berrett (2023), who provided expressions, similar to the one in Lemma 2, for the finite-sample bias of the unadjusted OLS estimator and of the GLS estimator and studied how the numerator and denominator of the bias terms behave with respect to the smoothness of the exposure and the confounder. They generally concluded that the GLS estimator mitigates bias relative to the OLS estimator, but did not study the asymptotic properties of either one. Our asymptotic results complement the finite-sample analytical study and the extensive numerical experiments of Khan & Berrett (2023) in showing that the GLS estimator is indeed consistent under a wide variety of data generation scenarios with spatial confounding where the OLS estimator is inconsistent.
Consistency is only the first step. Future studies will focus on how the choice of the working covariance matrix, relative to the true data covariance, determines the efficiency of the estimators. The numerical experiments also reveal that in finite samples, even some of these consistent spatial estimators may lead to poor coverage as their variance estimates based on specific analysis models are not robust against model misspecification. Developing valid variance or interval estimates for effect estimators under spatial confounding is an important future research direction.
There is a growing inventory of novel approaches to spatial confounding (Papadogeorgou et al., 2018; Thaden & Kneib, 2018; Keller & Szpiro, 2020; Schnell & Papadogeorgou, 2020; Dupont et al., 2022; Guan et al., 2022; Marques et al., 2022; Gilbert et al., 2024). Most approaches assume a linear exposure-outcome relationship, as we do here, and many rely on additional assumptions about the data generation process. For example, Schnell & Papadogeorgou (2020) assume a specific Markov structure on the joint distribution of the exposure and confounder, a ring graph design, and normality. The robustness of these estimators under relaxed assumptions needs to be studied. We have limited our attention to the partially linear data generation process, but of course in practice the parametric restrictions encoded in the partially linear outcome model may not hold; Gilbert et al. (2024) proposed a robust nonparametric causal inference framework for understanding and mitigating spatial confounding under minimal parametric assumptions. Many of these new estimators might be preferable to the traditional GLS estimator, as they are designed to mitigate spatial confounding and are likely to possess better second-order properties. We do not study these new estimators here as the goal is to clarify contradictory claims about the first-order performance of traditional spatial estimators from OLS, restricted spatial regression and GLS.
Supplementary Material
The Supplementary Material contains further simulations and proofs of the theorems and propositions.
Acknowledgement
We thank the editor for suggesting the current presentation of the consistency of the GLS estimator under confounding. We also thank Professor James S. Hodges for helpful discussions on this topic and for providing extensive feedback on an earlier version of the manuscript, as well as two reviewers for their constructive comments. We gratefully acknowledge use of the facilities at the Joint High Performance Computing Exchange in the Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health. This work was partially supported by the U.S. National Institute of Environmental Health Sciences (R01 ES033739) and Office of Naval Research (N00014-21-1-2820).
References
- Azevedo DR, Prates MO & Bandyopadhyay D (2023). Alleviating spatial confounding in frailty models. Biostatistics 24, 945–61.
- Bell A, Fairbrother M & Jones K (2018). Fixed and random effects models: Making an informed choice. Qual. Quant. 53, 1051–74.
- Bolin D & Wallin J (2024). Spatial confounding under infill asymptotics. arXiv: 2403.18961.
- Clayton DG, Bernardinelli L & Montomoli C (1993). Spatial correlation in ecological analysis. Int. J. Epidemiol. 22, 1193–202.
- Dupont E, Wood SN & Augustin NH (2022). Spatial+: A novel approach to spatial confounding. Biometrics 78, 1279–90.
- Gilbert B, Datta A, Casey JA & Ogburn EL (2024). A causal inference framework for spatial confounding. arXiv: 2112.14946v12.
- Guan Y, Page GL, Reich BJ, Ventrucci M & Yang S (2022). Spectral adjustment for spatial confounding. Biometrika 110, 699–719.
- Hanks EM, Schliep EM, Hooten MB & Hoeting JA (2015). Restricted spatial regression in practice: Geostatistical models, confounding, and robustness under model misspecification. Environmetrics 26, 243–54.
- He H & Severini TA (2016). A flexible approach to inference in semiparametric regression models with correlated errors using Gaussian processes. Comp. Statist. Data Anal. 103, 316–29.
- Hodges JS & Reich BJ (2010). Adding spatially-correlated errors can mess up the fixed effect you love. Am. Statistician 64, 325–34.
- Hughes J & Haran M (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J. R. Statist. Soc. B 75, 139–59.
- Keller JP & Szpiro AA (2020). Selecting a scale for spatial confounding adjustment. J. R. Statist. Soc. A 183, 1121–43.
- Khan K & Berrett C (2023). Re-thinking spatial confounding in spatial linear mixed models. arXiv: 2301.05743v2.
- Khan K & Calder CA (2022). Restricted spatial regression methods: Implications for inference. J. Am. Statist. Assoc. 117, 482–94.
- Marques I, Kneib T & Klein N (2022). Mitigating spatial confounding by explicitly correlating Gaussian random fields. Environmetrics 33, e2727.
- Narcisi M, Greco F & Trivisano C (2024). On the effect of confounding in linear regression models: An approach based on the theory of quadratic forms. Envir. Ecol. Statist. 31, 433–61.
- Nobre WS, Schmidt AM & Pereira JB (2021). On the effects of spatial confounding in hierarchical models. Int. Statist. Rev. 89, 302–22.
- Paciorek CJ (2010). The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Statist. Sci. 25, 107–25.
- Page GL, Liu Y, He Z & Sun D (2017). Estimation and prediction in the presence of spatial confounding for spatial linear models. Scand. J. Statist. 44, 780–97.
- Papadogeorgou G, Choirat C & Zigler CM (2018). Adjusting for unmeasured spatial confounding with distance adjusted propensity score matching. Biostatistics 20, 256–72.
- Prates MO, Assunção RM & Rodrigues EC (2019). Alleviating spatial confounding for areal data problems by displacing the geographical centroids. Bayesian Anal. 14, 623–47.
- Reich BJ, Hodges JS & Zadnik V (2006). Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62, 1197–206.
- Rice J (1986). Convergence rates for partially splined models. Statist. Prob. Lett. 4, 203–8.
- Schnell PM & Papadogeorgou G (2020). Mitigating unobserved spatial confounding when estimating the effect of supermarket access on cardiovascular disease deaths. Ann. Appl. Statist. 14, 2069–95.
- Thaden H & Kneib T (2018). Structural equation models for dealing with spatial confounding. Am. Statistician 72, 239–52.
- Wakefield J (2006). Disease mapping and spatial regression with count data. Biostatistics 8, 158–83.
- Wang W, Tuo R & Wu CFJ (2020). On prediction properties of kriging: Uniform error bounds and robustness. J. Am. Statist. Assoc. 115, 920–30.
- Yang Y, Cheng G & Dunson DB (2015). Semiparametric Bernstein-von Mises theorem: Second order studies. arXiv: 1503.04493.
- Zimmerman DL & Ver Hoef JM (2022). On deconfounding spatial confounding in linear models. Am. Statistician 76, 159–67.