Spectral adjustment for spatial confounding

YAWEN GUAN; GARRITT L PAGE; BRIAN J REICH; MASSIMO VENTRUCCI; SHU YANG

doi:10.1093/biomet/asac069

Author manuscript; available in PMC: 2024 Mar 18.

Published in final edited form as: Biometrika. 2022 Dec 21;110(3):699–719. doi: 10.1093/biomet/asac069

PMCID: PMC10947425 NIHMSID: NIHMS1970240 PMID: 38500847

Summary

Adjusting for an unmeasured confounder is generally an intractable problem, but in the spatial setting it may be possible under certain conditions. We derive necessary conditions on the coherence between the exposure and the unmeasured confounder that ensure the effect of exposure is estimable. We specify our model and assumptions in the spectral domain to allow for different degrees of confounding at different spatial resolutions. One assumption that ensures identifiability is that confounding present at global scales dissipates at local scales. We show that this assumption in the spectral domain is equivalent to adjusting for global-scale confounding in the spatial domain by adding a spatially smoothed version of the exposure to the mean of the response variable. Within this general framework, we propose a sequence of confounder adjustment methods that range from parametric adjustments based on the Matérn coherence function to more robust semiparametric methods that use smoothing splines. These ideas are applied to areal and geostatistical data for both simulated and real datasets.

Keywords: Coherence, Conditional autoregressive prior, COVID-19, Matérn covariance, Spatial confounding

1. Introduction

A fundamental task in environmental and epidemiological applications is to use spatially correlated observational data to estimate the effect of exposures. A key assumption needed to identify the effect of the exposures is that all relevant confounding variables have been included in the statistical model. This no-missing-confounder assumption is generally impossible to verify, but in the spatial setting it may be possible to remove the effects of unmeasured confounding variables if they have strong spatial dependence. An unmeasured spatial confounder exists when spatially varying factors that influence both the exposures and response are not observed. When the unmeasured spatial confounder is not taken into account, the effect estimate of an exposure can be biased, and the bias depends on the spatial scales of the exposure and the unmeasured spatial confounder (Paciorek, 2010; Page et al., 2017).

A slightly different but related confounding issue in modelling spatial data was discussed in Clayton et al. (1993). They proposed modelling the geographical patterns in the response using a spatial random effect term, independent of exposure, to account for unmeasured spatially structured covariates that influence the response. Confounding due to location may then arise if the exposure also varies smoothly with location, so that location itself acts as a confounder. In this case, regressions with and without spatial random effects can give different inference results on regression coefficients (Reich et al., 2006; Hodges & Reich, 2010), a phenomenon also known as spatial confounding. For confounding due to location, the unmeasured spatial covariates do not directly influence the exposure, but rather are multicollinear with the exposure, and the multicollinearity is more likely to occur when the spatial random effects and exposure are both spatially smooth. In this paper, we focus on spatial confounding due to an unmeasured spatial confounder, of which confounding due to location can be considered a special case in which the association between exposure and missing spatial covariates is zero.

To account for spatial confounding due to location, Reich et al. (2006), Hughes & Haran (2013) and Prates et al. (2019) restricted the residual spatial process to be orthogonal to the exposure, an approach referred to as restricted spatial regression. However, the approach makes a strong orthogonality assumption and can perform poorly in coefficient inference when the model is misspecified (Hanks et al., 2015; Khan & Calder, 2022; Zimmerman & Hoef, 2022). Alternative approaches with a focus on decomposing spatial scales following Paciorek (2010) have appeared in the literature. The main idea is that the location confounding can be eliminated by first removing the smooth components from both the exposure and response (Thaden & Kneib, 2018) or just the exposure (Keller & Szpiro, 2020; Dupont et al., 2022), then the exposure effect can be estimated by assessing the local variations in the covariate and response. More recently, Marques et al. (2022) proposed a joint Gaussian Markov random field model for exposure and response. Their work was developed independently around the same time as ours and is related to a special case of our parametric model. The listed works attempt to alleviate spatial confounding, but, in general, adjusting for a missing confounding variable is impossible without further information or assumptions. It remains unclear how to specify assumptions and methods that lead to consistent estimation of the exposure effect in the presence of an unmeasured spatial confounding variable, which may posit a more complex confounding structure than location confounding.

Connections with causal inference have been made. For areal data, Thaden & Kneib (2018) and Schnell & Papadogeorgou (2020) proposed jointly modelling the spatial structure in exposure and the unmeasured confounder. The former uses a structural equation model while the latter uses a Gaussian Markov random field construction. Both these methods have connections with causal inference; see Reich et al. (2021) for a recent review of spatial causal inference. In the spatial causal effect setting, Osama et al. (2019) permitted the spatial causal effect to vary across space. Different assumptions on the confounding relationship may lead to a variety of approaches.

We propose new methods to couch spatial regression with missing spatial confounding variables using spectral methods. Similarly to Paciorek (2010), Page et al. (2017) and Keller & Szpiro (2020), the spatial scales of exposure and missing confounder are the focus, but we explicitly specify a joint model for these variables in the spectral domain and study their coherence, i.e., their correlation at different spatial scales. As an aside, from a temporal perspective, Stokes & Purdon (2017) and Faes et al. (2019) considered a frequency-domain measure of causality, although their estimators are quite different than those proposed here. The resulting effect estimate from our approach reveals that the optimal confounder adjustment is a function of the coherence function, providing fundamental insights on spatial confounding. We show that the optimal confounder adjustment is not estimable without further assumptions, and provide a set of conditions that allow us to identify the exposure effect. Parametric and nonparametric methods are developed to approximate the optimal confounding adjustment and identify the exposure effect, while accounting for uncertainty in this approximation. We consider both areal and point-referenced data for Gaussian and non-Gaussian responses. Proofs are given in the Supplementary Material.

2. Continuous-space modelling framework

2.1. Preliminaries

Let $X (s)$ and $Z (s)$ be the observed exposure and confounder processes, respectively, at the spatial location $s \in D \subset ℛ^{2}$, and let the vectors $X = {(X_{1}, \dots, X_{n})}^{T}$ and $Z = {(Z_{1}, \dots, Z_{n})}^{T}$ be the processes evaluated at the set of locations $𝒮 = \{s_{1}, \dots, s_{n}\} \subset D$, where $X_{i} = X (s_{i})$ and $Z_{i} = Z (s_{i})$. For simplicity, we consider only a single exposure and confounding variable, but results extend to a multivariate setting; see the Supplementary Material. We do not assume that $𝒮$ is on a complete grid nor that the $n$ observation locations are distinct, i.e., we allow for multiple observations at the same location with the inclusion of a nugget effect. Both $X$ and $Z$ are assumed to be spatial processes, potentially with some nonspatial nugget variability.

Following the commonly used spatial regression model, we assume a linear additive relationship for the response $Y = {(Y_{1}, \dots, Y_{n})}^{T}$ . Here, we present the method for continuous response, but the extension to non-Gaussian cases is straightforward, and therefore the details for the generalized model are presented in the Supplementary Material.

We have

$$Y = β_{0} + β_{x} X + β_{z} Z + ε, \tag{1}$$

where $ε = {(ε_{1}, \dots, ε_{n})}^{T}$ and $ε_{i} \overset{\text{i.i.d.}}{\sim} N (0, σ^{2})$, a normal distribution with mean 0 and variance $σ^{2}$. The regression coefficient $β_{x}$ has a causal interpretation under the potential outcomes framework and the stable-unit-treatment-value, consistency and conditional-treatment-ignorability assumptions. If we observe the confounder $Z$, identification and estimation of $β_{x}$ is straightforward using multiple linear regression. However, we assume that $Z$ is an unmeasured confounder, making $β_{x}$ not identifiable in general. We propose to exploit the spatial structure of $Z$ to mitigate the effects of the unobserved confounder and specify assumptions in the spectral domain for identifying $β_{x}$. Our inference procedure involves introducing a confounder adjustment variable $\hat{Z}$ to the linear model. The derivation of $\hat{Z}$ is based on the association between $X$ and $Z$ in the spectral domain, while in the spatial domain, $\hat{Z}$ can be viewed as a smoothed version of $X$ under one of our assumptions for identification. Our work aims to identify and estimate $β_{x}$; the potential outcomes framework and the assumptions that give it a causal interpretation are described in the Supplementary Material. Here, instead of restating all elements of the causal framework, in the remaining sections we assume that the confounder process $Z (s)$ contains all unmeasured confounders and mainly focus on mitigating the effects of $Z (s)$. Therefore, we interpret $β_{x}$ as the effect of exposure under model (1) rather than the causal effect, as the latter requires additional assumptions that are not essential for $β_{x}$ inference in this work.

2.2. Spectral representation of confounding and identification

We model the dependence between $X (s)$ and $Z (s)$ using their spectral representations. This allows for different dependencies at different spatial scales, as each frequency corresponds to a spatial scale, with low frequencies corresponding to large spatial scales. We assume that both $X (s)$ and $Z (s)$ are mean-zero stationary Gaussian processes, and thus have spectral representations $X (s) = \int \exp (i ω^{T} s) 𝒳 (ω) d ω$ and $Z (s) = \int \exp (i ω^{T} s) 𝒵 (ω) d ω$, where $ω \in ℛ^{2}$ is a frequency. The spectral processes $𝒳 (ω)$ and $𝒵 (ω)$ are Gaussian with $E \{𝒳 (ω)\} = E \{𝒵 (ω)\} = 0$ and are independent across frequencies, so that, for any $ω \neq ω^{'}$, $\mathrm{cov} \{𝒳 (ω), 𝒳 (ω^{'})\} = \mathrm{cov} \{𝒵 (ω), 𝒵 (ω^{'})\} = \mathrm{cov} \{𝒳 (ω), 𝒵 (ω^{'})\} = 0$. At the same frequency, the covariance of the joint spectral process has the form

$$\mathrm{cov} \begin{pmatrix} 𝒳 (ω) \\ 𝒵 (ω) \end{pmatrix} = \begin{pmatrix} σ_{x}^{2} f_{x} (ω) & ρ σ_{x} σ_{z} f_{x z} (ω) \\ ρ σ_{x} σ_{z} f_{x z} (ω) & σ_{z}^{2} f_{z} (ω) \end{pmatrix},$$

where $σ_{x}^{2}$ and $σ_{z}^{2}$ are variance parameters, $f_{x} (ω) > 0$ and $f_{z} (ω) > 0$ are spectral densities that determine the marginal spatial correlation of $X (s)$ and $Z (s)$ , respectively, and the cross-spectral density $f_{x z} (ω)$ determines the dependence between the spectral processes.

Normalizing the cross-spectral density by each marginal standard deviation, we can derive the coherence function that determines the correlations between the two spectral processes (Kleiber, 2017),

$$γ (ω) = ρ \frac{f_{x z} (ω)}{\sqrt{f_{x} (ω) f_{z} (ω)}} \in [- 1, 1] . \tag{2}$$

The scalar parameter $ρ$ controls the overall strength of cross-correlation.

Returning to the response model (1), we let $Y (s) = \int \exp (i ω^{T} s) 𝒴 (ω) d ω$ be the spectral representation of the response. The conditional distribution of $𝒴 (ω)$ given $𝒳 (ω)$, marginalizing over $𝒵 (ω)$, is

$$𝒴 (ω) ∣ 𝒳 (ω) \overset{\text{indep}}{\sim} N \{β_{x} 𝒳 (ω) + β_{z} α (ω) 𝒳 (ω), τ^{2} (ω) + σ^{2}\}, \tag{3}$$

where

$$α (ω) = ρ \frac{σ_{z} f_{x z} (ω)}{σ_{x} f_{x} (ω)} = \frac{σ_{z} \sqrt{f_{z} (ω)}}{σ_{x} \sqrt{f_{x} (ω)}} γ (ω),$$

$$τ^{2} (ω) = β_{z}^{2} σ_{z}^{2} f_{z} (ω) \left\{1 - ρ^{2} \frac{f_{x z} {(ω)}^{2}}{f_{x} (ω) f_{z} (ω)}\right\} .$$

The regression coefficient for $𝒳 (ω)$ is $β (ω) = β_{x} + β_{z} α (ω) \neq β_{x}$. The additional term $\hat{𝒵} (ω) = E \{𝒵 (ω) ∣ 𝒳 (ω)\} = α (ω) 𝒳 (ω)$ is a result of attributing the effect of the unmeasured confounder on the response to the exposure, potentially inducing bias in estimating $β_{x}$.

Therefore, $β_{x}$ is identified only if the projection operator $α (ω)$ can be assumed to be known or estimated for some prespecified $ω$ . Of course, $α (ω)$ is generally not known and cannot be estimated without further assumptions because $Z (s)$ , and therefore $𝒵 (ω)$ , is not observed. We consider two approaches for identification: assume unconfoundedness at high frequencies, i.e., $α (ω) \approx 0$ for large $∥ ω ∥$ , so that high-frequency terms identify $β_{x}$ ; or specify a parsimonious coherence function with constraints on the parameters to ensure identification of $β_{x}$ through estimation of $α (ω)$ . We detail both approaches next.

For the case of unconfoundedness at high frequencies, if we assume that $α (ω) \to 0$ for large $∥ ω ∥$ then $E \{𝒴 (ω) ∣ 𝒳 (ω)\} \approx β_{x} 𝒳 (ω)$ and thus $β_{x}$ is identified. The assumption that $α (ω) \to 0$ for large $∥ ω ∥$ implies that the cross-spectral density decreases to zero faster than the spectral density of $X$, which means that confounding dissipates as the frequency increases, that is, as the scale of the spatial variation becomes smaller. High-frequency terms provide the most reliable information about $β_{x}$ because they correlate local changes in the exposure with local changes in the response. An extreme case of local information about the exposure effect is the difference in the response for two nearby sites with different levels of exposure. This local difference eliminates problems caused by omitted variables that vary smoothly over space. Of course, this cannot completely rule out missing confounding variables that covary with both the exposure and the response at high frequencies, but it does lessen the likelihood of spurious confounding effects.
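The attribution bias in (3), and its disappearance when $α (ω) = 0$, can be checked with a small simulation at a single frequency. This is only a sketch under stated assumptions: real-valued Gaussian draws stand in for the complex spectral processes, and the function names and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def projection_coefficient(rho, sigma_x, sigma_z, f_x, f_xz):
    """alpha(omega) = rho * sigma_z * f_xz(omega) / {sigma_x * f_x(omega)}, as in (3)."""
    return rho * sigma_z * f_xz / (sigma_x * f_x)

def fitted_slope(beta_x, beta_z, rho, sigma_x, sigma_z, f_x, f_z, f_xz,
                 sigma=0.5, n=200_000, seed=0):
    """Draw (X, Z) at one frequency from the joint covariance, then regress Y on X only."""
    rng = np.random.default_rng(seed)
    cross = rho * sigma_x * sigma_z * f_xz
    cov = np.array([[sigma_x**2 * f_x, cross],
                    [cross, sigma_z**2 * f_z]])
    x, z = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    y = beta_x * x + beta_z * z + sigma * rng.standard_normal(n)
    return (x @ y) / (x @ x)  # least-squares slope without intercept

params = dict(beta_x=1.0, beta_z=1.0, rho=0.5, sigma_x=1.0, sigma_z=2.0)

# Hypothetical "low frequency": strong cross-spectral density, so the slope is biased
low = fitted_slope(f_x=1.0, f_z=1.0, f_xz=0.8, **params)
alpha_low = projection_coefficient(0.5, 1.0, 2.0, f_x=1.0, f_xz=0.8)

# Hypothetical "high frequency": cross-spectral density is zero, so alpha = 0
high = fitted_slope(f_x=0.1, f_z=0.01, f_xz=0.0, **params)

print("low-frequency slope:", low, "vs beta_x + beta_z * alpha:", 1.0 + alpha_low)
print("high-frequency slope:", high, "vs beta_x:", 1.0)
```

In this setting the low-frequency slope should sit near $β_{x} + β_{z} α (ω)$ and the high-frequency slope near $β_{x}$, which is the sense in which high-frequency terms identify the exposure effect.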

For the case of parsimonious coherence, if we assume that $f_{x z} (ω) = C \sqrt{f_{x} (ω) f_{z} (ω)}$ for a constant $C$, then the coherence in (2) simplifies to the constant function $γ (ω) = ρ C \in (- 1, 1)$. Generalizing the use of a term from Gneiting et al. (2010), we refer to this as the parsimonious coherence model. This imposes the assumption that the correlation between the exposure and missing confounder is frequency invariant, and this greatly simplifies estimation because the model involves only two spectral densities that can be estimated using the marginal spatial covariances of the response and exposure, as described below. Moreover, if the marginal spectral densities differ for some frequencies then these frequencies can be used to identify $β_{x}$. The expression in (3) simplifies under the parsimonious model to

$$𝒴 (ω) ∣ 𝒳 (ω) \overset{\text{indep}}{\sim} N \left[\left\{β_{x} + ρ C β_{z} \frac{σ_{z} \sqrt{f_{z} (ω)}}{σ_{x} \sqrt{f_{x} (ω)}}\right\} 𝒳 (ω), \{1 - {(ρ C)}^{2}\} β_{z}^{2} σ_{z}^{2} f_{z} (ω) + σ^{2}\right] . \tag{4}$$

It is important to establish the identifiability of parameters in (4). The $ρ$ and $C$ are not uniquely identified, nor are $β_{z}$ and $σ_{z}$ , as they both appear in the model only through products. However, $ρ^{*} = ρ C$ is identified, as is $σ_{z}$ if $β_{z} = 1$ . An alternative parameterization is to let $σ_{z}^{*} = β_{z} σ_{z}$ and $σ_{z}^{*}$ can be identified. The two expressions are equivalent, which does not affect the result of Theorem 1 below. The identifiability of $ρ^{*}, σ_{z}$ under these conditions and that of the remaining parameters in (4) is established in the following theorem.

Theorem 1. Assume that $f_{x} (ω) \neq f_{z} (ω)$ for some $ω$ , and set $β_{z} = 1$ . Then parameters $β_{x}, ρ^{*}, σ_{x}$ and $σ_{z}$ , and functions $f_{x}$ and $f_{z}$ in model (4) are all identified.

2.3. Spatial representation of confounding and identification

Returning to the spatial domain, the response process can be written as

$$Y (s) ∣ X (s) = β_{0} + β_{x} X (s) + β_{z} \hat{Z} (s) + δ (s),$$

where

$$\hat{Z} (s) = \int \exp (i ω^{T} s) \hat{𝒵} (ω) d ω = \int \exp (i ω^{T} s) α (ω) 𝒳 (ω) d ω,$$

and $δ (s)$ is a mean-zero Gaussian process with spectral density $τ^{2} (ω) + σ^{2}$ independent of $X (s)$ and $\hat{Z} (s)$ . The function $α (ω)$ acts as a smoothing operator and, if it were known, then $\hat{Z} (s)$ would be an appropriate adjustment to the mean to account for the unmeasured confounder. The form of the oracle confounder adjustment, i.e., $\hat{Z} (s)$ if $α (ω)$ is known, is established in the following lemma.

Lemma 1. If $α (ω)$ is known then $\hat{Z} (s) = \int K (s - s^{'}) X (s^{'}) d s^{'}$ , where the kernel function $K (s - s^{'})$ is the inverse Fourier transform of $α (ω)$ .

The appealing consequence of Lemma 1 is that the oracle confounder adjustment is conveniently expressed as a kernel-smoothed function of the covariate of interest. It is also straightforward to show that, for any $n$ locations $𝒮$ ,

$$Y ∣ X = β_{0} + β_{x} X + β_{z} \hat{Z} + δ, \quad \text{where } \hat{Z} = Σ_{z x} Σ_{x}^{- 1} X, \tag{5}$$

with $\mathrm{cov} (δ) = β_{z}^{2} (Σ_{z} - Σ_{z x} Σ_{x}^{- 1} Σ_{z x}^{T}) + σ^{2} I_{n}$, $Σ_{z x} = \mathrm{cov} (Z, X)$, $Σ_{x} = \mathrm{cov} (X)$ and $Σ_{z} = \mathrm{cov} (Z)$. The product $Σ_{z x} Σ_{x}^{- 1}$ serves as a smoothing operator on $X$. This representation is convenient for estimation and to determine the strength of confounding dependence between $X$ and $Z$.

Since $\hat{Z}$ is a smoothed version of $X$ , including it as a covariate in (5) effectively removes effects of the large-scale spatial trends in $X$ , so that the estimate of $β_{x}$ is largely determined by high-frequency terms. This expression also lays bare the importance of assuming that $α (ω)$ converges to zero for large frequencies or that $X$ and $Z$ have different spectral densities. If restrictions are not placed on $α (ω)$ then it may be that $Σ_{z x} = Σ_{x}$ and thus $\hat{Z} = X$ , giving the nonidentifiable model $Y = β_{0} + β_{x} X + β_{z} X + δ$ .
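The adjustment $\hat{Z} = Σ_{z x} Σ_{x}^{- 1} X$ in (5) and the degenerate case $Σ_{z x} = Σ_{x}$ can be illustrated numerically. The sketch below assumes one-dimensional sites and exponential covariances purely for convenience; the cross-covariance with a longer range is a hypothetical choice, not a fitted model.

```python
import numpy as np

def exponential_cov(sites, spatial_range):
    """Exponential covariance exp(-|s - s'| / range) for one-dimensional sites."""
    dist = np.abs(sites[:, None] - sites[None, :])
    return np.exp(-dist / spatial_range)

sites = np.linspace(0.0, 10.0, 40)
Sigma_x = exponential_cov(sites, 1.0)
Sigma_zx = 0.5 * exponential_cov(sites, 3.0)  # hypothetical longer-range cross-covariance

rng = np.random.default_rng(1)
X = np.linalg.cholesky(Sigma_x) @ rng.standard_normal(sites.size)

# Z_hat = Sigma_zx Sigma_x^{-1} X, using a linear solve instead of an explicit inverse
Z_hat = Sigma_zx @ np.linalg.solve(Sigma_x, X)

# Degenerate case Sigma_zx = Sigma_x: the adjustment reproduces X itself
Z_hat_degenerate = Sigma_x @ np.linalg.solve(Sigma_x, X)

def design_rank(z_hat):
    """Numerical rank of the design matrix [1, X, Z_hat]."""
    design = np.column_stack([np.ones_like(X), X, z_hat])
    return np.linalg.matrix_rank(design, tol=1e-8)

rank_smoothed = design_rank(Z_hat)
rank_degenerate = design_rank(Z_hat_degenerate)
print(rank_smoothed, rank_degenerate)
```

When the design matrix loses a rank in the degenerate case, $β_{x}$ and $β_{z}$ cannot be separated, which is the nonidentifiable model described above.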

3. Continuous-space estimation strategies

3.1. The bivariate Matérn parametric model

The bivariate Matérn model (Gneiting et al., 2010; Apanasovich et al., 2012) is a flexible parametric model for the spectral densities $f_{x}, f_{z}$ and $f_{x z}$. The Matérn spectral density function for a process in two dimensions is $m (ω; v, ϕ) = v ϕ^{- 2 v} {(ϕ^{- 2} + ∥ ω ∥^{2})}^{- (v + 1)}$, with smoothness $v > 0$ and spatial range $ϕ > 0$. The bivariate Matérn may have different parameters for each process, $f_{j} (ω) = m (ω; v_{j}, ϕ_{j})$ for $j \in \{x, z, x z\}$, but constraints on the range and smoothness parameters are needed to ensure that the coherence is positive definite for all $ω$ (Gneiting et al., 2010; Apanasovich et al., 2012). Another advantage of the Matérn parametric model is the closed-form expressions for both the spectral density and covariance functions. Under a common range assumption, as described in the next paragraph, the projection operator $α (ω)$ also has a closed-form Fourier transformation, allowing the estimation procedure to be performed completely in the spatial domain using (5).

With the bivariate Matérn modelling assumption, the projection operator has the form $α (ω) = ρ σ_{z} m (ω; v_{x z}, ϕ_{x z}) / \{σ_{x} m (ω; v_{x}, ϕ_{x})\}$ . Therefore, if the cross-spectral density decays faster than the covariate spectral density, the confounding adjustment will be smaller for higher frequencies, i.e., $α (ω) \to 0$ . Comparison of the ratio of spectral densities such that $α (ω) \to 0$ is complicated in general, and so we explore the special cases of common range, common smoothness and parsimonious models below. For each special case, we discuss the parameter settings that ensure unconfoundedness at high frequencies.

The common range model takes $ϕ_{j} = ϕ$ for $j \in \{x, z, x z\}$, giving

$$α (ω) = ρ \frac{σ_{z}}{σ_{x}} {(ϕ^{- 2} + ∥ ω ∥^{2})}^{- (v_{x z} - v_{x})} . \tag{6}$$

In this case, we have unconfoundedness at high frequencies, $α (ω) \to 0$, if and only if $v_{x z} > v_{x}$, i.e., the cross-covariance is smoother than the covariate covariance. On the other hand, if we assume a common smoothness $v_{j} = v$ for $j \in \{x, z, x z\}$ then $α (ω) \to {(ϕ_{x} / ϕ_{x z})}^{2 v}$ and confounding persists at high frequencies, regardless of the range parameters. Therefore, a common range parameterization allows us to identify the exposure effect by reducing high-resolution confounding while a common smoothness parameterization will not. For simplicity, we assume a common range in the remainder of this section.
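The contrast between the common range and common smoothness cases can be seen by evaluating the ratio of Matérn spectral densities directly. In this minimal sketch the function names and parameter values are hypothetical, and $ρ = σ_{x} = σ_{z} = 1$ is assumed by default so that the common smoothness limit matches ${(ϕ_{x} / ϕ_{x z})}^{2 v}$ as stated. Because the ratio is computed from $m (ω; v, ϕ)$ itself, it carries the constant $v_{x z} / v_{x}$, which does not affect the limiting behaviour.

```python
import numpy as np

def matern_spectral_density(omega_norm, v, phi):
    """m(omega; v, phi) = v * phi^(-2v) * (phi^(-2) + |omega|^2)^(-(v + 1))."""
    return v * phi ** (-2 * v) * (phi ** -2 + omega_norm ** 2) ** (-(v + 1))

def alpha_matern(omega_norm, v_x, phi_x, v_xz, phi_xz,
                 rho=1.0, sigma_x=1.0, sigma_z=1.0):
    """alpha(omega) = rho * sigma_z * m(omega; v_xz, phi_xz) / {sigma_x * m(omega; v_x, phi_x)}."""
    num = rho * sigma_z * matern_spectral_density(omega_norm, v_xz, phi_xz)
    den = sigma_x * matern_spectral_density(omega_norm, v_x, phi_x)
    return num / den

omega = np.array([0.1, 1.0, 10.0, 100.0, 1e4])

# Common range, cross-smoothness larger than exposure smoothness: alpha decays to zero
common_range = alpha_matern(omega, v_x=1.0, phi_x=1.0, v_xz=2.0, phi_xz=1.0)

# Common smoothness, different ranges: alpha levels off at a nonzero constant
common_smooth = alpha_matern(omega, v_x=1.5, phi_x=1.0, v_xz=1.5, phi_xz=2.0)
limit = (1.0 / 2.0) ** (2 * 1.5)  # (phi_x / phi_xz)^(2v)

print(common_range)
print(common_smooth, limit)
```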

Figure 1 illustrates the confounder adjustment $\hat{Z}$ in (5) for the bivariate Matérn model with a common range and different smoothness $v_{x z} = c v_{x}$ for increasing values of $c$. Increasing $c$ implies increasing decay rates of $α (ω)$. The original simulated $X$ is plotted in the left panel of Fig. 1 with $c = 1$, in which case we have $\hat{Z} = X$ and thus a completely confounded model. In the cases with $c > 1$ the confounder adjustment $\hat{Z}$ is a smoothed version of $X$. Therefore, including $\hat{Z}$ as a covariate in the model removes large-scale trends in $X$ to adjust for confounding at low frequencies.

Fig. 1. Example confounder adjustment for the bivariate Matérn: $X (s)$ is generated from Matérn with $ϕ_{x} = 1$ and $v_{x} = 1$ on a 50 × 50 grid with grid spacing one. The panels show the confounder adjustment $\hat{Z} (s)$ for $ϕ_{x z} = ϕ_{x}$ and $v_{x z} = c v_{x}$ for $c \in \{1, 3, 5\}$. For $c = 1$, $\hat{Z} (s) = X (s)$.

The unmeasured confounder cannot be observed, making it difficult to estimate all the parameters in the bivariate Matérn model. Therefore, additional constraints are required for identifiability of the remaining parameters in addition to $β_{x}$. These are provided in the next theorem.

Theorem 2. Assuming that $β_{z} = 1$ and a common range parameter, sufficient conditions for identifiability of the remaining parameters $\{β_{x}, ρ, v_{x}, v_{z}, v_{x z}, σ_{x}, σ_{z}\}$ are a large cross-smoothness parameter $v_{x z} > \max \{v_{x}, (v_{x} + v_{z}) / 2\}$ and $ρ^{2} < v_{x} v_{z} / v_{x z}^{2}$.

The common range model simplifies further under the parsimonious model in (4) with $C = (v_{x} + v_{z}) / (2 \sqrt{v_{x} v_{z}})$, $| ρ | < 1 / C$ and $f_{x z} (ω) = m \{ω; (v_{x} + v_{z}) / 2, ϕ\}$. This is the parsimonious Matérn model of Gneiting et al. (2010), i.e., the cross-smoothness equals the average of the marginal smoothness parameters, $v_{x z} = (v_{x} + v_{z}) / 2$. Under this model, the confounder adjustment becomes

$$α (ω) = ρ \frac{σ_{z}}{σ_{x}} {(ϕ^{- 2} + ∥ ω ∥^{2})}^{- (v_{z} - v_{x}) / 2}, \tag{7}$$

and thus $α (ω) \to 0$ if and only if $v_{z} > v_{x}$ , i.e., the missing confounder is smoother than the exposure. On the other hand, if $v_{z} < v_{x}$ then $α (ω) \to \infty$ . However, $α (ω) \to 0$ is not needed here as the identification strategy for the parsimonious coherence is established in Theorem 1.
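As a numerical check of the parsimonious Matérn model, the sketch below confirms that $f_{x z} (ω) / \sqrt{f_{x} (ω) f_{z} (ω)}$ is constant and equal to $C = (v_{x} + v_{z}) / (2 \sqrt{v_{x} v_{z}})$ when the range is common and $v_{x z} = (v_{x} + v_{z}) / 2$. The smoothness values, the value of $ρ$ and the assumption $σ_{x} = σ_{z} = 1$ are illustrative only.

```python
import numpy as np

def matern_spectral_density(omega_norm, v, phi):
    """m(omega; v, phi) = v * phi^(-2v) * (phi^(-2) + |omega|^2)^(-(v + 1))."""
    return v * phi ** (-2 * v) * (phi ** -2 + omega_norm ** 2) ** (-(v + 1))

v_x, v_z, phi = 0.5, 2.5, 1.0          # hypothetical: confounder smoother than exposure
v_xz = (v_x + v_z) / 2                 # parsimonious cross-smoothness
C = (v_x + v_z) / (2 * np.sqrt(v_x * v_z))

omega = np.linspace(0.0, 20.0, 101)
f_x = matern_spectral_density(omega, v_x, phi)
f_z = matern_spectral_density(omega, v_z, phi)
f_xz = matern_spectral_density(omega, v_xz, phi)

ratio = f_xz / np.sqrt(f_x * f_z)      # should equal C at every frequency

rho = 0.6                              # must satisfy |rho| < 1 / C
coherence = rho * ratio                # constant coherence rho * C
alpha = rho * f_xz / f_x               # projection operator with sigma_x = sigma_z = 1

print(C, ratio.min(), ratio.max())
print(alpha[0], alpha[-1])
```

With $v_{z} > v_{x}$ as chosen here, `alpha` decays with frequency, in line with (7); swapping the two smoothness values would make it grow instead.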

3.2. Semiparametric model

Rather than indirectly modelling the projection operator $α (ω)$ via a model for the cross-covariance function, in this section we directly model $α (ω)$ using a flexible mixture model. We use a linear combination of cubic B-splines of order 4,

$$α (ω) = \sum_{l = 1}^{L} B_{l} (∥ ω ∥) b_{l}, \qquad 0 < ∥ ω ∥ < π / △_{s},$$

where the $B_{l} (\cdot)$ are B-spline basis functions, the $b_{l}$ are the associated coefficients and $△_{s} = \max \{d s_{1}, d s_{2}\}$ for grid spacing $(d s_{1}, d s_{2})$. A uniform sequence of knots $\{ω_{1}^{*}, \dots, ω_{L}^{*}\}$ is placed to cover the interval $[0, π / △_{s}]$, such that $0 \in (ω_{1}^{*}, ω_{2}^{*})$ and $π / △_{s} \in (ω_{L - 1}^{*}, ω_{L}^{*})$. The interval upper bound $π / △_{s}$ is the highest frequency that can be observed from uniformly spaced data due to aliasing (Fuentes & Reich, 2010). We have restricted the projection operator to be isotropic by taking the Euclidean norm $∥ \cdot ∥$ of the two-dimensional frequency $ω$, but this can be relaxed by using bivariate spline functions. Other mixture priors (Reich & Fuentes, 2012; Jang et al., 2017; Chen et al., 2021) can also be used for modelling the projection operator.
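A B-spline representation of $α (ω)$ can be sketched with the Cox-de Boor recursion. The knot placement below, a uniform sequence extended three spacings beyond each end of $[0, π / △_{s}]$, is a simplifying assumption and not necessarily the placement used in the paper; the grid spacing, the number of basis functions and the coefficients `b` are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions of a given degree at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicators of the half-open knot intervals [t_i, t_{i+1})
    B = np.array([(x >= t[i]) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_next[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_next
    return B

grid_spacing = 1.0                       # hypothetical value of the grid spacing
upper = np.pi / grid_spacing             # highest observable frequency
delta = upper / 6                        # knot spacing
knots = delta * np.arange(-3, 10)        # 13 uniform knots, giving 9 cubic basis functions

omega_norm = np.linspace(0.01, upper - 0.01, 50)
B = bspline_basis(omega_norm, knots)     # shape (50, 9)

b = np.linspace(1.0, 0.0, B.shape[1])    # hypothetical decaying coefficients
alpha = B @ b                            # alpha(omega) = sum_l B_l(|omega|) b_l
print(B.shape, alpha[0], alpha[-1])
```

Because the basis functions are nonnegative and sum to one on the interval, decreasing coefficients give a decreasing $α (ω)$, which is one way to encode confounding that dissipates at high frequencies.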

The B-spline mixture model for $α (ω)$ does not have a closed-form inverse Fourier transformation. We approximate the kernel function $K (s - s^{'})$ with a finite sum at a set of equally spaced frequencies $ℱ = \{ω_{1}^{f}, \dots, ω_{m}^{f}\}$ with spacing $△_{ℱ}$ and $ω_{m}^{f} = π / △_{s}$ following Qadir & Sun (2020):

$$K (s - s^{'}) = \sum_{ω^{f} \in ℱ} h \left(\frac{2 π ω^{f}}{h}\right)^{κ + 1} 𝒥_{κ} (ω^{f} h) α (ω^{f}) △_{ℱ} .$$

Here $h = ∥s - s^{'}∥, κ = d / 2 - 1$ and $𝒥_{κ} (\cdot)$ is a Bessel function of the first kind of order $κ$ (Watson, 1995). This approximation allows us to directly compute confounder adjustment in the spatial domain, which would otherwise require the Fourier transform of data to perform analysis in the spectral domain. The confounder adjustment is then given by $\hat{Z} (s) = \sum_{l = 1}^{L} b_{l} {\hat{Z}}_{l} (s)$ , where

$${\hat{Z}}_{l} (s) = \sum_{ω^{f} \in ℱ} {(2 π ω^{f})}^{κ + 1} B_{l} (ω^{f}) △_{ℱ} \int \frac{𝒥_{κ} (ω^{f} h)}{h^{κ}} X (s^{'}) d s^{'} . \tag{8}$$

When $X (s)$ is observed on a grid, the integral can be approximated as $\int 𝒥_{κ} (ω^{f} h) h^{- κ} X (s^{'}) d s^{'} \approx (1 / n) \sum_{i = 1}^{n} 𝒥_{κ} (ω^{f} h_{i}) h_{i}^{- κ} X (s_{i})$ with $h_{i} = ∥s - s_{i}∥$. For nongridded data, the covariate can be interpolated to a grid and the same discrete approximation applied. Other numerical approximations to integrals, such as the finite element method (Johnson, 2012, Ch. 12), can also be applied to nongridded data. The confounder adjustment covariates ${\hat{Z}}_{l} (s)$ are precomputed to reduce computation during model fitting. We then fit the spatial model

$$Y (s) = β_{0} + β_{x} X (s) + \sum_{l = 1}^{L} b_{l} {\hat{Z}}_{l} (s) + δ (s), \tag{9}$$

where $β_{z} = 1$ for identification and $δ (s)$ is modelled as a Gaussian process with nugget. The coefficients ${(b_{1}, \dots, b_{L})}^{T}$ are given intrinsic autoregressive priors with full conditional distributions $b_{k} ∣ b_{(- k)} \sim N ({\overline{b}}_{k}, σ_{b}^{2} / N_{k})$, where ${\overline{b}}_{k}$ is the mean of the $N_{k}$ coefficients $b_{l}$ with $| l - k | = 1$, so $N_{1} = N_{L} = 1$ and $N_{2} = \dots = N_{L - 1} = 2$.
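The full conditionals of the intrinsic autoregressive prior can be recovered from a first-order random-walk precision matrix. The sketch below is one standard way to write this prior, assumed here for illustration; the helper names and the example coefficient vector are hypothetical.

```python
import numpy as np

def rw1_precision(L):
    """Structure matrix Q = D^T D, where D takes first differences of (b_1, ..., b_L)."""
    D = np.diff(np.eye(L), axis=0)       # (L - 1) x L first-difference matrix
    return D.T @ D

def full_conditional(b, k, sigma_b2, Q):
    """Mean and variance of b_k given the other coefficients under precision Q / sigma_b2."""
    others = np.delete(np.arange(b.size), k)
    mean = -(Q[k, others] @ b[others]) / Q[k, k]
    var = sigma_b2 / Q[k, k]
    return mean, var

Q = rw1_precision(5)
b = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # hypothetical coefficient values

interior_mean, interior_var = full_conditional(b, 2, sigma_b2=0.5, Q=Q)
edge_mean, edge_var = full_conditional(b, 0, sigma_b2=0.5, Q=Q)
print(interior_mean, interior_var, edge_mean, edge_var)
```

The diagonal of `Q` holds the neighbour counts $N_{k}$, so interior coefficients are shrunk towards the average of their two neighbours and end coefficients towards their single neighbour.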

4. Discrete-space methodology

4.1. A spectral model for confounding

We extend our methodology to the discrete case for a spatial domain comprised of $n$ regions. For region $i$ , let $Y_{i}, X_{i}$ and $Z_{i}$ be the response, exposure and confounding variables, respectively. Let $Y = {(Y_{1}, \dots, Y_{n})}^{T}$ and define $X, Z$ similarly. We model $Z$ using the conditional autoregressive model (Gelfand et al., 2010) with the Leroux parameterization (Leroux et al., 2000),

$$Z \sim N [μ, σ_{z}^{2} \{(1 - λ_{z}) I_{n} + λ_{z} R\}^{- 1}],$$

where $μ = {(μ_{1}, \dots, μ_{n})}^{T}$ is the mean vector, $σ_{z}^{2}$ determines the overall variance, $λ_{z} \in [0, 1]$ controls the strength of spatial dependence and $R$ is an $n \times n$ matrix specifying the spatial dependence. For the discrete case, the spatial dependence between the regions is often described by an adjacency structure. Let $a_{i j} = 1$ if regions $i$ and $j$ are adjacent and 0 otherwise, and let $m_{i}$ be the number of regions adjacent to region $i$. Then $R$ has $(i, j)$ off-diagonal element $- a_{i j}$ and $i$th diagonal element $m_{i}$. We denote this model as $Z \sim \mathrm{CAR} (μ, σ_{z}^{2}, λ_{z})$.
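The construction of $R$ and of the Leroux covariance can be sketched for a small lattice. The rook-adjacency 3 × 3 grid and the parameter values are hypothetical, and the function names are illustrative only.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Rook adjacency matrix a_ij for an nrow x ncol lattice of regions."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:             # neighbour to the right
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:             # neighbour below
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return A

def car_structure_matrix(A):
    """R with off-diagonal elements -a_ij and i-th diagonal element m_i."""
    return np.diag(A.sum(axis=1)) - A

def leroux_covariance(A, sigma2_z, lam_z):
    """sigma_z^2 {(1 - lambda_z) I_n + lambda_z R}^{-1}."""
    n = A.shape[0]
    precision = (1.0 - lam_z) * np.eye(n) + lam_z * car_structure_matrix(A)
    return sigma2_z * np.linalg.inv(precision)

A = grid_adjacency(3, 3)
R = car_structure_matrix(A)
cov_z = leroux_covariance(A, sigma2_z=2.0, lam_z=0.9)
print(R[4, 4], np.linalg.eigvalsh(cov_z).max())
```

For $λ_{z} < 1$ the precision is positive definite even though $R$ itself is singular, which is what makes the Leroux form a proper distribution.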

An advantage of the Leroux parameterization is that the spatial covariance can be written as

$$σ_{z}^{2} \{(1 - λ_{z}) I_{n} + λ_{z} R\}^{- 1} = σ_{z}^{2} Γ \{(1 - λ_{z}) I_{n} + λ_{z} W\}^{- 1} Γ^{T},$$

where the spectral decomposition of $R$ is $R = Γ W Γ^{T}$ for orthonormal eigenvector matrix $Γ$ and diagonal eigenvalue matrix $W$ with kth diagonal element $ω_{k} ⩾ 0$ , ordered so that $ω_{1} ⩽ \dots ⩽ ω_{n}$ . Assuming that all variables have the same adjacency structure $R$ , this allows us to project the model into the spectral domain using the graph Fourier transform (Sandryhaila & Moura, 2013), $Y^{*} = Γ^{T} Y = {(Y_{1}^{*}, \dots, Y_{n}^{*})}^{T}, X^{*} = Γ^{T} X = {(X_{1}^{*}, \dots, X_{n}^{*})}^{T}$ and $Z^{*} = Γ^{T} Z = {(Z_{1}^{*}, \dots, Z_{n}^{*})}^{T}$ . This transformation decorrelates the model and gives

$$Y_{k}^{*} ∣ X_{k}^{*}, Z_{k}^{*} \overset{\text{indep}}{\sim} N (β_{0} M_{k} + β_{x} X_{k}^{*} + β_{z} Z_{k}^{*}, σ^{2}),$$

where $M_{k}$ is the sum of the kth column of $Γ$ and $(X_{k}^{*}, Z_{k}^{*})$ are independent across $k$ . To exploit this decorrelation property of the graph Fourier transform, we conduct all analyses of Gaussian data for the discrete spatial domain in the spectral scale.

Comparing the discrete to continuous cases, the eigenvalue $ω_{k}$ is analogous to frequency $ω$ . Terms with small $ω_{k}$ have large variance and measure large-scale trends in the data. For example, it can be shown that if the $n$ locations form a connected graph then $ω_{1} = 0$ and $Y_{1}^{*}$ is proportional to the mean of $Y$ . In contrast, terms with large $ω_{k}$ have small variance and represent small-scale features. Using this analogy, in the remainder of this section we extend two of the continuous-domain methods of § 3 to the discrete case.
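The properties of the graph Fourier transform described above can be checked on a small connected graph. This sketch uses a path graph of six regions and sets $σ_{z}^{2} = 1$, both arbitrary choices for illustration.

```python
import numpy as np

n = 6
A = np.diag(np.ones(n - 1), k=1)
A = A + A.T                               # path graph: region i is adjacent to i - 1 and i + 1
R = np.diag(A.sum(axis=1)) - A

w, Gamma = np.linalg.eigh(R)              # ascending eigenvalues, R = Gamma W Gamma^T

rng = np.random.default_rng(0)
Y = rng.standard_normal(n)
Y_star = Gamma.T @ Y                      # graph Fourier transform of Y
M = Gamma.sum(axis=0)                     # M_k = sum of the k-th column of Gamma

# Leroux covariance computed directly and through the spectral decomposition
lam_z = 0.7
cov_direct = np.linalg.inv((1 - lam_z) * np.eye(n) + lam_z * R)
cov_spectral = Gamma @ np.diag(1.0 / (1 - lam_z + lam_z * w)) @ Gamma.T

print(w[0], abs(Y_star[0]), np.sqrt(n) * abs(Y.mean()))
```

The sign of each eigenvector is arbitrary, so $Y_{1}^{*}$ is compared with the mean in absolute value. For a connected graph only $M_{1}$ is nonzero, so the intercept affects the lowest-frequency term alone.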

4.2. Bivariate conditional autoregressive model

As in § 2, we assume a joint model for $X^{*}$ and $Z^{*}$ . We assume that the pairs $(X_{k}^{*}, Z_{k}^{*})$ are independent across $k$ , and Gaussian with mean zero and covariance

$$\mathrm{cov} \begin{pmatrix} X_{k}^{*} \\ Z_{k}^{*} \end{pmatrix} = \begin{pmatrix} σ_{x}^{2} f_{x} (ω_{k}) & ρ σ_{x} σ_{z} f_{x z} (ω_{k}) \\ ρ σ_{x} σ_{z} f_{x z} (ω_{k}) & σ_{z}^{2} f_{z} (ω_{k}) \end{pmatrix}, \tag{10}$$

where $σ_{x}^{2}$ and $σ_{z}^{2}$ are variance parameters, $f_{x} (ω_{k}) > 0$ and $f_{z} (ω_{k}) > 0$ are variance functions that determine the covariance of $X$ and $Z$, respectively, and scalar $ρ$ and function $f_{x z} (ω_{k})$ determine the dependence between $X$ and $Z$. For the Leroux conditional autoregressive model, we have $f_{j} (ω_{k}) = 1 / (1 - λ_{j} + λ_{j} ω_{k})$ for $j \in \{x, z\}$ so that the marginal distributions are $X \sim \mathrm{CAR} (0, σ_{x}^{2}, λ_{x})$ and $Z \sim \mathrm{CAR} (0, σ_{z}^{2}, λ_{z})$.

One possible parametric cross-covariance model is $f_{x z} (ω) = 1 / (1 - λ_{x z} + λ_{x z} ω)$ . As with the bivariate Matérn, $f_{x z}$ has the same functional form as $f_{x}$ and $f_{z}$ . Constraints are required to ensure that the covariance in (10) is positive definite, i.e., that

$$ρ^{2} (1 - λ_{x} + λ_{x} w) (1 - λ_{z} + λ_{z} w) < {(1 - λ_{x z} + λ_{x z} w)}^{2} \tag{11}$$

for all $w \in \{ω_{1}, \dots, ω_{n}\}$ . Necessary conditions for (11) to hold for all $w ⩾ 0$ are $ρ^{2} (1 - λ_{x}) (1 - λ_{z}) < {(1 - λ_{x z})}^{2}$ and $ρ^{2} λ_{x} λ_{z} < λ_{x z}^{2}$ , but these conditions are not sufficient and not even necessary when considering only $w \in \{ω_{1}, \dots, ω_{n}\}$ .
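The distinction between checking (11) on the observed eigenvalues and checking it for all $w ⩾ 0$ can be made concrete. In the sketch below the parameter values and the eigenvalue grid on $[0, 8]$ are hypothetical; the function names are illustrative only.

```python
import numpy as np

def satisfies_condition_11(rho, lam_x, lam_z, lam_xz, w):
    """Check inequality (11) at every supplied value of w."""
    w = np.asarray(w, dtype=float)
    lhs = rho**2 * (1 - lam_x + lam_x * w) * (1 - lam_z + lam_z * w)
    rhs = (1 - lam_xz + lam_xz * w) ** 2
    return bool(np.all(lhs < rhs))

def necessary_for_all_w(rho, lam_x, lam_z, lam_xz):
    """The two necessary conditions for (11) to hold for all w >= 0."""
    at_zero = rho**2 * (1 - lam_x) * (1 - lam_z) < (1 - lam_xz) ** 2
    at_infinity = rho**2 * lam_x * lam_z < lam_xz**2
    return bool(at_zero and at_infinity)

pars = dict(rho=0.3, lam_x=0.5, lam_z=0.5, lam_xz=0.1)
w_observed = np.linspace(0.0, 8.0, 25)    # hypothetical eigenvalues of R

valid_on_observed = satisfies_condition_11(w=w_observed, **pars)
valid_for_all_w = necessary_for_all_w(**pars)
valid_at_large_w = satisfies_condition_11(w=[20.0], **pars)
print(valid_on_observed, valid_for_all_w, valid_at_large_w)
```

With these values the covariance is valid on the observed eigenvalues even though the conditions for all $w ⩾ 0$ fail, so in practice it is the finite set of eigenvalues that should be checked.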

Assuming that the covariance parameters give a valid covariance, then marginalizing over $Z_{k}^{*}$ and setting $β_{z} = 1$ , as in § 2, for identification gives

Y_{k}^{*} ∣ X_{k}^{*} \overset{indep}{~} N \{β_{0} M_{k} + β_{x} X_{k}^{*} + α (ω_{k}) X_{k}^{*}, τ^{2} (ω_{k}) + σ^{2}\},

(12)

where $α (ω_{k}) = ρ σ_{z} σ_{x}^{- 1} (1 - λ_{x} + λ_{x} ω_{k}) {(1 - λ_{x z} + λ_{x z} ω_{k})}^{- 1}$ and $τ^{2} (ω_{k}) = σ_{z}^{2} {(1 - λ_{z} + λ_{z} ω_{k})}^{- 1} - ρ^{2} σ_{z}^{2} (1 - λ_{x} + λ_{x} ω_{k}) {(1 - λ_{x z} + λ_{x z} ω_{k})}^{- 2}$. Therefore, $α (ω) \to ρ σ_{z} σ_{x}^{- 1} λ_{x} λ_{x z}^{- 1}$ as $ω \to \infty$, and thus the high-resolution confounding effect is smallest when $λ_{x}$ is smaller than $λ_{x z}$.
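The expressions for $α (ω_{k})$ and $τ^{2} (ω_{k})$ in (12) can be evaluated as below. This is a sketch under assumed, purely illustrative parameter values; the function name `alpha_tau2` is hypothetical and the eigenvalue grid is a stand-in.

```python
import numpy as np

def alpha_tau2(omega, rho, sig_x, sig_z, lam_x, lam_z, lam_xz):
    """Confounding adjustment alpha and conditional variance tau^2 from (12)."""
    w = np.asarray(omega, dtype=float)
    g_x = 1 - lam_x + lam_x * w        # 1 / f_x
    g_z = 1 - lam_z + lam_z * w        # 1 / f_z
    g_xz = 1 - lam_xz + lam_xz * w     # 1 / f_xz
    alpha = rho * sig_z / sig_x * g_x / g_xz
    tau2 = sig_z**2 / g_z - rho**2 * sig_z**2 * g_x / g_xz**2
    return alpha, tau2

# Illustrative values chosen so that (11) holds for all w >= 0.
pars = dict(rho=0.3, sig_x=1.0, sig_z=1.0, lam_x=0.5, lam_z=0.9, lam_xz=0.8)

omega = np.linspace(0.0, 8.0, 81)                  # stand-in eigenvalues
alpha, tau2 = alpha_tau2(omega, **pars)

# Large-frequency limit rho * sig_z / sig_x * lam_x / lam_xz
limit = pars["rho"] * pars["sig_z"] / pars["sig_x"] * pars["lam_x"] / pars["lam_xz"]
alpha_far, _ = alpha_tau2(np.array([1e8]), **pars)
print(limit, alpha_far[0], tau2.min())
```

A negative value of `tau2` at any eigenvalue would signal that constraint (11) is violated there.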

The parsimonious cross-covariance model is $f_{x z} (ω_{k}) = {\{f_{x} (ω_{k}) f_{z} (ω_{k})\}}^{1 / 2}$, giving $\mathrm{cor} (X_{k}^{*}, Z_{k}^{*}) = ρ$ for all $k$. With this simplification, any $ρ \in (- 1, 1)$ and $λ_{x}, λ_{z} \in (0, 1)$ give a valid covariance, and the terms in (12) reduce to $α (ω_{k}) = ρ σ_{z} σ_{x}^{- 1} {(1 - λ_{x} + λ_{x} ω_{k})}^{1 / 2} {(1 - λ_{z} + λ_{z} ω_{k})}^{- 1 / 2}$ and $τ^{2} (ω_{k}) = σ_{z}^{2} (1 - ρ^{2}) / (1 - λ_{z} + λ_{z} ω_{k})$. Here, identifying $β_{x}$ does not require the missing confounder to be smoother than the exposure, i.e., $λ_{z} > λ_{x}$, because the identifying assumption is parsimonious coherence. As long as $λ_{z} \neq λ_{x}$, the remaining parameters can also be identified, as established in the next theorem.
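For completeness, the reduction of (12) under the parsimonious model follows from the usual bivariate Gaussian conditioning formulas applied to (10). The display below is a worked sketch of that step, using only the quantities defined in (10):

```latex
\begin{align*}
\alpha(\omega_k)
  &= \frac{\mathrm{cov}(X_k^*, Z_k^*)}{\mathrm{var}(X_k^*)}
   = \frac{\rho\,\sigma_x\sigma_z\{f_x(\omega_k) f_z(\omega_k)\}^{1/2}}{\sigma_x^2 f_x(\omega_k)}
   = \rho\,\frac{\sigma_z}{\sigma_x}\left\{\frac{f_z(\omega_k)}{f_x(\omega_k)}\right\}^{1/2}
   = \rho\,\frac{\sigma_z}{\sigma_x}
     \left(\frac{1-\lambda_x+\lambda_x\omega_k}{1-\lambda_z+\lambda_z\omega_k}\right)^{1/2}, \\
\tau^2(\omega_k)
  &= \mathrm{var}(Z_k^*) - \frac{\mathrm{cov}(X_k^*, Z_k^*)^2}{\mathrm{var}(X_k^*)}
   = \sigma_z^2 f_z(\omega_k) - \rho^2\sigma_z^2 f_z(\omega_k)
   = \frac{\sigma_z^2(1-\rho^2)}{1-\lambda_z+\lambda_z\omega_k}.
\end{align*}
```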

Theorem 3. Assuming that $λ_{x} \neq λ_{z}$ and $β_{z} = 1$ , then the parameters $β_{x}, σ^{2}, ρ, λ_{x}, λ_{z}, σ_{x}^{2}, σ_{z}^{2}$ in the parsimonious model (13) below are all identifiable.

We have fixed $β_{z} = 1$ in Theorem 3 so we can estimate $σ_{z}$ , but this is unnecessary. The alternative parameterization $σ^{*} = β_{z} σ_{z}$ can be used, which does not affect the results, as discussed in § 2.2.

In the spatial domain, the parsimonious model is

Y ∣ X, V ~ N (β_{0} + β_{x} X + Γ A Γ^{T} X + V, σ^{2} I_{n}),

(13)

where $V ~ \mathrm{CAR} \{0, σ_{z}^{2} (1 - ρ^{2}), λ_{z}\}$ and $A$ is diagonal with $k$th diagonal element $α (ω_{k})$. The term $Γ A Γ^{T} X$ adjusts for missing spatial confounders and the term $V$ captures spatial variation that is independent of $X$. In this case with $λ_{z} > λ_{x}$, the confounder adjustment $Γ A Γ^{T} X$ smooths $X$ by first projecting into the spectral domain by multiplying by $Γ^{T}$, then dampening the high-frequency terms with large $ω$ and thus small $α (ω)$ by multiplying by $A$, and finally projecting back in the spatial domain by multiplying by $Γ$. Marques et al. (2022) developed a Bayesian method to mitigate spatial confounding. Their model is related to our parsimonious Leroux conditional autoregressive model, except they used a Gaussian Markov random field model using the stochastic partial differential equation approach (Lindgren et al., 2011) for $X$ and $Z$, and a penalized complexity prior for $ρ$.
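The three-step smoothing described above (project, dampen, project back) can be sketched as follows. The code assumes that $Γ$ and $ω_{k}$ come from the eigendecomposition of the graph Laplacian $M - A$; the 6 × 6 rook grid, the parameter values and the helper name `rook_laplacian` are illustrative assumptions.

```python
import numpy as np

def rook_laplacian(nrow, ncol):
    """Graph Laplacian M - A of an nrow x ncol grid with rook neighbours."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

omega, Gamma = np.linalg.eigh(rook_laplacian(6, 6))

# Parsimonious alpha(omega) with lam_z > lam_x (illustrative values).
rho, sig_x, sig_z, lam_x, lam_z = 0.5, 1.0, 1.0, 0.5, 0.95
alpha = rho * sig_z / sig_x * np.sqrt(
    (1 - lam_x + lam_x * omega) / (1 - lam_z + lam_z * omega)
)

rng = np.random.default_rng(0)
X = rng.normal(size=omega.size)       # stand-in exposure vector

X_star = Gamma.T @ X                  # 1. project into the spectral domain
damped = alpha * X_star               # 2. multiply by the diagonal of A
adjustment = Gamma @ damped           # 3. project back: Gamma A Gamma^T X
```

Because `alpha` decreases with `omega` when $λ_{z} > λ_{x}$, the high-frequency components of `X` contribute least to `adjustment`.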

4.3. Semiparametric conditional autoregressive model

Mirroring § 3.2, rather than specifying a parametric joint model for $(X_{k}^{*}, Z_{k}^{*})$ , we directly specify a flexible model for the confounder adjustment, $α (ω)$ . The joint model is specified first with the conditional model

Z_{k}^{*} |X_{k}^{*} \overset{indep}{~} N \{α (ω_{k}) X_{k}^{*}, \frac{σ_{z}^{2}}{1 - λ_{z} + λ_{z} ω_{k}}\} .

In the spatial domain, this implies that $Z ∣ X ~ \mathrm{CAR} (Γ A Γ^{T} X, σ_{z}^{2}, λ_{z})$, where $A$ is diagonal with diagonal elements $\{α (ω_{1}), \dots, α (ω_{n})\}$. Therefore, with any valid marginal distribution of $X$, the joint model of $X$ and $Z$ is well defined. Since $X$ is observed, we do not need a model for its marginal distribution.

Marginalizing over the unknown $Z^{*}$ gives

Y_{k}^{*} |X_{k}^{*} \overset{indep}{~} N \{β_{0} M_{k} + β (ω_{k}) X_{k}^{*}, \frac{σ_{z}^{2}}{1 - λ_{z} + λ_{z} ω_{k}} + σ^{2}\},

(14)

where $β (ω_{k}) = β_{x} + α (ω_{k})$. Following § 3.2, we assume that $α (ω_{n}) = 0$ so that $X_{n}^{*}$ and $Z_{n}^{*}$ are uncorrelated for the highest-frequency term. This implies that $β (ω_{n}) = β_{x}$ and $E (Y_{n}^{*} ∣ X_{n}^{*}) = β_{0} M_{n} + β_{x} X_{n}^{*}$, and thus the final term supplies unbiased information about the true exposure effect $β_{x}$. Of course, a single unbiased term is insufficient for estimation, and so we further assume that $α (ω)$ varies smoothly over $ω$ to permit semiparametric estimation of $β_{x}$.

We fit model (14) with a covariate effect that is allowed to vary with $k$ to separate associations at different spatial resolutions. Although other smoothing techniques are possible, the frequency-specific coefficients are smoothed using the basis expansion $β (ω) = \sum_{l = 1}^{L} B_{l} (ω) b_{l}$ , where the $B_{l} (ω)$ are cubic B-spline basis functions and the $b_{l}$ are the associated coefficients. Analogously to the semiparametric continuous-space model in § 3.2, we employ equally spaced B-splines (Eilers & Marx, 1996) with an intrinsic autoregressive prior on ${(b_{1}, \dots, b_{L})}^{T}$ . Under the assumption that $α (ω_{n}) = 0$ , we use the posterior distribution of $β (ω_{n})$ to summarize the effect $β_{x}$ .
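As an illustration of an intrinsic autoregressive prior on the spline coefficients, the sketch below builds a first-order random-walk precision matrix. The order of the prior is an assumption made here for illustration (the text above does not fix it), and $L = 10$ and the coefficient vector are hypothetical.

```python
import numpy as np

L = 10                                   # hypothetical number of basis functions
D = np.diff(np.eye(L), axis=0)           # (L-1) x L first-difference matrix
Q = D.T @ D                              # structure matrix of the intrinsic prior

# The prior penalises differences between neighbouring coefficients:
# b^T Q b = sum_l (b_{l+1} - b_l)^2, so constant vectors are unpenalised.
b = np.linspace(0.0, 1.0, L)             # hypothetical coefficient vector
penalty = b @ Q @ b
print(penalty, np.sum(np.diff(b) ** 2))
```

The rank deficiency of `Q` (rank $L - 1$) is what makes the prior intrinsic: it carries no information about the overall level of $β (ω)$, only about its roughness.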

In the spatial domain, the semiparametric conditional autoregressive model can be written as (9), i.e.,

Y ∣ X, V ~ N (β_{0} + \sum_{l = 1}^{L} {\hat{Z}}_{l} b_{l} + V, σ^{2} I_{n}),

where $V ~ \mathrm{CAR} (0, σ_{z}^{2}, λ_{z})$, ${\hat{Z}}_{l} = Γ B_{l} Γ^{T} X$, $B_{l}$ is the diagonal matrix with spline basis functions, $\{B_{l} (ω_{1}), \dots, B_{l} (ω_{n})\}$, on the diagonal and the regression coefficients are modelled as described below (8). The constructed covariates ${\hat{Z}}_{l}$ can be precomputed prior to estimation, and thus computation resembles a standard spatial analysis with $L$ known covariates. As above, under the assumption of no confounding for large $ω$, we use the posterior of $β_{x} = \sum_{l = 1}^{L} B_{l} (ω_{n}) b_{l}$ to summarize the exposure effect. As our estimation of $β_{x}$ relies on a B-spline estimate at an endpoint, it may be associated with relatively large uncertainty. Therefore, other smoothing techniques may also be considered to test the sensitivity of the estimate to the chosen method of estimation.
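The constructed covariates ${\hat{Z}}_{l} = Γ B_{l} Γ^{T} X$ can be precomputed along the following lines. This is a sketch, not the authors' implementation: it assumes that $Γ$ and $ω_{k}$ come from the graph Laplacian $M - A$, uses a hand-written Cox-de Boor recursion with equally spaced knots on $[ω_{1}, ω_{n}]$, and the grid size, $L = 8$, the coefficient vector and the helper names are all hypothetical.

```python
import numpy as np

def rook_laplacian(nrow, ncol):
    """Graph Laplacian M - A of an nrow x ncol grid with rook neighbours."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def bspline_basis(x, L, lo, hi, degree=3):
    """Equally spaced B-spline basis with L functions (L > degree) on [lo, hi]."""
    h = (hi - lo) / (L - degree)
    t = lo + (np.arange(L + degree + 1) - degree) * h    # knots extend past [lo, hi]
    x = np.asarray(x, dtype=float)[:, None]
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)      # degree-0 indicators
    for d in range(1, degree + 1):                       # Cox-de Boor recursion
        left = (x - t[:-d - 1]) / (t[d:-1] - t[:-d - 1])
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                             # shape (len(x), L)

omega, Gamma = np.linalg.eigh(rook_laplacian(6, 6))
L = 8
B = bspline_basis(omega, L, omega.min(), omega.max())

rng = np.random.default_rng(0)
X = rng.normal(size=omega.size)                          # stand-in exposure
X_star = Gamma.T @ X

# Column l of Z_hat is Gamma diag{B_l(omega_1), ..., B_l(omega_n)} Gamma^T X.
Z_hat = Gamma @ (B * X_star[:, None])

# Exposure effect read off at the highest frequency: sum_l B_l(omega_n) b_l.
b = np.linspace(1.0, 0.5, L)                             # hypothetical coefficients
beta_x = B[-1] @ b
```

Since the basis functions sum to one at every eigenvalue, the columns of `Z_hat` add up to `X`, which gives a quick sanity check on the construction.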

5. Simulation study

5.1. Discrete space

Data are generated at $n$ locations $s_{1}, \dots, s_{n}$ on a 40 × 40 square grid with grid spacing one. The conditional autoregressive model uses rook neighbourhood structure so that $a_{i j} = 1$ if and only if $∥s_{i} - s_{j}∥ = 1$. Data are generated from $X ~ \mathrm{CAR} (0, σ_{x}^{2}, λ)$, $Z ∣ X ~ \mathrm{CAR} (β_{x z} W X, σ_{z}^{2}, λ)$ and $Y ∣ X, Z ~ N (β_{x} X + β_{z} Z, σ^{2} I_{n})$, where $W$ is the kernel smoothing matrix with bandwidth $ϕ$, i.e., $W_{i j} = w_{i j} / (\sum_{l = 1}^{n} w_{i l})$ and $\log (w_{i j}) = - {(∥s_{i} - s_{j}∥ / ϕ)}^{2}$. Including the kernel smoothed $X$ in the mean of $Z$ induces low-resolution dependence between $X$ and $Z$. In all cases we take $σ_{x}^{2} = 1.7$, $σ_{z}^{2} = 1$, $λ = 0.95$, $β_{x} = β_{z} = 0.5$ and $σ^{2} = {0.25}^{2}$, and we vary the strength of dependence via $β_{x z} \in \{0, 1, 2\}$ and the kernel bandwidth $ϕ \in \{1, 2\}$. The value of $β_{z}$ can be chosen without loss of generality, as only the product of $β_{z}$ and $σ_{z}$ can be uniquely identified. For each parameter combination, we generate 500 datasets; see the data examples in the Supplementary Material.
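One way to generate a single dataset from this design is sketched below. It is not the authors' code: it assumes that $\mathrm{CAR} (μ, σ^{2}, λ)$ has covariance $σ^{2} {\{(1 - λ) I + λ (M - A)\}}^{- 1}$ and samples through the eigendecomposition of $M - A$; the function names are hypothetical and a 10 × 10 grid is used instead of 40 × 40 to keep the example light.

```python
import numpy as np

def rook_laplacian(nrow, ncol):
    """Graph Laplacian M - A of an nrow x ncol grid with rook neighbours."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def simulate(nrow, ncol, beta_xz, phi, seed=0, sig2_x=1.7, sig2_z=1.0,
             lam=0.95, beta_x=0.5, beta_z=0.5, sig=0.25):
    """Draw (X, Z, Y) and return them with the kernel smoothing matrix W."""
    rng = np.random.default_rng(seed)
    n = nrow * ncol
    omega, Gamma = np.linalg.eigh(rook_laplacian(nrow, ncol))
    f = 1.0 / (1.0 - lam + lam * omega)          # Leroux spectral variances

    def draw_car(var):
        # Independent spectral coefficients with variance var * f, mapped back.
        return Gamma @ (np.sqrt(var * f) * rng.normal(size=n))

    # Kernel smoother with log w_ij = -(||s_i - s_j|| / phi)^2, rows normalised.
    rows, cols = np.divmod(np.arange(n), ncol)
    coords = np.column_stack([rows, cols]).astype(float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / phi**2)
    W = w / w.sum(axis=1, keepdims=True)

    X = draw_car(sig2_x)
    Z = beta_xz * (W @ X) + draw_car(sig2_z)
    Y = beta_x * X + beta_z * Z + sig * rng.normal(size=n)
    return X, Z, Y, W

X, Z, Y, W = simulate(10, 10, beta_xz=1.0, phi=2.0)
print(np.corrcoef(X, Z)[0, 1])
```

Calling `simulate(40, 40, ...)` would match the grid size used in the study, at the cost of a 1600 × 1600 eigendecomposition.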

Figure 2 plots the induced correlations in the spectral domain for each scenario with $β_{x z} > 0$ . The correlation is nonzero for only low-frequency terms when $ϕ = 2$ , but correlation spills over to high-frequency terms when $ϕ = 1$ , especially when $β_{x z} = 2$ . Therefore, the assumption of no confounding at high frequencies is questionable when $ϕ = 1$ , and these scenarios are used to examine sensitivity to this key assumption. Also, these scenarios violate the parsimonious assumption of constant correlation across frequency, and so they illustrate the effects of misspecifying the parametric model.
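The spectral-domain correlations implied by this design can be computed in closed form. The sketch below makes the same assumptions as the text (Leroux covariance through the graph Laplacian $M - A$, kernel smoother $W$) and uses the fact that $Z^{*} = β_{x z} G X^{*} + e^{*}$ with $G = Γ^{T} W Γ$, so that $\mathrm{cov} (X_{k}^{*}, Z_{k}^{*}) = β_{x z} G_{k k} \mathrm{var} (X_{k}^{*})$. The function names are hypothetical and a 10 × 10 grid replaces the 40 × 40 grid for brevity.

```python
import numpy as np

def rook_laplacian(nrow, ncol):
    """Graph Laplacian M - A of an nrow x ncol grid with rook neighbours."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def kernel_smoother(nrow, ncol, phi):
    """Row-normalised kernel matrix W with log w_ij = -(d_ij / phi)^2."""
    rows, cols = np.divmod(np.arange(nrow * ncol), ncol)
    coords = np.column_stack([rows, cols]).astype(float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / phi**2)
    return w / w.sum(axis=1, keepdims=True)

def spectral_correlation(nrow, ncol, beta_xz, phi,
                         sig2_x=1.7, sig2_z=1.0, lam=0.95):
    """cor(X*_k, Z*_k) for each eigenvalue omega_k of the grid Laplacian."""
    omega, Gamma = np.linalg.eigh(rook_laplacian(nrow, ncol))
    f = 1.0 / (1.0 - lam + lam * omega)
    G = Gamma.T @ kernel_smoother(nrow, ncol, phi) @ Gamma
    var_x = sig2_x * f                                   # var(X*_k)
    cov_xz = beta_xz * np.diag(G) * var_x                # cov(X*_k, Z*_k)
    var_z = beta_xz**2 * (G**2) @ var_x + sig2_z * f     # var(Z*_k)
    return omega, cov_xz / np.sqrt(var_x * var_z)

omega, cor_phi2 = spectral_correlation(10, 10, beta_xz=1.0, phi=2.0)
_, cor_none = spectral_correlation(10, 10, beta_xz=0.0, phi=2.0)
```

Plotting `cor_phi2` against `omega` gives a curve of the kind shown in Fig. 2, although the numbers will differ from the figure because of the smaller grid.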

Fig. 2. — Correlations in the spectral domain for the simulation study: $\mathrm{cor} (X_{k}^{*}, Z_{k}^{*})$ by $ω_{k}$ for different kernel bandwidths $(ϕ)$ and strengths of exposure/confounder dependence $(β_{x z})$; the correlations in the spatial domain (over locations) $\mathrm{cor} (X_{i}, Z_{i})$ are 0.62 when $ϕ = 1$ and $β_{x z} = 1$, 0.80 when $ϕ = 1$ and $β_{x z} = 2$, 0.44 when $ϕ = 2$ and $β_{x z} = 1$, and 0.62 when $ϕ = 2$ and $β_{x z} = 2$.

For each simulated dataset, we fit the standard Leroux conditional autoregressive model that has $β_{k} = β_{x}$ for all $k$, the parametric parsimonious bivariate conditional autoregressive model and the semiparametric model with $β_{k}$ varying across $k$ using a cubic B-spline basis expansion. We compare two priors for $σ_{b}^{2}$, the variance of the coefficient process $β_{k}$, for the semiparametric model. The penalized complexity prior shrinks the process towards the constant function $β_{k} = β_{x}$ to avoid overfitting (Franco-Villoria et al., 2019); the second prior for the variance induces a Un(0, 1) prior on the proportion of overall model variance explained by variation in $β_{k}$ to balance all levels of spatial confounding. The prior distributions for all models are given in the Supplementary Material. We fit the semiparametric models for all $L \in \{1, 5, 10, 20, 30, 40\}$ and select the number of basis functions using the deviance information criterion (Spiegelhalter et al., 2002). All methods are fit using Markov chain Monte Carlo with 25000 iterations and the first 5000 discarded as burn-in.

Table 1 compares methods in terms of the root mean squared error, bias, posterior standard deviation averaged over datasets and empirical coverage of 95% intervals for $β_{x}$ , and Fig. 3 summarizes the sampling distribution of $β_{k}$ against $ω_{k}$ . The standard method performs well in the first scenario with no unmeasured confounder, $β_{x z} = 0$ , but in all other scenarios the standard method is biased and has coverage at or near zero. The standard method allows for spatially dependent residuals, but this does not eliminate spatial confounding bias. Since the standard model assumes that $X$ and $Z$ are independent, when $X$ and $Z$ are highly correlated, all spatial variability is attributed to the exposure effect, leading to bias and small posterior standard deviation.

Table 1.

Discrete-space simulation study comparing four methods: the standard Leroux model, the parametric parsimonious model, the semiparametric model with a penalized complexity prior and the semiparametric model with a uniform prior on the proportion of variance (semi- $R^{2}$ ). Data are generated with dependence between exposure and confounder controlled by $β_{x z}$ and kernel bandwidth $ϕ$ . Standard errors are given in parentheses and all results are multiplied by 100

Scenario	Method	$ϕ$	$β_{x z}$	RMSE	Bias	SD	Cov
1	Standard	—	0	1.2 (0.0)	0.0 (0.1)	1.3 (0.0)	95.2 (1.0)
	Parametric			1.3 (0.0)	0.0 (0.1)	1.4 (0.0)	95.4 (0.9)
	Semiparametric			4.4 (0.3)	0.0 (0.2)	2.5 (0.1)	95.0 (1.0)
	Semi- $R^{2}$			4.5 (0.4)	0.0 (0.2)	2.5 (0.1)	94.8 (1.0)
2	Standard	1	1	19.0 (0.1)	18.9 (0.1)	1.4 (0.0)	0.0 (0.0)
	Parametric			14.0 (0.1)	13.8 (0.1)	1.7 (0.0)	0.0 (0.0)
	Semiparametric			7.7 (0.3)	−0.9 (0.3)	8.3 (0.1)	96.6 (0.8)
	Semi- $R^{2}$			8.0 (0.3)	−0.7 (0.4)	8.9 (0.1)	97.2 (0.7)
3	Standard	1	2	34.3 (0.1)	34.3 (0.1)	1.7 (0.0)	0.0 (0.0)
	Parametric			20.6 (0.2)	20.4 (0.1)	2.0 (0.0)	0.0 (0.0)
	Semiparametric			9.5 (0.3)	0.6 (0.4)	9.3 (0.1)	94.4 (1.0)
	Semi- $R^{2}$			9.5 (0.3)	0.8 (0.4)	9.4 (0.1)	96.0 (0.9)
4	Standard	2	1	5.6 (0.1)	5.4 (0.1)	1.4 (0.0)	3.8 (0.9)
	Parametric			2.1 (0.1)	1.2 (0.1)	1.6 (0.0)	85.4 (1.6)
	Semiparametric			8.2 (0.2)	−0.9 (0.4)	9.3 (0.1)	96.6 (0.8)
	Semi- $R^{2}$			8.2 (0.3)	−0.7 (0.4)	9.2 (0.1)	95.8 (0.9)
5	Standard	2	2	8.8 (0.1)	8.7 (0.1)	1.5 (0.0)	0.0 (0.0)
	Parametric			2.5 (0.1)	−1.1 (0.1)	1.8 (0.0)	84.0 (1.6)
	Semiparametric			10.2 (0.3)	−0.6 (0.5)	10.3 (0.1)	94.8 (1.0)
	Semi- $R^{2}$			10.4 (0.3)	−0.6 (0.5)	10.3 (0.1)	94.0 (1.1)


RMSE, root-mean-squared error; SD, average posterior standard deviation; Cov, coverage of 95% posterior intervals.
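The summary measures reported in Table 1 can be computed from per-dataset posterior summaries as in the sketch below. The function name `summarise` and the toy inputs are hypothetical; they are not results from the study.

```python
import numpy as np

def summarise(estimates, post_sd, lower, upper, truth):
    """RMSE, bias, average posterior SD and interval coverage, all times 100."""
    err = np.asarray(estimates, dtype=float) - truth
    covered = (np.asarray(lower) <= truth) & (truth <= np.asarray(upper))
    return {
        "RMSE": 100 * np.sqrt(np.mean(err**2)),
        "Bias": 100 * np.mean(err),
        "SD": 100 * np.mean(post_sd),
        "Cov": 100 * np.mean(covered),
    }

# Toy input: two datasets with posterior means, posterior SDs and 95% limits.
summary = summarise(
    estimates=[0.4, 0.6],
    post_sd=[0.05, 0.07],
    lower=[0.30, 0.55],
    upper=[0.55, 0.70],
    truth=0.5,
)
print(summary)
```

In the toy input only the first interval contains the true value, so the coverage entry is 50.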

Fig. 3. — Performance for the discrete simulation study: median (solid) and 95% confidence interval (dashed) for $β_{k}$ for the standard model (red), semiparametric model with the penalized complexity prior (green), and parametric model (blue) for data generated with dependence between exposure and confounder controlled by $β_{x z}$ and kernel bandwidth $ϕ$. The black lines are the true $β_{x} = 0.5$.

The parametric model performs well in scenario 1, and respectably in scenarios 4 and 5, where there is no confounding at high frequencies even though the parametric model assumptions do not hold. In fact, the parametric model is nearly identical to the standard model in the first case with no spatial confounding, suggesting that little is lost by allowing for a parametric confounding adjustment when it is not needed. However, the parametric model gives bias and low coverage in the cases with $ϕ = 1$, where the form of spatial confounding does not match the parametric model. The estimated $β_{k}$ curves in Fig. 3 show that the parametric form of the $β_{k}$ model cannot match the slow decline in the true correlation of Fig. 2 when $ϕ = 1$. However, the parametric model still recovers the truth reasonably well for $ϕ = 2$; it appears that the parametric model has some robustness to violations of the parsimonious coherence assumption provided that the assumption of unconfoundedness at high frequencies holds.

The semiparametric methods have low bias and coverage near the nominal level for all five scenarios. However, the posterior standard deviation is always larger for the semiparametric models than for the standard or parametric models. Therefore, in these cases, the semiparametric methods are robust, but conservative, for estimating a causal effect in the presence of spatial confounding. Surprisingly, the semiparametric methods are insensitive to the choice of prior. Despite the two prior specifications having very different motivations, the results are similar, likely because the deviance information criterion often selects a small number of basis functions that negates the influence of the prior for $σ_{b}^{2}$.

5.2. Continuous space

Data in the continuous space are generated similarly to the discrete case. The data are simulated on a 23 × 23 grid over the unit square. We simulated $X ~ N (0, σ_{x}^{2} Σ_{x})$, $Z ∣ X ~ N (β_{x z} W X, σ_{z}^{2} Σ_{z})$ and $Y ∣ X, Z ~ N (β_{x} X + β_{z} Z, σ^{2} I_{n})$, where $Σ_{j}$ is the $n \times n$ Matérn correlation matrix defined by parameters $ϕ_{j}$ and $v_{j}$, and $W$ is the kernel smoothing matrix with bandwidth $ϕ$. In all cases we take $σ_{x}^{2} = σ_{z}^{2} = 1$, spatial range parameters $ϕ_{x} = ϕ_{z} = 0.1$, $v_{x} = v_{z} = 0.5$, $β_{x} = β_{z} = 1$ and $σ^{2} = {0.25}^{2}$, and we vary $β_{x z} \in \{0, 1, 2\}$ and the kernel bandwidth $ϕ \in \{1 / 15, 2 / 15\}$. For each combination of these parameters, we generate 100 datasets. For each simulated dataset, we fit four models: the standard Matérn model in § 3.1 with $ρ = 0$, and thus no confounding adjustment, the bivariate Matérn model with common range in (6), the parsimonious Matérn model in (7) and the semiparametric model in § 3.2. Prior distributions and computing details are given in the Supplementary Material.
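A single continuous-space dataset could be generated roughly as follows. This sketch rests on two labelled assumptions: the grid covers the unit square, and the Matérn correlation with smoothness 0.5 is taken in the common parameterization where it reduces to the exponential correlation $\exp (- d / ϕ_{j})$, which may differ from the parameterization used in § 3. The function name and seed are hypothetical.

```python
import numpy as np

def simulate_continuous(beta_xz, bandwidth, m=23, seed=0, phi_x=0.1, phi_z=0.1,
                        sig2_x=1.0, sig2_z=1.0, beta_x=1.0, beta_z=1.0, sig=0.25):
    """Draw (X, Z, Y) on an m x m grid over the unit square."""
    rng = np.random.default_rng(seed)
    g = np.linspace(0.0, 1.0, m)
    coords = np.array([(a, b) for a in g for b in g])
    n = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    d = np.sqrt(d2)

    # Exponential correlation: Matern with smoothness 0.5 (assumed form).
    Sigma_x = np.exp(-d / phi_x)
    Sigma_z = np.exp(-d / phi_z)

    # Kernel smoother with log w_ij = -(d_ij / bandwidth)^2, rows normalised.
    w = np.exp(-d2 / bandwidth**2)
    W = w / w.sum(axis=1, keepdims=True)

    jitter = 1e-10 * np.eye(n)                   # numerical stabiliser
    X = np.linalg.cholesky(sig2_x * Sigma_x + jitter) @ rng.normal(size=n)
    Z = beta_xz * (W @ X) + np.linalg.cholesky(sig2_z * Sigma_z + jitter) @ rng.normal(size=n)
    Y = beta_x * X + beta_z * Z + sig * rng.normal(size=n)
    return X, Z, Y, W, Sigma_x

X, Z, Y, W, Sigma_x = simulate_continuous(beta_xz=1.0, bandwidth=1 / 15)
```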

The results mirror those in the discrete case and therefore the result table is deferred to the Supplementary Material. The semiparametric method maintains nearly the nominal coverage and low bias across all scenarios. The parametric Matérn models have bias and low coverage for the simulation settings where the data are not simulated with a Matérn covariance. The common range Matérn model dramatically reduces root-mean-square error and improves coverage compared to the parsimonious Matérn model, but neither is sufficiently flexible for these cases.

6. Real data examples

6.1. Analysis of the Scottish lip cancer dataset

All model fits in this section were carried out using the R package eCAR (R Development Core Team, 2023), which was created to fit the discrete-space methods. We first consider the well-known lip cancer dataset, see Fig. 4, available in the R package CARBayesdata. The data cover $n = 56$ districts in Scotland. Three variables are recorded for each district: the recorded number of lip cancer cases, $Y_{i}$, the expected number of lip cancer cases computed using indirect standardization based on Scotland-wide disease rates, $E_{i}$, and the percentage of the district’s workforce employed in agriculture, fishing and forestry, $X_{i}$. Since we have non-Gaussian responses, the generalized models presented in the Supplementary Material are used.

Fig. 4. — Maps for the Scotland lip cancer dataset: (a) standard mortality ratio and (b) mortality rate in the agriculture, fishing and forestry workforce.

We fit the spatial Poisson regression model $Y_{i} ∣ θ_{i} \overset{indep}{~} \mathrm{Po} \{E_{i} \exp (θ_{i})\}$, where $θ_{i}$ is the log relative risk in district $i$. We model $θ = {(θ_{1}, \dots, θ_{n})}^{T}$ as $θ ∣ X ~ \mathrm{CAR} (β_{0} + β_{x} X + Γ A Γ^{T} X, σ_{z}^{2}, λ_{z})$. For the parametric approach, we use the same priors as in § 5.1 and, for the semiparametric model, we employ INLA (Rue et al., 2009) and use the penalized complexity prior with $L = 10$ basis functions, chosen via the deviance information criterion; results are stable for $L \in \{10, 20, 30, 40, 50\}$. We also fit two standard nonspectral methods where $β_{k}$ is constant over $ω_{k}$: a Poisson regression with the percentage of the workforce employed in agriculture, fishing and forestry as the covariate and a Poisson regression that includes Leroux conditional autoregressive random effects.

Figure 5 plots the posterior of $\exp (β_{k})$ by the eigenvalue $ω_{k}$ for each model. The standard methods have positive posterior mean and their 95% intervals exclude 1, indicating a significant increase in risk for lip cancer for a unit difference in the percentage of the workforce employed in agriculture, fishing and forestry. The spectral methods, which attempt to account for spatial confounding, do not agree with the standard methods: the estimated $\exp (β_{k})$ trends toward 1 for large $ω_{k}$, meaning that the results of the standard models should be interpreted with caution because the strength of the relationship between these variables is weak at the local spatial scale. These results are consistent with a missing confounding variable with the same large-scale spatial pattern as lip cancer disease and the percentage of the workforce employed in agriculture, fishing and forestry.

Fig. 5. — Effect of the percentage of the workforce employed in agriculture, fishing and forestry on lip cancer in Scotland: posterior mean (solid lines) and 95% credible interval (dashed lines) of $\exp (β_{k})$ for the spectral parametric model (black), the spectral semiparametric model with $L = 10$ (green), a Poisson regression on the percentage of the workforce employed in agriculture, fishing and forestry (red) and a Poisson regression with residuals modelled as Leroux (blue).

6.2. Analysis of COVID-19 mortality and PM_2.5 exposure

Wu et al. (2020) noted that many of the pre-existing conditions that increase the mortality risk of COVID-19 are connected with long-term exposure to air pollution. They conducted a study and found that a difference of $1 μ g / m^{3}$ in ambient fine particulate matter, PM_2.5, is positively associated with a 15% difference in the COVID-19 mortality rate. To further illustrate our proposed methods, we analyse the data collected by Wu et al. (2020) in an attempt to estimate the effect of PM_2.5 on COVID-19 mortality using spatial methods.

The response is the cumulative COVID-19 mortality counts through May 12, 2020 for US counties. County-level exposure to PM_2.5 was calculated by averaging results from an established exposure prediction model for the years 2000–16; see Wu et al. (2020) for more details. Eight counties and 12 Virginia cities were missing from the database, so we imputed their values using neighbourhood means with neighbours defined by counties that share a boundary. This resulted in mortality counts and PM_2.5 measures for $n = 3109$ counties; see Fig. 6. In addition to PM_2.5 exposure, 20 potential confounding variables, e.g., the percentage of the population at least 65 years old, are included in our modelling; see Wu et al. (2020) for the complete set of potential confounding covariates. For county $i$, denote $Y_{i}$ as the number of deaths attributed to COVID-19, $E_{i}$ as the population, $X_{i}$ as the average PM_2.5 and $C_{i}$ as the vector of 20 known confounding variables. Similar to Wu et al. (2020), we fit a negative-binomial regression model $Y_{i} ∣ X_{i}, Z_{i}, C_{i} \overset{indep}{~} \mathrm{NegBin} (r_{i}, p_{i})$, where $r_{i}$ is the size parameter and $p_{i}$ the probability of success. Under this model, the mean is $E (Y_{i} ∣ X_{i}, Z_{i}, C_{i}) = λ_{i} = r_{i} (1 - p_{i}) / p_{i}$. We parameterize the model in terms of $λ_{i}$ and $r_{i}$. The prior is $\log (r_{i}) ~ N (0, 10)$ and the mean is linked to the linear predictor as $\log (λ_{i}) = \log (E_{i}) + θ_{i}$, where $θ_{i} = β_{x} X_{i} + Z_{i} + C_{i}^{'} β_{c}$, the offset term $E_{i}$ is the county population and $β_{c}$ is a vector of regression coefficients associated with the confounding variables. Following the non-Gaussian models in the Supplementary Material, the linear predictor becomes $θ ∣ X ~ \mathrm{CAR} (β_{0} + β_{x} X + Γ A Γ^{T} X + β_{c} C, σ_{z}^{2}, λ_{z})$, where $C$ is a design matrix that includes an intercept term. We fit this model using the parametric and semiparametric approaches detailed in § 4.2 and § 4.3. The negative-binomial model is chosen to mimic the analysis in Wu et al. (2020). We also considered a binomial and Poisson model and inferences were relatively unchanged.
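The mean parameterization of the negative-binomial model can be illustrated as below. Solving $λ_{i} = r_{i} (1 - p_{i}) / p_{i}$ for the success probability gives $p_{i} = r_{i} / (r_{i} + λ_{i})$. The populations, linear predictors and size parameter in this sketch are hypothetical values, and the function name `negbin_prob` is illustrative.

```python
import numpy as np

def negbin_prob(mean, size):
    """Success probability p such that size * (1 - p) / p equals mean."""
    return size / (size + mean)

E = np.array([1000.0, 50000.0])          # hypothetical county populations
theta = np.array([-7.0, -8.0])           # hypothetical linear predictors
r = 2.0                                  # hypothetical size parameter

lam = np.exp(np.log(E) + theta)          # log(lambda_i) = log(E_i) + theta_i
p = negbin_prob(lam, r)
implied_mean = r * (1 - p) / p           # recovers lambda_i

# NumPy's negative binomial counts failures before `r` successes,
# which has mean r * (1 - p) / p, matching the parameterization above.
rng = np.random.default_rng(1)
draws = rng.negative_binomial(r, p, size=(5, 2))
```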

Fig. 6. — (a) Average PM_2.5 ($μ g / m^{3}$) over 2000–16 and (b) the log COVID-19 mortality rate through May 12, 2020. Counties with no deaths are shaded grey.

We compare our method to a variant of the model employed in Wu et al. (2020), which we refer to as the standard spatial model, i.e., a negative-binomial regression with all control variables and county random effects modelled using the Leroux model. Following Wu et al. (2020), two separate analyses using all $n = 3109$ counties and $n = 1977$ counties that reported at least 10 confirmed COVID-19 deaths were conducted; this was done to account for the fact that the size of an outbreak in a given county may be positively associated with both the COVID-19 mortality rate and PM_2.5, thus introducing confounding bias.

Figure 7 displays model fits using the full and reduced data. The estimated difference in the COVID-19 mortality rate, associated with a difference of $1 μ g / m^{3}$ of PM_2.5, under the standard spatial model is 16% (95% confidence interval: 1.08, 1.25), and 12% (95% confidence interval: 1.04, 1.21) in the full and reduced analyses, respectively. The posterior mean estimates from the parametric and semiparametric spectral models generally agree with the standard spatial approach, but the posterior standard deviation is higher for the spectral methods. In this analysis, the spectral methods support the standard spatial model and serve as a check of sensitivity to adjustments for missing confounders.

Fig. 7. — Results for the COVID-19 example: the posterior mean (solid) and 95% credible interval (dashed) of the mortality rate ratio associated with a difference of $1 μ g / m^{3}$ of PM_2.5, $\exp (β_{k})$. Results are for (a) all $n = 3109$ counties and (b) $n = 1977$ counties that reported at least 10 confirmed COVID-19 deaths. The standard spatial approach refers to a regression model including all confounders and a spatial Leroux model for county random effects.

7. Discussion

Our work is the first to study the problem of spatial confounding in the spectral domain and model the coherence between the exposure and the unmeasured spatial confounder. We provide theoretical results that help understand the study in Paciorek (2010), where extensive simulations illustrate the bias obtained under different combinations of spatial scales for $X$ and $Z$ . New spectral methods are also proposed to adjust for unmeasured spatial confounding variables, and sufficient conditions are provided to ensure that the exposure effect is identifiable, including the important case without a nugget effect in the treatment and/or response variable. These ideas are developed for continuous and discrete spatial domains, and Gaussian and non-Gaussian data.

We have assumed that $X$ and $Z$ are stationary Gaussian processes in developing our methodology. Such modelling assumptions often work well with weak nonstationarity. With strong nonstationarity under model misspecification, we believe that the semiparametric model is more robust, and that $β_{x}$ estimation will be primarily driven by observations from the regions that have variations at smaller spatial scales. We recommend gravitating towards the semiparametric approach. The parametric parsimonious model depends on scale-invariant coherence between the exposure and the unmeasured confounder; it also relies on parsimonious parameterization for estimation of $α (ω)$. The semiparametric model depends on the assumption that their coherence tends to zero for large frequencies. While neither of these assumptions is empirically verifiable, we believe that the latter assumption is more reasonable in practice. The semiparametric methods are also easier to implement computationally as the confounding adjustment takes the form of smoothed covariates; it is straightforward to pass these constructed variables into standard spatial computing packages. However, the continuous-space semiparametric model requires numerical approximations to integrals. When the exposure observations are highly spatially irregular, our implementation in § 3.2 can be problematic. In this case, we recommend seeking other numerical approximations to integrals.

Acknowledgement

This work was partially supported by the National Institutes of Health and the King Abdullah University of Science and Technology.

Footnotes

Supplementary material

The Supplementary Material includes an extension of the proposed methods to multiple predictors and non-Gaussian observations, details of the causal assumptions for the spatial framework, proofs of the lemmas and theorems, prior distributions and computing details for both discrete and continuous cases, and a results table for the continuous-space simulation study. The R code is available at https://github.com/yawenguan/spatial_confounding.

Contributor Information

YAWEN GUAN, Department of Statistics, University of Nebraska, 343C Hardin Hall, Lincoln, Nebraska 68583, U.S.A.

GARRITT L. PAGE, Department of Statistics, Brigham Young University, 238 TMCB, Provo, Utah 84602, U.S.A.

BRIAN J. REICH, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, U.S.A.

MASSIMO VENTRUCCI, Department of Statistical Sciences, University of Bologna, Via Zamboni 33, Bologna 40126, Italy.

SHU YANG, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, U.S.A.

References

Apanasovich TV, Genton MG & Sun Y (2012). A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components. J. Am. Statist. Assoc. 107, 180–93.
Chen K, Dai F, Marchiori E & Theodoridis S (2021). Novel compressible adaptive spectral mixture kernels for Gaussian processes with sparse time and phase delay structures. arXiv: 1808.00560v8.
Clayton DG, Bernardinelli L & Montomoli C (1993). Spatial correlation in ecological analysis. Int. J. Epidemiol. 22, 1193–202.
Dupont E, Wood SN & Augustin N (2022). Spatial+: a novel approach to spatial confounding. Biometrics 78, 1279–90.
Eilers PHC & Marx BD (1996). Flexible smoothing with B-splines and penalties. Statist. Sci. 11, 89–102.
Faes L, Krohova J, Pernice R, Busacca A & Javorka M (2019). A new frequency domain measure of causality based on partial spectral decomposition of autoregressive processes and its application to cardiovascular interactions. In 2019 41st Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), pp. 4258–61. Piscataway, NJ: IEEE Press.
Franco-Villoria M, Ventrucci M & Rue H (2019). A unified view on Bayesian varying coefficient models. Electron. J. Statist. 13, 5334–59.
Fuentes M & Reich B (2010). Spectral domain. In Handbook of Spatial Statistics, Gelfand AE, Diggle P, Fuentes M & Guttorp P, eds. Boca Raton, FL: CRC Press, pp. 57–77.
Gelfand AE, Diggle P, Fuentes M & Guttorp P, eds. (2010). Handbook of Spatial Statistics. Boca Raton, FL: CRC Press.
Gneiting T, Kleiber W & Schlather M (2010). Matérn cross-covariance functions for multivariate random fields. J. Am. Statist. Assoc. 105, 1167–77.
Hanks EM, Schliep EM, Hooten MB & Hoeting JA (2015). Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification. Environmetrics 26, 243–54.
Hodges JS & Reich BJ (2010). Adding spatially-correlated errors can mess up the fixed effect you love. Am. Statistician 64, 325–34.
Hughes J & Haran M (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J. R. Statist. Soc. B 75, 139–59.
Jang PA, Loeb A, Davidow M & Wilson AG (2017). Scalable Levy process priors for spectral kernel learning. In Advances in Neural Information Processing Systems, pp. 3943–52. New York: Curran Associates.
Johnson C (2012). Numerical Solution of Partial Differential Equations by the Finite Element Method. New York: Dover Publications.
Keller JP & Szpiro AA (2020). Selecting a scale for spatial confounding adjustment. J. R. Statist. Soc. A 183, 1121–43.
Khan K & Calder CA (2022). Restricted spatial regression methods: implications for inference. J. Am. Statist. Assoc. 117, 482–94.
Kleiber W (2017). Coherence for multivariate random fields. Statist. Sinica 27, 1675–97.
Leroux BG, Lei X & Breslow N (2000). Estimation of disease rates in small areas: a new mixed model for spatial dependence. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, Halloran ME & Berry D, eds. New York: Springer, pp. 179–91.
Lindgren F, Rue H & Lindström J (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Statist. Soc. B 73, 423–98.
Marques I, Kneib T & Klein N (2022). Mitigating spatial confounding by explicitly correlating Gaussian random fields. Environmetrics 33, e2727.
Osama M, Zachariah D & Schön TB (2019). Inferring heterogeneous causal effects in presence of spatial confounding. In Proc. 36th Int. Conf. Mach. Learn., Long Beach, California, vol. 97, Chaudhuri K & Salakhutdinov R, eds. PMLR, pp. 4942–50.
Paciorek CJ (2010). The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Statist. Sci. 25, 107–25.
Page GL, Liu Y, He Z & Sun D (2017). Estimation and prediction in the presence of spatial confounding for spatial linear models. Scand. J. Statist. 44, 780–97.
Prates MO, Assunção RM & Rodrigues EC (2019). Alleviating spatial confounding for areal data problems by displacing the geographical centroids. Bayesian Anal. 14, 623–47.
Qadir GA & Sun Y (2020). Semiparametric estimation of cross-covariance functions for multivariate random fields. Biometrics 77, 547–60.
R Development Core Team (2023). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0, http://www.R-project.org.
Reich BJ & Fuentes M (2012). Nonparametric Bayesian models for a spatial covariance. Statist. Methodol. 9, 265–74.
Reich BJ, Hodges JS & Zadnik V (2006). Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62, 1197–206.
Reich BJ, Yang S, Guan Y, Giffin AB, Miller MJ & Rappold A (2021). A review of spatial causal inference methods for environmental and epidemiological applications. Int. Statist. Rev. 89, 605–34.
Rue H, Martino S & Chopin N (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Statist. Soc. B 71, 319–92.
Sandryhaila A & Moura JMF (2013). Discrete signal processing on graphs: graph Fourier transform. In 2013 IEEE Int. Conf. Acoust. Speech Sig. Proces., pp. 6167–70. Piscataway, NJ: IEEE Press.
Schnell P & Papadogeorgou G (2020). Mitigating unobserved spatial confounding when estimating the effect of supermarket access on cardiovascular disease deaths. Ann. Appl. Statist. 14, 2069–95.
Spiegelhalter DJ, Best NG, Carlin BP & Van Der Linde A (2002). Bayesian measures of model complexity and fit. J. R. Statist. Soc. B 64, 583–639.
Stokes PA & Purdon PL (2017). A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proc. Nat. Acad. Sci. 114, E7063–72.
Thaden H & Kneib T (2018). Structural equation models for dealing with spatial confounding. Am. Statistician 72, 239–52. [Google Scholar]
Watson GN (1995). A Treatise on the Theory of Bessel Functions. Cambridge: Cambridge University Press. [Google Scholar]
Wu X, Nethery RC, Sabath BM, Braun D & Dominici F (2020). Air pollution and COVID-19 mortality in the United States: strengths and limitations of an ecological regression analysis. Sci. Adv 6, 1–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zimmerman DL & Ver Hoef JM (2022). On deconfounding spatial confounding in linear models. Am. Statistician 76, 159–67.
