Author manuscript; available in PMC: 2023 Jan 26.
Published in final edited form as: Bayesian Anal. 2023 Jun 1;19(2):465–500. doi: 10.1214/22-BA1357

Generalized Geographically Weighted Regression Model within a Modularized Bayesian Framework*

Yang Liu †,§, Robert J B Goudie
PMCID: PMC7614111  EMSID: EMS160152  PMID: 36714467

Abstract

Geographically weighted regression (GWR) models handle geographical dependence through a spatially varying coefficient model and have been widely used in applied science, but their general Bayesian extension is unclear because it involves a weighted log-likelihood which does not imply a probability distribution on data. We present a Bayesian GWR model and show that its essence is dealing with partial misspecification of the model. Current modularized Bayesian inference models accommodate partial misspecification from a single component of the model. We extend these models to handle partial misspecification in more than one component of the model, as required for our Bayesian GWR model. Information from the various spatial locations is manipulated via a geographically weighted kernel and the optimal manipulation is chosen according to a Kullback–Leibler (KL) divergence. We justify the model via an information risk minimization approach and show the consistency of the proposed estimator in terms of a geographically weighted KL divergence.

Keywords: geographically weighted regression, modularized Bayesian, cutting feedback, model misspecification, power likelihood

MSC2020 subject classifications: Primary 62F15, 62J12

1. Introduction

Conventional regression models have been widely used in various studies to infer the association between variables. While basic regression models often assume an independent sampling scheme, geographical dependence must be taken into consideration when the dataset or sampling scheme has a spatial structure. Therefore, rather than assuming a constant association between variables with constant coefficients, models with geographically-variable coefficients have been proposed for this purpose. Suppose we have observations (Xi, Yi) at sampling location i with coordinates (ui, vi), i = 0,…, n. We assume that the unknown true data generating process of the outcome Yi, given the covariate vector Xi, is pˇi(Yi|Xi) at a particular location i. To model pˇi, we assume a generalized linear model (GLM) 𝔼(Yi|Xi) = g⁻¹(Xiφ(ui, vi)), with link function g, and where the coefficient φ(u, v) is a smooth function with respect to (u, v). For simplicity, we define φi := φ(ui, vi) for location i.

In addition to the coefficient φi, for some generalized linear regression models, such as negative binomial or beta regression, for each location i there is an additional parameter θi that determines the variability (scale) of the distribution. The additional parameter θi is usually regarded as a nuisance parameter. This variability could be attributed to sampling or measurement errors, which may be different at different locations. We assume throughout this paper that θi is similar, but not the same, across spatial locations but the variability is not spatially smooth. For instance, consider a variability induced by a difference in measurement equipment: each location may have arbitrarily used different measurement equipment, and consequently the variabilities of observations at different locations are not constant but also not spatially smooth. We denote the likelihood of the GLM as p(Yi|θi, φi) at location i. For example, in the case of a negative binomial likelihood, with a log link function, the likelihood is:

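$$p(Y_i \mid \theta_i, \varphi_i) = \frac{\Gamma(Y_i + \theta_i^{-1})}{\Gamma(\theta_i^{-1})\,\Gamma(Y_i + 1)} \left(\frac{1}{1 + \theta_i \exp(X_i\varphi_i)}\right)^{\theta_i^{-1}} \left(\frac{\theta_i \exp(X_i\varphi_i)}{1 + \theta_i \exp(X_i\varphi_i)}\right)^{Y_i}. \tag{1.1}$$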
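
As a minimal sketch of how (1.1) could be evaluated numerically, the following Python function computes its logarithm for a single observation. The function name and argument layout are illustrative assumptions rather than part of the paper; θ is read as the dispersion, so that θ⁻¹ plays the role of the negative binomial size parameter and exp(Xφ) is the mean.

```python
import math

def neg_binomial_logpmf(y, x, phi, theta):
    """Log of the negative binomial likelihood (1.1) with a log link.

    y: observed count; x: covariate vector; phi: coefficient vector;
    theta: dispersion parameter (hypothetical argument names).
    """
    # Mean under the log link: mu = exp(x . phi)
    mu = math.exp(sum(xk * pk for xk, pk in zip(x, phi)))
    r = 1.0 / theta  # theta^{-1}
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(1.0 / (1.0 + theta * mu))
            + y * math.log(theta * mu / (1.0 + theta * mu)))
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.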

We assume that a single location i = 0 is of primary interest, and our first aim is to estimate φ0 and θ0 at this location. We will then consider the case when multiple locations are of interest.

Several modelling approaches have been proposed for geographically variable coefficients. One class of approaches involves clustering locations into groups and considering a group-wise estimation of the coefficients. For example, Li and Sang (2019) proposed spatially clustered coefficient (SCC) regression that adds a penalty term to the residual sum of squares such that differences of coefficients for neighbouring locations are penalized and consequently locations may share the same coefficient. Sugasawa and Murakami (2021) proposed a partially clustered regression that allocates locations into groups, with locations sharing the same coefficients within a group. Luo et al. (2021) proposed Bayesian spatially clustered coefficient regression, with the spatial clustering defined via the connected components of an undirected graph induced by a spanning tree. This class of models has also been extended to spatial vector autoregressive regression (Yan et al., 2021). Another class of approaches is the Bayesian spatially varying coefficient (SVC) model (Gelfand et al., 2003) and its extensions (e.g., Paez et al., 2005; Finley et al., 2007; Berrocal et al., 2010; Reich et al., 2010; Zhao et al., 2020; Dambon et al., 2021). These have been developed within a standard Bayesian framework with geographically varying associations 𝔼(Yi|Xi) = Xiφi. SVC models induce geographical dependence via a random spatial adjustment to coefficients, such as φi = φfix + φrandom,i, where {φrandom,i : i = 1,…, n} are modeled by a Gaussian random field with a covariance structure corresponding to the geographical dependence of n locations. SVC models share the power of hierarchical modeling (Gelfand and Banerjee, 2017) via their similarity to spatial hierarchical models, which use a random Gaussian field to model the regression error (e.g., Zhu et al., 2005; Lin, 2010; Afroughi et al., 2011; Fuglstad et al., 2015; Utazi et al., 2019; Marques et al., 2020).
SVC models do not involve any geographical weight function so the probability density is always proper and standard Bayesian inference can be applied. However, sampling from the posterior distribution under SVC models can be challenging because the dimension of the parameters (i.e., {φrandom,i : i = 1,…, n}) increases with the number of locations, making both sampling the parameters and inverting the spatial covariance matrix computationally difficult. This issue can be avoided in the Gaussian case because {φrandom,i : i = 1,…, n} can be integrated out to obtain the marginal likelihood of φfix explicitly and the conditional posterior of {φrandom,i : i = 1,…, n} given φfix is analytically tractable. However, in generalized linear models marginalization of {φrandom,i : i = 1,…, n} is not usually feasible, e.g. for binary and Poisson outcomes (Banerjee et al., 2008), and so sampling and computation could be problematic in practice due to the high dimension of the parameters when there are many locations (Sugasawa and Murakami, 2021). To address this computational issue, an efficient approach named PICAR was recently developed (Lee and Haran, 2022). The idea is to discretize the underlying spatial random field on a triangular mesh to reduce the dimension.

Other attractive and simpler alternatives are geographically weighted regression (GWR) models (e.g., Fotheringham et al., 1996; Brunsdon et al., 1996) and their extensions (e.g., Nakaya et al., 2005; Chen et al., 2012; da Silva and Rodrigues, 2014; da Silva and de Oliveira Lima, 2017; Mu et al., 2018; Liu et al., 2018; Li and Fotheringham, 2020; Tasyurek and Celik, 2020; Wu et al., 2021), which have been widely adopted in many spatial application areas (e.g., Windle et al., 2009; Duan and Li, 2016; Mayfield et al., 2018; Wang et al., 2019; Wu, 2020; Mohammed et al., 2022). GWR models appeal to the first law of geography to justify borrowing data sampled from neighbouring locations when the samples at the location of interest i = 0 are too few to estimate the parameters at this location accurately on their own. The first law of geography states that ‘everything is related to everything else, but near things are more related than distant things’ (Tobler, 1970). “Borrowing” samples from neighbouring locations to support the estimation of φ0 should decrease the variance of estimates, although bias might be introduced.

For now, assume we have m observations Yi,1:m = (Yi,1, Yi,2,…, Yi,m) at each location i, with i = 0,…, n. The complete set of observations is Y0:n,1:m = (Y0,1:m, Y1,1:m,…, Yn,1:m) with corresponding location-specific parameters θ0:n = (θ0,…, θn). We assume that Yi,1,…, Yi,m are independent and identically distributed observations of the random variable yi at location i. In addition, we assume Y0,1:m,…, Yn,1:m are independent but not necessarily identically distributed. Let di be the geographic distance between the location of interest 0 and location i. The generalized GWR likelihood is a locally-weighted likelihood:

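$$p(Y_{0:n,1:m} \mid \theta_{0:n}, \varphi_0) = p(Y_{0,1:m} \mid \theta_0, \varphi_0) \prod_{i=1}^{n} p(Y_{i,1:m} \mid \theta_i, \varphi_0)^{W(d_i,\eta)}, \tag{1.2}$$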

with coefficient φ0 and where W(di, η) is a geographically weighted kernel, with bandwidth η, determined by the distance. Following the first law of geography, geographically weighted kernels gradually decrease to 0 as the distance di increases. One popular choice of weighted kernel is a Gaussian kernel (Brunsdon et al., 1996)

$$W(d_i, \eta) = \exp\left(-\frac{d_i^2}{\eta^2}\right), \tag{1.3}$$

where η is a geographical bandwidth which regulates the kernel size.
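
The weighting in (1.2) and (1.3) can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the Gaussian kernel has the form exp(−d²/η²), and the function names and the convention that per-location log-likelihoods are supplied as a list (location 0 first) are assumptions made here, not part of the paper.

```python
import math

def gaussian_kernel(d, eta):
    """Gaussian kernel, assumed here to be exp(-d^2 / eta^2) as in (1.3)."""
    return math.exp(-(d * d) / (eta * eta))

def weighted_loglik(logliks, distances, eta):
    """Log of the locally-weighted likelihood (1.2).

    logliks[i] is log p(Y_{i,1:m} | theta_i, phi_0); index 0 is the
    location of interest and always receives weight 1.
    """
    total = logliks[0]
    for ll, d in zip(logliks[1:], distances[1:]):
        # Raising a likelihood to the power W multiplies its log by W
        total += gaussian_kernel(d, eta) * ll
    return total
```

A small bandwidth drives the weights of distant locations towards 0, so their log-likelihood terms contribute little; a large bandwidth drives all weights towards 1.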

Inference for GWR models has usually been conducted in a frequentist framework, but a Bayesian extension of the GWR model would allow introduction of prior information, and also simplify situations where the covariance of the estimator is not easily obtainable. However, Bayesian inference for general GWR models is not immediately clear since (1.2) is not in general a proper probability density if the power terms are not 1. Hence, Bayes’ theorem does not apply. In the special case of a Gaussian likelihood, W(di, η) can be viewed as a scale parameter of θi and thus we obtain a proper probability density. This special case has previously been considered, allowing inference for Gaussian GWR models within a standard Bayesian framework (Subedi et al., 2018; Ma et al., 2020). However, a Bayesian extension for a broader distribution family is unclear, and to the best of our knowledge, no previous papers have considered this problem.

In this article, we extend the generalized GWR model to the Bayesian framework and justify its usage. Observe that (1.2), ignoring the power terms, treats data sampled from neighbouring locations i, i ≠ 0, as if they share the same relationship with covariate Xi as data sampled from the location of interest i = 0. This inevitably leads to the problem of misspecification since φi ≠ φ0 due to the spatial non-stationarity. The degree of misspecification depends on the total variation of φi. This observation suggests that the essence of the Bayesian GWR model is dealing with misspecification due to incorporating extra observations from neighbouring locations, and it inspired us to draw ideas from the literature on partial misspecification of Bayesian models and on modularized Bayesian analysis (Liu et al., 2009). The model involves a geographically powered posterior, with the power term being a deterministic functional form of the geographical distance. The contribution from each location to the inference of the parameter of interest is manipulated through a geographical bandwidth in the power term and we discuss the optimal selection of this bandwidth so that the negative impact from misspecification and positive impact from extra observations are well balanced. We show some theoretical properties of the model and outline the algorithm.

2. Robust Bayesian Inference and Modularization

Several attractive properties of Bayesian inference rely on the correct specification of the model. However, it is generally impossible to ensure the correct specification of a complete Bayesian model. Here, we adopt the M-closed view that a model is correctly specified if the true data generating process pˇ(Y) is exactly equal to a parametric distribution pψ0(Y|ψ0), given parameters ψ0 ∈ Ψ, which is subsequently referred to as the likelihood (Bissiri et al., 2016). Misspecification might exist in all aspects of the model, or in only a few components (or modules in the terminology of Liu et al., 2009) of the model.

In the case of all aspects of the model being misspecified, modification of the conventional Bayesian model is required to improve the robustness of the model. One approach is to raise the likelihood to a power term and regard its logarithm as a loss function (Friel and Pettitt, 2008; Bissiri et al., 2016; Holmes and Walker, 2017), to obtain a weighted likelihood similar to the generalized GWR in (1.2):

$$p_{\mathrm{pow},\eta}(\psi \mid Y) \propto p_\psi(Y \mid \psi)^{\eta}\,\pi(\psi). \tag{2.1}$$

This is called the power posterior or fractional posterior, with power η. While weighted likelihoods have a long history in frequentist statistics (e.g., Cai et al., 2000; Markatou, 2000; Hu and Zidek, 2002; Biswas et al., 2015), it is only recently that justification of their usage in Bayesian statistics has been studied. One interpretation of the power term is that it adjusts the sample size with a multiplier η (Miller and Dunson, 2019). Another interpretation is that it is equivalent to a data-dependent prior (Martin et al., 2017). Miller and Dunson (2019) further argue that (2.1) approximates p(ψ | DKL(pψ(·|ψ), p̂(·)) < R) under mild conditions, where the Kullback-Leibler (KL) divergence is DKL(p̂(·), pψ(·|ψ)) = ∫ p̂(y) log(p̂(y)/pψ(y|ψ)) dy and R is determined by the number of samples and the power η. The contraction of the power posterior is shown by Bhattacharya et al. (2019). These papers suggest that, under an M-open view, where the true data generating process does not belong to the parametric family specified by the likelihood, inference can proceed by looking for parameters whose likelihood approximates the true data generating process. In addition, an appropriate choice of η can accommodate the departure of the misspecified pψ(Y|ψ) from the truth pˇ(Y), making the model robust (Miller and Dunson, 2019). Importantly, the power η controls the relative credence given to the observed data and the prior; consequently it is not regarded as a parameter. Therefore, no prior is assigned to η and it is not updated via Bayes’ theorem.
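
The sample-size interpretation of the power η can be made concrete with a conjugate toy model. The sketch below is an assumption-laden illustration, not the paper's model: it takes Bernoulli data with a Beta(a, b) prior, for which raising the likelihood to the power η in (2.1) gives a Beta(a + η·successes, b + η·failures) power posterior. All function names are hypothetical.

```python
def power_posterior_beta(successes, failures, eta, a=1.0, b=1.0):
    """Beta parameters of the power posterior (2.1) for Bernoulli data.

    The likelihood psi^s (1 - psi)^f raised to the power eta is
    psi^(eta*s) (1 - psi)^(eta*f), so the counts are simply scaled by eta.
    """
    return a + eta * successes, b + eta * failures

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var
```

With η = 1 this is the standard posterior, with η = 0 it returns the prior, and intermediate values widen the posterior as if fewer observations had been collected.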

In the case of partial misspecification, misspecification of even a single module can cause incorrect estimation of other modules, even if these modules are correctly specified (Plummer, 2015; Liu and Goudie, 2022). Consider the two module model illustrated in Figure 1, with likelihood terms p(Y|θ, φ) and p(Z|φ), and prior terms π(θ) and π(φ). The posterior distribution, with parameters of interest ψ = (θ, φ), is

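$$p(\psi \mid Y, Z) = p(\theta \mid Y, \varphi)\, p(\varphi \mid Y, Z) = \frac{p(Y \mid \theta, \varphi)\,\pi(\theta)}{p(Y \mid \varphi)} \cdot \frac{p(Y \mid \varphi)\, p(Z \mid \varphi)\,\pi(\varphi)}{p(Y, Z)}.$$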

Figure 1. DAG representation of a two module model. The modules are separated by a dashed line.

Suppose that the specification of the likelihood for Y is suspected to be incorrect. If we wish to prevent Y affecting estimation of φ, then we can use the cut distribution (Lunn et al., 2009), defined for this model as

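$$p_{\mathrm{cut}}(\psi \mid Y, Z) := p(\theta \mid Y, \varphi)\, p(\varphi \mid Z) = \frac{p(Y \mid \theta, \varphi)\,\pi(\theta)}{p(Y \mid \varphi)} \cdot \frac{p(Z \mid \varphi)\,\pi(\varphi)}{p(Z)}.$$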

Note that under the cut distribution φ depends on only the data Z; the data Y makes no contribution to the estimation of φ. This is called “cutting the feedback” (Lunn et al., 2009). This model has been used for Bayesian propensity scores (e.g., McCandless et al., 2010; Kaplan and Chen, 2012; Zigler and Dominici, 2014) where feedback from the outcome module to the propensity score module should be removed (Rubin, 2008; Zigler et al., 2013). It has also been used in various other fields (e.g., Blangiardo et al., 2011; Arendt et al., 2012; Frank et al., 2019).
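
The two-stage structure of the cut distribution can be sketched for a toy conjugate model. Everything model-specific here is an assumption chosen for tractability and is not taken from the paper: Z_j ~ N(φ, 1), Y_k ~ N(θ + φ, 1), with independent N(0, 100) priors on θ and φ. Sampling proceeds by drawing φ from p(φ|Z) and then θ from p(θ|Y, φ).

```python
import random

def normal_posterior(data, prior_mean, prior_var, noise_var=1.0):
    """Conjugate update for a normal mean with known noise variance."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision

def sample_cut(Y, Z, n_draws=2000, seed=0):
    """Draw (theta, phi) pairs from the cut distribution of the toy model."""
    rng = random.Random(seed)
    # Stage 1: p(phi | Z); the suspect data Y are deliberately not used
    phi_mean, phi_var = normal_posterior(Z, 0.0, 100.0)
    draws = []
    for _ in range(n_draws):
        phi = rng.gauss(phi_mean, phi_var ** 0.5)
        # Stage 2: p(theta | Y, phi), since Y - phi ~ N(theta, 1)
        th_mean, th_var = normal_posterior([y - phi for y in Y], 0.0, 100.0)
        theta = rng.gauss(th_mean, th_var ** 0.5)
        draws.append((theta, phi))
    return draws
```

Because the first stage never touches Y, changing Y alters the θ draws but leaves the φ draws unchanged, which is exactly the "cutting the feedback" behaviour described above. In non-conjugate models the second stage generally requires more elaborate computation than this direct sampling.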

The cut distribution and the standard posterior are two extremes: all information from the suspect module is either removed or retained. However, completely cutting or retaining the feedback from the suspect module might either lose usable information or introduce excessive bias. To control the feedback from the potentially misspecified module, a combination of the power posterior and cut model was recently proposed by Carmona and Nicholls (2020). Their Semi-Modular Inference (SMI) model introduces an auxiliary variable θ˜, which has the same distribution as θ, to regulate the contributions to the estimation of φ. Given a prior π(φ,θ˜), the SMI distribution of the augmented parameter (θ,θ˜,φ) is

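$$p_\eta(\theta, \tilde\theta, \varphi \mid Y, Z) = p_{\mathrm{pow},\eta}(\tilde\theta, \varphi \mid Y, Z)\, p(\theta \mid Y, \varphi),$$

where

$$p_{\mathrm{pow},\eta}(\tilde\theta, \varphi \mid Y, Z) \propto p(Z \mid \varphi)\, p(Y \mid \varphi, \tilde\theta)^{\eta}\, \pi(\varphi, \tilde\theta)$$

is a power posterior of θ˜ and φ, with power η. The SMI distribution of the parameters of interest ψ = (θ, φ) is

$$p_\eta(\psi \mid Y, Z) = \int p_\eta(\theta, \tilde\theta, \varphi \mid Y, Z)\, d\tilde\theta.$$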

The power η controls how much information from the suspect module involving Y is used to estimate φ.

3. Modularized Bayesian Inference for Multiple Modules

3.1. Standard Bayesian Posterior and Cut Distribution

To establish notation, first consider the simple case when the spatial coefficient function φ(ui, vi) = φ(u0, v0) = φ0; that is, φ is constant across the whole geographical space and so we can directly include all data from all locations into the model. Denote the likelihood p(Yi,1:m|θi, φ0) at location i, with i = 0, 1,…, n. The DAG of this model is shown in Figure 2. The joint distribution with an independent prior $\pi(\theta_{0:n}, \varphi_0) = \pi(\varphi_0)\prod_{i=0}^{n}\pi(\theta_i)$ is

$$p(Y_{0:n,1:m}, \theta_{0:n}, \varphi_0) = \pi(\theta_{0:n}, \varphi_0) \prod_{i=0}^{n} p(Y_{i,1:m} \mid \theta_i, \varphi_0).$$

Figure 2. DAG representation when φ(ui, vi) = φ0.

The following lemma gives the form of the standard Bayesian posterior.

Lemma 3.1

The standard Bayesian posterior is:

$$p(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}) = p(\theta_0, \varphi_0 \mid Y_{0:n,1:m}) \prod_{i=1}^{n} p(\theta_i \mid Y_{i,1:m}, \varphi_0). \tag{3.1}$$

For proof, see Supplementary Materials (Liu and Goudie, 2023).

Note that estimation of (θ0, φ0) is influenced by all observations Y0,1:m,…, Yn,1:m, as is standard in Bayesian inference: the contribution from any location is equal in the sense that no manipulation of feedback is conducted.

In contrast, consider the case when φ(ui, vi) is not constant. If we nevertheless include data from location (ui, vi), i ≠ 0, to estimate the parameter (θ0, φ0) and regard yi ∼ p(·|θi, φ0) as module i, i = 0, 1,…, n, then the likelihood $\prod_{i=0}^{n} p(Y_{i,1:m} \mid \theta_i, \varphi_0)$ is clearly misspecified since φ0 ≠ φ(ui, vi). A straightforward way to handle this misspecification is to remove the influence of these modules on the estimation of φ0 by using the cut distribution. The cut distribution for this model is:

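$$p_{\mathrm{cut}}(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}) := p(\theta_0, \varphi_0 \mid Y_{0,1:m}) \prod_{i=1}^{n} p(\theta_i \mid Y_{i,1:m}, \varphi_0). \tag{3.2}$$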

Here, estimation of φ0 depends on only Y0,1:m. Contributions from Y1:n,1:m at other locations are completely removed.

3.2. Manipulating the Multiple Feedback and the Bayesian GWR Posterior

Suppose now that φ(u, v) is not constant but is a smooth function with respect to (u, v) so that closer locations have more similar φ. In this case it is inappropriate to treat the misspecification as equally problematic at every location since this may lead to a loss of usable information from the dataset. Instead we propose to manipulate contributions to the estimation of φ0 from observations Yi,1:m neighbouring the location of interest i = 0 by varying amounts. We achieve this by allocating a geographically weighted kernel W(di, η) to the likelihood of Yi,1:m where di is the distance between location 0 and location i.

Figure 3 shows a DAG of this model. It can be viewed as a case of manipulating the feedback between n+1 modules. Extending Carmona and Nicholls (2020), we introduce an auxiliary variable θ˜1:n = (θ˜1,…, θ˜n), which has the same likelihood term as θ1:n. We set an independent prior $\pi(\theta_0, \tilde\theta_{1:n}, \varphi_0) = \pi(\theta_0)\,\pi(\varphi_0)\prod_{i=1}^{n}\pi(\tilde\theta_i)$. Then we write

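$$p_\eta(\theta_{0:n}, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m}) = p_{\mathrm{pow},\eta}(\theta_0, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m}) \prod_{i=1}^{n} p(\theta_i \mid Y_{i,1:m}, \varphi_0), \tag{3.3}$$

where

$$p_{\mathrm{pow},\eta}(\theta_0, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m}) \propto p(Y_{0,1:m} \mid \theta_0, \varphi_0)\, \pi(\theta_0, \tilde\theta_{1:n}, \varphi_0) \prod_{i=1}^{n} p(Y_{i,1:m} \mid \tilde\theta_i, \varphi_0)^{W(d_i,\eta)} \tag{3.4}$$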

is called the geographically-powered posterior and is used to adjust contributions from observations Yi,1:m by allocating the corresponding weighted kernel W (di, η) to the likelihood p(Yi,1:m|φ0,θ˜i). Note that (3.4) is an extension of the usual power posterior and it contains the GWR locally-weighted likelihood (1.2). Given the geographical bandwidth η, the SMI distribution for this multiple module case is

$$p_\eta(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}) = \int p_\eta(\theta_{0:n}, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m})\, d\tilde\theta_{1:n}.$$

Figure 3. DAG representation when the feedback is manipulated. The n + 1 modules (Yi,1:m, φ0, θi), i = 0,…, n, are separated by a dashed line. The location of interest is i = 0.

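The Bayesian GWR posterior for the parameters of interest φ0 and θ0 at the location of interest i = 0 is

$$p_\eta(\theta_0, \varphi_0 \mid Y_{0:n,1:m}) = \int p_\eta(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m})\, d\theta_{1:n} = \iint p_\eta(\theta_{0:n}, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m})\, d\tilde\theta_{1:n}\, d\theta_{1:n} = \int p_{\mathrm{pow},\eta}(\theta_0, \tilde\theta_{1:n}, \varphi_0 \mid Y_{0:n,1:m})\, d\tilde\theta_{1:n}. \tag{3.5}$$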

We call estimation of the parameter of interest (θ0, φ0) via (3.5) Bayesian GWR inference. The Bayesian GWR model manipulates the feedback from each of the multiple neighbouring observations through the geographical bandwidth η, and reduces to the cut distribution and the standard posterior distribution for certain values of η. Specifically, when the variation of φ(u, v) is so large that we are not confident about including neighbouring locations, then η → 0 and the estimation of θ0 and φ0 depends only on observations Y0,1:m:

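$$\lim_{\eta \to 0} p_\eta(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}) = p(\theta_0, \varphi_0 \mid Y_{0,1:m}) \prod_{i=1}^{n} p(\theta_i \mid Y_{i,1:m}, \varphi_0) = p_{\mathrm{cut}}(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}).$$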

This is the cut distribution (3.2). In contrast, when the variation of φ(u, v) is so small that we can include observations from all locations, then η → ∞ and estimation of θ0 and φ0 depends on all observations Y0:n,1:m as in the standard posterior distribution (3.1):

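$$\lim_{\eta \to \infty} p_\eta(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}) = p(\theta_0, \varphi_0 \mid Y_{0:n,1:m}) \prod_{i=1}^{n} p(\theta_i \mid Y_{i,1:m}, \varphi_0) = p(\theta_{0:n}, \varphi_0 \mid Y_{0:n,1:m}).$$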

In summary, we propose the Bayesian GWR model for multiple suspect modules for the situation in which the geographically weighted kernel (1.3) has a known and deterministic functional form with respect to the geographical coordinates. Since the joint ‘likelihood’ involved in (3.4) is the geographically weighted likelihood widely used in the GWR framework, the essence of the Bayesian GWR model is a particular extension of the SMI model.
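
The two limits of the geographically-powered posterior can be checked in a deliberately simplified conjugate setting. The sketch below is an illustration under assumptions that are not in the paper: an intercept-only Poisson model with a single rate at the location of interest, no nuisance parameters θi, a Gamma(a, b) prior, and the Gaussian kernel taken as exp(−d²/η²). Under these assumptions the powered posterior is again a Gamma distribution whose parameters accumulate kernel-weighted sufficient statistics.

```python
import math

def gaussian_kernel(d, eta):
    """Gaussian kernel, assumed here to be exp(-d^2 / eta^2)."""
    return math.exp(-(d * d) / (eta * eta))

def gwr_gamma_posterior(counts, distances, eta, a=1.0, b=1.0):
    """Gamma (shape, rate) of the geographically-powered posterior.

    counts[i] holds the Poisson counts at location i; index 0 is the
    location of interest and always receives weight 1.
    """
    shape, rate = a, b
    for i, (y, d) in enumerate(zip(counts, distances)):
        w = 1.0 if i == 0 else gaussian_kernel(d, eta)
        # A Poisson likelihood raised to the power w contributes
        # w * sum(y) to the shape and w * len(y) to the rate
        shape += w * sum(y)
        rate += w * len(y)
    return shape, rate
```

With a very small bandwidth the result depends only on the counts at location 0, mirroring the cut distribution, and with a very large bandwidth it approaches the pooled posterior that uses all locations equally.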

3.3. Theoretical Analysis

Bayes’ theorem cannot be used to justify the proposed geographically-powered posterior because the power likelihood is not a proper probability distribution. Instead, we justify the geographically-powered posterior as a minimizing rule within an information processing framework, thus avoiding the need to appeal to Bayes’ theorem. We also study its properties in the large sample size setting.

We write the true data generating process for the complete set of observations Y0:n,1:m as

$$\check{p}_{0:n,1:m}(Y_{0:n,1:m}) = \prod_{i=0}^{n} \check{p}_{i,1:m}(Y_{i,1:m}) = \prod_{i=0}^{n} \prod_{j=1}^{m} \check{p}_i(Y_{i,j}),$$

where pˇi is the true generating process at location i. Let Pˇ0:n,1:m be the corresponding probability measure. Denoting ψ = (θ0, θ˜1:n, φ0) ∈ Ψ and Wi = W(di, η), and omitting η in ppow,η for simplicity, the geographically-powered likelihood ppow(Y0:n,1:m|ψ) for observations Y0:n,1:m is written as (1.2) with θi replaced by θ˜i for i ≠ 0. Let Π be the probability measure of the prior distribution. If ppow(Y0:n,1:m) := ∫ ppow(Y0:n,1:m|ψ) Π(dψ) < ∞, we can re-write the probability measure of the geographically-powered posterior (3.4) on any Ψ* ⊂ Ψ in terms of the true data generating processes as

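$$P_{\mathrm{pow}}(\Psi^{*} \mid Y_{0:n,1:m}) = \frac{\int_{\Psi^{*}} \exp\left\{-r_{0,1:m}(\psi) - \sum_{i=1}^{n} W_i\, r_{i,1:m}(\psi)\right\} \Pi(d\psi)}{\int_{\Psi} \exp\left\{-r_{0,1:m}(\psi) - \sum_{i=1}^{n} W_i\, r_{i,1:m}(\psi)\right\} \Pi(d\psi)}, \tag{3.6}$$

where

$$r_{0,1:m}(\psi) = \log\left\{\frac{\check{p}_{0,1:m}(Y_{0,1:m})}{p(Y_{0,1:m} \mid \theta_0, \varphi_0)}\right\}; \qquad r_{i,1:m}(\psi) = \log\left\{\frac{\check{p}_{i,1:m}(Y_{i,1:m})}{p(Y_{i,1:m} \mid \tilde\theta_i, \varphi_0)}\right\}, \quad i \neq 0.$$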

This representation makes it clear that (3.6) is an extension of the Gibbs posterior (Jiang and Tanner, 2008), which is also known as the generalized Bayesian posterior (Grünwald and van Ommen, 2017); pseudo posterior (Walker and Hjort, 2001; Alquier et al., 2016); and quasi-posterior (Chernozhukov and Hong, 2003; Dunson and Taylor, 2005), which plays an essential role in the study of the PAC-Bayesian inference (e.g., Dalalyan and Tsybakov, 2008; Lever et al., 2013). The Gibbs posterior generalizes the usual Bayesian posterior by defining a prior for the parameter of a loss function, which need not be the negative log-likelihood as used in standard Bayesian inference.

Our model extends the existing Gibbs posterior literature by allowing multiple learning rates (also interpreted as temperatures in thermodynamics; Geman and Geman, 1984) which correspond to geographically weighted kernels. The loss function (or the statistical risk function) at each location is Wiri,1:m(ψ), where W0 = 1. We denote the empirical total loss function L1:m(ψ), given the parameter of the model ψ, as:

$$L_{1:m}(\psi) = \frac{1}{m}\left(r_{0,1:m}(\psi) + \sum_{i=1}^{n} W_i\, r_{i,1:m}(\psi)\right).$$

Let F be a probability measure on the parameter space Ψ which results from processing the information from observations Y0:n,1:m and prior knowledge Π. We aim to show that the geographically-powered posterior Ppow is the optimal F in the sense that Ppow minimizes an information bound. We first need to construct this information bound. Bhattacharya et al. (2019) provide a PAC-Bayesian type bound for the power posterior. The bound controls a Rényi divergence which characterizes the performance of the power posterior. We now denote the Rényi divergence between two arbitrary distributions p and q, given an α ∈ (0, 1), as:

$$D_\alpha(p(\cdot), q(\cdot)) = \frac{1}{\alpha - 1} \log\left(\int p(y)^{\alpha}\, q(y)^{1-\alpha}\, dy\right).$$
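
For intuition, the Rényi divergence above can be evaluated for discrete distributions, where the integral becomes a sum. The following sketch is illustrative only; the function names are assumptions, and the KL divergence is included because the Rényi divergence is known to approach it as α → 1.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha in (0, 1) between discrete p and q."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p, q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The divergence is zero when the two distributions coincide and non-negative otherwise, which is what lets it serve as a measure of how far the fitted model is from the true data generating process.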

We have the following theorem that extends Theorem 3.4 of Bhattacharya et al. (2019) by allowing multiple learning rates.

Theorem 3.1

(Weighted Rényi divergence bound). Given a distribution f(·) with probability measure F(·) over parameter space Ψ, for any ε ∈ (0, 1), the following inequality

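$$\int \sum_{i=1}^{n} (1 - W_i)\, D_{W_i}\big(p(\cdot \mid \tilde\theta_i, \varphi_0), \check{p}_i(\cdot)\big)\, F(d\psi) \le \frac{1}{m} \int \Big(r_{0,1:m}(\psi) + \sum_{i=1}^{n} W_i\, r_{i,1:m}(\psi)\Big) F(d\psi) + \frac{D_{\mathrm{KL}}(f(\cdot), \pi(\cdot))}{m} + \frac{1}{m} \log\Big(\frac{1}{\varepsilon}\Big)$$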

holds with Pˇ0:n,1:m probability at least (1 − ε).

For proof, see Supplementary Materials (Liu and Goudie, 2023).

Remark

Theorem 3.1 leads to the following “information posterior bound” (Zhang, 2006), which holds with Pˇ0:n,1:m probability at least (1 − ε).

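$$E_{\psi \sim F}\left\{-\log\left(E_{Y_{0:n,1:m} \sim \check{P}_{0:n,1:m}} \exp\left(-L_{1:m}(\psi)\right)\right)\right\} \le E_{\psi \sim F}\, L_{1:m}(\psi) + \frac{D_{\mathrm{KL}}(f(\cdot), \pi(\cdot))}{m} + \frac{1}{m} \log\Big(\frac{1}{\varepsilon}\Big).$$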

For proof, see Supplementary Materials (Liu and Goudie, 2023).

Given a distribution F which results from an information processing rule, the Remark states that the negative logarithm of the expected exponential of the negative loss is controlled by the empirical loss from the usage of F and an additional penalty on the discrepancy between F and the prior Π. Zhang (2006) proposed an approach called “Information Risk Minimization” which selects F by minimizing the right-hand side of the information posterior bound. Note that, although the bound involves ε, the inequality holds for any ε ∈ (0, 1). Hence, the selection of F is not affected by ε. Similarly, the true data generating process drops out since it does not involve F. Applying this approach is therefore equivalent to finding an F that minimizes the following criterion function

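$$M_m(f(\psi)) = D_{\mathrm{KL}}(f(\cdot), \pi(\cdot)) - \int f(\psi) \log\big(p(Y_{0,1:m} \mid \theta_0, \varphi_0)\big)\, d\psi - \sum_{i=1}^{n} W_i \int f(\psi) \log\big(p(Y_{i,1:m} \mid \tilde\theta_i, \varphi_0)\big)\, d\psi.$$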

Note that the “Information Risk Minimization” used here can be regarded as a modified “Information Conservation Principle” (Zellner, 1988). This principle states that an optimal information processing rule has equal input information Iin, which consists of the inputs to the information processing (i.e., prior knowledge, observations and model), and output information Iout. In our setting, for the probability measure F, the input information Iin is:

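$$I_{\mathrm{in}} := \int \log(\pi(\psi))\, F(d\psi) + \int \log\big(p_{\mathrm{pow}}(Y_{0:n,1:m} \mid \psi)\big)\, F(d\psi) = \int \log(\pi(\psi))\, F(d\psi) + \int \log\big(p(Y_{0,1:m} \mid \theta_0, \varphi_0)\big)\, F(d\psi) + \sum_{i=1}^{n} W_i \int \log\big(p(Y_{i,1:m} \mid \tilde\theta_i, \varphi_0)\big)\, F(d\psi).$$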

Note that in contrast to the original input information discussed in Zellner (1988), the input information from each geographical location is manipulated by the geographically weighted kernel. The output information Iout is:

$$I_{\mathrm{out}} := \int \log(f(\psi))\, F(d\psi) + \int \log\big(p_{\mathrm{pow}}(Y_{0:n,1:m})\big)\, F(d\psi).$$

Now we present the following theorem which justifies the use of the geographically-powered posterior (3.6) as the form of probability distribution that statistically learns information from the observations and the prior knowledge while minimising the loss.

Theorem 3.2

(Justification). If $p_{\mathrm{pow}}(Y_{0:n,1:m}) := \int p_{\mathrm{pow}}(Y_{0:n,1:m} \mid \psi)\, \pi(\psi)\, d\psi < \infty$, the geographically-powered posterior ppow(ψ|Y0:n,1:m) minimizes the criterion function Mm(f(ψ)) with respect to a probability distribution f(ψ). In addition, the geographically-powered posterior results from the optimal information processing rule.

For proof, see Supplementary Materials (Liu and Goudie, 2023).
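
The minimizing property in Theorem 3.2 can be checked numerically when the parameter takes finitely many values, so that integrals become sums. This sketch is an illustration under that discretization assumption, with hypothetical function names; `weighted_loglik[k]` stands for the kernel-weighted log-likelihood log p(Y0,1:m|ψk) + Σi Wi log p(Yi,1:m|ψk) at grid point ψk.

```python
import math

def criterion(f, prior, weighted_loglik):
    """M_m(f) on a finite grid: KL(f, prior) minus the f-average of the
    kernel-weighted log-likelihood."""
    kl = sum(fi * math.log(fi / pi) for fi, pi in zip(f, prior) if fi > 0)
    fit = sum(fi * ll for fi, ll in zip(f, weighted_loglik))
    return kl - fit

def powered_posterior(prior, weighted_loglik):
    """Grid version of the geographically-powered posterior and its
    normalizing constant."""
    unnorm = [pi * math.exp(ll) for pi, ll in zip(prior, weighted_loglik)]
    z = sum(unnorm)
    return [u / z for u in unnorm], z
```

On a grid, the criterion equals the KL divergence from f to the powered posterior minus the log normalizing constant, so it is minimized exactly at the powered posterior, where it takes the value −log z.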

We now consider the large sample size setting. Let y0:n = {y0, y1,…, yn} be random variables corresponding to a single observation at each location, with $y_{0:n} \sim \check{P} = \prod_{i=0}^{n} \check{P}_i$. Although the GWR model is less necessary in the large sample size setting (since effective statistical inference can be conducted separately at each location), we wish to show that the posterior predictive distribution p(yi|θ˜i, φ0), where (θ˜i, φ0) ∼ Ppow, approaches the truth pˇi at each location i = 0, 1,…, n when the degree of the partial misspecification varies across the geographical space. Denote the expected total loss function L(ψ), given the parameter of the model ψ, as:

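$$L(\psi) = E_{y_{0:n} \sim \check{P}}\left\{\log\left(\frac{\check{p}_0(y_0)}{p(y_0 \mid \theta_0, \varphi_0)}\right) + \sum_{i=1}^{n} W_i \log\left(\frac{\check{p}_i(y_i)}{p(y_i \mid \tilde\theta_i, \varphi_0)}\right)\right\}.$$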

We present the following theorem.

Theorem 3.3

(Consistency). Given a finite number of observations, the geographically-powered posterior Ppow minimizes

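$$E_{\psi \sim P_{\mathrm{pow}}}\big(L_{1:m}(\psi)\big) + m^{-1} D_{\mathrm{KL}}\big(p_{\mathrm{pow}}(\cdot \mid Y_{0:n,1:m}), \pi(\cdot)\big).$$

If the sample size m → ∞ at all locations and the limit $P_{\mathrm{pow}}^{(\infty)} := \lim_{m \to \infty} P_{\mathrm{pow}}$ of the geographically-powered posterior exists, then $p_{\mathrm{pow}}^{(\infty)}$ puts all its mass at $\psi^{*} = (\theta_0^{*}, \tilde\theta_{1:n}^{*}, \varphi_0^{*})$, which minimizes the expected total loss function (a geographically weighted combination of Kullback–Leibler divergences):

$$\psi^{*} = \underset{\psi = (\theta_0, \tilde\theta_{1:n}, \varphi_0)}{\arg\min}\, L(\psi) = \underset{\psi = (\theta_0, \tilde\theta_{1:n}, \varphi_0)}{\arg\min}\, D_{\mathrm{KL}}\big(\check{p}_0(\cdot), p(\cdot \mid \theta_0, \varphi_0)\big) + \sum_{i=1}^{n} W_i\, D_{\mathrm{KL}}\big(\check{p}_i(\cdot), p(\cdot \mid \tilde\theta_i, \varphi_0)\big).$$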

For proof, see Supplementary Materials (Liu and Goudie, 2023).

Although partial misspecification remains and predictions drawn from the model will not follow the true data generating process, Theorem 3.3 states that the geographically-powered posterior draws predictions that balance minimizing the empirical total loss function and the discrepancy between posterior and prior knowledge. When the sample size increases, the model acts similarly to a standard Bayesian model by learning more from observations. In the limit of infinite sample size, the model provides a prediction that is closest to the true data generating process. Note that, although the model draws predictions close to the truth, more priority is assigned to locations close to the location of interest, and so we cannot use a single Bayesian GWR model when inference is needed for multiple locations. Instead, separate models should be used at each location of interest.
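
As a hedged illustration of the limit in Theorem 3.3, consider a simplified setting that is not part of the original text: the truth at location i is N(μi, 1), the model is N(φ0, 1) with no nuisance parameters, and W0 = 1. The weighted KL objective and its minimizer can then be written in closed form.

```latex
% Assumed toy setting: \check{p}_i = N(\mu_i, 1), model p(\cdot \mid \varphi_0) = N(\varphi_0, 1).
% The KL divergence between unit-variance normals is half the squared mean difference.
\begin{align*}
L(\varphi_0) &= \tfrac{1}{2}(\mu_0 - \varphi_0)^2 + \sum_{i=1}^{n} W_i\, \tfrac{1}{2}(\mu_i - \varphi_0)^2, \\
\frac{\partial L}{\partial \varphi_0} &= -(\mu_0 - \varphi_0) - \sum_{i=1}^{n} W_i (\mu_i - \varphi_0) = 0
\quad\Longrightarrow\quad
\varphi_0^{*} = \frac{\mu_0 + \sum_{i=1}^{n} W_i\, \mu_i}{1 + \sum_{i=1}^{n} W_i}.
\end{align*}
```

In this toy case the limiting value is a kernel-weighted average of the true means: as the weights Wi → 0 it tends to μ0, and as Wi → 1 it tends to the unweighted average over all locations, which makes explicit why a model centred at location 0 gives priority to nearby locations.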

4. Inference for Multiple Locations and Bandwidth Selection

4.1. Predictive Performance of One Bayesian GWR Model

In Section 3, we considered the setting when there is a single location of interest. We now consider inference for multiple sampling locations when all locations are of interest. This is done by using separate Bayesian GWR models for each location while assuming the same geographical bandwidth for all models. We give the following definition which generalizes the Bayesian GWR model by relaxing the location of interest.

Definition 4.1

Consider observations Yi,1:m sampled from location i with coordinate (ui, vi), i = 0,…, n; a bandwidth η and a specific geographical coordinate (u, v) that we call the geographical centre. Define the Bayesian GWR model M = ((u, v), η) with parameter ψM = (θM,0:n, φM) to be the SMI model with distribution

$$p_M(\psi_M \mid Y_{0:n,1:m}) = \int p_M(\psi_M, \tilde\theta_{M,0:n} \mid Y_{0:n,1:m})\, d\tilde\theta_{M,0:n}, \tag{4.1}$$

where θ˜M,0:n is the auxiliary variable for model M and

$$p_M(\psi_M, \tilde\theta_{M,0:n} \mid Y_{0:n,1:m}) \propto \pi(\tilde\theta_{M,0:n}, \varphi_M) \times \prod_{i=0}^{n} p(Y_{i,1:m} \mid \tilde\theta_{M,i}, \varphi_M)^{W(d_i,\eta)} \prod_{i=0}^{n} p(\theta_{M,i} \mid Y_{i,1:m}, \varphi_M) \tag{4.2}$$

where di is the geographical distance between location (ui, vi) and the geographical centre (u, v). In the special case when the geographical centre is one of the sampling locations, which we assume without loss of generality to be (u0, v0), then (4.1) and (4.2) reduce to (3.3) and (3.4).

To measure the predictive performance of a model M at, for example, a new observation Yi* from location i with true data generating process pˇi(Yi*), we use the Kullback-Leibler (KL) divergence. This is achieved by looking at the expected log pointwise predictive density (Gelman et al., 2014; Jacob et al., 2017), which is essentially a constant term minus the KL divergence, and is defined as

$$\mathrm{elpd}_{(u_i,v_i)}(M) := \int \check{p}_i(Y_i^*) \log\big(p_M(Y_i^* \mid Y_{0:n,1:m})\big) \, dY_i^*, \tag{4.3}$$

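where the predictive distribution pM(Yi* | Y0:n,1:m) is defined as

$$p_M(Y_i^* \mid Y_{0:n,1:m}) := \int p(Y_i^* \mid \theta_{M,i}, \varphi_M) \, p_M(\psi_M \mid Y_{0:n,1:m}) \, d\psi_M = \iint p(Y_i^* \mid \theta_{M,i}, \varphi_M) \, p_M(\theta_{M,i}, \varphi_M \mid Y_{0:n,1:m}) \, d\theta_{M,i} \, d\varphi_M.$$

Here, we denote pM(θM,i, φM | Y0:n,1:m) := ∫ pM(ψM | Y0:n,1:m) dθM,-i, where we define θM,-i = (θM,0,…, θM,i-1, θM,i+1,…, θM,n).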

4.2. Inference for Multiple Locations

Having defined the measure of predictive performance for one Bayesian GWR model, we are ready to extend it to infer multiple locations by setting and tuning multiple Bayesian GWR models. The following assumption can be viewed as a rephrasing of the first law of geography (Tobler, 1970), since for an arbitrary location of interest i, observations from closer locations contribute more to the estimation of the shared parameter φ when the geographical centre is exactly equal to the location of interest.

Assumption 4.1

For any fixed geographical bandwidth η and specific location with geographical coordinates (uk, vk), elpd(uk, vk)(M) is maximized when the geographical centre (u, v) = (uk, vk). That is:

$$((u_k, v_k), \eta) = \arg\max_{M} \, \mathrm{elpd}_{(u_k,v_k)}(M), \quad \forall \eta.$$

We define the space of Bayesian GWR models 𝓜 = {M = ((u, v), η) : η > 0}. The following assumption states that inferences from multiple models are independent.

Assumption 4.2

Given a dataset Y0:n,1:m and Bayesian GWR models Ms ∈ 𝓜, s = 1,…, S, we have the joint Bayesian GWR posterior

$$p(\psi_{M_1}, \ldots, \psi_{M_S} \mid Y_{0:n,1:m}) = \prod_{s=1}^{S} p_{M_s}(\psi_{M_s} \mid Y_{0:n,1:m}).$$

We are now ready to extend inference to multiple locations. Given a set of Bayesian GWR models M = (M0,…, Mn), one for each geographic sampling location, all with identical geographical bandwidth η, we define the expected log pointwise predictive density for new observations Y*0:n = (Y0*,…, Yn*), with each single observation Yi* from location i, as

$$\mathrm{elpd}(\mathbf{M}) = \int \log\big(p(Y_{0:n}^* \mid Y_{0:n,1:m})\big) \prod_{i=0}^{n} \check{p}_i(Y_i^*) \, dY_{0:n}^*,$$

where

$$p(Y_{0:n}^* \mid Y_{0:n,1:m}) = \int p(Y_{0:n}^* \mid \psi_{M_0}, \ldots, \psi_{M_n}) \, p(\psi_{M_0}, \ldots, \psi_{M_n} \mid Y_{0:n,1:m}) \, d\psi_{M_0} \cdots d\psi_{M_n}.$$

We then present the following theorem to select the optimal bandwidth.

Theorem 4.1

(Bandwidth selection). Given Assumptions 4.1 and 4.2, for observations Y0:n,1:m sampled from locations i with coordinates (ui, vi), i = 0,…, n, the optimal combination of n + 1 separate Bayesian GWR models M = (M0,…, Mn) that maximizes elpd(M), where each Mi = ((ui*, vi*), η*) is used for prediction in location i, satisfies

  1. For all 0 ≤ i ≤ n,
    (ui*, vi*) = (ui, vi).
  2. Redefine Mi(η) = ((ui, vi), η). Then the optimal bandwidth η* maximizes the mean (across all sampling locations) expected log pointwise predictive density:

    $$\eta^* = \arg\max_{\eta} \frac{1}{n+1} \sum_{i=0}^{n} \mathrm{elpd}_{(u_i,v_i)}(M_i(\eta)).$$

For proof, see Supplementary Materials (Liu and Goudie, 2023).
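Once per-location elpd estimates are available, the selection rule in Theorem 4.1 reduces to a grid search over the candidate bandwidths. The following is a minimal sketch under that assumption; the function name `select_bandwidth` and the toy numbers are hypothetical and are not taken from the authors' repository.

```python
from statistics import mean


def select_bandwidth(elpd_by_eta):
    """Return the candidate bandwidth with the largest mean elpd.

    elpd_by_eta maps each candidate bandwidth eta to a list of
    per-location elpd estimates, one entry per sampling location.
    """
    mean_elpd = {eta: mean(values) for eta, values in elpd_by_eta.items()}
    best_eta = max(mean_elpd, key=mean_elpd.get)
    return best_eta, mean_elpd


# Made-up elpd estimates for three locations and three candidate bandwidths.
toy_elpd = {
    2.0: [-3.10, -3.40, -3.25],
    4.0: [-2.90, -3.00, -2.95],
    8.0: [-3.05, -3.20, -3.30],
}
best_eta, mean_elpd = select_bandwidth(toy_elpd)
print(best_eta)
```

Averaging before taking the maximum mirrors the factor 1/(n + 1) in the theorem; since that factor is constant in η, summing instead of averaging would select the same bandwidth.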

In practice, we do not know the true data generating process pˇi. Numerous methods (e.g., Gelman et al., 2014) can be applied to approximate (4.3). Here, we adopt cross-validation to estimate elpd(ui, vi)(Mi) because it measures out-of-sample predictive performance and consequently avoids overestimating elpd. We train the model Mi on all observations from other locations Yj,1:m, j ≠ i, and a subset Yi,1:m’ of the observations from location i (the training set is denoted as {Y0:n,1:m \ Yi,m’+1:m}), and estimate elpd using the test set Yi,m’+1:m by

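$$\widehat{\mathrm{elpd}}_{(u_i,v_i)}(M_i) = \frac{1}{m-m'} \sum_{j=m'+1}^{m} \log\left( \int p(Y_{i,j} \mid \psi_{M_i}) \, p_{M_i}(\psi_{M_i} \mid \{Y_{0:n,1:m} \setminus Y_{i,m'+1:m}\}) \, d\psi_{M_i} \right). \tag{4.4}$$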

The integral within (4.4) can be easily approximated by the Monte Carlo samples drawn from the Bayesian GWR posterior. This is summarized in Algorithm 1.
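As an illustration of this Monte Carlo approximation, the sketch below replaces the integral in (4.4) with an average over posterior draws, computed on the log scale for numerical stability. It assumes the pointwise log-likelihoods of the held-out observations have already been evaluated at each draw; the function names and toy values are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def estimate_elpd(log_lik):
    """Monte Carlo estimate of the held-out elpd at one location.

    log_lik[s][j] is log p(Y_ij | psi^(s)) for posterior draw s and
    held-out observation j. The integral over psi is replaced by an
    average over the S draws, and the result is then averaged over
    the held-out observations.
    """
    n_test = len(log_lik[0])
    per_obs = [log_mean_exp([row[j] for row in log_lik]) for j in range(n_test)]
    return sum(per_obs) / n_test


# Toy values: 3 posterior draws (rows), 2 held-out observations (columns).
toy_log_lik = [
    [math.log(0.2), math.log(0.5)],
    [math.log(0.4), math.log(0.5)],
    [math.log(0.6), math.log(0.5)],
]
elpd_hat = estimate_elpd(toy_log_lik)
```

In the toy input the average predictive densities are 0.4 and 0.5, so the estimate equals (log 0.4 + log 0.5) / 2.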

4.3. Algorithm and Simplification of Computation

We summarize the algorithm for the Bayesian GWR model when there are n + 1 locations. For a set of candidate geographical bandwidths {ηr}, r = 1,…,R, we select the optimal geographical bandwidth η using Algorithm 1. In Algorithm 2, samples at each iteration can be drawn by using any standard sampler (e.g., Metropolis-Hastings or Gibbs sampler). The algorithm requires an approximation of the elpd at each location separately. This can be done in parallel to expedite computation. Once the optimal geographical bandwidth η has been selected, we refit the model with this bandwidth to the whole dataset, as described in Algorithm 2. We provide the code for both algorithms in Python Version 3 (https://github.com/MathBilibili/Bayesian-geographically-weighted-regression).

Algorithm 1 Selection of geographical bandwidth η by cross-validation.

Require: A candidate set of geographical bandwidths {ηr}, r = 1,…,R; observations Y0:n,1:m and their corresponding coordinates {(ui, vi)}, i = 0,…,n; likelihood p(Y|θ, φ); priors π(θ), π(θ˜) and π(φ); number of iterations S; number Q of k-fold cross-validation folds.

1: for r ∈ {1,…,R} do

2:     for q ∈ {1,…,Q} do

3:          Select the test set Yi,(m’+1:m), a random 100/k% subset of observations at location i, and training set Y(i) = Y0:n,1:m \ Yi,(m’+1:m) for location i, i = 0,…,n.

4:          Call Algorithm 2 with Bayesian GWR models Mi(ηr) = ((ui, vi), ηr) and location-specific dataset Y(i), i = 0,…,n.

5:          Calculate $\widehat{\mathrm{elpd}}_{(u_i,v_i)}(M_i(\eta_r))$ on the test set Yi,(m’+1:m) using samples $\{(\varphi_i^{(s)}, \theta_i^{(s)})\}_{s=1}^{S}$, i = 0,…,n.

6:          Calculate the qth mean elpd: $\overline{\mathrm{elpd}}_q(\eta_r) = \frac{1}{n+1} \sum_{i=0}^{n} \widehat{\mathrm{elpd}}_{(u_i,v_i)}(M_i(\eta_r))$.

7:     end for

8: end for

9: return $\{\{\overline{\mathrm{elpd}}_q(\eta_r)\}_{q=1}^{Q}\}_{r=1}^{R}$.

The computational cost of a Bayesian GWR model for multiple locations is mainly determined by two factors when using a Metropolis-Hastings sampler. The first factor is the number of observations m at each location, which clearly determines the number of likelihood evaluations required. In practice, this evaluation normally benefits from vectorization.

Algorithm 2 Bayesian GWR model for multiple locations.

Require: A geographical bandwidth η; observations Y0:n,1:m and corresponding coordinates {(ui, vi)}, i = 0,…,n; likelihood p(Y|θ, φ); priors π(θ), π(θ˜) and π(φ); number of iterations S.

1: Set Bayesian GWR models Mi(η) = ((ui, vi), η) and location-specific dataset Y(i), i = 0,…,n. Note that Y(i) = Y0:n,1:m if cross-validation is not required.

2: for i ∈ {0,…,n} do

3:      Calculate geographically weighted kernels, where the distance is calculated between (uj, vj) and geographical centre (ui, vi) for j = 0,…,n.

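4:      Draw samples $\{\theta_i^{(s)}, \tilde{\theta}_{-i}^{(s)}, \varphi_i^{(s)}\}$ from $p_{\mathrm{pow},\eta}(\theta_i, \tilde{\theta}_{-i}, \varphi_i \mid Y^{(i)})$, s = 1,…,S, according to (3.4), with location of interest i and $\tilde{\theta}_{-i} = (\tilde{\theta}_0, \ldots, \tilde{\theta}_{i-1}, \tilde{\theta}_{i+1}, \ldots, \tilde{\theta}_n)$.

5: end for

6: return Bayesian GWR posterior samples $\{\{(\varphi_i^{(s)}, \theta_i^{(s)})\}_{s=1}^{S}\}_{i=0}^{n}$.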
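To make the role of the kernel weights in step 4 concrete, the sketch below evaluates only the geographically weighted (powered) likelihood factor that a sampler would need at each iteration. It is not the full target density: the prior and the conditional posterior of θ are omitted. It assumes a negative binomial likelihood parameterized by its mean μ and a dispersion θ with variance μ + θμ²; the function names and toy inputs are hypothetical.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log pmf of a negative binomial with mean mu and variance mu + theta * mu**2."""
    r = 1.0 / theta  # size parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def weighted_loglik(counts, means, thetas, weights):
    """Geographically weighted log-likelihood summed over locations.

    counts[i] and means[i] hold the observations and fitted means at
    location i, thetas[i] is its dispersion and weights[i] = W(d_i, eta).
    Raising a likelihood to the power W is the same as multiplying its
    log by W. Locations with zero weight are skipped entirely.
    """
    total = 0.0
    for y_i, mu_i, theta_i, w_i in zip(counts, means, thetas, weights):
        if w_i == 0.0:
            continue
        total += w_i * sum(nb_logpmf(y, mu, theta_i) for y, mu in zip(y_i, mu_i))
    return total


# Toy example: two locations, the second one excluded by a zero weight.
toy_counts = [[3, 5], [2, 4]]
toy_means = [[4.0, 4.0], [3.0, 3.0]]
toy_value = weighted_loglik(toy_counts, toy_means, [0.5, 0.5], [1.0, 0.0])
```

Skipping zero-weight locations is what makes a truncated kernel cheaper: the likelihood at those locations is never evaluated.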

The other factor is the number of locations n. On the one hand, by Assumption 4.2, inference of parameters at each location is conducted using n separate Bayesian GWR models, which can be easily parallelized. This can greatly reduce the computation time. On the other hand, when using the geographically weighted kernel (1.3), (3.4) requires the powered likelihood to be evaluated n times. When this computational cost is too large, it is possible to reduce the load by disregarding distant locations with only tiny weights. Specifically, inspired by the bi-square weighting function (Brunsdon et al., 1996), a modified truncated Gaussian kernel may be useful:

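$$W(d_i, \eta) = \begin{cases} \exp\left(-\dfrac{d_i^2}{\eta^2}\right) & \text{if } \exp\left(-\dfrac{d_i^2}{\eta^2}\right) > W^*, \\ 0 & \text{otherwise}, \end{cases} \tag{4.5}$$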

where W* (e.g., 10⁻²) is a threshold value that controls the degree of exclusion. We want this exclusion to reduce the number of likelihood evaluations needed, while retaining all information from the neighbouring locations. A practical way to check this is by looking at the percentage change of the value of (3.4) between kernels (1.3) and (4.5). If the percentage change is trivial, (4.5) will closely approximate (1.3) but at a much lower computational cost, especially when a small bandwidth η is adopted. In summary, when adopting kernel (4.5), the computational complexity, in terms of evaluating the likelihood of one observation of one MCMC iteration for one location of interest, is 𝒪(m × n(W*)), where n(W*) is the number of locations for evaluations in (3.4) with threshold W*.
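A minimal sketch of the truncated kernel and of counting n(W*) is given below. It assumes a Gaussian kernel of the form exp(−d²/η²) and Euclidean distance on a planar lattice; the function names and the choice of centre are hypothetical.

```python
import math


def truncated_gaussian_kernel(distance, eta, threshold=1e-2):
    """Gaussian weight exp(-(d/eta)^2), set to zero unless it exceeds threshold."""
    weight = math.exp(-(distance / eta) ** 2)
    return weight if weight > threshold else 0.0


def active_locations(centre, coords, eta, threshold=1e-2):
    """Indices of locations that keep a nonzero weight for this centre."""
    kept = []
    for index, (u, v) in enumerate(coords):
        d = math.hypot(u - centre[0], v - centre[1])  # Euclidean distance (assumed)
        if truncated_gaussian_kernel(d, eta, threshold) > 0.0:
            kept.append(index)
    return kept


# A 40 x 40 lattice; the centre (20, 20) is an arbitrary illustrative choice.
lattice = [(u, v) for u in range(1, 41) for v in range(1, 41)]
kept = active_locations((20, 20), lattice, eta=4.0)
print(len(kept), "of", len(lattice), "locations retain a nonzero weight")
```

The length of `kept` plays the role of n(W*): only those locations contribute likelihood evaluations for the chosen centre.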

5. Simulation

To illustrate our methodology and the influence of the geographical bandwidth, we simulated data on a 40 × 40 regular lattice (u, v), with u = 1,…,40 and v = 1,…,40, with geographically varying coefficients φ = (φ0, φ1(u, v), φ2(u)) defined as:

$$\varphi_0 = 3, \quad \varphi_1(u, v) = 0.1 + 0.01\sqrt{u^2 + v^2}, \quad \varphi_2(u) = 0.05\left(\sin(\pi/2 + \pi(u/20)) + \cos(\pi/2 + \pi(u/20)) + 4\right).$$

We generated the true θ(u, v) ∼ N(0.5, 0.01²) independently: the resulting θ(u, v) is relatively constant across spatial locations, and its variability is not spatially smooth. With these coefficients, we simulated m = 100 independent samples at each location from a negative binomial distribution, with covariates X = (X1, X2, X3) where X1 = 1 and X2 and X3 are drawn from uniform distributions U(0, 10) and U(2, 7), respectively. The Supplementary Materials (Liu and Goudie, 2023) contain results when the number of independent samples is reduced to m = 50 and m = 10.
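The data generation for a single lattice point can be sketched as follows. This is an illustration under several assumptions that the text does not pin down: a log link, a negative binomial with variance μ + θμ², a square-root form for φ1, and 0.01 read as the standard deviation of θ. The function names are hypothetical and NumPy is used for the random draws.

```python
import numpy as np


def coefficients(u, v):
    """Coefficient surfaces; the square root in phi1 is an assumed reading."""
    phi0 = 3.0
    phi1 = 0.1 + 0.01 * np.sqrt(u**2 + v**2)
    angle = np.pi / 2 + np.pi * (u / 20)
    phi2 = 0.05 * (np.sin(angle) + np.cos(angle) + 4)
    return np.array([phi0, phi1, phi2])


def simulate_location(u, v, m, rng):
    """Draw m negative binomial outcomes with covariates at lattice point (u, v)."""
    theta = rng.normal(0.5, 0.01)          # location-specific dispersion
    X = np.column_stack([
        np.ones(m),                        # X1: intercept
        rng.uniform(0, 10, size=m),        # X2 ~ U(0, 10)
        rng.uniform(2, 7, size=m),         # X3 ~ U(2, 7)
    ])
    mu = np.exp(X @ coefficients(u, v))    # log link (assumed)
    size = 1.0 / theta                     # gives variance mu + theta * mu^2
    y = rng.negative_binomial(size, size / (size + mu))
    return X, y, theta


rng = np.random.default_rng(0)
X, y, theta = simulate_location(u=1, v=1, m=100, rng=rng)
```

Looping `simulate_location` over u, v = 1,…,40 would produce a full lattice of datasets in this parameterization.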

We then fitted a Bayesian GWR model to each location separately and independently using the truncated Gaussian kernel with threshold 10⁻², with geographical bandwidth η. The difference in (3.4) using a truncated and non-truncated Gaussian kernel was less than 10⁻⁵%, suggesting the truncated kernel closely approximates the non-truncated kernel. To estimate the elpd by cross-validation, we excluded half of the samples at the location of interest from the training set. We drew 4 × 10³ iterations for each of 10 independent chains at each location, discarding the first 1 × 10³ samples as burn-in. We also fitted the PICAR model (Lee and Haran, 2022) as a reference. The details and settings of the PICAR model can be found in the Supplementary Materials (Liu and Goudie, 2023).

To identify the optimal geographical bandwidth η for Bayesian GWR, we repeated this process for each of the 9 candidate values η = 0.0001, 2, 4, 6, 8, 10, 20, 40, and 1000. Figure 4 shows the computational time and estimated mean expected log pointwise predictive density (mean elpd across space), according to (4.3), for each candidate value. It can be seen that the mean elpd achieves its highest value when the bandwidth is 4, so we will compare results with η = 4, η = 0.0001 (the smallest candidate, equivalent to using samples only from the geographic centre) and η = 1000 (the largest candidate, assuming the least geographic variation).

Figure 4. Computational time and elpd against geographical bandwidth. The computational time is calculated based on one MCMC iteration of running the Bayesian GWR model for all locations. The computation was run in parallel on ten cores of an Intel Xeon E7-8860 v3 CPU. Each boxplot represents the elpd estimates from 10 chains across the whole geographic space. The red line is the average of the elpd estimates across the 10 chains. The blue dashed line indicates the optimal bandwidth. The two black dashed areas are equivalent: the inset figure is a zoomed-in version.

We then ran the model on the complete dataset without excluding any observations. For each location, we ran 10 chains independently for 4 × 10³ iterations, discarding the first 1 × 10³ samples as burn-in, so that the change of the value of the estimated elpd was smaller than 0.05 (trace plot in Supplementary Materials (Liu and Goudie, 2023)). The true values and estimated means for coefficients (φ0, φ1, φ2) obtained by applying PICAR and the Bayesian GWR model, when η = 0.0001, 4, 1000, are shown in Figure 5. When η = 0.0001, estimation at each location relies almost exclusively on data from that location, so the estimated coefficients vary considerably across spatial locations: the connection between locations is almost completely “cut”. Furthermore, some estimates are extreme because excluding neighbouring samples means only a small number of samples are used by the model. These results reveal the nature of using a small bandwidth in a GWR model, as has also been discussed previously (Guo et al., 2008). In contrast, when η = 1000, we can see the estimated coefficients are almost constant across geographic locations, due to the large bandwidth that assumes samples from neighbouring locations are very similar to samples from the location of interest. Finally, the estimates obtained by applying PICAR and Bayesian GWR using the optimal bandwidth η = 4 are close to the true values across all geographic locations, although estimates from Bayesian GWR appear to be slightly more smoothed.

Figure 5. Heatmap of the true values and estimated means for the coefficients φ0, φ1 and φ2, using PICAR and the Bayesian GWR model, with geographic bandwidth η = 0.0001, 4 and 1000.

Figure 6 shows boxplots of the squared error between the Bayesian GWR and PICAR estimated means and the true values of the three coefficients across all geographic locations. The true φ0 is constant, therefore a large bandwidth that incorporates more samples will have a lower mean squared error. Hence, the model with η = 4 provides good estimation of φ0. In contrast, the model with η = 0.0001 fails to estimate the true value of φ0 because the sample size at each location is not sufficient to enable precise estimation. Moreover, the model with η = 1000 has a significant bias because it incorporates too much information from other locations which have considerably different data generating processes to the location of interest. For φ1 and φ2, which do vary geographically, the model with η = 1000 performs poorly, as expected, because the model assumes little geographic variation. The model with η = 0.0001 also performs poorly due to the insufficient sample size at each individual location. Overall, the model with the optimal bandwidth η = 4 performs the best in mean squared error. When comparing PICAR with Bayesian GWR using the optimal bandwidth, both models give similar results. PICAR on average has slightly lower squared error than our proposed Bayesian GWR model, but the squared error from PICAR appears to be more variable than that from Bayesian GWR. This is because the GWR model is intrinsically a smoothing approach. The Supplementary Material (Liu and Goudie, 2023) contains further discussion of the estimation error induced by applying Bayesian GWR with different bandwidths.

Figure 6. Boxplots of the squared error of the estimated mean coefficients φ0, φ1 and φ2 under the Bayesian GWR model and PICAR model across geographic locations, with geographic bandwidth η = 0.0001, 4, and 1000. The orange line and green triangle indicate the median and mean squared error, respectively.

6. Application to Real Data

It has been shown in epidemiological studies that there is a global variation in the seasonal activity of the influenza virus (e.g., Finkelman et al., 2007; Azziz Baumgartner et al., 2012; Lam et al., 2019). In particular, there are normally clear and consistent influenza epidemic peaks during the winter in the high-latitude regions (Cox and Subbarao, 2000), whereas seasonal transmission patterns are unclear in low-latitude (subtropical/tropical) regions (Viboud et al., 2006; Li et al., 2019). This suggests that transmission and viability of the influenza virus are linked with atmospheric conditions: the regular occurrence of influenza epidemics in temperate regions is largely attributed to exposure to cold and dry environments (e.g., Lowen et al., 2007; Lowen and Steel, 2014; Deyle et al., 2016; Chong et al., 2020). However, this relationship is weaker in subtropical/tropical regions (Tamerius et al., 2013). In this section, we apply the Bayesian GWR model to a human influenza dataset to assess spatial variation in the association between the occurrence of influenza and two major climatic factors (temperature and precipitation). Note that the relationship between the occurrence of influenza and climatic factors has been shown to vary temporally as well (see Liu et al. (2018)). We ignore these temporal effects here since the extension of GWR models to geographically-temporally weighted regression (GTWR) within a Bayesian framework is not straightforward and requires further study (see Section 7).

We used monthly, country-level human influenza surveillance data between January 2010 and December 2014 from the World Health Organization FluNet (https://www.who.int/tools/flunet). We selected 20 countries of similar size and with relatively comprehensive influenza records. We selected 16 European countries to represent the temperate region (Austria; Belgium; Bosnia and Herzegovina; Croatia; Czech; France; Germany; Hungary; Italy; Luxembourg; Netherlands; Poland; Romania; Slovakia; Slovenia; UK) and 4 South-East Asian countries to represent the tropical region (Cambodia; Laos; Thailand; Vietnam). We used the geographical center coordinates (ui, vi), i = 1, …, 20 of each country as the geographical coordinates. The dataset contains the number of positive cases Yi,t and total number of tests Ni,t in country i = 1, …, 20 during month t. The temperature Xi1,t (degrees Celsius) and amount of precipitation Xi2,t (mm/month) during month t in country i were obtained from CRUCY (Harris et al., 2014).

The countries we included show distinct patterns of influenza activities. Figure 7 shows, for the UK and Thailand, the monthly influenza positivity rate, temperature, precipitation and the corresponding wavelet analysis of the periodicity of influenza activity. In the UK, we can observe that the peak of influenza activity is consistent with the winter season in the UK and a clear negative correlation can be observed between influenza positivity rate and temperature. The relationship visually appears less strong for precipitation. In contrast, in Thailand influenza has a more variable peak time and the relationship with temperature and precipitation is not clear. To further quantify the distinct seasonality of influenza activities between two countries for better understanding of the underlying geographical difference, we conducted a separate (exploratory) wavelet analysis using WaveletComp in R. This decomposes the influenza time series into numerous wavelets, each with a distinct frequency. The degree to which influenza follows a particular periodicity can be assessed by the magnitude of the corresponding wavelet. This reveals clear evidence of periodicity of between 10–15 months in all years in the UK, whereas there is no consistent periodicity in Thailand (Figure 7). This highlights the potential geographical variation of the influenza activities, suggesting a GWR model is appropriate.

Figure 7. Association between influenza, temperature and precipitation in the UK and Thailand. Left panel: Monthly influenza positivity rate, temperature (degrees Celsius) and precipitation (mm/month). Right panel: Wavelet power spectrum (absolute square of the wavelet transform) of the influenza activity. The black line surrounds the significant area (p-value < 0.01), where the power spectrum is significantly larger than the power spectrum of random noise.

In our Bayesian GWR model, we assumed that the number of positive cases Yi,t follows a negative binomial distribution, as in (1.1), except that the total number of tests Ni,t was embedded into the link function and spherical distance was calculated using the haversine formula. The mean and variance of Yi,t are:

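$$\begin{aligned} \mathbb{E}(Y_{i,t} \mid X_{i1,t}, X_{i2,t}) &= \exp\big(\log(N_{i,t}) + \varphi_0(u_i, v_i) + \varphi_1(u_i, v_i) X_{i1,t} + \varphi_2(u_i, v_i) X_{i2,t}\big), \\ \mathrm{Var}(Y_{i,t} \mid X_{i1,t}, X_{i2,t}) &= \mathbb{E}(Y_{i,t} \mid X_{i1,t}, X_{i2,t}) + \theta_i \, \mathbb{E}^2(Y_{i,t} \mid X_{i1,t}, X_{i2,t}). \end{aligned}$$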
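The two ingredients of this model specification, the haversine distance and the offset log-link mean, can be sketched as below. The Earth radius of 6371 km, the latitude/longitude argument order and the function names are assumptions made for illustration; the text does not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nb_mean_and_variance(n_tests, temperature, precipitation, phi, theta):
    """Mean and variance of the positive count under the offset log-link model.

    The offset log(N) inside the exponential is equivalent to multiplying
    the exponentiated linear predictor by the number of tests N.
    """
    phi0, phi1, phi2 = phi
    mean = n_tests * math.exp(phi0 + phi1 * temperature + phi2 * precipitation)
    return mean, mean + theta * mean ** 2


# A quarter of the equator: two points on latitude 0, 90 degrees apart.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
toy_mean, toy_var = nb_mean_and_variance(100, 0.0, 0.0, (0.0, 0.0, 0.0), 0.5)
```

With all coefficients set to zero the mean equals the number of tests, which makes the role of the offset easy to check by hand.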

We considered each of the 20 countries separately, with each of the following geographical bandwidths η = 100, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 10000 and 20000 (kilometres) for a Gaussian kernel. These choices of bandwidth cover a broad range of different assumptions regarding the impact of neighbouring countries. For each country, we randomly left out 50% of the observations to use as a test set. We ran 30 independent MCMC chains, and after discarding the first 3 × 10³ samples, we drew 10⁴ samples from the Bayesian GWR posterior. Figure 8 shows the estimated elpd for each bandwidth across the whole space, suggesting that the optimal choice of the bandwidth from the candidate set is 3000. This bandwidth indicates that there is spatial variation of the underlying association across the countries we selected. Note that the range of 3000 kilometres roughly spans either Europe or South-East Asia but not both, meaning that spatial non-stationarity was detected between these two regions but the spatial non-stationarity is not significant within the two regions.

Figure 8. ELPD against geographical bandwidth. Each boxplot represents the elpd estimates from 30 chains across 20 countries. The red line is the average of the elpd estimates across the 30 chains. The blue dashed line indicates the optimal bandwidth.

We applied the model in all 20 countries independently, using the whole dataset and with bandwidth η = 3000. We ran 20 independent MCMC chains for each country, and retained 10⁴ samples after discarding the first 3 × 10³ samples as burn-in. The pooled samples drawn from the Bayesian GWR posterior for φ for temperature and precipitation were used to estimate the median and the lower and upper bounds of the 95% credible interval (CI) for each country. Figure 9 shows the results, after applying kriging interpolation with ArcGIS Version 10.7. These estimates imply that in European countries a negative association exists between influenza and both temperature and precipitation. That is, influenza transmission tends to be more prevalent during the cold and dry season. In contrast, there is no significant association in the South-East Asian countries. These conclusions are consistent with previous findings (e.g., Tamerius et al., 2013).

Figure 9. Median (middle panel), lower (bottom panel) and upper (top panel) 95% credible intervals for φ for European (left column) and East Asian (right column) countries. Panel (a) shows estimates for φ1 for temperature; panel (b) shows estimates for φ2 for precipitation. The color and map scales are listed at the bottom.

7. Conclusions

We have introduced and extended the SMI model and the candidate distribution selection technique to the field of geographic information science (GIS). Currently, a Bayesian approach for GWR models is only available for the Gaussian linear regression (Subedi et al., 2018; Ma et al., 2020). We therefore elucidate the theoretical validity of applying a Bayesian approach to generalized GWR models and reveal the essential link between the Bayesian GWR model and cutting or manipulating feedback. The motivation of Bayesian GWR model is to decrease the random error at the expense of introducing systematic error. This is realized by incorporating observations from neighbouring locations. The geographically weighted kernel manipulates the information provided by extra observations. The optimal geographical bandwidth η balances the trade-off between two types of error. Our model can also be applied for the Gaussian distribution with θ being the standard deviation. We note that our Bayesian GWR for Gaussian is different to the Bayesian GWR proposed by Ma et al. (2020). This is because our model is based on the weighted log-likelihood while Ma et al. (2020) is based on a weighted least squares approach. Specifically for the Gaussian distribution, these two models may be equivalent if the parameter of interest is only φ because only the exponential term of the likelihood, which is proportional to the residual sum of squares when log-likelihood is used, contains φ. However, they are different if θ is also considered.

GWR models in a frequentist framework require tedious mathematical derivation of the estimator to obtain estimates of the uncertainty of the parameter estimates, which may not be always accessible. In contrast, the Bayesian GWR model provides easily obtainable and straightforward measures of the uncertainty of the parameter estimates given the posterior samples. Furthermore, the Bayesian nature of this model means that prior knowledge can be easily introduced into the model. Unlike the SVC generalized linear model, which may require Monte Carlo sampling in a high-dimensional parameter space, the Bayesian GWR model requires only sampling separately for each location, meaning the dimension of the parameter does not scale with the number of locations, regardless of the generalized linear model used. Regarding computation, unlike the SVC model and other standard Bayesian spatial methods which require sequential sampling of parameters for all locations, the Bayesian GWR model can easily benefit from the availability of parallelization due to the separate inference for each location.

While most GWR models have considered spatially smooth parameters (coefficients), the GWR literature has not previously considered the more general case when some of the parameters are locally unique but not spatially smooth e.g. linear regression, negative binomial regression or beta regression involving a parameter akin to θi in (1.1). Hence, our model can be viewed as an extension of the conventional GWR models that is able to simultaneously deal with (1) spatially smooth and (2) locally unique but not spatially smooth parameters. The case when all parameters vary spatially-smoothly can be incorporated into our framework by including all parameters in the φ vector. However, it is important to note that when θi = ∅, the model no longer falls into the standard SMI framework.

The SMI model was previously only established for the two-module case, i.e. with a single cut. In this study, we extend it to a special case of multiple cuts, in which information from suspect modules is manipulated via a deterministic functional form controlled by a single kernel bandwidth.

Several limitations of the current model are left for future investigation. First, the current model selects the optimal bandwidth using cross-validation. This can be computationally expensive since it requires multiple partitions of the set of observations Yi,1:m for each location i. Second, although the current model can infer the parameter θ, this inference may suffer from insufficient observations when m is small because the inference of θ only depends on observations from the location of interest as shown in (3.3). Third, our model uses a globally fixed geographical bandwidth. This could be problematic when the true data generating process varies considerably within some areas but only varies to a small degree within other areas; or when some elements of the regression coefficient φ have a large geographical variation whereas other elements of φ have a small geographical variation. Spatially-varying bandwidth or parameter-specific distance metrics have been proposed for standard GWR models (Leong and Yue, 2017; Fotheringham et al., 2017; Lu et al., 2017; Hu et al., 2021), but the extension of these methods within a Bayesian framework is not straightforward computationally because a basic implementation would involve repeated evaluation of the geographically weighted kernel for all locations. Fourth, it would be appropriate in some applications to account for temporal effects, rather than ignoring potential temporal non-stationarity as we do in the current framework. However, while the idea is intuitive, the extension of the methodology is not straightforward because the scale of geography and time are different. This leads to a more involved bandwidth selection process, the translation of which to the Bayesian setting is not immediate, especially given our use of cross-validation and MCMC.

Supplementary Material

Supplementary Material for “Generalized Geographically Weighted Regression Model within a Modularized Bayesian Framework” (DOI: 10.1214/22-BA1357SUPP;.pdf).


Acknowledgments

The data analysed and the code used in this study are freely available at https://github.com/MathBilibili/Bayesian-geographically-weighted-regression.

For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

References

  1. Afroughi S, Faghihzadeh S, Khaledi MJ, Motlagh MG, Hajizadeh E. Analysis of clustered spatially correlated binary data using autologistic model and Bayesian method with an application to dental caries of 3-5-year-old children. Journal of Applied Statistics. 2011;38(12):2763–2774. doi: 10.1080/02664763.2011.570315. MR2859833 doi: https://doi.org/10.1080/02664763.2011.570315. 2. [DOI] [Google Scholar]
  2. Alquier P, Ridgway J, Chopin N. On the properties of variational approximations of Gibbs posteriors. Journal of Machine Learning Research. 2016;17(236):1–41. URL http://jmlr.org/papers/v17/15-290.htmlMR359517310. [Google Scholar]
  3. Arendt PD, Apley DW, Chen W. Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. Journal of Mechanical Design. 2012;134(10):100908. doi: 10.1115/1.4007390. 6. [DOI] [Google Scholar]
  4. Azziz Baumgartner E, Dao CN, Nasreen S, Bhuiyan MU, Mah-E-Muneer S, Mamun AA, Sharker MAY, Zaman RU, Cheng P-Y, Klimov AI, Wid-dowson M-A, et al. Seasonality, timing, and climate drivers of influenza activity worldwide. The Journal of Infectious Diseases. 2012;206(6):838–846. doi: 10.1093/infdis/jis467. 21. [DOI] [PubMed] [Google Scholar]
  5. Banerjee S, Gelfand AE, Finley AO, Sang H. Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2008;70(4):825–848. doi: 10.1111/j.1467-9868.2008.00663.x. MR2523906 doi: https://doi.org/10.1111/j.1467-9868.2008.00663.x. 3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Berrocal VJ, Gelfand AE, Holland DM. A spatio-temporal downscaler for output from numerical models. Journal of Agricultural, Biological, and Environmental Statistics. 2010;15(2):176–197. doi: 10.1007/s13253-009-0004-z. MR2787270. 2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bhattacharya A, Pati D, Yang Y. Bayesian fractional posteriors. The Annals of Statistics. 2019;47(1):39–66. doi: 10.1214/18-AOS1712. MR3909926. doi: https://doi.org/10.1214/18-AOS1712, 5 11. [DOI] [Google Scholar]
  8. Bissiri PG, Holmes CC, Walker SG. A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2016;78(5):1103–1130. doi: 10.1111/rssb.12158. MR3557191 doi: https://doi.org/10.1111/rssb.12158. 4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Biswas A, Roy T, Majumder S, Basu A. A new weighted likelihood approach. Stat. 2015;4(1):97–107. doi: 10.1002/sta4.80. MR3405393. doi: https://doi.org/10.1002/sta4.80. 5. [DOI] [Google Scholar]
  10. Blangiardo M, Hansell A, Richardson S. A Bayesian model of time activity data to investigate health effect of air pollution in time series studies. Atmospheric Environment. 2011;45(2):379–386. URL http://www.sciencedirect.com/science/article/pii/S13522310100086426. [Google Scholar]
  11. Brunsdon C, Fotheringham AS, Charlton ME. Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis. 1996;28(4):281–298. doi: 10.1111/j.1538-4632.1996.tb00936.x3. 17. [DOI] [Google Scholar]
  12. Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95(451):888–902. doi: 10.1080/01621459.2000.10474280. MR1804446.doi: https://doi.org/10.2307/2669472. 5. [DOI] [Google Scholar]
  13. Carmona C, Nicholls G. In: Chiappa S, Calandra R, editors. Semi-modular inference: Enhanced learning in multi-modular models by tempering the influence of components; Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics; Proceedings of Machine Learning Research; 2020. pp. 4226–4235. PMLR. MR4147553. 6, 8. [Google Scholar]
  14. Chen VY-J, Deng W-S, Yang T-C, Matthews SA. Geographically weighted quantile regression (GWQR): An application to U.S. mortality data. Geographical Analysis. 2012;44(2):134–150. doi: 10.1111/j.1538-4632.2012.00841.x. MR4158003.
  15. Chernozhukov V, Hong H. An MCMC approach to classical estimation. Journal of Econometrics. 2003;115(2):293–346. doi: 10.1016/S0304-4076(03)00100-3. URL http://www.sciencedirect.com/science/article/pii/S0304407603001003. MR1984779.
  16. Chong KC, Lee TC, Bialasiewicz S, Chen J, Smith DW, Choy WS, Krajden M, Jalal H, Jennings L, Alexander B, et al. Association between meteorological variations and activities of influenza A and B across different climate zones: A multi-region modelling analysis across the globe. Journal of Infection. 2020;80(1):84–98. doi: 10.1016/j.jinf.2019.09.013.
  17. Cox NJ, Subbarao K. Global epidemiology of influenza: Past and present. Annual Review of Medicine. 2000;51(1):407–421. doi: 10.1146/annurev.med.51.1.407.
  18. da Silva AR, de Oliveira Lima A. Geographically weighted beta regression. Spatial Statistics. 2017;21:279–303. doi: 10.1016/j.spasta.2017.07.011. URL http://www.sciencedirect.com/science/article/pii/S2211675317300179. MR3692189.
  19. da Silva AR, Rodrigues TCV. Geographically weighted negative binomial regression—incorporating overdispersion. Statistics and Computing. 2014;24(5):769–783. doi: 10.1007/s11222-013-9401-9. MR3229696.
  20. Dalalyan A, Tsybakov AB. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning. 2008;72(1-2):39–61.
  21. Dambon JA, Sigrist F, Furrer R. Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction. Spatial Statistics. 2021;41:100470. doi: 10.1016/j.spasta.2020.100470. URL https://www.sciencedirect.com/science/article/pii/S2211675320300646. MR4176952.
  22. Deyle ER, Maher MC, Hernandez RD, Basu S, Sugihara G. Global environmental drivers of influenza. Proceedings of the National Academy of Sciences. 2016;113(46):13081–13086. doi: 10.1073/pnas.1607747113. URL https://www.pnas.org/content/113/46/13081.
  23. Duan S-B, Li Z-L. Spatial downscaling of MODIS land surface temperatures using geographically weighted regression: Case study in Northern China. IEEE Transactions on Geoscience and Remote Sensing. 2016;54(11):6458–6469.
  24. Dunson DB, Taylor JA. Approximate Bayesian inference for quantiles. Journal of Nonparametric Statistics. 2005;17(3):385–400. doi: 10.1080/10485250500039049. MR2129840.
  25. Finkelman BS, Viboud C, Koelle K, Ferrari MJ, Bharti N, Grenfell BT. Global patterns in seasonal activity of influenza A/H3N2, A/H1N1, and B from 1997 to 2005: Viral coexistence and latitudinal gradients. PLOS ONE. 2007;2(12):1–10. doi: 10.1371/journal.pone.0001296.
  26. Finley AO, Banerjee S, Carlin BP. spBayes: an R package for univariate and multivariate hierarchical point-referenced spatial models. Journal of Statistical Software. 2007;19(4):1. doi: 10.18637/jss.v019.i04. MR2701470.
  27. Fotheringham AS, Charlton M, Brunsdon C. The geography of parameter space: an investigation of spatial non-stationarity. International Journal of Geographical Information Systems. 1996;10(5):605–627. doi: 10.1080/02693799608902100.
  28. Fotheringham AS, Yang W, Kang W. Multiscale geographically weighted regression (MGWR). Annals of the American Association of Geographers. 2017;107(6):1247–1265. doi: 10.1080/24694452.2017.1352480.
  29. Frank JM, Massman WJ, Ewers BE, Williams DG. Bayesian analyses of 17 winters of water vapor fluxes show bark beetles reduce sublimation. Water Resources Research. 2019;55(2):1598–1623. doi: 10.1029/2018WR023054.
  30. Friel N, Pettitt AN. Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2008;70(3):589–607. doi: 10.1111/j.1467-9868.2007.00650.x. MR2420416.
  31. Fuglstad G-A, Lindgren F, Simpson D, Rue H. Exploring a new class of non-stationary spatial Gaussian random fields with varying local anisotropy. Statistica Sinica. 2015;25(1):115–133. URL http://www.jstor.org/stable/24311007. MR3328806.
  32. Gelfand AE, Banerjee S. Bayesian modeling and analysis of geostatistical data. Annual Review of Statistics and Its Application. 2017;4(1):245–266. doi: 10.1146/annurev-statistics-060116-054155.
  33. Gelfand AE, Kim H-J, Sirmans CF, Banerjee S. Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association. 2003;98(462):387–396. doi: 10.1198/016214503000170. MR1995715.
  34. Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Statistics and Computing. 2014;24(6):997–1016. doi: 10.1007/s11222-013-9416-2. MR3253850.
  35. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;6(6):721–741. doi: 10.1109/tpami.1984.4767596.
  36. Grünwald P, van Ommen T. Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Analysis. 2017;12(4):1069–1103. doi: 10.1214/17-BA1085. MR3724979.
  37. Guo L, Ma Z, Zhang L. Comparison of bandwidth selection in application of geographically weighted regression: a case study. Canadian Journal of Forest Research. 2008;38(9):2526–2534. doi: 10.1139/X08-091.
  38. Harris I, Jones P, Osborn T, Lister D. Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. International Journal of Climatology. 2014;34(3):623–642. doi: 10.1002/joc.3711.
  39. Holmes CC, Walker SG. Assigning a value to a power likelihood in a general Bayesian model. Biometrika. 2017;104(2):497–503. doi: 10.1093/biomet/asx010. MR3698270.
  40. Hu F, Zidek JV. The weighted likelihood. Canadian Journal of Statistics. 2002;30(3):347–371. doi: 10.2307/3316141. MR1944367.
  41. Hu X, Lu Y, Zhang H, Jiang H, Shi Q. Selection of the bandwidth matrix in spatial varying coefficient models to detect anisotropic regression relationships. Mathematics. 2021;9(18). URL https://www.mdpi.com/2227-7390/9/18/2343.
  42. Jacob PE, Murray LM, Holmes CC, Robert CP. Better together? Statistical learning in models made of modules. arXiv preprint. 2017; arXiv:1708.08719.
  43. Jiang W, Tanner MA. Gibbs posterior for variable selection in high-dimensional classification and data mining. The Annals of Statistics. 2008;36(5):2207–2231. doi: 10.1214/07-AOS547. MR2458185.
  44. Kaplan D, Chen J. A two-step Bayesian approach for propensity score analysis: Simulations and case study. Psychometrika. 2012;77(3):581–609. doi: 10.1007/s11336-012-9262-8. MR2943114.
  45. Lam TT, Tang JW, Lai FY, Zaraket H, Dbaibo G, Bialasiewicz S, Tozer S, Heraud J-M, Drews SJ, Hachette T, et al. Comparative global epidemiology of influenza, respiratory syncytial and parainfluenza viruses, 2010-2015. Journal of Infection. 2019;79(4):373–382. doi: 10.1016/j.jinf.2019.07.008.
  46. Lee BS, Haran M. PICAR: An efficient extendable approach for fitting hierarchical spatial models. Technometrics. 2022;64(2):187–198. doi: 10.1080/00401706.2021.1933596. MR4410913.
  47. Leong Y-Y, Yue JC. A modification to geographically weighted regression. International Journal of Health Geographics. 2017;16(1):11. doi: 10.1186/s12942-017-0085-9. MR4158003.
  48. Lever G, Laviolette F, Shawe-Taylor J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science. 2013;473:4–28. doi: 10.1016/j.tcs.2012.10.013. Special Issue on Algorithmic Learning Theory. URL http://www.sciencedirect.com/science/article/pii/S0304397512009346. MR3015336.
  49. Li F, Sang H. Spatial homogeneity pursuit of regression coefficients for large datasets. Journal of the American Statistical Association. 2019;114(527):1050–1062. doi: 10.1080/01621459.2018.1529595. MR4011757.
  50. Li Y, Reeves RM, Wang X, Bassat Q, Brooks WA, Cohen C, Moore DP, Nunes M, Rath B, Campbell H, et al. Global patterns in monthly activity of influenza virus, respiratory syncytial virus, parainfluenza virus, and metapneumovirus: a systematic analysis. The Lancet Global Health. 2019;7(8):e1031–e1045. doi: 10.1016/S2214-109X(19)30264-5.
  51. Li Z, Fotheringham AS. Computational improvements to multi-scale geographically weighted regression. International Journal of Geographical Information Science. 2020;34(7):1378–1397. doi: 10.1080/13658816.2020.1720692. MR4158003.
  52. Lin P-S. Estimating equations for separable spatial-temporal binary data. Environmental and Ecological Statistics. 2010;17(4):543–557. doi: 10.1007/s10651-009-0117-0. MR2756363.
  53. Liu F, Bayarri M, Berger J, et al. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis. 2009;4(1):119–150. doi: 10.1214/09-BA404. MR2486241.
  54. Liu Y, Goudie RJB. Stochastic approximation cut algorithm for inference in modularized Bayesian models. Statistics and Computing. 2022;32(1):1–15. doi: 10.1007/s11222-021-10070-2. MR4350200.
  55. Liu Y, Lam K-F, Wu JT, Lam TT-Y. Geographically weighted temporally correlated logistic regression model. Scientific Reports. 2018;8(1):1–14. doi: 10.1038/s41598-018-19772-6.
  56. Liu Y, Goudie RJB. Supplementary Material for “Generalized Geographically Weighted Regression Model within a Modularized Bayesian Framework”. Bayesian Analysis. 2023. doi: 10.1214/22-BA1357.
  57. Lowen AC, Mubareka S, Steel J, Palese P. Influenza virus transmission is dependent on relative humidity and temperature. PLOS Pathogens. 2007;3(10):1–7. doi: 10.1371/journal.ppat.0030151.
  58. Lowen AC, Steel J. Roles of humidity and temperature in shaping influenza seasonality. Journal of Virology. 2014;88(14):7692–7695. doi: 10.1128/JVI.03544-13. URL https://jvi.asm.org/content/88/14/7692.
  59. Lu B, Brunsdon C, Charlton M, Harris P. Geographically weighted regression with parameter-specific distance metrics. International Journal of Geographical Information Science. 2017;31(5):982–998. doi: 10.1080/13658816.2016.1263731.
  60. Lunn D, Best N, Spiegelhalter D, Graham G, Neuenschwander B. Combining MCMC with ‘sequential’ PKPD modelling. Journal of Pharmacokinetics and Pharmacodynamics. 2009;36(1):19–38. doi: 10.1007/s10928-008-9109-1.
  61. Luo ZT, Sang H, Mallick B. A Bayesian contiguous partitioning method for learning clustered latent variables. Journal of Machine Learning Research. 2021;22(37):1–52. URL http://jmlr.org/papers/v22/20-136.html. MR4253730.
  62. Ma Z, Xue Y, Hu G. Geographically weighted regression analysis for spatial economics data: A Bayesian recourse. International Regional Science Review. 2020;44(5):582–604. doi: 10.1177/0160017620959823.
  63. Markatou M. Mixture models, robustness, and the weighted likelihood methodology. Biometrics. 2000;56(2):483–486. doi: 10.1111/j.0006-341x.2000.00483.x.
  64. Marques I, Klein N, Kneib T. Non-stationary spatial regression for modelling monthly precipitation in Germany. Spatial Statistics. 2020;40:100386. doi: 10.1016/j.spasta.2019.100386. URL http://www.sciencedirect.com/science/article/pii/S221167531930137X. MR4181138.
  65. Martin R, Mess R, Walker SG. Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli. 2017;23(3):1822–1847. doi: 10.3150/15-BEJ797. MR3624879.
  66. Mayfield HJ, Lowry JH, Watson CH, Kama M, Nilles EJ, Lau CL. Use of geographically weighted logistic regression to quantify spatial variation in the environmental and sociodemographic drivers of leptospirosis in Fiji: a modelling study. The Lancet Planetary Health. 2018;2(5):e223–e232. doi: 10.1016/S2542-5196(18)30066-4. URL https://www.sciencedirect.com/science/article/pii/S2542519618300664.
  67. McCandless LC, Douglas IJ, Evans SJ, Smeeth L. Cutting feedback in Bayesian regression adjustment for the propensity score. The International Journal of Biostatistics. 2010;6(2):16. doi: 10.2202/1557-4679.1205. MR2602559.
  68. Miller JW, Dunson DB. Robust Bayesian inference via coarsening. Journal of the American Statistical Association. 2019;114(527):1113–1125. doi: 10.1080/01621459.2018.1469995. MR4011766.
  69. Mohammed S, Ravikumar V, Warner E, Patel S, Bakas S, Rao A, Jain R. Quantifying T2-FLAIR mismatch using geographically weighted regression and predicting molecular status in lower-grade gliomas. American Journal of Neuroradiology. 2022;43(1):33–39. doi: 10.3174/ajnr.A7341. URL http://www.ajnr.org/content/43/1/33.
  70. Mu J, Wang G, Wang L. Estimation and inference in spatially varying coefficient models. Environmetrics. 2018;29(1):e2485. doi: 10.1002/env.2485. MR3749556.
  71. Nakaya T, Fotheringham AS, Brunsdon C, Charlton M. Geographically weighted Poisson regression for disease association mapping. Statistics in Medicine. 2005;24(17):2695–2717. doi: 10.1002/sim.2129. MR2196209.
  72. Paez MS, Gamerman D, De Oliveira V. Interpolation performance of a spatio-temporal model with spatially varying coefficients: application to PM10 concentrations in Rio de Janeiro. Environmental and Ecological Statistics. 2005;12(2):169–193. doi: 10.1007/s10651-005-1040-7. MR2144400.
  73. Plummer M. Cuts in Bayesian graphical models. Statistics and Computing. 2015;25(1):37–43. doi: 10.1007/s11222-014-9503-z. MR3304902.
  74. Reich BJ, Fuentes M, Herring AH, Evenson KR. Bayesian variable selection for multivariate spatially varying coefficient regression. Biometrics. 2010;66(3):772–782. doi: 10.1111/j.1541-0420.2009.01333.x. MR2758213.
  75. Rubin DB. For objective causal inference, design trumps analysis. The Annals of Applied Statistics. 2008;2(3):808–840. doi: 10.1214/08-AOAS187. MR2516795.
  76. Subedi N, Zhang L, Zhen Z. Bayesian geographically weighted regression and its application for local modeling of relationships between tree variables. iForest - Biogeosciences and Forestry. 2018;(5):542–552. URL https://iforest.sisef.org/contents/?id=ifor2574-011.
  77. Sugasawa S, Murakami D. Spatially clustered regression. Spatial Statistics. 2021;44:100525. doi: 10.1016/j.spasta.2021.100525. URL https://www.sciencedirect.com/science/article/pii/S221167532100035X. MR4277047.
  78. Tamerius JD, Shaman J, Alonso WJ, Bloom-Feshbach K, Uejio CK, Comrie A, Viboud C. Environmental predictors of seasonal influenza epidemics across temperate and tropical climates. PLOS Pathogens. 2013;9(3):1–12. doi: 10.1371/journal.ppat.1003194.
  79. Tasyurek M, Celik M. RNN-GWR: A geographically weighted regression approach for frequently updated data. Neurocomputing. 2020;399:258–270. URL https://www.sciencedirect.com/science/article/pii/S0925231220302484.
  80. Tobler WR. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970;46(sup1):234–240.
  81. Utazi C, Thorley J, Alegana V, Ferrari M, Nilsen K, Takahashi S, Metcalf C, Lessler J, Tatem A. A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping. Statistical Methods in Medical Research. 2019;28(10-11):3226–3241. doi: 10.1177/0962280218797362. MR4002695.
  82. Viboud C, Alonso WJ, Simonsen L. Influenza in tropical regions. PLOS Medicine. 2006;3(4):e89. doi: 10.1371/journal.pmed.0030089.
  83. Walker S, Hjort NL. On Bayesian consistency. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(4):811–821. doi: 10.1111/1467-9868.00314.
  84. Wang S, Shi C, Fang C, Feng K. Examining the spatial variations of determinants of energy-related CO2 emissions in China at the city level using Geographically Weighted Regression Model. Applied Energy. 2019;235:95–105. URL https://www.sciencedirect.com/science/article/pii/S0306261918316520.
  85. Windle MJS, Rose GA, Devillers R, Fortin M-J. Exploring spatial non-stationarity of fisheries survey data using geographically weighted regression (GWR): an example from the Northwest Atlantic. ICES Journal of Marine Science. 2009;67(1):145–154. doi: 10.1093/icesjms/fsp224.
  86. Wu D. Spatially and temporally varying relationships between ecological footprint and influencing factors in China’s provinces using Geographically Weighted Regression (GWR). Journal of Cleaner Production. 2020;261:121089. URL https://www.sciencedirect.com/science/article/pii/S0959652620311367.
  87. Wu S, Wang Z, Du Z, Huang B, Zhang F, Liu R. Geographically and temporally neural network weighted regression for modeling spatiotemporal non-stationary relationships. International Journal of Geographical Information Science. 2021;35(3):582–608. doi: 10.1080/13658816.2020.1775836.
  88. Yan Y, Huang H-C, Genton MG. Vector autoregressive models with spatially structured coefficients for time series on a spatial grid. Journal of Agricultural, Biological, and Environmental Statistics. 2021;26(3):387–408. doi: 10.1007/s13253-021-00444-4. MR4292794.
  89. Zellner A. Optimal information processing and Bayes’s theorem. The American Statistician. 1988;42(4):278–280. doi: 10.1080/00031305.1988.10475585; 10.2307/2685143. MR0971095.
  90. Zhang T. Information-theoretic upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory. 2006;52(4):1307–1321. doi: 10.1109/TIT.2005.864439. MR2241190.
  91. Zhao P, Yang H-C, Dey DK, Hu G. Bayesian spatial homogeneity pursuit regression for count value data. 2020. URL https://arxiv.org/abs/2002.06678.
  92. Zhu J, Huang H-C, Wu J. Modeling spatial-temporal binary data using Markov random fields. Journal of Agricultural, Biological, and Environmental Statistics. 2005;10(2):212–225. URL http://www.jstor.org/stable/27595556.
  93. Zigler CM, Dominici F. Uncertainty in propensity score estimation: Bayesian methods for variable selection and model-averaged causal effects. Journal of the American Statistical Association. 2014;109(505):95–107. doi: 10.1080/01621459.2013.869498. MR3180549.
  94. Zigler CM, Watts K, Yeh RW, Wang Y, Coull BA, Dominici F. Model feedback in Bayesian propensity score estimation. Biometrics. 2013;69(1):263–273. doi: 10.1111/j.1541-0420.2012.01830.x. MR3058073.
