PLOS One. 2020 May 20;15(5):e0233019. doi: 10.1371/journal.pone.0233019

Comparing Bayesian spatial models: Goodness-of-smoothing criteria for assessing under- and over-smoothing

Earl W Duncan*, Kerrie L Mengersen
Editor: Qiang Zeng
PMCID: PMC7239453  PMID: 32433653

Abstract

Background

Many methods of spatial smoothing have been developed, for both point data as well as areal data. In Bayesian spatial models, this is achieved by purposefully designed prior(s) or smoothing functions which smooth estimates towards a local or global mean. Smoothing is important for several reasons, not least of all because it increases predictive robustness and reduces uncertainty of the estimates. Despite the benefits of smoothing, this attribute is all but ignored when it comes to model selection. Traditional goodness-of-fit measures focus on model fit and model parsimony, but neglect “goodness-of-smoothing”, and are therefore not necessarily good indicators of model performance. Comparing spatial models while taking into account the degree of spatial smoothing is not straightforward because smoothing and model fit can be viewed as opposing goals. Over- and under-smoothing of spatial data are genuine concerns, but have received very little attention in the literature.

Methods

This paper demonstrates the problem with spatial model selection based solely on goodness-of-fit by proposing several methods for quantifying the degree of smoothing. Several commonly used spatial models are fit to real data, and subsequently compared using the goodness-of-fit and goodness-of-smoothing statistics.

Results

The proposed goodness-of-smoothing statistics show substantial agreement in the task of model selection, and tend to avoid models that over- or under-smooth. Conversely, the traditional goodness-of-fit criteria often don’t agree, and can lead to poor model choice. In particular, the well-known deviance information criterion tended to select under-smoothed models.

Conclusions

Some of the goodness-of-smoothing methods may be improved with modifications and better guidelines for their interpretation. However, these proposed goodness-of-smoothing methods offer researchers a solution to spatial model selection which is easy to implement. Moreover, they highlight the danger in relying on goodness-of-fit measures when comparing spatial models.

Introduction

Spatial smoothing is a technique used when modelling the underlying data-generating process of spatial data to account for spatial autocorrelation, as expressed by Tobler’s first law of geography: “… near things are more related than distant things” [1]. Neglecting spatial autocorrelation is akin to ignoring the order of time-series data, leading to greater uncertainty about the model parameters, poorer predictions, and misguided inference. Conversely, applying spatial smoothing represents the statistical uncertainty of model parameters more appropriately, improves predictions, and provides more insight into the layers of the underlying data-generating process, similar to how the trend and seasonality help explain layers of decomposed time series data [2, 3].

Many methods of spatial smoothing have been developed, for both point data as well as areal data, including linear or non-linear functions based on distances, loess smoothing [4], spline functions [5], kriging [6] and Gaussian process priors [7, 8], and empirical Bayes approaches to spatial smoothing [9]. For an overview of smoothing techniques, see Kafadar [2] and Tiwari and Rushton [10]. Empirical Bayes methods, which smooth estimates of points on a spatial surface towards the global mean based on a distribution whose parameters are fixed a priori, gained popularity as computing power increased and parameter estimation techniques became more widely accessible. Fully Bayes methods have also been proposed [10, 11], where the surface is typically estimated by one or more spatially varying parameters with purposefully chosen prior distributions to account for the spatial autocorrelation.

Spatial smoothing plays an important role in a broad range of applications, including the assessment of feature significance [12] and seafloor classification in geostatistics [13], monitoring of groundwater contaminant plumes [14], image processing [11, 15], the calorific value distributions in coal facies [16], and analysis of traffic accidents [17, 18] to name a few.

Notwithstanding the benefits associated with spatial smoothing, there seems to be a small but growing awareness of the dangers associated with under- and over-smoothing. While over-smoothing causes genuine deviations from the local or global mean to be obscured [13, 19], under-smoothing is equally undesirable as it exaggerates features in the surface, making them indistinguishable from background noise, which defeats the point of spatial smoothing. The negative effects of under-smoothing are far less widely discussed in spatial modelling than in time series modelling, where the link between under-smoothing and residual autocorrelation, large prediction errors, and biased hypothesis tests has been articulated [20, 21].

Despite the growing awareness, there is very little guidance in the literature on how to assess the appropriateness of the level of spatial smoothing, and to our knowledge, efforts to account for such smoothing in model selection are non-existent. The latter is evident from the widespread use of model selection criteria like the Bayesian information criterion (BIC) [22], the deviance and related deviance information criterion (DIC) [23], and the widely applicable information criterion (WAIC) [24] to compare spatial models, ironically even in studies which aimed to assess the presence of under- or over-smoothing (see for example, Rodrigues and Assunção [25] and Law [26]). The problem is that these criteria are designed to quantify goodness-of-fit (GoF), that is, the discrepancy between the observed data and the predicted values from the model, while penalising for over-fitting (model complexity), but they fail to account for the spatial dependencies [27] and the effect that spatial smoothing has on model fit. Put another way, the problem of model selection can be viewed as an optimisation problem with several competing objective functions: in addition to GoF, model parsimony and predictive capability, spatial models necessitate an additional objective function: “goodness-of-smoothing” (GoS). Hence not only should a model which under- or over-smooths be given less preference, but a model with an appropriate amount of smoothing should be preferred over a model without any smoothing, even though it is likely to have a poorer GoF to the observed data.

In the context of Bayesian spatial modelling, spatial smoothing is typically implemented through a prior distribution using spatial weights to define the spatial dependencies; see Cramb et al. [28] for a critical review of popular Bayesian spatial models. One of the most common prior distributions for spatial random effects (SREs) in a Bayesian spatial model is the intrinsic conditional autoregressive (ICAR) prior [19, 29]. The BYM model [11] makes use of the ICAR prior, but also includes unstructured (independent) SREs so that the estimated risks are smoothed towards a local mean as well as a global mean [26]. The two random effects are henceforth referred to as the structured (SSRE) and unstructured spatial random effects (USRE). In response to the complexity of having two sets of SREs, Leroux et al. [30] proposed a model in which the SREs were a weighted mixture of the USRE and SSRE, the latter modelled by the ICAR prior. Although the BYM and Leroux models remain popular, especially in epidemiology [25], some concern about the potential for over-smoothing has been expressed (for example, see Smith et al. [19], Law [26], Kandhasamy and Ghosh [31], Lawson and Clark [32], Best et al. [33], and Cramb et al. [28]).

This paper has three aims: 1) to demonstrate that reliance on common GoF criteria for spatial model selection is inadequate; 2) to propose several methods for quantifying the degree of smoothing; and 3) to compare these methods against GoF statistics on real data. These methods were developed within the context of disease-mapping using areal data in a Bayesian framework. However, some of these methods were inspired from methodology outside this field and will equally be applicable to problems in other contexts, such as geostatistics; other methods are more specific to the disease-mapping context, but could potentially be extended to a broader class of models and problems with little modification.

Without loss of generality, we impose three constraints on our study. The first is the range of models considered. We limit our analysis to the BYM and Leroux models for several reasons: they are well known and widely used; the ICAR model, which underpins both the BYM and Leroux models, has been criticised for being susceptible to over-smoothing; and as the ensuing analysis reveals, a wide range of models with varying degrees of smoothing can be achieved simply via changes to the hyperprior specification. For the purpose of quantifying and comparing different degrees of smoothing, this is adequate. Moreover, given the large influence of the hyperpriors on smoothing, the choice of model seems secondary. More broadly, other approaches such as models based on Gaussian process priors will suffer similar issues with respect to under- and over-smoothing.

The second constraint is that we do not investigate the effect of spatial smoothing parameters or spatial weights on the degree of smoothing. Typically, in models such as the BYM and Leroux, spatial weights are based on first-order adjacency. That is, each pair of spatial units (areas) is assigned a weight of 1 if the units are considered (typically geographically) adjacent and zero otherwise. This simplifies the spatial covariance function substantially and improves computation without substantial loss of information. However, many other formulations have been explored (see for example Earnest et al. [27], Law [26], and Duncan et al. [34]). Not only has this issue already received much attention, but the conclusions suggest that binary first-order adjacency weights are often a good choice anyway.

Third, the task of trying to determine the optimal amount of smoothing for a given model is not considered. Again, this has already been addressed in the literature (e.g. Evers et al. [35]), but more importantly, this task is impeded by the lack of guidance on how the degree of smoothing can be quantified.

The structure of this paper is as follows. The Methods section describes the Bayesian spatial models and introduces an important quantity derived from the model parameters which is subsequently used in the analysis. Also described in this section are five approaches to quantifying smoothing and three commonly used GoF measures, as well as the two spatial datasets. The Results section reports the parameter estimates and evaluates the GoF and GoS criteria, which are subsequently used to compare the models. These results and limitations of this study are examined in the Discussion.

Methods

Bayesian spatial models

For specificity, we consider two spatial models for area-level count data that are commonly used in epidemiological modelling. For each model, the data are assumed to follow a Poisson distribution

$$
y_i \sim \text{Pois}\left(E_i e^{\mu_i}\right)
$$

where $y_i$ and $E_i$ are the observed and expected counts respectively, and $\mu_i$ is the log relative risk for the $i$th area. Assuming $k$ covariates and some weakly informative priors, the Leroux model [30] is specified as

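$$
\begin{aligned}
\mu_i &= \beta^T x_i + s_i \\
\beta_k &\sim N(0, \sigma^2) \\
s_i \mid s_{\setminus i} &\sim N\left( \frac{\rho \sum_j w_{ij} s_j}{\rho \sum_j w_{ij} + 1 - \rho},\ \frac{\sigma_s^2}{\rho \sum_j w_{ij} + 1 - \rho} \right) \\
\rho &\sim \text{Unif}(0, 1) \\
\sigma_s^2 &\sim \text{IG}(\alpha, \eta)
\end{aligned}
$$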

and the BYM model [11] is specified as

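$$
\begin{aligned}
\mu_i &= \beta^T x_i + s_i + u_i \\
\beta_k &\sim N(0, \sigma^2) \\
s_i \mid s_{\setminus i} &\sim N\left( \frac{\sum_j w_{ij} s_j}{\sum_j w_{ij}},\ \frac{\sigma_s^2}{\sum_j w_{ij}} \right) \\
u_i &\sim N(0, \sigma_u^2) \\
\sigma_s^2 &\sim \text{IG}(\alpha, \eta) \\
\sigma_u^2 &\sim N(0, 10)^{+}
\end{aligned}
$$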

where $\beta = (\beta_0, \ldots, \beta_k)^T$ are the $k + 1$ regression coefficients, IG denotes the inverse-gamma distribution, parameterised in terms of shape and rate, $N(\cdot)^{+}$ denotes a Normal distribution left-truncated at zero, and all Normal distributions including the truncated distribution are parameterised in terms of mean and variance. The spatial weights $w_{ij}$ were fixed a priori as the binary, first-order adjacency weights, $\sigma^2$ was held fixed at 100, while different combinations of values of $\alpha$ and $\eta$ were used to fit different models with varying degrees of smoothing.
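As a rough illustration of how the Leroux conditional prior behaves, the following Python sketch evaluates the conditional mean and variance of the SRE for one area of a made-up four-area map. The function name `leroux_conditional`, the toy adjacency matrix, and the values of `s`, `rho` and `sigma2_s` are illustrative assumptions and are not taken from the analysis.

```python
import numpy as np

def leroux_conditional(s, W, i, rho, sigma2_s):
    """Mean and variance of s_i given all other s_j under the Leroux prior."""
    w_i = W[i].astype(float).copy()
    w_i[i] = 0.0                           # an area is not its own neighbour
    denom = rho * w_i.sum() + 1.0 - rho
    mean = rho * (w_i @ s) / denom
    var = sigma2_s / denom
    return mean, var

# Hypothetical map of four areas arranged in a line: 0 - 1 - 2 - 3
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
s = np.array([0.4, -0.2, 0.1, 0.3])        # made-up SRE values

# rho = 1 reduces to the ICAR conditional used for s_i in the BYM model
icar_mean, icar_var = leroux_conditional(s, W, i=1, rho=1.0, sigma2_s=0.5)
# rho = 0 gives independent effects centred on zero
indep_mean, indep_var = leroux_conditional(s, W, i=1, rho=0.0, sigma2_s=0.5)
```

In this sketch the two limiting values of `rho` show why the Leroux prior can be read as a mixture of structured and unstructured effects.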

Given the sensitivity to the hyperprior for $\sigma_s^2$, left-truncated Normal (LTN) distributions, $N(\pi, \nu)^{+}$, were also trialled. Other hyperpriors are possible (see Gelman [36] for example), but are not considered here for the sake of brevity. The specific values of $\alpha$, $\eta$, $\nu$, and $\pi$ are included in S1 Table. It should be stressed that these values are not necessarily sensible from a practical standpoint; they were chosen deliberately to induce a set of maps with varying degrees of smoothing to test the methods for quantifying smoothing described below. This yields a total of 4 models each with 12 model variants labelled A through L. While the relationship between the informativeness of a prior distribution and the impact it will have on smoothing is not straightforward, these model variants are approximately ordered in descending order of smoothing intensity.

Extensions of the standardised incidence ratio

In the disease mapping context, the ratio yi/Ei is called the observed or ‘raw’ standardised incidence ratio (SIR). This is usually unstable due to low incidence and/or small populations at risk [10, 30, 37], and thus the goal is to provide a better estimate, given by the relative risk exp(μi), or posterior SIR.

We introduce a new quantity, the covariate-adjusted SIR (CASIR), which is a key component of the methods below,

$$
\text{CASIR}_i = \exp\left(\mu_i - \beta^T x_i\right)
$$

which is equivalent to exponentiating the SRE, exp(si). In the case of the BYM model, the unstructured spatial random effects (USRE) are also subtracted from μi before exponentiating. Similarly, we define the covariate-adjusted raw SIR (CARSIR) as

$$
\text{CARSIR}_i = \frac{y_i}{E_i \exp\left(\beta^T x_i\right)}
$$

where yi/Ei is the raw SIR. As will become apparent, the smoothed SIR surface, given by exp(μi), may not necessarily appear smooth, and paradoxically may appear less smooth when more smoothing is applied, and vice versa. This is because the smoothness exhibited by the SIR depends on the effect of the covariate(s), and their relative contribution to the SIR compared to the SRE. Conversely, the CASIR directly reflects the degree of smoothing.

We justify use of the CASIR over the SRE for two reasons. First, the CASIR is comparable to the SIR, the main parameter of interest in these epidemiological models, by converting the SRE to a ratio scale parameter. Second, it allows a theoretical bound on the potential values of CASIR to be computed, which is a central feature of one of the approaches to quantifying smoothing described below. Taking logarithms of the raw SIR to compute a range for si is not reliable since yi may be zero.
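The two quantities can be sketched numerically as follows. This is a minimal illustration under assumed inputs: the counts, covariate values and parameter values are made up, and `beta` and `s` stand in for posterior point estimates from a Leroux-type model.

```python
import numpy as np

y = np.array([0, 3, 7, 2])                 # observed counts (made up)
E = np.array([1.5, 2.0, 4.0, 2.5])         # expected counts (made up)
X = np.column_stack([np.ones(4), [0.1, 0.4, 0.9, 0.3]])  # intercept + one covariate
beta = np.array([-0.1, 0.5])               # stand-in posterior point estimates
s = np.array([-0.2, 0.1, 0.3, -0.1])       # stand-in SRE estimates

mu = X @ beta + s                          # log relative risk (Leroux form)
raw_sir = y / E                            # raw SIR
casir = np.exp(mu - X @ beta)              # covariate-adjusted SIR, equals exp(s)
carsir = raw_sir / np.exp(X @ beta)        # covariate-adjusted raw SIR
```

Note that the first area has a zero count, so its CARSIR is zero and well defined, whereas the logarithm of its raw SIR would not be.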

Computation

The Leroux model with the IG prior distribution was fit using the R package CARBayes [38] for computational efficiency, while the other three models were fit using WinBUGS [39] via the R package R2WinBUGS [40, 41]. Although CARBayes can fit the BYM model with an IG prior, only the sum of the estimated SREs is provided, whereas separate estimates of the SSRE and USRE are highly valuable for this analysis. These software packages use Markov chain Monte Carlo (MCMC) techniques to estimate the posterior distribution. Although other software is available which should produce very similar results, these packages were chosen for their reliability and convenience in fitting these particular spatial models.

Approaches to quantifying smoothing

There are potentially several ways to quantify the degree of smoothing attained by a given model. To address the second aim of this paper, five ideas are explored. The origins of these ideas and their technical details are described below.

Ratio of variograms

The classical variogram for area i at lag h is given by

$$
\gamma_i(h) = \frac{1}{2 N_i(h)} \sum_{j \sim i} (z_i - z_j)^2
$$

where $N_i(h)$ is the number of areas which are no more distant than the lag $h$ from area $i$, $j \sim i$ denotes all areas $i$ and $j$ which satisfy $d_{ij} < h$, where $d_{ij}$ is the distance between areas $i$ and $j$, and $z_i$ is a measured variable for area $i$ [3, 6]. Instead of using the Great Circle distance between the centroids of each area, we define $d_{ij}$ as the minimum number of boundaries that must be crossed to move from area $i$ to area $j$, as proposed by Knorr-Held and Raßer [42]. This appears to be more appropriate for areal data as it tends to provide smoother and more robust estimates of the variogram, especially for small lag values. Additionally, under this construction, adjacency of areas defines the autocorrelation in the variogram as well as the weights matrix in the modelling.

The variogram, averaged over the areas,

$$
\gamma(h) = \frac{1}{N} \sum_{i=1}^{N} \gamma_i(h),
$$

provides a succinct visual representation of the spatial continuity of the variable z = (z1,…,zN). Plotting the variogram of CASIR against the variogram of the CARSIR may be helpful in assessing the degree of smoothing: a variogram that is too flat indicates over-smoothing, while a variogram that is similar to that for the raw SIR indicates under-smoothing. As a quantitative metric for assessing GoS, we propose the ratio of the variograms for CASIR to CARSIR, averaged over the areas and lag parameter. This can be compared against a user-specified target to determine whether the smoothing is appropriate.
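The variogram ratio can be sketched in Python as below. Several choices here are assumptions rather than details given in the text: areas within lag $h$ are taken to be those with $1 \le d_{ij} \le h$, the ratio is computed per lag and then averaged, and the function names and toy values are hypothetical.

```python
from collections import deque
import numpy as np

def boundary_distances(W):
    """d_ij = minimum number of boundaries crossed, found by breadth-first search."""
    n = len(W)
    D = np.full((n, n), np.inf)
    for src in range(n):
        D[src, src] = 0
        queue = deque([src])
        while queue:
            a = queue.popleft()
            for b in np.flatnonzero(W[a]):
                if np.isinf(D[src, b]):
                    D[src, b] = D[src, a] + 1
                    queue.append(b)
    return D

def variogram(z, D, h):
    """Area-averaged variogram at lag h, using pairs with 1 <= d_ij <= h (assumed)."""
    gammas = []
    for i in range(len(z)):
        nbrs = np.flatnonzero((D[i] >= 1) & (D[i] <= h))
        if nbrs.size:
            gammas.append(np.sum((z[i] - z[nbrs]) ** 2) / (2 * nbrs.size))
    return np.mean(gammas)

def variogram_ratio(casir, carsir, W, lags):
    """Ratio of the CASIR variogram to the CARSIR variogram, averaged over lags."""
    D = boundary_distances(W)
    return np.mean([variogram(casir, D, h) / variogram(carsir, D, h) for h in lags])

# Hypothetical map of four areas in a line, with made-up estimates
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
casir = np.array([0.9, 1.0, 1.1, 1.0])
carsir = np.array([0.5, 1.5, 2.0, 0.6])
ratio = variogram_ratio(casir, carsir, W, lags=[1, 2, 3])
```

A ratio near zero corresponds to a much flatter CASIR variogram than the raw one, and a ratio near one to a surface that has barely been smoothed.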

Kurtosis preservation

Drawing on inspiration from developments in time series analysis, we propose a method based on the work of Rong and Bailis [43]. The authors address the issue of over-smoothing in time series analysis by using a simple moving average smoothing function such that the moving average window size minimises the “roughness” (defined as the standard deviation of the first-order difference series) with the constraint that the kurtosis of the smoothed time series must be greater than or equal to the kurtosis of the original, unsmoothed time series. That is, they aim to smooth a time series as much as possible while preserving kurtosis. The result is that the smoothed time series retains rare large-scale deviations while smoothing out more frequent modestly sized deviations.

This methodology presented in Rong and Bailis [43] not only provides a technique for smoothing, but also a statistic for quantifying smoothness. It is the latter development that is of interest here, since the spatial smoothing is performed as part of the Bayesian modelling. However, spatial dependencies differ from longitudinal dependencies in terms of how individual units (areas or time points) are assumed to interact. As an analogy to first-order differences in time, we consider a first-order neighbourhood approach in space, that is, differences between a measure at a given area and the mean of its first-order neighbours. The roughness is the standard deviation of these differences over all areas.

For a generic spatial variable zi associated with the ith area, the excess kurtosis is defined as

$$
\text{Kurt}(z_i) = \frac{E\left[(z_i - \bar{z}(w_i))^4\right]}{E\left[(z_i - \bar{z}(w_i))^2\right]^2} - 3
$$

where $\bar{z}(w_i)$ is the weighted mean of $\{z_i;\ i = 1, \ldots, N\}$, and $w_i$ is the vector of spatial weights pertaining to the $i$th area. The overall measure of kurtosis is given by averaging over all areas, $i = 1, \ldots, N$. A larger kurtosis implies that the variation is dominated by infrequent and extreme deviations [43].

Note that whether z is defined as the CASIR or SIR, the kurtosis is very similar when compared with their raw counterpart (i.e. CARSIR and raw SIR). However, the roughness can vary substantially, making inference difficult. In our analyses, the SIR was found to be a more reliable measure, which is what is presented here. For consistency, SIR was also used to compute the kurtosis, i.e. zi = SIRi.
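One possible reading of the roughness and kurtosis statistics is sketched below. It is a simplification under stated assumptions: the weighted mean is taken to be the mean of the first-order neighbours under binary weights, and the expectations are approximated by averages over areas. The function names and values are hypothetical.

```python
import numpy as np

def neighbour_differences(z, W):
    """z_i minus the mean of its first-order neighbours (binary weights assumed)."""
    W = np.asarray(W, dtype=float)
    return z - (W @ z) / W.sum(axis=1)

def roughness(z, W):
    """Standard deviation of the neighbourhood differences over all areas."""
    return np.std(neighbour_differences(z, W), ddof=1)

def excess_kurtosis(z, W):
    """Fourth moment over squared second moment of the differences, minus 3."""
    d = neighbour_differences(z, W)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

# Hypothetical map of four areas in a line, with made-up raw and smoothed values
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
z_raw = np.array([0.5, 1.5, 2.0, 0.6])
z_smooth = np.array([0.9, 1.0, 1.1, 1.0])

rough_raw, rough_smooth = roughness(z_raw, W), roughness(z_smooth, W)
# Kurtosis preservation check: smoothed kurtosis should not fall below the raw value
kurtosis_preserved = excess_kurtosis(z_smooth, W) >= excess_kurtosis(z_raw, W)
```

With only four areas the kurtosis estimate is not meaningful; the example is intended only to show the order of the calculations.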

Kappa

Cohen’s kappa statistic [44] has been used previously in the spatial context to compare spatial agreement of patterns and to quantify the magnitude of spatial smoothing (e.g. Sterlacchini et al. [45] and Earnest et al. [27]). The statistic is defined as

$$
\kappa = \frac{\Pr(A_o) - \Pr(A_e)}{1 - \Pr(A_e)}
$$

where $\Pr(A_o)$ and $\Pr(A_e)$ are the observed and expected proportions of agreement between two spatial variables respectively,

$$
\Pr(A_o) = \frac{1}{N} \sum_{i=1}^{C} c_{ii}
$$

$$
\Pr(A_e) = \frac{1}{N} \sum_{k=1}^{C} \left( \sum_{j=1}^{C} c_{kj} \times \frac{\sum_{i=1}^{C} c_{ik}}{N} \right)
$$

and $\{c_{ij}\}$ are the elements of a $C \times C$ confusion matrix formed from the cross-tabulation of the $C$ categories of nominal variables [44]. To cross-tabulate values of continuous variables like the observed and smoothed SIRs, they must first be categorised by specifying “epidemiologically meaningful” thresholds [27, 46]. Following the suggestions of Earnest et al. [27] and Sterlacchini et al. [45], kappa was computed on the quantiles of CASIR and CARSIR using 3 categories (2 cut-offs: 0.25 and 0.75) as well as 5 categories (4 cut-offs: 0.1, 0.3, 0.7, 0.9).
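The calculation described above can be sketched as follows. This is an illustration rather than the code used in the analysis: the function names are hypothetical, each variable is categorised using its own empirical quantiles (an assumption), and the simulated values are made up.

```python
import numpy as np

def categorise(z, probs):
    """Assign each value to a category using the variable's own quantiles as cut-offs."""
    cuts = np.quantile(z, probs)
    return np.digitize(z, cuts)            # categories 0, 1, ..., len(probs)

def cohen_kappa(a, b, n_cat):
    """Cohen's kappa from the confusion matrix of two categorical vectors."""
    c = np.zeros((n_cat, n_cat))
    for i, j in zip(a, b):
        c[i, j] += 1
    n = c.sum()
    p_obs = np.trace(c) / n
    p_exp = np.sum(c.sum(axis=1) * c.sum(axis=0)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)

rng = np.random.default_rng(0)
carsir = rng.lognormal(size=50)                    # made-up raw surface
casir = np.exp(0.3 * np.log(carsir))               # shrunk towards 1, same ranking

probs = [0.25, 0.75]                               # 3 categories, 2 cut-offs
kappa_3 = cohen_kappa(categorise(casir, probs), categorise(carsir, probs), 3)

# Two balanced categories in perfect disagreement
kappa_opposite = cohen_kappa([0, 0, 1, 1], [1, 1, 0, 0], 2)
```

Because the categories are quantile-based, a smoothed surface that preserves the ranking of the raw values yields full agreement in this sketch even though its values have been shrunk considerably.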

In addition to being designed for categorical data, Cohen’s kappa has attracted several criticisms. Interpretation of kappa is not straightforward since its magnitude can be influenced by multiple factors, and it may not be clear which factor(s) is responsible [45, 46]. While there is no consensus on how to interpret kappa, some guidelines have been suggested in the literature (e.g. Landis and Koch [47]). Broadly, kappa values less than or close to zero indicate a lack of agreement, while kappa values close to 1 indicate substantial agreement [45–47]. However, the difficulty of interpreting kappa is exacerbated in the spatial context. The statistic does not take into account the spatial structure of the two variates being compared, and being symmetric, there is no clear “baseline” for assessing agreement. Consequently, there is no unambiguous connection between kappa and the degree of smoothness exhibited by the spatial variables. This problem is illustrated in Fig 1.

Fig 1. Examples of Cohen’s kappa for 2 spatial variables categorised into 2 groups, illustrating the difficulty in interpretation of kappa with respect to degree of spatial smoothing.

The kappa values are a) $\hat{\kappa} = 1$, b) $\hat{\kappa} = 1$, c) $\hat{\kappa} = -1$, and d) $\hat{\kappa} \approx 0.04$.

In Fig 1A and 1B, there is perfect agreement between variables A and B. However, the surfaces in b) are not smooth, so a kappa value close to 1 does not necessarily indicate a high degree of smoothness. In Fig 1C, there is perfect disagreement, yet both surfaces are smooth, so the low kappa value should not be interpreted as a low degree of smoothness. In Fig 1D, kappa is approximately 0.04. Regardless of how this is interpreted, it is not clear how it would apply to surfaces A and B simultaneously.

If one of the two variables being compared is designated as the baseline, then this may help in the interpretation. For example, consider the two variables raw and smoothed SIR. The null hypothesis is that the raw SIR is not smooth. As smoothing increases, the disagreement between these variables will increase, thereby reducing the kappa value. Thus it has been suggested that smaller kappa values indicate greater smoothing [27].

In the absence of more definitive guidelines, the following metric to assess the GoS was devised using the results from the other methods as calibration: $\hat{\kappa} < 0.05$ indicates over-smoothing, $0.05 < \hat{\kappa} < 0.95$ indicates a reasonable degree of smoothing, and $\hat{\kappa} > 0.95$ indicates under-smoothing.

Note that Earnest et al. [27] compute kappa for the raw and smoothed SIR. However, the only covariates included in their models are temporal, not spatial, making these variables more comparable to the CARSIR and CASIR respectively. As explained above when introducing CASIR, it is necessary to remove the effect of spatial covariates when assessing spatial smoothing. Consequently, in this paper, kappa is computed for the estimates of CASIR and CARSIR, treating the latter as the baseline for agreement.

Fraction of spatial variation

Earnest et al. [27] and Law [26] also consider comparing models based on the fraction of spatial variation explained by the model. This is defined as the ratio of the empirical variance captured by the SSRE to the total spatial variation,

$$
\psi = \frac{\text{Var}(s)}{\text{Var}(s) + \text{Var}(u)}
$$

where s and u are the SSRE and USRE in the BYM model respectively–the only model considered by Earnest et al. [27]. As illustrated in Duncan et al. [34], this ratio, albeit using standard deviations rather than variances, is helpful in solving the identifiability issue between s and u, by modifying these random effects according to ψ, which has been applied to the results from all the BYM model variants in this paper. It is not meaningful to compute this ratio again after modification, nor is this ratio applicable to other models which have only one set of SREs, like the Leroux model.

To generalise this concept to all spatial models with a SRE, $s$, we propose redefining the total spatial variation to be $\text{Var}(s) + \text{Var}(\varepsilon)$ where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)^T$ are the model residuals, which for the BYM model includes the unstructured spatial random effect. That is, the residuals for the BYM model are defined as

$$
\varepsilon_i = E_i e^{\mu_i - u_i} - y_i
$$

since the USREs do not contribute to an understanding of the spatial variation but rather represent spatial noise. To compute the ratio, the posterior median for a posterior sample of size M is computed before computing the variance over the areas, i.e.

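$$
\begin{aligned}
\text{Var}(s) &= \frac{1}{N - 1} \sum_{i=1}^{N} \left(s_i^* - \bar{s}^*\right)^2 \\
\text{Var}(\varepsilon) &= \frac{1}{N - 1} \sum_{i=1}^{N} \left(\varepsilon_i^* - \bar{\varepsilon}^*\right)^2
\end{aligned}
$$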

where

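$$
\begin{aligned}
s_i^* &= \underset{m = 1, \ldots, M}{\text{median}} \left\{ s_i^{(m)} \right\} \\
\bar{s}^* &= \frac{1}{N} \sum_{i=1}^{N} s_i^*
\end{aligned}
$$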

and similarly for $\varepsilon_i^*$ and $\bar{\varepsilon}^*$. Whether a small or large fraction of spatial variation is preferred depends on the reason for modelling the SIR [27]. Moreover, it is not obvious what values would be considered small or large in general or in a particular application. Given the lack of guidelines for interpreting this statistic for the purpose of assessing the degree of spatial smoothing, this criterion was not given further consideration when comparing the models. However, the results are reported below for completeness.
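The generalised fraction of spatial variation can be sketched from MCMC output as follows. The arrays `s_draws` and `eps_draws` are simulated stand-ins for posterior draws of the SREs and residuals, and the function name is hypothetical.

```python
import numpy as np

def spatial_fraction(s_draws, eps_draws):
    """Var(s*) / (Var(s*) + Var(eps*)), using posterior medians for each area.

    Both inputs have shape (M, N): M posterior draws for each of N areas.
    """
    s_star = np.median(s_draws, axis=0)
    eps_star = np.median(eps_draws, axis=0)
    var_s = np.var(s_star, ddof=1)          # divisor N - 1, as in the text
    var_eps = np.var(eps_star, ddof=1)
    return var_s / (var_s + var_eps)

rng = np.random.default_rng(1)
M, N = 200, 30
# Simulated draws: area-level centres plus Monte Carlo noise (made-up values)
s_draws = rng.normal(loc=rng.normal(0, 0.5, N), scale=0.1, size=(M, N))
eps_draws = rng.normal(loc=rng.normal(0, 0.2, N), scale=0.1, size=(M, N))
psi = spatial_fraction(s_draws, eps_draws)
```

Taking the median over draws before the variance over areas follows the order of operations described above; reversing the order would generally give a different value.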

Relative position of CASIR

The fifth approach to quantifying smoothing begins with the observation that if no smoothing (i.e. no shrinkage) occurs, then the smoothed SIR and CASIR become more similar to their raw counterparts. As the degree of smoothing increases, each estimate of the SIR is smoothed towards the mean of its neighbours, subject to the model constraints and a priori knowledge imposed by the prior distributions. When the maximum amount of smoothing is applied to area i,

$$
\text{CASIR}_i \to E\left(\text{CASIR}_{j \sim i} \mid y\right)
$$

which approaches 1 as the SRE tends to zero. This does not imply that all areas will be smoothed towards the global mean, since areas may experience different degrees of smoothing. In fact, some areas will undoubtedly be smoothed away from the global mean. Notwithstanding some small deviations due to the use of posterior point estimates and properties of the posterior sample such as convergence and effective sample size, the $\text{CASIR}_i$ estimate will lie somewhere between $\text{CARSIR}_i$ and the posterior mean of its neighbours, $E(\text{CASIR}_{j \sim i} \mid y)$. If the relative position of CASIR at these two extremes is denoted 0 and 1 respectively, then this quantifies the degree of smoothing exhibited by a given area in relative terms. To quantify the overall degree of smoothing for a given model, the distribution of these relative positions is compared against a specified cut-off (see Table 1 for some examples).

Table 1. Cut-offs used to construct the GoS criteria.

| Statistic | Cut-off type | Criteria |
|---|---|---|
| Variogram ratio | (u) | The ratio, averaged over the lag, is between 0.2 and 0.8. |
| | (c) | The ratio, averaged over the lag, is between 0.25 and 0.75. |
| | (pu) | The ratio, averaged over the lag, is between 0.1 and 0.4. |
| Kurtosis preservation | (u) | The kurtosis of CASIR ≥ kurtosis of CARSIR and the roughness of CASIR is less than the minimum roughness + 30%. |
| | (c) | The kurtosis of CASIR ≥ kurtosis of CARSIR and the roughness of CASIR is less than the minimum roughness + 10%. |
| Kappa | (u) | Kappa lies between 0.05 and 0.95. |
| | (c) | Kappa lies between 0.1 and 0.9. |
| | (pu) | Kappa lies between 0.05 and 0.7. |
| Relative position of CASIR | (u) | At least 75% of the N CASIR point estimates lie within the range 0.01 to 0.99 (inclusive). |
| | (c) | At least 85% of the N CASIR point estimates lie within the range 0.02 to 0.98 (inclusive). |
| | (pu) | At least 75% of the N CASIR point estimates lie within the range 0.2 to 0.98 (inclusive). |

(u) = unbiased; (c) = conservative (less likely to choose under- or over-smoothed models); (pu) = penalise under-smoothing more heavily than over-smoothing.
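The relative position of CASIR and its comparison against the unbiased cut-off in Table 1 can be sketched as below. The sketch assumes that the neighbour mean is approximated by averaging the CASIR point estimates of the first-order neighbours; the function names and toy values are hypothetical.

```python
import numpy as np

def relative_position(casir, carsir, W):
    """Position of CASIR_i between CARSIR_i (0) and the mean of its neighbours (1)."""
    W = np.asarray(W, dtype=float)
    nbr_mean = (W @ casir) / W.sum(axis=1)
    return (casir - carsir) / (nbr_mean - carsir)

def passes_cutoff(pos, lower=0.01, upper=0.99, min_share=0.75):
    """True if at least min_share of the areas lie within [lower, upper]."""
    inside = (pos >= lower) & (pos <= upper)
    return bool(inside.mean() >= min_share)

# Hypothetical map of four areas in a line, with made-up estimates
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
carsir = np.array([0.5, 1.5, 2.0, 0.6])
casir = np.array([0.8, 1.2, 1.4, 0.9])

pos = relative_position(casir, carsir, W)
moderate_ok = passes_cutoff(pos)                              # unbiased cut-off
no_smoothing_ok = passes_cutoff(relative_position(carsir, carsir, W))
```

The other cut-off types in Table 1 correspond to different values of `lower`, `upper` and `min_share`.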

Assessing GoS criteria

Several criteria were used to classify the models based on example cut-offs, listed in Table 1. These cut-offs can be adjusted in the same way that different cut-offs for DIC and WAIC can be specified to broaden or narrow the set of models considered “good”. A “PASS” indicates that the model variant is neither under- nor over-smoothing under the given criterion. Note that unlike the other GoS approaches, the kurtosis preservation method only has 2 cut-off types, as it is not obvious how this criterion can be adjusted to penalise under-smoothing in favour of models with more smoothing. For the reasons outlined above, the fraction of spatial variation is excluded.

Goodness-of-fit and predictive performance

To address the first and third aims of this paper, we consider the following criteria commonly used to measure GoF and check predictive performance. Many studies involving model selection amongst competing spatial models use DIC, which evaluates the model GoF while penalising for model complexity (e.g. Law [26] and Earnest et al. [27]). The DIC was proposed by Spiegelhalter et al. [23] as a generalisation of Akaike’s information criterion (AIC) [48] using information theoretic justification. The DIC can be defined as

$$
\text{DIC} = 2 p_D - 2 \log p(y \mid \bar{\theta})
$$

where $p_D$ is the effective dimension of the model and $p(y \mid \bar{\theta})$ is the likelihood evaluated at the posterior mean of the unknown parameters, $\theta$. The WAIC [24, 49] is a similar criterion, defined as

$$
\text{WAIC} = 2 p_W - 2 \log \prod_{i=1}^{N} E_\theta\left[ p(y_i \mid \theta_i) \mid y_i \right].
$$

The advantages of WAIC over DIC include that it uses the entire posterior distribution, is invariant to parameterisation, and closely approximates Bayesian cross-validation [49, 50]. Both GoF criteria are considered here for comprehensiveness.

Gelman et al. [49] propose two variants of pW. Here we use the second variant,

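$$
p_W = \sum_{i=1}^{N} \text{var}\left[ \log p(y_i \mid \theta_i) \mid y_i \right]
$$

which, after simplification, leads to the specific WAIC criterion

$$
\text{WAIC} = 2 \sum_{i=1}^{N} \left\{ \underset{m = 1, \ldots, M}{\text{var}} \left[ \log p\left(y_i \mid \theta_i^{(m)}\right) \right] - \log \frac{1}{M} \sum_{m=1}^{M} p\left(y_i \mid \theta_i^{(m)}\right) \right\}
$$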

where $\theta_i^{(m)}$ is the estimate of the unknown parameter(s) for the $i$th area and $m$th MCMC iteration. Predictions, or theoretical future observations, denoted $\tilde{y}$, can be drawn from the posterior predictive distribution

$$
p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta
$$

which can be used to assess predictive performance. The idea is that if the model is adequate in describing the data generating process, then the predicted data $\tilde{y}$ will be close to the observed data $y$. Thus these posterior predictive checks (PPCs) can be viewed as a variation on GoF diagnostics [51].
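The simplified WAIC expression and the drawing of posterior predictive values can both be sketched from an array of posterior draws. Everything in the sketch is illustrative: the counts are made up, `mu_draws` is a simulated stand-in for MCMC output, the function names are hypothetical, and the variance over draws uses the divisor $M - 1$ as an assumption.

```python
from math import lgamma
import numpy as np

def poisson_loglik(y, lam):
    """log p(y_i | lambda_i^(m)) for Poisson means lam of shape (M, N)."""
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    return y * np.log(lam) - lam - log_fact

def waic(loglik):
    """WAIC = 2 * sum_i { var_m[log p] - log mean_m[p] } from an (M, N) array."""
    p_w = np.var(loglik, axis=0, ddof=1)               # per-area variance over draws
    mx = loglik.max(axis=0)                            # stabilise the exponentials
    log_mean_p = mx + np.log(np.mean(np.exp(loglik - mx), axis=0))
    return 2.0 * np.sum(p_w - log_mean_p)

rng = np.random.default_rng(42)
y = np.array([0, 3, 7, 2])                             # observed counts (made up)
E = np.array([1.5, 2.0, 4.0, 2.5])                     # expected counts (made up)
mu_draws = rng.normal(0.0, 0.1, size=(500, 4))         # stand-in posterior draws of mu_i

lam_draws = E * np.exp(mu_draws)
waic_value = waic(poisson_loglik(y, lam_draws))

# One posterior predictive replicate of the data per posterior draw
y_rep = rng.poisson(lam_draws)
```

Comparing summaries of `y_rep` with the observed `y` is one simple way to carry out a posterior predictive check.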

One specific PPC is the conditional predictive ordinate (CPO) [52] which seeks to re-observe a datum $y_i$ given all other observed data, denoted $y_{\setminus i}$,

$$
\text{CPO}_i = p(y_i \mid y_{\setminus i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{\setminus i})\, d\theta.
$$

This metric is equivalent to the posterior predictive ordinate (PPO), $p(y_i \mid y)$, in the sense that the set of leave-one-out marginal distributions $\{p(y_i \mid y_{\setminus i});\ i = 1, \ldots, N\}$ contains the same information about the predictive performance as the marginal distribution $p(y)$ [51, 53]. However, the CPO avoids double use of the data since it is a leave-one-out cross-validation predictive density. Additionally, unlike the PPO, the literature contains several useful guidelines for interpreting the CPO [51]. For detecting outlying observations, Congdon [54] suggests scaling the CPO values by dividing them by the maximum CPO value. Scaled CPOs less than 0.01 suggest areas for which the model does not fit well. To compare models, several overall measures of fit have been proposed (e.g. Ntzoufras [51] and Congdon [54]). However, the most numerically stable option seems to be the sum of the log CPO values, as suggested by Held et al. [55], which we adopt here. The best model is taken to be the model which minimises

$$-\sum_{i=1}^{N} \log(\mathrm{CPO}_i).$$
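The CPO-based quantities described above can be sketched as follows. The paper does not state how the CPO integral was evaluated; this sketch assumes the widely used Monte Carlo estimate in which CPO_i is the harmonic mean of the likelihood values p(y_i | θ^(m)) over the posterior draws. It also assumes the criterion is the negated sum of log CPOs, so that smaller is better. The function names and the `log_lik[m][i]` layout are hypothetical.

```python
import math


def cpo(log_lik):
    """Estimate CPO_i as the harmonic mean of p(y_i | theta^(m)) over the draws.

    log_lik[m][i] holds log p(y_i | theta^(m)) (hypothetical layout).
    """
    n_draws = len(log_lik)
    n_areas = len(log_lik[0])
    values = []
    for i in range(n_areas):
        neg = [-log_lik[m][i] for m in range(n_draws)]
        top = max(neg)
        # log of the mean inverse likelihood, via log-sum-exp for stability.
        log_mean_inverse = top + math.log(sum(math.exp(v - top) for v in neg) / n_draws)
        values.append(math.exp(-log_mean_inverse))
    return values


def scaled_cpo(cpo_values):
    """Divide each CPO by the maximum CPO; values below 0.01 flag poorly fitted areas."""
    largest = max(cpo_values)
    return [c / largest for c in cpo_values]


def neg_sum_log_cpo(cpo_values):
    """Overall criterion (assumed sign convention): smaller values are better."""
    return -sum(math.log(c) for c in cpo_values)


# Toy input: likelihoods 0.5 and 0.25 for area 0, and 0.1 twice for area 1.
toy = [[math.log(0.5), math.log(0.1)], [math.log(0.25), math.log(0.1)]]
toy_cpo = cpo(toy)
print(toy_cpo, scaled_cpo(toy_cpo), neg_sum_log_cpo(toy_cpo))
```

The harmonic-mean estimate can be unstable when a few draws have very small likelihood values, so in practice it is worth inspecting the individual CPO values as well as their sum.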

In addition to these GoF criteria, we use Moran’s I statistic [56] to measure the degree of autocorrelation remaining in the model residuals, checking the model assumption that the residuals are independent and identically distributed.

To compare models with respect to predictive performance, the minimum DIC and WAIC were determined for each model, indicating the best model fit, and model variants with a DIC or WAIC within 2 or 7 units of this minimum were identified as having reasonable model fit, as per the common rule of thumb [23]. Smaller values of the CPO-based criterion indicated better predictive performance, and the model variant with the minimum was flagged. Moran’s I was compared across model variants using p-values from the test assuming normality of the statistic under the null hypothesis of no autocorrelation.
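As an illustration of the residual check described above, the sketch below computes Moran’s I and a p-value under the normality assumption, using the standard moment formulas for the expectation and variance of the statistic. The weights matrix `W`, the function name, and the choice of a one-sided test for positive autocorrelation are assumptions made for this example; the paper does not specify the alternative hypothesis.

```python
import math


def morans_i(residuals, W):
    """Moran's I and a one-sided p-value (upper tail) under the normality assumption.

    residuals: list of model residuals, one per area.
    W: square spatial weights matrix as nested lists (hypothetical input).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    s0 = sum(W[i][j] for i in range(n) for j in range(n))
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    stat = (n / s0) * cross / sum(v * v for v in z)

    # Moments of I under the null hypothesis of no autocorrelation.
    expected = -1.0 / (n - 1)
    s1 = 0.5 * sum((W[i][j] + W[j][i]) ** 2 for i in range(n) for j in range(n))
    row_sums = [sum(W[i]) for i in range(n)]
    col_sums = [sum(W[j][i] for j in range(n)) for i in range(n)]
    s2 = sum((row_sums[i] + col_sums[i]) ** 2 for i in range(n))
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2

    z_score = (stat - expected) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z_score / math.sqrt(2))  # P(Z >= z_score)
    return stat, p_value


# Toy example: four areas in a line with binary adjacency weights.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], W))
```

A small p-value here points to residual spatial structure that the model has not absorbed; whether that matters in practice also depends on the magnitude of the residuals.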

Data

Two spatial datasets are analysed. The first is the North Carolina sudden infant death syndrome (SIDS) dataset first presented by Atkinson [57], and subsequently augmented and analysed by Cressie and Read [37] and Cressie and Chan [58], amongst others. The observed data represent counts of SIDS aggregated from 1979 to 1983 for each of the 100 counties in North Carolina. The non-white birth rate over the same period is included here as a covariate. The second dataset is the Scottish lip cancer dataset compiled by Kemp et al. [59] and first analysed by Clayton and Kaldor [9]. These data have previously been analysed by Spiegelhalter et al. [23], Leroux et al. [30], and Duncan et al. [34], amongst others. The observed data represent counts of lip cancer across 56 counties of Scotland, and a spatial covariate, the percentage of the workforce employed in agriculture, fishing, and forestry, is included as a proxy for sun exposure. A graphical summary of the data is shown in Fig 2. To improve visual interpretation, the northeast island counties of Scotland, Shetland and Orkney, are excluded from all maps. This modification is limited to the maps; data from these counties are still used and estimates for these counties are still generated by the models.

Fig 2. Summary of the two datasets.

The observed and expected counts are shown in greyscale; the gradient is capped at the maximum observed value (57 and 39 respectively); larger expected values are shown in black. The colour gradient for the raw SIR reflects a ratio scale; darker shades of red indicate a higher raw SIR, while darker shades of blue indicate a lower raw SIR.

These datasets were chosen for the following reasons: they each contain one useful spatial covariate, which is essential in demonstrating the importance of CASIR; each study region contains a sufficient number of areas to enable adequate evaluation of spatial effects; they have been extensively analysed previously, corroborating the plausibility of the model specifications and parameter estimates presented here; and they are publicly available data, facilitating reproducibility. Additionally, these data represent real cases. This is an advantage over simulated data, which may not resemble realistic data and can therefore cast doubt on the authenticity of the model results and the accuracy of the approaches to quantifying smoothing.

Results

For the sake of brevity, the ensuing figures relate mostly to the lip cancer data, with the remaining results presented in S1 Appendix. Key parameter estimates for the BYM model with IG hyperpriors fit to the lip cancer dataset are summarised in Fig 3. The values represent the posterior means. The first four columns correspond to the linear scale parameters: the SSRE ($s_i$), USRE ($u_i$), covariate effect ($\beta x_i$), and the logarithm of the smoothed SIR ($\mu_i$). The last two columns correspond to the ratio scale parameters, namely the SIR ($e^{\mu_i}$) and CASIR ($e^{s_i}$). The colour gradient is consistent within each of these two classes of variables (i.e. same hues indicate the same values), but the legend reflects the range of values for the specific variable. Note that the degree of smoothing generally decreases as the model variant increases from A to L.

Fig 3. Maps showing the posterior mean estimates of the key model parameters for 5 select model variants of the BYM model with an IG hyperprior (lip cancer dataset).

Maps of the key parameter estimates for all the alternative models and variants, for both the lip cancer and SIDS datasets are provided as supplementary material (see S1 Fig through S8 Fig).

The spatial pattern of the SIR appears similar across model variants, while the CASIR varies considerably. The contrast between the SIR and CASIR is greater when more smoothing is applied, highlighting the value of CASIR when investigating the occurrence of over-smoothing. This is particularly true for this model applied to this dataset, as the maps of the SIR look similar to the map of the raw SIR in Fig 2. A visual inspection of the SIR maps in Fig 3 (and maps of the remaining 7 model variants in S3 Fig) might lead one to conclude that all model variants have under-smoothed, when in fact the majority of the model variants are likely to be over-smoothed, as the subsequent analysis reveals. Moreover, aside from the SIR, these model variants vary considerably in the estimated SSRE, USRE, and covariate effect, each leading to different statistical inference.

The smoothing paradox effect on the SIR surface is not readily observed in Fig 3, but is quite noticeable in the results for the Leroux model variants on the lip cancer data (see S1 and S2 Figs), and the BYM model variants on the SIDS data (see S7 and S8 Figs). The extent of this effect depends largely on the contribution of the covariate effect to the log-risk surface and how spatially autocorrelated the covariate is.

Goodness-of-fit criteria

The results for the GoF criteria and Moran’s I p-values are summarised in Figs 4 and 5.

Fig 4. Values of the GoF criteria and Moran’s I p-values for each model variant fit to the SIDS dataset.

Fig 5. Values of the GoF criteria and Moran’s I p-values for each model variant fit to the lip cancer dataset.

The interpretation of Figs 4 and 5 is the same. For the DIC and WAIC, the model that minimises the respective criterion is highlighted blue. This is the best model under this criterion. Models with a DIC or WAIC value within 2 or 7 units are highlighted in lighter shades of blue, indicating a reasonable model fit. For the CPO, the model which minimises the criterion is highlighted. For Moran’s I, models with small p-values are highlighted red.
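The highlighting rule for DIC and WAIC described above amounts to a simple classification by distance from the minimum. A minimal sketch follows; the function name, the labels, the inclusive treatment of the 2- and 7-unit boundaries, and the example values are all illustrative assumptions and are not results from the paper.

```python
def flag_models(ic_values, tight=2.0, loose=7.0):
    """Label each model variant by how far its DIC or WAIC lies above the minimum."""
    best = min(ic_values.values())
    labels = {}
    for variant, value in ic_values.items():
        delta = value - best
        if delta == 0:
            labels[variant] = "best"
        elif delta <= tight:
            labels[variant] = "within 2"
        elif delta <= loose:
            labels[variant] = "within 7"
        else:
            labels[variant] = "outside"
    return labels


# Hypothetical DIC values for four variants (illustration only).
example = {"A": 310.4, "D": 308.0, "H": 303.9, "L": 303.2}
print(flag_models(example))
```

The same function applies unchanged to WAIC values, since both criteria are on the deviance scale and are interpreted relative to the minimum.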

There are two important observations to be made here. First, the GoF criteria DIC, WAIC, and CPO are rarely in agreement, and sometimes identify very different models. For example, in Fig 4, for the Leroux LTN model, the model variants considered “best” under each of the three GoF criteria are L, F, and C. Second, the best model under DIC sometimes coincided with a low Moran’s I p-value. However, Moran’s I p-values should be interpreted cautiously: a high degree of autocorrelation amongst near-zero residuals should not warrant the same concern as highly autocorrelated residuals that are large in magnitude. This is especially true for those models closer to variant L, which have less smoothing and therefore generally have smaller residuals (see S9 and S10 Figs).

Goodness-of-smoothing criteria

Ratio of variograms

The variograms for the lip cancer data are shown in Fig 6. Similar results hold for the SIDS dataset (see S11 Fig). In general, as the smoothing decreases, the relative distance between the CASIR and CARSIR variograms decreases. That is, the ratio of the CASIR variogram to the CARSIR variogram, averaged over the lag, increases.

Fig 6. Variograms for each model variant fit to the lip cancer dataset.

The solid and dashed lines denote the variograms of CASIR and CARSIR respectively, each averaged over the areas. The grey dots denote the area-specific variogram of CASIR. Note that the y-axis has been capped at 1.3 for clarity.

Kurtosis preservation

The kurtosis and roughness for the models fit to the lip cancer dataset are shown in Fig 7. Recall that the aim is to preserve the spatial kurtosis of the SIR with respect to the raw SIR while minimising the roughness of the SIR. The kurtosis was generally preserved for the Leroux models, and less frequently for the BYM models. The model variants in the middle of the range from A to L tend to have less roughness, steering model choice away from the more extreme variants, which are likely to be over- or under-smoothing.

Fig 7. Kurtosis and roughness for each model variant fit to the lip cancer dataset.

The horizontal lines denote the kurtosis of the raw SIR. The black dots denote the estimates of kurtosis and roughness.

The results for the SIDS dataset (see S12 Fig) were less clear, with the SIR kurtosis values being less than the raw SIR kurtosis except for 5 model variants. This suggests that it is not only the type of model (i.e. Leroux vs BYM) that influences how well the kurtosis is preserved, but that it may also depend on other factors including characteristics of the data. Also contrary to the lip cancer data results, the roughness for the SIDS models generally increased with the model variants from A through L.

Kappa

The values of the kappa statistic for the lip cancer data, representing the spatial agreement between CASIR and the baseline CARSIR, are shown in Fig 8. The values of kappa generally increase with model variant, as expected. There is not much difference between the kappa values whether 3 or 5 categories are used, but using fewer categories generally improves the robustness of this estimate since there is more information contributing to each cell of the confusion matrix. The results are generally similar for the SIDS data (see S13 Fig), although the kappa values start to decrease for some of the BYM model variants with less smoothing.

Fig 8. Kappa statistic between CASIR and CARSIR for each model variant fit to the lip cancer dataset, using 3 and 5 discrete categories.

Values close to 0 suggest over-smoothing while values close to 1 suggest under-smoothing.
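The kappa calculation summarised above can be sketched as follows. Cohen’s kappa is computed from the confusion matrix of two categorised surfaces. How the CASIR and CARSIR values were discretised is not described in this section, so the equal-count (rank-based) classes used here are an assumption, as are the function names and the example values.

```python
def to_categories(values, k):
    """Assign each value to one of k equal-count classes by rank (assumed scheme)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    categories = [0] * len(values)
    for rank, index in enumerate(order):
        categories[index] = rank * k // len(values)
    return categories


def cohen_kappa(a, b, k):
    """Cohen's kappa between two category vectors with labels 0..k-1."""
    n = len(a)
    table = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        table[x][y] += 1
    observed = sum(table[c][c] for c in range(k)) / n
    # Chance agreement from the row and column marginals.
    chance = sum(sum(table[c]) * sum(row[c] for row in table) for c in range(k)) / n ** 2
    return (observed - chance) / (1 - chance)


# Hypothetical CASIR and CARSIR values for six areas (illustration only).
casir = [0.9, 1.4, 1.0, 0.8, 1.1, 1.6]
carsir = [0.5, 2.0, 1.0, 0.8, 1.5, 3.0]
print(cohen_kappa(to_categories(casir, 3), to_categories(carsir, 3), 3))
```

With few areas per class, a single reclassified area can move kappa noticeably, which is consistent with the remark above that fewer categories give a more robust estimate.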

Fraction of spatial variation

The results of the fraction of spatial variation for the lip cancer dataset are shown in Fig 9. The results for the SIDS dataset exhibit a similar trend and magnitude of values (see S14 Fig). For both datasets, the fraction of spatial variation ranges between 0% and 10% approximately, and generally increases with model variant, similar to the kappa statistic.

Fig 9. Fraction of spatial variation for each model variant fit to the lip cancer dataset.

Relative position of CASIR

The posterior mean CASIR values and their relative position for select variants of the BYM IG model fit to the lip cancer dataset are shown in Fig 10. These results correspond to the maps shown in Fig 3. The mean CASIR estimates are denoted by the filled circles, which are situated within the range of potential values. When the degree of smoothing is large, these estimates tend to lie towards the end of the range representing the mean of the neighbouring values. As smoothing decreases, these estimates tend to move towards the opposite end of the range, representing the CARSIR estimates. Note that in general, as smoothing increases, the CASIR estimates are smoothed towards the global mean of 1. However, the direction a given estimate of CASIR moves is not necessarily towards 1; sometimes the CASIR estimate will be smoothed away from 1, depending on the neighbouring values.

Fig 10. Area-specific posterior mean estimates of CASIR and their relative position for select model variants of the BYM IG model (lip cancer data).

The coloured bars represent the theoretical range of CASIR values, coloured according to the mean CASIR estimates, with the cross symbol marking the endpoint corresponding to the CARSIR estimate. The dots represent the mean CASIR estimate, coloured according to the relative position, with the pink and green colours indicating cases that are likely over- and under-smoothing respectively (grey indicates areas that were excluded due to the theoretical range of CASIR values being too narrow).

The CASIR values may lie outside the range of potential values due to the flexibility afforded by the prior distribution: the less informative the hyperprior for $\sigma_s^2$, the greater the propensity. This effect is minimised by taking the posterior mean of the CASIR values, but conversely, the effect is exaggerated when the range of potential values is very small, thus overestimating the effect of under- or over-smoothing for these areas. To address this, the relative position of CASIR was not computed for areas where the logarithm of the range of potential values was less than 0.03.

The distribution of the relative positions for each model variant fit to the lip cancer data is shown in Fig 11. This gives an overall indicator of whether a given model variant is under- or over-smoothing. If a large portion of the density is greater than or close to 1, this indicates that the model is over-smoothing. Conversely, under-smoothing can be declared for densities close to 0. Note that the values of the relative position of CASIR are capped at -0.2 and 1.2. The distribution of the relative positions for the models fit to the SIDS data is provided as supplementary material (see S15 Fig).
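To make the relative position concrete, the sketch below places a CASIR estimate within its range of potential values, with 0 at the CARSIR endpoint and 1 at the endpoint given by the mean of the neighbouring values. The formal definition appears earlier in the paper and is not reproduced here, so several details are assumptions: the position is computed on the log scale, the exclusion rule is read as "the width of the range on the log scale is below 0.03", and the function and argument names are hypothetical.

```python
import math


def relative_position(casir, carsir, neighbour_mean, min_log_width=0.03, cap=(-0.2, 1.2)):
    """Position of CASIR between CARSIR (0) and the neighbour mean (1), on the log scale.

    Returns None when the range of potential values is too narrow to be informative.
    """
    width = math.log(neighbour_mean) - math.log(carsir)
    if abs(width) < min_log_width:
        return None  # excluded area (shown in grey in Fig 10)
    position = (math.log(casir) - math.log(carsir)) / width
    # Cap extreme values, as done for the densities in Fig 11.
    return min(max(position, cap[0]), cap[1])


# Hypothetical values for a single area (illustration only).
print(relative_position(casir=1.0, carsir=2.0, neighbour_mean=0.5))
```

Values near 0 then correspond to estimates that have barely moved from the CARSIR (under-smoothing), and values near or above 1 to estimates pulled all the way to the neighbourhood mean (over-smoothing).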

Fig 11. Distribution of the relative position of the CASIR estimates for each model variant (lip cancer data).

Distributions with substantial density close to 0 indicate under-smoothing; distributions with substantial density close to 1 indicate over-smoothing.

Model comparison

The GoF criteria are summarised in Figs 4 and 5. These criteria are often used to conduct model selection on the basis of model fit and parsimony. However, as noted above, these criteria often disagree and can lead to poor model choices. In line with the aims of this paper, we now compare the models based on the GoS statistics using the cut-offs described in Table 1. The full results are provided in the supplementary material, S2 and S3 Tables.

The criteria with unbiased and conservative cut-offs tend to favour under-smoothed models. If more smoothing is desired, then the cut-off that penalises under-smoothing is more appropriate. Despite the differences between these GoS statistics mathematically and differences between the criteria definitions, there is substantial agreement among the results given a particular cut-off (u, c, or pu). Focusing on only the criteria which penalise under-smoothing, Fig 12 provides a consensus result, showing which model variants pass 2 GoS criteria and which pass all 3. The best models under the GoF criteria are included for comparison.
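The consensus described above can be expressed as a count of the criteria each model variant passes, together with the rate at which an individual criterion coincides with the consensus (2 or more criteria). In the sketch below, the abbreviations VR, K, and RPC follow the supplementary tables; the function names and the pass/fail values are hypothetical and do not reproduce the paper's results.

```python
def consensus_counts(passes):
    """Number of GoS criteria passed by each model variant."""
    return {variant: sum(1 for ok in results.values() if ok)
            for variant, results in passes.items()}


def agreement_rate(passes, criterion, threshold=2):
    """Fraction of variants for which one criterion matches the consensus decision."""
    counts = consensus_counts(passes)
    matches = sum(1 for variant, results in passes.items()
                  if results[criterion] == (counts[variant] >= threshold))
    return matches / len(passes)


# Hypothetical PASS/FAIL results for four variants (illustration only).
passes = {
    "A": {"VR": False, "K": False, "RPC": False},
    "E": {"VR": True, "K": True, "RPC": True},
    "H": {"VR": True, "K": False, "RPC": True},
    "L": {"VR": False, "K": True, "RPC": False},
}
print(consensus_counts(passes))
print({c: agreement_rate(passes, c) for c in ("VR", "K", "RPC")})
```

Because each criterion contributes to the consensus it is compared against, these agreement rates are best read as a relative ranking of the criteria rather than as independent validation.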

Fig 12. Consensus of the results based on the three GoS criteria that penalise under-smoothing, and the best models according to the GoF criteria.

Discussion

This paper presented three existing GoF measures and proposed five new GoS measures. Each of these measures attempts to quantify one or more important characteristics of a model: goodness of model fit, parsimony, and adequacy of spatial smoothing.

The GoS approaches vary from original proposals to reinventions and modifications of existing ideas. Consequently, there are likely great improvements that can be made, both in defining the statistics and the guidelines for their interpretation. For example, the kurtosis preservation method appeared to be the least reliable GoS measure. This may be improved, for example, if the spatial kurtosis were defined differently. Guidelines for the fraction of spatial variation approach are notably lacking, which may be the main drawback of this otherwise seemingly reliable and relatively simple method.

The third aim of this paper was to compare the results of the GoF and GoS statistics. The criteria used for the GoF statistics were taken from the literature, while the criteria for the GoS statistics were specifically designed to favour models with more smoothing rather than less. Such criteria seem appropriate in practice given the benefits of spatial smoothing. Under these particular criteria, summarised in Fig 12 and presented more fully in Figs 4 and 5 and S2 and S3 Tables, there is a fairly strong consensus among the GoS approaches. Conversely, the GoF criteria rarely agree on the best model, often choosing models with substantially different degrees of smoothing, and even choosing models that are arguably greatly under- or over-smoothed according to the GoS consensus results.

Out of the three GoS approaches forming the consensus, the relative position of CASIR approach coincided with the consensus (2 or more criteria) 91.7% of the time, the variogram ratio approach coincided 88.2% of the time, and kappa coincided 85.4% of the time. The relative position of CASIR is the only GoS statistic that avoided selecting models with small Moran’s I p-values. Thus the relative position of CASIR may be considered the most conservative approach, in that the other GoS statistics are likely to agree with it in identifying good models, but not necessarily vice versa. To achieve the most robust model comparison on the basis of spatial smoothing, it is recommended that multiple GoS methods and even multiple criteria are used. However, the relative position of CASIR is likely to perform well if used independently.

While it is difficult to compare GoF against GoS in the absence of a ground truth, the GoS does appear to identify better models more accurately than GoF based on the consistency of the GoS approaches, and the fact that the model variants were intentionally specified to yield over- and under-smoothed models closest to variants A and L respectively. This is corroborated by visual inference from maps such as those shown in Fig 3.

Using the consensus shown in Fig 12 as the benchmark, the problem with relying on GoF measures to identify the best or even a good model becomes apparent. In particular, DIC tended to identify under-smoothed model variants (variant L was identified as the best model 6 out of 8 times). The WAIC and CPO criteria tended to align better with the GoS criteria, but still showed a tendency to favour models with less and more smoothing respectively. In fact, the WAIC criterion always chose model variants at least as close to L as the CPO did, if not closer, and DIC always chose model variants closer to L than the WAIC did. Clearly there is a great danger in relying on DIC, and to a lesser extent other GoF measures, to perform model selection among competing spatial models.

While the GoS approaches presented in this paper highlight a very important problem, they offer only simple, empirical solutions to quantifying spatial smoothing. They are by no means model-decision theoretic approaches. However, it is hoped that this demonstration of the challenge will motivate the development of more elaborate solutions, perhaps even combining multiple objective functions into a single utility function to be optimised. In the meantime, these simple GoS approaches should prove useful to researchers evaluating spatial models and performing model selection.

The main limitations of this analysis are the scope of the data, models, and criteria used. Both GoS and GoF criteria require subjective input from the user, usually in the form of cut-offs. While care has been taken to use sensible criteria, different cut-offs may produce different results. Only two datasets and two models were used, albeit with several variants. A possible extension to this research is therefore to compare these approaches across other models and other datasets.

Supporting information

S1 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an IG hyperprior (lip cancer data set, 56 counties of Scotland).

(DOCX)

S2 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an LTN hyperprior (lip cancer data set, 56 counties of Scotland).

(DOCX)

S3 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an IG hyperprior (lip cancer data set, 56 counties of Scotland).

(DOCX)

S4 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an LTN hyperprior (lip cancer data set, 56 counties of Scotland).

(DOCX)

S5 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an IG hyperprior (SIDS data set, 100 counties in North Carolina).

(DOCX)

S6 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an LTN hyperprior (SIDS data set, 100 counties in North Carolina).

(DOCX)

S7 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an IG hyperprior (SIDS data set, 100 counties in North Carolina).

(DOCX)

S8 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an LTN hyperprior (SIDS data set, 100 counties in North Carolina).

(DOCX)

S9 Fig. Model residuals for the lip cancer data set, 56 counties of Scotland.

(DOCX)

S10 Fig. Model residuals for the SIDS data set, 100 counties in North Carolina.

(DOCX)

S11 Fig. Variograms for each model variant fit to the SIDS data set.

The solid and dashed lines denote the variograms of CASIR and CARSIR respectively, each averaged over the areas. The grey dots denote the area-specific variogram of CASIR. Note that the y-axis has been capped at 1 for clarity.

(DOCX)

S12 Fig. Kurtosis and roughness for each model variant fit to the SIDS data set.

(DOCX)

S13 Fig. Kappa statistic between CASIR and CARSIR for each model variant fit to the SIDS data set, using 3 and 5 discrete categories.

Values close to 0 suggest over-smoothing while values close to 1 suggest under-smoothing.

(DOCX)

S14 Fig. Fraction of spatial variation for each model variant fit to the SIDS data set.

(DOCX)

S15 Fig. Distribution of the relative position of the CASIR estimates for each model variant (SIDS data).

Distributions with substantial density close to 0 indicate under-smoothing; distributions with substantial density close to 1 indicate over-smoothing.

(DOCX)

S1 Table. The specific values of the hyperparameters α, η, v, and π used to produce the model variants.

(DOCX)

S2 Table. Classification of the models based on the GoS criteria (lip cancer data).

A “PASS” indicates that the model variant is neither under- nor over-smoothing under the given criterion (see Table 1). VR = variogram ratio; KP = kurtosis preservation; K = kappa; RPC = relative position of CASIR; (u) = unbiased; (c) = conservative (less likely to choose under- or over-smoothed models); (pu) = penalise under-smoothing more heavily than over-smoothing.

(DOCX)

S3 Table. Classification of the models based on the GoS criteria (SIDS data).

A “PASS” indicates that the model variant is neither under- nor over-smoothing under the given criterion (see Table 1). VR = variogram ratio; KP = kurtosis preservation; K = kappa; RPC = relative position of CASIR; (u) = unbiased; (c) = conservative (less likely to choose under- or over-smoothed models); (pu) = penalise under-smoothing more heavily than over-smoothing.

(DOCX)

Acknowledgments

The authors would like to thank Dr Susanna Cramb for her feedback on earlier drafts of this manuscript.

List of acronyms

AIC

Akaike’s information criterion

BIC

Bayesian information criterion

BYM

a Bayesian spatial model named after the authors Besag, York, and Mollié

CARSIR

covariate-adjusted raw standardised incidence ratio

CASIR

covariate-adjusted standardised incidence ratio

CPO

conditional predictive ordinate

DIC

deviance information criterion

GoF

goodness-of-fit

GoS

goodness-of-smoothing

ICAR

intrinsic conditional autoregressive

IG

inverse-gamma

LTN

left-truncated Normal

MCMC

Markov chain Monte Carlo

PPC

posterior predictive check

PPO

posterior predictive ordinate

SIDS

sudden infant death syndrome

SIR

standardised incidence ratio

SRE

spatial random effect

SSRE

structured spatial random effect

USRE

unstructured spatial random effect

WAIC

Widely applicable information criterion

Data Availability

All data are publicly available from existing studies, the details of which are provided in the paper.

Funding Statement

This work was supported by the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) and Queensland University of Technology (QUT).

References

  • 1. Tobler W. A computer movie simulating urban growth in the Detroit region. Econ Geogr. 1970; 46 (2): 234–240. doi: 10.2307/143141
  • 2. Kafadar K. Choosing among two-dimensional smoothers in practice. Comput Stat Data Anal. 1994; 18 (4): 419–439. doi: 10.1016/0167-9473(94)90160-0
  • 3. Zhu A-X, Lu G, Lui J, Qin C-Z, Zhou C. Spatial prediction based on third law of geography. Ann GIS. 2018; 24 (4): 225–240. doi: 10.1080/19475683.2018.1534890
  • 4. Cleveland WS, Devlin SJ. Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc. 1988; 83 (403): 596–610.
  • 5. Silverman BW. Spline smoothing: the equivalent variable kernel method. Ann Stat. 1984; 12 (3): 898–916.
  • 6. Matheron G. Principles of geostatistics. Economic Geology. 1963; 58 (8): 1246–1266.
  • 7. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. Massachusetts: MIT Press; 2006. doi: 10.7551/mitpress/3206.001.0001
  • 8. Cressie NAC. Statistics for spatial data. Rev. ed. New York: John Wiley & Sons, Inc; 1993. doi: 10.1002/9781119115151
  • 9. Clayton D, Kaldor J. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics. 1987; 43 (3): 671–81. doi: 10.2307/2532003
  • 10. Tiwari C, Rushton G. Using spatially adaptive filters to map late stage colorectal cancer incidence in Iowa. In: Fisher P, editor. Developments in Spatial Data Handling. Berlin: Springer; 2005. pp. 665–676. doi: 10.1007/3-540-26772-7_50
  • 11. Besag J, York J, Mollié A. Bayesian image restoration with application in spatial statistics. Ann Inst Stat Math. 1991; 43 (1): 1–20. doi: 10.1007/BF00116466
  • 12. Ganguli B, Wand MP. Feature significance in geostatistics. J Comput Graph Stat. 2004; 13 (4): 954–973. doi: 10.1198/106186004X12515
  • 13. Herzfeld UC, Higginson CA. Automated geostatistical seafloor classification–principles, parameters, feature vectors, and discrimination criteria. Comput Geosci. 1996; 22 (1): 35–52. doi: 10.1016/0098-3004(96)89522-7
  • 14. McLean MI, Evers L, Bowman AW, Bonte M, Jones WR. Statistical modelling of groundwater contamination monitoring data: A comparison of spatial and spatiotemporal methods. Sci Total Environ. 2019; 652: 1339–1346. doi: 10.1016/j.scitotenv.2018.10.231
  • 15. Wong WCK, Chung ACS, Yu SCH. Trilateral filtering for biomedical images. Proceedings of the 2004 2nd IEEE International Symposium on Biomedical Imaging: Macro to Nano (IEEE Cat No. 04EX821); 2004. doi: 10.1109/isbi.2004.1398664
  • 16. Falivene O, Cabrera L, Tolosana-Delgado R, Sáez A. Interpolation algorithm ranking using cross-validation and the role of smoothing effect. A coal zone example. Comput Geosci. 2010; 36 (4): 512–519. doi: 10.1016/j.cageo.2009.09.015
  • 17. Zeng Q, Wen H, Huang H, Abdel-Aty M. A Bayesian spatial random parameters Tobit model for analysing crash rates on roadway segments. Accid Anal Prev. 2017; 100: 37–43. doi: 10.1016/j.aap.2016.12.023
  • 18. Ziakopoulos A, Yannis G. A review of spatial approaches in road safety. Accid Anal Prev. 2020; 135: 105323. doi: 10.1016/j.aap.2019.105323
  • 19. Smith TR, Wakefield J, Dobra A. Restricted covariance priors with applications in spatial statistics. Bayesian Anal. 2015; 10 (4): 965–990. doi: 10.1214/14-BA927
  • 20. Peng RD, Dominici F, Louis TA. Model choice in time series studies of air pollution and mortality. J R Stat Soc Ser A Stat Soc. 2006; 169 (2): 179–203. doi: 10.1111/j.1467-985X.2006.00410.x
  • 21. McElroy TS, Politis DN. Time series: A first course with bootstrap starter. Boca Raton: CRC Press; 2019.
  • 22. Schwarz G. Estimating the dimension of a model. Ann Stat. 1978; 6 (2): 461–64. doi: 10.1214/aos/1176344136
  • 23. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol. 2002; 64 (4): 583–639. doi: 10.1111/1467-9868.00353
  • 24. Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res. 2010; 11: 3571–94.
  • 25. Rodrigues EC, Assunção R. Bayesian spatial models with a mixture neighborhood structure. J Multivar Anal. 2012; 109: 88–102. doi: 10.1016/j.jmva.2012.02.017
  • 26. Law J. Exploring the specifications of spatial adjacencies and weights in Bayesian spatial modeling with intrinsic conditional autoregressive priors in a small-area study of fall injuries. AIMS Public Health. 2016; 3 (1): 65–82. doi: 10.3934/publichealth.2016.1.65
  • 27. Earnest A, Morgan G, Mengersen K, Ryan L, Summerhayes R, Beard J. Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive (CAR) models. Int J Health Geogr. 2007; 6 (1): 54. doi: 10.1186/1476-072x-6-54
  • 28. Cramb SM, Duncan EW, Baade PD, Mengersen KL. Investigation of Bayesian spatial models. Brisbane: Cancer Council Queensland and Queensland University of Technology (QUT); 2018. Available from: https://eprints.qut.edu.au/115590.
  • 29. Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Series B Stat Methodol. 1974; 36 (2): 192–236.
  • 30. Leroux BG, Lei X, Breslow N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Halloran ME, Berry D, editors. Statistical models in epidemiology, the environment and clinical trials. The IMA Volumes in Mathematics and its Applications, vol 116. New York: Springer; 2000. pp. 179–191. doi: 10.1007/978-1-4612-1284-3_4
  • 31. Kandhasamy C, Ghosh K. Relative risk for HIV in India–an estimate using conditional auto-regressive models with Bayesian approach. Spat Spatiotemporal Epidemiol. 2017; 20: 27–34. doi: 10.1016/j.sste.2017.01.001
  • 32. Lawson AB, Clark A. Spatial mixture relative risk models applied to disease mapping. Stat Med. 2002; 21 (3): 359–370. doi: 10.1002/sim.1022
  • 33. Best N, Richardson S, Thomson A. A comparison of Bayesian spatial models for disease mapping. Stat Methods Med Res. 2005; 14 (1): 35–59. doi: 10.1191/0962280205sm388oa
  • 34. Duncan EW, White NM, Mengersen K. Spatial smoothing in Bayesian models: a comparison of weights matrix specifications and their impact on inference. Int J Health Geogr. 2017; 16 (1): 47. doi: 10.1186/s12942-017-0120-x
  • 35. Evers L, Molinari DA, Bowman AW, Jones WR, Spence MJ. Efficient and automatic methods for flexible regression on spatiotemporal data, with applications to groundwater monitoring. Environmetrics. 2015; 26 (6): 431–441. doi: 10.1002/env.2347
  • 36. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006; 1 (3): 515–533. doi: 10.1214/06-ba117a
  • 37. Cressie N, Read RC. Spatial data analysis of regional counts. Biom J. 1989; 31 (6): 699–719. doi: 10.1002/bimj.4710310607
  • 38. Lee D. CARBayes: An R package for Bayesian spatial modeling with conditional autoregressive priors. J Stat Softw. 2013; 55 (13): 1–24. doi: 10.18637/jss.v055.i13
  • 39. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS–a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000; 10 (4): 325–337. doi: 10.1023/A:1008929526011
  • 40. Sturtz S, Ligges U, Gelman A. R2WinBUGS: a package for running WinBUGS from R. J Stat Softw. 2005; 12 (3): 1–16. doi: 10.18637/jss.v012.i03
  • 41. R Core Team. R: A language and environment for statistical computing [Internet]. R Foundation for Statistical Computing; 2019. Available from: https://www.R-project.org.
  • 42. Knorr-Held L, Raßer G. Bayesian detection of clusters and discontinuities in disease maps. Biometrics. 2000; 56 (1): 13–21. doi: 10.1111/j.0006-341x.2000.00013.x
  • 43. Rong K, Bailis P. ASAP: prioritizing attention via time series smoothing. Proceedings VLDB Endowment. 2017; 10 (11): 1358–1369. doi: 10.14778/3137628.3137645
  • 44. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960; 20 (1): 37–46. doi: 10.1177/001316446002000104
  • 45. Sterlacchini S, Ballabio C, Blahut J, Masetti M, Sorichetta A. Spatial agreement of predicted patterns in landslide susceptibility maps. Geomorphology. 2011; 125 (1): 51–61. doi: 10.1016/j.geomorph.2010.09.004
  • 46. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005; 85 (3): 257–68. doi: 10.1093/ptj/85.3.257
  • 47. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33 (1): 159–174. doi: 10.2307/2529310
  • 48.Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F, editors. 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2–8, 1971. Budapest: Akadémiai Kiadó; 1973. pp. 267–281.
  • 49.Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Stat Comput. 2014; 24 (6): 997–1016. 10.1007/s11222-013-9416-2 [DOI] [Google Scholar]
  • 50.Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017; 27 (5): 1413–32. 10.1007/s11222-016-9709-3 [DOI] [Google Scholar]
  • 51.Ntzoufras I. Bayesian Modeling Using WinBUGS. Melbourne: Hoboken: Wiley; 2009. 10.1002/9780470434567 [DOI] [Google Scholar]
  • 52.Geisser S, Eddy WF. A predictive approach to model selection. J Am Stat Assoc. 1979; 74 (365): 153–160. 10.2307/2286745 [DOI] [Google Scholar]
  • 53.Gelfand AE. Model determination using sampling-based methods In: Gilks WR, Richardson S, Spiegelhalter D, editors. Markov Chain Monte Carlo in Practice. New York: Chapman and Hall/CRC; 1996. pp. 145–161. [Google Scholar]
  • 54.Congdon P. Chapter 2: Model Comparison and Choice In: Congdon P, editor. Bayesian Models for Categorical Data. Chichester: Wiley-Blackwell; 2005. pp. 29–53. 10.1002/0470092394.ch2 [DOI] [Google Scholar]
  • 55.Held L, Schrödle B, Rue H. Posterior and Cross-validatory Predictive Checks: A Comparison of MCMC and INLA In: Kneib T, Tutz G, editors. Statistical Modelling and Regression Structures. Berlin: Springer-Verlag; 2010. pp. 91–110 [Google Scholar]
  • 56.Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37 (1/2): 17–23. 10.2307/2332142 [DOI] [PubMed] [Google Scholar]
  • 57.Atkinson D. Epidemiology of sudden infant death in North Carolina: do cases tend to cluster? PHSB Studies, No. 16. Raleigh, North Carolina: N. C. Department of Human Resources, Division of Health Services, Public Health Statistics Branch; 1978.
  • 58.Cressie N, Chan NH. Spatial modeling of regional variables. J Am Stat Assoc. 1989; 84 (406): 393–401. [Google Scholar]
  • 59.Kemp I, Boyle P, Smans M, Muir C. Atlas of cancer in Scotland, 1975–1980 Incidence and Epidemiologic Perspective, IARC Scientific Publication 72. Lyon: International Agency for Research on Cancer; 1985. [Google Scholar]

Decision Letter 0

Qiang Zeng

30 Mar 2020

PONE-D-20-05534

Comparing Bayesian Spatial Models: Goodness-of-smoothing Criteria for Assessing Under- and Over-smoothing

PLOS ONE

Dear Dr Duncan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by May 14 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Qiang Zeng, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that Figures 2-3 and S1-S10 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1.    You may seek permission from the original copyright holder of Figures 2-3 and S1-S10 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.    If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper proposed several goodness-of-smoothing criteria for assessing Bayesian spatial models, and compared them against traditional goodness-of-fit statistics on real data. The topic is very important and promising. The study is appropriate for publication in PLOS ONE with only minor revisions.

As a researcher of traffic statistical analysis, I fully understand that Bayesian spatial models are widely used in accident statistical modelling (some recent references are listed below). It is suggested that the latest developments and applications of Bayesian spatial models (such as those in traffic fields) be briefly introduced.

Apostolos Z, George Y. A review of spatial approaches in road safety. Accident Analysis and Prevention. https://doi.org/10.1016/j.aap.2019.105323.

Zeng Q, Gu W, Zhang X, et al. Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors. Accident Analysis and Prevention, 2019, 127, 87-95.

Zeng Q, Wen H, Huang H, et al. A Bayesian spatial random parameters Tobit model for analyzing crash rates on roadway segments. Accident Analysis and Prevention, 2017, 100: 37-43.

Ma Q, Yang H, Xie K, et al. Taxicab crashes modeling with informative spatial autocorrelation. Accident Analysis and Prevention, 2019, 131, 297-307.

The abstract is suggested to be rewritten in four parts: objective, methods, results and conclusions.

Reviewer #2: This study proposed several methods for quantifying the degree of smoothing. By comparing these methods against commonly used goodness-of-fit measures, the authors demonstrated the inadequacy of depending solely on goodness-of-fit criteria for spatial model selection. The topic is interesting and worthy of investigation. The whole manuscript is well structured and easy to follow. Before it can be recommended for publication, however, several issues need to be addressed.

1. The commonly used inverse-Gamma (α,η) prior for the variance parameter is sensitive to the values of α and η if the true variance is close to zero (Gelman, 2006). In addition to specifying different combinations of values of α and η to fit models with varying degrees of smoothing, the authors are therefore advised to use a uniform (0, M) prior for σ_s as a benchmark.

2. The definition of spatial correlation should have an effect on the degree of smoothing. Although simple and easy to manipulate, the use of binary first-order adjacency weights, under which only areas with common borders are assumed to be spatially correlated, is a strong assumption, especially in empirical case studies without validation. This limitation should be highlighted at the end of the manuscript.

Reference

Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1, 515-533.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 May 20;15(5):e0233019. doi: 10.1371/journal.pone.0233019.r002

Author response to Decision Letter 0


25 Apr 2020

Reviewer 1

Comment 1: As a researcher of traffic statistical analysis, I fully understand that Bayesian spatial models are widely used in accident statistical modelling (some recent references are listed below). It is suggested that the latest developments and applications of Bayesian spatial models (such as those in traffic fields) be briefly introduced.

Apostolos Z, George Y. A review of spatial approaches in road safety. Accident Analysis and Prevention. https://doi.org/10.1016/j.aap.2019.105323.

Zeng Q, Gu W, Zhang X, et al. Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors. Accident Analysis and Prevention, 2019, 127, 87-95.

Zeng Q, Wen H, Huang H, et al. A Bayesian spatial random parameters Tobit model for analyzing crash rates on roadway segments. Accident Analysis and Prevention, 2017, 100: 37-43.

Ma Q, Yang H, Xie K, et al. Taxicab crashes modeling with informative spatial autocorrelation. Accident Analysis and Prevention, 2019, 131, 297-307.

Response 1: The authors thank the reviewer for these useful references. We found the first and third reference particularly relevant for the motivation of this research and have added these to the introduction as suggested, on line 81.

Comment 2: The abstract is suggested to be rewritten in four parts: objective, methods, results and conclusions.

Response 2: The abstract has been rewritten in four parts as suggested. The new abstract now reads:

Background:

Many methods of spatial smoothing have been developed, for both point data as well as areal data. In Bayesian spatial models, this is achieved by purposefully designed prior(s) or smoothing functions which smooth estimates towards a local or global mean. Smoothing is important for several reasons, not least of all because it increases predictive robustness and reduces uncertainty of the estimates. Despite the benefits of smoothing, this attribute is all but ignored when it comes to model selection. Traditional goodness-of-fit measures focus on model fit and model parsimony, but neglect “goodness-of-smoothing”, and are therefore not necessarily good indicators of model performance. Comparing spatial models while taking into account the degree of spatial smoothing is not straightforward because smoothing and model fit can be viewed as opposing goals. Over- and under-smoothing of spatial data are genuine concerns, but have received very little attention in the literature.

Methods:

This paper demonstrates the problem with spatial model selection based solely on goodness-of-fit by proposing several methods for quantifying the degree of smoothing. Several commonly used spatial models are fit to real data, and subsequently compared using the goodness-of-fit and goodness-of-smoothing statistics.

Results:

The proposed goodness-of-smoothing statistics show substantial agreement in the task of model selection, and tend to avoid models that over- or under-smooth. Conversely, the traditional goodness-of-fit criteria often don’t agree, and can lead to poor model choice. In particular, the well-known deviance information criterion tended to select under-smoothed models.

Conclusions:

Some of the goodness-of-smoothing methods may be improved with modifications and better guidelines for their interpretation. However, these proposed goodness-of-smoothing methods offer researchers a solution to spatial model selection which is easy to implement. Moreover, they highlight the danger in relying on goodness-of-fit measures when comparing spatial models.

Reviewer 2

Comment 1: The commonly used inverse-Gamma (α,η) prior for the variance parameter is sensitive to the values of α and η if the true variance is close to zero (Gelman, 2006). In addition to specifying different combinations of values of α and η to fit models with varying degrees of smoothing, the authors are therefore advised to use a uniform (0, M) prior for σ_s as a benchmark.

Response 1: The authors implemented model variants using a uniform (0, M) prior as suggested. Specifically, the BYM and Leroux models were re-run on both data sets, each using variants of the uniform prior obtained by changing the value of M, ranging from 0.5 to 10^4 (larger values resulted in priors too vague for the sampler to produce samples from). The results showed very little variation between model variants. This seems to be consistent with Gelman (2006), namely that for a finite but sufficiently large M, inferences are not sensitive to the choice of M. Unfortunately, there is no guarantee that such priors will produce a model with an adequate degree of smoothing. In the case of the Scottish lip cancer data, the models with the uniform prior were actually quite good in terms of goodness-of-smoothing compared to the best inverse-gamma and left-truncated normal priors. This was reflected by the GoS criteria as well as a visual inspection of the risk surface and associated maps. Conversely, for the North Carolina SIDS data, the models with the uniform prior resulted in slight over-smoothing for the Leroux variants (comparable to variant D of the Leroux IG and LTN models), and severe over-smoothing for the BYM variants (comparable to variants C of the BYM IG and LTN models). It is comforting to see the proposed GoS criteria performing as expected for the uniform model variants. However, because the insensitivity to the hyperparameter M meant that these variants could not produce variation in the results, we have not included these results in the paper. The authors do agree with the reviewer that trying other model specifications would be useful, but this has already been noted as a limitation and a recommendation for future research (lines 805-806 of the revised manuscript).
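
To make the distinction between the two kinds of prior concrete, the following is a minimal sketch (not the authors' code, which was run in WinBUGS) that evaluates the density each prior implies for the standard deviation σ_s near zero. It assumes the inverse-gamma prior is placed on the variance σ_s² with shape α and scale η; the parametrisation used in the paper may differ, and the function names and example hyperparameter values are hypothetical.

```python
import math


def inv_gamma_pdf(x, alpha, eta):
    """Inverse-gamma density with shape alpha and scale eta (assumed parametrisation)."""
    if x <= 0:
        return 0.0
    log_pdf = (alpha * math.log(eta) - math.lgamma(alpha)
               - (alpha + 1.0) * math.log(x) - eta / x)
    return math.exp(log_pdf)


def implied_sd_pdf(sigma, alpha, eta):
    """Density of sigma when sigma^2 ~ inverse-gamma(alpha, eta).

    Change of variables: p(sigma) = 2 * sigma * p_IG(sigma^2).
    """
    if sigma <= 0:
        return 0.0
    return 2.0 * sigma * inv_gamma_pdf(sigma ** 2, alpha, eta)


def uniform_sd_pdf(sigma, upper):
    """Density of sigma ~ uniform(0, upper)."""
    return 1.0 / upper if 0.0 < sigma < upper else 0.0


if __name__ == "__main__":
    # Hypothetical hyperparameter settings, chosen only for illustration.
    settings = [(0.001, 0.001), (0.5, 0.0005), (1.0, 1.0)]
    for sigma in (0.05, 0.2, 1.0):
        ig_values = [implied_sd_pdf(sigma, a, e) for a, e in settings]
        print(sigma, ig_values, uniform_sd_pdf(sigma, 10.0))
```

For the example values used here, the uniform density is flat in σ_s, whereas the density implied by the inverse-gamma prior at small σ_s differs by many orders of magnitude between hyperparameter settings, which is the kind of sensitivity the reviewer refers to.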

Comment 2: The definition of spatial correlation should have an effect on the degree of smoothing. Although simple and easy to manipulate, the use of binary first-order adjacency weights, under which only areas with common borders are assumed to be spatially correlated, is a strong assumption, especially in empirical case studies without validation. This limitation should be highlighted at the end of the manuscript.

Response 2: The authors acknowledge that this is a strong assumption. The authors have stated this assumption up front in the introduction as one of the three constraints imposed on this study. Given that the focus of the study is on proposing and comparing the goodness-of-smoothing criteria, we do not consider it a limitation of this study per se; rather, consideration of other spatial weights specifications is out of scope. Consequently, no changes have been made to the manuscript.
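
For readers unfamiliar with the term, a binary first-order adjacency weights matrix can be sketched as follows. This is an illustrative example only, assuming a hypothetical toy map of four areas arranged in a row; the function names and the border list are not taken from the paper.

```python
def binary_adjacency(n_areas, shared_borders):
    """Return an n x n 0/1 matrix with w[i][j] = 1 iff areas i and j share a border."""
    w = [[0] * n_areas for _ in range(n_areas)]
    for i, j in shared_borders:
        if i == j:
            raise ValueError("an area is not treated as its own neighbour")
        # Adjacency is symmetric, so set both entries.
        w[i][j] = 1
        w[j][i] = 1
    return w


def neighbour_counts(w):
    """Number of first-order neighbours of each area (row sums of w)."""
    return [sum(row) for row in w]


if __name__ == "__main__":
    # Hypothetical toy map: four areas in a row, 0 - 1 - 2 - 3.
    borders = [(0, 1), (1, 2), (2, 3)]
    w = binary_adjacency(4, borders)
    for row in w:
        print(row)
    print(neighbour_counts(w))
```

In this toy map, areas 0 and 2 receive a weight of zero even though they are separated by a single area, which illustrates why the reviewer describes the binary first-order specification as a strong assumption.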

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Qiang Zeng

28 Apr 2020

Comparing Bayesian Spatial Models: Goodness-of-smoothing Criteria for Assessing Under- and Over-smoothing

PONE-D-20-05534R1

Dear Dr. Duncan,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Qiang Zeng, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Qiang Zeng

7 May 2020

PONE-D-20-05534R1

Comparing Bayesian Spatial Models: Goodness-of-smoothing Criteria for Assessing Under- and Over-smoothing

Dear Dr. Duncan:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Qiang Zeng

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an IG hyperprior (lip cancer data set, 56 counties of Scotland).

    (DOCX)

    S2 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an LTN hyperprior (lip cancer data set, 56 counties of Scotland).

    (DOCX)

    S3 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an IG hyperprior (lip cancer data set, 56 counties of Scotland).

    (DOCX)

    S4 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an LTN hyperprior (lip cancer data set, 56 counties of Scotland).

    (DOCX)

    S5 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an IG hyperprior (SIDS data set, 100 counties in North Carolina).

    (DOCX)

    S6 Fig. Maps showing the posterior mean estimates of the key model parameters for the Leroux model variants with an LTN hyperprior (SIDS data set, 100 counties in North Carolina).

    (DOCX)

    S7 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an IG hyperprior (SIDS data set, 100 counties in North Carolina).

    (DOCX)

    S8 Fig. Maps showing the posterior mean estimates of the key model parameters for the BYM model variants with an LTN hyperprior (SIDS data set, 100 counties in North Carolina).

    (DOCX)

    S9 Fig. Model residuals for the lip cancer data set, 56 counties of Scotland.

    (DOCX)

    S10 Fig. Model residuals for the SIDS data set, 100 counties in North Carolina.

    (DOCX)

    S11 Fig. Variograms for each model variant fit to the SIDS data set.

    The solid and dashed lines denote the variograms of CASIR and CARSIR respectively, each averaged over the areas. The grey dots denote the area-specific variogram of CASIR. Note that the y-axis has been capped at 1 for clarity.

    (DOCX)

    S12 Fig. Kurtosis and roughness for each model variant fit to the SIDS data set.

    (DOCX)

    S13 Fig. Kappa statistic between CASIR and CARSIR for each model variant fit to the SIDS data set, using 3 and 5 discrete categories.

    Values close to 0 suggest over-smoothing while values close to 1 suggest under-smoothing.

    (DOCX)

    S14 Fig. Fraction of spatial variation for each model variant fit to the SIDS data set.

    (DOCX)

    S15 Fig. Distribution of the relative position of the CASIR estimates for each model variant (SIDS data).

    Distributions with substantial density close to 0 indicate under-smoothing; distributions with substantial density close to 1 indicate over-smoothing.

    (DOCX)

    S1 Table. The specific values of the hyperparameters α, η, v, and π used to produce the model variants.

    (DOCX)

    S2 Table. Classification of the models based on the GoS criteria (lip cancer data).

    A “PASS” indicates that the model variant is neither under- nor over-smoothing under the given criterion (see Table 1). VR = variogram ratio; KP = kurtosis preservation; K = kappa; RPC = relative position of CASIR; (u) = unbiased; (c) = conservative (less likely to choose under- or over-smoothed models); (pu) = penalise under-smoothing more heavily than over-smoothing.

    (DOCX)

    S3 Table. Classification of the models based on the GoS criteria (SIDS data).

    A “PASS” indicates that the model variant is neither under- nor over-smoothing under the given criterion (see Table 1). VR = variogram ratio; KP = kurtosis preservation; K = kappa; RPC = relative position of CASIR; (u) = unbiased; (c) = conservative (less likely to choose under- or over-smoothed models); (pu) = penalise under-smoothing more heavily than over-smoothing.

    (DOCX)

    Data Availability Statement

    All data are publicly available from existing studies, the details of which are provided in the paper.


    Articles from PLoS ONE are provided here courtesy of PLOS
