Assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia

Farzana Jahan; Shovanur Haque; James Hogg; Aiden Price; Conor Hassan; Wala Areed; Helen Thompson; Jessica Cameron; Susanna M Cramb

doi:10.1371/journal.pone.0313079

PLoS One. 2025 Jan 28;20(1):e0313079.


Editor: T Ganesh Kumar

PMCID: PMC11774366 PMID: 39874284

Abstract

Background

Spatial data are often aggregated by area to protect the confidentiality of individuals and aid the calculation of pertinent risks and rates. However, the analysis of spatially aggregated data is susceptible to the modifiable areal unit problem (MAUP), which arises when inference varies with boundary or aggregation changes. While the impact of the MAUP has been examined previously, typically these studies have focused on well-populated areas. Understanding how the MAUP behaves when data are sparse is particularly important for countries with less populated areas, such as Australia. This study aims to assess different geographical regions’ vulnerability to the MAUP when data are relatively sparse to inform researchers’ choice of aggregation level for fitting spatial models.

Methods

To understand the impact of the MAUP in Queensland, Australia, the present study investigates inference from simulated lung cancer incidence data using the five levels of spatial aggregation defined by the Australian Statistical Geography Standard. To this end, Bayesian spatial BYM models with and without covariates were fitted.

Results and conclusion

The MAUP impacted inference in the analysis of cancer counts for data aggregated to the coarsest areal structures. However, areal structures with moderate resolution were not greatly impacted by the MAUP, and offer advantages in terms of data sparsity, computational intensity and availability of data sets.

Introduction

It is well known that health outcomes vary by residential location. Spatial modelling of health data (e.g. disease mapping) can provide critical insights into the geographic patterns of health outcomes and the relationships between those outcomes and their determinants. Aggregating spatial data by area is a popular practice in disease mapping as it helps: (1) identify clustering of disease, (2) provide insights into the spatial variability of disease burden, and (3) identify health disparities across different geographic regions [1]. Spatial aggregation can also help protect confidentiality and aid the calculation of pertinent rates and risks. However, aggregated spatial data are susceptible to the modifiable areal unit problem (MAUP), where the choice of spatial units used to aggregate the data can significantly impact the analysis results.

The term MAUP was first used by Openshaw and Taylor [2], who showed that correlation values vary systematically when different boundary systems are used for analysis. The MAUP may emerge from two different mechanisms [3]: the zoning effect, where the total number of areas is kept constant but boundaries are redefined; and the scaling effect, where the spatial resolution or aggregation level is altered, leading to changes in the total number of areas. Analysing datasets from the same population using different zonations or scales may lead to inconsistent results, which is the essence of the MAUP [3]. Multiple cancer studies have demonstrated that different methods of aggregating health data lead to varying interpretations of disease clusters (e.g. [4, 5]), with different zoning schemes and scales of aggregation resulting in different clusters being identified and impacting conclusions about high-risk areas.

Some have argued that aggregation beyond the minimal level yields untrustworthy results [6], while others believe there can be equally valid but different conclusions from analysing data at different aggregation levels [7]. Understanding the MAUP is crucial for analysis of spatial data at the small area level [8], since there are still several unresolved issues [9].

Ongoing research is examining how the MAUP affects various types of data and different geographical structures [7, 10–13]. However, Australia’s geography is unique due to its vast geographical size, diverse ecological systems, and highly heterogeneous population distribution. As a result, the MAUP may have different implications for health data analysis in Australia compared to other countries. To date, there are only a handful of published works [6, 11, 14] investigating the impact of the MAUP in the Australian context.

In Australia, MAUP studies have modelled emergency department [11] or hospital data [14] using the popular Bayesian Besag, York and Mollié (BYM) model [15] in the city of Perth, Western Australia. The Perth geography considered in these studies comprised 3890 statistical areas level 1 (SA1s) and 164 larger statistical areas level 2 (SA2s) in 2011, with median populations of 400 and 10,000, respectively. These studies have proposed various solutions to the MAUP, including using an overlay aggregation method for disease mapping that incorporates information from two nested aggregate levels [14], or minimising the influence of the MAUP by only reporting results at very high resolution, such as SA1s [11]. However, far more people present to hospital emergency departments or are admitted to hospital than experience less common diseases or conditions.

Internationally, there has been some examination of the MAUP for less common diseases. In the United States, the MAUP was evaluated when modelling breast cancer incidence recorded in the Indiana State Cancer Registry from 2010 to 2015, but even at the smallest levels examined, areas had between 600 and 3000 residents [16]. The authors' suggestion for avoiding the MAUP was to disaggregate counts to approximate individual-point locations, as was done in a New Hampshire study on birth defects [17]. That study used Restricted and Controlled Monte Carlo (RCMC) to disaggregate area-level counts by randomly placing individuals in pixels within the original area, with a higher probability of placement where the population was higher. Disadvantages included the need to run the procedure multiple times to account for the inherent uncertainty in placement, and then being restricted to analyses designed for individual-level data.

Investigating the impact of the MAUP in regions with sparse data is crucial for accurate and reliable spatial analyses. In countries like Australia, where large areas have low population densities, this investigation is particularly important to inform policies and decisions that are equitable, effective, and tailored to the unique characteristics of these regions. Understanding the MAUP ensures that data-driven decisions truly reflect the realities of these areas, leading to better outcomes for their inhabitants and environments.

Given the limited suggestions offered to date, there remains a significant opportunity to elucidate the influence of the MAUP within sparse data contexts. The impact of the MAUP has not previously been explored on outcomes with low counts and very sparse populations, such as cancer within rural and remote Australia. The present study aims to fill this research gap by exploring the implications of Bayesian spatial modelling applied to sparse data aggregated at diverse boundary levels across an entire state in Australia.

In this paper we explore the impact of the MAUP using Bayesian spatial models applied to simulated cancer data, which have lower overall counts than the data used in previous studies, across different levels of aggregation in Queensland, Australia. We consider the impact on both spatial disease patterns and covariate inference.

Materials and methods

Spatial structures

The Australian Statistical Geography Standard (ASGS) 2016, developed by the Australian Bureau of Statistics, defines a hierarchy of area structures with five levels of resolution. The area structure with the finest resolution is termed “mesh blocks”, and the state of Queensland has 67,047 mesh blocks with a median population of 82. The remaining levels of the hierarchy are termed “Statistical Areas” (SA) and the resolution decreases from SA1 (n = 11,507 areas, median population of 401) to SA4 (n = 19, median population of 198,975), excluding the zero population areas. The hierarchy is nested, so that each mesh block lies wholly within a single SA1, each SA1 lies within a single SA2 and so on.

Data

Simulated lung cancer incidence data for Queensland, Australia were provided by Cancer Council Queensland and reflect the distribution of lung cancer incidence over a ten-year period, 2005–2014 [18]. Lung cancer incidence counts (numbers of cases) were simulated at the SA2 level, and then aggregated to the SA3 and SA4 levels, and disaggregated to the SA1 and mesh block levels using publicly available geographical correspondence files [19]. The correspondence files enable more accurate aggregation and disaggregation by accounting for the total populations of mesh blocks and SA levels [20]. Counts and populations at each level of aggregation were calculated by adding the counts and populations for the respective areas. At each level of aggregation, indirectly-standardised incidence rates for area i (SIR_obs,i) were calculated as the ratio of the observed counts (y_i) to the expected counts (E_i), where the expected counts are based on the state-wide observed counts in age group k (y_k), the population in age group k for area i (pop_ik) and the state-wide population in age group k (pop_k), as follows.

\begin{matrix} \mathrm{SIR}_{\mathrm{obs}, i} & = & y_{i} / E_{i}, \quad \text{for } i = 1, 2, \dots, N \text{ areas}, \\ E_{i} & = & \sum_{k = 1}^{K} \frac{y_{k} \, \mathrm{pop}_{ik}}{\mathrm{pop}_{k}} . \end{matrix}
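The indirect standardisation above can be illustrated with a short Python sketch. The function names and the toy numbers (two age groups, three areas) are assumptions made for illustration only; they are not the simulated Queensland data and this is not the authors' R code.

```python
def expected_counts(state_cases_by_age, state_pop_by_age, area_pop_by_age):
    """E_i = sum_k y_k * pop_ik / pop_k for each area i."""
    return [
        sum(
            y_k * pop_ik / pop_k
            for y_k, pop_k, pop_ik in zip(state_cases_by_age, state_pop_by_age, pops)
        )
        for pops in area_pop_by_age
    ]


def standardised_incidence_ratios(observed, expected):
    """SIR_obs,i = y_i / E_i for each area i."""
    return [y_i / e_i for y_i, e_i in zip(observed, expected)]


# Toy inputs (hypothetical): rows are areas, columns are age groups.
area_pop_by_age = [[100, 50], [200, 100], [700, 350]]
state_pop_by_age = [1000, 500]   # column sums of area_pop_by_age
state_cases_by_age = [10, 20]    # state-wide cases per age group
observed = [4.0, 6.0, 20.0]      # observed cases per area (sums to 30)

expected = expected_counts(state_cases_by_age, state_pop_by_age, area_pop_by_age)
sir = standardised_incidence_ratios(observed, expected)
```

Because the expected counts redistribute the state-wide cases in proportion to each area's age-specific population, they sum to the state-wide total, and an SIR above 1 indicates more cases than expected given the area's age structure.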

Fig 1 shows the simulated cancer counts for each of the different aggregation levels in Queensland, followed by the observed standardised incidence rates in Fig 2. Disease counts are not a good representation of disease risk, since they do not account for the total population in the area. Hence the SIR, which is the ratio of observed to expected incidence cases, is used to summarise the underlying pattern of lung cancer incidence in the simulated Queensland data. Table 1 presents the descriptive statistics of the data.

Table 1. Population and number of lung cancer diagnoses by geographic levels of aggregation.

Aggregate level	Number of areas	Mean	SD	Min	Median	IQR	Max
	Population
Mesh	67047	87.0	57.7	3.0	82	63.0	2446.0
SA1	11507	422.0	181.4	14.0	401.0	186.0	5156.0
SA2	507	7737.0	3932.6	7.0	7120.0	4764.0	23338.0
SA3	82	48878.0	24660.5	10487.0	45363.0	22764	159519.0
SA4	19	160431.0	83677.4	81082.0	198975.0	91187	462573.0
	Lung cancer diagnosis
Mesh	67047	0.3	0.3	0	0.3	0.4	10.0
SA1	11507	1.9	1.1	0	1.7	1.3	16
SA2	507	39.6	27.1	0	33.0	28	216.0
SA3	82	252.4	134.2	55.0	223.4	151.3	672.0
SA4	19	1089.4	590.5	396.2	972.0	573.6	2778


SD: standard deviation. IQR: interquartile range, calculated as the difference between the 75th and 25th percentiles.

Socio-Economic Indexes for Areas (SEIFA)

Lung cancer is known to have a strong association with socioeconomic factors, with higher incidence among areas with greater socioeconomic disadvantage [21]. To study the impact of the MAUP on inference at different levels of spatial aggregation, we included a covariate: area-level socioeconomic disadvantage. We used the Australian Bureau of Statistics’ Index of Relative Socioeconomic Disadvantage (IRSD), categorised into quintiles, where a region in the first quintile is amongst the most disadvantaged regions and a region in the fifth quintile is amongst the least disadvantaged regions [22]. The IRSD scores and quintiles are publicly available at the SA1 and SA2 level, and thus, SA2-level IRSD scores were disaggregated to mesh block and aggregated to higher regions (SA3 and SA4) following the methods recommended by the Australian Bureau of Statistics [23], and formed into quintiles.

Ethical approval

Only simulated data were used in this manuscript. The lung cancer data were simulated by Cancer Council Queensland following the appropriate ethical considerations [18], and this was approved by the Data Custodians of the original data. The other datasets used in this manuscript are the geographical boundary levels and population counts, which are publicly available via the Australian Bureau of Statistics website [24].

Statistical methods

The impact of the MAUP on varying area levels was assessed by investigating the underlying spatial autocorrelation in observed data and model residuals. We also compared the posterior summaries of the parameters of interest from the Bayesian spatial models fitted to five different aggregate levels (mesh blocks to SA4 level).

Spatial autocorrelation

When spatial autocorrelation is present, analyses conducted need to be able to adequately incorporate it [25]. While there are multiple indices and tests available to detect the presence of spatial autocorrelation, in this study we use Moran’s I, an inferential statistic used to measure spatial autocorrelation [26], in two ways. First, to assess if the presence of spatial autocorrelation varies among different aggregations of raw data. Second, to check if the model residuals have any remaining spatial autocorrelation. Spatial models are fitted using a spatially structured error component to accommodate the spatial autocorrelation in the observed data. However, there could be well fitting models that provide good predictions but still have spatial autocorrelation indicated by significant Moran’s I on model residuals [27].
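To make the statistic concrete, the following is a minimal Python sketch of the global Moran's I under binary adjacency weights. The four-area "chain" of neighbours and the function name are hypothetical; the sketch computes only the statistic, whereas the study itself used the R package spdep, which also provides the associated tests and p-values.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights w_ij = 1 if j is a neighbour of i.

    I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    # Cross-products of deviations over all neighbouring pairs (both directions).
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    total_weight = sum(len(neighbours[i]) for i in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical layout: four areas in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

i_trend = morans_i([1.0, 2.0, 3.0, 4.0], chain)          # similar neighbours
i_alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # dissimilar neighbours
```

A smooth trend across neighbouring areas yields a positive value, while an alternating pattern yields a negative one, which is the sense in which positive values in Table 2 indicate clustering of similar values.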

Bayesian spatial model: BYM model

The observed cancer counts from different small area aggregations are modelled using a very well-known Bayesian spatial modelling framework, a BYM model [15]. The BYM model is a type of generic three level Bayesian hierarchical model, where at the first stage the observed disease counts are modelled using an appropriate likelihood, the second stage models the spatial association by specifying appropriate structures and the third stage specifies the hyperprior distributions [1].

The BYM model incorporates extra variation at the second stage via two random effect terms: a) a spatially structured component and b) a spatially unstructured component. Inclusion of both spatially structured and unstructured random effects enables the disease rates to be smoothed at both the global and local level [28]. For the spatially structured term, the most commonly used approach for areal spatial data is the Gaussian Markov random field (GMRF) model, which conditions area i on its neighbourhood [29, 30], particularly in the form of the conditional autoregressive (CAR) prior. The detailed BYM model used is presented below:

Let Y_i be the observed lung cancer counts, which can be expressed under the general disease mapping framework using a Poisson likelihood [1]:

\begin{matrix} Y_{i} \sim \text{Poisson} (E_{i} e^{\mu_{i}}), \quad \text{for } i = 1, 2, \dots, N \text{ areas}, \end{matrix}

where E_i is the expected cancer counts in each area i. The present study will have different values of N for different boundary levels (Table 1). The log-relative risk, denoted μ_i, is often expressed as a regression equation:

\begin{matrix} \mu_{i} = \alpha + x_{i}^{T} \beta + \psi_{i}, \end{matrix}

where the intercept α denotes an overall fixed effect, β is the vector of covariate effects associated with the vector of covariates x_i relating to area i, and ψ_i are the spatial random effects. In the context of this study, x_i represents the SEIFA IRSD quintiles for each area, where the first quintile (most disadvantaged) is the reference category; β is hence a 4 × 1 vector representing the covariate effects of SEIFA IRSD quintiles 2, 3, 4, and 5. The Gaussian priors $N (0, σ_{α}^{2})$ and $N (0, σ_{β}^{2})$ are used for α and β. The values of the hyperparameters $σ_{α}^{2}$ and $σ_{β}^{2}$ are usually chosen to be large. In the present study, we have used the default priors and parameters provided by the CARBayes package ( $σ_{α}^{2} = σ_{β}^{2} = 1{,}000{,}000$ ) [31]. The random effect ψ_i = u_i + v_i, where u_i is a spatially structured random effect with a CAR prior structure and v_i is the unstructured random effect. The conditional distribution of each u_i can be expressed as:

\begin{matrix} u_{i} \mid u_{j, j \neq i} \sim N \left( \frac{\sum_{j} w_{ij} u_{j}}{\sum_{j} w_{ij}}, \frac{\sigma_{u}^{2}}{\sum_{j} w_{ij}} \right), \end{matrix}

where w_ij are the weights defining the relationship between area i and its neighbours such that:

\begin{matrix} w_{ij} = \begin{cases} 1, & \text{if areas } i \text{ and } j \text{ are adjacent}, \\ 0, & \text{otherwise.} \end{cases} \end{matrix}

The prior for the unstructured random component v_i is typically an independent normal distribution,

\begin{matrix} v_{i} \sim N (0, \sigma_{v}^{2}) . \end{matrix}

The hyperpriors placed on the variance terms $σ_{u}^{2}$ and $σ_{v}^{2}$ are Inverse-Gamma(a, b) with hyperparameters a = 1 (shape) and b = 0.01 (scale) [31].
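As an illustration of the CAR prior for the structured effect, the sketch below evaluates the conditional mean and variance of u_i given its neighbours under binary adjacency weights. The neighbour structure, the u values and the variance are hypothetical, and the function is an illustrative helper rather than part of CARBayes.

```python
def car_conditional(i, u, neighbours, sigma2_u):
    """Conditional mean and variance of u_i given the other u_j.

    With binary weights, sum_j w_ij is simply the number of neighbours of
    area i, so the mean is the neighbours' average and the variance is
    sigma2_u divided by the number of neighbours.
    """
    nbrs = neighbours[i]
    if not nbrs:
        raise ValueError("area %d has no neighbours; the conditional is undefined" % i)
    mean = sum(u[j] for j in nbrs) / len(nbrs)
    variance = sigma2_u / len(nbrs)
    return mean, variance


# Hypothetical example: four areas in a row.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
u = [0.2, -0.1, 0.4, 0.0]

mean_1, var_1 = car_conditional(1, u, neighbours, sigma2_u=0.08)
mean_0, var_0 = car_conditional(0, u, neighbours, sigma2_u=0.08)
```

Areas with more neighbours have a smaller conditional variance, so their structured effect is pulled more strongly towards the local average. This is one reason the amount of smoothing can differ between aggregation levels with different neighbourhood structures.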

Implementation

All the analyses in this study were implemented using R [32]. In total, ten BYM models were fitted, covering every combination of aggregation level (mesh block, SA1, SA2, SA3 and SA4) and model specification (with and without covariates). Fully Bayesian inference was conducted via Markov chain Monte Carlo in the R package CARBayes version 5.2.5 [31]. For a single chain, we used 1,500,000 total iterations, discarding the first 500,000 as burn-in. The remaining posterior draws were thinned by 100 to reduce autocorrelation, leaving 10,000 draws. Model convergence was checked through Geweke diagnostics [33]. The spatial autocorrelation of the observed data and model residuals was checked using Moran’s I with the R package spdep version 1.2-8 [34].
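The sampling arithmetic and the idea behind the convergence check can be sketched in Python as follows. The z-score below is a deliberately simplified version of the Geweke diagnostic: it compares the mean of the first 10% of a chain with the mean of the last 50% but uses plain sample variances, which assumes roughly independent draws, whereas the standard diagnostic uses spectral density estimates. The function names and the toy chain are assumptions for illustration.

```python
import statistics


def retained_draws(total_iterations, burn_in, thin):
    """Number of posterior draws kept after discarding burn-in and thinning."""
    return (total_iterations - burn_in) // thin


def simplified_geweke_z(chain, first=0.1, last=0.5):
    """Z-score comparing the early and late parts of a chain.

    Simplification (assumption): naive standard errors from sample variances
    replace the spectral density estimates of the standard Geweke diagnostic.
    """
    n = len(chain)
    head = chain[: int(first * n)]
    tail = chain[n - int(last * n):]
    squared_se = (
        statistics.variance(head) / len(head)
        + statistics.variance(tail) / len(tail)
    )
    return (statistics.fmean(head) - statistics.fmean(tail)) / squared_se ** 0.5


# Settings reported in the text: 1,500,000 iterations, 500,000 burn-in, thin by 100.
n_kept = retained_draws(1_500_000, 500_000, 100)

# Toy chain with an obvious drift, which the z-score should flag.
drifting_chain = [i / 1000 for i in range(1000)]
z_drift = simplified_geweke_z(drifting_chain)
```

An absolute z-score well above about 2 suggests the early and late parts of the chain have different means, which indicates that the chain has not yet converged.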

The associated R code for model implementation and calculation of spatial autocorrelation indices is available in the GitHub repository: https://github.com/Farzana-Jahan/MAUP.git.

Results

The simulated lung cancer data had a total of 20,700 lung cancer cases, with the median number per area ranging from 0.27 in mesh blocks to 972 in SA4s (Table 1). There was large variation in observed and modelled SIRs across the state, with more remote western regions typically having higher rates, and the same area could have very different raw and modelled SIRs at different levels of aggregation (Figs 2–4). High spatial autocorrelation was present in the observed data at the mesh block, SA1 and SA2 levels, but not at higher levels of aggregation (Table 2). After modelling, regardless of whether the socioeconomic variable was included, the residuals at the mesh block, SA1 and SA2 levels still showed high positive spatial autocorrelation (Table 2), but this could likely be attenuated by including additional covariates. However, the models for each level of aggregation had predominantly spatially structured smoothing, as demonstrated by the fraction of spatial variation, and this was slightly attenuated after including socioeconomic details for SA1 to SA3, and markedly for SA4 (Tables 3 and 4).

Table 2. Moran’s I of observed counts and modelled residuals.

Aggregate level	Observed counts		Residuals (without covariate)		Residuals (with covariate)
Aggregate level	statistic	p-value	statistic	p-value	statistic	p-value
Mesh	0.274	<0.0001	0.226	0.001	0.273	0.001
SA1	0.293	0.001	0.269	0.0001	0.21	<0.0001
SA2	0.228	0.001	0.245	0.0001	0.216	<0.0001
SA3	0.022	0.318	0.062	0.1476	0.135	0.0237
SA4	0.152	0.062	-0.078	0.5518	-0.231	0.917


Table 3. Posterior summary of simulated lung cancer BYM model without covariates.

Aggregate level	Parameter estimates			Fraction of spatial variation
Aggregate level	α Median (95% CI)	$σ_{u}^{2}$ Median (95% CI)	$σ_{v}^{2}$ Median (95% CI)	Fraction of spatial variation
Mesh	-0.45 (-0.47,-0.43)	0.08 (0.07,0.09)	0.0013 (0.0007,0.0023)	0.98
SA1	-0.0037 (-0.0178,0.0103)	0.0054 (0.0033,0.0083)	0.0009 (0.0006,0.0015)	0.85
SA2	-0.0052 (-0.02,0.0093)	0.007 (0.0032,0.0124)	0.0031 (0.0014,0.0058)	0.69
SA3	-0.0005 (-0.017,0.0164)	0.0133 (0.0043,0.028)	0.0048 (0.0018,0.0097)	0.73
SA4	-0.0018 (-0.0222,0.0198)	0.0452 (0.003,0.0633)	0.019 (0.0028,0.0247)	0.70
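The text does not state how the fraction of spatial variation is computed. A common definition, assumed here, is the share of the total random-effect variance attributable to the structured component, $σ_{u}^{2} / (σ_{u}^{2} + σ_{v}^{2})$. The Python sketch below applies this assumed definition to the posterior medians in Table 3; the reported values may instead have been computed per posterior draw, so agreement is not guaranteed for every row of Tables 3 and 4.

```python
def spatial_fraction(sigma2_u, sigma2_v):
    """Assumed definition: structured share of the random-effect variance."""
    return sigma2_u / (sigma2_u + sigma2_v)


# Posterior medians from Table 3: (sigma2_u, sigma2_v, reported fraction).
table3 = {
    "Mesh": (0.08, 0.0013, 0.98),
    "SA1": (0.0054, 0.0009, 0.85),
    "SA2": (0.007, 0.0031, 0.69),
    "SA3": (0.0133, 0.0048, 0.73),
    "SA4": (0.0452, 0.019, 0.70),
}

fractions = {
    level: spatial_fraction(s2u, s2v) for level, (s2u, s2v, _) in table3.items()
}

# Largest absolute difference between the recomputed and reported fractions.
max_gap = max(abs(fractions[level] - table3[level][2]) for level in table3)
```

Under this assumption the recomputed fractions agree with the Table 3 column to within 0.01, which is consistent with rounding of the published medians.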


Table 4. Posterior summary of simulated lung cancer BYM model with covariates.

Aggregate level	Parameter estimates							Fraction of spatial variation
Aggregate level	α Median (95% CI)	β₁ Median (95% CI)	β₂ Median (95% CI)	β₃ Median (95% CI)	β₄ Median (95% CI)	$σ_{u}^{2}$ Median (95% CI)	$σ_{v}^{2}$ Median (95% CI)	Fraction of spatial variation
Mesh	-0.33 (-0.55, -0.10)	-1.71 (-1.95,-1.47)	-0.471 (-0.83,-0.39)	-0.22 (-0.44,-0.01)	0.02 (-0.20, 0.24)	0.26 (0.23, 0.28)	0.001 (0.0007,0.0021)	0.99
SA1	0.12 (0.09,0.15)	-0.07 (-0.11, -0.03)	-0.12 (-0.17, -0.08)	-0.18 (-0.24,-0.15)	-0.28 (-0.32,-0.22)	0.002 (0.001,0.003)	0.0009 (0.006,0.0015)	0.69
SA2	0.199 (0.17,0.23)	-0.131 (-0.17,-0.09)	-0.229 (-0.27,-0.19)	-0.287 (-0.33,-0.24)	-0.462 (-0.51,-0.41)	0.0015 (0.0008,0.003)	0.0012 (0.0007,0.0019)	0.55
SA3	0.10 (0.0598,0.147)	-0.047 (-0.11,0.017)	-0.06 (-0.13,0.005)	-0.153 (-0.23,-0.08)	-0.27 (-0.34,-0.2)	0.004 (0.002,0.009)	0.003 (0.001,0.005)	0.57
SA4	0.13 (0.05,0.22)	-0.08 (-0.21,0.04)	-0.14 (-0.285,0.008)	-0.19 (-0.33,-0.06)	-0.32 (-0.46,-0.16)	0.008 (0.002,0.024)	0.005 (0.002,0.012)	0.32


Differences were found in the estimates for socioeconomic coefficients depending on the level of aggregation. For instance, β₄ (least disadvantaged) was essentially 0 and non-significant at the mesh block level, while all other aggregations indicated significantly negative values (Table 4). This meant that for SA1 and above, the least disadvantaged areas had lower lung cancer diagnosis rates compared to the most disadvantaged areas (the reference category). In contrast, the coefficients for the intermediate quintiles 2 and 3 (β₁ and β₂ in Table 4) were significantly negative at higher resolution but became non-significant for SA3s and SA4s (Table 4). These differences could be a result of the MAUP. However, apart from mesh blocks, each level of aggregation showed a consistent socioeconomic gradient in the modelled results, with increasingly negative coefficients as socioeconomic disadvantage diminished.
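Because the model is linear on the log-relative-risk scale, each coefficient can be read as a relative risk against the reference quintile by exponentiating it. The Python sketch below does this for the SA2-level β₄ estimate in Table 4; the variable names are illustrative. Exponentiation is monotone, so the credible interval limits can be transformed in the same way.

```python
import math


def relative_risk(beta):
    """Relative risk versus the reference category implied by a log-scale coefficient."""
    return math.exp(beta)


# SA2-level posterior median and 95% CI for beta_4 (quintile 5 vs quintile 1), Table 4.
beta4_sa2 = {"lower": -0.51, "median": -0.462, "upper": -0.41}

rr_sa2 = {name: relative_risk(value) for name, value in beta4_sa2.items()}

# Percentage reduction in modelled incidence relative to the most disadvantaged quintile.
percent_lower = 100 * (1 - rr_sa2["median"])
```

For the SA2 model this gives a relative risk of roughly 0.63, that is, modelled incidence in the least disadvantaged quintile about 37% lower than in the most disadvantaged quintile.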

It is evident from the posterior summaries that inference changes with different aggregations of areas across Queensland. For models both with and without covariates, the credible intervals for the regression parameters (α, β_i) and variance parameters ( $σ_{u}^{2}$ and $σ_{v}^{2}$ ) were narrower for lower levels of aggregation (SA1 and SA2) and wider for the larger boundaries (SA3 and SA4) (Tables 3 and 4 and Figs 5 and 6), except for inference using the mesh block level data.

Discussion and conclusion

The present study sought to understand the impact of the MAUP in spatial analysis of sparse disease data across different administrative boundary levels when using a popular Bayesian spatial model in a geographically challenging environment. We found important differences in all aspects: spatial patterns, inference from estimates and coefficient results.

One proposed method for minimising the MAUP is to use high spatial resolution [11]. However, we found that mesh blocks, which typically represent 30 to 60 dwellings [35], were too sparse for a condition as rare as cancer. Estimates for model parameters at the mesh block level often differed greatly from those at all other aggregation levels, and the large number of areas made modelling computationally intensive. While the next aggregation level of SA1s performed better, it is extremely difficult to obtain health data with location either provided as an SA1, or with the detailed street address needed to assign an SA1. Given that SA2s tend to be the lowest realistic area size possible in health datasets, the large differences in modelled covariate coefficients between SA1s and SA2s, and the different spatial patterns, amplify the importance of acknowledging the MAUP when conducting spatial analyses.

While modelling at the mesh block level can increase the sample size (i.e., the number of areas) and reduce heterogeneity within areas, it can also lead to situations where many areas have a small or zero number of cases or population, resulting in a loss of information and the potential for oversmoothing. This aligns with the findings of Fontanet et al. [16], where the smallest level failed to provide meaningful inference due to smaller populations and counts. Kok et al. [11] also found this to be the case when analysing hospitalisation rates for foot-related issues among the Indigenous population of Australia. Our findings, in conjunction with theirs, suggest that while aiming for the highest level of resolution may be optimal for reducing the biases of the MAUP, using mesh blocks can introduce other problems such as small numbers, zero populations, and computational inefficiencies. In small-area health studies, particularly those involving rare diseases or small populations, sparse data can lead to highly imprecise rate estimates. This imprecision is amplified by the MAUP, causing substantial variability in the identification of disease clusters based on different spatial units [4, 36]. As a result, health interventions and resource allocations might be misdirected due to the unreliability of the spatial analysis.

In addition, using the well-defined boundary levels of SA1 and SA2 enables easy incorporation of additional demographic and socio-economic data into disease models [37]. Detailed spatial data at these levels can often be more effectively communicated to stakeholders, policymakers, and the public, facilitating better understanding and engagement.

With higher aggregations (SA3 and SA4 levels), we found that the spatial autocorrelation of the data was no longer significant, so spatial modelling or smoothing may not be required. However, since SA1 and SA2 level data have significant spatial autocorrelation, analysis of cancer incidence data at these levels should be performed using a model that accounts for spatial autocorrelation.

The study has several limitations arising from data and methodology restrictions. Data constraints required SA1 and meshblock level data to be created from the SA2 level. The high level of heterogeneity within larger areas such as SA3s and above means that area-level socioeconomic variables are not released, and this may have impacted results, despite using recommended methodology to create these. However, this is also a feature of the MAUP: both the outcome and its covariates are impacted by aggregation. Some models did not fit well, considering the large amount of spatial correlation remaining in the residuals. As inadequate adjustment for spatial autocorrelation can bias fixed effects, in practical applications researchers are encouraged to explore different spatial structures and more useful covariates, particularly spatially correlated covariates. Our focus was on the commonly used Australian areal structures (Mesh block, SA1, SA2, SA3 and SA4), but rezoning boundaries (while keeping land areas consistent) can also identify the MAUP. We were interested in seeing how sparse data performed in the challenging Australian context, but our approach could be repeated considering other health conditions of varying prevalence, or including all of Australia.

In conclusion, as a result of this work we propose an alternative narrative regarding the optimal aggregation level for spatial modelling. Although it is common to analyse spatial data at the lowest level of aggregation possible, we suggest that a natural balance is instead preferable for inference, modelling and, most importantly, reducing the impact of the MAUP. In the context of Australia, we found that although analysing cancer incidence at the mesh block level (lowest aggregation) alleviates the concerns of the MAUP, it is not advisable due to the high computational burden and extremely noisy and sparse data. Moreover, when using higher aggregation levels such as SA3 and SA4, uncertainty increases, making it more difficult to identify useful covariates as a result of the MAUP. Thus, for the lung cancer data used in this work, we recommend modelling at the SA1 or SA2 level. Due to the sparsity inherent in disaggregating cancer data, we hypothesise that a similar conclusion may be drawn for other cancers and other countries with areas of low population density.

Supporting information

S1 Data

(XLSX)

pone.0313079.s001.xlsx (83.7 KB)

Data Availability

Simulated lung cancer data for Queensland used in the research are made available as supplementary information.

Funding Statement

QUT Centre for Data Science.

References

1. Best N, Richardson S, Thompson A. A comparison of Bayesian spatial models for disease mapping. Statistical methods in medical research. 2005;14(1):35–59. doi: 10.1191/0962280205sm388oa [DOI] [PubMed] [Google Scholar]
2. Openshaw S. The modifiable areal unit problem. Concepts and techniques in modern geography. 1984;38, Norwich:GeoBooks. [Google Scholar]
3. Wong DW. The modifiable areal unit problem (MAUP). WorldMinds: geographical perspectives on 100 problems: commemorating the 100th anniversary of the association of American geographers 1904-2004. 2004; p. 571–575. doi: 10.1007/978-1-4020-2352-1_93 [DOI] [Google Scholar]
4. Fotheringham AS, Wong DW. The modifiable areal unit problem in multivariate statistical analysis. Environment and planning A. 1991;23(7):1025–1044. doi: 10.1068/a231025 [DOI] [Google Scholar]
5. Gregorio DI, DeChello LM, Samociuk H, Kulldorff M. Lumping or splitting: seeking the preferred areal unit for health geography studies. International Journal of Health Geographics. 2005;4:1–10. doi: 10.1186/1476-072X-4-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Tuson M, Yap M, Kok MR, Murray K, Turlach B, Whyatt D. Incorporating geography into a new generalized theoretical and statistical framework addressing the modifiable areal unit problem. International journal of health geographics. 2019;18(1):1–15. doi: 10.1186/s12942-019-0170-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Buzzelli M. Modifiable areal unit problem. International Encyclopedia of Human Geography. 2020; p. 169. doi: 10.1016/B978-0-08-102295-5.10406-8
8. Nelson JK, Brewer CA. Evaluating data stability in aggregation structures across spatial scales: revisiting the modifiable areal unit problem. Cartography and Geographic Information Science. 2017;44(1):35–50. doi: 10.1080/15230406.2015.1093431
9. Manley D. Scale, Aggregation, and the Modifiable Areal Unit Problem. In: Handbook of regional science. Springer; 2013. p. 1157–1171.
10. Marí-Dell’Olmo M, Oliveras L, Vergara-Hernández C, Artazcoz L, Borrell C, Gotsens M, et al. Geographical inequalities in energy poverty in a Mediterranean city: Using small-area Bayesian spatial models. Energy Reports. 2022;8:1249–1259. doi: 10.1016/j.egyr.2021.12.025
11. Kok MR, Tuson M, Yap M, Turlach B, Boruff B, Vickery A, et al. Impact of the modifiable areal unit problem in assessing determinants of emergency department demand. Emergency Medicine Australasia. 2021;33(5):794–802. doi: 10.1111/1742-6723.13727
12. Dapena AD, Vázquez EF, Morollón FR. The role of spatial scale in regional convergence: the effect of MAUP in the estimation of β-convergence equations. The Annals of Regional Science. 2016;56(2):473–489. doi: 10.1007/s00168-016-0750-0
13. Jeffery C, Ozonoff A, Pagano M. The effect of spatial aggregation on performance when mapping a risk of disease. International Journal of Health Geographics. 2014;13(1):1–9. doi: 10.1186/1476-072X-13-9
14. Tuson M, Yap M, Kok MR, Boruff B, Murray K, Vickery A, et al. Overcoming inefficiencies arising due to the impact of the modifiable areal unit problem on single-aggregation disease maps. International Journal of Health Geographics. 2020;19(1):1–18. doi: 10.1186/s12942-020-00236-y
15. Besag J, York J, Mollié A. Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics. 1991;43(1):1–20. doi: 10.1007/BF00116466
16. Fontanet CP, Carlos H, Weiss JE, Diaz MCG, Shi X, Onega T, et al. Evaluating Geographic Health Disparities in Cancer Care: Example of the Modifiable Areal Unit Problem. Annals of Surgical Oncology. 2023;30(12):6987–6989. doi: 10.1245/s10434-023-14140-9
17. Shi X, Miller S, Mwenda K, Onda A, Rees J, Onega T, et al. Mapping disease at an approximated individual level using aggregate data: a case study of mapping New Hampshire birth defects. International Journal of Environmental Research and Public Health. 2013;10(9):4161–4174. doi: 10.3390/ijerph10094161
18. Cancer Council Queensland. Simulated lung cancer data for Queensland; 2015. Email: Statistics@cancerqld.org.au.
19. ABS. Australian statistical geography standard (ASGS); 2016. Available from: https://www.abs.gov.au/AUSSTATS/abs@.nsf/allprimarymainfeatures/871A7FF33DF471FBCA257801000DCD5F?opendocument.
20. ABS. Main Features-Overview; 2016. Available from: https://www.abs.gov.au/ausstats/abs@.nsf/Lookup/by%20Subject/1270.0.55.001~July%202016~Main%20Features~Overview~1.
21. Wong MC, Lao XQ, Ho KF, Goggins WB, Tse SL. Incidence and mortality of lung cancer: global trends and association with socioeconomic status. Scientific Reports. 2017;7(1):14300. doi: 10.1038/s41598-017-14513-7
22. ABS. Socio-Economic Indexes for Areas; 2016. Available from: https://www.abs.gov.au/ausstats/abs@.nsf/mf/2033.0.55.001.
23. ABS. 2033.0.55.001 - Census of Population and Housing: Socio-Economic Indexes for Areas (SEIFA), Australia, 2016; 2016. Available from: https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2033.0.55.0012016.
24. ABS. Australian Bureau of Statistics; 2016. Available from: https://www.abs.gov.au/.
25. Cliff A, Ord J. The problem of spatial autocorrelation. In: Scott AJ, editor. Papers in Regional Science 1, Studies in Regional Science. 1969. p. 25–55.
26. Tiefelsdorf M. Modelling spatial processes: the identification and analysis of spatial relationships in regression residuals by means of Moran’s I. Springer; 2000.
27. Kim D. Predicting the magnitude of residual spatial autocorrelation in geographical ecology. Ecography. 2021;44(7):1121–1130. doi: 10.1111/ecog.05403
28. Sue Bell B, Broemeling LD. A Bayesian analysis for spatial processes with application to disease mapping. Statistics in Medicine. 2000;19(7):957–974. doi: 10.1002/(SICI)1097-0258(20000415)19:7<957::AID-SIM396>3.0.CO;2-Q
29. Carlin BP, Xia H. Assessing environmental justice using Bayesian hierarchical models: two case studies. Journal of Exposure Analysis & Environmental Epidemiology. 1999;9(1).
30. Escaramís G, Carrasco JL, Ascaso C. Detection of significant disease risks using a spatial conditional autoregressive model. Biometrics. 2008;64(4):1043–1053. doi: 10.1111/j.1541-0420.2007.00981.x
31. Lee D. CARBayes: an R package for Bayesian spatial modeling with conditional autoregressive priors. Journal of Statistical Software. 2013;55(13):1–24. doi: 10.18637/jss.v055.i13
32. R Core Team. R: A Language and Environment for Statistical Computing; 2016. Available from: https://www.R-project.org/.
33. Lam P. Convergence diagnostics. Lecture presented at Government. 2002.
34. Bivand R, Altman M, Anselin L, Assunção R, Berke O, Bernat A, et al. Package ‘spdep’; 2015. Available from: https://cran.r-project.org/web/packages/spdep/index.html.
35. ABS. Mesh Blocks; Jul 2021–Jun 2026. Available from: https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/main-structure-and-greater-capital-city-statistical-areas/mesh-blocks.
36. Amrhein CG. Searching for the elusive aggregation effect: evidence from statistical simulations. Environment and Planning A. 1995;27(1):105–119. doi: 10.1068/a270105
37. Kang SY, McGree J, Mengersen K. The choice of spatial scales and spatial smoothness priors for various spatial patterns. Spatial and Spatio-temporal Epidemiology. 2014;10:11–26. doi: 10.1016/j.sste.2014.05.003

PLoS One. doi: 10.1371/journal.pone.0313079.r001

Decision Letter 0

T Ganesh Kumar

27 May 2024

PONE-D-24-13684

Assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia

PLOS ONE

Dear Dr. Jahan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR: Kindly revise the research paper as per the reviewer comments and resubmit it soon.

==============================

Please submit your revised manuscript by Jul 11 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

T. Ganesh Kumar, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

“QUT Centre for Data Science.”

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“Ethical Approval

Only simulated data were used in this manuscript, data was simulated by Cancer Council Queensland following the appropriate ethical consideration. This was approved by the Data Custodians of the original data.

Funding

This research was funded by QUT Centre for Data Science.

Data availability

Datasets used in the research are not publicly available. However, the data can be made available on request to the authors.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“QUT Centre for Data Science.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In the online submission form you indicate that your data is not available for proprietary reasons and have provided a contact point for accessing this data. Please note that your current contact point is a co-author on this manuscript. According to our Data Policy, the contact point must not be an author on the manuscript and must be an institutional contact, ideally not an individual. Please revise your data statement to a non-author institutional point of contact, such as a data access or ethics committee, and send this to us via return email. Please also include contact information for the third party organization, and please include the full citation of where the data can be found.

6. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

7. We note that Figures 1, 2, 3, and 4 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1. You may seek permission from the original copyright holder of Figures 1, 2, 3, and 4 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

8. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Dear Authors,

Your paper is of suitable quality for publication. We have received the comments from the reviewers. Both reviewers have accepted the research paper with a few comments.

The authors need to revise the manuscript as per the reviewer comments. Kindly revise your paper as per the comments and resubmit it soon.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript provides a comprehensive introduction to the modifiable areal unit problem (MAUP) and its significance in spatial epidemiology, particularly in Bayesian disease mapping. The literature review is thorough, referencing key studies that highlight the challenges and previous approaches to addressing MAUP. The methodology section is well-detailed, describing the Bayesian disease mapping techniques used, including the choice of spatial units and statistical models. The manuscript justifies the selection of spatial units, explaining how different aggregations might influence the results. Advanced statistical methods are employed to account for spatial autocorrelation and to model disease risk, demonstrating a rigorous approach. Results are presented clearly, with appropriate use of tables and maps summaries to illustrate the impact of MAUP on disease mapping. Statistical analyses are robust, showing how different spatial aggregations affect the estimates of disease risk. The results are consistent with the hypotheses, demonstrating the influence of MAUP on the findings. The discussion effectively links the results back to the research questions and the broader literature on MAUP and disease mapping. Conclusions are logically derived from the results, emphasizing the importance of considering MAUP in spatial epidemiology. Recommendations for future research and policy implications are clearly stated, highlighting the practical significance of the findings. The manuscript critically assesses the impact of MAUP, discussing potential solutions and areas for further investigation. The data used are directly relevant to the study's objectives, focusing on disease incidence in Queensland, Australia. The spatial and temporal resolution of the data is appropriate for examining the effects of MAUP on Bayesian disease mapping. The analyses are thorough, using multiple spatial aggregations to demonstrate how MAUP affects disease risk estimates. 
Statistical techniques are appropriately applied, ensuring that the findings are robust and reliable. The interpretation of the results is sound, clearly showing how different spatial units lead to variations in disease risk estimates. The manuscript discusses the implications of these variations, supporting the need for careful consideration of MAUP in spatial epidemiological studies. The data support the conclusions drawn, with clear evidence that MAUP significantly influences Bayesian disease mapping outcomes. The manuscript provides a strong case for the importance of addressing MAUP, backed by rigorous statistical analysis and comprehensive data interpretation. Overall, the manuscript "Assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia" is technically sound. The data are credible and relevant, and the analyses are thorough and appropriately applied. The conclusions are well-supported by the data, providing valuable insights into the impact of MAUP on disease mapping and highlighting the need for careful spatial analysis in epidemiological studies.

Reviewer #2: 1. Why is it significant to investigate the impact of the MAUP in regions with sparse data, particularly in countries like Australia with less populated areas?

2. How does the MAUP affect the analysis of spatially aggregated data? Why is it a concern in spatial analysis?

3. What advantages do area structures with moderate resolution offer in terms of dealing with the MAUP?

4. What types of Bayesian spatial models were employed in the study, and how do they account for spatial correlation and variability?

5. What are the potential benefits of using SA1 or SA2 levels for spatial modelling according to the authors?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dr. M. Subbulakshmi

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-24-13684_Review Comments.docx

pone.0313079.s002.docx (13.4KB, docx)

PLoS One. 2025 Jan 28;20(1):e0313079. doi: 10.1371/journal.pone.0313079.r002

Author response to Decision Letter 0

25 Aug 2024

Added the response to reviewer comments document and added the response to journal requirements in the cover letter to the editor.

Attachment

Submitted filename: Response to Reviewers comments.pdf

pone.0313079.s003.pdf (781.3KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0313079.r003

Decision Letter 1

T Ganesh Kumar

18 Oct 2024

Assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia

PONE-D-24-13684R1

Dear Dr. Jahan,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

T. Ganesh Kumar, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The authors have answered the reviewers' comments and updated the content in the revised version.

I accept the paper for further processing.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: The manuscript "Assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia" sets out to assess different geographical regions' vulnerability to the MAUP when data are relatively sparse to inform researchers' choice of aggregation level for fitting spatial models. The authors have demonstrated a commendable effort in developing a system for this purpose. Providing clear headings, subheadings, and transitions between sections would improve the flow of information and aid in understanding the research methodology and findings. Providing a comprehensive overview of the algorithms, parameters, and preprocessing techniques used would enhance the reproducibility and understanding of the study. The manuscript should adhere to standard English conventions to ensure clarity and coherence. This includes using appropriate grammar, punctuation, and sentence structure throughout the text. Ensure consistency in terminology throughout the manuscript. Additionally, maintaining cohesion between sections and paragraphs by clearly establishing logical connections between ideas would enhance readability and comprehension. Complex technical terms and concepts should also be explained clearly to accommodate readers with varying levels of expertise in the subject matter. The introduction provides a comprehensive background on the Modifiable Areal Unit Problem (MAUP) and its significance in Bayesian disease mapping. However, the review of literature could be expanded to include more recent studies or alternative methodologies that address MAUP. The specific relevance of Queensland, Australia, to the study is well articulated. It would be helpful to provide more context on why Queensland was chosen and how its geographic and demographic characteristics might influence the findings.
The methodology section is robust, but it would benefit from a more detailed explanation of the Bayesian models used and how they were adjusted to account for MAUP. The sources of data and their quality are crucial for the validity of the results. More information on the data collection process and any limitations of the datasets used would strengthen this section. The statistical techniques employed for assessing the impact of MAUP on disease mapping are sound, but additional details on the assumptions made and how they were validated would be useful. The results are presented clearly, but the discussion could be enhanced by providing more insight into how the findings compare with previous studies on MAUP and Bayesian disease mapping. The discussion section does a good job of outlining the implications of the findings for Bayesian disease mapping. However, it could benefit from a deeper exploration of how the identified issues with MAUP could influence public health decision-making and policy in Queensland. Ensure that all references are up-to-date and relevant. Adding recent publications or key studies related to MAUP and Bayesian disease mapping could enhance the credibility and depth of the literature review. Addressing these considerations will enhance the intelligibility and readability of the manuscript, thereby improving its overall impact and effectiveness in communicating the research findings to the scientific community. Overall, the manuscript presents a promising approach to assessing the influence of the modifiable areal unit problem on Bayesian disease mapping in Queensland, Australia, and addressing the points above would strengthen the technical soundness of the study and the support for its conclusions.

Reviewer #2: The queries raised have been answered, and the revised manuscript has been modified in a few areas. It can be accepted.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

PLoS One. doi: 10.1371/journal.pone.0313079.r004

Acceptance letter

T Ganesh Kumar

31 Oct 2024

PONE-D-24-13684R1

PLOS ONE

Dear Dr. Jahan,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. T. Ganesh Kumar

Academic Editor

PLOS ONE

Associated Data

Supplementary Materials

S1 Data (XLSX): pone.0313079.s001.xlsx (83.7 KB)

Attachment. Submitted filename: PONE-D-24-13684_Review Comments.docx: pone.0313079.s002.docx (13.4 KB)

Attachment. Submitted filename: Response to Reviewers comments.pdf: pone.0313079.s003.pdf (781.3 KB)

Data Availability Statement

The simulated lung cancer data for Queensland used in this research are available as supplementary information.
