PLOS One. 2020 Jan 27;15(1):e0221070. doi: 10.1371/journal.pone.0221070

Downscaling livestock census data using multivariate predictive models: Sensitivity to modifiable areal unit problem

Daniele Da Re 1,*, Marius Gilbert 2, Celia Chaiban 1,2, Pierre Bourguignon 1, Weerapong Thanapongtharm 3, Timothy P Robinson 4,5, Sophie O Vanwambeke 1
Editor: Sotirios Koukoulas 6
PMCID: PMC6984718  PMID: 31986146

Abstract

The analysis of census data aggregated by administrative units introduces a statistical bias known as the modifiable areal unit problem (MAUP). Previous research has mostly assessed the effect of MAUP on upscaling models. The present study helps clarify the effects of MAUP on downscaling methodologies, highlighting how a priori choices of scales and shapes could influence the results. We aggregated a fine-resolution chicken and duck census in Thailand, using three administrative census levels in regular and irregular shapes. We then disaggregated the data within the Gridded Livestock of the World analytical framework, sampling predictors in two different ways. A sensitivity analysis on Pearson’s r correlation statistics and RMSE was carried out to understand how the size and shape of the response variables affect the goodness-of-fit and downscaling performances. We showed that scale, rather than shapes and sampling methods, affected downscaling precision, suggesting that training the model using the finest administrative level available is preferable. Moreover, datasets showing spatial clustering rather than a homogeneous distribution seemed less affected by MAUP, yielding higher Pearson’s r values and lower RMSE compared to a more spatially homogeneous dataset. Implementing aggregation sensitivity analysis in spatial studies could help to interpret complex results and disseminate robust products.

Introduction

Spatial data are becoming increasingly accessible to the scientific community. However, many data are provided in an aggregated form at different administrative levels, mainly for operational and privacy reasons [1, 2]. Administrative levels are usually determined and modifiable, meaning that they can be subdivided to form units of different sizes and shapes [3, 4]. Because administrative units may not adequately reflect the spatial organization of human or natural phenomena, researchers develop methods for data disaggregation with the help of broadly available remote sensing data. Often, little attention is paid to the issue of the modifiable units and its effect on spatial representations [5]. This specific issue has been discussed in the spatial analysis literature since the 1930s (e.g. [6]), but gained attention with the milestone work of Openshaw and Taylor [7, 8] that led to the introduction of the concept of the Modifiable Areal Unit Problem (MAUP). The MAUP encompasses two related but distinct components: the scale issue and the zonation issue [3, 4, 7–10]. The scale problem reflects how the description of a phenomenon is potentially affected by changing the size of the sampling units, while the zonation issue relates to how changing the shape of sampling units could influence the representation of the phenomenon [7]. These effects occur because patterns and processes operate in the real world according to various scales and designs that are often unknown to the researcher [9]. A descriptive example illustrates some immediate effects. Fig 1a shows how the aggregation of individual-level data at different scales causes a reduction of the variability, and thus a narrowing of the distribution. In Fig 1b, individual-level data are aggregated at the same scale but using different, arbitrary, areal unit shapes. The results are highly variable [3, 8, 10].

Fig 1. The modifiable areal unit problem.

Example showing the two effects of MAUP (adapted from [3]).
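The two effects can be reproduced on a toy example. The sketch below is only an illustration under stated assumptions: the eight values are invented (they are not data from the study), zones are simply lists of indices standing in for areal units, and `aggregate` is a hypothetical helper. It averages the same individual-level values with zonings of increasing size (scale effect) and with two different zonings of the same size (zonation effect), then compares the variance of the resulting unit values.

```python
from statistics import pvariance


def aggregate(values, zones):
    """Return the mean of `values` within each zone (a zone is a list of indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical individual-level values (invented for illustration only).
individual = [1, 2, 3, 4, 10, 12, 14, 18]

# Scale effect: the same data aggregated into progressively larger units.
fine = [[i] for i in range(8)]
pairs = [[0, 1], [2, 3], [4, 5], [6, 7]]
quads = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Zonation effect: same number and size of units as `pairs`, different grouping.
pairs_alt = [[0, 7], [1, 6], [2, 5], [3, 4]]

var_fine = pvariance(aggregate(individual, fine))            # 35.25
var_pairs = pvariance(aggregate(individual, pairs))          # 33.875
var_quads = pvariance(aggregate(individual, quads))          # 30.25
var_pairs_alt = pvariance(aggregate(individual, pairs_alt))  # 0.875

print(var_fine, var_pairs, var_quads, var_pairs_alt)
```

In this toy case the variance shrinks as units grow, and two zonings at the same scale give very different variances, which is the behaviour described for Fig 1a and Fig 1b respectively.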

MAUP is closely related to the ecological inference fallacy, a misinterpretation of statistical inferences drawn at the group level but interpreted at the individual level [11]. With spatial data becoming a staple in a diversity of fields, the effects of MAUP have been widely explored, from ecology to remote sensing and from physical geography to economics [3, 10, 12–18]. Although the impact of MAUP is often ignored [5], when it is addressed researchers mostly assess its effect on upscaling, or aggregating [3, 16, 18], and mostly its effect on model estimates rather than on downscaling, or disaggregating, precision (but see [19]).

The availability of spatial data and data processing capacity fostered an interest in the spatial heterogeneity of diverse processes and encouraged researchers to find ways to disaggregate data. Downscaling techniques are used to disaggregate variables recorded or distributed at an aggregated scale, such as census data, and provide predictions at a finer level of spatial detail. Such fine scale data are of crucial interest in diverse fields and applications in agricultural socio-economics, food security, environmental impact assessment and epidemiology [20]. Concerning livestock, analyzing the emergence of zoonotic diseases requires detailed spatially explicit data on both hosts and their pathogens, e.g. for highly pathogenic avian influenza (HPAI; [21]).

The Gridded Livestock of the World (GLW, [22]) and WorldPop [23] disaggregate population data using statistical techniques and environmental predictors. Outputs of both projects attain good accuracy scores [20, 24], but as they result from a downscaling process, both are potentially subject to the MAUP. Although the GLW methodology has become robust and its application frequent (e.g. [20, 25–28]), its vulnerability to MAUP has not yet been directly investigated. Previous studies (e.g. [25]) showed a certain degree of sensitivity to the scale issue. However, the severity of the problem has not been assessed, and a sensitivity analysis using various scale and shape configurations would help quantify potential sources of uncertainty.

In this study, we analyzed the impact of both MAUP effects on the disaggregation of census-like livestock data. The objectives were: (i) to assess, on two different spatially-constrained real datasets, how the MAUP affects both goodness-of-fit metrics and downscaled results, (ii) to increase awareness about the MAUP issues in the context of data disaggregation. A fine resolution census dataset of poultry in Thailand was aggregated at scales corresponding to administrative levels, using sampling units with variable shapes and areas, and subsequently disaggregated to a common resolution over a 500 m grid.

Materials and methods

Poultry population data

In 2010, the Department of Livestock Development of the Thai government conducted a national census of poultry in each sub-district and village, counting poultry head per owner. Each farm was associated by a unique administrative code number to its village, for which geographic coordinates were recorded. The census distinguished between broiler chickens, layer chickens, native chickens, farm ducks and free-grazing ducks. Here, we combined all data to species level, ending with two categories: chickens and ducks. The spatial constraints and determinants of the production systems of ducks and chickens differ (intensive and backyard; [29–31]). While chickens can be raised anywhere, in Thailand, ducks are largely raised in wetlands used for double-crop rice production, where free-grazing ducks feed year round in rice paddies [30, 31].

Village records with incorrect coordinates (coordinates outside of the Thai territory or with 0 in latitude or longitude fields) were removed. In the case of duplicate coordinates or duplicate village unique ID, only one record for each duplicate was randomly selected. The provinces of Bangkok, Nakhon Sawan, Pattani and Phetchaburi were excluded due to lack of data. Once filtered, the village dataset was joined to the census dataset using the villages’ administrative code number.

The poultry census individual level data were aggregated using a simple additive aggregation method according to Thai administrative units: districts, sub-districts and villages. As a comprehensive file of village boundaries is not available, Voronoi polygons were computed from the village coordinates.
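The additive aggregation step can be sketched in a few lines. This is a minimal illustration, not the authors' R code: the record layout, the field names (`village`, `sub_district`, `district`, `heads`) and all counts are hypothetical, and the Voronoi construction of village polygons is not covered here.

```python
from collections import defaultdict


def aggregate_heads(records, level):
    """Sum poultry heads over all records sharing the same unit at `level`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[level]] += record["heads"]
    return dict(totals)


# Hypothetical farm-level records; codes and counts are invented for illustration.
farms = [
    {"village": "V1", "sub_district": "S1", "district": "D1", "heads": 120},
    {"village": "V1", "sub_district": "S1", "district": "D1", "heads": 30},
    {"village": "V2", "sub_district": "S1", "district": "D1", "heads": 50},
    {"village": "V3", "sub_district": "S2", "district": "D1", "heads": 200},
]

by_village = aggregate_heads(farms, "village")
by_sub_district = aggregate_heads(farms, "sub_district")
by_district = aggregate_heads(farms, "district")

print(by_village)       # {'V1': 150, 'V2': 50, 'V3': 200}
print(by_sub_district)  # {'S1': 200, 'S2': 200}
print(by_district)      # {'D1': 400}
```

Because the aggregation is purely additive, the total number of heads is the same at every level; only its spatial allocation changes.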

Modelling

We used the methodology of the Gridded Livestock of the World (GLW) project. The GLW disaggregates livestock statistics and provides spatially detailed estimates of livestock density in the form of raster spatial data [22]. The most recent version (GLW3; [26]) relies on stratified random forest models and a set of environmental predictors. The GLW methodology is fully described in [25] and [20]. Two user-controlled parameters drive the performance of random forest models: the number of trees created and the number of variables randomly selected when creating a splitting point. [32] have shown that 500 trees are a good rule of thumb, while the minimum number of variables that are randomly selected was calculated using the square root of the total number of variables [33].
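As a small worked example of the two settings, the sketch below derives the number of variables tried at each split from the square-root rule. The function name `default_mtry`, the dictionary keys, the count of 30 predictors and the choice to round down are all assumptions made for illustration; the text only states that the square root of the total number of variables was used.

```python
import math


def default_mtry(n_predictors):
    """Variables tried per split: square root of the predictor count, rounded down (assumed)."""
    return max(1, math.isqrt(n_predictors))


# Hypothetical configuration: 500 trees as in the text, 30 predictors as an invented example.
rf_settings = {"n_trees": 500, "mtry": default_mtry(30)}

print(rf_settings)  # {'n_trees': 500, 'mtry': 5}
```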

The set of predictors was chosen among those shown to be relevant environmental and socio-economic drivers of poultry distribution [20, 30, 31, 34]. It included Fourier-transformed MODIS variables (two vegetation indices, the day and night land surface temperature and the band 3 middle-infra-red), eco-climatic variables (length of the growing season and annual precipitation), topographic variables (elevation and slope), land cover classes and anthropogenic variables (human population density and travel time to major cities and ports). Unpopulated areas, natural areas and water bodies were masked out, so that only areas suitable for poultry production were considered and used to compute corrected poultry densities. Poultry densities corrected by area were transformed to logarithm (base 10) and used as the response variable. The full list of spatial domain layers and predictors is detailed in Table 1 along with sources.
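The response variable described above can be written as a short function. This is a sketch under assumptions: the function and parameter names are hypothetical, and the `offset` argument is not described in the text. It is included only because a density of zero has no logarithm; with `offset=0` the function is the plain base-10 logarithm of the area-corrected density.

```python
import math


def log10_density(heads, suitable_area_km2, offset=0.0):
    """Base-10 log of heads per km2 of suitable land.

    `offset` is an assumption added for illustration (to handle zero counts);
    it is not part of the method as described in the text.
    """
    if suitable_area_km2 <= 0:
        raise ValueError("suitable area must be positive")
    return math.log10(heads / suitable_area_km2 + offset)


# Hypothetical unit: 1000 birds on 10 km2 of suitable land, i.e. 100 birds per km2.
print(log10_density(1000, 10))           # 2.0
# With an offset of 1, a unit without birds maps to 0 instead of raising an error.
print(log10_density(0, 10, offset=1.0))  # 0.0
```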

Table 1. List of input spatial datasets used to model chicken and duck densities.

| Type | Variables | Use | Source |
|---|---|---|---|
| Land | Land and water area | Spatial domain and spatial predictor | [35, 36] |
| Land use | IUCN world database of protected areas | Mask | [37] |
| Anthropogenic | WorldPop human population density | Spatial predictor and suitability mask | [23] |
| | Travel time to the capital, province capitals and main harbors | Spatial predictor | [38, 39] |
| Topography | Elevation (GTOPO30) | Spatial predictor | [40] |
| | Slope (GTOPO30) | Spatial predictor | [40] |
| Vegetation | 10 Fourier-derived variables from Normalized Difference Vegetation Index (MODIS)* | Spatial predictor | [41] |
| | Length of growing period | Spatial predictor | [42] |
| | Green-up and senescence (annual cycle 1 and 2) | Spatial predictor | [43] |
| | Forest cover | Spatial predictor | [44] |
| | Cropland, irrigated cropland and rainfed cropland cover | Spatial predictor | [45] |
| Climatic | 10 Fourier-derived variables from Day/Night Land Surface Temperature (MODIS) | Spatial predictor | [41] |
| | Precipitation | Spatial predictor | [46] |

*Annual mean, annual minimum, annual maximum, amplitude and phase of annual cycle, amplitude and phase of bi-annual cycle, amplitude and phase of tri-annual cycle, variance in annual, bi-annual, and tri-annual cycles.

All input raster layers (e.g. masks and predictor variables) and outputs (predicted densities) were processed on the whole of Thailand with a spatial resolution of 500 m.

Experimental design

The effect of scale was explored by aggregating the individual-level data to village, sub-district and district level. The effects of zoning were analyzed using two different sets of polygon sampling units (PSUs) for each administrative level: (i) irregular (IRR) shapes, the original administrative units, and (ii) regular (REG) shapes, a grid whose cell size equals the average spatial resolution (ASR) of the corresponding IRR PSUs. The ASR measures the effective resolution of administrative units in kilometers. It is calculated as the square root of the total land area of the administrative units considered divided by the number of administrative units, i.e. the square root of the mean unit area [47, 48]. District, sub-district and village ASR is respectively 557.04, 69.55 and 8.30 km. REG PSUs were computed only at sub-district and district level. The density of birds per km² of suitable land was estimated in each PSU and transformed to its Log10 [25].
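The ASR can be illustrated with a one-line computation. The sketch assumes the usual reading of the definition, namely the square root of the mean unit area; the function name and the example figures are hypothetical and are not the Thai values.

```python
import math


def average_spatial_resolution(total_area_km2, n_units):
    """ASR in km: square root of the mean area of the administrative units."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return math.sqrt(total_area_km2 / n_units)


# Hypothetical region of 10 000 km2 divided into 100 administrative units.
asr_km = average_spatial_resolution(10_000, 100)
print(asr_km)  # 10.0
```

Under this reading, a regular grid with 10 km cells would have the same average unit area as the 100 irregular units, which is how the REG PSUs are matched to the IRR PSUs.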

Two methods were applied to extract or sample the predictors by polygon, in order to understand their effect on the downscaled prediction. One method randomly sampled a point in each PSU and extracted the matching pixel value for each predictor. The other averaged the predictors within the PSU.
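The two extraction methods can be sketched on a toy raster. Everything here is an assumption made for illustration: the raster is a nested list of invented pixel values, a PSU is represented by the list of `(row, col)` pixels it covers, and the function names are hypothetical.

```python
import random
from statistics import mean


def sample_random(raster, psu_pixels, rng):
    """Value of the predictor at one pixel drawn at random inside the PSU."""
    row, col = rng.choice(psu_pixels)
    return raster[row][col]


def sample_average(raster, psu_pixels):
    """Mean value of the predictor over all pixels inside the PSU."""
    return mean(raster[row][col] for row, col in psu_pixels)


# Hypothetical 3 x 3 predictor raster and a PSU covering its top-left 2 x 2 block.
raster = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
psu = [(0, 0), (0, 1), (1, 0), (1, 1)]

rng = random.Random(42)  # fixed seed so the random draw is repeatable
random_value = sample_random(raster, psu, rng)
average_value = sample_average(raster, psu)

print(average_value)  # 3
```

The random draw returns one of the pixel values inside the PSU (1, 2, 4 or 5 here), so it keeps pixel-level variability, whereas the average smooths it out.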

Model evaluation

The polygons used as response variable were split into training and validation sets: 70% of polygons were used to train the model, while the remaining 30% were used as the evaluation dataset. PSUs were sampled into training and evaluation datasets 20 times to assess the internal variability of the predictions. Once the models were fitted, average and standard deviation maps were computed from the 20 outputs.
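The repeated 70/30 partition can be sketched as follows. This is an illustration only: the function name, the integer PSU identifiers and the seed are hypothetical, and the model fitting that would follow each split is omitted.

```python
import random


def repeated_splits(unit_ids, train_fraction=0.7, n_repeats=20, seed=0):
    """Return `n_repeats` random (training, evaluation) partitions of the PSU ids."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(unit_ids))
    splits = []
    for _ in range(n_repeats):
        shuffled = list(unit_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# Hypothetical set of 100 PSU identifiers.
splits = repeated_splits(range(100))
train, evaluation = splits[0]

print(len(splits), len(train), len(evaluation))  # 20 70 30
```

Each of the 20 partitions would be used to fit one model, and the 20 predicted maps are then summarised pixel by pixel with their mean and standard deviation.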

Model evaluation was carried out using two approaches. Firstly, to assess how well the model predicted poultry densities, the root mean square error (RMSE) and Pearson’s r correlation coefficient (COR) were computed between the observed values of the evaluation set of PSUs and the predicted densities aggregated at polygon level of the corresponding validation PSUs. RMSE measures model accuracy, i.e. how far the predicted values were, on average, from the observed values. COR measures precision, i.e. the extent to which the observed and predicted values are proportional to each other. Lower RMSE and higher COR indicate better fits between predicted and observed values. RMSE and COR were estimated for the overall models. Moreover, to measure the internal precision associated with the area, RMSE and COR were also estimated by PSU area, grouping PSUs according to the frequencies of their area (Supporting information, S1 Fig): 0-10 km², 10-20 km² and >20 km² for villages, 0-100 km², 100-200 km² and >200 km² for sub-districts, 0-500 km², 500-1000 km² and >1000 km² for districts.
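The two statistics can be written out directly from their textbook definitions. The sketch below is not the authors' code; the function names and the four observed and predicted values are hypothetical. The example is chosen so that the predictions are perfectly correlated with the observations but offset by one unit, which separates what COR and RMSE each measure.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log10 densities for four evaluation PSUs.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]  # same pattern, shifted upwards by one unit

print(rmse(observed, predicted))       # 1.0
print(pearson_r(observed, predicted))  # 1.0
```

Here the correlation is perfect while the RMSE is one log10 unit, so a high COR alone does not imply accurate absolute values.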

Secondly, Pearson’s r was computed between predictions and the observed data at the village level only to assess the capacity of models trained using various PSUs to predict poultry population at a fine scale, i.e. their “downscaling precision” (CORdown). This is crucial to understand the effects of MAUP on the downscaled predictions considering the finest administrative levels available as reference. Three different bounding boxes (hereafter bbox) were selected in different areas of Thailand to visually investigate the differences between the predictions and the observations. A graphical summary of the methodology is shown in Fig 2. The model is fully operational under R 3.4 [49] and the codes used, as well as the aggregated census data, are available at https://gitlab.com/danidr/glw/tree/master/glw_maup.

Fig 2. Flowchart of the analysis.

Results

Data cleaning and filtering

The 62 142 village records originally available were reduced to 57 794 (Table 2). Once the filtered village database was joined to the poultry census, the final georeferenced census dataset used to train the models comprised 53 301 records (Table 2). Fig 3 shows the observed densities for the two species aggregated at sub-district and district administrative level. Chickens were homogeneously distributed. Ducks were mainly clustered in the central and southeast part of the country.

Table 2. Data filtering results.

For duplicate coordinates or duplicate village unique IDs, only one record for each duplicated row was randomly selected and added to the final database.

| | Unfiltered | Duplicated ID | Duplicated coordinates | Filtered |
|---|---|---|---|---|
| Villages | 62142 | 6579 | 33 | 57794 |
| Census | 3170213 | - | - | 53301 |

Fig 3. Observed poultry densities in logarithm (base 10) aggregated at districts and sub-districts level.

In grey the provinces of Bangkok, Nakhon Sawan, Pattani and Phetchaburi, excluded from the analysis due to lack of data.

Model output maps

The model predictions within bbox 1 are shown in Fig 4, while bbox 2 and 3 are displayed in the S4 and S7 Figs. Chickens were widely distributed though high density clusters are observable in the North-East and South-West parts of bbox 1. Ducks were present mostly in the central part. The model was able to reproduce the observed spatial pattern of both species, regardless of the sampling method.

Fig 4. Observed and predicted Log10 poultry values inside bbox1.

a) chickens, b) ducks.

The mean predicted values were comparable to the observed ones, but the distributions of predicted values were clustered around the mean and appeared less variable than the observed ones. For both species, the aggregation of input data produced higher mean values at coarser scale, together with a narrowing effect of the value distribution and a smoothing effect on the frequencies (S2 and S3 Figs).

IRR and REG shaped administrative units showed slightly different predicted spatial patterns. In both cases, the distribution of the predicted values was consistent with the observed values. However, REG shapes seemed to predict a slightly smoother spatial pattern, detecting more variability across space than IRR shapes, which predicted more values clustered around and above the mean value.

Model evaluation

The RMSE bar plots for ducks and chickens are shown in Fig 5. For both species, the overall accuracy increased (lower RMSE values) as the administrative level of the input data became coarser. However, this trend was more consistent for ducks than for chickens. Model runs on REG shaped PSUs generally showed less variability, but they had lower accuracy than IRR PSUs for chickens and comparable or slightly lower accuracy for ducks. Randomly sampling the predictors within the PSUs yielded slightly lower RMSEs than averaging them.

Fig 5. Root mean square error (RMSE).

RMSE computed between predicted densities and observed chickens densities a) averaged sampling b) random sampling; RMSE computed between predicted densities and observed ducks densities c) averaged sampling d) random sampling.

COR bar plots based on stratified random sampling of the predictors and averaged predictors are shown in Fig 6. For both species, the COR value increased as the administrative level of the input data became coarser. REG PSUs produced higher correlations than the corresponding IRR PSUs and the overall models, showing also less variability among the bootstraps. The choice of the sampling methods did not affect the results strongly, but random sampling showed apparently higher variability between individual bootstraps.

Fig 6. Pearson’s r.

Pearson’s r coefficient computed between predicted densities and observed chickens densities a) averaged sampling b) random sampling; Pearson’s r coefficient computed between predicted densities and observed ducks densities c) averaged sampling d) random sampling.

Downscaling precision

CORdown, the Pearson’s r coefficient between the predicted and observed densities at village level, is shown in Fig 7. Models of duck distribution had higher correlations than the chicken models. Contrary to the internal precision of the model, smaller PSUs had higher Pearson’s r values than larger ones. The shape of the PSUs produced comparable results in terms of Pearson’s r values. Random sampling produced higher Pearson’s r values compared to average sampling, which generally had a lower variability among model runs. A table summarising the evaluation of model runs is found in S1 Table.

Fig 7. Downscaling precision.

Pearson’s r coefficient between predicted densities and observed densities at village level: a) chickens; b) ducks. Random sampling (rp), averaged sampling (av).

Discussion

Overall MAUP bias

Our model predicted poultry density patterns and value distributions similar to the observed densities, confirming the validity of the methodology [20]. As expected, chickens were dispersed at high densities across the whole country, while ducks were constrained to wetlands used for double-crop rice production [21, 30, 31].

The scale of the training data affected the goodness-of-fit of the output maps. On average, duck models showed higher downscaling precision and higher accuracy and precision compared to chickens. Swift, Liu and Uber [50] and Swift et al. [14] reported that a spatially clustered phenomenon aggregated using various sizes and shapes of areal units is less affected by MAUP compared to a randomly distributed phenomenon. Because of that, when the clustered structure of the observed point pattern is preserved, the MAUP bias is considerably reduced. Moreover, Swift et al. [14] also showed that aggregating the independent variable using an areal unit shape related to its spatial structure reduces the effect of MAUP, but their conclusions rely on simulated data only. To aggregate empirical data, choosing a priori areal unit shapes that preserve the spatial structure and reduce the MAUP may be challenging, and in the context of data disaggregation, may be impossible. However, in the context of data disaggregation, the MAUP bias may be smaller if the spatial units are able to capture the spatial variability of the phenomenon at hand. Recently, Tuson et al. [51] proposed a theoretical and statistical framework to address the MAUP by trying to detect a minimal geographical unit of analysis. Though promising, in our case the minimal geographical unit of analysis is determined by the minimal administrative level available, making the results dependent on the units used.

MAUP scale effect

Qualitatively, fine resolution polygon training data produced predictions with a more detailed spatial pattern compared to coarser resolution training data. As far as the effect of scale on the internal precision of the model is concerned, better model precision and accuracy were reached by models trained with coarser resolution input data, contrary to what Van Boeckel et al. [21] found. These apparently contradictory results can be explained considering that Van Boeckel et al. [21] used different modelling approaches and that their goodness-of-fit metrics were computed under a different rationale. In particular, whilst our goodness-of-fit metrics were computed between validation PSUs and predicted pixel values aggregated at the respective PSU areas, Van Boeckel et al. [21] computed goodness-of-fit metrics between validation and predicted values at point level. Though the RMSE and COR trends are not in accordance with this previous study on Thai poultry, our results are consistent with their findings in terms of RMSE and COR ranges. More importantly, our results reflect the general trend described by Gehlke and Biehl [6], where correlation coefficients tend to increase as the number of areal units representing the data decreases, as a consequence of the data smoothing associated with the aggregation process.

MAUP zone effect

Comparing COR and RMSE results at the same scale, REG PSUs produced slightly higher mean values and less variability between model runs than IRR ones (S1 Table). In our case Pearson’s r is the statistic most affected by the zone effect, but this effect still appeared marginal in comparison to the effect of scale, as also observed by Swift et al. [14] for simulated data. Recently, García-Llamas et al. [18] investigated the effects of MAUP using landscape heterogeneity as a proxy of species richness. They highlighted how the use of irregularly shaped eco-geographic area units (watersheds) performed better than arbitrary square units, probably because in their case eco-geographic areas better capture the spatial variability of species diversity. Though our REG PSUs based on the ASR of the IRR PSUs showed higher precision scores, our design remains affected by ecological fallacy, as both administrative levels and PSU shapes may be independent of the phenomena investigated and not effectively describe the environmental and social envelope of farm distribution in geographical space [11, 52]. On this point, Fox et al. [52] suggest that combining reasonable assumptions with empirical data and spatial analysis may help to develop functional boundaries around the individual level investigated.

Sampling methods effect

The choice of the sampling method for the predictors did not affect RMSE or either correlation coefficient. The mean values of our evaluation indices were stable, and the variability observable in Figs 5, 6 and 7 is likely to be more related to variability between model runs than to the choice of the sampling method.

Downscaling precision

The downscaling precision statistic was affected mainly by the scale rather than by the zone or sampling methods. The ranges are generally narrow, considering scale, zone and sampling effect. The downscaling precision as expressed by CORdown increases with higher resolution of the training data. In fact, Robinson et al. [25] computed the goodness-of-fit metrics of their downscaling models comparing the predicted values to the observed data at the highest administrative level (a similar approach to what we used here for the CORdown), using a real census livestock dataset as we did. Similarly to our findings, they underlined how the statistical model trained on smaller administrative units got better accuracy and precision in the disaggregation of administrative units. These findings suggest that, if possible, data should be collected at the finest spatial resolution available to train the model.

The question of how to select the spatial scale of the prediction according to the available detail of aggregated data remains. The choice of the spatial scale of analysis influences the understanding of the geographical patterns [53]. When downscaling, it is thus crucial to understand whether the polygons’ area within a given administrative level could influence the disaggregated results. For instance, considering the frequency histogram of district areas (S1 Fig), we do not know how larger polygons affect the downscaling precision. From one perspective, adding larger polygons would include more environmental heterogeneity in the model and would allow the model to discriminate better between suitable and unsuitable areas. However, since smaller polygons perform best in terms of downscaling precision, larger polygons could add noise to the spatial distribution of the response variable. It is unlikely that the geometry of one set of areal units would match any measured phenomenon exactly, as it would for a simulated pattern [14], but new approaches combining geostatistics and Bayesian hierarchical models (e.g. [54–56]) are promising tools to address the MAUP effects.

Conclusion

Within the GLW framework, we assessed the MAUP effects on the downscaled predictions starting from different aggregated response variable scales. We focused on the predictive rather than the explanatory power of the model, unlike numerous studies on MAUP that focused on its effects on parameter estimates or p-values (e.g. [57–60]). The goal of the downscaling methodologies is not only to compare and interpret the pixel-wise absolute value per se, but also to detect and represent well the spatial variation and pattern of the phenomena investigated. Since absolute values and trends are different, CORdown was chosen to look for the scale that best preserves the observed values while still allowing the existing spatial trends to be detected.

GLW is an efficient approach to disaggregate census data to predict the spatial distribution of livestock. Scale, rather than shapes and sampling methods, appears to affect downscaling precision, suggesting that the finest administrative level should be sought to train the model. Moreover, the effects of MAUP appear weaker on a spatially constrained dataset than on a more spatially homogeneous one, as already shown for simulated data.

Carrying out a sensitivity analysis and reporting the various results obtained from different sets of aggregation and zoning systems helped to adequately address the MAUP issue and to understand how much it affected the predictions. Understanding the magnitude of the bias introduced in the data by aggregation is crucial to inform spatial scientists about the often-ignored effect of data aggregation and to provide robust spatial predictions to policy makers. The effect of MAUP on aggregated data is unavoidable and only individual level data can avoid it [14, 61].

As already stated by previous authors (e.g. [14, 50, 62]), sensitivity to aggregation should be analysed in any spatial study in order to correctly interpret complex results and disseminate clear and robust maps.

Supporting information

S1 Fig. Polygon sampling units areas’ histograms.

The histograms of the area of polygon sampling units used to estimate RMSE and COR for different polygon areal sizes. The red bars represent the Average Spatial Resolution (ASR) of the polygons, while the blue lines are the polygon area classes chosen: a) 0-500 km², 500-1000 km² and >1000 km² are the districts area classes used, ASR = 3.11 km, b) 0-100 km², 100-200 km² and >200 km² are the sub-districts area classes used, ASR = 8.33 km, c) 0-10 km², 10-20 km² and >20 km² are the villages area classes used, ASR = 23.60 km.

(TIF)

S2 Fig. Observed and predicted Log10 chicken values histogram inside bbox1.

The blue lines represent the mean value.

(TIF)

S3 Fig. Observed and predicted Log10 duck values histogram inside bbox 1.

The blue lines represent the mean value.

(TIF)

S4 Fig. Observed and predicted Log10 poultry values inside bbox 2.

a) chickens, b) Ducks.

(TIF)

S5 Fig. Observed and predicted Log10 chickens values histogram inside bbox 2.

The blue lines represent the mean value.

(TIF)

S6 Fig. Observed and predicted Log10 ducks values histogram inside bbox 2.

The blue lines represent the mean value.

(TIF)

S7 Fig. Observed and predicted Log10 poultry values inside bbox 3.

a) chickens, b) Ducks.

(TIF)

S8 Fig. Observed and predicted Log10 chickens values histogram inside bbox 3.

The blue lines represent the mean value.

(TIF)

S9 Fig. Observed and predicted Log10 ducks values histogram inside bbox 3.

The blue lines represent the mean value.

(TIF)

S1 Table. Summary table of models’ goodness of fit and downscaling precision.

(CSV)

Acknowledgments

We thank the staff of Thailand’s Department of Livestock Development (DLD), composed of the District Livestock Offices, Provincial Livestock Offices, and Center for Information Technology for animal census data; Thailand’s Ministry of Transportation for geodata; and the Department of Provincial Administration, Ministry of Interior, for population data.

Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Equipements de Calcul Intensif en Fédération Wallonie Bruxelles (CECI) funded by the Fond de la Recherche Scientifique de Belgique (FRS-FNRS).

DDR is F.R.S-FNRS Research Fellow, Belgium. DDR was supported by the FRFS-WISD Walloon Institute for Sustainable Development PDR “Mapping livestock’s transition” (PDR-WISD X302317F).

Data Availability

The R codes used in the study and the aggregated census data at different administrative levels are available in the gitlab folder https://gitlab.com/danidr/glw/tree/master/glw_maup.

Funding Statement

D.D.R. is supported by the FRFS-WISD Walloon Institute for Sustainable Development PDR “Mapping livestock’s transition” (PDR-WISD X302317F). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Goodchild MF, Proctor JD. Scale in a digital geographic world. Geographical and Environmental Modelling. 1997;1(1):5–23.
  • 2. Sleeter R, Gould MD. Geographic information system software to remodel population data using dasymetric mapping methods; 2007.
  • 3. Jelinski DE, Wu J. The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology. 1996;11(3):129–140. doi: 10.1007/BF02447512
  • 4. Marceau DJ. The Scale Issue in the Social and Natural Sciences. Canadian Journal of Remote Sensing. 1999;25(4):347–356. doi: 10.1080/07038992.1999.10874734
  • 5. Manley D. Scale, aggregation, and the modifiable areal unit problem. Handbook of Regional Science. 2014; p. 1157–1171. doi: 10.1007/978-3-642-23430-9_69
  • 6. Gehlke CE, Biehl K. Certain Effects of Grouping upon the Size of the Correlation Coefficient in Census Tract Material. Journal of the American Statistical Association. 1934;29(185A):169–170. doi: 10.2307/2277827
  • 7. Openshaw S. A million or so correlation coefficients, three experiments on the modifiable areal unit problem. Statistical applications in the spatial science. 1979; p. 127–144.
  • 8. Openshaw S. Ecological Fallacies and the Analysis of Areal Census Data. Environment and Planning A: Economy and Space. 1984;16(1):17–31. doi: 10.1068/a160017
  • 9. Manley D, Flowerdew R, Steel D. Scales, levels and processes: Studying spatial patterns of British census variables. Computers, Environment and Urban Systems. 2006;30(2):143–160. doi: 10.1016/j.compenvurbsys.2005.08.005
  • 10. Dark SJ, Bram D. The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography: Earth and Environment. 2007;31(5):471–479. doi: 10.1177/0309133307083294
  • 11. Robinson W. Ecological Correlations and the Behavior of Individuals. American Sociological Review. 1950;15(3). doi: 10.2307/2087176
  • 12. Briant A, Combes PP, Lafourcade M. Dots to boxes: Do the size and shape of spatial units jeopardize economic geography estimations? Journal of Urban Economics. 2010;67(3):287–302. doi: 10.1016/j.jue.2009.09.014
  • 13. Amici V, Rocchini D, Filibeck G, Bacaro G, Santi E, Geri F, et al. Landscape structure effects on forest plant diversity at local scale: Exploring the role of spatial extent. Ecological Complexity. 2015;21:44–52. doi: 10.1016/j.ecocom.2014.12.004
  • 14. Swift A, Liu L, Uber J. MAUP sensitivity analysis of ecological bias in health studies. GeoJournal. 2014;79(2):137–153. doi: 10.1007/s10708-013-9504-z
  • 15. Bacaro G, Rocchini D, Diekmann M, Gasparini P, Gioria M, Maccherini S, et al. Shape matters in sampling plant diversity: Evidence from the field. Ecological Complexity. 2015;24:37–45. doi: 10.1016/j.ecocom.2015.09.003
  • 16. Nouri H, Anderson S, Sutton P, Beecham S, Nagler P, Jarchow CJ, et al. NDVI, scale invariance and the modifiable areal unit problem: An assessment of vegetation in the Adelaide Parklands. Science of the Total Environment. 2017;584–585:11–18. doi: 10.1016/j.scitotenv.2017.01.130
  • 17. Salas-Olmedo MH, Moya-Gómez B, García-Palomares JC, Gutiérrez J. Tourists’ digital footprint in cities: Comparing Big Data sources. Tourism Management. 2018;66:13–25. doi: 10.1016/j.tourman.2017.11.001
  • 18. García-Llamas P, Calvo L, De la Cruz M, Suárez-Seoane S. Landscape heterogeneity as a surrogate of biodiversity in mountain systems: What is the most appropriate spatial analytical unit? Ecological Indicators. 2018;85:285–294. doi: 10.1016/j.ecolind.2017.10.026
  • 19. Stevens FR, Gaughan AE, Linard C, Tatem AJ. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS ONE. 2015;10(2):e0107042. doi: 10.1371/journal.pone.0107042
  • 20. Nicolas G, Robinson TP, Wint GRW, Conchedda G, Cinardi G, Gilbert M. Using Random Forest to improve the downscaling of global livestock census data. PLoS ONE. 2016;11(3):1–16. doi: 10.1371/journal.pone.0150424
  • 21. Van Boeckel TP, Prosser D, Franceschini G, Biradar C, Wint W, Robinson T, et al. Modelling the distribution of domestic ducks in Monsoon Asia. Agriculture, Ecosystems and Environment. 2011;141(3–4):373–380. doi: 10.1016/j.agee.2011.04.013
  • 22. Wint G, Robinson T. Gridded livestock of the world. Food and Agriculture Organization of the United Nations, Rome; 2007.
  • 23. Tatem AJ. WorldPop, open data for spatial demography. Scientific Data. 2017;4:170004. doi: 10.1038/sdata.2017.4
  • 24. Utazi C, Thorley J, Alegana V, Ferrari M, Nilsen K, Takahashi S, et al. A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping. Statistical Methods in Medical Research. 2018; p. 096228021879736. doi: 10.1177/0962280218797362
  • 25. Robinson TP, Wint GRW, Conchedda G, Van Boeckel TP, Ercoli V, Palamara E, et al. Mapping the global distribution of livestock. PLoS ONE. 2014;9(5):e96084. doi: 10.1371/journal.pone.0096084
  • 26. Gilbert M, Nicolas G, Cinardi G, Van Boeckel TP, Vanwambeke SO, Wint GRW, et al. Global distribution data for cattle, buffaloes, horses, sheep, goats, pigs, chickens and ducks in 2010. Scientific Data. 2018;5(1):1–11. doi: 10.1038/sdata.2018.227
  • 27. Vigiak O, Grizzetti B, Udias-Moinelo A, Zanni M, Dorati C, Bouraoui F, et al. Predicting biochemical oxygen demand in European freshwater bodies. Science of the Total Environment. 2019;666:1089–1105. doi: 10.1016/j.scitotenv.2019.02.252
  • 28. Jara M, Escobar LE, Rodriges RO, Frias-De-Diego A, Sanhueza J, Machado G. Spatial distribution and spread potential of sixteen Leptospira serovars in a subtropical region of Brazil. Transboundary and Emerging Diseases. 2019.
  • 29. Seré C, Steinfeld H. World livestock production systems: Current status, issues and trends. Food and Agriculture Organization, Rome; 1996.
  • 30. Gilbert M, Chaitaweesub P, Parakamawongsa T, Premashthira S, Tiensin T, Kalpravidh W, et al. Free-grazing ducks and highly pathogenic avian influenza, Thailand. Emerging Infectious Diseases. 2006;12(2):227–234. doi: 10.3201/eid1202.050640
  • 31. Van Boeckel TP, Thanapongtharm W, Robinson T, D’Aietti L, Gilbert M. Predicting the distribution of intensive poultry farming in Thailand. Agriculture, Ecosystems & Environment. 2012;149:144–153. doi: 10.1016/j.agee.2011.12.019
  • 32. Lawrence RL, Wood SD, Sheley RL. Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sensing of Environment. 2006;100(3):356–362. doi: 10.1016/j.rse.2005.10.014
  • 33. Gislason PO, Benediktsson JA, Sveinsson JR. Random Forests for land cover classification. Pattern Recognition Letters. 2006;27(4):294–300. doi: 10.1016/j.patrec.2005.08.011
  • 34. Prosser DJ, Wu J, Ellis EC, Gale F, Van Boeckel TP, Wint W, et al. Modelling the distribution of chickens, ducks, and geese in China. Agriculture, Ecosystems and Environment. 2011;141(3–4):381–389. doi: 10.1016/j.agee.2011.04.002
  • 35. OpenStreetMap Contributors. OpenStreetMap; 2014.
  • 36. Center for International Earth Science Information Network (CIESIN), Columbia University. Gridded population of the world, version 4 (GPWV4): population density; 2016.
  • 37. IUCN, UNEP-WCMC. The World Database on Protected Areas (WDPA); 2010. Available from: www.protectedplanet.net
  • 38. Nelson A. Travel time to major cities: A global map of accessibility. Ispra: European Commission; 2008.
  • 39. Weiss DJ, Nelson A, Gibson HS, Temperley W, Peedell S, Lieber A, et al. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature. 2018;553(7688):333–336. doi: 10.1038/nature25181
  • 40. Land Process Distributed Active Archive Center (LDAAC). Global 30 Arc-Second Elevation Data Set GTOPO30; 2004.
  • 41. Scharlemann JPW, Benz D, Hay SI, Purse BV, Tatem AJ, Wint GRW, et al. Global Data for Ecology and Epidemiology: A Novel Algorithm for Temporal Fourier Processing MODIS Data. PLoS ONE. 2008;3(1):e1408. doi: 10.1371/journal.pone.0001408
  • 42. Jones P, Thornton P. Croppers to livestock keepers: livelihood transitions to 2050 in Africa due to climate change. Environmental Science & Policy. 2009.
  • 43. Zhang X, Friedl MA, Schaaf CB, Strahler AH, Hodges JCF, Gao F, et al. Monitoring vegetation phenology using MODIS. Remote Sensing of Environment. 2003;84(3):471–475. doi: 10.1016/S0034-4257(02)00135-9
  • 44. Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova S, Tyukavina A, et al. High-resolution global maps of 21st-century forest cover change. Science. 2013;342(6160):850–853. doi: 10.1126/science.1244693
  • 45. Arino O, Ramos Perez JJ, Kalogirou V, Bontemps S, Defourny P, Van Bogaert E. Global land cover map for 2009 (GlobCover 2009). ESA & UCL; 2012.
  • 46. Fick SE, Hijmans RJ. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology. 2017;37(12):4302–4315. doi: 10.1002/joc.5086
  • 47. Balk D, Yetman G. The global distribution of population: evaluating the gains in resolution refinement. New York: Center for International Earth Science Information Network (CIESIN), Columbia University; 2004.
  • 48. Linard C, Gilbert M, Tatem AJ. Assessing the use of global land cover data for guiding large area population distribution modelling. GeoJournal. 2011;76(5):525–538. doi: 10.1007/s10708-010-9364-8
  • 49. R Development Core Team. R: A Language and Environment for Statistical Computing; 2011. Available from: http://www.r-project.org.
  • 50. Swift A, Liu L, Uber J. Reducing MAUP bias of correlation statistics between water quality and GI illness. Computers, Environment and Urban Systems. 2008;32(2):134–148. doi: 10.1016/j.compenvurbsys.2008.01.002
  • 51. Tuson M, Yap M, Kok MR, Murray K, Turlach B, Whyatt D. Incorporating geography into a new generalized theoretical and statistical framework addressing the modifiable areal unit problem. International Journal of Health Geographics. 2019;18(1):1–15. doi: 10.1186/s12942-019-0170-3
  • 52. Fox J, Rindfuss RR, Walsh SJ, Mishra V. People and the environment: Approaches for linking household and community surveys to remote sensing and GIS. vol. 1. Springer Science & Business Media; 2003.
  • 53. Cebrecos A, Domínguez-Berjón MF, Duque I, Franco M, Escobar F. Geographic and statistic stability of deprivation aggregated measures at different spatial units in health research. Applied Geography. 2018;95:9–18. doi: 10.1016/j.apgeog.2018.04.001
  • 54. Rohde D, Corcoran J, Chhetri P. Spatial forecasting of residential urban fires: A Bayesian approach. Computers, Environment and Urban Systems. 2010;34(1):58–69. doi: 10.1016/j.compenvurbsys.2009.09.001
  • 55. Xu P, Huang H, Dong N, Abdel-Aty M. Sensitivity analysis in the context of regional safety modeling: Identifying and assessing the modifiable areal unit problem. Accident Analysis & Prevention. 2014;70:110–120. doi: 10.1016/j.aap.2014.02.012
  • 56. Truong PN, Stein A. A hierarchically adaptable spatial regression model to link aggregated health data and environmental data. Spatial Statistics. 2018;23:36–51. doi: 10.1016/j.spasta.2017.11.002
  • 57. Tagashira N, Okabe A. The Modifiable Areal Unit Problem, in a Regression Model Whose Independent Variable Is a Distance from a Predetermined Point. Geographical Analysis. 2002;34(1):1–20. doi: 10.1353/geo.2002.0006
  • 58. Parenteau MP, Sawada MC. The modifiable areal unit problem (MAUP) in the relationship between exposure to NO2 and respiratory health. International Journal of Health Geographics. 2011;10(1):58. doi: 10.1186/1476-072X-10-58
  • 59. Mitra R, Buliung RN. Built environment correlates of active school transportation: neighborhood and the modifiable areal unit problem. Journal of Transport Geography. 2012;20(1):51–61. doi: 10.1016/j.jtrangeo.2011.07.009
  • 60. Lee G, Cho D, Kim K. The modifiable areal unit problem in hedonic house-price models. Urban Geography. 2016;37(2):223–245. doi: 10.1080/02723638.2015.1057397
  • 61. Goodman AC, et al. A comparison of block group and census tract data in a hedonic housing price model. Land Economics. 1977;53(4):483–487. doi: 10.2307/3145991
  • 62. Wakefield J. Sensitivity Analyses for Ecological Regression. Biometrics. 2003;59(1):9–17. doi: 10.1111/1541-0420.00002

Decision Letter 0

Sotirios Koukoulas

18 Oct 2019

PONE-D-19-20969

Downscaling livestock census data using multivariate predictive models: sensitivity to modifiable areal unit problem

PLOS ONE

Dear Mr. Da Re,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Dec 02 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Sotirios Koukoulas, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that Figures 1-3 in your submission contain map/satellite images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

 We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

  1. You may seek permission from the original copyright holder of Figures 1-3 to publish the content specifically under the CC BY 4.0 license. 

 We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

 Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

 In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

  2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

 USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

3. Please amend the manuscript submission data (via Edit Submission) to include author Sophie O. Vanwambeke.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper addresses the MAUP, an important issue in spatial analysis that is related to changing spatial scale. Although the paper is well written, I recommend that the authors clarify several elements, as follows:

1. The statement in lines 29-31 does not seem to give a complete review of the research on MAUP. There are many published studies on MAUP issues in downscaling or spatial disaggregation.

2. Explain what the individual-level data at lines 77-78 are. How many records are there? Do they cover the whole country? From which year? A detailed description of the individual-level data is required.

3. Line 76: which aggregation method was used, linear or non-linear aggregation?

4. Lines 90-99: explain, with references, why those predictors were used.

5. Line 97: explain why a log transform was applied to poultry density. What correction was used, as mentioned in line 97 (poultry densities corrected by area)?

6. Provide a map that shows the three administrative units in Thailand: village, sub-district and district. How large are they relative to each other?

7. Also provide maps visualizing the IRR and REG PSUs. Are the IRR PSUs the same as the three administrative units above?

8. What are the census polygons in line 118?

9. Fig 2 has poor quality; I cannot read all of it.

10. Why are there two separate sections for model evaluation and downscaling precision? Are they not the same? What is the difference between COR in line 179 and CORdown in line 187?

11. Overall, I find the findings of this paper obvious. Should facts that we all know be published again?

Reviewer #2: My review has turned out to be more of a proofreading session than anything else. I have uploaded an annotated PDF with quite a few very minor linguistic changes highlighted.

The science seems clean and well presented, and pretty much acceptable as it is. I have only one minor technical question, namely about the use of Voronoi polygons around villages: doesn't this mean that any density calculated for the polygon is affected by the distance between villages? I don't think this affects the validity of the study, but might it affect the values of the means calculated and thus the comparison between different scales? Not a major problem though, so I am happy to recommend acceptance with very minor revisions.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: daREgilbertetalPONE-D-19-20969_reviewed.pdf

PLoS One. 2020 Jan 27;15(1):e0221070. doi: 10.1371/journal.pone.0221070.r002

Author response to Decision Letter 0


10 Dec 2019

Reviewers’ Comments and Authors Response

Reviewer 1

Reviewer #1: This paper addresses the MAUP, an important issue in spatial analysis that is related to changing spatial scale. Although the paper is well written, I recommend that the authors clarify several elements, as follows:

1. The statement in lines 29-31 does not seem to give a complete review of the research on MAUP. There are many published studies on MAUP issues in downscaling or spatial disaggregation.

A (Authors’ response): We have edited the sentence to highlight that, as far as we know, MAUP has mostly been addressed for aggregation and for changes in model estimates, rather than for its effect on downscaling precision.

2. Explain what the individual-level data at lines 77-78 are. How many records are there? Do they cover the whole country? From which year? A detailed description of the individual-level data is required.

A: We specified at L78 that the individual-level data are the poultry census described in the “Poultry population data” section.

3. Line 76: which aggregation method was used, linear or non-linear aggregation?

A: We used a simple additive aggregation method; this is now specified at L78 of the revised manuscript.
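
As an illustration only, additive aggregation can be sketched as follows. The snippet is not taken from the study's R code; it is written in Python, and the field names (`count`, `area_km2`, `subdistrict`), the function `aggregate_additive` and the toy numbers are hypothetical. It sums village-level head counts and areas within each coarser unit and then recomputes density from the summed totals.

```python
from collections import defaultdict


def aggregate_additive(units, parent_key):
    """Sum counts and areas of fine units that share the same parent unit.

    `units` is a list of dicts with hypothetical keys 'count' and 'area_km2'
    plus the key named by `parent_key` (all names are illustrative).
    """
    totals = defaultdict(lambda: {"count": 0.0, "area_km2": 0.0})
    for unit in units:
        parent = unit[parent_key]
        totals[parent]["count"] += unit["count"]
        totals[parent]["area_km2"] += unit["area_km2"]
    # Density is recomputed from the summed totals, not averaged across units.
    return {
        parent: {**t, "density": t["count"] / t["area_km2"]}
        for parent, t in totals.items()
    }


# Toy village-level records (invented values, for illustration only).
villages = [
    {"id": "v1", "subdistrict": "A", "count": 120, "area_km2": 2.0},
    {"id": "v2", "subdistrict": "A", "count": 30, "area_km2": 4.0},
    {"id": "v3", "subdistrict": "B", "count": 500, "area_km2": 5.0},
]

subdistricts = aggregate_additive(villages, "subdistrict")
```

With these toy values, sub-district A holds 150 birds over 6 km², a density of 25 birds per km², whereas averaging the two village densities (60 and 7.5) would give 33.75. Summing counts before dividing by area avoids that discrepancy.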

4. Lines 90-99: explain, with references, why those predictors were used.

A: We have specified it at L93-94.

5. Line 97: explain why a log transform was applied to poultry density. What correction was used, as mentioned in line 97 (poultry densities corrected by area)?

A: For the sake of brevity, and given the general use of these settings in the GLW framework, we refer the interested reader to the paper by Gilbert et al. 2018, where this is fully explained.

6. Provide a map that shows the three administrative units in Thailand: village, sub-district and district. How large are they relative to each other?

A: We have not provided a map showing the three administrative units because, at country scale, the differences among them could not be appreciated. However, we have specified the Average Spatial Resolution (ASR) of the three administrative units at L114-115.

7. Also provide maps visualizing the IRR and REG PSUs. Are the IRR PSUs the same as the three administrative units above?

A: The IRR and REG PSUs can be observed in Fig 3. The IRR PSUs do correspond to the three administrative units described above.

8. What are the census polygons in line 118?

A: The polygons used to train the model. We now call them polygons.

9. Fig 2 has poor quality; I cannot read all of it.

A: We have improved the resolution of Fig 2.

10. Why are there two separate sections for model evaluation and downscaling precision? Are they not the same? What is the difference between COR in line 179 and CORdown in line 187?

A: As specified at L42, for CORdown “(...) Pearson’s r was computed between predictions and the observed data at the village level only to assess the capacity of models trained using various PSUs to predict poultry population at a fine scale”.

11. Overall, I find the findings of this paper obvious. Should facts that we all know be published again?

A: Though the findings may seem obvious, we believe that our sensitivity analysis is needed to support the increasing use of GLW and other downscaled data products, so that users are aware of the potential issues affecting this product and the other downscaled data products that become available every year.

Reviewer 2

12. My review has turned out to be more of a proofreading session than anything else. I have uploaded an annotated PDF with quite a few very minor linguistic changes highlighted.

A: We thank Reviewer 2 for his/her positive comment. All the comments in the PDF have been addressed.

13. The science seems clean and well presented, and pretty much acceptable as it is. I have only one minor technical question, namely about the use of Voronoi polygons around villages: doesn't this mean that any density calculated for the polygon is affected by the distance between villages? I don't think this affects the validity of the study, but might it affect the values of the means calculated and thus the comparison between different scales?

In addition, comment at L78: Doesn't using Voronoi polygons mean that the animals are spread throughout the polygon, i.e. that the training density depends on the size of the polygons, or the distance between villages? Might it have been better to use a fixed buffer or grid instead, so that poultry density wasn't affected by village density?

A: We computed Voronoi polygons because village administrative unit boundaries for Thailand are not available. This was a simple methodological choice to obtain small-scale polygons and to be able to train the GLW model using polygons at different scales.

14. Not a major problem though, so I am happy to recommend acceptance with very minor revisions

A: We thank Reviewer 2 for his/her positive assessment.

15. Comment on L218: ??coarse predicitions despite the fact that the predictors were the same resolution (500m) for all models

A: We are referring to the scale of the training polygons dataset and not to the scale and spatial resolution of the predictors. We have specified in the revised version that we are referring to the scale of the training polygons.

16. Comment on L218: Accordingly:In agreement with ??BUT SEE your line 220 re MAUP scale effect- coarser training data gave better accuracy. Is this not a contradiction?

or have i missed the point??

A: The downscaling precision (CORdown) measures the “(...) Pearson’s r (…) between predictions and the observed data at the village level only to assess the capacity of models trained using various PSUs to predict poultry population at a fine scale”. It is a measure of the performance of the GLW model in downscaling a polygon, which is better when the training polygons are small.

In contrast, if we want to assess the precision of the model when predicting the whole distribution of the response variable, the model provides better estimates when trained with coarser polygons, because the MAUP scale effect narrows the data distribution (please see e.g. Supplementary Materials 2.1).
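The narrowing caused by the MAUP scale effect can be sketched numerically. The example below is an illustration under simplifying assumptions, not the study's data or code: the fine-scale values are simulated, and "aggregation" is a plain average over consecutive groups of values, which ignores polygon geometry and spatial structure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical fine-scale (village-like) log10 densities
fine = [random.gauss(2.0, 1.0) for _ in range(1000)]


def aggregate(values, block_size):
    """Average consecutive groups of `block_size` values.

    A stand-in for merging fine units into coarser polygons.
    """
    return [
        statistics.mean(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


medium = aggregate(fine, 10)   # sub-district-like units (assumed size)
coarse = aggregate(fine, 100)  # district-like units (assumed size)

sd_fine = statistics.stdev(fine)
sd_medium = statistics.stdev(medium)
sd_coarse = statistics.stdev(coarse)

print(f"standard deviation: fine = {sd_fine:.3f}, medium = {sd_medium:.3f}, coarse = {sd_coarse:.3f}")
```

The mean is unchanged by the averaging while the spread shrinks at each coarser level. A model trained on the coarse values therefore has a narrower range to reproduce, which is consistent with the better whole-distribution fit and the poorer village-level downscaling described in the reply.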

Attachment

Submitted filename: PONE-D-19-20969_RebuttalLetter_Reply2reviewers.pdf

Decision Letter 1

Sotirios Koukoulas

19 Dec 2019

Downscaling livestock census data using multivariate predictive models: sensitivity to modifiable areal unit problem

PONE-D-19-20969R1

Dear Dr. Da Re,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Sotirios Koukoulas, Ph.D

Academic Editor

PLOS ONE

Acceptance letter

Sotirios Koukoulas

23 Dec 2019

PONE-D-19-20969R1

Downscaling livestock census data using multivariate predictive models: sensitivity to modifiable areal unit problem

Dear Dr. Da Re:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sotirios Koukoulas

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Polygon sampling units areas’ histograms.

    The histograms of the area of polygon sampling units used to estimate RMSE and COR for different polygon areal sizes. The red bars represent the Average Spatial Resolution (ASR) of the polygons, while the blue lines are the polygon area classes chosen: a) 0-500 km², 500-1000 km² and >1000 km² are the districts area classes used, ASR = 3.11 km, b) 0-100 km², 100-200 km² and >200 km² are the sub-districts area classes used, ASR = 8.33 km, c) 0-10 km², 10-20 km² and >20 km² are the villages area classes used, ASR = 23.60 km.

    (TIF)

    S2 Fig. Observed and predicted Log10 chicken values histogram inside bbox 1.

    The blue lines represent the mean value.

    (TIF)

    S3 Fig. Observed and predicted Log10 duck values histogram inside bbox 1.

    The blue lines represent the mean value.

    (TIF)

    S4 Fig. Observed and predicted Log10 poultry values inside bbox 2.

    a) Chickens, b) ducks.

    (TIF)

    S5 Fig. Observed and predicted Log10 chicken values histogram inside bbox 2.

    The blue lines represent the mean value.

    (TIF)

    S6 Fig. Observed and predicted Log10 duck values histogram inside bbox 2.

    The blue lines represent the mean value.

    (TIF)

    S7 Fig. Observed and predicted Log10 poultry values inside bbox 3.

    a) Chickens, b) ducks.

    (TIF)

    S8 Fig. Observed and predicted Log10 chicken values histogram inside bbox 3.

    The blue lines represent the mean value.

    (TIF)

    S9 Fig. Observed and predicted Log10 duck values histogram inside bbox 3.

    The blue lines represent the mean value.

    (TIF)

    S1 Table. Summary table of models’ goodness of fit and downscaling precision.

    (CSV)

    Attachment

    Submitted filename: daREgilbertetalPONE-D-19-20969_reviewed.pdf

    Attachment

    Submitted filename: PONE-D-19-20969_RebuttalLetter_Reply2reviewers.pdf

    Data Availability Statement

    The R code used in the study and the aggregated census data at different administrative levels are available in the GitLab folder https://gitlab.com/danidr/glw/tree/master/glw_maup.


    Articles from PLoS ONE are provided here courtesy of PLOS
