PLOS One. 2020 Nov 9;15(11):e0241981. doi: 10.1371/journal.pone.0241981

Better coverage, better outcomes? Mapping mobile network data to official statistics using satellite imagery and radio propagation modelling

Till Koebe 1,*
Editor: Jacinto Estima2
PMCID: PMC7652289  PMID: 33166359

Abstract

Mobile sensing data has become a popular data source for geo-spatial analysis; however, mapping it accurately to other sources of information such as statistical data remains a challenge. Popular mapping approaches such as point allocation or voronoi tessellation provide only crude approximations of the mobile network coverage as they do not consider holes, overlaps and within-cell heterogeneity. More elaborate mapping schemes often require additional proprietary data that operators are highly reluctant to share. In this paper, I use human settlement information extracted from publicly available satellite imagery in combination with stochastic radio propagation modelling techniques to account for these shortcomings. I show in a simulation study and a real-world application on unemployment estimates in Senegal that better coverage approximations do not necessarily lead to better outcome predictions.

Introduction

Mobile phone metadata has become a popular data source to complement official statistics. When an individual makes a call, sends a message or uses the mobile internet, meta information about this interaction, such as the time stamp and the location, is stored in a database of the mobile network operator (MNO). Researchers exploit those spatio-temporal references for geo-located analysis. One strand of research in this field investigates whether a certain characteristic such as poverty, literacy or food insecurity is reflected in mobile phone behaviour. Matching this behaviour accurately to a ‘groundtruth’ (often statistical data from surveys or censuses provided for statistical areas), however, poses a major challenge as the two data sources lack a common reference. In the case of call detail records (CDRs), the geographic reference is provided by the antenna location, often stored as a point coordinate of the physical location of the corresponding base transceiver station (BTS). Because of this simplicity, some studies treat antennas as point coordinates [1]. However, the interactions captured by the antenna do not happen entirely at this exact coordinate, but within the coverage area of the antenna, the cell. While an antenna may be located in one statistical area, most of the cell may lie within the neighboring area. The state-of-the-art attempt to address this is to use spatial weights based on the overlapping area size of statistical areas and cells approximated via voronoi tessellation [2, 3]. This approach has three major drawbacks. First, voronoi tessellation perfectly divides the space around BTS locations depending on the distance to the surrounding BTS. This represents a naïve approximation of the true coverage areas as it does not take overlaps, areas without coverage and additional network complexities (multiple antennas per site/BTS, directionality of antennas, varying frequency bands etc.) into account [4].
For example, roughly 90 million people in Africa in 2019 were still not connected to any mobile network, hinting at major holes in the coverage [5]. Second, even though the concept of ‘home-locating’ subscribers to specific BTS offers a network-based alternative to the statistical concept of ‘usual place of residence’, it is not reflected within cells. As the weights are based on area sizes, the voronoi tessellation implicitly assumes that individuals/households are homogeneously distributed within cells, which in most cases does not hold true. For example, a lake would receive the same importance in the creation of area-level mobile phone metadata aggregates as an equally sized built-up area. Third, as mobile stations (MS, generally defined as a combination of device and SIM card) and antennas communicate via modulated radio signals whose propagation paths depend on a range of factors such as the weather, coverage areas are stochastic by nature. More elaborate approaches to model coverage ranges of mobile networks exist [4, 6], especially in the field of radio propagation modelling native to electrical engineering. However, they often require detailed information on the area’s topology, a number of technical details concerning the network infrastructure and additional information from passive monitoring systems, which mobile network operators are generally highly reluctant to share and, in the latter case, often not able to collect.

Contributions

Acknowledging this, I divide my methodological contribution in this paper into two parts: First, I propose the use of settlement information extracted from publicly available satellite imagery to account for within-cell heterogeneity within the mobile network when linking statistical data with mobile phone metadata. Building on this, the second part of the methodology takes advantage of scenarios where additional technical specifications are available in order to address the issues of holes, overlaps and non-linearities within the mobile network using propagation-based modelling. My main contributions are as follows:

  1. The idea of using settlements retrieved from publicly available satellite imagery as a common reference for statistical units such as households and ‘home-located’ MS in order to calculate weights for mapping mobile phone metadata and statistical data based on settlement counts in scenarios where MS counts are not available. This way, within-cell heterogeneity is addressed.

  2. A propagation-based approach to account for overlaps, holes and non-linearities in coverage service provision, in case additional information on the network infrastructure is available.

  3. A large-scale simulation study on a synthetic population grid to systematically compare the accuracy of different mapping approaches and their effects on predictive performance.

  4. A real-world application that demonstrates the impact of the mapping choice on outcomes in later analysis.

Datasets

In the application, I revisit the simulation study of Schmid et al. [1] published in 2017 in the Journal of the Royal Statistical Society Series A on fine-granular unemployment estimates from mobile phone metadata in Senegal in order to investigate the effects of different mapping schemes on the unemployment outcomes. To this end, I re-run the original simulation with the difference that I implement multiple mapping schemes to derive area-level covariates from CDRs. Specifically, I use behavioural indicators and SIM card counts extracted from CDRs provided by the major Senegalese MNO Sonatel in the context of the D4D 2014 challenge for the whole year of 2013 and aggregated on the level of BTS, for which the exact geo-coordinates are also provided [7]. The behavioural indicators are generated using the popular open-source Python module Bandicoot [8]. Further, I use population counts from the full 2013 general population and housing census (RGPHAE 2013) available for the NUTS 4-level of Senegal (the communes) on the website of ANSD, the National Statistical Office of Senegal. Commune-level unemployment information is generated from a 10% sample of RGPHAE 2013. Unemployment information in RGPHAE 2013 is self-reported.

Geographic information on the administrative boundaries is available for communes and above. The settlement-based weights I present in this paper use data on human settlement areas in Senegal extracted from the Global Urban Footprint (GUF) project [9] of the German Aerospace Center (DLR) at a resolution of 0.4 arc seconds, which is approximately 12m x 12m. The GUF project used 180,000 TerraSAR-X and TanDEM-X images collected during the period 2011 to 2012 (with some data from 2013/14 to fill gaps) to create black and white abstractions where white pixels represent human settlements with a true positive rate (accuracy to correctly detect human settlements) of 85% on average, with 68% at lowest and 98% at highest. GUF data for Senegal is provided as a single black and white .tif file with a resolution of 55568 x 39459 pixels (see Fig 1). All datasets used in this study are available for research purposes under the conditions of the respective data use agreements.

Fig 1. Settlements in Senegal provided as b/w image by the GUF project.

Lower resolution built-settlements extents data reprinted from [10] under a CC BY license, with permission from WorldPop, original copyright 2018, are used in this figure for illustrative purposes.

Related work

Increasing processing capabilities have propelled the use of satellite imagery in official statistics. The UN [11] recommends using satellite imagery to prioritize and check geospatial processes such as the delineation of enumeration areas during census preparation. It further supports the construction of population grids as a common spatial reference system as proposed by [12, 13]. Various studies have used remote sensing, sometimes in combination with mobile phone metadata, to estimate key statistical indicators such as economic growth [14–16], population density [17–20] or poverty [2, 21, 22]. Work in that field most closely related to this study uses settlement information extracted from satellite imagery in combination with radio propagation models for application in cost-benefit analysis concerning additional infrastructure investments [23]. While [23] also uses population counts from official statistics to estimate the latent demand for mobile services, the author neither investigates the effects of different coverage mapping techniques on the results nor does he use mobile phone metadata for statistical purposes.

In addition, the last decade has seen an impressive amount of research proposing the use of mobile phone metadata for official statistics, foremost in the hope to overcome the limiting relationship of sample size and data collection costs. [24] provides an excellent overview on the use of mobile phone metadata that also covers its application for statistical purposes. Use cases to produce more frequent, more granular and/or more timely data on a wide range of statistical topics have been identified. For example, [4, 25–28] use mobile phone metadata to investigate population dynamics for more frequent population and tourism statistics. [29, 30] apply the question on the whereabouts of a population to the post disaster setting. Mobility aspects such as commuting and travelling routines have been looked at in more detail by [31–36]. By exploiting both mobility and (social) network characteristics of mobile phone metadata, [37–42] and [43, 44] use mobile phone metadata to model disease spreading and integration, respectively. Mobile usage patterns have been explored to provide fine granular insights on socio-demographic indicators such as multi-dimensional poverty [2, 3], literacy [1, 45] and economic vulnerability [46, 47]. While most of these studies have mapped mobile phone metadata and groundtruth data using point-to-polygon allocation or voronoi tessellation, very few studies have applied more elaborate approximation schemes. [4] propose a methodology based on maximum likelihood estimation that uses cell footprints provided by one or multiple MNOs in combination with location data from passive monitoring systems to acquire more accurate measures on the density of MS. The authors run a simulation study on a 100x100m synthetic population grid to compare the proposed methodology against voronoi-based coverage maps. However, the methodology requires very detailed information from the involved MNOs, e.g. 
on the cell footprints and the signalling data that may prove difficult to acquire in practice (see Section Mobile phone metadata). Further, while the authors rightly assume a multinomial distribution of the MS counts, finding appropriate distributions for the wide range of behavioural covariates appears less trivial. In order to simplify and improve the coverage mapping process, members of the European Statistical System as part of the ESSnet Big Data project are currently developing mobloc [48]—an R package that implements the free space path loss propagation model using technical specifications of antennas as input parameters. However, neither [4] nor [48] systematically evaluate different coverage mapping techniques on statistical modelling approaches using real-world data.

Background

Mobile phone metadata

Mobile networks not only transport data for communication purposes, they also generate data for reasons such as network auditing, billing, maintenance and service provision. Some of this meta information is created in interaction with user equipment such as MS. There are four main caveats of using mobile phone metadata for population statistics in general, all of which are active areas of current research. First, the customer base of an MNO constitutes a non-representative population sample with unknown sampling design. The consequences are varying sampling rates, i.e. locally changing market shares, and parts of the population being structurally excluded from the sample, such as children, the elderly and the very poor. Second, the unit of observation (i.e. the MS, the device, the SIM card and/or the subscriber) does not perfectly match the unit of interest, which is the individual or household, as phone sharing schemes or multi-SIM use illustrate. Common approaches to account for these two caveats are calibration and/or reconstructing the sampling design empirically. Third, mobile phone metadata lacks the statistical concept of usual residence, a concept frequently used in official statistics to determine the geo-location of an individual/household, defined as the place where an individual has lived or intends to live for a period of at least 6 or 12 months [49]. Different approaches to approximate the home location of an MS exist (e.g. night-time home location defined as the most frequently used cell by an MS between 7pm and 7am during a certain time window); however, the definitions do not map perfectly, which introduces uncertainty in further analysis [50]. Fourth, coverage areas cannot be pinpointed as radio propagation is dynamic and stochastic by nature. Propagation models of various complexity exist to provide approximations, as coverage ranges can generally vary from a couple of hundred meters to over 40km.
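The night-time home location rule mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout `(ms_id, cell_id, timestamp)` and the function name are hypothetical and do not correspond to any particular CDR schema or to the code used in this study.

```python
from collections import Counter, defaultdict
from datetime import datetime

def night_time_home_cells(records, start_hour=19, end_hour=7):
    """Return {ms_id: cell_id}, the most frequently used night-time cell per MS.

    `records` is an iterable of (ms_id, cell_id, timestamp) tuples; this layout
    is a hypothetical stand-in for a real CDR schema.
    """
    counts = defaultdict(Counter)
    for ms_id, cell_id, ts in records:
        # The night window wraps around midnight: 19:00-23:59 and 00:00-06:59.
        if ts.hour >= start_hour or ts.hour < end_hour:
            counts[ms_id][cell_id] += 1
    # most_common(1) resolves ties by first occurrence in the records.
    return {ms: c.most_common(1)[0][0] for ms, c in counts.items()}

records = [
    ("ms1", "cellA", datetime(2013, 1, 1, 20, 15)),
    ("ms1", "cellA", datetime(2013, 1, 2, 2, 40)),
    ("ms1", "cellB", datetime(2013, 1, 2, 12, 0)),   # daytime, ignored
    ("ms1", "cellB", datetime(2013, 1, 2, 23, 5)),
    ("ms2", "cellB", datetime(2013, 1, 1, 6, 59)),
    ("ms2", "cellA", datetime(2013, 1, 1, 7, 0)),    # 7am lies outside the window
]
home = night_time_home_cells(records)
```

An MS without any night-time activity receives no home cell in this sketch, which mirrors the non-random sampling issue of CDRs discussed in this section.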

Most scientific studies in the context of international development and official statistics use CDRs (logs of interactions such as calls, text messages or internet use containing attributes of the MS, the network and the connection) as a basis for further analysis. The advantages of CDRs compared to other mobile phone metadata such as Visitor Location Registers (VLRs) or other signalling data are threefold: First, they provide fine-grained geographical resolution through cell-level identifiers. Second, they provide information both on the mobility and the (social) network of the MS. Third, CDRs are fairly easy to access and to use in analysis as the storage of essential attributes adheres to global standards such as 3GPP 32.295. However, in addition to the aforementioned general caveats of mobile phone metadata, there are important caveats specific to CDRs: Social network information extracted from CDRs is increasingly incomplete due to a shift towards app-based communication (e.g. WhatsApp and Facebook Messenger). Mobility patterns are fragmented as locations are logged only during active MS use, again a case of non-random sampling. Some MNOs are able to extract more detailed information on the location of an MS and its app usage, e.g. for geo-fencing purposes or app-based pricing schemes, through trilateration of signalling data and deep packet inspection, respectively. This, however, requires specific hardware equipment and software capabilities, which not every MNO has. Consequently, this type of information is rarely available to researchers.

Radio propagation modelling

Radio propagation modelling has been subject to research for decades. Coverage mappings in mobile networks are generally used for network planning purposes [23, 51]. Phillips et al. [6] provide an excellent overview of coverage mapping methods. In general, radio propagation modelling techniques in mobile networks largely focus on estimating the path loss $L_p$ a radio signal incurs en route between a transmitter $tx$ and a receiver $rx$. Together with the output power of the transmitter $P_{tx}$, the gains through directivity and efficiency of the involved antennas $G_{tx}$ and $G_{rx}$ and their respective technically-incurred losses $L_{tx}$ and $L_{rx}$, it defines the link budget, i.e. the received power $P_{rx}$, usually expressed logarithmically in decibels relative to one milliwatt (dBm).

$$P_{rx} = P_{tx} + G_{tx} + G_{rx} - L_{tx} - L_{rx} - L_p \tag{1}$$

Since all RHS parameters except $L_p$ are either known in advance due to the choice of the technical equipment (i.e. $G_{tx}$ and $L_{tx}$) or hardly observable (i.e. $G_{rx}$ and $L_{rx}$), I assume $G_{tx} + G_{rx} - L_{tx} - L_{rx} = 0$ in the following, leading to a simplified link budget defined as:

$$P_{rx} = P_{tx} - L_p \tag{2}$$

Intuitively, Eq 2 thus states that the signal strength observed on an MS solely depends on the output power of the connected antenna and the loss in signal strength that occurs along the way between antenna and MS. Given the abundance of available models, I follow the guidance of the European Conference of Postal and Telecommunications Administrations (CEPT) on radio propagation simulation for mobile services and opt for the widely popular extended HATA model [52], named after Masaharu Hata, the author of the 1980 landmark study on the “Empirical Formula for Propagation Loss in Land Mobile Radio Services” [53]. It is derived from the COST-231 HATA model [54], which in turn builds on the original HATA [53] and Okumura model [55]. They all have in common that they are empirical models to estimate the median path loss between a transmitter and a receiver based on real-world measurements. The HATA model extends the Okumura model by distinguishing between urban, suburban and rural settings, thus accounting for different levels of mean attenuation due to obstacles and changes in terrain. The COST-231 HATA model increases the frequency range of the original HATA model. The extended HATA model is applicable for settings with frequencies $f$ between 30 and 3000 MHz, distances $d$ between 0 and 100km, transmitter heights $h_{tx}$ between 30 and 200m and receiver heights $h_{rx}$ between 1 and 10m. The general form of the extended HATA model $L_p^{EH}$ consists of a loss function $L$ for the median path loss and a path loss variation term $V$ drawn from a log-normal distribution that accounts for the stochastic nature of radio propagation. Since model parameters vary depending on the distance, the expected environment $env$ (indoor/outdoor and rural/suburban/urban) and the frequency, the full extended HATA model is not spelled out in this paper, but can be accessed here: https://ecocfl.cept.org/display/SH/A17.3.1+Outdoor-outdoor+propagation.

$$L_p^{EH}(f, d, h_{tx}, h_{rx}, env) = L(f, d, h_{tx}, h_{rx}, env) + V(\mu, \sigma, d) \tag{3}$$

As an example, I provide the path loss function of the extended HATA model LpEH for distances above 0.1km outdoor in rural areas for frequencies between 150 and 1500 MHz:

$$L_p^{EH} = 69.6 + 46.09 \log_{10} f - 13.82 \log_{10} h_{tx} + (44.9 - 6.55 \log_{10} h_{tx}) \log_{10} d - (1.1 \log_{10} f - 0.7)\, h_{rx} - 20 \log_{10}(h_{rx}/10) - 20 \log_{10}(h_{tx}/30) - 4.78 (\log_{10} f)^2 - 40.14 + V(12, 12) \tag{4}$$

So, for example, an MS 1m above the ground at a line-of-sight distance of 3km in a rural area to an omnidirectional antenna that is 30m above the ground transmitting at the 900 MHz frequency band would experience a median path loss of $L_p^{EH} \approx 118$ dB. Assuming a GSM macro-cell with an output power $P_{tx} = 43$ dBm, Eq 2 yields a budget for that link, also known as received signal strength (RSS), of $P_{rx} = P_{tx} - L_p^{EH} \approx -75$ dBm. As a rule of thumb, signals with RSS values above −80 dBm are considered excellent, while RSS values below −110 dBm point to very poor signals.
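The worked example can be retraced with a short Python sketch of the median part of Eq 4 combined with Eq 2. Two points are assumptions rather than part of the text above: the variation term $V$ is left out, and the two height corrections are clamped as $\max\{0, \cdot\}$ for the receiver and $\min\{0, \cdot\}$ for the transmitter, based on the CEPT description of the model. Without the clamp, the receiver-height term would add 20 dB at $h_{rx}$ = 1m and the figures of the example would not be reproduced. Function names are illustrative.

```python
import math

def extended_hata_rural_median_loss(f_mhz, d_km, h_tx_m, h_rx_m):
    """Median path loss in dB following Eq 4, without the variation term V.

    Intended range: outdoor, rural, d > 0.1 km, 150 < f <= 1500 MHz.
    """
    log_f = math.log10(f_mhz)
    log_htx = math.log10(h_tx_m)
    # Assumption: height corrections are clamped so that they vanish for
    # receivers at or below 10 m and transmitters at or above 30 m.
    rx_height_corr = max(0.0, 20 * math.log10(h_rx_m / 10))
    tx_height_corr = min(0.0, 20 * math.log10(h_tx_m / 30))
    return (69.6 + 46.09 * log_f - 13.82 * log_htx
            + (44.9 - 6.55 * log_htx) * math.log10(d_km)
            - (1.1 * log_f - 0.7) * h_rx_m
            - rx_height_corr - tx_height_corr
            - 4.78 * log_f ** 2 - 40.14)

def received_power_dbm(p_tx_dbm, path_loss_db):
    """Simplified link budget of Eq 2."""
    return p_tx_dbm - path_loss_db

# 900 MHz, 3 km, antenna at 30 m, MS at 1 m, 43 dBm output power.
loss = extended_hata_rural_median_loss(900, 3, 30, 1)
rss = received_power_dbm(43, loss)
```

Under these assumptions the loss rounds to 118 dB and the received signal strength to −75 dBm.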

Methodology

Usually, statistical data on individuals or households are geo-located to statistical areas via their respective places of residence. Further, unit-level data is aggregated to area-level aggregates using some form of weighting factor such as survey weights. For example, the poverty rate of a region can either be calculated as the share of units classified as poor among the interviewed residents of the region multiplied by their sampling weight or via sub-regional poverty rates weighted with the respective sub-regional population counts. However, neither the places of residences nor the weights are generally available on the cell-level of a mobile network (as an equivalent to the sub-region). Hence, they need to be estimated.

In mobile phone metadata analysis, the place of residence of an individual/household is usually approximated with the night-time home location of an MS recorded at the cell-level.

To derive survey weight proxies, for example, point-to-polygon allocation assumes equal weights for all cells point-located within a statistical area. Voronoi tessellation uses the area size of the intersection of voronoi tile and statistical area as weighting factor, i.e. 1 km2 always conveys the same importance in aggregation, no matter whether it is 1 km2 of sparsely-inhabited desert or 1 km2 of a densely-populated city.

In most cases, the place of residence of an individual/household (and thus its approximation alike) is linked to some form of settlement. However, neither the statistical area nor the coverage area of a cell accounts for that fact. Consequently, the underlying idea behind the proposed methodology is to use human settlement information extracted from publicly available satellite imagery as a common geographic reference level for both statistical units such as households and home-located MS. This makes it possible to a) construct weights based on settlement counts and b) refine weights in cases where MS counts, often regarded as highly sensitive information by the MNO, are available. Further, in combination with technical information on the antenna, it allows for an efficient coverage estimation to address the issues of holes and overlaps in a mobile network.

In the following, settlements are denoted as $i$, BTS as $j$, statistical areas as $t$, the number of home-located MS as $d$, the population count as $p$, the number of settlements as $n$ and metadata covariates as $R$. To illustrate the value added of the proposed methodologies, Fig 2a and Table 1 showcase a typical setup faced when one seeks to augment official statistics with mobile phone metadata: statistical indicators are provided for statistical areas A, B and C. Mobile phone metadata is provided as BTS-level aggregates with the corresponding point locations 1 and 2. To account for that, I treat each cell site that may host multiple antennas as a single omnidirectional antenna, calling it BTS subsequently. This constitutes a simplification of real mobile networks, where usually multiple directional antennas serving on various frequency bands are co-located at the same site, which does not necessarily have to be an actual (cell) tower. Although accounting for directionality of antennas as done by e.g. [4] is likely to affect the overall outcome of later analysis by increasing the number of network tiles available for mapping, the challenges of allocating them correctly (holes, non-linearities, overlaps, within-cell heterogeneity) remain. Consequently, it is expected that results from this study also apply to a setup based on directional antennas, thereby justifying the simplifying assumption. Further details on Fig 2b–2f are provided in the following subsections.

Fig 2. Popular and proposed mapping schemes.

Three statistical areas (A-C), two BTS (1-2) and numerous dots representing built-up areas illustrate how different mapping schemes affect the allocation of BTS-level data to statistical data.

Table 1. Example of statistical data and mobile phone metadata.

Statistical data          Mobile phone metadata
area_id  poverty_rate     bts_id  # of calls  lon      lat
1        0.23             6453    34050       43.2344  23.2342
2        0.11             8348    1023        50.0988  18.84217

Point-to-polygon allocation

For purposes such as model fitting one approach to combine statistical data and mobile phone metadata is to aggregate metadata covariates onto the same geographical level, e.g. statistical areas. To do so, the point-to-polygon approach (p2p) treats BTS point locations as such and allocates BTS-level metadata covariates using a binary weighting scheme (see Fig 2b and Eq 5).

$$w_{j,t}^{p2p} = \begin{cases} 1 & \text{if } j \in t \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Consequently, all network traffic handled by a BTS is attributed to one statistical area exclusively, no matter whether it was generated by a home-located MS actually ‘residing’ in this area or not. In the toy example, but also in the real-world application presented in Section Application, this leads to a situation where no metadata covariates are available for certain areas, e.g. area C, with negative effects on the final sample size in model fitting.
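The binary weighting of Eq 5 can be illustrated with a small pure-Python sketch. The polygon coordinates and BTS positions below are hypothetical and only loosely mimic the toy example; the point-in-polygon test is a standard ray-casting routine, not a reference to any specific GIS library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Count the edges crossed by a horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def p2p_weights(bts_points, areas):
    """Eq 5: weight 1 if BTS j lies in statistical area t, 0 otherwise."""
    return {j: {t: int(point_in_polygon(x, y, poly)) for t, poly in areas.items()}
            for j, (x, y) in bts_points.items()}

# Hypothetical layout: areas A and B side by side, area C on top of both.
areas = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
    "C": [(0, 4), (8, 4), (8, 8), (0, 8)],
}
bts_points = {1: (1.0, 2.0), 2: (6.0, 1.0)}
weights = p2p_weights(bts_points, areas)
# Areas that host no BTS end up without any metadata covariates.
uncovered = [t for t in areas if not any(w[t] for w in weights.values())]
```

In this layout area C hosts no BTS and therefore drops out of any area-level model, which is the sample size effect described above.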

Voronoi tessellation

In contrast, voronoi tessellation (denoted by superscript $v$) divides the total space of interest into perfectly disjoint tiles along the equidistant lines between points, in this case the BTS point locations (see Fig 2c). The current state-of-the-art procedure is to intersect these tiles, which represent approximated coverage areas of BTS, with the statistical areas. The weights to aggregate BTS-level metadata covariates to the respective statistical area are derived from the size of the intersection of tiles $a_j$ and $a_t$ of BTS $j$ and statistical area $t$, respectively, in relation to the total size of $a_t$, also expressed as

$$w_{j,t}^{v} = \frac{a_j \cap a_t}{a_t} \tag{6}$$

In the toy example of Fig 2c, this reduces to, e.g., the intersection of statistical area A and the voronoi tile of BTS 1 divided by the total area of A. However, as mentioned above, area sizes are used in that approach to approximate the (usually) unknown population counts per intersection by implicitly assuming a homogeneous distribution of the population within a given statistical area.
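On a raster, the area-based weights of Eq 6 can be approximated by assigning every pixel to its nearest BTS, which is equivalent to a voronoi tessellation up to pixel resolution. The sketch below is a simplification under that assumption; the 8 x 8 raster, the area layout and the BTS positions are hypothetical.

```python
from collections import Counter

def nearest_bts(x, y, bts_points):
    """Return the id of the BTS closest to (x, y) in Euclidean distance."""
    return min(bts_points,
               key=lambda j: (x - bts_points[j][0]) ** 2 + (y - bts_points[j][1]) ** 2)

def voronoi_weights(area_of_pixel, bts_points):
    """Eq 6 on a raster: share of area t's pixels that fall into the tile of BTS j.

    `area_of_pixel` maps pixel centres (x, y) to a statistical area id.
    """
    overlap, size = Counter(), Counter()
    for (x, y), t in area_of_pixel.items():
        overlap[(nearest_bts(x, y, bts_points), t)] += 1
        size[t] += 1
    return {(j, t): n / size[t] for (j, t), n in overlap.items()}

# Hypothetical layout: areas A and B in the lower half, area C in the upper
# half, and two BTS on the horizontal centre line.
area_of_pixel = {}
for col in range(8):
    for row in range(8):
        area = "C" if row >= 4 else ("A" if col < 4 else "B")
        area_of_pixel[(col + 0.5, row + 0.5)] = area

bts_points = {1: (2.0, 4.0), 2: (6.0, 4.0)}
weights = voronoi_weights(area_of_pixel, bts_points)
```

Here area C is split evenly between both BTS purely on the basis of surface area, regardless of where people actually live within it.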

Augmented voronoi tessellation

The proposed settlement-based mapping schemes relax this obviously strong assumption by assuming a homogeneous housing structure instead, i.e. a constant population density per settlement area within a given statistical area. Applied to voronoi tessellation, Fig 2c and 2d (with settlement areas represented as dots) illustrate the difference. Instead of using the area sizes $a_j$ and $a_t$ to calculate the weights, the “augmented” voronoi tessellation ($av$) uses the number of settlements per area, denoted as $n_j$ and $n_t$, respectively.

$$w_{j,t}^{av} = \frac{n_j \cap n_t}{n_t} \tag{7}$$

Consequently, statistical area-level covariates can easily be acquired for both approaches using a weighted average (or a weighted median) on BTS-level data.

$$\hat{R}_t = \sum_{j=1}^{J} w_{j,t} R_j \tag{8}$$

Going back to the toy example, while BTS 1 covers the smaller part of C in Fig 2c, thus receives a smaller weight in the calculation of area-level metadata aggregates, it looks different in Fig 2d when comparing the number of settlements, represented by green and purple dots. This way, the proposed methodology accounts for within-cell heterogeneity of the population distribution.
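The settlement-based weights of Eq 7 and the weighted average of Eq 8 can be sketched as follows. Settlement coordinates and BTS positions are hypothetical; the call counts are borrowed from Table 1 purely for illustration. Each settlement pixel is assigned to its nearest BTS, which stands in for the voronoi tile it falls into.

```python
from collections import Counter

def augmented_voronoi_weights(settlements, bts_points):
    """Eq 7: share of the settlements in area t that fall into the tile of BTS j.

    `settlements` is a list of (x, y, area_id) tuples.
    """
    in_tile_and_area, in_area = Counter(), Counter()
    for x, y, t in settlements:
        j = min(bts_points,
                key=lambda b: (x - bts_points[b][0]) ** 2 + (y - bts_points[b][1]) ** 2)
        in_tile_and_area[(j, t)] += 1
        in_area[t] += 1
    return {(j, t): n / in_area[t] for (j, t), n in in_tile_and_area.items()}

def area_level_covariate(weights, covariate, t):
    """Eq 8: weighted average of a BTS-level covariate for statistical area t."""
    return sum(w * covariate[j] for (j, area), w in weights.items() if area == t)

bts_points = {1: (2.0, 4.0), 2: (6.0, 4.0)}
# Three of the four settlements in area C sit in the tile of BTS 1.
settlements = [(1.0, 5.0, "C"), (1.5, 6.0, "C"), (3.0, 7.0, "C"), (7.0, 5.0, "C")]
calls = {1: 34050, 2: 1023}

weights = augmented_voronoi_weights(settlements, bts_points)
calls_c = area_level_covariate(weights, calls, "C")
```

Although both tiles may cover the same surface of area C, BTS 1 receives a weight of 0.75 here because it hosts three of the four settlements.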

Both voronoi tessellation and augmented voronoi tessellation split the full space of interest into disjoint tiles. Applied to a mobile network, this means ubiquitous coverage and zero redundancies, i.e. all dots are uniquely associated with a specific BTS in the toy example. Again, this is a strong assumption that most likely does not hold true in any real-world application. To relax this assumption by introducing holes and overlaps in the network coverage, additional information is necessary that allows for the estimation of coverage measures such as the received signal strength (RSS) at any given point in space. Fig 2e exemplifies the consequences: Some settlements are not covered (black dots) and some settlements, even though closer to one BTS, receive a stronger signal from a more distant BTS. Assuming coverages are correctly estimated in Fig 2e and 2f, this demonstrates that point-to-polygon allocation tends to underestimate the coverage of statistical areas while voronoi tessellation tends to overestimate it.

Propagation-based mapping schemes

Previously presented schemes follow a ‘BTS-centric’ approach by first determining the respective coverage area of a BTS and then analyzing potential overlaps with other places of interest such as settlements. In contrast, propagation-based schemes follow an ‘MS-centric’ approach by first looking at the connectivity at the place of interest, i.e. the place of usual residence or the home location, and then estimating which (group of) BTS most likely serves it. As outlined in Section Radio propagation modelling, multiple ways exist to estimate the ‘connectivity’ of an MS, but all require at least information on the distance to the surrounding BTS and additional technical specifications. With that, the serving BTS can be determined at each place of interest, thus allowing for a more nuanced coverage mapping. Here, settlements can provide a common geographic reference for the place of usual residence and the home location alike.

Best server area (BSA)

In mobile networks, an MS usually connects to the antenna that offers the strongest signal. Thus, the settlement-level weight is 1 for the BTS with the strongest signal and 0 otherwise.

$$w_{i,j}^{bsa} = \begin{cases} 1 & \text{if } P_{rx,i,j} = \max(P_{rx,i,\cdot}) \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

Links weaker than a certain threshold (e.g. a $P_{rx}$ value below −110 dBm) can be discarded as they represent ‘dead’ links. This way the approach accounts for holes in the network coverage. The weights $w_{i,j}$ express the importance of a BTS for a pixel. Similarly to Eq 8, they can be used to determine the statistical area-level covariate estimates $\hat{R}_t$ using a weighted average:

$$\hat{R}_t = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_t} w_{i,j} R_j}{\sum_{j=1}^{J} \sum_{i=1}^{n_t} w_{i,j}} \tag{10}$$

Due to the binary nature of the weight, $\sum_{i=1}^{n_t} w_{i,j}$ counts the settlements within a given statistical area that are served by BTS $j$; summed over all BTS, it gives the number of settlements with mobile coverage within that area. In areas with homogeneous network infrastructure and full coverage, the best server approach closely resembles the augmented voronoi tessellation, with the difference that path loss increases non-linearly with the distance, i.e. locations very close to the location of a BTS may be served by another, more distant one.
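A minimal Python sketch of the best server weights (Eq 9) and the resulting area-level average is given below. The signal strengths are made-up values, and the averaging function pools all covered settlements of a statistical area, which is one reading of Eq 10 rather than a confirmed implementation.

```python
def best_server_weights(rss_dbm, threshold_dbm=-110.0):
    """Eq 9: weight 1 for the strongest link per settlement, 0 otherwise.

    `rss_dbm` maps settlement id -> {bts_id: received power in dBm}.
    Settlements whose strongest link is below the threshold get no server.
    """
    weights = {}
    for i, links in rss_dbm.items():
        best = max(links, key=links.get)
        alive = links[best] >= threshold_dbm
        weights[i] = {j: int(alive and j == best) for j in links}
    return weights

def area_level_covariate(weights, covariate, settlements_in_t):
    """Weighted average of a BTS-level covariate over the settlements of area t."""
    num = sum(w * covariate[j] for i in settlements_in_t for j, w in weights[i].items())
    den = sum(w for i in settlements_in_t for w in weights[i].values())
    return num / den if den else None  # None: no covered settlement in the area

rss_dbm = {
    "s1": {1: -75.0, 2: -95.0},
    "s2": {1: -101.0, 2: -88.0},
    "s3": {1: -120.0, 2: -115.0},   # both links are 'dead': a coverage hole
}
weights = best_server_weights(rss_dbm)
estimate = area_level_covariate(weights, {1: 10.0, 2: 20.0}, ["s1", "s2", "s3"])
```

Settlement `s3` contributes nothing to the average, so the estimate rests on the two covered settlements only.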

Inverse signal strength

Radio propagation is stochastic by nature. Changing environmental conditions and varying network loads affect the RSS at a given location across time. Consequently, the strongest signal is not always provided by the same BTS. In order to assure quality of service, mobile networks usually exhibit a certain number of overlaps. To account for that, I calculate inverse distance weights (IDW) for each pixel $i$ using the median link budget $P_{rx,i,j}$ as non-linear distance measure (see Eq 11) to the $k$-nearest antennas. $s$ denotes a tuning parameter, where $s = 0$ reduces $w_{i,j}^{idw}$ to a fixed weight per BTS and a large $s$ can be used to approximate the best server approach.

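$$w_{i,j}^{idw} = \frac{v_{i,j}}{\sum_{j=1}^{k_i} v_{i,j}} \quad \text{with} \quad v_{i,j} = \frac{1}{|P_{rx,i,j}|^{s}} \quad \forall j \in k_i \tag{11}$$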

Here again, $w_{i,j}^{idw}$ can be used to calculate statistical area-level weighted averages of BTS-level mobile phone metadata covariates as presented in Eq 10.
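The behaviour of the tuning parameter $s$ can be checked with a small Python sketch of Eq 11 for a single settlement. Selecting the $k$ strongest links as the "$k$-nearest antennas" is an assumption made here, since the link budget itself serves as the distance measure; the signal strengths are made-up values.

```python
def inverse_signal_weights(links_dbm, k=2, s=1.0):
    """Eq 11 for one settlement.

    `links_dbm` maps bts_id -> median received power in dBm (negative values).
    Assumption: the k strongest links stand in for the k nearest antennas.
    """
    nearest = sorted(links_dbm, key=links_dbm.get, reverse=True)[:k]
    # A weaker signal has a larger absolute dBm value and hence a smaller v.
    v = {j: 1.0 / abs(links_dbm[j]) ** s for j in nearest}
    total = sum(v.values())
    return {j: v_j / total for j, v_j in v.items()}

links = {1: -60.0, 2: -90.0, 3: -120.0}
w_default = inverse_signal_weights(links, k=2, s=1.0)   # weights 0.6 and 0.4
w_flat = inverse_signal_weights(links, k=2, s=0.0)      # equal weights
w_sharp = inverse_signal_weights(links, k=2, s=50.0)    # close to best server
```

With $s = 0$ both antennas receive the same weight, while a large $s$ concentrates nearly all weight on the strongest link, in line with the limiting cases described above. The sketch does not guard against a link of exactly 0 dBm, for which the inverse is undefined.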

Potential extensions

Depending on data availability, the methodology can further be extended. While MNOs often regard MS counts as highly sensitive information since they reveal a detailed picture of local market shares, they can be used to further refine the weights towards more accurate population counts. [4] presents elaborate approaches to use MS counts and advanced technical network specifications to derive high-resolution population density estimates from signalling data.

Further, high-resolution population grid estimates such as provided by WorldPop at 100x100m [12] can be used as an alternative to binary settlement data. Here, $\hat{w}_{i,j}$ can be substituted with the estimated population count $\hat{p}_i$ per pixel directly extracted from the image.

Simulation

In order to evaluate the underlying motivation behind this methodology, i.e. that more accurate mapping schemes produce more accurate outcomes, I test the performance of the different mapping approaches in terms of their overlap with the true coverage area and the accuracy of the predictions in a controlled setting with groundtruth information. To this end, I run a simulation T = 1000 times on a synthetic population grid in which I re-distribute individuals, their poverty status, BTS locations and technical BTS specifications randomly. I observe the geographical overlap of the true and the estimated coverage areas, the overlap in home-located settlements and the correlation between the true and the estimated variable of interest (in this case the poverty rate). The main challenge in this simulation is to create “true” coverage areas for each BTS that provide a realistic, but simplified benchmark for this study. For this purpose, I opt for the extended HATA model. The choice is motivated by a series of propagation model evaluations using real-world measurements, notably [56–58]. The stochastic component within the HATA model is disabled in order to isolate the effect of interest.

Setup

I simulate a country consisting of a major city, an uninhabited area such as a large lake or a national park, and rural area otherwise, using a 1000 x 1000 grid where each square pixel has an edge length of 100 m. The urban area is divided into 16 equally-sized (50 x 50 pixel) small statistical areas, whereas the rural area is divided into 24 larger ones (200 x 200 pixel). I randomly distribute one million individuals across the grid using a multivariate normal distribution with μx = 10, μy = 10, Σx = [50, 0] and Σy = [0, 50] for the urban area (1/2 of the total population), the same type of distribution with varying parameter values for the rural centers and a uniform distribution for the remaining rural area. Pixel-level population counts are calculated from individual-level data. Fig 3 shows an example of the settlement distribution across space and the corresponding population density.
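The urban part of this setup can be sketched in a few lines of Python. The sketch assumes that the means and variances are expressed in pixel units and that draws falling outside the 200 x 200 pixel urban block are redrawn; neither is stated in the text. The function name `simulate_urban_pixels` and the reduced number of individuals are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_urban_pixels(n, mu=(10.0, 10.0), var=50.0, size=200, seed=0):
    """Draw n individuals around the urban centre and count them per pixel.

    With a diagonal covariance matrix, the bivariate normal factorises into
    two independent normal draws with standard deviation sqrt(var).
    Draws outside the size x size block are redrawn (assumption of this sketch).
    """
    rng = random.Random(seed)
    sd = math.sqrt(var)
    counts = Counter()
    placed = 0
    while placed < n:
        x = math.floor(rng.gauss(mu[0], sd))
        y = math.floor(rng.gauss(mu[1], sd))
        if 0 <= x < size and 0 <= y < size:
            counts[(x, y)] += 1
            placed += 1
    return counts


counts = simulate_urban_pixels(10_000)
print(len(counts), "inhabited pixels,", sum(counts.values()), "individuals")
```

Pixels with a positive count play the role of settlements, and the counts themselves give the pixel-level population density.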

Fig 3. Simulation setup—Settlements.

(a) shows locations of the built-up areas in a hypothetical country, while (b) shows the corresponding population density in these areas (the brighter the colour, the higher the population density).

In the next step, I randomly assign a poverty rate to each pixel. First, I generate a 4x4-pixel poverty grid for which I calculate the population density (see Fig 4b). In order to account for differences in the poverty rate between urban and rural areas, I randomly draw from a uniform distribution with values between 0 and 1 and multiply it with the inverted normalized population density. This poverty rate serves as the mean μ for randomly assigning poverty rates to settlements within the respective grid area using a normal distribution N(μ, σ) with σ = 0.5. Values below 0 and above 1 are winsorized. This two-step procedure is meant to limit spuriously good predictive performance for areas that are not actually covered, which could otherwise arise because all areas share the same underlying data generating process. Further, I assume that every inhabitant has one and only one MS and that there exists an indicator derived from mobile phone metadata that perfectly correlates with the true poverty rate of a given set of MS. Consequently, deviations in the correlation between the poverty rate captured via the “true” coverage area and the poverty rate captured via the estimated coverage area exclusively originate in their coverage mismatch.
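The two-step assignment can be sketched as follows. The sketch assumes that "inverted" means one minus the normalized density and that winsorizing amounts to clipping values to the interval [0, 1]; the function name `settlement_poverty_rates` and its arguments are hypothetical.

```python
import random


def settlement_poverty_rates(n, density_norm, sigma=0.5, seed=0):
    """Assign poverty rates to the n settlements of one poverty-grid cell.

    density_norm: population density of the cell, normalized to [0, 1].
    Step 1: cell mean = uniform draw times the inverted density
            (taken as 1 - density_norm, an assumption of this sketch).
    Step 2: settlement rates ~ N(mean, sigma), clipped to [0, 1].
    """
    rng = random.Random(seed)
    mu = rng.random() * (1.0 - density_norm)
    rates = [min(1.0, max(0.0, rng.gauss(mu, sigma))) for _ in range(n)]
    return mu, rates


mu, rates = settlement_poverty_rates(n=1000, density_norm=0.25, seed=1)
print(round(mu, 3), round(sum(rates) / len(rates), 3))
```

Because of the clipping, the average settlement-level rate generally differs from the cell mean μ, in particular when μ is close to 0 or 1.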

Fig 4. Simulation setup—True poverty rate.

In order to create a mobile network on top of that structure, I use a clustering algorithm based on the population density (see Fig 5b). BTS are distributed across the country at a ratio of roughly 1 BTS per 5,000 inhabitants in urban areas and 1 BTS per 10,000 inhabitants in rural areas. This results in 100 urban and 50 rural BTS in this simulation. BTS are interpreted as omnidirectional antennas and assigned specific heights, frequencies and output powers. The specifications vary more strongly in the urban area in order to reflect the greater complexity of network topology generally found in metropolitan areas. Since the HATA model requires a classification of areas into urban, suburban and rural, I classify the 50% of BTS with the smallest number of pixels assigned to them by the clustering algorithm as urban, the 5% of BTS with the largest number of pixels as rural and the remainder as suburban. In the end, BTS heights range between 15 and 60 m, frequencies are 900 MHz or 2100 MHz and output power lies between 40 and 47 dBm. The MS height is fixed at 1 m above ground level.
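The BTS counts and the area-type rule can be reproduced with a short sketch. The function name `classify_bts`, the rounding of the 5% share (integer division) and the handling of ties (stable sort order) are assumptions of this sketch.

```python
def classify_bts(pixels_per_bts):
    """Label each BTS as urban, suburban or rural by its cluster size.

    The 50% of BTS with the fewest pixels are urban, the 5% with the most
    pixels are rural, the rest suburban. Ties are broken by list order.
    """
    n = len(pixels_per_bts)
    order = sorted(range(n), key=lambda j: pixels_per_bts[j])
    n_urban = n // 2
    n_rural = n // 20
    labels = ["suburban"] * n
    for j in order[:n_urban]:
        labels[j] = "urban"
    for j in order[n - n_rural:]:
        labels[j] = "rural"
    return labels


# BTS counts implied by the ratios in the text (half of one million each)
urban_bts = 500_000 // 5_000
rural_bts = 500_000 // 10_000
print(urban_bts, rural_bts)

# Hypothetical cluster sizes for 20 BTS
labels = classify_bts(list(range(1, 21)))
print(labels.count("urban"), labels.count("suburban"), labels.count("rural"))
```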

Fig 5. Simulation setup—BTS locations.

Based on these technical specifications, I calculate the true coverage areas and the true home locations of the settlements using the extended HATA model and use them to create benchmark estimates of the true poverty rate. The results are then compared against estimates from point-to-polygon allocation, voronoi tessellation, augmented voronoi tessellation and the BSA and IDW approaches based on a naïve ('simple') version of the extended HATA model that does not know the exact technical BTS specifications, but makes an educated guess based on publicly available information such as the frequencies used in the country and the location of urban centers. Fig 6 exemplifies how the approaches differ in terms of geographical coverage.

Fig 6. Coverage areas exemplified.

The results are compared in three different ways: How much do they overlap geographically? How much do they overlap in terms of home-located settlements? How well do they predict the true poverty rate of a given statistical area?
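The first two comparisons can be sketched with one helper. The exact overlap definition is not spelled out here, so the sketch adopts one plausible reading: the share of truly covered units (pixels for the geographical overlap, settlements for the home-location overlap) that a mapping scheme assigns to the same BTS as the true coverage. The function name `allocation_overlap` and the toy allocations are hypothetical.

```python
def allocation_overlap(true_bts, est_bts):
    """Share (in %) of truly covered units assigned to the same BTS.

    true_bts, est_bts: dicts mapping a unit id (pixel or settlement) to the
    id of its serving BTS, or None where there is no coverage.
    """
    covered = [u for u, b in true_bts.items() if b is not None]
    if not covered:
        return 0.0
    hits = sum(1 for u in covered if est_bts.get(u) == true_bts[u])
    return 100.0 * hits / len(covered)


# Toy example: four pixels, one of them without true coverage
true_alloc = {"p1": "A", "p2": "A", "p3": "B", "p4": None}
voronoi_alloc = {"p1": "A", "p2": "B", "p3": "B", "p4": "B"}
overlap = allocation_overlap(true_alloc, voronoi_alloc)
print(round(overlap, 1))
```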

Results

Table 2 shows, for each of the five performance indicators, the share of simulation rounds in which each approach performed best. Performance differences between voronoi tessellation versus the augmented voronoi tessellation and the augmented voronoi tessellation versus the HATA (BSA) approach showcase the relative contribution of settlement weighting and radio-propagation modelling, respectively. As expected, the simple HATA model clearly outperforms the other mapping approaches in terms of overlap, both geographically with the true coverage area (see Table 3) as well as concerning the home-located settlements (see Table 4). As the settlement-based approaches only affect the calculation of weights and not of the coverage area, the coverage results are identical for voronoi tessellation and augmented voronoi tessellation and for the two HATA approaches, respectively. However, this advantage is not reflected to a similar extent in the predictive performance.

Table 2. Best performing approach by round across rounds (in%).

Mapping Coverage (Geography) Coverage (Settlements) Prediction (R2) Prediction (Bias) Prediction (RMSE)
Point 0.0 0.0 27.5 28.2 28.6
Voronoi 0.0 0.07 2.6 9.9 2.1
Aug. Voronoi (GUF) - - 35.5 33.1 36.8
HATA (GUF, BSA) 100.0 99.3 29.7 13.7 27.7
HATA (GUF, IDW) - - 4.7 15.1 4.8
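Shares of this kind can be tabulated from per-round results as sketched below. The function name `win_shares` and the toy values are hypothetical; ties are resolved in favour of the approach listed first, and for the bias one would compare absolute values.

```python
from collections import Counter


def win_shares(rounds, higher_is_better=True):
    """Share (in %) of rounds in which each approach performs best.

    rounds: list of dicts {approach: metric value}, one dict per round.
    """
    pick = max if higher_is_better else min
    wins = Counter(pick(r, key=r.get) for r in rounds)
    return {a: 100.0 * wins[a] / len(rounds) for a in rounds[0]}


# Hypothetical R2 values of two approaches in four rounds
r2_rounds = [
    {"Point": 0.5, "Voronoi": 0.6},
    {"Point": 0.7, "Voronoi": 0.6},
    {"Point": 0.4, "Voronoi": 0.9},
    {"Point": 0.1, "Voronoi": 0.2},
]
shares = win_shares(r2_rounds)
print(shares)
```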

Table 3. Geographical overlap with true coverage area (in%).

Mapping Total Rural Suburban Urban
Point 25.8 15.3 30.9 22.3
Voronoi 30.7 14.1 25.5 37.0
Simple HATA 55.3 80.1 62.1 46.7

Table 4. Overlap with true home-located settlements (in%).

Mapping Total Rural Suburban Urban
Point 16.9 44.3 15.9 14.9
Voronoi 54.2 56.8 60.7 48.0
Simple HATA 59.7 87.6 66.5 50.6

Interestingly, the HATA (IDW) approach performs poorly in prediction in contrast to the HATA (BSA) approach. This is due to the fact that the poverty rate in the true coverage area is calculated based on a deterministic home location, i.e. it is calculated from a constant set of settlements. This coincides directly with the mode-based HATA (BSA) approach. However, it does not reflect most real-world settings, in which stochastic radio propagation and overlapping coverage areas lead to situations where the poverty rate captured by the BTS is sourced from varying sets of settlements. The HATA (IDW) approach addresses this setup. Consequently, it is expected that the differences between these two approaches at least diminish in the application with real-world data in Section Application. Also, deviations of the HATA (BSA) approach from the benchmark exclusively originate in the technical misspecifications as the true coverage area is calculated from a correctly specified HATA model. The network complexity faced in real-world settings is expected to further undermine the accuracy of propagation-based mapping schemes.

Looking at the performance of the two voronoi approaches in Table 2, the value added of using settlement information becomes apparent. Recalling the setup, the simulation assumes error-free human settlement identification. This, again, may not hold true in a real-world application as some buildings may not be detected while some detected buildings may not be inhabited. Consequently, it is expected that the difference between the two voronoi approaches will be less stark in the application.

Fig 7 shows the distribution of the three performance indicators across rounds for those statistical areas for which every mapping scheme can provide estimates. On average, this reduces the underlying set of observations from 40 to 32 (see the sample sizes in Table 5). The results for the true coverage area are presented as a benchmark for the other approaches, as they are based on the settlement-level poverty rates actually captured by the respective BTS. Consequently, the benchmark should provide the upper bound for the R2 and the lower bound for the bias and the RMSE in each round. Deviations from these bounds may only be due to spurious correlation.
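For reference, the three indicators can be computed from area-level estimates as sketched below. The sketch uses the mean error as bias and the squared Pearson correlation as an unadjusted R2, whereas the figures report the adjusted R2; the function names and toy values are hypothetical.

```python
import math


def bias(est, true):
    """Mean error of the estimates."""
    return sum(e - t for e, t in zip(est, true)) / len(true)


def rmse(est, true):
    """Root mean squared error of the estimates."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(true))


def r_squared(est, true):
    """Squared Pearson correlation between estimates and true values."""
    n = len(true)
    mean_e, mean_t = sum(est) / n, sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(est, true))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov ** 2 / (var_e * var_t)


# Hypothetical poverty rates of three statistical areas
true_rates = [0.2, 0.4, 0.6]
est_rates = [0.3, 0.5, 0.7]
print(bias(est_rates, true_rates), rmse(est_rates, true_rates),
      r_squared(est_rates, true_rates))
```

The toy example shows why the indicators complement each other: a constant offset leaves the correlation untouched but shows up in both the bias and the RMSE.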

Fig 7. Estimating the true poverty rate for statistical areas.

Distribution of the three performance metrics adjusted R2, bias and RMSE with the estimated poverty rate using the true coverage area, i.e. built-up areas perfectly allocated to BTS, as ‘Benchmark’ across 1000 simulation runs.

Table 5. Area-level correlation of estimated and true poverty rate & sample size.

Mapping ρ n ρRural nRural ρUrban nUrban
Benchmark 0.905 40 0.734 24 0.971 16
Point 0.930 36 0.828 20 0.940 16
Voronoi 0.873 40 0.622 24 0.966 16
Aug. Voronoi 0.896 40 0.715 24 0.966 16
Simple HATA (BSA) 0.897 40 0.717 24 0.957 16
Simple HATA (IDW) 0.885 40 0.670 24 0.962 16

The sample size difference also explains the difference between the performance of the point-to-polygon approach in terms of correlation in Table 5 vis-à-vis the performance metrics, especially in rural areas. Point-to-polygon allocation does not provide poverty estimates for 8 out of 40 statistical areas, on average, as they do not host a BTS (cf. Fig 6b). As both poverty rate and BTS allocation are linked to the population density by design, it can be expected that the predictive performance for rural areas not hosting a BTS is poor as they are generated from different underlying distributions.

However, this does not fully explain the performance differences between the approaches. On the one hand, statistical areas are quite large, thus most of the BTS experience little overlap in their true coverage area with other statistical areas. Consequently, the statistical area provides a decent approximation for the coverage. In contrast, simple voronoi tessellation with geographical weights tends to overemphasize the importance of remote areas as a) it assumes to cover areas for which data is actually not captured and b) BTS are usually located in close proximity to populated areas while serving remote areas further away as a side effect. This may be especially relevant in situations with large between-variation among statistical areas, strong population clusters and imperfect mobile network coverage. While b) is accounted for in the simulation, only approx. 0.1% of the settlements are not covered by the network. Although this is in line with the mobile network coverage in most countries, it can be expected that propagation-based schemes that account for holes in the mobile network outperform established approaches in setups with poor coverage.

Application

In their 2017 study on estimating literacy rates in Senegal published in the Journal of the Royal Statistical Society Series A, Schmid et al. [1] use point-to-polygon allocation to map BTS point locations to statistical areas (communes). I revisit the design-based simulation of the study and extend it with four alternative mapping schemes, notably voronoi tessellation, satellite-augmented voronoi tessellation and the herein presented propagation-based coverage estimation methods using the best server area approach and the inverse signal strength weights. I compare the outcomes of all five schemes in terms of bias, root mean squared error (RMSE) and adjusted R2.

Situation in Senegal

The application draws on real-world data from Orange-Sonatel for the year 2013 [7]. During that time, the MNO operated mainly on the GSM 900 (2G) band with some UMTS 2100 (3G) deployments in urban centers. A large share of on-net traffic (approx. 91% of overall traffic vis-à-vis a market share of approx. 57%) during that year suggests a high prevalence of dual SIM use. It is expected that in this setting a negligible share of SIM cards is used by IoT devices other than MS. Coverage advantages in rural areas suggest dual-SIM use to be a phenomenon of more densely populated areas. The country exhibits few irregularities in the terrain: the highest point of Senegal, approx. 648 m above sea level, is located at its southern border, and the lowest point is at sea level. Urban built-up areas with multi-storey buildings are predominantly limited to downtown Dakar. Most of the country is dominated by savanna with sparse high-grown vegetation.

Original study

In their design-based simulation, Schmid et al. [1] implement a stratified two-stage cluster sample design similar to the one used in large-scale household surveys such as the Demographic and Health Survey (DHS) using a 10% random sample of a pseudo-population as sampling frame, the 431 communes of Senegal as primary sampling units (PSUs) and the 14 regions of Senegal as strata. The authors combine the constructed ‘survey’ data with covariates extracted from mobile phone metadata on the level of communes in order to evaluate different small area estimation techniques using the unemployment rate as target variable of choice. The 72 available covariates are calculated on the subscriber-level using the Python library Bandicoot [8]. The subscriber-level covariates are allocated and aggregated to a BTS using the most frequently used BTS by a subscriber between 7pm and 7am as the home location. The BTS-level covariates are then allocated and aggregated using point-in-polygon allocation. Variable selection is performed backwards on large communes using the Bayesian Information Criterion. The covariates are used to generate small area unemployment rate estimates using a transformed Fay-Herriot model. Finally, Schmid et al. evaluate the small area estimates against the ‘true’ pseudo-population aggregates in 500 simulation runs using bias and RMSE for a) communes covered by the survey (in-sample), b) communes not covered by the survey (out-of-sample) and c) communes without covariates from mobile phone metadata. For additional details on the setup of the original study, I refer to [1].
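The home-location rule can be sketched in plain Python. This is an illustration of the rule as described in the text, not the Bandicoot implementation; the function name `home_bts` and the toy records are hypothetical, and ties between equally frequent BTS are resolved by first occurrence.

```python
from collections import Counter
from datetime import datetime


def home_bts(records):
    """Most frequently used BTS between 7pm and 7am for one subscriber.

    records: iterable of (timestamp, bts_id) tuples.
    Returns None if the subscriber has no night-time records.
    """
    night = Counter(
        bts for ts, bts in records if ts.hour >= 19 or ts.hour < 7
    )
    return night.most_common(1)[0][0] if night else None


# Hypothetical records: BTS "B" dominates during the day, "A" at night
records = [
    (datetime(2013, 1, 1, 20, 0), "A"),
    (datetime(2013, 1, 2, 23, 30), "A"),
    (datetime(2013, 1, 2, 12, 0), "B"),
    (datetime(2013, 1, 2, 13, 0), "B"),
    (datetime(2013, 1, 2, 14, 0), "B"),
    (datetime(2013, 1, 3, 6, 0), "C"),
]
print(home_bts(records))
```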

Extensions

I re-run the simulation of the original study five times thereby only varying the commune-level matrix of covariates as inputs. Specifically, I create five distinct sets of commune-level covariates beforehand by applying different mapping schemes during the aggregation process of the BTS-level data of the original study. First, I use the point-to-polygon allocation used in the original study. Second, I apply a standard voronoi tessellation to extract spatial weights proportional to the geographical overlap of tile and statistical area as described in Voronoi tessellation since it is used in most other studies in this field. Third, I augment the voronoi tessellation with settlement information from GUF by taking the number of white pixels (representing (part of) a settlement) within each section as a weight for commune-level aggregates to account for within-cell heterogeneity. Fourth, I implement the extended HATA (BSA) model as presented in Methodology and GUF data. In densely populated areas, this approach closely resembles voronoi tessellation, however, it allows for holes in the network and for non-linear relationships between signal strength and distance. Fifth, I use inverse signal strength weights—HATA (IDW)—to capture the stochastic nature of a link.
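The difference between geographical and settlement-based weights can be illustrated with a commune that intersects two voronoi tiles. The sketch below assumes that commune-level covariates are weighted averages of BTS-level values; the function name `commune_average` and all numbers are hypothetical.

```python
def commune_average(values, weights):
    """Weighted mean of BTS-level covariate values for one commune.

    values:  {bts_id: covariate value}
    weights: {bts_id: weight of the tile section inside the commune}
    Returns None if all weights are zero (no usable information).
    """
    total = sum(weights.values())
    if total == 0:
        return None
    return sum(values[j] * w for j, w in weights.items()) / total


covariate = {"A": 1.0, "B": 0.0}

# Standard voronoi: overlap area of each tile with the commune (km2)
area_weights = {"A": 80.0, "B": 20.0}
# Augmented voronoi: number of settlement pixels within each section
settlement_weights = {"A": 10, "B": 90}

print(commune_average(covariate, area_weights))
print(commune_average(covariate, settlement_weights))
```

In this toy example, tile A covers most of the commune's area but few of its settlements, so the two schemes produce very different commune-level values. Where settlements are spread evenly, both weights coincide.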

Comparing Fig 8c and 8d to the direct estimator (Fig 8b) shows the benefits of augmenting survey data with mobile phone metadata: providing estimates for small areas not originally covered by the survey. Looking at settlements in Fig 8a, it is noteworthy that one commune, Thietty in the region Kolda, does not appear to host any settlement identified as such in GUF data. While official population numbers do not support this view, it underlines the fact that information extracted from satellite imagery, e.g. settlement classifications, is subject to some degree of uncertainty.

Fig 8. Commune-level coverage areas in Senegal.

Areas for which estimates of indicators of interest are available are coloured in red. Lower resolution built-settlements extents data reprinted from [10] under a CC BY license, with permission from WorldPop, original copyright 2018, are used in (a) for illustrative purposes.

Assumptions

In contrast to point-to-polygon allocation and voronoi tessellation, the extended HATA model requires additional technical antenna specifications, notably the antenna and receiver height, the frequency and the transmitter power. As this additional information is not available in the original study, I make the following assumptions: I fix both the antenna height $h_{tx}$ and the receiver height $h_{rx}$ at the lower bound of the extended HATA model, which is 30 m and 1 m, respectively, both located outdoors with line-of-sight and a transceiver installed above the roof. As most of Senegal is flat, without tall multi-storey buildings except in downtown Dakar and largely without high-grown vegetation, this assumption appears reasonable. Further, I fix the frequency in rural areas at 900 MHz and in urban centers at 2100 MHz and I interpret BTS as omnidirectional antennas with an output power of 45 dBm. This is clearly a simplification of the actual network topology, especially in urban areas with a mix of directed micro and macro cells. However, in Senegal in 2013, 4G had not yet been introduced and Orange-Sonatel was operating 3G (on the 2100 MHz frequency band) only in urban areas. The remaining country was served with 2G technology on the 900 MHz band. Comparing my own estimates with coverage area estimates for 2G in 2017 published by Sonatel [59] allows for a rough sanity check for the assumptions.
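To give a feeling for the link budget implied by these assumptions, the sketch below uses the classic Okumura-Hata formula (small and medium city form) rather than the extended HATA model applied in this study. The classic form is nominally valid for 150 to 1500 MHz, antenna heights of 30 to 200 m, receiver heights of 1 to 10 m and distances of 1 to 20 km, so it fits the 900 MHz setting but not the 2100 MHz one. Antenna gains, cable losses and the stochastic component are ignored, and all function names are hypothetical.

```python
import math


def hata_path_loss_db(f_mhz, h_tx_m, h_rx_m, d_km, area="urban"):
    """Median path loss (dB) from the classic Okumura-Hata formula."""
    lf = math.log10(f_mhz)
    # Receiver height correction for small and medium cities
    a_hrx = (1.1 * lf - 0.7) * h_rx_m - (1.56 * lf - 0.8)
    loss = (69.55 + 26.16 * lf - 13.82 * math.log10(h_tx_m) - a_hrx
            + (44.9 - 6.55 * math.log10(h_tx_m)) * math.log10(d_km))
    if area == "suburban":
        loss -= 2.0 * math.log10(f_mhz / 28.0) ** 2 + 5.4
    elif area == "rural":
        # Open-area correction
        loss -= 4.78 * lf ** 2 - 18.33 * lf + 40.94
    return loss


def received_power_dbm(p_tx_dbm, f_mhz, h_tx_m, h_rx_m, d_km, area="urban"):
    """Simplified link budget: output power minus median path loss."""
    return p_tx_dbm - hata_path_loss_db(f_mhz, h_tx_m, h_rx_m, d_km, area)


# Assumed specifications: 45 dBm, 900 MHz, 30 m antenna, 1 m receiver
for d_km in (1, 5, 10):
    p_rx = received_power_dbm(45.0, 900.0, 30.0, 1.0, d_km, area="rural")
    print(d_km, round(p_rx, 1))
```

The non-linear decay of the received power with distance is what distinguishes the propagation-based schemes from the purely geometric voronoi tessellation.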

While Senegal offers an official classification of rural and urban at the commune level, it is imperfect for the purposes of this study, as it takes a wide variety of non-network-specific factors into account. This leads to a situation where places with a high population density, e.g. Touba Mosque, are classified as commune rurale. Instead, I use BTS density per km² as a proxy for urbanity with a threshold of 1. Communes with more than one BTS per km² are classified as urban, those 50% of the communes with the lowest site density are classified as rural, the remaining communes are classified as suburban. This represents a more network-oriented measure of urbanity and is also in line with the area type classification of the HATA model.
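This classification rule can be written down directly. The function name `classify_communes` and the toy densities are hypothetical; ties at the 50% cut-off are resolved by sort order, which the text does not specify.

```python
def classify_communes(density):
    """Classify communes by BTS density (BTS per km2).

    More than one BTS per km2: urban. The 50% of communes with the
    lowest site density: rural. All remaining communes: suburban.
    """
    ranked = sorted(density, key=density.get)
    rural = set(ranked[: len(ranked) // 2])
    labels = {}
    for commune, d in density.items():
        if d > 1.0:
            labels[commune] = "urban"
        elif commune in rural:
            labels[commune] = "rural"
        else:
            labels[commune] = "suburban"
    return labels


# Hypothetical BTS densities for six communes
density = {"a": 2.5, "b": 0.01, "c": 0.02, "d": 0.5, "e": 0.8, "f": 1.5}
area_type = classify_communes(density)
print(area_type)
```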

Results

Similar to Table 2 in the simulation, Table 6 shows which mapping scheme performed best across the 500 evaluation rounds. Confirming initial findings of Section Simulation, there is no clear winner. While point-to-polygon allocation performs best in out-of-sample predictions in terms of RMSE (54.0% of the rounds), it performs poorest in in-sample predictions. One possible explanation is that the lower average number of predictors used across rounds reduces the effects of overfitting. While HATA (IDW), HATA (BSA) and the augmented voronoi approach perform well across performance metrics, the overall difference between the approaches is limited (see Fig 9 and Table 7).

Table 6. Best performing approach by round across rounds (in%).

Mapping Adj. R2 (in) Bias (in) Bias (out) RMSE (in) RMSE (out) Avg. # of predictors
Point 6.0 16.4 23.2 12.6 54.0 4.2
Voronoi 10.2 21.0 16.6 21.6 22.8 5.0
Aug. Voronoi (GUF) 27.2 22.6 18.0 33.2 5.2 6.5
HATA (GUF, BSA) 27.0 21.8 16.4 18.0 7.8 6.4
HATA (GUF, IDW) 29.6 18.2 25.8 14.6 10.2 6.2

Fig 9. Evaluation of poverty rate estimates for in-sample communes.

Distribution of the three performance metrics adjusted R2, bias and RMSE across 500 simulation runs on a comparable set of communes. The typical trade-off between the bias and the variance of a small area estimator vis-à-vis the direct survey estimator becomes apparent.

Table 7. Correlation with true unemployment rate and sample size in Senegal.

Mapping ρ n ρin nin ρout nout ρooc nooc
Point 0.535 431 0.765 192 0.320 210 0.355 29
Voronoi 0.542 431 0.778 196 0.313 235 - 0
Aug. Voronoi (GUF) 0.519 431 0.780 195 0.280 233 0.586 3
HATA (GUF, BSA) 0.511 431 0.770 194 0.269 232 0.670 5
HATA (GUF, IDW) 0.527 431 0.781 196 0.308 234 - 1

Contrary to what the simulation results suggest, urban communes do not perform significantly better than rural ones. Table 8 shows, similar to Table 5 for the simulation, the correlation between the actual and predicted commune-level unemployment rates. Fig 10 shows an orientation along the diagonal signalling overall good fit. A possible explanation is that the structural relationship of mobile phone metadata covariates and the unemployment rate is captured more robustly for rural areas as they constitute 385 out of 431 communes in Senegal. To test this explanation, Tables 1 and 2 in S1 Appendix show the results for in-sample and out-of-sample predictions by commune status, respectively. While urban communes outperform rural ones in in-sample prediction, they fare worse in the out-of-sample setting, thus supporting the aforementioned hypothesis.

Table 8. Area-level correlation of estimated and true unemployment rate & sample size.

Mapping ρ n ρRural nRural ρUrban nUrban
Point 0.535 431 0.507 385 0.527 46
Voronoi 0.542 431 0.519 385 0.469 46
Aug. Voronoi 0.519 431 0.495 385 0.411 46
Simple HATA (BSA) 0.511 431 0.487 385 0.374 46
Simple HATA (IDW) 0.527 431 0.510 385 0.369 46

Fig 10. True vs. estimated unemployment rate by commune status for a single simulation run.

While settlement-based mapping schemes exhibit improvements in the model fit compared to point allocation or voronoi tessellation, they do not translate into major efficiency gains in terms of bias and RMSE (see Fig 9b and 9c). There are three possible reasons. First, there is a significant classification error in the settlement data. The complete absence of settlements in Thietty, Kolda, supports this assumption. As a cross-check, I re-run the analysis with an alternative source of settlement information. Specifically, I use high-resolution population density estimates from WorldPop [12]; however, this does not lead to gains in efficiency (cf. Table 3 in S1 Appendix). Second, there is high spatial auto-correlation, thus little structural difference between the densely and sparsely populated areas in terms of the variable of interest (here unemployment), so even though the latter are overemphasized in the calculations, this does not affect the outcome predictions. Here, I re-run the application with alternative variables of interest, i.e. the literacy rate and the population count (cf. Tables 4 and 5 in S1 Appendix); again, without significant efficiency gains versus point allocation and voronoi tessellation. Third, there is little within-area variation of the population density so that geographic weights and settlement-based weights are very similar. The correlation coefficient between the weights of the two voronoi approaches confirms this with ρ = 0.98. Also, I use the 100 m x 100 m population estimates from WorldPop to extract commune-specific variation coefficients. For 76.8% of the communes, the within-commune variance is below 1, for 4% it is above 100 with a maximum at 3553.4.

In general, the value added of using propagation-based mapping schemes appears to be negligible in this application, even though official coverage area estimates by Sonatel [59] hint at the abundant presence of both overlaps and holes in the mobile network. A potential explanation is that the simplified HATA model is misspecified to an extent where the introduced errors cancel out the potential benefits. Looking at the specifications used in the application, this is most likely due to an underestimation of the coverage as the augmented voronoi approach closely resembles the upper bound for an overestimation using the HATA (BSA) within a—by assumption—largely homogeneous network.

Conclusion

Augmenting official statistics with mobile phone metadata still faces multiple methodological challenges, one of which is finding a common reference unit. As record-linkage on the individual-level presents considerable privacy risks, a common procedure is to combine aggregates of these two disparate data sources on a geographical level. However, the stochastic nature of radio propagation makes it difficult to pin down coverage areas of the mobile network. Based on this study, the good news is that it does not have to be complicated if supervised learning / prediction is the goal. While propagation-based models can help to refine the accuracy of coverage area estimation, this does not greatly impact the quality of the outcome predictions. One reason is that cells are usually located in a way that they provide a good service to as many MS as possible. As radio signals fade over distance, this means they are in close proximity to areas with high demand, i.e. densely populated places. Mapping schemes, in turn, mainly differ from each other when looking at the limits of a cell. However, most of the traffic which is correlated with statistical data for training/prediction is generated nearby, so the differences between mapping schemes become less relevant. Also, while geographical weights as used in most applications in this field ignore heterogeneity occurring within the cells, the corresponding statistical areas are often significantly larger. Therefore, cross-border cells, which could actually profit from weighting schemes that take within-cell heterogeneity into account, occur less frequently. In addition, cells and administrative (thus often statistical) areas are intimately linked via population clusters as both tend to be centered around them.

However, this study provides only initial evidence to inform future mapping choices and could be extended in multiple ways: First, both in the simulation and the application directional antennas are combined to omnidirectional antennas. While this is motivated by the typical data availability in real-world applications, it is of course a strong simplification of the actual network topology. As the lower bound of spatial heterogeneity captured is given by the number of unique areas resulting from intersecting coverage areas and statistical areas, studies such as [4] have shown that moving from a BTS-oriented to a cell-oriented analysis could greatly affect analysis, especially via potential increases in sample size. However, it needs further investigation how refined mapping schemes can add further value, particularly in the presence of measurement uncertainty, to supervised learning setups in cell-level analysis. Second, the study used comparatively simple empirical propagation models based on real-world measurements largely ignoring actual environments. More advanced propagation models exist; however, they require significantly more computing resources, which could limit their applicability, as they take the physical surroundings into account via digital surface models. Nevertheless, investigating this constitutes an interesting path for further research.

Supporting information

S1 Appendix. Results and instructions.

Results from the cross-checks of the application and instructions for replicating the findings of this study.

(PDF)

S1 File. Simulation.

Code for replicating the simulation study.

(ZIP)

S2 File. Application.

Code and data for replicating the application study. See S1 Appendix for further details.

(ZIP)

Acknowledgments

The author would like to thank Damien Jacques, Emmanuel Letouzé, Edward Oughton, Sören Pannier, Neeti Pokhriyal and Timo Schmid for excellent comments and helpful discussions.

Data Availability

The mobile phone data at the antenna- and commune-level aggregated to the year 2013 including noisy antenna locations as well as instructions for replicating the study results have been added as part of the Supporting information. In order to access record-level mobile phone data and exact antenna locations, one would need to contact Sonatel directly and present the research project that would require the data (contact: Mr El Hadji Birahim Gueye, Direction des Systèmes d’information Sonatel, ebgueye@orange-sonatel.com or post mail: Orange-Sonatel, 46 Boulevard de la République, BP 69 Dakar, Senegal). GUF data cannot be shared publicly because third-party access conditions apply (for scientific, non-commercial use). However, it is available for research purposes under a data user agreement. For data access, please contact the German Aerospace Agency under guf@dlr.de (https://www.dlr.de/eoc/en/PortalData/60/Resources/dokumente/guf/DLR-GUF_LicenseAgreement-and-OrderForm.pdf). Census data used in the study cannot be shared publicly because third-party access conditions apply. However, it is available for research purposes under a data user agreement. For data access, please visit the microdata catalogue of the statistical office in Senegal (http://anads.ansd.sn/index.php/catalog/51) or send the inquiry to statsenegal@ansd.sn. All code required for replicating the findings of this study is fully available in the Supporting information of this submission (S1 and S2 Files) and under https://github.com/tilluz/geomatching_open.

Funding Statement

The author received no specific funding for this work.

References

  • 1. Schmid T, Bruckschen F, Salvati N, Zbiranski T. Constructing sociodemographic indicators for national statistical institutes by using mobile phone data: estimating literacy rates in Senegal. Journal of the Royal Statistical Society Series A: Statistics in Society. 2017;180(4):1163–1190. doi:10.1111/rssa.12305
  • 2. Pokhriyal N, Jacques DC. Combining disparate data sources for improved poverty prediction and mapping. Proceedings of the National Academy of Sciences of the United States of America. 2017;114(46):E9783–E9792. doi:10.1073/pnas.1700319114
  • 3. Blumenstock J, Cadamuro G, On R. Predicting poverty and wealth from mobile phone metadata. Science. 2015;350:1073–1076. doi:10.1126/science.aac4420
  • 4. Ricciato F, Widhalm P, Pantisano F, Craglia M. Beyond the ‘single-operator, CDR-only’ paradigm: An interoperable framework for mobile phone network data analyses and population density estimation. Pervasive and Mobile Computing. 2017;35:65–82. doi:10.1016/j.pmcj.2016.04.009
  • 5. The Economist Intelligence Unit. The Inclusive Internet Index 2019; 2019. Available from: https://theinclusiveinternet.eiu.com/.
  • 6. Phillips C, Sicker D, Grunwald D. A survey of wireless path loss prediction and coverage mapping methods. IEEE Communications Surveys and Tutorials. 2013;15(1):255–270. doi:10.1109/SURV.2012.022412.00172
  • 7. de Montjoye YA, Smoreda Z, Trinquart R, Ziemlicki C, Blondel VD. D4D-Senegal: The Second Mobile Phone Data for Development Challenge. arXiv preprint arXiv:1407.4885. 2014.
  • 8. De Montjoye YA, Rocher L, Pentland AS. Bandicoot: A python toolbox for mobile phone metadata. Journal of Machine Learning Research. 2016;17:1–5.
  • 9. Esch T, Heldens W, Hirner A, Keil M, Marconcini M, Roth A, et al. Breaking new ground in mapping human settlements from space—The Global Urban Footprint. ISPRS Journal of Photogrammetry and Remote Sensing. 2017;134:30–42. doi:10.1016/j.isprsjprs.2017.10.012
  • 10. WorldPop. (www.worldpop.org—School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University; 2018. Global High Resolution Population Denominators Project—Funded by The Bill and Melinda Gates Foundation (OPP1134076). doi:10.5258/SOTON/WP00649
  • 11. Department of Economic and Social Affairs UN. Handbook on geospatial infrastructure in support of census activities. St/esa/sta ed New York, USA: United Nations Publication; 2009.
  • 12. Stevens FR, Gaughan AE, Linard C, Tatem AJ. Disaggregating census data for population mapping using Random forests with remotely-sensed and ancillary data. PLoS ONE. 2015;10(2):e0107042. doi:10.1371/journal.pone.0107042
  • 13. Freire S, MacManus K, Pesaresi M, Doxsey-Whitfield E, Mills J. Development of new open and free multi-temporal global population grids at 250 m resolution. AGILE. 2016; p. 6.
  • 14. Henderson JV, Storeygard A, Weil DN. Measuring economic growth from outer space; 2012. Available from: http://pubs.aeaweb.org/doi/10.1257/aer.102.2.994.
  • 15. Chen X, Nordhaus WD. Using luminosity data as a proxy for economic statistics. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(21):8589–8594. doi:10.1073/pnas.1017031108
  • 16. Pinkovskiy M, Sala-i Martin X. Lights, Camera… Income! Illuminating the National Accounts-Household Surveys Debate. The Quarterly Journal of Economics. 2016;131(2):579–631. doi:10.1093/qje/qjw003
  • 17. Leyk S, Gaughan AE, Adamo SB, de Sherbinin A, Balk D, Freire S, et al. Allocating people to pixels: A review of large-scale gridded population data products and their fitness for use. Earth System Science Data Discussions. 2019;11(3):1–30. doi:10.5194/essd-2019-82
  • 18. Bonafilia D, Gill J, Kirsanov D, Sundram J. Mapping for humanitarian aid and development with weakly-and semi-supervised learning. Facebook; 2019. Available from: https://bit.ly/2PxK5dx.
  • 19. Harvey JT. Estimating census district populations from satellite imagery: Some approaches and limitations. International Journal of Remote Sensing. 2002;23(10):2071–2095. doi:10.1080/01431160110075901
  • 20. Steinnocher K, De Bono A, Chatenoux B, Tiede D, Wendt L. Estimating urban population patterns from stereo-satellite imagery. European Journal of Remote Sensing. 2019;52(sup2):12–25. doi:10.1080/22797254.2019.1604081
  • 21. Jean N, Burke M, Xie M, Davis WM, Lobell DB, Ermon S. Combining satellite imagery and machine learning to predict poverty. Science. 2016;353(6301):790–794. doi:10.1126/science.aaf7894
  • 22. Weidmann NB, Schutte S. Using night light emissions for the prediction of local wealth. Journal of Peace Research. 2017;54(2):125–140. doi:10.1177/0022343316630359
  • 23. Oughton E. Quantified Global Broadband Strategies for Connecting Unconnected Communities. SSRN Electronic Journal. 2019. doi:10.2139/ssrn.3427492
  • 24. Blondel VD, Decuyper A, Krings G. A survey of results on mobile phone datasets analysis; 2015. Available from: http://www.epjdatascience.com/content/4/1/10.
  • 25. Deville P, Linard C, Martin S, Gilbert M, Stevens FR, Gaughan AE, et al. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences. 2014;111(45):15888–15893. doi:10.1073/pnas.1408439111
  • 26. Khodabandelou G, Gauthier V, Fiore M, El Yacoubi AM. Estimation of Static and Dynamic Urban Populations with Mobile Network Metadata. IEEE Transactions on Mobile Computing. 2018. doi:10.1109/TMC.2018.2871156
  • 27. Botta F, Moat HS, Preis T. Quantifying crowd size with mobile phone and Twitter data. Royal Society Open Science. 2015;2(5):150162. doi:10.1098/rsos.150162
  • 28. Douglass RW, Meyer DA, Ram M, Rideout D, Song D. High resolution population estimates from telecommunications data. EPJ Data Science. 2014;4(1):1–13. doi:10.1140/epjds/s13688-015-0040-6
  • 29. Lu X, Bengtsson L, Holme P. Predictability of population displacement after the 2010 Haiti earthquake. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(29):11576–11581. doi:10.1073/pnas.1203882109
  • 30. Gundogdu D, Incel OD, Salah AA, Lepri B. Countrywide arrhythmia: emergency event detection using mobile phone data. EPJ Data Science. 2016;5(1):25. doi:10.1140/epjds/s13688-016-0086-0
  • 31. Schneider CM, Belik V, Couronné T, Smoreda Z, González MC. Unravelling daily human mobility motifs. Journal of the Royal Society Interface. 2013;10(84). doi:10.1098/rsif.2013.0246
  • 32. Wesolowski A, Eagle N, Noor AM, Snow RW, Buckee CO. The impact of biases in mobile phone ownership on estimates of human mobility. Journal of the Royal Society Interface. 2013;10(81):20120986. doi:10.1098/rsif.2012.0986
  • 33. Matamalas JT, De Domenico M, Arenas A. Assessing reliable human mobility patterns from higher order memory in mobile communications. Journal of the Royal Society Interface. 2016;13(121):20160203. doi:10.1098/rsif.2016.0203
  • 34. Iovan C, Olteanu-Raimond AM, Couronné T, Smoreda Z. Moving and calling: Mobile phone data quality measurements and spatiotemporal uncertainty in human mobility studies. In: Lecture Notes in Geoinformation and Cartography. vol. 2013-Janua. Springer, Cham; 2013. p. 247–265. Available from: http://link.springer.com/10.1007/978-3-319-00615-4_14.
  • 35. Janzen M, Vanhoof M, Smoreda Z, Axhausen KW. Closer to the total? Long-distance travel of French mobile phone users. Travel Behaviour and Society. 2018;11:31–42. doi:10.1016/j.tbs.2017.12.001
  • 36. Taylor L. No place to hide? The ethics and analytics of tracking mobility using mobile phone data. Environment and Planning D: Society and Space. 2016;34(2):319–336. doi:10.1177/0263775815608851
  • 37. Wesolowski A, Eagle N, Tatem AJ, Smith DL, Noor AM, Snow RW, et al. Quantifying the impact of human mobility on malaria. Science. 2012;338(6104):267–270. doi:10.1126/science.1223467
  • 38. Rubrichi S, Smoreda Z, Musolesi M. A comparison of spatial-based targeted disease mitigation strategies using mobile phone data. EPJ Data Science. 2018;7(1):17. doi:10.1140/epjds/s13688-018-0145-9
  • 39. Tizzoni M, Bajardi P, Decuyper A, Kon Kam King G, Schneider CM, Blondel V, et al. On the Use of Human Mobility Proxies for Modeling Epidemics. PLoS Computational Biology. 2014;10(7):e1003716. doi:10.1371/journal.pcbi.1003716
  • 40. Le Menach A, Tatem AJ, Cohen JM, Hay SI, Randell H, Patil AP, et al. Travel risk, malaria importation and malaria transmission in Zanzibar. Scientific Reports. 2011;1(1):93. doi:10.1038/srep00093
  • 41. Frías-Martínez E, Williamson G, Frías-Martínez V. An agent-based model of epidemic spread using human mobility and social network information. In: Proceedings—2011 IEEE International Conference on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing, PASSAT/SocialCom 2011. IEEE; 2011. p. 57–64. Available from: http://ieeexplore.ieee.org/document/6113095/.
  • 42. Lima A, Domenico MD, Pejovic V, Musolesi M. Exploiting Cellular Data for Disease Containment and Information Campaigns Strategies in Country-Wide Epidemics. CoRR. 2013;abs/1306.4.
  • 43. Park PS, Blumenstock JE, Macy MW. The strength of long-range ties in population-scale social networks. Science. 2018;362(6421):1410–1413. doi:10.1126/science.aau9735
  • 44. Bakker MA, Piracha DA, Lu PJ, Bejgo K, Bahrami M, Leng Y, et al. Measuring Fine-Grained Multidimensional Integration Using Mobile Phone Metadata: The Case of Syrian Refugees in Turkey. In: Guide to Mobile Data Analytics in Refugee Scenarios. Cham: Springer International Publishing; 2019. p. 123–140. Available from: http://link.springer.com/10.1007/978-3-030-12554-7_7
  • 45. Sundsøy P. Can mobile usage predict illiteracy in a developing country? arXiv preprint arXiv:1607.01337. 2016.
  • 46. Blumenstock J, Callen M, Ghani T. Why do defaults affect behavior? Experimental evidence from Afghanistan. American Economic Review. 2018;108(10):2868–2901. doi:10.1257/aer.20171676
  • 47. Bruckschen F, Koebe T, Ludolph M, Marino MF, Schmid T. Refugees in Undeclared Employment—A Case Study in Turkey. In: Guide to Mobile Data Analytics in Refugee Scenarios. Cham: Springer International Publishing; 2019. p. 329–346. Available from: http://link.springer.com/10.1007/978-3-030-12554-7_17
  • 48. Tennekes M. mobloc: Mobile phone location algorithms and tools; 2018. Available from: https://github.com/MobilePhoneESSnetBigData/mobloc_v0.1.
  • 49. OECD. Household definitions in other statistical standards. In: OECD Guidelines for Micro Statistics on Household Wealth. OECD Publishing; 2013. p. 275–277. Available from: https://www.oecd-ilibrary.org/docserver/9789264194878-18-en.pdf?expires=1570525698&id=id&accname=guest&checksum=FE901313FB9732B831D7F32703E8569C.
  • 50. Vanhoof M, Lee C, Smoreda Z. Performance and sensitivities of home detection from mobile phone data. arXiv preprint arXiv:1809.09911. 2018.
  • 51. Oughton EJ, Frias Z, van der Gaast S, van der Berg R. Assessing the capacity, coverage and cost of 5G infrastructure strategies: Analysis of the Netherlands. Telematics and Informatics. 2019;37:50–69. doi:10.1016/j.tele.2019.01.003
  • 52. Green MP, Wang SS. Signal propagation model used to predict location accuracy of GSM mobile phones for emergency applications. In: Proceedings—RAWCON 2002: 2002 IEEE Radio and Wireless Conference. Institute of Electrical and Electronics Engineers Inc.; 2002. p. 119–122.
  • 53. Hata M. Empirical Formula for Propagation Loss in Land Mobile Radio Services. IEEE Transactions on Vehicular Technology. 1980;29(3):317–325. doi:10.1109/T-VT.1980.23859
  • 54. Damasso E, Correia LM. Digital Mobile Radio Towards Future Generation. Luxembourg: European Commission; 1999. 11. Available from: https://publications.europa.eu/en/publication-detail/-/publication/f2f42003-4028-4496-af95-beaa38fd475f/language-en.
  • 55. Okumura Y, Ohmori E, Kawano T, Fukuda K. Field Strength and Its Variability in UHF and VHF Land-Mobile Radio Service. Review of the Electrical Communication Laboratory, September-October, 1968. 1968;16:825–873.
  • 56. Sharma RK, Singh PK. Comparative Analysis of Propagation Path loss Models with Field Measured Data. International Journal of Engineering Science and Technology. 2010;2(6):2008–2013.
  • 57. Abhayawardhana VS, Wassell IJ, Crosby D, Sellars MP, Brown MG. Comparison of empirical propagation path loss models for fixed wireless access systems. In: IEEE Vehicular Technology Conference. vol. 61; 2005. p. 73–77.
  • 58. Phillips C, Sicker D, Grunwald D. Bounding the error of path loss models. In: 2011 IEEE International Symposium on Dynamic Spectrum Access Networks, DySPAN 2011; 2011. p. 71–82.
  • 59. Sonatel. Coverage Map Sonatel 2019; 2019. Available from: https://bit.ly/2uJplYk.

Decision Letter 0

Jacinto Estima

15 Jul 2020

PONE-D-20-05427

Better coverage, better outcomes? Mapping mobile network data to official statistics using satellite imagery and radio propagation modelling

PLOS ONE

Dear Dr. Koebe,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper should be corrected in line with the comments provided by the reviewers. Reviewer 1 provided major suggestions that may affect the substance of the paper's findings, but will certainly contribute to improving its quality and contributions. Also, make sure that all data underlying the findings described in the manuscript are fully available without restriction, as this is a requirement of the journal.

Please submit your revised manuscript by Aug 29 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jacinto Estima

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that Figures 1, 8, 9 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

2.1. You may seek permission from the original copyright holder of Figures 1, 8, 9 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Review of P-one-D-20-0542

This study provides a detailed methodological approach to small-area-estimation of poverty based on data derived from mobile phones. Specifically, it addresses methodological advances in correcting how well the data derived from phones are able to assist in predictions of poverty rates. The study is well described and an important contribution to the field. I have several comments I hope can improve the general readability and accessibility of the manuscript, and one request for re-analysis that would improve applicability of the results.

As a general comment, I believe the authors have understated one of the main advances the paper offers, which is to allow for the idea that the statistical units at which poverty is measured may be best represented by more than one tower location. This idea should be better expressed in the introduction.

Both settlement weighting to derive augmented Voronoi polygons and radio-propagation-based modelling aim to address a core problem with how population density interacts with tower densities. While the results in the present study do not show significant improvements in predictive power, I feel that it could be valuable to further explore the relative contribution of settlement weighting and radio-propagation modelling in the context of population density and tower distribution. For example, in the simulation studies, the predictive power of the model was higher in urban versus rural areas. It would be of interest to see how this difference played out in Senegal.

In the conclusions, the author suggests that model misspecification could be the reason for a lack of significant model improvement using propagation models. A further potential interpretation, and a possible contribution of the study, is that it hints at a lower limit to the scale at which spatial heterogeneity in poverty rates can be discerned using CDR data, at least in the context of poverty data aggregated within statistical units.

Please be consistent with capitalisation of ‘Voronoi’

Line 196: Please define HATA the first time you use it in text. Or does HATA refer to the author in citation 53?

Line 332: Please define BSA as best server area here.

Line 350: Please define IDW here as inverse distance weights.

Results

Line 596: I don’t see any differentiation between urban and rural predictions in the Senegal case. Can these be provided for interest?

Figure 2: please be consistent in your labelling of the different scenarios so that 2e and 2f are labelled BSA and IDW in line with other text and tables.

Reviewer #2: Please see attached

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Nov 9;15(11):e0241981. doi: 10.1371/journal.pone.0241981.r002

Author response to Decision Letter 0


20 Aug 2020

I am grateful to the academic editor and the two reviewers for their constructive and excellent comments. These have been very helpful for improving and preparing the revised version of this paper. I have done my best to respond to all comments. The file 'Response to Reviewers' shows how I addressed each comment.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Jacinto Estima

26 Oct 2020

Better coverage, better outcomes? Mapping mobile network data to official statistics using satellite imagery and radio propagation modelling

PONE-D-20-05427R1

Dear Dr. Koebe,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jacinto Estima

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Jacinto Estima

29 Oct 2020

PONE-D-20-05427R1

Better coverage, better outcomes? Mapping mobile network data to official statistics using satellite imagery and radio propagation modelling

Dear Dr. Koebe:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jacinto Estima

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Appendix. Results and instructions.

    Results from the cross-checks of the application and instructions for replicating the findings of this study.

    (PDF)

    S1 File. Simulation.

    Code for replicating the simulation study.

    (ZIP)

    S2 File. Application.

    Code and data for replicating the application study. See S1 Appendix for further details.

    (ZIP)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    Mobile phone data at the antenna and commune level, aggregated to the year 2013 and including noisy antenna locations, are provided in the Supporting information together with instructions for replicating the study results. To access record-level mobile phone data and exact antenna locations, researchers need to contact Sonatel directly and present the research project that requires the data (contact: Mr El Hadji Birahim Gueye, Direction des Systèmes d’information Sonatel, ebgueye@orange-sonatel.com, or by post: Orange-Sonatel, 46 Boulevard de la République, BP 69 Dakar, Senegal). GUF data cannot be shared publicly because third-party access conditions apply (access is granted for scientific, non-commercial use). However, they are available for research purposes under a data user agreement. For data access, please contact the German Aerospace Center (DLR) at guf@dlr.de (https://www.dlr.de/eoc/en/PortalData/60/Resources/dokumente/guf/DLR-GUF_LicenseAgreement-and-OrderForm.pdf). Census data used in the study cannot be shared publicly because third-party access conditions apply. However, they are available for research purposes under a data user agreement. For data access, please visit the microdata catalogue of the statistical office of Senegal (http://anads.ansd.sn/index.php/catalog/51) or send an inquiry to statsenegal@ansd.sn. All code required for replicating the findings of this study is available in the Supporting information (S1 and S2 Files) and at https://github.com/tilluz/geomatching_open.

