Journal of the Royal Society Interface. 2017 Feb;14(127):20160690. doi: 10.1098/rsif.2016.0690

Mapping poverty using mobile phone and satellite data

Jessica E Steele 1,2, Pål Roe Sundsøy 3, Carla Pezzulo 1, Victor A Alegana 1, Tomas J Bird 1, Joshua Blumenstock 4, Johannes Bjelland 3, Kenth Engø-Monsen 3, Yves-Alexandre de Montjoye 5, Asif M Iqbal 6, Khandakar N Hadiuzzaman 6, Xin Lu 2,7,8, Erik Wetter 2,9, Andrew J Tatem 1,2,10, Linus Bengtsson 2,7
PMCID: PMC5332562  PMID: 28148765

Abstract

Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest r2 = 0.78) and lowest error, but generally models employing mobile data only yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.

Keywords: poverty mapping, mobile phone data, Bayesian geostatistical modelling, remote sensing

1. Background

In 2015, approximately 700 million people lived in extreme poverty [1]. Poverty is a major determinant of adverse health outcomes including child mortality [2], and contributes to population growth [3], societal instability and conflict [4]. Eradicating poverty in all its forms remains a major challenge and the first target of the Sustainable Development Goals (SDGs) [5]. To eradicate poverty, it is crucial that information is available on where affected people live. Such data improve the understanding of the causes of poverty, enable improved allocation of resources for poverty alleviation programmes, and are a critical component for monitoring poverty rates over time. The latter issue is especially pertinent for efforts aimed at reaching the SDGs, which need to be monitored at national and subnational levels over the coming 15 years [5].

The definition of poverty and the measurement methods used to identify poor persons are part of a longstanding discussion in development economics [6–9]. Different approaches exist to calculate indicators of living standards, including the construction of unidimensional and multidimensional indices, as well as the use of monetary or non-monetary metrics. A further discussion for living standard indices regards the methods used to set appropriate thresholds (poverty lines) under which a person is defined as poor [10–12]. Monetary-based metrics identify poverty as a shortfall in consumption (or income) and measure whether households or individuals fall above or below a defined poverty line [13,14]. By contrast, asset-based indicators define household welfare based on asset ownership (e.g. refrigerator, radio or bicycle), dwelling characteristics, and access to basic services like clean water and electricity [15]. Moreover, poverty indicators can capture the status of a household or individual at a given point in time, or identify chronic versus transient poverty over time [14,16–18].

Every approach used to calculate indicators of living standards for a population has its advantages and disadvantages, and each indicator discerns different characteristics of the population. Consumption data can be highly noisy due to recall error or because expenditures occurred outside the period captured in surveys, but provide a better shorter-term concept of poverty [19,20]. Asset-based measures have been regarded as a better proxy for the long-term status of households as they are thought to be more representative of permanent income or long-term control of resources [20–22]. The same population can be ranked quite differently along a poverty distribution when comparing consumption and asset-based measures, and many assumptions are necessarily accepted in order to make such comparisons. These include assumptions that the data represent the same populations in the same time period; that the indicators are well matched in their wording and response options; and that the poverty measures have a similar distribution of responses [20,23]. Furthermore, it is difficult to compare asset-based measures to income or consumption as it is not straightforward to link the productive potential of a household to the assets it owns; this can be particularly relevant in rural areas where the return on physical assets can be strongly environmentally related and interactions among assets may be important [24]. These factors necessitate a flexible approach to modelling poverty as indicators representing asset-based, consumption-based and income-based measures are not necessarily expected to produce similar results.

While numerous high-resolution indicators of human welfare are routinely collected for populations in high-income countries, the geographical distribution of poverty in low- and middle-income countries (LMICs) is often uncertain [25]. Small area estimation (SAE) forms the standard approach to produce sub-national estimates of the proportion of households in poverty. SAE uses statistical techniques to estimate parameters for sub-populations by combining household survey and census data to use the detail in household surveys and the coverage of the census. Common variables between the two are used to predict a poverty metric across the population [26–28]. These techniques rely on the availability of census data, which are typically collected every 10 years and often released with a delay of one or more years, making the updating of poverty estimates challenging. Recently, there are promising signs that novel sources of high-resolution data can provide an accurate and up-to-date indication of living conditions. In particular, recent work illustrates the potential of features derived from remote sensing and geographic information system data [29–35] (hereafter called RS data) and mobile operator call detail records (CDRs) [36–39]. However, the predictive power in integrating these two data sources, and their ability to estimate different measures of poverty has not been evaluated.

RS and CDR data capture distinct and complementary correlates of human living conditions and behaviour. For example, RS data of physical properties, such as rainfall, temperature and vegetation, capture information related to agricultural productivity, while distance to roads and cities reflects access to markets and information. Similarly, monthly credit consumption on mobile phones and the proportion of people in an area using mobile phones indicate household access to financial resources, while movements of mobile phones and the structure and geographical reach of the calling networks of individuals may be correlated with remittance flows and economic opportunities [39–41].

RS and CDR data are generated at different spatial scales, which further complement each other. The CDR indicators used in this study are derived from data aggregated at the level of the physical cell towers to preserve the privacy of individual subscribers. Thus, the spatial resolution of these data is determined by tower coverage, which is larger in rural areas and fine-scaled in urban areas. By contrast, RS data can be relatively coarse in urban areas and only capture physical properties of the land. As RS and CDR data are continually collected, the ability to produce accurate maps using these data types offers the promise of ongoing subnational monitoring required by the SDGs.

Here, we use overlapping sources of RS, CDR and traditional survey-based data from Bangladesh to provide the first systematic evaluation of the extent to which different sources of input data can accurately estimate three different measures of poverty. To date, the predictive power in integrating these data sources, and their ability to estimate different measures of poverty, has not been evaluated. We use hierarchical Bayesian geostatistical models (BGMs) to construct highly granular maps of poverty for three commonly used indicators of living standards: the Demographic and Health Surveys (DHS) Wealth Index (WI); an indicator of household expenditures (Progress out of Poverty Index, PPI) [42] and reported household monetary income. We additionally compare our results with previous poverty estimates for Bangladesh at coarser and finer resolutions.

2. Material and methods

2.1. Spatial scale and data processing

All data used in this study were processed to ensure that projections, resolutions and extents matched. The spatial scale of analysis was based on approximating the mobile tower coverage areas using Voronoi tessellation [43] and models were built on the scale of the Voronoi polygons (figure 1). This allowed us to maintain the fine spatial detail in mobile phone data within urban areas, as Voronoi polygon size, and corresponding spatial detail, varies greatly from urban to rural areas (minimum 60 m, maximum 5 km) as shown in the figure. All datasets were then summarized to spatially align with these polygons. In practice, each polygon was assigned RS and CDR values representing the mean, sum or mode of the corresponding data. The survey data were matched to the Voronoi polygons using the GPS-located latitude/longitude of the PPI data, the latitude/longitude of the centroid of each DHS cluster, and the home tower of each income survey respondent. Where multiple points from the same output (WI, PPI and income) fell within the same polygon, we used the mean aggregated value.
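As an illustration of the alignment step, the following minimal sketch assigns survey points to Voronoi polygons and averages the values that share a polygon. It relies on the defining property of a Voronoi tessellation: a point lies in the cell of its nearest site, so no polygon geometry is needed. The tower coordinates, survey points and function names are hypothetical, coordinates are assumed to be in a projected (planar) system, and this is not the processing pipeline used in the study.

```python
import math
from collections import defaultdict

def nearest_tower(point, towers):
    """Index of the tower closest to `point`; this is the Voronoi cell containing it."""
    return min(range(len(towers)), key=lambda i: math.dist(point, towers[i]))

def mean_by_polygon(points, values, towers):
    """Average the survey values of all points falling in the same Voronoi cell."""
    groups = defaultdict(list)
    for point, value in zip(points, values):
        groups[nearest_tower(point, towers)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in groups.items()}

# Hypothetical tower sites and survey clusters (planar coordinates, arbitrary units).
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (2.0, 0.5), (9.0, 1.0)]
wealth = [0.2, 0.4, 1.5]

cell_means = mean_by_polygon(points, wealth, towers)
print(cell_means)
```

Polygons that receive no survey points are simply absent from the result; in the study these are the unsampled locations that the geostatistical model later predicts.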

Figure 1.

Spatial structure of Voronoi polygons based on the configuration of mobile phone towers in Bangladesh. The zoom window shows the spatial detail of Dhaka.

2.2. Poverty data

We used three geographically referenced datasets representing asset, consumption and income-based measures of wellbeing in Bangladesh (see the electronic supplementary material, figure S1 and section A.1). These data were obtained from three sources: the 2011 Bangladesh DHS, the 2014 FII survey [44] with data collected on the PPI (www.progressoutofpoverty.org) and national household surveys conducted by Telenor Group subsidiary Grameenphone (GP) between November 2013 and March 2014 collecting household income data.

The DHS WI is constructed by taking the first principal component of a basket of household assets and housing characteristics such as floor type and ceiling material, which explains the largest percentage of the total variance, adjusting for differences in urban and rural strata [45]. The resulting composite score is then used as the WI: each household is assigned its corresponding quintile in the distribution, and every individual in a household shares that household's WI score. A higher score implies higher socioeconomic status (range = −1.45 to 3.5). Here, we used aggregated average WI scores per primary sampling unit (PSU) for 600 PSUs (207 in urban areas and 393 in rural areas) to estimate the mean WI of sampled populations residing in each Voronoi polygon.
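To make the idea of an asset index concrete, the sketch below extracts the first principal component of a small, hypothetical table of binary asset indicators using power iteration on the correlation matrix, then scores each household on it. It illustrates the general technique only: the asset columns and data are invented, and the DHS procedure includes steps (such as the urban and rural adjustment) that are not reproduced here.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (weights, scores) for the first principal component of standardized data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    # Standardize each column so that every asset contributes on the same scale.
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(zr[j] * w[j] for j in range(p)) for zr in z]
    return w, scores

# Hypothetical households; columns: owns radio, owns bicycle, owns refrigerator.
households = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
weights, scores = first_principal_component(households)
print(weights)
print(scores)
```

In this toy table the assets are positively correlated, so all weights are positive and households owning more assets receive higher scores.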

The PPI is a measurement tool built from the answers to 10 questions about a household's characteristics and asset ownership, scored to compute the likelihood the household is living above or below a poverty line. In Bangladesh, these poverty scorecard questions were determined using data from the 2010 Household Income and Expenditure Survey (HIES) [42,46], and used in a nationally representative survey of 6000 Bangladeshi adults undertaken in 2014 [44]. Together with basic demographics and access to financial services information, the 10 questions needed to construct the PPI were collected. These data were used to assign a poverty measure to each individual interviewed: the likelihood they have per capita expenditure above or below a poverty line. Here, we estimate the mean likelihood (range = 12.3–99.7%) of populations residing in each Voronoi polygon to be below the $2.50 a day poverty line.

Income data were obtained from two independent, sequential household surveys run by GP. For each survey, face-to-face interviews were conducted with 90 000 individuals, and their corresponding household income was collected, together with basic demographic information for each survey participant (e.g. gender, age, profession, education) and phone usage. Respondents were directly asked about income and were requested to place themselves within pre-set income bins. Among GP subscribers, CDRs were successfully linked to phone numbers for 76 000 participants. Here, we converted income bins to USD (range = $0–1285) and modelled the average USD for each Voronoi polygon.

2.3. CDR and RS data

CDR features were generated from four months of mobile phone metadata collected between November 2013 and March 2014. GP subscribers consented to the use of their data for the analysis. GP, the largest mobile network operator in Bangladesh, had 48 million customers at the time of the analysis, with a network covering 99% of the population and 90% of the land area [47]. CDR features range from metrics such as basic phone usage, top-up patterns, and social network to metrics of user mobility and handset usage. These features are easily made available in data warehouses and do not rely on complex algorithms. They include various parameters of the corresponding distributions such as weekly or monthly median, mean and variance (see the electronic supplementary material).

We further identified, assembled and processed 25 raster and vector datasets into a set of RS covariates for the whole of Bangladesh at a 1 km spatial resolution. These data were obtained from existing sources and produced ad hoc for this study to include environmental and physical metrics likely to be associated with human welfare [31,33,48–50] such as vegetation indices, night-time lights, climatic conditions, and distance to roads or major urban areas. A full summary of assembled covariates is provided in the electronic supplementary material.

2.4. Covariate selection

Prior to statistical analyses, all CDR and RS covariate data were log transformed for normality. Bivariate Pearson's correlations were computed for each pair of covariates to assess multicollinearity, and for high correlations (r > 0.70), we eliminated covariates that were less generalizable outside Bangladesh. For example, population data are widely available (e.g. www.worldpop.org.uk/) but births data may not be; similarly, volumes of calls could be computed and compared across countries, but charges may be country-specific.
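The screening step can be sketched as follows: covariates are log transformed, pairwise Pearson correlations are computed, and a covariate is dropped when it correlates above the threshold with one already retained. The covariate names and values are hypothetical, and the `priority` list is an assumed way of encoding which covariate is considered more generalizable; strictly positive values are assumed so that the logarithm is defined.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(covariates, priority, threshold=0.70):
    """Keep covariates in priority order, skipping any that correlate above
    `threshold` (in absolute value) with a covariate already kept."""
    kept = []
    for name in priority:
        if all(abs(pearson(covariates[name], covariates[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical covariate values for five polygons.
raw = {
    "population": [120.0, 340.0, 560.0, 910.0, 1500.0],
    "births": [10.0, 30.0, 52.0, 80.0, 140.0],
    "night_lights": [5.0, 1.0, 9.0, 2.0, 4.0],
}
logged = {name: [math.log(v) for v in vals] for name, vals in raw.items()}

# Population is listed before births because it is more widely available.
selected = drop_collinear(logged, priority=["population", "births", "night_lights"])
print(selected)
```

Here births is removed because it is almost perfectly correlated with population, while night-time lights is retained.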

To identify the set of predictors most suitable for modelling the WI, PPI, and income data, we employed a model selection stage as is common in statistical modelling [51]. For this we used non-spatial generalized linear models (glms), implemented via the R glmulti package [52,53], to build every possible non-redundant model for every combination of covariates. Models were built on a randomly selected 80% of the data to guard against overfitting. Models were chosen using Akaike's information criterion (AIC), which ranks models based on goodness of fit and complexity, while penalizing deviance [52]. A full IC-based approach such as this allows for multi-model inference. Where multiple glms had near-identical AIC values, we selected the model with the fewest covariates. For the CDR data only, we used forward and backward stepwise selection (p = 0.05) prior to model selection to reduce the initial CDR inputs from 150 to 30 or fewer. The covariate selection process was completed for all three poverty measures for national, urban and rural strata, and using RS-only, CDR-only and CDR–RS datasets (27 resulting models). This allowed us to explore differences in factors related to urban and rural poverty, as well as to explicitly compare the ability of RS-only, CDR-only and CDR–RS datasets to predict poverty measures. The resulting models were then used in the hierarchical Bayesian geostatistical approach (see the electronic supplementary material, tables S2a–c).
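The exhaustive, AIC-ranked search can be illustrated with a small self-contained sketch. It fits an ordinary least-squares model (a Gaussian glm with identity link) to every non-empty subset of covariates and keeps the subset with the lowest AIC, computed up to an additive constant as n ln(RSS/n) + 2k. This is not the glmulti implementation used in the study; the data, covariate names and helper functions are hypothetical, and the 80% training split is omitted for brevity.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_aic(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; returns Gaussian AIC."""
    n = len(y)
    design = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    k = p + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k

def best_subset(covariates, y):
    """Score every non-empty covariate subset and return (lowest AIC, subset)."""
    names = sorted(covariates)
    scored = [(ols_aic([covariates[s] for s in subset], y), subset)
              for size in range(1, len(names) + 1)
              for subset in combinations(names, size)]
    return min(scored)

# Hypothetical data: y depends on x1 only; x2 is an unrelated covariate.
covs = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [2.1, 3.8, 6.15, 7.95, 10.2, 11.9, 13.85, 16.05]

best_aic, best_vars = best_subset(covs, y)
print(best_vars)
```

Adding x2 lowers the residual sum of squares slightly, but not by enough to offset the penalty for the extra parameter, so the smaller model is preferred. The number of subsets doubles with every additional covariate, which is one reason to reduce a large candidate pool (as the stepwise screening of the CDR features does) before an exhaustive search.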

2.5. Prediction mapping

Using the models selected by the previous step, we employed hierarchical Bayesian geostatistical models (BGMs) to predict the three poverty metrics at unsampled locations across the population. We chose BGMs as they offer several advantages for addressing the limitations and constraints associated with modelling geolocated survey data. These include straightforwardly imputing missing data, allowing for the specification of prior distributions in model parameters and spatial covariance, and estimating uncertainty in the predictions as a distribution around each estimate [54,55].

Additionally, we needed to account for spatial autocorrelation in the data as they are aligned to the tower locations, which are clustered across varying spatial scales (described in §2.1 and figure 1). BGMs can achieve this through incorporating a spatially varying random effect. Here, the Voronoi polygons themselves form the neighbourhood structure for this spatial random effect, and neighbours are defined within a scaled precision matrix [56]. The matrix represents the spatially explicit processes that may affect poverty estimates. It is passed through a graph function in the model which assumes the neighbour relations are connected [57], that is, all adjacent polygons share a boundary. This function accounts for the spatial covariance in the data by allowing observations to have decreasing effects on predictions that are further away.
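A neighbourhood structure of this kind can be written as a matrix built from the polygon adjacency graph. The sketch below constructs the unscaled structure matrix R = D − W, where W is the 0/1 adjacency matrix and D holds each polygon's neighbour count on the diagonal; multiplying R by a precision parameter gives the precision matrix of an intrinsic autoregressive spatial effect. The four-polygon graph is hypothetical, and the additional scaling step referred to in the text is not reproduced.

```python
def structure_matrix(neighbours):
    """Build R = D - W from an undirected graph given as {node: set of neighbours}."""
    nodes = sorted(neighbours)
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    r = [[0.0] * n for _ in range(n)]
    for node, adjacent in neighbours.items():
        i = index[node]
        r[i][i] = float(len(adjacent))    # diagonal: number of neighbours
        for other in adjacent:
            r[i][index[other]] = -1.0     # off-diagonal: -1 for each shared boundary
    return nodes, r

# Hypothetical adjacency: A-B, B-C, B-D and C-D share boundaries.
neighbours = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}
nodes, R = structure_matrix(neighbours)
for name, row in zip(nodes, R):
    print(name, row)
```

Each row of R sums to zero, which reflects that the spatial effect is defined only through differences between neighbouring polygons.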

Here, all BGMs were implemented using integrated nested Laplace approximations (INLA) [58], which uses an approximation for inference and avoids the computational demands, convergence issues and mixing problems sometimes encountered by MCMC algorithms [59]. The model is fit using R-INLA, with the Besag model for spatial effects specified inside the function [60,61]. In the Besag model, Gaussian Markov random fields (GMRFs) are used as priors to model spatial dependency structures and unobserved effects. GMRFs penalize local deviation from a constant level based on the precision parameter τ, where the hyperpriors are loggamma distributed [56]. The hyperprior distribution governs the smoothness of the field used to estimate spatial autocorrelation [56]. The spatial random vector x = (x1, … ,xn) is thus defined as

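\[ x_i \mid x_j,\ i \neq j,\ \tau \sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{i \sim j} x_j,\ \frac{1}{n_i \tau} \right) \]

where n_i is the number of neighbours of node i and i ∼ j indicates that the two nodes i and j are neighbours. The precision parameter τ is represented as

\[ \theta_1 = \log \tau \]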

where the prior is defined on θ1 [60]. The geostatistical models defined for the WI, PPI and income data were applied to produce predictions of each poverty metric for each Voronoi polygon as a posterior distribution with complete modelled uncertainty around estimates. The posterior mean and standard deviation for each polygon were then used to generate prediction maps with associated uncertainty (figure 2 and electronic supplementary material, figures S2–S6). Model performance was based on out-of-sample validation statistics calculated on a 20% test subset of data. Pearson product-moment correlation coefficient (r) (or Spearman's rho (ρ) for n < 100), root-mean-square-error (RMSE), mean absolute error (MAE) and the coefficient of determination (r2) were calculated for all BGMs. Finally, because glms do not incorporate prior information for model parameters, we ran each model through INLA while excluding the random spatial effect to obtain non-spatial Bayesian estimates and compare model fit and performance due to the explicit spatial process.
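The out-of-sample statistics named above can be computed from held-out observations and their predictions as in the following sketch. The values are hypothetical, r2 is taken here to be the square of Pearson's r (an assumption, since the text does not state which definition was used), and the Spearman alternative for small samples is omitted.

```python
import math

def validation_stats(observed, predicted):
    """Pearson r, r squared, RMSE and MAE for held-out observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy)
    return {"r": r, "r2": r * r, "rmse": rmse, "mae": mae}

# Hypothetical held-out wealth index values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
stats = validation_stats(observed, predicted)
print(stats)
```

The example shows why correlation and error measures are reported together: predictions that are perfectly correlated with the observations can still carry a constant bias, which only RMSE and MAE reveal.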

Figure 2.

National level prediction maps for mean WI (a) with uncertainty (d); mean probability of households being below $2.50/day (b) with uncertainty (e); and mean USD income (c) with uncertainty (f). Maps were generated using call detail record features, remote sensing data and Bayesian geostatistical models. The maps show the posterior mean and standard deviation from CDR–RS models for the WI and income data (a,c), and the RS model for the PPI (b). Red indicates poorer areas in prediction maps, and higher error in uncertainty maps.

3. Results

We find models employing a combination of CDR and RS data generally provide an advantage over models based on either data source alone. However, RS-only and some CDR-only models performed nearly as well (table 1). While the combined CDR–RS model for the WI performed well in both urban (r2 = 0.78) and rural (r2 = 0.66) areas, and at the national level (r2 = 0.76), the performance of RS-only and CDR-only models was more context-dependent. For example, PPI and income models did not improve predictions in urban areas, but in rural areas the RS-only models performed nearly as well for both indicators. The fine spatial granularity of the resultant poverty estimates is shown in figure 2, which maps the predicted distribution of poverty for all three measures. Spatially, the models exhibit higher uncertainty where fewer data are available, such as the peninsular areas surrounding Chittagong in the southeast where mobile towers are sparse. We also find that explicitly modelling the spatial covariance in the data was critically important. This resulted in improved predictions, lower error and better measures of fit based on cross-validation and the deviance information criterion (DIC), a hierarchical modelling generalization of the AIC [62] (electronic supplementary material, tables S3 and S4).

Table 1.

Cross-validation statistics based on a random 20% test subset of data for all Bayesian geostatistical models.

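stratum        poverty metric  model   r2    RMSE
whole country  DHS WI          CDR–RS  0.76  0.394
whole country  DHS WI          CDR     0.64  0.483
whole country  DHS WI          RS      0.74  0.413
whole country  PPI             CDR–RS  0.25  57.907
whole country  PPI             CDR     0.23  58.562
whole country  PPI             RS      0.32  57.439
whole country  income          CDR–RS  0.27  105.465
whole country  income          CDR     0.24  107.155
whole country  income          RS      0.22  108.682
urban          DHS WI          CDR–RS  0.78  0.424
urban          DHS WI          CDR     0.70  0.552
urban          DHS WI          RS      0.71  0.433
urban          PPI             CDR–RS  0.00  60.128
urban          PPI             CDR     0.03  60.935
urban          PPI             RS      0.00  60.384
urban          income          CDR–RS  0.15  168.452
urban          income          CDR     0.15  172.738
urban          income          RS      0.05  176.705
rural          DHS WI          CDR–RS  0.66  0.402
rural          DHS WI          CDR     0.50  0.483
rural          DHS WI          RS      0.62  0.427
rural          PPI             CDR–RS  0.18  57.397
rural          PPI             CDR     0.17  57.991
rural          PPI             RS      0.21  57.162
rural          income          CDR–RS  0.14  81.979
rural          income          CDR     0.13  82.773
rural          income          RS      0.23  76.527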

Separating estimation by urban and rural regions further highlights the importance of different data in different contexts (electronic supplementary material, tables S2a–c). Night-time lights, transport time to the closest urban settlement, and elevation were important nationally and in rural models; climate variables were also important in rural areas. Distances to roads and waterways were significant in urban and rural strata. In general, the addition of CDR data did not change the selection of RS covariates at any level. Top-up features derived from recharge amounts and tower averages were significant in every model, affirming their importance in poverty work. People predicted to be poorer top up their phones more frequently and in smaller amounts. Per cent nocturnal calls, and count and duration of SMS traffic were significant nationally. Mobility and social network features were important in all three strata. In urban areas, SMS traffic was important, whereas multimedia messaging and video attributes were key in rural areas.

Models were most successful at reconstructing the WI to model poverty (r2 = 0.76); consumption-based and income-based poverty proved more elusive. WI models have better fit, lower error and higher explained variance based on out-of-sample validation (figure 3). Combined CDR–RS data produced the best WI models and lowest error (r2 (CDR–RS) = 0.76, r2 (RS) = 0.74, r2 (CDR) = 0.64; RMSE (CDR–RS) = 0.394, RMSE(RS) = 0.413, RMSE(CDR) = 0.483). However, for the PPI models, the best model predicting the probability of falling below $2.50/day was the RS-only model (figure 2b,e, r2 (RS) = 0.32; RMSE(RS) = 57.439). The model discerns many urban areas but also predicts areas with very low poverty likelihood and high uncertainty outside urban areas, especially around Sylhet in the northeast. Income predictions (figure 2c,f) show greater variation across the country, and the best national model was for combined CDR–RS data (r2 (CDR–RS) = 0.27, RMSE (CDR–RS) = 105.465).

Figure 3.

Out-of-sample observed versus predicted values for (a) DHS WI using mobile phone and remote sensing data: r2 = 0.76, n = 117, p < 0.001, RMSE = 0.394; (b) Progress out of Poverty Index using remote sensing data: r2 = 0.32, n = 100, p < 0.001, RMSE = 57.439; and (c) income using mobile phone and remote sensing data: r2 = 0.27, n = 1384, p < 0.001, RMSE = 105.465.

The resulting predictions line up well with existing SAE estimates for Bangladesh, and with high-resolution maps of slum areas in Dhaka. The urban CDR–RS model has the highest explained variance for any model (r2 (CDR–RS_urb) = 0.78) and the urban CDR-only model outperforms the national CDR-only model (r2 (CDR_urb) = 0.70). Precision and accuracy are slightly lower, but the improved correlation highlights the advantage of using CDRs within a diverse urban population. To explore this further, we compared our WI predictions against a spatially explicit dataset of slum areas in Dhaka [63] (figure 4). We find the mean predicted WI of slum and non-slum areas to be significantly different, t615 = −17.2, p < 0.001, predicting slum areas to be poorer than non-slum areas.
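A comparison of two group means of this kind can be sketched with a pooled-variance (Student's) two-sample t statistic, for which the degrees of freedom equal the total number of observations minus two. The predicted values below are hypothetical and far fewer than in the study; the choice of the pooled-variance form is an assumption based on the reported Student's t-distribution with 615 degrees of freedom.

```python
import math

def pooled_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    ss_a = sum((x - ma) ** 2 for x in sample_a)
    ss_b = sum((x - mb) ** 2 for x in sample_b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (ma - mb) / standard_error, df

# Hypothetical predicted mean WI for polygons in slum and non-slum areas.
slum = [-0.6, -0.4, -0.5, -0.7]
non_slum = [0.1, -0.1, 0.0, 0.2, -0.2]

t_stat, df = pooled_t(slum, non_slum)
print(t_stat, df)
```

A negative statistic, with slum areas as the first group, indicates a lower predicted WI in slum areas, matching the direction of the reported result.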

Figure 4.

Comparison of predicted mean DHS WI values between slum and non-slum areas in Dhaka as delineated by Gruebner et al. [63]: t615 = −17.2, p < 0.001. The 95% confidence interval using Student's t-distribution with 615 degrees of freedom is (−0.48, −0.38).

To compare our method to previous poverty estimates at administrative level 3 (upazila), we used the same methodology at the lower spatial resolution, using the upazila boundaries to form the random spatial effect in the model, and covariates from the best national level model for each poverty measure. We find strong correlations (r = −0.91 and −0.86 for the WI; 0.99 and 0.97 for the PPI; and −0.96 and −0.94 for income, respectively, p < 0.001 for all models) between our upazila predictions and earlier estimates of poverty derived from SAE techniques based on data from the 2010 Household Income and Expenditure Survey (HIES) and 2011 census [64] (figure 5). The r-values reported for WI and income are negative at administrative level 3 because as the proportion of people below the poverty line as estimated by Ahmed et al. decreases, the WI value and income in USD of the sampled population increase. That is, people who are wealthier as estimated by the WI and income data are also less likely to live below the poverty line according to earlier estimates. The geostatistical method presented here thus accurately maps heterogeneities at small spatial scales while correlating well with earlier coarser estimates. All remaining WI, PPI and income prediction maps are provided in the electronic supplementary material.

Figure 5.

Comparison of the proportion of people falling below upper (circles) and lower (triangles) poverty lines estimated by Ahmed et al. [64] and (a) predicted mean WI using mobile phone and remote sensing data, (b) predicted probability of being below $2.50 per day using remote sensing data and (c) predicted income using mobile phone and remote sensing data. All models were predicted at the upazila scale (Admin unit 3). Pearson's r correlations: −0.91 and −0.86 for the WI; 0.99 and 0.97 for the PPI; and −0.96 and −0.94 for income, respectively (p < 0.001 for all models).

4. Discussion

This work represents the first attempt to build predictive maps of poverty using a combination of CDR and RS data. The results demonstrate that CDR-only and RS-only models perform comparably in their ability to map poverty indicators, and that integrating these data sources provides improvement in predictive power and lower error. These results are promising as the CDR data here produce accurate, high-resolution estimates in urban areas not possible using RS data alone. As such, CDRs potentially allow for estimation of wealth at much finer granularity—including the neighbourhood or even the household or individual—than the current generation of RS technologies [36]. While CDRs are proprietary data, they are increasingly used in research, and have formed the basis for hundreds of published articles over the past few years [65]. They also provide significant advantages in temporal granularity: CDRs update in real-time versus RS data, which update far less frequently. Although in this study we have not used dynamic validation data, it is a clear future application for CDRs in real-time to better comprehend the dynamic nature of poverty.

The higher accuracy of predictions for the asset-based WI over other poverty metrics is presumably due to several factors. The predictive power for assets has been shown to be higher than for consumption [35], in addition to the aforementioned issues of survey question wording and response options [20,23]. Further, income and consumption can vary hugely by day or week, and can be related to changes in household size, job loss or gain, piecework or harvest outcomes. Assets and housing characteristics are generally considered more stable [20–22]. For the datasets used in this study, WI data are based on clusters of households, and this sampling strategy provides more robust estimates and less variability than the individually based PPI and income data. Greater success in predicting the WI is also presumably due to the WI measuring a wider range of living standards across the population. That is, the full range of the distribution from poorest to wealthiest in the population is represented in these data. By contrast, by considering a streamlined 10 questions, the PPI is meant to identify the poorest individuals in a population. Similarly, in the income data, there were very few respondents in higher income categories.

The higher error associated with CDR-only models is not surprising considering the noise inherent in these data. CDR features are derived from daily and weekly measurements aggregated over short temporal intervals, while RS covariates are generally comprised of long-term averages or comparatively less dynamic measures of location and access such as roads or proximity to urban centres. Bearing this in mind, we find CDR data useful for estimating poverty in the absence of ancillary datasets.

Our findings provide further support for correlations between socio-economic measures and night-time light intensity [36,48,49], access to roads and cities [50,66], entropy of contacts [37,40] and mobility features [39]. The universal coverage of cell towers across Bangladesh made it possible to predict poverty at high resolution in both urban and rural areas. Within urban areas, the high correlation with maps of slums in Dhaka suggests we are capturing the poorest populations. Even if the poorest populations are not generating call data [36], and thus not included in the CDRs, we still see a clear difference in WI predictions between slum and non-slum areas using tower level CDR aggregates. This finding extends recent work which predicted wealth and poverty at the district level, but was unable to verify predictions at finer scales [36].

Using CDRs and RS data within BGMs to produce accurate, high-resolution poverty maps in LMICs offers a way to complement census-based methods and provide more regular updates. Regularly updated poverty estimates are necessary to enable subnational monitoring of the SDGs during intercensal years and are critical to ensure mobilization of resources to end poverty in all its dimensions as set out in SDG 1. Poverty estimates are time-sensitive and become obsolete when factors such as migration rates, infrastructure and market integration change [67]. Furthermore, the methods presented here offer a workaround to estimating poverty with household survey data, which can be time-consuming and expensive to obtain.

To end poverty in all its dimensions, it is likely that methods that exploit information from, and correlations between, many different data sources will provide the greatest benefit in understanding the distribution of human living conditions. BGMs provide a rigorous framework for leveraging data with differing sample sizes and temporal and spatial scales. This study further provides an example of how aggregated CDR data can be processed in such a way that detailed maps can be created without revealing sensitive user and commercial information. As insights from CDRs and other remote sensing data become more widely available, analysing these data at regular intervals could allow for dynamic poverty mapping and provide the means for operationally monitoring poverty. The combination of spatial detail and frequent, repeated measurements may distinguish the transitorily poor from the chronically poor, and allow for monitoring economic shocks [68]. This offers the potential for a fuller characterization of the spatial distribution of poverty and provides the foundation for evidence-based strategies to eradicate poverty. Researchers would do well to combine the additional information and granularity afforded by CDR data with matched individual-based consumption data to infer further novel and useful information from mobile data.

Supplementary Material

Supplementary Information (SI)

Acknowledgements

The authors gratefully acknowledge Dr Elisabeth zu Erbach-Schoenberg, Dr Nick Ruktanonchai and Dr Alessandro Sorichetta for useful discussions. We also wish to thank two anonymous reviewers for providing useful comments. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Authors' contributions

J.E.S. was responsible for research design, production of RS covariates, data cleaning and processing, statistical analyses, interpretation, drafting and production of the final manuscript. C.P. was responsible for survey data management, cleaning and processing, and interpretation and drafting of the final manuscript. P.S., J.Bj. and K.E. were responsible for CDR data management, cleaning, and production of CDR data, and interpretation and drafting of the final manuscript. J.Bl. was responsible for interpretation, drafting, and production of the final manuscript. V.A., T.B., Y.M., X.L. and E.W. were responsible for interpretation and production of the final manuscript. A.I. and K.N.H. were responsible for income survey data collection, management and processing. A.J.T. and L.B. were responsible for overall scientific management, interpretation and production of the final manuscript. All authors gave final approval for publication.

Competing interests

We declare we have no competing interests.

Funding

J.E.S., E.W. and J.Bl. are supported by the Bill & Melinda Gates Foundation (OPP1106936). C.P. is supported by the Bill & Melinda Gates Foundation (OPP1106427). X.L. acknowledges the Natural Science Foundation of China under grant nos. 71301165 and 71522014. A.J.T. is supported by funding from NIH/NIAID (U19AI089674), the Bill & Melinda Gates Foundation (OPP1106427, 1032350, OPP1134076 and OPP1094793), the Clinton Health Access Initiative, National Institutes of Health, and a Wellcome Trust Sustaining Health grant (106866/Z/15/Z). L.B. acknowledges the Swedish Research Council, grant no. D0313701. This work forms part of the WorldPop Project (www.worldpop.org.uk) and Flowminder Foundation (www.flowminder.org).

References

  • 1. Cruz M, Foster J, Quillin B, Schellekens P. 2015. Ending extreme poverty and sharing prosperity: progress and policies. 83. Washington, DC: World Bank Development Economics Group.
  • 2. Målqvist M. 2015. Abolishing inequity, a necessity for poverty reduction and the realisation of child mortality targets. Arch. Dis. Child. 100, S5–S9. (doi:10.1136/archdischild-2013-305722)
  • 3. Population and poverty | UNFPA - United Nations Population Fund. 2016. See http://www.unfpa.org/resources/population-and-poverty (accessed: 21 January 2016).
  • 4. Braithwaite A, Dasandi N, Hudson D. 2014. Does poverty cause conflict? Isolating the causal origins of the conflict trap. Confl. Manag. Peace Sci. 33, 45–66. (doi:10.1177/0738894214559673)
  • 5. United Nations General Assembly. 2015. Transforming our world: the 2030 Agenda for Sustainable Development.
  • 6. Chambers R. 2006. What is poverty? Who asks? Who answers? Poverty in Focus December 2006, UNDP International Poverty Centre. See http://opendocs.ids.ac.uk/opendocs/handle/123456789/120.
  • 7. Deaton A, Zaidi S. 1999. Guidelines for constructing consumption aggregates for welfare analysis. Princeton, NJ: Woodrow Wilson School - Development Studies.
  • 8. Kuznets S. 1955. Economic growth and income inequality. Am. Econ. Rev. 45, 1–28.
  • 9. Harsanyi J. 1955. Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. J. Polit. Econ. 63, 309–321. (doi:10.1086/257678)
  • 10. Alkire S, Foster J. 2011. Understandings and misunderstandings of multidimensional poverty measurement. J. Econ. Inequal. 9, 289–314. (doi:10.1007/s10888-011-9181-4)
  • 11. Coudouel A, Hentschel J, Wodon Q. 2002. Poverty measurement and analysis.
  • 12. Babu S, Gajanan SN, Sanyal P. 2014. Food security, poverty and nutrition policy analysis: statistical methods and applications. New York, NY: Academic Press.
  • 13. Ravallion M. 1998. Poverty lines in theory and practice, pp. 1–53. Washington, DC: The World Bank.
  • 14. Ruggeri Laderchi C, Saith R, Stewart F. Does it matter that we don't agree on the definition of poverty? A comparison of four approaches. Queen Elizabeth House Working Paper Series. Oxford, UK: University of Oxford.
  • 15. Falkingham J, Namazie C. 2002. Measuring health and poverty: a review of approaches to identifying the poor. London, UK: DFID Health Systems Resource Centre.
  • 16. Hulme D, Shepherd A. 2003. Conceptualizing chronic poverty. World Dev. 31, 403–423. (doi:10.1016/S0305-750X(02)00222-X)
  • 17. Foster J, Greer J, Thorbecke E. 1984. A class of decomposable poverty measures. Econometrica 52, 761–766. (doi:10.2307/1913475)
  • 18. Ligon E, Schechter L. 2003. Measuring vulnerability. Econ. J. 113, C95–C102. (doi:10.1111/1468-0297.00117)
  • 19. Banerjee A, Duflo E, Chattopadhyay R, Shapiro J. 2010. Targeting the hard-core poor: an impact assessment. Poverty Action Lab. See https://www.povertyactionlab.org/sites/default/files/publications/110-%20November%202011_0.pdf.
  • 20. Schreiner M. 2011. Estimating expenditure-based poverty from the Bangladesh demographic and health survey — MEASURE evaluation. Bangladesh J. Dev. Stud. 34, 65–94.
  • 21. Filmer D, Pritchett LH. 2001. Estimating wealth effects without expenditure data--or tears: an application to educational enrollments in states of India. Demography 38, 115–132.
  • 22. Sahn DE, Stifel D. 2003. Exploring alternative measures of welfare in the absence of expenditure data. Rev. Income Wealth 49, 463–489. (doi:10.1111/j.0034-6586.2003.00100.x)
  • 23. Foreit K, Schreiner M. 2011. Comparing alternative measures of poverty: assets-based wealth index vs. expenditures-based poverty score — MEASURE evaluation. Chapel Hill, NC: University of North Carolina at Chapel Hill.
  • 24. Liverpool-Tasie LSO, Winter-Nelson A. 2011. Asset versus consumption poverty and poverty dynamics in rural Ethiopia. Agric. Econ. 42, 221–233. (doi:10.1111/j.1574-0862.2010.00479.x)
  • 25. Jerven M. 2013. Poor numbers: how we are misled by African development statistics and what to do about it. Ithaca, NY: Cornell University Press.
  • 26. Hentschel J, Lanjouw JO, Lanjouw P, Poggi J. 1998. Combining census and survey data to study spatial dimensions of poverty. Washington, DC: The World Bank Development Research Group and Poverty Reduction and Economic Management Network Poverty Division.
  • 27. Elbers C, Lanjouw JO, Lanjouw P. 2002. Micro-level estimation of poverty and inequality. Econometrica 71, 355–364. (doi:10.1111/1468-0262.00399)
  • 28. Elbers C, Lanjouw JO, Lanjouw P. 2002. Micro-level estimation of welfare. Washington, DC: World Bank Development Research Group.
  • 29. Tatem AJ, Gething PW, Pezzulo C, Weiss D, Bhatt S. 2014. Development of high-resolution gridded poverty surfaces. See http://www.worldpop.org.uk/resources/docs/Poverty-mapping-report.pdf.
  • 30. Sedda L, et al. 2015. Poverty, health and satellite-derived vegetation indices: their inter-spatial relationship in West Africa. Int. Health 7, 99–106. (doi:10.1093/inthealth/ihv005)
  • 31. Pozzi F, Robinson T, Nelson A. 2009. Accessibility mapping and rural poverty in the horn of Africa. Rome, Italy: Food and Agriculture Organization of the United Nations.
  • 32. Robinson T, Pozzi F. 2008. Poverty and welfare measures in the Horn of Africa. Rome, Italy: IGAD Livestock Policy Initiative.
  • 33. Rogers D, Emwanu T, Robinson T. 2006. Poverty mapping in Uganda: an analysis using remotely sensed and other environmental data. Rome, Italy: Food and Agriculture Organization of the United Nations.
  • 34. Okwi PO, et al. 2007. Spatial determinants of poverty in rural Kenya. Proc. Natl Acad. Sci. USA 104, 16 769–16 774. (doi:10.1073/pnas.0611107104)
  • 35. Jean N, et al. 2016. Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794. (doi:10.1126/science.aaf7894)
  • 36. Blumenstock J, Cadamuro G, On R. 2015. Predicting poverty and wealth from mobile phone metadata. Science 350, 1073–1076.
  • 37. Smith-Clarke C, Mashhadi A, Capra L. 2014. Poverty on the cheap: estimating poverty maps using aggregated mobile communication networks. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Toronto, Ontario, Canada, pp. 511–520. New York, NY: ACM.
  • 38. Frias-Martinez V, Virseda J. 2012. On the relationship between socio-economic factors and cell phone usage. In Proc. of the Fifth Int. Conf. on Information and Communication Technologies and Development, Atlanta, GA, pp. 76–84. New York, NY: ACM.
  • 39. Soto V, Frias-Martinez V, Virseda J, Frias-Martinez E. 2011. Prediction of socioeconomic levels using cell phone records. In User modeling, adaption and personalization (eds Konstan JA, Conejo R, Marzo JL, Oliver N), pp. 377–388. Berlin, Heidelberg: Springer.
  • 40. Eagle N, Macy M, Claxton R. 2010. Network diversity and economic development. Science 328, 1029–1031. (doi:10.1126/science.1186605)
  • 41. Blumenstock JE, Eagle N. 2012. Divided we call: disparities in access and use of mobile phones in Rwanda. Inf. Technol. Int. Dev. 8, 1–16.
  • 42. Desiere S, Vellema W, D'Haese M. 2015. A validity assessment of the Progress out of Poverty Index (PPI)™. Eval. Program Plann. 49, 10–18. (doi:10.1016/j.evalprogplan.2014.11.002)
  • 43. Okabe A, Boots B, Sugihara K, Chiu SN. 2009. Spatial tessellations: concepts and applications of voronoi diagrams. Chichester, UK: John Wiley & Sons.
  • 44. Intermedia. 2014. Intermedia Financial Inclusion Insight Project Bangladesh - Steps Toward Financial Inclusion 2014. See http://finclusion.org/.
  • 45. Rutstein S. 2008. The DHS wealth index: approaches for rural and urban areas. Calverton, MD: United States Agency for International Development.
  • 46. Schreiner M. 2013. A Simple Poverty Scorecard for Bangladesh. See http://microfinance.com/English/Papers/Scoring_Poverty_Bangladesh_2010_EN.pdf.
  • 47. Grameenphone, Bangladesh. 2015. Telenor Group. See http://www.telenor.com/investors/company-facts/business-description/grameenphone-bangladesh/ (accessed: 15 November 2015).
  • 48. Noor AM, Alegana VA, Gething PW, Tatem AJ, Snow RW. 2008. Using remotely sensed night-time light as a proxy for poverty in Africa. Popul. Health Metr. 6, 5. (doi:10.1186/1478-7954-6-5)
  • 49. Ghosh T, Anderson SJ, Elvidge CD, Sutton PC. 2013. Using nighttime satellite imagery as a proxy measure of human well-being. Sustainability 5, 4988–5019. (doi:10.3390/su5124988)
  • 50. Watmough GR, Atkinson PM, Saikia A, Hutton CW. 2016. Understanding the evidence base for poverty–environment relationships using remotely sensed satellite data: an example from Assam, India. World Dev. 78, 188–203. (doi:10.1016/j.worlddev.2015.10.031)
  • 51. Murtaugh PA. 2009. Performance of several variable-selection methods applied to real ecological data. Ecol. Lett. 12, 1061–1068. (doi:10.1111/j.1461-0248.2009.01361.x)
  • 52. glmulti: An R package for easy automated model selection with (generalized) linear models. J. Stat. Softw. See http://www.jstatsoft.org/article/view/v034i12 (accessed: 21 January 2016).
  • 53. R Core Team. 2015. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
  • 54. Blangiardo M, Cameletti M, Baio G, Rue H. 2013. Spatial and spatio-temporal models with R-INLA. Spat. Spatio-Temporal Epidemiol. 4, 33–49. (doi:10.1016/j.sste.2012.12.001)
  • 55. Blangiardo M, Cameletti M. 2015. Spatial and spatio-temporal Bayesian models with R - INLA. New York, NY: John Wiley & Sons.
  • 56. Sørbye SH, Rue H. 2014. Scaling intrinsic Gaussian Markov random field priors in spatial modelling. Spat. Stat. 8, 39–51. (doi:10.1016/j.spasta.2013.06.004)
  • 57. Besag J, Kooperberg C. 1995. On conditional and intrinsic autoregressions. Biometrika 82, 733–746.
  • 58. Rue H, Martino S, Chopin N. 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71, 319–392. (doi:10.1111/j.1467-9868.2008.00700.x)
  • 59. Rue H, Martino S. 2007. Approximate Bayesian inference for hierarchical Gaussian Markov random field models. J. Stat. Plan. Inference 137, 3177–3192. (doi:10.1016/j.jspi.2006.07.016)
  • 60. The R-INLA project. 2016. Latent models. See http://www.r-inla.org/models/latent-models (accessed: 21 January 2016).
  • 61. The R-INLA project. 2015. Besag model for spatial effects. See http://www.math.ntnu.no/inla/r-inla.org/doc/latent/besag.pdf.
  • 62. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. 2002. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 64, 583–639. (doi:10.1111/1467-9868.00353)
  • 63. Gruebner O, et al. 2014. Mapping the slums of Dhaka from 2006 to 2010. Dataset Pap. Sci. 2014, e172182.
  • 64. Ahmad N, et al. 2010. Technical report, pp. 1–56. Singapore: The World Bank.
  • 65. Blondel VD, Decuyper A, Krings G. 2015. A survey of results on mobile phone datasets analysis. EPJ Data Sci. 4, 10. (doi:10.1140/epjds/s13688-015-0046-0)
  • 66. Barrett CB. 2005. Rural poverty dynamics: development policy implications. Agric. Econ. 32, 45–60. (doi:10.1111/j.0169-5150.2004.00013.x)
  • 67. Bedi T, Coudouel A, Simler K. 2007. More than a pretty picture: using poverty maps to design better policies and interventions. Washington, DC: World Bank Publications.
  • 68. Toole JL, et al. 2015. Tracking employment shocks using mobile phone data. J. R. Soc. Interface 12, 20150185. (doi:10.1098/rsif.2015.0185)

Articles from Journal of the Royal Society Interface are provided here courtesy of The Royal Society
