Abstract
High-resolution poverty maps are important tools for promoting equitable and sustainable development. In settings without data at every location, we can use spatial interpolation (SI) to create such maps using sample-based surveys and additional covariates. In the model-based geostatistics (MBG) framework for SI, it is typically assumed that the similarity of two areas is inversely related to the distance between them. Applications of spline interpolation take the contrasting view that an area's absolute location and its characteristics are more important for prediction than the distance to, or characteristics of, other locations. This study compares prediction accuracy of the MBG approach with spline interpolation as part of a generalized additive model (GAM) for four low- and middle-income countries. We also identify any potentially generalizable data characteristics influencing comparative accuracy. We found spatially scattered pockets of wealth in Malawi and Tanzania (corresponding to the major cities), and overarching spatial gradients in Kenya and Nigeria. Spline interpolation/GAM performed better than MBG for Malawi, Nigeria and Tanzania, but marginally worse in Kenya. We conclude that the spatial patterns of wealth and other covariates should be carefully accounted for when choosing the best SI approach. This is particularly pertinent as different methods capture geographical variation differently.
Keywords: spatial interpolation, poverty mapping, wealth inequality, low- and middle-income countries
1. Introduction
Poverty has strong associations with adverse health outcomes, lost human potential and societal instability [1,2]. The international community and national governments that signed on to the Sustainable Development Goals are committed to eradicating poverty in all its forms. To achieve this goal, it is helpful to have information on where affected individuals and communities reside at refined spatial scales since aggregate data can conceal heterogeneity and any underlying patterns. Subnational poverty maps that describe spatial patterns of poverty and inequality across a country can help more effectively allocate resources and implement targeted interventions to attain higher levels of wealth and welfare among the most deprived.
Data on poverty indicators at refined geographical scales across a country are regularly collected via censuses. However, decennial censuses are too infrequent to enable timely monitoring and tracking. National surveys are generally more frequent but representative only for coarse spatial units. To overcome these constraints, Elbers and colleagues extended the techniques of small area estimation (SAE) to both types of data in 2000 [3]. The SAE method for area-unit mapping identifies comparable census and survey variables, models the desired attribute (e.g. poverty) using the survey variables common to the census, and computes poverty on small geographical partitions (e.g. enumeration area, village or hamlet) across the country based on the model obtained and the census predictor variables. Poverty/welfare is assumed to be uniform within each target region. The method has been widely used to produce subnational choropleth maps of poverty indicators in many low- and middle-income countries (LMICs) [4–7]. Other areal interpolation techniques include the dasymetric modelling method, which takes advantage of ancillary data to better approximate and redistribute count data within each target area [8]. This method has been demonstrated for accurate population mapping by age, sex and race [9], and may be extended to other socioeconomic characteristics.
The availability of georeferenced data has drastically increased in recent years. The Demographic and Health Survey (DHS) Program, for instance, has gained a reputation for collecting and providing georeferenced data on core development indicators in LMICs over the last few decades. Better data, coupled with new geographical information system (GIS) analytical techniques, have fuelled interest among researchers in improving output quality, including using spatial interpolation (SI) modelling techniques to create high-resolution gridded map surfaces. SI assumes that the variable of interest (e.g. poverty) has a meaningful value at every location within a study region, which is typically divided into small non-overlapping grid squares. SI techniques are then employed to predict values at every grid square based on the sampled data and, where applicable, auxiliary covariates. Many multivariate spatial statistical methods have been applied in studies using DHS georeferenced data for spatial modelling and interpolation, such as SAE, kriging, autoregressive methods and model-based geostatistics (MBG) [10]. In 2013, the DHS Spatial Interpolation Working Group assessed various properties of these SI methods, e.g. computational efficiency, the ability to account for non-stationary variance and the inclusion of optimal covariate selection procedures. The Working Group proposed the MBG approach as the most suitable for creating interpolated surfaces [10–12]. The incorporation of uncertainty into the modelling framework was seen as a compelling strength of MBG [10].
In 2014, the WorldPop project and partners pioneered the creation of high-resolution gridded map surfaces of the estimated proportions of people living under the USD1.25 and USD2.00 poverty thresholds with DHS data using a Bayesian MBG approach [13]. Poverty map surfaces were drawn for Kenya, Tanzania, Uganda and Pakistan, among others [13]. Furthermore, maps of population age structure [14], fertility indicators [15], malaria indicators [16,17] and other health indicators (e.g. childhood vaccination, childhood malnutrition, household access to improved source of drinking water and sanitation [12,18]) were also produced using the same framework.
The MBG methodology is detailed elsewhere [19]. Briefly, it divides spatial variation into three components—deterministic variation, spatial autocorrelation and random noise [20]. The deterministic variation of the phenomenon of interest is modelled as a set of covariates, while spatial autocorrelation refers to a variable's relationship with itself in space [21]. It is generally assumed that nearer neighbours are more related to each other than more distant counterparts. Such positive autocorrelation structure is defined and used as part of the MBG approach to explain variation in the data and make more accurate predictions at unsampled locations across the map region. Non-stationarity and other localized effects can be dealt with when implementing MBG via, for instance, optimal estimation of the covariance matrix or a Bayesian partition model [22], but the method remains most widely used for phenomena that are more similar as a function of the distance separating the sampled locations in practice [23–25].
On the other hand, spline interpolation is grounded in a slightly different theoretical viewpoint. Spline interpolation assumes that the interpolation function should pass through (or close to) the data points while being as smooth as possible. Spline interpolation can be conceptualized as bending a sheet of rubber through the observations in three-dimensional space. In this method, the geographical structure of the mapped phenomenon is not explicitly formulated. Researchers have incorporated spline spatial interpolation in a generalized additive model (GAM) formulation with the geographical coordinates (e.g. longitude and latitude) and other covariates to create interpolated map surfaces. In this GAM framework, each predictor variable is related to the outcome via a smoothed function, then all functions are added to predict the link function. Insurance pricing [26], property pricing [27], lexical data [28] and fish ecology [29,30], just to name a few different outcomes, have been mapped using this robust method in the literature.
The assumptions about the underlying variation in the sampled data, the choice of method and the parameters used can be critical to SI prediction accuracy [20]. Individuals and households with common characteristics sometimes cluster together either by choice or due to social, economic, geographical or political forces [31]. The assumption of spatial autocorrelation in wealth may be valid, as poverty tends to concentrate in mountainous regions, arid land and land-locked areas, and to level off closer to the national/financial capitals, bodies of water and coastal areas [32]. In recent years, however, the emergence of secondary cities in many LMICs may have led to a certain degree of within-country redistribution of the population, economic opportunities and wealth [33]. Secondary cities are fast-developing regional hubs that provide critical support functions for governance, production services and transportation. Sometimes the locations of these cities are deliberately planned for deprived regions. Thus, a rather complex spatial structure of towns and cities might be expected, which raises concerns regarding the quality of interpolation when (positive) spatial autocorrelation is assumed and used for prediction.
The way in which wealth is distributed across the map region likely affects prediction accuracy of the poverty maps made using existing SI approaches to different extents. We present an analysis comparing the performance of spline interpolation as part of GAM-based fitting with multivariate MBG for four LMICs in sub-Saharan Africa. The result of this comparative analysis will empirically reveal the data characteristics that contribute to any discrepancies in prediction accuracy found between methods. This will in turn shed light on the suitability of the two methods for the creation of interpolated poverty maps.
2. Data and methods
2.1. Study area
We studied four LMICs in sub-Saharan Africa—Kenya, Malawi, Nigeria and Tanzania. These countries were selected based on available data and variability in terms of geography and economy. National statistics on wealth and economics of the four countries according to The World Bank [34] and International Labour Organization [34] are presented in table 1.
Table 1.
indicator | Kenya | Malawi | Nigeria | Tanzania |
---|---|---|---|---|
total area (km²) | 580 367 | 118 484 | 923 768 | 947 300 |
% land area | 98.1 | 79.4 | 98.6 | 93.5 |
national population (million)ᵃ | 47.2 | 17.6 | 181.2 | 53.9 |
% urban populationᵃ | 26 | 16 | 48 | 32 |
population annual growth rate (%)ᵃ | 2.6 | 2.9 | 2.6 | 3.1 |
unemployment rate (%)ᵃ | 11.9 | 6.4 | 4.3 | 2.1 |
GDP per capita, PPP (international dollar)ᵃ | 3020 | 1159 | 6039 | 2653 |
GDP annual growth rate (%)ᵃ | 5.7 | 2.8 | 2.7 | 7.0 |
GDP composition (%)ᵃ | | | | |
agriculture | 33.3 | 29.7 | 20.9 | 31.5 |
industry | 19.1 | 16.0 | 20.4 | 26.4 |
services | 47.6 | 54.3 | 58.8 | 42.2 |
labour force by occupation (%)ᵇ | | | | |
agriculture | 38.0 | 84.7 | 2.1 | 66.7 |
industry | 14.3 | 8.4 | 19.5 | 6.0 |
services | 47.8 | 6.9 | 78.5 | 27.3 |
Gini indexᶜ | 48.5 | 46.1 | 43.0 | 37.8 |
ᵃ Data for 2015.
ᵇ International Labour Organization modelled estimates for 2017.
ᶜ Most recent data available from http://databank.worldbank.org (last accessed: 31 March 2018).
2.2. Data
We used the most recent DHS as of October 2017. The DHS collects nationally representative data on population health and sociodemographic characteristics using a multi-stage cluster sampling design with enumeration area as the primary sampling unit (PSU). As part of the DHS sampling procedure, a list of established households in each sampled PSU is obtained and used as the sampling frame for household selection [35]. The surveys include the longitude and latitude coordinates of the population centroids of sampled PSUs. The accuracy of these locations is estimated within 15 m [36]. For anonymity considerations, urban clusters are displaced up to 2 km and rural clusters up to 5 km [37]. The displaced point is then checked to ensure that it falls within the boundaries of the first administrative region, and re-displaced if necessary [37].
For each DHS, a household wealth index (WI) is computed from a range of consumer durables, access to services and housing materials via a principal component analysis [38]. The WI is widely adopted in LMICs as an indicator of socioeconomic position that describes a household's cumulative living standard within an individual survey [39]. The index is also broadly used for assessing pro-poor targeting and inequality, and, where relevant, for controlling for socioeconomic confounding [40]. To approximate poverty at the PSU level, we used the average household WI (rescaled by a factor of 10⁻⁶) with adjustment for survey-specific weights as outlined by DHS manuals. We present descriptive results of the data distribution and spatial pattern of WI of the four study countries.
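The PSU-level aggregation can be illustrated with a minimal sketch. It assumes that the weight adjustment amounts to a weighted mean of household WI within each PSU; the function name, record layout and example values are hypothetical and are not taken from the DHS manuals.

```python
from collections import defaultdict


def psu_weighted_mean(records):
    """Weighted mean wealth index per PSU.

    records: iterable of (psu_id, wealth_index, weight) tuples.
    Assumption: the survey weight enters as a simple weighted mean.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for psu_id, wealth_index, weight in records:
        weighted_sum[psu_id] += weight * wealth_index
        weight_total[psu_id] += weight
    return {psu: weighted_sum[psu] / weight_total[psu] for psu in weighted_sum}


# Hypothetical household records: (PSU identifier, rescaled WI, weight).
households = [
    ("psu_1", -0.50, 1.0),
    ("psu_1", 0.10, 3.0),
    ("psu_2", 1.20, 2.0),
]
psu_means = psu_weighted_mean(households)
```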
2.3. Model covariates
We identified and assembled a collection of remote sensing covariates based on those used by others to generate maps of multidimensional poverty [13]. In general, the accuracy of these data to provide up-to-date indication on welfare and living conditions is considered acceptable [13,41–45]. We included data on population density from version four of the Gridded Population of the World (GPW) [46], on daytime land surface temperature [47] and vegetation index [48] from the NASA Earth Observations (NEO), on elevation data from the United States Geological Survey (USGS) [49], rasterized surfaces of Global Potential Evapotranspiration and Global Aridity Index from the Consortium for Spatial Information at the Consultative Group for International Agricultural Research (CGIAR-CSI) [50–52], and on night-time light emission from the National Oceanic and Atmospheric Administration (NOAA)/National Geophysical Data Center by the United States Air Force Weather Agency [53,54]. At their finest resolutions, the land surface temperature layer and the vegetation index layer were 0.1 degree grids (approximately 11 km at the equator), while the other covariate layers were 30 arc second grids (approximately 1 km at the equator). Using these files, we extracted covariate values from each raster layer at the georeferenced PSUs, represented as spatial points, via spatial overlaying [55]. That is, we superimposed a spatial layer of the georeferenced PSUs over different covariate layers and obtained covariate values at the corresponding locations. To account for PSU location displacement, averages were obtained from the four nearest raster cells. As a check of sensitivity to alternative analytical scales, these averages were compared to those resulting from applying buffer sizes of 5, 10 and 20 km using Pearson correlation coefficients.
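Averaging over the four nearest raster cells can be sketched as follows. This is an illustration only: it assumes a regular grid held as a nested list, planar coordinates, cell centres at (x0 + column × cell, y0 + row × cell) and a point that lies within the grid. The function name and the toy grid are hypothetical.

```python
import math


def mean_of_four_nearest(grid, x0, y0, cell, x, y):
    """Average the values of the four cell centres nearest to (x, y).

    grid[r][c] is the value of the cell whose centre is at
    (x0 + c * cell, y0 + r * cell). Assumes (x, y) lies within the grid.
    """
    nrows, ncols = len(grid), len(grid[0])
    c0 = math.floor((x - x0) / cell)
    r0 = math.floor((y - y0) / cell)
    candidates = []
    # The four nearest centres always fall inside this 4 x 4 window.
    for r in range(max(r0 - 1, 0), min(r0 + 3, nrows)):
        for c in range(max(c0 - 1, 0), min(c0 + 3, ncols)):
            d2 = (x0 + c * cell - x) ** 2 + (y0 + r * cell - y) ** 2
            candidates.append((d2, grid[r][c]))
    candidates.sort(key=lambda item: item[0])
    nearest = [value for _, value in candidates[:4]]
    return sum(nearest) / len(nearest)


# Toy 3 x 3 covariate layer with unit cells and origin (0, 0).
toy_grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
value = mean_of_four_nearest(toy_grid, 0.0, 0.0, 1.0, 0.5, 0.5)
```

Searching a small window around the point, rather than taking the enclosing 2 × 2 block, matters when a point sits close to one cell centre, because a cell outside the enclosing block can then be nearer than the block's far corner.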
We updated accessibility measures for the current analysis with Natural Earth's free data on ‘populated places’ (v. 4.0.0, released in October 2017), which included national and subnational capitals, as well as places with a population size of at least 50 000 [56]. We calculated the straight-line distance from every included DHS PSU to the nearest populated place. We opted for straight-line distance because, as a proxy for accessibility, it is comparable to more complicated metrics such as mechanized and non-mechanized estimated travel time in LMIC settings [57]. Shapefiles of country administrative areas were obtained from the freely available Database of Global Administrative Areas [58].
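A minimal sketch of the distance-to-nearest-place calculation is given here. It assumes that straight-line distance is measured as great-circle (haversine) distance on a spherical Earth, which the text does not specify; the function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(psu, places):
    """Distance from one PSU (lon, lat) to the closest place in `places`."""
    return min(haversine_km(psu[0], psu[1], lon, lat) for lon, lat in places)


# Hypothetical coordinates, for illustration only.
example_places = [(1.0, 0.0), (5.0, 5.0)]
nearest_km = distance_to_nearest((0.0, 0.0), example_places)
```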
We found missing data in the spatial coordinates of 9 of 1594 PSUs from the Kenya DHS and 7 of 896 PSUs from the Nigeria DHS. These PSUs were removed from the analysis [59]. There were no other missing data. In addition, one PSU data point was removed from the analysis in the Southern Region in Malawi as it had an extreme value of 41 453 for population density, while the median and 75th percentile were 267 and 579 and observations of the nearest neighbours were below 10 000.
2.4. Methods of interpolation
2.4.1. Model-based geostatistics
The MBG model is a class of generalized linear mixed models with an approximation of a multivariate stationary Gaussian process for outcome Z at location si, with mean μ and covariance C for the spatial component, as well as an unstructured component ɛ(si) represented as Gaussian with zero mean and a constant (nugget) variance [19]. The mean μ is modelled using a linear function of the predictor variables, while spatial covariance is written as
$$C(h) = \mathrm{Cov}\{Z(s_1), Z(s_2)\},$$
where s1 and s2 are a pair of sampled locations h units apart. Covariance expresses the amount of variation in the observed Z values at s1 and s2. We separately modelled the spatial dependency structure using a spherical covariance function for each included survey. The spherical covariance function is written in terms of a correlation function ρ(h) as C(h) = σ²ρ(h), where
$$\rho(h) = \begin{cases} 1 - \tfrac{3}{2}\phi h + \tfrac{1}{2}(\phi h)^{3}, & 0 \le h < 1/\phi, \\ 0, & h \ge 1/\phi, \end{cases}$$
with σ² the variance of the spatial process and ϕ the decay parameter [60].
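For illustration, a spherical covariance with decay parameter ϕ can be sketched as below. The sketch assumes the common parameterization in which the correlation is 1 − 1.5ϕh + 0.5(ϕh)³ for h < 1/ϕ and zero beyond that range; the function names are hypothetical and the code is not the spBayes implementation.

```python
def spherical_correlation(h, phi):
    """Spherical correlation at distance h with decay parameter phi.

    Assumed form: 1 - 1.5*(phi*h) + 0.5*(phi*h)**3 for 0 <= h < 1/phi,
    and 0 once h reaches the effective range 1/phi.
    """
    if h < 0 or phi <= 0:
        raise ValueError("h must be non-negative and phi positive")
    u = phi * h
    if u >= 1.0:
        return 0.0
    return 1.0 - 1.5 * u + 0.5 * u ** 3


def spherical_covariance(h, sigma2, phi):
    """Covariance C(h) = sigma2 * rho(h)."""
    return sigma2 * spherical_correlation(h, phi)
```

Under this form, locations further apart than 1/ϕ are treated as uncorrelated, so ϕ controls how quickly the similarity between PSUs decays with distance.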
2.4.2. Generalized additive model using spline spatial interpolation
The spline interpolation consists of polynomials that describe pieces of a surface and are fitted together so that they join smoothly [20]. The Akima method was developed to implement bivariate interpolation onto a grid for irregularly spaced point data using bivariate smoothing techniques [61,62]. The interpolation function should pass through or nearby the observed values at all sampled locations.
For each survey, the interaction between latitude and longitude of the DHS PSUs is used as a predictor variable, together with the aforementioned covariates, in a GAM as smooth functions. The GAM regression technique supports non-Gaussian error distributions and nonlinear relationships between the outcome and predictor variables [63]. GAMs are non-parametric extensions of linear regression models that apply non-parametric smoothers to each predictor and additively calculate the component outcome [63]. A GAM is expressed as
$$g(E[Z]) = \beta_0 + \sum_{i=1}^{p} f_i(X_i),$$
where β0 is the intercept and p the number of predictor terms. We use the identity link g(.) to relate the linear predictor with the expected value of the response Z. For each predictor variable Xi, a smoothing function fi is found. GAM can provide fit for a linear, nonlinear and non-monotonic relationship. We specified each term as a penalized thin plate regression spline. A truncated eigen-decomposition is used to achieve the rank reduction [64].
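As a sketch of what penalized thin plate spline terms imply, the fitting criterion can be written as below. It assumes a Gaussian response with identity link, one bivariate smooth of longitude and latitude, univariate smooths of the remaining covariates and a second-order penalty. The symbols f_sp, λ and J are notation introduced here for illustration and do not come from the study.

```latex
\min_{\beta_0,\, f_{\mathrm{sp}},\, f_1,\dots,f_p}\;
\sum_{k=1}^{N} \Big( z_k - \beta_0
  - f_{\mathrm{sp}}(\mathrm{lon}_k, \mathrm{lat}_k)
  - \sum_{i=1}^{p} f_i(x_{ik}) \Big)^{2}
+ \lambda_{\mathrm{sp}}\, J(f_{\mathrm{sp}})
+ \sum_{i=1}^{p} \lambda_i\, J(f_i),

J(f_{\mathrm{sp}}) = \iint \big( f_{uu}^{2} + 2 f_{uv}^{2} + f_{vv}^{2} \big)\, du\, dv,
\qquad
J(f_i) = \int f_i''(x)^{2}\, dx .
```

Each smoothing parameter λ trades closeness to the data against the wiggliness measured by J, which is the formal counterpart of a surface that passes close to the observations while staying as smooth as possible.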
2.4.3. Linear models
Lastly, we compared the spatial methods with a multivariable linear regression, which estimates WI by exploiting its dependency on population density and other covariates as outlined earlier. The equation used is
$$Z(s) = \beta_0 + \sum_{i=1}^{p} \beta_i X_i(s) + \varepsilon(s).$$
The regression coefficients βi are constant over the whole study area and can be estimated using the least squares method from a set of covariates at N observed locations.
2.5. Assessment of predictive performance
We randomly divided the PSUs of each selected survey into a training set of 80% and a holdout of 20% for validation. We used the training set to build the models with all predictor variables, which was then used to make predictions for the holdout locations. This enabled us to directly assess prediction accuracy of the three methods compared to the observed values. We conducted the process for 100 randomly selected training and testing datasets and compared the mean values of four accuracy metrics for each method. We further repeated the process with three different proportions of holdout—30%, 40% and 50%—to examine the potential impact on prediction accuracy, as data availability changes.
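The repeated random splitting can be sketched as follows. The function name and the seed are hypothetical, and the sketch covers only the generation of training and holdout index sets, not the model fitting; 605 is the number of georeferenced PSUs reported for Tanzania.

```python
import random


def repeated_holdout_splits(n_units, holdout_fraction, n_repeats, seed=0):
    """Yield (training_indices, holdout_indices) for repeated random splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_holdout = round(n_units * holdout_fraction)
    indices = list(range(n_units))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_holdout:], shuffled[:n_holdout]


# 100 random 80/20 splits of 605 PSUs.
splits = list(repeated_holdout_splits(n_units=605, holdout_fraction=0.2,
                                      n_repeats=100))
```

Each accuracy metric would then be computed on the holdout set of every split and averaged over the 100 repetitions; changing `holdout_fraction` to 0.3, 0.4 or 0.5 gives the additional scenarios.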
Prediction accuracy was measured by the mean absolute error (MAE), root mean square error (RMSE), the goodness-of-prediction (G) statistic (also referred to as the predictive R-squared), and the correlation coefficient between observed and predicted values. The MAE measures the average magnitude of the prediction errors and equals zero only when every prediction matches its observed value. The RMSE is the square root of the mean squared error and so penalizes large errors more heavily. Smaller MAE and RMSE values indicate smaller errors and more accurate predictions from the model. The two are calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |p_i - o_i|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - o_i)^{2}},$$
where n is the number of predictions made, pi the predicted value at location si and oi the observed value at location si.
The G-value is a measure of the effectiveness of model estimates relative to estimating with just the sample mean. The G-value is written as
$$G = 1 - \frac{\sum_{i=1}^{n} (o_i - p_i)^{2}}{\sum_{i=1}^{n} (o_i - \bar{o})^{2}},$$
where ō is the sample mean of the observed values.
A G-value of 1 indicates perfect prediction, a positive value indicates a more reliable model than if the sample mean had been used, and a negative value indicates a less reliable model than if the sample mean had been used.
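The four accuracy metrics can be computed with a short sketch such as the one below. It assumes the standard definitions of MAE, RMSE and Pearson correlation, and a G-value that uses the mean of the observed holdout values as the reference; the function names and the toy data are hypothetical.

```python
import math


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def g_value(pred, obs):
    """Goodness-of-prediction: 1 - SSE / SST around the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for p, o in zip(pred, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson(pred, obs):
    """Pearson correlation between predicted and observed values."""
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Toy holdout values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
```

Predicting the observed mean everywhere gives a G-value of 0, which is why positive values indicate an improvement over the sample mean.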
We used Stata/SE 14 for data management and R v. 3.4.1 for all the statistical analyses. MBG and GAM-based fitting were performed using the R packages spBayes [65] and mgcv [64], respectively.
3. Results
The number of georeferenced PSUs across the four study surveys ranged from 605 in Tanzania to 1585 in Kenya (figure 1). The number of sampled households ranged from 12 558 in Tanzania to 38 021 in Nigeria. The average number of PSUs per 1000 km² was higher in Malawi than in the other countries: 9 compared to 1–3 (figure 1). The average numbers of households per PSU in Kenya, Malawi, Nigeria and Tanzania, respectively, were 23, 28, 43 and 21, and of de-jure household members per PSU were 91, 141, 199 and 104.
The distributions of PSU mean WI for each country are shown in figure 1b–e. In Tanzania and Malawi, the majority of PSUs were relatively poor and the distributions of the WI were heavily right-skewed. The spatial distribution of PSU mean WI is also presented and showed good survey coverage in all areas (figure 1a and electronic supplementary material A).
The spatial pattern of PSU mean WI varied across countries. In Malawi, concentrations of wealthy PSUs were observed in Mzuzu, Lilongwe and Blantyre, among others. In Tanzania, we found relatively wealthy PSUs in Dar es Salaam, Arusha, Mwanza and Zanzibar. On the other hand, prominent spatial gradients were observed in Kenya and Nigeria. In Kenya, the majority of the north and northeast was poor except for a few larger towns and the regional capitals. The wealthiest PSUs were found in the Nairobi and Central Kenya provinces. Most mid-WI PSUs were found to the west and east sides of Central Kenya Province. Northern Nigeria was predominantly poor. The majority of relatively rich PSUs were located in the southern part of the country, with one cluster at the centre in Abuja. In the south, a substantial number of mid-WI PSUs were seen in the Enugu and Makurdi states.
Table 2 shows the four accuracy metrics for all results. Across all study countries, both SI approaches performed better than the linear fit. For both MBG and GAM, mean errors generally increased from lower to higher holdout proportions, and the opposite was observed for G-value and correlation. This indicated a greater probability that inaccurate predictions occurred in models with larger holdouts. Regardless of the SI method used, G-value and correlation were the lowest for Malawi, which reflected the worst prediction effectiveness when compared with the other three countries.
Table 2.
The GAM fit performed better at all holdout proportions for Malawi, Nigeria and Tanzania based on all four metrics. In Kenya, mixed results were observed—MBG interpolations were comparatively better for RMSE, G-value and correlation between predicted WI and observed WI at 20–40% holdout. The relative performance of the GAM in Kenya improved as holdout proportions increased to 50%.
The spatial patterns of the covariates are illustrated in electronic supplementary material B, and we explored the effects of the covariates by country using the full datasets (electronic supplementary material C). Night-time light emission most consistently showed an association with WI across all countries, followed by population density. Overall, night-time light was positively associated with WI, while the opposite was observed for population density. In the GAM fits, in all cases except for population density in Tanzania, the curves were significant at the 0.001% level.
Finally, Pearson correlation coefficients between the average values used for our analysis and those resulting from using buffer sizes of 5, 10 and 20 km showed strong correlations across different extraction methods (table 3); thus, we do not expect the analytical results to differ when alternative scales are used.
Table 3.
covariate | Kenya 5 km | Kenya 10 km | Kenya 20 km | Malawi 5 km | Malawi 10 km | Malawi 20 km | Nigeria 5 km | Nigeria 10 km | Nigeria 20 km | Tanzania 5 km | Tanzania 10 km | Tanzania 20 km |
---|---|---|---|---|---|---|---|---|---|---|---|---|
night-time light | 0.978 | 0.933 | 0.859 | 0.958 | 0.863 | 0.673 | 0.976 | 0.914 | 0.802 | 0.966 | 0.891 | 0.812 |
aridity | 0.997 | 0.989 | 0.964 | 0.995 | 0.988 | 0.974 | 1.000 | 1.000 | 1.000 | 0.996 | 0.983 | 0.958 |
potential evapotranspiration | 0.996 | 0.989 | 0.968 | 0.989 | 0.970 | 0.940 | 1.000 | 0.999 | 0.999 | 0.995 | 0.982 | 0.948 |
land surface temperature | 0.978 | 0.994 | 0.970 | 0.956 | 0.988 | 0.946 | 0.995 | 0.997 | 0.972 | 0.960 | 0.986 | 0.906 |
vegetation index | 0.948 | 0.988 | 0.961 | 0.926 | 0.979 | 0.933 | 0.974 | 0.992 | 0.961 | 0.928 | 0.982 | 0.929 |
elevation | 0.998 | 0.994 | 0.984 | 0.990 | 0.974 | 0.949 | 0.996 | 0.991 | 0.984 | 0.998 | 0.993 | 0.981 |
4. Discussion
In this study, we assessed the performances of two spatial methods to predict poverty for four countries in sub-Saharan Africa. We compared a Bayesian multivariable MBG approach with spline interpolation as part of a GAM to predict WI at holdout locations using DHS data. We observed better predictive performances of these spatial methods when compared to non-spatial models. Our results revealed marked differences in the shape of the distribution and spatial pattern of WI across the four countries. We found that predictive performance was the lowest in Malawi compared to the other three countries regardless of the method used. GAM-based fitting of smoothed functions of the spatial coordinates and WI, adjusted for other predictor variables, generally performed better than MBG in Malawi, Nigeria and Tanzania. In Kenya, on the other hand, the GAM fit resulted in marginally worse prediction accuracy than the MBG approach.
4.1. Study limitations
Our findings have important implications but should be understood within certain limitations. First, the random displacement applied to the GPS coordinates of the DHS PSUs could have led to misclassification in the assignment of predictor variables [66]. The extent of misclassification depends on the smoothness of the surface from which data are being linked [66]. We attempted to mitigate the effects of potential bias of rough/unsmooth surfaces by integrating from raster cells close to the displaced locations. Second, as we conducted an out-of-sample validation based on sampled data, the comparative performance between MBG and GAM if 100% of the data were used to make predictions at unsampled locations is uncertain. Third, we did not use the revised global map of travel time to cities estimated by Weiss and colleagues [67], which was published after we analysed our data. Fourth, we opted for a straight-line measure for accessibility as it is unclear that more sophisticated methods are better [57,68]. Fifth, the use of asset-based indices to assess poverty may be affected by the choice of components and poor comparability between urban and rural areas [69,70], but such indices are easy to compute and compare well to more complex indicators of wealth [71–73]. Sixth, we only used four case countries, and our results may have limited generalizability to LMICs. Last, the wealth index data used for modelling were aggregated to the PSU from household-level data, and the covariates exploited were provided at different grid sizes. The grid sizes of the land surface temperature layer and the vegetation index layer, in particular, are larger and have potential within-grid variations that cannot be accounted for in the current analysis.
4.2. Model-based geostatistics and generalized additive model
Comparison based on G-value and correlation showed that predictive performance was lowest in Malawi, indicating neither model was sufficient to address the spatial variability of WI. The covariate datasets used were provided as raster objects at set grid sizes. Within each grid, covariate values are considered constant. Given that Malawi has a substantially smaller land area compared to the other three countries, every grid on a Malawi covariate layer covers a larger proportion of the country's surface area, leading to higher levels of aggregation. At higher levels of aggregation, there is greater potential for information loss [74]. Night-time light emission, one of the strongest predictors found in this study (see electronic supplementary material C), ranged between zero and approximately 60 units across all four countries. If the spatial scale of covariate effect in Malawi was also similar to the other countries, higher levels of aggregation may not lead to greater information loss. On the other hand, if the spatial scale of covariate effect for night-time light in the smaller country and economy was at least as rapid as the other three countries, greater potential for information loss might be expected [74]. This may have contributed to the reduced model performance and prediction accuracy in Malawi.
While the two SI methods explored in this analysis offer different ways of capturing the underlying spatial pattern, they share a mathematical connection, as previously discussed by Cressie and Wahba [75,76]. Cressie, for instance, demonstrated commonalities between the two-dimensional Laplacian smoothing spline of degree two and the universal kriging predictor [76]. Nonetheless, the two methods remain ‘practically very different’ [76], and the predictive performances resulting from the typical ways in which these methods are applied are the main interest of the current analysis.
Many factors affect the predictive performance of different SI methods, and our study did not yield a consistent ‘best method’. Rather, each approach offers different ways of capturing different data structures, and in line with previous studies [77–79], we found different methods performed better under different conditions. Our results revealed four possible factors for the performance of the methods: (i) data density, (ii) normality of data, (iii) the underlying spatial wealth pattern and (iv) the choice of covariates.
Firstly, the comparative performance of the two approaches might be sensitive to data density. Our results across a range of holdout proportions demonstrated that predictive performance declined for both methods when sparser datasets were used. While this may not be surprising, the better-performing SI method for Kenya changed from MBG to GAM when the proportion of data used for training decreased from 80 to 50%.
Secondly, non-spatial exploratory data analysis indicated that the WI values at the PSU level for Kenya (figure 1b) followed a normal distribution. On the other hand, the distributions for Malawi, Nigeria and Tanzania were right-skewed (figure 1c–e). This empirical difference across countries coincided with MBG performing better for Kenya. Although normality in the outcome is not required for MBG, second order variation is structured as a multivariate normal-distributed random field. The influence of data normality, together with the choice of covariates (more below), on the suitability of different SI methods should be carefully accounted for. This may be particularly pertinent as top inequality, that is, large and slowly declining top wealth shares as indicated by right skew, is rising both globally and in many countries [80–82]. The skew is also unlikely to be solely due to our use of WI as a measure of wealth, since previous studies have also found a similar distribution in other wealth indicators in Malawi [83] and Tanzania [84].
Thirdly, the underlying spatial pattern in the data is important to choosing the ‘best’ performing SI method in a given map region. MBG predictive maps are typically based on the assumption of stationarity of the spatial process, as the approach accounts for the covariance of the residuals between any two locations by modelling it as dependent on the distance and direction between them and independent of the locations themselves. In the presence of good global spatial autocorrelation, such as the case of Kenya, where the global spatial pattern of wealth appears to decrease over distance from Nairobi (figure 1a), MBG performed marginally better than GAM. In 1969, the post-colonial Kenyan government selected seven cities around Nairobi to develop as secondary cities to decongest urban conditions [85]. While Nairobi remains economically dominant in Kenya, the seven cities have developed a sizable economic base over the last few decades [85]. Except for Mombasa, these cities span across the Kenyan savannah in the southwest [86]. The rest of the country is predominantly arid land where livelihoods are generally challenging [87,88]. The geographical pattern of wealth in Kenya may thus be more parsimoniously explained by spatial autocorrelation compared to the other study countries. The tendency for poverty rates to be more similar in nearby locations has also been shown in other LMICs [89,90].
In other settings, pairs of locations distant from each other may be more similar than nearby neighbours even though local spatial autocorrelation is observed, in which case the assumption of stationarity may not be optimal when considering spatial processes over the whole map region. One practical way to take non-stationarity into account in an MBG framework is to partition the study area into disjoint regions and define a separate stationary process in each region [91]. Other non-stationary models may also be appropriate. The GAM formulation, for instance, allows the outcome to vary smoothly in space instead of assuming that the predictive power of locations on one another depends on distance. In our study, the GAM approach provided better predictions than MBG at all holdout proportions for Malawi and Tanzania, where we observed concentrations of wealthy locations scattered across the national extents. The pattern observed in Malawi and Tanzania may not be unique. In Ethiopia and Rwanda, for instance, a secondary cities development component involving collections of locations that form a spatially multi-centred network has been proposed as part of a strategy to attain inclusive growth and build resilience [92,93]. The identification and inclusion of these secondary cities were partially based on their institutional capacity at the time of selection. Moreover, there were also intentions to relieve urban conditions in primary cities, promote spatial balance and equity, and transform the economic geography of the countries through redistributing resources [92,93].
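The partitioning idea mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of reference [91]: the region labels, the parameter values in `REGION_PARAMS`, the exponential covariance form and the independence of processes across regions are all hypothetical choices.

```python
import math

# Hypothetical region-specific variance (sigma2) and range (phi) parameters.
REGION_PARAMS = {
    "north": {"sigma2": 1.0, "phi": 2.0},
    "south": {"sigma2": 3.0, "phi": 0.5},
}


def partitioned_covariance(site_a, site_b, params=REGION_PARAMS):
    """Covariance under a separate stationary process in each region.

    Each site is a tuple (x, y, region). Sites in different regions are
    treated as independent, so their covariance is zero (an assumption).
    """
    xa, ya, region_a = site_a
    xb, yb, region_b = site_b
    if region_a != region_b:
        return 0.0
    p = params[region_a]
    distance = math.hypot(xa - xb, ya - yb)
    return p["sigma2"] * math.exp(-distance / p["phi"])


# Four hypothetical sites: two per region, each pair one unit apart.
sites = [
    (0.0, 0.0, "north"),
    (1.0, 0.0, "north"),
    (0.0, 5.0, "south"),
    (1.0, 5.0, "south"),
]
cov_matrix = [[partitioned_covariance(a, b) for b in sites] for a in sites]

for row in cov_matrix:
    print([round(value, 3) for value in row])
```

The two within-region pairs are the same distance apart but receive different covariances, so the process is stationary within each region and non-stationary over the whole map. Treating regions as independent creates discontinuities at region boundaries, which is one reason other non-stationary formulations may be preferred.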
As the development of secondary cities continues to be a focus of sustainable growth, it is important to account for the geographical organization of these emerging cities when constructing smoothed map surfaces of wealth and other development indicators using SI techniques. Researchers, planners and development agencies have conceived several types of theoretical city/settlement patterns, including nucleated, clustered, dispersed and random [33,94,95]. Depending on the spatial processes of the outcome and available covariates, the assumption of spatial stationarity in the SI model formulation may or may not be suitable. The potential for similarities between a distant pair of locations, or any pair of locations, to be used as an input for poverty mapping warrants further research. In particular, the application of some non-spatial methods for interpolation, including machine learning techniques, without the constraint of using neighbouring data to make a prediction at an unsampled or unobserved location offers new opportunities for capturing more complex spatial patterns [41]. With these methods, an algorithm is used to decide which observations should be leveraged for a certain prediction, allowing the inclusion of data from any other sample points if the model finds them similar to the location being interpolated in terms of the predictor variables.
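To illustrate how a prediction can draw on observations that are similar in the predictor variables rather than close on the map, the following Python sketch averages the outcomes of the nearest neighbours in covariate space. It is a deliberately simple stand-in for the machine learning methods referred to above, and every name and value in it (`knn_predict_in_covariate_space`, the sample points, the two scaled covariates) is hypothetical.

```python
import math


def knn_predict_in_covariate_space(samples, target_covariates, k=2):
    """Average the outcome of the k samples closest in covariate space.

    Geographic coordinates are deliberately ignored when choosing neighbours.
    Returns the prediction and the indices of the samples that were used.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    ranked = sorted(
        (
            math.sqrt(
                sum((c - t) ** 2 for c, t in zip(s["covariates"], target_covariates))
            ),
            index,
        )
        for index, s in enumerate(samples)
    )
    nearest = [index for _, index in ranked[:k]]
    prediction = sum(samples[i]["wi"] for i in nearest) / k
    return prediction, nearest


# Hypothetical sample points: map coordinates, two scaled covariates
# (for example night-time light and population density) and a wealth index.
samples = [
    {"xy": (0.0, 0.0), "covariates": (0.1, 0.2), "wi": -0.8},
    {"xy": (0.5, 0.5), "covariates": (0.2, 0.1), "wi": -0.6},
    {"xy": (9.0, 9.0), "covariates": (0.9, 0.8), "wi": 1.5},
    {"xy": (-8.0, 7.0), "covariates": (0.8, 0.9), "wi": 1.3},
]

# An unsampled location that lies near samples 0 and 1 on the map,
# but whose covariates resemble the two distant, wealthier samples.
prediction, used = knn_predict_in_covariate_space(samples, (0.88, 0.82), k=2)
print(prediction, used)
```

In this constructed example the prediction is built from the two geographically distant samples, which is the behaviour a distance-based covariance structure would not produce on its own.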
Lastly, the choice of predictor variables and their relationships with the outcome is a strong factor influencing the predictive performance and the choice of SI method. The outcome being mapped may be spatially correlated largely because of spatial trends in the covariates, in which case accounting for covariate effects and examining whether any residual spatial correlation remains are crucial. The current analysis was performed using the full model formulation with all covariates included. Overall, the curvatures for night-time light emission and population density showed the strongest effects across study countries, while the other climatic and environmental features had moderate effects in Kenya and Nigeria, and weak effects in Malawi and Tanzania. This is an important point to note for two reasons: (i) the spatial processes of WI in Malawi and Tanzania are less stationary than those in Kenya and Nigeria and (ii) remotely sensed data are generally less costly to collect on a vast scale than other data, making them suitable for use in SI, but their availability is usually higher for natural conditions in LMICs, where the determinants of the spatial structure of wealth are becoming more complex. Non-stationary spatial processes that lack suitable and readily available predictors (e.g. wealth/poverty in Malawi and Tanzania) can limit the predictive performance of SI methods that rely on good spatial stationarity. Different groups around the world are working on producing high-resolution data on ‘man-made’ features for large geographical areas, such as anonymised mobile data [96], human settlement pattern [97] and urban–rural classification [97], which are potentially more closely associated with the spatial process of wealth in some settings.
Although mostly confined to smaller geographical areas such as subnational administrative regions, the number of studies on high- or very-high-resolution urban slum mapping has also been increasing [98]. The use of these data as covariates may mean that spatial autocorrelation becomes more or less informative, with potential influence on the comparative performance of different SI methods. In future attempts to create a smoothed poverty surface for a given region, one may wish to explore method-specific, contextually relevant covariates/interactions, perform variable selection and allow for a more flexible predictor–outcome structure to find the best SI method and model formulation.
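The point about residual spatial correlation can be illustrated with a short Python sketch that computes Moran's I before and after removing a covariate effect. It is a toy example under stated assumptions: the six locations on a line, the single covariate, the ordinary least-squares fit and the binary neighbour weights (`max_distance`) are all hypothetical and much simpler than the models compared in this study.

```python
import math


def fit_simple_regression(x, y):
    """Ordinary least squares for one covariate; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def morans_i(values, coords, max_distance=1.0):
    """Moran's I with binary weights: 1 if two distinct sites are within max_distance."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross, weight_total = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if math.hypot(dx, dy) <= max_distance:
                cross += z[i] * z[j]
                weight_total += 1.0
    return (n / weight_total) * cross / sum(zi ** 2 for zi in z)


# Hypothetical data: a covariate with a spatial trend drives the outcome,
# plus a small alternating deviation that the covariate cannot explain.
coords = [(float(i), 0.0) for i in range(6)]
covariate = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deviation = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
outcome = [1.0 + 2.0 * c + d for c, d in zip(covariate, deviation)]

intercept, slope = fit_simple_regression(covariate, outcome)
residuals = [y - (intercept + slope * c) for y, c in zip(outcome, covariate)]

print("Moran's I, outcome:", morans_i(outcome, coords))
print("Moran's I, residuals:", morans_i(residuals, coords))
```

In this constructed case the raw outcome shows positive spatial autocorrelation that is inherited from the covariate, and none of it remains in the residuals. When a similar check on real data shows little residual autocorrelation, a spatial covariance term has less left to explain, which may narrow the gap between SI methods.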
5. Conclusion
MBG and spline interpolation offer different ways of capturing spatial variability in the data. Our results shed light on four factors relevant to selecting a suitable method when interpolating poverty for an LMIC from sampled data and other covariates: data density, normality of data, the underlying geographical pattern of wealth and the choice of covariates. As part of the progress towards inclusive growth and resilience, governments and policymakers in some LMICs are beginning to aim for a spatial economic balance by redistributing resources within the national extent instead of having one primary city. This is likely to affect the spatial autocorrelation structures of welfare, health and demographic indicators, leading to deviations from the conditions under which some SI methods perform best. The use of covariates further influences the extent to which residual spatial correlation can be informative in the prediction-making process. In future attempts to create an interpolated poverty surface for an LMIC, researchers and analysts should carefully explore the structure of the possible covariates and the outcome in order to identify the most suitable SI method.
Supplementary Material
Acknowledgements
NASA images on land surface temperature were made by Reto Stockli, NASA's Earth Observatory Team, using data provided by the MODIS Land Science Team. Images of vegetation index were made available by Jesse Allen and Reto Stockli, NASA Earth Observatory Group, using data provided by the MODIS Land Science Team. Data on elevation were made available from the US Geological Survey. We also acknowledge CGIAR-CSI as the provider of the Global-Aridity and Global-PET Database. Images and data of night-time light were processed by NOAA's National Geophysical Data Center, with data collected by the US Air Force Weather Agency. We took population density estimates by administrative unit centroid location from the Gridded Population of the World Administrative Unit Center Points with Population Estimates (version 4). These data were developed by the Center for International Earth Science Information Network (CIESIN), Columbia University and were obtained from the NASA Socioeconomic Data and Applications Center (SEDAC) at http://dx.doi.org/10.7927/H4NP22DQ. The locations of populated places were compiled by Natural Earth. We wish to thank the DHS and all providers and contributors for permitting data usage. Lastly, we wish to thank the reviewers, Miss Julia Shen, Dr Bindu Sunny, Dr Elizabeth Williamson and Dr Claudio Fonterre for useful discussions.
Data accessibility
All datasets generated and analysed during the current study are available from the following repositories: https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MOD11C1_M_LSTDA, https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MOD_NDVI_M, https://lta.cr.usgs.gov/GTOPO30, http://www.cgiar-csi.org/data/global-aridity-and-pet-database, https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html, http://sedac.ciesin.columbia.edu/data/collection/gpw-v4, http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-populated-places/ and https://dhsprogram.com/data/available-datasets.cfm.
Authors' contributions
K.L.M.W. and O.J.B. conceptualized the study. K.L.M.W. undertook data processing and assembling. K.L.M.W. conducted the analysis, with supervision from O.J.B. and L.B. O.J.B., L.B. and O.M.R.C. contributed to interpretation of the findings. K.L.M.W. drafted the manuscript, with contributions from L.B., O.J.B. and O.M.R.C. All authors read and approved the final manuscript.
Competing interests
We declare we have no competing interests.
Funding
O.J.B. is supported by a Sir Henry Wellcome Fellowship funded by Wellcome Trust (grant no. 206471/Z/17/Z).
References
- 1. Population and poverty | UNFPA - United Nations Population Fund. See http://www.unfpa.org/resources/population-and-poverty (accessed 14 December 2017).
- 2. Braithwaite A, Dasandi N, Hudson D. 2016. Does poverty cause conflict? Isolating the causal origins of the conflict trap. Confl. Manag. Peace Sci. 33, 45–66. (doi:10.1177/0738894214559673)
- 3. Elbers C, Lanjouw JO, Lanjouw P. 2002. Micro-level estimation of welfare. Washington, DC: World Bank Group.
- 4. De La Fuente A, Murr AE, Rascon Ramirez EG. 2015. Mapping subnational poverty in Zambia. Washington, DC: World Bank Group.
- 5. Paper W. 2013. Vietnam's evolving poverty map patterns and implications for policy. Washington, DC: World Bank Group.
- 6. Blumenstock J, Cadamuro G, On R. 2015. Predicting poverty and wealth from mobile phone metadata. Science 350, 1073–1076. (doi:10.1126/science.aac4420)
- 7. Gianni R, Correlatrice B, Tesi LN, Laurea D, Mauro KR. 2016. Small area estimation of poverty in the Philippines. Siena, Italy: University of Siena.
- 8. Qiu F, Cromley R. 2013. Areal interpolation and dasymetric modeling. Geogr. Anal. 45, 213–215. (doi:10.1111/gean.12016)
- 9. Xu W. 2014. Developing population grid with demographic trait: an example for Milwaukee County, Wisconsin. Master's thesis, University of Wisconsin-Milwaukee, WI, USA.
- 10. DHS Spatial Interpolation Working Group. 2014. Spatial interpolation with Demographic and Health Survey data: key considerations. DHS Spatial Analysis Reports No. 9. Rockville, MD: ICF International.
- 11. Burgert-Brucker CR, Dontamsetti T, Mashall A, Gething P. 2016. Guidance for use of the DHS Program modeled map surfaces. DHS Spatial Analysis Reports No. 14. Rockville, MD: ICF International.
- 12. Gething P, Tatem A, Bird T, Burgert-Brucker CR. 2015. Creating spatial interpolation surfaces with DHS data. Rockville, MD: ICF International.
- 13. Tatem A, Gething P, Pezzulo C, Weiss D, Bhatt S. 2014. Development of high-resolution gridded poverty surfaces. Bill and Melinda Gates Foundation Contract Final Report.
- 14. Alegana VA, Atkinson PM, Pezzulo C, Sorichetta A, Weiss D, Bird T, Erbach-Schoenberg E, Tatem AJ. 2015. Fine resolution mapping of population age-structures for health and development applications. J. R. Soc. Interface 12, 20150073 (doi:10.1098/rsif.2015.0073)
- 15. Tatem AJ, Campbell J, Guerra-Arias M, De Bernis L, Moran A, Matthews Z. 2014. Mapping for maternal and newborn health: the distributions of women of childbearing age, pregnancies and births. Int. J. Health Geogr. 13, 2 (doi:10.1186/1476-072X-13-2)
- 16. Kanyangarara M, et al. 2016. High-resolution Plasmodium falciparum malaria risk mapping in Mutasa District, Zimbabwe: implications for regaining control. Am. J. Trop. Med. Hyg. 95, 141–147. (doi:10.4269/ajtmh.15-0865)
- 17. Gething PW, Patil AP, Smith DL, Guerra CA, Elyazar IRF, Johnston GL, Tatem AJ, Hay SI. 2011. A new world malaria map: Plasmodium falciparum endemicity in 2010. Malar. J. 10, 378 (doi:10.1186/1475-2875-10-378)
- 18. Spatial Data Repository - Modeled Surfaces. See https://spatialdata.dhsprogram.com/modeled-surfaces/#countryId=BD (accessed 14 December 2017).
- 19. Diggle P, Ribeiro PJ. 2007. Model-based geostatistics. New York, NY: Springer.
- 20. Burrough PA, McDonnell R, Lloyd CD. 2015. Principles of geographical information systems. Oxford, UK: Oxford University Press.
- 21. Legendre P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74, 1659–1673. (doi:10.2307/1939924)
- 22. Stephenson J, Gallagher K, Holmes CC. 2004. Beyond kriging: dealing with discontinuous spatial data fields using adaptive prior information and Bayesian partition modelling. Geol. Soc. Lond. Spec. Publ. 239, 195–209. (doi:10.1144/GSL.SP.2004.239.01.13)
- 23. Karydas CG, Gitas IZ, Koutsogiannaki E, Lydakis-Simantiris N, Silleos GN. 2009. Evaluation of spatial interpolation techniques for mapping agricultural topsoil properties in Crete. EARSeL eProceedings 8, 26–39.
- 24. How Kriging works - Help | ArcGIS Desktop. See http://desktop.arcgis.com/en/arcmap/latest/tools/3d-analyst-toolbox/how-kriging-works.htm (accessed 14 December 2017).
- 25. Gourdji S, Bash Almaliki JS, Falih Nazal Z. 2017. Distribution modeling of hazardous airborne emissions from industrial campuses in Iraq via GIS techniques. IOP Conf. Ser. Mater. Sci. Eng. 227, 012055 (doi:10.1088/1757-899X/227/1/012055)
- 26. Frigo C, Osterloo K. 2016. exSPLINE that: explaining geographic variation in insurance pricing. EARSeL eProceedings 8, 26–39.
- 27. Sengupta S. 2010. Spatial Statistics: A Framework for Analyzing Geographically Referenced Data in Insurance Ratemaking. See https://www.casact.org/education/rpm/2010/handouts/PM1-Sengupta.pdf (accessed 14 December 2017).
- 28. Wieling M, Montemagni S, Nerbonne J, Baayen H. 2012. Applying Generalized Additive Mixed Modeling: Tuscan Dialects vs. Standard Italian. See https://lstat.kuleuven.be/research/lsd/lsd2012/presentations2012/Leuven.pdf (accessed 14 December 2017).
- 29. O'brien L, Rago P. 1996. An application of the generalized additive model to groundfish survey data with Atlantic cod off the northeast coast of the United States as an example. NAFO Sci. Coun. Stud. 28, 79–95.
- 30. O'brien L. 1997. Preliminary results of a spatial and temporal analysis of haddock distribution applying a generalized additive model. US Department of Commerce, National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Northeast Region, Northeast Fisheries Science Center.
- 31. Voss PR, Long DD, Hammer RB, Friedman S. 2006. County child poverty rates in the US: a spatial regression approach. Popul. Res. Policy Rev. 25, 369–391. (doi:10.1007/s11113-006-9007-4)
- 32. Lawson D, Ado-Kofie L, Hulme D. 2017. What works for Africa's poorest: programmes and policies for the extreme poor. Rugby, UK: Practical Action Publishing. (doi:10.3362/9781780448435)
- 33. Roberts BH. 2014. Managing systems of secondary cities: policy responses in international development. Brussels, Belgium: Cities Alliance: Cities without Slums.
- 34. The World Bank. Indicators | Data. See https://data.worldbank.org/indicator (accessed 31 March 2018).
- 35. ICF International. 2012. Demographic and health survey sampling and household listing manual. Rockville, MD: ICF International.
- 36. ICF International. 2013. Incorporating geographic information into demographic and health surveys: a field guide to GPS data collection. Calverton, MD: ICF International.
- 37. Burgert C, Colston J, Roy T, Zachary B. 2013. Geographic displacement procedure and georeferenced data release policy for the Demographic and Health Surveys. Rockville, MD: ICF International.
- 38. Rutstein SO, Johnson K. 2004. The DHS wealth index. DHS Comparative Reports No. 6. Calverton, MD.
- 39. The DHS Program - Wealth Index Construction. See https://www.dhsprogram.com/topics/wealth-index/Wealth-Index-Construction.cfm (accessed 19 December 2017).
- 40. Howe LD, Hargreaves JR, Ploubidis GB, De Stavola BL, Huttly SRA. 2011. Subjective measures of socio-economic position and the wealth index: a comparative analysis. Health Policy Plan. 26, 223–232. (doi:10.1093/heapol/czq043)
- 41. Jean N, Burke M, Xie M, Davis WM, Lobell DB, Ermon S. 2016. Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794. (doi:10.1126/science.aaf7894)
- 42. Jokwi P, et al. 2007. Spatial determinants of poverty in rural Kenya. Proc. Natl Acad. Sci. USA 104, 16 769–16 774. (doi:10.1073/pnas.0611107104)
- 43. Rogers D, Emwanu T, Robinson T. 2006. Poverty mapping in Uganda: an analysis using remotely sensed and other environmental data. Rome, Italy: PPLPI, FAO.
- 44. Pozzi F, Robinson T, Nelson A. 2009. Accessibility mapping and rural poverty in the Horn of Africa. PPLPI Working Paper-Pro-Poor Livestock Policy Initiative, FAO 47.
- 45. Sedda L, Tatem AJ, Morley DW, Atkinson PM, Wardrop NA, Pezzulo C, Sorichetta A, Kuleszo J, Rogers DJ. 2015. Poverty, health and satellite-derived vegetation indices: their inter-spatial relationship in West Africa. Int. Health 7, 99–106. (doi:10.1093/inthealth/ihv005)
- 46. CIESIN - Columbia University. 2017. Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 10. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC) (doi:10.7927/H4DZ068D)
- 47. Land Surface Temperature [Day] (1 month - Terra/MODIS) | NASA. 2017. See https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MOD11C1_M_LSTDA (accessed 14 December 2017).
- 48. Vegetation Index (1 month - Terra/MODIS) | NASA. 2017. See https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MOD_NDVI_M (accessed 14 December 2017).
- 49. Global 30 Arc-Second Elevation (GTOPO30) | The Long Term Archive. See https://lta.cr.usgs.gov/GTOPO30 (accessed 14 December 2017).
- 50. Zomer RJ, Trabucco A, Bossio DA. 2008. Climate change mitigation: a spatial analysis of global land suitability for clean development mechanism afforestation and reforestation. Agric. Ecosyst. Environ. 126, 67–80. (doi:10.1016/j.agee.2008.01.014)
- 51. Zomer RRJ, Bossio DAJ, Trabucco A, Yuanjie L, Gupta DC, Singh VP. 2007. Trees and water: smallholder agroforestry on irrigated lands in northern India. Colombo, Sri Lanka: International Water Management Institute.
- 52. Global Aridity and PET Database | CGIAR-CSI. See http://www.cgiar-csi.org/data/global-aridity-and-pet-database (accessed 14 December 2017).
- 53. NOAA/NGDC - Earth Observation Group - Defense Meteorological Satellite Program, Boulder. See https://ngdc.noaa.gov/eog/faq.html (accessed 17 January 2018).
- 54. Earth Observation Group - Defense Meteorological Satellite Program, Boulder | ngdc.noaa.gov. See https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html (accessed 17 January 2018).
- 55. Davidson R. 2008. Reading Topographic Maps. See http://www.map-reading.com/ (accessed 25 June 2018).
- 56. Natural Earth. See http://www.naturalearthdata.com/about/terms-of-use/ (accessed 14 December 2017).
- 57. Nesbitt RC, et al. 2014. Methods to measure potential spatial access to delivery care in low- and middle-income countries: a case study in rural Ghana. Int. J. Health Geogr. 13, 25 (doi:10.1186/1476-072X-13-25)
- 58. Global Administrative Areas | Boundaries without limits. See http://www.gadm.org/ (accessed 14 December 2017).
- 59. The DHS Program User Forum: Geographic Data How to deal with missing DHS clusters? See https://userforum.dhsprogram.com/index.php?t=msg&th=759&goto=1224&S=Google (accessed 9 January 2018).
- 60. Banerjee S, Carlin B, Gelfand A. 2014. Hierarchical modeling and analysis for spatial data. New York, NY: CRC Press.
- 61. Akima H. 1996. Algorithm 761; scattered-data surface fitting that has the accuracy of a cubic polynomial. ACM Trans. Math. Softw. 22, 362–371. (doi:10.1145/232826.232856)
- 62. Akima H. 1978. A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Trans. Math. Softw. 4, 148–159. (doi:10.1145/355780.355786)
- 63. Wood SN. 2017. Generalized additive models: an introduction with R. New York, NY: Chapman and Hall/CRC.
- 64. Wood S. 2017. Mixed GAM computation vehicle with automatic smoothness estimation. R package version 1.8-22.
- 65. Finley AO, Banerjee S, Gelfand AE. 2015. spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models. J. Stat. Softw. 63, 1–28. (doi:10.18637/jss.v063.i13)
- 66. Perez-Haydrich C, Warren JL, Burgert CR, Emch ME. 2013. Guidelines on the use of DHS GPS data. Spatial Analysis Reports No. 8. Calverton, MD: ICF International.
- 67. Weiss DJ, et al. 2018. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553, 333–336. (doi:10.1038/nature25181)
- 68. Noor AM, Amin AA, Gething PW, Atkinson PM, Hay SI, Snow RW. 2006. Modelling distances travelled to government health services in Kenya. Trop. Med. Int. Health 11, 188–196. (doi:10.1111/j.1365-3156.2005.01555.x)
- 69. Houweling TA, Kunst AE, Mackenbach JP. 2003. Measuring health inequality among children in developing countries: does the choice of the indicator of economic status matter? Int. J. Equity Health 2, 8 (doi:10.1186/1475-9276-2-8)
- 70. Howe LD, Hargreaves JR, Huttly SR. 2008. Issues in the construction of wealth indices for the measurement of socio-economic position in low-income countries. Emerg. Themes Epidemiol. 5, 3 (doi:10.1186/1742-7622-5-3)
- 71. Filmer D, Scott K. 2012. Assessing asset indices. Demography 49, 359–392. (doi:10.1007/s13524-011-0077-5)
- 72. Filmer D, Pritchett L. 1998. Estimating wealth effects without expenditure data - or tears: an application to educational enrollments in states of India. Policy Research Working Papers, No. 1994.
- 73. Morris S, Carletto C, Hoddinott J, Christiaensen L. 2000. Validity of rapid estimates of household wealth and income for health surveys in rural Africa. J. Epidemiol. Community Health 54, 381–387. (doi:10.1136/jech.54.5.381)
- 74. Wakefield J, Lyons H. 2010. Spatial aggregation and the ecological fallacy. In Handbook of spatial statistics (eds AE Gelfand, PJ Diggle, M Fuentes, P Guttorp), pp. 537–554. Boca Raton, FL: CRC Press.
- 75. Wahba G. 1990. Comment on Cressie, Letters to the Editor. Am. Stat. 44, 255.
- 76. Cressie N. 1990. Reply to G. Wahba's letter to the editor - comment on Cressie. Am. Stat. 44, 256–258.
- 77. Li J, Heap AD. 2014. Spatial interpolation methods applied in the environmental sciences: a review. Environ. Model. Softw. 53, 173–189. (doi:10.1016/j.envsoft.2013.12.008)
- 78. Li J, Heap AD, Potter A, Daniell JJ. 2011. Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model. Softw. 26, 1647–1659. (doi:10.1016/j.envsoft.2011.07.004)
- 79. Dirks K, Hay J, Stow C, Harris D. 1998. High-resolution studies of rainfall on Norfolk Island: Part II: Interpolation of rainfall data. J. Hydrol. 208, 187–193. (doi:10.1016/S0022-1694(98)00155-3)
- 80. Alvaredo F, Chancel L, Piketty T, Saez E, Zucman G. 2017. World Inequality Report 2018.
- 81. Benhabib J, Bisin A. 2016. Skewed wealth distributions: theory and empirics. Working Paper 21924. Cambridge, MA: National Bureau of Economic Research.
- 82. Jones CI. 2015. Pareto and Piketty: the macroeconomics of top income and wealth inequality. J. Econ. Perspect. 29, 29–46. (doi:10.1257/jep.29.1.29)
- 83. Mussa R, Masanjala W. 2015. A dangerous divide: the state of inequality in Malawi. Oxford, UK: Oxfam International.
- 84. L LP. 2016. Households income poverty and inequalities in Tanzania: analysis of empirical evidence of methodological challenges. J. Ecosyst. Ecography 6, 183 (doi:10.4172/2157-7625.1000183)
- 85. Otiso KM. 2005. Kenya's secondary cities growth strategy at a crossroads: which way forward? GeoJournal 62, 117–128. (doi:10.1007/s10708-005-8180-z)
- 86. Government of Kenya. 1999. National Poverty Eradication Plan 1999–2015. Nairobi, Kenya: Government Printer.
- 87. Elliot H, Fowler B. 2012. Markets and poverty in northern Kenya: towards a financial graduation model. Nairobi, Kenya: FSD Kenya.
- 88. Mugahsi Z, Obudho R. 1989. The spatial distribution of health services in the urban centres of Kenya. In Urbanisation et santé dans le Tiers Monde: transition épidémiologique, changement social et soins de santé primaires (eds G Salem, E Jeannée), pp. 235–256. Paris, France: ORSTOM.
- 89. Microcredit Summit Campaign. 2015. Mapping pathways out of poverty: the state of the microcredit summit campaign report.
- 90. Xie M, Jean N, Burke M, Lobell D, Ermon S. 2015. Transfer learning from deep features for remote sensing and poverty mapping. (https://arxiv.org/abs/1510.00098)
- 91. Abdul Rahm S, Rahim A, Mallongi A. 2016. Forecasting of dengue disease incident risks using non-stationary spatial of geostatistics model in bone regency Indonesia. J. Entomol. 14, 49–57. (doi:10.3923/je.2017.49.57)
- 92. Woldeyes F, Bisshop R. 2015. Unlocking the power of Ethiopia's cities.
- 93. Government of Rwanda and GGGI. 2015. National Roadmap for Green Secondary City Development. Kigali.
- 94. Choe K, Roberts B. 2011. Competitive cities in the 21st century: cluster-based local economic development. Manila, The Philippines: Asian Development Bank.
- 95. Tong D, Murray AT. 2012. Spatial optimization in geography. Ann. Assoc. Am. Geogr. 102, 1290–1309. (doi:10.1080/00045608.2012.685044)
- 96. Steele JE, et al. 2017. Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14, 20160690 (doi:10.1098/rsif.2016.0690)
- 97. Pesaresi M, Ehrlich D, Florczyk AJ, Freire S, Julea A, Kemper T, Soille P, Syrris V. 2015. GHS built-up grid, derived from Landsat, multitemporal (1975, 1990, 2000, 2014). Brussels, Belgium: European Commission.
- 98. Kuffer M, Pfeffer K, Sliuzas R. 2016. Slums from space - 15 years of slum mapping using remote sensing. Remote Sens. 8, 455 (doi:10.3390/rs8060455)