Significance
The first United Nations Sustainable Development Goals target is for all people to clear the expenditures-based extreme global poverty line ($2.15 per person-day in 2017 prices) by 2030. Yet survey estimates of when and where people fall below this poverty line are often unavailable. One strategy to fill that gap is to train machine-learning models to estimate poverty from Earth observation data. Most models, however, train on asset data, generating maps of relative wealth that do not map to poverty lines. We pilot a two-step modeling procedure that harnesses the accuracy gains of prevailing methods, but then maps those predictions to more policy-relevant poverty measures. This allows us to compute stable and forward-looking estimates of where people live in poverty.
Keywords: assets, expenditures, machine learning, poverty maps, small area estimates
Abstract
For many countries in the Global South, traditional poverty estimates are available only infrequently and at coarse spatial resolutions, if at all. This limits decision-makers’ and analysts’ ability to target humanitarian and development interventions and makes it difficult to study relationships between poverty and other natural and human phenomena at finer spatial scales. Advances in Earth observation and machine learning-based methods have proven capable of generating more granular estimates of relative asset wealth indices. They have been less successful in predicting the consumption-based poverty measures most commonly used by decision-makers, those tied to national and international poverty lines. For a study area including four countries in southern and eastern Africa, we pilot a two-step approach that combines Earth observation, accessible machine learning methods, and asset-based structural poverty measurement to address this gap. This structural poverty approach to machine learning-based poverty estimation preserves the interpretability and policy-relevance of consumption-based poverty measures, while allowing us to explain 72 to 78% of cluster-level variation in a pooled model and 40 to 54% even when predicting out-of-country.
Accurate estimates of the number of people deprived of a minimum acceptable standard of living are available infrequently and only at the first- or second-level administrative unit, if at all, for many places in the Global South. These aggregate estimates can mask pockets of extreme poverty and quickly become outdated. This limits policymakers’ ability to recognize and respond to the most urgent human needs, to study the processes that cause and perpetuate poverty, and to evaluate the effectiveness of interventions. The scarcity of poverty estimates in low-resource settings persists because high quality household surveys of income and consumption expenditures are difficult and expensive to administer. As a result, such data remain undersupplied, a shortfall that is especially pronounced in many African countries (1).
Recent research seeks to address this gap through modeling efforts that leverage advances in machine learning (ML) and Earth observation (EO) (1–10). Scientific progress in this space has focused on improving the out-of-sample predictive accuracy of asset-based poverty (or wealth) measures through advances in algorithms or in the feature sets used to explain outcomes. For these advances to translate into greater uptake and impact, however, the measures predicted must also be policy relevant. The maps of asset wealth indices prevalent in this literature do not readily translate to the consumption-based poverty measures more often used by policy makers, such as the share of people living below national or international poverty lines. One alternative is to model consumption expenditures directly from EO data; however, this approach tends to reduce the predictive accuracy of the models (2).
Our goal is to improve the relevance of the dependent variable for cross-sectional ML poverty mapping, without sacrificing predictive accuracy. We employ a two-stage approach to model “structural poverty,” defined as the expected, poor, or nonpoor, level of consumption expenditure for a household given their durable characteristics, such as productive assets (11, 12). In the first stage, we estimate the relationship between productive assets and consumption expenditures; effectively “drawing the line” from households’ (more predictable) productive assets to their (more interpretable) consumption expenditures. In the second stage, we train EO-based models on cluster-level aggregates of the fitted structural poverty estimates from the first stage. This approach allows us to predict structural poverty using EO data available for unsurveyed areas, preserving the interpretable units of consumption expenditure while integrating the stability and forward-looking qualities of asset wealth measures, which we hypothesize also enhances predictability.
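To make the two-stage logic concrete, the following is a minimal sketch in Python using scikit-learn and synthetic stand-ins for the survey and EO inputs (the variable names, data, and model settings are illustrative assumptions, not our implementation):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
POVERTY_LINE = 1.90  # 2011 PPP $ per person per day

# Synthetic stand-ins for the survey and EO inputs (hypothetical, illustration only).
n_hh, n_clusters = 2000, 100
cluster_id = rng.integers(0, n_clusters, n_hh)
assets = rng.normal(size=(n_hh, 10))                                   # household productive assets
consumption = np.exp(0.3 * assets[:, 0] + rng.normal(0, 0.5, n_hh))    # $/person/day
eo_features = pd.DataFrame(rng.normal(size=(n_clusters, 8)))           # cluster-level EO features

# Stage 1: structural consumption = expected consumption given productive assets.
stage1 = RandomForestRegressor(n_estimators=500, random_state=0).fit(assets, consumption)
c_hat = stage1.predict(assets)

# Aggregate fitted values to a cluster-level structural poverty headcount
# (survey and household-size weights omitted for brevity).
hh = pd.DataFrame({"cluster": cluster_id, "c_hat": c_hat})
p0_s = hh.groupby("cluster")["c_hat"].apply(lambda c: (c < POVERTY_LINE).mean())

# Stage 2: predict cluster-level structural poverty from EO features.
stage2 = RandomForestRegressor(n_estimators=500, random_state=0).fit(
    eo_features.loc[p0_s.index], p0_s
)
```

The trained second-stage model can then be applied to EO features for unsurveyed clusters, yielding structural poverty estimates in consumption-expenditure units.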
Data Fusion for Microlevel Poverty Estimation.
Multiple data fusion methods have been developed to address gaps in the availability of survey-based poverty estimates. For decades, researchers and practitioners have used and refined techniques that leverage census data on the covariates of poverty to produce more precise and unbiased small area estimates (SAEs) (13–16). First, a model of poverty as a function of household or area-level characteristics is estimated using sample-based survey data that include consumption expenditures (or income). The resulting parameterized model is then used to predict poverty at more granular scales from the same household or area-level characteristics available for the entire population in the census data. These SAEs offer insight into spatial patterns of poverty but are published infrequently and often with long lag times.* Further, such poverty maps are not designed for intercountry comparability and cannot be easily customized because of the proprietary nature of the underlying data.
Newer ML based methods that harness EO and other geospatial “Big Data” have proven capable of generating more granular estimates of relative deprivation within as well as across low- and middle-income countries (1–3, 6, 8).† Instead of using census data, researchers derive area-level characteristics from cell phone records, satellite imagery, and various EO-based data products—including publicly available sources (1–3, 8). These data are used in concert with various ML methods that are well suited to handle large feature sets and model nonlinear relationships.
Among efforts to leverage geospatial data to interpolate and extrapolate into unsurveyed places, individual country studies in the SAE tradition have frequently retained flow-based monetary measures as their object or predictand. In contrast, multicountry studies have favored indices of asset holdings to proxy spatial patterns in poverty, with several advantages. Survey data collection of asset stocks is easier, cheaper, and less prone to substantial measurement error than flow measures of well-being like consumption expenditures or income. As a result, high-quality asset data are more often available to train the ML models. Productive assets are the stocks that generate income flows that enable consumption expenditure. Thus the connection between asset-based wealth indicators and income- or expenditure-based poverty measures follows intuitively. Indeed, the literature on asset-based, structural poverty demonstrates that, especially in poor places subject to multiple market failures that impede consumption expenditure smoothing, productive asset holdings reflect expected, permanent income (11, 20–22). Household assets and their correlates may also be more easily observed from EO. Satellite imagery can detect the size and quality of buildings, vehicles, and infrastructure but may overlook many short-term drivers of community-level consumption expenditures, such as disease outbreaks, labor market conditions, or price shocks. For these (and other) reasons, ML models trained on assets are more prevalent and consistently outperform models of monetary poverty and other well-being measures (2, 3, 23).
The result is that the poverty mapping literature has primarily produced maps of relative asset wealth. Meanwhile, practitioners primarily rely on monetary poverty measures based on income or consumption expenditure flows, anchored to interpretable normative thresholds, such as national and international poverty lines that represent a minimum acceptable standard of living as defined by governments and multilateral institutions. For example, under the first Sustainable Development Goal to “end poverty in all its forms everywhere,” the first target is to bring all people above the $2.15 (2017 purchasing power parity, PPP) per person per day extreme global poverty line by 2030 (24).‡ Progress toward such a goal cannot be tracked using an asset wealth index, which has no direct conversion to monetary poverty. While asset wealth indices may seem an intuitive proxy for consumption-based poverty, this assumed correlation is not always empirically well supported (25).
Unlike unit-less asset indices, estimates of monetary poverty can be compared over place and time using PPP conversions. Monetary measures are also flexible. For example, consumption expenditures data can be used to estimate the Foster-Greer-Thorbecke (FGT) class of distribution-sensitive poverty measures, including the “poverty gap” and “poverty gap squared” (26). The FGT measures take into account how far below the poverty line people’s incomes or consumption expenditures fall, and satisfy a range of desirable axiomatic properties (26, 27). The advantages of consumption-based poverty measures are balanced by the expense (and therefore scarcity) of high quality training data and the stochastic nature of consumption. Snapshots of monetary poverty may be dominated by transitory shocks or seasonality in income or expenditure patterns. This may obscure the chronic or structural deprivations of first-order humanitarian concern (11, 12, 28, 29).
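As a worked illustration of the FGT family referenced above, the short Python sketch below computes the headcount, poverty gap, and squared poverty gap for a hypothetical five-person cluster at a $1.90 line (unweighted, purely for exposition):

```python
# Illustrative FGT computation: alpha = 0 gives the headcount, 1 the average
# normalized shortfall (poverty gap), and 2 the average squared shortfall.
import numpy as np

def fgt(consumption, z, alpha):
    """Foster-Greer-Thorbecke measure P^alpha at poverty line z (unweighted sketch)."""
    c = np.asarray(consumption, dtype=float)
    poor = c < z
    shortfall = np.where(poor, (z - c) / z, 0.0)          # 0 for the nonpoor
    return float(np.mean(np.where(poor, shortfall ** alpha, 0.0)))

# Hypothetical five-person cluster at the $1.90 line: three of five are poor.
c = [1.00, 1.50, 1.80, 2.50, 4.00]
print(fgt(c, 1.90, 0), fgt(c, 1.90, 1), fgt(c, 1.90, 2))   # 0.60, ~0.147, ~0.054
```

The example shows why the gap and gap-squared measures are distribution-sensitive: pushing the poorest person further below the line raises P¹ and especially P² while leaving the headcount unchanged.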
As ML poverty mapping gains traction, it is a timely moment to examine the trade-offs in policy relevance, comparability, and accuracy that stem from using asset-based vs. consumption-based poverty maps—and to explore ways to mitigate these trade-offs. One way forward, which we advance in this paper, is to leverage both asset and consumption data to train the ML models that predict poverty from geospatial features. We introduce a set of structural poverty measures, based on the expectation of consumption expenditures given household asset holdings (11, 12), as predictands for microlevel multicountry poverty estimation. This paper outlines the conceptual advantages of these measures and develops a two-stage method for ML-based structural poverty mapping. We then implement and empirically evaluate this approach, testing contemporaneous out-of-sample predictive performance according to multiple simulated scenarios of data availability.
Our study area includes a set of countries in southern and eastern Africa that suffer from high poverty rates, are highly dependent on rainfed agriculture, and are subject to the types of market failures that are likely to inhibit consumption smoothing (11, 20–22). Models are trained and tested on data from 13 Living Standards Measurement Studies (LSMS) household surveys conducted in Ethiopia, Malawi, Tanzania, and Uganda between 2008 and 2020. These surveys are spatially and temporally matched to geospatial data on population density, building footprints, remoteness, night lights, elevation, slope, rainfall, temperature, and the Normalized Difference Vegetation Index (NDVI).
Results
We propose a set of structural poverty measures with desirable properties as the objects (or predictands) of ML poverty mapping. These structural poverty measures are stable and forward-looking because they are anchored to the stock of productive assets. They are also expressed in familiar flow-based units tied to a normatively meaningful standard of living (e.g., the share of people living below a poverty line). They can be compared across countries and over time. These attributes respond to the needs of humanitarian and development programming, which requires an understanding of both absolute and relative levels of deprivation, and must be responsive to poverty now and into the future.
To construct these structural poverty measures in the training data, we begin by introducing and modeling a more durable, asset-based analog to flow-based monetary measures: structural consumption (Methodology). Structural consumption is the expectation of consumption expenditures for a given portfolio of household assets. We use the household-level structural consumption estimates from these models to construct the FGT poverty headcount (P⁰ₛ), poverty gap (P¹ₛ), and poverty gap squared (P²ₛ) aggregates for each survey cluster sampling unit, where the subscript s denotes structural and the superscript is the FGT poverty aversion parameter α. These cluster-level structural poverty measures become the training data for the EO-based models. Because structural poverty is a latent variable, the performance of our EO models is validated against our survey-based estimates of structural poverty. These estimates in turn rely on assumptions about the strength and stability of the relationship between productive assets and consumption expenditures, and the stochastic nature of shocks to consumption.
We evaluate the strength of these assumptions for our study area prior to proceeding to train EO-based models from our structural poverty estimates. Our empirical assessment confirms the premise that productive assets are strong predictors of consumption expenditures, but with some limitations. In particular, we find that asset–expenditure relationships vary even across our study countries, which are geographically proximate and share many social and economic characteristics. Therefore, when we predict structural consumption out-of-country, differences emerge in the distributions of our structural estimates vs. realized consumption, which should be similar in expectation. This issue is mitigated when the model is trained using data from the target country or the pooled dataset of all study countries.
In addition to its conceptual advantages, we hypothesize that structural poverty can be more accurately proxied than realized consumption expenditures using ML models and EO data. This is supported by the comparative success in predicting assets over consumption in the literature (1, 2, 23). Our empirical results corroborate this expectation. We find that models of structural poverty consistently outperform models of comparable realized poverty measures, by multiple performance metrics and by a substantial margin. At the extreme poverty line of $1.90 per person per day, multicountry ML models predict approximately 72% (40% out-of-country) of cluster-level variation in the structural poverty headcount, compared to 57% (0% out-of-country) for a comparable realized poverty measure. At $3.20 per person per day, we predict approximately 78% (54% out-of-country) of variation in structural poverty and 69% (26% out-of-country) for realized poverty. Improvements of that scale are operationally meaningful for development practitioners.
Our results illustrate and elucidate the current limitations of EO-based poverty mapping, particularly weaker model performance at the lower end of the wealth distribution and the potential for bias when making out-of-country predictions. These problems persist but do not appear to be exacerbated by the structural poverty estimation approach.
Estimating Structural Poverty from Productive Assets.
When evaluating candidate structural consumption models, traditional performance metrics can be misleading. Perfect or near-perfect correlation between structural predictions and realized consumption expenditures would indicate overfitting. The advantage of a structural measure lies in its ability to filter out the “noise” of classical measurement error and stochastic shocks from which a household may have recovered by the time data are published and an agency had time to assimilate and act upon the poverty estimates. Instead, we seek a balance of fit and stability, along with evidence of unbiasedness, as we compare regressions of consumption on assets, including parametric ordinary least squares (OLS) first- (OLS-1) and second-order (OLS-2) polynomials and a random forest (RF) regression. We also consider an RF-classification model for (non)poor status, suitable only for poverty headcount (P⁰ₛ) estimation. Separate sets of models are estimated for each individual country and for the pooled (all-country) dataset and compared on a test set of held-out clusters or on entirely held-out countries.
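The sketch below illustrates this fit-vs-stability comparison in Python on synthetic data (it is not our code; the data-generating process, candidate settings, and the simple random split are assumptions, whereas the paper holds out survey clusters or countries):

```python
# Illustrative stage-1 model comparison: fit OLS-1, OLS-2, and RF regressions of
# (log) consumption on assets and compare train vs. test fit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 12))                                      # household asset features (synthetic)
y = 0.4 * X[:, 0] + 0.2 * X[:, 1] ** 2 + rng.normal(0, 0.5, 3000)    # log consumption (synthetic)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

candidates = {
    "OLS-1": LinearRegression(),
    "OLS-2": make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression()),
    "RF": RandomForestRegressor(n_estimators=500, min_samples_leaf=5, random_state=1),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: train R2 = {r2_score(y_tr, model.predict(X_tr)):.3f}, "
          f"test R2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```

A large gap between training and test fit flags the instability discussed above, while near-perfect test fit would suggest the model is reproducing transitory noise rather than structural consumption.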
Our structural consumption models confirm our prior from the literature that productive assets are strongly predictive of consumption expenditures, using both linear and nonlinear models. As depicted in Fig. 1, structural predictions and realized consumption estimates agree on the (non)poor classification of 62 to 82% of households at the global extreme poverty line ($1.90 a day in 2011 PPP), and 66 to 90% for the global poverty line ($3.20 a day in 2011 PPP).§ The RF models for structural poverty have overall stronger agreement with realized poverty estimates in the test set than the OLS models (see SI Appendix, Fig. S10 for additional measures of fit). However, while agreement with the test set is highest in the RF models, they are also less stable, with greater differences in these statistics from training to test set.
Fig. 1.
Comparison of realized vs. structural (non)poor classification. Panel A indicates classification agreement at a $1.90 poverty line, Panel B at a $3.20 poverty line (per person-day in 2011 prices). The circle indicates the share of households with the same classification in the training (open circle) and test (filled circle) data. The horizontal line is the difference between agreement in the training and the test set. Country models are based on training and test data from the same country, while leave-one-country-out (LOCO) models are trained on the pooled dataset excluding the test country.
These models are not intended to estimate causal relationships between specific variables and structural poverty. However, feature importance assessments confirm that our predictions incorporate information across multiple asset types, including access to electricity and technology (especially mobile phones), agricultural assets (such as land and livestock), and housing quality. Household size is the most highly ranked predictor of consumption, which may reflect labor supply and dependency ratios, or the limitations of per capita consumption estimates in accounting for household economies of scale (30, 31). The specific variables that contribute most to prediction vary by cross-validation geography, as shown in SI Appendix, Figs. S12–S14. This may partly reflect correlations among variables in related asset groups; for example, land and livestock ownership are correlated, as are housing quality characteristics (SI Appendix, Table S1).
In theory, our structural poverty models should produce unbiased estimators of realized consumption. Thus, while we anticipate lower variance in the structural vs. realized consumption distributions, we expect similar means at higher levels of aggregation (e.g., at the country level). In Fig. 2, we compare the realized and predicted distributions of consumption expenditures by country and by model. For the single-country models (left column), we observe no distinguishable difference in means for Malawi, Tanzania, and Uganda. In Ethiopia, predictions may be biased slightly upward (the mean of the RF predictions is $3.66, vs. $3.48 for realized consumption). For the pooled models (middle column), we still see no difference in means for Malawi and Tanzania, but the Uganda RF model now predicts mean consumption at $2.04 compared to the realized $1.87, and in Ethiopia, the difference is more pronounced compared to the single-country model. As we move to leave-one-country-out (LOCO) validation (right column), we detect differences in most models; the largest of these are for Malawi ($3.53 to $3.68 vs. the realized $2.70) and Tanzania ($3.02 to $3.13 vs. the realized $3.98).
Fig. 2.
Distributions and comparison of means for realized consumption and structural estimates. The mirrored density plots represent the distribution of realized consumption expenditures as well as structural estimates from OLS and RF models. Horizontal lines represent the mean of each distribution. The brackets at the top of each plot indicate whether a t test for the difference in means is statistically significant (statistical significance is indicated by not significant (ns), *, **, and ***). Panel rows (A–D) are grouped by country, and panel columns by CV strategy: 1) country CV, 2) multicountry pooled CV, and 3) leave-one-country-out CV.
Our approach assumes a stable relationship, across space and over time, between productive assets and consumption expenditure. Our empirical assessment suggests that this is a strong assumption, more likely to hold when models are trained on same-country data vs. the data of neighboring countries. This may reflect substantive differences across our study countries: For example, the returns to land or livestock depend on the asset quality as well as local agro-ecology, labor and agricultural markets, the quality of institutions and social safety nets, and other factors. Observed differences may also reflect inconsistencies in measurement: How assets and consumption are surveyed and aggregated by different national statistical agencies.
Importantly, the quality and productivity of assets may also vary systematically with poverty. Poor households may have lower-quality assets, or may live in places where the productivity of those assets is lower due to lack of access to markets, production technologies, institutions, or physical infrastructure. If so, our models might overpredict consumption for the poorest households and underpredict consumption of the wealthiest. Empirically, we cannot distinguish this heterogeneity in the asset–consumption relationship from differences that arise due to stochastic variation in consumption. For example, if the lowest and highest realizations of consumption expenditures arise due to classical measurement error or stochastic shocks, rather than true structural poverty, we would expect to see a reduction in the variance of the distribution. This reduction in variance is observed (SI Appendix, Fig. S11). The households with the lowest realized consumption expenditures in our data are predicted to be slightly better off in terms of structural consumption. The reverse is also true: The households with the highest realized consumption have relatively lower structural consumption. One way forward, particularly in settings where we have a sense of the magnitude of the undesirable component of this difference, would be to adjust predicted structural poverty estimates ex post. However, here we are unable to parse the desirable reduction in transient shocks and noisy data offered by structural poverty estimation from undesirable risk of bias due to model errors that correlate with poverty.¶ We thus proceed using unadjusted estimates from RF structural consumption models to build the training dataset for the EO models. However, we urge that this be kept in mind when interpreting final estimates, particularly for the distribution-sensitive P¹ₛ and P²ₛ measures that will be more affected by such biases.
Predicting Structural Poverty from Earth Observation.
The EO models trained on structural poverty demonstrate consistently superior predictive performance over models trained on realized poverty, with higher out-of-sample R² values, lower Root Mean Squared Error (RMSE), and higher Spearman’s rank correlation coefficients (ρ). RF models consistently outperform comparable OLS specifications, but the main finding—that we predict structural poverty more accurately than realized poverty—is consistent. Results from the RF models are summarized in Table 1. The summary OLS results are reported in SI Appendix, Table S2 and full results by geography and fold are reported in SI Appendix, Figs. S15–S20.
Table 1.
Summary out-of-sample performance for EO-based random forest models
Validation | R² (realized) | R² (structural) | RMSE (realized) | RMSE (structural) | Spearman’s ρ (realized) | Spearman’s ρ (structural)
---|---|---|---|---|---|---
A. FGT poverty measures, extreme global poverty line ($1.90) |  |  |  |  |  |
Poverty headcount (P⁰) |  |  |  |  |  |
country cv | 0.512 | 0.664 | 0.210 | 0.167 | 0.710 | 0.747
country sp-cv | 0.279 | 0.453 | 0.246 | 0.189 | 0.570 | 0.620
pooled cv | 0.573 | 0.716 | 0.207 | 0.174 | 0.736 | 0.832
pooled loco | 0.051 | 0.395 | 0.304 | 0.245 | 0.495 | 0.692
Poverty gap (P¹) |  |  |  |  |  |
country cv | 0.429 | 0.590 | 0.098 | 0.058 | 0.708 | 0.756
country sp-cv | 0.128 | 0.238 | 0.121 | 0.067 | 0.554 | 0.630
pooled cv | 0.521 | 0.649 | 0.106 | 0.054 | 0.754 | 0.853
pooled loco | 0.059 | 0.203 | 0.187 | 0.079 | 0.451 | 0.682
Poverty gap squared (P²) |  |  |  |  |  |
country cv | 0.368 | 0.477 | 0.060 | 0.027 | 0.703 | 0.720
country sp-cv | 0.051 | 0.012 | 0.074 | 0.030 | 0.521 | 0.598
pooled cv | 0.453 | 0.540 | 0.070 | 0.023 | 0.726 | 0.845
pooled loco | 0.065 | 0.003 | 0.129 | 0.033 | 0.404 | 0.647
B. FGT poverty measures, global poverty line ($3.20) |  |  |  |  |  |
Poverty headcount (P⁰) |  |  |  |  |  |
country cv | 0.654 | 0.750 | 0.182 | 0.159 | 0.714 | 0.703
country sp-cv | 0.371 | 0.582 | 0.193 | 0.172 | 0.575 | 0.577
pooled cv | 0.687 | 0.777 | 0.174 | 0.165 | 0.721 | 0.774
pooled loco | 0.263 | 0.544 | 0.260 | 0.214 | 0.506 | 0.672
Poverty gap (P¹) |  |  |  |  |  |
country cv | 0.604 | 0.741 | 0.120 | 0.081 | 0.759 | 0.840
country sp-cv | 0.299 | 0.562 | 0.149 | 0.093 | 0.539 | 0.677
pooled cv | 0.659 | 0.802 | 0.122 | 0.084 | 0.792 | 0.866
pooled loco | 0.043 | 0.542 | 0.206 | 0.127 | 0.506 | 0.736
Poverty gap squared (P²) |  |  |  |  |  |
country cv | 0.534 | 0.696 | 0.090 | 0.056 | 0.744 | 0.833
country sp-cv | 0.213 | 0.446 | 0.115 | 0.059 | 0.588 | 0.622
pooled cv | 0.579 | 0.757 | 0.098 | 0.053 | 0.777 | 0.874
pooled loco | 0.067 | 0.451 | 0.173 | 0.075 | 0.482 | 0.717
C. Asset wealth index (single model; values shown under the R², RMSE, and ρ columns) |  |  |  |  |  |
country cv | 0.693 |  | 7.109 |  | 0.802 |
country sp-cv | 0.459 |  | 8.331 |  | 0.628 |
pooled cv | 0.774 |  | 7.321 |  | 0.811 |
pooled loco | 0.507 |  | 9.318 |  | 0.706 |
Notes: Diagnostic statistics are the median over folds and geographies. The corresponding disaggregated performance statistics are plotted in SI Appendix, Figs. S15–S20.
To ensure that the superior performance of the structural models is not simply a product of the noisier consumption data (of concern particularly for the R² metric), we also compare the EO-based model trained on realized consumption against the test set of structural poverty estimates. Our main result is robust to this alternative validation: The structural EO model consistently outperforms the realized EO model when both are evaluated against structural poverty. In other words, the EO-based model trained directly on consumption expenditures does not appear to be indirectly learning about structural poverty.
Our models consistently perform best when trained on data that spatially overlaps with the test set.# For example, using standard (vs. LOCO) cross-validation, our pooled model for the poverty headcount at $1.90 has a median R² of 0.72 (vs. 0.40), RMSE of 0.17 (vs. 0.21), and ρ of 0.83 (vs. 0.74). To visualize this, Fig. 3 plots the first- and second-stage out-of-sample structural predictions for all three spatial approaches to cross-validation (Data Splitting), as well as realized consumption expenditures for comparison (left-most panel).
Fig. 3.
Maps of Poverty Headcount at the extreme poverty line. For comparison, the leftmost panel for each row indicates cluster poverty rates estimated directly from realized consumption in the survey data. The remaining panels on the Top row are predictions from the asset–consumption models into the test sets (combining model results from cross-validation). The corresponding maps in the Bottom row are predicted from EO data trained on the structural poverty estimates.
This result is consistent with the literature as well as expectations; what poverty “looks like” from a satellite view varies somewhat across even neighboring countries as the natural, social, and economic systems differ across contexts. Measurement error may also be correlated by country, survey, and even spatially within surveys due to the enumerators or the way that people answer questions about consumption. In sum, we may have both differences in the true underlying structural poverty model and in our ability to detect these relationships across settings.
Accordingly, performance across our country-specific models is heterogeneous. Uganda stands out for its uneven performance across evaluation metrics and specifications. For example, for the P⁰ₛ at $1.90 predictand with standard cross-validation, the R² values across folds are consistently positive for Ethiopia (0.75 to 0.87), Malawi (0.66 to 0.75), and Tanzania (0.41 to 0.56), but range widely for Uganda (−0.88 to 0.77). There may be several reasons for this, including the aforementioned issues of data quality or a fundamentally weaker correlation between our EO features and structural poverty in Uganda. However, we suspect that it at least in part reflects the small sample size for Uganda: With only 245 clusters, we may simply not have enough data to reliably train a model. We have substantially more data for the remaining study countries, with 1047 (Ethiopia), 1691 (Malawi), and 1642 (Tanzania) unique clusters. Back-of-the-envelope calculations based on the LSMS data catalog suggest that the latter cases are more typical, at least for LSMS-covered countries.‖
For comparison and to better situate our findings in the literature, we also predict asset wealth using a comparable model and EO feature set. Results are reported in Part C of Table 1. However, while the structural and realized estimates are FGT measures, the asset index is instead aggregated to the cluster level using a simple mean. This limits comparability: To which poverty line and to which FGT measure do we compare? We cannot compare RMSE across the dependent variable types, and the R² may also be sensitive to differences in the distribution, variance, and quality of the data. Still, it is encouraging that for the structural poverty headcount models (P⁰ₛ at $1.90 and $3.20), the R² and ρ are in the same general range as those for the asset wealth index: Neither of these two dependent variables demonstrates a clear and consistent performance advantage vis-à-vis the other for all models.
Our goal is to improve the relevance of the dependent variable for ML poverty mapping without compromising our ability to predict it. To usefully compare and contrast across predictands and samples requires a model and feature set with good predictive performance, which we achieve using an RF model and a suite of EO-derived variables. When predicting structural poverty headcounts we achieve R² values (the most commonly reported metric in this literature) of 0.64 to 0.79 for the pooled cross-validation and 0.30 to 0.61 for the LOCO CV.** In comparison, a previous effort using satellite imagery and deep learning to predict consumption and assets for a similar study area achieved R² of 0.36 to 0.52 for consumption and 0.46 to 0.63 for asset wealth using LOCO CV (2). Another study that trained ML models on asset wealth data from 23 African countries achieved an average R² of 0.70 for held-out country-years (1). An asset-wealth model trained on data from 56 low- and middle-income countries (LMICs) achieved an average R² of 0.70 using basic cross-validation and 0.59 using LOCO CV (6). Using an approach that combines inference from interpretable features and satellite imagery from 25 countries in Africa, another recent study achieved an average R² of 0.85 for country-level CV and 0.88 for LOCO prediction of an asset wealth index (8).
In sum, it appears that our models and feature set offer solid performance despite the comparative simplicity and accessibility of our data and methods, and the limited sample of countries. We suspect that the small size of the clusters in the LSMS data (from 6 to 16 households) is also a limiting factor for model performance (7). While it is useful to situate our performance within the literature and R² is an intuitive metric, we caution that differences in the data, study areas, and approaches to validation complicate comparison of these values across studies. The R² may also not be the most important metric. For example, the relative ordering of clusters (as captured by metrics like the rank correlation coefficient) sometimes matters more for targeting the distribution of a limited aid budget. Further, all three of the performance metrics considered thus far (R², RMSE, and ρ) are agnostic to heterogeneity in predictive performance. Previous studies have shown that performance at the low end of the wealth distribution tends to be the weakest; a promising average statistic may reflect the ability to distinguish “wealthier clusters from poorer clusters rather than in separating the poor from the near poor” (1). To investigate this important issue, we consider heterogeneity as well as how model performance changes when predicting the more distributionally sensitive measures.
Consistent with previous studies, we find that our models systematically underpredict the poverty headcount and overpredict asset wealth for the poorest clusters. Consider again our benchmark pooled EO-models for the realized (P⁰) and structural (P⁰ₛ) poverty headcounts at $1.90 and the model predicting average asset wealth. All three of these models have good overall predictive performance and appear to be relatively unbiased estimators, with predicted means similar to the reference test sets. Yet the realized, structural, and asset models all predict that the poorest clusters are better off, in an absolute sense, than in the reference data. For the bottom quintile, the EO-model of realized consumption predicts (vs. the “ground-truth”) a poverty rate of 66% (vs. 89%), the structural model predicts a poverty rate of 67% (vs. 83%), and the asset model predicts a wealth index of 16.7 (vs. 10.6).††
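A minimal sketch of this kind of quintile-level diagnostic is shown below in Python; the cluster values are synthetic placeholders, and the paper's own quintiles are defined on the reference ("ground-truth") test data:

```python
# Compare mean predicted vs. observed cluster poverty within quintiles of the
# observed distribution (quintile 5 = highest observed poverty, i.e., poorest clusters).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
truth = rng.beta(2, 2, size=500)                           # stand-in observed cluster poverty rates
pred = np.clip(truth + rng.normal(0, 0.15, 500), 0, 1)     # stand-in model predictions

df = pd.DataFrame({"truth": truth, "pred": pred})
df["quintile"] = pd.qcut(df["truth"], 5, labels=False) + 1
print(df.groupby("quintile")[["truth", "pred"]].mean())
```

Systematic underprediction in the top (poorest) quintile of this summary is the pattern described above for all three predictands.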
Our findings across measures also suggest that these models’ ability to predict the magnitude of the gap, and especially the gap-squared, below an extreme poverty line is weaker than poverty headcount prediction. The proportion of the variation that we can predict out-of-sample (R²) declines slightly as we move from the poverty headcount (P⁰ₛ) to the poverty gap (P¹ₛ) and more steeply for the poverty gap squared (P²ₛ) measures, especially for the $1.90 poverty line and for the spatially out-of-sample predictions (spatial CV and LOCO CV). This is not unique to the structural poverty estimates; R² also declines for the realized poverty measures.
In contrast, the rank correlation coefficients are relatively stable across the predictands. This aligns with prior research showing that geospatial predictions are more effective at capturing the relative geography of poverty than its specific levels (32). This may arise because in our data relative poverty rankings appear to be relatively stable across the measures: In the realized survey data, the pairwise rank correlation coefficients (ρ) among the P⁰, P¹, and P² measures are 0.95 and 0.92. Rank correlations are similarly high for the structural estimates. Arguably, these rank correlation coefficient estimates are the most salient for policymakers or program managers in geographic targeting of the distribution of scarce resources. However, for applications that require accurate level estimates of poverty or wealth for the poorest places, existing geospatial data fusion techniques may be inadequate, especially when in-country training data are not available. Further research is needed; one potential approach is to oversample these populations of interest to enhance model training. However, this strategy risks fitting the model to measurement errors and stochastic shocks, which may be more prevalent at the lower end of the distribution.
A final dimension of interest is variation in performance across urban and rural areas. Previous studies suggest that much of the power behind machine-learning poverty prediction from satellite features in Africa relies on our ability to distinguish typically wealthier urban clusters from poorer rural ones (1). Our results are consistent with this finding. The overall median R² of 0.72 for our baseline $1.90 model is higher than the within-rural R² of 0.45 to 0.57. Meanwhile, our model performs poorly in explaining variation in poverty rates among urban EAs (R² of −0.74 to 0.20). Differences in rank correlation are less extreme but follow the same pattern, with a higher average ρ for the rural sample than for the urban sample.
While the evidence is insufficient to definitively explain the urban–rural performance difference, we offer some hypotheses for follow-on research. Several of our features, for example related to building footprints and nighttime lights, rank highly in terms of variable importance (SI Appendix, Fig. S21). Such features may be working to distinguish wealthier urban from poorer rural areas. Additional features, such as NDVI, slope and elevation, and climatology, also contribute to overall prediction. However, these features may be more helpful in differentiating poverty levels across rural areas highly dependent on rainfed agriculture, but less predictive in urban settings. This suggests the need for improved or expanded features focusing on urban poverty prediction. The offsetting of geospatial coordinates in our LSMS survey data may also be an issue for urban prediction, especially when wealthy and poor areas are in close proximity.‡‡ Moreover, it seems plausible that urban livelihoods simply depend more than rural ones on unobserved skills that are rewarded in labor markets or in enterprise performance but that are not directly observable in either survey or EO data, making explanation of within-urban variation a more challenging task than identifying rural-urban, or even within-rural differences.
Discussion
We argue that structural poverty holds promise as a policy-relevant and predictable object for machine learning poverty mapping. It is expressed in the same units as national and global policy objectives such as “[e]radicating extreme poverty for all people everywhere” under the first UN Sustainable Development Goal (24). Structural poverty is stable and forward-looking by construction; it is less sensitive to the classical measurement error and stochastic shocks that may quickly render maps based on realized consumption outdated. This interpretability and durability makes structural poverty estimates well-suited to inform development agendas that require medium- and long-term planning. They also have potential for measuring the geography of progress, but further research is needed to understand the dynamics of structural poverty mapping.
Of course, for applications such as the targeting of humanitarian aid in response to shocks policy makers need to understand both patterns of chronic deprivation and short-term impacts. Structural poverty estimation is designed for the former, capturing patterns of chronic deprivation but likely missing instances of acute, especially transitory poverty. Moreover, this paper does not test our ability to predict intertemporal changes in structural poverty using EO features. This remains an important question for future research, with implications for the appropriate use of structural poverty estimates. Quantifying uncertainty is another important avenue for future research. Due to the latent nature of structural poverty and the lack of access to census and other data that could better account for dependencies in our error terms, we do not provide quantitative uncertainty bounds for the estimates in this study.
We do not argue for structural poverty mapping to the exclusion of other efforts, especially important recent progress on mapping and forecasting shocks to consumption, food insecurity, or undernutrition (33, 34). Maps of asset wealth indices can be predicted quasi-globally and are useful complements to consumption-based poverty maps, even if they are imperfect substitutes. Human flourishing and deprivations are multidimensional and contextual, and pursuing a rich landscape of data products can eventually help us to understand the geographic intersections and discontinuities across measures.
In addition to its conceptual advantages, for a sample of four countries in southern and eastern Africa, we find that structural poverty is more easily predicted than realized poverty from an EO-based feature set. These differences are substantial. In our benchmark pooled multicountry model for the poverty headcount, the structural poverty measure has a higher R² (0.716 vs. 0.565), a lower RMSE (0.174 vs. 0.210), and a higher rank correlation coefficient (0.837 vs. 0.736) compared to models predicting realized poverty. In some specifications, asset indices may maintain a slight predictive advantage over structural poverty, but at the cost of interpretability and relevance to antipoverty policy.§§
The predictive accuracy of our models falls within the range of recently published multicountry poverty mapping efforts, but short of recent work that combines interpretable features and image-based deep learning (8). Our approach prioritizes accessibility: We use open-source data and models that can be run on a personal computer.¶¶ Combining structural poverty and deep- and transfer-learning could be a productive avenue for future research.
Our results suggest that bias in our structural poverty estimates is likely modest in the context of interpolation: For example, when we are predicting poverty using models trained on survey data from adjacent communities in the same country. But the likelihood of bias increases when we extrapolate, for example, into another country that is not represented in the training data. This has potential implications for the coverage of structural poverty estimation, as well as for other methods of FGT poverty estimation. The type of data we utilize to train the structural consumption models is of limited availability and, as is illustrated by the case of Uganda, performance is sensitive to the size of the training data. Meanwhile, asset data are more plentiful and available for more countries.
In theory, we could leverage our trained asset–consumption models to predict structural poverty in settings where only asset data are available. But our results give us pause about undertaking such extensions, given the substantial issues of bias that emerge even predicting into a neighboring country with many shared attributes. Going beyond the southern and eastern Africa context it will be necessary to adapt and recalibrate the structural poverty model. For example, incorporating savings and liabilities may be important in countries where these are more common.
All three sets of models—for realized poverty, structural poverty, and asset wealth—underestimate poverty in the poorest places. That this occurs across all three predictands suggests a fundamental limitation of current methods to predict extreme poverty from EO features. The same constraint appears to affect performance at the low end of the wealth distribution for imagery-based deep learning approaches (1, 2). This may arise because local labor markets, social safety nets, health, and other factors that are difficult to capture from satellite imagery or other geospatial features play a disproportionate role in the well-being of the poorest households. It likely also reflects noise or bias in the training data.
Even before we layer in data fusion and machine learning, survey-based consumption and poverty measurement is a topic of lively debate. Household consumption estimates are known to suffer from measurement errors, and those errors may inversely correlate with consumption or other markers of household welfare such as literacy and asset holdings (35). This has been shown to bias and decrease the accuracy of proxy means testing (36) and could similarly affect our first-stage structural poverty estimates. Long-standing questions around how best to adjust poverty measures for local consumption patterns and economies of scale within households have yet to be resolved (30, 31), and may affect geographic comparisons particularly between urban and rural populations, or across settings with different livelihoods or cultural norms regarding household structure. The small size of the household clusters in the LSMS data introduces random sampling error that will negatively impact our model performance (7). Sample bias is also a concern, particularly if there are fundamental differences between places that are and are not surveyed (37).
Our approach contributes to a broader literature on theory-informed machine learning prediction. Combining physical-science-based models with machine learning is already common in fields such as engineering, biomedical research, climate science, and hydrology (38). Integrating ML with theory-driven social science models presents added challenges: We must address inherent uncertainties and rely on strong assumptions. Yet, when the target phenomenon is a latent variable—or otherwise cannot be directly observed through remote sensing—two-stage modeling may improve prediction.
In time, improvements in algorithms or the availability of EO and other geospatial data products may improve our ability to detect the features of extreme poverty. But for now, high quality household surveys and survey-based research are needed to accurately understand the depth of deprivation among the poorest households and communities. Such data and analyses are similarly critical to the progression of ML microlevel poverty mapping in the future (4, 5).
Materials and Methods
Methodology.
A household is defined as structurally (non)poor if, in expectation, its portfolio of assets is associated with a (non)poor consumption expenditure level (11). Here, we are interested in the continuous analog to this binary concept of structural poverty, which we will refer to as structural consumption and denote by c*_it for household i in period t:
$c^{*}_{it} \equiv \mathbb{E}\left[c_{it} \mid A_{it}\right]$  [1]
where A_it is a vector of household productive assets. Unlike with a binary (poor and nonpoor) classifier, a continuous measure allows us to assess the depth of a household’s structural deprivations and to later construct aggregate poverty measures that capture the magnitude of any such shortfall.
Our approach is motivated by well-established theory that consumption expenditures flow from the average (i.e., permanent) income generated by productive assets. However, note that our definition of structural poverty is more flexible; we do not require that the parameter estimates describing the relationship between assets and consumption be unbiased to identify causal relationships, only that the expected income prediction is unbiased. Indeed, we acknowledge the bidirectional nature of the relationship: Assets generate income and consumption, while income supports the accumulation of assets.
If we assume that the differences between a household’s structural consumption and realized consumption are stochastic, due to random shocks and/or classical measurement error, we can estimate a regression model that relates household assets and consumption expenditures to identify the function f(·):
$c_{it} = f\left(A_{it}\right) + \varepsilon_{it}$  [2]
where ε_it are the idiosyncratic errors. The best function for prediction is unknown, so we test the comparative performance of parametric (first- and second-order polynomial regressions) as well as nonparametric RF models. We also evaluate an RF classification model, which predicts households’ (non)poor status. In our preliminary analysis, we consider the bias–variance (or approximation vs. overfit) trade-off between parametric and nonparametric models. We then use the most promising (RF-regression) model to construct estimates of structural poverty at the household level:
$\hat{c}^{*}_{it} = \hat{f}\left(A_{it}\right)$  [3]
Next, we construct the FGT poverty measures P⁰ₛ, P¹ₛ, and P²ₛ for the survey cluster of interest (26). Specifically, we calculate the estimated share of individuals## with consumption expenditures that fall below a national or international poverty line, or poverty headcount (P⁰ₛ); the average shortfall, or poverty gap (P¹ₛ); and the average squared poverty gap (P²ₛ):
$P^{\alpha}_{s,j} = \dfrac{1}{N_j} \displaystyle\sum_{i \in H_j} \left( \dfrac{z - \hat{c}^{*}_{i}}{z} \right)^{\alpha}$  [4]
where N_j is the total number of individuals in the sample, H_j denotes the sampled households at location j estimated to be below the poverty line z, and α is the FGT “poverty aversion” parameter. These are the poverty measures we wish to map; these estimates serve as the training data in the subsequent step.
Next, we consider how to project these estimates of structural poverty into areas not covered by household surveys. We use different CV strategies (described below) to simulate common scenarios where this type of interpolation or extrapolation might be useful to decision-makers. We train OLS and RF regression models to predict structural poverty for cluster j from a vector X_j of open-source EO and other geospatial features as

$P^{\alpha}_{s,j} = g\left(X_{j}\right) + u_{j},$  [5]

where u_j is an error term.
All RF models are fit to minimize RMSE. The household structural consumption models use a shared set of hyperparameters selected via grid search (SI Appendix, Figs. S1–S9), governing the number of trees, the minimum size of terminal nodes, and the maximum number of variables sampled as candidates at each split. For the cluster-level models, the number of trees and the model hyperparameters are instead tuned individually using a 10-by-10 grid search. Model tuning is further described in SI Appendix.
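As an illustrative sketch (not our tuning code), the following Python snippet runs a comparable grid search that minimizes cross-validated RMSE; the data, grid values, and scikit-learn parameter names are assumptions standing in for the quantities described above:

```python
# Grid search over RF hyperparameters analogous to minimum terminal node size
# (min_samples_leaf) and variables sampled per split (max_features), minimizing RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(800, 20))          # cluster-level EO features (synthetic)
y = rng.uniform(0, 1, 800)              # cluster-level structural poverty headcount (synthetic)

grid = {
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "max_features": [0.2, 0.4, 0.6, 0.8, 1.0],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=3),
    grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, "CV RMSE:", -search.best_score_)
```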
We evaluate model performance using the coefficient of determination (R²), the RMSE, and a rank correlation coefficient, Spearman’s ρ. The R² is the most commonly cited performance measure in the poverty mapping literature, and offers an intuitive measure of the degree of variation in the dependent variable that is explained by the model. However, it is sensitive to properties of the data used for validation, such as the variance and measurement error (39). RMSE may be a more reliable indicator of performance, except if we wish to compare across different types of dependent variables, for example a poverty headcount vs. an asset index. Finally, rank correlation coefficients may be a particularly useful diagnostic for applications such as the geographic targeting of humanitarian or development aid, when we are most interested in the relative ordering of communities rather than their absolute levels of deprivation.
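The three diagnostics can be computed as follows; this is a self-contained Python illustration with small hypothetical arrays rather than our cluster-level predictions:

```python
# Compute R2, RMSE, and Spearman's rho for out-of-sample cluster predictions.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([0.10, 0.35, 0.50, 0.72, 0.90])   # observed cluster poverty rates (hypothetical)
y_pred = np.array([0.18, 0.30, 0.65, 0.55, 0.80])   # model predictions (hypothetical)

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
rho, _ = spearmanr(y_true, y_pred)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}, Spearman's rho = {rho:.3f}")
```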
Data Splitting.
We use three complementary nested cross-validation approaches that allow us to assess performance in reference to different use cases:
k-fold cross-validation: First, we split the data for each country into five folds based on a random draw of the enumeration areas, or clusters. We also implement a multicountry, or pooled, version of the k-fold cross-validation. This approach simulates predictive performance for interpolation in surveyed areas. For example, if we have cluster-sampled household survey data for the country or countries of interest, this approach simulates performance predicting into the unsurveyed clusters.
Spatial k-fold cross-validation: We also implement a spatially stratified variation of the k-fold cross-validation. Here, the test fold is geographically distinct from the training data to avoid overestimating performance due to spatial autocorrelation (6, 40). This would be analogous to a use case where we have survey data for the country of interest, but not for all regions, and therefore need to spatially extrapolate within the same country.
Leave-one-country-out cross-validation: Finally, we test the validity of a pooled model for predicting into a country for which (we simulate that) there are no household survey data available. Here, we leave out each country in the dataset in turn, training the model on all other countries’ data. Extrapolation into unsurveyed countries requires stronger assumptions but also has the advantage of more training data.
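The sketch below illustrates the three splitting strategies at the cluster level in Python; the cluster table is synthetic, the 3-degree grid blocks are one simple (assumed) way to form spatially separated folds, and model fitting is omitted:

```python
# Sketches of the three data-splitting strategies, one row per cluster.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, GroupKFold, LeaveOneGroupOut

rng = np.random.default_rng(4)
clusters = pd.DataFrame({
    "country": rng.choice(["ETH", "MWI", "TZA", "UGA"], size=400),
    "lon": rng.uniform(29, 42, 400),
    "lat": rng.uniform(-12, 15, 400),
})

# 1) k-fold CV: random five-fold split of clusters (single-country or pooled).
kfold_splits = list(KFold(n_splits=5, shuffle=True, random_state=4).split(clusters))

# 2) Spatial k-fold CV: group clusters into coarse grid blocks so that test folds
#    are geographically separated from the training data.
blocks = (clusters["lon"] // 3).astype(int).astype(str) + "_" + (clusters["lat"] // 3).astype(int).astype(str)
spatial_splits = list(GroupKFold(n_splits=5).split(clusters, groups=blocks))

# 3) Leave-one-country-out CV: hold out each country in turn.
for train_idx, test_idx in LeaveOneGroupOut().split(clusters, groups=clusters["country"]):
    held_out = clusters.iloc[test_idx]["country"].unique()[0]
    print("held-out country:", held_out, "| training clusters:", len(train_idx))
```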
Household Survey Data.
Our approach requires data on consumption expenditures (or income) and productive assets, georeferenced at the microlevel. These are obtained from 13 LSMS surveys: For Ethiopia (2011–2012, 2013–2014, 2015–2016, and 2018–2019), Malawi (2010–2011, 2016–2017, and 2019–2020), Tanzania (2008–2009, 2010–2011, 2012–2013, 2014–2015, and 2019–2020), and Uganda (2011–2012); see SI Appendix for a link to the survey details and data. We use the published consumption expenditure aggregates from the respective datasets, which have been constructed by aggregating across several categories of consumption and then adjusting for regional cost-of-living differences. According to survey documentation, these are broadly consistent in their construction across the countries and surveys in our sample. We convert all values to 2011 purchasing-power-parity US dollars. Our asset index, which is not preconstructed in the LSMS data, is calculated following the data reduction techniques used to consolidate and harmonize asset data across Demographic and Health Surveys (41–44). We implement a broad definition of productive assets: the stocks that generate the income that enables consumption expenditure. This includes human capital, land, livestock, capital equipment and buildings, and water and sanitation. Details of the procedure and specific assets are described in SI Appendix.
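For readers unfamiliar with the DHS-style data reduction referenced above, a generic sketch is to take the first principal component of standardized asset indicators; the variables below are synthetic placeholders, and the exact variable set and harmonization steps we use are documented in SI Appendix:

```python
# Generic first-principal-component asset index (illustrative sketch only).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
asset_indicators = rng.integers(0, 2, size=(1000, 15)).astype(float)  # 0/1 ownership dummies (synthetic)

standardized = StandardScaler().fit_transform(asset_indicators)
asset_index = PCA(n_components=1).fit_transform(standardized)[:, 0]   # unit-less household wealth score
```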
Geospatial Features.
Our geospatial predictors consist of interpretable features, known to correlate with poverty and/or wealth, derived from publicly available data sources. Because our data are georeferenced at the cluster level with some random displacement to preserve anonymity, we extract survey-year averages of our geospatial variables for a 2 km buffer radius in urban areas and a 5 km buffer radius in rural areas (unless otherwise noted). Feature values are contemporaneous to the survey data unless otherwise noted; we include lags for some variables based on the expected temporal relationship. Large datasets were accessed via and preprocessed in Google Earth Engine, with dataset construction in R.
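The buffer logic can be sketched as follows; the points, column names, UTM zone, and raster path are placeholders, and our actual extraction was performed in Google Earth Engine rather than with these libraries:

```python
# Buffer cluster centroids by 2 km (urban) or 5 km (rural) before averaging raster features.
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

rng = np.random.default_rng(7)
pts = [Point(xy) for xy in zip(rng.uniform(33, 36, 50), rng.uniform(-10, -8, 50))]
clusters = gpd.GeoDataFrame({"urban": rng.integers(0, 2, 50)}, geometry=pts, crs="EPSG:4326")

clusters_m = clusters.to_crs(epsg=32736)                              # project to meters (UTM 36S, illustrative)
radius_m = clusters_m["urban"].map({1: 2000, 0: 5000}).to_numpy()     # 2 km urban, 5 km rural
buffers = clusters_m.geometry.buffer(radius_m).to_crs(epsg=4326)

# Per-buffer means of a raster could then be taken, e.g., with rasterstats (placeholder file name):
# from rasterstats import zonal_stats
# clusters["ndvi_mean"] = [s["mean"] for s in zonal_stats(buffers, "ndvi_annual_mean.tif", stats=["mean"])]
```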
The geocoordinates for the LSMS data are at the cluster level and include a small random offset to preserve the privacy of the study population. Previous research suggests that this has minimal impacts on point estimation of poverty from geospatial data; however, it could impact estimates of uncertainty (32). For these and other reasons, this study does not include such uncertainty estimates.
Several of our features relating to geography and demography are time-invariant or slow-moving. We include building footprints, obtained from the Open Buildings Project Version 2 (45), which are a reliable indicator of human settlement and socioeconomic conditions on the ground (7, 10, 46). Average slopes and elevations are computed via Google Earth Engine based on data from the NASA Shuttle Radar Topography Mission to capture geophysical constraints on economic development (47). We also include travel time to the nearest urban center, a known correlate of prosperity, from the Malaria Atlas Project (48).
We draw on time-series data for features that vary substantively over the study period. Population count and density, which have been shown to be predictive of asset wealth in previous studies (6, 8), are derived from data by WorldPop (49, 50). A three-year average of nighttime lights is included as a proxy for economic activity (51). Given the time span of our dataset (2008–2020), we use a nighttime lights product that harmonizes data from the Defense Meteorological Satellite Program (1992–2013) and the Visible Infrared Imaging Radiometer Suite (2012–2018) (52, 53).
Climatic conditions and episodes of heat and water stress may impact people’s well-being through multiple avenues, especially via conditions for agriculture and livestock. We use the Climate Hazards group Infrared Precipitation with Stations to construct variables for long-term rainfall patterns, annual rainfall, and rainfall z-scores (54). Binned temperature variables reflecting the hours above 30 degrees Celsius are constructed from the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) 2-meter air temperature (55). The NDVI is derived from the NOAA Climate Data Record of Advanced Very High-Resolution Radiometer Surface Reflectance (56). The NDVI is an indicator of greenness that has been shown to correlate with poverty in rural, agriculturally dependent settings (3, 40).
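Two of the climate feature constructions described above can be illustrated with a short sketch; the rainfall totals and hourly temperatures below are synthetic stand-ins for the CHIRPS and MERRA-2 inputs:

```python
# Rainfall z-score relative to the long-term record, and hours above 30 C per year.
import numpy as np
import pandas as pd

# Annual rainfall (mm) for a hypothetical cluster, 2008-2014.
annual_rain = pd.Series([820, 760, 905, 640, 710, 880, 590], index=range(2008, 2015))
rain_z = (annual_rain - annual_rain.mean()) / annual_rain.std()

# Hours above 30 C per year from a synthetic hourly 2-m air temperature series.
hourly_temp_c = pd.Series(
    np.random.default_rng(6).normal(24, 6, 24 * 365),
    index=pd.date_range("2019-01-01", periods=24 * 365, freq="h"),
)
hours_above_30 = (hourly_temp_c > 30).groupby(hourly_temp_c.index.year).sum()
```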
Although many public satellite sensors offer at least monthly revisits with near-global coverage, this does not imply that the EO data products used in this analysis are of equal quality for all locations. Indeed, compared to other regions of the world, bias and measurement error in EO data products are likely to be elevated in our study area, and perhaps more so in poor rural areas. This is due to the underrepresentation of Africa generally, and especially remote locations, in training and validation datasets used to develop EO products, such as in situ monitoring stations (57). Seasonally persistent cloud cover, which is common in parts of our study area, may also impact the quality of EO estimates, especially for NDVI and nighttime luminosity (58). These factors may adversely affect the accuracy of our estimates, and especially contribute to weaker predictive performance for the poorest places.
Supplementary Material
Appendix 01 (PDF)
Acknowledgments
This work was supported by the Cornell Atkinson Center for Sustainability. We thank Mo Alloush, Kathy Baylis, Luc Christiaensen, Rebecca Chaplin-Kramer, Peter Lanjouw, Yanyan Liu, Daniel Maggio, Linden McBride, Joshua D. Merfeld, Aleksandr Michuda, Rachel Neugarten, David Newhouse, Stephan Schmidt, Linda Steinhübel-Rasheed, and Hantao Wu for helpful conversations and feedback on this work. We also thank Cassian D’Cunha and the Cornell Center for Social Sciences for computational resources and support. Any remaining errors are solely ours.
Author contributions
E.T., D.S.M., and C.B.B. designed research; E.T., Y.R., P.S., and C.B.B. performed research; E.T., Y.R., and P.S. analyzed data; E.T., D.S.M., and C.B.B. provided overall project supervision; and E.T., P.S., and C.B.B. wrote the paper.
Competing interests
The authors declare no competing interest.
Footnotes
Reviewers: M.A., Hamilton College; K.B., University of California, Santa Barbara; and P.L., Department of Economics, Vrije Universiteit Amsterdam.
*This is in part because contemporaneous censuses and household surveys are scarce. Methods for SAE with disjoint census and consumption surveys have been developed (17, 18). See ref. 19 for a discussion of variations of the SAE approach suited to different data availability scenarios.
†Country-level work that uses geospatial data to build more directly on SAE methods is also gaining traction (7, 9, 10).
‡When the 2030 Agenda for Sustainable Development was released in 2015, this goal referenced the $1.25 per person per day (2005 PPP) extreme poverty line. This was later updated to $1.90 (2011 PPP) and most recently to $2.15 (2017 PPP). These updates are made primarily to adjust for inflation. The empirical portion of this paper employs the $1.90 (2011 PPP) poverty line, which was in effect at the start of this research.
§For brevity, we refer to these simply as the $1.90 and $3.20 poverty lines or the global extreme and global poverty lines henceforth. As previously noted, these thresholds have more recently been updated to 2017 PPP values of $2.15 and $3.65, respectively, which are substantively similar.
¶Although the first stage residuals are not strictly “errors,” we nonetheless consider whether they correlate with residuals in the second stage estimation. Reassuringly, the residuals across the two stages are slightly negatively correlated (SI Appendix, Fig. S23).
#Specifically, containing clusters from the same country and/or region; there is no test/train overlap of the clusters.
‖Based on survey information extracted from the World Bank LSMS data catalog (available at: https://microdata.worldbank.org/index.php/catalog/lsms), we estimate that the average number of EAs per survey is approximately 500 and that the average number of EA-year observations per country is approximately 1,500. However, as we did not verify that all required information was available for each of these surveys, these are likely to be overestimates.
**Our asset index models have R² values of 0.72 to 0.79 for pooled CV and 0.42 to 0.65 for LOCO CV.
††The values for the asset wealth index range from approximately 1 to 80.
‡‡Prior research finds that random offsets “do not meaningfully alter the estimated geography of poverty” (32). However, this finding is for slightly more aggregated units (vs. clusters based on enumeration areas), and the authors cite the need for further research in highly urbanized settings.
§§The benchmark pooled multicountry model for average asset wealth has an R² of 0.771. However, comparative performance of asset vs. structural models varies across models, and measures such as R² and RMSE are sensitive to the scale and distributions of the respective data.
¶¶With the possible exception of the grid searches used to tune the household structural consumption models. Even so, the performance of these models is not highly sensitive to the choice of hyperparameters: they still perform well with simpler (including software-default) approaches to model tuning. For those interested in further reducing computation time, we note that second-order polynomial OLS regressions achieve good performance in the first-stage structural poverty estimation; an illustrative sketch follows these footnotes.
##Consumption expenditures are estimated at the household level, then weighted based on household size.
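As referenced in footnote ¶¶, a second-order polynomial OLS regression can serve as a simpler first-stage alternative. The sketch below is illustrative only: the variable names are hypothetical and it is not the tuned model reported in the paper.

```r
# Illustrative sketch: second-order polynomial OLS for first-stage structural consumption.
# hh is assumed to contain log per capita consumption and household-level predictors.
fit <- lm(log_pc_cons ~ poly(asset_index, 2) + poly(hh_size, 2) + urban, data = hh)
summary(fit)$r.squared  # in-sample fit; out-of-sample CV would be used in practice
```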
Contributor Information
Elizabeth Tennant, Email: ejt58@cornell.edu.
Christopher B. Barrett, Email: cbb2@cornell.edu.
Data, Materials, and Software Availability
All author-generated data and code for this project are included in the replication package, available in the Zenodo repository (59). All source data utilized for this project are publicly available for noncommercial use, but may not be included directly in the replication package due to file size or permissions. Code is written in R. Large datasets were accessed and preprocessed in Google Earth Engine.
References
1. Yeh C., et al., Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nat. Commun. 11, 2583 (2020).
2. Jean N., et al., Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794 (2016).
3. Browne C., et al., Multivariate random forest prediction of poverty and malnutrition prevalence. PLoS One 16, e0255519 (2021).
4. Burke M., Driscoll A., Lobell D. B., Ermon S., Using satellite imagery to understand and promote sustainable development. Science 371, eabe8628 (2021).
5. McBride L., et al., Predicting poverty and malnutrition for targeting, mapping, monitoring, and early warning. Appl. Econ. Perspect. Policy 44, 879–892 (2022).
6. Chi G., Fang H., Chatterjee S., Blumenstock J. E., Microestimates of wealth for all low- and middle-income countries. Proc. Natl. Acad. Sci. U.S.A. 119, e2113658119 (2022).
7. Engstrom R., Hersh J., Newhouse D., Poverty from space: Using high resolution satellite imagery for estimating economic well-being. World Bank Econ. Rev. 36, 382–412 (2022).
8. Lee K., Braithwaite J., High-resolution poverty maps in Sub-Saharan Africa. World Dev. 159, 106028 (2022).
9. Newhouse D., Merfeld J. D., Ramakrishnan A. P., Swartz T., Lahiri P., Small area estimation of monetary poverty in Mexico using satellite imagery and machine learning. World Bank Policy Res. Work. Pap. 10175 (2022).
10. Newhouse D., Small area estimation of poverty and wealth using geospatial data: What have we learned so far? Calcutta Stat. Assoc. Bull. 76, 7–32 (2024).
11. Carter M. R., Barrett C. B., The economics of poverty traps and persistent poverty: An asset-based approach. J. Dev. Stud. 42, 178–199 (2006).
12. Barrett C. B., Carter M. R., The economics of poverty traps and persistent poverty: Empirical and policy implications. J. Dev. Stud. 49, 976–990 (2013).
13. Ghosh M., Rao J. N. K., Small area estimation: An appraisal. Stat. Sci. 9, 55–76 (1994).
14. Rao J. N. K., Some recent advances in model based small area estimation. Surv. Methodol. 25, 175–186 (1999).
15. Elbers C., Lanjouw J. O., Lanjouw P., Micro-level estimation of poverty and inequality. Econometrica 71, 355–364 (2003).
16. Christiaensen L., Lanjouw P., Luoto J., Stifel D., Small area estimation-based prediction methods to track poverty: Validation and applications. J. Econ. Inequal. 10, 267–297 (2012).
17. Torabi M., Rao J. N. K., On small area estimation under a sub-area level model. J. Multivar. Anal. 127, 36–55 (2014).
18. Lange S., Pape U. J., Pütz P., Small area estimation of poverty under structural change. Rev. Income Wealth 68, S264–S281 (2022).
19. Corral P., Molina I., Cojocaru A., Segovia S., "Guidelines to small area estimation for poverty mapping" (Tech. Rep., World Bank, Washington, DC, 2022).
20. Sen A., Inequality Reexamined (Russell Sage Foundation, New York, 1992).
21. Carter M. R., May J., One kind of freedom: Poverty dynamics in post-apartheid South Africa. World Dev. 29, 1987–2006 (2001).
22. Sullivan J. X., Turner L., Danziger S., The relationship between income and material hardship. J. Policy Anal. Manag. 27, 63–81 (2008).
23. Head A., Manguin M., Tran N., Blumenstock J. E., "Can human development be measured with satellite imagery?" in Proceedings of the Ninth International Conference on Information and Communication Technologies and Development (ACM, Lahore, Pakistan, 2017), pp. 1–11.
24. United Nations General Assembly, "Transforming our world: The 2030 agenda for sustainable development" (Tech. Rep. A/RES/70/1, United Nations, 2015).
25. Pu C. J., et al., How poverty is measured impacts who gets classified as impoverished. Proc. Natl. Acad. Sci. U.S.A. 121, e2316730121 (2024).
26. Foster J., Greer J., Thorbecke E., A class of decomposable poverty measures. Econometrica 52, 761–766 (1984).
27. Foster J., Greer J., Thorbecke E., The Foster-Greer-Thorbecke (FGT) poverty measures: 25 years later. J. Econ. Inequal. 8, 491–524 (2010).
28. Naschold F., Barrett C. B., Do short-term observed income changes overstate structural economic mobility? Oxf. Bull. Econ. Stat. 73, 705–717 (2011).
29. Barrett C. B., Garg T., McBride L., Well-being dynamics and poverty traps. Annu. Rev. Resour. Econ. 8, 303–327 (2016).
30. Deaton A., The Analysis of Household Surveys: A Microeconometric Approach to Development Policy (The World Bank, 1997).
31. Jolliffe D., Tetteh-Baah S. K., "Identifying the poor - accounting for household economies of scale in global poverty estimates" (IZA Discussion Paper No. 15615, IZA Institute of Labor Economics, 2022).
32. van der Weide R., Blankespoor B., Elbers C., Lanjouw P., How accurate is a poverty map based on remote sensing data? An application to Malawi. J. Dev. Econ. 171, 103352 (2024).
33. Lentz E. C., Michelson H., Baylis K., Zhou Y., A data-driven approach improves food insecurity crisis prediction. World Dev. 122, 399–409 (2019).
34. Martini G., et al., Machine learning can guide food security efforts when primary data are not available. Nat. Food 3, 716–728 (2022).
35. Gibson J., Kim B., Measurement error in recall surveys and the relationship between household size and food demand. Am. J. Agric. Econ. 89, 473–489 (2007).
36. Gazeaud J., Proxy means testing vulnerability to measurement errors? J. Dev. Stud. 56, 2113–2133 (2020).
37. Merfeld J. D., Newhouse D., Improving estimates of mean welfare and uncertainty in developing countries. World Bank Policy Res. Work. Pap. 10348 (2023).
38. Karpatne A., et al., Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).
39. Corral P., Henderson H., Segovia S., Poverty mapping in the age of machine learning. J. Dev. Econ. 172, 103377 (2025).
40. Tang B., Liu Y., Matteson D. S., Predicting poverty with vegetation index. Appl. Econ. Perspect. Policy 44, 930–945 (2022).
41. Rutstein S. O., Johnson K., "The DHS wealth index" (DHS Comparative Reports No. 6, United States Agency for International Development, 2004).
42. Rutstein S. O., "The DHS wealth index: Approaches for rural and urban areas" (DHS Working Papers No. 60, United States Agency for International Development, 2008).
43. Croft T. N., Marshall A. M. J., Allen C. K., "Guide to DHS statistics" (Tech. Rep. DHS-7, ICF, Rockville, MD, 2018).
44. Smits J., Steendijk R., The international wealth index (IWI). Soc. Indic. Res. 122, 65–85 (2015).
45. Sirko W., et al., Continental-scale building detection from high resolution satellite imagery. arXiv [Preprint] (2021). http://arxiv.org/abs/2107.12283 (Accessed 28 February 2023).
46. Masaki T., Newhouse D., Silwal A. R., Bedada A., Engstrom R., Small area estimation of non-monetary poverty with geospatial data. Stat. J. IAOS 38, 1035–1051 (2022).
47. Farr T. G., et al., The shuttle radar topography mission. Rev. Geophys. 45, RG2004 (2007).
48. Weiss D. J., et al., A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553, 333–336 (2018).
49. Lloyd C. T., Sorichetta A., Tatem A. J., High resolution global gridded data for use in population studies. Sci. Data 4, 170001 (2017).
50. WorldPop, Global 1 km Population (2018). https://hub.worldpop.org/doi/10.5258/SOTON/WP00647. Accessed 30 January 2024.
51. Huang Q., Yang X., Gao B., Yang Y., Zhao Y., Application of DMSP/OLS nighttime light images: A meta-analysis and a systematic literature review. Remote Sens. 6, 6844–6866 (2014).
52. Li X., Zhou Y., Zhao M., Zhao X., A harmonized global nighttime light dataset 1992–2018. Sci. Data 7, 168 (2020).
53. Li X., Zhou Y., Zhao M., Zhao X., Harmonization of DMSP and VIIRS nighttime light data from 1992–2021 at the global scale (2023). https://doi.org/10.6084/m9.figshare.9828827.v8. Accessed 29 January 2024.
54. Funk C., et al., The climate hazards infrared precipitation with stations–a new environmental record for monitoring extremes. Sci. Data 2, 150066 (2015).
55. Global Modeling and Assimilation Office (GMAO), MERRA-2 tavg1_2d_slv_Nx: 2d, 1-Hourly, Time-Averaged, Single-Level, Assimilation, Single-Level Diagnostics V5.12.4 (2015). https://doi.org/10.5067/VJAFPLI1CSIV. Accessed 5 December 2022.
56. Vermote E., NOAA Climate Data Record (CDR) of AVHRR Normalized Difference Vegetation Index (NDVI), Version 5 (2018). https://doi.org/10.7289/V5ZG6QH9. Accessed 19 December 2023.
57. Sun Q., et al., A review of global precipitation data sets: Data sources, estimation, and intercomparisons. Rev. Geophys. 56, 79–107 (2018).
58. Jain M., The benefits and pitfalls of using satellite data for causal inference. Rev. Environ. Econ. Policy 14, 157–169 (2020).
59. Tennant E., Ru Y., Sheng P., Matteson D., Barrett C. B., Replication package for "Microlevel structural poverty estimates for southern and eastern Africa" (Version 1). Zenodo. https://doi.org/10.5281/zenodo.14645889. Deposited 14 January 2025.