Abstract
Exposure to air pollution has been associated with anemia in children, but little research has been conducted in low- and middle-income countries (LMICs), where the prevalence of anemia is persistently high. This study aimed to assess the effects of air pollution and household environmental indicators on anemia among children aged 6–59 months using machine learning algorithms. Demographic and Health Survey (DHS) datasets from 45 LMICs were linked with satellite-derived estimates of annual average fine particulate matter (PM2.5) and nitrogen dioxide (NO2) based on children’s area of residence. A modified Poisson regression model was used to assess the association between exposure to air pollutants, household environmental indicators, and the anemia status of children. Machine learning algorithms (MLAs) such as logistic regression, Ridge, Lasso, elastic net, Artificial Neural Network, Naïve Bayes, Boosting, and Random Forest were used to predict anemia status. We randomly split the dataset into training and test sets, and model performance was evaluated using sensitivity, specificity and the area under the receiver operating characteristic curve (AUC). The study included 177,251 under-five children, of whom 99,290 (56%) were anemic; prevalence varied across countries, ranging from 16% (Armenia) to 81% (Mali). Children living in areas with a PM2.5 concentration above the WHO recommended guideline had a 26% higher risk of being anemic (aPR = 1.26; 95% CI 1.22–1.30), and children from households with clean cooking fuel, improved water, and improved sanitation had a 24%, 3%, and 13% lower risk of being anemic, respectively. The random forest MLA achieved the best classification performance, with an accuracy of 68%, specificity of 54%, sensitivity of 79% and AUC of 74%. The MLA is more effective than traditional analytical approaches in predicting anemia status. The RF revealed that the age of the child, the number of wet days in the environment, the location of the child, and PM2.5 are the most important features for predicting the anemia status of a child.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-33837-3.
Keywords: Machine learning, Air pollution, Environmental quality, Anemia prediction, Low-middle income countries, Modified Poisson regression
Subject terms: Diseases, Environmental sciences, Environmental social sciences, Health care, Medical research, Risk factors
Introduction
Anemia is one of the most prevalent1 health concerns in the world and one of the ten most important health problems contributing to the global burden of disease2,3. It has nutritional (e.g., iron deficiency) and non-nutritional (e.g., infection) causes, and it affects approximately one-third of the global population, mainly pregnant women and preschool-aged children (6–59 months of age)4,5. Anemia has adverse effects on children’s immunity against diseases and on their cognitive development4–6. A pooled study in 47 low- and middle-income countries (LMICs)7 revealed that 57% of under-five children had anemia, and a similar study of 29 LMICs in sub-Saharan Africa (sSA)8 showed that 64% of children aged under five years had anemia. The highest burden of anemia among children is in sSA and South Asia, and it also varies within and between countries9. According to a 2016 World Bank report, the prevalence of anemia in sSA ranges from 36% (Rwanda) to 86% (Burkina Faso)10.
Furthermore, most people who live in LMICs are exposed to harmful levels of air pollutants, including PM2.5 and NO211–13. The annual mean PM2.5 concentration in sSA is 45 µg/m311, which is nine times the World Health Organization’s (WHO) recommended limit for healthy air (5 µg/m3)13,14. South Asian countries (Bangladesh, Nepal, India, and Pakistan) lead the world in PM2.5 pollution12. Recent studies conducted in China among elderly people15 and under-five children6, and studies conducted in America16 among elderly people, revealed that air pollution exposure is associated with an increased prevalence of anemia. Although abundant evidence supports the relationship between anemia status and air pollution (PM2.5 and NO2), most studies have focused on specific populations, such as the elderly15,16 and children6,17. The effects of air pollutants on anemia in the entire population have received less attention in LMICs.
Exposure to air pollutants (PM2.5, NO2) and household environmental quality indicators (cooking fuel type, availability of water, toilet facilities, etc.) have previously been associated with anemia8,16,18,19. However, the effects of those environmental exposures on anemia prevalence have not been given sufficient emphasis in LMICs, where the prevalence of anemia is persistently high7,9,20. Owing to the availability of large datasets and the development of robust computational approaches, machine learning algorithms have emerged as a promising alternative to classical statistical models for evaluating and detecting the effects of covariates on health-related outcomes21–25. Classical models have frequently been used to determine the factors that may affect anemia among under-five children9,16,18–20, but such models may fail to reflect the complex relations between covariates and/or may make oversimplified assumptions. The ML approach can handle interactions between multiple components and non-linear relationships22,26,27. To our knowledge, this is the first comprehensive analysis of the effects of air pollution and different household-level environmental qualities, as well as related factors, on anemia status among children aged 6–59 months across 45 LMICs using machine learning algorithms.
Materials and methods
Data sources and variables
The data used in this study were obtained from two sources: the Demographic and Health Survey (DHS), collected from 2009 to 2023 and described on the DHS Program website (https://dhsprogram.com), and satellite-derived estimates of ambient air pollution. The most recent DHS data from 45 LMICs and the global positioning system (GPS) coordinates (latitude, longitude) of their household clusters were included in the study (Fig. 1). Moreover, the geographical covariates were extracted from the Spatial Data Repository (https://spatialdata.dhsprogram.com/covariates/) and linked to the original individual DHS datasets using the cluster identification number (ID).
Air pollution data by enumeration areas: the DHS dataset has a separate shapefile of districts (enumeration areas) that contains cluster IDs with their GPS locations. These GPS locations were used to extract outdoor air pollution, measured as the global annual mean concentrations of fine particulate matter (PM2.5, in micrograms per cubic meter) and NO2, from National Aeronautics and Space Administration (NASA)28 and SeaWiFS satellite data29. The GeoTIFF datasets, which were estimated at a resolution of 0.01° × 0.01° (approximately 1 km × 1 km), were linked to the displaced GPS coordinates of DHS survey clusters [displaced by 0–2 km (urban) and 0–5 km (rural) for anonymization], connecting PM2.5 and NO2 concentrations to survey cluster locations. Hence, each child was assigned the average PM2.5 and NO2 exposure levels in their area of residence (cluster) at the time of the survey. The global annual ambient levels of PM2.5 and NO228 were the main exposure variables, and these variables were matched temporally with the calendar year in which the DHS surveys were performed in the given countries (Fig. S1 and Fig. S2). These environmental variables were extracted from the raster images (GeoTIFF) using the open-source R software via the GPS locations obtained from the DHS shapefiles. Children within a given cluster were assigned the same exposure (PM2.5 and NO2) estimates.
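The linkage between raster cells and cluster coordinates can be illustrated with a nearest-cell lookup on a regular latitude/longitude grid. The following is a minimal sketch only, assuming a north-up raster whose upper-left corner and cell size are known. The tile values, the corner coordinates, and the helper names `cell_index` and `cluster_exposure` are hypothetical and are not taken from the study, which performed the extraction in R.

```python
import math

def cell_index(lat, lon, top_lat, left_lon, cell_deg=0.01):
    """Return (row, col) of the raster cell that contains a point.

    Assumes a north-up grid: row 0 starts at top_lat and rows run south,
    column 0 starts at left_lon and columns run east.
    """
    row = math.floor((top_lat - lat) / cell_deg)
    col = math.floor((lon - left_lon) / cell_deg)
    return row, col

def cluster_exposure(grid, lat, lon, top_lat, left_lon, cell_deg=0.01):
    """Look up the raster value at a (displaced) cluster coordinate."""
    row, col = cell_index(lat, lon, top_lat, left_lon, cell_deg)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None  # the point falls outside the raster extent
    return grid[row][col]

# Hypothetical 3 x 3 tile of annual mean PM2.5 values (µg/m3).
pm25_tile = [
    [41.2, 40.8, 39.9],
    [42.0, 41.5, 40.1],
    [43.3, 42.7, 41.0],
]
TOP_LAT, LEFT_LON = 10.00, 2.00  # hypothetical upper-left corner of the tile

# Every child in a cluster receives the value found at the cluster location.
cluster_pm25 = cluster_exposure(pm25_tile, 9.985, 2.015, TOP_LAT, LEFT_LON)
outside = cluster_exposure(pm25_tile, 11.0, 2.015, TOP_LAT, LEFT_LON)
```

Because cluster coordinates are displaced by up to several kilometres, a single-cell lookup like this one is only an approximation; averaging cells within a buffer around the point is a common alternative.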
Based on recent evidence on health effects, the WHO updated its air quality guidelines in 202113, lowering the annual PM2.5 guideline value from 10 µg/m3 (2005 guidelines) to 5 µg/m3 and thereby significantly increasing its stringency. Moreover, for countries that exceed this threshold, the WHO recommends interim targets at 35, 25, 15, and 10 µg/m3 (interim targets 1 to 4, respectively) alongside the revised Air Quality Guideline threshold13.
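The WHO 2021 annual PM2.5 levels can be turned into an exposure category with a simple threshold lookup. The sketch below is illustrative only: the function name `classify_pm25` and the category labels are assumptions made here and are not the classification labels used in the study's figures.

```python
from bisect import bisect_left

# WHO 2021 annual PM2.5 levels (µg/m3): guideline value, then interim
# targets 4, 3, 2 and 1 in increasing order of concentration.
WHO_2021_LEVELS = [5, 10, 15, 25, 35]

# Illustrative labels (not taken from the study).
LABELS = [
    "meets guideline (<= 5)",
    "meets interim target 4 (<= 10)",
    "meets interim target 3 (<= 15)",
    "meets interim target 2 (<= 25)",
    "meets interim target 1 (<= 35)",
    "exceeds all interim targets (> 35)",
]

def classify_pm25(annual_mean):
    """Return the strictest WHO 2021 level that an annual mean satisfies."""
    return LABELS[bisect_left(WHO_2021_LEVELS, annual_mean)]

# The sSA annual mean quoted in the Introduction (45 µg/m3) exceeds every level.
ssa_category = classify_pm25(45)
```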
Fig. 1.
Blue line showing the DHS survey data collected from eligible countries, DHS data (2011–2023). (a) countries selected (b) total sample included.
U5C: number of under-five children aged 6–59 months, PSU: Primary Sampling Unit (clusters).
Ethical considerations
The datasets obtained from the DHS program are anonymized and freely available to the public. The DHS data collection protocol was approved by the ICF Institutional Review Board, and the data are publicly available at http://dhsprogram.com/data/available-datasets.cfm.
Study variables and measurements
Outcome variable: Anemia status was the outcome variable in the study. After consent is obtained from parents/guardians, the DHS collects blood samples from a finger or heel prick of children 6–59 months old and measures hemoglobin (Hb) levels using a portable battery-operated device30,31. Children who test positive for severe or moderate anemia are referred to nearby local healthcare facilities for appropriate care. In our analysis, children were categorized as anemic (any of mild, moderate, and severe) or non-anemic (Hb ≥ 110 g/L)7,30,32,33.
Independent variables: The independent variables were selected based on a review of the literature20,34–38. The variables included in the analysis are summarized in Table 1.
Table 1.
Descriptions of variables along with their categories.
| Variables | Descriptions | Categories |
|---|---|---|
| Outcome | Children aged 6–59 months with a hemoglobin (Hb) concentration less than 110 g/L, adjusted for altitude. Responses are classified into four categories based on the WHO recommendation. | (i) ‘not anemic’ for Hb of 110 g/L or above; (ii) ‘mild anemia’ for Hb of 100–109 g/L; (iii) ‘moderate anemia’ for Hb between 70 and 99 g/L and (iv) ‘severe anemia’ for Hb less than 70 g/L |
| Maternal age | Age of mothers in years | 15–29, 30–34, 35–39, 40–49 |
| Place of residence | Type of place of residence where the household resides | Urban, Rural |
| Maternal Education | Mother’s educational level | no education, primary, secondary, higher |
| Education_F | Father’s highest educational level | no education, primary, secondary, higher |
| Employment_M | Current working status of mother | working, not working |
| Wealth index (V190) | Household wealth quintile | poorest, poorer, middle, richer, richest |
| Media exposure | Access to media (TV, radio and newspaper) | yes, no |
| Woman’s autonomy | Overall score for woman’s autonomy (decision) | low, medium and high autonomy |
| NU5C | Number of under five children | 1–2, 3–4, 5+ |
| HHS | Household size | < 4, 5–9, 10+ |
| Districts | The administrative areas respondents reside | Second administrative levels |
| Sex of child | Sex of a child | Male, Female |
| Child’s age | Age of the child in months | 6–11, 12–23, 24–35, 36–59 |
| Comorbidity | A child having either fever, Diarrhea or acute respiratory symptoms in the 2 weeks preceding the survey | Yes, no |
| Geospatial and seasonal covariates/potential environmental drivers | | |
| Vegetation index value (EVI) | Vegetation index value between 0 (least vegetation) and 10,000 (Most vegetation) | Mean per year |
| Aridity | Index between 0 (most arid) and 300 (most wet) | Mean per year |
| Temperature | Mean annual temperature | Degrees Celsius |
| Temperature (maximum) | Yearly average maximum temperature | Mean degrees Celsius |
| NLST | The mean annual nighttime land surface temperature | Mean degrees Celsius |
| Wet days | The average number of days receiving rainfall | Mean (number of days) |
| DLST | The mean annual daytime land surface temperature | Mean degrees Celsius |
| Environmental exposures | Improved/clean | unimproved/unclean (solid/polluting) |
| Types of cooking fuels (fuel) | Electricity, liquid petroleum gas, natural gas and biogas | Kerosene, coal/lignite, charcoal, wood, straw/ shrubs/grass and animal dung |
| Water | Piped water, boreholes or tube wells, protected dug wells, protected springs, rainwater, and packaged or delivered water | Unprotected dug well, unprotected spring, river, dam, lake, pond, stream, canal and irrigation canal |
| Types of toilets | Flush/pour flush to piped sewer systems, septic tanks or pit latrines, ventilated improved pit latrines, composting toilets or pit latrines with slabs | Pit latrines without a slab or platform, hanging latrines or bucket latrines and open defecation |
| Particulate matter (PM2.5)1 | Mean annual surface PM2.5 level measured at a cluster-level at a 0.01 degree resolution in µg/m3, (the assumed place where child was born and live at the time of survey) | |
| Nitrogen dioxide (NO2) | Mean annual NO2 measured at a cluster-level, in µg/m3 at a 0.01 degree | |
1Global Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD), 1998–2019, V4.GL.03 (NASA Earthdata) and https://sites.wustl.edu/acag/datasets/surface-pm2-5/.
Methodology
The crude prevalence of anemia was determined after accounting for the complex survey design and sampling weights. To identify the environmental determinants of anemia, we used modified Poisson regression models with robust error variance and reported the results as adjusted prevalence ratios (aPR) with 95% confidence intervals. We chose this model over traditional logistic regression because the odds ratio estimated from a cross-sectional study may overestimate the relative risk when the outcome is common39,40, and because the model performs better in estimating the prevalence ratio when convergence issues arise41,42.
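The motivation for reporting prevalence ratios can be checked with simple arithmetic. The sketch below uses the crude counts for PM2.5 exposure reported in Table 2 (anemic/total above and below 5 µg/m3) to compare the crude prevalence ratio with the crude odds ratio. These are unadjusted quantities computed here for illustration, so they differ from the adjusted estimate (aPR = 1.26) reported by the model; the function names are hypothetical.

```python
def prevalence_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Prevalence among the exposed divided by prevalence among the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp

# Crude counts from Table 2: PM2.5 above vs below 5 µg/m3.
crude_pr = prevalence_ratio(97_293, 173_185, 1_997, 4_066)
crude_or = odds_ratio(97_293, 173_185, 1_997, 4_066)
print(f"crude PR = {crude_pr:.2f}, crude OR = {crude_or:.2f}")
```

With an outcome prevalence above 50%, the crude odds ratio (about 1.33) is noticeably further from 1 than the crude prevalence ratio (about 1.14), which is the overestimation the modified Poisson approach is meant to avoid.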
Data preprocessing and feature selection
The raw data obtained from the given sources underwent preprocessing procedures, including data cleaning, missing value imputation, and variable transformation, to ensure their suitability for statistical analysis22,23,43,44. The DHS dataset contains many features, and not all of them are important; hence, feature selection methods were employed to identify the most important predictors22,32,43. Including unnecessary variables during the model training phase can increase the model’s complexity and reduce its overall accuracy25.
Model Building
The datasets were randomly divided into training and testing sets with an 8:2 ratio, meaning that 80% of the data were used for training the model and the remaining 20% for testing it. This process was repeated with 7:3 and 6:4 splits, and 2-, 5-, and 10-fold [2 K, 5 K, and 10 K] cross-validation was then used to assess the impact of different training and testing ratios on the performance of the different machine learning algorithms (MLAs)24,45–49. To select suitable MLAs, we reviewed related works using MLAs on different childhood outcomes such as childhood nutrition status, anemia status, and mortality22,23,26,43,44,50. For this study, different supervised MLAs, namely generalized linear models (binary logistic regression51, Ridge52, Least Absolute Shrinkage and Selection Operator (LASSO) regression53 and Elastic Net47,54), Artificial Neural Network (ANN)55,56, Random Forest (RF)47,57, Naïve Bayes58–60 and Decision Trees60, were adopted for predicting the anemia status of children aged 6–59 months residing in 45 LMICs.
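The splitting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are represented only by row indices, the split is a simple (non-stratified) random shuffle, and the helper names `train_test_split_indices` and `k_fold_indices` are hypothetical. The study itself does not specify these implementation details.

```python
import random

def train_test_split_indices(n, train_fraction=0.8, seed=0):
    """Randomly split row indices 0..n-1 into training and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

def k_fold_indices(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

# An 80:20 split of 1,000 hypothetical records, then 2-, 5- and 10-fold
# cross-validation on the training portion only.
train_idx, test_idx = train_test_split_indices(1000, train_fraction=0.8, seed=42)
fold_sizes = {
    k: [len(validation) for _, validation in k_fold_indices(train_idx, k)]
    for k in (2, 5, 10)
}
```

Keeping the test indices out of the cross-validation loop means the held-out 20% is only used once, for the final evaluation.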
- Logistic regression (LR): the most widely used approach for classification problems. The LR model with the logit link function is given as:

$$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{x}_i^{\top}\boldsymbol{\beta} \tag{1}$$

where $\pi_i = P(y_i = 1 \mid \mathbf{x}_i)$, $y_i$ is the outcome variable for the $i$-th under-five child, and $\mathbf{x}_i$ and $\boldsymbol{\beta}$ are respectively the vector of covariates and the vector of coefficients51. However, LR has various limitations, including overfitting and computational difficulties when the number of features (dimensionality) is large51,53. To address these issues, regularization approaches such as ridge, lasso, and elastic net were used to impose a penalty term that shrinks the parameters towards zero47,52–54. The penalized likelihood formulations for the ridge (Eq. 2), LASSO (Eq. 3), and elastic net (Eq. 4) regression models are given by:
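$$\ell_{\text{ridge}}(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \lambda \sum_{j=1}^{p} \beta_j^{2} \tag{2}$$

$$\ell_{\text{lasso}}(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \lambda \sum_{j=1}^{p} |\beta_j| \tag{3}$$

$$\ell_{\text{enet}}(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \lambda \left[ (1-\alpha) \sum_{j=1}^{p} \beta_j^{2} + \alpha \sum_{j=1}^{p} |\beta_j| \right] \tag{4}$$

where $\ell(\boldsymbol{\beta})$ is the log-likelihood of the LR model in Eq. 1, $p$ is the number of covariates, and $\lambda$ is a penalty parameter applied to all the parameters except the constant51,52. Moreover, we trained the generalized linear model (GLM) estimators with common $\alpha$ values from the set {0, 0.5, 1}, where $\alpha$ = 0.0, 0.5 and 1.0 refer to the ridge, elastic net, and lasso penalties, respectively47,54,61.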
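The role of the mixing parameter can be illustrated numerically. The sketch below evaluates an elastic net penalty for the three mixing values used in the study and shows that the extremes reduce to the ridge and lasso penalties. It assumes one common parameterization, λ[(1 − α)Σβ² + αΣ|β|]; some software (for example, glmnet in R) places an additional factor of 1/2 on the squared term. The coefficient values and the function name `elastic_net_penalty` are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Penalty lam * [(1 - alpha) * sum(b^2) + alpha * sum(|b|)].

    beta holds the slope coefficients only; the intercept is not penalized.
    """
    l2 = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return lam * ((1.0 - alpha) * l2 + alpha * l1)

beta = [0.5, -1.0, 0.0, 2.0]  # hypothetical slope coefficients
penalties = {
    name: elastic_net_penalty(beta, lam=0.1, alpha=alpha)
    for alpha, name in [(0.0, "ridge"), (0.5, "elastic net"), (1.0, "lasso")]
}
```

The absolute-value term is what allows the lasso and elastic net to shrink some coefficients exactly to zero, whereas the squared term only shrinks them towards zero.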
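Naïve Bayes (NB): the classifier is based on Bayes’ theorem:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

where $P(C \mid X)$, $P(C)$, $P(X \mid C)$, and $P(X)$ are respectively the conditional probability (CP) of class $C$ given the features $X$, the prior probability of class $C$, the CP of $X$ given $C$, and the prior probability of $X$.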
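The quantities defined above can be combined in a small worked example. This is a minimal sketch of a naïve Bayes posterior for binary features, assuming conditional independence of the features given the class. The priors reuse the overall anemia prevalence reported in this study (56%), while the feature likelihoods and the function name `naive_bayes_posterior` are hypothetical.

```python
def naive_bayes_posterior(priors, likelihoods, x):
    """Posterior P(C | X) for binary features under the naive Bayes assumption.

    priors:      {class: P(C)}
    likelihoods: {class: [P(x_j = 1 | C) for each feature j]}
    x:           observed feature vector of 0/1 values
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for xj, pj in zip(x, likelihoods[c]):
            p *= pj if xj == 1 else (1.0 - pj)
        joint[c] = p  # P(X | C) * P(C)
    evidence = sum(joint.values())  # P(X)
    return {c: p / evidence for c, p in joint.items()}

priors = {"anemic": 0.56, "not_anemic": 0.44}
# Hypothetical P(feature = 1 | class) for two binary features,
# e.g. "unclean cooking fuel" and "urban residence".
likelihoods = {"anemic": [0.8, 0.3], "not_anemic": [0.7, 0.4]}
posterior = naive_bayes_posterior(priors, likelihoods, x=[1, 0])
```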
Random forest (RF): a popular supervised ML approach for both classification and regression, and also a popular approach to variable screening for dimension reduction62–64. For the RF algorithm, we used the randomForest package in R. To optimize the model’s performance, the number of trees, maximum depth, minimum node size, and other key settings were tuned.
Performance measures and model validation
To evaluate the performance of the given ML algorithms, a confusion matrix was used24,45–49,66, and accuracy was used to measure the proportion of correct predictions67. The relationship between model sensitivity and specificity is expressed using receiver operating characteristic (ROC) curves (Fig. 2), which were calculated from the true and predicted outcome of interest. All curves plotted above the diagonal line perform better than chance. The area under each curve (AUC) gives an aggregate value that can be interpreted as the probability that a randomly chosen anemic child receives a higher predicted score than a randomly chosen non-anemic child under each of the ML algorithms45,68. The identified best-fit model is then used to predict the anemia status in another dataset, known as the test dataset24,45–49.
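The performance measures described above can be computed directly from the true labels and predicted scores. The sketch below is illustrative: the labels, scores, 0.5 classification threshold, and function names are assumptions, and the AUC is computed through its rank interpretation rather than by integrating a plotted ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = anemic, 0 = not."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity_specificity_accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for six children.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.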
Fig. 2.
Overview flow chart of the machine learning algorithms used for predicting Anemia among under-five children.
Feature importance
Feature importance rankings illustrate the relevance of each feature in the decision-making process. To assess feature importance, we employed the mean decrease in impurity (MDI)43,44,69. Features that contribute more to lowering impurity are ranked higher, highlighting the most influential factors in predicting anemia. This insight is vital for prioritizing interventions aimed at addressing the key determinants of childhood anemia. We used STATA, R, SAS and GIS software for data management and statistical analysis.
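The impurity decrease that underlies MDI can be illustrated for a single split. The sketch below assumes Gini impurity as the splitting criterion, which is a common default for classification forests but is not stated in the text. The class counts and function names are hypothetical. In a full forest, the decreases are summed over all nodes that split on a given feature and averaged over the trees.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease produced by one split.

    parent, left, right are class counts [not anemic, anemic];
    n_total is the number of training samples in the tree.
    """
    n_p, n_l, n_r = sum(parent), sum(left), sum(right)
    return (
        (n_p / n_total) * gini(parent)
        - (n_l / n_total) * gini(left)
        - (n_r / n_total) * gini(right)
    )

# Hypothetical root split of 100 children on one feature.
decrease = impurity_decrease(
    parent=[50, 50], left=[40, 10], right=[10, 40], n_total=100
)
```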
Patient and public involvement
No patients were involved.
Results
A total of 242,340 children aged 6–59 months from the DHS data across 45 LMICs were included (Table 2). Of these, 73.1% had measured Hb levels, resulting in a final sample of 177,251 children included in the study. The overall prevalence of anemia among children aged 6–59 months in LMICs was 56.02%. The percentage was highest among children residing in Africa (62.15%). As indicated in Fig. 3, the prevalence of anemia among children aged 6–59 months varies widely across developing countries. Among the African countries included in the study, Mali (80.8%) and Mauritania (75.4%) had the highest prevalence of children with anemia, while Egypt (29.0%) and Rwanda (35.8%) had the lowest. Asian countries show similar variation in anemia prevalence estimates. Specifically, the highest prevalence of anemic children is found in Myanmar (55.6%) and a relatively low prevalence is found in Armenia (16.3%). In Latin America, the prevalence is remarkably high in Haiti (65.7%) and low in Honduras (29.6%).
Table 2.
Baseline characteristics and prevalence ratios (PR) estimated from modified Poisson regression.
| Covariates | Categories | Overall, n (%) | Anemic, n (%) | aPR estimate | 95% CI |
|---|---|---|---|---|---|
| Overall prevalence | | 177,251 (100) | 99,290 (56.02) | | |
| Locations | Africa® | 125,921 (71.04) | 78,265 (62.15) | 1 | 1 |
| | America | 23,210 (13.09) | 8,977 (38.68) | 0.83 | [0.80, 0.86]*** |
| | Asia | 28,120 (15.86) | 12,048 (42.84) | 0.62 | [0.60, 0.64]*** |
| Age of mother | 15–29® | 96,673 (54.54) | 55,485 (57.39) | 1 | 1 |
| | 30–34 | 38,511 (21.73) | 20,922 (54.33) | 0.98 | [0.96, 0.98]*** |
| | 35–39 | 26,089 (14.72) | 14,260 (54.66) | 0.96 | [0.95, 0.97]*** |
| | 40–49 | 15,978 (9.01) | 8,623 (53.97) | 0.95 | [0.94, 0.97]*** |
| Education_M | literate® | 124,219 (70.08) | 63,625 (51.25) | 1 | 1 |
| | illiterate | 53,030 (29.92) | 35,629 (67.19) | 1.07 | [1.06, 1.08]*** |
| Employment_M | yes | 149,055 (93.87) | 82,807 (55.55) | 0.99 | [0.98, 1.01] |
| | no® | 9,733 (6.13) | 5,721 (58.78) | 1 | 1 |
| Autonomous | Low® | 72,304 (40.79) | 42,259 (58.45) | 1 | 1 |
| | Medium | 57,844 (32.63) | 31,340 (54.18) | 0.99 | [0.98, 1.01] |
| | High | 47,103 (26.57) | 25,691 (54.54) | 0.99 | [0.98, 1.01] |
| Education_F | literate® | 109,578 (61.82) | 56,036 (51.14) | 1 | 1 |
| | illiterate | 67,673 (38.18) | 43,254 (63.92) | 1.05 | [1.03, 1.06]*** |
| Wealth Index | Poorest® | 47,546 (26.82) | 28,954 (60.90) | 1 | 1 |
| | Poorer | 38,854 (21.92) | 22,733 (58.51) | 0.98 | [0.97, 0.99]*** |
| | Medium | 35,208 (19.86) | 19,890 (56.49) | 0.96 | [0.95, 0.97]*** |
| | Richer | 30,704 (17.32) | 16,236 (52.88) | 0.93 | [0.92, 0.94]*** |
| | Richest | 24,939 (14.07) | 11,477 (46.02) | 0.83 | [0.82, 0.85]*** |
| Residence | Rural | 119,709 (67.54) | 69,904 (58.39) | 1.02 | [1.01, 1.03]** |
| | Urban® | 57,542 (32.46) | 29,386 (51.07) | 1 | 1 |
| NU5C | 1–2® | 136,281 (76.89) | 73,369 (53.84) | 1 | 1 |
| | 3–4 | 34,356 (19.38) | 21,331 (62.09) | 1.04 | [1.03, 1.05]*** |
| | 5+ | 6,614 (3.73) | 4,590 (69.40) | 1.03 | [1.02, 1.06]*** |
| HHS | < 4® | 44,760 (25.25) | 23,971 (53.55) | 1 | 1 |
| | 5–9 | 103,737 (58.53) | 57,156 (55.10) | 1.02 | [1.01, 1.03]* |
| | 10+ | 28,754 (16.22) | 18,163 (63.17) | 1.01 | [0.99, 1.03] |
| Child’s sex | Male® | 89,800 (50.66) | 51,314 (57.14) | 1 | 1 |
| | Female | 87,451 (49.34) | 47,976 (54.86) | 0.96 | [0.95, 0.97]*** |
| Child’s age (months) | 6–11® | 20,662 (11.66) | 14,954 (15.06) | 1 | 1 |
| | 12–23 | 41,403 (23.36) | 28,172 (28.37) | 0.95 | [0.94, 0.96]*** |
| | 24–35 | 38,964 (21.98) | 21,610 (21.76) | 0.78 | [0.77, 0.79]*** |
| | 36–59 | 76,222 (43.00) | 34,554 (34.00) | 0.64 | [0.63, 0.65]*** |
| Comorbidity | yes | 57,520 (32.45) | 35,459 (61.65) | 1.10 | [1.09, 1.12]*** |
| | no® | 119,731 (67.55) | 63,831 (53.31) | 1 | 1 |
| Fuel | Clean | 32,308 (18.62) | 13,999 (43.33) | 0.76 | [0.75, 0.77]*** |
| | Unclean® | 141,167 (81.38) | 84,212 (59.65) | 1 | 1 |
| Water | Improved | 94,690 (53.43) | 50,539 (53.37) | 0.97 | [0.96, 0.98]*** |
| | Unimproved® | 82,561 (46.58) | 48,769 (59.07) | 1 | 1 |
| Toilet | Improved | 133,816 (75.50) | 70,881 (52.97) | 0.87 | [0.86, 0.88]*** |
| | Unimproved® | 43,427 (24.50) | 28,425 (65.45) | 1 | 1 |
| PM2.5 | below 5 µg/m3® | 4,066 (2.29) | 1,997 (49.11) | 1 | 1 |
| | above 5 µg/m3 | 173,185 (97.71) | 97,293 (56.18) | 1.26 | [1.22, 1.30]*** |
| NO2 | below 10 µg/m3® | 167,781 (94.66) | 95,507 (56.92) | 1 | 1 |
| | above 10 µg/m3 | 9,470 (5.34) | 3,783 (39.95) | 0.86 | [0.84, 0.88]*** |
| Continuous EA level covariates, mean (sd) | | Not anemic | Anemic | | |
| EVI | | 2482.5 (5.22) | 2334.8 (4.64) | 0.97 | [0.96, 0.999]*** |
| Aridity | | 29.1 (0.07) | 27.4 (0.06) | 0.98 | [0.95, 0.99]* |
| Temperature | | 22.4 (0.02) | 24.29 (0.01) | 1.18 | [1.13, 1.22]*** |
| NLST | | 17.1 (0.02) | 18.7 (0.01) | 1.01 | [0.96, 1.05] |
| DLST | | 29.2 (0.02) | 30.5 (0.01) | 0.93 | [0.91, 0.96]*** |
| Wet Days | | 9.6 (0.02) | 8.9 (0.01) | 0.91 | [0.89, 0.93]*** |
EA: Enumeration areas, EVI: Enhanced Vegetation Index, DLST: day land surface temperature, NLST: night land surface temperature.
Fig. 3.
Anemia proportion by country. Note: Year of the survey collection for each country is mentioned next to its name. It is in ascending order by continent, and prevalence of anemia.
Changes in PM2.5 and NO2 levels have not been uniform across the LMICs over the last 24 years (Fig. S1 and Fig. S2). The prevalence of anemia among children in the first-level administrative areas (regions) of each country is shown in Fig. S3; the prevalence varies within each country. Figure 4 shows that most countries in Africa, such as Benin, Burkina Faso, Togo, Namibia, Senegal, Malawi and Mali, fall under the hazardous classification for PM2.5 levels. Some countries, including Liberia, are classified as safe, while others such as Mauritania, Sierra Leone, and South Africa fall under the low-risk classification. Guatemala in Latin America and Bangladesh, Cambodia and Nepal in Asia are also under the hazardous classification for PM2.5 levels. Using the WHO’s 2021 revised PM2.5 thresholds, our results showed that 173,185 (97.7%) of the under-five children are directly exposed to unsafe average annual PM2.5 concentrations (above 5 µg/m3), and 89% of the sampled children live in areas that exceed even the WHO’s 2005 guideline threshold of 10 µg/m3. Of the total sample of under-five children, 71% are directly exposed to at least high (over 15 µg/m3) PM2.5 levels, and 36.11% are exposed to hazardous (over 35 µg/m3) levels. Benin, Bangladesh, and Nepal had the highest mean PM2.5 exposures, while Liberia, Guyana, and Timor-Leste had the lowest mean annual PM2.5 concentrations. In almost all countries, the mean PM2.5 concentration (µg/m3) is higher among anemic children than among non-anemic children, whereas the mean NO2 concentration (µg/m3) is lower among anemic children than among their non-anemic counterparts.
Fig. 4.
Survey year cluster level estimates of average annual value of PM2.5 (µg/m3) and NO2 (µg/m3) by country and by anemia status.
The coverage of household environmental quality indicators, such as the use of clean cooking fuel, improved drinking water, and improved sanitation facilities, varies across LMICs (Fig. S4). These differences may also contribute to adverse health outcomes of children, including anemia among children under 5 years.
About 62%, 43% and 39% of children residing in Africa, Asia and Latin America, respectively, were anemic (Table 2). The proportion of anemic children was high among those born to mothers with no formal education (67.19%), those living in households in the lowest wealth quintile (60.90%), and those from rural areas (58.39%). A high proportion of anemic children was also found in households that used unclean fuel for cooking (59.65%), unimproved water (59.07%) and unimproved toilet facilities (65.45%). Children exposed to PM2.5 levels above the WHO recommendation (above 5 µg/m3; 56.18%) and children exposed to NO2 levels below the WHO recommendation (10 µg/m3; 56.92%) had a higher anemic proportion compared to their counterparts. Of the total children from rural households, 69,904 (58.39%) were anemic. The prevalence of anemia was slightly higher in male children (57.14%) than in female children (54.86%).
The risk factors for anemia among children aged 6–59 months identified from the multivariable modified Poisson regression model with robust error variance are also presented in Table 2. Anemia status was strongly associated with the mother’s age: compared to children born to mothers aged 15–29 years, those born to mothers aged 35–39 years were 4% less likely to be anemic (aPR = 0.96, 95% CI: 0.95, 0.97), and similar results were observed for children born to mothers aged 40–49 years (aPR = 0.95, 95% CI: 0.94, 0.97). Children living in Latin America had a 17% (aPR = 0.83, 95% CI: 0.80, 0.86) and those living in Asia a 38% (aPR = 0.62, 95% CI: 0.60, 0.64) lower risk of being anemic compared with children from Africa. Children whose parents were illiterate had an increased risk of being anemic (mothers: aPR = 1.07, 95% CI: 1.06, 1.08; fathers: aPR = 1.05, 95% CI: 1.03, 1.06). The household wealth quintile was significantly associated with the anemia status of a child. Compared to the poorest quintile, children from households in the richest quintile had a 17% (aPR = 0.83, 95% CI: 0.82, 0.85) lower risk of being anemic, while those from households in the middle quintile had a 4% (aPR = 0.96, 95% CI: 0.95, 0.97) lower risk. Similarly, children from households using improved drinking water had a 3% lower risk of being anemic (aPR = 0.97, 95% CI: 0.96, 0.98), those using improved toilet facilities had a 13% lower risk (aPR = 0.87, 95% CI: 0.86, 0.88), and those using clean fuel for cooking had a 24% lower risk (aPR = 0.76, 95% CI: 0.75, 0.77), compared with their counterparts. Moreover, children living in areas where the mean annual PM2.5 concentration was above the WHO recommended threshold for healthy air had a 26% higher risk of being anemic (aPR = 1.26, 95% CI: 1.22, 1.30) compared with children living in areas below the threshold (Table 2).
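The percentage statements above follow directly from the prevalence ratios, since a ratio of r corresponds to a relative change of (r − 1) × 100%. The short sketch below reproduces this arithmetic for several of the adjusted prevalence ratios in Table 2; the function name is hypothetical.

```python
def percent_change_from_pr(pr):
    """Relative change in risk (%) implied by a prevalence ratio."""
    return (pr - 1.0) * 100.0

# Adjusted prevalence ratios taken from Table 2.
apr = {
    "PM2.5 above 5 µg/m3": 1.26,
    "clean cooking fuel": 0.76,
    "improved water": 0.97,
    "improved toilet": 0.87,
    "richest wealth quintile": 0.83,
    "Asia vs Africa": 0.62,
}
changes = {name: round(percent_change_from_pr(pr)) for name, pr in apr.items()}
```

Negative values indicate a lower risk relative to the reference category, and positive values a higher risk.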
The prediction model results for the different MLAs with three cross-validation protocols (K2, K5 and K10) and different training/testing dataset ratios are presented in Table S1 and Fig. S5. For almost all combinations, the AUC improved slightly as the number of folds increased. For 10-fold cross-validation (K10) and an 80:20 training/testing ratio, logistic regression yielded a specificity, sensitivity and accuracy of 65%, 65% and 67%, respectively. However, the results showed no substantial difference in accuracy among the different MLAs in predicting the anemia status of children aged 6–59 months across the countries. The random forest algorithm attained a relatively higher AUC and the highest accuracy (68.2% for 80:20, 67.5% for 70:30 and 67.73% for 60:40). After evaluating the performance of the different MLAs, the mean decrease in impurity (MDI) was computed for the RF (the best model for anemia prediction in this study) with robust scaling to rank the important features in predicting the anemia status of children aged 6–59 months. The results revealed that the most informative features overall were the child’s age, the number of wet days in the environment, the location of the child, and PM2.5 concentration, ranking 1st, 2nd, 3rd and 4th, respectively (Fig. S5).
Discussions
Anemia remains a critical public health issue4,9,10, particularly in LMICs, where it adversely affects child growth, cognitive development, and long-term health outcomes2–5. Moreover, anemia prevalence varies across countries, which may be partly due to differences in the timing and procedures of data collection, as the surveys were not conducted in the same year. Most previous studies used traditional statistical methods to identify the risk factors of anemia7,9,10,20. However, these studies have often been limited to clinical and demographic factors, overlooking potential air quality factors and household environmental indicators. To address this gap, we incorporated satellite-derived data on air pollution variables, specifically PM2.5 and NO2, into our analysis. This study revealed that children from households using clean cooking fuels, improved drinking water, and improved sanitation were less likely to be anemic. This is in line with previous studies in different countries, which reported that using unclean fuel for cooking, unimproved water, and unimproved toilet facilities may increase the odds of being anemic among children6,8,17,20,38. Moreover, the location of a child was associated with anemia. Specifically, children living in Africa had higher odds of being anemic than those living in Latin America and Asia. Studies in sSA6,20,38 confirmed that the prevalence of anemia there was higher than in other countries across the globe70–72. This might be due to the high prevalence of undernutrition and insufficient dietary intake in sSA73,74 and the high prevalence of infectious diseases, including hookworm, malaria and others75–77. Children living in large households and in households with a higher number of under-five children had higher odds of suffering from anemia compared with their counterparts. This was also observed in previous studies6,20,38.
Moreover, our study found that socioeconomic status (household wealth index and maternal education) had a significant association with the anemia status of children. In particular, children of literate mothers and from rich households had lower odds of being anemic, which is also in line with previous studies6,20,38,69. This observation was expected, since households with high socioeconomic status can afford adequate meals for their children and maternal education has a positive impact on the nutritional outcomes of children78,79. Our results revealed that children born to older mothers tend to be less anemic than those of younger mothers, a finding consistently reported in different DHS-based and related studies20,80,81. This association may arise because older mothers are generally more experienced in childcare and feeding practices, more likely to utilize health services, and often have better socioeconomic standing, which enables access to diverse diets and improved living conditions.
Older children had lower odds of anemia, which is consistent with previous studies82,83. This may be because early childhood is a critical period for introducing complementary foods, during which children are more exposed to contaminated food and water84. The presence of comorbidities (e.g., diarrhea, fever, and acute respiratory symptoms) increased the odds of anemia among children aged 6–59 months, which is consistent with previous studies69–72. This may be because children with these comorbidities lose their appetite, which increases the likelihood of anemia85. Moreover, in this cross-sectional study conducted in 45 LMICs, we assessed the association between PM2.5 levels and anemia status. The estimated PM2.5 values were substantially higher than the new WHO air quality guideline (5 µg/m³)13 in almost all the LMICs included in the study. PM2.5 concentrations above the new WHO guideline were associated with greater odds of anemia among children aged 6–59 months. This was also observed in previous studies conducted in different countries and regions6,17,20, which found that increased PM2.5 levels are positively associated with anemia among children. However, those studies were limited to a few countries, whereas we included 45 LMICs in the DHS program across Africa, Asia, and Latin America that had the variables of interest. Our results revealed that children exposed to NO₂ concentrations below the WHO-recommended threshold had a higher prevalence of anemia. This unexpected relationship may be due to indirect or contextual factors rather than a biological protective effect of NO₂. Higher NO₂ concentrations often occur in urban areas (a significant relationship was also observed in our study), where improved nutrition, access to health care, and lower malaria prevalence may contribute to reduced anemia rates86,87.
Therefore, the observed association may reflect socioeconomic and environmental confounding rather than a causal link between NO₂ exposure and anemia. Additionally, we used MLAs to predict anemia status in children under five years of age, considering air quality and household environmental indicators. By integrating these environmental variables with the extensive child health records from the DHS, we aimed to improve the predictive accuracy of ML models and provide a deeper understanding of the complex factors contributing to anemia in children aged 6–59 months across LMICs. The use of MLAs to predict disease status is becoming increasingly common in big data, public health, and healthcare research, reflecting their growing role in healthcare-related planning22,26,43,50. However, to date, very limited research50,88 has applied MLAs to predict health outcomes using large cross-sectional datasets (DHS) across LMICs linked to satellite-derived estimates of annual average PM2.5 and NO2.
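The performance measures used to compare the MLAs (sensitivity, specificity, and the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the code used in the study, which was written in STATA and R. The labels and predicted probabilities below are invented toy values, and the function names and the 0.5 cut-off are assumptions made for illustration only.

```python
# Toy data (hypothetical): 1 = anemic, 0 = not anemic, with predicted
# probabilities from some classifier. These are NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]


def confusion_counts(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) after classifying at the given cut-off."""
    tp = fp = tn = fn = 0
    for label, prob in zip(labels, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels, probs, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(labels, probs, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(labels, probs):
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5)."""
    positives = [p for y, p in zip(labels, probs) if y == 1]
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    sens, spec = sensitivity_specificity(y_true, y_prob)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"AUC={auc_roc(y_true, y_prob):.3f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes ranking quality across all cut-offs, which is why the two kinds of measure are usually reported together.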
Strengths and limitations
This study utilized large, nationally representative DHS surveys from 45 LMICs and data from NASA. It also combined the modified Poisson regression model with machine learning algorithms to determine the risk factors of anemia among children and to predict anemia status. The study also had some limitations, including the use of PM2.5 levels at the cluster (enumeration area) level, which may have led to exposure misclassification and potential bias. Moreover, the survey data are cross-sectional, so we can only draw conclusions about statistical association, not causality. Another limitation of our study is the assignment of air pollution exposure based on annual estimates from the same year as the survey. While this approach ensures temporal alignment between exposure and outcome data, it may not fully capture the cumulative or lagged effects of air pollution on anemia, which could develop over months or years. Longer-term exposure averages (over the previous years) might better reflect biologically relevant exposure windows. Future studies should consider examining multi-year or lagged exposure periods to more accurately estimate the long-term effects of air pollution on anemia risk.
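The difference between same-year and multi-year exposure assignment can be sketched as follows. This is a hypothetical illustration, not part of the study's analysis: the function name, the `years_back` parameter, and the cluster-level PM2.5 values are all assumptions invented for the example.

```python
def exposure_window_mean(annual_pm25, survey_year, years_back=0):
    """Mean annual PM2.5 over the survey year and `years_back` preceding years.

    `annual_pm25` maps calendar year -> annual mean PM2.5 (µg/m³) for one
    cluster. With years_back=0 this reduces to same-year assignment.
    A KeyError is raised if any year in the window is missing.
    """
    years = range(survey_year - years_back, survey_year + 1)
    values = [annual_pm25[year] for year in years]
    return sum(values) / len(values)


# Hypothetical annual estimates for a single survey cluster (not study data).
cluster_pm25 = {2016: 30.0, 2017: 36.0, 2018: 42.0}

same_year = exposure_window_mean(cluster_pm25, survey_year=2018)
three_year = exposure_window_mean(cluster_pm25, survey_year=2018, years_back=2)

if __name__ == "__main__":
    print(f"same-year exposure: {same_year:.1f} µg/m³")
    print(f"three-year average: {three_year:.1f} µg/m³")
```

When pollution levels trend upward or downward over time, the two assignments diverge, which is one way exposure misclassification could enter a same-year analysis.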
Conclusions
Anemia among children aged 6–59 months remains a major public health problem in LMICs. The results from the modified Poisson regression model revealed that maternal characteristics such as maternal age, educational status, and employment status were strongly associated with anemia status. Household characteristics such as the use of improved water, improved sanitation, clean cooking fuel, and the highest wealth index were negatively associated with anemia among children aged 6–59 months. PM2.5 concentrations within the WHO-recommended guideline were associated with a lower risk of anemia among children. In this study, we used machine learning algorithms to predict the anemia status of children aged 6–59 months across 45 LMICs using large-scale cross-sectional DHS datasets and satellite datasets. This approach allows for a more holistic assessment of anemia risk, combining health, demographic, and environmental data in a way that can better inform targeted public health interventions across LMICs.
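To make the prevalence ratio reported by the regression model more concrete, the sketch below computes a crude (unadjusted) prevalence ratio with a log-scale Wald confidence interval from a 2×2 table. It is a simplification: the study estimated adjusted prevalence ratios with a modified Poisson model, which this example does not reproduce. The counts are invented, and the function names and the constant name are assumptions for illustration.

```python
import math

# WHO annual PM2.5 guideline value cited in the text (µg/m³).
WHO_PM25_ANNUAL_GUIDELINE = 5.0


def exceeds_guideline(pm25):
    """True if an annual PM2.5 estimate is above the WHO guideline."""
    return pm25 > WHO_PM25_ANNUAL_GUIDELINE


def prevalence_ratio(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Crude prevalence ratio with an approximate 95% CI.

    PR = (a / n1) / (c / n0); the standard error of log(PR) is
    sqrt(1/a - 1/n1 + 1/c - 1/n0).
    """
    p_exposed = exposed_cases / exposed_total
    p_unexposed = unexposed_cases / unexposed_total
    pr = p_exposed / p_unexposed
    se_log = math.sqrt(1 / exposed_cases - 1 / exposed_total
                       + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts (not study data): anemic children among those living
# above vs. at or below the guideline.
pr, lower, upper = prevalence_ratio(600, 1000, 480, 1000)

if __name__ == "__main__":
    print(f"crude PR = {pr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because anemia is common in this population, a prevalence ratio is easier to interpret than an odds ratio, which would overstate the relative difference in prevalence.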
Supplementary Information
Below is the link to the electronic supplementary material.
Abbreviations
- LMICs
Low- and middle-income countries
- DHS
Demographic and Health Surveys
- MLA
Machine learning algorithms
- AUC
Area under the curve
- sSA
Sub-Saharan Africa
- WHO
World Health Organization
- PM2.5
Fine particulate matter (aerodynamic diameter ≤ 2.5 µm)
- NO2
Nitrogen dioxide
- NASA
National Aeronautics and Space Administration
- ROC
Receiver operating characteristic
- GPS
Global positioning system
- PSU
Primary sampling unit
Author contributions
HMF and JJ were involved in this study from data management and data analysis to drafting and revising the final manuscript. KA, AKR, and IP contributed to the conception, design, and interpretation of data, as well as to manuscript reviews and revisions. All authors have read and approved the manuscript.
Data availability
The datasets used in this study are publicly available. The DHS data can be obtained from https://dhsprogram.com after a formal request is approved, while the PM2.5 and NO2 estimates are publicly available as version V4.GL.03 at https://sites.wustl.edu/acag/datasets. The STATA and R code used in the study is available from the corresponding author upon formal request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Haile Mekonnen Fenta, Email: haile.fenta@oulu.fi.
Jouni J.K. Jaakkola, Email: jouni.jaakkola@oulu.fi
References
- 1. Gardner, W. M. et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: findings from the global burden of disease study 2021. Lancet Haematol. 10 (9), e713–e734 (2023).
- 2. World Health Organization. The World Health Report 2002: Reducing Risks, Promoting Healthy Life (World Health Organization, 2002).
- 3. Chaparro, C. M. & Suchdev, P. S. Anemia epidemiology, pathophysiology, and etiology in low- and middle-income countries. Ann. N. Y. Acad. Sci. 1450 (1), 15–31 (2019).
- 4. McLean, E. et al. Worldwide prevalence of anaemia, WHO vitamin and mineral nutrition information system, 1993–2005. Public Health Nutr. 12 (4), 444–454 (2009).
- 5. Sakemi, H. Iron-deficiency anemia. N. Engl. J. Med. 373 (5), 484–485 (2015).
- 6. Morales-Ancajima, V. C. et al. Increased outdoor PM2.5 concentration is associated with moderate/severe anemia in children aged 6–59 months in Lima, Peru. J. Environ. Public Health 2019 (1), 6127845 (2019).
- 7. Sun, J. et al. Prevalence and changes of anemia among young children and women in 47 low- and middle-income countries, 2000–2018. EClinicalMedicine 41 (2021).
- 8. Amadu, I. et al. Household cooking fuel type and childhood anaemia in sub-Saharan Africa: analysis of cross-sectional surveys of 123,186 children from 29 countries. BMJ Open 11 (7), e048724 (2021).
- 9. Kassebaum, N. J. et al. A systematic analysis of global anemia burden from 1990 to 2010. Blood J. Am. Soc. Hematol. 123 (5), 615–624 (2014).
- 10. Obasohan, P. E. et al. A scoping review of the risk factors associated with anaemia among children under five years in sub-Saharan African countries. Int. J. Environ. Res. Public Health 17 (23), 8829 (2020).
- 11. Fisher, S. et al. Air pollution and development in Africa: impacts on health, the economy, and human capital. Lancet Planet. Health 5 (10), e681–e688 (2021).
- 12. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide (World Health Organization, 2021).
- 13. Amini, H. WHO air quality guidelines need to be adopted. Front. Media SA, p. 1604483 (2021).
- 14. Apte, J. S. et al. Addressing global mortality from ambient PM2.5. Environ. Sci. Technol. 49 (13), 8057–8066 (2015).
- 15. Elbarbary, M. et al. Ambient air pollution exposure association with anaemia prevalence and haemoglobin levels in Chinese older adults. Int. J. Environ. Res. Public Health 17 (9), 3209 (2020).
- 16. Honda, T. et al. Anemia prevalence and hemoglobin levels are associated with long-term exposure to air pollution in an older population. Environ. Int. 101, 125–132 (2017).
- 17. Mehta, U. et al. The association between ambient PM2.5 exposure and anemia outcomes among children under five years of age in India. Environ. Epidemiol. 5 (1), e125 (2021).
- 18. Odo, D. B. et al. A cross-sectional analysis of ambient fine particulate matter (PM2.5) exposure and haemoglobin levels in children aged under 5 years living in 36 countries. Environ. Res. 227, 115734 (2023).
- 19. Hwang, J. & Kim, H. J. Association of ambient air pollution with hemoglobin levels and anemia in the general population of Korean adults. BMC Public Health 24 (1), 1–11 (2024).
- 20. Tesema, G. A. et al. Prevalence and determinants of severity levels of anemia among children aged 6–59 months in sub-Saharan Africa: a multilevel ordinal logistic regression analysis. PLoS One 16 (4), e0249978 (2021).
- 21. Pandey, V. K. et al. Machine learning algorithms and fundamentals as emerging safety tools in preservation of fruits and vegetables: a review. Processes 11 (6), 1720 (2023).
- 22. Fenta, H. M., Zewotir, T. & Muluneh, E. K. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Med. Inf. Decis. Mak. 21, 1–12 (2021).
- 23. Samuel, O., Zewotir, T. & North, D. Application of machine learning methods for predicting under-five mortality: analysis of Nigerian demographic health survey 2018 dataset. BMC Med. Inf. Decis. Mak. 24 (1), 86 (2024).
- 24. Quinlan, J. R. Induction of decision trees. Mach. Learn. 1 (1), 81–106 (1986).
- 25. Dhal, P. & Azad, C. A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 52 (4), 4543–4581 (2022).
- 26. Moulaei, K. et al. Comparing machine learning algorithms for predicting COVID-19 mortality. BMC Med. Inf. Decis. Mak. 22 (1), 2 (2022).
- 27. Batani, J. A. Deep Learning Model for Predicting Under-Five Mortality in Zimbabwe. In 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (IEEE, 2023).
- 28. Hammer, M. S. et al. Global estimates and long-term trends of fine particulate matter concentrations (1998–2018). Environ. Sci. Technol. 54 (13), 7879–7890 (2020).
- 29. Van Donkelaar, A. et al. Monthly global estimates of fine particulate matter and their uncertainty. Environ. Sci. Technol. 55 (22), 15287–15300 (2021).
- 30. Pullum, T. W. et al. Hemoglobin Data in DHS Surveys: Intrinsic Variation and Measurement Error (ICF, 2017).
- 31. ICF. Demographic and Health Surveys Standard Recode Manual for DHS7 (The Demographic and Health Surveys Program, 2018).
- 32. Croft, T. N. et al. Guide to DHS Statistics Vol. 645, p. 292–303 (ICF, 2018).
- 33. Stevens, G. A. et al. National, regional, and global estimates of anaemia by severity in women and children for 2000–19: a pooled analysis of population-representative data. Lancet Glob. Health 10 (5), e627–e639 (2022).
- 34. He, C. et al. Anemia is associated with long-term exposure to PM2.5 and its components: a large population-based study in Southwest China. Ther. Adv. Hematol. 14, 20406207231189922 (2023).
- 35. Ma, Y. et al. Short-term exposure to fine particulate matter and nitrogen dioxide and mortality in 4 countries. JAMA Netw. Open 7 (3), e2354607 (2024).
- 36. Liyew, A. M. et al. Prevalence and determinants of anemia among pregnant women in East Africa; a multi-level analysis of recent demographic and health surveys. PLoS One 16 (4), e0250560 (2021).
- 37. Woodruff, B. A. et al. Determinants of stunting, wasting, and anemia in Guinean preschool-age children: an analysis of DHS data from 1999, 2005, and 2012. Food Nutr. Bull. 39 (1), 39–53 (2018).
- 38. Nambiema, A., Robert, A. & Yaya, I. Prevalence and risk factors of anemia in children aged from 6 to 59 months in Togo: analysis from Togo demographic and health survey data, 2013–2014. BMC Public Health 19, 1–9 (2019).
- 39. Barros, A. J. & Hirakata, V. N. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med. Res. Methodol. 3, 1–13 (2003).
- 40. Tamhane, A. R. et al. Prevalence odds ratio versus prevalence ratio: choice comes with consequences. Stat. Med. 35 (30), 5730–5735 (2016).
- 41. Yelland, L. N., Salter, A. B. & Ryan, P. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data. Am. J. Epidemiol. 174 (8), 984–992 (2011).
- 42. Zou, G. A modified Poisson regression approach to prospective studies with binary data. Am. J. Epidemiol. 159 (7), 702–706 (2004).
- 43. Bitew, F. H., Sparks, C. S. & Nyarko, S. H. Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public Health Nutr. 25 (2), 269–280 (2022).
- 44. Talukder, A. & Ahammed, B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition 78, 110861 (2020).
- 45. James, G. et al. An Introduction to Statistical Learning: with Applications in R (Springer, 2013).
- 46. Molina, M. & Garip, F. Machine learning for sociology. Ann. Rev. Sociol. (2019).
- 47. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (O’Reilly Media, 2019).
- 48. Marsland, S. Machine Learning: an Algorithmic Perspective (CRC, 2015).
- 49. Zhang, H. The optimality of naïve Bayes. In FLAIRS 2004 Conference (2004).
- 50. Fenta, H. M. et al. Factors of acute respiratory infection among under-five children across sub-Saharan African countries using machine learning approaches. Sci. Rep. 14 (1), 15801 (2024).
- 51. Yu, H. F., Huang, F. L. & Lin, C. J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach. Learn. 85 (1–2), 41–75 (2011).
- 52. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12 (1), 55–67 (1970).
- 53. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58 (1), 267–288 (1996).
- 54. Zou, H. & Hastie, T. Addendum: regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67 (5), 768 (2005).
- 55. Hecht-Nielsen, R. Theory of the backpropagation neural network. In Neural Networks for Perception, pp. 65–93 (Elsevier, 1992).
- 56. Abdelhafiz, D. et al. Deep convolutional neural networks for mammography: advances, challenges and applications. BMC Bioinform. 20 (11), 1–20 (2019).
- 57. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (ACM, New York, NY, USA, 2016).
- 58. McCallum, A. & Nigam, K. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization (Madison, WI, 1998).
- 59. Zhang, D. Bayesian classification. In Fundamentals of Image Data Mining, pp. 161–178 (Springer, 2019).
- 60. James, G. et al. An Introduction to Statistical Learning Vol. 112 (Springer, 2013).
- 61. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12 (1), 55–67 (1970).
- 62. Breiman, L. Random forests. Mach. Learn. 45 (1), 5–32 (2001).
- 63. Genuer, R., Poggi, J. M. & Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 31 (14), 2225–2236 (2010).
- 64. Janitza, S., Tutz, G. & Boulesteix, A. L. Random forest for ordinal responses: prediction and variable selection. Comput. Stat. Data Anal. 96, 57–73 (2016).
- 65. Sipöcz, N., Tobiesen, F. A. & Assadi, M. The use of artificial neural network models for CO2 capture plants. Appl. Energy 88 (7), 2368–2376 (2011).
- 66. Bland, J. M. & Altman, D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327 (8476), 307–310 (1986).
- 67. Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45 (4), 427–437 (2009).
- 68. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143 (1), 29–36 (1982).
- 69. Khan, J. R. et al. Machine learning algorithms to predict the childhood anemia in Bangladesh. J. Data Sci. 17 (1), 195–218 (2019).
- 70. De Oliveira, J. et al. Iron deficiency anemia in children: prevalence and prevention studies in Ribeirão Preto, Brazil. Arch. Latinoam. Nutr. 47 (2 Suppl 1), 41–43 (1997).
- 71. Male, C. et al. Prevalence of iron deficiency in 12-mo-old infants from 11 European areas and influence of dietary factors on iron status (Euro-Growth study). Acta Paediatr. 90 (5), 492–498 (2001).
- 72. Eussen, S. et al. Iron intake and status of children aged 6–36 months in Europe: a systematic review. Ann. Nutr. Metab. 66 (2–3), 80–92 (2015).
- 73. Bhutta, Z. A. et al. Severe childhood malnutrition. Nat. Rev. Dis. Primers 3 (1), 1–18 (2017).
- 74. Fabunmi, T. et al. Nutrient intakes and nutritional status of mothers and their under-five children in a rural community of Oyo state, Nigeria. Int. J. Child Health Nutr. 2 (1), 39–49 (2013).
- 75. Smits, H. L. Prospects for the control of neglected tropical diseases by mass drug administration. Expert Rev. Anti-infective Ther. 7 (1), 37–56 (2009).
- 76. Wilson, M. E. Geography of infectious diseases. Infectious Diseases, p. 1055 (2010).
- 77. Greenwood, B. The epidemiology of malaria. Ann. Trop. Med. Parasitol. 91 (7), 763–769 (1997).
- 78. Reed, B. A., Habicht, J. P. & Niameogo, C. The effects of maternal education on child nutritional status depend on socio-environmental conditions. Int. J. Epidemiol. 25 (3), 585–592 (1996).
- 79. Frost, M. B., Forste, R. & Haas, D. W. Maternal education and child nutritional status in Bolivia: finding the links. Soc. Sci. Med. 60 (2), 395–407 (2005).
- 80. Dessie, G. et al. Prevalence and determinants of stunting-anemia and wasting-anemia comorbidities and micronutrient deficiencies in children under 5 in the least-developed countries: a systematic review and meta-analysis. Nutr. Rev. 83 (2), e178–e194 (2025).
- 81. Rahman, M. M. et al. Maternal anemia and risk of adverse birth and health outcomes in low- and middle-income countries: systematic review and meta-analysis. Am. J. Clin. Nutr. 103 (2), 495–504 (2016).
- 82. Onyeneho, N. G., Ozumba, B. C. & Subramanian, S. Determinants of childhood anemia in India. Sci. Rep. 9 (1), 16540 (2019).
- 83. Khan, J. R., Awan, N. & Misu, F. Determinants of anemia among 6–59 months aged children in Bangladesh: evidence from nationally representative data. BMC Pediatr. 16, 1–12 (2016).
- 84. Rao, S. et al. Study of complementary feeding practices among mothers of children aged six months to two years: a study from coastal South India. Australas. Med. J. 4 (5), 252 (2011).
- 85. Semrad, C. E. Approach to the patient with diarrhea and malabsorption. Goldman’s Cecil Medicine, p. 895 (2012).
- 86. Crooijmans, K. L. et al. Nitrogen dioxide exposure, attentional function, and working memory in children from 4 to 8 years: periods of susceptibility from pregnancy to childhood. Environ. Int. 186, 108604 (2024).
- 87. Stern, G. et al. A prospective study of the impact of air pollution on respiratory symptoms and infections in infants. Am. J. Respir. Crit. Care Med. 187 (12), 1341–1348 (2013).
- 88. Yu, H. F., Huang, F. L. & Lin, C. J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach. Learn. 85, 41–75 (2011).