A predictive model of inadequate minimum dietary diversity among women with a child under 24 months in Ethiopia: a machine learning approach using the 2016 EDHS

Aychew Kassa Belete; Bantie Getnet Yirsaw; Birhan Ambachew Taye

doi:10.1186/s41043-025-01237-y

. 2026 Jan 11;45:52. doi: 10.1186/s41043-025-01237-y

PMCID: PMC12879448 PMID: 41521338

Abstract

Background

Inadequate micronutrient intake among mothers is a serious global health problem, especially in poverty-stricken areas. However, the available studies in Ethiopia have mostly focused on early childhood nutrition and relied on conventional statistical methods. This study aimed to apply multiple machine learning algorithms to build a high-fidelity predictive model and to identify the key predictors of inadequate Minimum Dietary Diversity for Women among Ethiopian mothers with a child under 24 months.

Methods

We conducted a secondary analysis of a weighted sample of 3,914 women from the 2016 Ethiopian Demographic and Health Survey. The outcome variable was dichotomous: inadequate or adequate Minimum Dietary Diversity for Women. The data were split into training (80%) and testing (20%) sets. ML algorithms were applied and tested in R software version 4.5. To deal with the severe class imbalance, the Adaptive Synthetic (ADASYN) method was used, and feature selection was performed with the Boruta algorithm. Seven machine learning classifiers were trained and evaluated on accuracy, recall, F1 score, specificity, precision, and AUC.

Findings

The random forest algorithm (accuracy = 95.03%, sensitivity = 92.73%, precision = 97.28%, F1-score = 94.94%, and AUC = 98.34%) was the best predictive model, as it had the best overall performance metrics on the test set. Rural residence, unprotected source of drinking water, poor wealth index, no media exposure, unimproved toilet facility, no education, age, religion, and traditional contraceptive method were the top predictors of women’s minimum dietary diversity.

Conclusion

Machine learning models, specifically the random forest classifier, are well suited to predicting inadequate Minimum Dietary Diversity among mothers and can serve as a useful decision-support tool for public health officials. The results provide evidence-based guidance, including the need for geographically targeted interventions and for combined programs that integrate nutrition education, family planning, and economic empowerment to address the socioeconomic and demographic risk factors behind poor maternal dietary diversity in Ethiopia.

Keywords: Minimum dietary diversity, Machine learning, Random forest, Ethiopia, Ethiopian demographic and health survey

Introduction

Micronutrient deficiency in mothers is an urgent global health and nutrition concern, with the greatest burden falling on poor societies [1]. These nutritional deficiencies have long-term effects that cut across generations. They weaken a mother’s health and survival and, at the same time, are a major driver of poor health outcomes in children. In particular, they are strongly implicated in restricted fetal growth and delayed postnatal development [2, 3]. Nutritional balance is generally compromised by common eating habits: monotonous, nutrient-poor diets with little variety [4]. These diets are mostly based on carbohydrate-dense staple foods such as grains or tubers and are notably deficient in important foods such as fresh vegetables, fruits, and animal-source protein [5].

Measuring the diversity of food intake is one of the most direct ways of assessing the adequacy of micronutrient intake; dietary diversity is therefore a practical, field-ready tool for estimating a woman’s nutritional adequacy [6, 7]. Women of Reproductive Age (WRA), defined as those aged 15 to 49 years, have high nutritional requirements, especially around conception, pregnancy, and lactation [8, 9]. These needs put this population at a higher risk of deficiency. The risk is highest for mothers caring for a child below two years of age [10]. During this demanding period, adequate maternal nutrient reserves are necessary for postpartum recovery and for maintaining the quality of breast milk, which directly supports the child’s growth and development in the earliest stage of life.

Recognizing the systemic difficulty of monitoring nutritional status in low-income countries, a barrier created by scarce data and the absence of standardized measures, the Food and Agriculture Organization (FAO) officially established the Minimum Dietary Diversity for Women (MDD-W) indicator in 2016. It is a simple dichotomous indicator of whether a woman aged 15–49 years consumed at least five out of ten pre-specified food groups during the day before the survey [11, 12]. More importantly, this threshold is a well-validated proxy: women who meet it are more likely to have adequate intakes of essential micronutrients [13, 14].

Although the MDD-W has proved useful in macro-level epidemiological surveillance, conventional statistical tools, such as descriptive summaries and standard regression, are often insufficient to capture the non-linear relationships that shape MDD-W attainment across different groups [15]. Moving from generalized nutritional surveillance to precision-targeted interventions requires a nuanced, sub-population-level understanding of the interacting socioeconomic and health covariates. The application of Machine Learning (ML) is therefore a sound methodological development [16, 17]. ML algorithms can go beyond correlational inference to provide strong predictive and classification accuracy. These models can handle high-dimensional, complex data and can identify and rank the most salient predictors, producing high-fidelity models that classify women according to their expected MDD-W status. Thus, the objective of this research was to construct and test a predictive model of Minimum Dietary Diversity in women with a child under 24 months in Ethiopia using a machine learning approach. Based on nationally representative data from the 2016 Ethiopian Demographic and Health Survey (EDHS), this study identifies the critical drivers of MDD-W and presents an accurate, data-driven instrument for predicting mothers’ nutritional vulnerability, thereby guiding evidence-based policymaking and investment in the Ethiopian public health sector.

Materials and methods

Design, data source, setting, and periods

We performed a secondary analysis of the 2016 Ethiopian Demographic and Health Survey (EDHS) data. This nationally representative dataset was obtained from the DHS Program database after an official request and approval, in line with established ethical guidelines. The EDHS, implemented between January and June 2016, used a multi-stage stratified sampling design covering 645 enumeration areas, and data quality was supported by standardized, locally adapted questionnaires. In the original survey, 15,683 women of childbearing age (15–49 years) were interviewed. For the purposes of our study, the analytic sample was limited to the 3,914 respondents who had a child below the age of 24 months and met our inclusion criteria.

Population of the study

The population of the study is women of reproductive age (15–49 years) in Ethiopia with a child of less than 24 months of age at the time of the 2016 Ethiopian Demographic and Health Survey (Fig. 1).

Study variables and measurements

The main outcome variable of this study is Minimum Dietary Diversity for Women (MDD-W) status. It is a binary variable derived from each woman’s reported consumption of ten defined food groups within the 24 h before the survey. We calculated the MDD-W using Ethiopia Demographic and Health Survey (EDHS) data. We began by grouping specific food items into the ten categories, such as grains, pulses, dairy, and various vitamin-rich fruits and vegetables. By summing the groups consumed, we created a diversity score for each woman that reflects the variety of her diet. Finally, we applied the standard global cutoff to these scores: women who consumed five or more food groups were classified as having adequate dietary diversity, while those who consumed fewer than five were classified as having inadequate dietary diversity [18].
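The scoring rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the group labels follow the FAO MDD-W food groups, while the short identifiers, function names, and the example respondent are hypothetical and do not correspond to EDHS variable names.

```python
# The ten MDD-W food groups. The identifiers are hypothetical labels chosen
# for this sketch, not EDHS variable names.
FOOD_GROUPS = (
    "grains_roots_tubers", "pulses", "nuts_seeds", "dairy",
    "meat_poultry_fish", "eggs", "dark_green_leafy_veg",
    "vitamin_a_rich_fruit_veg", "other_vegetables", "other_fruits",
)
CUTOFF = 5  # five or more groups counts as adequate


def mddw_score(consumed):
    """Count how many of the ten groups were eaten in the previous 24 h."""
    return sum(1 for group in FOOD_GROUPS if consumed.get(group, False))


def mddw_status(consumed):
    """Classify a respondent as 'Adequate' or 'Inadequate'."""
    return "Adequate" if mddw_score(consumed) >= CUTOFF else "Inadequate"


# Hypothetical respondent who reported four food groups.
example = {
    "grains_roots_tubers": True,
    "pulses": True,
    "dairy": True,
    "other_vegetables": True,
}
print(mddw_score(example), mddw_status(example))
```

Adding a fifth group (for instance eggs) to the example respondent would move her to the adequate category.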

A set of covariates was considered as possible risk factors for Minimum Dietary Diversity for Women (MDD-W) status and extracted from the DHS dataset based on previous studies [19–23] and the WHO conceptual framework on MDD-W status: context, causes, and consequences [24]. The predictors in the current study are region, place of residence, educational status, contraceptive use, age, religion, source of drinking water, type of toilet facility, frequency of listening to the radio, wealth index, current age of the child, total children ever born, and marital status. Age group: the woman’s current age, recoded into three groups with values of “0” for 15–24, “1” for 25–34, and “2” for 35–49. Religion: recoded into four groups with values of “0” for Muslim, “1” for Orthodox, “2” for Protestant, and “3” for other religious groups (combining Catholic, other, and traditional). Marital status: recoded into two categories with values of “0” for married and “1” for unmarried. Source of drinking water: recoded into two categories with values of “0” for unprotected and “1” for protected. Type of toilet facility: recoded into two categories with values of “0” for unimproved and “1” for improved. Frequency of listening to the radio: recoded into two categories with values of “0” for no exposure and “1” for listening. Wealth index: recoded into three categories with values of “0” for poor, “1” for middle, and “2” for rich. Current age of the child: recoded into four groups with values of “0” for 0–5 months, “1” for 6–11 months, “2” for 12–17 months, and “3” for 18–23 months.
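As an illustration of the recoding scheme above, the two age-based variables can be mapped to their codes as follows. The function names are hypothetical, and the sketch assumes ages are supplied as whole years and whole months.

```python
def recode_mother_age(age_years):
    """Map a woman's age to the codes above: 0 = 15-24, 1 = 25-34, 2 = 35-49."""
    if 15 <= age_years <= 24:
        return 0
    if 25 <= age_years <= 34:
        return 1
    if 35 <= age_years <= 49:
        return 2
    raise ValueError("age outside the 15-49 reproductive age range")


def recode_child_age(age_months):
    """Map a child's age to 0 = 0-5, 1 = 6-11, 2 = 12-17, 3 = 18-23 months."""
    if not 0 <= age_months <= 23:
        raise ValueError("child age outside the 0-23 month range")
    return age_months // 6  # each group spans six months


print([recode_mother_age(a) for a in (19, 30, 42)])
print([recode_child_age(m) for m in (3, 6, 17, 23)])
```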

Data pre-processing and analytic strategies

Data pre-processing is essential when developing a predictive model and strongly influences its performance. Key pre-processing activities included data cleaning, feature engineering, dimensionality reduction, and data balancing. The dataset was split into training (80%, 3,131 observations) and testing (20%, 783 observations) sets. Several machine learning models were built using the training data, including Support Vector Machine (SVM), Logistic Regression (LR), Gaussian Naïve Bayes (GNB), eXtreme Gradient Boosting (XGB), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbors (KNN). After evaluating the models on performance metrics, the most effective model was selected for predicting MDD-W status and identifying key predictors (Fig. 2).

Data cleaning

Data cleaning is an essential first step following data collection. It involves identifying and handling outliers, dealing with missing values, and addressing any imbalance in the outcome variable. We considered several methods for dealing with missing data, including deletion, imputation techniques, and model-based approaches. Ultimately, we opted for deletion to preserve data integrity, as 167 women provided only demographic information and did not respond to any of the ten dietary survey questions. This choice helps reduce bias by excluding incomplete cases. To identify outliers, we used scatter plots, box plots, and histograms. Additionally, we assessed multicollinearity by examining the correlation matrix, where a correlation coefficient above 0.8 indicates a strong correlation between variables [25].
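The correlation screen described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Pearson correlation on numerically coded columns, compares the absolute value against the 0.8 threshold, and runs on a small invented dataset rather than the EDHS data. The column names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Invented toy data, coded as in the recoding scheme of this study.
toy = {
    "wealth": [0, 0, 1, 1, 2, 2, 2, 0],
    "toilet": [0, 0, 0, 1, 1, 1, 1, 0],
    "age_group": [0, 1, 2, 0, 1, 2, 0, 1],
}
flagged = highly_correlated(toy)
print(flagged)
```

In this toy example only the wealth and toilet columns exceed the threshold, so one of the two would be a candidate for removal or closer inspection.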

Imbalanced data handling

To address the challenge of class imbalance, where machine learning (ML) models tend to favor the dominant class and perform poorly in predicting minority outcomes, we implemented targeted data balancing methods. We compared four widely used methods for overcoming this bias: under-sampling, over-sampling, the Synthetic Minority Over-Sampling Technique (SMOTE), and the Adaptive Synthetic (ADASYN) approach. Following initial training on the unbalanced raw data, we tested the performance of our ML models under each balancing regime. Model performance was assessed using accuracy, recall, F1 score, specificity, precision, and Area under the Curve (AUC). The comparison showed that ADASYN yielded the most consistent improvements across the assessment metrics, and it was therefore selected as the balancing method for training the final predictive models.
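To make the ADASYN idea concrete, the sketch below generates synthetic minority points in proportion to how many majority-class neighbours each minority point has, so that harder-to-learn regions receive more synthetic samples. It is a simplified, self-contained illustration on invented two-dimensional points, not the implementation used in the study. The function name, the parameters `k` and `beta`, and the choice to interpolate toward the nearest minority neighbours are assumptions of this sketch.

```python
import random
from math import dist


def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority points (simplified ADASYN sketch)."""
    rng = random.Random(seed)
    n_needed = round((len(majority) - len(minority)) * beta)
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for x in minority:
        neighbours = sorted(
            (q for q in labelled if q[0] is not x),
            key=lambda q: dist(x, q[0]),
        )[:k]
        ratios.append(sum(1 for _, label in neighbours if label == 0) / k)

    total = sum(ratios)
    if total == 0:  # no minority point borders the majority class
        return []

    synthetic = []
    for x, r in zip(minority, ratios):
        n_new = round(r / total * n_needed)  # more samples where r_i is high
        mates = sorted(
            (m for m in minority if m is not x), key=lambda m: dist(x, m)
        )[:k]
        for _ in range(n_new):
            mate = rng.choice(mates)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, mate)))
    return synthetic


# Invented toy data: 3 minority points surrounded by 8 majority points.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
majority = [(0.0, 0.0), (3.0, 3.0), (2.0, 2.0), (0.0, 2.0),
            (3.0, 0.0), (4.0, 4.0), (5.0, 5.0), (0.0, 3.0)]
synthetic = adasyn(minority, majority)
print(len(synthetic))
```

Because each synthetic point is an interpolation between two minority points, the new samples stay inside the region already occupied by the minority class.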

Feature engineering

Feature engineering began with encoding categorical variables to prepare them for modelling: nominal features were one-hot encoded and ordinal features were label encoded, both using the scikit-learn library. To improve model efficiency and generalization, dimensionality was reduced through feature selection, which suits tabular data better than the feature extraction techniques typically used for image data. We tried several feature selection techniques, including Lasso, PCA, stepwise reduction, and chi-square tests [26, 27]. Testing showed that the Boruta algorithm yielded the most relevant predictors for our top-performing model.
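The two encodings mentioned above differ in whether category order carries meaning. The following plain-Python sketch mimics what one-hot and ordinal (label) encoding produce. It is not the scikit-learn API, and the function names and example values are hypothetical.

```python
def one_hot(values):
    """One-hot encode a nominal variable; returns (rows, category order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def ordinal_encode(values, order):
    """Encode an ordinal variable using an explicit low-to-high order."""
    index = {level: i for i, level in enumerate(order)}
    return [index[v] for v in values]


religion = ["Muslim", "Orthodox", "Protestant", "Muslim"]
wealth = ["Poor", "Rich", "Middle", "Poor"]

religion_rows, religion_categories = one_hot(religion)
wealth_codes = ordinal_encode(wealth, order=["Poor", "Middle", "Rich"])
print(religion_categories, religion_rows)
print(wealth_codes)
```

One-hot encoding avoids implying that, for example, one religion is "greater" than another, whereas the ordinal codes preserve the poor-to-rich ranking of the wealth index.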

Data splitting

To check our models’ performance on unseen data, we applied a straightforward 80/20 split. We allocated 80% of the samples (3,131 respondents) to training and held out the remainder (783 respondents) for final evaluation of the selected model. In addition, to make the training process more robust and to use the available data fully, a key consideration given our limited sample size, we used tenfold cross-validation. This method ensures that the model is trained and validated on multiple subsets of the data.
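The split sizes and the tenfold scheme above can be reproduced with index arithmetic alone. The sketch below assumes a simple random (unstratified) split with an arbitrary seed; whether the original split was stratified is not stated, and the function names are hypothetical.

```python
import random


def split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation


train_idx, test_idx = split_indices(3914)
fold_sizes = [len(validation) for _, validation in k_fold(train_idx)]
print(len(train_idx), len(test_idx), fold_sizes)
```

With 3,914 rows this gives 3,131 training and 783 test rows, and each training row is used for validation in exactly one of the ten folds.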

Model selection

Because the outcome variable is binary (Adequate or Inadequate MDD-W), the modelling task was framed as a binary classification problem. Following the 80/20 data split, we selected and trained a diverse set of seven Machine Learning (ML) classifiers. This set included both classical linear and non-linear models: Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Gaussian Naïve Bayes (GNB), eXtreme Gradient Boosting (XGBoost), and Decision Tree (DT). These models were chosen for their well-documented predictive accuracy in earlier classification studies using Demographic and Health Survey (DHS) datasets [28–31].

Model training and evaluation

Once the dataset was divided into independent training and test sets, we proceeded with model training. As the target variable (MDD-W status) is categorical, the analysis was configured as a binary classification problem. We tested seven Machine Learning (ML) classifiers, a selection guided by previous ML research using DHS data, the nature of the problem, and the composition of our dataset. We initially trained the selected classifiers on both balanced and unbalanced datasets. The best-performing predictive model was then selected and trained on the final balanced training set before being applied to the unseen test set. Model performance was assessed using the confusion matrix and the Receiver Operating Characteristic (ROC) curve, with accuracy, recall, F1 score, specificity, precision, and the Area under the Curve (AUC) as measures. The AUC was used as the primary performance measure because it summarizes the discriminative ability of the model across thresholds.
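For reference, the metrics listed above follow directly from the four cells of the confusion matrix, and the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses invented counts and scores; the function names are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


def auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)  # invented counts
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])            # invented scores
print(metrics, toy_auc)
```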

Model interpretability

To enhance the interpretability and transparency of our final predictive model, we used SHapley Additive exPlanations (SHAP) analysis [32, 33], a game-theoretic method that explains how each feature contributes to individual and overall model predictions [34, 35]. We calculated the mean absolute SHAP value of each feature to summarize its overall importance; for individual predictions, the sign of a SHAP value indicates a push toward or away from the predicted class (Adequate MDD-W), and its magnitude indicates the strength of the influence.
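The Shapley value underlying SHAP can be computed exactly for a small model by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below does this by brute force for an invented two-feature risk score. It illustrates the concept only: SHAP libraries use faster approximations, and the toy model, its coefficients, and all names here are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance value, others the baseline.
        x = {f: instance[f] if f in coalition else baseline[f] for f in names}
        return model(x)

    phi = {}
    for feature in names:
        others = [g for g in names if g != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {feature})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[feature] = total
    return phi


def toy_risk(x):
    """Invented risk score with an interaction between the two features."""
    return 0.1 + 0.3 * x["rural"] + 0.2 * x["poor"] + 0.1 * x["rural"] * x["poor"]


phi = shapley_values(toy_risk, {"rural": 1, "poor": 1}, {"rural": 0, "poor": 0})
print(phi)
```

The values sum to the difference between the prediction for the instance and the prediction for the baseline, and averaging their absolute values over many instances gives the global importance ranking described above.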

Ethical considerations

Ethical approval for the original data collection of the 2016 Ethiopian Demographic and Health Survey (EDHS) was obtained by the Central Statistical Agency (CSA) from the Ethiopian Health and Nutrition Research Institute Review Board and the National Research Ethics Review Committee. The CSA confirmed that all field work, including obtaining written informed consent from the respondents, conformed to the principles outlined in the Declaration of Helsinki. We obtained a formal letter of approval to access and use anonymized data from the DHS Program data archivist after submitting a comprehensive study proposal on Aug 12, 2025. In the analysis, we maintained complete data privacy and accessed the dataset solely for the intended purpose of this study.

Results

Minimum dietary diversity of women by background characteristics

A weighted total of 3,914 women with a child under 24 months was included in the final analysis. Of these, the prevalence of inadequate Minimum Dietary Diversity for Women (MDD-W) was 91.13% (N = 3,606) (Fig. 3). Among the women with inadequate MDD-W, the largest share was in the 25–34 age group (49.7%), and 60.8% had no formal education. Furthermore, the majority of these women lived in rural areas (81.1%) and were currently married. Regarding household characteristics, 82.8% used an unimproved toilet facility, and 76% reported no media exposure. The parity distribution showed that the largest subset of these mothers (38%) had one or two children (Table 1).

Table 1.

Socio-demographic characteristics among women in Ethiopia, 2016 EDHS

Variables	Categories	Inadequate MDD-W	Adequate MDD-W	Chi-square p-value
Age	< 24	1129 (31.3%)	87 (28.2%)	0.263
	25–34	1792 (49.7%)	168 (54.5%)
	≥ 35	685 (19%)	53 (17.2%)
Residence	Rural	2923 (81.1%)	117 (38%)	< 0.001
	Urban	683 (18.9%)	191 (62%)	< 0.001
Education level	No education	2191 (60.8%)	129 (41.9%)	< 0.001
	Primary	990 (27.5%)	99 (32.1%)
	Secondary	294 (8.2%)	41 (13.3%)
	Higher	131 (3.6%)	39 (12.7%)
Religion	Muslim	1812 (50.2%)	138 (44.8%)	0.170
	Orthodox	1072 (29.8%)	104 (33.8%)
	Protestant	628 (17.4%)	61 (19.8%)
	Others	91 (2.5%)	5 (1.6%)
Type of toilet facility	Unimproved	2984 (82.8%)	90 (29.2%)	< 0.001
	Improved	622 (17.2%)	218 (70.8%)	< 0.001
Media exposure	No	2740 (76%)	114 (37%)	< 0.001
	Yes	866 (24%)	194 (63%)	< 0.001
Wealth index	Poor	1920 (53.2%)	97 (31.5%)	< 0.001
	Middle	505 (14%)	47 (15.3%)
	Rich	1181 (32.8%)	164 (53.2%)
Age of children	0–5 months	1079 (29.9%)	14 (4.5%)	< 0.001
	6–11 months	924 (25.6%)	78 (25.3%)
	12–17 months	962 (26.7%)	115 (37.3%)
	18–23 months	641 (17.8%)	101 (32.8%)
Marital status	Married	3462 (96%)	299 (97.1%)	0.352
	Unmarried	144 (4%)	9 (2.9%)	0.352
Source of drinking water	Unprotected	2180 (60.5%)	237 (76.9%)	< 0.001
	Protected	1426 (39.5%)	71 (23.1%)	< 0.001
Number of children	1–2 children	1371 (38%)	150 (48.7%)	< 0.001
	3–4 children	958 (26.6%)	74 (24%)
	5+ children	1277 (35.4%)	84 (27.3%)
Current contraceptive use	Traditional	2748 (76.2%)	181 (58.8%)	< 0.001
	Modern	858 (23.8%)	127 (41.2%)	< 0.001

Machine learning analysis of MDD-W data balancing

We compared four data balancing techniques, under-sampling, over-sampling, SMOTE, and ADASYN, for improving our models’ performance on the imbalanced dataset. Each approach was evaluated using standard classification metrics, including accuracy, recall, F1 score, specificity, precision, and the Area under the Curve (AUC). Considering all classifiers together, ADASYN was the most consistently effective balancing method for preparing the data for prediction; with it, the RF algorithm reached an AUC of 98.34% (Table 2).

Table 2.

Comparison of imbalanced data handling techniques using accuracy, AUC, recall, specificity, precision, and F1 score

Model	Metric	US	OS	SMOTE	ADASYN
LR	Accuracy	0.6675 ± 0.0458	0.6559 ± 0.0185	0.6648 ± 0.0322	0.6681 ± 0.0175
	AUC	0.7329 ± 0.0463	0.7327 ± 0.0179	0.7276 ± 0.0573	0.7296 ± 0.0222
	Recall	0.6494 ± 0.0520	0.6525 ± 0.0424	0.6461 ± 0.0653	0.6883 ± 0.0182
	Specificity	0.6856 ± 0.0788	0.6592 ± 0.0353	0.6664 ± 0.0325	0.6478 ± 0.0326
	Precision	0.6770 ± 0.0569	0.6573 ± 0.0194	0.1430 ± 0.0201	0.6640 ± 0.0209
	F1-score	0.6614 ± 0.0420	0.6542 ± 0.0240	0.2339 ± 0.0305	0.6757 ± 0.0147
RF	Accuracy	0.6916 ± 0.0337	0.9491 ± 0.0066	0.9507 ± 0.0087	0.9503 ± 0.0175
	AUC	0.7539 ± 0.0399	0.9877 ± 0.0047	0.9811 ± 0.0051	0.9834 ± 0.0222
	Recall	0.6753 ± 0.0407	0.9897 ± 0.0054	0.9297 ± 0.0177	0.9273 ± 0.0182
	Specificity	0.7078 ± 0.0479	0.9085 ± 0.0129	0.9703 ± 0.0104	0.9737 ± 0.0326
	Precision	0.6990 ± 0.0408	0.9155 ± 0.0108	0.9673 ± 0.0111	0.9728 ± 0.0209
	F1-score	0.6864 ± 0.0341	0.9511 ± 0.0061	0.9480 ± 0.0095	0.9494 ± 0.0147
KNN	Accuracy	0.6236 ± 0.0444	0.8326 ± 0.0166	0.7184 ± 0.0212	0.8756 ± 0.0095
	AUC	0.6727 ± 0.0564	0.9375 ± 0.0100	0.6301 ± 0.0496	0.9465 ± 0.0045
	Recall	0.6461 ± 0.0941	0.9936 ± 0.0063	0.4577 ± 0.0748	0.9657 ± 0.0143
	Specificity	0.6008 ± 0.0796	0.6716 ± 0.0338	0.7407 ± 0.0239	0.7845 ± 0.0086
	Precision	0.6193 ± 0.0447	0.7521 ± 0.0192	0.1313 ± 0.0197	0.8195 ± 0.0089
	F1-score	0.6295 ± 0.0563	0.8560 ± 0.0123	0.2037 ± 0.0297	0.8865 ± 0.0100
SVM	Accuracy	0.6611 ± 0.0579	0.6611 ± 0.0579	0.7736 ± 0.0295	0.8463 ± 0.0108
	AUC	0.7356 ± 0.0476	0.7356 ± 0.0476	0.7209 ± 0.0505	0.9235 ± 0.0100
	Recall	0.6237 ± 0.0742	0.6237 ± 0.0742	0.5066 ± 0.0975	0.8524 ± 0.0124
	Specificity	0.6984 ± 0.0703	0.6984 ± 0.0703	0.7965 ± 0.0308	0.8403 ± 0.0164
	Precision	0.6748 ± 0.0616	0.6748 ± 0.0616	0.1770 ± 0.0352	0.8435 ± 0.0139
	F1-score	0.6471 ± 0.0635	0.6471 ± 0.0635	0.2616 ± 0.0494	0.8478 ± 0.0104
GNB	Accuracy	0.6411 ± 0.0333	0.6384 ± 0.0223	0.8032 ± 0.0372	0.7741 ± 0.0237
	AUC	0.7355 ± 0.0388	0.7337 ± 0.0230	0.7260 ± 0.0639	0.8666 ± 0.0191
	Recall	0.5187 ± 0.1022	0.5100 ± 0.0472	0.4385 ± 0.0643	0.7160 ± 0.0269
	Specificity	0.7631 ± 0.0722	0.7668 ± 0.0316	0.8344 ± 0.0384	0.8328 ± 0.0341
	Precision	0.6917 ± 0.0477	0.6867 ± 0.0279	0.1905 ± 0.0480	0.8124 ± 0.0331
	F1-score	0.5861 ± 0.0665	0.5842 ± 0.0352	0.2641 ± 0.0561	0.7608 ± 0.0242
XGBoost	Accuracy	0.7095 ± 0.0389	0.8307 ± 0.0159	0.8097 ± 0.0287	0.9532 ± 0.0116
	AUC	0.7730 ± 0.0496	0.8965 ± 0.0133	0.7570 ± 0.0688	0.9816 ± 0.0046
	Recall	0.7533 ± 0.0714	0.8946 ± 0.0207	0.4483 ± 0.1087	0.9212 ± 0.0222
	Specificity	0.6655 ± 0.0702	0.7668 ± 0.0249	0.8405 ± 0.0281	0.9856 ± 0.0035
	Precision	0.6947 ± 0.0418	0.7935 ± 0.0182	0.1960 ± 0.0472	0.9848 ± 0.0037
	F1-score	0.7210 ± 0.0399	0.8409 ± 0.0148	0.2716 ± 0.0635	0.9519 ± 0.0126
DT	Accuracy	0.6850 ± 0.0294	0.6843 ± 0.0216	0.8459 ± 0.0338	0.7476 ± 0.0095
	AUC	0.7294 ± 0.0333	0.7110 ± 0.0374	0.7190 ± 0.0462	0.8037 ± 0.0045
	Recall	0.6818 ± 0.0624	0.8101 ± 0.0715	0.3344 ± 0.0697	0.7947 ± 0.0143
	Specificity	0.6882 ± 0.0707	0.5585 ± 0.1037	0.8896 ± 0.0356	0.6999 ± 0.0086
	Precision	0.6889 ± 0.0403	0.6519 ± 0.0389	0.2163 ± 0.0580	0.7327 ± 0.0089
	F1-score	0.6832 ± 0.0338	0.7190 ± 0.0148	0.2591 ± 0.0575	0.7599 ± 0.0100

Bold values indicate the highest performance metrics. US: under-sampling; OS: over-sampling.
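One simple way to summarize Table 2 is to average each balancing method's AUC across the seven classifiers. The sketch below does this with the mean AUC values copied from the table. Averaging across models is an assumption of this illustration and not necessarily the criterion the authors applied.

```python
# Mean AUC values copied from Table 2 (columns: US, OS, SMOTE, ADASYN).
METHODS = ("US", "OS", "SMOTE", "ADASYN")
AUC_TABLE = {
    "LR":      (0.7329, 0.7327, 0.7276, 0.7296),
    "RF":      (0.7539, 0.9877, 0.9811, 0.9834),
    "KNN":     (0.6727, 0.9375, 0.6301, 0.9465),
    "SVM":     (0.7356, 0.7356, 0.7209, 0.9235),
    "GNB":     (0.7355, 0.7337, 0.7260, 0.8666),
    "XGBoost": (0.7730, 0.8965, 0.7570, 0.9816),
    "DT":      (0.7294, 0.7110, 0.7190, 0.8037),
}

# Average each method's AUC over the seven classifiers.
mean_auc = {
    method: sum(row[i] for row in AUC_TABLE.values()) / len(AUC_TABLE)
    for i, method in enumerate(METHODS)
}
best_method = max(mean_auc, key=mean_auc.get)

# Best method for each individual classifier.
best_per_model = {
    model: METHODS[row.index(max(row))] for model, row in AUC_TABLE.items()
}
print(best_method, {m: round(v, 4) for m, v in mean_auc.items()})
print(best_per_model)
```

On this summary ADASYN has the highest average AUC, although for individual classifiers another method can score slightly higher (for RF, over-sampling reaches 0.9877 against 0.9834 for ADASYN).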

Features selection using Boruta algorithms

To determine the best combination of predictors, we used the Boruta algorithm, a data-driven wrapper method for feature selection. Boruta assesses the statistical relevance of each independent variable’s contribution to predicting Minimum Dietary Diversity (MDD-W) status. This process retained twelve features, shown in green in Fig. 4, which contributed meaningfully to predicting MDD-W and were therefore chosen for predictive modelling. Only one variable, marital status (shown in red in Fig. 4), was rejected as irrelevant, since its contribution was indistinguishable from random noise.
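The core idea of Boruta is to compare each real feature against "shadow" features, which are shuffled copies that carry no information about the outcome. The sketch below illustrates only that comparison. As a loudly stated simplification, it uses absolute Pearson correlation as a stand-in importance score instead of random forest importance, and it counts hits without the statistical test that Boruta applies over iterations. The data, feature names, and function names are invented.

```python
import random
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / sqrt(sxx * syy) if sxx and syy else 0.0


def shadow_hits(features, target, n_iter=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_scores = []
        for column in features.values():
            shadow = column[:]   # copy, then destroy any link to the target
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        best_shadow = max(shadow_scores)
        for name, column in features.items():
            if abs_corr(column, target) > best_shadow:
                hits[name] += 1
    return hits


# Invented data: one informative feature and one pure-noise feature.
data_rng = random.Random(1)
target = [data_rng.randint(0, 1) for _ in range(200)]
features = {
    "signal": [y + data_rng.gauss(0, 0.3) for y in target],
    "noise": [data_rng.gauss(0, 1) for _ in target],
}
hits = shadow_hits(features, target)
print(hits)
```

A feature that beats the best shadow in nearly every iteration would be retained, while one that wins no more often than chance, as marital status did in this study, would be rejected.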

Fig. 4 — Feature selections using the Boruta algorithm method

Model development and performance evaluation to predict minimum dietary diversity

To compare the predictive power of the algorithms comprehensively, a range of performance metrics was employed, including accuracy, precision, recall, F1 score, specificity, and AUC. Together, these measures capture the models’ overall accuracy, their capacity to label positive and negative examples correctly, and their overall discriminative capacity. Based on this analysis, and in particular the Receiver Operating Characteristic (ROC) curve analysis, we determined that the top two performing machine learning classifiers for MDD-W status were the random forest and eXtreme Gradient Boosting classifiers (Fig. 5).

Fig. 5 — Performance evaluation measure of the machine learning algorithm after data balancing with ADASYN

SHAP value interpretation

The Boruta feature importance chart (Fig. 6) shows that current age of children is the most influential factor, with a score of 83.17. It is followed by region at 66.79 and source of drinking water at 55.06. Other important features include age, frequency of listening to the radio, and contraceptive use, which also contribute notably. In contrast, residence has the lowest score at 27.33, indicating that it contributes least among these features. Overall, these findings emphasize the importance of age and regional differences, along with socio-economic elements like education and health factors.

Discussion

This research highlights the performance of Machine Learning algorithms in predicting inadequate Minimum Dietary Diversity for Women (MDD-W) among Ethiopian women. This success opens the way to automated screening tools and decision support systems that can assist healthcare professionals in identifying and managing this significant public health problem. Our study employed and compared seven ML models, Random Forest (RF), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Logistic Regression (LR), to quantify their comparative predictive capability. Performance assessment revealed that all seven algorithms surpassed the optimal cut-off point for Receiver Operating Characteristic (ROC) scores. Most significantly, the Random Forest algorithm emerged as the optimal model (accuracy = 95.03%, sensitivity = 92.73%, precision = 97.28%, F1-score = 94.94%, and AUC = 98.34%) (Fig. 7).

Fig. 7 — ROC Curves for the seven models

We also compared the seven models using a calibration plot (Fig. 8). The calibration plot measures the agreement between the mean predicted probabilities of minimum dietary diversity from the models (X-axis) and the observed frequency of minimum dietary diversity (Y-axis). The plot shows that the random forest is well calibrated, as its mean predicted probability is close to the observed frequency across the entire distribution. The predictions from the other models lie further from the observed frequency.
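The points on a calibration plot can be obtained by grouping predictions into probability bins and comparing, within each bin, the mean predicted probability with the observed share of positive outcomes. The sketch below assumes equal-width bins and uses invented labels and probabilities; the function name and the number of bins are hypothetical.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed frequency) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))

    curve = []
    for members in bins:
        if members:
            mean_predicted = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_predicted, observed))
    return curve


# Invented outcomes (1 = event observed) and predicted probabilities.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.15, 0.90, 0.95, 0.85, 0.50]
curve = calibration_curve(y_true, y_prob)
print(curve)
```

A well-calibrated model produces points close to the diagonal, where the mean predicted probability equals the observed frequency.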

The use of the Random Forest (RF) classifier in predicting Minimum Dietary Diversity for Women (MDD-W) has significant implications, primarily because it yields accurate predictive models and useful insight into risk factors. This capability makes it possible to detect at-risk subgroups with high accuracy and points to a clear path for integrating machine learning solutions into existing healthcare systems. These improvements are crucial for developing targeted nutrition interventions, implementing personalized healthcare strategies, and improving the health of women affected by poor-quality diets.

The second goal of this research was to determine the key predictors of poor MDD-W among Ethiopian women. For this purpose, Boruta feature selection was employed. Starting from 13 variables derived from the literature, the analysis retained 12 predictors of poor dietary diversity: current age of children, region, religion, toilet facility type, wealth index, education level, current contraceptive use, place of residence, current age of the respondent, drinking water source, number of children ever born, and frequency of listening to the radio. This result is supported by studies conducted in Ethiopia [36] and Bangladesh [37]. The subsequent analysis of mean SHAP values provided further quantitative evidence on the relative importance of these attributes in the classification model. Specifically, the factors with the greatest influence on the model predictions were region, number of children ever born, current age of the children, and current contraceptive use. The finding that region and children’s current age have the largest SHAP values for inadequate MDD-W provides clear direction for policymakers and health personnel to target interventions and services, such as enhanced nutrition and health education programs, at the areas and groups found to be at highest risk.

The observation that low Minimum Dietary Diversity for Women (MDD-W) is far more prevalent in rural areas (81.1%) than in urban areas is supported by previous nutritional and epidemiological research in Ethiopia and is consistent with studies conducted in South Africa [38], Uganda [39], and Nigeria. This marked rural-urban disparity is multi-factorial in origin, largely reflecting spatial variations in socioeconomic status, access to markets, and levels of education. Rural households frequently depend on subsistence agriculture for their dietary diversity and food security, which naturally results in consumption patterns centered on monotonous cereal-based diets. The availability of nutrient-rich food groups, such as Animal Source Foods (ASF) and diverse fruits and vegetables, is severely limited by underdeveloped market infrastructure, limited disposable income for buying food, and a lack of proper preservation and storage facilities. This result lends direct support to research that identifies wealth index and geographic location as major predictors of MDD-W, in the sense that living in cities typically provides advantages such as greater market density and diversified non-agricultural work, and hence better access to varied food sources. In addition, the lower levels of female formal schooling and of exposure to mass media in rural Ethiopia tend to act as major barriers to the adoption of optimal dietary practices, thereby reinforcing the intergenerational cycle of maternal malnutrition and underscoring the need for geographically targeted public health interventions.

The study revealed that women who had given birth to five or more children (35.4%) were more likely to have inadequate Minimum Dietary Diversity for Women (MDD-W) than those with fewer children. One explanation is biological: successive pregnancies and extended lactation place cumulative physiological strain on the mother and gradually exhaust her micronutrient stores, a process sometimes loosely described as the “maternal depletion syndrome” [40]. This depletion raises the woman’s baseline nutritional needs, which become much harder to meet from a usual diet. A second explanation is socioeconomic: in resource-limited settings, high parity is usually linked with lower household wealth, lower education, and higher dependency ratios. A large number of children strains the household budget and forces a choice between spending on staple foods to feed the family and purchasing the diversified, more expensive micronutrient-rich foods (such as Animal Source Foods and certain fruits and vegetables) that the mother needs to reach an adequate MDD-W score. This pattern therefore reflects a double burden, in which biological demands rise while the economic means of meeting them shrink, and it supports integrating family planning and economic empowerment into nutrition programs.

The finding that a low level of formal education among women considerably increases the risk of inadequate Minimum Dietary Diversity for Women (MDD-W) reflects one of the strongest and most widely accepted associations in nutrition science. Three cross-cutting mechanisms underlie it. First, education is a major channel for transmitting nutritional knowledge: it enables women to access, read, and use detailed information on diet and health, and so to make the informed consumption choices needed for a diversified, micronutrient-rich diet [41]. Second, a higher level of education is directly connected with greater socioeconomic empowerment and more decision-making autonomy over household funds [42, 43]. This empowerment allows educated women to prioritize spending on the varied, often costlier, food groups needed to meet MDD-W rather than on staple foods alone. Finally, education strengthens personal agency and capability, and thus the use of core health care services such as antenatal and postnatal care, where essential nutrition counselling is provided [44, 45]. In effect, formal schooling is a powerful social determinant that equips women with the knowledge and economic leverage to overcome system-level obstacles to diet quality, which explains why its absence is an effective predictor of inadequate MDD-W.

The finding that women with no media exposure are at higher risk of inadequate Minimum Dietary Diversity for Women (MDD-W) than those with some exposure is important and consistent with established behavioral and communication theory in public health [46]. The pattern can be explained by the role that mass media, including radio, television, and telephone access, plays as a channel for disseminating vital nutrition-related information.

In most low-income settings such as Ethiopia, mass media is one of the most effective and least costly means of reaching geographically dispersed and often less educated populations [47]. Exposure to media campaigns, public health announcements, and educational programs on diet, hygiene, and maternal health directly increases a woman’s knowledge of nutrition and her awareness of diverse foods. Furthermore, media exposure acts as a proxy for social connectedness and modernity, which are typically accompanied by higher health aspirations, greater openness to change, and better ability to obtain a variety of foods from markets [48]. Without such exposure, women become information-poor, cut off from important health information, and less likely to adopt new and varied foods, which makes lack of media exposure a significant barrier to achieving adequate MDD-W.

The research has practical value: it can guide early detection of inadequate MDD-W, help prevention strategies focus on the target group, and support tailored intervention plans. The findings can also inform resource planning and policy formulation in the health sector, and they have the potential to improve the health of Ethiopian women by reducing inadequate MDD-W and its impact on individuals, households, and the national healthcare system. The study contributes to research on MDD-W interventions among mothers through its methodological approach, its identification of significant risk factors, and its development of accurate prediction models. These results give policymakers and program planners data-driven direction for designing focused interventions to improve women’s nutritional status and overall health in the region.

Comparison with previous studies

This study provides valuable insights into the predictors of inadequate Minimum Dietary Diversity for Women (MDD-W) among mothers in Ethiopia by employing machine learning techniques, marking a significant advancement over traditional statistical methods used in prior research. Previous studies, including those by Mekonen et al. (2022) [49] and Saaka et al. (2021) [20], primarily relied on linear regression models, which often fail to capture the complex, non-linear relationships present in dietary diversity data. In contrast, our study utilizes various machine learning algorithms, such as Random Forest and eXtreme Gradient Boosting, achieving a notably higher predictive accuracy (95.03% accuracy for the Random Forest model) compared to earlier conventional analyses.
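For readers less familiar with the performance metrics used in this comparison, the sketch below shows how accuracy, sensitivity, specificity, precision, and F1-score follow from a confusion matrix. The counts are hypothetical toy numbers, not the study’s confusion matrix. The final lines check that the precision (97.28%) and sensitivity (92.73%) reported for the Random Forest model imply an F1-score close to the reported 94.94%.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true positives found
    specificity = tn / (tn + fp)          # share of true negatives found
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def f1_from(precision, sensitivity):
    """F1-score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical toy counts for illustration only.
toy = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 4) for name, value in toy.items()})

# Consistency check using the precision and sensitivity reported for the RF model.
reported_f1 = f1_from(0.9728, 0.9273)
print(round(reported_f1, 4))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, which is why it falls nearer to the sensitivity than to the precision.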

Our research identifies the following critical predictors of MDD-W: rural residence, an unprotected source of drinking water, poor wealth index, no media exposure, an unimproved toilet facility, no education, age, religion, and traditional contraceptive methods. These findings align closely with studies conducted in other African countries, such as Chakona and Shackleton (2017) in South Africa [50], as well as research from Asia, notably Zaman et al. (2023) in Bangladesh [51]. In both contexts, similar variables were recognized as key factors influencing dietary diversity, which underlines the importance of socioeconomic determinants across diverse settings.

Moreover, we have addressed class imbalance using the Adaptive Synthetic (ADASYN) method, which significantly enhances the reliability of our results. Earlier works, including those by Bitew et al. (2020) [52], experienced challenges with imbalanced datasets that affected the accuracy of their predictive outcomes. Our use of machine learning techniques not only strengthens the validity of our findings but also illustrates their potential to improve public health interventions focused on maternal nutrition.
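The core idea of ADASYN can be sketched as follows: minority-class observations that are surrounded by more majority-class neighbours receive a larger share of the synthetic samples. The code below illustrates only this allocation step, using hypothetical two-dimensional toy points and assumed parameter names (`k`, `beta`); it is not the implementation used in this study, which was carried out in R. In the full method, each synthetic point is then created by interpolating between a minority observation and one of its minority-class neighbours.

```python
import math


def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Return how many synthetic points to create for each minority sample.

    beta = 1.0 aims for a fully balanced data set after oversampling.
    """
    total_needed = (len(majority) - len(minority)) * beta
    # Label each point: 0 = minority, 1 = majority (minority points come first).
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        # Distances to every other point in the whole data set (self excluded).
        others = [
            (math.dist(x, p), is_major)
            for j, (p, is_major) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        # Share of majority-class points among the k nearest neighbours.
        ratios.append(sum(is_major for _, is_major in others[:k]) / k)
    norm = sum(ratios)
    if norm == 0:
        return [0] * len(minority)  # no minority point lies near the majority class
    return [round(r / norm * total_needed) for r in ratios]


# Hypothetical toy data: four minority points in a tight cluster and one
# minority point sitting inside the majority-class region.
minority = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
majority = [(5, 6), (6, 5), (5, 4), (4, 5), (6, 6), (9, 9), (9, 8), (8, 9), (8, 8)]

allocation = adasyn_allocation(minority, majority)
print(allocation)
```

In this toy example all synthetic samples are assigned to the minority point at (5, 5), because it is the only one whose nearest neighbours belong to the majority class; this focus on hard-to-learn observations is what distinguishes ADASYN from uniform oversampling.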

Additionally, our findings resonate with those from studies in Asia, where educational attainment and media exposure have been highlighted as important factors in determining women’s dietary diversity [19]. Research in countries like India [53] and Pakistan [54] corroborates our conclusions, showing that higher levels of education are strongly associated with improved dietary diversity, thereby aligning with the insights presented in our study.

Strengths and limitations of the study

This study has several strengths and limitations. Its strengths include the use of machine learning algorithms, which can improve predictive accuracy and allow a more detailed examination of the factors affecting Minimum Dietary Diversity for Women (MDD-W). It analyses a large, nationally representative dataset (2016 EDHS), which strengthens the validity and generalizability of its results. In addition, the identification of key socio-demographic predictors has practical implications for targeted public health activities, and the SHAP value analysis shows the relative importance of each predictor, which makes the model easier to interpret. Nonetheless, the study relies on cross-sectional data from a single year (2016), which limits causal inference and cannot capture changes in dietary patterns over time. Self-reported dietary intake may introduce bias that affects the validity of the MDD-W assessment, and because the sample is restricted to women with children under 24 months, the findings may not represent the dietary problems of all women in Ethiopia. Lastly, although the ADASYN method was used to address class imbalance, residual imbalance might still affect the model’s ability to predict inadequate MDD-W.

Conclusion

Machine learning (ML) algorithms, specifically the random forest (RF) classifier, can serve as a predictive tool with a high AUC for identifying Ethiopian mothers with inadequate Minimum Dietary Diversity for Women (MDD-W). The RF model outperformed the other algorithms and identified the large majority of at-risk mothers on the test set (sensitivity = 92.73%). Rural residence, an unprotected source of drinking water, poor wealth index, no media exposure, an unimproved toilet facility, no education, age, religion, and traditional contraceptive methods were the top predictors of women’s minimum dietary diversity. These results offer evidence-based guidance to healthcare administrators at the community level: efforts should focus on geographically specific interventions that combine nutrition education, family planning, and economic empowerment to address maternal nutritional deficiencies. They can also inform policy briefs for the bodies concerned with achieving the SDGs. We also recommend that other researchers consider using time-series DHS data or federated models for nutritional monitoring.

Acknowledgements

We would like to thank the Demographic and Health Surveys (DHS) Program for providing free access to the dataset used for this analysis.

Abbreviations

ML: Machine learning
MDD-W: Minimum dietary diversity for women
AUC: Area under the curve
WRA: Women of reproductive age
FAO: Food and Agriculture Organization
EDHS: Ethiopian Demographic and Health Survey
SMOTE: Synthetic minority over-sampling technique
ADASYN: Adaptive synthetic approach
LR: Logistic regression
RF: Random forest
KNN: K-nearest neighbours
SVM: Support vector machine
XGBOOST: Extreme gradient boosting
DT: Decision tree
GNB: Gaussian naïve Bayes
ROC: Receiver operating characteristic

Author contributions

Conceptualization: Aychew Kassa Belete. Data curation: Aychew Kassa Belete, Birhan Ambachew Taye. Formal analysis: Aychew Kassa Belete, Birhan Ambachew Taye, Bantie Getnet Yirsaw. Methodology: Bantie Getnet Yirsaw, Birhan Ambachew Taye. Software: Aychew Kassa Belete, Birhan Ambachew Taye, Bantie Getnet Yirsaw. Writing – original draft: Aychew Kassa Belete, Birhan Ambachew Taye, Bantie Getnet Yirsaw. Writing – review & editing: Aychew Kassa Belete.

Data availability

The dataset can be accessed at https://dhsprogram.com/methodology/survey/survey-display-478.cfm.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hanieh S, High H, Boulton J. Nutrition justice: uncovering invisible pathways to malnutrition. Front Endocrinol. 2020;11:150.
2. House SH. Transgenerational healing: educating children in genesis of healthy children, with focus on nutrition, emotion, and epigenetic effects on brain development. Nutr Health. 2013;22(1):9–45.
3. Delisle HF. Poverty: the double burden of malnutrition in mothers and the intergenerational impact. Ann N Y Acad Sci. 2008;1136(1):172–84.
4. Diversi T, Fraser A. The Good Enough Diet: Where Near Enough is Good Enough to Lose Weight. John Wiley & Sons; 2011.
5. Sommer A. Vitamin A deficiency and its consequences: a field guide to detection and control. World Health Organization; 1995.
6. Subcommittee on Interpretation and Uses of Dietary Reference Intakes; Standing Committee on the Scientific Evaluation of Dietary Reference Intakes. Dietary reference intakes: applications in dietary assessment. 2001.
7. Warriner KM. Dietary diversity and nutritional status of pregnant women attending an ante-natal clinic in KZN. 2018.
8. Tinker A, et al. Women’s health and nutrition. World Bank discussion paper, 1995(256).
9. Soysa P. Women and nutrition. World Rev Nutr Diet. 1987;52:1.
10. Goodman SH, Gotlib IH. Risk for psychopathology in the children of depressed mothers: a developmental model for understanding mechanisms of transmission. Psychol Rev. 1999;106(3):458.
11. Ilo J, Onabanjo O, Hamzat A. Dietary diversity and micronutrient intake of adult women in Ogun State, Nigeria (case study). Egypt J Nutr. 2023;38(3):13–21.
12. Forsido SF, et al. Maternal dietary practices, dietary diversity, and nutrient composition of diets of lactating mothers in Jimma Zone, Southwest Ethiopia. PLoS One. 2021;16(7):e0254259.
13. Bekele WK. Food-based strategies to improve iron status of pregnant women: randomized controlled trial. University of South Africa (South Africa); 2019.
14. Aqeel A. Dietary genomics as a complementary tool in childhood malnutrition intervention. Duke University; 2025.
15. Manohar S. Childhood linear growth velocity in the plains (Tarai) of Nepal: patterns and risk factors. Johns Hopkins University; 2019.
16. Xu Y, et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation. 2021. 10.1016/j.xinn.2021.100179.
17. Vamathevan J, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discovery. 2019;18(6):463–77.
18. Norde MM, et al. The global diet quality score as an indicator of adequate nutrient intake and dietary quality–a nation-wide representative study. Nutr J. 2024;23(1):42.
19. Mekonen EG. Minimum dietary diversity and its determinants among women of childbearing age in three Sub-Saharan African and South Asian countries: evidence from the most recent nationally representative surveys (2022). Womens Health Rep. 2024;5(1):954–64.
20. Saaka M, Mutaru S, Osman SM. Determinants of dietary diversity and its relationship with the nutritional status of pregnant women. J Nutr Sci. 2021;10:e14.
21. Babah OA, et al. Dietary diversity insufficiently explains differences in prevalence of anaemia in pregnancy across regions in Nigeria: a secondary analysis of demographic and health survey 2018. PLoS Glob Public Health. 2025;5(5):e0004540.
22. Kinshella M-LM. Maternal diets and dietary diversity in relation to pregnancy hypertension in sub-Saharan Africa. University of British Columbia; 2024.
23. Vahedi L, et al. Intimate partner violence and women’s dietary diversity: a population-based investigation in 8 low-and middle-income countries. J Nutr. 2025;155(4):1236–45.
24. Fisher R. Nutrient adequacy and dietary diversity of women in the Gauteng and Eastern Cape provinces, South Africa–focus on micronutrients from the national food fortification programme. 2021.
25. Goodarzi M, Dejaegher B, Heyden YV. Feature selection methods in QSAR studies. J AOAC Int. 2012;95(3):636–51.
26. Jović A, Brkić K, Bogunović N. A review of feature selection methods with applications. 2015 38th international convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE; 2015.
27. Theng D, Bhoyar KK. Feature selection techniques for machine learning: a survey of more than two decades of research. Knowl Inf Syst. 2024;66(3):1575–637.
28. Bitew FH, et al. Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian demographic and health survey. Genus. 2020;76(1):37.
29. Rao B, et al. Machine learning in predicting child malnutrition: a meta-analysis of demographic and health surveys data. Int J Environ Res Public Health. 2025;22(3):449.
30. Khudri MM, et al. Predicting nutritional status for women of childbearing age from their economic, health, and demographic features: a supervised machine learning approach. PLoS One. 2023;18(5):e0277738.
31. Kananura RM. Machine learning predictive modelling for identification of predictors of acute respiratory infection and diarrhoea in Uganda’s rural and urban settings. PLoS Glob Public Health. 2022;2(5):e0000430.
32. Li J, et al. Interpretable mortality prediction model for ICU patients with pneumonia: using Shapley additive explanation method. BMC Pulm Med. 2024;24(1):447.
33. Sahlaoui H, et al. Predicting and interpreting student performance using ensemble models and Shapley additive explanations. IEEE Access. 2021;9:152688–703.
34. Shehadeh A, Alshboul O. Game theory integration in construction management: a comprehensive approach to cost, risk, and coordination under uncertainty. J Constr Eng Manag. 2025;151(5):04025039.
35. Ferraz V. Understanding and Modeling Economic Behavior: Experimental Insights and Computational Perspectives. 2024.
36. Zemariam AB, et al. Prediction of stunting and its socioeconomic determinants among adolescent girls in Ethiopia using machine learning algorithms. PLoS One. 2025;20(1):e0316452.
37. Islam MH, et al. Dietary diversity and micronutrients adequacy among the women of reproductive age at St. Martin’s Island in Bangladesh. BMC Nutr. 2023;9(1):52.
38. Chakona G, Shackleton C. Minimum dietary diversity scores for women indicate micronutrient adequacy and food insecurity status in South African towns. Nutrients. 2017;9(8):812.
39. Kimuli D, et al. Prevalence and determinants of minimum dietary diversity for women of reproductive age in Uganda. BMC Nutr. 2024;10(1):39.
40. Serrallach O. The Postnatal Depletion Cure: A complete guide to rebuilding your health and reclaiming your energy for mothers of newborns, toddlers and young children. Hachette UK; 2018.
41. Katenga-Kaunda LZ, et al. Enhancing nutrition knowledge and dietary diversity among rural pregnant women in Malawi: a randomized controlled trial. BMC Pregnancy Childbirth. 2021;21(1):644.
42. Acharya DR, et al. Women’s autonomy in household decision-making: a demographic study in Nepal. Reprod Health. 2010;7(1):15.
43. Hameed W, et al. Women’s empowerment and contraceptive use: the role of independent versus couples’ decision-making, from a lower middle income country perspective. PLoS One. 2014;9(8):e104633.
44. Titaley CR, et al. Why don’t some women attend antenatal and postnatal care services? A qualitative study of community members’ perspectives in Garut, Sukabumi and Ciamis districts of West Java Province, Indonesia. BMC Pregnancy Childbirth. 2010;10(1):61.
45. Salam RA, et al. Essential childbirth and postnatal interventions for improved maternal and neonatal health. Reprod Health. 2014;11(Suppl 1):S3.
46. Michael M, Cheuvront CCJB. Health communication on the Internet: an effective channel for health behavior change? J Health Commun. 1998;3(1):71–9.
47. Negussie A, et al. Reach and impact of a nationwide media campaign in Ethiopia for promoting safe breastfeeding practices in the context of the COVID-19 pandemic. BMC Global Public Health. 2024;2(1):37.
48. Hernandez DCSH. Building a modern port: urban space, local government and social change in Veracruz, Mexico, 1872–1914. The University of Chicago; 2014.
49. Zegeye AF, et al. Prevalence and determinants of inadequate dietary diversity among pregnant women in four Sub-Saharan Africa countries: a multilevel analysis of recent demographic and health surveys from 2021 to 2022. Front Nutr. 2024;11:1405102.
50. Chakona G, Shackleton CM. Household food insecurity along an agro-ecological gradient influences children’s nutritional status in South Africa. Front Nutr. 2018;4:72.
51. Zaman S, Ahammed T, Bashar MA. Association between dietary diversity and complications during pregnancy in a South-West district of Bangladesh. Malaysian J Nutr. 2024;30(1):80–84.
52. Zemariam AB, et al. Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: evidence from 2016 EDHS. Health Inf J. 2024;30(3):14604582241285769.
53. Singh S, Jones AD, Jain M. Regional differences in agricultural and socioeconomic factors associated with farmer household dietary diversity in India. PLoS One. 2020;15(4):e0231107.
54. Waseem M, et al. Do crop diversity and livestock production improve smallholder intra-household dietary diversity, nutrition and sustainable food production? Empirical evidence from Pakistan. Front Sustainable Food Syst. 2023;7:1143774.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The dataset can be accessed at [**https://dhsprogram.com/methodology/survey/survey-display-478.cfm**](https:/dhsprogram.com/methodology/survey/survey-display-478.cfm).

[CR1] 1.Hanieh S, High H, Boulton J. Nutrition justice: Uncovering invisible pathways to malnutrition. Front Endocrinol. 2020;11:150. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.House SH. Transgenerational healing: educating children in genesis of healthy children, with focus on nutrition, emotion, and epigenetic effects on brain development. Nutr Health. 2013;22(1):9–45. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Delisle HF. Poverty: the double burden of malnutrition in mothers and the intergenerational impact. Ann N Y Acad Sci. 2008;1136(1):172–84. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Diversi T, Fraser A. The Good Enough Diet: Where Near Enough is Good Enough to Lose Weight. John Wiley & Sons; 2011. [Google Scholar]

[CR5] 5.Sommer A. Vitamin A deficiency and its consequences: a field guide to detection and control. World Health Organization; 1995.

[CR6] 6.Intakes S. C.o.t.S.E.o.D.R., S.o. Interpretation, and U.o.D.R. Intakes, Dietary reference intakes: applications in dietary assessment. 2001.

[CR7] 7.Warriner KM. Dietary diversity and nutritional status of pregnant women attending an ante-natal clinic in KZN. 2018.

[CR8] 8.Tinker A et al. Women’s health and nutrition. World Bank discussion paper, 1995(256).

[CR9] 9.Soysa P. Women and nutrition. World Rev Nutr Diet. 1987;52:1. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Goodman SH, Gotlib IH. Risk for psychopathology in the children of depressed mothers: a developmental model for understanding mechanisms of transmission. Psychol Rev. 1999;106(3):458. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Ilo J, Onabanjo O, Hamzat A. Dietary diversity and micronutrient intake of adult women in Ogun State, Nigeria (case study). Egypt J Nutr. 2023;38(3):13–21. [Google Scholar]

[CR12] 12.Forsido SF, et al. Maternal dietary practices, dietary diversity, and nutrient composition of diets of lactating mothers in Jimma Zone, Southwest Ethiopia. PLoS One. 2021;16(7):e0254259. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Bekele WK. Food-based strategies to improve iron status of pregnant women: randomized controlled trial. University of South Africa (South Africa); 2019.

[CR14] 14.Aqeel A. Dietary genomics as a complementary tool in childhood malnutrition intervention. Duke University; 2025.

[CR15] 15.Manohar S. Childhood linear growth velocity in the plains (Tarai) of nepal: patterns and risk factors. Johns Hopkins University; 2019.

[CR16] 16.Xu Y, et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation. 2021. 10.1016/j.xinn.2021.100179. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Vamathevan J, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discovery. 2019;18(6):463–77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Norde MM, et al. The global diet quality score as an indicator of adequate nutrient intake and dietary quality–a nation-wide representative study. Nutr J. 2024;23(1):42. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Mekonen EG. Minimum dietary diversity and its determinants among women of childbearing age in three Sub-Saharan African and South Asian countries: evidence from the most recent nationally representative surveys (2022). Womens Health Rep. 2024;5(1):954–64. [Google Scholar]

[CR20] 20.Saaka M, Mutaru S, Osman SM. Determinants of dietary diversity and its relationship with the nutritional status of pregnant women. J Nutr Sci. 2021;10:e14. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Babah OA, et al. Dietary diversity insufficiently explains differences in prevalence of anaemia in pregnancy across regions in Nigeria: a secondary analysis of demographic and health survey 2018. PLoS Glob Public Health. 2025;5(5):e0004540. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Kinshella M-LM. Maternal diets and dietary diversity in relation to pregnancy hypertension in sub-Saharan Africa. University of British Columbia; 2024.

[CR23] 23.Vahedi L, et al. Intimate partner violence and women’s dietary diversity: a population-based investigation in 8 low-and middle-income countries. J Nutr. 2025;155(4):1236–45. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Fisher R. Nutrient adequacy and dietary diversity of women in the Gauteng and Eastern Cape provinces, South Africa–focus on micronutrients from the national food fortification programme. 2021.

[CR25] 25.Goodarzi M, Dejaegher B, Heyden YV. Feature selection methods in QSAR studies. J AOAC Int. 2012;95(3):636–51. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Jović A, Brkić K, Bogunović N. A review of feature selection methods with applications. 2015 38th international convention on information and communication technology, electronics and microelectronics (MIPRO). Ieee; 2015.

[CR27] 27.Theng D, Bhoyar KK. Feature selection techniques for machine learning: a survey of more than two decades of research. Knowl Inf Syst. 2024;66(3):1575–637. [Google Scholar]

[CR28] 28.Bitew FH, et al. Machine learning approach for predicting under-five mortality determinants in ethiopia: evidence from the 2016 Ethiopian demographic and health survey. Genus. 2020;76(1):37. [Google Scholar]

[CR29] 29.Rao B, et al. Machine learning in predicting child malnutrition: a meta-analysis of demographic and health surveys data. Int J Environ Res Public Health. 2025;22(3):449. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Khudri MM, et al. Predicting nutritional status for women of childbearing age from their economic, health, and demographic features: a supervised machine learning approach. PLoS One. 2023;18(5):e0277738. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Kananura RM. Machine learning predictive modelling for identification of predictors of acute respiratory infection and diarrhoea in Uganda’s rural and urban settings. PLoS Glob Public Health. 2022;2(5):e0000430. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Li J, et al. Interpretable mortality prediction model for ICU patients with pneumonia: using Shapley additive explanation method. BMC Pulm Med. 2024;24(1):447. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Sahlaoui H, et al. Predicting and interpreting student performance using ensemble models and Shapley additive explanations. IEEe Access. 2021;9:152688–703. [Google Scholar]

[CR34] 34.Shehadeh A, Alshboul O. Game theory integration in construction management: a comprehensive approach to cost, risk, and coordination under uncertainty. J Constr Eng Manag. 2025;151(5):04025039. [Google Scholar]

[CR35] 35.Ferraz V. Understanding and Modeling Economic Behavior: Experimental Insights and Computational Perspectives. 2024.

[CR36] 36.Zemariam AB, et al. Prediction of stunting and its socioeconomic determinants among adolescent girls in Ethiopia using machine learning algorithms. PLoS One. 2025;20(1):e0316452. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Islam MH, et al. Dietary diversity and micronutrients adequacy among the women of reproductive age at St. Martin’s Island in Bangladesh. BMC Nutr. 2023;9(1):52. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Chakona G, Shackleton C. Minimum dietary diversity scores for women indicate micronutrient adequacy and food insecurity status in South African towns. Nutrients. 2017;9(8):812. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Kimuli D, et al. Prevalence and determinants of minimum dietary diversity for women of reproductive age in Uganda. BMC Nutr. 2024;10(1):39. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Serrallach O. The Postnatal Depletion Cure: A complete guide to rebuilding your health and reclaiming your energy for mothers of newborns, toddlers and young children. Hachette UK; 2018. [Google Scholar]

[CR41] 41.Katenga-Kaunda LZ, et al. Enhancing nutrition knowledge and dietary diversity among rural pregnant women in Malawi: a randomized controlled trial. BMC Pregnancy Childbirth. 2021;21(1):644. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Acharya DR, et al. Women’s autonomy in household decision-making: a demographic study in Nepal. Reproductive Health. 2010;7(1):15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Hameed W, et al. Women’s empowerment and contraceptive use: the role of independent versus couples’ decision-making, from a lower middle income country perspective. PLoS ONE. 2014;9(8):e104633. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Titaley CR, et al. Why don’t some women attend antenatal and postnatal care services? A qualitative study of community members’ perspectives in Garut, Sukabumi and Ciamis districts of West Java Province, Indonesia. BMC Pregnancy Childbirth. 2010;10(1):61. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.Salam RA, et al. Essential childbirth and postnatal interventions for improved maternal and neonatal health. Reprod Health. 2014;11(Suppl 1):S3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46. Cassell MM, Jackson C, Cheuvront B. Health communication on the Internet: an effective channel for health behavior change? J Health Commun. 1998;3(1):71–9.

[CR47] 47. Negussie A, et al. Reach and impact of a nationwide media campaign in Ethiopia for promoting safe breastfeeding practices in the context of the COVID-19 pandemic. BMC Global Public Health. 2024;2(1):37.

[CR48] 48. Hernandez DCSH. Building a modern port: urban space, local government and social change in Veracruz, Mexico, 1872–1914. The University of Chicago; 2014.

[CR49] 49. Zegeye AF, et al. Prevalence and determinants of inadequate dietary diversity among pregnant women in four Sub-Saharan Africa countries: a multilevel analysis of recent demographic and health surveys from 2021 to 2022. Front Nutr. 2024;11:1405102.

[CR50] 50. Chakona G, Shackleton CM. Household food insecurity along an agro-ecological gradient influences children’s nutritional status in South Africa. Front Nutr. 2018;4:72.

[CR51] 51. Zaman S, Ahammed T, Bashar MA. Association between dietary diversity and complications during pregnancy in a South-West district of Bangladesh. Malays J Nutr. 2024;30(1):80–84.

[CR52] 52. Zemariam AB, et al. Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: evidence from 2016 EDHS. Health Informatics J. 2024;30(3):14604582241285769.

[CR53] 53. Singh S, Jones AD, Jain M. Regional differences in agricultural and socioeconomic factors associated with farmer household dietary diversity in India. PLoS ONE. 2020;15(4):e0231107.

[CR54] 54. Waseem M, et al. Do crop diversity and livestock production improve smallholder intra-household dietary diversity, nutrition and sustainable food production? Empirical evidence from Pakistan. Front Sustain Food Syst. 2023;7:1143774.
