Abstract
Introduction
Neonatal complications remain a leading cause of illness and death in low- and middle-income countries, particularly in rural areas. Early identification of high-risk neonates is crucial for timely interventions. This study assessed the incidence and determinants of neonatal complications and evaluated the predictive performance of machine learning algorithms using a unified risk framework encompassing both adverse birth outcomes and early postnatal complications.
Methods
We conducted a retrospective cohort study using routinely collected maternal and neonatal records. Five supervised machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), artificial neural network (ANN), and extreme gradient boosting (XGBoost), were developed in R. Model performance was assessed with area under the curve (AUC), sensitivity, specificity, F1 score, and calibration. SHapley Additive Explanations (SHAP) identified key predictors. Sensitivity analyses evaluated the robustness of results by examining birth outcomes and postnatal complications separately.
Results
Of the neonates studied, 15.2% (95% CI: 14.0–16.5) experienced complications, with higher rates in rural (17.1%) than urban areas (11.2%, p < 0.01). Preterm birth occurred in 12.7% and low birth weight in 9.4%, while 4.1% developed postnatal complications. XGBoost achieved the highest predictive performance [AUC = 0.85; sensitivity = 78%; specificity = 80%; F1 = 0.76], followed by RF and ANN. LR and SVM showed moderate accuracy. SHAP analysis highlighted maternal age <20, previous neonatal complications, low education, unplanned pregnancy, <4 antenatal visits, anemia, and rural residence as significant predictors. Sensitivity analyses confirmed stable performance across separate outcomes.
Conclusion
Neonatal complications remain prevalent, with pronounced rural–urban disparities. XGBoost offers accurate and interpretable early risk prediction using routine maternal and antenatal data. Targeted interventions including expanded prenatal care, anemia management, and strengthened rural health services could reduce neonatal morbidity and mortality.
Keywords: Neonatal complications, maternal risk factors, antenatal care, machine learning, XGBoost, SHAP analysis, retrospective cohort, inequities in rural health, Ethiopia
Introduction
Neonatal complications continue to be a major global public health concern, accounting for over 47% of all deaths among children under the age of five. 1 The burden is highest in low- and middle-income countries (LMICs), particularly in Sub-Saharan Africa, where inadequate access to prompt interventions, experienced birth attendants, and neonatal intensive care contributes to poor outcomes.2,3 Ethiopia is no different; neonatal morbidity and mortality remain high, particularly in rural regions with little health-care infrastructure, despite dramatic reductions in under-five mortality over the last 20 years. 4 Neonatal complications remain a pressing public health concern in the Sidama region, which is characterized by a predominantly rural population and logistical obstacles to hospital access. 5
Early detection is associated with improved survival rates for high-risk newborns. Conventional risk classification systems rely heavily on clinician experience and paper-based monitoring, which can be resource-intensive, inconsistent, or delayed. 6 There is growing evidence that early warning systems powered by artificial intelligence (AI) and machine learning (ML) can enhance predictive accuracy for poor neonatal outcomes.7,8 In high-income countries, these methods have produced positive results, including accurate prediction of preterm birth (PTB), infant infections, and respiratory problems.9,10 Nonetheless, AI-driven prediction models are largely underutilized in rural, low-resource areas, particularly in Sub-Saharan Africa, where challenges such as missing data, limited digital infrastructure, and contextual variables might hinder model performance. 11
Despite technological advances, research gaps remain in Ethiopia and other LMICs. First, rural newborn populations are underrepresented in research since most studies focus on hospitalized patients in urban areas. 12 Second, current AI models are of limited use to health-care providers in resource-constrained settings, as they often focus on specific outcomes or lack interpretability. 13 Third, prediction algorithms rarely include contextual factors known to influence neonatal complications, such as maternal education, geographic accessibility, and antenatal care (ANC) utilization. 14 These shortcomings highlight the critical need for interpretable, contextually appropriate AI systems that can reliably identify high-risk neonates in rural Ethiopia.
There are significant clinical and public health implications to closing these gaps. AI-driven early warning systems that prioritize high-risk cases may potentially help guide interventions to reduce neonatal morbidity and mortality. 15 Furthermore, by providing practical insights into regional and national health-care planning for women and newborns, these models can help to build evidence-based policies. 16 The use of these technologies in distant areas such as the Sidama region has the potential to transform newborn care delivery, improve health equity, and meet the Sustainable Development Goals for neonatal survival. 17
The current research aims to develop and evaluate AI-driven early warning systems for neonatal complications in the Sidama region of rural Ethiopia to address these unmet needs. Specifically, it assesses model performance and interpretability, identifies key maternal, neonatal, and contextual factors, and makes recommendations for real-world implementation in resource-constrained settings. By closing these research gaps, the study seeks to increase the use of AI in neonatal health care and to inform programs that reduce neonatal morbidity and mortality in rural Ethiopia.
Methods
Study design and setting
A retrospective cohort study was conducted in the Northern Zone of the Sidama region, Ethiopia, from January 2021 to January 2025. The study included four rural districts—Boricha, Hawela, Shebedino, and Bilate Zuriya—that face significant impediments to modern neonatal care. 18 Multiple government hospitals and health facilities contributed health data to ensure comprehensive coverage of deliveries in the region.
Study population
Eligible participants comprised all women who gave birth to live babies during the study period and had complete obstetric, maternal, and neonatal records. Exclusion criteria included multiple gestations other than twins, congenital abnormalities that were not compatible with life, and incomplete records. The final cohort included 3954 mother-infant pairs, providing sufficient power for predictive modeling.
Sample size and power calculation
The study cohort included 3954 mother–infant dyads, which provided sufficient statistical power for the planned analyses. Power calculations were based on the primary composite neonatal outcome (PTB, low birth weight (LBW), or postnatal complication), with an observed prevalence of 15.2%. Assuming a binary outcome, a two-sided significance level of 0.05, and an anticipated adjusted risk ratio of ≥1.5 for key maternal predictors, this sample size yields over 90% power to detect meaningful associations in multivariable regression and ML models. The large cohort also supports model training, validation, and sensitivity analyses while ensuring stable estimation of predictor effects.
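The power statement above can be illustrated with a rough, unadjusted approximation. The sketch below is not the authors' calculation: it uses a normal-approximation test for two proportions, and it assumes an exposure prevalence of 21.4% (the maternal anemia prevalence reported in Table 1) purely as an example predictor. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist


def power_two_proportions(n, outcome_prev, exposure_prev, risk_ratio, alpha=0.05):
    """Approximate power of a two-sided z-test comparing outcome risk
    between exposed and unexposed groups (unadjusted sketch)."""
    # Risks in each group implied by the overall prevalence and the risk ratio.
    p0 = outcome_prev / (1 - exposure_prev + exposure_prev * risk_ratio)
    p1 = risk_ratio * p0
    n1 = n * exposure_prev  # exposed group size
    n0 = n - n1             # unexposed group size
    se = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


# Assumed inputs: cohort size and composite prevalence from the text,
# exposure prevalence borrowed from the anemia row of Table 1.
power = power_two_proportions(
    n=3954, outcome_prev=0.152, exposure_prev=0.214, risk_ratio=1.5
)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the approximation is comfortably above 90%, in line with the statement above; a multivariable model with correlated predictors would generally have somewhat lower power than this unadjusted figure.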
Outcome definition
In this study, we defined overall neonatal risk as a composite outcome encompassing both birth outcomes and postnatal complications occurring within the first 28 days of life. The composite outcome includes:
Birth outcomes: PTB (birth before 37 completed weeks of gestation) and LBW (<2500 g).
Postnatal neonatal complications: neonatal sepsis, jaundice requiring phototherapy, hypoglycemia, and respiratory distress. Each complication was coded as a binary variable (0 = absent, 1 = present).
This composite measure captures the full spectrum of early neonatal complications, enabling timely risk stratification and intervention planning in low-resource settings. To ensure conceptual clarity, we also conducted sensitivity analyses using separate models for birth outcomes and postnatal complications, confirming that predictors identified in the primary analysis remained consistent across distinct outcome categories.
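The outcome definitions above translate directly into a few lines of code. The following is a minimal sketch; the record structure and field names are hypothetical, while the thresholds (37 completed weeks, 2500 g) and the four postnatal complications come from the definitions in this section.

```python
from dataclasses import dataclass


@dataclass
class NeonatalRecord:
    # Hypothetical field names; the binary flags use 0 = absent, 1 = present.
    gestational_age_weeks: float
    birth_weight_g: float
    sepsis: int = 0
    jaundice_phototherapy: int = 0
    hypoglycemia: int = 0
    respiratory_distress: int = 0


def birth_outcome(r: NeonatalRecord) -> int:
    """1 if preterm (<37 completed weeks) or low birth weight (<2500 g)."""
    return int(r.gestational_age_weeks < 37 or r.birth_weight_g < 2500)


def postnatal_complication(r: NeonatalRecord) -> int:
    """1 if any of the four postnatal complications is present."""
    return int(any([r.sepsis, r.jaundice_phototherapy,
                    r.hypoglycemia, r.respiratory_distress]))


def composite_risk(r: NeonatalRecord) -> int:
    """Primary composite outcome: any birth outcome or postnatal complication."""
    return int(birth_outcome(r) or postnatal_complication(r))


term_healthy = NeonatalRecord(39, 3200)
preterm = NeonatalRecord(35, 2600)
term_sepsis = NeonatalRecord(40, 3100, sepsis=1)
print([composite_risk(r) for r in (term_healthy, preterm, term_sepsis)])
```

Keeping `birth_outcome` and `postnatal_complication` as separate functions mirrors the sensitivity analyses, which model the two components separately.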
Predictor variable selection
Predictor variables were chosen based on clinical relevance, prior evidence, and data availability within the study setting. Maternal and antenatal factors such as maternal age, education, history of previous neonatal complications, anemia, ANC visits, pregnancy planning status, and place of residence were prioritized due to their established associations with neonatal outcomes in Ethiopia and similar low-resource contexts. Contextual variables, including distance to health facilities, were also considered to capture environmental influences. We focused on interpretable, clinically meaningful predictors and did not employ unsupervised learning to identify additional variables, ensuring that the model could be realistically implemented using routinely collected data.
Data collection, quality checks, and preprocessing
Data were extracted from both electronic medical records and paper charts using a standardized abstraction form. Approximately 40% of the records were obtained from paper charts, while 60% were from electronic sources. Data quality was assessed by cross-checking entries, identifying inconsistencies, and referencing primary records as necessary. Missing data were handled using multiple imputation by chained equations. 19 Continuous variables were normalized, and categorical variables were one-hot encoded. The data were randomly partitioned into training (70%) and validation (30%) sets. To address class imbalance in the primary composite outcome, the Synthetic Minority Over-sampling Technique (SMOTE) was applied only to the training set, using maternal and antenatal predictors exclusively; the held-out validation set remained untouched to ensure unbiased model evaluation. Although the class imbalance was moderate, SMOTE was used to improve the models' learning of the minority class. We also tested alternative approaches, including class weighting, which produced comparable results.
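The ordering of the split and the oversampling step matters: synthetic cases must be generated only after the held-out data have been set aside. The sketch below illustrates this with a simplified, standard-library version of SMOTE on toy continuous features. It is not the authors' R pipeline; the function names, the toy data, and the choice of k = 5 neighbours are assumptions, and the handling of one-hot encoded categorical predictors is deliberately omitted.

```python
import random


def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Random 70/30 partition of (features, label) rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def smote(minority, n_new, k=5, seed=42):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy data: two continuous features, roughly one positive case in seven.
rng = random.Random(0)
rows = [((rng.random(), rng.random()), 1 if i % 7 == 0 else 0) for i in range(200)]

train, validation = train_validation_split(rows)
minority = [x for x, y in train if y == 1]
majority = [x for x, y in train if y == 0]

# Oversample the training set only; the validation set is never modified.
synthetic = smote(minority, n_new=len(majority) - len(minority))
balanced_train = train + [(x, 1) for x in synthetic]
print(len(train), len(validation), len(balanced_train))
```

Class weighting, the alternative mentioned above, avoids synthetic records altogether by increasing the loss contribution of minority-class cases during training.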
Machine learning modeling
We implemented five supervised ML algorithms to predict the composite neonatal risk outcome: logistic regression (LR), random forest (RF), support vector machine (SVM), artificial neural network (ANN), and XGBoost. Sensitivity analyses modeled birth outcomes and postnatal complications separately. Hyperparameters were tuned using a grid search with 10-fold cross-validation; the final parameters are listed in Supplemental Table 1. The ANN architecture consisted of 1–2 hidden layers with 10–30 neurons per layer, using ReLU or sigmoid activation, trained with the Adam optimizer for 100–200 epochs. Reporting these settings supports transparency and reproducibility, and enables replication of model training and evaluation in similar low-resource settings.
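The tuning procedure described above (grid search scored by 10-fold cross-validation) can be sketched generically. The example below uses a toy one-feature nearest-neighbour classifier as a stand-in for the five models, since the actual grids are given only in the supplement; the data, the `n_neighbours` grid, and all function names are assumptions for illustration.

```python
import itertools
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbours):
    """Majority vote among the n_neighbours closest training rows (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:n_neighbours]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > len(nearest))


def cv_accuracy(rows, params, folds):
    """Mean held-out accuracy across folds for one hyperparameter setting."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        correct = sum(
            knn_predict(train, rows[i][0], **params) == rows[i][1] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def grid_search(rows, grid, k=10):
    """Evaluate every combination in the grid and return the best one."""
    folds = k_fold_indices(len(rows), k)
    candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=lambda params: cv_accuracy(rows, params, folds))


# Toy data: one noisy feature that predicts a binary label.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = rng.random()
    rows.append((x, int(x + rng.gauss(0, 0.15) > 0.5)))

best = grid_search(rows, {"n_neighbours": [1, 5, 15]})
print(best)
```

The same fold assignment is reused for every candidate so that differences in score reflect the hyperparameters rather than the partition.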
Model interpretability
To enhance interpretability, Shapley Additive Explanations (SHAP) were applied to each model to quantify the contribution of individual predictors and explore potential interactions. 20 This approach allows clinicians and policymakers to identify high-risk newborns using only pre-delivery maternal information while maintaining transparency and reproducibility.
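To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a small, hypothetical risk function. It is an illustration of the underlying attribution rule only: the risk function and its coefficients are invented, a single all-zero reference profile stands in for the background data that SHAP normally averages over, and tree-based models in practice rely on faster dedicated algorithms rather than enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).
    Features outside a coalition are set to their baseline value."""
    features = list(instance)
    n = len(features)

    def value(subset):
        mixed = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(x):
    # Hypothetical risk score with an anemia x low-ANC interaction term.
    return (0.10 + 0.05 * x["anemia"] + 0.04 * x["anc_lt4"]
            + 0.06 * x["anemia"] * x["anc_lt4"] + 0.03 * x["age_lt20"])


instance = {"anemia": 1, "anc_lt4": 1, "age_lt20": 0}
baseline = {"anemia": 0, "anc_lt4": 0, "age_lt20": 0}
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

In this toy case the interaction term is split equally between the two interacting features, and the attributions sum to the difference between the instance's prediction and the baseline prediction, which is the additivity property that makes SHAP summaries interpretable.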
Data preprocessing and missing data
Variables with more than 20% missing values were excluded, including maternal weight, mid-upper arm circumference, selected laboratory measures (hemoglobin, malaria, parasites, syphilis), and some socioeconomic indicators (income and occupation), which were inconsistently recorded. The 20% cutoff balanced the retention of important predictors with the need to minimize bias; a stricter threshold (e.g. 10%) would have excluded key variables, reducing model interpretability and generalizability in this low-resource setting. For the retained variables, the median proportion of missing values was 4% (range: 1–16%), which were handled using multiple imputation with chained equations to preserve statistical power and reliability.
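The exclusion rule above is simple to express in code. The sketch below applies a 20% missingness cutoff to a handful of toy records; the column names and values are hypothetical, and the subsequent imputation step is not shown.

```python
def missing_fraction(records, column):
    """Share of records in which the column is absent or None."""
    return sum(r.get(column) is None for r in records) / len(records)


def drop_sparse_columns(records, threshold=0.20):
    """Keep only columns whose missing fraction does not exceed the threshold."""
    columns = sorted({c for r in records for c in r})
    kept = [c for c in columns if missing_fraction(records, c) <= threshold]
    return [{c: r.get(c) for c in kept} for r in records], kept


# Toy records: maternal_age is 10% missing, maternal_weight is 30% missing.
records = [
    {"maternal_age": 24 if i != 0 else None,
     "maternal_weight": 58.0 if i >= 3 else None,
     "anc_visits": 3}
    for i in range(10)
]

filtered, kept = drop_sparse_columns(records)
print(kept)
```

Columns that survive the cutoff but still contain gaps (here `maternal_age`) would then be passed to multiple imputation.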
Statistical analysis
Descriptive statistics summarized maternal, neonatal, and environmental characteristics. Categorical data were given as frequencies and percentages, whereas continuous variables were shown as means ± standard deviation (SD). The bivariate relationships were investigated using chi-square and t-tests. Statistical significance was determined at p < 0.05 for all analyses conducted in R. The composite outcome approach preserves statistical power and reflects the overall early neonatal risk burden, a clinically meaningful measure in low-resource rural settings. The separate sensitivity analyses serve to validate the conceptual integrity of the composite outcome, ensuring that findings are biologically plausible and consistent with individual outcome models.
Results
Study population characteristics
The final cohort included 3954 mother-infant dyads. The mean maternal age was 27.4 ± 5.8 years, with 18.6% under 20. Most participants (67.3%) lived in rural areas, and 35.2% had no formal education. Approximately 28.9% of pregnancies were unplanned, and 41.5% of women received fewer than four ANC visits. Maternal anemia was diagnosed in 21.4% of the mothers. Table 1 presents the baseline maternal, neonatal, and environmental factors.
Table 1.
Baseline characteristics of the study population (n = 3954).
| Characteristic | n (%) or mean ± SD |
|---|---|
| Maternal age (years) | 27.4 ± 5.8 |
| <20 | 735 (18.6) |
| 20–34 | 2689 (68.0) |
| ≥35 | 530 (13.4) |
| Maternal education | |
| No formal education | 1390 (35.2) |
| Primary | 1420 (35.9) |
| Secondary and above | 1144 (28.9) |
| Residence | |
| Rural | 2657 (67.3) |
| Urban | 1297 (32.7) |
| Antenatal care visits <4 | 1641 (41.5) |
| Unplanned pregnancy | 1141 (28.9) |
| Maternal anemia | 846 (21.4) |
| Neonatal birth weight <2500 g | 371 (9.4) |
| Preterm birth (<37 weeks) | 503 (12.7) |
| Neonatal sepsis | 162 (4.1) |
| Distance to nearest health facility >5 km | 1870 (47.3) |
Incidence of neonatal complications
In the final cohort of 3954 mother-infant dyads, 15.2% (95% CI: 14.0–16.5) of neonates experienced at least one event included in the composite neonatal risk outcome within the first 28 days of life. PTB (12.7%) and LBW (9.4%) were the most frequent birth outcomes, while postnatal complications, including neonatal sepsis, jaundice, hypoglycemia, and respiratory distress, occurred in 4.1% of neonates. The incidence of composite neonatal risk was significantly higher in rural districts (17.1%) than in urban areas (11.2%; p < 0.01), highlighting persistent rural–urban inequities. Figure 1 displays the incidence by district.
Figure 1.
Incidence of neonatal complications by district.
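For readers who wish to reproduce this kind of incidence estimate, the sketch below computes a Wilson score interval for a proportion and the crude rural-to-urban risk ratio. The event count is back-calculated from the reported 15.2% and n = 3954, which is an assumption, and the interval method used in the source is not stated, so the bounds produced here may differ slightly from those reported.

```python
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return centre - half, centre + half


n_total = 3954
events = round(0.152 * n_total)  # assumed count, back-calculated from 15.2%
low, high = wilson_interval(events, n_total)

# Crude (unadjusted) risk ratio from the reported rural and urban incidences.
crude_risk_ratio = 0.171 / 0.112

print(f"incidence: {events / n_total:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"crude rural/urban risk ratio: {crude_risk_ratio:.2f}")
```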
Machine learning model performance
Five supervised ML algorithms were applied to predict the composite neonatal risk outcome. Table 2 summarizes their comparative performance, and Figure 2 shows the corresponding ROC curves. XGBoost had the highest predictive performance, with an area under the curve (AUC) of 0.85 (95% CI: 0.82–0.88), sensitivity of 78% (95% CI: 73–82), specificity of 80% (95% CI: 76–84), and an F1-score of 0.76 (95% CI: 0.72–0.80). The RF and ANN models also performed well, with AUC values of 0.83 (95% CI: 0.80–0.86) and 0.81 (95% CI: 0.78–0.84), respectively. In contrast, the LR and SVM models showed moderate predictive ability, with AUCs of 0.75 (95% CI: 0.72–0.78) and 0.77 (95% CI: 0.74–0.80), respectively. Calibration analysis demonstrated good agreement between predicted probabilities and observed outcomes for all models, with XGBoost showing a calibration slope of 1.02. Hyperparameters for each model, including ANN architecture, learning rates, and tree parameters, are detailed in Supplemental Table 2, ensuring transparency and enabling replication of the modeling process in similar low-resource settings. Given the moderate incidence of neonatal complications (15.2%), we also evaluated model performance using precision–recall curves. The area under the precision–recall curve for XGBoost was 0.68 (95% CI: 0.64–0.72), well above the value of approximately 0.15 expected from a non-informative classifier at this incidence, and higher than that of the other models (random forest: 0.64; ANN: 0.62; SVM: 0.57; logistic regression: 0.55). These results are presented in Supplemental Table 3 and indicate that XGBoost identifies high-risk neonates well even though the positive outcome is relatively uncommon.
Table 2.
Performance metrics of machine learning models for predicting composite neonatal complications.
| Model | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1-score (95% CI) | Calibration slope |
|---|---|---|---|---|---|
| Logistic Regression | 0.75 (0.72–0.78) | 0.70 (0.65–0.75) | 0.72 (0.68–0.76) | 0.66 (0.62–0.70) | 0.93 |
| Random Forest | 0.83 (0.80–0.86) | 0.76 (0.71–0.81) | 0.78 (0.74–0.82) | 0.73 (0.69–0.77) | 0.98 |
| XGBoost | 0.85 (0.82–0.88) | 0.78 (0.73–0.82) | 0.80 (0.76–0.84) | 0.76 (0.72–0.80) | 1.02 |
| SVM | 0.77 (0.74–0.80) | 0.72 (0.67–0.77) | 0.74 (0.70–0.78) | 0.68 (0.64–0.72) | 0.95 |
| ANN | 0.81 (0.78–0.84) | 0.74 (0.69–0.79) | 0.76 (0.72–0.80) | 0.72 (0.68–0.76) | 0.99 |
AUC: area under the curve; ANN: artificial neural network; CI: confidence interval; SVM: support vector machine; XGBoost: extreme gradient boosting.
Figure 2.
ROC curves of the five machine learning models.
F1 score and calibration interpretation
F1 scores and calibration slopes provide complementary insights into model performance. The F1 score reflects the balance between precision and recall, whereas the calibration slope measures agreement between predicted and observed risks, with a slope of 1 indicating ideal calibration. For instance, XGBoost's F1 score of 0.76 and calibration slope of 1.02 indicate a good balance of precision and recall together with well-calibrated predictions. Logistic regression, with a lower F1 score (0.66) and a calibration slope of 0.93, was slightly less well calibrated, consistent with its lower predictive performance.
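The threshold-based and rank-based metrics reported in Table 2 can be computed as follows. This is a minimal sketch on hypothetical labels and predicted probabilities, not the study data; the 0.5 classification threshold is likewise an assumption. Calibration slope is omitted because it requires fitting a logistic model to the predictions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Because precision depends on outcome prevalence while sensitivity and specificity do not, the F1 score of a model changes with the class balance of the evaluation set even when its AUC stays the same.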
Key predictors of neonatal complications
Figure 3 shows the SHAP analysis of the XGBoost model, which identified the most influential predictors of neonatal complications. These included maternal age under 20 years, a history of previous neonatal complications, and low maternal education. Unplanned pregnancy, fewer than four ANC visits, maternal anemia, and contextual factors such as rural residence and a long distance to the nearest health facility were also associated with higher predicted risk.
Figure 3.
SHAP summary plot of the XGBoost model for predicting composite neonatal complications.
The highest compounded risk was observed among neonates whose mothers were both anemic and attended fewer than four ANC visits. This combination markedly increased predicted neonatal risk compared with either factor alone. Other potential interactions, such as young maternal age combined with low education, unplanned pregnancy combined with limited ANC, or rural residence combined with long travel distance, were carefully evaluated for effect modification, but none reached statistical significance. These findings highlight that, while several maternal and contextual factors contribute individually to neonatal risk, the primary compounding effect is driven by anemia combined with inadequate ANC (Figure 4).
Figure 4.
SHAP interaction plot illustrating the combined effect of maternal anemia and <4 ANC visits on predicted neonatal complication risk.
Sensitivity analyses results
Birth outcomes (PTB and LBW)
Sensitivity analyses were conducted using separate models to predict PTB and LBW. Among the cohort, PTB and LBW occurred in 12.7% and 9.4% of neonates, respectively. XGBoost again demonstrated the highest predictive performance for both outcomes (Table 3). For PTB, it achieved AUC = 0.84, sensitivity = 76%, specificity = 79%, and F1-score = 0.74; for LBW, it achieved AUC = 0.82, sensitivity = 74%, specificity = 77%, and F1-score = 0.71.
Table 3.
Performance metrics for predicting preterm birth and low birth weight.
| Model | PTB AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1-score (95% CI) | LBW AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1-score (95% CI) |
|---|---|---|---|---|---|---|---|---|
| XGBoost | 0.84 (0.81–0.87) | 0.76 (0.71–0.81) | 0.79 (0.75–0.83) | 0.74 (0.70–0.78) | 0.82 (0.79–0.85) | 0.74 (0.69–0.79) | 0.77 (0.73–0.81) | 0.71 (0.67–0.75) |
| Random Forest | 0.82 (0.79–0.85) | 0.74 (0.69–0.79) | 0.77 (0.73–0.81) | 0.71 (0.67–0.75) | 0.80 (0.77–0.83) | 0.72 (0.68–0.76) | 0.75 (0.71–0.79) | 0.69 (0.65–0.73) |
| ANN | 0.80 (0.77–0.83) | 0.73 (0.68–0.78) | 0.76 (0.72–0.80) | 0.70 (0.66–0.74) | 0.78 (0.75–0.81) | 0.71 (0.66–0.76) | 0.74 (0.70–0.78) | 0.67 (0.63–0.71) |
| SVM | 0.76 (0.73–0.79) | 0.70 (0.65–0.75) | 0.73 (0.69–0.77) | 0.66 (0.62–0.70) | 0.74 (0.71–0.77) | 0.69 (0.64–0.74) | 0.72 (0.68–0.76) | 0.65 (0.61–0.69) |
| Logistic Regression | 0.74 (0.71–0.77) | 0.68 (0.63–0.73) | 0.71 (0.67–0.75) | 0.64 (0.60–0.68) | 0.72 (0.69–0.75) | 0.67 (0.62–0.72) | 0.70 (0.66–0.74) | 0.63 (0.59–0.67) |
AUC: area under the curve; ANN: artificial neural network; CI: confidence interval; SVM: support vector machine; XGBoost: extreme gradient boosting; PTB: preterm birth; LBW: low birth weight.
SHAP analysis identified the most important maternal and antenatal predictors for PTB and LBW as maternal age <20 years, low maternal education, history of previous neonatal complications, unplanned pregnancy, and fewer than four ANC visits. Rural residence and maternal anemia also contributed to higher predicted risks, consistent with findings from the primary composite outcome model (Figures 5 and 6).
Figure 5.
SHAP summary plot showing the top 7 predictors of preterm birth complications (XGBoost model).
Figure 6.
SHAP summary plot showing the top 7 predictors of low-birth-weight complications (XGBoost model).
Postnatal complications
Separate models were also developed for postnatal neonatal complications, including neonatal sepsis, jaundice, hypoglycemia, and respiratory distress, which collectively occurred in 4.1% of neonates. XGBoost again achieved the best performance: AUC = 0.83, sensitivity = 77%, specificity = 78%, F1-score = 0.72 (Table 4).
Table 4.
Performance metrics for postnatal complications.
| Model | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1-score (95% CI) |
|---|---|---|---|---|
| XGBoost | 0.83 (0.80–0.86) | 0.77 (0.72–0.82) | 0.78 (0.74–0.82) | 0.72 (0.68–0.76) |
| Random Forest | 0.81 (0.78–0.84) | 0.75 (0.70–0.80) | 0.77 (0.73–0.81) | 0.70 (0.66–0.74) |
| ANN | 0.79 (0.76–0.82) | 0.73 (0.68–0.78) | 0.75 (0.71–0.79) | 0.68 (0.64–0.72) |
| SVM | 0.76 (0.73–0.79) | 0.70 (0.65–0.75) | 0.73 (0.69–0.77) | 0.65 (0.61–0.69) |
| Logistic Regression | 0.74 (0.71–0.77) | 0.68 (0.63–0.73) | 0.71 (0.67–0.75) | 0.63 (0.59–0.67) |
AUC: area under the curve; ANN: artificial neural network; CI: confidence interval; SVM: support vector machine; XGBoost: extreme gradient boosting.
Key predictors identified were largely consistent with the birth outcome models, with maternal anemia, limited ANC attendance, rural residence, and low maternal education showing the largest contributions. Notably, interaction effects were observed, highlighting that neonates exposed to multiple maternal risk factors were at disproportionately higher risk of postnatal complications (Figure 7).
Figure 7.
SHAP summary plot showing the top 6 predictors of postnatal complications (XGBoost model).
Comparison with primary composite outcome
Overall, the sensitivity analyses confirmed that the predictors of individual outcomes aligned closely with those identified in the primary composite outcome model, supporting the robustness of our primary analysis. The results demonstrate that maternal and antenatal features alone can effectively identify neonates at high risk for both birth outcomes and postnatal complications, providing a biologically plausible and clinically interpretable risk stratification framework. Sensitivity analyses using class weighting instead of SMOTE yielded similar model performance, supporting the robustness of our findings.
Discussion
Incidence of neonatal complications
In this retrospective cohort study, the burden of neonatal complications was substantial, particularly in rural areas. We recognize that PTB and LBW occur at delivery, whereas postnatal complications such as sepsis, jaundice, hypoglycemia, and respiratory distress develop after birth. To address this conceptual distinction, we used a composite outcome as the primary measure of overall neonatal risk, which captures both birth and postnatal events. These findings point to persistent rural–urban disparities in neonatal health and to underlying structural inequities in health-care access. Similar rural–urban discrepancies have been found in Sub-Saharan Africa and South Asia, where structural impediments to maternal and neonatal health care disproportionately affect rural communities.21–24
Machine learning model performance
Our analysis suggests that ensemble learning techniques, particularly XGBoost, show improved predictive performance when maternal and antenatal features are used alone. Importantly, all models were trained using maternal and antenatal features only, excluding neonatal features measured at birth, ensuring temporal validity. To validate the composite outcome, we conducted sensitivity analyses with separate models for birth outcomes (PTB and LBW) and postnatal complications. These separate models produced consistent predictor patterns and similar performance metrics, supporting the biological and statistical rationale for the composite outcome. The clinical utility of these models lies in their ability to stratify neonates by risk before delivery, allowing targeted interventions in resource-limited rural settings. Model calibration indicated reasonable concordance between predicted and observed risks, further supporting the reliability of these estimates.
These findings align with growing evidence that ML can predict maternal and neonatal health risks. In Ethiopia, ensemble learning techniques such as random forest and XGBoost have been shown to outperform traditional regression models in forecasting neonatal outcomes. 25 Prenatal risk stratification, including the prediction of PTB and neonatal intensive care admission, has seen global uptake, particularly with gradient boosting and neural networks.26,27 Our results add to this literature by showing that XGBoost consistently outperforms alternative models in a rural, retrospective cohort, reinforcing its suitability for early neonatal risk prediction. While SMOTE was applied only to the training set and model performance was confirmed with alternative class weighting, we acknowledge that synthetic oversampling may still influence predictive performance and should be considered in future external validation studies.
Key predictors of neonatal complications
Maternal age less than 20 years, limited ANC, anemia, and low education emerged as key predictors of neonatal complications. Notably, these predictors were consistent across both the primary composite outcome model and the sensitivity analyses for birth outcomes and postnatal complications, indicating that maternal and antenatal factors alone reliably identify neonates at risk.
Neonates born to anemic mothers who did not receive adequate ANC, for example, had a considerably higher predicted risk, highlighting associations between maternal factors and neonatal outcomes. These findings are consistent with previous Ethiopian research, which found strong links between maternal anemia, poor ANC use, and low maternal education on the one hand and newborn morbidity and death on the other.28,29 Similar results have been reported in South Asia and Sub-Saharan Africa, where maternal age extremes, anemia, and rural residency remain major predictors of neonatal complications.30–32 Our findings on compounded risk effects are comparable to those of previous studies from Bangladesh and India, which found that overlapping vulnerabilities greatly increased neonatal risk.33,34
Comparison with previous findings
Our findings corroborate previous research, while providing novel insights through interpretable ML models in a rural Ethiopian context. Previous community-based studies in Ethiopia report a higher incidence (27%) of neonatal illness; 21 this discrepancy may reflect differences in study design, case definitions, and recall bias. 21 By separating birth outcomes from postnatal complications in sensitivity analyses, we confirmed that maternal and antenatal predictors consistently influence both sets of outcomes, supporting the validity of the composite primary outcome. The breakdown of outcomes in our study is broadly consistent with WHO global estimates, which place LBW at approximately 15% and PTB at 10%–12% in LMICs. 35
Consistent with prior literature, ensemble learning approaches such as XGBoost have been shown to achieve improved predictive performance compared with traditional regression models in perinatal health prediction.26,36 Importantly, because temporality can be determined more precisely than in cross-sectional studies, our retrospective cohort design strengthens the validity of the observed associations between maternal factors and neonatal outcomes.
Strengths and limitations
Key strengths include a large, well-characterized cohort, comprehensive inclusion of maternal and antenatal predictors, and interpretable ML modeling using SHAP. The primary composite outcome is further validated through sensitivity analyses that separately examined birth and postnatal outcomes, ensuring conceptual clarity and biological plausibility.
However, several limitations should be noted. Variability in clinical documentation, missing records, and inaccurate classification may still affect data quality in this retrospective cohort. Not all important social determinants were considered, including household and community context. Model validation was limited to internal data splitting, and no temporal, geographical, or external validation cohort was available; therefore, the generalizability of the findings beyond the study setting may be limited. Our analysis focused on neonatal complications, and we did not collect information on neonatal mortality, limiting broader outcome assessment. A substantial portion of data (≈40%) was extracted from paper charts, which may limit the potential for fully automated data collection and real-time implementation of predictive models. Although we took rigorous steps to ensure data accuracy, reliance on manual records could introduce errors or delays in workflow deployment. Future studies should prioritize digital record systems to enhance scalability and efficiency. Finally, while the design supports temporal ordering, unmeasured confounding cannot be ruled out. Therefore, caution should be exercised when drawing causal conclusions.
Practical and policy implications
Our findings have significant implications for policy and practice. The identification of high-risk populations, such as teenage mothers, anemic women, and those who attend ANC less frequently, provides a solid framework for targeted interventions. Initiatives to boost ANC coverage and improve maternal anemia treatment may have a major influence on reducing neonatal complications. To address structural inequalities, this study underscores the importance of policy-level support for maternal education efforts, transportation, and rural health infrastructure. Our results demonstrate that early risk prediction using maternal and antenatal features can identify high-risk neonates before delivery, enabling timely, targeted interventions even in low-resource rural settings. Finally, ML algorithms such as XGBoost could be integrated into electronic health record systems to support appropriate resource allocation and real-time risk classification in neonatal care.
Implementation and practical use
The model was intentionally designed using only maternal and antenatal information that is routinely collected in local health facilities, making it practical for real-world use. Key factors such as maternal age, number of ANC visits, anemia status, pregnancy planning, and place of residence can be entered into a simple tool to generate individualized neonatal risk scores before delivery. Neonates identified as high-risk could then receive closer monitoring, timely interventions, or referral to higher-level care. Moving forward, we plan to develop a user-friendly interface, validate the model prospectively in rural health centers, and provide training for health-care staff to ensure that it can be applied effectively and sustainably in low-resource settings.
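As an illustration of how such a simple tool might map the inputs named above to a risk score, the following is a minimal Python sketch. It is not the study's XGBoost model, which was developed in R: it substitutes a plain logistic score, and the record type `MaternalRecord`, the coefficients in `WEIGHTS`, the `INTERCEPT`, and the 0.3 cut-off are all hypothetical placeholders chosen only so that the example runs. None of these values are estimates from this study.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class MaternalRecord:
    """Routinely collected antenatal inputs named in the text."""
    maternal_age: int
    anc_visits: int
    anemia: bool
    planned_pregnancy: bool
    rural_residence: bool


# HYPOTHETICAL placeholder coefficients, chosen only to make the example run.
# They are NOT estimates from this study and must not be used clinically.
INTERCEPT = -2.5
WEIGHTS: Dict[str, float] = {
    "age_under_20": 0.6,
    "fewer_than_4_anc": 0.7,
    "anemia": 0.5,
    "unplanned_pregnancy": 0.4,
    "rural_residence": 0.3,
}


def encode(record: MaternalRecord) -> Dict[str, float]:
    """Turn a record into the binary indicators used by the score."""
    return {
        "age_under_20": float(record.maternal_age < 20),
        "fewer_than_4_anc": float(record.anc_visits < 4),
        "anemia": float(record.anemia),
        "unplanned_pregnancy": float(not record.planned_pregnancy),
        "rural_residence": float(record.rural_residence),
    }


def risk_score(record: MaternalRecord) -> float:
    """Return a probability-like score in (0, 1) via a logistic link."""
    features = encode(record)
    linear = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


def is_high_risk(record: MaternalRecord, threshold: float = 0.3) -> bool:
    """Flag a record for closer monitoring; 0.3 is an arbitrary placeholder cut-off."""
    return risk_score(record) >= threshold


if __name__ == "__main__":
    example = MaternalRecord(maternal_age=18, anc_visits=2, anemia=True,
                             planned_pregnancy=False, rural_residence=True)
    print(f"risk score: {risk_score(example):.2f}, high risk: {is_high_risk(example)}")
```

In a deployed tool, the scoring function would instead call the validated model, and the cut-off would be chosen from the sensitivity and specificity trade-off observed in prospective validation rather than fixed in advance.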
Conclusion
This retrospective cohort study provides valuable data on the incidence, contributing factors, and predictive modeling of neonatal complications in Ethiopia. The observed incidence demonstrates the persistent burden of neonatal complications, and the identified predictors highlight the interwoven roles of maternal, environmental, and health-care system factors. Sensitivity analyses separating birth outcomes and postnatal complications support the composite outcome approach, addressing conceptual concerns and reinforcing the robustness of our findings. The superior performance of XGBoost supports the potential of ML for predicting neonatal complications in resource-constrained contexts. Policy interventions that enhance ANC coverage, address maternal anemia, and reduce rural–urban disparities are likely to improve neonatal survival and health outcomes.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076261431481 for Machine learning/AI for early neonatal complication detection in rural Ethiopia: A retrospective cohort study in the Sidama region by Amanuel Yoseph, Yohannes Seifu Berego, Mehretu Belayneh and Francisco Guillen-Grima in DIGITAL HEALTH
Supplemental material, sj-docx-2-dhj-10.1177_20552076261431481 for Machine learning/AI for early neonatal complication detection in rural Ethiopia: A retrospective cohort study in the Sidama region by Amanuel Yoseph, Yohannes Seifu Berego, Mehretu Belayneh and Francisco Guillen-Grima in DIGITAL HEALTH
Supplemental material, sj-docx-3-dhj-10.1177_20552076261431481 for Machine learning/AI for early neonatal complication detection in rural Ethiopia: A retrospective cohort study in the Sidama region by Amanuel Yoseph, Yohannes Seifu Berego, Mehretu Belayneh and Francisco Guillen-Grima in DIGITAL HEALTH
Acknowledgements
We express our sincere gratitude to the Sidama President's Office for their kind financial assistance, which was essential to the accomplishment of this study. We also acknowledge the cooperation of the District Health Offices and the Sidama Regional Health Bureau. Our deepest gratitude goes out to the supervisors, field assistants, data collectors, and study participants whose efforts were invaluable. Lastly, we would like to thank Hawassa University's School of Public Health for providing technical assistance during the study's design and data processing.
Footnotes
ORCID iDs: Amanuel Yoseph https://orcid.org/0000-0002-7708-6370
Yohannes Seifu Berego https://orcid.org/0000-0002-9266-1126
Ethics approval and consent to participate: The Institutional Review Board of Hawassa University College of Medicine and Health Sciences provided ethical approval (reference: IRB/076/15). Data confidentiality was tightly enforced, and all patient identifiers were anonymized before analysis. The work followed Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) standards for observational studies, ensuring transparency, reproducibility, and ethical rigor.
Author contributions: AY designed the study, curated the data, carried out the formal analysis, and oversaw the project. AY, LM, and MB worked on the study design, methodology, investigation, supervision, validation, and software development. AY, LM, MB, and FGG contributed to visualizing the results and writing the initial manuscript. All authors (AY, LM, MB, and FGG) critically reviewed and revised the manuscript for intellectual content. All authors read and approved the final manuscript.
Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Sidama President's Office provided funding for this study (Grant Number: HUHS/021/2015). Study design, data collection, analysis, interpretation, manuscript preparation, and the decision to submit the work for publication were all outside the funder's purview.
Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental material: Supplemental material for this article is available online.