A fairness-aware machine learning framework for maternal health in Ghana: integrating explainability, bias mitigation, and causal inference for ethical AI deployment

Augustus Osborne; Kobloobase Usani

doi:10.1186/s13040-025-00505-1

2025 Dec 5;19:3. doi: 10.1186/s13040-025-00505-1

PMCID: PMC12781815 PMID: 41351059

Abstract

Background

Antenatal care (ANC) uptake in Ghana remains inequitable, with socioeconomic and geographic disparities limiting progress toward universal maternal health coverage (SDG 3). We present a novel, fairness-aware machine learning framework for predicting antenatal care uptake among women in Ghana, integrating explainability, bias mitigation, and causal inference to support ethical artificial intelligence (AI) deployment in low- and middle-income countries.

Methods

Using the 2022 Ghana Demographic and Health Survey (n = 3,314 eligible women with a recent live birth), we applied multiple imputation by chained equations (m = 10), appropriate categorical encoding, and synthetic minority oversampling (SMOTE) within training folds. Four supervised models (logistic regression, random forest, XGBoost, support vector machine) underwent stratified 5‑fold nested cross‑validation with cost‑sensitive threshold optimization (selected probability threshold = 0.45). Explainability (SHAP), fairness auditing (AIF360; metrics: statistical parity difference, disparate impact, equal opportunity difference, average odds difference, Theil index), preprocessing mitigation (reweighing), counterfactual explanations (DiCE), and cautious treatment effect estimation (causal forests within a double machine learning framework) were integrated. Performance metrics included accuracy, precision, recall, F1, ROC‑AUC, minority class PR‑AUC, balanced accuracy, calibration (Brier score), and decision curve net benefit.

Results

The optimized random forest model achieved the highest accuracy (0.68) and recall (0.84) in identifying women with inadequate ANC contacts. Calibration was strong, with a Brier score of 0.158, a calibration slope of 0.97, and an intercept of −0.02. Fairness auditing revealed baseline disparities in model predictions across wealth, region, ethnicity, and religion, with a statistical parity difference for wealth status of 0.182 and a disparate impact of 1.62. Following reweighing, disparate impact improved into the fairness range (0.92; within the recommended 0.8–1.25 interval), and statistical parity difference reduced to −0.028. Counterfactual analysis indicated that education, wealth, media exposure, and health worker contacts were the most modifiable factors for improving ANC uptake. Exploratory causal inference using double machine learning suggested that improving wealth status and education could be associated with a 16% (Average Treatment Effect [ATE] = 0.163) and 14% (ATE = 0.142) increase, respectively, in the probability of adequate ANC, with greater effects observed among urban and educated subgroups. Adjusted odds ratio (AOR) analysis showed that women in the richest quintile had nearly twice the odds of receiving adequate ANC (AOR = 1.91, 95% CI: 1.44–2.53; p < 0.001), while those in the poorest quintile had significantly lower odds (AOR = 0.58, 95% CI: 0.45–0.75; p < 0.001). Additional significant predictors included health insurance coverage (AOR = 1.74, 95% CI: 1.19–2.55), health worker contacts (AOR = 1.33, 95% CI: 1.11–1.58), and pregnancy intention (AOR = 1.54, 95% CI: 1.30–1.82).

Conclusion

This integrated, fairness-aware machine learning framework suggests that robust, equitable, and actionable prediction of ANC uptake among Ghanaian women is feasible. Key modifiable determinants include wealth, education, and healthcare access barriers. The framework offers a replicable, ethical blueprint for transparent and fair AI deployment in maternal health. Policymakers and health managers can leverage these AI tools to identify high-risk women, monitor intervention impacts, and allocate resources more equitably, advancing progress toward universal access to quality maternal care in Ghana.

Keywords: Antenatal care, Fairness, Explainable AI, Counterfactuals, Causal forests, Ghana, Maternal health, Machine learning, Health equity

Introduction

Antenatal care (ANC) is a cornerstone of maternal and child health, playing a critical role in reducing adverse pregnancy outcomes and improving neonatal survival [1]. Timely and adequate ANC enables the early detection and management of pregnancy-related complications, provides opportunities for health education, and facilitates interventions such as immunizations and nutritional support [1, 2]. The World Health Organization (WHO) recommends a minimum of eight ANC contacts to optimize maternal and perinatal health, with evidence demonstrating that increased ANC contacts are associated with lower risks of maternal mortality, stillbirth, and preterm birth [3, 4].

Despite these benefits, ANC uptake remains suboptimal in many low- and middle-income countries, including Ghana. According to the 2022 Ghana Maternal Health Survey, only 35% of women attended the recommended eight or more ANC contacts, and approximately 15% did not achieve the previously recommended minimum of four contacts [5]. These gaps persist despite national and international efforts to strengthen maternal health services, highlighting the need for innovative approaches to improve ANC coverage. Addressing ANC underutilization is a public health priority in Ghana, as inadequate care is linked to preventable maternal and neonatal morbidity and mortality [6].

Notable inequities characterize ANC utilization in Ghana, with disparities rooted in socioeconomic status, educational attainment, geographic location, and health system barriers. Women from lower-income households, rural areas, and marginalized ethnic or religious groups are less likely to receive adequate ANC [7, 8]. Factors such as limited access to health facilities, financial constraints, lack of health insurance, and sociocultural norms further exacerbate these disparities [9]. These multidimensional barriers underscore the importance of understanding and addressing the determinants of ANC uptake to achieve equitable improvements in maternal health.

Traditional approaches to studying ANC utilization have relied primarily on logistic regression and other conventional statistical methods [10]. While these techniques have identified important associations between sociodemographic factors and ANC uptake, they often fall short in capturing complex, non-linear interactions among predictors and may not adequately account for underlying biases or heterogeneity in the population. Furthermore, these models typically focus on overall associations rather than predictive accuracy or equity in outcomes, limiting their utility for designing targeted interventions.

The advent of artificial intelligence (AI) and machine learning (ML) has opened new avenues for predictive modelling in health research, offering the ability to analyse large, high-dimensional datasets and uncover intricate patterns in health behaviours and outcomes [11]. ML algorithms have demonstrated superior performance over traditional methods in various health prediction tasks, including maternal and child health [12]. However, the adoption of AI/ML in healthcare is often hampered by concerns about model interpretability, fairness, and ethical deployment. Black-box models may obscure the rationale behind predictions, while unaddressed biases in data or algorithms can perpetuate or even amplify existing health disparities [13].

These challenges are particularly salient in maternal health, where the stakes of algorithmic decision-making are high and the potential for unintended harm is significant. AI models trained on biased or unrepresentative data may yield inequitable predictions, disproportionately affecting vulnerable populations [14]. This underscores the ethical imperative for transparency and fairness in healthcare AI, ensuring that predictive tools not only achieve high accuracy but also support equitable and just health outcomes [15]. Interpretable models that provide clear explanations for their predictions are essential for building trust among stakeholders and facilitating the responsible integration of AI into health systems.

This work advances an integrated ethical AI blueprint rather than a stand‑alone predictive model. By uniting explainability (SHAP), structured fairness auditing and mitigation (AIF360), individualized counterfactual reasoning (DiCE), and exploratory treatment effect estimation (Causal Forests) within a single reproducible pipeline, we respond to calls for responsible machine learning in low- and middle-income country (LMIC) health systems. The emphasis is on governance, transparency, and equitable utility, offering a template adaptable to other maternal and preventive health domains.

By integrating explainability, fairness auditing, bias mitigation, and causal analysis, this study offers a novel methodological contribution to the field of maternal health. The approach has significant policy relevance, providing a robust and transparent framework for identifying women at risk of inadequate ANC and informing targeted interventions to address inequities in maternal healthcare utilization. Ultimately, this work aims to advance the ethical and effective use of AI in global health, supporting efforts to achieve universal access to quality maternal care.

Methods

A structured, interpretable, and fairness-aware artificial intelligence (AI) framework was developed to predict antenatal care (ANC) uptake among women in Ghana. The methodological pipeline comprised rigorous data preprocessing, SHAP-guided feature engineering, and supervised machine learning with calibrated decision threshold optimization. Model development employed stratified 5-fold cross-validation to ensure generalizability across class distributions, with classification thresholds tuned to enhance minority class recall. An optimal probability threshold of 0.45 was selected to improve sensitivity in identifying women at risk of underutilizing ANC services. To promote transparency and ethical reliability, the framework integrated post-hoc explainability using SHAP values and fairness auditing across protected attributes using established bias metrics. Bias mitigation was implemented through sample reweighting techniques to reduce disparity in predictive outcomes. Furthermore, counterfactual explanations were generated to offer individualized intervention pathways, and causal inference methods were applied to estimate both average and conditional treatment effects across relevant subgroups (Fig. 1).

Fig. 1 — ANC uptake methodology framework

Study design and data source

This cross-sectional study utilized data from the Ghana Demographic and Health Survey (DHS), following the standardized DHS-8 methodology and statistical guidelines for analysing antenatal care (ANC) utilization patterns.

Survey design and complex sampling

The analysis incorporated the complex survey structure of the Ghana Demographic and Health Survey (DHS) to ensure nationally representative estimates. Sampling weights (v005), primary sampling units (v021), and stratification variables (v022) were applied for all descriptive statistics and regression-based inference. Standard errors and 95% confidence intervals were estimated using Taylor linearization methods to account for design effects. These elements could not be fully integrated into the machine learning training pipeline due to technical constraints in scikit-learn, which does not natively support complex survey design. This limitation was explicitly acknowledged, and model robustness was assessed through repeated cross-validation and bootstrap resampling to partially account for sampling variability.
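
The role of the sampling weight in the descriptive estimates can be illustrated with a minimal sketch. It relies on the DHS convention that v005 is stored as the weight multiplied by 1,000,000; the function name, the `adequate_anc` field, and the toy rows are hypothetical and are not taken from the survey. The sketch yields a point estimate only: design-based standard errors via Taylor linearization additionally require the cluster (v021) and stratum (v022) identifiers and dedicated survey software.

```python
# Sketch: design-weighted proportion using DHS sample weights.
# DHS stores v005 as the sample weight multiplied by 1,000,000.

def weighted_proportion(records, outcome_key="adequate_anc", weight_key="v005"):
    """Return the weighted share of records whose outcome equals 1."""
    total = 0.0
    hits = 0.0
    for row in records:
        w = row[weight_key] / 1_000_000  # rescale the raw DHS weight
        total += w
        hits += w * row[outcome_key]
    if total == 0:
        raise ValueError("sum of weights is zero")
    return hits / total

# Toy rows (hypothetical values, not survey data).
toy = [
    {"v005": 1_500_000, "adequate_anc": 1},
    {"v005": 500_000, "adequate_anc": 0},
    {"v005": 1_000_000, "adequate_anc": 0},
]
share = weighted_proportion(toy)  # 1.5 / 3.0
```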

Sample size determination

The final analytical sample comprised 3,314 women, derived through the following process:

Population base

Women aged 15–49 years who participated in the Ghana DHS.

Primary inclusion criterion

Women who had experienced a pregnancy outcome (live birth or stillbirth) in the 2 years preceding the survey.

Sample derivation

DHS standard weights (v005, the woman’s individual sample weight) were applied to ensure national representativeness.

Power calculation

The sample size of 3,314 provided adequate power (> 80%) to detect meaningful associations between predictor variables and the primary outcome of eight or more ANC contacts, based on DHS statistical guidelines for complex survey data analysis.

Inclusion and exclusion criteria

Inclusion criteria

Women aged 15–49 years at the time of survey.

Had a live birth in the 2 years preceding the survey (m80 = 1 & p19 < 24), or

Had a stillbirth in the 2 years preceding the survey (m80 = 3 & p19 < 24).

Complete data on pregnancy outcome timing (p19 - months since pregnancy outcome).

Exclusion criteria

Women with pregnancy outcomes occurring more than 24 months before the survey (p19 ≥ 24).

Women with missing or invalid pregnancy outcome data (m80 ≠ 1 or 3).

Women who reported receiving no antenatal care and had missing data on care-seeking variables (m2n).
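
The pregnancy-outcome criteria above can be expressed as a simple row filter. The following is a sketch under stated assumptions: records are represented as dictionaries keyed by the DHS variable names m80 and p19, the function name `is_eligible` is hypothetical, and the age restriction and the m2n-based exclusion are omitted for brevity.

```python
# Sketch: eligibility filter based on the m80 and p19 criteria.

def is_eligible(row):
    """Keep live births (m80 = 1) or stillbirths (m80 = 3) within 24 months."""
    p19 = row.get("p19")
    if p19 is None:                      # outcome timing must be known
        return False
    if row.get("m80") not in (1, 3):     # missing or invalid outcome type
        return False
    return p19 < 24                      # outcome within the 2-year window

# Toy rows (hypothetical values, not survey data).
toy_rows = [
    {"m80": 1, "p19": 5},      # eligible: recent live birth
    {"m80": 3, "p19": 23},     # eligible: recent stillbirth
    {"m80": 1, "p19": 24},     # excluded: outside the window
    {"m80": 2, "p19": 3},      # excluded: other outcome code
    {"m80": 1, "p19": None},   # excluded: timing missing
]
eligible = [r for r in toy_rows if is_eligible(r)]
```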

Variable definitions and categorization

Primary outcome variable

Eight or more ANC contacts derived from m14 (number of antenatal contacts during pregnancy), categorized as: no contacts, 1 contact, 2 contacts, 3 contacts, 4–7 contacts, 8+ contacts, following WHO recommendations incorporated in DHS-8 guidelines. Adequate antenatal care (ANC) was defined as ≥ 8 contacts in accordance with WHO guidance and coded as class 1; inadequate ANC (< 8 contacts) was coded as class 0.
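
A minimal sketch of this coding rule follows. The function names are hypothetical, and treating m14 values of 98 or above as "don't know" or missing is an assumption about the recode that should be checked against the survey's codebook.

```python
# Sketch: derive the binary outcome and the descriptive category from m14.

def code_outcome(m14):
    """Return 1 for >= 8 contacts, 0 for fewer, None when the count is unknown."""
    if m14 is None or m14 >= 98:   # assumed "don't know" / missing codes
        return None
    return 1 if m14 >= 8 else 0

def contact_category(m14):
    """Map a known contact count to the descriptive categories listed above."""
    if m14 == 0:
        return "no contacts"
    if m14 <= 3:
        return f"{m14} contact" + ("" if m14 == 1 else "s")
    return "4-7 contacts" if m14 <= 7 else "8+ contacts"

labels = [code_outcome(v) for v in (0, 7, 8, 12, 98)]
```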

Key variables

Timing of first ANC contact: Derived from m13, categorized as: no ANC contact, < 4 months, 4–6 months, 7 + months.

Pregnancy outcome: m80 (1 = live birth, 3 = stillbirth).

Time since pregnancy outcome: p19 (months since pregnancy outcome, restricted to < 24 months).

Handling of missing data

DHS standard approach:

“Don’t know” responses

For variables m13 (timing of first ANC contact) and m14 (number of ANC contacts), “don’t know” responses were:

Excluded from numerators when calculating percentages.

Included in denominators to maintain population base integrity.

Reported as separate categories in descriptive analyses.

Missing values

Completely missing data on key ANC variables were:

Excluded from numerators.

Included in denominators following DHS guidelines.

Assessed for patterns of missingness to evaluate potential bias.

Sensitivity

Descriptive and regression analyses incorporated the DHS complex survey design by applying sampling weights (v005) and specifying clustering (v021) and stratification (v022), with standard errors and 95% confidence intervals estimated using Taylor linearization. For machine-learning analyses, adequate ANC uptake (≥ 8 contacts) was modelled using Random Forest in comparison with other classifiers, with primary evaluation based on stratified 5-fold cross-validation and additional checks using 10-fold cross-validation. The operating threshold (0.45) was determined from the precision–recall curve, and model performance was evaluated using precision, recall, F1 score, and average precision (AUPRC). Survey weights were applied where algorithmically feasible, and uncertainty was quantified using 500 bootstrap replicates of out-of-fold predictions. As clustering and stratification cannot be directly incorporated within standard scikit-learn routines, this was noted as a limitation; robustness was assessed through alternative folds, thresholds, and resampling strategies.
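
The bootstrap step can be sketched as a percentile interval over resampled out-of-fold predictions. This is an illustration under stated assumptions, not the authors' implementation: the function names, the seed, and the toy labels are hypothetical, and simple row-level resampling of this kind ignores the clustering and stratification of the survey design.

```python
import random

def recall(y_true, y_pred, positive=1):
    """Share of true `positive` cases that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def bootstrap_ci(y_true, y_pred, metric, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` over paired predictions."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample rows with replacement
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:                            # drop NaN replicates
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Toy out-of-fold labels and predictions (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 10
y_pred = [1, 1, 1, 0, 0, 0, 1, 0] * 10
point = recall(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, recall)
```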

Feature encoding

To facilitate algorithmic processing of categorical variables, two encoding strategies were employed based on the nature of the variables:

Ordinal encoding was applied to ranked categorical features such as maternal education level and household wealth index, preserving the inherent order within these variables.

One-hot encoding was used for nominal, non-ordinal variables including region of residence, religious affiliation, and ethnicity. This transformation prevents the introduction of artificial ordinal relationships and enables equitable treatment of categorical distinctions across feature space.
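
The two encoding strategies can be illustrated as follows. The category lists are taken from Table 1; the helper names are hypothetical, and in practice a library encoder fitted on the training fold would normally be used.

```python
# Sketch: ordinal encoding for ranked features, one-hot for nominal ones.

EDUCATION_ORDER = ["No education", "Primary", "Secondary", "Higher"]
RELIGIONS = ["Christian", "Muslim", "Traditional/Other"]

def ordinal_encode(value, order):
    """Return the rank of `value` within the ordered category list."""
    return order.index(value)

def one_hot_encode(value, categories):
    """Return one indicator per category, with no implied ordering."""
    return [1 if value == c else 0 for c in categories]

# A woman with secondary education who is Muslim:
encoded = [ordinal_encode("Secondary", EDUCATION_ORDER)] + one_hot_encode("Muslim", RELIGIONS)
```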

Class imbalance management

An assessment of the target variable revealed a significant class imbalance between women who achieved the recommended eight or more ANC contacts and those who did not. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set. SMOTE generates synthetic examples of the minority class by interpolating between existing observations, thereby enhancing model sensitivity without introducing duplicate records. A pre- and post-resampling distributional check was performed to confirm balance restoration, ensuring that the learning algorithm received a representative signal from both outcome classes during training.
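
The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration for continuous features only, with hypothetical names and toy points; it is not the implementation used in the study, where a maintained library version applied to the training fold alone would be the usual choice.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority-class rows from a training fold (hypothetical).
minority_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority_train, n_new=6)
```

Because each synthetic row lies on a segment between two observed minority rows, it stays inside the range of the observed data. Encoded categorical features would need a variant that handles discrete values rather than plain interpolation.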

Feature engineering using SHAP

Feature engineering was guided by SHapley Additive exPlanations (SHAP) to enhance model interpretability and relevance. An initial XGBoost Classifier model was trained to generate SHAP values, which quantified the importance of each feature in predicting antenatal care uptake. The most influential features, based on global SHAP value rankings, were selected for inclusion in the final model. Additionally, SHAP interaction values were used to identify meaningful feature interactions, leading to the creation of new variables such as Education × Media Exposure and Health Insurance × Residence. These SHAP-informed features captured complex relationships and contributed to improved model performance and contextual interpretability.

Model development

Four supervised learning algorithms were employed to model antenatal care (ANC) uptake: Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM). Logistic Regression served as a baseline linear model due to its interpretability and ease of implementation, while the ensemble models (Random Forest and XGBoost) and kernel-based SVM were selected for their ability to capture complex, non-linear relationships among predictors. All models were trained using a stratified 5-fold cross-validation strategy to preserve class distribution and enhance generalizability. Hyperparameters were optimized using grid search within each fold to ensure model robustness. Additionally, probability threshold tuning was conducted, and a classification threshold of 0.45 was selected to improve sensitivity in detecting women with inadequate ANC uptake, reflecting a public health priority of minimizing false negatives.

Model evaluation

Model performance was assessed using standard classification metrics, including accuracy, precision, recall, and F1-score, to evaluate both overall performance and class-specific effectiveness. Given the presence of class imbalance, the precision-recall (PR) curve was used, providing a more informative representation of model behaviour when distinguishing the minority class (women with inadequate ANC uptake). Evaluation metrics were computed at the optimized probability threshold of 0.45, selected to enhance recall and reduce false negatives, aligning with public health objectives focused on early identification and intervention for high-risk individuals.
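
The effect of the operating threshold on class-specific metrics can be illustrated with a small sketch. It assumes that the threshold is applied to the predicted probability of inadequate ANC (class 0), which the text does not state explicitly; the function names and toy probabilities are hypothetical.

```python
# Sketch: apply a probability threshold, then score the flagged class.

def classify(p_inadequate, threshold=0.45):
    """Return 0 (inadequate ANC) when predicted risk reaches the threshold, else 1."""
    return [0 if p >= threshold else 1 for p in p_inadequate]

def precision_recall_f1(y_true, y_pred, target=0):
    """Precision, recall, and F1 for the `target` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != target and p == target)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == target and p != target)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy labels and predicted risks of inadequate ANC (hypothetical).
y_true = [0, 0, 0, 1, 1, 1]
risk = [0.80, 0.47, 0.30, 0.46, 0.20, 0.10]
at_045 = precision_recall_f1(y_true, classify(risk, 0.45))
at_050 = precision_recall_f1(y_true, classify(risk, 0.50))
```

In this toy case, lowering the threshold from 0.50 to 0.45 raises recall for class 0 from 1/3 to 2/3 at the cost of precision falling from 1 to 2/3, which is the trade-off the threshold choice accepts.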

Explainability

Explainability was incorporated using SHapley Additive exPlanations (SHAP) to assess the contribution of each feature to model predictions at both global and local levels. In this study, SHAP values were computed using the TreeExplainer method, applied to the best-performing tree-based model. Following model training, SHAP was used to generate global summary plots that ranked features by their average absolute contribution to prediction outputs. These plots informed both feature selection and understanding of model behavior. For local interpretability, individual SHAP force plots were produced to examine how specific combinations of features influenced predictions for selected cases. This approach ensured that the model’s decision-making process remained transparent and interpretable, supporting ethical deployment in maternal health research and practice.

Fairness assessment and bias mitigation using AI fairness 360

To enhance both predictive performance and ethical reliability, a structured fairness evaluation and bias mitigation framework was implemented using the IBM AI Fairness 360 (AIF360) toolkit. The final dataset, comprising 3,314 records of Ghanaian women with a binary outcome indicating completion of eight or more antenatal care (ANC) contacts, was formatted as a BinaryLabelDataset to enable compatibility with AIF360. Key protected attributes including wealth status, ethnicity, religion, and region were binarized to distinguish between privileged and unprivileged groups based on contextually relevant socio-economic categorizations. Fairness metrics such as Statistical Parity Difference, Disparate Impact, Equal Opportunity Difference, Average Odds Difference, and Theil Index were computed to quantify disparities in the model’s predictions across these groups. To address detected biases, the Reweighing algorithm was applied during the preprocessing stage, adjusting sample weights prior to model training while maintaining the threshold for classification at 0.45. Post-mitigation evaluation showed measurable improvements in fairness indicators, particularly in reducing group-level disparities without significantly compromising model performance.
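
The logic of reweighing can be sketched without the toolkit. In its standard formulation, each observation in group g with label y receives the weight P(g)·P(y)/P(g, y), so that group membership and outcome are independent in the weighted training data. The function name and toy data below are hypothetical, and the sketch does not reproduce the AIF360 interface.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each row by expected / observed frequency of its (group, label) cell."""
    n = len(labels)
    n_g = Counter(groups)
    n_y = Counter(labels)
    n_gy = Counter(zip(groups, labels))
    return [n_g[g] * n_y[y] / (n * n_gy[(g, y)]) for g, y in zip(groups, labels)]

# Toy training data (hypothetical): wealth group and adequate-ANC label.
groups = ["low", "low", "low", "low", "high", "high", "high", "high"]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(groups, labels)

def weighted_rate(group):
    """Weighted share of label 1 within a group."""
    num = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    den = sum(w for w, g in zip(weights, groups) if g == group)
    return num / den
```

After weighting, both toy groups have the same weighted rate of the favourable label (0.5), although the unweighted rates were 0.25 and 0.75.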

Fairness groups and metrics

Fairness analysis was conducted across pre-specified sociodemographic groups derived from the DHS dataset, including wealth quintiles (low vs. high), region of residence (northern vs. southern), religion (Christian vs. non-Christian), and ethnicity (Akan vs. other). Disparate Impact (DI), Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), Average Odds Difference (AOD), and Theil Index (TI) were computed to assess group-level disparities in model predictions. All fairness metrics were accompanied by 95% confidence intervals, obtained via 500 bootstrap replicates to quantify uncertainty around group-level estimates.

The operating point of 0.45 was selected to maximize recall for the inadequate ANC group while keeping DI within an acceptable policy range of 0.8 to 1.25, thereby balancing predictive performance and equity considerations. Reweighing was applied as a pre-processing mitigation technique to improve group fairness. Importantly, overall model performance, measured by ROC-AUC and PR-AUC, was not materially degraded following reweighing, indicating that fairness improvements were achieved without compromising model discrimination.
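
Two of the group metrics can be written out directly. The sketch follows the common convention in which SPD is the favourable-prediction rate of the unprivileged group minus that of the privileged group, and DI is the ratio of the same two rates; taking label 1 as the favourable outcome is an assumption, and the names and toy predictions are hypothetical.

```python
# Sketch: statistical parity difference (SPD) and disparate impact (DI).

def selection_rate(y_pred, groups, group, favorable=1):
    """Share of a group's members that receive the favourable prediction."""
    members = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(1 for p in members if p == favorable) / len(members)

def spd_and_di(y_pred, groups, unprivileged, privileged, favorable=1):
    """Return (SPD, DI) for the unprivileged group relative to the privileged one."""
    r_u = selection_rate(y_pred, groups, unprivileged, favorable)
    r_p = selection_rate(y_pred, groups, privileged, favorable)
    return r_u - r_p, r_u / r_p

# Toy predictions by wealth group (hypothetical).
y_pred = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
groups = ["low"] * 5 + ["high"] * 5
spd, di = spd_and_di(y_pred, groups, unprivileged="low", privileged="high")
within_policy_range = 0.8 <= di <= 1.25
```

Here the rates are 0.4 and 0.6, giving an SPD of −0.2 and a DI of about 0.67, which falls outside the 0.8 to 1.25 range referred to above.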

Counterfactual analysis

Counterfactual analysis was conducted using the DiCE (Diverse Counterfactual Explanations) library to generate post hoc, instance-level explanations for model predictions of antenatal care (ANC) uptake. The trained predictive model was wrapped within a compatible DiCE explainer, and the random generation algorithm was employed to produce diverse counterfactuals capable of altering predicted outcomes from inadequate to adequate ANC uptake (i.e., eight or more contacts). To ensure realism and actionability, constraints were applied to prevent changes to immutable features and to restrict variable modifications within plausible bounds derived from the training data. The analysis emphasized sparsity and feasibility, encouraging minimal changes to a few modifiable features per instance. This enabled the derivation of interpretable, realistic counterfactuals that reveal the local decision boundaries of the model and support individualized intervention design aligned with the underlying data distribution and domain context.

Causal inference

Causal inference was integrated into the analytical framework to estimate both Average Treatment Effects (ATE) and Conditional Average Treatment Effects (CATE), thereby uncovering the causal impact of key variables on antenatal care (ANC) uptake. Estimation was performed using Causal Forests within a Double Machine Learning (DML) framework, which leverages orthogonalization to reduce bias from high-dimensional confounding variables. The method involves two-stage residualization where outcome and treatment models are first independently estimated using flexible machine learning models, followed by the estimation of treatment effects on residuals. This approach enables robust estimation of heterogeneous treatment effects across subpopulations while controlling for observed covariates. Subgroup-specific CATEs were computed to assess how the effect of treatment variables varies by contextually relevant factors, allowing for the identification of differential policy impacts across socio-demographic strata. This methodological component supported deeper insight into which interventions may be most effective for specific segments of the population.
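
The two-stage residualization can be written compactly. The notation below is the standard partially linear formulation of double machine learning, given here as an illustration rather than as the exact specification used in the study; the symbols (outcome Y, treatment T, covariates X, nuisance functions ℓ and m, forest weights α_i(x)) are introduced for this sketch.

```latex
\begin{align*}
  % Assumed structural model
  Y &= \theta\,T + g(X) + \varepsilon, & T &= m(X) + \nu \\
  % Stage 1: residualize outcome and treatment with cross-fitted ML estimates,
  % where \ell(X) = E[Y \mid X] and m(X) = E[T \mid X]
  \tilde{Y}_i &= Y_i - \hat{\ell}(X_i), & \tilde{T}_i &= T_i - \hat{m}(X_i) \\
  % Stage 2: average effect from regressing residual on residual
  \hat{\theta} &= \frac{\sum_i \tilde{T}_i\,\tilde{Y}_i}{\sum_i \tilde{T}_i^{2}} \\
  % Heterogeneous effect: the same regression, localized by forest weights
  \hat{\tau}(x) &= \arg\min_{\tau} \sum_i \alpha_i(x)\,\bigl(\tilde{Y}_i - \tau\,\tilde{T}_i\bigr)^{2}
\end{align*}
```

The orthogonalization lies in the first stage: because both residuals have the covariate signal removed, moderate errors in the nuisance estimates have only a second-order effect on the treatment effect estimate.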

Causal assumptions and diagnostics

Causal analysis was exploratory and aimed at estimating potential treatment effects rather than making definitive causal claims. Analyses relied on standard assumptions of causal inference such as conditional ignorability, overlap/common support, and Stable Unit Treatment Value Assumption (SUTVA). Overlap was visually assessed using propensity score distributions and trimming of extreme values to ensure adequate common support. These steps support interpretability of the estimated effects while acknowledging the observational nature of the data.
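
Trimming on the propensity score can be sketched as a simple filter. The bounds of 0.05 and 0.95 are illustrative assumptions, since the text does not report the cut-offs used, and the field and function names are hypothetical.

```python
# Sketch: keep only units whose estimated propensity lies within common support.

def trim_by_propensity(rows, lower=0.05, upper=0.95):
    """Drop rows with extreme estimated treatment probabilities."""
    return [r for r in rows if lower <= r["propensity"] <= upper]

# Toy rows with estimated propensity scores (hypothetical).
toy = [
    {"id": 1, "propensity": 0.02},
    {"id": 2, "propensity": 0.40},
    {"id": 3, "propensity": 0.75},
    {"id": 4, "propensity": 0.99},
]
kept = trim_by_propensity(toy)
```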

Rationale for model selection

Logistic Regression was used as a baseline for its interpretability. Random Forest and XGBoost were selected for their ability to model complex, non-linear interactions. Support Vector Machine (SVM) was included for boundary detection. Feature selection was guided by SHAP values to ensure inclusion of relevant confounders. While Logistic Regression achieved the highest accuracy (0.68) and a competitive ROC-AUC (0.650) among base models, Random Forest and XGBoost were prioritized for optimization due to their superior ability to capture non-linear relationships and interactions among predictors, which are critical given the complex, multidimensional determinants of ANC uptake. Furthermore, after threshold optimization for recall and fairness, Random Forest demonstrated a more desirable balance of metrics, particularly in achieving higher recall for identifying women with inadequate ANC uptake (class 0), aligning with our public health objective of minimizing false negatives.

Adjustment for confounders

Key sociodemographic variables (wealth, education, region, ethnicity, religion, age) were included as confounders in all models. SHAP-guided feature selection ensured that influential predictors and potential confounders were retained.

Anti‑leakage protocol

To prevent data leakage and ensure unbiased model evaluation, all preprocessing and model optimization steps were performed strictly within each training fold. The nesting order was as follows: multiple imputation of missing values, categorical encoding, class imbalance handling using SMOTE, SHAP-guided feature selection, hyperparameter tuning, and probability threshold selection (for the optimized model). The resulting pipeline was then applied to the held-out validation fold for performance estimation. No tuning or model selection procedures were conducted on the final test set, which was reserved exclusively for out-of-fold prediction aggregation and performance reporting.
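
The principle of fitting every preprocessing step on the training fold only can be illustrated with a deliberately small sketch. Mean imputation stands in for the multiple imputation described above purely to keep the example short; the function names, seed, and toy data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)   # deal each class round-robin across folds
    return folds

def impute_within_fold(values, train_idx, test_idx):
    """Learn the fill value from training rows only, then apply it to both sets."""
    observed = [values[i] for i in train_idx if values[i] is not None]
    fill = sum(observed) / len(observed)
    def filled(i):
        return values[i] if values[i] is not None else fill
    return [filled(i) for i in train_idx], [filled(i) for i in test_idx]

# Toy data (hypothetical): 10 rows of class 0, 5 of class 1, one missing value.
labels = [0] * 10 + [1] * 5
values = [float(i) for i in range(15)]
values[3] = None
folds = stratified_folds(labels, k=5)
results = []
for f, test_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    x_train, x_test = impute_within_fold(values, train_idx, test_idx)
    results.append((x_train, x_test))  # encoding, resampling, tuning would follow here
```

The same pattern applies to every later step in the nesting order: anything that learns from data is fitted on `train_idx` and only applied to `test_idx`.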

Results

Weighted sample characteristics

Table 1 shows that adequate antenatal care (ANC) uptake in Ghana is strongly influenced by socioeconomic and demographic factors. Women living in urban areas (47.3%) and those in the richest wealth quintile (51.1%) are much more likely to receive eight or more ANC contacts compared to rural residents (31.2%) and those in the poorest quintile (22.3%). Education plays a major role, with 58.7% of women with higher education receiving adequate ANC, versus just 17.3% among those with no education. Ethnic and religious differences are also apparent, as Akan and “Other” ethnic groups report higher ANC uptake than Grusi and Mole-Dagbani, and Christian women (37.8%) are more likely to have adequate care than Muslims (31.0%) or those practicing traditional religions (20.7%). Women who are not employed, have health insurance, are exposed to media, or receive visits from health workers all show higher rates of adequate ANC. Additionally, those without barriers to healthcare access, with intended pregnancies, or with a history of caesarean delivery are more likely to receive the recommended number of contacts.

Table 1.

Weighted sample characteristics (n = 3,314)

Variable / Category	n	≥ 8 Contacts	% ≥8 Contacts	Variable / Category	n	≥ 8 Contacts	% ≥8 Contacts
Age group				Residence*
15–19 years	281	64	22.8	Urban	836	395	47.3
20–24 years	844	284	33.6	Rural	3,063	956	31.2
25–29 years	983	355	36.1	Wealth Quintile*
30–34 years	892	352	39.5	Poorest	874	195	22.3
35–39 years	610	201	33.0	Poorer	811	236	29.1
40–44 years	280	77	27.5	Middle	667	243	36.4
45–49 years	80	18	22.5	Richer	541	246	45.5
				Richest	421	215	51.1
Education Level*				Ethnicity
No education	260	45	17.3	Akan	2,123	810	38.2
Primary	1,105	275	24.9	Grusi	239	51	21.3
Secondary	1,506	675	44.8	Mole-Dagbani	373	102	27.4
Higher	443	260	58.7	Other	579	288	49.7
Religion				Employment
Christian	2,651	1,002	37.8	Yes	1,830	625	34.2
Muslim	580	180	31.0	No	1,484	726	48.9
Traditional/Other	82	17	20.7
Health Insurance				Media Exposure
Yes	2,721	1,065	39.1	Yes	2,511	1,050	41.8
No	593	215	36.3	No	803	230	28.6
Health Worker Visit*				Access Barriers*
Yes	1,012	489	48.3	None	1,452	685	47.2
No	2,302	862	37.4	≥ 1 Barrier	1,862	666	35.8
Pregnancy Intention				Prior Caesarean
Wanted then	2,771	1,091	39.4	Yes	332	158	47.6
Later/unwanted	543	189	34.8	No	2,982	960	32.2

Adjusted odds ratio (AOR) analysis

The adjusted odds ratio analysis reveals that several factors are significantly associated with the likelihood of women receiving at least eight antenatal care contacts in Ghana. Women in the richest wealth group have nearly twice the odds (AOR = 1.91) of receiving adequate ANC compared with those in the middle wealth group, while those in the poorest group have 42% lower odds (AOR = 0.58). Having health insurance substantially increases the odds of adequate ANC (AOR = 1.74), as does receiving a visit from a health worker (AOR = 1.33). Conversely, reporting barriers to healthcare access reduces the odds (AOR = 0.75). Women whose pregnancies were wanted at the time have higher odds of achieving adequate ANC contacts (AOR = 1.54) compared with those with later or unwanted pregnancies. Age also matters, with women aged 30–34 having higher odds (AOR = 1.57) than those aged 15–19. Ethnicity and religion play a role as well: women of Mande ethnicity (AOR = 0.43) and those practicing traditional religion (AOR = 0.36) have significantly lower odds of receiving adequate ANC compared with Akan and Christian women, respectively (Table 2).

Table 2.

Adjusted odds ratios (AORs) for receiving ≥ 8 ANC contacts (n = 3,314): key predictors

Key Predictors	AOR	95% CI	p-value
Wealth – Richest (Ref: Middle)	1.91	[1.44, 2.53]	< 0.001*
Wealth – Poorest (Ref: Middle)	0.58	[0.45, 0.75]	< 0.001*
Health Insurance – Yes (Ref: No)	1.74	[1.19, 2.55]	0.004*
Health Worker Visit – Yes (Ref: No)	1.33	[1.11, 1.58]	0.002*
Access Barriers – Yes (Ref: None)	0.75	[0.65, 0.87]	< 0.001*
Pregnancy Wanted Then (Ref: Later/Unwanted)	1.54	[1.30, 1.82]	< 0.001*
Age 30–34 (Ref: 15–19)	1.57	[1.12, 2.20]	0.009*
Ethnicity – Mande (Ref: Akan)	0.43	[0.27, 0.68]	< 0.001*
Religion – Traditionalist (Ref: Christian)	0.36	[0.19, 0.69]	0.002*
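
As a reading aid, an adjusted odds ratio can be expressed as a percentage change in the odds (not in the probability) using (AOR − 1) × 100. The short sketch below applies this to three AORs copied from Table 2; the helper name `pct_change_in_odds` is illustrative and not part of the authors' analysis.

```python
def pct_change_in_odds(aor: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (aor - 1.0) * 100.0


# AORs copied from Table 2
table2 = {
    "Wealth: richest vs middle": 1.91,
    "Wealth: poorest vs middle": 0.58,
    "Access barriers: yes vs none": 0.75,
}

for label, aor in table2.items():
    print(f"{label}: {pct_change_in_odds(aor):+.0f}% change in odds")
```

An AOR of 0.58 therefore corresponds to odds that are 42% lower than in the reference group, which matches the wording used for the poorest wealth group.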

SHAP-informed feature engineering

Shapley Additive exPlanations (SHAP) were applied to an initial XGBoost classifier to guide feature selection prior to final model training. Using the TreeExplainer method, mean absolute SHAP values were computed across all training instances to assess the global importance of each predictor in relation to antenatal care (ANC) uptake, defined as eight or more contacts. The results indicated that wealth status had the highest average contribution to model output, followed by pregnancy intention, region, ethnicity, barriers to healthcare, maternal age, religion, birth by caesarean section, education level, visit by health worker, residence, insurance coverage, marital status, current employment, sex of household head, and media exposure. Features demonstrating consistently low SHAP values were excluded from the final predictive modelling pipeline. This data-driven feature engineering approach enabled efficient dimensionality reduction while preserving the interpretability and relevance of the selected predictors.
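
The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the per-instance SHAP values have already been computed (for example by a tree explainer), and the SHAP matrix, feature names, and the 0.05 cut-off are all hypothetical, since the text does not report the exclusion threshold.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Mean absolute SHAP value per feature across instances."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    return {name: total / n for name, total in zip(feature_names, totals)}


def select_features(importance, min_importance):
    """Keep features at or above the cut-off, most important first."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= min_importance]


# Hypothetical SHAP values: three instances (rows), three features (columns).
features = ["wealth", "pregnancy_intention", "media_exposure"]
shap_rows = [
    [0.30, -0.10, 0.01],
    [-0.20, 0.15, -0.02],
    [0.25, -0.05, 0.00],
]

importance = mean_abs_shap(shap_rows, features)
kept = select_features(importance, min_importance=0.05)  # cut-off is an assumption
print(kept)
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel.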

Base model performance

Four classification models, namely Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost, were evaluated on a hold-out test set comprising 30% of the total data (n = 1,560). Logistic Regression achieved the highest overall accuracy (0.68), with precision values of 0.65 and 0.68 and recall scores of 0.18 and 0.95 for class 0 and class 1, respectively. Random Forest produced an accuracy of 0.65, with precision scores of 0.49 (class 0) and 0.70 (class 1), and recall scores of 0.33 and 0.81. SVM yielded an accuracy of 0.66, with class-wise precision of 0.61 and 0.66, and recall values of 0.04 and 0.99. XGBoost recorded an accuracy of 0.64, with precision scores of 0.48 and 0.70 and recall scores of 0.36 and 0.79. Macro-averaged F1-scores ranged from 0.43 (SVM) to 0.58 (XGBoost), while ROC-AUC scores ranged from 0.628 to 0.650. Table 3 shows the complete set of evaluation metrics used to compare the predictive performance of all models.

Table 3.

Comparative performance of classifiers for predicting ANC uptake (≥ 8 Uptake)

Model	Accuracy	Precision (0 / 1)	Recall (0 / 1)	F1-Score (0 / 1)	ROC-AUC
Logistic Regression	0.68	0.65 / 0.68	0.18 / 0.95	0.28 / 0.80	0.650
Random Forest	0.65	0.49 / 0.70	0.33 / 0.81	0.39 / 0.75	0.630
SVM	0.66	0.61 / 0.66	0.04 / 0.99	0.08 / 0.79	0.628
XGBoost	0.64	0.48 / 0.70	0.36 / 0.79	0.41 / 0.74	0.631

NB: class 0 = inadequate ANC (< 8); class 1 = adequate ANC (≥ 8). Minority class PR-AUC is reported alongside ROC-AUC
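
For readers who want to check how the F1 columns relate to the precision and recall columns, the sketch below recomputes per-class and macro-averaged F1 from the rounded values in Table 3. Because the inputs are rounded to two decimals, the recomputed figures can differ from the table by about 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for class 0 and class 1, copied from Table 3
table3 = {
    "Logistic Regression": [(0.65, 0.18), (0.68, 0.95)],
    "Random Forest": [(0.49, 0.33), (0.70, 0.81)],
    "SVM": [(0.61, 0.04), (0.66, 0.99)],
    "XGBoost": [(0.48, 0.36), (0.70, 0.79)],
}

for model, per_class in table3.items():
    f1s = [f1_score(p, r) for p, r in per_class]
    macro = sum(f1s) / len(f1s)
    print(f"{model}: F1 = {f1s[0]:.2f} / {f1s[1]:.2f}, macro-F1 = {macro:.2f}")
```

The macro average weights both classes equally, which is why the SVM, with a class 0 recall of only 0.04, has the lowest macro-F1 despite an accuracy of 0.66.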

Optimized model performance

XGBoost and Random Forest were identified as the best-performing models and were further optimized through hyperparameter tuning and probability threshold adjustment. The optimized XGBoost model, evaluated at a classification threshold of 0.45, achieved an accuracy of 0.66, with precision scores of 0.51 for class 0 and 0.71 for class 1, and corresponding recall values of 0.30 and 0.82. For Random Forest, hyperparameter tuning was performed using grid search over parameters including the number of estimators, maximum depth, minimum samples split, and class weighting. The tuned Random Forest model, classified using a threshold of 0.45, achieved an accuracy of 0.68, with precision scores of 0.56 (class 0) and 0.70 (class 1), and recall scores of 0.32 and 0.84, respectively. Both models suggested strong performance in identifying adequate ANC uptake (class 1), with Random Forest achieving higher recall and accuracy. Table 4a shows the performance of the optimized models.

Table 4a.

Performance of optimized XGBoost and random forest models for predicting ANC uptake

Model	Accuracy	Precision (0 / 1)	Recall (0 / 1)	F1-Score (0 / 1)
XGBoost	0.66	0.51 / 0.71	0.30 / 0.82	0.42 / 0.76
Random Forest	0.68	0.56 / 0.70	0.32 / 0.84	0.39 / 0.78

Note: class 0 = inadequate ANC (< 8 uptake); class 1 = adequate ANC (≥ 8 uptake). Tuned threshold = 0.45
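
The effect of moving the decision threshold can be illustrated with a small sketch. It assumes the threshold is applied to the predicted probability of class 1 (adequate ANC), which the text does not state explicitly; the probabilities and labels are hypothetical.

```python
def classify(probabilities, threshold=0.45):
    """Label an instance 1 (adequate ANC) when P(class 1) >= threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall_for_class(y_true, y_pred, cls):
    """Share of true members of `cls` that were predicted as `cls`."""
    members = [pred for true, pred in zip(y_true, y_pred) if true == cls]
    return sum(1 for pred in members if pred == cls) / len(members)


# Hypothetical predicted probabilities of class 1 and true labels.
probs = [0.30, 0.47, 0.52, 0.44, 0.80, 0.49, 0.20, 0.65]
y_true = [0, 0, 1, 0, 1, 1, 0, 1]

for thr in (0.50, 0.45):
    y_pred = classify(probs, thr)
    r0 = recall_for_class(y_true, y_pred, 0)
    r1 = recall_for_class(y_true, y_pred, 1)
    print(f"threshold {thr:.2f}: recall class 0 = {r0:.2f}, class 1 = {r1:.2f}")
```

Under this assumption, lowering the threshold from 0.50 to 0.45 labels more women as class 1, so class 1 recall rises while class 0 recall falls; the direction would reverse if the threshold were applied to the probability of class 0.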

Final model performance

Table 4b presents a consolidated summary of the performance of the final, fairness-mitigated (reweighted) Random Forest model at the tuned decision threshold of 0.45. The model achieved a ROC-AUC of 0.65 (95% CI: 0.63–0.67) and a minority class PR-AUC of 0.79 (95% CI: 0.78–0.81), indicating good discrimination for the inadequate ANC group. Precision was 0.56 for class 0 and 0.70 for class 1, with corresponding recall values of 0.32 and 0.84, reflecting the model’s emphasis on identifying women with inadequate ANC uptake. Calibration was strong, with a Brier score of 0.158, a calibration slope of 0.97, and an intercept close to zero (–0.02). The narrow confidence intervals obtained from 500 bootstrap replicates underscore the stability and robustness of the model at this operating point.

Table 4b.

Performance of the final, fairness-mitigated (reweighted) random forest model at the tuned decision threshold

Metric	Estimate	95% CI (Bootstrap)
ROC-AUC	0.65	0.63–0.67
PR-AUC (minority class)	0.79	0.78–0.81
Precision (Class 0 / Class 1)	0.56 / 0.70	0.54–0.58 / 0.68–0.72
Recall (Class 0 / Class 1)	0.32 / 0.84	0.30–0.34 / 0.82–0.86
F1-score (Class 0 / Class 1)	0.39 / 0.78	0.37–0.41 / 0.77–0.80
Brier Score	0.158	0.152–0.164
Calibration slope	0.97	0.94–1.01
Calibration intercept	–0.02	–0.05 to 0.01
Classification threshold	0.45

NB: class 0 = inadequate ANC (< 8 uptake); class 1 = adequate ANC (≥ 8 uptake). All metrics were computed on out-of-fold predictions from stratified 5-fold cross-validation, with 500 bootstrap replicates used to estimate confidence intervals. Threshold tuning was applied to optimize recall for the inadequate ANC group
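
The Brier score reported in Table 4b is the mean squared difference between the predicted probability and the observed binary outcome, so lower is better and an uninformative constant prediction of 0.5 scores 0.25. A minimal sketch with hypothetical outcomes and probabilities:

```python
def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probabilities)) / len(y_true)


# Hypothetical outcomes (1 = adequate ANC) and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
probs = [0.8, 0.3, 0.6, 0.9, 0.4]

print(f"Brier score: {brier_score(y_true, probs):.3f}")
print(f"Constant 0.5 baseline: {brier_score([0, 1], [0.5, 0.5]):.2f}")
```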

Figure 2 shows the precision–recall (PR) curve of the optimized Random Forest model for ANC uptake prediction, with a tuned threshold of 0.45 marked. At this threshold, the model achieved a recall of 0.84 and a precision of 0.70, indicating a deliberate balance that prioritizes identifying more women at risk of inadequate ANC uptake while maintaining acceptable predictive accuracy.

Fig. 2 — Precision–recall curve for the optimized Random Forest model at threshold 0.45 (red marker)

Sensitivity analyses

Robustness of the Random Forest model was evaluated across alternative cross-validation schemes and classification thresholds. As presented in Table 5, model performance was consistent under both 5-fold and 10-fold cross-validation, yielding comparable estimates of precision, recall, F1 score, and average precision (AUPRC). Adjustment of the classification threshold from 0.40 to 0.50 resulted in the expected trade-off between precision (0.67–0.70) and recall (0.90–0.96), while overall discriminative capacity remained stable (AUPRC = 0.79).

Table 5.

Sensitivity of random forest performance across CV schemes and thresholds (Stratified CV; metrics computed on out-of-fold predictions)

Model	CV	Thr	Prec	Rec	F1	AUPRC
Random Forest	5-fold	0.40	0.67	0.96	0.79	0.79
Random Forest	5-fold	0.45	0.68	0.94	0.79	0.79
Random Forest	5-fold	0.50	0.69	0.90	0.78	0.79
Random Forest	10-fold	0.40	0.68	0.96	0.80	0.79
Random Forest	10-fold	0.45	0.69	0.94	0.79	0.79
Random Forest	10-fold	0.50	0.70	0.90	0.79	0.79
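
The phrase "metrics computed on out-of-fold predictions" means that every woman is scored by a model that did not see her during training. The sketch below shows one way to build stratified folds and collect such predictions; it is an illustration under stated assumptions, with hypothetical labels and a stand-in "model" that simply predicts the training-fold prevalence of class 1.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each index to a fold so that class shares stay similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits  # deal indices out round-robin
    return fold_of


def out_of_fold_predictions(n, fold_of, fit_predict):
    """Predict every instance with a model that never saw its fold."""
    oof = [None] * n
    for fold in sorted(set(fold_of)):
        train = [i for i, f in enumerate(fold_of) if f != fold]
        test = [i for i, f in enumerate(fold_of) if f == fold]
        for i, pred in zip(test, fit_predict(train, test)):
            oof[i] = pred
    return oof


# Hypothetical labels: 13 adequate (1) and 7 inadequate (0) ANC cases.
labels = [1] * 13 + [0] * 7
folds = stratified_folds(labels, n_splits=5)


def prevalence_model(train_idx, test_idx):
    """Stand-in model: predicts the class 1 share seen in the training folds."""
    rate = sum(labels[i] for i in train_idx) / len(train_idx)
    return [rate] * len(test_idx)


oof = out_of_fold_predictions(len(labels), folds, prevalence_model)
print(oof[:5])
```

Any resampling step such as SMOTE would belong inside `fit_predict`, applied to the training indices only, so that synthetic cases never leak into the held-out fold.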

To quantify uncertainty at the tuned threshold (0.45), bootstrap confidence intervals were calculated (Table 6). Precision ranged between 0.66 and 0.70, recall between 0.93 and 0.95, and F1 scores between 0.78 and 0.80, indicating narrow variability and reinforcing the stability of the findings.

Table 6.

Bootstrap 95% confidence intervals at the tuned threshold (0.45) (Out-of-fold predictions; 500 bootstrap replicates)

CV	Metric	Estimate	CI low	CI high	Threshold
5-fold	Precision	0.68	0.66	0.70	0.45
5-fold	Recall	0.94	0.93	0.94	0.45
5-fold	F1	0.79	0.78	0.80	0.45
5-fold	AP	0.79	0.78	0.81	0.45
10-fold	Precision	0.69	0.67	0.70	0.45
10-fold	Recall	0.94	0.93	0.95	0.45
10-fold	F1	0.79	0.78	0.80	0.45
10-fold	AP	0.79	0.78	0.81	0.45
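
A bootstrap interval of the kind shown in Table 6 can be sketched as follows. The text reports 500 replicates but does not say which bootstrap variant was used, so the percentile method here is an assumption; the labels and predictions are simulated and hypothetical.

```python
import random


def precision(y_true, y_pred):
    """Precision for class 1; returns 0.0 when nothing is predicted positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(predicted_pos) / len(predicted_pos) if predicted_pos else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=500, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample instances with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Simulated out-of-fold labels and thresholded predictions (hypothetical).
rng = random.Random(0)
y_true = [1 if rng.random() < 0.65 else 0 for _ in range(400)]
y_pred = [
    1 if (t == 1 and rng.random() < 0.9) or (t == 0 and rng.random() < 0.6) else 0
    for t in y_true
]

estimate = precision(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, precision)
print(f"precision = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Resampling the out-of-fold predictions captures sampling variability at a fixed model and threshold; it does not reflect variability from refitting the model on each resample.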

Overall, the sensitivity analyses demonstrate that the Random Forest’s ability to identify women achieving adequate ANC uptake was robust across resampling strategies and thresholds, supporting the reliability of the results.

Precision–recall curve for the final Random Forest evaluated with stratified 5-fold cross-validation (out-of-fold predictions). Area under the curve (AP) = 0.791; the red marker indicates the tuned operating threshold (0.45), at which precision = 0.68 and recall = 0.94 (Fig. 3).

Fig. 3 — Precision–recall curve of the final Random Forest model for adequate ANC uptake prediction. Minority class PR-AUC = 0.79 for Random Forest

Model explainability using SHAP

To interpret the predictions of the optimized Random Forest model, SHAP values were used to quantify the contribution of each feature to model output. The mean absolute SHAP values, shown in Fig. 4, provide a global ranking of feature importance. Wealth status emerged as the most influential predictor of antenatal care (ANC) uptake, followed by barriers to healthcare, pregnancy intention, maternal age, and region. These top-ranked features contributed the highest average magnitude to the model’s prediction across all instances.

Fig. 4 — Global Feature Importance Based on Mean SHAP Values

The SHAP summary plot in Fig. 5 offers a more granular view of how variations in each feature’s value influenced model predictions. For example, higher values of wealth status and lower levels of reported barriers to healthcare were generally associated with increased predicted probability of completing eight or more ANC contacts. The distribution of SHAP values also revealed heterogeneity in feature effects, with some variables (e.g., education level and region) exhibiting both positive and negative contributions depending on specific value ranges. Collectively, the SHAP analysis highlights the role of socioeconomic, demographic, and access-related variables in shaping ANC utilization predictions and provides transparent, case-level insight into the model’s decision logic.

Fairness assessment and bias

Fairness was evaluated across multiple protected attributes using standard group- and individual-level metrics, including Statistical Parity Difference (SPD), Disparate Impact (DI), Equal Opportunity Difference (EOD), Average Odds Difference (AOD), and Theil Index (TI). Mitigation was carried out using the Reweighing algorithm, and comparative results before and after mitigation are reported in Tables 7, 8 and 9.

Table 7.

Fairness metrics before mitigation

Protected Attribute	SPD	DI	EOD	AOD	Theil Index
Ethnicity	0.150	1.55	0.065	0.061	0.103
Wealth Status	0.182	1.62	0.089	0.072	0.118
Religion	0.143	1.47	0.057	0.053	0.096
Region	0.171	1.60	0.078	0.066	0.110

Note: SPD = Statistical Parity Difference; DI = Disparate Impact; EOD = Equal Opportunity Difference; AOD = Average Odds Difference; TI = Theil Index
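
The five metrics can be written down compactly. The sketch below follows the definitions commonly used in fairness toolkits (group differences taken as unprivileged minus privileged, and the Theil index as the generalized entropy index with α = 1 over benefits b = ŷ − y + 1). Which group the authors treated as privileged for each attribute is not stated, so the two groups and their data here are hypothetical.

```python
import math


def rates(y_true, y_pred):
    """Selection rate, true-positive rate and false-positive rate for one group."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(y_pred) / len(y_pred), sum(pos) / len(pos), sum(neg) / len(neg)


def group_fairness(unpriv, priv):
    """Each argument is a (y_true, y_pred) pair for one group."""
    sel_u, tpr_u, fpr_u = rates(*unpriv)
    sel_p, tpr_p, fpr_p = rates(*priv)
    return {
        "SPD": sel_u - sel_p,                             # ideal value 0
        "DI": sel_u / sel_p,                              # ideal value 1
        "EOD": tpr_u - tpr_p,                             # ideal value 0
        "AOD": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)), # ideal value 0
    }


def theil_index(y_true, y_pred):
    """Generalized entropy index (alpha = 1) over benefits b = pred - true + 1."""
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mean_b = sum(benefits) / len(benefits)
    # b = 0 contributes nothing (limit of x * log(x) as x -> 0)
    terms = [(b / mean_b) * math.log(b / mean_b) for b in benefits if b > 0]
    return sum(terms) / len(benefits)


# Hypothetical (y_true, y_pred) pairs for two groups.
unprivileged = ([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
privileged = ([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0])

metrics = group_fairness(unprivileged, privileged)
print({name: round(value, 3) for name, value in metrics.items()})
```

SPD and DI compare selection rates only, whereas EOD and AOD condition on the true outcome; a mitigation step can therefore improve the first pair without improving the second.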

Table 8.

Fairness metrics after reweighing mitigation

Protected Attribute	SPD	DI	EOD	AOD	Theil Index
Ethnicity	−0.035	0.895	0.067	0.027	0.013
Wealth Status	−0.028	0.921	0.054	0.025	0.016
Religion	−0.041	0.879	0.061	0.023	0.015
Region	−0.032	0.902	0.058	0.026	0.017

DI values between 0.8 and 1.25 indicate acceptable group parity. Percentages and group rates are calculated using the number of women in each sociodemographic group as the denominator
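
Reweighing is a preprocessing method that assigns each training instance the weight P(group) × P(label) / P(group, label), so that group membership and outcome become statistically independent in the weighted data. The sketch below is a plain-Python illustration of that formula on hypothetical data, not the toolkit implementation used in the study.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight = P(group) * P(label) / P(group, label) for each instance."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(group, groups, labels, weights):
    """Weighted share of label 1 within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    return positive / total


# Hypothetical data: group "a" has the favourable label (1) more often than "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)

rate_a = weighted_positive_rate("a", groups, labels, weights)
rate_b = weighted_positive_rate("b", groups, labels, weights)
print(f"weighted positive rate: a = {rate_a:.2f}, b = {rate_b:.2f}")
```

Under-represented combinations (here, a favourable outcome in group "b") receive weights above 1, and over-represented ones receive weights below 1; the classifier is then trained with these instance weights.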

Table 9.

Change in fairness metrics (Δ = after − before)

Protected Attribute	ΔSPD	ΔDI	ΔEOD	ΔAOD	ΔTI
Ethnicity	−0.185	−0.655	+0.002	−0.034	−0.090
Wealth Status	−0.210	−0.699	−0.035	−0.047	−0.102
Religion	−0.184	−0.591	+0.004	−0.030	−0.081
Region	−0.203	−0.698	−0.020	−0.040	−0.093

The largest reductions were observed in Statistical Parity Difference and Theil Index, with all protected attributes showing substantial fairness gains post-mitigation. Notably, the model maintained stable classification performance, indicating that reweighing did not sacrifice overall model effectiveness while improving equity.

Mitigation via reweighing improved most metrics across the protected attributes. The Statistical Parity Difference moved closer to zero, and Disparate Impact values fell within the acceptable range of [0.8, 1.25]. Average Odds Difference was reduced across all groups. Equal Opportunity Difference decreased for wealth status and region but rose marginally for ethnicity (+0.002) and religion (+0.004) (Table 9). Theil Index values showed lower prediction inequality, reflecting improvements in individual-level fairness.
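
The Δ values in Table 9 can be reproduced from Tables 7 and 8 as "after minus before", and the Disparate Impact range check can be applied directly. The sketch below uses only the published table values; the function names are illustrative.

```python
metrics = ["SPD", "DI", "EOD", "AOD", "TI"]

before = {  # Table 7
    "Ethnicity": [0.150, 1.55, 0.065, 0.061, 0.103],
    "Wealth Status": [0.182, 1.62, 0.089, 0.072, 0.118],
    "Religion": [0.143, 1.47, 0.057, 0.053, 0.096],
    "Region": [0.171, 1.60, 0.078, 0.066, 0.110],
}
after = {  # Table 8
    "Ethnicity": [-0.035, 0.895, 0.067, 0.027, 0.013],
    "Wealth Status": [-0.028, 0.921, 0.054, 0.025, 0.016],
    "Religion": [-0.041, 0.879, 0.061, 0.023, 0.015],
    "Region": [-0.032, 0.902, 0.058, 0.026, 0.017],
}


def deltas(attribute):
    """Change in each metric (after minus before), rounded to three decimals."""
    pairs = zip(metrics, before[attribute], after[attribute])
    return {m: round(a - b, 3) for m, b, a in pairs}


def di_acceptable(di, low=0.8, high=1.25):
    """True when Disparate Impact lies in the range quoted in the text."""
    return low <= di <= high


for attribute in before:
    d = deltas(attribute)
    ok = di_acceptable(after[attribute][1])
    print(f"{attribute}: {d}, DI within [0.8, 1.25] after mitigation: {ok}")
```

Note that a negative Δ is an improvement only for metrics that started above their ideal value; for SPD, the post-mitigation values are slightly negative, so the relevant comparison is the absolute distance from zero.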

Counterfactual analysis

The counterfactual analysis, generated using the DiCE library with immutable-feature constraints, identified the features most frequently altered to shift predictions from inadequate to adequate ANC uptake.

As shown in Fig. 6, the most frequently modified features were education level (43 instances), wealth status (38), media exposure (37), and pregnancy wanted (35). Visited by health worker was changed in 33 cases, followed by current employment status (29). Lower-frequency modifications included residence (25), barriers to healthcare (25), and insurance coverage (22). The immutable features (Region, Sex of household head, Maternal age, Births by caesarean section, Marital status, and Religion) were excluded from modification in the primary analysis, though they were tested separately in sensitivity runs.
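
The frequency counts above amount to tallying, for each original/counterfactual pair, which features differ while respecting the immutable set. The sketch below shows that tallying step only; it assumes the counterfactuals have already been generated, and the feature keys and example records are hypothetical.

```python
from collections import Counter

# Immutable features named in the text (keys are illustrative spellings).
IMMUTABLE = {
    "region", "sex_of_household_head", "maternal_age",
    "caesarean_birth", "marital_status", "religion",
}


def changed_features(original, counterfactual):
    """Names of features whose value differs between the two records."""
    return [k for k in original if original[k] != counterfactual[k]]


def tally_changes(pairs, immutable=IMMUTABLE):
    """Count how often each mutable feature is altered across counterfactuals."""
    counts = Counter()
    for original, counterfactual in pairs:
        changed = changed_features(original, counterfactual)
        if any(f in immutable for f in changed):
            continue  # discard counterfactuals that alter an immutable feature
        counts.update(changed)
    return counts


# Hypothetical query instance and three candidate counterfactuals.
query = {"education": "none", "wealth": "poorest", "region": "Northern"}
pairs = [
    (query, {"education": "secondary", "wealth": "poorest", "region": "Northern"}),
    (query, {"education": "secondary", "wealth": "middle", "region": "Northern"}),
    (query, {"education": "none", "wealth": "poorest", "region": "Greater Accra"}),
]

tally = tally_changes(pairs)
print(tally.most_common())
```

In practice the immutability constraint is usually enforced at generation time by restricting the features the search may vary; the filter here is a defensive check on the output.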

Causal inference

The causal inference analysis estimated the Average Treatment Effect (ATE) for wealth status at 0.163 and for education level at 0.142, indicating respective increases in the probability of adequate antenatal care (ANC) uptake, defined as eight or more contacts. The Conditional Average Treatment Effect (CATE) for wealth status was 0.189 among urban residents and 0.137 among rural residents. For education level, the CATE was 0.168 for women with education and 0.119 for those with no education, with additional variations observed across ethnic and religious subgroups. The causal structure underpinning these estimates is shown in Fig. 7.
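
One way to relate the subgroup and overall estimates: if two subgroups partition the sample and the CATEs are subgroup averages of the same individual-level effects, the ATE is the share-weighted mean of the CATEs. The sketch below is an arithmetic illustration under that assumption, not the authors' estimation code; the subgroup shares are not reported in this section, so the code back-solves the share that would be consistent with the published figures.

```python
def weighted_ate(cates, shares):
    """Share-weighted mean of subgroup CATEs (shares must sum to 1)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(cates[g] * shares[g] for g in cates)


def implied_share(ate, cate_a, cate_b):
    """Share of group a at which the weighted mean of two CATEs equals the ATE."""
    return (ate - cate_b) / (cate_a - cate_b)


# Published estimates for wealth status
wealth_cates = {"urban": 0.189, "rural": 0.137}
wealth_ate = 0.163

urban_share = implied_share(wealth_ate, wealth_cates["urban"], wealth_cates["rural"])
check = weighted_ate(wealth_cates, {"urban": urban_share, "rural": 1 - urban_share})
print(f"implied urban share = {urban_share:.2f}, reconstructed ATE = {check:.3f}")
```

If the implied share differs markedly from the actual subgroup composition, that would suggest the CATEs and ATE were estimated on different samples or with different adjustment sets.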

Fig. 7 — Directed Acyclic Graph (DAG) representing the causal relationships among confounders, treatment variables (Wealth status and Education level), mediators, controls, and the outcome (ANC uptake)

Discussion

This study set out to develop and validate a fairness-aware, interpretable machine learning framework for predicting ANC uptake among Ghanaian women. Our principal findings indicate moderate discriminative performance (ROC‑AUC up to 0.65) coupled with improved minority class identification relative to naive baselines, while prioritizing fairness and interpretability. Rather than centring solely on marginal predictive gains, this study contributes an integrated fairness‑aware analytical framework adaptable to other maternal health outcomes. By embedding transparent feature attribution, systematic bias auditing, actionable counterfactuals, and cautious effect estimation, it operationalizes emerging guidelines for trustworthy AI in public health. We acknowledge that the modest discriminative ability of our model (ROC-AUC of 0.65) limits its utility as a standalone, high-stakes clinical prediction tool. However, its primary value lies in functioning as a fairness-aware, explainable screening tool within a broader public health workflow, prioritizing equitable identification of at-risk women and providing actionable insights for targeted interventions.

Our findings reaffirm the central role of socioeconomic factors, particularly wealth status and education, in determining ANC utilization. Women from higher-income households and those with secondary or higher education were more likely to attend the recommended number of ANC contacts, consistent with previous studies conducted in Ghana and other sub-Saharan African countries [16, 17]. These studies have consistently shown that financial resources and educational attainment empower women to seek timely and adequate maternal healthcare, likely due to increased health literacy and reduced economic barriers. While our causal analysis suggests potential effects of wealth and education on ANC uptake, these estimates should be interpreted with caution due to the cross-sectional nature of the data and possible unmeasured confounding. Further longitudinal or experimental research is needed to confirm these associations.

The significance of rural residence and healthcare access barriers in our models also aligns with earlier research, which has documented the persistent urban-rural divide in ANC uptake [18, 19]. Women in rural areas often face longer travel times, fewer health facilities, and limited health insurance coverage, all of which contribute to lower ANC attendance. Our results further underscore the multidimensional nature of these barriers, echoing findings from Tanzania [20] and Ethiopia [21], which highlighted both supply-side constraints and sociocultural factors as critical determinants.

In terms of predictive modelling, our machine learning framework outperformed traditional logistic regression approaches, capturing complex, non-linear interactions among predictors that conventional methods may overlook. This is consistent with recent literature demonstrating the superior accuracy of AI/ML models in health prediction tasks [22]. However, our initial models reflected lower fairness scores for disadvantaged subgroups, a concern that has been raised in studies assessing algorithmic bias in healthcare [13, 23]. Previous work has shown that machine learning models trained on unbalanced or biased datasets can inadvertently perpetuate existing inequities, particularly in settings with pronounced social disparities [14].

The improvements in equity metrics following bias mitigation, such as reweighing and threshold adjustment, are noteworthy. Post-mitigation, the predictive disparities between wealth quintiles and urban-rural groups were markedly reduced, supporting the growing consensus that fairness-aware algorithms are essential for responsible AI deployment in healthcare [23]. While some prior studies have proposed algorithmic solutions to bias, few have combined fairness auditing with interpretability and causal analysis, as we have done here [24]. In public health contexts, the cost of missing a vulnerable woman (false negative) is high. Our conscious design choice prioritized fairness and recall over accuracy, ensuring equitable identification of at-risk women. This ethical trade-off aligns with policy goals and avoids perpetuating existing disparities.

Efforts to elevate minority recall using more aggressive threshold reduction or class weighting produced modest gains (recall +0.13) at the expense of fairness drift (SPD moving toward 0.06) and precision degradation (−0.21). We therefore selected a threshold that balanced three considerations: actionable precision for outreach capacity, sustained fairness improvements post-mitigation, and decision curve net benefit. This aligns with ethical deployment principles emphasizing harm minimization and equitable error distribution over maximizing a single accuracy metric.

Policy and practice implications

The insights from this study have direct relevance for Ghana’s maternal health strategies and resource allocation. A practical deployment scenario involves monthly batch scoring of women in routine reproductive health registers or integrated survey‑administrative datasets. A dashboard (integrated with Ghana’s DHIMS2) could display risk tiers (top decile flagged for outreach); subgroup fairness monitoring (SPD trends with alert thresholds); SHAP summaries for the current model; and counterfactual action bundles (e.g., targeted health worker contacts and media exposure interventions). Community health workers would receive prioritized household lists, and follow-up completion would feed back into quarterly model retraining. A governance committee (public health officials, data scientists, and an ethics representative) would monitor fairness drift and calibration shift, prompting retraining or recalibration as needed. Capacity-building would include training modules on interpreting SHAP plots and ethical data use.
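
Two of the dashboard elements described above, top-decile flagging and an SPD alert, can be sketched in a few lines. This is a hypothetical illustration of the proposed workflow: the identifiers, the risk scores, and the 0.10 alert tolerance are all assumptions, as the text does not specify an alert threshold.

```python
def flag_top_decile(risk_scores):
    """Return the ids in the highest 10% of risk scores (at least one id)."""
    n_flag = max(1, len(risk_scores) // 10)
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return set(ranked[:n_flag])


def spd_alert(spd, tolerance=0.10):
    """True when |SPD| exceeds the tolerance (tolerance is an assumption)."""
    return abs(spd) > tolerance


# Hypothetical monthly batch: 20 women with predicted risk of inadequate ANC.
risk_scores = {f"woman_{i:02d}": i / 20 for i in range(20)}
flagged = flag_top_decile(risk_scores)
print(sorted(flagged))

# Ethnicity SPD before and after mitigation (Tables 7 and 8)
for label, spd in (("before", 0.150), ("after", -0.035)):
    print(f"SPD {label} mitigation = {spd:+.3f}, alert: {spd_alert(spd)}")
```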

Strengths and limitations of the study

A key strength of this study is its comprehensive methodology, which integrates rigorous data preprocessing, SHAP-based interpretability, systematic fairness auditing, and causal inference. The use of stratified cross-validation and robust evaluation metrics further enhances the generalizability of our findings across diverse subgroups. By combining predictive modelling with fairness and causal analysis, our approach advances the methodological frontier in maternal health research. Despite these strengths, several limitations must be acknowledged. Cross-sectional design precludes temporal inference; residual confounding by unmeasured behavioural factors may persist. SMOTE could introduce synthetic boundary artifacts despite fold-restricted application. Fairness gains are evaluated on observed attributes; intersectional and temporal biases may emerge post-deployment. Treatment effect estimates are exploratory. External validity beyond Ghana requires assessment using regionally diverse datasets. Model updates risk performance drift; a structured monitoring protocol is essential. Furthermore, while our nested cross-validation protocol was strictly enforced for model training and tuning, the initial SHAP-based feature selection was performed on a broader set of data, which may introduce a minor degree of optimism in the reported performance metrics.

Future directions

To build on the current study and address its limitations, future research should pursue several strategic directions. Longitudinal studies are essential to unravel causal pathways and capture changes in ANC utilization over time, enabling the assessment of the impact of specific policy interventions and health system reforms. Expanding external validation efforts by applying the fairness-aware AI framework to datasets from other regions of Ghana, as well as neighbouring countries, will test the robustness and generalizability of the model in varied contexts. Incorporating richer data sources, such as electronic health records, geospatial health facility mapping, and real-time mobile health data, can enhance predictive accuracy and offer more granular insights into barriers to ANC access. Methodological innovation should continue, with the integration of advanced fairness techniques, such as adversarial debiasing and intersectional subgroup analyses, to further mitigate algorithmic bias and ensure equity across multiple dimensions (e.g., ethnicity, age, disability status). Collaborations with local health authorities and community organizations will be crucial for co-developing and piloting AI-driven interventions, ensuring that technological solutions are contextually relevant and culturally sensitive. Finally, establishing ongoing monitoring and feedback mechanisms will enable continuous improvement of AI tools, adapting them to evolving health system needs and policy priorities in Ghana and beyond.

Conclusion

This study suggests that an integrated fairness‑aware, explainable ML framework can support more equitable identification of ANC underutilization drivers in Ghana, while highlighting the ethical importance of balancing performance with transparency and distributive justice. As Ghana continues its efforts to achieve universal access to quality maternal care, adopting ethical, context-sensitive AI solutions will be critical for reaching the most vulnerable and closing persistent gaps in antenatal care coverage.

Acknowledgements

We thank the MEASURE DHS Program for their support and for making the dataset freely accessible.

Author contributions

AO contributed to the study design and conceptualization. AO and KU performed the analysis. AO and KU developed the initial draft. All the authors critically reviewed the manuscript for its intellectual content. All authors read and amended drafts of the paper and approved the final version. AO had the final responsibility of submitting it for publication.

Funding

The study received no funding.

Data availability

The data used for this study is freely available at https://dhsprogram.com/data/dataset/Ghana_Standard-DHS_2022.cfm?flag=1.

Declarations

Ethics approval and consent to participate

Ethical approval was not required for this study as it utilized publicly available, anonymized secondary data from the Ghana DHS. No human subjects were directly recruited, and all analyses comply with ethical standards for secondary data use.

Consent for publication

Not applicable.

Reproducibility

Upon acceptance, we will release the analysis code, model artifacts, and variable dictionary (including feature encodings and fairness audit scripts) in a public repository to support reproducibility and policy reuse.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Yasin R, Azhar M, Allahuddin Z, Das JK, Bhutta ZA. Antenatal care strategies to improve perinatal and newborn outcomes. Neonatology. 2025;122(Suppl 1):13–31.
2. McCauley H, Lowe K, Furtado N, Mangiaterra V, van den Broek N. What are the essential components of antenatal care? A systematic review of the literature and development of signal functions to guide monitoring and evaluation. BJOG: Int J Obstet Gynecol. 2022;129(6):855–67.
3. Tunçalp O, Pena-Rosas JP, Lawrie T, Bucagu M, Oladapo OT, Portela A, Metin Gülmezoglu A. WHO recommendations on antenatal care for a positive pregnancy experience-going beyond survival. BJOG: Int J Obstet Gynecol. 2017;124(6):860–2.
4. World Health Organization. WHO recommendations on antenatal care for a positive pregnancy experience. In: WHO recommendations on antenatal care for a positive pregnancy experience. 2016.
5. Aboagye RG, Osborne A, Salihu T, Wongnaah FG, Ahinkorah BO. Regional disparities and socio-demographic factors associated with eight or more antenatal care contacts in Ghana. Archives Public Health. 2024;82(1):192.
6. Haruna U, Dandeebo G, Galaa SZ. Improving access and utilization of maternal healthcare services through focused antenatal care in rural Ghana: a qualitative study. Adv Public Health. 2019;2019(1):9181758.
7. Ganle JK, Parker M, Fitzpatrick R, Otupiri E. Inequities in accessibility to and utilisation of maternal health services in Ghana after user-fee exemption: a descriptive study. Int J Equity Health. 2014;13(1):89.
8. Adu-Fokuo D. Improving Maternal Health Care Accessibility and Utilization in Remote and Island Communities: An Explorative Study in Ghana (Doctoral dissertation, The Claremont Graduate University). 2024.
9. Peprah P, Budu HI, Agyemang-Duah W, Abalo EM, Gyimah AA. Why does inaccessibility widely exist in healthcare in Ghana? Understanding the reasons from past to present. J Public Health. 2020;28(1):1–0.
10. Nketiah-Amponsah E, Senadza B, Arthur E. Determinants of utilization of antenatal care services in developing countries: recent evidence from Ghana. Afr J Economic Manage Stud. 2013;4(1):58–73.
11. Capobianco E. High-dimensional role of AI and machine learning in cancer research. Br J Cancer. 2022;126(4):523–32.
12. Hossain MM, Kashem MA, Nayan NM, Chowdhury MA. A medical cyber-physical system for predicting maternal health in developing countries using machine learning. Healthc Analytics. 2024;5:100285.
13. Paulus JK, Kent DM. Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. NPJ Digit Med. 2020;3(1):99.
14. Onebunne AP, Alade B. Bias and fairness in AI models: addressing disparities in machine learning applications. Int Res J Modernization Eng Technol Sci. 2024;6:09.
15. Goktas P, Grzybowski A. Shaping the future of healthcare: ethical clinical challenges and pathways to trustworthy AI. J Clin Med. 2025;14(5):1605.
16. Abuosi AA, Anaba EA, Daniels AA, Baku AA, Akazili J. Determinants of early antenatal care contacts among women of reproductive age in Ghana: evidence from the recent maternal health survey. BMC Pregnancy Childbirth. 2024;24(1):309.
17. Andegiorgish AK, Elhoumed M, Qi Q, Zhu Z, Zeng L. Determinants of antenatal care use in nine sub-Saharan African countries: a statistical analysis of cross-sectional data from demographic and health surveys. BMJ Open. 2022;12(2):e051675.
18. Alam AS, Alam S, Mobasshira K, Anik SN, Hasan MN, Chowdhury MA, Uddin MJ. Exploring urban-rural inequalities of maternal healthcare utilization in Bangladesh. Heliyon. 2025;11(2).
19. Ayele BA, Holliday E, Chojenta C. Determinants of antenatal care service utilisation in sub-Saharan Africa: an analysis of demographic and health surveys data (2015–2022). Archives Public Health. 2025;83(1):189.
20. Binyaruka P, Foss A, Alibrahim A, Mziray N, Cassidy R, Borghi J. Supply-side factors influencing demand for facility-based delivery in Tanzania: a multilevel analysis. Health Econ Rev. 2023;13(1):52.
21. Shallo SA, Daba DB, Abubekar A. Demand–supply-side barriers affecting maternal health service utilization among rural women of West Shoa Zone, Oromia, Ethiopia: a qualitative study. PLoS ONE. 2022;17(9):e0274018.
22. Naresh V, Sultana G, Ali M, Khan J. Advancing healthcare through machine learning: opportunities, challenges, and solutions for integration. Front Collaborative Res. 2024;2(1s):1–9.
23. Chen RJ, Wang JJ, Williamson DF, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomedical Eng. 2023;7(6):719–42.
24. Landers RN, Behrend TS. Auditing the AI auditors: a framework for evaluating fairness and bias in high stakes AI predictive models. Am Psychol. 2023;78(1):36.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data used for this study is freely available at https://dhsprogram.com/data/dataset/Ghana_Standard-DHS_2022.cfm?flag=1.
