BMC Pregnancy and Childbirth. 2025 Sep 2;25:916. doi: 10.1186/s12884-025-08021-0

Multidimensional predictors of preterm birth risk among black and white primiparous women in the U.S.: insights from machine learning

Sangmi Kim 1, Zahra Barandouzi 1, Sophie Grant 1, Athena D F Sherman 1, Brenice Duroseau 2, Monique S Balthazar 3
PMCID: PMC12403921  PMID: 40898051

Abstract

Background

Unmeasured contextual factors contribute to Black-White disparities in preterm birth (PTB), but their effects are difficult to isolate because they are related to individual factors in complex, non-linear ways. To address this, we applied explainable machine learning to model interactions between individual and contextual factors to predict PTB and identify its key predictors among non-Hispanic Black (NHB) and non-Hispanic White (NHW) primiparous women in the U.S.

Methods

Elastic Net, Random Forest, and XGBoost models were developed using data from the Pregnancy Risk Assessment Monitoring System and the Social Vulnerability Index for nine U.S. states. SHAP (SHapley Additive exPlanations) values were computed to assess feature importance. Model performance was evaluated using the area under the ROC curve (AUC).

Results

Our models predicted PTB with high discriminative performance (AUC: 0.87–0.93) for NHB and NHW primiparous women, identifying both shared and distinct multidimensional predictors. Shared individual predictors included ≥ 9 prenatal care visits (protective; mean |SHAP| 0.42–1.58), adequate-plus prenatal care (risk-increasing; mean |SHAP| 0.69 for NHB and 1.18 for NHW), and gestational hypertension (risk-increasing; mean |SHAP| 0.17 and 0.20, respectively). Contextual socioeconomic status and household composition also contributed significantly to PTB prediction, with a stronger impact among NHB women.

Conclusions

Explainable machine learning with SHAP values can accurately quantify the contribution of individual and contextual factors to PTB risk specific to NHB and NHW primiparous women. By integrating feature importance with the prevalence of risk factors, this approach offers actionable insights to identify priority areas for intervention and inform tailored preventive strategies aimed at reducing Black-White disparities in PTB.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12884-025-08021-0.

Keywords: Preterm, Race/ethnicity, Disparities, Machine learning, Social vulnerability index, PRAMS

Introduction

Although the United States (U.S.) is among the highest-income countries in the world, it ranks among the worst for rates of preterm birth (PTB; birth at < 37 weeks’ gestation) [1]. In 2022, approximately 1 in 10 U.S. infants were born preterm [2]. PTB has lasting negative impacts on infants [3], families [4], and society (e.g., medical expenditures, lost academic progress, lost income) [5]. Importantly, racial and ethnic inequities in PTB in the U.S. remain persistent: non-Hispanic Black (NHB) women are approximately 1.5 times more likely than non-Hispanic White (NHW) women to experience PTB [2]. Nevertheless, the factors driving Black-White disparities have not been fully elucidated [6].

Evidence suggests that social determinants of health (SDoH), such as racism, underlie Black-White inequities in numerous health and behavioral outcomes, including PTB. Racial categorization determines the distribution of opportunities and resources for health and well-being among individuals and communities. Individual and contextual (or area-based) risks and resources do not exist in isolation; rather, the former are embedded in and interconnected with the latter, together contributing to differential PTB rates between groups [7].

Researchers hypothesize that unmeasured contextual factors drive the Black-White disparities in PTB, which persist even after controlling for individual factors [8, 9]. However, the complex relationships among such multidimensional determinants of health present significant challenges in measurement and prediction [10]. When contextual factors are analyzed alongside individual factors, they often fail to demonstrate statistical significance [11]. Linear models, in particular, struggle to capture the intricately woven network of associations driving PTB effectively [12]. The failure to capture the significant impact of contextual factors in this cascade may inadvertently lead researchers, healthcare providers, and policymakers to hyperfocus on individual-level interventions. Given that contextual factors shape exposure and vulnerability to individual factors [13], such narrowly focused approaches may undermine the effectiveness and sustainability of interventions to prevent PTB.

To effectively address the complex dynamics between multidimensional determinants of health influencing PTB among Black and White pregnant women, it is essential to employ more flexible and sophisticated modeling techniques than conventional linear regression. Machine learning (ML) meets this need: many ML algorithms, such as tree-based methods, automatically detect non-linear and high-order interactions without prior specification [14], while others, like Elastic Net, preserve a linear structure but apply penalization to improve variable selection and generalizability [15]. Consequently, ML models tend to outperform conventional linear models in predicting PTB [16, 17], suggesting that ML offers a more robust approach to modeling the multidimensional determinants of health underlying PTB and may improve predictive accuracy.

Thus, this study aimed to leverage ML and population datasets to model the interactions between individual and contextual factors for predicting PTB while identifying key predictors among NHB and NHW pregnant women in the U.S. To date, few studies have used large population datasets to predict PTB, and none have incorporated contextual factors into their ML models [18–22].

Methods

Study design and population

This retrospective, cross-sectional study used the CDC’s Pregnancy Risk Assessment Monitoring System (PRAMS) data (Phase 8, 2016–2021) linked with the birth certificate as well as the Social Vulnerability Index (SVI) data (2014–2018 and 2016–2020).

PRAMS [23] is an ongoing, population-based surveillance project to monitor maternal attitudes and experiences before, during, and shortly after pregnancy. Each month, participating states sample individuals who recently gave birth (within 2–6 months) to a live infant, identified through birth certificate files, for data collection via mailed questionnaires and follow-up phone interviews [24]. Participating states can choose to oversample priority subpopulations and typically sample 1,000–3,000 individuals annually [24]. The final PRAMS dataset is weighted for sample design, nonresponse, and noncoverage to produce representative population estimates [24]. To reduce nonresponse bias, the CDC PRAMS working group sets a response rate threshold of 55–70%, depending on the survey year [24].

SVI [25] assesses the relative vulnerability of each county within a state by ranking Census tracts on 15–16 social factors (depending on the year), grouped into four themes: (1) socioeconomic status, (2) household characteristics, (3) racial/ethnic minority status, and (4) housing type and transportation. Tract rankings are percentile-based, ranging from zero to one, with higher values indicating greater vulnerability. Tracts at or above the 90th percentile are assigned a value of one (“flagged”) to indicate high vulnerability, while tracts below the 90th percentile are assigned a value of zero [26].

As this study used publicly available, deidentified data, it was deemed exempt from ethical review by Emory University Institutional Review Board (STUDY00003674).

Inclusion criteria

The study population consisted of primiparous women who delivered a live singleton infant without birth defects and self-identified as NHB or NHW. Originally, the dataset included 229,697 individuals. We excluded those who (1) were multiparous (n = 143,370), (2) had multiple gestation (n = 1,480), or (3) delivered an infant with a birth defect (n = 814). Then, we excluded those who were not NHB or NHW (n = 23,278). This sequential elimination process reduced the initial sample size to 60,755. We further subset the data to nine states (FL, GA, IA, IN, MN, MO, NC, WI, and WY) with available data on racial discrimination, stressful life events, and SVI. The final sample size was 9,595.
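The sequential exclusions above can be checked with simple arithmetic. The following Python sketch is only an illustration (the authors worked in R, and the variable and function names here are hypothetical); it uses the counts reported in the paragraph and confirms that they lead to 60,755 before the restriction to nine states.

```python
# Counts are copied from the inclusion-criteria paragraph; names are illustrative.
INITIAL_SAMPLE = 229_697
EXCLUSIONS = [
    ("multiparous", 143_370),
    ("multiple gestation", 1_480),
    ("birth defect", 814),
    ("not NHB or NHW", 23_278),
]


def apply_exclusions(start, steps):
    """Return (reason, remaining sample size) after each sequential exclusion."""
    remaining = start
    flow = []
    for reason, n_removed in steps:
        remaining -= n_removed
        flow.append((reason, remaining))
    return flow


flow = apply_exclusions(INITIAL_SAMPLE, EXCLUSIONS)
for reason, remaining in flow:
    print(f"after excluding {reason}: {remaining:,}")
```

The subsequent restriction to nine states (final n = 9,595) depends on data availability and cannot be reproduced from the counts given in the text.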

Measures

Outcome

We grouped gestational age in completed weeks (≤27, 28–33, 34–36, 37–39, and 40+) into PTB (< 37 weeks) and term birth (37+ weeks). We made this decision because the number of extremely preterm (< 28 weeks, n = 182) and very preterm (28–31 weeks, n = 535) births was small, particularly after stratification by race and ethnicity. The limited sample size would not provide sufficient data to reliably train and predict these separate outcomes.
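As a minimal sketch of this outcome coding, the five gestational-age groups can be collapsed into a binary label. The category strings below are hypothetical spellings chosen for illustration, not the actual PRAMS codes.

```python
# Hypothetical labels for the five gestational-age groups named in the text.
GA_CATEGORIES = ["<=27", "28-33", "34-36", "37-39", "40+"]
PRETERM_CATEGORIES = {"<=27", "28-33", "34-36"}  # all groups below 37 weeks


def to_ptb(ga_category):
    """Return 1 for preterm birth (< 37 weeks) and 0 for term birth (37+ weeks)."""
    if ga_category not in GA_CATEGORIES:
        raise ValueError(f"unknown gestational-age category: {ga_category!r}")
    return 1 if ga_category in PRETERM_CATEGORIES else 0


outcome = [to_ptb(category) for category in GA_CATEGORIES]
```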

Predictors

Variables in the analysis were based on prior literature about risk/protective factors for PTB. Most of our predictors were SDoH, as defined by Healthy People 2030, representing economic stability, education access and quality, health care access and quality, neighborhood and built environment, and social and community context [27]. Survey year and U.S. state were also included to account for potential differences in the characteristics of the maternal cohorts who gave birth in certain years and locations (see Table S1 for details).

The individual factors included socioeconomic, psychological, medical, and behavioral factors. Socioeconomic factors comprised maternal race and ethnicity, maternal age, marital status, yearly total household income, the number of dependents relying on that income, receipt of the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), maternal education, health insurance, and area of residence. Psychological factors comprised 14 stressful life events, depression, physical violence by a current or ex-husband/partner, and perceived racial discrimination. Medical factors comprised the number of previous other pregnancy outcomes that ended in pregnancy termination, diabetes before pregnancy, gestational diabetes, hypertension before pregnancy, gestational hypertension, pre-pregnancy body mass index (BMI) (categorical), weight gain during pregnancy (categorical), fever during pregnancy, premature rupture of membrane (PROM), and medical risk factors. The variable for weight gain during pregnancy was created based on the Institute of Medicine guidelines [28], which specify recommended ranges of weight gain by pre-pregnancy BMI category. Behavioral factors comprised the number of prenatal care (PNC) visits, initiation of PNC in the first trimester, Kotelchuck’s Adequacy of Prenatal Care Utilization Index, multivitamin intake, pregnancy intention, cigarette smoking, e-cigarette smoking, and drinking.

Lastly, contextual factors encompassed the number of flagged counties per state across a total of 31 (15 [2014–2018] and 16 [2016–2020]) social factors from the SVI. As noted earlier, census tracts at or above the 90th percentile for a given factor are assigned a value of 1 (flagged) to indicate areas of highest vulnerability, while all others are assigned a value of 0. Since PRAMS data are available only at the state level, we aggregated the number of flags for each social factor within individual states, resulting in 15–16 measures of social vulnerability per state every four years. A higher number of flags suggests a greater number of socially vulnerable counties within a state.
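The flagging and state-level aggregation described above can be sketched as follows. This is an illustration under stated assumptions: the record layout (state, factor, percentile rank) and the toy values are hypothetical and do not reflect the actual SVI file structure.

```python
from collections import defaultdict

FLAG_THRESHOLD = 0.90  # areas at or above the 90th percentile are flagged


def flag(percentile_rank):
    """Return 1 if the percentile rank marks high vulnerability, else 0."""
    return 1 if percentile_rank >= FLAG_THRESHOLD else 0


def flags_per_state(records):
    """Sum flags for each (state, social factor) pair.

    records: iterable of (state, factor, percentile_rank) tuples, one per area.
    """
    counts = defaultdict(int)
    for state, factor, rank in records:
        counts[(state, factor)] += flag(rank)
    return dict(counts)


# Hypothetical toy records for illustration only.
toy_records = [
    ("GA", "below_poverty", 0.95),
    ("GA", "below_poverty", 0.42),
    ("GA", "below_poverty", 0.90),
    ("WY", "below_poverty", 0.10),
]
state_flags = flags_per_state(toy_records)
```

Each PRAMS respondent would then inherit the flag counts of her state, which is why these measures vary only between states and not between women within a state.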

Data analysis

Variable selection and handling of missing data

Of the 100 predictor variables pre-determined from the literature review, 13 were excluded because ≥10% of their observations were missing [29] (Table S2). Observations with a missing outcome were removed, as required for ML prediction. For the predictors, median imputation was conducted for numeric variables, and a code (“NA”) was assigned to missing observations for categorical variables with the expectation that our model would automatically learn what the missing patterns implied [30].
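A minimal Python sketch of these three rules is given below. It assumes missing values are represented as None and that each predictor is a plain list; the function names are hypothetical and this is not the authors' R code.

```python
from statistics import median

MISSING_CUTOFF = 0.10  # predictors with >= 10% missing values are dropped


def missing_fraction(values):
    """Share of observations that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def keep_predictor(values):
    """True if the predictor stays below the missingness cutoff."""
    return missing_fraction(values) < MISSING_CUTOFF


def impute_numeric(values):
    """Replace missing numeric values with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_categorical(values, code="NA"):
    """Treat missingness as its own category by assigning an explicit code."""
    return [code if v is None else v for v in values]
```

For example, `impute_numeric([1, None, 3, 5])` fills the gap with the median of the observed values, and `impute_categorical(["Yes", None])` turns the gap into an explicit "NA" level that a model can learn from.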

Descriptive statistics

We summarized the characteristics of the study populations with frequency, percentage, mean, and standard deviation. We reported Black-White differences in these characteristics as well as their associations with PTB using Pearson’s Chi-squared tests and Wilcoxon rank sum tests. The statistical significance was set at α = 0.05. Notably, for modeling, we selected only the predictors significantly associated with PTB, separately for NHB and NHW women.
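To illustrate the Chi-squared screen, the sketch below computes Pearson's statistic for a 2 × 2 table without continuity correction, using the preterm and term counts by race and ethnicity reported in Table 1. Whether the authors applied a continuity correction is not stated, so the exact statistic is an assumption; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: NHB, NHW; columns: preterm, term (counts from Table 1).
stat, p_value = chi2_2x2(712, 2494, 1182, 5207)
print(f"chi-squared = {stat:.2f}, p = {p_value:.2g}")
```

The resulting p-value falls below 0.001, in line with the "< 0.001" reported for preterm birth in Table 1.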

ML

We built Elastic Net, Random Forest (RF), and Extreme Gradient Boosting (XGB) models to predict PTB. These algorithms were chosen because of their widespread application, relative ease of use, and diverse learning capabilities [31]. The data were split into training and test sets (70/30), each of which maintained the same PTB : term birth ratio to mitigate the data imbalance. The predictors were pre-processed by scaling the continuous variables and dummy-coding the categorical variables. Hyperparameter tuning was carried out using a random grid search via Latin hypercube sampling (grid size = 500). Each candidate combination of hyperparameters was evaluated with 5-fold cross-validation, using the area under the curve (AUC) as the evaluation metric.
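Two of the steps above, the stratified 70/30 split and the AUC metric, can be sketched with the Python standard library. This is an illustration only: the authors used R, the function names are hypothetical, and the toy labels do not come from the study data.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 20 preterm (1) and 80 term (0) births.
toy_labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(toy_labels)
```

Because the split is done within each class, the preterm share is identical in both partitions, which is the property the text describes. The pairwise AUC is quadratic in sample size and is meant for clarity rather than speed.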

We enhanced the explainability of our models by identifying key predictors of PTB among NHB and NHW women, respectively, using SHAP (SHapley Additive exPlanations) values [32]. Rooted in cooperative game theory, SHAP ensures fair attribution of feature importance by computing each feature’s average marginal contribution across all possible subsets of features in a model. Although absolute SHAP values reflect the magnitude of a feature’s influence within a model, they are not directly comparable across the NHB and NHW models in this study because they are scale-dependent and influenced by model-specific factors, including prediction variance and feature distribution [33]. By quantifying how much each individual and contextual factor contributes to PTB risk within each racial and ethnic group, SHAP enhances trust in ML predictions and provides actionable insights to guide tailored preventive strategies.
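The idea of an average marginal contribution across all feature subsets can be made concrete with an exact Shapley computation on a toy model. The sketch below is not the SHAP library and not the authors' pipeline: the three-feature model is hypothetical, and absent features are replaced by a single baseline value, which is a simplifying assumption (practical SHAP estimators average over a background dataset).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values: weighted marginal contributions over all subsets."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    """Hypothetical model with one interaction term (features 0 and 2)."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def make_value_fn(x, baseline):
    """Model output when only the features in `subset` take their observed value."""
    def value_fn(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return toy_model(mixed)
    return value_fn


def mean_abs(rows):
    """Mean absolute attribution per feature across observations."""
    return [sum(abs(r[k]) for r in rows) / len(rows) for k in range(len(rows[0]))]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(x, baseline), 3)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction effect is shared equally between the two features involved. Averaging absolute attributions over observations with `mean_abs` gives the kind of mean |SHAP| summary reported in the Abstract.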

Sensitivity analysis

We performed two sensitivity analyses. First, we re-trained the best-performing models with all 85 candidate predictors, omitting the initial univariate p-value screen. Second, we repeated the analysis with 84 predictors after removing PROM, a variable that helps define spontaneous PTB and can precede medically indicated PTB. These tests addressed two concerns: (i) the original screen might have excluded clinically relevant yet non-significant predictors, and (ii) some algorithms (e.g., Elastic Net) perform their own feature selection.

The final models with the selected hyperparameters (Table S3) were evaluated on the test sets. All data analyses were conducted using R version 4.0.2 (2020-06-22).

Results

The unweighted PTB rate of NHB primiparous women was 1.2 times higher than the rate of NHW women (22.2% vs. 18.5%). However, after accounting for the PRAMS sampling weights, NHB women had 1.78 times the risk of PTB compared to NHW women (RR = 1.78; 95% CI: 1.54, 2.06) (Table S4).
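The unweighted comparison can be reproduced from the counts in Table 1. The Python sketch below computes the crude risk ratio with a log-scale (Katz) 95% confidence interval; the choice of interval method is an assumption made for illustration. It does not apply the PRAMS sampling weights, so it cannot reproduce the weighted estimate of 1.78.

```python
from math import exp, log, sqrt


def risk_ratio(events_1, n_1, events_0, n_0, z=1.96):
    """Crude risk ratio of group 1 vs. group 0 with a log-scale (Katz) CI."""
    rr = (events_1 / n_1) / (events_0 / n_0)
    se_log_rr = sqrt(1 / events_1 - 1 / n_1 + 1 / events_0 - 1 / n_0)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Preterm births / total: NHB 712 of 3,206; NHW 1,182 of 6,389 (Table 1).
rr, lower, upper = risk_ratio(712, 3206, 1182, 6389)
print(f"unweighted RR = {rr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

The gap between the crude ratio of about 1.2 and the weighted 1.78 reflects the survey design, for example oversampling of certain births, which the weights correct for.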

NHB women were more likely than NHW women to live in urban areas (90% vs. 60.5%), be younger, be unmarried (76.8% vs. 32%), be of lower socioeconomic status, and report having experienced physical violence by an intimate partner and stressful life events (with a few exceptions). About one in five NHB women reported experiences of racial discrimination during the 12 months before their baby was born. NHB women were also more likely than NHW women to report hypertension (8.6% vs. 4.9%) and obesity (33.6% vs. 25.0%) before pregnancy and inadequate weight gain during pregnancy (26.7% vs. 19.4%). NHB women were more likely to report less frequent intake of multivitamins (0 times/week: 63.7% vs. 42.2%), unwanted pregnancy (5.6% vs. 2.6%), and fewer PNC visits (≤ 8 visits: 25.1% vs. 14.7%). Further, NHB women were more likely to reside in states characterized by higher social vulnerability. For example, the average number of flags for living below the poverty level (SVI 2014–2018) across the nine states was 12.4 for NHB women, compared to 9.8 for NHW women.

NHW women, on the other hand, were more likely than NHB women to report stressful life events such as the illness of a family member (26.8% vs. 23.5%), being apart from their husband/partner (5.7% vs. 3.5%), and having people close to them with drinking/drug problems (14.1% vs. 11.4%). They also more frequently reported having experienced depression before pregnancy (20.1% vs. 16.2%) and gestational diabetes (8.4% vs. 7.0%). Additionally, NHW women reported higher use of tobacco (average number of cigarettes before pregnancy: 2.0 vs. 0.9) and alcohol (83.0% vs. 58.0%) before and during pregnancy (Table 1). The full descriptive results are available in Table S5 in the supplemental material.

Table 1.

Maternal characteristics by race and ethnicity among Non-Hispanic black and Non-Hispanic white primiparous women in the U.S. (PRAMS phase 8, 2016–2021; SVI 2014–2018/2016–2020)

Characteristic; Overall (N = 9,595)¹; Non-Hispanic Black (N = 3,206)¹; Non-Hispanic White (N = 6,389)¹; p-value²
Rural Area < 0.001
 Rural 2,559 (29.4%) 296 (10.0%) 2,263 (39.5%)
 Urban 6,133 (70.6%) 2,668 (90.0%) 3,465 (60.5%)
Maternal Age < 0.001
 17 or younger 275 (2.9%) 176 (5.5%) 99 (1.6%)
 18–19 733 (7.6%) 383 (11.9%) 350 (5.5%)
 20–24 2,613 (27.2%) 1,180 (36.8%) 1,433 (22.4%)
 25–29 2,924 (30.5%) 788 (24.6%) 2,136 (33.4%)
 30–34 2,152 (22.4%) 432 (13.5%) 1,720 (26.9%)
 35–39 749 (7.8%) 191 (6.0%) 558 (8.7%)
 40+ 149 (1.6%) 56 (1.8%) 93 (1.5%)
Marital Status < 0.001
 Married 5,090 (53.1%) 744 (23.2%) 4,346 (68.0%)
 Not Married 4,501 (46.9%) 2,459 (76.8%) 2,042 (32.0%)
Health Insurance Before Pregnancy < 0.001
 Insured 8,316 (86.7%) 2,634 (82.3%) 5,682 (89.0%)
 Uninsured 1,272 (13.3%) 567 (17.7%) 705 (11.0%)
Health Insurance During Pregnancy 0.028
 Insured 9,165 (98.3%) 2,971 (97.9%) 6,194 (98.5%)
 Uninsured 160 (1.7%) 65 (2.1%) 95 (1.5%)
Maternal Education (Years) < 0.001
 00–08 71 (0.7%) 29 (0.9%) 42 (0.7%)
 09–11 728 (7.6%) 414 (13.0%) 314 (4.9%)
 12 2,286 (23.9%) 1,127 (35.4%) 1,159 (18.2%)
 13–15 2,805 (29.3%) 988 (31.0%) 1,817 (28.5%)
 16+ 3,669 (38.4%) 630 (19.8%) 3,039 (47.7%)
Receive WIC During Pregnancy < 0.001
 No 6,158 (64.9%) 1,272 (40.5%) 4,886 (77.0%)
 Yes 3,334 (35.1%) 1,872 (59.5%) 1,462 (23.0%)
Physical Abuse by Partner Before Pregnancy < 0.001
 No 9,216 (97.8%) 2,991 (96.5%) 6,225 (98.5%)
 Yes 205 (2.2%) 110 (3.6%) 95 (1.5%)
Physical Abuse by Partner During Pregnancy < 0.001
 No 9,257 (98.3%) 2,997 (97.0%) 6,260 (99.0%)
 Yes 157 (1.7%) 94 (3.0%) 63 (1.00%)
Illness of Family Member < 0.001
 No 7,020 (74.3%) 2,377 (76.5%) 4,643 (73.2%)
 Yes 2,426 (25.7%) 729 (23.5%) 1,697 (26.8%)
Divorce < 0.001
 No 8,968 (94.8%) 2,878 (92.5%) 6,090 (95.9%)
 Yes 493 (5.2%) 235 (7.5%) 258 (4.1%)
Move < 0.001
 No 5,776 (61.1%) 1,820 (58.6%) 3,956 (62.3%)
 Yes 3,681 (38.9%) 1,286 (41.4%) 2,395 (37.7%)
Homeless < 0.001
 No 9,170 (96.9%) 2,930 (94.1%) 6,240 (98.3%)
 Yes 290 (3.1%) 185 (5.9%) 105 (1.7%)
Job Loss of Husband/Partner < 0.001
 No 8,532 (90.4%) 2,747 (88.7%) 5,785 (91.2%)
 Yes 906 (9.6%) 351 (11.3%) 555 (8.8%)
Job Loss of Self < 0.001
 No 8,358 (88.4%) 2,522 (81.1%) 5,836 (92.0%)
 Yes 1,095 (11.6%) 586 (18.9%) 509 (8.0%)
Cut in Work Hours or Pay of Husband/Partner/Self < 0.001
 No 7,877 (83.4%) 2,521 (81.2%) 5,356 (84.5%)
 Yes 1,570 (16.6%) 584 (18.8%) 986 (15.5%)
Apart from Husband/Partner < 0.001
 No 8,982 (95.0%) 2,997 (96.5%) 5,985 (94.3%)
 Yes 471 (5.0%) 110 (3.5%) 361 (5.7%)
Argument More Than Usual < 0.001
 No 7,557 (80.0%) 2,258 (72.7%) 5,299 (83.6%)
 Yes 1,891 (20.0%) 848 (27.3%) 1,043 (16.4%)
Unwanted Pregnancy by Husband/Partner < 0.001
 No 8,831 (93.5%) 2,803 (90.3%) 6,028 (95.0%)
 Yes 618 (6.5%) 302 (9.7%) 316 (5.0%)
Problem Paying Bill < 0.001
 No 8,037 (85.1%) 2,493 (80.3%) 5,544 (87.4%)
 Yes 1,411 (14.9%) 613 (19.7%) 798 (12.6%)
Imprisonment of Husband/Partner/Self < 0.001
 No 9,071 (96.2%) 2,898 (93.6%) 6,173 (97.5%)
 Yes 355 (3.8%) 197 (6.4%) 158 (2.5%)
Problem with Drinking/Drugs of People Close to Me < 0.001
 No 8,214 (86.8%) 2,759 (88.6%) 5,455 (85.9%)
 Yes 1,247 (13.2%) 354 (11.4%) 893 (14.1%)
Death of People Close to Me < 0.001
 No 7,541 (79.8%) 2,336 (75.2%) 5,205 (82.1%)
 Yes 1,908 (20.2%) 772 (24.8%) 1,136 (17.9%)
Depression Before Pregnancy < 0.001
 No 7,733 (81.2%) 2,654 (83.8%) 5,079 (79.9%)
 Yes 1,790 (18.8%) 513 (16.2%) 1,277 (20.1%)
Depression During Pregnancy 0.049
 No 7,741 (82.1%) 2,521 (81.0%) 5,220 (82.7%)
 Yes 1,683 (17.9%) 590 (19.0%) 1,093 (17.3%)
Perceived Racial Discrimination < 0.001
 No 8,666 (91.9%) 2,513 (81.2%) 6,153 (97.2%)
 Yes 760 (8.1%) 581 (18.8%) 179 (2.8%)
Hypertension Before Pregnancy < 0.001
 No 8,923 (93.8%) 2,884 (91.4%) 6,039 (95.1%)
 Yes 587 (6.2%) 273 (8.6%) 314 (4.9%)
Diabetes Before Pregnancy 0.081
 No 9,169 (96.5%) 3,031 (96.0%) 6,138 (96.7%)
 Yes 332 (3.5%) 125 (4.0%) 207 (3.3%)
Hypertension During Pregnancy 0.2
 No 7,442 (78.7%) 2,431 (77.9%) 5,011 (79.1%)
 Yes 2,015 (21.3%) 691 (22.1%) 1,324 (20.9%)
Diabetes During Pregnancy 0.021
 No 8,705 (92.0%) 2,902 (93.0%) 5,803 (91.6%)
 Yes 753 (8.0%) 220 (7.0%) 533 (8.4%)
Medical Risk Factors 0.025
 No Risks 7,174 (75.4%) 2,343 (74.0%) 4,831 (76.1%)
 Risks 2,340 (24.6%) 823 (26.0%) 1,517 (23.9%)
Body Mass Index Before Pregnancy < 0.001
 Normal 4,158 (44.4%) 1,166 (37.9%) 2,992 (47.6%)
 Underweight 318 (3.40%) 125 (4.06%) 193 (3.07%)
 Overweight 2,276 (24.3%) 754 (24.5%) 1,522 (24.2%)
 Obese 2,606 (27.8%) 1,033 (33.6%) 1,573 (25.0%)
Weight Gain During Pregnancy < 0.001
 Inadequate 1,999 (21.8%) 799 (26.7%) 1,200 (19.4%)
 Adequate 2,480 (27.0%) 721 (24.1%) 1,759 (28.4%)
 Excessive 4,693 (51.2%) 1,469 (49.1%) 3,224 (52.1%)
Pregnancy Intention < 0.001
 Later 2,192 (23.2%) 1,048 (33.2%) 1,144 (18.2%)
 Not Sure 1,455 (15.4%) 708 (22.4%) 747 (11.9%)
 Not Want 342 (3.6%) 177 (5.6%) 165 (2.6%)
 Sooner 1,668 (17.6%) 336 (10.6%) 1,332 (21.2%)
 Then 3,801 (40.2%) 892 (28.2%) 2,909 (46.2%)
No. of Prenatal Care Visits < 0.001
 <= 08 1,684 (18.1%) 767 (25.1%) 917 (14.7%)
 09–11 2,641 (28.4%) 847 (27.7%) 1,794 (28.8%)
 12+ 4,968 (53.5%) 1,442 (47.2%) 3,526 (56.5%)
Prenatal Care Adequacy (Kotelchuck Index) < 0.001
 Inadequate 1,049 (11.3%) 501 (16.3%) 548 (8.8%)
 Intermediate 666 (7.1%) 267 (8.7%) 399 (6.4%)
 Adequate 3,924 (42.1%) 1,114 (36.3%) 2,810 (45.0%)
 Adequate Plus 3,677 (39.5%) 1,189 (38.7%) 2,488 (39.8%)
No. Cigarettes Before Pregnancy 1.6 (6.0) 0.9 (4.1) 2.0 (6.8) < 0.001
No. Cigarettes in 1st Trimester 0.8 (3.7) 0.4 (2.4) 1.0 (4.2) < 0.001
No. Cigarettes in 2nd Trimester 0.5 (2.8) 0.3 (2.0) 0.7 (3.1) < 0.001
No. Cigarettes in 3rd Trimester 0.4 (2.5) 0.2 (1.8) 0.5 (2.8) < 0.001
Drinking in the Last 2 Years < 0.001
 No 2,382 (25.2%) 1,304 (42.0%) 1,078 (17.0%)
 Yes 7,060 (74.8%) 1,803 (58.0%) 5,257 (83.0%)
Below Poverty (2014–2018) 10.7 (16.1) 12.4 (18.7) 9.8 (14.5) < 0.001
Unemployed (2014–2018) 6.5 (9.7) 7.6 (11.0) 6.0 (9.0) 0.6
Low Income (2014–2018) 10.3 (13.6) 11.4 (15.6) 9.8 (12.4) > 0.9
No High School Diploma (2014–2018) 8.5 (11.9) 9.8 (13.8) 7.8 (10.7) 0.14
Aged 65 or Older (2014–2018) 7.8 (3.9) 8.5 (3.8) 7.4 (3.9) < 0.001
Aged 17 or Younger (2014–2018) 5.8 (6.0) 6.6 (6.9) 5.4 (5.4) < 0.001
Civilian With a Disability (2014–2018) 6.1 (8.3) 4.2 (6.6) 7.1 (8.9) < 0.001
Single-Parent Households (2014–2018) 9.5 (13.9) 11.6 (16.0) 8.5 (12.6) 0.2
Minority (2014–2018) 5.5 (9.1) 6.9 (10.2) 4.8 (8.4) < 0.001
Limited English Proficiency (2014–2018) 4.4 (4.4) 5.1 (5.1) 4.1 (4.0) < 0.001
Multi-Unit Structures (2014–2018) 8.1 (4.9) 8.8 (5.1) 7.8 (4.8) 0.2
Mobile Homes (2014–2018) 12.3 (21.3) 15.9 (24.4) 10.6 (19.3) 0.4
Crowding (2014–2018) 2.9 (3.1) 3.5 (3.6) 2.5 (2.8) < 0.001
No Vehicle (2014–2018) 7.1 (10.0) 8.5 (11.5) 6.4 (9.2) < 0.001
Group Quarters (2014–2018) 9.2 (7.9) 9.7 (8.6) 9.0 (7.5) 0.10
Below 150% Poverty (2016–2020) 9.4 (13.9) 10.7 (16.2) 8.8 (12.5) < 0.001
Unemployed (2016–2020) 5.6 (7.5) 6.5 (8.4) 5.2 (6.9) < 0.001
Housing Cost Burden (2016–2020) 8.5 (11.1) 10.3 (12.1) 7.6 (10.5) < 0.001
No High School Diploma (2016–2020) 7.5 (11.6) 9.1 (13.5) 6.8 (10.5) 0.043
No Health Insurance (2016–2020) 8.8 (12.2) 10.2 (14.0) 8.1 (11.1) 0.004
Aged 65 or Older (2016–2020) 8.3 (4.2) 8.9 (4.3) 8.0 (4.1) < 0.001
Aged 17 or Younger (2016–2020) 5.1 (4.5) 5.8 (5.2) 4.8 (4.1) < 0.001
Civilian With a Disability (2016–2020) 5.8 (7.5) 4.1 (6.0) 6.6 (8.0) < 0.001
Single-Parent Households (2016–2020) 9.2 (15.5) 11.6 (17.9) 8.1 (14.0) < 0.001
Limited English Proficiency (2016–2020) 4.1 (5.1) 5.2 (5.6) 3.5 (4.8) < 0.001
Minority (2016–2020) 5.6 (9.4) 7.0 (10.5) 4.9 (8.7) < 0.001
Multi-Unit Structures (2016–2020) 8.0 (5.1) 8.8 (5.1) 7.6 (5.0) < 0.001
Mobile Homes (2016–2020) 11.5 (21.1) 15.2 (24.2) 9.7 (19.1) 0.4
Crowding (2016–2020) 4.7 (4.0) 5.2 (4.7) 4.5 (3.6) 0.2
No Vehicle (2016–2020) 6.9 (9.2) 8.1 (10.5) 6.3 (8.3) < 0.001
Group Quarters (2016–2020) 9.2 (7.6) 9.6 (8.3) 9.0 (7.3) 0.14
Preterm Birth < 0.001
 Preterm 1,894 (19.7%) 712 (22.2%) 1,182 (18.5%)
 Term 7,701 (80.3%) 2,494 (77.8%) 5,207 (81.5%)
Sub-Categories of Preterm Birth < 0.001
 Extremely Preterm 182 (1.90%) 93 (2.90%) 89 (1.39%)
 Very Preterm 535 (5.58%) 207 (6.46%) 328 (5.13%)
 Moderate to Late Preterm 1,177 (12.3%) 412 (12.9%) 765 (12.0%)
 Term 7,701 (80.3%) 2,494 (77.8%) 5,207 (81.5%)

WIC = Special Supplemental Nutrition Program for Women, Infants, and Children. P-values less than 0.05 are shown in bold

1 Mean (SD); n (%)

2 Wilcoxon rank sum test; Pearson’s Chi-squared test

There were Black-White differences in the distribution of factors significantly associated with PTB, as well as in the magnitude of such associations. For example, the maternal age pattern in PTB differed by race and ethnicity: unlike the well-known “U-shaped” pattern with higher PTB rates at the extremes of maternal age, PTB rates among NHB women increased with maternal age. Within the same older age groups, PTB rates were higher among NHB than NHW women (risk difference at 30–34, 35–39, and 40+ years: 7.4, 8.5, and 20.6 percentage points). Although not statistically significant, the risk ratio for maternal age was 1.50 among NHB women and 0.82 among NHW women (data can be provided upon request). Additionally, the Black-White gap in PTB was pronounced for women with hypertension before pregnancy (40.7% vs. 29.6%) and PROM (55.6% vs. 44.5%). Depression before and during pregnancy was significantly associated with PTB only among NHW women (Table 2).
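The age-specific risk differences quoted above follow directly from the PTB rates in Table 2. A short Python check, with hypothetical names and rates in percent:

```python
# PTB rates (%) by maternal age from Table 2, as (NHB, NHW) pairs.
PTB_RATE_BY_AGE = {
    "30-34": (25.0, 17.6),
    "35-39": (31.4, 22.9),
    "40+": (51.8, 31.2),
}


def risk_differences(rates):
    """NHB minus NHW PTB rate, in percentage points, for each age group."""
    return {age: round(nhb - nhw, 1) for age, (nhb, nhw) in rates.items()}


diffs = risk_differences(PTB_RATE_BY_AGE)
for age, diff in diffs.items():
    print(f"{age}: {diff} percentage points")
```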

Table 2.

Associations between preterm birth and maternal characteristics by race and ethnicity among Non-Hispanic black and Non-Hispanic white primiparous women in U.S. (PRAMS phase 8, 2016–2021; SVI 2014–2018/2016–2020)

Characteristic; Non-Hispanic Black (N = 712)¹; p-value²; Non-Hispanic White (N = 1,182)¹; p-value²
Year of Birth < 0.001 0.053
 2016 51 (14.4%) 136 (18.7%)
 2017 114 (17.6%) 184 (17.7%)
 2018 139 (21.6%) 234 (17.3%)
 2019 186 (27.0%) 265 (18.1%)
 2020 133 (23.1%) 204 (18.3%)
 2021 89 (30.3%) 159 (22.9%)
State - < 0.001
 Florida 61 (23.2%) 52 (19.6%)
 Georgia 261 (44.8%) 233 (37.0%)
 Iowa 53 (11.2%) 45 (8.8%)
 Indiana 13 (13.5%) 7 (6.7%)
 Minnesota 23 (8.7%) 50 (6.4%)
 Missouri 108 (28.3%) 402 (22.5%)
 North Carolina 96 (39.5%) 141 (21.4%)
 Wisconsin 96 (10.8%) 54 (6.7%)
 Wyoming 1 (8.3%) 198 (23.4%)
Rural Area 0.042 0.3
 Rural 75 (25.3%) 428 (18.9%)
 Urban 541 (20.3%) 615 (17.7%)
Maternal Age < 0.001 < 0.001
 17 or younger 33 (18.8%) 25 (25.3%)
 18–19 63 (16.4%) 81 (23.1%)
 20–24 236 (20.0%) 288 (20.1%)
 25–29 183 (23.2%) 328 (15.4%)
 30–34 108 (25.0%) 303 (17.6%)
 35–39 60 (31.4%) 128 (22.9%)
 40+ 29 (51.8%) 29 (31.2%)
Marital Status > 0.9 < 0.001
 Married 166 (22.3%) 755 (17.4%)
 Not Married 544 (22.1%) 427 (20.9%)
Health Insurance Before Pregnancy < 0.001 0.006
 Insured 555 (21.1%) 1,025 (18.0%)
 Uninsured 156 (27.5%) 157 (22.3%)
Health Insurance During Pregnancy 0.010 0.14
 Insured 654 (22.0%) 1,137 (18.4%)
 Uninsured 23 (35.4%) 23 (24.2%)
Total Annual Income 0.14 < 0.001
 $0 to $16,000 218 (20.8%) 178 (23.5%)
 $16,001 to $20,000 74 (21.2%) 73 (20.3%)
 $20,001 to $24,000 52 (23.4%) 56 (22.3%)
 $24,001 to $28,000 36 (24.7%) 40 (19.8%)
 $28,001 to $32,000 30 (19.2%) 40 (15.6%)
 $32,001 to $40,000 43 (24.4%) 59 (19.9%)
 $40,001 to $48,000 26 (20.6%) 39 (13.1%)
 $48,001 to $57,000 31 (27.4%) 75 (22.9%)
 $57,001 to $60,000 18 (37.5%) 52 (20.4%)
 $60,001 to $73,000 23 (31.5%) 69 (16.9%)
 $73,001 to $85,000 14 (23.7%) 85 (17.2%)
 $85,001 to more 44 (20.4%) 358 (16.3%)
No. of Household Members 1.8 (1.0) 0.008 2.1 (0.9) 0.2
Maternal Education (Years) 0.4 < 0.001
 00–08 7 (24.1%) 9 (21.4%)
 09–11 87 (21.0%) 71 (22.6%)
 12 247 (21.9%) 269 (23.2%)
 13–15 210 (21.3%) 346 (19.0%)
 16+ 159 (25.2%) 484 (15.9%)
Receive WIC During Pregnancy < 0.001 < 0.001
 No 325 (25.6%) 852 (17.4%)
 Yes 375 (20.0%) 323 (22.1%)
Physical Abuse by Partner Before Pregnancy 0.040 0.048
 No 685 (22.9%) 1,144 (18.4%)
 Yes 16 (14.5%) 25 (26.3%)
Physical Abuse by Partner During Pregnancy 0.038 0.038
 No 687 (22.9%) 1,150 (18.4%)
 Yes 13 (13.8%) 18 (28.6%)
Physical Abuse by Ex-Partner Before Pregnancy 0.2 0.065
 No 679 (22.8%) 1,134 (18.3%)
 Yes 20 (17.2%) 31 (24.8%)
Physical Abuse by Ex-Partner During Pregnancy 0.13 0.12
 No 688 (22.9%) 1,152 (18.4%)
 Yes 11 (15.3%) 14 (26.9%)
Illness of Family Member 0.9 0.8
 No 535 (22.5%) 857 (18.5%)
 Yes 166 (22.8%) 317 (18.7%)
Divorce 0.9 0.019
 No 649 (22.6%) 1,112 (18.3%)
 Yes 52 (22.1%) 62 (24.0%)
Move < 0.001 0.2
 No 458 (25.2%) 749 (18.9%)
 Yes 241 (18.7%) 425 (17.7%)
Homeless 0.001 0.4
 No 676 (23.1%) 1,149 (18.4%)
 Yes 26 (14.1%) 23 (21.9%)
Job Loss of Husband/Partner 0.3 0.8
 No 625 (22.8%) 1,073 (18.5%)
 Yes 72 (20.5%) 101 (18.2%)
Job Loss of Self 0.2 0.003
 No 581 (23.0%) 1,054 (18.1%)
 Yes 120 (20.5%) 119 (23.4%)
Cut in Work Hours or Pay of Husband/Partner/Self 0.5 0.056
 No 562 (22.3%) 1,012 (18.9%)
 Yes 137 (23.5%) 161 (16.3%)
Apart from Husband/Partner 0.2 0.003
 No 672 (22.4%) 1,086 (18.1%)
 Yes 30 (27.3%) 88 (24.4%)
Argument More Than Usual 0.050 0.7
 No 530 (23.5%) 976 (18.4%)
 Yes 171 (20.2%) 197 (18.9%)
Unwanted Pregnancy by Husband/Partner 0.7 > 0.9
 No 629 (22.4%) 1,115 (18.5%)
 Yes 71 (23.5%) 58 (18.4%)
Problem Paying Bill 0.035 0.6
 No 583 (23.4%) 1,021 (18.4%)
 Yes 119 (19.4%) 153 (19.2%)
Imprisonment of Husband/Partner/Self 0.2 0.1
 No 662 (22.8%) 1,132 (18.3%)
 Yes 37 (18.8%) 37 (23.4%)
Problem with Drinking/Drugs of People Close to Me 0.045 0.6
 No 637 (23.1%) 1,014 (18.6%)
 Yes 65 (18.4%) 160 (17.9%)
Death of People Close to Me 0.076 0.2
 No 509 (21.8%) 948 (18.2%)
 Yes 192 (24.9%) 225 (19.8%)
Depression Before Pregnancy 0.4 < 0.001
 No 600 (22.6%) 895 (17.6%)
 Yes 107 (20.9%) 285 (22.3%)
Depression During Pregnancy 0.6 < 0.001
 No 569 (22.6%) 909 (17.4%)
 Yes 127 (21.5%) 256 (23.4%)
Perceived Racial Discrimination 0.008 0.021
 No 591 (23.5%) 1,128 (18.3%)
 Yes 107 (18.4%) 45 (25.1%)
No. of Loss of Pregnancy 0.4 (0.8) < 0.001 0.3 (0.8) < 0.001
Hypertension Before Pregnancy < 0.001 < 0.001
 No 595 (20.6%) 1,086 (18.0%)
 Yes 111 (40.7%) 93 (29.6%)
Diabetes Before Pregnancy 0.002 < 0.001
 No 664 (21.9%) 1,115 (18.2%)
 Yes 42 (33.6%) 63 (30.4%)
Hypertension During Pregnancy < 0.001 < 0.001
 No 432 (17.8%) 711 (14.2%)
 Yes 269 (38.9%) 460 (34.7%)
Diabetes During Pregnancy 0.035 < 0.001
 No 639 (22.0%) 1,025 (17.7%)
 Yes 62 (28.2%) 143 (26.8%)
Fever During Pregnancy 0.2 0.2
 No 695 (22.4%) 1,161 (18.6%)
 Yes 17 (17.3%) 20 (14.4%)
Medical Risk Factors < 0.001 < 0.001
 No Risks 402 (17.2%) 713 (14.8%)
 Risks 303 (36.8%) 467 (30.8%)
Premature Rupture of Membrane < 0.001 < 0.001
 No 565 (19.2%) 976 (16.5%)
 Yes 145 (55.6%) 205 (44.5%)
Body Mass Index Before Pregnancy 0.002 < 0.001
 Normal 226 (19.4%) 479 (16.0%)
 Underweight 29 (23.2%) 49 (25.4%)
 Overweight 163 (21.6%) 273 (17.9%)
 Obese 271 (26.2%) 355 (22.6%)
Weight Gain During Pregnancy < 0.001 < 0.001
 Inadequate 227 (28.4%) 350 (29.2%)
 Adequate 163 (22.6%) 321 (18.2%)
 Excessive 261 (17.8%) 458 (14.2%)
Intake of Multivitamin (Times/Week) 0.12 0.088
 0 434 (21.4%) 522 (19.4%)
 1–3 51 (20.9%) 64 (18.4%)
 4–6 20 (20.2%) 55 (14.1%)
 7 206 (25.4%) 540 (18.3%)
Pregnancy Intention < 0.001 0.7
 Later 202 (19.3%) 199 (17.4%)
 Not Sure 126 (17.8%) 140 (18.7%)
 Not Want 45 (25.4%) 28 (17.0%)
 Sooner 92 (27.4%) 261 (19.6%)
 Then 243 (27.2%) 547 (18.8%)
Start of Prenatal Care in 1st Trimester 0.025 < 0.001
 No PNC 12 (37.5%) 13 (46.4%)
 No 94 (19.0%) 84 (18.9%)
 Yes 576 (22.3%) 1,073 (18.3%)
No. of Prenatal Care Visits < 0.001 < 0.001
 <= 08 353 (46.0%) 456 (49.7%)
 09–11 153 (18.1%) 349 (19.5%)
 12+ 163 (11.3%) 312 (8.8%)
Prenatal Care Adequacy (Kotelchuck Index) < 0.001 < 0.001
 Inadequate 126 (25.1%) 112 (20.4%)
 Intermediate 33 (12.4%) 41 (10.3%)
 Adequate 122 (11.0%) 197 (7.0%)
 Adequate Plus 392 (33.0%) 771 (31.0%)
No. Cigarettes Before Pregnancy 0.8 (3.5) 0.6 2.3 (7.2) 0.11
No. Cigarettes in 1st Trimester 0.5 (2.4) 0.6 1.2 (4.3) 0.006
No. Cigarettes in 2nd Trimester 0.3 (1.8) > 0.9 0.8 (3.0) 0.017
No. Cigarettes in 3rd Trimester 0.2 (1.6) 0.8 0.7 (3.5) 0.019
E-Cigarettes Before Pregnancy 0.047 0.6
 Not use 683 (22.8%) 1,067 (18.4%)
 1 day a week or less 9 (17.6%) 27 (16.5%)
 2–6 days a week 0 (0.0%) 14 (22.6%)
 Once a day 1 (6.3%) 8 (19.0%)
 More than once a day 8 (25.0%) 53 (21.6%)
E-Cigarettes During Pregnancy 0.7 0.5
 Not use 695 (22.5%) 1,149 (18.5%)
 1 day a week or less 2 (25.0%) 7 (15.2%)
 2–6 days a week 1 (16.7%) 5 (33.3%)
 Once a day 3 (42.9%) 2 (11.1%)
 More than once a day 1 (25.0%) 9 (15.5%)
Drinking in the Last 2 Years 0.4 0.003
 No 302 (23.2%) 234 (21.7%)
 Yes 396 (22.0%) 936 (17.8%)
Contextual Factor3
Below Poverty (2014–2018) 22.8 (21.8) < 0.001 15.6 (18.2) < 0.001
Unemployed (2014–2018) 14.1 (12.4) < 0.001 9.3 (10.8) < 0.001
Low Income (2014–2018) 20.2 (17.4) < 0.001 14.9 (14.7) < 0.001
No High School Diploma (2014–2018) 17.4 (16.0) < 0.001 12.0 (13.4) < 0.001
Aged 65 Or Older (2014–2018) 8.7 (3.2) 0.2 7.1 (3.6) < 0.001
Aged 17 Or Younger (2014–2018) 9.7 (8.7) < 0.001 6.9 (7.1) < 0.001
Civilian With a Disability (2014–2018) 6.0 (6.7) < 0.001 8.9 (8.9) < 0.001
Single-Parent Households (2014–2018) 20.2 (19.2) < 0.001 12.8 (16.4) < 0.001
Minority (2014–2018) 12.6 (11.8) < 0.001 7.6 (10.5) < 0.001
Limited English Proficiency (2014–2018) 7.3 (5.2) < 0.001 5.0 (4.6) < 0.001
Multi-Unit Structures (2014–2018) 9.0 (4.8) < 0.001 7.4 (4.6) 0.2
Mobile Homes (2014–2018) 29.1 (28.4) < 0.001 17.5 (24.5) < 0.001
Crowding (2014–2018) 5.1 (4.1) < 0.001 3.5 (3.5) < 0.001
No Vehicle (2014–2018) 14.8 (13.5) < 0.001 9.6 (11.7) < 0.001
Group Quarters (2014–2018) 14.2 (8.5) < 0.001 11.7 (7.9) < 0.001
Below 150% Poverty (2016–2020) 19.5 (19.0) < 0.001 13.9 (15.5) < 0.001
Unemployed (2016–2020) 11.4 (9.5) < 0.001 7.7 (8.3) < 0.001
Housing Cost Burden (2016–2020) 17.2 (13.3) < 0.001 11.1 (12.4) < 0.001
No High School Diploma (2016–2020) 16.5 (15.9) < 0.001 10.8 (13.3) < 0.001
No Health Insurance (2016–2020) 17.5 (16.7) < 0.001 12.5 (13.8) < 0.001
Aged 65 Or Older (2016–2020) 9.9 (3.6) 0.002 8.2 (4.0) 0.035
Aged 17 Or Younger (2016–2020) 7.8 (6.4) < 0.001 5.6 (5.3) 0.057
Civilian With a Disability (2016–2020) 6.0 (6.0) < 0.001 8.4 (7.9) < 0.001
Single-Parent Households (2016–2020) 21.1 (21.5) < 0.001 13.0 (18.2) < 0.001
Limited English Proficiency (2016–2020) 7.5 (5.8) < 0.001 4.4 (5.6) < 0.001
Minority (2016–2020) 13.0 (12.2) < 0.001 7.8 (10.9) < 0.001
Multi-Unit Structures (2016–2020) 9.0 (4.9) < 0.001 7.1 (4.9) 0.9
Mobile Homes (2016–2020) 28.1 (28.3) < 0.001 16.3 (24.5) < 0.001
Crowding (2016–2020) 7.6 (5.3) < 0.001 5.8 (4.4) < 0.001
No Vehicle (2016–2020) 13.9 (12.5) < 0.001 9.4 (10.6) < 0.001
Group Quarters (2016–2020) 14.0 (8.0) < 0.001 11.7 (7.6) < 0.001

WIC = Special Supplemental Nutrition Program for Women, Infants, and Children. P-values less than 0.05 are shown in bold

1For categorical variables, we presented the frequency and percentage. For continuous variables, we presented the mean and standard deviation among women who experienced preterm birth. Values were calculated from non-missing data, and the number of missing values varied by variable. The total numbers of preterm births were 712 among non-Hispanic Black women and 1,182 among non-Hispanic White women

2Pearson’s Chi-squared test; Wilcoxon rank sum test; Fisher’s exact test

3Preterm birth groups among non-Hispanic Black and non-Hispanic White women had a higher average number of social vulnerability flags than their term birth counterparts, although data for term births were not shown here. P-values reflect the statistical significance of differences in contextual factors between preterm and term births
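
As a rough illustration of footnote 2, the sketch below runs a Pearson chi-squared test on one row of Table 2 (hypertension before pregnancy among non-Hispanic Black women). The percentages in the table are preterm birth rates within each category, so the category totals used here are back-calculated from rounded percentages. That back-calculation is an assumption made for illustration, not the authors' data, and the function name `chi2_2x2` is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Table 2, non-Hispanic Black women, hypertension before pregnancy:
# Yes: 111 preterm births (40.7%); No: 595 preterm births (20.6%).
ptb_yes, ptb_no = 111, 595

# ASSUMPTION: category totals are approximated from the rounded percentages.
total_yes = round(ptb_yes / 0.407)
total_no = round(ptb_no / 0.206)

stat, p_value = chi2_2x2(ptb_yes, total_yes - ptb_yes, ptb_no, total_no - ptb_no)
print(f"chi2 = {stat:.1f}, p = {p_value:.2e}")
```

Under these approximate counts the p-value falls below 0.001, in line with the value reported for this row.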

Our ML models demonstrated high and consistent accuracy in predicting PTB across all three models (Table 3). Among NHB primiparous women, AUC values ranged from 0.87 to 0.91, while among NHW counterparts, AUC values ranged from 0.90 to 0.93.

Table 3.

Preterm birth prediction accuracy by model and population

Elastic Net Random Forest XGBoost
Training Test Training Test Training Test
AUC (95% CI) AUC AUC (95% CI) AUC AUC (95% CI) AUC
Pooled 0.91 (0.91, 0.92) 0.91 0.91 (0.91, 0.91) 0.91 0.92 (0.91, 0.93) 0.92
N-H Black 0.90 (0.90, 0.91) 0.92 0.89 (0.88, 0.90) 0.87 0.91 (0.90, 0.92) 0.91
N-H White 0.91 (0.90, 0.91) 0.93 0.91 (0.90, 0.92) 0.90 0.92 (0.90, 0.93) 0.92

AUC area under the curve, CI confidence interval, N-H non-Hispanic
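
For readers less familiar with the metric in Table 3, AUC can be read as the probability that a randomly chosen preterm birth receives a higher predicted risk than a randomly chosen term birth. The following is a minimal sketch of that pairwise definition on hypothetical toy data; the labels and scores are invented for illustration and are not taken from the study.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (1 = preterm birth, 0 = term birth).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.81, 0.64, 0.30, 0.55, 0.22, 0.18, 0.10, 0.05]

# 14 of the 15 positive-negative pairs are ranked correctly.
print(round(auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation of preterm from term births.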

Our models identified both shared and distinct predictors of PTB among NHB and NHW primiparous women in this study (Figs. 1 and 2). Shared individual predictors deemed important included the number of PNC visits (more visits as protective), adequacy of PNC based on Kotelchuck Index (adequate + PNC as risk-elevating vs. intermediate PNC as protective), gestational hypertension and PROM (risk-elevating), excessive weight gain during pregnancy (protective), and medical risk factors (risk-elevating). Intriguingly, inadequate PNC was linked to lower PTB risk in NHB women but to higher risk in NHW women.

Fig. 1.

Variable Importance in Preterm Birth Prediction among non-Hispanic Black Primiparous Women. Base log-odds = -1.417 (≈ 19.5% probability). Each dot represents one participant’s contribution (SHAP value) for a given predictor. Positive SHAP values push the prediction toward higher preterm birth risk, whereas negative SHAP values push it lower. The horizontal spread shows between-person variability: a wide spread signals that a predictor's impact varies greatly across persons, while a tight cluster denotes a more uniform effect. Predictors are ordered top to bottom by their mean absolute SHAP value. *Variable scaling: All nominal and continuous predictors were standardised to 0–1 before model fitting. For binary variables, a high feature value (red) represents presence = 1 and a low value (blue) absence = 0, except for the four contextual factors (“high % poverty,” “high % unemployment,” “lower per-capita income,” and “number of pregnancy loss”), which were treated as standardised continuous measures. Grey bars are the mean |SHAP| importance

Fig. 2.

Variable Importance in Preterm Birth Prediction among non-Hispanic White Primiparous Women. Base log-odds = -1.598 (≈ 16.8% probability). Each dot represents one participant’s contribution (SHAP value) for a given predictor. Positive SHAP values push the prediction toward higher preterm birth risk, whereas negative SHAP values push it lower. The horizontal spread shows between-person variability: a wide spread signals that a predictor's impact varies greatly across persons, while a tight cluster denotes a more uniform effect. Predictors are ordered top to bottom by their mean absolute SHAP value. *Variable scaling: All nominal and continuous predictors were standardised to 0–1 before model fitting. For binary variables, a high feature value (red) represents presence = 1 and a low value (blue) absence = 0, except for the contextual factors, which were treated as standardised continuous measures. Grey bars are the mean |SHAP| importance
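
The base values in the captions of Figs. 1 and 2 can be checked with the logistic transform, and the same transform shows how SHAP contributions move an individual prediction. The sketch below assumes that SHAP values are additive on the log-odds scale, which is the usual convention for tree-based classifiers. The example contributions are hypothetical magnitudes chosen for illustration, not a participant from the study.

```python
import math

def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Base log-odds reported in the captions of Figs. 1 and 2.
BASE_NHB = -1.417
BASE_NHW = -1.598

def predicted_probability(base_log_odds, shap_values):
    """SHAP additivity: prediction (log-odds) = base value + sum of SHAP values."""
    return sigmoid(base_log_odds + sum(shap_values))

print(round(sigmoid(BASE_NHB), 3))  # approx. 0.195, as in the Fig. 1 caption
print(round(sigmoid(BASE_NHW), 3))  # approx. 0.168, as in the Fig. 2 caption

# ASSUMPTION: hypothetical participant with one risk-increasing and one
# protective contribution.
example_shap = [0.69, -0.42]
print(round(predicted_probability(BASE_NHB, example_shap), 3))
```

Because the transform is non-linear, the same SHAP value shifts the predicted probability by different amounts depending on the starting log-odds.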

Importantly, contextual factors signaling area-based socioeconomic disadvantage—namely high poverty and unemployment rates and low per-capita income—were influential in both groups, often outranking several individual factors. Among NHW women, two additional contextual characteristics—high proportions of individuals without a high-school diploma and single-parent households—also emerged as key predictors.

Lastly, broadening the XGBoost model to all 85 candidate predictors did not enhance predictive accuracy over the more parsimonious, screened version. AUC remained virtually unchanged at 0.92 for NHB women and 0.91 for NHW women. Excluding the definitional variable PROM had similarly little impact (AUC = 0.90 and 0.92, respectively). Feature importance profiles were also highly consistent: the top 5–6 predictors were identical across models, shifting only in rank order (detailed results are available on request).

Discussion

Using nationally representative data in conjunction with ML, this study predicted PTB among NHB and NHW primiparous women in the U.S. with high accuracy. By capturing more effectively the ways that multidimensional SDoH manifest, our ML models with SHAP values showed Black-White differences in which factors contributed to predicting PTB and to what extent, offering a more nuanced explanation of such disparities. The number of PNC visits [34], adequacy of PNC [35], gestational hypertension [36], and the degree of weight gain during pregnancy [37] were found to play the most important roles in determining PTB risk in both groups. However, contextual socioeconomic disadvantage [38] (for NHB and NHW women) and high rates of individuals without a high school diploma [39] and single-parent households (for NHW women) also significantly contributed to predicting PTB, even to a greater extent than some of the well-known individual factors. The sensitivity analyses confirmed the robustness of our parsimonious, screened models.

Black-White inequities in PTB are driven by SDoH that lead to differential access to resources between the groups [13] based on systematic acts of exclusion and discrimination [40]. One significant example, with repercussions that persist today, is the practice of “redlining” (1930–1960s), whereby the federal U.S. government intentionally and systematically promoted segregation and then disinvested in communities where Black people lived [41]. Consequently, in the U.S., NHB women tend to suffer greater exposure to health-compromising SDoH, whereas NHW women tend to benefit from greater exposure to health-enhancing SDoH.

Ongoing systematic differentials in exposure and access to various SDoH have translated to the heterogeneity of the commonly observed drivers of PTB between NHB and NHW women [22, 42]. Secondary to the long- and short-term effects of systematic discrimination [43], NHB women in our study were significantly more likely to be of lower socioeconomic status and to live in areas with greater social vulnerability across all SVI domains. Accordingly, NHB women who experienced PTB reported higher rates of obesity before pregnancy and inadequate weight gain during pregnancy, likely due to the “food desert residency” associated with socially vulnerable areas [44]. Pregnancy intention and the number of prior pregnancy losses also emerged as critical areas of risk for NHB women.

Importantly, our ML models detected the significant contribution of contextual vulnerability, particularly concentrated socioeconomic disadvantage (poverty, lower income, unemployment, and lower education) and single-parent households, to PTB, above and beyond proximal individual factors. This aligns with prior research documenting that community deprivation [45, 46] and high concentrations of single-parent households [47] contribute to Black-White disparities in PTB. Concordant with the existing evidence [48, 49], we also found that community deprivation exerted a stronger influence on PTB among NHB than NHW women. Specifically, area-based poverty and unemployment ranked as the fourth and sixth most important predictors of PTB among NHB women, while area-based lower education and unemployment emerged as the eleventh and twelfth among NHW women. However, findings across studies remain mixed, with some reporting greater vulnerability to community deprivation among NHW women [45]. This underscores the need for continued research to elucidate the mechanisms by which diverse dimensions of contextual disadvantage contribute to the Black-White disparities in PTB.

While the role of contextual factors in PTB is not entirely new, our use of advanced data analytics, particularly SHAP values, strengthens the evidence for their contribution to Black-White disparities in PTB. By evaluating all possible combinations of individual and contextual factors in PTB prediction, we identified contextual predictors unique to NHB and NHW women. These predictors consistently demonstrated a significant impact, at times surpassing the influence of individual factors.

Our findings also highlight that how PNC is used matters more than simply how much is used for both NHB and NHW women. To mitigate reverse-causation bias where complicated pregnancies accumulate extra visits, we modeled both the total number of PNC visits and the adequacy of PNC utilization via Kotelchuck Index. Adequate + PNC utilization emerged as the second-strongest, risk-increasing predictor of PTB, followed by ≥ 12 PNC visits (protective) in both groups. Because excessive PNC utilization often flags pregnancies already experiencing complications, assessing PNC adequacy, not just visit counts, adds predictive value and helps clinicians identify at-risk women early in pregnancy.
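
To make the distinction between visit counts and adequacy concrete, the sketch below classifies prenatal care use with the commonly cited Kotelchuck (Adequacy of Prenatal Care Utilization) cut-points, which compare observed visits with the number expected for the length of the pregnancy. The function name and the example numbers are hypothetical, and the expected visit count is taken as an input rather than derived from a visit schedule. It is not the authors' code.

```python
def kotelchuck_category(initiation_month, observed_visits, expected_visits):
    """Classify prenatal care (PNC) use with the standard Kotelchuck cut-points.

    initiation_month: month of pregnancy in which PNC began (None = no PNC).
    expected_visits: visits recommended given when care began and when the
        pregnancy ended; supplied by the caller in this sketch.
    """
    if expected_visits <= 0:
        raise ValueError("expected_visits must be positive")
    if initiation_month is None:
        return "Inadequate"
    ratio = observed_visits / expected_visits
    if initiation_month > 4 or ratio < 0.50:
        return "Inadequate"
    if ratio < 0.80:
        return "Intermediate"
    if ratio < 1.10:
        return "Adequate"
    return "Adequate Plus"

# Hypothetical example: a pregnancy ending early, with care begun in month 2
# and 11 visits against 8 expected, is classified as Adequate Plus.
print(kotelchuck_category(2, 11, 8))
```

Because the expected count shrinks when a pregnancy ends early, a shortened pregnancy with frequent monitoring can land in the Adequate Plus category even with a modest absolute number of visits, which is consistent with the reverse-causation concern raised above.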

Although not directly examined in this study, we posit a cumulative impact of multidimensional SDoH on PTB, particularly among NHB women, because we observed a weathering effect, manifested as a PTB rate that increased with advancing maternal age in this group. At the same chronological age, the impact of older age on PTB was greater among NHB than NHW women. These findings suggest a need for further investigation into NHB women’s aging accelerated by chronic stress from multidimensional health-compromising SDoH over the life course as an underlying mechanism of the Black-White disparities in PTB. In addition to chronic stress, psychological distress (e.g., depression) may further accelerate biological aging [50]. Emerging evidence shows that depressive symptoms and their biological sequelae differ by race/ethnicity, reflecting the interplay of gendered racism, cultures/norms, and inequitable care [51, 52]. Failing to recognize these racialized expressions of distress can compound stress burdens and thereby magnify PTB risk among NHB women [51].

This study has important implications for reducing Black-White inequities in PTB. First, future research is encouraged to use advanced modeling techniques that account for the complex associations between multidimensional determinants of health, in order to accurately identify both women at high risk for PTB and the important, population-specific predictors for prevention and early intervention. ML is useful for revealing significant associations between contextual factors and PTB that linear models would otherwise leave masked by proximal individual factors. In this way, ML has the potential to advance our understanding of the drivers of the Black-White disparities in PTB in the U.S. From a policy standpoint, our findings underscore the need for local, state, and federal governments to invest in building/rebuilding socially vulnerable communities throughout the U.S.; a living wage and accessible and affordable nutritious food are paramount to positive birth outcomes. Our findings also highlight the need for equitable access to quality healthcare before and during pregnancy. Clinicians should also develop holistic care plans that address individual women’s health and social needs. This can be achieved by assessing both individual and contextual risk factors and connecting women to available resources, including financial counseling, food banks, and transportation.

This study has some limitations. Because the PRAMS only includes state-level geographic identifiers, we were unable to merge it with the SVI at the county or census tract level for more granular analysis. As a result, the aggregation of contextual factors at the state level limits our ability to detect within-state heterogeneity and to fully characterize disparities at more localized levels. It is likely that analyses using finer geographic units would reveal wider Black-White gaps in contextual disadvantage and associated PTB risk. Additionally, because pregnancy intention is assessed postpartum in the PRAMS, it may be subject to recall bias or social desirability bias, potentially leading to misclassification and underreporting of unintended pregnancies. All these limitations stemming from secondary data may have affected the models’ prediction accuracy. Moreover, aggregating different PTB types into a single outcome may have obscured predictors for specific PTB types, limiting the clinical applicability of our study findings. Because our study population was restricted to primiparous women, findings may not generalize to multiparous births, where recurrent PTB and different risk profiles are common. Replicating the present analysis in multiparous women will be an important next step.

Moreover, bias may have arisen from median imputation if missingness is not random, from the exclusion of variables with > 10% missing data, and from potential overfitting of ML models. Lastly, not accounting for PRAMS sampling weights could affect model performance and generalizability. However, stratified modeling by race and ethnicity helped address NHB women’s underrepresentation, improving prediction accuracy for this group.

Conclusion

This study utilized ML to uncover the pivotal role of contextual SDoH in shaping PTB risk among NHB and NHW primiparous women in the U.S. It highlights how contextual socioeconomic disadvantage exacerbates individual risk factors, underscoring the profound, generational consequences of structural racism on maternal health. Addressing Black-White inequities in PTB requires systemic efforts to dismantle racial segregation, reverse community disinvestment, and break the cycle of poverty. Without these transformative changes, the persistent Black-White disparities will remain entrenched.

Supplementary Information

Supplementary Material 1. (122.6KB, docx)

Acknowledgements

We thank the PRAMS Working Group, which includes the PRAMS Team, Division of Reproductive Health, CDC and the following PRAMS sites for their role in conducting PRAMS surveillance and allowing the use of their data: PRAMS Florida, PRAMS Georgia, PRAMS Iowa, PRAMS Indiana, PRAMS Minnesota, PRAMS Missouri, PRAMS North Carolina, PRAMS Wisconsin, PRAMS Wyoming.

Authors’ contributions

SK conceived and designed the study; obtained, analyzed, interpreted, and visualized the data; drafted, edited, and finalized the manuscript; administered the research project; and secured funding. ZB contributed to drafting, reviewing, and editing the manuscript. SG drafted the manuscript. ADFS contributed to drafting, reviewing, and editing the manuscript. BD contributed to drafting, reviewing, and editing the manuscript. MSB conceptualized the study, interpreted the data, drafted, reviewed, and edited the manuscript, and supervised the study. All authors have read and approved the final manuscript.

Funding

Research reported in this publication was supported by the National Institute of Nursing Research of the National Institutes of Health under Award Numbers K01NR019651 and R25NR021324.

Data availability

The data that support the findings of this study are available from the CDC PRAMS and SVI. The SVI data are freely available from the CDC website. However, access to PRAMS data requires approval from the CDC PRAMS team, and thus the data are not publicly available. Requests for the PRAMS data used in this study can be directed to the corresponding author.

Declarations

Ethics approval and consent to participate

As this study used publicly available, deidentified data, it was deemed exempt from ethical review by Emory University Institutional Review Board (STUDY00003674).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Ohuma EO, Moller AB, Bradley E, et al. National, regional, and global estimates of preterm birth in 2020, with trends from 2010: a systematic analysis. Lancet. 2023;402(10409):1261–71.
  • 2. Osterman M, Hamilton B, Martin J, Driscoll A, Valenzuela C. Births: Final Data for 2022. 2024. 10.15620/cdc:145588.
  • 3. Behrman RE, Butler AS. Mortality and acute complications in preterm infants. Preterm birth: causes, consequences, and prevention. Washington, DC: National Academies Press (US); 2007.
  • 4. Ionio C, Colombo C, Brazzoduro V, et al. Mothers and fathers in NICU: the impact of preterm birth on parental distress. Eur J Psychol. 2016;12(4):604–21. 10.5964/ejop.v12i4.1093.
  • 5. Beam AL, Fried I, Palmer N, et al. Estimates of healthcare spending for preterm and low-birthweight infants in a commercially insured population: 2008–2016. J Perinatol. 2020;40(7):1091–9. 10.1038/s41372-020-0635-z.
  • 6. Braveman P, Dominguez TP, Burke W, et al. Explaining the Black-White disparity in preterm birth: a consensus statement from a multi-disciplinary scientific work group convened by the March of Dimes. Front Reprod Health. 2021. 10.3389/frph.2021.684207.
  • 7. Reno R, Burch J, Stookey J, Jackson R, Joudeh L, Guendelman S. Preterm birth and social support services for prenatal depression and social determinants. PLoS One. 2021;16(8):e0255810.
  • 8. Preis H, Wang W, Zhu W, Mahaffey B, Lobel M. Social determinants of health, prenatal maternal stress, and earlier birth during the COVID-19 pandemic. Soc Personal Psychol Compass. 2023;17(7):e12751.
  • 9. Koning SM, Ehrenthal DB. Stressor landscapes, birth weight, and prematurity at the intersection of race and income: elucidating birth contexts through patterned life events. SSM-population Health. 2019;8:100460.
  • 10. Hong X, Bartell TR, Wang X. Gaining a deeper understanding of social determinants of preterm birth by integrating multi-omics data. Pediatr Res. 2021;89(2):336–43.
  • 11. Chae DH, Clouston S, Martz CD, et al. Area racism and birth outcomes among Blacks in the United States. Soc Sci Med. 2018;199:49–55.
  • 12. Kino S, Hsu YT, Shiba K, et al. A scoping review on the use of machine learning in research on social determinants of health: trends and research prospects. SSM-population Health. 2021;15:100836.
  • 13. World Health Organization. Operational Framework for Monitoring Social Determinants of Health Equity. Geneva: World Health Organization; 2024. License: CC BY-NC-SA 3.0 IGO. https://www.who.int/publications/i/item/9789240088320. Accessed 15 Aug 2025.
  • 14. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. 10.1023/A:1010933404324.
  • 15. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67(2):301–20. 10.1111/j.1467-9868.2005.00503.x.
  • 16. Arabi Belaghi R, Beyene J, McDonald SD. Prediction of preterm birth in nulliparous women using logistic regression and machine learning. PLoS One. 2021;16(6):e0252025.
  • 17. Togunwa TO, Babatunde AO, Abdullah K. Ur R. Deep hybrid model for maternal health risk classification in pregnancy: synergy of ANN and random forest. Front Artif Intell. 2023. 10.3389/frai.2023.1213436.
  • 18. Esty A, Frize M, Gilchrist J, Bariciak E. Applying data preprocessing methods to predict premature birth. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2018:6096–6099.
  • 19. Frize M, Yu N, Weyand S. Effectiveness of a hybrid pattern classifier for medical applications. Int J Hybrid Intell Syst. 2011;8(2):71–9.
  • 20. Koivu A, Sairanen M. Predicting risk of stillbirth and preterm pregnancies with machine learning. Health Inf Sci Syst. 2020;8(1):14. 10.1007/s13755-020-00105-9.
  • 21. Pan I, Nolan LB, Brown RR, et al. Machine learning for social services: a study of prenatal case management in Illinois. Am J Public Health. 2017;107(6):938–44.
  • 22. Kim S, Brennan PA, Slavich GM, Hertzberg V, Kelly U, Dunlop AL. Black-white differences in chronic stress exposures to predict preterm birth: interpretable, race/ethnicity-specific machine learning model. BMC Pregnancy Childbirth. 2024;24(1):1–16.
  • 23. Centers for Disease Control and Prevention. About PRAMS. 2022. Accessed December 5, 2023. https://cdc.gov/prams/about/prams-faq.htm
  • 24. Shulman HB, D’Angelo DV, Harrison L, Smith RA, Warner L. The pregnancy risk assessment monitoring system (PRAMS): overview of design and methodology. Am J Public Health. 2018;108(10):1305–13. 10.2105/AJPH.2018.304563.
  • 25. Agency for Toxic Substances and Disease Registry. CDC/ATSDR Social Vulnerability Index (CDC/ATSDR SVI). Overview. June 14, 2024. Accessed July 25, 2024. https://www.atsdr.cdc.gov/placeandhealth/svi/index.html
  • 26. Agency for Toxic Substances and Disease Registry. CDC SVI Documentation 2020. Centers for Disease Control and Prevention. Published online 2022. https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/SVI_documentation_2020.html
  • 27. U.S. Department of Health and Human Services. Social Determinants of Health. Accessed October 25, 2024. https://health.gov/healthypeople/priority-areas/social-determinants-health
  • 28. Institute of Medicine (US) and National Research Council (US) Committee to Reexamine IOM Pregnancy Weight Guidelines. In: Rasmussen KM, Yaktine AL, editors. Weight gain during pregnancy: reexamining the guidelines. National Academies Press (US); 2009.
  • 29. Bennett DA. How can I deal with missing data in my study? Aust N Z J Public Health. 2001;25(5):464–9. http://www.ncbi.nlm.nih.gov/pubmed/11688629.
  • 30. Chollet F, Kalinowski T, Allaire JJ. Deep learning with R, second edition. Manning; 2022.
  • 31. Wong K, Tessema GA, Chai K, Pereira G. Development of prognostic model for preterm birth using machine learning in a population-based cohort of Western Australia births between 1980 and 2015. Sci Rep. 2022;12(1):19153. 10.1038/s41598-022-23782-w.
  • 32. Molnar C. 9. Local Model-Agnostic Methods. In: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed.; 2022. Accessed November 19, 2024. christophm.github.io/interpretable-ml-book/
  • 33. Covert I, Lundberg S, Lee S. Explaining by removing: a unified framework for model explanation. J Mach Learn Res. 2021;22(209):1–90.
  • 34. Champion ML, Bushman ET, Martin KD, et al. Reevaluating associations between prenatal care utilization and current trends in preterm birth. Am J Perinatol. 2024;41(13):1880–6. 10.1055/a-2295-6524.
  • 35. Krueger P, Scholl T. Adequacy of prenatal care and pregnancy outcome. J Osteopath Med. 2000;100(8):485–92.
  • 36. Wilson DA, Mateus J, Ash E, Turan TN, Hunt KJ, Malek AM. The association of hypertensive disorders of pregnancy with infant mortality, preterm delivery, and small for gestational age. Healthcare. 2024;12(5):597. 10.3390/healthcare12050597.
  • 37. Santos S, Voerman E, Amiano P, et al. Impact of maternal body mass index and gestational weight gain on pregnancy complications: an individual participant data meta-analysis of European, North American and Australian cohorts. BJOG. 2019;126(8):984–95. 10.1111/1471-0528.15661.
  • 38. Blumenshine P, Egerter S, Barclay CJ, Cubbin C, Braveman PA. Socioeconomic disparities in adverse birth outcomes: a systematic review. Am J Prev Med. 2010;39(3):263–72. 10.1016/J.AMEPRE.2010.05.012.
  • 39. Auger N, Gamache P, Adam-Smith J, Harper S. Relative and absolute disparities in preterm birth related to neighborhood education. Ann Epidemiol. 2011;21(7):481–8. 10.1016/J.ANNEPIDEM.2011.03.012.
  • 40. U.S. Department of Health and Human Services. Health Equity in Healthy People 2030. Office of Disease Prevention and Health Promotion, Office of the Assistant Secretary for Health, Office of the Secretary, U.S. Department of Health and Human Services. 2022. Accessed November 21, 2024. https://health.gov/healthypeople/priority-areas/health-equity-healthy-people-2030
  • 41. Hollenbach SJ, Thornburg LL, Glantz JC, Hill E. Associations between historically redlined districts and racial disparities in current obstetric outcomes. JAMA Netw Open. 2021;4(9):e2126707. 10.1001/jamanetworkopen.2021.26707.
  • 42. Kim S, Im EO, Liu J, Ulrich C. Factor structure for chronic stress before and during pregnancy by racial/ethnic group. West J Nurs Res. 2019;41(5):704–27. 10.1177/0193945918788852.
  • 43. Paradies Y, Ben J, Denson N, et al. Racism as a determinant of health: a systematic review and meta-analysis. PLoS One. 2015;10(9):e0138511. 10.1371/journal.pone.0138511.
  • 44. Haley CO, Singleton CR, King LE, Dyer L, Theall KP, Wallace M. Association of food desert residency and preterm birth in the United States. Int J Environ Res Public Health. 2024;21(4):412. 10.3390/ijerph21040412.
  • 45. Ncube CN, Enquobahrie DA, Albert SM, Herrick AL, Burke JG. Association of neighborhood context with offspring risk of preterm birth and low birthweight: a systematic review and meta-analysis of population-based studies. Soc Sci Med. 2016;153:156–64. 10.1016/j.socscimed.2016.02.014.
  • 46. O’Campo P, Burke JG, Culhane J, et al. Neighborhood deprivation and preterm birth among Non-Hispanic black and white women in eight geographic areas in the United States. Am J Epidemiol. 2007;167(2):155–63. 10.1093/aje/kwm277.
  • 47. Khan SS, Vaughan AS, Harrington K, et al. US county–level variation in preterm birth rates, 2007–2019. JAMA Netw Open. 2023;6(12):e2346864. 10.1001/jamanetworkopen.2023.46864.
  • 48. Janevic T, Stein CR, Savitz DA, Kaufman JS, Mason SM, Herring AH. Neighborhood deprivation and adverse birth outcomes among diverse ethnic groups. Ann Epidemiol. 2010;20(6):445–51. 10.1016/J.ANNEPIDEM.2010.02.010.
  • 49. Blackson EA, Williams B, Judson J, Bell C, Wallace M. The association between structural disadvantage and adverse birth outcomes: analyzing preterm birth and low birth weight using the structural racism effect index. J Racial Ethn Health Disparities. Published online April 29, 2025. 10.1007/s40615-025-02454-1.
  • 50. Saeed H, Wu J, Tesfaye M, Grantz KL, Tekola-Ayele F. Placental accelerated aging in antenatal depression. Am J Obstet Gynecol MFM. 2024;6(1):101237. 10.1016/J.AJOGMF.2023.101237.
  • 51. Mayne GB, Ghidei L. The impact of devaluing women of color: stress, reproduction, and justice. Birth. 2024;51(2):245–52. 10.1111/birt.12825.
  • 52. Perez NB, D’Eramo Melkus G, Wright F, et al. Latent class analysis of depressive symptom phenotypes among black/African American mothers. Nurs Res. 2023;72(2):93–102. 10.1097/NNR.0000000000000635.

Articles from BMC Pregnancy and Childbirth are provided here courtesy of BMC
