BMC Medical Informatics and Decision Making. 2025 Nov 10;25:415. doi: 10.1186/s12911-025-03254-7

Predicting the risk of preterm birth with machine learning and electronic health records in China

Lushuai Qian 1,#, Hanyue Jia 1,#, Zhou Chang 1, Yanjun Hu 2, Chunling Chen 2, Xiaoqing Li 2, Hongping Zhang 2
PMCID: PMC12604261  PMID: 41214653

Abstract

Background

Preterm birth is a serious global public health issue, and early prediction in pregnant women is crucial for timely intervention and for reducing the incidence of preterm birth. We aimed to predict and validate the risk of preterm birth with machine learning, deep learning, and electronic health records in China.

Materials and methods

Data were collected from 58,424 pregnant women between May 2015 and April 2024. After excluding incomplete records, a total of 36,378 cases were included, consisting of 34,132 full-term births and 2,246 preterm births. Of the 24 known high-risk factors for preterm birth, 20 statistically significant features were identified for model construction. Six machine learning algorithms were applied to process the data containing missing values, and 22 models were developed for predicting preterm births using the imputed data. Additionally, two dynamic deep learning methods were incorporated in our model development process.

Results

Among the machine learning models, the Random Forest model performed best on both the dataset with missing values and the imputed dataset, achieving a maximum AUC of 0.826. The LightGBM model also exhibited strong performance, even with fewer features. Of the two deep learning models, the LSTM model performed better, with an AUC of 0.851. Additionally, data from 10,367 pregnant women, collected between May and December 2024, were used as an external validation set, confirming the model’s stability.

Conclusions

The findings of this study indicate that both machine learning and deep learning models using electronic health records are valuable for preterm birth risk screening, supporting their use in clinical practice for preterm birth risk management.

Clinical trial number

Not applicable.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12911-025-03254-7.

Keywords: Machine learning, Deep learning, Electronic health records, Prediction, Preterm birth

Introduction

Preterm birth is defined as delivery prior to 37 completed weeks of gestation [1, 2]. Globally, approximately 15 million infants are born preterm each year, corresponding to an overall preterm birth rate of approximately 11%, with notable regional variations: 13.0% in India, 10.0% in the United States and 6.1% in China [3, 4]. Notably, more than 1 million of these preterm infants worldwide die before five years of age as a result of preterm birth and its associated complications [3]. Surviving preterm infants often face a range of short-term and long-term health issues, including diabetes, hypertension, neurodevelopmental disorders, and an increased risk of cognitive dysfunction [5]. In the United States alone, the economic burden associated with preterm birth is estimated to reach $26.2 billion annually [6]. These challenges place a significant psychological and financial strain on families and pose a major challenge to public health systems.

Early identification of pregnant women at risk for preterm birth, coupled with the implementation of appropriate preventive measures, can significantly reduce the incidence of preterm birth and improve neonatal health outcomes [7]. Research has demonstrated that the use of prenatal steroids and tocolytics can extend gestation in women at risk of preterm birth [8], and with effective medical intervention, the survival rate of preterm infants can reach up to 75% [6]. Therefore, accurate prediction of preterm birth risk is essential, as it not only allows healthcare teams to intervene early but also enables the creation of tailored care plans for pregnant women, thereby mitigating potential risks.

Studies have shown that various types of data, such as electronic health records (EHRs), serological data, molecular genetics, and metabolomics, can be used to predict preterm birth [9–16]. Among these, EHRs stand out as a simpler and more accessible source of information. EHRs are large, readily available, and continuously updated databases that offer unique advantages in disease prediction, particularly for pregnant women, whose regular examinations make the data more complete and reliable for preterm birth risk assessment [17]. Yanli et al. developed a preterm birth prediction model using 13 indicators, including age, education level, race, pre-pregnancy weight, and pre-pregnancy diabetes, derived from the U.S. National Vital Statistics System (NVSS) database for 2018–2019, achieving an AUC of 0.688 [18]. Abraham et al. combined various data sources from EHRs, integrating demographic factors, clinical history, laboratory tests, and genetic risk, to predict preterm birth using machine learning, resulting in an AUC of 0.75 [19]. These findings suggest that effectively integrating and analyzing diverse data sources within EHRs can improve the accuracy and reliability of preterm birth risk prediction and provide strong support for clinical decision-making and intervention.

Traditional risk factor assessment methods rely on simple linear statistical models, which are limited in their ability to identify many potential risk factors, particularly when predicting high-risk pregnancies. In contrast, machine learning models can analyze large amounts of complex medical data, identify potential risk factors for preterm birth, capture linear and non-linear correlations between features and outcomes, and remain robust in the presence of missing data [20], which limits the impact of incomplete information on prediction accuracy and enhances the overall performance and reliability of the model. Deep learning models complement this advantage through their dynamic data processing capabilities: they excel at capturing time-varying or sequential patterns in medical data that are often overlooked by traditional methods [21–23]. Together, by extracting valuable information from extensive EHRs, machine learning and deep learning offer new pathways for disease prediction, diagnosis, and prognosis assessment, making them powerful tools for disease risk prediction and preventive management [21, 23, 24].

This study aimed to predict the risk of preterm birth using EHR data from Wenzhou People’s Hospital, China, spanning May 2015 to April 2024. Notably, the preterm birth rate among the study population in this hospital was 6.17%, which aligns closely with the national average preterm birth rate in China. A total of 22 machine learning algorithms and 2 deep learning approaches were employed for model development, and all developed models underwent systematic optimization and validation to confirm their reliability and practical application value. The findings of this research aim to enhance clinical preterm birth prediction and early intervention, ultimately contributing to improved neonatal health outcomes.

Methods

Data source and participants

The data used in this study were derived from the electronic health records of pregnant women who delivered at Wenzhou People’s Hospital in Zhejiang Province, China, between May 2015 and April 2024, with complete and documented obstetric histories. The demographic and clinical characteristics of all subjects were extracted from hospital databases, and a total of 58,424 cases (covering two or more births in the same person during the aforementioned time period) were included in the study. Additionally, data from 10,367 women who delivered between May 2024 and December 2024 were included as an external validation set. This study was approved by the Ethics Committee of Wenzhou People’s Hospital in accordance with the Declaration of Helsinki.

Data collection and preprocessing

We used the data of all pregnant women who gave birth in the hospital between May 2015 and April 2024 as the training and test sets. The data were first cleaned to exclude pregnant women with incomplete personal information, a missing last menstrual period, a missing time of birth of the fetus, fetal malformation or death, or an age under 18 years. A total of 36,378 cases were ultimately included for model construction. Subsequently, gestational age at birth was calculated by subtracting the date of the last menstrual period from the infant’s birth date. Cases with a gestational age of less than 37 weeks were defined as preterm, while those with a gestational age of 37 weeks or more were defined as full-term [25]. Among these cases, 2,246 were classified as preterm and 34,132 as full-term. The cases were then divided into training and testing datasets in a 7:3 ratio, with the training set used for model development and the testing set used for internal validation.
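The labeling rule and the 7:3 split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the example dates, the seeded random shuffle, and the choice of a non-stratified split are all hypothetical, since the paper does not describe how the split was drawn.

```python
import random
from datetime import date

PRETERM_CUTOFF_DAYS = 37 * 7  # 37 completed weeks = 259 days


def gestational_age_days(lmp: date, birth: date) -> int:
    """Gestational age at birth in days: birth date minus LMP date."""
    return (birth - lmp).days


def is_preterm(lmp: date, birth: date) -> bool:
    """True when delivery occurs before 37 completed weeks."""
    return gestational_age_days(lmp, birth) < PRETERM_CUTOFF_DAYS


def split_7_3(records, seed=0):
    """Shuffle a copy of the records and split them 70% / 30% (assumed scheme)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (last menstrual period, infant birth date)
records = [
    (date(2023, 1, 1), date(2023, 9, 16)),    # 258 days -> preterm
    (date(2023, 1, 1), date(2023, 9, 17)),    # 259 days -> full-term
    (date(2023, 3, 10), date(2023, 12, 15)),  # 280 days -> full-term
]
labels = [is_preterm(lmp, birth) for lmp, birth in records]

# Split ten hypothetical case identifiers into training and testing sets
train, test = split_7_3(range(10))
```

Working in whole days avoids rounding ambiguity at the 37-week boundary: a delivery at 36 weeks and 6 days (258 days) is labeled preterm, and one at exactly 37 weeks (259 days) is labeled full-term.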

Feature selection

We selected 24 high-risk factors for preterm birth as candidate features [18, 26–30]. Among them, maternal age (36,378 cases), cervical length (6,100 cases), and pre-pregnancy body mass index (BMI, 4,519 cases) were numerical features. Degree of education (26,419 cases) was divided into 7 categories: primary school, middle school, high school, associate degree, bachelor’s degree, master’s degree, and doctoral degree. History of preterm birth, history of miscarriage, in vitro fertilization (IVF), pre-pregnancy hypertension, pre-pregnancy diabetes, polycystic ovary syndrome (PCOS), scarred uterus, multiple pregnancies, placenta previa, gestational diabetes, gestational hypertension, eclampsia, gestational anemia, gestational thyroid dysfunction, gestational chorioamnionitis, gestational vaginitis, uterine fibroids, hydramnios or oligohydramnios, premature rupture of membranes, and fetal distress were all categorical indicators with no missing values (36,378 cases). Pre-pregnancy BMI was calculated as pre-pregnancy weight (kg) / [height (m)]².

To reduce inconsistencies caused by variable physician documentation habits, we standardized several indicators. “Maternal age” was calculated from the infant’s confirmed birth time and the maternal birth date extracted from China’s unique national ID card. Indicators such as “medical history” and “family genetic history” were unified into predefined categorical variables for consistent statistical analysis. For “cervical length”, we only included data measured by transvaginal ultrasound [31], ensuring uniformity in measurement methods.

Indicators that showed statistically significant differences between preterm and full-term women were selected as model features. First, the normality of age, cervical length, and pre-pregnancy BMI data was assessed using the Kolmogorov-Smirnov test, and all were found to follow a normal distribution. Independent-samples t-tests were then used for comparisons, with results expressed as mean ± standard deviation. Categorical data were described using frequency (percentage) and compared using the chi-square test. A p-value of less than 0.05 was considered statistically significant. All statistical analyses were performed using the SciPy library in Python 3.11.5.
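As a worked check of the chi-square comparison, the statistic can be recomputed from the published counts. The sketch below uses only the Python standard library rather than SciPy, and it assumes the Pearson statistic without continuity correction; that assumption is ours, but it reproduces the values reported in Table 1 for history of preterm birth (χ² = 242.56) and in vitro fertilization (χ² = 68.23). The function name `pearson_chi2` is illustrative.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Counts from Table 1: rows = No / Yes, columns = full-term / preterm
history_ptb = [[33474, 2090], [658, 156]]
ivf = [[33782, 2180], [350, 66]]

chi2_history = pearson_chi2(history_ptb)
chi2_ivf = pearson_chi2(ivf)

# For one degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2 / 2))
p_history = math.erfc(math.sqrt(chi2_history / 2))
```

With SciPy, the corresponding call would be `scipy.stats.chi2_contingency(table, correction=False)`; whether the authors disabled the correction is not stated in the text.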

Establishing preterm birth prediction models based on machine learning algorithms

In this study, we initially selected six machine learning methods capable of handling missing values to construct preterm birth prediction models. These methods include Random Forest, Gradient Boosting Algorithms (Hist Gradient Boosting, XGBoost, LightGBM, CatBoost), and Decision Trees. Subsequently, we imputed missing values in cervical canal length data using the mean and in educational level data using the mode. Additionally, we employed KNN imputation and model-based imputation to further explore the impact of different missing value handling methods on model performance. We employed 22 machine learning methods to construct preterm birth prediction models, including 7 ensemble methods (Hist Gradient Boosting, XGBoost, CatBoost, LightGBM, Random Forest, AdaBoost, Gradient Boosting), 3 Naive Bayes methods (BernoulliNB, GaussianNB, MultinomialNB), 6 linear models (Logistic Regression, Logistic RegressionCV, Linear Discriminant Analysis, Ridge Classifier, Perceptron, SGD Classifier), 2 nearest neighbor methods (KNeighbors, Nearest Centroid), and 2 semi-supervised learning methods (Label Propagation, Label Spreading), as well as Decision Trees and Neural Networks.
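The mean and mode imputation described above can be illustrated as follows. This is a minimal standard-library sketch; the helper name `impute`, the example values, and the integer coding of education level are hypothetical, and KNN or model-based imputation is not shown.

```python
from statistics import mean, mode


def impute(values, strategy):
    """Replace None entries with the mean or mode of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical data: cervical length in mm and education level as category codes
cervical_length_mm = [34.0, None, 30.0, None, 38.0]
education_level = [3, 4, None, 4, 2]

cervical_imputed = impute(cervical_length_mm, "mean")  # numeric feature -> mean
education_imputed = impute(education_level, "mode")    # categorical feature -> mode
```

In a train/test setting, the fill values would usually be computed on the training split and reused for the test and external validation data; the paper does not state which convention was followed.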

Each model was first trained on each feature individually, and the features were ranked according to their area under the curve (AUC) values. Subsequently, features were incrementally added in ranked order to determine the optimal combination. During model training, hyperparameters were optimized using grid search. Model performance was evaluated using receiver operating characteristic (ROC) curves, AUC, and accuracy. All procedures were performed using the scikit-learn library in Python version 3.11.5.
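The ranking-then-adding procedure can be sketched without any modeling library. The example below is an illustration under simplifying assumptions: the raw feature value is used directly as the risk score instead of fitting each classifier on the single feature, the toy data and feature names are hypothetical, and the model fitting and grid search steps are omitted.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(labels, features):
    """Sort (feature name, single-feature AUC) pairs, best first."""
    aucs = {name: roc_auc(labels, values) for name, values in features.items()}
    return sorted(aucs.items(), key=lambda item: item[1], reverse=True)


def top_k_subsets(ranking):
    """Feature subsets evaluated when adding one ranked feature at a time."""
    names = [name for name, _ in ranking]
    return [names[:k] for k in range(1, len(names) + 1)]


# Hypothetical toy data: 1 = preterm, 0 = full-term
y = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "prom": [1, 1, 0, 0, 0, 1, 0, 0],
    "short_cervix": [1, 0, 0, 0, 1, 0, 0, 0],
    "anemia": [1, 0, 1, 1, 0, 1, 0, 1],
}
ranking = rank_features(y, features)
subsets = top_k_subsets(ranking)
```

Each subset in `subsets` would then be passed to a classifier, and the subset with the highest test-set AUC retained, which is how the 9-feature and 19-feature combinations in the Results were presumably identified.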

External validation set for the model

This study selected 10,367 pregnant women who delivered between May 2024 and December 2024 as the external validation set. After data cleaning, 233 preterm births and 3,679 full-term births were included. The optimal prediction model and corresponding parameters were used for external validation.

Continuous dynamic modeling

To address the limitation of traditional static modeling, which only reflects information at a single time point, this study constructed a multi-stage time-series prediction framework based on Long Short-Term Memory (LSTM). The core input of this framework is cervical length data collected at multiple time points during pregnancy. The LSTM model tracks how cervical length evolves across the stages of pregnancy, which supports a dynamic assessment of preterm birth risk. In addition, we used a Transformer to build a second dynamic model, which also takes cervical length data at multiple time points as input.
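Sequence models such as LSTM and Transformer generally expect inputs of equal length, whereas the number of cervical length measurements differs between women. One common way to prepare such data is right-padding with a mask, sketched below. This is an assumed preprocessing step shown for illustration only: the paper does not report the number of time points, the padding scheme, or how irregular visit schedules were handled, and the helper name and values are hypothetical.

```python
def pad_sequences(sequences, max_len, pad_value=0.0):
    """Right-pad each visit sequence to max_len; return padded values and a 0/1 mask."""
    padded, mask = [], []
    for seq in sequences:
        seq = list(seq)[:max_len]  # truncate sequences longer than max_len
        n_missing = max_len - len(seq)
        padded.append(seq + [pad_value] * n_missing)
        mask.append([1] * len(seq) + [0] * n_missing)
    return padded, mask


# Hypothetical cervical length measurements (mm), ordered by gestational week
visits = [
    [36.0, 34.5, 31.0],  # three measurements
    [38.0],              # a single measurement
    [35.0, 29.0],        # two measurements
]
x, m = pad_sequences(visits, max_len=3)
```

The mask lets a downstream model ignore padded positions, so that a padded zero is not mistaken for a measured cervical length of 0 mm.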

Data visualization

The visualizations of the results of this study were drawn using the Matplotlib library of Python version 3.11.5.

Results

Study design and determination of preterm birth risk features

We collected EHR information from pregnant women who delivered between May 2015 and April 2024. After excluding 1,224 cases with incomplete ID numbers, 66 cases under the age of 18, 18,017 cases with a missing last menstrual period, and 2,019 cases with a missing birth time, 34,132 full-term cases and 2,246 preterm cases were finally included. We selected 24 previously reported high-risk factors for preterm birth as candidate features, including 10 pre-pregnancy indicators and 14 pregnancy indicators. Statistically significant features were identified by analyzing the differences between the full-term and preterm groups. First, we used six machine learning algorithms to predict preterm birth on the dataset containing missing values. Subsequently, after imputing the missing data, 22 models were used for prediction. The best-performing model and feature combination were selected and externally validated using electronic health records of pregnant women from May 2024 to December 2024 (Fig. 1).

Fig. 1.


Schematic representation of the study. The experimental design included data collection, data preprocessing, feature selection, optimal model selection, model validation, and model evaluation. HGB, Hist Gradient Boosting Classifier; XGB, XGB Classifier; CB, CatBoost Classifier; LGBM, LGBM Classifier; RF, Random Forest Classifier; DT, Decision Tree Classifier; AB, AdaBoost Classifier; GB, Gradient Boosting Classifier; BNB, BernoulliNB; GNB, GaussianNB; MNB, MultinomialNB; KNN, KNeighbors Classifier; NC, Nearest Centroid; LDA, Linear Discriminant Analysis; LR, Logistic Regression; LRCV, Logistic RegressionCV; RC, Ridge Classifier; SGD, SGD Classifier; PCP, Perceptron; MLP, MLP Classifier; LP, Label Propagation; LS, Label Spreading; LSTM, Long Short-Term Memory

Feature selection

To identify the feature variables for constructing a preterm birth prediction model, we performed a statistical analysis of 24 candidate risk factors (Table 1). The results showed that compared to full-term mothers, preterm mothers were significantly older (P < 0.001), and had significantly shorter cervical lengths (P < 0.001). The number of women with a history of preterm birth (P < 0.001), a history of miscarriage (P < 0.001), and those who underwent in vitro fertilization (P < 0.001) was significantly higher. There was also a significant increase in the prevalence of pre-pregnancy hypertension (P < 0.001), polycystic ovary syndrome (P < 0.001), gestational diabetes (P < 0.001), gestational hypertension (P < 0.001), eclampsia (P < 0.001), gestational anemia (P < 0.001), gestational chorioamnionitis (P < 0.001), gestational vaginitis (P = 0.004), uterine fibroids (P < 0.001), hydramnios or oligohydramnios (P = 0.018), placenta previa (P < 0.001), multiple pregnancies (P < 0.001), premature rupture of membranes (P < 0.001), and fetal distress (P < 0.001), along with a lower degree of education (P < 0.001). Consequently, we ultimately selected these 20 statistically significant indicators as feature variables to construct the preterm birth prediction model.

Table 1.

Baseline characteristics of women with preterm and full-term births

Variables Non-preterm birth (n = 34,132) Preterm birth (n = 2,246) Statistic P value
Age 30.70 ± 4.57 31.13 ± 4.98 t=-4.02 < 0.001
Degree of education χ²=72.94 < 0.001
 Primary school 724 (2.93) 90 (5.26)
 Middle school 5417 (21.92) 468 (27.37)
 High school 4215 (17.06) 299 (17.49)
 Associate degree 7935 (32.11) 504 (29.47)
 Bachelor’s degree 6012 (24.33) 327 (19.12)
 Master’s degree 397 (1.61) 21 (1.23)
 Doctoral degree 9 (0.04) 1 (0.06)
History of preterm birth χ²=242.56 < 0.001
 No 33,474 (98.07) 2090 (93.05)
 Yes 658 (1.93) 156 (6.95)
History of miscarriage χ²=13.60 < 0.001
 No 15,482 (45.36) 929 (41.36)
 Yes 18,650 (54.64) 1317 (58.64)
In-vitro fertilization or not χ²=68.23 < 0.001
 No 33,782 (98.97) 2180 (97.06)
 Yes 350 (1.03) 66 (2.94)
Pre-pregnancy BMI 21.04 ± 3.55 21.16 ± 3.21 t=-0.56 0.577
Pre-pregnancy hypertension χ²=45.76 < 0.001
 No 34,019 (99.67) 2218 (98.75)
 Yes 113 (0.33) 28 (1.25)
Pre-pregnancy diabetes χ²=2.76 0.097
 No 33,986 (99.57) 2231 (99.33)
 Yes 146 (0.43) 15 (0.67)
Polycystic ovary syndrome χ²=21.67 < 0.001
 No 34,021 (99.67) 2225 (99.07)
 Yes 111 (0.33) 21 (0.93)
Scarred uterus χ²=1.74 0.187
 No 25,756 (75.46) 1667 (74.22)
 Yes 8376 (24.54) 579 (25.78)
Multiple pregnancies χ²=1817.52 < 0.001
 No 33,880 (99.26) 1983 (88.29)
 Yes 252 (0.74) 263 (11.71)
Length of cervical canal 34.80 ± 5.56 29.87 ± 8.91 t = 13.72 < 0.001
Placenta previa χ²=1084.05 < 0.001
 No 33,591 (98.41) 1972 (87.80)
 Yes 541 (1.59) 274 (12.20)
Gestational diabetes χ²=65.27 < 0.001
 No 28,339 (83.03) 1715 (76.36)
 Yes 5793 (16.97) 531 (23.64)
Gestational hypertension χ²=67.93 < 0.001
 No 32,593 (95.49) 2059 (91.67)
 Yes 1539 (4.51) 187 (8.33)
Gestational eclampsia χ²=610.46 < 0.001
 No 33,660 (98.62) 2053 (91.41)
 Yes 472 (1.38) 193 (8.59)
Gestational anaemia χ²=190.31 < 0.001
 No 22,753 (66.66) 1177 (52.40)
 Yes 11,379 (33.34) 1069 (47.60)
Gestational thyroid dysfunction χ²=1.25 0.263
 No 31,795 (93.15) 2106 (93.77)
 Yes 2337 (6.85) 140 (6.23)
Gestational chorioamnionitis χ²=508.26 < 0.001
 No 30,304 (88.78) 1633 (72.71)
 Yes 3828 (11.22) 613 (27.29)
Gestational vaginitis χ²=8.43 0.004
 No 33,671 (98.65) 2199 (97.91)
 Yes 461 (1.35) 47 (2.09)
Uterine fibroids χ²=18.01 < 0.001
 No 31,857 (93.33) 2044 (91.01)
 Yes 2275 (6.67) 202 (8.99)
Hydramnios or oligohydramnios χ²=5.58 0.018
 No 31,249 (91.55) 2024 (90.12)
 Yes 2883 (8.45) 222 (9.88)
Premature rupture of membranes χ²=870.05 < 0.001
 No 29,781 (87.25) 1457 (64.87)
 Yes 4351 (12.75) 789 (35.13)
Fetal distress χ²=36.23 < 0.001
 No 31,560 (92.46) 1998 (88.96)
 Yes 2572 (7.54) 248 (11.04)

Data are expressed as frequency (percentage) or mean ± SD (standard deviation). Continuous variables were compared using the independent-samples t-test and categorical variables using the chi-square test

Selection of the optimal preterm birth prediction model based on machine learning algorithms

Preterm birth prediction using six machine learning algorithms

To construct a prediction model for preterm birth, we initially employed six machine learning algorithms capable of handling missing values, including Random Forest, Gradient Boosting algorithms (Hist Gradient Boosting, XGBoost, LightGBM, CatBoost), and Decision Tree. We predicted preterm birth using each of the 20 features individually (Fig. 2) and found that, regardless of the model used, premature rupture of membranes was the strongest single predictor, with the AUC values for all six algorithms reaching 0.61. This was followed by chorioamnionitis and cervical length.
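The single-feature AUC of about 0.61 for premature rupture of membranes can be related to the counts in Table 1. For one yes/no feature, any model that assigns a higher risk when the feature is present has an ROC curve with a single interior point, so the AUC reduces to (1 + TPR − FPR) / 2. The sketch below applies this identity; it assumes the whole-cohort counts from Table 1, whereas the reported AUCs were computed on the held-out test set, so only approximate agreement should be expected. The function name is illustrative.

```python
def binary_feature_auc(pos_with, pos_total, neg_with, neg_total):
    """AUC of a single yes/no feature: the ROC curve has one interior point (FPR, TPR)."""
    tpr = pos_with / pos_total  # share of preterm cases with the feature
    fpr = neg_with / neg_total  # share of full-term cases with the feature
    return (1 + tpr - fpr) / 2


# Counts from Table 1 (preterm n = 2,246; full-term n = 34,132)
prom_auc = binary_feature_auc(789, 2246, 4351, 34132)  # premature rupture of membranes
gca_auc = binary_feature_auc(613, 2246, 3828, 34132)   # gestational chorioamnionitis
```

Under these assumptions the value for premature rupture of membranes is about 0.61 and the value for chorioamnionitis about 0.58, which matches the ordering described above and explains why all six algorithms reach the same AUC on a single binary feature.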

Fig. 2.


The predictive power of individual features for preterm birth across different models was evaluated. Six machine learning algorithms were used to predict preterm birth without any preprocessing of missing data

Further, we ranked the features in each model in descending order of individual predictive ability and added them to the model one at a time in that order. The AUC of each model gradually increased with the number of features (Fig. 3A). When the top 9 features were used in each model, LightGBM achieved the highest AUC value of 0.809, with an accuracy of 0.941 (Fig. 3B). The included features were: premature rupture of membranes, gestational chorioamnionitis, cervical length, gestational anemia, degree of education, multiple pregnancies, placenta previa, maternal age, and eclampsia. This was closely followed by Random Forest (AUC = 0.806), CatBoost (AUC = 0.806), Hist Gradient Boosting (AUC = 0.805), XGBoost (AUC = 0.805), and Decision Tree (AUC = 0.784). Among all models, Random Forest performed best when 19 features were used, achieving the highest AUC value of 0.826 with an accuracy of 0.940 (Fig. 3C).

Fig. 3.


Finding the best combination of features in different models. (A) AUC values of the different models as the number of features is gradually increased. (B) The receiver operating characteristic curve of the LightGBM model using 9 features. (C) The receiver operating characteristic curve of the Random Forest model using 19 features. Missing values were left unprocessed. (D) The SHapley Additive exPlanations (SHAP) summary plot for the LightGBM model, which quantifies feature importance and the direction of each feature’s association with preterm birth risk. (E) The SHAP summary plot for the Random Forest model. PROM, premature rupture of membranes; PPV, placenta previa; GCA, gestational chorioamnionitis; MP, multiple pregnancies; CCL, length of cervical canal; GA, gestational anaemia; GE, gestational eclampsia; PTB Hx, history of preterm birth; Educ, degree of education; GDM, gestational diabetes; FD, fetal distress; H/O, hydramnios or oligohydramnios; GH, gestational hypertension; UF, uterine fibroids; Hx of Misc, history of miscarriage; IVF or not, In-vitro fertilization or not; PP HTN, pre-pregnancy hypertension; GV, gestational vaginitis

We conducted SHAP-based feature interpretation for the LightGBM and Random Forest models to quantify feature importance and clarify their association direction with preterm birth. For LightGBM, premature rupture of membranes, gestational chorioamnionitis, placenta previa, and shorter cervical length significantly increased preterm birth risk (Fig. 3D). For Random Forest, a history of preterm birth, education level, and gestational diabetes also elevated preterm birth likelihood (Fig. 3E).

Preterm birth prediction using 22 machine learning algorithms on imputed data

To explore the predictive ability of more machine learning algorithms for preterm birth, we imputed the missing values in the data. Subsequently, we used 22 machine learning methods to build single-feature prediction models for each of the 20 features (Fig. 4). Similar to the results of the six models trained on data with missing values, premature rupture of membranes demonstrated good predictive performance in most models, with AUC values reaching 0.61, followed by gestational chorioamnionitis and cervical length. However, in the Multinomial Naive Bayes, Perceptron, and Ridge Classifier models, the contribution of individual features was relatively balanced, with AUC values around 0.5 (close to chance level).

Fig. 4.


The predictive power of a single feature for preterm birth in different models. Twenty-two machine learning algorithms were used to predict preterm birth, and missing values were imputed

Further, we ranked the features in each model in descending order of predictive ability and added them one at a time in that order. The predictive ability of most models improved as features were added, but that of the KNeighbors, LabelPropagation, Nearest Centroid, and Perceptron models decreased. When the number of features reached 9, 10 models, including XGBoost, CatBoost, Gradient Boosting, and MLP Classifier, achieved an AUC value greater than 0.800, with LightGBM having the highest AUC value of 0.809. When the number of features was further increased to 19, the Random Forest model had the strongest predictive ability, maintaining an AUC value of 0.826, with an accuracy of 0.941 (Fig. 5). Meanwhile, the AUC values of 7 models, including XGBoost, CatBoost, Gradient Boosting, and MLP Classifier, also reached 0.82. As with the six models trained on data with unprocessed missing values, the Random Forest model performed well when given sufficient features, providing stable and efficient predictions.

Fig. 5.


The best combination of features was found among 22 models. Missing values were imputed

External model validation

We found that the Random Forest model showed the highest AUC for preterm birth prediction both on the data with unprocessed missing values (Fig. 3C) and after imputation (Fig. 5). We then used 3,912 cases from May to December 2024 as an external validation set to evaluate the model’s performance. On the dataset without missing value imputation, the model achieved an AUC of 0.814, an accuracy of 0.892, and a recall of 0.494 (Fig. 6A). After imputing the missing values, the model obtained an AUC of 0.815, an accuracy of 0.890, and a recall of 0.515 (Fig. 6B).
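Because preterm births make up a small share of the external validation set (233 preterm and 3,679 full-term cases, 3,912 in total), accuracy and recall capture different aspects of performance. The short sketch below defines both metrics from confusion-matrix counts and evaluates a hypothetical baseline that labels every case as full-term; the baseline is our illustration and is not a model from the study.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of true preterm cases that the classifier identifies."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Class counts of the external validation set after data cleaning
n_preterm, n_full_term = 233, 3679

# Hypothetical baseline: predict "full-term" for every case
baseline_acc = accuracy(tp=0, fp=0, tn=n_full_term, fn=n_preterm)
baseline_recall = recall(tp=0, fn=n_preterm)
```

The baseline reaches an accuracy of about 0.94 with a recall of 0, which is why recall and AUC are reported alongside accuracy when judging how well a model identifies women at risk.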

Fig. 6.


ROC curves of the Random Forest and LightGBM models on the external validation set. (A) Receiver operating characteristic curve of the Random Forest model with 19 features on the external validation set with unprocessed missing values. (B) Receiver operating characteristic curve of the Random Forest model with 19 features on the external validation set with imputed missing values. (C) Receiver operating characteristic curve of the LightGBM model with 9 features on the external validation set with unprocessed missing values. (D) Receiver operating characteristic curve of the LightGBM model with 9 features on the external validation set with imputed missing values

In addition, we found that the LightGBM algorithm predicted well with fewer features (9 in total), and we validated this on the external validation set. On the dataset with unprocessed missing values, the model achieved an AUC of 0.793, an accuracy of 0.831, and a recall of 0.605 (Fig. 6C). After imputing the missing values, the model had an AUC of 0.797, an accuracy of 0.823, and a recall of 0.622 (Fig. 6D). The external validation showed good predictive results in both cases, which suggests that the LightGBM algorithm can be considered when only a smaller number of features is available. We also employed KNN imputation and model-based imputation methods, and validation on the external validation set showed AUC values and other metrics similar to the results above (Supplementary Fig. 1A-D).

Construction and validation of preterm birth prediction models based on deep learning

To fully explore the dynamic associations of temporal features in pregnancy data, this study introduced an LSTM network. Model performance verification showed that, compared with static models not incorporating temporal features, adding cervical length temporal data significantly improved the LSTM model’s predictive performance on the test set. The model achieved an AUC of 0.851 and a high recall rate of 0.757 (Fig. 7A), indicating its effectiveness in identifying high-risk preterm birth populations. In the external validation set, the model maintained stable performance with an AUC of 0.811 and a recall rate of 0.647 (Fig. 7B), confirming the key role of temporal features in enhancing model generalization. Additionally, we constructed a Transformer model, which achieved an AUC of 0.850 and a recall of 0.553 on the test set (Fig. 7C), and maintained stable performance in the external validation set with an AUC of 0.807 and a recall of 0.506 (Fig. 7D). Among all models, the LSTM model exhibited the best overall performance. Furthermore, we employed KNN imputation and model-based imputation methods, with validation results showing AUC values and other metrics similar to those mentioned above (Supplementary Fig. 2A-H).

Fig. 7.


Prediction performance of the LSTM model integrating multi-time-point cervical length and static features. (A) Receiver Operating Characteristic (ROC) curve of the LSTM model for preterm birth prediction on the test set. (B) ROC curve of the LSTM model for preterm birth prediction on the validation set. (C) ROC curve of the Transformer model for preterm birth prediction on the test set. (D) ROC curve of the Transformer model for preterm birth prediction on the validation set

Discussion

This study analyzed nearly 10 years of EHRs from pregnant women and identified 20 features closely related to preterm birth (PTB). Using these 20 features, a series of PTB prediction models were developed, incorporating 22 machine learning algorithms and 2 deep learning methods. Among the machine learning models, the Random Forest model achieved a maximum AUC of 0.826. Notably, across all constructed models, the LSTM dynamic prediction model outperformed the others, reaching the highest AUC of 0.851. This outcome lays a solid foundation for the early identification of preterm birth risk and preventive interventions, offering the potential to provide more precise management plans for pregnant women in clinical practice.

First, our study addresses the small-sample limitation of prior PTB prediction research by using a much larger dataset: over 40,000 retrospective EHRs collected over nearly a decade. This scale far exceeds the sample sizes of previous studies, which typically included only several hundred to several thousand cases [26, 32, 33]. This large cohort enhances statistical power and the reliability of model training. Second, we optimized the balance between performance and clinical practicality. Existing PTB prediction models often require more features yet achieve lower AUC values: for example, one study needed 13 features to reach an AUC of 0.68 [18], while another relied on 12 key variables to attain an AUC of 0.75 [34]. In contrast, our LightGBM model uses only 9 features to attain an AUC of 0.809. Fewer features streamline data collection and model use in clinical settings. Third, unlike most static, single-time-point machine learning models [35–37], we developed deep learning-based dynamic frameworks integrated with multi-time-point cervical length data. These models capture temporal pregnancy changes and maintain high recall rates, which is critical for reducing missed high-risk PTB diagnoses.

We found that premature rupture of membranes (PROM), gestational chorioamnionitis, cervical length, and gestational anemia had stronger predictive power for preterm birth. This is consistent with the findings of Ramachandran et al. [38] and Zhang et al. [26], indicating that these factors have strong predictive ability in assessing preterm birth risk. PROM can lead to the loss of amniotic fluid, increasing the risk of intrauterine infection and triggering contractions, which may result in preterm birth [1]. Gestational chorioamnionitis is often accompanied by elevated inflammatory mediators, which are closely associated with adverse perinatal outcomes [39]. Additionally, pregnant women with a cervical length < 25 mm, especially in the second trimester, typically face a higher preterm birth risk. This may be due to increased tension in the cervical canal, which raises the likelihood of uterine contractions and thus induces preterm birth [40]. These factors play a critical role in the pathogenesis of preterm birth.

Missing data are a common issue in clinical research, especially in studies involving electronic health records. This study employed two approaches: one trained models on data with missing values left unprocessed, and the other on data in which missing values had been imputed. On the unprocessed data, the Random Forest model achieved an AUC of 0.826, and after imputation the AUC remained 0.826. This indicates that the Random Forest model is highly robust to missing data. Its advantage lies in building a large number of decision trees and aggregating their results through voting or averaging, which improves accuracy, reduces overfitting, and gives stable performance on complex, high-dimensional data [41, 42]. In contrast, KNeighbors, Label Propagation, Label Spreading, Nearest Centroid, Perceptron, and Ridge Classifier did not perform well in prediction. This may be because these models are sensitive to data size and feature selection, and too many features can increase model complexity and thereby impair generalization. To further verify the impact of the imputation method on model performance, we additionally reprocessed the missing data using K-nearest neighbor (KNN) imputation and model-based imputation. The results were largely consistent, indicating that the choice of imputation strategy had no significant impact on model stability.
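As an illustration of the KNN approach mentioned above, the sketch below fills each missing value with the mean of that feature over the k most similar records, measuring similarity only on features that both records have observed. It is a simplified, assumption-laden example: the feature columns and values are hypothetical, and the source does not state which implementation or settings the authors used.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by a k-nearest-neighbour mean.

    Distance is the root mean squared difference over features observed in
    both rows, so records with different missingness patterns stay comparable.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common, cannot compare
        return math.sqrt(sum(shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that have feature j observed and are comparable.
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d != math.inf:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical columns: cervical length (mm), hemoglobin (g/L).
rows = [
    [30.0, 120.0],
    [32.0, 118.0],
    [20.0, 95.0],
    [31.0, None],  # hemoglobin missing
]
filled = knn_impute(rows, k=2)
# The last record is closest to the first two, so it receives (120 + 118) / 2.
```

In practice the features would be standardized before computing distances, and a library routine such as scikit-learn's `KNNImputer` would normally replace this hand-written loop. Comparing model AUC before and after such imputation is one way to check the kind of stability the paragraph describes.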

Our study has some limitations. The dataset was derived from a single hospital in Zhejiang Province, China, which limits the generalizability of the results. Future studies should incorporate more prospective data to further validate the clinical practicality and reliability of the models. A multi-center collaborative design is recommended, enrolling participants from at least 3–5 representative medical institutions across different geographic regions to cover diverse demographic backgrounds and clinical management styles. Notably, beyond preterm birth risk screening, our framework could also be extended to predict preterm neonate outcomes [43]. By integrating postnatal clinical data with our existing prenatal feature set, the framework has the potential to develop into a full-cycle assessment tool for maternal-neonatal health.

Conclusions

This study developed machine learning and deep learning-based preterm birth prediction models and validated their effectiveness. The results showed that the LSTM model performed best in predicting preterm birth. Whether or not the data contained missing values, the AUC of the model reached 0.851, indicating stable and efficient prediction. The model holds promise for use in clinical practice for preterm risk screening and management, offering a robust tool for identifying at-risk pregnant women. Future research could focus on further optimizing feature selection and algorithms to enhance the model’s applicability and accuracy across different populations and healthcare settings.

Supplementary Information

Below is the link to the electronic supplementary material.

Acknowledgements

Not applicable.

Abbreviations

IVF: in vitro fertilization
PCOS: polycystic ovary syndrome
EMR: electronic medical records
HGB: Hist Gradient Boosting Classifier
XGB: XGB Classifier
CB: CatBoost Classifier
LGBM: LGBM Classifier
RF: Random Forest Classifier
DT: Decision Tree Classifier
AB: AdaBoost Classifier
GB: Gradient Boosting Classifier
BNB: BernoulliNB
GNB: GaussianNB
MNB: MultinomialNB
KNN: KNeighbors Classifier
NC: Nearest Centroid
LDA: Linear Discriminant Analysis
LR: Logistic Regression
LRCV: Logistic RegressionCV
RC: Ridge Classifier
SGD: SGD Classifier
PCP: Perceptron
MLP: MLP Classifier
LP: Label Propagation
LS: Label Spreading

Author contributions

Lushuai Qian: Data interpretation, Writing-original draft, Data analysis. Hanyue Jia: Writing-original draft, Data analysis, Data visualization. Zhou Chang: Data analysis. Yanjun Hu: Data interpretation. Chunling Chen: Data collection. Xiaoqing Li: Supervision, Methodology, Review & editing. Hongping Zhang: Conceptualization, Project administration, Data Curation.

Funding

This study was supported by the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China under Grant No.LBY23H200008, the Medical Health Science and Technology Project of Zhejiang Provincial under Grant No.2023RC272 and 2022KY1207, and the Science and Technology Planning Project of Wenzhou under Grant No.Y2023088 and ZY2021025.

Data availability

The data supporting the findings of this study were obtained from Wenzhou People’s Hospital. However, restrictions apply to the availability of these data, which were used under license for the current study and are not publicly accessible. Data may be available from the authors upon reasonable request and with permission from Wenzhou People’s Hospital.

Declarations

Ethics approval

The study was approved by the ethics committee of Wenzhou People’s Hospital (ethics number: WRY2021-215) in accordance with the Declaration of Helsinki.

Consent to participate

All subjects provided informed written consent.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lushuai Qian and Hanyue Jia contributed equally to this work.

Contributor Information

Xiaoqing Li, Email: wzslixq@163.com.

Hongping Zhang, Email: zjzhp@126.com.

References

  • 1. Goldenberg RL, Culhane JF, Iams JD, et al. Epidemiology and causes of preterm birth. Lancet. 2008;371(9606):75–84. 10.1016/s0140-6736(08)60074-4.
  • 2. Vogel JP, Chawanpaiboon S, Moller A-B, et al. The global epidemiology of preterm birth. Best Pract Res Clin Obstet Gynecol. 2018;52:3–12. 10.1016/j.bpobgyn.2018.04.003.
  • 3. Walani SR. Global burden of preterm birth. Int J Gynecol Obstet. 2020;150(1):31–3. 10.1002/ijgo.13195.
  • 4. Ohuma EO, Moller AB, Bradley E, et al. National, regional, and global estimates of preterm birth in 2020, with trends from 2010: a systematic analysis. Lancet. 2023;402(10409):1261–71. 10.1016/s0140-6736(23)00878-4.
  • 5. Blencowe H, Cousens S, Chou D, et al. Born too soon: the global epidemiology of 15 million preterm births. Reprod Health. 2013;10(1). 10.1186/1742-4755-10-s1-s2. S2.
  • 6. Institute of Medicine Committee on Understanding, Premature B, Assuring Healthy O. Preterm birth: Causes, Consequences, and prevention. Washington (DC): National Academy of Sciences; 2007. 10.17226/11622.
  • 7. Ville Y, Rozenberg P. Predictors of preterm birth. Best Pract Res Clin Obstet Gynecol. 2018;52:23–32. 10.1016/j.bpobgyn.2018.05.002.
  • 8. Haas DM, Imperiale TF, Kirkpatrick PR, et al. Tocolytic therapy: a meta-analysis and decision analysis. Obstet Gynecol. 2009;113(3):585–94. 10.1097/AOG.0b013e318199924a.
  • 9. Thomakos N, Daskalakis G, Papapanagiotou A, et al. Amniotic fluid interleukin-6 and tumor necrosis factor-α at mid-trimester genetic amniocentesis: relationship to intra-amniotic microbial invasion and preterm delivery. Eur J Obstet Gynecol Reproductive Biology. 2010;148(2):147–51. 10.1016/j.ejogrb.2009.10.027.
  • 10. Chakoory O, Barra V, Rochette E, et al. DeepMPTB: a vaginal microbiome-based deep neural network as artificial intelligence strategy for efficient preterm birth prediction. Biomark Res. 2024;12(1). 10.1186/s40364-024-00557-1.
  • 11. Hashemi L, Shahshahan Z. Maternal serum cytokines in the prediction of preterm labor and response to tocolytic therapy in preterm labor women. Adv Biomedical Res. 2014;3(1). 10.4103/2277-9175.133243.
  • 12. Cordeiro CN, Savva Y, Vaidya D, et al. Mathematical modeling of the biomarker milieu to characterize preterm birth and predict adverse neonatal outcomes. Am J Reprod Immunol. 2016;75(5):594–601. 10.1111/aji.12502.
  • 13. Camunas-Soler J, Gee EPS, Reddy M, et al. Predictive RNA profiles for early and very early spontaneous preterm birth. Am J Obstet Gynecol. 2022;227(1):72e71-72.e16. 10.1016/j.ajog.2022.04.002.
  • 14. Considine EC, Khashan AS, Kenny LC. Screening for preterm birth: potential for a metabolomics biomarker panel. Metabolites. 2019;9(5). 10.3390/metabo9050090.
  • 15. Goffinet F. Primary predictors of preterm labour. BJOG. 2005;112(Suppl 1):38–47. 10.1111/j.1471-0528.2005.00583.x.
  • 16. Kloska A, Harmoza A, Kloska SM, et al. Predicting preterm birth using machine learning methods. Sci Rep. 2025;15(1):5683. 10.1038/s41598-025-89905-1.
  • 17. Abul-Husn NS, Kenny EE. Personalized medicine and the power of electronic health records. Cell. 2019;177(1):58–69. 10.1016/j.cell.2019.02.039.
  • 18. Li Y, Fu X, Guo X, et al. Maternal preterm birth prediction in the united states: a case-control database study. BMC Pediatr. 2022;22(1). 10.1186/s12887-022-03591-w.
  • 19. Abraham A, Le B, Kosti I, et al. Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth. BMC Med. 2022;20(1). 10.1186/s12916-022-02522-x.
  • 20. Li Y, Lou Y, Liu M, et al. Machine learning based biomarker discovery for chronic kidney disease–mineral and bone disorder (CKD-MBD). BMC Med Inf Decis Mak. 2024;24(1). 10.1186/s12911-024-02421-6.
  • 21. Mumuni A, Mumuni F. Automated data processing and feature engineering for deep learning and big data applications: A survey. J Inform Intell. 2025;3(2):113–53. 10.1016/j.jiixd.2024.01.002.
  • 22. Xue H, Huynh DQ, Reynolds M. PoPPL: pedestrian trajectory prediction by LSTM with automatic route class clustering. IEEE Trans Neural Netw Learn Syst. 2021;32(1):77–90. 10.1109/tnnls.2020.2975837.
  • 23. Jisna VA, Jayaraj PB. Protein structure prediction: conventional and deep learning perspectives. Protein J. 2021;40(4):522–44. 10.1007/s10930-021-10003-y.
  • 24. Nti IK, Owusu-Boadu B. A hybrid boosting ensemble model for predicting maternal mortality and sustaining reproductive. Smart Health. 2022;26:100325. 10.1016/j.smhl.2022.100325.
  • 25. World Health O. Born too soon: the global action report on preterm birth. Geneva; World Health Organization. 10.1186/1742-4755-10-s1-s1.
  • 26. Zhang Y, Du S, Hu T, et al. Establishment of a model for predicting preterm birth based on the machine learning algorithm. BMC Pregnancy Childbirth. 2023;23(1). 10.1186/s12884-023-06058-7.
  • 27. Szecsi PB, Arabi Belaghi R, Beyene J, et al. Prediction of preterm birth in nulliparous women using logistic regression and machine learning. PLoS ONE. 2021;16(6). 10.1371/journal.pone.0252025.
  • 28. Begum M, Redoy RM, Das Anty A. Preterm Baby Birth Prediction using Machine Learning Techniques [Z]. 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD). 2021: 50–54. 10.1109/icict4sd50815.2021.9396933.
  • 29. Gao C, Osmundson S, Velez Edwards DR, et al. Deep learning predicts extreme preterm birth from electronic health records. J Biomed Inform. 2019;100. 10.1016/j.jbi.2019.103334.
  • 30. Hershey M, Burris HH, Cereceda D, et al. Predicting the risk of spontaneous premature births using clinical data and machine learning. Inf Med Unlocked. 2022;32. 10.1016/j.imu.2022.101053.
  • 31. Kuusela P, Wennerholm UB, Fadl H, et al. Second trimester cervical length measurements with transvaginal ultrasound: A prospective observational agreement and reliability study. Acta Obstet Gynecol Scand. 2020;99(11):1476–85. 10.1111/aogs.13895.
  • 32. Zhang J, Pan M, Zhan W, et al. Two-stage nomogram models in mid-gestation for predicting the risk of spontaneous preterm birth in twin pregnancy. Arch Gynecol Obstet. 2021;303(6):1439–49. 10.1007/s00404-020-05872-0.
  • 33. Guerby P, Fillion A, Pasquier JC, et al. Evaluation of midtrimester cervical length thresholds for the prediction of spontaneous preterm birth. J Gynecol Obstet Hum Reprod. 2022;51(2):102287. 10.1016/j.jogoh.2021.102287.
  • 34. Ding L, Yin X, Wen G, et al. Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China. BMC Pregnancy Childbirth. 2024;24(1):810. 10.1186/s12884-024-06980-4.
  • 35. Xiu Y, Lin Z, Pan M. Development and validation of a risk prediction model for spontaneous preterm birth. Am J Transl Res. 2024;16(11):6500–9. 10.62347/tnwa5229.
  • 36. Huang X, Zhou Y, Liu B, et al. Prediction model for spontaneous preterm birth less than 32 weeks of gestation in low-risk women with mid-trimester short cervical length: a retrospective cohort study. BMC Pregnancy Childbirth. 2024;24(1):621. 10.1186/s12884-024-06822-3.
  • 37. Bitar G, Liu W, Tunguhan J, et al. A machine learning algorithm using clinical and demographic data for all-cause preterm birth prediction. Am J Perinatol. 2024;41:01. 10.1055/s-0043-1776917.
  • 38. Ramachandran A, Clottey KD, Gordon A, et al. Prediction and prevention of preterm birth: quality assessment and systematic review of clinical practice guidelines using the AGREE II framework. Int J Gynaecol Obstet. 2024;166(3):932–42. 10.1002/ijgo.15514.
  • 39. Jain VG, Willis KA, Jobe A, et al. Chorioamnionitis and neonatal outcomes. Pediatr Res. 2022;91(2):289–96. 10.1038/s41390-021-01633-0.
  • 40. Impis Oglou M, Tsakiridis I, Mamopoulos A, et al. Cervical length screening for predicting preterm birth: A comparative review of guidelines. J Clin Ultrasound. 2023;51(3):472–8. 10.1002/jcu.23354.
  • 41. Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform. 2023;24(2). 10.1093/bib/bbad002.
  • 42. Ishwaran H, Kogalur UB, Blackstone EH, et al. Random survival forests. Annals Appl Stat. 2008;2(3). 10.1214/08-aoas169.
  • 43. Shu CH, Zebda R, Espinosa C, et al. Early prediction of mortality and morbidities in VLBW preterm neonates using machine learning. Pediatr Res. 2025;97(6):2056–64. 10.1038/s41390-024-03604-7.


Articles from BMC Medical Informatics and Decision Making are provided here courtesy of BMC
