Abstract
Objective
Term low birth weight (TLBW) elevates the risk of several health complications and even mortality in infants. By incorporating a range of perinatal factors related to maternal health, we aim to develop predictive models that can enhance clinical decision-making during pregnancy, ultimately promoting the health and well-being of both mothers and newborns.
Methods
A retrospective study was conducted on 1,559 singleton term mothers who delivered either TLBW or normal birth weight babies at our hospital between 2019 and 2023. The objective was to identify various perinatal factors associated with maternal characteristics that may contribute to the occurrence of TLBW infants. The cohort was randomly split into training and test sets in a 7:3 ratio. Boruta algorithm, Lasso regression, and logistic regression analyses were applied to identify factors influencing TLBW, with intersections visualized using Venn diagrams. Ten different machine learning algorithms were used to construct predictive models after selecting key influencing factors. Model performance was evaluated using ROC curves, calibration curves, and DCA curves.
Results
Seven key feature variables were identified and used to build the machine learning models: maternal high-risk factors, low BMI, elevated triglycerides, high albumin levels, elevated homocysteine, low HDL, and high LDL. These factors were recognized as significant risk indicators for TLBW. Among the ten machine learning algorithms tested, the GBM model showed the best predictive performance (AUC 0.972 in the training set and 0.915 in the test set). Shapley Additive Explanations (SHAP) were used to interpret the GBM model. Additionally, a web-based risk calculator for predicting TLBW was developed using the Shiny framework.
Conclusion
We developed a machine learning-based clinical prediction model to identify risk factors associated with TLBW. The model's performance was validated and evaluated on a held-out test set, supporting early identification of pregnancies at risk of TLBW. This tool can support clinical decision-making and improve maternal and neonatal care.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12887-025-06186-3.
Keywords: Maternity, Infants, Term low birth weight, Machine learning, SHAP, Shiny
Introduction
Intrauterine growth patterns can be effectively assessed through two key parameters: small for gestational age (SGA) classification and term low birth weight (TLBW) measurements. The latter refers to neonates delivered at or beyond 37 weeks of gestation with birth weights below 2,500 g, representing a distinct clinical population. These term LBW neonates exhibit specific clinical characteristics that differentiate them from their preterm counterparts, presenting a spectrum of immediate and long-term health concerns. Their clinical profile encompasses elevated risks of perinatal complications, including fetal distress and susceptibility to infections, alongside potential childhood manifestations of impaired neurological development and suboptimal physical growth [1–3]. Current epidemiological data reveal a concerning prevalence of TLBW cases, yet research efforts focusing on predictive analytics for this population remain disproportionately limited in the scientific literature.
Contemporary research in birth weight prediction has predominantly concentrated on premature infant populations or undifferentiated gestational age groups [4, 5]. Nevertheless, the pathogenesis and determinant factors underlying term LBW exhibit substantial distinctions from preterm cases. Scientific evidence indicates that TLBW demonstrates significant correlations with two primary domains: maternal nutritional indicators and environmental determinants, which are frequently inadequately addressed in current predictive frameworks [6, 7]. Furthermore, conventional statistical approaches present substantial constraints when processing complex perinatal datasets characterized by multidimensionality and nonlinear relationships, particularly in elucidating intricate interplays between various risk parameters [7, 8]. This critical gap in predictive methodology significantly hinders healthcare providers’ capacity for timely risk assessment and preventive management of TLBW cases.
The application of machine learning (ML) methodologies, characterized by their capacity to discern intricate nonlinear patterns and to improve automatically with data, has emerged as a transformative approach in medical predictive analytics [9–11]. Particularly for complex healthcare datasets marked by high dimensionality and heterogeneity, ML algorithms can synthesize diverse data sources to enhance prognostic precision [12]. This investigation pioneers the implementation of ML techniques for TLBW prediction, with three principal objectives: first, to establish a specialized predictive framework addressing the current research gap in TLBW forecasting; second, to delineate the critical perinatal determinants of TLBW outcomes; and third, to develop a clinically viable early detection tool that facilitates personalized interventions. We expect the results to support perinatal care and improve neonatal health outcomes.
Methods
Study population
The study population comprised 740 women who delivered full-term low birth weight infants at our hospital between January 2019 and December 2023, alongside 819 women who delivered full-term infants with normal birth weights during the same period, giving a total of 1,559 full-term newborns (740 TLBW and 819 controls). The sample size was determined using PASS 15 software with a two-sided test (α = 0.05, 90% power), which indicated a requirement of 372 cases per group (total N = 744). To account for a potential 20% dropout, we increased this to 465 per group (N = 930). The final cohort of 1,559 participants exceeded these requirements and also satisfied the rule of 10 events per variable (EPV), which calls for at least 180 events for our 18 candidate predictor variables. TLBW neonates were defined as infants delivered at ≥ 37 weeks with birth weight < 10th percentile without evidence of fetal growth restriction (FGR); this group comprises constitutionally small healthy neonates and cases of undiagnosed mild growth restriction. FGR was diagnosed according to the ACOG 2021, ISUOG 2020, and SMFM 2020 guidelines [13–15] by either major criteria (abdominal circumference or estimated fetal weight < 3rd percentile, or abnormal Doppler parameters) or minor criteria (declining growth velocity, oligohydramnios, or abnormal uterine artery PI). To ensure cohort homogeneity, we intentionally excluded all pregnancies with an antenatal diagnosis of FGR during retrospective data collection, because the distinct pathophysiology of FGR might confound model interpretation.
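The sample-size arithmetic above can be checked in a few lines. This minimal Python sketch reproduces only the dropout inflation and the EPV rule; the effect size entered into PASS 15 is not reported, so the base figure of 372 per group is taken as given. The function names are illustrative, not part of the authors' workflow.

```python
def inflate_for_dropout(n_per_group: int, dropout_percent: int) -> int:
    """Smallest n such that n * (1 - dropout) still reaches n_per_group."""
    # Integer ceiling of n_per_group / (1 - dropout_percent / 100), avoiding float error
    return -(-n_per_group * 100 // (100 - dropout_percent))


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events per candidate predictor variable (EPV)."""
    return n_events / n_predictors


n_required = 372                           # per group, as reported from PASS 15
n_inflated = inflate_for_dropout(n_required, 20)
print(n_inflated, 2 * n_inflated)          # 465 930

min_events = 10 * 18                       # 10 EPV x 18 candidate predictors
print(min_events)                          # 180
print(events_per_variable(740, 18) >= 10)  # True: 740 TLBW events in the full cohort
```

Note that the EPV rule is usually applied to the data used for fitting; in the training set alone there are 518 TLBW events, which still gives well over 10 events per candidate predictor.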
Data collection
We retrospectively collected key perinatal data from the hospital's electronic medical record system and nursing documentation system. Our study employed a proactive data completeness protocol in which all 29 variables were manually collected only for patients with complete outcome variables. Unlike conventional machine learning workflows that apply post-collection screening and imputation, we enforced completeness during data acquisition through a two-stage process: initial verification that the outcome variables were available, followed by comprehensive manual extraction of all predictors. This approach eliminated missing values by design and better reflects clinical decision-making, which requires complete information; it is also why our workflow contains no separate data-screening or imputation step. Because this was a retrospective analysis of strictly anonymized data, informed consent was not required from individual study subjects. To protect patients' rights, all personally identifiable information (e.g., names, ID numbers, and contact details) was removed, and the data were stored and analyzed in coded form so that they could not be traced back to any individual.
The data included six basic variables: maternal age, parity (number of previous births), gravidity (number of pregnancies), risk factors, body mass index (BMI), and migration status. Risk factors comprised (1) pregnancy-induced hypertension (including gestational hypertension and chronic hypertension), (2) gestational diabetes mellitus requiring medical treatment, and (3) preeclampsia or eclampsia. A patient was classified as having a “risk factor” (coded as ‘yes’) if she presented with any one of these conditions, because each independently confers significant risk. Migration status describes internal migrant workers who move from less developed regions (e.g., western inland provinces) to economically advanced areas (e.g., eastern coastal cities) for employment, typically under temporary residency status. In addition, we gathered 12 physiological and biochemical indicators: blood glucose, cholesterol, triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), albumin, total protein, uric acid, cysteine, folate, vitamin B12, and aminopeptidase. These variables were selected by experienced obstetricians, midwives, and neonatologists to provide a comprehensive and informative dataset. Laboratory biochemical indicators such as triglycerides and albumin were collected during routine second-trimester (mid-pregnancy) prenatal visits, whereas BMI was measured at hospital admission for delivery. To detect outliers and anomalies, two researchers checked the data independently, aided by box plots and scatter plots.
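The composite risk-factor coding and the box-plot screening described above can be sketched in a few lines of Python. The function names and the triglyceride values are hypothetical, and the 1.5 × IQR fence is the conventional box-plot whisker rule, assumed here because the paper does not state which rule was applied. Flagged values would go to the two reviewers for checking rather than being deleted automatically.

```python
from statistics import quantiles


def any_risk_factor(hypertension: bool, treated_gdm: bool, preeclampsia: bool) -> str:
    """Composite coding: 'yes' if any single qualifying condition is present."""
    return "yes" if (hypertension or treated_gdm or preeclampsia) else "no"


def tukey_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR], the box-plot whiskers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


print(any_risk_factor(False, True, False))   # yes

# Hypothetical second-trimester triglyceride values; the last one is far from the rest
triglycerides = [2.1, 3.0, 3.4, 3.6, 3.9, 4.2, 14.8]
print(tukey_outliers(triglycerides))          # [6] -> passed to both reviewers to verify
```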
Statistical analysis
Descriptive analyses were conducted on the individual characteristics of the participants. Data were analyzed using R software (version 4.4.1). Continuous variables were summarized as means with standard deviations (SD) and compared using t-tests; categorical variables were expressed as counts and percentages and compared using chi-square tests. Participants were randomly divided into training and test sets in a 7:3 ratio. Univariate and multivariate logistic regression analyses were performed in the training set to calculate odds ratios (OR) and 95% confidence intervals (CI), with statistical significance set at p < 0.05. To identify variables associated with TLBW, we employed the Boruta algorithm, Lasso regression, and logistic regression, and visualized the intersection of the three resulting variable sets using a Venn diagram. A correlation heatmap was generated to assess relationships between the identified variables. Predictive models were constructed in the training set and validated in the test set. Ten machine learning algorithms were implemented: Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), Neural Networks, Light Gradient Boosting Machine (LightGBM), Category Boosting (CatBoost), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). Model performance was evaluated using the receiver operating characteristic (ROC) curve, area under the curve (AUC), sensitivity, specificity, precision, F1 score, and accuracy, together with calibration curves and decision curve analysis (DCA). Finally, a Shiny-based online risk calculator was developed from the GBM model, which showed the best predictive performance; it enables clinicians to estimate the likelihood of a full-term low-birth-weight delivery.
The calculator is user-friendly: physicians enter the relevant variables through dropdown menus, and the tool returns the patient's risk probability when the “Predict” button is clicked. All R packages used in this study are compatible with, and can be installed and run under, R version 4.4.1: pbapply, rlang, tidyverse, reshape2, openxlsx, DALEX, readr, gbm, dplyr, caret, ggplot2, pROC, rms, rmda, dcurves, Hmisc, ResourceSelection, DynNom, survey, foreign, plotROC, survival, shapper, iml, e1071, ROCR, corrplot, lattice, Formula, SparseM, riskRegression, kernelshap, extraTrees, rJava, pheatmap, fastshap, naivebayes, ingredients, mlr3, table1, tableone, adabag, RColorBrewer, VIM, mice, autoReg, cvms, tibble, data.table, devtools, ComplexHeatmap, circlize, ROSE, DMwR, scales, catboost, lightgbm, shapviz, and shiny.
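As a sanity check on the baseline comparisons, the t-tests and chi-square tests described above can be rerun from the summary figures reported in Table 1. The sketch below uses Python's SciPy rather than the authors' R workflow; it assumes the equal-variance Student form of the t-test (SciPy's default), which the paper does not specify. Because Table 1 reports rounded means and SDs, recomputed p-values only approximate the published ones.

```python
from scipy.stats import chi2_contingency, ttest_ind_from_stats

# 2x2 tables from Table 1; rows = normal weight (n = 819), TLBW (n = 740)
parity = [[491, 328], [493, 247]]     # primigravid, non-primigravid
migrant = [[630, 189], [511, 229]]    # native, mobile population

# For 2x2 tables SciPy applies Yates' continuity correction by default
_, p_parity, dof, _ = chi2_contingency(parity)
_, p_migrant, _, _ = chi2_contingency(migrant)

# Student's t-test from rounded summary statistics: mean, SD, n per group
_, p_bmi = ttest_ind_from_stats(27.6, 1.9, 819, 26.1, 3.2, 740)
_, p_age = ttest_ind_from_stats(28.8, 4.1, 819, 28.4, 4.9, 740)

print(f"parity p={p_parity:.3g}, migrant p={p_migrant:.3g}")
print(f"BMI p={p_bmi:.3g}, age p={p_age:.3g}")
```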
Table 1.
Baseline data disaggregated by full-term low-birth-weight infants or not
| Variable | [ALL] | No | Yes | P |
|---|---|---|---|---|
| | N = 1559 | N = 819 | N = 740 | |
| Parity, n(%) | | | | < 0.05 |
| Primigravid | 984 (63.1%) | 491 (60.0%) | 493 (66.6%) | |
| Non-primigravid | 575 (36.9%) | 328 (40.0%) | 247 (33.4%) | |
| Migrant, n(%) | | | | < 0.05 |
| Native population | 1141 (73.2%) | 630 (76.9%) | 511 (69.1%) | |
| Mobile population | 418 (26.8%) | 189 (23.1%) | 229 (30.9%) | |
| Risk factor, n(%) | | | | < 0.05 |
| Yes | 275 (17.6%) | 127 (15.5%) | 148 (20.0%) | |
| No | 1284 (82.4%) | 692 (84.5%) | 592 (80.0%) | |
| Gravidity, (mean,±SD) | 2.0 (1.2) | 2.0 (1.2) | 2.0 (1.3) | 0.42 |
| Age, (mean,±SD) | 28.6 (4.5) | 28.8 (4.1) | 28.4 (4.9) | 0.09 |
| BMI, (mean,±SD) | 26.9 (2.7) | 27.6 (1.9) | 26.1 (3.2) | < 0.05 |
| Glycemic, (mean,±SD) | 4.7 (1.1) | 4.6 (1.1) | 4.7 (1.2) | 0.21 |
| Cholesterol, (mean,±SD) | 6.4 (1.3) | 6.4 (1.2) | 6.5 (1.3) | 0.43 |
| Triglyceride, (mean,±SD) | 3.6 (1.9) | 3.4 (1.5) | 3.9 (2.2) | < 0.05 |
| HDL, (mean,±SD) | 2.0 (0.6) | 2.1 (0.4) | 1.9 (0.7) | < 0.05 |
| LDL, (mean,±SD) | 3.2 (0.9) | 2.8 (0.7) | 3.6 (1.0) | < 0.05 |
| Albumin, (mean,±SD) | 33.4 (3.2) | 32.3 (2.8) | 34.6 (3.1) | < 0.05 |
| Total protein, (mean,±SD) | 62.4 (5.1) | 61.9 (5.2) | 63.0 (5.1) | < 0.05 |
| Uric acid, (mean,±SD) | 341.0 (92.9) | 333.0 (86.0) | 349.0 (99.5) | < 0.05 |
| Cysteine, (mean,±SD) | 8.4 (3.5) | 7.9 (2.2) | 8.9 (4.4) | < 0.05 |
| Folic acid, (mean,±SD) | 8.1 (5.2) | 8.0 (4.4) | 8.2 (6.0) | 0.48 |
| Vitamin B12, (mean,±SD) | 251.0 (101.0) | 241.0 (99.6) | 262.0 (103.0) | < 0.05 |
| Aminopeptidase, (mean,±SD) | 299.0 (141.0) | 272.0 (117.0) | 329.0 (159.0) | < 0.05 |
Results
Patient’s characteristics
Table 1 presents the baseline characteristics of the study population, grouped by whether the infants were full-term low birth weight or full-term normal birth weight. After excluding patients with missing data, 1,559 neonates were included in the analysis: 740 (47.5%) full-term low-birth-weight infants and 819 (52.5%) full-term normal-birth-weight infants. Statistically significant differences (p < 0.05) were found in 13 variables: parity, migration status, risk factors, BMI, triglycerides, HDL, LDL, albumin, total protein, uric acid, cysteine, vitamin B12, and aminopeptidase. Table 2 provides baseline data for the training and test cohorts, obtained by randomly splitting the patient data in a 7:3 ratio. Full-term low-birth-weight infants accounted for 47.4% (518 cases) of the training set and 47.5% (222 cases) of the test set. Between the two sets, statistically significant differences (p < 0.05) were observed for only two variables: migration status and cysteine.
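The training and test counts in Table 2 (518 + 574 = 1,092 and 222 + 245 = 467) are consistent with a split stratified by outcome in which each class's training count is rounded up. The paper does not state whether stratification was used or which function performed the split, so the Python sketch below is an assumption that merely reproduces those counts; the function name and seed are hypothetical.

```python
import math
from fractions import Fraction

import numpy as np


def stratified_split(labels, train_frac=Fraction(7, 10), seed=2024):
    """Split indices class by class, rounding each class's training count up."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = math.ceil(train_frac * len(idx))  # exact rational arithmetic, no float error
        train_idx.extend(idx[:n_train].tolist())
    train = np.sort(np.array(train_idx))
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test


labels = np.array([1] * 740 + [0] * 819)   # 1 = TLBW, 0 = normal birth weight
train, test = stratified_split(labels)
print(len(train), len(test), labels[train].sum(), labels[test].sum())
```

Stratifying on the outcome keeps the TLBW proportion almost identical in both sets (47.4% vs 47.5%), which matches the non-significant p = 0.97 for this row of Table 2.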
Table 2.
Baseline data for both the training and test (internal validation) cohorts
| Variable | [ALL] | Training set | Test set | p |
|---|---|---|---|---|
| | N = 1559 | N = 1092 | N = 467 | |
| Full-term LBW, n(%) | | | | 0.97 |
| Yes | 740 (47.5%) | 518 (47.4%) | 222 (47.5%) | |
| No | 819 (52.5%) | 574 (52.6%) | 245 (52.5%) | |
| Parity, n(%) | | | | 0.26 |
| Primigravid | 984 (63.1%) | 699 (64.0%) | 285 (61.0%) | |
| Non-primigravid | 575 (36.9%) | 393 (36.0%) | 182 (39.0%) | |
| Migrant, n(%) | | | | <0.05 |
| Native population | 1141 (73.2%) | 822 (75.3%) | 319 (68.3%) | |
| Mobile population | 418 (26.8%) | 270 (24.7%) | 148 (31.7%) | |
| Risk factor, n(%) | | | | 0.70 |
| Yes | 275 (17.6%) | 190 (17.4%) | 85 (18.2%) | |
| No | 1284 (82.4%) | 902 (82.6%) | 382 (81.8%) | |
| Gravidity, (mean,±SD) | 2.0 (1.2) | 2.0 (1.2) | 2.0 (1.2) | 0.64 |
| Age, (mean,±SD) | 28.6 (4.5) | 28.6 (4.4) | 28.4 (4.7) | 0.38 |
| BMI, (mean,±SD) | 26.9 (2.7) | 27.0 (2.7) | 26.8 (2.7) | 0.40 |
| Glycemic, (mean,±SD) | 4.7 (1.1) | 4.7 (1.2) | 4.6 (1.0) | 0.26 |
| Cholesterol, (mean,±SD) | 6.4 (1.3) | 6.4 (1.2) | 6.5 (1.3) | 0.22 |
| Triglyceride, (mean,±SD) | 3.6 (1.9) | 3.6 (1.9) | 3.6 (1.9) | 0.95 |
| HDL, (mean,±SD) | 2.0 (0.6) | 2.0 (0.6) | 2.0 (0.4) | 0.36 |
| LDL, (mean,±SD) | 3.2 (1.0) | 3.2 (1.0) | 3.3 (1.0) | 0.11 |
| Albumin, (mean,±SD) | 33.4 (3.2) | 33.4 (3.1) | 33.3 (3.3) | 0.70 |
| Total protein, (mean,±SD) | 62.4 (5.1) | 62.4 (5.1) | 62.5 (5.3) | 0.68 |
| Uric acid, (mean,±SD) | 341.0 (92.9) | 338.0 (89.9) | 348.0 (99.2) | 0.06 |
| Cysteine, (mean,±SD) | 8.4 (3.5) | 8.3 (3.2) | 8.7 (3.9) | <0.05 |
| Folic acid, (mean,±SD) | 8.1 (5.2) | 8.2 (5.5) | 7.8 (4.5) | 0.19 |
| Vitamin B12, (mean,±SD) | 251.0 (101.0) | 253.0 (104.0) | 246.0 (95.5) | 0.17 |
| Aminopeptidase, (mean,±SD) | 299.0 (141.0) | 298.0 (142.0) | 301.0 (140.0) | 0.76 |
Univariate and multivariate logistic regression analyses
As shown in Table 3, the univariate logistic regression analysis identified several factors significantly associated with term low birth weight infants, including Parity, Migrant status, Risk factors, BMI, Triglycerides, HDL, LDL, Albumin, Total protein, Cysteine, Vitamin B12, and Aminopeptidase (p < 0.05). In the multivariate logistic regression analysis, the factors that remained statistically significant were Migrant status, Risk factors, BMI, Triglycerides, HDL, LDL, Albumin, Total protein, Cysteine, Vitamin B12 and Aminopeptidase (p < 0.05).
Table 3.
Univariate and multivariate logistic regression analysis of the training set
| Variable | Univariate logistic | P | Multivariable logistic | P |
|---|---|---|---|---|
| | OR (95% CI) | | OR (95% CI) | |
| Parity | ||||
| Primigravid | ||||
| Non-primigravid | 0.72 (0.56–0.92) | < 0.05 | 0.98 (0.71–1.36) | 0.91 |
| Migrant | ||||
| Native population | ||||
| Mobile population | 1.64 (1.24–2.16) | < 0.05 | 1.75 (1.20–2.55) | < 0.05 |
| Risk factor | | | | |
| No | ||||
| Yes | 1.39 (1.01–1.90) | < 0.05 | 1.60 (1.07–2.39) | < 0.05 |
| Age | 0.98 (0.96–1.01) | 0.24 | ||
| Gravidity | 1.06 (0.96–1.17) | 0.25 | ||
| BMI | 0.80 (0.77–0.85) | < 0.05 | 0.81 (0.76–0.86) | < 0.05 |
| Glycemic | 1.11 (1.00-1.23) | 0.05 | ||
| Cholesterol | 0.99 (0.90–1.09) | 0.80 | ||
| Triglyceride | 1.16 (1.08–1.24) | < 0.05 | 1.19 (1.08–1.30) | < 0.05 |
| HDL | 0.48 (0.36–0.64) | < 0.05 | 0.44 (0.30–0.65) | < 0.05 |
| LDL | 3.19 (2.67–3.82) | < 0.05 | 3.17 (2.56–3.93) | < 0.05 |
| Albumin | 1.31 (1.25–1.38) | < 0.05 | 1.41 (1.32–1.51) | < 0.05 |
| Total protein | 1.04 (1.02–1.07) | < 0.05 | 0.94 (0.91–0.98) | < 0.05 |
| Uric acid | 1.00 (1.00–1.00) | 0.108 | ||
| Cysteine | 1.13 (1.07–1.18) | < 0.05 | 1.18 (1.09–1.28) | < 0.05 |
| Folic acid | 1.01 (0.99–1.04) | 0.285 | ||
| Vitamin B12 | 1.00 (1.00–1.00) | < 0.05 | 1.00 (1.00–1.00) | < 0.05 |
| Aminopeptidase | 1.00 (1.00–1.00) | < 0.05 | 1.00 (1.00–1.00) | < 0.05 |
OR Odds ratio, CI Confidence interval
Boruta algorithm and lasso regression analysis
The Boruta algorithm is a feature selection technique that identifies the features relevant to a predictive model. It is a wrapper built around the random forest classifier and handles feature selection efficiently, particularly in datasets with many features. In Fig. 1A, the black box plots represent the minimum, average, and maximum Z-scores of the shadow features. The red and green box plots correspond to the Z-scores of rejected and confirmed features, respectively, while yellow and blue indicate the Z-scores of features that remain undecided. Fourteen variables (Migrant, Aminopeptidase, Folic acid, Uric acid, Triglyceride, Total protein, Cholesterol, Risk factor, Cysteine, HDL, Albumin, LDL, BMI, and Vitamin B12) were confirmed (in green) as significant features, whereas three variables (Glycemic, Age, and Parity) were rejected (in red). Figure 1B shows the LASSO coefficient paths for the 18 variables, each represented by a differently colored line labeled with a variable number. Each curve traces a variable's coefficient (vertical axis) across values of log(λ) (lower horizontal axis), and the upper horizontal axis gives the number of non-zero coefficients at each point. As log(λ) increases, the regression coefficients shrink toward zero. Figure 1D presents the cross-validation plot, with log(λ) on the X-axis and the likelihood deviance on the Y-axis; smaller values indicate a better model fit. Table 4 lists the 11 variables retained by LASSO regression, with their coefficients from the subsequent multivariable logistic regression.
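The shadow-feature idea behind Boruta can be illustrated with a minimal Python sketch on synthetic data. It is a simplification, not the R Boruta package: importance here is scikit-learn's impurity-based random forest importance rather than Boruta's Z-scores, and a feature simply accumulates a "hit" whenever it beats the best shadow feature, without the statistical test Boruta uses to confirm or reject features. The function name and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def boruta_hits(X, y, n_iter=20, seed=0):
    """Count how often each real feature beats the best shuffled 'shadow' feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: each column permuted independently, destroying any link to y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=it)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        shadow_max = importance[n_features:].max()
        hits += importance[:n_features] > shadow_max
    return hits


# Synthetic data: column 0 drives the outcome, columns 1 and 2 are pure noise
rng = np.random.default_rng(1)
n = 300
signal = rng.normal(size=n)
X = np.column_stack([signal, rng.normal(size=n), rng.normal(size=n)])
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)
hits = boruta_hits(X, y)
print(hits)
```

A feature that beats the strongest shadow in nearly every round behaves like a "confirmed" (green) feature in Fig. 1A; noise features win only sporadically.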
Fig. 1.
A Boruta algorithm output showing importance scores of all candidate variables. Variables above the “shadowMax” threshold were selected as significant. B LASSO coefficient profiles for 18 variables. The vertical dotted line indicates optimal λ (lambda) value selected by 10-fold cross-validation, resulting in 11 non-zero coefficients. C Venn diagram of variables selected by three methods: Boruta, Lasso, and logistic regression. D Cross-validation error curve for LASSO regression. Red dots represent mean squared error (MSE) ± 1 SD at each λ value, with optimal λ marked by the vertical dotted line. E Spearman correlation matrix of the final selected predictors. Color intensity and circle size represent correlation strength
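An L1-penalized (LASSO) logistic regression tuned by 10-fold cross-validation, as summarized in Fig. 1B and D, can be sketched with scikit-learn. The data are synthetic (18 predictors, of which only the first four are given hypothetical non-zero effects), and scikit-learn parameterizes the penalty by C, the inverse of λ. The authors' R implementation, and whether they used the minimum-error λ or the one-standard-error λ, are not stated; `LogisticRegressionCV` simply picks the C with the best mean cross-validated score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n, p = 500, 18                            # 18 candidate predictors, as in the study
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:4] = [1.5, -1.0, 0.8, -0.6]    # hypothetical: only 4 predictors carry signal
y = (X @ true_beta + rng.logistic(size=n) > 0).astype(int)

# The L1 penalty shrinks weak coefficients exactly to zero; 10-fold CV chooses its strength.
lasso = LogisticRegressionCV(
    Cs=30, cv=10, penalty="l1", solver="liblinear", scoring="neg_log_loss"
)
lasso.fit(StandardScaler().fit_transform(X), y)
selected = np.flatnonzero(lasso.coef_.ravel() != 0)
print("non-zero coefficients:", selected)
```

Standardizing first matters because the penalty acts on coefficient size, so predictors on large scales (e.g., vitamin B12) would otherwise be penalized differently from those on small scales.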
Table 4.
The variables selected by multifactor logistic regression after the Lasso regression
| Variable | Estimate | Std.Error | OR(95%CI) | P |
|---|---|---|---|---|
| Parity | −0.722 | 0.270 | 0.48(0.28–0.82) | < 0.05 |
| Risk factor | 0.359 | 0.241 | 1.43(0.89–2.29) | 0.14 |
| Gravidity | 0.365 | 0.108 | 1.44(1.17–1.78) | < 0.05 |
| BMI | −0.266 | 0.040 | 0.77(0.71–0.83) | < 0.05 |
| Glycemic | 0.187 | 0.089 | 1.20(1.01–1.44) | < 0.05 |
| Cholesterol | −2.147 | 0.183 | 0.12(0.08–0.16) | < 0.05 |
| Triglyceride | 0.384 | 0.061 | 1.47(1.30–1.66) | < 0.05 |
| HDL | 0.569 | 0.290 | 1.77(1.01–3.13) | < 0.05 |
| LDL | 3.653 | 0.264 | 3.86(2.34–6.44) | < 0.05 |
| Albumin | 0.307 | 0.040 | 1.36(1.26–1.47) | < 0.05 |
| Cysteine | 0.101 | 0.038 | 1.11(1.03–1.19) | < 0.05 |
OR Odds ratio, CI Confidence interval
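Each OR in Table 4 should equal exp(Estimate), with a Wald 95% CI of exp(Estimate ± 1.96 × Std.Error). The short Python helper below (a hypothetical name) recomputes these quantities from the reported estimates, which is a quick consistency check for tables of this kind; differences in the second decimal can arise because the estimates themselves are rounded.

```python
import math


def wald_or(estimate: float, se: float, z: float = 1.96):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its standard error."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


for name, b, se in [("BMI", -0.266, 0.040), ("Triglyceride", 0.384, 0.061),
                    ("Gravidity", 0.365, 0.108)]:
    odds_ratio, lower, upper = wald_or(b, se)
    print(f"{name}: {odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")
```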
Venn diagram and Spearman correlation heatmap
We compared the feature sets selected by the Boruta algorithm, Lasso regression, and logistic regression, and took the features identified by all three methods as the final predictor variables. As shown in Fig. 1C, the seven features selected for the machine learning models were BMI, risk factors, triglycerides, HDL, LDL, albumin, and cysteine. Figure 1E displays the Spearman correlation matrix of these variables as a heatmap. No significant pairwise correlations were detected among the selected variables (p > 0.05), suggesting that multicollinearity was not a concern.
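The selection rule (keep only variables chosen by all three methods) is a plain set intersection. In the Python sketch below, the Boruta set is the 14 confirmed variables, the LASSO set is the 11 variables listed in Table 4, and the logistic set is the variables that remained significant in the multivariable model (Table 3); treating these three lists as the Venn inputs is our reading of the text.

```python
boruta = {"Migrant", "Aminopeptidase", "Folic acid", "Uric acid", "Triglyceride",
          "Total protein", "Cholesterol", "Risk factor", "Cysteine", "HDL",
          "Albumin", "LDL", "BMI", "Vitamin B12"}
lasso = {"Parity", "Risk factor", "Gravidity", "BMI", "Glycemic", "Cholesterol",
         "Triglyceride", "HDL", "LDL", "Albumin", "Cysteine"}
logistic = {"Migrant", "Risk factor", "BMI", "Triglyceride", "HDL", "LDL",
            "Albumin", "Total protein", "Cysteine", "Vitamin B12", "Aminopeptidase"}

final = boruta & lasso & logistic              # central region of the Venn diagram
print(sorted(final))
print(sorted((boruta & lasso) - logistic))     # chosen by two methods but not the third
```

Under these inputs the intersection is exactly the seven predictors named above; cholesterol survives Boruta and LASSO but drops out at the logistic step.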
Performance comparison of ten machine learning algorithms
Ten machine learning algorithms (GBM, AdaBoost, XGBoost, Neural Network, LightGBM, CatBoost, RF, LR, SVM, and KNN) were used to develop predictive models for term low birth weight (TLBW) infants. Comparing AUC across the training and test sets, GBM (Fig. 2A and D) delivered the best overall discrimination, achieving the highest AUC in both datasets (training set: 0.972; test set: 0.915) with the smallest performance gap (ΔAUC = 0.057), indicating good generalization. Table 5 presents the accuracy, sensitivity, specificity, precision, and F1 score of each model in the training and test sets. GBM performed strongly on the training set (F1 = 0.908, accuracy = 0.912) and achieved the highest F1 score and accuracy on the test set (F1 = 0.840, accuracy = 0.850). Its train-test discrepancy (ΔF1 = 0.068) was moderate and far smaller than those of LightGBM and Random Forest, which achieved near-perfect training metrics (F1 = 0.985 and 1.000, respectively) but degraded markedly on the test set (F1 = 0.785 and 0.813, respectively), revealing pronounced overfitting. Among traditional methods, Logistic Regression and SVM were the most stable, albeit with more modest results (test set F1 ≈ 0.78–0.79), while KNN performed best among the simpler models (test set F1 = 0.812). Neural Networks (F1 = 0.813) and XGBoost (F1 = 0.806) delivered moderate performance, and AdaBoost (F1 = 0.708) and CatBoost (specificity 0.718) performed worst.
After weighing prediction accuracy, generalization ability, and computational efficiency, we selected GBM as the optimal model for the TLBW prediction tool, given its consistently high performance and suitability for real-world clinical application. The GBM calibration curves in both cohorts lay closer to the diagonal than those of the other algorithms, indicating that its predicted probabilities agreed closely with observed incidence (Fig. 2B and E). Decision curve analysis (DCA) further showed that the GBM model provided a higher net clinical benefit in both datasets (Fig. 2C and F). To assess the stability of the GBM model, we performed ten-fold cross-validation in both the training and validation sets; Fig. 3A and B show the AUC of each fold together with the mean AUC. The confusion matrices (Fig. 3C and D), which compare predicted with actual outcomes, also support GBM as the optimal model. In conclusion, GBM was selected as the final predictive model for TLBW infants.
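The net benefit plotted in a decision curve has a simple closed form: at threshold probability p_t, NB = TP/n - (FP/n) × p_t / (1 - p_t), compared against the "treat all" strategy and "treat none" (NB = 0). The Python sketch below implements that formula on a tiny hypothetical example; it is an illustration of the calculation, not the authors' code, which the listed R packages (rmda, dcurves) would typically handle.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening in everyone whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    # False positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene in every pregnancy."""
    prevalence = np.mean(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = TLBW) and predicted risks
y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.6, 0.1]
for pt in (0.3, 0.5):
    print(pt, round(net_benefit(y, p, pt), 4), round(net_benefit_treat_all(y, pt), 4))
```

A model is clinically useful at a given threshold when its curve lies above both reference strategies, which is the comparison Fig. 2C and F make across the ten models.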
Fig. 2.
2A ROC curves for ten models in the training cohort. 2D ROC curves for ten models in the validation cohort. 2B Calibration curves for ten models in the training cohort. 2E Calibration curves for ten models in the validation cohort. 2C DCA curves for ten models in the training cohort. 2F DCA curves for ten models in the validation cohort
Table 5.
Performance parameters of the ten models in both the training and validation cohorts
| Data set | ML | Accuracy | Sensitivity | Specificity | Precision | F1 |
|---|---|---|---|---|---|---|
| Training set | Logistic | 0.793 | 0.695 | 0.882 | 0.841 | 0.761 |
| | SVM | 0.788 | 0.705 | 0.864 | 0.824 | 0.760 |
| | GBM | 0.912 | 0.919 | 0.906 | 0.898 | 0.908 |
| | NeuralNetwork | 0.840 | 0.776 | 0.897 | 0.872 | 0.821 |
| | RandomForest | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Xgboost | 0.835 | 0.793 | 0.873 | 0.849 | 0.820 |
| | KNN | 0.901 | 0.911 | 0.892 | 0.884 | 0.897 |
| | Adaboost | 0.761 | 0.616 | 0.892 | 0.837 | 0.710 |
| | LightGBM | 0.986 | 0.977 | 0.995 | 0.994 | 0.985 |
| | CatBoost | 0.829 | 0.840 | 0.819 | 0.807 | 0.823 |
| Test set | Logistic | 0.807 | 0.757 | 0.853 | 0.824 | 0.789 |
| | SVM | 0.805 | 0.721 | 0.882 | 0.847 | 0.779 |
| | GBM | 0.850 | 0.829 | 0.869 | 0.852 | 0.840 |
| | NeuralNetwork | 0.833 | 0.766 | 0.894 | 0.867 | 0.813 |
| | RandomForest | 0.824 | 0.802 | 0.845 | 0.824 | 0.813 |
| | Xgboost | 0.820 | 0.788 | 0.849 | 0.825 | 0.806 |
| | KNN | 0.822 | 0.806 | 0.837 | 0.817 | 0.812 |
| | Adaboost | 0.760 | 0.613 | 0.894 | 0.840 | 0.708 |
| | LightGBM | 0.794 | 0.788 | 0.800 | 0.781 | 0.785 |
| | CatBoost | 0.779 | 0.847 | 0.718 | 0.732 | 0.785 |
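All Table 5 metrics derive from the four cells of a confusion matrix. The counts used below are not reported in the paper: they are back-calculated from the GBM test-set row (222 TLBW and 245 normal-weight infants in the test set) and reproduce that row's values after rounding to three decimals. The function name is hypothetical.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Table 5-style metrics, treating TLBW as the positive class."""
    sensitivity = tp / (tp + fn)            # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical counts back-calculated from the GBM test-set row (222 TLBW, 245 normal)
metrics = classification_metrics(tp=184, fn=38, tn=213, fp=32)
print({name: round(value, 3) for name, value in metrics.items()})
```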
Fig. 3.
3A Ten-fold cross-validation of the training set. 3B Ten-fold cross-validation of the validation cohort. 3C Confusion matrix results for ten models in the training cohort. 3D Confusion matrix results for ten models in the validation cohort
Feature importance and interpretability of the GBM model
The feature importance of the GBM model was assessed using SHapley Additive exPlanations (SHAP). We analyzed the marginal contributions of the seven model features across all samples, combining feature importance and feature effects in summary plots (Fig. 4A and B). Albumin, LDL, HDL, and BMI show more dispersed sample distributions with a broader range of SHAP values, suggesting that these features have a substantial impact on the model's predictions. In contrast, the SHAP values for Risk factor and Triglycerides are concentrated around zero, implying that these features have a minimal effect. The dependence plot (Fig. 4C) shows a nonlinear relationship between LDL and the predicted outcome: as LDL increases from 1 to 4, the SHAP values rise markedly from − 0.25 to 0.25, indicating a substantial positive impact on model predictions. Above 4, however, the SHAP values plateau around 0.25, suggesting a saturation effect in which further increases in LDL have little additional influence on the model's output. Finally, waterfall plots were generated to visualize the contribution of each feature to individual predictions (Fig. 4D).
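SHAP values are Shapley values: each feature's average marginal contribution over all coalitions of the other features. The Python sketch below computes them exactly by enumeration for a hypothetical three-feature scoring function, replacing absent features with a baseline value. It only illustrates the definition and the additivity property that waterfall plots rely on (the values sum to the prediction minus the baseline prediction); SHAP libraries use faster estimators (e.g., TreeSHAP for tree ensembles), and which estimator the authors used is not stated.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(v):
    """Hypothetical risk score with an interaction between features 0 and 2."""
    return 0.5 * v[0] - 0.3 * v[1] + 0.2 * v[0] * v[2]


x, baseline = [2.0, 1.0, 3.0], [0.0, 0.0, 0.0]
phi = exact_shapley(toy_score, x, baseline)
print([round(v, 3) for v in phi])                    # [1.6, -0.3, 0.6]
print(round(sum(phi), 3), round(toy_score(x) - toy_score(baseline), 3))
```

The interaction term (0.2 × 2 × 3 = 1.2) is split equally between features 0 and 2, which is why a single-sample waterfall plot can show contributions that are not visible in a purely additive model.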
Fig. 4.
4A Ranking of the importance scores for the seven variables derived from the GBM-based model. 4B Beeswarm plot based on SHAP values. 4C Dependence plot between the characteristic variables gestational week and BMI. 4D Single-sample interpretable force diagram
Discussion
The occurrence of term low birth weight (TLBW) is influenced by a complex interplay of multiple factors. In this study, we developed and validated ten machine learning models to predict TLBW from the perinatal data of pregnant women. Among these models, the GBM algorithm performed best, with AUC values of 0.972 in the training set and 0.915 in the test set. Calibration curves and decision curve analysis (DCA) further supported the model's applicability. We attribute the GBM model's accuracy to its nature as an ensemble learning algorithm that is well suited to large and complex datasets. The clinical value of machine learning here lies in its ability to identify risk factors closely associated with TLBW. Our analysis identified seven key predictors: BMI, risk factors, triglycerides, HDL, LDL, albumin, and cysteine.
Based on the importance ranking from the GBM algorithm, maternal BMI is a key factor in predicting TLBW. Previous studies support this finding: Gul et al. demonstrated a direct relationship between maternal BMI and neonatal birth weight, and Bramsved et al. found that early-pregnancy BMI strongly correlates with neonatal birth weight [16, 17]. Our study further showed that maternal high-risk factors during pregnancy, such as comorbidities, significantly increase the risk of LBW in full-term newborns. Consistent with this, previous research by Xu L. and Getaneh T. showed that maternal gestational hypertension has a substantial impact on neonatal birth weight [18, 19]. In addition, maternal metabolic markers (triglycerides, HDL, LDL, albumin, and cysteine) were found to be important contributors to TLBW risk. The relationship between maternal lipid levels and neonatal outcomes has been widely discussed in recent years. While several studies have reported a positive correlation between maternal triglyceride levels and neonatal birth weight during early pregnancy (before 34 weeks) [20, 21], our study, which focused on full-term pregnancies (≥ 37 weeks), found that higher maternal triglyceride levels were associated with lower neonatal birth weight [22]. High triglyceride levels can indicate increased maternal energy reserves; however, when they accompany other unhealthy metabolic conditions, such as gestational diabetes or hypertension, they may impair fetal development. These conditions can interfere with placental nutrient transport, insulin regulation, and fetal growth factors, potentially leading to reduced birth weight.
Notably, our study population included a relatively high proportion of pregnant women with such high-risk factors, which is consistent with this interpretation and may help explain the association we observed between maternal metabolic disturbances and adverse neonatal outcomes.
In our study, HDL and LDL showed opposite associations with TLBW. HDL was positively correlated with birth weight, likely reflecting its role in maternal lipid metabolism, vascular health, and placental function; HDL helps maintain the maternal circulation and placental oxygenation, both of which support normal fetal growth [23]. In contrast, elevated maternal LDL cholesterol may impair fetal development through altered placental cholesterol transport. The placenta takes up maternal LDL selectively through receptor-mediated endocytosis, and excessive LDL levels can disrupt this balance, potentially affecting fetal steroidogenesis and membrane formation during critical developmental periods [24, 25]. These pathological changes are particularly relevant in pregnancies complicated by gestational diabetes or hypertension, in which dyslipidemia and endothelial dysfunction converge to restrict fetal growth [26]. The negative association between maternal albumin levels and birth weight may be related to albumin's role in fatty acid transport. As the principal carrier of free fatty acids, albumin facilitates the placental transfer of long-chain polyunsaturated fatty acids (LCPUFAs), which are critical for fetal growth and development, and previous studies suggest that altered albumin concentrations impair the binding and delivery of these fatty acids to placental membranes, thereby disrupting the nutrient supply [27]. A recent large retrospective study by Wu et al. in China (39,200 singleton newborns) likewise identified a significant negative correlation between maternal serum albumin levels and fetal growth: higher albumin levels in early gestation were associated with adverse effects on fetal growth, and these effects became more pronounced in the second trimester [28].
Elevated homocysteine levels in pregnant women can damage placental blood vessels, leading to poor placental perfusion. This impairment disrupts nutrient and oxygen exchange, hindering fetal growth and contributing to lower birth weight [29].
Although the model demonstrated excellent predictive performance, several limitations must be considered when applying it clinically. First, its performance depends heavily on the quality and completeness of the input data, so regular updates and recalibration are needed to maintain accuracy. This places considerable demands on data standardization and may increase the clinical workload. Second, clinical integration requires seamless interfacing with electronic medical record systems, which may alter existing workflows and require adaptation by medical staff. Finally, differences in the level of informatization across medical institutions and uncertainty about the cost-benefit ratio may affect implementation.
Nonetheless, the findings of this study could improve clinical practice in several ways. For instance, the model could be incorporated into routine prenatal examinations to provide automatic risk assessment, allowing high-risk cases to be identified before mid-pregnancy. Personalized management plans could then be formulated according to risk category, such as ultrasound monitoring of fetal growth every two weeks in high-risk pregnancies. Patient education tools could also provide customized health guidance based on individual risk factors, for example dietary control programs for pregnant women with elevated triglyceride levels, to improve compliance and potentially reduce the incidence of low birth weight at term. In future work, we plan to establish a cohort of 10,000 cases in collaboration with multiple regional medical centers to evaluate the model's generalizability. These efforts should help translate the predictive model into an effective clinical decision support system.
We have also developed a risk calculator using Shiny in R to visualize the model and facilitate its clinical application for TLBW. The code (app.R) is provided in Supplementary File S1, so anyone with R installed can open app.R and run the calculator. Figure 5 shows a screenshot of the user interface, in which clinicians enter the relevant variables and click to obtain the predicted probability of a full-term low-birth-weight infant.
Fig. 5.
User interface for a risk calculator developed using Shiny
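The workflow described in the text, in which a predicted probability is mapped to a risk category that then triggers a management plan, can be sketched as follows. This is not the code in app.R and does not reproduce the GBM model; it only illustrates the tiering step. The cutoffs LOW_CUTOFF and HIGH_CUTOFF, the function name stratify, and the intermediate-tier plan are hypothetical placeholders, not values or rules derived from this study; only the biweekly ultrasound plan for high-risk pregnancies comes from the text.

```python
from dataclasses import dataclass

# HYPOTHETICAL cutoffs for illustration only; they were not derived or validated in this study.
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 0.5


@dataclass(frozen=True)
class RiskAssessment:
    probability: float  # predicted probability of TLBW from some fitted model
    tier: str           # "low", "intermediate" or "high"
    plan: str           # suggested follow-up (illustrative wording)


def stratify(probability: float,
             low: float = LOW_CUTOFF,
             high: float = HIGH_CUTOFF) -> RiskAssessment:
    """Map a predicted TLBW probability to a risk tier and an illustrative plan."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if not 0.0 < low < high < 1.0:
        raise ValueError("cutoffs must satisfy 0 < low < high < 1")
    if probability >= high:
        # Example plan from the text: fetal growth ultrasound every two weeks.
        return RiskAssessment(probability, "high", "fetal growth ultrasound every 2 weeks")
    if probability >= low:
        # Hypothetical intermediate plan, e.g. guidance on modifiable risk factors.
        return RiskAssessment(probability, "intermediate",
                              "individualized guidance on modifiable risk factors")
    return RiskAssessment(probability, "low", "routine prenatal care")


if __name__ == "__main__":
    for p in (0.08, 0.35, 0.72):
        a = stratify(p)
        print(f"p={a.probability:.2f} -> {a.tier}: {a.plan}")
```

Keeping the cutoffs as parameters rather than constants inside the function makes it straightforward to replace them with thresholds chosen from a decision curve or from local clinical consensus.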
The aim of this study was to develop ten machine learning models to accurately predict TLBW using data from our hospital. We also visualized the trends and distributions of TLBW across various maternal perinatal factors and examined how the target variable responded to each feature variable, in order to shed light on patterns not otherwise explained by the models. Nevertheless, our study has several limitations. First, because the model was developed using single-center data, its generalizability to other populations may be limited; multi-center validation studies involving geographically and demographically diverse cohorts are needed to verify the robustness of our findings. Second, although the model's accuracy exceeds 85%, prospective studies are needed to confirm its practical utility. Third, while internal validation yielded promising results, the absence of external validation data from other institutions underscores the need for collaborative studies to assess cross-center applicability.
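Because accuracy depends on both the chosen classification cutoff and the class balance, it is usually reported alongside sensitivity and specificity. The sketch below shows how these quantities follow from a confusion matrix once predicted probabilities are dichotomized at a cutoff. It is illustrative only: the 0.5 default cutoff, the function name threshold_metrics, and the deliberately imbalanced toy data are assumptions, not the study's evaluation code or data.

```python
from typing import Dict, Sequence


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity after dichotomizing scores at `threshold`."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Invented, deliberately imbalanced toy data: 1 positive case among 10.
toy_labels = [1] + [0] * 9
toy_scores = [0.3] + [0.1] * 9

if __name__ == "__main__":
    print(threshold_metrics(toy_labels, toy_scores))
```

With this toy data, accuracy is 0.9 even though the single positive case is missed (sensitivity 0.0); lowering the cutoff to 0.2 detects it. This is why a threshold-dependent accuracy figure is most informative when reported together with sensitivity, specificity, and the AUC.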
Conclusion
In this study, we developed and validated a machine learning-based model for predicting term low birth weight (TLBW). The GBM model performed best, with excellent discrimination in both the training and test sets. Our analysis identified seven key predictors: maternal high-risk factors, body mass index, triglycerides, HDL, LDL, albumin, and homocysteine. We also implemented a web-based risk calculator using the Shiny framework to support the early detection of high-risk cases. This tool is intended to facilitate the early identification of TLBW and to support clinical decision-making.
Supplementary Information
Supplementary Material 1. S1 R-language code. [app.R (docx)]
Supplementary Material 2. S2 Dataset of this study. [Dataset (xlsx)]
Abbreviations
- TLBW: Term low birth weight
- SGA: Small for gestational age
- ML: Machine learning
- SHAP: Shapley additive explanations
- AUC: Area under the curve
- DCA: Decision curve analysis
- CI: Confidence interval
- BMI: Body mass index
- HDL: High-density lipoprotein
- LDL: Low-density lipoprotein
Authors’ contributions
C and W contributed to the idea and design. C, W, S and Z collected and analyzed the data. S drew the figures and tables. C and S wrote the draft. C, W, S and Z contributed to manuscript writing and revision. All authors contributed to the article and approved the submitted version.
Funding information
This study received no funding.
Data availability
Data are provided within the manuscript or supplementary information files.
Declarations
Ethics approval and consent to participate
This retrospective study was approved by the Ethics Committee of Shaoxing Maternal and Child Health Hospital in December 2024 (approval number IRB-AF/37-2.0). It did not involve animal experiments or human clinical trials. Data were collected from complete medical records and analyzed anonymously. The ethics committee waived the requirement for informed consent. All research methods were performed in accordance with relevant guidelines and regulations.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Sharma D, Shastri S, Sharma P. Intrauterine growth restriction: antenatal and postnatal aspects. Clin Med Insights Pediatr. 2016;10:67–83.
2. Arcangeli T, Thilaganathan B, Hooper R, et al. Neurodevelopmental delay in small babies at term: a systematic review. Ultrasound Obstet Gynecol. 2012;40(3):267–75.
3. Sacchi C, Marino C, Nosarti C, et al. Association of intrauterine growth restriction and small for gestational age status with childhood cognitive outcomes: a systematic review and meta-analysis. JAMA Pediatr. 2020;174(8):772–81.
4. Lee AC, Kozuki N, Cousens S, et al. Estimates of burden and consequences of infants born small for gestational age in low and middle income countries with INTERGROWTH-21st standard: analysis of CHERG datasets. BMJ. 2017;358:j3677.
5. Risnes KR, Vatten LJ, Baker JL, et al. Birthweight and mortality in adulthood: a systematic review and meta-analysis. Int J Epidemiol. 2011;40(3):647–61.
6. Boscarino G, Migliorino R, Carbone G, et al. Biomarkers of neonatal sepsis: where we are and where we are going. Antibiotics. 2023;12:8.
7. McCowan LM, Figueras F, Anderson NH. Evidence-based national guidelines for the management of suspected fetal growth restriction: comparison, consensus, and controversy. Am J Obstet Gynecol. 2018;218(2S):S855–68.
8. Sovio U, White IR, Dacey A, et al. Screening for fetal growth restriction with universal third trimester ultrasonography in nulliparous women in the pregnancy outcome prediction (POP) study: a prospective cohort study. Lancet. 2015;386(10008):2089–97.
9. Shickel B, Tighe PJ, Bihorac A, et al. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2018;22(5):1589–604.
10. Fergus P, Selvaraj M, Chalmers C. Machine learning ensemble modelling to classify caesarean section and vaginal delivery types using cardiotocography traces. Comput Biol Med. 2018;93:7–16.
11. Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19(1):64.
12. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2(10):749–60.
13. ACOG practice bulletin summary. Fetal growth restriction. Obstet Gynecol. 2021;137(2):385–7.
14. Lees CC, Stampalija T, Baschat A, et al. ISUOG practice guidelines: diagnosis and management of small-for-gestational-age fetus and fetal growth restriction. Ultrasound Obstet Gynecol. 2020;56(2):298–312.
15. Martins JG, Biggio JR, Abuhamad A. Society for Maternal-Fetal medicine consult series #52: diagnosis and management of fetal growth restriction: (replaces clinical guideline number 3, April 2012). Am J Obstet Gynecol. 2020;223(4):B2–17.
16. Gul R, Iqbal S, Anwar Z, et al. Pre-pregnancy maternal BMI as predictor of neonatal birth weight. PLoS ONE. 2020;15:e0240748.
17. Bramsved R, Mårild S, Bygdell M, et al. Impact of BMI and smoking in adolescence and at the start of pregnancy on birth weight. BMC Pregnancy Childbirth. 2023;23:206.
18. Xu L, Sheng XJ, Gu LP, et al. Influence of perinatal factors on full-term low-birth-weight infants and construction of a predictive model. World J Clin Cases. 2024;12:5901–7.
19. Getaneh T, Negesse A, Dessie G, et al. The impact of pregnancy induced hypertension on low birth weight in Ethiopia: systematic review and meta-analysis. Ital J Pediatr. 2020;46:174.
20. Geraghty AA, Alberdi G, O’Sullivan EJ, et al. Maternal blood lipid profile during pregnancy and associations with child adiposity: findings from the ROLO study. PLoS ONE. 2016;11:e0161206.
21. Yelverton CA, O’Keeffe LM, Bartels HC, et al. Association between maternal blood lipids during pregnancy and offspring growth trajectories in a predominantly macrosomic cohort: findings from the ROLO longitudinal birth cohort study. Eur J Pediatr. 2023;182:5625–35.
22. Bhowmik B, Siddique T, Majumder A, et al. Maternal BMI and nutritional status in early pregnancy and its impact on neonatal outcomes at birth in Bangladesh. BMC Pregnancy Childbirth. 2019;19:413.
23. Woollett LA, Catov JM, Jones HN. Roles of maternal HDL during pregnancy. Biochimica et Biophysica Acta (BBA). 2022;1867:159106.
24. Herrera E, Ortega-Senovilla H. Disturbances in lipid metabolism in diabetic pregnancy - are these the cause of the problem? Best Pract Res Clin Endocrinol Metab. 2010;24:7.
25. Woollett LA. Maternal cholesterol in fetal development: transport of cholesterol from the maternal to the fetal circulation. Am J Clin Nutr. 2005;11:82.
26. Duttaroy AK. Transport of fatty acids across the human placenta: a review. Prog Lipid Res. 2009;0:48.
27. Zhu SM, Zhang HQ, Li C, et al. Maternal lipid profile during early pregnancy and birth weight: a retrospective study. Front Endocrinol (Lausanne). 2022;13:951871.
28. Wu J, Liu X, Qin C, et al. Effect of maternal serum albumin level on birthweight and gestational age: an analysis of 39200 singleton newborns. Front Endocrinol (Lausanne). 2024;15:1266669.
29. Dhobale M, Chavan P, Kulkarni A, et al. Reduced folate, increased vitamin B(12) and homocysteine concentrations in women delivering preterm. Ann Nutr Metab. 2012;61:7–14.