Indian Journal of Community Medicine: Official Publication of Indian Association of Preventive & Social Medicine. 2025 Feb 27;50(Suppl 1):S134–S139. doi: 10.4103/ijcm.ijcm_247_24

Predictive Modelling of Low Birth Weight in Pregnancies: A Comparative Analysis of Logistic Regression and Decision Tree Approaches

Ravi Kumar 1, Abhinav Bahuguna 1, Palak Goyal 1, Richa Mishra 1, Huma Khan 1, Amit Kumar 1
PMCID: PMC12430836  PMID: 40949519

Abstract

Background:

Birth weight plays a vital role in an infant’s comprehensive development. Low birth weight (LBW) infants may go through several kinds of health complications in the early stages of their lives. This paper is an attempt to identify the predictors that significantly influence the likelihood of LBW through a model-based approach.

Methodology:

Data for this hospital-based cross-sectional study cover 130 pregnant women during the years 2022–2023. We applied logistic regression and the decision tree method to predict LBW in pregnancies. The performance of these predictive models was assessed using the receiver operating characteristic (ROC) curve.

Results:

The findings revealed a 38.5% prevalence of LBW in pregnancies. Age of mother, abortion, presence of co-morbidities, pregnancy complications, and gestational age were identified as significant predictors (P < 0.05) of LBW through logistic regression. The areas under the ROC curve for logistic regression (AUC = 0.881) and the decision tree (AUC = 0.814) indicate that both fitted models have good discrimination ability.

Conclusions:

Logistic regression has better accuracy than the decision tree model. The decision tree excels at capturing patterns but may overfit and hence should be used with caution. This study highlights the need for targeted policy implementation on maternal and childhood care to reduce the risk of LBW.

Keywords: Low birth weight, logistic regression, decision tree, receiver operating characteristic curve, area under the ROC curve

INTRODUCTION

Low birth weight (LBW) poses a significant challenge not only for maternal health but also for the overall health and development of children. A newborn with low weight has a high chance of dying in the early days of life. According to the World Health Organization (WHO), an LBW newborn is one weighing less than 2500 grams (5.5 pounds) at birth. Factors responsible for LBW include maternal malnutrition, health problems (diabetes, high blood pressure, infections), and risk factors such as smoking, alcohol consumption, and medical complications. Globally, more than 20 million infants (approximately 15%–20%) are born with LBW.[1,2]

In the prediction of LBW, maternal and pregnancy-related factors play a crucial role in determining how often it occurs. Understanding these factors is essential for building any predictive model. Such models are usually capable of identifying high-risk pregnancy factors and, hence, facilitating timely treatment. In this paper, we aim to apply logistic regression and a decision tree approach to the prediction of LBW. Logistic regression is a robust statistical method for analysing a binary outcome variable using one or more explanatory variables (predictors).[3] A decision tree, on the other hand, is a machine learning approach used for both classification and regression. This technique is well suited to selecting the best feature variables by identifying patterns within the dataset.[4] Some epidemiological and clinical studies have applied tree-based approaches to the study of disease progression.[5,6]

The causes of LBW in infants are interdependent, and therefore, studying socio-economic factors and maternal health can be helpful in reducing complications like child mortality. Key determinants such as maternal nutritional status, daily physical activity, age of mother, birth order, extreme prematurity, and antenatal steroids can affect an infant’s health.[7,8] The ultimate purpose of this study is therefore to elicit the distribution of key demographic variables for the prediction of LBW.

METHODOLOGY

Data and sample characteristics

This hospital-based cross-sectional study was conducted on pregnant women who delivered at the hospital of Sri Ram Murti Smarak Institute of Medical Sciences, Bareilly, Uttar Pradesh. The study was carried out over a period of 2 months (March–April 2024) using secondary data retrieved retrospectively for 2022 to 2023. Simple random sampling was applied to ensure a representative sample. Using Cochran’s formula, $n = z^2 P(1-P)/d^2$, with the z value for a two-sided test at the 95% confidence level, a prevalence (P) of LBW of 18% (from NFHS-5), a margin of error (d) of 7%, and a 10% dropout rate, we decided to include 130 pregnant women in this study.
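The sample size arithmetic can be retraced with a short sketch. It assumes z = 1.96 for the 95% confidence level and assumes the 10% dropout allowance is applied as a simple multiplicative inflation; the paper does not state how the allowance was applied, and the function name is illustrative only.

```python
import math


def cochran_sample_size(z: float, p: float, d: float, dropout: float = 0.0) -> int:
    """Cochran's formula n0 = z^2 * p * (1 - p) / d^2, rounded up.

    Assumption: the dropout allowance inflates n0 by a factor (1 + dropout).
    """
    n0 = (z ** 2) * p * (1.0 - p) / (d ** 2)
    return math.ceil(n0 * (1.0 + dropout))


# Inputs quoted in the text: P = 18%, d = 7%; z = 1.96 is assumed for 95% confidence.
base_n = cochran_sample_size(z=1.96, p=0.18, d=0.07)
adjusted_n = cochran_sample_size(z=1.96, p=0.18, d=0.07, dropout=0.10)
print(base_n, adjusted_n)
```

Under these assumptions the requirement comes out slightly below the 130 women included in the study.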

Statistical analysis

We applied logistic regression to predict the binary outcome variable, the presence of LBW (0 for ‘No’ and 1 for ‘Yes’). The explanatory variables include maternal and pregnancy-related characteristics available in both continuous and categorical form. These predictors were selected based on established literature, clinical relevance, and their known association with LBW. Further, to identify the hierarchical structure of predictive factors influencing LBW, we applied a ‘decision tree’, which is a ‘supervised machine learning’ method. It is applied in the outermost regions of the tree and then reverts to its initial stage through a ‘retrogrades return’ process.[9] Model performance was assessed using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve.
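As an illustration of the last metric, the area under the ROC curve can be read as the probability that a randomly chosen LBW case receives a higher predicted score than a randomly chosen non-LBW case. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical toy values, not study data, and the function name is illustrative.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = LBW, 0 = normal birth weight.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values for the two models should be read.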

RESULT

Table 1 shows that the mean haemoglobin level was 11.04 ± 1.85 g/dl for mothers of LBW infants and 10.90 ± 1.63 g/dl for mothers of normal birth weight infants, a difference that was not statistically significant (mean difference 0.141, P = 0.498). Mean body mass index (BMI) was significantly lower for mothers of LBW infants (37.14 ± 6.44 kg/m²) than for mothers of normal birth weight infants (42.04 ± 7.45 kg/m²). Further, 58.0% of LBW infants and 63.8% of normal birth weight infants were delivered through normal vaginal delivery. Conversely, 42.0% of LBW infants were delivered through caesarean section compared to 36.3% of normal birth weight infants (P = 0.512). Among LBW infants, 78.0% were born to primigravida mothers compared to 57.5% of normal birth weight infants (P = 0.049). Furthermore, the presence of complications during pregnancy was significantly higher (P < 0.001) among mothers of LBW infants than among mothers of normal birth weight infants (58.0% vs. 25.0%). Hypertension was significantly associated with LBW (P = 0.014): 30.0% of infants with LBW were born to mothers with hypertension, compared to 12.5% of infants with normal birth weight.

Table 1. Descriptive statistics for characteristics of maternal health and pregnancy-related outcomes by low birth weight status

| Continuous variable | LBW: Yes (Mean ± Std) | LBW: No (Mean ± Std) | Mean difference | P@ |
|---|---|---|---|---|
| Haemoglobin (g/dl) of mother | 11.04 ± 1.85 | 10.90 ± 1.63 | 0.141 | 0.498 |
| BMI (kg/m²) of mother | 37.14 ± 6.44 | 42.04 ± 7.45 | 4.89 | <0.001*** |
| Age of mother (years) | 27.16 ± 4.78 | 26.91 ± 4.21 | 0.247 | 0.277 |
| Frequency of ANC visits | 8.26 ± 2.39 | 7.74 ± 2.09 | 0.523 | 0.065 |
| Abortion | 0.12 ± 0.32 | 0.31 ± 0.56 | -0.193 | <0.001*** |

| Categorical variable | Category | LBW: Yes, n (%) | LBW: No, n (%) | Total, n (%) | P# |
|---|---|---|---|---|---|
| Mode of delivery | Normal vaginal | 29 (58.0%) | 51 (63.8%) | 80 (61.5%) | 0.512 |
| | C section | 21 (42.0%) | 29 (36.3%) | 50 (38.5%) | |
| Gravida | Primigravida | 39 (78.0%) | 46 (57.5%) | 85 (65.4%) | 0.049** |
| | Bigravida | 7 (14.0%) | 25 (31.3%) | 32 (24.6%) | |
| | Multigravida | 4 (8.0%) | 9 (11.3%) | 13 (10.0%) | |
| Parity | Nullipara | 43 (86.0%) | 60 (75.0%) | 103 (79.2%) | 0.121 |
| | Primipara | 3 (6.0%) | 15 (18.8%) | 18 (13.8%) | |
| | Multipara | 4 (8.0%) | 5 (6.3%) | 9 (6.9%) | |
| Religion of mother | Hindu | 27 (54.0%) | 40 (50.0%) | 67 (51.5%) | 0.657 |
| | Muslim | 23 (46.0%) | 40 (50.0%) | 63 (48.5%) | |
| Presence of other comorbidities | No | 28 (56.0%) | 40 (50.0%) | 68 (52.3%) | 0.505 |
| | Yes | 22 (44.0%) | 40 (50.0%) | 62 (47.7%) | |
| Complications during pregnancy | No | 21 (42.0%) | 60 (75.0%) | 81 (62.3%) | <0.001*** |
| | Yes | 29 (58.0%) | 20 (25.0%) | 49 (37.7%) | |
| Gestational age category | Preterm (<37 weeks) | 19 (38.0%) | 5 (6.3%) | 24 (18.5%) | <0.001*** |
| | Term (37-42 weeks) | 29 (58.0%) | 65 (81.3%) | 94 (72.3%) | |
| | Post-term (>42 weeks) | 2 (4.0%) | 10 (12.5%) | 12 (9.2%) | |
| Hypertension | No | 35 (70.0%) | 70 (87.5%) | 105 (80.8%) | 0.014** |
| | Yes | 15 (30.0%) | 10 (12.5%) | 25 (19.2%) | |
| Total | | 50 (100.0%) | 80 (100.0%) | 130 (100.0%) | |

@Independent t test, #Chi square test. **P<0.05, ***P<0.001 (significant). Percentages are calculated column-wise.

The distribution of LBW outcomes among the 130 pregnancies shows that 50 (38.5%) of the pregnancies resulted in LBW. The logistic regression model showed that age of mother, the presence of comorbidities, complications during pregnancy, gestational age (categorical), and abortion are the significant contributors to the prediction of LBW, as presented in Table 2. In particular, mothers with comorbidities and those facing pregnancy complications were found to be at an increased risk of delivering infants with LBW. Notably, gestational age category plays a crucial role: preterm births show a significantly higher likelihood of LBW than term births [Figure 1]. The Nagelkerke R Square statistic is 52.22%, indicating that the model fits the data moderately well. The Akaike information criterion (AIC = 143.54) and Schwarz’s Bayesian information criterion (BIC = 192.29) are reported as measures of the trade-off between the model’s complexity and its fit. The logistic regression model has a high discrimination capacity (area under curve (AUC) = 0.8813), showing that it can effectively distinguish between LBW and non-LBW instances [Figure 2]. The Hosmer–Lemeshow test shows no evidence of lack of fit (χ2 (8) = 10.75, P > 0.05), supporting the model’s calibration [Table 2].
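The odds ratios, standard errors, and confidence limits in Table 2 are linked by the usual Wald interval, exp(B ± z·S.E.), where B is the log of the odds ratio. The sketch below applies it to the reported values for presence of co-morbidities (OR = 7.70, S.E. = 0.88). It assumes the standard error is on the log-odds scale and that the authors' software used Wald limits; neither is stated in the paper, and the function name is illustrative.

```python
import math


def wald_ci_for_odds_ratio(odds_ratio: float, se_log_odds: float, z: float = 1.96):
    """Return (lower, upper) limits of exp(B +/- z * SE), with B = ln(OR).

    Assumption: se_log_odds is the standard error of B, not of the odds ratio.
    """
    b = math.log(odds_ratio)
    return math.exp(b - z * se_log_odds), math.exp(b + z * se_log_odds)


# Reported values for "Presence of co-morbidities" in Table 2.
lower, upper = wald_ci_for_odds_ratio(odds_ratio=7.70, se_log_odds=0.88)
print(round(lower, 2), round(upper, 2))
```

With the rounded inputs the limits land near the published 1.35 and 44.01; the small gap is what one would expect from rounding the standard error to two decimals.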

Table 2. The predictors of low birth weight (LBW) using logistic regression (n=130)

| Predictors | Odds ratio (OR) | S.E. | 95% C.I. for EXP (B): Lower | 95% C.I. for EXP (B): Upper | P |
|---|---|---|---|---|---|
| Haemoglobin (gm/dl) of mother | 1.04 | 0.20 | 0.69 | 1.56 | 0.83 |
| BMI (kg/m²) of mother | 0.99 | 0.00 | 0.99 | 1.00 | 0.38 |
| Age of mother | 1.95 | 0.05 | 0.85 | 1.07 | 0.045** |
| Frequency of ANC visits (minimum) | 1.15 | 0.12 | 0.89 | 1.48 | 0.26 |
| Abortion | 1.08 | 1.28 | 0.08 | 13.34 | 0.025** |
| Mode of delivery (C Section) | 1.01 | 0.81 | 0.20 | 5.01 | 0.98 |
| Gravida | | | | | 0.18 |
| Gravida: Primigravida | 3.79 | 2.82 | 0.01 | 971.17 | 0.63 |
| Gravida: Bi-Gravida | 0.27 | 1.96 | 0.00 | 13.09 | 0.51 |
| Parity | | | | | 0.70 |
| Parity: PrimiPara | 0.17 | 2.12 | 0.00 | 11.15 | 0.40 |
| Parity: MultiPara | 0.50 | 1.82 | 0.01 | 17.96 | 0.70 |
| Religion of mother (Muslim) | 1.38 | 0.50 | 0.51 | 3.68 | 0.51 |
| Presence of co-morbidities | 7.70 | 0.88 | 1.35 | 44.01 | 0.022** |
| Complications during pregnancy | 0.02 | 0.98 | 0.00 | 0.19 | <0.001*** |
| Gestational age category | | | | | 0.001*** |
| Gestational age category: Term | 4.48 | 1.43 | 0.26 | 75.33 | 0.29 |
| Gestational age category: Post-term | 0.24 | 1.37 | 0.01 | 3.57 | 0.30 |
| Hypertension | 0.64 | 0.95 | 0.10 | 4.15 | 0.64 |
| Constant | 8.80 | 3.82 | | | 0.56 |

| Model statistic | Value |
|---|---|
| Nagelkerke R Square | 52.61% |
| AIC | 143.54 |
| BIC | 192.29 |
| Area under ROC curve | 0.88, P<0.001 |
| Hosmer–Lemeshow | χ2 (8)=10.75, P>0.05 |

**P<0.05, ***P<0.001.
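The reported AIC and BIC can be cross-checked against the number of estimated parameters in Table 2. From the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, the difference BIC − AIC equals k(ln n − 2), so k can be recovered from the two criteria and the sample size. The sketch below does this; it assumes the software used these textbook definitions, which the paper does not state, and the function name is illustrative.

```python
import math


def implied_parameter_count(aic: float, bic: float, n: int) -> float:
    """Solve BIC - AIC = k * (ln(n) - 2) for k.

    Assumes AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L.
    """
    return (bic - aic) / (math.log(n) - 2.0)


# Values reported in Table 2, with n = 130 pregnancies.
k = implied_parameter_count(aic=143.54, bic=192.29, n=130)
# Implied -2 log-likelihood, using the nearest whole number of parameters.
neg2_loglik = 143.54 - 2 * round(k)
print(round(k, 2), round(neg2_loglik, 2))
```

The recovered k is very close to 17, which matches the 16 coefficients plus the constant listed in Table 2, so the two criteria are mutually consistent under these definitions.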

Figure 1. Contribution of predictors to LBW based on Logistic regression analysis

Figure 2. Receiver Operating Characteristic (ROC) Curve for prediction of Low Birth Weight

The second approach we considered for the prediction of LBW is the decision tree [Figure 3]. The average log likelihood statistic is lower in the training set (0.27) than in the test set (2.28), which points to a gap between the fit on training data and generalization to unseen data. The lift values are 2.44 and 2.0 in the training and test sets, respectively. The training set also has a lower misclassification cost (0.22 vs. 0.37). The confusion matrix shows a low false-positive rate in both the training and test sets. In terms of misclassification in the training set, the model has a 6.0% error rate for actual events and a 16.3% error rate for non-events, with costs of 0.06 and 0.16, respectively [Table 3].

Figure 3. Contribution of predictors to LBW based on Decision tree analysis

Table 3. Model summary with confusion matrix and misclassification statistics of decision tree model

| Statistic | Training (n1=104) | Test (n2=26) |
|---|---|---|
| Average log likelihood | 0.27 | 2.28 |
| Area under ROC curve | 0.94 | 0.81 |
| 95% CI | (0.79, 1) | (0.73, 0.89) |
| Lift | 2.44 | 2 |
| Misclassification cost | 0.22 | 0.37 |

| Actual class | LBW: Yes | LBW: No | Total |
|---|---|---|---|
| N | 50 | 80 | 130 |
| Predicted Class (Training) | 47 | 13 | 60 |
| Predicted Class (Test) | 42 | 17 | 59 |
| % Correct (Training) | 94 | 83.8 | 87.7 |
| % Correct (Test) | 84 | 78.8 | 80.8 |
| Misclassed (Training) | 3 | 67 | 16 |
| % Error (Training) | 6 | 16.3 | 12.3 |
| Cost (Training) | 0.060 | 0.163 | 0.111 |
| Misclassed (Test) | 8 | 63 | 25 |
| % Error (Test) | 16 | 21.30 | 19.20 |
| Cost (Test) | 0.16 | 0.21 | 0.19 |

Model was trained using 80% of the sample, and 20% was used to test the model performance.
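The percentage-correct rows of Table 3 follow directly from the confusion-matrix counts. The sketch below recomputes sensitivity, specificity, and overall accuracy, reading the "Predicted Class" rows as the number of actual LBW cases predicted as LBW (47 and 42 of 50) and the number of normal birth weight cases predicted as LBW (13 and 17 of 80). That reading of the table layout is an assumption, and the function name is illustrative.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # share of actual LBW cases detected
        "specificity": tn / (tn + fp),   # share of non-LBW cases correctly cleared
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
    }


# Counts read from Table 3 (assumed layout: 50 LBW and 80 non-LBW cases).
training = classification_metrics(tp=47, fn=3, tn=67, fp=13)
test = classification_metrics(tp=42, fn=8, tn=63, fp=17)
for name, metrics in (("training", training), ("test", test)):
    print(name, {key: round(100 * value, 1) for key, value in metrics.items()})
```

Under that reading, the recomputed figures agree with the "% Correct" rows of the table to within rounding.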

DISCUSSION

In this study, the overall prevalence of LBW is about 38.5%, and independent variables such as age of mother, the presence of comorbidities, complications during pregnancy, gestational age category, and abortion are the significant predictors of LBW. A similar study was conducted by Assefa et al.,[10] in which logistic regression identified wealth status and antenatal care (ANC) as significant determinants of LBW, with an incidence of 28.3%. In another study by Zeleke et al.,[11] factors such as first delivery and lack of antenatal care were associated with LBW.

The incidence of LBW in that study, conducted at Gondar University Hospital, was found to be 17.1%. However, some Indian studies have shown that the prevalence of LBW in pregnancies is about 32.8% and that risk factors for LBW include consanguinity, tobacco or smoke consumption, number of ANC visits, supplements like calcium and iron, hypertension, delivery type, pre-eclampsia, and gestational age.[12] The age of the mother and the number of abortions showed a positive association with LBW. This association is supported by studies from developing countries[13] and Nepal.[14]

Further, the decision tree model, though exhibiting effective generalization, reflects concerns voiced in the literature about the possibility of overfitting, underlining the need for caution in its application.[15,16]

Future researchers should develop predictive models of this type to identify risk factors for LBW; for example, lower maternal age may be strongly associated with poor newborn outcomes. A study by Ranjbar et al.[17] has shown that machine learning models such as deep learning and random forest can perform best for predicting LBW. In the past, governments at the national and state level have initiated policies such as Janani Shishu Suraksha Karyakram, Navjaat Shishu Suraksha Karyakram, Home-Based Newborn Care, and the Indian Newborn Action Plan (INAP). These policies have been successful to some extent in improving maternal health and childhood care.[18] The rising burden of LBW can be reduced by combining advanced statistical approaches, which give a better understanding of pregnancy-related issues, with medical intervention, nutrition support and education, prenatal care, quality antenatal services, and improvements in maternal lifestyle.

CONCLUSION

It can be concluded that logistic regression is preferred when the focus is on interpretability, the relationships between variables, and predicting binary outcomes, while decision trees offer a more flexible way of detecting nonlinear relationships and interactions within datasets. In this study, logistic regression performed better, and it highlights that the high prevalence of LBW is associated with various demographic and clinical factors including abortions, the presence of comorbidities, complications during pregnancy, and gestational age. There is a need for early abortion diagnosis, genetic counselling for consanguineous couples, and enhanced prenatal care to reduce the incidence of LBW.

Limitations

The findings of this study are based on data obtained from a single hospital with a small sample size.

Conflicts of interest

There are no conflicts of interest.

Acknowledgement

This paper has emerged out of the project entitled “Comparing the pregnancy outcomes in women with or without congenital or other heart diseases: A record based study” sponsored by Women in Cardiology and Related Sciences (WINCARS) under the scheme Prajwalika Scholarship Program.

Funding Statement

WINCARS Association under the scheme Prajwalika Scholarship Program.

REFERENCES

1. United Nations Children’s Fund (UNICEF), World Health Organization (WHO). UNICEF-WHO low birthweight estimates: Levels and trends 2000–2015. Geneva: World Health Organization; 2019. Licence: CC BY-NC-SA 3.0 IGO. Available from: data.unicef.org/nutrition; who.int/nutrition.
2. Ranjbaran M, Jafary-Manesh H, Sajjadi-Hazaneh L, Eisaabadi S, Talkhabi S, Sadat Khoshniyat A. Prevalence of low birth weight and some associated factors in Markazi province, 2013–2014. World J Med Sci. 2015;12:252–8.
3. Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. Hoboken (NJ): John Wiley and Sons; 2013. doi: 10.1002/9781118548387.
4. Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD. An introduction to decision tree modeling. Journal of Chemometrics. 2004;18:275–85.
5. Werneck GL, De Carvalho DM, Barroso DE, Cook EF, Walker SM. Brief report. Classification trees and logistic regression applied to prognostic studies: A comparison using meningococcal disease as an example. J Trop Pediatr. 1999;45:248–51. doi: 10.1093/tropej/45.4.248.
6. Harper PR. A review and comparison of classification algorithms for medical decision making. Health Policy. 2005;71:315–31. doi: 10.1016/j.healthpol.2004.05.002.
7. Muthayya S. Maternal nutrition and low birth weight-what is really important. Indian J Med Res. 2009;130:600–8.
8. Khan N, Mozumdar A, Kaur S. Determinants of low birth weight in India: An investigation from the National Family Health Survey. Am J Hum Biol. 2020;32:e23355. doi: 10.1002/ajhb.23355.
9. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: A methodology review. J Biomed Inform. 2002;35:352–9. doi: 10.1016/s1532-0464(03)00034-0.
10. Assefa N, Berhane Y, Worku A. Wealth status, mid upper arm circumference (MUAC) and antenatal care (ANC) are determinants for low birth weight in Kersa, Ethiopia. PLoS One. 2012;7:e39957. doi: 10.1371/journal.pone.0039957.
11. Zeleke BM, Zelalem M, Mohammed N. Incidence and correlates of low birth weight at a referral hospital in Northwest Ethiopia. Pan Afr Med J. 2012;12:4.
12. Kumari S, Garg N, Kumar A, Guru PKI, Ansari S, Anwar S, et al. Maternal and severe anaemia in delivering women is associated with risk of preterm and low birth weight: A cross sectional study from Jharkhand, India. One Health. 2019;8:100098. doi: 10.1016/j.onehlt.2019.100098.
13. Mahumud RA, Sultana M, Sarker AR. Distribution and determinants of low birth weight in developing countries. J Prev Med Public Health. 2017;50:18. doi: 10.3961/jpmph.16.087.
14. Bhaskar RK, Deo KK, Neupane U, Chaudhary Bhaskar S, Yadav BK, Pokharel HP, et al. A case control study on risk factors associated with low birth weight babies in Eastern Nepal. Int J Pediatr. 2015;2015:807373. doi: 10.1155/2015/807373.
15. Borson NS, Kabir MR, Zamal Z, Rahman RM. Correlation analysis of demographic factors on low birth weight and prediction modeling using machine learning techniques. Proceedings of 2020 4th World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), London, UK. 2020:169–73. doi: 10.1109/WorldS450073.2020.9210338.
16. Islam Pollob SMA, Abedin MM, Islam MT, Islam MM, Maniruzzaman M. Predicting risks of low birth weight in Bangladesh with machine learning. PLoS One. 2022;17:e0267190. doi: 10.1371/journal.pone.0267190.
17. Ranjbar A, Montazeri F, Farashah MV, Mehrnoush V, Darsareh F, Roozbeh N. Machine learning-based approach for predicting low birth weight. BMC Pregnancy Childbirth. 2023;23:803. doi: 10.1186/s12884-023-06128-w.
18. Bhagat AK, Mehendale AM, Muneshwar KN. Factors associated with low birth weight among the tribal population in India: A narrative review. Cureus. 2024;16:e53478. doi: 10.7759/cureus.53478.

Articles from Indian Journal of Community Medicine: Official Publication of Indian Association of Preventive & Social Medicine are provided here courtesy of Wolters Kluwer -- Medknow Publications
