Table 1.
Summary of low birthweight (LBW) prediction- and classification-related studies.
| Study | Sample size (% LBW) | Number of input features | Summary of features | Rebalancing method | Prediction model | Best performance^a | Identified risk factors |
|---|---|---|---|---|---|---|---|
| **Previous studies with small sample sizes** | | | | | | | |
| Yarlapati et al [25], 2017 | 101 (not reported) | 18 | Mothers’ predelivery factors: physical, social, medical, and nutritional | None | Bayes minimum error rate | Accuracy: 96.77%; recall: 1.0; AUROC^b: 0.93 | Mothers’ living community, age, and weight |
| Senthilkumar and Paulraj [20], 2015 | 189 (31.22) | 11 | Mothers’ predelivery factors: physical | None | LR^c, NB^d, RF^e, SVM^f, NN^g, and CT^h | CT (accuracy: 89.95%; recall: 0.98; AUROC: 0.94) | Last weight before pregnancy, mother’s age, number of physician visits during the first trimester, and number of previous premature labors |
| Ahmadi et al [21]^i, 2017 | 600 (9.5) | 17 | Baby: sex and delivery method; mothers’ predelivery factors: physical, social, medical, and nutritional | None | RF and LR | RF (accuracy: 95%; recall: 0.72; AUROC: 0.89) | Pregnancy age, BMI, and mother’s age |
| Desiani et al [26], 2019 | 219 (not reported) | 6 | Mothers’ predelivery factors: physical, social, and medical | None | NB | Accuracy (LBW group): 81.25%; recall (LBW group): 0.72; AUROC: not reported | None |
| **Previous studies with larger sample sizes that did not use rebalancing methods** | | | | | | | |
| Akmal and Razmy [27], 2020 | 2702 (not reported) | 12 | Mothers’ predelivery factors: physical and medical | None | BF^j tree, C4.5, RF, random tree, REP^k tree, and logistic model tree | C4.5 (accuracy: 79.23%; recall: 1.0; AUROC: not reported) | None |
| Islam et al [28]^i, 2022 | 2351 (16.2) | 17 | Baby: sex, singleton, and delivery method; mothers’ predelivery factors: physical, social, and medical | None | LR and DT^l | LR (accuracy: 87.6%; recall: 1.0; AUROC: 0.59) | None |
| Borson et al [22], 2020 | 4498 (not reported) | 9 | Mothers’ predelivery factors: physical and social | None | LR, NB, KNN^m, RF, SVM, and MLP^n | SVM and MLP (accuracy: 81.67%; recall: 0.82; AUROC: not reported) | None |
| Faruk and Cahyono [23], 2018 | 12,055 (<10) | 8 | Mothers’ predelivery factors: physical and social; fathers’ factors: social | None | LR and RF | RF (accuracy: 93%; recall: not reported; AUROC: 0.51) | Top 3 features: mother’s age, time zone, and wealth index |
| Eliyati et al [29], 2019 | 12,055 (<10) | 8 | Mothers’ predelivery factors: physical and social; fathers’ factors: social | None | SVM | Accuracy: 92.9%; recall: not reported; AUROC: 0.56 | None |
| **Previous studies using rebalancing methods** | | | | | | | |
| Loreto et al [30]^i, 2019 | 2328 (13.45) | 8 | Baby: sex; mothers’ predelivery factors: physical and medical; mothers’ non-predelivery factor: gestational age | Oversampling | AdaBoost^o, CT, KNN, NB, RF, and SVM | AdaBoost (accuracy: 98%; recall: 0.91; AUROC: not reported) | None |
| Bekele [31]^i, 2022 | 2110 (10) | 25 | Baby: sex and delivery method; mothers’ predelivery factors: physical, social, and medical | SMOTE^p | LR, DT, NB, KNN, RF, SVM, gradient boosting, and XGBoost^q | RF (accuracy: 91.6%; recall: 0.92; AUROC: 0.97) | Sex of the child, marriage-to-birth interval, mother’s occupation, and mother’s age |
| Khan et al [32]^i, 2022 | 821 (10.84) | 88 | Mothers’ predelivery factors: physical, social, and medical; fathers’ factors: social; mothers’ non-predelivery factor: gestational age | SMOTE | ZeroR, KNN, NB, kStar, MLP, random tree, SVM, AdaBoost, LR, RF, OneR, stacking, stack, DT, and bagging | LR (accuracy: 90.24%; recall: 0.90; accuracy on the LBW class: 33%) | Diabetes, gestational age, and hypertension |
^a The performance measurements are explained in Multimedia Appendix 1 [33].
^b AUROC: area under the receiver operating characteristic curve.
^c LR: logistic regression.
^d NB: naive Bayes.
^e RF: random forest.
^f SVM: support vector machine.
^g NN: neural network.
^h CT: classification tree.
^i The study used baby-related features or mothers’ non-predelivery features.
^j BF: best first.
^k REP: reduced error pruning.
^l DT: decision tree.
^m KNN: k-nearest neighbors.
^n MLP: multilayer perceptron.
^o AdaBoost: adaptive boosting.
^p SMOTE: synthetic minority oversampling technique.
^q XGBoost: extreme gradient boosting.
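The *Best performance* column mixes accuracy, recall, and AUROC, and several studies without rebalancing report high accuracy alongside near-chance AUROC (eg, 93% accuracy with an AUROC of 0.51). The Python sketch below illustrates how this can happen at roughly 10% LBW prevalence. It uses made-up labels and a trivial baseline that always predicts normal birthweight, and it is not the evaluation code of any cited study. The sketch also shows that a class-frequency-weighted average of per-class recall is algebraically equal to accuracy. This is one possible, unconfirmed explanation for how an overall recall near 0.90 can coexist with poor performance on the LBW class. All function names and data are hypothetical.

```python
"""Accuracy, recall, and AUROC on imbalanced labels (illustrative sketch only)."""


def accuracy(y_true, y_pred):
    # Fraction of all predictions that are correct.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of samples truly in class `positive` that are predicted as that class.
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds) / len(preds)


def weighted_recall(y_true, y_pred):
    # Per-class recall averaged with weights equal to class frequency.
    # Algebraically this equals overall accuracy.
    n = len(y_true)
    return sum(recall(y_true, y_pred, c) * y_true.count(c) / n for c in set(y_true))


def auroc(y_true, scores):
    # AUROC as the probability that a random positive (LBW) sample scores higher
    # than a random negative sample (Mann-Whitney form); ties count as one half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 10 LBW births (label 1) and 90 normal births (label 0).
y_true = [1] * 10 + [0] * 90

# Baseline that ignores all features and always predicts "normal birthweight".
y_pred = [0] * 100
scores = [0.0] * 100  # constant score, so the model cannot rank LBW cases higher

print(f"accuracy        = {accuracy(y_true, y_pred):.2f}")         # 0.90
print(f"recall (LBW)    = {recall(y_true, y_pred):.2f}")           # 0.00
print(f"weighted recall = {weighted_recall(y_true, y_pred):.2f}")  # 0.90
print(f"AUROC           = {auroc(y_true, scores):.2f}")            # 0.50
```

Because the baseline never flags an LBW birth, its 90% accuracy reflects only the class ratio. Recall on the LBW class and AUROC expose this directly.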
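Two of the rebalancing studies used SMOTE, and one used oversampling. The Python sketch below contrasts plain random oversampling, which duplicates existing LBW rows, with a simplified SMOTE. The SMOTE version creates a new point at a random position on the segment between an LBW sample and one of its k nearest LBW neighbors. It assumes numeric, already-scaled feature vectors and picks seed samples at random, which is a simplification of the original per-sample scheme. The table does not state which oversampling variant Loreto et al used, so the duplication shown here is an assumption. Published work typically relies on a library implementation such as imbalanced-learn's `imblearn.over_sampling.SMOTE` rather than hand-written code. The function names and toy points are hypothetical.

```python
"""Random oversampling vs a simplified SMOTE (illustrative sketch only)."""
import math
import random


def random_oversample(minority, n_extra, seed=0):
    # Plain random oversampling: duplicate existing minority rows with replacement.
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_extra)]


def smote(minority, n_synthetic, k=5, seed=0):
    # Simplified SMOTE: pick a random minority sample, pick one of its k nearest
    # minority neighbors, and place a new point at a random position on the
    # line segment between them.
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Euclidean k nearest neighbors among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical, already-scaled 2-feature vectors for LBW (minority) births.
lbw = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0), (2.5, 2.5)]

duplicates = random_oversample(lbw, n_extra=5)
new_points = smote(lbw, n_synthetic=5, k=3)
print("duplicated:", duplicates)
print("synthetic:", [tuple(round(v, 2) for v in p) for p in new_points])
```

With either method, rebalancing is usually applied only to the training split (or training folds), so that the evaluation data keep the original LBW prevalence and the reported metrics are not inflated by synthetic or duplicated cases.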