2023 May 31;25:e44081. doi: 10.2196/44081

Table 1.

Summary of low birthweight (LBW) prediction– and classification-related studies.

Study			Data size (LBW %)		Number of input features		Summary of features		Rebalancing method		Prediction model		Best performance^a		Identified risk factors
Previous studies with small sample sizes
	Yarlapati et al [25], 2017	101 (not reported)		18		Mothers’ predelivery factors: physical, social, medical, and nutritional		None		Bayes minimum error rate		Accuracy: 96.77%; recall: 1.0; AUROC^b: 0.93		Mothers’ living community, age, and weight
	Senthilkumar and Paulraj [20], 2015	189 (31.22)		11		Mothers’ predelivery factors: physical		None		LR^c, NB^d, RF^e, SVM^f, NN^g, and CT^h		CT—accuracy: 89.95%; recall: 0.98; AUROC: 0.94		Last weight before pregnancy, mother’s age, number of physician visits during the first trimester, and number of previous premature labors
	Ahmadi et al [21]^i, 2017	600 (9.5)		17		Baby: sex and delivery method; mothers’ predelivery factors: physical, social, medical, and nutritional		None		RF and LR		RF—accuracy: 95%; recall: 0.72; AUROC: 0.89		Pregnancy age, BMI, and mother’s age
	Desiani et al [26], 2019	219 (not reported)		6		Mothers’ predelivery factors: physical, social, and medical		None		NB		Accuracy (for LBW group): 81.25%; recall (for LBW group): 0.72; AUROC: not reported		None
Previous studies with larger sample sizes without using any rebalancing methods
	Akmal and Razmy [27], 2020	2702 (not reported)		12		Mothers’ predelivery factors: physical and medical		None		BF^j tree, C4.5, RF, random tree, REP^k tree, and logistic model tree		C4.5—accuracy: 79.23%; recall: 1.0; AUROC: not reported		None
	Islam et al [28]^i, 2022	2351 (16.2)		17		Baby: sex, singleton, and delivery method; mothers’ predelivery factors: physical, social, and medical		None		LR and DT^l		LR—accuracy: 87.6%; recall: 1.0; AUROC: 0.59		None
	Borson et al [22], 2020	4498 (not reported)		9		Mothers’ predelivery factors: physical and social		None		LR, NB, KNN^m, RF, SVM, and MLP^n		SVM and MLP—accuracy: 81.67%; recall: 0.82; AUROC: not reported		None
	Faruk and Cahyono [23], 2018	12,055 (<10)		8		Mothers’ predelivery factors: physical and social; fathers’ factors: social		None		LR and RF		RF—accuracy: 93%; recall: unknown; AUROC: 0.51		Top 3 features: mother’s age, time zone, and wealth index
	Eliyati et al [29], 2019	12,055 (<10)		8		Mothers’ predelivery factors: physical and social; fathers’ factors: social		None		SVM		Accuracy: 92.9%; recall: not reported; AUROC: 0.56		None
Previous studies using rebalancing methods
	Loreto et al [30]^i, 2019	2328 (13.45)		8		Baby: sex; mothers’ predelivery factors: physical and medical; mothers’ non-predelivery factor: gestational age		Oversampling		AdaBoost^o, CT, KNN, NB, RF, and SVM		AdaBoost—accuracy: 98%; recall: 0.91; AUROC: not reported		None
	Bekele [31]^i, 2022	2110 (10)		25		Baby: sex and delivery method; mothers’ predelivery factors: physical, social, and medical		SMOTE^p		LR, DT, NB, KNN, RF, SVM, gradient boosting, and XGBoost^q		RF—accuracy: 91.6%; recall: 0.92; AUROC: 0.97		Gender of the child, marriage to birth interval, mother’s occupation, and mother’s age
	Khan et al [32]^i, 2022	821 (10.84)		88		Mothers’ predelivery factors: physical, social, and medical; fathers’ factors: social; mothers’ non-predelivery factor: gestational age		SMOTE		Zero, KNN, NB, kStar, MLP, random tree, SVM, AdaBoost, LR, RF, OneR, stacking, stack, DT, and bagging		LR—accuracy: 90.24%; recall: 0.90 (accuracy on LBW: 33%)		Diabetes, gestational age, and hypertension

^aThe performance measurements are explained in Multimedia Appendix 1 [33].

^bAUROC: area under the receiver operating characteristic curve.

^cLR: logistic regression.

^dNB: naive Bayes.

^eRF: random forest.

^fSVM: support vector machine.

^gNN: neural network.

^hCT: classification tree.

^iThe study applied features from the baby side or the mothers’ non-predelivery features.

^jBF: best first.

^kREP: reduced error pruning.

^lDT: decision tree.

^mKNN: k-nearest neighbors.

^nMLP: multilayer perceptron.

^oAdaBoost: adaptive boosting.

^pSMOTE: synthetic minority oversampling technique.

^qXGBoost: extreme gradient boosting.
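
Several rows in Table 1 pair high accuracy with an AUROC close to 0.5 on data sets where LBW cases are a small minority. The following minimal sketch may help in reading those rows. It assumes a hypothetical cohort of 1000 births with 10% LBW (invented numbers, not data from any listed study) and a trivial model that never predicts LBW; the helper name `accuracy_and_recall` is illustrative only.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 = LBW and 0 = normal weight."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Recall is the share of true LBW cases the model detects.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical cohort (assumption): 100 LBW births and 900 normal-weight births.
y_true = [1] * 100 + [0] * 900

# Trivial majority-class "model": always predicts normal weight.
y_pred = [0] * 1000

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

Under these assumptions the trivial model is 90% accurate yet detects no LBW case, which is why recall and AUROC are reported alongside accuracy and why some of the listed studies rebalance the training data.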