Table 3. Data processing and machine learning modeling.
| Study | Preprocessing: missing data management | Preprocessing: class imbalance | Model | Dominant model | Evaluation metrics | Analysis software and package | Findings |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Weber et al, 2018 [15] | MICEa | —b | Super learning approach using logistic regression, random forest, K-nearest neighbors, LRc (LASSOd, ridge, and an elastic net) | No difference between models | Sensitivity, specificity, PVPe, PVNf, and AUCg | RStudio (version 3.3.2), SuperLearner package | AUC=0.67, sensitivity=0.61, specificity=0.64 |
| Rawashdeh et al, 2020 [16] | Instances with missing values were removed manually | SMOTEh | Locally weighted learning, Gaussian process, K-star classifier, linear regression, K-nearest neighbor, decision tree, random forest, neural network | Random forest | Accuracy, sensitivity, specificity, AUC, and G-means | WEKAi (version 3.9) | Random forest: G-mean=0.96, sensitivity=1.00, specificity=0.94, accuracy=0.95, AUC=0.98 (oversampling ratio of 200%) |
| Gao et al, 2019 [17] | — | Control group was undersampled | RNNsj, long short-term memory network, logistic regression, SVMk, gradient boosting | RNN ensemble models on balanced data | Sensitivity, specificity, PVP, and AUC | — | AUC=0.827, sensitivity=0.965, specificity=0.698, PVP=0.033 |
| Lee and Ahn, 2019 [18] | — | — | ANNl, logistic regression, decision tree, naïve Bayes, random forest, SVM | No difference between models | Accuracy | Python (version 3.5.2) | No difference in accuracy between the ANN (0.9115) and logistic regression or random forest (0.9180 and 0.8918, respectively) |
| Woolery and Grzymala-Busse, 1994 [19] | — | — | LERSm | — | Accuracy | ID3n, LERS CONCLUS | Database 1: accuracy=88.8% for both low-risk and high-risk pregnancy. Database 2: accuracy=59.2% in high-risk pregnant women. Database 3: accuracy=53.4% |
| Grzymala-Busse and Woolery, 1994 [20] | — | — | LERS based on the bucket brigade algorithm of genetic algorithms and enhanced by partial matching | — | Accuracy | LERS | Accuracy=68% to 90% |
| Vovsha et al, 2014 [21] | — | Oversampling techniques (ADASYN) | SVMs with linear and nonlinear kernels, LR (forward selection, stepwise selection, L1 LASSO regression, and elastic net regression) | — | Sensitivity, specificity, and G-means | RStudio, glmnet package | SVM: sensitivity (0.404 to 0.594), specificity (0.621 to 0.84), G-mean (0.575 to 0.652); LR: sensitivity (0.502 to 0.591), specificity (0.587 to 0.731), G-mean (0.586 to 0.604) |
| Esty et al, 2018 [22] | Imputation with the missForest package in R | Not clear | Hybrid C5.0 decision tree–ANN classifier | — | Sensitivity, specificity, and ROCo | R software, missForest package, FANNp library | Sensitivity: 84.1% to 93.4%, specificity: 70.6% to 76.9%, AUC: 78.5% to 89.4% |
| Frize et al, 2011 [23] | Decision tree | — | Hybrid decision tree–ANN | — | Sensitivity, specificity, ROC for Pq and NPr cases | See5, MATLAB Neural Ware tool | Training (P: sensitivity=66%, specificity=83%, AUC=0.81; NP: sensitivity=62.8%, specificity=71.7%, AUC=0.72), test (P: sensitivity=66.3%, specificity=83.9%, AUC=0.80; NP: sensitivity=65%, specificity=71.3%, AUC=0.73), and verification (P: sensitivity=61.4%, specificity=83.3%, AUC=0.79; NP: sensitivity=65.5%, specificity=71.1%, AUC=0.73) |
| Goodwin and Maher, 2000 [24] | PVRuleMiner or FactMiner | — | Neural networks, LR, CARTs, and software programs called PVRuleMiner and FactMiner | No difference between models | ROC | Custom data mining software (Clinical Miner, PVRuleMiner, and FactMiner) | No significant difference between techniques. Neural network (AUC=0.68), stepwise LR (AUC=0.66), CART (AUC=0.65), FactMiner (demographic features only; AUC=0.725), FactMiner (demographic plus other indicator features; AUC=0.757) |
| Tran et al, 2016 [3] | — | Undersampling of the majority class | SSLRt, RGBu | — | Sensitivity, specificity, NPVv, PVP, F-measure, and AUC | — | SSLR: sensitivity=0.698 to 0.734, specificity=0.643 to 0.732, F-measure=0.70 to 0.73, AUC=0.764 to 0.791, NPV=0.96 to 0.719, PVP=0.679 to 0.731; RGB: sensitivity=0.621 to 0.720, specificity=0.74 to 0.841, F-measure=0.693 to 0.732, NPV=0.675 to 0.717, PVP=0.783 to 0.743, AUC=0.782 to 0.807 |
| Koivu and Sairanen, 2020 [9] | — | — | LR, ANN, LGBMw, deep neural network, SELUx network, average ensemble, and weighted average (WAy) ensemble | — | AUC | RStudio (version 3.5.1) and Python (version 3.6.9) | AUC for classifiers: LR=0.62 to 0.64; deep neural network: 0.63 to 0.66; SELU network: 0.64 to 0.67; LGBM: 0.64 to 0.67; average ensemble: 0.63 to 0.67; WA ensemble: 0.63 to 0.67 |
| Khatibi et al, 2019 [25] | Map phase module | — | Decision trees, SVMs, random forests, and ensemble classifiers | — | Accuracy and AUC | — | Accuracy=81% and AUC=68% |
aMICE: Multiple Imputation by Chained Equations.
bNot reported in the study.
cLR: linear regression.
dLASSO: least absolute shrinkage and selection operator.
ePVP: predictive value positive.
fPVN: predictive value negative.
gAUC: area under the ROC curve.
hSMOTE: Synthetic Minority Oversampling Technique.
iWEKA: Waikato Environment for Knowledge Analysis.
jRNN: recurrent neural network.
kSVM: support vector machine.
lANN: artificial neural network.
mLERS: learning from examples based on rough sets.
nID3: iterative dichotomiser 3.
oROC: receiver operating characteristic.
pFANN: Fast Artificial Neural Network.
qP: parous.
rNP: nulliparous.
sCART: classification and regression tree.
tSSLR: stabilized sparse logistic regression.
uRGB: randomized gradient boosting.
vNPV: negative predictive value.
wLGBM: Light Gradient Boosting Machine.
xSELU: scaled exponential linear unit.
yWA: weighted average.
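
Several of the studies above handle missing data with chained-equations or forest-based imputation (MICE in Weber et al [15]; missForest in Esty et al [22]). For reference only, the sketch below shows a MICE-like imputation in Python using scikit-learn's IterativeImputer; this is an approximation of those R packages, not the authors' code, and the data are synthetic.

```python
# Minimal sketch of chained-equations ("MICE-like") imputation.
# Assumption: scikit-learn stands in for the MICE/missForest R packages
# that the reviewed studies actually used; the data below are synthetic.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of entries

# The default IterativeImputer regresses each feature on the others in turn
# (MICE-like); using a random forest estimator mimics the missForest idea.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)
print("remaining NaNs:", int(np.isnan(X_imputed).sum()))
```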
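Class imbalance is the other recurring preprocessing step (SMOTE in Rawashdeh et al [16]; undersampling of the majority class in Gao et al [17] and Tran et al [3]), and several studies report the G-mean alongside sensitivity, specificity, and AUC. The following is a minimal Python sketch of that pattern, assuming scikit-learn, imbalanced-learn, and a synthetic data set; it is not the WEKA pipeline used in [16].

```python
# Minimal sketch: SMOTE oversampling + random forest, evaluated with
# sensitivity, specificity, G-mean, and AUC. Data set, split, and
# hyperparameters are illustrative assumptions, not values from the studies.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

def gmean_sens_spec(y_true, y_pred):
    # Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP),
    # G-mean = sqrt(sensitivity * specificity).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return np.sqrt(sens * spec), sens, spec

# Synthetic imbalanced outcome (about 10% positive class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Oversample only the training fold so synthetic cases never reach the test set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_res, y_res)
gmean, sens, spec = gmean_sens_spec(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"G-mean={gmean:.2f} sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```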
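Finally, Weber et al [15] combined base learners with a super learning approach using the R SuperLearner package. As a rough stand-in, the sketch below stacks logistic regression, random forest, and K-nearest neighbors with scikit-learn's StackingClassifier; the library, base learners, and hyperparameters are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of a stacked ("super learning"-style) ensemble in Python.
# StackingClassifier fits base learners, builds out-of-fold predictions via
# cross-validation, and trains a meta-learner on them; this approximates,
# but is not identical to, the R SuperLearner procedure used in [15].
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                          # out-of-fold predictions feed the meta-learner
    stack_method="predict_proba",  # stack class probabilities, not hard labels
)
stack.fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]), 3))
```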