2021 Apr 29;9:662183. doi: 10.3389/fped.2021.662183

Table 4.

Ten-fold cross-validation results for logistic regression (LR), random forest (RF), and generalized boosted regression (GBM) models for predicting diagnosis, management, and severity.

Classifier	Diagnosis: Sens. (±SD)	Diagnosis: Spec. (±SD)	Management: Sens. (±SD)	Management: Spec. (±SD)	Severity: Sens. (±SD)	Severity: Spec. (±SD)
Random	0.57	0.43	0.62	0.38	0.88	0.12
AS or PAS ≥ 4 and appendix diameter ≥ 6 mm	0.91	0.73	—	—	—	—
Suspected diagnosis	1.00	0.46	—	—	—	—
LR (full)	0.88 (±0.06)	0.76 (±0.11)	0.85 (±0.09)	0.82 (±0.09)	0.93 (±0.05)	0.42 (±0.32)
LR (w/o US)	0.75 (±0.06)	0.72 (±0.09)	0.92 (±0.07)	0.85 (±0.05)	0.95 (±0.04)	0.52 (±0.29)
LR (w/o peritonitis/abdominal guarding)	0.87 (±0.07)	0.76 (±0.12)	0.84 (±0.10)	0.68 (±0.15)	0.94 (±0.05)	0.40 (±0.36)
LR (w/o US and peritonitis/abdominal guarding)	0.77 (±0.06)	0.67 (±0.11)	0.82 (±0.06)	0.63 (±0.07)	0.97 (±0.05)	0.44 (±0.34)
RF (full)	0.91 (±0.03)	0.86 (±0.08)	0.94 (±0.07)	0.80 (±0.09)	0.98 (±0.02)	0.45 (±0.16)
RF (w/o US)	0.81 (±0.07)	0.71 (±0.07)	0.93 (±0.07)	0.82 (±0.07)	0.97 (±0.02)	0.44 (±0.13)
RF (w/o peritonitis/abdominal guarding)	0.91 (±0.04)	0.90 (±0.06)	0.86 (±0.07)	0.65 (±0.18)	0.98 (±0.02)	0.37 (±0.17)
RF (w/o US and peritonitis/abdominal guarding)	0.79 (±0.06)	0.64 (±0.11)	0.81 (±0.06)	0.56 (±0.06)	0.98 (±0.02)	0.40 (±0.15)
GBM (full)	0.93 (±0.02)	0.86 (±0.07)	0.93 (±0.07)	0.86 (±0.07)	0.97 (±0.02)	0.46 (±0.18)
GBM (w/o US)	0.80 (±0.07)	0.74 (±0.11)	0.91 (±0.08)	0.85 (±0.05)	0.97 (±0.03)	0.44 (±0.16)
GBM (w/o peritonitis/abdominal guarding)	0.92 (±0.04)	0.83 (±0.09)	0.88 (±0.04)	0.66 (±0.11)	0.97 (±0.03)	0.47 (±0.20)
GBM (w/o US and peritonitis/abdominal guarding)	0.80 (±0.06)	0.61 (±0.10)	0.82 (±0.07)	0.59 (±0.09)	0.97 (±0.03)	0.47 (±0.19)

Results are reported as mean sensitivities (sens.) and specificities (spec.) with standard deviations (±SD) across the 10 folds. “Full” models use all predictors; models “w/o US” were trained without ultrasonographic findings; models “w/o peritonitis/abdominal guarding” were trained without the “peritonitis/abdominal guarding” predictor; and models “w/o US and peritonitis/abdominal guarding” were trained without either ultrasonographic findings or the “peritonitis/abdominal guarding” predictor. For all classifiers, a probability threshold of 0.5 was used to assign the predicted class. “Random” corresponds to a random guess and serves as a naïve baseline. Bold values correspond to the best average performances achieved across all models.
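The per-fold summary described in the note can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `sens_spec` and `summarize_folds` and the toy data are hypothetical. Two details are assumptions, since the source does not state them: a probability exactly equal to 0.5 is treated as the positive class, and the sample standard deviation (n - 1 denominator) is used.

```python
from statistics import mean, stdev


def sens_spec(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels and predicted probabilities.

    Assumption: a probability equal to the threshold counts as positive.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against folds that contain no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def summarize_folds(folds, threshold=0.5):
    """Average sensitivity and specificity over held-out folds.

    `folds` is a list of (y_true, y_prob) pairs, one per held-out fold.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    per_fold = [sens_spec(y, p, threshold) for y, p in folds]
    sens = [s for s, _ in per_fold]
    spec = [s for _, s in per_fold]
    return {
        "sens_mean": mean(sens),
        "sens_sd": stdev(sens),
        "spec_mean": mean(spec),
        "spec_sd": stdev(spec),
    }


# Toy data with two folds only; the table above uses 10 folds.
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]),  # sens 0.5, spec 0.5
    ([1, 1, 0, 0], [0.8, 0.7, 0.1, 0.3]),  # sens 1.0, spec 1.0
]
summary = summarize_folds(toy_folds)
print(f"Sens. {summary['sens_mean']:.2f} (±{summary['sens_sd']:.2f})")  # Sens. 0.75 (±0.35)
```

With few positives or negatives per fold, a single misclassified case shifts the per-fold rate substantially, which is one plausible reading of the large specificity SDs in the Severity columns.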