2021 Apr 29;9:662183. doi: 10.3389/fped.2021.662183

Table 3.

Ten-fold cross-validation results for logistic regression (LR), random forest (RF), and generalized boosted regression (GBM) models for predicting diagnosis, management, and severity.

Classifier	Diagnosis AUROC (±SD)	Diagnosis AUPR (±SD)	Management AUROC (±SD)	Management AUPR (±SD)	Severity AUROC (±SD)	Severity AUPR (±SD)
Random	0.50	0.43	0.50	0.38	0.50	0.12
AS	0.75	0.71	—	—	—	—
PAS	0.71	0.67	—	—	—	—
AS or PAS ≥ 4 and appendix diameter ≥ 6 mm	0.79	0.83	—	—	—	—
Suspected diagnosis	0.73	0.85	—	—	—	—
LR (full)	0.91 (±0.04)	0.88 (±0.07)	0.90 (±0.04)	0.88 (±0.06)	0.82 (±0.13)	0.53 (±0.26)
LR (w/o US)	0.82 (±0.06)	0.71 (±0.12)	0.91 (±0.04)	0.90 (±0.05)	0.91 (±0.09)	0.69 (±0.26)
LR (w/o peritonitis/abdominal guarding)	0.90 (±0.04)	0.87 (±0.06)	0.83 (±0.04)	0.79 (±0.06)	0.82 (±0.15)	0.58 (±0.28)
LR (w/o US and peritonitis/abdominal guarding)	0.77 (±0.06)	0.67 (±0.14)	0.80 (±0.04)	0.77 (±0.06)	0.81 (±0.16)	0.62 (±0.26)
RF (full)	0.96 (±0.01)	0.94 (±0.03)	0.94 (±0.02)	0.92 (±0.05)	0.90 (±0.08)	0.70 (±0.17)
RF (w/o US)	0.85 (±0.05)	0.77 (±0.11)	0.93 (±0.03)	0.90 (±0.07)	0.90 (±0.08)	0.67 (±0.18)
RF (w/o peritonitis/abdominal guarding)	0.95 (±0.01)	0.93 (±0.05)	0.85 (±0.07)	0.79 (±0.11)	0.88 (±0.10)	0.65 (±0.23)
RF (w/o US and peritonitis/abdominal guarding)	0.80 (±0.06)	0.73 (±0.11)	0.78 (±0.05)	0.70 (±0.10)	0.86 (±0.10)	0.58 (±0.21)
GBM (full)	0.96 (±0.02)	0.94 (±0.03)	0.94 (±0.02)	0.93 (±0.04)	0.90 (±0.07)	0.64 (±0.21)
GBM (w/o US)	0.85 (±0.06)	0.75 (±0.10)	0.92 (±0.04)	0.90 (±0.05)	0.91 (±0.07)	0.60 (±0.25)
GBM (w/o peritonitis/abdominal guarding)	0.95 (±0.02)	0.92 (±0.05)	0.87 (±0.05)	0.82 (±0.08)	0.84 (±0.13)	0.58 (±0.25)
GBM (w/o US and peritonitis/abdominal guarding)	0.79 (±0.06)	0.71 (±0.11)	0.79 (±0.07)	0.72 (±0.08)	0.84 (±0.12)	0.55 (±0.27)

Results are reported as the mean area under the receiver operating characteristic curve (AUROC) and under the precision-recall curve (AUPR), with standard deviations (SD) across the 10 folds. “Full” models use all predictors; models “w/o US” were trained without ultrasonographic findings; models “w/o peritonitis/abdominal guarding” were trained without the “peritonitis/abdominal guarding” predictor; and models “w/o US and peritonitis/abdominal guarding” were trained without either. For fixed classification rules, such as the Alvarado score (AS) and the pediatric appendicitis score (PAS), AUROC and AUPR are computed on the whole dataset and reported without standard deviations. “Random” corresponds to random guessing and serves as a naïve baseline, for which the expected AUROC and AUPR are reported. Bold values correspond to the best average performance achieved across all models.
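
The evaluation protocol described in this note (per-fold AUROC and AUPR, summarized as mean and SD over 10 folds) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the mean-difference scorer is a hypothetical stand-in for the LR, RF, and GBM models, the fold assignment is a simple stratified split, AUPR is approximated by step-wise average precision, and the use of the sample SD is an assumption. All function names are illustrative.

```python
import random
import statistics

def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (tied scores are not grouped)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision at each rank where a positive is retrieved
    return total / n_pos

def stratified_folds(y_true, k, rng):
    """Deal shuffled positives and negatives round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_mean_difference(X, y):
    """Toy stand-in model: weights = mean(positives) - mean(negatives)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]

def predict_scores(w, X):
    """Linear score; higher means more likely positive."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def cross_validate(X, y, k=10, seed=0):
    """Return (mean, SD) of AUROC and of AUPR over k held-out folds."""
    folds = stratified_folds(y, k, random.Random(seed))
    aurocs, auprs = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w = fit_mean_difference([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_te = [y[i] for i in test_idx]
        s_te = predict_scores(w, [X[i] for i in test_idx])
        aurocs.append(auroc(y_te, s_te))
        auprs.append(average_precision(y_te, s_te))
    return ((statistics.mean(aurocs), statistics.stdev(aurocs)),
            (statistics.mean(auprs), statistics.stdev(auprs)))

# Synthetic data (assumption): 300 cases, 40% positive, two noisy predictors.
rng = random.Random(42)
y = [1] * 120 + [0] * 180
X = [[rng.gauss(1.0 * t, 1.0), rng.gauss(0.5 * t, 1.0)] for t in y]

(auroc_mean, auroc_sd), (aupr_mean, aupr_sd) = cross_validate(X, y, k=10)
print(f"AUROC {auroc_mean:.2f} (±{auroc_sd:.2f})  AUPR {aupr_mean:.2f} (±{aupr_sd:.2f})")

# A random scorer has expected AUROC 0.5 and expected AUPR equal to the
# prevalence of the positive class.
prevalence = sum(y) / len(y)
print(f"Random baseline: AUROC 0.50, AUPR {prevalence:.2f}")
```

Because the expected AUPR of a random scorer equals the positive-class prevalence, the "Random" AUPR values in the table (0.43, 0.38, and 0.12) differ by target while the AUROC baseline stays at 0.50. AUPR values should therefore be compared against the baseline of the same target rather than across targets.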