. 2022 Nov 24;22:1415. doi: 10.1186/s12913-022-08748-y

Table 2.

Area Under the Curve (AUC) of models generated with Python, where the Word2Vec features are the sum of the numeric vectors of the last 25 codes

Validation and Test AUCs of Different Models and Feature Configurations
Model	Features	Average Cross-Validation AUC (St. Dev.)	Test AUC
Logistic Regression (LR)	Manual	0.7612 (0.004123)	0.747
	Word2Vec	0.7470 (0.005600)	0.757
	Manual and Word2Vec	0.7862 (0.005758)	0.783
Gradient Boosting Machine (GBM)	Manual	0.8037 (0.004001)	0.804
	Word2Vec	0.7700 (0.005138)	0.768
	Manual and Word2Vec	0.8138 (0.004534)	0.813
	Manual and Word2Vec^†	0.8249 (0.004549)	0.826
Logistic Regression (LR)	LACE	0.6548 (0.006444)	0.655

^†GBM with manually selected training parameters (see Section Methods - Model Training)