Skip to main content
. 2022 Nov 24;22:1415. doi: 10.1186/s12913-022-08748-y

Table 2.

Area Under the Curve (AUC) of models generated with Python, where the Word2Vec features are the sum of the numeric vectors of the last 25 codes

Validation and Test AUCs of Different Models and Feature Configurations
Model Features Average Cross-Validation AUC (St. Dev.) Test AUC
Logistic Regression (LR) Manual 0.7612 (0.004123) 0.747
Word2Vec 0.7470 (0.005600) 0.757
Manual and Word2Vec 0.7862 (0.005758) 0.783
Gradient Boosting Machine (GBM) Manual 0.8037 (0.004001) 0.804
Word2Vec 0.7700 (0.005138) 0.768
Manual and Word2Vec 0.8138 (0.004534) 0.813
Manual and Word2Vec 0.8249 (0.004549) 0.826
Logistic Regression (LR) LACE 0.6548 (0.006444) 0.655

GBM with manually selected training parameters (see Section Methods - Model Training)