Table 2. Area Under the Curve (AUC) of models generated with Python, where the Word2Vec features are the sum of the numeric vectors of the last 25 codes.
Validation and test AUCs of different models and feature configurations

| Model | Features | Average Cross-Validation AUC (St. Dev.) | Test AUC |
|---|---|---|---|
| Logistic Regression (LR) | Manual | 0.7612 (0.004123) | 0.747 |
| LR | Word2Vec | 0.7470 (0.005600) | 0.757 |
| LR | Manual and Word2Vec | 0.7862 (0.005758) | 0.783 |
| Gradient Boosting Machine (GBM) | Manual | 0.8037 (0.004001) | 0.804 |
| GBM | Word2Vec | 0.7700 (0.005138) | 0.768 |
| GBM | Manual and Word2Vec | 0.8138 (0.004534) | 0.813 |
| GBM | Manual and Word2Vec† | 0.8249 (0.004549) | 0.826 |
| LR | LACE | 0.6548 (0.006444) | 0.655 |
† GBM with manually selected training parameters (see the Model Training subsection of Methods).
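The caption describes the Word2Vec feature as the sum of the vectors of the last 25 codes. The following is a minimal sketch of that construction. It truncates a chronologically ordered code history to its 25 most recent codes, looks up each code's embedding, and adds the vectors element-wise into one fixed-length feature vector. It assumes the embeddings are already trained and available as a plain dictionary. The toy 3-dimensional `embeddings` table and the `summed_code_vector` helper are hypothetical. The source does not say how codes without an embedding were handled, so skipping them here is also an assumption.

```python
from typing import Dict, List, Sequence


def summed_code_vector(
    codes: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
    last_n: int = 25,
) -> List[float]:
    """Element-wise sum of the embeddings of the last `last_n` codes.

    `codes` is assumed to be in chronological order (oldest first).
    Codes missing from `embeddings` are skipped (an assumption, not from the source).
    A history with no known codes yields a zero vector.
    """
    total = [0.0] * dim
    for code in codes[-last_n:]:  # keep only the most recent `last_n` codes
        vec = embeddings.get(code)
        if vec is None:  # hypothetical handling of out-of-vocabulary codes
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Hypothetical toy embeddings; real Word2Vec vectors usually have many more dimensions.
embeddings = {
    "I10": [0.1, 0.2, 0.3],
    "E11": [0.5, -0.1, 0.0],
    "N18": [-0.2, 0.4, 0.1],
}

history = ["E11", "I10", "N18", "UNKNOWN"]
features = summed_code_vector(history, embeddings, dim=3)
print(features)
```

Summing, rather than concatenating, keeps the feature length equal to the embedding dimension regardless of how many codes a history contains. The cost is that the order of the codes is lost.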
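The "Average Cross-Validation AUC (St. Dev.)" column summarizes one AUC per validation fold by its mean and standard deviation. The following is a minimal sketch of that summary. It uses the pairwise (Mann-Whitney) definition of AUC: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. It assumes the sample standard deviation (`statistics.stdev`); the source does not say which variant was reported. The helper names and the fold data are made up for illustration. In practice a library routine such as scikit-learn's `roc_auc_score` would normally compute the per-fold AUC.

```python
import statistics
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(positive score > negative score), counting ties as 1/2.

    This O(P*N) pairwise version is for illustration; rank-based methods scale better.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(
    folds: List[Tuple[Sequence[int], Sequence[float]]],
) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-fold AUCs."""
    aucs = [auc(y, s) for y, s in folds]
    return statistics.mean(aucs), statistics.stdev(aucs)


# Made-up validation folds: (true labels, predicted probabilities).
folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
    ([1, 0, 1, 0], [0.6, 0.6, 0.9, 0.1]),
]
mean_auc, sd_auc = summarize_folds(folds)
print(f"{mean_auc:.4f} ({sd_auc:.6f})")
```

The output uses the same "mean (st. dev.)" layout as the table's cross-validation column.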