Table 6.
Model | Balanced Accuracy (test set) | Precision (test set) | Sensitivity (test set) | Specificity (test set) | MCC (test set) | F1-Score (test set) | Balanced Accuracy (training set) | Encoder |
---|---|---|---|---|---|---|---|---|
DNN Circular Fingerprint | 0.746 | 0.527 | 0.672 | 0.819 | 0.454 | 0.591 | 0.870 | Circular Fingerprint |
DNN MACCS | 0.700 | 0.446 | 0.638 | 0.762 | 0.358 | 0.525 | 0.737 | MACCS fingerprint |
DNN CDDD | 0.808 | 0.539 | 0.828 | 0.788 | 0.542 | 0.653 | 0.836 | Latent representation CDDD |
DNN Molecular Descriptors | 0.774 | 0.471 | 0.828 | 0.720 | 0.470 | 0.600 | 0.811 | Molecular Descriptors |
MPNN | 0.746 | 0.527 | 0.672 | 0.819 | 0.454 | 0.591 | 0.741 | Graph |
NLP chars Embedding | 0.780 | 0.551 | 0.741 | 0.819 | 0.510 | 0.632 | 0.753 | Text Vectorization and character embedding |
NLP chars Embedding Augmented | 0.815 | 0.616 | 0.776 | 0.855 | 0.585 | 0.687 | 0.886 | Text Vectorization and character embedding |
Multimodal | 0.808 | 0.592 | 0.777 | 0.839 | 0.564 | 0.672 | 0.830 | All (no graph) |
Extreme Gradient Boosting (best ML with oversampling methods) | 0.742 | 0.605 | 0.600 | 0.883 | 0.485 | 0.602 | 0.921 | Latent representation CDDD |
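
The metric columns in Table 6 are related by standard definitions: balanced accuracy is the mean of sensitivity and specificity, and the F1-score is the harmonic mean of precision and sensitivity. The sketch below is only an illustrative consistency check under those textbook definitions. It is not the authors' evaluation code, and the helper names `balanced_accuracy` and `f1_score` are hypothetical. It recomputes both values for the "DNN CDDD" row from the reported precision, sensitivity, and specificity.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Test-set values copied from the "DNN CDDD" row of Table 6.
precision, sensitivity, specificity = 0.539, 0.828, 0.788

ba = balanced_accuracy(sensitivity, specificity)
f1 = f1_score(precision, sensitivity)

# Rounded to three decimals, these match the reported 0.808 and 0.653.
print(f"balanced accuracy = {ba:.3f}, F1 = {f1:.3f}")
```

Because the inputs are already rounded to three decimals, recomputing other rows this way can differ from the reported values in the last digit.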