Table 3. Average Metrics for Each of 10 Hold-Out Folds from Cross-Validation Using 676 Molecules from Aliper et al. Annotated with Only One of the 12 MeSH Classes^a
| problem group | metric | SVM^1 | DNN^2 | IMG + CNN^3 | MFP + RF^4 |
|---|---|---|---|---|---|
| 3-class | accuracy | 0.53 | 0.701 | 0.747 ± 0.0657 | 0.742 ± 0.0692 |
| | balanced accuracy | | | 0.739 ± 0.0644 | 0.715 ± 0.0766 |
| | MCC | | | 0.619 ± 0.102 | 0.612 ± 0.106 |
| | ROC AUC | | | 0.870 ± 0.0412 | 0.894 ± 0.0417 |
| | ave. precision score | | | 0.806 ± 0.0592 | 0.847 ± 0.0588 |
| 5-class | accuracy | 0.417 | 0.596 | 0.653 ± 0.0451 | 0.694 ± 0.0497 |
| | balanced accuracy | | | 0.620 ± 0.0509 | 0.635 ± 0.0661 |
| | MCC | | | 0.549 ± 0.0599 | 0.606 ± 0.0660 |
| | ROC AUC | | | 0.867 ± 0.0322 | 0.892 ± 0.0284 |
| | ave. precision score | | | 0.735 ± 0.0568 | 0.791 ± 0.0471 |
| 12-class | accuracy | 0.366 | 0.546 | 0.608 ± 0.0500 | 0.641 ± 0.0331 |
| | balanced accuracy | | | 0.507 ± 0.107 | 0.504 ± 0.0522 |
| | MCC | | | 0.525 ± 0.0620 | 0.572 ± 0.0388 |
| | ROC AUC^b | | | 0.863 ± 0.209 | 0.896 ± 0.0200 |
| | ave. precision score^b | | | 0.672 ± 0.0303 | 0.751 ± 0.0205 |
^a Values for the gene-expression-based models are from Aliper et al., who used different training and validation folds for 10-fold cross-validation with a ^1support vector machine (SVM) or a ^2multilayer perceptron deep neural network (DNN) based on pathway activation scores. Values from this paper were obtained using ^3molecule images as the input to a convolutional neural network (IMG + CNN) or ^4Morgan molecular fingerprints as the input to a random forest (MFP + RF). Values for ^3,4 are the mean of the validation folds ± standard deviation.
^b Receiver operating characteristic area under the curve (ROC AUC) and average precision score were computed as the weighted average of scores across classes. For the 12-class problem, they were computed only for the first six validation sets because the dermatological and urological classes have fewer than 10 examples.
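
The two aggregation steps described in the footnotes (a weighted average of per-class scores within each fold, then the mean ± standard deviation across folds) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the weights are proportional to the number of true examples in each class, which is the convention behind scikit-learn's `average="weighted"` option, and it assumes the sample standard deviation; the footnotes specify neither. The class labels, scores, and counts are hypothetical values chosen only for illustration.

```python
from statistics import mean, stdev


def weighted_average(per_class_scores, class_counts):
    """Support-weighted mean of per-class scores for one validation fold.

    Assumption: each class is weighted by its number of true examples.
    """
    total = sum(class_counts[c] for c in per_class_scores)
    return sum(per_class_scores[c] * class_counts[c] for c in per_class_scores) / total


def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) of the per-fold scores.

    Assumption: sample (n - 1) standard deviation; the table footnote
    does not say which estimator was used.
    """
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical per-class scores and class sizes for two validation folds.
folds = [
    ({"A": 0.90, "B": 0.80}, {"A": 30, "B": 10}),
    ({"A": 0.85, "B": 0.75}, {"A": 28, "B": 12}),
]

fold_scores = [weighted_average(scores, counts) for scores, counts in folds]
avg, sd = summarize_folds(fold_scores)
print(f"{avg:.3f} ± {sd:.3f}")
```

Under this weighting, a class with very few examples contributes little to a fold's score, and a fold in which a class has no examples has no defined per-class ROC AUC for that class, which is consistent with the restriction to a subset of folds noted in footnote b.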