2021 Mar 29;17(3):e1008857. doi: 10.1371/journal.pcbi.1008857

Table 2. Model performance using several supervised machine-learning algorithms.

Various machine-learning algorithms were evaluated using the entire feature set (n = 3065 genes) and the Clairvoyance-optimized feature set (GeneSet_y1-y5, n = 399 genes) with the same LCOCV pairs. Performance metrics for each LCOCV set include accuracy, precision, recall, and F1 score. LCOCV refers to Leave Compound Out Cross Validation, in which all instances of a compound are removed from the data used to fit the model (training data) and performance is evaluated on the held-out compound profiles (testing data) (see Materials and Methods).

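Columns marked N = 399: Clairvoyance feature selection [N = 399 genes]. Columns marked N = 3065: no feature selection [N = 3065 genes].

| Classifier | Accuracy (N = 399) | F1 Score (N = 399) | Precision (N = 399) | Recall (N = 399) | Accuracy (N = 3065) | F1 Score (N = 3065) | Precision (N = 3065) | Recall (N = 3065) |
|---|---|---|---|---|---|---|---|---|
| CoHEC | 0.999 | 0.983 | 0.983 | 0.982 | 0.749 | 0.693 | 0.715 | 0.682 |
| Logistic Regression | 0.880 | 0.829 | 0.856 | 0.817 | 0.793 | 0.732 | 0.763 | 0.723 |
| Random Forest | 0.792 | 0.719 | 0.768 | 0.708 | 0.742 | 0.659 | 0.703 | 0.645 |
| K-Nearest Neighbors | 0.714 | 0.568 | 0.617 | 0.546 | 0.636 | 0.506 | 0.561 | 0.481 |
| Support Vector Machine | 0.798 | 0.722 | 0.778 | 0.704 | 0.694 | 0.616 | 0.668 | 0.600 |
| Naive Bayes (Gaussian) | 0.698 | 0.582 | 0.623 | 0.561 | 0.429 | 0.302 | 0.389 | 0.274 |
| AdaBoost | 0.333 | 0.308 | 0.333 | 0.301 | 0.339 | 0.277 | 0.333 | 0.261 |
| Neural Network | 0.872 | 0.785 | 0.815 | 0.773 | 0.741 | 0.635 | 0.683 | 0.619 |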
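
The LCOCV scheme described in the caption, together with the four reported metrics, can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The function names `leave_compound_out_splits` and `macro_metrics`, the toy compound and class labels, and the use of macro averaging for precision, recall, and F1 are all assumptions made for illustration; the caption does not state which averaging was used.

```python
from collections import defaultdict


def leave_compound_out_splits(compounds):
    """Yield (held_out, train_idx, test_idx) once per unique compound.

    compounds[i] is the compound label of sample i (hypothetical layout).
    Every profile of the held-out compound goes to the test set, so the
    model never sees that compound during fitting.
    """
    by_compound = defaultdict(list)
    for i, compound in enumerate(compounds):
        by_compound[compound].append(i)
    for held_out, test_idx in by_compound.items():
        train_idx = [i for i, c in enumerate(compounds) if c != held_out]
        yield held_out, train_idx, test_idx


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption here.
    Undefined ratios (zero denominator) are scored as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Toy example: five profiles from three hypothetical compounds.
compounds = ["cmpdA", "cmpdA", "cmpdB", "cmpdB", "cmpdC"]
splits = list(leave_compound_out_splits(compounds))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In a full evaluation, a classifier would be fit on the `train_idx` rows of each split and scored on the `test_idx` rows with `macro_metrics`. Reusing the same splits for every algorithm and feature set mirrors the caption's note that the same LCOCV pairs were used throughout.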