Table 6.
Accuracy (ACC) and area under the precision-recall curve (AUPRC) of the top five one-feature and two-feature combinations for the logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN) and decision tree (DT) models for classifying low/high viscosity. The study includes 20 mAbs. ACC and AUPRC are averaged over 100 randomly generated 4-fold cross-validation sets. The baseline ACC is 0.70 and the baseline AUPRC is 0.30.
Model | One-feature | ACC | AUPRC | Two-features | ACC | AUPRC
---|---|---|---|---|---|---
LR | N_neg_VH | 0.79 | 0.57 | SCM_neg_VH, SCM_neg_VL | 0.86 | 0.70
LR | SCM_neg_VL | 0.77 | 0.54 | N_neg_VH, SCM_neg_VL | 0.84 | 0.68
LR | net charges_VH | 0.78 | 0.53 | N_neg_VH, net charges_VL | 0.83 | 0.67
LR | N_neg_VL | 0.77 | 0.51 | SCM_neg_VL, SCM_pos_VH | 0.83 | 0.66
LR | net charges_VL | 0.74 | 0.48 | net charges_VH, net charges_VL | 0.81 | 0.65
SVM | N_neg_VH | 0.76 | 0.47 | N_philic_VH, SAP_pos_VL | 0.82 | 0.64
SVM | net charges_VH | 0.74 | 0.46 | N_philic_Fv, SAP_pos_VL | 0.82 | 0.63
SVM | SCM_neg_VL | 0.72 | 0.45 | N_philic_Fv, N_neg_VH | 0.82 | 0.60
SVM | mAbCSP | 0.74 | 0.37 | N_phobic_VL, N_neg_VH | 0.82 | 0.60
SVM | N_neg_VL | 0.70 | 0.34 | N_philic_VH, N_neg_VH | 0.81 | 0.58
KNN | HVI | 0.76 | 0.59 | N_pos_VL, N_neg_VH | 0.83 | 0.66
KNN | SAP_pos_VL | 0.82 | 0.65 | N_philic_Fv, FvCSP | 0.82 | 0.64
KNN | SCM_neg_VL | 0.74 | 0.52 | N_pos_VL, net charges_VH | 0.83 | 0.64
KNN | net charges_VH | 0.78 | 0.51 | SCM_neg_VH, SCM_neg_VL | 0.82 | 0.62
KNN | N_neg_VH | 0.78 | 0.5 | N_philic_VH, FvCSP | 0.76 | 0.62
DT | SAP_pos_VL | 0.73 | 0.52 | N_phobic_VL, SAP_pos_VL | 0.84 | 0.74
DT | net charges_VH | 0.77 | 0.51 | N_neg_Fv, SCM_pos_VL | 0.78 | 0.60
DT | N_neg_mAb | 0.79 | 0.51 | N_neg_mAb, net charges_VL | 0.77 | 0.60
DT | SCM_pos_VL | 0.75 | 0.49 | N_neg_mAb, SCM_neg_VL | 0.76 | 0.58
DT | N_neg_VH | 0.76 | 0.49 | N_phobic_VL, net charges_VH | 0.79 | 0.56
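
The baseline values in the caption can be reproduced with a short sketch. It rests on two assumptions that the caption does not state: that the baseline ACC is the accuracy of always predicting the majority class, and that the baseline AUPRC is the prevalence of the high-viscosity class. Under those assumptions, 20 mAbs with baselines of 0.70 and 0.30 imply 14 low- and 6 high-viscosity antibodies. The function names, the plain (non-stratified) random fold assignment, and the fixed seed are hypothetical choices for illustration, not the authors' implementation.

```python
import random
from statistics import mean


def baseline_metrics(labels):
    """Return (majority-class accuracy, positive-class prevalence)."""
    n = len(labels)
    n_pos = sum(labels)
    return max(n_pos, n - n_pos) / n, n_pos / n


def random_kfold(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_accuracy(labels, predict_fn, k=4, repeats=100, seed=0):
    """Average per-fold accuracy over `repeats` random k-fold splits.

    predict_fn(train_idx, test_idx) must return one predicted label
    per entry of test_idx.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = random_kfold(len(labels), k, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            preds = predict_fn(train_idx, test_idx)
            correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
            scores.append(correct / len(test_idx))
    return mean(scores)


# Assumed class split (inferred from the stated baselines): 0 = low, 1 = high.
labels = [0] * 14 + [1] * 6
base_acc, base_auprc = baseline_metrics(labels)


def always_low(train_idx, test_idx):
    """No-skill classifier: predict the majority (low-viscosity) class."""
    return [0] * len(test_idx)


cv_acc = repeated_cv_accuracy(labels, always_low)
print(f"baseline ACC={base_acc:.2f}, baseline AUPRC={base_auprc:.2f}, "
      f"no-skill CV ACC={cv_acc:.2f}")
```

Replacing `always_low` with a function that fits a model on `train_idx` and predicts on `test_idx` gives the same averaging protocol for a real classifier. With only 20 samples, each test fold holds five mAbs, so a single split is noisy; this is one reason to average over many random splits.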