| 1 | Feature selection | One (GA) | Two (FS and SFS) | – |
| 2 | Reproducibility of linear modelling | Low | High | Given the same sample size and number of descriptors, GA produces different LDA models on different runs, whereas both the FS and SFS always yield the same model |
| 3 | Diagnosis of intercollinearity among variables | Not available | Available and automatically performed | Very helpful for ascertaining the robustness of the derived linear models |
| 4 | Dataset division options | Random, Kennard-Stone, Euclidean-based | Random, pre-defined, k-MCA | Since only the random division option is fast, the other QSAR-Co options were replaced to reduce computational time |
| 5 | Automatic generation of the validation set | Not available | Available | Unlike QSAR-Co, QSAR-Co-X allows generating both the screening and validation sets |
| 6 | Statistical parameters for the validation set | Manual calculations are required | Automatic calculation | Automatic calculation allows fast selection of the models |
| 7 | Number of Box-Jenkins operators available | One (pre-defined) | Four (three pre-defined and one user-specific) | Additional and more flexible operators were added to QSAR-Co-X |
| 8 | Yc randomisation | Not available | Available | A modified form of the Y-randomisation technique that incorporates the influence of experimental elements |
| 9 | Machine-learning tools | One (RF only) | Six (kNN, SVM, RF, NB, GB, and MLP) | QSAR-Co-X affords several non-linear modelling tools |
| 10 | Number of parameters that may be altered in RF modelling | 5 | 8 | QSAR-Co-X offers more flexibility for setting up RF models |
| 11 | Comparative analysis of multiple ML methods | Not possible | Possible | Useful to decide which ML method performs best |
| 12 | Hyperparameter tuning options for ML methods | Not available | Available | Extremely useful to find optimised non-linear models |
| 13 | User specific parameter settings for building non-linear models | For RF only | For kNN, SVM, RF, NB, GB, and MLP | – |
| 14 | Display of ROC plots (linear modelling) | For sub-training and test sets | For sub-training, test and validation sets | – |
| 15 | Condition-wise prediction | Not available | Available | Useful to understand how the developed model performs against individual experimental conditions, particularly for large datasets |
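
Row 8 describes Yc randomisation as a modified form of Y-randomisation. As an illustration of the underlying idea only, the following minimal sketch shows plain Y-randomisation: the class labels of the training set are shuffled repeatedly, a model is refitted each time, and its accuracy is compared with that of the model built on the true labels. This is not QSAR-Co-X code and does not implement the Yc variant. The nearest-centroid classifier is a hypothetical stand-in for an LDA model, the data are synthetic, and all function names are assumptions made for this example.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_nearest_centroid(X, y):
    """Return one centroid per class label (simple stand-in for LDA)."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(model,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the given label."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def y_randomisation(X_train, y_train, X_test, y_test, n_rounds=100, seed=0):
    """Compare the true-label model with models fitted on shuffled labels.

    Returns (accuracy of the true model, mean accuracy of shuffled models),
    both measured on the test set with its true labels.
    """
    rng = random.Random(seed)
    true_acc = accuracy(fit_nearest_centroid(X_train, y_train), X_test, y_test)
    shuffled_accs = []
    for _ in range(n_rounds):
        y_perm = y_train[:]          # copy, so the true labels stay intact
        rng.shuffle(y_perm)          # break the descriptor-response link
        model = fit_nearest_centroid(X_train, y_perm)
        shuffled_accs.append(accuracy(model, X_test, y_test))
    return true_acc, sum(shuffled_accs) / n_rounds

def make_data(n_per_class, rng):
    """Synthetic two-class data: two Gaussian clusters in two dimensions."""
    X, y = [], []
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(50, rng)
X_test, y_test = make_data(50, rng)
true_acc, mean_shuffled_acc = y_randomisation(X_train, y_train, X_test, y_test)
print(f"true-label accuracy:          {true_acc:.3f}")
print(f"mean shuffled-label accuracy: {mean_shuffled_acc:.3f}")
```

If the true-label model clearly outperforms the shuffled-label models, the original model is unlikely to be a product of chance correlation. A shuffled-label average close to the true-label score would instead suggest that the descriptors carry little real information about the response.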