2021 Apr 15;13:29. doi: 10.1186/s13321-021-00508-0

Table 1.

Major differences between QSAR-Co and QSAR-Co-X

| No | Utility | QSAR-Co | QSAR-Co-X | Remarks |
|---|---|---|---|---|
| 1 | Feature selection | One (GA) | Two (FS and SFS) | |
| 2 | Reproducibility of linear modelling | Low | High | Given the same sample size and number of descriptors, GA produces different LDA models on different runs, whereas both the FS and SFS always yield the same model |
| 3 | Diagnosis of intercollinearity among variables | Not available | Available and automatically performed | Very helpful for ascertaining the robustness of the derived linear models |
| 4 | Dataset division options | Random, Kennard-Stone, Euclidean-based | Random, pre-defined, k-MCA | Since only the random division option is fast, the other QSAR-Co options were replaced to reduce computational time |
| 5 | Automatic generation of the validation set | Not available | Available | Unlike QSAR-Co, QSAR-Co-X allows generating both the screening and validation sets |
| 6 | Statistical parameters for the validation set | Manual calculations are required | Automatic calculation | Automatic calculation allows fast selection of the models |
| 7 | Number of Box-Jenkins operators available | One (pre-defined) | Four (three pre-defined and one user-specific) | Additional and more flexible operators were added to QSAR-Co-X |
| 8 | Yc randomisation | Not available | Available | A modified form of the Y-randomisation technique that incorporates the influence of experimental elements |
| 9 | Machine-learning tools | One (RF only) | Six (kNN, SVM, RF, NB, GB, and MLP) | QSAR-Co-X affords several non-linear modelling tools |
| 10 | Number of parameters that may be altered in RF modelling | 5 | 8 | QSAR-Co-X offers more flexibility for setting up RF models |
| 11 | Comparative analysis of multiple ML methods | Not possible | Possible | Useful to decide which ML method performs best |
| 12 | Hyperparameter tuning options for ML methods | Not available | Available | Extremely useful to find optimised non-linear models |
| 13 | User specific parameter settings for building non-linear models | For RF only | For kNN, SVM, RF, NB, GB, and MLP | |
| 14 | Display of ROC plots (linear modelling) | For sub-training and test sets | For sub-training, test and validation sets | |
| 15 | Condition-wise prediction | Not available | Available | Useful to understand how the developed model performs against individual experimental conditions, particularly for large datasets |
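
The idea behind condition-wise prediction (entry 15) can be illustrated with a minimal Python sketch that groups classification results by experimental condition and reports the accuracy for each group. This is not the QSAR-Co-X implementation: the function name `condition_wise_accuracy`, the record layout, and the toy condition labels `assay_A` and `assay_B` are assumptions made purely for illustration.

```python
from collections import defaultdict


def condition_wise_accuracy(records):
    """Summarise prediction accuracy separately for each experimental condition.

    `records` is an iterable of (condition, observed_class, predicted_class)
    tuples. Returns {condition: (n_correct, n_total, accuracy)}.
    Hypothetical helper, not part of QSAR-Co or QSAR-Co-X.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_correct, n_total]
    for condition, observed, predicted in records:
        counts[condition][1] += 1
        if observed == predicted:
            counts[condition][0] += 1
    return {
        cond: (correct, total, correct / total)
        for cond, (correct, total) in counts.items()
    }


# Toy, made-up data: (experimental condition, observed class, predicted class)
records = [
    ("assay_A", 1, 1),
    ("assay_A", 0, 0),
    ("assay_A", 1, 0),
    ("assay_B", 1, 1),
    ("assay_B", 0, 0),
]

summary = condition_wise_accuracy(records)
for cond in sorted(summary):
    correct, total, acc = summary[cond]
    print(f"{cond}: {correct}/{total} correct (accuracy {acc:.2f})")
```

A per-condition breakdown of this kind shows when a model with good overall statistics still performs poorly for one particular experimental condition, which is the situation the remark for entry 15 refers to.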