TABLE 1.
Performance metrics of the best-performing models developed for the NS3 and NS5 proteins using various MLTs and feature selection methods on the TT and IV datasets.
| Algorithm | Feature selection | Model parameters | Dataset | MAE | MSE | RMSE | R² | PCC |
|---|---|---|---|---|---|---|---|---|
| NS3 | | | | | | | | |
| SVM | SVR | gamma: 0.0001, C: 100 | T1092 | 0.191 | 0.065 | 0.255 | 0.733 | 0.857 |
| | | | V121 | 0.195 | 0.074 | 0.272 | 0.756 | 0.870 |
| | Perceptron | gamma: 0.001, C: 10 | T1092 | 0.239 | 0.099 | 0.315 | 0.593 | 0.771 |
| | | | V121 | 0.212 | 0.087 | 0.295 | 0.713 | 0.844 |
| | DT | gamma: 0.005, C: 1 | T1092 | 0.255 | 0.118 | 0.343 | 0.517 | 0.719 |
| | | | V121 | 0.215 | 0.097 | 0.311 | 0.680 | 0.833 |
| RF | SVR | n: 400 | T1092 | 0.264 | 0.124 | 0.352 | 0.492 | 0.703 |
| | | | V121 | 0.213 | 0.093 | 0.305 | 0.693 | 0.851 |
| | Perceptron | n: 500 | T1092 | 0.275 | 0.130 | 0.361 | 0.467 | 0.684 |
| | | | V121 | 0.225 | 0.102 | 0.319 | 0.663 | 0.832 |
| | DT | n: 400 | T1092 | 0.266 | 0.124 | 0.350 | 0.475 | 0.705 |
| | | | V121 | 0.218 | 0.103 | 0.321 | 0.660 | 0.825 |
| kNN | SVR | k: 3 | T1092 | 0.251 | 0.119 | 0.346 | 0.511 | 0.725 |
| | | | V121 | 0.250 | 0.131 | 0.363 | 0.565 | 0.756 |
| | Perceptron | k: 7 | T1092 | 0.267 | 0.127 | 0.356 | 0.480 | 0.695 |
| | | | V121 | 0.227 | 0.105 | 0.325 | 0.651 | 0.811 |
| | DT | k: 5 | T1092 | 0.270 | 0.131 | 0.362 | 0.464 | 0.690 |
| | | | V121 | 0.233 | 0.109 | 0.330 | 0.639 | 0.799 |
| ANN | SVR | solver: lbfgs, activation: identity | T1092 | 0.192 | 0.064 | 0.253 | 0.738 | 0.862 |
| | | | V121 | 0.190 | 0.070 | 0.265 | 0.781 | 0.894 |
| | Perceptron | activation: logistic | T1092 | 0.249 | 0.108 | 0.329 | 0.556 | 0.751 |
| | | | V121 | 0.216 | 0.086 | 0.293 | 0.716 | 0.844 |
| | DT | activation: logistic | T1092 | 0.275 | 0.135 | 0.365 | 0.426 | 0.682 |
| | | | V121 | 0.241 | 0.111 | 0.332 | 0.634 | 0.798 |
| XGBoost | SVR | n_estimators = 300, max_depth = 3, learning_rate = 0.141 | T1092 | 0.249 | 0.111 | 0.334 | 0.544 | 0.738 |
| | | | V121 | 0.222 | 0.087 | 0.296 | 0.710 | 0.849 |
| | Perceptron | n_estimators = 300, max_depth = 7, learning_rate = 0.058 | T1092 | 0.272 | 0.132 | 0.362 | 0.448 | 0.678 |
| | | | V121 | 0.212 | 0.089 | 0.298 | 0.705 | 0.842 |
| | DT | n_estimators = 200, max_depth = 3, learning_rate = 0.104 | T1092 | 0.259 | 0.119 | 0.344 | 0.514 | 0.718 |
| | | | V121 | 0.225 | 0.096 | 0.310 | 0.681 | 0.830 |
| NS5 | | | | | | | | |
| SVM | SVR | gamma: 0.0001, C: 400 | T140 | 0.135 | 0.049 | 0.197 | 0.954 | 0.982 |
| | | | V15 | 0.138 | 0.044 | 0.210 | 0.940 | 0.970 |
| | Perceptron | gamma: 0.0005, C: 400 | T140 | 0.222 | 0.105 | 0.310 | 0.884 | 0.953 |
| | | | V15 | 0.240 | 0.137 | 0.370 | 0.814 | 0.904 |
| | DT | gamma: 0.005, C: 10 | T140 | 0.429 | 0.399 | 0.591 | 0.632 | 0.802 |
| | | | V15 | 0.420 | 0.446 | 0.668 | 0.395 | 0.713 |
| RF | SVR | n: 400, depth: 12 | T140 | 0.399 | 0.324 | 0.544 | 0.659 | 0.840 |
| | | | V15 | 0.340 | 0.211 | 0.460 | 0.713 | 0.863 |
| | Perceptron | n: 300 | T140 | 0.360 | 0.294 | 0.513 | 0.680 | 0.852 |
| | | | V15 | 0.288 | 0.180 | 0.425 | 0.755 | 0.873 |
| | DT | n: 200, depth: None, leaf: 1 | T140 | 0.424 | 0.388 | 0.601 | 0.560 | 0.799 |
| | | | V15 | 0.367 | 0.308 | 0.555 | 0.582 | 0.771 |
| kNN | SVR | k: 3 | T140 | 0.343 | 0.235 | 0.468 | 0.727 | 0.889 |
| | | | V15 | 0.292 | 0.144 | 0.380 | 0.804 | 0.901 |
| | Perceptron | k: 3 | T140 | 0.335 | 0.235 | 0.468 | 0.753 | 0.895 |
| | | | V15 | 0.360 | 0.232 | 0.481 | 0.686 | 0.833 |
| | DT | k: 3 | T140 | 0.508 | 0.499 | 0.687 | 0.446 | 0.739 |
| | | | V15 | 0.563 | 0.609 | 0.781 | 0.173 | 0.663 |
| ANN | SVR | solver: lbfgs, activation: identity, learning: invscaling | T140 | 0.159 | 0.073 | 0.271 | 0.928 | 0.964 |
| | | | V15 | 0.160 | 0.048 | 0.219 | 0.935 | 0.977 |
| | Perceptron | solver: lbfgs, activation: logistic, learning: adaptive | T140 | 0.255 | 0.119 | 0.345 | 0.884 | 0.942 |
| | | | V15 | 0.337 | 0.238 | 0.488 | 0.710 | 0.854 |
| | DT | solver: lbfgs, activation: tanh, learning: adaptive | T140 | 0.532 | 0.505 | 0.711 | 0.508 | 0.762 |
| | | | V15 | 0.507 | 0.592 | 0.769 | 0.197 | 0.708 |
| XGBoost | SVR | n_estimators = 300, max_depth = 3, learning_rate = 0.078 | T140 | 0.334 | 0.220 | 0.444 | 0.766 | 0.889 |
| | | | V15 | 0.388 | 0.274 | 0.523 | 0.628 | 0.818 |
| | Perceptron | n_estimators = 300, max_depth = 3, learning_rate = 0.143 | T140 | 0.335 | 0.243 | 0.458 | 0.678 | 0.846 |
| | | | V15 | 0.313 | 0.200 | 0.447 | 0.729 | 0.870 |
| | DT | n_estimators = 150, max_depth = 5, learning_rate = 0.038 | T140 | 0.406 | 0.326 | 0.548 | 0.609 | 0.812 |
| | | | V15 | 0.358 | 0.207 | 0.455 | 0.719 | 0.854 |
SVR, support vector regression; DT, decision tree; TT, training/testing dataset; IV, independent validation dataset; PCC, Pearson's correlation coefficient; R², coefficient of determination; MAE, mean absolute error; MSE, mean squared error; RMSE, root mean squared error.
Bold values indicate the best-performing model(s) based on Pearson's correlation coefficient (PCC) and coefficient of determination (R²) for each dataset.
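
The table does not state how the five metrics were computed. The following is a minimal sketch that assumes the standard textbook definitions of MAE, MSE, RMSE, R² and PCC; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2 and PCC for two equal-length sequences.

    Assumes the standard definitions; the study may have used library
    implementations that differ in minor details.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)    # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("R2 and PCC are undefined for constant input")

    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    pcc = cov / math.sqrt(ss_tot * ss_pred)    # Pearson's correlation coefficient

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "PCC": pcc}


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions RMSE is the square root of MSE; for the first NS3 row, √0.065 ≈ 0.255, which matches the reported RMSE. R² computed this way need not equal PCC², which is consistent with the table listing the two as separate columns.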