Table 1.
S. No. | Feature | No. of Features | PCC, T683 (10×), SVM | PCC, T683 (10×), RF | PCC, T683 (10×), IBk | PCC, T683 (10×), K* | PCC, V76, SVM | PCC, V76, RF | PCC, V76, IBk | PCC, V76, K* |
---|---|---|---|---|---|---|---|---|---|---|
1 | Amino acid composition (Mono) | 20 | 0.59 | 0.61 | 0.44 | 0.41 | 0.64 | 0.64 | 0.42 | 0.41 |
2 | Di‐peptide composition (Di) | 400 | 0.61 | 0.60 | 0.47 | 0.43 | 0.66 | 0.62 | 0.47 | 0.45 |
3 | C8 Binary profile (C8 Bin) | 160 | 0.56 | 0.57 | 0.45 | 0.42 | 0.59 | 0.60 | 0.43 | 0.41 |
4 | N8 Binary profile (N8 Bin) | 160 | 0.51 | 0.54 | 0.45 | 0.43 | 0.48 | 0.60 | 0.45 | 0.43 |
5 | Physicochemical properties (Physico) | 315 | 0.59 | 0.54 | 0.46 | 0.44 | 0.63 | 0.68 | 0.46 | 0.45 |
6 | Solvent accessibility (SA) | 21 | 0.22 | 0.20 | 0.18 | 0.19 | 0.21 | 0.18 | 0.15 | 0.16 |
7 | Secondary structure (SS) | 3 | 0.18 | 0.18 | 0.16 | 0.17 | 0.19 | 0.16 | 0.17 | 0.18 |
8 | 1 + 2 | 420 | 0.60 | 0.61 | 0.47 | 0.45 | 0.67 | 0.62 | 0.48 | 0.48 |
9 | 3 + 4 | 320 | 0.59 | 0.62 | 0.51 | 0.48 | 0.62 | 0.65 | 0.52 | 0.50 |
10 | 1 + 2 + 5 | 735 | 0.63 | 0.61 | 0.52 | 0.51 | 0.70 | 0.64 | 0.54 | 0.51 |
11 | 3 + 4 + 5 | 635 | 0.63 | 0.60 | 0.51 | 0.50 | 0.72 | 0.67 | 0.52 | 0.50 |
12 | 1 + 2 + 3 + 4 | 740 | 0.61 | 0.62 | 0.51 | 0.49 | 0.67 | 0.63 | 0.51 | 0.50 |
13 | 1 + 2 + 3 + 4 + 5 | 1055 | 0.62 | 0.61 | 0.50 | 0.51 | 0.66 | 0.64 | 0.54 | 0.53 |
14 | 6 + 7 | 23 | 0.22 | 0.20 | 0.18 | 0.21 | 0.23 | 0.19 | 0.20 | 0.18 |
15 | 1 + 2 + 5 + 6 + 7 | 758 | 0.66 | 0.63 | 0.55 | 0.54 | 0.74 | 0.68 | 0.59 | 0.57 |
16 | 3 + 4 + 5 + 6 + 7 | 658 | 0.65 | 0.64 | 0.56 | 0.55 | 0.73 | 0.70 | 0.58 | 0.56 |
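Ten‐fold cross‐validation performance (PCC) of the predictive models on the AVP training/testing dataset of 683 sequences (T683), and their performance on the validation dataset of 76 peptides (V76), using the SVM, RF, IBk, and K* MLTs. Feature combinations in rows 8 to 16 refer to the serial numbers of the individual features in rows 1 to 7.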
Abbreviations: PCC: Pearson correlation coefficient; AVP: antiviral peptide; MLT: machine learning technique; SVM: support vector machine; RF: random forest; IBk: instance‐based classifier (Weka); K*: KStar (Weka); T683: training dataset of 683 AVPs; 10×: 10‐fold cross validation; V76: independent dataset of 76 AVPs.
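
To make the feature counts and the reported metric concrete, the following is a minimal Python sketch of how the amino acid composition (row 1, 20 features), the di‐peptide composition (row 2, 400 features), and the PCC could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, compositions are expressed as fractions (the table does not say whether fractions or percentages were used), sequences are assumed to contain only the 20 standard residues and at least two of them, and the example peptide is made up.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mono_composition(seq):
    """Fraction of each of the 20 amino acids in seq (20 features)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs (400 features)."""
    # Overlapping adjacent pairs: a sequence of length n has n - 1 of them.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical peptide, used only to show the feature-vector sizes.
peptide = "ACDKKA"
features = mono_composition(peptide) + dipeptide_composition(peptide)
print(len(features))  # 420, the size of the "1 + 2" hybrid in row 8
```

Concatenating the two vectors gives the 420‐dimensional hybrid listed in row 8. In an evaluation like the one tabulated here, `pearson` would be applied to the predicted and the experimentally observed values for the peptides in each test fold or in V76.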