Table 1.
Set | Training features | Bagging | Random Forest | Adaptive Boosting | Gradient Boosting | Neural Network | Average
---|---|---|---|---|---|---|---
S1 | BSA | 0.74 (0.51) | 0.74 (0.51) | 0.81 (0.43) | 0.81 (0.41) | 0.55 (0.50) | 0.73 (0.47)
S2 | RCs | 0.86 (0.50) | 0.86 (0.50) | 0.85 (0.51) | 0.86 (0.50) | 0.85 (0.54) | 0.86 (0.51)
S3 | CC, CP, CA, PP, AP, AA | 0.89 (0.67) | 0.90 (0.70) | 0.89 (0.69) | 0.89 (0.67) | 0.89 (0.67) | 0.89 (0.68)
S4 | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS | 0.90 (0.69) | 0.90 (0.69) | 0.89 (0.66) | 0.89 (0.67) | 0.89 (0.67) | 0.89 (0.68)
S5 | CC, CP, CA, PP, AP, AA, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T | 0.92 (0.74) | 0.92 (0.73) | 0.91 (0.74) | 0.92 (0.71) | 0.91 (0.77) | 0.92 (0.74)
S6 | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T | 0.92 (0.73) | 0.92 (0.75) | 0.91 (0.74) | 0.93 (0.70) | 0.92 (0.76) | 0.92 (0.74)
E1 | HS | 0.76 (0.59) | 0.76 (0.59) | 0.83 (0.62) | 0.82 (0.62) | 0.82 (0.59) | 0.80 (0.60)
E2 | Eelec, Evdw, Edes | 0.87 (0.64) | 0.87 (0.61) | 0.87 (0.62) | 0.87 (0.62) | 0.85 (0.68) | 0.87 (0.63)
C | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T, Eelec, Evdw, Edes | 0.92 (0.72) | 0.93 (0.73) | 0.92 (0.74) | 0.93 (0.72) | 0.90 (0.77) | 0.92 (0.74)
Accuracy values were calculated according to Eq. 2 in “Methods”.
Predictive accuracies of the classification models tested. Nine sets of features were used to train the predictive models, based on structural properties (S1, S2, S3, S4, S5, S6), energetics (E1, E2) and a combination of structure and energetics (C). For each set of training features, five machine learning algorithms were used for the training (Bagging, Random Forest, Adaptive Boosting, Gradient Boosting and Neural Network). For each trained model, the accuracy on the Many dataset [34] is reported as the average over the 10-fold cross validation, and the accuracy on the DC dataset [15] is given in brackets.
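As a reading aid, the "Average" column can be reproduced from the five per-algorithm accuracies in each row. The sketch below assumes that this column is the plain arithmetic mean over the five algorithms, rounded to two decimals; the caption does not state this explicitly, although the tabulated values are consistent with it. The function name `row_average` is hypothetical and used only for illustration.

```python
def row_average(values):
    """Arithmetic mean of per-algorithm accuracies, rounded to two decimals.

    Assumption: the table's "Average" column is computed this way.
    """
    return round(sum(values) / len(values), 2)


# Row S1 (feature: BSA), in the order Bagging, Random Forest,
# Adaptive Boosting, Gradient Boosting, Neural Network.
s1_many = [0.74, 0.74, 0.81, 0.81, 0.55]  # Many dataset, 10-fold CV averages
s1_dc = [0.51, 0.51, 0.43, 0.41, 0.50]    # DC dataset (values in brackets)

print(row_average(s1_many), row_average(s1_dc))  # prints 0.73 0.47
```

The same check applied to row C gives 0.92 for the Many dataset and 0.74 for the DC dataset, matching the last column of the table.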