2018 Nov 30;19(Suppl 15):438. doi: 10.1186/s12859-018-2414-9

Table 1.

Performance of classification models based on different features and training algorithms

Training Features Bagging Random Forest Adaptive Boosting Gradient Boosting Neural Network Average
S1 BSA 0.74 0.74 0.81 0.81 0.55 0.73
(0.51) (0.51) (0.43) (0.41) (0.50) (0.47)
S2 RCs 0.86 0.86 0.85 0.86 0.85 0.86
(0.50) (0.50) (0.51) (0.50) (0.54) (0.51)
S3 CC, CP, CA, PP, AP, AA 0.89 0.90 0.89 0.89 0.89 0.89
(0.67) (0.70) (0.69) (0.67) (0.67) (0.68)
S4 CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS 0.90 0.90 0.89 0.89 0.89 0.89
(0.69) (0.69) (0.66) (0.67) (0.67) (0.68)
S5 CC, CP, CA, PP, AP, AA, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T 0.92 0.92 0.91 0.92 0.91 0.92
(0.74) (0.73) (0.74) (0.71) (0.77) (0.74)
S6 CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T 0.92 0.92 0.91 0.93 0.92 0.92
(0.73) (0.75) (0.74) (0.70) (0.76) (0.74)
E1 HS 0.76 0.76 0.83 0.82 0.82 0.80
(0.59) (0.59) (0.62) (0.62) (0.59) (0.60)
E2 Eelec, Evdw, Edes 0.87 0.87 0.87 0.87 0.85 0.87
(0.64) (0.61) (0.62) (0.62) (0.68) (0.63)
C CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T, Eelec, Evdw, Edes 0.92 0.93 0.92 0.93 0.90 0.92
(0.72) (0.73) (0.74) (0.72) (0.77) (0.74)

Accuracy values are calculated according to Eq. 2 in “Methods”.
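The Average column appears to be the arithmetic mean of the five per-algorithm accuracies in each row, rounded to two decimals. This is an inference from the numbers, not something the table states. A minimal check on the S1 (BSA) row, with the helper name `row_average` chosen here for illustration:

```python
from statistics import mean

# Per-algorithm accuracies copied from the S1 (BSA) row of Table 1, in the order
# Bagging, Random Forest, Adaptive Boosting, Gradient Boosting, Neural Network.
s1_many = [0.74, 0.74, 0.81, 0.81, 0.55]  # 10-fold cross-validation, Many dataset
s1_dc = [0.51, 0.51, 0.43, 0.41, 0.50]    # DC dataset (values in brackets)


def row_average(values):
    """Arithmetic mean rounded to two decimals (assumed rounding convention)."""
    return round(mean(values), 2)


print(row_average(s1_many))  # table reports 0.73
print(row_average(s1_dc))    # table reports (0.47)
```

The same check reproduces the Average entries of the other rows that were verified by hand (S2, S3, E1 and E2).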

The table reports the predictive accuracy of the classification models tested. Nine sets of features were used to train new predictive models, based on structural properties (S1, S2, S3, S4, S5, S6), energetics (E1, E2) and a combination of structure and energetics (C). For each set of training features, five machine learning algorithms were used for training (Bagging, Random Forest, Adaptive Boosting, Gradient Boosting and Neural Network). For each trained model, two accuracies are reported: the accuracy on the Many dataset [34], given as the average over the 10-fold cross-validation, and the accuracy on the DC dataset [15], given in brackets.
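The two numbers in each table cell come from two different evaluations: a cross-validated accuracy on the training dataset and an accuracy on an independent dataset. The sketch below illustrates that protocol in plain Python. It is not the authors' code. The accuracy function is a plain fraction of correct predictions standing in for Eq. 2, which is not reproduced here; `fit_threshold` is a hypothetical toy learner used in place of the five algorithms; and the data are synthetic.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (assumed stand-in for the paper's Eq. 2)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(fit, X, y, k=10, seed=0):
    """Average held-out accuracy over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in held_out]
        scores.append(accuracy([y[i] for i in held_out], y_pred))
    return sum(scores) / len(scores)


def external_accuracy(fit, X_train, y_train, X_ext, y_ext):
    """Train on the full training set, then score on an independent set."""
    predict = fit(X_train, y_train)
    return accuracy(y_ext, [predict(x) for x in X_ext])


def fit_threshold(X_train, y_train):
    """Toy one-feature learner: threshold at the midpoint of the class means."""
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


# Synthetic, well-separated training data: class 0 at 0..19, class 1 at 100..119.
X_toy = [[v] for v in range(20)] + [[v] for v in range(100, 120)]
y_toy = [0] * 20 + [1] * 20

# Synthetic independent set containing one sample the toy learner gets wrong.
X_ext = [[5], [110], [50]]
y_ext = [0, 1, 1]

cv_score = cross_validated_accuracy(fit_threshold, X_toy, y_toy, k=10)
ext_score = external_accuracy(fit_threshold, X_toy, y_toy, X_ext, y_ext)
print(f"cross-validated: {cv_score:.2f}, external: {ext_score:.2f}")
```

On this toy data the cross-validated score is higher than the external one, which mirrors the pattern in the table, where the bracketed DC accuracies are consistently lower than the cross-validated accuracies on the Many dataset.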