2020 Mar 4;10:3986. doi: 10.1038/s41598-020-60747-3

Table 4.

Performance of QSAR classification models for wavelength-specific autofluorescence activity.

Autofluorescence assays
Blue
10-fold cross-validation (full set, n = 1045)
Method Acc Accb Sp Se MCC
RF 0.773 +/− 0.011 0.685 +/− 0.02 0.917 +/− 0.013 0.453 +/− 0.027 0.431 +/− 0.028
SVM-linear 0.759 +/− 0.009 0.67 +/− 0.011 0.905 +/− 0.01 0.435 +/− 0.012 0.394 +/− 0.022
SVM-radial 0.705 +/− 0.003 0.527 +/− 0.0045 0.998 +/− 0.002 0.056 +/− 0.007 0.184 +/− 0.019
SVM-sigmoid 0.633 +/− 0.015 0.5475 +/− 0.016 0.772 +/− 0.015 0.323 +/− 0.017 0.101 +/− 0.032
LDA 0.737 +/− 0.01 0.6685 +/− 0.014 0.849 +/− 0.008 0.488 +/− 0.02 0.357 +/− 0.026
CART 0.728 +/− 0.014 0.656 +/− 0.025 0.845 +/− 0.013 0.467 +/− 0.036 0.333 +/− 0.036
NN 0.722 +/− 0.011 0.670 +/− 0.037 0.808 +/− 0.027 0.531 +/− 0.048 0.344 +/− 0.026
Fitting (training set, n = 888)
Method Acc Accb Sp Se MCC
RF 0.998 +/− 0.002 0.9965 +/− 0.004 0.999 +/− 0.001 0.994 +/− 0.006 0.995 +/− 0.005
SVM-linear 0.801 +/− 0.005 0.7175 +/− 0.014 0.936 +/− 0.01 0.499 +/− 0.018 0.505 +/− 0.011
SVM-radial 0.974 +/− 0.004 0.958 +/− 0.0065 1 +/− 0.001 0.916 +/− 0.012 0.939 +/− 0.01
SVM-sigmoid 0.605 +/− 0.021 0.518 +/− 0.024 0.747 +/− 0.027 0.289 +/− 0.021 0.039 +/− 0.04
LDA 0.823 +/− 0.008 0.764 +/− 0.014 0.918 +/− 0.006 0.61 +/− 0.022 0.568 +/− 0.019
CART 0.144 +/− 0.009 0.188 +/− 0.029 0.072 +/− 0.013 0.304 +/− 0.044 −0.654 +/− 0.024
NN 0.799 +/− 0.027 0.7515 +/− 0.07 0.877 +/− 0.048 0.626 +/− 0.092 0.522 +/− 0.065
External validation (test set, n = 157)
Method Acc Accb Sp Se MCC
RF 0.773 +/− 0.025 0.6905 +/− 0.044 0.916 +/− 0.023 0.465 +/− 0.065 0.438 +/− 0.075
SVM-linear 0.762 +/− 0.033 0.678 +/− 0.0525 0.91 +/− 0.027 0.446 +/− 0.078 0.411 +/− 0.089
SVM-radial 0.698 +/− 0.032 0.5245 +/− 0.014 0.999 +/− 0.003 0.05 +/− 0.025 0.176 +/− 0.045
SVM-sigmoid 0.655 +/− 0.025 0.5785 +/− 0.054 0.793 +/− 0.039 0.364 +/− 0.069 0.167 +/− 0.054
LDA 0.733 +/− 0.028 0.663 +/− 0.038 0.857 +/− 0.026 0.469 +/− 0.049 0.352 +/− 0.061
CART 0.278 +/− 0.021 0.3425 +/− 0.054 0.165 +/− 0.045 0.52 +/− 0.063 −0.334 +/− 0.065
NN 0.729 +/− 0.044 0.6765 +/− 0.104 0.826 +/− 0.059 0.527 +/− 0.149 0.363 +/− 0.115
Green
10-fold cross-validation (full set, n = 339)
Method Acc Accb Sp Se MCC
RF 0.856 +/− 0.009 0.7975 +/− 0.022 0.943 +/− 0.009 0.652 +/− 0.034 0.642 +/− 0.025
SVM-linear 0.81 +/− 0.011 0.72 +/− 0.0215 0.942 +/− 0.017 0.498 +/− 0.026 0.516 +/− 0.027
SVM-radial 0.719 +/− 0.006 0.5285 +/− 0.005 0.998 +/− 0.002 0.059 +/− 0.008 0.193 +/− 0.024
SVM-sigmoid 0.689 +/− 0.031 0.5925 +/− 0.0415 0.829 +/− 0.035 0.356 +/− 0.048 0.204 +/− 0.068
LDA 0.783 +/− 0.023 0.7605 +/− 0.030 0.817 +/− 0.02 0.704 +/− 0.039 0.503 +/− 0.051
CART 0.79 +/− 0.026 0.741 +/− 0.044 0.862 +/− 0.015 0.62 +/− 0.072 0.49 +/− 0.07
NN 0.8 +/− 0.024 0.76 +/− 0.0405 0.858 +/− 0.028 0.662 +/− 0.053 0.521 +/− 0.055
Fitting (training set, n = 288)
Method Acc Accb Sp Se MCC
RF 0.998 +/− 0.002 0.997 +/− 0.004 0.999 +/− 0.002 0.995 +/− 0.006 0.995 +/− 0.006
SVM-linear 0.871 +/− 0.039 0.8065 +/− 0.064 0.965 +/− 0.012 0.648 +/− 0.116 0.678 +/− 0.099
SVM-radial 0.977 +/− 0.008 0.9615 +/− 0.014 0.999 +/− 0.002 0.924 +/− 0.026 0.946 +/− 0.019
SVM-sigmoid 0.624 +/− 0.033 0.5305 +/− 0.046 0.76 +/− 0.037 0.301 +/− 0.055 0.063 +/− 0.07
LDA 0.976 +/− 0.01 0.9695 +/− 0.014 0.986 +/− 0.012 0.953 +/− 0.016 0.943 +/− 0.024
CART 0.096 +/− 0.007 0.1305 +/− 0.041 0.046 +/− 0.023 0.215 +/− 0.059 −0.767 +/− 0.017
NN 0.914 +/− 0.026 0.8965 +/− 0.038 0.941 +/− 0.035 0.852 +/− 0.041 0.797 +/− 0.056
External validation (test set, n = 51)
Method Acc Accb Sp Se MCC
RF 0.857 +/− 0.032 0.8075 +/− 0.069 0.932 +/− 0.039 0.683 +/− 0.099 0.651 +/− 0.085
SVM-linear 0.823 +/− 0.04 0.7405 +/− 0.0745 0.952 +/− 0.037 0.529 +/− 0.112 0.561 +/− 0.107
SVM-radial 0.713 +/− 0.038 0.5285 +/− 0.0225 1 +/− 0 0.057 +/− 0.045 0.168 +/− 0.116
SVM-sigmoid 0.674 +/− 0.032 0.5965 +/− 0.088 0.794 +/− 0.055 0.399 +/− 0.121 0.199 +/− 0.087
LDA 0.749 +/− 0.048 0.729 +/− 0.094 0.779 +/− 0.064 0.679 +/− 0.124 0.44 +/− 0.11
CART 0.211 +/− 0.067 0.2525 +/− 0.114 0.149 +/− 0.078 0.356 +/− 0.15 −0.502 +/− 0.16
NN 0.784 +/− 0.068 0.74 +/− 0.1085 0.853 +/− 0.061 0.627 +/− 0.156 0.487 +/− 0.16
Red
10-fold cross-validation (full set, n = 148)
Method Acc Accb Sp Se MCC
RF 0.877 +/− 0.016 0.82 +/− 0.042 0.955 +/− 0.022 0.685 +/− 0.062 0.691 +/− 0.043
SVM-linear 0.846 +/− 0.026 0.7685 +/− 0.033 0.954 +/− 0.027 0.583 +/− 0.039 0.609 +/− 0.071
SVM-radial 0.737 +/− 0.007 0.5475 +/− 0.0075 1 +/− 0 0.095 +/− 0.015 0.262 +/− 0.024
SVM-sigmoid 0.647 +/− 0.047 0.5365 +/− 0.063 0.8 +/− 0.038 0.273 +/− 0.088 0.079 +/− 0.122
LDA 0.68 +/− 0.041 0.676 +/− 0.053 0.687 +/− 0.049 0.665 +/− 0.057 0.325 +/− 0.074
CART 0.804 +/− 0.016 0.752 +/− 0.044 0.876 +/− 0.028 0.628 +/− 0.06 0.517 +/− 0.043
NN 0.795 +/− 0.028 0.7485 +/− 0.04 0.86 +/− 0.026 0.637 +/− 0.054 0.501 +/− 0.065
Fitting (training set, n = 126)
Method Acc Accb Sp Se MCC
RF 0.99 +/− 0.005 0.9825 +/− 0.009 1 +/− 0 0.965 +/− 0.017 0.975 +/− 0.012
SVM-linear 0.934 +/− 0.033 0.895 +/− 0.0565 0.989 +/− 0.01 0.801 +/− 0.103 0.839 +/− 0.083
SVM-radial 0.98 +/− 0.005 0.966 +/− 0.0095 1 +/− 0 0.932 +/− 0.019 0.953 +/− 0.013
SVM-sigmoid 0.622 +/− 0.055 0.517 +/− 0.089 0.764 +/− 0.075 0.27 +/− 0.103 0.043 +/− 0.113
LDA 0.997 +/− 0.004 0.9955 +/− 0.007 1 +/− 0 0.991 +/− 0.013 0.994 +/− 0.009
CART 0.104 +/− 0.014 0.1335 +/− 0.057 0.064 +/− 0.031 0.203 +/− 0.082 −0.748 +/− 0.034
NN 0.958 +/− 0.028 0.9485 +/− 0.036 0.973 +/− 0.022 0.924 +/− 0.05 0.899 +/− 0.069
External validation (test set, n = 22)
Method Acc Accb Sp Se MCC
RF 0.868 +/− 0.04 0.7995 +/− 0.121 0.949 +/− 0.055 0.65 +/− 0.186 0.672 +/− 0.092
SVM-linear 0.839 +/− 0.092 0.7705 +/− 0.1175 0.932 +/− 0.101 0.609 +/− 0.134 0.621 +/− 0.172
SVM-radial 0.727 +/− 0.076 0.529 +/− 0.0355 1 +/− 0 0.058 +/− 0.071 0.126 +/− 0.156
SVM-sigmoid 0.587 +/− 0.1 0.5355 +/− 0.162 0.701 +/− 0.094 0.37 +/− 0.23 0.038 +/− 0.219
LDA 0.64 +/− 0.114 0.625 +/− 0.168 0.664 +/− 0.143 0.586 +/− 0.192 0.235 +/− 0.217
CART 0.229 +/− 0.077 0.27 +/− 0.155 0.191 +/− 0.115 0.349 +/− 0.195 −0.46 +/− 0.186
NN 0.845 +/− 0.071 0.808 +/− 0.1605 0.882 +/− 0.065 0.734 +/− 0.256 0.61 +/− 0.222

Chemicals active under any cell culture condition were considered active for each color channel. Each model-building process was repeated 10 times, each time with a distinct data split and a different undersampling of inactives from the entire Tox21 dataset; each performance criterion is reported as mean (M) +/− standard deviation (SD). Acc: accuracy; Accb: balanced accuracy; Sp: specificity; Se: sensitivity; MCC: Matthews correlation coefficient (see Methods).
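The abbreviations in the footnote correspond to the standard confusion-matrix metrics. The sketch below computes them from binary counts, assuming actives are the positive class (so Se is the fraction of actives predicted active); the counts in the usage lines are illustrative and not taken from the study. As a sanity check, the Accb column agrees with (Sp + Se)/2 taken from the same row, e.g. Blue, RF, 10-fold CV: (0.917 + 0.453)/2 = 0.685.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Acc, Accb, Sp, Se and MCC from binary confusion-matrix counts.

    Assumes actives are the positive class (tp = actives predicted active).
    """
    se = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on actives
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on inactives
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    accb = (se + sp) / 2                     # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; set to 0 if any margin total is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Accb": accb, "Sp": sp, "Se": se, "MCC": mcc}


# Illustrative counts only (not from the study)
m = classification_metrics(tp=40, tn=90, fp=10, fn=20)
print({k: round(v, 3) for k, v in m.items()})

# Balanced accuracy from one table row (Blue, RF, 10-fold CV); reported Accb: 0.685
print(round((0.917 + 0.453) / 2, 4))
```

Balanced accuracy and MCC are the more informative columns here because they penalize models that predict nearly everything as inactive, as SVM-radial does (Sp near 1, Se near 0.05).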
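The repeated procedure described in the footnote (10 runs, each with a fresh data split and a fresh undersampling of inactives, summarized as mean +/− SD) might be organized as in the sketch below. It is not the authors' pipeline: the undersampling `ratio`, the 0.15 test fraction (inferred only from the table's n values, e.g. 157 of 1045 Blue-channel compounds held out), the use of the sample SD, and all function names are hypothetical.

```python
import random
import statistics


def undersample_inactives(actives, inactives, ratio, rng):
    """Keep every active plus a random subset of inactives.

    `ratio` (inactives kept per active) is hypothetical; the footnote
    does not state the ratio that was used.
    """
    k = min(len(inactives), round(ratio * len(actives)))
    return list(actives) + rng.sample(list(inactives), k)


def train_test_split(items, test_fraction, rng):
    """Shuffle and hold out `test_fraction` of the items as a test set."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def summarize(runs):
    """Mean and sample standard deviation of each metric over repeated runs."""
    return {
        key: (statistics.mean(run[key] for run in runs),
              statistics.stdev(run[key] for run in runs))
        for key in runs[0]
    }


def repeat(evaluate_once, n_repeats=10, seed=0):
    """Call a caller-supplied evaluation n_repeats times and summarize it."""
    rng = random.Random(seed)
    return summarize([evaluate_once(rng) for _ in range(n_repeats)])


# Split sizes matching the Blue channel rows (888 + 157 = 1045)
rng = random.Random(0)
train, test = train_test_split(range(1045), 0.15, rng)
print(len(train), len(test))

# Aggregating two hypothetical runs into (mean, SD)
print(summarize([{"Acc": 0.70}, {"Acc": 0.90}]))
```

Passing the random generator explicitly keeps each repetition reproducible from a single seed while still giving every run a different split and a different subset of inactives.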