2014 Jan 10;54(2):634–647. doi: 10.1021/ci400460q

Table 1. QSAR Model Validations on the External 5-Fold CV Sets As Well As the Additional Independent External Set from WOMBAT.

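Column groups: confusion matrix = TP, TN, FP, FN; statistics = SE, SP, EN(1), EN(2).

| machine learning method | external set | prediction CCR | N(1)^a | N(2)^a | TP | TN | FP | FN | SE | SP | EN(1) | EN(2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-nearest neighbor | 1 | 0.86 | 19^b | 14 | 17 | 11 | 3 | 2 | 0.89 | 0.79 | 1.61 | 1.76 |
| | 2 | 0.61 | 20 | 13 | 15 | 6 | 7 | 5 | 0.75 | 0.46 | 1.16 | 1.30 |
| | 3 | 0.77 | 22 | 11 | 20 | 7 | 4 | 2 | 0.91 | 0.64 | 1.43 | 1.75 |
| | 4 | 0.86 | 20 | 13 | 19 | 10 | 3 | 1 | 0.95 | 0.77 | 1.61 | 1.88 |
| | 5 | 0.68 | 23 | 10 | 22 | 4 | 6 | 1 | 0.96 | 0.40 | 1.23 | 1.80 |
| | Cumulative | 0.76 | 104 | 61 | 93 | 38 | 23 | 11 | 0.89 | 0.62 | 1.41 | 1.71 |
| | WOMBAT | N/A | 66 | 0 | 62 | N/A | N/A | 4 | 0.94 | N/A | N/A | N/A |
| random forest | 1 | 0.80 | 20 | 14 | 16 | 11 | 3 | 4 | 0.80 | 0.79 | 1.58 | 1.59 |
| | 2 | 0.68 | 20 | 13 | 15 | 8 | 5 | 5 | 0.75 | 0.62 | 1.32 | 1.42 |
| | 3 | 0.84 | 22 | 11 | 21 | 8 | 3 | 1 | 0.95 | 0.73 | 1.56 | 1.88 |
| | 4 | 0.74 | 20 | 13 | 19 | 7 | 6 | 1 | 0.95 | 0.54 | 1.35 | 1.83 |
| | 5 | 0.83 | 23 | 10 | 22 | 7 | 3 | 1 | 0.96 | 0.70 | 1.52 | 1.88 |
| | Cumulative | 0.78 | 105 | 61 | 93 | 41 | 20 | 12 | 0.89 | 0.67 | 1.46 | 1.71 |
| | WOMBAT | N/A | 66 | 0 | 62 | N/A | N/A | 4 | 0.94 | N/A | N/A | N/A |
| support vector machines | 1 | 0.87 | 20 | 14 | 19 | 11 | 3 | 1 | 0.95 | 0.79 | 1.36 | 1.88 |
| | 2 | 0.68 | 20 | 13 | 18 | 6 | 7 | 2 | 0.90 | 0.46 | 1.25 | 1.64 |
| | 3 | 0.95 | 22 | 11 | 22 | 10 | 1 | 0 | 1.00 | 0.91 | 1.83 | 2.00 |
| | 4 | 0.76 | 20 | 13 | 18 | 8 | 5 | 2 | 0.90 | 0.62 | 1.40 | 1.72 |
| | 5 | 0.76 | 23 | 10 | 21 | 6 | 4 | 2 | 0.91 | 0.60 | 1.39 | 1.75 |
| | Cumulative | 0.80 | 105 | 61 | 98 | 41 | 20 | 7 | 0.93 | 0.67 | 1.48 | 1.82 |
| | WOMBAT | N/A | 66 | 0 | 62 | N/A | N/A | 4 | 0.96 | N/A | N/A | N/A |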
^a N(1) = number of actives; N(2) = number of inactives; TP = true positives (actives predicted as actives); FP = false positives (inactives predicted as actives); FN = false negatives (actives predicted as inactives); TN = true negatives (inactives predicted as inactives); SE = sensitivity = TP/N(1); SP = specificity = TN/N(2); EN = normalized enrichment, with EN(1) = (2TP × N(2))/(TP × N(2) + FP × N(1)) and EN(2) = (2TN × N(1))/(TN × N(1) + FN × N(2)); CCR = correct classification rate.

^b Some of the N(1) actives and N(2) inactives were outside the application domain (AD) of all consensus models and therefore have no prediction. Only data for compounds found within the AD were used for statistical summaries.
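The footnote definitions of SE, SP, EN(1), and EN(2) can be checked against a table row with a few lines of Python. The sketch below is illustrative only: the function name `table1_statistics` is hypothetical, N(1) and N(2) are taken as TP + FN and TN + FP (that is, only compounds that received a prediction), and CCR is assumed to be the mean of SE and SP, which the footnote does not state explicitly.

```python
def table1_statistics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute the Table 1 statistics from one confusion matrix.

    Assumptions (not stated in the source): N(1) = TP + FN, N(2) = TN + FP,
    and CCR = (SE + SP) / 2.
    """
    n1 = tp + fn  # actives with a prediction
    n2 = tn + fp  # inactives with a prediction
    if n1 == 0 or n2 == 0:
        # Rows such as WOMBAT (no inactives) are reported as N/A in the table.
        raise ValueError("both classes must be present to compute all statistics")
    if tp + fp == 0 or tn + fn == 0:
        # Enrichment is undefined when nothing is predicted for a class.
        raise ValueError("enrichment is undefined when a class is never predicted")

    se = tp / n1  # sensitivity
    sp = tn / n2  # specificity
    en1 = (2 * tp * n2) / (tp * n2 + fp * n1)  # normalized enrichment of actives
    en2 = (2 * tn * n1) / (tn * n1 + fn * n2)  # normalized enrichment of inactives
    ccr = (se + sp) / 2  # assumed definition of the correct classification rate
    return {"CCR": ccr, "SE": se, "SP": sp, "EN(1)": en1, "EN(2)": en2}


if __name__ == "__main__":
    # k-nearest neighbor, external set 3: TP = 20, TN = 7, FP = 4, FN = 2
    for name, value in table1_statistics(tp=20, tn=7, fp=4, fn=2).items():
        print(f"{name}: {value:.2f}")
```

With the counts for k-nearest neighbor on external set 3, the rounded values match the reported row (CCR 0.77, SE 0.91, SP 0.64, EN(1) 1.43, EN(2) 1.75). Both enrichment values range from 0 to 2, and 2.00 is reached only when the corresponding error count (FP for EN(1), FN for EN(2)) is zero.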