Table 5. Comparison of RFs induced using non-redundant subsets of the Kinase dataset.
Threshold | Non-redundant Observations (Pos/Unl) | Non-redundant Dataset G Mean | Entire Dataset TP | Entire Dataset FP | Entire Dataset TN | Entire Dataset FN | Entire Dataset G Mean |
---|---|---|---|---|---|---|---|
20% | 102 (18/84) | 0.79 | 51 | 196 | 371 | 43 | 0.60 |
30% | 198 (26/172) | 0.85 | 49 | 165 | 402 | 45 | 0.61 |
40% | 332 (49/283) | 0.78 | 75 | 184 | 383 | 19 | 0.73 |
50% | 432 (67/365) | 0.79 | 72 | 120 | 447 | 22 | 0.78 |
60% | 497 (77/420) | 0.81 | 77 | 132 | 435 | 17 | 0.79 |
70% | 569 (83/486) | 0.79 | 72 | 118 | 449 | 22 | 0.78 |
80% | 625 (88/537) | 0.80 | 72 | 112 | 455 | 22 | 0.78 |
90% | 650 (94/556) | 0.79 | 69 | 90 | 477 | 25 | 0.79 |
100% | 661 (94/567) | 0.80 | 72 | 98 | 469 | 22 | 0.80 |
For each threshold, a non-redundant dataset was generated using Leaf and used to induce an RF. The RF was then used to classify the proteins in both the non-redundant dataset it was trained on and the entire Kinase dataset. TP and FN are the numbers of positive proteins in the entire dataset predicted correctly and incorrectly, respectively; TN and FP are the numbers of unlabelled proteins predicted correctly and incorrectly, respectively.
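The G mean is not defined alongside the table. A minimal sketch, assuming the usual definition (the geometric mean of sensitivity and specificity, with unlabelled proteins treated as negatives), recomputes the entire-dataset G mean from the TP, FP, TN and FN counts. The function name `g_mean` and the `ROWS` list are illustrative only; the counts are copied from the table.

```python
from math import sqrt


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)  # fraction of positive proteins predicted correctly
    specificity = tn / (tn + fp)  # fraction of unlabelled proteins predicted correctly
    return sqrt(sensitivity * specificity)


# (threshold, TP, FP, TN, FN, reported entire-dataset G mean), copied from Table 5
ROWS = [
    ("20%", 51, 196, 371, 43, 0.60),
    ("30%", 49, 165, 402, 45, 0.61),
    ("40%", 75, 184, 383, 19, 0.73),
    ("50%", 72, 120, 447, 22, 0.78),
    ("60%", 77, 132, 435, 17, 0.79),
    ("70%", 72, 118, 449, 22, 0.78),
    ("80%", 72, 112, 455, 22, 0.78),
    ("90%", 69, 90, 477, 25, 0.79),
    ("100%", 72, 98, 469, 22, 0.80),
]

for threshold, tp, fp, tn, fn, reported in ROWS:
    computed = g_mean(tp, fp, tn, fn)
    print(f"{threshold:>4}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this assumed definition, the computed values agree with the reported entire-dataset G means to two decimal places for every threshold.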