Table 2. Performance of two random forest models on a validation set of 152 FDA-approved drugs as a function of the number of cell lines tested.
# of Cell Lines | All | 7 | 6 | 5 | 4 |
# of Drugs | 152 | 29 | 30 | 42 | 51 |
On-the-fly | | | | | |
Top 100 | 58 | 13 | 15 | 16 | 14 |
Top 50 | 42 | 10 | 10 | 12 | 10 |
Top 100% | 38% | 45% | 50% | 38% | 27% |
Top 50% | 28% | 34% | 33% | 29% | 20% |
Two-level | | | | | |
Top 100 | 63 | 14 | 15 | 22 | 13 |
Top 50 | 54 | 12 | 14 | 20 | 8 |
Top 100% | 41% | 48% | 50% | 52% | 25% |
Top 50% | 36% | 41% | 47% | 48% | 16% |
The "Top 100%" and "Top 50%" rows give the percentage of drugs whose targets were correctly predicted within the top 100 or top 50, respectively; the "Top 100" and "Top 50" rows give the corresponding numbers of drugs. Both are shown for the “on-the-fly” and “two-level” RF classification models. Results are shown for “All” drugs tested in four or more cell lines, as well as for the subsets of drugs profiled in 7, 6, 5, or 4 cell lines. The success rate of the RF models is significant, with p < 10^-6 based on randomization tests (S1 Fig).
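The percentage rows follow directly from the count rows: each count is divided by the number of drugs in the same column. The sketch below recomputes them from the values in Table 2. The variable and function names (`n_drugs`, `top_counts`, `success_rates`) are illustrative only and do not come from the original analysis; rates are reported to one decimal place, whereas the table rounds to whole percentages.

```python
# Counts transcribed from Table 2; column order: All, 7, 6, 5, 4 cell lines.
columns = ["All", "7", "6", "5", "4"]
n_drugs = [152, 29, 30, 42, 51]

# Number of drugs with targets ranked in the top 100 / top 50, per model.
top_counts = {
    ("On-the-fly", "Top 100"): [58, 13, 15, 16, 14],
    ("On-the-fly", "Top 50"): [42, 10, 10, 12, 10],
    ("Two-level", "Top 100"): [63, 14, 15, 22, 13],
    ("Two-level", "Top 50"): [54, 12, 14, 20, 8],
}


def success_rates(counts, totals):
    """Return count / total as a percentage (one decimal) for each column."""
    return [round(100.0 * c / t, 1) for c, t in zip(counts, totals)]


rates = {key: success_rates(counts, n_drugs) for key, counts in top_counts.items()}

for (model, cutoff), values in rates.items():
    row = ", ".join(f"{col}: {v}%" for col, v in zip(columns, values))
    print(f"{model} {cutoff}: {row}")
```

Rounding each computed rate to the nearest whole number gives the percentages listed in the table.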