2010 Dec 5;27(2):220–224. doi: 10.1093/bioinformatics/btq628

Fig. 1.

Overview of the model-building algorithm. (A) A Random Forest model is fit between all probesets in the training set (16 644) and the IC50 values for each drug. (B) Probesets whose variable importance exceeds the mean variable importance of all probesets by more than 2 SDs are kept as a gene expression signature; a second Random Forest model is fit between this signature and the IC50 values for each drug. (C) Case proximity values for each drug are generated from the second model using Equation (1), outlying cell lines are removed, and a third Random Forest model is fit using the remaining cell lines and the gene expression signature.
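A minimal sketch of the selection and filtering logic in steps (B) and (C), written in plain Python so the rules are concrete. It assumes the Random Forest has already been fit elsewhere and supplies (i) a variable-importance value per probeset and (ii) the terminal node each cell line reaches in each tree (for example, the transpose of the array returned by scikit-learn's `RandomForestRegressor.apply(X)`). Equation (1) is not reproduced here, so `outlier_scores` uses Breiman-style raw outlyingness as a hypothetical stand-in, and the removal `cutoff` is likewise hypothetical. All function names are illustrative, not the authors' code.

```python
from statistics import mean, stdev


def select_signature(importances, n_sd=2.0):
    """Step (B): keep probesets whose variable importance exceeds the
    mean importance of all probesets by more than n_sd standard deviations."""
    values = list(importances.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return [probeset for probeset, v in importances.items() if v > cutoff]


def proximity_matrix(leaves):
    """Case proximities: prox[i][j] is the fraction of trees in which cell
    lines i and j land in the same terminal node.
    leaves[t][i] is the terminal-node id of cell line i in tree t."""
    n_trees = len(leaves)
    n_cases = len(leaves[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for tree in leaves:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


def outlier_scores(prox):
    """HYPOTHETICAL stand-in for Equation (1): Breiman-style raw outlyingness
    n / sum_{j != i} prox[i][j]**2, so a large score means an isolated case."""
    n = len(prox)
    scores = []
    for i in range(n):
        total = sum(prox[i][j] ** 2 for j in range(n) if j != i)
        scores.append(n / total if total > 0 else float("inf"))
    return scores


def drop_outliers(cell_lines, scores, cutoff):
    """Keep cell lines whose score is at or below a (hypothetical) cutoff."""
    return [name for name, s in zip(cell_lines, scores) if s <= cutoff]


if __name__ == "__main__":
    # Toy importances: one probeset stands far above the rest.
    imp = {f"probe{k}": 1.0 for k in range(10)}
    imp["probe_hot"] = 10.0
    print(select_signature(imp))  # ['probe_hot']

    # Toy terminal-node assignments for 3 cell lines across 3 trees.
    leaves = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
    scores = outlier_scores(proximity_matrix(leaves))
    print(drop_outliers(["A", "B", "C"], scores, cutoff=10.0))  # ['A', 'B']
```

In the toy run, cell line C rarely shares a terminal node with the others, so its score (27.0) is far above those of A (6.75) and B (5.4) and it is dropped; the third Random Forest would then be refit on the kept cell lines using only the signature probesets.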