(A) Schematic of an example training-prediction-assessment workflow, depicting the generation of a prediction model (yellow, fK1) from MEKi PD-901 response data in the Klijn 2015 dataset (yK1, light red) and DNA and RNA features (xK). The 154 cell lines in common between the two datasets were excluded from model building. Prediction models were built on a randomly selected 70% of the training cell lines; this was repeated 30 times, and the final predicted drug response of a given cell line in the validation sets was calculated as the average over the 30 repeats. The resulting prediction models were applied to within-dataset and cross-dataset RNA and DNA data (xK and xC) to generate predicted drug response scores (ŷK(K1) and ŷC(K1)). Predicted drug response values, shown in light green boxes, were then compared with observed drug response to evaluate model performance (within-dataset: ŷK(K1) vs. yK1 | yK2; cross-dataset: ŷC(K1) vs. yC1 | yC2). Model generation is depicted with black arrows, model application with green dashed arrows, and performance assessment with blue dotted arrows. (B) Outline of the full set of combinations: 4 models based on input data, 4 algorithms, assessments comparing predicted MEKi response with the 4 series of observed response data, and 2 performance metrics. (C) Two examples showing observed and predicted log(IC50) from the fK1 model: regularized regression with within-dataset validation (top panel) and logistic regression with cross-dataset validation (bottom panel). Rank correlation (Spearman’s ρ) and concordance index are shown in the top left corner. (D) Performance of all combinations of models, algorithms (y-axis), and assessments, measured by rank correlation (Spearman’s ρ, top panel) and concordance index (bottom panel). Within-dataset performances are shown in shades of blue (cyan and dark blue), and between-dataset performances in shades of red (pink and dark red). Within each color, the darker shade indicates models trained on CCLE data. Gray boxes: random forest models trained on CCLE-Selumetinib data (fC2). Regul: regularized regression; RF (reg): regression-based random forest; Logit: logistic regression; RF (bin): classification-based (binary) random forest.
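
The subsample-and-average procedure in (A) and the two metrics in (B) to (D) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names here are hypothetical, a one-feature least-squares line stands in for the four algorithms named in the caption, and the concordance index is assumed to be the usual pairwise definition (fraction of comparable pairs ordered the same way in observed and predicted values, with predicted ties counted as 0.5).

```python
import random


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (stand-in for the paper's algorithms)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def ensemble_predict(train_x, train_y, new_x, repeats=30, fraction=0.7, seed=0):
    """Fit on a random 70% subsample, repeat 30 times, average the predictions."""
    rng = random.Random(seed)
    n = len(train_x)
    k = max(2, round(fraction * n))
    totals = [0.0] * len(new_x)
    for _ in range(repeats):
        idx = rng.sample(range(n), k)  # subsample without replacement
        model = fit_line([train_x[i] for i in idx], [train_y[i] for i in idx])
        for j, x in enumerate(new_x):
            totals[j] += model(x)
    return [t / repeats for t in totals]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def spearman_rho(a, b):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def concordance_index(observed, predicted):
    """Fraction of pairs with distinct observed values that are ordered alike."""
    score, comparable = 0.0, 0
    for i in range(len(observed)):
        for j in range(i + 1, len(observed)):
            if observed[i] == observed[j]:
                continue  # pair is not comparable
            comparable += 1
            diff = predicted[i] - predicted[j]
            if diff == 0:
                score += 0.5  # predicted tie
            elif (observed[i] > observed[j]) == (diff > 0):
                score += 1.0
    return score / comparable
```

With these helpers, an assessment such as ŷK(K1) vs. yK1 would amount to calling `ensemble_predict` on the training features and responses, then passing the observed and averaged predicted values to `spearman_rho` and `concordance_index`.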