Author manuscript; available in PMC: 2015 Aug 24.
Published in final edited form as: Nat Biotechnol. 2014 Jun 1;32(12):1202–1212. doi: 10.1038/nbt.2877

Table 1.

NCI-DREAM drug sensitivity prediction methods

Team | Synopsis | wpc-index (scaled) | FDR | Data
Kernel method
1 | Bayesian multitask multiple kernel learning (MKL; see main text). | 0.583 (0.629) | 2.6 × 10⁻⁵ | exnmrc OI
2 | A predefined number of features were selected using Pearson correlation; training and prediction were done using support vector regression (SVR; radial basis). | 0.559 (0.592) | 1.0 × 10⁻³ | enmrc
3 | Separate normalizations were applied to each dataset; several support vector machine (SVM) classifiers were independently trained (varying kernels and input data); final predictions were made using a weighted average of all SVM outputs. | 0.553 (0.582) | 2.7 × 10⁻³ | exnmrc
4 | Bidirectional search was used to select features; training and prediction were done using an SVM (radial basis). | 0.549 (0.575) | 4.8 × 10⁻³ | enmrc
Nonlinear regression (regression trees)
1 | Features were randomly selected to build an ensemble of unpruned regression trees for each dataset; missing values were imputed; weights for the models were calculated; final predictions were made using a weighted sum of the individual models. | 0.577 (0.620) | 7.2 × 10⁻⁵ | enm
2 | Features were filtered based on their correlation to dose-response values; random forests were trained for each dataset; missing values were imputed; final rankings were based on a composite score from four individual dataset models (enrc). | 0.569 (0.607) | 2.9 × 10⁻⁴ | enrc OI
3 | Features were filtered based on their correlation to dose-response values; random forests were trained for each dataset; missing values were imputed; final rankings were based on a composite score from five individual dataset models (enmrc). | 0.565 (0.601) | 5.1 × 10⁻⁴ | enmrc OI
4 | Features were filtered based on their correlation to dose-response values; random forests were trained for each dataset; missing values were imputed; final rankings were based on a composite score from five individual dataset models (exnrc). | 0.564 (0.599) | 5.1 × 10⁻⁴ | exnrc OI
5 | Features were filtered based on their correlation to dose-response values; random forests were trained for each dataset; missing values were imputed; final rankings were based on a composite score from individual dataset models (exnmrc). | 0.559 (0.591) | 1.0 × 10⁻³ | exnmrc OI
6 | Gene features were selected using linear regression and the maximal information coefficient; pathway information was also used to derive features; training and prediction were done using a random forest model. | 0.551 (0.579) | 3.3 × 10⁻³ | exnmrc
7 | Random forests were constructed in a stacked approach: an ensemble of regression trees was constructed for each drug/dataset pair; missing values were imputed; predictions were made from the individual models, and another random forest was used to combine the individual predictions for each drug into a final prediction. | 0.548 (0.575) | 5.0 × 10⁻³ | exnmrc
8 | Features were ranked according to the absolute value of Spearman's correlation; each cell line's average rank across the top features was then calculated. | 0.548 (0.574) | 5.0 × 10⁻³ | exnmrc
9 | Features were selected using Pearson correlation and a combination of bagging and gradient boosting; predictions were made using the selected features and a regression tree. | 0.544 (0.568) | 1.0 × 10⁻² | exnmrc
10 | Features were selected using matrix approximation methods leveraging SVD; training and prediction were done using regression tree models with gradient boosting. | 0.538 (0.560) | 1.9 × 10⁻² | en
11 | Features were selected for individual cell lines by constructing random forests and pruning (recursive feature elimination); missing values were imputed; final predictions were made by training a random forest using features from all cell lines. In addition to cell line features, bioactivity spectra of the individual compounds were included as compound features. | 0.524 (0.538) | 9.2 × 10⁻² | exnmrc
Sparse linear regression
1 | For each drug, features were selected and a ranking model was built simultaneously by lasso regression. | 0.564 (0.600) | 5.1 × 10⁻⁴ | en
2 | Features were initially filtered based on linear regression against drug response; training and prediction were done using elastic nets. | 0.564 (0.600) | 5.1 × 10⁻⁴ | exnmrc
3 | Gene and pathway features were determined using a one-dimensional factor analysis; training and prediction were done with spike-and-slab multitask regression; drug dose-response values were recalculated from raw growth curves. | 0.564 (0.598) | 5.1 × 10⁻⁴ | exnmrc OI
4 | Missing features were imputed; combinations of datasets were enumerated and used to train elastic net regression models; for each drug, final predictions were made using the best-performing model. | 0.551 (0.579) | 3.3 × 10⁻³ | exmrc
5 | Gene and pathway features were determined using a one-dimensional factor analysis; training and prediction were done with spike-and-slab multitask regression; drug dose-response values were recalculated from raw growth curves; data from Heiser et al. were used to train the model. | 0.539 (0.560) | 1.9 × 10⁻² | exnmrc OI
6 | Features with low dynamic range were removed; missing feature values were imputed; training and prediction were done using lasso regression on individual datasets; final predictions were made using the weighted sum of the regression models. | 0.539 (0.560) | 1.9 × 10⁻² | exnmrc
7 | Statistically significant features were selected using Spearman correlation; training and prediction were done using an elastic net. | 0.532 (0.549) | 4.7 × 10⁻² | e
8 | Features were constructed by grouping genes according to GO terms; training and prediction were done using relaxed lasso regression. | 0.531 (0.548) | 4.7 × 10⁻² | en OI
9 | Gene and pathway features were determined using a one-dimensional factor analysis; training and prediction were done with spike-and-slab multitask regression; GI50 values were used. | 0.531 (0.547) | 4.9 × 10⁻² | exnmrc OI
10 | Features were selected using a regression with a log penalty, which bridges the L0 and L1 penalties; missing values were imputed; penalized regression models were trained on individual datasets; final predictions were made using a weighted average. | 0.531 (0.547) | 4.9 × 10⁻² | exnrc
11 | Features were selected based on elastic nets; missing values were imputed; training and prediction were done using ridge regression. | 0.527 (0.543) | 6.7 × 10⁻² | exnmrc
12 | Features were filtered on dataset-specific criteria; missing values were set to random numbers; training and prediction were done using the interior-point method for L1 regularization. | 0.519 (0.529) | 1.5 × 10⁻¹ | enmrc
13 | Features were selected using a Gompertz growth model; predictions were made using a lasso regression model. | 0.517 (0.526) | 1.8 × 10⁻¹ | exnmrc
14 | Putative gene set expression values were calculated from constituent genes; training and prediction were done using linear regression. | 0.485 (0.477) | 8.0 × 10⁻¹ | e
PLS or PC regression
1 | Lowly expressed and/or low-variance features were removed; features were selected based on correlation to drug response; multiple partial least squares regression models were trained, and a consensus was determined for the final prediction. | 0.562 (0.597) | 5.5 × 10⁻⁴ | en OI
2 | Features were selected using lasso regression and groups of genes predefined by core signaling pathways; predictions were made by linear regression of the reduced feature set to drug response; predictor datasets were merged in advance of drug response prediction, and responses were predicted simultaneously, sharing information among drugs. | 0.543 (0.567) | 1.0 × 10⁻² | exnmrc OI
3 | Training and prediction were done using principal component regression for individual drugs. | 0.535 (0.554) | 3.1 × 10⁻² | exnmrc
4 | Statistically significant features were selected using correlation; models were fit using principal component regression; final predictions were made using a weighted average of models. | 0.524 (0.538) | 9.2 × 10⁻² | en
Ensemble/model selection
1 | Features were selected using correlation; dimensionality was reduced using principal component analysis and lasso and ridge methods; several regression models were trained for individual drugs, and the top cross-validated model was selected to make final predictions for each drug. | 0.562 (0.597) | 5.5 × 10⁻⁴ | exnmrc
2 | Features were selected based on outside information; missing values were imputed; predictions were made by aggregating results from an ensemble of machine-learning methods. | 0.556 (0.587) | 1.6 × 10⁻³ | exnmrc
3 | Features were selected using Spearman's rank correlation; missing values were imputed; predictions were made using the best-performing method (determined by cross-validation on the training set) among an ensemble of methods (random forest, support vector machine and linear regression). | 0.554 (0.583) | 2.6 × 10⁻³ | exnmrc
4 | Gene and pathway features were compiled using outside data; an ensemble of prediction models was trained; final predictions were based on a rank aggregation of the combined prediction models. | 0.517 (0.527) | 1.7 × 10⁻¹ | exnmrc OI
5 | Features were selected using outside pathway and interaction data; missing values were imputed; individual drug predictions were made using the best model selected from an ensemble of methods. | 0.506 (0.509) | 3.7 × 10⁻¹ | e OI
Other
1 | Features were weighted based on Pearson's correlation to drug response; predictions were made using the correlation of the weighted features. | 0.570 (0.608) | 2.9 × 10⁻⁴ | enr
2 | Gene features strongly associated with survival in the METABRIC dataset were selected and then hierarchically clustered; a linear model was built to fit gene clusters to drug response; predictions were made using a regression model. | 0.553 (0.582) | 2.6 × 10⁻³ | e OI
3 | Missing features were imputed; signatures were extracted for each dataset; predictions were made by 1-nearest-neighbor matching to training cell lines via Pearson's correlation between signatures for each data type; final predictions are the weighted sum over the individual datasets. | 0.553 (0.581) | 2.7 × 10⁻³ | exnmrc
4 | Features were selected using dataset-specific criteria; missing values were imputed; predictions were made using k-nearest neighbors (KNN). | 0.531 (0.549) | 4.7 × 10⁻² | exnmrc
5 | Features were filtered using dataset-specific criteria; an ensemble of Cox regression models was constructed using random sampling from top-performing features; the final prediction is the average of all models. | 0.528 (0.543) | 6.5 × 10⁻² | nmc
6 | Features were selected using the concordance index; predictions were made using an integrated voting strategy based on each feature's ability to predict the order of pairs of cell lines. | 0.521 (0.532) | 1.3 × 10⁻¹ | enmrc

The 44 team submissions were categorized according to their underlying methodology. The indexing scheme is used in Figures 2 and 5. Team scores (wpc-index) were rescaled by setting the gold-standard ranking to 1 and its inverse to 0; scaled values are shown in parentheses. Teams leveraged different genomic datasets, coded as (e) gene expression, (x) exome sequencing, (n) RNA-seq, (m) methylation, (r) RPPA and (c) copy number variation. The use of outside information, often in the form of biological pathway annotation, was found to improve average team rank and is noted in the Data column as 'OI'. Additional method characterizations can be found in Supplemental Table 1.
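The legend's rescaling can be made concrete with a little arithmetic. The sketch below assumes the rescaling is a linear (min-max) map that sends the wpc-index of the inverse ranking to 0 and that of the gold-standard ranking to 1. The table does not report those two endpoint scores, so the snippet back-calculates them from two rows (Kernel method 1 and Sparse linear regression 14) and then checks a few other rows against them. The function names are illustrative, not from the paper.

```python
"""Back-calculate the rescaling endpoints implied by Table 1 (illustrative sketch).

Assumption: scaled = (raw - inverse) / (gold - inverse), a linear min-max map.
"""

# (raw wpc-index, scaled wpc-index) pairs copied from Table 1.
PAIRS = [
    (0.583, 0.629),  # Kernel method 1
    (0.485, 0.477),  # Sparse linear regression 14
    (0.506, 0.509),  # Ensemble/model selection 5
    (0.564, 0.600),  # Sparse linear regression 1
    (0.548, 0.575),  # Nonlinear regression 7
]


def rescale(raw: float, inverse: float, gold: float) -> float:
    """Map `inverse` to 0 and `gold` to 1 with a linear transform."""
    return (raw - inverse) / (gold - inverse)


def endpoints_from_two_pairs(p1, p2):
    """Solve the linear map for (inverse, gold) from two (raw, scaled) pairs."""
    (r1, s1), (r2, s2) = p1, p2
    span = (r1 - r2) / (s1 - s2)  # equals gold - inverse
    inverse = r1 - s1 * span      # raw score that maps to 0
    return inverse, inverse + span


inverse, gold = endpoints_from_two_pairs(PAIRS[0], PAIRS[1])
print(f"estimated inverse ~ {inverse:.3f}, estimated gold ~ {gold:.3f}")
for raw, scaled in PAIRS:
    # Compare the published scaled value with the one implied by the estimates.
    print(raw, scaled, round(rescale(raw, inverse, gold), 3))
```

On these rows the estimated endpoints come out near 0.18 and 0.82, and every checked pair is reproduced to within table rounding. The two estimates sum to roughly 1, which is what one would expect if reversing a ranking turns each concordant pair into a discordant one. They are estimates from rounded published values, not figures reported in the paper.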
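Both the team scores and the sixth "Other" method rest on pairwise concordance between predicted and observed orderings of cell lines. The wpc-index is described in the main text as a weighted, probabilistic extension of the concordance index. The sketch below implements only the plain concordance index (the fraction of comparable cell-line pairs whose predicted order matches the observed order, with tied predictions counted as one half), not the weighting or probabilistic components. The function name and the toy sensitivity values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Plain (unweighted, non-probabilistic) concordance index.

    Pairs tied in `observed` have no ground-truth order and are skipped;
    pairs tied in `predicted` contribute 0.5.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        if d_obs == 0:
            continue  # non-comparable pair
        comparable += 1
        d_pred = predicted[i] - predicted[j]
        if d_pred == 0:
            concordant += 0.5  # prediction cannot separate the pair
        elif (d_obs > 0) == (d_pred > 0):
            concordant += 1.0  # same direction in both orderings
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical observed responses for four cell lines.
obs = [0.1, 0.4, 0.35, 0.8]
print(concordance_index(obs, obs))                    # perfect ordering
print(concordance_index(obs, [-v for v in obs]))      # fully reversed ordering
print(concordance_index(obs, [0.2, 0.3, 0.5, 0.9]))   # 5 of 6 pairs concordant
```

A reversed ranking scores 0 and, as long as the predictions contain no ties, always scores one minus the original ranking, because every comparable pair flips.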
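Nonlinear regression method 8 ranks features by the absolute value of Spearman's correlation and averages cell-line ranks over the top features. This method is simple enough to sketch end to end. The code is one plausible reading of that synopsis, not the team's code. The cutoff `k`, the sign flip for negatively correlated features, mean-rank tie handling, and all function names are assumptions, and the toy matrices are invented for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks (smallest value gets rank 1); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # sorted positions pos..end share this rank
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature or response: treat as uncorrelated
    return cov / (vx * vy) ** 0.5


def rank_average_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                         test_X: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score test cell lines by their mean rank over the top-k |rho| features.

    Rows are cell lines, columns are features. A higher score means a higher
    predicted value of the response variable.
    """
    n_features = len(train_X[0])
    rhos = [spearman([row[f] for row in train_X], train_y) for f in range(n_features)]
    top = sorted(range(n_features), key=lambda f: abs(rhos[f]), reverse=True)[:k]
    per_feature_ranks = []
    for f in top:
        sign = 1.0 if rhos[f] >= 0 else -1.0  # flip negatively correlated features
        per_feature_ranks.append(average_ranks([sign * row[f] for row in test_X]))
    return [sum(r[i] for r in per_feature_ranks) / len(top) for i in range(len(test_X))]


# Toy data: feature 0 rises with the response, feature 1 falls, feature 2 is noise.
train_X = [[0.1, 9.0, 0.5], [0.2, 7.0, 0.1], [0.4, 5.0, 0.9], [0.5, 3.0, 0.3], [0.9, 1.0, 0.2]]
train_y = [1.0, 2.0, 3.0, 4.0, 5.0]
test_X = [[0.3, 6.0, 0.0], [0.8, 2.0, 1.0], [0.05, 9.5, 0.5]]
print(rank_average_predict(train_X, train_y, test_X, k=2))
```

With `k=2` the noise feature is dropped, and both retained features order the three test cell lines the same way, so the averaged ranks simply reproduce that shared ordering.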