Author manuscript; available in PMC: 2015 Aug 24.

Published in final edited form as: Nat Biotechnol. 2014 Jun 1;32(12):1202–1212. doi: 10.1038/nbt.2877

Table 1.

NCI-DREAM drug sensitivity prediction methods

Team	Synopsis	wpc-index (scaled)	FDR	Data
Kernel method
1	Bayesian multitask MKL (see main text).	0.583(0.629)	2.6 × 10⁻⁵	exnmrc OI
2	A predefined number of features were selected using Pearson correlation, training and prediction were done using support vector regression (SVR; radial basis).	0.559(0.592)	1.0 × 10⁻³	enmrc
3	Separate normalizations were applied to each dataset, several support vector machine (SVM) classifiers were independently trained (varying kernels and input data), final predictions were made using a weighted average of all SVM outputs.	0.553(0.582)	2.7 × 10⁻³	exnmrc
4	Bidirectional search was used to select features, training and prediction were done using an SVM (radial basis).	0.549(0.575)	4.8 × 10⁻³	enmrc
Nonlinear regression (regression trees)
1	Features were randomly selected to build an ensemble of unpruned regression trees for each dataset, missing values were imputed, weights for the models were calculated, final predictions were made using a weighted sum of the individual models.	0.577(0.620)	7.2 × 10⁻⁵	enm
2	Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from four individual dataset models (enrc).	0.569(0.607)	2.9 × 10⁻⁴	enrc OI
3	Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from five individual dataset models (enmrc).	0.565(0.601)	5.1 × 10⁻⁴	enmrc OI
4	Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from five individual dataset models (exnrc).	0.564(0.599)	5.1 × 10⁻⁴	exnrc OI
5	Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from individual dataset models (exnmrc).	0.559(0.591)	1.0 × 10⁻³	exnmrc OI
6	Gene features were selected using linear regression and maximal information coefficient, pathway information was also used to derive features, training and prediction were done using a random forest model.	0.551(0.579)	3.3 × 10⁻³	exnmrc
7	Random forests were constructed in a stacked approach, an ensemble of regression trees was constructed for all drug/dataset pairs, missing values were imputed, predictions were made for individual models and another random forest was used to combine the different predictions for the drugs into a final prediction.	0.548(0.575)	5.0 × 10⁻³	exnmrc
8	Features were ranked according to the absolute value of Spearman’s correlation, the average rank of all cell lines was calculated according to the top features.	0.548(0.574)	5.0 × 10⁻³	exnmrc
9	Features were selected using Pearson correlation and a combination of bagging and gradient boosting, prediction was made using selected features and a regression tree.	0.544(0.568)	1.0 × 10⁻²	exnmrc
10	Features were selected using matrix approximation methods leveraging SVD, training and prediction were done using regression tree models with gradient boosting.	0.538(0.560)	1.9 × 10⁻²	en
11	Features were selected for individual cell lines by constructing random forests and pruning (recursive feature elimination), missing values were imputed, final predictions were made by training a random forest using features from all cell lines. In addition to cell line features, bioactivity spectra of the individual compounds were included as compound features.	0.524(0.538)	9.2 × 10⁻²	exnmrc
Sparse linear regression
1	Features were simultaneously selected and a ranking model built for each drug by lasso regression.	0.564(0.600)	5.1 × 10⁻⁴	en
2	Features were initially filtered based on linear regression to drug response, training and prediction were done using elastic nets.	0.564(0.600)	5.1 × 10⁻⁴	exnmrc
3	Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, drug dose-response values were recalculated from raw growth curves.	0.564(0.598)	5.1 × 10⁻⁴	exnmrc OI
4	Missing features were imputed, combinations of datasets were enumerated and used to train elastic net regression models, final predictions for each drug were made using the best-performing model.	0.551(0.579)	3.3 × 10⁻³	exmrc
5	Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, drug dose-response values were recalculated from raw growth curves, Heiser et al. data were used to train the model.	0.539(0.560)	1.9 × 10⁻²	exnmrc OI
6	Features with low dynamic range were removed, missing feature values were imputed, training and predictions were made using lasso regression on individual datasets, final predictions were made using the weighted sum of regression models.	0.539(0.560)	1.9 × 10⁻²	exnmrc
7	Statistically significant features were selected using Spearman correlation, training and prediction were done using an elastic net.	0.532(0.549)	4.7 × 10⁻²	e
8	Features were constructed by grouping genes according to GO terms, training and prediction were done using relaxed lasso regression.	0.531(0.548)	4.7 × 10⁻²	en OI
9	Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, GI₅₀ values were used.	0.531(0.547)	4.9 × 10⁻²	exnmrc OI
10	Features were selected using a regression with log penalty, which bridges the L0 and L1 penalties, missing values were imputed, penalized regression models were trained on individual datasets, final predictions were made using a weighted average.	0.531(0.547)	4.9 × 10⁻²	exnrc
11	Features were selected based on elastic nets, missing values were imputed, training and predictions were done using ridge regression.	0.527(0.543)	6.7 × 10⁻²	exnmrc
12	Features were filtered on dataset-specific criteria, missing values were set to random numbers, training and predictions were made using the interior point method for L1-regularization.	0.519(0.529)	1.5 × 10⁻¹	enmrc
13	Features were selected using a Gompertz growth model, predictions were made using a lasso regression model.	0.517(0.526)	1.8 × 10⁻¹	exnmrc
14	Putative gene set expression values were calculated from constituent genes, training and predictions were made using linear regression.	0.485(0.477)	8.0 × 10⁻¹	e
PLS or PC regression
1	Lowly expressed and/or low-variance features were removed, features were selected based on correlation to drug response, multiple partial least squares regression models were trained and a consensus was determined for the final prediction.	0.562(0.597)	5.5 × 10⁻⁴	en OI
2	Features were selected by using lasso regression and groups of genes predefined by core signaling pathways, predictions were made by linear regression of the reduced feature set to drug response, predictor datasets were merged in advance of drug response prediction, and responses were predicted simultaneously sharing information among drugs.	0.543(0.567)	1.0 × 10⁻²	exnmrc OI
3	Training and prediction were done using principal component regression for individual drugs.	0.535(0.554)	3.1 × 10⁻²	exnmrc
4	Statistically significant features were selected using correlation, models were fit using principal component regression, final predictions were made using a weighted average of models.	0.524(0.538)	9.2 × 10⁻²	en
Ensemble/model selection
1	Features were selected using correlation, dimensionality was reduced using principal component analysis, lasso and ridge methods, several regression models were trained for individual drugs and the top cross-validated model was selected to make final predictions for each drug.	0.562(0.597)	5.5 × 10⁻⁴	exnmrc
2	Features were selected based on outside information, missing values were imputed, predictions were made by aggregating results from an ensemble of machine-learning methods.	0.556(0.587)	1.6 × 10⁻³	exnmrc
3	Features were selected using Spearman’s rank correlation, missing values were imputed, predictions were made using the best-performing method (determined by cross-validation on the training set) among an ensemble of methods (random forest, support vector machine and linear regression).	0.554(0.583)	2.6 × 10⁻³	exnmrc
4	Gene and pathway features were compiled using outside data, an ensemble of prediction models was trained, final predictions were based on a rank-aggregation of combined prediction models.	0.517(0.527)	1.7 × 10⁻¹	exnmrc OI
5	Features were selected using outside pathway and interaction data, missing values were imputed, individual drug predictions were made using the best model selected from an ensemble of methods.	0.506(0.509)	3.7 × 10⁻¹	e OI
Other
1	Features were weighted based on Pearson’s correlation to drug response, predictions were made using the correlation of the weighted features.	0.570(0.608)	2.9 × 10⁻⁴	enr
2	Gene features strongly associated with survival in the METABRIC dataset were selected, then hierarchically clustered, a linear model was built to fit gene clusters to drug response, predictions were made using a regression model.	0.553(0.582)	2.6 × 10⁻³	e OI
3	Missing features were imputed, signatures were extracted for each dataset, predictions were made using 1-nearest-neighbor to training cell lines via Pearson’s correlation between signatures for each data type, final predictions were the weighted sum of the individual datasets.	0.553(0.581)	2.7 × 10⁻³	exnmrc
4	Features were selected using dataset-specific criteria, missing values were imputed, predictions were made using KNN.	0.531(0.549)	4.7 × 10⁻²	exnmrc
5	Features were filtered using dataset-specific criteria, an ensemble of Cox regression models was constructed using random sampling from top-performing features, the final prediction was the average of all models.	0.528(0.543)	6.5 × 10⁻²	nmc
6	Features were selected using the concordance index, predictions were made using an integrated voting strategy based on each feature’s ability to predict the order of pairs of cell lines.	0.521(0.532)	1.3 × 10⁻¹	enmrc

The 44 team submissions were categorized according to their underlying methodology. The indexing scheme is used in Figures 2 and 5. Team scores (wpc-index) were rescaled so that the gold-standard ranking scores 1 and the inverse ranking scores 0. Teams leveraged different genomic datasets, coded as (e) gene expression, (x) exome sequencing, (n) RNA-seq, (m) methylation, (r) RPPA and (c) copy number variation. The use of outside information, often in the form of biological pathway annotation, was found to improve average team rank and is noted in the Data column as ‘OI’. Additional method characterizations can be found in Supplemental Table 1.
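
The Data-column coding and the score rescaling described in this note can be sketched in Python. This is a minimal illustration, not code from the study: the names `DATA_CODES`, `decode_data_column` and `rescale_wpc` are hypothetical, and the linear form of the rescaling is an assumption that is merely consistent with mapping the gold-standard ranking to 1 and its inverse to 0. The raw wpc-index values of those two reference rankings are not listed in the table, so they appear here as parameters.

```python
# Dataset codes as given in the table note.
DATA_CODES = {
    "e": "gene expression",
    "x": "exome sequencing",
    "n": "RNA-seq",
    "m": "methylation",
    "r": "RPPA",
    "c": "copy number variation",
}


def decode_data_column(entry):
    """Split a Data-column entry such as 'enrc OI' into dataset names and an OI flag."""
    tokens = entry.split()
    uses_outside_info = "OI" in tokens
    codes = "".join(token for token in tokens if token != "OI")
    unknown = [ch for ch in codes if ch not in DATA_CODES]
    if unknown:
        raise ValueError(f"unrecognized dataset code(s): {unknown}")
    return [DATA_CODES[ch] for ch in codes], uses_outside_info


def rescale_wpc(score, gold_score, inverse_score):
    """Map a raw score linearly so that gold_score -> 1 and inverse_score -> 0.

    gold_score and inverse_score are hypothetical inputs: the raw wpc-index of
    the gold-standard ranking and of its reverse. The linear form is assumed.
    """
    if gold_score == inverse_score:
        raise ValueError("gold and inverse scores must differ")
    return (score - inverse_score) / (gold_score - inverse_score)


# Example: decode the Data entry of a team that used four datasets plus outside information.
datasets, outside_info = decode_data_column("enrc OI")
print(datasets, outside_info)
```

Keeping the OI flag separate from the dataset codes makes it straightforward to group teams by either property when comparing ranks.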