Table 1.
Team | Synopsis | wpc-index (scaled) | FDR | Data |
---|---|---|---|---|
Kernel method | ||||
1 | Bayesian multitask multiple kernel learning (MKL; see main text). | 0.583(0.629) | 2.6 × 10⁻⁵ | exnmrc OI |
2 | A predefined number of features was selected using Pearson correlation, training and prediction were done using support vector regression (SVR; radial basis). | 0.559(0.592) | 1.0 × 10⁻³ | enmrc |
3 | Separate normalizations were applied to each dataset, several support vector machine (SVM) classifiers were independently trained (varying kernels and input data), final predictions were made using a weighted average of all SVM outputs. | 0.553(0.582) | 2.7 × 10⁻³ | exnmrc |
4 | Bidirectional search was used to select features, training and prediction were done using an SVM (radial basis). | 0.549(0.575) | 4.8 × 10⁻³ | enmrc |
Nonlinear regression (regression trees) | ||||
1 | Features were randomly selected to build an ensemble of unpruned regression trees for each dataset, missing values were imputed, weights for the models were calculated, final predictions were made using a weighted sum of the individual models. | 0.577(0.620) | 7.2 × 10⁻⁵ | enm |
2 | Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from four individual dataset models (enrc). | 0.569(0.607) | 2.9 × 10⁻⁴ | enrc OI |
3 | Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from five individual dataset models (enmrc). | 0.565(0.601) | 5.1 × 10⁻⁴ | enmrc OI |
4 | Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from five individual dataset models (exnrc). | 0.564(0.599) | 5.1 × 10⁻⁴ | exnrc OI |
5 | Features were filtered based on their correlation to dose-response values, random forests were trained for each dataset, missing values were imputed, final rankings were based on a composite score from individual dataset models (exnmrc). | 0.559(0.591) | 1.0 × 10⁻³ | exnmrc OI |
6 | Gene features were selected using linear regression and maximal information coefficient, pathway information was also used to derive features, training and prediction were done using a random forest model. | 0.551(0.579) | 3.3 × 10⁻³ | exnmrc |
7 | Random forests were constructed in a stacked approach, an ensemble of regression trees was constructed for all drug/dataset pairs, missing values were imputed, predictions were made for individual models and another random forest was used to combine the different predictions for the drugs into a final prediction. | 0.548(0.575) | 5.0 × 10⁻³ | exnmrc |
8 | Features were ranked according to the absolute value of Spearman’s correlation, the average rank of all cell lines was calculated according to the top features. | 0.548(0.574) | 5.0 × 10⁻³ | exnmrc |
9 | Features were selected using Pearson correlation and a combination of bagging and gradient boosting, predictions were made using selected features and a regression tree. | 0.544(0.568) | 1.0 × 10⁻² | exnmrc |
10 | Features were selected using matrix approximation methods leveraging SVD, training and prediction were done using regression tree models with gradient boosting. | 0.538(0.560) | 1.9 × 10⁻² | en |
11 | Features were selected for individual cell lines by constructing random forests and pruning (recursive feature elimination), missing values were imputed, final predictions were made by training a random forest using features from all cell lines. In addition to cell line features, bioactivity spectra of the individual compounds were included as compound features. | 0.524(0.538) | 9.2 × 10⁻² | exnmrc |
Sparse linear regression | ||||
1 | Features were simultaneously selected and a ranking model built for each drug by lasso regression. | 0.564(0.600) | 5.1 × 10⁻⁴ | en |
2 | Features were initially filtered based on linear regression to drug response, training and prediction were done using elastic nets. | 0.564(0.600) | 5.1 × 10⁻⁴ | exnmrc |
3 | Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, drug dose-response values were recalculated from raw growth curves. | 0.564(0.598) | 5.1 × 10⁻⁴ | exnmrc OI |
4 | Missing features were imputed, combinations of datasets were enumerated and used to train elastic net regression models, for each drug, final predictions were made using the best-performing model. | 0.551(0.579) | 3.3 × 10⁻³ | exmrc |
5 | Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, drug dose-response values were recalculated from raw growth curves, Heiser et al. data were used to train the model. | 0.539(0.560) | 1.9 × 10⁻² | exnmrc OI |
6 | Features with low dynamic range were removed, missing feature values were imputed, training and predictions were made using lasso regression on individual datasets, final predictions were made using the weighted sum of regression models. | 0.539(0.560) | 1.9 × 10⁻² | exnmrc |
7 | Statistically significant features were selected using Spearman correlation, training and prediction were done using an elastic net. | 0.532(0.549) | 4.7 × 10⁻² | e |
8 | Features were constructed by grouping genes according to GO terms, training and prediction were done using relaxed lasso regression. | 0.531(0.548) | 4.7 × 10⁻² | en OI |
9 | Gene and pathway features were determined using a one-dimensional factor analysis, training and predictions were made with spike and slab multitask regression, GI50 values were used. | 0.531(0.547) | 4.9 × 10⁻² | exnmrc OI |
10 | Features were selected using a regression with log penalty, which bridges the L0 and L1 penalties, missing values were imputed, penalized regression models were trained on individual datasets, final predictions were made using a weighted average. | 0.531(0.547) | 4.9 × 10⁻² | exnrc |
11 | Features were selected based on elastic nets, missing values were imputed, training and predictions were done using ridge regression. | 0.527(0.543) | 6.7 × 10⁻² | exnmrc |
12 | Features were filtered on dataset-specific criteria, missing values were set to random numbers, training and predictions were made using the interior point method for L1-regularization. | 0.519(0.529) | 1.5 × 10⁻¹ | enmrc |
13 | Features were selected using a Gompertz growth model, predictions were made using a lasso regression model. | 0.517(0.526) | 1.8 × 10⁻¹ | exnmrc |
14 | Putative gene set expression values were calculated from constituent genes, training and predictions were made using linear regression. | 0.485(0.477) | 8.0 × 10⁻¹ | e |
PLS or PC regression | ||||
1 | Lowly expressed and/or low-variance features were removed, features were selected based on correlation to drug response, multiple partial least squares regression models were trained and a consensus was determined for the final prediction. | 0.562(0.597) | 5.5 × 10⁻⁴ | en OI |
2 | Features were selected by using lasso regression and groups of genes predefined by core signaling pathways, predictions were made by linear regression of the reduced feature set to drug response, predictor datasets were merged in advance of drug response prediction, and responses were predicted simultaneously sharing information among drugs. | 0.543(0.567) | 1.0 × 10⁻² | exnmrc OI |
3 | Training and prediction were done using principal component regression for individual drugs. | 0.535(0.554) | 3.1 × 10⁻² | exnmrc |
4 | Statistically significant features were selected using correlation, models were fit using principal component regression, final predictions were made using a weighted average of models. | 0.524(0.538) | 9.2 × 10⁻² | en |
Ensemble/model selection | ||||
1 | Features were selected using correlation, dimensionality was reduced using principal component analysis, lasso and ridge methods, several regression models were trained for individual drugs and the top cross-validated model was selected to make final predictions for each drug. | 0.562(0.597) | 5.5 × 10⁻⁴ | exnmrc |
2 | Features were selected based on outside information, missing values were imputed, predictions were made by aggregating results from an ensemble of machine-learning methods. | 0.556(0.587) | 1.6 × 10⁻³ | exnmrc |
3 | Features were selected using Spearman’s rank correlation, missing values were imputed, predictions were made using the best-performing method (determined by cross-validation on the training set) among an ensemble of methods (random forest, support vector machine and linear regression). | 0.554(0.583) | 2.6 × 10⁻³ | exnmrc |
4 | Gene and pathway features were compiled using outside data, an ensemble of prediction models was trained, final predictions were based on a rank-aggregation of combined prediction models. | 0.517(0.527) | 1.7 × 10⁻¹ | exnmrc OI |
5 | Features were selected using outside pathway and interaction data, missing values were imputed, individual drug predictions were made using the best model selected from an ensemble of methods. | 0.506(0.509) | 3.7 × 10⁻¹ | e OI |
Other | ||||
1 | Features were weighted based on Pearson’s correlation to drug response, predictions were made using the correlation of the weighted features. | 0.570(0.608) | 2.9 × 10⁻⁴ | enr |
2 | Gene features strongly associated with survival in the METABRIC dataset were selected, then hierarchically clustered, a linear model was built to fit gene clusters to drug response, predictions were made using a regression model. | 0.553(0.582) | 2.6 × 10⁻³ | e OI |
3 | Missing features were imputed, signatures were extracted for each dataset, predictions were made using 1-nearest-neighbor to training cell lines via Pearson’s correlation between signatures for each data type, final predictions were the weighted sum of the individual datasets. | 0.553(0.581) | 2.7 × 10⁻³ | exnmrc |
4 | Features were selected using dataset-specific criteria, missing values were imputed, predictions were made using k-nearest neighbors (KNN). | 0.531(0.549) | 4.7 × 10⁻² | exnmrc |
5 | Features were filtered using dataset-specific criteria, an ensemble of Cox regression models was constructed using random sampling from top-performing features, the final prediction was the average of all models. | 0.528(0.543) | 6.5 × 10⁻² | nmc |
6 | Features were selected using the concordance index, predictions were made using an integrated voting strategy based on each feature’s ability to predict the order of pairs of cell lines. | 0.521(0.532) | 1.3 × 10⁻¹ | enmrc |
The 44 team submissions were categorized according to their underlying methodology. The indexing scheme is used in Figures 2 and 5. Team scores (wpc-index) were rescaled so that the gold-standard ranking scores 1 and the inverse ranking scores 0; the rescaled value is shown in parentheses. Teams leveraged different genomic datasets, coded as (e) gene expression, (x) exome sequencing, (n) RNA-seq, (m) methylation, (r) RPPA and (c) copy number variation. The use of outside information, often in the form of biological pathway annotation, was found to be a factor that improved average team rank and is noted in the Data column as ‘OI’. Additional method characterizations can be found in Supplemental Table 1.
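
As a reading aid, the Data column codes and the rescaling described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the challenge organizers' code. The function names `decode_data_column` and `rescale` are hypothetical, and the rescaling is assumed to be a linear map between the inverse and gold-standard scores. The raw wpc-index values of the gold-standard and inverse rankings are not given in the table, so the numbers used in the usage note are placeholders chosen only to be roughly consistent with the tabulated pairs.

```python
# Dataset codes exactly as listed in the table caption.
DATASET_CODES = {
    "e": "gene expression",
    "x": "exome sequencing",
    "n": "RNA-seq",
    "m": "methylation",
    "r": "RPPA",
    "c": "copy number variation",
}


def decode_data_column(cell):
    """Split a Data cell such as 'enrc OI' into dataset names and an OI flag."""
    parts = cell.split()
    uses_outside_info = "OI" in parts
    codes = "".join(p for p in parts if p != "OI")
    return [DATASET_CODES[c] for c in codes], uses_outside_info


def rescale(score, gold, inverse):
    """Assumed linear rescaling: the inverse ranking maps to 0, the gold standard to 1."""
    return (score - inverse) / (gold - inverse)


if __name__ == "__main__":
    print(decode_data_column("enrc OI"))
    # Placeholder endpoints (assumption, not reported in the table).
    print(round(rescale(0.583, gold=0.822, inverse=0.178), 3))
```

With the placeholder endpoints 0.822 and 0.178, a raw score of 0.583 rescales to about 0.629 and 0.485 to about 0.477, which matches the tabulated pairs to within rounding; the true endpoints should be taken from the main text or supplement.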