Author manuscript; available in PMC: 2018 Sep 1.
Published in final edited form as: Hum Mutat. 2017 Mar 9;38(9):1240–1250. doi: 10.1002/humu.23197

Table 1.

Summary of submissions. Methods and features used by each group for the two parts of the challenge. Features are divided into four classes: 1) experimentally measured epigenetic properties, 2) predicted epigenetic properties, 3) other locus-specific properties, and 4) DNA k-mer frequencies. For the methods, R denotes the regression tasks (predicting Log2FC in part I or LogSkew in part II) and C denotes the classification tasks (predicting regulatory hits in part I or emVar hits in part II).

Columns: Group; Features (feature classes 1–4); Methods, part I (R: regression; C: classification); Methods, part II (R: regression; C: classification).

Group 1
Features: Histone modifications in K562 cells (Consortium, 2012) (class 1); evolutionary conservation (Siepel, et al., 2005) (class 3); k-mer frequencies (class 4).
Methods (part I): Regularized regression (e.g., elastic net (Zou and Hastie, 2005); R, C), random forest (R, C), SVR (R), SVM (C).
Methods (part II): Same as part I.

Group 2
Features: Histone modifications, DHS, and TFBS in LCL (Consortium, 2012) (class 1); predictions of DHS in 164 cell lines (Kelley, et al., 2016) (class 2); predictions of TFBS (Cowper-Sal·lari, et al., 2012) and of LCL-specific histone modifications and DHS based on (Consortium, 2012) (class 2).
Methods (part I): Ensemble of gradient boosting models (Pedregosa, et al., 2011), each model trained on a different feature subset (R, C).
Methods (part II): Same as part I.

Group 3
Features: k-mers (class 4).
Methods (part I): Linear SVR (R) and SVM (C).
Methods (part II): Same as part I.

Group 4
Features, part I: Segmentation of genomic regions based on histone modifications in LCL (Consortium, 2012; Ernst and Kellis, 2012) (class 1); predictions of TFBS, DHS, and histone marks (using (Alipanahi, et al., 2015; Zhou and Troyanskaya, 2015), with data from (Consortium, 2012; Romanoski, et al., 2015); class 2).
Features, part II: Allele-specific activity level predicted by the models in part I.
Methods (part I): Ensemble of models, using LASSO or random forest, trained on different feature subsets (R). Ensemble of neural networks, trained on different feature subsets (C).
Methods (part II): Difference between the predicted alleles’ scores (R). Ensemble of classifiers (e.g., KNN; C).

Group 5
Features: Predictions of DHS (using (Ghandi, et al., 2014) with LCL data from (Consortium, 2012); class 2).
Methods (part I): Predicted alleles’ DHS scores are used directly (R, C).
Methods (part II): Difference between DHS scores of the two alleles (Ghandi, et al., 2014) (R, C).

Group 6
Features, part I: Histone modifications, DHS, DNA methylation, and TFBS in LCL (Consortium, 2012) (class 1); predictions of TFBS and of protein binding sites in the transcribed RNA (using (Alipanahi, et al., 2015; Grant, et al., 2011; Hume, et al., 2015), with data from (Alipanahi, et al., 2015; Consortium, 2012); class 2).
Features, part II: All of the features from part I, plus allele-specific activity levels predicted by the models in part I.
Methods (part I): Random forest (R, C). The classifier used the results of the regression task as additional features.
Methods (part II): Random forest (R, C).

Group 7
Features: Predictions of TFBS, DHS, and histone marks (using (Zhou and Troyanskaya, 2015) with data from (Consortium, 2012); class 2). 0/1 indicator of leading variant and eQTL p-value (class 3).
Methods (part I): Random forest.
Methods (part II): Same as part I.
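
Two of the feature types listed in the table, k-mer frequencies (class 4) and the difference between the scores predicted for the two alleles, can be illustrated with a minimal sketch. This is not any group's actual pipeline: the function names, the choice of k, the "alternate minus reference" sign convention, and the GC-fraction scoring function are all assumptions made for illustration, and the toy scorer merely stands in for a trained sequence model.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized frequencies of all 4**k DNA k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing characters other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def allele_difference(ref_seq, alt_seq, score):
    """Score both alleles with the same model; return alt minus ref (assumed sign)."""
    return score(alt_seq) - score(ref_seq)


def toy_score(seq):
    """Hypothetical stand-in for a trained model: GC fraction of the sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


# Illustrative sequences differing by a single base (A -> G at position 5).
ref = "ACGTACGTAC"
alt = "ACGTGCGTAC"

features = kmer_frequencies(ref, k=2)           # 16 dinucleotide frequencies
delta = allele_difference(ref, alt, toy_score)  # allele-specific score difference
```

In practice, the frequency vector would be passed to a regressor or classifier (for example, a linear SVR or SVM), and `toy_score` would be replaced by the prediction of a model trained on regulatory data.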

References

Alipanahi B, Delong A, Weirauch MT, Frey BJ. 2015. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 33(8):831–8.

Consortium EP. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414):57–74.

Cowper-Sal·lari R, Zhang X, Wright JB, Bailey SD, Cole MD, Eeckhoute J, Moore JH, Lupien M. 2012. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat Genet 44(11):1191–8.

Ernst J, Kellis M. 2012. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods 9(3):215–6.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825–2830.

Ghandi M, Lee D, Mohammad-Noori M, Beer MA. 2014. Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput Biol 10(7):e1003711.

Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27(7):1017–8.

Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2):301–320.

Hume MA, Barrera LA, Gisselbrecht SS, Bulyk ML. 2015. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 43(Database issue):D117–22.

Kelley DR, Snoek J, Rinn J. 2016. Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res.

Romanoski CE, Glass CK, Stunnenberg HG, Wilson L, Almouzni G. 2015. Epigenomics: Roadmap for regulation. Nature 518(7539):314–6.

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S and others. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–50.

Zhou J, Troyanskaya OG. 2015. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12(10):931–4.