2016 Feb 11;5:158. [Version 1] doi: 10.12688/f1000research.7485.1

Figure 1. Summary of the DREAM6 gene expression challenge.


(A) Training data consisted of DNA sequences for 90 yeast RP promoters whose activities were experimentally determined 30, 34. DNA sequences were also provided for a blinded test set of 53 promoters, whose activities had likewise been determined experimentally but were withheld from the challenge participants. (B) Outline of the strategy for modeling promoter activity. Each promoter was segmented into non-overlapping 100 bp windows, with the full promoter treated as an additional window. For each window, DNA sequence features were extracted, and feature selection using a linear regression wrapper was performed prior to machine learning. The performance of machine learning models trained on each window was assessed in 5- and 10-fold cross-validation using Pearson correlation.
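
The windowing and scoring steps in panel B can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`segment_promoter`, `pearson`, `kfold_indices`) are hypothetical, the handling of a trailing window shorter than 100 bp (kept as a shorter window) is an assumption, and the contiguous fold assignment is only one of several reasonable choices. Feature extraction, feature selection, and the learning models themselves are omitted.

```python
from math import sqrt


def segment_promoter(seq, window=100):
    """Split a promoter into non-overlapping windows of `window` bp.

    Assumption: a trailing remainder shorter than `window` is kept as a
    shorter final window. The full promoter is appended as an extra window.
    """
    windows = [seq[i:i + window] for i in range(0, len(seq), window)]
    windows.append(seq)  # the full promoter counts as a separate window
    return windows


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop
```

In use, a model would be fitted on the training indices of each fold for a given window, and `pearson` would compare its predictions on the held-out indices with the measured promoter activities; `kfold_indices(90, 5)` and `kfold_indices(90, 10)` give the two splits over the 90 training promoters.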