2021 Jun 29;12(7):591. doi: 10.3390/insects12070591

Figure 2.

Supervised motif-blind CRM discovery (SCRMshaw). (a) SCRMshaw uses a training set of known Drosophila melanogaster CRMs (“training sequences”), drawn from REDfly and grouped by a shared functional characterization, together with a 10-fold larger background set of similarly sized non-CRM sequences (“background sequences”). (b) The count distributions of short DNA subsequences (k-mers) in these two sets are then used to train a statistical model. (c) The trained model is used to score overlapping windows in the “target genome”; to date, we have successfully applied it to multiple species from the Holometabola and to several Hemiptera. (d) High-scoring regions are predicted to be functional regulatory sequences (asterisks). Figure adapted from [69]. Insect images were downloaded from TheNounProject.com (accessed on 28 April 2021), from the set “Bugs” by Georgiana Ionescu, under the CC-BY license.
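The workflow in panels (a) to (d) can be illustrated with a minimal Python sketch. It is not SCRMshaw's implementation: the caption does not specify the statistical model, so the sketch assumes a simple per-k-mer log-odds score (training versus background frequencies, with a pseudocount). The function names `kmer_counts`, `train_log_odds`, and `score_windows`, and the default values for `k`, `window`, and `step`, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def train_log_odds(training, background, k=3, pseudocount=1.0):
    """Return {k-mer: log-odds of training vs. background frequency}.

    Assumption: a plain log-odds model stands in for the unspecified
    statistical model of panel (b).
    """
    fg = kmer_counts(training, k)
    bg = kmer_counts(background, k)
    vocab = set(fg) | set(bg)
    # Pseudocounts keep k-mers seen in only one set from giving log(0).
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: log((fg[kmer] + pseudocount) / fg_total)
        - log((bg[kmer] + pseudocount) / bg_total)
        for kmer in vocab
    }


def score_windows(genome, model, k=3, window=20, step=5):
    """Score overlapping windows of the target sequence (panel c).

    Returns a list of (start, score); k-mers absent from the model
    contribute 0 to the window score.
    """
    results = []
    for start in range(0, len(genome) - window + 1, step):
        w = genome[start:start + window].upper()
        score = sum(model.get(w[i:i + k], 0.0) for i in range(len(w) - k + 1))
        results.append((start, score))
    return results
```

With a trained model, the highest-scoring windows returned by `score_windows` play the role of the predicted regions marked by asterisks in panel (d); in practice a score threshold or a top-N cutoff would be applied, which this sketch leaves to the caller.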