Figure 3.
Motif composition of Twist ChIP-seq regions shows preferential concentration of specific E-boxes near summits. (A) Locations of CAYRTG (CACATG, CATATG, and CACGTG) E-box instances within ±250 bp of the ChIP-seq peak (ERANGE-shifted called signal summit; see Methods) (y axis), plotted as a function of signal intensity rank from highest (1) to lowest (2000) (x axis). The 1099 MC ChIP-seq data set is shown with a dashed line. CACATG is the most prevalent E-box motif in Twist ChIP regions and shows the strongest central concentration. (B) Direct (top panel) and cumulative (bottom panel) motif density plots. In the MC data set, 65% of CACATG motifs and 50% of CAGATG motifs occur within ±50 bp of Twist peaks. (C) CAGATG occurs more frequently in Twist ChIP-seq regions and is more centrally localized than CATATG (D). (D) CATATG is the motif most prominent in SELEX data (see text). (E) Other E-boxes (defined here as CANNTG motifs where NN is not CA, GA, or TA) display a more uniform distribution (B,E), although the other CABVTG E-boxes not pictured here (NN = CG, GC, or CC) provide a minor central enrichment (see Supplemental Fig. 8). The number and distribution of explanatory E-boxes change with ChIP-seq signal strength, suggesting that more E-boxes create a more robust Twist ChIP signal (A; Supplemental Fig. 7).
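The quantities plotted in this figure (motif positions relative to the summit, and the fraction of instances within ±50 bp) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `reverse_complement`, `motif_offsets`, and `fraction_within` are hypothetical, the summit is assumed to be given as an index into a sequence window, and the motif position is taken as the approximate center of the 6-mer.

```python
import re

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_offsets(window, motif, summit_index):
    """Offsets (bp) of motif centers relative to the summit, both strands.

    `window` is the sequence around the peak (e.g. summit +/- 250 bp) and
    `summit_index` is the 0-based position of the summit within it
    (both are assumptions of this sketch).
    """
    # A palindromic motif such as CATATG equals its reverse complement,
    # so the set avoids counting it twice.
    patterns = {motif, reverse_complement(motif)}
    offsets = set()
    for pattern in patterns:
        # Zero-width lookahead reports overlapping instances as well.
        for match in re.finditer("(?=" + pattern + ")", window):
            center = match.start() + len(pattern) // 2
            offsets.add(center - summit_index)
    return sorted(offsets)


def fraction_within(offsets, radius=50):
    """Fraction of motif instances lying within +/- radius bp of the summit."""
    if not offsets:
        return 0.0
    return sum(1 for o in offsets if abs(o) <= radius) / len(offsets)
```

Applied to every region in a data set and pooled per motif, `fraction_within` gives the kind of summary quoted for panel B, and the sorted offsets per region correspond to the points drawn in panel A.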