a) Among all motifs discovered by TF-MoDISco, 18 motifs display unusually high information content (IC) of >30 bits (green). The expected short motifs are shown in gray. b) Histogram of the overlap of short motifs (gray) and long motifs (green) with repeat elements, showing that long motifs overlap >80% with annotated retrotransposons. c) Long motifs with their PFM, ID, fraction of motif instances overlapping a repeat, and the most frequent (top class) RepeatMasker annotation. Potential motif instances of Oct4-Sox2, Sox2, Nanog and Klf4 within the repeat elements, as indicated by the CWMs, are highlighted. d) To identify a set of representative motifs among the 33 short motifs discovered for different TFs (information content <30 bits, shown in Supplementary Fig. 3) and to remove redundant short motifs, motifs were clustered by similarity using hierarchical clustering. The results were then manually inspected to select clusters that separate known motifs that are distinct (e.g. Oct4-Oct4 resembles the known MORE and PORE motifs bound by Oct4 homodimers, which differ from the motif bound by monomeric Oct4). Among very similar motifs within a cluster, we then selected the most abundant motif discovered for the most relevant TF (if known). The 11 representative motifs that we selected are shown on the left. Non-canonical motifs were given a name (Nanog-alt for Nanog alternative, Klf4-long for longer Klf4).
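The 30-bit IC cutoff separating long (green) from short (gray) motifs in panel a) can be illustrated with a minimal sketch. It assumes a uniform background nucleotide distribution and a small pseudocount; the helper names `information_content` and `is_long_motif` are hypothetical, and the exact IC computation used in the TF-MoDISco pipeline may differ.

```python
import numpy as np

# Cutoff from the caption: motifs with total IC > 30 bits are called "long".
LONG_MOTIF_IC_BITS = 30.0


def information_content(pfm, pseudocount=1e-6):
    """Total IC (bits) of a PFM of shape (L, 4).

    Assumption: uniform background (0.25 per base), so each position
    contributes log2(4) - H = 2 - H bits, where H is its Shannon entropy.
    """
    p = np.asarray(pfm, dtype=float) + pseudocount  # avoid log2(0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize each position to sum to 1
    entropy = -(p * np.log2(p)).sum(axis=1)  # per-position entropy in bits
    return float((2.0 - entropy).sum())


def is_long_motif(pfm):
    """Hypothetical helper: True if the motif exceeds the 30-bit cutoff."""
    return information_content(pfm) > LONG_MOTIF_IC_BITS


# Usage: a perfectly conserved 20-bp motif vs. a perfectly conserved 10-bp motif.
long_pfm = np.eye(4)[np.arange(20) % 4]
short_pfm = np.eye(4)[np.arange(10) % 4]
print(is_long_motif(long_pfm), is_long_motif(short_pfm))
```

Because a single position contributes at most 2 bits under a uniform background, a motif must span at least 16 positions to exceed 30 bits. This is consistent with the >30-bit motifs being long and deriving from repeat elements rather than from short TF binding sites.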