2023 Jun 27;24:154. doi: 10.1186/s13059-023-02985-y

Fig. 3.

De novo motif discovery with ExplaiNN. A Average performance (AUPRC; y-axis) by rank (x-axis) of PWMs derived by training ExplaiNN models with 100 units on in vivo datasets of 163 TFs and then visualizing the filter of each unit (i.e., 100 PWMs per TF). The rank of each PWM is given by the importance of its unit. The gray dots indicate the rank of the best-performing PWM for each TF. B Pairwise comparison of the individual performances (AUPRC) of the best PWMs derived for each TF from the previous dataset using ExplaiNN (y-axis) or STREME [34] (x-axis). C Performance difference (i.e., ΔAUPRC) of the previous PWMs (y-axis) plotted with respect to the dataset size of the corresponding TF (x-axis). D Execution time (in seconds; y-axis) of the de novo motif discovery application of ExplaiNN (green dots) and STREME (gray dots) plotted with respect to the dataset size of the corresponding TF (x-axis). E Logos derived using ExplaiNN or STREME for the nuclear receptors AR, NR2F2, and VDR from the previous dataset. For comparison, the JASPAR [18] logos for these TF profiles are shown: MA0007.3 (AR), MA1111.1 (NR2F2), and MA0693.1 (VDR). F Performances (AUPRC; y-axis) of PWMs derived from different experimental assay datasets related to the TF GATA3 by different methods (x-axis), including ExplaiNN (green bars) and four assay-specific methods [34, 38–40] (gray bars). G GATA3 logos derived from the dataset of each experimental assay using ExplaiNN or the assay-specific method. The JASPAR logo for this TF profile (MA0037.4), derived by applying RSAT [35] on the mouse Gata3 ChIP-seq data from ReMap [41], is shown at the top. AUPRC, area under the precision-recall curve; HMM, hidden Markov model; PBM, protein binding microarray; PWM, position weight matrix; S&W, Seed-And-Wobble; TF, transcription factor
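The AUPRC values reported for individual PWMs in this figure can be illustrated with a minimal sketch. It is not the authors' evaluation pipeline: the function names (`pwm_from_counts`, `best_window_score`, `average_precision`), the pseudocount, the uniform background, the forward-strand-only scan, and the toy GATA-like counts are all assumptions made for illustration. Each sequence is scored by its best-matching PWM window, and the AUPRC is approximated step-wise as average precision.

```python
import math

BASES = "ACGT"  # column order assumed for every PWM row


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log-odds scores (one row per position)."""
    pwm = []
    for column in counts:
        total = sum(column) + 4 * pseudocount
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in column]
        )
    return pwm


def best_window_score(pwm, seq):
    """Score a sequence by its best-matching window (forward strand only).

    Raises ValueError on characters outside ACGT; returns -inf if the
    sequence is shorter than the PWM.
    """
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        best = max(best, score)
    return best


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are ranked in input order, so ties make the result
    order-dependent.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / positives


# Toy example (hypothetical data): a GATA-like motif and four sequences.
toy_counts = [[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]]
toy_pwm = pwm_from_counts(toy_counts)
toy_seqs = ["CCGATACC", "TTGATATT", "CCCCCCCC", "ACGTACGT"]
toy_labels = [1, 1, 0, 0]  # 1 = bound, 0 = unbound
toy_scores = [best_window_score(toy_pwm, s) for s in toy_seqs]
toy_auprc = average_precision(toy_labels, toy_scores)
```

Scanning only the forward strand keeps the sketch short; a realistic evaluation would typically also scan the reverse complement and use held-out sequences.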