2019 Jun 25;8:e43966. doi: 10.7554/eLife.43966

Figure 7. Components show shared cis-regulatory features.

(A) Motifs discovered from the promoter sequences of genes with high component loadings. In each motif logo pair, the lower logo shows the de novo inferred motif and the upper logo shows the motif in the HOCOMOCO database that best matches the de novo motif. Orange ‘T’ indicates that this transcription factor is most highly expressed in testis in the GTEx database (a half ‘T’ indicates second highest). Green ‘I’ indicates that a mouse knockout of this gene is infertile. Blue ‘L’ indicates that a mouse knockout of this gene is embryonic lethal. Red ‘M’ indicates that this gene is required for macrophage development. The notation ‘Crem-t (Atf1)’ indicates that we suspect the true transcription factor recognizing the motif is not the closest-matching database motif (Atf1). *The (upper) STRA8 motif shown is from Kojima et al., rather than the HOCOMOCO database. (B) Association of gene loadings with the probability that each de novo identified motif is found in the genes for each component. Coloring is a Z-score from a correlation test between gene loadings and motif probabilities, where red (blue) indicates positive (negative) association. The germ cell components (rows) are ordered by pseudotime. The correlation was calculated for the positive and negative parts of each component separately; where a component is mainly one-sided, the other side has been omitted, as have the single cell components. The additional column ‘CpG’ shows the same association test for each component, but using the count of promoter CpG dinucleotides. Across the top of the panel, color bars indicate the maximum probability of there being a CpG at any one position in the de novo motif, and whether that probability is greater than 0.3. See Figure 7—figure supplement 2 for an analogous plot using the HOCOMOCO motif probabilities.
We find high and specific enrichment of ChIP-seq targets of STRA8, MYBL1, RFX2 and CREM in the gene loadings of components associated with those motifs, validating our interpretation that covariation of expression of genes within many components reflects shared transcriptional regulation (Figure 7—figure supplement 1).

Figure 7—figure supplement 1. Validation of Motif Inference Using ChIP-seq Data.

As part of the validation that some SDA components represent biological co-expression, we tested for enrichment of ChIP-seq-defined target genes (Materials and methods) of the well-known meiotic transcription factors (A) STRA8, (B) MYBL1, (C) RFX2, and (D) CREM. Enrichment was calculated using Fisher’s exact test against the top 500 genes in each component (positive and negative loadings separately). OR = odds ratio. The red line represents the Bonferroni-corrected threshold of p = 0.05.
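
The enrichment test described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`top_genes_by_loading`, `fisher_enrichment`, `bonferroni_threshold`), the one-sided (over-representation) alternative, and the choice of background gene set are all hypothetical and are not specified in the legend.

```python
from math import comb


def top_genes_by_loading(loadings, n=500, positive=True):
    """Return the n genes with the largest (positive=True) or most
    negative (positive=False) loadings. `loadings` maps gene -> loading."""
    ranked = sorted(loadings, key=loadings.get, reverse=positive)
    return ranked[:n]


def fisher_enrichment(top_genes, target_genes, background_genes):
    """One-sided Fisher's exact test for over-representation of ChIP-seq
    targets among the top genes. Returns (odds_ratio, p_value).
    Assumption: genes outside `background_genes` are ignored."""
    background = set(background_genes)
    top = set(top_genes) & background
    targets = set(target_genes) & background

    a = len(top & targets)             # top genes that are targets
    b = len(top) - a                   # top genes that are not targets
    c = len(targets) - a               # targets outside the top genes
    d = len(background) - a - b - c    # neither top nor target

    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")

    # Upper tail of the hypergeometric distribution: P(X >= a).
    total, n_targets, n_top = len(background), len(targets), len(top)
    tail = sum(
        comb(n_targets, k) * comb(total - n_targets, n_top - k)
        for k in range(a, min(n_targets, n_top) + 1)
    )
    return odds_ratio, tail / comb(total, n_top)


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests
```

Each component would be tested twice, once with `positive=True` and once with `positive=False`, and a result would fall beyond the red line when its p-value is below `bonferroni_threshold(0.05, n_tests)`, where `n_tests` is the number of tests performed.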

Figure 7—figure supplement 2. Validation of Motif Inference from SDA Component Loadings.

As described in Figure 7, we performed de novo identification of transcription factor binding sites enriched in the cis-regulatory regions of genes loading highly on each SDA component. Here we validate these motifs by quantifying the association of SDA component gene loadings with the probability that each motif is found near the promoters of the corresponding component genes. The coloring of the heatmap is the test statistic (t-value) from a correlation test between gene loadings and motif probabilities. The t-value follows a t distribution under the null hypothesis of no correlation. Motif probabilities are calculated using the MotifFinder package and are based on the HOCOMOCO-matched motifs. For probabilities based on the de novo motifs, see Figure 7 in the main text. The components (rows) are ordered by pseudotime.
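
The t-value plotted in the heatmap can be illustrated with a short sketch. It assumes a Pearson correlation, which the legend does not state explicitly, and the function names `pearson_r` and `correlation_t` are hypothetical. For a correlation r over n genes, the statistic is t = r * sqrt((n - 2) / (1 - r^2)), which follows a t distribution with n - 2 degrees of freedom under the null hypothesis of no correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(loadings, motif_probs):
    """t statistic for testing zero correlation between gene loadings and
    per-gene motif probabilities (assumed Pearson; n - 2 degrees of freedom)."""
    n = len(loadings)
    r = pearson_r(loadings, motif_probs)
    if abs(r) >= 1.0:
        # Perfect correlation: the statistic is unbounded.
        return float("inf") if r > 0 else float("-inf")
    return r * sqrt((n - 2) / (1 - r * r))
```

A positive t-value corresponds to genes with higher loadings being more likely to carry the motif, and a negative t-value to the reverse.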