2024 Feb 26;12:RP89371. doi: 10.7554/eLife.89371

Figure 1. Study design and identification of enhancer activity.

(A) Sheared DNA from the GM12878 cell line was subjected to enrichment via capture with probes targeting loci selected in reduced representation bisulfite sequencing (RRBS) workflows (MspI targets), CpG sites on the Infinium EPIC array, the gene NR3C1 and flanking regions, and 100,000 randomly distributed control regions. Note that a single 600 bp window can contain multiple target types (Figure 1—figure supplement 1). Library diversity summaries are shown in Figure 1—figure supplement 2. (B) Captured loci were cloned into the mSTARR-seq vector, pmSTARRseq1, treated with either the CpG methylating enzyme M. SssI or a sham treatment, and transfected into K562 cells. Methylation levels post-transfection and rarefaction analyses of sequencing depth of replicate samples are shown in Figure 1—figure supplements 3 and 4. Right panel shows an example of DNA methylation-dependent regulatory activity near the first exon of the TTC32 gene, where the methylation-dependent regulatory element overlaps an active promoter chromatin state (red horizontal bar denotes active promoter as defined by ENCODE: The ENCODE Project Consortium, 2012). (C) mSTARR-seq regulatory activity in the baseline condition is strongly enriched in ENCODE-defined enhancers and some classes of promoters (indicated in blue), and depleted in repressed, repetitive, and heterochromatin states. See Supplementary file 4 for full results of this analysis. Regions with mSTARR-seq regulatory activity detected in this experiment also significantly overlap with regions with regulatory activity in other mSTARR-seq and conventional STARR-seq datasets (Figure 1—figure supplement 5; see also Figure 1—figure supplement 6 for estimates of concordance across technical replicates). (D) Left column shows, under the baseline condition (i.e. unstimulated cells), the proportion of 600 bp windows that exhibited minimal regulatory activity (at least 3 replicate samples produced non-zero RNA-seq reads in either the methylated condition or the unmethylated condition) in the mSTARR-seq assay (pink) versus those with detectable input DNA but no evidence of regulatory activity (blue), for windows containing sites from each target set. Right column shows the proportion of windows with regulatory capacity (i.e. the subset of the windows represented in pink on the left that produce excess RNA relative to the DNA input at FDR <1%) that are also methylation-dependent (dark brown). Within each column, pie charts are scaled by the total numbers of windows represented. See Figure 1—figure supplements 7 and 8 for comparisons to regulatory regions in other datasets. Figure 1—figure supplements 9 and 10 show window-level RNA to DNA ratios. Figure 1—figure supplement 11 shows the relationship between CpG density and methylation-dependent regulatory activity.
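The "minimal regulatory activity" criterion in panel D can be illustrated with a short sketch. It assumes one reading of the caption: a window passes if at least 3 replicates within a single condition (methylated or unmethylated) have non-zero RNA-seq reads. The function name, the window names, and the read counts are hypothetical and are not taken from the study's pipeline.

```python
def has_minimal_activity(meth_counts, unmeth_counts, min_reps=3):
    """Return True if at least `min_reps` replicates have non-zero RNA reads
    in the methylated condition or in the unmethylated condition.
    (Assumed interpretation of the caption; not the authors' code.)"""
    nonzero_meth = sum(1 for c in meth_counts if c > 0)
    nonzero_unmeth = sum(1 for c in unmeth_counts if c > 0)
    return nonzero_meth >= min_reps or nonzero_unmeth >= min_reps


# Hypothetical per-replicate RNA read counts: (methylated, unmethylated).
windows = {
    "win_a": ([0, 0, 5, 0, 1, 0], [2, 0, 7, 3, 0, 0]),
    "win_b": ([0, 0, 0, 0, 1, 0], [0, 0, 1, 1, 0, 0]),
    "win_c": ([4, 2, 9, 0, 0, 0], [0, 0, 0, 0, 0, 0]),
}

# Keep only windows that pass the filter in either condition.
kept = sorted(name for name, (m, u) in windows.items()
              if has_minimal_activity(m, u))
print(kept)
```

Windows that pass this filter correspond to the pink fraction of the left-hand pie charts; testing for excess RNA relative to DNA input at FDR <1% is a separate, later step.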

Figure 1—figure supplement 1. Overlap of target genomic regions with each other.

Upset plot showing the degree to which 600 bp non-overlapping genomic windows are shared between the four target genomic regions (EPIC CpGs, MspI CpG cut sites, the NR3C1 region, or control sites). Overlap occurs because a single 600 bp genomic window can simultaneously include EPIC CpGs, MspI CpG cut sites, the NR3C1 region, and/or control sites. This plot includes 722,472 unique windows, reflecting the set of windows containing at least 1 bp of sequence in the target loci.
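To make the source of this overlap concrete, the sketch below assigns target intervals to non-overlapping 600 bp windows and records which target sets fall in each window. It assumes 0-based, half-open coordinates and windows tiled from position 0; the intervals and the helper name are hypothetical.

```python
from collections import defaultdict

WINDOW = 600  # window size in bp, as in the caption


def windows_for_interval(start, end, size=WINDOW):
    """Indices of the size-bp windows overlapped by the half-open
    interval [start, end). Overlap of a single bp is enough."""
    return range(start // size, (end - 1) // size + 1)


# Hypothetical target intervals: (chrom, start, end, target_set).
targets = [
    ("chr1", 100, 101, "EPIC"),
    ("chr1", 550, 650, "MspI"),
    ("chr1", 1300, 1400, "control"),
]

# Map each (chrom, window index) to the set of target types it contains.
types_by_window = defaultdict(set)
for chrom, start, end, kind in targets:
    for w in windows_for_interval(start, end):
        types_by_window[(chrom, w)].add(kind)

for key in sorted(types_by_window):
    print(key, sorted(types_by_window[key]))
```

Here the first window contains both an EPIC CpG and part of an MspI fragment, which is the kind of shared membership an upset plot tallies.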
Figure 1—figure supplement 2. mSTARR-seq library diversity.

Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library generated in this study (transfected into K562 cells) relative to the library published in Lea et al., 2018 (independently transfected into K562 cells). Each dot represents an experimental replicate (Lea DNA replicates n=12; Lea RNA replicates n=12; current DNA replicates n=35; current RNA replicates n=35). Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5× the interquartile range.
Figure 1—figure supplement 3. Methylation levels of mSTARR-seq DNA, pre- and post-transfection.

(A) Bisulfite sequencing shows that DNA methylation on the mSTARR-seq plasmid is maintained until the end of the experiment (i.e. 48 hr after transfection), with significantly higher methylation levels in the replicates from the methyltransferase reaction relative to the replicates from the sham methyltransferase reaction (mean methylated = 0.885 [n=17], mean unmethylated = 0.066 [n=15]; unpaired t-test: t=–14.66, df=15.124, p=2.39 × 10⁻¹⁰). Each dot represents an experimental replicate. Red dots indicate post-transfection DNA samples; the single blue dot per condition indicates pre-transfection DNA methylation levels. Methylation estimates are based on the CpG at position 2294, which is located in the plasmid region used for Gibson assembly. We assessed methylation of this CpG, rather than across CpGs genome-wide, because the genomic coverage of our bisulfite sequencing data across replicates was too variable to perform reliable site-by-site analysis of DNA methylation levels before and after the 48 hr experiment. One sample from the dex sham reaction, L31395, shows an unexpectedly high level of methylation, which appears to be due to an error during generation of the bisulfite sequencing library (e.g. mislabeled tube or poor bisulfite conversion), and not the experimental replicate of cells itself, as the mSTARR-seq RNA library (L31244) from the same replicate clusters with the unmethylated sham replicates as expected (panel B). (B) The first two principal components summarizing overall counts of mSTARR-seq reads for the dex-treated RNA samples (i.e. the raw readout of overall regulatory activity). Each dot represents an experimental replicate, with red and black indicating sham and methylated replicates, respectively. Overall regulatory activity of sample L31244 (indicated by arrow) clusters with the sham replicates as expected, suggesting that this replicate was indeed transfected with sham-treated mSTARR-seq DNA.
Figure 1—figure supplement 4. Rarefaction curve showing total number of windows formally tested for regulatory activity, as a function of number of reads sequenced per DNA or RNA replicate.

Sequencing reads from the (A) DNA replicates or (B) RNA replicates of the baseline dataset were rarefied to the values shown on the x-axis before running the data processing steps and applying the filtering criteria described in the Materials and Methods for the full data set. Dashed vertical lines represent the mean number of sequenced reads per DNA replicate (mean [SD]=30.375 million [3.335 million]) or RNA replicate (mean [SD]=31.712 million [8.194 million]) in the full baseline dataset. These analyses show that our sequencing effort saturated the number of formally analyzable windows based on either our criteria for inclusion based on DNA library sequencing depth or RNA library sequencing depth.
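The rarefaction idea can be sketched as follows: subsample reads to a series of depths without replacement, then count how many windows still pass an inclusion threshold at each depth. This is a toy illustration only. The `min_reads` threshold, the simulated library, and the function names are assumptions; the study's actual filtering criteria are those described in its Materials and Methods.

```python
import random
from collections import Counter


def rarefy(read_windows, depth, seed=0):
    """Subsample `depth` reads without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(read_windows, depth)


def n_windows_passing(read_windows, min_reads=1):
    """Count windows covered by at least `min_reads` reads (hypothetical filter)."""
    counts = Counter(read_windows)
    return sum(1 for n in counts.values() if n >= min_reads)


# Toy library: 5000 reads drawn unevenly from 50 windows.
# Each read is represented only by the index of the window it maps to.
sim = random.Random(1)
weights = [1 / (i + 1) for i in range(50)]
reads = sim.choices(range(50), weights=weights, k=5000)

# Number of analyzable windows at each rarefied depth.
curve = [(d, n_windows_passing(rarefy(reads, d))) for d in (10, 100, 1000, 5000)]
for depth, n in curve:
    print(depth, n)
```

A curve that flattens well before the full depth is reached indicates saturation, meaning additional sequencing would add few new analyzable windows.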
Figure 1—figure supplement 5. Overlap of regulatory activity across datasets.

Regulatory regions (in either the unmethylated sham condition, the methylated condition, or both) identified via mSTARR-seq in this study significantly overlap with: K562 regulatory regions (in either the unmethylated sham or methylated condition, or both) from a previously generated mSTARR-seq dataset reanalyzed with our pipeline (Lea et al., 2018) (log₂(OR) [95% CI]=6.212 [6.086, 6.440], p<1.0 × 10⁻³⁰⁰); regulatory regions (in either the unmethylated sham or methylated condition, or both) from an mSTARR-seq experiment in HepG2 liver cells (log₂(OR) [95% CI]=3.534 [3.381, 3.684], p=5.21 × 10⁻³⁰⁷); and regulatory regions from a conventional STARR-seq experiment (i.e. an unmethylated condition) in A549 lung epithelial cells (Johnson et al., 2018) (log₂(OR) [95% CI]=2.451 [2.442, 2.461], p<1.0 × 10⁻³⁰⁰). Bars represent 95% confidence intervals.
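For readers unfamiliar with the log₂(OR) scale, the sketch below computes a log₂ odds ratio and an approximate 95% confidence interval from a 2×2 overlap table. It uses the simple Wald approximation as a stand-in; the caption does not state which test produced the reported values, and their slightly asymmetric intervals suggest a different estimator. The counts are hypothetical.

```python
import math


def log2_odds_ratio(a, b, c, d, z=1.96):
    """Log2 odds ratio with a Wald 95% CI for a 2x2 table:
        a = regulatory in both datasets
        b = regulatory in this dataset only
        c = regulatory in the other dataset only
        d = regulatory in neither
    All four cells must be non-zero for this approximation."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = log_or - z * se, log_or + z * se
    to_log2 = 1 / math.log(2)  # convert natural log to log base 2
    return log_or * to_log2, lo * to_log2, hi * to_log2


# Hypothetical counts of windows: odds ratio = (40 * 80) / (10 * 20) = 16.
est, lo, hi = log2_odds_ratio(40, 10, 20, 80)
print(round(est, 3), round(lo, 3), round(hi, 3))
```

On this scale a value of 0 means no enrichment, and each unit corresponds to a doubling of the odds of overlap.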
Figure 1—figure supplement 6. Correlations between RNA and DNA replicates.

Pearson correlations (r) of raw counts between RNA replicates (A–D) and between DNA replicates (E–H) within the windows we formally analyzed for enhancer activity in the baseline dataset reported here and in Lea et al., 2018, following a uniform data processing pipeline. All replicate pairs (both RNA and DNA, in both sham and methylated conditions) in the baseline dataset show correlations ≥0.89, demonstrating replicate reproducibility comparable to other STARR-seq studies (e.g. Klein et al., 2020). For RNA libraries, replicates in the baseline dataset are more correlated than in the Lea et al., 2018 dataset (RNA replicates: baseline mean r=0.926; Lea et al., 2018 mean r=0.347), although DNA replicates show similar inter-replicate consistency (baseline mean r=0.992; Lea et al., 2018 mean r=0.991).
Figure 1—figure supplement 7. Library diversity and regions of regulatory activity in HepG2 cells.

(A) Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library published in Lea et al., 2018 (transfected into K562 cells) versus the same library transfected into HepG2 cells in this study. Each dot represents an experimental replicate (n=12 replicates for each box). Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5× the interquartile range. (B) mSTARR-seq regulatory activity in HepG2 cells is strongly enriched in ENCODE-defined enhancers (indicated in blue) and some classes of promoters, and depleted in repressed and repetitive states.
Figure 1—figure supplement 8. Methylation-dependent regulatory activity across datasets.

Effects of methylation on regulatory activity estimated in this study in the baseline dataset are consistent with methylation effects in K562 cells estimated from a previously generated mSTARR-seq dataset (Lea et al., 2018) and with methylation effects estimated in HepG2 liver cells (Lea et al., 2018: Pearson’s r=0.534 for 1250 windows with FDR <1% in both data sets, R²=0.286, p=3.19 × 10⁻⁹³; HepG2: Pearson’s r=0.526 for 511 windows with FDR <1% in both data sets, R²=0.277, p=8.87 × 10⁻³⁸). Each dot represents a 600 bp regulatory window identified, in either the sham or methylated states, in both datasets (FDR <1%; note that not all regulatory windows show significant methylation dependence [MD]). Dashed lines are the best fit lines. In all cases, negative effect sizes correspond to reduced activity in the methylated condition and positive effect sizes correspond to increased activity in the methylated condition.
Figure 1—figure supplement 9. RNA to DNA ratios in the methylated and unmethylated replicates in the baseline dataset.

Mean RNA (in counts per million) to DNA ratios for methylated replicates (x-axis) versus unmethylated replicates (y-axis). A constant of 0.5 was added to the initial raw counts to ensure no denominator values were 0. Each dot represents a 600 bp window that was formally tested for enhancer activity (A), exhibited significant regulatory activity (B), or exhibited significant methylation-dependent regulatory activity (C) in the baseline dataset. Solid diagonal lines represent y=x, and dashed lines represent the best fit lines. As expected, 600 bp windows tend to show higher RNA to DNA ratios in the unmethylated condition relative to the methylated condition.
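One way to read this ratio calculation is sketched below: add 0.5 to every raw count, convert each replicate to counts per million, average across replicates, and divide mean RNA by mean DNA per window. The order of these steps, whether DNA is also expressed in counts per million, and all names and counts are assumptions made for illustration.

```python
def cpm(counts, pseudo=0.5):
    """Add a pseudocount to raw counts, then scale to counts per million."""
    adjusted = [c + pseudo for c in counts]
    total = sum(adjusted)
    return [x / total * 1e6 for x in adjusted]


def mean_per_window(replicates):
    """Average CPM across replicates for each window."""
    scaled = [cpm(rep) for rep in replicates]
    n_windows = len(replicates[0])
    return [sum(rep[i] for rep in scaled) / len(scaled) for i in range(n_windows)]


def rna_to_dna_ratio(rna_reps, dna_reps):
    """Per-window ratio of mean RNA CPM to mean DNA CPM."""
    mean_rna = mean_per_window(rna_reps)
    mean_dna = mean_per_window(dna_reps)
    return [r / d for r, d in zip(mean_rna, mean_dna)]


# Hypothetical raw counts: two replicates, three windows each.
rna = [[30, 0, 12], [28, 1, 10]]
dna = [[10, 0, 11], [12, 0, 9]]
ratios = rna_to_dna_ratio(rna, dna)
print([round(x, 3) for x in ratios])
```

The second window has zero DNA reads in both replicates; the 0.5 pseudocount keeps its denominator positive, so the ratio stays defined.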
Figure 1—figure supplement 10. Histograms of RNA to DNA ratios in the baseline dataset.

The x-axis represents the log₂(mean RNA [in counts per million] to DNA ratios) for the baseline dataset. A constant of 0.5 was added to the initial raw counts to prevent 0 from being in the denominator. Lower values on the x-axis indicate windows showing lower regulatory activity. N=3721 regulatory windows; N=1768 MD regulatory windows.
Figure 1—figure supplement 11. Relationship between CpG density and methylation-dependent regulatory activity.

CpG-dense mSTARR-seq regulatory regions are more likely to be repressed by DNA methylation (positive y-axis value; Spearman’s rho=0.370, p=9.865 × 10⁻¹²¹; n=3,721 regions with mSTARR-seq regulatory activity). Each dot represents a 600 bp window that showed significant regulatory activity (FDR <1%). Red and blue dots represent regulatory windows where methylation-dependent activity was or was not detected, respectively. The dashed line represents the best fit line.