Abstract
Generation of functional transcripts requires transcriptional initiation at regular start sites, avoiding production of aberrant and potentially hazardous RNAs. The mechanisms maintaining transcriptional fidelity and the impact of spurious transcripts on cellular physiology and organ function have not been fully elucidated. Here we show that TET3, which successively oxidizes 5-methylcytosine to 5-hydroxymethylcytosine (5hmC) and other derivatives, prevents aberrant intragenic entry of RNA polymerase II pSer5 into highly expressed genes of airway smooth muscle cells, assuring faithful transcriptional initiation at canonical start sites. Loss of TET3-dependent 5hmC production in SMCs results in accumulation of spurious transcripts, which stimulate the endosomal nucleic-acid-sensing TLR7/8 signaling pathway, thereby provoking massive inflammation and airway remodeling resembling human bronchial asthma. Furthermore, we found that 5hmC levels are substantially lower in human asthma airways compared with control samples. Suppression of spurious transcription might be important to prevent chronic inflammation in asthma.
Subject terms: Respiratory tract diseases, Epigenetics, Inflammatory diseases
Smooth muscle cell-specific knockout of Tet3 in mice leads to loss of intragenic 5-hydroxymethylcytosine, accumulation of spurious transcripts and TLR7/8-mediated lung inflammation resembling asthma in human lung samples.
Main
DNA methylation on cytosine (5mC) is a crucial mechanism in the epigenetic modulation of cell-type specific transcription required for maintaining cell identity. Methylation of CpG islands located in gene promoters often results in transcriptional repression whereas, paradoxically, intragenic 5mC frequently correlates with transcriptional strength, although strong CpG methylation (at least 90%) slows down elongation1,2. A recent study demonstrated that the presence of intragenic DNA methylation correlates with prevention of spurious transcription within gene bodies, thereby ensuring transcriptional fidelity3 and eliminating the threat of aberrant, potentially hazardous RNAs4–6.
Although 5mC-mediated gene repression is relatively stable, substantial phenotypical or functional changes such as cell differentiation but also pathological processes require demethylation. The biochemistry of demethylation remained enigmatic for decades until the discovery of TET enzymes, which successively convert 5mC to 5hmC, 5-formylcytosine (5fC) and 5-carboxycytosine (5caC)7. Thymine DNA glycosylase (TDG)-mediated excision of 5fC and 5caC coupled with base excision repair (BER) will eventually result in demethylation8. Three family members exist in mammals, TET1–3, which share similar enzymatic activities and cofactor requirements (that is, α-ketoglutarate, oxygen and Fe2+) but differ in expression profiles and target preferences9. For example, expression of mouse Tet3 is low in embryonic stem cells (ESC) but increases substantially in some differentiated cell types10. Generation of 5hmC by TETs has been viewed mostly as a transition state, required for removal of 5mC and subsequent alleviation of gene repression, but the substantial amount of 5hmC in several somatic cell types makes it unlikely that 5hmC exclusively represents a nonfunctional intermediate of demethylation. Further support of this idea comes from the enrichment of 5hmC in gene bodies of highly expressed genes and at active enhancers11. Despite remarkable progress in the field, the true function of 5hmC formation at gene bodies is still incompletely understood, which is caused in part by missing knowledge about the physiological roles of putative 5hmC interactors12.
Changes in DNA methylation contribute to profound and reversible phenotype changes from contractile to synthetic states of smooth muscle cells (SMCs) in response to external cues13,14. SMCs are found not only in the medial layer of muscularized vessels but also in the airways. Phenotype switching of SMCs in airways contributes to diseases of the lung such as asthma and chronic obstructive pulmonary disease (COPD)15–17. Tet2 was reported to act as a master regulator of murine SMC plasticity, since its knockdown in vitro inhibits expression of key procontractile genes and its overexpression elicits SMC gene expression in fibroblasts18. In contrast, the function of Tet3 in SMCs remains unclear.
Here we describe a pivotal role of TET3 in regulating the fidelity of gene transcription that is required for maintaining the identity of SMC and balancing immune responses in the lung. Our study reveals that spurious transcripts in Tet3-deficient mouse SMCs lead to activation of TLR7/8 signaling-dependent innate immune responses and massive lung inflammation, resembling human asthma, offering perspectives to treat various lung diseases.
Results
Loss of Tet3 reduces 5hmC levels in SMCs
To explore the role of 5hmC in the regulation of gene expression in a physiological context, we first searched for cells strongly expressing Tet3, assuming that high expression may be indicative of a decisive function of TET3. To this end, we introduced a LacZ reporter gene cassette into the endogenous Tet3 gene and visualized LacZ expression by 5-bromo-4-chloro-3-indolyl-β-d-galactoside (X-gal) staining. Expression of Tet3-LacZ, which was present in virtually all cells of the developing mouse embryo at E9.5, was enriched in SMCs of adult animals (Fig. 1a and Extended Data Fig. 1a,b). Likewise, we observed increased expression of Tet3 in contractile SMCs derived from mouse embryonic stem cell (mESC-SMCs), while expression of Tet1 and Tet2 was lower, suggesting a dominant function of TET3 in SMCs (Extended Data Fig. 1c–e). To address the function of TET3 in SMCs in vivo, we generated SMC-specific Tet3 knockout mice (Tet3smKO) using an inducible α-SMAERT2Cre strain19 (Extended Data Fig. 1f–h). Tet3smKO mice were viable and fertile but failed to gain body weight 15 weeks after tamoxifen administration (Extended Data Fig. 1i). Phenotyping of Tet3smKO mice, 8 weeks after tamoxifen administration, showed clear morphological changes in the lung, indicated by a shift from a columnar to a cuboidal epithelium, whereas no obvious structural abnormalities were detected in other SMC-containing organs (Fig. 1b and Extended Data Fig. 1j,k). Thus, we decided to focus on the lung for further studies. Introduction of a tdTomato reporter allele into Tet3smKO mice (referred to as Tet3smKO:T) allowed FACS-based isolation of lung SMCs, revealing a profound reduction of 5hmC levels in SMCs after loss of Tet3 (Fig. 1c–f and Extended Data Fig. 1l–p), while global levels of 5fC and 5caC were unchanged. 
Since 5fC and 5caC are effectively removed by TDG and base excision repair, we assume that 5fC/5caC levels do not reflect dynamic changes in the oxidation of 5hmC in Tet3 mutant SMCs but rather represent stable remnants of previous oxidation events, probably acquired during SMC differentiation (Fig. 1e,f). Furthermore, no obvious changes in the global 5mC content in Tet3 mutant SMCs were observed (Fig. 1e,f). Although we cannot exclude that hypermethylation of active genomic regions is levelled out by a paradoxical loss of DNA methylation in heterochromatin20, this finding might indicate that TET3 serves an additional function in SMCs, independent of dynamic 5mC changes required for transcriptional activation. Loss of Tet3 did not lead to compensatory upregulation of TET2, and germline inactivation of Tet2 using two different mouse strains did not change 5hmC levels in SMCs of the aorta and lung (Extended Data Fig. 2a–h,j). Furthermore, we did not detect any obvious morphological abnormalities in SMC-containing organs of Tet2 mutants, although a critical role of Tet2 was reported for maintaining the differentiated state of SMCs in human coronary arteries18 (Extended Data Fig. 2g–k). Notably, Tet2 inactivation in Tet3-deficient SMCs (Tet2/Tet3smKO) neither led to a further decline of 5hmC in bronchial SMCs (BSMCs) nor aggravated the airway remodeling phenotype of Tet3smKO mice. In pulmonary vascular SMCs (VSMCs) of Tet2/Tet3smKO, the 5hmC levels were lower than in Tet3smKO VSMCs, but no serious vascular abnormalities were evident (Extended Data Fig. 2l,m). We conclude that TET3 is the main enzyme for maintaining normal 5hmC levels in BSMCs and that the functions of TET2 and TET3 overlap in pulmonary VSMCs.
5hmC prevents spurious entry of RNA polymerase II
Next, we determined the distribution of 5hmC in lung SMCs by Nano-5hmC-seal (Nano-seal), a nonantibody-based technique. Bioinformatics analysis disclosed genome-wide accumulation of 5hmC in gene bodies with an enrichment at proximal 5′-upstream regulatory regions and a sharp decline at the transcriptional start sites (TSS) (Extended Data Fig. 3a). Highly transcribed genes showed the strongest accumulation of 5hmC within gene bodies, while weakly expressed genes had much lower 5hmC levels (Fig. 2a). Inactivation of Tet3 led to global reduction of 5hmC levels (Fig. 2b and Extended Data Fig. 3a). In contrast, genes expressed at very low levels (bottom 5%) showed no enrichment in gene bodies and were less affected by the loss of Tet3 (Fig. 2a). These data suggest that TET3-mediated generation of intragenic 5hmC depends primarily on transcriptional activity.
Next, we determined the binding profiles of RNA polymerase II (Pol II) phosphorylated at Ser5 (Pol II pSer5) by chromatin immunoprecipitation with sequencing (ChIP–seq) after DRB-induced block of transcriptional elongation (Fig. 2c). Importantly, loss of Tet3 increased binding of Pol II pSer5 to TSS and gene bodies (Extended Data Fig. 3b). Grouping of genes into quartiles (group a–d) based on Pol II pSer5 ChIP–seq data (log2 Tet3smKO:T/Ctrl) yielded an even clearer view. Increased intragenic binding of Pol II pSer5 in Tet3 mutant SMCs was correlated positively with transcriptional activity measured by higher levels of Pol II pSer5 binding at TSSs (Fig. 2d), RNA-seq reads (Fig. 2e) and enhanced 5hmC content within gene bodies of control SMCs (Fig. 2f and Extended Data Fig. 3c). These data indicate that Tet3-mediated formation of 5hmC at gene bodies prevents intragenic entry of Pol II into highly transcribed genes.
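The quartile grouping described above can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the pseudocount, the rank-based split, the convention that group "a" holds the largest log2 (Tet3smKO:T/Ctrl) ratios, and the gene names and signal values are all assumptions made for the example.

```python
import math

def log2_ratio(ko, ctrl, pseudocount=1.0):
    """log2 of knockout over control signal; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((ko + pseudocount) / (ctrl + pseudocount))

def quartile_groups(ratios):
    """Split genes into four equally sized rank groups labelled 'a'..'d'.

    Assumption: group 'a' contains the genes with the largest log2 ratio.
    """
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    n = len(ranked)
    labels = "abcd"
    return {gene: labels[min(3, i * 4 // n)] for i, gene in enumerate(ranked)}

# Hypothetical intragenic Pol II pSer5 signal per gene: (Tet3smKO:T, Ctrl)
signal = {f"gene{i}": (ko, 10.0) for i, ko in enumerate([80, 40, 20, 15, 10, 8, 5, 2])}
ratios = {gene: log2_ratio(ko, ctrl) for gene, (ko, ctrl) in signal.items()}
groups = quartile_groups(ratios)
```

Each group can then be compared for Pol II pSer5 signal at TSSs, RNA-seq reads and 5hmC content in control cells, as in the figure panels cited above.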
To dissect the mechanisms leading to preferential accumulation of 5hmC at gene bodies of highly transcribed genes, we investigated whether TET3 associates with the transcription elongation machinery. Coimmunoprecipitation (Co-IP) revealed that wildtype (WT) but not catalytically inactive TET3 interacts with pan-RNA Pol II and elongating Pol II (Pol II pSer2), which was further verified by in situ proximity ligation assays (PLA) (Fig. 3a,b and Extended Data Fig. 3d). We also found interactions of WT but not catalytically inactive TET3 with the H3K36 trimethyltransferase SETD2 and colocalization of 5hmC with SETD2 but not with the H3K36 dimethyltransferase NSD3 (Fig. 3a,b). Importantly, increased expression of WT but not catalytically inactive TET3 enhanced binding of SETD2 to Pol II (Fig. 3c), suggesting that TET3-mediated 5hmC formation stabilizes interactions of SETD2 with the RNA Pol II-containing elongation machinery, although SETD2 is able to interact directly with the carboxy-terminal domain of Pol II at pSer221,22. The dramatic reduction of H3K36me3 within gene bodies in SMCs after inactivation of Tet3 indicates a failure of SETD2 or H3K36me3-dependent repressive chromatin formation required to prevent entry of Pol II (refs.3,4) (Extended Data Fig. 3e,f). Integrated analysis of H3K36me3 ChIP–seq and RNA-seq or Pol II pSer5 ChIP–seq data revealed a strong decline in H3K36me3 levels in highly transcribed genes concomitant with a strong increase of intragenic Pol II entry following loss of Tet3 (Fig. 3d,e). ChIP–qPCR further validated substantial reduction of H3K36me3 in intragenic regions of highly expressed genes such as Acta2, Cnn1, Myh11, Dbn1, Arhgap18 and Lpxn in Tet3 mutant SMCs, which was not observed in low-expressed genes (Extended Data Fig. 3g). 
Taken together, our findings indicate that TET3 and/or 5hmC facilitate recruitment of SETD2 and subsequent H3K36me3 deposition within transcribed gene bodies, preventing ectopic entry of Pol II pSer5 to gene bodies in SMCs.
Aberrant transcripts in Tet3 mutant SMCs
To identify genes with spurious intragenic transcription, we analyzed RNA-seq data of SMCs isolated from Tet3smKO:T mice by calculating the ratio between the RPKM (reads per kilobase per million mapped reads) of intermediate and first exons (Extended Data Fig. 4a,b). Of all genes containing more than four exons, 7,761 showed a log2 ratio greater than one for all intermediate exons (from the second exon onwards) versus the first exon in Tet3-deficient SMCs (Fig. 4a). To detect bona fide cryptic transcription initiation events, we performed Cap-analysis gene expression-sequencing (CAGE–seq), which identifies transcription start sites (TSSs) at single-base pair resolution23,24. Importantly, the number of intragenic CTSS (defined as TSS with CAGE tag greater than eight, the average value of each single-base TSS on annotated TSSs) increased significantly in Tet3-deficient SMCs (Extended Data Fig. 4c). The frequency of ectopic intragenic transcriptional initiation correlated positively with transcription activity indicated by CAGE signals at canonical TSSs, which corresponds well to Pol II ChIP–seq data (Fig. 2d and Extended Data Fig. 4d). Since localization of 5hmC is strongly asymmetric, with significantly higher levels on the sense strand25, we focused on 2,114 genes that contain sense-strand intragenic CTSS specific for Tet3-deficient SMCs (Fig. 4a). Out of the 2,114 genes, 515 (24%) showed increased ratios of RNA-seq reads between downstream and first exons as well as enhanced Pol II intragenic entry in mutant SMCs, and were therefore designated spuriously expressed genes (Fig. 4a). Genes with high transcriptional activity and epigenetic signatures, including increased DNA accessibility and H3K4me2/3 deposition, generated more spurious transcripts than other genes (Fig. 4b,c and Extended Data Fig. 4e–g). 
Such genes code for contractile actin filament bundle and actomyosin structure organization (for example, Acta2, Cnn1, Myh11, Dbn1) and pathways important for sarcoplasmic reticulum function and SMC contraction such as ‘Focal adhesion,’ ‘Calcium signaling pathway’ and ‘Inositol phosphate metabolism’26 (Fig. 4d,e and Extended Data Fig. 4h,i). As a consequence of enhanced spurious transcription, more RNA-seq reads of highly expressed contractile SMCs genes, for example, Acta2, Arhgap18 and Cnn1, were recorded after Tet3 inactivation, albeit concentrations of functional full-length mRNAs detected by semiquantitative PCR dropped (Fig. 4e and Extended Data Fig. 4j). We reason that the reduced presence of full-length mRNAs for contractile functions is not caused by reduced transcriptional activity, since (1) transcriptional activity at respective loci is not diminished and (2) expression of key transcription factors driving SMC gene expression (for example, Klf4 and myocardin (Myocd)) remained unchanged (Extended Data Fig. 4k).
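The exon-ratio screen used above to flag candidate genes can be illustrated with a short sketch. It assumes that reads and lengths of all exons from the second onwards are pooled before computing RPKM, and it adds a small pseudocount; neither detail is stated in the text, and the read counts below are invented for the example.

```python
import math

def rpkm(reads, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)

def exon_log2_ratio(exons, total_mapped_reads, pseudocount=0.01):
    """log2 ratio of downstream-exon RPKM to first-exon RPKM.

    exons: list of (read_count, length_bp) tuples in transcript order.
    Returns None for genes with four exons or fewer, which are not screened.
    Assumption: exons 2..n are pooled; the pseudocount is also an assumption.
    """
    if len(exons) <= 4:
        return None
    first_reads, first_len = exons[0]
    rest_reads = sum(reads for reads, _ in exons[1:])
    rest_len = sum(length for _, length in exons[1:])
    first = rpkm(first_reads, first_len, total_mapped_reads)
    rest = rpkm(rest_reads, rest_len, total_mapped_reads)
    return math.log2((rest + pseudocount) / (first + pseudocount))

TOTAL_READS = 1_000_000  # hypothetical library size

# Hypothetical five-exon gene: downstream exons carry three times the read density of exon 1
spurious_like = [(100, 200)] + [(300, 200)] * 4
# Hypothetical five-exon gene with uniform read density
uniform = [(100, 200)] * 5

ratio_spurious = exon_log2_ratio(spurious_like, TOTAL_READS)
ratio_uniform = exon_log2_ratio(uniform, TOTAL_READS)
is_candidate = ratio_spurious is not None and ratio_spurious > 1
```

In the study, a gene was designated spuriously expressed only when this RNA-seq criterion coincided with Tet3-dependent intragenic CTSS on the sense strand and enhanced intragenic Pol II entry, so the ratio alone is a first filter rather than a classifier.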
The drop in 5hmC accumulation after inactivation of Tet3 was particularly evident within gene bodies, where 5hmC levels are high compared with proximal 5′-upstream regulatory regions, indicating a more important role of TET3 in transcriptional elongation than transcriptional initiation in this subset of genes (Extended Data Fig. 5a,b). hMeDIP–qPCR confirmed a marked reduction of 5hmC levels specifically at intragenic but not promoter regions of highly expressed spurious genes coding for contractile proteins after Tet3 depletion. In contrast, 5hmC levels of low-expressed nonspurious genes were not affected (Extended Data Fig. 5c). Intriguingly, 5mC levels were not significantly altered within either intragenic or promoter regions of highly expressed contractile and low-expressed synthetic genes after loss of Tet3 (Extended Data Fig. 5d). This finding indicates that dynamic formation of 5hmC and deposition of H3K36me3 rather than the mere presence of 5mC alone plays a decisive role in preventing spurious transcription.
Of note, we found a significant enrichment of CpG dinucleotides and transcription factor binding motifs containing CpG sequences, including motifs related to Sp2 and members of the Ets family within 50 base pairs (bp) of TET3-dependent intragenic CTSSs in spuriously expressed genes (Fig. 4f). CAGE–seq analysis confirmed that ectopic transcriptional initiation occurred specifically at intragenic binding motifs of contractile genes such as Acta2 and Myh11 in Tet3-deficient SMC (Fig. 4g). Importantly, 5hmC and H3K36me3 levels were both reduced in the vicinity of intragenic CTSSs of Acta2 and Myh11 genes in Tet3-deficient SMCs (Fig. 4g).
Spurious transcripts activate TLR7 signaling
To determine the functional impact of spurious transcripts, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of RNA-seq data from Tet3 mutant and control lung SMCs was performed. Intriguingly, the top 15 upregulated pathways were associated mainly with inflammatory responses (Fig. 5a). In particular, Tet3 inactivation resulted in upregulation of genes involved in endosomal TLR7/8 signaling (that is, Tlr7, Myd88, Ccl5, Il1b and so on), which is normally activated by single-stranded RNA of viral origin causing production of cytokines and chemokines, and expression of a set of macrophage-enriched genes such as Cd68, Adgre1 and Lgals3 in SMC27 (Fig. 5b and Extended Data Fig. 6a). Moreover, we detected enhanced levels of EEA1 and RAB7—proteins regulating endosome trafficking that colocalize with TLR7 in Tet3-deficient SMCs (Extended Data Fig. 6b). Recruitment of the adapter molecule MYD88 by TLR7 was increased substantially in Tet3-deficient SMCs (Fig. 5c), suggesting that aberrant spurious transcripts provoke activation of nucleic-acid-sensing TLRs, which bestow Tet3-deficient SMCs with macrophage-like properties—as seen during phenotype switching under pathological conditions28. To analyze whether spurious transcripts indeed activate TLR7 signaling, we transfected HEK293 and HeLa cells with whole cellular RNA extracted from control and Tet3-deficient SMCs. Expression levels of endosomal TLR7 downstream genes including IRF7, IL1b, CCL5, CD86, IFNb, CXCL9 but not of TLR7/MYD88 or the nonendosomal target CCR5 were significantly elevated in HeLa cells by RNA from Tet3-deficient SMCs compared with control SMC RNA (Fig. 5d and Extended Data Fig. 6c). Absence of TLR7 signaling, as in HEK293 cells, or E6446-mediated TLR7 inhibition of HeLa cells, prevented such an increase, indicating that induction of innate immune responses by spurious transcripts depends on TLR7 (Fig. 5d and Extended Data Fig. 6d,e).
To confirm the hypothesis that TET3-mediated 5hmC formation prevents spurious transcription and subsequent inflammatory responses, we expressed either WT or catalytically inactive human TET3 in mESCs-derived SMCs after Tet3 knockdown (Tet3KD) (Fig. 6a and Extended Data Fig. 6f). We verified that knockdown of Tet3 reduces 5hmC and H3K36me3 formation (Fig. 6a and Extended Data Fig. 6g) and enhances TLR7-dependent expression of cytokine/chemokine genes, similar to Tet3-deficient primary SMCs (Extended Data Fig. 6h). Expression of WT but not catalytically inactive human TET3 normalized expression of cytokine/chemokine genes in Tet3KD SMCs (Fig. 6b). Transfection of cellular RNA collected from Tet3KD SMCs stimulated cytokine/chemokine gene expression in recipient cells, which was blocked by TLR7 inhibition. Likewise, cellular RNA from Tet3KD SMCs lost this stimulatory effect when WT, but not catalytically inactive, human TET3 was expressed in the donor cells (Fig. 6c).
Loss of Tet3 causes airway inflammation
Characterization of the pathological responses in airways uncovered a switch from the contractile (spindle-shape with actin filament and dense bodies) to the synthetic (rhomboid-shape with rough endoplasmic reticulum) state of SMCs, 2 months after Tet3 inactivation (Fig. 7a and Extended Data Fig. 7a–c). In line with ultrastructural changes in SMCs, expression levels of miR-145a (a master regulator of SMC contractility29) and contractile markers such as α-SMA and MYH11 were decreased substantially in FACS-sorted Tet3smKO:T and in vitro differentiated Tet3KD SMCs (Fig. 7b,c and Extended Data Fig. 7d–f). In contrast, protein levels of synthetic marker genes such as TPM4 and VIM (vimentin) were elevated (Fig. 7b,c and Extended Data Fig. 7f). Moreover, we observed enhanced binding of Pol II pSer5 at intragenic CTSSs within contractile but not synthetic marker genes, resulting in elevated transcription of intermediate exons and impaired production of full-length mRNA transcripts of contractile, but not synthetic, genes (Fig. 7d–f). Expression of WT but not catalytically inactive human TET3 prevented phenotypic changes induced by Tet3 suppression in mESC-derived SMCs (mESC-SMCs), demonstrating that the phenotype switch of SMCs relies on the reduction of 5hmC (Fig. 7d–f).
The phenotype switch after Tet3 inactivation did not enhance proliferation of SMCs as indicated by unchanged numbers of Ki67+ SMCs (Extended Data Fig. 7g). Instead, we detected more SA-β-Gal-positive cells in the bronchial smooth muscle layer and elevated expression of senescence marker genes, for example, p16 (Cdkn2a) and p21 (Cdkn1a) in Tet3-deficient SMCs (Extended Data Fig. 7h,i). Acquisition of a senescence-associated secretory phenotype (SASP) by SMCs might enhance paracrine effects on neighboring cells in the lung. In fact, we detected concomitant upregulation of interferon response-related genes in lung SMCs and epithelial but not in endothelial cells, although the switch from the contractile to synthetic state was also observed in VSMCs of the lung by electron microscopy (EM) (Extended Data Fig. 7b,c and Extended Data Fig. 8a,b). Consequences of putative paracrine effects of Tet3-null SMCs were further examined by culturing mouse epithelial lung cells (MLE12 cells) with conditioned medium from control and Tet3KD SMCs. Conditioned medium from Tet3KD SMCs increased expression of pro-inflammatory genes (Il6, Il1b and Ifnb) and EMT related genes (Fn1, Cdh1, Vim). Expression of WT but not of catalytically inactive human TET3 in Tet3KD SMCs abolished this effect (Extended Data Fig. 8c). We conclude that bronchial SMCs are particularly susceptible to innate immune responses, eliciting adverse effects on neighboring epithelial cells, whereas vascular SMCs may require additional noxae to induce pathological vascular responses.
Furthermore, we noted a pronounced metaplasia of Club (CCSP+) but not ciliated cells (α-tubulin+) to mucus-producing goblet cells (Mucin5AC+, AGR2+ or PAS+) and excessive extracellular matrix deposition (Collagen I+) 2 months after SMC-specific Tet3 inactivation (Fig. 8a and Extended Data Fig. 8d,f). Of note, Tet3 inactivation did not lead to elevated expression of Mucin5AC in the intestinal epithelium, suggesting tissue-specific reactions (Extended Data Fig. 8g).
FACS analysis and immunofluorescence staining revealed substantially increased numbers of neutrophils, CD3+ T cells and interstitial macrophages but not of B cells and eosinophils in whole lung tissues, 2 months after Tet3 inactivation (Fig. 8b and Extended Data Fig. 8h). At 6 months after Tet3 inactivation, the lung phenotype had progressed further. We found massive peribronchiolar fibrosis and lesions in Tet3smKO lungs, composed primarily of proliferative CD45R+ B cells (Fig. 8c and Extended Data Fig. 8i,j). In addition, the number of eosinophils was increased, while the rise of CD3+ T cells was no longer significant, although a moderate but significant increase in the number of Th2 cells was detected (Fig. 8d and Extended Data Fig. 8k). In line with this finding, production of the Th2 cytokines interleukin 4 (IL4), interleukin 13 (IL13) and interleukin 17a (IL17a), inducing Club cell metaplasia and enhanced mucin secretion30, was strongly upregulated in the CD3+ T cell fraction (Extended Data Fig. 8l), indicating secondary immune responses mediated mainly by B cells, eosinophils and Th2 cells.
The phenotype of Tet3smKO:T mice strongly resembles the clinical appearance of adult human asthma, in which innate immune responses cause phenotype switching of SMCs, but also bears some similarities to COPD16,17,31. Notably, we detected a strong reduction of 5hmC levels within the bronchial SMC layer in human samples from asthma patients and in two distinct mouse models, which rely either on the use of house dust mites or Aspergillus fumigatus to induce asthma (Fig. 8e–h and Extended Data Fig. 9a–d). In contrast, we did not detect a decline of 5hmC in the bronchial SMCs of human COPD and cystic fibrosis patients or in mouse lung after exposure to hypoxia for 28 days (10% O2) (Extended Data Fig. 9e–h).
Discussion
Previous studies unveiled that DNMT3B-dependent 5mC formation is critical to prevent inappropriate transcription at gene bodies3,32. Here, we propose that TET3-mediated 5hmC formation stabilizes interactions of SETD2 with the Pol II-containing elongation machinery, thereby facilitating H3K36me3 chromatin modifications, which, after passage of Pol II, prevent its re-entry (Extended Data Fig. 10a). Such a model is fully compatible with an essential role of DNMT3B but indicates that 5hmC is indispensable not only for allowing transcriptional elongation but also for preventing aberrant transcription initiation within gene bodies.
CAGE–seq data analysis revealed that spurious transcription initiation sites in Tet3-deficient SMC are enriched at CpG dinucleotides. This observation suggests that TET3-mediated oxidation of 5mC occurs primarily at heavily methylated cryptic intragenic TSSs or transposon elements to prevent aberrant transcriptional initiation. Since high density of CpG methylation (at least 90%) slows down elongation rates1, TET3, which interacts with Pol II, will have more time to demethylate genomic loci with high 5mC density. Although the role of 5hmC after formation of H3K36me3 within the intragenic cryptic TSSs is still enigmatic, it is possible that 5hmC not only enhances recruitment of SETD2 but also serves as an intermediate towards DNA demethylation to support the elongating Pol II complex for overcoming obstacles imposed by DNA methylation. We also demonstrate that reduced 5hmC formation is associated with reduced H3K36me3 deposition in spuriously transcribed genes after Tet3 inactivation. In contrast, the 5mC content of such genes did not change, further supporting the decisive function of 5hmC and H3K36me3 to prevent spurious transcription by favoring closed chromatin structures in gene bodies of SMCs4,33.
In addition to TET3, SMCs also express TET2 but not TET1. A recent study reported that TET2 acts as a master regulator of SMC plasticity by increasing chromatin accessibility at promoters of key procontractile genes18. Notably, we found that TET2 depletion neither attenuates 5hmC levels nor causes lung abnormalities in vivo, suggesting that TET3 compensates for the absence of TET2 in several differentiated tissues in vivo, at least under baseline conditions. This observation is consistent with upregulation of TET3 in various differentiated organs of Tet1 and Tet2 double knockout mice10. Concomitant inactivation of Tet2 and Tet3 in SMCs further reduced 5hmC in pulmonary VSMCs but not in BSMCs, although the gross morphology of vessels was unaffected in Tet2/Tet3 compound mutants. We assume that the function of Tet2 and Tet3 to generate 5hmC partially overlaps in VSMCs. The question remains why BSMCs are more vulnerable to the loss of 5hmC than VSMCs, which might be due to a lower rate of spurious transcription in VSMCs compared with BSMCs or a lower threshold of BSMCs to activate innate immune responses. We assume that the switch from a contractile to a synthetic phenotype in Tet3-deficient pulmonary VSMCs is the consequence of massive lung inflammation caused by Tet3-deficient BSMCs, since Tet3 mutant VSMCs in the aorta do not show a reduction in the expression of contractile genes. Alternatively, a differential responsiveness of neighboring cells to activate innate immune responses may contribute, which is supported by the upregulation of interferon response-related genes in epithelial but not in endothelial cells via conditioned medium from Tet3 mutant SMCs.
Our study unveils the biological consequences of inappropriate cryptic transcription in mammals, which have not previously been adequately addressed5,6. We discovered that accretion of aberrant intragenic transcripts in Tet3-deficient SMCs activates the TLR7 nucleic-acid-sensing system, subsequently provoking immune responses and lung pathogenesis (Extended Data Fig. 10b). Generation of spurious transcripts might interfere with proper modification of self-RNAs, such as 2′-O-methylation, pseudouridine (Ψ), 5-methylcytidine (m5C), 2-thio-uridine (s2U) or N6-methyladenosine (m6A)34, or provoke removal of such modifications, thus misleading cells to recognize them as foreign. Spurious transcripts might also resemble ssRNA degradation products or become processed into products recognized by TLR7/8. Currently, we do not know which specific properties of spurious transcripts enable activation of the endosomal TLR7/8 signaling pathway, but it is evident that such products are stable enough to elicit an innate immune response after transfection into other cells.
Our study highlights the central role of SMCs in lung disease, demonstrating that alterations of contractility and initiation of innate immune responses in SMCs are sufficient to stage a massive inflammatory reaction, which compromises airway function and results in an asthma-like phenotype. The observation of reduced 5hmC formation in bronchial SMCs in asthma patients and in two different mouse asthma models is intriguing. The discovery that loss of Tet3 causes an asthma-like phenotype strongly suggests that the reduction of 5hmC in airways of human asthma patients is not an epiphenomenon but is causally involved in the pathogenesis of asthma. We speculate that modulation of TET3 activity and/or 5hmC formation might be a viable approach to interfere with chronic innate immune responses, initiated or maintained by inappropriate cryptic transcription.
Methods
Study approval
Approval to use human samples from the BioMaterialBank Nord for research was granted by the Ethics Committee of the University of Lübeck (Az 12-220 and 14-225). Tissue donations from the DZL Biobank (Deutsches Zentrum für Lungenforschung) were approved by the Ethics Committee of the Department of Human Medicine of Justus Liebig University Hospital, in accordance with national law and with the ‘Good Clinical Practice/International Conference on Harmonization’ guidelines. Written informed consent was obtained from each patient or the patient’s next of kin (Az. 58/15 and 111/08). All animal experiments were done in accordance with the Guide for the Care and Use of Laboratory Animals published by the US National Institutes of Health (NIH Publication No. 85-23, revised 1996) and were approved by the responsible Committee for Animal Rights Protection of the State of Hessen (Regierungspraesidium Darmstadt) with the project numbers B2/1125, B2/1137 and B2/1056.
Animals
Tet3floxLacZ/+ and Tet3fl/fl mice were generated in house by using a targeting vector purchased from the European Conditional Mouse Mutagenesis Program (EUCOMM). α-SMAERT2Cre2 transgenic mice were provided by P. Chambon (IGBMC Strasbourg). ROSA26tdTomato mice were obtained from The Jackson Laboratory. C57BL/6 mice were obtained from Charles River. All mice were maintained in individually ventilated cages, at 22.5 °C ±1 °C and a relative humidity of 50% ±5% with controlled illumination (12 h dark/light cycle). Mice were given access to food and water ad libitum. All mouse strains were backcrossed and maintained on a C57BL/6 genetic background. Primers used for genotyping are listed in Supplementary Table 1. All experiments were performed using approximately equal numbers of male and female mice, since preliminary data did not indicate significant differences between females and males with respect to changes in airway morphology. Tamoxifen (Sigma) was administered intraperitoneally at 75 mg kg–1 body weight daily for 10 days, starting at 8 weeks of age. In all experiments, mice without the respective floxed allele but containing the Cre-recombinase expressing allele and/or the tdTomato reporter served as controls, unless indicated otherwise.
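As a worked example of the dosing scheme stated above (75 mg kg–1 body weight daily for 10 days), the per-animal amounts can be computed as below. The 25 g body weight is a hypothetical value chosen for illustration, and the sketch deliberately stops at the mass of tamoxifen because the stock concentration and vehicle are not given in the text.

```python
DOSE_MG_PER_KG = 75.0   # from the protocol above
TREATMENT_DAYS = 10     # from the protocol above

def daily_dose_mg(body_weight_g, dose_mg_per_kg=DOSE_MG_PER_KG):
    """Tamoxifen per injection in mg for a mouse of the given weight in grams."""
    return dose_mg_per_kg * body_weight_g / 1000.0

def cumulative_dose_mg(body_weight_g, days=TREATMENT_DAYS):
    """Total tamoxifen over the course, assuming body weight stays constant (an assumption)."""
    return daily_dose_mg(body_weight_g) * days

# Hypothetical 25 g mouse
per_injection = daily_dose_mg(25.0)
total = cumulative_dose_mg(25.0)
```

For the hypothetical 25 g animal this gives 1.875 mg per injection and 18.75 mg over the 10-day course.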
Isolation of SMCs and epithelial and endothelial cells
After sacrificing experimental mice, blood was removed by perfusion with cold PBS through the right ventricle before lung dissection. Lung tissues were dissected and minced into small pieces before incubation in 3 ml digestion buffer (DPBS containing Collagenase type 2 (2 mg ml–1, Worthington), Elastase (0.04 mg ml–1, Worthington) and DNase (5 U ml–1, Roche)) with frequent agitation at 37 °C for 10 min. Immediately afterwards, ten times the volume of cold DMEM supplemented with 10% fetal bovine serum (FBS) was added to the single-cell suspensions. Cells were dissociated mechanically by passing four to five times through a 30 ml syringe and consecutive filtering through 100-, 70- and 40-µm cell strainers (BD Biosciences). The filtrate was centrifuged at 300 g at room temperature (RT) for 10 min. Pellets were resuspended in 1 ml precooled MACS buffer (catalog no. A9576, Miltenyi Biotec) with 1% BSA. After 5 min centrifugation at 300 g, 4 °C, cell pellets were resuspended in 90 µl MACS buffer and incubated with 10 µl CD45 MicroBeads (catalog no. 130-052-301) and anti-Ter-119 MicroBeads (catalog no. 130-049-901) at 4 °C for 15 min to remove hematopoietic cells. After washing with MACS buffer, cells were loaded into preconditioned LS columns (Miltenyi Biotec) on a MACS separator and the flow-through containing unlabeled cells was collected. 4,6-Diamidino-2-phenylindole (DAPI)– and tdTomato+ populations were sorted using a FACSAria III (BD Biosciences) (Extended Data Fig. 1m and Supplementary Figs. 1 and 2). Epithelial and endothelial cells were isolated using anti-EpCAM MicroBeads (catalog no. 130-105-958) and anti-CD31 MicroBeads (catalog no. 130-097-418), respectively.
In vitro differentiation of mouse embryonic stem cell-derived SMCs
The method for differentiation of SMCs from mouse embryonic stem (ES) cells was based on a published protocol35. Briefly, resuspended single mouse ES cells were plated on gelatin-coated plates at a density of 4 × 10^4 cm–2 at 37 °C, 5% CO2 in differentiation medium (DMEM (Sigma) supplemented with 10% fetal calf serum (FCS; Sigma), 1 mM l-glutamine, 0.1 mM 2-mercaptoethanol, 0.1 mM nonessential amino acids, 100 U penicillin and 10^–5 mol l–1 all-trans RA (Sigma)). Cells were cultured for 8–10 days with a daily change of fresh RA-containing differentiation medium.
Lentivirus infection
HEK293T cells (2 × 10^6 per 10 cm dish) were transfected with 5 µg of either pLKO.1-Tet3-shRNA (shRNA sequences: sh1, 5′-CTGTTAGGCAGATTGTTCT; and sh2, 5′-TCCAACGAGAAGCTATTT), which does not target human Tet3, pLJM1-TET3WT or pLJM1-TET3CD (mutations H1077Y and D1079A), together with 4.5 µg psPAX2 (Addgene, catalog no. 12260) and 0.5 µg pMD2.G (Addgene, catalog no. 12259) using the Turbofect transfection reagent and Opti-MEM for 6–8 h. The lentivirus-containing supernatant was collected at 48 and 72 h after transfection, pooled and filtered through a 0.45 µm cell strainer to remove HEK293T cells. Lentiviruses were concentrated with a Lenti-X concentrator according to the manufacturer’s instructions (TaKaRa, catalog no. 631231). Differentiated SMCs were infected with the Tet3 shRNA lentivirus with Polybrene (8 μg ml–1) for 6–8 h, followed by infection with either TET3WT or TET3CD lentiviruses 3 days later.
Human asthma, donor, COPD and cystic fibrosis samples and mouse asthma and chronic hypoxia samples
Human asthma samples were received from the BioMaterialBank Nord, Clinical and Experimental Pathology Medicine, Research Center Borstel. Human donor samples, human COPD samples and human cystic fibrosis samples were provided by the DZL Biobank and were obtained during lung transplantation of human COPD and cystic fibrosis patients36. Donor lung material was obtained as a result of atypical resections undertaken to adjust the donor organ to the recipient’s thoracic cavity. Clinical characteristics of patients and donors are provided in Supplementary Table 2. Experimental asthma in mice was induced by intranasal (IN) application of house dust mite (HDM) allergen whole-body extracts (Greer Laboratories) derived from the common HDM species Dermatophagoides pteronyssinus (Der p) and Dermatophagoides farinae (Der f)37,38. In the second model, experimental asthma was induced by either intraperitoneal (IP) or subcutaneous (SC) injection of A. fumigatus (ASP) followed by IN challenges with ASP39. Control animals were treated with PBS. For chronic hypoxia experiments, mice were kept under normobaric hypoxia (10% O2) or normobaric normoxia (21% O2) in a ventilated chamber (Biospherix) for 28 days. All animal studies were reviewed and approved by the Federal Authorities for Animal Research of Regierungspräsidium Giessen, Hessen, Germany (animal protocols G61/2019 and G27/2020, Gi 09/2017 for the hypoxia mouse model) and were carried out according to the guidelines of the German Animal Welfare Act.
Gene expression analysis
Total RNA was extracted using TRIzol reagent (Invitrogen), following the manufacturer’s instructions. RNA was reverse-transcribed with Superscript II (Invitrogen) following standard procedures. Real-time PCR was performed with two technical replicates using the StepOne Real-time PCR system and KAPA SYBR FAST qPCR Master Mix (KAPA Biosystems). Relative quantitation of gene expression was performed using the ΔΔCt method. Ct values of the target genes were normalized to the β-actin gene using the equation ΔCt = Ct(reference) – Ct(target) and expressed as ΔCt. Relative mRNA expression is shown with the average of control samples set to 1. Primers and PCR conditions are listed in Supplementary Table 1.
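The normalization described above can be sketched in a few lines of Python. This is an illustration only, not the analysis script used in the study: the function names and the Ct values are hypothetical, and the sketch assumes perfect amplification efficiency (one doubling per cycle), so that abundance is proportional to 2^ΔCt.

```python
from statistics import mean


def delta_ct(ct_reference, ct_target):
    """ΔCt as defined in the text: Ct(reference) - Ct(target)."""
    return ct_reference - ct_target


def relative_expression(control, treated):
    """Scale 2**ΔCt values so that the mean of the control samples equals 1.

    Each sample is a (ct_reference, ct_target) pair, already averaged over
    technical replicates. Assumes one doubling per PCR cycle.
    """
    control_lin = [2 ** delta_ct(ref, tgt) for ref, tgt in control]
    treated_lin = [2 ** delta_ct(ref, tgt) for ref, tgt in treated]
    scale = mean(control_lin)
    return ([v / scale for v in control_lin],
            [v / scale for v in treated_lin])


# Hypothetical Ct values: (β-actin, target gene)
control = [(18.0, 25.0), (18.0, 25.0)]
treated = [(18.0, 23.0), (18.0, 24.0)]
ctrl_rel, trt_rel = relative_expression(control, treated)
```

With these made-up values, a target that crosses threshold two cycles earlier than in controls is reported as fourfold higher expression.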
Immunohistochemistry, immunofluorescence and histological analysis
After perfusion with PBS, tissues were dissected and immediately fixed in 4% paraformaldehyde (PFA). For paraffin sections, samples were dehydrated following standard protocols and sectioned at 7 µm after paraffin embedding for immunofluorescence, hematoxylin/eosin (H&E) and trichrome staining using established techniques. For cryosections, fixed tissues were equilibrated in 30% sucrose/PBS at 4 °C overnight and frozen on dry ice. Sections (7 µm) were mounted on SuperFrost slides for immunofluorescence or periodic acid-Schiff (PAS) staining using a kit from Sigma. Immunofluorescence images were acquired with a Leica M205 FA and a ZEISS Imager Z1. Acquisition of immunohistochemistry and histological images was performed with a ZEISS Axioplan2. 5hmC signals were determined by quantifying the average mean fluorescence intensity (MFI) per nucleus of 100 randomly selected α-SMA+ cells in lung tissue sections of individual mouse and human subjects using ImageJ. N numbers refer to the number of individual mouse and human subjects. Antibodies for immunofluorescence staining are listed in Supplementary Table 3.
Western blot and dot blot assays
Sorted SMCs were incubated in lysis buffer (20 mM Tris-HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100) and resolved by SDS–PAGE before transfer to nitrocellulose filters. Dot blot assays were performed with 100 ng genomic DNA using a Bio-Dot Microfiltration apparatus (catalog nos. 170-6545 and 170-6547). Protein expression was visualized using an enhanced chemiluminescence detection system (GE Healthcare) and quantified using the ChemiDoc gel documentation system (Bio-Rad). Antibodies are listed in Supplementary Table 3.
Electron microscopy
Lungs were isolated and fixed in 1.5% glutaraldehyde (v/v), 1.5% PFA (v/w) in 0.15 M HEPES (v/w), pH 8.0 at 4 °C for at least 24 h, and subsequently incubated with 1% osmium tetroxide for 2 h. Samples were stained en bloc with 50% saturated uranyl acetate, followed by sequential ethanol dehydration (30%, 50%, 75%, 95%), and embedded in Agar 100. Ultrathin sections were cut using an ultramicrotome and image acquisition was performed with a Philips CM10 electron microscope. All images were captured with a slow-scan 2K CCD camera.
FACS analysis
Single-cell suspensions from lung were analyzed with different antibody panels: T cells were defined as CD3+; B cells were defined as B220+; eosinophils were defined as Siglec-F+ CD11c–; alveolar macrophages (AMs) were defined as Siglec-F+ CD11c+ CD11b− F4/80+; interstitial macrophages (IMs) were defined as Siglec-F– CD11c− CD11b+ F4/80+; neutrophils were defined as Siglec-F– CD11c− CD11b+ Ly6G+; dendritic cells were defined as Siglec-F– CD11chi MHCIIhi (ref. 40). T helper type 1 (Th1) cells were defined as CD4+ CD183+; Th2 cells were defined as CD4+ CD194+ CD196−; Th9 cells were defined as CD4+ CD194− CD196+; Th17 cells were defined as CD4+ CD194+ CD196+. CountBright Absolute Counting Beads (Thermo Fisher) were used to calculate absolute numbers of cells in the sample. Fluorescence compensation controls and fluorescence-minus-one (FMO) stain sets were used to identify cells within multicolor-stained samples. Flow cytometry was performed with the LSR Fortessa (BD Biosciences) analyzer. Data acquisition and analysis were done using BD FACS Diva v.8 software (Supplementary Fig. 2).
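For readers who want the population definitions above in machine-readable form, the following Python sketch encodes them as lookup tables. It is a simplification for illustration: real gating is sequential and operates on continuous fluorescence intensities, whereas here each marker is assumed to have already been called as "+", "-" or "hi". The dictionary and function names are hypothetical.

```python
# Marker definitions transcribed from the text above.
# "+" = positive, "-" = negative, "hi" = high.
MYELOID_GATES = {
    "eosinophil": {"Siglec-F": "+", "CD11c": "-"},
    "alveolar macrophage": {"Siglec-F": "+", "CD11c": "+", "CD11b": "-", "F4/80": "+"},
    "interstitial macrophage": {"Siglec-F": "-", "CD11c": "-", "CD11b": "+", "F4/80": "+"},
    "neutrophil": {"Siglec-F": "-", "CD11c": "-", "CD11b": "+", "Ly6G": "+"},
    "dendritic cell": {"Siglec-F": "-", "CD11c": "hi", "MHCII": "hi"},
}

T_HELPER_GATES = {
    "Th1": {"CD4": "+", "CD183": "+"},
    "Th2": {"CD4": "+", "CD194": "+", "CD196": "-"},
    "Th9": {"CD4": "+", "CD194": "-", "CD196": "+"},
    "Th17": {"CD4": "+", "CD194": "+", "CD196": "+"},
}


def classify(cell, gates):
    """Return the names of all populations whose marker definition the cell meets.

    `cell` maps marker names to a pre-assigned call ("+", "-" or "hi").
    Markers missing from `cell` never satisfy a gate.
    """
    return [name for name, markers in gates.items()
            if all(cell.get(marker) == state for marker, state in markers.items())]


# Hypothetical cells with pre-assigned marker calls.
am_like = {"Siglec-F": "+", "CD11c": "+", "CD11b": "-", "F4/80": "+"}
th17_like = {"CD4": "+", "CD194": "+", "CD196": "+"}
am_call = classify(am_like, MYELOID_GATES)
th17_call = classify(th17_like, T_HELPER_GATES)
```

The function returns a list because the Th1 definition uses a different marker (CD183) from the other T helper subsets, so a cell could in principle satisfy more than one definition.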
RNA-seq
RNA was isolated from sorted SMCs using the miRNeasy micro Kit (Qiagen) combined with on-column DNase digestion (DNase-Free DNase Set, Qiagen) to avoid contamination by genomic DNA. RNA and library preparation integrity were verified with BioAnalyzer 2100 (Agilent) or LabChip Gx Touch 24 (Perkin Elmer). Total RNA (50 ng) was used as input for ribosomal depletion with RiboGone-Mammalian (Clontech) followed by library preparation using SMARTer Stranded Total RNA Sample Prep Kit (Clontech). Sequencing was performed on the NextSeq500 instrument (Illumina) using v.2 chemistry, resulting in an average of 44 M reads per library with 1 × 75 bp single-end setup. Raw reads were assessed for quality, adapter content and duplication rates with FastQC v.0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Trimmomatic v.≥0.36 was employed to trim reads after a quality drop below a mean of Q15 in a window of five nucleotides41. Only reads of at least 15 nucleotides were cleared for subsequent analyses. Trimmed and filtered reads were aligned versus mouse genome v.mm10 (GRCm38.p5) using STAR ≥2.5.4b with the parameters ‘--outFilterMismatchNoverLmax 0.1 --alignIntronMax 200000’ (ref. 42). The number of reads aligning to genes was counted with featureCounts ≥1.6.0 from the Subread package43. Only reads mapping at least partially inside exons were admitted and aggregated per gene. Reads overlapping multiple genes or aligning to multiple regions were excluded. Differentially expressed genes were identified using DESeq2 v. ≥1.14.0 (ref. 44). The annotation was enriched with UniProt data (release March 24, 2017) based on Ensembl gene identifiers (Activities at the Universal Protein Resource (UniProt)).
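The read-trimming rule stated above (cut once the mean quality in a five-nucleotide window drops below Q15, then keep only reads of at least 15 nucleotides) can be illustrated with the following Python sketch. It is a simplified reimplementation for explanation, not Trimmomatic itself, and may differ from Trimmomatic's exact handling of window boundaries; the function name and the example read are hypothetical.

```python
def sliding_window_trim(seq, quals, window=5, min_mean_q=15, min_len=15):
    """Cut a read at the first window whose mean quality is below `min_mean_q`.

    `quals` holds one Phred score per base. Returns the trimmed
    (sequence, qualities) pair, or None if fewer than `min_len` bases remain.
    """
    keep = len(seq)
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            keep = start  # discard the failing window and everything after it
            break
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


# Hypothetical 30-nt read: 20 high-quality bases followed by 10 poor ones.
read_seq = "A" * 30
read_quals = [30] * 20 + [2] * 10
trimmed = sliding_window_trim(read_seq, read_quals)
```

In this example the first failing window starts at position 18 (two Q30 bases and three Q2 bases, mean 13.2), so 18 bases are kept and the read passes the length filter.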
Cell culture, plasmid transfection and Co-IP
HEK293, HEK293T and HeLa cells were grown in DMEM (Sigma) supplemented with 10% FCS (Sigma), 2 mM l-glutamine, 100 U penicillin and 100 µg ml–1 streptomycin at 37 °C, 5% CO2. HEK293T cells (2 × 10^6 per 10 cm dish) were transfected with 8 µg Flag-HA-TET3-pEF (catalog no. 49446, Addgene) using the calcium phosphate precipitation method. At 48 h after transfection, HEK293T cells were collected and washed twice with ice-cold PBS. Cells were resuspended in 300 µl lysis buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100) and sonicated with a bioruptor for 15 min. Lysates were supplemented with 500 μl lysis buffer and incubated on a rotating wheel at 4 °C for 30 min. Cell debris was removed by centrifugation at 12,000g for 20 min at 4 °C. Protein lysate (800 µg) was incubated with HA or SETD2 antibody overnight at 4 °C followed by incubation with Protein A-agarose beads (Roche) at 4 °C for 4 h. After washing three times with lysis buffer, precipitated proteins were eluted from beads in 2× SDS loading buffer and analyzed by western blot.
Chromatin immunoprecipitation
Chromatin immunoprecipitation (ChIP) was performed following published protocols45. Briefly, FACS-purified SMCs (300,000) were first cross-linked with 1% formaldehyde for 10 min and then quenched using the truChIP Chromatin Shearing Kit (COVARIS) for 10 min at RT. Chromatin was sheared to an average size of 200–500 bp by sonication (Diagenode Bioruptor). Protein–DNA complexes were immunoprecipitated with IgG or antibodies listed in Supplementary Table 3, followed by incubation with Protein A/G magnetic beads (Dynabeads, Invitrogen). For ChIP–qPCR, beads were washed and protein–DNA complexes were eluted and purified using 10% (w/v) chelex-100 (Bio-Rad Laboratories) in Tris-EDTA46. Immunoprecipitated chromatin was analyzed by qPCR using SYBR Green quantitative real-time analysis with primers listed in Supplementary Table 1. For Pol II pSer5 ChIP–seq, 3 × 10^6 FACS-purified SMCs were treated with DRB (100 µmol) at 4 °C for 1 h. Sheared genomic DNA (50 µg) was subjected to immunoprecipitation with 4 µg Pol II Ser5 antibody according to established protocols. Protein–DNA complexes were eluted from beads by incubation with 50 µl elution buffer (10 mM Tris-HCl pH 7.4, 5 mM EDTA, 300 mM NaCl, 0.5% SDS) at RT for 5 min and treated with 1 µg DNase-free RNase (Roche) at 37 °C for 30 min. After incubation with 25 µg proteinase K (10 mg ml–1) and 1 µg glycogen at 37 °C for 2 h, samples were heated at 65 °C with constant shaking at 1,350 rpm overnight. DNA was purified with the MinElute PCR Purification Kit.
hMeDIP–qPCR
Genomic DNA (1 µg) was extracted from control and Tet3smKO:T SMCs by using the AllPrep DNA/RNA Micro Kit (Qiagen). hMeDIP was done following instructions provided with the hMeDIP kit (Diagenode). IgG antibodies were employed as a control. Input and hMeDIP products were used as templates for quantitative real-time PCR. Relative 5hmC enrichment was calculated as follows: %recovery (specific locus) = 2^[(Ct(10% input) – 3.32) – Ct(hmeDNA-IP)] × 100%; enrichment = %recovery (specific locus)/%recovery (IgG). Primers are listed in Supplementary Table 1.
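The recovery and enrichment formulas above translate directly into code. The Python sketch below is illustrative; the function names and Ct values are hypothetical. The constant 3.32 is approximately log2(10) and adjusts the Ct of the 10% input to the value expected for 100% input.

```python
def percent_recovery(ct_input_10pct, ct_ip):
    """%recovery = 2^[(Ct(10% input) - 3.32) - Ct(IP)] x 100."""
    return 2 ** ((ct_input_10pct - 3.32) - ct_ip) * 100


def enrichment(ct_input_10pct, ct_hmedip, ct_igg):
    """Ratio of %recovery in the hMeDIP sample to %recovery in the IgG control."""
    return (percent_recovery(ct_input_10pct, ct_hmedip)
            / percent_recovery(ct_input_10pct, ct_igg))


# Hypothetical Ct values for one locus.
fold = enrichment(ct_input_10pct=25.0, ct_hmedip=27.0, ct_igg=30.0)
```

Because the input term cancels in the ratio, the enrichment here depends only on the three-cycle difference between the IgG and hMeDIP reactions, giving a value of about 8.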
ChIP–seq and data analysis
Purified ChIP DNA (0.5–10 ng) was used as input for the TruSeq ChIP Library Preparation Kit (Illumina) with modifications. Briefly, libraries were size selected by a SPRI-bead based approach after final PCR with 18 cycles: samples were first cleaned at a 1× bead:DNA ratio, followed by a two-sided bead cleanup step with a 0.6× bead:DNA ratio. Supernatant was transferred to a new tube and incubated with additional beads at a 0.2× bead:DNA ratio. Bound DNA samples were washed with 80% ethanol, dried and resuspended in Tris-EDTA buffer. Sequencing was performed on the NextSeq500 instrument (Illumina) using v.2 chemistry with 1 × 75 bp single-end setup. Quality assessment was performed via FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and reads were trimmed using Reaper47. Reads were further deduplicated using Picard v.2.17.10. The Macs2 peak caller v.2.1.0 was employed to accommodate for the range of peak widths as typically expected48. The minimum Q value was set to –1.5 and false discovery rate (FDR) was changed to 0.001. Peaks overlapping ENCODE blacklisted regions (known misassemblies, satellite repeats) were excluded. To determine thresholds for significant peaks per IP, data were inspected manually in Integrated Genome Viewer (IGV) v.2.3.52 (ref. 49). For comparison of peaks in different samples, significant peaks were overlapped and unified to represent identical regions and recounted. Background correction was performed to correct read counts from different regions (unified peaks, promoters, genes). Treatment and Input samples were normalized for sequencing depth, before subtracting reads of the input sample from reads of the respective treatment sample in windows of 50 nt length50. All windows with negative values (Input > Treatment) were set to zero51. Background-corrected counts for regions were calculated using BigWigAverageOverBed (UCSC Tools) and normalized with DESeq2 v.≥1.14.0 (ref. 52).
Peaks were annotated with the promoter (TSS ±5,000 nt) of genes closely located to the center of the peak based on reference data from GENCODE v.25. To permit comparative display of samples in IGV, raw BAM files were scaled with DESeq2 size factors based on all unified peaks using bedtools genomecov resulting in normalized BigWig files53. Finally, DESeq2 was used to identify significantly differentially modified peaks based on background-corrected read counts from recounted unified peak regions.
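The background-correction step described in this section (depth normalization, input subtraction in 50-nt windows, negative values set to zero) can be sketched as follows. This Python example is an assumption-laden illustration, not the pipeline used in the study: the text does not state how depth normalization was done, so scaling to reads per million is assumed here, and the function name and counts are hypothetical.

```python
def background_corrected_windows(treat_counts, input_counts, treat_depth, input_depth):
    """Subtract depth-normalized input from treatment, window by window.

    `treat_counts` and `input_counts` are read counts per 50-nt window over the
    same region; `treat_depth` and `input_depth` are total mapped reads per
    library. Counts are scaled to reads per million (an assumed normalization),
    and windows where input exceeds treatment are set to zero.
    """
    corrected = []
    for t, i in zip(treat_counts, input_counts):
        t_norm = t * 1e6 / treat_depth
        i_norm = i * 1e6 / input_depth
        corrected.append(max(0.0, t_norm - i_norm))
    return corrected


# Hypothetical counts for three consecutive windows.
signal = background_corrected_windows(
    treat_counts=[10, 2, 0],
    input_counts=[4, 8, 0],
    treat_depth=1_000_000,
    input_depth=2_000_000,
)
```

The second window illustrates the clamping rule: its normalized input (4) exceeds the normalized treatment (2), so the corrected value is zero rather than negative.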
Nano-5hmC-seal (Nano-seal)
Genomic DNA was isolated following standard protocols and suspended in nuclease-free water. The Illumina sequencing library was generated from 75 ng genomic DNA using the NxSeq UltraLow DNA Library Kit (Lucigen) following the manufacturer’s instructions, but without PCR amplification. Glucosylation of half of the purified DNA library was performed in a 21 µl reaction containing 1× Thermo Epi Buffer, 100 µM N3-UDP-Glc and 2 µl T4 beta-glucosyltransferase (Thermo Fisher) at 37 °C for 1 h. After glucosylation, 2.1 µl Biotin-PEG4-DBCO (Click Chemistry Tools, 20 mM stock) was added directly to the reaction mixture and incubated at 37 °C for 2 h. Biotinylated DNA was purified by paramagnetic DNA binding beads (1.8× volume; Omega Bio-Tek) following standard procedures. Purified DNA was incubated with 5 µl C1 Streptavidin beads (Life Technologies) in B&W buffer (5 mM Tris pH 7.5, 0.5 mM EDTA and 1 M NaCl) for 40 min at RT with rotation, according to the manufacturer’s instructions. Beads were subsequently subjected to six 5-min washes with B&W buffer before elution in 40 µl water. Eluted enriched DNA libraries were PCR-amplified with index primers and sequenced on an Illumina NextSeq 2000. Trimmomatic v.0.39 was employed to trim reads below a mean of Q15 in a window of five nucleotides41. Only reads longer than 15 nucleotides were used for further analyses. Trimmed and filtered reads were aligned to the mouse genome v.mm10 (Ensembl release 101) using STAR v.2.7.10a with the parameters ‘--outFilterMismatchNoverLmax 0.2 --outFilterMatchNmin 20 --alignIntronMax 1 --outFilterMultimapNmax 1’ (ref. 42), retaining only unique alignments and excluding reads of uncertain origin. Reads were further deduplicated using Picard v.2.18.16 (Picard: A set of tools (in Java) for working with next generation sequencing data in the BAM format) to mitigate PCR artefacts leading to multiple copies of the same original fragment. Reads aligning to the mitochondrial chromosome were removed.
CAGE–seq
Total RNA was isolated using the miRNeasy micro Kit (Qiagen) combined with on-column DNase digestion (DNase-Free DNase Set, Qiagen) to avoid contamination by genomic DNA. CAGE library preparation, sequencing, mapping and motif discovery analysis were performed by DNAFORM (Life Science Research Center, Japan). In brief, RNA quality was assessed by Bioanalyzer (Agilent) to ensure a RIN (RNA integrity number) greater than 7.0, and A260/280 and 260/230 ratios greater than 1.7. First-strand cDNA was transcribed to the 5′ end of capped RNAs, attached to CAGE ‘bar code’ tags. Sequenced CAGE tags were mapped to the mouse mm10 genome using BWA software (v.0.5.9) and HISAT2 after discarding ribosomal or non-A/C/G/T base-containing RNAs. Mapped CAGE tags with mapping quality higher than ten were retained, separated by strand and trimmed to a length of one nucleotide at the 5′ end as CAGE tag start sites (CTSSs)54. CTSS numbers at gene bodies were calculated by excluding exon 1 and by normalizing to total tags. To identify strand-specific TSSs, only sense CTSSs were used. Genes with 1.3-fold higher intragenic CTSS signal in Tet3-deficient cells compared to controls were selected. Average CTSS coverage at single-nucleotide resolution was around eight in control samples, based on which single-nucleotide sites with a CTSS count of less than eight were defined as low-expressed. For visualization, only CTSSs with a tag count greater than eight were employed. Regions 50 bp upstream or downstream of TSSs specific to TET3 KO were used for motif enrichment analysis by HOMER55.
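The gene-selection logic described above (sum sense-strand CTSSs over the gene body excluding exon 1, normalize to total tags, then compare knockout with control at a 1.3-fold cutoff) can be sketched in Python. This is an illustration under stated assumptions rather than the DNAFORM or in-house pipeline: normalization to tags per million, an inclusive 1.3-fold threshold, and skipping genes with zero control signal are choices made here for the example, and all names and numbers are hypothetical.

```python
def intragenic_signal(ctss_counts, exon1, total_tags):
    """Normalized intragenic CTSS signal for one gene, in tags per million.

    `ctss_counts` maps genomic position -> tag count for sense-strand CTSSs
    inside the gene body; `exon1` is an inclusive (start, end) interval whose
    CTSSs are excluded; `total_tags` is the library size.
    """
    start, end = exon1
    tags = sum(n for pos, n in ctss_counts.items() if not (start <= pos <= end))
    return tags * 1e6 / total_tags


def genes_with_increased_intragenic_ctss(control, knockout, fold=1.3):
    """Return genes whose knockout/control signal ratio is at least `fold`.

    Genes with zero control signal are skipped because the ratio is undefined
    (an assumption of this sketch).
    """
    selected = []
    for gene, ctrl in control.items():
        ko = knockout.get(gene, 0.0)
        if ctrl > 0 and ko / ctrl >= fold:
            selected.append(gene)
    return selected


# Hypothetical gene: one CTSS inside exon 1 (excluded) and two downstream.
example_signal = intragenic_signal({100: 5, 500: 3, 900: 2}, exon1=(90, 150),
                                   total_tags=1_000_000)
hits = genes_with_increased_intragenic_ctss(
    control={"GeneA": 10.0, "GeneB": 10.0, "GeneC": 0.0},
    knockout={"GeneA": 15.0, "GeneB": 12.0, "GeneC": 5.0},
)
```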
Laser capture microdissection and DNA-microarray analysis
Cryosections (7–10 µm) mounted on glass microscope slides were successively immersed in 70% ethanol fixative solution (10 s); dH2O (10 s); Mayer’s hematoxylin (45 s); ddH2O (10 s); tap water (10 s); 70% ethanol (10 s); 95% ethanol (10 s); 95% ethanol (10 s); 100% ethanol (60 s); 100% ethanol (60 s). Slides were air-dried before bronchi/bronchioles and vasculature were microdissected using the Laser Microbeam System (P.A.L.M.), and collected into a tube containing 200 µl RNA lysis buffer for RNA extraction using the RNeasy Micro Kit (QIAGEN). RNA (0.8 ng) was used for DNA-microarray analysis using the GeneChip WT Pico Reagent Kit (P/N 703262 Rev.1); the GeneChip WT Pico Kit, P/N: 902622; the GeneChip Hybridization, Wash and Stain Kit P/N 900720 and the Mouse transcriptome array 1.0 ST (ClariomD, Ref: 520851) according to the Affymetrix protocol User Guide. DNA-microarray data were analyzed based on published protocols56.
RNA transfection
RNA from 1 million freshly sorted SMCs was extracted using the TRIzol reagent (Invitrogen). HeLa or HEK293 cells at 70% confluence were mock-transfected or transfected with 1 µg total RNA using Lipofectamine MessengerMAX (LipoMAX) (Thermo Fisher) according to the manufacturer’s instructions. At 18 h after transfection, cells were collected and RNA was extracted for gene expression analysis.
Proximity ligation assay
Cryosections of lung tissues or FACS-sorted SMCs after 6 days of cultivation in SmBM Smooth Muscle Cell Growth Basal Medium (LONZA) in 6-well plates (around 200,000 SMCs per well) were fixed with 4% PFA for 10 min, permeabilized with 0.3% Triton X-100 in PBS for 15 min and washed twice with PBS. PLAs were performed following the Duolink PLA Fluorescence Protocol (Merck) using antibodies listed in Supplementary Table 3. Tissue sections and cells were mounted using Duolink In situ Mounting Media with DAPI (Sigma). Image acquisition was performed by confocal microscopy using a Leica SP8.
Statistical analysis
For all quantitative analyses, a minimum of two biological replicates were analyzed. For the usage of statistical tests, it was assumed that sample data are derived from a population following a probability distribution based on a fixed set of parameters; t-tests were used to determine the statistical significance of differences between two groups. For multiple comparisons, one-way analysis of variance (ANOVA) with Tukey’s post hoc test for correction of multiple testing was performed. The following values were considered to be statistically significant: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant. Calculations were done using the GraphPad Prism 9 software and R v.4.1.0. Data are represented as mean ± s.e.m. unless indicated otherwise. The boxplot displays the median with 25% (bottom value) and 75% quantiles (top value) unless indicated otherwise. No statistical method was used to predetermine sample size.
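The reporting conventions above (mean ± s.e.m. and the asterisk thresholds) can be expressed compactly in Python. The sketch below covers only these two conventions, not the t-tests or ANOVA, which were run in GraphPad Prism and R; the function names and sample values are hypothetical, and the s.e.m. is assumed to be the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    return mean(values), stdev(values) / sqrt(len(values))


def significance_label(p):
    """Map a P value to the asterisk notation used in the figures."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Hypothetical measurements from three biological replicates.
avg, sem = mean_sem([1.0, 2.0, 3.0])
label = significance_label(0.03)
```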
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-022-01252-3.
Acknowledgements
We thank K. Richter, G. Kripp, S. Krüger, S. Thomas and S. Kreutzer for technical help and Y. Zhang for providing reagents for Nano-seal. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) SFB-TRR 267 TP A05 to T. Braun and T. Boettger; SFB-TRR81 A02 to X.Y. and T. Braun; SFB1213-Project-ID 268555672-TP A02 to T. Braun and -TP B02 to X.Y. and T. Braun; SFB1531-TP B08 to T. Braun; the LOEWE project iCANx to T. Braun. The BioMaterialBank Nord, member of the Popgen 2.0 Network (P2N), is supported by the German Center for Lung Research, funded by the German Ministry for Education and Research (01ER1103). Human samples were provided by the UGMLC Giessen Biobank, member of the DZL Platform Biobanking, which also is supported by DFG grant SFB1213-Project-ID 268555672-TP CP01. The funders had no role in study design, data collection and analysis, or decisions to prepare or publish the manuscript.
Author contributions
X.Y. and T. Braun conceived and designed experiments. F.W. performed most of the experiments, analyzed the data and prepared figures. D.D. performed Nano-seal. S.G. performed next-generation deep sequencing. M.L., S.G. and C.K. performed bioinformatics analysis of RNA-seq and ChIP–seq. H.L. analyzed CAGE–seq and Nano-seal data. T. Boettger and N.W. contributed to microarray analysis. S.L., S.O. and M.P. provided transgenic mouse lines. S.K. and H.R. provided the mouse asthma models and N.W. the chronic hypoxia mouse model. Y.Z., X.L., A.A. and U.G. participated in data analysis and discussions and provided advice. T. Braun, X.Y. and F.W. wrote the manuscript.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Funding
Open access funding provided by Max Planck Society.
Data availability
Data have been deposited in public databases. RNA-seq data are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE166816, Pol II pSer5 ChIP–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE166815, CAGE–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE168206, Nano-seal data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202201 and H3K36me3 ChIP–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE201924. Microarray data were deposited at www.ebi.ac.uk/arrayexpress/ under the accession number E-MTAB-10144. Source data are provided with this paper.
Code availability
All scripts used to analyze datasets described in this study are available at https://github.com/loosolab/Wu_et_al_2022_Spurious_transcription/blob/main/README.md (10.5281/zenodo.7248507). Published software was used for all calculations and visualizations. Separate folders are provided, according to the data type. Commands in *base_processing.sh should be performed first, followed by those in *downstream_processing.sh, if necessary. Remaining scripts contain helper functions. Further information is provided in Methods and Reporting summary.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Xiang Li, Mario Looso, Hang Liu.
Contributor Information
Xuejun Yuan, Email: Xuejun.Yuan@mpi-bn.mpg.de.
Thomas Braun, Email: Thomas.Braun@mpi-bn.mpg.de.
Extended data
Extended data is available for this paper at 10.1038/s41588-022-01252-3.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-022-01252-3.
References
- 1.Veloso A, et al. Rate of elongation by RNA polymerase II is associated with specific gene features and epigenetic modifications. Genome Res. 2014;24:896–905. doi: 10.1101/gr.171405.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Jones PA. The DNA methylation paradox. Trends Genet. 1999;15:34–37. doi: 10.1016/S0168-9525(98)01636-9. [DOI] [PubMed] [Google Scholar]
- 3.Neri F, et al. Intragenic DNA methylation prevents spurious transcription initiation. Nature. 2017;543:72–77. doi: 10.1038/nature21373. [DOI] [PubMed] [Google Scholar]
- 4.Wagner EJ, Carpenter PB. Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 2012;13:115–126. doi: 10.1038/nrm3274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Nicolas E, et al. Distinct roles of HDAC complexes in promoter silencing, antisense suppression and DNA damage protection. Nat. Struct. Mol. Biol. 2007;14:372–380. doi: 10.1038/nsmb1239. [DOI] [PubMed] [Google Scholar]
- 6.Kim T, Xu Z, Clauder-Munster S, Steinmetz LM, Buratowski S. Set3 HDAC mediates effects of overlapping noncoding transcription on gene induction kinetics. Cell. 2012;150:1158–1169. doi: 10.1016/j.cell.2012.08.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Ooi SK, Bestor TH. The colorful history of active DNA demethylation. Cell. 2008;133:1145–1148. doi: 10.1016/j.cell.2008.06.009. [DOI] [PubMed] [Google Scholar]
- 8.Shen L, et al. Genome-wide analysis reveals TET- and TDG-dependent 5-methylcytosine oxidation dynamics. Cell. 2013;153:692–706. doi: 10.1016/j.cell.2013.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Melamed P, Yosefzon Y, David C, Tsukerman A, Pnueli L. Tet enzymes, variants, and differential effects on function. Front. Cell Dev. Biol. 2018;6:22. doi: 10.3389/fcell.2018.00022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Dawlaty MM, et al. Combined deficiency of Tet1 and Tet2 causes epigenetic abnormalities but is compatible with postnatal development. Dev. Cell. 2013;24:310–323. doi: 10.1016/j.devcel.2012.12.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Lio CJ, et al. TET methylcytosine oxidases: new insights from a decade of research. J. Biosci. 2020;45:21. doi: 10.1007/s12038-019-9973-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Pfeifer GP, Szabó PE, Song J. Protein interactions at oxidized 5-methylcytosine bases. J. Mol. Biol. 2019;432:1718–1730. doi: 10.1016/j.jmb.2019.07.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Gomez D, Swiatlowska P, Owens GK. Epigenetic control of smooth muscle cell identity and lineage memory. Arterioscler. Thromb. Vasc. Biol. 2015;35:2508–2516. doi: 10.1161/ATVBAHA.115.305044.
- 14.Owens GK, Kumar MS, Wamhoff BR. Molecular regulation of vascular smooth muscle cell differentiation in development and disease. Physiol. Rev. 2004;84:767–801. doi: 10.1152/physrev.00041.2003.
- 15.Wright DB, et al. Phenotype modulation of airway smooth muscle in asthma. Pulm. Pharm. Ther. 2013;26:42–49. doi: 10.1016/j.pupt.2012.08.005.
- 16.Yap HM, Israf DA, Harith HH, Tham CL, Sulaiman MR. Crosstalk between signaling pathways involved in the regulation of airway smooth muscle cell hyperplasia. Front. Pharmacol. 2019;10:1148. doi: 10.3389/fphar.2019.01148.
- 17.Yan F, Gao H, Zhao H, Bhatia M, Zeng Y. Roles of airway smooth muscle dysfunction in chronic obstructive pulmonary disease. J. Transl. Med. 2018;16:262. doi: 10.1186/s12967-018-1635-z.
- 18.Liu R, et al. Ten-eleven translocation-2 (TET2) is a master regulator of smooth muscle cell plasticity. Circulation. 2013;128:2047–2057. doi: 10.1161/CIRCULATIONAHA.113.002887.
- 19.Wendling O, Bornert JM, Chambon P, Metzger D. Efficient temporally-controlled targeted mutagenesis in smooth muscle cells of the adult mouse. Genesis. 2009;47:14–18. doi: 10.1002/dvg.20448.
- 20.Lopez-Moyado IF, et al. Paradoxical association of TET loss of function with genome-wide DNA hypomethylation. Proc. Natl Acad. Sci. USA. 2019;116:16933–16942. doi: 10.1073/pnas.1903059116.
- 21.Fong N, Saldi T, Sheridan RM, Cortazar MA, Bentley DL. RNA Pol II dynamics modulate co-transcriptional chromatin modification, CTD phosphorylation, and transcriptional direction. Mol. Cell. 2017;66:546–557.e3. doi: 10.1016/j.molcel.2017.04.016.
- 22.McDaniel SL, Strahl BD. Shaping the cellular landscape with Set2/SETD2 methylation. Cell. Mol. Life Sci. 2017;74:3317–3334. doi: 10.1007/s00018-017-2517-x.
- 23.Shiraki T, et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl Acad. Sci. USA. 2003;100:15776–15781. doi: 10.1073/pnas.2136655100.
- 24.Kodzius R, et al. CAGE: cap analysis of gene expression. Nat. Methods. 2006;3:211–222. doi: 10.1038/nmeth0306-211.
- 25.Yu M, et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell. 2012;149:1368–1380. doi: 10.1016/j.cell.2012.04.027.
- 26.Ribeiro-Silva JC, Miyakawa AA, Krieger JE. Focal adhesion signaling: vascular smooth muscle cell contractility beyond calcium mechanisms. Clin. Sci. (Lond.) 2021;135:1189–1207. doi: 10.1042/CS20201528.
- 27.Kawasaki T, Kawai T. Toll-like receptor signaling pathways. Front. Immunol. 2014;5:461. doi: 10.3389/fimmu.2014.00461.
- 28.Feil S, et al. Transdifferentiation of vascular smooth muscle cells to macrophage-like cells during atherogenesis. Circ. Res. 2014;115:662–667. doi: 10.1161/CIRCRESAHA.115.304634.
- 29.Boettger T, et al. Acquisition of the contractile phenotype by murine arterial smooth muscle cells depends on the Mir143/145 gene cluster. J. Clin. Invest. 2009;119:2634–2647. doi: 10.1172/JCI38864.
- 30.Whitsett JA, Alenghat T. Respiratory epithelial cells orchestrate pulmonary innate immunity. Nat. Immunol. 2015;16:27–35. doi: 10.1038/ni.3045.
- 31.Sorokin V, et al. Role of vascular smooth muscle cell plasticity and interactions in vessel wall inflammation. Front. Immunol. 2020;11:599415. doi: 10.3389/fimmu.2020.599415.
- 32.Baubec T, et al. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature. 2015;520:243–247. doi: 10.1038/nature14176.
- 33.Xie L, et al. KDM5B regulates embryonic stem cell self-renewal and represses cryptic intragenic transcription. EMBO J. 2011;30:1473–1484. doi: 10.1038/emboj.2011.91.
- 34.Schlee M, Hartmann G. Discriminating self from non-self in nucleic acid sensing. Nat. Rev. Immunol. 2016;16:566–580. doi: 10.1038/nri.2016.78.
- 35.Xie CQ, et al. A comparison of murine smooth muscle cells generated from embryonic versus induced pluripotent stem cells. Stem Cells Dev. 2009;18:741–748. doi: 10.1089/scd.2008.0179.
- 36.Qi H, et al. Epigenetic regulation by Suv4-20h1 in cardiopulmonary progenitor cells is required to prevent pulmonary hypertension and chronic obstructive pulmonary disease. Circulation. 2021;144:1042–1058. doi: 10.1161/CIRCULATIONAHA.120.051680.
- 37.Nowroozilarki N, et al. Anti-inflammatory role of CD11b(+)Ly6G(+) neutrophilic cells in allergic airway inflammation in mice. Immunol. Lett. 2018;204:67–74. doi: 10.1016/j.imlet.2018.10.007.
- 38.van Geffen C, et al. Myeloid-derived suppressor cells dampen airway inflammation through prostaglandin E2 receptor 4. Front. Immunol. 2021;12:695933. doi: 10.3389/fimmu.2021.695933.
- 39.Haczku A, et al. Aspergillus fumigatus-induced allergic airway inflammation alters surfactant homeostasis and lung function in BALB/c mice. Am. J. Respir. Cell Mol. Biol. 2001;25:45–50. doi: 10.1165/ajrcmb.25.1.4391.
- 40.Han H, Ziegler SF. Bronchoalveolar lavage and lung tissue digestion. Bio Protoc. 2013;3:e859. doi: 10.21769/BioProtoc.859.
- 41.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 42.Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 43.Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–930. doi: 10.1093/bioinformatics/btt656.
- 44.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 45.Dahl JA, Collas P. A rapid micro chromatin immunoprecipitation assay (microChIP). Nat. Protoc. 2008;3:1032–1045. doi: 10.1038/nprot.2008.68.
- 46.Nelson JD, Denisenko O, Sova P, Bomsztyk K. Fast chromatin immunoprecipitation assay. Nucleic Acids Res. 2006;34:e2. doi: 10.1093/nar/gnj004.
- 47.Davis MP, van Dongen S, Abreu-Goodger C, Bartonicek N, Enright AJ. Kraken: a set of tools for quality control and analysis of high-throughput sequence data. Methods. 2013;63:41–49. doi: 10.1016/j.ymeth.2013.06.027.
- 48.Zhang Y, et al. Model-based analysis of ChIP–Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 49.Robinson JT, et al. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
- 50.Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–W191. doi: 10.1093/nar/gku365.
- 51.Zerbino DR, Johnson N, Juettemann T, Wilder SP, Flicek P. WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis. Bioinformatics. 2014;30:1008–1009. doi: 10.1093/bioinformatics/btt737.
- 52.Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. doi: 10.1186/gb-2010-11-10-r106.
- 53.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 54.Takahashi H, Lassmann T, Murata M, Carninci P. 5′ end-centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat. Protoc. 2012;7:542–561. doi: 10.1038/nprot.2012.005.
- 55.Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
- 56.Wang L, et al. BMP9 and BMP10 act directly on vascular smooth muscle cells for generation and maintenance of the contractile state. Circulation. 2021;143:1394–1410. doi: 10.1161/CIRCULATIONAHA.120.047375.
Data Availability Statement
Data have been deposited in public databases. RNA-seq data are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE166816, Pol II pSer5 ChIP–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE166815, CAGE–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE168206, Nano-seal data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202201 and H3K36me3 ChIP–seq data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE201924. Microarray data were deposited at www.ebi.ac.uk/arrayexpress/ under the accession number E-MTAB-10144. Source data are provided with this paper.
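As a convenience, the GEO accessions listed above can be collected in a small lookup table. The following Python sketch is only illustrative and is not part of the published scripts; the names `GEO_DATASETS`, `geo_url` and `dataset_urls` are hypothetical, and only the accession numbers and the URL pattern are taken from the statement above.

```python
# Illustrative sketch: map each deposited dataset to its GEO record URL.
# Accessions and URL pattern come from the Data Availability Statement;
# all names defined here are hypothetical helpers, not the authors' code.

GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}"

GEO_DATASETS = {
    "RNA-seq": "GSE166816",
    "Pol II pSer5 ChIP-seq": "GSE166815",
    "CAGE-seq": "GSE168206",
    "Nano-seal": "GSE202201",
    "H3K36me3 ChIP-seq": "GSE201924",
}


def geo_url(accession: str) -> str:
    """Return the GEO query URL for a single accession."""
    return GEO_QUERY_URL.format(accession=accession)


def dataset_urls() -> dict:
    """Return a mapping from data type to its GEO record URL."""
    return {name: geo_url(acc) for name, acc in GEO_DATASETS.items()}


if __name__ == "__main__":
    for name, url in dataset_urls().items():
        print(f"{name}: {url}")
```

The microarray data (E-MTAB-10144) are deposited at ArrayExpress rather than GEO and are therefore not included in this table.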
All scripts used to analyze the datasets described in this study are available at https://github.com/loosolab/Wu_et_al_2022_Spurious_transcription/blob/main/README.md (DOI: 10.5281/zenodo.7248507). Published software was used for all calculations and visualizations. Scripts are organized in separate folders according to data type. Commands in *base_processing.sh should be run first, followed by those in *downstream_processing.sh, if necessary. The remaining scripts contain helper functions. Further information is provided in Methods and the Reporting summary.
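The execution order described above can be sketched as a small helper that sorts the scripts of one data-type folder into stages. This is a minimal sketch under stated assumptions, not the authors' tooling: the function `order_scripts` and the example file names are hypothetical, and only the two file-name patterns are taken from the statement above.

```python
# Illustrative sketch: order the scripts in one folder so that files matching
# *base_processing.sh come before files matching *downstream_processing.sh.
# Files matching neither pattern (helper scripts) are left out of the run order.
from fnmatch import fnmatchcase

# Stage patterns in the order given in the code availability statement.
STAGE_PATTERNS = ["*base_processing.sh", "*downstream_processing.sh"]


def order_scripts(filenames):
    """Return the stage scripts among `filenames`, in execution order."""
    ordered = []
    for pattern in STAGE_PATTERNS:
        # Sort within a stage only to make the result deterministic.
        ordered.extend(sorted(f for f in filenames if fnmatchcase(f, pattern)))
    return ordered


if __name__ == "__main__":
    # Hypothetical folder listing; the real file names may differ.
    listing = [
        "helpers.R",
        "rnaseq_downstream_processing.sh",
        "rnaseq_base_processing.sh",
    ]
    for script in order_scripts(listing):
        print(script)
```

The helper only determines the order; running the commands still requires the software and input data described in Methods.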