Abstract
RNA polymerase (Pol) I, II, and III are most commonly described as having distinct roles in synthesizing ribosomal RNA (rRNA), messenger RNA (mRNA), and specific small noncoding (nc)RNAs, respectively. This delineation of transcriptional responsibilities is not definitive, however, as evidenced by instances of Pol II recruitment to genes conventionally transcribed by Pol III, including the co-transcription of RPPH1 - the catalytic RNA component of RNase P. A comprehensive understanding of the interplay between RNA polymerase complexes remains lacking, however, due to limited comparative analyses for all three enzymes. To address this gap, we applied a uniform framework for quantifying global Pol I, II, and III occupancies that integrates currently available human RNA polymerase ChIP-seq datasets. Occupancy maps are combined with a comprehensive multi-class promoter set that includes protein-coding genes, noncoding genes, and repetitive elements. While our genomic survey appropriately identifies recruitment of Pol I, II, and III to canonical target genes, we unexpectedly discover widespread recruitment of the Pol III machinery to promoters of specific protein-coding genes, supported by colocalization patterns observed for several Pol III-specific subunits. We show that Pol III-occupied Pol II promoters are enriched for small, nascent RNA reads terminating in a run of 4 Ts, a unique hallmark of Pol III transcription termination and evidence of active Pol III activity at these sites. Pol III disruption differentially modulates the expression of Pol III-occupied coding genes, which are functionally enriched for ribosomal proteins and genes broadly linked to unfavorable outcomes in cancer. Our map also identifies additional, currently unannotated genomic elements occupied by Pol III with clear signatures of nascent RNA species that are sensitive to disruption of La (SSB) - a Pol III-related RNA chaperone protein. These findings reshape our current understanding of the interplay between Pols II and III and identify potentially novel small ncRNAs with broad implications for gene regulatory paradigms and RNA biology.
Introduction
RNA polymerase (Pol) I, II, and III are recruited by distinct transcription factor (TF) repertoires that functionally subdivide the recruitment and ensuing transcription potential of each enzyme. However, the distinctions between these TF repertoires and the transcriptional machineries themselves are blurred by the fact that multiple subunits and individual regulatory factors are shared by two or more polymerases1–3. Taken further, the simple paradigm of transcriptional division itself is challenged by examples of Pol II overlap reported at both Pol I and Pol III-transcribed genes4–6, including examples of shared transcription7. The full extent of RNA polymerase overlap and shared transcription remains largely unexplored, nevertheless, due to limited genomic studies of Pol I and Pol III in humans.
Pol III is unique from other polymerases in multiple respects, including that it exclusively transcribes genes encoding small noncoding RNA (ncRNA), many of which are critical for cell growth and proliferation8–10. Examples include tRNA and 5S ribosomal RNA (rRNA) - both integral components of translation - as well as H1 RNA, the catalytic RNA component of RNase P7. The gene encoding H1, RPPH1, is transcribed by both Pol III and Pol II, exemplifying scenarios of competent recruitment and transcription by multiple RNA polymerase machineries11.
Recently, we explored the global occupancy of human Pol III by integrating ChIP-seq experiments for a large subset of Pol III subunits12. Defining the binding patterns of many Pol III subunits increases confidence of complex, rather than subunit-specific, chromatin signatures. We note that these signatures unexpectedly included numerous overlapping coordinates beyond the canonical annotation of the Pol III “transcriptome”, defined on the basis of previous biochemical and genomic mapping experiments12.
Here, we reexamine the canonical and noncanonical binding preferences of human Pol III by integrating multiple genomic approaches with comprehensive coding and noncoding gene annotations. We apply a uniform, context-agnostic framework for scoring Pol I, II, and III occupancies by integrating >200 ChIP-seq datasets, facilitating unbiased annotation of polymerase occupancy and overlap. While this approach recovers canonical recruitment and transcription patterns, we unexpectedly observe high Pol III scores at specific protein-coding gene promoters. These results inspired deeper consideration of the transcriptional competency and consequences of Pol III occupancy at these sites.
Pol III transcription termination relies on complementary binding between nascent RNA and a relatively short repeat of 4 to 6 Thymidine (T) sequences on the non-template DNA strand13. Thereafter, the terminal oligo(U) tract in the corresponding nascent RNA is recognized by SSB (La), an RNA chaperone protein important for Pol III-derived RNA stability and processing14. The Pol III termination signature is unique from Pol I, which relies on a relatively long T-tract15, as well as Pol II termination signals following cleavage and polyadenylation of mRNA16. We therefore developed a complementary framework for scoring global patterns of single nucleotide 3’-end pileup at minimal sequences of 4 Ts (referred to here as “T4”) as a molecular breadcrumb of Pol III transcription. These data, when coupled with Pol III occupancy patterns, provide compelling evidence of active Pol III transcription at a multitude of protein-coding gene promoters. Moreover, the Pol II-derived mRNA products from genes bound by Pol III are notably sensitive to Pol III disruption, altogether establishing a novel link between Pol III recruitment, transcription, and important downstream consequences on protein-coding genes.
Results
Global interrogation of RNA polymerase I, II, III occupancy
We first retrieved available ChIP-seq data corresponding to complex-specific Pol I, II, and III subunits. Previous mapping experiments for human Pol I are exceptionally limited, with only 2 datasets reported for binding of the large sub-units, POLR1A and POLR1B17,18. Pol II datasets are significantly more abundant, with 153 POLR2A ChIP-seq experiments available for the large subunit of Pol II. ChIP-seq experiments for Pol III, on the other hand, include a multitude of Pol III-specific subunits, including POLR3A, -B, -C, -D, -E, and -G (Figure 1a, Supplemental Table 1). The broader profile of Pol III subunit occupancy is meaningful for more confidently establishing de facto recruitment of the Pol III machinery.
Figure 1. Genome-wide survey of human RNA polymerase binding patterns.

(a) Illustrative overview of Pol I, II, and III complexes, coloring and labels correspond to individual subunits mapped by ChIP-seq. (b) Application of a uniform scoring framework (left) for composite Pol I, II, and III ChIP-seq signals. Pol occupancy scores are defined globally across 50 bp bins by assigning adjusted Poisson probabilities determined against maximum, bin-specific lambda scores. Global Pol scoring recovers canonical polymerase-gene dominance at rRNA (Pol I), snoRNA (Pol II), and tRNA (Pol III). (c) Specific examples of Pol I dominance at ribosomal RNA genes (RNA45SN3, left), Pol II dominance at snoRNA genes (SNORD118, center), and Pol III dominance at tRNA genes (tRNA-Thr, right). (d) Coding, noncoding, and repeat annotations were integrated into a multi-class promoter set, defined as a window 50 bp upstream to 150 downstream of transcription start sites, for global scoring of polymerase occupancies.
Pol II and III ChIP-seq data were compiled by unique cellular context for each individual subunit and uniformly scaled to the minimum number of subunit-specific sample reads. Summary profiles were then generated by combining all processed data and subsequently scaling to 100 million total reads, thereby producing a profile weighed equally by each biological context. Finally, a median Pol III summary profile was calculated across all individual subunits and rescaled to 100 million reads. Due to the limited experimental data for Pol I, POLR1A and POLR1B datasets were simply aggregated and analogously scaled to 100 million reads.
We next applied a peak calling framework for establishing global polymerase-specific scores at 50 bp resolution. Briefly, a Poisson probability was calculated for every 50 bp bin by comparing the observed signal to a local expectation - lambda - defined as the maximum polymerase signal at 5 Kb, 10 Kb, or genome-wide, a stringent approach that accounts for local chromatin biases specific to each genomic window19. We then compared Pol I, II, and III scores at canonical gene annotations, including rRNA, snoRNA, and tRNA. As expected, our uniform scoring framework successfully recovers Pol I-specific occupancy of rRNA, Pol II-specific occupancy of snoRNA, and Pol III-specific occupancy of tRNA (Figure 1b, specific examples shown in Figure 1c).
To effectively visualize variable levels of polymerase occupancy at these and other gene categories, we applied a center-of-mass equation to define the coordinate position of each specific gene, thereby establishing a tool for simultaneous three-way comparison of Pol I, II, and III “dominance”. Genes with strong signal for only a single polymerase will be positioned close to its respective enzyme (e.g., tRNA-Pol III, Figure 1b), whereas genes with strong signal for two machineries will be positioned closer to the center of a given axis. Using this approach, specific Pol III-transcribed tRNA genes include notable Pol II signal, consistent with previous reports, yet specific Pol II-transcribed snoRNA genes are also pulled, to some degree, from Pol II dominance towards Pol III (e.g., Figure 1b, arrows). We therefore expanded our survey and more closely examined significant Pol III binding events at both canonical and noncanonical genes.
To systematically annotate Pol I, II, and III patterns while also addressing the underlying challenges of variable gene annotations sets, we integrated coding and noncoding genes by combining Ensembl transcript annotations (GENCODE V43) with the RNAcentral noncoding RNA sequence database (v22.0). Gene promoter windows were uniformly defined as a 200 bp window, including 50 bp upstream and 150 bp downstream of each individual transcription start site, harmonized across instances of varied annotation. Additionally, intervals corresponding to repeat sequences annotated by RepeatMasker were incorporated into our curated, multi-class promoter resource (Figure 1d). We thereafter assigned the maximum score for each polymerase over a given genomic interval and identified protein-coding, noncoding, and repeat intervals with significant polymerase occupancy (adjusted p-value < 0.01).
Pol III transcription and termination at canonical genes
Beyond tRNA and 5S rRNA, the Pol III transcriptome includes U6 and U6atac spliceosomal RNA, 7SK small nuclear (sn)RNA, 7SL signal recognition particle (SRP) RNA and its evolutionary derivatives - Alu, BC200, and snaR - as well as Y RNA, RNase P and RNase MRP RNA (RMRP, H1), Vault RNA, and nc8668,10,20. Recruitment of Pol III to each gene type is directed through either gene-internal promoter elements (e.g. 5S rRNA, tRNA) or upstream regulatory sequences that are recognized by specific transcription factor repertoires21. Consistent with this established annotation of the Pol III-transcriptome, we observe strong Pol III occupancy scores across most gene subclasses for Pol III (Figure 2a). However, whereas Pol III is the dominant polymerase at nearly all canonical genes, our framework recovers shared dominance of RPPH1 by Pol II and Pol III as previously described (Supplemental Figure 1)7.
Figure 2. Discovery of RNA polymerase III occupancy and transcription termination signatures at noncanonical loci, including protein-coding gene promoters.
(a) Analysis of polymerase signal dominance at specific subclasses of Pol III-transcribed genes, including 5S rRNA, 7SK snRNA, RMRP, Y RNA, Vault, and nc886. (b) Illustrative overview of Pol III termination, which occurs at a run of 4–6 Thymidine (T) sequences. T4 scoring of 3’-single nucleotide pileup in nascent RNA is used as a molecular readout of Pol III transcription termination. (c) T4 score enrichment for specific Pol III-transcribed gene subclasses, with respect to all sites with significant Pol III occupancy (padj < .05). (d) Pol I, II, and III signal enrichment at gene promoters, ranked by maximum T4 scores present within 350 bp of a given transcription start site. (e) RNA polymerase dominance plot for small noncoding RNA genes not previously defined as Pol III-transcribed but characterized by significant Pol III occupancy in our metamap. (f) Analogous dominance plot for protein-coding genes with significant Pol III occupancy. (g-h) Observations and corresponding empirical null distributions for T4 scores at noncanonical small RNA genes (related to Figure 2e) and protein-coding gene promoters (related to Figure 2f). (i) Visualization of Pol I, II, IIII occupancies at RNU11 - a small ncRNA gene canonically expressed by Pol II - and multiple protein-coding genes, including SRSF7, RPL5, and DPY19L4. All genes are defined as having significant Pol III occupancy and T4 score enrichment. Significant T4 score peaks are highlighted.
To further query Pol III activity signatures at canonical genes, we generated a complementary composite profile of precision nuclear run-on RNA-sequencing (PRO-seq), a method that isolates nascent RNA to measure transcription rather than steady-state RNA levels (Supplemental Table 2)22. Based on the unique termination mechanism of Pol III at 4–6 Ts, we considered that single-nucleotide read enrichment mapping to the terminal 3’ nucleotide of nascent RNA over 4 or more contiguous Ts would be a strong indicator of Pol III transcription termination (Figure 2b). We therefore applied a similar peak scoring framework to 3’-read pileup instances at T4 sequences, deriving a local lambda from the maximum signal intensity at either 20 or 50 bp surrounding each T4. Gene-specific T4 scores were then calculated by retrieving the maximum gene-internal T4 score within 350 bp of a transcription start site. Using this approach, we find that most Pol III-transcribed genes are significantly enriched for strong T4 signatures (Figure 2c). Interestingly, T4 enrichment patterns are notably lower for tRNA genes compared to other Pol III-transcribed gene classes, likely driven by rapid cleavage events of nascent tRNA 3’ trailer sequences23. Globally, we find that Pol III is highly enriched at genes ranked by maximum T4 score, in contrast to Pol I and Pol II (Figure 2d). These data indicate a clear association of nascent T4 enrichment patterns and Pol III activity, both at canonical target genes and potentially unexpected loci. We therefore turned our attention to the broader patterns of Pol III occupancy and transcription towards better understanding the global Pol III landscape.
Pol III occupancy and transcription signatures at noncanonical small RNA and protein-coding genes
We first re-visited our RNA polymerase scoring framework, specifically searching for sites with significant Pol III levels beyond the canonical Pol III transcriptome. For all protein-coding transcription start sites, genomic intervals within 500 bp of an established Pol III-transcribed gene or a similar small ncRNA subclass were masked to avoid potential distance-related artifacts. Altogether, this approach identifies 58 noncoding gene promoters and, surprisingly, 1,572 protein-coding gene promoters with significant Pol III signal. Whereas most noncanonical ncRNA genes are dominated by Pol II (Figure 2e), we unexpectedly observe a continuum of Pol II-Pol III dominance at protein-coding gene promoters, with examples of Pol III dominance over Pol II (Figure 2f).
By surveying the T4 termination signatures at noncanonical Pol III loci, we further uncover significant T4 score enrichment within Pol III-occupied small ncRNA and protein-coding genes, strongly indicative of Pol III transcription at these loci (Figure 2g–h). Accordantly, single-nucleotide T4 pileup scores are enriched in Pol III-occupied genes, most significantly within the first ~ 100 bp, suggesting that Pol III activity is restricted to a short genomic interval, likely constrained by the frequency of T4 sequence events (Supplemental Figure 2). Visualization of individual noncanonical loci, such as RNU11 - a Pol II-derived snRNA component of the minor spliceosome - and protein-coding genes such as SRSF7, RPL5, and DPY19L4, illustrates this general trend of Pol III occupancy and sharp T4 pileups a short distance from transcription start sites (Figure 2i).
In addition to T4 score enrichment, we considered whether specific transcription factors (TFs) were enriched at Pol III-occupied protein-coding genes. TF-binding patterns were retrieved from ChIP-atlas24 and corresponding enrichment patterns determined by comparison against a randomization-based null model. Examination of TFs with established roles in Pol III recruitment reveals notable enrichment for BRF1, a TFIIIB subcomponent involved in recruiting Pol III to type 1 and type 2 promoters21, and SNAPC4, a SNAP-complex protein involved in recruiting Pol III to type 3 promoters (Supplemental Figure 3)21. We therefore similarly applied our uniform framework for scoring composite ChIP-seq profiles to Pol III-related transcription factors, including TFIIIB subunits (BRF1, TBP), SNAPC subunits (SNAPC1, −2, −4, −5), and TFIIIC subunits (GTF3C-1, −2, −5). Consistent with the overlap enrichment results, this method uncovers a strong correlation between Pol III and BRF1 scores at protein-coding gene promoters, with comparatively lower correlation scores observed for TBP, SNAPC4, and GTF3C5, and minimal scores for other TFIIIC and SNAPC subunits (Supplemental Figure 3). Examination of individual gene promoters further supports these trends, with evidence of BRF1 occupancy at specific protein-coding genes, such as RPS7 (ribosomal protein S7), which features significant T4 scores indicative of Pol III termination.
Pol III occupied protein-coding genes are differentially sensitive to Pol III depletion and inhibition
The discovery of Pol III occupancy and transcription termination signatures led us to investigate the effect of Pol III disruption on these and other protein-coding gene subpopulations. Knockdown of POLR3A - the gene encoding Pol III large subunit RPC1 - is sufficient to significantly reduce Pol III-derived gene products, including multiple tRNAs (Figure 3a–b). We show that the consequence of POLR3A disruption extends far beyond canonical Pol III products, resulting in upregulation and downregulation of several thousand protein-coding genes (Figure 3c). Analysis of mRNA dynamics corresponding to genes occupied by Pol III, or occupied by Pol III with evidence of transcription termination, shows that reduction of Pol III generally reduces, rather than increases, the expression of protein-coding genes occupied by Pol III compared with genes lacking Pol III occupancy (Figure 3d).
Figure 3. Pol III-occupied protein-coding gene promoters are sensitive to Pol III depletion and inhibition.
(a) Immunoblot analysis POLR3A (RPC1) and beta-Tubulin protein levels in control (scramble) and POLR3A siRNA conditions in HEK293T cells. (b) RT-qPCR analysis of the relative change in Pol III-transcribed tRNA levels following siRNA-mediated reduction in HEK293T. (c) Differential expression profile for protein coding genes, including the number of up- and down-regulated genes (FDR < 0.05 and abs(Log2FC > 1) following disruption of POLR3A. (d) Distributions of significance-weighted fold change scores for protein-coding genes without Pol III, with Pol III, or with both Pol III and significant T4 scores. (e) Differential expression profile for protein coding genes, including the number of up- and down-regulated genes (FDR < 0.05 and abs(Log2FC > 1) following Pol III inhibition with small molecule ML-60218. (f) Distributions of significance-weighted fold change scores for protein-coding genes without Pol III, with Pol III, or with both significant Pol III and T4 scores, following Pol III inhibition with small molecule ML-60218. (g-h) Specific example of shared Pol II and Pol III occupancy at a protein-coding gene NRAS (g) which is highly sensitive to Pol III disruption (h), either through depletion of POLR3A or through inhibition of Pol III with ML-60218. (i-j) Additional example of Pol III occupancy (i) and and Pol III sensitivity (j) at protein-coding gene AMD1. (k) Total number of shared differentially down-(blue) and upregulated genes (red) following si-POLR3A or ML-60218 (padj < 0.01). (l) Overlap enrichment of Pol III occupancy at down-, up-, and nondifferential (N.D.) gene sets
As a complementary approach, we disrupted Pol III function with ML-60218 - a small molecule Pol III-specific inhibitor that we show drives upregulation and down-regulation of thousands of genes within an acute, 4-hour exposure (Figure 3e)25. As a distribution, Pol III-occupied protein-coding genes generally decrease in response to ML-60218, consistent with the biased effect of POLR3A depletion (Figure 3f).
In support of this trend, examination of specific Pol III-occupied genes illustrates examples of Pol III-sensitive protein-coding genes. For example, NRAS - a proto-oncogene encoding a Ras-family GTPase implicated in the progression of several forms of cancer26 - is included among the list of si-POLR3A and ML-60218 sensitive genes, with strong evidence of Pol III occupancy and signatures consistent with Pol III termination (Figure 3g–h). Similarly, AMD1, which encodes an adenosylmethionine decarboxylase implicated in multiple forms of cancer progression27–29, is characterized by strong evidence of Pol III occupancy and termination, and sensitive to si-POLR3A and ML-60218 exposure (Figure 3i–j).
We next considered whether significant Pol III-binding events, defined by our global occupancy map, were overrepresented at Pol III-sensitive genes. We reasoned that genes determined to be significantly differential in both si-POLR3A and ML-60218 experiments are most likely to represent true, primary Pol III-sensitive features. Altogether, 695 genes are significantly downregulated in both conditions, and 286 significantly upregulated (Figure 3k; adjusted p-value < 0.01 in both conditions). Integration of these Pol III-sensitive gene sets with Pol III occupancy signatures uncovers significant over-representation of Pol III-binding, specifically at genes downregulated in both si-POLR3A and ML-60218 experiments (Figure 3l). Although Pol III-occupied loci continue to possess higher T4 scores independent of a particular gene group, the enrichment of T4 scores at genes that are downregulated by si-POLR3A and ML-60218 does not reach statistical significance (Supplemental Figure 4). Nevertheless, the enrichment of Pol III occupancy at downregulated genes, which contrasts that of both upregulated and nondifferential gene sets, further supports a model in which Pol III recruitment and transcription may functionally enhance Pol II-mediated production of full-length mRNAs from these loci.
Pol III-sensitive and Pol III-occupied protein-coding genes are linked with Pol III correlation signatures across cancer
Building on our observations of Pol III occupancy and Pol III sensitivity at protein-coding genes, we further explored the relationship between Pol III and gene expression signatures captured by The Cancer Genome Atlas (TCGA) across thousands of diverse primary tumors and cancers30. We specifically examined gene expression correlates for Pol III-specific subunits POLR3A, POLR3B, POLR3C, POLR3D, POLR3E, POLR3F, POLR3G, POLR3GL, POLR3H, and POLR3K with more than 20,000 genes (Figure 4a). Though RNA levels are an imperfect representation of Pol III machinery availability in any particular context, the expansive breadth of expression signatures across thousands of samples offers a reasonably confident indication, collectively, of the variable Pol III levels for the purpose of linking specific genes to Pol III. Using this approach, we identify distinct subpopulations of protein-coding genes that are strongly associated, either positively or negatively, with the expression of most Pol III-specific subunits (Figure 4a).
Figure 4. Expression of Pol III-sensitive and Pol III-occupied protein-coding genes is broadly linked with Pol III subunit expression and unfavorable outcomes in cancer.
(a) Heatmap visualization of gene expression correlation scores for > 20,000 genes, ranked by the median correlation with Pol III-specific subunits. (b-c) Analysis of subunit-specific correlation features for genes downregulated in response to both si-POLR3A and ML-60218 experiments (b), and genes that are non-differential in both experiments (c). Barplots illustrate the strength and direction of correlation between a given gene set and genes encoding individual Pol III subunits, benchmarked against empirical null distributions. A summary z-score, calculated using Stouffer’s method, is represented at the top. (d-e) Analogous comparison of subunit-specific correlation features for Pol III occupied (d) and Pol III unoccupied (e) protein-coding genes. (f) Distributions of pan-cancer survival statistics related to the expression of genes that are nondifferential (N.D.), downregulated by both si-POLR3A and ML-60218 (downreg.), or upregulated in both conditions (upreg.). Survival statistics were retrieved from tcga-survival.com. (g) Distributions of pan-cancer survival statistics related to the expression of genes that Pol III unoccupied (− Pol III), Pol III occupied (+ Pol III), or Pol III occupied with T4 signatures (+ T4, Pol III). (h-i) Distributions of pan-cancer survival statistics related to the expression of genes as a function of its median correlation with Pol III-specific subunits (related to a) for Pol III-sensitive (h) or Pol III-occupied (i) genes. (j) Gene ontology (GO) enrichment analysis for cellular component (CC) terms significant (FDR < 0.05) in both Pol III-sensitive and Pol III-occupied gene sets. (k) Enriched biological process (BP) GO terms for Pol III-sensitive (top) and Pol III-occupied (bottom) gene sets.
To determine whether Pol III-sensitive and Pol III-occupied genes are significantly linked in the multi-cancer gene expression map, we compared the observed correlation scores for Pol III subunits individually using a randomization-based null model. Revisiting the 695 si-POLR3A and ML-60218 sensitive genes, we uncover a strong positive association with the expression of most Pol III-specific subunits, whereas si-POLR3A and ML-60218 insensitive genes are largely unrelated (Figure 4b–c). Similarly, we identify an even stronger positive relationship between Pol III subunit expression with protein-coding genes occupied by Pol III in our survey, in clear contrast to Pol III unoccupied loci (Figure 4d–e). These data provide further evidence that Pol III occupancy likely contributes to the expression landscape of specific protein-coding genes.
Pol III-sensitive and Pol III-occupied protein-coding genes are functionally enriched for ribosomal proteins and genes linked to unfavorable outcomes in cancer
Pol III overactivity, which is ostensibly associated with its expansion to noncanonical loci such as protein-coding genes, has been described as a hallmark of cancer31–33. We therefore considered whether Pol III-sensitive, Pol III-occupied, and/or Pol III-correlated gene features are relevant to cancer outcomes in patients. We specifically considered the relationship between gene expression and clinical outcomes matched to specific primary tumors, as previously reported (tcga-survival.com34). Using this approach, we find that the distribution of genes downregulated by both si-POLR3A and ML-60218 have significantly higher z-scores, which reflect an indication of unfavorable outcomes among cancer patients, compared to si-POLR3A and ML-60218 insensitive genes (Figure 4f). Notably, this pattern is not observed for genes that are upregulated following Pol III disruption. To a lesser extent, we find that the distribution of Pol III-occupied protein-coding genes are also characterized as having significantly higher z-scores compared to Pol III-unoccupied genes (Figure 4g). However, further subdivideing Pol III-sensitive (downregulated) and Pol III-occupied protein-coding genes by their observed correlation patterns across cancer reveals a remarkable link between positively-correlated genes and negative patient outcomes (Figure 4h–i). These data give rise to a model in which Pol III recruitment and enhanced Pol II transcription of specific protein-coding genes may promote cellular growth and, consequently, disease progression.
To further explore whether specific gene functions are enriched in Pol III-related protein-coding genes, we performed gene ontology (GO) enrichment analysis on both Pol III-occupied and Pol III-sensitive gene sets. Both sets share significant enrichment for genes encoding ribosomal subunits, with Pol III-occupied genes most strongly enriched for these proteins (Figure 4j). While this is further reflected in Pol III-occupied genes being enriched for “cytoplasmic translation” and “ribosome biogenesis”, Pol III-sensitive genes are instead primarily enriched for mitochondrial related metabolic processes, including “oxidative phosphorylation” and “ATP synthesis coupled electron transport” (Figure 4k). These results suggest that Pol III-sensitive and Pol III-occupied genes are closely related to biosynthesis and energy metabolism directly linked to cellular growth.
Pol III recruitment and transcription at protein-coding genes does not produce stable small RNA species
Given the co-association of Pol III occupancy and T4 termination signatures at specific protein-coding genes, we naturally questioned whether Pol III activity at these sites produce stable small RNA species, hypothetically giving rise to a nested Pol III transcriptome derived from these regions. We therefore profiled small RNA abundance linked to Pol III-occupied protein-coding genes by integrating all human multi-tissue small RNA-seq data generated by the ENCODE project35 (Supplemental Table 3). Pol III-occupied protein-coding gene promoters are indeed characterized by significantly higher small RNA signatures, particularly for promoters with Pol III-binding and T4 signatures (Figure 5a). We find that this trend is further supported by small RNA-seq in HEK293T cells (Figure 5b). However, the distribution of small RNA abundance at these loci is minimal in comparison to canonical Pol III-transcribed genes (i.e. tRNA, Y RNA, vault RNA, etc.), suggesting that Pol III activity at these sites does not produce similarly stable ncRNA species (Figure 5c–d).
Figure 5. Depletion of RNA chaperone protein La (SSB) rules out the production of stable, promoter-derived small RNAs, but otherwise reveals multiple La-sensitive ncRNA species produced by RNA polymerase III at currently unannotated loci.
(a) Distributions of multi-tissue small RNA levels mapped to Pol III-unoccupied (− Pol III), Pol III-occupied (+ Pol III), or Pol III-occupied protein-coding gene promoters with significant T4 enrichment (+T4, Pol III). (b) Analogous distributions of small RNA abundance linked to specific protein-coding promoters in HEK293T cells, which are sensitive to si-POLR3A and ML-60218 treatments. (c) Distributions of multi-tissue small RNA levels mapped to Pol III occupied protein-coding genes, compared to canonical Pol III-transcribed gene classes. (d) Analogous distributions of small RNA levels in HEK293T cells. (e) Immunoblot and quantification of RNA chaperone protein La following siRNA depletion in HEK293T cells. (f) Significance-weighted (SW) log2(fold change) of 45S pre-rRNAs, snoRNAs, and tRNAs following depletion of La (left), and of small RNAs mapped to protein-coding gene promoters of varying Pol III occupancy signatures. (g) Model: recruitment of Pol III to protein-coding genes results in transcription that terminates at T4 sequences, typically a short distance from the transcription start site, in contrast to processive transcription by Pol II. Pol III activity functionally enhances Pol II, but does not generate stable, promoter-derived small RNAs. (h) Heatmap visualization of La sensitivity for small RNAs from currently unannotated or lncRNA-proximal loci with significant Pol III signatures. (i) Examples of La-sensitive intervals with genomic indicators of Pol III activity. (j) Candidate sequence elements, related to 5i, on the basis of La-sensitivity, Pol III localization, and small RNA reads mapping to individual regions. Putative A-box, B-box, and termination sequence elements are highlighted.
Nevertheless, we further considered whether small RNA produced at specific protein-coding genes might depend on La (SSB), an RNA chaperone protein that plays critical roles in the stability, maturation, and transport of nascent Pol III-derived ncRNAs36. As an RNA-binding protein, La specifically recognizes the 3’-terminal oligo(U) tract that is universally present on these nascent ncRNAs, a feature driven by the termination mechanism of Pol III13. We reasoned that the presence of T4 termination and resulting 3’-oligo(U) tracts on nascent, promoter-derived small RNAs might similarly depend on La for RNA stability. To this end, we find that differential small RNA-seq analysis following La depletion confirms sensitivity of tRNAs to La disruption, in contrast to Pol I-derived 45S pre-rRNA and Pol II-derived snoRNAs (Figure 5e–f). However, examination of differential small RNA levels mapped to Pol III-occupied promoter windows does not indicate a similar pattern of dependency on La (Figure 5f). The minimal abundance and lack of La-sensitivity suggest that Pol III activity at protein-coding genes does not give rise to additional functional ncRNA species originating from these intervals. Instead, these results support a model in which Pol III recruitment and its transcription of protein-coding promoter sequences, constrained by T4 termination signals, is generally and primarily limited to enhancing the production of specific mRNA subpopulations (Figure 5g).
Disruption of RNA chaperone protein La (SSB) uncovers potentially novel La-dependent Pol III-derived ncRNAs in humans
In addition to linking Pol III activity to protein-coding and other noncanonical genes, we finally considered whether our genomic atlas could successfully identify intervals in which Pol III transcription occurs beyond well-established coding, noncoding, and repeat sequences. We specifically queried genomic bins with significant Pol III signal and evidence of La-sensitivity, such that small RNA reads mapping to a given locus are significantly downregulated following La depletion. Out of 1,370 sites with Pol III occupancy that do not intersect any gene feature, or which otherwise intersect relatively uncharacterized lncRNAs37, we identify just a small number of regions with evidence of La-sensitivity (Figure 5h). Among the most compelling examples of putatively novel Pol III-derived ncRNAs, we show that strong Pol III occupancy and La-sensitivity converge on genomic windows proximal to uncharacterized lncRNAs on chromosomes 1 and 11, and to an unannotated locus within the protein-coding gene C1QTNF6 on chromosome 22 (Figure 5i). Evidence of Pol III activity at these loci is further supported by significant BRF1 signals present across all three intervals, consistent with the canonical mechanisms of Pol III recruitment. In addition, significant T4 termination signatures are present at two of these sites, strongly supporting Pol III-driven production of nascent RNA (Figure 5i).
We infer the putative sequences of these ncRNAs (preliminarily named nc96, nc111, and nc142 on the basis of sequence length), by relying on the multi-context small RNA profiles generated across numerous human tissues and T4 termination signals (Figure 5j). Upon close examination of each putative ncRNA, we identify intervals with strong sequence similarity to both A-box and B-box gene-internal regulatory elements critical for TFIIIC recruitment to genes with type 2 promoter architectures38. Indeed, TFIIIC subunit GTF3C5 - a component of the DNA-binding TFIIIC2 subcomplex - is mapped to all three windows, consistent with established mechanisms of Pol III recruitment (Supplemental Figure 5)39. The confluence of sequence, recruitment, and post-transcriptional regulatory signatures, together, strongly support the classification of nc96, nc111, and nc142 as novel Pol III-derived ncRNAs for future investigation.
Discussion
To date, limited efforts have been undertaken to understand the full extent of Pol III localization or the implications of Pol III activity beyond the boundaries of the currently established Pol III transcriptome. Our broad exploration of RNA polymerase occupancies, made possible by the 100s of ChIP-seq and RNA-seq datasets generated across human tissues and cell lines35, provides strong evidence of Pol III recruitment and transcription at specific protein-coding genes. These findings are supported by both Pol III occupancy and nascent RNA-seq analyses tailored to the unique termination mechanisms and resulting signatures tied to Pol III, and suggest that Pol III activity is limited to short elements constrained by the presence of T4 sequences capable of DNA-directed Pol III termination (Figure 5h). We note that previous Pol III transcription studies reported evidence of competent Pol III initiation at Pol II-transcribed gene promoters in vitro40, consistent with our discovery of widespread Pol III recruitment and transcription at Pol II-dependent protein-coding gene promoters in human cells. Though our work makes progress on understanding the significance of these events, our findings naturally give rise to new questions related to the mechanisms and implications of cross-talk between Pols II and III.
In the present study, a context-agnostic approach was made necessary by the currently limited breadth of Pol III mapping studies, precluding our ability to understand the contextual underpinnings of Pol III transcription at protein-coding genes. Nevertheless, if Pol III activity at protein-coding genes is restricted to specific cellular contexts or conditions, it may have been difficult to uncover Pol III activity at these sites without taking a comprehensive, multi-context perspective.
While current resources hint that BRF1, a TFIIIB complex subunit involved in Pol III recruitment at canonical type 1 and 2 promoters, is particularly enriched at Pol III-occupied protein-coding gene promoters, further delineation of the gene-specific Pol III recruitment mechanisms remain open-ended questions that future surveys of Pol III factors may address. Similarly, to what degree Pol III recruitment and activity at noncanonical loci is dependent on increased cellular levels and availability of Pol III, potentially driven under specific conditions, remains an important question for future research.
Functionally, the links between Pol III depletion and small molecule inhibition and the observed disruption (rather than enhancement) of mRNA produced from Pol III-occupied genes indicates that Pol III recruitment functionally promotes (rather than competes for) Pol II activity at these loci. The possibility of Pol III-dependent initiation and/or maintenance of an active chromatin environment, perhaps also intersecting promoter-proximal pausing mechanisms, are tempting preliminary models of speculation. In either case, the biological significance of this effect appears to be tied to protein-coding genes central to biosynthesis and cellular growth. We show, for example, that Pol III is recruited to numerous genes encoding ribosomal proteins and, additionally, that high expression of Pol III-occupied and Pol III-sensitive genes is linked to unfavorable outcomes in cancer. These findings point to a novel form of cross-talk between Pols II and III, such that increaseing Pol III activity may promote complementary increases in ribosomal proteins and other growth-related factors.
Finally, our study suggests that the small ncRNA products produced by Pol III at protein-coding gene promoters are minimally abundant, potentially due to rapid turnover. Although this result limits the notion of novel functional small RNA species derived from Pol III activity at these loci, we nevertheless uncover evidence for novel small RNA species derived from other genomic intervals that currently lack clear gene annotation features. While the significance of these ncRNAs will require targeted functional genomic studies, these data altogether reveal new and unexpected insights into the breadth of Pol III transcription.
Limitations of this study
Due to the context-agnostic nature of our RNA polymerase metamap, we are unable to define the specific cellular contexts in which Pol III activity is tied to a particular gene. However, our complementary Pol III disruption experiments (si-POLR3A and ML-60218) demonstrate a link between Pol III and Pol III-occupied protein-coding genes specifically in HEK293T, a transformed human embryonic kidney cell line amenable to functional studies. Application of large-scale, integrative genomic approaches targeting Pol III in a variety of cellular contexts is therefore needed to fully appreciate and delineate the dynamic range of Pol III expansion under specific contexts and conditions.
We additionally note that our pursuit of establishing Pol III occupancy patterns is dependent on the most recent “snapshot” of gene annotations in humans, and that several non-trivial decisions, such as setting gene-class priorities, gene promoter window sizes, and defining which genomic intervals to mask, play an important role in shaping the outcome of our study. In this way, numerous examples of Pol III occupancy may be missed in our metamap - a trade-off between sensitivity and specificity. Thus, potentially promising but uncalled Pol III target genes (i.e. false negatives) is an additional limitation of our study.
Methods
Cell culture and RNAi / small molecule Pol III inhibition
HEK293Tcells were obtained from ATCC (CRL-3216, Batch#70049877) and grown in 10 cm BioLite™ Cell Culture Treated Dishes (Catalog# 12–556-002 Thermo Scientific) in Dulbecco’s Modified Eagle Medium, high glucose (Catalog# 11965092, Gibco). HEK293T experiments were conducted on cells between passage 10–25.
For POLR3A depletion experiments, HEK293T cells were seeded into 6 well Nunc™ Cell-CultureTreated Multidishes (Thermo Scientific, Catalog# 140675) and incubated overnight. siRNA (listed below, IDT) is transfected using Lipofectamine® RNAiMax (cat # 13778150). Cells were incubated for 48 hours at 37 °C with 5% CO2 before collection. 80% of cell pellets are used for RNA extraction, performed using E.Z.N.A.® Total RNA Kit I (Catalog# R6834–01, Omega), with 100% ethanol instead of 70% at the binding step to enrich small RNA. 20% cells are saved for western blot.
| Scramble siRNA | Silencer™ Select Negative Control No. 1 siRNA (Catalog# 439084, Invitrogen™) | 20nM |
|---|---|---|
| POLR3A siRNA | rCrCrUrGrCrArGrCrUrArUrGrGrArGrGrGrArATT rUrUrCrCrCrUrCrCrArUrArGrCrUrGrCrArGrGTT |
20nM |
For La (SSB) depletion experiments, HEK293T cells were seeded into 12-well Cell-CultureTreated Multidishes (Thermo Scientific) and incubated overnight. HEK293T cells were then transfected with 150nM siRNA using Lipofectamine® RNAi Max (cat # 13778150) for 48 hours at 37 °C with 5% CO2 before collection. siLa/SSB (IDT): rArGrA rUrUrG rGrArU rGrCrU rUrGrC rUrGrA rATT; si-scramble (Invitrogen): Silencer Select Negative Control #1 siRNA. RNA extraction was performed using mirVana PARIS RNA and Native Protein Purification kit according to the manufacturer’s instructions (Invitrogen).
For Pol III inhibition, ML-60218 (MedChemExpress, HY-122122) was added to 80% confluent HEK293T cells to a final concentration of 25uM, and cells are collected 4h post treatment.
Reverse Transcription qPCR
For small RNA qPCR following POLR3A depletion, cDNA was first generated using the TaqMan MicroRNA Reverse Transcription Kit (Catalog # 4366597, ThermoFisher). qPCR was then performed using TaqMan™ Fast Advanced Master Mix (Catalog#4444557, Applied Biosystems) and TaqMan™ MicroRNA Assay (Catalog # 4440887, 4398987, thermo fisher)
| target | Assay ID | Context sequence |
|---|---|---|
| tRNA_Ala_TGC | CT322WW | TCAGTGGTAGAGCGCATGCTTTGCATGTATGAGGCCTCGGTTCGATCCCCGACACCTCCA |
| tRNA_Cys_GCA_1 | CT7DP2R | CAGTGGTAGAGCATTTGACTGCAGATCAAGAGGTCCCCGGTTCAAATCCGGGTGCCCCCT |
| RNU18 | 001204 | |
| Z30 | 001092 |
Immunoblotting and protein quantification
Cell pellets were washed once with PBS before lysis with RIPA buffer (Catalog#J62524.AD, Thermo Scientific) following standard protocols. Total protein concentration was determined using Pierce BCA protein assay kit (Catalog# 23225, Thermo Scientific), and equivalent protein fractions were diluted in cell lysis buffe, and 4*Laemmli Sample Buffer (Catalog#1610741, BIO-RAD). Proteins were separated on 4–20% Mini-PROTEAN® TGX Stain-Free™ Protein Gels, 15 wells (Catalog#4568096, BIO-RAD) using 10x Tris/Glycine/SDS (Catalog#1610732, BIO-RAD) and transferred onto polyvinylidene difluoride membranes 0.2 um (Catalog# LC2002, Invitrogen) with Trans-Blot® Turbo ™ Transfer System (Catalog# 1704150, BIO-RAD). Transfer membranes were blocked with 5% blotting-grade blocker (Catalog# 1706404, BIO-RAD), followed by incubation with the primary antibodies at 4°C overnight. Membranes were washed with TBST and incubated with mouse and rabbit secondary antibodies conjugated with horseradish peroxidase (Catalog# 31430, 31462, Invitrogen) for 2 h at room temperature followed by three washes in TBST. Proteins were visualized using Super SignalWest Pico (Catalog# 34580, Thermo Scientific) or SuperSignal WestFemto (Catalog#34096, ThermoScientific) with ChemiDoc™ Touch Imaging System (Catalog# 1708370, BIO-RAD). Protein abundances were calculated by normalizing to reference protein Lamin B2 or β-tubulin and to total protein.
Antibodies:
| Target | Cat number | dilution |
|---|---|---|
| POLR3A | PA5-58170 (Invitrogen) | 1:1000 |
| Tubulin | ab21058 (abcam) | 1:1000 |
| Lamin B2 | E1S1Q (Cell Signaling Technology) | 1:1000 |
| La (SSB) | sc-80656 (Santa Cruz Biotechnology) | 1:1000 |
DNA library preparation and sequencing
For RNA-seq experiments following POLR3A depletion and Pol III inhibition, libraries were prepared with the Kapa Hyper Stranded mRNA library kit (Roche) and sequenced on two 10B lanes for 151 cycles from both ends of the fragments on a NovaSeq X Plus with V1.0 sequencing kits. For small RNA-seq experiments following La (SSB) depletion, samples were first treated with Antarctic Phosphatase (NEB), followed by treatment with T4 Polynucleotide Kinase (NEB) according to the manufacturer’s protocol. Small RNA libraries were generated using the NEBnext small RNA library preparation kit according to instructions (New England BioLabs). small RNA high-throughput sequencing libraries were sequenced for 101 cycles using NovaSeq X Plus. All sequencing was performed at the University of Illinois, Roy J.Carver Biotechnology Center. Fastq files were generated and demultiplexed with the bcl2fastq v2.20 Conversion Software (Illumina).
RNA-seq alignment and differential analysis
RNA-seq reads were first trimmed using TrimGalore43 and then mapped to the hg38 transcriptome and quantified with Salmon51. Differentially expressed genes were determined using DESeq252. small RNA-seq reads were first trimmed using TrimGalore43 and then mapped to the entire genome space (GRCh38) using bowtie244. By default, multi-mapping reads were reported as a singular best alignment. Promoter-specific raw small RNA signal counts were extracted using bedtools48. Differentially expressed small RNA was determined using edgeR58.
Multi-class promoter curation
Protein-coding transcript annotations were retrieved from Ensembl (Gencode V43)41, whereas annotations for noncoding gene types were retrieved from RNAcentral (V22)42. For each URSID in RNAcentral, we concatenated the gene information across over 30 databases linked to RNAcentral. Clustering of TSS was performed to eliminate redundancy of TSS within each gene type, with clusters identified within a 120-nucleotide vicinity for each gene type and strand. Gene information pertaining to TSSs of the same cluster are collapsed together. Promoter regions were defined based on the position of the TSS cluster median, spanning from 50 nucleotides upstream to 150 nucleotides downstream. Repeat annotations were retrieved from RepeatMasker. Finally, all promoter windows associated with “blacklisted” genomic intervals (ENCFF356LFX) were masked from further analysis. The workflow and resources related to the Multi-class promoter curation method are available at https://github.com/VanBortleLab/multiClassPromoterCuration.
Uniform RNA polymerase occupancy scoring
We leveraged ChIP-seq data from GEO46 and ENCODE35, encompassing 2, 153, and 47 datasets for Pol I, Pol II, and Pol III, respectively (Supplemental Table 1), with corresponding total mapped reads of 53 million, 7 billion, and 5.6 billion. We implemented a comprehensive normalization procedure: first, in the context of Pol III and Pol II, reads were normalized across each context for every available subunit (including POLR3A, POLR3B, POLR3C, POLR3D, POLR3E, POLR3G and POLR2A), ensuring parity in signal representation across each cellular context. Subsequently, in the context of Pol III, we normalized reads for each subunit to equalize their contributions, thus preventing biases stemming from variable sequencing depths of subunits. After normalizing subunits of Pol III, we computed the median occupancy signal across all Pol III subunits. In the context of Pol I, we chose to concatenate reads from POLR1A and POLR1B, the only two experiments currently available in humans. Finally, we scaled each polymerase to 100 million reads ensuring consistency in signal intensities and enabling meaningful comparisons between polymerases. This process facilitated unbiased analysis of RNA polymerase occupancy and dominance across the genome.
Following read normalization, the signal coverage for each polymerase was extracted across binned intervals of 50 bp. Enrichment scores for each bin were determined using a Poisson framework. Briefly, for each bin, the expected signal within 5 Kb (λ5k), 10 Kb (λ10k), and across the entire genome (λgenome) is computed. The λ value is thereafter determined as the maximum lambda value of the distance set. The probability density function (PDF) utilized is given by P(X = x) = (ê(−λ) * λ^x) / x!. If k represents the observed signal for the bin, the p-value tied to a specific bin enrichment is calculated using P(X > k). P-values were adjusted globally using the Benjamini-Hochberg method. Finally, an enrichment score is computed as the negative logarithm (base 10) of the adjusted p-value. In cases where infinite scores are encountered, they are assigned the highest non-infinite score. The workflow and processed data are available at https://github.com/VanBortleLab/RNApolymerase_metamap.
RNA polymerase dominance visualization
Initially, RNA polymerase scores were assigned to each gene promoter by selecting the maximum score from the bins intersecting the promoter window. Subsequently, RNA polymerase dominance plots were generated by representing the three RNA polymerase scores for each gene as three point masses set at three vertices of an equilateral triangle, and plotting the center of mass of this system. getCOM() and plotTriangle() functions, which calculate and plot the center of mass for each gene, respectively, are available at https://github.com/VanBortleLab/dominatR.
Nascent T4 score calculation
We retrieved all available nascent PRO-seq (Precision Run-on Sequencing22) datasets available via ENCODE35 corresponding to 18 distinct experiments across 7 different cell lines. To ascertain strand specificity, we employed the infer_experiment code from the RSeQC47 package and filtered out 5 experiments with a strand specificity of less than 85%, resulting in 13 experiments (Supplemental Table 2). Subsequently, we utilized genomecov from BEDtools48 to create the 3’ pileup of reads across the genome, employing the flags “-dz −3”. T4 motifs (TTTT) were identified within the hg38 genome using SeqKit49, with a 3-nt cushion applied on either side of the T4 motif, resulting in a 10-nt long T4 motif. Employing the same scoring method utilized for the occupancy score of 50-nt bins, we computed the T4 score, this time focusing on the 10-nt T4 motif with expected signal over two vicinities: 20 nt and 50 nt on either side of the T4 motif. The workflow and processed data are available at https://github.com/VanBortleLab/NascentT4score.
To test the significance of enrichment of T4 scores in pol3-enriched active genes compared to pol2-enriched active genes, we conducted a significance test using random sampling. Specifically, we generated a distribution of T4 scores by summing the T4 scores of genes randomly sampled from the list of pol2-enriched genes. The number of genes thus sampled from the pol2-enriched set equaled the number of genes in the pol3-enriched set. This sampling process was repeated 1 million times to create a robust distribution of T4 scores. The p-value was determined as the fraction of these 1 million sums greater than the sum of T4 scores of our pol3-enriched genes. This approach allowed us to assess the significance of T4 score enrichment in pol3-enriched active genes relative to pol2-enriched active genes.
ChIP-seq Transcription Factor (TF) enrichment analysis
Protein coding genes were divided into two groups: one comprising 1572 genes with a median pol3 score greater than or equal to 2, and the other consisting of genes failing to meet this threshold. Subsequently, utilizing the ChIPatlas enrichment tool24, a list of transcription factors (TFs) across multiple experiments and cell types, along with their enrichment (−logqval and FoldEnrichment (FE) value) on the first group of genes compared to the second group was found. Experiments with fewer than 5000 peaks were filtered out. Enrichment scores were calculated using the formula SCORE = (−logqval) * log2(FE). For each TF, scores were collapsed within the same cell type using the mean score, and then across different cell types using the median score.
Multi-tissue small RNA-seq abundance
We retrieved a total of 204 small RNA-seq datasets (Supplemental table 3) available via ENCODE35 derived from 64 distinct contexts (i.e. cell lines and tissue types). Small RNA read pile-ups were determined using multiBamSummary from deeptools50 at 10 bp resolution. Reads were aggregated by cellular context and a summary RNA profile was thereafter determined by scaling all context-specific RNA reads to the minimal sequencing depth observed across all 64 contexts. Multi-tissue promoter-linked small RNA levels were then determined as the sum of normalized signals across bins within 100 bp of each promoter window.
TCGA gene correlation and survival features
The Cancer Genome Atlas (TCGA) primary solid tumor gene expression levels were extracted across all disease cohorts from the Broad Institute TCGA Genome Data Analysis Center (GDAC) Firehose mRNASeq Level 3 RSEM gene normalized data files at https://gdac.broadinstitute.org. Individual pan-cancer Pearson correlation scores were calculated for each Pol III-specific subunit against all genes included in these data. The strength and direction of correlation signatures related to Pol III-occupied and Pol III-sensitive genes were then measured by comparing the observed distribution of correlation metrics of a given gene set against an empirical null distribution derived by 10,000 permutations. Survival statistics linking gene expression levels to patient outcomes were retrieved from tcga-survival.com53,54.
Functional enrichment analysis
Gene Ontology enrichment analysis was performed using the R package clusterProfiler55,56 with genome wide annotation for Human org.Hs.eg.db57. Gene sets containing those associated with Pol III occupancy or Pol III sensitivity were compared against a universe of genes restricted to those with RNA polymerase occupancy (set 1) or inclusion in differential analysis (i.e. present following low count filtering, set 2). Enrichments were calculated for GO cellular component and GO biological processes and p values adjusted for multiple hypothesis testing.
Identification of unannotated, putative Pol III products
To identify unannotated putative Pol III products, at first, genomic bins with a median Pol3 score less than 2 were removed. Next, bins within 350 base pairs of any known annotations’ transcription start sites (TSS) were removed. In cases of consecutive adjacent bins, only the bin with the maximum median Pol3 score was chosen. Finally, bins within blacklisted regions were excluded, and the resulting bins were extended by 75 base pairs on each side to generate a list of 659 unannotated putative Pol3 promoters.
Supplementary Material
Acknowledgements
We thank Alvaro Hernandez, Chris Wright, Danman Zhang, and staff at the Carver Biotechnology Center for sequencing services, and administrators of the Carl R. Woese Institute for Genomic Biology (UIUC) Biocluster for computational support. We thank Prof. Brian Freeman, Prof. Andy Belmont, and members of the Van Bortle lab for critical reading of the manuscript and helpful suggestions. This work was supported by the National Institutes of Health, National Human Genome Research Institute (NHGRI) grant R00HG010362 to K.V.B.
Footnotes
References
- 1.Sentenac A. Eukaryotic RNA Polymerase. Crit. Rev. Biochem. 18, 31–90 (1985). [DOI] [PubMed] [Google Scholar]
- 2.Turowski T. W. & Boguta M. Specific Features of RNA Polymerases I and III: Structure and Assembly. Front. Mol. Biosci. 8, 680090 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.White R. & Jackson S. The TATA-binding protein: a central role in transcription by RNA polymerases I, II and III. Trends Genet. 8, 284–288 (1992). [DOI] [PubMed] [Google Scholar]
- 4.Gao Z., Herrera-Carrillo E. & Berkhout B. RNA Polymerase II Activity of Type 3 Pol III Promoters. Mol. Ther. - Nucleic Acids 12, 135–145 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Listerman I., Bledau A. S., Grishina I. & Neugebauer K. M. Extragenic Accumulation of RNA Polymerase II Enhances Transcription by RNA Polymerase III. PLoS Genet. 3, e212 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gerber A., Ito K., Chu C.-S. & Roeder R. G. Gene-Specific Control of tRNA Expression by RNA Polymerase II. Mol. Cell 78, 765–778.e7 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.James Faresse N. et al. Genomic Study of RNA Polymerase II and III SNAPc-Bound Promoters Reveals a Gene Transcribed by Both Enzymes and a Broad Use of Common Activators. PLoS Genet. 8, e1003028 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Zhou S. & Van Bortle K. The Pol III transcriptome: Basic features, recurrent patterns, and emerging roles in cancer. WIREs RNA 14, e1782 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Campbell K. J. & White R. J. MYC Regulation of Cell Growth through Control of Transcription by RNA Polymerases I and III. Cold Spring Harb. Perspect. Med. 4, a018408–a018408 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Yeganeh M. & Hernandez N. RNA polymerase III transcription as a disease factor. Genes Dev. 34, 865–882 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Dergai O. & Hernandez N. How to Recruit the Correct RNA Polymerase? Lessons from snRNA Genes. Trends Genet. 35, 457–469 (2019). [DOI] [PubMed] [Google Scholar]
- 12.Van Bortle K. et al. A cancer-associated RNA polymerase III identity drives robust transcription and expression of snaR-A noncoding RNA. Nat. Commun. 13, 3007 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Arimbasseri A. G., Rijal K. & Maraia R. J. Transcription termination by the eukaryotic RNA polymerase III. Biochim. Biophys. Acta BBA - Gene Regul. Mech. 1829, 318–330 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Mathews M. B. & Francoeur A. M. La antigen recognizes and binds to the 3’-oligouridylate tail of a small RNA. Mol. Cell. Biol. 4, 1134–1140 (1984). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Lang W. H. & Reeder R. H. Transcription termination of RNA polymerase I due to a T-rich element interacting with Reb1p. Proc. Natl. Acad. Sci. 92, 9781–9785 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Kuehner J. N., Pearson E. L. & Moore C. Unravelling the means to an end: RNA polymerase II transcription termination. Nat. Rev. Mol. Cell Biol. 12, 283–294 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Mars J.-C., Sabourin-Felix M., Tremblay M. G. & Moss T. A Deconvolution Protocol for ChIP-Seq Reveals Analogous Enhancer Structures on the Mouse and Human Ribosomal RNA Genes. G3 GenesGenomesGenetics 8, 303–314 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Zentner G. E., Saiakhova A., Manaenkov P., Adams M. D. & Scacheri P. C. Integrative genomic analysis of human ribosomal DNA. Nucleic Acids Res. 39, 4949–4960 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Zhang Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Kessler A. C. & Maraia R. J. The nuclear and cytoplasmic activities of RNA polymerase III, and an evolving transcriptome for surveillance. Nucleic Acids Res. 49, 12017–12034 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Dieci G., Fiorino G., Castelnuovo M., Teichmann M. & Pagano A. The expanding RNA polymerase III transcriptome. Trends Genet. 23, 614–622 (2007). [DOI] [PubMed] [Google Scholar]
- 22.Mahat D. B. et al. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat. Protoc. 11, 1455–1476 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Phizicky E. M. & Hopper A. K. The life and times of a tRNA. RNA 29, 898–957 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Zou Z., Ohta T., Miura F. & Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 50, W175–W182 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wu L. et al. Novel Small-Molecule Inhibitors of RNA Polymerase III. Eukaryot. Cell 2, 256–264 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Li S., Balmain A. & Counter C. M. A model for RAS mutation patterns in cancers: finding the sweet spot. Nat. Rev. Cancer 18, 767–777 (2018). [DOI] [PubMed] [Google Scholar]
- 27.Zabala-Letona A. et al. mTORC1-dependent AMD1 regulation sustains polyamine metabolism in prostate cancer. Nature 547, 109–113 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Bian X. et al. AMD1 upregulates hepatocellular carcinoma cells stemness by FTO mediated mRNA demethylation. Clin. Transl. Med. 11, e352 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Liao R. et al. AMD1 promotes breast cancer aggressiveness via a spermidine-eIF5A hypusination-TCF4 axis. Breast Cancer Res. 26, 70 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Johnson S. A. S., Dubeau L. & Johnson D. L. Enhanced RNA Polymerase III-dependent Transcription Is Required for Oncogenic Transformation*. J. Biol. Chem. 283, 19184–19191 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Petrie J. L. et al. Effects on prostate cancer cells of targeting RNA polymerase III. Nucleic Acids Res. 47, 3937–3956 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.White R. J. RNA polymerases I and III, growth control and cancer. Nat. Rev. Mol. Cell Biol. 6, 69–78 (2005). [DOI] [PubMed] [Google Scholar]
- 34.Smith J. C. & Sheltzer J. M. Genome-wide identification and analysis of prognostic features in human cancers. Cell Rep. 38, 110569 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Maraia R. J. La Protein and the Trafficking of Nascent RNA Polymerase III Transcripts. J. Cell Biol. 153, F13–F18 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Derrien T. et al. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Cieśla M., Skowronek E. & Boguta M. Function of TFIIIC, RNA polymerase III initiation factor, in activation and repression of tRNA gene transcription. Nucleic Acids Res. 46, 9444–9455 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hsieh Y.-J., Wang Z., Kovelman R. & Roeder R. G. Cloning and Characterization of Two Evolutionarily Conserved Subunits (TFIIIC102 and TFIIIC63) of Human TFIIIC and Their Involvement in Functional Interactions with TFIIIB and RNA Polymerase III. Mol. Cell. Biol. 19, 4944–4952 (1999). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Duttke S. H. C. RNA Polymerase III Accurately Initiates Transcription from RNA Polymerase II Promoters in Vitro. J. Biol. Chem. 289, 20396–20404 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Frankish A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.RNAcentral Consortium et al. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 49, D212–D220 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Krueger F. et al. FelixKrueger/TrimGalore: v0.6.10 - add default decompression path. Zenodo 10.5281/ZENODO.7598955 (2023). [DOI] [Google Scholar]
- 44.Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Danecek P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Barrett T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Wang L., Wang S. & Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185 (2012). [DOI] [PubMed] [Google Scholar]
- 48.Quinlan A. R. & Hall I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Shen W., Sipos B. & Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta e191 (2024) doi: 10.1002/imt2.191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Ramírez F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Patro R., Duggal G., Love M. I., Irizarry R. A. & Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Love M. I., Huber W. & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Smith J. C. & Sheltzer J. M. Systematic identification of mutations and copy number alterations associated with cancer patient prognosis. eLife 7, e39217 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Smith J. C. & Sheltzer J. M. Genome-wide identification and analysis of prognostic features in human cancers. Cell Rep. 38, 110569 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Yu G., Wang L.-G., Han Y. & He Q.-Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS J. Integr. Biol. 16, 284–287 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Wu T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2, 100141 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Carlson M. org.Hs.eg.db: Genome wide annotation for Human. (2022).
- 58.Chen Yunshun <Yuchen@Wehi. Edu.Au>, A. L. E. A. edgeR. Bioconductor 10.18129/B9.BIOC.EDGER (2017). [DOI]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.




