Abstract
The archaeal transcription apparatus is closely related to the eukaryotic RNA polymerase (RNAP) II system, while archaeal genomes are more similar to those of bacteria, with densely packed genes organised in operons. This makes understanding transcription in archaea vital, both in terms of molecular mechanisms and evolution. Very little is known about how archaeal cells orchestrate transcription on a systems level. We have characterised the genome-wide occupancy of the Methanocaldococcus jannaschii transcription machinery and its transcriptome. Our data reveal how the TATA and BRE promoter elements facilitate the recruitment of the essential initiation factors TBP and TFB, respectively, which in turn are responsible for the loading of RNAP into the transcription units. The occupancies of RNAP and Spt4/5 correlate strongly with each other and with the RNA levels. Our results show that Spt4/5 is a general elongation factor in archaea since its presence on all genes matches that of RNAP. Spt4/5 is recruited proximal to the transcription start site (TSS) on the majority of transcription units, while on a subset of genes including rRNA and CRISPR loci, Spt4/5 is recruited to the transcription elongation complex during early elongation within 500 bp of the TSS, akin to its bacterial homolog NusG.
Keywords: transcription, archaea, promoter, TBP, TFB, RNAP, Spt4/5
Introduction
Transcription is a fundamental process in biology and RNA polymerases (RNAP) are closely related in all domains of life1. The archaeal and eukaryotic systems are near-identical in terms of RNAP subunit composition and architecture, transcription initiation and elongation factors, and the molecular mechanisms that govern their activity2. The universally conserved core of RNAP resembles a crab claw-like structure made of the large catalytic subunits Rpo1 and Rpo2 and the assembly platform including Rpo3/11. The archaeal RNAP shares five to six additional subunits with eukaryotic RNAPII that are absent in bacterial RNAP3. These include the Rpo4/7 stalk module that protrudes from the core enzyme, binds to the nascent RNA and modulates transcription processivity and termination4. Archaeal transcription has been studied extensively in vitro, but relatively little is known about the genome-wide distribution of RNAP and basal transcription factors, and how this correlates with promoter elements and transcription output. A limited number of archaeal promoters have been functionally characterised, and seem to rely on TATA boxes, B-recognition elements (BRE) and initiator elements (Inr)5,6. The former two are binding sites for the two basal transcription factors TBP and TFB, respectively7–9. Both are strictly required for promoter-directed transcription in vitro10, and are homologous to eukaryotic TBP and TFIIB, with identical functions but faster dynamics in terms of promoter binding11. The third basal transcription factor, TFE, is homologous to TFIIE; it enhances the stability of the transcription preinitiation complex (PIC) by catalysing the isomerisation of the closed to the open complex, during which the DNA strands are separated and the template strand is loaded into the active site of RNAP12–14. The elongation factor Spt4/5, NusG in bacteria, is the only RNAP-associated factor that is conserved throughout the three domains of life. 
Spt4/5 enhances transcription processivity and possibly plays a role during promoter escape15. Interestingly, in vitro experiments revealed that archaeal Spt4/5 and bacterial NusG are denied access to the preinitiation complex (PIC) by TFE and σ70, respectively13,16. Chromatin immunoprecipitation (ChIP) experiments show that yeast Spt4/5 is recruited to RNAP proximal to the promoter, suggesting a role in the transition from initiation to elongation17, whereas E. coli NusG is recruited to RNAP during elongation in a stochastic fashion18.
We applied a Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) strategy in order to characterise the whole genome distribution of Methanocaldococcus jannaschii (Mja) RNAP and the cognate initiation factors TBP and TFB, and to shed light on the recruitment patterns of Spt4/5 in archaea. To orientate the association of the transcription machinery within the genome, we mapped and analysed global TSSs and steady-state RNA levels. We identified positive sequential correlations between: BRE/TATA motif strength; binding of TBP and TFB to the promoter; occupancy of RNAP and Spt4/5 within the gene; and RNA levels. The elongation factor Spt4/5 showed two different modes of recruitment: early, promoter-proximal recruitment to RNAP similar to yeast Spt4/5; and a later recruitment during early elongation on rRNA and CRISPR loci more akin to bacterial NusG.
Results
Organisation of the Mja transcriptome
The workflow of the RNA-seq analysis is illustrated in Supplementary Figure 1a. To characterise the Mja transcriptome we first mapped the genome-wide transcription start sites (TSSs) using a terminator exonuclease (TEX) RNA-seq approach. We mapped 1508 TSSs (see supplementary materials) and used our TSS map to annotate 976 transcription units (TUs) that we defined as the sequence spanning from the primary TSS to the stop codon (on mRNA genes) or the annotated 3’ end (on noncoding RNA genes) of the last cistron. A further 138 TUs were predicted based on gene orientation but were not associated with a TSS. We identified several novel genes encoding ORFs and ncRNAs that are listed in Supplementary tables 3 and 4. Mja TUs are organised into a combination of single- and multicistronic operons (Supplementary Fig. 2e). The majority of protein-encoding genes encode long untranslated leader regions (5’-UTR) with only 16 mRNAs (1.9%) being defined as leaderless (<5 nt, Fig. 1a). Within the 5’ UTRs we identified ribosome binding sites (RBS) in 54% of mRNA genes (Fig. 1a). To determine the global steady-state RNA levels, we next calculated RPKM (Reads Per Kilobase of transcript per Million mapped reads) values for each TU. Using a cut off value of RPKM > 1, we defined 63% of the TUs as transcriptionally active (adjusted P value < 0.05, Supplementary text and Supplementary Table 3). The two ribosomal rRNA operons had the highest RPKM values and account for 80% of all mapped reads. Several small ncRNA genes including tRNAs were detected at low levels but may be misrepresented due to loss during size selection of library preparation. We could detect antisense transcription in Mja (Fig. 1b), however, the majority of antisense transcripts were not associated with a TSS which is possibly due to their rapid degradation. We identified twelve antisense TUs with assigned TSS, including the Mja histone A3 gene (Fig. 1c, Supplementary Table 4). 
Both sense and antisense A3 transcripts were highly abundant, hinting at a possible regulation of A3 expression by antisense transcription. Northern blotting confirmed the presence of both sense and antisense A3 transcripts covering the histone A3 ORF (Supplementary Fig. 2f).
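The RPKM normalisation and the RPKM > 1 cut-off described above can be illustrated with a minimal sketch. The TU names, read counts and lengths below are hypothetical placeholders, not values from this study, and the sketch applies only the RPKM threshold; it does not reproduce the adjusted P value filter used in the actual analysis.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length in kb) / (total reads in millions)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical example data: TU name -> (mapped reads, TU length in bp).
transcription_units = {"TU_a": (500, 1000), "TU_b": (2, 2000)}
total_reads = 1_000_000  # hypothetical library size

rpkm_values = {
    name: rpkm(count, length, total_reads)
    for name, (count, length) in transcription_units.items()
}

# TUs above the RPKM > 1 cut-off (threshold only, no significance test).
active_tus = [name for name, value in rpkm_values.items() if value > 1]
```

In this toy example TU_b sits exactly at RPKM = 1 and is therefore not counted as active under a strict "greater than" threshold.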
Figure 1. Transcription start site map and promoter motif analysis.
a, The 5’-UTR distance distribution from the primary TSS to the start codon of Mja mRNAs. The inset shows the ribosome binding site (RBS) sequence motif identified by the MEME algorithm; the complementary sequence of the Mja 16S RNA is shown for comparison. b, Comparison of sense and antisense RNA steady-state levels at all TUs. c, Strand-specific RNA profiles reveal sense and antisense transcripts on the histone A3 locus. The grey arrows indicate TSSs. d, Promoter DNA sequence alignments centred on the TSS reveal regions with a sequence bias corresponding to the BRE/TATA elements, the initially melted region (IMR), and the initiator (Inr) of the promoter. e, The BRE/TATA consensus motif identified by MEME-ChIP. f, The distance between the 3’ end of the TATA motif and the TSS is centred on 24 bp (TATA motifs with a P value < 10⁻³). g, The AT content distribution of the IMR exceeds the genome average of 68.7% (red dotted line). The increase is significant according to a Wilcoxon signed rank test (P < 10⁻¹⁰, n = 1508). h, The dinucleotide frequency of TA and TG motifs surrounding the TSS. The red dotted line indicates the genome-wide frequency of 0.15, and the significance was assessed by Fisher’s exact test (n = 1507). i, The T(A/G) motif increases the precision of TSS selection. The read counts of all 5’ ends from TEX-treated RNA surrounding assigned TSSs were identified and the reads at each position normalised to the TSS. Initiation immediately upstream and downstream is four- and two-fold lower, respectively, for TSSs with T(A/G) compared to those without (Wilcoxon rank sum test, n = 447 without T(A/G) Inr; n = 762 with T(A/G) Inr. P value: * <0.05; ** <0.01; *** <0.001).
Promoter sequence elements and start site selection
Alignment of DNA sequences surrounding the TSSs identified two regions with a sequence bias that correspond to the BRE/TATA elements, and the initially melted region (IMR) that includes the initiator (Inr) surrounding the TSS (Fig. 1d). Sequence motif analysis of these DNA sequences revealed a global BRE/TATA consensus (Fig. 1e). These elements could be identified upstream of 76% of TSSs using a stringent motif confidence score (P value < 10⁻³, Supplementary Fig. 3a), including all primary TSSs of TUs defined as transcriptionally active. BRE/TATA motifs are centred 24 bp upstream of the TSS; this distance is conserved from archaea to metazoans19 (Fig. 1f). During open complex formation the two DNA strands of the initially melted region (IMR) of the promoter from -12 to +2 are separated12,20–22. The second region of sequence bias corresponding to the IMR is enriched in A and T residues (80 ± 12% AT, genome average 69% AT, Fig. 1g). The AT content of the IMR does not correlate with RNA levels (Supplementary Fig. 3b). The Inr element formed by the bases surrounding the TSS showed a strong bias for the sequence T(A/G) at position -1/+1 (Fig. 1d) but, similar to the IMR, did not correlate with RNA levels (Supplementary Fig. 3c). Examining the dinucleotide frequency within this region revealed that TA and TG are not only highly enriched at position -1/+1 (combined > 60%, compared to the genome average of 15%), but also strongly disfavoured at the neighbouring positions (-2/-1 and +1/+2, Fig. 1h). The conservation of the T(A/G) motif is independent of the distance between the TATA box and the TSS (Supplementary Fig. 3d). Since these results suggest that the Inr dictates TSS selection, we analysed the TSS specificity on promoters with and without the Inr motif. Promoters with an Inr sequence T(A/G) showed up to four-fold lower levels of transcription initiation at neighbouring positions compared to promoters without the T(A/G) motif (Fig. 1i). 
In summary, while the BRE/TATA motifs facilitate the transcription preinitiation complex assembly, the Inr fine-tunes TSS selection. A comparison with other archaeal promoters23–28 (Supplementary Fig. 4) reveals that the TATA consensus is largely conserved across the archaea, while the significance of IMR and Inr are subject to variation29.
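The two sequence features discussed in this section, the AT content of the IMR (-12 to +2) and the T(A/G) dinucleotide at positions -1/+1, can be extracted from a promoter sequence as sketched below. This is an illustrative sketch only: the function names, the forward-strand 0-based indexing convention and the example sequence are assumptions made for the example, not part of the analysis pipeline used here.

```python
def at_fraction(seq):
    """Fraction of A or T residues in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def promoter_features(seq, tss_index):
    """Return IMR AT content and the Inr dinucleotide of a promoter.

    seq is a forward-strand sequence; tss_index is the 0-based index of
    the +1 nucleotide. There is no position 0, so -12..+2 spans 14 nt.
    """
    if tss_index < 12 or tss_index + 2 > len(seq):
        raise ValueError("sequence does not cover positions -12 to +2")
    imr = seq[tss_index - 12 : tss_index + 2]          # -12 .. +2
    inr = seq[tss_index - 1 : tss_index + 1].upper()   # -1 / +1
    return {
        "imr_at": at_fraction(imr),
        "inr": inr,
        "has_t_ag": inr in ("TA", "TG"),
    }


# Hypothetical promoter: 10 nt upstream, 12 nt (-12..-1), +1/+2, 4 nt downstream.
promoter = "GGGGGGGGGG" + "AATTAATTAACT" + "AG" + "CCCC"
features = promoter_features(promoter, tss_index=22)
```

For the toy sequence above, 12 of the 14 IMR positions are A or T, and the -1/+1 dinucleotide is TA.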
TBP and TFB binding to the Mja BRE/TATA motifs
We determined the global occupancy of the essential initiation factors TBP and TFB by chromatin immunoprecipitation using polyclonal antibodies raised against recombinant proteins followed by high-throughput sequencing (ChIP-seq). The workflow and detailed methods are described in the supplementary materials. Figure 2a-e shows the sequences and ChIP-seq profiles of four representative promoters, ranging from promoters that show a distinct and defined increase in TBP and TFB occupancy centred on the BRE/TATA motifs (mcrB and ftr, panels b and c), through those that display broader profiles but are distinct from the mock control (sla, d), to promoters that do not show any increased occupancy at all (rrnA, e). Averaging the TBP/TFB occupancy profiles centred on the TSS of mRNA promoters directing transcription of the top 25% of TUs (by RPKM) shows distinct TBP and TFB peaks (Fig. 2f). The apices of both peaks coincide with the location of the BRE/TATA motifs, which confirms the validity of our TBP/TFB profiling analysis (Fig. 1f). The profile of the mock IP control demonstrates that both TBP and TFB signals are above the background, although the mock also shows a slight increase in signal (Fig. 2f). In order to validate our results we compared our data to a subset of experimentally characterised promoters. 19 tRNA and 12 mRNA Mja promoters have been analysed quantitatively in vitro with respect to the formation of DNA-TBP-TFB complexes using electrophoretic mobility shift assays (EMSAs)6. There is a strong correlation between the published in vitro binding data and the in vivo occupancy across their promoter regions (TSS ± 250 bp, TBP R = 0.7, P value = 1.1 x 10⁻⁵, TFB R = 0.61, P value = 2.6 x 10⁻⁴, Supplementary Fig. 5c), which also implies that in vitro EMSAs are a good indicator for the binding of TBP and TFB to promoters in vivo. 
In order to relate the strength of TBP/TFB binding to the sequence of the BRE/TATA motifs, we compared the confidence score (P value) of the BRE/TATA motif of each promoter to the TBP and TFB ChIP signal (Fig. 2g). The BRE/TATA score showed a weak but significant correlation to the TBP/TFB occupancy (TBP R = -0.23, P value = 6 x 10⁻⁸, TFB R = -0.30, P value < 10⁻¹⁰, mock R = -0.08, P value = 0.03), but only a very weak correlation to TU steady-state RNA levels (Fig. 2h, TBP R = 0.15, P value = 1 x 10⁻⁴, TFB R = 0.15, P value = 8.1 x 10⁻⁵, no correlation with mock, P value > 0.05).
Figure 2. Correlation between TBP/TFB binding to the promoter and RNA levels.
a, The BRE/TATA motifs (highlighted in blue), primary and secondary TSSs (red and pink, respectively), and the coding region (grey) of three selected mRNA promoters (mcrB, ftr and sla) and the rrnA rRNA promoter. The confidence score (P value) for the BRE/TATA motif is indicated to the right of the sequence. b-e, TBP and TFB occupancy profiles at the mcrB (b), ftr (c), sla (d) and rrnA (e) promoters. TSSs are indicated as arrows, with the primary TSS in black. f, A metadata analysis shows that the averaged occupancy profiles of TBP and TFB of the top 25% of mRNA TUs (by sense RPKM) colocate with the predicted BRE/TATA motif (grey). The mock control is indicated in black. g, Correlation between the BRE/TATA score (P value) and TBP occupancy. Spearman correlations are indicated (TBP R = -0.23, P value = 6 x 10⁻⁸). h, Correlation between the TBP occupancy and RNA levels (sense RPKM for all TUs with detectable transcript). Spearman correlations are indicated (TBP R = 0.15, P value = 1 x 10⁻⁴).
RNAP occupancy correlates with RNA levels
We characterised the global occupancy of RNAP with two polyclonal antibodies directed against two distinct RNAP subcomplexes. The pair-wise correlation between the genome-wide occupancy of the Rpo4/7 stalk and Rpo3/11 assembly platform subunits was calculated using 250 bp windows with a 50 bp overlap. The Rpo4/7 and Rpo3/11 signals correlate very strongly with each other (R = 0.95, P value < 10⁻¹⁰, Fig. 3a). In order to visualise the RNAP occupancy within TUs we plotted the ChIP-seq profile as occupancy per nucleotide across the genome. As expected, the RNAP ChIP-seq profiles of individual loci are very diverse (Fig. 3b-e and Fig. 5), e.g. while occupancy is high on the sla and mcr TUs (Fig. 3b,d), it is low on the tuf and rpo operons (Fig. 3c,e). A metadata analysis averaging the RNAP occupancy centred on the TSS reveals that the Rpo4/7 signal appears approximately 100 bp upstream of the Rpo3/11 signal (Fig. 3f). Promoter-bound TBP and TFB are strictly required for the recruitment and subsequent loading of RNAP into the TU in vitro. In good agreement, the occupancy of TBP and TFB at the promoter correlated with RNAP occupancy within the TU (Fig. 3g, Rpo4/7 compared to TBP R = 0.37, P value < 10⁻¹⁰, Rpo4/7 to TFB R = 0.3, P value < 10⁻¹⁰, mock R = 0.1, P value = 0.02). Finally, the RNAP occupancy within TUs correlated moderately well with RNA levels (Fig. 3h, Rpo4/7 R = 0.45, P value < 10⁻¹⁰, Rpo3/11 R = 0.48, P value < 10⁻¹⁰, mock R = -0.15, P value = 3.4 x 10⁻⁴).
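The window-based comparison of two occupancy tracks can be sketched as follows. The sketch assumes that "250 bp windows with a 50 bp overlap" corresponds to a step of 200 bp, averages a per-nucleotide signal in each window, and computes a Spearman rank correlation with average ranks for ties. The function names and the toy signals are illustrative assumptions; the sketch does not compute P values and is not the pipeline used in this study.

```python
def window_means(signal, size=250, overlap=50):
    """Mean of a per-nucleotide signal in overlapping windows."""
    step = size - overlap  # assumed interpretation: 200 bp step
    return [
        sum(signal[start : start + size]) / size
        for start in range(0, len(signal) - size + 1, step)
    ]


def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy per-nucleotide occupancy tracks (hypothetical, 2 kb region).
track_a = [i / 100 for i in range(2000)]
track_b = [2 * value + 1 for value in track_a]  # monotonic transform of track_a

windows_a = window_means(track_a)
windows_b = window_means(track_b)
rho = spearman(windows_a, windows_b)
```

Because track_b is a monotonic transform of track_a, the rank correlation of the windowed signals is 1 in this toy case; real ChIP-seq tracks would of course give lower values.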
Figure 3. The Rpo4/7 stalk and RNAP core remain associated through the transcription cycle.
a, The correlation between the occupancy of RNAP subunit complexes Rpo4/7 and Rpo3/11 is very strong across the genome. Points are coloured using a density gradient (ranging from blue-low to red-high), Spearman correlations indicated (P value < 10⁻¹⁰). b-e, RNAP occupancy profiles on representative TUs: the sla (b), tuf (c), mcr (d), and RNAP subunit operon (e). Arrows indicate TSSs (primary in black). f, Averaged occupancy profiles of Rpo4/7, Rpo3/11 and mock control at the top 25% of mRNA TUs (by sense RPKM). g, Correlation between the TBP promoter occupancy (TSS +/- 250 bp) and RNAP TU occupancy (TSS +/- 250 to TU end) for all TUs (RPKM > 1). Spearman correlations TBP R = 0.37, P value < 10⁻¹⁰. h, Correlation between steady-state RNA levels (sense RPKM for all TUs with RPKM > 1) and RNAP (Rpo4/7) occupancy within the body of each TU, Spearman correlations Rpo4/7 R = 0.45, P value < 10⁻¹⁰.
Figure 5. Archaeal Spt4/5 is a general elongation factor that is recruited to RNAP via two distinct modes.
a, Spt4/5 and RNAP occupancy correlate very strongly across the whole genome. Data points of substoichiometric Spt4/5:RNAP occupancy, with Spt4/5 occupancy more than 1 Log2(IP/input) lower than RNAP occupancy, are indicated in red, Spearman correlations R = 0.96, P value < 10⁻¹⁰. b-f, The Spt4/5 occupancy profiles reflect two recruitment modes of Spt4/5, exemplified by the archaellum (b) and rRNA (d) operons. Representative RNAP and Spt4/5 occupancy profiles on the fla (b), hsp60 (c), rrnA (d) and CRISPR13 (e) operons, and a larger-scale plot of the rpl3 gene, which has a long 5’ UTR (f). Arrows indicate TSSs. g, The 5’-UTR length does not affect the difference between Spt4/5 and Rpo3/11 occupancy proximal to the promoter of TUs (RPKM > 1). Points are coloured by density.
In vitro preinitiation complex assembly and promoter activity
Surprisingly the two Mja rRNA promoters (rrnA and rrnB) have no identifiable BRE/TATA motifs and do not show strong TBP/TFB ChIP signal (Fig. 2a,e). This suggests that they are weak promoters which is in stark contrast to the high RNAP occupancy and RNA levels. In order to probe the strength of Mja rrn promoters in vitro, we monitored PIC formation on the rrnA promoter using EMSA, and promoter activity using transcription assays. For comparison, we included a representative Mja mRNA promoter (rpl12), which is associated with high RNAP occupancy and steady-state mRNA, an Mja CRISPR promoter, which has high RNAP occupancy (but low RNA levels likely due to the rapid processing of the primary transcript in vivo), and the well-characterised viral SSV T6 promoter (Fig. 4a)10,20,30,31.
Figure 4. PIC formation and promoter strength in vitro.
a, Alignment of SSV T6 model promoter and representative Mja promoters including ribosomal RNA (rrnA), CRISPR and mRNA (ribosomal protein rpl12) promoters. The BRE/TATA motifs are shown in dark gray with P values indicated, the IMR is highlighted in light grey with AT% indicated. b, EMSA showing preinitiation complex (PIC) formation on promoter templates shown in (a). c, EMSAs using heteroduplex promoter variants. PIC indicates the transcription preinitiation complex, and TC the ternary DNA-TBP-TFB complexes. Exposure is adjusted to account for diverse signal intensities. d, Promoter-directed in vitro transcription assays. Promoter templates shown in (a) were fused to C-less cassette resulting in transcripts of 150 nt (T6), 157 nt (rrnA) and 152 nt (CRISPR and rpl12) length.
The SSV T6 and CRISPR promoters recruit RNAP in a TBP/TFB-dependent fashion, and the addition of TFE stimulated PIC formation in EMSA experiments (Fig. 4b). The rpl12 promoter, which has a similar BRE/TATA consensus but lower IMR AT% than the CRISPR promoter, formed a weak PIC in the absence of TFE. In contrast, the rrnA promoter was not able to form a stable PIC. Heteroduplex promoter variants, which include a 4 bp noncomplementary region (-3 to +1), mimic the open complex and enhance PIC stability20,30. These variants enabled PIC formation at all four promoters, including rrnA (Fig. 4c). Introducing mutations into the TATA sequence abolished or dramatically reduced PIC formation on all promoters (Supplementary Fig. 6a-b). We used promoter-directed in vitro transcription experiments to complement the promoter-binding experiments. The results from both assays mirrored each other; while the SSV T6, rpl12 and CRISPR promoters resulted in large amounts of transcripts with the correct size, the rrnA promoter was inactive (Fig. 4d). In conclusion, in contrast to the in vivo analysis, the in vitro transcription experiments show a direct link between promoter motifs, the formation of a stable PIC and promoter strength.
Spt4/5 is a general elongation factor with two distinct recruitment modes
We carried out a ChIP-seq analysis in order to characterise the global occupancy of the transcription elongation factor Spt4/5. The pair-wise correlation between genome-wide occupancies of Spt4/5 and RNAP is very strong (Fig. 5a, Rpo3/11 R = 0.96, P value < 10⁻¹⁰; Rpo4/7 R = 0.95, P value < 10⁻¹⁰, mock R = 0.035, P value < 10⁻¹⁰). Furthermore, a comparison of RNAP and Spt4/5 ChIP-seq profiles on individual TUs (by plotting their per-nucleotide occupancy) demonstrates that Spt4/5 closely mirrors the undulating pattern of RNAP occupancy that likely reflects pausing and varying degrees of transcription processivity (Fig. 5b,c). This behaviour suggests that Spt4/5 stably associates with the transcription elongation complex (TEC) in vivo. In order to detect any potential heterogeneity in the genome occupancy of RNAP and Spt4/5 we identified genome locations characterised by a lower Spt4/5:RNAP occupancy ratio (red dots in Fig. 5a). The individual 250 bp windows were merged to identify 23 separate genome regions with significantly lower Spt4/5 than RNAP occupancy (adjusted P value < 0.05, Supplementary Table 5). These regions included 18 out of the 20 CRISPR loci, both rRNA operons (rrnA and rrnB), two annotated small non-coding RNA genes, and mj0496 (uncharacterised ORF). Closer scrutiny of these regions revealed that the lower Spt4/5:RNAP occupancy ratio is restricted to the promoter-proximal region of the gene, with the Spt4/5 profile matching that of RNAP from ~500 bp downstream of the promoter onwards (Fig. 5d,e, Supplementary Table 5).
The bacterial Spt5 homologue NusG aids the coupling of transcription and translation by interacting with the RNAP and the ribosome32,33. Similarly, transcription and translation are coupled in archaea34. We tested whether the recruitment of Spt4/5 to TECs on protein-encoding genes was influenced by the recruitment of the ribosome to the RBS by analysing Spt4/5 occupancy on mRNA genes with long 5’ UTRs. The 5’ UTR of the rpl3 gene is 286 bp long, but Spt4/5 is recruited symmetrically with RNAP close to the TSS and not further downstream at the RBS (Fig. 5f). To explore this globally we subtracted the RNAP occupancy from the Spt4/5 occupancy at each mRNA promoter and plotted the value against the length of the 5’ UTR. If Spt4/5 recruitment were aided by the ribosome we would expect the difference in occupancy to increase with 5’-UTR length; however, no such dependence was observed (Fig. 5g). In conclusion, our results have revealed two modes of Spt4/5 recruitment (Fig. 6). On the majority of Mja genes Spt4/5 is recruited to the RNAP slightly offset from the TSS in proximity of the promoter. On a subset of genes, including the noncoding rRNA operons and CRISPR loci, Spt4/5 is recruited to RNAP several hundred bp downstream of the TSS.
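The step of flagging windows with a low Spt4/5:RNAP ratio and merging them into contiguous regions can be sketched as below. The sketch uses the threshold from the Figure 5a legend (Spt4/5 more than 1 Log2(IP/input) below RNAP) and a standard interval merge. It does not reproduce the significance testing (adjusted P value) applied in the actual analysis, and the window coordinates and occupancy values are hypothetical.

```python
def flag_windows(windows, threshold=1.0):
    """Select windows where Spt4/5 is more than `threshold` log2 units below RNAP.

    Each window is (start, end, rnap_log2, spt45_log2) with a half-open
    coordinate range [start, end).
    """
    return [
        (start, end)
        for start, end, rnap, spt45 in windows
        if rnap - spt45 > threshold
    ]


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent intervals into regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Hypothetical windows: (start, end, RNAP log2(IP/input), Spt4/5 log2(IP/input)).
example_windows = [
    (0, 250, 3.0, 1.5),      # flagged (difference 1.5)
    (200, 450, 3.2, 2.0),    # flagged (difference 1.2)
    (400, 650, 3.0, 2.9),    # not flagged (difference 0.1)
    (1000, 1250, 2.5, 1.0),  # flagged (difference 1.5)
]

regions = merge_intervals(flag_windows(example_windows))
```

Here the first two flagged windows overlap and collapse into a single region, while the distant fourth window forms a region of its own.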
Figure 6. The initial stages of the transcription cycle in archaea.
The average occupancy profiles of TBP, TFB, RNAP and Spt4/5 on the top 25% of mRNA TUs reflect the initial stages of the transcription cycle. TBP and TFB are bound to the TATA and BRE promoter elements 24 bp upstream of the TSS, which in turn recruit RNAP to form the preinitiation complex (PIC). Subsequently, two modes of Spt4/5 recruitment could be distinguished on different genes: 1. On the majority of genes Spt4/5 is recruited ‘early’, likely during promoter escape; 2. On the rRNA operons and CRISPR loci, Spt4/5 is recruited ‘later’, offset from the TSS in the downstream direction, likely during transcription elongation.
Discussion
We present the first comprehensive genome-wide analysis of transcription in archaea by characterising the (i) occupancy of RNAP and basal transcription factors, (ii) the transcriptome including a TSS map, and (iii) a promoter motif analysis all in the same organism.
We identified 1508 TSSs in M. jannaschii, and could account for 88% of TSS of the 1114 predicted TU. The TSS analysis furthermore reveals that M. jannaschii mRNAs have long 5’ UTRs indicative of extensive riboregulation by sRNA and riboswitches. This pattern is similar to other methanogens including M. mazei, M. psychrophilus, T. kodakarensis and P. furiosus, and different from Sulfolobales and halophilic archaea that are characterised by leaderless mRNAs23–27,35–38. The assembly of the PIC in vitro is strictly dependent on the binding of TBP and TFB to TATA and BRE motifs of archaeal promoters, respectively. Our in vivo analysis reveals the prevalence of BRE/TATA motifs, suggesting that they are the dominant promoter elements in archaea. This is in contrast to eukaryotes where conventional TATA motifs are absent at the majority of promoters39. We also reveal the importance of downstream sequences including the IMR and the pervasive 3-bp Inr element that increases the accuracy of TSS selection, while not correlating with the RNA levels. Thus far the role of the archaeal Inr has only been studied in vitro, mainly with mutated variants of the viral SSV1 T6 model promoter40,41. Our systems data reveal that the Mja Inr has a bias for T(A/G) at registers -1/+1. This preference for pyrimidine and purine nucleotides is a universally conserved promoter feature, which reflects the high degree of conservation between the RNAP active site architectures in the three domains of life19,42,43.
The elevated AT content of the IMR favors local DNA melting, and experimental evidence shows that the IMR sequence affects promoter strength at individual promoters in vitro12,29. However, on a global level the AT content of the Mja promoter IMR does not correlate with RNA levels, and it is thus unlikely that the IMR’s AT content alone limits promoter strength in vivo.
Having explored the sequence characteristics of archaeal promoters we characterised the association of RNAP, TBP, TFB and the elongation factor Spt4/5 with the genome. The averaged occupancy profiles of highly expressed genes illustrate the early stages of the archaeal transcription cycle with the stepwise assembly of the PIC, RNAP and Spt4/5 recruitment, and promoter escape of RNAP (Fig. 6). The individual RNAP profiles in different TUs are very diverse, including regions of high and low occupancy proximal to the promoter motifs and within TUs, which likely reflects variations in promoter recruitment, efficiency of escape, and processivity and pausing44. It has been proposed that the yeast RNAPII RPB4/7 stalk associates reversibly with the RNAP core. Our ChIP-seq results demonstrate that Rpo4/7 and Rpo3/11 are colocalised (R = 0.95) across the genome, suggesting that the stalk remains associated with the RNAP core as it progresses through the transcription cycle. The fact that the Rpo4/7 signal is slightly offset upstream from the Rpo3/11 signal at TSSs is likely due to epitope occlusion of the latter in transcription preinitiation complexes15,20. The molecular mechanisms of archaeal Spt4/5 have been characterised in some detail in vitro13,21,45. Our ChIP-seq results demonstrate that Spt4/5 associates with elongating RNAPs throughout the genome, behaving like an ‘honorary’ RNAP subunit on all genes, protein-encoding as well as non-coding RNA genes, meaning that Spt4/5 fulfils the criteria of a general elongation factor. By comparing the ChIP-seq profiles of RNAP and Spt4/5, two distinct modes of Spt4/5 recruitment become apparent: either (1) proximal to the promoter and just offset from the TSS or (2) further downstream within the first 500 bp of the TU (Fig. 6). 
All multisubunit RNAPs face a similar mechanical engineering challenge: a network of interactions between promoter-bound initiation factors (TBP/TFB/TFE) and RNAP is crucial to enable efficient recruitment of RNAP during early initiation; however, these interactions need to be disrupted to allow RNAP to escape from the promoter15. As Spt4/5 and the initiation factor TFE bind to the RNAP clamp in a mutually exclusive manner in vitro13,15, Spt4/5 recruitment proximal to the TSS could assist promoter escape of RNAP by displacing TFE. Our repeated attempts to ChIP TFE were unsuccessful despite the use of several independent antibody preparations; therefore we could not directly characterise the swapping of Spt4/5 and TFE in vivo. However, Spt4/5 mode (1) does support recruitment during promoter escape rather than during elongation. ChIP analyses from eukaryotic systems are in agreement with promoter-proximal recruitment of Spt4/517 and the swapping with TFIIE proximal to the promoter46,47. Our results show notable exceptions to mode (1); in mode (2) the Spt4/5 occupancy does not match RNAP occupancy until several hundred bp downstream of the TSS. These exceptions include the two ribosomal RNA operons that account for 80% of the total RNA in the cell, and the abundant CRISPR loci. In contrast to Mja Spt4/5, E. coli NusG is recruited during elongation at most TUs, but proximal to rRNA promoters due to the assembly of antitermination complexes including NusA, B and E, and other ribosomal proteins, some of which are conserved in archaea18,48,49. rRNA operons and CRISPR regions differ from coding genes as templates for transcription in several regards, such as the absence of coupled translation, strong secondary-structure content, co-transcriptional processing and ribosome biogenesis (rRNA). As yet unidentified rRNA and CRISPR promoter-specific transcription activators could enhance RNAP recruitment, stabilise the PIC, or interact with the RNAP clamp and possibly enhance promoter escape. 
This notion is supported by our finding that Mja rRNA promoters have surprisingly poor BRE/TATA motifs and very low activity in transcription experiments in vitro, in apparent conflict with the high steady-state levels of rRNA and RNAP occupancy on rRNA operons in vivo. The Sulfolobus solfataricus and Pyrococcus furiosus rRNA promoters have defined BRE/TATA motifs, and are very strong in vitro12,31,50, while bacterial rRNA promoters tend to form unstable PICs, making them more amenable to regulation51,52.
A quantitative analysis of the transcriptome reveals that 700 of the 1114 TUs (63%) contain detectable transcript, which is plausible considering that Mja was cultured under optimal growth conditions. We found only a weak correlation between BRE/TATA motif scores and TBP/TFB occupancy, and only a very weak correlation between TBP/TFB occupancy and RNA levels. Steady-state RNA levels do not take into account factors such as RNA stability; however, as a good correlation was found between RNAP occupancy and RNA levels, they seem a reasonable proxy for transcription output for most Mja genes. The lack of a strong correlation between promoter motifs and RNA levels illustrates the importance of additional factors such as the chromatin context as well as gene-specific regulators53,54. For example, TBP recruitment to the Mja rb2 promoter TATA element is enhanced by the adjacent binding of the Ptr2 activator in vitro55,56. Based on the BRE/TATA score of the rb2 promoter, linear regression predicts a relative TBP promoter occupancy of 0.14 Log2(IP/input), whereas the observed value is much higher at 1.01, in line with a Ptr2-enhancement of TBP binding in vivo. A nascent elongating transcript (NET)-seq57–59 approach would allow a direct determination of transcription output in vivo, and could in the future provide incisive insights into the manifold factors that regulate transcription in the archaea.
Methods
Culture conditions
Mja strain DSM 2661 (ref. 60) was grown in large-scale 100 l fermenters in a minimal medium containing 0.3 mM K2HPO4, 0.4 mM KH2PO4, 3.6 mM KCl, 0.4 M NaCl, 10 mM NaHCO3, 2.5 mM CaCl2, 38 mM MgCl2, 22 mM NH4Cl, 31 μM Fe(NH4)2(SO4)2, 1 mM C6H9NO6, 1.2 μM MgSO4, 0.4 mM CuSO4, 0.3 μM MnSO4, 36 nM FeSO4, 36 nM CoSO4, 3.5 nM ZnSO4, 4 nM KAl(SO4)2, 16 nM H3BO3, 42 μM Na2SeO4, 0.3 nM Na2WO4, 11 μM NaMoO4, 44 μM (NH4)2Ni(SO4)2 and 2 mM Na2S. Cultures were mixed at 250 rpm under H2:CO2 gas at a 4:1 ratio at 85°C.
RNA preparation
RNA for sequencing was prepared from Mja cell pellets by Vertis Biotechnologies AG using the mirVana RNA isolation kit (Ambion). For TSS mapping, total RNA was treated with Terminator exonuclease (TEX, Epicentre) to remove 5’ monophosphate RNA. RNA for Northern blot analysis was prepared from Mja cell pellets using peqGOLD TriFast reagent (PeQlab) according to the manufacturer’s instructions.
Chromatin immunoprecipitation
All antibodies used in ChIP experiments were rabbit antisera produced by Davids Biotechnologie GmbH using recombinant proteins prepared as described previously61. Specificity of antibodies was determined by Western blot. Mock control IPs used pre-immune sera. ChIP was performed on cultures of Mja that were grown to late log phase, as measured by a cell count of ~1 × 10^8 cells/ml, and cross-linked by addition of 0.1% formaldehyde for 1 min before quenching with 12.5 mM glycine. Similar cross-linking conditions have been used successfully for the thermophile Pyrococcus62,63. Fixed cell pellets were washed three times in PBS and then resuspended in lysis buffer (0.1% sodium deoxycholate, 1 mM EDTA, 50 mM HEPES pH 7.5, 140 mM NaCl, 1% Triton-X-100) plus 10% glycerol and protease inhibitor (cOmplete mini, EDTA-free protease inhibitor cocktail, Roche). DNA was sheared by sonication to approximately 300 bp fragments using a cup horn sonicator (Qsonica Q700) before mixing overnight at 4°C with the appropriate antibody prebound to Dynabeads M-280 sheep anti-rabbit IgG (Life Technologies). Beads were washed twice with lysis buffer, once with lysis buffer 500 (0.1% sodium deoxycholate, 1 mM EDTA, 50 mM HEPES pH 7.5, 500 mM NaCl, 1% Triton-X-100), once with LiCl buffer (0.5% sodium deoxycholate, 1 mM EDTA, 250 mM LiCl, 0.5% nonidet P-40, 10 mM Tris pH 8) and finally once with TE buffer (10 mM Tris pH 7, 0.1 mM EDTA). DNA-protein complexes were eluted with ChIP elution buffer (10 mM EGTA, 1% SDS, 50 mM Tris pH 8) at 65°C for 10 min, and remaining complexes were eluted in TE (10 mM Tris pH 7, 0.1 mM EGTA) containing 0.67% SDS. Input samples were prepared by mixing the sheared DNA-protein mix with TE (10 mM Tris pH 7, 0.1 mM EGTA) containing 1% SDS. Crosslinks were reversed and protein removed by treatment of samples with 0.05 mg/ml RNase A and 0.5 mg/ml proteinase K at 37°C for 2-4 h followed by overnight incubation at 65°C.
DNA fragments were purified using MinElute columns (Qiagen) and quantified using the Qubit ds DNA HS kit (Life Technologies).
Illumina sequencing
For a summary of the steps see Supplementary Fig. 1. Library preparation and Illumina sequencing of total and TEX-treated RNA were performed by Vertis Biotechnologies. For the TEX-treated samples, RNA adapters were ligated to the 5’ ends and the 3’ ends were poly(A)-tailed before first-strand cDNA synthesis and PCR amplification. The resulting cDNA was fragmented by ultrasound, and 5’ ends were selected and further amplified after ligation of the TruSeq 3’ end adapter primer (Illumina). For RNA-seq of total RNA, samples were fragmented with ultrasound and first-strand cDNA synthesis was performed using randomised N6 primers before ligation of strand-specific TruSeq adapters (Illumina) to the 5’ and 3’ ends of the cDNA and PCR amplification. cDNA samples were pooled, subjected to size selection of 150-500 bp using Agencourt AMPure XP beads (Beckman Coulter) and sequenced on an Illumina HiSeq 2000 with single-end 50 bp read length, followed by adapter trimming and filtering by quality score. ChIP-seq library preparation was performed using the NEBNext ChIP-seq library preparation set for Illumina and NEBNext multiplex adaptor oligos (New England Biolabs), including size selection to ~250 bp using the Agencourt AMPure kit. Libraries were sequenced on an Illumina HiSeq (library 1) or MiSeq (libraries 2 and 3) with single-end 50 nt read length, followed by adapter trimming and quality filtering. The quality of the sequences was further assessed by FastQC64.
TSS mapping
For TSS analysis, TEX-treated RNA sequences were aligned to the Mja genome using Bowtie65, allowing for no mismatches in the first 28 nt of the read and filtering out any read that aligned to more than one location (mapping statistics in Supplementary Table 1). BedTools66 was used to create strand-specific, nucleotide-resolution histograms of the 5’ nucleotide of each read across the entire genome for each replicate. The R statistical program67 with the findPeaks function from the package quantmod was used to determine the genome positions containing TSS as peaks, i.e. the highest position in any continuous sequence of counts. These TSS were further filtered as detailed in Supplementary Text, and identified TSS are listed in Supplementary Table 2 along with the read count for each replicate at the TSS coordinate.
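The peak definition above (the highest position in any continuous sequence of counts) can be illustrated with a minimal Python sketch. It is not the quantmod findPeaks routine used in the analysis; the function name call_tss_peaks and the example counts are hypothetical, and the input is assumed to be a per-nucleotide list of 5’-end read counts for one strand.

```python
def call_tss_peaks(counts):
    """Return (position, count) for the highest position in every run of
    consecutive non-zero 5'-end counts. Positions are 0-based list indices.
    Ties within a run are resolved in favour of the first position."""
    peaks = []
    best_pos, best_count = None, 0
    for pos, count in enumerate(counts):
        if count > 0:
            # Still inside a run of counts: remember the highest position.
            if count > best_count:
                best_pos, best_count = pos, count
        elif best_pos is not None:
            # A zero ends the current run: report its peak and reset.
            peaks.append((best_pos, best_count))
            best_pos, best_count = None, 0
    if best_pos is not None:
        # The last run extends to the end of the list.
        peaks.append((best_pos, best_count))
    return peaks


# Hypothetical 5'-end histogram for a short stretch of one strand.
example = [0, 0, 3, 12, 4, 0, 0, 1, 7, 0]
print(call_tss_peaks(example))
```

In this sketch each run of non-zero counts yields exactly one candidate TSS; the additional filtering described in Supplementary Text is not reproduced here.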
TU mapping
The TSS list and the list of annotated and novel genes (Supplementary Tables 2-4) were used to determine the transcription units (TU) for single-gene cistrons, multi-gene operons and non-coding RNA genes. TU coordinates were defined as the TSS to the stop codon of the last cistron for coding TU, or the annotated end for non-coding RNA. Where multiple TSS occur for a single TU, the primary TSS, i.e. that with the highest read count, was used (details in Supplementary Text).
Fidelity of TSS selection
To assay the fidelity of TSS selection, the TSS were first filtered so that where multiple assigned TSS occurred within 5 nt, the one with the highest read count was retained. Then the number of reads from the TEX-treated samples whose 5’ end mapped to each position from -5 to +5 relative to the assigned TSS was determined and averaged over the two replicates. For each individual region the read count was normalised to the read count at the +1 position of the assigned TSS. Significance between the same relative positions for assigned TSS with an Inr of T(A/G) compared to those without was determined by Wilcoxon rank sum test.
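The two steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the function names are hypothetical, only the plus strand is handled, the filter is a simple greedy pass over sorted positions, and the promoter convention with no position 0 (the TSS is +1) is assumed.

```python
def filter_close_tss(tss, min_gap=5):
    """tss: list of (position, read_count) on one strand.
    Where assigned TSS lie within min_gap nt of the TSS retained so far,
    keep only the one with the highest read count (greedy, left to right)."""
    kept = []
    for pos, count in sorted(tss):
        if kept and pos - kept[-1][0] <= min_gap:
            if count > kept[-1][1]:
                kept[-1] = (pos, count)
        else:
            kept.append((pos, count))
    return kept


def relative_profile(counts, tss_pos, flank=5):
    """counts: dict mapping genome position to the replicate-averaged
    number of 5' ends. Returns read counts at positions -flank..+flank
    normalised to the count at the TSS (+1); there is no position 0."""
    ref = counts.get(tss_pos, 0)
    if ref == 0:
        raise ValueError("no reads at the assigned TSS")
    profile = {}
    for offset in range(-flank, flank):
        # Offset 0 is the TSS itself and is labelled +1.
        label = offset + 1 if offset >= 0 else offset
        profile[label] = counts.get(tss_pos + offset, 0) / ref
    return profile


# Hypothetical example values.
print(filter_close_tss([(100, 50), (103, 80), (120, 10)]))
print(relative_profile({102: 4, 103: 80, 104: 8}, 103))
```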
Transcriptome analysis
For transcriptome analysis, random-primed RNA sequences were aligned to the Mja genome using Bowtie65, allowing for no mismatches in the first 28 nt of the read. Reads that align to more than one location were found to affect only 1.8% of the genome, so these were included and each was mapped to one location so that regions containing repeats (such as the rRNA operons) were not misrepresented in the data set. Mapping statistics are given in Supplementary Table 1. For expression analysis, the number of strand-specific reads across the length of each TU was determined using BedTools66 and used to calculate the strand-specific RPKM (reads per kilobase per million mapped reads). RPKM values were averaged over the two replicates (Supplementary Table 3). To assess whether a TU contains detectable transcript, sense RPKM values for each replicate were first log-transformed to approximate a normal distribution; a one-sample t-test for Log10(RPKM) greater than 0 (i.e. RPKM greater than 1) was then applied, followed by Benjamini-Hochberg false discovery rate adjustment. An adjusted P value < 0.05 was used to define detectable transcript.
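As an illustration of the quantities used above, the following Python sketch computes RPKM from a read count and applies a Benjamini-Hochberg adjustment to a list of P values. The function names and example numbers are hypothetical, the one-sample t-test itself is omitted, and the analysis in this study was carried out in R.

```python
import math


def rpkm(reads_in_tu, tu_length_nt, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return reads_in_tu * 1e9 / (tu_length_nt * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical TU: 500 sense reads, 1000 nt long, 2 million mapped reads.
value = rpkm(500, 1000, 2_000_000)
print(value, math.log10(value))  # Log10(RPKM) > 0 corresponds to RPKM > 1

# Hypothetical P values from the per-TU t-tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```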
ChIP occupancy analysis
An outline of the sequencing analysis is shown in Supplementary Fig. 1b. ChIP-sequenced reads were aligned to the Mja genome using Bowtie65, allowing for no mismatches within the first 28 nt. BAM files were read into the R statistical program67 with the packages ShortRead and GenomicRanges. The package chipseq was used to extend the 50 bp reads in the sense orientation to reflect the average fragment size of 250 nt. Mapping statistics are shown in Supplementary Table 1 (for additional details see Supplementary Text).
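The read extension step can be pictured with a short Python sketch. It is not the chipseq package routine; the function name is hypothetical, coordinates are assumed to be 0-based and half-open, and the circular nature of the genome is ignored by simply clipping at the ends.

```python
def extend_read(start, end, strand, fragment_len=250, genome_len=None):
    """Extend an aligned read from its 5' end, in the direction of the
    read, to the assumed average fragment length."""
    if strand == "+":
        new_start, new_end = start, start + fragment_len
    elif strand == "-":
        # The 5' end of a minus-strand read is its right-hand coordinate.
        new_start, new_end = end - fragment_len, end
    else:
        raise ValueError("strand must be '+' or '-'")
    new_start = max(new_start, 0)
    if genome_len is not None:
        new_end = min(new_end, genome_len)
    return new_start, new_end


# Hypothetical 50 nt reads on either strand.
print(extend_read(1000, 1050, "+"))
print(extend_read(1000, 1050, "-"))
```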
Genome wide occupancy: overlapping windows across entire genome
For pair-wise genome-wide comparison of occupancies, the genome was split into overlapping windows of 250 bp to reflect the average length of the ChIP DNA fragments. The reads per window for each IP and input sample were determined using BedTools66 and normalised to individual read depth by dividing by the total mapped reads per sample and multiplying by 1,000,000. Each IP sample was divided by the input, resulting in the normalised (IP/input) read count. The normalised read count was averaged across replicates and log-transformed to provide the Log2(IP/input) for each region.
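The normalisation described above amounts to a few arithmetic steps, sketched here in Python for a single window. The function names and example counts are hypothetical; a window with zero input reads would raise a ZeroDivisionError in this sketch, and how such windows were handled in the analysis is not specified here.

```python
import math


def per_million(count, total_mapped):
    """Normalise a read count to read depth (reads per million mapped)."""
    return count * 1_000_000 / total_mapped


def log2_occupancy(replicates):
    """replicates: list of (ip_count, ip_total, input_count, input_total)
    for one window. Returns Log2 of the replicate-averaged IP/input."""
    ratios = []
    for ip_count, ip_total, input_count, input_total in replicates:
        ip_norm = per_million(ip_count, ip_total)
        input_norm = per_million(input_count, input_total)
        ratios.append(ip_norm / input_norm)
    mean_ratio = sum(ratios) / len(ratios)
    return math.log2(mean_ratio)


# Hypothetical counts for one 250 bp window in two replicates.
window = [(400, 2_000_000, 100, 1_000_000),
          (300, 1_000_000, 150, 1_000_000)]
print(log2_occupancy(window))
```

Note that the ratios are averaged before the log transformation, following the order of operations given in the text.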
Genome wide occupancy: TU occupancy
To determine the TU occupancy, each TU with detectable transcript levels (sense RPKM > 1 with adjusted P value < 0.05) was first separated into a promoter region corresponding to TSS ± 250 nt (average fragment length) and an intra-TU region from TSS + 250 nt to the end of the TU; TUs smaller than 250 nt were excluded. The reads per segment for each IP and input sample were determined using BedTools66 and normalised to individual read depth by dividing by the total mapped reads per sample and multiplying by 1,000,000. Each IP sample was divided by the input, resulting in the normalised (IP/input) read count. The normalised read count was averaged across replicates and log-transformed to provide the Log2(IP/input) for each region.
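The segmentation of a TU into promoter and intra-TU regions can be sketched as below. This is a hypothetical helper, not the code used in the study: coordinates are returned as (left, right) genomic positions, the TU end lies to the left of the TSS on the minus strand, and TUs no longer than the flank are skipped so that the intra-TU region is never empty (a choice made for this sketch).

```python
def split_tu(tss, tu_end, strand, flank=250):
    """Return the promoter (TSS +/- flank) and intra-TU (TSS + flank to
    the end of the TU) regions, or None for TUs that are too short."""
    length = abs(tu_end - tss) + 1
    if length <= flank:
        return None
    promoter = (tss - flank, tss + flank)
    if strand == "+":
        intra_tu = (tss + flank, tu_end)
    elif strand == "-":
        # Downstream of the TSS means lower coordinates on the minus strand.
        intra_tu = (tu_end, tss - flank)
    else:
        raise ValueError("strand must be '+' or '-'")
    return {"promoter": promoter, "intra_tu": intra_tu}


# Hypothetical TUs on either strand.
print(split_tu(1000, 2500, "+"))
print(split_tu(5000, 3000, "-"))
```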
Occupancy at specific loci
For comparison of specific genomic intervals, BedTools66 was used to create per-nucleotide read counts for the extended reads of IP and input samples across the entire genome. The reads were normalised to individual read depth at each position by dividing by the total mapped reads per sample and multiplying by 1,000,000. Each IP sample was divided by the input, resulting in the normalised (IP/input) read count. The normalised read count was averaged across replicates and log-transformed to provide the Log2(IP/input) for each position. For individual genomic intervals, the histograms at specific genome coordinates were extracted, replicates were averaged, and plots were smoothed using sliding 40 bp windows.
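Sliding-window smoothing of a per-nucleotide profile can be illustrated with the following Python sketch. The exact placement of the window relative to each position is not stated in the text, so the centred window with truncation at the ends used here is an assumption, and the function name is hypothetical.

```python
def sliding_mean(values, window=40):
    """Smooth a per-nucleotide profile by averaging over a sliding window.
    The window covers window // 2 positions on the left and the remaining
    positions from the current one onwards; it is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical short profile, smoothed with a 2 nt window for clarity.
print(sliding_mean([0, 2, 4, 6], window=2))
```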
Meta-data analysis plots
To prepare average occupancy profiles, the read counts surrounding the regions of interest (e.g. TSS for top 25% of mRNA genes by RPKM) were extracted from the per nucleotide occupancy histograms normalised to read depth and input. The occupancy at each position relative to the site of interest was averaged across each TU. Replicates were averaged and plots smoothed by averaging over sliding 60 bp windows.
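A minimal Python sketch of the averaging step is given below, assuming a per-nucleotide occupancy list indexed by 0-based genome position and a list of sites with their strand. The function name and the example values are hypothetical; sites whose flanks run past the ends of the sequence are skipped in this sketch, and the final smoothing is not shown.

```python
def meta_profile(occupancy, sites, upstream, downstream):
    """Average occupancy at each position relative to a set of sites.
    sites: list of (position, strand). Profiles of minus-strand sites are
    reversed so that upstream is always on the left of the result."""
    width = upstream + downstream + 1
    totals = [0.0] * width
    used = 0
    for pos, strand in sites:
        if strand == "+":
            idx = range(pos - upstream, pos + downstream + 1)
        else:
            idx = range(pos + upstream, pos - downstream - 1, -1)
        if min(idx) < 0 or max(idx) >= len(occupancy):
            continue  # flank runs past the sequence ends
        for k, genome_pos in enumerate(idx):
            totals[k] += occupancy[genome_pos]
        used += 1
    if used == 0:
        raise ValueError("no site with a complete flanking region")
    return [t / used for t in totals]


# Hypothetical occupancy values and two sites on opposite strands.
occ = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(meta_profile(occ, [(3, "+"), (6, "-")], upstream=1, downstream=1))
```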
Occupancy RNAP vs Spt4/5
In order to detect variations in the Spt4/5 recruitment pattern on different TUs, we calculated the difference between Spt4/5 and RNAP occupancy for each 250 bp window across the genome as described above. We extracted the coordinates for windows with a difference < -1, i.e. where Spt4/5 Log2(IP/input) occupancy was at least 1 lower than RNAP occupancy. Overlapping windows were merged to determine the coordinates of these regions of difference, and the read counts for each complete region of difference were calculated and normalised to read depth and input as described above. The significance of the difference between RNAP and Spt4/5 occupancies at these regions was determined by applying Welch’s t-test followed by Benjamini-Hochberg false discovery rate adjustment. In order to determine whether differences between RNAP and Spt4/5 related to the 5’-UTR length of coding TU genome-wide, the difference between Spt4/5 and RNAP occupancy was calculated for each mRNA TU promoter region (see above for calculation of promoter occupancy) and correlated to the length of the 5’-UTR.
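The selection and merging of windows can be sketched in Python as follows. The function name, the half-open coordinates and the window step in the example are assumptions made for illustration; windows that merely touch are also merged here, and the statistical testing of the merged regions is not reproduced.

```python
def regions_of_difference(windows, threshold=-1.0):
    """windows: list of (start, end, spt45_log2, rnap_log2) with half-open
    coordinates. Keep windows where Spt4/5 minus RNAP occupancy is below
    the threshold and merge overlapping (or touching) windows."""
    merged = []
    for start, end, spt45, rnap in sorted(windows):
        if spt45 - rnap >= threshold:
            continue
        if merged and start <= merged[-1][1]:
            # Extend the previous region of difference.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical Log2(IP/input) values for four 250 bp windows.
windows = [
    (0, 250, 0.2, 1.6),
    (125, 375, 0.1, 1.5),
    (250, 500, 1.0, 1.2),
    (1000, 1250, -0.5, 0.9),
]
print(regions_of_difference(windows))
```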
Sequence motif analysis
To identify promoter elements, the DNA sequences ranging from -50 to +10 nt relative to the identified TSS were extracted using BedTools66 and direct alignments were visualised using WebLogo 368. Putative promoter motifs were determined using MEME-ChIP (Motif Analysis of Large Nucleotide Datasets)69, restricting the search to motifs 6-15 nt wide on the sense strand. The position weight matrix of the resulting 15 nt BRE/TATA motif was used with FIMO (Find Individual Motif Occurrences)69 to identify matches in the sequences upstream of the TSS and provide confidence scores as P values. Due to the high AT content of the Mja genome, FIMO was also used to identify matches to the BRE/TATA motif in a control of 7 randomly generated sets of 1508 sequences of the same length, extracted from the Mja genome using BedTools66 (Supplementary Fig. 3a). For identification of the Mja RBS motif, the DNA sequences corresponding to -20 to +20 surrounding the start codons were analysed using MEME-ChIP, restricting the search to motifs of 4-5 nt on the sense strand. For analysis of the dinucleotide frequencies, the proportion of TA or TG at each position relative to the TSS was calculated. This was compared to the genome average occurrence of TA/TG dinucleotides using Fisher’s exact test of significance. For analysis of the IMR, the percentage of AT at positions -12 to +2 relative to the TSS was calculated using BedTools66 and significance was calculated by Wilcoxon signed rank test.
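The dinucleotide and AT-content calculations can be illustrated with a short Python sketch. The function names and the example sequences are hypothetical, the sequences are assumed to be aligned on the TSS and of equal length, and the significance tests are not shown.

```python
def dinucleotide_fraction(sequences, index, dinucleotides=("TA", "TG")):
    """Proportion of aligned sequences carrying one of the given
    dinucleotides at the 0-based index (and the following base)."""
    hits = sum(1 for s in sequences
               if s[index:index + 2].upper() in dinucleotides)
    return hits / len(sequences)


def at_percent(sequence):
    """Percentage of A and T bases in a sequence."""
    s = sequence.upper()
    return 100.0 * (s.count("A") + s.count("T")) / len(s)


# Hypothetical aligned promoter fragments.
seqs = ["CCTAGG", "CCTGAA", "CCGCAA", "AATATT"]
print(dinucleotide_fraction(seqs, 2))
print(at_percent("AATT"))
```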
EMSA and in vitro transcription assays
Recombinant mjRNAP was prepared as described previously61 and EMSAs were performed as described previously70. Oligonucleotides are listed in Supplementary Table 6. In vitro transcription reactions with plasmids bearing Mja promoters fused to C-less cassettes were carried out analogously to ref. 12, with the promoter region including 15 bp upstream of the identified BRE/TATA motifs and 8-13 bp downstream of the TSS. For construction of the C-less fusions the following oligos (Supplementary Table 6) were used: rrnA fw, CRISPR TSS1 fw, CRISPR TSS2 fw, and rpl12 fw, all with the C-less rev. Buffer conditions and Mja transcription factor concentrations for Mja in vitro transcription assays were as described in ref. 70, with 300 ng of SacI-linearised plasmid, the heparin concentration reduced to 5 µg/ml and a single incubation step at 65 °C for 15 min. A recovery marker was included in order to monitor possible losses during the nucleic acid purification prior to gel loading.
Northern blotting
Northern blotting was carried out as described previously71 using the low range RiboRuler RNA ladder (Fermentas) and probes constructed from oligonucleotide templates A3 sense and A3 antisense (Supplementary Table 6).
Statistical analysis
All graphs were produced using GraphPad Prism version 5 and the R statistical program67 with the package ggplot272. Correlations and statistical tests were performed using the R base install; details of specific tests are given where appropriate.
Supplementary Material
Acknowledgements
We are very grateful to Jürg Bähler and Daniel Bitton for helpful advice throughout this project. We would like to thank Tine Arnvig, Dina Grohmann and other members of the RNAP lab for encouragement and critical reading of the manuscript. Research in the RNAP laboratory at University College London is funded by Wellcome Trust Investigator Award WT096553MA (to FW).
Footnotes
Author contributions
KS designed research, performed experiments, analysed data and wrote the manuscript. FB performed experiments and wrote the manuscript. RR performed experiments. MT provided novel materials. FW conceptualised the study, designed research and wrote the manuscript.
Competing financial interests
The authors declare no competing financial interests.
Data availability
All sequencing data is available through the NCBI sequence read archive (SRA) with accession SRP089683 (ChIP) and SRP089689 (RNA).
References
- 1. Werner F. Structural evolution of multisubunit RNA polymerases. Trends Microbiol. 2008;16:247–250. doi: 10.1016/j.tim.2008.03.008.
- 2. Werner F, Grohmann D. Evolution of multisubunit RNA polymerases in the three domains of life. Nat Rev Microbiol. 2011;9:85–98. doi: 10.1038/nrmicro2507.
- 3. Korkhin Y, et al. Evolution of complex RNA polymerases: the complete archaeal RNA polymerase structure. PLoS Biol. 2009;7:e1000102. doi: 10.1371/journal.pbio.1000102.
- 4. Hirtreiter A, Grohmann D, Werner F. Molecular mechanisms of RNA polymerase - the F/E (RPB4/7) complex is required for high processivity in vitro. Nucleic Acids Res. 2010;38:585–596. doi: 10.1093/nar/gkp928.
- 5. Li E, Reich CI, Olsen GJ. A whole-genome approach to identifying protein binding sites: promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res. 2008;36:6948–6958. doi: 10.1093/nar/gkm499.
- 6. Zhang J, Li E, Olsen GJ. Protein-coding gene promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res. 2009;37:3588–3601. doi: 10.1093/nar/gkp213.
- 7. Bell SD, Kosa PL, Sigler PB, Jackson SP. Orientation of the transcription preinitiation complex in Archaea. Proc Natl Acad Sci U S A. 1999;96:13662–13667. doi: 10.1073/pnas.96.24.13662.
- 8. Rowlands T, Baumann P, Jackson SP. The Tata-Binding Protein - a General Transcription Factor in Eukaryotes and Archaebacteria. Science. 1994;264:1326–1329. doi: 10.1126/science.8191287.
- 9. Qureshi SA, Baumann P, Rowlands T, Khoo B, Jackson SP. Cloning and functional analysis of the TATA-binding protein from Sulfolobus shibatae. Nucleic Acids Res. 1995;23:1775–1781. doi: 10.1093/nar/23.10.1775.
- 10. Werner F, Weinzierl RO. A recombinant RNA polymerase II-like enzyme capable of promoter-specific transcription. Mol Cell. 2002;10:635–646. doi: 10.1016/s1097-2765(02)00629-9.
- 11. Gietl A, et al. Eukaryotic and archaeal TBP and TFB/TF(II)B follow different promoter DNA bending pathways. Nucleic Acids Res. 2014;42:6219–6231. doi: 10.1093/nar/gku273.
- 12. Blombach F, et al. Archaeal TFEalpha/beta is a hybrid of TFIIE and the RNA polymerase III subcomplex hRPC62/39. eLife. 2015;4:e08378. doi: 10.7554/eLife.08378.
- 13. Grohmann D, et al. The initiation factor TFE and the elongation factor Spt4/5 compete for the RNAP clamp during transcription initiation and elongation. Mol Cell. 2011;43:263–274. doi: 10.1016/j.molcel.2011.05.030.
- 14. Grünberg S, Bartlett MS, Naji S, Thomm M. Transcription factor E is a part of transcription elongation complexes. J Biol Chem. 2007;282:35482–35490. doi: 10.1074/jbc.M707371200.
- 15. Werner F. A nexus for gene expression-molecular mechanisms of Spt5 and NusG in the three domains of life. J Mol Biol. 2012;417:13–27. doi: 10.1016/j.jmb.2012.01.031.
- 16. Sevostyanova A, Svetlov V, Vassylyev DG, Artsimovitch I. The elongation factor RfaH and the initiation factor sigma bind to the same site on the transcription elongation complex. Proc Natl Acad Sci U S A. 2008;105:865–870. doi: 10.1073/pnas.0708432105.
- 17. Mayer A, et al. Uniform transitions of the general RNA polymerase II transcription complex. Nat Struct Mol Biol. 2010;17:1272–1278. doi: 10.1038/nsmb.1903.
- 18. Mooney RA, et al. Regulator trafficking on bacterial transcription units in vivo. Mol Cell. 2009;33:97–108. doi: 10.1016/j.molcel.2008.12.021.
- 19. Kadonaga JT. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip Rev Dev Biol. 2012;1:40–51. doi: 10.1002/wdev.21.
- 20. Nagy J, et al. Complete architecture of the archaeal RNA polymerase open complex from single-molecule FRET and NPS. Nat Commun. 2015;6:6161. doi: 10.1038/ncomms7161.
- 21. Schulz S, et al. TFE and Spt4/5 open and close the RNA polymerase clamp during the transcription cycle. Proc Natl Acad Sci U S A. 2016;113:E1816–1825. doi: 10.1073/pnas.1515817113.
- 22. Bell SD, Jaxel C, Nadal M, Kosa PF, Jackson SP. Temperature, template topology, and factor requirements of archaeal transcription. Proc Natl Acad Sci U S A. 1998;95:15218–15222. doi: 10.1073/pnas.95.26.15218.
- 23. Jäger D, Förstner KU, Sharma CM, Santangelo TJ, Reeve JN. Primary transcriptome map of the hyperthermophilic archaeon Thermococcus kodakarensis. BMC Genomics. 2014;15:684. doi: 10.1186/1471-2164-15-684.
- 24. Jäger D, et al. Deep sequencing analysis of the Methanosarcina mazei Go1 transcriptome in response to nitrogen availability. Proc Natl Acad Sci U S A. 2009;106:21878–21882. doi: 10.1073/pnas.0909051106.
- 25. Li J, et al. Global mapping transcriptional start sites revealed both transcriptional and post-transcriptional regulation of cold adaptation in the methanogenic archaeon Methanolobus psychrophilus. Sci Rep. 2015;5:9209. doi: 10.1038/srep09209.
- 26. Wurtzel O, et al. A single-base resolution map of an archaeal transcriptome. Genome Res. 2010;20:133–141. doi: 10.1101/gr.100396.109.
- 27. Babski J, et al. Genome-wide identification of transcriptional start sites in the haloarchaeon Haloferax volcanii based on differential RNA-Seq (dRNA-Seq). BMC Genomics. 2016;17:629. doi: 10.1186/s12864-016-2920-y.
- 28. Seitzer P, Wilbanks EG, Larsen DJ, Facciotti MT. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs. BMC Bioinformatics. 2012;13:317. doi: 10.1186/1471-2105-13-317.
- 29. Blombach F, Smollett KL, Grohmann D, Werner F. Molecular Mechanisms of Transcription Initiation-Structure, Function, and Evolution of TFE/TFIIE-Like Factors and Open Complex Formation. J Mol Biol. 2016;428:2592–2606. doi: 10.1016/j.jmb.2016.04.016.
- 30. Werner F, Weinzierl RO. Direct modulation of RNA polymerase core functions by basal transcription factors. Mol Cell Biol. 2005;25:8344–8355. doi: 10.1128/MCB.25.18.8344-8355.2005.
- 31. Qureshi SA, Bell SD, Jackson SP. Factor requirements for transcription in the Archaeon Sulfolobus shibatae. EMBO J. 1997;16:2927–2936. doi: 10.1093/emboj/16.10.2927.
- 32. Burmann BM, et al. A NusE:NusG complex links transcription and translation. Science. 2010;328:501–504. doi: 10.1126/science.1184953.
- 33. Proshkin S, Rahmouni AR, Mironov A, Nudler E. Cooperation between translating ribosomes and RNA polymerase in transcription elongation. Science. 2010;328:504–508. doi: 10.1126/science.1184939.
- 34. French SL, Santangelo TJ, Beyer AL, Reeve JN. Transcription and translation are coupled in Archaea. Mol Biol Evol. 2007;24:893–895. doi: 10.1093/molbev/msm007.
- 35. Brenneis M, Hering O, Lange C, Soppa J. Experimental characterization of Cis-acting elements important for translation and transcription in halophilic archaea. PLoS Genet. 2007;3:e229. doi: 10.1371/journal.pgen.0030229.
- 36. Torarinsson E, Klenk HP, Garrett RA. Divergent transcriptional and translational signals in Archaea. Environ Microbiol. 2005;7:47–54. doi: 10.1111/j.1462-2920.2004.00674.x.
- 37. Koide T, et al. Prevalence of transcription promoters within archaeal operons and coding sequences. Mol Syst Biol. 2009;5:285. doi: 10.1038/msb.2009.42.
- 38. Toffano-Nioche C, et al. RNA at 92 degrees C: the non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi. RNA Biol. 2013;10:1211–1220. doi: 10.4161/rna.25567.
- 39. Yang C, Bolotin E, Jiang T, Sladek FM, Martinez E. Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters. Gene. 2007;389:52–65. doi: 10.1016/j.gene.2006.09.029.
- 40. Qureshi SA. Role of the Sulfolobus shibatae viral T6 initiator in conferring promoter strength and in influencing transcription start site selection. Can J Microbiol. 2006;52:1136–1140. doi: 10.1139/w06-073.
- 41. Bell SD, Jackson SP. The role of transcription factor B in transcription initiation and promoter clearance in the archaeon Sulfolobus acidocaldarius. J Biol Chem. 2000;275:12934–12940. doi: 10.1074/jbc.275.17.12934.
- 42. Shultzaberger RK, Chen Z, Lewis KA, Schneider TD. Anatomy of Escherichia coli sigma70 promoters. Nucleic Acids Res. 2007;35:771–788. doi: 10.1093/nar/gkl956.
- 43. Basu RS, et al. Structural basis of transcription initiation by bacterial RNA polymerase holoenzyme. J Biol Chem. 2014;289:24549–24559. doi: 10.1074/jbc.M114.584037.
- 44. Ehrensberger AH, Kelly GP, Svejstrup JQ. Mechanistic interpretation of promoter-proximal peaks and RNAPII density maps. Cell. 2013;154:713–715. doi: 10.1016/j.cell.2013.07.032.
- 45. Hirtreiter A, et al. Spt4/5 stimulates transcription elongation through the RNA polymerase clamp coiled-coil motif. Nucleic Acids Res. 2010;38:4040–4051. doi: 10.1093/nar/gkq135.
- 46. Diamant G, Bahat A, Dikstein R. The elongation factor Spt5 facilitates transcription initiation for rapid induction of inflammatory-response genes. Nat Commun. 2016;7:11547. doi: 10.1038/ncomms11547.
- 47. Larochelle S, et al. Cyclin-dependent kinase control of the initiation-to-elongation switch of RNA polymerase II. Nat Struct Mol Biol. 2012;19:1108–1115. doi: 10.1038/nsmb.2399.
- 48. Arnvig KB, et al. Evolutionary comparison of ribosomal operon antitermination function. J Bacteriol. 2008;190:7251–7257. doi: 10.1128/JB.00760-08.
- 49. Werner F. Molecular mechanisms of transcription elongation in archaea. Chem Rev. 2013;113:8331–8349. doi: 10.1021/cr4002325.
- 50. Micorescu M, et al. Archaeal transcription: function of an alternative transcription factor B from Pyrococcus furiosus. J Bacteriol. 2008;190:157–167. doi: 10.1128/JB.01498-07.
- 51. Jensen KF, Pedersen S. Metabolic growth rate control in Escherichia coli may be a consequence of subsaturation of the macromolecular biosynthetic apparatus with substrates and catalytic components. Microbiol Rev. 1990;54:89–100. doi: 10.1128/mr.54.2.89-100.1990.
- 52. Winkelman JT, Chandrangsu P, Ross W, Gourse RL. Open complex scrunching before nucleotide addition accounts for the unusual transcription start site of E. coli ribosomal RNA promoters. Proc Natl Acad Sci U S A. 2016;113:E1787–E1795. doi: 10.1073/pnas.1522159113.
- 53. Peeters E, Driessen RP, Werner F, Dame RT. The interplay between nucleoid organization and transcription in archaeal genomes. Nat Rev Microbiol. 2015;13:333–341. doi: 10.1038/nrmicro3467.
- 54. Gehring AM, Walker JE, Santangelo TJ. Transcription Regulation in Archaea. J Bacteriol. 2016;198:1906–1917. doi: 10.1128/JB.00255-16.
- 55. Ouhammouch M, Dewhurst RE, Hausner W, Thomm M, Geiduschek EP. Activation of archaeal transcription by recruitment of the TATA-binding protein. Proc Natl Acad Sci U S A. 2003;100:5097–5102. doi: 10.1073/pnas.0837150100.
- 56. Ouhammouch M, Werner F, Weinzierl RO, Geiduschek EP. A fully recombinant system for activator-dependent archaeal transcription. J Biol Chem. 2004;279:51719–51721. doi: 10.1074/jbc.C400446200.
- 57. Churchman LS, Weissman JS. Native elongating transcript sequencing (NET-seq). Curr Protoc Mol Biol. 2012;14:11–17. Chapter 4 Unit 4. doi: 10.1002/0471142727.mb0414s98.
- 58. Nojima T, Gomes T, Carmo-Fonseca M, Proudfoot NJ. Mammalian NET-seq analysis defines nascent RNA profiles and associated RNA processing genome-wide. Nat Protoc. 2016;11:413–428. doi: 10.1038/nprot.2016.012.
- 59. Nojima T, et al. Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell. 2015;161:526–540. doi: 10.1016/j.cell.2015.03.027.
- 60. Jones WJ, Leigh JA, Mayer F, Woese CR, Wolfe RS. Methanococcus jannaschii Sp. Nov., an extremely thermophilic methanogen from a submarine hydrothermal vent. Arch Microbiol. 1983;136:254–261.
- 61. Smollett K, Blombach F, Werner F. Transcription in Archaea: preparation of Methanocaldococcus jannaschii transcription machinery. Methods Mol Biol. 2015;1276:291–303. doi: 10.1007/978-1-4939-2392-2_17.
- 62. Reichelt R, Gindner A, Thomm M, Hausner W. Genome-wide binding analysis of the transcriptional regulator TrmBL1 in Pyrococcus furiosus. BMC Genomics. 2016;17. doi: 10.1186/s12864-015-2360-0.
- 63. Liu W, Vierke G, Wenke AK, Thomm M, Ladenstein R. Crystal structure of the archaeal heat shock regulator from Pyrococcus furiosus: A molecular chimera representing eukaryal and bacterial features. J Mol Biol. 2007;369:474–488. doi: 10.1016/j.jmb.2007.03.044.
- 64. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
- 65. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10. doi: 10.1186/gb-2009-10-3-r25.
- 66. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 67. R Core Team. R: A language and environment for statistical computing. 2014.
- 68. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
- 69. Bailey TL, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–208. doi: 10.1093/nar/gkp335.
- 70. Smollett K, Blombach F, Werner F. Transcription in Archaea: in vitro transcription assays for mjRNAP. Methods Mol Biol. 2015;1276:305–314. doi: 10.1007/978-1-4939-2392-2_18.
- 71. Arnvig KB, Young DB. Identification of small RNAs in Mycobacterium tuberculosis. Mol Microbiol. 2009;73:397–408. doi: 10.1111/j.1365-2958.2009.06777.x.
- 72. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag; New York: 2009.