Abstract
A plethora of non-coding RNAs has been discovered using high-resolution transcriptomics tools, indicating that transcriptional and post-transcriptional regulation is much more complex than previously appreciated. Small RNAs associated with transcription start sites of annotated coding regions (TSSaRNAs) are pervasive in both eukaryotes and bacteria. Here, we provide evidence for the existence of TSSaRNAs in several archaeal transcriptomes, including Halobacterium salinarum, Pyrococcus furiosus, Methanococcus maripaludis, and Sulfolobus solfataricus. We validated TSSaRNAs from the model archaeon Halobacterium salinarum NRC-1 by deep sequencing two independent small-RNA enriched (RNA-seq) and a primary-transcript enriched (dRNA-seq) strand-specific libraries. We identified 652 transcripts, of which 179 were shown to be primary transcripts (associated with ∼7% of annotated protein-coding genes). Distinct growth-associated expression patterns between TSSaRNAs and their cognate genes were observed, indicating a possible role in environmental responses that may result from RNA polymerase with varying pausing rhythms. This work shows that TSSaRNAs are ubiquitous across all domains of life.
Introduction
Molecular mechanisms that are conserved throughout evolution, or that arise independently to perform similar tasks, are of major interest to biology [1]. Evolutionary conservation and convergence are strong indicators of important biological functions. Understanding commonalities and differences across organisms from all three domains of life has therefore served as a powerful means to discover and characterize important molecular mechanisms.
The roles of non-coding RNA (ncRNA) molecules have proven to be especially elusive. Only recently, high-throughput technologies have revealed that ncRNAs have important functions across diverse biological systems and processes [2], [3]. Among the newly discovered ncRNAs is an intriguing class of transcription start site associated RNAs (TSSaRNAs) that have thus far been observed in eukaryotes and bacteria [4]–[7].
Based on their location, TSSaRNAs have been speculated to play a role in transcription initiation [5], [6], [8]; and based on their tissue-specific regulation they have also been putatively implicated in epigenetic regulation [5], [9]. TSSaRNAs have also been reported in bacteria, where it has been suggested that they could be part of a regulatory mechanism that prevents transcription initiation until a functional RNA polymerase complex has assembled [4]. In both eukaryotes and bacteria, the production of these transcripts seems to be associated with stalled RNA polymerase [4]–[6]. The RNA polymerase pausing model is the most widely accepted hypothesis for TSSaRNA biogenesis, and its functional implications are still under investigation [10], [11].
Regardless, TSSaRNA ubiquity across eukaryotes and bacteria suggests that TSSaRNAs are ancient and must have been present in LUCA. Discovery of TSSaRNAs in archaea would lend credibility to this hypothesis and provide clues into why they are evolutionarily conserved across all organisms.
Results and Discussion
Discovery of TSSaRNAs in the third domain of life
In the present work, we investigated whether TSSaRNAs do indeed exist in archaea and are thus ubiquitous across all three domains of life. By mining publicly available data, we gathered evidence for TSSaRNAs in 10 archaeal transcriptomes (H. salinarum, M. maripaludis, S. solfataricus, P. furiosus, N. equitans, M. kandleri, H. volcanii, M. psychrophilus, M. mazei and P. abyssi [12]–[21], see supplemental material), including compendia of gene expression profiles over growth curves for 4 organisms: H. salinarum [14], M. maripaludis [12], S. solfataricus [12], [13] and P. furiosus [12] (Figure 1). We mined publicly available gene expression datasets from GEO [22] (http://www.ncbi.nlm.nih.gov/geo/), SRA [23] (http://www.ncbi.nlm.nih.gov/sra) and UCSC Archaeal Genome Browser [24] (http://archaea.ucsc.edu/). Datasets not available in public databases were obtained directly from publications.
Expression of a putative TSSaRNA, measured either by hybridization intensities or by read coverage, had a distinct signature characterized by a sharp rise in signal that plateaus over a small distance and then decays precipitously. This signature was conserved across most transcriptomes that were analyzed, and across all sequencing (Illumina, SOLiD and Roche 454) and microarray (NimbleGen and Agilent) platforms, and all library construction protocols (strand-specific and non-strand specific) [12]–[21] (Figure S1). To search for TSSaRNAs in every archaeal organism, all datasets were manually inspected.
TSSaRNAs in H. salinarum NRC-1
The consistency of TSSaRNA discovery across all platforms and organisms justified further experimentation for independent validation. H. salinarum is a model organism for halophilic archaea and has been extensively studied in the last decade. It became a prime model to study aspects of gene expression regulation, especially due to the establishment of predictive quantitative models with high accuracy [25].
In order to precisely map TSSaRNAs in H. salinarum NRC-1, we performed a strand-specific RNA-seq experiment using non-fragmented small RNAs in the 20–230 bp range. Two biological replicates were extracted from cultures under standard growth conditions [26]. From these replicates, approximately 3.5 million reads were aligned to the H. salinarum NRC-1 genome.
The reads from TSSaRNAs create a surplus in coverage values when taken together with reads from the cognate gene (Figure 2, Figure S2). A given genomic location can have two sets of aligned reads starting exactly there: (i) reads from transcripts longer than 151 nt but truncated at any length, up to the maximum sequencing length limit (151 bp) and (ii) identical full-length reads from transcripts shorter than 151 nt. Although both sets map to the same initial position, only the latter repeatedly show the same start and end genomic coordinates. We used relative enrichment of the aligned start position as a feature to automatically detect TSSaRNAs (peaks in “start counts” profiles in Figure 2). Using this approach, we discovered 652 TSSaRNAs that were evenly distributed on both strands, and associated with 25% of all annotated protein coding genes.
To distinguish between processed and primary transcripts, we performed a dRNA-seq experiment [27]. Since primary transcripts have their 5′ ends intact, a TEX (Terminator 5′-Phosphate-Dependent Exonuclease) enzyme treatment would enrich a sample for them. By comparing sequenced reads from treated (TEX+) and control (TEX-) libraries, it is possible to identify primary TSSaRNAs. Using this approach, we refined our observations and defined 179 primary TSSaRNAs that were evenly distributed on both strands, and associated with 7% of all annotated protein coding genes (Table S1). It is important to note that dRNA-seq experiments are prone to false negatives [28], [29]; thus, it is possible that more than 179 TSSaRNAs exist. To make the association of TSSaRNAs with transcription start sites (TSS) robust, we chose to further investigate only those small RNAs strictly correlated with primary TSS positions validated by dRNA-seq data.
The TSSaRNA sizes in H. salinarum ranged from 16 nt to 146 nt with a median size of 27 nt (Figure S3A). The distribution and median size of TSSaRNAs were consistent across many organisms: murine (range: 20 nt to 90 nt, median 20 nt) [6]; human, chicken and fruit fly (range in all three: 13 nt to 28 nt, median 18 nt) [5]. By contrast, the distribution of TSSaRNA sizes in some bacterial organisms was much narrower, e.g., E. coli (range: 33 nt to 40 nt) and M. pneumoniae (range: 35 nt to 55 nt, few TSSaRNAs up to ∼100 nt) [4]. The proximal locations of TSSaRNAs to translation initiation sites of cognate genes (Figure S3B) are consistent with previous observations that most transcripts in H. salinarum are leaderless [14]. As in bacteria and eukaryotes, the distribution of TSSaRNA locations shows that some TSS are internal to annotated CDSs, which may point to imprecise structural annotation or to alternative transcripts.
Transcriptome data indicates multiple and time-varying RNA polymerase pausing sites
The current understanding is that the production of TSSaRNA transcripts is associated with stalled RNA polymerase during cognate gene transcription in eukaryotes and bacteria [4]–[6]. This polymerase pausing hypothesis is becoming the prime biogenesis model for TSSaRNAs and is bringing key insights into gene expression regulation [10], [11], eclipsing alternative hypotheses such as degradative 3′ end processing or non-degradative (cleavage) gene processing.
In archaea, the absence of a set of RNA-seq reads starting just before TSSaRNA reads' ends (Figure S4) argues against the cleavage biogenesis hypothesis. Moreover, the observation that TSSaRNA compositional/thermodynamical properties are no different from similar regions in non-cognate gene sequences (Figure S5) argues against the degradative biogenesis hypothesis, following the same rationale put forward by Yus et al. [4]. Unsurprisingly, given that the molecular mechanisms involved in RNA polymerase pausing are complex [27] and often involve gene specific structures [28], there were no clear pausing site signatures in the vicinity of all 179 primary TSSaRNA 3′ ends, or even considering all 652 putative TSSaRNAs. Altogether, we have no evidence to suggest that archaea alone would present a biogenesis process other than RNA polymerase pausing. To explore the properties of this hypothesis, we created a simple computational model of the RNA polymerase pausing biogenesis scenario (File S1, File S2). This model explores only two parameters for RNA polymerase: elapsed time paused at any given genomic location and time between successive transcription initiation events (Figure S6).
Using multiple pausing sites along a gene with different retention times, the model explains a recurrent RNA-seq experimental observation in our datasets: an ensemble of full-length reads aligned at the same starting position, but with different sizes. We validated this model's implication by performing classical northern-blot experiments for two highly expressed genes: one showing signs of multiple pausing sites (VNG0101G) and one derived from a single pausing site (VNG1213C). VNG0101G encodes a conserved cold shock protein and was selected for further validation since the signal associated with its TSSaRNA was top ranked in tiling array experiments [14]. Notwithstanding the low sensitivity of northern blots for detecting low abundance RNAs [5], the 26 nt TSSaRNA was observed as a distinct band along with its cognate gene transcript (Figure 3B). Along with the northern-blot band directly corresponding to the most frequent reads aligned at VNG0101G's TSS position (Figure 3A), it is possible to see other, weaker bands, whose sizes also correspond to less abundant RNA-seq reads. The computational model can easily recapitulate these observations by using multiple retention positions and times (File S1). If, on the other hand, only one genomic position stalls an RNA polymerase, then only one type of small molecule associated with the TSS would be created. This case is also observed experimentally for the VNG1213C gene, a probable exonuclease: RNA-seq data show a population of reads concentrated around 72 nt, which corresponds directly to the single band found in the northern-blot experiment (Figure 3CD). Therefore, our transcriptome data indicate that it is possible to find multiple RNA polymerase pausing sites along a gene sequence.
Remarkably, it was clear from gene expression profiles that the dynamical behavior of a TSSaRNA may be distinct from that of its cognate gene. In some cases, the cognate gene level does not change, but expression of the TSSaRNA has distinct dynamics, with up to 16 fold up-regulation or down-regulation to different degrees (Figure 4AC). We also observed instances when both TSSaRNA and cognate gene were differentially regulated, albeit with different patterns (Figure 4BD). Imposing stringent criteria, we identified at least 10 TSSaRNAs differentially expressed relative to their cognate genes (Table S2, Figure S7). Such differential expression patterns would not be expected if transcription of a TSSaRNA and the full-length transcript of its cognate gene were not regulated by environmental signals, nor could they arise as an experimental artifact of tiling array hybridization and processing. Using pausing sites that can vary their retention time along the growth curve, the RNA polymerase pausing model explains our experimental observation that TSSaRNAs can have distinct dynamical behavior relative to their cognate genes. Although counterintuitive, it is possible to generate dynamical profiles such as the ones where TSSaRNA levels remain constant while the cognate gene varies, and vice versa, by exploring only the two parameters of the model: elapsed time spent paused and time between successive transcription initiation events (Figure S8, File S2).
Therefore, our transcriptome analysis indicates that the RNA polymerase pausing rhythm is probably regulated in response to environmental perturbations. Future experimental work should reveal how this rhythm may be tuned and what the implications of this regulation are.
Conclusions
In this study we demonstrated that TSSaRNAs are also present in archaea. Our findings complement previous discoveries of these ncRNAs in eukaryotes and bacteria, to show that TSSaRNAs are ubiquitous in all domains of life. Furthermore, the northern-blot banding patterns in our experiment were consistent with previous observations in eukaryotes and bacteria [6], [30], suggesting that TSSaRNAs may be accompanied by a population of transcripts. The prevalent TSSaRNA biogenesis hypothesis, RNA polymerase pausing, would easily explain these patterns as well as our observation of TSSaRNA/cognate gene differential expression. Comparative transcriptome analysis among all domains of life will be critical for elucidating the precise roles played by TSSaRNAs, in order to explain why they are evolutionarily conserved.
Materials and Methods
Data-mining on archaeal gene expression datasets
To verify the presence of TSSaRNAs in archaea, we mined publicly available archaeal gene expression datasets from GEO [22], SRA [23] and UCSC Archaeal Genome Browser [24].
In this study we analyzed the transcriptomes of 11 archaea: Halobacterium salinarum NRC-1, Pyrococcus furiosus DSM 3638, Methanococcus maripaludis S2, Sulfolobus solfataricus P2, Nanoarchaeum equitans Kin4-M, Methanopyrus kandleri AV19, Sulfolobus acidocaldarius MW001, Haloferax volcanii DS2, Methanolobus psychrophilus R15, Methanosarcina mazei Gö1 and Pyrococcus abyssi [12]–[21]. Only the S. acidocaldarius data lacked sufficient coverage to clearly show at least one TSSaRNA signature. Therefore, our observations were made for 10 organisms. Archaeal transcriptomes for which dynamical information was available were highlighted in this work: Halobacterium salinarum NRC-1 [14], Pyrococcus furiosus DSM 3638 [12], Methanococcus maripaludis S2 [12] and Sulfolobus solfataricus P2 [12], [13]. Original accession numbers for these datasets are: GSE13150, GSE18630, GSE38821, GSE26782, GSE44979, SRP028191, SRX188664. Datasets not available in public databases were obtained directly from publications. A brief description of each dataset used is provided in Table S3.
The expression signal at putative TSSaRNA locations is a distinct signature characterized by a sharp rise in signal that plateaus over a relatively small distance and then decays precipitously. Tiling array probe intensities and log ratio data for all growth curve time points were obtained from GEO and processed as described in [14]. Heatmaps for expression profiles over the growth curve were relative to a reference growth condition and visualized in Gaggle Genome Browser [31]. Raw RNA-seq datasets were processed by: i) trimming each library using FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) to remove adapters; ii) mapping against appropriate reference genomes using Bowtie [32]; and iii) visualizing non-normalized read coverage as a proxy for gene expression using the integrative tool Gaggle Genome Browser [31].
Cell cultivation and small RNA isolation
H. salinarum NRC-1 was grown in CM media, in a water bath incubator at 37°C with agitation of 125 r.p.m. Reference samples were cultured under standard growth conditions [26], at mid-log phase (OD600≈0.5). Small RNAs for RNA-seq libraries and total RNAs for dRNA-seq libraries and northern blot experiments were isolated using the MirVana RNA extraction kit (Ambion).
RNA-seq library preparation, sequencing and pre-processing
Two small RNA libraries (biological replicates) from H. salinarum NRC-1 were prepared for sequencing. Small RNAs were extracted from mid-log phase cultures. For each sample, 10 µg of small RNAs were treated with RNase-free DNase I (Fermentas) in a final volume of 30 µL. The reaction was incubated for 45 min at 37°C and the RNA was purified using phenol/chloroform purification. 1 µg of treated small RNA was ligated to RNA 3′ Adapter (RA3) using T4 RNA ligase 2 truncated (BioLabs) for 1 hour at 28°C, in the presence of RNase inhibitor. Once RA3 was ligated, we performed the RNA 5′ Adapter (RA5) ligation using T4 RNA ligase in the presence of 10 mM ATP. cDNA was synthesized using specific oligos for 5′ and 3′ adapters using SuperScript III Reverse Transcriptase, according to the Illumina TruSeq protocol. cDNA libraries were amplified and samples were separated in a Novex 6% PAGE gel. cDNAs from 20 bp up to 230 bp were isolated from the gel and subjected to quantification and quality analysis.
The resulting double stranded cDNA was sequenced on the Illumina MiSeq v2 platform. Biological replicates were sequenced in the same flow-cell using different indexes. Strand-specific sequencing was performed on the MiSeq set to 151 cycles per manufacturer's instructions.
Reads were trimmed using FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/links.html) to remove adapters. Processed reads were aligned against the H. salinarum NRC-1 reference genome (chromosome: NC_002607, plasmid pNRC100: NC_001869.1 and plasmid pNRC200: NC_002608.1) using Bowtie [32] with default parameters, except for the “m” parameter, which was set so that ambiguous alignments were discarded. Overall, 3,489,281 aligned reads from the combined biological replicates were considered in subsequent analysis.
RNA-Seq data were submitted to NCBI's SRA website under the accession number SRP035406.
TSSaRNA definition in H. salinarum
Since H. salinarum small RNA libraries were made without fragmentation, we can observe two sets of reads consistently aligned at the same position near the start codon of a gene: (i) reads marking the transcription start site (TSS) of the gene itself, truncated at diverse lengths up to 151 bp; and (ii) reads smaller than 151 bp consistently found with the same 3′ end, thus, being full-length reads (Figure S9). Type (ii) reads are generally generated by TSSaRNAs.
We used relative enrichment of reads' aligned start coordinates as a parameter to automatically detect TSSaRNAs. We looked for the most frequent start coordinate near the start codon of a CDS. The search was performed in a window starting 50 bp upstream of the translation start site and comprising at most 20% of the CDS length. To make sure that the TSS is reliable, the reads starting there must sum to more than 20 counts. This procedure can detect TSSs, but it is still necessary to separate TSSaRNA and cognate gene signals. To isolate the TSSaRNA signal, the most abundant read shorter than 151 bp is defined as the TSSaRNA full-length sequence. All other reads starting at the same position are attributed to the cognate gene. To be conservative, TSSaRNA reads are only retained if they sum to at least 10 counts.
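The detection rule above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the code used in the study: the function name, the `(start, end)` read representation, and the restriction to forward-strand genes are assumptions, and the search window is interpreted as running from 50 bp upstream of the start codon to 20% of the CDS length downstream of it.

```python
from collections import Counter

MAX_READ_LEN = 151  # sequencing length limit stated in the text


def detect_tssarna(reads, cds_start, cds_length,
                   upstream=50, max_cds_fraction=0.20,
                   min_tss_count=20, min_tssarna_count=10):
    """Return a TSSaRNA call for one forward-strand CDS, or None.

    reads: iterable of (start, end) pairs, 1-based inclusive coordinates
    (hypothetical input format chosen for this sketch).
    """
    window_lo = cds_start - upstream
    window_hi = cds_start + int(max_cds_fraction * cds_length)

    # Most frequent read start coordinate inside the search window.
    starts = Counter(s for s, _ in reads if window_lo <= s <= window_hi)
    if not starts:
        return None
    tss, tss_count = starts.most_common(1)[0]
    if tss_count <= min_tss_count:  # the text requires more than 20 counts
        return None

    # Among reads starting at the TSS, keep those shorter than the read
    # length limit; identical short reads are treated as full-length.
    lengths = Counter(e - s + 1 for s, e in reads
                      if s == tss and (e - s + 1) < MAX_READ_LEN)
    if not lengths:
        return None
    length, count = lengths.most_common(1)[0]
    if count < min_tssarna_count:  # the text requires at least 10 counts
        return None

    return {"tss": tss, "length": length,
            "tssarna_reads": count, "cognate_reads": tss_count - count}
```

A call such as `detect_tssarna(reads, cds_start=100, cds_length=600)` returns the TSS coordinate, the TSSaRNA length, and the split of reads between the TSSaRNA and its cognate gene. Reverse-strand genes would need mirrored coordinates, which this sketch leaves out.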
dRNA-seq library preparation, sequencing and analysis
Total RNAs were treated with Turbo™ DNase (Ambion) and incubated with Terminator™ 5′-phosphate-dependent exonuclease (Epicentre) (TEX+ sample) or only in reaction buffer (TEX- sample) at 30°C for 60 min, at a proportion of 1 U TEX per 1 µg total RNA. Reaction products were purified with RNeasy MinElute Cleanup Kit (QIAGEN) and incubated with 1 U of Tobacco Acid Pyrophosphatase (TAP) (Epicentre) at 37°C for 1 hour in order to generate 5′-monophosphate RNAs able to bind to sequencing adapters. Reactions were purified again with RNeasy MinElute Cleanup Kit (QIAGEN).
Sequencing libraries were prepared with 1 µg of treated (TEX+) and untreated (TEX-) samples using a protocol similar to the one described above for RNA-seq experiments. To ensure sequencing of a wider range of transcripts we increased the extension time in the cDNA amplification step to 1 min and isolated molecules from 20 bp up to ∼480 bp on the gel. Paired-end sequencing was performed on the Illumina MiSeq v2 platform using a 300-cycle kit. Forward reads were trimmed and mapped to the reference genome using Bowtie [32] as previously described. 435,339 reads corresponding to TSSaRNAs were used in subsequent analysis. TSSaRNAs presenting at least a 95% read enrichment in the TEX+ library relative to the TEX- library were considered primary transcripts.
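The enrichment criterion can be illustrated with a short Python sketch. The text does not spell out how the 95% enrichment was computed, so the definition used here is an assumption: read counts are scaled to reads per million by library size, and enrichment is the relative increase of the TEX+ value over the TEX- value. The function name and arguments are hypothetical.

```python
def is_primary_transcript(tex_plus_reads, tex_minus_reads,
                          tex_plus_library_size, tex_minus_library_size,
                          min_enrichment=0.95):
    """Classify a TSSaRNA as primary from TEX+ / TEX- read counts.

    Assumption: enrichment = (plus - minus) / minus after scaling each
    count to reads per million of its own library.
    """
    plus = tex_plus_reads * 1e6 / tex_plus_library_size
    minus = tex_minus_reads * 1e6 / tex_minus_library_size
    if minus == 0:
        # No signal in the control library: call it enriched only if
        # the treated library has any reads at all (sketch convention).
        return plus > 0
    return (plus - minus) / minus >= min_enrichment
```

Under this reading, a TSSaRNA with 200 reads in TEX+ and 100 reads in TEX- (equal library sizes) has an enrichment of 100% and would be called primary, whereas 150 versus 100 reads (50%) would not.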
Northern-blot
For Northern-blot analyses, 30 µg of total RNA treated with RNase-free DNase I (Fermentas) was separated on polyacrylamide gel (8% acrylamide:bisacrylamide [29∶1], 8 M urea, 1xTris–borate–EDTA buffer). RNAs were transferred to Hybond-N+ membranes (GE Healthcare) and hybridized with 32P-labeled oligonucleotides (5′-AGTGTCGTTGAAGAAGTCAACTTCGCCTGTCGCCATTGCAACT-3′ for VNG0101G and 5′-AAAAGTGGCCGTGGGCAGCGGCCACCCGAT-3′ for VNG1213C) using Rapid-hyb buffer (GE Healthcare). Signals were detected by autoradiography using a M35A X-Omat Processor (Kodak). Genes encoding a conserved cold-shock protein (VNG0101G) (updated annotation: Supplementary Material 2 table from [14]) and a probable exonuclease (VNG1213C) (updated annotation: UCSC Archaeal Genome Browser [24] and HaloLex project [33]) were chosen for this analysis.
Promoter and structural analysis of TSSaRNA sequences
DNA sequences of 11 bp around TSSaRNA 3′ ends were analyzed for conserved patterns using MEME with default parameters [34] in order to identify possible RNA polymerase pausing site motifs. Secondary structures of TSSaRNAs were predicted using the GeneRfold Bioconductor package interface for the Vienna RNA library [35]. In this analysis, Gibbs free energies of predicted structures derived from TSSaRNA sequences were compared to those of sequences derived from similar regions of non-cognate genes.
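As an illustration of the first step, the sketch below extracts an 11 bp window around a TSSaRNA 3′ end so that the windows can be passed to a motif finder. It assumes the window is centred on the 3′ end (5 bp on each side), which the text does not state explicitly; the function names are hypothetical, and replicon circularity is ignored.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_window(genome, three_prime_pos, strand, flank=5):
    """Return the (2 * flank + 1) bp window around a TSSaRNA 3' end.

    genome: replicon sequence as a string.
    three_prime_pos: 1-based forward-strand coordinate of the 3' end.
    strand: "+" or "-"; minus-strand windows are reverse complemented so
    that every window reads 5' to 3' along the transcript.
    Returns None if the window would run off the sequence.
    """
    lo = three_prime_pos - flank
    hi = three_prime_pos + flank
    if lo < 1 or hi > len(genome):
        return None
    window = genome[lo - 1:hi]
    return reverse_complement(window) if strand == "-" else window
```

Collecting these windows for all TSSaRNAs into a FASTA file would give the kind of input that a motif search expects.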
TSSaRNAs differential expression analysis
Differential expression of TSSaRNAs in H. salinarum NRC-1 was computed from a published dataset generated by tiling array hybridization of total RNA from 13 time points over a growth curve [14]. Using TSSaRNA sequence coordinates defined by single-base resolution RNA-seq, we revisited the hybridization data and automatically selected the tiling array probe that best fits each TSSaRNA. The selected probe was required to have the highest TSSaRNA sequence coverage and, at the same time, not to extend beyond the TSSaRNA end (Figure S10). We compared the intensity of the TSSaRNA representative probe with the median intensity of the upstream region and with the intensity of the cognate gene. To be considered differentially expressed, this probe must have a substantial difference in relative intensity when compared to the other cognate gene probes and to its surroundings (Figure S11). A TSSaRNA probe must show at least a 10-fold difference relative to the overall relative intensity of its cognate gene: $V = M_{\mathrm{TSSaRNA}} - M_{\mathrm{cognate}} \geq 1$, where $M = \log_{10}(t/t_{\mathrm{ref}})$, $t_{\mathrm{ref}}$ is taken at the reference time point in [14], $t$ is taken at the growth curve time point at which the second largest $|V|$ is seen, and $M_{\mathrm{cognate}}$ is the median over all cognate gene probes starting beyond the TSSaRNA 3′ end. The same procedure is also applied to an upstream region to make sure that the TSSaRNA probe signal is not merely a continuation of an adjacent transcript signal. Therefore, a differentially expressed probe must also show at least a 2-fold difference relative to the overall relative intensity of an upstream region. This upstream region is 300 bp long and lies 120 bp away from the TSSaRNA start (Figure S10). If there is an annotated gene closer than 200 bp to the TSSaRNA start, the aforementioned region is ignored and the whole adjacent CDS region is considered for probe averaging.
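The two thresholds can be expressed compactly in Python. This is a sketch under stated assumptions rather than the pipeline used for Table S2: inputs are log10 ratios per time point, the comparison uses the absolute value of V (the text defines V ≥ 1 but selects the time point by |V|), and the upstream test is evaluated at the same time point as the cognate test. A 2-fold difference corresponds to log10(2) ≈ 0.301 on this scale. Function and argument names are hypothetical.

```python
import math
from statistics import median


def is_differentially_expressed(m_tssarna, m_cognate_probes, m_upstream_probes,
                                cognate_threshold=1.0,
                                upstream_threshold=math.log10(2)):
    """Apply the 10-fold (cognate) and 2-fold (upstream) criteria.

    m_tssarna: list of M = log10(t / t_ref) for the TSSaRNA probe,
        one value per growth curve time point (at least two).
    m_cognate_probes, m_upstream_probes: lists of per-probe lists with
        the same time point layout.
    """
    n_points = len(m_tssarna)
    # V at each time point: TSSaRNA probe minus median of cognate probes.
    v_cognate = [
        m_tssarna[t] - median(probe[t] for probe in m_cognate_probes)
        for t in range(n_points)
    ]
    # Time point showing the second largest |V|.
    ranked = sorted(range(n_points), key=lambda t: abs(v_cognate[t]),
                    reverse=True)
    t_star = ranked[1]

    v_upstream = m_tssarna[t_star] - median(
        probe[t_star] for probe in m_upstream_probes)
    return (abs(v_cognate[t_star]) >= cognate_threshold
            and abs(v_upstream) >= upstream_threshold)
```

Because the decision rests on the second largest |V|, a difference seen at a single time point is not enough to pass the test; at least two time points must exceed the threshold.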
RNA polymerase pausing computational model
A simple RNA polymerase pausing model was created (Figure S6) and implemented in the R programming language (File S1, File S2). The model assigns a waiting time to each base position along a virtual gene. For simplicity, this waiting time is taken to be 1 arbitrary time unit. An RNA polymerase pausing site is a position where a moving RNA polymerase stalls for more than the default waiting time. This time is called “stalled time” (Δt). There is an “intrinsic transcription initiation time interval” (Δτ), which is the time between two successive RNA polymerases starting their trajectory along the gene from the first base pair to the gene's end. These two time intervals are the main parameters of the model. Other auxiliary parameters are: gene length L, pause position L′ and total simulation elapsed time T. An RNA polymerase is not allowed to keep traveling along the gene if there is another one stalled at the next base pair. In this case it releases its transcript and detaches from DNA, terminating the transcription process. The stalled RNA polymerase that blocked the previous one is not affected and only keeps moving forward when its waiting time at the pausing site is up.
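The rules of the model are simple enough to restate as a short discrete-time simulation. The sketch below is an independent Python re-expression written from the description above, not a port of File S1 or File S2; the function name, the single pause site, and the update order within a time step are assumptions.

```python
from collections import Counter


def simulate_pausing(gene_length, pause_pos, stall_time,
                     init_interval, total_time):
    """Count released transcripts by length.

    gene_length: L, pause_pos: L', stall_time: stalled time at L',
    init_interval: time between successive initiations,
    total_time: T. All times are in arbitrary integer units.
    """
    dwell = [1] * (gene_length + 1)   # dwell[p]: time spent at position p
    dwell[pause_pos] = stall_time
    polymerases = []                  # [position, remaining], leading first
    transcripts = Counter()

    for t in range(total_time):
        # A new polymerase starts at position 1 every init_interval units.
        if t % init_interval == 0 and all(p[0] != 1 for p in polymerases):
            polymerases.append([1, dwell[1]])

        survivors, occupied = [], set()
        for pos, remaining in polymerases:   # leading polymerase first
            remaining -= 1
            if remaining > 0:                # still waiting at this position
                survivors.append([pos, remaining])
                occupied.add(pos)
            elif pos == gene_length:         # reached the end of the gene
                transcripts[gene_length] += 1
            elif pos + 1 in occupied:        # blocked by a stalled polymerase
                transcripts[pos] += 1        # releases a transcript of length pos
            else:                            # advance by one base
                survivors.append([pos + 1, dwell[pos + 1]])
                occupied.add(pos + 1)
        polymerases = survivors
    return transcripts
```

With `stall_time=1` every polymerase runs to the end and only full-length transcripts are produced. When the stalled time exceeds the initiation interval, trailing polymerases collide with the stalled one and release short transcripts of length L′ − 1 in this formulation, and the ratio of short to full-length products changes as either of the two time parameters is varied.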
Supporting Information
Acknowledgments
We thank the FMRP MiSeq sequencing facility; Carolina Marcano and Guilherme Mendes from Illumina Corp. We thank Prof. Angela K. Cruz's laboratory for helping with radioactive experiments. We are especially grateful to Diego M. Salvanha for extremely helpful discussions on Gaggle Genome Browser usage; to all LaBiSisMi (Laboratório de Biologia Sistêmica de Microorganismos) members, especially Sílvia Epifânio and José Vicente Gomes Filho. We thank the anonymous reviewers for extremely helpful criticism that improved our work substantially.
Data Availability
The authors confirm that all data underlying the findings are fully available without restriction.
Funding Statement
This work was supported by Projeto Jovem Pesquisador em Centros Emergentes da Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, http://fapesp.br/en/) [09/09532-0 to TK]; Edital Universal do Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [473660/2013-0 to TK, 470120/2009-6 to TK, 476724/2013-9 to RZNV]; Fundação de Apoio ao Ensino, Pesquisa e Assistência do Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto da Universidade de São Paulo (FAEPA) [1640/2009 to TK]; Núcleo de Pesquisa em Ciência Genômica (NAP-CG) da Universidade de São Paulo; and fellowships FAPESP [11/07487-7 to LSZ and 11/14455-4 to FC]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Stern DL (2013) The genetic causes of convergent evolution. Nat Rev Genet 14: 751–764 Available: http://www.ncbi.nlm.nih.gov/pubmed/24105273 Accessed 11 December 2013 [DOI] [PubMed] [Google Scholar]
- 2. Gong H, Vu G-P, Bai Y, Chan E, Wu R, et al. (2011) A Salmonella small non-coding RNA facilitates bacterial invasion and intracellular replication by modulating the expression of virulence factors. PLoS Pathog 7: e1002120 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3174252&tool=pmcentrez&rendertype=abstract Accessed 31 May 2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lease RA, Smith D, Mcdonough K, Belfort M (2004) The Small Noncoding DsrA RNA Is an Acid Resistance Regulator in Escherichia coli. J Bacteriol 186 . doi:10.1128/JB.186.18.6179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Yus E, Güell M, Vivancos AP, Chen W-H, Lluch-Senar M, et al. (2012) Transcription start site associated RNAs in bacteria. Mol Syst Biol 8: 585 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3377991&tool=pmcentrez&rendertype=abstract Accessed 20 January 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, et al. (2009) Tiny RNAs associated with transcription start sites in animals. Nat Genet 41: 572–578 Available: http://www.ncbi.nlm.nih.gov/pubmed/19377478 Accessed 5 March 2013 [DOI] [PubMed] [Google Scholar]
- 6. Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, et al. (2008) Divergent transcription from active promoters. Science 322: 1849–1851 Available: http://www.ncbi.nlm.nih.gov/pubmed/19597342 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Cserzo M, Turu G, Varnai P, Hunyady L (2010) Relating underrepresented genomic DNA patterns and tiRNAs: the rule behind the observation and beyond. Biol Direct 5: 56 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3583238&tool=pmcentrez&rendertype=abstract [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Henriques T, Gilchrist DA, Nechaev S, Bern M, Muse GW, et al. (2013) Stable Pausing by RNA Polymerase II Provides an Opportunity to Target and Integrate Regulatory Signals. Mol Cell: 1–12. Available: http://www.ncbi.nlm.nih.gov/pubmed/24184211. Accessed 6 November 2013. [DOI] [PMC free article] [PubMed]
- 9. Taft RJ, Hawkins PG, Mattick JS, Morris KV (2011) The relationship between transcription initiation RNAs and CCCTC-binding factor (CTCF) localization. Epigenetics Chromatin 4: 13 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3170176&tool=pmcentrez&rendertype=abstract Accessed 16 October 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Buckley MS, Kwak H, Zipfel WR, Lis JT (2014) Kinetics of promoter Pol II on Hsp70 reveal stable pausing and key insights into its regulation. Genes Dev 28: 14–19 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3894409&tool=pmcentrez&rendertype=abstract Accessed 11 July 2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Jonkers I, Kwak H, Lis JT (2014) Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons: 1–25. doi:10.7554/eLife.02407. [DOI] [PMC free article] [PubMed]
- 12. Yoon SH, Reiss DJ, Bare JC, Tenenbaum D, Pan M, et al. (2011) Parallel evolution of transcriptome architecture during genome reorganization. Genome Res 21: 1892–1904 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3205574&tool=pmcentrez&rendertype=abstract Accessed 14 February 2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons BA, et al. (2010) A single-base resolution map of an archaeal transcriptome. Genome Res 20: 133–141 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2798825&tool=pmcentrez&rendertype=abstract Accessed 21 May 2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Koide T, Reiss DJ, Bare JC, Pang WL, Facciotti MT, et al. (2009) Prevalence of transcription promoters within archaeal operons and coding sequences. Mol Syst Biol 5: 1–16 Available: http://www.ncbi.nlm.nih.gov/pubmed/19536208 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Chen Z, Yu H, Li L, Hu S, Dong X (2012) The genome and transcriptome of a newly described psychrophilic archaeon, Methanolobus psychrophilus R15, reveal its cold adaptive characteristics. Environ Microbiol Rep 4: 633–641 Available: http://www.ncbi.nlm.nih.gov/pubmed/23760934 Accessed 14 February 2014 [DOI] [PubMed] [Google Scholar]
- 16. Jäger D, Sharma CM, Thomsen J, Ehlers C, Vogel J, et al. (2009) Deep sequencing analysis of the Methanosarcina mazei Gö1 transcriptome in response to nitrogen availability. PNAS 106: 21878–21882. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Märtens B, Amman F, Manoharadas S, Zeichen L, Orell A, et al. (2013) Alterations of the Transcriptome of Sulfolobus acidocaldarius by Exoribonuclease aCPSF2. PLoS One 8: e76569 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3792030&tool=pmcentrez&rendertype=abstract Accessed 25 December 2013.
- 18. Su AAH, Tripp V, Randau L (2013) RNA-Seq analyses reveal the order of tRNA processing events and the maturation of C/D box and CRISPR RNAs in the hyperthermophile Methanopyrus kandleri. Nucleic Acids Res 41: 6250–6258 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3695527&tool=pmcentrez&rendertype=abstract Accessed 5 February 2014.
- 19. Toffano-Nioche C, Ott A, Crozat E, Nguyen AN, Zytnicki M, et al. (2013) RNA at 92°C. RNA Biol 10: 1211–1220.
- 20. Randau L (2012) RNA processing in the minimal organism Nanoarchaeum equitans. Genome Biol 13: R63 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3491384&tool=pmcentrez&rendertype=abstract Accessed 4 October 2013.
- 21. Ammar R, Torti D, Tsui K, Gebbia M, Durbic T, et al. (2012) Chromatin is an ancient innovation conserved between Archaea and Eukarya. eLife 1: e00078 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3510453&tool=pmcentrez&rendertype=abstract Accessed 19 November 2013.
- 22. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, et al. (2013) NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 41: D991–5 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3531084&tool=pmcentrez&rendertype=abstract Accessed 11 December 2013.
- 23. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, et al. (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 36: D13–21 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2238880&tool=pmcentrez&rendertype=abstract Accessed 20 December 2013.
- 24. Chan PP, Holmes AD, Smith AM, Tran D, Lowe TM (2012) The UCSC Archaeal Genome Browser: 2012 update. Nucleic Acids Res 40: D646–52 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3245099&tool=pmcentrez&rendertype=abstract Accessed 23 June 2012.
- 25. Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, et al. (2007) A predictive model for transcriptional control of physiology in a free living cell. Cell 131: 1354–1365.
- 26. Baliga NS, DasSarma S (1999) Saturation mutagenesis of the TATA box and upstream activator sequence in the haloarchaeal bop gene promoter. J Bacteriol 181: 2513–2518.
- 27. Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, et al. (2010) The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464: 250–255 Available: http://www.ncbi.nlm.nih.gov/pubmed/20164839 Accessed 5 March 2013.
- 28. Jorjani H, Zavolan M (2014) TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data. Bioinformatics 30: 971–974 Available: http://www.ncbi.nlm.nih.gov/pubmed/24371151 Accessed 16 July 2014.
- 29. Amman F, Wolfinger MT, Lorenz R, Hofacker IL, Stadler PF, et al. (2014) TSSAR: TSS annotation regime for dRNA-seq data. BMC Bioinformatics 15: 1–11.
- 30. Hot D, Slupek S, Wulbrecht B, D'Hondt A, Hubans C, et al. (2011) Detection of small RNAs in Bordetella pertussis and identification of a novel repeated genetic element. BMC Genomics 12: 207 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3110155&tool=pmcentrez&rendertype=abstract Accessed 20 December 2013.
- 31. Bare JC, Koide T, Reiss DJ, Tenenbaum D, Baliga NS (2010) Integration and visualization of systems biology data in context of the genome. BMC Bioinformatics 11: 1–8.
- 32. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2690996&tool=pmcentrez&rendertype=abstract Accessed 1 March 2012.
- 33. Pfeiffer F, Broicher A, Gillich T, Klee K, Mejía J, et al. (2008) Genome information management and integrated data analysis with HaloLex. Arch Microbiol 190: 281–299 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2516542&tool=pmcentrez&rendertype=abstract Accessed 11 July 2014.
- 34. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, et al. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37: W202–8 Available: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2703892&tool=pmcentrez&rendertype=abstract Accessed 29 January 2014.
- 35. Lucas A, Thermes C (2006) GeneRfold: R for genes and sequences, using viennaRNA package. http://www.tbi.univie.ac.at/~ivo/RNA.
Data Availability Statement
The authors confirm that all data underlying the findings are fully available without restriction.