Abstract
Streptomyces coelicolor is considered the model organism among Gram-positive, GC-rich bacteria. Its genome has been sequenced but little is known about the occurrence and distribution of small non-coding RNAs in this biotechnologically relevant organism. Using deep sequencing we analyzed the transcriptome at the end of exponential growth, which corresponds to the onset of secondary metabolism. We mapped 193 transcriptional start sites of mRNA genes and identified putative new and alternative open reading frames. We identified 63 non-coding RNAs including 29 cis encoded antisense RNAs and confirmed expression for 11, most of them being growth-phase-dependent. A comparison between the sequencing results and bioinformatic sRNA predictions using Dynalign and RNAz revealed only a small overlap between the different approaches.
Key words: small non-coding RNA, Streptomyces coelicolor, transcriptome, deep sequencing, bioinformatic prediction
Introduction
Streptomycetes are soil-dwelling, Gram-positive, GC-rich bacteria that undergo complex morphological differentiation.1 They have a unique capacity to produce novel bioactive compounds and are the most prolific producers of secondary metabolites such as antibiotics, immunosuppressants, antivirals and herbicides.2,3 Therefore, they are of great importance for industrial applications.3
Streptomyces coelicolor has a linear chromosome with a length of 8.7 Mb that harbors almost 8000 annotated open reading frames (ORFs). The chromosome of actinomycetes is divided into three parts. The ‘core’ (a region of about 4.9 Mb in the S. coelicolor genome) encodes the majority of the essential house-keeping genes.1 The left and right arms of the chromosome (1.5 and 2.3 Mb, respectively) on either side of the core region mostly contain non-essential and species-specific genes, e.g., those coding for secondary metabolite pathways. The unusually complex life cycle of streptomycetes, involving the development of sporophyte and mycelia colonies, demands multilevel regulation to control the decisive steps of their morphological and chemical differentiation.1 This is reflected by the fact that 10% of all protein coding genes in S. coelicolor are predicted to have a regulatory function controlling the metabolic and morphological changes these bacteria undergo during their life cycle.1 In fact, the genome encodes close to 70 sigma factors and hundreds of transcription factors. Transcriptional regulation of metabolic processes and the control of secondary metabolism have been studied extensively; however, little is known about the extent and importance of posttranscriptional regulation within this organism.4,5
In recent years, small non-coding RNAs (sRNAs) have been implicated as important posttranscriptional regulators in a variety of adaptive cellular and developmental processes as well as during virulence in bacteria.6–9 They are an integral part of the gene regulatory network and add another layer of control. Due to its extensive metabolic activities, S. coelicolor already harbors a highly complex regulatory network at the transcriptional level. We therefore asked to what extent this bacterium makes use of sRNAs to fine-tune its gene expression. The high GC content of streptomycetes of up to 74%, with its potential to give rise to highly structured and stable RNA species, and the lack of the RNA chaperone Hfq suggest that control at the posttranscriptional level may differ from that in other bacteria.
Deep sequencing has recently emerged as a powerful tool for the identification of sRNAs.10–14 Here, we present the first analysis of the primary transcriptome of S. coelicolor M145 using a differential RNA-sequencing (dRNA-seq) approach and the 454 sequencing technology.11 We have chosen a time point at the end of exponential growth, which corresponds to the onset of secondary metabolism. We identified 63 putative sRNAs and verified expression of 11 of them, most of which showed growth-phase-dependent expression. In addition, we mapped the transcriptional start sites of 193 mRNA genes. Comparison of our sequencing data with previously identified sRNAs15–18 and with bioinformatic sRNA predictions by RNAz and Dynalign revealed a very low overlap between the different approaches.
Results and Discussion
Deep sequencing of S. coelicolor total RNA.
We analyzed total RNA harvested from S. coelicolor grown in rich liquid medium (TSB19) for 72 hours until the end of the exponential growth phase, a time point at the onset of secondary metabolism, which is reflected by the start of production of the antibiotic actinorhodin as shown in Figure 1A. Two cDNA libraries were created: one generated from the original, untreated total RNA, covering both primary and processed transcripts (SCO−), and a second in which newly initiated transcripts were enriched by enzymatic treatment with 5′-P-dependent terminator exonuclease (SCO+), as previously described in reference 11. Sequencing of the resulting libraries yielded a total of 79,374 reads. Only sequences >17 nucleotides (nt) were considered for further analysis (SCO+: 28,414 reads, SCO−: 29,288 reads, representing 73 and 72% of the respective library). These 57,702 sequences were mapped to the genome of S. coelicolor using WU_Blast 2.0 (blast.wustl.edu/). Of these, 80% could be aligned to a single position in the genome. Overlapping sequences were clustered and sorted, using a Python script, by whether or not they contained annotated features. In this way the sequencing results could be assigned to 3,206 locations (clusters) in the genome. Of these, 1,954 clusters covered annotated features such as tRNAs, rRNAs and ORFs, while 1,252 clusters were located in intergenic regions. The distribution of the different RNA species is shown in Figure 1B. Half of the annotated transcripts (50% of the 1,954 clusters) are located in the 4.9 Mb core region of the linear S. coelicolor chromosome. The other half is encoded almost equally on the left and right arm (28 and 22%, respectively). The transcripts mapping to non-annotated regions of the genome (1,252 clusters) show a similar bias towards the core region (Sup. Fig. 1). This hints at a more general function of most of the newly identified transcripts.
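The clustering step can be illustrated with a short sketch. It is not the script used in this study; it only shows one plausible way to merge overlapping read alignments into clusters and to separate clusters that touch an annotated feature from those that do not. The function names and the toy coordinates are assumptions made for illustration, and strand information is ignored for brevity.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def cluster_reads(reads: List[Interval]) -> List[Interval]:
    """Merge overlapping read alignments into clusters."""
    clusters: List[Interval] = []
    for start, end in sorted(reads):
        if clusters and start <= clusters[-1][1]:
            # The read overlaps the current cluster, so extend that cluster.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters


def split_by_annotation(
    clusters: List[Interval], features: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Separate clusters overlapping an annotated feature from the rest."""
    annotated, unannotated = [], []
    for c_start, c_end in clusters:
        hit = any(c_start <= f_end and f_start <= c_end for f_start, f_end in features)
        (annotated if hit else unannotated).append((c_start, c_end))
    return annotated, unannotated


# Toy coordinates, not taken from the data set.
reads = [(100, 150), (140, 200), (300, 350), (500, 560), (555, 600)]
clusters = cluster_reads(reads)
annotated, unannotated = split_by_annotation(clusters, features=[(180, 400)])
```

A real implementation would additionally keep the strand of each read, since antisense transcripts overlap annotated ORFs in position but not in orientation.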
Figure 1.
(A) Growth curve and actinorhodin production of S. coelicolor M145. The time point of cell harvest for the 454 sequencing is indicated by an arrow. Actinorhodin was measured simultaneously. (B) Characterization of the cDNA libraries. Distribution of rRNAs, tRNAs, transcripts of annotated mRNAs (ORFs), transcripts antisense to mRNAs (as to mRNA), transcripts located in intergenic regions (igr) and other known RNAs (like tmRNA or 6S RNA) within the 454 sequencing libraries.
Annotation of transcriptional start sites.
We analyzed the 5′untranslated regions (UTRs) of the 1,954 clusters which correspond to annotated genome locations in order to identify transcriptional start sites (TSS). Only sequencing reads starting at the same nucleotide (±1 nt) were taken into account. In total, we mapped 193 TSS (Sup. Tables 1 and 2). The transcripts showed a broad variation in the length of the 5′UTR, ranging from 13 to 496 nt with a peak at 31–40 nt. A detailed length distribution is displayed in Figure 2A. Twenty-two transcripts (11.5%) have long leader regions (>150 nt) that may be a target for posttranscriptional regulation or carry cis encoded regulatory RNA elements such as riboswitches. Some of the TSS have been published before and fit well to our 454 data. For two of them, actII-2 and acoA,20,21 the cDNA distribution shows a typical enrichment pattern towards the 5′-end in the exonuclease treated library (Sup. Fig. 2).
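As a rough illustration of this TSS criterion, the sketch below picks the position whose ±1 nt window collects the most read 5′ ends and derives the 5′UTR length from it. It is a simplified stand-in, not the procedure actually used; the function names are hypothetical, and the default of three supporting reads is an assumption borrowed from the requirement stated for the sRNA candidates.

```python
from collections import Counter
from typing import Iterable, Optional


def call_tss(five_prime_ends: Iterable[int], min_reads: int = 3,
             tolerance: int = 1) -> Optional[int]:
    """Return the best-supported read start position, or None.

    Support for a position is the number of reads whose 5' end lies
    within +/- `tolerance` nt of it.
    """
    counts = Counter(five_prime_ends)
    best_pos, best_support = None, 0
    for pos in sorted(counts):
        support = sum(counts[p] for p in range(pos - tolerance, pos + tolerance + 1))
        if support > best_support:
            best_pos, best_support = pos, support
    return best_pos if best_support >= min_reads else None


def utr_length(tss: int, start_codon_pos: int, strand: str = "+") -> int:
    """Number of nucleotides between the TSS and the first start codon base."""
    return start_codon_pos - tss if strand == "+" else tss - start_codon_pos


# Toy positions for illustration only.
tss = call_tss([1000, 1000, 1001, 1040])
```

With these toy positions the window around 1000 holds three read starts, so `tss` is 1000, and a start codon at 1035 would give a 35 nt leader.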
Figure 2.
(A) Length distribution of the 193 mapped 5′UTRs. (B) Distribution of the 63 identified sRNAs in the genome. Regions with sRNAs verified by northern blot are marked in red, the chromosomal arms are shaded and labeled.
Leaderless translation is common in S. coelicolor.
Genes with 5′UTRs <11 nt were considered to be leaderless mRNA transcripts. One fourth of all mapped transcripts (53 TSS) were leaderless with a strong bias to transcripts with the start codon as the very first nucleotide (Table 1). S. coelicolor uses not only AUG but also GUG and UUG as a start codon. Interestingly, we detect GUG and UUG start codons also for the leaderless transcripts (19% GUG and 2% UUG, Table 1). This is in contrast to previous reports from Escherichia coli and Haloferax volcanii where exclusively AUG can act as start codon on leaderless transcripts.22,23 S. coelicolor shows a slight bias towards AUG in leaderless genes compared to all ORFs but this is far from the exclusiveness reported for E. coli and H. volcanii (Table 1).
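The classification rule and the start codon tally are simple enough to sketch in a few lines. The example below is illustrative only: the function names are hypothetical and the input records are toy values, not the mapped transcripts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LEADERLESS_MAX_UTR = 10  # 5'UTRs shorter than 11 nt count as leaderless


def is_leaderless(utr_len: int) -> bool:
    return 0 <= utr_len <= LEADERLESS_MAX_UTR


def start_codon_usage(transcripts: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Percentage of each start codon among leaderless transcripts.

    `transcripts` holds (5'UTR length, start codon) pairs.
    """
    codons = Counter(codon for utr, codon in transcripts if is_leaderless(utr))
    total = sum(codons.values())
    return {codon: round(100 * n / total) for codon, n in codons.items()} if total else {}


# Toy records for illustration only.
toy = [(0, "AUG"), (0, "AUG"), (3, "AUG"), (1, "GUG"), (35, "AUG"), (46, "GUG")]
usage = start_codon_usage(toy)
```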
Table 1.
Usage of different start codons for leaderless translation
| | AUG | GUG | UUG |
| (A) Usage of different start codons | |||
| E. coli, total transcripts19 | 97% | 3% | |
| S. coelicolor, total transcripts19 | 61% | 36% | 3% |
| S. coelicolor, leaderless transcripts | 79% | 19% | 2% |
| (B) Total number of different start codons used for leaderless transcripts | |||
| nt in front of the start codon | |||
| 0 | 32 | 5 | |
| 1 | 2 | 2 | |
| 2 | 2 | 1 | |
| 3 | 5 | ||
| 4 | |||
| 5 | 1 | ||
| 6 | 1 | ||
| 7 | |||
| 8 | |||
| 9 | 1 | 1 | |
| 10 | |||
The extensive use of leaderless transcripts in Streptomyces has already been described in references 19 and 24–26. Streptomycetes can translate leaderless mRNAs, such as that of the aphI gene from Streptomyces fradiae,27 very efficiently. Most of the previously described leaderless mRNAs in streptomycetes belong to antibiotic resistance genes and are not present in S. coelicolor. A homolog of the leaderless afsA gene from S. griseus (scbA) was detected in our 454 analysis; however, it has a 46 nt long 5′UTR in S. coelicolor. Another mRNA known to be leaderless (that of the response regulator afsQ1) was confirmed as leaderless but was represented by only one cDNA read.
Mechanistic studies regarding the translation of leaderless mRNA have been performed in E. coli.28–30 This organism employs specialized 61S ribosomes that lack, among others, the ribosomal protein S1.31 S1 seems to be less important for translation in Gram-positive bacteria. In Bacillus subtilis it is non-essential and truncated at its C-terminus.32 Such a shortened version of S1 also exists in S. coelicolor (SCO1998), which may lead to the speculation that leaderless mRNAs play a more important role in Gram-positive bacteria, where translation initiation is less dependent on S1.
Identification of small non-coding RNAs in the intergenic regions and antisense RNAs.
1,252 clusters were identified that do not map to annotated regions of the genome. They are located in intergenic regions or antisense to annotated ORFs. Strikingly, we observed numerous transcripts with a length of 30–50 nt, which is shorter than the currently discussed average size of sRNAs in other bacteria.6 A closer inspection of their location and structure suggests that these transcripts form a single hairpin loop structure often located up to 50 nt downstream of the 3′ end of annotated genes (expression of five of them has been verified by northern blot analysis, data not shown). We speculate that these transcripts may be transcriptional terminator fragments remaining from mRNA degradation.
Based on this observation we defined a set of strict filters to distinguish between real sRNA candidates and degradation products. Reads to be considered for further analysis had to be >79 nt in length and more than 50 nt away from any annotated gene encoded on the same strand. In addition, and analogous to the analysis of the TSS of annotated genes, the 5′end had to be sequenced at least three times. These parameters resulted in 63 candidates of 82–494 nt in length (34 putative sRNAs and 29 transcripts located antisense to an ORF). All 63 candidates are listed in Table 2. A graphical representation of their distribution in the genome is shown in Figure 2B. Interestingly, about half of the putative sRNA candidates are located in the core region, which is similar to the distribution of the annotated transcripts (see Sup. Fig. 1).
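The three filter criteria can be expressed as a small predicate. The sketch below is a hypothetical reimplementation for illustration, not the code used here: the `Transcript` record, the function names and the toy coordinates are assumptions, and the distance cut-off is deliberately left as an explicit argument.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Transcript:
    start: int
    end: int
    strand: str            # "+" or "-"
    five_prime_reads: int  # reads starting at the transcript's 5' end


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance between two intervals; 0 if they overlap."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def passes_filter(t: Transcript, genes: Iterable[Tuple[int, int, str]],
                  min_gene_distance: int, min_length: int = 80,
                  min_5p_reads: int = 3) -> bool:
    """Apply length, 5'-end support and same-strand distance criteria."""
    if t.end - t.start + 1 < min_length:  # ">79 nt" means at least 80 nt
        return False
    if t.five_prime_reads < min_5p_reads:
        return False
    for g_start, g_end, g_strand in genes:
        # Only genes on the same strand are considered for the distance rule.
        if g_strand == t.strand and gap(t.start, t.end, g_start, g_end) < min_gene_distance:
            return False
    return True


# Toy annotation and candidates for illustration only.
genes = [(1000, 2000, "+")]
far = Transcript(2100, 2200, "+", 5)
near = Transcript(2020, 2120, "+", 5)
opposite = Transcript(2020, 2120, "-", 5)
```

Because only same-strand genes are checked, a transcript lying antisense to an ORF (like `opposite` above) passes the distance rule, which is how antisense candidates survive this filter.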
Table 2.
Summary of putative non coding RNAs identified by 454 sequencing
| Name | Start [nt] | Stop [nt] | Strand | Length [nt] | Reference | Comments |
| sRNA, expression verified by northern blot | ||||||
| scr1601 | 1711967 | 1712074 | c | 108 | ||
| scr2736-2 | 2982570 | 2982745 | 176 | |||
| scr2952 | 3208719 | 3208810 | 92 | |||
| scr3202 | 3510095 | 3510186 | c | 92 | ||
| scr3920 | 4315358 | 4315484 | 127 | |||
| scr4115 | 4515318 | 4515426 | 109 | |||
| scr4389 | 4805579 | 4805789 | c | 211 | ||
| scr4632 | 5055055 | 5055172 | c | 118 | ||
| scr5676 | 6176284 | 6176413 | c | 130 | ||
| scr6106 | 6706584 | 6706667 | 84 | |||
| scr6925 | 7688336 | 7688466 | 131 | |||
| sRNA, expression not yet verified by northern blot | ||||||
| scr1104 | 1161587 | 1161700 | c | 114 | ||
| scr2736-1 | 2982224 | 2982505 | 282 | |||
| scr3580 | 3958028 | 3958181 | c | 154 | ||
| scr3928 | 4323842 | 4324023 | 182 | |||
| scr4132 | 4545730 | 4545843 | 114 | |||
| scr4659 | 5088759 | 5088859 | 101 | |||
| scr4827 | 5257254 | 5257339 | 86 | |||
| scr5529 | 6023430 | 6023545 | 116 | |||
| scr6280 | 6937783 | 6937870 | 88 | |||
| scr6908 | 7672451 | 7672670 | 220 | |||
| scr7601 | 8427312 | 8427437 | c | 126 | ||
| Antisense RNA | ||||||
| as0091 | 78047 | 78191 | c | 145 | ||
| as0642 | 682646 | 682730 | 85 | |||
| as2080 | 2233552 | 2233634 | 83 | |||
| as2247 | 2416288 | 2416397 | 110 | |||
| as2364 | 2533451 | 2533546 | c | 96 | ||
| as2780 | 3034099 | 3034209 | c | 111 | ||
| as3029 | 3312231 | 3312325 | 95 | |||
| as3111 | 3412144 | 3412346 | 203 | |||
| as3125 | 3425777 | 3425967 | c | 191 | ||
| as3287 | 3636488 | 3636569 | c | 82 | ||
| as3317 | 3669433 | 3669657 | 225 | |||
| as3321 | 3673385 | 3673581 | 197 | |||
| as3404 | 3770178 | 3770278 | 101 | |||
| as3496 | 3861776 | 3861898 | 123 | |||
| as3680 | 4064602 | 4064701 | 100 | |||
| as4261 | 4674583 | 4674770 | 188 | |||
| as4566 | 4984053 | 4984157 | c | 105 | ||
| as4567 | 4984635 | 4984793 | c | 159 | ||
| as4672 | 5104194 | 5104588 | 395 | |||
| as4675 | 5106760 | 5106941 | 182 | |||
| as4692/3 | 5119179 | 5119518 | c | 340 | scO4692/31 | |
| as4699 | 5124039 | 5124366 | c | 328 | ||
| as5028 | 5464169 | 5464418 | c | 250 | ||
| as5721 | 6241794 | 6241937 | c | 144 | ||
| as6323 | 6983615 | 6983803 | 189 | |||
| as6418 | 7087036 | 7087162 | 127 | |||
| as6721 | 7476805 | 7476903 | c | 99 | ||
| as7201 | 8004633 | 8004746 | c | 114 | ||
| Further transcripts | ||||||
| scr0991 | 1045891 | 1046090 | 200 | cis encoded, cobRs | ||
| scr2076 | 2226957 | 2227117 | 161 | cis encoded, t-box | ||
| scr4701 | 5127282 | 5127536 | 255 | cis encoded, S10 | ||
| scr3558 | 3933527 | 3933668 | c | 142 | 15, 16, 19 | 6C motif |
| scr3559 | 3934693 | 3934927 | 235 | 15, 16 | 6S RNA | |
| scr1821 | 1950867 | 1950948 | 82 | 16 | new ORF2 | |
| scr3035 | 3321216 | 3321353 | c | 138 | 16 | new ORF3 |
| scr4164 | 4580774 | 4580933 | 160 | 16 | new ORF4 | |
| scr5822 | 6370296 | 6370789 | c | 494 | new ORF5 | |
| scr1980 | 2118877 | 2118959 | c | 83 | 16 | alternative ORF6 |
| scr3323 | 3675098 | 3675329 | 232 | 16 | alternative ORF7 | |
| scr4800 | 5224234 | 5224319 | c | 86 | 16 | alternative ORF8 |
| scr5856 | 6412335 | 6412512 | c | 178 | alternative ORF9 | |
References are given for transcripts shown to be expressed in previous studies; c, Crick strand; as1234, antisense transcript to the gene SCO1234; cis, transcript verified as cis-encoded RNA (see Supplemental Figure 5); new/alternative ORF, sRNA located within or close to a conserved/alternative but not yet annotated open reading frame.
1 Antisense transcript which overlaps with both SCO4692 and SCO4693.
2 A 44 amino acid (aa) long ORF, conserved in a number of Streptomyces strains, is located close to scr1821. An additional ORF 29 nt downstream of the transcript is conserved in a large number of bacteria.
3 scr3035 is located sense to a 183 aa long ORF that is annotated as hypothetical protein in several Streptomyces strains.
4 scr4164 overlaps with the 3′end of a 93 aa ORF conserved in several Streptomyces species.
5 scr5822 covers a 78 aa ORF conserved in numerous different bacteria.
2–5 The location of these new ORFs is displayed in Supplemental Figure 3.
6 There is an alternative ORF to SCO1979 that is 78 aa longer on the 5′end of the gene. This longer fragment is conserved in several Streptomyces strains and covers scr1980.
7 The 5′end of ORF 3323 is annotated at 3675442 nt at NCBI and at 3675181 nt at StrepDB. In the latter case, scr3323 represents the 5′UTR of that gene.
8 There is an alternative ORF to SCO4799 that is 23 aa longer on the 5′end of the gene. This longer fragment is conserved in several Streptomyces strains and covers scr4800.
9 There is an alternative ORF to SCO5855 that is 11 aa longer on the 5′end of the gene. This longer fragment is conserved in several Streptomyces strains. The distance to scr5856 in that case is only 44 nt and thus below our threshold.
6,8,9 The location of the alternative ORFs is displayed in Supplemental Figure 4.
We excluded ten of the 34 putative sRNAs from further analysis for the following reasons. Four transcripts overlap with ORFs that are highly conserved but had not been annotated so far (Table 2: transcripts scr1821, scr3035, scr4164 and scr5822; their location is displayed in Sup. Fig. 3). Three further transcripts are located close to ORFs for which we identified an alternative, longer reading frame with a new start codon upstream of the annotated one. These longer ORFs would include the respective transcript or lie next to it (Table 2: transcripts scr1980, scr4800 and scr5856; their location is displayed in Sup. Fig. 4). scr3558 was already described as an sRNA (6C motif33) and scr3559 corresponds to 6S RNA.34 scr3323 was excluded because of inconsistent annotation of the downstream gene SCO3323. The 5′end of ORF3323 is annotated at nucleotide 3675442 in the NCBI database but at 3675181 in StrepDB. In the latter case, the transcript would represent the 5′UTR of that gene and consequently not fit our filter criteria.
The remaining 24 candidates were subjected to confirmatory northern blot analysis. The RNA for the sequencing analysis had been prepared at the onset of stationary phase. We now prepared RNA from three time points during S. coelicolor development covering logarithmic, exponential and stationary phase (24, 48 and 90 hours). Expression of 14 sRNAs could be verified on northern blots (Fig. 3A and Sup. Fig. 5). Three candidates were predicted to encode potential cis regulatory RNA elements (scr0991, scr2076 and scr4701, Sup. Fig. 5). scr0991 possibly encodes a cobalamin riboswitch (predicted by Rfam, family RF00174) in the 5′UTR of a putative kinase with unknown function (SCO0991). scr2076 may encode a T-box leader element (also predicted by Rfam, family RF00230) in the 5′UTR of an isoleucyl-tRNA synthetase gene (SCO2076). scr4701 may constitute the 5′UTR of the rpsJ gene encoding the ribosomal protein S10 (SCO4701).
Figure 3.
Expression analysis and genomic location of sRNAs. (A) Northern blot analysis of 11 sRNAs at three time points during growth. The size as predicted from 454 sequencing and the apparent size from the northern blot analysis are shown on the left. 5S rRNA was used as a loading control. The northern blots were repeated at least twice. (B) Genomic map showing the localization of the verified sRNAs (not to scale).
The genomic location of the 11 new sRNAs is shown in Figure 3B. Most sRNAs show secondary structure predictions with a very high reliability (RNAfold,35 Sup. Fig. 6). For scr4389, the largest sRNA, whose prediction has a lower reliability, we performed enzymatic probing, which supported the predicted secondary structure (Sup. Fig. 7). Northern blot analysis confirmed differential expression for all sRNAs except scr2952 and scr3920. This hints at a possible role of these sRNAs in the development of S. coelicolor. This is supported by the fact that nine of the 11 sRNAs are encoded within the genetic core, which indicates a role in the general life cycle rather than in secondary processes (Fig. 2B).
The transcript lengths apparent from the northern blot analysis are consistent with the lengths predicted by the deep sequencing approach for five sRNAs (scr3920, scr4115, scr4389, scr4632 and scr6925). scr1601 and scr5676 appear to be longer than predicted. scr1601 has a sequenced size of 108 nt; however, signals of 160 nt and 200 nt are present at 24 h and a 200 nt signal at 48 h. If we extend the sequence of the sRNA at the 3′end according to the northern blot results, we find a conserved stem loop followed by a poly(T) stretch that resembles a classical transcriptional terminator structure. Termination at this point would give rise to a 215 nt long transcript, which would fit the northern blot results. scr5676 can be detected only at 24 h with an apparent size of about 200 nt, whereas a transcript of 130 nt has been sequenced. In both cases the intergenic region would be large enough to accommodate a longer transcript that may be expressed at a different time point.
Four RNAs showed a smaller transcript than expected from the sequencing, indicating processing steps (scr2736-2, scr2952, scr3202 and scr6106). scr2736-2 appears ∼100 nt smaller in the northern blot. This may correspond to a transcript ranging from the 5′end of the 454 read to the first drop in expression strength (marked by a * in Sup. Fig. 8A). scr2952 gives a signal ∼11 nt smaller than expected, which could be the result of a processing event at the 5′end (marked by a * in Sup. Fig. 8B). Sequencing data of scr3202 show two main transcripts of 70 and 90 nt size (Sup. Fig. 8C). The shorter and more abundant transcript fits the signal seen in the northern blot. The longer transcript folds into two distinct stem loops. Both loops are C-rich and the stems are followed by uridine residues. Stem loops with C-rich loops have been proposed to act as terminators in streptomycetes.15 The gene hrdD coding for the principal RNA polymerase sigma factor36 is located about 250 nt upstream of scr3202. It cannot be excluded that scr3202 is the terminator of a rather long 3′UTR of the hrdD mRNA. As discussed earlier, we detected a significant number of small transcripts that fold into a single stem loop structure and may correspond to terminator fragments. Their appearance prompted us to use strict filter criteria (size >79 nt and more than 50 nt away from any annotated gene on the same strand). Alternatively, C-rich loops have also been shown to be the binding sites of sRNAs for their target mRNAs.11,37,38 We will address this question in future experiments. Sequencing data of scr6106 show processing at the 5′end with products of 62 and 84 nt length (Sup. Fig. 8D). This fits the northern blot results, which show a specific signal at about 60 nt. In the sequencing data the 84 nt long transcript is only present in the pool enriched for primary transcripts, indicating a processing event.
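The core-versus-arm assignment can be checked directly from the start coordinates listed in Table 2. The sketch below is a back-of-the-envelope illustration: it assumes that the left arm spans exactly the first 1.5 Mb and the core the following 4.9 Mb, whereas the text gives these sizes only approximately, so the boundary constants are assumptions.

```python
from collections import Counter

# Assumed boundaries derived from the approximate region sizes in the text.
LEFT_ARM_END = 1_500_000
CORE_END = LEFT_ARM_END + 4_900_000


def region(position: int) -> str:
    """Assign a genome coordinate to left arm, core or right arm."""
    if position <= LEFT_ARM_END:
        return "left arm"
    if position <= CORE_END:
        return "core"
    return "right arm"


# Start coordinates of the northern-blot-verified sRNAs (Table 2).
VERIFIED_SRNA_START = {
    "scr1601": 1711967, "scr2736-2": 2982570, "scr2952": 3208719,
    "scr3202": 3510095, "scr3920": 4315358, "scr4115": 4515318,
    "scr4389": 4805579, "scr4632": 5055055, "scr5676": 6176284,
    "scr6106": 6706584, "scr6925": 7688336,
}

tally = Counter(region(pos) for pos in VERIFIED_SRNA_START.values())
```

Under these assumed boundaries the tally gives nine sRNAs in the core and two (scr6106 and scr6925) on the right arm, in line with the count stated above.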
Conservation analysis of identified sRNAs.
We analyzed whether the new sRNAs are conserved in other species. We compared the 11 verified sRNA sequences from S. coelicolor to all available microbial genomes using the Blast algorithms at NCBI (www.ncbi.nlm.nih.gov/sutils/genom_table.cgi) and at the Broad Institute (www.broadinstitute.org/annotation/genome/streptomycesgroup/Blast.html), where genome sequences from numerous Actinomycetales including 21 different Streptomyces species are available. Interestingly, both approaches led to hits only in other Streptomyces species (only hits with an E-value less than 10⁻⁴ were taken into account). None of the transcripts could be identified outside this group. The result of the Blast search is summarized in Table 3. We find all grades of conservation, with scr2952 identified in 20 of the 21 Streptomyces genomes and scr4632 and scr6925 being present only in S. coelicolor. The observed exclusive conservation in streptomycetes could argue for Streptomyces-specific and, in two cases, perhaps even S. coelicolor-specific sRNAs.
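For readers who wish to reproduce such a conservation table from local searches, the following sketch shows how hits could be filtered by E-value and tallied per query. It assumes the 12-column tab-separated output of a local BLAST+ run (E-value in the eleventh column), which is an assumption about tooling, since the searches here were run on web servers; the function name and the example lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

E_VALUE_CUTOFF = 1e-4  # only hits with an E-value below this are kept


def conserved_in(tabular_lines: Iterable[str],
                 cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Set[str]]:
    """Map each query to the set of subjects with a hit below the cutoff.

    Expects tab-separated lines: query, subject, ..., E-value (column 11),
    bit score (column 12).
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < cutoff:
            hits[query].add(subject)
    return dict(hits)


# Hypothetical example lines; subject labels and scores are made up.
example = [
    "scrA\tgenome_1\t95.0\t90\t4\t0\t1\t90\t500\t589\t2e-30\t130",
    "scrA\tgenome_2\t80.0\t40\t8\t0\t1\t40\t700\t739\t0.5\t30",
    "scrB\tgenome_1\t99.0\t100\t1\t0\t1\t100\t900\t999\t1e-45\t180",
]
table = conserved_in(example)
```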
Table 3.
Conservation of newly identified sRNA in different Streptomyces species
| scr 1601 | scr 2736 | scr 2952 | scr 3202 | scr 3920 | scr 4115 | scr 4389 | scr 4632 | scr 5676 | scr 6106 | scr 6925 | |
| S. albus J1074 | X | X | X | X | |||||||
| S. avermitilis MA-4680 | X | X | X | X | X | X | X | ||||
| S. clavuligerus ATCC 27064 | X | X | X | ||||||||
| S. coelicolor A3(2) | X | X | X | X | X | X | X | X | X | X | X |
| S. ghanaensis ATCC 14672 | X | X | X | X | X | X | X | X | |||
| S. griseoflavus Tu4000 | X | X | X | X | X | X | X | X | X | X | |
| S. griseus subsp. griseus NBRC 13350 | X | X | |||||||||
| S. hygroscopicus ATCC 53653 | X | X | |||||||||
| S. lividans TK24 | X | X | X | X | X | X | X | X | X | ||
| S. pristinaespiralis ATCC 25486 | X | X | X | X | |||||||
| S. roseosporus NRRL 11379 (v4) | X | X | X | X | |||||||
| S. roseosporus NRRL 15998 | X | X | X | X | |||||||
| S. scabiei 87.22 | X | X | X | X | X | ||||||
| S. sp. AA4 | |||||||||||
| S. sp. C | X | X | X | ||||||||
| S. sp. E14 | X | X | X | X | X | X | X | ||||
| S. sp. Mg1 | X | X | |||||||||
| S. sp. SPB74 | X | X | |||||||||
| S. sp. SPB78 | X | X | X | ||||||||
| S. sviceus ATCC 29083 | X | X | X | X | X | X | X | X | |||
| S. viridochromogenes DSM 40736 | X | X | X | X | X | X | X | X |
Crosses denote conservation of the sRNA in different species.
Target prediction for the identified sRNAs.
We performed an in silico search for putative targets regulated by the sRNAs using TargetRNA.39 Candidate targets were evaluated in the order of their binding score. A large number of potential targets with long stable interaction sites were predicted for each sRNA. However, we repeated this scan using a 150 nt random RNA sequence of equal GC content as bait and found results of equal quality which was most probably due to the high GC content of the genome. Therefore we did not continue with this approach.
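The random-bait control is easy to reproduce in principle. The sketch below generates a random RNA sequence of a given length and GC content that could serve as such a bait; it is an illustration rather than the procedure used here, and the GC fraction in the example is an arbitrary illustrative value, not the one used in our control.

```python
import random
from typing import Optional


def random_rna(length: int, gc_content: float, seed: Optional[int] = None) -> str:
    """Random RNA sequence with a fixed number of G/C positions."""
    rng = random.Random(seed)
    n_gc = round(length * gc_content)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AU") for _ in range(length - n_gc)]
    rng.shuffle(bases)  # distribute G/C positions randomly along the sequence
    return "".join(bases)


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


# 150 nt bait; the GC fraction of 0.72 is an illustrative assumption.
bait = random_rna(150, 0.72, seed=1)
```

If such a bait yields predicted interactions of the same quality as a genuine sRNA, as observed here, the prediction scores carry little information in a GC-rich genome.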
sRNAs often bind their targets with conserved sequences present in loop structures or single-stranded regions. Therefore we examined our sRNAs for highly conserved, single-stranded sequence motifs. We performed alignments of those sRNAs that are conserved in at least nine of the 21 Streptomyces species (Sup. Fig. 9) and identified five sequence motifs (scr3920: motif 1: AUU GGA, motif 2: ACG AGG GGG GA; scr4115: UCC CCG C; and scr4389: motif 1: GAU GUA, motif 2: UAC GU). The conservation and the location of the motifs in the predicted secondary structure are shown in Supplemental Figures 6, 7 and 9. Complementary sequences of the loop motif UCC CCG C of scr4115 were found 3,250 times, which is too numerous to be significant. For scr3920 and scr4389, we performed a target prediction in which we looked for adjacent regions in the genome (spacing between 10 and 50 nt) complementary to both of the motifs using the program fuzznuc.40
The analysis for scr3920 resulted in no hits. For scr4389 we obtained 13 hits (summarized in Sup. Table 3). Putative target sites are located in the 5′UTR, translation initiation region (TIR) or the coding sequence of 13 genes but not in intergenic regions. Most interestingly, six of them are located in genes involved in the central carbon metabolism (aceE1 and aceE2: pyruvate decarboxylases, aceB1: malate synthase, glmS2: glucosamine-fructose-6-phosphate aminotransferase, SCO6229: sugar transporter and SCO7211: glycosyl hydrolase). This hints at a role in the primary metabolism. We are currently analyzing the expression pattern of this sRNA under different nutrient conditions and a potential interaction between the sRNA and the predicted targets.
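The paired-motif search can be sketched with a regular expression. The example below is a simplified stand-in for the fuzznuc search, not a description of its options: it looks for the reverse complements of two sRNA motifs separated by 10 to 50 nt, in either order, on a single strand of an RNA-alphabet sequence (a DNA sequence would first need T replaced by U). The function names and the toy target are assumptions.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def find_paired_sites(target: str, motif_a: str, motif_b: str,
                      min_gap: int = 10, max_gap: int = 50) -> List[Tuple[int, str]]:
    """Find sites complementary to both motifs, spaced min_gap..max_gap nt apart.

    Returns (0-based start, matched sequence) pairs; both motif orders are tried.
    """
    site_a = reverse_complement(motif_a)
    site_b = reverse_complement(motif_b)
    hits = []
    for first, second in ((site_a, site_b), (site_b, site_a)):
        # The lookahead allows matches that start at overlapping positions.
        pattern = re.compile(
            f"(?=({first}[ACGU]{{{min_gap},{max_gap}}}?{second}))"
        )
        hits.extend((m.start(1), m.group(1)) for m in pattern.finditer(target))
    return sorted(hits)


# Toy target built around the two scr4389 motifs (GAUGUA and UACGU).
toy_target = "GG" + "ACGUA" + "C" * 12 + "UACAUC" + "GG"
sites = find_paired_sites(toy_target, "GAUGUA", "UACGU")
```

In the toy target the two complementary sites are 12 nt apart, so a single hit starting at index 2 is reported; with a spacer shorter than 10 nt no hit would be returned.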
Comparison of sequencing results with bioinformatic sRNA predictions.
An interesting question is how well bioinformatic searches can predict the sequencing results presented here. We compared our results to previously published studies from S. coelicolor.15,16,33 About 300 sRNAs have been predicted by Tjaden et al. using the program sRNAFinder.41 Of these, 24 showed at least one read in the 454 analysis. However, none of the candidates we finally selected for northern blot analysis had been experimentally validated in that study. Another study by Pánek et al. predicted 39 sRNAs via terminator predictions using TransTermHP42 and validated the expression of 13. Of the 39 predicted sRNAs, 27 showed reads in our 454 sequencing data. Seven of those transcripts did not match our criteria or had already been described. Twelve were not found in our sequencing approach (but we found transcripts on the opposite strand). According to our data, nine further transcripts may represent 5′ and 3′UTRs (for details see Sup. Tables 4 and 5). Taken together, none of our newly identified sRNAs had been detected by previous approaches.
The small overlap between our results and previously published ones prompted us to compare our sequencing data to new bioinformatic sRNA predictions. We used similar filter criteria for the bioinformatic as for the sequencing analysis, with a cut-off of 79 nt. We could not take into account the distance to neighboring genes since UTR lengths are not comparable between the different organisms. With this restriction, the 454 analysis resulted in 276 candidates (instead of the 63 discussed above). We used two different algorithms for non-coding RNA prediction (RNAz and Dynalign).43,44 We compared the genome of S. coelicolor with the phylogenetically closely related bacterium Streptomyces avermitilis and the more distant relative Thermobifida fusca. Intergenic regions from these bacteria were compared with each other using BLAST and afterwards aligned using ClustalW.45,46 RNAz was then used on the multiple sequence alignments to identify candidate sRNAs.47 In the second approach Dynalign was used.44 Here, two separate genome alignments, S. coelicolor to S. avermitilis and S. coelicolor to T. fusca, were scanned for putative sRNAs and the results were pooled into one list of candidate sRNAs.
The overlap is summarized in Figure 4. No more than 18 regions were predicted by both bioinformatic methods and also overlap with the sequencing data (Fig. 4 and Sup. Table 6). Only six of the 14 validated cis- or trans-acting RNAs have been predicted by at least one of the programs (Sup. Table 7). Eight sRNAs, however, are missed by both programs. Two of them are only present in S. coelicolor (scr4632 and scr6925). Since both programs depend on a sequence/structure alignment for their prediction, they are not suitable for finding such non-conserved sRNAs.
Figure 4.

Venn diagram showing the overlap between RNAz- and Dynalign-facilitated prediction and 454 sequencing. The total number of putative sRNAs is indicated in parentheses (putative sRNAs: predicted to be sRNAs with confidence ≥90% and a length >79 nt). The number of validated sRNAs per subsection is given in the black hexagons. 540 and 639 regions, respectively, were predicted by RNAz and Dynalign with an overlap of only 163. Only 18 regions were predicted by both bioinformatic methods and also overlap with the sequencing data (Sup. Table 6). Only six of the 14 validated RNAs have been bioinformatically predicted (Sup. Table 7).
The lack of overlap, which has been noted previously in reference 48, suggests that there is more to learn about discovering functional RNAs. In addition to the 62 regions predicted to be functional RNA that are expressed at late exponential growth, our bioinformatic approaches predicted another 936 regions to be putative sRNAs. Since the newly identified sRNAs show growth-phase-dependent expression, it is likely that many more sRNAs are expressed at other growth phases or under certain stress conditions. Thus, besides the new sRNAs identified in this study, additional sequencing approaches using different time points, growth and media conditions will reveal further sRNAs and help to complete the transcriptional map of the S. coelicolor genome.
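The bookkeeping behind such a Venn diagram amounts to interval overlap tests. The sketch below is illustrative only: it classifies each sequenced region by whether it overlaps a region from either prediction list, using hypothetical function names and toy coordinates, and it ignores strand for brevity.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def classify_regions(sequenced: List[Interval], rnaz: List[Interval],
                     dynalign: List[Interval]) -> Dict[str, int]:
    """Count sequenced regions by which prediction sets overlap them."""
    counts = {"both": 0, "rnaz_only": 0, "dynalign_only": 0, "neither": 0}
    for region in sequenced:
        in_rnaz = any(overlaps(region, p) for p in rnaz)
        in_dynalign = any(overlaps(region, p) for p in dynalign)
        if in_rnaz and in_dynalign:
            counts["both"] += 1
        elif in_rnaz:
            counts["rnaz_only"] += 1
        elif in_dynalign:
            counts["dynalign_only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy intervals for illustration only.
counts = classify_regions(
    sequenced=[(100, 200), (500, 600), (900, 1000), (1500, 1600)],
    rnaz=[(150, 250), (550, 650)],
    dynalign=[(180, 220), (950, 990)],
)
```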
Material and Methods
Cultivation of S. coelicolor.
10⁸ spores per 50 ml medium were pre-germinated as described by Kieser et al. Cultures were grown in TSB medium19 (Becton-Dickinson) with glass beads (2 g/50 ml) at 30°C under continuous shaking to the end of the exponential phase (72 h). Cell growth was monitored by measuring the OD450.
RNA isolation.
Total RNA was isolated using the hot-phenol method described by Mattatall et al. with the following modifications: A 50 ml culture was harvested by centrifugation at 8,500 rpm at 4°C and resuspended in 10 ml lysis buffer (10 mM sodium acetate, 150 mM sucrose, pH 4.8). Glass beads (5 ml each of 0.4 and of 4 mm diameter) and 10 ml hot phenol (65°C, pH 4.5) were added and the mixture was shaken in the FastPrep-24 (MP Biomedicals) for 3 × 30 s at 6 m/s. The cell suspension was incubated at 65°C for 1 min between the shaking steps. After phenol/chloroform extraction and ethanol precipitation the RNA was resuspended in 500 µl water and the concentration was determined (usually 2–4 µg/µl). 100 µg total RNA was incubated with 30 U Turbo DNase (Ambion) for 1 h to remove residual DNA, subsequently precipitated and resuspended in 50 µl water. Usually, a concentration of 1–1.5 µg/µl was obtained, and the quality was checked on a 1% agarose gel. Absence of contaminating DNA was verified by PCR with the primer pairs 4091-fwd/rev and 6640-fwd/rev (program: 96°C 5 min; 40 × (96°C 30 s, 50°C 30 s, 72°C 30 s); 72°C 5 min). Primer sequences are available upon request.
Preparation of cDNA library and sequencing.
cDNA library generation and deep sequencing analysis were performed as described before, but omitting size fractionation of RNA prior to cDNA synthesis.50 Briefly, the isolated RNA was split into two equal samples. One of the samples was treated with terminator 5′exonuclease (TEX), thereby removing all RNAs carrying a 5′monophosphate as previously described in reference 11. In this way, primary transcripts were enriched in this sample. Next, the pyrophosphate group at the 5′ end of primary transcripts was removed using tobacco acid pyrophosphatase (TAP). The RNA was polyA tailed and RNA linkers were ligated to the 5′end of each RNA. Finally, a cDNA library was created for each sample (SCO−: no treatment; SCO+: enrichment of primary transcripts). After sequencing, 5′linker sequences and polyA tails were removed from the cDNA reads. Reads >17 nt were aligned to the S. coelicolor genome (NC_003888,1) using WU_BLAST 2.0 (blast.wustl.edu). For visualization of BLAST hits, graph files were calculated and loaded into the Integrated Genome Browser (Affymetrix) as previously described in reference 50.
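The read clean-up step described above (removal of the 5′ linker and the polyA tail, followed by a length filter of >17 nt) can be illustrated with a short Python sketch. It is not the pipeline used in the study: the linker sequence is a hypothetical placeholder because the actual linker is not given in the text, and treating a run of at least six terminal A residues as the polyA tail is an assumed heuristic.

```python
import re
from typing import Optional

# Hypothetical 5' linker; the linker sequence used in the study is not given in the text.
LINKER = "ACGTACGTAC"

# Reads >17 nt are kept, as stated in the text.
MIN_LEN = 18

# Assumed heuristic: a run of at least six A's at the 3' end is treated as the poly(A) tail.
POLYA_TAIL = re.compile(r"A{6,}$")


def clean_read(read: str) -> Optional[str]:
    """Strip the 5' linker and the 3' poly(A) tail; return None if the insert is too short."""
    read = read.upper()
    if read.startswith(LINKER):
        read = read[len(LINKER):]
    read = POLYA_TAIL.sub("", read)
    return read if len(read) >= MIN_LEN else None


# Toy reads for illustration only.
long_read = LINKER + "GCCGGCTTCGACGGCTCGGTC" + "AAAAAAAAAA"
short_read = LINKER + "GCCGG" + "AAAAAAAA"

print(clean_read(long_read))   # 21-nt insert is kept
print(clean_read(short_read))  # 5-nt insert is discarded (None)
```

A tail-trimming rule of this kind also removes genuine A-rich 3′ ends, which is a smaller concern in a GC-rich genome but still worth keeping in mind when interpreting transcript 3′ ends.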
Northern blot analysis of sRNAs.
Ten micrograms of total RNA was separated on 8% denaturing polyacrylamide (PAA) gels and transferred to a positively charged nylon membrane (Hybond N+, GE Healthcare) in a tank blotting device (Peqlab) at 4°C. Five picomoles of 20–25 mer oligonucleotides (Purimex) antisense to the putative sRNA were radioactively labeled at the 5′ end using 2 µl γ-32P-ATP (∼3.3 pmol/µl, Hartmann-Analytik) and 1 µl T4 polynucleotide kinase (Roche) in the supplied buffer for 1 h at 37°C and purified with Illustra Microspin G-25 columns (GE Healthcare). 25 µl labeled oligonucleotide (∼300,000 cpm/µl) was used as probe per experiment. Signals were quantified by phosphoimaging using a Typhoon 9100 (GE Healthcare). If no signal was detected, the candidate sRNA was retested using a second, different probe. In addition, 20,000 cpm of a radiolabeled ssRNA size marker was loaded on the gel. For this purpose, 5 µl Low Range ssRNA Marker (NEB) was treated with alkaline phosphatase (Roche) to remove the 5′phosphate and afterwards labeled as described above.
Dynalign facilitated prediction of sRNAs.
MUMmer51 was used to generate pairwise genomic alignments between S. coelicolor/S. avermitilis and S. coelicolor/T. fusca. The options “-b 1600 -c 10” were used in MUMmer in order to obtain more genomic coverage than with the default parameters. The S. coelicolor genome was divided into windows of 100 nt with 50 overlapping nucleotides between adjacent windows. Windows located in intergenic regions were mapped to the genomic alignments to find the aligned sequences. The aligned pairs of windows were provided to the Dynalign/SVMz method to predict their probabilities of being ncRNA. This method uses Dynalign52 to predict the lowest free energy common structure for two sequences. The folding free energy change and sequence features are then used as data by a support vector machine classification method to predict the probability that a given window is an sRNA.53,54 The prediction results from the two genomic alignments were pooled to make a candidate list. Hits >79 nt with probabilities of 0.9 or larger were selected as candidates, and overlapping/duplicated hits were combined.
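The windowing and candidate-selection logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: coordinates are taken to be 0-based and half-open, a trailing partial window is dropped, and the per-window probabilities are invented inputs standing in for the Dynalign/SVMz output, which is not reproduced here. The function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_windows(length: int, size: int = 100, step: int = 50) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows of `size` nt, shifted by `step` nt.

    A trailing partial window is dropped; a sequence shorter than `size`
    yields one truncated window.
    """
    for start in range(0, max(length - size, 0) + 1, step):
        yield start, min(start + size, length)


def select_candidates(hits: List[Tuple[int, int, float]], min_prob: float = 0.9,
                      min_len: int = 80) -> List[Tuple[int, int]]:
    """Keep hits >79 nt with probability >= min_prob, then combine overlapping hits."""
    kept = sorted((s, e) for s, e, p in hits if p >= min_prob and e - s >= min_len)
    merged: List[Tuple[int, int]] = []
    for s, e in kept:
        if merged and s < merged[-1][1]:
            # Overlaps the previous region: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


print(list(sliding_windows(250)))

# Invented (start, end, probability) values for illustration only.
scored = [(0, 100, 0.95), (50, 150, 0.92), (100, 200, 0.40),
          (300, 400, 0.99), (500, 560, 0.97)]
print(select_candidates(scored))
```

In this toy input the two overlapping high-probability windows are combined into one region, the low-probability window is discarded, and the 60-nt hit fails the length cut-off.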
sRNAFinder/RNAz facilitated prediction of sRNAs.
The sRNAfinder method for the prediction of regulatory RNAs is based on a phylogenetic, comparative approach to identify structurally conserved regions in the genomes of closely related organisms.41 The genomes of these organisms are reduced to their extended intergenic regions (eIGR) plus 100 bases of their flanking genes. These eIGRs are grouped based on a BLAST46 search. Grouped eIGRs are searched for putative stable structures with RNAz.43 Hits >79 nt with probabilities of 0.9 or larger were selected as candidates and combined as described above.
Acknowledgements
The authors thank Dr. Jörg Vogel for support with the 454 sequencing and Shan Zhao and Zhi John Lu for training the support vector machines used for Dynalign-facilitated sRNA prediction classification. This work was supported by the Deutsche Forschungsgemeinschaft within the priority program SPP1258 (SU402/2-2), the Cluster of Excellence: Macromolecular Complexes and the Aventis Foundation to B.S., and by NIH grant R01HG004002 to D.H.M.
Supplementary Material
References
- 1. Bentley SD, Chater KF, Cerdeño-Tárraga AM, Challis GL, Thomson NR, James KD, et al. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature. 2002;417:141–147. doi: 10.1038/417141a.
- 2. Binnie C, Jenish D, Cossar D, Szabo A, Trudeau D, Krygsman P, et al. Expression and characterization of soluble human erythropoietin receptor made in Streptomyces lividans 66. Protein Expr Purif. 1997;11:271–278. doi: 10.1006/prep.1997.0787.
- 3. Watve MG, Tickoo R, Jog MM, Bhole BD. How many antibiotics are produced by the genus Streptomyces? Archives of Microbiology. 2001;176:386–390. doi: 10.1007/s002030100345.
- 4. Chater KF. Regulation of sporulation in Streptomyces coelicolor A3(2): a checkpoint multiplex? Curr Opin Microbiol. 2001;4:667–673. doi: 10.1016/s1369-5274(01)00267-3.
- 5. Reuther J, Wohlleben W. Nitrogen metabolism in Streptomyces coelicolor: transcriptional and post-translational regulation. J Mol Microbiol Biotechnol. 2007;12:139–146. doi: 10.1159/000096469.
- 6. Sharma CM, Vogel J. Experimental approaches for the discovery and characterization of regulatory small RNA. Curr Opin Microbiol. 2009;12:536–546. doi: 10.1016/j.mib.2009.07.006.
- 7. Storz G, Altuvia S, Wassarman KM. An abundance of RNA regulators. Annu Rev Biochem. 2005;74:199–217. doi: 10.1146/annurev.biochem.74.082803.133136.
- 8. Gottesman S. The small RNA regulators of Escherichia coli: Roles and Mechanisms. Annu Rev Microbiol. 2004;58:303–328. doi: 10.1146/annurev.micro.58.030603.123841.
- 9. Waters LS, Storz G. Regulatory RNAs in bacteria. Cell. 2009;136:615–628. doi: 10.1016/j.cell.2009.01.043.
- 10. Jäger D, Sharma CM, Thomsen J, Ehlers C, Vogel J, Schmitz RA. Deep sequencing analysis of the Methanosarcina mazei Go1 transcriptome in response to nitrogen availability. Proc Natl Acad Sci USA. 2009. doi: 10.1073/pnas.0909051106.
- 11. Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiß S, Sittka A, et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010;464:250–255. doi: 10.1038/nature08756.
- 12. Albrecht M, Sharma CM, Reinhardt R, Vogel J, Rudel T. Deep sequencing-based discovery of the Chlamydia trachomatis transcriptome. Nucleic Acids Research. 2010;38:868–877. doi: 10.1093/nar/gkp1032.
- 13. van Vliet AH. Next generation sequencing of microbial transcriptomes: challenges and opportunities. FEMS Microbiol Lett. 2010;302:1–7. doi: 10.1111/j.1574-6968.2009.01767.x.
- 14. Sorek R, Cossart P. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 11:9–16. doi: 10.1038/nrg2695.
- 15. Pánek J, Bobek J, Mikulik K, Basler M, Vohradsky J. Biocomputational prediction of small non-coding RNAs in Streptomyces. BMC Genomics. 2008;9:217. doi: 10.1186/1471-2164-9-217.
- 16. Swiercz J, Hindra, Bobek J, Haise H, Di Berardo C, Tjaden B, et al. Small non-coding RNAs in Streptomyces coelicolor. Nucleic Acids Research. 2008;36:7240–7251. doi: 10.1093/nar/gkn898.
- 17. D'alia D, Nieselt K, Steigele S, Müller J, Verburg I, Takano E. Non-coding RNA of glutamine synthetase I modulates antibiotic production in Streptomyces coelicolor A3(2). J Bacteriol. 2010;192:1160–1164. doi: 10.1128/JB.01374-09.
- 18. Tezuka T, Hara H, Ohnishi Y, Horinouchi S. Identification and gene disruption of small noncoding RNAs in Streptomyces griseus. J Bacteriol. 2009;191:4896–4904. doi: 10.1128/JB.00087-09.
- 19. Tobias Kieser MJB, Buttner MJ, Chater KF, Hopwood DA. Practical Streptomyces Genetics. Colney, Norwich NR4 7UH, England: John Innes Foundation.
- 20. Caballero JL, Malpartida F, Hopwood DA. Transcriptional organization and regulation of an antibiotic export complex in the producing Streptomyces culture. Mol Gen Genet. 1991;228:372–380. doi: 10.1007/BF00260629.
- 21. Viollier PH, Nguyen KT, Minas W, Folcher M, Dale G, Thompson CJ. Roles of aconitase in growth, metabolism and morphological differentiation of Streptomyces coelicolor. Journal of Bacteriology. 2001;183:3193–3203. doi: 10.1128/JB.183.10.3193-3203.2001.
- 22. Van Etten WJ, Janssen GR. An AUG initiation codon, not codon-anticodon complementarity, is required for the translation of unleadered mRNA in Escherichia coli. Mol Microbiol. 1998;27:987–1001. doi: 10.1046/j.1365-2958.1998.00744.x.
- 23. Hering O, Brenneis M, Beer J, Suess B, Soppa J. A novel mechanism for translation initiation operates in haloarchaea. Mol Microbiol. 2009;71:1451–1463. doi: 10.1111/j.1365-2958.2009.06615.x.
- 24. Janssen GR. Eubacterial, Archaebacterial and Eucaryotic Genes That Encode Leaderless mRNA. In: Balz RH, editor. Industrial Microorganisms: Basic and Applied Molecular Genetics. Washington DC: American Society for Microbiology; 1993.
- 25. Bibb MJ, White J, Ward JM, Janssen GR. The mRNA for the 23S rRNA methylase encoded by the ermE gene of Saccharopolyspora erythraea is translated in the absence of a conventional ribosome-binding site. Mol Microbiol. 1994;14:533–545. doi: 10.1111/j.1365-2958.1994.tb02187.x.
- 26. Janssen GR, Ward JM, Bibb MJ. Unusual transcriptional and translational features of the aminoglycoside phosphotransferase gene (aph) from Streptomyces fradiae. Genes & Development. 1989;3:415–429. doi: 10.1101/gad.3.3.415.
- 27. Jones RL, Jaskula JC, Janssen GR. In vivo translational start site selection on leaderless mRNA transcribed from the Streptomyces fradiae aph gene. Journal of Bacteriology. 1992;174:4753–4760. doi: 10.1128/jb.174.14.4753-4760.1992.
- 28. Kaberdin VR, Bläsi U. Translation initiation and the fate of bacterial mRNAs. FEMS Microbiol Rev. 2006;30:967–979. doi: 10.1111/j.1574-6976.2006.00043.x.
- 29. Moll I, Grill S, Gualerzi CO, Blasi U. Leaderless mRNAs in bacteria: surprises in ribosomal recruitment and translational control. Mol Microbiol. 2002;43:239–246. doi: 10.1046/j.1365-2958.2002.02739.x.
- 30. Grill S, Gualerzi CO, Londei P, Blasi U. Selective stimulation of translation of leaderless mRNA by initiation factor 2: evolutionary implications for translation. EMBO J. 2000;19:4101–4110. doi: 10.1093/emboj/19.15.4101.
- 31. Kaberdina AC, Szaflarski W, Nierhaus KH, Moll I. An unexpected type of ribosomes induced by kasugamycin: a look into ancestral times of protein synthesis? Molecular Cell. 2009;33:227–236. doi: 10.1016/j.molcel.2008.12.014.
- 32. Sorokin A, Serror P, Pujic P, Azevedo V, Ehrlich SD. The Bacillus subtilis chromosome region encoding homologues of the Escherichia coli mssA and rpsA gene products. Microbiology. 1995;141:311–319. doi: 10.1099/13500872-141-2-311.
- 33. Weinberg Z, Barrick JE, Yao Z, Roth A, Kim JN, Gore J, et al. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Research. 2007;35:4809–4819. doi: 10.1093/nar/gkm487.
- 34. Wassarman KM. 6S RNA: a small RNA regulator of transcription. Curr Opin Microbiol. 2007;10:164–168. doi: 10.1016/j.mib.2007.03.008.
- 35. Gruber A, Lorenz R, Bernhart S, Neubock R, Hofacker IL. The Vienna RNA Websuite. Nucleic Acids Research. 2008;36:W70–W74. doi: 10.1093/nar/gkn188.
- 36. Kormanec J, Farkasovsky M, Potuckova L. Four genes in Streptomyces aureofaciens containing a domain characteristic of principal sigma factors. Gene. 1992;122:63–70. doi: 10.1016/0378-1119(92)90032-k.
- 37. Xiao B, Li W, Guo G, Li B, Liu Z, Jia K, et al. Identification of small noncoding RNAs in Helicobacter pylori by a bioinformatics-based approach. Curr Microbiol. 2009;58:258–263. doi: 10.1007/s00284-008-9318-2.
- 38. Boisset S, Geissmann T, Huntzinger E, Fechter P, Bendridi N, Possedko M, et al. Staphylococcus aureus RNAIII coordinately represses the synthesis of virulence factors and the transcription regulator Rot by an antisense mechanism. Genes Dev. 2007;21:1353–1366. doi: 10.1101/gad.423507.
- 39. Tjaden B, Goodwin SS, Opdyke JA, Guillier M, Fu DX, Gottesman S, et al. Target prediction for small, noncoding RNAs in bacteria. Nucleic Acids Research. 2006;34:2791–2802. doi: 10.1093/nar/gkl356.
- 40. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277. doi: 10.1016/s0168-9525(00)02024-2.
- 41. Tjaden B. Prediction of small, noncoding RNAs in bacteria using heterogeneous data. J Math Biol. 2008;56:183–200. doi: 10.1007/s00285-007-0079-5.
- 42. Kingsford CL, Ayanbule K, Salzberg SL. Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol. 2007;8:22. doi: 10.1186/gb-2007-8-2-r22.
- 43. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA. 2005;102:2454–2459. doi: 10.1073/pnas.0409169102.
- 44. Uzilov AV, Keegan JM, Mathews DH. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics. 2006;7:173. doi: 10.1186/1471-2105-7-173.
- 45. Thompson J, Higgins D, Gibson T. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- 46. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 47. Axmann I, Kensche P, Vogel J, Kohl S, Herzel H, Hess WR. Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol. 2005;6:73. doi: 10.1186/gb-2005-6-9-r73.
- 48. Gorodkin J, Hofacker IL, Torarinsson E, Yao Z, Havgaard JH, Ruzzo WL. De novo prediction of structured RNAs from genomic sequences. Trends Biotechnol. 28:9–19. doi: 10.1016/j.tibtech.2009.09.006.
- 49. Mattatall NR, Sanderson KE. Salmonella typhimurium LT2 possesses three distinct 23S rRNA intervening sequences. J Bacteriol. 1996;178:2272–2278. doi: 10.1128/jb.178.8.2272-2278.1996.
- 50. Sittka A, Lucchini S, Papenfort K, Sharma CM, Rolle K, Binnewies TT, et al. Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genetics. 2008;4:1000163. doi: 10.1371/journal.pgen.1000163.
- 51. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:12. doi: 10.1186/gb-2004-5-2-r12.
- 52. Mathews DH, Turner DH. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol. 2002;317:191–203. doi: 10.1006/jmbi.2001.5351.
- 53. Chang C, Lin C. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/∼cjlin/libsvm.
- 54. Lu Z. Secondary Structure Prediction of Non-coding RNA. Rochester: University of Rochester; 2008.
- 55. Tjaden B. Prediction of small, noncoding RNAs in bacteria using heterogeneous data. J Math Biol. 2008;56:183–200. doi: 10.1007/s00285-007-0079-5.