. 2010 Oct;13(5):619–624. doi: 10.1016/j.mib.2010.09.009

Studying bacterial transcriptomes using RNA-seq

Nicholas J Croucher 1, Nicholas R Thomson 1
PMCID: PMC3025319  PMID: 20888288

Abstract

Genome-wide studies of bacterial gene expression are shifting from microarray technology to second generation sequencing platforms. RNA-seq has a number of advantages over hybridization-based techniques, such as annotation-independent detection of transcription, improved sensitivity and increased dynamic range. Early studies have uncovered a wealth of novel coding sequences and non-coding RNA, and are revealing a transcriptional landscape that increasingly mirrors that of eukaryotes. Basic RNA-seq protocols have already been improved and adapted to examine particular aspects of RNA biology, often with an emphasis on non-coding RNAs, and further refinements to current techniques will improve our understanding of gene expression, and genome content, in the future.

Introduction

The advent of second generation sequencing technologies has created many opportunities to improve functional genomics experiments, including quantitative gene expression studies. Most previous transcriptional analysis methods have relied on hybridization of targeted oligonucleotides to particular loci for their sequence specificity: either primers binding to target cDNA in quantitative reverse transcription polymerase chain reaction (qRT-PCR), labeled probes binding to RNA in Northern blotting or hybridization of cDNA to probes on microarray chips. RNA-seq differs in principle: sequence reads are instead matched to genes by sequence alignment.

This has intrinsic advantages. First, because no probe sequences are specified, all transcription is studied in an unbiased manner, and the experimental design does not need to be altered to accommodate differences in genome sequence. This promises to be a particular advantage in the study of bacteria with large amounts of genetic variation between strains [1]. It also allows the discovery of novel genetic features and the delineation of operons and untranslated regions, so that sequence annotation can be improved and extended.

Second, mapping of sequence data is more precise than hybridization between oligonucleotides. Sequencing therefore allows transcription to be studied at a much higher resolution, and also permits the study of more repetitive regions of the genome. Additionally, quantification of gene expression by RNA-seq does not suffer from interference between genes caused by non-specific hybridization of cDNA to probes [2,3].

Third, whereas hybridization-based methods measure gene expression levels through detection of fluorescence or radioactivity, RNA-seq uses the amount of data matching a given coding sequence (CDS), typically quantified as reads per kilobase of CDS length per million reads analyzed (RPKM) [4]. This measure cannot be saturated in the way that the detection of light or radioactivity can; hence RNA-seq has a much greater dynamic range for measuring variability in expression levels. Consequently, given sufficient sequencing depth, it can discriminate better between genes at high levels of expression and is more sensitive at very low levels of expression.
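The RPKM definition above normalizes a read count for both CDS length and sequencing depth. A minimal Python sketch follows; the function name and the example counts are hypothetical, and `total_mapped_reads` is assumed to be the number of reads analyzed (for example, those mapped to the genome).

```python
def rpkm(cds_reads: int, cds_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of CDS length per million reads analyzed.

    RPKM = cds_reads / ((cds_length_bp / 1e3) * (total_mapped_reads / 1e6))
         = cds_reads * 1e9 / (cds_length_bp * total_mapped_reads)
    """
    if cds_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("CDS length and total read count must be positive")
    return cds_reads * 1e9 / (cds_length_bp * total_mapped_reads)


# Hypothetical example: two CDSs with equal read counts in one 2-million-read library.
library_size = 2_000_000
print(rpkm(500, 1_000, library_size))  # 250.0
print(rpkm(500, 2_000, library_size))  # 125.0: twice as long, so half the RPKM
```

Because RPKM is a ratio of counts rather than a signal intensity, it has no instrument-imposed ceiling; its precision for weakly expressed genes is instead limited by how many reads are sequenced.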

Preparation of cDNA

RNA is typically extracted using organic solvents or commercially available kits; however, care should be taken to ensure that the method does not bias the sampling of the transcriptome [5] and yields sufficient starting material to construct a sequencing library, as more RNA is typically needed than for microarray experiments. Furthermore, excluding highly expressed transcripts, which risk saturating the dataset, is more difficult than in microarray experiments, where probes can be omitted from the chip design as required. As ribosomal RNA comprises the vast majority of the extracted RNA population, it has been depleted in efforts to increase the coverage of mRNA and ncRNA, either through hybridization to magnetic bead-linked complementary oligonucleotides [5–10,11] or through the use of terminator exonucleases that specifically degrade transcripts with a 5′ monophosphate group [12••]. However, the rapid increase in the productivity of second generation sequencing technologies renders these expensive depletion processes largely unnecessary, especially given the opportunity for sample degradation and bias they present [10]. Nevertheless, saturation of sequence data by abundant transcripts will remain an issue in some cases, for instance when analyzing bacterial gene expression within host tissues, where eukaryotic RNA will be far more abundant than that of the prokaryote.
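The benefit of depletion, or the cost of skipping it, reduces to simple arithmetic on the fraction of reads derived from rRNA (or, in host tissues, from eukaryotic RNA). The sketch below is illustrative only: the function name and the 90% and 50% fractions are hypothetical values, not measurements from the studies cited.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads left for mRNA and ncRNA once rRNA-derived reads are discarded.

    rrna_fraction is the (assumed) proportion of reads that map to rRNA or,
    for samples taken from host tissue, to host RNA.
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must lie between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical scenario: a 10-million-read library, without and with rRNA depletion.
for label, fraction in [("undepleted", 0.9), ("depleted", 0.5)]:
    print(label, informative_reads(10_000_000, fraction))
```

When sequencing output is plentiful, sequencing an undepleted library more deeply can recover as many informative reads as depletion would, without the risk of degradation; when the unwanted fraction approaches 1, as with bacteria inside host tissue, no practical increase in depth compensates.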

In the original RNA-seq protocols, following extensive DNase treatment, RNA was typically converted into cDNA through random hexamer-primed reverse transcription followed by second DNA strand synthesis [5–9,13]. However, using double stranded cDNA for making sequencing libraries results in equal levels of signal on both the sense and antisense strands, thereby losing information regarding the direction of transcription. A simple method for maintaining the directional signal in RNA-seq data is to construct Illumina libraries from only first strand cDNA [10]. Alternative techniques used to maintain directional fidelity involve sequentially ligating adapters onto RNA molecules in an orientation-specific manner [14,15], with one approach implemented in studies of Mycoplasma pneumoniae and Pseudomonas syringae transcriptomes [16••,17] and another used for RNA-seq in Helicobacter pylori and Salmonella enterica Typhimurium [12••,18] (Figure 1). Other methods for maintaining directional information pioneered in studies of eukaryotes include the use of template switching PCR [19], bisulfite-induced conversion of cytosine to uracil in transcripts before reverse transcription [20], addition of sequence tags into the primers used for reverse transcription [21] and incorporation of deoxyuridine into the second strand of cDNA, which can subsequently be degraded using uracil-N-glycosylase [22]. The importance of this information in characterizing ncRNA and observing antisense transcription is becoming increasingly evident.
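Once reads are aligned, a directional library lets each read be assigned to the strand of the RNA it came from. Whether a read matches the RNA sequence or its complement depends on the library chemistry, so in the hypothetical sketch below that property is an explicit flag, `read_is_antisense`, which the user must set for their own protocol; it is not a property of any specific method cited above.

```python
from collections import Counter
from typing import Iterable


def transcript_strand(read_strand: str, read_is_antisense: bool) -> str:
    """Infer the strand of the RNA template from the strand a read aligned to.

    read_is_antisense (an assumption about the protocol): True if reads are
    complementary to the original RNA, False if they match its sequence.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if read_is_antisense:
        return "-" if read_strand == "+" else "+"
    return read_strand


def strand_tally(read_strands: Iterable[str], read_is_antisense: bool) -> Counter:
    """Count the reads at one locus by inferred transcript strand."""
    return Counter(transcript_strand(s, read_is_antisense) for s in read_strands)


# Hypothetical reads at one locus from a library whose reads are antisense to the RNA.
print(strand_tally(["-", "-", "-", "+"], read_is_antisense=True))
```

With a double-stranded cDNA library, each fragment is equally likely to be read from either strand, so such a tally would be close to even whatever the true direction of transcription; this is the loss of information that the directional protocols avoid.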

Figure 1.

Methods for preparation of cDNA. All methods require the extraction of nucleic acids from a sample of cells, followed by the enzymatic removal of DNA. Ribosomal RNA may then be depleted to increase the sequence coverage of other transcripts. To identify putative transcriptional start sites, samples are first treated with tobacco acid pyrophosphatase (TAP), which converts the triphosphate group at the 5′ end of intact transcripts to a monophosphate. This is required for the ligation reaction that attaches an adapter to the 5′ end; polyadenylation or oxidation of the 3′ end of the RNA is used to make this reaction orientation-specific. This allows the 3′ part of the cDNA, corresponding to the extreme 5′ end of the original transcript, to be targeted for sequencing. To obtain sequence data covering the entire transcriptome, small cDNA molecules must be randomly generated from throughout the RNA sample. This has frequently been achieved through random hexamer-primed reverse transcription; using only the first strand for sequencing library construction allows information on the direction of transcription to be maintained. Alternatively, the RNA may be fragmented, and information on the template strand for transcription retained through orientation-specific, stepwise attachment of adapters. One method involves dephosphorylating the 5′ end so that the first adapter can only be ligated to the 3′ end of the transcript; the complementary approach is to polyadenylate the 3′ end so that the first adapter is only found attached to the 5′ end of the RNA. One technique not shown is the use of fragmented RNA as a template for random hexamer-primed reverse transcription, as performed by Oliver et al. A wider range of methods has been applied in obtaining similar information from eukaryotic transcriptomes (see text).

Alternative applications of RNA-seq

Beyond surveying the entire transcriptomes of bacterial strains, RNA-seq can be adapted to other experiments. For instance, techniques have been developed to specifically sequence the 5′ region of RNA molecules, allowing the identification of putative transcriptional start sites and helping to define operons and ncRNA [12••,13] (Figure 1). In S. Typhimurium, coimmunoprecipitation of RNA molecules with Hfq, a chaperone that facilitates hybridization between ncRNA and mRNA, was used to enrich a sample for transcripts participating in such interactions [18], while in Vibrio cholerae, a very stringent depletion and size-selection process was used to specifically sequence small ncRNA [23]. RNA-seq has also been applied to whole environments, leading to the development of techniques for sampling the metatranscriptomes of marine [24,25] and soil communities [26].
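The 5′-end approach can be illustrated by comparing, position by position, how many read 5′ ends fall in a TAP-treated library versus an untreated one (Figure 1): primary transcripts only become ligatable after TAP treatment, so their start sites should gain reads. The sketch below assumes that reads are already aligned and oriented like the RNA, and that both libraries have similar depth; the names, thresholds and pseudocount are hypothetical choices, not the analysis pipelines of [12••,13].

```python
from collections import Counter
from typing import Iterable, List, Tuple

Site = Tuple[int, str]  # (genomic position of a read's 5' end, strand)


def five_prime_site(start: int, end: int, strand: str) -> Site:
    """5' end of a read aligned to [start, end] (inclusive coordinates).

    Assumes reads are in the same orientation as the RNA, so the read's
    5' end marks the transcript's 5' end.
    """
    return (start, strand) if strand == "+" else (end, strand)


def call_putative_tss(tap_treated: Iterable[Site], untreated: Iterable[Site],
                      min_reads: int = 10, min_ratio: float = 2.0,
                      pseudocount: float = 1.0) -> List[Site]:
    """Return sites whose 5'-end counts are enriched after TAP treatment.

    Processed 5'-monophosphate ends are ligatable in both libraries, so they
    show little enrichment. No library-size normalization is applied here.
    """
    treated_counts = Counter(tap_treated)
    untreated_counts = Counter(untreated)
    calls = []
    for site, n_treated in treated_counts.items():
        ratio = (n_treated + pseudocount) / (untreated_counts[site] + pseudocount)
        if n_treated >= min_reads and ratio >= min_ratio:
            calls.append(site)
    return sorted(calls)


# Hypothetical 5'-end sites: (800, "+") behaves like a processing site.
treated = [(100, "+")] * 20 + [(500, "-")] * 12 + [(800, "+")] * 15
untreated = [(100, "+")] * 2 + [(800, "+")] * 14
print(call_putative_tss(treated, untreated))  # [(100, '+'), (500, '-')]
```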

Analysis of sequence data

Illumina, 454 and SOLiD sequencing platforms have been used in bacterial RNA-seq studies [27–29]. Each offers a different compromise between read length, which determines the proportion of the genome to which data can be uniquely mapped, and depth of coverage, which determines the dynamic range over which gene expression can be quantified. However, above a certain threshold, longer reads yield only a relatively small increase in the proportion of the genome that can be studied; hence read depth will be the more important consideration in almost all cases.
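The link between read length and mappable fraction can be explored by asking what fraction of possible read positions in a genome carry a sequence that occurs only once. The brute-force sketch below works under simplifying assumptions (exact matching, no sequencing errors, a linear toy sequence); it is not how dedicated mappability tools are implemented and would be slow and memory-hungry on a real genome.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def uniquely_mappable_fraction(genome: str, read_length: int) -> float:
    """Fraction of read start positions whose sequence occurs exactly once.

    Both strands are searched and reads are assumed error-free. A palindromic
    read counts as non-unique here because it matches both strands.
    """
    n_positions = len(genome) - read_length + 1
    if read_length <= 0 or n_positions <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")
    counts = Counter()
    for i in range(n_positions):
        kmer = genome[i:i + read_length]
        counts[kmer] += 1                      # forward-strand occurrence
        counts[reverse_complement(kmer)] += 1  # reverse-strand occurrence
    unique = sum(1 for i in range(n_positions)
                 if counts[genome[i:i + read_length]] == 1)
    return unique / n_positions


for k in (1, 2, 3):
    print(k, round(uniquely_mappable_fraction("GATTACA", k), 2))
```

On a real bacterial genome, the fraction would be expected to rise steeply at short read lengths and then level off, with the remaining non-unique positions concentrated in long repeats such as rRNA operons, which is why additional read length eventually buys little.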

After sequencing, reads can be assembled using software based either on overlap graphs, such as EDENA [30], or on de Bruijn graphs, for instance ABySS [31], ALLPATHS [32] or Velvet [33], which features a strand-specific assembly mode. Alternatively, the reads can be mapped onto a reference sequence. Some studies have used BLAST-based or nucmer-based algorithms [34,35] to align sequence reads to the genome, but a number of programs have been developed specifically for mapping short read data [36–39]; these often have the advantage of taking base quality and read pair information into account when performing alignments. The results of mapping analyses have commonly been visualized as a graph of sequence read coverage across a genome, displayed using software such as the Integrated Genome Browser [40] or Artemis [41]. With the introduction of specialist tools such as BamView [42], individual sequence reads can be visualized alongside coverage graphs, allowing a more intuitive understanding of the transcriptional landscape (Figure 2).
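The coverage graphs mentioned above (such as panels (a) and (d) of Figure 2) are per-base read depths, which can be computed from alignments with a difference array. A minimal sketch, assuming alignments have already been reduced to hypothetical (start, end, strand) tuples; in practice they would come from a BAM file, and utilities such as `samtools depth` report per-base depth directly. For a directional library, the alignment strand may need flipping to give the transcript strand, depending on the protocol.

```python
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

Alignment = Tuple[int, int, str]  # (start, end, strand); 0-based, end exclusive


def strand_coverage(alignments: Iterable[Alignment],
                    genome_length: int) -> Dict[str, List[int]]:
    """Per-base read depth on each strand, computed with a difference array."""
    diffs = {"+": [0] * (genome_length + 1), "-": [0] * (genome_length + 1)}
    for start, end, strand in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"alignment out of range: {(start, end, strand)}")
        diffs[strand][start] += 1  # depth rises where the read begins
        diffs[strand][end] -= 1    # and falls just after it ends
    # A running sum over the difference array gives the depth at each base.
    return {strand: list(accumulate(d[:genome_length])) for strand, d in diffs.items()}


reads = [(0, 4, "+"), (2, 6, "+"), (5, 8, "-")]
cov = strand_coverage(reads, genome_length=10)
total = [p + m for p, m in zip(cov["+"], cov["-"])]
print(cov["+"])  # [1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
print(cov["-"])  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
print(total)     # [1, 1, 2, 2, 1, 2, 1, 1, 0, 0]
```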

Figure 2.

Display of RNA-seq data. Data from a Salmonella bongori transcriptome, prepared as described in Ref. [9], displayed using Artemis. Using BamView, the total coverage is displayed as a plot (a), as raw reads aligned against the reference sequence (b) and as reads assigned separately to the two strands of the genome (c). A strand-specific coverage plot is also shown (d), and the genome annotation is displayed underneath.

Like comparable methods, RNA-seq requires biological replicates for robust quantification of differential expression. However, the greater cost of sequencing relative to microarray hybridization makes such replication expensive, so statistical methods have been developed to mitigate this by modeling the expected distributions of sequence reads mapping to a locus in different samples. DEGseq [43] uses a Poisson distribution to model the variation between datasets [44], whereas the approaches of edgeR [45] and DESeq [46] are based on the negative binomial distribution, which is suggested to be more appropriate for modeling the variation inherent between biological replicates [47].
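The two models named above differ in how the variance of a read count scales with its mean. The sketch below shows both variance functions, using one common mean-dispersion parameterization of the negative binomial, together with a simple exact test under the Poisson assumption: conditioning on the total count turns the two-sample Poisson comparison into a binomial one. It illustrates the models only; it is not the DEGseq, edgeR or DESeq implementation, which are R/Bioconductor packages with their own normalization and dispersion estimation, and the dispersion value used is hypothetical.

```python
import math


def poisson_variance(mean: float) -> float:
    """Poisson model: the variance equals the mean."""
    return mean


def negative_binomial_variance(mean: float, dispersion: float) -> float:
    """Negative binomial, mean-dispersion form: var = mu + phi * mu**2.

    phi = 0 recovers the Poisson case; phi > 0 adds extra-Poisson
    (biological) variability that grows with the square of the mean.
    """
    return mean + dispersion * mean ** 2


def poisson_exact_test(count_a: int, count_b: int, size_a: float, size_b: float) -> float:
    """Two-sided exact test of equal expression between two libraries.

    Under a Poisson model, and conditional on n = count_a + count_b,
    count_a ~ Binomial(n, p) with p = size_a / (size_a + size_b).
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("library sizes must be positive")
    n = count_a + count_b
    p = size_a / (size_a + size_b)

    def log_pmf(k: int) -> float:
        # Binomial log-probability, computed in log space to avoid overflow.
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))

    probs = [math.exp(log_pmf(k)) for k in range(n + 1)]
    observed = probs[count_a]
    # Sum outcomes no more probable than the observed one (tolerance for rounding).
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, p_value)


mu = 100.0
print(poisson_variance(mu), negative_binomial_variance(mu, dispersion=0.1))
print(poisson_exact_test(50, 50, 1_000_000, 1_000_000))
print(poisson_exact_test(90, 10, 1_000_000, 1_000_000))
```

With a dispersion of 0.1, a gene averaging 100 reads has a variance of 1,100 under the negative binomial model compared with 100 under the Poisson model, so a Poisson-based test applied to biological replicates will tend to overstate significance.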

Characteristics of bacterial transcriptomes

The results of bacterial RNA-seq studies have done much to refine our understanding of bacterial gene expression. One initial insight was that genome-wide CDS expression levels appear to be continuously distributed, with no obvious division between actively expressed genes and a ‘background’ transcription level [6,7]. By contrast, marine metatranscriptome studies have found that the gene sequences most highly represented in cDNA samples are often rare in, or absent from, the corresponding genomic DNA samples, suggesting that some bacteria may be transcribing a set of uncharacterized genes at an unusually high level [24,25].

Annotation of CDSs has been significantly improved using RNA-seq data. Novel CDSs have been identified in most studies [7–9,11,13,17], including that of M. pneumoniae, which has a genome just 816 kb in size [16••]. Existing gene models have been refined, often by correcting the choice of start codon, and grouped into operons, a process that can include the identification of untranslated regions.

However, in both M. pneumoniae and H. pylori, annotation of transcriptional units was complicated by an unexpectedly high level of flexibility in the structure of operons [12••,16••]. Evidence from both tiling microarray and RNA-seq data indicated that different promoters appeared to drive expression of the same genes under different conditions, leading to the division of genes into ‘suboperons’. The level of such alternative transcript forms in M. pneumoniae was estimated to be similar to that in some eukaryotes [16••].
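Grouping genes into operons from coverage data is often approached with a simple continuity heuristic: adjacent genes on the same strand are joined when transcription appears unbroken across the intergenic region. The sketch below implements that heuristic under assumed inputs and a hypothetical depth threshold; it is not the method used in [12••,16••]. As the suboperon findings illustrate, a single coverage profile cannot reveal alternative internal promoters, so transcriptional start site data and comparisons across conditions remain valuable in addition.

```python
from typing import List, Sequence, Tuple

Gene = Tuple[str, int, int]  # (name, start, end); 0-based, end exclusive


def transcription_units(genes: Sequence[Gene], coverage: Sequence[int],
                        min_depth: int = 5) -> List[List[str]]:
    """Group adjacent co-directional genes whose intergenic gap is fully covered.

    genes must lie on one strand and be sorted by start; coverage is the
    per-base read depth on that strand.
    """
    if not genes:
        return []
    units = [[genes[0][0]]]
    for (_, _, prev_end), (name, start, _) in zip(genes, genes[1:]):
        gap = coverage[prev_end:start]  # empty if the genes abut or overlap
        if all(depth >= min_depth for depth in gap):
            units[-1].append(name)      # continuous transcription: same unit
        else:
            units.append([name])        # coverage drops: start a new unit
    return units


# Hypothetical genes and coverage: depth falls to zero between genes B and C.
genes = [("A", 0, 5), ("B", 7, 12), ("C", 20, 25)]
coverage = [10] * 13 + [0] * 7 + [10] * 10
print(transcription_units(genes, coverage))  # [['A', 'B'], ['C']]
```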

All these amendments to genome annotation are aided by having information on the 5′ ends of transcripts; in Sulfolobus solfataricus, mapping these ends was also used to detect putative transcript degradation products. Enrichment of such sites was found to correlate inversely with the half-life of the RNA molecule, suggesting that an endoribonucleolytic cleavage mechanism may be important in gene regulation [13].
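An inverse relationship like the one reported in [13] is typically quantified with a rank correlation, which does not assume a linear relationship. The following Spearman sketch is illustrative: the input values are hypothetical, and the statistic actually used in [13] may differ.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


processing_site_reads = [120, 45, 300, 10, 80]  # hypothetical 5'-end counts per transcript
half_life_min = [2.0, 6.5, 1.0, 9.0, 3.5]       # hypothetical half-lives (minutes)
print(spearman_rho(processing_site_reads, half_life_min))  # -1.0 (perfectly inverse ranks)
```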

Bacterial whole transcriptome studies have thus far had a very high success rate in ncRNA discovery. Such transcripts have even been identified in marine metatranscriptome data and mapped to genomes, with certain putative ncRNAs showing distinct spatial distributions throughout the water column [48]. Validation using RT-PCR and Northern blots has been largely successful [6,12••,16••,18,23], and work has begun on functionally characterizing these targets. In H. pylori, both in silico analysis and mutational inactivation suggested that one novel ncRNA uncovered by RNA-seq regulated a chemotaxis receptor as an antisense RNA [12••], and a similar mechanism was posited for a novel ncRNA in V. cholerae, which was found to downregulate mannitol metabolism [23].

Directional RNA-seq data are particularly helpful in annotating ncRNA, as they allow reads to be assigned to a particular strand. Furthermore, such data have allowed the detection of large amounts of cis antisense ncRNA: regions of CDSs that are transcribed from both strands, with the antisense transcripts suggested to block expression of the encoded protein [12••,13,16••,17]. Such transcripts, identified both from whole genome RNA-seq and from transcriptional start site identification, have been detected and characterized before [49], but the genome-wide scale of their prevalence is only now being appreciated.
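Because directional data assign every read to a strand, a first-pass screen for cis antisense transcription can compare the read counts on the two strands of each CDS. The thresholds, pseudocount and CDS names below are hypothetical, and candidates flagged this way would still need validation, for example by Northern blotting or RT-PCR.

```python
from typing import Dict, List, Tuple


def flag_cis_antisense(strand_counts: Dict[str, Tuple[int, int]],
                       min_antisense_reads: int = 10,
                       min_ratio: float = 0.1,
                       pseudocount: float = 1.0) -> List[str]:
    """Return CDSs with appreciable transcription from the opposite strand.

    strand_counts maps a CDS name to (sense_reads, antisense_reads), counted
    from a directional library after assigning each read to a transcript strand.
    """
    flagged = []
    for cds, (sense, antisense) in strand_counts.items():
        # The pseudocount keeps the ratio finite for CDSs with no sense reads.
        ratio = (antisense + pseudocount) / (sense + pseudocount)
        if antisense >= min_antisense_reads and ratio >= min_ratio:
            flagged.append(cds)
    return sorted(flagged)


counts = {"cdsA": (1000, 5), "cdsB": (200, 80), "cdsC": (0, 30)}  # hypothetical
print(flag_cis_antisense(counts))  # ['cdsB', 'cdsC']
```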

Overall, bacterial transcription is beginning to appear more similar to that of eukaryotes. Rather than being fixed polycistronic transcriptional units, operons may represent one of several ways of transcribing a particular gene, with CDSs having a greater than expected level of independence from their neighbors. Additionally, antisense RNAs, acting either in cis or in trans, may prove to be much more important than previously appreciated.

Limitations, problems and future directions

RNA-seq datasets have proved to be highly consistent when comparing either technical or biological replicates, making them appropriate for expression studies [10,50]. However, some technical issues still await resolution, such as the highly variable coverage across genes and operons, thought to result from a combination of transcript secondary structure and biases introduced through random hexamer priming of reverse transcription and second strand synthesis [4,51]. This variability, which is generally reproducible between replicate experiments [11], introduces uncertainty into the quantification of RNA abundance. More even coverage has been achieved in eukaryotic datasets by reducing transcript secondary structure, using metal ion-catalyzed hydrolysis to fragment the RNA before reverse transcription [4], and it has been demonstrated that bacterial RNA can be fragmented in a similar manner [5,12••,15,17]. The PCR amplification stages of sequencing library construction for all three second generation sequencing platforms pose a further problem, as they generate redundant sequence reads and bias the final dataset.
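Redundant reads from PCR are commonly flagged as alignments that share identical coordinates and strand. A minimal sketch of that positional heuristic, with hypothetical inputs, is shown below. In RNA-seq its output is best treated as an upper bound on PCR-derived redundancy: highly expressed, short transcripts can yield identical fragments by chance, so positional duplicates there are not necessarily PCR copies, and removing them can compress the dynamic range.

```python
from collections import Counter
from typing import Iterable, Tuple

Alignment = Tuple[int, int, str]  # (start, end, strand)


def duplicate_fraction(alignments: Iterable[Alignment]) -> float:
    """Fraction of reads whose (start, end, strand) repeats an earlier read."""
    counts = Counter(alignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first at each distinct position counts as a duplicate.
    return (total - len(counts)) / total


reads = [(0, 50, "+"), (0, 50, "+"), (0, 50, "+"), (10, 60, "+"), (0, 50, "-")]
print(duplicate_fraction(reads))  # 0.4
```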

To circumvent such issues, techniques such as direct RNA sequencing [52] and FRT-seq [53], which sequence RNA directly without cDNA intermediates, have been developed. These promise to eventually replace current methods but suffer from the disadvantage of requiring ribonuclease-free sequencing environments, which are difficult to maintain in a high throughput sequencing facility. Efforts are also being made to reduce the quantity of starting material required for RNA-seq, with the aim of characterizing the transcriptomes of individual cells [54].

Conclusions

RNA-seq promises to gradually replace microarrays in most, if not all, genome-wide gene expression studies. Both technologies have their own limitations, but the opportunity to study transcription quantitatively at single-nucleotide resolution makes RNA-seq increasingly attractive as sequencing becomes cheaper and easier. The use of protocols that sequence RNA in a strand-specific manner, and that identify transcriptional start sites, will prove especially useful in identifying ncRNA and defining the operons to which genes belong. Hence this technique has the potential to greatly refine our understanding of bacterial gene regulation.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

We thank Adam Reid, Maria Fookes and Julian Parkhill for their comments on this manuscript. The authors are funded by the Wellcome Trust.

References

  • 1. Tettelin H., Riley D., Cattuto C., Medini D. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 2008;11:472–477. doi: 10.1016/j.mib.2008.09.006.
  • 2. Cloonan N., Grimmond S.M. Transcriptome content and dynamics at single-nucleotide resolution. Genome Biol. 2008;9:234. doi: 10.1186/gb-2008-9-9-234.
  • 3. Kane M.D., Jatkoe T.A., Stumpf C.R., Lu J., Thomas J.D., Madore S.J. Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res. 2000;28:4552–4557. doi: 10.1093/nar/28.22.4552.
  • 4. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
  • 5. Oliver H.F., Orsi R.H., Ponnala L., Keich U., Wang W., Sun Q., Cartinhour S.W., Filiatrault M.J., Wiedmann M., Boor K.J. Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs. BMC Genomics. 2009;10:641. doi: 10.1186/1471-2164-10-641.
  • 6. Yoder-Himes D.R., Chain P.S., Zhu Y., Wurtzel O., Rubin E.M., Tiedje J.M., Sorek R. Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A. 2009;106:3976–3981. doi: 10.1073/pnas.0813403106.
  • 7. Passalacqua K.D., Varadarajan A., Ondov B.D., Okou D.T., Zwick M.E., Bergman N.H. Structure and complexity of a bacterial transcriptome. J Bacteriol. 2009;191:3203–3211. doi: 10.1128/JB.00122-09.
  • 8. Camarena L., Bruno V., Euskirchen G., Poggio S., Snyder M. Molecular mechanisms of ethanol-induced pathogenesis revealed by RNA-sequencing. PLoS Pathog. 2010;6:e1000834. doi: 10.1371/journal.ppat.1000834.
  • 9. Mao C., Evans C., Jensen R.V., Sobral B.W. Identification of new genes in Sinorhizobium meliloti using the Genome Sequencer FLX system. BMC Microbiol. 2008;8:72. doi: 10.1186/1471-2180-8-72.
  • 10. Croucher N.J., Fookes M.C., Perkins T.T., Turner D.J., Marguerat S.B., Keane T., Quail M.A., He M., Assefa S., Bahler J. A simple method for directional transcriptome sequencing using Illumina technology. Nucleic Acids Res. 2009;37:e148. doi: 10.1093/nar/gkp811.
  • 11•. Perkins T.T., Kingsley R.A., Fookes M.C., Gardner P.P., James K.D., Yu L., Assefa S.A., He M., Croucher N.J., Pickard D.J. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet. 2009;5:e1000569. doi: 10.1371/journal.pgen.1000569. An interesting analysis of the transcription of the accessory genome and non-coding RNAs of Salmonella enterica Typhi.
  • 12••. Sharma C.M., Hoffmann S., Darfeuille F., Reignier J., Findeiss S., Sittka A., Chabas S., Reiche K., Hackermuller J., Reinhardt R. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010;464:250–255. doi: 10.1038/nature08756. A highly informative study, particularly with regard to the unexpectedly high levels of non-coding RNA expression and antisense transcription.
  • 13•. Wurtzel O., Sapra R., Chen F., Zhu Y., Simmons B.A., Sorek R. A single-base resolution map of an archaeal transcriptome. Genome Res. 2010;20:133–141. doi: 10.1101/gr.100396.109. This description of transcription in an archaeal organism makes an interesting comparison to those performed in eubacteria.
  • 14. Lister R., O’Malley R.C., Tonti-Filippini J., Gregory B.D., Berry C.C., Millar A.H., Ecker J.R. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–536. doi: 10.1016/j.cell.2008.03.029.
  • 15. Vivancos A.P., Guell M., Dohm J.C., Serrano L., Himmelbauer H. Strand-specific deep sequencing of the transcriptome. Genome Res. 2010;20:989–999. doi: 10.1101/gr.094318.109.
  • 16••. Guell M., van Noort V., Yus E., Chen W.H., Leigh-Bell J., Michalodimitrakis K., Yamada T., Arumugam M., Doerks T., Kuhner S. Transcriptome complexity in a genome-reduced bacterium. Science. 2009;326:1268–1271. doi: 10.1126/science.1176951. This study is of particular interest because of the analysis of antisense RNA and alternative transcript forms observed across the genome.
  • 17•. Filiatrault M.J., Stodghill P.V., Bronstein P.A., Moll S., Lindeberg M., Grills G., Schweitzer P., Wang W., Schroth G.P., Luo S. Transcriptome analysis of Pseudomonas syringae identifies new genes, noncoding RNAs, and antisense activity. J Bacteriol. 2010;192:2359–2372. doi: 10.1128/JB.01445-09. An interesting discussion of non-coding RNAs, antisense expression and transcriptional read-through of adjacent genes.
  • 18•. Sittka A., Lucchini S., Papenfort K., Sharma C.M., Rolle K., Binnewies T.T., Hinton J.C., Vogel J. Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genet. 2008;4:e1000163. doi: 10.1371/journal.pgen.1000163. A good example of the use of RNA-seq for a specific application, targeted at studying Hfq-mediated interactions between transcripts.
  • 19. Cloonan N., Forrest A.R., Kolle G., Gardiner B.B., Faulkner G.J., Brown M.K., Taylor D.F., Steptoe A.L., Wani S., Bethel G. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5:613–619. doi: 10.1038/nmeth.1223.
  • 20. He Y., Vogelstein B., Velculescu V.E., Papadopoulos N., Kinzler K.W. The antisense transcriptomes of human cells. Science. 2008;322:1855–1857. doi: 10.1126/science.1163853.
  • 21. Armour C.D., Castle J.C., Chen R., Babak T., Loerch P., Jackson S., Shah J.K., Dey J., Rohl C.A., Johnson J.M. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat Methods. 2009;6:647–649. doi: 10.1038/nmeth.1360.
  • 22. Parkhomchuk D., Borodina T., Amstislavskiy V., Banaru M., Hallen L., Krobitsch S., Lehrach H., Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009;37:e123. doi: 10.1093/nar/gkp596.
  • 23. Liu J.M., Livny J., Lawrence M.S., Kimball M.D., Waldor M.K., Camilli A. Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing. Nucleic Acids Res. 2009;37:e46. doi: 10.1093/nar/gkp080.
  • 24. Frias-Lopez J., Shi Y., Tyson G.W., Coleman M.L., Schuster S.C., Chisholm S.W., Delong E.F. Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci U S A. 2008;105:3805–3810. doi: 10.1073/pnas.0708897105.
  • 25. Gilbert J.A., Field D., Huang Y., Edwards R., Li W., Gilna P., Joint I. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One. 2008;3:e3042. doi: 10.1371/journal.pone.0003042.
  • 26. Urich T., Lanzen A., Qi J., Huson D.H., Schleper C., Schuster S.C. Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS One. 2008;3:e2527. doi: 10.1371/journal.pone.0002527.
  • 27. Bentley D.R., Balasubramanian S., Swerdlow H.P., Smith G.P., Milton J., Brown C.G., Hall K.P., Evers D.J., Barnes C.L., Bignell H.R. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–59. doi: 10.1038/nature07517.
  • 28. Ronaghi M., Uhlen M., Nyren P. A sequencing method based on real-time pyrophosphate. Science. 1998;281:363, 365. doi: 10.1126/science.281.5375.363.
  • 29. Shendure J., Porreca G.J., Reppas N.B., Lin X., McCutcheon J.P., Rosenbaum A.M., Wang M.D., Zhang K., Mitra R.D., Church G.M. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005;309:1728–1732. doi: 10.1126/science.1117389.
  • 30. Hernandez D., Francois P., Farinelli L., Osteras M., Schrenzel J. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res. 2008;18:802–809. doi: 10.1101/gr.072033.107.
  • 31. Simpson J.T., Wong K., Jackman S.D., Schein J.E., Jones S.J., Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117–1123. doi: 10.1101/gr.089532.108.
  • 32. Maccallum I., Przybylski D., Gnerre S., Burton J., Shlyakhter I., Gnirke A., Malek J., McKernan K., Ranade S., Shea T.P. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol. 2009;10:R103. doi: 10.1186/gb-2009-10-10-r103.
  • 33. Zerbino D.R., Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
  • 34. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  • 35. Delcher A.L., Phillippy A., Carlton J., Salzberg S.L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002;30:2478–2483. doi: 10.1093/nar/30.11.2478.
  • 36. Li H., Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform. 2010. doi: 10.1093/bib/bbq015.
  • 37. Ning Z., Cox A.J., Mullikin J.C. SSAHA: a fast search method for large DNA databases. Genome Res. 2001;11:1725–1729. doi: 10.1101/gr.194201.
  • 38. Li H., Ruan J., Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18:1851–1858. doi: 10.1101/gr.078212.108.
  • 39. Li H., Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698.
  • 40. Nicol J.W., Helt G.A., Blanchard S.G., Jr, Raja A., Loraine A.E. The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics. 2009;25:2730–2731. doi: 10.1093/bioinformatics/btp472.
  • 41. Carver T., Berriman M., Tivey A., Patel C., Bohme U., Barrell B.G., Parkhill J., Rajandream M.A. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008;24:2672–2676. doi: 10.1093/bioinformatics/btn529.
  • 42. Carver T., Bohme U., Otto T.D., Parkhill J., Berriman M. BamView: viewing mapped read alignment data in the context of the reference sequence. Bioinformatics. 2010;26:676–677. doi: 10.1093/bioinformatics/btq010.
  • 43. Wang L., Feng Z., Wang X., Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010;26:136–138. doi: 10.1093/bioinformatics/btp612.
  • 44. Jiang H., Wong W.H. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009;25:1026–1032. doi: 10.1093/bioinformatics/btp113.
  • 45. Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
  • 46. Anders S., Huber W. Differential expression analysis for sequence count data. Nat Precedings. 2010. doi: 10.1186/gb-2010-11-10-r106.
  • 47. Robinson M.D., Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25.
  • 48. Shi Y., Tyson G.W., DeLong E.F. Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column. Nature. 2009;459:266–269. doi: 10.1038/nature08055.
  • 49. Brantl S. Regulatory mechanisms employed by cis-encoded antisense RNAs. Curr Opin Microbiol. 2007;10:102–109. doi: 10.1016/j.mib.2007.03.012.
  • 50. Marioni J.C., Mason C.E., Mane S.M., Stephens M., Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517. doi: 10.1101/gr.079558.108.
  • 51. Hansen K.D., Brenner S.E., Dudoit S. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 2010;38(12):e131. doi: 10.1093/nar/gkq224.
  • 52. Ozsolak F., Platt A.R., Jones D.R., Reifenberger J.G., Sass L.E., McInerney P., Thompson J.F., Bowers J., Jarosz M., Milos P.M. Direct RNA sequencing. Nature. 2009;461:814–818. doi: 10.1038/nature08390.
  • 53. Mamanova L., Andrews R.M., James K.D., Sheridan E.M., Ellis P.D., Langford C.F., Ost T.W., Collins J.E., Turner D.J. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat Methods. 2010;7:130–132. doi: 10.1038/nmeth.1417.
  • 54. Tang F., Barbacioru C., Wang Y., Nordman E., Li C., Xu N., Wang X., Bodeau J., Tuch B.B., Siddiqui A., Lao K., Surani M.A. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377–382. doi: 10.1038/nmeth.1315.