Abstract
The noncoding RNAs database is a collection of currently available sequence data on RNAs that have no protein-coding capacity and have been implicated in the regulation of cellular processes. The RNAs included in the database form a very heterogeneous group of molecules that act on different levels of information transmission in the cell. The database includes RNAs acting on the level of chromatin structure, transcriptional and translational regulation of gene expression, modulation of protein function and regulation of subcellular distribution of RNAs and proteins. RNAs with potential regulatory functions have been identified in prokaryotic, animal and plant cells. The database can be accessed at http://biobases.ibch.poznan.pl/ncRNA/.
INTRODUCTION
In recent years it has become increasingly clear that non-protein-coding RNAs (ncRNAs) constitute a large portion of the transcriptional output of genomes. There is a constantly growing number of novel RNAs that do not encode proteins and do not perform housekeeping functions in the cell (unlike tRNAs, rRNAs, snRNAs, snoRNAs, RNase P RNA and tmRNA). These RNA species often play regulatory roles and are sometimes called ‘riboregulators’ (1). In contrast to housekeeping RNAs, which are usually constitutively expressed, regulatory noncoding RNAs are produced only at certain stages of an organism's development or cell differentiation, or in response to external stimuli. Generally, the noncoding regulatory RNAs can be divided into transcriptional and post-transcriptional regulators of gene expression, modulators of protein activity and determinants of RNA and protein distribution within the cell (2,3). NcRNAs are implicated in a number of cellular processes, but in many cases it is difficult to determine the precise mechanism of their action.
Riboregulators are involved in the regulation of gene expression at the very basic level of chromatin structure. In mammals and Drosophila, the processes of X-chromosome inactivation and dosage compensation, which equalize transcriptional levels from X chromosomes, involve regulatory RNAs (Xist, Tsix, roX) (4–6). This is achieved either by shutting off transcription from one of the X chromosomes through silencing factors recruited by Xist RNA in mammalian XX cells, or by doubling the transcriptional output from the single X chromosome in Drosophila XY cells.
Another group of noncoding RNAs comprises transcripts from imprinted loci, whose transcription depends on parental origin. Although the role of these RNAs is largely unknown, their disruption is often associated with severe genetic disorders (7,8). Several examples suggest that they may act as determinants of the imprinting status of large portions of chromosomes (9).
Regulatory RNAs are also involved in transcriptional regulation as modulators of protein functions. In bacteria, 6S RNA, whose function remained a mystery for many years, was found to form stable complexes with the σ70 holoenzyme of RNA polymerase. This interaction is responsible for the modulation of enzyme activity and different promoter usage in the stationary phase of growth (10). In mammals, SRA RNA was identified as a coactivator of several steroid receptors, including receptors for androgens, estrogens, glucocorticoids and progestins (11). Another mammalian noncoding transcript, 7SK RNA, was found to regulate RNA polymerase II activity. 7SK RNA forms a specific complex with the positive transcription elongation factor (P-TEFb) and inhibits its kinase and transcriptional activity (12,13).
The most obvious way in which an RNA molecule can influence the expression of genetic information is through an antisense mechanism, that is, base-pairing with a complementary fragment of another RNA chain. First identified in Caenorhabditis elegans, a new class of small (∼21 nt long) RNAs (microRNAs) can regulate the expression of protein genes on the post-transcriptional level by hybridization with complementary fragments in mRNA (14). Subsequent studies showed that these tiny RNAs, processed from longer precursors forming stem–loop structures, are widely represented in animals (15,16) and in plants (17). This suggests that mechanisms of gene expression involving small antisense RNAs originated very early in the evolution of eukaryotes. Antisense noncoding RNAs can also act as regulators of protein functions. In bacteria, a number of small noncoding RNAs are involved in stimulation or repression of mRNA translation via the antisense mechanism (18). Natural antisense RNA-mediated regulation of translation was also demonstrated in vitro for the human HFE gene involved in iron metabolism (19).
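The antisense principle described above can be illustrated with a minimal Python sketch. It assumes perfect Watson-Crick complementarity (A-U and G-C pairs only), which is a deliberate simplification: natural microRNA targets are often only partially complementary. The function names and the example sequences are hypothetical and are not taken from the database.

```python
from typing import List

# Watson-Crick complement table for RNA (G-U wobble pairs are ignored here).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_antisense_sites(small_rna: str, target: str) -> List[int]:
    """Return 0-based start positions in `target` that could base-pair
    perfectly with `small_rna` (overlapping sites are reported too)."""
    site = reverse_complement(small_rna)
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Made-up example: an 8 nt RNA and a short target carrying one matching site.
example_small_rna = "UGAGGUAG"
example_target = "AAACUACCUCAGGG"
print(find_antisense_sites(example_small_rna, example_target))
```

A realistic target search would additionally tolerate mismatches and bulges, so this sketch only shows the core idea of locating a complementary fragment in another RNA chain.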
In several cases, noncoding RNAs have been shown to be involved in developmental events or to be expressed as tissue-specific transcripts. MHM RNA in birds and TTY2 RNA in mammals have been implicated in the mechanism of sex determination (20,21), and the number of cell type-specific noncoding transcripts is growing every year (1,2).
The above examples do not exhaust the full repertoire of possible roles that may be played by RNA molecules. They show, however, that for decades the functions of RNA in the cell were grossly underestimated.
It is difficult to assess the number of noncoding RNAs yet to be discovered. Unlike protein-coding genes, which can be identified by the presence of open reading frames, splicing and polyadenylation sites, RNA-coding genes are much more difficult to find within genomic sequences. Several attempts have been made to define characteristic features of RNA-encoding genes that could be useful in screening genomic sequences (22,23). Comparative genomic analyses were employed to search for novel ncRNA genes in bacteria. A combination of computational and experimental methods led to the identification of new noncoding RNAs in Escherichia coli (24–26), Arabidopsis thaliana (27) and two archaeal species (28).
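To illustrate why open reading frames make protein-coding genes comparatively easy to recognize, the following Python sketch scans the three forward reading frames of a DNA sequence for the longest stretch running from an ATG codon to an in-frame stop codon. It is a simplified illustration under stated assumptions (forward strand only, no introns, standard stop codons), not one of the methods used in the cited studies; the function name and the example sequence are hypothetical. A noncoding RNA gene would typically yield no long ORF in such a scan, which is why other features have to be used to detect it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Return the longest ATG...stop open reading frame found in the three
    forward frames (stop codon included), or '' if there is none."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None  # position of the first ATG seen in the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


# Made-up example sequence containing a single short ORF.
print(longest_orf("CCATGAAATTTGGGTAACC"))
```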
The advances in the field of noncoding RNAs clearly show that we are still far from understanding all the possible ways in which they can influence a variety of molecular processes in the cell. The chemical properties of RNA make it an ideal, specific intracellular signalling molecule that can be quickly produced in response to an internal or external stimulus and then rapidly destroyed when it is no longer needed.
CONTENTS OF THE DATABASE
The purpose of the database is to provide information on noncoding RNAs with documented or possible regulatory functions. In addition to nucleotide sequences, the database contains short descriptions of the activities of particular ncRNAs, original GenBank accession numbers and literature references. The sequences can be retrieved as FASTA format files. In instances where more than one variant of a given RNA has been identified and described in a single GenBank entry (e.g. as a result of alternative splicing), each variant is included as a separate sequence in the database. Nucleotide sequences that constitute parts of longer GenBank records were extracted using information provided in the feature tables or based on multiple sequence alignments.
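As an illustration of how the retrieved data might be handled, the sketch below parses FASTA-formatted text and cuts a subsequence out of a longer record using 1-based inclusive coordinates, the convention used in GenBank feature tables. It is a minimal sketch and not the procedure actually used to build the database; the function names, headers and sequences are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    parts: Dict[str, List[str]] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def extract_feature(sequence: str, start: int, end: int) -> str:
    """Extract a feature given 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("feature coordinates fall outside the sequence")
    return sequence[start - 1:end]


# Made-up records for demonstration only.
example = [">rna1 example", "ACGU", "GGCC", "", ">rna2", "UUUU"]
records = parse_fasta(example)
print(extract_feature(records["rna1 example"], 3, 6))
```

For real files, an established parser such as the one in Biopython would be a more robust choice than this hand-written reader.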
At present, there are over 300 sequences in the database, more than 50% of which are microRNAs. The sequences in the database are divided based on two criteria: origin (eubacterial, archaeal and eukaryotic) and, in the case of eukaryotic RNAs, function or specific expression pattern. Because a large fraction of noncoding RNAs have no identified function yet, their classification is a difficult task. For clarity, the ncRNAs from Eukaryota, which constitute the major part of the database, are divided into nine groups.
Dosage compensation and X-chromosome inactivation RNAs
This group includes RNAs implicated in the regulation of transcription of the genes located on sex chromosomes in mammals (Xist, Tsix), Drosophila (roX) and birds (MHM).
Noncoding transcripts from imprinted genes
The noncoding RNAs in this group have been shown to originate from chromosomal regions that are subject to genetic imprinting. They include H19, Rian and IPW RNAs and antisense transcripts from other imprinted loci.
Stress-induced transcripts
This group includes RNAs produced in response to various stress conditions such as heat shock (G8 RNA from Tetrahymena thermophila), hypoxia (aHIF RNA), treatment with hydrogen peroxide (adapt15, adapt33) or DNA damage (gadd7). It also includes plant noncoding RNAs that have been shown to be hormone- and stress-induced transcripts (CR20, GUT15, Mt4).
Protein function modulators
These RNAs have been implicated in the regulation of activity of proteins involved in transcriptional control: steroid receptors (SRA1 RNA) and P-TEFb (7SK RNA).
Nervous system RNAs
The transcripts in this group are specifically expressed in nervous tissue in rodents (BC1, Ntab, Bsr) and primates (BC200).
Developmentally regulated and tissue-specific RNAs
This is a very heterogeneous group. The RNAs included here are involved in developmental processes or are specifically expressed or overproduced in specialized cell types.
RNAs involved in localization of RNAs and proteins
These RNAs were demonstrated to play a role in the regulation of the distribution of mRNAs and proteins within the cell. They include Xenopus Xlsirt sequences, ascidian yellow crescent RNA (ScYC RNA) and Drosophila hsr-omega RNAs.
snoRNA host genes
Noncoding RNA transcripts with introns processed to snoRNAs (gas5, U17HG, UHG, U50HG, U19H).
microRNAs
A collection of human, murine, Drosophila melanogaster, C. elegans and A. thaliana microRNAs.
The database is available online at http://biobases.ibch.poznan.pl/ncRNA/.
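The two classification criteria described in this section (origin and, for eukaryotic RNAs, one of nine groups) can be sketched as a small data model. This is only an illustrative Python sketch: the class names, field names and the validation rule are assumptions made for the example and do not describe the actual implementation of the database. The accession in the usage line is a placeholder, not a real GenBank identifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    EUBACTERIAL = "eubacterial"
    ARCHAEAL = "archaeal"
    EUKARYOTIC = "eukaryotic"


class EukaryoticGroup(Enum):
    DOSAGE_COMPENSATION = "Dosage compensation and X-chromosome inactivation RNAs"
    IMPRINTED = "Noncoding transcripts from imprinted genes"
    STRESS_INDUCED = "Stress-induced transcripts"
    PROTEIN_MODULATORS = "Protein function modulators"
    NERVOUS_SYSTEM = "Nervous system RNAs"
    DEVELOPMENTAL = "Developmentally regulated and tissue-specific RNAs"
    LOCALIZATION = "RNAs involved in localization of RNAs and proteins"
    SNORNA_HOST = "snoRNA host genes"
    MICRORNA = "microRNAs"


@dataclass(frozen=True)
class NcRnaEntry:
    """Hypothetical record combining the fields mentioned in the text."""
    name: str
    origin: Origin
    group: Optional[EukaryoticGroup]  # only used for eukaryotic RNAs
    accession: str                    # original GenBank accession number
    description: str = ""

    def __post_init__(self) -> None:
        # Assumption: functional groups apply to eukaryotic entries only.
        is_eukaryotic = self.origin is Origin.EUKARYOTIC
        if is_eukaryotic != (self.group is not None):
            raise ValueError("a group is required for, and limited to, eukaryotic RNAs")


# Usage with a placeholder accession number.
entry = NcRnaEntry("Xist", Origin.EUKARYOTIC,
                   EukaryoticGroup.DOSAGE_COMPENSATION, "PLACEHOLDER")
print(entry.group.value)
```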
ACKNOWLEDGEMENTS
This work was supported by grants from the Polish State Committee for Scientific Research to J.B., and from the Fonds der Chemischen Industrie e.V., the Bundesministerium fur Wissenschaft, Forschung und Technologie and the National Foundation for Cancer Research to V.A.E.
REFERENCES
- 1. Erdmann,V.A., Barciszewska,M.Z., Szymanski,M., Hochberg,A., de Groot,N. and Barciszewski,J. (2001) The non-coding RNAs as riboregulators. Nucleic Acids Res., 29, 189–193.
- 2. Szymanski,M. and Barciszewski,J. (2002) Beyond the proteome, non-coding regulatory RNAs. Genome Biol., 3, reviews0005.1–0005.8.
- 3. Storz,G. (2002) An expanding universe of noncoding RNAs. Science, 296, 1260–1263.
- 4. Avner,P. and Heard,E. (2001) X-chromosome inactivation, counting, choice and initiation. Nature Rev. Genet., 2, 59–67.
- 5. Lee,J.T., Davidow,L.S. and Warshawsky,D. (1999) Tsix, a gene antisense to Xist at the X-inactivation center. Nature Genet., 21, 400–404.
- 6. Meller,V.H., Gordadze,P.R., Park,Y., Chu,X., Stuckenholz,C., Kelley,R.L. and Kuroda,M.I. (2000) Ordered assembly of roX RNAs into MSL complexes on the dosage-compensated X chromosome in Drosophila. Curr. Biol., 10, 136–143.
- 7. Jiang,Y., Tsai,T.-F., Bressler,J. and Beaudet,A.L. (1998) Imprinting in Angelman and Prader-Willi syndromes. Curr. Opin. Genet. Dev., 8, 334–342.
- 8. Tanaka,K., Shiota,G., Meguro,M., Mitsuya,K., Oshimura,M. and Kawasaki,H. (2001) Loss of imprinting of long QT intronic transcript 1 in colorectal cancer. Oncology, 60, 268–273.
- 9. Horike,S., Mitsuya,K., Meguro,M., Kotobuki,N., Kashiwagi,A., Notsu,T., Schulz,T.C., Shirayoshi,Y. and Oshimura,M. (2000) Targeted disruption of the human LIT1 locus defines a putative imprinting control element playing an essential role in Beckwith-Wiedemann syndrome. Hum. Mol. Genet., 9, 2075–2083.
- 10. Wassarman,K.M. and Storz,G. (2000) 6S RNA regulates E. coli RNA polymerase activity. Cell, 101, 613–623.
- 11. Lanz,R.B., McKenna,N.J., Onate,S.A., Albrecht,U., Wong,J., Tsai,S.Y., Tsai,M.J. and O'Malley,B.W. (1999) A steroid receptor coactivator, SRA, functions as an RNA and is present in an SRC-1 complex. Cell, 97, 17–27.
- 12. Nguyen,V.T., Kiss,T., Michels,A.A. and Bensaude,O. (2001) 7SK small nuclear RNA binds to and inhibits the activity of CDK9/cyclin T complexes. Nature, 414, 322–325.
- 13. Yang,Z., Zhu,Q., Luo,K. and Zhou,Q. (2001) The 7SK small nuclear RNA inhibits the CDK9/cyclin T1 kinase to control transcription. Nature, 414, 317–322.
- 14. Lee,R.C., Feinbaum,R.L. and Ambros,V. (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75, 843–854.
- 15. Lee,R.C. and Ambros,V. (2001) An extensive class of small RNAs in C. elegans. Science, 294, 862–864.
- 16. Lagos-Quintana,M., Rauhut,R., Lendeckel,W. and Tuschl,T. (2001) Identification of novel genes coding for small expressed RNAs. Science, 294, 853–858.
- 17. Reinhart,B.J., Weinstein,E.G., Rhoades,M.W., Bartel,B. and Bartel,D.P. (2002) MicroRNAs in plants. Genes Dev., 16, 1616–1626.
- 18. Wassarman,K.M., Zhang,A. and Storz,G. (1999) Small RNAs in E. coli. Trends Microbiol., 7, 37–45.
- 19. Thenie,A.C., Gicquel,I.M., Hardy,S., Ferran,H., Fergelot,P., Le Gall,J.-Y. and Mosser,J. (2001) Identification of an endogenous RNA transcribed from the antisense strand of the HFE gene. Hum. Mol. Genet., 10, 1859–1866.
- 20. Teranishi,M., Shimada,Y., Hori,T., Nakabayashi,O., Kikuchi,T., Macleod,T., Pym,R., Sheldon,B., Solovei,I., Macgregor,H. and Mizuno,S. (2001) Transcripts of the MHM region on the chicken Z chromosome accumulate as non-coding RNA in the nucleus of female cells adjacent to the DMRT1 locus. Chromosome Res., 9, 147–165.
- 21. Makrinou,E., Fox,M., Lovett,M., Haworth,K., Cameron,J.M., Taylor,K. and Edwards,Y.H. (2001) TTY2, a multicopy Y-linked gene family. Genome Res., 11, 935–945.
- 22. Rivas,E. and Eddy,S.R. (2001) Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics, 16, 583–605.
- 23. Carter,R.J., Dubchak,I. and Holbrook,S.R. (2001) A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res., 29, 3928–3938.
- 24. Rivas,E., Klein,R.J., Jones,T.A. and Eddy,S.R. (2001) Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr. Biol., 11, 1369–1373.
- 25. Wassarman,K.M., Repoila,F., Rosenow,C., Storz,G. and Gottesman,S. (2001) Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev., 15, 1637–1651.
- 26. Argaman,L., Hershberg,R., Vogel,J., Bejerano,G., Wagner,G.E.H., Margalit,H. and Altuvia,S. (2001) Novel small RNA-encoding genes in the intergenic regions of E. coli. Curr. Biol., 11, 941–950.
- 27. MacIntosh,G.C., Wilkerson,C. and Green,P.J. (2001) Identification and analysis of Arabidopsis expressed sequence tags characteristic of non-coding RNAs. Plant Physiol., 127, 765–776.
- 28. Klein,R.J., Misulovin,Z. and Eddy,S.R. (2002) Noncoding RNA genes identified in AT-rich hyperthermophiles. Proc. Natl Acad. Sci. USA, 99, 7542–7547.