Nucleic Acids Research. 2000 Jan 1;28(1):193–196. doi: 10.1093/nar/28.1.193

UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs

Graziano Pesole 1,2,a, Sabino Liuni 2,3, Giorgio Grillo 2, Flavio Licciulli 2, Alessandra Larizza 4, Wojciech Makalowski 5, Cecilia Saccone 2,3,4
PMCID: PMC102415  PMID: 10592223

Abstract

The 5′ and 3′ untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression by controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized, non-redundant database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of nucleotide sequence patterns already shown experimentally to have a functional role. All these patterns have been collected in the UTRsite database, so that any input sequence can be searched for annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements. All internet resources implemented for retrieval and functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs are accessible at http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/

INTRODUCTION

Understanding the basic mechanisms of cell growth, differentiation and response to environmental stimuli, i.e., the program controlling the temporal and spatial order of molecular events, is becoming a real challenge in Molecular Biology. Indeed, although most of the regulatory elements are thought to be embedded in the non-coding part of the genomes, nucleotide databases are biased by the presence of expressed sequences mostly corresponding to the protein coding portion of the genes. Among non-coding regions, the 5′ and 3′ untranslated regions (5′-UTR and 3′-UTR) of eukaryotic mRNAs have often been experimentally demonstrated to contain sequence elements crucial for many aspects of gene regulation and expression (1–7).

The main functional roles so far demonstrated for 5′- and 3′-UTR sequences are: (i) control of mRNA cellular and subcellular localization (4,7,8); (ii) control of mRNA stability (1,9); and (iii) control of mRNA translation efficiency (10,11).

Several regulatory signals have already been identified in 5′- and 3′-UTR sequences, usually corresponding to short oligonucleotide tracts, also able to fold in specific secondary structures, which are protein binding sites for various regulatory proteins.

The analysis of large collections of functionally equivalent sequences (12,13), such as 5′- and 3′-UTR sequences, could indeed be very useful for defining their structural and compositional features as well as for searching for putative function-associated sequence patterns (14–16). For this reason we constructed UTRdb, a specialized, non-redundant collection of 5′- and 3′-UTR sequences from eukaryotic mRNAs.

UTRdb entries have been enriched with specialized information not present in the primary databases, including the presence of sequence patterns demonstrated by experimental evidence to play some functional role. Additionally, because ~10% of mammalian mRNAs contain repetitive elements in their UTRs (17) which are not usually annotated in the original records, we decided to include this information in our database.

We also created UTRsite, a collection of functional sequence patterns located in the 5′- or 3′-UTR sequences which could prove very useful for automatic annotation of anonymous sequences generated by sequencing projects, as well as for finding previously undetected signals in known gene sequences.

ASSEMBLING UTRdb COLLECTIONS

The specialized database of UTR sequences was generated by UTRdb_gen, a computer program we devised for this task. Eight sequence collections were generated for both 5′- and 3′-UTR sequences, one for each of the eukaryotic divisions of the EMBL/GenBank nucleotide database, namely: (i) Human; (ii) Rodent; (iii) Other mammal; (iv) Other vertebrate; (v) Invertebrate; (vi) Plant; (vii) Fungi; and (viii) Patent.

By accurately parsing the Feature Table of the relevant EMBL entries, UTRdb_gen automatically generates the various UTRdb collections. Although the feature keys ‘5′UTR’ and ‘3′UTR’ are valid features for EMBL/GenBank entries, only a small percentage of entries are adequately annotated. Indeed, of the 120 767 primary entries from which UTRdb_gen was able to extract 5′- or 3′-UTR sequences, only 15.8% contained the 5′UTR or 3′UTR feature key in the corresponding EMBL entry. UTRdb_gen can define UTR regions even when these keys are not reported in the primary entry, by applying predefined syntactic parsing rules to other relevant feature keys, such as mRNA, CDS, exon and intron.
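
As an illustration of the kind of Feature Table parsing described above, the following minimal Python sketch collects feature keys and their coordinates from EMBL-style `FT` lines. It is not the UTRdb_gen code: the function name `parse_simple_features` and the example lines are hypothetical, and only plain `start..end` locations are handled (no `join`, `complement` or partial `<`/`>` locations).

```python
import re

# Matches a feature line such as "FT   CDS             151..1050".
# Qualifier lines ("FT                   /gene=...") have a blank key column
# and therefore do not match.
SIMPLE_FEATURE = re.compile(r"^FT {3}(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_simple_features(lines):
    """Return {feature_key: [(start, end), ...]} with 1-based inclusive coordinates."""
    features = {}
    for line in lines:
        match = SIMPLE_FEATURE.match(line)
        if match:
            key = match.group(1)
            span = (int(match.group(2)), int(match.group(3)))
            features.setdefault(key, []).append(span)
    return features


# Hypothetical entry fragment, for illustration only.
example_lines = [
    "FT   mRNA            1..1200",
    "FT   CDS             151..1050",
    'FT                   /gene="example"',
]
features = parse_simple_features(example_lines)
print(features)
```

With the mRNA and CDS spans in hand, UTR boundaries can be inferred even when no explicit 5′UTR or 3′UTR key is present.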

UTRdb_gen automatically annotates generated UTR entries by adding some specialized information such as completeness (or not) of the UTR region, number of spanned exons and cross-referencing to the primary database entry. A cross reference between 5′- and 3′-UTR sequences from the same mRNA has also been established.

Redundancy was removed from the UTR entries with the CLEANUP program (18), which quickly and automatically generates cleaned collections by removing entries whose similarity to, and overlap with, longer entries in the database exceed user-defined thresholds. The cut-off parameters we used for CLEANUP were 95% for similarity and 90% for overlap.
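
The thresholding logic can be sketched as follows. This is a simplified stand-in, not the CLEANUP algorithm: it assumes an ungapped sliding comparison, defines overlap as the fraction of the shorter sequence that is aligned and similarity as the identity within that aligned region, and uses hypothetical function names and toy sequences.

```python
def is_redundant(shorter, longer, min_similarity=0.95, min_overlap=0.90):
    """True if some ungapped placement of `shorter` on `longer` meets both cut-offs."""
    n = len(shorter)
    if n == 0:
        return True  # an empty entry carries no information
    for offset in range(-(n - 1), len(longer)):
        start = max(offset, 0)               # first aligned position on `longer`
        end = min(offset + n, len(longer))   # one past the last aligned position
        overlap_len = end - start
        if overlap_len / n < min_overlap:
            continue
        matches = sum(longer[i] == shorter[i - offset] for i in range(start, end))
        if matches / overlap_len >= min_similarity:
            return True
    return False


def clean_collection(entries, min_similarity=0.95, min_overlap=0.90):
    """Keep the longest entries first; drop any entry redundant with a kept one."""
    kept = {}
    by_length = sorted(entries.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in by_length:
        if not any(is_redundant(seq, k, min_similarity, min_overlap)
                   for k in kept.values()):
            kept[name] = seq
    return kept


# Toy collection: "b" is an exact fragment of "a", "c" is unrelated.
toy = {"a": "ACGTACGTACGTACGTACGT", "b": "CGTACGTACG", "c": "TTTTTTTTTT"}
cleaned = clean_collection(toy)
print(sorted(cleaned))
```

Processing entries from longest to shortest mirrors the statement that entries are removed when they are redundant with longer entries.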

The UTR entries have been further enriched by the program UTRnote (kindly provided by G. Grillo, Area di Ricerca di Bari del Consiglio Nazionale delle Ricerche), which adds information about the location of experimentally defined patterns collected in UTRsite and of repetitive elements present in the Repbase database (19). The UTRsite entries describe the various regulatory elements present in UTR regions whose functional role has been established on an experimental basis. Each UTRsite entry is constructed on the basis of information reported in the literature and revised by distinguished scientists working experimentally on the functional characterization of the relevant UTR regulatory element.

CONTENT OF UTRdb

Table 1 reports a summary description of UTRdb (release 12.0) which in total contains 120 767 entries and 37 353 172 nucleotides. On average, >29.3% of entries proved to be redundant and were removed from the database.

Table 1. Number of entries (N) and nucleotide length (L) of UTRdb collections (release 12.0) after redundancy cleaning.

                                         Redundancy
Collection              N           L     %N     %L
5′-UTR
Fungi                1136     195 215  23.91  13.04
Human                8785   1 887 755  38.61  28.15
Invertebrate         5376   1 033 413  27.63  15.52
Other_mammal         2429     339 321  36.06  27.62
Other_vertebrate     3564     519 656  25.63  18.19
Plant                8499     924 695  24.91  13.98
Rodent               8496   1 629 025  34.98  24.92
Patent                213      55 918  29.00  41.86
TOTAL              38 498   6 584 998
3′-UTR
Fungi                1415     338 564  13.61   9.47
Human              10 207   8 367 057  36.91  30.95
Invertebrate         6677   2 607 959  19.89  17.06
Other_mammal         3202   1 457 422  29.14  24.27
Other_vertebrate     4419   2 195 694  21.22  14.36
Plant              11 548   2 777 812  15.16  14.15
Rodent               9181   5 737 426  34.66  27.41
Patent                232      91 287  27.04  43.03
TOTAL              46 881  23 573 221

UTRdb 12.0 was generated from EMBL release 59. Relevant redundancy percentages calculated with respect to the number of entries (%N) and to the nucleotide length (%L) are also indicated.
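
As a consistency check on Table 1, the short Python sketch below sums the per-collection figures and compares them with the reported totals. The final step rests on an assumption that is not stated explicitly in the text: that the 120 767 entries quoted for release 12.0 are counted before redundancy cleaning. Under that assumption the fraction of entries removed comes out at about 29.3%.

```python
# (N entries, L nucleotides) per collection, copied from Table 1.
five_utr = {
    "Fungi": (1136, 195_215), "Human": (8785, 1_887_755),
    "Invertebrate": (5376, 1_033_413), "Other_mammal": (2429, 339_321),
    "Other_vertebrate": (3564, 519_656), "Plant": (8499, 924_695),
    "Rodent": (8496, 1_629_025), "Patent": (213, 55_918),
}
three_utr = {
    "Fungi": (1415, 338_564), "Human": (10_207, 8_367_057),
    "Invertebrate": (6677, 2_607_959), "Other_mammal": (3202, 1_457_422),
    "Other_vertebrate": (4419, 2_195_694), "Plant": (11_548, 2_777_812),
    "Rodent": (9181, 5_737_426), "Patent": (232, 91_287),
}


def totals(collection):
    """Sum entry counts and nucleotide lengths over all divisions."""
    n_total = sum(n for n, _ in collection.values())
    l_total = sum(length for _, length in collection.values())
    return n_total, l_total


n5, l5 = totals(five_utr)
n3, l3 = totals(three_utr)
kept_entries = n5 + n3

# Assumption: 120 767 is the entry count before cleaning.
entries_before_cleaning = 120_767
removed_percent = round(100 * (1 - kept_entries / entries_before_cleaning), 1)
print(n5, l5, n3, l3, kept_entries, removed_percent)
```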

5′-UTR sequences were defined as the mRNA region spanning from the cap site to the starting codon (excluded), whereas 3′-UTR sequences were defined as the mRNA region spanning from the stop codon (excluded) to the poly-A starting site.
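
These boundary definitions translate directly into coordinates. The following minimal Python sketch assumes a mature mRNA sequence that starts at the cap site and ends at the poly-A start site, and a CDS given as 1-based inclusive coordinates that include the stop codon (as in EMBL CDS features). The function name `split_utrs` and the toy sequence are hypothetical.

```python
def split_utrs(mrna, cds_start, cds_end):
    """Return (five_prime_utr, three_prime_utr) for 1-based inclusive CDS coordinates.

    The start codon is excluded from the 5'-UTR and the stop codon
    (the last three bases of the CDS) is excluded from the 3'-UTR.
    """
    if not 1 <= cds_start <= cds_end <= len(mrna):
        raise ValueError("CDS coordinates fall outside the mRNA")
    five_prime_utr = mrna[:cds_start - 1]   # everything before the start codon
    three_prime_utr = mrna[cds_end:]        # everything after the stop codon
    return five_prime_utr, three_prime_utr


# Toy transcript: 6 nt 5'-UTR, a 9 nt CDS (ATG AAA TAA), 10 nt 3'-UTR.
toy_mrna = "GGCACC" + "ATGAAATAA" + "GCTTAATAAA"
utr5, utr3 = split_utrs(toy_mrna, cds_start=7, cds_end=15)
print(utr5, utr3)
```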

A sample UTRdb entry is shown in Figure 1. The UTRdb entries have been formatted according to the EMBL database format.

Figure 1. Sample entry of UTRdb. Specialized information not present in the primary EMBL/GenBank database is shown in bold case with active crosslinks with other databases underlined. The ‘UT’ line reports information about completeness or not of the relevant UTR entry (e.g. complete or partial) as well as the number of spanned exons in the case of genomic DNA sequences. The presence in this sequence entry of a ‘5′ribosomal mRNA TOP’ (32–34) (UTRsite entry: U0010) and of a microsatellite element has also been annotated.

Table 2 reports functional patterns and repetitive elements included in UTRsite (release 3.0). More entries will be included in further releases. A sample UTRsite entry is reported in Figure 2. Functional patterns, defined on the basis of the information reported in the literature and/or advice by the scientists expert in the field, were described by using the pattern description syntax used in the PATSCAN program (20).
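
To illustrate what a pattern combining primary and secondary structure information involves, the Python sketch below searches an RNA sequence for a loop motif flanked by a perfectly base-paired stem. It does not use PATSCAN syntax and is not a UTRsite pattern: the function names, the stem length and the loop motif `CAGUG[ACGU]` are illustrative assumptions, and bulges or mismatches in the stem are not modelled.

```python
import re

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def can_pair(x, y):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return WATSON_CRICK.get(x) == y or {x, y} == {"G", "U"}


def find_stem_loops(rna, stem_len, loop_regex):
    """Return (start, end) 0-based half-open spans of stem-loop-stem hits."""
    hits = []
    # The lookahead lets overlapping loop candidates be reported.
    for match in re.finditer(f"(?=({loop_regex}))", rna):
        loop_start, loop_end = match.start(1), match.end(1)
        if loop_start - stem_len < 0 or loop_end + stem_len > len(rna):
            continue
        left = rna[loop_start - stem_len:loop_start]
        right = rna[loop_end:loop_end + stem_len]
        # The first base of the left arm pairs with the last base of the right arm.
        if all(can_pair(x, y) for x, y in zip(left, reversed(right))):
            hits.append((loop_start - stem_len, loop_end + stem_len))
    return hits


# Toy sequence: stem GGGCU / AGCCC around the loop CAGUGA.
toy_rna = "AA" + "GGGCU" + "CAGUGA" + "AGCCC" + "UU"
stem_loops = find_stem_loops(toy_rna, stem_len=5, loop_regex="CAGUG[ACGU]")
print(stem_loops)
```

A dedicated pattern language is more expressive than this sketch, since it can also describe bulges, variable stem lengths and permitted mismatches.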

Table 2. Functional patterns included so far in UTRsite (v3.0).

Functional pattern                             Reference  Hits found in UTRdb 12.0
Iron-responsive element (IRE)                  23             65
Histone 3′UTR stem–loop structure              24             27
AU-rich class II destabilising element         25            175
TGE translational regulation element           26             45
Selenocysteine insertion sequence (SECIS)      27,28         189
APP 3′-UTR stability control element           29              7
Cytoplasmatic polyadenylation element (CPE)    30           4614
Nanos                                          31            397
ribosomal protein mRNA 5′ TOP                  32–34         298
TNF mRNA translation repression element        35             14
Vimentin 3′UTR mRNA element                    36             12
GLUT1 mRNA stabilising element                 37             48
15-LOX-DICE                                    38             83
Repetitive elements                                       44 806

For each pattern the number of hits with UTRdb entries is also reported.

Figure 2. Sample entry of UTRsite describing the ‘iron responsive element (IRE)’ (23). The IRE functional pattern which consists of both primary and secondary structure information is described in the ‘Pattern’ section according to the format adopted by the PATSCAN program (http://bio-www.ba.cnr.it:8000/BioWWW/patscanGCG.html).

AVAILABILITY OF UTRdb

UTRdb and UTRsite are publicly available by anonymous FTP (ftp://area.ba.cnr.it/pub/embnet/database/utr/). All internet resources we implemented for retrieval and functional analysis of 5′- and 3′-UTR sequences are accessible at http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/ (21). These include SRS retrieval (22) of UTRdb and UTRsite, also available at the EBI WWW server (http://srs.ebi.ac.uk:80/), UTRscan and UTRfasta. The UTRscan utility allows the enquirer to search user-submitted sequences for any of the patterns collected in UTRsite. The UTRfasta utility allows database searches against fully annotated UTRdb entries.

CONCLUSIONS AND PERSPECTIVES

The important role that untranslated regions of eukaryotic mRNAs may play in gene regulation and expression is now widely recognized. Indeed, experimental studies have demonstrated that sequence motifs located in the untranslated regions are involved in crucial biological functions.

The large number of functionally equivalent sequences stored in UTRdb now makes it possible to study their structural and compositional features and to apply statistical methods for the identification of significant signals. Prior cleaning of the databases is necessary, however, to avoid artefacts caused by redundant sequences. Although statistical significance does not necessarily imply biological significance, it may provide a useful indication for further experimental work, such as site-directed mutagenesis.

UTRdb will be updated with the new EMBL database releases and UTRsite will be continuously updated by adding new entries describing functional patterns whose biological role has been experimentally demonstrated.

ACKNOWLEDGEMENTS

For revision of UTRsite entries we would like to thank Jim Malter (APP 3′-UTR stability control element), Alain Krol (SECIS), Matthias Hentze (IRE and 15-LOX DICE), Bill Marzluff (histone stem–loop structure), Ann-Bin Shyu (ARE), Arturo Verrotti (CPE), Robin Wharton (nanos), Elizabeth Goodwin (TGE), Roger Kaspar (ribosomal protein mRNA TOP), Danuta Radzioch (TNF mRNA translation repression element), Ruben Boado (GLUT1 mRNA stabilising element) and Zendra E. Zehner (Vimentin 3′UTR mRNA element). This work was supported by EU grant ERB-BIO4-CT96-0030 and by Programma Biotecnologie legge 95/95 (MURST 5%).

REFERENCES


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
