FANTOM DB: database of Functional Annotation of RIKEN Mouse cDNA Clones

Hidemasa Bono; Takeya Kasukawa; Masaaki Furuno; Yoshihide Hayashizaki; Yasushi Okazaki

doi:10.1093/nar/30.1.116

Nucleic Acids Res. 2002 Jan 1;30(1):116–118. doi: 10.1093/nar/30.1.116

PMCID: PMC99137 PMID: 11752270

Abstract

FANTOM DB, the database of Functional Annotation of RIKEN Mouse cDNA Clones, is designed to store sequence information of RIKEN full-length enriched mouse cDNA clones, graphical views of sequence analysis results, curated functional annotation information and additional descriptions, including Gene Ontology terms. RIKEN’s Mouse Gene Encyclopedia Project aims to collect full-length enriched cDNA clones from various mouse tissues, determine the full-length nucleotide sequences, infer their chromosomal locations by computer and characterize gene expression patterns. FANTOM DB has been developed to facilitate this work and to support functional genomic studies such as positional candidate cloning, cDNA microarrays and protein interaction analyses. FANTOM DB contains 21 076 full-length cDNA sequences with rich functional annotations and is publicly available. FANTOM DB thus provides curated functional annotation of RIKEN full-length enriched mouse clones, and has links to other public resources. FANTOM DB can be accessed at http://fantom.gsc.riken.go.jp/db/.

INTRODUCTION

The RIKEN Mouse Gene Encyclopedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collecting and sequencing full-length complementary DNAs (cDNAs), physical mapping of the corresponding genes to the mouse genome, determining gene expression patterns in all mouse tissues and determining interactions among all transcripts. FANTOM DB has been developed to facilitate work on the project as well as functional genomic studies such as positional candidate cloning, cDNA microarrays and protein interaction analyses. FANTOM DB provides users with well-curated functional annotation of the RIKEN full-length cDNA clones together with a systematic search tool and BLAST search. The annotation process and the vocabulary used were determined at the international meeting of Functional Annotation of RIKEN full-length Mouse cDNAs (FANTOM meeting) (1). The data are now stored as a database, accessible worldwide. These data are available both in a web page generated on-the-fly and in concatenated text in MaXML (Mouse Annotation XML) format.

IMPLEMENTATION

Curators annotated the clones with the help of the FANTOM+ system, which allowed them to view pre-computed sequence similarity and motif search results, to launch additional searches and to transfer the annotation from any of these to the FANTOM DB. The FANTOM+ system has the following features:

1. A simple interface with HTML and JavaScript.

2. A graphical user interface.

3. A controlled system of annotation records.

4. A parser for XML output (MaXML).

The top page of the FANTOM DB (http://fantom.gsc.riken.go.jp/db/) has a menu (Search, MaXML and BLAST search) and a link to the FANTOM main page. The left side of Figure 1 shows a search result page containing three frames. The top frame shows a summary of the current object (RIKEN clone, database record, protein motif, etc.) (summary field). The middle frame shows the results of curated functional annotation of the current cDNA clone (annotation field). The curation was performed using the FANTOM+ system, which provides an annotation editor. The lower frame displays pre-computed sequence similarity and motif search results (reference field). These include the results of clustering, a FASTA search against SPTR-nrdb (SWISS-PROT and TrEMBL non-redundant database, ftp://ftp.ebi.ac.uk/pub/databases/st_tr_nrdb/), BLAST searches against SPTR-nrdb and TIGR-EST (http://www.tigr.org/tdb/tgi.shtml), a Wise2 search against Pfam and an InterPro search, among others. The result of the orthologue search using the HomoloGene database is also shown at the bottom. The alignment of each sequence can be browsed by clicking the ‘align’ link button (upper and lower right windows of Figure 1). MaXML provides all the functional annotation information described in the annotation field in XML format. The BLAST search allows users to search against all the sequences included in the FANTOM DB.

Figure 1. A search result page containing three frames is shown. The top frame is the summary field, with links to DNA and amino acid sequences. The middle frame is the annotation field, where functional annotation of a cDNA is described. The lower frame is the reference field, which shows sequence analysis results and source information. The right windows show alignments with SPTR-nrdb (upper) and Pfam (lower). The alignment windows appear when the ‘align’ buttons in the reference frame are selected. Links in the reference and annotation frames take users to MGD, InterPro, LocusLink, GenPept, EGAD_HT, TIGR-MGI (TIGR-Mouse Gene Index), DDBJ, GenBank, EMBL, NCBI-nr, PIR (Protein Information Resource), UniGene and UTRdb (Untranslated Region database, http://bighost.area.ba.cnr.it/BIG/UTRHome/). The reference field also has links to Mapping (http://www.gsc.riken.go.jp/e/FANTOM/map/) and Superfamily (http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY/).
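As an illustration of how annotation exported in an XML format such as MaXML might be read by a program, the following minimal sketch parses a small XML fragment with Python's standard library. The element and attribute names (maxml-clones, clone, annotation, qualifier) and the clone ID are assumptions invented for this example; they are not taken from the actual MaXML definition, which is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: element and attribute names are invented for
# illustration and do NOT reproduce the real MaXML definition.
SAMPLE = """<maxml-clones>
  <clone id="EXAMPLE-0001">
    <annotation qualifier="riken_def">hypothetical DNA-binding protein</annotation>
    <annotation qualifier="gene_ontology_F">GO:0003677</annotation>
  </clone>
</maxml-clones>"""


def annotations_by_clone(xml_text):
    """Map each clone ID to a dict of qualifier -> list of annotation values."""
    root = ET.fromstring(xml_text)
    result = {}
    for clone in root.iter("clone"):
        qualifiers = {}
        for ann in clone.findall("annotation"):
            value = (ann.text or "").strip()
            qualifiers.setdefault(ann.get("qualifier"), []).append(value)
        result[clone.get("id")] = qualifiers
    return result


parsed = annotations_by_clone(SAMPLE)
print(parsed["EXAMPLE-0001"]["riken_def"])
```

Keeping a list of values per qualifier allows a clone to carry several annotations of the same kind, for example more than one GO term.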

Sequence data

Full-length mouse cDNA sequences were produced as a Phase II activity of the RIKEN Mouse Gene Encyclopedia Project (2).

Functional annotation

In the FANTOM DB, each RIKEN clone is assigned a RIKEN definition (riken_def) to indicate the most likely gene function or status of the sequence on the basis of similarity to known genes. A supplementary RIKEN definition line (riken_def_suppl) is available for additional pertinent annotation. Biological function (Biological Process, Molecular Function and Cellular Component) is described by assigning Gene Ontology (GO) terms to the clones (3). The IDs of these GO terms are stored as attributes of the database under the qualifiers gene_ontology_P (Biological Process), gene_ontology_F (Molecular Function) and gene_ontology_C (Cellular Component). The database also contains curators’ statements in note and comment fields.
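The annotation fields described above can be pictured as a simple record type. The sketch below is only an illustration and does not reflect the actual FANTOM DB schema: the class name, the clone ID and the example annotation are invented, and the Cellular Component qualifier is assumed to be gene_ontology_C by analogy with gene_ontology_P and gene_ontology_F.

```python
from dataclasses import dataclass, field

# Qualifier names follow the text; "gene_ontology_C" for Cellular Component
# is an assumption made by analogy with the _P and _F qualifiers.
GO_QUALIFIERS = {
    "gene_ontology_P": "Biological Process",
    "gene_ontology_F": "Molecular Function",
    "gene_ontology_C": "Cellular Component",
}


@dataclass
class CloneAnnotation:
    """Hypothetical record holding the curated annotation of one clone."""
    clone_id: str
    riken_def: str
    riken_def_suppl: str = ""
    go_terms: dict = field(default_factory=dict)  # qualifier -> list of GO IDs
    note: str = ""
    comment: str = ""

    def add_go_term(self, qualifier, go_id):
        """Attach a GO ID under one of the three GO qualifiers."""
        if qualifier not in GO_QUALIFIERS:
            raise ValueError(f"unknown GO qualifier: {qualifier}")
        self.go_terms.setdefault(qualifier, []).append(go_id)

    def go_by_aspect(self):
        """Return GO IDs grouped by the human-readable GO category name."""
        return {GO_QUALIFIERS[q]: list(ids) for q, ids in self.go_terms.items()}


# Invented example record; the clone ID and annotation are not real data.
record = CloneAnnotation(clone_id="EXAMPLE-0001",
                         riken_def="hypothetical DNA-binding protein")
record.add_go_term("gene_ontology_F", "GO:0003677")
record.add_go_term("gene_ontology_C", "GO:0005634")
print(record.go_by_aspect())
```

Rejecting unknown qualifiers at insertion time is one way to mimic the controlled vocabulary that the curation process relies on.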

Search

Annotation data for mouse clones can be viewed via the Search menu in FANTOM+. Currently, users can search for clones by ID, keyword, Pfam domain and TIGR Cellular Role.

ID search. Each entry has many kinds of IDs, including Clone ID, Sequence ID and FANTOM ID. Users can search for entries by ID.

Keyword search. Users can search for entries by functional annotation keywords with target qualifiers. Both GO terms and IDs can be used as keywords.

Pfam domain search. Users can search for entries with results of estwisedb [a program in the Wise2 package (http://www.sanger.ac.uk/Software/Wise2/)] against the Pfam database (4). If they know the Pfam name they want to retrieve, they can enter it in the search form. If they do not know it, they can browse a list of all Pfam names matched to sequences in the FANTOM database.

Cellular Role search. Users can search for entries with the results of TIGR Cellular Role prediction by homology searches against the EGAD HT database (Expressed Gene Anatomy Database, http://www.tigr.org/tdb/egad/egad.shtml).
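The search modes described above can be sketched over a small in-memory list of records. This is a simplified illustration, not the FANTOM DB implementation: the record layout, the field names sequence_id, fantom_id and pfam, and all IDs and annotations are invented for the example, and keyword matching is assumed to be a case-insensitive substring test.

```python
# Invented example records; IDs and annotations are not real FANTOM data.
RECORDS = [
    {"clone_id": "EXAMPLE-0001", "sequence_id": "SEQ-0001", "fantom_id": "FID-0001",
     "riken_def": "hypothetical homeobox protein",
     "gene_ontology_F": ["GO:0003677"], "pfam": ["Homeobox"]},
    {"clone_id": "EXAMPLE-0002", "sequence_id": "SEQ-0002", "fantom_id": "FID-0002",
     "riken_def": "hypothetical zinc finger protein",
     "gene_ontology_C": ["GO:0005634"], "pfam": ["zf-C2H2"]},
]


def id_search(records, query):
    """Return records in which any ID field equals the query exactly."""
    id_fields = ("clone_id", "sequence_id", "fantom_id")
    return [r for r in records if any(r.get(f) == query for f in id_fields)]


def keyword_search(records, keyword, qualifiers):
    """Case-insensitive substring match restricted to the target qualifiers."""
    needle = keyword.lower()
    hits = []
    for r in records:
        for q in qualifiers:
            value = r.get(q, "")
            values = value if isinstance(value, list) else [value]
            if any(needle in v.lower() for v in values):
                hits.append(r)
                break  # report each record at most once
    return hits


def pfam_names(records):
    """List every Pfam name matched to at least one record, sorted."""
    return sorted({name for r in records for name in r.get("pfam", [])})


print([r["clone_id"] for r in keyword_search(RECORDS, "homeobox", ["riken_def"])])
print(pfam_names(RECORDS))
```

Passing the target qualifiers explicitly mirrors the idea that a keyword, whether a GO term or a GO ID, is matched only against the annotation fields the user selects.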

Data availability

All sequences and functional annotations are available from our web and FTP sites. For further information please write to fantom-help@gsc.riken.go.jp.

FUTURE DIRECTIONS

We are continuously updating functional annotations in the FANTOM DB with new results of sequence analyses. We are organizing a FANTOM2 meeting to annotate an additional sequence set. After this meeting, additional sequences and functional annotations will be released. We have been developing an automatic functional annotation system (FIND: the Functional Inference Descriptor) to determine RIKEN definitions. FIND will be used at the FANTOM2 meeting and the results will be compared to those obtained by human curators.

CITING FANTOM

The following citation format is suggested when referring to the RIKEN FANTOM set annotation: FANTOM, Genome Exploration Research Group, Genomic Sciences Center, Yokohama Institute, RIKEN, Yokohama, Kanagawa, Japan (http://fantom.gsc.riken.go.jp/) and: The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium (2001) Functional annotation of a full-length mouse cDNA collection. Nature, 409, 685–690.

ACKNOWLEDGEMENTS

We acknowledge all the members of the FANTOM consortium for the use of FANTOM annotation. We also acknowledge Dr M. Brownstein for comments and editing the manuscript. This study was supported by a Research Grant for the RIKEN Genome Exploration Research Project from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government and by ACT-JST (Research and Development for Applying Advanced Computational Science and Technology) of the Japan Science and Technology Corporation (JST).

REFERENCES

1. Quackenbush,J. (2000) Viva la revolution! A report from the FANTOM meeting. Nature Genet., 26, 255–256.
2. The RIKEN Genome Exploration Research Group Phase II Team and The FANTOM Consortium (2001) Functional annotation of a full-length mouse cDNA collection. Nature, 409, 685–690.
3. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet., 25, 25–29.
4. Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Howe,K.L. and Sonnhammer,E.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263–266. Updated article in this issue: Nucleic Acids Res. (2002), 30, 276–280.
