Abstract
With increasing quantities of Brassica genomic data entering the public domain, and in preparation for the complete Brassica genome sequencing effort, there is a growing need to structure and analyse Brassica genomic information in detail within a user-friendly database. At the Plant Biotechnology Centre, Melbourne, Australia, we have developed a series of tools and computational pipelines to process and structure genomic data and to aid its application to agricultural biotechnology research. These tools include a sequence database, ASTRA; a sequence processing pipeline that annotates sequences against GenBank, SwissProt and Arabidopsis Gene Ontology (GO) data; and tools for molecular marker discovery and comparative genome analysis. All sequences are mined for simple sequence repeat (SSR) molecular markers using ‘SSR primer’ and mapped onto the complete Arabidopsis thaliana genome by sequence comparison. The database may be queried by a text-based search of sequence annotation or GO terms, by BLAST comparison against resident sequences, or by the position of candidate orthologues within the Arabidopsis genome. Tools have also been developed and applied to the discovery of single nucleotide polymorphism (SNP) molecular markers and to the in silico mapping of Brassica BAC end sequences onto the Arabidopsis genome. Planned extensions to this resource include the integration of gene expression data and the development of an EnsEMBL-based genome viewer.
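To illustrate the kind of SSR mining mentioned above, the following is a minimal Python sketch that scans a DNA sequence for perfect tandem repeats. It is not the ‘SSR primer’ implementation: the function name `find_ssrs`, the motif length range (2 to 6 bases) and the minimum repeat count (4) are assumptions chosen for illustration, and primer design for the flanking regions is not covered.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: thresholds are illustrative defaults, not the
    settings used by any published tool.
    """
    seq = seq.upper()
    # Group 1 captures the shortest candidate motif; \1{n,} then requires
    # at least n further back-to-back copies of that same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAA..., which are not
        # di- to hexanucleotide repeats.
        if len(set(motif)) == 1:
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

if __name__ == "__main__":
    for start, motif, repeats in find_ssrs("GGCATATATATATCCG"):
        print(start, motif, repeats)
```

Positions are 0-based offsets into the input sequence. A production pipeline would typically also handle imperfect repeats and ambiguous bases, which this sketch ignores.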