Plant Physiology. 1996 Nov;112(3):1177–1183. doi: 10.1104/pp.112.3.1177

The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification.

S D Rounsley 1, A Glodek 1, G Sutton 1, M D Adams 1, C R Somerville 1, J C Venter 1, A R Kerlavage 1
PMCID: PMC158044  PMID: 8938416

Abstract

The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
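The grouping step described in the abstract can be illustrated with a toy sketch. The abstract does not describe the assembly software or its parameters, so everything below is an assumption made for illustration: the function names (`overlap_len`, `sequences_overlap`, `group_ests`), the `min_len` threshold, and the use of exact-match suffix/prefix overlaps. Real ESTs contain sequencing errors and would need an alignment-based comparison that tolerates mismatches.

```python
from itertools import combinations


def overlap_len(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap of at least `min_len` bases exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def sequences_overlap(a, b, min_len):
    """True if one sequence contains the other, or if they share an
    exact end-to-start overlap of at least `min_len` bases."""
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= min_len and shorter in longer:
        return True
    return overlap_len(a, b, min_len) > 0 or overlap_len(b, a, min_len) > 0


def group_ests(ests, min_len=6):
    """Group EST names into sets whose sequences are linked by overlaps.

    `ests` maps a (hypothetical) EST name to its sequence. Groups are
    built with a simple union-find over all overlapping pairs.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(ests, 2):
        if sequences_overlap(ests[x], ests[y], min_len):
            parent[find(x)] = find(y)

    groups = {}
    for name in ests:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Made-up sequences: est1 and est2 share the 7-base overlap "CGTTAGC";
# est3 is unrelated to both.
example = {
    "est1": "ATGGCGTACGTTAGC",
    "est2": "CGTTAGCCTAGGATC",
    "est3": "TTTTCCCCGGGGAAAA",
}
groups = group_ests(example, min_len=6)
```

In this sketch the number of groups plays the role of the nonredundant sequence count: three input ESTs collapse to two groups because `est1` and `est2` are treated as fragments of the same transcript. The all-pairs comparison is quadratic in the number of sequences, which is acceptable for a demonstration but would need indexing to scale to tens of thousands of ESTs.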

Articles from Plant Physiology are provided here courtesy of Oxford University Press