Nucleic Acids Research. 2002 Jan 1;30(1):84–86. doi: 10.1093/nar/30.1.84

Genomic database resources for Dictyostelium discoideum

Lisa Kreppel, Alan R Kimmel
PMCID: PMC99139  PMID: 11752261

Abstract

Dictyostelium is an attractive model system for studying the mechanisms that underlie basic cellular functions and complex multicellular developmental processes. Recent advances in Dictyostelium genomics have generated a wide spectrum of resources. However, much of the current genomic sequence information is not yet available through GenBank or related databases. As a result, many investigators are unaware that extensive Dictyostelium sequence data have been compiled, or of how to access them. Here, we discuss progress in Dictyostelium genomics and gene annotation, and highlight the primary portals for sequence access, manipulation and analysis (http://genome.imb-jena.de/dictyostelium/; http://dictygenome.bcm.tmc.edu/; http://www.sanger.ac.uk/Projects/D_discoideum/; http://www.csm.biol.tsukuba.ac.jp/cDNAproject.html).

INTRODUCTION

Dictyostelium discoideum belongs to a unique group of organisms that lies at the transition to multicellularity. It has proven a powerful system for studying the molecular mechanisms that underlie fundamental cellular processes, including cytokinesis, motility, phagocytosis, chemotaxis and signal transduction. In addition, many developmental pathways that regulate cell sorting, pattern formation, activated gene expression and cell fate choice are shared by Dictyostelium and the metazoa.

Dictyostelium has been designated by the National Institutes of Health (USA) as a non-mammalian model organism for the functional analysis of sequenced genes. Dictyostelium has a haploid genome arrayed on six chromosomes that range in size from ∼4 to ∼8 Mb. The chromosomal genome is only ∼34 Mb, approximately three times the size of the Saccharomyces cerevisiae genome but only ∼25–35% the size of the Caenorhabditis elegans or Arabidopsis thaliana genomes. Various estimates predict a coding capacity of approximately 10 000 genes, and many of the known genes show a high degree of sequence similarity to genes in vertebrate species. In addition, Dictyostelium has proven exceptionally amenable to targeted gene disruption, which occurs at very high frequency. An international consortium has been organized to determine the sequence of the Dictyostelium genome. These data will be used in combination with the ongoing Japanese cDNA project to fully annotate the genome.

Biological details, chromosomal maps and technical information, including downloads of the Franke literature database, the restriction enzyme-mediated integration (REMI; 1) gene disruption program and images of cytoskeletal dynamics, chemotaxis and morphogenesis, can be found at DictyBase (R. Chisholm, Northwestern University Medical School, IL; http://dictybase.org/) or DictyDB, an ACeDB database for Dictyostelium (D. Smith and W. Loomis, University of California, San Diego, CA; http://www-biology.ucsd.edu/others/dsmith/dictydb.html). Additional web sites can be found as Supplementary Material at NAR Online.

THE GENOME CONSORTIUM

The sequencing and analysis of the Dictyostelium genome were established as a collaborative effort among the Institute of Molecular Biotechnology at Jena (Germany), the University of Cologne (Germany), the Baylor College of Medicine (USA), the Sanger Centre (UK) and the Pasteur Institute (France). To facilitate assembly, sequencing was organized on a chromosome-by-chromosome shotgun basis. Individual chromosomes were separated by Edward Cox, Princeton University (USA), and the enriched chromosome-specific DNA preparations were randomly sheared to 1–4 kb fragments and cloned into plasmid-based libraries. The libraries were distributed among the sequencing centers: Jena/Cologne for chromosomes 1, 2 and 3, and Baylor/Sanger for chromosomes 4, 5 and 6. Initial efforts focused on the chromosomes most easily resolved: the largest, chromosome 2, and the smallest, chromosome 6. Because the chromosomes are all of similar size, none could be purified to homogeneity. Thus, the chromosome-‘specific’ reads were inevitably ‘contaminated’ with sequences from other chromosomes, but the continuous exchange of all primary data and clones among the centers has facilitated complete coverage.

Sequence assembly is anchored using data obtained from a combination of YAC and HAPPY mapping. An overlapping set of ordered YACs exists for each of the Dictyostelium chromosomes (2). Skimmed sequences derived from the ordered YACs define landing markers to identify and assemble linked sequences. However, YACs are prone to chimerism, a severe artifact that can yield false linkage information. HAPPY mapping (3) is a completely independent in vitro approach that is functionally analogous to classical genetic linkage mapping. The HAPPY maps are being used to identify chromosomal (STS-equivalent) markers at ∼10 kb spacing and to eliminate chimeric and incorrectly mapped YACs (4). Together, these techniques allow a new tiling set of YACs to be assembled for seeding and gap-filling.
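The use of HAPPY markers to weed out chimeric YACs can be illustrated with a small sketch. It assumes each YAC has been reduced to the list of STS-equivalent markers it contains, and that each marker has a HAPPY-map position (chromosome and offset in kb). The function name `flag_suspect_yacs`, the data layout and the 600 kb span limit are hypothetical choices for illustration, not the consortium's actual procedure. A YAC is flagged when its markers map to different chromosomes or lie farther apart than a single insert could plausibly span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerPosition:
    """Hypothetical HAPPY-map coordinate for one STS-equivalent marker."""
    chromosome: int
    position_kb: float


def flag_suspect_yacs(yac_markers, happy_map, max_span_kb=600.0):
    """Return names of YACs whose markers disagree with the HAPPY map.

    yac_markers: dict mapping YAC name -> list of marker names found in it.
    happy_map:   dict mapping marker name -> MarkerPosition.
    max_span_kb: hypothetical upper bound on a plausible YAC insert span.
    """
    suspects = []
    for yac, markers in yac_markers.items():
        # Only markers that were actually placed on the HAPPY map can be checked.
        positions = [happy_map[m] for m in markers if m in happy_map]
        if len(positions) < 2:
            continue  # too few mapped markers to judge this YAC
        # Markers from different chromosomes indicate a chimeric clone.
        if len({p.chromosome for p in positions}) > 1:
            suspects.append(yac)
            continue
        # Markers too far apart on one chromosome suggest chimerism or mis-mapping.
        coords = [p.position_kb for p in positions]
        if max(coords) - min(coords) > max_span_kb:
            suspects.append(yac)
    return suspects


happy = {
    "m1": MarkerPosition(6, 100.0),
    "m2": MarkerPosition(6, 110.0),
    "m3": MarkerPosition(2, 50.0),
    "m4": MarkerPosition(6, 1500.0),
}
yacs = {"Y1": ["m1", "m2"], "Y2": ["m1", "m3"], "Y3": ["m1", "m4"]}
print(flag_suspect_yacs(yacs, happy))
```

In this toy example, Y2 (markers on chromosomes 2 and 6) and Y3 (markers 1400 kb apart) are flagged, while Y1, whose markers lie 10 kb apart on chromosome 6, is retained.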

Chromosomes 1, 2 and 6 have been sequenced to an approximate overall depth of 6-fold. However, because the non-random distribution of A+T-rich stretches biases cloning and sequence representation (see below), protein-coding regions have been sequenced to 8-fold depth, whereas intergenic regions are represented at <4-fold. Sequence analysis of the dispersed, complex chromosomal repeat families is complete (5), as is that of the 55 kb circular mitochondrial (mtDNA) genome (6) and the 88 kb linear, extrachromosomal rDNA palindrome (A. Kuspa, see below). Assembly of chromosomes 1, 2 and 6 is nearing completion, and sequencing of chromosomes 3, 4 and 5 is proceeding.
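One way to see why these uneven depths matter is the classical Lander-Waterman (Poisson) approximation, under which the expected fraction of bases left unsampled at mean depth c is e^(-c). The sketch below applies that model to the depths quoted in the preceding paragraph. It is an idealized model that assumes reads fall uniformly at random, which is exactly the assumption the A+T bias violates, so the numbers are only a rough guide; the function name is hypothetical.

```python
import math

# Chromosomal genome size quoted in the text (~34 Mb), expressed in kb.
GENOME_KB = 34_000


def uncovered_fraction(depth):
    """Expected fraction of bases with zero reads at mean depth `depth`.

    Lander-Waterman / Poisson approximation: P(0 reads) = exp(-depth).
    Assumes reads are placed uniformly at random along the genome.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return math.exp(-depth)


# Depths quoted for chromosomes 1, 2 and 6. The intergenic value is an upper
# bound (the text gives <4-fold), so its gap estimate is a lower bound.
depths = {"overall": 6.0, "coding": 8.0, "intergenic": 4.0}

for region, depth in depths.items():
    frac = uncovered_fraction(depth)
    print(f"{region:>10}: {depth:.0f}-fold -> {frac:.2%} of bases expected unsampled "
          f"(~{frac * GENOME_KB:.0f} kb if extrapolated to the whole genome)")
```

Under this model, 6-fold depth leaves roughly 0.25% of bases unsampled and 4-fold leaves about 1.8%; because intergenic depth is below 4-fold, the true intergenic shortfall would be larger still.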

Approximately 66% of the entire chromosomal genome is represented in contigs of >2 kb, with the largest scaffolds approaching 500 kb. Annotation (see below) predicts an average spacing of one gene per 3 kb, consistent with an estimate of approximately 10 000 genes in the Dictyostelium genome and one of the highest gene densities of any eukaryote.
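The two summary figures in this paragraph are simple to compute. A minimal sketch, assuming contig lengths are available as a plain list of integers (the helper names and the toy contig lengths are hypothetical): the first function reports the share of the genome held in contigs longer than 2 kb, and the second converts an average gene spacing into an expected gene count.

```python
def fraction_in_long_contigs(contig_lengths_bp, genome_size_bp, min_length_bp=2_000):
    """Share of the genome contained in contigs strictly longer than min_length_bp."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    long_total = sum(length for length in contig_lengths_bp if length > min_length_bp)
    return long_total / genome_size_bp


def expected_gene_count(genome_size_bp, bp_per_gene):
    """Number of genes implied by an average spacing of one gene per bp_per_gene."""
    return genome_size_bp / bp_per_gene


GENOME_BP = 34_000_000  # ~34 Mb chromosomal genome, as stated in the text

# One gene per ~3 kb over ~34 Mb.
print(round(expected_gene_count(GENOME_BP, 3_000)))  # 11333

# Toy assembly (hypothetical lengths): only the 2.5 kb and 10 kb contigs count.
toy_contigs = [500, 2_500, 10_000, 1_500]
print(fraction_in_long_contigs(toy_contigs, genome_size_bp=20_000))  # 0.625
```

The result of roughly 11 000 genes is in line with the estimate of approximately 10 000, given that the 3 kb spacing is itself an average.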

GenBank is still not the primary resource for Dictyostelium genomic data. Only sequences that are unequivocal and compiled to an extremely high level of quality will be deposited. However, all sequence data can be accessed and searched through the various centers listed below, and every site offers access to genomic sequences from all of the centers. In addition, primary and contig data, together with GenBank, EST, mtDNA and rDNA sequences, can be searched through web-based BLAST (or BLAST-variant) interfaces for analyses and comparisons (7). Note, however, that because all primary data are released, some reads may be of poor quality or even of non-Dictyostelium origin; the contig sequences are filtered to remove these problematic data.
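The kind of filtering applied to raw reads before contig building can be sketched as a two-part test. The example is hypothetical: it assumes each read comes with per-base phred quality scores (8,9) and discards reads with low mean quality or with a G+C content well above the A+T-rich Dictyostelium background (22% overall and ∼40% in coding regions, as given under Gene Discovery and Annotation). The thresholds (mean Q20, 50% G+C) and function names are illustrative assumptions; the centers' actual filtering criteria are not described in this article.

```python
from statistics import mean


def gc_fraction(seq):
    """G+C fraction of a DNA sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def keep_read(seq, phred_scores, min_mean_quality=20.0, max_gc=0.50):
    """Hypothetical filter: keep reads that are high quality and A+T-rich enough.

    min_mean_quality: phred Q20 corresponds to an estimated 1% per-base error rate.
    max_gc: reads far above the Dictyostelium G+C background are treated as
            possible contaminants (an illustrative heuristic, not a rule).
    """
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lists must have the same length")
    if not seq:
        return False
    return mean(phred_scores) >= min_mean_quality and gc_fraction(seq) <= max_gc


reads = [
    ("ATATTAGCAT", [30] * 10),  # A+T-rich, good quality
    ("GCGGCCGCAT", [35] * 10),  # 80% G+C: suspiciously G+C-rich
    ("ATATATATAT", [8] * 10),   # A+T-rich but poor quality
]
print([keep_read(seq, qual) for seq, qual in reads])
```

A G+C ceiling is a blunt instrument: it would also discard genuine G+C-rich Dictyostelium segments, which is one reason such a filter would be combined with sequence-similarity checks in practice.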

Dictyostelium genomic resource sites

The University of Cologne (A.A. Noegel and L. Eichinger) and GSC Jena (G. Glockner, M. Platzer and A. Rosenthal); http://genome.imb-jena.de/dictyostelium/.

The Baylor College of Medicine (A. Kuspa and R. Gibbs); http://dictygenome.bcm.tmc.edu/.

The Sanger Centre (B. Barrell, M.-A. Rajandream and M. Quail); http://www.sanger.ac.uk/Projects/D_discoideum/.

A basic BLAST server for all Dictyostelium sequences is available through the San Diego Supercomputer Center (N. Iranfar and W.F. Loomis) at UCSD; http://dicty.sdsc.edu/. The data at this site are updated within days of the appearance of new data at the other sequencing centers.

THE TRANSCRIPTOME

The Dictyostelium cDNA Project of Japan is a collaborative effort among the Universities of Tsukuba (Y. Tanaka, H. Urushihara, T. Morio, M. Katoh and H. Kuwayama), Hokkaido (H. Ochiai and T. Saito) and Osaka (M. Maeda) to identify developmental patterns of gene expression in Dictyostelium by sequencing stage-specific sets of cDNAs. Initially, the data were derived from slug-stage and, to a lesser extent, growth-stage cDNAs. BLAST and PHRAP-based (8,9) analyses were used in conjunction with genomic sequence alignments to establish a non-redundant gene set. Together, these two stages express at least ∼5500 genes. Thus, through this cooperative effort alone, approximately half of the predicted Dictyostelium genes have already been characterized and are available as cloned cDNAs.
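The step of collapsing redundant cDNA reads into a non-redundant gene set can be illustrated with a deliberately simple stand-in for the BLAST- and PHRAP-based analyses the Project actually used. In the sketch below, two reads are placed in the same cluster whenever they share any exact k-mer on either strand, and clusters are merged transitively with a union-find structure. The function names, the default k of 20 and the shared-k-mer rule are assumptions for illustration; such a criterion is far cruder than alignment and would, for example, wrongly merge unrelated reads that share a repeat.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Reverse complement of a DNA string (case preserved, non-ACGT left as-is)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def canonical_kmers(seq, k):
    """Set of strand-independent k-mers (lexicographic min of k-mer and its RC)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    kmers = set()
    for i in range(n - k + 1):
        forward = seq[i:i + k]
        reverse = rc[n - i - k:n - i]  # reverse complement of `forward`
        kmers.add(min(forward, reverse))
    return kmers


def cluster_ests(ests, k=20):
    """Group reads that share at least one canonical k-mer, transitively.

    ests: dict mapping read name -> sequence.
    Returns a sorted list of sorted name lists, one per cluster.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # k-mer -> first read seen containing it
    for name, seq in ests.items():
        for kmer in canonical_kmers(seq, k):
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters = defaultdict(list)
    for name in ests:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


reads = {
    "a": "ATGAAACCCGGG",
    "b": "AACCCGGGTTT",
    "c": "TTTTTTTTTT",
    "d": reverse_complement("AACCCGGGTTT"),  # same transcript, opposite strand
}
print(cluster_ests(reads, k=5))
```

Reads "a", "b" and "d" fall into one cluster (including the reverse-complemented read), while "c" remains a singleton.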

Using oligo-capping methods (10) for cDNA preparation, this group has recently created cDNA sets from each of four major transitional stages of the Dictyostelium developmental cycle: growth, aggregation, slug and culmination; ∼90% of the cDNAs in each set are full-length. Single-pass sequencing from the 5′ and 3′ ends of these clones has begun, to identify translational start and stop sites. These sequences will be compiled with all available data to assemble the Dictyostelium transcriptome, with a preliminary assessment of stage-specific representation.
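Because most oligo-capped clones are full-length, the translational start and stop sites in a cDNA sequence can be located by a straightforward open reading frame (ORF) search. The sketch below is a hypothetical, minimal version: it assumes the sequence is in the sense orientation, scans the three forward frames, and reports the longest stretch running from an ATG to the next in-frame stop codon. It ignores ORFs that lack a stop codon and does not consider alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the span includes the stop
    codon. Returns None if no complete ORF is found.
    """
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG seen since the last in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 14): ATG AAA TTT TAG
```

Taking the first ATG after each in-frame stop yields the longest possible ORF in that stretch; a real annotation pipeline would also weigh Kozak-like context and the 5′-end read itself.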

The cDNAs have been organized into clusters, and each clone and cluster has been analyzed by conceptual translation and for both nucleotide and amino acid sequence similarities. Nucleotide sequences for many of the cDNAs have been deposited in GenBank (11). The cDNA site (http://www.csm.biol.tsukuba.ac.jp/cDNAproject.html) can be searched on the web by clone name or keyword. BLAST and FastA services are also available, but results are returned by email rather than through the web. Although it may be more convenient to search for cDNAs and ESTs at the genomic sites, their databases may not be as complete as that of the cDNA Project.
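Conceptual translation amounts to reading codons through the genetic code. The sketch below is a minimal, hypothetical implementation that builds the standard genetic code table and translates a sequence from its first base. It assumes that Dictyostelium nuclear genes use the standard code and that the input already begins at the start codon (for example, after an ORF search); codons containing ambiguous bases are rendered as X.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for the 1st, 2nd and 3rd positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, to_stop=True):
    """Conceptually translate a cDNA (or mRNA) sequence from its first base.

    If to_stop is True, translation halts at the first stop codon, which is
    not included; otherwise stops are written as '*'.
    """
    seq = cdna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with N etc.
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTTAG"))  # MKF
```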

Finally, individual cDNA clones or the non-redundant slug cDNA set are directly available from the cDNA Project upon request through the site. The Project has proven to be a remarkable and unique resource.

GENE DISCOVERY AND ANNOTATION

The unusual genomic organization of Dictyostelium has made gene prediction highly tractable. Although the overall G+C content is 22%, intergenic elements, promoters, UTRs and introns are <10% G+C, whereas protein-coding regions average ∼40% G+C and are thus easily distinguished. Dictyostelium introns use consensus GT/AG splice junctions and are usually quite small (100–200 nt), which further simplifies protein sequence assembly (12,13). By combining these distinctive gene features with comparative analyses against the extensive databases developed by the cDNA and genomic sequencing projects, genes in any selected region can be predicted and assembled by simple scanning. These processes have been automated, with moderate success, using variations of GlimmerM (14) and GeneFinder (15). Each of the sequencing centers provides its own analyses. Additional classification is available through UCSD/SDSC (http://dicty.sdsc.edu/annot-blast.html); the Dicty Workbench portal (T.B.K. Reddy; http://dictyworkbench.sdsc.edu/) is built on Oracle and provides BLAST, RPS-BLAST and PFAM analysis data.
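The G+C contrast described in this paragraph is strong enough that even a crude sliding-window scan separates likely coding segments from A+T-rich intergenic and intronic DNA. The sketch below is illustrative only: the 100-nt window, 50-nt step and 30% cutoff (a hypothetical midpoint between the <10% and ∼40% values quoted here) are assumptions, and it is not a reimplementation of GlimmerM, GeneFinder or any center's pipeline.

```python
def gc_fraction(seq):
    """G+C fraction of a DNA sequence, ignoring ambiguous bases such as N."""
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def classify_windows(seq, window=100, step=50, coding_threshold=0.30):
    """Label overlapping windows as 'coding' or 'noncoding' by G+C content.

    Returns a list of (start, gc_fraction, label) tuples with 0-based starts.
    A sequence shorter than `window` is scored as a single window; bases past
    the last full step are not given a window of their own.
    """
    seq = seq.upper()
    if not seq:
        return []
    calls = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        gc = gc_fraction(seq[start:start + window])
        label = "coding" if gc >= coding_threshold else "noncoding"
        calls.append((start, gc, label))
    return calls


# Toy sequence: 100 nt of pure A/T followed by 100 nt at 40% G+C.
toy = "AT" * 50 + "GCAAT" * 20
for start, gc, label in classify_windows(toy):
    print(start, round(gc, 2), label)
```

The window straddling the boundary scores 20% G+C and is called noncoding, illustrating that window size limits how precisely exon edges can be placed; the GT/AG junctions and cDNA alignments are what refine those boundaries.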

So far, more than 6000 protein sequences have been annotated, and the depth of genomic sequencing predicts that ∼98% of all Dictyostelium genes are at least partially represented in the various databases. These analyses have already led to the identification and functional characterization of numerous genes and gene families shared with the metazoa. Nonetheless, >50% of the genes appear to be unique to Dictyostelium. Interestingly, 11 of the 113 human genes that are absent from the genomes of C. elegans, S. cerevisiae and Drosophila melanogaster, but which share sequence identity with bacterial genes, are also present in Dictyostelium (16). Many other genes are absent from S. cerevisiae but shared by Dictyostelium and the metazoa.

FUTURE PERSPECTIVES

Completion of the Dictyostelium genome during the next year will enable physical and gene maps to be superimposed on the chromosomes. Full annotation will facilitate the design of a unigene set and the production of all-gene microarray platforms. Finally, the ease of high-frequency targeted gene disruption in Dictyostelium will enable directed mutagenesis and functional studies of every predicted gene.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.

[Supplementary Data]
nar_30_1_84__index.html (1.5KB, html)

ACKNOWLEDGEMENTS

We are indebted to all of our colleagues mentioned above, who have focused much of their research on the analysis of the Dictyostelium genome. We are grateful for the many long conversations with Drs R. Chisholm, E. Cox, P. Devreotes, L. Eichinger, R. Firtel, R. Kay, A. Kuspa, W. Loomis, M. Maeda, T. Morio, A. Noegel, G. Shaulsky, R. Sucgang, Y. Tanaka, H. Urushihara and J. Williams with regard to this subject. Finally, we wish to thank Drs J. Brzostowski, F. Comer, L. Kim, T. Khurana, C. Parent, D. Rosel and P. Schwartzberg for their helpful discussions and suggestions.

REFERENCES

1. Kuspa,A. and Loomis,W.F. (1992) Tagging developmental genes in Dictyostelium by restriction enzyme-mediated integration of plasmid DNA. Proc. Natl Acad. Sci. USA, 89, 8803–8807.
2. Kuspa,A. and Loomis,W.F. (1996) Ordered yeast artificial chromosome clones representing the Dictyostelium discoideum genome. Proc. Natl Acad. Sci. USA, 93, 5562–5566.
3. Dear,P.H. and Cook,P.R. (1993) Happy mapping: linkage mapping using a physical analogue of meiosis. Nucleic Acids Res., 21, 13–20.
4. Konfortov,B.A., Cohen,H.M., Bankier,A.T. and Dear,P.H. (2000) A high-resolution HAPPY map of Dictyostelium discoideum chromosome 6. Genome Res., 10, 1658–1659.
5. Glockner,G., Szafranski,K., Winckler,T., Dingermann,T., Quail,M.A., Cox,E., Eichinger,L., Noegel,A.A. and Rosenthal,A. (2001) The complex repeats of Dictyostelium discoideum. Genome Res., 11, 585–594.
6. Ogawa,S., Yoshino,R., Angata,K., Iwamoto,M., Pi,M., Kuroe,K., Matsuo,K., Morio,T., Urushihara,H., Yanagisawa,K. and Tanaka,Y. (2000) The mitochondrial DNA of Dictyostelium discoideum: complete sequence, gene content and genome organization. Mol. Gen. Genet., 263, 514–519.
7. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
8. Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.
9. Ewing,B. and Green,P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.
10. Suzuki,Y. and Sugano,S. (2001) Construction of full-length-enriched cDNA libraries. The oligo-capping method. Methods Mol. Biol., 175, 143–153.
11. Morio,T., Urushihara,H., Saito,T., Ugawa,Y., Mizuno,H., Yoshida,M., Yoshino,R., Mitra,B.N., Pi,M., Sato,T. et al. (1998) The Dictyostelium developmental cDNA project: generation and analysis of expressed sequence tags from the first-finger stage of development. DNA Res., 5, 335–340.
12. Kimmel,A.R. and Firtel,R.A. (1982) Organization and expression of the Dictyostelium genome. In Loomis,W.F. (ed.), The Development of Dictyostelium discoideum. Academic Press, New York, pp. 234–324.
13. Kimmel,A.R. and Firtel,R.A. (1983) Sequence organization in Dictyostelium: unique structure at the 5′-ends of protein coding genes. Nucleic Acids Res., 11, 541–552.
14. Salzberg,S.L., Pertea,M., Delcher,A.L., Gardner,M.J. and Tettelin,H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics, 59, 24–31.
15. Solovyev,V. and Salamov,A. (1997) The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol., 5, 294–302.
16. Roelofs,J. and Van Haastert,P.J. (2001) Genes lost during evolution. Nature, 411, 1013–1014.


RESOURCES