Nucleic Acids Research. 2004 Jan 1;32(Database issue):D423–D426. doi: 10.1093/nar/gkh010

Nematode.net: a tool for navigating sequences from parasitic and free-living nematodes

Todd Wylie 1,*, John C Martin 1, Michael Dante 1, Makedonka Dautova Mitreva 1, Sandra W Clifton 1, Asif Chinwalla 1, Robert H Waterston 1,2, Richard K Wilson 1, James P McCarter 1,3
PMCID: PMC308745  PMID: 14681448

Abstract

Nematode.net (www.nematode.net) is a web-accessible resource for investigating gene sequences from nematode genomes. The database is an outgrowth of the parasitic nematode EST project at Washington University’s Genome Sequencing Center (GSC), St Louis. A sister project at the University of Edinburgh and the Sanger Institute is also underway. More than 295 000 ESTs have been generated from >30 nematodes other than Caenorhabditis elegans, including key parasites of humans, animals and plants. Nematode.net currently provides NemaGene EST cluster consensus sequences, enhanced online BLAST search tools, functional classifications of cluster sequences and comprehensive information concerning the ongoing generation of nematode genome data. The long-term goal of nematode.net is to provide the scientific community with the highest quality sequence information and tools for studying these diverse species.

INTRODUCTION

Nematodes or roundworms are members of an ancient phylum that accounts for perhaps four out of every five individual animals in the world (1). Parasitic nematodes infect nearly half the world’s human population, resulting in significant morbidity and mortality. Nematodes also parasitize livestock and companion animals and cause over 80 billion dollars in crop damage annually (2,3). Nematode.net is a specialty database that makes accessible the rapidly expanding nucleotide sequence data and related resources from species across this phylum to target audiences including human/mammalian parasitologists, plant nematologists, Caenorhabditis elegans biologists and other scientists.

SEQUENCES FROM PARASITIC NEMATODES

Following the completion of the first fully sequenced animal genome, the nematode C.elegans (4), increasing efforts have been made to rapidly generate and make public gene sequences from parasitic nematodes of medical and economic importance as a route toward research on new anthelmintic drugs, vaccines, safe pesticides and resistant plants. Initiatives have primarily utilized expressed sequence tags (ESTs), focusing first on the filarial worms responsible for elephantiasis and river blindness (5,6). A collaboration is currently underway involving the Genome Sequencing Center (GSC) at Washington University in St Louis, the Wellcome Trust Sanger Institute, the University of Edinburgh and dozens of participating parasitologists to extend EST-based gene discovery to more than 30 nematode species (7,8). To date, over 295 000 ESTs have been generated from nematodes beyond C.elegans, with nearly 220 000 of these sequences provided by the GSC (Table 1).

Table 1. Nematode EST projects by species.

| Clade | Nematode species | Host | Total ESTs | GSC ESTs | ESTs clustered | Clusters | Database |
|---|---|---|---|---|---|---|---|
| V | Ancylostoma caninum | Mammal | 9331 | 9331 | 9286 | 4020 | NemaGene |
| | Ancylostoma ceylanicum | Mammal | 10651 | 10590 | 10590 | 3369 | NemaGene |
| | Caenorhabditis briggsae | Free-living | 2424 | 2424 | | | |
| | Caenorhabditis elegans | Free-living | 215202 | 388 | | | Wormbase |
| | Haemonchus contortus | Mammal | 21967 | 14014 | 5181 | 1970 | NEMBASE |
| | Necator americanus | Mammal | 4766 | | 4766 | 2298 | NEMBASE |
| | Nippostrongylus brasiliensis | Mammal | 1234 | | 1234 | 750 | NEMBASE |
| | Ostertagia ostertagi | Mammal | 7009 | 6558 | | | |
| | Pristionchus pacificus | Free-living | 8818 | 8818 | 4979 | 2603 | NemaGene |
| | Teladorsagia circumcincta | Mammal | 4313 | | | | |
| IVA | Strongyloides stercoralis | Mammal | 11392 | 11335 | 10908 | 3311 | NemaGene |
| | Strongyloides ratti | Mammal | 14822 | 14822 | 8618 | 2941 | NemaGene |
| | Parastrongyloides trichosuri | Mammal | 7963 | 7963 | 4528 | 2155 | NemaGene |
| IVB | Globodera rostochiensis | Plant | 5934 | 5040 | 5039 | 2375 | NemaGene |
| | Globodera pallida | Plant | 1832 | | | | |
| | Heterodera glycines | Plant | 20114 | 20109 | 4307 | 1790 | NemaGene |
| | Heterodera schachtii | Plant | 2662 | 2662 | | | |
| | Meloidogyne arenaria | Plant | 3519 | 3519 | 3321 | 1866 | NemaGene |
| | Meloidogyne chitwoodi | Plant | 10789 | 10789 | | | |
| | Meloidogyne hapla | Plant | 13869 | 13869 | | | |
| | Meloidogyne incognita | Plant | 13452 | 13168 | 5661 | 1625 | NemaGene |
| | Meloidogyne javanica | Plant | 5600 | 5578 | 5574 | 2598 | NemaGene |
| | Pratylenchus penetrans | Plant | 1928 | 1928 | 1926 | 420 | NemaGene |
| | Zeldia punctata | Free-living | 391 | 391 | 378 | 195 | NemaGene |
| III | Ascaris lumbricoides | Mammal | 1822 | | | | |
| | Ascaris suum | Mammal | 39242 | 29960 | 19280 | 4262 | NemaGene |
| | Brugia malayi | Mammal | 26212 | 3773 | 18741 | 8392 | NEMBASE |
| | Dirofilaria immitis | Mammal | 4005 | 4005 | | | |
| | Litomosoides sigmodontis | Mammal | 873 | | | | |
| | Onchocerca volvulus | Mammal | 14971 | 1230 | 7911 | 3504 | NEMBASE |
| | Toxocara canis | Mammal | 4889 | 4370 | | | |
| | Wuchereria bancrofti | Mammal | 2166 | | | | |
| I | Trichinella spiralis | Mammal | 10767 | 10548 | 10130 | 3454 | NemaGene |
| | Trichuris muris | Mammal | 3063 | | 2125 | 1322 | NEMBASE |
| | Trichuris vulpis | Mammal | 2402 | 2402 | | | |
| | Totals | | 510394 | 219584 | 144483 | 55220 | |

Nematodes with >100 ESTs are shown. NEMBASE clusters are available at www.nematodes.org. Clades are based upon (9).
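
The headline figures quoted in the text can be recovered from the Totals and C.elegans rows of Table 1. The short Python check below uses only numbers copied from the table; the variable names are ours and purely illustrative.

```python
# Values copied from Table 1 (Totals row and the C. elegans row).
total_ests = 510394         # all nematode ESTs listed in the table
total_gsc_ests = 219584     # ESTs generated by the GSC
celegans_ests = 215202      # C. elegans ESTs (all sources)
celegans_gsc_ests = 388     # C. elegans ESTs generated by the GSC

# ESTs from nematodes other than C. elegans, overall and from the GSC.
non_celegans_ests = total_ests - celegans_ests
non_celegans_gsc = total_gsc_ests - celegans_gsc_ests

# Fraction of the non-C. elegans ESTs contributed by the GSC.
gsc_share = non_celegans_gsc / non_celegans_ests

print(non_celegans_ests)     # 295192, i.e. "over 295 000"
print(non_celegans_gsc)      # 219196, i.e. "nearly 220 000"
print(f"{gsc_share:.1%}")
```

On these numbers the GSC accounts for roughly three-quarters of the ESTs from nematodes other than C.elegans.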

NemaGene CLUSTERS AND NemaBLAST SEARCHES

While GSC-generated ESTs are immediately deposited in GenBank’s database of ESTs (dbEST), no such repository exists for nematode EST cluster consensus sequences, nor are tailored BLAST searches easily performed. Nematode.net began in 2000 by providing these services. NemaGene clustering improves upon EST data by reducing data redundancy, increasing transcript length and improving base accuracy. The NemaGene method uses the Phred/Phrap/Consed suite of analysis programs (10), together with internal supplemental scripts, and has the advantage that clusters can be edited when necessary and tracked by name through multiple builds (11). Clusters can be searched on the nematode.net website by EST name, putative identity and individual contig or cluster name (Fig. 1). Cluster entries provide EST membership with NCBI links, as well as SWIR non-redundant protein database, Sanger Centre and C.elegans (Wormpep) homology. Cluster information and sequences can also be downloaded by FTP. NemaGene clusters have so far been generated for 15 species (Table 1). Both NemaGene clusters and individual ESTs can be searched for sequence identity using the online NemaBLAST tool, which utilizes a local WU-BLAST server (12) (http://blast.wustl.edu). Searches can be performed on ESTs from specific species, clades, stages and libraries, in any combination desired by the user.
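
One rough way to see the redundancy reduction that clustering provides is the mean number of ESTs collapsed into each cluster. The sketch below computes that ratio for a few NemaGene species using the "ESTs clustered" and "Clusters" columns of Table 1; the function name is ours, and the ratio is only a summary statistic, not part of the NemaGene method itself.

```python
# (ESTs clustered, clusters) for a few NemaGene species, copied from Table 1.
nemagene = {
    "Ancylostoma caninum": (9286, 4020),
    "Ascaris suum": (19280, 4262),
    "Pratylenchus penetrans": (1926, 420),
    "Zeldia punctata": (378, 195),
}


def ests_per_cluster(clustered, clusters):
    """Mean number of ESTs collapsed into each cluster."""
    return clustered / clusters


for species, (clustered, clusters) in nemagene.items():
    ratio = ests_per_cluster(clustered, clusters)
    print(f"{species}: {ratio:.2f} ESTs per cluster")

# The same ratio across all clustered species (Totals row of Table 1).
overall = ests_per_cluster(144483, 55220)
print(f"All clustered species: {overall:.2f} ESTs per cluster")
```

A higher ratio indicates that more of the raw EST reads were redundant copies of the same transcripts; deeply sampled species such as Ascaris suum show more collapse than lightly sampled ones such as Zeldia punctata.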

Figure 1. A NemaGene Cluster Search query response showing constituents of consensus sequence by contig.

FUNCTIONAL CLASSIFICATIONS AND OTHER FEATURES

Nematode.net provides the user with two avenues to explore the putative function of NemaGene clusters. Both are based on extrapolation from homology and must be regarded as providing only a starting hypothesis in studying function. Cluster sequences were used to search the InterPro protein domain database (13) (www.ebi.ac.uk/interpro) with InterProScan. Based on the presence of conserved domains, clusters were then mapped onto the Gene Ontology (GO) classification scheme (14) (www.geneontology.org). GO biological, molecular and cellular classifications are provided at nematode.net with the AmiGO interface. NemaGene clusters have also been mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database of biochemical pathways using enzyme commission (EC) numbers as the basis for putative assignment (15) (www.genome.ad.jp/kegg). Additional useful features of nematode.net include summaries of sequence status for all nematode species, cDNA library descriptions, project specifics, >300 organized nematology links and a trace viewer that allows users to examine raw sequence data. Nematode.net is also used to manage requests for clones generated by the project. Since 1999, 377 clones and dozens of plates have been provided to 37 investigators in 14 countries.
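
The domain-based route to GO terms amounts to a two-step lookup: cluster to conserved domains, then domain to GO terms. The Python sketch below illustrates that idea only; the cluster names, domain identifiers and GO identifiers are placeholders invented for the example, and the function is not the code used at nematode.net.

```python
# Hypothetical inputs (placeholders, not real identifiers):
# cluster -> domains found by a domain search, and domain -> GO terms.
cluster_domains = {
    "demo_cluster_1": ["DOMAIN_A", "DOMAIN_B"],
    "demo_cluster_2": ["DOMAIN_B"],
    "demo_cluster_3": [],            # no conserved domain detected
}
domain_to_go = {
    "DOMAIN_A": ["GO:EXAMPLE_1", "GO:EXAMPLE_2"],
    "DOMAIN_B": ["GO:EXAMPLE_2"],
}


def map_clusters_to_go(cluster_domains, domain_to_go):
    """Assign each cluster the union of GO terms of its domains."""
    annotations = {}
    for cluster, domains in cluster_domains.items():
        terms = set()
        for domain in domains:
            # Domains without a known GO mapping contribute nothing.
            terms.update(domain_to_go.get(domain, ()))
        if terms:
            annotations[cluster] = sorted(terms)
    return annotations


go_annotations = map_clusters_to_go(cluster_domains, domain_to_go)
for cluster, terms in go_annotations.items():
    print(cluster, terms)
```

Clusters without a detected domain receive no annotation in this sketch, which matches the caveat that such assignments are only starting hypotheses derived from homology.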

SITE AND DATABASE DESIGN

The Nematode.net interface was constructed using the Dreamweaver MX web development application in combination with a Perl CGI/DBI database interface. The GUI-based Dreamweaver MX editor was chosen for HTML design due to ease of use, ability to make rapid site-wide modifications and project tracking features. HTML pages written under Dreamweaver MX are sourced by a GSC Perl module, which has proved to be fast, extensible, and useful for recycling previously written code. Relational databases were initially built in MySQL and are now being replaced by a single, more efficient Oracle database.
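
To make the relational design concrete, the sketch below shows how a cluster lookup by EST name might be expressed against a two-table schema. It is an assumption-laden illustration: the site itself uses Perl CGI/DBI with MySQL or Oracle, whereas this example substitutes Python's built-in sqlite3 module, and the table names, column names and records are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical cluster/EST schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cluster (
            cluster_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE,
            species    TEXT
        );
        CREATE TABLE est (
            est_name   TEXT PRIMARY KEY,
            cluster_id INTEGER REFERENCES cluster(cluster_id)
        );
        """
    )
    # Invented demo records; these are not real NemaGene entries.
    conn.executemany(
        "INSERT INTO cluster VALUES (?, ?, ?)",
        [(1, "demo_cluster_A", "Example species"),
         (2, "demo_cluster_B", "Example species")],
    )
    conn.executemany(
        "INSERT INTO est VALUES (?, ?)",
        [("est_0001", 1), ("est_0002", 1), ("est_0003", 2)],
    )
    return conn


def cluster_for_est(conn, est_name):
    """Return (cluster name, species) for an EST, or None if it is unknown."""
    return conn.execute(
        "SELECT c.name, c.species FROM est AS e "
        "JOIN cluster AS c ON c.cluster_id = e.cluster_id "
        "WHERE e.est_name = ?",
        (est_name,),
    ).fetchone()


conn = build_demo_db()
print(cluster_for_est(conn, "est_0002"))
```

The query is parameterized rather than built by string concatenation, a common precaution for any web-facing database interface that accepts user-supplied search terms.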

FUTURE DIRECTIONS

Nematode.net is a work in progress with the long-term goal of providing the nematology community with useful, consistent and lasting integrated databases and tools. With over 29 000 unique users in the past year, nematode.net is already providing a useful service, but improvements are envisioned in three areas. First, the site’s current databases will be extended to include almost all available nematode species and sequences, expedited by further automation of clustering algorithms. Second, nematode.net will become more closely integrated with the C.elegans database Wormbase (16) (www.wormbase.org) and Nembase (www.nematodes.org), a site maintained by our collaborators at the University of Edinburgh that also provides tools for investigating nematode sequences (8). Plans for Wormbase integration include the layering of non-C.elegans nematode gene sequences over C.elegans homologs using the Distributed Annotation System (DAS) method (17). Currently, 9894 C.elegans genes have strong homologs in other nematodes (BLAST E-value of <1e-20). C.elegans information will continue to reside only at Wormbase. Third, in collaboration with Nembase, additional features for navigating nematode sequences will be made available. Databases covering all nematodes will include: postulated amino acid translations of EST clusters; protein domains connected to Pfam (18) and InterPro, including new nematode-specific domains; genes with homologs in C.elegans where RNA interference phenotype information is available (19); proteins with predicted signal peptide sequences; and codon usage tables for each species. Other possible additions include the integration of whole-genome information for parasitic nematode species (e.g. Brugia malayi) as such data become available.
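
A count such as the number of C.elegans genes with strong homologs can be obtained by keeping every gene that has at least one hit below the cutoff. The sketch below assumes the <1e-20 threshold refers to a BLAST E-value and that hits are available as (gene, subject, E-value) tuples; the function name and the records are hypothetical and shown only to make the filtering step explicit.

```python
def genes_with_strong_homologs(hits, max_evalue=1e-20):
    """Return the set of query genes with at least one hit below max_evalue.

    `hits` is an iterable of (gene, subject, evalue) tuples; a gene is
    counted once however many qualifying hits it has.
    """
    return {gene for gene, _subject, evalue in hits if evalue < max_evalue}


# Invented example hits; gene and cluster names are placeholders.
example_hits = [
    ("gene_a", "cluster_1", 3e-45),
    ("gene_a", "cluster_2", 1e-22),   # second hit for the same gene
    ("gene_b", "cluster_3", 5e-12),   # too weak, above the cutoff
    ("gene_c", "cluster_4", 1e-20),   # exactly at the cutoff, excluded
    ("gene_d", "cluster_5", 8e-80),
]

strong = genes_with_strong_homologs(example_hits)
print(sorted(strong))
print(len(strong))
```

Using a set means each gene contributes once to the total, so the result reflects genes with homologs rather than the number of individual BLAST hits.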

ACKNOWLEDGEMENTS

Sequence generation has been aided by numerous collaborators in the nematology community, by cDNA library creation by Claire Murphy and Brandi Chiapelli, and by the dedicated members of the Darwin EST laboratory at the GSC. Wormbase efforts at the GSC are headed by John Spieth. We would like to thank our collaborators at NemBase, Mark Blaxter and John Parkinson, and others involved in Wellcome-Trust-funded nematode sequencing at the University of Edinburgh and the Sanger Institute. Additional feedback on website development was provided by Ben Oberkfel and Mike Nhan. Nematode.net and the parasitic nematode EST sequencing at the GSC are supported by US National Institute for Allergy and Infectious Disease grant AI46593 to R.H.W. and R.K.W. and National Science Foundation Plant Genome award 0077503 to S.W.C. and David M. Bird. J.P.M. was a Helen Hay Whitney/Merck Fellow.

REFERENCES

  • 1.Platt H.M. (1994) Foreword. In Lorenzen,S. (ed.), The Phylogenetic Systematics of Free-Living Nematodes. The Ray Society, London, pp. i–ii.
  • 2.Blaxter M. and Bird,D. (1997) Parasitic Nematodes. In Riddle,D.L., Blumenthal,T., Meyers,B.J. and Priess,J.R. (eds), C. elegans II. Cold Spring Harbor Laboratory Press, Plainview, NY, pp. 851–878.
  • 3.Barker K.R., Hussey,R.S., Krusberg,L.R., Bird,G.W., Dunn,R.A., Ferris,V.R., Freckmann,D.W., Gabriel,C.J., Grewal,P.S., Macguidwin,A.E., Riddle,D.L., Roberts,P.A. and Schmitt,D.P. (1994) Plant and soil nematodes—societal impact and focus for the future. J. Nematol., 26, 127–137.
  • 4.The Caenorhabditis elegans Genome Sequencing Consortium (1998) Genome sequence of Caenorhabditis elegans: a platform for investigating biology. Science, 282, 2012–2018.
  • 5.Williams S.A., Lizotte-Waniewski,M.R., Foster,J., Guiliano,D., Daub,J., Scott,A.L., Slatko,B. and Blaxter,M.L. (2000) The filarial genome project: analysis of the nuclear, mitochondrial and endosymbiont genomes of Brugia malayi. Int. J. Parasitol., 30, 411–419.
  • 6.Unnasch T.R. and Williams,S.A. (2000) The genomes of Onchocerca volvulus. Int. J. Parasitol., 30, 543–552.
  • 7.McCarter J.P., Clifton,S., Bird,D.M. and Waterston,R.H. (2002) Nematode gene sequences, update for June 2002. J. Nematol., 34, 71–74.
  • 8.Parkinson J., Mitreva,M., Hall,N., Blaxter,M. and McCarter,J.P. (2003) 400 000 nematode ESTs on the Net. Trends Parasitol., 19, 283–286.
  • 9.Blaxter M.L., De Ley,P., Garey,J.R., Liu,L.X., Scheldeman,P., Vierstraete,A., Vanfleteren,J.R., Mackey,L.Y., Dorris,M., Frisse,L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.
  • 10.Ewing B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.
  • 11.McCarter J.P., Mitreva,M.D., Martin,J., Dante,M., Wylie,T., Rao,U., Pape,D., Bowers,Y., Theising,B., Murphy,C.V. et al. (2003) Analysis and functional classification of transcripts from the nematode Meloidogyne incognita. Genome Biol., 4, R26.
  • 12.Altschul S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
  • 13.Mulder N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Barrell,D., Bateman,A., Binns,D., Biswas,M., Bradley,P., Bork,P. et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res., 31, 315–318.
  • 14.Ashburner M. and Lewis,S. (2002) On ontologies for biologists: the Gene Ontology—untangling the web. Novartis Found. Symp., 247, 66–90, 244–252.
  • 15.Kanehisa M. (2002) The KEGG database. Novartis Found. Symp., 247, 91–103, 119–128, 244–252.
  • 16.Harris T.W., Lee,R., Schwarz,E., Bradnam,K., Lawson,D., Chen,W., Blasier,D., Kenny,E., Cunningham,F. and Kishore,R. (2003) WormBase: a cross-species database for comparative genomics. Nucleic Acids Res., 31, 133–137.
  • 17.Dowell R.D., Jokerst,R.M., Day,A., Eddy,S.R. and Stein,L. (2001) The Distributed Annotation System. BMC Bioinformatics, 2, 7.
  • 18.Bateman A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.
  • 19.Kamath R.S., Fraser,A.G., Dong,Y., Poulin,G., Durbin,R., Gotta,M., Kanapin,A., Le Bot,N., Moreno,S., Sohrmann,M. et al. (2003) Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature, 421, 231–237.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
