ABSTRACT
PlasmoDB (http://PlasmoDB.org) is a functional genomic database for Plasmodium spp. that provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB belongs to a family of genomic resources that are housed under the EuPathDB (http://EuPathDB.org) Bioinformatics Resource Center (BRC) umbrella. The latest release, PlasmoDB 5.5, contains numerous new data types from several broad categories: annotated genomes, evidence of transcription, proteomics evidence, protein function evidence, population biology and evolution. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. Results can then be combined with each other on the query history page. Search results can be downloaded with associated functional data, and registered users can store their query history for future retrieval or analysis.
INTRODUCTION
Plasmodium spp. are obligate intracellular protozoan parasites of humans and animals, and are the causative agents of malaria. Transmission of these parasites to humans occurs via the Anopheles mosquito vector, and the geographic distribution of endemic regions puts almost half of the world's population at risk of contracting malaria. This disease is a major source of morbidity and mortality worldwide, resulting in 300–500 million clinical cases and 1–2 million deaths annually (1,2). While several species of Plasmodium cause disease in humans (including P. vivax, P. malariae, P. ovale and P. knowlesi), P. falciparum is by far the deadliest (1,3). The life cycle of the Plasmodium parasite takes it through multiple cell types (in the vertebrate host and arthropod vector), during which the parasite undergoes multiple developmental changes (both sexual and asexual). The different life-cycle stages are marked by specific genomic, transcriptomic, proteomic and metabolomic states. Understanding how these changes are triggered and orchestrated requires mechanisms to view and interrogate genomic and functional genomic data in a powerful and intuitive manner. Over the past 10 years, PlasmoDB has evolved into a venue that integrates such data and allows users to perform complex queries tailored to their specific needs and interests.
UPDATED DATA CONTENT
The data available in PlasmoDB have expanded to include genomic and functional data from eight Plasmodium species and are summarized in Table 1 (4). The current release (PlasmoDB 5.5) contains fully sequenced and annotated genomes of P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi and P. knowlesi. Importantly, PlasmoDB 5.5 contains results of annotation efforts from multiple sources, including the ongoing systematic effort to update the P. falciparum genome annotation, which began at a workshop in late 2007 co-organized by the Wellcome Trust Sanger Institute (WTSI) and EuPathDB (formerly ApiDB) teams. Reannotation data have been released in incremental steps (snapshots) in order to provide timely information to users of PlasmoDB and to solicit user comments regarding the reannotations.
Table 1.
Types of data available in PlasmoDB and example queries
Type of Data | Species for which this data is available | Example query |
---|---|---|
Genomic data | ||
Full sequence and annotation | P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi, P. knowlesi | Search annotations for specific keyword (see Figure 1C). |
Sequence only | P. reichenowi, P. gallinaceum | Find sequence similarity using BLAST. |
Transcript expression data | ||
Microarray | P. falciparum, P. berghei, P. yoelii | Identify genes expressed at specific life-cycle stages. |
EST | P. falciparum, P. vivax, P. berghei, P. yoelii | Confirm gene models and alternative gene models. |
SAGE | P. falciparum | Identify genes with transcript evidence. |
Protein expression data | P. falciparum, P. berghei, P. yoelii | Identify genes with protein expression evidence at specific life-cycle stages. |
Population biology | ||
SNP, microsatellite and isolate data | P. falciparum | Find highly polymorphic genes or distinguish isolates based on their SNP profile. |
Protein interaction | ||
Yeast two-hybrid, interactome map | P. falciparum | Identify possible interaction partners of a gene of interest. |
Putative function | ||
GO annotation | P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi, P. knowlesi | Identify genes that have GO annotations. |
EC numbers | P. falciparum, P. yoelii, P. knowlesi | Identify genes with enzymatic annotations. |
Metabolic pathways | P. falciparum | Identify parasite-specific or missing metabolic pathways. |
Evolutionary | ||
Orthology based | P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi, P. knowlesi | Identify genes specific to Apicomplexa. |
Homology based | P. falciparum and P. yoelii | Identify homologs of a gene or list of genes of interest. |
Protein features | ||
Protein motifs, InterPro/Pfam domains, molecular weight, isoelectric point, protein structure, immune epitopes | P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi, P. knowlesi | Identify genes with specific protein attributes. |
Protein localization | ||
Signal peptide, transmembrane domains, targeting to the RBC | P. falciparum, P. vivax, P. yoelii, P. berghei, P. chabaudi, P. knowlesi | Identify genes targeted to the host cell. |
Apicoplast targeting | P. falciparum | Identify genes targeted to the apicoplast. |
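
The species coverage listed in Table 1 can be read as a simple lookup from data type to species. The following is a minimal sketch of that idea in Python, using only a few rows copied from the table; the dictionary and the function names (`AVAILABILITY`, `data_types_for`, `species_covered_by_all`) are illustrative assumptions and are not part of PlasmoDB itself.

```python
# A few rows of Table 1, transcribed as data type -> set of species.
# The structure and names here are hypothetical, for illustration only.
AVAILABILITY = {
    "Microarray": {"P. falciparum", "P. berghei", "P. yoelii"},
    "EST": {"P. falciparum", "P. vivax", "P. berghei", "P. yoelii"},
    "SAGE": {"P. falciparum"},
    "EC numbers": {"P. falciparum", "P. yoelii", "P. knowlesi"},
    "Metabolic pathways": {"P. falciparum"},
}


def data_types_for(species):
    """Return the (sorted) data types in the lookup that cover a species."""
    return sorted(
        data_type
        for data_type, covered in AVAILABILITY.items()
        if species in covered
    )


def species_covered_by_all(data_types):
    """Return the species for which every listed data type is available."""
    species_sets = [AVAILABILITY[data_type] for data_type in data_types]
    return set.intersection(*species_sets)


if __name__ == "__main__":
    print(data_types_for("P. yoelii"))
    print(species_covered_by_all(["Microarray", "EC numbers"]))
```

A lookup of this kind makes it easy to see, for instance, which species can be used when a question needs both microarray and EC number data.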
Transcript expression data [microarray, expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE)] available through PlasmoDB has expanded dramatically over the past few releases to include microarray data from multiple life-cycle stages, gene knock-out mutants of P. falciparum and P. berghei (5–12) and multiple stages of P. yoelii (mosquito, erythrocytic and liver stages) (13). Also included are EST data from over 130 libraries (P. falciparum, P. vivax, P. berghei and P. yoelii) (14,15) [dbEST (http://www.ncbi.nlm.nih.gov/dbEST/)] and SAGE data (P. falciparum only) (16–18). Protein expression evidence includes data from various life-cycle stages (P. falciparum, P. berghei and P. yoelii) (11,13,19–21; Leiden Malaria Group, unpublished data).
Population biology evidence (P. falciparum only) includes mapping of microsatellite data (22) onto the genome (available as a genome browser track), single nucleotide polymorphism (SNP) data from resequencing efforts of more than 20 P. falciparum strains (P. reichenowi is included as an out-group for comparison purposes) and data from nearly 100 P. falciparum isolates (23–25). OrthoMCL analyses provide ortholog determinations between the different species, facilitating discovery of shared genes between lineages (26). Protein function assignments are aided by a number of additional functional data types available through PlasmoDB 5.5, including evidence of protein–protein interaction (yeast two-hybrid and predicted interactome) (27,28), Gene Ontology (GO) (29) and InterPro domain (30) annotations for P. falciparum, P. vivax, P. berghei, P. yoelii, P. knowlesi and P. chabaudi, Enzyme Commission (EC) number (29) annotation for P. falciparum, P. yoelii and P. knowlesi (31) and metabolic pathway assignments for P. falciparum (31). In addition, subcellular localization of proteins is available through signal peptide (32) and transmembrane domain predictions (33) for P. falciparum, P. vivax, P. berghei, P. yoelii, P. knowlesi and P. chabaudi, and parasite-specific predictions (P. falciparum only) for apicoplast localization (34) and export to the host cell (35–37).
HOW TO USE PLASMODB
A visitor to PlasmoDB can use the database in two general ways: (i) to retrieve all available information associated with a particular gene of interest using a search for an exact gene ID, gene name or gene product name; (ii) to ask single questions (Table 1) and/or conduct a series of searches and then refine the results by combining them or subtracting them from one another. Starting with the PlasmoDB home page (Figure 1A), a user can perform a quick search by entering an identifier or text term, or select a specific query from a number of drop-down menus (data not shown). Alternatively, queries may be accessed by visiting the ‘Queries and Tools’ section of PlasmoDB (Figure 1A), which includes a grid displaying all available queries/searches. By using the queries and tools, a user can interrogate data in PlasmoDB; the third column of Table 1 lists example data-specific questions that are available.
Figure 1.
Screenshots from PlasmoDB 5.5 and query workflow. (A) The top of the screenshot shows the PlasmoDB logo. On the left side are links to various sections of PlasmoDB and a point for logging in or registering as a user (not required for using the site but useful for storing search histories). The query grid is in the center and provides an access point to all searchable data in PlasmoDB. (B) A schematic of a workflow that a user may follow when building a set of queries. Beginning at the left, queries can be performed starting from the query grid and the results can be joined using operations available through the query history page. (C) Screenshots of a ‘key word’ search page, an example gene query history and a gene results page. Note the add column feature in the results page, which allows the addition of columns with additional data, and the ability to sort results.
When conducting queries with the purpose of combining results, it may be useful to visualize the searches in a workflow environment where nodes are connected using different criteria (‘and’, ‘or’, ‘not’) (Figure 1B). In PlasmoDB this is accomplished by performing a number of queries and subsequently combining the results in the ‘query history’ section (Figure 1C, middle screenshot). For example, one may be interested in identifying a short list of possible vaccine candidates. One possible way of accomplishing this is to begin by identifying all proteins predicted to be exported to the host cell in P. falciparum. There are three exported protein datasets in PlasmoDB, and a union (‘or’ function) of all three results retrieves 405 genes (Figure 1B, Steps 1 and 2). To restrict this list further, intersecting (‘and’ function) these results with genes that have no orthologs in mammals reduces the results to 321 genes (Figure 1B, Step 3). Next, a user may further prune this list by intersecting the results with other queries, such as genes that are nonpolymorphic between a chloroquine-sensitive strain (3D7) and a chloroquine-resistant strain (Dd2). This cuts the number of candidates to 32 genes (Figure 1B, Step 4 and Figure 1C, right screenshot). Alternatively, one may be interested in the genes that have protein expression evidence at a particular stage of the parasite's life cycle (an intersection with genes that have proteomic evidence in gametocytes yields 27 genes). Finally, examination of the list reveals several genes encoding rifins (a family of clonally variant proteins expressed on the surface of infected red blood cells) (38), and a user may wish to investigate genes other than rifins. This can be accomplished by excluding (‘not’ operation) the results of a keyword query using the term ‘rifin’ (Figure 1B, Step 5 and Figure 1C, leftmost panel).
A user may examine the specific gene pages for more gene-specific details, download results with their associated data or log in (if they have not done so already) to ensure that their search strategy is saved for future examination.
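
The combination logic in this example maps directly onto set operations. Below is a minimal sketch in Python, assuming each query result is simply a set of gene identifiers. The gene IDs (`g1`, `g2`, ...), the variable names and the `combine` helper are made up for illustration; they do not reproduce the PlasmoDB interface or the real result counts quoted above.

```python
# Toy query results; the gene IDs are invented placeholders, not real
# PlasmoDB identifiers, and the set sizes do not match the paper's counts.
exported_set_1 = {"g1", "g2", "g3"}
exported_set_2 = {"g3", "g4"}
exported_set_3 = {"g5", "g6"}
no_mammalian_ortholog = {"g1", "g2", "g4", "g5", "g6"}
nonpolymorphic_3d7_vs_dd2 = {"g1", "g4", "g5", "g7"}
keyword_rifin = {"g5", "g8"}


def combine(left, operation, right):
    """Combine two result sets with 'or' (union), 'and' (intersection)
    or 'not' (left minus right), mirroring the query history operations."""
    if operation == "or":
        return left | right
    if operation == "and":
        return left & right
    if operation == "not":
        return left - right
    raise ValueError(f"unknown operation: {operation!r}")


# Steps 1 and 2: union of the three exported-protein predictions.
exported = combine(combine(exported_set_1, "or", exported_set_2),
                   "or", exported_set_3)

# Step 3: keep only genes with no orthologs in mammals.
parasite_specific = combine(exported, "and", no_mammalian_ortholog)

# Step 4: keep only genes that are nonpolymorphic between 3D7 and Dd2.
conserved = combine(parasite_specific, "and", nonpolymorphic_3d7_vs_dd2)

# Step 5: exclude genes returned by the 'rifin' keyword query.
candidates = combine(conserved, "not", keyword_rifin)

if __name__ == "__main__":
    print(sorted(candidates))
```

Because each step takes a set and returns a set, the steps can be reordered or swapped (for example, intersecting with a proteomics result instead of the polymorphism result) without changing the rest of the chain.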
FUTURE DIRECTIONS
It is expected that PlasmoDB will continue its data content and tool expansion as user needs require. We anticipate the incorporation of multiple new data sets including microarray, proteomic and specific parasite isolate data. Additionally, over the next few years we look forward to incorporating sequence data from a dramatically expanded Plasmodium spp. sequencing effort (http://www.genome.gov/26525388). In the coming year, we will also release a new user interface that will include a workflow-based search strategy page, similar to what is shown in Figure 1B, which we anticipate will provide a more biologically intuitive and dynamic experience for scientists accessing PlasmoDB and other EuPathDB sites.
FUNDING
Federal funds from the National Institute of Allergy and Infectious Diseases; National Institutes of Health; Department of Health and Human Services, under Contract No. HHSN266200400037C. Funding to pay the Open Access publication charges for this article was provided by this contract.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
The authors wish to thank members of the Plasmodium research community for their willingness to share genomic-scale data sets, often prior to publication, and for numerous comments and suggestions that have helped to improve the functionality of PlasmoDB. We also wish to thank Dr Akhil Vaidya for his valuable advice and continued support of PlasmoDB, as well as past and present staff associated with the ApiDB-BRC project and our research laboratory colleagues, whose contributions have facilitated the creation and maintenance of this database resource.
REFERENCES
- 1. Phillips RS. Current status of malaria and potential for control. Clin. Microbiol. Rev. 2001;14:208–226. doi: 10.1128/CMR.14.1.208-226.2001.
- 2. Snow RW, Guerra CA, Noor AM, Myint HY, Hay SI. The global distribution of clinical episodes of Plasmodium falciparum malaria. Nature. 2005;434:214–217. doi: 10.1038/nature03342.
- 3. Singh B, Kim Sung L, Matusop A, Radhakrishnan A, Shamsul SS, Cox-Singh J, Thomas A, Conway DJ. A large focus of naturally acquired Plasmodium knowlesi infections in human beings. Lancet. 2004;363:1017–1024. doi: 10.1016/S0140-6736(04)15836-4.
- 4. Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, et al. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003;31:212–215. doi: 10.1093/nar/gkg081.
- 5. Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003;1:E5. doi: 10.1371/journal.pbio.0000005.
- 6. Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK, Haynes JD, De La Vega P, Holder AA, Batalov S, Carucci DJ, et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science. 2003;301:1503–1508. doi: 10.1126/science.1087025.
- 7. Llinas M, Bozdech Z, Wong ED, Adai AT, DeRisi JL. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res. 2006;34:1166–1173. doi: 10.1093/nar/gkj517.
- 8. Baum J, Maier AG, Good RT, Simpson KM, Cowman AF. Invasion by P. falciparum merozoites suggests a hierarchy of molecular interactions. PLoS Pathog. 2005;1:e37. doi: 10.1371/journal.ppat.0010037.
- 9. Duraisingh MT, Voss TS, Marty AJ, Duffy MF, Good RT, Thompson JK, Freitas-Junior LH, Scherf A, Crabb BS, Cowman AF. Heterochromatin silencing and locus repositioning linked to regulation of virulence genes in Plasmodium falciparum. Cell. 2005;121:13–24. doi: 10.1016/j.cell.2005.01.036.
- 10. Stubbs J, Simpson KM, Triglia T, Plouffe D, Tonkin CJ, Duraisingh MT, Maier AG, Winzeler EA, Cowman AF. Molecular mechanism for switching of P. falciparum invasion pathways into human erythrocytes. Science. 2005;309:1384–1387. doi: 10.1126/science.1115257.
- 11. Hall N, Karras M, Raine JD, Carlton JM, Kooij TW, Berriman M, Florens L, Janssen CS, Pain A, Christophides GK, et al. A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 2005;307:82–86. doi: 10.1126/science.1103717.
- 12. Mair GR, Braks JA, Garver LS, Wiegant JC, Hall N, Dirks RW, Khan SM, Dimopoulos G, Janse CJ, Waters AP. Regulation of sexual development of Plasmodium by translational repression. Science. 2006;313:667–669. doi: 10.1126/science.1125129.
- 13. Tarun AS, Peng X, Dumpit RF, Ogata Y, Silva-Rivera H, Camargo N, Daly TM, Bergman LW, Kappe SH. A combined transcriptome and proteome survey of malaria parasite liver stages. Proc. Natl Acad. Sci. USA. 2008;105:305–310. doi: 10.1073/pnas.0710780104.
- 14. Florent I, Charneau S, Grellier P. Plasmodium falciparum genes differentially expressed during merozoite morphogenesis. Mol. Biochem. Parasitol. 2004;135:143–148. doi: 10.1016/j.molbiopara.2003.12.010.
- 15. Watanabe J, Wakaguri H, Sasaki M, Suzuki Y, Sugano S. Comparasite: a database for comparative study of transcriptomes of parasites defined by full-length cDNAs. Nucleic Acids Res. 2007;35:D431–D438. doi: 10.1093/nar/gkl1039.
- 16. Gunasekera AM, Patankar S, Schug J, Eisen G, Kissinger J, Roos D, Wirth DF. Widespread distribution of antisense transcripts in the Plasmodium falciparum genome. Mol. Biochem. Parasitol. 2004;136:35–42. doi: 10.1016/j.molbiopara.2004.02.007.
- 17. Gunasekera AM, Patankar S, Schug J, Eisen G, Wirth DF. Drug-induced alterations in gene expression of the asexual blood forms of Plasmodium falciparum. Mol. Microbiol. 2003;50:1229–1239. doi: 10.1046/j.1365-2958.2003.03787.x.
- 18. Patankar S, Munasinghe A, Shoaibi A, Cummings LM, Wirth DF. Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol. Biol. Cell. 2001;12:3114–3125. doi: 10.1091/mbc.12.10.3114.
- 19. Florens L, Liu X, Wang Y, Yang S, Schwartz O, Peglar M, Carucci DJ, Yates J.R., III, Wu Y. Proteomics approach reveals novel proteins on the surface of malaria-infected erythrocytes. Mol. Biochem. Parasitol. 2004;135:1–11. doi: 10.1016/j.molbiopara.2003.12.007.
- 20. Florens L, Washburn MP, Raine JD, Anthony RM, Grainger M, Haynes JD, Moch JK, Muster N, Sacci JB, Tabb DL, et al. A proteomic view of the Plasmodium falciparum life cycle. Nature. 2002;419:520–526. doi: 10.1038/nature01107.
- 21. Khan SM, Franke-Fayard B, Mair GR, Lasonder E, Janse CJ, Mann M, Waters AP. Proteome analysis of separated male and female gametocytes reveals novel sex-specific Plasmodium biology. Cell. 2005;121:675–687. doi: 10.1016/j.cell.2005.03.027.
- 22. Su X, Ferdig MT, Huang Y, Huynh CQ, Liu A, You J, Wootton JC, Wellems TE. A genetic map and recombination parameters of the human malaria parasite Plasmodium falciparum. Science. 1999;286:1351–1353. doi: 10.1126/science.286.5443.1351.
- 23. Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann AC, et al. Genome variation and evolution of the malaria parasite Plasmodium falciparum. Nat. Genet. 2007;39:120–125. doi: 10.1038/ng1931.
- 24. Mu J, Awadalla P, Duan J, McGee KM, Keebler J, Seydel K, McVean GA, Su XZ. Genome-wide variation and identification of vaccine targets in the Plasmodium falciparum genome. Nat. Genet. 2007;39:126–130. doi: 10.1038/ng1924.
- 25. Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner D.A., Jr, Daily JP, Sarr O, Ndiaye D, Ndir O, et al. A genome-wide map of diversity in Plasmodium falciparum. Nat. Genet. 2007;39:113–119. doi: 10.1038/ng1930.
- 26. Chen F, Mackey AJ, Stoeckert C.J., Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006;34:D363–D368. doi: 10.1093/nar/gkj123.
- 27. Date SV, Stoeckert C.J., Jr. Computational modeling of the Plasmodium falciparum interactome reveals protein function on a genome-wide scale. Genome Res. 2006;16:542–549. doi: 10.1101/gr.4573206.
- 28. LaCount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth JR, Schoenfeld LW, Ota I, Sahasrabudhe S, Kurschner C, et al. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature. 2005;438:103–107. doi: 10.1038/nature04104.
- 29. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 30. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, et al. InterPro, progress and status in 2005. Nucleic Acids Res. 2005;33:D201–D205. doi: 10.1093/nar/gki106.
- 31. Ginsburg H. Progress in in silico functional genomics: the malaria Metabolic Pathways database. Trends Parasitol. 2006;22:238–240. doi: 10.1016/j.pt.2006.04.008.
- 32. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004;340:783–795. doi: 10.1016/j.jmb.2004.05.028.
- 33. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
- 34. Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, Cowman AF, McFadden GI. Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science. 2003;299:705–708. doi: 10.1126/science.1078599.
- 35. Hiller NL, Bhattacharjee S, van Ooij C, Liolios K, Harrison T, Lopez-Estrano C, Haldar K. A host-targeting signal in virulence proteins reveals a secretome in malarial infection. Science. 2004;306:1934–1937. doi: 10.1126/science.1102737.
- 36. Marti M, Baum J, Rug M, Tilley L, Cowman AF. Signal-mediated export of proteins from the malaria parasite to the host erythrocyte. J. Cell Biol. 2005;171:587–592. doi: 10.1083/jcb.200508051.
- 37. Marti M, Good RT, Rug M, Knuepfer E, Cowman AF. Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science. 2004;306:1930–1933. doi: 10.1126/science.1102452.
- 38. Kyes SA, Rowe JA, Kriek N, Newbold CI. Rifins: a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Proc. Natl Acad. Sci. USA. 1999;96:9333–9338. doi: 10.1073/pnas.96.16.9333.