Abstract
Pathema (http://pathema.jcvi.org) is one of the eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) and is designed to serve as a core resource for the bio-defense and infectious disease research community. Pathema strives to support basic research and accelerate scientific progress for understanding, detecting, diagnosing and treating an established set of six target NIAID Category A–C pathogens: the Category A priority pathogens Bacillus anthracis and Clostridium botulinum, and the Category B priority pathogens Burkholderia mallei, Burkholderia pseudomallei, Clostridium perfringens and Entamoeba histolytica. Each target pathogen is represented in one of four distinct clade-specific Pathema web resources and underlying databases developed to target the specific data and analysis needs of each scientific community. All publicly available complete genome projects of phylogenetically related organisms are also represented, providing a comprehensive collection of organisms for comparative analyses. Pathema facilitates the scientific exploration of genomic and related data through its integration with web-based analysis tools, customized to obtain, display and compute results relevant to ongoing pathogen research. Pathema serves the bio-defense and infectious disease research community by disseminating data resulting from pathogen genome sequencing projects and providing access to the results of inter-genomic comparisons for these organisms.
INTRODUCTION
Pathema is a community-driven bioinformatics resource that provides access to genomic data integrated with analysis tools designed to aid researchers in identifying potential targets for novel therapeutics, vaccines and diagnostics for six selected National Institute of Allergy and Infectious Diseases (NIAID) priority pathogens (1). Organisms classified by NIAID as priority pathogens are selected based on their association as agents or potential agents of bioterrorism. The priority pathogens Pathema supports include five prokaryotes (Bacillus anthracis, Burkholderia mallei, Burkholderia pseudomallei, Clostridium botulinum and Clostridium perfringens) and one eukaryote (Entamoeba histolytica). To provide researchers with a comprehensive collection of organisms for comparative analyses, 66 unique strains of priority pathogens are supported, along with 54 phylogenetically related species. Organisms are grouped taxonomically by genus, with associated data stored in four distinct databases, each accessible through its own clade web interface. Each Pathema clade resource, linked from one central Pathema gateway interface, is tailored to address the specific data and analysis needs of its scientific community, as identified from feedback gathered through outreach activities. Pathema disseminates high-quality, up-to-date data, including genome sequence, annotation data types and curation assertions, and specialty datasets, as they relate to ongoing pathogen and infectious disease research. The most current data are displayed throughout the resource, and Pathema deposits all relevant data in public repositories such as the Pathogen Portal (http://www.pathogenportal.org/), GenBank (2) and the GO repository (3). Integrated with these data is a suite of sophisticated bioinformatics software and over 50 analysis tools customized to retrieve, display and compute results relevant to the research of each Pathema target pathogen community.
Bioinformatics tools for cross-genome comparisons and identification of metabolic pathways are also integrated to facilitate the identification of potential targets for vaccine development, therapeutics and diagnostics. In addition, clade-specific training courses, detailed tutorials and standard operating procedures are offered to provide instruction and documentation on the use of this system and its underlying databases.
PATHEMA ORGANISMS
Pathema supports sequence and detailed curation of six NIAID target priority pathogens and related species grouped taxonomically by genus into four clades: Bacillus, Burkholderia, Clostridium and Entamoeba (Table 1). These pathogens fall into two of the three high-priority categories (Categories A, B and C) that NIAID defines based on relative capability for causing morbidity or mortality from disease in case of biowarfare (http://www3.niaid.nih.gov/topics/BiodefenseRelated/Biodefense/research/CatA.htm). The inclusion of closely related species provides researchers with a comprehensive collection of organisms for comparative analyses.
Table 1.
Pathema clade | Target NIAID pathogen | Organisms supported | Completed genomes | Draft genomes | NIAID category | Associated disease |
---|---|---|---|---|---|---|
Bacillus | (clade total) | 40 | 21 | 19 | | |
Bacillus | Bacillus anthracis | 19 | 6 | 13 | A | Anthrax |
Burkholderia | (clade total) | 41 | 24 | 18 | | |
Burkholderia | Burkholderia mallei | 10 | 4 | 6 | B | Glanders |
Burkholderia | Burkholderia pseudomallei | 12 | 4 | 8 | B | Melioidosis |
Clostridium | (clade total) | 36 | 23 | 13 | | |
Clostridium | Clostridium botulinum | 15 | 10 | 5 | A | Botulism |
Clostridium | Clostridium perfringens | 9 | 3 | 6 | B | Enterotoxemia |
Entamoeba | (clade total) | 3 | 3 | 0 | | |
Entamoeba | Entamoeba histolytica | 1 | 1 | 0 | B | Amebiasis |
Total Pathema | | 120 | 71 | 50 | | |
A complete list of supported organisms is included in Supplementary Table S1.
The Bacillus clade supports 40 prokaryotic organisms including the target pathogen B. anthracis (Category A), as well as the pathogens B. cereus and B. thuringiensis. Long regarded as one of the preferred biological warfare agents, B. anthracis is the causative agent of anthrax. Its potential for use as a bioweapon was demonstrated by the autumn 2001 anthrax letter attacks in the US. Its lethality, combined with ease of laboratory production and ability to disseminate anthrax spores in aerosol form, accounts for its interest as a biowarfare agent (4).
Included among the 41 prokaryotes supported by the Burkholderia clade are the target pathogens B. mallei and B. pseudomallei (Category B), as well as the pathogen B. cepacia. B. mallei is responsible for glanders, a disease that occurs mostly in horses and related animals. Glanders has been associated with war for centuries: B. mallei was used as a bioweapon in World War I and World War II, and anecdotal evidence supports its use in Afghanistan. Its ease of transmission and severity of disease make B. mallei of interest as an agent for bioterrorism (5). Burkholderia pseudomallei, a human and animal pathogen, is the causative agent of melioidosis, an infectious disease that is endemic to Southeast Asia and northern Australia and may occur in other tropical and subtropical regions. Its severe course of infection, aerosol infectivity and worldwide availability resulted in its inclusion as a potential agent of biological warfare or bioterrorism (6).
The Clostridium clade supports 36 prokaryotic organisms encompassing the four main species responsible for disease in humans. These include the target pathogens C. botulinum (Category A) and C. perfringens (Category B), as well as the pathogens C. difficile and C. tetani. Different strains of C. botulinum produce different types of toxins apart from the well-known botulinum neurotoxin, the causative agent of the disease botulism in humans and animals (4). Botulinum toxin, considered the most lethal naturally occurring substance, has been linked to use as a bioweapon during World War II and the Persian Gulf War (7). C. perfringens is known to be the most widely distributed pathogen in nature. It has been shown to be a causative agent of human diseases such as gas gangrene, food poisoning and enteritis necroticans, as well as various animal diseases (5).
Included in the Entamoeba clade are three parasitic protists: E. histolytica, E. dispar and E. invadens. The target pathogen E. histolytica (Category B) is the causative agent of the most common diarrheal disease, amebiasis. Amebiasis accounts for between 40 000 and 100 000 deaths annually and is predominantly seen in developing countries, where a high prevalence of infection is due to fecal contamination of the food and water supply, a factor that cannot be immediately remedied because of limited financial resources in these countries (8). Interest in E. histolytica as a potential biothreat organism stems from its low infectious dose and its potential for dissemination through compromised food and water supplies.
To assist researchers in identifying correlations between patient phenotype and geography, symptoms/outcome and pathogen sequence variation, and to gain an understanding of the impact of pathogen genomic variations on drug resistance or vaccine efficacy, Pathema integrates epidemiological and clinical data. Where available, this data is obtained from the research community for each organism and includes: the original source location of each organism strain, detailed clinical information (e.g. date isolated, isolation source, historical background), genotype numbering based on Multi Locus Sequence Typing (9), and source contact information for obtaining the DNA.
INTERFACE DESIGN AND DATABASE DESCRIPTION
The main Pathema gateway interface serves as the central entry point to access Pathema’s target pathogens and related species through one of four distinct clade-specific web resources: Bacillus, Clostridium, Burkholderia and Entamoeba. This gateway provides general information, news and highlights, planned data updates, and tutorials relevant to the entire Pathema resource, with links to each of the four clade sites supporting clade-specific data and analysis tools. Based on feedback gathered through community outreach, Pathema’s four clade resources aim to target the individual research needs of each community by integrating the specific datasets and analysis tools requested by organism experts. Through the customized development of clade resources, Pathema serves as a core resource supporting scientific investigation and hypothesis generation of its supported target organisms.
The Pathema web interface uses the Coati (Collaborative Open Applications Tool Initiative) architecture framework. Coati is an open source project housed at SourceForge (http://sourceforge.net/projects/coati-api/). Each clade-specific web interface interacts with one of four separate Chado (10) relational database schemas that house Pathema clade sequence and annotation data, and comparative computes. Chado underlies many Generic Model Organism Database (GMOD) (11) installations and is a general schema used to share genomic data, annotations and analyses.
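To make the role of the Chado schema more concrete, the following is a minimal sketch of a query against a heavily simplified, Chado-like subset. The table and column names (organism, cvterm, feature, organism_id, type_id, uniquename) follow the public Chado schema, but the real tables carry many more columns and constraints and are normally hosted in PostgreSQL. The use of an in-memory SQLite database, the toy rows and the helper names `build_toy_chado` and `gene_counts` are assumptions made purely for illustration and do not describe Pathema's own queries.

```python
import sqlite3


def build_toy_chado():
    """Create an in-memory database holding a tiny, simplified Chado-like subset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            organism_id INTEGER REFERENCES organism(organism_id),
            uniquename TEXT,
            type_id INTEGER REFERENCES cvterm(cvterm_id));
    """)
    # Toy rows only; the identifiers below are invented for this example.
    conn.executemany("INSERT INTO organism VALUES (?, ?, ?)",
                     [(1, "Bacillus", "anthracis"),
                      (2, "Entamoeba", "histolytica")])
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(10, "gene"), (11, "tRNA")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                     [(100, 1, "toy_gene_1", 10),
                      (101, 1, "toy_gene_2", 10),
                      (102, 1, "toy_trna_1", 11),
                      (103, 2, "toy_gene_3", 10)])
    return conn


def gene_counts(conn):
    """Count features typed as 'gene' for each organism."""
    rows = conn.execute("""
        SELECT o.genus, o.species, COUNT(*)
        FROM feature f
        JOIN organism o ON o.organism_id = f.organism_id
        JOIN cvterm t ON t.cvterm_id = f.type_id
        WHERE t.name = 'gene'
        GROUP BY o.organism_id
        ORDER BY o.genus, o.species
    """)
    return rows.fetchall()
```

The point of the sketch is the join pattern: feature types are not stored as free text on the feature row but are resolved through a controlled vocabulary term, which is what allows the same query to run unchanged against each clade database.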
CURATION DATA TYPES
Pathema generates and continuously updates gene model and functional annotation data for 120 supported genome projects, disseminating data for over 600 000 predicted genes with common data types (Table 2). Common data types are assigned using an automated pipeline that processes the genomic sequences of all Pathema organisms. This pipeline consists of several algorithms for the prediction of gene models and genome features (e.g. RNAs, terminators, repeats), and employs a hierarchical evidence ranking scheme to assign functional annotation [e.g. protein name, gene symbol, Enzyme Commission (EC) number (12), Gene Ontology (GO) terms]. Because common data types are assigned with one standardized pipeline across all organisms, comparative analyses become easier and more meaningful to the researcher. Additionally, based on the use of common data types, a rich set of curation assertions with supporting evidence is generated. These curation assertions follow the Gene Ontology Consortium model and attempt to describe the complete profile (i.e. molecular function, biological process, cellular location) of proteins in biologically meaningful ways that cannot be captured by individual data types alone. Standardized evidence types represent a diverse range of specific forms of evidence (e.g. direct assay, mutant phenotype) used to support each curation assertion. The use of standardized evidence types provides a mechanism to easily assess the level of confidence supporting each assertion, ultimately helping to validate hypotheses derived from the profile analysis of individual proteins, orthologs and pathway data.
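The idea of a hierarchical evidence ranking scheme can be sketched as follows. The text does not list the evidence sources or their order, so the rank table, the source labels and the `Evidence` and `assign_annotation` names below are all hypothetical; the sketch only shows the general pattern of letting the most trusted available evidence supply the annotation.

```python
from dataclasses import dataclass

# Hypothetical ranking: a lower number means a more trusted evidence source.
EVIDENCE_RANK = {
    "characterized_protein_match": 1,
    "protein_family_hmm": 2,
    "blast_hit": 3,
    "motif_only": 4,
}


@dataclass
class Evidence:
    source: str            # key into EVIDENCE_RANK
    product_name: str
    gene_symbol: str = ""


def assign_annotation(evidence_list):
    """Return the annotation carried by the highest-ranked piece of evidence."""
    usable = [e for e in evidence_list if e.source in EVIDENCE_RANK]
    if not usable:
        # No usable evidence: fall back to a generic product name.
        return Evidence("none", "hypothetical protein")
    return min(usable, key=lambda e: EVIDENCE_RANK[e.source])
```

Under these assumptions, a gene with both a BLAST hit and a protein family HMM match would take its product name and gene symbol from the HMM match, and a gene with no usable evidence would be named "hypothetical protein".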
Table 2.
Column groups: Evidence = evidence types supporting manual curation; Specialty genes = curated specialty genes; Annotation = annotation data types; Assertion = curation assertions.
Pathema clade | Total organisms | Predicted genes | Evidence: sequence similarity | Evidence: mutant phenotype | Evidence: expression pattern | Evidence: direct assay | Evidence: genome context | Specialty genes: epitopes | Specialty genes: virulence factors | Specialty genes: multidrug exporters | Specialty genes: protein interactions | Specialty genes: experimentally verified | Annotation: protein name (%) | Annotation: gene symbol (%) | Annotation: EC number (%) | Assertion: molecular function (%) | Assertion: biological process (%) | Assertion: cellular component (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bacillus | 40 | 217 352 | 10 645 | 61 | 0 | 57 | 12 | 758 | 74 | 6473 | 163 | 343 | 69 | 20 | 14 | 91 | 94 | 36 |
Burkholderia | 41 | 245 739 | 48 142 | 104 | 11 | 72 | 110 | 418 | 122 | 5448 | 52 | 714 | 70 | 19 | 15 | 82 | 80 | 40 |
Clostridium | 36 | 131 359 | 28 803 | 3 | 1 | 32 | 55 | 345 | 17 | 2897 | 1 | 227 | 73 | 22 | 16 | 74 | 73 | 34 |
Entamoeba | 3 | 28 560 | 2537 | 1 | 0 | 14 | 0 | 0 | 176 | 141 | 0 | 31 | 12 | 1 | 14 | 16 | 10 | 5 |
Total | 120 | 623 010 | 90 127 | 169 | 12 | 175 | 177 | 1521 | 389 | 14 959 | 216 | 1315 | 68 | 19 | 15 | 80 | 80 | 36 |
Only a subset of annotation data types and curation assertions used by Pathema to describe predicted genes based on supporting evidence are included.
Common annotation data types and curation assertions with supporting evidence are computationally generated for all Pathema organisms. With the goal of providing the scientific community with the most accurate annotation, automated predictions are manually curated for each of Pathema’s six target pathogens. Established naming conventions and evidence interpretation guidelines are adhered to during this manual process. Additionally, the genomic annotation of these organisms reflects in-depth manual literature curation of biodefense and infectious disease related datasets. These datasets include clade-specific virulence factors, epitopes (13), protein–protein interactions (14), multidrug exporters (15) and experimentally characterized proteins. Inclusion of these datasets enriches existing genome annotation, thereby facilitating the identification of potential new targets of pathogen research interest.
Although Pathema’s six target pathogens are the primary focus of manual effort, Pathema strives to provide the same level of high-quality annotation across all organisms supported by the Pathema resource. To achieve this, a homology mapping strategy is employed. This strategy uses the MUMmer (16) whole genome alignment program to identify close protein homologs, with subsequent propagation of high-quality manually curated data from each target organism to all closely related Pathema clade members.
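The propagation step of this homology mapping strategy can be illustrated with a short sketch. It assumes that the alignment step has already produced pairs of close homologs; how those pairs are derived from the whole-genome alignments is not reproduced here. The function name, the dictionary layout and the locus identifiers used in the usage note are hypothetical.

```python
def propagate_annotation(curated, homolog_pairs, existing):
    """Copy curated annotation from target-organism genes to close homologs.

    curated:       {target_locus: product_name} from manual curation
    homolog_pairs: iterable of (target_locus, related_locus) pairs
    existing:      {related_locus: product_name} from the automated pipeline
    Returns a new dict; the inputs are left unmodified.
    """
    updated = dict(existing)
    for target_locus, related_locus in homolog_pairs:
        if target_locus in curated:
            # The manually curated name replaces the automated one.
            updated[related_locus] = curated[target_locus]
    return updated
```

For example, with `curated = {"TARGET_0001": "toxin A"}` and a homolog pair `("TARGET_0001", "RELATED_0042")`, the related locus receives the name "toxin A", while related loci without a curated homolog keep their automated annotation.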
All annotation standard operating procedures, Pathema’s Gene Naming and Annotation Guidelines, and all other related annotation documentation are available throughout the Pathema resource (http://pathema.jcvi.org/protocols).
GENOME AND COMPARATIVE ANALYSIS TOOLS
Pathema supports over 50 web-based data mining, single gene, whole-genome and multi-genome comparative tools to facilitate analyses of genomic sequence and annotation data across Pathema organisms. Tools are designed to facilitate scientific exploration in the areas of functional curation, pathogenicity, therapeutics, comparative analysis and functional genomics. While every tool has several applications, taken together they provide numerous opportunities for discovery and hypothesis generation (Supplementary Table S2).
Data mining
Pathema incorporates over 25 different search capabilities that enable data mining and retrieval of all data types stored in the Pathema database. Search tools query genes, genomes, sequences or text, matching user-defined strings across gene loci, gene symbols and protein product names. Virulence factors, epitopes, experimentally characterized proteins and protein interaction data can be retrieved using Pathema search tools across user-selected organisms. Other queries include EC#, GenBank, SwissProt (17) and GO id searches. Common sequence search methods such as BLAST (18), Hidden Markov Model (19) and protein motif searches (20) are also available.
Literature mining
A semantic visualization tool, based on the National Library of Medicine’s SemMed viewer (21), is integrated within Pathema. This tool provides access to biomedical literature archived in PubMed, through manually curated semantic condensate data records of relevant subjects for each Pathema clade. Records can be displayed in both graphical and word cloud format, and include links to external data sites containing relevant information, such as genetic databases, Unified Medical Language System (UMLS) entries and the original Medline reference.
Single gene analysis
Individual gene pages highlight annotation data and associated evidence, as well as provide access to single gene analysis tools for every gene available on Pathema. Annotation data displayed and downloadable includes protein product name, gene symbol, EC#, GO ids, functional role category assignment, and DNA and protein sequences. Literature references are provided for all proteins that are identified virulence factors, are associated with an epitope(s), interact with another protein(s), or have experimental characterizations. Calculating the transmembrane HMM profile (22), secondary structure and third position GC-Skew are just a few types of analyses that can be performed. Links to other relevant resources such as UniProt, GenBank, Prosite, Pfam (23), etc. are also available.
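As an illustration of one of the single gene analyses named above, the sketch below computes GC skew over the third codon position of a coding sequence. The text does not define the calculation used by the Pathema tool, so the conventional definition, (G - C) / (G + C), is assumed here, and the function name is hypothetical.

```python
def third_position_gc_skew(cds):
    """GC skew, (G - C) / (G + C), over the third base of each codon.

    cds is a coding sequence read in frame from its first base; any
    trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3     # drop an incomplete final codon
    third = seq[2:usable:3]              # every third base, starting at index 2
    g, c = third.count("G"), third.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0
```

For the toy sequence `ATGGCGTTCAAG` the third positions are G, G, C and G, so the skew is (3 - 1) / (3 + 1) = 0.5.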
Whole-genome analysis
Over 20 different displays and analyses of whole-genome data are included in Pathema. These analysis tools enable the display and analysis of individual genomic data using a variety of different methods. Whole-genome data can be displayed graphically as a linear representation of genes on regions of a chromosome or as a complete circle for an entire chromosome. Data can be investigated through biochemical pathways (24–26), codon usage tables, percent GC plots, computer generated 2D and restriction digest gels, and summary information such as average gene size or numbers of coding regions can be retrieved as viewable and downloadable tables and lists.
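Two of the whole-genome summaries mentioned above, codon usage tables and percent GC plots, reduce to simple counting. The following sketch shows one plausible way to compute them; the function names, the window and step parameters, and the choice to ignore partial codons and partial windows are assumptions for illustration rather than a description of the Pathema tools.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in an in-frame coding sequence, ignoring a partial final codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def percent_gc(seq, window, step):
    """Return (start, percent GC) for each full window along the sequence."""
    seq = seq.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        points.append((start, 100.0 * gc / window))
    return points
```

The list returned by `percent_gc` can be passed directly to a plotting library to draw a percent GC plot, and the `Counter` returned by `codon_usage` can be normalized per amino acid to give relative codon frequencies.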
Comparative analysis
Integrated into Pathema are over 15 different comparative analysis tools for multi-genome comparisons among Pathema clade organisms (Figure 1). The basis for Pathema’s current comparative tools is either pre-generated Jaccard orthologous protein clusters or All versus All blastp searches. The most popular tools of the publicly available Sybil comparative analysis suite (27) are incorporated. Sybil uses Pathema’s pre-generated protein clusters as the underlying data for its synteny gradient and comparative genomic displays. Sybil protein cluster ortholog, paralog and singleton data are also available.
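The notion of Jaccard-based protein clusters can be sketched in a few lines. The text does not give the clustering procedure or its thresholds, so the version below is a generic illustration: two proteins are linked when they hit each other and the Jaccard coefficient of their hit sets reaches a cutoff, and clusters are the connected groups of linked proteins. The cutoff of 0.6, the input layout and the function names are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_clusters(hits, cutoff=0.6):
    """Group proteins whose similarity-hit sets overlap strongly.

    hits: {protein: set of proteins it matches in an all-versus-all search}
    Returns a sorted list of clusters, each a sorted list of protein ids.
    """
    # Every protein is treated as matching itself.
    neigh = {p: set(h) | {p} for p, h in hits.items()}
    parent = {p: p for p in neigh}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(neigh, 2):
        reciprocal = b in neigh[a] and a in neigh[b]
        if reciprocal and jaccard(neigh[a], neigh[b]) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for p in neigh:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())
```

In this sketch a protein that links to no other protein ends up in a cluster of size one, which corresponds loosely to the singleton category mentioned above.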
COMMUNITY OUTREACH
Pathema launched a community outreach strategic plan to assess the scientific and informatic needs of the pathogen research community. This community consists of over 950 identified researchers who study the six Pathema target pathogens, with over 25% participating in Pathema community outreach efforts. These efforts were designed to gather feedback during the initial phases of resource development and testing, with feedback continuously gathered during various training and other outreach activities. Pathema provides detailed training in the form of clade-specific annotation jamborees and hands-on Pathema resource workshops conducted both on site and in conjunction with major organism specific conferences. In-depth resource tutorials and manuals that describe Pathema tools and data are also available. Currently 20 scientific publications reference the use of Pathema and its underlying data sets (28–46).
AVAILABILITY
Pathema is maintained at the J. Craig Venter Institute and is accessible through a web browser at http://pathema.jcvi.org. There are no license restrictions for user access to any of the data supported by Pathema, and all source code is managed under an open-source collaborative development paradigm. Web scripts and data maintenance programs are located at SourceForge under the Pathema project (http://sourceforge.net/projects/pathema). Pathema sequence and annotation data are available as GFF3-formatted files from the Pathema FTP download site (ftp://ftp.pathogenportal.org/gff3/Pathema/), which can be reached from the ‘downloads’ tab on the main resource header or directly from each organism homepage. Additionally, results obtained from complex searches or genomic comparisons are available in tab-delimited format throughout Pathema on each respective results page.
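For readers who want to process the downloadable GFF3 files programmatically, the following minimal sketch parses a single GFF3 feature line. It relies only on the general GFF3 layout (nine tab-separated columns, with the ninth holding semicolon-separated key=value attributes in URL-style escaping). The example line is invented and does not come from a Pathema file, and the function name is hypothetical. The sketch does not handle an embedded FASTA section.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blanks."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = unquote(value)   # decode %20 and similar escapes
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# Invented example line, for illustration only.
example = "contig_1\texample\tgene\t100\t400\t.\t+\t.\tID=gene0001;Name=hypothetical%20protein"
record = parse_gff3_line(example)
```

Applied line by line to a downloaded file, this yields one dictionary per feature, which can then be filtered by `type` (for instance, keeping only gene features) or grouped by `seqid`.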
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
National Institute of Allergy and Infectious Diseases contract HHSN266200400038C. Funding for open access charge: NIAID.
Conflict of interest statement. None declared.
ACKNOWLEDGMENTS
The authors would like to thank the J. Craig Venter Institute Information Technology and Bioinformatics Departments for their ongoing technical, engineering and scientific support, including Michael Heaney, Darnell Edwards, Tom Emmel, Dan H. Haft, Roland Richter and Jeremy Selengut, as well as the Institute for Genomic Sciences for its support, including Sam Angiuoli, Sean Daugherty, Michelle Gwinn Giglio, Heather Huot, Anup Mahurkar and Jennifer Wortman. The authors would also like to thank Tom Rindflesch and Dongwook Shin from the Lister Hill National Center for Biomedical Communications for providing the version of SemMed that was used in Pathema development activities.
REFERENCES
- 1. Greene JM, Collins F, Lefkowitz EJ, Roos D, Scheuermann RH, Sobral B, Stevens R, White O, Di Francesco V. National Institute of Allergy and Infectious Diseases bioinformatics resource centers: new assets for pathogen informatics. Infect. Immun. 2007;75:3212–3219. doi: 10.1128/IAI.00105-07.
- 2. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;36:D25–D30. doi: 10.1093/nar/gkm929.
- 3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 4. Darling RG, Catlett CL, Huebner KD, Jarrett DG. Threats in bioterrorism. I: CDC category A agents. Emerg. Med. Clin. North Am. 2002;20:273–309. doi: 10.1016/s0733-8627(02)00005-6.
- 5. Moran GJ. Threats in bioterrorism. II: CDC category B and C agents. Emerg. Med. Clin. North Am. 2002;20:311–330. doi: 10.1016/s0733-8627(01)00003-7.
- 6. Gilad J, Harary I, Dushnitsky T, Schwartz D, Amsalem Y. Burkholderia mallei and Burkholderia pseudomallei as bioterrorism agents: national aspects of emergency preparedness. Isr. Med. Assoc. J. 2007;9:499–503.
- 7. Roffey R, Tegnell A, Elgh F. Biological warfare in a historical perspective. Clin. Microbiol. Infect. 2002;8:450–454. doi: 10.1046/j.1469-0691.2002.00501.x.
- 8. Upcroft P, Upcroft JA. Drug targets and mechanisms of resistance in the anaerobic protozoa. Clin. Microbiol. Rev. 2001;14:150–164. doi: 10.1128/CMR.14.1.150-164.2001.
- 9. Urwin R, Maiden MCJ. Multi-locus sequence typing: a tool for global epidemiology. Trends Microbiol. 2003;11:479–487. doi: 10.1016/j.tim.2003.08.006.
- 10. Mungall CJ, Emmert DB. A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics. 2007;23:i337–i346. doi: 10.1093/bioinformatics/btm189.
- 11. O'Connor BD, Day A, Cain S, Arnaiz O, Sperling L, Stein LD. GMODWeb: a web framework for the Generic Model Organism Database. Genome Biol. 2008;9:R102. doi: 10.1186/gb-2008-9-6-r102.
- 12. Webb EC. Enzyme Nomenclature. San Diego, California: Academic Press; 1992.
- 13. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al. The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005;3:e91. doi: 10.1371/journal.pbio.0030091.
- 14. Goll J, Rajagopala SV, Shiau SC, Wu H, Lamb BT, Uetz P. MPIDB: the microbial protein interaction database. Bioinformatics. 2008;24:1743–1744. doi: 10.1093/bioinformatics/btn285.
- 15. Busch W, Saier MH. The Transporter Classification (TC) system, 2002. Crit. Rev. Biochem. Mol. Biol. 2002;37:287–337. doi: 10.1080/10409230290771528.
- 16. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.
- 17. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33:D154–D159. doi: 10.1093/nar/gki070.
- 18. Altschul S, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 19. Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755.
- 20. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ. The PROSITE database. Nucleic Acids Res. 2006;34:D227–D230. doi: 10.1093/nar/gkj063.
- 21. Kilicoglu H, Fiszman M, Rodriguez A, Shin D, Ripple AM, Rindflesch TC. Semantic MEDLINE: a web application to manage the results of PubMed searches. Proceedings of the Third International Symposium for Semantic Mining in Biomedicine (SMBM), Turku, Finland, Sep. 1–3, 2008;69–76.
- 22. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
- 23. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer ELL, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36:D281–D288. doi: 10.1093/nar/gkm960.
- 24. Haft DH, Selengut JD, Brinkac LM, Zafar N, White O. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics. 2005;21:293–306. doi: 10.1093/bioinformatics/bti015.
- 25. Karp PD, Paley S, Romero P. The Pathway Tools software. Bioinformatics. 2002;18:S225–S232. doi: 10.1093/bioinformatics/18.suppl_1.s225.
- 26. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–D484. doi: 10.1093/nar/gkm882.
- 27. Crabtree J, Angiuoli SV, Wortman JR, White OR. Sybil: methods and software for multiple genome comparison and visualization. Methods Mol. Biol. 2007;408:93–108. doi: 10.1007/978-1-59745-547-3_6.
- 28. Abhyankar MM, Hochreiter AE, Connell SK, Gilchrist CA, Mann BJ, Petri WA, Jr. Development of the Gateway system for cloning and expressing genes in Entamoeba histolytica. Parasitol. Int. 2009;58:95–97. doi: 10.1016/j.parint.2008.08.004.
- 29. Cer RZ, Mudunuri U, Stephens R, Lebeda FJ. IC50-to-Ki: a web-based tool for converting IC50 to Ki values for inhibitors of enzyme activity and ligand binding. Nucleic Acids Res. 2009;37:W441–W445. doi: 10.1093/nar/gkp253.
- 30. Janvilisri T, Scaria J, Thompson AD, Nicholson A, Limbago BM, Arroyo LG, Songer JG, Grohn YT, Chang YF. Microarray identification of Clostridium difficile core components and divergent regions associated with host origin. J. Bacteriol. 2009;191:3881–3891. doi: 10.1128/JB.00222-09.
- 31. Cruz-Castaneda A, Hernandez-Sanchez J, Olivares-Trejo JJ. Cloning and identification of a gene coding for a 26-kDa hemoglobin-binding protein from Entamoeba histolytica. Biochimie. 2009;91:383–389. doi: 10.1016/j.biochi.2008.10.016.
- 32. Melendez-Hernandez MG, Barrios ML, Orozco E, Luna-Arias JP. The vacuolar ATPase from Entamoeba histolytica: molecular cloning of the gene encoding for the B subunit and subcellular localization of the protein. BMC Microbiol. 2008;8:235. doi: 10.1186/1471-2180-8-235.
- 33. Zhang H, Ehrenkaufer GM, Pompey JM, Hackney JA, Singh U. Small RNAs with 5′-polyphosphate termini associate with a Piwi-related protein and regulate gene expression in the single-celled eukaryote Entamoeba histolytica. PLoS Pathog. 2008;4:e1000219. doi: 10.1371/journal.ppat.1000219.
- 34. Marchat LA, Orozco E, Guillen N, Weber C, Lopez-Camarillo C. Putative DEAD and DExH-box RNA helicases families in Entamoeba histolytica. Gene. 2008;424:1–10. doi: 10.1016/j.gene.2008.07.042.
- 35. Abhyankar MM, Hochreiter AE, Hershey J, Evans C, Zhang Y, Crasta O, Sobral BW, Mann BJ, Petri WA, Jr, Gilchrist CA. Characterization of an Entamoeba histolytica high-mobility-group box protein induced during intestinal infection. Eukaryot. Cell. 2008;7:1565–1572. doi: 10.1128/EC.00123-08.
- 36. Gilchrist CA, Baba DJ, Zhang Y, Crasta O, Evans C, Caler E, Sobral BW, Bousquet CB, Leo M, Hochreiter A, et al. Targets of the Entamoeba histolytica transcription factor URE3-BP. PLoS Negl. Trop. Dis. 2008;2:e282. doi: 10.1371/journal.pntd.0000282.
- 37. Duerkop BA, Herman JP, Ulrich RL, Churchill ME, Greenberg EP. The Burkholderia mallei BmaR3-BmaI3 quorum-sensing system produces and responds to N-3-hydroxy-octanoyl homoserine lactone. J. Bacteriol. 2008;190:5137–5141. doi: 10.1128/JB.00246-08.
- 38. Majumder S, Lohia A. Entamoeba histolytica encodes unique formins, a subset of which regulates DNA content and cell division. Infect. Immun. 2008;76:2368–2378. doi: 10.1128/IAI.01449-07.
- 39. Lopez-Camarillo C, de la Luz Garcia-Hernandez M, Marchat LA, Luna-Arias JP, Hernandez de la Cruz O, Mendoza L, Orozco E. Entamoeba histolytica EhDEAD1 is a conserved DEAD-box RNA helicase with ATPase and ATP-dependent RNA unwinding activities. Gene. 2008;414:19–31. doi: 10.1016/j.gene.2008.01.024.
- 40. Li J, McClane BA. A novel small acid soluble protein variant is important for spore resistance of most Clostridium perfringens food poisoning isolates. PLoS Pathog. 2008;4:e1000056. doi: 10.1371/journal.ppat.1000056.
- 41. Lopez-Casamichana M, Orozco E, Marchat LA, Lopez-Camarillo C. Transcriptional profile of the homologous recombination machinery and characterization of the EhRAD51 recombinase in response to DNA damage in Entamoeba histolytica. BMC Mol. Biol. 2008;9. doi: 10.1186/1471-2199-9-35.
- 42. Jhingran A, Padmanabhan PK, Singh S, Anamika K, Bakre AA, Bhattacharya S, Bhattacharya A, Srinivasan N, Madhubala R. Characterization of the Entamoeba histolytica Ornithine Decarboxylase-Like Enzyme. PLoS Negl. Trop. Dis. 2008;2:e115. doi: 10.1371/journal.pntd.0000115.
- 43. Whitlock GC, Estes DM, Torres AG. Glanders: off to the races with Burkholderia mallei. FEMS Microbiol. Lett. 2007;277:115–122. doi: 10.1111/j.1574-6968.2007.00949.x.
- 44. Sun J, Tuncay K, Haidar AA, Ensman L, Stanley F, Trelinski M, Ortoleva P. Transcriptional regulatory network discovery via multiple method integration: application to E. coli K12. Algorithms Mol. Biol. 2007;2:2. doi: 10.1186/1748-7188-2-2.
- 45. Tiyawisutsri R, Holden MTG, Tumapa S, Rengpipat S, Clarke SR, Foster SJ, Nierman WC, Day NPJ, Peacock SJ. Burkholderia Hep_Hap autotransporter (BuHA) proteins elicit a strong antibody response during experimental glanders but not human melioidosis. BMC Microbiol. 2007;7:19. doi: 10.1186/1471-2180-7-19.
- 46. Vidal JE, Chen J, Li J, McClane BA. Use of an EZ-Tn5-based random mutagenesis system to identify a novel toxin regulatory locus in Clostridium perfringens strain 13. PLoS ONE. 2009;4:e6232. doi: 10.1371/journal.pone.0006232.