Abstract
Background
Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial species. Existing tools to investigate genomic contexts, however, lack features for dynamically comparing and exploring genomic regions from multiple species. As DNA sequencing technologies improve and the number of whole sequenced microbial genomes increases, a user-friendly genome context comparison platform designed for use by a broad range of users promises to satisfy a growing need in the biological community.
Results
Here we present JContextExplorer: a tool that organizes genomic contexts into branching diagrams. We implement several alternative context-comparison and tree rendering algorithms, and allow for easy transitioning between different clustering algorithms. To facilitate genomic context analysis, our tool implements GUI features, such as text search filtering, point-and-click interrogation of individual contexts, and genomic visualization via a multi-genome browser. We demonstrate a use case of our tool by attempting to resolve annotation ambiguities between two highly homologous yet functionally distinct genes in a set of 22 alpha and gamma proteobacteria.
Conclusions
JContextExplorer should enable a broad range of users to analyze and explore genomic contexts. The program has been tested on Windows, Mac, and Linux operating systems, and is implemented both as an executable JAR file and as a Java WebStart application. Program executables, source code, and documentation are available at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
Keywords: Genomic context, Genomic neighborhood, Comparative genomics, Java, GUI
Background
As genomic sequencing becomes increasingly accurate, inexpensive, and widespread, the need for tools to meaningfully interpret whole-organism genomic sequence data has increased. While a large collection of tools is devoted to sequence homology and phylogenetic analyses [1,2], far less attention has been paid to tools designed to meaningfully compare gene neighborhoods, or genomic contexts, across species. Differences among genomic contexts across species may indicate changes in the organization of functional transcription units [3,4], which ultimately result in differences among gene regulatory networks [5]. Genomic context may also be helpful in elucidating details of horizontal gene transfer and duplication events [6-8], and has been used to improve upon sequence-based gene annotation algorithms [9-11] and aid in the construction of protein-protein association networks [12]. In each of these investigations, a new method was created to meaningfully define and compare genomic contexts. The existence of a fast, accurate, user-friendly context comparison tool could have aided these investigations, and could encourage future researchers to incorporate genomic context analyses into their investigations.
In plant and animal species, a number of tools interrogating synteny (the degree to which genes remain on corresponding chromosomes) and collinearity (the degree to which genes remain on corresponding chromosomes and in order) [13] have been developed, such as MCScanX [14] and i-ADHoRe [15]. The Ensembl project [16] also utilizes syntenic data. These tools have many useful features, but lack powerful visualization methods. Additionally, they do not focus on microbial species. In general, genomic context comparison methods applied to microbial species [3-11] have been highly customized, non-GUI based, and not readily extendable to other investigations. However, a number of rudimentary GUI platforms for exploration of annotated microbial genomes have been developed, such as the Integrated Microbial Genomes (IMG) system [17], which allows clickable navigation of one or more genomes [18]. While the tool offers several alternative homology-based clustering methods, it does not have much flexibility in other aspects: for example, genes may only be organized into groups called “chromosomal cassettes” according to a hard-coded 300-bp intergenic distance threshold, and there is no way to export graphical representations of genome contexts.
Several tools have focused on visualization of syntenic and collinear regions, such as the plant genome duplication database [12], PLAZA [19], and Genomicus [20]. These tools are most appropriate when investigating plant and animal species, however, and could benefit from additional user flexibility and control in their visualization and interrogation of genomic segments. A number of navigable genome interfaces have been developed, such as the UCSC genome browser [21], the Gaggle genome browser [22], and JBrowse [23]. Many genome browsers have been developed with a focus on interrogating one or a few model organisms of interest, such as EcoCyc (interrogating Escherichia coli [24]) and the Yeast Gene Order Browser (interrogating various species of yeast [25]). While these tools are sophisticated in their visualization schemes, they are limited in the species available for cross-species comparisons. MicrobesOnline [26] has developed a “domain browser” tool, which allows one to analyze the domain content of homologous proteins across microbial species. However, this tool compares the domain content of one gene at a time (rather than the organization of groups of genes) and so is not appropriate for studying changes in genomic context.
A tool with broad applicability, powerful multi-genome visualization tools, and a high degree of user control could complement the existing set of synteny and genomic context comparison tools well. To bridge the gap in genomic context comparison and visualization software, we have developed a new tool: JContextExplorer. Our tool extends the Java Multidendrograms package [27], which allows for flexible computation, re-analysis, and export of multidendrograms. We apply the multidendrogram approach to a set of user-supplied annotated genomes to create “context trees”: genomic contexts (which form the leaves of the tree) are assembled into a multidendrogram using variable group agglomerative hierarchical clustering. Previous genomic context investigations often determined the genomic contexts of interest in a set of species, and compared the observable differences in genomic contexts to a phylogenetic tree of the organisms [28-30]. However, genomic contexts do not always differ in ways that match species phylogeny, especially when a number of horizontal gene transfer events have taken place [30]. Our context tree approach offers an alternative to whole-species or even single gene phylogenetic trees that emphasizes the arrangement, size, and spacing of individual genetic elements within a contextual region of DNA instead of nucleotide-specific differences in the DNA.
The genomic contexts used to assemble context trees may be interrogated in an intuitive context viewer window, and information associated with individual genes may be retrieved by button clicks. Our software facilitates easy modification of parameters, and enables interrogation of several alternative genomic contexts of interest simultaneously. A balance of automation and manual control is essential for any software tool; we have attempted to automate only essential processes (such as tree computation and tree rendering), and leave a great deal of control to the user. Our motivation was to develop a novel, general-purpose genomic context comparison platform to both (1) generate context trees, and (2) facilitate genomic exploration through our multi-genome browser interface. We demonstrate a use case for our tool by resolving annotation ambiguities between ggt and hpxW genes among 22 species of alpha and gamma proteobacteria. Though in the use case provided here we focus on microbial species, we emphasize that analyses are not limited to microbial species.
Implementation
JContextExplorer is a platform-independent pure Java application, requiring Java 1.6 or higher. The software extends the MultiDendrograms software package [27], and also uses BioJava [31] and the Java EPS Graphics2D API (version 0.1) [32]. The software has been tested to function equally on Mac OS X, Windows 7, and Ubuntu Linux environments. Input data are read in via a series of tab-delimited text files. We provide instructions and examples in the user manual (Additional file 1) to help familiarize new users with the tool. The look and feel of all GUI components has been set to match the default look and feel of the operating system running the program. Program development was undertaken over several platforms to ensure an intuitive look and feel on all major platforms.
JContextExplorer can output JPG, PNG, and EPS representations of context trees and multi-genome browsable contexts. EPS representations of genomic contexts were achieved using the Java EPS Graphics2D API [32]. It took approximately 35 seconds to launch the program with a set of 22 annotated microbial genomes, computationally predict operons in all organisms using an intergenic distance threshold of 20 nucleotides, and load pre-computed homology cluster information for 81,102 annotated genes on a 2 x 2.8 GHz Quad-Core Intel Xeon processor, with 16 GB of RAM and total memory of 2 TB.
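The distance-threshold operon prediction mentioned above can be illustrated with a minimal sketch. It is not the JContextExplorer implementation: the class and field names are hypothetical, and the sketch assumes that neighbouring genes are grouped when they lie on the same strand and are separated by no more than the threshold number of nucleotides.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical gene record; names are illustrative, not taken from JContextExplorer. */
final class ContigGene {
    final String name;
    final int start;   // 1-based inclusive start coordinate
    final int stop;    // 1-based inclusive stop coordinate
    final char strand; // '+' or '-'

    ContigGene(String name, int start, int stop, char strand) {
        this.name = name;
        this.start = start;
        this.stop = stop;
        this.strand = strand;
    }
}

final class OperonSketch {
    /**
     * Groups genes into putative operons. The input must hold the genes of one
     * contig, sorted by start coordinate.
     * Assumption: a gene joins the current group when it shares the strand of the
     * previous gene and the intergenic gap is at most maxGap nucleotides.
     */
    static List<List<ContigGene>> predictOperons(List<ContigGene> sortedGenes, int maxGap) {
        List<List<ContigGene>> operons = new ArrayList<List<ContigGene>>();
        List<ContigGene> current = new ArrayList<ContigGene>();
        for (ContigGene g : sortedGenes) {
            if (!current.isEmpty()) {
                ContigGene prev = current.get(current.size() - 1);
                int gap = g.start - prev.stop - 1; // negative when genes overlap
                if (g.strand != prev.strand || gap > maxGap) {
                    operons.add(current);          // close the current group
                    current = new ArrayList<ContigGene>();
                }
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            operons.add(current);
        }
        return operons;
    }
}
```

Calling `OperonSketch.predictOperons(genes, 20)` would correspond to the 20-nucleotide threshold used in the timing run; how the tool itself treats overlapping genes or strand changes is described in the user manual rather than here.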
Results
JContextExplorer software usage
JContextExplorer may be launched via a downloadable executable JAR file, or directly through the Internet via Java WebStart at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. The program is organized as a series of major and minor windows laid out in a semi-hierarchical manner (Figure 1). An initial welcome window invites the user to (1) specify the genomic working set (the set of genomes to investigate, see Figure 2) and (2) include cross-species homologous gene cluster information. Individual annotated genomes should be formatted as tab-delimited .GFF files (version 2). This information is imported into JContextExplorer by selecting either a directory containing a set of .GFF files or an additional tab-delimited mapping file listing the system locations of all individual annotated genome files and corresponding species names. The user may also include tab-delimited cross-species gene clustering information, which could be computed using a combination of BLAST [33] and MCL [34], for example, or one of a number of other gene clustering pipelines [35,36]. Homology cluster information may be entered in 5 alternative tab-delimited file formats (please see the user manual for a more detailed description). Once these files have been loaded, the user pushes a “submit” button to close the starting window and open the main window.
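As an illustration of the tab-delimited input described above, the following sketch reads feature lines laid out in the standard GFF version 2 column order (seqname, source, feature, start, end, score, strand, frame, and an optional attribute column). The class names are hypothetical, and which columns JContextExplorer actually requires is specified in the user manual, not in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one GFF feature line; not a JContextExplorer class. */
final class GffFeature {
    final String seqName;
    final String source;
    final String type;
    final int start;
    final int end;
    final char strand;
    final String attributes;

    GffFeature(String seqName, String source, String type,
               int start, int end, char strand, String attributes) {
        this.seqName = seqName;
        this.source = source;
        this.type = type;
        this.start = start;
        this.end = end;
        this.strand = strand;
        this.attributes = attributes;
    }
}

final class GffSketch {
    /** Parses GFF2-style lines, skipping blank lines, comments, and short lines. */
    static List<GffFeature> parseLines(List<String> lines) {
        List<GffFeature> features = new ArrayList<GffFeature>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // blank line, comment, or "##" directive
            }
            String[] f = line.split("\t", -1);
            if (f.length < 8) {
                continue; // fewer than the eight mandatory columns
            }
            char strand = f[6].isEmpty() ? '.' : f[6].charAt(0);
            String attributes = f.length > 8 ? f[8] : "";
            features.add(new GffFeature(f[0], f[1], f[2],
                    Integer.parseInt(f[3]), Integer.parseInt(f[4]), strand, attributes));
        }
        return features;
    }
}
```

The sketch takes lines that are already in memory so that it stays free of file handling; in practice the lines would come from each .GFF file in the selected directory or mapping file.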
Once in the main window of the system, the user may search all loaded genomes by (1) gene annotation or (2) common homology group ID number. All computed genomic groupings in all organisms that contain one or more genes that match the search query are retrieved and organized into a multidendrogram, according to a dissimilarity measure and linkage function. As a default, the starting context set defines genomic groupings only as the annotated features that match a search query (called the “SingleGene” context set); however, 6 additional context sets are available, and may be accessed by clicking the “Add/Remove” button in the starting frame. Available genomic grouping schemes include organizing genes into operons, taking a range of nucleotides or genes around a query match, and loading a customized set of genomic groupings from file (for a complete description, please see the user manual). In this program, we have implemented 4 genomic grouping comparison metrics (or dissimilarity measures), each of which is appropriate for different use cases. If the genomic groupings that comprise a given context set are large, we suggest using either the “Common Genes – Dice” or the “Common Genes – Jaccard” metric, which implement the set-based Dice and Jaccard dissimilarity approaches [37], with the individual annotated features within each grouping acting as elements and the whole genomic grouping acting as the set. If genomic groupings contain the same annotated features but vary in the intergenic spacing between features, we recommend using the “Moving Distances” approach, which uses gene order and intergenic spacing to describe differences between contexts. Changes in intergenic spacing between genes within an operon have been experimentally shown to be related to gene co-expression in E. coli and B. subtilis [4], and may be a reflection of microbial gene regulatory networks changing over evolutionary time [3,4].
Finally, if the context set under investigation does not appear to change significantly except in the size of one or more genes, the “Total Length” dissimilarity metric may be effective (this is especially useful for genomic groupings that consist of only one or a few genes). A more detailed description of these dissimilarity metrics is available in the user manual.
Linkage methods and display options available in the original MultiDendrograms package [27] are re-implemented here, which allows for easy re-computation of the context tree. All generated trees appear as individual internal frames; the user may therefore work on several alternative contexts at once (changes in tree computation and rendering will affect only the tree in focus). Individual leaves on the tree (which each represent a single context set grouping) are named by concatenating the name of the organism from which they derive to a serial number of the instance that a query match was found within that organism. Individual leaves on the tree may be selected by clicking on their name, clicking the “select all” button, or entering a leaf name search filter in the genomic context viewer tool search bar (located below the tree). Subsequent mouse clicks may bring up child windows either for (1) annotations of the query matches in the selected leaves, or (2) a multi-genome browser window (context viewer window). As depicted in Figure 1A, the start window, main window, and context viewer window are the central components of the tool, and various child windows are available within the main window and context viewer windows.
The context viewer window (Figure 1B) is a multi-genome browser specifically designed to interrogate analogous gene groupings across many species (or multiple genomic regions within a single species), rather than explore the genome of a single species. Individual genes are rendered as colored rectangles, oriented above or below a centerline to represent their placement on the forward (above centerline) or reverse (below centerline) strand. Each segment is centered about the center of each gene grouping. Below all rendered contexts, a “genomic display” sub-panel contains check box options to (1) show/hide genomic coordinates, (2) normalize displayed contexts according to strand (which may allow for easier visual inspection of analogous contexts), (3) display genes surrounding each context that are not a part of the context, and (4) color the genes surrounding the context (if this is unchecked, surrounding genes are displayed as gray). Genes are colored according to homology or common annotation, depending on the method used to generate the context tree. Left clicking on individual genes within a rendered context brings up a pop-up window displaying biological information related to each gene (this information may be modified in a “gene information sub-panel”). Right clicking enables exporting rendered contexts as an image and offers the option to display a gene color legend. Middle clicking selects the clicked gene as well as all homologous genes or genes with the same annotation (depending on the initial search type) displayed in the frame. Finally, the rendered range of each context may be easily changed using the “range around context segment” sub-panel, and clicking an “update contexts” button. The context viewer window and main window are actively linked; modifying selected leaves in the tree, for example, will add or remove these leaves in the context viewer window after clicking the “update contexts” button.
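The set-based Dice and Jaccard dissimilarities referred to above have standard definitions, which the following minimal sketch applies to two genomic groupings represented as sets of gene identifiers. The class and method names are hypothetical, and the sketch assumes each grouping is a plain set; how the tool handles duplicated genes within a grouping is described in the user manual.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of JContextExplorer. */
final class SetDissimilarity {
    /** Number of identifiers shared by both groupings. */
    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<String>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Jaccard dissimilarity: 1 - |A and B| / |A or B|. Two empty sets give 0. */
    static double jaccard(Set<String> a, Set<String> b) {
        int common = intersectionSize(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Dice dissimilarity: 1 - 2 |A and B| / (|A| + |B|). Two empty sets give 0. */
    static double dice(Set<String> a, Set<String> b) {
        int total = a.size() + b.size();
        return total == 0 ? 0.0 : 1.0 - 2.0 * intersectionSize(a, b) / total;
    }
}
```

For example, a grouping with homology cluster IDs {101, 102, 103, 104} and another with {101, 102, 205} share two of five distinct IDs, so the Jaccard dissimilarity is 1 - 2/5 = 0.6 and the Dice dissimilarity is 1 - 4/7, roughly 0.43.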
The tool is designed to facilitate coordination of the context tree and the context viewer window – such coordination may inspire re-investigation of the same gene of interest using alternative context groupings, or re-computation of the context tree using a different clustering algorithm.
Analysis of the hpxW and ggt genes in 22 alpha and gamma proteobacteria
In the gamma-proteobacterial species Klebsiella oxytoca M5a1, the hpxW gene is known to form an operon with hpxX, hpxY, and hpxZ [38]. The hpxW gene, however, is highly homologous to another gene encoding gamma-glutamyl transpeptidase (ggt). A sequence alignment of K. oxytoca hpxW and Escherichia coli ggt revealed that their amino acid sequences are almost co-linear and share 30% identity. This high degree of homology confuses automated annotation programs, which often misannotate hpxW as ggt. Fortunately, the ggt enzyme has been characterized in several microbial organisms [39] and has a genomic context very different from the hpxW context (ggt occurs as a single gene, hpxW in an operon with at least 3 other genes). Therefore, by taking into account context as well as homology, it is possible to accurately separate ggt genes from hpxW genes.
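The reasoning above, that context separates two genes that homology alone cannot, can be made concrete with a deliberately crude sketch. This is not the context-tree clustering that JContextExplorer performs: it is a hypothetical rule that uses only the sizes stated in the text (a single-gene grouping for ggt, an operon containing the query plus at least 3 other genes for hpxW) and leaves everything in between undecided.

```java
/** Hypothetical rule of thumb; not the method implemented in JContextExplorer. */
final class ContextRule {
    enum Call { GGT_LIKE, HPXW_LIKE, UNCLEAR }

    /**
     * Classifies a ggt/hpxW homolog by the number of genes in its predicted
     * genomic grouping (the query gene itself included).
     */
    static Call classify(int genesInGrouping) {
        if (genesInGrouping < 1) {
            throw new IllegalArgumentException("a grouping contains at least the query gene");
        }
        if (genesInGrouping == 1) {
            return Call.GGT_LIKE;  // stand-alone gene, as described for ggt
        }
        if (genesInGrouping >= 4) {
            return Call.HPXW_LIKE; // query plus at least 3 other genes
        }
        return Call.UNCLEAR;       // 2 or 3 genes: needs manual inspection
    }
}
```

A size-only rule ignores which genes are present, so it serves only to show why contextual information is informative; comparing the gene content of each grouping is what the dissimilarity measures and the context tree are for.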
We used JContextExplorer to attempt to separate ggt genes from hpxW genes in 22 alpha and gamma proteobacterial species based on differences between ggt and hpxW contexts. We found that ggt and hpxW grouped into two major out-branches (Figure 3). Interestingly, we discovered a third group, where manual investigation revealed that it was unclear if these genes were ggt or hpxW (data not shown). Visualization of the contexts in the hpxW group revealed agreement with previously described hpxWXYZ structures, and a comparison of a whole-genome phylogenetic tree with the ggt / hpxW context tree (Additional file 2) revealed good agreement among closely related organisms. Details relating to the methods associated with the above analyses are also available (Additional file 3). This investigation highlights the utility of combining automation (generating the ggt/hpxW context tree) with manual interrogation (investigation using the multi-genome browser context viewer tool).
Conclusion
Comparing genomic contexts across organisms is an effective but underutilized technique. While a handful of custom approaches have been developed, no universal platform for cross-species genomic context analyses has yet been produced. We have developed JContextExplorer to address this need. We have attempted to make JContextExplorer easy to install and use by offering our program as a GUI WebStart application (launching is as simple as navigating to a website and clicking on the appropriate button). Additionally, our program is organized in a way that does not require a steep learning curve among prospective users. To help new users, we provide an extensive user manual and a series of video tutorials (Additional file 1) along with the program executable (Additional file 4). We hope that JContextExplorer may find use in the bioinformatics community, with its emphasis on producing a positive user experience while offering a navigable tool of high quality and portability.
Availability and requirements
Project Name: JContextExplorer
Operating System: Platform independent
Programming language: Java
Other requirements: None.
License: Source code and binary executable are available under terms of the GPL free software license (version 2 or later) at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. Incorporation into commercial software under non-GPL terms is possible by obtaining a custom license from the University of California.
URL: http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
Competing interests
The authors declare they have no competing interests.
Authors’ contributions
PS wrote the source code and drafted the manuscript. PS and TH analyzed the hpxWXYZ operon in alpha and gamma proteobacterial species, which was instrumental in the development of JContextExplorer. TH also helped write the background information regarding purine catabolism and the hpxW gene in alpha and gamma proteobacteria, in the supplemental information. MF provided essential feedback for software development and oversaw the project, and helped to interpret results related to the hpxWXYZ genomic contexts. All authors contributed to the preparation of the manuscript, and have read and approved the final manuscript.
Supplementary Material
Contributor Information
Phillip Seitzer, Email: pmseitzer@ucdavis.edu.
Tu Anh Huynh, Email: tnahuynh@ucdavis.edu.
Marc T Facciotti, Email: mtfacciotti@ucdavis.edu.
Acknowledgements
Erin Lynch provided instrumental feedback for the development of the program, especially from the point of view of essential, biologically meaningful features. Aaron Darling provided helpful advice for the Java implementation, and aid in running the PhyloSift program. Dr. Valley Stewart provided expert knowledge of the hpxWXYZ operon in alpha and gamma proteobacterial species, and aided in discussions of the project. Support for PS and MTF came from NSF EF-094953 and startup funds to MTF.
References
- Wolf YI, Rogozin IB, Grishin NV, Koonin EV. Genome trees and the tree of life. Trends in genetics: TIG. 2002;18:472–479. doi: 10.1016/S0168-9525(02)02744-0.
- Kuzniar A, van Ham RCHJ, Pongor S, Leunissen JaM. The quest for orthologs: finding the corresponding gene across genomes. Trends in genetics: TIG. 2008;24:539–551. doi: 10.1016/j.tig.2008.08.009.
- Price MN, Huang KH, Arkin AP, Alm EJ. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 2005;15:809–819. doi: 10.1101/gr.3368805.
- Price MN, Arkin AP, Alm EJ. The life-cycle of operons. PLoS Genet. 2006;2:e96. doi: 10.1371/journal.pgen.0020096.
- Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, Arkin AP. et al. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach. Nucleic Acids Res. 2010;38:W299–W307. doi: 10.1093/nar/gkq531.
- Zu M, Esteban CD, Deutscher J, Pe G. Horizontal gene transfer in the molecular evolution of mannose PTS transporters. Mol Biol Evol. 2005;22:1673–1685. doi: 10.1093/molbev/msi163.
- Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely L, Koonin EV. Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 2002;30:2212–2223. doi: 10.1093/nar/30.10.2212.
- Kojima KK, Kanehisa M. Systematic survey for novel types of prokaryotic retroelements based on gene neighborhood and protein architecture. Mol Biol Evol. 2008;25:1395–1404. doi: 10.1093/molbev/msn081.
- Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999;96:2896–2901. doi: 10.1073/pnas.96.6.2896.
- Itoh T, Takemoto K, Mori H, Gojobori T. Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol Biol Evol. 1999;16:332–346. doi: 10.1093/oxfordjournals.molbev.a026114.
- Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV. Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res. 2001;11:356–372. doi: 10.1101/gr.GR-1619R.
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T. et al. STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009;37:D412–D416. doi: 10.1093/nar/gkn760.
- Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. Synteny and collinearity in plant genomes. Science (New York, N.Y.) 2008;320:486–488. doi: 10.1126/science.1153917.
- Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee T-h. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293.
- Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, Vandepoele K. i-ADHoRe 3.0–fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 2012;40:e11. doi: 10.1093/nar/gkr955.
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P. et al. Ensembl 2012. Nucleic Acids Res. 2012;40:D84–D90. doi: 10.1093/nar/gkr991.
- Markowitz VM, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A, Zhao X. et al. The integrated microbial genomes (IMG) system. Nucleic Acids Res. 2006;34:D344–D348. doi: 10.1093/nar/gkj024.
- Mavromatis K, Chu K, Ivanova N, Hooper SD, Markowitz VM, Kyrpides NC. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system. PLoS One. 2009;4:e7979. doi: 10.1371/journal.pone.0007979.
- Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, Van de Peer Y, Vandepoele K. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell. 2009;21:3718–3731. doi: 10.1105/tpc.109.071506.
- Muffato M, Louis A, Poisnel C-E, Roest Crollius H. Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics (Oxford, England) 2010;26:1119–1121. doi: 10.1093/bioinformatics/btq079.
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM. et al. The UCSC genome browser database. Nucleic Acids Res. 2003;31:51–54. doi: 10.1093/nar/gkg129.
- Bare JC, Koide T, Reiss DJ, Tenenbaum D, Baliga NS. Integration and visualization of systems biology data in context of the genome. BMC Bioinforma. 2010;11:382. doi: 10.1186/1471-2105-11-382.
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. JBrowse: a next-generation genome browser. Genome Res. 2009;19:1630–1638. doi: 10.1101/gr.094607.109.
- Keseler IM, Bonavides-Martínez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M. et al. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res. 2009;37:D464–D470. doi: 10.1093/nar/gkn751.
- Byrne KP, Wolfe KH. The yeast gene order browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 2005;15:1456–1461. doi: 10.1101/gr.3672305.
- Dehal PS, Joachimiak MP, Price MN, Bates JT, Baumohl JK, Chivian D, Friedland GD. et al. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 2010;38:D396–D400. doi: 10.1093/nar/gkp919.
- Gomez S, Fernandez A, Montiel J, Torres D. Solving Non-uniqueness in agglomerative hierarchical clustering using multidendrograms. J Classif. 2008;65:43–65.
- Lathe WC III, Snel B, Bork P. Gene context conservation of a higher order than operons. Mol Biol. 2000;13:25388–25392. doi: 10.1016/s0968-0004(00)01663-7.
- Sharma AK, Walsh Da, Bapteste E, Rodriguez-Valera F, Ford Doolittle W, Papke RT. Evolution of rhodopsin ion pumps in haloarchaea. BMC Evol Biol. 2007;7:79. doi: 10.1186/1471-2148-7-79.
- Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb F. Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front Microbiol. 2012;3:132. doi: 10.3389/fmicb.2012.00132.
- Holland RCG, Down TA, Pocock M, Prlić A, Huen D, James K, Foisy S. et al. BioJava: an open-source framework for bioinformatics. Bioinformatics (Oxford, England) 2008;24:2096–2097. doi: 10.1093/bioinformatics/btn397.
- Mutton P, Arnaud B. Java EPS graphics 2D. http://jlibeps.sourceforge.net/
- Altschul S, Gish W, Miller W, Myers E. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- Enright AJ, Van Dongen S, Ouzounis C. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584. doi: 10.1093/nar/30.7.1575.
- Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503.
- Chen F, Mackey AJ, Stoeckert CJ, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006;34:D363–D368. doi: 10.1093/nar/gkj123.
- Pandit S, Gupta S. A COMPARATIVE STUDY ON DISTANCE MEASURING. IEEE Trans Neural Netw. 2011;2:29–31.
- Pope SD, Chen L-L, Stewart V. Purine utilization by Klebsiella oxytoca M5al: genes for ring-oxidizing and -opening enzymes. J Bacteriol. 2009;191:1006–1017. doi: 10.1128/JB.01281-08.
- Liebert MA, Masip L, Veeravalli K, Georgiou G. The many faces of glutathione in bacteria. Antioxid Redox Signal. 2006;8:753–763. doi: 10.1089/ars.2006.8.753.