Abstract
Background
Phylogenetic trees are essential diagrams in fields such as evolutionary biology and taxonomy, depicting the relationships among a set of taxa that share a common ancestor. A multitude of tools have been developed to infer phylogenies, and even more to visualize the resulting trees. However, editing the generated plots to obtain ready-to-publish figures remains a major issue. Most available tools do not handle important aspects of nomenclature, at least not automatically, such as the use of italics for taxon names or the superscript T that must follow the strain/specimen designation to denote the type strain/specimen. There is also no easy way to highlight tree branches that are conserved across different phylogenies containing the same taxa. The lack of tools for these tasks is a burden for scientists, since manual formatting of phylogenetic trees is very time-consuming.
Results
Here, we present a tool named ‘gitana’, which runs on Linux, Windows, and Mac operating systems with R installed. It creates ready-to-publish trees with correctly formatted taxon nomenclature and offers editing options such as rerooting and clade highlighting or collapsing, among other features. Moreover, ‘gitana’ compares nodes among phylogenies comprising the same taxa to identify conserved branches.
Conclusions
‘gitana’ is a user-friendly tool that outputs high-quality, ready-to-publish phylogenetic trees for users without R coding skills. It combines dedicated functions of popular R packages for phylogeny and graphical visualization into a single one-line command. The user manual and source code are freely available at https://github.com/cristinagalisteo/gitana.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-025-06178-1.
Keywords: Phylogenetic tree, Ready-to-publish tree, Conserved nodes, Nomenclature, Tree formatting, Tree edition
Background
Phylogenetic trees play an important role in evolutionary biology as graphical representations of the relationships, inferred by mathematical algorithms, among organisms sharing a common ancestor [1–3]. Nomenclature is the field within systematics that deals with the assignment of names to taxonomic groups [4]. It is officially regulated by International Codes such as the International Code of Nomenclature of Prokaryotes (ICNP) [5], the International Code of Nomenclature for Algae, Fungi, and Plants (ICN) [6], and the International Code of Zoological Nomenclature (ICZN) [7]. Rules of nomenclature provide a precise and standardized system, accepted by the international scientific community, for creating new taxon names and assessing their correctness [5]. Systematic studies and proposals of new taxa are often supported by phylogenetic trees, in which the organisms’ names should adhere to the nomenclature rules to guarantee rigor in publications. In particular, the binary combination of generic name and specific epithet must be written in italics, followed by the strain/specimen designation (e.g., Salinivibrio kushneri TGB4 or Fusarium venenatum ICMP 19150). A type strain/specimen is usually indicated with a superscript T immediately after the strain/specimen designation (e.g., Salinivibrio kushneri CECT 9177ᵀ or Fusarium venenatum ICMP 20649ᵀ). Moreover, if the tree has been inferred from molecular markers, the accession numbers of the sequences used must be specified in brackets following the strain/specimen designation. Unfortunately, this formatting is not always applied, even in figures from high-impact journals such as Scientific Reports [8–11], Frontiers in Microbiology [12, 13], and Nature Microbiology [14, 15].
Many bioinformatic programs have been developed to generate multiple alignments from nucleotide and amino acid sequences (e.g., Clustal [16, 17], T-Coffee [18], MUSCLE [19]), to infer phylogenies (e.g., PHYLIP [20], FastTree [21], BEAST [22], MrBayes [23], MEGA [24], RAxML [25], PAUP* [26], IQ-TREE [27], among others), and to visualize the results (e.g., FigTree [28], ETE Toolkit [29], iTOL [30]). Some pipelines, such as the popular GTDB-Tk [31] or GToTree [32] software, integrate more than one of these tools. In addition, it is a widely adopted and recommended practice to infer phylogenies with different algorithms (e.g., maximum-likelihood, maximum-parsimony, and neighbor-joining) and to compare the resulting topologies. Nodes that are consistently shared across multiple trees indicate robust relationships and lead to more reliable conclusions.
The ‘phylo’ class was first introduced to the R language in the package ‘ape’ in 2004 [33]. Over the years, new R packages have been developed to work with phylogenies, such as ‘phytools’ [34, 35], ‘GEIGER’ [36], ‘phangorn’ [37], ‘treeman’ [38], ‘phylogram’ [39], and ‘dendextend’ [40]. For display and editing, the ‘ggtree’ package provides great flexibility to visualize, manage, and annotate phylogenetic trees. It is based on the widely used ‘ggplot2’ package and its layer-by-layer plotting approach [41–43]. In addition, ‘ggtree’ is constantly being updated and expanded with complementary packages such as ‘treeio’ [44], ‘tidytree’ [45], and ‘ggtreeExtra’ [46]. Undoubtedly, ‘ggtree’ is one of the leading packages for drawing and customizing phylogenetic trees. Its main disadvantage is that it requires R coding skills from end users.
Here, we developed an argument-based R script that combines some of the most useful ‘phylo’-related functions to automatically obtain ready-to-publish phylogenetic trees that adhere to the rules of nomenclature and other formatting recommendations.
Implementation
‘gitana’ is written in the R programming language and is provided as an executable RScript that can be run from the command prompt on Linux/Windows/Mac operating systems. R and the required libraries must be installed on the machine beforehand. The script accepts one or more phylogenetic trees (containing the same taxa) as input. If more than one tree is provided, their topologies are compared and the shared nodes are marked. Bootstrap percentages below the indicated cut-off value are filtered out. In addition, ‘gitana’ can perform structural modifications on the base tree (‘phylo’ class), such as node rotation or rerooting by outgroup. Optionally (but recommended), a tab-separated text file with information about the taxa displayed in the trees can be provided to automatically edit and format the taxon names and sequence accession numbers according to the nomenclature rules and publishing style. To customize the plot, multiple graphical modifications can be applied, such as collapsing specific nodes, coloring taxa of interest, or adding annotations. The final tree visualization is saved with default parameters unless other values are manually selected. Although ‘gitana’ is designed for users with very basic bioinformatic knowledge, users with R coding experience can export the plot as an R object for further modifications. The internal workflow of ‘gitana’ is outlined in Fig. 1.
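The bootstrap cut-off described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not taken from the ‘gitana’ source code, which is an R script. The function name, the dictionary layout, and the default cut-off of 70 (borrowed from the example in Fig. 2) are assumptions.

```python
def visible_support_labels(support_by_node, cutoff=70.0):
    """Return {node: label} for nodes whose support is at or above the cut-off.

    support_by_node maps a (hypothetical) node identifier to a bootstrap
    percentage, or to None when the node carries no support value.
    """
    labels = {}
    for node, support in support_by_node.items():
        if support is None:
            continue  # nothing to display for this node
        if support >= cutoff:
            # ":g" drops a trailing ".0" so that 100.0 is shown as "100"
            labels[node] = f"{support:g}"
    return labels


# Hypothetical node identifiers and support values
supports = {"n1": 100, "n2": 68.5, "n3": 70, "n4": None}
print(visible_support_labels(supports, cutoff=70))
```

Values equal to the cut-off are kept here, which matches the statement that only percentages below the cut-off are filtered out.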
Fig. 1.
Overall workflow of ‘gitana’. The object of class ‘phylo’ and its modifications are marked in blue; optional additional trees and the identification of conserved shared branches are highlighted in pink; the taxa information file and its editing as an R data frame are displayed in green; a quick plot of the unedited input base tree is indicated with a yellow arrow
Results and discussion
‘gitana’ takes as input one or more previously inferred phylogenetic trees. To obtain a final figure with correctly edited taxon labels, it is necessary to provide a text file with the information described in the ‘gitana’ manual (https://github.com/cristinagalisteo/gitana) as well as in the output of the ‘--help’ option. The text at the branch tips comprises the binomial species name followed by the strain/specimen designation (with a superscript T if requested) and the sequence accession number in brackets, e.g., Natronomonas aquatica CECT 9970ᵀ (MZ318646).
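As an illustration of the label layout described above, the following minimal Python sketch builds tip labels from a tab-separated table. It is not the ‘gitana’ implementation: the column names (`tip`, `species`, `strain`, `type`, `accession`) are hypothetical and do not reproduce the file format specified in the manual, and the Markdown/HTML-style markup (asterisks for italics, `<sup>` for the superscript) is only one possible way to encode the formatting.

```python
import csv
import io


def format_tip_label(species, strain, is_type=False, accession=""):
    """Build a label: italic binomial, strain, optional superscript T, accession."""
    label = f"*{species}* {strain}"
    if is_type:
        label += "<sup>T</sup>"  # type strain/specimen marker
    if accession:
        label += f" ({accession})"  # sequence accession number in brackets
    return label


# Hypothetical taxa table; the real column layout is defined in the manual.
TAXA_TSV = (
    "tip\tspecies\tstrain\ttype\taccession\n"
    "t1\tNatronomonas aquatica\tCECT 9970\tyes\tMZ318646\n"
    "t2\tSalinivibrio kushneri\tTGB4\tno\t\n"
)


def labels_from_tsv(text):
    """Map each tip identifier in the table to its formatted label."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["tip"]: format_tip_label(
            row["species"],
            row["strain"],
            is_type=row["type"].strip().lower() == "yes",
            accession=row["accession"].strip(),
        )
        for row in reader
    }


for tip, label in labels_from_tsv(TAXA_TSV).items():
    print(tip, "->", label)
```

Keeping the formatting in one small function means the same rule is applied to every tip, which is the main advantage over editing labels by hand.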
Moreover, if multiple trees containing the same taxa are provided (e.g., phylogenies inferred from the same dataset using different algorithms), ‘gitana’ will automatically compare them and mark the shared nodes with black-filled circles, using the first provided tree as the reference topology (Fig. 2A). Although ‘gitana’ does not infer phylogeny, the input tree can be modified to output either a rooted or unrooted tree (options ‘--root’ and ‘--unroot’, respectively), automatically creating a new object of the class ‘phylo’. Multiple aesthetic parameters can be set, such as coloring taxon names and/or branches, framing clusters, collapsing nodes, and adding side labels (Fig. 2B). In addition, the position of the bootstrap values above the tree branches can be modified, as well as the scale bar.
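The idea of marking nodes shared across trees can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the method used inside ‘gitana’: trees are represented as nested tuples of tip names, all trees are treated as rooted, and a node counts as conserved when the set of tips below it appears in every tree. For unrooted trees, bipartitions would have to be compared instead.

```python
def clades(tree):
    """Return the tip sets of all internal nodes of a nested-tuple tree.

    The root clade (all tips) is excluded because it is trivially shared.
    """
    found = set()

    def tips_below(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(tips_below(child) for child in node))
        found.add(tips)
        return tips

    root = tips_below(tree)
    found.discard(root)
    return found


def conserved_clades(reference, others):
    """Clades of the reference tree that are present in every other tree."""
    shared = clades(reference)
    for tree in others:
        shared &= clades(tree)
    return shared


# Three hypothetical topologies for the same five taxa
ml = ((("A", "B"), "C"), ("D", "E"))
nj = ((("A", "B"), "C"), ("E", "D"))
mp = (("A", ("B", "C")), ("D", "E"))

for clade in sorted(conserved_clades(ml, [nj, mp]), key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

In this toy example the grouping of A with B is not conserved because the third topology pairs B with C, while the clades {D, E} and {A, B, C} are recovered in all three trees.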
Fig. 2.
Examples of ready-to-publish phylogenetic trees output by ‘gitana’. A Tree displaying the correct binomial nomenclature for species, including the T designation for type strains. Sequence accession numbers are indicated in brackets. Bootstrap values ≥ 70% are displayed above branches. Filled circles highlight nodes conserved across the three algorithms used for tree inference. Scale bar width is set to 0.05 substitutions per nucleotide position. B The previous tree, further edited and decorated for illustrative purposes. The taxon of interest is colored green, and the corresponding genus is highlighted in orange. The internal node comprising the five species of the genus Gracilimonas is collapsed and labelled with the appropriate genus name
By default, the output tree is saved as a PDF, although other formats can be specified. Likewise, the resulting plot has an A4 size (29.7 × 21 cm) by default. For larger trees (i.e., those with a greater number of taxa), the sheet area is adjusted automatically to ensure an adequate fit. In any case, the user can select the desired output dimensions, as well as other features such as font size or scale bar width. Further uses and examples are detailed in the ‘--help’ option and in the user manual.
The beta version of ‘gitana’ has been extensively tested in the authors’ lab and used to create ready-to-publish figures for numerous taxonomic articles [47–53].
Conclusions
Introducing ‘gitana’ into phylogenetic analysis protocols spares researchers the time-consuming task of manually editing tree figures. This is mainly achieved by the automatic formatting of the taxon names and the comparison of different tree topologies retrieved from the same dataset. In addition, further options allow easy and quick rearrangements and graphical modifications with a single command line. The results obtained are fully reproducible and require only basic computing skills, which makes ‘gitana’ a friendly tool for the end user.
Availability and requirements
Project name: gitana (phyloGenetic Imaging Tool for Adjusting Nodes and other Arrangements). Project home page: https://github.com/cristinagalisteo/gitana. Operating system: Platform independent. Programming language: R. Other requirements: R packages ‘ape’ ≥ v5.8.1, ‘ggplot2’ ≥ v3.5.1, ‘ggtext’ ≥ v0.1.2, ‘ggtree’ ≥ v3.14.0, ‘optparse’ ≥ v1.7.5, ‘phytools’ ≥ v2.4.4. License: Apache License 2.0. Any restrictions to use by non-academics: None.
Acknowledgements
We are thankful to Dr. Fernando Puente-Sánchez for encouraging the publication of this tool and the members of the BIO-213 team from the Department of Microbiology and Parasitology of the University of Sevilla for testing early versions of the script.
Author contributions
C.G.: Software, Methodology, Writing—Original Draft; R.R.H.: Conceptualization, Supervision, Writing—Review & Editing. All authors read and approved the final manuscript.
Funding
This research was supported by grant number PID2023-148654NB-I00 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. C.G. was also supported by grant number LE088P23 funded by Junta de Castilla y León.
Data availability
The datasets analyzed during the current study are available in the GitHub repository (https://github.com/cristinagalisteo/gitana) and included in this published article and its supplementary information files.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Kinene T, Wainaina J, Maina S, Boykin LM. Rooting trees, methods for. In: Kliman RM, editor. Encyclopedia of Evolutionary Biology. Oxford: Academic; 2016. pp. 489–93.
- 2. Nixon KC. Phylogeny. In: Levin SA, editor. Encyclopedia of Biodiversity. New York: Elsevier; 2001. pp. 559–68.
- 3. Scott AD, Baum DA. Phylogenetic tree. In: Kliman RM, editor. Encyclopedia of Evolutionary Biology. Oxford: Academic; 2016. pp. 270–6.
- 4. Cowan ST. Principles and practice of bacterial taxonomy—a forward look. J Gen Microbiol. 1965;39:143–53.
- 5. Oren A, Arahal DR, Göker M, Moore ERB, Rossello-Mora R, Sutcliffe IC. International Code of Nomenclature of Prokaryotes. Prokaryotic Code (2022 Revision). Int J Syst Evol Microbiol. 2023;73:005585.
- 6. Turland N, Wiersema J, Barrie F, Greuter W, Hawksworth D, Herendeen P, et al. International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Glashütten, Germany: Koeltz Botanical Books; 2018.
- 7. Ride WDL, Cogger HG, Dupuis C, Kraus O, Minelli A, Thompson FC, Tubbs PK. International Code of Zoological Nomenclature. 4th ed. London: The International Trust for Zoological Nomenclature; 1999.
- 8. Belknap KC, Park CJ, Barth BM, Andam CP. Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria. Sci Rep. 2020;10:2003.
- 9. Gaba S, Kumari A, Medema M, Kaushik R. Pan-genome analysis and ancestral state reconstruction of class halobacteria: probability of a new super-order. Sci Rep. 2020;10:21205.
- 10. Mandakovic D, Cintolesi Á, Maldonado J, Mendoza SN, Aïte M, Gaete A, et al. Genome-scale metabolic models of Microbacterium species isolated from a high altitude desert environment. Sci Rep. 2020;10:5560.
- 11. Abdelsattar AM, El-Esawi MA, Elsayed A, Heikal YM. Comparison between bacterial bio-formulations and gibberellic acid effects on Stevia rebaudiana growth and production of steviol glycosides through regulating their encoding genes. Sci Rep. 2024;14:24130.
- 12. Cycil LM, DasSarma S, Pecher W, McDonald R, AbdulSalam M, Hasan F. Metagenomic insights into the diversity of halophilic microorganisms indigenous to the Karak salt mine, Pakistan. Front Microbiol. 2020;11:1567.
- 13. Kumar V, Parida SN, Dhar S, Bisai K, Sarkar DJ, Panda SP, et al. Biogenic synthesis of silver nanoparticle by Cytobacillus firmus isolated from the river sediment with potential antimicrobial properties against Edwardsiella tarda. Front Microbiol. 2024;15:1416411.
- 14. Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, et al. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021;6:946–59.
- 15. Jin H, Quan K, He Q, Kwok L-Y, Ma T, Li Y, et al. A high-quality genome compendium of the human gut microbiome of inner mongolians. Nat Microbiol. 2023;8:150–61.
- 16. Sievers F, Higgins DG. Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol. 2014;1079:105–16.
- 17. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–80.
- 18. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–17.
- 19. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
- 20. Felsenstein J. PHYLIP—phylogeny inference package (Version 3.2). Cladistics. 1989;5:164–6.
- 21. Price MN, Dehal PS, Arkin AP. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5:e9490.
- 22. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29:1969–73.
- 23. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42.
- 24. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
- 25. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.
- 26. Swofford D. PAUP*. Phylogenetic Analysis Using Parsimony (*and other methods); 2017.
- 27. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4.
- 28. Rambaut A. FigTree v1.4.4. Edinburgh: Institute of Evolutionary Biology, University of Edinburgh; 2018.
- 29. Huerta-Cepas J, Serra F, Bork P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol. 2016;33:1635–8.
- 30. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–6.
- 31. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36:1925–7.
- 32. Lee MD. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics. 2019;35:4162–4.
- 33. Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics. 2004;20:289–90.
- 34. Revell LJ. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ. 2024;12:e16505.
- 35. Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3:217–23.
- 36. Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W, Science A. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008;24:129–31.
- 37. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27:592–3.
- 38. Bennett DJ, Sutton MD, Turvey ST. treeman: an R package for efficient and intuitive manipulation of phylogenetic trees. BMC Res Notes. 2017;10:30.
- 39. Wilkinson SP, Davy SK. phylogram: an R package for phylogenetic analysis with nested lists. J Open Source Softw. 2018;3:790.
- 40. Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31:3718–20.
- 41. Yu G. Using ggtree to visualize data on tree-like structures. Curr Protoc Bioinf. 2020;69:e96.
- 42. Yu G, Smith DK, Zhu H, Guan Y, Lam TT. GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8:28–36.
- 43. Yu G, Lam TTY, Zhu H, Guan Y. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol Biol Evol. 2018;35:3041–3.
- 44. Wang L, Lam TT, Xu S, Dai Z, Zhou L, Feng T, et al. Treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Mol Biol Evol. 2020;37:599–603.
- 45. Yu G. tidytree: a tidy tool for phylogenetic tree data manipulation; 2021.
- 46. Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, et al. ggtreeExtra: compact visualization of richly annotated phylogenetic data. Mol Biol Evol. 2021;38:4039–42.
- 47. Straková D, Galisteo C, de la Haba RR, Ventosa A. Characterization of Haloarcula terrestris sp. nov. and reclassification of a Haloarcula species based on a taxogenomic approach. Int J Syst Evol Microbiol. 2023;73:006157.
- 48. Galisteo C, de la Haba RR, Ventosa A, Sánchez-Porro C. The hypersaline soils of the Odiel Saltmarshes Natural Area as a source for uncovering a new taxon: Pseudidiomarina terrestris sp. nov. Microorganisms. 2024;12:375.
- 49. Galisteo C, de la Haba RR, Sánchez-Porro C, Ventosa A. A step into the rare biosphere: genomic features of the new genus Terrihalobacillus and the new species Aquibacillus salsiterrae from hypersaline soils. Front Microbiol. 2023;14:1192059.
- 50. Galisteo C, de la Haba RR, Sánchez-Porro C, Ventosa A. Biotin pathway in novel Fodinibius salsisoli sp. nov., isolated from hypersaline soils and reclassification of the genus Aliifodinibius as Fodinibius. Front Microbiol. 2023;13:1101464.
- 51. García-Roldán A, Durán-Viseras A, de la Haba RR, Corral P, Sánchez-Porro C, Ventosa A. Genomic-based phylogenetic and metabolic analyses of the genus Natronomonas, and description of Natronomonas aquatica sp. nov. Front Microbiol. 2023;14:1109549.
- 52. Straková D, Sánchez-Porro C, de la Haba RR, Ventosa A. Reclassification of Halomicroarcula saliterrae Straková et al. 2024 and Halomicroarcula onubensis Straková et al. 2024 into the genus Haloarcula, as Haloarcula saliterrae comb. nov. and Haloarcula onubensis comb. nov., respectively. Int J Syst Evol Microbiol. 2024;74:006510.
- 53. Ventosa A, de la Haba RR, Arahal DR, Sánchez-Porro C. Halomonadaceae. In: Whitman WB, De Vos P, Dedysh S, Hedlund BP, Kämpfer P, Rainey F, et al., editors. Bergey’s Manual of Systematics of Archaea and Bacteria. Hoboken, NJ: John Wiley & Sons, Inc., in association with Bergey’s Manual Trust; 2021. pp. 1–10.