Molecular Biology and Evolution. 2017 Mar 22;34(6):1535–1542. doi: 10.1093/molbev/msx055

A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits

Lucas Czech 1,*, Jaime Huerta-Cepas 2, Alexandros Stamatakis 1,3
PMCID: PMC5435079  PMID: 28369572

Abstract

Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest tools should provide options that explicitly force users to define the semantics of node labels.

Keywords: phylogenetic trees, tree visualization, tree viewers, bioinformatics toolkits, Newick format, branch support values, branch labels, software, bugs

Introduction

Problem Description

The Newick format is widely used to store and visualize phylogenies. Since its introduction by Archie et al. (1986), it has become the de facto standard for storing, exchanging, and displaying phylogenies. It uses parentheses and commas to specify the nesting structure of the tree and also allows for storing node labels as well as branch lengths.

In many cases, additional vital information needs to be associated with the branches of a tree. Published phylogenies usually display branch values, such as bootstrap (Felsenstein, 1985) support, Bayesian posterior probability (Huelsenbeck et al., 2001), or approximate likelihood ratio test (aLRT) (Anisimova and Gascuel, 2006b) values. These values are associated with branches (splits/bipartitions) of the tree and not nodes of the tree. In the original specification of the Newick format, the authors had not foreseen an option for specifying branch labels or other metadata associated to branches.

Thus, as a workaround, branch values are often stored as inner node labels in the output of phylogenetic inference tools. Node labels of tip nodes usually contain the species name of the extant organisms. Inner nodes, however, represent hypothetical common ancestors and are therefore generally not named. Thus, these inner node labels can be (mis)used to store branch information.

The syntax of the original Newick format is well defined, for example via the formal grammar provided by Olsen (1990). There is, however, no official standard that also fixes its semantics, for instance the meaning of Newick comments. Hence, there is no officially correct way of using it: attributes of branches and nodes can essentially be interpreted ad libitum. Thus, users need to be aware of the semantics of such attributes. Their interpretation depends on the convention used when storing those values in Newick format.

The convention, or rather workaround, for storing branch values as node labels exhibits potential pitfalls. This is because, in an unrooted binary tree, it is not clear to which of the three outgoing branches of an inner node such a node label refers.

However, for rooted trees, there is an unambiguous mapping of node labels to branches: The node label (branch value) at an inner node can always be associated with (or mapped to) the outgoing branch that points toward the root. Note that unrooted trees often have a dedicated inner node that serves as a hook for both computing and visualizing the tree. This so-called top-level trifurcation is not a root in the strict sense, but it is required for storing and parsing the tree, because we need to recursively start traversing the tree from somewhere. We can choose the inner node that serves as top-level trifurcation arbitrarily. That is, the same underlying unrooted tree can be displayed or written to file in many distinct ways. For n taxa, an unrooted binary tree has n − 2 inner nodes, hence we can choose among n − 2 such top-level trifurcations. For each of these possible top-level trifurcations, we can then also freely choose the order in which we descend into the subtrees defined by the three outgoing branches to print out or visualize the tree. The chosen top-level trifurcation induces an artificial orientation for branches in the tree, and can thus be used to unambiguously associate node labels with branches. Figure 1a shows an unrooted tree with this structure.
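The number of inner nodes follows from a standard degree-counting argument, sketched here for completeness. The symbols m (number of inner nodes) and b (number of branches) are introduced only for this derivation and do not appear in the original text; the argument assumes a strictly binary unrooted tree, in which every tip has degree 1 and every inner node has degree 3.

```latex
\begin{align*}
b &= n + m - 1
  && \text{(a tree with } n + m \text{ nodes has } n + m - 1 \text{ branches)} \\
2b &= 1 \cdot n + 3 \cdot m
  && \text{(each branch is counted once at each of its two end nodes)} \\
2(n + m - 1) &= n + 3m
  \;\Longrightarrow\; m = n - 2
\end{align*}
```

For a tree with n = 6 tips, this gives m = 4 inner nodes and b = 9 branches.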

Fig. 1.

Our exemplary tree, before and after rooting on the branch leading to the tip node X. The rooted trees contain an additional root node R’. (a) Original rooting (via top-level trifurcation) and visual representation of our Newick test tree TN. Inner nodes and branches are colored according to the correct node label to branch mapping of TN. (b) Tree rooted on node R’. Node labels are mapped incorrectly to branches, resulting in a tree with an erroneous node label to branch value mapping. (c) Tree rooted on node R’. Node labels are correctly mapped to the branches of the tree.

Thus, both rooted and unrooted trees in Newick format explicitly (root) or implicitly (top-level trifurcation) encode a direction for branches. Therefore, the mapping between branch values and node labels in Newick files is well defined in principle: For restoring the correct association between node labels and branches, the direction toward the top-level node (root or top-level trifurcation) can be used. This however entails an implicit semantic interpretation. When reading a Newick-formatted tree, the user or program needs to know if inner node labels need to be interpreted as branch values. When this semantic distinction is not made, node labels need to be interpreted as being associated to the nodes, because this was the original intention of the Newick format. When node labels that should be interpreted as branch labels (e.g., support values) are erroneously interpreted as node labels, this can lead to incorrect visualizations as well as interpretations of phylogenies. These issues can potentially affect downstream analysis tools that parse phylogenies with node labels, for instance, tools for computing the weighted Robinson–Foulds distance (Robinson and Foulds, 1981) between phylogenies with branch support values.

Here, we show that 14 out of 20 common tree viewers and general purpose bioinformatics toolkits do not offer an explicit option for specifying the semantics of inner node labels. A simple way to examine the behavior of tools in this regard is to (re)root a given tree, an option that all tested viewers and toolkits offer. If node labels represent branch labels, the association of some node labels with their corresponding branches has to be changed during the rerooting process. This is because the direction toward the root (or top-level trifurcation) changes. We found that 8 out of 20 tools exhibit incorrect behavior when rerooting trees.

Note that rerooting a tree is not always a meaningful operation. For example, a tree inferred with a time-asymmetric model might contain posterior support values that belong to nodes rather than branches/splits of the tree. As another example, inner node labels that represent clade names (e.g., “Mammalia”) are attributes associated with one direction of a branch (only mammals in one part of the split induced by the branch, none in the other). In fact, this is a third class of values associated with the tree, which, again, behaves differently when rerooting the tree. We are, however, not aware of any tree file format that allows storing this type of information. Thus, we focus on the distinction between node labels and branch values here, and use rerooting to reveal the internal workings of the tested tools.

Test Case

Our unrooted bifurcating Newick test tree with inner node labels

TN=((C,D)1,(A,(B,X)3)2,E)R;

has six leaf nodes (A…E and X) and four inner nodes (labeled 1…3, and the top-level trifurcation R). For the sake of simplicity, we ignore branch length values. We use TN throughout this review to test the behavior of tree viewers and toolkits when rerooting the original topology. We also outline potential problems that may arise due to the mostly implicit semantics of inner node labels in Newick trees.

An alternative variant to output branch values is to store them as Newick comments in square brackets instead of node labels. The tree

TC=((C,D)[1],(A,(B,X)[3])[2],E)[R];

shows an example for this notation and contains the same information as tree TN. For the semantics and the association of those comments with branches, the same convention applies as for the node label notation. Some of the tested tools are able to correctly parse and display this format, but, in general, the same semantic issues and mapping problems arise.
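For trees without branch lengths, the two notations can be converted into each other mechanically. The following Python sketch illustrates this for the two test trees. The function names `labels_to_comments` and `comments_to_labels` are hypothetical and not part of any of the reviewed tools. The regular expressions assume that the tree has no branch lengths and that labels contain no parentheses, commas, colons, semicolons, or brackets; where a tool places comments relative to branch lengths is not covered here.

```python
import re

T_N = "((C,D)1,(A,(B,X)3)2,E)R;"
T_C = "((C,D)[1],(A,(B,X)[3])[2],E)[R];"


def labels_to_comments(newick):
    """Wrap every label that directly follows a closing parenthesis in brackets."""
    return re.sub(r"\)([^(),;:\[\]]+)", r")[\1]", newick)


def comments_to_labels(newick):
    """Turn every comment that directly follows a closing parenthesis into a label."""
    return re.sub(r"\)\[([^\[\]]*)\]", r")\1", newick)


if __name__ == "__main__":
    print(labels_to_comments(T_N))  # comment notation of the test tree
    print(comments_to_labels(T_C))  # node label notation of the test tree
```

The conversion changes only the syntax. Which branch a value belongs to still follows from the direction toward the top-level node, so the semantic question remains the same in both notations.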

For example, three widely used phylogenetic inference tools each use a different output format for phylogenies with branch support values. PHYML (Guindon et al., 2010) reports support values as node labels, see Anisimova and Gascuel (2006a). RAxML (Stamatakis, 2014) generates two tree files, one with comments and one with node labels. Finally, MrBayes (Ronquist and Huelsenbeck, 2003; Ronquist et al., 2012) uses its own Nexus-based format, which internally uses a variation of Newick comments to report support values (posterior probabilities). These idiosyncratic output formats illustrate the difficulties associated with working with trees that have branch support values.

In figure 1, we show tree TN, where colors indicate the correct mapping of inner node labels to nodes and branches.

If we now (re)root TN at the branch that leads to tip X, the mappings between all nodes and branches that lie on the path between the old and the new top-level node have to be altered. In our example, the nodes on the path from R to X are the inner nodes 2 and 3. In figure 1, we display the incorrect (figure 1b) and correct (figure 1c) mapping of inner node labels to nodes and branches after rerooting. Note that this rooted binary tree now contains one more node, which is the newly created root node R’. In both figures, the inner node labels are correctly assigned to their corresponding nodes. However, the association of those labels to the corresponding branches is only correct in figure 1c.
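The difference between the two mappings can be made concrete with a small Python sketch that reroots the test tree at the branch leading to a tip and writes the result back as Newick. This is an illustration under stated assumptions, not the implementation of any reviewed tool: the names `parse_newick` and `reroot_at_tip` are hypothetical, the parser handles only labels (no branch lengths or comments), rerooting is only supported on a tip branch, and the new root is left unlabeled.

```python
import re

SPECIAL = set("(),;")
T_N = "((C,D)1,(A,(B,X)3)2,E)R;"


def parse_newick(text):
    """Return (labels, edges): labels[i] is the label of node i, and edges holds
    (parent, child) pairs directed away from the top-level node (node 0)."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    labels, edges = [], []
    pos = 0

    def node():
        nonlocal pos
        me = len(labels)
        labels.append("")
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            edges.append((me, node()))
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                edges.append((me, node()))
            pos += 1                          # consume ")"
        if tokens[pos] not in SPECIAL:
            labels[me] = tokens[pos]
            pos += 1
        return me

    node()
    return labels, edges


def reroot_at_tip(newick, tip, labels_are_branch_values):
    """Reroot on the branch leading to `tip` and return the new Newick string."""
    labels, edges = parse_newick(newick)
    inner_nodes = {parent for parent, _ in edges}
    neighbors = {i: [] for i in range(len(labels))}
    branch_value = {}  # value of an undirected branch, keyed by its two end nodes
    for parent, child in edges:
        neighbors[parent].append(child)
        neighbors[child].append(parent)
        if labels_are_branch_values and child in inner_nodes:
            # Branch interpretation: the label of an inner node belongs to
            # the branch that points toward the top-level node.
            branch_value[frozenset((parent, child))] = labels[child]

    def write(node, came_from):
        below = [n for n in neighbors[node] if n != came_from]
        if not below:
            return labels[node]               # tip: always keeps its name
        subtree = "(" + ",".join(write(n, node) for n in below) + ")"
        if labels_are_branch_values:
            # Print the value of the branch that now points toward the new root.
            return subtree + branch_value.get(frozenset((node, came_from)), "")
        return subtree + labels[node]         # node interpretation: label stays

    tip_id = labels.index(tip)
    attach = neighbors[tip_id][0]             # a tip has exactly one neighbor
    return "(" + tip + "," + write(attach, tip_id) + ");"


if __name__ == "__main__":
    print(reroot_at_tip(T_N, "X", labels_are_branch_values=False))
    print(reroot_at_tip(T_N, "X", labels_are_branch_values=True))
```

Under the node interpretation, the labels 2 and 3 stay at their nodes and the old top-level label R becomes an ordinary inner label. Under the branch interpretation, the values 2 and 3 move one node further along the path toward the old top-level node, so that each value again sits at the lower end of the branch it describes. The node next to X carries no value, because the branch leading to a tip never had one.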

An incorrect mapping of node labels to branches as presented in figure 1b will lead to incorrectly displayed branch values in empirical phylogenetic studies. In addition, since a typically large fraction of the results and discussion sections of such studies is dedicated to interpreting the support values of the phylogeny, the conclusions of these studies might also be incorrect.

In the following, we examine different popular tree viewers and several bioinformatics toolkits to determine if they maintain the correct branch value mapping when rerooting our test tree TN at the branch leading to tip node X.

Finally, since Dendroscope (Huson and Scornavacca, 2012), one of the most commonly used tree viewers tested, yielded incorrect mappings for all versions prior to v. 3.5.0 (released 2016-01-07), we also assessed if there exist published empirical phylogenetic studies using Dendroscope with incorrectly visualized support values.

Review

Experimental Setup

Given a Newick tree with inner node labels (e.g., tree TN with labels 1, 2, and 3), we distinguish between two possible interpretations for those labels: 1) They are actual node labels (e.g., ancestral species names). We call this the “node interpretation” and 2) they represent branch labels (e.g., support values). We call this the “branch interpretation”. The same applies to trees that use comments instead of node labels (e.g., tree TC). For a program to support both interpretations, a reasonable solution would be to offer an option for choosing between the two, that is, to include an explicit semantic interpretation dialog.

We tested the tree viewers as follows:

  • Check whether the tool has an option to specify the semantics of inner values.

  • Load trees TN and TC from the corresponding Newick file.

  • Check how the tool interprets the values.

  • Reroot the tree at the branch leading to node X.

  • Check whether the viewer works correctly based on its interpretation.
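For tools that can export the rerooted tree as Newick, the last check can also be automated under the branch interpretation. The following Python sketch is one possible way to do so and is not necessarily the procedure used for this review. It compares which bipartition each inner label is attached to before and after rerooting. The names `parse_newick`, `split_labels`, and `same_branch_values` are hypothetical; the parser assumes labels only (no branch lengths or comments) and a terminating semicolon; the two candidate strings are hand-written examples, not the recorded output of any reviewed tool.

```python
import re

SPECIAL = set("(),;")


def parse_newick(text):
    """Parse a Newick string without branch lengths into (label, children) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        label = ""
        if tokens[pos] not in SPECIAL:
            label = tokens[pos]
            pos += 1
        return (label, children)

    return node()


def split_labels(newick):
    """Map each bipartition of the tips to the inner labels found on its branch.

    The label of the top-level node is ignored, because that node has no
    branch pointing toward a root."""
    labeled = []  # (label, tips below the node) for every labeled inner node

    def visit(node, is_top):
        label, children = node
        if not children:
            return frozenset([label])
        below = frozenset().union(*[visit(child, False) for child in children])
        if label and not is_top:
            labeled.append((label, below))
        return below

    all_tips = visit(parse_newick(newick), True)
    result = {}
    for label, below in labeled:
        split = frozenset([below, all_tips - below])
        result.setdefault(split, set()).add(label)
    return result


def same_branch_values(original, rerooted):
    """True if every labeled bipartition carries the same labels in both trees."""
    return split_labels(original) == split_labels(rerooted)


if __name__ == "__main__":
    original = "((C,D)1,(A,(B,X)3)2,E)R;"
    labels_moved_with_branches = "(X,(B,(A,((C,D)1,E)2)3));"
    labels_kept_at_nodes = "(X,(B,(A,((C,D)1,E)R)2)3);"
    print(same_branch_values(original, labels_moved_with_branches))  # True
    print(same_branch_values(original, labels_kept_at_nodes))        # False
```

Because labels are collected per bipartition, a value that is duplicated on both branches below a newly inserted root counts only once, so such a duplication does not cause a mismatch.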

In table 1, we provide an overview of the tested tree viewers and bioinformatics toolkits. Whereas the list does not cover all available tools, we focus on highly used resources offering rerooting capabilities, as the impact of potential errors in these tools on published phylogenies is larger. We also tested some less known tools, in order to assess how widely spread the issue is.

Table 1.

Evaluated Tree Viewers (first half) and Bioinformatics Toolkits (second half) with Accumulated Number of Citations (https://scholar.google.com, accessed on 2016-11-11).

Tool | Version | Reference | Citations
Archaeopteryx | 0.9911 | Han and Zmasek, 2009 | 268
ATV | 4.00 alpha 13 | Zmasek and Eddy, 2001 | 288
Dendroscope | 3.4.0 and 3.5.3 | Huson and Scornavacca, 2012 | 1,348
ETE (GUI) | 2.3.10 | Huerta-Cepas et al., 2016 | 238
EvolView | Accessed 2016-08-15 | Zhang et al., 2012 | 105
FigTree | 1.4.2 | Rambaut, 2007 | >2,362 a
iTOL | Accessed 2016-08-15 | Letunic and Bork, 2016 | 1,879
PhyloWidget | Accessed 2016-08-15 | Jordan and Piel, 2008 | 113
TreeView | 1.6.6 (Windows) | Roderic, 1996 | 10,570
T-REX | Accessed 2016-08-15 | Boc et al., 2012 | 285
APE | 3.4 | Paradis et al., 2003 | 3,915
BioPerl | 1.006925 | Stajich et al., 2002 | 1,410
BioPython | 1.63b | Cock et al., 2009 | 797
Dendropy | 4.1.0 | Sukumaran and Holder, 2010 | 525
ETE (API) | 3.0.0b35 | Huerta-Cepas et al., 2016 | 238
Geneious | 10.0.5 | Kearse et al., 2012 | 1,689
Mega | 7.0.14 build 7160126 | Kumar et al., 2016 | 69,134
Mesquite | 3.10 (build 765) | Maddison and Maddison, 2001 | 5,616
Newick Utilities | 1.6 | Junier and Zdobnov, 2010 | 31
Pycogent/scikit-bio | 1.5.3 | Knight et al., 2007 | 148
Total | | | 100,721

a FigTree does not have an official publication, so we estimated the number of citations by accumulating the counts for the most recent versions.

In the following, we discuss our observations for the aforementioned tree viewers and general purpose toolkits. In table 2, we provide an overview of these results.

Table 2.

Evaluation of tree viewers and bioinformatics toolkits. The columns “Nodes” and “Branches” indicate which of the two interpretations of Newick node labels the tool supports. The last column shows whether the rerooting behavior is correct according to the interpretation offered or implied by the tool.

Tool Nodes Branches Default behavior Correct rerooting
Archaeopteryx Nodes
ATV Branches
Dendroscope a Dialoga a
ETE (GUI) Branches
EvolView Branches
FigTree Both
iTOL Input dependent
PhyloWidget Nodes
TreeView Branches
T-REX Branches ()
APE a Nodes a
BioPerl Nodes
BioPython Nodes
Dendropy a Nodes ()
ETE (API) Branches
Geneious () Nodes
MEGA Branches
Mesquite Nodes
Newick Utilities a Nodes a
Pycogent/scikit-bio Branches
a Option added or improved after this review.

Results

Tree Viewers

Archaeopteryx is aware of the semantic issue, see Zmasek (2015). It offers an option to define the semantics of annotated values. The default is to interpret node labels as node labels, thus the rerooted tree is correctly displayed only for the node interpretation. When activating the option “Internal Node Names are Confidence Values”, rerooting algorithms correctly shift support values to the corresponding branches. Prior to v. 0.9911, there was a minor issue in displaying these values on screen. This was fixed after we contacted the developers. Archaeopteryx does not support the comment notation (e.g., tree TC).

ATV is the predecessor to Archaeopteryx. Different versions seem to alternate between the two possible interpretations of inner node labels. The one we tested uses the branch interpretation of node labels and thus correctly reroots.

Dendroscope versions prior to v. 3.5.0 only offered the node interpretation for our test trees. This led to incorrect results when rerooting trees with node labels that actually represented branch support values. Dendroscope interpreted Newick comments as support values only if the tree also contained branch lengths (e.g., tree TC plus branch lengths). The alternative notation using inner node labels (e.g., tree TN) was not affected by this; it was always read under the node interpretation. This behavior was not fully documented in the manual. We assess the impact of this behavior on published empirical phylogenetic studies in section “Impact on Empirical Phylogenetic Studies”. In the latest versions of Dendroscope (v. 3.5.0 up to v. 3.5.4), all of our recommendations (see section “Conclusions”) made in the first bioRxiv preprint (Czech and Stamatakis, 2015) of this review were implemented by Daniel Huson. When reading a Newick file with node labels, Dendroscope now explicitly asks the user for the intended interpretation. It also has a menu option to choose between the interpretations.

ETE (GUI) (Huerta-Cepas et al., 2010, 2016) is another viewer that supports both interpretations. When reading a Newick formatted tree, it offers an option for specifying label semantics. The comment notation is not supported (e.g., tree TC).

EvolView is able to display numerical values at inner nodes. Rerooting however misplaces those values to wrong nodes and sets some of them to zero. Rerooting a given tree several times at different branches results in all inner node values becoming zero. Furthermore, rerooting does not resolve the initial trifurcation properly, so that the resulting tree contains a multifurcation at node R. The developers are aware of these issues and intend to fix them in a future release.

FigTree is able to display multiple inner node labels using both semantic interpretations. When rerooting the tree, however, there is no option to define the interpretation of the node labels, that is, FigTree internally always assumes the branch interpretation. Thus, after rerooting actual node labels, the labels are mapped to wrong nodes. In addition, it cannot parse certain Newick variants, such as trees that contain both branch lengths and support values stored as comments.

iTOL (Letunic and Bork, 2011, 2016) works correctly. If the inner values are numbers, it implicitly applies the branch support values interpretation. If they are strings, they are interpreted as inner node names. In both cases, re-rooting works as expected. However, it does not offer an explicit option to change this behavior, that is, to interpret numbers as belonging to the nodes, or strings as belonging to the branches.

PhyloWidget interprets node labels as node labels. Thus, rerooting a tree with branch support values yields errors. Also, rerooting does not resolve the initial trifurcation, similar to EvolView. PhyloWidget is no longer maintained; its authors therefore recommend not using it for rerooting phylogenies or displaying branch support values. Therefore, it is marked as not correct in table 2.

TreeView interprets node labels as branch support values and correctly reroots under this interpretation. However, it displays the values next to the nodes instead of the branches, which may lead to potential confusion.

T-REX also applies the branch interpretation and correctly reroots. The branch support values are however always displayed as percentages, that is, followed by a “%” sign. This is not always the correct or desired way for displaying branch support values. The developers plan to fix this in the next release. Hence, we marked it as almost correct in table 2. T-REX does not work with the comment notation.

Bioinformatics Toolkits

APE interprets inner node labels as node attributes when rerooting. We reported this issue to the project maintainers, and a new version of the package (v. 3.6) is now available that allows handling node labels as support values when rooting. In addition, the supplementary material of this manuscript provides a workaround that patches previous APE versions.

BioPerl offers options to explicitly load node labels as branch or node attributes. When the branch interpretation is selected, rerooting algorithms work correctly.

BioPython, with the BioPhylo module for handling trees (Talevich et al., 2012), interprets inner node labels as confidence values when parsing a Newick tree. However, those values are handled as node attributes rather than as branch attributes when rerooting the tree, therefore leading to incorrect positions of the support values. The same behavior is observed when explicitly loading support values using the PhyloXML format. This is currently a known issue in the project and a fix is being developed.

Dendropy loads inner node labels as node attributes. Therefore, if those labels are meant to represent support values, rerooting will lead to incorrect results. The Dendropy documentation explains this behavior in detail, and a workaround is available for rerooting trees in which bootstrap values are encoded as node labels in the Newick format. A new option has been added in version 4.2 that automatically translates node labels into branch support values when loading a Newick tree, so that rerooting algorithms can be safely applied without further tree processing.

ETE (API) (Huerta-Cepas et al., 2010, 2016) offers the same options as when used for tree visualization (see above). Node labels can be loaded as node names (node attributes) or branch support values (branch attributes). When rerooting, branch support values will be correctly remapped to branches.

Geneious is able to read both Newick notations, and by default interprets the values as node labels. The branch interpretation is available as an undocumented feature, depending on the naming of those values. However, when rerooting the tree, the values are treated as belonging to the branches in both cases. This results in misplaced node labels. The maintainers are planning to fix this and to make the interpretation choice more apparent.

MEGA (Tamura et al., 2007, 2011, 2013; Kumar et al., 2016) is able to read both notations, and interprets the values as branch support values in both cases. Rerooting works correctly under this interpretation.

Mesquite understands the node label notation, but not the comment notation. By default, it interprets node labels as node labels and correctly reroots. There is also a function to reinterpret internal node labels and turn them into branch values; rerooting works correctly after this transformation. For a future release, the maintainers plan to implement a user prompt for choosing the interpretation when a tree with inner node labels is loaded.

Newick Utilities does not handle node labels as branch attributes by default, which leads to incorrect results when rerooting Newick trees. After we reported the issue, a previously undocumented option (-s) was documented that automatically interprets inner node labels as branch support attributes.

Pycogent interprets inner node labels as support values by default and those are correctly handled by the rooting functions.

Impact on Empirical Phylogenetic Studies

Users who are not aware of the implicit semantic assumptions of tree manipulation tools might obtain tree visualizations with incorrectly mapped support values. This is particularly the case if the node interpretation is wrongly applied to branch support values. Most prominently, older versions of Dendroscope (before version 3.5.0, see section “Results”) implicitly interpret node labels as exactly that: node labels. The extent to which this affects published phylogenies is hard to quantify. This is because all visualized phylogenies in all published papers citing Dendroscope (over 1,200 for the two Dendroscope papers based on Google Scholar, accessed on August 15, 2016) would need to be checked, and all original tree files would need to be available, which they should be, in principle. Hence, this is also an issue of reproducibility of scientific results, even if in our case it simply boils down to making a published Newick tree with support values available for download. To at least get a feeling for the visualization and reproducibility issue, we contacted the authors of 14 papers that used Dendroscope to visualize trees with support values, published in journals such as Nature, PLOS, BMC, and JBC. Out of the contacted authors, five replied, but only two were finally able to provide us with the trees that were used to generate the visualizations in their publications.

In the following, we analyze the trees visualized for these two papers with respect to the correctness of the branch support value mapping.

The first article (Liu et al., 2014) presents a phylogeny of 80 Arabidopsis accessions (see fig. 4b of Liu et al., 2014) along with bootstrap values for some of the branches. The tree and bootstrap values were inferred with RAxML 7.3.5 (Stamatakis et al., 2008), which writes a tree file that uses Newick comments for storing support values. Dendroscope (Huson and Scornavacca, 2012) was used to reroot and visualize the tree. As already mentioned, the tool is able to correctly handle this variant of stored support values. Thus, the error did not occur in this paper and the tree is correctly visualized.

The second article (Lundin et al., 2010) presents several phylogenies for all three domains of life. The trees were inferred using RAxML v7.2.6 (Stamatakis, 2006a, 2006b; Stamatakis et al., 2008) and PHYML v3.0 (Guindon and Gascuel, 2003; Le et al., 2008; Guindon et al., 2010). Branch support values were estimated with PHYML using the SH-like likelihood ratio test (Anisimova and Gascuel, 2006b), which reports support values as node labels. All trees in figures 2 and 4–7 of Lundin et al. (2010) were rerooted using Dendroscope such that they can be more easily compared with the comprehensive trees presented in figure 1 of the article. In all cases, branch support values were mapped incorrectly to the rerooted trees in these figures.

We illustrate this in figure 2. Figure 2a is the original Newick tree used to generate figure 2a in Lundin et al. (2010). We have marked the branch used for (re)rooting the tree by a red cross. We colored the subtrees so that their corresponding position in the rerooted tree is easily visible. Figure 2b shows the rerooted tree using Dendroscope v. 3.4.0, which is identical to the one presented in Lundin et al. (2010). The branch support values between the old and the new root node in our figure 2 are not mapped to the same bipartition in figure 2a and b. For example, in figure 2a the support value underlined in green refers to the bipartition green taxa—blue taxon, red taxa whereas in figure 2b it refers to the bipartition red taxa—green taxa, blue taxon. Fortunately, in this specific case, the incorrectly mapped support values do not change the conclusions of the paper (pers. comm. with Daniel Lundin on December 28, 2015). In figure 2c, we show the correctly rerooted tree, created with the updated Dendroscope version 3.5.3. The value underlined in green now refers to the correct bipartition. Furthermore, the value underlined in red is correctly duplicated at both outgoing branches of the root.

Fig. 2.

Example of a published phylogeny showing that the issue occurred in real-life data. We used the original data from Lundin et al. (2010) to recreate Figure 2(a) of Lundin et al. (2010). (a) The original tree with the branch used for rerooting marked by a red cross. (b) The rerooted tree with incorrectly placed branch support values (e.g., the one underlined in green). This tree was created using Dendroscope 3.4.0. (c) The same rerooted tree, this time using the updated Dendroscope 3.5.3. The error does not occur, because the correct interpretation of the values was selected. Note that, the value underlined in red is now correctly duplicated at both ends of the root branch. We colored the subtrees to highlight their positions after rerooting.

Conclusions

Our results indicate that an explicit convention and explicit semantics for interpreting node and branch values in tree viewers and other common bioinformatics tools are clearly missing. From the tested viewers, only three (Archaeopteryx, ETE, and Dendroscope from v. 3.5.0 onwards) offer a user dialog to define the semantics of node labels. Older versions of Dendroscope offer an implicit choice depending on the input format. Other viewers cannot read certain Newick variants (e.g., tree TC). Similarly, bioinformatics toolkits differ in the way node labels are interpreted. Six out of the ten tested toolkits did not provide explicit options for interpreting node labels as branch values. APE, Dendropy, and Newick Utilities have since included options for automatically interpreting node labels as branch values when reading and rerooting trees.

Overall, the tools treat node labels and branch values in their own, often undocumented and implicit, ways. Users must therefore be aware of, and simply accept, the implicit interpretation a particular tool implements.

Furthermore, programs that can infer branch support values use a plethora of distinct output formats. Developers of phylogenetic inference programs may consider storing branch support values using explicit tags as supported by formats such as Extended Newick or PhyloXML (Han and Zmasek, 2009). PhyloXML trees are, however, more difficult to parse and yield substantially larger tree files. For instance, our test tree TN requires 24 bytes in Newick, but 856 bytes in PhyloXML format. Another exemplary 512 taxon tree with branch lengths requires 40,303 bytes in Newick and 239,795 bytes in PhyloXML.

In order to resolve the ambiguity of inner node labels in the Newick format, we recommend using the comment notation with square brackets to store branch values. This way, the semantics of inner node labels are not overloaded. This is also the variant required by the Nexus standard (Maddison et al., 1997). Nexus is a container format that internally stores trees in Newick format; in its specification, it refines the original Newick format. However, as this notation “misuses” comments to store metadata, it is also valid for programs to ignore them. It therefore cannot be expected to work with current tools, as we showed in this review. Furthermore, particularly when using the comment notation, it is important to explicitly choose the correct interpretation of the stored values.

To address this general problem, we suggest that all tree viewers and toolkits shall offer an explicit option to choose between the two possible interpretations of node labels. Ideally, users should be forced to define the semantics of their node labels before the tree is displayed or rerooted by the respective tool. This way, accidentally wrong interpretations are avoided and unaware users will become aware of the semantics of inner node labels.

Finally, we suggest that published phylogenies should be reassessed, if branch support values were stored as node labels in the original Newick files and trees were manipulated using bioinformatics tools (e.g., if Dendroscope prior to v. 3.5.0 was used for rerooting and tree visualization).

We conclude with some practical suggestions for users of phylogenetic tree viewing tools.

  • Pay attention to the options a tool offers for interpreting node labels in Newick files.

  • If available, use the option to set the desired interpretation (e.g., Archaeopteryx, ETE, Dendroscope).

  • Ensure that rerooting represents a valid operation for your type of tree and its associated metadata.

  • Double-check your results, for instance by trying other tools or by visual inspection, particularly if the original trees were rerooted or otherwise manipulated.

The behavior of tools can easily be tested with our example trees TN and TC that are available for download at https://github.com/stamatak/tree-viz-issues (last accessed August 17, 2016).
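The kind of check that such a test performs can be sketched in a few lines of Python. The tree below is a small illustrative example with inner labels 1, 2, and 3 and root name R; it is hard-coded as an edge list and is not read from the files in the repository. All function names are hypothetical. The sketch compares which branch (identified by the bipartition of taxa it induces) each label annotates before and after the root is moved, under the assumption that labels simply stay attached to their nodes.

```python
from collections import defaultdict

# Undirected edges of the illustrative tree ((C,D)1,(A,(B,X)3)2,E)R;
# Inner nodes are identified by their Newick labels "1", "2", "3", "R".
EDGES = [("R", "1"), ("1", "C"), ("1", "D"), ("R", "2"), ("2", "A"),
         ("2", "3"), ("3", "B"), ("3", "X"), ("R", "E")]

ADJ = defaultdict(list)
for u, v in EDGES:
    ADJ[u].append(v)
    ADJ[v].append(u)
LEAVES = frozenset(n for n in ADJ if len(ADJ[n]) == 1)


def leaves_behind(node, parent):
    """Leaves reachable from `node` without crossing the edge to `parent`."""
    if node in LEAVES:
        return frozenset([node])
    found = frozenset()
    for nxt in ADJ[node]:
        if nxt != parent:
            found |= leaves_behind(nxt, node)
    return found


def split(node, parent):
    """The bipartition of the leaf set induced by the edge node-parent."""
    side = leaves_behind(node, parent)
    return frozenset([side, LEAVES - side])


def label_to_split(root):
    """Map each inner label to the branch it annotates if labels stay on nodes."""
    mapping, stack = {}, [(root, None)]
    while stack:
        node, parent = stack.pop()
        if parent is not None and node not in LEAVES:
            mapping[node] = split(node, parent)
        stack.extend((nxt, node) for nxt in ADJ[node] if nxt != parent)
    return mapping


original = label_to_split("R")  # tree as written
rerooted = label_to_split("3")  # same node labels, root moved to node "3"

# Correct handling: keep values attached to branches (splits), not nodes.
value_of_branch = {s: label for label, s in original.items()}
corrected = {node: value_of_branch[s] for node, s in rerooted.items()}
```

In this sketch, after rerooting, the node carrying label 2 sits below the branch that originally carried value 3, and the former root R sits below the branch that originally carried value 2. A tool that leaves labels on nodes would thus display values on the wrong branches, whereas `corrected` shows the value each node has to display if values follow their branches.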

Supplementary Material

Supplementary materials are available at Molecular Biology and Evolution online.

Acknowledgments

This work was financially supported by the Klaus Tschira Stiftung gGmbH Foundation and the European Molecular Biology Laboratory Heidelberg (EMBL).

We wish to thank the authors of the papers described in section “Impact on Empirical Phylogenetic Studies” for sending us their original tree files: Anthony Poole and Daniel Lundin (Lundin et al., 2010), as well as Artem Pankin and Franziska Turck (Liu et al., 2014).

Furthermore, we particularly want to thank Daniel Huson. He implemented fixes to Dendroscope (Huson and Scornavacca, 2012; Huson et al., 2007) according to our suggestions shortly after the bioRxiv preprint of this review (Czech and Stamatakis, 2015) became available. We highly appreciate his feedback on this review and his positive response to our critique and suggestions.

We also want to thank Christian Zmasek (Archaeopteryx, ATV), Zhenxiang Chen and Wei-Hua Chen (EvolView), Ivica Letunić (iTOL), Gregory Jordan (PhyloWidget), Vladimir Makarenkov (T-REX), Emmanuel Paradis (APE), Chris Fields and Jason Stajich (BioPerl), Peter Cock (BioPython), Jeet Sukumaran and Mark T. Holder (Dendropy), Alexei Drummond and Richard Moir (Geneious), Sudhir Kumar and Koichiro Tamura (Mega), David Maddison and Wayne Maddison (Mesquite), Thomas Junier (Newick Utilities), and Daniel McDonald (PyCogent) for their feedback and discussions regarding this review and their software.

Authors’ Contributions

L.C. collected the data, carried out the experiments on tree viewers and on some of the bioinformatics toolkits, and drafted the manuscript. J.H.C. carried out the experiments on most of the bioinformatics toolkits and helped to draft the manuscript. A.S. conceived the study, participated in its design as well as coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

References

  1. Anisimova M, Gascuel O. 2006a. A Fast Implementation of aLRT in PhyML [cited 2016 Mar 07]. Available from: http://www.atgc-montpellier.fr/phyml/alrt/.
  2. Anisimova M, Gascuel O. 2006b. Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol. 55(4):539–552.
  3. Archie J, Day WH, Maddison W, Meacham C, Rohlf FJ, Swofford D, Felsenstein J. 1986. The Newick tree format [cited 2015 Jul 26]. Available from: http://evolution.genetics.washington.edu/phylip/newicktree.html.
  4. Boc A, Diallo AB, Makarenkov V. 2012. T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res. 40(W1):W573–W579.
  5. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25(11):1422–1423.
  6. Czech L, Stamatakis A. 2015. Do phylogenetic tree viewers correctly display support values? bioRxiv. Cold Spring Harbor Labs Journals [cited 2016 Nov 11]. Available from: http://biorxiv.org/content/early/2015/12/25/035360.
  7. Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 39(4):783–791.
  8. Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52(5):696–704.
  9. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59(3):307–321.
  10. Han MV, Zmasek CM. 2009. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics. 10:356.
  11. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 294(5550):2310–2314.
  12. Huerta-Cepas J, Dopazo J, Gabaldón T. 2010. ETE: a Python environment for tree exploration. BMC Bioinformatics. 11(1):24.
  13. Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol. 33(6):1635–1638.
  14. Huson DH, Scornavacca C. 2012. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst Biol. 61(6):1061–1067.
  15. Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R. 2007. Dendroscope: an interactive viewer for large phylogenetic trees. BMC Bioinformatics. 8(1):460.
  16. Jordan GE, Piel WH. 2008. PhyloWidget: web-based visualizations for the tree of life. Bioinformatics. 24(14):1641–1642.
  17. Junier T, Zdobnov EM. 2010. The Newick Utilities: high-throughput phylogenetic tree processing in the Unix shell. Bioinformatics. 26(13):1669–1670.
  18. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 28(12):1647–1649.
  19. Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z, et al. 2007. PyCogent: a toolkit for making sense from sequence. Genome Biol. 8(8):R171.
  20. Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 33(7):1870–1874.
  21. Le SQ, Lartillot N, Gascuel O. 2008. Phylogenetic mixture models for proteins. Philos Trans R Soc B Biol Sci. 363(1512):3965–3976.
  22. Letunic I, Bork P. 2011. Interactive tree of life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39(Suppl 2):1–4.
  23. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44(W1):W242–W245.
  24. Liu L, Adrian J, Pankin A, Hu J, Dong X, von Korff M, Turck F. 2014. Induced and natural variation of promoter length modulates the photoperiodic response of FLOWERING LOCUS T. Nature Commun. 5:4558.
  25. Lundin D, Gribaldo S, Torrents E, Sjoberg BM, Poole A. 2010. Ribonucleotide reduction - horizontal transfer of a required function spans all three domains. BMC Evol Biol. 10(1):383.
  26. Maddison W, Maddison DR. 2001. Mesquite: a modular system for evolutionary analysis [cited 2016 Nov 11]. Available from: http://mesquiteproject.org/.
  27. Maddison DR, Swofford DL, Maddison WP. 1997. NEXUS: an extensible file format for systematic information. Syst Biol. 46(4):590–621.
  28. Olsen G. 1990. Gary Olsen’s interpretation of the “Newick’s 8:45” tree format standard [cited 2017 Jan 10]. Available from: http://evolution.genetics.washington.edu/phylip/newick_doc.html.
  29. Paradis E, Claude J, Strimmer K. 2003. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 20(2):289–290.
  30. Rambaut A. 2007. FigTree, a graphical viewer of phylogenetic trees [cited 2015 Jul 27]. Available from: http://tree.bio.ed.ac.uk/software/figtree/.
  31. Robinson D, Foulds LR. 1981. Comparison of phylogenetic trees. Math Biosci. 53(1):131–147.
  32. Roderic DMP. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 12(4):357–358.
  33. Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19(12):1572–1574.
  34. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61(3):539–542.
  35. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al. 2002. The BioPerl toolkit: Perl modules for the life sciences. Genome Res. 12(10):1611–1618.
  36. Stamatakis A. 2006a. Phylogenetic models of rate heterogeneity: a high performance computing perspective. In: Spirakis P, Siegel HJ, editors. 20th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS); April 25-29, 2006; Rhodes Island, Greece. High Performance Computational Biology Workshop. p. 8.
  37. Stamatakis A. 2006b. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22(21):2688–2690.
  38. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30(9):1312–1313.
  39. Stamatakis A, Hoover P, Rougemont J. 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst Biol. 57(5):758–771.
  40. Sukumaran J, Holder MT. 2010. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 26(12):1569–1571.
  41. Talevich E, Invergo BM, Cock PJ, Chapman BA. 2012. Bio.Phylo: a unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics. 13(1):209.
  42. Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 24(8):1596–1599.
  43. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28(10):2731–2739.
  44. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30(12):2725–2729.
  45. Zhang H, Gao S, Lercher MJ, Hu S, Chen WHH. 2012. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees. Nucleic Acids Res. 40(W1):W569–W572.
  46. Zmasek C. 2015. Archaeopteryx documentation [cited 2016 Mar 07]. Available from: https://sites.google.com/site/cmzmasek/home/software/archaeopteryx/documentation#TOC-Internal-Node-Names-are-Confidence-Vales.
  47. Zmasek CM, Eddy SR. 2001. ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 17(4):383–384.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press
