Abstract
Background
Ancestral character states computed from the combination of phylogenetic trees with extrinsic traits are used to decipher evolutionary scenarios in various research fields such as phylogeography, epidemiology, and ecology. Despite the existence of powerful methods and software in ancestral character state inference, difficulties may arise when interpreting the outputs of such inferences. The growing complexity of data (trees, annotations), the diversity of optimization criteria for computing trees and ancestral character states, the combinatorial explosion of potential evolutionary scenarios if some ancestral characters states do not stand out clearly from others, requires the design of new methods to explore associations of phylogenetic trees with extrinsic traits, to ease the visualization and interpretation of evolutionary scenarios.
Result
We developed PastView, a user-friendly interface that includes numerical and graphical features to help users to import and/or compute ancestral character states from discrete variables and extract ancestral scenarios as sets of successive transitions of character states from the tree root to its leaves. PastView provides summarized views such as transition maps and integrates comparative tools to highlight agreements or discrepancies between methods of ancestral annotations inference.
Conclusion
The main contribution of PastView is to assemble known numerical and graphical methods into a multi-maps graphical user interface dedicated to the computing, searching and viewing of evolutionary scenarios based on phylogenetic trees and ancestral character states. PastView is available publicly as a standalone software on www.pastview.org.
Keywords: Phylogeny, Ancestral character states, Evolutionary scenario
Background
Phylogenetic trees are often combined with extrinsic traits to reconstruct evolutionary scenarios in various research fields, including phylogeography, molecular epidemiology, and ecology. The analysis starts with the computation of ancestral annotations from extrinsic traits (discrete variables such as geographic origin, resistance to a treatment, and life history traits), associated with the sampled sequences used to build the phylogenetic tree. It continues with the study of the evolutionary changes from the tree root to its leaves to characterize evolutionary scenarios such as the spread of a disease, the dynamic of a drug resistance, or shifts in ecological habitats.
There are numerous methods and software applications in ancestral character state inferences, among the most used are SIMMAP [1], BEAST [2], SpreaD3 [3], and various functions implemented in R packages [4, 5]. However, many difficulties may arise when interpreting the outputs of these methods. First, there is a diversity of optimization criteria for computing trees and ancestral annotations (parsimony, maximum likelihood, Bayesian inference) and each of them involves various models of evolution; these methods may all yield different results that need to be compared. Second, the tree size and the complexity of the annotations can be an inconvenience with respect to computation time, and the results often require tedious graphical analyses to be interpreted. Third, although probabilistic models give more accurate results than other methods, if some ancestral characters do not stand out clearly from others (i.e. if they are not much more likely), these methods may produce a combinatorial explosion of potential evolutionary scenarios.
To the best of our knowledge, there are no tools available to compare different sets of ancestral annotations and to find common patterns across multiple evolutionary scenarios from a beam of ancestral annotation transitions. We thus present PastView that provides numerical and graphical tools to facilitate the interpretation of evolutionary scenarios derived from phylogenetic trees with ancestral character states.
Implementation
PastView is written in Tcl/Tk. It is an open source, cross-platform, standalone editor available for Windows and Unix-like systems including OSX. The PastView process includes five steps (Fig. 1): (1) data input (tree and annotations), (2) tree and/or annotations edition, (3) computing and displaying of ancestral annotations, (4) study of ancestral annotations transitions, and (5) comparative analysis between different methods of ancestral annotation inference. The PastView user interface/architecture (Fig. 2) of PastView is based on multi-maps, where each map is subdivided into a main view for the tree display and secondary view(s). PastView’s features are grouped into five toolboxes (Fig. 2c), following its five main steps:
The “Input/output” toolbox includes controls for I/O management: loading a phylogenetic tree (Newick format) and ancestral annotations or primary annotations (i.e. annotations for leaves only, see below for computing ancestral annotations) in CSV format. The user can also import data in the NEXUS format, which merges tree topology with ancestral annotations (e.g. BEAST output). PastView’s graphical output format is PostScript and/or SVG for high quality graphics and CSV for annotations.
The “Editing trees and annotations” toolbox includes controls for basic editing of trees and annotations: tree layouts, rooting, swapping, ladderizing, zooming in/out, etc.
The third toolbox is related to the “Computation and display of ancestral annotations”. PastView computes ancestral annotations using parsimony: DELTRAN and ACCTRAN [6–8], and maximum likelihood (ML) assuming an F81-like model of character evolution. The branch lengths (typically obtained from sequences) are rescaled to fit the rate of evolution of the studied character. A constant scaling factor is applied to all branch lengths, with value inferred by ML. PastView provides both the marginal posterior state probabilities (using standard tree traversals [9, 10]) and the joint reconstruction with maximum likelihood (using dynamic programming [11]). Details of ML computations can be found in [12]. PastView can also input ancestral annotations computed using other programs (e.g. BEAST). Ancestral annotations are displayed with color-coded bubbles and pie charts with the use of thresholds/filters to simplify views (e.g. highlighting contentious nodes with several annotations having high but close probabilities, assemble annotations with small probabilities, etc.). Ancestral annotation can also be displayed with a tree foreground color-coded under the constraints of a probability threshold and/or a background color-coded according to the Size criterion (Sz, i.e. number of descendant with the same annotation, as in the PhyloType method [13]).
Toolbox for analysis of ancestral annotation transitions: a transition is a change in ancestral annotation between subsequent nodes of a rooted phylogenic tree (Top-Down reading). PastView includes numerical and graphical methods to study transition suites. A summary view of all transitions observed in the tree is called a transition map (Fig. 3). There are three kinds of transition maps included in PastView. With the transition map of type 1, we start with a node corresponding to the ancestral root state i, and then create a node j when a transition i - > j is observed; this procedure is continued in a top-down fashion until we reach the tree tips. This map thus stores all transitions occurring in the tree. In the second type of maps we cluster together identical transitions occurring at different positions and depths in the phylogenetic tree, when they have the same ancestor in the transition map of type 1. For instance, in the example Fig. 3, the transition from « gray square » to « red circle » is observed twice in layout 1 (nodes t1 and t2), they are then collapsed into one node in layout 2. Lastly, transition maps of type 3 bring together identical transitions occurring at different positions and depths in the tree, when they have the same ancestral annotation in the transition map of type 2. Transitions maps can be displayed with different layouts (rectangular, slanted, and radial) and can be drawn dynamically to show the transitions depending on the node distance to the tree root (transition map of type 1 only). Conversely, pointing at a node on a transition map automatically highlights the corresponding part(s) of the phylogenetic tree. The transition maps can be saved in Newick format. The transition matrices provide other overviews of transitions, crossing annotations and giving counts or more elaborated indices of transitions between pairs of annotations (for instance, see [14]). PastView includes two types of matrices. The first one computes the number of transitions in the tree, from one ancestral annotation to another. The second one is a fast, count-based estimation of the relative transition rates, where raw counts are normalized and divided by state priors. PastView offers a query system to search for a given evolutionary path: the user selects a transition sequence (wild cards may be used) and matching tree pathways are identified and highlighted. For example, the scenario “A * B” lights up the tree pathways entailing the transitions from “A” to “B”, independently of the intermediaries (example in Fig. 4e).
Comparative analyses: this toolbox is dedicated to comparing multiple datasets (trees/ancestral annotations). For example, given a tree and ancestral annotations resulting from different inferences, PastView automatically adjusts tree foreground colors for nodes with similar results, but highlights nodes (bubbles, pie charts) exhibiting incongruences in ancestral annotations (example in Fig. 5a).
Results
In the following, we present two published examples of virus phylogeographic studies re-examined with PastView. The first one is related to the HIV-1A epidemiological history in Albania. The second is related to the global phylogeography of Dengue serotype 1 virus.
Example 1
Study of HIV-1A epidemiological history in Albania.
Figure 4 shows a simple example of PastView output, in the study of HIV-1A epidemiological history in Albania. The dataset, tree, and origin of the sequences (locations) are from [15] (see also [13]). Ancestral locations are computed by parsimony (DELTRAN), maximum of the marginal posterior probabilities (MAP, F-81 like), and joint most likely scenario. Computation time of ancestral annotations for this small dataset (153 strains) is ~ 2 s with a 3.1 GHz Intel I7 computer. The tree is foreground color-coded if the ancestral annotations are identical between MAP F-81 like, joint most likely reconstruction and parsimony. If not, color-coded bubbles are displayed (Fig. 4a) (in the same order, reading from root to tips). Here, we identify nine nodes with discrepancies, and only four nodes if we compare MAP and joint inferences only. The pie charts display (Fig. 4b) the posteriors for nodes having two or more annotations with probability higher than their MAP F81 like probability value minus 40%. Based on MAP and joint inferences, the transition maps (Fig. 4c and d) are slightly different but output the same global scenario: the virus spread from Africa to West Europe, East Europe and Greece, and then from Greece to Albania, with a few Greek sequences coming back from Albania. This reaches the same general conclusion as [15]: there has been a single major introduction of HIV-1A from Greece followed by a local epidemic spread. This result (Fig. 4e) is highlighted by the thick pathways from the tree root to all entries to Albania.
Example 2
Study of the global phylogeography of dengue serotype 1 virus.
The second example is related to the study of the worldwide phylogeography of Dengue virus of serotype 1 (DENV-1). The dataset, tree (269 strains), and origin of the sequences (13 locations) were obtained from [16]. Tree and ancestral annotations are first computed with BEAST (same parameters / model as [16]), then imported in PastView. Another set of ancestral annotations is computed with PastView (F81-like method, MAP, computation time is ~40s with an 3.1 GHz Intel I7 computer). The tree is then foreground color-coded (Fig. 5a) if the ancestral annotations (MAPs) are the same between the two methods. If not, pie-charts are displayed (from a root to tips reading: left = BEAST, right = PastView). Both analyses reach the same main conclusions as described in [16]: Southeast Asia countries are found to be the most likely origin of the virus dispersion and India played a crucial role in the establishment, evolution and dispersal of the Cosmopolitan (Africa, America, Carribean, East & Southeast Asia) DENV-1 genotype (Fig. 5b and c). The Caribbean region is also found by both methods as the dissemination origin point of the virus to the Americas (South and Central America, then North America). If there is a global consensus for the most ancestral nodes/ transitions, some differences between the results obtained with the two methods exist especially for the most recent nodes/transitions (Fig. 5a, b and c). These disagreements are related mainly to transitions within small clusters of sequences with heterogeneous annotations.
Conclusion
The main contribution of PastView is to assemble a series of numerical and graphical tools into a multi-map interface dedicated to computing, searching and viewing ancestral scenarios. One challenge will be to develop and integrate new methods to compute ancestral annotations combining the advantages of the nuanced outputs of the ML/Bayesian settings, and the overly stringent outputs of the joint and MAP methods, such as the recent decision theory-based PastML approach [12]. In the long-term, a next step will be to integrate statistical tools and information visualization methods to quickly identify robust evolutionary scenarios, such as a consensus of transition maps between different methods (tree computations, ancestral annotation inferences, see for instance [17]), or for more elaborated transition maps in the case of huge datasets. In the context of massive data, which are more are more common nowadays (e.g. from high-throughput sequencing), it is easy to imagine analyses of trees comprising tens of thousands of tips (e.g. HIV strains) associated to complex, multiple annotations (e.g. clinical data). Then, it will become necessary to design methods that operate on tree topologies / extrinsic traits associations to produce user-friendly views summarizing large, complex evolutionary scenarios.
Availability and requirements
Project name: PastView
Project home page: http://pastview.org/
Operating system(s): Platform independent
Programming language: Tcl/Tk
Other requirements: ActiveTcl 8.6.8 or higher
License: GNU GPL
Any restrictions to use by non-academics: none
Acknowledgements
We would like to thank Stéphane Guindon for providing insightful ideas on software and manuscript.
Authors’ contributions
FC conceived the idea, developed the tool and wrote the manuscript. OG conceived the idea, participated in supervision of the project. EJ participated in the idea and tested the tool. GC participated in the idea and in supervision of the project, and tested the tool. All authors have read, revised, and approved the final manuscript.
Funding
This work was supported by The PALADIN project (GC), publicly funded through the French National Research Agency under the “Investissements d’avenir” program with the reference ANR-10-LABX-04-01 Labex CEMEB, and coordinated by the University of Montpellier, by the EU-H2020 Virogenesis project (grant number 634650, OG), and by the INCEPTION project (PIA/ANR-16-CONV-0005, OG).
Availability of data and materials
PastView source code and data (tree, primary and ancestral annotations) of the two examples used in the manuscript are available on http://pastview.org/ and in the PastView package. PastView and the source code is publicly available on www.pastview.org
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Bollback JP. SIMMAP: stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics. 2006;7:88. doi: 10.1186/1471-2105-7-88. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1). 10.1093/ve/vey016. eCollection 2018 Jan. [DOI] [PMC free article] [PubMed]
- 3.Bielejec F, Baele G, Vrancken B, Suchard MA, Rambaut A, Lemey P. Sprea3D: interactive visualization of spatiotemporal history and trait evolutionary processes. Mol Biol Evol. 2016;33(8):2167–2169. doi: 10.1093/molbev/msw082. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Paradis E, Claude J, Strimmer K. APE: analyses of Phylogenetics and evolution in R language. Bioinformatics. 2004;20(2):289–290. doi: 10.1093/bioinformatics/btg412. [DOI] [PubMed] [Google Scholar]
- 5.Revell LJ. Phytools: an R package for phylogenetic comparative biology (and other things) Methods Ecol Evol. 2011;3:217–223. doi: 10.1111/j.2041-210X.2011.00169.x. [DOI] [Google Scholar]
- 6.Fitch WM. Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool. 1971;20(4):406–416. doi: 10.2307/2412116. [DOI] [Google Scholar]
- 7.Farris JS. Methods for computing Wagner trees. Syst Zool. 1970;19(1):83–92. doi: 10.2307/2412028. [DOI] [Google Scholar]
- 8.Swofford D, Maddison W. Reconstructing ancestral character states under Wagner parsimony. Math Biosci. 1987;87(2):199–229. doi: 10.1016/0025-5564(87)90074-5. [DOI] [Google Scholar]
- 9.Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17(6):368–376. doi: 10.1007/BF01734359. [DOI] [PubMed] [Google Scholar]
- 10.Gascuel O, Steel M. Predicting the ancestral character changes in a tree is typically easier than predicting the root state. Syst Biol. 2014;63(3):421–435. doi: 10.1093/sysbio/syu010. [DOI] [PubMed] [Google Scholar]
- 11.Pupko T, Pe’er I, Shamir R, Graur D. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol Biol and Evo. 2000;17(6):890–896. doi: 10.1093/oxfordjournals.molbev.a026369. [DOI] [PubMed] [Google Scholar]
- 12.Ishikawa S, Zhukova A, Iwasaki W, Gascuel O. Fast likelihood method to reconstruct and visualize ancestral scenarios. Mol Biol Evol. 2019. 10.1093/molbev/msz131. [DOI] [PMC free article] [PubMed]
- 13.Chevenet F, Jung M, Peeters M, de Oliveira T, Gascuel O. Searching for virus phylotypes. Bioinformatics. 2013;29(5):561–570. doi: 10.1093/bioinformatics/btt010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Maddison D, Maddison W. MacClade 4. 2003. [Google Scholar]
- 15.Salemi Marco, de Oliveira Tulio, Ciccozzi Massimo, Rezza Giovanni, Goodenow Maureen M. High-Resolution Molecular Epidemiology and Evolutionary History of HIV-1 Subtypes in Albania. PLoS ONE. 2008;3(1):e1390. doi: 10.1371/journal.pone.0001390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Walimbe AM, Lotankar M, Cecilia D, Cherian SS. Global phylogeography of dengue type 1 and 2 viruses reveals the role of India. Infect Genet Evol. 2014;22:30–39. doi: 10.1016/j.meegid.2014.01.001. [DOI] [PubMed] [Google Scholar]
- 17.Cui Y, Jansson J, Sung WK. Polynomial-time algorithms for building a consensus MUL-tree. J Comput Biol. 2012;19(9):1073–1088. doi: 10.1089/cmb.2012.0008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
PastView source code and data (tree, primary and ancestral annotations) of the two examples used in the manuscript are available on http://pastview.org/ and in the PastView package. PastView and the source code is publicly available on www.pastview.org